
DeepSeek-R1-Distill-Qwen-1.5B

Introduction

DeepSeek-R1-Distill-Qwen-1.5B is fine-tuned from an open-source base model using samples generated by DeepSeek-R1, and has roughly 1.5 billion parameters. Key highlights of this model include the following (a brief usage sketch follows the list):

  • Type: Causal Language Model
  • Training Stage: Pretraining & Post-training
  • Architecture: Transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings
  • Number of Parameters: 1.54B (1.31B non-embedding)
  • Number of Layers: 28
  • Number of Attention Heads (GQA): 12 for Q and 2 for KV
  • Context Length: Full 131,072 tokens and generation up to 8,192 tokens
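
For quick reference on a PC, the model can be exercised with the Hugging Face transformers library before deploying the NPU builds below. This is a minimal sketch, assuming the public checkpoint deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B and a transformers version with chat-template support; the prompt is purely illustrative.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the accelerate package; drop it to load on CPU.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Build a chat-formatted prompt and generate a reply (well under the 8,192-token generation limit).
messages = [{"role": "user", "content": "Explain RMSNorm in one sentence."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))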

Available NPU Models

Base Model

deepseek-r1-1.5B-ax630c

The Base Model provides a 128-token context window and a maximum output of 1,024 tokens.

Supported Platforms: LLM630 Compute Kit, Module LLM, and Module LLM Kit

  • 128-token context window
  • 1,024 max output tokens
  • TTFT: 1075.04 ms
  • Average throughput: 3.57 tokens/s

Install

apt install llm-model-deepseek-r1-1.5b-ax630c
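
The TTFT and throughput figures above give a rough way to estimate end-to-end response time. A back-of-the-envelope sketch, where the 256-token response length is hypothetical:

# Rough estimate: total time ≈ TTFT + output tokens / average throughput.
TTFT_S = 1.07504       # 1075.04 ms time-to-first-token
TOKENS_PER_S = 3.57    # average decode throughput
output_tokens = 256    # hypothetical response length

total_s = TTFT_S + output_tokens / TOKENS_PER_S
print(f"~{total_s:.1f} s for a {output_tokens}-token response")  # roughly 72.8 s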

Long-Context Model

deepseek-r1-1.5B-p256-ax630c

Compared to the Base Model, the Long-Context Model provides extended context capabilities, offering a 256-token context window and a maximum output of 1,024 tokens.

Supported Platforms: LLM630 Compute Kit, Module LLM, and Module LLM Kit

  • 256-token context window
  • 1,024 max output tokens
  • TTFT: 3056.86 ms
  • Average throughput: 3.57 tokens/s

Install

apt install llm-model-deepseek-r1-1.5b-p256-ax630c
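
Because the window is still only 256 tokens, prompts generally need to be checked or trimmed before being sent to the model. A minimal sketch, assuming the Hugging Face tokenizer for deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B is a reasonable stand-in for the on-device tokenizer; the prompt text is a placeholder.

from transformers import AutoTokenizer

CONTEXT_WINDOW = 256  # context window of the long-context NPU build

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")

prompt = "Summarize the following notes: ..."
ids = tokenizer(prompt)["input_ids"]
if len(ids) > CONTEXT_WINDOW:
    # Keep only the most recent tokens so the prompt fits the window.
    ids = ids[-CONTEXT_WINDOW:]
    prompt = tokenizer.decode(ids, skip_special_tokens=True)
print(f"Prompt uses {len(ids)} of {CONTEXT_WINDOW} context tokens")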