
DeepSeek-R1-Distill-Qwen-1.5B

Introduction

DeepSeek-R1-Distill-Qwen-1.5B is fine-tuned from an open-source base model using samples generated by DeepSeek-R1, and has roughly 1.5 billion parameters. Key highlights of this model include the following (a brief usage sketch follows the list):

  • Type: Causal Language Model
  • Training Stage: Pretraining & Post-training
  • Architecture: Transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings
  • Number of Parameters: 1.54B (1.31B non-embedding)
  • Number of Layers: 28
  • Number of Attention Heads (GQA): 12 for Q and 2 for KV
  • Context Length: Full 131,072 tokens and generation up to 8,192 tokens
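
For quick reference on a PC, the model can be exercised with the Hugging Face transformers library before deploying the NPU builds below. This is a minimal sketch, assuming the public checkpoint deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B and a transformers version with chat-template support; the prompt is purely illustrative.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the accelerate package; drop it to load on CPU.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Build a chat-formatted prompt and generate a reply (well under the 8,192-token generation limit).
messages = [{"role": "user", "content": "Explain RMSNorm in one sentence."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))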

Available NPU Models

Base Model

deepseek-r1-1.5B-ax630c

The Base Model provides a 128-token context window and a maximum output of 1,024 tokens.

Supported Platforms: LLM630 Compute Kit, Module LLM, and Module LLM Kit

  • 128-token context window
  • 1,024 max output tokens
  • TTFT: 1075.04 ms
  • Average throughput: 3.57 tokens/s

Install

apt install llm-model-deepseek-r1-1.5b-ax630c
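
The TTFT and throughput figures above give a rough way to estimate end-to-end response time. A back-of-the-envelope sketch, where the 256-token response length is hypothetical:

# Rough estimate: total time ≈ TTFT + output tokens / average throughput.
TTFT_S = 1.07504       # 1075.04 ms time-to-first-token
TOKENS_PER_S = 3.57    # average decode throughput
output_tokens = 256    # hypothetical response length

total_s = TTFT_S + output_tokens / TOKENS_PER_S
print(f"~{total_s:.1f} s for a {output_tokens}-token response")  # roughly 72.8 s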

Long-Context Model

deepseek-r1-1.5B-p256-ax630c

Compared to the Base Model, the Long-Context Model provides extended context capabilities, offering a 256-token context window and a maximum output of 1,024 tokens.

Supported Platforms: LLM630 Compute Kit, Module LLM, and Module LLM Kit

  • 256-token context window
  • 1,024 max output tokens
  • TTFT: 3056.86 ms
  • Average throughput: 3.57 tokens/s

Install

apt install llm-model-deepseek-r1-1.5b-p256-ax630c
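
Because the window is still only 256 tokens, prompts generally need to be checked or trimmed before being sent to the model. A minimal sketch, assuming the Hugging Face tokenizer for deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B is a reasonable stand-in for the on-device tokenizer; the prompt text is a placeholder.

from transformers import AutoTokenizer

CONTEXT_WINDOW = 256  # context window of the long-context NPU build

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")

prompt = "Summarize the following notes: ..."
ids = tokenizer(prompt)["input_ids"]
if len(ids) > CONTEXT_WINDOW:
    # Keep only the most recent tokens so the prompt fits the window.
    ids = ids[-CONTEXT_WINDOW:]
    prompt = tokenizer.decode(ids, skip_special_tokens=True)
print(f"Prompt uses {len(ids)} of {CONTEXT_WINDOW} context tokens")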