CosyVoice2-API

We provide an OpenAI API-compatible interface. You only need to install the StackFlow package.

Preparation

  1. Refer to RaspberryPi & LLM8850 Software Package Acquisition Tutorial to complete the installation of the following model and software packages.
sudo apt install lib-llm llm-sys llm-cosy-voice llm-openai-api
sudo apt install llm-model-cosyvoice2-0.5b-axcl
Note
After installing a new model, you need to manually execute sudo systemctl restart llm-openai-api to update the model list.
Note
CosyVoice2 is an LLM-based speech generation model capable of synthesizing natural and fluent speech. Due to resource or design limitations, the length of each generated clip is limited: the current version supports a maximum of 27 seconds of audio per request. The first model load may be slow, so please be patient.
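
After installing the packages and restarting llm-openai-api, you can verify that the model is registered by listing the available models with the standard OpenAI client. This is only a minimal check, assuming the service exposes the usual /v1/models endpoint of the OpenAI API:

from openai import OpenAI

# Connect to the local llm-openai-api service; the API key is a placeholder.
client = OpenAI(api_key="sk-", base_url="http://127.0.0.1:8000/v1")

# CosyVoice2-0.5B-axcl should appear in this list after the restart.
for model in client.models.list():
    print(model.id)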

Curl Invocation

curl http://127.0.0.1:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "CosyVoice2-0.5B-axcl",
    "response_format": "wav",
    "input": "But thy eternal summer shall not fade, Nor lose possession of that fair thou ow’st; Nor shall Death brag thou wander’st in his shade, When in eternal lines to time thou grow’st; So long as men can breathe or eyes can see, So long lives this, and this gives life to thee."
  }' \
  -o output.wav

Python Invocation

from pathlib import Path
from openai import OpenAI

# Point the OpenAI client at the local llm-openai-api service (the API key is a placeholder).
client = OpenAI(
    api_key="sk-",
    base_url="http://127.0.0.1:8000/v1"
)

speech_file_path = Path(__file__).parent / "output.wav"

# Stream the synthesized speech directly into output.wav.
with client.audio.speech.with_streaming_response.create(
    model="CosyVoice2-0.5B-axcl",
    voice="prompt_data",
    response_format="wav",
    input="But thy eternal summer shall not fade, Nor lose possession of that fair thou ow’st; Nor shall Death brag thou wander’st in his shade, When in eternal lines to time thou grow’st; So long as men can breathe or eyes can see, So long lives this, and this gives life to thee.",
) as response:
    response.stream_to_file(speech_file_path)
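
Because each request is limited to roughly 27 seconds of audio (see the Note above), longer passages must be split across several requests. The following is only an illustrative sketch, not part of the official package: it splits the input at sentence boundaries and writes one WAV file per chunk; merging the clips afterwards (for example with a WAV library or ffmpeg) is left to the reader.

import re
from pathlib import Path
from openai import OpenAI

client = OpenAI(api_key="sk-", base_url="http://127.0.0.1:8000/v1")

# Replace with any text longer than a single request can cover.
long_text = "First sentence of a long passage. Second sentence. Third sentence."

# Rough heuristic: split at sentence-ending punctuation to keep each request short.
sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", long_text) if s.strip()]

out_dir = Path(__file__).parent
for i, sentence in enumerate(sentences):
    # Synthesize each sentence into its own numbered WAV file.
    with client.audio.speech.with_streaming_response.create(
        model="CosyVoice2-0.5B-axcl",
        voice="prompt_data",
        response_format="wav",
        input=sentence,
    ) as response:
        response.stream_to_file(out_dir / f"output_{i:03d}.wav")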

Voice Cloning

  1. Manually download the model and upload it to the Raspberry Pi 5, or clone the repository directly with the following command.
Tip
If git lfs is not installed, please refer to the git lfs Installation Instructions.
git clone --recurse-submodules https://huggingface.co/M5Stack/CosyVoice2-scripts

File Description

m5stack@raspberrypi:~/rsp/CosyVoice2-scripts $ ls -lh
total 28K
drwxrwxr-x 2 m5stack m5stack 4.0K Nov  6 15:18 asset
drwxrwxr-x 2 m5stack m5stack 4.0K Nov  6 15:18 CosyVoice-BlankEN
drwxrwxr-x 2 m5stack m5stack 4.0K Nov  6 15:19 frontend-onnx
drwxrwxr-x 3 m5stack m5stack 4.0K Nov  6 15:18 pengzhendong
-rw-rw-r-- 1 m5stack m5stack   24 Nov  6 15:18 README.md
-rw-rw-r-- 1 m5stack m5stack  103 Nov  6 15:18 requirements.txt
drwxrwxr-x 3 m5stack m5stack 4.0K Nov  6 15:18 scripts
  2. Create a virtual environment.
python -m venv cosyvoice
  3. Activate the virtual environment.
source cosyvoice/bin/activate
  4. Install dependency packages.
pip install -r requirements.txt
  5. Run the process_prompt script.
python3 scripts/process_prompt.py --prompt_text asset/en_woman1.txt --prompt_speech asset/en_woman1.mp3 --output en_woman1

Successfully generated the audio feature file

(cosyvoice) m5stack@raspberrypi:~/rsp/CosyVoice2-scripts $ python3 scripts/process_prompt.py --prompt_text asset/en_woman1.txt --prompt_speech asset/en_woman1.mp3 --output en_woman1
2025-11-06 16:16:01.526554414 [W:onnxruntime:Default, device_discovery.cc:164 DiscoverDevicesForPlatform] GPU device discovery failed: device_discovery.cc:89 ReadFileContents Failed to open file: "/sys/class/drm/card1/device/vendor"
prompt_text But many of these southern girls have the same trouble, said Holly.
fmax 8000
prompt speech token size: torch.Size([1, 103])
  6. Copy the 'en_woman1' folder to the model directory and reinitialize the model.
cp -r en_woman1 /opt/m5stack/data/CosyVoice2-0.5B-axcl/
sudo systemctl restart llm-sys # Reset model configuration
Tip
To replace the default cloned voice, modify the prompt_dir field in the /opt/m5stack/data/models/mode_CosyVoice2-0.5B-axcl.json file to the new directory. Each replacement requires model reinitialization.
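
As a rough illustration of the Tip above, the sketch below rewrites the prompt_dir field and then reinitializes the model. It is only a sketch under assumptions: it treats prompt_dir as a top-level key holding the full path to the prompt directory, and it must be run with sufficient permissions (for example via sudo python3); check the existing value in the file and adjust accordingly.

import json
import subprocess

cfg_path = "/opt/m5stack/data/models/mode_CosyVoice2-0.5B-axcl.json"

# Point the default cloned voice at the new prompt directory
# (assumed key location and value format; verify against the existing file).
with open(cfg_path) as f:
    cfg = json.load(f)
cfg["prompt_dir"] = "/opt/m5stack/data/CosyVoice2-0.5B-axcl/en_woman1"
with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)

# Each replacement requires model reinitialization.
subprocess.run(["sudo", "systemctl", "restart", "llm-sys"], check=True)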

Curl Invocation

curl http://127.0.0.1:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "CosyVoice2-0.5B-axcl",
    "voice": "en_woman1",
    "response_format": "wav",
    "input": "But thy eternal summer shall not fade, Nor lose possession of that fair thou ow’st; Nor shall Death brag thou wander’st in his shade, When in eternal lines to time thou grow’st; So long as men can breathe or eyes can see, So long lives this, and this gives life to thee."
  }' \
  -o output.wav

Python Invocation

from pathlib import Path
from openai import OpenAI

# Point the OpenAI client at the local llm-openai-api service (the API key is a placeholder).
client = OpenAI(
    api_key="sk-",
    base_url="http://127.0.0.1:8000/v1"
)

speech_file_path = Path(__file__).parent / "output.wav"

# Synthesize with the cloned voice generated above and stream it into output.wav.
with client.audio.speech.with_streaming_response.create(
    model="CosyVoice2-0.5B-axcl",
    voice="en_woman1",
    response_format="wav",
    input="But thy eternal summer shall not fade, Nor lose possession of that fair thou ow’st; Nor shall Death brag thou wander’st in his shade, When in eternal lines to time thou grow’st; So long as men can breathe or eyes can see, So long lives this, and this gives life to thee.",
) as response:
    response.stream_to_file(speech_file_path)
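
To quickly sanity-check the result, you can inspect the generated file with Python's standard wave module (assuming the default PCM WAV output):

import wave

# Report duration and format; the clip should stay within the ~27 s limit.
with wave.open("output.wav", "rb") as wav:
    duration = wav.getnframes() / wav.getframerate()
    print(f"output.wav: {duration:.1f} s, {wav.getframerate()} Hz, {wav.getnchannels()} channel(s)")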