
CosyVoice2-API

We provide an OpenAI-API-compatible interface. You only need to install the StackFlow packages.

Preparation

  1. Refer to AI Pyramid Software Package Update and install the following software and model packages.
apt install lib-llm llm-sys llm-cosy-voice llm-openai-api
apt install llm-model-cosyvoice2-0.5b-ax650
Note
Each time you install a new model, you must manually run systemctl restart llm-openai-api to refresh the model list.
Note
CosyVoice2 is an LLM-based speech generation model that synthesizes natural, fluent speech. However, due to resource and design constraints, the length of audio generated per request is limited; in the current version the maximum is 27 s. The first model load may be slow; please wait patiently.
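Because each synthesis request is capped at 27 s of audio, long passages are best split into shorter chunks before calling the API. The helper below is not part of the package; it is a minimal sketch that splits text at sentence-ending punctuation so each request stays short (the 60-character default is an assumed heuristic, not a documented limit):

```python
import re

def split_text(text: str, max_chars: int = 60) -> list[str]:
    """Split text into chunks no longer than max_chars,
    preferring sentence-ending punctuation as break points."""
    # Split after Chinese or Western sentence-ending punctuation.
    sentences = re.split(r"(?<=[。!?.!?])", text)
    chunks, current = [], ""
    for s in sentences:
        if not s:
            continue
        if len(current) + len(s) <= max_chars:
            current += s
        else:
            if current:
                chunks.append(current)
            # A single over-long sentence becomes its own chunk.
            current = s
    if current:
        chunks.append(current)
    return chunks

text = "君不见黄河之水天上来,奔流到海不复回。君不见高堂明镜悲白发,朝如青丝暮成雪。"
for chunk in split_text(text, max_chars=20):
    print(chunk)
```

Each chunk can then be sent as a separate `input` to the speech endpoint shown below.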

Curl Call

curl http://127.0.0.1:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "CosyVoice2-0.5B-ax650",
    "response_format": "wav",
    "input": "君不见黄河之水天上来,奔流到海不复回。君不见高堂明镜悲白发,朝如青丝暮成雪。人生得意须尽欢,莫使金樽空对月。天生我材必有用,千金散尽还复来。"
  }' \
  -o output.wav

Python Call

from pathlib import Path
from openai import OpenAI
client = OpenAI(
    api_key="sk-",
    base_url="http://127.0.0.1:8000/v1"
)
speech_file_path = Path(__file__).parent / "output.wav"
with client.audio.speech.with_streaming_response.create(
  model="CosyVoice2-0.5B-ax650",
  voice="prompt_data",
  response_format="wav",
  input='君不见黄河之水天上来,奔流到海不复回。君不见高堂明镜悲白发,朝如青丝暮成雪。人生得意须尽欢,莫使金樽空对月。天生我材必有用,千金散尽还复来。',
) as response:
  response.stream_to_file(speech_file_path)
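After a request completes, you can sanity-check the result locally. The sketch below uses Python's standard wave module to report the duration of the generated file; it assumes the service returns a plain PCM WAV, which matches the "wav" response_format above:

```python
import wave

def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()

# Example: confirm the clip stays within the current 27 s limit.
# duration = wav_duration_seconds("output.wav")
# print(f"{duration:.2f} s")
```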

Voice Cloning

  1. Manually download the model and upload it to AI Pyramid, or pull the model repository using the following command.
Tip
If git lfs is not installed, refer to the git lfs installation instructions first.
git clone --recurse-submodules https://huggingface.co/M5Stack/CosyVoice2-scripts

File Description

root@m5stack-AI-Pyramid:~/CosyVoice2-scripts# ls -lh
total 28K
drwxr-xr-x 2 root root 4.0K Jan  9 10:26 asset
drwxr-xr-x 2 root root 4.0K Jan  9 10:26 CosyVoice-BlankEN
drwxr-xr-x 2 root root 4.0K Jan  9 10:27 frontend-onnx
drwxr-xr-x 3 root root 4.0K Jan  9 10:26 pengzhendong
-rw-r--r-- 1 root root   24 Jan  9 10:26 README.md
-rw-r--r-- 1 root root  103 Jan  9 10:26 requirements.txt
drwxr-xr-x 3 root root 4.0K Jan  9 10:26 scripts
  2. Create a virtual environment
Tip
Before creating a virtual environment for the first time, install the venv package: apt install python3.10-venv.
python3 -m venv cosyvoice
  3. Activate the virtual environment
source cosyvoice/bin/activate
  4. Install the dependency packages
pip install -r requirements.txt
  5. Run the process_prompt script
python3 scripts/process_prompt.py --prompt_text asset/zh_woman1.txt --prompt_speech asset/zh_woman1.wav --output zh_woman1

The audio feature files are generated successfully:

(cosyvoice) root@m5stack-AI-Pyramid:~/CosyVoice2-scripts# python3 scripts/process_prompt.py --prompt_text  asset/zh_woman1.txt --prompt_speech asset/zh_woman1.wav --output zh_woman1
2026-01-09 10:41:18.655905428 [W:onnxruntime:Default, device_discovery.cc:164 DiscoverDevicesForPlatform] GPU device discovery failed: device_discovery.cc:89 ReadFileContents Failed to open file: "/sys/class/drm/card1/device/vendor"
prompt_text 希望你以后能够做的比我还好呦。
fmax 8000
prompt speech token size: torch.Size([1, 87])
  6. Copy the zh_woman1 directory to the model directory and reinitialize the model.
cp -r zh_woman1 /opt/m5stack/data/CosyVoice2-0.5B-ax650/
systemctl restart llm-sys # Reset model configuration
Tip
If you want to replace the default cloned voice, set the prompt_dir field in /opt/m5stack/data/models/mode_CosyVoice2-0.5B-ax650.json to the new voice directory. Each time the voice is replaced, the model must be reinitialized.
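The prompt_dir edit described in the tip can be scripted. This is a hedged sketch: it assumes the config file contains a top-level prompt_dir field, as the tip implies; verify the layout of your actual mode_CosyVoice2-0.5B-ax650.json before using it.

```python
import json
from pathlib import Path

def set_prompt_dir(config_path: str, new_dir: str) -> None:
    """Point the model's prompt_dir at a new cloned-voice directory."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    config["prompt_dir"] = new_dir  # assumed top-level field
    path.write_text(json.dumps(config, indent=4, ensure_ascii=False))

# set_prompt_dir("/opt/m5stack/data/models/mode_CosyVoice2-0.5B-ax650.json",
#                "zh_woman1")
# Then reinitialize the model: systemctl restart llm-sys
```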

Curl Call

curl http://127.0.0.1:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "CosyVoice2-0.5B-ax650",
    "voice": "zh_woman1",
    "response_format": "wav",
    "input": "君不见黄河之水天上来,奔流到海不复回。君不见高堂明镜悲白发,朝如青丝暮成雪。人生得意须尽欢,莫使金樽空对月。天生我材必有用,千金散尽还复来。"
  }' \
  -o output.wav

Python Call

from pathlib import Path
from openai import OpenAI
client = OpenAI(
    api_key="sk-",
    base_url="http://127.0.0.1:8000/v1"
)
speech_file_path = Path(__file__).parent / "output.wav"
with client.audio.speech.with_streaming_response.create(
  model="CosyVoice2-0.5B-ax650",
  voice="zh_woman1",
  response_format="wav",
  input='君不见黄河之水天上来,奔流到海不复回。君不见高堂明镜悲白发,朝如青丝暮成雪。人生得意须尽欢,莫使金樽空对月。天生我材必有用,千金散尽还复来。',
) as response:
  response.stream_to_file(speech_file_path)
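Since each call is capped at 27 s, a long passage can be synthesized chunk by chunk and the resulting files joined afterwards. The sketch below concatenates PCM WAV clips with Python's standard wave module; it assumes all clips share the same sample rate and format, which holds when they come from the same model:

```python
import wave

def concat_wavs(inputs: list[str], output: str) -> None:
    """Concatenate PCM WAV files that share identical parameters."""
    with wave.open(inputs[0], "rb") as first:
        params = first.getparams()
    with wave.open(output, "wb") as out:
        out.setparams(params)  # nframes is corrected on close
        for path in inputs:
            with wave.open(path, "rb") as wf:
                out.writeframes(wf.readframes(wf.getnframes()))

# concat_wavs(["part1.wav", "part2.wav"], "full.wav")
```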