CosyVoice2-API
We provide an OpenAI API-compatible usage method. You only need to install the StackFlow package.
Preparation
Refer to AI Pyramid Software Package Update to complete the installation of the following model and software packages.
apt install lib-llm llm-sys llm-cosy-voice llm-openai-api
apt install llm-model-cosyvoice2-0.5b-ax650
Note
After installing a new model, you must manually run systemctl restart llm-openai-api to update the model list.
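After the restart, you can confirm that the model is registered by listing models through the standard OpenAI /v1/models endpoint (for example, curl http://127.0.0.1:8000/v1/models). A minimal sketch of parsing that response, assuming the endpoint returns the usual OpenAI list schema; the sample body below is illustrative, and the exact fields your build returns may differ:

```python
import json

def model_ids(models_response: str) -> list[str]:
    """Extract model ids from a /v1/models JSON response body."""
    payload = json.loads(models_response)
    return [entry["id"] for entry in payload.get("data", [])]

# Illustrative response body; in practice fetch it from
# http://127.0.0.1:8000/v1/models after restarting llm-openai-api.
sample = '{"object": "list", "data": [{"id": "CosyVoice2-0.5B-ax650", "object": "model"}]}'
print(model_ids(sample))  # ['CosyVoice2-0.5B-ax650']
```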
Note
CosyVoice2 is an LLM-based speech generation model capable of synthesizing natural, fluent speech. However, due to resource and design limitations, the length of audio generated per request is limited; in the current version the maximum is 27 s. The first model load may be slow, so please wait patiently.
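Because each request is capped at roughly 27 s of audio, longer text has to be split and synthesized in pieces. A minimal sketch of one way to do this, splitting on sentence-ending punctuation and capping each chunk by character count; the character budget here is a rough assumption, not a documented mapping from text length to audio length:

```python
import re

def split_text(text: str, max_chars: int = 60) -> list[str]:
    """Split text into chunks on sentence-ending punctuation,
    keeping each chunk at or under max_chars where possible."""
    # Keep the delimiter attached to its sentence via a lookbehind split.
    sentences = [s for s in re.split(r"(?<=[。!?.!?])", text) if s]
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = ""
        current += sentence
    if current:
        chunks.append(current)
    return chunks

poem = "君不见黄河之水天上来,奔流到海不复回。君不见高堂明镜悲白发,朝如青丝暮成雪。"
# Each chunk can then be sent as a separate /v1/audio/speech request.
for chunk in split_text(poem, max_chars=20):
    print(chunk)
```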
Curl Call
curl http://127.0.0.1:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "CosyVoice2-0.5B-ax650",
    "response_format": "wav",
    "input": "君不见黄河之水天上来,奔流到海不复回。君不见高堂明镜悲白发,朝如青丝暮成雪。人生得意须尽欢,莫使金樽空对月。天生我材必有用,千金散尽还复来。"
  }' \
  -o output.wav
Python Call
from pathlib import Path
from openai import OpenAI

client = OpenAI(
    api_key="sk-",
    base_url="http://127.0.0.1:8000/v1"
)

speech_file_path = Path(__file__).parent / "output.wav"

with client.audio.speech.with_streaming_response.create(
    model="CosyVoice2-0.5B-ax650",
    voice="prompt_data",
    response_format="wav",
    input='君不见黄河之水天上来,奔流到海不复回。君不见高堂明镜悲白发,朝如青丝暮成雪。人生得意须尽欢,莫使金樽空对月。天生我材必有用,千金散尽还复来。',
) as response:
    response.stream_to_file(speech_file_path)
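After saving output.wav, you may want to sanity-check the result, for example that it is a valid WAV file and its duration stays under the 27 s per-request cap. A small sketch using only the standard-library wave module; it is demonstrated here on a generated silent clip so it is self-contained, but in practice you would call wav_duration("output.wav") on the synthesized file:

```python
import wave

def wav_duration(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()

# Demo: write one second of 16 kHz mono silence, then measure it.
with wave.open("demo.wav", "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)        # 16-bit samples
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 16000)

duration = wav_duration("demo.wav")
print(f"{duration:.2f} s")     # 1.00 s
assert duration <= 27.0, "exceeds the model's per-request limit"
```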
Voice Cloning
Manually download the model and upload it to AI Pyramid, or pull the model repository with the following command.
git clone --recurse-submodules https://huggingface.co/M5Stack/CosyVoice2-scripts
File Description
root@m5stack-AI-Pyramid:~/CosyVoice2-scripts# ls -lh
total 28K
drwxr-xr-x 2 root root 4.0K Jan  9 10:26 asset
drwxr-xr-x 2 root root 4.0K Jan  9 10:26 CosyVoice-BlankEN
drwxr-xr-x 2 root root 4.0K Jan  9 10:27 frontend-onnx
drwxr-xr-x 3 root root 4.0K Jan  9 10:26 pengzhendong
-rw-r--r-- 1 root root   24 Jan  9 10:26 README.md
-rw-r--r-- 1 root root  103 Jan  9 10:26 requirements.txt
drwxr-xr-x 3 root root 4.0K Jan  9 10:26 scripts
Create a virtual environment
Tip
When creating the virtual environment for the first time, install the venv module first: apt install python3.10-venv
python3 -m venv cosyvoice
Activate the virtual environment
source cosyvoice/bin/activate
Install dependency packages
pip install -r requirements.txt
Run the process_prompt script
python3 scripts/process_prompt.py --prompt_text asset/zh_woman1.txt --prompt_speech asset/zh_woman1.wav --output zh_woman1
The audio feature file is generated successfully:
(cosyvoice) root@m5stack-AI-Pyramid:~/CosyVoice2-scripts# python3 scripts/process_prompt.py --prompt_text asset/zh_woman1.txt --prompt_speech asset/zh_woman1.wav --output zh_woman1
2026-01-09 10:41:18.655905428 [W:onnxruntime:Default, device_discovery.cc:164 DiscoverDevicesForPlatform] GPU device discovery failed: device_discovery.cc:89 ReadFileContents Failed to open file: "/sys/class/drm/card1/device/vendor"
prompt_text 希望你以后能够做的比我还好呦。
fmax 8000
prompt speech token size: torch.Size([1, 87])
Copy the zh_woman1 directory to the model directory and reinitialize the model.
cp -r zh_woman1 /opt/m5stack/data/CosyVoice2-0.5B-ax650/
systemctl restart llm-sys # Reset model configuration
Tip
If you want to replace the default cloned voice, modify the prompt_dir field in the /opt/m5stack/data/models/mode_CosyVoice2-0.5B-ax650.json file to point to the replacement directory. Each time the voice is replaced, the model must be reinitialized.
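Changing the default cloned voice amounts to rewriting that one field. A minimal sketch of the edit, demonstrated on a stand-in config file (the only key assumed here is prompt_dir itself; on the device, point it at the real file /opt/m5stack/data/models/mode_CosyVoice2-0.5B-ax650.json and then restart llm-sys):

```python
import json
from pathlib import Path

def set_prompt_dir(config_path: str, new_dir: str) -> None:
    """Point the model's prompt_dir field at a new voice directory."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    config["prompt_dir"] = new_dir
    path.write_text(json.dumps(config, indent=2, ensure_ascii=False))

# Demo on a stand-in file; other keys in the real config are left untouched.
Path("demo_config.json").write_text(json.dumps({"prompt_dir": "prompt_data"}))
set_prompt_dir("demo_config.json", "zh_woman1")
print(json.loads(Path("demo_config.json").read_text())["prompt_dir"])  # zh_woman1
```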
Curl Call
curl http://127.0.0.1:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "CosyVoice2-0.5B-ax650",
    "voice": "zh_woman1",
    "response_format": "wav",
    "input": "君不见黄河之水天上来,奔流到海不复回。君不见高堂明镜悲白发,朝如青丝暮成雪。人生得意须尽欢,莫使金樽空对月。天生我材必有用,千金散尽还复来。"
  }' \
  -o output.wav
Python Call
from pathlib import Path
from openai import OpenAI

client = OpenAI(
    api_key="sk-",
    base_url="http://127.0.0.1:8000/v1"
)

speech_file_path = Path(__file__).parent / "output.wav"

with client.audio.speech.with_streaming_response.create(
    model="CosyVoice2-0.5B-ax650",
    voice="zh_woman1",
    response_format="wav",
    input='君不见黄河之水天上来,奔流到海不复回。君不见高堂明镜悲白发,朝如青丝暮成雪。人生得意须尽欢,莫使金樽空对月。天生我材必有用,千金散尽还复来。',
) as response:
    response.stream_to_file(speech_file_path)
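To compare the default voice and the clone on the same line, only the voice field of the request changes. A small sketch that just builds the request bodies; send each one with the curl or Python calls shown above (prompt_data is the default voice directory, zh_woman1 the one created earlier):

```python
def speech_payload(voice: str, text: str) -> dict:
    """Build a /v1/audio/speech request body for a given voice."""
    return {
        "model": "CosyVoice2-0.5B-ax650",
        "voice": voice,
        "response_format": "wav",
        "input": text,
    }

text = "天生我材必有用,千金散尽还复来。"
for voice in ("prompt_data", "zh_woman1"):   # default voice, then the clone
    payload = speech_payload(voice, text)
    # Save each result under a distinct name, e.g. output_prompt_data.wav
    print(payload["voice"], "->", f"output_{payload['voice']}.wav")
```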