Convert input text to an output audio file through the API.
Before running the example program, the corresponding model package must be installed on the device. Refer to Model List for the model package installation tutorial.
Before running this example program, please ensure the following preparations have been completed on the LLM device:
1. Install the `llm-model-melotts-en-us` model package using the apt package management tool: `apt install llm-model-melotts-en-us`
2. Install the `ffmpeg` tool: `apt install ffmpeg`
3. Restart the `llm-openai-api` service: `systemctl restart llm-openai-api`
On the PC side, use the OpenAI API to pass in text for text-to-speech conversion. Before running the example program, change the IP address in the `base_url` below to the actual IP address of the device.
from pathlib import Path

from openai import OpenAI

# Point the client at the device; replace the IP with the device's actual address.
client = OpenAI(
    api_key="sk-",
    base_url="http://192.168.20.186:8000/v1"
)

# Save the audio next to this script.
speech_file_path = Path(__file__).parent / "speech.mp3"

# Stream the synthesized speech directly into the output file.
with client.audio.speech.with_streaming_response.create(
    model="melotts-en-us",
    voice="alloy",
    input="The quick brown fox jumped over the lazy dog."
) as response:
    response.stream_to_file(speech_file_path)
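Only the IP part of `base_url` changes from device to device, so the URL can be built from the address instead of being edited by hand. The following is a minimal sketch, assuming the port `8000` and the `/v1` path from the example stay the same; `make_base_url` is a hypothetical helper and not part of the OpenAI client.

```python
import ipaddress


def make_base_url(device_ip: str, port: int = 8000) -> str:
    """Return the base URL for a device at `device_ip` (hypothetical helper)."""
    # ip_address() raises ValueError if the string is not a valid IPv4/IPv6 address.
    ip = ipaddress.ip_address(device_ip)
    # IPv6 literals must be wrapped in brackets inside a URL.
    host = f"[{ip}]" if ip.version == 6 else str(ip)
    # Assumption: port 8000 and the /v1 prefix match the example program.
    return f"http://{host}:{port}/v1"


print(make_base_url("192.168.20.186"))  # http://192.168.20.186:8000/v1
```

The returned string can be passed as the `base_url` argument when creating the client; a mistyped address then fails early with a `ValueError` rather than as a connection error.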
Parameter Name | Type | Required | Example Value | Description |
---|---|---|---|---|
input | string | yes | "Hello, welcome to the system" | The text content to generate audio for; maximum length is 1024 characters |
model | string | yes | melotts-zh-cn | TTS model to use; available models include melotts-zh-cn and melotts-en-us |
voice | – | no | – | Voice style selection (not currently supported) |
response_format | string | no | mp3 | Audio output format; supports mp3, opus, aac, flac, wav, pcm, etc. |
speed | number | no | 1.0 | Speech generation speed; range 0.25–2.0, default is 1.0 |
The generated audio file is saved to the `speech_file_path` specified in the example program.
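The limits in the parameter table (input length, model names, formats, speed range) can be checked on the PC before a request is sent. The sketch below assumes those limits are as listed; `build_speech_request` and `synthesize` are hypothetical helpers, and how the device itself reacts to out-of-range values is not covered here.

```python
# Values taken from the parameter table. The format list ends with "etc." in the
# table, so LISTED_FORMATS may be narrower than what the device accepts.
SUPPORTED_MODELS = {"melotts-zh-cn", "melotts-en-us"}
LISTED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_INPUT_CHARS = 1024
MIN_SPEED, MAX_SPEED = 0.25, 2.0


def build_speech_request(text, model="melotts-en-us", response_format="mp3", speed=1.0):
    """Check the arguments against the documented limits and return request kwargs.

    Hypothetical helper, not part of the OpenAI client or the device software.
    """
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input has {len(text)} characters; the limit is {MAX_INPUT_CHARS}")
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if response_format not in LISTED_FORMATS:
        raise ValueError(f"format not listed in the table: {response_format!r}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed}")
    return {
        "model": model,
        # voice is not currently supported by the device, but the example program
        # passes "alloy", so the same placeholder is kept here.
        "voice": "alloy",
        "input": text,
        "response_format": response_format,
        "speed": speed,
    }


def synthesize(client, text, out_path, **options):
    """Send one request through an existing OpenAI client and save the audio."""
    request = build_speech_request(text, **options)
    with client.audio.speech.with_streaming_response.create(**request) as response:
        response.stream_to_file(out_path)


# Building the request needs no network access, so it can be inspected first.
print(build_speech_request("Hello, welcome to the system", response_format="wav", speed=1.2))
```

With a client created as in the example program, a call such as `synthesize(client, "Hello, welcome to the system", "speech.wav", response_format="wav", speed=1.2)` would write the audio to `speech.wav`. Keeping validation separate from the network call means the checks can be run without a device attached.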