Text-to-Speech

Convert input text into an output audio file through the OpenAI API interface.

Preparation

The example requires the corresponding model package to be installed on the device. See Model List for the model package installation tutorial.

Complete the following steps on the LLM device before running the example program:

  1. Install the llm-model-melotts-en-us model package using the apt package manager:
     apt install llm-model-melotts-en-us
  2. Install the ffmpeg tool:
     apt install ffmpeg
  3. After installation, restart the OpenAI API service so the new model takes effect:
     systemctl restart llm-openai-api

Example

On the PC, use the OpenAI API to send text to the device and convert it to speech. Before running the example program, change the IP address in the base_url below to the actual IP address of your device.

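from pathlib import Path
from openai import OpenAI

# Point the client at the OpenAI API service running on the device.
# Replace 192.168.20.186 with the actual IP address of your device.
client = OpenAI(
    api_key="sk-",
    base_url="http://192.168.20.186:8000/v1"
)

# The generated audio is saved next to this script.
speech_file_path = Path(__file__).parent / "speech.mp3"
with client.audio.speech.with_streaming_response.create(
    model="melotts-en-us",
    voice="alloy",
    input="The quick brown fox jumped over the lazy dog."
) as response:
    # Write the streamed audio bytes to speech_file_path.
    response.stream_to_file(speech_file_path)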

Request Parameters

| Parameter Name | Type | Required | Example Value | Description |
|---|---|---|---|---|
| input | string | yes | "Hello, welcome to the system" | Text to convert to audio; maximum length is 1024 characters |
| model | string | yes | melotts-zh-cn | TTS model to use; available models include melotts-zh-cn and melotts-en-us |
| voice | string | no | alloy | Voice style selection (not currently supported) |
| response_format | string | no | mp3 | Audio output format; supports mp3, opus, aac, flac, wav, pcm, etc. |
| speed | number | no | 1.0 | Speech generation speed; range 0.25–2.0, default 1.0 |
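Because input is capped at 1024 characters and speed is limited to 0.25–2.0, it can help to check parameters on the PC before sending a request and to split longer text into several requests. The sketch below is one way to do that. The helper names are hypothetical, the format set contains only the six formats the table names (the device may accept more), and splitting at sentence ends or spaces is an illustrative choice, not documented device behavior.

```python
MAX_INPUT_CHARS = 1024
# Only the formats named in the table; the device may support others.
SUPPORTED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def validate_speech_params(text, response_format="mp3", speed=1.0):
    """Raise ValueError if a parameter falls outside the documented limits."""
    if not text:
        raise ValueError("input must be a non-empty string")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if not 0.25 <= speed <= 2.0:
        raise ValueError("speed must be between 0.25 and 2.0")


def split_text(text, limit=MAX_INPUT_CHARS):
    """Split text into chunks of at most `limit` characters.

    Prefers to break after '.', '!' or '?' followed by a space, then at the
    last space, and falls back to a hard split when there is no whitespace.
    """
    chunks = []
    text = text.strip()
    while len(text) > limit:
        window = text[:limit]
        cut = max(window.rfind(". "), window.rfind("! "), window.rfind("? "))
        if cut != -1:
            cut += 1  # keep the punctuation with the preceding chunk
        else:
            cut = window.rfind(" ")
            if cut <= 0:
                cut = limit  # no usable whitespace: hard split
        chunks.append(text[:cut].strip())
        text = text[cut:].strip()
    if text:
        chunks.append(text)
    return chunks
```

Each chunk can then be validated and sent as a separate request, producing one audio file per chunk.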

Response Example

  • The returned audio data is saved to the file given by speech_file_path in the example program (speech.mp3 in the same directory as the script).
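When response_format="wav" is requested, the saved file can be checked with Python's standard wave module, without ffmpeg or another decoder. A minimal sketch follows; the function name wav_duration_seconds and the file name speech.wav are hypothetical, and the check does not apply to mp3, opus, aac, flac or raw pcm output.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds (frames / frame rate)."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


if __name__ == "__main__":
    # Hypothetical file produced by a request with response_format="wav".
    print(f"{wav_duration_seconds('speech.wav'):.2f} s")
```

A duration near zero usually means the request returned little or no audio.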