M5Module-LLM Arduino Driver Library API Documentation.
M5ModuleLLM is used to initialize the Module LLM and provides internal members for quick initialization of the individual LLM units, making it easier to build applications according to your needs.
class M5ModuleLLM {
public:
    bool begin(Stream* targetPort);
    bool checkConnection();
    void update();
    m5_module_llm::ApiSys sys;
    m5_module_llm::ApiLlm llm;
    m5_module_llm::ApiVlm vlm;
    m5_module_llm::ApiAudio audio;
    m5_module_llm::ApiCamera camera;
    m5_module_llm::ApiTts tts;
    m5_module_llm::ApiMelotts melotts;
    m5_module_llm::ApiKws kws;
    m5_module_llm::ApiAsr asr;
    m5_module_llm::ApiYolo yolo;
    m5_module_llm::ApiVad vad;
    m5_module_llm::ApiWhisper whisper;
    m5_module_llm::ApiDepthAnything depthanything;
    m5_module_llm::ModuleMsg msg;
    m5_module_llm::ModuleComm comm;
private:
};
Function Prototype:
bool begin(Stream* targetPort);
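Function Description:
Initializes the Module LLM on the given serial port.
Parameters:
Stream* targetPort: serial port connected to the Module LLM.
Return Value:
bool: true if initialization succeeded, false otherwise.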
Function Prototype:
bool checkConnection();
Function Description:
Sends the sys.ping command to check the connection status of the Module LLM.
Parameters:
None.
Return Value:
bool: true if the Module LLM is connected, false otherwise.
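A connection check is usually repeated until the module answers. The following is a minimal sketch of a bounded retry, assuming you would rather give up after a fixed number of attempts than loop forever. retryUntil is a hypothetical helper, not part of the M5Module-LLM library, and it takes any callable so it can be compiled and tried without the hardware.

```cpp
#include <functional>

// Hypothetical helper (not part of the M5Module-LLM library).
// Calls `check` up to `maxAttempts` times and returns true as soon as
// one call succeeds; returns false if every attempt fails.
inline bool retryUntil(const std::function<bool()>& check, int maxAttempts)
{
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        if (check()) {
            return true;
        }
    }
    return false;
}
```

In a sketch with a global M5ModuleLLM module_llm, the check could be written as retryUntil([] { return module_llm.checkConnection(); }, 10), with the failure case reported on the display instead of blocking.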
Function Prototype:
void update();
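Function Description:
Processes the data received from the Module LLM; call it regularly in the main loop. Received responses are cached in msg.responseMsgList.
Parameters:
None.
Return Value:
None.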
The internal member ApiSys sys of M5ModuleLLM is used to control the SYS unit, enabling operations such as system reset.
Function Prototype:
int ping();
Function Description:
Sends the sys.ping command to check the connection status of the Module LLM.
Parameters:
None.
Return Value:
MODULE_LLM_OK / Error Code
Function Prototype:
int reset(bool waitResetFinish = true);
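Function Description:
Sends the sys.reset command to reset the software service.
Parameters:
bool waitResetFinish: whether to wait until the reset has finished (default: true).
Return Value:
MODULE_LLM_OK / Error Code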
Function Prototype:
int reboot();
Function Description:
Sends the sys.reboot command to reboot the system.
Parameters:
None.
Return Value:
MODULE_LLM_OK / Error Code
Note: This function has been deprecated in version 1.3 and later, and is now automatically configured internally.
The internal member ApiAudio audio of M5ModuleLLM is used to control the initialization and configuration of the Audio unit.
Function prototype:
String setup(ApiAudioSetupConfig_t config = ApiAudioSetupConfig_t(), String request_id = "audio_setup");
Function description:
Initializes and configures the Audio unit.
Parameters:
ApiAudioSetupConfig_t config:
struct ApiAudioSetupConfig_t {
    int capcard = 0;
    int capdevice = 0;
    float capVolume = 0.5;
    int playcard = 0;
    int playdevice = 1;
    float playVolume = 0.15;
};
Parameter | Description | Input Values |
---|---|---|
capcard | Microphone sound card index | Default sound card: 0 |
capdevice | Microphone device index | Onboard silicon microphone: 0 |
capVolume | Input volume | 0.0~10.0 (values above 1 apply gain; default is 0.5) |
playcard | Speaker sound card index | Default sound card: 0 |
playdevice | Speaker device index | Onboard speaker: 1 |
playVolume | Output volume | 0.0~10.0 (values above 1 apply gain; default is 0.15) |
Return Value:
audio_work_id: Audio unit work ID
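The table above limits both volumes to the range 0.0~10.0. As a small sketch, assuming the volume comes from user input or a stored setting, a value can be clamped before it is written to capVolume or playVolume. clampVolume is a hypothetical helper, not part of the library.

```cpp
#include <algorithm>

// Hypothetical helper (not part of the M5Module-LLM library).
// Keeps a requested volume inside the documented 0.0~10.0 range.
inline float clampVolume(float requested)
{
    return std::min(10.0f, std::max(0.0f, requested));
}
```

The sketch uses std::min and std::max rather than std::clamp so that it also builds on toolchains that do not enable C++17.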
The internal member ApiCamera camera of M5ModuleLLM is used to control the initialization and configuration of the Camera unit.
Function prototype:
String setup(ApiCameraSetupConfig_t config = ApiCameraSetupConfig_t(), String request_id = "camera_setup");
Function description:
Initializes and configures the Camera unit.
Parameters:
ApiCameraSetupConfig_t config:
struct ApiCameraSetupConfig_t {
    String response_format = "camera.raw";
    String input = "/dev/video0";
    bool enoutput = false;
    int frame_width = 320;
    int frame_height = 320;
};
Parameter | Description | Input Values |
---|---|---|
input | UVC index | "/dev/video0" |
enoutput | Whether to output image data via serial | Enable: true Disable: false |
frame_width | Image width | 320 |
frame_height | Image height | 320 |
Return Value:
camera_work_id: Camera unit work ID
The internal member ApiKws kws of M5ModuleLLM is used to control the initialization and configuration of the KWS unit.
Function Prototype:
```cpp
String setup(ApiKwsSetupConfig_t config = ApiKwsSetupConfig_t(), String request_id = "kws_setup",
             String language = "en_US");
```
Function Description:
Initializes the KWS unit and configures the wake-up keyword.
Parameters:
String request_id: session ID; the default can be used.
ApiKwsSetupConfig_t config: KWS unit initialization configuration:
```cpp
struct ApiKwsSetupConfig_t {
    String kws = "HELLO";
    String model = "sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01";
    String response_format = "kws.bool";
    String input = "sys.pcm";
    bool enoutput = true;
};
```
Parameter | Description | Input Values |
---|---|---|
model | Conversion Model | English Model: "sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01" Chinese Model: "sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01" |
kws | KWS Wake-up Word Text | No mixing of Chinese and English; English must be in uppercase |
enoutput | Enable UART Output | Enable: true Disable: false |
Return Value:
kws_work_id: KWS unit work ID
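The kws row of the table above requires that English wake-up words are upper case and that Chinese and English are not mixed. The following is a minimal sketch of a pre-check for that rule, assuming the keyword is UTF-8 text. Both functions are hypothetical helpers, not part of the library, and they use std::string instead of the Arduino String class so they can be compiled off-device. Treating every non-ASCII byte as "not English" is a simplification.

```cpp
#include <cctype>
#include <string>

// Hypothetical helpers (not part of the M5Module-LLM library).

// Returns true when the keyword contains both ASCII letters and
// non-ASCII bytes (for example UTF-8 encoded Chinese characters).
inline bool mixesEnglishAndNonAscii(const std::string& keyword)
{
    bool hasAsciiLetter = false;
    bool hasNonAscii    = false;
    for (unsigned char c : keyword) {
        if (c >= 0x80) {
            hasNonAscii = true;
        } else if (std::isalpha(c)) {
            hasAsciiLetter = true;
        }
    }
    return hasAsciiLetter && hasNonAscii;
}

// Upper-cases ASCII letters and leaves every other byte untouched.
inline std::string toUpperAscii(std::string keyword)
{
    for (char& c : keyword) {
        const unsigned char u = static_cast<unsigned char>(c);
        if (u < 0x80) {
            c = static_cast<char>(std::toupper(u));
        }
    }
    return keyword;
}
```

A sketch could reject a keyword for which mixesEnglishAndNonAscii returns true and pass the result of toUpperAscii to the kws field otherwise.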
The internal member ApiVad vad of M5ModuleLLM is used to control the initialization and configuration of the VAD unit.
Function Prototype:
String setup(ApiVadSetupConfig_t config = ApiVadSetupConfig_t(), String request_id = "vad_setup");
Function Description:
Initializes and configures the VAD unit.
Parameters:
ApiVadSetupConfig_t config:
struct ApiVadSetupConfig_t {
    String model = "silero-vad";
    String response_format = "vad.bool";
    std::vector<String> input = {"sys.pcm", "kws.1000"};
    bool enoutput = true;
};
Parameter | Description | Input Values |
---|---|---|
model | Conversion Model | Model: "silero-vad" |
input | Input | KWS Wake-up Input: "kws.xxx" (input the KWS unit's work_id) Onboard Microphone Input: "sys.pcm" UART Stream Input: "vad.wav.stream.base64" |
enoutput | Enable UART Output | Enable: true Disable: false |
Return Value:
vad_work_id: VAD unit work ID
The internal member ApiAsr asr of M5ModuleLLM is used to control the initialization and configuration of the ASR unit.
Function Prototype:
String setup(ApiAsrSetupConfig_t config = ApiAsrSetupConfig_t(), String request_id = "asr_setup",
             String language = "en_US");
Function Description:
Initializes and configures the ASR unit.
Input Parameters:
ApiAsrSetupConfig_t config:
struct ApiAsrSetupConfig_t {
    String model = "sherpa-ncnn-streaming-zipformer-20M-2023-02-17";
    String response_format = "asr.utf-8.stream";
    std::vector<String> input = {"sys.pcm", "kws.1000"};
    bool enoutput = true;
    float rule1 = 2.4;
    float rule2 = 1.2;
    float rule3 = 30.0;
};
Parameter | Description | Input Values |
---|---|---|
model | Conversion model | English Model: "sherpa-ncnn-streaming-zipformer-20M-2023-02-17" Chinese Model: "sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23" |
response_format | Output format | Normal output: "asr.utf-8" Stream output: "asr.utf-8.stream" |
input | Input | KWS wake input: "kws.xxx" (input kws unit work_id) Onboard microphone input: "sys.pcm" UART stream input: "asr.wav.stream.base64" |
rule1 | Timeout for unrecognized content wake | Unit: seconds |
rule2 | Maximum recognition interval | Unit: seconds |
rule3 | Maximum recognition timeout | Unit: seconds |
enoutput | Enable UART output | Enable: true Disable: false |
Return Value:
asr_work_id: ASR unit work ID
The internal member ApiWhisper whisper of M5ModuleLLM is used to control the initialization and configuration of the Whisper unit.
Function Prototype:
String setup(ApiWhisperSetupConfig_t config = ApiWhisperSetupConfig_t(), String request_id = "asr_setup",
Function Description:
Initializes and configures the Whisper unit.
Input Parameters:
ApiWhisperSetupConfig_t config:
struct ApiWhisperSetupConfig_t {
    String model = "whisper-tiny";
    String response_format = "asr.utf-8";
    std::vector<String> input = {"sys.pcm", "kws.1000", "vad.1001"};
    String language = "en";
    bool enoutput = true;
};
Parameter | Description | Input Values |
---|---|---|
model | Conversion model | Model: "whisper-tiny" |
response_format | Output format | Normal output: "asr.utf-8" |
input | Input | KWS wake input: "kws.xxx" (input kws unit work_id) Onboard microphone input: "sys.pcm" UART stream input: "asr.wav.stream.base64" |
language | Language used for speech recognition | Default: "en" Optional: "zh", "ja" |
enoutput | Enable UART output | Enable: true Disable: false |
Return Value:
whisper_work_id: Whisper unit work ID
The internal member ApiLlm llm of M5ModuleLLM is used to control the initialization and configuration of the LLM unit.
Function prototype:
String setup(ApiLlmSetupConfig_t config = ApiLlmSetupConfig_t(), String request_id = "llm_setup");
Function Description:
Initializes and configures the LLM unit.
Parameters:
struct ApiLlmSetupConfig_t {
    String prompt;
    String model = "qwen2.5-0.5B-prefill-20e";
    String response_format = "llm.utf-8.stream";
    std::vector<String> input = {"llm.utf-8", "kws.1000"};
    bool enoutput = true;
    int max_token_len = 127;
};
Parameter | Description | Input Values |
---|---|---|
model | Model used for conversion | Predefined model "qwen2.5-0.5B-prefill-20e" |
response_format | Output format | Normal output: "llm.utf-8" Streaming output: "llm.utf-8.stream" |
input | Input format | ASR input: "asr.xxx" (work_id of the ASR unit) UART input: "llm.utf-8" KWS wake-up interruption: "kws.xxx" (work_id of the KWS unit) |
max_token_len | Configures the maximum output token length (maximum returned inference text length) | Maximum value: 1023 |
prompt | Model initialization system prompt | String |
enoutput | Enable UART output | Enable: true Disable: false |
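Return Value:
llm_work_id: LLM unit work ID
Function prototype:
int inference(String work_id, String input, String request_id = "llm_inference");
Function Description:
Sends an inference request to the LLM unit. The result is returned through the responseMsgList container in M5ModuleLLM.msg.
Parameters:
String work_id: work ID of the LLM unit
String input: input text
String request_id: session ID; the default can be used
Return Value:
MODULE_LLM_OK / Error Code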
int inferenceAndWaitResult(String work_id, String input, std::function<void(String&)> onResult, uint32_t timeout = 5000, String request_id = "llm_inference");
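Function Description:
Sends an inference request to the LLM unit and waits for the result, which is passed to the onResult callback.
Parameters:
String work_id: work ID of the LLM unit
String input: input text
std::function<void(String&)> onResult: callback that receives the result
uint32_t timeout: wait timeout (default: 5000)
String request_id: session ID; the default can be used
Return Value:
MODULE_LLM_OK / Error Code
The internal member ApiVlm vlm of M5ModuleLLM is used to control the initialization and configuration of the VLM unit.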
Function prototype:
String setup(ApiVlmSetupConfig_t config = ApiVlmSetupConfig_t(), String request_id = "vlm_setup");
Function Description:
Initializes and configures the VLM unit.
Parameters:
struct ApiVlmSetupConfig_t {
    String prompt;
    String model = "internvl2.5-1B-ax630c";
    String response_format = "vlm.utf-8.stream";
    std::vector<String> input = {"vlm.utf-8", "kws.1000"};
    bool enoutput = true;
    int max_token_len = 1023;
};
Parameter | Description | Input Values |
---|---|---|
model | Model used for conversion | Predefined model "internvl2.5-1B-ax630c" |
response_format | Output format | Normal output: "vlm.utf-8" Streaming output: "vlm.utf-8.stream" |
input | Input format | ASR input: "asr.xxx" (work_id of the ASR unit) UART input: "vlm.utf-8" KWS wake-up interruption: "kws.xxx" (work_id of the KWS unit) |
max_token_len | Configures the maximum output token length (maximum returned inference text length) | Maximum value: 1023 |
prompt | Model initialization system prompt | String |
enoutput | Enable UART output | Enable: true Disable: false |
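Return Value:
vlm_work_id: VLM unit work ID
Function prototype:
int inference(String work_id, String input, String request_id = "vlm_inference");
Function Description:
Sends an inference request to the VLM unit. The result is returned through the responseMsgList container in M5ModuleLLM.msg.
Parameters:
String work_id: work ID of the VLM unit
String input: input text
String request_id: session ID; the default can be used
Return Value:
MODULE_LLM_OK / Error Code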
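int inferenceAndWaitResult(String work_id, String input, std::function<void(String&)> onResult,
                           uint32_t timeout = 5000, String request_id = "vlm_inference");
Function Description:
Sends an inference request to the VLM unit and waits for the result, which is passed to the onResult callback.
Parameters:
String work_id: work ID of the VLM unit
String input: input text
std::function<void(String&)> onResult: callback that receives the result
uint32_t timeout: wait timeout (default: 5000)
String request_id: session ID; the default can be used
Return Value:
MODULE_LLM_OK / Error Code
The internal member ApiTts tts of M5ModuleLLM is used to control the initialization and configuration of the TTS unit.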
Function prototype:
String setup(ApiTtsSetupConfig_t config = ApiTtsSetupConfig_t(), String request_id = "tts_setup");
Function description:
Initializes and configures the TTS unit.
Parameters:
struct ApiTtsSetupConfig_t {
    String model = "single_speaker_english_fast";
    String response_format = "sys.pcm";
    std::vector<String> input = {"tts.utf-8.stream", "kws.1000"};
    bool enoutput = false;
    bool enaudio = true;
};
Parameter | Description | Input values |
---|---|---|
model | Conversion model | English model: "single_speaker_english_fast" Chinese model: "single_speaker_fast" |
input | Input | LLM input: "llm.xxx" (input LLM unit's work_id) UART input: "tts.utf-8" UART stream input: "tts.utf-8.stream" KWS wake-up interrupt: "kws.xxx" (input KWS unit's work_id) |
enoutput | Enable UART output | Enable: true Disable: false |
enaudio | Enable speaker playback | Enable: true Disable: false |
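Return value:
tts_work_id: TTS unit work ID
Function prototype:
int inference(String work_id, String input, uint32_t timeout = 0, String request_id = "tts_inference");
Function description:
Sends text to the TTS unit for speech synthesis.
Parameters:
String work_id: work ID of the TTS unit
String input: text to synthesize
Return value:
MODULE_LLM_OK / Error Code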
The internal member ApiMelotts melotts of M5ModuleLLM is used to control the initialization and configuration of the Melotts unit.
Function prototype:
String setup(ApiMelottsSetupConfig_t config = ApiMelottsSetupConfig_t(), String request_id = "melotts_setup",
             String language = "en_US");
Function description:
Initializes and configures the Melotts unit.
Parameters:
struct ApiMelottsSetupConfig_t {
    String model = "melotts_zh-cn";
    String response_format = "sys.pcm";
    std::vector<String> input = {"tts.utf-8.stream"};
    bool enoutput = false;
    bool enaudio = true;
};
Parameter | Description | Input values |
---|---|---|
model | Conversion model | Chinese and English model: "melotts_zh-cn" Chinese model: "single_speaker_fast" |
input | Input | LLM input: "llm.xxx" (input LLM unit's work_id) UART input: "melotts.utf-8" UART stream input: "melotts.utf-8.stream" |
enoutput | Enable UART output | Enable: true Disable: false |
enaudio | Enable speaker playback | Enable: true Disable: false |
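Return value:
melotts_work_id: Melotts unit work ID
Function prototype:
int inference(String work_id, String input, uint32_t timeout = 0, String request_id = "tts_inference");
Function description:
Sends text to the Melotts unit for speech synthesis.
Parameters:
String work_id: work ID of the Melotts unit
String input: text to synthesize
Return value:
MODULE_LLM_OK / Error Code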
The internal member ApiYolo yolo of M5ModuleLLM is used to control the initialization and configuration of the Yolo unit.
Function prototype:
String setup(ApiYoloSetupConfig_t config = ApiYoloSetupConfig_t(), String request_id = "yolo_setup");
Function description:
Initializes and configures the Yolo unit.
Parameters:
struct ApiYoloSetupConfig_t {
    String model = "yolo11n";
    String response_format = "yolo.box.stream";
    std::vector<String> input = {"yolo.jpeg.base64"};
    bool enoutput = true;
};
Parameter | Description | Input values |
---|---|---|
model | Conversion model | Detection model: "yolo11n" Pose model: "yolo11n-pose" Hand pose model: "yolo11n-hand-pose" |
response_format | Output format | Detection output: "yolo.box.stream" Pose output: "yolo.pose.stream" |
input | Input | UVC input: "camera.xxx" (input camera unit's work_id) UART stream input: "yolo.jpeg.base64.stream" |
enoutput | Enable UART output | Enable: true Disable: false |
Return value:
yolo_work_id: Yolo unit work ID
The internal member ModuleMsg msg of M5ModuleLLM provides a container, responseMsgList, that caches the messages returned from the Module LLM. In the following example, the main loop iterates over the container to retrieve the results.
void loop()
{
    module_llm.update();
    // Handle response msg
    for (auto& msg : module_llm.msg.responseMsgList) {
        // KWS msg
        if (msg.work_id == kws_work_id) {
            Serial.printf(">> Keyword detected\n");
        }
        // ASR msg
        if (msg.work_id == asr_work_id) {
            if (msg.object == "asr.utf-8.stream") {
                // Parse and get asr result
                JsonDocument doc;
                deserializeJson(doc, msg.raw_msg);
                String asr_result = doc["data"]["delta"].as<String>();
                Serial.printf(">> %s\n", asr_result.c_str());
            }
        }
    }
    // Clear the handled messages
    module_llm.msg.responseMsgList.clear();
}
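The loop above compares each message's work_id with an ID stored at setup time. As a complementary sketch, assuming work IDs keep the unit.number form that appears in this document (kws.1000, vad.1001), the unit prefix can be extracted so that messages are grouped by unit. workIdUnit is a hypothetical helper, not part of the library, and it uses std::string so it can be compiled off-device.

```cpp
#include <string>

// Hypothetical helper (not part of the M5Module-LLM library).
// Returns the unit prefix of a work_id such as "kws.1000".
// Assumes the "<unit>.<number>" form; if there is no '.', the whole
// string is returned unchanged.
inline std::string workIdUnit(const std::string& workId)
{
    const std::size_t dot = workId.find('.');
    return dot == std::string::npos ? workId : workId.substr(0, dot);
}
```

Comparing against the stored work_id, as the example does, remains the more precise check when several units of the same type are running.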
M5ModuleLLM_VoiceAssistant is used to quickly create an LLM voice assistant instance that chains KWS (keyword spotting) -> ASR (speech-to-text) -> LLM (large model inference) -> TTS (text-to-speech).
Pass an M5ModuleLLM instance to the constructor and register the corresponding event callback functions to complete the voice assistant setup.
/*
 * SPDX-FileCopyrightText: 2024 M5Stack Technology CO LTD
 *
 * SPDX-License-Identifier: MIT
 */
#include <Arduino.h>
#include <M5Unified.h>
#include <M5ModuleLLM.h>
M5ModuleLLM module_llm;
M5ModuleLLM_VoiceAssistant voice_assistant(&module_llm);
/* On ASR data callback */
void on_asr_data_input(String data, bool isFinish, int index)
{
    M5.Display.setTextColor(TFT_GREEN, TFT_BLACK);
    M5.Display.printf(">> %s\n", data.c_str());
    /* If ASR data is finished */
    if (isFinish) {
        M5.Display.setTextColor(TFT_YELLOW, TFT_BLACK);
        M5.Display.print(">> ");
    }
}
/* On LLM data callback */
void on_llm_data_input(String data, bool isFinish, int index)
{
    M5.Display.print(data);
    /* If LLM data is finished */
    if (isFinish) {
        M5.Display.print("\n");
    }
}
void setup()
{
    M5.begin();
    M5.Display.setTextSize(2);
    M5.Display.setTextScroll(true);
    /* Init module serial port */
    Serial2.begin(115200, SERIAL_8N1, 16, 17);  // Basic
    // Serial2.begin(115200, SERIAL_8N1, 13, 14);  // Core2
    // Serial2.begin(115200, SERIAL_8N1, 18, 17);  // CoreS3
    /* Init module */
    module_llm.begin(&Serial2);
    /* Make sure module is connected */
    M5.Display.printf(">> Check ModuleLLM connection..\n");
    while (1) {
        if (module_llm.checkConnection()) {
            break;
        }
    }
    /* Begin voice assistant preset */
    M5.Display.printf(">> Begin voice assistant..\n");
    int ret = voice_assistant.begin("HELLO");
    if (ret != MODULE_LLM_OK) {
        while (1) {
            M5.Display.setTextColor(TFT_RED);
            M5.Display.printf(">> Begin voice assistant failed\n");
            delay(1000);  // Avoid flooding the display while halted
        }
    }
    /* Register on ASR data callback function */
    voice_assistant.onAsrDataInput(on_asr_data_input);
    /* Register on LLM data callback function */
    voice_assistant.onLlmDataInput(on_llm_data_input);
    M5.Display.printf(">> Voice assistant ready\n");
}
void loop()
{
    /* Keep voice assistant preset update */
    voice_assistant.update();
}
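The example above selects the serial pins by commenting lines in and out. A minimal sketch of the same three pin pairs as a lookup is given below, on the assumption that the third and fourth arguments of Serial2.begin are the RX and TX pins, as in the ESP32 Arduino core. HostBoard, SerialPins and moduleSerialPins are hypothetical names, not part of the library.

```cpp
// Hypothetical lookup (not part of the M5Module-LLM library).
// Pin numbers are taken from the Serial2.begin() calls in the example.
enum class HostBoard { Basic, Core2, CoreS3 };

struct SerialPins {
    int rx;
    int tx;
};

inline SerialPins moduleSerialPins(HostBoard board)
{
    switch (board) {
        case HostBoard::Core2:
            return {13, 14};
        case HostBoard::CoreS3:
            return {18, 17};
        case HostBoard::Basic:
        default:
            return {16, 17};
    }
}
```

In a sketch, the result could be passed on as Serial2.begin(115200, SERIAL_8N1, pins.rx, pins.tx).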
enum ModuleLLMErrorCode_t {
    MODULE_LLM_OK = 0,
    MODULE_LLM_RESET_WARN = -1,
    MODULE_LLM_JSON_FORMAT_ERROR = -2,
    MODULE_LLM_ACTION_MATCH_FAILED = -3,
    MODULE_LLM_INFERENCE_DATA_PUSH_FAILED = -4,
    MODULE_LLM_MODEL_LOADING_FAILED = -5,
    MODULE_LLM_UNIT_NOT_EXIST = -6,
    MODULE_LLM_UNKNOWN_OPERATION = -7,
    MODULE_LLM_UNIT_RESOURCE_ALLOCATION_FAILED = -8,
    MODULE_LLM_UNIT_CALL_FAILED = -9,
    MODULE_LLM_MODEL_INIT_FAILED = -10,
    MODULE_LLM_MODEL_RUN_FAILED = -11,
    MODULE_LLM_MODULE_NOT_INITIALISED = -12,
    MODULE_LLM_MODULE_ALREADY_WORKING = -13,
    MODULE_LLM_MODULE_NOT_WORKING = -14,
    MODULE_LLM_NO_UPDATEABLE_MODULES = -15,
    MODULE_LLM_NO_MODULES_AVAILABLE_FOR_UPDATE = -16,
    MODULE_LLM_FILE_OPEN_FAILED = -17,
    MODULE_LLM_WAIT_RESPONSE_TIMEOUT = -97,
    MODULE_LLM_RESPONSE_PARSE_FAILED = -98,
    MODULE_LLM_ERROR_NONE = -99,
};