The world of edge AI is rapidly evolving, and M5Stack has made it incredibly accessible with their CoreS3 SE and LLM Module Kit combination. In this comprehensive tutorial, we’ll walk through creating a fully functional offline AI voice assistant that can understand speech, process natural language, and respond with synthesized speech—all without requiring an internet connection.
What You’ll Need
Hardware Components
- M5Stack CoreS3 SE IoT Controller ($38.90) – A lightweight version of the CoreS3 featuring ESP32-S3, 2.0″ capacitive touch screen, and built-in speaker/microphone
- M5Stack LLM Module Kit ($49.90) – Includes the LLM module with AX630C processor and LLM Mate module for connectivity
Software Requirements
- Arduino IDE
- M5Stack LLM library
- USB-C cable for programming
Understanding the Hardware
M5Stack CoreS3 SE Features
The CoreS3 SE is a streamlined version of the popular CoreS3, designed specifically for IoT applications. Key specifications include:
- Processor: ESP32-S3 dual-core Xtensa LX7 @ 240MHz
- Memory: 16MB Flash, 8MB PSRAM
- Display: 2.0″ capacitive touch IPS screen (320×240)
- Audio: Built-in 1W speaker and dual microphones
- Connectivity: WiFi, USB-C with OTG support
- Power Management: AXP2101 chip for efficient power consumption
Compared to the full CoreS3, the SE version omits the camera, proximity sensor, IMU, and compass to focus on core functionality at a lower price point.
LLM Module Kit Capabilities
The LLM Module Kit is where the AI magic happens:
- Processor: AX630C SoC with dual Cortex-A53 @ 1.2GHz
- AI Performance: 3.2 TOPS @ INT8 precision
- Memory: 4GB LPDDR4 (1GB system + 3GB for AI acceleration)
- Storage: 32GB eMMC with pre-installed Ubuntu system
- AI Functions: KWS (wake word), ASR (speech recognition), LLM (language model), TTS (text-to-speech)
- Power Consumption: Only ~1.5W at full load
Setting Up the Development Environment
Step 1: Arduino IDE Configuration
- Install the latest Arduino IDE
- Add the M5Stack board package to your board manager
- Select “M5Stack CoreS3” as your target board
- Install the M5Stack LLM library from the Library Manager
Step 2: Library Installation
Search for and install the “M5Module LLM” library. This provides all the necessary functions to communicate with the LLM module and handle AI operations.
The Complete Voice Assistant Code
Here’s the actual working code for our voice assistant application, taken directly from the M5Stack examples:
/*
 * SPDX-FileCopyrightText: 2024 M5Stack Technology CO LTD
 *
 * SPDX-License-Identifier: MIT
 */
#include <Arduino.h>
#include <M5Unified.h>
#include <M5ModuleLLM.h>

M5ModuleLLM module_llm;
M5ModuleLLM_VoiceAssistant voice_assistant(&module_llm);

/* On ASR data callback */
void on_asr_data_input(String data, bool isFinish, int index)
{
    M5.Display.setTextColor(TFT_GREEN, TFT_BLACK);
    // M5.Display.setFont(&fonts::efontCN_12);  // Support Chinese display
    M5.Display.printf(">> %s\n", data.c_str());

    /* If ASR data is finish */
    if (isFinish) {
        M5.Display.setTextColor(TFT_YELLOW, TFT_BLACK);
        M5.Display.print(">> ");
    }
}

/* On LLM data callback */
void on_llm_data_input(String data, bool isFinish, int index)
{
    M5.Display.print(data);

    /* If LLM data is finish */
    if (isFinish) {
        M5.Display.print("\n");
    }
}

void setup()
{
    M5.begin();
    M5.Display.setTextSize(2);
    M5.Display.setTextScroll(true);

    /* Init module serial port */
    // int rxd = 16, txd = 17;  // Basic
    // int rxd = 13, txd = 14;  // Core2
    // int rxd = 18, txd = 17;  // CoreS3
    int rxd = M5.getPin(m5::pin_name_t::port_c_rxd);
    int txd = M5.getPin(m5::pin_name_t::port_c_txd);
    Serial2.begin(115200, SERIAL_8N1, rxd, txd);

    /* Init module */
    module_llm.begin(&Serial2);

    /* Make sure module is connected */
    M5.Display.printf(">> Check ModuleLLM connection..\n");
    while (1) {
        if (module_llm.checkConnection()) {
            break;
        }
    }

    /* Begin voice assistant preset */
    M5.Display.printf(">> Begin voice assistant..\n");
    int ret = voice_assistant.begin("HELLO");
    // int ret = voice_assistant.begin("你好你好", "", "zh_CN");  // Chinese kws and asr
    if (ret != MODULE_LLM_OK) {
        while (1) {
            M5.Display.setTextColor(TFT_RED);
            M5.Display.printf(">> Begin voice assistant failed\n");
        }
    }

    /* Register on ASR data callback function */
    voice_assistant.onAsrDataInput(on_asr_data_input);

    /* Register on LLM data callback function */
    voice_assistant.onLlmDataInput(on_llm_data_input);

    M5.Display.printf(">> Voice assistant ready\n");
}

void loop()
{
    /* Keep voice assistant preset update */
    voice_assistant.update();
}
Code Breakdown and Explanation
Library Includes and Object Creation
#include <Arduino.h>
#include <M5Unified.h>
#include <M5ModuleLLM.h>
M5ModuleLLM module_llm;
M5ModuleLLM_VoiceAssistant voice_assistant(&module_llm);
The code starts by including the necessary libraries:
- M5Unified.h: Provides unified access to all M5Stack hardware features
- M5ModuleLLM.h: Contains the LLM module communication and AI functionality

Two main objects are created:

- module_llm: Handles low-level communication with the LLM module
- voice_assistant: Provides high-level voice assistant functionality
ASR (Speech Recognition) Callback Function
void on_asr_data_input(String data, bool isFinish, int index)
{
    M5.Display.setTextColor(TFT_GREEN, TFT_BLACK);
    M5.Display.printf(">> %s\n", data.c_str());

    if (isFinish) {
        M5.Display.setTextColor(TFT_YELLOW, TFT_BLACK);
        M5.Display.print(">> ");
    }
}
This callback function is triggered whenever the speech recognition system processes audio input:
- Real-time Display: Shows recognized text in green as it’s being processed
- Progressive Updates: The isFinish parameter indicates when speech recognition is complete
- Visual Feedback: Changes text color to yellow when ready for the next input
- User Experience: Provides immediate visual feedback of what the system “heard”
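As a simple extension, you can mirror each recognized fragment to the USB serial monitor while keeping the on-screen behavior unchanged. This is a minimal sketch, assuming M5.begin() has already initialized the default Serial port (M5Unified does this by default):

/* Hedged sketch: mirror ASR text to the serial monitor for debugging */
void on_asr_data_input(String data, bool isFinish, int index)
{
    M5.Display.setTextColor(TFT_GREEN, TFT_BLACK);
    M5.Display.printf(">> %s\n", data.c_str());
    Serial.printf("[ASR%s] %s\n", isFinish ? " final" : "", data.c_str());

    if (isFinish) {
        M5.Display.setTextColor(TFT_YELLOW, TFT_BLACK);
        M5.Display.print(">> ");
    }
}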
LLM (Language Model) Response Callback
void on_llm_data_input(String data, bool isFinish, int index)
{
    M5.Display.print(data);

    if (isFinish) {
        M5.Display.print("\n");
    }
}
This callback handles the AI’s response generation:
- Streaming Output: Displays the AI response as it’s being generated (token by token)
- Natural Flow: Creates a typewriter effect showing the AI “thinking” and responding
- Completion Handling: Adds a newline when the response is complete
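If you want a rough sense of response length, the callback can also tally characters across fragments. A minimal sketch, assuming each invocation delivers one streamed fragment:

/* Hedged sketch: count streamed characters and show a total at the end */
void on_llm_data_input(String data, bool isFinish, int index)
{
    static size_t chars = 0;  // persists across callback invocations
    chars += data.length();
    M5.Display.print(data);

    if (isFinish) {
        M5.Display.printf("\n(%u chars)\n", (unsigned)chars);
        chars = 0;  // reset for the next response
    }
}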
Setup Function – Hardware Initialization
void setup()
{
    M5.begin();
    M5.Display.setTextSize(2);
    M5.Display.setTextScroll(true);
The setup begins with basic hardware initialization:
- M5.begin(): Initializes all M5Stack hardware components
- setTextSize(2): Sets readable text size for the 2″ display
- setTextScroll(true): Enables automatic scrolling when text fills the screen
Serial Communication Setup
int rxd = M5.getPin(m5::pin_name_t::port_c_rxd);
int txd = M5.getPin(m5::pin_name_t::port_c_txd);
Serial2.begin(115200, SERIAL_8N1, rxd, txd);
This section configures the serial communication between CoreS3 and the LLM module:
- Dynamic Pin Assignment: Uses M5.getPin() to automatically get the correct pins for different M5Stack models
- Port C Connection: The LLM module connects via the Port C interface
- Standard Settings: 115200 baud rate with 8 data bits, no parity, 1 stop bit
- Hardware Flexibility: The commented lines show pin configurations for different M5Stack models
Module Connection and Verification
module_llm.begin(&Serial2);

M5.Display.printf(">> Check ModuleLLM connection..\n");
while (1) {
    if (module_llm.checkConnection()) {
        break;
    }
}
This critical section ensures reliable communication:
- Module Initialization: Starts the LLM module communication
- Connection Verification: Continuously checks until the module responds
- User Feedback: Displays connection status on screen
- Robust Startup: Won’t proceed until connection is established
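One caveat: the example polls in a tight loop. A short delay between checks is gentler on the UART while the module boots, and a timeout-based variant appears under "Enhanced Error Handling" later in this article. A minimal sketch:

/* Hedged sketch: poll every 500 ms instead of as fast as possible */
while (!module_llm.checkConnection()) {
    delay(500);  // give the module time to boot between checks
}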
Voice Assistant Configuration
M5.Display.printf(">> Begin voice assistant..\n");
int ret = voice_assistant.begin("HELLO");
// int ret = voice_assistant.begin("你好你好", "", "zh_CN"); // Chinese kws and asr
if (ret != MODULE_LLM_OK) {
while (1) {
M5.Display.setTextColor(TFT_RED);
M5.Display.printf(">> Begin voice assistant failed\n");
}
}
The voice assistant initialization includes:
- Wake Word Setup: Configures “HELLO” as the activation phrase
- Language Support: Shows how to enable Chinese wake words and ASR
- Error Handling: Displays failure message if initialization fails
- System Reliability: Prevents operation if the assistant can’t start properly
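Note that the failure branch re-prints its message forever, which quickly fills the scrolling display. A hedged alternative prints the error once and then halts:

/* Hedged sketch: report the failure once, then stop */
if (ret != MODULE_LLM_OK) {
    M5.Display.setTextColor(TFT_RED);
    M5.Display.printf(">> Begin voice assistant failed\n");
    while (true) {
        delay(1000);  // halt here; power-cycle after reseating the module
    }
}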
Callback Registration
voice_assistant.onAsrDataInput(on_asr_data_input);
voice_assistant.onLlmDataInput(on_llm_data_input);
This registers our callback functions:
- Event-Driven Architecture: The system calls these functions when events occur
- Separation of Concerns: Display logic is separated from AI processing
- Customizable Responses: Easy to modify how the system responds to events
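Because registration simply takes a function with the matching signature, a capture-less lambda also works for quick experiments. A sketch, assuming the library accepts anything convertible to its callback type:

/* Hedged sketch: inline callback via a capture-less lambda */
voice_assistant.onAsrDataInput([](String data, bool isFinish, int index) {
    M5.Display.printf(">> %s\n", data.c_str());  // minimal display-only handler
});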
Main Loop – Continuous Operation
void loop()
{
    voice_assistant.update();
}
The main loop is elegantly simple:
- Single Responsibility: Only needs to update the voice assistant
- Internal Processing: All AI operations happen inside the update() function
- Efficient Design: Minimal overhead in the main loop
- Event Handling: Callbacks handle all user interactions and responses
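If your application later needs touch or button input alongside the assistant, the loop can stay lean. A minimal sketch, assuming you want a tap on the screen to clear the conversation log:

void loop()
{
    M5.update();               // refresh touch/button state
    voice_assistant.update();  // service KWS/ASR/LLM/TTS events

    /* Hypothetical extra: tap the screen to clear the log */
    if (M5.Touch.getCount() > 0) {
        M5.Display.clear();
        M5.Display.setCursor(0, 0);
    }
}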
How the Complete System Works
1. Initialization Sequence
- Hardware Setup: CoreS3 initializes display, audio, and communication systems
- Module Connection: Establishes serial communication with LLM module
- AI Loading: LLM module loads the Qwen2.5-0.5B model into memory
- Voice Assistant Start: Configures wake word detection and callback functions
2. Runtime Operation
- Continuous Listening: System monitors for the “HELLO” wake word
- Speech Capture: When activated, captures and processes audio input
- Real-time Display: Shows recognized speech in green text as it’s processed
- AI Processing: Sends recognized text to the language model
- Response Generation: AI generates response and streams it to display
- Audio Output: Text-to-speech converts response to audio
3. User Interaction Flow
User says "Hello" → Wake word detected → System activates
User speaks command → ASR processes → Text displayed in green
AI processes request → LLM generates response → Text streams to display
TTS speaks response → System returns to listening state
Compilation and Deployment
Building the Project
- Open Arduino IDE and load the voice assistant example
- Select “M5Stack CoreS3” as your board
- Choose the correct COM port when CoreS3 is connected
- Click “Upload” to compile and flash the firmware
- Compilation typically takes 1-2 minutes depending on your system
Hardware Assembly
- Ensure both devices are powered off
- Align the LLM module with the CoreS3 SE's M-Bus connector
- Press firmly until the modules click together
- Power on the system – the LLM module’s LED should turn green
Testing Your AI Assistant
Startup Sequence
When you power on the system, you’ll see:
- “Check ModuleLLM connection..” – Establishing communication
- “Begin voice assistant..” – Loading AI models
- “Voice assistant ready” – System ready for use
Example Interactions
Based on the actual demo, here are real interactions you can try:
Basic Conversation:
- Say: “Hello”
- Response: “Hi, how can I help you today?”
Identity Questions:
- Say: “Hello, what is your name?”
- Response: “I’m a large language model created by Qwen”
Capability Inquiry:
- Say: “Hello, what can you do?”
- Response: Detailed explanation of language model capabilities including translation, question answering, and text generation
Translation Requests:
- Say: “Hello, translate ‘How are you?’ to Spanish”
- Response: “¿Cómo estás?”
Language Support:
- Say: “Hello, do you know Spanish?”
- Response: “Sí, conozco español” (Yes, I know Spanish)
Advanced Features and Customization
Multi-Language Support
The code includes commented lines showing Chinese language support:
// int ret = voice_assistant.begin("你好你好", "", "zh_CN"); // Chinese kws and asr
This demonstrates how to:
- Set Chinese wake words (“你好你好” – “Hello Hello”)
- Configure Chinese ASR (Automatic Speech Recognition)
- Support multiple languages in the same application
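Putting the two commented options together (the wake word/locale here, plus the display font from the next subsection), a Chinese-language build might look like this. This is a hedged sketch: the empty middle argument follows the example's usage, and the exact begin() parameters should be verified against the M5ModuleLLM documentation:

/* Hedged sketch: Chinese wake word, ASR locale, and display font */
M5.Display.setFont(&fonts::efontCN_12);                    // Chinese-capable font
int ret = voice_assistant.begin("你好你好", "", "zh_CN");  // Chinese KWS and ASR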
Display Customization
The code shows how to enable Chinese character display:
// M5.Display.setFont(&fonts::efontCN_12); // Support Chinese display
Model Management
The LLM module supports multiple AI models through its apt-based package system:
- Language Models: Qwen2.5-0.5B, Qwen2.5-1.5B, Llama-3.2-1B
- Vision Models: InternVL2_5-1B-MPO, YOLO11
- Speech Models: Whisper-tiny, Whisper-base, MeloTTS
Performance Characteristics
Power Consumption
Based on the specifications:
- Standby Mode: 104.64µA @ 4.2V (battery powered)
- Active Mode: 166.27mA @ 5V (USB powered)
- AI Processing: ~1.5W during inference
Response Times
From the demo video, typical response times are:
- Wake Word Detection: Instant recognition
- Speech Recognition: 1-2 seconds for short phrases
- AI Processing: 2-4 seconds for simple queries
- Complex Responses: 5-10 seconds for detailed explanations
Memory Usage
The system efficiently manages its 4GB of memory:
- System Operations: 1GB reserved for Ubuntu and basic functions
- AI Acceleration: 3GB dedicated to model inference and caching
- Model Storage: 32GB eMMC holds multiple models and system files
Troubleshooting Common Issues
Connection Problems
If you see “Check ModuleLLM connection..” stuck on screen:
- Verify proper module stacking alignment
- Check that both devices are powered
- Ensure the LLM module’s green LED is illuminated
- Try reseating the connection
Compilation Errors
Common Arduino IDE issues:
// If you get pin definition errors, manually set pins:
int rxd = 18, txd = 17; // For CoreS3
// int rxd = 13, txd = 14; // For Core2
// int rxd = 16, txd = 17; // For Basic
Performance Optimization
For better response times:
- Allow 30-60 seconds for full system initialization
- Speak clearly and at moderate pace
- Keep queries concise for faster processing
- Ensure stable power supply (USB recommended for development)
Audio Issues
If speech recognition isn’t working:
- Check that the built-in microphones aren’t blocked
- Ensure you’re saying “HELLO” clearly as the wake word
- Verify the speaker is producing audio output
- Test in a quiet environment initially
Real-World Applications
Smart Home Integration
The voice assistant can be extended for home automation:
// Example: Custom command handling
void on_asr_data_input(String data, bool isFinish, int index) {
    if (isFinish) {
        if (data.indexOf("turn on lights") >= 0) {
            // Add your smart home control code here
            controlLights(true);
        }
        // Continue with normal display
        M5.Display.setTextColor(TFT_GREEN, TFT_BLACK);
        M5.Display.printf(">> %s\n", data.c_str());
    }
}
Educational Projects
Perfect for teaching AI concepts:
- Machine Learning: Demonstrate real-time inference
- Natural Language Processing: Show speech-to-text and text-to-speech
- Edge Computing: Explain offline AI processing
- IoT Development: Integrate with sensors and actuators
Industrial Applications
Suitable for environments requiring:
- Offline Operation: No internet dependency
- Privacy Protection: All processing happens locally
- Low Latency: Immediate response without cloud delays
- Reliability: Consistent performance without network issues
Prototyping Platform
Ideal for rapid development:
- Quick Iteration: Fast compile and deploy cycle
- Modular Design: Easy to add sensors and actuators
- Scalable Architecture: Can grow from prototype to production
- Community Support: Extensive documentation and examples
Advanced Code Modifications
Custom Wake Words
To change the wake word, modify the setup function:
// Change from "HELLO" to custom wake word
int ret = voice_assistant.begin("JARVIS"); // Custom wake word
Enhanced Error Handling
Add more robust error checking:
void setup() {
    // ... existing setup code ...

    // Enhanced connection checking with timeout
    int connection_attempts = 0;
    M5.Display.printf(">> Check ModuleLLM connection..\n");
    while (connection_attempts < 30) {  // 30 second timeout
        if (module_llm.checkConnection()) {
            break;
        }
        delay(1000);
        connection_attempts++;
        M5.Display.printf("Attempt %d/30\n", connection_attempts);
    }
    if (connection_attempts >= 30) {
        M5.Display.setTextColor(TFT_RED);
        M5.Display.printf(">> Connection failed after 30 seconds\n");
        while (1) delay(1000);  // Stop execution
    }
}
Custom Response Processing
Add intelligence to response handling:
void on_llm_data_input(String data, bool isFinish, int index) {
    // Store complete responses for further processing
    static String complete_response = "";
    complete_response += data;
    M5.Display.print(data);

    if (isFinish) {
        M5.Display.print("\n");

        // Process complete response
        if (complete_response.indexOf("temperature") >= 0) {
            // Trigger temperature sensor reading
            displayTemperature();
        }
        complete_response = "";  // Reset for next response
    }
}
Future Enhancements
Model Upgrades
The system supports model updates through:
- APT Package Manager: apt update && apt upgrade on the LLM module
- SD Card Updates: Load new firmware via microSD card
- OTA Updates: Over-the-air updates via WiFi
Integration Possibilities
Extend functionality with additional M5Stack modules:
- Environmental Sensors: Add temperature, humidity, air quality monitoring
- Camera Module: Enable visual AI capabilities
- GPS Module: Location-aware responses
- LoRaWAN Module: Long-range communication for remote deployments
API Integration
Enable OpenAI API compatibility:
# On the LLM module's Ubuntu system
apt install openai-api-plugin
This allows integration with existing OpenAI-compatible applications and services.
Conclusion
The M5Stack CoreS3 SE and LLM Module Kit combination represents a remarkable achievement in making advanced AI accessible to developers and hobbyists. The provided code demonstrates how sophisticated AI capabilities can be implemented with minimal complexity, thanks to M5Stack’s well-designed libraries and hardware integration.
Key takeaways from this tutorial:
Technical Excellence
- Efficient Architecture: The callback-based design ensures responsive user interaction
- Robust Communication: Serial communication with proper error handling
- Modular Design: Easy to extend and customize for specific applications
Practical Benefits
- Complete Offline Operation: No cloud dependency ensures privacy and reliability
- Low Power Consumption: Suitable for battery-powered applications
- Rapid Development: From concept to working prototype in minutes
- Professional Results: Production-quality voice interaction
Educational Value
- Real AI Implementation: Hands-on experience with modern AI technologies
- Hardware Integration: Learn how software and hardware work together
- Scalable Learning: Start simple, add complexity as skills develop
The future of AI is moving to the edge, and this tutorial shows how accessible that future has become. Whether you’re building smart home devices, educational tools, industrial applications, or just exploring AI capabilities, the M5Stack ecosystem provides a solid foundation for innovation.
Start with the provided code, experiment with modifications, and gradually build more sophisticated applications. The combination of powerful hardware, comprehensive software libraries, and extensive community support makes this an ideal platform for both learning and professional development.
Ready to build your own AI applications? The CoreS3 SE and LLM Module Kit are available from the M5Stack store, complete with documentation, examples, and community support to help you succeed in your AI journey.