
Deploy Smart Speaker System on Raspberry Pi Using Simulink

This example demonstrates how to deploy a smart speaker system on Raspberry Pi® using Simulink®. A smart speaker is a speaker that you control with your voice. You run the smart speaker Simulink model on Raspberry Pi in External Mode. The model captures voice commands through a USB microphone connected to your Raspberry Pi board. Optionally, you can supply the voice commands from pre-recorded audio files instead. The smart speaker model plays the audio on the speaker connected to the Raspberry Pi. You make the smart speaker play music with the command "Go" and stop playing music with the command "Stop". You increase or decrease the music volume with the commands "Up" and "Down", respectively. For details about modeling the various modules used in the smart speaker model, see Model Smart Speaker in Simulink.

Smart Speaker Model

The model can be divided into four sub-modules that perform four sub-tasks:

  1. Capture 16-bit speech samples and convert them to single precision format in the range [-1,1)

  2. Recognize speech commands

  3. Prepare audio frame based on the recognized speech commands

  4. Convert audio samples to 16-bit signed integer format and play the audio on Raspberry Pi

modelName = "AudioSmartSpeakerOnRaspberryPi";
open_system(modelName)

Configure Audio I/O Blocks

The smart speaker model uses the ALSA Audio Capture (Simulink) block to capture the voice commands from a microphone connected to your Raspberry Pi board. The model uses the ALSA Audio Playback (Simulink) block to play the audio on a speaker connected to your Raspberry Pi board. The ALSA Audio IO blocks come with Simulink Support Package for Raspberry Pi Hardware. After connecting the microphone and speaker to your Raspberry Pi board, you list the audio capture and audio playback devices using listAudioDevices (Simulink).

r = raspi("raspiname","pi","password");
audioCaptureDevicesList = listAudioDevices(r,"capture");
audioPlaybackDevicesList = listAudioDevices(r,"playback");

You set the Device name in the ALSA Audio Capture:Block Parameters dialog to the device of your choice from audioCaptureDevicesList. Similarly, you configure the Device name in the ALSA Audio Playback:Block Parameters dialog to the playback device of your choice from audioPlaybackDevicesList.

Display the details of an audio capture and audio playback device from audioCaptureDevicesList and audioPlaybackDevicesList.

audioCaptureDevicesList(1)
ans =
   struct with fields:
             Name: 'USB-Audio-LogitechUSBHeadsetH340-LogitechInc.LogitechUSBHeadsetH340atusb-0000:01:00.0-1.2,fullspeed'
           Device: '2,0'
         Channels: {'2'}
         BitDepth: {'16-bit integer'}
     SamplingRate: {'44100'}
audioPlaybackDevicesList(3)
ans =
   struct with fields:
             Name: 'USB-Audio-LogitechUSBHeadsetH340-LogitechInc.LogitechUSBHeadsetH340atusb-0000:01:00.0-1.2,fullspeed'
           Device: '2,0'
         Channels: {'2'}
         BitDepth: {'16-bit integer'}
     SamplingRate: {'44100'}

To use the above devices, you set the Device name in both the ALSA Audio Capture:Block Parameters and ALSA Audio Playback:Block Parameters dialogs to plughw:2,0. You set the Audio sampling frequency (Hz) to 16000 because the convolutional neural network (CNN) that recognizes the voice commands was trained on audio sampled at 16000 Hz.
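The device name above combines the plughw: prefix with the Device field shown in the listing. The following minimal sketch illustrates that mapping. The struct dev is a hypothetical stand-in for one entry of audioCaptureDevicesList, with only the Device field copied from the output above, so that the snippet runs without a board attached.

```matlab
% Hypothetical stand-in for audioCaptureDevicesList(1); only the Device
% field is reproduced here, with the value shown in the listing above.
dev = struct('Device','2,0');

% Build the ALSA device name: the "plughw:" prefix followed by the
% card,device pair reported in the Device field.
alsaDeviceName = "plughw:" + dev.Device;

disp(alsaDeviceName)
```

With a board connected, you would replace dev with the entry you picked from audioCaptureDevicesList or audioPlaybackDevicesList.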

The model provides a manual switch to change the audio source from the microphone to the pre-recorded audio files. You select the voice command using the Rotary switch. The model uses four Audio File Read (Simulink) blocks to read the audio files go.wav, stop.wav, up.wav, and down.wav. Note that the Audio File Read (Simulink) block is included in Simulink Support Package for Raspberry Pi Hardware.

Modify the Data Type of the Audio Samples

The ALSA Audio Capture (Simulink) and Audio File Read (Simulink) blocks output 16-bit signed integer audio samples with values in the interval $[ -2^{15} , 2^{15} -1 ]$. You cast the output of these blocks to single-precision data and multiply it by $2^{-15}$ to change the numerical range to $[ -1 , +1 )$. You change the numerical range because the subsequent blocks expect the audio in the range $[ -1 , +1 )$.
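The cast-then-scale step can be sketched in plain MATLAB as follows. This is only an illustration of the arithmetic, not the model itself. The variable names and the sample values are made up for the example.

```matlab
% Hypothetical int16 samples standing in for one captured audio frame,
% including both ends of the int16 range.
capturedFrame = int16([-32768; -16384; 0; 16384; 32767]);

% Cast to single first, then scale by 2^-15. This mirrors the order
% described above: data type conversion followed by a gain.
normalizedFrame = single(capturedFrame) * single(2^-15);

% The smallest sample maps to exactly -1, and the largest maps to
% 32767/32768, which is just below +1.
disp(normalizedFrame.')
```

Casting before scaling matters: multiplying the int16 values by $2^{-15}$ directly would round each result to an integer and discard almost all of the signal.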

The ALSA Audio Playback (Simulink) block expects 16-bit signed integers as input, so the output of the preceding block that prepares the audio frame must be converted to 16-bit signed integers. The range of the floating-point audio frame samples is $[ -1 , +1 )$. You multiply the floating-point audio frame samples by $2^{15}$ to bring the range to $[ -2^{15} , 2^{15} -1 ]$. After multiplying, you cast the product to the int16 data type. You can then feed these int16 audio frame samples to the ALSA Audio Playback (Simulink) block. The AudioSmartSpeakerOnRaspberryPi model uses the Gain (Simulink) block to multiply the audio samples by the constant $2^{-15}$ or $2^{15}$. It uses the Data Type Conversion (Simulink) block to cast the audio samples to single or int16.
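The reverse conversion can be sketched in plain MATLAB in the same way. The variable names and sample values are again made up for illustration. The last lines show what the MATLAB int16 function does with a sample of exactly +1, which lies outside the expected range. The overflow behavior of the Data Type Conversion block depends on its own settings and is not shown here.

```matlab
% Hypothetical single-precision frame in [-1, 1). The last value is
% 32767/32768, the largest value produced by the capture-side scaling.
audioFrame = single([-1; -0.5; 0; 0.5; 0.999969482421875]);

% Scale by 2^15 first, then cast to int16. This mirrors a gain
% followed by a data type conversion.
scaledFrame = audioFrame * single(2^15);
playbackFrame = int16(scaledFrame);

disp(playbackFrame.')

% An out-of-range sample of exactly +1 scales to 32768. The MATLAB
% int16 function saturates it to the largest int16 value, 32767.
clippedSample = int16(single(1) * single(2^15));
disp(clippedSample)
```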

Configure Smart Speaker Model Settings and Run the Model in External Mode

Open the AudioSmartSpeakerOnRaspberryPi model. Go to the Modeling tab and click Model Settings, or press Ctrl+E, to open the Configuration Parameters dialog.

  • Select a solver that supports code generation. Set Solver to auto (Automatic solver selection) and Solver type to Fixed-step.

  • Select Code Generation and set the System Target File to ert.tlc, whose Description is Embedded Coder.

  • Set the Language to C++, which will automatically set the Language Standard to C++11 (ISO).

  • In Configuration > Hardware Implementation, set the Hardware board to Raspberry Pi and enter your Raspberry Pi credentials in the Board Parameters.

  • In the same window, set External mode > Communication interface to XCP on TCP/IP.

  • Check Signal logging in Configuration > Data Import/Export to enable signal monitoring in External Mode.

  • Go to the Hardware tab and click on Monitor & Tune to run the model in external mode.
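Some of the settings above can also be applied from the MATLAB command line. The following is a rough sketch, not a procedure taken from this example. It assumes that the dialog labels map to the model parameters SolverType, Solver, SystemTargetFile, TargetLang, HardwareBoard, and SignalLogging, and that Simulink, Embedded Coder, and the Raspberry Pi support package are installed. The board credentials and the External mode communication interface are left to the dialog.

```matlab
% Assumed command-line equivalents of the dialog settings listed above.
% Check each value afterward with get_param or in the dialog.
modelName = "AudioSmartSpeakerOnRaspberryPi";
load_system(modelName)

% Fixed-step solver with automatic solver selection
set_param(modelName,"SolverType","Fixed-step")
set_param(modelName,"Solver","FixedStepAuto")

% Embedded Coder target and C++ as the generated code language
set_param(modelName,"SystemTargetFile","ert.tlc")
set_param(modelName,"TargetLang","C++")

% Target hardware board
set_param(modelName,"HardwareBoard","Raspberry Pi")

% Signal logging, used for signal monitoring in External Mode
set_param(modelName,"SignalLogging","on")
```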