Awesome
Micro speech example for TensorFlow Lite
The program analyzes an audio input with a voice recognition model that can detect 2 keywords - yes and no. The recognized keywords are then printed into a serial interface. The voice recognition model is implemented using TensorFlow Lite for Microcontrollers.
The example project can be executed on Arm Virtual Hardware (AVH) as well as on physical hardware targets.
Structure
The repository is organized as follows:
Folder | Description |
---|---|
./micro_speech/ | Contains the voice recognition model that is used by all targets. This part is similar to the original TF-Lite for Microcontrollers example, with just minor modifications.<br>TensorFlow calculation kernel is provided separately via corresponding software packs listed in Prerequisites. |
./Platform_FVP_Corstone_SSE-300_Ethos-U55/ | Project files specific for Corstone SSE-300 AVH target. |
./Platform_IMXRT1050-EVKB/ | Project files specific for IMXRT1050-EVKB target. |
./Platform_MIMXRT1064-EVK/ | Project files specific for MIMXRT1064-EVK target. |
./VSI/ | Implementation of Audio Streaming Interface for FVP targets with Virtual Streaming Interface (VSI). |
Prerequisites
Toolchain
- IDE (Windows only): Keil MDK - Professional Edition - V5.38 or higher
- alternatively, CMSIS-Toolbox command-line building tools
Arm Virtual Hardware - Corstone and Cortex-M CPUs available via AWS Marketplace
- For targets with VSI support
Specific for HW targets
- NXP MIMXRT1050-EVKB board, or
- NXP MIMXRT1064-EVK board
Note that CMSIS software packs used in the specific project will be requested and installed automatically when using Keil MDK or CMSIS-Build.
Micro speech for AVH (Fast Model) for Corstone SSE-300 with Ethos-U55
Project directory: ./Platform_FVP_Corstone_SSE-300_Ethos-U55/
This example executes the program on Corstone SSE-300 with Ethos-U55 Fixed Virtual Platforms (FVPs).
Example project has the following targets:
Example
: target runs on AVH with Virtual Streaming Interface (VSI)- Uses the VHT_Corstone_SSE-300_Ethos-U55 with VSI support<br> It is required to install the model executable and binaries in order to run this example. The directory with executables needs to be added to the system path.<br/>
- Audio test data is provided by Python script
./VSI/audio/python/arm_vsi0.py
from WAVE filetest.wav
which contains keywords 'Yes' and 'No' alternating three times. - Open the example with Keil MDK (Windows only) using the uVision project
microspeech.uvprojx
and build it for targetExample
. - Alternatively compile with CMSIS-Build using
microspeech.Example.cprj
project. - Run example from the uVision project or standalone with script
run_example.cmd
. - When running the example the audio data input is processed and detected keywords are output to the Telnet terminal with their time stamps in the audio stream. Following output can be observed for the default test.wav file included with the example:
Heard yes (152) @1100ms Heard no (141) @5500ms Heard yes (147) @9100ms Heard no (148) @13600ms Heard yes (147) @17100ms Heard no (148) @21600ms
Example Test
: internal test for Example targetAudio Provider Mock
: runs on AVH- Uses the VHT_Corstone_SSE-300_Ethos-U55 without using VSI<br>
- Audio test data is embedded in the test code and contains keywords 'Yes' and 'No' alternating indefinitely.
- Open the example with Keil MDK (Windows only) using the uVision project
microspeech.uvprojx
and build it for targetAudio Provider Mock
. - Alternatively compile with CMSIS-Build using
microspeech.Audio_Provider_Mock.cprj
project. - Run example from the uVision project or standalone with script
run_audio_provider_mock.cmd
. - When running the example the audio data input is processed and detected keywords are continuously output to the Telnet terminal with their time stamps in the audio stream. Following output can be observed for the default test.wav file included with the example:
Heard silence (149) @400ms Heard yes (158) @1200ms Heard no (143) @5600ms Heard yes (149) @9100ms Heard no (142) @13600ms Heard yes (149) @17100ms Heard no (142) @21600ms
Audio Provider Mock Test
: internal test for Audio Provider Mock target
Micro speech for IMXRT1050-EVKB board
Project directory: ./Platform_IMXRT1050-EVKB/
This example executes the program on NXP IMXRT1050-EVKB development board with an Arm Cortex-M7 processor. It uses the on-board microphone for audio input and prints recognized keywords to the serial interface. One target IMXRT1050-EVK is provided in the project.
The board shall be connected to a PC via USB port J28. Jumper J1 shall connect pins 5-6 to ensure correct power supply in such setup. The project is configured to load the program to on-board Hyper Flash so the boot switch SW7 shall be set to 0110.
Execute the program in following steps:
- Build example with MDK using uVision project
microspeech.uvprojx
or CMSIS-Build usingmicrospeech.IMXRT1050-EVKB.cprj
project. - Program and run the example with MDK or use Drag-and-drop programming through the DAP-Link drive.
- Open the DAP-Link Virtual COM port in a terminal (baudrate = 115200) and monitor recognized keywords.
Micro speech for MIMXRT1064-EVK board
Project directory: ./Platform_MIMXRT1064-EVK/
This example executes the program on NXP MIMXRT1064-EVK development board with an Arm Cortex-M7 processor. It uses the on-board microphone for audio input and prints recognized keywords to the serial interface. One target MIMXRT1064-EVK is provided in the project.
The board shall be connected to a PC via USB port J41. Jumper J1 shall connect the pins 5-6 to ensure correct power supply in such setup. The project is configured to load the program to on-board QSPI NOR flash so the boot switch SW7 shall be set to 0010.
Execute the program in following steps:
- Build example with MDK using uVision project
microspeech.uvprojx
or CMSIS-Build usingmicrospeech.MIMXRT1064-EVK.cprj
project. - Program and run the example with MDK or use Drag-and-drop programming through the DAP-Link drive.
- Open the DAP-Link Virtual COM port in a terminal (baudrate = 115200) and monitor recognized keywords.
TensorFlow-Lite kernel variants
The micro speech example uses tensorflow-lite-micro pack that contains Machine Learning software component implementing among others the universal kernel for executing TensorFlow ML operations independent from the actual load type (audio, video, or others).
Implementation of these kernel operations is available in several variants optimized for Arm targets. When using the uVision project the variant can be selected in Manage Run-Time Environment window as shown on the picture below.
When using CMSIS-Build the kernel variant is specified in the .cprj project file in the line:
<component Cclass="Machine Learning" Cgroup="TensorFlow" Csub="Kernel" Cvariant="CMSIS-NN" Cvendor="tensorflow"/>
Following kernel variants are available:
- CMSIS-NN<br>Optimizes execution on Cortex-M devices by using mathematical functions from CMSIS-NN software library.<br> Underlying implementations automatically utilize target-specific hardware extensions such as Helium (or MVE: M-Profile Vector Extensions) and SIMD (Single Instruction Multiple Data) and so maximize computing performance and reduce code footprint.<br> For devices with MVE (such as Cortex-M55), there is additional configuration field Vector Extensions in the Options for Target..-Target dialog, that specifies MVE use in the project. <br>
-
Ethos-U<br>(Not functional at this moment). Uses implementation optimized for Arm Ethos-U NPUs.
-
Reference<br>Target-independent software implementation is used. Fallback solution for operations that cannot be optimized for target hardware.
Performance measurement with Event Statistics
File ./micro_speech/src/main_functions.cc
in the repository example contains Event Statistics annotations that allow to measure performance of ML algorithms in different configurations. This currently works only in setup with Keil MDK. This video demonstrates program execution including the views into Event Statistics.
There are three events defined:
-
Event C0: signal processing part that creates a spectrogram from the current data slice.
-
Event C1: ML inference algorithm for the created spectrogram. Most compute-intensive part.
-
Event C2: Verifies if a keyword was detected.
After executing the program with different TF-Lite kernel variants it can be observed that implementations optimized for Arm hardware extensions achieve significantly better performance for ML inference (C1 measurement), but also have shorter times for signal processing (C0 event).