Audio Toolbox and the Audio Toolbox Interface for SpeechBrain and Torchaudio Libraries enable advanced signal processing and analysis tasks on audio and speech signals with pretrained AI models.
With individual function calls, and without requiring any deep learning expertise, you can:
- Transcribe speech with automatic speech recognition (ASR) using speech-to-text (STT) pipelines
- Synthesize speech using text-to-speech (TTS) pipelines
- Detect speech with voice activity detection (VAD), identify spoken languages, and classify sounds
- Enroll and identify speakers via speaker recognition deep learning models and machine learning pipelines
- Separate speech sources in a cocktail party problem and enhance and denoise speech signals
- Estimate musical pitch and extract embeddings from audio, speech, and music signals
The functions use pretrained machine learning and deep learning models, and are run using a combination of MATLAB, Python®, and PyTorch®.
Audio Toolbox Interface for SpeechBrain and Torchaudio Libraries
The Audio Toolbox Interface for SpeechBrain and Torchaudio Libraries enables the use of a collection of pretrained AI models with Audio Toolbox functions for signal processing and signal analysis.
The interface automates the installation of Python and PyTorch, and it downloads selected deep learning models from the SpeechBrain and Torchaudio libraries. Once installed, the interface enables the following functions to run using local AI models:
- speech2text accepts a speechClient object with the model set to emformer or whisper, in addition to the local wav2vec model and the cloud service options Google, IBM, Microsoft, and Amazon. Using whisper also requires downloading the model weights separately, as described in Download Whisper Speech-to-Text Model.
- text2speech accepts a speechClient object with the model set to hifigan, in addition to the cloud service options Google, IBM, Microsoft, and Amazon.
The speech2text and text2speech functions accept and return text strings and audio samples directly, so you do not need to write any signal preprocessing, feature extraction, model prediction, or output postprocessing code.
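As a minimal sketch of this workflow, the following code transcribes a recording with a local model and synthesizes speech from a text string. The file name is hypothetical, and the exact model names and return types (a string or a table of segments) depend on the speechClient configuration you choose:

```matlab
% Read any speech recording (file name is illustrative).
[audioIn,fs] = audioread("speech.wav");

% Transcribe locally with a pretrained ASR model.
transcriber = speechClient("wav2vec2.0");
transcript  = speech2text(transcriber,audioIn,fs);

% Synthesize speech locally from a text string.
synthesizer = speechClient("hifigan");
[audioOut,fsOut] = text2speech(synthesizer,"Hello from Audio Toolbox.");
```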

Ready-to-Use AI with Additional Functions for Speech and Audio
Audio Toolbox includes additional functions, such as classifySound, separateSpeakers, enhanceSpeech, detectspeechnn, pitchnn, and identifyLanguage. These functions let you apply advanced deep learning models to process and analyze audio signals without requiring AI expertise, and they do not require the Audio Toolbox Interface for SpeechBrain and Torchaudio Libraries.
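The sketch below shows how a few of these functions are typically called on a raw signal. The file name is hypothetical; each function downloads its pretrained model on first use, and some functions may impose sample-rate requirements (for example, resampling to 16 kHz before speech enhancement):

```matlab
% Load an audio recording (file name is illustrative).
[audioIn,fs] = audioread("multispeaker.wav");

sounds  = classifySound(audioIn,fs);   % detected sound class labels
rois    = detectspeechnn(audioIn,fs);  % speech region boundaries in samples
cleaned = enhanceSpeech(audioIn,fs);   % denoised speech signal
f0      = pitchnn(audioIn,fs);         % pitch estimates over time
```

Each call encapsulates the preprocessing, model inference, and postprocessing steps, so the inputs and outputs stay in familiar signal and label formats.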
Using MATLAB with PyTorch for Deep Learning Model Development
MATLAB and PyTorch users who are familiar with deep learning can use both languages together to develop and train AI models, including through co-execution and model exchange workflows.
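As one illustration of co-execution, MATLAB can call PyTorch directly through its Python interface. This sketch assumes a Python environment with PyTorch installed and configured via pyenv; the tensor values are illustrative:

```matlab
% Import PyTorch through the MATLAB-Python interface.
torch = py.importlib.import_module("torch");

% Build a tensor from MATLAB data and run a PyTorch operation.
x = torch.tensor(py.list({1.0,-2.0,3.0}));
y = torch.relu(x);

% Convert the result back into a MATLAB array.
yMat = double(y.numpy());
```

Model exchange works in the other direction as well: networks can be imported from or exported to PyTorch via the ONNX format or dedicated import functions.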
Learn more: