visqol

Objective metric for perceived audio quality

Since R2024a

Syntax

metric = visqol(degraded,reference,fs)

metric = visqol(degraded,reference,fs,Name=Value)

[metric,ftable] = visqol(___)

[metric,ftable,ttable] = visqol(___)

Description

metric = visqol(degraded,reference,fs) returns the mean opinion score (MOS) calculated by the Virtual Speech Quality Objective Listener (ViSQOL) metric. This metric compares the degraded speech or audio signal with a clean reference signal to measure the perceived audio quality.

metric = visqol(degraded,reference,fs,Name=Value) specifies options using one or more name-value arguments. For example, visqol(degraded,reference,fs,Mode="speech") computes the ViSQOL metric for speech signals.

[metric,ftable] = visqol(___) also returns a table containing statistics for each gammatone frequency band.

[metric,ftable,ttable] = visqol(___) also returns a table containing timing information on the matching of patches between the degraded and reference signals.

Examples

Measure Audio Quality with ViSQOL

Read in an audio signal and average together the stereo channels to convert it to mono. Listen to the audio with sound.

[rockdrums,fs] = audioread("RockDrums-48-stereo-11secs.mp3");
rockdrums = mean(rockdrums,2);
sound(rockdrums,fs)

Create two noisy signals with different levels of additive pink noise.

noisy1 = rockdrums + 0.1*pinknoise(size(rockdrums));
noisy2 = rockdrums + 0.5*pinknoise(size(rockdrums));

Listen to the first noisy signal.

sound(noisy1,fs)

Listen to the second noisy signal.

sound(noisy2,fs)

Use visqol with the clean reference signal to measure the audio quality of both noisy signals.

mos1 = visqol(noisy1,rockdrums,fs)

mos1 = 4.2153

mos2 = visqol(noisy2,rockdrums,fs)

mos2 = 3.2949

Use ViSQOL to Evaluate Enhanced Speech Signal

Read in an audio file containing speech and noise. Also read in an audio file containing the original clean speech to use as a reference signal.

[noisySpeech,fs] = audioread("NoisySpeech-16-mono-3secs.ogg");
reference = audioread("CleanSpeech-16-mono-3secs.ogg");

Calculate the ViSQOL metric for the noisy speech signal using visqol.

noisySpeechMOS = visqol(noisySpeech,reference,fs,Mode="speech")

noisySpeechMOS = 2.9550

Use enhanceSpeech to enhance the speech signal. Evaluate the enhanced signal using the ViSQOL metric and see the improvement compared to the noisy signal.

enhancedSpeech = enhanceSpeech(noisySpeech,fs);
enhancedSpeechMOS = visqol(enhancedSpeech,reference,fs,Mode="speech")

enhancedSpeechMOS = single
    3.2205

Examine ViSQOL Frequency and Timing Information Tables

Read in an audio signal and average together the stereo channels to convert it to mono.

[rockdrums,fs] = audioread("RockDrums-48-stereo-11secs.mp3");
rockdrums = mean(rockdrums,2);

Create a noisy signal by adding pink noise. Simulate latency by prepending 800 zeros, and simulate packet loss by removing fs/10 samples (0.1 s) after sample 60000.

noisy = rockdrums + 0.5*pinknoise(size(rockdrums));
noisy = [zeros(800,1); noisy([1:60000 60001+fs/10:end],1)];

Call visqol with additional output arguments to get information about the frequency bands and timing alignment used in the ViSQOL computation. Setting OutputMetric to "MOS and NSIM" returns both the MOS and the neurogram similarity index measure (NSIM). The frequency table, ftable, contains NSIM statistics for each gammatone frequency band. The timing table, ttable, contains information about the timing alignment between the reference and degraded signals.

[metrics,ftable,ttable] = visqol(noisy,rockdrums,fs,OutputMetric="MOS and NSIM")

metrics = 1×2

    3.2639    0.7549

ftable=32×5 table
    FrequencyBand    FVNSIM     FVNSIM10    FVNSIMSTD    DegradedEnergy
    _____________    _______    ________    _________    ______________

           50        0.72699    0.37752      0.43309         23.974    
       91.748        0.81116      0.562      0.30221         22.413    
       139.75        0.83848     0.6642      0.31742         22.104    
       194.93        0.87307    0.50747      0.29136         24.002    
       258.38        0.88401    0.58191      0.24084         22.485    
       331.33        0.82519    0.60645      0.28942         20.694    
       415.19        0.77425    0.54247       0.3168          20.48    
       511.62        0.70612    0.44807      0.40192          19.82    
       622.48        0.61074     0.3624      0.47376         18.911    
       749.95        0.57177    0.30356      0.46667         18.545    
       896.49        0.63006    0.35169      0.42972         18.668    
         1065        0.73258    0.53353      0.33579         18.228    
       1258.7        0.76097    0.44779      0.32103          18.81    
       1481.4        0.81142    0.54293       0.2684         18.695    
       1737.5        0.84971    0.45247      0.26654         19.418    
       2031.9        0.91892    0.58922      0.17226         19.147    
      ⋮

ttable=18×4 table
    PatchIndex    Similarity    ReferencePatch    DegradedPatch
    __________    __________    ______________    _____________

         1         0.77977       0.28    0.88     0.38    0.98 
         2         0.54941       0.88    1.48     0.98    1.58 
         3         0.74057       1.48    2.08     1.48    2.08 
         4         0.76372       2.08    2.68     2.08    2.68 
         5         0.76232       2.68    3.28     2.68    3.28 
         6          0.6989       3.28    3.88     3.28    3.88 
         7         0.79208       3.88    4.48     3.88    4.48 
         8         0.79986       4.48    5.08     4.48    5.08 
         9         0.80775       5.08    5.68     5.08    5.68 
        10         0.83136       5.68    6.28     5.68    6.28 
        11         0.75019       6.28    6.88     6.28    6.88 
        12         0.71107       6.88    7.48     6.88    7.48 
        13         0.76068       7.48    8.08     7.48    8.08 
        14         0.76206       8.08    8.68     8.08    8.68 
        15         0.78091       8.68    9.28     8.68    9.28 
        16         0.71875       9.28    9.88     9.28    9.88 
      ⋮

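The ReferencePatch and DegradedPatch columns in ttable display the start and end times of each patch, in seconds, within the reference and degraded signals, respectively. Comparing these columns shows how the function aligned the signals after the simulated latency and packet loss: in the output above, degraded patches 1 and 2 start 0.1 s later than their reference patches, while the remaining displayed patches have matching times. Patch 2, whose reference span (0.88 s to 1.48 s) contains the dropout at 1.25 s, has the lowest similarity of the displayed patches.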
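
As a rough sketch, you can compute the shift between each matched pair of patches directly from ttable. The code below recreates the signals from this example (pinknoise is random, so the numbers differ from run to run) and subtracts the reference start times from the degraded start times. It assumes, based on the display above, that ReferencePatch and DegradedPatch are two-column variables holding [start end] times; shift is an illustrative variable name.

```matlab
% Recreate the signals from this example (requires Audio Toolbox, R2024a or later)
[rockdrums,fs] = audioread("RockDrums-48-stereo-11secs.mp3");
rockdrums = mean(rockdrums,2);
noisy = rockdrums + 0.5*pinknoise(size(rockdrums));
noisy = [zeros(800,1); noisy([1:60000 60001+fs/10:end],1)];
[~,~,ttable] = visqol(noisy,rockdrums,fs);

% Shift (s) of each degraded patch relative to its matched reference patch;
% a positive value means the degraded patch starts later
shift = ttable.DegradedPatch(:,1) - ttable.ReferencePatch(:,1);

% Display patch index next to its shift
disp([ttable.PatchIndex shift])
```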

Input Arguments

`degraded` — Degraded audio signal
column vector

Degraded audio signal, specified as a column vector (single channel).

Data Types: single | double

`reference` — Reference audio signal
column vector

Reference audio signal, specified as a column vector (single channel).

Data Types: single | double

`fs` — Sample rate (Hz)
positive scalar

Sample rate in Hz, specified as a positive scalar.

Data Types: single | double

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: Mode="speech"

`Mode` — ViSQOL mode
`"audio"` (default) | `"speech"`

ViSQOL mode, specified as "audio" or "speech".

"audio" — Compute the metric for a generic audio signal. The recommended sample rate is 48 kHz.
"speech" — Compute the metric for a speech signal. The recommended sample rate is 16 kHz. In speech mode, the function uses voice activity detection to identify relevant parts of the signal.

Data Types: char | string
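
The following sketch shows one way to bring a speech recording to the recommended 16 kHz rate before calling visqol in speech mode. It assumes that the file Counting-16-44p1-mono-15secs.wav (a 44.1 kHz speech recording) is available with the Audio Toolbox example files, and that resample from Signal Processing Toolbox is installed. The pink-noise degradation is hypothetical and only gives the metric something to measure.

```matlab
% Assumed example file: 44.1 kHz speech recording on the MATLAB path
[ref,fsIn] = audioread("Counting-16-44p1-mono-15secs.wav");
ref = mean(ref,2);                        % ensure a single-channel column vector

% Hypothetical degradation for illustration: low-level additive pink noise
deg = ref + 0.05*pinknoise(size(ref));

% Resample both signals to the rate recommended for Mode="speech"
fsTarget = 16000;
if fsIn ~= fsTarget
    % resample(x,p,q) resamples x at p/q times the original rate
    ref = resample(ref,fsTarget,fsIn);
    deg = resample(deg,fsTarget,fsIn);
end

mos = visqol(deg,ref,fsTarget,Mode="speech")
```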

`OutputMetric` — Output metric
`"MOS"` (default) | `"NSIM"` | `"MOS and NSIM"`

Output metric, specified as "MOS", "NSIM", or "MOS and NSIM".

"MOS" — The output is a scalar representing the mean opinion score (MOS) in the range [1,5], where a higher value corresponds to higher quality.
"NSIM" — The output is a scalar representing the neurogram similarity index measure (NSIM) [2] in the range [-1,1], where 1 corresponds to a perfect similarity between the degraded and reference signals. In practice, the NSIM is generally in the range [0,1].
"MOS and NSIM" — The output is a two-element row vector with both metrics in the form [mos nsim], where the first element is the MOS value and the second element is the NSIM value.

Data Types: char | string
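
A minimal sketch of requesting both metrics and unpacking the [mos nsim] row vector. The noise level (0.1) is arbitrary, and mos and nsim are illustrative variable names.

```matlab
% Mono reference and a noisy copy (requires Audio Toolbox, R2024a or later)
[ref,fs] = audioread("RockDrums-48-stereo-11secs.mp3");
ref = mean(ref,2);                        % stereo -> mono column vector
deg = ref + 0.1*pinknoise(size(ref));     % additive pink noise

metrics = visqol(deg,ref,fs,OutputMetric="MOS and NSIM");
mos  = metrics(1);    % mean opinion score, in [1,5]
nsim = metrics(2);    % neurogram similarity index measure, in [-1,1]
fprintf("MOS = %.4f, NSIM = %.4f\n",mos,nsim)
```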

`ScaleMOS` — Scale MOS
`true` (default) | `false`

Scale MOS, specified as true or false. When you set this argument to true, a similarity of 1 produces an MOS of 5. If you set this argument to false, a similarity of 1 produces an MOS less than 5.

This argument applies only when Mode is "speech".

Data Types: logical
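
To see the effect of ScaleMOS, you can score the same pair of speech signals twice in speech mode, once with the default scaling and once without. This sketch reuses the speech files from the enhanced-speech example; it does not imply any particular values.

```matlab
% Noisy speech and its clean reference, both 16 kHz mono
[noisySpeech,fs] = audioread("NoisySpeech-16-mono-3secs.ogg");
reference = audioread("CleanSpeech-16-mono-3secs.ogg");

% Same inputs with and without MOS scaling (ScaleMOS applies only in speech mode)
mosScaled   = visqol(noisySpeech,reference,fs,Mode="speech");
mosUnscaled = visqol(noisySpeech,reference,fs,Mode="speech",ScaleMOS=false);
fprintf("Scaled MOS: %.4f, unscaled MOS: %.4f\n",mosScaled,mosUnscaled)
```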

`SearchWindowSize` — Size of search window
`60` (default) | nonnegative integer

Size of search window for aligning the signals, specified as a nonnegative integer. The search window size determines how many signal patches the function searches through to align the reference and degraded signals in time. For each patch in the reference signal, the function searches through 2*L+1 patches in the degraded signal, where L is the size of the search window.

A larger window lets the function find patches that have drifted further out of alignment, for example because of packet loss. A small or zero-length window requires less computation but cannot handle large latency variations.
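
The 2*L+1 rule translates directly into a count of candidate patches, and you can compare the default window with a narrower one on signals that contain a dropout. This is a sketch: the 800-sample delay and the 0.1 s dropout after sample 60000 mirror the frequency and timing example, the window size 5 is an arbitrary choice, and tic/toc timings depend on your machine.

```matlab
% Candidate degraded patches searched per reference patch: 2*L+1
L = 60;                        % default SearchWindowSize
numCandidates = 2*L + 1;       % 121 for the default

% Hypothetical degradation: 800-sample delay plus a 0.1 s dropout
[ref,fs] = audioread("RockDrums-48-stereo-11secs.mp3");
ref = mean(ref,2);
deg = [zeros(800,1); ref([1:60000 60001+fs/10:end],1)];

% Compare the default window with a narrower one
tic; mosDefault = visqol(deg,ref,fs); tDefault = toc;
tic; mosNarrow = visqol(deg,ref,fs,SearchWindowSize=5); tNarrow = toc;
fprintf("L = 60: MOS %.4f (%.2f s)\nL = 5:  MOS %.4f (%.2f s)\n", ...
    mosDefault,tDefault,mosNarrow,tNarrow)
```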

Output Arguments

`metric` — ViSQOL metric
scalar | two-element row vector

ViSQOL metric measuring the quality of the degraded signal, returned as a scalar or two-element row vector. The output metric can be NSIM, MOS, or both, depending on the OutputMetric argument.

`ftable` — Frequency information table
table

Frequency information table, returned as a table with the following columns:

FrequencyBand — Center frequency of each gammatone frequency band.
FVNSIM — NSIM value for each band.
FVNSIM10 — Mean of the first decile of the NSIM.
FVNSIMSTD — Standard deviation of the NSIM.
DegradedEnergy — Energy of the degraded signal in each band.
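
As an illustration, you can rank the bands by FVNSIM to see which frequency regions of the degraded signal are least similar to the reference. This sketch assumes audio mode with the same pink-noise level as the frequency and timing example; sorted, idx, and worstNSIM are illustrative names.

```matlab
% Mono reference and a noisy copy
[ref,fs] = audioread("RockDrums-48-stereo-11secs.mp3");
ref = mean(ref,2);
deg = ref + 0.5*pinknoise(size(ref));
[~,ftable] = visqol(deg,ref,fs);

% Band with the lowest NSIM, i.e. least similar to the reference
[worstNSIM,idx] = min(ftable.FVNSIM);
fprintf("Lowest band NSIM %.3f at %.1f Hz\n",worstNSIM,ftable.FrequencyBand(idx))

% Bands ordered from least to most similar
sorted = sortrows(ftable,"FVNSIM");
head(sorted,5)
```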

`ttable` — Timing information table
table

Timing information table, returned as a table with the following columns:

PatchIndex — One-based index of the patch.
Similarity — Similarity metric for each patch.
ReferencePatch — Start and end times of the reference patch in seconds.
DegradedPatch — Start and end times of the degraded patch in seconds.
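
One way to use ttable is to locate the part of the signal with the lowest patch similarity. The sketch below removes 0.1 s of samples after sample 60000 (a hypothetical dropout, with no added noise) and reports the patch with the minimum Similarity value together with its reference and degraded time spans. It assumes ReferencePatch and DegradedPatch are two-column [start end] variables, as the example display suggests.

```matlab
% Reference and a copy with a hypothetical 0.1 s dropout
[ref,fs] = audioread("RockDrums-48-stereo-11secs.mp3");
ref = mean(ref,2);
deg = ref([1:60000 60001+fs/10:end],1);
[~,~,ttable] = visqol(deg,ref,fs);

% Patch with the lowest similarity and its time spans in seconds
[minSim,k] = min(ttable.Similarity);
refSpan = ttable.ReferencePatch(k,:);    % [start end] in the reference
degSpan = ttable.DegradedPatch(k,:);     % [start end] in the degraded signal
fprintf("Patch %d: similarity %.3f, reference %.2f-%.2f s, degraded %.2f-%.2f s\n", ...
    ttable.PatchIndex(k),minSim,refSpan,degSpan)
```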

References

[1] Hines, Andrew, Jan Skoglund, Anil C Kokaram, and Naomi Harte. “ViSQOL: An Objective Speech Quality Model.” EURASIP Journal on Audio, Speech, and Music Processing 2015, no. 1 (December 2015): 13. https://doi.org/10.1186/s13636-015-0054-9.

[2] Hines, Andrew, and Naomi Harte. “Speech Intelligibility Prediction Using a Neurogram Similarity Index Measure.” Speech Communication 54, no. 2 (February 2012): 306–20. https://doi.org/10.1016/j.specom.2011.09.004.

[3] Hines, Andrew, Eoin Gillen, Damien Kelly, Jan Skoglund, Anil Kokaram, and Naomi Harte. “ViSQOLAudio: An Objective Audio Quality Metric for Low Bitrate Codecs.” The Journal of the Acoustical Society of America 137, no. 6 (June 1, 2015): EL449–55. https://doi.org/10.1121/1.4921674.

[4] Chinen, Michael, Felicia S. C. Lim, Jan Skoglund, Nikita Gureev, Feargus O’Gorman, and Andrew Hines. “ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric.” In 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX), 1–6. Athlone, Ireland: IEEE, 2020. https://doi.org/10.1109/QoMEX48832.2020.9123150.

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.
