How to separate an audio file based on different speakers

I have a recorded conversation between 2 different people. The conversation contains a gap before every sentence or change of speaker, to make it easier for the algorithm to recognise the voice. I want to split the audio file into two, each containing only one speaker's speech. How would I go about this task?

Answers (3)

jibrahim on 18 Nov 2021
Hi Andrei,
Speaker diarization is one way to address it. Check out this example:
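For a rough idea of what diarization involves, here is a crude sketch. It is not the example referred to above, and it swaps the speaker embeddings a real diarization system would use for plain mean MFCC vectors clustered with k-means. It assumes Audio Toolbox (detectSpeech, mfcc) and Statistics and Machine Learning Toolbox (kmeans) are installed; the 0.25 s minimum duration and the output file names are hypothetical choices.

```matlab
% Crude two-speaker diarization sketch (assumptions listed above).
[x, fs] = audioread('test_voice_mono.wav');   % file name reused from this thread
x = x(:, 1);                                  % use the first channel only

% 1) Find speech regions as [startSample endSample] rows.
idx = detectSpeech(x, fs);

% Drop regions shorter than 0.25 s (hypothetical limit) so that mfcc
% always has several analysis frames to work with.
idx = idx(idx(:,2) - idx(:,1) + 1 >= round(0.25*fs), :);

% 2) Describe each region by its mean MFCC vector.
nSeg = size(idx, 1);
feat = cell(nSeg, 1);
for k = 1:nSeg
    c = mfcc(x(idx(k,1):idx(k,2)), fs);       % frames-by-coefficients
    feat{k} = mean(c, 1);                     % one row per region
end
feat = vertcat(feat{:});

% 3) Cluster the regions into two groups, one per assumed speaker.
label = kmeans(feat, 2, 'Replicates', 5);

% 4) Copy each region into the track of its cluster; the rest stays silent.
y = zeros(numel(x), 2);
for k = 1:nSeg
    r = idx(k,1):idx(k,2);
    y(r, label(k)) = x(r);
end
audiowrite('speaker_1.wav', y(:,1), fs);
audiowrite('speaker_2.wav', y(:,2), fs);
```

The recording needs at least two detected speech regions for the clustering step to run, and which speaker ends up in which file is arbitrary.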

Mathieu NOE on 18 Nov 2021
This is my suggestion: assuming we have a WAV file, I take the absolute value of the signal and smooth it to get a kind of envelope, then look for the time instants at which this envelope crosses a given threshold. This gives the start and stop times for each person. What remains to be done is code that keeps only the start and stop moments separated by at least your gap value.
% load signal
%% data
[signal,Fs] = audioread('test_voice_mono.wav');
[samples,channels] = size(signal);
dt = 1/Fs;
time = (0:samples-1)*dt;
% create the signal envelope (rectify, then smooth)
signal_abs = abs(signal);
se = smoothdata(signal_abs,'gaussian',500);
cor_coeff = max(signal_abs)./max(se);
se = se.*cor_coeff;
% se = envelope(abs(signal),500,'peak');
threshold = max(se)/4;
[t0_pos,s0_pos,t0_neg,s0_neg]= crossing_V7(se,time,threshold,'linear'); % positive (pos) and negative (neg) slope crossing points
% t0 => times (x values) at which the envelope crosses the threshold
% s0 => corresponding function (y) values, equal to "threshold" by construction
% display: time-domain plot
plot(time,signal_abs,time,se,time,threshold*ones(size(time)),'k--',t0_pos,s0_pos,'dr',t0_neg,s0_neg,'dg','linewidth',2,'markersize',12);grid on
legend('signal (rectified)','signal envelope','threshold','positive slope crossing points','negative slope crossing points');
title(['Time plot / Fs = ' num2str(Fs) ' Hz ']);
xlabel('Time (s)');ylabel('Amplitude');
period = diff(t0_pos)
function [t0_pos,s0_pos,t0_neg,s0_neg] = crossing_V7(S,t,level,imeth)
% [ind,t0,s0,t0close,s0close] = crossing_V6(S,t,level,imeth,slope_sign) % older format
% CROSSING find the crossings of a given level of a signal
% ind = CROSSING(S) returns an index vector ind, the signal
% S crosses zero at ind or at between ind and ind+1
% [ind,t0] = CROSSING(S,t) additionally returns a time
% vector t0 of the zero crossings of the signal S. The crossing
% times are linearly interpolated between the given times t
% [ind,t0] = CROSSING(S,t,level) returns the crossings of the
% given level instead of the zero crossings
% ind = CROSSING(S,[],level) as above but without time interpolation
% [ind,t0] = CROSSING(S,t,level,par) allows additional parameters
% par = {'none'|'linear'}.
% With interpolation turned off (par = 'none') this function always
% returns the value left of the zero (the data point that is nearest
% to the zero AND smaller than the zero crossing).
% check the number of input arguments
narginchk(1,4);
% check the time vector input for consistency
if nargin < 2 || isempty(t)
    % if no time vector is given, use the index vector as time
    t = 1:length(S);
elseif length(t) ~= length(S)
    % if S and t are not of the same length, throw an error
    error('t and S must be of identical length!');
end
% check the level input
if nargin < 3
    % set standard value 0, if level is not given
    level = 0;
end
% check interpolation method input
if nargin < 4
    imeth = 'linear';
end
% make row vectors
t = t(:)';
S = S(:)';
% always search for zeros. So if we want the crossing of
% any other threshold value "level", we subtract it from
% the values and search for zeros.
S = S - level;
% first look for exact zeros
ind0 = find( S == 0 );
% then look for zero crossings between data points
S1 = S(1:end-1) .* S(2:end);
ind1 = find( S1 < 0 );
% bring exact zeros and "in-between" zeros together
ind = sort([ind0 ind1]);
% and pick the associated time values
t0 = t(ind);
s0 = S(ind);
if ~isempty(ind)
    % slope sign stays 0 wherever no interpolation is done
    slope_sign = zeros(size(ind));
    if strcmp(imeth,'linear')
        % linear interpolation of crossing
        for ii=1:length(t0)
            % an exact zero on the very last sample has no right-hand
            % neighbour, so it cannot be interpolated
            if ind(ii) < length(S)
                NUM = (t(ind(ii)+1) - t(ind(ii)));
                DEN = (S(ind(ii)+1) - S(ind(ii)));
                slope = NUM / DEN;
                slope_sign(ii) = sign(slope);
                t0(ii) = t0(ii) - S(ind(ii)) * slope;
                s0(ii) = level;
            end
        end
    end
    % extract the positive slope crossing points
    ind_pos = find(slope_sign>0);
    t0_pos = t0(ind_pos);
    s0_pos = s0(ind_pos);
    % extract the negative slope crossing points
    ind_neg = find(slope_sign<0);
    t0_neg = t0(ind_neg);
    s0_neg = s0(ind_neg);
else
    % empty output: no crossing found
    t0_pos = [];
    s0_pos = [];
    t0_neg = [];
    s0_neg = [];
end
end
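The step left open in this answer, keeping only the start and stop moments that are separated by at least the gap value, might be sketched as follows. The crossing times and the 1 s gap are made-up numbers standing in for the real t0_pos and t0_neg, and the sketch assumes the two vectors are already paired (the recording starts and ends below the threshold). Assigning turns to speakers by strict alternation is a further assumption, not something the envelope can verify.

```matlab
% Hypothetical crossing times in seconds, standing in for the t0_pos and
% t0_neg vectors returned by crossing_V7.
t_start = [0.50 1.10 3.00 3.70 6.20];   % envelope rises above threshold
t_stop  = [1.00 1.90 3.60 4.80 7.00];   % envelope falls below threshold
min_gap = 1.0;                          % assumed pause between turns, in s

% Silence between the end of one burst and the start of the next.
silence = t_start(2:end) - t_stop(1:end-1);

% A new turn begins wherever that silence is at least min_gap;
% shorter pauses are treated as part of the same turn.
is_new_turn     = [true, silence >= min_gap];
is_last_of_turn = [silence >= min_gap, true];

turn_start = t_start(is_new_turn);
turn_stop  = t_stop(is_last_of_turn);

% Assumption: the two speakers strictly alternate, one turn each.
speaker = mod(0:numel(turn_start)-1, 2) + 1;

% One row per turn: start time, stop time, speaker number.
disp([turn_start(:) turn_stop(:) speaker(:)]);
```

With the real signal, each turn maps to the sample range round(turn_start*Fs)+1 to round(turn_stop*Fs), which can then be copied into one of two output vectors and saved with audiowrite.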
Star Strider on 19 Nov 2021
This is known as signal separation or blind source separation, among other terms. There are several ways to do it; one is independent component analysis, which is used in the rica function and related functions.
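Independent component analysis generally needs at least as many recorded channels as sources, so it suits a two-microphone recording rather than a mono file. A minimal sketch of the rica workflow on synthetic data, assuming Statistics and Machine Learning Toolbox is installed; the sample rate, the two stand-in sources and the mixing matrix are all hypothetical:

```matlab
% Sketch only: requires Statistics and Machine Learning Toolbox for rica.
rng(0);                                  % reproducible optimizer start
fs = 8000;                               % hypothetical sample rate, Hz
t  = (0:2*fs-1)'/fs;                     % two seconds of samples
s1 = sin(2*pi*220*t);                    % synthetic stand-in for source 1
s2 = 2*mod(150*t, 1) - 1;                % synthetic sawtooth for source 2

% Two "microphones", each hearing a different blend of the sources.
A = [0.6 0.4; 0.3 0.7];                  % hypothetical mixing matrix
X = [s1 s2] * A;

% Prewhiten: remove the mean and decorrelate the channels.
Xc = X - mean(X, 1);
[V, D] = eig(cov(Xc));
Xw = Xc * V * diag(1 ./ sqrt(diag(D))) * V';

% Fit a two-component reconstruction ICA model and unmix.
Mdl = rica(Xw, 2);
Z   = transform(Mdl, Xw);                % one estimated source per column
```

The recovered columns of Z come back in arbitrary order and scale, so they have to be matched to speakers by listening or by some other cue.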
