How to select the corresponding MFCC frame (relative time stamp) from the target vector using deep learning?

I have 4 input signals: a 4-D array storing the video frames, a 4-D array storing the MFCC frames, a 1-D array of MFCC (audio) time stamps, and a 1-D array of speaker identity. The target is '1', i.e. digit 1. For experimentation I have considered only digit 1 with 41 frames; all 41 frames make the sound of digit 1. In this experiment the test signal is the same as the training signal, and I get the correct target output '1' for all 41 frames. From this target vector I now have to select the corresponding MFCC frame, e.g. for the first '1' in the target vector I need to select the first MFCC frame, and then generate sound from it. I am unable to do this. Please provide guidance.
The code is shown below.
clear all; close all; clc;

% Load the pre-processed visual (video-frame) and audio (MFCC) data
[visual_frames, visual_lbl, vid_fr_cnt, new_vl] = fun_visual_data_processing_30april_23();
[audio_frames,  audio_lbl,  aud_fr_cnt, new_al] = fun_audio_data_processing_30april_23();
[v1, v2, v3, v4] = size(visual_frames)
[a1, a2, a3, a4] = size(audio_frames)

% Datastores for the four inputs and the target labels
dsX1Train = arrayDatastore(visual_frames, IterationDimension=4);  % video frames
dsX2Train = arrayDatastore(audio_frames,  IterationDimension=4);  % MFCC frames
dsTTrain  = arrayDatastore(audio_lbl);                            % target labels
%dsTTTrain = arrayDatastore(new_vl);
SI = ones(41,1);                                                  % speaker identity (single speaker)
dsX3Train = arrayDatastore(SI);
dsX4Train = arrayDatastore(new_al');                              % MFCC time stamps

[h, w, numChannels, numObservations] = size(visual_frames);
numClasses = numel(categories(visual_lbl));
imageInputSize = [h w numChannels];
filterSize = 5;
numFilters = 16;

% Branch 1: video frames (also carries the concatenation and output layers)
layers1 = [
    imageInputLayer(imageInputSize, Normalization="none")
    convolution2dLayer(filterSize, numFilters)
    batchNormalizationLayer
    reluLayer
    fullyConnectedLayer(50)
    flattenLayer(Name="FL1")
    concatenationLayer(1, 4, Name="cat")
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];
lgraph = layerGraph(layers1);

% Branch 2: MFCC frames (uses the same input size as the video frames)
layers2 = [
    imageInputLayer(imageInputSize, Normalization="none")
    convolution2dLayer(filterSize, numFilters)
    batchNormalizationLayer
    reluLayer
    fullyConnectedLayer(50)
    flattenLayer(Name="FL2")];
lgraph = addLayers(lgraph, layers2);
lgraph = connectLayers(lgraph, "FL2", "cat/in2");

% Branch 3: speaker identity (scalar feature)
numFeatures = 1;
featInput = featureInputLayer(numFeatures, Name="features");
lgraph = addLayers(lgraph, featInput);
lgraph = connectLayers(lgraph, "features", "cat/in3");

% Branch 4: MFCC time stamp (scalar feature)
numFeatures = 1;
featInput2 = featureInputLayer(numFeatures, Name="features2");
lgraph = addLayers(lgraph, featInput2);
lgraph = connectLayers(lgraph, "features2", "cat/in4");

figure
plot(lgraph)

options = trainingOptions("sgdm", ...
    MaxEpochs=15, ...
    InitialLearnRate=0.01, ...
    Plots="training-progress", ...
    Verbose=0);

% Combine the four input datastores with the targets and train
dsTrain = combine(dsX1Train, dsX2Train, dsX3Train, dsX4Train, dsTTrain);
net = trainNetwork(dsTrain, lgraph, options);

% Classify (the test signal is the same as the training signal)
dsTest = dsTrain;
[ytest] = classify(net, dsTest);
OUTPUT
>> ytest'
ans =
1×41 categorical array
Columns 1 through 16
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Columns 17 through 32
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Columns 33 through 41
1 1 1 1 1 1 1 1 1

Accepted Answer

Shilpa Sonawane on 6 Sep 2023
Thank you so much. I will try it definitely.

More Answers (1)

Karan Singh on 5 Sep 2023
Hi Shilpa,
Based on the provided code and description, I am assuming that you have a dataset containing video frames and corresponding MFCC frames.
In your experiment, you are focusing on a specific digit, '1', and you have considered 41 frames that represent the sound of the digit '1'. The goal is to train a neural network model to recognize and classify these frames as '1'.
Now, you want to select the corresponding MFCC frame for each '1' in the target vector and generate sound from it. In other words, you want to extract the MFCC frame that corresponds to each predicted '1' in the target vector and use it to generate sound.
To achieve this, you can follow these steps:
  1. Extract the indices of the '1' predictions from the target vector using the find function (note that ytest is categorical, so compare against the category name '1' rather than the number 1):
indices = find(ytest == '1');
2. Iterate over the obtained indices and select the corresponding MFCC frame using indexing:
for i = 1:length(indices)
mfccFrame = audio_frames(:, :, :, indices(i));
% Generate sound from the selected MFCC frame
% ...
% Your sound generation code here
end
In the above code, “audio_frames” is the 4D array representing the MFCC frames, and indices contains the indices of the '1' values in the target vector. 
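If you want all of the matching MFCC frames at once rather than selecting them one per loop iteration, you can also index them in a single step (a small addition using the same variables as above):
selectedFrames = audio_frames(:, :, :, indices); % all MFCC frames predicted as '1'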
Since the sound generation code was not included in your snippet, replace the comment % Generate sound from the selected MFCC frame with your own sound generation code. Here is a simple demo that uses the MATLAB soundsc function to play something from each selected MFCC frame:
% Assuming you have extracted the indices and stored them in the variable 'indices'
for i = 1:length(indices)
    % Select the corresponding MFCC frame
    mfccFrame = audio_frames(:, :, :, indices(i));

    % Convert the MFCC frame to an audio signal (dummy code)
    audioSignal = mfccFrame(:); % Replace with your actual conversion code

    % Normalize the audio signal
    audioSignal = audioSignal / max(abs(audioSignal));

    % Set the sampling rate and play the sound
    fs = 44100; % Replace with your desired sampling rate
    soundsc(audioSignal, fs);

    % Pause for a moment to hear the sound
    pause(1); % Adjust the duration as needed
end
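Note that raw MFCC coefficients are not an audio waveform, so flattening a frame and passing it to soundsc only produces a rough diagnostic sound. Since you also have the MFCC time stamps (new_al), a more meaningful option is to play back the slice of the original recording that each selected frame came from. Below is a minimal sketch, not part of your original code, which assumes the original waveform is available in a variable speech with sampling rate fs, that new_al holds the start time of each MFCC frame in seconds, and a hypothetical frameDuration equal to the MFCC analysis window length you used:
% Sketch: play the original audio segment corresponding to each selected MFCC frame
% Assumptions: 'speech' = original waveform, 'fs' = its sampling rate,
% 'new_al' = start time (seconds) of each MFCC frame,
% 'frameDuration' = your MFCC analysis window length
frameDuration = 0.025;                        % assumed 25 ms analysis window
for i = 1:length(indices)
    tStart = new_al(indices(i));              % start time of the selected frame
    s1 = max(1, round(tStart*fs) + 1);        % first sample of the segment
    s2 = min(numel(speech), s1 + round(frameDuration*fs) - 1);  % last sample
    soundsc(speech(s1:s2), fs);               % play the original audio segment
    pause(frameDuration + 0.5);               % brief gap between segments
end
Adjust frameDuration (and any frame overlap) to match the settings you used when computing the MFCCs so the played segments line up with the frames.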
Hope this helps!
