How to use the trainNetwork function with video input?

I have written code to recognize the characters 'A' and 'B', but during training I get the following error:
Error using trainNetwork
Invalid training data. Predictors must be a numeric array, a datastore, or a table. For networks with sequence input, predictors can also be
a cell array of sequences.
Error in trialjune25_2022 (line 77)
[netLSTM,info] = trainNetwork(imdsTrainData',labelsTrain,layers,options);
____
size(imdsTrainData')=> 37 x 1 cell array
size(labelsTrain)=> 37 x 1 cell array.
Please help to resolve the error.
clear all;
close all;
clc;
idx = 1;
files={'trainsp1_A1.avi';'trainsp1_A2.avi';'trainsp1_A3.avi';'trainsp1_A4.avi';'trainsp1_A5.avi';'trainsp1_A6.avi';'trainsp1_A7.avi'; ...
'trainsp2_A1.avi';'trainsp2_A2.avi';'trainsp2_A3.avi';'trainsp2_A4.avi';'trainsp2_A5.avi';'trainsp2_A6.avi';'trainsp2_A7.avi'; ...
'trainsp3_A1.avi';'trainsp3_A2.avi';'trainsp3_A3.avi';'trainsp3_A4.avi';'trainsp3_A5.avi';'trainsp3_A6.avi';'trainsp3_A7.avi'; ...
'trainsp1_B1.avi';'trainsp1_B2.avi';'trainsp1_B3.avi';'trainsp1_B4.avi';'trainsp1_B5.avi';'trainsp1_B6.avi';'trainsp1_B7.avi'; ...
'trainsp2_B1.avi';'trainsp2_B2.avi';'trainsp2_B3.avi';'trainsp2_B4.avi';'trainsp2_B5.avi';'trainsp2_B6.avi';'trainsp2_B7.avi'; ...
'trainsp3_B1.avi';'trainsp3_B2.avi';'trainsp3_B3.avi';'trainsp3_B4.avi';'trainsp3_B5.avi';'trainsp3_B6.avi';'trainsp3_B7.avi'; ...
};
% labels1=categorical([ones(1,21) 2*ones(1,21)]);% 2*ones(1,8) 3*ones(1,8) 4*ones(1,8) 5*ones(1,7) 6*ones(1,3) 7*ones(1,8) 8*ones(1,6) 9*ones(1,6)]);
% labels=labels1';
labels={1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,...
2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2};
numFiles = numel(files)
% sequences = cell(numFiles,1);
for i = 1:numFiles
fprintf("Reading file %d of %d...\n", i, numFiles)
video{i} = readVideo(files{i});
end
numObservations = numel(video);
idx = randperm(numObservations);
N = floor(0.9 * numObservations);
idxTrain = idx(1:N);
imdsTrainData = video(idxTrain);
%imdsTrainData=imdsTrainData1';
labelsTrain1 = labels(idxTrain);
labelsTrain=labelsTrain1';
%imdsTrain={imdsTrainData' labelsTrain};
idxValidation = idx(N+1:end);
imdsValidationData = video(idxValidation);
%imdsValidationData=imdsValidationData1';
labelsValidation1 = labels(idxValidation);
labelsValidation=labelsValidation1';
%imdsValidation={imdsValidationData' labelsValidation};
% %_______________
%imdsTrain=
% imdsValidation
layers = [
imageInputLayer([38 62 3])
convolution2dLayer(3,8,'Padding','same')
batchNormalizationLayer
reluLayer
maxPooling2dLayer(2,'Stride',2)
convolution2dLayer(3,16,'Padding','same')
batchNormalizationLayer
reluLayer
maxPooling2dLayer(2,'Stride',2)
convolution2dLayer(3,32,'Padding','same')
batchNormalizationLayer
reluLayer
fullyConnectedLayer(2)
softmaxLayer
classificationLayer];
options = trainingOptions('adam', ...
'InitialLearnRate',1e-4, ...
'GradientThreshold',2, ...
'Shuffle','every-epoch', ...
'ValidationData',{imdsValidationData,labelsValidation}, ...
'ValidationFrequency',5, ...
'Plots','training-progress', ...
'Verbose',false);
[netLSTM,info] = trainNetwork(imdsTrainData',labelsTrain,layers,options);
  3 Comments
Shilpa Sonawane on 28 Jun 2022
function video = readVideo(filename)
vr = VideoReader(filename);
H = vr.Height;
W = vr.Width;
C = 3;
% Preallocate video array
numFrames = floor(vr.Duration * vr.FrameRate);
video = zeros(H,W,C,numFrames);
% Read frames
i = 0;
while hasFrame(vr)
i = i + 1;
video(:,:,:,i) = readFrame(vr);
end
% Remove unallocated frames
if size(video,4) > i
video(:,:,:,i+1:end) = [];
end
end


Answers (1)

Garmit Pant on 30 Jun 2022
Hello Shilpa,
It is my understanding that you want to build a classifier that takes video as input, and that calling the 'trainNetwork' function on your data throws an error.
As the error message says, trainNetwork accepts predictors as a numeric array, a datastore, or a table; a cell array of sequences is accepted only when the network starts with a sequence input layer. Your network starts with imageInputLayer, but the input you pass, imdsTrainData, is a cell array in which each cell holds all the frames of one video. It is defined as:
imdsTrainData = video(idxTrain);
Each cell of imdsTrainData holds the frames of one video file. To use trainNetwork, you need to pass one of the data types listed in the error message. One way to do so is to convert each video to a sequence of feature vectors by taking, for every frame, the activations of the last pooling layer of the GoogLeNet network ("pool5-7x7_s1"):
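% Pretrained GoogLeNet (requires the GoogLeNet support package).
netCNN = googlenet;
inputSize = netCNN.Layers(1).InputSize(1:2);
layerName = "pool5-7x7_s1";
% Cache the extracted sequences so the videos are processed only once.
tempFile = fullfile(tempdir,"hmdb51_org.mat");
if exist(tempFile,'file')
    load(tempFile,"sequences")
else
    numFiles = numel(files);
    sequences = cell(numFiles,1);
    for i = 1:numFiles
        fprintf("Reading file %d of %d...\n", i, numFiles)
        % files is a cell array in your code, so index it with braces.
        video = readVideo(files{i});
        % centerCrop is a helper function defined in the tutorial linked below.
        video = centerCrop(video,inputSize);
        % One column of features per frame.
        sequences{i,1} = activations(netCNN,video,layerName,'OutputAs','columns');
    end
    save(tempFile,"sequences","-v7.3");
end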
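Once sequences holds one feature matrix per video, the classifier itself needs a sequence input layer rather than imageInputLayer, and the labels need to be categorical rather than a cell array of numbers. The following is only a minimal sketch of that step, not a tuned model: the random stand-in data, the 1024-feature size, the 100 hidden units, and the training settings are all assumptions chosen so the snippet runs on its own.

```matlab
% Stand-in data so the sketch runs by itself (assumption). Replace with the
% real `sequences` from the loop above and your own labels.
numFeatures = 1024;                 % assumed feature length per frame
numObs = 42;
sequences = cell(numObs,1);
for i = 1:numObs
    numFrames = randi([20 40]);     % videos may differ in length
    sequences{i} = rand(numFeatures,numFrames);
end

% Labels must be categorical, one per video.
labels = categorical([repmat("A",21,1); repmat("B",21,1)]);
numClasses = numel(categories(labels));

% Sequence classifier: the BiLSTM keeps only its last output per video.
layers = [
    sequenceInputLayer(numFeatures)
    bilstmLayer(100,'OutputMode','last')
    dropoutLayer(0.5)
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];

options = trainingOptions('adam', ...
    'MiniBatchSize',8, ...
    'InitialLearnRate',1e-4, ...
    'GradientThreshold',2, ...
    'MaxEpochs',5, ...
    'Shuffle','every-epoch', ...
    'Verbose',false);

% Predictors are a cell array of numFeatures-by-numFrames matrices.
netLSTM = trainNetwork(sequences,labels,layers,options);
```

With real data, numFeatures can be read from size(sequences{1},1) instead of being hard-coded.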
You can refer to the following tutorial to read more about video classification using deep learning in MATLAB: https://www.mathworks.com/help/deeplearning/ug/classify-videos-using-deep-learning.html
  2 Comments
Shilpa Sonawane on 1 Jul 2022
Sir/Madam,
I don't want to use a pretrained network such as GoogLeNet.
Is there another way to convert video frames to sequences?
Thank you.
Shilpa Sonawane on 2 Jul 2022
Sir/Madam,
I used the "activations" function in my program, and the error is resolved.
If there is another way to convert video frames to sequences, please let me know.
Thank you.
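
One possible way to build sequences without a pretrained network is to flatten every frame into a single column of raw pixel values, so that each video becomes a (H*W*C)-by-numFrames matrix. This is only a sketch under assumptions: the random stand-in videos and the 38-by-62-by-3 frame size (taken from the imageInputLayer in the question) are placeholders, and raw pixels are a much weaker feature than CNN activations.

```matlab
% Stand-in videos so the sketch runs by itself (assumption). In practice each
% cell would come from readVideo and be H-by-W-by-3-by-numFrames.
H = 38; W = 62; C = 3;
video = cell(4,1);
for i = 1:numel(video)
    video{i} = rand(H,W,C,randi([10 20]));
end

% Flatten each frame into one column of length H*W*C.
sequences = cell(numel(video),1);
for i = 1:numel(video)
    numFrames = size(video{i},4);
    sequences{i} = reshape(video{i},[],numFrames);
end

% Feature length to pass to sequenceInputLayer.
numFeatures = size(sequences{1},1);
```

The resulting cell array can be passed to trainNetwork with a network that starts with sequenceInputLayer(numFeatures). Deep Learning Toolbox also provides sequenceFoldingLayer and sequenceUnfoldingLayer for applying convolution layers to each frame of an image sequence; that route is not shown here.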
