How to add entries to a growing matrix in MATLAB?

In short, I want to loop through some folders and access data from files within these folders. As I do this, I need to add single entries to a growing matrix. I know that I want the matrix to have 10 columns, but the number of rows which correspond to a single folder is undetermined. Is there a way to add entries to one row of a matrix and then keep adding new entries to the following row in a for loop?
I'm not really sure how to do this...Thank you!!

Answers (2)

Image Analyst on 25 Aug 2021
Yes, just allocate way more rows than you need, keep track of how many you've inserted so far, then crop at the end when you're all done adding rows.
% Preallocate 20 thousand rows - more than you think you'll ever need.
% zeros() creates the array; 10 columns to match the question.
array = zeros(20000, 10);
% Make a pointer to the last row that we've used.
lastRow = 0;
% Loop over all files, reading in the data and inserting it
% into the array at the appropriate rows.
for k = 1 : numberOfFolders
    % Get the data somehow.
    thisData = ReadYourData(filename); % You build the filename and write this function.
    % Find out its size:
    [rows, columns] = size(thisData);
    % Insert the data
    row1 = lastRow + 1;
    row2 = row1 + rows - 1;
    array(row1 : row2, 1 : columns) = thisData;
    % Move the row pointer to the last row we've inserted so far.
    lastRow = lastRow + rows;
end
% Now crop off any unused rows:
array = array(1:lastRow, :);
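If you are worried that the initial guess might be too small, here is a minimal sketch of the same preallocate-and-crop idea that doubles the capacity whenever it runs out. The synthetic thisData, the tiny starting capacity, and the variable names capacity and numCols are assumptions for illustration only; substitute your own file reading.

```matlab
% Sketch: preallocate, double the capacity if the guess was too small, crop at the end.
numCols  = 10;                      % known column count (from the question)
capacity = 4;                       % deliberately small initial guess, for illustration
array    = zeros(capacity, numCols);
lastRow  = 0;                       % last row filled so far

for k = 1 : 5
    % Stand-in for real file data: k rows, each filled with the value k.
    thisData = k * ones(k, numCols);
    rows = size(thisData, 1);

    % Grow by doubling until the new rows fit.
    while lastRow + rows > capacity
        array = [array; zeros(capacity, numCols)]; %#ok<AGROW>
        capacity = 2 * capacity;
    end

    % Insert the new rows and advance the row pointer.
    array(lastRow + 1 : lastRow + rows, :) = thisData;
    lastRow = lastRow + rows;
end

% Crop off any unused rows.
array = array(1 : lastRow, :);
```

Doubling keeps the number of reallocations small even when the first guess is far off, and the final crop works the same way as before.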

Walter Roberson on 25 Aug 2021
Growing a matrix row by row tends to lead to a lot of matrix reallocations unless it is programmed carefully. It is easier to avoid that by using cell arrays to hold the parts.
projectdir = '.'; %or as appropriate
dinfo = dir(projectdir);
dinfo(~[dinfo.isdir]) = []; %folders only
dinfo(ismember({dinfo.name}, {'.', '..'})) = []; %remove . and ..
foldernames = fullfile(projectdir, {dinfo.name});
numfolders = length(foldernames);
data_cell = cell(numfolders,1);
for foldidx = 1 : numfolders
    thisfolder = foldernames{foldidx};
    finfo = dir(fullfile(thisfolder, '*.txt')); %list the .txt files in this folder
    filenames = fullfile(thisfolder, {finfo.name});
    numfiles = length(filenames);
    folder_cell = cell(numfiles,1);
    for fileidx = 1 : numfiles
        thisfile = filenames{fileidx};
        this_data = readmatrix(thisfile); %or as appropriate to read data
        folder_cell{fileidx} = this_data;
    end
    folder_data = vertcat(folder_cell{:}); %merge data for one folder
    data_cell{foldidx} = folder_data;
end
all_data = vertcat(data_cell{:}); %merge data from all folders
  3 Comments
Walter Roberson on 26 Aug 2021
thisfolder = foldernames{foldidx};
At that point in the code, foldernames is a cell array of names of folders, and the code is iterating from 1 to the number of names. That particular statement is extracting the name indexed by foldidx from the cell array of folder names. The result, thisfolder, is a character vector that is the path to a particular folder.
foldernames was created as
foldernames = fullfile(projectdir, {dinfo.name});
dinfo is a struct array that has a field named name and for each entry it is a character vector. {dinfo.name} takes all of those character vectors and puts them together into a cell array of character vectors.
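As a small illustration of that expansion, here is a sketch using a hand-built struct array in place of the output of dir(). The variable s and the names in it are made up for the example.

```matlab
% Hypothetical struct array standing in for the output of dir().
s = struct('name', {'runA', 'runB', 'runC'});   % 1x3 struct array

% s.name expands to three separate values; the braces collect them
% into a 1x3 cell array of character vectors.
names = {s.name};
numnames = numel(names);
```

Here names{2} is the character vector 'runB', which is the same kind of indexing used for foldernames{foldidx}.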
Does {finfo.name} go into the finfo directory listing and get all the names of the files in the current folder?
Yes, in the sense that at that point they have already been recorded in the struct array finfo and {finfo.name} is creating a cell array of character vectors of the file names. Then the fullfile() step is putting the foldername in front of each one. The result is that each entry in filenames has the path and file name to reach it -- no need to cd() around, no need to worry about attaching the correct directories in the inner loops.
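To see the fullfile() step in isolation, here is a minimal sketch with a made-up folder and made-up file names (none of these need to exist on disk, since fullfile() only builds the text of the path).

```matlab
% Hypothetical folder and file names, for illustration only.
thisfolder = fullfile('data', 'run1');
names = {'a.txt', 'b.txt'};

% With a cell array input, fullfile() returns a cell array with the
% folder placed in front of every name.
filenames = fullfile(thisfolder, names);
numfiles = numel(filenames);
```

Each filenames{k} can then be handed straight to the reading function, whatever the current directory is.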
In the code I posted, if all goes well, then at the end all_data is a single numeric array containing all of the entries from all of the files in all of the folders. mean(all_data) would give you a column-by-column mean.
If you needed a folder-by-folder mean instead of a mean of all of the data over all of the files in all of the folders, then you would take the mean() of folder_data -- folder_data is the data for all of the files put together for one particular folder.
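A minimal sketch of the two kinds of mean, assuming data_cell has already been filled with one numeric matrix per folder as in the code I posted. The small matrices and the name folder_means are made up for illustration.

```matlab
% Hypothetical per-folder matrices standing in for each folder_data.
data_cell = {[1 2; 3 4], [10 20; 30 40; 50 60]};

numfolders = numel(data_cell);
numcols = size(data_cell{1}, 2);
folder_means = zeros(numfolders, numcols);
for foldidx = 1 : numfolders
    % The dimension argument 1 forces a column-by-column mean even if
    % a folder happens to contribute only a single row.
    folder_means(foldidx, :) = mean(data_cell{foldidx}, 1);
end

% Mean over all of the rows from all of the folders together.
all_data = vertcat(data_cell{:});
overall_mean = mean(all_data, 1);
```

Note that the overall mean is generally not the mean of the folder means, because folders with more rows carry more weight.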
The code I wrote uses readmatrix because I made a guess that all of your columns are numeric. If some of your columns are not numeric, then the readmatrix() would have to be changed.
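For example, if a file mixed a text column with a numeric column, one possible replacement is readtable(). This is only a sketch: the file contents, the column names label and value, and the comma delimiter are assumptions, and your files may need different options.

```matlab
% Write a small hypothetical file with one text column and one numeric column.
tmpfile = [tempname, '.txt'];
fid = fopen(tmpfile, 'w');
fprintf(fid, 'label,value\n');
fprintf(fid, 'a,1.5\n');
fprintf(fid, 'b,2.5\n');
fclose(fid);

% readtable() keeps each column in its own type instead of forcing
% everything to be numeric.
T = readtable(tmpfile, 'Delimiter', ',', 'ReadVariableNames', true);
values = T.value;     % numeric column
labels = T.label;     % text column

% Remove the temporary file.
delete(tmpfile);
```

The numeric columns can then be pulled out of the table and stored in folder_cell the same way as before.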
Image Analyst on 26 Aug 2021
If you want a list of all the filenames, and a list of all the foldernames where each file lives, then you can do this:
projectdir = pwd; % Or whatever folder you want to start in.
filePattern = fullfile(projectdir, '**', '*.*'); % The ** makes dir() recurse into subfolders.
dinfo = dir(filePattern);
allFileNames = {dinfo.name};
allFolderNames = {dinfo.folder};
% Find rows where the filename is . or ..
dirRows = ismember({dinfo.name}, {'.', '..'});
% Get a list of all the filenames, and a list of what folders they are in.
allFileNames = allFileNames(~dirRows);
allFolderNames = allFolderNames(~dirRows);
allFileNames and allFolderNames will both have the same number of items in them and they're synced up, so for example allFolderNames{37} contains the folder where allFileNames{37} lives.
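Because the two lists are synced up, you can combine them entry by entry to get the full path of every file. A minimal sketch, with made-up folder and file names standing in for the results of dir():

```matlab
% Hypothetical synced lists, for illustration only.
allFolderNames = {fullfile('data', 'a'), fullfile('data', 'b')};
allFileNames   = {'x.txt', 'y.txt'};

% Pair up folder k with file k to build each full file name.
fullFileNames = cellfun(@fullfile, allFolderNames, allFileNames, ...
    'UniformOutput', false);
```

fullFileNames{k} is then the complete path to the k-th file.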
To compare, I ran both on my computer, on a folder containing 5 folders and 588 files, plus one folder nested below that with 81 or so files. Walter's code (on my machine) gave 669 file names and 6 folder names (including the one nested folder), whereas my code gives 669 file names and 669 folder names, each matched up with the corresponding file name. Use whichever way you want.
