Merge Multiple Excel .csv Files into a Single Cell Array
5 views (last 30 days)
Show older comments
Hello,
I'm currently writing a script that pulls data from similarly formatted excel files into a single cell array. Each .csv has the first n number of rows removed from its array and then copied into a master array for data analytics. The code currently functions, but is quite short of the mark in terms of computation time. At the moment, due to size matching of arrays, I have had to forego moving all data within each .csv at once. Instead, the code operates by merging each .csv row by row. I am using a cell array because each column heading has a different type of dataset. The code must use xlsread, due to running in R2017a.
This creates a significant issue since there are 1600+ .csv files with 760+ rows per file. Leading to approximately 1.2 million loops. Ideally, if I can shorten the functionality by batch merging entire .csv files into the master list, I can speed up processing by replacing 760+ loops with a single loop.
The process also needs to be repeated for multiple folder locations; meaning I will be processing approximately 15,000 .csv files in total. I've already written the code that uses prompt commands for the user to select which folders they would like to analyze. The end result is an array of all data from each of the .csv files that can be analyzed for statistical purposes and read out as a graph/table that includes missing data.
If anyone can suggest ways to improve the computation time of the code below that would be greatly appreciated. Other information regarding data analytics was provided to give a clearer understanding of the end goal.
Thank you,
%excfiles is a list of all paths to .csv in the users selected folder, numfiles is the # of files contained in the selected folder
function [DataCompilation] = Compile(excfiles,numfiles)
%reads the first excfile path and enters the 16th row (containing headers) into the first row of the master cell array (dataset)
[a,b,c]=xlsread(excfiles(1));
header=c(16,:);
dataset(1,:)=header;
%runs for the total number of files contained in the user selected folder
for i=1:numfiles(1)
[num,txt,raw]=xlsread(excfiles(i));
raw([1:16],:)=[]; %removes erroneous data contained in the first 15 cells of each excfile
numrows=size(raw); %determines # of rows in the excfile for the total amount of run times (working row by row)
dataend=size(dataset);
for i=2:numrows(1)
dataset(dataend(1)+(i-1),:)=raw(i,:); %merges indivdual excfiles to the master array
end
end
DataCompilation=dataset; %returns the master array
end
0 Comments
Answers (0)
See Also
Categories
Find more on Spreadsheets in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!