You are now following this question
- You will see updates in your followed content feed.
- You may receive emails, depending on your communication preferences.
Extracting specific data from multiple excel files and create a single matrix from those
51 views (last 30 days)
Show older comments
Hi, I have a file on my computer with close to 1000 excel files and I don't want to manually extract the second row from every excel file manually and combine into a single excel file.
I was wondering how I can do this on Matlab through reading every Excel file in that specific file and taking out all of the columns from the second rows and then combining all that into a single matrix?
Thanks for the help
Accepted Answer
Mathieu NOE
on 31 Aug 2021
hello
this is one example if you want to work out the entire folder
I assumed it would be numeric data so I used importdata (faster)
I also sorted the files names in natural order in case it might be relevant
It works even if your files have different size (number of columns)
clc
clearvars
fileDir = cd;
outfile = 'OUT.xlsx'; % output file name
fileNames = dir(fullfile(cd,'*.xlsx')); % get list of all excel files in directory
fileNames_sorted = natsortfiles({fileNames.name}); % sort file names into order
M= length (fileNames_sorted);
for f = 1:M
% option # 1 for numeric data only using importdata
raw = importdata( fullfile(fileDir, fileNames_sorted{f}));
second_row{f} = raw(2,:); % extract the second row
end
% write all second row lines into a matrix and store it in excel file
writecell(second_row',fullfile(cd,outfile));
14 Comments
Jonas Freiheit
on 31 Aug 2021
Hi Mathieu, I am getting this error reading
Unrecognized function or
variable 'second_row'.
Error in output (line 26)
writecell(second_row',fullfile(cd,outfile));
>>
What should I do?
Thanks
Mathieu NOE
on 31 Aug 2021
hello Jonas
can you check if the for loop is working ok ? do you get an output from this line :
what is displayed in your workspace when you type raw ?
raw = importdata( fullfile(fileDir, fileNames_sorted{f}));
Jonas Freiheit
on 31 Aug 2021
When I type raw, I get >> raw
Unrecognized function or
variable 'raw'.
yep my raw is exactly the same as yours.
Mathieu NOE
on 31 Aug 2021
so this means this line of code could not be executed ....
if you can share a couple of xls files I would like to test on my side...
Jonas Freiheit
on 31 Aug 2021
- 18136a AS ABS BC.CSV
- 18136a Deep Ocean control AS ABS BC(auto).CSV
- 18136a Deep Ocean control.CSV
- 18136b AS ABS BC.CSV
- 18136b Deep Ocean 2%.CSV
- 18136c AS ABS BC.CSV
- 18136c Deep Ccean 2.5%.CSV
- 18136d AS ABS BC.CSV
- 18136d Deep Ocean 3%.CSV
- 18136e AS ABS BC.CSV
- 18136e Deep Ocean 3.5%.CSV
- 18136f AS ABS BC.CSV
- 18136f Deep Ocean 4%.CSV
- 18136g AS ABS BC.CSV
- 18136g Deep Ocean 4.5%.CSV
- 18136h AS ABS BC.CSV
- 18136h Deep Ocean 5%.CSV
- 18136i AS ABS BC.CSV
- 18136i Deep Ocean 6%.CSV
- 18136j AS ABS BC.CSV
- 18136j Deep Ocean 8%.CSV
- 18136k AS ABS BC.CSV
- 18136k Deep Ocean 10%.CSV
- ARTEMIS_SET_control_10%.xlsx
Yeah sure, I'll send you a small folder of samples. I've tried using this folder and it returned an excel file called OUT that only had two zeroes in the excel file.
From the previous test I tried using a folder that contained 1000 excel files and Matlab somehow printed that error possibly because its too hard to process?
Also if this is the case I could create multiple smaller folders and then combine that into the Output matrix that was previously obtained and how would I do that?
Thanks
Mathieu NOE
on 1 Sep 2021
hello again
I simply modified one of your data file (ARTEMIS_SET_control_10%.xlsx) from xlsx to CSV format so they all have the same format. If you need to dig with a mix of CSV and XLSX files , I could update my code.
After that , I had not much to do and this code works like a charm, at least for this batch of 24 files;
see also at the end of my code , if you wish to have the filenames also stored in the OUT file, attached FYI
slightly updated code :
clc
clearvars
fileDir = cd;
outfile = 'OUT.xlsx'; % output file name
fileNames = dir(fullfile(fileDir,'*.csv')); % get list of files in directory
fileNames_sorted = natsortfiles({fileNames.name}); % sort file names into order (https://fr.mathworks.com/matlabcentral/fileexchange/47434-natural-order-filename-sort)
M= length (fileNames_sorted);
for f = 1:M
% option # 1 for numeric data only using importdata
raw = importdata( fullfile(fileDir, fileNames_sorted{f}));
second_row{f} = raw(2,:); % extract the second row
end
% write all second row lines into a matrix and store it in excel file
writecell(second_row',fullfile(cd,outfile)); % without filenames
% writecell([fileNames_sorted' second_row'],fullfile(cd,outfile)); % with filenames stored in column 1
(seems to me a lot of files had the same data inside , just the file names are different)
Jonas Freiheit
on 1 Sep 2021
Sorry this is not working for me, all the files are different but its only printing out 401, 0 for every column.
Its supposed to do (2,:) copy everything in the 2nd row from each excel file and then every row thats copied to copy over into a single excel file or a matrix on matlab. Since there are 24 excel files in this batch there are supposed to be 24 rows and 936 columns for the OUT matrix
Jonas Freiheit
on 1 Sep 2021
Since this is to be used for principal components analysis. I only am interested in the second row since its got the absorption of the spectra and the first row simply is the wavenumbers which is the same for all the excel files and is useless information
Mathieu NOE
on 1 Sep 2021
ok I believe I guess what you really want- which is not what I understood so far
so you want the 2nd column (and not the second row ! ) of each data file
then ok the output size will be 24 rows and 936 columns
as all input files have same dimensions, I could make the code simpler and use writematrix instead of writecell
here code :
clc
clearvars
fileDir = cd;
outfile = 'OUT.xlsx'; % output file name
fileNames = dir(fullfile(fileDir,'*.csv')); % get list of files in directory
fileNames_sorted = natsortfiles({fileNames.name}); % sort file names into order (https://fr.mathworks.com/matlabcentral/fileexchange/47434-natural-order-filename-sort)
M= length (fileNames_sorted);
second_col= [];
for f = 1:M
% option # 1 for numeric data only using importdata
raw = importdata( fullfile(fileDir, fileNames_sorted{f}));
second_col= [second_col raw(:,2)]; % extract the second column
end
% write all second columns lines into a matrix and store it in excel file
writematrix(second_col,fullfile(cd,outfile));
Jonas Freiheit
on 1 Sep 2021
Sorry I have another question, I need to create a { } cell which contains group names within it.
I need to create one thats a 5046x1 cell. containing 1682 entries saying Spot1 then 1682 entries saying Spot 2 then 1682 entries saying Spot 3. This is to group off my spectra, Do you want me to repost this so you can get more votes?
Thanks
Mathieu NOE
on 1 Sep 2021
Hi Jonas
it's not just a question of votes but yes indeed each question / topic should be addressed in a separate post
this way you can also get ore answers because it's not burried in the original post .
More Answers (1)
Ive J
on 31 Aug 2021
Edited: Ive J
on 31 Aug 2021
You can use readmatrix (assuming all values are numeric, otherwise use readtable) or fileDatastore to read those files. Something like this should work:
myfiles = ["file1.xlsx", "file2.xlsx"]; % file names: use dir to generate file names within the target directory
data = [];
for i = 1:numel(myfiles)
tab = readmatrix(myfiles(i)); % see readmatrix for more options
data = [data; tab(2, :)]; % aggregate 2nd row from all files into a new matrix
end
13 Comments
Jonas Freiheit
on 31 Aug 2021
Hi Ive J, Thanks for that, I was wondering how can I generate all the file names into myfiles without writing all the files out?
Thanks
Ive J
on 31 Aug 2021
This would work:
targetDir = pwd; % only if files are within the current directory
filenames = string({dir(fullfile(targetDir, '*.xlsx')).name}); % converted to string to be used in my example above
Jonas Freiheit
on 31 Aug 2021
Hi Ive, should I plug filenames into myfiles as myfiles=[filenames]
Because doing that gives me only 'Output' from the code as the output?
Ive J
on 31 Aug 2021
filenames and myfiles are the same. You can use my example as:
targetDir = pwd; % only if files are within the current directory
myfiles = string({dir(fullfile(targetDir, '*.xlsx')).name});data = [];
for i = 1:numel(myfiles)
tab = readmatrix(myfiles(i)); % see readmatrix for more options
data = [data; tab(2, :)]; % aggregate 2nd row from all files into a new matrix
end
% data is the matrix you're trying to generate from your excel files.
Jonas Freiheit
on 31 Aug 2021
Sorry, I tried this and again didn't work it only printed out 'output'
Ive J
on 1 Sep 2021
What's output? There is no variable called output in my snippet. I used some of your files and it works just fine:
targetDir = pwd; % only if files are within the current directory
myfiles = string({dir(fullfile(targetDir, '*.csv')).name});data = []; % you mentioned you have Excel files, which was wrong
for i = 1:numel(myfiles)
tab = readmatrix(myfiles(i)); % see readmatrix for more options
data = [data; tab(2, :)]; % aggregate 2nd row from all files into a new matrix
end
myfiles =
"18136a AS ABS BC.CSV"
"18136c AS ABS BC.CSV"
"18136d Deep Ocean 3%.CSV"
"18136f AS ABS BC.CSV"
"18136g Deep Ocean 4.5%.CSV"
data =
401.1380 0
401.1380 0
401.1380 0
401.1380 0
401.1380 0
Jonas Freiheit
on 1 Sep 2021
Sorry Ive J, That works but I can't write each file out individually since its 1000 files.
Would you know how to make it work without manually inputting the file names?
Cheers
Ive J
on 1 Sep 2021
I didn't manually read files. The second line gets the names of all CSV files within the directory.
Jonas Freiheit
on 1 Sep 2021
Sorry, I mean its only printing out a 2x5 matrix with 401.1380 and 0 like shown in the data.
Its supposed to be for example a 25x936 matrix if there are 25 excel files containing infrared spectra with 2 rows and 936 columns each. The 1st row only contains the wavenumber data which needs to be ignored and the 2nd row contains the absorption which is different for every excel file and needs to be extracted and then combined into the final matrix.
I'm using this for principal components analysis.
Ive J
on 1 Sep 2021
It's 5X2 matrix because first, I only used 5 CSV files, and secondly, your sample CSV files contained only 2 columns, so it would be simply a 5X2 matrix.
In case you have 1000 files in the folder, myfiles would be a string array of 1000 CSV file names. If each of those CSV files have also 936 columns, then the resulting matrix would be of size 1000X936.
Note that this line
myfiles = string({dir(fullfile(targetDir, '*.csv')).name});
extracts all CSV files within the target directory (path to the folder your CSV files are in), so doesn't matter if there are 5, 1000 or even more CSV files there, myfiles would still contain all those files which then will be looped over to extract the values exist in 2nd line.
Jonas Freiheit
on 1 Sep 2021
Sorry there are 936 columns in each excel file it just doesn't work for some reason..
Ive J
on 1 Sep 2021
Please attach some of these 1000 files (with 936 columns) you're trying to work with.
Also, please be more specific with ...doesn't work for some reason.. What exact error do you get in command window when running my snippet?
Jonas Freiheit
on 1 Sep 2021
Sorry, the error was that it was printing out only 401 and 0. The problem has been solved now I really appreciate the help.
See Also
Categories
Find more on Spreadsheets in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!An Error Occurred
Unable to complete the action because of changes made to the page. Reload the page to see its updated state.
Select a Web Site
Choose a web site to get translated content where available and see local events and offers. Based on your location, we recommend that you select: .
You can also select a web site from the following list
How to Get Best Site Performance
Select the China site (in Chinese or English) for best site performance. Other MathWorks country sites are not optimized for visits from your location.
Americas
- América Latina (Español)
- Canada (English)
- United States (English)
Europe
- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)
- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom(English)
Asia Pacific
- Australia (English)
- India (English)
- New Zealand (English)
- 中国
- 日本Japanese (日本語)
- 한국Korean (한국어)