How To Load Multiple Text Files (specific context)

Hello MATLAB community, I would like to load many text files (all with the same number of rows and columns) from one folder and collect the 2nd column of each file into a single matrix.
Here's an example: for 30 text files, the resulting matrix would have 30 columns and as many rows as the files contain (specifically, 2048 rows each).
But here's the catch: there's a multi-line header (about 8 lines) before the data, and the values are separated by semicolons (;).
One of the text files is attached as an example.
Also, the names of the text files do NOT follow any pattern; they are quite random. I've already asked a very similar question here, but I wasn't considering the header. One helpful user wrote the script below, and I'd like to tweak it a little to include the right parameters for textscan.
% Set input folder
input_folder = 'C:\Users\Cotet\Downloads';
% Read all *.txt files from input folder
% NOTE: This creates a MATLAB struct with a bunch of info about each text file
% in the folder you specified.
files = dir(fullfile(input_folder, '*.txt'));
% Get full path names for each text file
file_paths = fullfile({files.folder}, {files.name});
% Read data from files, keep second column
for i = 1 : numel(file_paths)
% Read data from ith file.
% NOTE: If your file has a text header, missing data, or
% uses non white-space delimiters, you should check out the
% documentation for textread to determine which options to use.
data = textscan(file_paths{i}, '');
% Save second data column to matrix
% NOTE: Your data files all need to have the same number of rows for this to work
A(:, i) = data(:, 2);
end
The part I'm concerned with is this note:
% NOTE: If your file has a text header, missing data, or
% uses non white-space delimiters, you should check out the
% documentation for textread to determine which options to use.
I've tried many things, but was ultimately unsuccessful.
Thank you so much in advance.
Thomas Côté on 10 Jun 2019
Thanks Bob for the response. Unfortunately, when I use the formatSpec and your code, it returns one row of empty cells (there should be 2048 rows and 4 columns). However, I think you're onto something. The FOR loop runs to the end, which is a good thing; by that I mean this:
data = textscan(file_paths{i}, format, 'headerlines', 8);
% Save second data column to matrix
% NOTE: Your data files all need to have the same number of rows for this to work
A(:, i) = data(:, 2);
The data has 2048 rows and 4 columns. Then I ask MATLAB to store only the 2nd column, and the FOR loop does the same for my other 30 files. Since I have 31 files in total, I should end up with a matrix containing 31 columns (the 2nd column of each file) and 2048 rows (all the values of each of those 2nd columns).
Now, I have 31 columns as desired, but only 1 row with empty values. How could we fix this?
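A likely explanation, sketched below under the assumption that `format` is a numeric format such as `'%f %f %f %f'`: when the first argument of textscan is a character vector instead of a file identifier returned by fopen, textscan scans that character vector itself as the input text. Here that text is the file path, a single line, so with 'HeaderLines', 8 it is skipped entirely and every output column is empty. The path below is hypothetical; the fix is to open each file with fopen and pass the resulting identifier to textscan.

```matlab
% Hypothetical path: textscan receives this string as text, not the file contents
path_string = 'C:\Users\Cotet\Downloads\example.txt';

% Same call pattern as in the loop (the format string is an assumption)
data = textscan(path_string, '%f %f %f %f', 'Delimiter', ';', 'HeaderLines', 8);

% data is a 1-by-4 cell array; the only line of "text" was skipped as a header,
% so each column comes back empty
disp(cellfun(@numel, data))

% data(:, 2) is a 1-by-1 cell holding that empty column,
% which is why A ends up as one row of empty cells
col = data(:, 2);
disp(class(col))
```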
Thomas Côté on 10 Jun 2019
Edited: Thomas Côté on 10 Jun 2019
Oh, also, haha! The number '9 189.95' is actually just 189.95 (on the 9th row). You see "9 189.95" because the commenter "dpb" copied and pasted my text file into a MATLAB script. The numbers should read like this (text file attached):
189.95; 424.600; 0.000; 0.000
190.09; 421.600; 0.000; 0.000
190.24; 427.600; 0.000; 0.000
190.38; 450.600; 0.000; 0.000
190.53; 421.600; 0.000; 0.000
190.68; 398.600; 0.000; 0.000
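For rows in this layout, a minimal sketch of the original textscan loop could open each file with fopen, skip the header, and take the second field. It assumes 8 header lines and four numeric fields per row; the sample file, its placeholder header text, and the 3-row size are hypothetical stand-ins for the real 2048-row files.

```matlab
% Hypothetical sample file: 8 placeholder header lines followed by rows
% in the "value; value; value; value" layout shown above
sample_file = [tempname '.txt'];
fid = fopen(sample_file, 'w');
for k = 1:8
    fprintf(fid, 'Header line %d\n', k);
end
fprintf(fid, '189.95; 424.600; 0.000; 0.000\n');
fprintf(fid, '190.09; 421.600; 0.000; 0.000\n');
fprintf(fid, '190.24; 427.600; 0.000; 0.000\n');
fclose(fid);

file_paths = {sample_file, sample_file};   % stand-in for the list built with dir
n_rows = 3;                                % 2048 for the real files
A = zeros(n_rows, numel(file_paths));      % preallocate the numeric result

for i = 1:numel(file_paths)
    % textscan needs a file identifier from fopen, not the path string
    fid = fopen(file_paths{i}, 'r');
    if fid < 0
        error('Could not open %s', file_paths{i});
    end
    % Four numeric fields separated by ';', skipping the 8 header lines
    data = textscan(fid, '%f %f %f %f', 'Delimiter', ';', 'HeaderLines', 8);
    fclose(fid);
    % Brace indexing extracts the numeric column from the 1-by-4 cell array
    A(:, i) = data{2};
end

delete(sample_file);
disp(A)
```

Preallocating A as a numeric matrix also avoids the cell-array result that plain parentheses indexing (`data(:, 2)`) produces.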


Accepted Answer

Guillaume on 10 Jun 2019
As dpb suggested, use one of the modern file import functions such as readtable or readmatrix instead of the old textscan. These can figure out the format of your file on their own or, if they struggle a bit, have plenty of easy-to-understand options to help them along. They're also a lot more configurable, particularly if you use detectImportOptions.
For example, your text file is easily decoded with:
spectrum = readtable('1903395U1_04Jun19_154040_0001.Raw8.txt', 'HeaderLines', 8)
or for a neater table:
opts = detectImportOptions('1903395U1_04Jun19_154040_0001.Raw8.txt', 'ExpectedNumVariables', 4); %only needed once for all the files that follow the same format
spectrum = readtable('1903395U1_04Jun19_154040_0001.Raw8.txt', opts)
detectImportOptions automatically figures out that the header is 8 lines, that the delimiter is ; and that the column names are on the 6th row. I've told it that there are only 4 variables despite the header having 5 names (why is there a 'scope'?).
You can easily wrap that in a loop over all the files. detectImportOptions is only needed once if all the files follow the same format. You can store the table from each file in a cell array, but if your aim is to run statistics across the files, you'd be better off storing everything as one flat table with an additional variable indicating which file the data comes from. After that you can use groupsummary or similar to compute your statistics all at once.
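The flat-table idea can be tried on a tiny in-memory table before touching real files. This is a sketch only: the values, the file names in Source, and the restriction to a single Sample variable are hypothetical; groupsummary returns one row per distinct Wave value with a GroupCount column and one column per method/variable pair.

```matlab
% Hypothetical flat table: two files ("a.txt", "b.txt"), two wavelengths each
Wave   = [189.95; 190.09; 189.95; 190.09];
Sample = [424.6; 421.6; 426.6; 423.6];
Source = {'a.txt'; 'a.txt'; 'b.txt'; 'b.txt'};
spectra = table(Wave, Sample, Source);

% One output row per distinct Wave value, with GroupCount,
% mean_Sample and std_Sample computed across the files
stats = groupsummary(spectra, 'Wave', {'mean', 'std'}, 'Sample');
disp(stats)
```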
So the code would be something like:
%Get list of files. You haven't explained how these can be obtained.
filelist = dir('C:\somefolder\*.txt');
%Loop to read all files:
spectra = cell(size(filelist)); %stored in a cell array at first
opts = detectImportOptions(fullfile(filelist(1).folder, filelist(1).name), 'ExpectedNumVariables', 4); %only needed once
for fileidx = 1:numel(filelist)
    spectrum = readtable(fullfile(filelist(fileidx).folder, filelist(fileidx).name), opts); %read file
    spectrum.Source = repmat({filelist(fileidx).name}, height(spectrum), 1); %add a variable indicating the source. Maybe you want to use only part of the filename
    spectra{fileidx} = spectrum;
end
%flatten it all into one table
spectra = vertcat(spectra{:});
%compute some stats, e.g. mean and standard deviation of spectra at each wavelength across the files
groupsummary(spectra, 'Wave', {'mean', 'std'}, {'Sample', 'Dark', 'Reference'})
Code untested. There might be typos. Read the error messages carefully. Note that I'm using meaningful variable names instead of the utterly useless A.
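If all that's needed is the plain numeric matrix of 2nd columns from the original question, readmatrix can produce it directly. The sketch below assumes 8 header lines and ';' as the delimiter and passes both explicitly; the temporary folder, file names, and 2-row contents are hypothetical stand-ins for the real folder.

```matlab
% Hypothetical setup: a temporary folder with two files in the same layout
folder = tempname;
mkdir(folder);
names = {'first.txt', 'second.txt'};
for f = 1:numel(names)
    fid = fopen(fullfile(folder, names{f}), 'w');
    for k = 1:8
        fprintf(fid, 'Header line %d\n', k);
    end
    % Two data rows per file; the 2nd column differs between files
    fprintf(fid, '%.2f; %.3f; 0.000; 0.000\n', [189.95 190.09; 424.6 + f, 421.6 + f]);
    fclose(fid);
end

% Read every .txt file, whatever its name, and keep the 2nd column of each
files = dir(fullfile(folder, '*.txt'));
cols = cell(1, numel(files));
for i = 1:numel(files)
    M = readmatrix(fullfile(files(i).folder, files(i).name), ...
        'NumHeaderLines', 8, 'Delimiter', ';');
    cols{i} = M(:, 2);
end
A = [cols{:}];   % one column per file

rmdir(folder, 's');
disp(A)
```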
Thomas Côté on 10 Jun 2019
Thanks/Merci Guillaume, I really appreciate your help, and it worked! I made a few small modifications; here's the working script I'll use:
%Get list of files. You haven't explained how these can be obtained. God drops the files here!
filelist = dir('C:\Users\Cotet\Desktop\Calendrier de Travail\06 - Juin\4 juin\Thomas - 4200\*.txt');
%Loop to read all files:
spectra = cell(size(filelist)); %stored in a cell array at first
opts = detectImportOptions(fullfile(filelist(1).folder, filelist(1).name), 'ExpectedNumVariables', 4);
for fileidx = 1:numel(filelist)
spectrum= readtable(fullfile(filelist(fileidx).folder, filelist(fileidx).name), opts); %read file
spectrum.Source = repmat({filelist(fileidx).name}, height(spectrum), 1); %add a variable indicating the source. Maybe you want to use only part of the filename
spectra{fileidx} = spectrum;
AllSpec(:, fileidx) = spectrum(:, 2);
end
utterly_useless_A = table2array(AllSpec);
% Calculate the average of the rows (second dimension) of utterly_useless_A:
avg = mean(utterly_useless_A, 2);
Spectrum_Avg = [table2array(spectrum(:,1)) avg];
I hope you don't mind the "utterly useless A". I really dig that name haha!
Have a great day!
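In the working script, AllSpec is built up as a table and then converted with table2array. A possible alternative, sketched here with small hypothetical tables in place of the ones readtable returns, is to index each table with braces, which yields its numeric contents directly, and fill a preallocated matrix. It assumes all files share the same wavelength column.

```matlab
% Hypothetical stand-ins for the per-file tables returned by readtable
Wave = [189.95; 190.09; 190.24];
spectra = {table(Wave, [424.6; 421.6; 427.6]), ...
           table(Wave, [426.6; 423.6; 429.6])};

n_rows = height(spectra{1});
AllSpec = zeros(n_rows, numel(spectra));   % preallocated numeric matrix
for k = 1:numel(spectra)
    T = spectra{k};
    AllSpec(:, k) = T{:, 2};               % braces return a numeric column, not a table
end

avg = mean(AllSpec, 2);                    % average across files for each row
Spectrum_Avg = [T{:, 1}, avg];             % wavelength next to the averaged value
disp(Spectrum_Avg)
```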



Release

R2019a
