Code To Extract Data From .txt File to Excel

Hi,
I have several text files with two columns and usually 2000+ rows (picture below). I would like to have some code to extract the value from the bottom row of the second column and put it into an Excel file in a certain place. Each of the text files has a different heading/name (same format as picture below though) and I would like to put the value from the text file into Excel under these headings. If necessary, I can create the headings myself and would be happy to just have code which outputs the same heading's value in the same column each time.
Currently I am doing this manually which takes a long time and will need doing lots more in the future.
Does anyone know how to do this and could help me? Happy to answer more questions or give more details.
Many thanks

4 Comments

This code does what I want for one text file.
clc
clear
fid = fopen('b2-moment-z-rfile.out','rt');
linenum1 = textscan(fid, '%f %*f', 'HeaderLines',3);
linenum2 = length(linenum1{1}) + 2
fclose(fid)
fid = fopen('b2-moment-z-rfile.out','rt');
indata = textscan(fid, '%*f %f', 'HeaderLines', linenum2);
celldisp(indata)
fclose(fid);
I now want to make it so I can run it for multiple text files of different names and output the data in such a way that the number from each text file can be copied into a separate column of Excel in one go.
Can anyone help with this?
Thanks
Update on the code using three different files as an example.
clc
clear
% 1 Moment Z
fid = fopen('b2-moment-z-rfile.out','rt');
linenum1 = textscan(fid, '%f %*f', 'HeaderLines',3);
linenum2 = length(linenum1{1}) + 2
fclose(fid);
fid = fopen('b2-moment-z-rfile.out','rt');
data = textscan(fid, '%*f %f', 'HeaderLines', linenum2);
celldisp(data)
fclose(fid);
% 2 Moment Z
fid = fopen('b1-moment-z-rfile.out','rt');
linenum1 = textscan(fid, '%f %*f', 'HeaderLines',3);
linenum2 = length(linenum1{1}) + 2
fclose(fid);
fid = fopen('b1-moment-z-rfile.out','rt');
data = [data textscan(fid, '%*f %f', 'HeaderLines', linenum2)];
celldisp(data)
fclose(fid);
% 3 Moment Z
fid = fopen('b3-moment-z-rfile.out','rt');
linenum1 = textscan(fid, '%f %*f', 'HeaderLines',3);
linenum2 = length(linenum1{1}) + 2
fclose(fid);
fid = fopen('b3-moment-z-rfile.out','rt');
data = [data textscan(fid, '%*f %f', 'HeaderLines', linenum2)];
celldisp(data)
fclose(fid);
This creates the array of the results below but each number has brackets around it which I have to manually delete and therefore lose anytime I saved using the code.
1×3 cell array
{[1106763]} {[1108402]} {[1039780]}
Does anyone know how to get rid of these brackets? I tried using erasePunctuation and isstrprop but could not get them to work. Can anyone help me?
Firstly, if you end up copy/pasting the same code and doing slight tweaks to each copy, there's always a better of doing that. Computers are very good at doing repetitive tasks, so why are you doing the repetition yourself.
Now, with regards to your question, can you attach two different example of the files your dealing with. We don't need the full files, the first 10 rows would suffice. Screenshots are not very useful, we can't import them in matlab. Actual files are much better.
Note that for a number of years now, matlab has functions that make importing text files very easy, see readmatrix or readtable. They're usually much easier to use than textscan, they open and close the file for you, usually figure out the header, delimiter and format themselves, and output directly a matrix or table.
"A lot of the files have unique names so I will have to copy the code and change the names for the text file referenced". Again, copy-pasting and modifying code is the wrong approach. Let the computer do the repetition. The basics would be:
  • write a generic function that takes the full filename as an input and does whatever processing that needs to be done.
  • Obtain the list of file some way, possibly just a dir call with the appropriate filter. At worse, input the list of file as a cell array
  • Loop over the list of file, calling the function for each file.
"This creates the array of the results below but each number has brackets [...] Does anyone know how to get rid of these brackets? I tried using erasePunctuation and isstrprop but could not get them to work"
I'm afraid you're lacking some basic understanding of matlab here. I'm not the free course matlab onramp covers cell arrays, but if you haven't done that course, I'd recommend you go through it anyway. What you show is a cell array containing matrices. The brackets are the way matlab shows to you that the cell array contains matrices. The brackets are not part of the data. They're only for display. Trying to use erasePunctuation on something that is not text at all is meaningless.
In any case, if you use readmatrix you'd be dealing with matrices directly. Please attach examples so I can show you better code.
Thank you for your reply, it is really helpful. What I have done currently is mostly what I’ve picked up googling things.
I am going to be away from my computer for the next few hours so can’t access MATLAB but I’ve attached three files. These three have very similar names but the others won’t have. All of them will be similar though in that column 1 will be a count from 1 to n and column 2 will be the data. I am only interested in the final entry.
Thank you if you’re able to help at all.

Sign in to comment.

 Accepted Answer

As we don't yet know how to identify the files to be processed, nor how exactly the data is to be exported to excel, the below just focuses on the file import:
filelist = {'b1-moment-z-rfile.txt', 'b2-moment-z-rfile.txt', 'b3-moment-z-rfile.txt'}; %it's not clear yet how this is to be obtained
datatoexport = cell(size(filelist)); %container for data to export as this is also not clear
for filenum = 1:numel(filelist)
wholecontent = readtable(filelist{filenum}); %read whole file, readtable automatically detects formatting and header and uses the header to name the table variables
datatoexport{filenum} = wholecontent(end, :); %keep last row of table (header always come with it)
end
With the above, you will get a warning for each file that matlab modifies the header to make it valid variable names (it removes the ( and " and replaces the - by _). The warning can be turned off with
warning('off', 'MATLAB:table:ModifiedAndSavedVarnames');
or matlab can be told not to do the replacement (thus keeping the (" in the variable names) with the 'PreserveVariableNames', false option of readtable.
Once details of how to identify files and what export is desired, the above can be modified accordingly.

13 Comments

That is really helpful thank you!
Below is the list of files I will need to apply this code to - a few more may get added later but will be similar.
These files are generated by a simulation I am doing and will be done several times. I would like to use the code for each simulation to export these files into an Excel table like the one below where file 1, 2 etc are the names of the ones in the list.
Is there any more information that would help you?
Grateful for your help so far.
So, it looks like you want to process all files in a directory that end in rfile.out.
I'm unclear how that maps to the rows and columns of the excel table though. Which file do File 1, File 2, etc. map to? Similarly where do Simulation 1, 2 etc. come from?
Also you said you wanted to keep the last row which consists of 2 values. It appear there's only one of these two values in your output table. Which one is it?
Yeah, they will all end in rfile.out if that is helpful.
The mapping of File 1 to that list I can go through and organise later as others may also be added but for the purpose of this we can just assume file 1 is the top of the list and so on. Simulation 2, 3 etc will be another folder with the same file names as simulation 1 but different values and my intention was to run the code for each one.
I am only interested in the value in the second column of the last row.
but different values and my intention was to run the code for each one. Again, you can design your code so it does the repetitive tasks instead of you.
Assuming you have a master directory with subfolders named 'simulation 1', 'simulation 2', etc. all containing the same number of *rfile.out, then I'd do something like this:
masterdirectory = 'C:\somewhere\somefolder';
simdirpattern = 'Simulation *';
simfilepattern = '*rfile.out';
excelfile = 'C:\anothersomewhere\somefolder\simresults.xlsx';
%get all simulation results in all the simulation directories
simfiles = dir(fullfile(masterdirectory, simdirpattern, simfilepattern);
%figure out which row and column of the output each file maps to. rows are unique directories. columns are unique file names
[~, allsimdirnames] = cellfun(@fileparts, {simfiles.folder}, 'UniformOutput', false); %extract the directory names from the full paths
[simdirnames, ~, destrows] = unique(allsimdirnames);
[simfilenames, ~, destcols] = unique({simfiles.name});
%preallocate matrix that will be exported to excel.
simresults = nan(max(destrows), max(destcols)); %filled with NaN to start with
%import each file in turn, get value in the last column of the last row and put in the corresponding cell of simresults:
for filenum = 1:numel(simfiles)
simdata = readmatrix(fullfile(simfiles(filenum).folder, simfiles(filenum).name), 'FileType', 'text');
simresults(destrows(filenum), destcols(filenum)) = simdata(end); %simdata(end) is the same as simdata(end, end)
end
%everything has been collected export to excel file. Using a table for that so we get automatic row and columns headers
tsimresults = array2table(simresults, 'VariableNames', matlab.lang.makeValidName(simfilenames), 'RowNames', matlab.lang.makeValidName(simdirnames)); %the matlab.lang.makeValidName are optional in newer versions of matlab
writetable(tsimresults, excelfile, 'WriteVariableNames', true, 'WriteRowNames', true);
Bugs and typos are possible in the above as it's untested. The principle is sound however.
edit: bugfix: extract folder name from full path of folder
edit2: bugfix: file was meant to be simfiles
edit3: bugfix: missing 'FileType', 'text' in the readtable call
Thank you, I do not understand the bit about the rows and columns though. Please could you explain it a bit more?
You want the results arranged in a table, with the rows of the table corresponding to simulation directories and the columns corresponding to files. I assume that files in different directory but with the same name would end up in the same column on different rows. This is what my code does. It works out which row (from the directory) and which column (from the file name) to fill. The 3rd output of unique gives you the destination row/column.
However, I've just realised that I use the full path of the folder instead of the folder name itself which is probably not what you want. I'll edit the code to fix.
Of course, it's possible I completely misunderstood what you're trying to achieve.
It sounds exactly like what I want. I have updated top four lines with the file paths etc and ran the code but get the following error on this line:
simdata = readmatrix(fullfile(fill(filenum).folder, fill(filenum).name));
Error using fill
Not enough input arguments.
Do I need to change anything else specific to my computer?
You miscopied. The sample code used "file" but you put in "fill"
Ah yes, I got an error before and accepted MATLAB's suggested fix without noticing that. If it is file instead i get this error:
Unrecognized function or variable 'file'.
I did warn of possible typos! Replace file by simfiles on that line.
great thank you! New error now:
Error using readmatrix (line 148)
The extension '.out' is not a known text or spreadsheet extension. To specify the file type, use 'FileType' with either 'spreadsheet' or 'text'.
e.g opts = detectImportOptions(name,'FileType','text')
I am not sure where exactly to use FileType though, to be honest most of the code is beyond me now.
Guillaume
Guillaume on 18 Mar 2020
Edited: Guillaume on 18 Mar 2020
Oh, of course, well just do as it tells you, adds 'FileType', 'text' to the detectImportOptions calls.
[...] well yes, the error message is a bit misleading since there's no detectImportOptions in the code. It's in the readmatrix call that it needs to be added. See edits.
This worked fantastic, much better than I expected. Thank you for all your help.

Sign in to comment.

More Answers (2)

Accept my answer if it is working for you. Thanks
fid = fopen('12.txt','rt');
indata = textscan(fid, '%f %f', 'HeaderLines',3);
fclose(fid);
%your data is in indata

2 Comments

Hi,
I changed the '12.txt' part to the name of my file and I am getting an error on line 2.
So with the following I get the value from the last row (in this case 2500) as indata.
fid = fopen('b2-moment-z-rfile.out','rt');
indata = textscan(fid, '%f %f', 'HeaderLines',2502);
fclose(fid);
%your data is in indata
What I would like is to repeat this for several text files and all the data be in a way I can copy it into Excel in one go.
Also worth noting the number of rows may change so is there a way to account for this?

Sign in to comment.

To process a sequence of files, place your code in the middle of a for loop over files.
For code samples, see the FAQ: FAQ#How_can_I_process_a_sequence_of_files?

2 Comments

A lot of the files have unique names so I will have to copy the code and change the names for the text file referenced. The issue is the output - as it is in the latest code i posted indata gets overwritten each time. Something like an array where each new result is added to a new column would be good.
Is there any pattern that can be used to distinguish the files you want to process from the other ones you do not want to process in the same directory? For example do you want to process all of the *moment-z-rfile.txt files? If so then you can use
dinfo = dir('*moment-z-rfile.txt');
filenames = {dinfo.name};
numfiles = length(filenames);
for K = 1 : numfiles
thisfile = filenames{K};
[folder, basename, ext] = fileparts(thisfile);
outfile = fileparts(folder, [basename '.out']);
....
end

Sign in to comment.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!