Efficiently import text file with irregular structure

I have data like the attached sample. I'd like to read it in efficiently and end up with a table with the following columns: Value1, Value2, Header. Note that the header is repeated for many value pairs; this is to facilitate GroupBy sorting. An example of the desired output is below.
I'm not sure where to start and would appreciate any help. I'm using MATLAB R2018a.
Ex. desired output:
Val1 Val2 info
27.32 32.8 'Header i do need #1'
27.33 32.68 'Header i do need #1'
27.05 32.73 'Header i do need #1'
27.71 32.9 'Header i do need #1'
27.71 32.71 'Header i do need #1'
etc. etc. etc.
27.57 32.41 'Header i do need #2'
27.43 32.66 'Header i do need #2'
27 32.27 'Header i do need #2'
27.05 32.27 'Header i do need #2'
27.13 32.37 'Header i do need #2'
etc. etc. etc.
27.49 32.35 'Header i do need #3'
27.84 32.17 'Header i do need #3'
27.83 32.16 'Header i do need #3'
27.07 32.44 'Header i do need #3'
27.66 32.77 'Header i do need #3'
etc. etc. etc.

Answers (1)

per isakson on 12 Mar 2019
Edited: per isakson on 13 Mar 2019
If you need further help, please show a three-line example of the output you want from example.txt.
Working code
Assumptions
  • The text file fits in memory
  • The first line with "*" in the first position indicates the beginning of data
  • Lines beginning with "#" (in the data part) indicate the start of a block
Approach
  • Read the entire file into a character array.
  • Split the array into a cell array of characters, with one block in each cell
  • Loop over all blocks, parse the blocks and put the result in a cell array
  • Pre-allocate output variables based on the size of the cell array and the data it contains
  • Loop over all blocks and put the data into a table
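The splitting and parsing steps can be tried in isolation on a small in-memory string. This is only a sketch: the sample text below is hypothetical (it is not the asker's example.txt), and it assumes that block headers start with "# " and that each data line holds two numbers.

```matlab
% Hypothetical sample text, built in memory (NOT the real example.txt)
str = sprintf( '%s\n', '# header A', '1.0 2.0', '3.0 4.0' ...
                     , '# header B', '5.0 6.0' );

% Split at every line that starts with "#"; the header lines themselves
% are returned in "matches", the text between them in "blocks"
[ blocks, matches ] = strsplit( str, '(?m)^#[^\r\n]*' ...
                              , 'DelimiterType','RegularExpression' );
blocks(1) = [];     % text before the first header (empty here)

% Parse the first block into two numeric columns
num = textscan( blocks{1}, '%f%f' );
```

After this runs, blocks and matches have one element per header, and num is a 1-by-2 cell array with one numeric column vector per cell.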
>> T = cssm( 'h:\m\cssm\example.txt' )
T =
76×3 table
Var1 Var2 info
_____ _____ __________________
27.32 32.8 "header 1 do need"
27.33 32.68 "header 1 do need"
27.05 32.73 "header 1 do need"
...
27.53 32.66 "header 2 do need"
27.68 32.98 "header 2 do need"
27.77 32.27 "header 2 do need"
27.49 32.35 "header 3 do need"
27.84 32.17 "header 3 do need"
27.83 32.16 "header 3 do need"
where
function T = cssm( ffs )
    str = fileread( ffs );                  % read the entire file
    ixs = find( str=='*', 1,'first' ) +1;   % find first position of interest
    str = str( ixs : end );                 % strip off leading comments
    % split the text array into blocks
    [ blocks, matches ] = strsplit( str, '(?m)^#[^\r\n]*' ...
                                  , 'DelimiterType','RegularExpression' );
    blocks(1) = [];     % delete whatever comes before the first block header
    % read the blocks of text
    len = length( blocks );
    num = cell( len, 2 );
    for jj = 1 : len
        num(jj,:) = textscan( blocks{jj}, '%f%f' );
    end
    heights = cellfun( @numel, num(:,1) );
    % preallocate a table
    T = table( 'Size'           , [sum( heights ),3] ...
             , 'VariableTypes'  , {'double','double','string'} ...
             , 'VariableNames'  , {'Var1','Var2','info'} );
    % add data to table
    ix1 = 1;
    for jj = 1 : len
        ix2 = ix1 + heights(jj) - 1;
        T.Var1(ix1:ix2) = num{jj,1};
        T.Var2(ix1:ix2) = num{jj,2};
        % drop the leading "# " from the header line and repeat it per row
        T.info(ix1:ix2) = repmat( string(matches{jj}(3:end)), heights(jj),1 );
        ix1 = ix2 + 1;
    end
end
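Since the question mentions grouping by header, here is a minimal sketch of one way to do that with a table of this shape. The small table below is a hypothetical stand-in for the output of cssm, and taking the mean per group is only an illustration; the thread does not say which statistic is wanted.

```matlab
% Hypothetical stand-in for the table returned by cssm
T = table( [1;3;5], [2;4;6], ["a";"a";"b"] ...
         , 'VariableNames', {'Var1','Var2','info'} );

% Assign a group number to each row based on the header text
[ G, names ] = findgroups( T.info );

% Apply a function per group; here the mean of Var1 in each group
meanVar1 = splitapply( @mean, T.Var1, G );
```

Here names lists the distinct headers and meanVar1 holds one value per header, in the same order.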
  2 Comments
newbie9 on 12 Mar 2019
thank you @per isakson. I am looking at the pages to which you linked and am still a little stuck--I have added an example desired output
per isakson on 13 Mar 2019
Edited: per isakson on 13 Mar 2019
I added working code to the answer. Note that I modified the info texts in the text file, example.txt.
