How to import Text File with 2 different Delimiters (how to organize header data and numeric data)

Question

Ulrich Bretz on 1 Nov 2017

Direct link to this question

https://nl.mathworks.com/matlabcentral/answers/364487-how-to-import-text-file-with-2-different-delimiters-how-to-organize-header-data-and-numeric-data

Edited: Cedric on 3 Nov 2017

I want to import a text file. This contains a header (with space as delimiter) and data (tab delimited).

The txt-file looks like this:

FORMAT TAB_DELIMITED 
NUM_HEADER_BLOCKS 162 
NUM_PARAMS 646 
PT_COUNT.CND_1 3895 
FRAMES.CND_1 16 
FILE_TYPE TIME_HISTORY 
OPERATION RSP_TO_TAB 
DATA_TYPE ASCII_FLOATING_POINT 
DATE Fri Jun 23 11:20:24 2017 
DELTA_T 9.765625e-02 
TOTAL_T 3.803711e+02 
PTS_PER_FRAME 256 
PTS_PER_GROUP 256 
CHANNELS 120 
. 
. 
NUM_ZEROS 5 %end of header with line index 646
RfLongPositionFbk   RfLatPositionFbk     ...... %start of tab delimited area with the data (120 channels)
mm       mm 
-12.6182   -4.071238 
-12.6192   -4.070237 
-12.6182   -4.069237

I want to find the line that contains "NUM_PARAMS" and read its numeric value, which tells me the size of the header section.
After that I want to read the file up to line 646 as name/value pairs (parameter name first, then its value). Then I want to read the data (tab delimited, 120 channels). It would be nice if I could name the channels after the names shown in the line above the units of measurement.

I started by reading the full text file with the following code, to import the header and search for NUM_PARAMS:

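fid = fopen(fileName, 'r');                        % open the file before calling textscan
s = textscan(fid, '%s%s', 'delimiter', ' ');       % two space-delimited string columns
fclose(fid);
idx_NUM_PARAMS = find(strcmp(s{1}, 'NUM_PARAMS'), 1, 'first');
NUM_PARAMSdbl = str2double(s{1,2}{idx_NUM_PARAMS,1});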

But this also imports the data section as strings, which is unusable because that section uses a different delimiter.

So I read out the data in a second step:

 dataTable = readtable(fileName, 'Delimiter', '\t', 'headerLines',NUM_PARAMSdbl+4,'ReadVariableNames',true);

But I cannot name the columns with the channel names; readtable only picks up the line right above the data (the units of measurement).

Thank you for any hint on how to solve my problem.

Answer 1

Cedric on 1 Nov 2017

Direct link to this answer

https://nl.mathworks.com/matlabcentral/answers/364487-how-to-import-text-file-with-2-different-delimiters-how-to-organize-header-data-and-numeric-data#answer_288852

Edited: Cedric on 1 Nov 2017

data.txt

You may not need to use header information for parsing your file. Look at this example (applied to data.txt attached):

content = fileread( 'data.txt' ) ;
% - Split header/data.
pos = strfind( content, 'RfLongPositionFbk' ) ;
header = strtrim( content(1:pos-1) ) ;
data   = content(pos:end) ;
% - Header -> struct with numeric values when possible.
header = regexp( header, '^(\S+)\s+([^\r\n]+)', 'tokens', 'lineanchors' ) ;
header = vertcat( header{:} ) ;
fNames = regexprep( header(:,1), '\W', '_' ) ;
values = strtrim( header(:,2) ) ;
buffer = str2double( values ) ;
isNum  = ~isnan( buffer ) ;
values(isNum) = num2cell( buffer(isNum) ) ;
header = cell2struct( values,fNames ) ;
% - Data -> num array.
data = cell2mat( textscan( data, '%f %f', 'headerlines', 2 )) ;

Running this, you get:

 >> header
 header = 
  struct with fields:
               FORMAT: 'TAB_DELIMITED'
    NUM_HEADER_BLOCKS: 162
           NUM_PARAMS: 646
       PT_COUNT_CND_1: 3895
         FRAMES_CND_1: 16
            FILE_TYPE: 'TIME_HISTORY'
            OPERATION: 'RSP_TO_TAB'
            DATA_TYPE: 'ASCII_FLOATING_POINT'
                 DATE: 'Fri Jun 23 11:20:24 2017'
              DELTA_T: 0.0977
              TOTAL_T: 380.3711
        PTS_PER_FRAME: 256
        PTS_PER_GROUP: 256
             CHANNELS: 120
            NUM_ZEROS: 5
 >> data
 data =
  -12.6182   -4.0712
  -12.6192   -4.0702
  -12.6182   -4.0692

7 Comments
Cedric on 3 Nov 2017

Edited: Cedric on 3 Nov 2017

I still don't understand if you really need the information in the header or not (if it was just for getting the number of lines in the header and the number of channels). Assuming that you just want the data and the channel names and units, the following works:

content = fileread( '012f1ri(Forum).txt' ) ;
% - Extract # parameters and channels.
nParams   = str2double( regexp( content, '(?<=NUM_PARAMS\s+)\S+', 'match', 'once' )) ; 
nChannels = str2double( regexp( content, '(?<=CHANNELS\s+)\S+', 'match', 'once' )) ; 
% - Read channel names and units, at lines nParams plus 2 and 3.
fmtSpecNames = repmat( '%s', 1, nChannels ) ;
channelNames = textscan( content, fmtSpecNames, 1, 'HeaderLines', nParams+2 ) ;
channelNames = horzcat( channelNames{:} ) ;
channelUnits = textscan( content, fmtSpecNames, 1, 'HeaderLines', nParams+3 ) ;
channelUnits = horzcat( channelUnits{:} ) ;
% - Read channel data, from line nParams plus 4 on.
fmtSpecData = repmat( '%f', 1, nChannels ) ;
channelData = textscan( content, fmtSpecData, 'HeaderLines', nParams+4 ) ;
channelData = cell2mat( channelData ) ;

After running this, variables channelNames, channelUnits, and channelData contain names, units and data respectively.

Then we can convert to struct array, table, or whatever is best for you, and extract data from the header as well if needed.
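As one possible sketch of the table conversion, assuming channelNames, channelUnits, and channelData have the shapes produced above: the two-channel sample values and the names varNames and T are only stand-ins for illustration. Channel names are passed through matlab.lang.makeValidName and matlab.lang.makeUniqueStrings (both available from R2014a) because table variable names must be valid, unique identifiers.

```matlab
% Hypothetical two-channel sample standing in for the variables read above.
channelNames = {'RfLongPositionFbk', 'RfLatPositionFbk'};
channelUnits = {'mm', 'mm'};
channelData  = [-12.6182 -4.071238; -12.6192 -4.070237; -12.6182 -4.069237];

% Table variable names must be valid and unique MATLAB identifiers.
varNames = matlab.lang.makeValidName(channelNames);
varNames = matlab.lang.makeUniqueStrings(varNames);

% One table variable per channel; units are stored as table metadata.
T = array2table(channelData, 'VariableNames', varNames);
T.Properties.VariableUnits = channelUnits;
```

A channel can then be addressed by name, for example T.RfLatPositionFbk, and the units stay attached in T.Properties.VariableUnits.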

Stephen23 on 3 Nov 2017

Ulrich Bretz's "Answer" moved here:

That's now my status:

content = fileread(fileName);
lineStarts = [0, strfind( content, sprintf('\n') )] + 1 ;                                         
numParams_header   = str2double( regexp( content, '(?<=NUM_PARAMS\s+)\S+', 'match', 'once' ));    
header = content(lineStarts(1):(lineStarts(numParams_header+1)-1));                               
channels   = content(lineStarts(numParams_header +3):(lineStarts(numParams_header +4)-1));        
units = content(lineStarts(numParams_header +4):(lineStarts(numParams_header +5)-1));           
data = content(lineStarts(numParams_header +6):end);

How can I convert the channels and units from a sequence of characters to a char array?

I use MATLAB R2014a.

Cedric on 3 Nov 2017

Edited: Cedric on 3 Nov 2017

The answer in my comment above does this already. But if you want to follow your current approach, you can use STRSPLIT to get cell arrays of channel names and units (and possibly STRTRIM beforehand, to get rid of the trailing \r if STRSPLIT outputs a 121st, empty cell).

For the data, I would do it this way:

 data = sscanf( data, '%f' ) ;                    % Long vector of all data.
 data = reshape( data, numel(channels), [] ).' ;  % Reshape into array.

where channels is a cell array of channel names (output of STRSPLIT).
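Put together, the STRTRIM/STRSPLIT step and the reshape might look like the sketch below. The three sample strings are hypothetical two-channel stand-ins for the channels, units, and data pieces cut out of the file, and they assume tab-separated fields with \r\n line endings.

```matlab
% Hypothetical two-channel sample of the three character vectors cut from the file.
channels = sprintf('RfLongPositionFbk\tRfLatPositionFbk\r\n');
units    = sprintf('mm\tmm\r\n');
data     = sprintf('-12.6182\t-4.071238\r\n-12.6192\t-4.070237\r\n');

% STRTRIM drops the trailing \r\n so the last name does not carry a stray \r.
tab      = sprintf('\t');
channels = strsplit(strtrim(channels), tab);   % 1-by-nChannels cell array
units    = strsplit(strtrim(units), tab);

% SSCANF returns one long column vector in file (row-major) order;
% reshape to nChannels-by-nSamples, then transpose to nSamples-by-nChannels.
data = sscanf(data, '%f');
data = reshape(data, numel(channels), []).';
```

Splitting on the tab only, rather than on any whitespace, keeps a channel name that contains a space in one piece. STRSPLIT collapses consecutive delimiters by default, so if a channel could have an empty unit, passing 'CollapseDelimiters', false keeps the columns aligned.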
