Extracting specific data from a very large text/numeric file

I am trying to extract data from a hoc file that is a combination of text, whitespace, characters, and numbers. I need to find the row index of every occurrence of the string "section[%d]", where %d stands for an integer. Just being able to find the row after I use importdata to read the file into a cell array would be good enough. There are upwards of 40 occurrences of the string, so I need to find all of them.

6 Comments

Why do you need row numbers? What do you need to extract or do then with these row numbers?
First of all, you answered my last question about dealing with this, and that was great. I forgot to mention that the rows of points are interrupted every now and then by a new "section" of points, so I need to be able to identify which point is the first one of a new section. If you recall, the data mostly looks like:
  pt3dadd(x,y,z,d,e)
  pt3dadd(x1,y1,z1,d1,e1)
  }
  section[2] {
  pt3dadd(x2,... and so on
Kelly and I also answered this question from you, but you gave no feedback. Did you need more information?
For the current question, do you need to get the section ID, or just to split the file by section and process each section with TEXTSCAN?
This has nothing to do with calculation. I just need to find where in the text the section ID string occurs, because that gives me a reference for the first point in that section. The ID number itself doesn't matter much: if "section" appears 10 times among all the points, the sections are simply numbered 1 to 10.
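One way to turn header positions into "first point of each section" is sketched below, assuming each statement sits on its own line and the file is already in a cell array of lines (the `fileLines` sample is hypothetical). It marks header lines and point lines, then a cumulative sum over the headers numbers the sections 1, 2, 3, ... in order of appearance, ignoring the ID written in the brackets.

```matlab
% Hypothetical sample: one statement per line, as in the hoc file
fileLines = {'pt3dadd(1,2,3,4,5)'; ...
             '}'; ...
             'section[2] {'; ...
             'pt3dadd(6,7,8,9,10)'; ...
             'pt3dadd(11,12,13,14,15)'; ...
             '}'; ...
             'section[7] {'; ...
             'pt3dadd(16,17,18,19,20)'; ...
             '}'};

% Logical masks: which lines are section headers and which are points
isHeader = ~cellfun(@isempty, regexp(fileLines, 'section\[\d+\]', 'once'));
isPoint  = ~cellfun(@isempty, regexp(fileLines, '^\s*pt3dadd\(', 'once'));

% Sequential section number of every line = number of headers seen so far
% (points before the first header get 0); the bracketed IDs are ignored
sectionOfLine = cumsum(isHeader);
pointSection  = sectionOfLine(isPoint);   % one entry per point, in file order

% Position (in point order) where each new section starts
firstPoint = find([true; diff(pointSection) ~= 0]);
```

Here `firstPoint` is `[1; 2; 4]`, and `pointSection(firstPoint)` gives the section number each of those points opens (0 for points that precede any header).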
My regexp solution is not working for you?

Answers (2)

% Row indices of every cell (line) of YourCell that contains "section[<integer>]"
rowIdx = find(~cellfun(@isempty, regexp(YourCell, 'section\[\d+\]', 'start')));
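The one-liner above expects `YourCell` to hold one line of the file per cell. Below is a self-contained sketch of the whole route. It assumes the file is read with FILEREAD and split on line breaks instead of with IMPORTDATA, and it also captures the integer inside the brackets with a `'tokens'` group. The temporary file and its contents are hypothetical and exist only so the example runs.

```matlab
% Write a small hypothetical hoc-style file so the example is self-contained
fname = [tempname() '.hoc'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', 'pt3dadd(1,2,3,4,5)', '}', 'section[2] {', ...
        'pt3dadd(6,7,8,9,10)', '}', 'section[15] {', 'pt3dadd(11,12,13,14,15)', '}');
fclose(fid);

% Read the whole file and split it into one cell per line
content = fileread(fname);
delete(fname);
fileLines = regexp(content, '\r?\n', 'split');

% Row index of every line containing "section[<integer>]"
rowIdx = find(~cellfun(@isempty, regexp(fileLines, 'section\[\d+\]', 'start')));

% The integer inside the brackets, captured as a token and converted to double
tok = regexp(fileLines(rowIdx), 'section\[(\d+)\]', 'tokens', 'once');
ids = cellfun(@(t) str2double(t{1}), tok);
```

For this sample, `rowIdx` is `[3 6]` and `ids` is `[2 15]`.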
Based on your comment, one way to tackle this is to split the file at the section headers/footers, so that you get blocks you can process with TEXTSCAN. Example:
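content = fileread('myData.txt') ;
% Split at section headers ("} section[n] {" or "section[n] {") and at lone
% closing braces; the braces are escaped so they are matched literally
blocks = regexp(content, '(\}\s*)?section\[\d+\]\s*\{|\}', 'split') ;
blocks = blocks(2:end-1) ; % Eliminate first empty and last
                           % (after last '}') blocks.
nBlocks = length(blocks) ;
data = cell(nBlocks, 1) ;
for bId = 1 : nBlocks
    % One entry per section: TEXTSCAN returns a cell array of 5 columns (x,y,z,d,e)
    data{bId} = textscan(blocks{bId}, 'pt3dadd(%f,%f,%f,%f,%f)') ;
end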
And if you don't want data to be a cell array of cell arrays (the output of TEXTSCAN is a cell array of columns), you can replace the line inside the FOR loop with:
    buffer = textscan(blocks{bId}, 'pt3dadd(%f,%f,%f,%f,%f)') ;
    data{bId} = [buffer{:}] ; % concatenate the 5 columns into an N-by-5 matrix
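If the end goal is one numeric matrix rather than one cell per section, the per-block results can be stacked with an extra section-number column. The sketch below assumes one `pt3dadd(...)` call per point, as in the sample in the comments above, and the `content` string is hypothetical. It reads each block with SSCANF instead of TEXTSCAN; that swap only keeps the example short and portable, and both read the same literal format.

```matlab
% Hypothetical sample in the same layout as the hoc file
content = sprintf(['section[1] {\npt3dadd(1,2,3,4,5)\n}\n' ...
                   'section[2] {\npt3dadd(6,7,8,9,10)\npt3dadd(11,12,13,14,15)\n}\n' ...
                   'section[5] {\npt3dadd(16,17,18,19,20)\n}\n']);

% Same split as above: headers and lone closing braces separate the blocks
blocks = regexp(content, '(\}\s*)?section\[\d+\]\s*\{|\}', 'split');
blocks = blocks(2:end-1);   % drop the empty first piece and the trailing last piece

nBlocks = numel(blocks);
data = cell(nBlocks, 1);
for bId = 1:nBlocks
    % SSCANF reuses the format until the text runs out; the leading space
    % in the format skips any whitespace, including newlines
    vals = sscanf(blocks{bId}, ' pt3dadd(%f,%f,%f,%f,%f)');
    pts = reshape(vals, 5, []).';                      % one row per point: x y z d e
    data{bId} = [pts, repmat(bId, size(pts, 1), 1)];  % 6th column: section number
end
allPoints = vertcat(data{:});   % N-by-6 matrix: x y z d e section
```

Each row of `allPoints` is then `[x y z d e section]`, with sections numbered 1, 2, 3, ... in order of appearance, so the rows where column 6 changes mark the first point of each section.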

This question is closed.

Asked: 28 Aug 2013
Closed: 20 Aug 2021
