How can I extract line numbers of text data?

Question

Paschalis Garouniatis on 31 Jul 2016

https://nl.mathworks.com/matlabcentral/answers/297881-how-can-i-extract-line-numbers-of-text-data

Edited: Paschalis Garouniatis on 3 Aug 2016

portion.txt

Hello everyone. I have attached a .txt file (portion.txt) which contains a portion of my data. I need a script that identifies the lines holding pairs of x-y coordinates and returns their line numbers. For instance, in the .txt file the first set of coordinates begins at line 3 and ends at line 138 (the number of pairs is written above each set of coordinates; in this case it is 136), so the script should return those two numbers. The same should then be done for the whole file. I suppose the process can be repeated in a loop, since every next set of coordinates begins 2 lines after the end of the previous one. How can this be done? Thanks in advance.

Answer 1

Azzi Abdelmalek on 31 Jul 2016

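str={};                          % one cell per line of the file
fid=fopen('portion.txt');
l=fgetl(fid);
while ischar(l)                  % fgetl returns -1 at end of file
  str{end+1,1}=l;
  l=fgetl(fid);
end
fclose(fid);
% keep the lines that contain exactly two numeric tokens (an x-y pair);
% note that idx holds the text of those lines, not their line numbers
idx=str(cellfun(@numel,regexp(str,'[\d\.]+'))==2)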

Azzi Abdelmalek on 1 Aug 2016

str={};
fid=fopen('portion.txt');
l=fgetl(fid);
while ischar(l)
    str{end+1,1}=l;
    l=fgetl(fid);
end
fclose(fid);
f=regexpi(str,'[e\-\+\d\.]+');   % numeric tokens, including signs and exponents
idx=cellfun(@numel,f);           % number of tokens on each line
id=idx==2;                       % true for lines holding an x-y pair
ii1=strfind([0 id'],[0 1])  % Begin
ii2=strfind([id' 0],[1 0])  % End
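
The last two lines find where each run of consecutive pair lines begins and ends. A minimal sketch of that step on its own, using a made-up mask instead of the real file and diff in place of strfind (both the mask and the variable names are assumptions for illustration):

```matlab
% Hypothetical mask: true where a line holds exactly two numeric tokens
id = logical([0 0 1 1 1 0 0 1 1]).';

% Pad with zeros so a run touching either end of the file is still detected
d = diff([0; id; 0]);

first = find(d == 1);        % line where each run of pair lines begins
last  = find(d == -1) - 1;   % line where each run of pair lines ends

ranges = [first last]        % one row per block: [begin end]
```

Here the mask has runs at lines 3 to 5 and 8 to 9, so ranges has two rows. The zero padding plays the same role as the leading and trailing 0 in the strfind calls above.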

Paschalis Garouniatis on 1 Aug 2016

Edited: Paschalis Garouniatis on 2 Aug 2016

Thanks a lot Azzi for your response. It worked just fine.

Answer 2

dpb on 31 Jul 2016

Edited: dpb on 1 Aug 2016

fid=fopen('portion.txt','r');
i=0;              % loop counter
n=[];             % number of pairs in each section
d={};             % coordinates of each section
while ~feof(fid)  % until we run out of data
  i=i+1;                                                        % increment counter
  n(i)=cell2mat(textscan(fid,'%d %*[^\n]',1,'headerlines',1));  % skip one line, read the count
  d(i)=textscan(fid,'%f %f',n(i),'collectoutput',1);            % read the section
  fgetl(fid);                                                   % straighten out file pointer end of record
end
fid=fclose(fid);    % done with file

You'll have a list of the sizes and a cell array of M sets of n-by-2 coordinates to do with as you wish.

Running it on the file here, having named the m-file portion.m, I get:

>> portion
>> n
n =
 136   162
>> d
d = 
  [136x2 double]    [162x2 double]
>> cumsum([[3 2+n(1:end-1)].' [2+n].'])  % the start/stop positions from the lengths
ans =
   3   138
 141   302
>>
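
The cumsum expression above packs the line arithmetic into one call. A minimal sketch of the same calculation spelled out step by step, assuming (as in the sample file) that exactly two non-data lines precede every block; the names gap, starts and stops are introduced here for illustration only:

```matlab
n   = [136 162];   % block lengths reported above
gap = 2;           % assumed: two non-data lines precede every block

starts = cumsum([gap+1, n(1:end-1)+gap]);   % first data line of each block
stops  = starts + n - 1;                    % last data line of each block
ranges = [starts.' stops.']                 % one row per block: [begin end]
```

For the two blocks above this gives the same rows, 3 to 138 and 141 to 302.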

dpb on 1 Aug 2016

Edited: dpb on 1 Aug 2016

Always possible, sure....I presumed the only point in knowing which line numbers contained the data was to later use those to read the data. Hence, I just read the data figuring that would be the end result desired. :)

If it really is just the section locations that is wanted/needed, simply save the N values as well and if you really don't want the other data, no need to save d; just don't bother to assign it.(*)

(*) NB: If the data aren't of interest at all but only the positions, then you can use N to compute a new 'headerlines' argument for the subsequent read that picks up the next N, and never actually read the data itself at all. (I can't think of why only the positions would be of interest; I guess one could be using an external editor with a macro for line replacement or some such. If that's the case, remember that line numbers will change if you start from the beginning of the file and do anything that modifies the number of lines in a section, but that's getting rather far astray.) Internally this might revert to a for loop of N fgetl calls, similar to other posters' solutions that count lines, although it's possible the actual implementation searches for that many \n characters and does an fseek to that point. I'm not sure how much TMW has worked on optimizations inside textscan; it's fairly new, so it is undoubtedly still evolving from one release to the next.

I note that the "aircode" I typed was missing a couple of details, the first being that one needs an fgetl to resynchronize the record location after the read for each section. I updated the Answer to insert these mods as an alternative solution.
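
A rough sketch of the positions-only idea, written with plain fgetl calls rather than a computed 'headerlines' value. It builds a small stand-in file first; the layout (one title line, one line starting with the pair count, then that many x-y lines) and all names are assumptions for illustration, not the real portion.txt:

```matlab
% Build a small stand-in file with two blocks (hypothetical layout)
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'set A\n3   extra text\n1.0 2.0\n3.0 4.0\n5.0 6.0\n');
fprintf(fid, 'set B\n2   extra text\n7.0 8.0\n9.0 1.0e1\n');
fclose(fid);

fid = fopen(fname, 'r');
lineNo = 0;                  % number of lines consumed so far
ranges = zeros(0, 2);        % one row per block: [begin end]
while true
    hdr = fgetl(fid);                    % title line of the block
    if ~ischar(hdr), break; end          % fgetl returns -1 at end of file
    countLine = fgetl(fid);              % line that starts with the pair count
    if ~ischar(countLine), break; end
    n = sscanf(countLine, '%d', 1);      % first integer on the line
    if isempty(n), break; end
    lineNo = lineNo + 2;
    ranges(end+1, :) = [lineNo+1, lineNo+n];
    for k = 1:n, fgetl(fid); end         % skip the data lines without parsing them
    lineNo = lineNo + n;
end
fclose(fid);
delete(fname);
ranges
```

The data lines are still read from disk, but they are never converted to numbers, and nothing besides the line ranges is stored.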

dpb on 2 Aug 2016

No need to create a new file; simply skip the odd header lines before getting to the portion of the file that is regular and go from there:

fid=fopen('portion.txt','r');
for i=1:7, fgetl(fid); end  % skip preliminary stuff
...

From this point everything's the same, except that for the real file you'll need to add 7 to all the line numbers obtained if you're going to use them with respect to that file.

Paschalis Garouniatis on 3 Aug 2016

Edited: Paschalis Garouniatis on 3 Aug 2016

Thank you very much for your help dpb.

Answer 3

Shameer Parmar on 1 Aug 2016

Edited: Shameer Parmar on 1 Aug 2016

Data = textread('portion.txt', '%s', 'delimiter', '');   % one cell per line
LineIndex = {};
count = 1;
for i=1:length(Data)
    if ~isempty(strfind(Data{i},'           '))           % count lines contain a long run of spaces
        temp_line = regexp(Data{i},'           ','split');
        LineIndex{count,1} = ['Begin at ',num2str(i+1)];   % data start on the next line
        LineIndex{count+1,1} = ['End at ',num2str(i + str2num(temp_line{1}))];   % count is the first field
        count=count+2;
    end
end

Make sure that your file "portion.txt" is in the current directory.

To check the output, just type "LineIndex".

Output:

 LineIndex = 
    'Begin at 3'
    'End at 138'
    'Begin at 141'
    'End at 302'

Paschalis Garouniatis on 1 Aug 2016

Thanks a lot for your answer Shameer.

Paschalis Garouniatis on 1 Aug 2016

I ran your code, and two of the cells in LineIndex show 'End at' followed by two numbers instead of one.
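
One plausible cause, which the thread does not confirm, is that on some count lines the text before the long run of spaces holds more than one number, so str2num returns a vector. A minimal sketch with a made-up count line, taking only the first integer via sscanf (the line content, sep, and the variable names are assumptions for illustration):

```matlab
sep  = repmat(' ', 1, 11);         % assumed: the run of spaces used as separator
line = ['136 1', sep, 'rest'];     % hypothetical count line with a second number
i    = 2;                          % line number of that count line

tok     = regexp(line, sep, 'split');
n_all   = str2num(tok{1});         % vector: every number in the first field
n_first = sscanf(tok{1}, '%d', 1); % scalar: only the first integer

endBad  = ['End at ', num2str(i + n_all)]     % shows two numbers
endGood = ['End at ', num2str(i + n_first)]   % shows a single number
```

If this is what happens in the full file, replacing str2num(temp_line{1}) with sscanf(temp_line{1},'%d',1) in the loop above would be the corresponding change.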
