Finding Lines in a Large Text File with a Specific Text

Question

Sonoma Rich on 12 Jul 2019

https://nl.mathworks.com/matlabcentral/answers/471376-finding-lines-in-a-large-text-file-with-a-specific-text

Answered: Sonoma Rich on 13 Jul 2019

I am trying to read a large text file (>1 GB), but I only want the lines that contain a specific string. For example, I want every line that contains the text <field name="data". Currently I read every line with fgetl and check whether the text is in the line, but this takes too long. Any suggestions?

Answer 1

Sonoma Rich on 13 Jul 2019

I found the following code, which works well:

filetext = fileread('fileread.m');
expr = '[^\n]*fileread[^\n]*';
matches = regexp(filetext,expr,'match');
disp(matches')

However, the regexp function was slower than I expected. I ended up using the following method, which is significantly faster:

fid = fopen('fileread.m','r');
ftext = textscan(fid,'%s','Delimiter','\n');
fclose(fid);
matches = ftext{1}(contains(ftext{1},'fileread'));
disp(matches)
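
Applied to the search text from the question, the same textscan/contains approach might look like the sketch below. The sample file, its three lines, and the variable names are made up for illustration; only the search string comes from the question. Note that contains requires R2016b or later.

```matlab
% Write a small hypothetical sample file so the sketch is self-contained.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', ...
    '<field name="data" value="1"/>', ...
    '<field name="other" value="2"/>', ...
    '<field name="data" value="3"/>');
fclose(fid);

% Read all lines at once, then keep only those containing the search text.
fid = fopen(fname, 'r');
ftext = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);
lines = ftext{1};
matches = lines(contains(lines, '<field name="data"'));
disp(matches)

delete(fname);   % remove the temporary sample file
```

This still holds the whole file in memory as a cell array, so it only helps if the file fits in RAM.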

Answer 2

KSSV on 12 Jul 2019

Read about textscan. This function gives you the option of running a loop and reading the file in chunks of lines. Within each chunk, you can pick out the lines you need.
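
A minimal sketch of that chunked loop, assuming the search text from the question. The sample file, the chunkSize value, and the variable names are hypothetical; in practice the chunk size would be far larger and tuned to available memory.

```matlab
% Write a small hypothetical sample file so the sketch is self-contained.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
for k = 1:10
    if mod(k, 3) == 0
        fprintf(fid, '<field name="data" value="%d"/>\n', k);
    else
        fprintf(fid, '<field name="other" value="%d"/>\n', k);
    end
end
fclose(fid);

chunkSize = 4;          % lines per chunk (assumed value for illustration)
matches = cell(0, 1);   % matching lines collected so far

fid = fopen(fname, 'r');
while ~feof(fid)
    % Read at most chunkSize lines; textscan resumes where it left off.
    chunk = textscan(fid, '%s', chunkSize, 'Delimiter', '\n');
    lines = chunk{1};
    if isempty(lines)
        break;
    end
    % Keep only the lines of this chunk that contain the search text.
    matches = [matches; lines(contains(lines, '<field name="data"'))];
end
fclose(fid);
delete(fname);          % remove the temporary sample file

disp(matches)
```

Only one chunk plus the accumulated matches is in memory at a time, which is the point of reading in a loop when the file is too large for fileread.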

Answer 3

Walter Roberson on 12 Jul 2019

If you have enough memory:

S = fileread('YourFileNameHere.txt');
selected = regexp(S, '^.*<field name\s*=.*$', 'match', 'dotexceptnewline', 'lineanchors');

If you do not care what is at the beginning or end of the line and just want to know what the "data" field content is, then

S = fileread('YourFileNameHere.txt');
datas = regexp(S, '(?<=field name\s*=")(?<data>[^"]*)', 'names');

That should get you a struct array with a field named 'data' that holds the content inside the quotes.
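
As a quick way to try these regexp options without a file, here is a sketch on an in-memory string. The sample lines and the value attribute are invented for illustration, and the second pattern uses a plain named token rather than a lookbehind, so it is a variation on the idea above rather than the same expression.

```matlab
% Hypothetical sample text standing in for the contents returned by fileread.
S = sprintf('%s\n', ...
    '<field name="data" value="1"/>', ...
    '<field name="other" value="2"/>', ...
    '  <field name="data" value="3"/>');

% Whole lines containing the search text: 'lineanchors' makes ^ and $
% match at line boundaries, 'dotexceptnewline' stops . at the line end.
selected = regexp(S, '^.*<field name="data".*$', 'match', ...
    'dotexceptnewline', 'lineanchors');

% Named token: 'names' returns a struct array with one field per token name.
vals = regexp(S, '<field name="data" value="(?<value>[^"]*)"', 'names');

disp(selected')
disp({vals.value})
```

Each element of selected is a full matching line, while vals has one element per match with the captured text in vals(k).value.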
