Question about optimizing reading data from text file

Question

0 votes

Hello, thanks for reading this,

I currently have a reader that reads in mesh files. It works, but depending on the size of the file it can take a very long time, so I am hoping to optimize it for speed.

What I do first is read in a text file and change every line into a matrix of characters using the lines:

   cac = textscan( fid, '%[^\n]' );
   fclose(fid);
   A  = char( cac{1} );

where A is my character matrix. I then search through the text for identifiers that mark the data I need, and record a start index and an end index for each block of data. I basically do this line by line, and at the moment I assume the file will always be formatted in a certain way.
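For anyone reading along, the index search described above can be sketched without a loop by working on the cell array of lines (cac{1}) directly. This is only a minimal sketch: the marker strings 'Nodes' and 'Elements' and the sample lines are made up for illustration and are not taken from the actual mesh format.

```matlab
% Hypothetical example lines; 'Nodes' and 'Elements' are made-up markers,
% not the identifiers used by the real mesh files.
lines = {'Header'; 'Nodes'; '0.0 1.0 2.0'; '3.5 4.5 5.5'; 'Elements'; '1 2 3 4 5'};

% strfind accepts a cell array of strings and returns one cell per line;
% a non-empty cell means the marker occurs somewhere in that line.
isNodes    = ~cellfun(@isempty, strfind(lines, 'Nodes'));
isElements = ~cellfun(@isempty, strfind(lines, 'Elements'));

startIdx = find(isNodes, 1) + 1;      % first data line after the marker
endIdx   = find(isElements, 1) - 1;   % last data line before the next marker
```

The same indices can then be used on the cell array or on the character matrix, since both keep one line per row.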

After I have these indices, I call sscanf on each line to read the characters as %f or %x numbers and store them in matrices. According to the profiler, this is the part that takes the longest to complete.

I posted the MATLAB reader function here: http://pastebin.com/FFtgXzg4, since it is a bit long to post here. My specific questions are: do I have to convert the whole text import into a character matrix, and is there any way I can do this without needing a for loop? The loops using sscanf take a very long time.
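On the two questions above, one possible approach (a sketch, not necessarily what the linked reader should do) is to skip the character matrix, join the relevant lines of the cell array into one string, and call sscanf once on the whole block. The sample lines and the assumption of three %f values per line are hypothetical.

```matlab
% Hypothetical block of coordinate lines, standing in for
% cac{1}(startIdx:endIdx); three values per line is an assumption.
lines = {'0.0 1.0 2.0'; '3.5 4.5 5.5'; '-1 2e-3 7'};

% Join all lines into one string, each terminated by a newline.
str = sprintf('%s\n', lines{:});

% One sscanf call for the whole block. The size argument [3 Inf] fills a
% 3-by-N matrix column by column, so transpose to get one row per line.
coords = sscanf(str, '%f', [3 Inf])';
```

This only works when every line in the block has the same number of values; a block with ragged rows would still need per-line handling.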

It works, but just barely so. I can send a test import file if needed.

1 Comment

Cedric on 24 May 2013

Could you post e.g. 20 lines of your data file, and define the identifiers that you are referring to?

Answer 1

Jonathan Sullivan on 23 May 2013

Edited: Jonathan Sullivan on 23 May 2013

0 votes

You may want to use fread and regexp.

Without seeing your file, I can't say for sure this will produce the same result, but it should give you a good starting point.

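% Using fread and regexp
fid = fopen(filename,'r');
tic;
txt = fread(fid,'*char')';          % whole file as one row vector of characters
A = regexp(txt,'\r?\n','split');    % split into lines (also tolerates CRLF endings)
A = char( A{:} );                   % pad the lines into a character matrix
toc
fclose(fid);
% Using textscan
fid = fopen(filename,'r');
tic;
B = textscan(fid,'%[^\n]');         % one cell entry per line
B2 = char(B{1});                    % pad the lines into a character matrix
toc
fclose(fid);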

1 Comment

Brian on 23 May 2013

It seems that the textscan call I have runs slightly faster than the regexp/fread combination. There is one last part of the code that seems to be giving me problems:

When I have my start and end indices, I use sscanf line by line to give me the real data I need. However, some of my character matrices can be very large: sometimes spanning hundreds of thousands of rows (depending on the number of tetrahedra I have).

Is there a more intelligent way to read this than calling sscanf line by line, for example applying it to the whole block in one vectorized call? Or should I look into exporting the matrix to a formatted text file and re-importing it using textread and hex2dec?

In these areas, I will always have the following combination of characters:

xxx xxx xxxx x x,

where I believe it can be split by a space delimiter. That leaves me with five hexadecimal values per row.
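A minimal sketch of reading such a block with a single sscanf call instead of one call per row is given below. The two sample rows are made up, and it assumes every row holds exactly five space-separated hexadecimal values and nothing else (no trailing comma or other text).

```matlab
% Hypothetical two-row block standing in for A(startIdx:endIdx, :);
% char() pads the shorter row with trailing spaces, as in the reader.
block = char('1a 2b 3c 4 5', 'ff 10 100 0 1');

% Append a column of spaces so the last token of one row cannot run into
% the first token of the next, then flatten row by row into one string.
block(:, end+1) = ' ';
str = reshape(block', 1, []);

% One sscanf call: %x reads hexadecimal, and [5 Inf] fills five values
% per column, so transpose to get one row of five values per input row.
vals = sscanf(str, '%x', [5 Inf])';
```

Here vals comes back as a double matrix with one row per line, so no hex2dec step is needed afterwards.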
