How to read in a large text file with special delimiters?

Hi,
Is there a way to read data from a text file with the following format per row:
A'~'648387'~'3238157'~'9'~'20'~''~'14'~''~'#@#@#
Thus, the column delimiter is '~', and the row delimiter is #@#@#.
Further, missing/null values are represented by two consecutive column delimiters '~''~', for example:
A'~'216772930'~'Birdbox'~''~'1'~'5'~''~''~''~''~''~''~''~''~''~''~''~''~''~'1'~'213'~'#@#@#
Is there any way to specify your own row and column delimiters to be able to read in this data?
Thanks a lot in advance!

Accepted Answer

per isakson on 24 Apr 2021
Edited: per isakson on 24 Apr 2021
"Is there any way to specify your own row and column delimiters to be able to read in this data?" No, I don't think so.
How large is the text file? I assume it fits in a fraction of the physical memory (RAM) of your computer.
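If you are unsure about the size, a quick check could look like the sketch below. It is only an illustration: it first writes a small sample file (the name cssm_sample.txt and the temp folder are assumptions, not your real file) so that it runs on its own, then asks dir() for the size in bytes.

```matlab
% Build a small sample so the sketch is self-contained.
% The file name is hypothetical; substitute the path of your real file.
fname = fullfile( tempdir, 'cssm_sample.txt' );
row   = char( "A'~'648387'~'3238157'~'9'~'20'~''~'14'~''~'#@#@#" );
fid   = fopen( fname, 'w' );
fwrite( fid, repmat( row, 1, 25 ) );   % 25 copies, no newlines in between
fclose( fid );

% dir() returns a struct whose bytes field holds the file size.
info = dir( fname );
fprintf( 'File size: %.3f MB\n', info.bytes / 2^20 );
```

Comparing that number with the installed RAM gives a rough idea of whether reading the whole file at once with fileread() is reasonable.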
I assume that the column delimiter is the three-character vector '~' (single quotes included).
Workaround
  1. read the entire file to a character vector
  2. replace "#@#@#" by newline
  3. replace "'~'" by comma (I use double-quoted strings so that the single quotes need no escaping)
  4. parse the resulting string with textscan() (for some reason readtable doesn't take strings.)
Demo
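%% Read the entire file into a character vector
chr = fileread( 'cssm.txt' );
%% Turn the row delimiter into newlines and the column delimiter into commas
str = strrep( chr, "#@#@#", newline );
str = strrep( str, "'~'", "," );
%% Parse: one text column followed by eight numeric columns
cac = textscan( str, '%s%f%f%f%f%f%f%f%f', 'Delimiter',',' );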
cac{1}(1:3)
ans = 3×1 cell array
    {'A'}
    {'A'}
    {'A'}
cac{2}(1:3)
ans = 3×1
      648387
      648387
      648387
cac{6}(1:3)
ans = 3×1
   NaN
   NaN
   NaN
where cssm.txt contains twenty-five copies of your first example on a single line.
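The second example in the question has text in its third column ('Birdbox') and more columns than the first, so a fixed format such as '%s%f%f%f%f%f%f%f%f' may not fit every row. A possible alternative, sketched here under the assumption that rows can have different numbers of fields, is to split the text with the string function split() and convert individual fields afterwards. The two sample rows are copied from the question; the variable names are only illustrative.

```matlab
% Two sample rows from the question, concatenated as they would appear in the file.
txt = "A'~'648387'~'3238157'~'9'~'20'~''~'14'~''~'#@#@#" + ...
      "A'~'216772930'~'Birdbox'~''~'1'~'5'~''~''~''~''~''~''~''~''~''~''~''~''~''~'1'~'213'~'#@#@#";

% Split into rows on the row delimiter and drop the empty piece
% that follows the final "#@#@#".
rows = split( txt, "#@#@#" );
rows = rows( strlength( rows ) > 0 );

% Split each row into fields on the column delimiter.
% A cell array is used because the rows may differ in length.
fields = cell( numel( rows ), 1 );
for k = 1:numel( rows )
    fields{k} = split( rows(k), "'~'" ).';   % 1-by-N string row vector
end

% Convert a field to a number where needed; empty fields become NaN.
id      = str2double( fields{1}(2) );
missing = str2double( fields{1}(6) );
```

Each row ends with a column delimiter before the row delimiter, so the last field of every row is an empty string. For a real file, txt would come from fileread() wrapped in string().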
  5 Comments
Ricardo Lopez A. on 26 Apr 2021
Hi Per,
Again, thanks a lot for your help!
I am working towards a deadline, but once that is past, I will certainly test your code with some large text files to see how this works! For now I am just using your initial workaround since that is working.
I will let you know as soon as I test your code, thanks a lot again!

Release

R2021a
