Why does MATLAB save strings from a delimited text file as individual characters, and how can I prevent it?

So, I have a cell structure in Matlab (containing words, dates and numbers separated by ";" loaded from a very large file) which I take certain lines from, then do some calculations on and finally write each field to a separate file as a table (the words being the headers, the dates and numbers the data).
I have the script functioning more or less okay, except that I keep running into one particular problem: when I split the lines using strsplit, all entries are treated as individual characters. So when I select a cell entry and add a position, for example A.a{1,1}(2), it returns the second letter of the string. It also does this for numbers, which makes manipulation difficult. Because they are split strings, MATLAB treats each digit of a multi-digit number as a separate character, so when I do A.a{1,2} it returns 122, but when I do A.a{1,2}*2 I get ans = 98 100 100 rather than 244. Now I could use str2num, but that doesn't work for words or dates, so it can become pretty cumbersome... I have a hard time finding the right command to convert all entries to single 'words'. I've also tried using cell2array and array2table commands, but I somehow keep running into issues. Any help would be appreciated!
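The behaviour described in the question follows from how MATLAB does arithmetic on char arrays: each character is converted to its character code first. The following is a minimal sketch with a made-up sample line (the values are illustrative, not taken from the uploaded data), showing the effect and a conversion with str2double:

```matlab
% A numeric field that was read as text is a 1x3 char array.
s = '122';
codes   = double(s);       % character codes of '1','2','2': 49 50 50
doubled = s*2;             % arithmetic acts on the codes: 98 100 100
value   = str2double(s)*2; % convert to a number first: 244

% Hypothetical sample line with one date field and two numeric fields.
parts = strsplit('07-09-2017 08:25:33;122;3.5', ';');
vals  = str2double(parts(2:end)); % numeric fields as a double row vector
```

str2double accepts a cell array of character vectors and returns NaN for any entry that is not a number, so it can be applied to the numeric fields in one call while the date field is kept as text.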
  4 Comments
Stephen23 on 8 Sep 2017
@Sjouke Rinsma: Thank you for uploading some sample data. I note that all of the columns appear to be numeric, except for the date in the first column. I have no idea why you are wasting your time with importing that data as characters. Why not simply import the data directly as numeric?
Sjouke Rinsma on 8 Sep 2017
Hi Stephen; I get what you're saying, though I'm somewhat fuzzy on how to import a ;-delimited text file as numeric data, since this one also contains the 'non-numeric dates'. dlmread does not recognize these, and readtable still imports everything as chars... but maybe I'm just not familiar with the right function to use in this case, or I'm just completely overlooking something.
Nevertheless, as far as I can see, by the time I've reached line 22 I've got a completely numeric array (if I remove the ; at the end) in which I then rewrite the date. Also, for the files I've uploaded, the script seems to work fine, though as I mentioned before, when I'm working with the larger file I somehow get a matrix where, toward the rightmost columns of a field, the data types become mixed (randomly quoted and non-quoted entries in the same column). This also results in written files where some numbers are written as numeric and others as chars (?), resulting in different numbers of digits, which makes everything look really messy (I've uploaded the resulting mat-file of the result structure and the final text file for one field, if you're interested). Especially that last part has got me puzzled... I would assume it's not because of the large data set, since that is actually the reason I'm using MATLAB in the first place.


Accepted Answer

Stephen23 on 8 Sep 2017
Edited: Stephen23 on 8 Sep 2017
Rather than wasting time importing the data as character, you would be much better off using textscan to import numeric values as numeric data. For example, this reads your entire example file:
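opt = {'Delimiter',';', 'CollectOutput',true};
fid = fopen('merged.txt','rt');
hdr = fgetl(fid); % read the header line
fmt = ['%s',repmat('%f',1,nnz(hdr==';'))]; % one %s for the date, one %f per ';' in the header
C = textscan(fid,fmt,opt{:}); % C{1}: date strings, C{2}: numeric matrix
fclose(fid);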
and checking:
>> size(C{1}) % the number of date strings
ans =
6076 1
>> size(C{2}) % the size of the numeric matrix
ans =
6076 47
>> C{1}{[1,end]} % the first and last dates
ans = 07-09-2017 08:25:33
ans = 07-09-2017 10:40:54
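Since the original goal was to write the data out as a table, the two outputs can be combined once they have been imported. This is only a sketch under assumptions: the variables dates and data are small stand-ins for C{1} and C{2}, the column names a and b are hypothetical, and the date format is assumed to be day-month-year (the sample file does not settle this).

```matlab
% Stand-ins for the textscan outputs C{1} and C{2} (hypothetical values).
dates = {'07-09-2017 08:25:33'; '07-09-2017 10:40:54'};
data  = [122 3.5; 98 4.25];

% Assumption: the dates are day-month-year.
t = datetime(dates, 'InputFormat','dd-MM-yyyy HH:mm:ss');

% Build a table from the numeric matrix and put the time column first.
T = array2table(data, 'VariableNames',{'a','b'});
T.Time = t;
T = T(:, [end, 1:end-1]);
```

Each column now holds one data type, so T.a(1)*2 is ordinary numeric arithmetic, and writetable(T, filename, 'Delimiter',';') could be used to write the table to a ;-delimited file.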
"I work with a 200M+ lines file"
If you have a very large file that cannot be imported at once, then you can adapt the code I have shown above using the method given in the MATLAB documentation, which reads the data one block at a time.
Basically, the trick is to use the optional third input of textscan to specify how many lines to read, and to call textscan in a loop.
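The block-wise reading described above might look roughly like the sketch below. It is not the code from the MATLAB documentation: the demo file, the block size blk and the running column sums are assumptions made only so that the example is self-contained and has something to compute per block.

```matlab
% Write a small hypothetical demo file: a header plus 10 data lines.
fname = [tempname, '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, 'date;a;b\n');
for k = 1:10
    fprintf(fid, '07-09-2017 08:25:%02d;%d;%g\n', k, k, k/2);
end
fclose(fid);

opt = {'Delimiter',';', 'CollectOutput',true};
blk = 4; % lines per block (assumed value; choose to suit available memory)

fid = fopen(fname, 'rt');
hdr = fgetl(fid);                           % header line
ncol = nnz(hdr==';');                       % number of numeric columns
fmt = ['%s', repmat('%f',1,ncol)];
total = zeros(1, ncol);                     % running column sums
nrows = 0;                                  % number of data lines read
while ~feof(fid)
    C = textscan(fid, fmt, blk, opt{:});    % read at most blk lines
    if isempty(C{1}), break; end            % nothing left to read
    total = total + sum(C{2}, 1);           % process this block only
    nrows = nrows + size(C{2}, 1);
end
fclose(fid);
delete(fname);
```

Only one block is held in memory at a time, so whatever is done per block (here a sum) has to be written so that it can be accumulated or appended to an output file as the loop proceeds.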
  1 Comment
Sjouke Rinsma on 8 Sep 2017
Edited: Sjouke Rinsma on 12 Sep 2017
Should've refreshed before answering that previous post... nevertheless thanks for this, I will definitely look into it!
And so I did. Seems to be working fine now, thanks :)
