Why won't textscan read all rows?

Aidan Goy on 4 Dec 2020
Commented: dpb on 4 Dec 2020
fileID = fopen('car_data.txt');
data = textscan(fileID,'%d %d %d %d %d %f %d %d %s', 'headerLines', 1);
fclose(fileID);
Why does this read only the first row of my text file and store it as a 1x9? I want it to read all lines and store them as 406x9. Have I missed some arguments that make it continue reading the following lines?
When I view the 1x9 array it creates, here is the result:

Answers (4)

dpb on 4 Dec 2020
MPG Cylinders displacement horsepower weight acceleration model year origin car name
18 8 307 130 3504 12 70 1 chevrolet chevelle malibu
15 8 350 165 3693 11.5 70 1 buick skylark 320
18 8 318 150 3436 11 70 1 plymouth satellite
16 8 304 150 3433 12 70 1 amc rebel sst
17 8 302 140 3449 10.5 70 1 ford torino
...
'Cuz it's tab-delimited and the string data with embedded blanks aren't quote-delimited.
You can use
fmt=[repmat('%d',1,8) '%s'];
data=textscan(fileID,fmt,'delimiter','\t','headerLines', 1);
and joy should ensue, but...
I'd strongly suggest using readtable and the new(ish) table object instead.
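For what it's worth, a readtable call along these lines might do it. This is only a sketch: the small temporary file is a made-up stand-in for car_data.txt so the snippet runs on its own, and 'TreatAsMissing' is included on the assumption that some numeric fields hold NA.

```matlab
% Write a tiny tab-delimited sample (hypothetical stand-in for car_data.txt)
tmp = [tempname '.txt'];
fid = fopen(tmp,'w');
fprintf(fid,'MPG\tCylinders\tcar name\n');
fprintf(fid,'18\t8\tchevrolet chevelle malibu\n');
fprintf(fid,'NA\t4\tbuick skylark 320\n');
fprintf(fid,'16\t8\tamc rebel sst\n');
fclose(fid);

% Tab as the only delimiter keeps multi-word names in one column;
% 'NA' in a numeric column is treated as missing (NaN)
T = readtable(tmp,'Delimiter','\t','TreatAsMissing','NA');
delete(tmp);

mpg   = T{:,1};   % numeric column
names = T{:,3};   % cell array of character vectors
```

Columns are addressed by position here because readtable may rewrite a header such as "car name" into a valid variable name.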

Mathieu NOE on 4 Dec 2020
Hello,
Your last column has more than one word (two or three, so its size varies),
so with the given arguments textscan can't read past the first line, and even there it picks up only the first word of the name.
Why not use readtable?
T = readtable('car_data.txt');
% gives :
% T =
%
% 406×9 table
%
% MPG Cylinders displacement horsepower weight acceleration modelYear origin carName
% ____ _________ ____________ __________ ______ ____________ _________ ______ ________________________________________
%
% 18 8 307 130 3504 12 70 1 {'chevrolet chevelle malibu' }
% 15 8 350 165 3693 11.5 70 1 {'buick skylark 320' }
% 18 8 318 150 3436 11 70 1 {'plymouth satellite' }
% 16 8 304 150 3433 12 70 1 {'amc rebel sst' }
% and so on...

Star Strider on 4 Dec 2020
The first column causes problems because it includes ‘NA’ in a field that uses '%d'.
The fix for that is to read it as a string, use strrep to replace 'NA' with 'NaN', then use str2double to convert it to a double array:
fileID = fopen('car_data.txt');
data = textscan(fileID,'%s %d %d %d %d %f %d %d %s', 'HeaderLines',1, 'Delimiter','\t');
fclose(fileID);
data{1} = str2double(strrep(data{1}, 'NA','NaN'));
There are likely other ways to deal with it, such as the one I suggested in my previous Answer to "How do I read in this text file using fopen fclose and fscanf and then split each column into variables?".
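One such alternative, sketched here rather than tested on the real file, is to read the column as '%f' and let the 'TreatAsEmpty' option of textscan turn 'NA' into NaN directly. The temporary three-column file below is a made-up stand-in for car_data.txt so the snippet runs on its own.

```matlab
% Write a tiny tab-delimited sample (hypothetical stand-in for car_data.txt)
tmp = [tempname '.txt'];
fid = fopen(tmp,'w');
fprintf(fid,'MPG\tCylinders\tcar name\n');
fprintf(fid,'18\t8\tchevrolet chevelle malibu\n');
fprintf(fid,'NA\t4\tbuick skylark 320\n');
fclose(fid);

% '%f' for the column that may hold NA; empty numeric fields default to NaN
fid = fopen(tmp);
C = textscan(fid,'%f %d %s','HeaderLines',1,'Delimiter','\t','TreatAsEmpty','NA');
fclose(fid);
delete(tmp);

mpg   = C{1};   % double column, NA read as NaN
names = C{3};   % cell array with the full multi-word names
```

This skips the strrep/str2double step, at the cost of having to use a floating-point specifier for any column that can contain NA.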
  4 Comments
Star Strider on 4 Dec 2020
Joseph Wilson —
I fail to understand the reason for that. The code reads the file correctly, and would (if you allow it to) create a full matrix.
I would still go with readtable, as I suggested previously.
dpb on 4 Dec 2020
You're correct, sorry. The '\t' delimiter fixes the parsing of the blank-containing strings; I was still thinking of the default delimiter.
Like you, I also suggested using readtable as the much simpler option.



Joseph Wilson on 4 Dec 2020
fileID = fopen('car_data.txt');
% Read the 8 numeric columns as text; %*[^\n] skips the rest of each row,
% because the car name has a varying number of words
fmt = [repmat('%q',1,8) '%s %*[^\n]'];
data = textscan(fileID,fmt,'headerLines',1);
fclose(fileID);
data_double = zeros(numel(data{1}), numel(data)-1); % preallocate the numeric matrix
for count = 1:length(data)-1
    data_double(:,count) = str2double(strrep(data{count}, 'NA','NaN')); % This yields a 406x8 matrix
end
data_string = data{9}; % This yields a 406x1 cell array with the make (first word of each car name)
