Textscan import string data from .txt file
Hi!
When I'm using textscan to read my data I get all the data but it's not quite organized the way I would like.
I've attached a sample .txt file.
I'm using this code to import the data:
%Imports all .txt files according to user input time and converts them into strings
Data = cell(1, numfiles); %Preallocate empty cell
for h = 1:numfiles
filename = sprintf('%s.txt',w(h,:)); %add .txt to year and month
fileID = fopen(filename); %open filename to create fileID
Data{h} = textscan(fileID,'%s','delimiter','\n'); %read all characters in fileID
fclose(fileID); %close fileID
end
What I would like to achieve is one string per report, each starting with METAR ESXX and with a varying ending (e.g. Q1011, R08/750135 or something else).
I've tried several delimiters, but I get more or less the same result with each of them.
From what I can tell, the problem occurs where the data is not delimited by a newline, but I can't find the right way to get it working.
In a previous version of my code I was using fread but I understand that textscan is better to use. Is that correct?
Do you have any suggestions to what could be changed?
Thanks!
This is a sample of the result in Data:
'M04/M06 Q1020
METAR ESKN 160020Z 31003KT 0300 R08/P2000N R26/1100N BCFG NSC'
'M04/M05 Q1010 R08/750135
METAR ESKN 160050Z 31003KT 5000 BR FEW064 M04/M04 Q1011 R08/750135
METAR ESKN 160120Z 31003KT CAVOK M03/M03 Q1011 R08/750135
METAR ESKN 160150Z VRB01KT 9999 FEW003 BKN061 M03/M03 Q1011'
'R08/750135
METAR ESKN 160220Z 32004KT 9999 SCT042 BKN055 BKN066 M02/M02 Q1011'
'R08/750135
METAR ESKN 160250Z 28003KT 9999 SCT003 BKN036 BKN057 M02/M02 Q1012'
'R08/750135
METAR ESKN 160320Z VRB02KT 9999 BKN002 M02/M02 Q1012 R08/750135
METAR ESKN 160350Z 33004KT 9999 BKN002 M01/M01 Q1012 R08/750135
METAR ESKN 160420Z VRB01KT 9999 BKN002 M01/M01 Q1012 R08/750135
METAR ESKN 160450Z 00000KT 4000 BR SCT003 M02/M02 Q1013 R08/710195
METAR ESKN 160520Z 30003KT 0300 R08/P2000N R26/0750U BCFG FEW003'
'SCT072 M03/M03 Q1013 R26/710195
METAR ESKN 160550Z VRB03KT 9000 SCT066 M03/M03 Q1013 R26/710195
METAR ESKN 160620Z 29003KT 9999 FEW002 BKN068 M02/M02 Q1013'
'R26/710195
METAR ESKN 160650Z 31003KT 9999 FEW002 SCT068 M00/M00 Q1014'
Accepted Answer
dpb
on 13 Nov 2021
Read the file as is and then clean it up instead...
d=readcell('202103.txt','Delimiter',newline); % read a cellstr array
i1=find(~startsWith(d,'METAR'))-1; % locate first of line pairs
for i=1:numel(i1) % and merge those by pair
d(i1(i))=join(d(i1(i):i1(i)+1));
end
d(i1+1)=[]; % then eliminate the second
Sanity check...
>> all(startsWith(d,'METAR'))
ans =
logical
1
>>
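The pairwise merge above assumes every non-METAR line follows exactly one METAR line. A hedged alternative sketch, assuming instead that any line not starting with `METAR` continues the most recent `METAR` line, appends each continuation in a single pass, so a report wrapped onto three lines is also handled. It sticks to `strncmp` and `strtrim`, which are available in R2018b, and trims the leading blank that some continuation lines carry. The sample lines are made up for illustration.

```matlab
% Hypothetical sample: one report wrapped onto two lines, one unwrapped,
% and one wrapped onto three lines
d = {'METAR ESKN 160020Z 31003KT 0300 BCFG NSC M04/M06'; ...
     ' Q1020'; ...
     'METAR ESKN 160050Z 31003KT 5000 BR FEW064 M04/M04 Q1011 R08/750135'; ...
     'METAR ESKN 160520Z 30003KT 0300 R08/P2000N BCFG FEW003'; ...
     'SCT072 M03/M03'; ...
     'Q1013 R26/710195'};

isStart = strncmp(d, 'METAR', 5);    % true where a new report begins
merged = cell(nnz(isStart), 1);      % exactly one output cell per report
k = 0;                               % index of the report being built
for i = 1:numel(d)
    piece = strtrim(d{i});           % drop stray leading/trailing blanks
    if isStart(i)
        k = k + 1;
        merged{k} = piece;                   % open a new report
    elseif k > 0
        merged{k} = [merged{k} ' ' piece];   % append a continuation line
    end
    % continuation lines before the first METAR line are skipped
end
```

`merged` then plays the role of `d` after the merge loop; no separate deletion step is needed because only METAR lines open a new output cell.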
9 Comments
Linus Dock
on 13 Nov 2021
Thank you! I will try this and see if it works.
Best regards
/Linus
Linus Dock
on 16 Nov 2021
Hello again!
Thanks for your help!
I can't get the readcell function to work with my version of MATLAB (R2018b).
I tried incorporating your suggestion into my code, but I'm afraid I can't get it to function properly.
%Imports all .txt files according to user input time and converts them into strings
Data = cell(1, numfiles); %Preallocate empty cell
for h = 1:numfiles
filename = sprintf('%s.txt',w(h,:)); %add .txt to year and month
fileID = fopen(filename); %open filename to create fileID
Data{h} = textscan(fileID,'%s','delimiter','\n'); %read all characters in fileID
fclose(fileID); %close fileID
end
d=Data{:}{1};
%d=readcell('202103.txt','Delimiter',newline); % read a cellstr array
i1=find(~startsWith(d,'METAR'))-1; % locate first of line pairs
for i=1:numel(i1) % and merge those by pair
d(i1(i))=join(d(i1(i):i1(i)+1));
end
d(i1+1)=[]; % then eliminate the second
d is now just a 1x1 cell with the following content:
'METAR ESGG 010020Z 19007KT 0150 R03/0600N R21/0550N FG VV003 01/00 Q1030 R21/09//95
METAR ESGG 010050Z 20007KT 0150 R03/0550N R21/0550N FG VV002 01/00'
Judging by the return symbol in front of the second METAR group below, the code does seem to separate the groups. But how do I get the output as separate cells, each containing one METAR line?
{'METAR ESGG 010020Z 19007KT 0150 R03/0600N R21/0550N FG VV003 01/00 Q1030 R21/09//95↵METAR ESGG 010050Z 20007KT 0150 R03/0550N R21/0550N FG VV002 01/00' }
{'Q1030 R21/09//95↵METAR ESGG 010050Z 20007KT 0150 R03/0550N R21/0550N FG VV002 01/00' }
{'Q1030 R21/09//95↵METAR ESGG 010120Z 21007KT 0150 R03/0500N R21/0500N FG VV002 01/00' }
{'Q1030 R21/09//95↵METAR ESGG 010150Z 19007KT 0100 R03/0500N R21/0450N FG VV002 01/00'
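Records like these can also be repaired after the fact in R2018b. A hedged sketch, assuming the `↵` in each record marks a carriage return (`char(13)`) or line feed (`char(10)`) that textscan left inside the record: glue the records back together and split on every run of CR/LF with `regexp(..., 'split')`, so each physical line gets its own cell. The sample records below are hypothetical stand-ins for the textscan output.

```matlab
% Hypothetical textscan-style result: each record hides an embedded break
CR = char(13);
LF = char(10);
c = {['METAR ESGG 010020Z 19007KT 0150 FG VV003 01/00' CR ' Q1030 R21/09//95']; ...
     ['METAR ESGG 010050Z 20007KT 0150 FG VV002 01/00' CR ' Q1030 R21/09//95']};

% Rejoin the records with LF, then split on any run of CR and/or LF
txt = strjoin(c.', LF);                        % strjoin expects a row cell array
physLines = regexp(txt, '[\r\n]+', 'split').';  % column cell, one physical line each
physLines = physLines(~cellfun('isempty', strtrim(physLines)));  % drop blank pieces
```

`physLines` now has one physical line per cell, so the METAR/continuation merge from the accepted answer can be applied to it.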
dpb
on 16 Nov 2021
Oh. Unfortunately for you, readcell was introduced in R2019a.
I had difficulty with textscan, too: the input file contains \r at the end of each METAR line and \n after the short lines. That seemed to confuse all the ways I've used in the past to return records as a cellstr from textscan.
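To check which terminators a given file actually contains, one option is to read it raw with `fileread` and count them. This is a sketch; the sample file it writes is hypothetical and only mimics the layout described above (CR after each METAR line, LF after each short line).

```matlab
% Write a small hypothetical sample: CR after METAR lines, LF after short lines
fname = [tempname '.txt'];
fid = fopen(fname, 'w');                  % binary mode: no newline translation
fprintf(fid, 'METAR ESGG 010020Z 19007KT 0150 FG VV003 01/00\r Q1030 R21/09//95\n');
fprintf(fid, 'METAR ESGG 010050Z 20007KT 0150 FG VV002 01/00\r Q1030 R21/09//95\n');
fclose(fid);

% Count each terminator in the raw text
raw   = fileread(fname);
nCR   = numel(strfind(raw, char(13)));               % all carriage returns
nLF   = numel(strfind(raw, char(10)));               % all line feeds
nCRLF = numel(strfind(raw, [char(13) char(10)]));    % Windows-style CR+LF pairs
fprintf('CR: %d  LF: %d  CRLF pairs: %d\n', nCR, nLF, nCRLF);
delete(fname);
```

A file with consistent Windows line endings gives the same number for all three counts; bare carriage returns (`nCR > nCRLF`) alongside line feeds point to the mixed layout.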
My usual fallback in such cases is to return to the venerable (but deprecated) textread but it also failed with a (new to me) buffer overflow because it, too, apparently became confused by the disparate terminators.
So, before reverting to fgetl and a loop (which isn't all that bad, actually, just a little more code to write, but less than your loop above), I tried the simple expedient of
>> d=importdata('202103.txt');
>> whos d
Name Size Bytes Class Attributes
d 77796x1 15658840 cell
>> d(1:6)
ans =
6×1 cell array
{'METAR ESGG 010020Z 19007KT 0150 R03/0600N R21/0550N FG VV003 01/00'}
{' Q1030 R21/09//95' }
{'METAR ESGG 010050Z 20007KT 0150 R03/0550N R21/0550N FG VV002 01/00'}
{' Q1030 R21/09//95' }
{'METAR ESGG 010120Z 21007KT 0150 R03/0500N R21/0500N FG VV002 01/00'}
{' Q1030 R21/09//95' }
>>
and joy ensues.
Now the previous join trick should work as expected.
dpb
on 16 Nov 2021
Edited: dpb
on 16 Nov 2021
ADDENDUM
%Imports all .txt files according to user input time and converts them into strings
Data = cell(1, numfiles); %Preallocate empty cell
for h = 1:numfiles
filename = sprintf('%s.txt',w(h,:));
d=importdata(filename);
i1=find(~startsWith(d,'METAR'))-1;
for i=1:numel(i1)
d(i1(i))=join(d(i1(i):i1(i)+1));
end
d(i1+1)=[];
% Now do your business on this file BEFORE going to the next one
% That could include (and I would recommend) writing it out in clean form
% either as new text file or replacing the original (be very careful to
% have backups first if trying that) or SAVEing as .mat files.
% ... (your per-file processing goes here)
Data{h}=d; % brace indexing stores the whole cell array d in cell h of Data
end
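For the "write it out in clean form" suggestion in the comments above, a minimal sketch could print one report per line to a new file. The output name here is hypothetical (a temporary file), and the cleaned `d` is made-up sample data standing in for one processed file.

```matlab
% Hypothetical cleaned result for one file (one complete report per cell)
d = {'METAR ESGG 010020Z 19007KT 0150 FG VV003 01/00 Q1030 R21/09//95'; ...
     'METAR ESGG 010050Z 20007KT 0150 FG VV002 01/00 Q1030 R21/09//95'};

% Hypothetical output name: write to a NEW file, never over the original
outName = [tempname '_clean.txt'];
fid = fopen(outName, 'w');
if fid < 0
    error('Could not open %s for writing', outName);
end
fprintf(fid, '%s\n', d{:});   % format is recycled: one report per line
fclose(fid);

% Read it back to confirm the layout
check = regexp(fileread(outName), '\n', 'split');
check = check(~cellfun('isempty', check));   % drop the piece after the final LF
```

For the .mat route the same comment mentions, `save(matName, 'd')` stores the cell array as is, with `matName` being whatever file name you choose.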
dpb
on 16 Nov 2021
Edited: dpb
on 16 Nov 2021
ADDENDUM SECOND
If for some reason, importdata also has a problem...
%Imports all .txt files according to user input time and converts them into strings
Data = cell(1, numfiles); %Preallocate empty cell
for h = 1:numfiles
filename = sprintf('%s.txt',w(h,:));
fid=fopen(filename);
% replacement to read file with low-level fgetl()
d={};
i=0;
while ~feof(fid)
i=i+1;
d(i,1)={fgetl(fid)};
end
fclose(fid);
% end alternate code here...
i1=find(~startsWith(d,'METAR'))-1;
for i=1:numel(i1)
d(i1(i))=join(d(i1(i):i1(i)+1));
end
d(i1+1)=[];
% Now do your business on this file BEFORE going to the next one
% That could include (and I would recommend) writing it out in clean form
% either as new text file or replacing the original (be very careful to
% have backups first if trying that) or SAVEing as .mat files.
% ... (your per-file processing goes here)
Data{h}=d; % brace indexing stores the whole cell array d in cell h of Data
end
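A variant of the fgetl reader above, sketched under a few assumptions: it checks that `fopen` succeeded, loops on `ischar` of the `fgetl` result (which returns -1 at end of file) instead of `feof`, and preallocates the cell array, doubling it when full. The initial capacity of 1000 is arbitrary, and the sample file is hypothetical so the sketch runs on its own.

```matlab
% Hypothetical two-line sample file so the sketch runs on its own
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'METAR ESKN 160050Z 31003KT 5000 BR FEW064 M04/M04 Q1011\nR08/750135\n');
fclose(fid);

fid = fopen(fname, 'r');
if fid < 0                                % fopen returns -1 on failure
    error('Could not open %s', fname);
end
d = cell(1000, 1);                        % initial capacity (arbitrary)
n = 0;                                    % number of lines read so far
tline = fgetl(fid);
while ischar(tline)                       % fgetl returns -1 at end of file
    n = n + 1;
    if n > numel(d)
        d = [d; cell(numel(d), 1)];       % double the capacity when full
    end
    d{n} = tline;
    tline = fgetl(fid);
end
fclose(fid);
d = d(1:n);                               % discard unused preallocated cells
delete(fname);
```

The resulting `d` can go straight into the same merge step as before.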
dpb
on 16 Nov 2021
ADDENDUM THIRD
>> frewind(fid),clear d;d={};tic;i=0;while ~feof(fid),i=i+1;d(i,1)={fgetl(fid)};end,toc
Elapsed time is 0.476456 seconds.
>> whos d
Name Size Bytes Class Attributes
d 77796x1 15658840 cell
>>
Reading with fgetl into an initially empty cell array isn't too bad as far as timing goes...
>> tic,d=importdata('202103.txt');toc
Elapsed time is 0.173747 seconds.
>>
but importdata wins hands down (0.17 s vs. 0.48 s here), so use it if at all possible.
Linus Dock
on 17 Nov 2021
Awesome! There was joy!
Importdata did the trick.
This is what worked for me
Data = cell(1, numfiles); %Preallocate empty cell
for h = 1:numfiles
filename = sprintf('%s.txt',w(h,:));
d=importdata(filename);
i1=find(~startsWith(d,'METAR'))-1;
for i=1:numel(i1)
d(i1(i))=join(d(i1(i):i1(i)+1));
end
d(i1+1)=[]; % then eliminate the second
Data{h}=d; % will save into your large array
end
Thanks a lot!