How to read grid data from a text file?

pruth on 23 Sep 2017
Commented: dpb on 12 Jul 2019
Hi, I have a text file (attached) which contains ozone data. I am not able to read the data, since it is not in a regular format. Only the latitude is given (59.5S to 59.5N, in 1.00 degree steps), and for every latitude all the ozone data are listed, one value for each of the 288 longitudes (179.375W to 179.375E, in 1.25 degree steps). The problem is that all the data are in string format and need to be split after every 3 digits. Some spaces also appear in the middle of the data, so those have to be removed as well, otherwise the data will not split into correct 3-digit values.
Later I will use inpolygon to extract the data for a specific region, but first I need to read this text file and get the data out.
Hope you understand.
pruth on 23 Sep 2017
Edited: pruth on 23 Sep 2017
Yes, the same original file is attached. The data I used earlier was arranged very differently and was a bit simpler. I am not good at programming, so I am finding this hard. Hope you will help.

Accepted Answer

Cedric Wannaz on 23 Sep 2017
Edited: Cedric Wannaz on 23 Sep 2017
The format seems to be GridTOMS as mentioned here. There is an IDL reader and there may be MATLAB ones.
If you need a stable reader, I advise you to look for a MATLAB implementation "endorsed" by NASA. If you need a quick hack to perform early tests, you can try the following (where I assume that spaces code for trailing zeros):
content = fileread( 'L3_tropo_ozone_column_jan14.txt' ) ;
% - Remove first space on all data rows.
content = regexprep( content, '(?<=[\r\n]) ', '' ) ;
% - Split by "lat = ..." separator.
blocks = regexp( content, '\s+lat[^\r\n]+', 'split' ) ;
% - Extract header from block 1.
pos = regexp( blocks{1}, '\)\s+\d', 'start' ) ;
header = blocks{1}(1:pos) ;
blocks{1} = blocks{1}(pos+1:end) ;
% - Merge blocks, remove \r\n, replace spaces by 0s.
blocks = [blocks{:}] ;
blocks = regexprep( blocks, '[\r\n]', '' ) ;
blocks(blocks == ' ') = '0' ;
% - Convert to 120x288 numeric array.
data = reshape( sscanf( blocks, '%3d' ), 288, 120 ).' ;
Note that it is easy to wrap this in a function and call it while iterating through files from a folder (using the output of DIR). It is also easy to extract meta information from the header if relevant.
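The wrapping mentioned above could look roughly like the sketch below. The function name readOzoneGrid, the folder name ozone_data and the file pattern are all made up for illustration, and the body simply repeats the quick hack, so it inherits the same assumptions (120 latitude blocks, 288 values each, spaces standing for zeros). Local functions at the end of a script need R2016b or later; on older releases the function would go in its own file.

```matlab
% Sketch only: folder name and file pattern are assumptions, not from the thread.
folder = 'ozone_data';
files  = dir(fullfile(folder, 'L3_tropo_ozone_column_*.txt'));

grids = cell(numel(files), 1);            % one 120x288 array per file
for k = 1:numel(files)
    grids{k} = readOzoneGrid(fullfile(folder, files(k).name));
end

function data = readOzoneGrid(filename)
% READOZONEGRID  Hypothetical wrapper around the quick hack above.
    content = fileread(filename);
    content = regexprep(content, '(?<=[\r\n]) ', '');       % drop first space of each data row
    blocks  = regexp(content, '\s+lat[^\r\n]+', 'split');   % split at the "lat = ..." markers
    pos     = regexp(blocks{1}, '\)\s+\d', 'start');        % closing parenthesis that ends the header
    blocks{1} = blocks{1}(pos+1:end);                       % keep only the data part of block 1
    blocks  = [blocks{:}];                                  % merge all blocks into one string
    blocks  = regexprep(blocks, '[\r\n]', '');              % remove line breaks
    blocks(blocks == ' ') = '0';                            % spaces become zeros
    data    = reshape(sscanf(blocks, '%3d'), 288, 120).';   % 120x288 numeric array
end
```

If the folder is empty or missing, dir returns an empty struct array and grids stays an empty cell, so the loop is safe to run before any files are in place.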
dpb on 12 Jul 2019
"so i changed to sscanf( blocks, '%4d' )"
The problem is C: its formatting was not designed with fixed-width files in mind and it simply can't handle them by default, because '%4d' does NOT mean what one would logically expect, namely "read four-character-wide fields starting at the beginning of the record". Instead it means "read no more than 4 characters", and C silently "eats" the white space first. So, as you noticed, by the time it gets to the fourth entry in your input record, it begins with the 8 instead of the blank and reads "no more than" four characters from there, which is not the right answer. Fortran FORMAT gets it right, but unfortunately MathWorks chose the easy way out when it rewrote MATLAB in C and used the C runtime I/O library instead of building a FORMAT facility. Recent releases have (finally!! after 30 years) introduced a new fixed-width text import object, but that won't help you unless you can upgrade.
You simply have to count characters (including blanks) and process the resulting substrings -- with the sample record you give (NB: you're missing the leading blank at the beginning of the record)
>> str2num(reshape(rec,4,[]).')
ans =
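The difference between the two readings can be seen on a small made-up record (not the one from the comment above), built from three 4-character fields. This is only an illustration of the behaviour described; it uses str2double on the cut-out pieces rather than str2num, which is a swapped-in choice.

```matlab
% Made-up record of three 4-character fields: ' 123', '   4', '5678'.
rec = ' 123   45678';

% C-style scanning: skips blanks first, then reads at most 4 characters.
viaSscanf = sscanf(rec, '%4d');

% Fixed-width parsing: cut the record every 4 characters, then convert each piece.
fields   = cellstr(reshape(rec, 4, []).');   % {' 123'; '   4'; '5678'}
viaFixed = str2double(fields);

disp(viaSscanf.');   % 123 4567 8
disp(viaFixed.');    % 123 4 5678
```

The second field is where the two diverge: sscanf skips the three blanks and then takes four digits starting at the 4, while the reshape keeps the blanks as part of the field.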

More Answers (2)

dpb on 23 Sep 2017
Edited: dpb on 23 Sep 2017
  1. Read the file as a block of cellstr and convert it to a character array.
  2. Convert the 12x75 char array to 1x900: line=reshape(blk.',1,[]);
  3. Select the first 288*3 --> 864 characters: c=line(1:864);
  4. Replace any blanks with '0': c=strrep(c,' ','0');
  5. Convert the 3-digit fields: dat=sscanf(c,'%3d');
  6. Go to the next block.
Thanks to Cedric for pointing out my weak eyes... :)
file=textread('tropo.txt','%s','delimiter', '\n','whitespace', '','headerlines',3); % file as cellstr array
L=length(file); % number lines/records in file
data=zeros(L/12,288); % preallocate for resulting data
j=0; % counter for data blocks
for i=1:12:L % loop over blocks of 12 records
  blk=char(file(i:i+11)); % retrieve a block, convert to character array
  blk(:,1)=''; % remove leading blanks
  line=reshape(blk.',1,[]); line=line(1:864); % recast as record;truncate
  line=strrep(line,' ','0'); % replace blanks with leading 0
  j=j+1; % increment counter
  data(j,:)=sscanf(line,'%3d'); % convert to numeric
end
results in a double array containing the data...
From the first block I tested at command line--
>> whos data
  Name        Size            Bytes  Class     Attributes
  data      288x1              2304  double
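Once the full 120x288 array is available, the region extraction with inpolygon mentioned in the question could be sketched as follows. The grid vectors follow the ranges given in the question; the polygon vertices are invented for illustration, random numbers stand in for the parsed ozone grid, and it is an assumption that the rows run from 59.5S upward in the same order as the latitude vector.

```matlab
% Grid-cell centres as described in the question.
lat = (-59.5:1:59.5).';            % 120 latitudes, south to north (assumed row order)
lon = -179.375:1.25:179.375;       % 288 longitudes, west to east
[LON, LAT] = meshgrid(lon, lat);   % both 120x288, same layout as the data array

data = rand(120, 288);             % placeholder standing in for the parsed ozone grid

% Hypothetical region of interest, given as polygon vertices (lon, lat).
xv = [60 100 100 60];
yv = [ 5   5  40 40];

in = inpolygon(LON, LAT, xv, yv);  % logical mask, true inside or on the polygon
regionValues = data(in);           % column vector of values inside the region
regionMean   = mean(regionValues);
```

Because the mask has the same size as the data, it can also be used to blank out everything outside the region, for example by setting data(~in) to NaN before plotting.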
Cedric Wannaz on 23 Sep 2017
My maybe younger eyes failed me too. I had to get tricked a couple times before I realized!

Guillaume on 23 Sep 2017
Whoever created that format should be very ashamed. It's a pain to parse.
This is a start. I still need to figure out why I've got 292 columns instead of 288, but I've got to go.
filecontent = fileread('L3_tropo_ozone_column_jan14.txt'); %read it all
filecontent(ismember(filecontent, [10, 13])) = []; %remove line returns
longdesc = regexp(filecontent, 'Longitudes:\s*(\d+)\D+(\d+(\.\d+)?)([EW])\D+(\d+(\.\d+)?)([EW])', 'tokens', 'once'); %longitude description
longnumbers = str2double(longdesc([1 2 4]));
longnumbers(2:3) = longnumbers(2:3) .* (-1).^ strcmp(longdesc([3 5]), 'W'); %change sign for W
longitudes = linspace(longnumbers(2), longnumbers(3), longnumbers(1));
pointlats = regexp(filecontent, '\s+([0-9 ]+)lat\s*=\s*(-?\d+(\.\d+)?)', 'tokens'); %extract point strings and latitude
pointlats = vertcat(pointlats{:});
latitudes = str2double(pointlats(:, 2));
points = regexprep(pointlats(:, 1), '\s', '0'); %replace spaces with 0
points = regexp(points, '\d{3}', 'match'); %split in group of three
points = str2double(vertcat(points{:}));
dpb on 23 Sep 2017
I wonder why they put the leading blank in there, though... That is the only really bad part; the rest is pretty easy to deal with, but the blank makes for special-casing. Oh, the lack of leading zeros in the format is also pretty ugly; almost forgot that! :)
