How to read grid data from a text file?

Question

0 votes

L3_tropo_ozone_column_jan14.txt

Hi, I have a text file (attached) which contains ozone data, and I am not able to read it because it is not in a regular format. Only the latitude (-59.5S to 59.5N, 1.00 degree steps) is given, and for every latitude all the ozone values are listed: there are 288 longitudes (-179.375W to 179.375E, 1.25 degree steps), therefore 288 data points per latitude. The problem is that all the data is stored as one string, and we need to split it every 3 digits. Some random spaces also appear in the middle of the data, so we have to deal with those too; otherwise the data will not split into correct 3-digit groups.
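To make the layout concrete, here is a minimal MATLAB sketch of the grid described above. It assumes the bin centres are exactly as stated (latitudes -59.5 to 59.5 in 1-degree steps, longitudes -179.375 to 179.375 in 1.25-degree steps) and that every value occupies exactly three characters, with blanks standing in for leading zeros. The record `rec` is made up for illustration and is not taken from the attached file.

```matlab
% Bin centres as stated in the question (assumed exact).
lat = -59.5 : 1.00 : 59.5;          % 1x120 latitude centres
lon = -179.375 : 1.25 : 179.375;    % 1x288 longitude centres
fprintf('%d latitudes x %d longitudes\n', numel(lat), numel(lon));

% Made-up record: each value occupies exactly 3 characters, and small
% values are right-aligned with blanks instead of leading zeros.
rec = '267259 84  5300';
rec(rec == ' ') = '0';               % becomes '267259084005300'
vals = sscanf(rec, '%3d').';         % read consecutive 3-character fields
disp(vals)                           % 267 259 84 5 300
```

Because every blank is turned into a digit first, `sscanf` has no white space to skip, so `'%3d'` consumes exactly three characters per value.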

Later I will use inpolygon to extract the data for a specific region, but I will try that later. First I need to read this text file and get the data out.

Hope you understand.
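For the later region-selection step, a minimal sketch of how `inpolygon` could be applied to the 120x288 grid. It assumes the bin centres given in the question and a `data` array laid out as latitude rows by longitude columns; the random stand-in for `data` and the polygon vertices are placeholders, not real values.

```matlab
% Grid of bin centres (assumed layout: rows = latitude, columns = longitude).
lat = -59.5 : 1.00 : 59.5;
lon = -179.375 : 1.25 : 179.375;
[LON, LAT] = meshgrid(lon, lat);     % both 120x288, same shape as data

data = randi([200 350], 120, 288);   % placeholder for the parsed ozone grid

% Placeholder region: longitude/latitude vertices of a closed polygon.
polyLon = [60 100 100 60 60];
polyLat = [ 0   0  30 30  0];

in = inpolygon(LON, LAT, polyLon, polyLat);   % logical 120x288 mask
regionValues = data(in);                       % values inside the region
fprintf('%d grid cells inside the region, mean = %.1f\n', ...
        nnz(in), mean(regionValues));
```

Building `LON` and `LAT` with `meshgrid(lon, lat)` keeps the mask the same size as `data`, so it can be used directly as a logical index.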

2 Comments

Cedric on 23 Sep 2017

Does this format have a name? Is it the original format in which the data is distributed?

pruth on 23 Sep 2017

Edited: pruth on 23 Sep 2017

Yes, the original file is attached. The data I used earlier was arranged very differently and was a bit simpler. I am not good at programming, so I am finding this hard. I hope you will help.


Answer 1

Cedric on 23 Sep 2017

Edited: Cedric on 23 Sep 2017


1 vote

The format seems to be GridTOMS as mentioned here. There is an IDL reader and there may be MATLAB ones.

If you need a stable reader, I advise you to look for a MATLAB implementation "endorsed" by NASA. If you need a quick hack to perform early tests, you can try the following (where I assume that blanks stand for leading zeros):

 content = fileread( 'L3_tropo_ozone_column_jan14.txt' ) ;
 % - Remove first space on all data rows.
 content = regexprep( content, '(?<=[\r\n]) ', '' ) ;
 % - Split by "lat = ..." separator.
 blocks = regexp( content, '\s+lat[^\r\n]+', 'split' ) ;
 % - Extract header from block 1.
 pos = regexp( blocks{1}, '\)\s+\d', 'start' ) ;
 header = blocks{1}(1:pos) ;
 blocks{1} = blocks{1}(pos+1:end) ;
 % - Merge blocks, remove \r\n, replace spaces by 0s.
 blocks = [blocks{:}] ;
 blocks = regexprep( blocks, '[\r\n]', '' ) ;
 blocks(blocks == ' ') = '0' ;
 % - Convert to 120x288 numeric array.
 data = reshape( sscanf( blocks, '%3d' ), 288, 120 ).' ;

Note that it is easy to wrap this in a function and call it while iterating through files from a folder (using the output of DIR). It is also easy to extract meta information from the header if relevant.
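To see what each step of the code above does, here is the same sequence run on a made-up miniature file with 4 longitudes and 2 latitudes instead of 288 and 120. The layout of the fake content is an assumption drawn from this thread (one leading blank per data row, blanks for leading zeros, each "lat = ..." label following the last row of its block); the only change to the steps is `'once'` in the header search, so that only the first match is used.

```matlab
% Made-up miniature file: 4 longitudes x 2 latitudes, 2 values per row.
content = sprintf(['Header (1.00 degree steps)\n' ...
                   ' 267 84\n' ...
                   '   5300   lat =  -59.5\n' ...
                   ' 111222\n' ...
                   ' 333  4   lat =  -58.5\n']);
nLon = 4;  nLat = 2;                 % 288 and 120 for the real files

content = regexprep(content, '(?<=[\r\n]) ', '');          % drop row-leading blank
blocks  = regexp(content, '\s+lat[^\r\n]+', 'split');      % split on lat labels
pos     = regexp(blocks{1}, '\)\s+\d', 'start', 'once');   % end of header
header  = blocks{1}(1:pos);
blocks{1} = blocks{1}(pos+1:end);
blocks  = regexprep([blocks{:}], '[\r\n]', '');            % one long record
blocks(blocks == ' ') = '0';                               % blanks -> zeros
data = reshape(sscanf(blocks, '%3d'), nLon, nLat).';       % nLat x nLon
disp(data)
```

After the blank-to-zero step the merged record is `'267084005300111222333004'`, which `sscanf` reads as eight 3-digit values before the reshape.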

9 Comments

Cedric on 23 Sep 2017

Edited: Cedric on 23 Sep 2017


If these files all have the same format, there should not be any problem, but check a few years/months to be sure. Pick, e.g., the first and last values for a random latitude, so you can easily compare what is in the file with what is in the array.
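For that spot check, a small sketch that converts a bin centre into a row and column of the 120x288 array. It assumes the blocks run from -59.5 (first block, row 1) to 59.5 (last block, row 120) and that column 1 is -179.375; check this against the "lat = ..." labels in your file. `latToRow` and `lonToCol` are hypothetical helper names.

```matlab
% Hypothetical helpers mapping bin centres to array indices.
% Assumes row 1 = -59.5 deg with 1-degree steps, and
% column 1 = -179.375 deg with 1.25-degree steps.
latToRow = @(lat) round(lat + 60.5);
lonToCol = @(lon) round((lon + 179.375) / 1.25) + 1;

r = latToRow(-59.5);                 % first latitude block
c = lonToCol([-179.375 179.375]);    % first and last longitude
fprintf('row %d, columns %d and %d\n', r, c(1), c(2));
% Compare data(r, c) with the first and last value of that block in the file.
```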

If all files are in the same folder, e.g. "Originals", you can automate the process:

 dataFolder = 'Originals' ;
 dirListing = dir( fullfile( dataFolder, '*.txt' )) ;
 ozoneData = struct( 'year', [], 'month', {}, 'monthId', [], 'data', {} ) ;
 monthStr = {'January', 'February', 'March', 'April', 'May', 'June', 'July', ...
    'August', 'September', 'October', 'November', 'December'} ;
 % - Iterate through files and process.
 for fileId = 1 : numel( dirListing )
    % - Read relevant file.
    locator = fullfile( dataFolder, dirListing(fileId).name ) ;
    fprintf( 'Processing %s ..\n', locator ) ;   
    content = fileread( locator ) ;
    % - Remove first space on all data rows.
    content = regexprep( content, '(?<=[\r\n]) ', '' ) ;
    % - Split by "lat = ..." separator.
    blocks = regexp( content, '\s+lat[^\r\n]+', 'split' ) ;
    % - Extract header from block 1.
    pos = regexp( blocks{1}, '\)\s+\d', 'start' ) ;
    header = blocks{1}(1:pos) ;
    blocks{1} = blocks{1}(pos+1:end) ;
    % - Merge blocks, remove \r\n, replace spaces by 0s.
    blocks = [blocks{:}] ;
    blocks = regexprep( blocks, '[\r\n]', '' ) ;
    blocks(blocks == ' ') = '0' ;
    % - Convert to 120x288 numeric array.
    ozoneData(fileId).data = reshape( sscanf( blocks, '%3d' ), 288, 120 ).' ;
    % - Extract year and month from header, compute month ID.
    monthYear = regexp( header, '(\w+)\s+(\d+)', 'tokens', 'once' ) ;
    ozoneData(fileId).year    = str2double( monthYear{2} ) ;
    ozoneData(fileId).month   = monthYear{1} ;
    ozoneData(fileId).monthId = find( strcmpi( monthYear{1}, monthStr )) ;
 end
 % - Sort by year and month (as file naming is messing up the order).
 [~, reIndex] = sortrows( [ozoneData.year; ozoneData.monthId].' ) ;
 ozoneData = ozoneData(reIndex) ;

and then you have a struct array that you can access as follows (note that I had just a few files, so entry #2 won't be the same on your system):

 >> ozoneData(2)
 ans = 
  struct with fields:
       year: 2014
      month: 'February'
    monthId: 2
       data: [120×288 double]
 >> ozoneData(2).data
 ans =
   267   259   269   274   251   241   243   267   258   294   258   262 ...
   ...
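If you then want to work across months, one option is to stack the monthly grids into a single 3-D array with `cat`. A sketch, using two tiny made-up grids in place of real `ozoneData` entries (the real ones are 120x288):

```matlab
% Made-up stand-in for the struct array built above.
ozoneData = struct('year',    {2014, 2014}, ...
                   'month',   {'January', 'February'}, ...
                   'monthId', {1, 2}, ...
                   'data',    {[267 259; 270 280], [251 241; 243 267]});

allData = cat(3, ozoneData.data);                     % nLat x nLon x nMonths
series  = squeeze(allData(1, 2, :)).';                % one grid cell over time
monthlyMean = squeeze(mean(mean(allData, 1), 2)).';   % grid mean per month
disp(series)        % 259 241
```

`ozoneData.data` expands to a comma-separated list, so `cat(3, ...)` concatenates all months along the third dimension in the sorted order.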

pirapts Raptis on 12 Jul 2019

Edited: pirapts Raptis on 12 Jul 2019

Hello everybody,

I am processing some similar files (ASCII again, from the same dataset, but for another variable: link).

The problem is that there are 4-digit numbers in the files.

So I changed to sscanf( blocks, '%4d' ),

which gives the correct dimensions for the output (720x1440),

but some numbers are misread.

In the ASCII file a line looks like

559 584 656 84811281610184216791461128412291089 667 574

but MATLAB parses it as

559 584 656 8481 1281 6101 ...

instead of

559 584 656 848 1128 1610...

I have tried processing them line by line and the same fault appears.

I have also noticed that the blocks char array has length 4108186.

I still don't understand how I get the correct dimensions (720*1440*4 = 4147200 for 4-digit fields) and why it starts reading at the wrong digit once 4-digit numbers appear.

Any idea on how to handle that would be really useful.

(MATLAB R2014b)

dpb on 12 Jul 2019


"so i changed to sscanf( blocks, '%4d' )"

The problem is C: its formatted input was not designed with fixed-width files in mind, and it simply can't handle them by default, because '%4d' does NOT mean what one would logically expect, namely "read four-character-wide fields starting at the beginning of the record". Instead it means "read no more than 4 characters", but C silently "eats" white space first. So, as you noticed, by the time it gets to the fourth entry in your input record, it begins with the 8 instead of the blank and reads no more than four characters, which is not the right answer. Fortran FORMAT gets this right, but unfortunately MathWorks chose the easy way out when they rewrote MATLAB in C and used the C runtime I/O library instead of building a FORMAT facility. Later releases have (finally!! after 30 years) introduced a new fixed-width text import object, but that won't help you unless you can upgrade.

You simply have to count characters (including blanks) and process the resulting substrings. With the sample record you gave (NB: you're missing the leading blank at the beginning of the record):

>> rec = ' 559 584 656 84811281610184216791461128412291089 667 574';
>> str2num(reshape(rec,4,[]).')
ans =
         559
         584
         656
         848
        1128
        1610
        1842
        1679
        1461
        1284
        1229
        1089
         667
         574
>>
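The same sample record also shows why the blank-to-zero trick from Cedric's answer carries over to 4-digit data: once every blank is replaced by '0', there is no white space left for `sscanf` to skip, so `'%4d'` really does read consecutive 4-character fields. A sketch, assuming every record is an exact concatenation of 4-character fields including its leading blank (as noted above for the sample), so that the row-leading blank must not be stripped for these files:

```matlab
% Sample record from the comment above, with its leading blank restored.
rec = ' 559 584 656 84811281610184216791461128412291089 667 574';

% Naive read: sscanf skips blanks, so fields after "848" drift by one digit.
naive = sscanf(rec, '%4d').';

% Fixed-width read: turn blanks into zeros so there is no white space to skip.
padded = rec;
padded(padded == ' ') = '0';
vals = sscanf(padded, '%4d').';

disp(naive)   % 559 584 656 8481 1281 6101 ...
disp(vals)    % 559 584 656 848 1128 1610 ...
```

On this record the naive read still returns 14 values (the last drifted field is "089"), which is consistent with the array dimensions coming out right even though the values are wrong. For the whole file, the same replacement would be applied to the merged character array before `sscanf`, with the reshape sizes changed to match your grid.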


Answer 2

dpb on 23 Sep 2017

Edited: dpb on 23 Sep 2017


2 votes

1. Read the file as a block of cellstr and convert it to a character array.
2. Convert the 12x75 char array to 1x900: line=reshape(blk.',1,[]);
3. Select the first 288*3 = 864 characters: c=line(1:864);
4. Replace any blanks with '0': c=strrep(c,' ','0');
5. Convert the 3-digit fields: dat=sscanf(c,'%3d');
6. Go to the next block.

Thanks to Cedric for pointing out my weak eyes... :)

file=textread('tropo.txt','%s','delimiter', '\n','whitespace', '','headerlines',3);  % file as cellstr array
L=length(file);     % number lines/records in file
data=zeros(L/12,288);  % preallocate for resulting data
j=0;                   % counter for data blocks
for i=1:12:L           % loop over blocks of 12 records
  blk=char(file(i:i+11));  % retrieve a block, convert to character array
  blk(:,1)='';    % remove leading blanks
  line=reshape(blk.',1,[]); line=line(1:864);  % recast as record;truncate
  line=strrep(line,' ','0');                   % replace blanks with leading 0
  j=j+1;                                       % increment counter
  data(j,:)=sscanf(line,'%3d');                % convert to numeric
end

results in a double array containing the data...

From the first block, as I tested it at the command line:

>> whos data
Name        Size            Bytes  Class     Attributes
data      288x1              2304  double              
>>

3 Comments

dpb on 23 Sep 2017

Old eyes failed me...I had mistakenly thought char() had gotten rid of the leading space but didn't...thanks.

Cedric on 23 Sep 2017

My maybe younger eyes failed me too. I had to get tricked a couple times before I realized!


Answer 3

Guillaume on 23 Sep 2017


0 votes

Whoever created that format should be very ashamed. It's a pain to parse.

This is a start. I still need to figure out why I've got 292 columns instead of 288, but I've got to go.

filecontent = fileread('L3_tropo_ozone_column_jan14.txt');  %read it all
filecontent(ismember(filecontent, [10, 13])) = []; %remove line returns
longdesc = regexp(filecontent, 'Longitudes:\s*(\d+)\D+(\d+(\.\d+)?)([EW])\D+(\d+(\.\d+)?)([EW])', 'tokens', 'once');  %longitude description
longnumbers = str2double(longdesc([1 2 4]));
longnumbers(2:3) = longnumbers(2:3) .* (-1).^ strcmp(longdesc([3 5]), 'W'); %change sign for W
longitudes = linspace(longnumbers(2), longnumbers(3), longnumbers(1));
pointlats = regexp(filecontent, '\s+([0-9 ]+)lat\s*=\s*(-?\d+(\.\d+)?)', 'tokens'); %extract point strings and latitude
pointlats = vertcat(pointlats{:});
latitudes = str2double(pointlats(:, 2));
points = regexprep(pointlats(:, 1), '\s', '0'); %replace spaces with 0
points = regexp(points, '\d{3}', 'match');  %split in group of three
points = str2double(vertcat(points{:}));

5 Comments

Cedric on 23 Sep 2017

The format is consistent (see my comment under your answer). What is annoying is that it was designed partly around "machine" constraints and partly to look "cute" to a human eye when opened in a text editor.

dpb on 23 Sep 2017

I wonder why they put the leading blank in there, though... that really is the only bad part; the rest is pretty easy to deal with, but that makes for special-casing. Oh, the lack of leading zeros in the format is also pretty ugly; almost forgot that! :)
