MATLAB Answers

Can one read one line of string title followed by multiple lines of numeric data like FORTRAN?

Kenneth Lamury on 3 Jul 2016
Edited: dpb on 17 Jul 2016
My test.txt file consists of one line of 80 characters followed by multiple FORTRAN lines that hold both integer and floating-point numbers in I7 or F7.0 format. Am I restricted to ' ' or ',' delimiters (which would require much rework), or is there a way to read a FORTRAN file?
C2345678901234567890123456789012345678901234567890123456789012345678901234567890
10 3.0 4.0 5.0 7.0 9.0 10.0 15.0 20.0 25.0 xxxxxxxxxx
I plan to write MATLAB (R2016a Student) code that reads a FORTRAN txt/doc input file.
  3 Comments
dpb on 6 Jul 2016
"how does one read a FORTRAN input file into MATLAB?(*)"
I'd suggest two things--first Matlab is NOT Fortran so don't try to turn a Fortran program literally into Matlab.
In Matlab there isn't strong typing as there is in Fortran so the distinction between integer and real is immaterial; everything numeric is DOUBLE unless you specifically cast it to something else which is also fraught with some "issues". It's not likely there's much if anything in the code that will matter whether integers are specifically stored as integers or not.
Second, it may be far simpler to MEX the Fortran and so while you're calling the function transparently from Matlab, it's still Fortran underneath and doing the job in that fashion the Fortran i/o will still function just as it did.
As far as the comment regarding the example I gave, it works just fine for the description of the input file as given at the time--you've just now added the other information.
If you're adamant about forcing Fortran into Matlab, I think the only way to read the file as it is currently constructed will be to read it line by line as text and then parse the fields. It appears too irregular (although some of that may be the formatting?) to make much use of the expedient of simply separating columns, because it isn't all numeric. Looking again (after I reformatted your posting to CODE), though, perhaps the numeric values are all in 7-column fields and just not all records have the same number of elements? If so, then theoretically you could use the information from the way the Fortran READs are constructed to split the file into subsections of matching records and either use the File Exchange submission or "roll your own" to parse the records. Note that you can call textscan multiple times on the same file handle, giving a repeat count for the format string on each call to limit it to the amount of data wanted at that call, just like READ uses an array size.
But, it'd still be simpler to just mex it imo...or, if it's a standalone program, make an interface routine from Matlab and dispatch it to background execution and retrieve the results.
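The repeated textscan calls mentioned above might look roughly like the sketch below. The file name, the title text, and the record layout (a count followed by two arrays) are all made up for illustration; they are not taken from the actual input file, which would need its own counts from the Fortran READ statements.

```matlab
% Sketch only: file name and record layout are hypothetical.
fname = [tempname '.dat'];
fid = fopen(fname, 'w');
fprintf(fid, 'C title line\n');
fprintf(fid, '      3    1.0    2.0    3.0\n');
fprintf(fid, '    4.0    5.0    6.0    7.0    8.0\n');
fclose(fid);

fid = fopen(fname, 'r');
title_line = fgetl(fid);       % first record kept as text
n = textscan(fid, '%f', 1);    % apply the format once: the count
n = n{1};
a = textscan(fid, '%f', n);    % apply the format n times: first array
a = a{1};
b = textscan(fid, '%f', 5);    % next five values, continuing where the
b = b{1};                      % previous call stopped
fclose(fid);
delete(fname);
```

Each call picks up at the file position where the previous one ended, so the repeat count plays the role of the array size in a Fortran READ. This still relies on blanks between the values.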
(*) I'll note that at one time I had built a mex file that took a Fortran FORMAT statement and the associated READ information and used Fortran to do the actual read operation. Unfortunately, when I left my previous employer, I didn't realize the only copy of it and its source resided on a machine there, and so it was lost...I had some ambition to rewrite it but unfortunately haven't gotten the round tuit required to have done so. I do agree it's a sorely-missed facility in Matlab; I began requesting the enhancement clear back at Ver 4, 20 yr ago. At this point I have little hope it'll ever happen but I still think it's a reasonable enhancement request.
C i/o is simply broken for fixed-width text files w/o delimiters other than blanks/spaces as Walter's example illustrates. Trying to beat that into submission inside Matlab which uses the C RTL is basically pounding one's head into a wall for anything but the most well-behaved (in the C sense) case; it can only be done by brute force in the general case.


Accepted Answer

dpb on 3 Jul 2016
As long as you don't have fixed-width fields with missing data, no problems with space-delimited files--
>> type ken.dat
C2345678901234567890123456789012345678901234567890123456789012345678901234567890
10 3.0 4.0 5.0 7.0 9.0 10.0 15.0 20.0 25.0 xxxxxxxxxx
>> fmt=[repmat('%f',1,10) '%*s'];
>> fid=fopen('ken.dat','r');
>> data=cell2mat(textscan(fid,fmt,'headerlines',1,'collectoutput',1))
data =
10 3 4 5 7 9 10 15 20 25
>> fid=fclose(fid);
>> whos data
Name Size Bytes Class Attributes
data 1x10 80 double
>>
  3 Comments
dpb on 16 Jul 2016
It's all under the links starting with
doc textscan
You have to read the full content of the documentation; there's a set of links to subsections on the LHS of the page, but when you're trying to learn something it's better to just start reading linearly; there may be something in there you really, really, really need to know and didn't even know to ask! :)
"...I understood that MATLAB types the reading based upon the first read field"
No, that is not so.
textscan returns each numeric field as a vector in its own cell array element, with length equal to the number of times that field was matched. Note the type of the returned numeric value matches that of the conversion field. This is easily-enough demonstrated--
>> s='0.032 3 1234567890 1234567890 stuff stuff morestuff';
>> c=textscan(s,'%f %d %n %u %s %c %9c')
c =
[0.0320] [3] [1.2346e+09] [1234567890] {1x1 cell} 's' 'tuff more'
>> cellfun(@class,c,'uni',0)
ans =
'double' 'int32' 'double' 'uint32' 'cell' 'char' 'char'
>> cellfun(@disp,c,'uni',0)
0.0320
3
1.2346e+09
1234567890
'stuff'
s
tuff more
>>
A '%s' string specifier returns a similar cell vector of strings. OTOH, for each character conversion that includes a field width, the return is a K-by-M character array, where M is the field width. The above, too, is in the documentation if you read the details under the section on Output Arguments.
There's just funky stuff that happens with C formatted input scanning, and that's all there is to be said, particularly if you come from a Fortran background where "a character means a character". I don't even pretend to know all the C rules regarding the differences in interpretation between %c and %s, and while the Matlab documentation covers the high points, it's not the C Standard that defines the underlying behavior; it's a quick 'n fairly dirty users' guide that covers the ordinary cases. Without reading it all again (for the several-millionth time, it seems :) ), I know there are some caveats given regarding the difference between the two but I don't recall what they are otomh. One I know of that probably is a "gotcha!" for your case is that when the field width is given for %c, delimiters, white-space, and end-of-line characters are also counted, whereas they're not with %s (although quoted strings are a special case as well).
NB: in that latter regard, compare what the two '%c' conversions above actually read and returned.
"...merge textscan into cell2mat to speed up (?) the code"
On input, no; the conversion from cell array to double array will actually take a little longer at that point, but you'll gain that back going forward for strictly numeric data, both in ease of addressing (straight parentheses indexing versus curly braces to address cell content) and in the reduced overhead of not dereferencing the cell storage on each address fetch. In your case the data you were looking for was all numeric apart from the character data at the ends of the records, so converting to the double array from the get-go and skipping the unwanted comment fields was the more efficient route overall.
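To illustrate the addressing difference, here is a small sketch using made-up records (three numbers followed by a trailing comment field); the sample text and variable names are only for illustration.

```matlab
% Sketch only: the two sample records are hypothetical.
s = sprintf('10 3.0 4.0 note1\n20 5.0 6.0 note2');

% %*s skips the trailing text field; CollectOutput merges the three
% numeric columns into one array inside a single cell.
c = textscan(s, '%f%f%f%*s', 'CollectOutput', 1);
data = cell2mat(c);      % plain 2x3 double array

x1 = c{1}(2,3);          % cell storage: braces first, then parentheses
x2 = data(2,3);          % numeric array: parentheses only
```

Both expressions fetch the same element; the second simply avoids going through the cell on every access.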
In the end, since there were multiple arrays of various sizes as revealed by the later postings, it would have been simpler to use the same form piecemeal for the individual arrays in the target program, because the full file isn't regular in its entirety as the first posting implied. While somewhat of a pit(proverbial)a(ppendage), it's certainly possible to read the input from Matlab. For a totally reliable implementation, though, as Walter has pointed out multiple times, parsing the actual fixed-width fields is the only sure-fire way to ensure you read every possible valid input file correctly; your way still relies on there being a delimiter between fields, because C-style formatted i/o skips blanks before it starts counting a field width, per its definition of what the number of characters in a field width specification means. While I and most others with Fortran background think it's an extremely wrongheaded definition, it is the way C is specified to work.
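Parsing the actual fixed-width fields could be sketched as below, assuming every numeric field is exactly 7 columns wide as the I7/F7.0 formats in the question suggest. The sample record is made up and deliberately contains one field with no leading blank and one all-blank field.

```matlab
% Sketch only: the record content is hypothetical; w = 7 is assumed from
% the I7 / F7.0 formats mentioned in the question.
w = 7;
rec = ['     10' '    3.0' '12345.0' '       ' '   25.0'];

% Pad with blanks to a whole number of fields, then cut by column position.
nf = ceil(numel(rec) / w);
rec(end+1:nf*w) = ' ';
fields = cellstr(reshape(rec, w, nf).');   % one field per cell
vals = str2double(fields);                 % all-blank field becomes NaN

% For comparison only: the C-style scan that the discussion above warns
% about, which does not cut the record by column position.
free = sscanf(rec, '%7f');
```

Here the blank field shows up as NaN in vals, which makes missing data visible; Fortran itself would read an all-blank numeric field as zero, so replace the NaN values with 0 if that behavior is wanted.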


More Answers (1)

Walter Roberson on 3 Jul 2016
You can use field widths in fscanf() or textscan() formats. For example,
sscanf('123456789', '%3f%2d%f')
ans =
123
45
6789
  2 Comments
Kenneth Lamury on 16 Jul 2016
The author of the function 'fixed_width_import' still lives in Jordan and no longer supports it as he no longer has a copy of MATLAB. I found it restrictive and not useful, so I too stopped using it.

