Extract data from csv with header and footer
Gabriel Stanley on 21 Jun 2022
Commented: dpb on 22 Jun 2022
I have a set of csv files which have non-comma-delimited header and footer text (corporate boilerplate). The footer text seems to be preventing readtable from parsing the file, and as the number of rows of data varies from one csv to another, I can't hard-code the data ranges.
Is there a way to open such a file, find a specific string therein, and use the row index thereof to bound readtable or similar functions? Alternatively, what would be the best way to go about extracting the comma-delimited data in this situation?
noble sharma on 22 Jun 2022
Can you share a sample file to test with, so that a proper suggestion can be provided?
per isakson on 22 Jun 2022
I've replaced "data" in your file with "3.14", and I've skipped the two column header lines because they are weird.
Here is an old-timer's solution. (The only problem is getting the number of columns right.)
fid = fopen( 'FormatExample.csv', 'r' );
cac = textscan( fid, '%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f' ...
, 'Headerlines',4, 'CollectOutput',true, 'Delimiter',',' );
[~] = fclose( fid );
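A side note on why this approach copes with the footer: textscan stops reading at the first field it cannot convert with the given format, so the non-numeric footer simply ends the read. The sketch below illustrates that on a made-up file, and builds the format string with repmat so the column count lives in one place. The file contents, the column count of 5 and the names fname, nCols and fmt are assumptions for illustration only, not taken from FormatExample.csv.

```matlab
% Hypothetical sample file: 4 boilerplate lines, 3 data rows of 5 columns, 1 footer line.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'Corporate boilerplate line %d\n', 1:4);   % format is recycled: 4 lines
fprintf(fid, '%g,%g,%g,%g,%g\n', 1:15);                 % format is recycled: 3 rows
fprintf(fid, 'Company confidential footer\n');
fclose(fid);

nCols = 5;                        % assumption: the number of data columns is known
fmt   = repmat('%f', 1, nCols);   % same as typing '%f%f%f%f%f' by hand

fid = fopen(fname, 'r');
cac = textscan(fid, fmt, 'HeaderLines', 4, 'CollectOutput', true, 'Delimiter', ',');
fclose(fid);
delete(fname);                    % remove the temporary sample file

data = cac{1};                    % numeric matrix; reading ended at the footer line
```

If the footer happened to start with something textscan can read as a number (for example NaN or Inf), the read would not stop cleanly, so it is worth checking size(data) against what is expected.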
To make readtable read your file, you need to "fix" delimitedTextImportOptions - I guess.
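For what "fixing" the import options might look like, here is a minimal sketch under stated assumptions: the sample file is made up, the column count (3) is assumed known, and the data is assumed to sit on lines 5 to 7 (for instance, bounds found beforehand by scanning the file for a marker string). None of these values come from the real file.

```matlab
% Hypothetical sample file: 4 boilerplate lines, 3 data rows of 3 columns, 1 footer line.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'Boilerplate header line %d\n', 1:4);
fprintf(fid, '%g,%g,%g\n', 1:9);
fprintf(fid, 'Company confidential footer\n');
fclose(fid);

% Build the import options by hand instead of relying on automatic detection.
opts = delimitedTextImportOptions('NumVariables', 3, ...
                                  'DataLines', [5 7], ...   % assumed first/last data line
                                  'Delimiter', ',');
opts = setvartype(opts, 'double');   % the default type for these options is char

T = readtable(fname, opts);          % only lines 5..7 are parsed
delete(fname);                       % remove the temporary sample file
```

Because DataLines bounds the read on both sides, readtable never sees the header or footer text.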
" I incorrectly assumed that readtable invoked detectImportOptions ..."
The readXXX family does less sophisticated parsing than the full-blown detectImportOptions does, in an attempt to save time. Of course, if it doesn't work, one ends up spending a lot more time debugging and going back again...
More Answers (1)
Another approach, although a real file would be needed to figure out how to interpret the actual file content, along with details on just what is needed/wanted. For example, there appear to be both variables and units on the same record in the file, if there is any truth at all in the headings, but there are 13 variables, only 7 "Unit" indicators, and 29 data fields per record. It's therefore not possible to know exactly what belongs with what: 13 x 2 = 26, so the variables plus a units field for each come up three columns short of the 29 data fields.
As @per isakson hints, I'd begin by establishing what the content of the file is and pass that as additional information to detectImportOptions to build a text import options object for use with one of the higher-level routines such as readtable, if a table is the appropriate data structure. Again, that is only knowable from details of the file we don't have and from what is to be done with the data once loaded.
But, to revert back to the original Q? posed, the rough outline to brute-force it would be something like
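The following is only a stand-in sketch of that brute-force idea, not the code originally posted: read the file as lines, locate the strings that bracket the data, and hand the resulting line range to readtable. The marker strings "DATA START" and "DATA END", the sample file and all variable names are hypothetical. readlines needs R2020b or later.

```matlab
% Hypothetical sample file; the marker strings are assumptions, not from the real data.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'ACME Corp internal use only\nReport generated by tool\nDATA START\n');
fprintf(fid, '%g,%g,%g\n', 1:12);                      % 4 data rows of 3 columns
fprintf(fid, 'DATA END\nCompany confidential footer\n');
fclose(fid);

txt = readlines(fname);                                % string array, one element per line

% Line numbers of the first and last data rows, located via the marker strings.
iStart = find(contains(txt, "DATA START"), 1) + 1;
iEnd   = find(contains(txt, "DATA END"),   1) - 1;

% Count the columns from the first data line rather than hard-coding them.
nCols = numel(split(txt(iStart), ","));

opts = delimitedTextImportOptions('NumVariables', nCols, ...
                                  'DataLines', [iStart iEnd], ...
                                  'Delimiter', ',');
opts = setvartype(opts, 'double');                     % default type would be char
T = readtable(fname, opts);
delete(fname);                                         % remove the temporary sample file
```

Since the bounds are recomputed for every file, the number of data rows can vary from one csv to the next without hard-coding any range.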