How to read a file in a custom data format, then search and parse it

I am trying to read files in a custom format, where I need (for lack of a better word) the raw data of the file so I can search and parse it. So far I have been trying fopen and fread to get the data.
fileID = fopen('myfile.customformat');
binaryString = fread(fileID);
The files I need to parse have a header consisting of a string (4 bytes) followed by a uint32 containing the size of the data that follows. The data of interest is an array of data sets, each with its own header string (4 bytes) and size (uint32) covering all the points in that set. Essentially I just need to search for the header of each data set and extract its data, but the way I am reading the file right now with fread gives me an array of decimal values, which is awkward to parse, e.g. when searching for the header strings.
What approach should I take instead? Below is, hopefully, a better overview of the data structure in the file.
Example of the data structure
- file header (string, 4 bytes)
- size of data package (uint32, 4 bytes)
- .. random irrelevant data
- dataset1_header (string, 4 bytes)
- dataset1_size (uint32, 4 bytes)
- dataset1_data (20x float)
- dataset2_header (string, 4 bytes)
- dataset2_size (uint32, 4 bytes)
- dataset2_data (20x float)
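
One possible way to handle the layout above is to read the whole file as raw bytes (uint8), search for the 4-byte tags with strfind, and decode the fields with typecast. The following is only a minimal sketch under stated assumptions: the tags 'FILE' and 'DSET' and the temporary sample file are hypothetical, "float" is assumed to mean a 32-bit single, and the size field is assumed to hold the payload length in bytes. typecast interprets bytes in the host's byte order, so the sketch also assumes a little-endian machine.

```matlab
% Build a small sample file that follows the described layout.
% The tags 'FILE' and 'DSET' are hypothetical placeholders.
fname = [tempname '.customformat'];
fid = fopen(fname, 'w', 'ieee-le');            % write little-endian
fwrite(fid, uint8('FILE'));                    % file header tag
fwrite(fid, 0, 'uint32');                      % package size (unused in this sketch)
fwrite(fid, uint8([1 2 3]));                   % irrelevant filler bytes
for k = 1:2
    fwrite(fid, uint8('DSET'));                % data set tag
    fwrite(fid, 20*4, 'uint32');               % payload size in bytes (assumption)
    fwrite(fid, single(k*(1:20)), 'single');   % 20 float values
end
fclose(fid);

% Read every byte unchanged; '*uint8' keeps the uint8 class instead of double.
fid = fopen(fname, 'r');
bytes = fread(fid, Inf, '*uint8')';            % transpose to a row vector
fclose(fid);

% View the bytes as characters so the tag can be searched for as text.
starts = strfind(char(bytes), 'DSET');

datasets = cell(1, numel(starts));
for k = 1:numel(starts)
    p = starts(k);
    nBytes = double(typecast(bytes(p+4 : p+7), 'uint32'));  % size field
    payload = bytes(p+8 : p+7+nBytes);                      % raw payload bytes
    datasets{k} = typecast(payload, 'single');              % decode as floats
end
delete(fname);
```

A plain tag search can produce false hits if payload bytes happen to spell the tag, so stepping through the file from one size field to the next may be more robust when the layout allows it.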

Answers (1)

Dave Behera on 4 Apr 2016
fread is used to read from a binary file. Can you try using fscanf and fileread?
  1 Comment
Simon on 7 Apr 2016
Edited: Simon on 7 Apr 2016
fileread() seems to be the better solution for now. It returns the same number of bytes that I see in my hex editor. I do not know why fscanf and fread do not; they return fewer bytes. Perhaps some whitespace is removed?
Now, for reading the data structure, i.e. the header, the size of the data package, and then the data, one could for instance make a class that matches the data structure.
How would you recommend I read/split the data in MATLAB? I have the offsets where the data is located, e.g. 0x00 uint32 header, 0x04 uint32 size, 0x08 uint32 data.
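
When the offsets are known, fseek and fread can pull each field straight from the file. The sketch below is an illustration only: the sample file, the tag 'HEAD', the data values, and the interpretation of the size field as an element count are all assumptions. A struct is used here as a lightweight stand-in for the class idea mentioned above.

```matlab
% Create a hypothetical sample file: tag at 0x00, size at 0x04, data at 0x08.
fname = [tempname '.customformat'];
fid = fopen(fname, 'w', 'ieee-le');
fwrite(fid, uint8('HEAD'));              % 0x00: 4-byte tag (hypothetical)
fwrite(fid, 3, 'uint32');                % 0x04: number of values (assumption)
fwrite(fid, [10 20 30], 'uint32');       % 0x08: data
fclose(fid);

% Open with an explicit machine format so multi-byte values are
% interpreted as little-endian regardless of the host.
fid = fopen(fname, 'r', 'ieee-le');
rec = struct();
fseek(fid, 0, 'bof');                                % jump to offset 0x00
rec.tag = fread(fid, [1 4], 'uint8=>char');          % bytes returned as char
fseek(fid, 4, 'bof');                                % jump to offset 0x04
rec.size = fread(fid, 1, 'uint32');                  % returned as double
fseek(fid, 8, 'bof');                                % jump to offset 0x08
rec.data = fread(fid, [1 rec.size], '*uint32');      % keeps the uint32 class
fclose(fid);
delete(fname);
```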
Also, bear in mind that the 4-byte values (uint32) are stored as little endian. Should I simply read the 4 bytes and then apply fliplr()?
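
Reversing the bytes by hand is usually not needed. fread accepts a machine-format argument such as 'ieee-le', and for bytes already in memory typecast together with swapbytes covers a byte-order mismatch. A small sketch with made-up byte values; flipping only makes sense when the file's byte order differs from the host's, which is not the case for little-endian data on a little-endian machine.

```matlab
% Four bytes holding the value 1 in little-endian order (made-up sample).
b = uint8([1 0 0 0]);

% Third output of computer is 'L' on little-endian hosts, 'B' on big-endian.
[~, ~, hostEndian] = computer;

v = typecast(b, 'uint32');      % interprets the bytes in host byte order
if hostEndian == 'B'
    v = swapbytes(v);           % correct only when the host is big-endian
end

% For comparison: the same value with its byte order reversed.
vSwapped = swapbytes(v);

% Reading from a file instead: let fread do the conversion.
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, b);                                  % write the raw bytes
fclose(fid);
fid = fopen(fname, 'r');
vFile = fread(fid, 1, '*uint32', 0, 'ieee-le');  % skip = 0, little-endian
fclose(fid);
delete(fname);
```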
