Reading sequence of binary packets consisting of multiple datatypes from binary file

I need to access data stored in binary files. In this case, it is an event log file from an old SCADA system. Each file contains a sequence of packets, and each packet contains, for example, 28 bytes of data. At a low level, a packet is laid out like this:
  • 18 characters containing the signal name as an ISO 8859-1 string
  • A 16-bit signed integer (LE) for the signal value
  • A 32-bit signed integer (LE) holding the Unix datetime
  • A 16-bit signed integer holding additional milliseconds for the Unix datetime
  • 8 so-called "status bits"; each bit is an attribute or "signal flag"
  • An 8-bit signed integer describing the signal type (just another signal property)
There are also other files using the same method of storing data using these sequences of packets.
I'm looking for the correct approach for reading such files and their "packets". I have imported similar files that were already encoded as strings with textscan and a formatSpec. Is there anything similar for reading binary files?
Your help is appreciated.

Accepted Answer

Guillaume on 17 Dec 2018
Use fread for binary data. Following your description, probably:
fid = fopen(filepath, 'r', 'l', 'ISO-8859-1'); %l for Little-endian which I assume is the LE in the description
%reading one record:
signal.name = fread(fid, [1 18], '*char'); %read 18 characters
signal.value = fread(fid, 1, 'int16'); %read 1 signed 16 bit integer. stored as double
signal.date = fread(fid, 1, 'int32');
signal.milliseconds = fread(fid, 1, 'int16');
signal.status = fread(fid, 1, '*uint8'); %read as unsigned 8 bit, keep as uint8
signal.type = fread(fid, 1, 'int8'); %read 1 signed 8 bit integer (the question describes the type as signed)
For reading multiple records you can wrap the above in a loop. It won't be very fast though. Another option is to use the skip argument of fread to read all the names in one go, go back to the start of the file, read all the values, go back again, etc.:
fid = fopen(filepath, 'r', 'l', 'ISO-8859-1'); %l for Little-endian which I assume is the LE in the description
%reading all records (28 bytes each); skip is the number of bytes skipped after each block that is read:
signal.names = fread(fid, [18 Inf], '18*uint8=>char', 10)'; %read 18 bytes as characters, then skip the other 10 bytes of the record
fseek(fid, 18, 'bof'); %move to the first value (18 bytes after the start)
signal.values = fread(fid, [1 Inf], 'int16', 26)'; %skip 26 bytes after each 2-byte value
fseek(fid, 20, 'bof'); %move to the first date (20 bytes after the start)
signal.date = fread(fid, [1 Inf], 'int32', 24)'; %skip 24 bytes after each 4-byte date
%etc.
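A fuller sketch of the skip-based approach, covering all six fields, might look like the following. It assumes the 28-byte layout from the question; the file name and the two sample records are made up so that the example is self-contained. Note that skip is counted from the end of each block that was read, so it is the record size minus the size of the field.

```matlab
% Sketch only: the file name and the sample records are hypothetical.
filepath = fullfile(tempdir, 'skip_demo.bin');

% Write two synthetic 28-byte records in the layout described in the question.
names  = {'PUMP_1', 'VALVE_7'};
values = [12, -3];
dates  = [1545004800, 1545004860];
millis = [250, 999];
status = [5, 128];
types  = [1, 2];
fid = fopen(filepath, 'w', 'l');
for k = 1:2
    padded = zeros(1, 18, 'uint8');               % name field, zero padded
    padded(1:numel(names{k})) = uint8(names{k});
    fwrite(fid, padded, 'uint8');
    fwrite(fid, values(k), 'int16');
    fwrite(fid, dates(k), 'int32');
    fwrite(fid, millis(k), 'int16');
    fwrite(fid, status(k), 'uint8');
    fwrite(fid, types(k), 'int8');
end
fclose(fid);

% Read each field in one pass: seek to its offset, then skip the rest of the record.
recordsize = 28;
fid = fopen(filepath, 'r', 'l');
signal.names = fread(fid, [18 Inf], '18*uint8=>char', recordsize - 18)';
fseek(fid, 18, 'bof');
signal.values = fread(fid, [1 Inf], 'int16', recordsize - 2)';
fseek(fid, 20, 'bof');
signal.date = fread(fid, [1 Inf], 'int32', recordsize - 4)';
fseek(fid, 24, 'bof');
signal.milliseconds = fread(fid, [1 Inf], 'int16', recordsize - 2)';
fseek(fid, 26, 'bof');
signal.status = fread(fid, [1 Inf], '*uint8', recordsize - 1)';
fseek(fid, 27, 'bof');
signal.type = fread(fid, [1 Inf], 'int8', recordsize - 1)';
fclose(fid);
```

Each field ends up with one row per record. The names keep their padding bytes, so they may need trimming before use.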
The fastest option is probably to read the whole file in one go and perform the conversion afterward:
recordfields = {'name', 'value', 'date', 'milliseconds', 'status', 'type'};
recordtypes = {'char', 'int16', 'int32', 'int16', 'uint8', 'int8'}; %int8 for the type, which the question describes as signed
recordsizes = {18, 2, 4, 2, 1, 1}; %size of each type in bytes
fid = fopen(filepath, 'r', 'l', 'ISO-8859-1'); %l for Little-endian which I assume is the LE in the description
data = fread(fid, [sum(recordsizes), Inf], '*uint8')'; %read the whole lot as uint8, stored as uint8. transpose so that rows are records
data = mat2cell(data, size(data, 1), recordsizes); %split columns into each field
data = cellfun(@(col, type) typecast(col, type), data, recordtypes, 'UniformOutput', false);
record = cell2struct(data, recordfields, 2);
  3 Comments
Guillaume on 19 Dec 2018
Sorry for the bugs. It was obviously untested code. recordsizes was meant to be a matrix not a cell array, so you don't have to bother with [recordsizes{:}]:
recordsizes = [18, 2, 4, 2, 1, 1];
Indeed, typecast only works with vectors, so you need the (:) that I forgot. With regard to char not being supported, unfortunately you'll need an if statement. cellfun is just a substitute for a for loop, so I'd rewrite the conversion as:
data3 = cell(size(data2)); %preallocation
for idx = 1:numel(data2)
    if strcmp(recordtypes{idx}, 'char')
        data3{idx} = char(data2{idx});
    else
        data3{idx} = typecast(data2{idx}(:), recordtypes{idx});
    end
end
vik on 19 Dec 2018
Unfortunately I can't publish the original file for testing, but I have now fixed the last bug: before using typecast, the cells need to be transposed; otherwise the (:) operator puts the bytes together in the wrong order.
This is the full working code and it works on all the files I need to import:
fid = fopen(filename, 'r', 'l', 'ISO-8859-1'); % little-endian byte order
recordfields = {'name', 'value', 'date', 'milliseconds', 'status', 'type'};
recordtypes = {'char', 'int16', 'int32', 'int16', 'uint8', 'int8'}; % int8 for the type, which is described as signed
recordsizes = [18, 2, 4, 2, 1, 1]; % size of each field in bytes
data = fread(fid, [sum(recordsizes), Inf], '*uint8')'; % read everything as uint8; transpose so that rows are records
fclose(fid);
data2 = mat2cell(data, size(data, 1), recordsizes); % split the columns into one cell per field
data2t = cellfun(@transpose, data2, 'UniformOutput', false); % transpose so that (:) keeps the bytes of each value together
data3 = cell(size(data2t)); % preallocation
for idx = 1:numel(data2t)
    if strcmp(recordtypes{idx}, 'char')
        data3{idx} = char(data2{idx}); % typecast does not support char
    else
        data3{idx} = typecast(data2t{idx}(:), recordtypes{idx});
    end
end
record = cell2struct(data3, recordfields, 2);
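Once the fields are in a struct like record, the raw values usually still need some post-processing. The sketch below is only an illustration: the sample values are invented, the timestamps are assumed to be UTC, and the padding of the name field is assumed to be NUL bytes or spaces. The thread does not say which status bit carries which flag, so the bits are only split into columns.

```matlab
% Sketch only: sample values are made up; field names follow the thread.
record.name = ['PUMP_1', repmat(char(0), 1, 12); 'VALVE_7', repmat(char(0), 1, 11)];
record.date = int32([1545004800; 1545004860]);   % Unix time in seconds
record.milliseconds = int16([250; 999]);         % additional milliseconds
record.status = uint8([5; 128]);                 % 8 status bits per record

% Combine seconds and milliseconds into one datetime (UTC is an assumption).
timestamp = datetime(double(record.date), 'ConvertFrom', 'posixtime', 'TimeZone', 'UTC') ...
    + milliseconds(double(record.milliseconds));

% Split each status byte into 8 logical flags; column 1 is the lowest bit.
flags = false(numel(record.status), 8);
for bit = 1:8
    flags(:, bit) = bitget(record.status, bit) == 1;
end

% Convert the fixed-width char matrix into a cell array of trimmed names.
% deblank removes trailing whitespace and NUL characters.
names = deblank(cellstr(record.name));
```

With flags as a logical matrix, a single attribute can be selected with ordinary indexing, for example flags(:, 3) for the third bit of every record.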
Performance
The first version, with simple calls such as fread(fid, 1, 'int32') made thousands of times, takes 1.41 seconds to run.
The optimized version, with a single call to fread, takes only 0.013 seconds.
Problem solved, thanks a lot.
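For anyone who wants to reproduce this kind of comparison without the original files, a minimal sketch follows. It is not the code that produced the timings quoted above: the file name and contents are synthetic, only the value field is read, and typecast is assumed to run on a little-endian machine. Absolute timings will differ from system to system.

```matlab
% Sketch only: synthetic file, hypothetical file name.
filepath = fullfile(tempdir, 'loop_demo.bin');
nRecords = 1000;
recordsize = 28;

% Build the file: all bytes are zero except the low byte of the int16 value (byte 19).
raw = zeros(recordsize, nRecords, 'uint8');
raw(19, :) = uint8(mod(0:nRecords-1, 256));
fid = fopen(filepath, 'w');
fwrite(fid, raw, 'uint8');                        % column-major, so one record after another
fclose(fid);

% Variant 1: one fread call per record.
info = dir(filepath);
n = info.bytes / recordsize;                      % number of records in the file
values = zeros(n, 1);                             % preallocation
fid = fopen(filepath, 'r', 'l');
tic
for k = 1:n
    fseek(fid, (k - 1) * recordsize + 18, 'bof'); % offset of the value field
    values(k) = fread(fid, 1, 'int16');
end
tLoop = toc;
fclose(fid);

% Variant 2: a single fread call, conversion afterwards.
fid = fopen(filepath, 'r', 'l');
tic
data = fread(fid, [recordsize, Inf], '*uint8');   % 28-by-N, one record per column
valueBytes = data(19:20, :);                      % the two bytes of each value
values2 = double(typecast(valueBytes(:), 'int16'));
tBulk = toc;
fclose(fid);

fprintf('loop: %.4f s, single read: %.4f s\n', tLoop, tBulk);
```

Both variants return the same values. For more reliable numbers, timeit on a function handle is usually a better choice than a single tic/toc measurement.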


Release

R2018b