Seeking suggestions to speed up JSON parsing to table.
I am seeking assistance in speeding up the processing time needed to parse very large JSON files and end up with a flat table of values. I have a solution that works, but it takes on the order of 3-6 minutes to process even a relatively small (for the sensor data in question) file of 57,360 JSON strings.
The original file contains many thousands of individual track reports from a radar system. To aid further processing and analysis work, I want to get these track reports into a flat table of 130 variables. jsondecode faithfully decodes the strings and produces a struct with all the data. Unfortunately, the data itself includes many nested structures, so simply calling struct2table isn't the whole answer.
for i = 1:lengthOfArray
% Decode the current row of the character array containing the JSON
currentRow = jsondecode(importedFile{1,1}{i,1});
In fact, trying different variations of converting each sub-structure to a table and concatenating them into one final flat table appears to be even more time consuming than my current solution, which is brute-force reading each field of each structure/sub-structure and assigning it to a flat temporary holding structure, then converting that temporary structure to a table at the end. Here is a short excerpt to show you what I mean:
tempStruct(i).trackQuality =; %Track quality
tempStruct(i).covarianceType =; %Discriminator = Cartesian or spherical
tempStruct(i).coVarCartesian_varX =; % Track Variance for X. Meters^2
tempStruct(i).coVarCartesian_covXY =; % Track Covariance for X & Y
tempStruct(i).coVarCartesian_covXZ =; % Track CoVar for X & Z
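One variation that might be worth trying, sketched here under assumptions rather than tested on the real data: wrap all of the JSON strings into a single JSON array, call jsondecode once, and then pull each field out of the resulting struct array in one vectorized step instead of assigning field by field inside the loop. The JSON keys below (trackQuality, covariance, varX, covXY) and the variable jsonLines are hypothetical stand-ins, not the radar's actual schema. The sketch also assumes every report has the same fields and no null values; otherwise jsondecode returns a cell array, or the concatenations silently drop empty entries.

```matlab
% Hypothetical stand-in for the imported file: one JSON string per cell.
% The field names are illustrative only, not the radar's real schema.
jsonLines = {
    '{"trackQuality":5,"covariance":{"varX":1.5,"covXY":0.1}}'
    '{"trackQuality":7,"covariance":{"varX":2.5,"covXY":0.2}}'
    };

% Join the strings into one JSON array and decode it with a single call.
% With identical fields in every object this yields an N-by-1 struct array.
allTracks = jsondecode(['[' strjoin(jsonLines(:)', ',') ']']);

% Gather the nested sub-structures into one struct array so that their
% fields can be extracted in one step as well.
covStructs = [allTracks.covariance];

% Build the flat table column by column with comma-separated-list expansion.
flatTable = table( ...
    [allTracks.trackQuality]', ...
    [covStructs.varX]', ...
    [covStructs.covXY]', ...
    'VariableNames', {'trackQuality', 'coVarCartesian_varX', 'coVarCartesian_covXY'});
```

Whether this beats the per-row loop on a 57,360-line file would need to be confirmed with the profiler.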
Before I continue, here are some relevant profiler results:
I think the tabular display must be the final output of the struct2table function to the command window. I can't figure out how to suppress that.
trackReportTable = struct2table(tempStruct);
Time conversions are killing me. The radar reports seconds and microseconds in separate fields every time a timestamp is required. There are 25 timestamps in each JSON string. So for every iteration of the loop that parses the JSON file, I have to call a custom function, ambTime2mat, to convert the timestamps to datenum.
function [serialDateNum] = ambTime2mat (epochSeconds,microSeconds)
%This function returns date and time in MATLAB date serial number format
%from a given input of seconds since 1/1/1970 and microseconds since the
%last second.
% **** NOTE: Due to the mechanism used to combine the fields, the result
% only has millisecond resolution. ****
% epochSeconds = Seconds since January 1st, 1970
% microSeconds = Microseconds since value in seconds.
% serialDateNum = date and time in MATLAB date serial number format.
%EXAMPLE 01: ambTime2mat(timeInEpochSeconds,microseconds);
%Check to see that something was passed.
if nargin == 0
error('No data passed, nothing to convert');
end
% Epoch seconds to date serial
dnum = datenum(1970,1,1,0,0,epochSeconds);
% No apparent way to add microseconds to datenum, so convert to milliseconds
% and accept loss of resolution :(
partialSeconds = round(microSeconds/1000);
% Add the milliseconds to the date serial number
serialDateNum = addtodate(dnum,partialSeconds,'millisecond');
I think there is substantial room for improvement here, but I can't identify it. The goal is to get seconds and microseconds from separate fields converted into a single serial date number, losing some resolution if necessary.
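One possibility, offered as a sketch rather than a verified fix: because a serial date number is just days since year 0 stored as a double, the conversion can be written as plain arithmetic on whole vectors, so all timestamps could be converted in one pass after the loop instead of calling datenum and addtodate for each value. The sample values below are made up for illustration. Near present-day dates a double datenum resolves to roughly 10 microseconds, so this does not recover full microsecond precision.

```matlab
% Hypothetical sample inputs: column vectors of epoch seconds and the
% matching microsecond fields, collected for all reports.
epochSeconds = [0; 86400; 1500000000];
microSeconds = [0; 500000; 250000];

% Serial date number of the Unix epoch, 1-Jan-1970 00:00:00.
epochDatenum = datenum(1970, 1, 1);

% Combine the two fields into fractional seconds, convert to days, and
% offset from the epoch. Everything is vectorized, so there is no loop.
secondsPerDay = 86400;
serialDateNum = epochDatenum + (epochSeconds + microSeconds / 1e6) / secondsPerDay;

% Spot-check the result in readable form.
disp(datestr(serialDateNum, 'yyyy-mm-dd HH:MM:SS.FFF'));
```

The same expression could replace the body of ambTime2mat so that existing callers keep working while also accepting vector inputs.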
Thoughts, opinions, suggestions all welcome. Keep in mind this parser is part of a larger set of analysis tools and so doing things like creating a flat table or converting to datenum is simply to get the data into a common format the other tool components expect.
Answers (0)