- https://www.mathworks.com/help/matlab/ref/jsondecode.html
- https://www.mathworks.com/help/matlab/ref/jsonencode.html
Can MATLAB jsondecode() be used to decode a massive json file?
robert bristow-johnson
on 17 Feb 2024
Commented: robert bristow-johnson
on 19 Feb 2024
What I want to do is parse this massive Cast Vote Record for an election in Alaska in 2022. The size is 373M. I am not sure yet what class of field I am looking for, and I was hoping to view the file as a text file, but it has no line breaks or indentation. They took out all of the white space, evidently. The file is impossible to read with any text editor I have.
I would like to parse it, one field at a time.
I have once in my life written numerical output from a MATLAB script to a json file, simply following the rules and having choice of how I was going to format it. It was still an ugly job. I just don't think I can write my own parser, but was hoping maybe I could get jsondecode() to help me.
Accepted Answer
Harsha Vardhan
on 17 Feb 2024
Edited: Harsha Vardhan
on 17 Feb 2024
Hi,
I see that you are trying to parse a huge json file using MATLAB.
This can be done using the 'jsondecode' and 'jsonencode' functions as below:
After extracting the contents of the zip file - https://www.elections.alaska.gov/results/22SSPG/CVR_Export_20220908084311.zip , many json files are available in the extracted folder. Among them, we will parse the largest json file, 'CvrExport.json' (364MB), using the MATLAB code below. The code stores the decoded json data in the 'data' variable. It then re-encodes the data with line breaks and indentation and writes the result to the file 'output.json'.
% Read JSON file
jsonStr = fileread('CvrExport.json');
% Decode JSON data
data = jsondecode(jsonStr);
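% Open text file for writing
fid = fopen('output.json', 'w');
assert(fid ~= -1, 'Could not open output.json for writing');
% Write formatted JSON to the text file (the PrettyPrint option needs R2021a or later)
fprintf(fid, '%s\n', jsonencode(data, PrettyPrint=true));
% Close the text file
fclose(fid);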
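To see what the PrettyPrint option changes before running it on the full file, here is a minimal sketch on a tiny struct. The struct is a simplified stand-in with illustrative values, not the real CVR data, and it uses the older 'Name', value argument form.

```matlab
% Simplified stand-in for the decoded data (illustrative values only)
s = struct('Version', '5.5.52.6', ...
           'Sessions', struct('TabulatorId', 91100, 'BatchId', 1));

compact = jsonencode(s);                       % one line, no whitespace
pretty  = jsonencode(s, 'PrettyPrint', true);  % line breaks and indentation

disp(compact);
disp(pretty);

% Both forms decode back to the same MATLAB value
roundTrip = jsondecode(pretty);
```

The formatted text is larger than the compact text, so expect 'output.json' to be bigger than 'CvrExport.json'.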
Now, we can parse one field at a time using the dot operator as below.
data
data =
struct with fields:
Version: '5.5.52.6'
ElectionId: '2022 Primary Election and Special General'
Sessions: {192289×1 cell}
% data.Sessions is a 192289×1 cell array; each cell holds one 1×1 session struct. We access the first session as below.
data.Sessions{1}
ans =
struct with fields:
TabulatorId: 91100
BatchId: 1
RecordId: 1
CountingGroupId: 2
ImageMask: 'D:\NAS\2022 Primary Election and Special General\Results\Tabulator91100\Batch001\Images\91100_00001_000001*.*'
SessionType: 'QRVote'
VotingSessionIdentifier: ''
UniqueVotingIdentifier: ''
Original: [1×1 struct]
Similarly, you can drill further down into the 'Original' field, for example with data.Sessions{1}.Original.
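jsondecode returns a cell array (rather than a struct array) for a JSON array whose objects do not all have the same field names, which is presumably why 'Sessions' is a cell here. A minimal sketch of pulling one field out of every session with cellfun follows. The JSON string is a made-up two-session stand-in, and the 'Extra' field is hypothetical, added only to make the two objects differ.

```matlab
% Made-up stand-in for CvrExport.json; 'Extra' is a hypothetical field
jsonStr = ['{"Version":"5.5.52.6","Sessions":[' ...
    '{"TabulatorId":91100,"SessionType":"QRVote",' ...
    '"Original":{"PrecinctPortionId":404}},' ...
    '{"TabulatorId":91101,"SessionType":"QRVote",' ...
    '"Original":{"PrecinctPortionId":405},"Extra":1}]}'];
data = jsondecode(jsonStr);

sessions = data.Sessions;
if isstruct(sessions)
    % Objects with identical fields decode to a struct array; normalize to a cell
    sessions = num2cell(sessions);
end

% Extract one numeric field from every session
tabIds    = cellfun(@(s) s.TabulatorId, sessions);
precincts = cellfun(@(s) s.Original.PrecinctPortionId, sessions);

% Extract a text field and count how often each value occurs
types = cellfun(@(s) s.SessionType, sessions, 'UniformOutput', false);
[uniqueTypes, ~, idx] = unique(types);
counts = accumarray(idx, 1);
```

If a field is missing from some sessions, guard the access with isfield(s, 'FieldName') inside the anonymous function or a loop.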
The second problem you mentioned is being unable to view the json file in a text editor. The 'output.json' file has the same problem, since it is also huge. A workaround is to view a few lines at a time. For example, the following MATLAB script displays the first 200 lines of the 'output.json' file.
% Open the text file for reading
fid = fopen('output.json', 'r');
assert(fid ~= -1, 'Could not open output.json for reading');
% Read the first 200 lines
numLines = 200;
for k = 1:numLines
    tline = fgetl(fid);
    % fgetl returns the number -1 (not text) at end of file
    if ~ischar(tline)
        break;
    end
    disp(tline);
end
% Close the file
fclose(fid);
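The loop above depends on line breaks, which 'output.json' has but the original 'CvrExport.json' does not. To peek at a file that is one long line without decoding it, you can read a fixed number of characters instead. This is a minimal sketch, assuming the text is plain ASCII. It writes a small temporary file as a stand-in; to use it on the real file, replace tmpFile with 'CvrExport.json' and skip the setup and the delete call.

```matlab
% Setup: write a small single-line JSON file as a stand-in (temporary name)
tmpFile = [tempname '.json'];
fid = fopen(tmpFile, 'w');
fprintf(fid, '%s', ['{"Version":"5.5.52.6",' ...
    '"ElectionId":"2022 Primary Election and Special General","Sessions":[]}']);
fclose(fid);

% Read only the first numChars characters of the file
numChars = 40;
fid = fopen(tmpFile, 'r');
assert(fid ~= -1, 'Could not open %s', tmpFile);
chunk = fread(fid, numChars, '*char')';   % fread returns a column; transpose to a row
fclose(fid);

disp(chunk);

% Remove the temporary stand-in file
delete(tmpFile);
```

Increasing numChars to a few thousand is usually enough to see the top-level field names and the start of the first record.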
You can also page through the huge output.json file using the 'more' command in the Windows Command Prompt (CMD). Sample output is shown below:
%Command
more +1 output.json
%Output
"Version": "5.5.52.6",
"ElectionId": "2022 Primary Election and Special General",
"Sessions": [
{
"TabulatorId": 91100,
"BatchId": 1,
"RecordId": 1,
"CountingGroupId": 2,
"ImageMask": "D:\\NAS\\2022 Primary Election and Special General\\Results\\Tabulator91100\\Batch001\\Images\\91100_00001_000001*.*",
"SessionType": "QRVote",
"VotingSessionIdentifier": "",
"UniqueVotingIdentifier": "",
"Original": {
"PrecinctPortionId": 404,
"BallotTypeId": 5,
"IsCurrent": true,
"Cards": {
"Id": 515,
"PaperIndex": 0,
"Contests": [
{
"Id": 5,
"ManifestationId": 59,
"Undervotes": 0,
"Overvotes": 0,
"OutstackConditionIds": [],
"Marks": {
"CandidateId": 141,
"ManifestationId": 904,
"PartyId": 14,
"Rank": 1,
-- More (0%) --
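For details, see the documentation of the 'jsondecode' function (https://www.mathworks.com/help/matlab/ref/jsondecode.html) and the 'jsonencode' function (https://www.mathworks.com/help/matlab/ref/jsonencode.html).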
Hope this helps in resolving your query!