Profile and debug load function

I have a .mat file containing an object of a custom MATLAB class that mainly consists of a big data table (type table) stored in one of the class's properties. The table has ~20k rows and holds datetime, double, and categorical arrays, as well as objects of class idnlarx, i.e. estimated nonlinear ARX models.
The .mat file is 520 MB (v7 format with compression enabled; v7.3 makes it 10x larger) and takes between 240 and 290 seconds to load.
I'm trying to speed up the loading process, which is clearly CPU-bound. I suspect the problem is the idnlarx models, which take up most of the space (> 1 GB once loaded into RAM and instantiated).
I'd like to understand what takes so long to load, but the profiler only shows this:
...
That means 94% is self-time of the load function.
How do I find out what takes so long?
Thanks,
Jan

Answers (1)

Matt J
Matt J on 18 Feb 2024
Edited: Matt J on 18 Feb 2024
You could test your hypothesis by extracting the table from the object and saving the idnlarx column to its own .mat file, then checking how long it takes to load back in. I suspect your hypothesis is correct, though: I routinely load .mat files of ~500 MB and, while that takes some time, it is nowhere near one minute, let alone four.
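A minimal sketch of that isolation test (hedged: the variable name `obj`, the property `Data`, and the table column `Model` are hypothetical placeholders for the asker's actual class layout):

```matlab
% Isolate the suspected bottleneck: pull the idnlarx column out of the
% table and time loading it on its own. All names here are hypothetical.
S = load("full.mat");              % the original 520 MB file
T = S.obj.Data;                    % the big table stored in the object
models = T.Model;                  % cell array of idnlarx models
save("models.mat", "models")       % v7 with compression by default
tic; load("models.mat"); toc       % compare against the full-file load time
```

If this load alone accounts for most of the 4 minutes, the idnlarx column is confirmed as the bottleneck.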

8 Comments

I made some tests, and both idnlarx (which cannot be put into an array, only into a cell) and some of my custom classes (which can be collected in an array) have a significant impact on load and save times.
I understand MATLAB is not great at looping and instantiating lots of objects, so I may have to live with some overhead, and the idnlarx class is quite old and complex.
However, even the simple example below causes issues and shows a big gap in the load profile:
testClassBenchmark.m
%% benchmark test classes
N = 100000;
tsc(N,1) = testScalarClass;        % preallocate the array of scalar objects
for k = 1:N
    tsc(k) = testScalarClass(k, "myName" + k);
end
tac = testArrayClass(1:N, "myName" + string(1:N));
%% save
tic
save("tsc.mat", "tsc")
save("tac.mat", "tac")
toc
%% load benchmark
useProfiler = true;
val = loadData("tsc");
assert(isequal(val, tsc));
val = loadData("tac");
assert(isequal(val, tac));
if useProfiler
    profile on
end
medianTimeTsc = timeit(@() loadData("tsc"), 1)
medianTimeTac = timeit(@() loadData("tac"), 1)
if useProfiler
    profview
end
%% load fcn
function val = loadData(fn)
    val = load(fn + ".mat").(fn);
end
testScalarClass.m
classdef testScalarClass
    properties
        InstanceNum  (1,1) double = nan
        InstanceName (1,1) string = "default"
    end
    methods
        function obj = testScalarClass(num, name)
            arguments
                num  double = nan
                name string = "default"
            end
            obj.InstanceNum = num;
            obj.InstanceName = name;
        end
    end
    methods (Static)
        function obj = loadobj(s)
            if isstruct(s)
                disp("compatibility issues!")
                obj = testScalarClass;   % fallback so obj is always assigned
            else
                obj = s;
            end
        end
    end % static methods
end
testArrayClass.m
classdef testArrayClass
    properties
        InstanceNum  (:,1) double = nan
        InstanceName (:,1) string = "default"
    end
    methods
        function obj = testArrayClass(num, name)
            arguments
                num  double = nan
                name string = "default"
            end
            obj.InstanceNum = num;
            obj.InstanceName = name;
        end
    end
    methods (Static)
        function obj = loadobj(s)
            if isstruct(s)
                disp("compatibility issues!")
                obj = testArrayClass;   % fallback so obj is always assigned
            else
                obj = s;
            end
        end
    end % static methods
end
Are there any patterns for improving save/load speed with many class instances, possibly with additional class objects nested in their properties? I could implement lazy loading and split the built-in data types in my table from the custom ones, but the table container is not the bottleneck. I might be able to transform some of my classes from an array-of-objects construct to an object-of-arrays construct, but that only works for very simple classes with scalar or simple properties.
Profiling both classes leads to:
The array class is obviously much faster, but I can't refactor everything into array classes.
Key question: what causes the overhead? Is it the class instantiation? The property assignment? The loadobj method does not seem to be the issue. What happens during the part of load that the profiler does not detail, and can I profile it?
Thanks!
Matt J
Matt J on 18 Feb 2024
Edited: Matt J on 18 Feb 2024
The speed difference is likely because data storage in the "array class" form is much more memory-contiguous. Maybe you could write a saveobj method that converts the array-of-scalars to a scalar-of-arrays, just for the purpose of .mat file storage. Similarly, you would write a loadobj method that transforms it back. That way, you wouldn't have to refactor the rest of your class code.
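A minimal sketch of this round-trip, using the testScalarClass from the benchmark above inside a hypothetical `Container` class (the container and its property names are assumptions for illustration, not the asker's actual code):

```matlab
classdef Container
    properties
        Items (:,1) testScalarClass   % array of scalar-class objects
    end
    methods
        function s = saveobj(obj)
            % Flatten to a struct of primitive arrays for .mat storage
            s = struct("Nums",  [obj.Items.InstanceNum].', ...
                       "Names", [obj.Items.InstanceName].');
        end
    end
    methods (Static)
        function obj = loadobj(s)
            % Rebuild the object array from the primitive arrays
            obj = Container;
            n = numel(s.Nums);
            items(n,1) = testScalarClass;   % preallocate
            for k = 1:n
                items(k) = testScalarClass(s.Nums(k), s.Names(k));
            end
            obj.Items = items;
        end
    end
end
```

Because saveobj returns a struct, only contiguous primitive arrays hit the .mat file; the per-object instantiation cost moves from load time into loadobj, where it can at least be profiled.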
That's a good idea; I will follow up on it to save some time and memory, but it's no solution for the built-in idnlarx class, which uses most of the memory :(
What happens during load that cannot be traced by MATLAB? Most of the time spent in load does not show any children; is that normal?
And: does load speed differ depending on whether I call load("x.mat") from inside a function or method vs. the global/base workspace? I see significant slowdowns when I do it from within a class. My current workaround is to load in the base workspace and inject the data via a property, but that's not something I'd like to do all the time.
load() is a built-in function. Profiling only goes as far as the built-in function boundary; it cannot show timings of the underlying C or C++ code.
but that's no solution for the built-in idnlarx class which utilizes most of the memory :(
Not sure why that's a problem. As we've been discussing, the amount of memory is probably not the issue. It's the data organization. Is it that you cannot form an array of idnlarx objects? If so, maybe the saveobj method can store just the idnlarx constructor argument data that built it. How long would it take to rebuild the idnlarx objects from the initial constructor data?
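One possible shape for this suggestion, as a hedged sketch: `ModelHolder`, `TrainData`, and `Orders` are hypothetical names, and rebuilding via nlarx re-estimates the model, which may be slow and may not reproduce the exact saved state.

```matlab
% Sketch: save only the estimation inputs, rebuild the model on load.
classdef ModelHolder
    properties
        Model          % idnlarx object (expensive to save/load directly)
        TrainData      % iddata used for estimation
        Orders         % [na nb nk] model orders
    end
    methods
        function s = saveobj(obj)
            % Keep only what is needed to re-estimate; drop the model itself
            s = struct("TrainData", obj.TrainData, "Orders", obj.Orders);
        end
    end
    methods (Static)
        function obj = loadobj(s)
            obj = ModelHolder;
            obj.TrainData = s.TrainData;
            obj.Orders    = s.Orders;
            obj.Model = nlarx(s.TrainData, s.Orders);  % re-estimate (may be slow)
        end
    end
end
```

Whether this wins depends on Matt's closing question: if re-estimation takes longer than deserializing the saved object, the trade does not pay off.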
Hey Matt,
I think I understand your point, but I can only apply data-organization tweaks to my own code. idnlarx is a MATLAB-provided class, see Nonlinear ARX model - MATLAB (mathworks.com). I was not planning to mess with its loading and storing methods. It's a complex class, and I'm not sure I can re-create the full state of an object: many properties are read-only and cannot be set via the constructor. I think the only way would be subclassing, or am I missing something? Mixins can't be used without modifying MathWorks code, and I don't see other solutions.
You're right that idnlarx objects can't be packed into arrays, but I don't think that's the problem.
Matt J
Matt J on 26 Feb 2024
Edited: Matt J on 26 Feb 2024
According to your original post, the idnlarx objects are members of your custom class. It is that class whose saveobj/loadobj methods I am suggesting you customize, not the internal loadobj of idnlarx. Transform the idnlarx members into something simpler and more compact, something that can be used to rebuild the original idnlarx members later.
Understood. And that's where I struggle, since I can't access the internals of idnlarx to rebuild the original idnlarx objects later. The constructor does not let me set all class properties, and I can't set them from outside because they are read-only.
But I'll investigate; maybe there are other ways to serialize/deserialize.


Release: R2023b
Asked: on 18 Feb 2024
Commented: on 27 Feb 2024
