Efficiently assign data into a struct?

Mack
Mack on 16 Oct 2025 at 23:17
Commented: dpb about 3 hours ago
I've got a really large set of kinematic data that I'm trying to organize into structs and sub-structs. Currently, I'm assigning each struct and sub-struct manually by indexing into my data set. I'm doing this because in some instances, the data set was not recorded in the coordinate frame that I want the output to be in. For the purposes of this example, I'm only including one parent struct; however, I have multiple (T1, T2, T3). Is there a better/more efficient way of doing this?
% Random data set
T1.data = randi(1000, 19);
% Finding size of data set (in my code it varies...)
T1.size = size(T1.data);
% Assign data to appropriate field
T1.var1.data = T1.data(1:end, [1 2:4]);
T1.var2.data = T1.data(1:end, [1 5:7]);
T1.var3.data = T1.data(1:end, [1 8:13]);
T1.var4.data = T1.data(1:end, [1 14:end]);
% assigning data into structs
T1.var1.x = zeros(T1.size(1),4);
T1.var1.x = T1.var1.data(1:end, 2:4);
T1.var2.x = zeros(T1.size(1),4);
T1.var2.x = T1.var2.data(1:end, 2:4);
T1.var3.x = zeros(T1.size(1),4);
T1.var3.x = T1.var3.data(1:end, 2:4);
T1.var3.y = zeros(T1.size(1),4);
T1.var3.y = T1.var3.data(1:end, 5:7);
% Redefining coordinate frame for var4
T1.var4.x = zeros(T1.size(1), 4);
T1.var4.x(1:end,1) = T1.var4.data(1:end, 3);
T1.var4.x(1:end,2) = T1.var4.data(1:end, 2);
T1.var4.x(1:end,3) = T1.var4.data(1:end, 4);
T1.var4.y = zeros(T1.size(1), 4);
T1.var4.y(1:end, 1) = T1.var4.data(1:end, 6);
T1.var4.y(1:end, 2) = T1.var4.data(1:end, 5);
T1.var4.y(1:end, 3) = T1.var4.data(1:end, 7);
My end goal is to take derivatives of the kinematics, find resultants and max/mins, and plot. But I'm hoping to organize the initial data in a better way. If anybody has any tips on how to make this code more efficient please let me know!
  3 Comments
dpb
dpb on 17 Oct 2025 at 14:31
"Use one non-scalar structure."
Or a table with an additional indicator variable that records which variable each row belongs to, which moves the meta-data out of the variable names and into actual data. You would then be able to use rowfun and/or groupsummary and/or friends and do all the desired calculations as vector operations without having to iterate through the variable names.
dpb
dpb on 17 Oct 2025 at 18:12
Edited: dpb about 12 hours ago
To try to wrap my head around what your struct is trying to represent, I rearranged your code by varN, which then looks like:
% Random data set
T1.data = randi(1000, 19);
% Finding size of data set (in my code it varies...)
[R,C]=size(T1.data);
Z(R,4)=0; % a temporary initializing array size Rx4
% Assign data to appropriate field
T1.var1.data = T1.data(:, [1 2:4]);
T1.var1.x = Z;
T1.var1.x = T1.var1.data(:, 2:4);
T1.var2.data = T1.data(:, [1 5:7]);
T1.var2.x = Z;
T1.var2.x = T1.var2.data(:, 2:4);
T1.var3.data = T1.data(:, [1 8:13]);
T1.var3.x = Z;
T1.var3.x = T1.var3.data(:, 2:4);
T1.var3.y = Z;
T1.var3.y = T1.var3.data(:, 5:7);
T1.var4.data = T1.data(:, [1 14:end]);
% Redefining coordinate frame for var4
T1.var4.x = Z;
T1.var4.x(:,1:3) = T1.var4.data(:, [3 2 4]);
T1.var4.y = Z;
T1.var4.y(:,1:3) = T1.var4.data(:,[6 5 7]);
I think this could be simplified significantly; you're making multiple copies of the same data over and over, which is quite inefficient memory usage and adds to the complexity of addressing what it is you want.
You set up a 4-column array in which you later said the 4th column was for results, so one can assume from the subscripting, which uses 1 as the first column, that what you have are time, x, y, result? Except I don't see y for var1 and var2, so are they only single-axis measurements, not 2-axis?
What are varN? As noted, it appears to me you would be better off with a flat table with each of those as a (perhaps categorical) indicator variable rather than storing meta-data in variable names, which forces variable addressing in one form or another. While it is possible to use variables as fieldnames, it may not be as convenient as grouping variables, but we don't know enough about what the data are and what the end analysis of them is to be to do more than conjecture. But certainly storing all the data and then copies of it multiple times seems pointless.


Answers (2)

dpb
dpb on 18 Oct 2025 at 19:37
Edited: dpb on 18 Oct 2025 at 23:52
If I were doing this, I think I'd approach it more like
% Random data set
data = randi(1000, 19);
% Finding size of data set (in my code it varies...)
[R,C]=size(data);
vnames={'Time','Var','X','Y','Z','R'}; % variable names for table
H=R*2+2*(R*2); % overall height
tData=table('Size',[H,numel(vnames)], ...
'VariableTypes',repmat({'double'},1,numel(vnames)), ...
'VariableNames',vnames); % preallocate the table
% Assign data to appropriate field
tData.Time=repmat(data(:,1),6,1); % time vector for each Var set
tData.Var=[ones(R,1);2*ones(R,1);3*ones(2*R,1);4*ones(2*R,1)]; % varN indicator variable each var
i1=2;
tData.X=reshape(data(:,i1:3:end),[],1);
i1=i1+1;
tData.Y=reshape(data(:,i1:3:end),[],1);
i1=i1+1;
tData.Z=reshape(data(:,i1:3:end),[],1);
tData.R=nan(height(tData),1);
[head(tData,5); tail(tData,5)]
ans = 10×6 table
    Time    Var     X      Y      Z      R
    ____    ___    ___    ___    ___    ___
    721      1     165    406     91    NaN
    676      1     525    173    727    NaN
    516      1     513    345    828    NaN
    275      1      23     89    478    NaN
    205      1     657    941    195    NaN
    196      4      11    583    998    NaN
    963      4     949    708    643    NaN
    504      4     762    486    960    NaN
    331      4     659    532    124    NaN
    780      4     654    845    422    NaN
With the above, one can process with groupsummary or varfun and many other builtin tools for working with tables and grouping variables.
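As a rough sketch of that kind of processing, assuming a long-format table laid out like tData above (the table name tDemo and its values are made up for illustration), groupsummary can return per-Var statistics in a single call:

```matlab
% Small illustrative table in the same long format (values are made up)
tDemo = table([1;2;3;1;2;3], [1;1;1;2;2;2], [10;30;20;5;15;25], ...
    'VariableNames', {'Time','Var','X'});

% Max and min of X for each value of the grouping variable Var
gs = groupsummary(tDemo, 'Var', {'max','min'}, 'X');
disp(gs)
```

The result has one row per value of Var, with the statistics in variables named max_X and min_X, so no looping over field names is needed.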
I've certainly guessed at what some things are, particularly in making the assumption that having six columns for Var3 and Var4 instead of 3 was just a duplicated set of data; if they are something else, then create the proper variable for them. As well, for the initial demo I didn't do the Var4 reordering; that is certainly doable, as in my earlier comment, by creating the custom sequencing vector. Or, it might be simpler to do that reordering first in the raw data table if it is consistent.
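A minimal sketch of reordering in the raw data first, assuming the column layout from the question (column 1 is time, columns 14:19 belong to var4) and that the swap is the same for every data set; the names order and dataReordered are only illustrative:

```matlab
data = randi(1000, 19);   % same demo data shape as above

% Keep columns 1:13 as they are and swap the first two columns of each
% var4 triplet, matching the [3 2 4] and [6 5 7] reordering of var4.data
order = [1:13, 15 14 16, 18 17 19];
dataReordered = data(:, order);
```

After this, the same strided indexing (2:3:end and so on) can be applied to dataReordered without any special case for var4.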
But this removes all the duplicated data storage of your structure at the expense of one additional variable, Var, that could be categorical. If you have multiple datasets as you mention, with more than one T, then add an indicator variable for which T it is (it can be numeric or an identifiable ID, whatever you choose) and you again replace the meta-data and the very complicated storage pattern with a flat table with very simple addressing modes for virtually anything you care to do.
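One way this could look for several data sets, sketched with two tiny made-up tables (t1, t2, and the variable name Trial are assumptions, not part of the original code):

```matlab
% Hypothetical per-trial tables with identical variables (values made up)
t1 = table([1;2], [10;20], 'VariableNames', {'Time','X'});
t2 = table([1;2], [30;40], 'VariableNames', {'Time','X'});

% Tag each table with a trial ID, then stack them vertically
t1.Trial = ones(height(t1), 1);
t2.Trial = 2*ones(height(t2), 1);
tAll = [t1; t2];

% Logical addressing replaces nested field names such as T2.var1.x
x2 = tAll.X(tAll.Trial == 2);
```

Trial can then be passed alongside Var as a grouping variable to the same table functions.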
  3 Comments
Matt J
Matt J about 4 hours ago
Edited: Matt J about 4 hours ago
I tend to think you're better off keeping things in struct form (flat or otherwise), rather than using tables. Tables have some unfortunate performance defects:
dpb
dpb about 3 hours ago
My tendency is to use a table until I'm shown that performance for the particular application isn't "good enough"; if I knew a priori that the data were going to be really, really huge, I might change in that instance.
Definitely would try to have it be as flat as possible with struct...trying to handle metadata in naming variables or fields (or file naming conventions is another fairly frequent proposed alternative) adds complexity however one chooses to try to access them.
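For the struct route, a hedged sketch of one flat, non-scalar struct array in which the variable name is data rather than a field name; the field names name, cols, and xyz are made up here, and the column groups are taken from the question:

```matlab
data = randi(1000, 19);   % demo data, column 1 is time

% 1x4 struct array: passing cell arrays to struct creates one element per cell
S = struct('name', {'var1','var2','var3','var4'}, ...
           'cols', {2:4, 5:7, 8:13, 14:19});

% One loop over elements replaces the hand-written assignments per varN
for k = 1:numel(S)
    S(k).xyz = data(:, S(k).cols);
end
```

Note that storing S(k).xyz still copies the columns; keeping only S(k).cols and indexing into data on demand would avoid that.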



Matt J
Matt J on 17 Oct 2025 at 0:28
Edited: Matt J on 17 Oct 2025 at 2:49
You can replace all occurrences of '1:end' with ':' and condense your indexing operations. This, for example,
T1.var4.y = zeros(T1.size(1), 4);
T1.var4.y(1:end, 1) = T1.var4.data(1:end, 6);
T1.var4.y(1:end, 2) = T1.var4.data(1:end, 5);
T1.var4.y(1:end, 3) = T1.var4.data(1:end, 7);
can be replaced with,
T1.var4.y = zeros(T1.size(1), 4);
T1.var4.y(:,1:3) = T1.var4.data(:,[6 5 7]);
Also, things like this don't make sense,
% assigning data into structs
T1.var1.x = zeros(T1.size(1),4); %remove this line?
T1.var1.x = T1.var1.data(1:end, 2:4);
The first line isn't accomplishing anything except expending CPU time, since you then overwrite T1.var1.x completely with a different matrix, of a different size.
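A quick check that the column-by-column form and the condensed form give the same result, using a stand-in matrix D in place of T1.var4.data (D and its size are assumptions for the demo):

```matlab
D = randi(1000, 5, 7);   % stand-in for T1.var4.data

% Column-by-column version, as in the question
yLong = zeros(size(D,1), 4);
yLong(1:end, 1) = D(1:end, 6);
yLong(1:end, 2) = D(1:end, 5);
yLong(1:end, 3) = D(1:end, 7);

% Condensed version using ':' and an index vector
yShort = zeros(size(D,1), 4);
yShort(:, 1:3) = D(:, [6 5 7]);

same = isequal(yLong, yShort);   % true when both forms agree
```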
  2 Comments
Mack
Mack on 17 Oct 2025 at 14:59
Sorry, I should have been more clear on that. I'm setting up the T1.var1.x structs and so on to store kinematic data in the x, y, and z directions, and leaving a 4th column to later calculate the resultants.
dpb
dpb on 17 Oct 2025 at 19:02
But what @Matt J is pointing out is that when you write
% assigning data into structs
T1.var1.x = zeros(T1.size(1),4); %remove this line?
T1.var1.x = T1.var1.data(1:end, 2:4);
what you end up with isn't what you think it is ...
T1.var1.data=randi(100,4); % an arbitrary 4x4 array
T1.var1.x=zeros(4);
T1.var1.x=T1.var1.data(:,2:4); % assign into it
T1.var1
ans = struct with fields:
    data: [4×4 double]
       x: [4×3 double]
you see the resultant array is, as Matt says, overwritten in its entirety and is just the Nx3 array. You would have to write
T1.var1.x=zeros(4); % reinitialize the zero array
T1.var1.x(:,1:3)=T1.var1.data(:,2:4); % assign into it
T1.var1
ans = struct with fields:
    data: [4×4 double]
       x: [4×4 double]
to overwrite only the three first columns.
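With the first three columns filled that way, the 4th column can later hold the resultant. A small sketch, assuming the resultant means the row-wise Euclidean norm of the three components (the values are made up):

```matlab
x = zeros(3, 4);                       % preallocated N-by-4 array
x(:, 1:3) = [3 4 0; 1 2 2; 0 0 5];     % fill only the first three columns

% Row-wise 2-norm of the three components goes into column 4
x(:, 4) = vecnorm(x(:, 1:3), 2, 2);
```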
As a side note that probably isn't particularly pertinent to your end problem with real data, I suspect that
% Random data set
T1.data = randi(1000, 19);
isn't doing precisely what you may think; the first argument is the upper limit, so the random integers are drawn between 1 and 1000, and the second is the size, which for a scalar 19 gives a 19x19 array. This may be deliberate, but one presumes in real life the height of the actual data will be much larger, although you may have 19 data columns consistently?
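To illustrate the difference, a short sketch (the 500-row height is an arbitrary assumption standing in for a longer recording):

```matlab
A = randi(1000, 19);        % scalar size argument gives a 19x19 array
B = randi(1000, 500, 19);   % 500 rows by 19 columns

sizeA = size(A);
sizeB = size(B);
```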



Release

R2024a
