How to preallocate memory for building this structure, indexing fieldnames?

I have a structure called "Result" in several files and would like to merge all of them into one structure. My difficulty is that the field names coming right after "Result." are built from a string identifying an experiment name, and since these experiment names and their number are unknown at this point, I have to address them by dynamic indexing.
So far this indexing works, it merges my data correctly, but preallocation of memory is missing:
% START OF A LOOP THROUGH MANY FILES, RETRIEVING THE NEXT ID
NewData = load(id); % the file referenced in id contains a structure called "Result"
casename = fieldnames(NewData.Result);
cases = numel(casename);
% preallocation of memory could fit in here, in this line
for caseIndex = 1:cases
    Result.(casename{caseIndex}).MyValue = ...
        NewData.Result.(casename{caseIndex}).MyValue;
end
% END OF THE LOOP THROUGH MANY FILES
Now I tried to preallocate memory by the following failing attempt:
Result.(casename{1:cases}).MyValue = zeros(cases,1);
This one also failed:
Result.(casename{[1:cases]}).MyValue = zeros(cases,1);
Do you have any idea what the correct syntax should look like?
  2 Comments
James Tursa
James Tursa on 9 Mar 2015
How many files are you talking about? Are the case names in each file unique, or is there potential overlap of names amongst files? There may be a way to do some meaningful pre-allocation for your proposed struct organization, but are we talking about a Result struct with 100's or 1000's (or more) of field names?
Marco
Marco on 9 Mar 2015
Edited: Marco on 9 Mar 2015
The case names are unique, and there will be about 50 field names in total, just to give a sense of the dimensions. But it could also be only 20, or up to 100.
In one of my experiment series, I produce about 5 files, each containing the structure with about 10 fieldnames. In another experiment series, I produce about 10 files, each containing the structure with about 5 fieldnames.
Following advice given by Adam (see his answer), I learned about and meanwhile used the Profiler, and found that 98% of the time my code is busy accessing the HDD in the load(id) line. So my question is clearly no longer about performance, but I am still interested in learning how I "could" code the preallocation in a clean way, just to learn how to program such a thing.


Accepted Answer

Stephen23
Stephen23 on 9 Mar 2015
Edited: Stephen23 on 9 Mar 2015
Unlike numeric and character arrays, structures and cell arrays do not, according to the documentation, require completely contiguous memory. It is sufficient to preallocate just the cell array or structure itself; this does not require also preallocating the arrays stored inside it. These can simply be empty, as they are not stored in the same memory location as the structure or cell array itself. You can read more about them here:
It is apparently slower to try to preallocate the data arrays (inside the structure or cell array):
Quoting Jan Simon from the above link: "For this reason it is e.g. useless to 'pre-allocate' the elements of a cell array, while pre-allocating the cell itself is strongly recommended." The same also applies to structures.
This topic is also addressed very well by Loren Shure in one of her blogs:
Where she says: Of course it depends on your specifics, but since each field is its own MATLAB array, there is not necessarily a need to initialize them all up front. The key however is to try to not grow either the structure itself or any of its contents incrementally.
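A minimal sketch of what both references describe: preallocate the struct and all its field names in one call, leaving every field empty. The case names here are assumed for illustration; in the question's code they would come from fieldnames(NewData.Result).

```matlab
% Hypothetical case names -- in the real code: casename = fieldnames(NewData.Result);
casename = {'ExpA'; 'ExpB'; 'ExpC'};

% Create a scalar struct with all field names at once; each field starts as []
Result = cell2struct(cell(numel(casename), 1), casename, 1);

% The empty fields are then filled with the actual data later, e.g.:
% Result.ExpA.MyValue = ...;
```

This preallocates only the container (the field-name table), not the data arrays inside it, which is exactly the split both quotes recommend.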
  5 Comments
Stephen23
Stephen23 on 9 Mar 2015
Edited: Stephen23 on 9 Mar 2015
These are two different issues: the number of fields and the number of experiments. What you are doing now mixes these two concepts together, with the resulting difficulties that you are facing.
Your statements, e.g. "that I do not know to which final size (to which quantity of fields) my structure might grow, gathering more and more data while looping through all my data files" do not actually tell us anything about how your data is organized: does each file correspond to one experiment, or multiple experiments? Do the measured values (fields) change between experiments?
You need to seriously consider using a non-scalar structure, depending on how your data is arranged, and in particular based on this question: Are the fields the same for each experiment?
For example, every experiment might have the following four values:
Results.Temperature = ...
Results.Parameters = ...
Results.Sensor1 = ...
Results.Sensor2 = ...
If they are the same, then a non-scalar structure would be the simplest, fastest and neatest option for storing your data.
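Such a non-scalar structure can itself be preallocated in a single assignment. A sketch, with the number of experiments and the four field names assumed from the example above:

```matlab
nExperiments = 5;  % assumed number of experiments

% Preallocate a 1x5 struct array with the four fields, each initially empty;
% assigning to the last element creates all preceding elements at once
Results(nExperiments) = struct('Temperature', [], 'Parameters', [], ...
                               'Sensor1', [], 'Sensor2', []);

% Each experiment is then addressed by index:
% Results(3).Temperature = ...;
```
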
Marco
Marco on 9 Mar 2015
Edited: Marco on 10 Mar 2015
There is a saying, which I will try to translate into English: a bad concept can never be patched into the same success that a good concept gives you from the start.
Stephen, you reminded me of this saying, thanks! I am already re-studying my literature and the official docs about structures and cells, and I will also have a look at tables, keeping in mind your recommendation to use a non-scalar structure. My data can be organized as in your last example.
I think I ran into the bad idea of using dynamically generated field names because I didn't see how to later extract a set of data from a structure if I only remember the case name, but don't know at which index that case ended up in the structure. I will especially watch out for a chapter explaining how to search [Result.ExperimentName] for my case-name string and, if it is found, how to derive the corresponding index so that I can reach the rest of the data via Result(index). It could work by looping through the structure, but maybe there is a more elegant solution.
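That lookup does not need a loop. A sketch, assuming each element of the non-scalar struct stores its case name in an ExperimentName field (names here are invented for illustration):

```matlab
% Assumed layout: non-scalar struct with an ExperimentName field per element
Result(1).ExperimentName = 'ExpA';
Result(2).ExperimentName = 'ExpB';
Result(3).ExperimentName = 'ExpC';

% {Result.ExperimentName} collects all names into a cell array,
% strcmp compares each against the wanted name, find returns the index
index = find(strcmp({Result.ExperimentName}, 'ExpB'));

% Result(index) now holds the complete record for that experiment
```
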
Every day something new on my list. But hey(!), I am progressing little by little, and my tiny program grows bigger and bigger. By the way, my interest is in image processing: I have written code that loops over my test images, applying many combinations of parameters for selecting image enhancement and segmentation algorithms. You are helping me merge the result files collected over the various runs, so that I can better hunt for the most promising parameter set in the consolidated data. I could do this much more quickly by copy-and-paste to Excel, and the statistical analysis would certainly be faster for me there too, but I also want to learn MATLAB, so I will do it in MATLAB. Fortunately I have absolutely no time pressure, no deadline for this project :-)


More Answers (1)

Adam
Adam on 9 Mar 2015
Why do you need to pre-allocate? Aren't you simply copying values from one struct to another, without any dynamic resizing of any individual field of the new struct? I don't see that pre-allocating zeros and then overwriting them with actual data of the same size will gain you anything.
  10 Comments
Adam
Adam on 9 Mar 2015
Stephen's answer is the more complete so the right one to accept, but if you gained something useful from my answer too then that is good :)
James Tursa
James Tursa on 9 Mar 2015
Edited: James Tursa on 9 Mar 2015
Some clarification about comments above:
"... Dynamically created fields don't require presizing when you create the struct (and they can't be since a field can contain anything)."
Assuming we are only talking about the field names here (not the field elements themselves): while they don't require pre-allocation, there is a benefit. The amount of benefit depends on the number of fields to be added. Adding field names dynamically (e.g. in a loop) causes MATLAB to re-allocate memory for the field names and add more value addresses iteratively as well ... it is the equivalent of assigning to a cell array index in a loop without pre-allocating the cell array first (cells and structs are stored very similarly internally). Since you are only copying field variable addresses each iteration, the copying overhead isn't likely to be much, but it is extra overhead that could potentially be avoided (if one knows all the field names up front).
"... You could try to create the struct upfront with all its fields already containing pre-allocated arrays, but as mentioned this is un-necessary and slower rather than faster if you are simply going to copy data over the top of those pre-sized arrays anyway."
Yes and no. If one is talking only about creating a struct with the proper field names up front, then pre-allocation does make sense and will be faster ... although the overhead savings could be quite small and negligible depending on the number of fields in question (and in fact the extra code to do this may wipe out the small savings altogether). If one is talking about pre-allocating the field elements themselves with variables (e.g., zeros), then this doesn't typically make sense as the references discuss (they get overwritten downstream anyway so the pre-allocation can be a waste of time and resources).
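The "all field names up front" case could be sketched as follows (field names assumed for illustration): interleave the names with empty values and make a single struct() call, instead of adding fields one at a time in a loop.

```matlab
fields = {'ExpA', 'ExpB', 'ExpC'};       % assumed field names, known up front

% Interleave each name with an empty value: {'ExpA', [], 'ExpB', [], ...}
args = [fields; cell(1, numel(fields))];

% One struct() call creates all fields at once, avoiding repeated
% re-allocation of the field-name table inside a loop
S = struct(args{:});
```
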
DISCLAIMER: I add these comments for clarification only. The fact is I am in agreement with others who have already posted that there are better ways to organize the data for easier and more efficient access (using dynamic field names in code is notoriously slow and limits how you can access and manipulate the data).

