Reordering Nested fields of structs and using Parfor
1. Rough Idea of my Procedure - Juggle Nested Structures according to what Parameter I need to Operate On
I am reading in a large number of text files containing spatial coordinates and time, then organizing the data into structures with nested fields. In practice there are more than two levels of nesting, but for a minimal example I show only two, to avoid overwhelming with details:
myStructreadIn(t).circle(c).dat(theta data); Step (1) - read in data to a struct with nested fields
I then take the fft along a certain parameter, such as time. To do that I create a new struct and juggle the fields so that time comes last, as a vector of size [1,numTimesteps], in place of the vector that was previously at the end:
myNewStructForTimeFft(c).azimuthal(theta).dat( time data ) (2)
Then, in order to take the fft along another parameter, I create a new struct and again juggle the fields so that the last one is a vector of size [1,numAzimuthalPoints]:
YetAnotherNewStructForTimeFft(t).azimuthal(theta).dat( azimuthal ); (3)
and then take its fft. And so on. Lots of juggling like this, lots of data, then doing fft, xcorr, etc.
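For concreteness, here is a minimal sketch of the juggle from layout (1) to layout (2). The sizes, the variable names `s1` and `s2`, and the use of a 1-by-numTheta row vector for `dat` are assumptions made for illustration only; the real data is far larger and has more levels.

```matlab
% Hypothetical small sizes for illustration only.
numTimesteps = 4; numCircles = 3; numTheta = 5;

% Layout (1): s1(t).circle(c).dat(theta)
s1 = struct('circle', repmat({struct('dat', ...
    repmat({zeros(1, numTheta)}, [1, numCircles]))}, [1, numTimesteps]));
for t = 1:numTimesteps
    for c = 1:numCircles
        % Dummy values that encode (t, c, theta) so the reordering can be checked.
        s1(t).circle(c).dat = t*100 + c*10 + (1:numTheta);
    end
end

% Layout (2): s2(c).azimuthal(theta).dat(t), preallocated in the same way.
s2 = struct('azimuthal', repmat({struct('dat', ...
    repmat({zeros(1, numTimesteps)}, [1, numTheta]))}, [1, numCircles]));
for t = 1:numTimesteps
    for c = 1:numCircles
        for th = 1:numTheta
            % The "juggle": time moves to the innermost vector.
            s2(c).azimuthal(th).dat(t) = s1(t).circle(c).dat(th);
        end
    end
end

% FFT over time for one (circle, theta) pair.
F = fft(s2(2).azimuthal(3).dat);
```

Every scalar is copied individually here, which is what becomes expensive at the real data sizes.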
Since I have quite a lot of data, moving it between the different structs is quite an expensive operation, even with parfor loops.
2. Question about Improving my Procedure - Is it Possible to Eliminate Unnecessary 'Field Juggling', e.g. via structfun?
My question is: is there a way to take the fft on the structure in (1) without creating the structure in (2) (i.e., without juggling time to the last field)? I was doing some reading on structfun; can that be used?
3. Another Issue I have: Parfor crashes, but Regular For Loops Work Fine Memorywise.
Structures with nested fields make parfor loops drastically increase RAM usage when I juggle the fields as in Steps (1)-(3). I have properly initialized my structs with, e.g.,
`myStruct=struct('t', repmat({struct('circle', repmat({ struct('dat',repmat({zeros(3,1079)}, [1,540]))}, [1,numTimesteps])) }, [1,numCrossSections]));`
Although I expect parfor loops to have some overhead, my current way of juggling subfields (as in (1)-(3) above) increases RAM usage from 30 GB to my workstation's full 192 GB (so that MATLAB crashes) when a parfor loop is used, whereas it barely increases, to about 32 GB, when one such 'field juggle' is run with a regular for loop. I would like to stick with structures for now if that is not totally the wrong way to approach a lot of data, but another data format would be fine too if it fits the bill (maybe using datastore and arranging the data in tables, although it is not clear to me how to use tables with this nested idea).
1 Comment
Edric Ellis
on 11 Apr 2022
Data shuffling using (process-based) parfor is sadly never efficient. You're highly likely to end up duplicating the data. There's a chance parpool("threads") might work for you, as that can avoid duplicating data under certain circumstances. Your best bet to use parfor is (probably) to shuffle the data in your desktop MATLAB, and then use parfor on a regular array or cell array.
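A rough sketch of that suggestion, under assumed names and sizes (`s1`, `A`, `peak`, and the small dimensions are all hypothetical): the nested struct is flattened once on the client into a plain numeric array, and parfor then only sees a sliced array, so each worker is sent just its own slice rather than the whole struct.

```matlab
% Hypothetical small sizes for illustration only.
numTimesteps = 4; numCircles = 3; numTheta = 5;

% Nested struct in the style of layout (1): s1(t).circle(c).dat(theta)
s1 = struct('circle', repmat({struct('dat', ...
    repmat({zeros(1, numTheta)}, [1, numCircles]))}, [1, numTimesteps]));
for t = 1:numTimesteps
    for c = 1:numCircles
        s1(t).circle(c).dat = c * ones(1, numTheta);   % dummy data
    end
end

% Shuffle once, in desktop MATLAB, into a regular array A(theta, c, t).
A = zeros(numTheta, numCircles, numTimesteps);
for t = 1:numTimesteps
    for c = 1:numCircles
        A(:, c, t) = s1(t).circle(c).dat(:);
    end
end

% Optional (requires Parallel Computing Toolbox): a thread-based pool,
% which can avoid duplicating data under certain circumstances.
% parpool("threads");

% A is sliced by the loop variable c, so only A(:, c, :) goes to each worker.
peak = zeros(1, numCircles);
parfor c = 1:numCircles
    slice = squeeze(A(:, c, :));       % numTheta-by-numTimesteps
    spec  = abs(fft(slice, [], 2));    % FFT along the time dimension
    peak(c) = max(spec(:));
end
```

Whether this helps in practice depends on how much work is done per slice; the point is only that the shuffle happens outside the parfor loop.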
One other consideration - if the time-consuming computation (other than data manipulation) is indeed fft, then beware that this function is already intrinsically multithreaded by MATLAB (providing the arrays are large enough). This tends to mean that fft is already taking full advantage of your computer's resources, and trying to parallelise using parfor will not gain you anything at all.
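To illustrate, here is a minimal sketch assuming the data has already been gathered into a single numeric array `A(theta, c, t)`. That layout, the array name, and the sizes are assumptions, not something given in the question. The `dim` argument of `fft` then selects the parameter to transform along, with no parfor and no reshuffling between transforms.

```matlab
% Hypothetical sizes and random data for illustration only.
numTheta = 8; numCircles = 2; numTimesteps = 16;
A = randn(numTheta, numCircles, numTimesteps);

% FFT along time (dimension 3) for every (theta, c) pair in one call.
Ft = fft(A, [], 3);

% FFT along the azimuthal direction (dimension 1) on the same array.
Fth = fft(A, [], 1);

% Cross-check one time series against a plain vector FFT.
ref = fft(squeeze(A(3, 2, :)));
err = norm(squeeze(Ft(3, 2, :)) - ref);
```

For large arrays these single calls are the ones MATLAB can multithread internally.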
Answers (0)