Indexing arrays of binned data
Dear all,
I have a 2745x1 cell array expData. For every cell in this cell array I define the same range (i.e. bins). Then I discretize the data in expData based on the defined range.
Based on the discretized data in expData I want to find the corresponding values in the cell array velData, which is illustrated in the picture below. Cell 14 is taken as an example. Once the values are found, I want to take their mean for every bin.
I tried this using accumarray but with no luck:
for i = 1:length(files)
    % Define the range of the bins
    rng_x{i} = -0.3:0.06:0.3;
    % Assign the data of x-coordinate to a predefined range
    disc_x{i} = discretize(expData{i,1}(:,1),rng_x{i});
    % Calculate mean of every bin
    x_mean{i} = accumarray(disc_x{1,i}(:,1), expData{i,1}(:,1),[11 1], @mean);
    % Define the range of the bins
    rng_z{i} = 0:0.06:0.78;
    % Assign the data of z-coordinate to a predefined range
    disc_z{i} = discretize(expData{i,1}(:,3),rng_z{i});
    % Calculate mean of every bin
    z_mean{i} = accumarray(disc_z{1,i}(:,1), expData{i,1}(:,3),[13 1], @mean);
    vx_disc{i} = accumarray(disc_x{1,i}(:,1), velData{i,1}(:,1),[11 1], @mean); %Did not work
end
splitapply does not work in this case because some bins end up empty in some of the cells. If you use splitapply here, you get the following error:
"For N groups, every integer between 1 and N must occur at least once in the vector of group numbers."
2 Comments
Dana
on 9 Sep 2020
Without knowing what's in velData or what you mean by "Did not work" (did you get an error, and if so what? did it give an unexpected answer, and if so what did you expect and what did you get?), it's hard to offer suggestions. Can you provide some more details?
Answers (3)
Steven Lord
on 9 Sep 2020
Take a look at the groupsummary function.
% Include rng default so you generate the exact same random numbers I did
rng default
x = randn(10, 1);
y = -2:0.25:2;
d = discretize(x, y);
[values, groups] = groupsummary(x, d, @sum);
% Show the results in tabular form
xAndD = table(x, d, 'VariableNames', {'x_value', 'group'})
vAndG = table(values, groups, 'VariableNames', {'summed_value', 'corresponding_group'})
The value of summed_value in the row of vAndG whose corresponding_group entry is 10 represents the sum of the elements in the x variable in xAndD whose rows have 10 in the group variable.
group10_v1 = vAndG{vAndG.corresponding_group == 10, 1}
group10_v2 = sum(xAndD{xAndD.group == 10, 1})
group10_v1 == group10_v2 % True
Because of the rng default call I know that d has 10 in positions 5 and 8.
group10_v1 == x(5)+x(8) % True
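To connect this to the original question, here is a minimal sketch of the same idea with one variable binned and a second, paired variable averaged per bin. The data and the names x, vx, edges, vxMean and binIdx are made up for illustration; they are not the asker's variables. It assumes a release in which groupsummary accepts array inputs, as in the call above.

```matlab
% Hypothetical sample data: paired positions and velocities
x  = [0.2; 0.4; 2.5; 2.7];      % coordinate used for binning
vx = [10; 20; 30; 50];          % velocity to average per bin
edges = [0 1 2 3];              % three bins; the middle one stays empty

bin = discretize(x, edges);     % bin index per row: [1; 1; 3; 3]

% Group the velocities by the bin index of the positions
[vxMean, binIdx] = groupsummary(vx, bin, 'mean');

% Show the results in tabular form
result = table(binIdx, vxMean, 'VariableNames', {'bin', 'mean_vx'})
```

Only the bins that actually occur are returned, so the empty bin 2 simply does not appear in the result instead of raising the splitapply error quoted in the question.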
Dana
on 9 Sep 2020
Index in position 1 exceeds array bounds (must not exceed 1).
This error is an indexing error, which suggests to me that one or more of your indices in that line of code are wrong. Further, it's not reporting the error from inside the function accumarray, which means the error is happening before anything is actually passed to that function. Based on that, we conclude that the error arises in the arguments you're passing to accumarray.
Since it's indicating that an index in position 1 is wrong, and the only part of that line of code with an index in position 1 that can potentially exceed 1 is velData{i,1} (the index in position 1 exceeds 1 if i>1), that's the obvious candidate. If you do size(velData,1), do you get something greater than 1? If not, that's your problem right there.
Based on my understanding of what you're trying to do, I would think you should get what you're after if you fix that problem. However, you said, "But even if I solve the above error, I doubt whether I will get the intended results. Because accumarray is saying that the data from velData will be devided into bins specified by disc_x." Isn't that what you want? I don't understand why that's a problem.
Essentially, using your strategy here, for file i, each row of expData{i,1} is associated with the same row of velData{i,1} (ignoring the indexing error, anyway). You're then binning the rows of expData{i,1} and velData{i,1} according to the values in the first column of expData{i,1}, with the index of the corresponding bin stored in the vector disc_x{i}. Next, you want to compute the means of the first column of velData{i,1} by bin. If that's what you're after, then your code should do that (again, as long as you fix the above indexing issue first).
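As a rough sketch of that strategy for a single file, with made-up stand-ins for expData and velData (the real contents are not known here): the size check mirrors the indexing problem described above, and the NaN filtering is an extra assumption to cover samples that fall outside the bin edges, since discretize returns NaN for those and accumarray does not accept NaN subscripts.

```matlab
% Hypothetical stand-ins for one file's data (rows are paired samples)
expData = {[-0.25 0 0.1; -0.28 0 0.2; 0.05 0 0.3; 0.50 0 0.4]};
velData = {[1 0 0; 3 0 0; 5 0 0; 7 0 0]};
i = 1;

% Guard against the indexing error discussed above
assert(i <= size(velData, 1), 'velData has fewer rows than expected');

edges = -0.3:0.06:0.3;                       % 11 edges, so 10 bins
bin = discretize(expData{i,1}(:,1), edges);  % NaN for x outside the edges
inRange = ~isnan(bin);                       % keep only rows that fall in a bin

% Mean of the first velocity column per x-bin; bins without data get NaN
vx_mean = accumarray(bin(inRange), velData{i,1}(inRange,1), ...
                     [numel(edges)-1, 1], @mean, NaN);
```

Here the first two rows land in bin 1, the third in bin 6, and the last row (x = 0.50) is outside the range and is dropped. Passing NaN as the fill value makes empty bins easy to tell apart from bins whose mean happens to be zero.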
J. Alex Lee
on 10 Sep 2020
From what I can gather, it should be possible to reorganize your experimental data into an N-by-6 matrix called Data, where N is the number of (coordinate, velocity) pairs, and the 6 columns are organized as
x|y|z|u|v|w
-----------
| | | | |
To bin just on the (x,z) coordinates, you can use histcounts2
[~,Xedges,Zedges,binX,binZ] = histcounts2(Data(:,1),Data(:,3),nBins);
where nBins is the number of bins you want in each direction x and z.
You can use Xedges and Zedges to compute the bin centers, and binX and binZ are the assignments of each data point (row in Data) into the 1D bins along each direction.
From there you just need to use binX and binZ to determine which 2D bin a data point (row in Data) belongs to. I would then just loop through those indices to find average velocities, but perhaps you can somehow use "groupsummary" as suggested above, if you are allowed to define your own groups manually.
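One possible way to avoid the loop, sketched under assumptions: the Data matrix below is invented, and explicit edges are passed to histcounts2 instead of nBins so the result is easy to check by hand. The pair (binX, binZ) is used directly as a two-column subscript for accumarray, which is a substitution for the loop and not something tested on the real data.

```matlab
% Hypothetical N-by-6 data: columns are x, y, z, u, v, w
Data = [0.1 0 0.1  1 0 0;
        0.2 0 0.2  3 0 0;
        0.9 0 0.8  5 0 0;
        0.8 0 0.9  7 0 0];

Xedges = [0 0.5 1];   % two bins along x
Zedges = [0 0.5 1];   % two bins along z

% binX and binZ hold the 1D bin index of every row (0 if out of range)
[~, ~, ~, binX, binZ] = histcounts2(Data(:,1), Data(:,3), Xedges, Zedges);

inRange = binX > 0 & binZ > 0;   % drop rows outside the edges

% Mean of u in every (x,z) bin; rows index x-bins, columns index z-bins
uMean = accumarray([binX(inRange) binZ(inRange)], Data(inRange,4), ...
                   [numel(Xedges)-1, numel(Zedges)-1], @mean, NaN);
```

With this toy input, the first two rows share the (1,1) bin and the last two share the (2,2) bin; the two off-diagonal bins receive no data and are filled with NaN.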
2 Comments
J. Alex Lee
on 10 Sep 2020
It would be nice if there were a "discretize2" function; this doesn't seem like such a niche need...