Finding the value of one column for each unique combination of two other columns in a table

I have a large table with three columns, X, Y and Z.
I want to find the value of Z for each unique combination of X and Y. All three columns contain repeated values, so a given (X,Y) combination can have several Z values; in that case I want the minimum Z.
My idea was to use something like
unique_X = unique(T.X);
T(~ismember(T.X, unique_X(i)),:) = [];   % keep only the rows with X == unique_X(i)
and loop over X and Y individually, resetting the table on each pass. I suspect there is a much easier way to do this, though. Can someone help?
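For comparison, the loop idea above can be made to work without deleting rows, so the table never needs resetting. This is a sketch on a small hypothetical table `T` (the values are made up for illustration); it uses logical masks per (X,Y) pair and visits every combination of unique X and unique Y, which is why it gets slow on large data.

```matlab
% Hypothetical example table; substitute your own table with columns X, Y, Z
T = table([1;1;2;2;1], [1;1;1;1;2], [5;3;7;9;4], ...
    'VariableNames', {'X','Y','Z'});

ux = unique(T.X);
uy = unique(T.Y);
res = zeros(0, 3);                              % rows of [X Y minZ]
for i = 1:numel(ux)
    for j = 1:numel(uy)
        mask = T.X == ux(i) & T.Y == uy(j);     % rows in this (X,Y) group
        if any(mask)                            % skip pairs that never occur
            res(end+1, :) = [ux(i), uy(j), min(T.Z(mask))]; %#ok<AGROW>
        end
    end
end
disp(res)
```

Because `T` is only read, not modified, there is nothing to reset between passes.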

Answers (1)

dpb on 19 Oct 2022
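% build a dataset: 30 rows of random integers in 1..1000
XYZ=randi(1000,30,3);
% take the sorted unique values of each column and trim them to a common
% length N, so that no column of the test table has repeated values
U=arrayfun(@(i)unique(XYZ(:,i)),1:3,'UniformOutput',false);
N=min(cellfun(@numel,U));
tXYZ=array2table(cell2mat(cellfun(@(v)v(1:N),U,'UniformOutput',false)),'VariableNames',{'X','Y','Z'});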
% the engine: group by (X,Y) and take the max of Z within each group
tUbyGroup=groupsummary(tXYZ,{'X','Y'},@max);
tUbyGroup.Properties.VariableNames(end)={'Max_Z'};   % rename the summary column
head(tUbyGroup)
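The question asks for the smallest Z per group rather than the largest. A minimal sketch, assuming a small hypothetical table whose answer is easy to check by hand, uses the built-in `'min'` method of `groupsummary` and restricts it to the Z variable, which produces a summary column named `min_Z`:

```matlab
% Hypothetical table: (1,1) and (2,1) each occur twice, (1,2) once
T = table([1;1;2;2;1], [1;1;1;1;2], [5;3;7;9;4], ...
    'VariableNames', {'X','Y','Z'});

% One row per unique (X,Y) pair; 'min' applied only to Z gives column min_Z
G = groupsummary(T, {'X','Y'}, 'min', 'Z');
disp(G)
```

For this table, `G` has the groups (1,1), (1,2) and (2,1) with `GroupCount` 2, 1, 2 and `min_Z` 3, 4, 7.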
     X       Y      GroupCount    Max_Z
    ___     ___     __________    _____
     32      27         1          150
     80      92         1          166
     90     100         1          173
    100     108         1          260
    106     170         1          363
    156     202         1          434
    169     213         1          442
    244     257         1          460
Now check that it still works when an (X,Y) combination occurs twice:
tXYZ.X(3)=tXYZ.X(2); % copy row 2's X into row 3...
tXYZ.Y(3)=tXYZ.Y(2); % ...and its Y, so rows 2 and 3 now form a single (X,Y) group
tUbyGroup=groupsummary(tXYZ,{'X','Y'},@max);
tUbyGroup.Properties.VariableNames(end)={'Max_Z'};
head(tUbyGroup)
     X       Y      GroupCount    Max_Z
    ___     ___     __________    _____
     32      27         1          150
     80      92         2          173
    100     108         1          260
    106     170         1          363
    156     202         1          434
    169     213         1          442
    244     257         1          460
    264     294         1          480
And, voila! The second group now has a count of two, and its Max_Z is the larger of Z(2:3), the two rows we duplicated. Since the question asks for the minimum Z, use @min in place of @max (and name the column Min_Z instead).
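If you prefer not to use `groupsummary`, the same per-group minimum can be sketched with two other standard approaches: `findgroups` plus `splitapply`, or `unique(...,'rows')` plus `accumarray`. This is a swapped-in alternative, not the method from the answer above, shown on a small hypothetical table:

```matlab
% Hypothetical example table (same shape as in the question)
T = table([1;1;2;2;1], [1;1;1;1;2], [5;3;7;9;4], ...
    'VariableNames', {'X','Y','Z'});

% findgroups numbers each unique (X,Y) pair; the extra outputs return the pairs
[G, gx, gy] = findgroups(T.X, T.Y);
minZ = splitapply(@min, T.Z, G);            % smallest Z in each group
R1 = table(gx, gy, minZ, 'VariableNames', {'X','Y','Min_Z'});

% Same result with unique(...,'rows') and accumarray
[uXY, ~, idx] = unique([T.X T.Y], 'rows');  % idx maps each row to its pair
minZ2 = accumarray(idx, T.Z, [], @min);     % apply min to the Z values per pair
R2 = array2table([uXY minZ2], 'VariableNames', {'X','Y','Min_Z'});

disp(R1)
```

Both versions return the groups in sorted (X,Y) order with one Min_Z per group, and neither needs an explicit loop.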

Release: R2021a
