How do I identify unique rows based on multiple columns and calculate the average of the rest of the columns?

53 views (last 30 days)
I have a matrix as below:
A = [1 4 3 8; 4 5 6 9; 1 6 3 6; 2 6 9 3; 1 5 3 7];
My goal is to identify rows with both identical Column 1 and identical Column 3 values, and then calculate the average for the rest of the columns, i.e., Column 2 and Column 4, within these duplicate rows. In this example, the duplicate rows would be Rows # 1, 3, and 5. My ending matrix would be:
B = [1 5 3 7; 4 5 6 9; 2 6 9 3];
This is a much simplified example. In reality, I have 35 columns that need to be averaged, and millions of rows. What is the most efficient way of handling this? Do I have to write a loop and process each of the unique rows individually?
Many thanks!

Accepted Answer

Kevin Holly
Kevin Holly on 19 Oct 2021
Edited: Kevin Holly on 19 Oct 2021
A = [1 4 3 8; 4 5 6 9; 1 6 3 6; 2 6 9 3; 1 5 3 7]
A = 5×4
1 4 3 8 4 5 6 9 1 6 3 6 2 6 9 3 1 5 3 7
I am going to assume that any row that columns 1 and 3 are identical, irregardless of what pair, you want to ignore those rows in the averaging of other columns.
Here is my approach:
t = table(A(:,1),A(:,3))
t = 5×2 table
Var1 Var2 ____ ____ 1 3 4 6 1 3 2 9 1 3
[C, ia, ic] = unique(t,'rows')
C = 3×2 table
Var1 Var2 ____ ____ 1 3 2 9 4 6
ia = 3×1
1 4 2
ic = 5×1
1 3 1 2 1
ic==1
ans = 5×1 logical array
1 0 1 0 1
A(ic==1,:)
ans = 3×4
1 4 3 8 1 6 3 6 1 5 3 7
mean(A(ic==1,:))
ans = 1×4
1 5 3 7
B = [mean(A(ic==1,:));A(ic~=1,:)]
B = 3×4
1 5 3 7 4 5 6 9 2 6 9 3
Your answer:
B = [1 5 3 7; 4 5 6 9; 2 6 9 3]
B = 3×4
1 5 3 7 4 5 6 9 2 6 9 3
Without showing work:
t = table(A(:,1),A(:,3));
[~,~,ic] = unique(t,'rows');
B = [mean(A(ic==1,:));A(ic~=1,:)]
B = 3×4
1 5 3 7 4 5 6 9 2 6 9 3

More Answers (0)

Products


Release

R2021a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!