how to identify unique columns, based on the first two rows, make an average of the uniques columns and remove the duplicate columns?

Question

ehsan on 11 Jun 2019

0
Link

Direct link to this question

https://nl.mathworks.com/matlabcentral/answers/466522-how-to-identify-unique-columns-based-on-the-first-two-rows-make-an-average-of-the-uniques-columns

Answered: Peter Perkins on 12 Jun 2019

I have a matrix with the size of 572 x 790. The first two rows of the matrixes are Longitude and Latitude for some stations, and the other rows are the measurments values. Now I would like to calculate the mean value of the two or more columns that have similar Lon and Lat (the first to rows). Then delete those columns' values and insert the derived mean value instead of them.

05	9.75	9.85	9.85	9.75	9.55    9.85
15	47.15	47.25	47.25	47.25	47.34   47.25
0	0	0	0	0        2.56
215	5.599	0	0	0	0        0
547	0.553	2.058	2.058	1.422	3.063    2.664
94	8.98	8.295	8.295	7.058	4.011    9.24
0	0	0	0	0        1
0	0	0	0	0        0
0	0	0	0	0        2.5
0	0	0	0	0        0
80	2.15	1.02	1.029	0.164	0.656    0.86
97	7.93	1.57	1.57	8.31	3.53    1.22
64	0.36	0.30	0.30	0.21	0.10    0
...   ...      ...     ...     ...    ...      ...

For example in this case the columns number 3, 4, and 7 are unique and I need to make an average .

I am confisung a little bit with unique and accumarray.

I appreciate any help.

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Sign in to answer this question.

Answer 1

Star Strider on 11 Jun 2019

0
Link

Direct link to this answer

https://nl.mathworks.com/matlabcentral/answers/466522-how-to-identify-unique-columns-based-on-the-first-two-rows-make-an-average-of-the-uniques-columns#answer_378725

Open in MATLAB Online

With ‘M’ as your matrix:

M = [10.0500    9.7500    9.8500    9.8500    9.7500    9.5500    9.8500
     47.1500   47.1500   47.2500   47.2500   47.2500   47.3400   47.2500
           0         0         0         0         0         0    2.5600
                                       . . .                             ];                                   

Use uniquetol (or unique), splitapply, cellfun and cell2mat to get:

[Uc,~,idx] = uniquetol(M([1 2],:)', 0.001, 'ByRows',1)              % Choose A Function
[Uc,~,idx] = unique(M([1 2],:)',  'rows')                           % Choose A Function
colcolct = splitapply(@(x){x}, M', idx);                            % Collect Columns (By Row Of Transposed Matrix)
colmeans = cellfun(@(x)mean(x,1), colcolct, 'Uni',0)                % Column Means (By Row Of Transposed Matrix)
Out = cell2mat(colmeans)'                                           % Convert To Matrix & Transpose

producing:

Out =
    9.5500    9.7500    9.7500    9.8500   10.0500
   47.3400   47.1500   47.2500   47.2500   47.1500
         0         0         0    0.8533         0
         0    5.5990         0         0    1.2150
    3.0630    0.5530    1.4220    2.2600    0.5470
    4.0110    8.9800    7.0580    8.6100   10.9400
         0         0         0    0.3333         0
         0         0         0         0         0
         0         0         0    0.8333         0
         0         0         0         0         0
    0.6560    2.1500    0.1640    0.9697    4.8000
    3.5300    7.9300    8.3100    1.4533    0.9700
    0.1000    0.3600    0.2100    0.2000    1.6400

I could not entice splitapply to both group the columns and calculate the means, explaining the cellfun call.

6 Comments
Show 4 older commentsHide 4 older comments

Star Strider on 11 Jun 2019

Our individual and collective pleasures!

Guillaume on 11 Jun 2019

As I've shown in my answer, you certainly don't need to go through cell arrays with splitapply. Using cell arrays is going to significantly slow down the code.

Sign in to comment.

Answer 2

Guillaume on 11 Jun 2019

2
Link

Direct link to this answer

https://nl.mathworks.com/matlabcentral/answers/466522-how-to-identify-unique-columns-based-on-the-first-two-rows-make-an-average-of-the-uniques-columns#answer_378727

Open in MATLAB Online

using unique and accumarray:

[latlong, ~, group] = unique(yourmatrix(:, [1, 2]), 'rows');   %optionally use the 'stable' option.
[destrow, destcol] = ndgrid(group, 3:size(yourmatrix, 2));
newmatrix = accumarray([destrow(:), destcol(:)], reshape(yourmatrix(:, 3:end), 1, []), [], @mean)
newmatrix(:, [1, 2]) = latlong

using the newer findgroups and splitapply:

[group, lat, long] = findgroups(yourmatrix(:, 1), yourmatrix(:, 2));
newmatrix = splitapply(@(rows) mean(rows, 1), yourmatrix(:, 3:end), group);
newmatrix = [lat, long, newmatrix]

But you might be better off converting your matrix into a table, particularly if columns 3:end represent different things. Using a table, what you want is very straightforward with either varfun or groupsummary:

latlongdata = array2table(yourmatrix, 'VariableNames', [{'latitude', 'longitude'}, compose('var%d', 1:size(yourmatrix, 2)-2)]);  %obviously use better variable names than var1...varn for the remaining columns
newdata = varfun(@mean, latlongdata, 'GroupingVariables', {'latitude', 'longitude'})
%or
newdata = groupsummary(latlongdata, {'latitude', 'longitude'}, 'mean');

groupsummary lets you calculate several statistics at once (e.g. mean, std, max, min, sum) whereas varfun can only calculate one.

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Answer 3

Peter Perkins on 12 Jun 2019

0
Link

Direct link to this answer

https://nl.mathworks.com/matlabcentral/answers/466522-how-to-identify-unique-columns-based-on-the-first-two-rows-make-an-average-of-the-uniques-columns#answer_378890

Open in MATLAB Online

You may have a reason for wanting row-oriented data, but I'm gonna suggest that you transpose your data, create a table, and use a grouped varfun. It's a one-liner:

>> T = array2table(X); T.Properties.VariableNames(1:2) = ["Lat" "Lon"]
T =
  7×13 table
     Lat      Lon      X3      X4       X5       X6      X7    X8    X9     X10     X11     X12     X13 
    _____    _____    ____    _____    _____    _____    __    __    ___    ___    _____    ____    ____
    10.05    47.15       0    1.215    0.547    10.94    0     0       0     0       4.8    0.97    1.64
     9.75    47.15       0    5.599    0.553     8.98    0     0       0     0      2.15    7.93    0.36
     9.85    47.25       0        0    2.058    8.295    0     0       0     0      1.02    1.57     0.3
     9.85    47.25       0        0    2.058    8.295    0     0       0     0     1.029    1.57     0.3
     9.75    47.25       0        0    1.422    7.058    0     0       0     0     0.164    8.31    0.21
     9.55    47.34       0        0    3.063    4.011    0     0       0     0     0.656    3.53     0.1
     9.85    47.25    2.56        0    2.664     9.24    1     0     2.5     0      0.86    1.22       0
>> T2 = varfun(@(x) repmat(mean(x),size(x)),T,'GroupingVariables',["Lat" "Lon"])
T2 =
  7×14 table
     Lat      Lon     GroupCount    Fun_X3     Fun_X4    Fun_X5    Fun_X6    Fun_X7     Fun_X8    Fun_X9     Fun_X10    Fun_X11    Fun_X12    Fun_X13
    _____    _____    __________    _______    ______    ______    ______    _______    ______    _______    _______    _______    _______    _______
     9.55    47.34        1               0        0     3.063     4.011           0      0             0       0         0.656      3.53       0.1  
     9.75    47.15        1               0    5.599     0.553      8.98           0      0             0       0          2.15      7.93      0.36  
     9.75    47.25        1               0        0     1.422     7.058           0      0             0       0         0.164      8.31      0.21  
     9.85    47.25        3         0.85333        0      2.26      8.61     0.33333      0       0.83333       0       0.96967    1.4533       0.2  
     9.85    47.25        3         0.85333        0      2.26      8.61     0.33333      0       0.83333       0       0.96967    1.4533       0.2  
     9.85    47.25        3         0.85333        0      2.26      8.61     0.33333      0       0.83333       0       0.96967    1.4533       0.2  
    10.05    47.15        1               0    1.215     0.547     10.94           0      0             0       0           4.8      0.97      1.64  

I think that does what you're looking for. It seems to match StarStrider's output.

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

how to identify unique columns, based on the first two rows, make an average of the uniques columns and remove the duplicate columns?

0 Comments
Show -2 older commentsHide -2 older comments

Accepted Answer

6 Comments
Show 4 older commentsHide 4 older comments

More Answers (2)

0 Comments
Show -2 older commentsHide -2 older comments

0 Comments
Show -2 older commentsHide -2 older comments

See Also

Categories

Tags

Community Treasure Hunt

how to identify unique columns, based on the first two rows, make an average of the uniques columns and remove the duplicate columns?

0 Comments Show -2 older commentsHide -2 older comments

Accepted Answer

6 Comments Show 4 older commentsHide 4 older comments

More Answers (2)

0 Comments Show -2 older commentsHide -2 older comments

0 Comments Show -2 older commentsHide -2 older comments

See Also

Categories

Tags

Community Treasure Hunt

0 Comments
Show -2 older commentsHide -2 older comments

6 Comments
Show 4 older commentsHide 4 older comments

0 Comments
Show -2 older commentsHide -2 older comments

0 Comments
Show -2 older commentsHide -2 older comments