Finding the index of dupplicate rows in a matrix

30 views (last 30 days)
I have matrix M 275935x2 . I want to remove duplicate rows. I tried two methods: Method 1)
M2=unique(M,'rows') % and it gave M2 179109x2 double
Method 2)
x0=find(hist(M,unique(M))>1); % it gave only 8301 duplicate values.
Which method is correct? I want to find th eindices of duplicate rows and not simply remove them. Any help is appreciated

Answers (1)

Stephen23
Stephen23 on 6 Feb 2017
Edited: Stephen23 on 6 Feb 2017
>> A = randi(1e4,275935,2);
>> [B,~,Y] = unique(A,'rows','stable');
>> [C,X] = hist(Y,unique(Y));
>> Z = ismember(Y,X(C>1)); % indices of repeated rows of A
For example this random data set had
>> nnz(Z)
ans =
816
row that occur most than once. To get the indices of the duplicate rows, try this:
[U,W] = unique(A,'rows','stable');
D = setdiff(1:size(A,1),W); %indices of duplicate rows.
  2 Comments
Danielle Leblance
Danielle Leblance on 6 Feb 2017
I am sure there is something wrong. I am attaching the data.the unique function gives a matrix B which is different than the one that I obtain if I remove the duplicates Z
Stephen23
Stephen23 on 6 Feb 2017
Edited: Stephen23 on 6 Feb 2017
This works for me:
>> load matlab.mat
>> [B,W] = unique(M,'rows','stable');
>> D = setdiff(1:size(M,1),W); % indices of duplicate rows.
And now compare:
>> M(D,:) = [];
>> isequal(M,B)
ans =
1

Sign in to comment.

Categories

Find more on Creating and Concatenating Matrices in Help Center and File Exchange

Tags

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!