Identify Duplicate values in an array and replace with Nan

25 views (last 30 days)
Poulomi Ganguli on 7 Sep 2023
Edited: Dyuman Joshi on 7 Sep 2023
Hello: I have a array as below of dimension 40,000x3. Third columns often contain duplicate values. I need to identify duplicate values and replace with Nan. Kindly help.
2.39300000000000 0 6.16800000000000
2.38720000000000 0 6.16800000000000
2.38480000000000 0 6.16800000000000
2.37380000000000 0 6.16800000000000
2.37410000000000 0 6.16800000000000
2.37020000000000 0 6.16800000000000
2.36880000000000 0 6.16800000000000
2.36350000000000 0 6.16800000000000
0 CommentsShow -2 older commentsHide -2 older comments

Sign in to comment.

Accepted Answer

Mrutyunjaya Hiremath on 7 Sep 2023
If you want to replace only the duplicates with 'NaN' and keep one occurrence of each value intact, here is the code:
% Sample array (Replace this with your 40,000x3 array)
array = [
2.3930, 0, 6.1680;
2.3872, 0, 6.1680;
2.3848, 0, 6.1780;
2.3738, 0, 6.1680;
2.3741, 0, 6.1690;
2.3702, 0, 6.1780;
2.3688, 0, 6.1690;
2.3635, 0, 6.1780;
];
% Extract the third column
third_col = array(:, 3);
% Find unique values and their first occurrence index
[unique_vals, ~, ic] = unique(third_col);
% Count the occurrence of each unique value
counts = accumarray(ic, 1);
% Identify values that occur more than once (duplicates)
duplicate_vals = unique_vals(counts > 1);
% Replace only duplicates with NaN, keep one occurrence of each value
for val = duplicate_vals'
idx = find(third_col == val);
third_col(idx(2:end)) = NaN; % Keep the first occurrence, replace the rest with NaN
end
% Update the third column in the original array
array(:, 3) = third_col;
% Display the updated array
disp(array);
2.3930 0 6.1680 2.3872 0 NaN 2.3848 0 6.1780 2.3738 0 NaN 2.3741 0 6.1690 2.3702 0 NaN 2.3688 0 NaN 2.3635 0 NaN
In this code, for each duplicate value, find its indices in the third column using 'find'. Then, keep the first occurrence (index idx(1)) and replace the rest (idx(2:end)) with NaN. This will leave one instance of each value in the third column and replace only the duplicates with 'NaN'.
0 CommentsShow -2 older commentsHide -2 older comments

Sign in to comment.

More Answers (1)

Dyuman Joshi on 7 Sep 2023
Edited: Dyuman Joshi on 7 Sep 2023
Here's a much faster and simpler approach -
array = [2.3930, 0, 6.1680;
2.3872, 0, 6.1680;
2.3848, 0, 6.1780;
2.3738, 0, 6.1680;
2.3741, 0, 6.1690;
2.3702, 0, 6.1780;
2.3688, 0, 6.1690;
2.3635, 0, 6.1780];
%Get the unique values and the indices corresponding to their 1st occurence
%in order they appear in the array
[val,first_idx] = unique(array(:,3),'stable')
val = 3×1
6.1680 6.1780 6.1690
first_idx = 3×1
1 3 5
%Convert all the values of column 3 to NaN
array(:,3) = NaN;
%Re-assign the values according to the indices
array(first_idx,3) = val
array = 8×3
2.3930 0 6.1680 2.3872 0 NaN 2.3848 0 6.1780 2.3738 0 NaN 2.3741 0 6.1690 2.3702 0 NaN 2.3688 0 NaN 2.3635 0 NaN
0 CommentsShow -2 older commentsHide -2 older comments

Sign in to comment.

Categories

Find more on Creating and Concatenating Matrices in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!