# Treating NaN as a single unique value (instead of as distinct values)

antonet on 2 Jul 2012
In
there is the following example:
Unique Values in Array Containing NaNs
A = [5 5 NaN NaN];
C = unique(A)
C =
5 NaN NaN
unique treats NaN values as distinct.
Is it possible to treat NaN as a single unique value, so as to have
C=5 NaN

#### 1 Comment

Malcolm Lidierth on 2 Jul 2012
NaNs are treated as "bigger" than +Inf by Arrays.sort() in Java, and by MATLAB unique/sort too by the look of it, so you can just trim the output array.
If MATLAB NaN does not return a constant NaN bit pattern (it probably does), java.lang.Double.NaN will do. But NaNs are NaNs so each is treated as unique even if the bit pattern is the same.
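A minimal sketch of the "trim the output" idea, assuming a double row vector and the default sorted output of unique. The variable names are only illustrative:

```matlab
A = [5 5 NaN Inf NaN];

C = unique(A);                  % sorted output; the NaNs come last, after Inf

firstNaN = find(isnan(C), 1);   % position of the first NaN, empty if there is none
if ~isempty(firstNaN)
    C = C(1:firstNaN);          % keep everything up to and including one NaN
end

nanEqualsItself = (NaN == NaN); % false: this is why unique keeps every NaN
```

The guard on `firstNaN` matters: without it, an input with no NaNs would be trimmed to an empty array.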

Kye Taylor on 2 Jul 2012
Edited: Kye Taylor on 2 Jul 2012
Write this function...
function y = myUnique(x)
    y = unique(x);
    if any(isnan(y))
        y(isnan(y)) = []; % remove all nans
        y(end+1) = NaN; % add the unique one.
    end
end

#### 1 Comment

antonet on 2 Jul 2012
thank you!

Walter Roberson on 2 Jul 2012
Stealing ideas and optimizing...
function y = myUnique(x)
    y = unique(x);
    y(isnan(y(1:end-1))) = [];
end

Richard Brown on 2 Jul 2012
You might need to do a more rigorous test. I just tried a simple test case (100,000 entries, 10,000 NaNs), and the if version was usually slower, sometimes significantly. The if version also has a much more variable runtime compared to this one.
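A rough way to repeat that kind of comparison is sketched below. The array sizes follow the test case described above; the value range, the number of runs, and the use of tic/toc are assumptions. Timings will vary by machine and release, so no numbers are shown.

```matlab
rng(0);                                  % fixed seed so the data is repeatable
x = randi(1000, 1, 100000);              % 100,000 entries (value range is arbitrary)
x(randperm(numel(x), 10000)) = NaN;      % 10,000 of them set to NaN

nRuns = 20;                              % arbitrary number of repetitions
tIf   = zeros(1, nRuns);
tTrim = zeros(1, nRuns);

for k = 1:nRuns
    % "if" version: remove every NaN, then append a single one
    tic;
    y1 = unique(x);
    if any(isnan(y1))
        y1(isnan(y1)) = [];
        y1(end+1) = NaN;
    end
    tIf(k) = toc;

    % trimming version: delete every NaN except the last element
    tic;
    y2 = unique(x);
    y2(isnan(y2(1:end-1))) = [];
    tTrim(k) = toc;
end

fprintf('if version:   median %.4f s, std %.4f s\n', median(tIf), std(tIf));
fprintf('trim version: median %.4f s, std %.4f s\n', median(tTrim), std(tTrim));
```

Both loops call unique on the same data, so any difference comes from the NaN handling afterwards.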
Walter Roberson on 2 Jul 2012
Note to people trying to understand my code: it makes use of an obscure trick. When you index with a logical vector, the vector you are indexing with can be shorter than the vector being indexed. The "missing" logical values are treated as false. By only applying isnan() to the elements up to one before the end of the vector, I prevent the last element of the vector from being tested for NaN, so I am preventing that last element from being deleted. This has the effect of preserving one NaN from being deleted.
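The short-logical-index trick can be seen step by step in a small sketch. The input values are only an example:

```matlab
y = unique([5 2 NaN 5 NaN NaN]);   % [2 5 NaN NaN NaN]

mask = isnan(y(1:end-1));          % tested on all but the last element: [0 0 1 1]
                                   % mask has 4 entries, y has 5

y(mask) = [];                      % the missing 5th mask entry counts as false,
                                   % so the last NaN is kept
% y is now [2 5 NaN]
```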
Kye Taylor on 3 Jul 2012
That is cool.

James Tursa on 2 Jul 2012
Edited: James Tursa on 3 Jul 2012
Assuming the input A is double class and all the NaN values have the same underlying bit pattern (which seems to be true of the MATLAB functions):
C = typecast(unique(typecast(A,'uint64')),'double');
If you are working with single class variables then:
C = typecast(unique(typecast(A,'uint32')),'single');
The above code has two extra data copies involved. If you don't want to absorb the time/resource penalty of these data copies, you can use my TYPECASTX function from the FEX which returns a shared data copy of the input:
C = typecastx(unique(typecastx(A,'uint64')),'double');
If you are working with single class variables then:
C = typecastx(unique(typecastx(A,'uint32')),'single');
TYPECASTX can be found here:
---------------------------------
WARNING -- WARNING:
---------------------------------
The above code should not be used because the UNIQUE function does not work with uint64 and int64 class inputs. I am leaving my post here for reference, but do not use the above code. See the discussion in the comments below.

Walter Roberson on 3 Jul 2012
Aggh, that would definitely be a bug in unique!
James Tursa on 3 Jul 2012
FOLLOW-UP:
After stepping through the UNIQUE code, it turns out the result differences are not because of changes in the UNIQUE code itself, but in the underlying double -- uint64 conversions behind the scenes. That is, both the R2010a (and prior) versions and R2010b (and later) versions of UNIQUE do the conversions to/from double inside the code without regard to input class (which seems to be a bug to me in both cases). But the conversion code itself has apparently changed. I get different answers with the following:
format hex
-inf
typecast(ans,'uint64')
double(ans)
uint64(ans)
On R2010a and earlier, a nearby number (likely the result of some rounding scheme in the background) is produced, not the original number. On R2010b and later, the original number is reproduced. It may be that if I had used a different number to start with, the R2010a and earlier versions would reproduce it and the R2010b and later would not. I don't know. I haven't had the time to fully test this out yet, and don't know the extent of the uint64 and int64 arithmetic/conversion changes that were made.
This raises the question, however, of how many other MATLAB functions have conversions to/from double in the background that would render them buggy when used with uint64 or int64 class data.
The above was done on a 32-bit WinXP machine.
James Tursa on 3 Jul 2012
FOLLOW-UP #2:
As I suspected, it wasn't hard to come up with uint64 numbers that did not work for R2010b and later. Bottom line is UNIQUE is buggy for uint64 and int64 class inputs in all versions of MATLAB as far as I can tell because of the underlying silent conversion to / from double.
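Why a silent round trip through double is lossy can be illustrated without calling UNIQUE at all. The sketch below assumes a release with native uint64 arithmetic and comparison (R2010b or later) and only shows the conversion itself, not what any particular version of UNIQUE returns:

```matlab
a = intmax('uint64');                          % largest uint64 value
b = a - 1;                                     % a different uint64 value

distinctAsInt    = (a ~= b);                   % true: the integers differ
sameAfterConvert = (double(a) == double(b));   % true: both round to 2^64 as doubles

% A double has a 53-bit significand, so integers above 2^53 are not all
% exactly representable. The bit pattern of a NaN, read as uint64, is far
% above that limit.
nanBits   = typecast(NaN, 'uint64');
aboveSafe = nanBits > bitshift(uint64(1), 53); % true for any NaN bit pattern
```

Any code that converts such values to double before comparing them can therefore merge values that are distinct as 64-bit integers.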

Sean de Wolski on 2 Jul 2012
Replace the NaNs with an obscure number, after first checking that it is not already present in the data. This will give you the full functionality of unique:
function [u, ia, ic] = nanunique(varargin)
    x = varargin{1};
    t = rand;
    while any(x(:)==t)
        t = rand;
    end
    x(isnan(x)) = t;
    [u, ia, ic] = unique(x,varargin{2:end});
    u(u==t)=nan;
end
Then call it with something like:
nanunique([5 5 2 7 nan nan 5])
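The same steps written inline for that example input show one thing worth knowing: the single NaN lands wherever the placeholder sorts, not necessarily at the end. This is a sketch of the idea with the function body unrolled, not a replacement for the function:

```matlab
x = [5 5 2 7 NaN NaN 5];
orig = x;

t = rand;                        % placeholder strictly between 0 and 1
while any(x(:) == t)             % make sure it does not collide with real data
    t = rand;
end

x(isnan(x)) = t;                 % all NaNs become the same ordinary number
[u, ia, ic] = unique(x);         % unique now collapses them into one entry
u(u == t) = NaN;                 % turn the placeholder back into NaN

% Here every real value is >= 2, so the placeholder sorts first:
% u is [NaN 2 5 7]

rebuilt = u(ic(:).');            % the index output still reconstructs the input
```

If the NaN should come last, as with plain unique, it can be moved there afterwards, at the cost of having to adjust `ia` and `ic` to match.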