Deleting NaN from a large array

I have an extremely large array, and I am trying to delete every single NaN, as the trials do not all have the same number of variables. Is there any way to simply read the entirety of A and just delete the NaNs? I have been trying to use isnan, and I keep getting either a deletion of everything or only NaN back.

 Accepted Answer

DGM on 6 Oct 2021
Edited: DGM on 6 Oct 2021
Depends on what "extremely large" means. Let's start by assuming it's not too large for this to work. Furthermore, it depends on what "delete" means.
In this first example, I assume a generic array, and I assume "delete" means "replace with zero".
% build example array with NaNs at random positions
s = [10000 10000];
A = zeros(s);
idx = randi([1 prod(s)],100,1);
A(idx) = NaN;
A(isnan(A)) = 0; % replace NaN with zero
sum(isnan(A),'all') % show that all the NaNs are gone
ans = 0
In this second example, I assume a vector, and I assume "delete" means "remove this element and collapse the vector".
% build example vector with NaNs at random positions
s = [1000000 1];
A = zeros(s);
idx = randi([1 prod(s)],100,1);
A(idx) = NaN;
A = A(~isnan(A)); % remove NaNs
sum(isnan(A),'all') % show that all the NaNs are gone
ans = 0
Note that element removal was only demonstrated for a vector. This is simply because you can't remove single elements from a 2 (or more) dimensional array and collapse the array accordingly. In other words, you can't have "holes" in an array. Only in the 1D case does removing a single element result in an unambiguous means to collapse the array.
If neither of these is close to what you need, or if your array is so large that these can't work, we'll work from there.
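As a possible middle ground for a 2D array, whole rows or columns that contain a NaN can be removed, since that keeps the array rectangular. This is only a minimal sketch, assuming it is acceptable to throw away the valid values that share a row or column with a NaN. The array below is made up for illustration.

```matlab
% Small illustrative array with NaNs in two different rows and columns
A = magic(5);
A(2,3) = NaN;
A(4,1) = NaN;

% Option 1: drop every row that contains at least one NaN
rowHasNaN = any(isnan(A), 2);
B = A(~rowHasNaN, :);          % rows 2 and 4 are removed

% Option 2: drop every column that contains at least one NaN
colHasNaN = any(isnan(A), 1);
C = A(:, ~colHasNaN);          % columns 1 and 3 are removed

% rmmissing (R2016b and later) removes rows with missing values by default
B2 = rmmissing(A);
isequal(B, B2)                 % both approaches give the same rows
```

Whether losing the good values in those rows or columns is tolerable depends entirely on the data.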

5 Comments

Thank you! Unfortunately, by "delete" I meant removing the NaNs completely from each column. That way, when I get to my for loop later, it stops and doesn't start reading the NaNs in the columns. And by extremely large, I mean a 6400x120 double. I tried doing
A=A(:,~all(isnan(A)))
to see if that would remove them from all columns, but it doesn't do anything and the NaNs are still there.
When I try your second example, it gives me A now as 336,720x1 double.
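A likely reason the all(isnan(A)) attempt appears to do nothing: all() flags a column only when every element in it is NaN, so columns that are merely padded with NaNs are kept as they are. A small sketch with a made-up 3x3 array, not the actual data:

```matlab
% Illustrative array: only the third column is entirely NaN
A = [1   4   NaN;
     2   NaN NaN;
     NaN NaN NaN];

allNaN = all(isnan(A));   % true only for columns that are entirely NaN
anyNaN = any(isnan(A));   % true for columns that contain at least one NaN

B = A(:, ~allNaN);        % drops only the all-NaN third column
size(B)                   % the remaining columns still contain NaNs
```

If no column of the 6400x120 array is entirely NaN, the same expression returns A unchanged.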
Let's go a bit further. The reason that the second example gives you a long vector is simply that it vectorizes the result. If NaNs are scattered without any pattern throughout a 2 (or more) dimensional array, that's the only representation that can generally contain the result of removing the NaNs.
Consider the array:
A = reshape(1:20,10,2);
A([5 12 17]) = NaN
A = 10×2
     1    11
     2   NaN
     3    13
     4    14
   NaN    15
     6    16
     7   NaN
     8    18
     9    19
    10    20
Removing the NaNs would result in a non-rectangular array. Such a thing isn't really possible, and even if it were, the correspondence between rows would be lost. Whether that information is important in your case, I don't know.
Applying the second example vectorizes the result as explained:
A = A(~isnan(A)) % remove NaNs
A = 17×1
     1
     2
     3
     4
     6
     7
     8
     9
    10
    11
    13
    14
    15
    16
    18
    19
    20
Note that 17 is no longer integer-divisible by the original number of columns. If by chance each column contains the same number of NaNs and row correspondence is of no concern...
A = reshape(1:20,10,2);
A([5 8 12 17]) = NaN
A = 10×2
     1    11
     2   NaN
     3    13
     4    14
   NaN    15
     6    16
     7   NaN
   NaN    18
     9    19
    10    20
A = reshape(A(~isnan(A)),[],2)
A = 8×2
     1    11
     2    13
     3    14
     4    15
     6    16
     7    18
     9    19
    10    20
Though it's unlikely that such a case applies.
There is also the possibility that NaNs can be filled based on the surrounding data. Again, it depends whether that suits your needs.
Similarly, there is the possibility that NaN removal might be unnecessary if the subsequent processing can work around them.
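Both of these alternatives can be sketched briefly. This assumes a release with fillmissing (R2016b or later) and uses a made-up 4x2 array; whether interpolation makes sense for the actual measurements is a separate question.

```matlab
% Illustrative array with one interior NaN in each column
A = [1   10;
     2   NaN;
     NaN 30;
     4   40];

% Fill the gaps by linear interpolation down each column
F = fillmissing(A, 'linear');

% Or leave the NaNs in place and have the statistics skip them
colMeans = mean(A, 1, 'omitnan');
colMax   = max(A, [], 1, 'omitnan');
```

Note that 'linear' also extrapolates into leading or trailing NaNs by default, which is probably not wanted if the NaNs are only padding at the end of shorter trials.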
Thank you so much for all your help! I think the problem is that I have a function that takes a Google Sheet, reads the data, and puts it in MATLAB. Because not every trial runs for the same length of time, a lot of the cells in the columns are left empty. As a result, when the data got put into MATLAB, every column was made the same length, and everything that was empty after the data stopped became NaN.
I think it is impossible to make the columns remove the NaN, unless there is a way to do this:
Read every single column from top to bottom and as soon as it hits NaN, it deletes everything below that since it is only NaN and then move to the next column.
For reference, this is how I called my function.
if exist('A', 'var') == 1 % Checks to see if the array of variables is already in the workspace
    fprintf('Lets get moving! The data is already here. \n')
else % If the array is NOT in the workspace, it gets the data by calling the function.
    fprintf('Grabbing the data now... Just one moment! \n')
    A = GetGoogleSpreadsheet('1dqtn5aTdOIuhcLbz1coBazoInQ9a2XNJcb4rUf1BdqM'); % Calls the function GetGoogleSpreadsheet
    A = str2double(A); % Converts the cell array into a double array
    A(1,:) = []; % Deletes the first row so only the measurements are left (deletes time, acceleration, position, etc. from 1st row)
end
Part of my project is making sure that the function doesn't read the Google Sheet again if the data is already in the workspace, which is why I have the exist if/else check.
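The column-by-column idea described above (read each column from the top and drop everything from the first NaN down) can't produce a rectangular matrix, but it can be sketched with a cell array that holds one variable-length vector per column. This is only a sketch under the assumption that the NaNs are trailing padding; the small array and the name trials are hypothetical stand-ins, not part of the original code.

```matlab
% Small stand-in for the 6400x120 array: columns padded with trailing NaNs
A = [1 5   8;
     2 6   NaN;
     3 NaN NaN;
     4 NaN NaN];

nCols  = size(A, 2);
trials = cell(1, nCols);                 % hypothetical name: one cell per column
for k = 1:nCols
    col = A(:, k);
    firstNaN = find(isnan(col), 1);      % row index of the first NaN, if any
    if isempty(firstNaN)
        trials{k} = col;                 % no NaN in this column: keep all of it
    else
        trials{k} = col(1:firstNaN-1);   % keep only the rows above the first NaN
    end
end

% A later loop can then run over exactly the valid samples of each column
for k = 1:nCols
    fprintf('Column %d has %d valid samples\n', k, numel(trials{k}));
end
```

Each trials{k} is an ordinary column vector, so a later for loop over 1:numel(trials{k}) stops where that column's data stops.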
@Colton McGarraugh 6400 by 120 is far from large. It's just a small fraction of the size of a typical digital image. If it were 10k by 10k by 8 bytes, then we'd be approaching large.
But I question your original ask. Why do you think you need to "delete" NaNs in the first place? It might not be necessary, depending on what you want to do. For example, many functions like mean() have an 'omitnan' option. Plus, maybe you could just replace each NaN with the median of surrounding values, like I do in my attached salt and pepper noise removal demo. Like DGM said, you can't just remove them, because then you'd have holes or "ragged" edges on the 2-D matrix, neither of which is allowed.
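For the median-replacement idea, one way to sketch it without the attached demo is fillmissing with a moving median. This is not the demo's code, just an illustration on a made-up vector, and the window length of 5 is an arbitrary assumption.

```matlab
% Illustrative column vector with two isolated NaNs
x = [1 2 NaN 4 5 100 NaN 8 9]';

% Replace each NaN with the median of the values in a 5-element window
% centered on it; the non-NaN entries are left untouched
y = fillmissing(x, 'movmedian', 5);

nnz(isnan(y))   % number of NaNs remaining after the fill
```

If a run of NaNs is longer than the window, fillmissing may leave some of them unfilled, so long stretches of trailing padding are better handled by trimming than by filling.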

Release

R2021b
