Stats functions for tables containing "nan" and "inf" (mean, mode, std)
3 views (last 30 days)
Show older comments
Hi, I am trying to find the mean and std of a column of a table. Unfortunately the presence of "NAN" and "inf" are giving me an infinite mean and NAN std.
Those data points i would like to be ignored, as i know what they mean with respect to the experiment, and are not relevant to the mean and std i am looking for.
What I have is:
clear all, clc, close all
%T9: Attempt at periodic mechanical mirror tilt (screws were able to be
%tilted, both for x and y by roughly 3pi/2 rads, forward and then back)
%Load data CSV
T9t=readtable('LF_BS_MECHTILT_T9.csv');
T9 = T9t( ~any (isnan(T9t) | isinf(T9t), 2),:)
figure
subplot(211)
plot(T9{:,{'ms'}} , str2double(T9{:,{'X1_um_','Y1_um_'}}))
legend({'x1(um)', 'y1(um)'}, 'location', "best")
title('Test 9: Attempt at periodic mechanical mirror tilt (Delta t from human error)')
xlabel('Time (seconds)');
ylabel('Distance From Centre (microns)');grid on, grid minor;
subplot(212)
plot(T9{:,{'ms'}} , T9{:,{'I1_V_'}})
legend({'Intensity (Volts)'}, 'location', "best")
title('Test 9 (I(V))')
xlabel('Time (seconds)');
ylabel('Intensity (Volts)');grid on, grid minor;
T9NCMX=mean(str2double(T9{1:9590, {'X1_um_'}}), 'omitnan');
T9NCMY=mean(str2double(T9{1:9590, {'Y1_um_'}}), 'omitnan');
T9NCSTDX=std(str2double(T9{1:9590, {'X1_um_'}}), 'omitnan');
T9NCSTDY=std(str2double(T9{1:9590, {'Y1_um_'}}), 'omitnan');
T9CMX=mean(str2double(T9{9590:17869, {'X1_um_'}}), 'omitnan');
T9CMY=mean(str2double(T9{9590:17869, {'Y1_um_'}}), 'omitnan');
T9CSTDX=std(str2double(T9{9590:17869, {'X1_um_'}}), 'omitnan');
T9CSTDY=std(str2double(T9{9590:17869, {'Y1_um_'}}), 'omitnan');
You can probably tell i have already tried quite a lot of different things and have thus gotten myself lost in a long list of incremental problemsolving attempts.
The original problem and solution that lead me to this format before trying to do stats on it can be found here:
The context is likely to be valuable if you are kind enough to help.
The relevant data file is attached, if you require anything else to be able to help then please let me know.
Thank you.
0 Comments
Answers (2)
Walter Roberson
on 29 Jun 2021
filename = 'https://www.mathworks.com/matlabcentral/answers/uploaded_files/668733/LF_BS_MECHTILT_T9.csv';
T9t = readtable(filename);
T9 = rmmissing(T9t);
subset = T9{:,{'X1_um_','Y1_um_'}};
whos subset
So you are already extracting double but you are trying to str2double() that, which is going to give you nan.
Note also the code repair for removing missing data.
4 Comments
Walter Roberson
on 29 Jun 2021
Tested in R2019b.
(The bit about USER is testing to see whether the code is being run in MATLAB Answers, in which case the file is to be read from the URL, and otherwise the file is to be read locally.)
You were getting NaN because of the str2double(), which were not necessary if you did a simple options detection.
You were also getting NaN because you had four lines with infinities, one of which is line 6702 of your file.
67460 inf inf 0.01 0.001 0.000 0.12 4.97 4.97 5.01 5.02
NAME = getenv('USER');
if strcmp(NAME, 'mluser')
filename = 'https://www.mathworks.com/matlabcentral/answers/uploaded_files/668733/LF_BS_MECHTILT_T9.csv';
else
filename = 'LF_BS_MECHTILT_T9.csv';
end
opt = detectImportOptions(filename);
T9t = readtable(filename, opt);
T9t = standardizeMissing(T9t, inf, 'Datavariables', {'X1_um_', 'Y1_um_'});
T9 = rmmissing(T9t);
subplot(2,1,1);
plot(T9{:,{'ms'}} , T9{:,{'X1_um_','Y1_um_'}});
legend({'x1(um)', 'y1(um)'}, 'location', "best");
title('Test 9: Attempt at periodic mechanical mirror tilt (Delta t from human error)');
xlabel('Time (seconds)');
ylabel('Distance From Centre (microns)'); grid on; grid minor;
subplot(2,1,2);
plot(T9{:,{'ms'}} , T9{:,{'I1_V_'}});
legend({'Intensity (Volts)'}, 'location', "best");
title('Test 9 (I(V))')
xlabel('Time (seconds)');
ylabel('Intensity (Volts)');grid on, grid minor;
Nsize = 9590;
last = min(17869, height(T9));
T9NCMX = mean(T9{1:Nsize, {'X1_um_'}}, 'omitnan') %#ok<*NOPTS>
T9NCMY = mean(T9{1:Nsize, {'Y1_um_'}}, 'omitnan')
T9NCSTDX = std(T9{1:Nsize, {'X1_um_'}}, 'omitnan')
T9NCSTDY = std(T9{1:Nsize, {'Y1_um_'}}, 'omitnan')
T9CMX = mean(T9{Nsize+1:last, {'X1_um_'}}, 'omitnan')
T9CMY = mean(T9{Nsize+1:last, {'Y1_um_'}}, 'omitnan')
T9CSTDX = std(T9{Nsize+1:last, {'X1_um_'}}, 'omitnan')
T9CSTDY = std(T9{Nsize+1:last, {'Y1_um_'}}, 'omitnan')
Walter Roberson
on 29 Jun 2021
Also, the file you supplied was shorter than how many you were trying to read... or at least was after removing missing items.
Speaking of which... you use a fixed 9590 to break the data, but since missing data is being removed, you should find probably find some other way. For example,
NAME = getenv('USER');
if strcmp(NAME, 'mluser')
filename = 'https://www.mathworks.com/matlabcentral/answers/uploaded_files/668733/LF_BS_MECHTILT_T9.csv';
else
filename = 'LF_BS_MECHTILT_T9.csv';
end
opt = detectImportOptions(filename);
T9t = readtable(filename, opt);
firstdataline = opt.DataLines(1);
T9t.LineNo = (1:height(T9t)).' + (firstdataline - 1);
T9t = standardizeMissing(T9t, inf, 'Datavariables', {'X1_um_', 'Y1_um_'});
T9 = rmmissing(T9t);
subplot(2,1,1);
plot(T9{:,{'ms'}} , T9{:,{'X1_um_','Y1_um_'}});
legend({'x1(um)', 'y1(um)'}, 'location', "best");
title('Test 9: Attempt at periodic mechanical mirror tilt (Delta t from human error)');
xlabel('Time (seconds)');
ylabel('Distance From Centre (microns)'); grid on; grid minor;
subplot(2,1,2);
plot(T9{:,{'ms'}} , T9{:,{'I1_V_'}});
legend({'Intensity (Volts)'}, 'location', "best");
title('Test 9 (I(V))')
xlabel('Time (seconds)');
ylabel('Intensity (Volts)');grid on, grid minor;
Nend = 9590; Mend = 17869;
Nend = find(T9.LineNo <= Nend, 1, 'last')
T9.LineNo(Nend)
Mend = find(T9.LineNo <= Mend, 1, 'last')
T9.LineNo(Mend)
Nrange = 1 : Nend;
Mrange = Nend+1:Mend;
T9NCMX = mean(T9{Nrange, {'X1_um_'}}, 'omitnan') %#ok<*NOPTS>
T9NCMY = mean(T9{Nrange, {'Y1_um_'}}, 'omitnan')
T9NCSTDX = std(T9{Nrange, {'X1_um_'}}, 'omitnan')
T9NCSTDY = std(T9{Nrange, {'Y1_um_'}}, 'omitnan')
T9CMX = mean(T9{Mrange, {'X1_um_'}}, 'omitnan')
T9CMY = mean(T9{Mrange, {'Y1_um_'}}, 'omitnan')
T9CSTDX = std(T9{Mrange, {'X1_um_'}}, 'omitnan')
T9CSTDY = std(T9{Mrange, {'Y1_um_'}}, 'omitnan')
louis ferreira
on 29 Jun 2021
5 Comments
Walter Roberson
on 30 Jun 2021
Go back to your original code. Right after reading the table, you had
T9 = T9t( ~any (isnan(T9t) | isinf(T9t), 2),:)
which selects only certain rows of T9t, discarding the rest. But it does not leave "holes" when it does that: the rows after the discarded entries "fall down" to fill the holes. So you might know something was line 9450 originally, but it has "fallen down" to lower-numbered slot, and MATLAB does not keep track of where it used to be. The kind of numeric indexing you are using is always relative to what is actually present.
The workaround is to attach row numbers before discarding rows, so you can figure out what row something was originally.
See Also
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!
