Stats functions for tables containing "nan" and "inf" (mean, mode, std)

3 views (last 30 days)
Hi, I am trying to find the mean and std of a column of a table. Unfortunately the presence of "NAN" and "inf" are giving me an infinite mean and NAN std.
Those data points i would like to be ignored, as i know what they mean with respect to the experiment, and are not relevant to the mean and std i am looking for.
What I have is:
clear all, clc, close all
%T9: Attempt at periodic mechanical mirror tilt (screws were able to be
%tilted, both for x and y by roughly 3pi/2 rads, forward and then back)
%Load data CSV
T9t=readtable('LF_BS_MECHTILT_T9.csv');
T9 = T9t( ~any (isnan(T9t) | isinf(T9t), 2),:)
figure
subplot(211)
plot(T9{:,{'ms'}} , str2double(T9{:,{'X1_um_','Y1_um_'}}))
legend({'x1(um)', 'y1(um)'}, 'location', "best")
title('Test 9: Attempt at periodic mechanical mirror tilt (Delta t from human error)')
xlabel('Time (seconds)');
ylabel('Distance From Centre (microns)');grid on, grid minor;
subplot(212)
plot(T9{:,{'ms'}} , T9{:,{'I1_V_'}})
legend({'Intensity (Volts)'}, 'location', "best")
title('Test 9 (I(V))')
xlabel('Time (seconds)');
ylabel('Intensity (Volts)');grid on, grid minor;
T9NCMX=mean(str2double(T9{1:9590, {'X1_um_'}}), 'omitnan');
T9NCMY=mean(str2double(T9{1:9590, {'Y1_um_'}}), 'omitnan');
T9NCSTDX=std(str2double(T9{1:9590, {'X1_um_'}}), 'omitnan');
T9NCSTDY=std(str2double(T9{1:9590, {'Y1_um_'}}), 'omitnan');
T9CMX=mean(str2double(T9{9590:17869, {'X1_um_'}}), 'omitnan');
T9CMY=mean(str2double(T9{9590:17869, {'Y1_um_'}}), 'omitnan');
T9CSTDX=std(str2double(T9{9590:17869, {'X1_um_'}}), 'omitnan');
T9CSTDY=std(str2double(T9{9590:17869, {'Y1_um_'}}), 'omitnan');
You can probably tell i have already tried quite a lot of different things and have thus gotten myself lost in a long list of incremental problemsolving attempts.
The original problem and solution that lead me to this format before trying to do stats on it can be found here:
The context is likely to be valuable if you are kind enough to help.
The relevant data file is attached, if you require anything else to be able to help then please let me know.
Thank you.

Answers (2)

Walter Roberson
Walter Roberson on 29 Jun 2021
filename = 'https://www.mathworks.com/matlabcentral/answers/uploaded_files/668733/LF_BS_MECHTILT_T9.csv';
T9t = readtable(filename);
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
T9 = rmmissing(T9t);
subset = T9{:,{'X1_um_','Y1_um_'}};
whos subset
Name Size Bytes Class Attributes subset 16346x2 261536 double
So you are already extracting double but you are trying to str2double() that, which is going to give you nan.
Note also the code repair for removing missing data.
  4 Comments
Walter Roberson
Walter Roberson on 29 Jun 2021
Tested in R2019b.
(The bit about USER is testing to see whether the code is being run in MATLAB Answers, in which case the file is to be read from the URL, and otherwise the file is to be read locally.)
You were getting NaN because of the str2double(), which were not necessary if you did a simple options detection.
You were also getting NaN because you had four lines with infinities, one of which is line 6702 of your file.
67460 inf inf 0.01 0.001 0.000 0.12 4.97 4.97 5.01 5.02
NAME = getenv('USER');
if strcmp(NAME, 'mluser')
filename = 'https://www.mathworks.com/matlabcentral/answers/uploaded_files/668733/LF_BS_MECHTILT_T9.csv';
else
filename = 'LF_BS_MECHTILT_T9.csv';
end
opt = detectImportOptions(filename);
T9t = readtable(filename, opt);
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
T9t = standardizeMissing(T9t, inf, 'Datavariables', {'X1_um_', 'Y1_um_'});
T9 = rmmissing(T9t);
subplot(2,1,1);
plot(T9{:,{'ms'}} , T9{:,{'X1_um_','Y1_um_'}});
legend({'x1(um)', 'y1(um)'}, 'location', "best");
title('Test 9: Attempt at periodic mechanical mirror tilt (Delta t from human error)');
xlabel('Time (seconds)');
ylabel('Distance From Centre (microns)'); grid on; grid minor;
subplot(2,1,2);
plot(T9{:,{'ms'}} , T9{:,{'I1_V_'}});
legend({'Intensity (Volts)'}, 'location', "best");
title('Test 9 (I(V))')
xlabel('Time (seconds)');
ylabel('Intensity (Volts)');grid on, grid minor;
Nsize = 9590;
last = min(17869, height(T9));
T9NCMX = mean(T9{1:Nsize, {'X1_um_'}}, 'omitnan') %#ok<*NOPTS>
T9NCMX = -153.1838
T9NCMY = mean(T9{1:Nsize, {'Y1_um_'}}, 'omitnan')
T9NCMY = -235.9715
T9NCSTDX = std(T9{1:Nsize, {'X1_um_'}}, 'omitnan')
T9NCSTDX = 374.1832
T9NCSTDY = std(T9{1:Nsize, {'Y1_um_'}}, 'omitnan')
T9NCSTDY = 333.0656
T9CMX = mean(T9{Nsize+1:last, {'X1_um_'}}, 'omitnan')
T9CMX = -13.2349
T9CMY = mean(T9{Nsize+1:last, {'Y1_um_'}}, 'omitnan')
T9CMY = -10.2050
T9CSTDX = std(T9{Nsize+1:last, {'X1_um_'}}, 'omitnan')
T9CSTDX = 84.6210
T9CSTDY = std(T9{Nsize+1:last, {'Y1_um_'}}, 'omitnan')
T9CSTDY = 77.1382
Walter Roberson
Walter Roberson on 29 Jun 2021
Also, the file you supplied was shorter than how many you were trying to read... or at least was after removing missing items.
Speaking of which... you use a fixed 9590 to break the data, but since missing data is being removed, you should find probably find some other way. For example,
NAME = getenv('USER');
if strcmp(NAME, 'mluser')
filename = 'https://www.mathworks.com/matlabcentral/answers/uploaded_files/668733/LF_BS_MECHTILT_T9.csv';
else
filename = 'LF_BS_MECHTILT_T9.csv';
end
opt = detectImportOptions(filename);
T9t = readtable(filename, opt);
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
firstdataline = opt.DataLines(1);
T9t.LineNo = (1:height(T9t)).' + (firstdataline - 1);
T9t = standardizeMissing(T9t, inf, 'Datavariables', {'X1_um_', 'Y1_um_'});
T9 = rmmissing(T9t);
subplot(2,1,1);
plot(T9{:,{'ms'}} , T9{:,{'X1_um_','Y1_um_'}});
legend({'x1(um)', 'y1(um)'}, 'location', "best");
title('Test 9: Attempt at periodic mechanical mirror tilt (Delta t from human error)');
xlabel('Time (seconds)');
ylabel('Distance From Centre (microns)'); grid on; grid minor;
subplot(2,1,2);
plot(T9{:,{'ms'}} , T9{:,{'I1_V_'}});
legend({'Intensity (Volts)'}, 'location', "best");
title('Test 9 (I(V))')
xlabel('Time (seconds)');
ylabel('Intensity (Volts)');grid on, grid minor;
Nend = 9590; Mend = 17869;
Nend = find(T9.LineNo <= Nend, 1, 'last')
Nend = 8884
T9.LineNo(Nend)
ans = 9590
Mend = find(T9.LineNo <= Mend, 1, 'last')
Mend = 16338
T9.LineNo(Mend)
ans = 17869
Nrange = 1 : Nend;
Mrange = Nend+1:Mend;
T9NCMX = mean(T9{Nrange, {'X1_um_'}}, 'omitnan') %#ok<*NOPTS>
T9NCMX = -164.2345
T9NCMY = mean(T9{Nrange, {'Y1_um_'}}, 'omitnan')
T9NCMY = -253.0831
T9NCSTDX = std(T9{Nrange, {'X1_um_'}}, 'omitnan')
T9NCSTDX = 385.8182
T9NCSTDY = std(T9{Nrange, {'Y1_um_'}}, 'omitnan')
T9NCSTDY = 338.2043
T9CMX = mean(T9{Mrange, {'X1_um_'}}, 'omitnan')
T9CMX = -13.3237
T9CMY = mean(T9{Mrange, {'Y1_um_'}}, 'omitnan')
T9CMY = -11.1980
T9CSTDX = std(T9{Mrange, {'X1_um_'}}, 'omitnan')
T9CSTDX = 85.0398
T9CSTDY = std(T9{Mrange, {'Y1_um_'}}, 'omitnan')
T9CSTDY = 83.9901

Sign in to comment.


louis ferreira
louis ferreira on 29 Jun 2021
thank you,
What is "options detection" ?
"You were also getting NaN because you had four lines with infinities, one of which is line 6702 of your file.
67460 inf inf 0.01 0.001 0.000 0.12 4.97 4.97 5.01 5.02"
This i was aware of, telling matlab not count them was the issue.
  5 Comments
Walter Roberson
Walter Roberson on 30 Jun 2021
Go back to your original code. Right after reading the table, you had
T9 = T9t( ~any (isnan(T9t) | isinf(T9t), 2),:)
which selects only certain rows of T9t, discarding the rest. But it does not leave "holes" when it does that: the rows after the discarded entries "fall down" to fill the holes. So you might know something was line 9450 originally, but it has "fallen down" to lower-numbered slot, and MATLAB does not keep track of where it used to be. The kind of numeric indexing you are using is always relative to what is actually present.
The workaround is to attach row numbers before discarding rows, so you can figure out what row something was originally.
louis ferreira
louis ferreira on 30 Jun 2021
Ah, yes you are correct, i was being stupid, that line was the last thing i could think of trying before needing to ask for help, so i had kind of forgotten i had tried it there.
Thank you this has really helped contextualise what i have done, and not understood.

Sign in to comment.

Categories

Find more on MATLAB in Help Center and File Exchange

Products


Release

R2019b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!