Remove boxplot-identified outliers from data

Hi all,
I would like to remove outliers from data held in a table. The table records the progression of a wear parameter that allegedly depends on the wheel diameter range, and I want to use only the non-outlier data at a later stage. My question is not about how to hide outliers in the boxplot output, but about how to remove them from the data itself.
So far I have managed to remove the outliers across the whole range of data (i.e., wheel diameter wear progression from 920 to 845 mm) using rmoutliers. However, this does not account for outliers within smaller diameter ranges (i.e., 5 mm), which is what I intend.
To be more specific, I have used the following code, which might not be perfect but seems to work, to identify outliers in the wear parameter under assessment across the whole wheel diameter range:
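% Remove outliers from the data (quartile rule applied to all of y at once).
[B,TF] = rmoutliers(y,'quartiles');   % TF is true where y is an outlier
J = [x, y, TF];                       % columns: x, y, outlier flag
V2 = array2table(J);                  % table variables are named J1, J2, J3
V3 = V2((V2.J3 <= 0),:);              % rows not flagged as outliers
xx = V3.J1;
yy = V3.J2;
V4 = V2((V2.J3 >= 1),:);              % rows flagged as outliers
xxout = V4.J1;
yyout = V4.J2;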
I then plotted this data against wheel diameter in 5 mm ranges within the wear range limits [920 to 845 mm]. However, because the outliers were removed considering the whole wear range, some outliers are still identified within the individual ranges.
Boxplot without outliers removed:
Boxplot with outliers removed for the whole wheel range (note that some outliers are still being captured):
All in all, I want to remove outliers using the same approach, but applied to the data contained in each 5 mm range (or a range of n mm, as required), as the boxplot function does, with the outliers actually removed from my data so that I can assess it.
Below is an example of the data that I am looking to process in this manner (wear parameter, diameter range min, diameter range max):
Any help would be very much appreciated.
Kind regards,
CB

5 Comments

Please note that none of the outliers marked in your second boxplot are outliers in the first boxplot. Of course not, since you removed them. But the difference between the two boxplots is that the statistical distribution per box is not the same: in the first boxplot, the data distribution in each box is wider than in the second.
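A minimal sketch of this effect, using made-up wear values for two 5 mm bins (yA and yB are hypothetical, not data from this thread): a value that lies inside the quartile fences of the pooled data can still lie outside the fences of its own bin.

```matlab
% Hypothetical wear values for two neighbouring 5 mm diameter bins
yA = [1.0 1.1 0.9 1.2 1.0 1.1 0.9 3.0]';   % bin A, last value looks suspicious
yB = [3.0 3.1 2.9 3.2 3.0 3.1 2.9 3.0]';   % bin B, wear level around 3

% Quartile rule on the pooled data: the fences are wide because the two
% bins sit at different wear levels, so nothing is flagged
globalFlags = isoutlier([yA; yB], 'quartiles');

% Quartile rule on bin A alone: the fences are narrow, so 3.0 is flagged
binAFlags = isoutlier(yA, 'quartiles');

fprintf('Pooled data: %d outliers, bin A alone: %d outlier(s)\n', ...
        nnz(globalFlags), nnz(binAFlags));
```

This is presumably why a single call to rmoutliers on the whole y vector does not remove the points that a per-bin boxplot marks.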
CB on 10 May 2022
Edited: CB on 10 May 2022
Unfortunately, I don't think this is what I am looking for.
The first boxplot includes all data and identifies as outliers the values that lie outside the limits for each chosen 5 mm range.
After that, I removed the outliers for the whole range of data available (all diameters), that is, considering only the "y" variable, using [B,TF] = rmoutliers(y,'quartiles');
What I want is to discretise the data (according to x) so that I can remove the outliers identified in the first boxplot (5 mm ranges).
Hope this clarifies.
Kind regards,
CB
Jonas on 10 May 2022
Edited: Jonas on 10 May 2022
So, if I understand you correctly now, you want to remove the red outliers per box (per group). First, I would find the data groups that the boxplot displays; for this, use findgroups(). Then look into each group (each box) and remove the values that are more than about 2.7 standard deviations (std) away from the group's mean. This figure comes from the Whisker section of the boxplot() documentation, where it is given as the approximate equivalent, for normally distributed data, of the default whisker length of 1.5 times the interquartile range. To apply the cut to each group, use splitapply() together with your data vector and the output of findgroups().
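The per-group step could be sketched as follows. This is only an illustration under stated assumptions: x, y, and binWidth are made-up stand-ins for the diameter, the wear parameter, and the 5 mm bin width; quantile is assumed to be available (in R2021b it ships with the Statistics and Machine Learning Toolbox); and quartile fences at 1.5 times the interquartile range are used in place of the 2.7 std approximation, so that the cut follows the default boxplot whisker rule rather than its normal-distribution equivalent.

```matlab
% Hypothetical stand-in data: three 5 mm bins with eight points each
x = [846:0.5:849.5, 851:0.5:854.5, 856:0.5:859.5]';   % diameters in mm
y = [1.0 1.1 0.9 1.2 1.0 1.1 0.9 3.0, ...
     2.0 2.1 1.9 2.2 2.0 2.1 1.9 2.0, ...
     3.0 3.1 2.9 3.2 3.0 3.1 2.9 0.5]';               % wear parameter

binWidth = 5;                                % assumed bin width in mm
binStart = floor(x/binWidth)*binWidth;       % lower edge of each point's bin
G = findgroups(binStart);                    % group index per observation

% Quartile fences per group (1.5*IQR beyond the quartiles)
iqrOf     = @(v) quantile(v,0.75) - quantile(v,0.25);
lowFence  = @(v) quantile(v,0.25) - 1.5*iqrOf(v);
highFence = @(v) quantile(v,0.75) + 1.5*iqrOf(v);
lo = splitapply(lowFence,  y, G);            % one lower fence per group
hi = splitapply(highFence, y, G);            % one upper fence per group

% Compare every point with the fences of its own group
isOut = (y < lo(G)) | (y > hi(G));

xx    = x(~isOut);   yy    = y(~isOut);      % data kept
xxout = x(isOut);    yyout = y(isOut);       % data flagged as outliers
```

With these made-up values, isOut is true only for the 8th and 24th points, each of which is unusual within its own bin. Swapping the fences for mean(v) - 2.7*std(v) and mean(v) + 2.7*std(v) would give the standard-deviation variant described above.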
If you want to remove outliers of the common distribution of a 10 mm range, you can use the same procedure, but you have to form a new group variable that is coarser than the current one. If you take column 2 or 3, the minimum difference between elements is 5; to generate a new group variable for groups with a range of 10, you can calculate, e.g., floor(col/10)*10, with col being column 2 or 3.
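As a small sketch of the coarser grouping, assuming col holds lower bin edges spaced 5 mm apart from 845 to 915 mm (hypothetical values chosen to match the range mentioned in the question):

```matlab
% Hypothetical lower bin edges in mm, 5 mm apart
col = (845:5:915)';

% Coarser grouping variable: every value is rounded down to a multiple of 10,
% so 845 -> 840, 850 -> 850, 855 -> 850, 860 -> 860, ...
coarse = floor(col/10)*10;
G10 = findgroups(coarse);            % one group index per 10 mm range

% Variant (an assumption, not from the thread): shift by the smallest edge
% so that the 10 mm ranges start at 845 instead of at a multiple of 10
aligned = floor((col - min(col))/10)*10 + min(col);
G10aligned = findgroups(aligned);    % groups 845-855, 855-865, ...
```

Note that with floor(col/10)*10 the first group contains only the 845 mm bin, because the ranges are anchored at multiples of 10; the shifted variant pairs 845 with 850 instead. Either group index can then be passed to splitapply() in place of the 5 mm grouping.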
Hi, thanks,
That worked.
Kind regards,
CB
Great, have a nice day!

Answers (0)

Release: R2021b

Asked: CB on 9 May 2022

Commented: on 13 May 2022
