Cody

Problem 42485. Eliminate Outliers Using Interquartile Range

Created by Monica Roberts in Community

Given a vector of data, find the outliers and remove them.

To determine whether data contains an outlier:

  1. Identify the point furthest from the mean of the data.
  2. Determine whether that point is further than 1.5*IQR away from the mean.
  3. If so, that point is an outlier and should be removed from the data, resulting in a new data set.
  4. Repeat steps 1-3 on the new data set until it no longer contains an outlier.

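IQR: The interquartile range is the difference between the median of the upper half and the median of the lower half of the sorted data. When the number of points is odd, the middle value belongs to neither half. See http://www.wikihow.com/Find-the-IQR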
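As a rough illustration of this definition, the sketch below computes the IQR with the median-of-halves method in Python. The function name `iqr` is hypothetical, and skipping the middle value for odd-length data is an assumption inferred from the worked example on this page (54 - 50.5 for five points). It is not the reference solution and may not match other quartile conventions.

```python
from statistics import median

def iqr(values):
    """Interquartile range via the median-of-halves method (assumed convention)."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values to form two halves")
    half = n // 2
    lower = ordered[:half]        # values below the middle
    upper = ordered[n - half:]    # values above the middle; middle value skipped when n is odd
    return median(upper) - median(lower)

print(iqr([53, 55, 51, 50, 60, 52]))  # 55 - 51
print(iqr([53, 55, 51, 50, 52]))      # 54 - 50.5
```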

To find an outlier by hand:

Data: [ 53 55 51 50 60 52 ] we will check for outliers.

Sorted: [ 50 51 52 53 55 60 ] where the mean is 53.5 and 60 is the furthest away (60-53.5 > 53.5-50).

1.5 * IQR = 1.5 * (55-51) = 6

Since 60-53.5 = 6.5 > 6, 60 is an outlier.

New Data: [ 53 55 51 50 52 ] we will check for outliers.

New Data Sorted: [ 50 51 52 53 55 ] where the mean is 52.2 and 55 is the furthest away.

1.5 * IQR = 1.5 * (54-50.5) = 1.5 * 3.5 = 5.25

Since 55-52.2 = 2.8 < 5.25, 55 is NOT an outlier.

Our original data had one outlier, which was 60.

Example:

Input data = [53 55 51 50 60 52]
Output new_data = [53 55 51 50 52]

since 60 is an outlier, it is removed

*Note: An outlier value may appear more than once in a dataset. Do not remove all instances at once; remove only the first instance, then check the new dataset to determine whether that value is still an outlier (see the 5th test in the test suite).*
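The steps and the note above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the reference solution: the name `remove_outliers` is hypothetical, ties for "furthest from the mean" are broken by taking the first such point, and the IQR uses the median-of-halves convention from the worked example.

```python
from statistics import mean, median

def remove_outliers(data):
    """Repeatedly drop the point furthest from the mean while it lies
    more than 1.5 * IQR away from the mean. Original order is preserved."""
    data = list(data)  # work on a copy
    while len(data) >= 2:
        ordered = sorted(data)
        half = len(ordered) // 2
        # Median of upper half minus median of lower half
        # (the middle value is skipped when the length is odd).
        spread = median(ordered[len(ordered) - half:]) - median(ordered[:half])
        centre = mean(data)
        distances = [abs(x - centre) for x in data]
        # index() returns the first match, so only the first instance
        # of a repeated value is considered for removal (assumed tie rule).
        idx = distances.index(max(distances))
        if distances[idx] <= 1.5 * spread:
            break  # furthest point is not an outlier, so none are
        del data[idx]
    return data

print(remove_outliers([53, 55, 51, 50, 60, 52]))  # [53, 55, 51, 50, 52]
```

Because one element is deleted per pass and the mean and IQR are recomputed each time, a repeated value is re-tested against the updated dataset before a second instance is removed.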

Solution Stats

34.25% Correct | 65.75% Incorrect
Last solution submitted on May 25, 2019
