Detection and Correction of outliers

So we have a table of data in Excel (file). We need to screen the data for outliers and replace these outliers with the mean +/- 3 standard deviations + some value.

We first of all need to calculate each column’s mean and +/- 3 standard deviations. We can use these formula for column A:

=AVERAGE(C8:C21)+3*STDEV(C8:C21)

=AVERAGE(C9:C22)-3*STDEV(C9:C22)

The next step is to copy and paste your table but get rid of the data values:

Let’s add a function into cell A-SJ1, and copy-paste this function so it fills all of the cells.

=IF(C8>C$23,C$23,IF(C8<C$24,C$24,C8))

What this function does is look at the original table and asks if that column is bigger than the mean+3stdev value and if so, use mean+3stdev instead of the original value. If not, check to see if the value is smaller than mean-3stdev instead of the original value and if so, replace with mean-3stdev. If not, use the original value.

The real magic in this formula though is the use of the $ symbol. This ‘locks’ the value it proceeds when the formula is copied and pasted. Because of this we can individually test each data point in each column and update as appropriate.

In our original dataset there were no outliers. I reduced our stdev threshold to 1.8 stdevs. I also modified the formula in A-SJ1 to this:

=IF(C8>C$23,”E”&C$23,IF(C8<C$24,”E”&C$24,C8))

This suffixes “E” to any value that is outlying:

With this technique we can also screen for outliers that occur at a participant level:

=IF(C8>MIN(C$23,$K8),”E”&MIN(C$23,$K8),IF(C8<MAX(C$24,$L8),”E”&MAX(C$24,$L8),C8))

If you want to calculate the % of outliers you can now use this formula:

=COUNTIF(O8:U21,”E*”)/COUNTA(O8:U21)*100

we have 4.08% outlying here when using 1.8 stdev as our threshold.

here’s my excel file.