Identifying Statistical Outliers in Your Survey Data

Marketers often undertake large survey projects with little forethought about their approach to data analysis. Compounding this problem is their general lack of interest in cleaning the data they collect.

Data cleaning isn’t really optional. Without it, your quantitative data may be tainted and your decisions may rest on inaccurate information.

Identifying statistical outliers is a key part of data cleaning, and that’s what we’re going to cover here. We’ll discuss how we identify an outlier in relation to the study’s goals and the kind of data collected, and what to do with an outlier once identified (to omit it or leave it in your results).

Data points that lie outside of the trend set by the majority of other values are typically easy to distinguish when the data is represented visually in a graph.

For example, the day you get 139 trial signups on your marketing site when the daily median is closer to 60 would be an obvious outlier, right?

Well, maybe.

But it’s tough to say without doing a little simple math first. [Notice that the example quotes a median of about 60 rather than an average; this is because an average can be distorted by an outlier, and heavily so if the sample is small.]

How to Calculate the Median

Start by taking your sample and ordering the observations from lowest to highest. As an example, we’ll stick with the trial signup hypothetical. In this case, we have a sample of 13 days and the signups from those days. After being re-arranged from smallest to largest, they look like this (the day numbers are positions in the sorted list, not calendar order):

Day 1: 32
Day 2: 45
Day 3: 49
Day 4: 52
Day 5: 59
Day 6: 62
Day 7: 63 <-median
Day 8: 67
Day 9: 68
Day 10: 71
Day 11: 72
Day 12: 74
Day 13: 139

The median in this data set is Day 7, with a value of 63 trial signups. If you happen to have an even number of observations, the median is the average of the two middle values. Now that we have the median for this sample, we’ll assign 63 to the variable Q2, which sits between Q1 and Q3, the lower and upper quartiles.

Q2 = 63
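
The median step can be sketched in Python as follows. This is a minimal illustration, not part of the original method description: the list name signups and the helper middle_value are hypothetical names chosen for this example, and the result is cross-checked against the standard library’s statistics.median.

```python
from statistics import median

# The 13 daily signup counts from the example (order does not matter here).
signups = [32, 45, 49, 52, 59, 62, 63, 67, 68, 71, 72, 74, 139]

def middle_value(values):
    """Median: the middle observation, or the mean of the two middle ones."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the pair

q2 = middle_value(signups)
print(q2)                          # 63
print(middle_value(signups[:-1]))  # 62.5 (even count, after dropping the 139)
print(q2 == median(signups))       # True
```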

Calculate the Lower Quartile

Similar to the median (Q2), the lower quartile (Q1) is the middle observation of the lower half of the sample. With an even number of days (6) below the median, there is no single middle observation, so we average the two middle ones, days 3 and 4 (49 and 52 respectively). That makes our lower quartile (Q1) 50.5.

Q1 = 50.5

Calculate the Upper Quartile

Following the same steps for the six days above the median, days 10 and 11 have to be averaged (71 and 72 respectively). This gives us 71.5 for Q3.

Q3 = 71.5
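
As a rough sketch, the quartile steps above can be written as a small Python function. It assumes the convention used in this article (with an odd sample size, the median itself is left out of both halves); other quartile conventions exist and can give slightly different values. The function name quartiles is a hypothetical helper for this example.

```python
from statistics import median

signups = [32, 45, 49, 52, 59, 62, 63, 67, 68, 71, 72, 74, 139]

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two observations")
    half = len(ordered) // 2
    lower = ordered[:half]    # observations below the median
    upper = ordered[-half:]   # observations above the median
    return median(lower), median(ordered), median(upper)

q1, q2, q3 = quartiles(signups)
print(q1, q2, q3)  # 50.5 63 71.5
```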

Calculate the Interquartile Range

The idea behind the interquartile range is that once you know the distance between Q1 and Q3 (71.5 - 50.5 = 21 in this example), you can quickly identify boundaries known as ‘fences’ to sieve for statistical outliers. Observations that fall outside the inner fence are known as minor statistical outliers, while observations that also fall outside the outer fence are known as major statistical outliers.

Interquartile range: 21

There are two sets of fences: the inner fence and the outer fence. To calculate the inner fence, we multiply the interquartile range by 1.5, then subtract the result from Q1 for the lower bound and add it to Q3 for the upper bound. To calculate the outer fence, we follow the same steps, but multiply by 3.

21 x 1.5 = 31.5
Q1-31.5 = 19, Q3+31.5 = 103
Inner fence = 19 to 103

21 x 3 = 63
Q1-63 = -12.5, Q3+63 = 134.5
Outer fence = -12.5 to 134.5
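
The fence arithmetic above can be reproduced with a few lines of Python. This is only a sketch under the article’s definitions (1.5 and 3 times the interquartile range); the function name fences is a hypothetical helper, and the quartile values are the ones computed earlier.

```python
def fences(q1, q3):
    """Return (iqr, inner_fence, outer_fence) from the lower and upper quartiles."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)  # beyond this: minor outlier
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)      # beyond this: major outlier
    return iqr, inner, outer

iqr, inner, outer = fences(50.5, 71.5)
print(iqr)    # 21.0
print(inner)  # (19.0, 103.0)
print(outer)  # (-12.5, 134.5)
```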

Now that we have our inner and outer fences, we can clearly see that the lowest of our observations, Day 1 with 32 signups, is well within the inner fence, and not considered an outlier. However, at our high end, Day 13 with 139 signups is well outside the inner fence and also outside the outer fence. This makes Day 13 a major outlier.
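
To check every observation rather than only the extremes, the comparison can be sketched like this. The fence values are copied from the calculation above; the function name classify and its labels are hypothetical, and treating a value that sits exactly on a fence as inside it is an assumption of this sketch.

```python
signups = [32, 45, 49, 52, 59, 62, 63, 67, 68, 71, 72, 74, 139]

INNER_FENCE = (19.0, 103.0)
OUTER_FENCE = (-12.5, 134.5)

def classify(value, inner=INNER_FENCE, outer=OUTER_FENCE):
    """Label one observation as 'major', 'minor', or 'none'."""
    if not outer[0] <= value <= outer[1]:
        return "major"   # outside the outer fence
    if not inner[0] <= value <= inner[1]:
        return "minor"   # outside the inner fence only
    return "none"

flagged = {v: classify(v) for v in signups if classify(v) != "none"}
print(flagged)  # {139: 'major'}
```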

You’ve Identified the Statistical Outliers – Now What?

This is where a very objective process begins to take on a more subjective feel. Even though you’ve clearly labeled the observations that are statistical outliers within the data set, whether to omit an observation isn’t a black and white issue, especially considering that omission may be looked at as a form of data tampering.

Things to consider:

  • Was the outlier caused by error (human error, process error, calculation error, etc.)? If an inaccuracy is to blame, omission is generally a good idea. If not, then it may provide valuable insight, and including it may prove important.
  • Will the outlier’s inclusion skew the average? If so, it should probably be removed. If not, removing the outlier may be less crucial to forming an accurate picture.
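
One way to gauge the second question is to compare the average with and without the suspect observation. The sketch below uses the signup sample from this article; the variable names are hypothetical and chosen only for this example.

```python
from statistics import mean, median

signups = [32, 45, 49, 52, 59, 62, 63, 67, 68, 71, 72, 74, 139]
without_outlier = [v for v in signups if v != 139]

# The mean moves by roughly six signups; the median barely moves.
print(f"mean:   {mean(signups):.1f} -> {mean(without_outlier):.1f}")  # mean:   65.6 -> 59.5
print(f"median: {median(signups)} -> {median(without_outlier)}")      # median: 63 -> 62.5
```

Here a single day shifts the average by about six signups while the median shifts by half a signup, which is why the median is the safer reference point in a small sample.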

There are several methods for determining statistical outliers, such as Chauvenet’s criterion and Grubbs’ test. The approach described here is certainly not the only way to identify an outlier, but if you need a simple and fast calculation based on the median and quartiles, the method outlined here will serve you well.
