FACTOID # 19: Cheap sloppy joes: Looking for reduced-price lunches for schoolchildren? Head for Oklahoma!
 
 Home   Encyclopedia   Statistics   States A-Z   Flags   Maps   FAQ   About 
   
 
WHAT'S NEW
RELATED ARTICLES
People who viewed "Outlier" also viewed:
 

SEARCH ALL

FACTS & STATISTICS    Advanced view

Search encyclopedia, statistics and forums:

 

 

(* = Graphable)

 

 


Encyclopedia > Outlier
Figure 1. Box plot of data from the Michelson-Morley Experiment displaying outliers in the middle column.
Figure 1. Box plot of data from the Michelson-Morley Experiment displaying outliers in the middle column.

In statistics such as stratified samples, an outlier is an observation that is numerically distant from the rest of the data. Statistics derived from data sets that include outliers will often be misleading. For example, if one is calculating the average temperature of 10 objects in a room, and most are between 20 and 25 degrees Celsius, but an oven is at 350 °C, the median of the data may be 23 but the mean temperature will be 55. In this case, the median better reflects the temperature of a randomly sampled object than the mean. Outliers may be indicative of data points 'that belong to a different population than the rest of the sample set. D is Bs exclave, but is not an enclave. ... Polynesian outliers are a number of Polynesian islands which lie in Melanesia and Micronesia. ... Image File history File links This is a lossless scalable vector image. ... Image File history File links This is a lossless scalable vector image. ... The Michelson-Morley experiment, one of the most important and famous experiments in the history of physics, was performed in 1887 by Albert Michelson and Edward Morley at what is now Case Western Reserve University, and is considered by some to be the first strong evidence against the theory of... This article is about the field of statistics. ... A data set (or dataset) is a collection of data, usually presented in tabular form. ... In mathematics, an average or central tendency of a set (list) of data refers to a measure of the middle of the data set. ... The degree Celsius (°C) is a unit of temperature named after the Swedish astronomer Anders Celsius (1701–1744), who first proposed it in 1742. ... This article is about the statistical concept. ... This article is about mathematical mean. ... A sample is that part of a population which is actually observed. ...


In most samplings of data, some data points will be further away from their expected values than what is deemed reasonable. This can be due to systematic error, faults in the theory that generated the expected values, or it can simply be the case that some observations happen to be a long way from the center of the data. Outlier points can therefore indicate faulty data, erroneous procedures, or areas where a certain theory might not be valid. However, a small number of outliers is expected in normal distributions. Italic textSystematic errorsBold text are biases in measurement which lead to measured values being systematically too high or too low. ... The word theory has a number of distinct meanings in different fields of knowledge, depending on their methodologies and the context of discussion. ... The normal distribution, also called the Gaussian distribution, is an important family of continuous probability distributions, applicable in many fields. ...


Estimators not sensitive to outliers are said to be robust. In statistics, an estimator is a function of the observable sample data that is used to estimate an unknown population parameter; an estimate is the result from the actual application of the function to a particular set of data. ... Robust means healthy, strong, durable, and often adaptable, innovative, flexible. ...


Deletion of outlier data is a controversial practice frowned on by many scientists and science instructors; while mathematical criteria provide an objective and quantitative method for data rejection, they do not make the practice more scientifically or methodologically sound, especially in small sets or where a normal distribution cannot be assumed. Rejection of outliers is more acceptable in areas of practice where the underlying model of the process being measured and the usual distribution of measurement error are confidently known. When practiced, rejection of outliers usually is based on some rule such as the quartile rules given below, Chauvenet's Criterion, or Grubbs' Test for Outliers. Chauvenets Criterion is a means of assessing whether one piece of experimental data — an outlier — from a set of observations, is spurious. ...

Contents

Mathematical definitions

Mild outliers

Defining Q1 and Q3 to be first and third quartiles, and IQR to be the interquartile range (Q3Q1), one possible definition of being "far away" in this context is: In descriptive statistics, a quartile is any of the three values which divide the sorted data set into four equal parts, so that each part represents 1/4th of the sample or population. ... In descriptive statistics, the interquartile range (IQR), also called the midspread and middle fifty is the range between the third and first quartiles and is a measure of statistical dispersion. ...

< Q_1 - 1.5cdot mathrm{IQR},

or

> Q_3 + 1.5cdot mathrm{IQR}.

Q1 and Q3 thus determine the so-called inner fences (above), beyond which an observation would be labeled a mild outlier.


Extreme outliers

Extreme outliers are observations that are beyond the outer fences:

< Q_1 - 3cdot mathrm{IQR},

or

> Q_3 + 3cdot mathrm{IQR}.

Occurrence and causes

In the case of normally distributed data, using the above definitions, only about 1 in 150 observations will be a mild outlier, and only about outlier. Because of this, outliers usually demand special attention, since they may indicate problems in sampling or data collection or transcription. The normal distribution, also called the Gaussian distribution, is an important family of continuous probability distributions, applicable in many fields. ... Sampling is that part of statistical practice concerned with the selection of individual observations intended to yield some knowledge about a population of concern, especially for the purposes of statistical inference. ...


Alternatively, an outlier could be the result of a flaw in the assumed theory, calling for further investigation by the researcher.


Non-normal distributions

Even when a normal distribution model is appropriate to the data being analyzed, outliers are expected for large sample sizes and should not automatically be discarded if that is the case. Also, the possibility should be considered that the underlying distribution of the data is not approximately normal, having "fat tails". For instance, when sampling from a Cauchy distribution, the sample variance increases with the sample size, the sample mean fails to converge as the sample size increases, and outliers are expected at far larger rates than for a normal distribution. Outliers play an important role in statistics The Cauchy-Lorentz distribution, named after Augustin Cauchy, is a continuous probability distribution with probability density function where x0 is the location parameter, specifying the location of the peak of the distribution, and γ is the scale parameter which specifies the half-width at half-maximum (HWHM). ...


See also

Robust statistics provides an alternative approach to classical statistical methods. ... In robust statistics, robust regression is a form of regression analysis designed to circumvent the limitations of traditional parametric and non-parametric methods. ... Figure 1. ... In statistics, a Studentized residual, named in honor of William Sealey Gosset, who wrote under the pseudonym Student, is a residual adjusted by dividing it by an estimate of its standard deviation. ...

External links

  • Definition of outlier from MathWorld.
  • [1] Grubbs test described by NIST manual
  • Fahy, Tom (2006). Arcadia Financial: The Impact of Outliers on Absolute Returns in Trend Following Trading Systems. Retrieved on 2006-10-29.
  • how to detect outliers, and how to deal with outliers
Year 2006 (MMVI) was a common year starting on Sunday of the Gregorian calendar. ... is the 302nd day of the year (303rd in leap years) in the Gregorian calendar. ...

  Results from FactBites:
 
Outlier Detection and Frequency Counts with DataFlux Technology (402 words)
Outliers show you the highest and lowest values for a set of data.
However, the report indicates that there are many outliers on both the low end and the high end.
Outlier detection from DataFlux allows you to quickly and easily determine if there are gross inconsistencies in certain data elements, and helps you drill through to the actual records and begin to create a defined process to correct the data.
  More results at FactBites »

 
 

COMMENTARY     


Share your thoughts, questions and commentary here
Your name
Your comments

Want to know more?
Search encyclopedia, statistics and forums:

 


Press Releases |  Feeds | Contact
The Wikipedia article included on this page is licensed under the GFDL.
Images may be subject to relevant owners' copyright.
All other elements are (c) copyright NationMaster.com 2003-5. All Rights Reserved.
Usage implies agreement with terms, 1022, m