Hypergeometric distribution

Hypergeometric

Parameters: N \in \{1, 2, 3, \dots\},
m \in \{0, 1, \dots, N\},
n \in \{1, 2, \dots, N\}
Support: k \in \{\max(0,\, n + m - N), \dots, \min(m, n)\}
Probability mass function (pmf): \frac{\binom{m}{k} \binom{N-m}{n-k}}{\binom{N}{n}}
Mean: \frac{nm}{N}
Mode: \left\lfloor \frac{(n+1)(m+1)}{N+2} \right\rfloor
Variance: \frac{n (m/N) (1 - m/N) (N - n)}{N - 1}
Skewness: \frac{(N-2m)(N-1)^{1/2}(N-2n)}{[nm(N-m)(N-n)]^{1/2}(N-2)}
Excess kurtosis: \left[\frac{N^2(N-1)}{n(N-2)(N-3)(N-n)}\right] \cdot \left[\frac{N(N+1)-6N(N-n)}{m(N-m)} + \frac{3n(N-n)(N+6)}{N^2} - 6\right]
Moment-generating function (mgf): \frac{\binom{N-m}{n}}{\binom{N}{n}} \, {}_2F_1(-n, -m;\, N-m-n+1;\, e^{t})
Characteristic function: \frac{\binom{N-m}{n}}{\binom{N}{n}} \, {}_2F_1(-n, -m;\, N-m-n+1;\, e^{it})

In probability theory and statistics, the hypergeometric distribution is a discrete probability distribution that describes the number of successes in a sequence of n draws from a finite population without replacement.



A typical example is illustrated by this contingency table:

                 drawn                not drawn                total
defective        k                    m − k                    m
non-defective    n − k                N + k − n − m            N − m
total            n                    N − n                    N

There is a shipment of N objects in which m are defective. The hypergeometric distribution describes the probability that in a sample of n distinct objects drawn from the shipment exactly k objects are defective.


In general, if a random variable X follows the hypergeometric distribution with parameters N, m and n, then the probability of getting exactly k successes is given by

 f(k; N, m, n) = \frac{\binom{m}{k} \binom{N-m}{n-k}}{\binom{N}{n}}

The probability is positive when k lies between
\max(0,\, n + m - N) and \min(m, n).


The formula can be understood as follows: there are \binom{N}{n} possible samples (without replacement), there are \binom{m}{k} ways to obtain k defective objects, and there are \binom{N-m}{n-k} ways to fill out the rest of the sample with non-defective objects.


The fact that the probabilities sum to 1 as k runs through its range of possible values is essentially Vandermonde's identity from combinatorics.
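As a quick numerical sketch (not part of the original article), the pmf and the Vandermonde normalization can be checked in Python; `hypergeom_pmf` is an illustrative helper name, and `math.comb` requires Python 3.8+:

```python
from math import comb

def hypergeom_pmf(k, N, m, n):
    """P(exactly k defective objects in a sample of n drawn without
    replacement from N objects, m of which are defective)."""
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)

# Vandermonde's identity: the pmf sums to 1 over the support.
N, m, n = 50, 5, 10
support = range(max(0, n + m - N), min(m, n) + 1)
total = sum(hypergeom_pmf(k, N, m, n) for k in support)
```

Summing over the support rather than over all k avoids passing negative arguments to `comb`, which would raise an error.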


Application and example

The classical application of the hypergeometric distribution is sampling without replacement. Think of an urn with two types of marbles, black ones and white ones. Define drawing a white marble as a success and drawing a black marble as a failure (analogous to the binomial distribution). If the variable N describes the number of all marbles in the urn (see the contingency table above) and m describes the number of white marbles (called defective in the example above), then N − m corresponds to the number of black marbles.
Now, assume that there are 5 white and 45 black marbles in the urn. Standing next to the urn, you close your eyes and draw 10 marbles without replacement. What is the probability that you draw exactly 4 white marbles (and, of course, 6 black marbles)?


This problem is summarized by the following contingency table:

                 drawn                 not drawn                               total
white marbles    4 (k)                 1 = 5 − 4 (m − k)                       5 (m)
black marbles    6 = 10 − 4 (n − k)    39 = 50 + 4 − 10 − 5 (N + k − n − m)    45 (N − m)
total            10 (n)                40 (N − n)                              50 (N)

The probability of drawing exactly x white marbles can be calculated by the formula

 \Pr(k = x) = f(k; N, m, n) = \frac{\binom{m}{k} \binom{N-m}{n-k}}{\binom{N}{n}}.

Hence, for x = 4 in this example, we calculate

 \Pr(k = 4) = f(4; 50, 5, 10) = \frac{\binom{5}{4} \binom{45}{6}}{\binom{50}{10}} = 0.003964583\dots

So the probability of drawing exactly 4 white marbles is quite low (approximately 0.004), and the event is very unlikely. This means that if you repeated the random experiment (drawing 10 marbles from the urn of 50 without replacement) 1000 times, you would expect to obtain such a result only about 4 times.


But what about the probability of drawing all 5 white marbles? Intuitively, this is even more unlikely than drawing 4 white marbles. Let us calculate the probability of such an extreme event.


The contingency table is as follows:

                 drawn                 not drawn                               total
white marbles    5 (k)                 0 = 5 − 5 (m − k)                       5 (m)
black marbles    5 = 10 − 5 (n − k)    40 = 50 + 5 − 10 − 5 (N + k − n − m)    45 (N − m)
total            10 (n)                40 (N − n)                              50 (N)

And we can calculate the probability as follows (notice that the denominator always stays the same):

 \Pr(k = 5) = f(5; 50, 5, 10) = \frac{\binom{5}{5} \binom{45}{5}}{\binom{50}{10}} = 0.0001189375\dots

As expected, the probability of drawing 5 white marbles is even lower than that of drawing 4 white marbles.
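The two example calculations above can be reproduced with a short Python sketch (the helper function name is illustrative; `math.comb` requires Python 3.8+):

```python
from math import comb

def hypergeom_pmf(k, N, m, n):
    # P(k white marbles in a draw of n from N marbles, m of them white)
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)

p4 = hypergeom_pmf(4, 50, 5, 10)  # exactly 4 white marbles, ~0.004
p5 = hypergeom_pmf(5, 50, 5, 10)  # all 5 white marbles, even rarer
```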


Symmetries

  • f(k; N, m, n) = f(n − k; N, N − m, n)

This symmetry can be understood intuitively by repainting all the black marbles white and vice versa, so that the black and white marbles simply swap roles.

  • f(k; N, m, n) = f(m − k; N, m, N − n)

This symmetry can be intuitively understood as swapping the roles of taken and not taken marbles.

  • f(k;N,m,n) = f(k;N,n,m)

This symmetry can be understood intuitively if, instead of drawing marbles, you label the marbles that you would have drawn. Both expressions give the probability that exactly k marbles are white and labeled "drawn".
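The three symmetries can be verified numerically; a minimal sketch using the parameters of the urn example (N = 50, m = 5, n = 10, k = 4):

```python
from math import comb

def f(k, N, m, n):
    # hypergeometric pmf, as defined above
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)

N, m, n, k = 50, 5, 10, 4
base = f(k, N, m, n)
s1 = f(n - k, N, N - m, n)  # repaint: black and white marbles swap roles
s2 = f(m - k, N, m, N - n)  # swap drawn and not-drawn marbles
s3 = f(k, N, n, m)          # swap the roles of m and n
```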


Relationship to Fisher's exact test

The hypergeometric test, based on the hypergeometric distribution, is identical to the corresponding one-tailed version of Fisher's exact test. Conversely, the p-value of a two-sided Fisher's exact test can be calculated as the sum of two appropriate hypergeometric tests.


Related distributions

Let X ~ Hypergeometric(m, N, n) and p = m / N.

  • If N and m are large compared to n and p is not close to 0 or 1, then \Pr[X \le x] \approx \Pr[Y \le x], where Y has a binomial distribution with parameters n and p.
  • If n is large, N and m are large compared to n, and p is not close to 0 or 1, then

\Pr[X \le x] \approx \Phi\left(\frac{x - np}{\sqrt{np(1-p)}}\right)


where Φ is the standard normal distribution function.
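Both approximations can be compared against the exact distribution numerically. A sketch under assumed parameters N = 10000, m = 1000, n = 100 (so p = 0.1), chosen to satisfy the conditions above; the helper names are illustrative:

```python
from math import comb, erf, sqrt

def hypergeom_cdf(x, N, m, n):
    # exact P[X <= x] by summing the pmf over the support up to x
    lo = max(0, n + m - N)
    return sum(comb(m, k) * comb(N - m, n - k)
               for k in range(lo, x + 1)) / comb(N, n)

def binom_cdf(x, n, p):
    # P[Y <= x] for Y ~ Binomial(n, p)
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def phi(z):
    # standard normal cdf via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

N, m, n = 10000, 1000, 100
p = m / N
x = 13
exact = hypergeom_cdf(x, N, m, n)
b_approx = binom_cdf(x, n, p)
n_approx = phi((x - n * p) / sqrt(n * p * (1 - p)))
```

With these parameters the binomial approximation is very close to the exact cdf; the normal approximation is rougher, as the formula above carries no continuity correction.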


Multivariate hypergeometric distribution

Multivariate Hypergeometric Distribution

Parameters: c \in \mathbb{N},
(m_1, \ldots, m_c) \in \mathbb{N}^c,
N = \sum_{i=1}^c m_i,
n \in \{0, 1, \ldots, N\}
Support: \left\{ \mathbf{k} \in \mathbb{Z}_{\ge 0}^c \,:\, \sum_{i=1}^{c} k_i = n \right\}
Probability mass function (pmf): \frac{\prod_{i=1}^{c} \binom{m_i}{k_i}}{\binom{N}{n}}
Mean: E(X_i) = \frac{n m_i}{N}
Variance: \operatorname{var}(X_i) = \frac{m_i}{N} \left(1 - \frac{m_i}{N}\right) n \, \frac{N-n}{N-1},
\operatorname{cov}(X_i, X_j) = -\frac{n m_i m_j}{N^2} \, \frac{N-n}{N-1}

The model of an urn with black and white marbles can be extended to the case where there are more than two colors of marbles. If there are m_i marbles of color i in the urn and you take n marbles at random without replacement, then the number of marbles of each color in the sample, (k_1, k_2, ..., k_c), has the multivariate hypergeometric distribution.


The properties of this distribution are given in the table above, where c is the number of different colors and N = \sum_{i=1}^{c} m_i is the total number of marbles.
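A sketch of the multivariate pmf, using a hypothetical urn with three colors of marbles (the function name and counts are illustrative):

```python
from math import comb

def multivariate_hypergeom_pmf(k, m):
    """pmf of the multivariate hypergeometric distribution:
    k[i] marbles of color i in the sample, m[i] of color i in the urn."""
    N, n = sum(m), sum(k)
    num = 1
    for ki, mi in zip(k, m):
        num *= comb(mi, ki)  # choose ki of the mi marbles of this color
    return num / comb(N, n)

# Urn with 5 black, 10 white and 15 red marbles; draw 6 without replacement.
m = (5, 10, 15)
p = multivariate_hypergeom_pmf((2, 2, 2), m)  # exactly 2 of each color
```

For c = 2 colors this reduces to the ordinary hypergeometric pmf above, with m = (m, N − m) and k = (k, n − k).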


See also

  • Binomial distribution
  • Fisher's exact test
  • Sampling (statistics)
  • Urn problem

External links

  • Hypergeometric Distribution Calculator

The Wikipedia article included on this page is licensed under the GFDL.