# Hypergeometric distribution

Properties of the distribution:

- Parameters: $N \in \{1,2,3,\dots\}$, $m \in \{0,1,\dots,N\}$, $n \in \{1,2,\dots,N\}$
- Support: $k \in \{\max\{0,\,n+m-N\},\dots,\min\{m,\,n\}\}$
- Probability mass function: $\frac{\binom{m}{k}\binom{N-m}{n-k}}{\binom{N}{n}}$
- Mean: $\frac{nm}{N}$
- Mode: $\left\lfloor \frac{(n+1)(m+1)}{N+2} \right\rfloor$
- Variance: $\frac{n(m/N)(1-m/N)(N-n)}{N-1}$
- Skewness: $\frac{(N-2m)(N-1)^{1/2}(N-2n)}{[nm(N-m)(N-n)]^{1/2}(N-2)}$
- Excess kurtosis: $\left[\frac{N^2(N-1)}{n(N-2)(N-3)(N-n)}\right]\cdot\left[\frac{N(N+1)-6N(N-n)}{m(N-m)}+\frac{3n(N-n)(N+6)}{N^2}-6\right]$
- Moment-generating function: $\frac{\binom{N-m}{n}}{\binom{N}{n}}\,{}_2F_1(-n,-m;\,N-m-n+1;\,e^{t})$
- Characteristic function: $\frac{\binom{N-m}{n}}{\binom{N}{n}}\,{}_2F_1(-n,-m;\,N-m-n+1;\,e^{it})$

In probability theory and statistics, the hypergeometric distribution is a discrete probability distribution that describes the number of successes in a sequence of n draws from a finite population without replacement.

A typical example is illustrated by this contingency table:

|               | drawn | not drawn     | total |
|---------------|-------|---------------|-------|
| defective     | k     | m − k         | m     |
| non-defective | n − k | N + k − n − m | N − m |
| total         | n     | N − n         | N     |

There is a shipment of N objects in which m are defective. The hypergeometric distribution describes the probability that, in a sample of n distinct objects drawn from the shipment, exactly k objects are defective.

In general, if a random variable X follows the hypergeometric distribution with parameters N, m and n, then the probability of getting exactly k successes is given by

$f(k;N,m,n) = \frac{\binom{m}{k}\binom{N-m}{n-k}}{\binom{N}{n}}$

The probability is positive when k is between max{0, n + m − N} and min{m, n}.

The formula can be understood as follows: There are $\tbinom{N}{n}$ possible samples (without replacement). There are $\tbinom{m}{k}$ ways to obtain k defective objects and there are $\tbinom{N-m}{n-k}$ ways to fill out the rest of the sample with non-defective objects.
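This counting argument translates directly into code. The sketch below uses only the standard library; the function name and signature are illustrative, not taken from any particular package:

```python
from math import comb

def hypergeom_pmf(k: int, N: int, m: int, n: int) -> float:
    """Probability of drawing exactly k defective objects in a sample
    of n, taken without replacement from N objects of which m are
    defective."""
    if k < max(0, n + m - N) or k > min(m, n):
        return 0.0  # k is outside the support
    # comb(m, k) ways to pick the defectives, comb(N - m, n - k) ways
    # to fill the rest of the sample, over comb(N, n) possible samples.
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)

print(hypergeom_pmf(1, 10, 3, 4))  # 3 * 35 / 210 → 0.5
```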

The fact that the sum of the probabilities, as k runs through the range of possible values, is equal to 1, is essentially Vandermonde's identity from combinatorics.
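The identity is easy to check numerically; the parameters below are arbitrary:

```python
from math import comb

# Vandermonde's identity: summing C(m,k) * C(N-m,n-k) over the support
# gives C(N,n), so the hypergeometric probabilities sum to 1.
N, m, n = 50, 5, 10
total = sum(comb(m, k) * comb(N - m, n - k)
            for k in range(max(0, n + m - N), min(m, n) + 1))
assert total == comb(N, n)
```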

## Application and example

The classical application of the hypergeometric distribution is sampling without replacement. Think of an urn with two types of marbles, black ones and white ones. Define drawing a white marble as a success and drawing a black marble as a failure (analogous to the binomial distribution). If the variable N describes the number of all marbles in the urn (see contingency table above) and m describes the number of white marbles (called defective in the example above), then N − m corresponds to the number of black marbles.
Now, assume that there are 5 white and 45 black marbles in the urn. Standing next to the urn, you close your eyes and draw 10 marbles without replacement. What is the probability that you draw exactly 4 white marbles (and, of course, 6 black marbles)?

This problem is summarized by the following contingency table:

|               | drawn              | not drawn                            | total      |
|---------------|--------------------|--------------------------------------|------------|
| white marbles | 4 (k)              | 1 = 5 − 4 (m − k)                    | 5 (m)      |
| black marbles | 6 = 10 − 4 (n − k) | 39 = 50 + 4 − 10 − 5 (N + k − n − m) | 45 (N − m) |
| total         | 10 (n)             | 40 (N − n)                           | 50 (N)     |

The probability of drawing exactly x white marbles can be calculated by the formula

$\Pr(k=x) = f(x;N,m,n) = \frac{\binom{m}{x}\binom{N-m}{n-x}}{\binom{N}{n}}.$

Hence, for x = 4 in this example, we calculate

$\Pr(k=4) = f(4;50,5,10) = \frac{\binom{5}{4}\binom{45}{6}}{\binom{50}{10}} = 0.003964583\dots$

So the probability of drawing exactly 4 white marbles is quite low (approximately 0.004), and the event is very unlikely. If you repeated the random experiment (drawing 10 marbles from the urn of 50 marbles without replacement) 1,000 times, you would expect to obtain such a result only about 4 times.
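The worked example can be reproduced with a few lines of standard-library Python:

```python
from math import comb

# 5 white and 45 black marbles; draw 10 without replacement and
# ask for the probability of exactly 4 white marbles.
N, m, n, k = 50, 5, 10, 4
p = comb(m, k) * comb(N - m, n - k) / comb(N, n)
print(p)  # ≈ 0.003964583
```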

But what about the probability of drawing all 5 white marbles? Intuitively, this is even more unlikely than drawing 4 white marbles. Let us calculate the probability of this extreme event.

The contingency table is as follows:

|               | drawn              | not drawn                            | total      |
|---------------|--------------------|--------------------------------------|------------|
| white marbles | 5 (k)              | 0 = 5 − 5 (m − k)                    | 5 (m)      |
| black marbles | 5 = 10 − 5 (n − k) | 40 = 50 + 5 − 10 − 5 (N + k − n − m) | 45 (N − m) |
| total         | 10 (n)             | 40 (N − n)                           | 50 (N)     |

And we can calculate the probability as follows (notice that the denominator always stays the same):

$\Pr(k=5) = f(5;50,5,10) = \frac{\binom{5}{5}\binom{45}{5}}{\binom{50}{10}} = 0.0001189375\dots$

As expected, the probability of drawing 5 white marbles is much lower still than that of drawing 4.
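This second calculation, and the fact that the probabilities over the whole support sum to 1, can be checked in the same way:

```python
from math import comb

N, m, n = 50, 5, 10

def pmf(k):
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)

print(pmf(5))  # ≈ 0.0001189375, matching the value above
# Sanity check: the probabilities over the whole support sum to 1.
assert abs(sum(pmf(k) for k in range(0, min(m, n) + 1)) - 1.0) < 1e-12
```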

## Symmetries

• f(k;N,m,n) = f(n − k;N,N − m,n)

This symmetry can be intuitively understood if you repaint all the black marbles white and vice versa, so that the black and white marbles simply swap roles.

• f(k;N,m,n) = f(m − k;N,m,N − n)

This symmetry can be intuitively understood as swapping the roles of taken and not taken marbles.

• f(k;N,m,n) = f(k;N,n,m)

This symmetry can be intuitively understood if, instead of drawing marbles, you label the marbles that you would have drawn. Both expressions give the probability that exactly k marbles are white and labeled "drawn".
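All three symmetries can be verified numerically. This sketch reuses a straightforward pmf implementation (names are illustrative):

```python
from math import comb, isclose

def f(k, N, m, n):
    # Hypergeometric pmf, returning 0 outside the support.
    if k < max(0, n + m - N) or k > min(m, n):
        return 0.0
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)

N, m, n = 50, 5, 10
for k in range(0, min(m, n) + 1):
    assert isclose(f(k, N, m, n), f(n - k, N, N - m, n))  # repaint the colors
    assert isclose(f(k, N, m, n), f(m - k, N, m, N - n))  # swap drawn / not drawn
    assert isclose(f(k, N, m, n), f(k, N, n, m))          # swap roles of m and n
```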

## Relationship to Fisher's exact test

The hypergeometric test, which uses the hypergeometric distribution to measure the probability of a result at least as extreme as the one observed, is identical to the corresponding one-tailed version of Fisher's exact test. Conversely, the p-value of a two-sided Fisher's exact test can be calculated as the sum of two appropriate one-tailed hypergeometric tests.
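A one-tailed hypergeometric test p-value is simply the upper tail of the pmf. The function below is an illustrative sketch, not a drop-in replacement for a statistics package:

```python
from math import comb

def hypergeom_test(k, N, m, n):
    """One-tailed p-value: probability of drawing k or more white
    marbles, i.e. the upper tail of the hypergeometric pmf."""
    def pmf(i):
        return comb(m, i) * comb(N - m, n - i) / comb(N, n)
    return sum(pmf(i) for i in range(k, min(m, n) + 1))

# p-value for drawing at least 4 white marbles in the urn example
print(hypergeom_test(4, 50, 5, 10))  # ≈ 0.0040835
```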

## Related distributions

Let X ~ Hypergeometric(m, N, n) and p = m / N.

• If N and m are large compared to n and p is not close to 0 or 1, then $P[X \le x] \approx P[Y \le x]$ where Y has a binomial distribution with parameters n and p.
• If n is large, N and m are large compared to n, and p is not close to 0 or 1, then

$P[X \le x] \approx \Phi\left(\frac{x - np}{\sqrt{np(1-p)}}\right)$

where Φ is the standard normal distribution function.
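A quick numerical comparison of the exact distribution with both approximations, with parameters chosen purely for illustration:

```python
from math import comb, erf, sqrt

def hyper_cdf(x, N, m, n):
    # Exact hypergeometric CDF: sum the pmf up to x.
    return sum(comb(m, k) * comb(N - m, n - k) / comb(N, n)
               for k in range(0, x + 1))

def binom_cdf(x, n, p):
    # Binomial CDF with parameters n and p.
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(0, x + 1))

def normal_cdf(z):
    # Standard normal distribution function via the error function.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Population much larger than the sample: N = 10000, m = 3000, n = 50.
N, m, n = 10000, 3000, 50
p = m / N
x = 15
print(hyper_cdf(x, N, m, n))                            # exact value
print(binom_cdf(x, n, p))                               # binomial approximation
print(normal_cdf((x - n * p) / sqrt(n * p * (1 - p))))  # normal approximation
```

For discrete x, a continuity correction (using x + 0.5 in the normal formula) noticeably improves the normal approximation.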

## Multivariate hypergeometric distribution

Properties of the multivariate distribution:

- Parameters: $c \in \mathbb{N}$, $(m_1,\ldots,m_c) \in \mathbb{N}^c$, $N = \sum_{i=1}^c m_i$, $n \in [0,N]$
- Support: $\left\{\mathbf{k} \in \mathbb{Z}_{0+}^c : \sum_{i=1}^{c} k_i = n\right\}$
- Probability mass function: $\frac{\prod_{i=1}^{c}\binom{m_i}{k_i}}{\binom{N}{n}}$
- Mean: $E(X_i) = \frac{n m_i}{N}$
- Variance: $\operatorname{var}(X_i) = \frac{m_i}{N}\left(1-\frac{m_i}{N}\right)n\,\frac{N-n}{N-1}$
- Covariance: $\operatorname{cov}(X_i,X_j) = -\frac{n m_i m_j}{N^2}\,\frac{N-n}{N-1}$

The model of an urn with black and white marbles can be extended to the case where there are more than two colors of marbles. If there are mi marbles of color i in the urn and you take n marbles at random without replacement, then the number of marbles of each color in the sample (k1,k2,...,kc) has the multivariate hypergeometric distribution.

The properties of this distribution are given in the table above, where c is the number of different colors and $N=\sum_{i=1}^{c} m_i$ is the total number of marbles.
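A sketch of the multivariate pmf using only the standard library (the function name is my own):

```python
from math import comb, prod

def mv_hypergeom_pmf(k, m):
    """Probability of drawing exactly k[i] marbles of color i when
    taking sum(k) marbles without replacement from an urn with m[i]
    marbles of color i."""
    N, n = sum(m), sum(k)
    return prod(comb(mi, ki) for mi, ki in zip(m, k)) / comb(N, n)

# Urn with 5 white, 10 red and 35 black marbles; draw 10 and obtain
# 2 white, 3 red and 5 black.
print(mv_hypergeom_pmf([2, 3, 5], [5, 10, 35]))
# With two colors this reduces to the ordinary hypergeometric pmf:
print(mv_hypergeom_pmf([4, 6], [5, 45]))  # ≈ 0.003964583
```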

