Conjugate prior

In Bayesian probability theory, a class of prior probability distributions p(θ) is said to be conjugate to a class of likelihood functions p(x|θ) if the resulting posterior distributions p(θ|x) are in the same family as p(θ). For example, the Gaussian family is conjugate to itself (or self-conjugate): if the likelihood function is Gaussian, choosing a Gaussian prior will ensure that the posterior distribution is also Gaussian. The concept, as well as the term "conjugate prior", was introduced by Howard Raiffa and Robert Schlaifer in their work on Bayesian decision theory.[1] A similar concept had been discovered independently by George Alfred Barnard.[2]

Consider the general problem of inferring a distribution for a parameter θ given some datum or data x. From Bayes' theorem, the posterior distribution is calculated from the prior p(θ) and the likelihood function $\theta \mapsto p(x \mid \theta)$ as

$p(\theta \mid x) = \frac{p(x \mid \theta) \, p(\theta)}{\int p(x \mid \theta) \, p(\theta) \, d\theta}.$

Let the likelihood function be considered fixed; the likelihood function is usually well-determined from a statement of the data-generating process. It is clear that different choices of the prior distribution p(θ) may make the integral more or less difficult to calculate, and the product p(x|θ) × p(θ) may take one algebraic form or another. For certain choices of the prior, the posterior has the same algebraic form as the prior (generally with different parameters). Such a choice is a conjugate prior.

A conjugate prior is an algebraic convenience: it gives a closed-form expression for the posterior, where otherwise a difficult numerical integration may be necessary.
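To make the convenience concrete, here is a minimal Python sketch that evaluates the normalising integral in Bayes' theorem twice: once by brute-force numerical integration and once in closed form. It assumes a Bernoulli likelihood for one particular sequence of s successes and f failures together with a beta prior with parameters a and b. The function names (`beta_fn`, `evidence_numeric`, `evidence_closed_form`) and the midpoint rule are illustrative choices, not part of the article.

```python
import math


def beta_fn(a, b):
    """Beta function B(a, b), computed through log-gamma for numerical stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def evidence_numeric(s, f, a, b, steps=20000):
    """Midpoint-rule estimate of the integral of p(x|theta) p(theta) over [0, 1]."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        likelihood = theta**s * (1.0 - theta) ** f
        prior = theta ** (a - 1) * (1.0 - theta) ** (b - 1) / beta_fn(a, b)
        total += likelihood * prior * h
    return total


def evidence_closed_form(s, f, a, b):
    """Same integral in closed form, available because the beta prior is conjugate."""
    return beta_fn(a + s, b + f) / beta_fn(a, b)


if __name__ == "__main__":
    # Hypothetical data: 3 successes, 2 failures, Beta(2, 2) prior.
    print(evidence_numeric(3, 2, 2.0, 2.0))
    print(evidence_closed_form(3, 2, 2.0, 2.0))
```

The two values should agree closely; the closed form costs a handful of gamma-function evaluations, while the numerical version needs thousands of integrand evaluations and would scale poorly to higher-dimensional parameters.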

All members of the exponential family have conjugate priors. See Gelman et al.[3] for a catalog.

Example

For a random variable which is a Bernoulli trial with unknown probability of success q in [0,1], the usual conjugate prior is the beta distribution with

$p(q=x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{\mathrm{B}(\alpha,\beta)}$

where α and β are chosen to reflect any existing belief or information (α = 1 and β = 1 would give a uniform distribution) and Β(α, β) is the beta function acting as a normalising constant.

If we then sample this random variable and get s successes and f failures, we have

$P(s,f \mid q=x) = \binom{s+f}{s} x^s (1-x)^f,$
$p(q=x \mid s,f) = \frac{\binom{s+f}{s} x^{s+\alpha-1}(1-x)^{f+\beta-1} / \mathrm{B}(\alpha,\beta)}{\int_{y=0}^1 \left(\binom{s+f}{s} y^{s+\alpha-1}(1-y)^{f+\beta-1} / \mathrm{B}(\alpha,\beta)\right) dy} = \frac{x^{s+\alpha-1}(1-x)^{f+\beta-1}}{\mathrm{B}(s+\alpha, f+\beta)},$

which is another Beta distribution with a simple change to the parameters. This posterior distribution could then be used as the prior for more samples, with the parameters simply adding each extra piece of information as it comes.
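The update rule above (add successes to α, add failures to β) can be sketched in a few lines of Python. The helper names `update_beta` and `beta_mean` and the sample outcomes are hypothetical; the sketch assumes outcomes are coded as 1 for success and 0 for failure.

```python
def update_beta(alpha, beta, outcomes):
    """Return posterior (alpha, beta) after observing a list of 0/1 outcomes."""
    s = sum(outcomes)            # number of successes
    f = len(outcomes) - s        # number of failures
    return alpha + s, beta + f


def beta_mean(alpha, beta):
    """Mean of a beta distribution with parameters alpha and beta."""
    return alpha / (alpha + beta)


if __name__ == "__main__":
    prior = (1, 1)  # uniform prior on q

    # Update with all the data at once.
    batch = update_beta(*prior, [1, 1, 0, 1, 0, 1, 1])

    # Update in two stages, using the first posterior as the next prior.
    stage1 = update_beta(*prior, [1, 1, 0])
    stage2 = update_beta(*stage1, [1, 0, 1, 1])

    print(batch, stage2, beta_mean(*batch))
```

Both routes end at the same parameters, which illustrates the point about reusing the posterior as the prior for further samples.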

Table of conjugate distributions

Discrete likelihood distributions

Likelihood | Model parameters | Conjugate prior distribution | Prior hyperparameters | Posterior hyperparameters
Bernoulli | p (probability) | Beta | $\alpha, \beta$ | $\alpha + \sum_{i=1}^n x_i,\ \beta + n - \sum_{i=1}^n x_i$
Binomial | p (probability) | Beta | $\alpha, \beta$ | $\alpha + \sum_{i=1}^n x_i,\ \beta + \sum_{i=1}^n (N_i - x_i)$
Poisson | λ (rate) | Gamma | $\alpha, \beta$ | $\alpha + \sum_{i=1}^n x_i,\ \beta + n$
Multinomial | p (probability vector) | Dirichlet | $\vec{\alpha}$ | $\vec{\alpha} + \sum_{i=1}^n \vec{x}^{\,(i)}$
Geometric | p0 (probability) | Beta | $\alpha, \beta$ | $\alpha + n,\ \beta + \sum_{i=1}^n x_i$
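As an illustration of how a row of this table is used, the following Python sketch applies the Poisson row (gamma prior on the rate λ) and checks numerically that the resulting gamma density is proportional to likelihood times prior. It assumes the rate parametrisation of the gamma distribution, in which β is an inverse scale; the function names and the sample counts are made up for the example.

```python
import math


def update_gamma_poisson(alpha, beta, counts):
    """Posterior gamma hyperparameters after observing Poisson counts."""
    return alpha + sum(counts), beta + len(counts)


def gamma_pdf(lam, alpha, beta):
    """Gamma density in the rate parametrisation (assumed here)."""
    log_pdf = (alpha * math.log(beta) - math.lgamma(alpha)
               + (alpha - 1) * math.log(lam) - beta * lam)
    return math.exp(log_pdf)


def poisson_likelihood(lam, counts):
    """Joint Poisson likelihood of independent counts for rate lam."""
    value = 1.0
    for x in counts:
        value *= math.exp(-lam) * lam**x / math.factorial(x)
    return value


def proportionality_ratio(lam, alpha, beta, counts):
    """posterior(lam) / (likelihood(lam) * prior(lam)); constant in lam if conjugate."""
    a_post, b_post = update_gamma_poisson(alpha, beta, counts)
    return gamma_pdf(lam, a_post, b_post) / (
        poisson_likelihood(lam, counts) * gamma_pdf(lam, alpha, beta))


if __name__ == "__main__":
    counts = [3, 5, 4]  # hypothetical observations
    print(update_gamma_poisson(2.0, 1.0, counts))
    print(proportionality_ratio(2.0, 2.0, 1.0, counts),
          proportionality_ratio(5.0, 2.0, 1.0, counts))
```

The ratio printed on the last line should be the same at both values of λ, since the posterior differs from likelihood times prior only by the normalising constant.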


Continuous likelihood distributions

Likelihood | Model parameters | Conjugate prior distribution | Prior hyperparameters | Posterior hyperparameters
Uniform | $U(0,\theta)$ | Pareto | $X_m, k$ | $\max\{X_{(n)}, X_m\},\ k+n$
Exponential | λ (rate) | Gamma | $\alpha, \beta$ | $\alpha + n,\ \beta + \sum_{i=1}^n x_i$
Normal with known variance σ² | μ (mean) | Normal | $\mu_0, \sigma_0^2$ | $\left(\frac{\mu_0}{\sigma_0^2} + \frac{\sum_{i=1}^n x_i}{\sigma^2}\right) \Big/ \left(\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}\right),\ \left(\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}\right)^{-1}$
Normal with known mean μ | σ² (variance) | Scaled inverse chi-square | $\nu, \sigma^2$ | $\nu + n,\ \frac{\nu\sigma^2 + \sum_{i=1}^n (x_i - \mu)^2}{\nu + n}$
Normal with known mean μ | τ (precision) | Gamma | $\alpha, \beta$ | $\alpha + \frac{n}{2},\ \beta + \frac{\sum_{i=1}^n (x_i - \mu)^2}{2}$
Normal | μ and σ² (assuming dependence) | Normal-scaled inverse gamma | $m, u, \nu, \sigma^2$ | $\frac{um + n\bar{x}}{u+n},\ u+n,\ \nu+n,\ \frac{\nu\sigma^2 + (n-1)S^2}{\nu+n} + \frac{n u (\bar{x} - m)^2}{(u+n)(\nu+n)}$, where $\bar{x}$ is the sample mean and S² is the sample variance.
Normal | μ and τ (assuming dependence) | Normal-gamma | $m, u, \alpha, \beta$ | $\frac{um + n\bar{x}}{u+n},\ u+n,\ \alpha + \frac{n}{2},\ \beta + \frac{nS^2}{2} + \frac{n u (\bar{x} - m)^2}{2(n+u)}$, where $\bar{x}$ is the sample mean and S² is the sample variance.
Multivariate normal with known covariance matrix Σ | μ (mean vector) | Multivariate normal | $\mu_0, \Sigma_0$ | $\left(\Sigma_0^{-1} + n\Sigma^{-1}\right)^{-1}\left(\Sigma_0^{-1}\mu_0 + n\Sigma^{-1}\bar{x}\right),\ \left(\Sigma_0^{-1} + n\Sigma^{-1}\right)^{-1}$, where $\bar{x}$ is the sample mean.
Multivariate normal | Σ (variance matrix) | Inverse-Wishart | |
Pareto | k (shape) | Gamma | $\alpha, \beta$ | $\alpha + n,\ \beta + \sum_{i=1}^n \ln\frac{x_i}{x_{\mathrm{m}}}$
Pareto | x_m (location) | Pareto | |
Normal | σ² (variance) | Inverse gamma | $\alpha, \beta$ | $\alpha + \frac{n}{2},\ \beta + \frac{\sum_{i=1}^n (x_i - \mu)^2}{2}$
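The row for a normal likelihood with known variance can be read as "precisions add, and the posterior mean is a precision-weighted average". A minimal Python sketch of that row follows; the function name `update_normal_known_variance` and the sample data are hypothetical, and variances (not standard deviations) are passed throughout.

```python
def update_normal_known_variance(mu0, var0, xs, var):
    """Posterior (mean, variance) for the mean of a normal with known variance.

    mu0, var0: prior mean and prior variance of the unknown mean.
    xs: observed data.
    var: known variance of each observation.
    """
    n = len(xs)
    precision = 1.0 / var0 + n / var          # posterior precision
    post_var = 1.0 / precision
    post_mean = (mu0 / var0 + sum(xs) / var) * post_var
    return post_mean, post_var


if __name__ == "__main__":
    # Hypothetical data: prior N(0, 1), unit observation variance.
    print(update_normal_known_variance(0.0, 1.0, [1.0, 2.0, 3.0], 1.0))
```

With these inputs the posterior precision is 1 + 3 = 4, so the posterior variance is 0.25 and the posterior mean is 6 × 0.25 = 1.5, pulled from the sample mean 2 toward the prior mean 0.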


Notes

1. Howard Raiffa and Robert Schlaifer. Applied Statistical Decision Theory. Division of Research, Graduate School of Business Administration, Harvard University, 1961.
2. Jeff Miller et al. Earliest Known Uses of Some of the Words of Mathematics, "conjugate prior distributions". Electronic document, revision of November 13, 2005, retrieved December 2, 2005.
3. Andrew Gelman, John B. Carlin, Hal S. Stern, and Donald B. Rubin. Bayesian Data Analysis, 2nd edition. CRC Press, 2003. ISBN 1-58488-388-X

The Wikipedia article included on this page is licensed under the GFDL.