FACTOID # 27: If you're itching to live in a trailer park, hitch up your home and head to South Carolina, where a whopping 18% of residences are mobile homes.
 Home   Encyclopedia   Statistics   States A-Z   Flags   Maps   FAQ   About 


FACTS & STATISTICS    Advanced view

Search encyclopedia, statistics and forums:



(* = Graphable)



Encyclopedia > Prior distribution

A prior probability is a marginal probability, interpreted as a description of what is known about a variable in the absence of some evidence. The posterior probability is then the conditional probability of the variable taking the evidence into account. The posterior probability is computed from the prior and the likelihood function via Bayes' theorem.

As prior and posterior are not terms used in frequentist analyses, this article uses the vocabulary of Bayesian probability and Bayesian inference.

Throughout this article, for the sake of brevity the term variable encompasses observable variables, latent (unobserved) variables, parameters, and hypotheses.


Prior probability distribution

In Bayesian statistical inference, a prior probability distribution, often called simply the prior, of an uncertain quantity p (for example, suppose p is the proportion of voters who will vote for John Kerry) is the probability distribution that would express one's uncertainty about p before the "data" (for example, an opinion poll) are taken into account. It is meant to attribute uncertainty rather than randomness to the uncertain quantity.

One applies Bayes' theorem, multiplying the prior by the likelihood function and then normalizing, to get the posterior probability distribution, which is the conditional distribution of the uncertain quantity given the data.

A prior is often the purely subjective assessment of an experienced expert. Some will choose a conjugate prior when they can, to make calculation of the posterior distribution easier.

Informative priors

An informative prior expresses specific, definite information about a variable. An example is a prior distribution for the temperature at noon tomorrow. A reasonable approach is to make the prior a normal distribution with expected value equal to today's noontime temperature, with variance equal to the day-to-day variance of atmospheric temperature.

This example has a property in common with many priors, namely, that the posterior from one problem (today's temperature) becomes the prior for another problem (tomorrow's temperature); pre-existing evidence which has already been taken into account is part of the prior and as more evidence accumulates the prior is largely by the evidence rather than any original assumption, provided that the original assumption admitted the possibility of what the evidence is suggesting. The terms "prior" and "posterior" are generally relative to a specific datum or observation.

Uninformative priors

An uninformative prior expresses vague or general information about a variable. The term "uninformative prior" is a misnomer; such a prior might be called a not very informative prior. Uninformative priors can express information such as "the variable is positive" or "the variable is less than some limit".

The use of an uninformative prior typically yields results which are not too different from conventional statistical analysis, as the likelihood function often yields more information than the uninformative prior.

Some attempts have been made at finding probability distributions in some sense logically required by the nature of one's state of uncertainty; these are a subject of philosophical controversy. For example, Edwin T. Jaynes has published an argument [a reference here would be useful] based on Lie groups that if one is so uncertain about the value of the aforementioned proportion p that one knows only that at least one voter will vote for Kerry and at least one will not, then the conditional probability distribution of p given one's state of ignorance is the uniform distribution on the interval [0, 1].

Philosophical problems associated with uninformative priors are associated with the choice of an appropriate metric, or measurement scale. Suppose we want a prior for the running speed of a runner who is unknown to us. We could specify, say, a normal distribution as the prior for his speed, but alternatively we could specify a normal prior for the time he takes to complete 100 metres, which is proportional to the reciprocal of the first prior. These are very different priors, but it is not clear which is to be preferred. Similarly, if asked to estimate an unknown proportion between 0 and 1, we might say that all proportions are equally likely and use a uniform prior. Alternativley, we might say that all orders of magnitude for the proportion are equally likely, which gives a prior proportional to the logarithm. The Jeffreys prior attempts to solve this problem by computing a prior which expresses the same belief no matter which metric is used.

Improper priors

If Bayes' therorem is written as

then it is clear that it would remain true if all the prior probabilities P(Ai) and P(Aj) were multiplied by a given constant; the same would be true for a continuous random variable. The posterior probabilites will still sum (or integrate) to 1 even if the prior values do not, and so the priors only need be specified in the correct proportion.

Taking this idea further, in many cases the sum or integral of the prior values may not even need to be finite to get sensible answers for the posterior probabilities. When this is the case, the prior is called an improper prior. Some statisticians use improper priors as uninformative priors. For example, if they need a prior distribution for the mean and variance of a random variable, they may assume p(m,v)~1/v (for v>0) which would suggest that any value for the mean is equally likely and that a value for the positive variance becomes less likely in inverse proportion to its value. Since

this would be an improper prior both for the mean and for the variance.


  • Andrew Gelman, John B. Carlin, Hal S. Stern, and Donald B. Rubin. Bayesian Data Analysis, 2nd edition. CRC Press, 2003.

  Results from FactBites:
Prior probability - Wikipedia, the free encyclopedia (1485 words)
As prior and posterior are not terms used in frequentist analyses, this article uses the vocabulary of Bayesian probability and Bayesian inference.
And in the continuous case, the maximum entropy prior given that the density is normalized with mean zero and variance unity is the standard normal distribution.
We could specify, say, a normal distribution as the prior for his speed, but alternatively we could specify a normal prior for the time he takes to complete 100 metres, which is proportional to the reciprocal of the first prior.
Procedures - Posterior Distribution - Overview (435 words)
In this case, the prior distribution is often taken as the observed distribution of scores for the full sample of students, or some subset of the sample of which the individual student is a member.
The three Panels (A, B, and C) reflect the prior distribution (A) (which is the same for both tests), the measurement likelihood that constitutes the sample information (B), and posterior (C) distributions.
Notice that the distributions change depending on the mix of items on the test--the prior distribution is less influential when there is more information from the sample.
  More results at FactBites »



Share your thoughts, questions and commentary here
Your name
Your comments

Want to know more?
Search encyclopedia, statistics and forums:


Press Releases |  Feeds | Contact
The Wikipedia article included on this page is licensed under the GFDL.
Images may be subject to relevant owners' copyright.
All other elements are (c) copyright NationMaster.com 2003-5. All Rights Reserved.
Usage implies agreement with terms, 1022, m