One may be faced with the problem of making a definite decision about an uncertain hypothesis that is known only through its observable consequences. A **statistical hypothesis test**, or more briefly, *hypothesis test*, is an algorithm for choosing the alternative (for or against the hypothesis) that minimizes certain risks.
This article describes the commonly used frequentist treatment of hypothesis testing. From the Bayesian point of view, it is appropriate to treat hypothesis testing as a special case of normative decision theory (specifically a model selection problem), and it is possible to accumulate evidence in favor of (or against) a hypothesis using likelihood ratios known as Bayes factors.
Several preparations are made before the data are observed.

- The hypothesis must be stated in mathematical/statistical terms that make it possible to calculate the probability of possible samples assuming the hypothesis is correct. For example: *The mean response to the treatment being tested is equal to the mean response to the placebo in the control group. Both responses have the normal distribution with this unknown mean and the same known standard deviation ... (value).*
- A test statistic must be chosen that will summarize the information in the sample that is relevant to the hypothesis. Ideally it is a sufficient statistic. In the example given above, it might be the numerical difference between the two sample means, $m_1 - m_2$.
- The distribution of the test statistic is used to calculate the probability of sets of possible values (usually an interval or a union of intervals). In this example, the difference between sample means would have a normal distribution with a standard deviation equal to the **common standard deviation** times the factor $\sqrt{1/n_1 + 1/n_2}$, where $n_1$ and $n_2$ are the sample sizes.
- Among all the sets of possible values, we must choose one that we think represents the most extreme evidence **against** the hypothesis. That set is called the **critical region** of the test statistic. The probability of the test statistic falling in the critical region when the hypothesis is correct is called the **alpha** value (or **size**) of the test.
- The probability that a sample falls in the critical region when the parameter is $\theta$, where $\theta$ is a value allowed by the alternative hypothesis, is called the **power** of the test at $\theta$. The **power function** of a critical region is the function that maps $\theta$ to the power at $\theta$.

After the data are available, the test statistic is calculated and we determine whether it is inside the critical region.
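The quantities in the list above can be sketched numerically for the treatment/placebo example. This is only an illustration under stated assumptions: the critical region is taken to be two-sided, the test statistic is the standardized difference of sample means, and the function names and the numbers in the usage lines are hypothetical rather than taken from the article.

```python
from math import sqrt
from statistics import NormalDist


def standard_error(sigma: float, n1: int, n2: int) -> float:
    """Standard deviation of m1 - m2: sigma * sqrt(1/n1 + 1/n2)."""
    return sigma * sqrt(1 / n1 + 1 / n2)


def critical_value(alpha: float) -> float:
    """Cutoff c with P(|Z| > c) = alpha when the hypothesis is correct.

    Assumes a two-sided critical region {|z| > c}.
    """
    return NormalDist().inv_cdf(1 - alpha / 2)


def power(theta: float, sigma: float, n1: int, n2: int, alpha: float) -> float:
    """Probability that z = (m1 - m2) / se falls in the critical region
    when the true difference between the means is theta."""
    c = critical_value(alpha)
    shift = theta / standard_error(sigma, n1, n2)
    z = NormalDist(mu=shift, sigma=1.0)  # distribution of z when the difference is theta
    return z.cdf(-c) + (1 - z.cdf(c))


# Illustrative values only: sigma = 1, 50 subjects per group, alpha = 0.05.
for theta in (0.0, 0.2, 0.5):
    print(theta, round(power(theta, 1.0, 50, 50, 0.05), 3))
```

At $\theta = 0$ the power function returns the size of the test, and it grows as $\theta$ moves away from zero.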
If the test statistic is inside the critical region, then our conclusion is one of the following:

- The hypothesis is incorrect; therefore, reject the null hypothesis. (For this reason the critical region is sometimes called the **rejection region**, while its complement is the **acceptance region**.)
- An event of probability less than or equal to *alpha* has occurred.

The researcher has to choose between these logical alternatives. In the example we would say: the observed response to treatment is statistically significant.
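A minimal sketch of this check for the running example, assuming a two-sided critical region and a known common standard deviation. The data, the value of `sigma`, and the function names are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, fmean


def two_sample_z(treatment, placebo, sigma):
    """Difference of sample means divided by sigma * sqrt(1/n1 + 1/n2)."""
    se = sigma * sqrt(1 / len(treatment) + 1 / len(placebo))
    return (fmean(treatment) - fmean(placebo)) / se


def in_critical_region(z, alpha=0.05):
    """True when |z| exceeds the two-sided cutoff for the given alpha."""
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)


# Hypothetical responses; sigma is assumed known, as in the example hypothesis.
treatment = [5.1, 6.3, 5.8, 6.9, 6.1, 5.7]
placebo = [4.2, 5.0, 4.8, 5.5, 4.4, 4.9]

z = two_sample_z(treatment, placebo, sigma=0.6)
if in_critical_region(z):
    print("z is inside the critical region: statistically significant")
else:
    print("z is outside the critical region")
```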
If the test statistic is outside the critical region, the only conclusion is that *there is not enough evidence to reject the hypothesis.* This is **not** the same as evidence in favor of the hypothesis. Such evidence cannot be obtained from these arguments, since lack of evidence against a hypothesis is not evidence for it. On this basis, statistical research progresses by eliminating error, not by *finding the truth*.

## Common test statistics
| Name | Formula | Assumptions |
| --- | --- | --- |
| One-sample z-test | $z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$ | (Normal distribution **or** $n \ge 30$) **and** $\sigma$ known |
| Two-sample z-test | $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$ | Normal distribution **and** independent observations **and** ($\sigma_1$ **and** $\sigma_2$ known) |
| One-sample t-test | $t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$, $df = n - 1$ | (Normal population **or** $n > 30$) **and** $\sigma$ unknown |
| Two-sample pooled t-test | $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - d_0}{s_p \sqrt{1/n_1 + 1/n_2}}$, $df = n_1 + n_2 - 2$ | (Normal populations **or** $n_1 + n_2 > 40$) **and** independent observations **and** $\sigma_1 = \sigma_2$ **and** ($\sigma_1$ **and** $\sigma_2$ unknown) |
| Two-sample unpooled t-test | $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$, $df = \dfrac{(s_1^2/n_1 + s_2^2/n_2)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$ **or** $df = \min\{n_1, n_2\} - 1$ | (Normal populations **or** $n_1 + n_2 > 40$) **and** independent observations **and** $\sigma_1 \ne \sigma_2$ **and** ($\sigma_1$ **and** $\sigma_2$ unknown) |
| Paired t-test | $t = \dfrac{\bar{d} - d_0}{s_d / \sqrt{n}}$, $df = n - 1$ | (Normal population of differences **or** $n > 30$) **and** $\sigma$ unknown |
| One-proportion z-test | $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}}$ | $np > 10$ **and** $n(1 - p) > 10$ |
| Two-proportion z-test, equal variances | $z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})(1/n_1 + 1/n_2)}}$ | $n_1 p_1 > 5$ **and** $n_1(1 - p_1) > 5$ **and** $n_2 p_2 > 5$ **and** $n_2(1 - p_2) > 5$ **and** independent observations |
| Two-proportion z-test, unequal variances | $z = \dfrac{(\hat{p}_1 - \hat{p}_2) - d_0}{\sqrt{\hat{p}_1(1 - \hat{p}_1)/n_1 + \hat{p}_2(1 - \hat{p}_2)/n_2}}$ | $n_1 p_1 > 5$ **and** $n_1(1 - p_1) > 5$ **and** $n_2 p_2 > 5$ **and** $n_2(1 - p_2) > 5$ **and** independent observations |

Here $\bar{x}$ denotes a sample mean, $s$ a sample standard deviation, $\bar{d}$ and $s_d$ the mean and standard deviation of the paired differences, and $\hat{p}$ a sample proportion. $\mu_0$ and $p_0$ are the hypothesized values and $d_0$ is the hypothesized difference. The pooled variance is $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$, and the pooled proportion is $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$, where $x_1$ and $x_2$ are the numbers of successes in the two samples.

In the one-sample z-test, $z$ is the distance from the mean in standard deviations. It is possible to calculate a minimum proportion of a population that falls within a given number of standard deviations of the mean (see Chebyshev's inequality).

Other tests, including the Wald test and the likelihood ratio test, have their own test statistics.
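As a rough illustration of the t-tests named in the table, the sketch below computes each statistic and its degrees of freedom using the standard textbook forms. The function names and the `d0` parameter (a hypothesized difference, zero by default) are illustrative assumptions. Turning a statistic into a decision still requires a critical value from Student's t-distribution, which the Python standard library does not provide.

```python
from math import sqrt
from statistics import fmean, variance


def one_sample_t(x, mu0):
    """t = (mean - mu0) / (s / sqrt(n)), with df = n - 1."""
    n = len(x)
    return (fmean(x) - mu0) / sqrt(variance(x) / n), n - 1


def pooled_t(x1, x2, d0=0.0):
    """Two-sample t with a pooled variance estimate, df = n1 + n2 - 2."""
    n1, n2 = len(x1), len(x2)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / df
    t = (fmean(x1) - fmean(x2) - d0) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df


def unpooled_t(x1, x2, d0=0.0):
    """Two-sample t without pooling; df from the Welch-Satterthwaite formula."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = variance(x1) / n1, variance(x2) / n2
    t = (fmean(x1) - fmean(x2) - d0) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def paired_t(before, after, d0=0.0):
    """One-sample t applied to the pairwise differences."""
    diffs = [a - b for a, b in zip(after, before)]
    return one_sample_t(diffs, d0)
```

With equal sample sizes and equal sample variances, `pooled_t` and `unpooled_t` return the same statistic and the same degrees of freedom. They diverge as the variances or sample sizes become unbalanced.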
## Criticism

Some statisticians have commented that pure "significance testing" has the rather strange goal of detecting the existence of a "real" difference between two populations. In practice a difference can almost always be found given a large enough sample; the more relevant goal of science is typically to determine the size of a causal effect. The amount and nature of the difference, in other words, is what should be studied. Many researchers also feel that hypothesis testing is something of a misnomer: in practice, a single statistical test in a single study rarely "proves" anything.

"A little thought reveals a fact widely understood among statisticians: The null hypothesis, taken literally (and that's the only way you can take it in formal hypothesis testing), is always false in the real world.... If it is false, even to a tiny degree, it must be the case that a large enough sample will produce a significant result and lead to its rejection. So if the null hypothesis is always false, what's the big deal about rejecting it?" (Cohen 1990)

(The above criticism applies only to point hypothesis tests. If one were testing, for example, whether a parameter is greater than zero, it would not apply.)

"... surely, God loves the .06 nearly as much as the .05." (Rosnow and Rosenthal 1989)

"How has the virtually barren technique of hypothesis testing come to assume such importance in the process by which we arrive at our conclusions from our data?" (Loftus 1991)

"Despite the stranglehold that hypothesis testing has on experimental psychology, I find it difficult to imagine a less insightful means of transiting from data to conclusions." (Loftus 1991)

Even when the null hypothesis is rejected, effect sizes should be taken into consideration. If the effect is statistically significant but the effect size is very small, then it is a stretch to consider the effect theoretically important.
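The point about sample size can be made concrete with a small calculation. The sketch below assumes a two-sample z-test with equal group sizes and a known common standard deviation, and it measures the effect as the mean difference in units of that standard deviation. This standardized measure, the 0.01 value, and the function names are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_for_effect(d, n_per_group):
    """z statistic when the observed mean difference equals d standard
    deviations and each group has n_per_group observations."""
    return d / sqrt(2 / n_per_group)


def is_significant(z, alpha=0.05):
    """Two-sided check against the standard normal cutoff."""
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)


d = 0.01  # a very small standardized effect
for n in (100, 10_000, 1_000_000):
    z = z_for_effect(d, n)
    print(f"n = {n:>9}: z = {z:.2f}, significant = {is_significant(z)}")
```

The effect size stays fixed at 0.01 standard deviations throughout. Only the sample size changes, yet the largest sample pushes the statistic well into the critical region.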
## See also

- Multiple comparisons problem
- Omnibus test
- Behrens-Fisher problem
- Falsifiability
- Fisher's method
- Null hypothesis
- p-value
- Statistical significance
- Type I error
- Type II error
## External links