In statistics, a result is **significant** if it is unlikely to have occurred by chance, given that a presumed null hypothesis is true, but is not improbable if the null hypothesis is false. More precisely, in traditional frequentist statistical hypothesis testing, the **significance level** of a test is the maximum probability of wrongly rejecting a *true* null hypothesis (a decision known as a Type I error). The significance level should not be confused with the p-value of a result. The p-value is the probability, assuming the null hypothesis is true, of obtaining a statistic at least as extreme as the one observed, and a result is significant at a given level when its p-value does not exceed that level.

For example, one may choose a significance level of 5% and calculate a *critical value* of a statistic (such as the sample mean) so that, if the null hypothesis is true, the probability of the statistic exceeding that value is 5%. If the observed value of the statistic exceeds the critical value, the result is **significant** "at the 5% level". A smaller significance level requires a more extreme critical value, so under the null hypothesis the statistic is less likely to exceed it. A result that is "significant at the 1% level" is therefore stronger evidence against the null hypothesis than one that is only "significant at the 5% level". However, a test at the 1% level is more likely to commit a Type II error (failing to reject a false null hypothesis) than a test at the 5% level, and so has less statistical power. In designing a hypothesis test, the tester aims to maximize power for a given significance level, but must ultimately recognise that the best achievable outcome is likely to be a balance between significance and power, that is, between the risks of Type I and Type II errors.
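The critical-value procedure and the significance/power trade-off described above can be illustrated with a short Python sketch. It assumes a one-sided (upper-tail) z-test for a mean with a *known* standard deviation `sigma`. The function names and all numbers (`mu0 = 100`, `sigma = 15`, `n = 25`, an observed mean of 105.5, an alternative true mean of 106) are hypothetical and chosen only for illustration. It uses only the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution: mean 0, sd 1


def critical_mean(mu0: float, sigma: float, n: int, alpha: float) -> float:
    """Value the sample mean must exceed to be significant at level alpha.

    Assumes a one-sided upper-tail z-test with known sigma (hypothetical setup).
    """
    se = sigma / sqrt(n)  # standard error of the sample mean
    return mu0 + STD_NORMAL.inv_cdf(1 - alpha) * se


def p_value(xbar: float, mu0: float, sigma: float, n: int) -> float:
    """Probability, under H0, of a sample mean at least as large as xbar."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return 1 - STD_NORMAL.cdf(z)


def power(mu1: float, mu0: float, sigma: float, n: int, alpha: float) -> float:
    """Probability of rejecting H0 when the true mean is mu1.

    This equals 1 - P(Type II error) for that alternative.
    """
    c = critical_mean(mu0, sigma, n, alpha)
    sampling_dist = NormalDist(mu1, sigma / sqrt(n))  # distribution of the mean under mu1
    return 1 - sampling_dist.cdf(c)


if __name__ == "__main__":
    # Hypothetical numbers: H0 says the mean is 100, sigma = 15, n = 25.
    mu0, sigma, n = 100.0, 15.0, 25
    xbar = 105.5  # hypothetical observed sample mean
    for alpha in (0.05, 0.01):
        c = critical_mean(mu0, sigma, n, alpha)
        pw = power(106.0, mu0, sigma, n, alpha)
        print(f"alpha={alpha}: critical mean={c:.2f}, "
              f"significant={xbar > c}, power at mu=106: {pw:.3f}")
    print(f"p-value: {p_value(xbar, mu0, sigma, n):.4f}")
```

With these hypothetical numbers, the observed mean clears the 5% critical value but not the stricter 1% one. Its p-value therefore lies between 0.01 and 0.05. The power against a true mean of 106 is also lower at the 1% level, which mirrors the trade-off between Type I and Type II errors.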