





# List of statistical topics


See also the list of probability topics and the list of statisticians.

Contents: Top - 0–9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

## A

The absolute deviation of an element of a data set is the absolute difference between that element and a given point. ... In classical (frequentist) decision theory, an admissible decision rule is a rule for making a decision that is better in some sense than any other rule that may compete with it. ... The Akaike information criterion (AIC) (pronounced ah-kah-ee-keh), developed by Hirotsugu Akaike in 1971 and proposed in Akaike (1974), is a measure of the goodness of fit of an estimated statistical model. ... Algorithms for calculating variance play a major role in statistical computing. ... The Allan variance, named after David W. Allan, also known as two-sample variance, is a measurement of accuracy in clocks. ... Statistics shows that if you put a large number of random points on a bounded flat surface you can find many alignments of random points. ... These are statistical procedures which can be used to analyse categorical data: regression, analysis of variance, linear modeling, log-linear modeling, logistic regression, repeated measures analysis, simple correspondence analysis, multiple correspondence analysis, contingency tables, Burt tables, binary tables, frequency tables, chi-square statistics, odds ratios, correlation statistics, Fisher's exact... In statistics, analysis of rhythmic variance (ANORVA) is a simple method for detecting rhythms in biological time series, published by Peter Celec (Biol Res. ... In statistics, analysis of variance (ANOVA) is a collection of statistical models and their associated procedures which compare means by splitting the overall observed variance into different parts. ... In statistics, an ancillary statistic is a statistic whose probability distribution does not depend on which of the probability distributions among those being considered is the distribution of the statistical population from which the data were taken. ... 
ANCOVA, or analysis of covariance, is an old-fashioned name for a linear regression model with one continuous explanatory variable and one or more factors. ... ASCA, ANOVA-SCA, or analysis of variance – simultaneous component analysis is a method that partitions variation and enables interpretation of these partitions by SCA, a method that is similar to PCA. This method is a multi- or even megavariate extension of ANOVA. The variation partitioning is similar to Analysis of... An anomaly time series is the time series of deviations of a quantity from some mean. ... Approximate Bayesian computation (ABC) is a family of computational techniques in Bayesian statistics. ... In survival analysis, the area compatibility factor, F, is used in indirect standardisation of population mortality rates. ... In mathematics and statistics, the arithmetic mean (or simply the mean) of a list of numbers is the sum of all the members of the list divided by the number of items in the list. ... Autocorrelation is a mathematical tool used frequently in signal processing for analysing functions or series of values, such as time domain signals. ... In econometrics, an autoregressive conditional heteroskedasticity (ARCH, Engle (1982)) model considers the variance of the current error term to be a function of the variances of the previous time periods' error terms. ... In statistics, an autoregressive integrated moving average (ARIMA) model is a generalisation of an autoregressive moving average (ARMA) model. ... In statistics, autoregressive moving average (ARMA) models, sometimes called Box-Jenkins models after George Box and G. M. Jenkins, are typically applied to time series data. ...
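The AIC mentioned above trades goodness of fit against model complexity via AIC = 2k − 2·ln(L). A minimal sketch of that formula follows; the two "models" and their log-likelihoods are hypothetical numbers chosen only to illustrate the trade-off:

```python
def aic(log_likelihood, k):
    """Akaike information criterion: AIC = 2k - 2*ln(L).

    k is the number of estimated parameters; a lower AIC indicates a
    better trade-off between fit and model complexity.
    """
    return 2 * k - 2 * log_likelihood

# Two hypothetical fitted models: model B fits slightly better (higher
# log-likelihood) but spends two extra parameters, so AIC prefers model A.
aic_a = aic(log_likelihood=-100.0, k=3)   # 2*3 - 2*(-100.0) = 206.0
aic_b = aic(log_likelihood=-99.5, k=5)    # 2*5 - 2*(-99.5)  = 209.0
```

Note that only differences in AIC between candidate models are meaningful, not the absolute values.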

## B

The Balding-Nichols model is a statistical description of the allele frequencies in the components of a sub-divided population. ... Statistics are very important to baseball, perhaps as much as they are for cricket, and more than almost any other sport. ... In statistics, Basu's theorem states that any complete sufficient statistic is independent of any ancillary statistic. ... Bayes' theorem (also known as Bayes' rule or Bayes' law) is a result in probability theory, which relates the conditional and marginal probability distributions of random variables. ... Thomas Bayes (c. ... In statistics, the use of Bayes factors is a Bayesian alternative to classical hypothesis testing. ... Bayesian inference is statistical inference in which evidence or observations are used to update or to newly infer the probability that a hypothesis may be true. ... In statistics, Bayesian linear regression is a Bayesian alternative to the more well-known ordinary least-squares linear regression. ... The posterior probability of a model given data, P(H|D), is given by Bayes' theorem: P(H|D) = P(D|H)P(H)/P(D). The key data-dependent term P(D|H) is a likelihood, and is sometimes called the evidence for model H; evaluating it correctly is the... A Bayesian network (or a belief network) is a probabilistic graphical model that represents a set of variables and their probabilistic independencies. ... Bayesian search theory is the application of Bayesian statistics to the search for lost objects. ... In statistics, the Behrens-Fisher problem is the problem of interval estimation and hypothesis testing concerning the difference between the means of two normally distributed populations when the variances of the two populations are not assumed to be equal, based on two independent samples. ... Belief propagation is an iterative algorithm for computing marginals of functions on a graphical model most commonly used in artificial intelligence and information theory. ... 
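The posterior formula quoted above, P(H|D) = P(D|H)P(H)/P(D), can be sketched numerically. The diagnostic-test numbers below (1% prevalence, 99% sensitivity, 5% false-positive rate) are hypothetical, chosen only to show how a strong test can still yield a modest posterior when the prior is small:

```python
def bayes_posterior(prior, likelihood, marginal):
    """Bayes' theorem: P(H|D) = P(D|H) * P(H) / P(D)."""
    return likelihood * prior / marginal

p_h = 0.01              # P(H): prior probability of the hypothesis (disease)
p_d_given_h = 0.99      # P(D|H): probability of a positive test given disease
p_d_given_not_h = 0.05  # P(D|~H): false-positive rate

# Marginal P(D) by the law of total probability.
p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)

posterior = bayes_posterior(p_h, p_d_given_h, p_d)
# posterior ≈ 0.167: a positive result still leaves the disease unlikely.
```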
In statistics, Bessel's correction, named after Friedrich Bessel, is the use of n − 1 instead of n when estimating variance, where n is the number of observations in a sample. ... In empirical Bayes methods, the Beta-binomial model is an analytic model where the likelihood function is specified by a binomial distribution and the conjugate prior is a Beta distribution. It is convenient to reparameterize the distributions so that the expected mean of the prior is a single parameter: Let... In probability theory and statistics, the beta distribution is a continuous probability distribution with the probability density function (pdf) defined on the interval [0, 1]: where α and β are parameters that must be greater than zero and B is the beta function. ... The Bhattacharya coefficient is an approximate measurement of the amount of overlap between two statistical samples. ... In statistics, the term bias is used for two different concepts. ... A biased sample is one that is falsely taken to be typical of a population from which it is drawn. ... Allan Birnbaum (May 27, 1923 - July 1, 1976) was an American statistician who contributed to statistical inference, foundations of statistics, statistical genetics, statistical psychology, and history of statistics. ... In probability theory, Chebyshev's inequality (also known as Tchebysheff's inequality, Chebyshev's theorem, or the Bienaymé-Chebyshev inequality), named after Pafnuty Chebyshev, who first proved it, states that in any data sample or probability distribution, nearly all the values are close to the mean value, and provides a... Binary classification is the task of classifying the members of a given set of objects into two groups on the basis of whether they have some property or not. ... In probability theory and statistics, the binomial distribution is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p. 
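Bessel's correction, described above, is easy to see side by side: dividing the sum of squared deviations by n − 1 rather than n yields the unbiased sample variance. The data values below are an arbitrary illustrative sample:

```python
def variance(xs, bessel=True):
    """Sample variance: divide by n-1 (Bessel's correction) or by n."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)   # sum of squared deviations
    return ss / (n - 1) if bessel else ss / n

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # mean = 5, ss = 32
biased = variance(data, bessel=False)    # 32 / 8 = 4.0
unbiased = variance(data, bessel=True)   # 32 / 7 ≈ 4.571
```

The corrected estimate is always slightly larger, compensating for the fact that deviations are measured from the sample mean rather than the true mean.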
... In statistics, the binomial test is an exact test of the statistical significance of deviations from a theoretically expected distribution of observations into two categories. ... In combinatorial mathematics, a block design (more fully, a balanced incomplete block design) is a particular kind of set system, which has long-standing applications to experimental design (an area of statistics) as well as purely combinatorial aspects. ... In the statistical theory of the design of experiments, blocking is the arranging of experimental units in groups (blocks) which are similar to one another. ... In statistics, the Bonferroni correction states that if an experimenter is testing n independent hypotheses on a set of data, then the statistical significance level that should be used for each hypothesis separately is 1/n times what it would be if only one hypothesis were tested. ... Bootstrap aggregating (bagging) is a meta-algorithm to improve classification and regression models in terms of stability and classification accuracy. ... In statistics, bootstrapping is a modern, computer-intensive, general-purpose approach to statistical inference, falling within a broader class of resampling methods. ... Data is taken to be either a scalar number, a vector or a matrix. ... In statistics, the Box-Cox transformation of the variable Y given the Box-Cox parameter λ ≥ 0 is defined as Y(λ) = (Y^λ − 1)/λ when λ > 0, and log(Y) when λ = 0. This transformation has proved popular in regression analysis, including econometrics. ... Leo Breiman (January 27, 1928 – July 7, 2005) was a distinguished statistician at the University of California, Berkeley. ... In statistics, the Breusch-Pagan test is used to test for heteroskedasticity in a linear regression model. ... Ladislaus Josephovich Bortkiewicz (August 7, 1868 - July 15, 1931) was a Russian economist and statistician of Polish descent. ... 
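The Bonferroni correction above divides the per-test significance level by the number of hypotheses. A minimal sketch, with a hypothetical set of p-values:

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha / n (Bonferroni correction)."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

# Five hypothetical p-values; the corrected per-test threshold is
# 0.05 / 5 = 0.01, so only the first test survives.
rejections = bonferroni_reject([0.003, 0.02, 0.04, 0.30, 0.75])
# → [True, False, False, False, False]
```

At the uncorrected level 0.05, three of these tests would have been declared significant; the correction controls the chance of any false positive across the family.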
Business statistics is the science of good decision making in the face of uncertainty and is used in many disciplines such as financial analysis, econometrics, auditing, production and operations including services improvement, and marketing research. ...

## D

Clustering is the classification of objects into different groups, or more precisely, the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait - often proximity according to some defined distance measure. ... In statistics, a data point is a single typed measurement. ... A data set (or dataset) is a collection of data, usually presented in tabular form. ... In statistics, data transformation is carried out in order to transform the data and assure that it has a normal distribution (a remedy for outliers, failures of normality, linearity, and homoscedasticity). ... In probability theory, de Finetti's theorem explains why exchangeable observations are conditionally independent given some (usually) unobservable quantity to which an epistemic probability distribution would then be assigned. ... Decision theory is an area of study of discrete mathematics that models human decision-making in science, engineering and indeed all human social activities. ... In statistics, the delta method is a method for deriving an approximate probability distribution for a function of an asymptotically normal statistical estimator from knowledge of the limiting variance of that estimator. ... In statistics, Deming regression, named after W. Edwards Deming, is a method of linear regression that finds a line of best fit for a set of related data. ... Demographics refers to selected population characteristics as used in government, marketing or opinion research, or the demographic profiles used in such research. ... 
Demography is the statistical study of human populations. ... Among the kinds of data that national leaders need are the demographic statistics of their population. ... In probability and statistics, density estimation is the construction of an estimate, based on observed data, of an unobservable underlying probability density function. ... A design matrix is a matrix that is used in certain statistical models, e. ... Descriptive statistics are used to describe the basic features of the data in a study. ... The first statistician to consider a methodology for the design of experiments was Sir Ronald A. Fisher. ... Detection theory, or signal detection theory, is a means to quantify the ability to discern between signal and noise. ... In statistics, deviance is a quantity whose expected values can be used for statistical hypothesis testing. ... The DIC (Deviance Information Criterion) is a hierarchical modeling generalization of the AIC (Akaike Information Criterion). ... In statistics, the Dickey-Fuller test tests whether a unit root is present in an autoregressive model. ... In statistics, dimension reduction is the process of reducing the number of random variables under consideration, and can be divided into feature selection and feature extraction. ... Circular or directional statistics is the subdiscipline of statistics that deals with circular or directional data. ... Discrete choice analysis is a statistical technique. ... In mathematics, a probability distribution assigns to every interval of the real numbers a probability, so that the probability axioms are satisfied. ... In regression analysis, a dummy variable is one that takes the values 0 or 1 to indicate the absence or presence of some categorical effect that may be expected to shift the outcome. ... 
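Density estimation, mentioned above, can be sketched with a hand-rolled kernel density estimator: the estimate at a point is the average of Gaussian "bumps" centred on the observations. The sample and bandwidth below are arbitrary illustrative choices (bandwidth selection is a topic of its own):

```python
import math

def gaussian_kde(sample, x, bandwidth):
    """Kernel density estimate at x:
    f(x) = (1 / (n*h)) * sum_i K((x - x_i) / h), with a Gaussian kernel K."""
    n = len(sample)
    total = 0.0
    for xi in sample:
        u = (x - xi) / bandwidth
        total += math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
    return total / (n * bandwidth)

sample = [-1.0, -0.5, 0.0, 0.5, 1.0]
density = gaussian_kde(sample, x=0.0, bandwidth=0.5)
# The estimate is a genuine pdf: nonnegative and integrating to 1.
```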
In statistics, Duncan's new Multiple Range Test (MRT) is a multiple comparison procedure developed by David B. Duncan in 1955. ... In gambling, a Dutch book or lock is a set of odds and bets which guarantees a profit, no matter what the outcome of the gamble. ...

## E

An ecological correlation is a correlation between two variables that are group means, in contrast to a correlation between two variables that describe individuals. ... The ecological fallacy is a widely recognised error in the interpretation of statistical data, whereby inferences about the nature of individuals are based solely upon aggregate statistics collected for the group to which those individuals belong. ... Econometrics literally means economic measurement. It is the branch of economics that applies statistical methods to the empirical study of economic theories and relationships. ... The Edgeworth series or Gram-Charlier A series, named in honor of Francis Ysidro Edgeworth, are series that approximate a probability distribution in terms of its cumulants. ... In statistics, effect size is a measure of the strength of the relationship between two variables. ... In statistics, efficiency is one measure of desirability of an estimator. ... In statistics, empirical Bayes methods involve an underlying probability distribution of some unobservable quantity assigned to each member of a statistical population. ... In statistics, an empirical distribution function is a cumulative probability distribution function that concentrates probability 1/n at each of the n numbers in a sample. ... Suppose is a sample space of observations. ... Energy statistics refers to collecting, compiling, analyzing and disseminating data on commodities such as coal, crude oil, natural gas, electricity, or renewable energy sources (biomass, geothermal, wind or solar energy), when they are used for the energy they contain. ... Tore Olaus Engset (1865–1943). ... Agner Krarup Erlang (January 1, 1878 – February 3, 1929) was a Danish mathematician, statistician, and engineer who invented the fields of queueing theory and traffic engineering. ... 
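The empirical distribution function described above is straightforward to implement: it places probability 1/n on each observation, so F(x) is simply the fraction of sample points not exceeding x. The sample below is an arbitrary illustration:

```python
def ecdf(sample):
    """Empirical distribution function: F(x) = (# points <= x) / n,
    i.e. probability 1/n concentrated at each observation."""
    xs = sorted(sample)
    n = len(xs)
    def f(x):
        return sum(1 for v in xs if v <= x) / n
    return f

f = ecdf([3, 1, 4, 1, 5])
# f(0) = 0.0, f(1) = 0.4, f(4) = 0.8, f(5) = 1.0
```

The resulting step function converges to the true cumulative distribution function as n grows (the Glivenko–Cantelli theorem).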
In statistics and optimization, the concepts of error and residual are easily confused with each other. ... Errors-in-variables is a robust modeling technique in statistics, which assumes that every variable can have error or noise. ... Estimation is the calculated approximation of a result which is usable even if input data may be incomplete, uncertain, or noisy. ... Estimation theory is a branch of statistics and signal processing that deals with estimating the values of parameters based on measured/empirical data. ... In multivariate statistics, the importance of the Wishart distribution stems in part from the fact that it is the probability distribution of the maximum likelihood estimator of the covariance matrix of a multivariate normal distribution. ... In statistics, an estimator is a function of the observable sample data that is used to estimate an unknown population parameter; an estimate is the result from the actual application of the function to a particular set of data. ... In population genetics, Ewens's sampling formula, introduced by Warren Ewens, states that under certain conditions (specified below), if a random sample of n gametes is taken from a population and classified according to the gene at a particular locus then the probability that there are a1 alleles represented once... An exact (significance) test is a test where all assumptions that the derivation of the distribution of the test statistic is based on are met. ... In probability theory, the expected value (or mathematical expectation) of a random variable is the sum of the probability of each possible outcome of the experiment multiplied by its payoff (value). Thus, it represents the average amount one expects as the outcome of the random trial when identical odds are... 
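The expected-value definition above (sum of each outcome's payoff weighted by its probability) can be sketched directly; a fair six-sided die serves as the standard example:

```python
def expected_value(outcomes):
    """Expected value of a discrete random variable:
    E[X] = sum over outcomes of probability * payoff."""
    return sum(p * x for p, x in outcomes)

# A fair six-sided die: each face 1..6 has probability 1/6.
die = [(1/6, face) for face in range(1, 7)]
ev = expected_value(die)   # (1+2+3+4+5+6)/6 = 3.5
```

Note that 3.5 is not a possible single outcome; the expected value is a long-run average, not a prediction of any one trial.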
In statistical computing, an expectation-maximization (EM) algorithm is an algorithm for finding maximum likelihood estimates of parameters in probabilistic models, where the model depends on unobserved latent variables. ... Experimental research designs are used for the controlled testing of causal processes. ... In statistics, an explained sum of squares (ESS) is the sum of squared predicted values in a standard regression model (for example y_i = a + b*x_i + e_i), where y_i is the response variable, x_i is the explanatory variable, a and b are coefficients, i indexes the observations from 1 to n, and e_i is the error term. ... In statistics, an explanatory variable (also regressor or independent variable) is a variable in a regression model which appears on the right-hand side of the equation. ... Exploratory data analysis (EDA) is that part of statistical practice concerned with reviewing, communicating and using data where there is a low level of knowledge about its cause system. ... In probability theory and statistics, the exponential distributions are a class of continuous probability distributions. ... In probability and statistics, an exponential family is any class of probability distributions having a certain form. ... In statistics, exponential smoothing refers to a particular type of moving average technique applied to time series data, either to produce smoothed data for presentation, or to make forecasts. ... Extreme value theory is a branch of statistics dealing with the extreme deviations from the median of probability distributions. ...
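Simple exponential smoothing, mentioned above, follows the recurrence s_t = α·x_t + (1 − α)·s_{t−1}. A minimal sketch with an arbitrary short series and an illustrative α = 0.5 (initialising with the first observation is one common convention among several):

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing:
    s_t = alpha * x_t + (1 - alpha) * s_{t-1}, with s_0 = x_0."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

smoothed = exponential_smoothing([10.0, 12.0, 11.0, 15.0], alpha=0.5)
# → [10.0, 11.0, 11.0, 13.0]
```

Larger α tracks the raw series more closely; smaller α smooths more aggressively, since past values decay geometrically.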

## F

Failure rate is the frequency with which an engineered system or component fails, expressed for example in failures per hour. ... In statistics and probability, the F-distribution is a continuous probability distribution. ... An F-test is any statistical test in which the test statistic has an F-distribution if the null hypothesis is true. ... Factor analysis is a statistical data reduction technique used to explain variability among observed random variables in terms of fewer unobserved random variables called factors. ... In statistics, a factorial experiment is an experiment whose design consists of two or more factors, each with discrete possible values or levels, and whose experimental units take on all possible combinations of these levels across all such factors. ... Coin flipping or coin tossing is the practice of throwing a coin in the air to resolve a dispute between two parties. ... False discovery rate (FDR) control is a statistical method used in multiple hypothesis testing to correct for multiple comparisons. ... Type I errors (or α errors, or false positives) and type II errors (β errors, or false negatives) are two terms used to describe statistical errors. ... In statistics, familywise error rate (FWER) is the probability of making one or more false discoveries, or type I errors, among all the hypotheses when performing multiple pairwise tests. The m specific hypotheses of interest are assumed to be known in advance, but the numbers of true null... In applied statistics, the file drawer problem results from the fact that academics tend not to publish results that indicate the null hypothesis could not be rejected. ... In statistics and information theory, the Fisher information (denoted I(θ)) is the variance of the score. ... 
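The familywise error rate above can be computed in closed form for independent tests: FWER = 1 − (1 − α)^m. A quick sketch, using the conventional α = 0.05 and an illustrative family of 10 tests:

```python
def familywise_error_rate(alpha, m):
    """Probability of at least one type I error across m independent
    tests, each run at level alpha: FWER = 1 - (1 - alpha)^m."""
    return 1 - (1 - alpha) ** m

# With 10 independent tests at alpha = 0.05, the chance of at least one
# false positive is about 40% -- the motivation for corrections such as
# Bonferroni or Holm-Bonferroni.
fwer = familywise_error_rate(0.05, 10)
```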
Sir Ronald Aylmer Fisher, FRS (17 February 1890 – 29 July 1962) was an English statistician, evolutionary biologist, and geneticist. ... Fisher's exact test is a statistical significance test used in the analysis of categorical data where sample sizes are small. ... Linear discriminant analysis (LDA) is sometimes known as Fisher's linear discriminant, after its inventor, Ronald A. Fisher, who published it in The Use of Multiple Measurements in Taxonomic Problems (1936). ... In statistics, Fisher's method is a data fusion or meta-analysis (analysis after analysis) technique for combining the results from a variety of independent tests bearing upon the same overall hypothesis (H0) as if in a single large test. ... In statistics, hypotheses about the value of r, the correlation coefficient between variables x and y of the underlying population, can be tested using the Fisher transformation applied to r. ... Fleiss' kappa is a generalisation of Scott's pi statistic, a statistical measure of inter-rater reliability. ... In statistics, a forecast error is the difference between the actual/real and the predicted/forecast value of a time series. ... A forest plot is a graph displaying the results of multiple studies in a meta-analysis. ... In statistics, fractional factorial designs are experimental designs consisting of a carefully chosen subset (fraction) of the experimental runs of a full factorial design. ... The Freedman-Diaconis rule is used to specify the number of bins to be used in a histogram. ... In statistics, a frequency distribution is a list of the values that a variable takes in a sample. ... Statistical regularity has motivated the development of the relative frequency concept of probability. ... Functional data analysis is a series of techniques in statistics for characterizing a series of data points as a single piece of data. ...
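The Freedman-Diaconis rule above chooses a histogram bin width of h = 2·IQR/n^(1/3); the bin count is then the data range divided by h. A minimal sketch — note that the linear-interpolation quantile used here is one of several conventions, so bin counts can differ slightly from other implementations:

```python
import math

def freedman_diaconis_bins(sample):
    """Number of histogram bins via the Freedman-Diaconis rule:
    bin width h = 2 * IQR / n**(1/3), bins = ceil(range / h)."""
    xs = sorted(sample)
    n = len(xs)

    def quantile(q):
        # Linear-interpolation quantile (one common convention).
        pos = q * (n - 1)
        lo = int(pos)
        frac = pos - lo
        if lo + 1 >= n:
            return xs[lo]
        return xs[lo] + frac * (xs[lo + 1] - xs[lo])

    iqr = quantile(0.75) - quantile(0.25)
    h = 2 * iqr / n ** (1 / 3)
    return math.ceil((xs[-1] - xs[0]) / h)

bins = freedman_diaconis_bins(list(range(100)))   # 100 evenly spaced points
```

Because the rule uses the interquartile range rather than the standard deviation, it is robust to outliers in the sample.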

## G

In statistics, G-tests are likelihood-ratio or maximum likelihood statistical significance tests that are increasingly being used in situations where chi-square tests were previously recommended. ... The Galton-Watson process is a stochastic process arising from Francis Galton's statistical investigation of the extinction of surnames. ... Galton's problem, named after Sir Francis Galton, is the problem of drawing inferences from cross-cultural data, due to the statistical phenomenon now called autocorrelation. ... In probability theory and statistics, the gamma distribution is a two-parameter family of continuous probability distributions. ... In statistics, the generalized canonical correlation analysis (gCCA) is a way of making sense of cross-correlation matrices between the sets of random variables when there are more than two sets. ... In statistics, the generalized linear model (GLM) is a useful generalization of ordinary least squares regression. ... The generalized method of moments is a very general statistical method for obtaining estimates of parameters of statistical models. ... In mathematics and physics, Gibbs sampling is an algorithm to generate a sequence of samples from the joint probability distribution of two or more random variables. ... The Gini coefficient is a measure of inequality of income distribution or inequality of wealth distribution. ... Good–Turing frequency estimation is a statistical technique for predicting the probability of occurrence of objects belonging to an unknown number of species, given past observations of such objects and their species. ... Goodness of fit means how well a statistical model fits a set of observations. ... 
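The Gini coefficient above can be computed from the mean absolute difference between all pairs of incomes: G = Σᵢ Σⱼ |xᵢ − xⱼ| / (2·n²·mean). A sketch with two small hypothetical income vectors (the O(n²) pairwise loop is fine for illustration, though sorting-based formulas scale better):

```python
def gini(incomes):
    """Gini coefficient via the mean absolute difference:
    G = sum_ij |x_i - x_j| / (2 * n^2 * mean). 0 = perfect equality."""
    n = len(incomes)
    mean = sum(incomes) / n
    diff_sum = sum(abs(a - b) for a in incomes for b in incomes)
    return diff_sum / (2 * n * n * mean)

g_equal = gini([10.0, 10.0, 10.0, 10.0])   # 0.0: everyone earns the same
g_skewed = gini([0.0, 0.0, 0.0, 40.0])     # 0.75: one person has everything
```

For n people where one holds all the income, G = (n − 1)/n, approaching 1 as n grows.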
William Sealy Gosset (June 13, 1876 – October 16, 1937) was a chemist and statistician, better known by his pen name Student. ... An n×n Graeco-Latin square is a table, each cell of which contains a pair of symbols, composed of a symbol from each of two sets of n elements. ... In probability theory and statistics, a graphical model (GM) represents dependencies among random variables by a graph in which each random variable is a node. ...

## H

In economics, the Herfindahl index is a measure of the size of firms in relationship to the industry and an indicator of the amount of competition among them. ... In statistics, Halton sequences are well-known quasi-random sequences, first introduced in 1960 as an alternative to pseudo-random number sequences. ... In statistics, the Hannan-Quinn information criterion (HQC) is an alternative to the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). ... The Hausman specification test is the first easy method allowing scientists to evaluate if their statistical models correspond to the data. ... The hazard ratio in survival analysis is a summary of the difference between two survival curves, representing the reduction in the risk of death on treatment compared to control, over the period of follow-up. ... In statistics, a sequence or a vector of random variables is heteroskedastic if the random variables in the sequence or vector may have different variances. ... In statistics, a frequent assumption in linear regression is that the disturbances u_i have the same variance. ... State transitions in a hidden Markov model (example): x — hidden states, y — observable outputs, a — transition probabilities, b — output probabilities. A hidden Markov model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unknown parameters, and the challenge is to... Hierarchical linear modeling (HLM), also known as multi-level analysis, is a more advanced form of simple linear regression and multiple linear regression. ... In statistics, the Holm-Bonferroni method performs more than one hypothesis test simultaneously. ... 
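The Herfindahl index above is just the sum of squared market shares. A quick sketch with hypothetical industry compositions:

```python
def herfindahl(shares):
    """Herfindahl index: sum of squared market shares (shares sum to 1).
    Ranges from 1/n (n equally sized firms) up to 1.0 (monopoly)."""
    return sum(s * s for s in shares)

# Four equally sized firms vs. one dominant firm (hypothetical shares).
h_competitive = herfindahl([0.25, 0.25, 0.25, 0.25])   # 0.25
h_concentrated = herfindahl([0.7, 0.2, 0.1])           # 0.54
```

Regulators often quote the index on a 0–10,000 scale by expressing shares in percent before squaring; the structure of the calculation is identical.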
In statistics, a sequence or a vector of random variables is homoscedastic if all random variables in the sequence or vector have the same finite variance. ... In statistics, Hotelling's T-square statistic, named for Harold Hotelling, is a generalization of Student's t statistic that is used in multivariate hypothesis testing. ... The Howland will forgery trial was a US court case in 1868 to decide Henrietta Howland Robinson's contest of the will of Sylvia Ann Howland. ... In econometrics, Huber-White standard errors are standard errors that are adjusted for correlations of error terms across observations, especially in panel and survey data as well as data with cluster structure. ... The Hubbert curve, named after the geophysicist M. King Hubbert, is the derivative of the logistic curve. ...

## I

Here is an illustration of the central limit theorem. ... Independent component analysis (ICA) is a computational method for separating a multivariate signal into additive subcomponents supposing the mutual statistical independence of the non-Gaussian source signals. ... In probability theory, a sequence or other collection of random variables is independent and identically distributed (i. ... For probability distributions having an expected value and a median, the mean (i. ... The information bottleneck method is a technique for finding the best trade-off between accuracy and compression when summarizing (e. ... In statistics, an instrumental variable (IV, or instrument) can be used in regression analysis to produce a consistent estimator when the explanatory variables (covariates) are correlated with the error terms. ... A graphical depiction of a statistical interaction in which the extent to which experience impacts cost depends on decision time. ... In statistics, the interclass correlation (or interclass correlation coefficient) measures a bivariate relation among variables. ... In statistics, interclass dependence (or class interdependence) means that the occurrence of one class is probabilistically dependent on other classes that may occur in the same space. ... In descriptive statistics, the interquartile range (IQR), also called the midspread and middle fifty, is the range between the third and first quartiles and is a measure of statistical dispersion. ... Inter-rater reliability or inter-rater agreement is the measurement of agreement between raters. ... In statistics, interval estimation is the use of sample data to calculate an interval of possible (or probable) values of an unknown population parameter. ... 
An intervening variable is a hypothetical construct that attempts to explain relationships between variables, and especially the relationships between independent variables and dependent variables. ... In statistics, the intraclass correlation (or the intraclass correlation coefficient) is a measure of correlation, consistency or conformity for a data set when it has multiple groups. ... In statistics, the inverse-Wishart distribution, also called the inverted Wishart distribution, is a probability density function defined on matrices. ... Inverse transform sampling, also known as the probability integral transform, is a method of sampling a number at random from any probability distribution given its cumulative distribution function (cdf). ... Item response theory (IRT) is a body of related psychometric theory that provides a foundation for scaling persons and items based on responses to assessment items. ... The method of iteratively re-weighted least squares (IRLS) is a numerical algorithm for minimizing any specified objective function using a standard weighted least squares method such as Gaussian elimination. ...
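Inverse transform sampling, described above, works by feeding a uniform random number through the inverted cdf. The exponential distribution is the textbook case: inverting F(x) = 1 − exp(−λx) gives x = −ln(1 − u)/λ. A sketch with an illustrative rate of 2.0 and a fixed seed for reproducibility:

```python
import math
import random

def sample_exponential(rate, rng):
    """Inverse transform sampling for Exp(rate): invert the cdf
    F(x) = 1 - exp(-rate * x) to get x = -ln(1 - u) / rate, u ~ U(0,1)."""
    u = rng.random()
    return -math.log(1.0 - u) / rate

rng = random.Random(0)   # fixed seed so the sketch is reproducible
draws = [sample_exponential(2.0, rng) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)   # should be near 1/rate = 0.5
```

The same recipe works for any distribution whose cdf can be inverted, numerically if not in closed form.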

## J

The James-Stein estimator is a nonlinear estimator which can be shown to dominate, or outperform, the ordinary (least squares) estimator. ... In statistics, the Jarque-Bera test is a goodness-of-fit measure of departure from normality, based on the sample kurtosis and skewness. ... In Bayesian probability, the Jeffreys prior is a noninformative prior distribution proportional to the square root of the Fisher information, p(θ) ∝ √I(θ), and is invariant under reparameterization of the parameter θ. ...
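The dominance claim in the James-Stein entry has a compact closed form worth showing: for p ≥ 3 coordinate means observed with unit-variance noise, shrink the observation vector toward the origin. A sketch under those assumptions (`james_stein` is an illustrative name):

```python
def james_stein(z):
    """James-Stein shrinkage of an observation vector z (p >= 3 components,
    unit noise variance): scale z by (1 - (p - 2) / ||z||^2)."""
    p = len(z)
    if p < 3:
        raise ValueError("James-Stein shrinkage requires p >= 3")
    norm_sq = sum(v * v for v in z)
    factor = 1.0 - (p - 2) / norm_sq
    return [factor * v for v in z]
```

The larger the observation vector, the smaller the shrinkage; the estimator dominates least squares in total mean squared error even though it is biased for each coordinate.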

## K

The Kaplan-Meier estimator (also known as the Product Limit Estimator) estimates the survival function from life-time data. ... Cohen's kappa coefficient is a statistical measure of inter-rater reliability. ... A kappa statistic is a measure of degree of nonrandom agreement between observers and/or measurements of a specific categorical variable. ... The Kendall tau distance is a metric that counts the number of pairwise disagreements between two lists. ... The Kendall tau rank correlation coefficient (or simply the Kendall tau coefficient, Kendall's τ) is used to measure the degree of correspondence between two rankings and to assess the significance of this correspondence. ... The 5-parameter Fisher-Bingham distribution or Kent distribution is a probability distribution on the three-dimensional sphere. ... A kernel is a weighting function used in non-parametric estimation techniques. ... In statistics, the Kolmogorov-Smirnov test (often called the K-S test) is used to determine whether two underlying probability distributions differ, or whether an underlying probability distribution differs from a hypothesized distribution, in either case based on finite samples. ... Kriging is a group of geostatistical techniques to interpolate the value of a random field (e. ... In statistics, the Kruskal-Wallis one-way analysis of variance by ranks (named after William Kruskal and Allen Wallis) is a non-parametric method. ... In statistics, Kuiper's test is closely related to the better-known Kolmogorov-Smirnov test (or K-S test as it is often called). ... In probability theory and information theory, the Kullback-Leibler divergence (or information divergence, or information gain, or relative entropy) is a natural distance measure from a true probability distribution P to an arbitrary probability distribution Q. Typically P represents data, observations, or a precisely calculated probability distribution. ... 
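The Kendall tau distance entry (counting pairwise disagreements between two lists) is simple enough to compute naively; a sketch assuming both lists order the same set of items (O(n²), fine for small n):

```python
from itertools import combinations

def kendall_tau_distance(rank_a, rank_b):
    """Count pairs of items ordered one way in rank_a and the opposite
    way in rank_b. Inputs are lists of the same items in ranked order."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    disagreements = 0
    for x, y in combinations(rank_a, 2):
        # A pair disagrees when the sign of the position difference flips.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0:
            disagreements += 1
    return disagreements
```

Identical rankings give distance 0; a fully reversed ranking of n items gives the maximum, n(n-1)/2.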
The far red light has no effect on the average speed of the gravitropic reaction in wheat coleoptiles, but it changes kurtosis from platykurtic to leptokurtic (-0. ...

## M

In statistics, M-estimators are a type of estimator whose properties are quite well-known. ... As a broad subfield of artificial intelligence, machine learning is concerned with the design and development of algorithms and techniques that allow computers to learn. At a general level, there are two types of learning: inductive and deductive. ... In statistics, Mahalanobis distance is a distance measure introduced by P. C. Mahalanobis in 1936. ... In research methods, a main effect is the effect of one independent variable on the dependent variable, averaged across the levels of the other independent variables. ... Majorization is a mathematical relation. ... In statistics, the Mann-Whitney U test (also called the Mann-Whitney-Wilcoxon (MWW), Wilcoxon rank-sum test, or Wilcoxon-Mann-Whitney test) is a non-parametric test for assessing whether two samples of observations come from the same distribution. ... Multivariate analysis of variance (MANOVA) is an extension of analysis of variance (ANOVA) methods to cover cases where there is more than one dependent variable and where the dependent variables cannot simply be combined. ... The Mantel test is a statistical test of the correlation between two matrices. ... In statistics, MAP estimates come from maximizing the likelihood function multiplied by an a priori distribution. ... The margin of error of a reported percentage describes the corresponding zone of 95% confidence in which the true percentage is likely to lie. ... In probability theory, given two jointly distributed random variables X and Y, the marginal distribution of X is simply the probability distribution of X ignoring information about Y, typically calculated by summing or integrating the joint probability distribution over Y. 
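The Mahalanobis distance entry can be illustrated in two dimensions, where the 2×2 covariance matrix inverts by hand; a sketch (`mahalanobis_2d` is an illustrative name):

```python
import math

def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance sqrt((x - mu)^T S^{-1} (x - mu)) for a
    2-D point x, mean mu, and 2x2 covariance matrix S = ((a, b), (c, d))."""
    dx = x[0] - mean[0]
    dy = x[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Inverse of ((a, b), (c, d)) is ((d, -b), (-c, a)) / det, so the
    # quadratic form expands to (d*dx^2 - (b + c)*dx*dy + a*dy^2) / det.
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(q)
```

With the identity covariance the Mahalanobis distance reduces to the ordinary Euclidean distance; correlated or unequal-variance components rescale it accordingly.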
For discrete random variables, the marginal probability mass function can... In Bayesian probability theory, a marginal likelihood function is a likelihood function integrated over some variables, typically model parameters. ... In statistics, marginal models (Heagerty & Zeger, 2000) are a technique for obtaining regression estimates in multilevel modeling, also called hierarchical linear models. ... Markov chain geostatistics applies Markov chains in geostatistics for conditional simulation on sparse observed data; see Li et al. ... Markov chain Monte Carlo (MCMC) methods (which include random walk Monte Carlo methods) are a class of algorithms for sampling from probability distributions based on constructing a Markov chain that has the desired distribution as its stationary distribution. ... It is possible to model mathematically the progress of most infectious diseases to discover the likely outcome of an epidemic or to help manage them by vaccination. ... Mathematical statistics uses probability theory and other branches of mathematics to study statistics from a purely mathematical standpoint. ... Maximum likelihood estimation (MLE) is a popular statistical method used to make inferences about parameters of the underlying probability distribution from a given data set. ... Maximum parsimony, often simply referred to as parsimony, is a non-parametric statistical method commonly used in computational phylogenetics for estimating phylogenies. ... In statistics, McNemars test is a non-parametric method used on nominal data to determine whether the row and column marginal frequencies are equal. ... In probability theory the expected value (or mathematical expectation) of a random variable is the sum of the probability of each possible outcome of the experiment multiplied by its payoff (value). Thus, it represents the average amount one expects as the outcome of the random trial when identical odds are... 
The absolute deviation of an element of a data set is the absolute difference between that element and a given point. ... The mean difference is a measure of statistical dispersion equal to the average absolute difference of two independent values drawn from a probability distribution. ... In mathematics, a mean of circular quantities is a mean which is suited for quantities like angles, times of day, and fractional parts of real numbers. ... Mean reciprocal rank is a statistic for evaluating any process that produces a list of possible responses to a query, ordered by probability of correctness. ... In statistics, the mean squared error of an estimator T of an unobservable parameter θ is E[(T − θ)²]. ... In statistics, the mean squared prediction error of a smoothing procedure is the expected sum of squared deviations of the fitted values from the (unobservable) function . ... The level of measurement of a variable in mathematics and statistics is a classification that was proposed in order to describe the nature of information contained within numbers assigned to objects and, therefore, within the variable. ... In probability theory and statistics, a median is a type of average that is described as the number dividing the higher half of a sample, a population, or a probability distribution, from the lower half. ... In statistics, the median test is a special case of Pearson's chi-square test. ... Mean time between failures (MTBF) is the mean (average) time between failures of a system, the reciprocal of the failure rate in the special case when the failure rate is constant. ... In probability theory, memorylessness is a property of certain probability distributions: the exponential distributions and the geometric distributions. ... A meta-analysis is a statistical practice of combining the results of a number of studies. ... In statistics, the method of moments is a method of estimation of population parameters such as mean, variance, median, etc. ... 
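The mean-of-circular-quantities entry deserves a worked example: the arithmetic mean of 359° and 1° is 180°, while the directional mean is 0°. The standard approach averages the corresponding unit vectors and takes the angle of the resultant:

```python
import math

def circular_mean_deg(angles_deg):
    """Mean direction of angles given in degrees: sum the unit vectors
    (cos a, sin a) and return the angle of the resultant, in [0, 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0
```

This is the same trick the Yamartino method (listed under Y below) builds on for wind-direction statistics.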
In Metropolis-Hastings sampling, the proposal distribution Q proposes the next point to which the random walk might move. ... In statistics, the interquartile range (or H-spread) is the difference between the third and first quartiles; the midhinge is their average. ... The midrange of a set of statistical data values is the arithmetic mean of the smallest and largest values in the set. ... "Minmax" redirects here. ... In statistics, and more specifically in estimation theory, a minimum-variance unbiased estimator (MVUE or MVU estimator) is an unbiased estimator of parameters, whose variance is minimized for all values of the parameters. ... In statistics, the theory of minimum norm quadratic unbiased estimation (MINQUE) was developed by C.R. Rao. ... A misuse of statistics occurs when a statistical argument asserts a falsehood. ... In mathematics, the term mixture model is a model in which independent variables are fractions of a total. ... Model selection is the task of selecting a mathematical model from a set of potential models, given evidence. ... The Modifiable Areal Unit Problem (MAUP) is a potential source of error that can affect spatial studies which utilise aggregate data sources (Unwin, 1996). ... In probability theory and statistics, the moment-generating function of a random variable X is M(t) = E[e^(tX)], wherever this expectation exists. ... In statistics, the method of moments is a method of estimation of population parameters such as mean, variance, median, etc. ... The term moving average is used in different contexts. ... Multicollinearity is a statistical term for the existence of a high degree of linear correlation amongst two or more explanatory variables in a regression model. ... 
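One common sense of the moving average mentioned above is the simple moving average over a sliding window; a minimal sketch (`moving_average` is an illustrative name):

```python
def moving_average(xs, window):
    """Simple moving average: the mean of each length-`window` slice,
    producing len(xs) - window + 1 smoothed values."""
    if window < 1 or window > len(xs):
        raise ValueError("window must be between 1 and len(xs)")
    return [sum(xs[i:i + window]) / window
            for i in range(len(xs) - window + 1)]
```

Larger windows smooth more aggressively at the cost of responsiveness, the usual bias-variance trade-off in smoothing.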
The technique is also used in marketing; see Multidimensional scaling in marketing. Multidimensional scaling (MDS) is a set of related statistical techniques often used in data visualisation for exploring similarities or dissimilarities in a given data set. ... In statistics, multilevel models are used when some variable under study varies at more than one level. ... In statistics, the multiple comparisons problem tests null hypotheses stating that the averages of several disjoint populations are equal to each other (homogeneous). ... In statistics, regression analysis is a method for explanation of phenomena and prediction of future events. ... Multiple testing correction refers to re-calculating probabilities obtained from a statistical test which was repeated multiple times. ... Multivariate statistics or multivariate statistical analysis in statistics describes a collection of procedures which involve observation and analysis of more than one statistical variable at a time. ... In probability theory and statistics, a multivariate normal distribution, also sometimes called a multivariate Gaussian distribution, is a specific probability distribution, which can be thought of as a generalization to higher dimensions of the one-dimensional normal distribution (also called a Gaussian distribution). ... In statistics, a multivariate Student distribution is a multivariate generalization of the Student's t-distribution. ...

## N

National statistical services include: Australia: Australian Bureau of Statistics; Brazil: Brazilian Institute of Geography and Statistics (IBGE); Belgium: Statistics Belgium; Canada: Statistics Canada; Colombia: Departamento Administrativo Nacional de Estadistica (DANE); Denmark: Danmarks statistik - http://www. ... In probability and statistics the negative binomial distribution is a discrete probability distribution. ... In statistics, a relationship between two variables is negative if the slope in a corresponding graph is negative, or, what is in some contexts equivalent, if the correlation between them is negative. ... In statistics, the Neyman-Pearson lemma states that when doing a hypothesis test between two point hypotheses H0: θ = θ0 and H1: θ = θ1, the likelihood-ratio test which rejects H0 in favour of H1 when the likelihood ratio falls below a threshold is the most powerful test of size α. ... In probability theory and statistics, the noncentral chi distribution is a generalization of the chi distribution. ... In probability theory and statistics, the noncentral chi-square distribution is a generalization of the chi-square distribution. ... In probability theory and statistics, the noncentral F-distribution is a continuous probability distribution that is a generalization of the (ordinary) F-distribution. ... In statistics, the hypergeometric distribution is the discrete probability distribution generated by picking colored balls at random from an urn without replacement. ... High dimensional data can be difficult to interpret. ... Nonlinear regression in statistics is the problem of fitting a model y = f(x; θ) to multidimensional x, y data, where f is a nonlinear function of x with parameters θ. In general, there is no algebraic expression for the best-fitting parameters, as there is in linear regression. ... NMF redirects here. ... 
Non-Parametric statistics are statistics where it is not assumed that the population fits any parametrized distributions. ... Sampling is the use of a subset of the population to represent the whole population. ... The normal distribution, also called the Gaussian distribution, is an important family of continuous probability distributions, applicable in many fields. ... In statistics, the rankits of the data points in a data set consisting simply of a list of scalars are expected values of order statistics of the standard normal distribution corresponding to data points in a manner determined by the order in which the data points appear. ... In statistics, normality tests are concerned with determining whether or not a random variable is normally distributed. ... In probability theory, it is almost a cliche to say that uncorrelatedness of two random variables does not entail independence. ... In industrial statistics, the np-chart is a type of control chart that is very similar to the p-chart except that the statistic being plotted is a number count rather than a sample proportion of items. ... In statistics, a null hypothesis is a hypothesis set up to be nullified or refuted in order to support an alternative hypothesis. ...

## O

Observational error is the difference between a measured value of a quantity and its true value. ... In probability theory and statistics the odds in favor of an event or a proposition are the quantity p / (1 − p), where p is the probability of the event or proposition. ... The odds ratio is a statistical measure, particularly important in Bayesian statistics and logistic regression. ... Omitted-variable bias is the bias that appears in an estimate of a parameter if a regression run does not have the appropriate form and data for other parameters. ... An opinion poll is a survey of opinion from a particular sample. ... Probability distributions for the n = 5 order statistics of an exponential distribution with θ = 3. ... In statistics, ordered logit is a flavor of the popular logit analysis, used for ordinal dependent variables. ... In statistics, ordered probit is a flavor of the popular probit analysis, used for ordinal dependent variables. ... In community ecology, ordination is a method of multivariate analysis complementary to data clustering, and used mainly in exploratory data analysis (rather than in hypothesis testing). ... Overfitting: noisy (roughly linear) data fit to both linear and polynomial functions. ...
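The odds entry above is directly computable: odds in favor are p / (1 − p), and the odds ratio compares the odds of two groups. A sketch (function names illustrative):

```python
def odds(p):
    """Odds in favor of an event with probability p: p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return p / (1.0 - p)

def odds_ratio(p1, p2):
    """Odds ratio comparing the odds at probability p1 to those at p2."""
    return odds(p1) / odds(p2)
```

An event with probability 0.5 has odds of 1 (even odds); probability 0.8 gives odds of 4, often quoted as "4 to 1 in favor".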

## Q

In statistics, the Q test is used for identification and rejection of outliers. ... In statistics, a Q-Q plot (Q stands for quantile) is a tool for diagnosing differences in distributions (such as non-normality) of a population from which a random sample has been taken. ... If x is a vector of random variables and A is an n-dimensional square matrix, then the scalar quantity x^T A x is known as a quadratic form in x. ... Quantitative marketing research is a social research method that utilizes statistical techniques. ... Quantitative psychological research is psychological research which performs statistical estimation or statistical inference. ... Quantitative research is the systematic scientific investigation of quantitative properties and phenomena and their relationships. ... In descriptive statistics, a quartile is any of the three values which divide the sorted data set into four equal parts, so that each part represents 1/4th of the sample or population. ... In statistics, the quartile coefficient of dispersion is a descriptive statistic used to make comparisons within and between data sets. ... Lambert Adolphe Jacques Quételet (February 22, 1796 – February 17, 1874) was a Belgian astronomer, mathematician, statistician and sociologist. ...
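The quartile coefficient of dispersion entry has the usual form (Q3 − Q1) / (Q3 + Q1); a sketch using the median-of-halves quartile convention (one of several conventions in use, so results can differ slightly from other software):

```python
import statistics

def quartile_coefficient_of_dispersion(data):
    """(Q3 - Q1) / (Q3 + Q1): a scale-free, robust dispersion measure
    suitable for comparing spread across data sets."""
    s = sorted(data)
    n = len(s)
    mid = n // 2
    q1 = statistics.median(s[:mid])
    q3 = statistics.median(s[mid + 1:] if n % 2 else s[mid:])
    return (q3 - q1) / (q3 + q1)
```

Being a ratio, it is unchanged when every observation is multiplied by a positive constant, which is what makes cross-data-set comparisons meaningful.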

## R

â€œRandomâ€ redirects here. ... It has been suggested that this article or section be merged with random effects model. ... It has been suggested that this article or section be merged with random effects estimation. ... Sampling is that part of statistical practice concerned with the selection of individual observations intended to yield some knowledge about a population of concern, especially for the purposes of statistical inference. ... A random sequence is a kind of stochastic process. ... Randomization is the process of making something random; this can mean: Generating a random permutation of a sequence (such as when shuffling cards). ... A randomized controlled trial (RCT) is a form of clinical trial, or scientific procedure used in the testing of the efficacy of medicines or medical procedures. ... There are many practical measures of randomness for a binary sequence. ... In descriptive statistics, the range is the length of the smallest interval which contains all the data. ... Rank-size distribution or the rank-size rule or law describes the remarkable regularity in many phenomena including the distribution of city sizes around the world, size of businesses, particle sizes (such as sand), lengths of rivers, frequency of word usage, wealth among individuals, etc. ... In statistics, the rankits of the data points in a data set consisting simply of a list of scalars are expected values of order statistics of the standard normal distribution corresponding to data points in a manner determined by the order in which the data points appear. ... In statistics, the Rao-Blackwell theorem describes a technique that can transform an absurdly crude estimator into an estimator that is optimal by the mean-squared-error criterion or any of a variety of similar criteria. ... 
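The sampling and range entries above can be illustrated with the standard library; `simple_random_sample` and `data_range` are illustrative names:

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k distinct observations uniformly at random, without
    replacement (every subset of size k is equally likely)."""
    rng = random.Random(seed)
    return rng.sample(list(population), k)

def data_range(xs):
    """Range: length of the smallest interval containing all the data."""
    return max(xs) - min(xs)
```

Passing an explicit `seed` makes the draw reproducible, which is useful when a sampling plan must be auditable.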
Rasch models are probabilistic measurement models which currently find their application primarily in psychological and attainment assessment, and are being increasingly used in other areas, including the health profession and market research. ... A ratio distribution (or quotient distribution) is a statistical distribution constructed as the distribution of the ratio of random variables having two other distributions. ... The polytomous Rasch model is a measurement model that has potential application in any context in which the objective is to measure a trait or ability through a process in which responses to items are scored with successive integers. ... In statistics and data analysis, a raw score is an original datum that has not been transformed, for example, the original result obtained by a student on a test (i. ... A receiver operating characteristic (ROC) curve is a plot of a test's true positive rate against its false positive rate as the discrimination threshold is varied. ... In descriptive statistics and chaos theory, a recurrence plot (RP) is a plot showing, for a given moment in time, the times at which a phase space trajectory visits roughly the same area in the phase space. ... The recursive least squares (RLS) algorithm is used in adaptive filters to find the filter coefficients that recursively minimize the least squares of the error signal (the difference between the desired and the actual signal); it converges faster than the LMS algorithm. ... Recursive partitioning is a statistical method for the multivariable analysis of medical diagnostic tests. ... In statistics, regression analysis examines the relation of a dependent variable (response variable) to specified independent variables (explanatory variables). ... In statistics, linear regression is a regression method that models the relationship between a dependent variable Y, independent variables Xi, i = 1, ..., p, and a random term ε. The model can be written as Y = β0 + β1X1 + ... + βpXp + ε. ... 
Regression dilution is a statistical phenomenon also known as attenuation. Consider fitting a straight line (linear regression) for the relationship of an outcome variable y to a predictor variable x, and estimating the gradient (slope) of the line. ... The regression (or regressive) fallacy is a logical fallacy. ... Regression toward the mean refers to the fact that those with extreme scores on any measure at one point in time will, for purely statistical reasons, probably have less extreme scores the next time they are tested. ... In mathematics, rejection sampling is a technique used to generate observations from a distribution. ... In statistics and mathematical epidemiology, the relative risk (RR) of an event associated with an exposure is the ratio of the probability of the outcome of interest in the exposed group to that in the unexposed (control) group. ... In statistics, reliability is the consistency of a set of measurements or measuring instrument. ... Reliability theory developed apart from the mainstream of probability and statistics, and was used originally as a tool to help nineteenth century maritime insurance and life insurance companies compute profitable rates to charge their customers. ... Reliability Theory of Aging and Longevity is a scientific approach aimed to gain theoretical insights into mechanisms of biological aging and species survival patterns by applying a general theory of systems failure, known as reliability theory. ... In statistics, resampling is any of a variety of methods for doing one of the following: estimating the precision of sample statistics (medians, variances, percentiles) by using subsets of available data (jackknife) or drawing randomly with replacement from a set of data points (bootstrapping); exchanging labels on data points when... In statistics and optimization, the concepts of error and residual are easily confused with each other. ... In statistics, the residual sum of squares (RSS) is the sum of squares of residuals. ... 
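The rejection sampling entry follows a standard recipe: draw x from a proposal density g, and accept with probability f(x) / (M·g(x)), where the target density f satisfies f ≤ M·g everywhere. A sketch (all names illustrative):

```python
import random

def rejection_sample(target_pdf, proposal_draw, proposal_pdf, m, rng=random):
    """Rejection sampling: repeatedly draw x from the proposal and accept
    it with probability target_pdf(x) / (m * proposal_pdf(x)). Requires
    target_pdf(x) <= m * proposal_pdf(x) for all x."""
    while True:
        x = proposal_draw()
        if rng.random() * m * proposal_pdf(x) <= target_pdf(x):
            return x
```

For example, the triangular density f(x) = 2x on [0, 1] can be sampled with a uniform proposal and envelope constant m = 2; accepted draws then have mean 2/3.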
In statistics, a response variable (or response) is what one measures in an experiment. ... Tikhonov regularization is the most commonly used method of regularization of ill-posed problems. ... Brian D. Ripley is a distinguished statistician, professor of Applied Statistics at the University of Oxford and a Professorial Fellow at St Peter's College. ... In statistics, the Robbins lemma, named after Herbert Robbins, states that if X is a random variable with a Poisson distribution with mean λ, and f is any function for which the expected value E(f(X)) exists, then E(X f(X)) = λ E(f(X + 1)). Robbins introduced this proposition while developing empirical Bayes methods. ... In robust statistics, robust regression is a form of regression analysis designed to circumvent the limitations of traditional parametric and non-parametric methods. ... Robust statistics provides an alternative approach to classical statistical methods. ... The Rothamsted Experimental Station, one of the oldest agricultural research institutions in the world, is located at Harpenden in Hertfordshire, England. ... The R programming language, sometimes described as GNU S, is a programming language and software environment for statistical computing and graphics. ... The Rubin Causal Model (RCM) is an approach to the statistical analysis of cause and effect based on the framework of potential outcomes. RCM is named after its originator, Donald Rubin, Professor of Statistics at Harvard University. ... In probability theory, the rule of succession is a formula introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. ...
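Laplace's rule of succession, mentioned last above, has a one-line form: after s successes in n trials, the estimated probability that the next trial succeeds is (s + 1) / (n + 2). A sketch:

```python
from fractions import Fraction

def rule_of_succession(successes, trials):
    """Laplace's rule of succession: posterior predictive probability of
    success under a uniform prior, (s + 1) / (n + 2), as an exact fraction."""
    return Fraction(successes + 1, trials + 2)
```

With no data at all it returns 1/2, the uniform-prior answer to the sunrise problem, and it never assigns probability exactly 0 or 1 to the next outcome.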

## T

In probability and statistics, the t-distribution or Student's t-distribution is a probability distribution that arises in the problem of estimating the mean of a normally distributed population when the sample size is small. ... Taguchi methods are statistical methods developed by Genichi Taguchi to improve the quality of manufactured goods. ... Test-retest is a statistical method used to examine how reliable a test is: a test is performed twice, e. ... In statistics, signal processing, and econometrics, a time series is a sequence of data points, measured typically at successive times, spaced at (often uniform) time intervals. ... In statistics, hypotheses suggested by the data must be tested differently from hypotheses formed independently of the data. ... Also known as tolerance limits. ... A transect is a path along which one records and counts occurrences of the phenomenon of study (e. ... Treatment learning is a process by which an ordered classified data set can be evaluated as part of a data mining session to produce a representative data model. ... A series of measurements of a process may be treated as a time series, and then trend estimation is the application of statistical techniques to make and justify statements about trends in the data. ... A truncated mean or trimmed mean is a statistical measure of central tendency, much like the mean and median. ... Type I errors (α errors, or false positives) and type II errors (β errors, or false negatives) are two terms used to describe statistical errors. ...
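The truncated (trimmed) mean entry above is easy to make concrete: sort the data, discard a proportion from each end, and average what remains (`trimmed_mean` is an illustrative name):

```python
def trimmed_mean(xs, proportion):
    """Truncated (trimmed) mean: drop floor(n * proportion) observations
    from each end of the sorted data, then take the ordinary mean."""
    if not 0.0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    s = sorted(xs)
    k = int(len(s) * proportion)
    kept = s[k:len(s) - k] if k else s
    return sum(kept) / len(kept)
```

With proportion 0 it is the ordinary mean; as the proportion approaches 0.5 it approaches the median, trading efficiency for robustness to outliers.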

## U

The Mann-Whitney U test is one of the best-known non-parametric statistical significance tests. ... In statistics, the term bias is used for two different concepts. ... The standard deviation is often estimated from a random sample drawn from the population. ... Uncomfortable science is the term coined by statistician John Tukey for cases in which there is a need to draw an inference from a limited sample of data, where further samples influenced by the same cause system will not be available. ... Unit-weighted regression is perhaps the easiest form of multiple regression analysis, a method in which two or more variables are used to predict the value of an outcome. ... An urn problem is an idealized thought experiment in which some objects of real interest (such as atoms, people, cars, etc. ...

## V

In psychology, validity has two distinct fields of application. ... In probability theory and statistics, the variance-to-mean ratio (VMR), like the coefficient of variation, is a measure of the dispersion of a probability distribution. ... Three functions are used in geostatistics for describing the spatial or the temporal correlation of observations: these are the correlogram, the covariance and the semivariogram. ... The VC dimension (for Vapnik-Chervonenkis dimension) is a measure of the capacity of a classification algorithm. ... Vapnik-Chervonenkis theory (also known as VC theory) was developed during 1960-1990 by Vladimir Vapnik and Alexey Chervonenkis. ... Points sampled from three von Mises-Fisher distributions on the sphere; the mean directions are shown with arrows. ... In probability theory, the Vysochanskij-Petunin inequality gives a lower bound for the probability that a random variable with finite variance lies within a certain number of standard deviations of the variable's mean. ...
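The variance-to-mean ratio entry is a one-liner over sample statistics; a sketch (for count data, a VMR near 1 is Poisson-like, below 1 under-dispersed, above 1 over-dispersed):

```python
import statistics

def variance_to_mean_ratio(xs):
    """VMR (index of dispersion): sample variance divided by sample mean.
    Uses the n-1 (Bessel-corrected) sample variance."""
    mean = statistics.mean(xs)
    if mean == 0:
        raise ValueError("VMR is undefined when the mean is zero")
    return statistics.variance(xs) / mean
```

Unlike the coefficient of variation, the VMR is not dimensionless, so it is mainly used for counts, where "variance equals mean" is the Poisson benchmark.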

## W

Under the Wald statistical test, named after Abraham Wald, the maximum likelihood estimate of the parameter(s) of interest is compared with a proposed value, with the assumption that the difference between the two will be approximately normal. ... In probability theory and statistics, the Weibull distribution (named after Waloddi Weibull) is a continuous probability distribution with the probability density function f(x; k, λ) = (k/λ)(x/λ)^(k−1) e^(−(x/λ)^k) for x ≥ 0, where k is the shape parameter and λ is the scale parameter of the distribution. ... In statistics and uncertainty analysis, the Welch-Satterthwaite equation is used to calculate an approximation to the effective degrees of freedom of a linear combination of sample variances. ... A Winsorized mean is a statistical measure of central tendency, much like the mean and median, and even more similar to the truncated mean. ... Respondents to a census or other surveys sometimes inaccurately report their or other household members' ages or dates of birth. ... In statistics, the White test is a test which establishes whether the residual variance of a variable in a regression model is constant (homoskedasticity). ... The Wilcoxon signed-rank test is a non-parametric alternative to the paired Student's t-test for the case of two related samples or repeated measurements on a single sample. ... In signal processing, a window function (or apodization function) is a function that is zero-valued outside of some chosen interval. ... Winsorising is the transformation of outliers in statistical data. ... In statistics, the Wishart distribution, named in honor of John Wishart, is any of a family of probability distributions for nonnegative-definite matrix-valued random variables (random matrices). These distributions are of great importance in the estimation of covariance matrices in multivariate statistics. ... 
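The Winsorising entry above contrasts with trimming: extreme values are clamped to the nearest retained order statistics rather than discarded, so the sample size is preserved. A sketch (`winsorize` is an illustrative name):

```python
def winsorize(xs, proportion):
    """Winsorising: clamp roughly the given proportion of values at each
    end of the data to the nearest remaining order statistic."""
    if not 0.0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    s = sorted(xs)
    k = int(len(s) * proportion)
    lo, hi = s[k], s[len(s) - k - 1]
    return [min(max(x, lo), hi) for x in xs]
```

The mean of the Winsorized data is the Winsorized mean mentioned above; like the trimmed mean, it limits the influence of outliers while still using every observation.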
In statistics, Wold's theorem, or the Wold representation theorem, named after Herman Wold, says that every covariance-stationary time series can be written as an infinite moving average (MA(∞)) process of its innovation process. ...

## X

• X-12-ARIMA
• X-bar/R chart

X-12-ARIMA is the U.S. Census Bureaus software package for seasonal adjustment. ... An X-bar/R chart is a specific member of a family of control charts. ...

## Y

The Yamartino method is an algorithm for calculating the standard deviation of wind direction during a single pass through the incoming data. ... Yates' correction for continuity, or Yates' chi-square test, adjusts the formula for Pearson's chi-square test by subtracting 0. ... Youden's J statistic is a single statistic that captures the performance of a diagnostic test. ... In probability and statistics, the Yule-Simon distribution is a discrete probability distribution. ...
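Youden's J, mentioned above, reduces to sensitivity + specificity − 1 for a binary diagnostic test; a sketch computing it from confusion-matrix counts (argument names illustrative):

```python
def youdens_j(tp, fn, fp, tn):
    """Youden's J statistic for a binary diagnostic test:
    sensitivity + specificity - 1, ranging from 0 (useless) to 1 (perfect)."""
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return sensitivity + specificity - 1.0
```

A test no better than chance gives J = 0 regardless of how the threshold trades sensitivity against specificity, which is why J is often used to pick an operating point on a ROC curve.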

## Z

• z-score
• z-factor
• z statistic
• Zipf-Mandelbrot law

