In statistics and probability theory, the covariance matrix is the matrix of covariances between the elements of a random vector. It is the natural generalization to higher dimensions of the concept of the variance of a scalar-valued random variable.
If X is a column vector with n scalar random variable components, and μ_{k} is the expected value of the k^{th} element of X, i.e., μ_{k} = E(X_{k}), then the covariance matrix is defined as:
$$\Sigma = \operatorname{E}\left[(X - \operatorname{E}[X])(X - \operatorname{E}[X])^{\top}\right]$$

The (i,j) element of this matrix is the covariance between X_{i} and X_{j}, namely E[(X_{i} - μ_{i})(X_{j} - μ_{j})].
This concept generalizes to higher dimensions the concept of variance of a scalar-valued random variable X, defined as
$$\sigma^{2} = \operatorname{var}(X) = \operatorname{E}\left[(X - \mu)^{2}\right],$$
where μ = E(X).
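As a small worked illustration of the definition, the following Python sketch computes the covariance matrix of a two-component discrete random vector exactly, entry by entry. The function name `covariance_matrix` and the example distribution are assumptions made for illustration; they are not part of the definition itself.

```python
from fractions import Fraction


def covariance_matrix(outcomes, probs):
    """Exact covariance matrix of a discrete random vector.

    outcomes: list of equal-length tuples, the possible values of X
    probs:    matching list of probabilities (assumed to sum to 1)
    """
    n = len(outcomes[0])
    # mu[k] = E(X_k)
    mu = [sum(p * x[k] for x, p in zip(outcomes, probs)) for k in range(n)]
    # entry (i, j) = E[(X_i - mu_i)(X_j - mu_j)]
    return [
        [
            sum(p * (x[i] - mu[i]) * (x[j] - mu[j]) for x, p in zip(outcomes, probs))
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical distribution: two binary components that tend to agree.
outcomes = [(0, 0), (0, 1), (1, 0), (1, 1)]
probs = [Fraction(2, 5), Fraction(1, 10), Fraction(1, 10), Fraction(2, 5)]

sigma = covariance_matrix(outcomes, probs)
for row in sigma:
    print([str(entry) for entry in row])
```

For this distribution both diagonal entries (the variances of the components) are 1/4 and both off-diagonal entries (their covariance) are 3/20, so the matrix is symmetric, as the definition requires.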
Conflicting nomenclatures and notations
Nomenclatures differ. Some statisticians, following the probabilist William Feller, call this matrix the variance of the random vector X, because it is the natural generalization to higher dimensions of the one-dimensional variance. Others call it the covariance matrix, because it is the matrix of covariances between the scalar components of the vector X. Unfortunately, several different conventions conflict with each other to some degree:
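Standard notation:
$$\operatorname{var}(X) = \operatorname{E}\left[(X - \operatorname{E}[X])(X - \operatorname{E}[X])^{\top}\right]$$
Also standard notation (unfortunately conflicting with the above):
$$\operatorname{cov}(X) = \operatorname{E}\left[(X - \operatorname{E}[X])(X - \operatorname{E}[X])^{\top}\right]$$
Also standard notation (the "cross-covariance" between two random vectors):
$$\operatorname{cov}(X, Y) = \operatorname{E}\left[(X - \operatorname{E}[X])(Y - \operatorname{E}[Y])^{\top}\right]$$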
The first two of these usages conflict with each other. The first and third are in perfect harmony. The first notation is found in William Feller's universally admired two-volume book on probability.
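The harmony between the variance notation and the cross-covariance notation can be checked numerically: the variance of a random vector X is its cross-covariance with itself. The sketch below is a minimal illustration under the assumption of finitely many equally likely outcomes; the function names `cross_covariance` and `variance` and the data are hypothetical.

```python
from fractions import Fraction


def cross_covariance(xs, ys):
    """cov(X, Y) = E[(X - E[X])(Y - E[Y])^T] for equally likely paired outcomes."""
    n = len(xs)
    p, q = len(xs[0]), len(ys[0])
    mean_x = [sum(Fraction(x[i]) for x in xs) / n for i in range(p)]
    mean_y = [sum(Fraction(y[j]) for y in ys) / n for j in range(q)]
    return [
        [
            sum((x[i] - mean_x[i]) * (y[j] - mean_y[j]) for x, y in zip(xs, ys)) / n
            for j in range(q)
        ]
        for i in range(p)
    ]


def variance(xs):
    """var(X) in Feller's nomenclature, obtained as cov(X, X)."""
    return cross_covariance(xs, xs)


# Hypothetical data: four equally likely outcomes of a 2-vector X,
# and a 1-vector Y whose single component is the sum of X's components.
xs = [(0, 0), (1, 1), (2, 1), (1, 2)]
ys = [(x[0] + x[1],) for x in xs]

var_x = variance(xs)               # 2 x 2 matrix
cov_xy = cross_covariance(xs, ys)  # 2 x 1 matrix
```

Note that the cross-covariance of a p-vector with a q-vector is a p-by-q matrix, which is square only when p = q.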
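Properties
For $\Sigma = \operatorname{var}(X) = \operatorname{E}\left[(X - \operatorname{E}[X])(X - \operatorname{E}[X])^{\top}\right]$ and μ = E(X), the following basic properties apply:
1. $\Sigma = \operatorname{E}(X X^{\top}) - \mu \mu^{\top}$
2. $\operatorname{var}(AX + a) = A \operatorname{var}(X) A^{\top}$
3. $\operatorname{cov}(X, Y) = \operatorname{cov}(Y, X)^{\top}$
4. $\operatorname{cov}(X_{1} + X_{2}, Y) = \operatorname{cov}(X_{1}, Y) + \operatorname{cov}(X_{2}, Y)$
5. If p = q, then $\operatorname{var}(X + Y) = \operatorname{var}(X) + \operatorname{cov}(X, Y) + \operatorname{cov}(Y, X) + \operatorname{var}(Y)$
6. $\operatorname{cov}(AX, BX) = A \operatorname{var}(X) B^{\top}$
7. If X and Y are independent, then $\operatorname{cov}(X, Y) = 0$
where X, X_{1} and X_{2} are random p×1 vectors, Y is a random q×1 vector, a is a constant q×1 vector, and A and B are constant q×p matrices.
The covariance matrix, though simple, is a useful tool in many different areas. From it a transformation matrix can be derived that completely decorrelates the data or, from a different point of view, finds an optimal basis for representing the data compactly. This is called principal components analysis (PCA) in statistics and the Karhunen-Loève transform (KL transform) in image processing.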
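The decorrelation claim can be illustrated in two dimensions. Because var(AX) = A var(X) A^T for a constant matrix A, choosing A as the transpose of a rotation R whose columns are eigenvectors of the covariance matrix makes the transformed covariance diagonal. The sketch below uses a closed-form rotation angle that applies only to symmetric 2×2 matrices; the helper names and the example matrix are assumptions for illustration, and a general PCA would rely on an eigendecomposition routine instead.

```python
import math


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def decorrelating_rotation(sigma):
    """Rotation R (2 x 2) whose columns are eigenvectors of symmetric sigma."""
    a, c, b = sigma[0][0], sigma[0][1], sigma[1][1]
    # The off-diagonal entry of R^T sigma R vanishes when tan(2*theta) = 2c / (a - b).
    theta = 0.5 * math.atan2(2.0 * c, a - b)
    return [
        [math.cos(theta), -math.sin(theta)],
        [math.sin(theta), math.cos(theta)],
    ]


# Hypothetical covariance matrix of a correlated 2-vector X.
sigma = [[0.25, 0.15], [0.15, 0.25]]

r = decorrelating_rotation(sigma)
# Covariance of Y = R^T X is R^T sigma R, by var(AX) = A var(X) A^T.
sigma_y = matmul(matmul(transpose(r), sigma), r)
```

For this example the rotation angle is 45 degrees and the diagonal of `sigma_y` holds the eigenvalues 0.4 and 0.1 up to rounding, with off-diagonal entries that vanish up to rounding: the components of Y are uncorrelated, and most of the variance lies along the first new axis.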
Complex random vectors
The variance of a complex scalar-valued random variable z with expected value μ is conventionally defined using complex conjugation:
$$\operatorname{var}(z) = \operatorname{E}\left[(z - \mu)(z - \mu)^{*}\right]$$
where the complex conjugate of a complex number z is denoted z^{*}. If Z is a column vector of complex-valued random variables with expected value μ = E(Z), then we take the conjugate transpose by both transposing and conjugating, getting a square matrix:
$$\operatorname{E}\left[(Z - \mu)(Z - \mu)^{*}\right]$$
where Z^{*} denotes the conjugate transpose, which is applicable to the scalar case since the transpose of a scalar is still a scalar.
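A short numerical sketch may help make the role of the conjugate concrete. It assumes a complex random vector with finitely many equally likely outcomes; the function name `complex_covariance` and the data are hypothetical. The conjugation is what makes the diagonal entries real and non-negative and the matrix as a whole Hermitian.

```python
def complex_covariance(outcomes):
    """E[(Z - mu)(Z - mu)^*] for equally likely outcomes of a complex vector Z."""
    n = len(outcomes)
    d = len(outcomes[0])
    # mu[k] = E(Z_k)
    mu = [sum(z[k] for z in outcomes) / n for k in range(d)]
    # entry (i, j) = E[(Z_i - mu_i) * conj(Z_j - mu_j)]
    return [
        [
            sum((z[i] - mu[i]) * (z[j] - mu[j]).conjugate() for z in outcomes) / n
            for j in range(d)
        ]
        for i in range(d)
    ]


# Hypothetical data: two equally likely outcomes of a complex 2-vector.
outcomes = [(2 + 0j, 1 + 1j), (0 + 0j, 1 - 1j)]

sigma = complex_covariance(outcomes)
```

Here both components have mean 1, the diagonal entries of `sigma` equal 1, and the off-diagonal entries are -i and i, which are complex conjugates of each other.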
Estimation
The derivation of the maximum-likelihood estimator of the covariance matrix of a multivariate normal distribution is perhaps surprisingly subtle. It involves the spectral theorem and the reason why it can be better to view a scalar as the trace of a 1×1 matrix than as a mere scalar. See estimation of covariance matrices.
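For orientation, the estimate that the maximum-likelihood derivation arrives at, for n independent samples from a multivariate normal distribution with unknown mean, is the average of the outer products of the centered samples, dividing by n rather than by n - 1. The sketch below computes both variants; the function name `sample_covariance`, the `ddof` parameter (named after a common convention in numerical libraries) and the data are assumptions for illustration, and the code does not reproduce the derivation itself.

```python
def sample_covariance(samples, ddof=0):
    """Sample covariance matrix of a list of equal-length tuples.

    ddof=0 divides by n (the maximum-likelihood estimate under normality);
    ddof=1 divides by n - 1 (the usual unbiased estimate).
    """
    n = len(samples)
    d = len(samples[0])
    mean = [sum(s[k] for s in samples) / n for k in range(d)]
    return [
        [
            sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - ddof)
            for j in range(d)
        ]
        for i in range(d)
    ]


# Hypothetical observations of a 2-vector.
samples = [(2.0, 1.0), (4.0, 3.0), (6.0, 2.0), (8.0, 6.0)]

mle = sample_covariance(samples)               # divides by n = 4
unbiased = sample_covariance(samples, ddof=1)  # divides by n - 1 = 3
```

The two estimates differ only by the factor n / (n - 1), so the distinction matters mainly for small samples.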
External references 