In statistics, correlation or dependence is any statistical relationship, whether causal or not, between two random variables or bivariate data. In the broadest sense correlation is any statistical association, though it commonly refers to the degree to which a pair of variables are linearly related. Familiar examples of dependent phenomena include the correlation between the height of parents and their offspring, and the correlation between the price of a good and the quantity the consumers are willing to purchase, as it is depicted in the so-called demand curve.

Correlations are useful because they can indicate a predictive relationship that can be exploited in practice. For example, an electrical utility may produce less power on a mild day based on the correlation between electricity demand and weather. In this example, there is a causal relationship, because extreme weather causes people to use more electricity for heating or cooling. However, in general, the presence of a correlation is not sufficient to infer the presence of a causal relationship (i.e., correlation does not imply causation).

Formally, random variables are dependent if they do not satisfy a mathematical property of probabilistic independence. In informal parlance, correlation is synonymous with dependence. However, when used in a technical sense, correlation refers to any of several specific types of mathematical operations between the tested variables and their respective expected values. Essentially, correlation is the measure of how two or more variables are related to one another. There are several correlation coefficients, often denoted or , measuring the degree of correlation. The most common of these is the Pearson correlation coefficient, which is sensitive only to a linear relationship between two variables (which may be present even when one variable is a nonlinear function of the other). Other correlation coefficients – such as Spearman's rank correlation – have been developed to be more robust than Pearson's, that is, more sensitive to nonlinear relationships.[1][2][3] Mutual information can also be applied to measure dependence between two variables.

Pearson's product-moment coefficient


Symmetry property ( X , Y ) = corr ( Y , X ) {\displaystyle \operatorname {corr} (X,Y)=\operatorname {corr} (Y,X)} . This is verified by the commutative property of multiplication.

Correlation and independence 1 , 1 ) {\displaystyle (-1,1)} in all other cases, indicating the degree of linear dependence between the variables. As it approaches zero there is less of a relationship (closer to uncorrelated). The closer the coefficient is to either −1 or 1, the stronger the correlation between the variables.

If the variables are independent, Pearson's correlation coefficient is 0, but the converse is not true because the correlation coefficient detects only linear dependencies between two variables.

independent, Pearson's correlation coefficient is 0, but the converse is not true because the correlation coefficient detects only linear dependencies between two variables.

For example, suppose the random variable is symmetrically distributed about zero, and . Then is completely determined by , so that and are perfectly dependent, but their correlation is zero; they are uncorrelated. However, in the special case when and are jointly normal, uncorrelatedness is equivalent to independence.


Even though uncorrelated data does not necessarily imply independence, one can check if random variables are independent if their mutual information is 0.

Given a series of measurements of the pair indexed by , the sample correlation coefficient can be used to estimate the population Pearson correlation between and . The sample correlation coefficient is defined as

and are the sample means of and , and and are the corrected sample standard deviations of and .

Equivalent expressions for are