Pearson Correlation Coefficient, Covariance Matrix and Linear Dependency

After reading this post, I think you will find it easy to explain this picture from Wikipedia:

 

[Figure: Several sets of (x, y) points, with the correlation coefficient of x and y for each set.]

 

From Wikipedia,

In statistics, the Pearson product-moment correlation coefficient (sometimes referred to as the PMCC, and typically denoted by \rho) is a measure of the correlation (linear dependence) between two variables X and Y, giving a value between +1 and −1 inclusive.

The definition of Pearson coefficient is

\rho_{X,Y} = \frac{cov(X,Y)}{\sigma_X\sigma_Y}

where cov(X,Y) is the covariance between X and Y, and \sigma_X and \sigma_Y are the standard deviations.

For your reference, standard deviation is the square root of variance, and the estimate of variance from n samples \{x_i\} is:

var(X) \doteq \frac{1}{n} \sum_{i=1}^n (x_i - \mu_X)^2

The estimate of covariance between X and Y is

cov(X,Y) \doteq \frac{1}{n} \sum_{i=1}^n (x_i - \mu_X) (y_i - \mu_Y)

The estimate of Pearson coefficient is

\rho_{X,Y} \doteq \frac{\sum_{i=1}^n (x_i - \mu_X) (y_i - \mu_Y)}{\sqrt{ \sum_{i=1}^n (x_i-\mu_X)^2} \sqrt{\sum_{i=1}^n (y_i - \mu_Y)^2}}
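To make the estimate concrete, here is a minimal MATLAB sketch (the data is made up for illustration). Note that std(x, 1) uses the same 1/n normalization as the formulas above; the 1/n factors cancel in the ratio anyway, so the built-in corrcoef gives the same coefficient:

x = randn(1000, 1);                    % made-up sample of X
y = 0.5 * x + randn(1000, 1);          % Y correlated with X
mx = mean(x); my = mean(y);
cxy = mean((x - mx) .* (y - my));      % covariance estimate, 1/n normalization
rho = cxy / (std(x, 1) * std(y, 1))    % Pearson coefficient estimate
R = corrcoef(x, y);                    % built-in check: R(1,2) should equal rho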

The denominator is simply a non-negative normalization factor; only the numerator (the covariance) measures the correlation between X and Y. So let us have a closer look at covariance.

Consider a sample of four symmetric bivariate points \{d_i = \langle x_i, y_i \rangle\}_{i=1}^4. (We consider bivariate samples here only because they can be drawn on a 2D screen.)

[Figure: 4samples — four points placed symmetrically around their mean]

It is not hard to verify that the covariance estimated from this sample is 0. Note that if a point lies in the 1st or 3rd quadrant (relative to the mean), then (x_i-\mu_X)(y_i-\mu_Y) is positive; otherwise it is negative. So in general, given a set of points distributed symmetrically around their mean, the positive and negative contributions cancel, and we have cov(X,Y) \rightarrow 0 and thus \rho_{X,Y} \rightarrow 0.
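The 4samples figure is not reproduced here, but any four points placed symmetrically around their mean will do; a minimal check with the hypothetical points (1,1), (-1,1), (-1,-1), (1,-1):

P = [1 1; -1 1; -1 -1; 1 -1];                     % four symmetric points, mean (0, 0)
mu = mean(P);
c = mean((P(:,1) - mu(1)) .* (P(:,2) - mu(2)))    % = 0: the quadrant products cancel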

To make cov(X,Y) or \rho_{X,Y} larger than 0, we need more points in the 1st and 3rd quadrants, for example:

[Figure: 6samples — six points, with more mass in the 1st and 3rd quadrants]

Similarly, to make cov(X,Y) or \rho_{X,Y} less than 0, we need more points in the 2nd and 4th quadrants. However large (or small) cov(X,Y) is, the denominator of \rho_{X,Y} normalizes the result into [-1, 1].
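For example, appending two hypothetical points in the 2nd and 4th quadrants to the symmetric set above drives the estimate below zero:

P = [1 1; -1 1; -1 -1; 1 -1; -2 2; 2 -2];         % two extra points in quadrants 2 and 4
mu = mean(P);
c = mean((P(:,1) - mu(1)) .* (P(:,2) - mu(2)))    % = -8/6 < 0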

The covariance matrix of X and Y is

C(X,Y) = \begin{bmatrix} var(X) & cov(X,Y) \\ cov(X,Y) & var(Y) \end{bmatrix}

which contains all the factors used to compute \rho_{X,Y}.
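In MATLAB, this matrix can be obtained directly with the built-in cov; passing 1 as the second argument selects the 1/n normalization used in this post:

x = randn(500, 1); y = x + 0.5 * randn(500, 1);   % made-up correlated data
C = cov([x y], 1)    % returns [var(X) cov(X,Y); cov(X,Y) var(Y)]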

As an experiment, let us construct a covariance matrix with the largest possible cov(X,Y) given var(X) = var(Y) = 1:

C(X,Y) = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}

then we sample 1000 points from the Gaussian distribution N(d; [0,0], C(X,Y)) and plot the result:

You see, all samples fall on the same line.
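(The plot itself is not reproduced here.) sample_gaussian is the author's helper from a previous post; as a self-contained stand-in, the sketch below draws samples via the eigendecomposition of C, which also works for this singular C, where a Cholesky factorization would fail:

C = [1 1; 1 1];                   % rho = 1: a degenerate (semi-definite) covariance
[V, D] = eig((C + C') / 2);       % symmetrize before eig for numerical safety
A = V * sqrt(max(D, 0));          % clamp tiny negative eigenvalues; C = A * A'
M = (A * randn(2, 1000))';        % 1000 samples from N([0,0], C)
scatter(M(:,1), M(:,2)); axis tight   % every sample lies on the line y = x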

If we relax cov(X,Y) to 0.9, the samples scatter around the line instead of falling exactly on it:

If we use a negative cov(X,Y), say -0.9, then we have

C(X,Y) = \begin{bmatrix} 1, & -0.9 \\ -0.9, & 1 \end{bmatrix}

and the samples scatter around a line with slope -1.

All the plots above have a mean slope of either 1 or -1. How can we get other slope values? The answer is to change the bounding box defined by var(X) and var(Y). For example,

C(X,Y) = \begin{bmatrix} 2 & \sqrt{2} \\ \sqrt{2} & 1 \end{bmatrix}
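A quick sanity check of this example: \rho_{X,Y} = \sqrt{2} / (\sqrt{2} \cdot 1) = 1, so the samples still fall exactly on a line, but its slope is now \sigma_Y / \sigma_X = 1/\sqrt{2} rather than 1.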

Note that the covariance matrix C(X,Y) must be positive-definite, which means z^T C z > 0 for any non-zero vector z. In the bivariate case, with z = [a, b]^T, the constraint is equivalent to

a^2 var(X) + 2ab\, cov(X,Y) + b^2 var(Y) > 0

If cov(X,Y) = \pm\sqrt{var(X)}\sqrt{var(Y)}, the left side of the above inequality can be written as a perfect square and is thus only \geq 0. To guarantee > 0 for every non-zero z, we need

|cov(X,Y)| < \sqrt{var(X) var(Y)} = \sigma_X\sigma_Y
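(Explicitly, with cov(X,Y) = \pm\sigma_X\sigma_Y the left side becomes a^2\sigma_X^2 \pm 2ab\,\sigma_X\sigma_Y + b^2\sigma_Y^2 = (a\sigma_X \pm b\sigma_Y)^2, which is zero for the non-zero vector z = [\sigma_Y, \mp\sigma_X]^T, so such a C is only positive semi-definite.)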

This constrains the Pearson coefficient to [-1, 1], with the endpoints reached only in the degenerate (semi-definite) case. When \rho_{X,Y} is 1 or -1, all sample points of X and Y lie on a single line, and we say X and Y are linearly dependent.

Plots in this post were drawn with the following MATLAB commands:

M = sample_gaussian([0,0], [2,sqrt(2);sqrt(2),1], 1000);
cla; scatter(M(:,1), M(:,2)); axis tight

where function sample_gaussian can be found in my previous post.
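If you do not have sample_gaussian at hand, a rough equivalent, assuming the Statistics Toolbox is available, uses the built-in mvnrnd (which also accepts positive semi-definite covariance matrices):

M = mvnrnd([0,0], [2,sqrt(2);sqrt(2),1], 1000);   % same mean and covariance as above
cla; scatter(M(:,1), M(:,2)); axis tight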