Correlation Matrix: A simple way to map the statistical relationship between variables

Imagine you have a thousand options and you need to choose the top five based on certain criteria. How would you do it? Now take it a step further – imagine you’re a stock analyst. How would you select the top stocks to buy in a particular industry? The correlation coefficient matrix, though a bit of a mouthful, is quite popular with stock market traders. It helps them analyze market trends and make predictions for the future. The correlation coefficient matrix, or just the correlation matrix as it is popularly called, is related to the concept of covariance in statistics.

Reading a Basic Matrix

Before we move on to the correlation coefficient matrix, let’s take a quick look at a basic matrix first:
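$$ A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} $$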

In this matrix, there are three rows and three columns, with a total of nine elements. This is a basic square matrix.

We’ll use a square matrix to explain the concept of the correlation coefficient matrix. However, a correlation coefficient matrix will be much larger and will include many more elements if it’s calculated from real data sources.

Sample Variance

Now, let’s assume we have calculated the sample variance ($s^2$) of three distinct variables x, y and z. What is sample variance? The sample variance is just the average of the squared deviations from the sample mean (calculated from the sample values, the sample mean and the sample size). It shows how spread out a single variable is around its mean; its two-variable counterpart, the sample covariance, shows how two variables vary together.

A variance-covariance matrix is denoted by the Greek letter Σ. If we calculated the sample variances of the three variables x, y and z, they would be denoted by the notations $s_x^2$, $s_y^2$ and $s_z^2$. The formula for calculating the sample variance of a variable is as follows:

$$ s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2 $$
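As a quick sketch of how this formula works in practice, here is one way to compute a sample variance in Python with NumPy; the data values are purely illustrative:

```python
import numpy as np

# Purely illustrative sample for a single variable x
x = np.array([2.1, 2.5, 3.6, 4.0, 4.8])

# Sample variance: sum of squared deviations from the sample mean, divided by n - 1
n = len(x)
s2 = ((x - x.mean()) ** 2).sum() / (n - 1)

print(s2)                  # computed by hand from the formula above
print(np.var(x, ddof=1))   # the same value from NumPy's built-in, using the n - 1 divisor
```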

The first element of the matrix, $s_x^2$, represents the sample variance of x with itself. The second element, $s_{xy}$, represents the sample covariance of x and y, and so on. If you notice, three elements of the matrix – $s_{xy}$, $s_{xz}$ and $s_{yz}$ – each appear twice in the matrix: the upper half of the matrix mirrors the lower half. This makes the variance-covariance matrix a symmetric matrix.

Written out in full, the variance-covariance matrix of x, y and z looks like this:

$$ \Sigma = \begin{pmatrix} s_x^2 & s_{xy} & s_{xz} \\ s_{xy} & s_y^2 & s_{yz} \\ s_{xz} & s_{yz} & s_z^2 \end{pmatrix} $$

Keep in mind that this only shows the general layout of a variance-covariance matrix. In practice it is very difficult to calculate and maintain a variance-covariance or correlation matrix in the real world – there is a reason why companies employ actuaries and data analysts, after all!

If the covariance between two variables is positive, it means that as the value of one variable goes up, the value of the other variable tends to go up as well. If the covariance is negative, it means that as the value of one variable goes up, the value of the other tends to go down.
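To see these signs in practice, here is a minimal sketch of computing a variance-covariance matrix with NumPy; the variables and their values are purely illustrative:

```python
import numpy as np

# Purely illustrative samples: x rises while y falls, and z barely moves
x = np.array([2.1, 2.5, 3.6, 4.0, 4.8])
y = np.array([8.0, 7.4, 6.1, 5.2, 4.9])
z = np.array([1.0, 1.1, 0.9, 1.2, 1.0])

# np.cov treats each row as one variable and uses the n - 1 divisor by default
sigma = np.cov(np.vstack([x, y, z]))

print(sigma)          # 3 x 3 symmetric variance-covariance matrix
print(sigma[0, 1])    # covariance of x and y (negative: y falls as x rises)
print(sigma[0, 2])    # covariance of x and z (close to zero: little relationship)
```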

Reading a Correlation Matrix

The correlation matrix is calculated from the entries of the variance-covariance matrix. For example, if you wanted to calculate the correlation coefficient ($r_{xy}$) of two variables x and y, you would use the following formula:

$$ r_{xy} = \frac{s_{xy}}{s_x \, s_y} $$

We apply this formula to every entry of the variance-covariance matrix to build the correlation matrix. However, the first element in the first row, the second element in the second row and the third element in the third row of the correlation matrix – that is, the diagonal – are always one. Why is that so? Because each of those entries is the correlation of a variable with itself! Try calculating the correlation coefficient of the variable x with itself; it will always be one.

A correlation matrix takes the following form:

$$ \begin{pmatrix} 1 & r_{xy} & r_{xz} \\ r_{xy} & 1 & r_{yz} \\ r_{xz} & r_{yz} & 1 \end{pmatrix} $$

You can see that the correlation matrix is a symmetric matrix as well. The upper half of the matrix is mirrored by the lower half of the matrix.
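As a rough sketch (reusing the same kind of made-up data as above), this is one way to turn a variance-covariance matrix into a correlation matrix and confirm the diagonal of ones:

```python
import numpy as np

# Purely illustrative data: each row is one variable (x, y, z)
data = np.vstack([
    [2.1, 2.5, 3.6, 4.0, 4.8],   # x
    [8.0, 7.4, 6.1, 5.2, 4.9],   # y
    [1.0, 1.1, 0.9, 1.2, 1.0],   # z
])

sigma = np.cov(data)                 # variance-covariance matrix
std = np.sqrt(np.diag(sigma))        # standard deviations s_x, s_y, s_z
corr = sigma / np.outer(std, std)    # r_xy = s_xy / (s_x * s_y) for every pair of variables

print(corr)                                   # symmetric, with ones on the diagonal
print(np.allclose(corr, np.corrcoef(data)))   # True: matches NumPy's built-in correlation matrix
```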

Here is an example of what a correlation matrix might look like (the values are purely illustrative):

$$ \begin{pmatrix} 1 & 0.85 & -0.42 \\ 0.85 & 1 & -0.17 \\ -0.42 & -0.17 & 1 \end{pmatrix} $$

As you can see, all the values of the correlation coefficient lie between +1 and -1. In practice it’s almost impossible for a coefficient to be exactly +1 or -1, because in the real world no variable keeps a perfectly stable relationship with another variable at all times. If the coefficient is positive, a rise in the value of one variable will tend to be matched by a rise in the value of the other variable.

The closer the value of a coefficient is to +1 or -1, the stronger the relationship between the two data variables in question. If the value of the coefficient is positive or negative but very small (for example 0.0001234), there may be little to no relationship between the two variables. For a real-world example, think of the relationship between medicine prices and gold prices – a rise or drop in the price of gold will hardly affect medicine prices, meaning there is little to no relationship between the two variables.

Correlation Matrices as They Are Used in the Real World

A correlation matrix that is calculated for the stock market will probably show the short-term, medium-term and long-term relationships between data variables. For example, if we take the prices of gold and silver, we can see that over the long term the price of one rises as the price of the other rises. This shows a positive correlation between the two in the long term. However, this may not be the case over the short term (1 month) or the medium term (3–6 months). You can see why a correlation matrix would prove very useful for traders. To make a correct prediction, data is usually pulled from a suitably lengthy time period (5 years or more), put into numerical form and then studied through the correlation matrix.
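As a sketch of how a trader might compare short-, medium- and long-term relationships, here is one way to do it with pandas; the gold and silver price series below are simulated stand-ins, not real market data:

```python
import numpy as np
import pandas as pd

# Simulated daily prices standing in for roughly five years of real history
rng = np.random.default_rng(0)
days = pd.date_range("2019-01-01", periods=1250, freq="B")
gold = pd.Series(1500 + rng.normal(0, 5, len(days)).cumsum(), index=days)
silver = pd.Series(18 + 0.01 * (gold.values - 1500) + rng.normal(0, 0.5, len(days)), index=days)

prices = pd.DataFrame({"gold": gold, "silver": silver})

# Long-term view: correlation matrix over the whole sample
print(prices.corr())

# Short-term (about 1 month) and medium-term (about 6 months) rolling correlations
print(prices["gold"].rolling(21).corr(prices["silver"]).tail())
print(prices["gold"].rolling(126).corr(prices["silver"]).tail())
```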

A correlation matrix is also used by actuaries to calculate risks for big companies and banks. Risk factors become the data variables and the relationships between them are studied. Pairs of factors may be classified as high risk (correlation above 0.75), medium risk (0.25 to 0.75) and low risk (below 0.25). However, a correlation matrix built from hundreds of risk factors quickly becomes too large and too complicated to read directly. One way around this problem is to calculate the eigendecomposition of the matrix, which condenses it into a handful of dominant factors that are much easier to study and understand.
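As a minimal sketch of that last idea, assuming a small illustrative correlation matrix of three risk factors, you could use NumPy’s eigendecomposition like this:

```python
import numpy as np

# Illustrative correlation matrix for three risk factors; a real one could
# have hundreds of rows and columns
corr = np.array([
    [1.00, 0.80, 0.30],
    [0.80, 1.00, 0.15],
    [0.30, 0.15, 1.00],
])

# eigh is suited to symmetric matrices such as a correlation matrix;
# it returns eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(corr)

# The few largest eigenvalues summarise how much of the overall variation is
# driven by a handful of underlying factors - far easier to read than
# scanning every pairwise coefficient
print(eigenvalues[::-1])
print(eigenvalues.max() / eigenvalues.sum())   # share of variation explained by the dominant factor
```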