Data analysis forms the basis of statistics. It is essential in the business world today, as the data companies gather from the market influences each and every decision they make. The correlation coefficient formula is one of the most useful tools for drawing conclusions from that data.

## Sorting Data

Data that is obtained through research is generally converted into numeric form, so that further calculations can be made on the data and it becomes easy to handle. Some of the common methods to handle and analyze data include ranking data variables, organizing them on charts and graphs, sorting them into different categories and calculating the mean, the mode and the median of gathered data variables.
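As a rough illustration of the sorting and averaging described above, here is a minimal Python sketch using the standard `statistics` module. The `candy_per_week` figures are made up for the example and don't come from any real survey.

```python
from statistics import mean, median, mode

# Hypothetical survey responses: pieces of candy eaten per week
candy_per_week = [10, 9, 8, 9, 7, 9, 12]

# Sort the values from smallest to largest
sorted_values = sorted(candy_per_week)

# Calculate the three common averages
summary = {
    "mean": mean(candy_per_week),
    "median": median(candy_per_week),
    "mode": mode(candy_per_week),
}

print(sorted_values)
print(summary)
```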

## Understanding Data Variables

What are these data variables we keep mentioning? They can be anything: a person’s height, a person’s age, or how much of a product (like candy or basketballs) a person uses. Why are these data variables relevant? It’s obvious that a child will consume more candy than a grown-up. However, a child may not play as much basketball as a teenager or an adult. There is an inverse (negative) relationship between a person’s age and the candy they consume: as a person grows older, he consumes less candy. However, there is a direct (positive) relationship between a person’s age and the time spent playing basketball: a person plays more basketball as he grows older.

If you were a company making candy or basketballs, you would want to know the relationship between these data variables, so you could target your product toward a particular segment of the population.

## Using the Correlation Coefficient Formula: Find the Relationship between your Data Variables

The correlation coefficient formula is one of the most useful formulas in statistics. It measures the relationship between two data variables on a scale of -1 to +1. A result of +1 means that your two variables are a perfect positive match (which rarely happens). A result of 0 means there is no linear relationship between your variables. A result of -1 means your variables are a perfect negative match.

Let’s continue using our candy and basketball example. We have three working variables in that case: age, candy consumed and basketball played. If we calculated the relationship between age and candy consumed with the correlation coefficient formula, the value of the correlation coefficient would be somewhere between 0 and -1. Why? Because there is an inverse relationship between these two variables! The older a person gets, the less candy he or she consumes. If you plotted the data points for these two variables on a graph and drew a line through them, it would have a negative slope.

If we were to calculate the correlation coefficient for age and basketball, it would be positive: somewhere between 0 and 1. On a graph, the line you would draw through the data points would have a positive slope.

The correlation coefficient (also called the Pearson product-moment correlation coefficient) is a measure of the linear dependence between two variables x and y. In our example, the variable x would be age and the variable y would be candy (or basketball). It doesn’t matter if you switch the variables around: age could be y and candy or basketball could be x. The correlation coefficient still has the same value, because the formula treats x and y symmetrically.

## The Correlation Coefficient Formula Explained

The correlation coefficient formula is as follows:

$$
r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}
$$

Looks complicated? Let’s break it down:

r: The correlation coefficient is denoted by the letter r.

n: Number of paired values. If we were calculating the correlation coefficient for five people, the value of n would be 5.

x: This is the first data variable.

y: This is the second data variable.

Σ: The Sigma symbol (Greek) tells us to calculate the “sum of” whatever is tagged next to it.
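If you’d like to see how these pieces fit together in code, the following is one possible Python sketch of the formula. The function name `correlation_coefficient` and the sample numbers are our own choices for illustration, not part of any library.

```python
from math import sqrt

def correlation_coefficient(x, y):
    """Compute r from the sums n, Σx, Σy, Σxy, Σx² and Σy²."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of values")

    n = len(x)                                  # number of paired values
    sum_x = sum(x)                              # Σx
    sum_y = sum(y)                              # Σy
    sum_xy = sum(a * b for a, b in zip(x, y))   # Σxy
    sum_x2 = sum(a * a for a in x)              # Σx²
    sum_y2 = sum(b * b for b in y)              # Σy²

    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator

# Hypothetical data: y rises in step with x, so we expect a perfect positive match
r = correlation_coefficient([1, 2, 3, 4], [2, 4, 6, 8])
print(r)
```

The check for a zero denominator is there because the formula breaks down when every x value (or every y value) is the same.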

Example: Let’s calculate the correlation coefficient for a set of data to help you understand the formula better. Most scientific calculators work out the coefficient automatically if you input the data variables, but you should try it a couple of times on paper. It’ll help you grasp the concept better. Let’s take the ages of children as our x variable and the candy consumed as our y variable. Let’s assume that after research we found that as kids got older, they ate less candy. The following are the values of the data variables we recorded for three children (or a single child at three stages of his life).

| x (Age of Child) | y (Candy Consumed) |
|---|---|
| 6 | 10 |
| 7 | 9 |
| 8 | 8 |

Step 1: Find all the values we need

Here, n (the number of paired x and y values) would be 3.

| x | y | xy | x² | y² |
|---|---|---|---|---|
| 6 | 10 | 60 | 36 | 100 |
| 7 | 9 | 63 | 49 | 81 |
| 8 | 8 | 64 | 64 | 64 |

The other values we need are:

Σx = 6 + 7 + 8 = 21

Σy = 10 + 9 + 8 = 27

Σxy = 60 + 63 + 64 = 187

Σx² = 36 + 49 + 64 = 149

Σy² = 100 + 81 + 64 = 245
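The same table and column totals can be reproduced with a few lines of Python. This is only a sketch for checking the arithmetic; the variable names (`ages`, `candy`, `rows`, `totals`) are our own.

```python
# Data from the example: age of child (x) and candy consumed (y)
ages = [6, 7, 8]
candy = [10, 9, 8]

# Build one row per child: x, y, xy, x², y²
rows = [(x, y, x * y, x ** 2, y ** 2) for x, y in zip(ages, candy)]

# Add up each column to get Σx, Σy, Σxy, Σx², Σy²
totals = [sum(column) for column in zip(*rows)]

for row in rows:
    print(row)
print("totals:", totals)  # totals: [21, 27, 187, 149, 245]
```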

Step 2: Input Values into the Formula

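$$
\begin{aligned}
r &= \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} \\
  &= \frac{3(187) - (21)(27)}{\sqrt{\left[3(149) - (21)^2\right]\left[3(245) - (27)^2\right]}} \\
  &= \frac{561 - 567}{\sqrt{[447 - 441][735 - 729]}} \\
  &= \frac{-6}{\sqrt{(6)(6)}} \\
  &= \frac{-6}{\sqrt{36}} \\
  &= \frac{-6}{6} \\
  &= -1
\end{aligned}
$$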
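If you want to double-check the substitution without a calculator, the short Python sketch below plugs the five sums from Step 1 into the formula. The variable names are ours, chosen to mirror the symbols.

```python
from math import sqrt

# Values from Step 1
n = 3
sum_x, sum_y = 21, 27
sum_xy = 187
sum_x2, sum_y2 = 149, 245

# Top of the fraction: nΣxy - (Σx)(Σy) = 561 - 567
numerator = n * sum_xy - sum_x * sum_y

# Bottom of the fraction: square root of (447 - 441)(735 - 729) = sqrt(36)
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

r = numerator / denominator
print(numerator, denominator, r)  # -6 6.0 -1.0
```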

Explanation

The correlation coefficient we obtained is a perfect -1. This shows that there is a perfect negative (inverse) match between our two variables. As the value of one variable increases, the value of the other variable goes down. In reality, there would be some kids who didn’t eat candy when they were young, kids who didn’t eat much candy when they were young and kids who ate different amounts of candy at different ages. So the value of the coefficient would almost never be exactly -1. Instead, it would be somewhere between 0 and -1.
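To get a feel for what messier data does to the coefficient, here is a sketch with made-up numbers where candy consumption falls with age overall, but not in a perfectly straight line. The helper `pearson_r` is a hypothetical name that simply applies the formula from this article.

```python
from math import sqrt

def pearson_r(x, y):
    """Apply the sums-based correlation coefficient formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data: five children whose candy habits don't follow a neat line
ages = [6, 7, 8, 9, 10]
candy = [10, 7, 9, 6, 7]

r = pearson_r(ages, candy)
print(round(r, 2))  # -0.67
```

The result is still negative, so the inverse relationship is there, but it is weaker than the perfect -1 from the tidy three-child example.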

Of course, we took a relatively simple example to show you that data analysis isn’t as hard as it looks! You can apply the same principles to analyze data for a wide range of business situations.

Page Last Updated: February 2020