Types of Correlation: Tools for Determining Data Relationships

Article Summary

There are several types of correlation in statistics — positive, negative, and more specialized measures like Pearson's, Kendall Rank, and Spearman's. This article covers how each type works, scatter plots, and the correlation-causation distinction. You'll gain a clear foundation for interpreting data relationships confidently.

Ever wonder if one thing had to do with another thing? Like eating chocolate and pimples? I do. Sometimes. Okay, a lot of times!

You can find stuff like this out by using a statistical tool called correlation. Basically, any relationship between two variables is called a correlation, and there are positive and negative types. Variables such as these already occur in a population or a group and are not controlled by the person doing the study. When a second variable increases as the first one increases, like more cars overheating at higher temperatures, this is called a positive correlation. In contrast, when one variable goes up and the other goes down, like kids who eat more ice cream having fewer temper tantrums in a week, this is called a negative correlation.

In both these correlation types, there is no proof that a change in one variable causes the other variable to change. In other words, more ice cream may not be causing fewer temper tantrums. Correlations simply indicate that a relationship exists between the two variables. It is important to remember that causation and correlation are not the same.

Here is a course entitled Introductory Statistics Part 1: Descriptive Statistics that helps you learn applications and concepts of statistics at your own pace.

In statistics, any relationship between two data sets or two random variables is called ‘dependence.’ Correlation refers to any statistical relationship that involves dependence. Examples include the correlation between the appearance of kids and their parents, and between the price of a product and the demand for it. Since correlations indicate predictive relationships, they can be exploited in practice. For instance, an electric utility may produce less power on mild days because it knows the correlation between power demand and weather: in extreme weather, people use more electricity for heating and cooling.

This is why correlations are very useful in statistics. They describe not only whether variables are connected but also how strong the relationship is. For example, take the size of a city and its crime rate: observers can look for a correlation between a city’s population and how much crime it has. When you find out the correlation between these two variables, you can then make a statement such as, ‘when the size of the city increases, so does the crime rate,’ or ‘there is no relationship between city size and crime rate.’ Correlations can be either negative or positive, and the results depend on the type of correlation performed. Here is an article about learning statistics quickly and easily that could get you ahead in statistics class.

Kendall Rank Correlation

Named for Maurice Kendall, the British statistician, the Kendall Rank Correlation measures the strength of dependence between two random variables. Kendall can be used for further analysis when Spearman’s Correlation rejects the null hypothesis. It is used with ranked (ordinal) data, or with interval or ratio data that has been converted to ranks. It works by comparing pairs of observations. A pair is ‘concordant’ when the observation with the larger ‘x’ value also has the larger ‘y’ value, so both variables move in the same direction. A pair is ‘discordant’ when ‘x’ increases while ‘y’ decreases, or the other way around. The more the concordant pairs outnumber the discordant ones, the closer the correlation is to +1; the more the discordant pairs dominate, the closer it is to -1. Want to learn more? Here is a course called Practical Statistics for the User Experience I that shows you how to interpret large and small sample tests.
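To make the idea of concordant and discordant pairs concrete, here is a minimal Python sketch. It assumes the simplest version of the statistic (often called tau-a), where tied pairs count as neither concordant nor discordant. The function name `kendall_tau` and the sample numbers are made up for illustration.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Assumption: ties are simply counted as neither concordant nor discordant.
    """
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x2 - x1) * (y2 - y1)
        if direction > 0:        # x and y moved the same way
            concordant += 1
        elif direction < 0:      # x and y moved in opposite ways
            discordant += 1
    total_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / total_pairs

x = [1, 2, 3, 4]
tau_mixed = kendall_tau(x, [1, 3, 2, 4])      # 5 concordant pairs, 1 discordant
tau_up = kendall_tau(x, [10, 20, 30, 40])     # every pair concordant
tau_down = kendall_tau(x, [40, 30, 20, 10])   # every pair discordant
print(tau_mixed, tau_up, tau_down)
```

With four observations there are six pairs, so one discordant pair among them already pulls the value down from 1 to (5 - 1) / 6.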

Rank Correlation Coefficient

Rank correlation coefficients measure the extent to which, as one variable increases, the other tends to increase too, without requiring that a linear relationship describe the increase. Examples are Kendall’s Rank Correlation Coefficient and Spearman’s Rank Correlation Coefficient. You get a negative rank correlation if one variable increases while the other decreases. Many people think of rank correlation as an alternative to Pearson’s coefficient, used either to reduce the amount of calculation or to make the coefficient less sensitive to non-normal distributions. However, there is little mathematical basis for this view, since rank correlations measure a different type of relationship than Pearson’s coefficient does. They are best seen as measures of a different type of association rather than as alternative measures of a population’s correlation coefficient. To see how this type of correlation works, here are a few pairs of numbers to consider:

(102, 2000), (101, 500), (10, 100), (0, 1)

You will notice that from each pair to the next, as x decreases, so does y. Put the other way around, whenever x increases, y increases too. This makes it a perfect rank relationship, with a rank correlation coefficient of +1, even though the points do not fall on a straight line. In the same manner, if y always decreased whenever x increased, the rank correlation coefficient would be negative one (-1).
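Here is a small Python sketch that checks this on the four pairs above. It computes Spearman's coefficient the usual way, as Pearson's coefficient applied to the ranks, and compares it with Pearson's coefficient on the raw numbers. The helper names are made up for illustration, and the ranking helper assumes there are no tied values.

```python
import math

def ranks(values):
    """Rank values from 1 (smallest) upward. Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def pearson_r(xs, ys):
    """Pearson's coefficient from sums of deviations around the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [102, 101, 10, 0]
y = [2000, 500, 100, 1]

spearman_rho = pearson_r(ranks(x), ranks(y))  # correlation of the ranks
pearson_raw = pearson_r(x, y)                 # correlation of the raw values
print(spearman_rho, pearson_raw)
```

The rank version comes out as a perfect 1, while the raw Pearson value is noticeably below 1 because the points do not sit on a straight line. That gap is the sense in which the two coefficients measure different kinds of association.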

Pearson’s Product-Moment Coefficient

The most familiar measure of correlation between two quantities is Pearson’s Correlation Coefficient, also called Pearson’s Product-Moment Correlation Coefficient. Most of the time, people simply refer to it as the correlation coefficient. You get this number by dividing the covariance of the two variables by the product of their standard deviations:

r = covariance(x, y) / [(standard deviation of x) × (standard deviation of y)]

This coefficient was developed by Karl Pearson from a slightly different but similar idea of Francis Galton’s. Karl Pearson is also widely credited with founding the discipline of mathematical statistics. Basically, the Pearson Coefficient is a statistical measurement tool, expressed as a math formula, that is used with interval or ratio (quantitative) data. It is a simple linear correlation, which means it measures how closely the relationship between the two variables follows a straight line, with one changing at a constant rate relative to the other. Pearson’s coefficient measures both the strength of a correlation and whether the relationship is negative or positive. The closer the value of r comes to +1.00 or -1.00, the stronger the correlation. The closer the value of r comes to zero, the weaker the correlation. So if r equals .90 or -.90, this would be a stronger relationship than .09 or -.09.
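The formula can be sketched in a few lines of Python. This is a minimal illustration and not a replacement for a statistics library: the function names are made up, the sample (n - 1) versions of covariance and standard deviation are assumed, and the temperature and overheating numbers are invented.

```python
import math

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)

def std_dev(values):
    """Sample standard deviation: square root of a variable's covariance with itself."""
    return math.sqrt(covariance(values, values))

def pearson_r(xs, ys):
    """r = covariance / (standard deviation of x * standard deviation of y)."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))

# Invented numbers: daily high temperature and cars seen overheating.
temperature = [20, 25, 30, 35, 40]
overheated_cars = [2, 4, 6, 8, 10]     # rises in step with temperature
heaters_sold = [10, 8, 6, 4, 2]        # falls in step with temperature

r_positive = pearson_r(temperature, overheated_cars)
r_negative = pearson_r(temperature, heaters_sold)
print(r_positive, r_negative)
```

Because both invented series change at a perfectly constant rate, r lands at essentially +1 and -1. Real data is noisier, so real values of r fall somewhere in between.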

Scatter Plot

Also known as a scatter diagram, a scatter plot shows the relationship between two interval or ratio variables on a coordinate grid. Here, you only see points. It is step one in regression analysis. A scatter plot is a pretty fast way of seeing whether there is an association between variables and how strong that association happens to be. Scatter plots also show a relationship’s direction. Points clustered tightly along a straight line suggest a strong relationship. A relationship can still exist even if a few points fall away from the line. If the points are scattered with no clustering at all, there is no apparent relationship and the pattern is considered random. If you want to take this a step further, here is a course about Inferential Statistics that shows you how to interpret and analyze tests step by step.

Correlation Determination

Correlation determination, also known as the coefficient of determination or r squared, measures the proportional reduction in error that a linear regression achieves. Put another way, it shows the proportion of the dependent variable’s total variation that is explained by the other variable. R squared itself is never negative. When you take its square root to get back to r, a negative sign is added to the answer if the covariance was negative. The formula used to determine this is:

r squared = covariance(x, y) squared / [(variance of x) × (variance of y)]
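As a rough Python sketch of this formula, the function below computes r squared from the covariance and the two variances, then recovers r by taking the square root and copying the sign of the covariance. The function name and the data are invented for illustration, and sample (n - 1) variances are assumed.

```python
import math

def r_squared_and_r(xs, ys):
    """Return (r squared, r) using covariance squared / (variance x * variance y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    r_squared = cov ** 2 / (var_x * var_y)
    # The square root loses the sign, so take it from the covariance.
    r = math.copysign(math.sqrt(r_squared), cov)
    return r_squared, r

# Invented data: y falls as x rises, so the covariance is negative.
x = [1, 2, 3, 4, 5]
y = [10, 8, 7, 4, 1]

r_squared, r = r_squared_and_r(x, y)
print(r_squared, r)
```

For these numbers r squared is about 0.968, meaning roughly 97% of the variation in y lines up with the straight-line relationship, and r comes back negative because the covariance is negative.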

Other Correlation Types

Correlation ratios are able to detect almost any functional dependency. Other measures detect even more general dependencies, such as the entropy-based mutual information, the total correlation, and the dual total correlation.

The coefficient of determination generalizes the correlation coefficient to relationships beyond simple linear regression.

One way of capturing a more complete view of the dependence between variables is to consider the copula between them.

Another type, the polychoric correlation, is applied to ordinal data and aims to estimate the correlation between theorized latent variables.

Brownian Correlation or Covariance, also known as distance correlation, was made to address a deficiency of Pearson’s correlation: Pearson’s coefficient can be zero even when the random variables are dependent. In the Brownian version, a zero distance correlation or a zero Brownian correlation implies independence.
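The Pearson deficiency mentioned above is easy to see with a tiny Python sketch. In the invented data below, y is completely determined by x (it is x squared), yet Pearson's coefficient comes out as zero because the relationship is not a straight line. The sketch only demonstrates the problem. It does not compute the Brownian or distance correlation itself.

```python
import math

def pearson_r(xs, ys):
    """Pearson's coefficient from sums of deviations around the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
y = [value ** 2 for value in x]   # y depends entirely on x

r_dependent = pearson_r(x, y)
print(r_dependent)
```

The rising right half of the curve cancels the falling left half, so the linear measure reports no relationship at all.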

The correlation coefficient alone is generally not enough to define the dependence between random variables. It defines the dependence structure completely only in very specific cases, such as a multivariate normal distribution.

A Note on Correlation and Causality

Causality and correlation are both used for reporting findings in social studies and experiments. The thing is, the media tends to use the two as if they were one and the same. While a causal relationship does show up as a correlation, it is a logical fallacy to mistake a correlation for a cause-and-effect relationship. It is important to know the difference between the two so that experiment results can be interpreted properly.

A correlation is a relationship between two things that is not necessarily one of cause and effect. For example, ‘there are higher rates of lung cancer in people who smoke’ is a statement of positive correlation, since lung cancer increases as smoking does. This is not the same as causality. On the flip side of the coin, ‘causality’ means that one thing causes another, or that there is a cause-effect relationship between them. Causality is hard to prove, since you need evidence both that a relationship exists between the two things and that the relationship is one of cause and effect. For instance, a statement of causality would be, ‘lung cancer is caused by smoking,’ since it says that one thing is actually caused by another.

The primary difference between causality and correlation is that correlation does not prove causality. Two things can be related without B being caused by A. Thing A may be caused by Thing B, or some other factor may be causing them both. For instance, studies show that pet owners have lower depression rates than non-pet owners. This is a negative correlation. However, it does not mean that pet ownership causes lower depression rates, or that lower depression rates cause people to become pet owners. It just means that the two are related, without pinpointing the cause.
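The ‘some other factor may be causing them both’ case can be illustrated with a toy Python sketch. Everything in it is invented: a hidden variable drives two other variables, A and B, which never influence each other, and yet A and B end up strongly correlated.

```python
import math

def pearson_r(xs, ys):
    """Pearson's coefficient from sums of deviations around the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical hidden factor that drives both A and B.
hidden = list(range(10))

# Small fixed wiggles stand in for everything else that affects A and B.
wiggle_a = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]
wiggle_b = [-1, -1, 1, 1, -1, -1, 1, 1, -1, -1]

thing_a = [2 * h + w for h, w in zip(hidden, wiggle_a)]  # depends only on hidden
thing_b = [3 * h + w for h, w in zip(hidden, wiggle_b)]  # depends only on hidden

r_a_b = pearson_r(thing_a, thing_b)
print(r_a_b)
```

The correlation between A and B is close to 1 even though neither appears in the formula for the other. The number on its own cannot tell you which of the three explanations (A causes B, B causes A, or a shared cause) is the right one.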

Hope this helps! Here is a course entitled Workshop in Probability and Statistics that shows you the step by step methods of statistics fundamentals.