Linear correlation: the linear association between variables

Article Summary

Linear correlation describes the straight-line relationship between two variables, measured by the Pearson correlation coefficient (r), which ranges from -1 to 1. This article covers how to calculate r, interpret its sign and magnitude, and apply it to real examples. You'll gain a clear, practical understanding of linear correlation in data analysis.

While a correlation coefficient measures the strength of association between two variables, linear correlation refers specifically to association that follows a straight line. Visually, it is any relationship between two variables whose points fall along a straight line when plotted against each other on a graph. Like such a plot, descriptive statistics uses numerical and graphical techniques to summarize data, look for patterns, and present information in a useful and convenient way.

The Pearson product-moment correlation coefficient measures the strength of the linear association between two variables. The correlation coefficient of a sample is most commonly denoted by r, and the correlation coefficient of a population by ρ (sometimes R). It is widely used in statistics, and also in mathematics and science, as a measure of the strength of the linear relationship between two variables. More simply, it expresses how closely changes in one variable track changes in another. It does not by itself show that one variable causes the other.

Before moving on to interpreting linear association, let’s start with how to find the correlation coefficient.

First, be sure you’re working with paired data: two sets of values in which each x value is matched with a y value. Each pair is denoted by (x_i, y_i).

You begin with four preliminary calculations. The quantities from these calculations are used in the subsequent steps of the calculation of r:
1. Calculate x̄, the mean of all of the first coordinates of the data x_i.
2. Calculate ȳ, the mean of all of the second coordinates of the data y_i.
3. Calculate s_x, the sample standard deviation of all of the first coordinates of the data x_i.
4. Calculate s_y, the sample standard deviation of all of the second coordinates of the data y_i.
5. Use the formula (z_x)_i = (x_i – x̄) / s_x to calculate a standardized value for each x_i.
6. Use the formula (z_y)_i = (y_i – ȳ) / s_y to calculate a standardized value for each y_i.
7. Multiply corresponding standardized values: (z_x)_i (z_y)_i.
8. Add the products from the last step together.
9. Divide the sum from the previous step by n – 1, where n is the total number of points in the set of paired data. The result is the correlation coefficient r.
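
The steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library. The function name `pearson_r` and the sample data are assumptions made for this example, not part of the original procedure.

```python
import math


def pearson_r(xs, ys):
    """Pearson r computed from standardized values, following the steps above."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired observations")

    # Means of the x and y coordinates
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Sample standard deviations (note the n - 1 divisor)
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when one variable is constant")

    # Standardized value for each x_i and each y_i
    z_x = [(x - x_bar) / s_x for x in xs]
    z_y = [(y - y_bar) / s_y for y in ys]

    # Multiply corresponding standardized values, add them up, divide by n - 1
    total = sum(zx * zy for zx, zy in zip(z_x, z_y))
    return total / (n - 1)


# Made-up paired data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 4))  # 0.7746
```

For this data the deviations give a cross-product sum of 6 and squared-deviation sums of 10 and 6, so r = 6 / √60 ≈ 0.7746, a fairly strong positive linear relationship.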

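The complete formula looks like this:

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$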

The sign and the absolute value of a correlation coefficient describe the direction and the magnitude of the relationship between two variables.

The value of a correlation coefficient ranges between -1 and 1.
The greater the absolute value of a correlation coefficient, the stronger the linear relationship.
The strongest linear relationship is indicated by a correlation coefficient of -1 or 1.
The weakest linear relationship is indicated by a correlation coefficient equal to 0.
A positive correlation means that if one variable gets bigger, the other variable tends to get bigger.
A negative correlation means that if one variable gets bigger, the other variable tends to get smaller.
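
To see these rules in action, the sketch below computes r for three small made-up data sets: an exact rising line, an exact falling line, and a rising trend with some scatter. The helper `pearson_r` and all of the data are hypothetical, chosen only to illustrate sign and magnitude.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson r: sum of products of standardized values, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)


xs = [1, 2, 3, 4, 5]
rising_line = [2 * x + 1 for x in xs]    # points lie exactly on a rising line
falling_line = [10 - 3 * x for x in xs]  # points lie exactly on a falling line
noisy_rise = [2, 4, 5, 4, 5]             # upward trend with scatter

r_rising = pearson_r(xs, rising_line)
r_falling = pearson_r(xs, falling_line)
r_noisy = pearson_r(xs, noisy_rise)

print(f"rising line:  r = {r_rising:.2f}")
print(f"falling line: r = {r_falling:.2f}")
print(f"noisy rise:   r = {r_noisy:.2f}")
```

The two exact lines sit at the extremes of the range (up to floating-point rounding), while the scattered data gives a positive value strictly between 0 and 1.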

It is important to note that r equal to 0 does not mean there is no relationship between the two variables. Rather, it means there is no linear relationship. The Pearson product-moment correlation coefficient measures only linear relationships.
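
A small sketch makes this concrete. In the made-up data below, y is completely determined by x (y = x²), yet r comes out as 0 because the relationship is a curve rather than a line. The variable names and data are assumptions for illustration. The code uses the algebraically equivalent form of r, the sum of cross-products of deviations divided by (n – 1)·s_x·s_y.

```python
from statistics import mean, stdev

# Hypothetical data: y depends perfectly on x, but not linearly
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

n = len(xs)
x_bar, y_bar = mean(xs), mean(ys)

# Sum of cross-products of deviations: the numerator of r
cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Dividing by (n - 1) * s_x * s_y gives the same r as the standardized-value steps
r = cross / ((n - 1) * stdev(xs) * stdev(ys))
print(r)
```

The positive and negative cross-products cancel exactly for this symmetric data, so r is 0 even though knowing x tells you y precisely.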

Let’s put these rules to use with a popular statistics example.

Question: A magazine reported the following correlations.

The correlation between car weight and car reliability is -0.30.
The correlation between car weight and annual maintenance cost is 0.20.

Which of the following statements are true?

I. Heavier cars tend to be less reliable.

II. Heavier cars tend to cost more to maintain.

III. Car weight is related more strongly to reliability than to maintenance cost.

Options:

(A) I only

(B) II only

(C) III only

(D) I and II only

(E) I, II, and III

Answer: The correct answer is (E). The correlation between car weight and reliability is negative. This means that reliability tends to decrease as car weight increases. The correlation between car weight and maintenance cost is positive. This means that maintenance costs tend to increase as car weight increases.

The strength of a relationship between two variables is indicated by the absolute value of the correlation coefficient. The correlation between car weight and reliability has an absolute value of 0.30, which indicates a linear relationship, though not a very strong one (the strongest possible is indicated by a coefficient of -1 or 1). The correlation between car weight and maintenance cost has an absolute value of 0.20, which indicates a slightly weaker linear relationship. Therefore, the relationship between car weight and reliability is stronger than the relationship between car weight and maintenance cost.
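
The same reasoning can be written as three simple checks. This is only an illustrative sketch: the variable names are made up, and the two coefficients are the values reported in the question.

```python
# Correlations reported in the question
r_weight_reliability = -0.30
r_weight_maintenance = 0.20

# I. Heavier cars tend to be less reliable: the sign is negative
statement_i = r_weight_reliability < 0

# II. Heavier cars tend to cost more to maintain: the sign is positive
statement_ii = r_weight_maintenance > 0

# III. Weight is related more strongly to reliability: compare absolute values
statement_iii = abs(r_weight_reliability) > abs(r_weight_maintenance)

print(statement_i, statement_ii, statement_iii)  # True True True
```

Direction comes from the sign and strength from the absolute value, which is why -0.30 counts as the stronger relationship even though 0.20 is the larger number.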

Try using the formula on other pairs of variables to get more practice.
