Correlation coefficients are used in statistics to measure how strong the relationship between two variables is. There are several types of correlation coefficient; Pearson's correlation is the one commonly used in linear regression, and it is widely used throughout statistics. In this article, we will learn about the correlation coefficient formula with an example. Let us begin learning!
Correlation Coefficient Formula
What is the Correlation?
Correlation refers to a process for establishing whether or not a relationship exists between two given variables. Through the correlation coefficient, one can get a general idea of whether or not two variables are related. Many measures are available for variables measured at the ordinal or higher level of measurement, but correlation is the most commonly used approach.
Here we will see how to calculate and interpret correlation coefficients for ordinal and interval level scales. Methods of correlation will summarize the relationship between two variables in a single number known as the correlation coefficient. The correlation coefficient is usually shown by the symbol r and it ranges from -1 to +1.
A correlation coefficient very close to zero, whether positive or negative, implies little or no linear relationship between the two variables. A correlation coefficient close to +1 means that an increase in one of the variables is associated with an increase in the other variable.
A correlation coefficient close to -1 means that an increase in one of the variables is associated with a decrease in the other variable. In what follows, the two variables are denoted X and Y.
Pearson Correlation Coefficient Formula:
It is the most common formula for measuring the linear dependence between two data sets. Its value lies between -1 and +1. When the coefficient is zero, the data are considered to have no linear relationship.
r = \(\frac{N\times \sum{XY}-(\sum{X}\sum{Y})}{\sqrt{[N\sum{X^2}-(\sum{X})^2][N\sum{Y^2}-(\sum{Y})^2]}}\)
Where,
r | Pearson Correlation Coefficient |
N | Number of Data Pairs |
\(\sum X\) | Sum of the First Variable's Values |
\(\sum Y\) | Sum of the Second Variable's Values |
\(\sum XY\) | Sum of the Products of the First and Second Values |
\(\sum X^2\) | Sum of the Squares of the First Values |
\(\sum Y^2\) | Sum of the Squares of the Second Values |
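The formula above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `pearson_r` and its error handling are assumptions made for this example, and each intermediate sum corresponds to one row of the symbol table.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient from the raw-sum formula.

    `pearson_r` is a hypothetical helper name chosen for this sketch.
    """
    if len(xs) != len(ys):
        raise ValueError("X and Y must contain the same number of values")

    n = len(xs)                                # N: number of data pairs
    sum_x = sum(xs)                            # sum of X
    sum_y = sum(ys)                            # sum of Y
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # sum of XY
    sum_x2 = sum(x * x for x in xs)            # sum of X^2
    sum_y2 = sum(y * y for y in ys)            # sum of Y^2

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        # Happens when X or Y is constant; r is undefined in that case.
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator
```

Calling `pearson_r([1, 2, 3], [2, 4, 6])` gives a value of 1, and reversing the second list gives -1, matching the interpretation of the two extremes described earlier.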
Solved Examples
Q.1: Calculate the correlation coefficient for the following data:
X = 4, 8, 12, 16 and
Y = 5, 10, 15, 20.
Solution:
The given variables are,
X = 4, 8, 12, 16 and
Y = 5, 10, 15, 20
To find the correlation coefficient of these data, we first construct a table to obtain the values required by the formula.
X | Y | X² | Y² | XY |
4 | 5 | 16 | 25 | 20 |
8 | 10 | 64 | 100 | 80 |
12 | 15 | 144 | 225 | 180 |
16 | 20 | 256 | 400 | 320 |
\(\sum X\) = 40 | \(\sum Y\) = 50 | \(\sum X^2\) = 480 | \(\sum Y^2\) = 750 | \(\sum XY\) = 600 |
Now,
r = \(\frac{N\times \sum{XY}-(\sum{X}\sum{Y})}{\sqrt{[N\sum{X^2}-(\sum{X})^2][N\sum{Y^2}-(\sum{Y})^2]}}\)
Putting in all the values, with N = 4 data pairs,
r = \(\frac{4\times 600-(40\times 50)}{\sqrt{[4\times 480-(40)^2][4\times 750-(50)^2]}}\)
Solving, we get
r = \(\frac{2400-2000}{\sqrt{320\times 500}}\)
r = \(\frac{400}{17.89\times 22.36}\)
r = \(\frac{400}{400}\)
r = 1
Therefore, the correlation coefficient is 1.
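As a quick check of the arithmetic in this example, the same steps can be reproduced in a few lines of Python. This is only a sketch: the variable names are chosen for illustration, and the comments show the values from the table.

```python
import math

X = [4, 8, 12, 16]
Y = [5, 10, 15, 20]
N = len(X)                                  # 4 data pairs

# Column totals from the table
sum_x = sum(X)                              # 40
sum_y = sum(Y)                              # 50
sum_x2 = sum(x * x for x in X)              # 480
sum_y2 = sum(y * y for y in Y)              # 750
sum_xy = sum(x * y for x, y in zip(X, Y))   # 600

numerator = N * sum_xy - sum_x * sum_y      # 2400 - 2000 = 400
denominator = math.sqrt(
    (N * sum_x2 - sum_x ** 2) * (N * sum_y2 - sum_y ** 2)
)                                           # sqrt(320 * 500) = 400

r = numerator / denominator
print(r)  # 1.0
```

The result of 1 is expected here because every Y value is exactly 1.25 times the corresponding X value, so the points lie on a straight line with positive slope.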