Rank Correlation: Spearman Coefficient, Methods, Formula, Examples

Sometimes two random variables show no marked linear relationship, yet a monotonic relation is clearly visible: as one increases, the other consistently increases or consistently decreases. Pearson’s Correlation Coefficient, in this case, would give us the strength and direction of the linear association only. Herein lies the advantage of the Spearman Rank Correlation method, which instead gives us the strength and direction of the monotonic relation between the variables of interest. This can be a good starting point for further evaluation.

The Spearman Rank-Order Correlation Coefficient

Spearman’s Correlation Coefficient, represented by ρ or by r_R, is a nonparametric measure of the strength and direction of the association between two ranked variables. It measures the degree to which a relationship is monotonic, i.e., how consistently one variable increases (or decreases) as the other increases, for two continuous or ordinal variables.

Monotonicity is a “less restrictive” condition than linearity: every linear relationship is monotonic, but not every monotonic relationship is linear. Although monotonicity is not strictly a requirement for computing Spearman’s correlation, it is not meaningful to use Spearman’s correlation to determine the strength and direction of a monotonic relationship if we already know the relationship between the two variables is not monotonic.
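To make the difference between monotonic and linear concrete, here is a minimal sketch that compares the two coefficients on data that is monotonic but not linear. The data (y = x³) and the helper names `pearson` and `simple_ranks` are made up for illustration. The sketch also relies on a standard equivalence not stated above: Spearman’s coefficient equals Pearson’s coefficient computed on the ranks.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def simple_ranks(values):
    """Rank 1 = smallest value. Assumes all values are distinct (no ties)."""
    position = {v: i + 1 for i, v in enumerate(sorted(values))}
    return [position[v] for v in values]

# Illustrative data: y grows with x at every step, but not along a straight line.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [v ** 3 for v in x]

pearson_r = pearson(x, y)
# Assumption: Spearman's coefficient computed as Pearson's coefficient on ranks.
spearman_r = pearson(simple_ranks(x), simple_ranks(y))

print(f"Pearson  r = {pearson_r:.3f}")   # below 1: the relation is not linear
print(f"Spearman r = {spearman_r:.3f}")  # 1 up to rounding: perfectly monotonic
```

Because y increases whenever x does, the two rank lists are identical, so the rank-based coefficient reaches its maximum even though the points do not lie on a straight line.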

On the other hand, if the relationship appears linear (as assessed via a scatterplot), one would run Pearson’s correlation, because it measures the strength and direction of a linear relationship.


Spearman Ranking of the Data

We must rank the data under consideration before proceeding with the Spearman’s Rank Correlation evaluation. This is necessary because we need to check whether, as one variable increases, the other follows a monotonic relation with respect to it (consistently increases or consistently decreases).

Thus, at every level, we need to compare the values of the two variables. The method of ranking assigns such ‘levels’ to each value in the dataset so that we can easily compare them.

Assign the numbers 1 to n (the number of data points) to the values of each variable in order from highest to lowest, so that the highest value gets rank 1.
If two or more values are identical, assign each of them the arithmetic mean of the ranks that they would otherwise have occupied.

For example, Selling Price values given: 28.2, 32.8, 19.4, 22.5, 20.0, 22.5. The corresponding ranks are: 2, 1, 6, 3.5, 5, 3.5. The highest value 32.8 is given rank 1, 28.2 is given rank 2, and so on. Two values are identical (22.5); they would otherwise have occupied ranks 3 and 4, so each is given the arithmetic mean of those ranks, $\frac{3 + 4}{2} = 3.5$. The next value, 20.0, then takes rank 5, and the lowest value, 19.4, takes rank 6.
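The ranking rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function name `average_ranks` and its `descending` parameter are hypothetical names chosen for this example.

```python
def average_ranks(values, descending=True):
    """Return ranks 1..n; tied values share the mean of the ranks they span."""
    # Indices of the values in ranking order (highest first by default).
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value in order equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Positions are 0-based and ranks are 1-based, hence the +1 on each end.
        mean_rank = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

prices = [28.2, 32.8, 19.4, 22.5, 20.0, 22.5]
ranks = average_ranks(prices)
print(ranks)
```

With ranks assigned from highest to lowest, the two tied values of 22.5 share rank 3.5, and the two smallest values take ranks 5 and 6.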


The Formula for Spearman Rank Correlation

$$ r_R = 1 - \frac{6\Sigma_i {d_i}^2}{n(n^2 - 1)} $$

where n is the number of data points of the two variables and d_i is the difference between the ranks of the i-th element of each random variable considered. This simplified formula is exact when all ranks are distinct; with many tied ranks it is only approximate. The Spearman correlation coefficient, ρ, can take values from -1 to +1.

A ρ of +1 indicates a perfect positive association of ranks.
A ρ of zero indicates no association between ranks.
A ρ of -1 indicates a perfect negative association of ranks.
The closer ρ is to zero, the weaker the association between the ranks.
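As a rough sketch, the formula translates directly into code. The function name `spearman_from_ranks` is hypothetical, and the inputs are assumed to be already-ranked data without ties.

```python
def spearman_from_ranks(rank_x, rank_y):
    """r_R = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), for ranks without ties."""
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("need two rank lists of equal length, with n >= 2")
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

same = spearman_from_ranks([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])      # identical order
reversed_ = spearman_from_ranks([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])  # opposite order
print(same, reversed_)  # 1.0 -1.0
```

Identical rankings give every d_i = 0 and hence +1. Fully reversed rankings with n = 5 give Σd² = 40, so the fraction is 240/120 = 2 and the coefficient is -1.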

Solved Example on Spearman Rank Correlation

Question: The following table provides data about the percentage of students who have free university meals and their CGPA scores. Calculate the Spearman’s Rank Correlation between the two and interpret the result.

State University	% of students having free meals	% of students scoring above 8.5 CGPA

Pune	14.4	54
Chennai	7.2	64
Delhi	27.5	44
Kanpur	33.8	32
Ahmedabad	38.0	37
Indore	15.9	68
Guwahati	4.9	62

Solution: Let us first assign the random variables to the required data –

X – % of students having free meals
Y – % of students scoring above 8.5 CGPA

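Before proceeding with the calculation, we’ll need to assign ranks to the data corresponding to each state university. In the table below, rank 1 is given to the lowest value of each variable; the coefficient is unaffected by the direction of ranking as long as both variables are ranked in the same direction. We construct the table of ranks as below –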

State University	R_X = Rank of X	R_Y = Rank of Y	d = (R_X – R_Y)	d²

Pune	3	4	-1	1
Chennai	2	6	-4	16
Delhi	5	3	2	4
Kanpur	6	1	5	25
Ahmedabad	7	2	5	25
Indore	4	7	-3	9
Guwahati	1	5	-4	16
				Σd² = 96

Now, using the formula (with n = 7 here):
$$ r_R = 1 - \frac{6\Sigma_i {d_i}^2}{n(n^2 - 1)} $$
$$ = 1 - \frac{6 \times 96}{7 \times (49 - 1)} $$
$$ = 1 - \frac{576}{336} $$
$$ = -0.714 $$
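The arithmetic can be double-checked with a short script. This is a sketch under the assumption that no values are tied (true for this data); the helper name `ranks_no_ties` is made up for illustration, and rank 1 goes to the smallest value so that the ranks match the table.

```python
universities = ["Pune", "Chennai", "Delhi", "Kanpur", "Ahmedabad", "Indore", "Guwahati"]
x = [14.4, 7.2, 27.5, 33.8, 38.0, 15.9, 4.9]  # % of students having free meals
y = [54, 64, 44, 32, 37, 68, 62]              # % of students scoring above 8.5 CGPA

def ranks_no_ties(values):
    """Rank 1 = smallest value. Assumes all values are distinct."""
    position = {v: i + 1 for i, v in enumerate(sorted(values))}
    return [position[v] for v in values]

rx, ry = ranks_no_ties(x), ranks_no_ties(y)
d2 = [(a - b) ** 2 for a, b in zip(rx, ry)]   # squared rank differences

n = len(x)
r = 1 - 6 * sum(d2) / (n * (n ** 2 - 1))

for name, a, b, sq in zip(universities, rx, ry, d2):
    print(f"{name:10s} {a} {b} {a - b:+d} {sq}")
print("sum of d^2 =", sum(d2))   # 96
print("r_R =", round(r, 3))      # -0.714
```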

This fairly strong negative correlation implies that universities with a higher percentage of students having free meals tend to have a lower percentage of students scoring above 8.5 CGPA (and vice versa). Other questions can be solved in the same way.
