The range and the mean deviation serve their purpose in characterizing the spread of a distribution, but they are limited in their application and suffer from serious drawbacks. The standard deviation, together with the coefficient of variation derived from it, is therefore an improved measure of the dispersion of a given dataset. It can also be used as a parameter to characterize and compare different distribution curves.
It builds upon the concept of the mean deviation and is actually the root mean squared deviation of a dataset. In financial markets, it is a common term used in deals involving stocks, mutual funds, ETFs and others. So now let’s understand what this important quantity actually means!
Standard Deviation
As the name suggests, this quantity is a standard measure of the deviation of the data in a distribution, and it is usually represented by s or σ. It uses the arithmetic mean of the distribution as the reference point: the deviations of all the data values from this mean are squared, averaged, and the square root of that average is taken.
Therefore, we define the formula for the standard deviation of the distribution of a variable X with n data points as – $$ s = \sqrt{\frac{\Sigma(x_i - \bar{x})^2}{n}} $$
This formula can sometimes be useful in this alternate form as well – $$ s = \sqrt{\frac{\Sigma x_i^2}{n} - {\bar{x}}^2} $$
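As a minimal sketch, the two forms can be checked against each other numerically. The function names and the sample values below are illustrative assumptions, not part of the original text.

```python
import math

def std_definition(xs):
    """Standard deviation from the defining formula (divide by n)."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / n)

def std_shortcut(xs):
    """Same quantity from the alternate form: mean of squares minus square of mean."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum(x * x for x in xs) / n - mean ** 2)

# Illustrative data (assumed for this sketch)
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(std_definition(data))
print(std_shortcut(data))
```

Both functions divide by n, as in the formulas above. Python's standard library offers `statistics.pstdev` for the same divide-by-n quantity.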
Also, for a frequency distribution in which the value x_{i} occurs with frequency f_{i}, we can compute the standard deviation as – $$ s = \sqrt{\frac{\Sigma f_i(x_i - \bar{x})^2}{N}} $$
Alternatively – $$ s = \sqrt{\frac{\Sigma f_ix_i^2}{N} - {\bar{x}}^2} $$
Similarly, for a grouped frequency distribution, the following step-deviation formula is easy and efficient to use – $$ s = C \times \sqrt{\frac{\Sigma f_i{d_i}^2}{N} - \left(\frac{\Sigma f_id_i}{N}\right)^2} $$
where, \(d_i = \frac{x_i - A}{C}\).
N = Σ f_{i} – the total number of observations
x_{i} – the midpoint of the i’th class
A – an assumed mean, conveniently the mid-point value of the median class (any class mid-point gives the same result)
C – class width
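The step-deviation computation can be sketched as follows. Since d_i is measured in units of the class width, the result is multiplied by C to return to the original units. The function names and the class data are hypothetical, chosen only for illustration.

```python
import math

def grouped_std(midpoints, freqs, assumed_mean, class_width):
    """Standard deviation of grouped data via step deviations d_i = (x_i - A) / C."""
    n = sum(freqs)  # total number of observations
    d = [(x - assumed_mean) / class_width for x in midpoints]
    mean_d = sum(f * di for f, di in zip(freqs, d)) / n
    mean_d2 = sum(f * di * di for f, di in zip(freqs, d)) / n
    # Scale back by the class width to recover the original units
    return class_width * math.sqrt(mean_d2 - mean_d ** 2)

def direct_std(midpoints, freqs):
    """Same quantity from the plain frequency-distribution formula, for comparison."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
    return math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / n)

# Hypothetical classes 0-10, 10-20, ..., 40-50 (class width 10)
midpoints = [5, 15, 25, 35, 45]
freqs = [2, 3, 5, 3, 2]

print(grouped_std(midpoints, freqs, assumed_mean=25, class_width=10))
print(grouped_std(midpoints, freqs, assumed_mean=15, class_width=10))
print(direct_std(midpoints, freqs))
```

All three printed values should agree up to floating-point rounding, which illustrates that the choice of A only affects the convenience of the arithmetic, not the result.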
Properties of the Standard Deviation
- Since we interpret the value of the standard deviation as the typical spread of the distribution’s data points about the mean, it serves directly as a measure of dispersion. It is an absolute measure of dispersion, since its value carries the units of the data and depends on the scale of each distribution, even for two distributions with the same shape.
- The value of the standard deviation is never negative. This follows from the formula, where we take the positive square root of a non-negative quantity. It also means that we are only dealing with the magnitude of the deviation (difference) of the values from the mean. We don’t care about the direction, that is, whether a value lies to the left or to the right of the mean on the distribution curve.
- Sometimes, the variance of a curve is given as the measure of dispersion instead of the usual standard deviation. The relation between them is as follows – $$ Variance = (\sigma)^2 $$
The standard deviation, in turn, is the positive square root of the variance.
- The value of the standard deviation of a constant variable (which assumes a constant value over every point) is equal to 0. Since the variable never changes, every deviation from its mean is 0, and so is its spread.
- If a random variable is transformed into a new random variable by a change of scale and a shift of origin as –
Y = aX + b
where Y – the new random variable
X – the original random variable
a,b – constants
Then the standard deviations of X and Y can be related as –
s_{Y} = |a|s_{X}
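Clearly, the shift of origin b moves every value and the mean by the same amount, so it changes neither the deviations nor the shape of the distribution, and it has no effect on the standard deviation. Only the scaling factor a matters, and it enters through its absolute value.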
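A quick numerical check of this property, as a sketch with arbitrarily chosen values for X, a and b (none of them come from the text):

```python
import statistics

# Arbitrary illustrative data and constants (assumed for this sketch)
x = [9, 2, 5, 4, 12]
a, b = -3, 10

# Apply the change of scale and shift of origin: Y = aX + b
y = [a * xi + b for xi in x]

s_x = statistics.pstdev(x)  # divide-by-n standard deviation of X
s_y = statistics.pstdev(y)  # divide-by-n standard deviation of Y

print(s_y, abs(a) * s_x)
```

The two printed numbers should match up to rounding. A negative a is used on purpose, to show that it is |a| rather than a that appears in the relation.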
Important Formulae
For the combined standard deviation of multiple distributions – X_{1}, X_{2}, X_{3} … X_{k}, with individual standard deviations – s_{1}, s_{2}, s_{3} … s_{k}, arithmetic means – \(\bar{x_1}\), \(\bar{x_2}\), \(\bar{x_3}\) … \(\bar{x_k}\), and each distribution in turn containing n_{1}, n_{2}, n_{3} … n_{k} data points, the formula is –
$$ s_{combined} = \sqrt{\frac{(n_1{s_1}^2 + n_2{s_2}^2 + \dots + n_k{s_k}^2) + (n_1{d_1}^2 + n_2{d_2}^2 + \dots + n_k{d_k}^2)}{n_1 + n_2 + \dots + n_k}} $$
$$ = \sqrt{\frac{\Sigma n_i{s_i}^2 + \Sigma n_i{d_i}^2}{\Sigma n_i}} $$
where the summation extends over all the distributions.
In the final formula, \(d_i = \bar{x_i} - \bar{x}\)
where, \(\bar{x_i}\) – the arithmetic mean of the i’th distribution
\(\bar{x}\) – the combined arithmetic mean of all distributions, given by – $$\bar{x} = \frac{n_1\bar{x_1} + n_2\bar{x_2} + \dots + n_k\bar{x_k}}{n_1 + n_2 + \dots + n_k}$$ $$ = \frac{\Sigma n_i\bar{x_i}}{\Sigma n_i}$$
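The combined formula can be sketched in code and compared with the standard deviation of the pooled data. The function name `combined_std` and the two small groups are assumptions made for illustration.

```python
import math
import statistics

def combined_std(ns, means, stds):
    """Combined standard deviation from group sizes, group means and group S.D.s."""
    total = sum(ns)
    # Combined arithmetic mean, weighted by group size
    grand_mean = sum(n * m for n, m in zip(ns, means)) / total
    within = sum(n * s ** 2 for n, s in zip(ns, stds))                   # sum of n_i * s_i^2
    between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))  # sum of n_i * d_i^2
    return math.sqrt((within + between) / total)

# Two hypothetical groups of observations
groups = [[2, 4, 6], [10, 12, 14, 16]]

ns = [len(g) for g in groups]
means = [sum(g) / len(g) for g in groups]
stds = [statistics.pstdev(g) for g in groups]  # divide-by-n S.D. of each group

pooled = [x for g in groups for x in g]
print(combined_std(ns, means, stds))
print(statistics.pstdev(pooled))
```

The two printed values should agree up to rounding, since the formula only needs each group's size, mean and standard deviation, not the raw data.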
The Coefficient of Variation
Unlike the standard deviation, which is an absolute measure of the dispersion of a given distribution, the coefficient of variation is a relative measure of dispersion. We can easily derive it from the standard deviation itself.
We define the coefficient of variation as the ratio of the standard deviation to the arithmetic mean of a distribution, expressed as a percentage. It is unit-less, which makes it useful in the economic sector for relative risk assessment and for comparing the spread of two datasets measured in different units or on different scales. It is represented as CV.
The formula for CV is – $$ CV (percent) = \frac{S.D.}{A.M.} \times 100 $$
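A minimal sketch of this formula follows. The function name, the price values and the conversion factor are hypothetical. The example expresses the same data in two different units to illustrate that the CV does not depend on the unit.

```python
import statistics

def coefficient_of_variation(values):
    """Return the CV in percent: S.D. (divide-by-n) over the A.M., times 100."""
    mean = sum(values) / len(values)
    if mean == 0:
        raise ValueError("CV is undefined when the arithmetic mean is zero")
    return statistics.pstdev(values) / mean * 100

# Hypothetical prices of one asset, quoted in two different units
prices_a = [100, 102, 98, 101, 99]
prices_b = [p * 83 for p in prices_a]  # same data after an assumed unit conversion

cv_a = coefficient_of_variation(prices_a)
cv_b = coefficient_of_variation(prices_b)
print(f"CV in unit A: {cv_a:.4f} %")
print(f"CV in unit B: {cv_b:.4f} %")
```

The standard deviations of the two lists differ by the conversion factor, but so do the means, so the ratio is unchanged.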
Go through the following solved example now for a better understanding of the topic.
Solved Example For You
Question – The lengths of 20 similar crystals are measured (in mm) in a chemistry experiment. Calculate the standard deviation and the coefficient of variation for the observations taken.
Crystal no. | Length (mm) | Crystal no. | Length (mm) |
1 | 9 | 11 | 7 |
2 | 2 | 12 | 4 |
3 | 5 | 13 | 12 |
4 | 4 | 14 | 5 |
5 | 12 | 15 | 4 |
6 | 7 | 16 | 10 |
7 | 8 | 17 | 9 |
8 | 11 | 18 | 6 |
9 | 9 | 19 | 9 |
10 | 3 | 20 | 4 |
Solution – We can construct the table as given below, where A denotes the arithmetic mean of the observations –
Crystal no. | x_{i} | (x_{i} – A) | (x_{i} – A)^{2} |
1 | 9 | 2 | 4 |
2 | 2 | -5 | 25 |
3 | 5 | -2 | 4 |
4 | 4 | -3 | 9 |
5 | 12 | 5 | 25 |
6 | 7 | 0 | 0 |
7 | 8 | 1 | 1 |
8 | 11 | 4 | 16 |
9 | 9 | 2 | 4 |
10 | 3 | -4 | 16 |
11 | 7 | 0 | 0 |
12 | 4 | -3 | 9 |
13 | 12 | 5 | 25 |
14 | 5 | -2 | 4 |
15 | 4 | -3 | 9 |
16 | 10 | 3 | 9 |
17 | 9 | 2 | 4 |
18 | 6 | -1 | 1 |
19 | 9 | 2 | 4 |
20 | 4 | -3 | 9 |
N = 20 | ∑ x_{i} = 140 | ∑ (x_{i} – A) = 0 | ∑ (x_{i} – A)^{2} = 178 |
A = ∑ x_{i} / N = 140/20 = 7 mm
Now, we may give the Standard Deviation as – $$ S.D. = \sqrt{\frac{\Sigma(x_i - A)^2}{N}} $$ $$ = \sqrt{\frac{178}{20}} $$ $$ = \sqrt{8.9} $$ $$ \approx 2.9833 \text{ mm} $$
We can calculate the coefficient of variation as – $$ C.V. = \frac{S.D.}{A} \times 100$$ $$ = \frac{2.9833}{7} \times 100 $$ $$ \approx 42.62 \text{ percent} $$
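The arithmetic in this example can be reproduced with a short script. This is only a check of the hand calculation, using the 20 crystal lengths from the question. The variable names are arbitrary.

```python
import math

# Crystal lengths in mm, in the order given in the question
lengths = [9, 2, 5, 4, 12, 7, 8, 11, 9, 3,
           7, 4, 12, 5, 4, 10, 9, 6, 9, 4]

n = len(lengths)
mean = sum(lengths) / n                          # A = sum of x_i divided by N
sum_sq_dev = sum((x - mean) ** 2 for x in lengths)  # sum of (x_i - A)^2
sd = math.sqrt(sum_sq_dev / n)                   # standard deviation (divide by N)
cv = sd / mean * 100                             # coefficient of variation in percent

print(f"N = {n}, sum = {sum(lengths)}, mean = {mean} mm")
print(f"sum of squared deviations = {sum_sq_dev}")
print(f"S.D. = {sd:.4f} mm, C.V. = {cv:.2f} percent")
```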
Similarly, we may find these quantities for other distributions.