Theoretical Distributions

Normal Distribution – Basic Application

The normal distribution, also called the Gaussian distribution, is the most important continuous probability distribution in statistics. A vast number of random variables of interest in the physical sciences and in economics are described, either exactly or approximately, by the normal distribution. It can also be used to approximate other probability distributions. This wide use is one reason it is called "normal": it is the distribution most commonly used.

The random variables that follow a normal distribution are those that can take any value in a given range. For example, the height of the students in a school, measured in feet, can take any value, but it is obviously bounded, say in the range 0 to 9 ft. This restriction is imposed by the physical nature of the problem.

The normal distribution itself has no such restriction. Its range extends from –∞ to +∞ and it still gives a smooth curve. Variables that can take any value in a range are called continuous variables, and the normal distribution gives the probability that such a variable falls in a particular range on a given trial.

Definition

The normal distribution defines a probability density function f(x) for the continuous random variable X under consideration. It is a function whose integral over an interval, say from x to x + dx, gives the probability that X takes a value between x and x + dx.

Since there are infinitely many values between x and x + dx, we do not talk about the probability of X taking an exact value x0, because that probability is zero. Instead, a range of x is considered, and a continuous probability density function is defined with the following properties – $$f(x) \geq 0 \quad \forall \, x \in (-\infty,+\infty)$$ $$\int_{-\infty}^{+\infty}f(x)\,dx = 1$$

For a normal distribution of a random variable X with mean μ and variance σ², f(x) takes the form –
$$f(x) = \frac{1}{\sigma \sqrt{2 \pi}}\exp\left(-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right)$$
– σ is a positive constant.
– An equivalent shorthand representation – $$ X \sim N(\mu , {\sigma}^2)$$
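
The density formula can be checked numerically. The sketch below is a minimal illustration, not a library routine: `normal_pdf` codes f(x) exactly as written above, and `integrate_midpoint` (our own helper) approximates the total area with a midpoint sum over μ ± 10σ, a cut-off we assume is wide enough because the density beyond it is negligible. The values μ = 3 and σ = 2 are arbitrary.

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density f(x) of N(mu, sigma^2); sigma must be positive."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def integrate_midpoint(f, a: float, b: float, n: int = 100_000) -> float:
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

if __name__ == "__main__":
    mu, sigma = 3.0, 2.0
    # f(x) >= 0 everywhere, and the total area should be (very nearly) 1.
    area = integrate_midpoint(lambda x: normal_pdf(x, mu, sigma),
                              mu - 10 * sigma, mu + 10 * sigma)
    print(f"area under f(x): {area:.6f}")
    # The peak sits at x = mu and equals 1 / (sigma * sqrt(2*pi)).
    print(f"f(mu) = {normal_pdf(mu, mu, sigma):.6f}")
```

Changing `mu` only moves the peak, and changing `sigma` rescales both the width and the height so that the area stays 1.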

Properties of the Normal Distribution

For a specific μ = 3 and a σ ranging from 1 to 3, the probability density function (P.D.F.) is as shown –

[Figure: normal P.D.F. curves for μ = 3 with σ between 1 and 3]

The following properties follow –

⇒ The distribution is symmetric about x = μ and has the characteristic bell-shaped curve centred there. Its skewness is therefore zero: the curve has neither a longer right tail (positive skew) nor a longer left tail (negative skew).

⇒ The mean, median and mode of a normal distribution coincide, and all are equal to μ.

⇒ The standard deviation of this distribution is σ. Other measures of dispersion are:
Mean deviation: σ√(2/π) ≈ 0.7979σ
First quartile: μ – 0.6745σ; third quartile: μ + 0.6745σ
Thus, quartile deviation: (Q3 – Q1)/2 ≈ 0.6745σ

⇒ At x = μ ± σ, the function f(x) falls to \(e^{-1/2} \approx 0.61\) of its peak value. These points are the points of inflection, where \(d^2f/dx^2 = 0\).

⇒ Additive Property: If \( X_1 \sim N(\mu_1,{\sigma_1}^2) \) and \( X_2 \sim N(\mu_2,{\sigma_2}^2) \) are independent normal random variables, then their sum \( Y = X_1 + X_2 \) is also normally distributed, with \( Y \sim N(\mu_1 + \mu_2,{\sigma_1}^2 + {\sigma_2}^2) \). Note that the variances add, not the standard deviations.
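
The quartile, mean-deviation, inflection-point and additive properties listed above can all be checked numerically. This is a minimal sketch that uses Python's standard-library `statistics.NormalDist` for quantiles; `normal_pdf`, `quartile_deviation`, `mean_deviation`, `inflection_check` and `simulate_sum` are illustrative helpers of our own, and the parameter values (μ = 3, σ = 2, and the two summands in the simulation) are arbitrary choices.

```python
import math
import random
from statistics import NormalDist, fmean, pvariance

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def quartile_deviation(mu: float, sigma: float) -> float:
    """(Q3 - Q1) / 2, using the inverse CDF from the standard library."""
    dist = NormalDist(mu, sigma)
    return (dist.inv_cdf(0.75) - dist.inv_cdf(0.25)) / 2

def mean_deviation(mu: float, sigma: float, n: int = 200_000) -> float:
    """E|X - mu|, by a midpoint sum of |x - mu| f(x) over mu +/- 10 sigma."""
    a, b = mu - 10 * sigma, mu + 10 * sigma
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        total += abs(x - mu) * normal_pdf(x, mu, sigma)
    return total * h

def inflection_check(mu: float, sigma: float, h: float = 1e-4):
    """Return f(mu+sigma)/f(mu) and a central-difference estimate of f'' at mu+sigma."""
    x = mu + sigma
    ratio = normal_pdf(x, mu, sigma) / normal_pdf(mu, mu, sigma)
    second = (normal_pdf(x + h, mu, sigma) - 2 * normal_pdf(x, mu, sigma)
              + normal_pdf(x - h, mu, sigma)) / h ** 2
    return ratio, second

def simulate_sum(n: int = 100_000, seed: int = 42):
    """Draw Y = X1 + X2 with independent X1 ~ N(1, 0.5^2) and X2 ~ N(-2, 1.2^2)."""
    rng = random.Random(seed)
    return [rng.gauss(1.0, 0.5) + rng.gauss(-2.0, 1.2) for _ in range(n)]

if __name__ == "__main__":
    mu, sigma = 3.0, 2.0
    print("quartile deviation / sigma:", quartile_deviation(mu, sigma) / sigma)
    print("mean deviation / sigma:", mean_deviation(mu, sigma) / sigma,
          "vs sqrt(2/pi) =", math.sqrt(2 / math.pi))
    ratio, second = inflection_check(mu, sigma)
    print("f(mu+sigma)/f(mu):", ratio, "vs e^(-1/2) =", math.exp(-0.5))
    print("f'' at mu+sigma (should be close to 0):", second)
    ys = simulate_sum()
    # Additive property: mean 1 + (-2) = -1, variance 0.5^2 + 1.2^2 = 1.69.
    print("sample mean:", fmean(ys), "sample variance:", pvariance(ys))
```

The simulation only checks the mean and variance of the sum; it relies on the two draws being independent, which is exactly the condition the additive property needs.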

The Standard Form

Changing μ only shifts the curve along the x-axis, and changing σ only broadens or narrows it. We can therefore absorb both effects into a new random variable Z, defined as –

\(Z = \frac{X - \mu}{\sigma}\)

Z is also known as the standardized normal variable or the normal deviate. In terms of this standard variable, the normal distribution reduces to the following form – $$\phi (z) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{z^2}{2}\right)$$
This distribution has parameters μ = 0 and σ² = 1. Thus we can write \(Z \sim N(0,1)\).
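
Standardizing is a change of variable, so the two densities are related by f(x) = φ(z)/σ with z = (x – μ)/σ; the factor 1/σ comes from dz = dx/σ. A minimal sketch checking this relation numerically; the helper names and the sample values μ = 3, σ = 2 are illustrative only.

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """General density f(x) of N(mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def standard_pdf(z: float) -> float:
    """Standard normal density phi(z) of N(0, 1)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def standardize(x: float, mu: float, sigma: float) -> float:
    """The normal deviate Z = (X - mu) / sigma."""
    return (x - mu) / sigma

if __name__ == "__main__":
    mu, sigma = 3.0, 2.0
    for x in (-1.0, 3.0, 5.0, 8.0):
        z = standardize(x, mu, sigma)
        # Both columns should agree: f(x) = phi(z) / sigma.
        print(f"x={x:5.1f}  z={z:5.2f}  f(x)={normal_pdf(x, mu, sigma):.6f}  "
              f"phi(z)/sigma={standard_pdf(z) / sigma:.6f}")
```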

The Cumulative Probability Function

For the general normal distribution, the cumulative distribution function is defined as – $$F(x) = Pr(X < x) = \frac{1}{\sigma \sqrt{2 \pi}}\int_{-\infty}^{x}\exp\left(-\frac{1}{2}\left(\frac{u - \mu}{\sigma}\right)^2\right)du $$ where u is the dummy integration variable.
However, this integral cannot be evaluated in closed form in terms of elementary functions, so it has to be computed numerically. Moreover, its value depends on the parameters μ and σ, so a separate set of values would be needed for every normal distribution.
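
Since F(x) has no elementary closed form, it is evaluated numerically in practice. A minimal sketch, assuming a composite Simpson's rule started at μ – 10σ (the density below that point is negligible), compared with the standard identity F(x) = ½[1 + erf((x – μ)/(σ√2))], which Python exposes through `math.erf`. `cdf_simpson` and `cdf_erf` are our own illustrative names.

```python
import math

def normal_pdf(u: float, mu: float, sigma: float) -> float:
    """Integrand of F(x): the N(mu, sigma^2) density at the dummy variable u."""
    return math.exp(-0.5 * ((u - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def cdf_simpson(x: float, mu: float, sigma: float, n: int = 2000) -> float:
    """Approximate F(x) by Simpson's rule on [mu - 10*sigma, x]; n must be even."""
    a = mu - 10 * sigma
    if x <= a:
        return 0.0
    h = (x - a) / n
    total = normal_pdf(a, mu, sigma) + normal_pdf(x, mu, sigma)
    for i in range(1, n):
        weight = 4 if i % 2 else 2          # Simpson weights 1, 4, 2, 4, ..., 4, 1
        total += weight * normal_pdf(a + i * h, mu, sigma)
    return total * h / 3

def cdf_erf(x: float, mu: float, sigma: float) -> float:
    """F(x) via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

if __name__ == "__main__":
    for x in (1.0, 3.0, 5.0):
        print(x, cdf_simpson(x, 3.0, 2.0), cdf_erf(x, 3.0, 2.0))
```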

The way out is to standardize. The values of the cumulative distribution function Φ(z) of the standard normal distribution are tabulated in standard tables, such as the Biometrika tables. Using them, the required values for any normal distribution can be obtained by converting back from the standard form. The tabulated values are – $$ \Phi (z) = Pr(Z < z) = \frac{1}{\sqrt{2 \pi}}\int_{-\infty}^{z}\exp\left(-\frac{u^2}{2}\right)du $$
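
Printed tables such as the Biometrika tables list Φ(z) for z ≥ 0, usually to four decimal places. The same numbers can be generated from the identity Φ(z) = ½[1 + erf(z/√2)]; the sketch below builds a short table that way. The function names and the step size are our own choices for illustration.

```python
import math

def Phi(z: float) -> float:
    """Standard normal CDF, Phi(z) = Pr(Z < z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi_table(z_max: float = 3.0, step: float = 0.1):
    """(z, Phi(z)) pairs for z = 0, step, ..., z_max, rounded to 4 d.p. like a printed table."""
    n = round(z_max / step)
    return [(round(i * step, 2), round(Phi(i * step), 4)) for i in range(n + 1)]

if __name__ == "__main__":
    print("   z   Phi(z)")
    for z, p in phi_table(step=0.5):
        print(f"{z:4.1f}   {p:.4f}")
```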

[Figure: the standard normal curve]
Only the values of Φ(z) for z ≥ 0 are tabulated because, by the symmetry of the distribution (see the diagram above), Φ(–z) = 1 – Φ(z). In this way all other values can be found. In addition, the following properties are useful in many types of problems (for constants a and b with a < b) –

Pr(Z < a) = Φ(a) = Pr(Z ≤ a)
Pr(Z > a) = 1 – Φ(a)
Pr(a < Z ≤ b) = Φ(b) – Φ(a)
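
The symmetry rule and the three properties translate directly into code. In this minimal sketch, `Phi_nonneg` stands in for a printed table (it only accepts z ≥ 0), and negative arguments are recovered with Φ(–z) = 1 – Φ(z); all function names are hypothetical helpers, not a library API.

```python
import math

def Phi_nonneg(z: float) -> float:
    """Plays the role of the table: Phi(z) for z >= 0 only."""
    if z < 0:
        raise ValueError("table only covers z >= 0")
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def Phi(z: float) -> float:
    """Phi for any z, using the symmetry Phi(-z) = 1 - Phi(z) for negative z."""
    return Phi_nonneg(z) if z >= 0 else 1.0 - Phi_nonneg(-z)

def prob_less(a: float) -> float:
    """Pr(Z < a) = Pr(Z <= a) = Phi(a); a single point has zero probability."""
    return Phi(a)

def prob_greater(a: float) -> float:
    """Pr(Z > a) = 1 - Phi(a)."""
    return 1.0 - Phi(a)

def prob_between(a: float, b: float) -> float:
    """Pr(a < Z <= b) = Phi(b) - Phi(a), for a <= b."""
    return Phi(b) - Phi(a)

if __name__ == "__main__":
    print(prob_less(-1.0), prob_greater(1.0))   # equal, by symmetry
    print(prob_between(-1.0, 1.0))
```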

Back-tracing through Z = (X – μ)/σ, we can also get the probability of the original variable X lying in a given range – $$F(x) = \Phi\left(\frac{x - \mu}{\sigma}\right)$$ $$ Pr(a \leq X \leq b) = \frac{1}{\sigma \sqrt{2 \pi}}\int_{a}^{b}\exp\left(-\frac{1}{2}\left(\frac{u - \mu}{\sigma}\right)^2\right)du $$ $$ = F(b) - F(a) $$ $$ = \Phi\left(\frac{b - \mu}{\sigma}\right) - \Phi\left(\frac{a - \mu}{\sigma}\right) $$
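
Back-tracing means standardizing both limits and then applying Φ. A minimal sketch, where `prob_X_between` is a hypothetical helper and the result is cross-checked against `statistics.NormalDist(mu, sigma).cdf` from Python's standard library (3.8+). The parameters μ = 60 and σ = √10 are only sample values.

```python
import math
from statistics import NormalDist

def Phi(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_X_between(a: float, b: float, mu: float, sigma: float) -> float:
    """Pr(a <= X <= b) for X ~ N(mu, sigma^2), by standardizing both limits."""
    return Phi((b - mu) / sigma) - Phi((a - mu) / sigma)

if __name__ == "__main__":
    mu, sigma = 60.0, math.sqrt(10.0)
    p = prob_X_between(55.0, 65.0, mu, sigma)
    # Cross-check against the standard library's NormalDist.cdf.
    dist = NormalDist(mu, sigma)
    q = dist.cdf(65.0) - dist.cdf(55.0)
    print(p, q)
```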

The Spread of the Distribution

From the above discussion, we can deduce the following general formula for the spread of any normal distribution about its mean –

Pr(μ – nσ < X ≤ μ + nσ) = Pr(-n < Z ≤ n)
= Φ(n) – Φ(-n)
= Φ(n) – (1 – Φ(n))
= 2Φ(n) – 1

Thus, for n = 1, 2, 3, we have –

n | Spread | Probability
1 | Pr(μ – σ < X ≤ μ + σ) = 2Φ(1) – 1 | 0.6826 ≈ 68.26%
2 | Pr(μ – 2σ < X ≤ μ + 2σ) = 2Φ(2) – 1 | 0.9544 ≈ 95.44%
3 | Pr(μ – 3σ < X ≤ μ + 3σ) = 2Φ(3) – 1 | 0.9974 ≈ 99.74%

[Figure: spread of the normal distribution about its mean (Source: Wikipedia)]

These limits on X are called the one-, two- and three-sigma limits respectively. Note that the probabilities above do not depend on the mean or the variance of the distribution. Also, the probability of X lying more than 3σ from the mean is only 1 – 0.9974 = 0.0026 ≈ 0.26%, so values that far from the mean are very rare in practice.
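
Because 2Φ(n) – 1 = erf(n/√2), the spread probabilities and the tail beyond nσ can be computed directly. A short sketch with a hypothetical helper `spread`; at full precision it gives 0.6827, 0.9545 and 0.9973, which differ from the table above only in the last digit because those entries were computed from Φ values already rounded to four places.

```python
import math

def Phi(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def spread(n: float) -> float:
    """Pr(mu - n*sigma < X <= mu + n*sigma) = 2*Phi(n) - 1, for any mu and sigma."""
    return 2.0 * Phi(n) - 1.0

if __name__ == "__main__":
    for n in (1, 2, 3):
        inside = spread(n)
        # The tail is everything more than n standard deviations from the mean.
        print(f"n={n}: inside {inside:.4f}, beyond {1.0 - inside:.4f}")
```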

Now go through the solved examples below to get a better feel for the topic.

Solved Examples For you

Question 1

The time taken for a data file to travel from the source to the company is Gaussian distributed. If 6.8% of the files take over 200 ms, and 3.0% take under 140 ms to complete the journey, then find out the mean and standard deviation of the distribution.

Solution: Let X be the random variable denoting the journey time in ms. We are given that X follows a normal distribution:

X ∼ N(μ, σ²), where μ and σ are unknown.

From the given data –

\(Pr(X > 200) = 1 - \Phi(\frac{200 - \mu}{\sigma}) = 0.068\)
⇒ \( \Phi(\frac{200 - \mu}{\sigma}) = 1 - 0.068 = 0.932\)

From the Biometrika tables,
⇒ \( \frac{200 - \mu}{\sigma} = 1.49\)   (1)

Similarly,

\(Pr(X < 140) = \Phi(\frac{140 - \mu}{\sigma}) = 0.030\)

Since this probability is below 0.5, the argument of Φ is negative; using Φ(–z) = 1 – Φ(z),
⇒ \(\Phi(\frac{\mu - 140}{\sigma}) = 1 - 0.030 = 0.970\)

From the Biometrika tables,
⇒ \( \frac{\mu - 140}{\sigma} = 1.88\)   (2)

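Adding equations (1) and (2) eliminates μ: 60/σ = 1.49 + 1.88 = 3.37, so σ = 60/3.37 ≈ 17.8. Substituting back into (1) gives μ = 200 – 1.49 × 17.8 ≈ 173.5. Hence μ ≈ 173.5 ms and σ ≈ 17.8 ms.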
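
The same calculation can be done without table lookups. A minimal sketch, assuming Python's `statistics.NormalDist.inv_cdf` in place of the Biometrika tables; `solve_mu_sigma` is our own hypothetical helper. Because `inv_cdf` returns unrounded z-values (roughly 1.491 and –1.881 instead of the table's 1.49 and –1.88), the answers agree with the hand solution to one decimal place.

```python
from statistics import NormalDist

def solve_mu_sigma(x_hi: float, p_above: float, x_lo: float, p_below: float):
    """Find mu, sigma of a normal X given Pr(X > x_hi) = p_above and Pr(X < x_lo) = p_below."""
    std = NormalDist()                      # Z ~ N(0, 1)
    z_hi = std.inv_cdf(1.0 - p_above)       # equals (x_hi - mu) / sigma
    z_lo = std.inv_cdf(p_below)             # equals (x_lo - mu) / sigma, negative here
    sigma = (x_hi - x_lo) / (z_hi - z_lo)   # subtracting the two equations eliminates mu
    mu = x_hi - z_hi * sigma
    return mu, sigma

if __name__ == "__main__":
    mu, sigma = solve_mu_sigma(200.0, 0.068, 140.0, 0.030)
    print(f"mu = {mu:.1f} ms, sigma = {sigma:.1f} ms")
```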

Question 2

An executive travels home from her office every evening. Her journey consists of a train ride followed by a bicycle ride. The time spent on the train is normally distributed with a mean 52 minutes and a standard deviation of 1.8 minutes, while the time spent on the bike is normally distributed with a mean 8 minutes and a standard deviation of 2.6 minutes. Assuming these two factors are independent, estimate the percentage of occasions on which the whole journey exceeds 65 minutes.

Solution: Let us define the random variables –

X – Time spent on train ∼ N(52, (1.8)²)
Y – Time spent on bike ∼ N(8, (2.6)²)

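The total journey time is T = X + Y. Since X and Y are independent, the additive property gives

T ∼ N(52 + 8, (1.8)² + (2.6)²) = N(60, (3.16)²)

because the variances, not the standard deviations, add: (1.8)² + (2.6)² = 3.24 + 6.76 = 10, and √10 ≈ 3.16.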

The standard variable for this distribution is thus –

\(Z = \frac{T - 60}{3.16}\)

The required probability can be calculated as –

\(Pr(T > 65) = Pr(Z > \frac{65 - 60}{3.16})\)
= Pr(Z > 1.58)
= 1 – Φ(1.58)
= 1 – 0.943 (using the Biometrika tables)
= 0.057

Thus, the total journey time exceeds 65 minutes at about 5.7% of occasions.
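
Python's standard-library `statistics.NormalDist` implements the additive property directly: adding two `NormalDist` objects treats them as independent variables, adds the means, and takes the square root of the summed variances as the new standard deviation. A minimal sketch of Question 2 using it (the variable names are ours):

```python
from statistics import NormalDist

train = NormalDist(mu=52, sigma=1.8)   # X, minutes on the train
bike = NormalDist(mu=8, sigma=2.6)     # Y, minutes on the bike

# Adding two NormalDist objects assumes independence:
# means add and variances add, so sigma_T = sqrt(1.8**2 + 2.6**2).
total = train + bike

p_over_65 = 1 - total.cdf(65)

if __name__ == "__main__":
    print(f"T ~ N({total.mean:.1f}, {total.stdev:.2f}^2)")
    print(f"Pr(T > 65) = {p_over_65:.4f}")
```

Without rounding z to 1.58, the probability comes out a little under 0.057, consistent with the 5.7% found above.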
