Correlation and Regression

Regression Lines, Regression Equations and Regression Coefficients


When an analysis involves more than one variable and we want to describe how one of them depends on the others, it is called Regression Analysis. It focuses on estimating or predicting the value of the dependent variable from the value of the other variable. Let's learn more about regression.




Regression Lines

Let there be two variables, x and y. If y depends on x, the relationship is described by simple regression. We name the variables x and y as follows:

y – Regressand or Dependent Variable or Explained Variable
x – Independent Variable or Predictor or Explanatory Variable

Therefore, in a simple linear regression model where y depends on x, the regression line of y on x is:

y = a + bx


Regression Coefficient

The two constants a and b are the regression parameters. Furthermore, we denote the constant b as \( b_{yx} \) and call it the regression coefficient of y on x.

We can also define the regression line of y on x as the line of best fit obtained by the method of least squares. This is the standard method for estimating the value of y from x, i.e. the value of a dependent variable from an independent variable.

Least Squares Method

\( \sum e_i^2 = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - a - b x_i)^2 \)

Here, \( y_i \) is the actual or observed value. Further, \( \hat{y}_i = a + b x_i \) denotes the estimated value of y for a given value \( x_i \) of x, and \( e_i = y_i - \hat{y}_i \), the difference between the observed and estimated values, is the error or residual.

[Figure: the regression line of y on x together with the estimation errors]


Minimizing this sum of squares with respect to a and b gives the following two equations. We refer to them as the Normal Equations.

\( \sum y_i = na + b \sum x_i \)
\( \sum x_i y_i = a \sum x_i + b \sum x_i^2 \)
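
As a brief sketch of where these two equations come from, write \( S(a, b) = \sum (y_i - a - b x_i)^2 \) for the sum of squared errors (the symbol S is introduced here only for convenience) and set both partial derivatives to zero:

\( \frac{\partial S}{\partial a} = -2 \sum (y_i - a - b x_i) = 0 \)
\( \frac{\partial S}{\partial b} = -2 \sum x_i (y_i - a - b x_i) = 0 \)

Rearranging the first condition gives \( \sum y_i = na + b \sum x_i \), and rearranging the second gives \( \sum x_i y_i = a \sum x_i + b \sum x_i^2 \).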

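We get the least squares estimates of a and b by solving the above two equations for a and b.

\( b = \frac{\mathrm{Cov}(x, y)}{S_x^2} \)
\( = \frac{r S_x S_y}{S_x^2} \)
\( = \frac{r S_y}{S_x} \)

Here \( S_x \) and \( S_y \) are the standard deviations of x and y, and r is the correlation coefficient between them.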

The estimate of a, after the estimation of b, is:

a = \( \bar{y} \) – b\( \bar{x} \)
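
The formulas for a and b can be tried out numerically. The following is a minimal Python sketch, not part of the original lesson: the function name `fit_line` and the five data points are made up for illustration, and the covariance and variance both use the divisor n.

```python
# Minimal sketch: least-squares estimates of a and b for the line y = a + b*x.
# The name fit_line and the data below are illustrative assumptions.

def fit_line(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Cov(x, y) and S_x^2, both with divisor n (the divisor cancels in b)
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    b = cov_xy / var_x          # b = Cov(x, y) / S_x^2
    a = y_bar - b * x_bar       # a = y_bar - b * x_bar
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
print(f"a = {a:.2f}, b = {b:.2f}")   # a = 2.20, b = 0.60
```

For this data the fitted values also satisfy both Normal Equations: 20 = 5(2.2) + 0.6(15) and 66 = 2.2(15) + 0.6(55).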

On substituting the estimates of a and b into y = a + bx, the regression line of y on x becomes:

\( \frac{y - \bar{y}}{S_y} = r \, \frac{x - \bar{x}}{S_x} \)

Sometimes variable x depends on variable y instead. In such cases, the line of regression of x on y is:

\( x = \hat{a} + \hat{b} y \)

Regression Equation

The standard form of the regression equation of x on y is:

\( \frac{x - \bar{x}}{S_x} = r \, \frac{y - \bar{y}}{S_y} \)

Properties of Regression Lines

Here are some of the important properties of regression lines.

  • The regression coefficients are unaffected by a shift of origin, but they do change with a change of scale. According to this property, if the original variables (x, y) are transformed to (u, v), then:

u = (x – a)/p
v = (y – c)/q

\( b_{yx} = \frac{q}{p} \times b_{vu} \)
\( b_{xy} = \frac{p}{q} \times b_{uv} \)

  • There are two lines of regression, that of y on x and that of x on y. Both lines pass through the point [\( \bar{x} \), \( \bar{y} \)], so they intersect there. Hence, [\( \bar{x} \), \( \bar{y} \)] is the simultaneous solution of the two regression equations.
  • The correlation coefficient between x and y is the GM (geometric mean) of the two regression coefficients, and it carries their common sign. According to this property, if we denote the regression coefficients as \( b_{yx} \) (= b) and \( b_{xy} \) (= b'), then the correlation coefficient is:

\( r = \pm \sqrt{b_{yx} \times b_{xy}} \)

Hence, if both regression coefficients are negative, then r is negative as well, and if both are positive, then r is positive.
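
The three properties above can be checked numerically. The sketch below is only an illustration under stated assumptions: the helper `slope`, the data, and the origin and scale constants (`a0`, `p`, `c0`, `q`) are all made up for the example and are not taken from the lesson.

```python
import math

def slope(xs, ys):
    """Regression coefficient of ys on xs, i.e. Cov(x, y) / Var(x)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var = sum((x - x_bar) ** 2 for x in xs)
    return cov / var  # the common divisor n cancels

xs = [1, 2, 3, 4, 5]   # made-up data
ys = [2, 4, 5, 4, 5]
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)

b_yx = slope(xs, ys)   # regression coefficient of y on x
b_xy = slope(ys, xs)   # regression coefficient of x on y

# Property 1: with u = (x - a0)/p and v = (y - c0)/q, b_yx = (q/p) * b_vu
a0, p, c0, q = 10.0, 2.0, -3.0, 0.5   # arbitrary origin and scale choices
us = [(x - a0) / p for x in xs]
vs = [(y - c0) / q for y in ys]
b_vu = slope(us, vs)
scale_ok = math.isclose(b_yx, (q / p) * b_vu)

# Property 2: the two regression lines intersect at (x_bar, y_bar)
a_yx = y_bar - b_yx * x_bar            # intercept of y on x
a_xy = x_bar - b_xy * y_bar            # intercept of x on y
x_star = (a_xy + b_xy * a_yx) / (1 - b_xy * b_yx)   # solve the two equations
y_star = a_yx + b_yx * x_star
meet_ok = math.isclose(x_star, x_bar) and math.isclose(y_star, y_bar)

# Property 3: r is the geometric mean of b_yx and b_xy, with their common sign
sx = math.sqrt(sum((x - x_bar) ** 2 for x in xs))
sy = math.sqrt(sum((y - y_bar) ** 2 for y in ys))
r_direct = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (sx * sy)
r_from_b = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
gm_ok = math.isclose(r_direct, r_from_b)

print(scale_ok, meet_ok, gm_ok)
```

Solving for the intersection point divides by \( 1 - b_{xy} b_{yx} \), so this check assumes the two lines are distinct, that is, \( r^2 \neq 1 \).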

Solved Question on Regression

Question: The relationship between two variables x and u is u + 3x = 10. Similarly, the relationship between two other variables y and v is 2y + 5v = 25. The regression coefficient of y on x is 0.80. What is the regression coefficient of v on u?

Solution: Given that,

u + 3x = 10
\( u = \frac{x - \frac{10}{3}}{\frac{-1}{3}} \), so \( p = -\frac{1}{3} \)

2y + 5v = 25
\( v = \frac{y - \frac{25}{2}}{\frac{-5}{2}} \), so \( q = -\frac{5}{2} \)

We know,

\( b_{yx} = \frac{q}{p} \times b_{vu} \)
\( 0.80 = \frac{-5/2}{-1/3} \times b_{vu} \)
\( 0.80 = 7.5 \times b_{vu} \)
\( b_{vu} = \frac{0.80}{7.5} = \frac{8}{75} \approx 0.107 \)
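
The arithmetic in this solution can be double-checked with exact fractions. This is a small Python sketch using the standard `fractions` module; the variable names are chosen here for illustration.

```python
from fractions import Fraction

# u + 3x = 10   ->  u = (x - 10/3) / (-1/3), so p = -1/3
# 2y + 5v = 25  ->  v = (y - 25/2) / (-5/2), so q = -5/2
p = Fraction(-1, 3)
q = Fraction(-5, 2)
b_yx = Fraction(4, 5)      # 0.80, as given in the question

# b_yx = (q/p) * b_vu  =>  b_vu = b_yx * p / q
b_vu = b_yx * p / q
print(b_vu)                # 8/75
```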

Question: The regression equations for variables x and y are 7x – 3y – 18 = 0 and 4x – y – 11 = 0.

  1. What are the arithmetic means (AM) of x and y?
  2. Find the correlation coefficient between x and y.


Solution: (i) The two regression lines intersect at the point [\( \bar{x} \), \( \bar{y} \)]. Therefore, we replace x and y with \( \bar{x} \) and \( \bar{y} \) and solve the two equations simultaneously:

7x – 3y = 18
4x – y = 11

Hence, on solving these two equations we get \( \bar{x} \) = 3 and \( \bar{y} \) = 1.

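(ii) Take 7x – 3y – 18 = 0 as the regression line of y on x. Then y = (7/3)x – 6, so \( b_{yx} = \frac{7}{3} \). Take 4x – y – 11 = 0 as the regression line of x on y. Then x = (1/4)y + 11/4, so \( b_{xy} = \frac{1}{4} \). (The opposite assignment would give \( b_{yx} \times b_{xy} = \frac{12}{7} > 1 \), which is impossible because \( r^2 \le 1 \).)

We know,

\( r^2 = b_{yx} \times b_{xy} = \frac{7}{3} \times \frac{1}{4} = \frac{7}{12} \)

\( r = \sqrt{\frac{7}{12}} \) (r is positive as both the coefficients are positive)
= 0.7638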
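
Both parts of this answer can be verified with a short Python sketch. It assumes, as the product of the coefficients requires, that 7x – 3y – 18 = 0 is the line of y on x and 4x – y – 11 = 0 is the line of x on y; the variable names are illustrative only.

```python
from fractions import Fraction
import math

# Each line written as A*x + B*y = C
A1, B1, C1 = 7, -3, 18     # 7x - 3y = 18
A2, B2, C2 = 4, -1, 11     # 4x - y  = 11

# (i) Intersection by Cramer's rule gives the means
det = A1 * B2 - A2 * B1
x_bar = Fraction(C1 * B2 - C2 * B1, det)
y_bar = Fraction(A1 * C2 - A2 * C1, det)

# (ii) Line 1 as y on x: y = (7/3)x - 6       ->  b_yx = -A1/B1
#      Line 2 as x on y: x = (1/4)y + 11/4    ->  b_xy = -B2/A2
b_yx = Fraction(-A1, B1)
b_xy = Fraction(-B2, A2)
r_squared = b_yx * b_xy

# The opposite assignment gives a product above 1, so it is rejected
alt_product = Fraction(-B1, A1) * Fraction(-A2, B2)

r = math.sqrt(float(r_squared))   # positive, since both coefficients are positive
print(x_bar, y_bar, r_squared, alt_product, round(r, 4))
```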
