Correlation and Regression

Regression Lines, Regression Equations and Regression Coefficients

When an analysis involves more than one variable and aims to describe how one variable depends on another, it is called regression analysis. It focuses on estimating, or predicting, the value of the dependent variable from the value of the independent variable. Let's learn more about regression.


Regression Lines

Let there be two variables, x and y. If y depends on x, the relationship is described by a simple regression. We name the variables as follows:

y – Regressand or Dependent Variable or Explained Variable
x – Regressor or Independent Variable or Predictor or Explanatory Variable

Therefore, if we use a simple linear regression model where y depends on x, then the regression line of y on x is:

y = a + bx

Regression Coefficient

The two constants a and b are the regression parameters. We denote the constant b as \( b_{yx} \) and call it the regression coefficient of y on x.

The regression line of y on x can also be defined as the line of best fit obtained by the method of least squares. This method chooses a and b so that the line estimates the dependent variable y from the independent variable x with the smallest possible sum of squared errors.

Least Squares Method

\( \sum e_i^2 = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - a - b x_i)^2 \)

Here, \( y_i \) is the actual (observed) value, and \( \hat{y}_i = a + b x_i \) is the estimated value of \( y_i \) for a given value \( x_i \). The difference \( e_i = y_i - \hat{y}_i \) between the observed and estimated values is the error, or residual. The regression line of y on x, along with the estimation errors, is as follows:

[Figure: regression line of y on x with the estimation errors]

Minimizing this sum of squares with respect to a and b gives the following two equations, which we call the normal equations.

\( \sum y_i = na + b \sum x_i \)
\( \sum x_i y_i = a \sum x_i + b \sum x_i^2 \)

We get the least squares estimate for a and b by solving the above two equations for both a and b.
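As a sketch of where these results come from (a standard calculus argument that the original does not spell out), set the partial derivatives of the sum of squared errors with respect to a and b to zero:

\( \frac{\partial}{\partial a} \sum (y_i - a - b x_i)^2 = -2 \sum (y_i - a - b x_i) = 0 \)
\( \frac{\partial}{\partial b} \sum (y_i - a - b x_i)^2 = -2 \sum x_i (y_i - a - b x_i) = 0 \)

Dividing the first condition by n gives \( \bar{y} = a + b\bar{x} \). Substituting \( a = \bar{y} - b\bar{x} \) into the second condition and dividing by n gives:

\( \frac{1}{n}\sum x_i y_i - \bar{x}\bar{y} = b \left( \frac{1}{n}\sum x_i^2 - \bar{x}^2 \right) \)

The left side is Cov(x, y) and the bracket on the right is \( S_x^2 \), assuming both are defined with divisor n.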

\( b = \frac{\mathrm{Cov}(x, y)}{S_x^2} = \frac{r S_x S_y}{S_x^2} = \frac{r S_y}{S_x} \)

Once b has been estimated, the estimate of a is:

a = \( \bar{y} \) – b\( \bar{x} \)
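The two estimates can be computed directly from data. The following is a minimal sketch, assuming Cov(x, y) and Sx2 are both taken with divisor n; the function name `fit_y_on_x` and the sample data are made up for illustration and do not come from the original.

```python
from statistics import fmean


def fit_y_on_x(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    # Covariance and variance with divisor n, matching Cov(x, y) and Sx^2.
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    b = cov_xy / var_x          # b = Cov(x, y) / Sx^2
    a = y_bar - b * x_bar       # a = y_bar - b * x_bar
    return a, b


# Illustrative data only (hypothetical).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_y_on_x(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x")
```

For this data the means are 3 and 4, so the fitted line passes through the point (3, 4), as the formula for a requires.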

Substituting these estimates of a and b into y = a + bx gives the regression line of y on x in standard form:

[ y – \( \bar{y} \) ]/Sy = r[ x – \( \bar{x} \) ]/Sx

Sometimes, it might so happen that variable x depends on variable y. In such cases, the line of regression of x on y is:

\( x = \hat{a} + \hat{b} y \)

Regression Equation

The standard form of the regression equation of variable x on y is:

 [ x – \( \bar{x} \) ]/Sx = r[ y – \( \bar{y} \) ]/Sy
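The original does not write out the coefficient of this second line. By symmetry with the y-on-x case, and assuming the same least-squares criterion is applied to the errors in x, the regression coefficient of x on y and the corresponding intercept would be:

\( b_{xy} = \frac{\mathrm{Cov}(x, y)}{S_y^2} = \frac{r S_x}{S_y} \)
\( \hat{a} = \bar{x} - b_{xy} \bar{y} \)

Substituting these into the line of regression of x on y reproduces the standard form above.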

Properties of Regression Lines

Here are some of the important properties of regression lines.

  • The regression coefficients are unaffected by a change of origin, but they are affected by a change of scale. According to this property, if the original variables (x, y) are transformed to (u, v), then:

u = (x – a)/p
v = (y – c)/q

byx = \( \frac{q}{p} \) × bvu

Also,

bxy = \( \frac{p}{q} \) × buv

  • There are two lines of regression, that of y on x and that of x on y. Both lines intersect at the point [\( \bar{x} \), \( \bar{y} \)], so this point is the simultaneous solution of the two regression equations.
  • The correlation coefficient between x and y is the GM (geometric mean) of the two regression coefficients, and it takes the common sign of the two regression coefficients. According to this property, if we denote the regression coefficients as byx (=b) and bxy (=b’), then the correlation coefficient is:

r  = ± \( \sqrt{b_{yx} \times b_{xy}} \)

Hence, if both regression coefficients are negative, then r is negative as well, and if both are positive, then r is positive.
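The three properties above can be checked numerically. The following is a rough sketch on made-up data; the helper `coeff`, the sample values, and the particular shifts and scales (`a_shift`, `c_shift`, `p`, `q`) are all assumptions chosen for illustration.

```python
from math import copysign, sqrt
from statistics import fmean


def coeff(dep, ind):
    """Least-squares regression coefficient of `dep` on `ind`."""
    d_bar, i_bar = fmean(dep), fmean(ind)
    cov = sum((d - d_bar) * (i - i_bar) for d, i in zip(dep, ind))
    var = sum((i - i_bar) ** 2 for i in ind)
    return cov / var  # the common 1/n factor cancels


# Illustrative data only (hypothetical).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
x_bar, y_bar = fmean(xs), fmean(ys)
b_yx, b_xy = coeff(ys, xs), coeff(xs, ys)

# Property 1: with u = (x - a)/p and v = (y - c)/q, b_yx = (q/p) * b_vu.
a_shift, c_shift, p, q = 10, 20, 2, 5  # arbitrary origin shifts and scales
us = [(x - a_shift) / p for x in xs]
vs = [(y - c_shift) / q for y in ys]
b_vu = coeff(vs, us)

# Property 2: intersect y = a_yx + b_yx*x with x = a_xy + b_xy*y.
# (Undefined when b_yx * b_xy == 1, i.e. when the two lines coincide.)
a_yx = y_bar - b_yx * x_bar
a_xy = x_bar - b_xy * y_bar
x_int = (a_xy + b_xy * a_yx) / (1 - b_yx * b_xy)
y_int = a_yx + b_yx * x_int

# Property 3: r is the geometric mean of the coefficients, with their sign.
r_from_coeffs = copysign(sqrt(b_yx * b_xy), b_yx)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
r_direct = sxy / sqrt(sxx * syy)

print(b_yx, (q / p) * b_vu)      # equal up to rounding
print((x_int, y_int), (x_bar, y_bar))
print(r_from_coeffs, r_direct)   # equal up to rounding
```

The shifts `a_shift` and `c_shift` drop out of the result entirely, while the scales enter only through the ratio q/p.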

Solved Question on Regression

Question: Two variables x and u are related by u + 3x = 10, and two other variables y and v are related by 2y + 5v = 25. The regression coefficient of y on x is 0.80. What is the regression coefficient of v on u?

Solution: Given that,

u + 3x = 10
u = \( \frac{x - \frac{10}{3}}{\frac{-1}{3}} \)

Also,

2y + 5v = 25
v = \( \frac{y - \frac{25}{2}}{\frac{-5}{2}} \)

Comparing with u = (x – a)/p and v = (y – c)/q gives p = -1/3 and q = -5/2. We know,

\( b_{yx} \) = \( \frac{q}{p} \) × \( b_{vu} \)
0.80 = \( \frac{-5/2}{-1/3} \) × \( b_{vu} \)
0.80 = 7.5 × \( b_{vu} \)
\( b_{vu} \) = 0.80/7.5 = 8/75 ≈ 0.107
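The arithmetic in this solution can be reproduced with exact fractions. This is only a quick check of the numbers, with p and q read off from the rearranged equations for u and v:

```python
from fractions import Fraction

b_yx = Fraction(4, 5)    # given: coefficient of y on x is 0.80
p = Fraction(-1, 3)      # from u = (x - 10/3) / (-1/3)
q = Fraction(-5, 2)      # from v = (y - 25/2) / (-5/2)

# b_yx = (q/p) * b_vu, so b_vu = b_yx * p / q.
b_vu = b_yx * p / q
print(b_vu, float(b_vu))
```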

Question: The regression equations for variables x and y are 7x – 3y – 18 = 0 and 4x – y – 11 = 0.

  1. What are the AMs (arithmetic means) of x and y?
  2. Find the correlation coefficient between x and y.

Solution: 

(i) The two regression lines intersect at the point [\( \bar{x} \), \( \bar{y} \)]. Therefore, we replace x and y with \( \bar{x} \) and \( \bar{y} \) and solve the two equations simultaneously:

7x – 3y = 18
4x – y = 11

Hence, on solving these two equations we get \( \bar{x} \) = 3 and \( \bar{y} \) = 1.

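(ii) Take 7x – 3y – 18 = 0 as the regression line of y on x, so y = (7/3)x – 6 and \( b_{yx} \) = 7/3. Take 4x – y – 11 = 0 as the regression line of x on y, so x = (1/4)y + 11/4 and \( b_{xy} \) = 1/4. The opposite assignment would give \( b_{yx} \times b_{xy} \) = 4 × 3/7 = 12/7, which is impossible because r2 cannot exceed 1.

We know,

\( r^2 = b_{yx} \times b_{xy} = \frac{7}{3} \times \frac{1}{4} = \frac{7}{12} \)

Therefore,

r = \( \sqrt{ \frac{7}{12}} \) (r is positive as both the coefficients are positive)
= 0.7638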
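Both parts of this solution can be checked with a few lines of code. The sketch below stores each line as a hypothetical (A, B, C) triple meaning Ax + By = C, solves for the intersection with Cramer's rule, and then tries both ways of assigning the lines, keeping the one whose coefficient product does not exceed 1.

```python
from fractions import Fraction
from math import sqrt

line1 = (7, -3, 18)   # 7x - 3y - 18 = 0
line2 = (4, -1, 11)   # 4x - y - 11 = 0

# (i) The lines intersect at (x_bar, y_bar): solve the 2x2 system.
(a1, b1, c1), (a2, b2, c2) = line1, line2
det = a1 * b2 - a2 * b1
x_bar = Fraction(c1 * b2 - c2 * b1, det)
y_bar = Fraction(a1 * c2 - a2 * c1, det)


def coefficients(line_yx, line_xy):
    """Read b_yx from the y-on-x line and b_xy from the x-on-y line."""
    b_yx = Fraction(-line_yx[0], line_yx[1])   # y = (C - A*x) / B
    b_xy = Fraction(-line_xy[1], line_xy[0])   # x = (C - B*y) / A
    return b_yx, b_xy


# (ii) Keep the assignment for which r^2 = b_yx * b_xy lies in [0, 1].
for line_yx, line_xy in ((line1, line2), (line2, line1)):
    b_yx, b_xy = coefficients(line_yx, line_xy)
    if 0 <= b_yx * b_xy <= 1:
        break

r_squared = b_yx * b_xy
sign = 1 if b_yx > 0 else -1          # r takes the sign of the coefficients
r = sign * sqrt(float(r_squared))
print(x_bar, y_bar, r_squared, round(r, 4))  # 3 1 7/12 0.7638
```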
