Regression analysis studies a distribution that involves more than one variable. It focuses on predicting the value of a variable that depends on another variable. Let's look at regression in more detail.
Regression Lines
Let there be two variables, x and y. If y depends on x, the relationship is described by simple regression. We name the variables x and y as:
y – Regressand or Dependent Variable or Explained Variable
x – Independent Variable or Predictor or Explanator
Therefore, if we use a simple linear regression model where y depends on x, then the regression line of y on x is:
y = a + bx
Regression Coefficient
The two constants a and b are the regression parameters. We denote the constant b as \( b_{yx} \) and call it the regression coefficient of y on x.
The regression line of y on x can also be described as the line of best fit, because it is obtained by the method of least squares. This method chooses a and b so that the errors in estimating y from x, i.e. the dependent variable from the independent variable, are as small as possible in the sum-of-squares sense.
Least Squares Method
\( \sum e_i^2 = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - a - b x_i)^2 \)
Here, \( y_i \) is the actual or observed value, \( \hat{y}_i = a + b x_i \) is the estimated value of \( y_i \) for a given value \( x_i \), and \( e_i = y_i - \hat{y}_i \) is the difference between the observed and the estimated value, called the error or residual.
[Figure: the regression line of y on x along with the estimation errors]
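To make the least squares criterion concrete, here is a minimal Python sketch that computes the sum of squared errors for two candidate lines. The sample data and the function name `sum_squared_errors` are assumptions made up for illustration and do not come from the text.

```python
def sum_squared_errors(a, b, xs, ys):
    """Return the sum of e_i^2, where e_i = y_i - (a + b*x_i)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical sample data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Two candidate lines y = a + b*x for the same data.
sse_first = sum_squared_errors(2.2, 0.6, xs, ys)
sse_second = sum_squared_errors(2.0, 0.6, xs, ys)

# The better fit is the candidate with the smaller sum of squared errors.
print(sse_first, sse_second)
```

For this data the first candidate gives the smaller sum, so it is the better fit of the two. The least squares method looks for the pair (a, b) that makes this sum as small as possible over all lines.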
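Minimizing this sum of squares, by setting its partial derivatives with respect to a and b to zero, gives the following two equations. We refer to them as the Normal Equations.
\( \sum y_i = na + b \sum x_i \)
\( \sum x_i y_i = a \sum x_i + b \sum x_i^2 \)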
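We get the least squares estimates of a and b by solving the above two equations simultaneously. With \( S_x \) and \( S_y \) denoting the standard deviations of x and y, and r the correlation coefficient between them:
\( b = \frac{\operatorname{Cov}(x, y)}{S_x^2} \)
\( = \frac{r S_x S_y}{S_x^2} \)
\( = \frac{r S_y}{S_x} \)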
Once b has been estimated, the estimate of a is:
a = \( \bar{y} \) – b\( \bar{x} \)
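As a rough check of these formulas, the following Python sketch estimates a and b in two ways: by solving the two normal equations directly, and by using b = Cov(x, y)/Sx² with a = ȳ – b x̄. The data set and both function names are hypothetical, and population (divide-by-n) moments are assumed.

```python
import math


def fit_by_normal_equations(xs, ys):
    """Solve sum(y) = n*a + b*sum(x) and sum(xy) = a*sum(x) + b*sum(x^2)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    det = n * sum_xx - sum_x ** 2
    if det == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / det
    a = (sum_y - b * sum_x) / n  # from the first normal equation
    return a, b


def fit_by_moments(xs, ys):
    """Use b = Cov(x, y) / Sx^2 and a = y_bar - b * x_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    b = cov_xy / var_x
    a = y_bar - b * x_bar
    return a, b


# Hypothetical sample data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a1, b1 = fit_by_normal_equations(xs, ys)
a2, b2 = fit_by_moments(xs, ys)
same = math.isclose(a1, a2) and math.isclose(b1, b2)
print(a1, b1, a2, b2, same)
```

Both routes give the same pair (a, b) for this data, which is what the derivation predicts.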
Substituting the estimates of a and b into y = a + bx gives the regression line of y on x in standard form:
\( \frac{y - \bar{y}}{S_y} = r \, \frac{x - \bar{x}}{S_x} \)
Sometimes, variable x depends on variable y instead. In such cases, the line of regression of x on y is:
\( x = \hat{a} + \hat{b} y \)
Regression Equation
The standard form of the regression equation of variable x on y is:
\( \frac{x - \bar{x}}{S_x} = r \, \frac{y - \bar{y}}{S_y} \)
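The regression line of x on y is in general not the line of y on x solved for x. The Python sketch below illustrates this on a small hypothetical data set; the function name `regression_coefficients` and the data are assumptions for illustration, with population (divide-by-n) moments.

```python
def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy, x_bar, y_bar) from population moments."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    var_y = sum((y - y_bar) ** 2 for y in ys) / n
    return cov_xy / var_x, cov_xy / var_y, x_bar, y_bar


# Hypothetical sample data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b_yx, b_xy, x_bar, y_bar = regression_coefficients(xs, ys)

# Line of y on x: y = y_bar + b_yx * (x - x_bar)
# Line of x on y: x = x_bar + b_xy * (y - y_bar)
# Solving the y-on-x line for x would give a slope of 1 / b_yx instead.
slope_if_inverted = 1 / b_yx
print(b_yx, b_xy, slope_if_inverted)
```

The two lines coincide only when the correlation is perfect (r = ±1); otherwise `b_xy` differs from `1 / b_yx`, as it does here.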
Properties of Regression Lines
Here are some of the important properties of regression lines.
- Regression coefficients are unaffected by a shift of origin, but they do change with a change of scale. According to this property, if the original variables (x, y) are changed to (u, v), where a and c are the shifts of origin and p and q are the scale factors, with:
u = (x – a)/p
v = (y – c)/q
then:
\( b_{yx} = \frac{q}{p} \times b_{vu} \)
Also,
\( b_{xy} = \frac{p}{q} \times b_{uv} \)
- The two lines of regression, y on x and x on y, intersect at the point [\( \bar{x} \), \( \bar{y} \)]. Hence, this point is the simultaneous solution of the two regression equations.
- The correlation coefficient between the two variables x and y is the geometric mean (GM) of the two regression coefficients, and it takes the common sign of both regression coefficients. According to this property, if we denote the regression coefficients as \( b_{yx} \) (= b) and \( b_{xy} \) (= b'), then the correlation coefficient is:
\( r = \pm \sqrt{b_{yx} \times b_{xy}} \)
Hence, if both regression coefficients are negative, r is negative as well. If both are positive, r is positive.
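The three properties above can be checked numerically. The sketch below does so in Python on a hypothetical data set; the helper names, the data, and the shift and scale values (`a0`, `p`, `c0`, `q`) are all assumptions chosen for illustration.

```python
import math


def mean(values):
    return sum(values) / len(values)


def cov(xs, ys):
    """Population covariance; cov(xs, xs) is the variance of xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / len(xs)


def slope(dep, indep):
    """Regression coefficient of dep on indep."""
    return cov(indep, dep) / cov(indep, indep)


# Hypothetical sample data and transformation constants.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a0, p = 1.0, 2.0    # u = (x - a0) / p
c0, q = 3.0, -4.0   # v = (y - c0) / q
us = [(x - a0) / p for x in xs]
vs = [(y - c0) / q for y in ys]

b_yx, b_xy = slope(ys, xs), slope(xs, ys)
b_vu, b_uv = slope(vs, us), slope(us, vs)

# Property 1: effect of a change of origin and scale.
scale_ok = (math.isclose(b_yx, (q / p) * b_vu)
            and math.isclose(b_xy, (p / q) * b_uv))

# Property 2: the two lines intersect at (x_bar, y_bar).
x_bar, y_bar = mean(xs), mean(ys)
a_yx = y_bar - b_yx * x_bar   # intercept of y on x
a_xy = x_bar - b_xy * y_bar   # intercept of x on y
x_star = (a_xy + b_xy * a_yx) / (1 - b_xy * b_yx)  # requires r^2 != 1
y_star = a_yx + b_yx * x_star
meet_ok = math.isclose(x_star, x_bar) and math.isclose(y_star, y_bar)

# Property 3: r is the geometric mean of the coefficients, with their sign.
r = cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))
r_from_b = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
gm_ok = math.isclose(r, r_from_b)

print(scale_ok, meet_ok, gm_ok)
```

Note that the shifts `a0` and `c0` never appear in the coefficient relations; only the scale factors `p` and `q` do.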
Solved Question on Regression
Question: Two variables x and u are related by u + 3x = 10. Similarly, two other variables y and v are related by 2y + 5v = 25. The regression coefficient of y on x is 0.80. What is the regression coefficient of v on u?
Solution: Given that,
u + 3x = 10
\( u = \frac{x - \frac{10}{3}}{-\frac{1}{3}} \)
Also,
2y + 5v = 25
\( v = \frac{y - \frac{25}{2}}{-\frac{5}{2}} \)
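Comparing these with u = (x – a)/p and v = (y – c)/q, we have p = –1/3 and q = –5/2. We know,
\( b_{yx} = \frac{q}{p} \times b_{vu} \)
\( 0.80 = \frac{-5/2}{-1/3} \times b_{vu} \)
\( 0.80 = 7.5 \times b_{vu} \)
\( b_{vu} = \frac{0.80}{7.5} = \frac{8}{75} \approx 0.107 \)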
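The arithmetic can be double-checked with exact fractions. This is a small Python sketch using the standard `fractions` module; the variable names simply mirror the symbols in the solution.

```python
from fractions import Fraction

b_yx = Fraction(80, 100)   # given regression coefficient of y on x
p = Fraction(-1, 3)        # from u = (x - 10/3) / (-1/3)
q = Fraction(-5, 2)        # from v = (y - 25/2) / (-5/2)

# b_yx = (q / p) * b_vu, so b_vu = b_yx * p / q
b_vu = b_yx * p / q
print(b_vu, float(b_vu))
```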
Question: The regression equations for variables x and y are 7x – 3y – 18 = 0 and 4x – y – 11 = 0.
- What are the arithmetic means (AM) of x and y?
- Find the correlation coefficient between x and y.
Solution:
(i) The two regression lines intersect at the point [\( \bar{x} \), \( \bar{y} \)]. Therefore, we replace x and y with \( \bar{x} \) and \( \bar{y} \) in both equations:
7\( \bar{x} \) – 3\( \bar{y} \) = 18
4\( \bar{x} \) – \( \bar{y} \) = 11
Hence, on solving these two equations we get \( \bar{x} \) = 3 and \( \bar{y} \) = 1.
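(ii) Taking 7x – 3y – 18 = 0 as the regression line of y on x gives y = (7/3)x – 6, so \( b_{yx} \) = 7/3. Taking 4x – y – 11 = 0 as the regression line of x on y gives x = (1/4)y + 11/4, so \( b_{xy} \) = 1/4. (The opposite assignment would give \( b_{yx} \times b_{xy} \) = 4 × 3/7 = 12/7, which exceeds 1 and is therefore not possible.) We know,
\( r^2 = b_{yx} \times b_{xy} = \frac{7}{3} \times \frac{1}{4} = \frac{7}{12} \)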
Therefore,
r = \( \sqrt{ \frac{7}{12}} \) (r is positive as both the coefficients are positive)
= 0.7638
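Both parts of this solution can be reproduced with a short Python sketch. The helper names below are hypothetical; the sketch tries both ways of assigning the two equations to "y on x" and "x on y" and keeps the assignment whose coefficient product does not exceed 1.

```python
from fractions import Fraction
import math


def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("the lines are parallel or identical")
    return Fraction(c1 * b2 - c2 * b1, det), Fraction(a1 * c2 - a2 * c1, det)


# (i) The means are the intersection of 7x - 3y = 18 and 4x - y = 11.
x_bar, y_bar = solve_2x2(7, -3, 18, 4, -1, 11)


# (ii) For a line cx*x + cy*y + k = 0:
#   read as y on x, the coefficient is -cx/cy;
#   read as x on y, the coefficient is -cy/cx.
def coef_y_on_x(cx, cy):
    return Fraction(-cx, cy)


def coef_x_on_y(cx, cy):
    return Fraction(-cy, cx)


candidates = [
    (coef_y_on_x(7, -3), coef_x_on_y(4, -1)),  # first line is y on x
    (coef_y_on_x(4, -1), coef_x_on_y(7, -3)),  # second line is y on x
]

# r^2 = b_yx * b_xy cannot exceed 1, which rules out one assignment.
b_yx, b_xy = next(pair for pair in candidates if pair[0] * pair[1] <= 1)
r_squared = b_yx * b_xy
r = math.copysign(math.sqrt(float(r_squared)), float(b_yx))
print(x_bar, y_bar, r_squared, r)
```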