Scatter diagrams are convenient tools for studying the correlation between two random variables. As the name suggests, they are plots on which the data points for the two variables of interest are scattered. From the shape of the pattern the points form, we can judge the association between the two variables and then choose the most suitable correlation analysis technique.
Interpretation of Scatter Diagrams
A scatter diagram of two random variables uses the variables as its x- and y-axes. Either variable may be treated as the independent one (the other then being the dependent one), and each observation is plotted as a point (x_i, y_i) on the graph. The totality of the plotted points forms the scatter diagram.
Different shapes of the scatter plot lead to different inferences. We can also calculate a coefficient of correlation for the given data, which is a quantitative measure of the linear association between the random variables. Its value always lies between -1 and +1 (inclusive), and it may be positive or negative.
In the case of a positive correlation, the plotted points run from the lower left corner to the upper right corner (spread roughly evenly about a straight line with a positive slope). In the case of a negative correlation, the plotted points run from the upper left to the lower right (spread about a straight line with a negative slope).
If the points are scattered randomly, or spread almost equally everywhere without showing any particular pattern, the correlation is very small, tending to 0.
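As a minimal sketch of how such a coefficient can be computed, the snippet below implements Pearson's product-moment formula (covered in the related topic on Karl Pearson's coefficient). The function name `pearson_r` and the sample data are illustrative assumptions, not part of the original text.

```python
import math

def pearson_r(xs, ys):
    """Pearson's coefficient of correlation for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y rises exactly in step with x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
r = pearson_r(xs, ys)
print(r)  # a perfectly linear, rising pattern gives r at the upper limit
```

Whatever data is supplied, the result stays within the interval from -1 to +1, provided neither variable is constant (a constant variable makes the denominator zero).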
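The three cases described above can be imitated with simulated data. This is only a sketch under assumed settings: the slopes, noise level, sample size and seed are arbitrary choices made for illustration, and `pearson_r` is a hypothetical helper name.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson's coefficient of correlation for paired observations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
xs = [rng.uniform(0, 10) for _ in range(200)]

# Points spread about a line with positive slope (lower left to upper right).
rising = [2 * x + rng.gauss(0, 1) for x in xs]
# Points spread about a line with negative slope (upper left to lower right).
falling = [-2 * x + rng.gauss(0, 1) for x in xs]
# Points with no pattern at all: y is generated independently of x.
patternless = [rng.gauss(0, 1) for _ in xs]

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
r_patternless = pearson_r(xs, patternless)
print(r_rising, r_falling, r_patternless)
```

With these settings the first value comes out close to +1, the second close to -1, and the third close to 0, matching the three visual patterns.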
Types of Patterns
Now look at the different patterns a scatter diagram can form, shown below with their corresponding coefficient of correlation values, and try to make sense of them.
[Figure: scatter patterns with their coefficient of correlation values. Source: Wikipedia]
It is clear that r = 0 may arise in many ways, for example when the pattern is symmetric about a particular point or when the points are scattered at random. Note that the scatter diagram by itself does not assign quantitative values as measures of correlation. It simply gives an idea of what association to expect between the random variables of interest.
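To see how a clearly patterned plot can still give r = 0, consider a sketch with points lying on a symmetric curve. The data below (y equal to the square of x, for x from -3 to 3) is a made-up illustration, and `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(xs, ys):
    """Pearson's coefficient of correlation for paired observations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# A U-shaped pattern, symmetric about x = 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

r_symmetric = pearson_r(xs, ys)
print(r_symmetric)
```

Here y is completely determined by x, yet the rising right half cancels the falling left half, so the coefficient comes out as zero. This is why the plot should be inspected alongside the number.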
Now go through the solved example below, to understand how to make your own scatter plots and analyze them.
Solved Examples on Scatter Diagram
Question: Draw the scatter diagram for the given pair of variables and understand the type of correlation between them.
|No. of Students|Marks obtained (out of 100)|
Here, we take the two variables for consideration as:
M: The marks obtained out of 100
S: Number of students
Since the values of M are given as class intervals (bins), we can use the centre point of each class in the scatter diagram instead. So let us first choose the axes of our diagram.
X-axis – Marks obtained out of 100
Y-axis – Number of Students
The data points that we need to plot according to the given dataset are –
(45,12), (55,10), (65,8), (75,7), (85,5), (95,2)
Here’s what the plot looks like:
From the shape of the plot, it is clear that fewer students obtain high marks. This implies a negative correlation between the two variables considered here, which is fairly intuitive: think of the distribution of marks in your own class.
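The solved example can be reproduced with a short sketch. The six data points are the ones listed above. The helper names `pearson_r` and `plot_scatter` are hypothetical, and the plotting function assumes the third-party matplotlib library is installed (it is imported only when the function is called).

```python
import math

# Class centre points (marks) and the number of students in each class.
marks = [45, 55, 65, 75, 85, 95]
students = [12, 10, 8, 7, 5, 2]

def pearson_r(xs, ys):
    """Pearson's coefficient of correlation for paired observations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def plot_scatter(xs, ys):
    """Draw the scatter diagram (assumes matplotlib is available)."""
    import matplotlib.pyplot as plt
    plt.scatter(xs, ys)
    plt.xlabel("Marks obtained (out of 100)")
    plt.ylabel("Number of students")
    plt.title("Scatter diagram: marks vs. number of students")
    plt.show()

r_marks = pearson_r(marks, students)
print(r_marks)  # negative, and close to the lower limit of -1
```

Calling `plot_scatter(marks, students)` draws the diagram itself. The strongly negative coefficient agrees with the downward slope seen in the plot.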