Correlation
Correlation is a statistical measure that describes the extent to which two variables move together. It does NOT establish cause and effect — only the degree of association.
Meaning of Correlation
Positive Correlation: Both variables move in the same direction — when one increases, the other also increases. Example: Income and expenditure; rainfall and crop yield.
Negative Correlation: Variables move in opposite directions — when one increases, the other decreases. Example: Price and demand; temperature and sales of heaters.
Zero Correlation: No systematic relationship between the variables.
Perfect Positive Correlation: r = +1
Perfect Negative Correlation: r = -1
No Correlation: r = 0
The value of r always lies between -1 and +1: -1 ≤ r ≤ +1.
Methods of Measuring Correlation
1. Scatter Diagram (Scatter Plot):
A visual method. Plot pairs of (X, Y) values on a graph. The pattern of dots indicates the direction and strength of correlation. No calculation involved — only a rough visual indicator.
2. Karl Pearson's Coefficient of Correlation (r):
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \times \sum (Y - \bar{Y})^2}}$$
Also written as:
$$r = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \times \sum dy^2}}$$
where $dx = X - \bar{X}$ and $dy = Y - \bar{Y}$.
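Assumed Mean Method:
Here $dx = X - A$ and $dy = Y - B$ are deviations from assumed means A and B (any convenient values), not from the actual means:
$$r = \frac{N \sum dx\,dy - \sum dx \sum dy}{\sqrt{\left(N \sum dx^2 - (\sum dx)^2\right)\left(N \sum dy^2 - (\sum dy)^2\right)}}$$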
r is a pure number — no units. It is always between -1 and +1.
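The two forms of Pearson's formula can be checked against each other with a short script. This is a minimal sketch: the function names, the sample data, and the assumed means `a=2`, `b=3` are illustrative choices, not part of the notes above.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r using deviations from the actual means."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    sum_dxdy = sum(p * q for p, q in zip(dx, dy))
    sum_dx2 = sum(p * p for p in dx)
    sum_dy2 = sum(q * q for q in dy)
    # Raises ZeroDivisionError if either series is constant (r is undefined).
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)

def pearson_r_assumed_mean(xs, ys, a, b):
    """Pearson's r using deviations from assumed means a and b."""
    n = len(xs)
    dx = [x - a for x in xs]
    dy = [y - b for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dxdy = sum(p * q for p, q in zip(dx, dy))
    sum_dx2 = sum(p * p for p in dx)
    sum_dy2 = sum(q * q for q in dy)
    numerator = n * sum_dxdy - sum_dx * sum_dy
    denominator = sqrt((n * sum_dx2 - sum_dx ** 2) * (n * sum_dy2 - sum_dy ** 2))
    return numerator / denominator

# Illustrative data (made up for this sketch).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_direct = pearson_r(xs, ys)
r_assumed = pearson_r_assumed_mean(xs, ys, a=2, b=3)
print(round(r_direct, 4), round(r_assumed, 4))  # 0.7746 0.7746
```

Both routes give the same r because the choice of assumed mean cancels out of the formula; it only makes the hand arithmetic easier.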
3. Spearman's Rank Correlation (rs):
Used when data is in ranks or when the data is not normally distributed.
$$r_s = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$$
where D = difference between ranks of corresponding values, N = number of pairs.
When ranks are repeated (tied ranks), each tied value is assigned the average of the ranks the group would otherwise occupy, and a correction factor of $(m^3 - m)/12$ is added to $\sum D^2$ for every group of m tied values.
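The ranking and tie-handling steps can be sketched as follows. Several things here are assumptions rather than part of the notes: rank 1 goes to the largest value, the tie correction is the commonly taught $(m^3 - m)/12$ per group of m tied values in either series, and the function names and sample marks are made up for illustration.

```python
from collections import Counter

def average_ranks(values):
    """Rank values (1 = largest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = ((i + 1) + (j + 1)) / 2  # average of rank positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def tie_correction(ranks):
    """Sum of (m^3 - m) / 12 over every group of m tied ranks (assumed form)."""
    return sum((m ** 3 - m) / 12 for m in Counter(ranks).values() if m > 1)

def spearman_rs(xs, ys):
    """Spearman's rank correlation from raw values, with tie correction."""
    n = len(xs)
    rank_x = average_ranks(xs)
    rank_y = average_ranks(ys)
    sum_d2 = sum((p - q) ** 2 for p, q in zip(rank_x, rank_y))
    sum_d2 += tie_correction(rank_x) + tie_correction(rank_y)
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

# Illustrative marks (made up); the two 60s tie for ranks 2 and 3.
marks_x = [50, 60, 60, 70]
marks_y = [30, 40, 50, 60]

print(average_ranks(marks_x))                  # [4.0, 2.5, 2.5, 1.0]
print(round(spearman_rs(marks_x, marks_y), 4)) # 0.9
```

When there are no ties the correction is zero and the function reduces to the plain formula.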
Interpretation of r
| r value | Interpretation |
|---------|----------------|
| r = +1 | Perfect positive correlation |
| +0.75 to below +1 | High positive correlation |
| +0.25 to below +0.75 | Moderate positive correlation |
| Above 0 to below +0.25 | Low positive correlation |
| r = 0 | No correlation |
| Negative values | Negative correlation, with the same degrees applied to the size of r (e.g. r = -0.8 is high negative) |
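The table can be turned into a small lookup function. This is a sketch only: the name `interpret_r` is hypothetical, and treating each band's lower bound (0.25, 0.75) as inclusive is an assumption about how the boundaries are meant to be read.

```python
def interpret_r(r):
    """Return a verbal label for a correlation coefficient, following the table."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)  # the degree depends only on the magnitude of r
    if size == 1:
        degree = "perfect"
    elif size >= 0.75:
        degree = "high"
    elif size >= 0.25:
        degree = "moderate"
    else:
        degree = "low"
    return f"{degree} {direction} correlation"

print(interpret_r(0.92))  # high positive correlation
print(interpret_r(-0.4))  # moderate negative correlation
```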
Worked Examples
State the type of correlation: As the price of a good rises, the quantity demanded falls.
Negative correlation — price and demand move in opposite directions.
Calculate r given N = 5, $\sum dx\,dy = 40$, $\sum dx^2 = 50$, $\sum dy^2 = 50$, where the deviations are taken from the actual means.
r = 40 / √(50 × 50) = 40 / 50 = 0.8 (high positive correlation)
Calculate Spearman's rank correlation for N=4, with D values = 1, -1, 2, -2.
$\sum D^2 = 1 + 1 + 4 + 4 = 10$
rs = 1 - (6 × 10) / (4 × (16 - 1)) = 1 - 60/60 = 1 - 1 = 0
Three students are ranked in Mathematics and Science:
Math ranks: 1, 2, 3; Science ranks: 3, 2, 1. Find D: (1-3)=-2, (2-2)=0, (3-1)=2. $D^2$: 4, 0, 4. $\sum D^2 = 8$.
rs = 1 - (6×8) / (3×(9-1)) = 1 - 48/24 = 1 - 2 = -1 (perfect negative correlation)
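The arithmetic in the three numerical examples above can be double-checked with a few lines of code. The helper names below are hypothetical; they simply plug the given sums and rank differences into the two formulas.

```python
from math import sqrt

def r_from_sums(sum_dxdy, sum_dx2, sum_dy2):
    """Pearson's r from sums of deviations taken about the actual means."""
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)

def rs_from_differences(d_values):
    """Spearman's rs from a list of rank differences D (no tied ranks)."""
    n = len(d_values)
    sum_d2 = sum(d * d for d in d_values)
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

print(r_from_sums(40, 50, 50))               # 0.8
print(rs_from_differences([1, -1, 2, -2]))   # 0.0
print(rs_from_differences([-2, 0, 2]))       # -1.0
```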
Interpret r = +0.92.
This indicates a high positive correlation between the two variables — they move strongly in the same direction.
In a scatter diagram, points cluster along a line sloping downward from left to right. What does this suggest?
Negative correlation — as X increases, Y decreases.
Why does correlation not imply causation? Give an example.
Ice cream sales and drowning deaths are positively correlated — both rise in summer. But ice cream does not cause drowning; the hidden cause is hot weather (a lurking variable).
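The lurking-variable effect can be illustrated with numbers. The figures below are entirely made up for this sketch (they are not real sales or accident data): both series were written to rise with temperature, and neither is computed from the other, yet they correlate strongly with each other.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r using deviations from the actual means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_dxdy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_dx2 = sum((x - x_bar) ** 2 for x in xs)
    sum_dy2 = sum((y - y_bar) ** 2 for y in ys)
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)

# Synthetic, illustrative figures for six periods (hypothetical values).
temperature = [10, 15, 20, 25, 30, 35]      # the lurking variable
ice_cream_sales = [23, 28, 41, 49, 62, 67]  # rises with temperature
drownings = [1, 3, 4, 5, 6, 7]              # also rises with temperature

r_ice_drown = pearson_r(ice_cream_sales, drownings)
r_temp_ice = pearson_r(temperature, ice_cream_sales)
r_temp_drown = pearson_r(temperature, drownings)

# All three coefficients come out above 0.9 for this data.
print(round(r_ice_drown, 2), round(r_temp_ice, 2), round(r_temp_drown, 2))
```

The high r between ice cream sales and drownings says nothing about one causing the other; in this constructed data it arises only because both follow temperature.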
Common mistakes
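The most common error is concluding that correlation proves causation. Another is applying Pearson's r to a non-linear relationship: r measures only linear association, so a strong curved relationship can give a misleadingly low value. Spearman's rank correlation is appropriate when data is ordinal or when the relationship is monotonic but not necessarily linear.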
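The difference between the two coefficients shows up on a relationship that is monotonic but curved. The sketch below uses an illustrative choice, Y = X³ for X = 1 to 6, with hypothetical helper names; the simple ranking helper assumes there are no tied values.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r using deviations from the actual means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_dxdy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_dx2 = sum((x - x_bar) ** 2 for x in xs)
    sum_dy2 = sum((y - y_bar) ** 2 for y in ys)
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)

def simple_ranks(values):
    """Rank values (1 = largest). Assumes every value is distinct."""
    position = {v: i + 1 for i, v in enumerate(sorted(values, reverse=True))}
    return [position[v] for v in values]

def spearman_rs(xs, ys):
    """Spearman's rs = 1 - 6 * Sum(D^2) / (N * (N^2 - 1)), no ties."""
    n = len(xs)
    sum_d2 = sum((p - q) ** 2 for p, q in zip(simple_ranks(xs), simple_ranks(ys)))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]  # always increasing, but not a straight line

r = pearson_r(xs, ys)
rs = spearman_rs(xs, ys)
print(round(r, 3), rs)  # 0.938 1.0
```

Spearman's rs reaches 1 because the ranks agree exactly, while Pearson's r falls short of 1 because the points do not lie on a straight line.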
Summary
Correlation measures the strength and direction of association between two variables. r ranges from -1 to +1. Methods include scatter diagrams, Pearson's coefficient, and Spearman's rank correlation. High correlation does not mean one variable causes the other.