What Is the Pearson Correlation Coefficient? A Plain-English Guide
Updated 2026-06-11 · 7 min read
The Pearson correlation coefficient, written r, is a single number between −1 and +1 that measures how closely two sets of numbers move together. For investors it answers a practical question: when one ETF goes up, does the other tend to go up too, move the opposite way, or do its own thing? This guide explains what r is, how it is calculated, and how to read it without the jargon.
What the number actually means
Pearson's r captures the strength and direction of a linear relationship between two variables:
- +1 (perfect positive): the two move in lockstep, up together and down together.
- 0 (no linear relationship): knowing one does not help you predict the other with a straight line, although the two can still be related in a curved way.
- −1 (perfect negative): when one rises, the other falls by a proportional amount.
Real-world pairs land somewhere in between. Two broad A-share ETFs might show r ≈ 0.9 (they rise and fall together), while an equity ETF and a government-bond ETF often sit near zero or slightly negative — which is exactly why investors pair them.
The formula, demystified
For two series x and y, Pearson's r is the covariance of x and y divided by the product of their standard deviations:
r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √[ Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)² ]
In words: for each observation, measure how far x is from its average and how far y is from its average, keeping the sign of each deviation, then multiply the two deviations and add the products up. If x and y tend to be above (or below) their averages at the same time, the products are positive and r climbs toward +1. If one is high when the other is low, the products are negative and r falls toward −1. Dividing by the standard deviations rescales everything into the fixed −1 to +1 range, so the result doesn't depend on the units.
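The steps described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the tool's actual implementation: the function name pearson_r and the two short return series are made up for the example.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed directly from the deviation formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Signed distance of each observation from its own average.
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    # Numerator: sum of the products of paired deviations.
    numerator = sum(a * b for a, b in zip(dx, dy))
    # Denominator: square root of the product of the two sums of squares.
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("r is undefined when either series is constant")
    return numerator / denominator

# Made-up daily returns (in percent) for two hypothetical funds.
fund_a = [1.0, -0.5, 0.8, -1.2, 0.3]
fund_b = [0.9, -0.4, 0.6, -1.0, 0.1]
r = pearson_r(fund_a, fund_b)
print(round(r, 3))
```

Because the two made-up series rise and fall on the same days, the result lands close to +1. Note that r is undefined when either series never changes, which is why the sketch raises an error in that case.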
How strong is 'strong'? Reading the value
There's no universal law, but a common rule of thumb for the absolute value of r is:
- 0.8–1.0 — very strong
- 0.6–0.8 — strong
- 0.4–0.6 — moderate
- 0.2–0.4 — weak
- 0.0–0.2 — negligible
The sign matters as much as the size: r = −0.7 is just as strong a relationship as r = +0.7, only in the opposite direction. For diversification, a low or negative correlation is often the goal — see the diversification guide.
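The rule of thumb above can be written as a small lookup. This is a sketch only: the function name describe_correlation is hypothetical, and because the bands in the list share their boundary values, the code assumes a boundary value such as 0.8 belongs to the stronger band.

```python
def describe_correlation(r):
    """Return (strength, direction) labels for a correlation value."""
    if abs(r) > 1.0 + 1e-9:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # strength depends only on the absolute value
    if size >= 0.8:
        strength = "very strong"
    elif size >= 0.6:
        strength = "strong"
    elif size >= 0.4:
        strength = "moderate"
    elif size >= 0.2:
        strength = "weak"
    else:
        strength = "negligible"
    # The sign carries the direction, independent of the strength.
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction

print(describe_correlation(0.7))
print(describe_correlation(-0.7))
```

Both calls report the same strength and differ only in direction, which is the point made above about r = −0.7 and r = +0.7.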
The big caveat: correlation is not causation
A high r tells you two series moved together; it does not tell you one caused the other. Both may be driven by a third factor (for A-share ETFs, that's often broad market sentiment or liquidity). Pearson's r also only sees linear relationships — two variables can be strongly related in a curved way and still show r near zero. Treat r as a useful summary, not the whole story.
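The linear-only limitation is easy to demonstrate. In the sketch below, y is completely determined by x (it is x squared), yet r comes out at zero because the relationship is a symmetric curve rather than a line. The helper pearson_r and the data are illustrative assumptions.

```python
import math

def pearson_r(x, y):
    """Pearson's r from the deviation formula (illustrative helper)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# y depends entirely on x, but along a U-shaped curve.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

r_curved = pearson_r(x, y)
print(r_curved)
```

The positive and negative products cancel exactly on the two sides of the curve, so a near-zero r here says nothing about whether the variables are related, only that a straight line does not describe the relationship.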
Why investors compute r on returns, not prices
Computing r on prices is the single most common mistake. If you feed raw ETF prices into the formula, many pairs of funds look highly correlated simply because both prices trended in the same direction over the years. That shared trend swamps the real co-movement. The fix is to use daily returns (the percentage change from one day to the next). Returns strip out the long-term drift and reveal whether the funds actually move together day to day. Every result in this tool is computed on returns for exactly this reason.
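A deliberately extreme, made-up example shows the effect. The two hypothetical funds below both drift upward over 200 days, but on any given day one gains 2% while the other loses 1%, so their daily moves are exact opposites. All names and numbers are assumptions chosen for illustration, not real market data.

```python
import math

def pearson_r(x, y):
    """Pearson's r from the deviation formula (illustrative helper)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def build_prices(start, returns):
    """Compound a list of daily returns into a price series."""
    prices = [start]
    for ret in returns:
        prices.append(prices[-1] * (1 + ret))
    return prices

def daily_returns(prices):
    """Percentage change from one day to the next, as a fraction."""
    return [prices[i] / prices[i - 1] - 1 for i in range(1, len(prices))]

days = 200
# Fund A gains 2% on even days and loses 1% on odd days; fund B does the reverse.
returns_a = [0.02 if i % 2 == 0 else -0.01 for i in range(days)]
returns_b = [-0.01 if i % 2 == 0 else 0.02 for i in range(days)]
prices_a = build_prices(100.0, returns_a)
prices_b = build_prices(100.0, returns_b)

r_prices = pearson_r(prices_a, prices_b)
r_returns = pearson_r(daily_returns(prices_a), daily_returns(prices_b))
print("r on prices: ", round(r_prices, 3))
print("r on returns:", round(r_returns, 3))
```

On prices the pair looks almost perfectly correlated because the shared upward drift dominates. On returns the same pair is almost perfectly negatively correlated, which is the relationship that matters day to day.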
FAQ
What is a good correlation coefficient?
It depends on your goal. To confirm two funds track the same thing, you want r near +1. To diversify a portfolio, you want low or negative correlation — closer to 0 or below — so the holdings don't all fall at once.
Can the Pearson correlation be exactly 1 or −1?
In theory yes, but with real market data it almost never happens. Values of ±1 require a perfect straight-line relationship. Even two ETFs tracking very similar indexes usually land around 0.95–0.99, not exactly 1.
Is Pearson correlation the same as R-squared?
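They are related but not the same. In a simple linear regression with a single predictor, R-squared is r multiplied by itself (r²). So an r of 0.8 corresponds to an R-squared of 0.64, meaning about 64% of one variable's variation is explained linearly by the other. Unlike r, R-squared is never negative, so it cannot tell you the direction of the relationship.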
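The link between r and R-squared can be checked numerically. The sketch below fits an ordinary least-squares line to a small made-up data set, computes R-squared from the residuals, and compares it with r squared. The function names and data are illustrative assumptions, and the equality only holds for a straight-line fit with one predictor and an intercept.

```python
import math

def pearson_r(x, y):
    """Pearson's r from the deviation formula (illustrative helper)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def r_squared_of_line_fit(x, y):
    """R-squared of the least-squares line y = intercept + slope * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation left unexplained by the line vs. total variation in y.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Made-up data that roughly follows a straight line.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_value = pearson_r(x, y)
r_squared = r_squared_of_line_fit(x, y)
print(round(r_value ** 2, 6), round(r_squared, 6))
```

The two printed numbers agree up to floating-point rounding.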