Do Bond Returns Predict Stock Returns? Granger Causality Test in Python
June 28, 2026
What's the question?
A longstanding question in macro finance is whether bond market movements contain information about future equity returns. The logic is intuitive: bonds reflect expectations about interest rates, inflation, and credit conditions. If those expectations shift before equity markets react, then lagged bond returns would carry predictive power for stock returns.
The Granger causality test provides a formal statistical framework for this question. Developed by Clive Granger in 1969, the test asks whether the past values of one time series improve the forecast of another, beyond what the second series' own history can explain. Importantly, "Granger causality" does not imply true causation. It measures temporal precedence and incremental predictive content. If lagged bond returns significantly reduce the forecast error for stock returns, bonds are said to Granger-cause stocks.
The test works by estimating two models. The restricted model regresses stock returns on their own lags only. The unrestricted model adds lagged bond returns as additional regressors. An F-test then compares the residual sums of squares of the two models. A low p-value (conventionally below 0.05) indicates that the bond return lags, taken jointly, contribute statistically significant explanatory power.
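To make the restricted-versus-unrestricted comparison concrete, here is a minimal sketch that computes the F-statistic by hand with NumPy. The helper name `granger_f_stat` and the synthetic series are illustrative assumptions, not part of the original script. The p-value would then come from an F distribution with (lag, residual degrees of freedom) degrees of freedom.

```python
import numpy as np

def granger_f_stat(y, x, lag):
    """F-statistic for H0: lags 1..lag of x add nothing to an AR(lag) model of y.

    Hypothetical helper for illustration only.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(y) - lag                      # usable observations after lagging
    target = y[lag:]
    # Column j holds the series shifted back by j + 1 periods.
    y_lags = np.column_stack([y[lag - j - 1 : len(y) - j - 1] for j in range(lag)])
    x_lags = np.column_stack([x[lag - j - 1 : len(x) - j - 1] for j in range(lag)])
    const = np.ones((n, 1))

    def rss(design):
        # Ordinary least squares fit, then residual sum of squares
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r = rss(np.hstack([const, y_lags]))           # restricted: own lags only
    rss_u = rss(np.hstack([const, y_lags, x_lags]))   # unrestricted: adds x lags
    df_resid = n - (2 * lag + 1)                      # observations minus parameters
    return ((rss_r - rss_u) / lag) / (rss_u / df_resid)

# Synthetic data (an assumption): x leads y by one period by construction.
rng = np.random.default_rng(0)
x = rng.normal(size=300)
noise = rng.normal(scale=0.5, size=300)
y = np.zeros(300)
y[1:] = 0.8 * x[:-1] + noise[1:]

f_xy = granger_f_stat(y, x, lag=1)   # does x help predict y?
f_yx = granger_f_stat(x, y, lag=1)   # does y help predict x?
print(f"x -> y: F = {f_xy:.2f}")
print(f"y -> x: F = {f_yx:.2f}")
```

Because x is built to lead y, the first statistic should be large and the second small. On real return data the gap is usually far less clear-cut.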
The approach
The test uses monthly returns for five ETFs: four bond funds spanning the fixed-income spectrum, namely TLT (20+ year Treasury bonds), IEF (7-10 year Treasury bonds), SHY (1-3 year Treasury bonds), and AGG (broad aggregate bond index), plus SPY representing the equity market.
- Pull 7 years of monthly closing prices for all five ETFs
- Compute monthly returns from closing prices
- Run the Granger causality test in both directions: bond returns predicting stock returns, and stock returns predicting bond returns
- Test lag orders 1 through 4 (1 to 4 months of history) to check whether predictive content emerges at longer horizons
Duration diversity matters. If the yield curve transmits information to equity markets, the effect may appear at different lag structures for long-duration bonds (TLT) versus short-duration instruments (SHY).
Code
import xfinlink as xfl
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

xfl.set_api_key("YOUR_API_KEY")  # free at https://xfinlink.com/signup

tickers = ["SPY", "TLT", "IEF", "SHY", "AGG"]
df = xfl.prices(tickers, period="7y", interval="1mo", fields=["close"])

# Build one single-column DataFrame of monthly returns per ticker
returns = {}
for t in tickers:
    sub = df[df["ticker"] == t].sort_values("date").copy()
    sub["return"] = sub["close"].pct_change()
    sub = sub.dropna(subset=["return"])
    returns[t] = sub[["date", "return"]].set_index("date")
    returns[t].columns = [t]

# Keep only the dates available for every ticker
merged = returns["SPY"]
for t in tickers[1:]:
    merged = merged.join(returns[t], how="inner")

bond_etfs = ["TLT", "IEF", "SHY", "AGG"]
for bond in bond_etfs:
    # Column order matters: the test asks whether the second column (bond)
    # Granger-causes the first column (SPY)
    test_data = merged[["SPY", bond]].copy()
    gc = grangercausalitytests(test_data, maxlag=4, verbose=False)
    for lag in range(1, 5):
        f_stat = gc[lag][0]["ssr_ftest"][0]
        p_val = gc[lag][0]["ssr_ftest"][1]
        sig = "YES" if p_val < 0.05 else "no"
        print(f"{bond:<6} lag={lag} F={f_stat:.3f} p={p_val:.4f} {sig}")
Full script with formatting and visualisation: bond-returns-granger-causality-stock-python.py
Output
Observation period: 2019-08 to 2026-06
Monthly observations: 83
Bond ETF   Lag   F-stat   p-value   Significant?
-------------------------------------------------------
TLT        1     2.416    0.1241    no
TLT        2     1.898    0.1569    no
TLT        3     1.539    0.2117    no
TLT        4     1.136    0.3467    no
IEF        1     2.364    0.1282    no
IEF        2     1.846    0.1649    no
IEF        3     2.466    0.0689    no
IEF        4     1.708    0.1579    no
SHY        1     0.974    0.3268    no
SHY        2     1.455    0.2399    no
SHY        3     4.564    0.0055    YES
SHY        4     3.307    0.0154    YES
AGG        1     3.198    0.0775    no
AGG        2     2.788    0.0679    no
AGG        3     2.586    0.0596    no
AGG        4     1.837    0.1314    no
Reverse test: Do stock returns Granger-cause bond returns?
Bond ETF   Lag   F-stat   p-value   Significant?
-------------------------------------------------------
TLT        1     0.072    0.7898    no
TLT        2     0.129    0.8791    no
TLT        3     0.188    0.9043    no
TLT        4     0.151    0.9618    no
IEF        1     0.458    0.5004    no
IEF        2     0.231    0.7942    no
IEF        3     0.242    0.8666    no
IEF        4     0.170    0.9528    no
SHY        1     1.778    0.1863    no
SHY        2     0.970    0.3836    no
SHY        3     0.541    0.6559    no
SHY        4     0.379    0.8228    no
AGG        1     0.056    0.8143    no
AGG        2     0.165    0.8482    no
AGG        3     0.359    0.7827    no
AGG        4     0.295    0.8806    no
What this tells us
Long- and intermediate-duration Treasuries (TLT, IEF) and the broad bond aggregate (AGG) do not Granger-cause equity returns at any lag tested. P-values range from 0.06 to 0.35, none crossing the 0.05 threshold. The bond market's most volatile and widely watched segment, the long end of the yield curve, does not contain statistically significant predictive information for monthly stock returns over this sample.
The exception is SHY, the 1-3 year Treasury ETF, which achieves significance at lag order 3 (p = 0.006) and lag order 4 (p = 0.015). Each test is a joint test on all lags up to the stated order, so the signal emerges only once the third month of bond history is included. Short-term Treasuries are the segment of the curve most directly influenced by Federal Reserve policy expectations. A shift in SHY returns reflects changing expectations for near-term fed funds rate decisions. The 3-4 month lag is consistent with the horizon over which monetary policy expectations might crystallize and propagate into risk asset repricing, though the test itself does not identify the mechanism.
The reverse test is uniformly insignificant. Equity returns do not Granger-cause bond returns at any lag or for any bond ETF. The relationship, to the extent it exists, is one-directional: short-term rate expectations may lead equities, but equities do not lead bonds.
AGG approaches significance at lags 1 through 3 (p-values of 0.08, 0.07, and 0.06) without crossing the threshold. A plausible explanation is that AGG blends duration exposures, diluting whatever signal SHY carries at longer lags while picking up partial sensitivity to rate expectations through its short-duration allocation.
So what?
For portfolio managers and systematic traders, the results suggest that monitoring short-term Treasury returns, rather than the long bond, may offer a marginal timing edge for equity allocation decisions at the quarterly horizon. SHY captures the market's real-time pricing of near-term monetary policy, and in this sample that pricing appears to lead stock market moves by 3 to 4 months.
This does not mean building a trading strategy on the SHY signal alone. With only 83 monthly observations and two significant results out of 16 bond-to-stock tests (32 counting the reverse direction), the finding is suggestive rather than definitive, and multiple testing inflates the chance of a false positive. The F-test also says nothing about the sign of the relationship, so the direction of any effect has to be read from the estimated lag coefficients. The practical application is as a supplementary input: if the coefficients are positive and short-term bond returns have been consistently positive (rates falling, rate cuts expected) over the past quarter, that may be a tailwind for equities. If short-term bonds have sold off (rates rising, tightening expected), the headwind may take another quarter to arrive in stock prices.
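The multiple-testing caveat can be checked directly against the 16 bond-to-stock p-values in the output table. The sketch below applies Bonferroni and Holm corrections. The corrections are an addition for illustration, not part of the original analysis, and `holm_rejections` is a hypothetical helper. Both corrections are conservative here because the 16 tests are not independent: lag orders nest, and the bond ETFs are correlated.

```python
# p-values for the bond -> stock direction, copied from the output table
p_values = {
    ("TLT", 1): 0.1241, ("TLT", 2): 0.1569, ("TLT", 3): 0.2117, ("TLT", 4): 0.3467,
    ("IEF", 1): 0.1282, ("IEF", 2): 0.1649, ("IEF", 3): 0.0689, ("IEF", 4): 0.1579,
    ("SHY", 1): 0.3268, ("SHY", 2): 0.2399, ("SHY", 3): 0.0055, ("SHY", 4): 0.0154,
    ("AGG", 1): 0.0775, ("AGG", 2): 0.0679, ("AGG", 3): 0.0596, ("AGG", 4): 0.1314,
}

alpha = 0.05
m = len(p_values)
bonferroni_cutoff = alpha / m   # 0.05 / 16 = 0.003125

def holm_rejections(pvals, alpha):
    """Holm step-down: compare the k-th smallest p-value to alpha / (m - k)."""
    ordered = sorted(pvals.items(), key=lambda kv: kv[1])
    rejected = []
    for rank, (key, p) in enumerate(ordered):
        if p > alpha / (len(ordered) - rank):
            break   # stop at the first p-value that fails its threshold
        rejected.append(key)
    return rejected

bonferroni_hits = [key for key, p in p_values.items() if p <= bonferroni_cutoff]
holm_hits = holm_rejections(p_values, alpha)

print(f"Tests: {m}, Bonferroni cutoff: {bonferroni_cutoff:.6f}")
print(f"Significant after Bonferroni: {bonferroni_hits}")
print(f"Significant after Holm: {holm_hits}")
```

The smallest p-value in the table (SHY at lag 3, 0.0055) sits above the Bonferroni cutoff of 0.003125, so neither SHY result survives either correction. That fits the "suggestive rather than definitive" reading.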
The absence of long-bond predictive power is itself informative. One plausible reading is that duration risk in TLT and IEF is driven by term premium and inflation expectations, factors that the equity market prices concurrently rather than with a lag. The short end, by contrast, may carry cleaner monetary policy information that takes time to filter through credit conditions, corporate earnings expectations, and investor positioning.
Built with xfinlink — free financial data API for Python. pip install xfinlink