Are KO and PEP Cointegrated? Pairs Trading Signal Construction in Python
June 28, 2026
What's the question?
Pairs trading depends on a specific statistical property: cointegration. Two stock prices are cointegrated if a linear combination of them is stationary — meaning the spread between them fluctuates around a fixed mean rather than drifting permanently. This is stronger than correlation. Two stocks can be highly correlated in returns yet not cointegrated in prices, because correlation measures co-movement within a period while cointegration measures a long-run equilibrium that the prices are pulled back toward.
The Engle-Granger two-step procedure provides a direct test. First, regress one price series on the other using ordinary least squares to estimate the hedge ratio — the number of shares of stock A that offsets one share of stock B. Second, test whether the residuals (the spread) contain a unit root using the Augmented Dickey-Fuller test. If the spread is stationary (p-value below 0.05), the pair is cointegrated and a mean-reversion signal has a statistical foundation. If the spread has a unit root, there is no equilibrium to revert to, and trading the spread is no different from trading a random walk.
Three classic same-sector pairs serve as test cases: Coca-Cola and PepsiCo (beverages), Chevron and ExxonMobil (oil majors), and Home Depot and Lowe’s (home improvement). The common assumption is that competitors in the same industry should maintain a stable price relationship. The data may disagree.
The approach
- Pull 5 years of daily split-adjusted closing prices for all six stocks
- For each pair, run OLS regression to estimate the hedge ratio (beta coefficient)
- Compute the spread as stock B minus hedge ratio times stock A, which equals the regression residual up to the constant intercept (the z-score step removes that constant)
- Run the ADF test on each spread to determine cointegration status
- Compute the z-score of each spread to generate a trading signal: values above +2 or below −2 indicate the spread has moved more than two standard deviations from its full-sample mean
Code
import xfinlink as xfl
import pandas as pd
import numpy as np
from statsmodels.tsa.stattools import adfuller
from statsmodels.regression.linear_model import OLS
from statsmodels.tools import add_constant

xfl.set_api_key("YOUR_API_KEY")  # free at https://xfinlink.com/signup

tickers = ["KO", "PEP", "CVX", "XOM", "HD", "LOW"]
df = xfl.prices(tickers, period="5y", fields=["adj_close"])
pairs = [("KO", "PEP"), ("CVX", "XOM"), ("HD", "LOW")]

for t1, t2 in pairs:
    # Align the two price series on their common dates
    s1 = df[df["ticker"] == t1].sort_values("date").set_index("date")["adj_close"]
    s2 = df[df["ticker"] == t2].sort_values("date").set_index("date")["adj_close"]
    aligned = pd.concat([s1, s2], axis=1, keys=[t1, t2]).dropna()

    # Step 1: regress t2 on t1 (with intercept); the slope is the hedge ratio
    X = add_constant(aligned[t1].values)
    model = OLS(aligned[t2].values, X).fit()
    hedge_ratio = model.params[1]

    # Step 2: ADF test on the spread (residual up to the constant intercept)
    spread = aligned[t2].values - hedge_ratio * aligned[t1].values
    adf_stat, p_value = adfuller(spread, autolag="AIC")[:2]
    tag = "COINTEGRATED" if p_value < 0.05 else "NOT COINTEGRATED"

    # Z-score of the spread against its full-sample mean and standard deviation
    z = (spread - spread.mean()) / spread.std()
    print(f"{t1}/{t2}: hedge={hedge_ratio:.4f} ADF={adf_stat:.4f} "
          f"p={p_value:.4f} ({tag}) z_now={z[-1]:.2f}")
Full script with formatting and visualisation: pairs-cointegration-signal-python.py
Output
Pair       Hedge Ratio   ADF Stat   p-value   Status             Z (now)
---------------------------------------------------------------------------
KO/PEP         -0.8954    -3.4219    0.0102   COINTEGRATED         -0.42
CVX/XOM         0.8880    -1.5055    0.5308   NOT COINTEGRATED      1.24
HD/LOW          0.5531    -3.0084    0.0341   COINTEGRATED         -0.51
What this tells us
Two of the three pairs are cointegrated at the 5% significance level. KO/PEP (p = 0.010) and HD/LOW (p = 0.034) both reject the null hypothesis of a unit root in the spread. CVX/XOM (p = 0.531) does not.
The KO/PEP result contains a counterintuitive detail: the hedge ratio is negative (−0.8954). Over the past five years, Coca-Cola has risen approximately 54% while PepsiCo has declined slightly, so their price levels have moved in opposite directions despite being in the same industry. The negative hedge ratio reflects this divergence. Because the spread is PEP minus the hedge ratio times KO, a negative ratio makes the stationary combination a sum, PEP plus about 0.90 shares of KO, rather than a difference. Trading this spread therefore means holding both stocks on the same side, which is not a market-hedged long/short pair. The ADF test still classifies the combination as stationary, meaning it oscillates around a fixed level rather than growing without bound. The chart confirms this: the KO/PEP z-score touches the ±2 bands and reverts, generating clear entry signals.
CVX/XOM tells a different story. Despite being the two largest U.S. oil majors, their spread has drifted persistently since 2022, visible as the steady upward movement in the middle panel of the chart. The z-score sits at +1.24 and has been above zero for most of the last two years. The ADF statistic of −1.51 is far from the 5% critical value of roughly −2.86 needed for rejection. There is no statistical basis for mean-reversion trading on this pair at the five-year horizon.
HD/LOW is cointegrated with a hedge ratio of 0.5531, meaning roughly 0.55 shares of Home Depot for each share of Lowe’s. The z-score at −0.51 places the spread near fair value, offering no immediate entry signal, but the cointegration relationship validates the pair as a candidate for future signals.
So what?
The Engle-Granger test converts a subjective judgment — “these two stocks should trade together” — into a statistical claim with a measurable confidence level. Of three pairs that most practitioners would assume are cointegrated, one fails the test entirely. The CVX/XOM result is a direct warning against assuming that same-sector membership implies a stable spread.
For the two cointegrated pairs, the z-score provides an actionable signal. A z-score above +2 suggests the spread is extended and may contract (short the spread). A z-score below −2 suggests the opposite (long the spread). The current readings of −0.42 (KO/PEP) and −0.51 (HD/LOW) indicate neither pair is at an extreme, so there is no immediate trade — which is itself useful information.
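The entry rule above can be sketched as a small function. The name `zscore_signal` and the `entry` parameter are hypothetical, and the sketch covers entries only: it has no exit rule, position sizing, or transaction costs.

```python
import numpy as np

def zscore_signal(z, entry=2.0):
    """Return -1 (short the spread), +1 (long the spread), or 0 (no trade) per observation."""
    z = np.asarray(z, dtype=float)
    return np.where(z > entry, -1, np.where(z < -entry, 1, 0))

# The first value is the current KO/PEP reading; the rest are made-up examples.
signals = zscore_signal([-0.42, 2.5, -2.1, 0.0])
print(signals.tolist())  # [0, -1, 1, 0]
```

The comparison is strict, so a z-score of exactly ±2 produces no signal; that is a design choice in this sketch, not something the analysis prescribes.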
One important limitation: the hedge ratio estimated over 5 years is not guaranteed to remain stable. Rolling the estimation window (for example, re-estimating every quarter using the trailing 2 years) and confirming that the cointegration relationship persists before entering each trade adds robustness. The test is not a one-time event. It is a recurring diagnostic that should be re-run before each entry and monitored during each position.
Built with xfinlink — free financial data API for Python. pip install xfinlink