Which Stocks Actually Drive Portfolio Returns? Shapley Value Attribution in Python
June 28, 2026
What's the question?
When an equal-weight portfolio of 8 stocks returns 22%, how much of that return is attributable to each stock? The naive answer is simple: multiply each stock's individual return by its portfolio weight. A stock that returned 40% with a 12.5% weight contributed 5 percentage points.
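As a minimal sketch of that naive calculation, the snippet below multiplies each weight by the standalone return. The function name `naive_attribution` and the four sample returns are hypothetical and chosen only for illustration; they are not market data.

```python
def naive_attribution(returns, weights):
    """Naive contribution of each asset: portfolio weight x standalone return."""
    return {name: weights[name] * r for name, r in returns.items()}

# The example from the text: a 40% return at a 12.5% weight.
text_example = 0.125 * 0.40          # 0.05, i.e. 5 percentage points

# Hypothetical period returns for a four-asset equal-weight portfolio.
returns = {"A": 0.40, "B": 0.10, "C": -0.05, "D": 0.15}
weights = {name: 1.0 / len(returns) for name in returns}

contrib = naive_attribution(returns, weights)
total = sum(contrib.values())        # weighted average of the four returns

for name, c in contrib.items():
    print(f"{name}: {c * 100:+.2f} pp")
print(f"total: {total * 100:+.2f} pp")
```

By construction the naive contributions add up to the weighted average of the standalone returns, which need not equal the return of a portfolio that is rebalanced daily.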
This approach misses something: it ignores interaction effects. A stock that rises 40% while the other seven fall contributes more than its weight suggests because it is the primary source of positive performance. Conversely, a stock that falls 25% while the other seven rise drags down every sub-portfolio it joins, so the loss attributed to it is larger than its weighted return implies.
The Shapley value, a concept from cooperative game theory, provides a mathematically rigorous alternative. Developed by Lloyd Shapley in 1953, it assigns each player in a cooperative game a value equal to their average marginal contribution across all possible coalitions. In portfolio terms, the "game" is the portfolio's total return, and each stock is a "player." The Shapley value for stock i is computed by averaging its marginal contribution to every possible subset of the other stocks.
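In symbols, the definition above is usually written as follows. The notation is an assumption chosen to match the combinatorial factor quoted in the approach section: N is the set of n players, v is the characteristic function, and the sum runs over every coalition S that excludes player i.

```latex
\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}}
  \frac{|S|!\,(n-|S|-1)!}{n!}\,
  \bigl[\, v(S \cup \{i\}) - v(S) \,\bigr],
\qquad
\sum_{i \in N} \phi_i(v) \;=\; v(N) - v(\emptyset).
```

The second identity is the efficiency property. When the empty coalition is assigned a value of zero, the Shapley values sum to v(N).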
With 8 stocks, there are 2^8 = 256 possible coalitions, making exact computation feasible. The key property: Shapley values sum exactly to the total portfolio return, providing a complete and fair decomposition.
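The sum-to-total property is easy to check on a game small enough to verify by hand. The sketch below uses a hypothetical three-player "glove game" (A holds a left glove, B and C each hold a right glove, and a coalition is worth 1 if it can form a pair). It has nothing to do with the stock data, and the helper name `shapley_values` is an assumption for illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values for a characteristic function v(frozenset) -> number."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = Fraction(0)
        for size in range(n):
            # Weight shared by every coalition of this size that excludes i.
            w = Fraction(factorial(size) * factorial(n - size - 1), factorial(n))
            for S in combinations(others, size):
                S = frozenset(S)
                total += w * (v(S | {i}) - v(S))
        phi[i] = total
    return phi

def v(S):
    """Hypothetical glove game: worth 1 if A is paired with B or C, else 0."""
    return 1 if "A" in S and ("B" in S or "C" in S) else 0

players = ["A", "B", "C"]
phi = shapley_values(players, v)
print(phi)                  # A gets 2/3, B and C get 1/6 each
print(sum(phi.values()))    # 1, the worth of the full coalition
```

Exact fractions are used here so the sum can be compared to v(N) without floating-point tolerance.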
The approach
- Pull 1 year of daily returns for 8 stocks spanning technology (AAPL, MSFT, NVDA), financials (JPM), energy (XOM), healthcare (JNJ, UNH), and consumer staples (PG)
- Define the characteristic function v(S) as the compounded return of an equal-weight sub-portfolio of coalition S
- For each stock, compute the Shapley value by iterating over all 2^7 = 128 coalitions that exclude it, measuring the marginal contribution of adding it, and weighting by the combinatorial factor |S|!(n−|S|−1)!/n!
- Compare Shapley attribution against naive weight-times-return attribution to identify where the two methods diverge
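A quick sanity check on the combinatorial factor in the steps above: for n = 8, the weights over the 128 coalitions that exclude a given stock should sum to 1, with each coalition size receiving the same total weight of 1/8. The helper name `shapley_weight` is an assumption for illustration.

```python
from fractions import Fraction
from math import comb, factorial

n = 8  # number of stocks in the portfolio

def shapley_weight(size, n):
    """Weight |S|!(n-|S|-1)!/n! applied to each coalition S of the given size."""
    return Fraction(factorial(size) * factorial(n - size - 1), factorial(n))

# Number of coalitions that exclude one fixed stock, summed over sizes 0..n-1.
n_coalitions = sum(comb(n - 1, size) for size in range(n))

# Total weight carried by each coalition size.
per_size_total = {size: comb(n - 1, size) * shapley_weight(size, n)
                  for size in range(n)}
total_weight = sum(per_size_total.values())

print(n_coalitions)      # 128
print(per_size_total)    # every size carries 1/8
print(total_weight)      # 1
```

This is why the Shapley value can be read as an average: first over coalition sizes, then over the coalitions of each size.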
Code
import xfinlink as xfl
import pandas as pd
import numpy as np
from itertools import combinations
from math import factorial

xfl.set_api_key("YOUR_API_KEY")  # free at https://xfinlink.com/signup

tickers = ["AAPL", "MSFT", "NVDA", "JPM", "XOM", "JNJ", "PG", "UNH"]
n = len(tickers)

# Daily returns, one column per ticker; drop dates with any missing value
df = xfl.prices(tickers, period="1y", fields=["return_daily"])
ret = df.pivot_table(index="date", columns="ticker", values="return_daily").dropna()
weights = {t: 1.0 / n for t in tickers}

def coalition_return(members):
    # Compounded return of a daily-rebalanced equal-weight sub-portfolio
    if not members:
        return 0.0
    daily = ret[list(members)].mean(axis=1)
    return (1 + daily).prod() - 1

full_return = coalition_return(tickers)

# Exact Shapley value: weighted marginal contribution over every coalition
# of the other stocks
shapley = {}
for i in tickers:
    sv = 0.0
    others = [t for t in tickers if t != i]
    for size in range(0, n):
        for S in combinations(others, size):
            S_set = set(S)
            marginal = coalition_return(list(S_set | {i})) - coalition_return(list(S_set))
            weight = factorial(len(S_set)) * factorial(n - len(S_set) - 1) / factorial(n)
            sv += weight * marginal
    shapley[i] = sv

# Naive attribution: portfolio weight times standalone compounded return
naive = {t: weights[t] * ((1 + ret[t]).prod() - 1) for t in tickers}

for t in sorted(shapley, key=shapley.get, reverse=True):
    print(f"{t:>5} Shapley: {shapley[t]*100:>+7.2f}% "
          f"Naive: {naive[t]*100:>+7.2f}% "
          f"Diff: {(shapley[t]-naive[t])*100:>+6.2f} pp")

print(f"\n SUM Shapley: {sum(shapley.values())*100:>+7.2f}% "
      f"Portfolio: {full_return*100:>+7.2f}%")
Full script with formatting and visualisation: shapley-value-portfolio-attribution-python.py
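The loop above calls coalition_return 2 × 128 × 8 = 2,048 times, although only 256 distinct coalitions exist. One possible refinement, sketched below, is to cache results by coalition. This is not part of the original script: it uses only the standard library, and the five asset names and random daily returns are synthetic stand-ins for the API data.

```python
import random
from functools import lru_cache
from itertools import combinations
from math import factorial, prod

# Synthetic daily returns (assumption: stand-in for the downloaded data).
rng = random.Random(0)
names = ["A", "B", "C", "D", "E"]
n = len(names)
n_days = 252
daily = {t: [rng.gauss(0.0005, 0.01) for _ in range(n_days)] for t in names}

@lru_cache(maxsize=None)
def coalition_return(members):
    """Compounded return of a daily-rebalanced equal-weight sub-portfolio.

    `members` must be a frozenset so that results can be cached.
    """
    if not members:
        return 0.0
    ordered = sorted(members)
    growth = (1 + sum(daily[t][d] for t in ordered) / len(ordered)
              for d in range(n_days))
    return prod(growth) - 1

shapley = {}
for i in names:
    others = [t for t in names if t != i]
    sv = 0.0
    for size in range(n):
        w = factorial(size) * factorial(n - size - 1) / factorial(n)
        for S in combinations(others, size):
            S = frozenset(S)
            sv += w * (coalition_return(S | {i}) - coalition_return(S))
    shapley[i] = sv

full_return = coalition_return(frozenset(names))
print(coalition_return.cache_info())   # one cached entry per distinct coalition
print(sum(shapley.values()), full_return)
```

With five assets the cache ends up holding 2^5 = 32 entries, one per coalition including the empty one. For eight stocks the saving is modest, but the same pattern matters more as n grows.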
Output
JNJ Shapley: +18.46% Naive: +8.39% Diff: +10.08 pp
UNH Shapley: +10.21% Naive: +4.80% Diff: +5.41 pp
AAPL Shapley: +10.06% Naive: +5.14% Diff: +4.92 pp
XOM Shapley: +4.52% Naive: +3.10% Diff: +1.42 pp
NVDA Shapley: +3.93% Naive: +2.76% Diff: +1.18 pp
JPM Shapley: +0.34% Naive: +1.83% Diff: -1.48 pp
PG Shapley: -8.37% Naive: -0.85% Diff: -7.52 pp
MSFT Shapley: -16.44% Naive: -3.10% Diff: -13.34 pp
SUM Shapley: +22.73% Portfolio: +22.73%
What this tells us
The Shapley values diverge substantially from naive attribution. The most striking case is JNJ: naive attribution assigns it +8.39 percentage points (its 67% return times 12.5% weight), but the Shapley value is +18.46 percentage points, more than double. The main driver is relative performance. Because v(S) is the return of an equal-weight sub-portfolio, adding a stock to a coalition raises the coalition's return whenever the stock beats the coalition's average, and JNJ had the highest return of the eight. Across the 128 coalitions that exclude it, its marginal contribution therefore tends to exceed what weight-based attribution credits. Low correlation with the other holdings may add to this through daily rebalancing, but the code does not measure correlation directly.
MSFT shows the opposite pattern. Its naive attribution is −3.10 percentage points (a −25% return scaled by weight), but the Shapley value is −16.44 percentage points. MSFT had the lowest return of the eight, so it pulls down the equal-weight return of almost every coalition it joins, including those that already hold AAPL or NVDA. The Shapley value averages that drag over all coalition sizes, including small coalitions where MSFT carries far more than a 12.5% weight, whereas weight-based attribution only ever sees it at 12.5%.
PG presents a subtler case. Its naive contribution is only −0.85 percentage points (a modest −7% loss scaled by weight), but its Shapley value is −8.37 percentage points. Although the standalone loss is small, PG's return sits below that of six of the other seven stocks, so adding it lowers the return of most coalitions. Measured against what the rest of the portfolio earned, its cost is far larger than weight-based attribution suggests.
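One way to gauge how much of these gaps comes from relative performance alone is to simplify the characteristic function. The sketch below assumes v(S) is the plain average of the members' period returns (a buy-and-hold simplification, not the daily-rebalanced v(S) used in the post) and takes the period returns implied by the naive column, that is, naive contribution × 8, rounded. Under that assumption the Shapley value has a closed form: (1/n) × [r_i + (r_i − mean of the others) × (1/2 + … + 1/n)].

```python
from itertools import combinations
from math import factorial

# Period returns implied by the post's naive column (naive contribution x 8).
# Rounded values, used with a simplified v(S); both are assumptions.
r = {"JNJ": 0.6712, "UNH": 0.3840, "AAPL": 0.4112, "XOM": 0.2480,
     "NVDA": 0.2208, "JPM": 0.1464, "PG": -0.0680, "MSFT": -0.2480}
names = list(r)
n = len(names)

def v(S):
    """Simplified characteristic function: plain average of member returns."""
    return sum(r[t] for t in S) / len(S) if S else 0.0

def exact_shapley(i):
    """Exact Shapley value of stock i under the simplified v."""
    others = [t for t in names if t != i]
    total = 0.0
    for size in range(n):
        w = factorial(size) * factorial(n - size - 1) / factorial(n)
        for S in combinations(others, size):
            total += w * (v(S + (i,)) - v(S))
    return total

def closed_form(i):
    """(1/n) * [r_i + (r_i - mean of the others) * (1/2 + ... + 1/n)]."""
    others_mean = sum(r[t] for t in names if t != i) / (n - 1)
    harmonic_tail = sum(1.0 / k for k in range(2, n + 1))
    return (r[i] + (r[i] - others_mean) * harmonic_tail) / n

for t in names:
    print(f"{t:>5}  exact: {exact_shapley(t) * 100:+6.2f} pp   "
          f"closed form: {closed_form(t) * 100:+6.2f} pp")
```

The two columns agree with each other because the closed form is exact for this simplified game. They only approximate the figures in the Output section, since the post's v(S) compounds a daily-rebalanced portfolio. The closed form makes the mechanism explicit: a stock's Shapley value grows with the gap between its own return and the average return of the other holdings.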
The Shapley values sum exactly to the portfolio's 22.73% return. The naive attributions sum to 22.07% — close but not exact, because the weight-times-return decomposition ignores compounding and rebalancing effects in a daily equal-weight portfolio.
So what?
Weight-based attribution is the industry default for portfolio performance analysis. This exercise demonstrates that, relative to Shapley attribution under this characteristic function, it understates the contribution of stocks that outperform the rest of the portfolio and understates the damage of those that lag it. The gap between Shapley and naive attribution is not a rounding error: it reaches 13 percentage points for MSFT and 10 percentage points for JNJ.
For portfolio construction, the implication is clear: a stock's value to a portfolio depends on what else is in the portfolio. JNJ's Shapley value of +18.46 percentage points versus its naive attribution of +8.39 quantifies how much it lifted the sub-portfolios it joined. Conversely, MSFT's amplified negative Shapley value quantifies how much it dragged them down.
With 8 stocks, exact Shapley computation takes seconds. For larger portfolios, where the number of coalitions doubles with every added stock, Monte Carlo sampling of random permutations can approximate the values, with accuracy improving as more permutations are sampled. The guarantee that Shapley values sum to the total return is the efficiency axiom, and the Shapley value is the unique allocation that simultaneously satisfies efficiency, symmetry, the null-player property, and additivity.
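A minimal sketch of the permutation-sampling idea follows. The function name `shapley_monte_carlo`, the 20 asset labels, and the random period returns are all hypothetical, and v is a simplified stand-in (the average of member returns) rather than the daily-rebalanced coalition_return from the script.

```python
import random

def shapley_monte_carlo(players, v, n_permutations=2000, seed=0):
    """Estimate Shapley values by averaging marginal contributions
    over randomly sampled orderings of the players."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    for _ in range(n_permutations):
        order = list(players)
        rng.shuffle(order)
        coalition = []
        previous = v(coalition)          # value of the empty coalition
        for p in order:
            coalition.append(p)
            current = v(coalition)
            totals[p] += current - previous
            previous = current
    return {p: total / n_permutations for p, total in totals.items()}

# Hypothetical period returns for 20 assets (random numbers, not market data).
data_rng = random.Random(1)
returns = {f"S{k:02d}": data_rng.uniform(-0.3, 0.6) for k in range(20)}

def v(members):
    """Simplified stand-in for coalition_return: average of member returns."""
    return sum(returns[m] for m in members) / len(members) if members else 0.0

estimate = shapley_monte_carlo(list(returns), v)
print(sum(estimate.values()), v(list(returns)))
```

Each sampled ordering builds the full portfolio one stock at a time, so the marginal contributions within an ordering add up to v(N). The estimates therefore sum to the total return for any number of permutations; only the split between stocks carries sampling error.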
Built with xfinlink — free financial data API for Python. pip install xfinlink