
Which Stocks Actually Drive Portfolio Returns? Shapley Value Attribution in Python

What's the question?

When an equal-weight portfolio of 8 stocks returns 22%, how much of that return is attributable to each stock? The naive answer is simple: multiply each stock's individual return by its portfolio weight. A stock that returned 40% with a 12.5% weight contributed 5 percentage points.
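As a quick check on the arithmetic above, the naive rule fits in one line. The function name `naive_contribution` is illustrative and not part of any library.

```python
def naive_contribution(stock_return, weight):
    """Weight-times-return attribution for a single holding."""
    return weight * stock_return

n_stocks = 8
equal_weight = 1.0 / n_stocks          # 12.5% per stock

contribution = naive_contribution(0.40, equal_weight)
print(f"{contribution * 100:.1f} percentage points")   # prints 5.0 percentage points
```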

This approach is wrong. It ignores interaction effects. A stock that rises 40% while the other seven fall contributes more than its weight suggests because it is the primary source of positive performance. Conversely, a stock that falls 25% while other stocks also fall has an amplified negative effect because it fails to diversify when diversification is needed most.

The Shapley value, a concept from cooperative game theory, provides a mathematically rigorous alternative. Developed by Lloyd Shapley in 1953, it assigns each player in a cooperative game a value equal to their average marginal contribution across all possible coalitions. In portfolio terms, the "game" is the portfolio's total return, and each stock is a "player." The Shapley value for stock i is computed by averaging its marginal contribution to every possible subset of the other stocks.
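Written out, the definition above takes the standard form below. Here N is the set of n stocks, S ranges over coalitions that exclude stock i, and v(S) is the value of coalition S. The symbols are a notational choice for this sketch.

```latex
\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}}
  \frac{|S|!\,(n - |S| - 1)!}{n!}\,
  \bigl[\, v(S \cup \{i\}) - v(S) \,\bigr],
\qquad
\sum_{i \in N} \phi_i(v) \;=\; v(N) - v(\varnothing)
```

The second identity is the efficiency property: the individual values add up to the value of the full coalition.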

With 8 stocks, there are 2^8 = 256 possible coalitions, making exact computation feasible. The key property: Shapley values sum exactly to the total portfolio return, providing a complete and fair decomposition.
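To make the definition concrete before moving to real data, here is a minimal sketch on a three-stock toy game. The returns in `toy_returns` are made up, and the coalition value is simplified to the plain average of member returns (no daily compounding), so the numbers are illustrative only.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values; v maps a frozenset of players to a number."""
    n = len(players)
    values = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):                      # coalition sizes 0 .. n-1
            for S in combinations(others, size):
                S = frozenset(S)
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                total += weight * (v(S | {i}) - v(S))
        values[i] = total
    return values

# Hypothetical period returns for three made-up stocks
toy_returns = {"A": 0.30, "B": 0.10, "C": -0.10}

def toy_v(S):
    # Equal-weight average return of the coalition; the empty coalition earns 0
    return sum(toy_returns[p] for p in S) / len(S) if S else 0.0

toy_shapley = shapley_values(list(toy_returns), toy_v)
for name, value in toy_shapley.items():
    print(f"{name}: Shapley {value:+.4f}   naive {toy_returns[name] / 3:+.4f}")
print("sum of Shapley values:", round(sum(toy_shapley.values()), 10))
print("full-coalition value: ", round(toy_v(frozenset(toy_returns)), 10))
```

The last two printed numbers should agree, which is the efficiency property on a small scale.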

The approach

  1. Pull 1 year of daily returns for 8 stocks spanning technology (AAPL, MSFT, NVDA), financials (JPM), energy (XOM), healthcare (JNJ, UNH), and consumer staples (PG)
  2. Define the characteristic function v(S) as the compounded return of an equal-weight sub-portfolio of coalition S
  3. For each stock, compute the Shapley value by iterating over all 2^7 = 128 coalitions that exclude it, measuring the marginal contribution of adding it, and weighting by the combinatorial factor |S|!(n−|S|−1)!/n!
  4. Compare Shapley attribution against naive weight-times-return attribution to identify where the two methods diverge
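The combinatorial factor in step 3 can be checked on its own. The sketch below counts the coalitions of each size for n = 8 and shows that every size ends up with the same total weight of 1/8. The helper name `shapley_weight` is illustrative.

```python
from math import comb, factorial

n = 8  # number of stocks in the portfolio

def shapley_weight(size, n):
    """Weight on one coalition of the given size that excludes the stock."""
    return factorial(size) * factorial(n - size - 1) / factorial(n)

total_coalitions = 0
total_weight = 0.0
for size in range(n):
    count = comb(n - 1, size)        # coalitions of this size among the other 7 stocks
    w = shapley_weight(size, n)
    total_coalitions += count
    total_weight += count * w
    print(f"size {size}: {count:>2} coalitions, weight each {w:.5f}, size total {count * w:.3f}")

print("coalitions:", total_coalitions, " total weight:", round(total_weight, 12))
```

Because each size carries equal total weight, the single coalition of size 0 (the stock on its own) counts as much as all 35 coalitions of size 3 combined.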

Code

import xfinlink as xfl
import pandas as pd
import numpy as np
from itertools import combinations
from math import factorial

xfl.set_api_key("YOUR_API_KEY")  # free at https://xfinlink.com/signup

tickers = ["AAPL", "MSFT", "NVDA", "JPM", "XOM", "JNJ", "PG", "UNH"]
n = len(tickers)

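# Fetch daily returns and pivot to one column per ticker; drop dates with any gap
df = xfl.prices(tickers, period="1y", fields=["return_daily"])
ret = df.pivot_table(index="date", columns="ticker", values="return_daily").dropna()
weights = {t: 1.0 / n for t in tickers}

def coalition_return(members):
    # Characteristic function v(S): compounded return of an equal-weight
    # sub-portfolio of the coalition; the empty coalition earns nothing
    if not members:
        return 0.0
    daily = ret[list(members)].mean(axis=1)
    return (1 + daily).prod() - 1

full_return = coalition_return(tickers)

shapley = {}
for i in tickers:
    sv = 0.0
    others = [t for t in tickers if t != i]
    # Enumerate every coalition of the other stocks, from empty to all seven
    for size in range(0, n):
        for S in combinations(others, size):
            S_set = set(S)
            # Marginal contribution of adding stock i to coalition S
            marginal = coalition_return(list(S_set | {i})) - coalition_return(list(S_set))
            # Shapley weight |S|!(n-|S|-1)!/n!
            weight = factorial(len(S_set)) * factorial(n - len(S_set) - 1) / factorial(n)
            sv += weight * marginal
    shapley[i] = sv

# Naive attribution: portfolio weight times the stock's own compounded return
naive = {t: weights[t] * ((1 + ret[t]).prod() - 1) for t in tickers}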

for t in sorted(shapley, key=shapley.get, reverse=True):
    print(f"{t:>5}   Shapley: {shapley[t]*100:>+7.2f}%   "
          f"Naive: {naive[t]*100:>+7.2f}%   "
          f"Diff: {(shapley[t]-naive[t])*100:>+6.2f} pp")
print(f"\n  SUM   Shapley: {sum(shapley.values())*100:>+7.2f}%   "
      f"Portfolio: {full_return*100:>+7.2f}%")

Full script with formatting and visualisation: shapley-value-portfolio-attribution-python.py

Output

Figure: grouped bar chart comparing Shapley value attribution with naive weight-based attribution for the 8 stocks.

  JNJ   Shapley:  +18.46%   Naive:   +8.39%   Diff: +10.08 pp
  UNH   Shapley:  +10.21%   Naive:   +4.80%   Diff:  +5.41 pp
 AAPL   Shapley:  +10.06%   Naive:   +5.14%   Diff:  +4.92 pp
  XOM   Shapley:   +4.52%   Naive:   +3.10%   Diff:  +1.42 pp
 NVDA   Shapley:   +3.93%   Naive:   +2.76%   Diff:  +1.18 pp
  JPM   Shapley:   +0.34%   Naive:   +1.83%   Diff:  -1.48 pp
   PG   Shapley:   -8.37%   Naive:   -0.85%   Diff:  -7.52 pp
 MSFT   Shapley:  -16.44%   Naive:   -3.10%   Diff: -13.34 pp

  SUM   Shapley:  +22.73%   Portfolio:  +22.73%

What this tells us

The Shapley values diverge substantially from naive attribution. The most striking case is JNJ: naive attribution assigns it +8.39 percentage points (its 67% return times its 12.5% weight), but the Shapley value is +18.46 percentage points, more than double. Under this characteristic function, the main driver is that JNJ's return sits far above the average of the other seven stocks, so adding it lifts the equal-weight return of almost every coalition it joins, and the lift is largest in small coalitions where its share is highest. Low correlation with the rest of the portfolio can add to this through the compounding of daily rebalanced returns. Across the 128 coalitions that exclude it, JNJ's weighted-average marginal contribution is well above what weight-based attribution credits.

MSFT shows the opposite pattern. Its naive attribution is −3.10 percentage points (a −25% return scaled by weight), but the Shapley value is −16.44 percentage points. MSFT's negative return is amplified because most coalitions it joins were earning a positive return without it, so adding MSFT pulls their equal-weight return down by far more than one-eighth of its standalone loss. Its correlation with the other technology names, AAPL and NVDA, also means it brings little offsetting diversification benefit. The Shapley framework charges a loser in a portfolio of winners for the return it displaces as well as for its own loss.

PG presents a subtler case. Its naive contribution is only −0.85 percentage points (a modest −7% loss scaled by weight), but its Shapley value is −8.37 percentage points. Although its standalone loss is small, PG lagged the other holdings by a wide margin in a period when the portfolio returned +22.73%, and it did not provide the defensive ballast typically expected from consumer staples. Each coalition it joins gives up return, which makes its cost to the portfolio far larger than weight-based attribution suggests.
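A simplified view may help interpret these three cases. If compounding is ignored and the coalition value is taken to be the plain average of member returns, the Shapley value has a closed form: phi_i = (1/n) * [r_i + (r_i − average return of the others) * (1/2 + 1/3 + ... + 1/n)]. This simplification is an assumption of the sketch, not the characteristic function used in the article's script. The sketch feeds it the naive contributions from the output table above.

```python
# Naive contributions in percentage points, copied from the output table above
naive_pp = {"JNJ": 8.39, "UNH": 4.80, "AAPL": 5.14, "XOM": 3.10,
            "NVDA": 2.76, "JPM": 1.83, "PG": -0.85, "MSFT": -3.10}

n = len(naive_pp)
period_return = {t: c * n for t, c in naive_pp.items()}   # undo the 1/n weight
total = sum(period_return.values())

# Sum of 1/(s+1) over coalition sizes s = 1 .. n-1
size_factor = sum(1.0 / (s + 1) for s in range(1, n))

approx_shapley = {}
gap_vs_others = {}
for t, r in period_return.items():
    others_avg = (total - r) / (n - 1)       # average return of the other stocks
    gap_vs_others[t] = r - others_avg
    approx_shapley[t] = (r + gap_vs_others[t] * size_factor) / n

for t in sorted(approx_shapley, key=approx_shapley.get, reverse=True):
    print(f"{t:>5}   approx Shapley: {approx_shapley[t]:>+7.2f} pp   "
          f"return gap vs others: {gap_vs_others[t]:>+7.2f} pp")
```

Under this simplification JNJ comes out near +19.4 pp and MSFT near −14.6 pp, in the same region as the exact values of +18.46 and −16.44. The remaining differences reflect the daily compounding and rebalancing that the simplified game leaves out.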

The Shapley values sum exactly to the portfolio's 22.73% return. The naive attributions sum to 22.07% — close but not exact, because the weight-times-return decomposition ignores compounding and rebalancing effects in a daily equal-weight portfolio.
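The role of rebalancing can be illustrated on synthetic data. The sketch below uses randomly generated daily returns (hypothetical, not market data) and compares the naive sum with two portfolio returns: one rebalanced to equal weights every day, as in the coalition function above, and one that sets equal weights once and lets them drift.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical daily returns: 252 days x 4 stocks with different average drifts
daily = rng.normal(loc=[0.0015, 0.0005, 0.0, -0.0008], scale=0.02, size=(252, 4))

n = daily.shape[1]
stock_totals = (1 + daily).prod(axis=0) - 1           # each stock's compounded return
naive_sum = (stock_totals / n).sum()                   # sum of weight x return

rebalanced = (1 + daily.mean(axis=1)).prod() - 1       # equal weights reset every day
buy_and_hold = (1 + daily).prod(axis=0).mean() - 1     # equal weights set once, then drift

print(f"naive sum:         {naive_sum:+.4%}")
print(f"buy-and-hold:      {buy_and_hold:+.4%}")
print(f"daily rebalanced:  {rebalanced:+.4%}")
```

The naive sum matches the buy-and-hold portfolio by construction, while the daily-rebalanced return generally differs. That difference is the kind of gap seen between 22.07% and 22.73% above.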

So what?

Weight-based attribution is the industry default for portfolio performance analysis. This exercise demonstrates that it systematically understates the contribution of diversifying winners and understates the damage of correlated losers. The gap between Shapley and naive attribution is not a rounding error — it reaches 13 percentage points for MSFT and 10 percentage points for JNJ.

For portfolio construction, the implication is clear: a stock's value to a portfolio depends on what else is in the portfolio. JNJ's Shapley value of +18.46% versus its naive attribution of +8.39% quantifies the diversification premium. Conversely, MSFT's amplified negative Shapley value quantifies the cost of correlation concentration.

With 8 stocks, exact Shapley computation takes seconds. The number of coalitions doubles with each added stock, so for larger portfolios Monte Carlo sampling of random permutations is the practical route to accurate approximations. Shapley values are guaranteed to sum to the total return, and the Shapley value is the unique attribution that simultaneously satisfies efficiency, symmetry, the null-player property, and additivity.
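A minimal sketch of the permutation-sampling idea follows. The function `shapley_monte_carlo`, the 20 made-up returns, and the simple-average coalition value are all assumptions for illustration. With real data, the compounded `coalition_return` from the script above could be passed in as `v` instead.

```python
import random

def shapley_monte_carlo(players, v, n_permutations=2000, seed=0):
    """Estimate Shapley values by averaging marginal contributions
    over randomly ordered arrivals of the players."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    order = list(players)
    for _ in range(n_permutations):
        rng.shuffle(order)
        coalition = []
        prev = v(coalition)
        for p in order:
            coalition.append(p)
            cur = v(coalition)
            totals[p] += cur - prev      # marginal contribution of p in this ordering
            prev = cur
    return {p: t / n_permutations for p, t in totals.items()}

# Hypothetical period returns for a 20-stock portfolio (made-up numbers)
demo_rng = random.Random(1)
demo_returns = {f"S{k:02d}": demo_rng.uniform(-0.3, 0.6) for k in range(20)}

def demo_v(members):
    # Simplified coalition value: plain average of member returns
    return sum(demo_returns[m] for m in members) / len(members) if members else 0.0

estimate = shapley_monte_carlo(list(demo_returns), demo_v)
print("sum of estimates:", round(sum(estimate.values()), 10))
print("full portfolio:  ", round(demo_v(list(demo_returns)), 10))
```

Within each sampled ordering the marginal contributions telescope to the full-portfolio value, so the estimates still sum to the total return. Only the split between stocks carries sampling error.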

Built with xfinlink — free financial data API for Python. pip install xfinlink
