Herron Topic 1 - Web Data, Log and Simple Returns, and Portfolio Math

FINA 6333 for Spring 2024

Author

Richard Herron

This notebook covers three topics:

  1. How to download web data with the yfinance and pandas-datareader packages
  2. How to calculate log and simple returns
  3. How to calculate portfolio returns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%precision 4
pd.options.display.float_format = '{:.4f}'.format
%config InlineBackend.figure_format = 'retina'

1 Web Data

We will typically use the yfinance and pandas-datarader packages to download data from the web. If you followed my instructions to install Miniconda on your computer, you have already installed these packages.

1.1 The yfinance Package

The yfinance package provides “a reliable, threaded, and Pythonic way to download historical market data from Yahoo! finance.” Other packages provide similar functionality, but yfinance is best.

import yfinance as yf

We can download data for the MATANA stocks (Microsoft, Alphabet, Tesla, Amazon, Nvidia, and Apple). We can pass tickers as either a space-delimited string or a list of strings.

df = yf.download(tickers='MSFT GOOG TSLA AMZN NVDA AAPL')
df
[*********************100%%**********************]  6 of 6 completed
Adj Close Close ... Open Volume
AAPL AMZN GOOG MSFT NVDA TSLA AAPL AMZN GOOG MSFT ... GOOG MSFT NVDA TSLA AAPL AMZN GOOG MSFT NVDA TSLA
Date
1980-12-12 0.0993 NaN NaN NaN NaN NaN 0.1283 NaN NaN NaN ... NaN NaN NaN NaN 469033600 NaN NaN NaN NaN NaN
1980-12-15 0.0941 NaN NaN NaN NaN NaN 0.1217 NaN NaN NaN ... NaN NaN NaN NaN 175884800 NaN NaN NaN NaN NaN
1980-12-16 0.0872 NaN NaN NaN NaN NaN 0.1127 NaN NaN NaN ... NaN NaN NaN NaN 105728000 NaN NaN NaN NaN NaN
1980-12-17 0.0894 NaN NaN NaN NaN NaN 0.1155 NaN NaN NaN ... NaN NaN NaN NaN 86441600 NaN NaN NaN NaN NaN
1980-12-18 0.0920 NaN NaN NaN NaN NaN 0.1189 NaN NaN NaN ... NaN NaN NaN NaN 73449600 NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2024-01-26 192.4200 159.1200 153.7900 403.9300 610.3100 183.2500 192.4200 159.1200 153.7900 403.9300 ... 152.8700 404.3700 609.6000 185.5000 44553400 51001100.0000 19483600.0000 17786700.0000 38983800.0000 107063400.0000
2024-01-29 191.7300 161.2600 154.8400 409.7200 624.6500 190.9300 191.7300 161.2600 154.8400 409.7200 ... 153.6400 406.0600 612.3200 185.6300 47145600 45270400.0000 20909300.0000 24510200.0000 34873300.0000 125013100.0000
2024-01-30 188.0400 159.0000 153.0500 408.5900 627.7400 191.5900 188.0400 159.0000 153.0500 408.5900 ... 154.0100 412.2600 629.0000 195.3300 55859400 45207400.0000 26578900.0000 33477600.0000 41073500.0000 109982300.0000
2024-01-31 184.4000 155.2000 141.8000 397.5800 615.2700 187.2900 184.4000 155.2000 141.8000 397.5800 ... 145.3900 406.9600 614.4000 187.0000 55467800 50284400.0000 43908600.0000 47871100.0000 45379500.0000 103221400.0000
2024-02-01 186.8600 159.2800 142.7100 403.7800 630.2700 188.8600 186.8600 159.2800 142.7100 403.7800 ... 143.6900 401.8300 621.0000 188.5000 52672619 63406285.0000 24997783.0000 29155588.0000 35818376.0000 90546003.0000

10875 rows × 36 columns

(
    df
    ['Adj Close']
    .pct_change()
    .loc['2023']
    .add(1)
    .cumprod()
    .sub(1)
    .mul(100)
    .plot()
)
plt.ylabel('Year-to-Date Return (%)')
plt.title('Year-to-Date Returns for MATANA Stocks in 2023')
plt.show()

1.2 The pandas-datareader package

The pandas-datareader package provides easy access to various data sources, including the Kenneth French Data Library and the Federal Reserve Economic Data (FRED). The pandas-datareader package also downloads Yahoo! Finance data, but the yfinance package has better documentation. We will use pdr as the abbreviated prefix for pandas-datareader.

import pandas_datareader as pdr

Here we download the daily benchmark factors from Ken French’s Data Library.

pdr.famafrench.get_available_datasets()[:5]
['F-F_Research_Data_Factors',
 'F-F_Research_Data_Factors_weekly',
 'F-F_Research_Data_Factors_daily',
 'F-F_Research_Data_5_Factors_2x3',
 'F-F_Research_Data_5_Factors_2x3_daily']

For Fama and French data, pandas-datareader returns the most recent five years of data unless we specify a start date. French typically provides data back through the second half of 1926. pandas-datareader returns dictionaries of data frames, and the 'DESCR' value describes these data frames.

ff_all = pdr.DataReader(
    name='F-F_Research_Data_Factors_daily',
    data_source='famafrench',
    start='1900'
)
C:\Users\r.herron\AppData\Local\Temp\ipykernel_26796\2526882917.py:1: FutureWarning: The argument 'date_parser' is deprecated and will be removed in a future version. Please use 'date_format' instead, or read your data in as 'object' dtype and then call 'to_datetime'.
  ff_all = pdr.DataReader(
type(ff_all)
dict
print(ff_all['DESCR'])
F-F Research Data Factors daily
-------------------------------

This file was created by CMPT_ME_BEME_RETS_DAILY using the 202312 CRSP database. The Tbill return is the simple daily rate that, over the number of trading days in the month, compounds to 1-month TBill rate from Ibbotson and Associates Inc. Copyright 2023 Kenneth R. French

  0 : (25649 rows x 4 cols)
ff_all[0]
Mkt-RF SMB HML RF
Date
1926-07-01 0.1000 -0.2500 -0.2700 0.0090
1926-07-02 0.4500 -0.3300 -0.0600 0.0090
1926-07-06 0.1700 0.3000 -0.3900 0.0090
1926-07-07 0.0900 -0.5800 0.0200 0.0090
1926-07-08 0.2100 -0.3800 0.1900 0.0090
... ... ... ... ...
2023-12-22 0.2100 0.6400 0.0900 0.0210
2023-12-26 0.4800 0.6900 0.4600 0.0210
2023-12-27 0.1600 0.1400 0.1200 0.0210
2023-12-28 -0.0100 -0.3600 0.0300 0.0210
2023-12-29 -0.4300 -1.1200 -0.3700 0.0210

25649 rows × 4 columns

(
    ff_all[0] # slice factors
    .div(100) # convert to decimal
    .add(1) # calculate cumulative returns
    .cumprod()
    .sub(1)
    .mul(100) # convert to percent
    .plot() # plot
)
plt.ylabel('Cumulative Return (%)')
plt.title('Cumulative Returns for the Daily Benchmark Factors')
plt.gca().yaxis.set_major_formatter(plt.matplotlib.ticker.StrMethodFormatter('{x:,.0f}'))
# plt.yscale('log') # log scale impractical here with negative returns
plt.show()

2 Log and Simple Returns

We will typically use simple returns, calculated as \(R_{simple,t} = \frac{P_t + D_t - P_{t-1}}{P_{t-1}} = \frac{P_t + D_t}{P_{t-1}} - 1\). The simple return is the return that investors receive on invested dollars. We can calculate simple returns from Yahoo Finance data with the .pct_change() method on the adjusted close column (i.e., Adj Close), which adjusts for dividends and splits. The adjusted close column is a reverse-engineered close price (i.e., end-of-trading-day price) that incorporates dividends and splits, making simple return calculations easy.

However, we may see log returns elsewhere, which are the (natural) log of one plus simple returns: \[R_{log,t} = \log(1 + R_{simple,t}) = \log\left(1 + \frac{P_t + D_t}{P_{t-1}} - 1 \right) = \log\left(\frac{P_t + D_t}{P_{t-1}} \right) = \log(P_t + D_t) - \log(P_{t-1})\] Therefore, we calculate log returns as either the log of one plus simple returns or the difference of the logs of the adjusted close column. Log returns are also known as continuously-compounded returns.

We will typically use simple returns instead of log returns. However, this section explains the differences between simple and log returns and where each is appropriate.

2.1 Simple and Log Returns are Similar for Small Returns

\(\log(1 + x) \approx x\) for small values of \(x\), so simple returns and log returns are similar for small returns. Returns are typically small at daily and monthly horizons, so the difference between simple and log returns is small at these horizons. The following figure shows \(R_{simple,t} \approx R_{log,t}\) for small \(R\)s.

R = np.linspace(-0.75, 0.75, 100)
logR = np.log(1 + R)
plt.plot(R, logR)
plt.plot([-1, 1], [-1, 1])
plt.xlabel('Simple Return')
plt.ylabel('Log Return')
plt.title('Log Versus Simple Returns')
plt.legend(['Actual', 'If Log = Simple'])
plt.show()

2.2 Simple Return Advantage: Portfolio Calculations

We can only perform portfolio calculations with simple returns. For a portfolio of \(N\) assets with portfolio weights \(w_i\), the portfolio return \(R_{p}\) is the weighted average of the returns of its assets, \(R_{p} = \sum_{i=1}^N w_i R_{i}\). For two stocks with portfolio weights of 50%, our portfolio return is \(R_{portfolio} = 0.5 R_1 + 0.5 R_2 = \frac{R_1 + R_2}{2}\). However, we cannot calculate portfolio returns with log returns because the sum of logs is the log of products.

We cannot calculate portfolio returns as the weighted average of log returns.

2.3 Log Return Advantage: Log Returns are Additive

The advantage of log returns is that we can compound log returns with addition. The additive property of log returns makes code simple, computations fast, and proofs easy when we compound returns over multiple periods.

We compound returns from \(t=0\) to \(t=T\) as follows: \[1 + R_{0, T} = (1 + R_1) \times (1 + R_2) \times \dots \times (1 + R_T)\]

Next, we take the log of both sides of the previous equation and use the property that the log of products is the sum of logs: \[\log(1 + R_{0, T}) = \log((1 + R_1) \times (1 + R_2) \times \dots \times (1 + R_T)) = \log(1 + R_1) + \log(1 + R_2) + \dots + \log(1 + R_T) = \sum_{t=1}^T \log(1 + R_t)\]

Next, we exponentiate both sides of the previous equation: \[e^{\log(1 + R_{0, T})} = e^{\sum_{t=0}^T \log(1 + R_t)}\]

Next, we use the property that \(e^{\log(x)} = x\) to simplify the previous equation: \[1 + R_{0,T} = e^{\sum_{t=0}^T \log(1 + R_t)}\]

Finally, we subtract 1 from both sides: \[R_{0 ,T} = e^{\sum_{t=0}^T \log(1 + R_t)} - 1\]

So, the return \(R_{0,T}\) from \(t=0\) to \(t=T\) is the exponentiated sum of log returns. The pandas developers assume users understand the math above and focus on optimizing sums.

The following code generates 10,000 random log returns. The np.random.randn() call generates normally distributed random numbers. To generate equivalent simple returns, we exponentiate these log returns, then subtract one.

np.random.seed(42)
df2 = pd.DataFrame(data={'R': np.exp(np.random.randn(10000)) - 1})
df2
R
0 0.6433
1 -0.1291
2 0.9111
3 3.5861
4 -0.2088
... ...
9995 2.6733
9996 -0.8644
9997 -0.5060
9998 0.6418
9999 0.9048

10000 rows × 1 columns

df2.describe()
R
count 10000.0000
mean 0.6529
std 2.1918
min -0.9802
25% -0.4896
50% -0.0026
75% 0.9564
max 49.7158

We can time the calculation of 12-observation rolling returns. We use .apply() for the simple return version because .rolling() does not have a product method. We find that .rolling() is slower with .apply() than with .sum() by a factor of 2,000. We will learn about .rolling() and .apply() in a few weeks, but they provide the best example of when to use log returns.

%%timeit
df2['R12_via_simple'] = (
    df2['R']
    .add(1)
    .rolling(12)
    .apply(lambda x: x.prod())
    .sub(1)
)
284 ms ± 80.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
df2['R12_via_log'] = (
    df2['R']
    .add(1)
    .pipe(np.log)
    .rolling(12)
    .sum()
    .pipe(np.exp)
    .sub(1)
)
557 µs ± 60.6 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
df2.head(15)
R R12_via_simple R12_via_log
0 0.6433 NaN NaN
1 -0.1291 NaN NaN
2 0.9111 NaN NaN
3 3.5861 NaN NaN
4 -0.2088 NaN NaN
5 -0.2087 NaN NaN
6 3.8511 NaN NaN
7 1.1542 NaN NaN
8 -0.3747 NaN NaN
9 0.7204 NaN NaN
10 -0.3709 NaN NaN
11 -0.3723 33.8643 33.8643
12 0.2737 26.0236 26.0236
13 -0.8524 3.5800 3.5800
14 -0.8218 -0.5730 -0.5730
np.allclose(df2['R12_via_simple'], df2['R12_via_log'], equal_nan=True)
True

These two approaches calculate the same return, but the simple-return approach is 1,000 times slower than the log-return approach!

We can use log returns to calculate total returns very quickly!

3 Portfolio Math

Portfolio return \(R_{p}\) is the weighted average of its asset returns, so \(R_{p} = \sum_{i=1}^N w_i R_{i}\). Here \(N\) is the number of assets, and \(w_i\) is the weight on asset \(i\).

3.1 The 1/N Portfolio

The \(\frac{1}{N}\) portfolio equally weights portfolio assets, so \(w_1 = w_2 = \dots = w_N = \frac{1}{N}\). We typically rebalance the \(\frac{1}{N}\) portfolio every period. If \(w_i = \frac{1}{N}\), then \(R_{p} = \sum_{i=1}^N \frac{1}{N} R_{i} = \frac{\sum_{i=1}^N R_i}{N} = \bar{R}\). Therefore, we can use .mean() to calculate \(\frac{1}{N}\) portfolio returns.

returns = df['Adj Close'].pct_change().loc['2023']

returns
AAPL AMZN GOOG MSFT NVDA TSLA
Date
2023-01-03 -0.0374 0.0217 0.0109 -0.0010 -0.0205 -0.1224
2023-01-04 0.0103 -0.0079 -0.0110 -0.0437 0.0303 0.0512
2023-01-05 -0.0106 -0.0237 -0.0219 -0.0296 -0.0328 -0.0290
2023-01-06 0.0368 0.0356 0.0160 0.0118 0.0416 0.0247
2023-01-09 0.0041 0.0149 0.0073 0.0097 0.0518 0.0593
... ... ... ... ... ... ...
2023-12-22 -0.0055 -0.0027 0.0065 0.0028 -0.0033 -0.0077
2023-12-26 -0.0028 -0.0001 0.0007 0.0002 0.0092 0.0161
2023-12-27 0.0005 -0.0005 -0.0097 -0.0016 0.0028 0.0188
2023-12-28 0.0022 0.0003 -0.0011 0.0032 0.0021 -0.0316
2023-12-29 -0.0054 -0.0094 -0.0025 0.0020 0.0000 -0.0186

250 rows × 6 columns

returns.mean()
AAPL   0.0017
AMZN   0.0026
GOOG   0.0020
MSFT   0.0020
NVDA   0.0053
TSLA   0.0034
dtype: float64
rp_1 = returns.mean(axis=1)
rp_1
Date
2023-01-03   -0.0248
2023-01-04    0.0049
2023-01-05   -0.0246
2023-01-06    0.0278
2023-01-09    0.0245
               ...  
2023-12-22   -0.0017
2023-12-26    0.0039
2023-12-27    0.0017
2023-12-28   -0.0041
2023-12-29   -0.0056
Length: 250, dtype: float64

Note that when we apply the same portfolio weights every period, we rebalance at the same frequency as the returns data. If we have daily data, rebalance daily. If we have monthly data, we rebalance monthly, and so on.

3.2 A More General Solution

If we combine weights into vector \(w\) and the time series of asset returns into matrix \(\mathbf{R}\), then we can calculate the time series of portfolio returns as \(R_p = w^T \mathbf{R}\). The pandas version of this calculation is R.dot(w), where R is a data frame of asset returns and w is a series of portfolio weights. We can use this approach to calculate \(\frac{1}{N}\) portfolio returns, too.

weights = np.ones(returns.shape[1]) / returns.shape[1]
weights
array([0.1667, 0.1667, 0.1667, 0.1667, 0.1667, 0.1667])
rp_2 = returns.dot(weights)
rp_2
Date
2023-01-03   -0.0248
2023-01-04    0.0049
2023-01-05   -0.0246
2023-01-06    0.0278
2023-01-09    0.0245
               ...  
2023-12-22   -0.0017
2023-12-26    0.0039
2023-12-27    0.0017
2023-12-28   -0.0041
2023-12-29   -0.0056
Length: 250, dtype: float64

Both approaches give the same answer!

np.allclose(rp_1, rp_2, equal_nan=True)
True