Herron Topic 1 - Web Data, Log and Simple Returns, and Portfolio Math

FINA 6333 for Spring 2024

Author

Richard Herron

This notebook covers three topics:

How to download web data with the yfinance and pandas-datareader packages
How to calculate log and simple returns
How to calculate portfolio returns

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

%precision 4
pd.options.display.float_format = '{:.4f}'.format
%config InlineBackend.figure_format = 'retina'

1 Web Data

We will typically use the yfinance and pandas-datarader packages to download data from the web. If you followed my instructions to install Miniconda on your computer, you have already installed these packages.

1.1 The yfinance Package

The yfinance package provides “a reliable, threaded, and Pythonic way to download historical market data from Yahoo! finance.” Other packages provide similar functionality, but yfinance is best.

import yfinance as yf

We can download data for the MATANA stocks (Microsoft, Alphabet, Tesla, Amazon, Nvidia, and Apple). We can pass tickers as either a space-delimited string or a list of strings.

df = yf.download(tickers='MSFT GOOG TSLA AMZN NVDA AAPL')
df

[*********************100%%**********************]  6 of 6 completed

	Adj Close						Close				...	Open				Volume
	AAPL	AMZN	GOOG	MSFT	NVDA	TSLA	AAPL	AMZN	GOOG	MSFT	...	GOOG	MSFT	NVDA	TSLA	AAPL	AMZN	GOOG	MSFT	NVDA	TSLA
Date
1980-12-12	0.0993	NaN	NaN	NaN	NaN	NaN	0.1283	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	469033600	NaN	NaN	NaN	NaN	NaN
1980-12-15	0.0941	NaN	NaN	NaN	NaN	NaN	0.1217	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	175884800	NaN	NaN	NaN	NaN	NaN
1980-12-16	0.0872	NaN	NaN	NaN	NaN	NaN	0.1127	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	105728000	NaN	NaN	NaN	NaN	NaN
1980-12-17	0.0894	NaN	NaN	NaN	NaN	NaN	0.1155	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	86441600	NaN	NaN	NaN	NaN	NaN
1980-12-18	0.0920	NaN	NaN	NaN	NaN	NaN	0.1189	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	73449600	NaN	NaN	NaN	NaN	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
2024-01-26	192.4200	159.1200	153.7900	403.9300	610.3100	183.2500	192.4200	159.1200	153.7900	403.9300	...	152.8700	404.3700	609.6000	185.5000	44553400	51001100.0000	19483600.0000	17786700.0000	38983800.0000	107063400.0000
2024-01-29	191.7300	161.2600	154.8400	409.7200	624.6500	190.9300	191.7300	161.2600	154.8400	409.7200	...	153.6400	406.0600	612.3200	185.6300	47145600	45270400.0000	20909300.0000	24510200.0000	34873300.0000	125013100.0000
2024-01-30	188.0400	159.0000	153.0500	408.5900	627.7400	191.5900	188.0400	159.0000	153.0500	408.5900	...	154.0100	412.2600	629.0000	195.3300	55859400	45207400.0000	26578900.0000	33477600.0000	41073500.0000	109982300.0000
2024-01-31	184.4000	155.2000	141.8000	397.5800	615.2700	187.2900	184.4000	155.2000	141.8000	397.5800	...	145.3900	406.9600	614.4000	187.0000	55467800	50284400.0000	43908600.0000	47871100.0000	45379500.0000	103221400.0000
2024-02-01	186.8600	159.2800	142.7100	403.7800	630.2700	188.8600	186.8600	159.2800	142.7100	403.7800	...	143.6900	401.8300	621.0000	188.5000	52672619	63406285.0000	24997783.0000	29155588.0000	35818376.0000	90546003.0000

10875 rows × 36 columns

(
    df
    ['Adj Close']
    .pct_change()
    .loc['2023']
    .add(1)
    .cumprod()
    .sub(1)
    .mul(100)
    .plot()
)
plt.ylabel('Year-to-Date Return (%)')
plt.title('Year-to-Date Returns for MATANA Stocks in 2023')
plt.show()

1.2 The pandas-datareader package

The pandas-datareader package provides easy access to various data sources, including the Kenneth French Data Library and the Federal Reserve Economic Data (FRED). The pandas-datareader package also downloads Yahoo! Finance data, but the yfinance package has better documentation. We will use pdr as the abbreviated prefix for pandas-datareader.

import pandas_datareader as pdr

Here we download the daily benchmark factors from Ken French’s Data Library.

pdr.famafrench.get_available_datasets()[:5]

['F-F_Research_Data_Factors',
 'F-F_Research_Data_Factors_weekly',
 'F-F_Research_Data_Factors_daily',
 'F-F_Research_Data_5_Factors_2x3',
 'F-F_Research_Data_5_Factors_2x3_daily']

For Fama and French data, pandas-datareader returns the most recent five years of data unless we specify a start date. French typically provides data back through the second half of 1926. pandas-datareader returns dictionaries of data frames, and the 'DESCR' value describes these data frames.

ff_all = pdr.DataReader(
    name='F-F_Research_Data_Factors_daily',
    data_source='famafrench',
    start='1900'
)

C:\Users\r.herron\AppData\Local\Temp\ipykernel_26796\2526882917.py:1: FutureWarning: The argument 'date_parser' is deprecated and will be removed in a future version. Please use 'date_format' instead, or read your data in as 'object' dtype and then call 'to_datetime'.
  ff_all = pdr.DataReader(

type(ff_all)

dict

print(ff_all['DESCR'])

F-F Research Data Factors daily
-------------------------------

This file was created by CMPT_ME_BEME_RETS_DAILY using the 202312 CRSP database. The Tbill return is the simple daily rate that, over the number of trading days in the month, compounds to 1-month TBill rate from Ibbotson and Associates Inc. Copyright 2023 Kenneth R. French

  0 : (25649 rows x 4 cols)

ff_all[0]

	Mkt-RF	SMB	HML	RF
Date
1926-07-01	0.1000	-0.2500	-0.2700	0.0090
1926-07-02	0.4500	-0.3300	-0.0600	0.0090
1926-07-06	0.1700	0.3000	-0.3900	0.0090
1926-07-07	0.0900	-0.5800	0.0200	0.0090
1926-07-08	0.2100	-0.3800	0.1900	0.0090
...	...	...	...	...
2023-12-22	0.2100	0.6400	0.0900	0.0210
2023-12-26	0.4800	0.6900	0.4600	0.0210
2023-12-27	0.1600	0.1400	0.1200	0.0210
2023-12-28	-0.0100	-0.3600	0.0300	0.0210
2023-12-29	-0.4300	-1.1200	-0.3700	0.0210

25649 rows × 4 columns

(
    ff_all[0] # slice factors
    .div(100) # convert to decimal
    .add(1) # calculate cumulative returns
    .cumprod()
    .sub(1)
    .mul(100) # convert to percent
    .plot() # plot
)
plt.ylabel('Cumulative Return (%)')
plt.title('Cumulative Returns for the Daily Benchmark Factors')
plt.gca().yaxis.set_major_formatter(plt.matplotlib.ticker.StrMethodFormatter('{x:,.0f}'))
# plt.yscale('log') # log scale impractical here with negative returns
plt.show()

2 Log and Simple Returns

We will typically use simple returns, calculated as \(R_{simple,t} = \frac{P_t + D_t - P_{t-1}}{P_{t-1}} = \frac{P_t + D_t}{P_{t-1}} - 1\). The simple return is the return that investors receive on invested dollars. We can calculate simple returns from Yahoo Finance data with the .pct_change() method on the adjusted close column (i.e., Adj Close), which adjusts for dividends and splits. The adjusted close column is a reverse-engineered close price (i.e., end-of-trading-day price) that incorporates dividends and splits, making simple return calculations easy.

However, we may see log returns elsewhere, which are the (natural) log of one plus simple returns: \[R_{log,t} = \log(1 + R_{simple,t}) = \log\left(1 + \frac{P_t + D_t}{P_{t-1}} - 1 \right) = \log\left(\frac{P_t + D_t}{P_{t-1}} \right) = \log(P_t + D_t) - \log(P_{t-1})\] Therefore, we calculate log returns as either the log of one plus simple returns or the difference of the logs of the adjusted close column. Log returns are also known as continuously-compounded returns.

We will typically use simple returns instead of log returns. However, this section explains the differences between simple and log returns and where each is appropriate.

2.1 Simple and Log Returns are Similar for Small Returns

\(\log(1 + x) \approx x\) for small values of \(x\), so simple returns and log returns are similar for small returns. Returns are typically small at daily and monthly horizons, so the difference between simple and log returns is small at these horizons. The following figure shows \(R_{simple,t} \approx R_{log,t}\) for small \(R\)s.

R = np.linspace(-0.75, 0.75, 100)
logR = np.log(1 + R)

plt.plot(R, logR)
plt.plot([-1, 1], [-1, 1])
plt.xlabel('Simple Return')
plt.ylabel('Log Return')
plt.title('Log Versus Simple Returns')
plt.legend(['Actual', 'If Log = Simple'])
plt.show()

2.2 Simple Return Advantage: Portfolio Calculations

We can only perform portfolio calculations with simple returns. For a portfolio of \(N\) assets with portfolio weights \(w_i\), the portfolio return \(R_{p}\) is the weighted average of the returns of its assets, \(R_{p} = \sum_{i=1}^N w_i R_{i}\). For two stocks with portfolio weights of 50%, our portfolio return is \(R_{portfolio} = 0.5 R_1 + 0.5 R_2 = \frac{R_1 + R_2}{2}\). However, we cannot calculate portfolio returns with log returns because the sum of logs is the log of products.

We cannot calculate portfolio returns as the weighted average of log returns.

2.3 Log Return Advantage: Log Returns are Additive

The advantage of log returns is that we can compound log returns with addition. The additive property of log returns makes code simple, computations fast, and proofs easy when we compound returns over multiple periods.

We compound returns from \(t=0\) to \(t=T\) as follows: \[1 + R_{0, T} = (1 + R_1) \times (1 + R_2) \times \dots \times (1 + R_T)\]

Next, we take the log of both sides of the previous equation and use the property that the log of products is the sum of logs: \[\log(1 + R_{0, T}) = \log((1 + R_1) \times (1 + R_2) \times \dots \times (1 + R_T)) = \log(1 + R_1) + \log(1 + R_2) + \dots + \log(1 + R_T) = \sum_{t=1}^T \log(1 + R_t)\]

Next, we exponentiate both sides of the previous equation: \[e^{\log(1 + R_{0, T})} = e^{\sum_{t=0}^T \log(1 + R_t)}\]

Next, we use the property that \(e^{\log(x)} = x\) to simplify the previous equation: \[1 + R_{0,T} = e^{\sum_{t=0}^T \log(1 + R_t)}\]

Finally, we subtract 1 from both sides: \[R_{0 ,T} = e^{\sum_{t=0}^T \log(1 + R_t)} - 1\]

So, the return \(R_{0,T}\) from \(t=0\) to \(t=T\) is the exponentiated sum of log returns. The pandas developers assume users understand the math above and focus on optimizing sums.

The following code generates 10,000 random log returns. The np.random.randn() call generates normally distributed random numbers. To generate equivalent simple returns, we exponentiate these log returns, then subtract one.

np.random.seed(42)
df2 = pd.DataFrame(data={'R': np.exp(np.random.randn(10000)) - 1})
df2

	R
0	0.6433
1	-0.1291
2	0.9111
3	3.5861
4	-0.2088
...	...
9995	2.6733
9996	-0.8644
9997	-0.5060
9998	0.6418
9999	0.9048

10000 rows × 1 columns

df2.describe()

	R
count	10000.0000
mean	0.6529
std	2.1918
min	-0.9802
25%	-0.4896
50%	-0.0026
75%	0.9564
max	49.7158

We can time the calculation of 12-observation rolling returns. We use .apply() for the simple return version because .rolling() does not have a product method. We find that .rolling() is slower with .apply() than with .sum() by a factor of 2,000. We will learn about .rolling() and .apply() in a few weeks, but they provide the best example of when to use log returns.

%%timeit
df2['R12_via_simple'] = (
    df2['R']
    .add(1)
    .rolling(12)
    .apply(lambda x: x.prod())
    .sub(1)
)

284 ms ± 80.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
df2['R12_via_log'] = (
    df2['R']
    .add(1)
    .pipe(np.log)
    .rolling(12)
    .sum()
    .pipe(np.exp)
    .sub(1)
)

557 µs ± 60.6 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

df2.head(15)

	R	R12_via_simple	R12_via_log
0	0.6433	NaN	NaN
1	-0.1291	NaN	NaN
2	0.9111	NaN	NaN
3	3.5861	NaN	NaN
4	-0.2088	NaN	NaN
5	-0.2087	NaN	NaN
6	3.8511	NaN	NaN
7	1.1542	NaN	NaN
8	-0.3747	NaN	NaN
9	0.7204	NaN	NaN
10	-0.3709	NaN	NaN
11	-0.3723	33.8643	33.8643
12	0.2737	26.0236	26.0236
13	-0.8524	3.5800	3.5800
14	-0.8218	-0.5730	-0.5730

np.allclose(df2['R12_via_simple'], df2['R12_via_log'], equal_nan=True)

True

These two approaches calculate the same return, but the simple-return approach is 1,000 times slower than the log-return approach!

We can use log returns to calculate total returns very quickly!

3 Portfolio Math

Portfolio return \(R_{p}\) is the weighted average of its asset returns, so \(R_{p} = \sum_{i=1}^N w_i R_{i}\). Here \(N\) is the number of assets, and \(w_i\) is the weight on asset \(i\).

3.1 The 1/N Portfolio

The \(\frac{1}{N}\) portfolio equally weights portfolio assets, so \(w_1 = w_2 = \dots = w_N = \frac{1}{N}\). We typically rebalance the \(\frac{1}{N}\) portfolio every period. If \(w_i = \frac{1}{N}\), then \(R_{p} = \sum_{i=1}^N \frac{1}{N} R_{i} = \frac{\sum_{i=1}^N R_i}{N} = \bar{R}\). Therefore, we can use .mean() to calculate \(\frac{1}{N}\) portfolio returns.

returns = df['Adj Close'].pct_change().loc['2023']

returns

	AAPL	AMZN	GOOG	MSFT	NVDA	TSLA
Date
2023-01-03	-0.0374	0.0217	0.0109	-0.0010	-0.0205	-0.1224
2023-01-04	0.0103	-0.0079	-0.0110	-0.0437	0.0303	0.0512
2023-01-05	-0.0106	-0.0237	-0.0219	-0.0296	-0.0328	-0.0290
2023-01-06	0.0368	0.0356	0.0160	0.0118	0.0416	0.0247
2023-01-09	0.0041	0.0149	0.0073	0.0097	0.0518	0.0593
...	...	...	...	...	...	...
2023-12-22	-0.0055	-0.0027	0.0065	0.0028	-0.0033	-0.0077
2023-12-26	-0.0028	-0.0001	0.0007	0.0002	0.0092	0.0161
2023-12-27	0.0005	-0.0005	-0.0097	-0.0016	0.0028	0.0188
2023-12-28	0.0022	0.0003	-0.0011	0.0032	0.0021	-0.0316
2023-12-29	-0.0054	-0.0094	-0.0025	0.0020	0.0000	-0.0186

250 rows × 6 columns

returns.mean()

AAPL   0.0017
AMZN   0.0026
GOOG   0.0020
MSFT   0.0020
NVDA   0.0053
TSLA   0.0034
dtype: float64

rp_1 = returns.mean(axis=1)
rp_1

Date
2023-01-03   -0.0248
2023-01-04    0.0049
2023-01-05   -0.0246
2023-01-06    0.0278
2023-01-09    0.0245
               ...  
2023-12-22   -0.0017
2023-12-26    0.0039
2023-12-27    0.0017
2023-12-28   -0.0041
2023-12-29   -0.0056
Length: 250, dtype: float64

Note that when we apply the same portfolio weights every period, we rebalance at the same frequency as the returns data. If we have daily data, rebalance daily. If we have monthly data, we rebalance monthly, and so on.

3.2 A More General Solution

If we combine weights into vector \(w\) and the time series of asset returns into matrix \(\mathbf{R}\), then we can calculate the time series of portfolio returns as \(R_p = w^T \mathbf{R}\). The pandas version of this calculation is R.dot(w), where R is a data frame of asset returns and w is a series of portfolio weights. We can use this approach to calculate \(\frac{1}{N}\) portfolio returns, too.

weights = np.ones(returns.shape[1]) / returns.shape[1]
weights

array([0.1667, 0.1667, 0.1667, 0.1667, 0.1667, 0.1667])

rp_2 = returns.dot(weights)
rp_2

Date
2023-01-03   -0.0248
2023-01-04    0.0049
2023-01-05   -0.0246
2023-01-06    0.0278
2023-01-09    0.0245
               ...  
2023-12-22   -0.0017
2023-12-26    0.0039
2023-12-27    0.0017
2023-12-28   -0.0041
2023-12-29   -0.0056
Length: 250, dtype: float64

Both approaches give the same answer!

np.allclose(rp_1, rp_2, equal_nan=True)

True