McKinney Chapter 4 - Practice for Section 03

FINA 6333 for Spring 2024

Author

Richard Herron

1 Announcements

  1. Our second DataCamp course, Intermediate Python, is due Friday, 1/26, at 11:59 PM
  2. I will record our week 4 lecture video on McKinney chapter 5 this Thursday evening, and the week 4 pre-class quiz is due before class next Tuesday, 1/30
  3. Team projects
    1. Continue to join teams on Canvas > People > Team Projects
    2. I removed the join-a-team assignment, but I will give the first project assignment in early February, so join a team by then

2 10-minute Recap

2.1 NumPy Arrays

NumPy arrays are multidimensional data structures that can store numerical data efficiently and perform fast mathematical operations on them.

import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
import numpy as np
%precision 4
'%.4f'
np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
np.ones((2, 2))
array([[1., 1.],
       [1., 1.]])
np.ones((2, 2, 2, 2))
array([[[[1., 1.],
         [1., 1.]],

        [[1., 1.],
         [1., 1.]]],


       [[[1., 1.],
         [1., 1.]],

        [[1., 1.],
         [1., 1.]]]])

We can use np.ranom.rand() to create n-dimensional arrays of standard normal random variables (i.e., \(\mu=0, \sigma=1\)). We can use np.random.seed() to set the random number generator seed to make our analysis repeatable.

np.random.seed(42)
np.random.randn(2, 2)
array([[ 0.4967, -0.1383],
       [ 0.6477,  1.523 ]])

2.2 Vectorized Functions

Vectorized computation is the process of applying an operation to an entire array or a subset of an array without using explicit loops. NumPy supports vectorized computation using universal functions (ufuncs), which are functions that operate on arrays element-wise.

Say we want to calculate square roots.

4**0.5
2.0000
[i**0.5 for i in range(10)]
[0.0000,
 1.0000,
 1.4142,
 1.7321,
 2.0000,
 2.2361,
 2.4495,
 2.6458,
 2.8284,
 3.0000]
np.sqrt(4)
2.0000
np.sqrt(np.arange(10))
array([0.    , 1.    , 1.4142, 1.7321, 2.    , 2.2361, 2.4495, 2.6458,
       2.8284, 3.    ])

2.3 Indexing and Slicing

Indexing and slicing are techniques to access or modify specific elements or subsets of an array. NumPy also supports advanced indexing methods, such as fancy indexing and boolean indexing, which allow more flexible and complex selection of array elements.

np.random.seed(42)
my_array = np.random.randn(3, 3)

my_array
array([[ 0.4967, -0.1383,  0.6477],
       [ 1.523 , -0.2342, -0.2341],
       [ 1.5792,  0.7674, -0.4695]])
my_array[0] # indexes first array in array of arrays
array([ 0.4967, -0.1383,  0.6477])
my_array[0][0] # indexes first element in first array in array of arrays
0.4967

In NumPy, we typically replace the chained [i][j] notation with the [i, j] from linear algebra class.

my_array[0, 0]
0.4967
my_array[:2, :2]
array([[ 0.4967, -0.1383],
       [ 1.523 , -0.2342]])

3 Practice

3.1 Create a 1-dimensional array named a1 that counts from 0 to 24 by 1.

a1 = np.arange(25)

a1
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24])
a1.ndim
1
a1.dtype
dtype('int32')

3.2 Create a 1-dimentional array named a2 that counts from 0 to 24 by 3.

a2 = np.arange(0, 25, 3)

a2
array([ 0,  3,  6,  9, 12, 15, 18, 21, 24])

3.3 Create a 1-dimentional array named a3 that counts from 0 to 100 by multiples of 3 or 5.

a3 = np.array([i for i in range(101) if (i%3==0) or (i%5==0)]) # core Python allows "or" or "|"

a3
array([  0,   3,   5,   6,   9,  10,  12,  15,  18,  20,  21,  24,  25,
        27,  30,  33,  35,  36,  39,  40,  42,  45,  48,  50,  51,  54,
        55,  57,  60,  63,  65,  66,  69,  70,  72,  75,  78,  80,  81,
        84,  85,  87,  90,  93,  95,  96,  99, 100])
a3.ndim
1
a3_alt = np.arange(101)
a3_alt = a3_alt[(a3_alt%3==0) | (a3_alt%5==0)] # NumPy does not allow "or", only "|"

a3_alt
array([  0,   3,   5,   6,   9,  10,  12,  15,  18,  20,  21,  24,  25,
        27,  30,  33,  35,  36,  39,  40,  42,  45,  48,  50,  51,  54,
        55,  57,  60,  63,  65,  66,  69,  70,  72,  75,  78,  80,  81,
        84,  85,  87,  90,  93,  95,  96,  99, 100])

How can we test if a3 and a3_alt have the same values?

The == operators tests is NumPy arrays are the same, element by element. We can append the .all() to make sure all element-by-element comparisons are True!

(a3 == a3_alt).all()
True

if we want to compare within some tolerance, we can use np.allclose().

np.allclose(a3, a3_alt)
True

Minor detour for another example! How do I slice values greater than 5 in an array from 0 to 9?

ten = np.arange(10)
gt_five = ten[ten > 5]

gt_five
array([6, 7, 8, 9])

3.4 Create a 1-dimensional array a3 that contains the squares of the even integers through 100,000.

How much faster is the NumPy version than the list comprehension version?

np.arange(0, 100_001, 2)**2
array([         0,          4,         16, ..., 1409265424, 1409665412,
       1410065408])

On some computers, the output above is wrong because NumPy defaults to 32-bit integers, depending on the computer! Always check your output! To avoid this problem, we can force np.arange() to use 64-bit integers with the dtype= argument.

np.arange(0, 100_001, 2, dtype=np.int64)**2
array([          0,           4,          16, ...,  9999200016,
        9999600004, 10000000000], dtype=int64)

We can use the %timeit magic to time which code is faster! The %timeit magic runs the code on the same line many times and reports the mean computation time. The %%timet magic with two percent signs runs the code in the same cell many times and reports the mean computation time.

%timeit np.arange(0, 100_001, 2, dtype=np.int64)**2
43.8 µs ± 5.15 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
%timeit np.array([i**2 for i in range(0, 100_001)])
13.4 ms ± 918 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

The NumPy version is about 1,000 times faster!

3.5 Write a function that mimic Excel’s pv function.

Here is how we call Excel’s pv function: =PV(rate, nper, pmt, [fv], [type]) We can use the annuity and lump sum present value formulas.

Present value of an annuity payment pmt: \(PV_{pmt} = \frac{pmt}{rate} \times \left(1 - \frac{1}{(1+rate)^{nper}} \right)\)

Present value of a lump sum fv: \(PV_{fv} = \frac{fv}{(1+rate)^{nper}}\)

def pv(rate, nper, pmt=None, fv=None, type=None):
    if pmt is None:
        pmt = 0
    if fv is None:
        fv = 0
    if type is None:
        type = 'END'
    
    pv_pmt = (pmt / rate) * (1 - 1 / (1 + rate)**nper)
    pv_fv = fv / (1 + rate)**nper
    pv = pv_pmt + pv_fv

    if type == 'BGN':
        pv *= (1 + rate) # same as pv = pv*(1 + rate)
    
    return -1 * pv
pv(rate = 0.05, nper = 14, fv = 1_000, type = 'END')
-505.0680
a = 5
a*= 5
a*= 5
a
125

3.6 Write a function that mimic Excel’s fv function.

def calc_fv(rate, nper, pmt=None, pv=None, type=None):
    if pmt is None:
        pmt = 0
    if pv is None:
        pv = 0
    if type is None:
        type = 'END'
    
    fv_pmt = (pmt / rate) * ((1 + rate)**nper - 1)
    fv_pv = pv * (1 + rate)**nper
    fv = fv_pmt + fv_pv

    if type == 'BGN':
        fv *= (1 + rate) # same as pv = pv*(1 + rate)
    
    return -1 * fv
calc_fv(rate=0.072, nper=10, pv=-1000)
2004.2314
calc_fv(rate=0.072, nper=10, pmt=72, pv=-1000)
1000.0000

3.7 Replace the negative values in data with -1 and positive values with +1.

np.random.seed(42)
data = np.random.randn(4, 4)
data
array([[ 0.4967, -0.1383,  0.6477,  1.523 ],
       [-0.2342, -0.2341,  1.5792,  0.7674],
       [-0.4695,  0.5426, -0.4634, -0.4657],
       [ 0.242 , -1.9133, -1.7249, -0.5623]])
data[data < 0] = -1
data[data > 0] = +1

data
array([[ 1., -1.,  1.,  1.],
       [-1., -1.,  1.,  1.],
       [-1.,  1., -1., -1.],
       [ 1., -1., -1., -1.]])

We have a second option in NumPy! NumPy’s np.where() function works the same as Excel’s if() function.

np.random.seed(42)
data = np.random.randn(4, 4)

np.where( # we can add whitespace inside parentheses ()!
    data < 0, # condition to test
    -1, # result if true
    np.where(data > 0, +1, data) # result if false
)
array([[ 1., -1.,  1.,  1.],
       [-1., -1.,  1.,  1.],
       [-1.,  1., -1., -1.],
       [ 1., -1., -1., -1.]])

We also have a third option in NumPy! NumPy’s np.select() function lets test multiple conditions!

np.random.seed(42)
data = np.random.randn(4, 4)

np.select( # we can add whitespace inside parentheses ()!
    condlist=[data<0, data>0],
    choicelist=[-1, +1],
    default=data
)
array([[ 1., -1.,  1.,  1.],
       [-1., -1.,  1.,  1.],
       [-1.,  1., -1., -1.],
       [ 1., -1., -1., -1.]])

3.8 Write a function npmts() that calculates the number of payments that generate \(x\%\) of the present value of a perpetuity.

Your npmts() should accept arguments c1, r, and g that represent \(C_1\), \(r\), and \(g\). The present value of a growing perpetuity is \(PV = \frac{C_1}{r - g}\), and the present value of a growing annuity is \(PV = \frac{C_1}{r - g}\left[ 1 - \left( \frac{1 + g}{1 + r} \right)^t \right]\).

We can use the growing annuity and perpetuity formulas to show: \(x = \left[ 1 - \left( \frac{1 + g}{1 + r} \right)^t \right]\).

Then: \(1 - x = \left( \frac{1 + g}{1 + r} \right)^t\).

Finally: \(t = \frac{\log(1-x)}{\log\left(\frac{1 + g}{1 + r}\right)}\)

We do not need to accept an argument c1 because \(C_1\) cancels out!

def npmts(x, r, g):
    return np.log(1-x) / np.log((1 + g) / (1 + r))
npmts(0.5, 0.1, 0.05)
14.9000

3.9 Write a function that calculates the internal rate of return given a NumPy array of cash flows.

Here are some data where the \(IRR\) is obvious!

c = np.array([-100, 110])
r = 0.10

First, write a function that calculates net present value (NPV) given cash flows in a NumpPy array c and a discount rate in a scalar r. The npv() function below uses NumPy arrays to calculate NPV as: \[NPV = \sum_{t=0}^T \frac{c_t}{(1+r)^t}\]

def calc_npv(r, c):
    t = np.arange(len(c))
    return (c / (1 + r)**t).sum()
calc_npv(r=r, c=c)
-0.0000

We can use a while loop to guess IRR values until we find an NPV close to zero. We can use the Newton-Rapshon method to make smarter guesses. If we have function \(f(x)\) and guess \(x_t\), our next guess should be \(x_{t+1} = x_t - \frac{f(x_t)}{f'(x_t)}\). Here our \(f(x)\) is \(NPV(r)\), and we can approximate \(f'(x_t)\) as \(\frac{NPV(r+0.000001) - NPV(r)}{0.000001}\). We will make guess until \(|NPV| < 0.000001\).

def calc_irr(c, guess=0):
    irr = guess
    npv = calc_npv(r=irr, c=c)
    while np.abs(npv) > 1e-6:
        slope = (calc_npv(r=irr+1e-6, c=c) - npv) / 1e-6 # Delta NPV / Delta r
        irr = irr - npv / slope
        npv = calc_npv(r=irr, c=c)
        # print(f'NPV is {npv}, IRR is {irr}')

    return irr
calc_irr(c=c) # c is the first positional argument, so we can omit "c=" 
0.1000

3.10 Write a function returns() that accepts NumPy arrays of prices and dividends and returns a NumPy array of returns.

prices = np.array([100, 150, 100, 50, 100, 150, 100, 150])
dividends = np.array([1, 1, 1, 1, 2, 2, 2, 2])

Here is the return for the first period.

(prices[1] - prices[0] + dividends[1]) / prices[0]
0.5100

Here are the capital gains for every period.

prices[1:] - prices[:-1]
array([ 50, -50, -50,  50,  50, -50,  50])
(prices[1:] - prices[:-1] + dividends[1:]) / prices[:-1]
array([ 0.51  , -0.3267, -0.49  ,  1.04  ,  0.52  , -0.32  ,  0.52  ])
def returns(p, d):
    return (p[1:] - p[:-1] + d[1:]) / p[:-1]
returns(p=prices, d=dividends)
array([ 0.51  , -0.3267, -0.49  ,  1.04  ,  0.52  , -0.32  ,  0.52  ])

3.11 Rewrite the function returns() so it returns NumPy arrays of returns, capital gains yields, and dividend yields.

def returns_2(p, d):
    r = (p[1:] - p[:-1] + d[1:]) / p[:-1]
    dp = d[1:] / p[:-1]
    cgp = r - dp
    return {'r':r, 'dp':dp, 'cgp':cgp}
returns_2(p=prices, d=dividends)
{'r': array([ 0.51  , -0.3267, -0.49  ,  1.04  ,  0.52  , -0.32  ,  0.52  ]),
 'dp': array([0.01  , 0.0067, 0.01  , 0.04  , 0.02  , 0.0133, 0.02  ]),
 'cgp': array([ 0.5   , -0.3333, -0.5   ,  1.    ,  0.5   , -0.3333,  0.5   ])}

3.12 Rescale and shift numbers so that they cover the range [0, 1]

Input: np.array([18.5, 17.0, 18.0, 19.0, 18.0])
Output: np.array([0.75, 0.0, 0.5, 1.0, 0.5])

numbers = np.array([18.5, 17.0, 18.0, 19.0, 18.0])
(numbers - numbers.min()) / (numbers.max() - numbers.min())
array([0.75, 0.  , 0.5 , 1.  , 0.5 ])

3.13 Write functions var() and std() that calculate variance and standard deviation.

NumPy’s .var() and .std() methods return population statistics (i.e., denominators of \(n\)). The pandas equivalents return sample statistics (denominators of \(n-1\)), which are more appropriate for financial data analysis where we have a sample instead of a population.

Both function should have an argument sample that is True by default so both functions return sample statistics by default.

Use numbers to compare your functions with NumPy’s .var() and .std() methods.

The population variance is “the average squared distance from the average.”

((numbers - numbers.mean())**2).mean()
0.4400
numbers.var()
0.4400
def var(x, sample=True):
    mu_x = x.mean()
    return ((x - mu_x)**2).sum() / (len(x) - sample)
var(numbers)
0.5500
var(numbers) == numbers.var(ddof=1)
True
var(numbers, sample=False)
0.4400
var(numbers, sample=False) == numbers.var()
True
def std(x, sample=True):
    return np.sqrt(var(x=x, sample=sample))
std(numbers, sample=False)
0.6633
std(numbers, sample=False) == numbers.std()
True
std(numbers)
0.7416
std(numbers) == numbers.std(ddof=1)
True