Our second DataCamp course, Intermediate Python, is due Friday, 1/26, at 11:59 PM
I will record our week 4 lecture video on McKinney chapter 5 this Thursday evening, and the week 4 pre-class quiz is due before class next Tuesday, 1/30
Team projects
Continue to join teams on Canvas > People > Team Projects
I removed the join-a-team assignment, but I will give the first project assignment in early February, so join a team by then
2 10-minute Recap
2.1 NumPy Arrays
NumPy arrays are multidimensional data structures that can store numerical data efficiently and perform fast mathematical operations on them.
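Here is a minimal sketch of creating a NumPy array and inspecting its attributes. The array values are arbitrary and chosen only for illustration.

```python
import numpy as np

# a 2-dimensional array (2 rows, 3 columns) built from nested lists
arr = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]])

print(arr.ndim)   # number of dimensions: 2
print(arr.shape)  # (rows, columns): (2, 3)
print(arr.dtype)  # every element shares one type: float64

# arithmetic applies element by element, with no explicit loop
doubled = arr * 2
```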
import this
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
We can use np.random.randn() to create n-dimensional arrays of standard normal random variables (i.e., \(\mu=0, \sigma=1\)). We can use np.random.seed() to set the random number generator seed to make our analysis repeatable.
import numpy as np

np.random.seed(42)
np.random.randn(2, 2)
array([[ 0.4967, -0.1383],
[ 0.6477, 1.523 ]])
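Resetting the seed replays the same sequence of draws, which is what makes the analysis repeatable. The sketch below checks that. It also shows np.random.default_rng(), NumPy's newer Generator interface, as an alternative that keeps its seed in a local object instead of global state. The Generator uses a different algorithm, so its draws will not match np.random.randn() even with the same seed.

```python
import numpy as np

np.random.seed(42)
first = np.random.randn(2, 2)

np.random.seed(42)  # same seed, so the same draws again
second = np.random.randn(2, 2)

print(np.array_equal(first, second))  # True

# newer Generator interface: the seed lives in the rng object
rng = np.random.default_rng(42)
draws = rng.standard_normal((2, 2))  # standard normal draws with shape (2, 2)
```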
2.2 Vectorized Functions
Vectorized computation is the process of applying an operation to an entire array or a subset of an array without using explicit loops. NumPy supports vectorized computation using universal functions (ufuncs), which are functions that operate on arrays element-wise.
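A small sketch that contrasts an explicit loop with a ufunc. Both compute the same square roots, but np.sqrt() handles the whole array in one call. The input array is illustrative.

```python
import math
import numpy as np

x = np.arange(1, 6)  # the integers 1 through 5

# explicit Python loop: one element at a time
loop_result = np.array([math.sqrt(v) for v in x])

# ufunc: np.sqrt() operates on the whole array element-wise
ufunc_result = np.sqrt(x)

# binary ufuncs such as + also work element-wise on same-shape arrays
summed = x + x  # 2, 4, 6, 8, 10
```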
Indexing and slicing are techniques to access or modify specific elements or subsets of an array. NumPy also supports advanced indexing methods, such as fancy indexing and boolean indexing, which allow more flexible and complex selection of array elements.
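The sketch below shows basic slicing, fancy indexing, and boolean indexing on an illustrative array. It also demonstrates a distinction worth knowing: basic slices are views of the original array, while fancy and boolean indexing return copies.

```python
import numpy as np

arr = np.arange(10)          # 0, 1, ..., 9

first_three = arr[:3]        # basic slice: 0, 1, 2
picked = arr[[1, 5, 7]]      # fancy indexing with a list of positions: 1, 5, 7
evens = arr[arr % 2 == 0]    # boolean indexing: 0, 2, 4, 6, 8

# basic slices are views, so assigning through a slice modifies arr itself
first_three[:] = -1          # arr now starts -1, -1, -1, 3, ...

# fancy indexing returned a copy, so this assignment leaves arr unchanged
picked[:] = 100
```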
How can we test if a3 and a3_alt have the same values?
The == operator tests if NumPy arrays are equal, element by element. We can append .all() to check that all element-by-element comparisons are True!
(a3 == a3_alt).all()
True
If we want to compare within some tolerance (e.g., for floating-point results), we can use np.allclose().
np.allclose(a3, a3_alt)
True
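Tolerance matters for floating-point arithmetic. In this sketch the two arrays "should" be equal, but exact comparison fails because of rounding.

```python
import numpy as np

a = np.array([0.1 + 0.2, 1.0])
b = np.array([0.3, 1.0])

print((a == b).all())     # False: 0.1 + 0.2 is 0.30000000000000004 in floating point
print(np.allclose(a, b))  # True: equal within the default rtol=1e-05 and atol=1e-08
```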
Minor detour for another example! How do I select the values greater than 5 from an array of the integers 0 through 9?
ten = np.arange(10)
gt_five = ten[ten > 5]
gt_five
array([6, 7, 8, 9])
3.4 Create a 1-dimensional array a3 that contains the squares of the even integers through 100,000.
How much faster is the NumPy version than the list comprehension version?
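On some computers, the output above is wrong because NumPy's default integer type depends on the platform and may be 32 bits (e.g., on Windows before NumPy 2.0). The squares of even integers near 100,000 (up to \(10^{10}\)) exceed the 32-bit maximum of about \(2.1 \times 10^9\), so they overflow. Always check your output! To avoid this problem, we can force np.arange() to use 64-bit integers with the dtype= argument (e.g., dtype=np.int64).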
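Here is a minimal sketch of exercise 3.4. It assumes "even integers through 100,000" means 0, 2, ..., 100,000 inclusive; the exercise does not say whether to start at 0 or 2. The names a3 and a3_alt follow the text, and dtype=np.int64 keeps the largest square, \(10^{10}\), from overflowing.

```python
import numpy as np

# list comprehension version: Python ints never overflow
a3_list = [i**2 for i in range(0, 100_001, 2)]
a3 = np.array(a3_list, dtype=np.int64)

# NumPy version: force 64-bit integers so 100,000**2 = 10**10 fits
a3_alt = np.arange(0, 100_001, 2, dtype=np.int64) ** 2

print((a3 == a3_alt).all())  # True
print(a3_alt[-1])            # 10000000000
```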
We can use the %timeit magic to time code and see which version is faster! The %timeit magic runs the code on the same line many times and reports the mean computation time. The %%timeit magic, with two percent signs, runs the code in the entire cell many times and reports the mean computation time.
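The %timeit and %%timeit magics work only in IPython/Jupyter, not in a plain Python script. The sketch below is a rough stand-in that uses the standard library's timeit module. The function names and the number of repetitions are illustrative choices, and the speedup you see will depend on your computer.

```python
import timeit
import numpy as np

def squares_list():
    # list comprehension over the even integers through 100,000
    return [i**2 for i in range(0, 100_001, 2)]

def squares_numpy():
    # vectorized NumPy version with 64-bit integers
    return np.arange(0, 100_001, 2, dtype=np.int64) ** 2

n = 100  # run each function n times and average
t_list = timeit.timeit(squares_list, number=n) / n
t_numpy = timeit.timeit(squares_numpy, number=n) / n
print(f"list: {t_list:.6f} s, NumPy: {t_numpy:.6f} s, speedup: {t_list / t_numpy:.1f}x")
```

In Jupyter, `%timeit squares_list()` on a single line does the same job.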
We have a second option in NumPy! NumPy’s np.where() function works like Excel’s IF() function: np.where(condition, x, y) returns x where condition is True and y where it is False.
np.random.seed(42)
data = np.random.randn(4, 4)
np.where(  # we can add whitespace inside parentheses ()!
    data < 0,  # condition to test
    -1,  # result if true
    np.where(data > 0, +1, data)  # result if false
)
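The nested np.where() above maps negative values to -1, positive values to +1, and leaves anything else (zeros) unchanged. np.sign() does the same thing, so the sketch below checks that the two agree.

```python
import numpy as np

np.random.seed(42)
data = np.random.randn(4, 4)

nested = np.where(data < 0, -1, np.where(data > 0, +1, data))
signs = np.sign(data)  # -1.0, 0.0, or +1.0 element-wise

print(np.array_equal(nested, signs))  # True
```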
3.8 Write a function npmts() that calculates the number of payments that generate \(x\%\) of the present value of a perpetuity.
Your npmts() should accept arguments c1, r, and g that represent \(C_1\), \(r\), and \(g\). The present value of a growing perpetuity is \(PV = \frac{C_1}{r - g}\), and the present value of a growing annuity is \(PV = \frac{C_1}{r - g}\left[ 1 - \left( \frac{1 + g}{1 + r} \right)^t \right]\).
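Dividing the growing annuity formula by the growing perpetuity formula cancels \(\frac{C_1}{r - g}\), so the fraction \(x\) (in decimal form) depends only on \(r\), \(g\), and \(t\): \(x = \left[ 1 - \left( \frac{1 + g}{1 + r} \right)^t \right]\).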
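Solving the last equation for \(t\) gives \(t = \frac{\ln(1 - x)}{\ln\left(\frac{1 + g}{1 + r}\right)}\). A minimal sketch follows. It assumes an extra argument x (a decimal fraction, e.g., 0.5 for 50%) in addition to the c1, r, and g that the exercise specifies, and it also assumes r > g and 0 < x < 1. Because c1 cancels out of the ratio, the sketch accepts it only to match the exercise's interface.

```python
import math

def npmts(c1, r, g, x):
    """Number of payments t whose growing-annuity PV equals fraction x
    of the growing-perpetuity PV (sketch; assumes r > g and 0 < x < 1)."""
    # c1 cancels in PV_annuity / PV_perpetuity, so it does not affect t
    return math.log(1 - x) / math.log((1 + g) / (1 + r))

t = npmts(c1=1, r=0.10, g=0.05, x=0.50)
print(round(t, 2))  # 14.9
```

Because t is usually not a whole number, you can round it up with math.ceil() if you need a count of whole payments.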
3.9 Write a function that calculates the internal rate of return given a NumPy array of cash flows.
Here are some data where the \(IRR\) is obvious!
c = np.array([-100, 110])
r = 0.10
First, write a function that calculates net present value (NPV) given cash flows in a NumPy array c and a discount rate in a scalar r. The calc_npv() function below uses NumPy arrays to calculate NPV as: \[NPV = \sum_{t=0}^T \frac{c_t}{(1+r)^t}\]
def calc_npv(r, c):
    t = np.arange(len(c))  # periods 0, 1, ..., T
    return (c / (1 + r)**t).sum()  # discount each cash flow, then sum
calc_npv(r=r, c=c)
-0.0000
We can use a while loop to guess IRR values until we find an NPV close to zero. We can use the Newton-Raphson method to make smarter guesses. If we have function \(f(x)\) and guess \(x_t\), our next guess should be \(x_{t+1} = x_t - \frac{f(x_t)}{f'(x_t)}\). Here our \(f(x)\) is \(NPV(r)\), and we can approximate \(f'(x_t)\) as \(\frac{NPV(r+0.000001) - NPV(r)}{0.000001}\). We will make guesses until \(|NPV| < 0.000001\).
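Here is a minimal sketch of that loop. The function name calc_irr() and its guess, tol, h, and max_iter arguments are hypothetical choices. calc_npv() repeats the text's definition so the block runs on its own, and max_iter guards against guesses that never converge.

```python
import numpy as np

def calc_npv(r, c):
    t = np.arange(len(c))  # periods 0, 1, ..., T
    return (c / (1 + r)**t).sum()

def calc_irr(c, guess=0.0, tol=0.000001, h=0.000001, max_iter=1000):
    """Newton-Raphson IRR with a forward-difference slope (sketch)."""
    r = guess
    npv = calc_npv(r=r, c=c)
    n_iter = 0
    while abs(npv) >= tol:
        # approximate NPV'(r) with a small step h
        slope = (calc_npv(r=r + h, c=c) - npv) / h
        r = r - npv / slope  # Newton-Raphson update
        npv = calc_npv(r=r, c=c)
        n_iter += 1
        if n_iter >= max_iter:
            raise RuntimeError("IRR did not converge")
    return r

print(round(calc_irr(np.array([-100, 110])), 4))  # 0.1
```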
3.13 Write functions var() and std() that calculate variance and standard deviation.
By default, NumPy’s .var() and .std() methods return population statistics (i.e., ddof=0, denominators of \(n\)). By default, the pandas equivalents return sample statistics (ddof=1, denominators of \(n-1\)), which are more appropriate for financial data analysis where we have a sample instead of a population.
Both functions should have an argument sample that is True by default, so both functions return sample statistics by default.
Use numbers to compare your functions with NumPy’s .var() and .std() methods.
The population variance is “the average squared distance from the average.”
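In symbols, the population variance is \(\sigma^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2\), and the sample variance replaces \(n\) with \(n - 1\): \(s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2\). Below is a minimal sketch of var() and std() with a sample argument. It compares them against NumPy's methods using np.arange(10) as illustrative data.

```python
import numpy as np

def var(x, sample=True):
    """Variance of x: sample statistic (n - 1) by default, population (n) otherwise."""
    x = np.asarray(x, dtype=float)
    n = x.size
    sq_dev = (x - x.mean()) ** 2  # squared distances from the average
    return sq_dev.sum() / (n - 1 if sample else n)

def std(x, sample=True):
    """Standard deviation as the square root of var()."""
    return np.sqrt(var(x, sample=sample))

x = np.arange(10)  # illustrative data: 0, 1, ..., 9
print(np.isclose(var(x, sample=False), x.var()))  # True (population, ddof=0)
print(np.isclose(var(x), x.var(ddof=1)))          # True (sample, ddof=1)
print(np.isclose(std(x), x.std(ddof=1)))          # True
```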