1. What is a model?
“All models are wrong…but some models are useful.” – George Box (maybe)
What are some types of models you’ve heard of? (Let’s aim for a diverse list.)
Role model
Linear regression model
classification model
ARMA model (time-series model)
logistic regression model
super model
model organism
clustering model
large language model
template
lego model
architectural model
model train
Okay, then what is a model?
input, output
framework to follow
function
mold
simplifications/abstractions of real phenomena
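To make the "input, output" and "function" ideas above concrete, here is a minimal sketch in which a model is nothing more than a function that maps an input to a predicted output. The function name `linear_model` and the parameter values are hypothetical, chosen only for illustration and not fitted to any data.

```python
def linear_model(x, slope, intercept):
    """Map an input x to a predicted output along a straight line."""
    return slope * x + intercept


# Hypothetical parameters, chosen only for illustration (not fitted to data).
prediction = linear_model(x=10, slope=2.0, intercept=1.0)
print(prediction)
```

The simplification is the point: the function ignores everything about the real phenomenon except the one input it is given.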
How are models useful? What are their limitations?
import pandas as pd
import matplotlib.pyplot as plt
sp500 = pd.read_csv('https://raw.githubusercontent.com/GettysburgDataScience/datasets/refs/heads/main/sp500_1950_2025_weekly.csv',
                    parse_dates = ['Date'], thousands = ',')
sp500.head()
 | Date | Price | Open | High | Low | Vol. | Change % |
---|---|---|---|---|---|---|---|
0 | 2025-08-31 | 6437.47 | 6386.45 | 6445.17 | 6360.53 | NaN | -0.35% |
1 | 2025-08-24 | 6460.26 | 6457.67 | 6508.23 | 6429.21 | NaN | -0.10% |
2 | 2025-08-17 | 6466.91 | 6445.02 | 6478.89 | 6343.86 | NaN | 0.27% |
3 | 2025-08-10 | 6449.80 | 6389.67 | 6481.34 | 6364.06 | NaN | 0.94% |
4 | 2025-08-03 | 6389.45 | 6271.71 | 6395.16 | 6271.71 | NaN | 2.43% |
murders_df = pd.read_csv('https://raw.githubusercontent.com/GettysburgDataScience/datasets/refs/heads/main/cdc_murdermortality_2023.csv',
                         thousands = ',')
murders_df.describe()
 | YEAR | RATE | DEATHS |
---|---|---|---|
count | 502.000000 | 477.000000 | 502.000000 |
mean | 2018.515936 | 6.598323 | 413.547809 |
std | 2.880651 | 3.969087 | 458.695970 |
min | 2014.000000 | 0.000000 | 10.000000 |
25% | 2016.000000 | 3.600000 | 72.250000 |
50% | 2019.000000 | 6.100000 | 262.000000 |
75% | 2021.000000 | 8.500000 | 603.500000 |
max | 2023.000000 | 33.100000 | 2495.000000 |
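# RATE appears to be deaths per 100K residents, so DEATHS / RATE estimates population in units of 100K residents.
# Rows with a missing RATE give NaN, and a RATE of 0 gives inf.
murders_df['POP'] = murders_df['DEATHS']/murders_df['RATE']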
murders_df.head()
 | YEAR | STATE | RATE | DEATHS | URL | POP |
---|---|---|---|---|---|---|
0 | 2023 | AL | 14.8 | 717 | /nchs/state-stats/states/al.html | 48.445946 |
1 | 2023 | AK | 8.5 | 61 | /nchs/state-stats/states/ak.html | 7.176471 |
2 | 2023 | AZ | 7.5 | 531 | /nchs/state-stats/states/az.html | 70.800000 |
3 | 2023 | AR | 11.3 | 325 | /nchs/state-stats/states/ar.html | 28.761062 |
4 | 2023 | CA | 5.1 | 1972 | /nchs/state-stats/states/ca.html | 386.666667 |
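The POP column above is worth checking by hand. The sketch below retypes the five displayed rows and repeats the calculation, assuming RATE is deaths per 100,000 residents (an assumption, though it matches the "Population (100K residents)" axis label used later). The names `sample`, `safe_rate`, and `POP_RESIDENTS` are introduced here for illustration only. Because `describe()` reports a minimum RATE of 0 and some missing values, the sketch also treats a zero rate as missing instead of letting the division produce `inf`.

```python
import numpy as np
import pandas as pd

# The five rows displayed above, retyped by hand so the check is self-contained.
sample = pd.DataFrame({
    'STATE': ['AL', 'AK', 'AZ', 'AR', 'CA'],
    'RATE': [14.8, 8.5, 7.5, 11.3, 5.1],
    'DEATHS': [717, 61, 531, 325, 1972],
})

# Assumption: RATE is deaths per 100,000 residents, so DEATHS / RATE
# gives the population in units of 100,000 residents.
# A RATE of 0 would divide to inf, so treat it as missing first.
safe_rate = sample['RATE'].replace(0, np.nan)
sample['POP'] = sample['DEATHS'] / safe_rate

# Convert to a plain head count for easier sanity checking.
sample['POP_RESIDENTS'] = sample['POP'] * 100_000

print(sample)
```

Comparing POP_RESIDENTS against a state's known population is a quick way to confirm the units before plotting.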
hw_df = pd.read_csv('https://raw.githubusercontent.com/GettysburgDataScience/datasets/refs/heads/main/SOCR-HeightWeight.csv',
                    index_col = 0)
hw_df.head()
Index | Height(Inches) | Weight(Pounds) |
---|---|---|
1 | 65.78331 | 112.9925 |
2 | 71.51521 | 136.4873 |
3 | 69.39874 | 153.0269 |
4 | 68.21660 | 142.3354 |
5 | 67.78781 | 144.2971 |
fig, ax = plt.subplots(1, 3, figsize = (20, 5))
# Height vs Weight
ax[0].scatter(hw_df['Height(Inches)'], hw_df['Weight(Pounds)'], s = 1, alpha = 0.2)
ax[0].set_title('US Height vs Weight')
ax[0].set_xlabel('Height (in)')
ax[0].set_ylabel('Weight (lbs)')
# Homicides
murders2023_df = murders_df.query('YEAR==2023')
ax[1].scatter(murders2023_df['POP'], murders2023_df['DEATHS'], s = 10)
# Label each point with its state abbreviation
for i, row in murders2023_df.iterrows():
    ax[1].annotate(row['STATE'], (row['POP'], row['DEATHS']), fontsize=8, alpha = 0.5)
ax[1].set_title('2023 Homicides by population for US states')
ax[1].set_xlabel('Population (100K residents)')
ax[1].set_ylabel('Homicides')
# S&P 500
ax[2].plot(sp500['Date'], sp500['Price'])
ax[2].set_title('S&P500 over time')
ax[2].set_ylabel('Price at end of week ($)')
ax[2].set_xlabel('Date')
plt.show()

Answer the following questions based on the plots above.
US Height vs Weight
Given a room of 1000 Americans, what do you think the average height and weight would be?
Suppose an American is 6’ tall (72 in). How much would you predict they weigh?
What information might change your answers above?
US Homicides
Which states have more homicides than you would expect for their population?
If there were a new state with 30M residents, how many homicides would you expect?
S&P 500
When has the economy experienced significant bubbles or recessions?
What do you expect the S&P 500 price to be in 2030?
How did you arrive at your answers for these questions?
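One way to formalize an eyeballed answer to the height and weight question is to fit a straight line and read the prediction off it. The sketch below is only an illustration: it uses the five rows shown in `hw_df.head()` above, retyped by hand, which are far too few to represent the full dataset. The choice of a straight line and the use of `np.polyfit` are assumptions made for this example, not necessarily the approach the course takes.

```python
import numpy as np

# The five rows from hw_df.head(), retyped by hand (illustration only).
heights = np.array([65.78331, 71.51521, 69.39874, 68.21660, 67.78781])
weights = np.array([112.9925, 136.4873, 153.0269, 142.3354, 144.2971])

# Least-squares straight line: weight = slope * height + intercept.
slope, intercept = np.polyfit(heights, weights, deg=1)

# Predicted weight for a 72-inch person under this (very rough) model.
predicted = slope * 72 + intercept

print(f"slope = {slope:.2f} lbs per inch, intercept = {intercept:.2f} lbs")
print(f"predicted weight at 72 in = {predicted:.1f} lbs")
```

Rerunning the same fit on all of `hw_df` would give a different line, which is itself a useful reminder that a model's answers depend on the data it was built from.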