3. Crack the Code! (Solutions)#
import pandas as pd
data_url = "https://raw.githubusercontent.com/GettysburgDataScience/datasets/refs/heads/main/countypres_2000-2024.csv"
elec_df = pd.read_csv(data_url, dtype = {'fips':str})
elec_df
| year | state | state_po | county_name | county_fips | office | candidate | party | candidatevotes | totalvotes | version | mode | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2000 | ALABAMA | AL | AUTAUGA | 1001.0 | US PRESIDENT | AL GORE | DEMOCRAT | 4942 | 17208 | 20250821 | TOTAL |
| 1 | 2000 | ALABAMA | AL | AUTAUGA | 1001.0 | US PRESIDENT | GEORGE W. BUSH | REPUBLICAN | 11993 | 17208 | 20250821 | TOTAL |
| 2 | 2000 | ALABAMA | AL | AUTAUGA | 1001.0 | US PRESIDENT | OTHER | OTHER | 113 | 17208 | 20250821 | TOTAL |
| 3 | 2000 | ALABAMA | AL | AUTAUGA | 1001.0 | US PRESIDENT | RALPH NADER | GREEN | 160 | 17208 | 20250821 | TOTAL |
| 4 | 2000 | ALABAMA | AL | BALDWIN | 1003.0 | US PRESIDENT | AL GORE | DEMOCRAT | 13997 | 56480 | 20250821 | TOTAL |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 94404 | 2024 | WYOMING | WY | WESTON | 56045.0 | US PRESIDENT | DONALD J TRUMP | REPUBLICAN | 3069 | 3512 | 20250821 | NaN |
| 94405 | 2024 | WYOMING | WY | WESTON | 56045.0 | US PRESIDENT | KAMALA D HARRIS | DEMOCRAT | 378 | 3512 | 20250821 | NaN |
| 94406 | 2024 | WYOMING | WY | WESTON | 56045.0 | US PRESIDENT | OTHER | OTHER | 18 | 3512 | 20250821 | NaN |
| 94407 | 2024 | WYOMING | WY | WESTON | 56045.0 | US PRESIDENT | OVERVOTES | NaN | 1 | 3512 | 20250821 | NaN |
| 94408 | 2024 | WYOMING | WY | WESTON | 56045.0 | US PRESIDENT | UNDERVOTES | NaN | 20 | 3512 | 20250821 | NaN |
94409 rows × 12 columns
3.1. Note#
The way election tallies were reported changed in 2020. Prior to 2020, only Total counts are provided. In the 2020 and 2024 elections, some states provide breakdowns by voting method (e.g. absentee, provisional), but the nomenclature varies by state.
modes = {}
modes['state'] = []
for year in range(2000,2025, 4):
modes[str(year)] = []
for state in elec_df['state_po'].unique():
modes['state'].append(state)
for year in range(2000, 2025, 4):
modes[str(year)].append(elec_df.query("state_po == @state and year==@year")['mode'].unique())
voting_modes = pd.DataFrame(modes)
voting_modes.tail(10)
| state | 2000 | 2004 | 2008 | 2012 | 2016 | 2020 | 2024 | |
|---|---|---|---|---|---|---|---|---|
| 41 | SD | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL VOTES, VOTE CENTER] |
| 42 | TN | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL VOTES] |
| 43 | TX | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [nan, EARLY VOTING, TOTAL VOTES] |
| 44 | UT | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [EARLY, ELECTION DAY, MAIL, TOTAL] | [nan] |
| 45 | VT | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [nan] |
| 46 | VA | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [ABSENTEE, ELECTION DAY, PROVISIONAL] | [TOTAL VOTES] |
| 47 | WA | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [nan] |
| 48 | WV | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [nan] |
| 49 | WI | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [nan] |
| 50 | WY | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [nan] |
3.2. Clues#
Below are 7 questions of increasing difficulty. Each answer is numeric. Find the answers and assign the value to the corresponding variable in the Answers section below. The secret messages are clues to a fictitious scientific breakthrough that purportedly happened today.
a. What number is in the row with index 189 under the column ‘candidatevotes’?
b. How many different counties are tabulated in this dataset?
c. What is the county_fips code for the alphabetically last county in Pennsylvania?
d. In 2024, how many election day (not mail-in or provisional) votes did DJT get in Adams County, PA?
e. How many more votes were cast in PA in the 2020 election than the 2016 election?
f. In the 2000 election, by how many votes did the Republican party win Florida?
g. Rounded to the nearest percentage point, what was the Republican lean of Adams County in 2020?
# Here's a for example
a = elec_df['candidatevotes'].iloc[189]
b = len(elec_df['county_fips'].unique())
c = elec_df.query('state_po == "PA"')['county_fips'].iloc[-1]
c = 42133
acpa = 42001
elec_df.query('state_po == "PA" and county_name == "ADAMS"')
d = elec_df.query('county_fips == @acpa and year == 2024 and party == "REPUBLICAN" and mode == "ELECTION DAY"')['candidatevotes'].iloc[0]
votes_2020 = elec_df.query('state_po == "PA" and year == 2020')['candidatevotes'].sum()
votes_2016 = elec_df.query('state_po == "PA" and year == 2016')['candidatevotes'].sum()
e = votes_2020-votes_2016
votes_R = elec_df.query('state_po == "FL" and year == 2000 and party == "REPUBLICAN"')['candidatevotes'].sum()
votes_D = elec_df.query('state_po == "FL" and year == 2000 and party == "DEMOCRAT"')['candidatevotes'].sum()
f = votes_D - votes_R
R = elec_df.query('county_fips == @acpa and year == 2020 and party == "REPUBLICAN"')['candidatevotes'].sum()
D = elec_df.query('county_fips == @acpa and year == 2020 and party == "DEMOCRAT"')['candidatevotes'].sum()
g = round(100*(R-D)/(R+D))
Show code cell source
import numpy as np
def decode_word(codestr, ans):
np.random.seed(ans)
number = np.random.randint(0, 1000000000)
numberstr = str(number)
decoded = ''
while codestr:
code = int(codestr[0:3])
num = numberstr[0]
char = chr(code - (int(num)+1)*ord(num))
decoded += char
try:
codestr = codestr[3:]
except:
pass
if len(numberstr) == 1:
continue
else:
numberstr = numberstr[1:]
return decoded
def decode_message(encoded_message, ans_list):
code_list = encoded_message.split()
word_list = []
for code, ans in zip(code_list, ans_list):
try:
word_list.append(decode_word(code, ans))
except:
word_list.append('_' * int(len(code)/3))
return ' '.join(word_list)
secret_message_a = '390495152144 462308199314 269422551417375 290309197479 584684671319423198605158164111 644479264684439 580671323423619633'
secret_message_b = '591674305432605 689479189684419 607681675314421142 323305 250429550417376 488305199304 432489145198375196'
ans = [a, b, c, d, e, f, g]
message_a = decode_message(secret_message_a, ans)
message_b = decode_message(secret_message_b, ans[::-1])
print(f'Quote 1: {message_a}')
print(f'Quote 2: {message_b}')
Quote 1: Huh. Then who's Vice President? Jerry Lewis?
Quote 2: Where we're going, we don't need roads.