Crack the Code!#
import pandas as pd
data_url = "https://raw.githubusercontent.com/GettysburgDataScience/datasets/refs/heads/main/countypres_2000-2024.csv"
elec_df = pd.read_csv(data_url, dtype = {'fips':str})
elec_df
| year | state | state_po | county_name | county_fips | office | candidate | party | candidatevotes | totalvotes | version | mode | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2000 | ALABAMA | AL | AUTAUGA | 1001.0 | US PRESIDENT | AL GORE | DEMOCRAT | 4942 | 17208 | 20250821 | TOTAL |
| 1 | 2000 | ALABAMA | AL | AUTAUGA | 1001.0 | US PRESIDENT | GEORGE W. BUSH | REPUBLICAN | 11993 | 17208 | 20250821 | TOTAL |
| 2 | 2000 | ALABAMA | AL | AUTAUGA | 1001.0 | US PRESIDENT | OTHER | OTHER | 113 | 17208 | 20250821 | TOTAL |
| 3 | 2000 | ALABAMA | AL | AUTAUGA | 1001.0 | US PRESIDENT | RALPH NADER | GREEN | 160 | 17208 | 20250821 | TOTAL |
| 4 | 2000 | ALABAMA | AL | BALDWIN | 1003.0 | US PRESIDENT | AL GORE | DEMOCRAT | 13997 | 56480 | 20250821 | TOTAL |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 94404 | 2024 | WYOMING | WY | WESTON | 56045.0 | US PRESIDENT | DONALD J TRUMP | REPUBLICAN | 3069 | 3512 | 20250821 | NaN |
| 94405 | 2024 | WYOMING | WY | WESTON | 56045.0 | US PRESIDENT | KAMALA D HARRIS | DEMOCRAT | 378 | 3512 | 20250821 | NaN |
| 94406 | 2024 | WYOMING | WY | WESTON | 56045.0 | US PRESIDENT | OTHER | OTHER | 18 | 3512 | 20250821 | NaN |
| 94407 | 2024 | WYOMING | WY | WESTON | 56045.0 | US PRESIDENT | OVERVOTES | NaN | 1 | 3512 | 20250821 | NaN |
| 94408 | 2024 | WYOMING | WY | WESTON | 56045.0 | US PRESIDENT | UNDERVOTES | NaN | 20 | 3512 | 20250821 | NaN |
94409 rows × 12 columns
Counties with swing potential. Classification.
Economic indicator and voting.
Regression of something pre/post-covid voter turnout modeling
Predict lean based on income, education, race, religion, occupation, gender, age distribution.
Note#
The way election tallies were reported changed in 2020. Prior to 2020, only Total counts are provided. In the 2020 and 2024 elections, some states provide breakdowns by voting method (e.g. absentee, provisional), but the nomenclature varies by state.
modes = {}
modes['state'] = []
for year in range(2000,2025, 4):
modes[str(year)] = []
for state in elec_df['state_po'].unique():
modes['state'].append(state)
for year in range(2000, 2025, 4):
modes[str(year)].append(elec_df.query("state_po == @state and year==@year")['mode'].unique())
voting_modes = pd.DataFrame(modes)
voting_modes.tail(10)
| state | 2000 | 2004 | 2008 | 2012 | 2016 | 2020 | 2024 | |
|---|---|---|---|---|---|---|---|---|
| 41 | SD | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL VOTES, VOTE CENTER] |
| 42 | TN | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL VOTES] |
| 43 | TX | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [nan, EARLY VOTING, TOTAL VOTES] |
| 44 | UT | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [EARLY, ELECTION DAY, MAIL, TOTAL] | [nan] |
| 45 | VT | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [nan] |
| 46 | VA | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [ABSENTEE, ELECTION DAY, PROVISIONAL] | [TOTAL VOTES] |
| 47 | WA | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [nan] |
| 48 | WV | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [nan] |
| 49 | WI | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [nan] |
| 50 | WY | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [TOTAL] | [nan] |
Clues#
Below are 7 questions of increasing difficulty. Each answer is numeric. Find the answers and assign the value to the corresponding variable in the Answers section below. The secret messages are clues to a fictitious scientific breakthrough that purportedly happened today.
a. What number is in the row with index 189 under the column ‘candidatevotes’?
b. How many different counties are tabulated in this dataset?
c. What is the county_fips code for the alphabetically last county in Pennsylvania?
d. In 2024, how many election day (not mail-in or provisional) votes did DJT get in Adams County, PA?
e. How many more votes were cast in PA in the 2020 election than the 2016 election?
f. In the 2000 election, by how many votes did the Republican party win Florida?
g. Rounded to the nearest percentage point, what was the Republican lean of Adams County in 2020?
# Here's a for example
a = elec_df['candidatevotes'].iloc[189]
a
np.int64(17084)
Show code cell source
import numpy as np
def decode_word(codestr, ans):
np.random.seed(ans)
number = np.random.randint(0, 1000000000)
numberstr = str(number)
decoded = ''
while codestr:
code = int(codestr[0:3])
num = numberstr[0]
char = chr(code - (int(num)+1)*ord(num))
decoded += char
try:
codestr = codestr[3:]
except:
pass
if len(numberstr) == 1:
continue
else:
numberstr = numberstr[1:]
return decoded
def decode_message(encoded_message, ans_list):
code_list = encoded_message.split()
word_list = []
for code, ans in zip(code_list, ans_list):
try:
word_list.append(decode_word(code, ans))
except:
word_list.append('_' * int(len(code)/3))
return ' '.join(word_list)
a = 17084
b = 2
c = 3
d = 4
e = 5
f = 6
g = 7
secret_message_a = '390495152144 462308199314 269422551417375 290309197479 584684671319423198605158164111 644479264684439 580671323423619633'
secret_message_b = '591674305432605 689479189684419 607681675314421142 323305 250429550417376 488305199304 432489145198375196'
ans = [a, b, c, d, e, f, g]
message_a = decode_message(secret_message_a, ans)
message_b = decode_message(secret_message_b, ans[::-1])
print(f'Quote 1: {message_a}')
print(f'Quote 2: {message_b}')
Quote 1: Huh. ____ _____ ____ __________ _____ ______
Quote 2: _____ _____ ______ __ _____ ____ roads.