Crack the Code!

Contents

Crack the Code!#

import pandas as pd

data_url = "https://raw.githubusercontent.com/GettysburgDataScience/datasets/refs/heads/main/countypres_2000-2024.csv"
elec_df = pd.read_csv(data_url, dtype = {'fips':str})

elec_df
year state state_po county_name county_fips office candidate party candidatevotes totalvotes version mode
0 2000 ALABAMA AL AUTAUGA 1001.0 US PRESIDENT AL GORE DEMOCRAT 4942 17208 20250821 TOTAL
1 2000 ALABAMA AL AUTAUGA 1001.0 US PRESIDENT GEORGE W. BUSH REPUBLICAN 11993 17208 20250821 TOTAL
2 2000 ALABAMA AL AUTAUGA 1001.0 US PRESIDENT OTHER OTHER 113 17208 20250821 TOTAL
3 2000 ALABAMA AL AUTAUGA 1001.0 US PRESIDENT RALPH NADER GREEN 160 17208 20250821 TOTAL
4 2000 ALABAMA AL BALDWIN 1003.0 US PRESIDENT AL GORE DEMOCRAT 13997 56480 20250821 TOTAL
... ... ... ... ... ... ... ... ... ... ... ... ...
94404 2024 WYOMING WY WESTON 56045.0 US PRESIDENT DONALD J TRUMP REPUBLICAN 3069 3512 20250821 NaN
94405 2024 WYOMING WY WESTON 56045.0 US PRESIDENT KAMALA D HARRIS DEMOCRAT 378 3512 20250821 NaN
94406 2024 WYOMING WY WESTON 56045.0 US PRESIDENT OTHER OTHER 18 3512 20250821 NaN
94407 2024 WYOMING WY WESTON 56045.0 US PRESIDENT OVERVOTES NaN 1 3512 20250821 NaN
94408 2024 WYOMING WY WESTON 56045.0 US PRESIDENT UNDERVOTES NaN 20 3512 20250821 NaN

94409 rows × 12 columns

  • Counties with swing potential. Classification.

  • Economic indicator and voting.

  • Regression of something pre/post-covid voter turnout modeling

  • Predict lean based on income, education, race, religion, occupation, gender, age distribution.

Note#

The way election tallies were reported changed in 2020. Prior to 2020, only Total counts are provided. In the 2020 and 2024 elections, some states provide breakdowns by voting method (e.g. absentee, provisional), but the nomenclature varies by state.

modes = {}

modes['state'] = []
for year in range(2000,2025, 4):
    modes[str(year)] = []
        
for state in elec_df['state_po'].unique():
    modes['state'].append(state)
    
    for year in range(2000, 2025, 4):
        modes[str(year)].append(elec_df.query("state_po == @state and year==@year")['mode'].unique())
   
voting_modes = pd.DataFrame(modes)
voting_modes.tail(10)
state 2000 2004 2008 2012 2016 2020 2024
41 SD [TOTAL] [TOTAL] [TOTAL] [TOTAL] [TOTAL] [TOTAL] [TOTAL VOTES, VOTE CENTER]
42 TN [TOTAL] [TOTAL] [TOTAL] [TOTAL] [TOTAL] [TOTAL] [TOTAL VOTES]
43 TX [TOTAL] [TOTAL] [TOTAL] [TOTAL] [TOTAL] [TOTAL] [nan, EARLY VOTING, TOTAL VOTES]
44 UT [TOTAL] [TOTAL] [TOTAL] [TOTAL] [TOTAL] [EARLY, ELECTION DAY, MAIL, TOTAL] [nan]
45 VT [TOTAL] [TOTAL] [TOTAL] [TOTAL] [TOTAL] [TOTAL] [nan]
46 VA [TOTAL] [TOTAL] [TOTAL] [TOTAL] [TOTAL] [ABSENTEE, ELECTION DAY, PROVISIONAL] [TOTAL VOTES]
47 WA [TOTAL] [TOTAL] [TOTAL] [TOTAL] [TOTAL] [TOTAL] [nan]
48 WV [TOTAL] [TOTAL] [TOTAL] [TOTAL] [TOTAL] [TOTAL] [nan]
49 WI [TOTAL] [TOTAL] [TOTAL] [TOTAL] [TOTAL] [TOTAL] [nan]
50 WY [TOTAL] [TOTAL] [TOTAL] [TOTAL] [TOTAL] [TOTAL] [nan]

Clues#

Below are 7 questions of increasing difficulty. Each answer is numeric. Find the answers and assign the value to the corresponding variable in the Answers section below. The secret messages are clues to a fictitious scientific breakthrough that purportedly happened today.

  • a. What number is in the row with index 189 under the column ‘candidatevotes’?

  • b. How many different counties are tabulated in this dataset?

  • c. What is the county_fips code for the alphabetically last county in Pennsylvania?

  • d. In 2024, how many election day (not mail-in or provisional) votes did DJT get in Adams County, PA?

  • e. How many more votes were cast in PA in the 2020 election than the 2016 election?

  • f. In the 2000 election, by how many votes did the Republican party win Florida?

  • g. Rounded to the nearest percentage point, what was the Republican lean of Adams County in 2020?

# Here's a for example

a = elec_df['candidatevotes'].iloc[189]
a
np.int64(17084)
Hide code cell source
import numpy as np

def decode_word(codestr, ans):
    np.random.seed(ans)
    number = np.random.randint(0, 1000000000)
    
    numberstr = str(number)
    decoded = ''
    while codestr:
        code = int(codestr[0:3])
        num = numberstr[0]     

        char = chr(code - (int(num)+1)*ord(num))
        decoded += char
        
        try:
            codestr = codestr[3:]
        except:
            pass
        
        if len(numberstr) == 1:
            continue
        else:
            numberstr = numberstr[1:]
        
    return decoded

def decode_message(encoded_message, ans_list):
    code_list = encoded_message.split()
    word_list = []
    for code, ans in zip(code_list, ans_list):
        try:
            word_list.append(decode_word(code, ans))
        except:
            word_list.append('_' * int(len(code)/3))
            
    return ' '.join(word_list)
a = 17084
b = 2
c = 3
d = 4
e = 5
f = 6
g = 7

secret_message_a = '390495152144 462308199314 269422551417375 290309197479 584684671319423198605158164111 644479264684439 580671323423619633'
secret_message_b = '591674305432605 689479189684419 607681675314421142 323305 250429550417376 488305199304 432489145198375196'

ans = [a, b, c, d, e, f, g]

message_a = decode_message(secret_message_a, ans)
message_b = decode_message(secret_message_b, ans[::-1])

print(f'Quote 1: {message_a}')
print(f'Quote 2: {message_b}')
Quote 1: Huh. ____ _____ ____ __________ _____ ______
Quote 2: _____ _____ ______ __ _____ ____ roads.