PS00 (Solutions by Prof Roth)#

Prof Eatai Roth
DS325 F2025
Gettysburg College

Before downloading this Problem Set file#

Let’s set up a Git repository folder on your local computer. This folder will hold all of your assignments and projects for this class and will be linked to a repository that lives on Github.

Github Assignments invitation

Jupyter notebook review#

Notebooks comprise two kinds of cells:

  • Markdown - text annotation (like this cell).

  • Code - cells that will run code.

Some useful keyboard shortcuts:

  • SHIFT+RETURN (SHIFT+ENTER on PC) - runs a cell and moves focus to the next cell

  • Esc -> A - adds a cell above the current cell (-> means followed by)

  • Esc -> B - adds a cell below the current cell

  • Esc -> M - converts a cell to Markdown

  • Esc -> Y - converts a cell to Code

Kernels#

When you first run a cell of code, VS Code will prompt you to select a kernel. You can think of a kernel as a computational session and VS Code wants to know which version of Python you’d like to use to execute your code. It will present a list of installed Python distributions; you should select the one that has ‘Anaconda’ in the path.

The kernel also keeps track of all the variables you’ve created in the session. If you want to run your code from a clean slate, you can restart the kernel to clear all variables.

Assignment grading#

I’ll be using a package called Otter Grader to help automate the grading and feedback you get on assignments. We’ll see how Otter grader works below, but first, we must install the package.

%%capture
!pip install otter-grader
import otter
grader = otter.Notebook()

Practice Problems#

Problem 1

Write a function named hello_blank that takes as an input any string and returns a new string ‘Hello [input string]!’

def hello_blank(word):
    return f'Hello {word}!'
# grader.check("q1")

Problem 2

Create a DataFrame mirroring the table below and assign this to icecream_df. Then group by the flavor column and find the mean price-per-scoop for each flavor; assign this series to price_by_flavor.

flavor

scoops

price

chocolate

1

2

vanilla

1

1.5

chocolate

2

3

strawberry

1

2

strawberry

3

4

vanilla

2

2

mint

1

4

mint

2

5

chocolate

3

5

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

icecream_df = pd.DataFrame(dict(
    flavor = ['chocolate', 'vanilla', 'chocolate',  'strawberry', 'strawberry', 'vanilla', 'mint', 'mint', 'chocolate'],
    scoops = [1, 1, 2, 1, 3, 2, 1, 2, 3],
    price = [2, 1.5, 3, 2, 4, 2, 4, 5, 5]
    )
    )
flavors_df = icecream_df.groupby('flavor').sum()
price_by_flavor =flavors_df['price']/flavors_df['scoops']
price_by_flavor
flavor
chocolate     1.666667
mint          3.000000
strawberry    1.500000
vanilla       1.166667
dtype: float64
# grader.check("q2a")

2b Create a barplot of price_by_flavor.

plt.bar(x = price_by_flavor.index, height = price_by_flavor)
plt.ylabel('Price per scoop ($)')
plt.title('Price per scoop by flavor')
plt.show()
../_images/7eecfa743ff0747866b2ced094788809a3275661439ddb4c934e4e1d7ab86485.png

2c Per scoop, which flavor is most expensive?

Mint is most expensive.