PS03 - Classification
DS325, Gettysburg College
Prof Eatai Roth
Due Friday Oct 24, 2025 5:00p
Total pts: 20
Your Name:
Your Collaborators:
IMPORTANT INSTRUCTIONS:
When you submit your code, make sure that every cell runs without returning an error.
Once you have the results you need, edit out any extraneous code and outputs.
Problem 1
%%capture
# You only need to run this cell once.
# After the package installs successfully,
# comment out the line below so it never runs again.
!pip install ucimlrepo
from ucimlrepo import fetch_ucirepo
import pandas as pd
# fetch dataset
mushroom = fetch_ucirepo(id=73)
# data (as pandas dataframes)
X = mushroom.data.features
y = mushroom.data.targets
# variable information
display(mushroom.variables[['name','description']])
 | name | description |
---|---|---|
0 | poisonous | None |
1 | cap-shape | bell=b,conical=c,convex=x,flat=f, knobbed=k,su... |
2 | cap-surface | fibrous=f,grooves=g,scaly=y,smooth=s |
3 | cap-color | brown=n,buff=b,cinnamon=c,gray=g,green=r, pink... |
4 | bruises | bruises=t,no=f |
5 | odor | almond=a,anise=l,creosote=c,fishy=y,foul=f, mu... |
6 | gill-attachment | attached=a,descending=d,free=f,notched=n |
7 | gill-spacing | close=c,crowded=w,distant=d |
8 | gill-size | broad=b,narrow=n |
9 | gill-color | black=k,brown=n,buff=b,chocolate=h,gray=g, gre... |
10 | stalk-shape | enlarging=e,tapering=t |
11 | stalk-root | bulbous=b,club=c,cup=u,equal=e, rhizomorphs=z,... |
12 | stalk-surface-above-ring | fibrous=f,scaly=y,silky=k,smooth=s |
13 | stalk-surface-below-ring | fibrous=f,scaly=y,silky=k,smooth=s |
14 | stalk-color-above-ring | brown=n,buff=b,cinnamon=c,gray=g,orange=o, pin... |
15 | stalk-color-below-ring | brown=n,buff=b,cinnamon=c,gray=g,orange=o, pin... |
16 | veil-type | partial=p,universal=u |
17 | veil-color | brown=n,orange=o,white=w,yellow=y |
18 | ring-number | none=n,one=o,two=t |
19 | ring-type | cobwebby=c,evanescent=e,flaring=f,large=l, non... |
20 | spore-print-color | black=k,brown=n,buff=b,chocolate=h,green=r, or... |
21 | population | abundant=a,clustered=c,numerous=n, scattered=s... |
22 | habitat | grasses=g,leaves=l,meadows=m,paths=p, urban=u,... |
Question
The dataset has no numerical features.
For which of the above features would you use an Ordinal Encoder? Fill in the list below, add ‘-’ for additional bullets.
Remember, you will also use an Ordinal Encoder for binary features.
‘’’your answers here’’’
Coding prompt
Now write the code to appropriately encode the features.
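As a hint for the pattern (not a solution), here is a minimal sketch that mixes an Ordinal Encoder and a One-Hot Encoder in a single `ColumnTransformer`. The toy frame and its values are made up for illustration, and treating `ring-number` as ordered here is an assumption; deciding which of the real features belong in each group is still up to you.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Toy frame (hypothetical values): one binary, one ordered, one nominal column.
toy = pd.DataFrame({
    "bruises": ["t", "f", "t", "f"],
    "ring-number": ["n", "o", "t", "o"],
    "habitat": ["g", "l", "m", "g"],
})

encoder = ColumnTransformer(
    transformers=[
        # Explicit category lists fix the integer order: f=0, t=1 and n=0, o=1, t=2.
        ("ordinal",
         OrdinalEncoder(categories=[["f", "t"], ["n", "o", "t"]]),
         ["bruises", "ring-number"]),
        # Nominal column: one indicator column per observed category.
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["habitat"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

encoded = encoder.fit_transform(toy)
print(encoded.shape)  # (rows, encoded features)
print(encoded)
```

Each ordinal column stays a single column, while each one-hot column expands into one column per category, so `encoded.shape[1]` is the number of features after encoding.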
'''your code here'''
Question
Once encoded, how many features do you have?
‘’’your answer here’’’
Coding prompt 2
Now write the code to perform a classification using a decision tree. Experiment with a grid search over different hyperparameters until you feel you have a good model. (You should get perfect or nearly perfect prediction.)
Plot the resulting decision tree (using plot_tree).
Plot the confusion matrices for both your training and testing sets.
List the feature importances for your model (see these notes).
'''your code here'''
Problem 2
In this exercise, you’ll compare three models and how they classify a synthetic dataset, nested moons.
Fit the following models:
DecisionTree
LogisticRegression
Create polynomial features (degree 3 or higher), then fit a LogisticRegression with Lasso (L1) regularization.
Make each model as good as possible.
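One way to set up the three models side by side is sketched below. The hyperparameter values (`max_depth=5`, `C=1.0`), the train/test split, and the added scaling step are placeholder assumptions, not tuned choices; the `make_moons` arguments match the data-generation cell in this problem.

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.tree import DecisionTreeClassifier

X_moons, y_moons = make_moons(n_samples=800, noise=0.2, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_moons, y_moons, test_size=0.25, random_state=1, stratify=y_moons
)

models = {
    "tree": DecisionTreeClassifier(max_depth=5, random_state=1),
    "logistic": LogisticRegression(),
    # Polynomial expansion, scaling, then an L1-penalized (Lasso) logistic regression.
    # The liblinear solver is used because it supports the L1 penalty.
    "poly_lasso": make_pipeline(
        PolynomialFeatures(degree=3, include_bias=False),
        StandardScaler(),
        LogisticRegression(penalty="l1", solver="liblinear", C=1.0),
    ),
}

# Fit each model and record its accuracy on the held-out set.
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = model.score(X_te, y_te)
print(scores)
```

Wrapping the polynomial step in a pipeline means the model still accepts the two raw features, which simplifies both tuning (for example with a grid search over `C` and the degree) and plotting.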
from sklearn.datasets import make_moons, make_s_curve
import matplotlib.pyplot as plt
import matplotlib as mpl
df = pd.DataFrame()
df[['feature_1','feature_2']], df['y'] = make_moons(n_samples=800, noise=.2, random_state=1)
fig, ax = plt.subplots(1,1, figsize=(5,5))
cmap = mpl.colors.ListedColormap(['red', 'blue']) # red = class 0, blue = class 1
df.plot(x = 'feature_1', y = 'feature_2', c = 'y',
s = 3, colorbar = False,
kind = 'scatter', cmap = cmap,
ax = ax)
plt.show()

'''your code here'''
Coding prompt
For each model, plot the decision boundary and the confusion matrix (see these notes).
You may use Copilot/ChatGPT to create the decision boundary plot for the polynomial logistic regression. It is tricky.
'''your code here'''
Question
Which model is best? Why?
‘’’your response here’’’