PS03 - Classification


DS325, Gettysburg College

Prof Eatai Roth

Due Friday Oct 24, 2025 5:00 p.m.

Total pts: 20

Your Name:

Your Collaborators:

IMPORTANT INSTRUCTIONS:

When you submit your code, make sure that every cell runs without returning an error.
Once you have the results you need, edit out any extraneous code and outputs.

Problem 1

%%capture
!pip install ucimlrepo

# You only need to run this cell once.
# If the package installs successfully,
# comment out the !pip install line above so it never runs again.

from ucimlrepo import fetch_ucirepo 
import pandas as pd

  
# fetch dataset 
mushroom = fetch_ucirepo(id=73) 
  
# data (as pandas dataframes) 
X = mushroom.data.features 
y = mushroom.data.targets 
  
# variable information 
display(mushroom.variables[['name','description']]) 

	name	description
0	poisonous	None
1	cap-shape	bell=b,conical=c,convex=x,flat=f, knobbed=k,su...
2	cap-surface	fibrous=f,grooves=g,scaly=y,smooth=s
3	cap-color	brown=n,buff=b,cinnamon=c,gray=g,green=r, pink...
4	bruises	bruises=t,no=f
5	odor	almond=a,anise=l,creosote=c,fishy=y,foul=f, mu...
6	gill-attachment	attached=a,descending=d,free=f,notched=n
7	gill-spacing	close=c,crowded=w,distant=d
8	gill-size	broad=b,narrow=n
9	gill-color	black=k,brown=n,buff=b,chocolate=h,gray=g, gre...
10	stalk-shape	enlarging=e,tapering=t
11	stalk-root	bulbous=b,club=c,cup=u,equal=e, rhizomorphs=z,...
12	stalk-surface-above-ring	fibrous=f,scaly=y,silky=k,smooth=s
13	stalk-surface-below-ring	fibrous=f,scaly=y,silky=k,smooth=s
14	stalk-color-above-ring	brown=n,buff=b,cinnamon=c,gray=g,orange=o, pin...
15	stalk-color-below-ring	brown=n,buff=b,cinnamon=c,gray=g,orange=o, pin...
16	veil-type	partial=p,universal=u
17	veil-color	brown=n,orange=o,white=w,yellow=y
18	ring-number	none=n,one=o,two=t
19	ring-type	cobwebby=c,evanescent=e,flaring=f,large=l, non...
20	spore-print-color	black=k,brown=n,buff=b,chocolate=h,green=r, or...
21	population	abundant=a,clustered=c,numerous=n, scattered=s...
22	habitat	grasses=g,leaves=l,meadows=m,paths=p, urban=u,...

Question

The dataset has no numerical features.

For which of the above features would you use an Ordinal Encoder? Fill in the list below, adding a ‘-’ for each additional bullet.

Remember, you will also use an Ordinal Encoder for binary features.
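As an illustration of the note above, here is a minimal sketch of `OrdinalEncoder` on a small hypothetical DataFrame (the columns `size` and `has_spots` are made up, not mushroom features). Passing `categories` explicitly makes the integer codes follow the intended order; without it, scikit-learn sorts the categories alphabetically, which may not match the real ordering.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data (not from the mushroom dataset):
# 'size' has a natural order, 'has_spots' is binary.
toy = pd.DataFrame({
    "size": ["small", "large", "medium", "small"],
    "has_spots": ["t", "f", "f", "t"],
})

# List the categories in their real order, one list per column,
# so the codes are small=0, medium=1, large=2 and f=0, t=1.
enc = OrdinalEncoder(categories=[["small", "medium", "large"], ["f", "t"]])
codes = enc.fit_transform(toy)

# size -> 0, 2, 1, 0 ; has_spots -> 1, 0, 0, 1
print(codes)
```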

'''your answers here'''

Coding prompt

Now write the code to appropriately encode the features.

'''your code here'''


Question

Once encoded, how many features do you have?

'''your answer here'''

Coding prompt 2

Now write the code to perform a classification using a decision tree. Experiment with grid search (GridSearchCV) and different hyperparameters until you have a good model. You should get perfect or nearly perfect predictions.

Plot the resulting decision tree (using plot_tree)
Plot the confusion matrices for both your training and testing sets
List the feature importances for your model (see these notes)

'''your code here'''


Problem 2

In this exercise, you’ll compare how three models classify a synthetic dataset of two interleaving half-moons (generated with make_moons).

Fit the following models:

DecisionTreeClassifier
LogisticRegression
Create polynomial features (degree 3 or higher), then fit a LogisticRegression with an L1 (lasso) penalty.

Tune each model to make it as good as possible.

import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons

# Generate 800 noisy points from two interleaving half-moons
X_moons, y_moons = make_moons(n_samples=800, noise=.2, random_state=1)
df = pd.DataFrame(X_moons, columns=['feature_1', 'feature_2'])
df['y'] = y_moons

fig, ax = plt.subplots(1, 1, figsize=(5, 5))

cmap = mpl.colors.ListedColormap(['red', 'blue'])    # class 0 = red, class 1 = blue
df.plot(x='feature_1', y='feature_2', c='y',
        s=3, colorbar=False,
        kind='scatter', cmap=cmap,
        ax=ax)

plt.show()


'''your code here'''

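A minimal sketch of fitting the three models, regenerating the same moons data so the block is self-contained. The hyperparameter values (`max_depth=5`, `C=1.0`, degree 3) are placeholders to tune, for example with `GridSearchCV`. The third model is a pipeline of `PolynomialFeatures`, then `StandardScaler` (an added assumption, since scaling the cubic terms helps the solver), then `LogisticRegression` with `penalty="l1"`; `liblinear` is one of the solvers that supports an L1 penalty.

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Same data as the plot above, regenerated so this block runs on its own
X_moons, y_moons = make_moons(n_samples=800, noise=.2, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X_moons, y_moons, test_size=0.25, stratify=y_moons, random_state=1)

models = {
    # Placeholder hyperparameters; tune them rather than trusting these values
    "Decision tree": DecisionTreeClassifier(max_depth=5, random_state=1),
    "Logistic regression": LogisticRegression(),
    # Cubic features -> scaling -> L1-penalized (lasso) logistic regression
    "Poly + L1 logistic": make_pipeline(
        PolynomialFeatures(degree=3, include_bias=False),
        StandardScaler(),
        LogisticRegression(penalty="l1", solver="liblinear", C=1.0),
    ),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: train {model.score(X_train, y_train):.3f}, "
          f"test {model.score(X_test, y_test):.3f}")
```

Comparing training and testing accuracy side by side helps spot a tree that is overfitting or a linear model that is underfitting the curved boundary.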

Coding prompt

For each model, plot the decision boundary and the confusion matrix (see these notes).

You may use Copilot or ChatGPT to create the decision boundary plot for the polynomial-feature logistic regression; it is tricky.

'''your code here'''


Question

Which model is best? Why?

'''your response here'''
