PS03 - Classification

PS03 - Classification#

DS325, Gettysburg College
Prof Eatai Roth

Due Monday Mar 30, 2026 at 9a

Total pts: 20

There will be a quiz on Logistic Regression, including this homework on Thursday April 2.

Your Name:

Your Collaborators:

IMPORTANT INSTRUCTIONS:#

When you submit your code, make sure that every cell runs without returning an error.
Once you have the results you need, edit out any extraneous code and outputs.

Problem 1#

Churn is the term service companies use to describe customers that leave the platform (e.g. switching from Verizon to T-Mobile). In this problem, you will be developing a classifier to predict customers at risk of churn from an Iranian Telecom service. The data come from the UC Irvine ML repository.

Fit a LogisticRegression model to the data:

split your data into training and testing sets (split 70-30 train-test)
scale the feature data
fit your model. Use some form of regularization and try several values of regularization strength (C) using LogisticRegressionCV.
use the model to make predictions.
plot the confusion matrix of the training and test set side by side0
print the classification report of the test set.

Then, answer the question below.

# %%capture
# !pip install ucimlrepo

# You only need to run this cell once. 
# If the package is installed successfully, 
# re-comment out the line below to never run this cell again.

from ucimlrepo import fetch_ucirepo 
import pandas as pd
import matplotlib.pyplot as plt

 
# fetch dataset 
iranian_churn = fetch_ucirepo(id=563) 
  
# data (as pandas dataframes) 
X = iranian_churn.data.features 
y = iranian_churn.data.targets 

Questions

What regularization did you use and with what value of C?
- your answer here
Which were the three most informative features?
- 1st most
- 2nd most
- 3rd most
What were the Recall, Precision, and Accuracy of your algorithm?
- Recall:
- Precision:
- Accuracy:
Which metric is most useful for this problem? Why?
- your answer here
Suppose your algorithm could predict churn 2 months before it happened. How might the company act on these predictions?
- your answer here

Problem 2#

In this exercise, you’ll compare three models and how they classify a synthetic dataset, nested moons.

Fit the following models:

Plain LogisticRegression
LogisticRegression with polynomial features (up to degree 5)
LogisticRegression with polynomial features (up to degree 5) with Lasso

Make each model as good as possible.

For each model:

plot the confusion matrix
print the classification report
plot the decision boundaries

For a classification problem with 2 features, we can plot the decision boundary separating the regions of the plane into those that map to each category. For each model, plot the decision boundary plot along with the data points from the test set.
You may use co-pilot/ChatGPT to create the DecisionBoundary plots.

Then answer the questions below.

from sklearn.datasets import make_moons

import matplotlib.pyplot as plt
import matplotlib as mpl

df = pd.DataFrame()
df[['feature_1','feature_2']], df['y'] = make_moons(n_samples=800, noise=.2, random_state=1)


fig, ax = plt.subplots(1,1, figsize=(5,5))

cmap = mpl.colors.ListedColormap(['red', 'blue'])    # Training Data
df.plot(x = 'feature_1', y = 'feature_2', c = 'y',
        s = 3, colorbar = False,
        kind = 'scatter', cmap = cmap,
        ax = ax)

plt.show()

../_images/d792cb809dd950802fc001cdcbaa0d739abdabe741ae24803dbd952a92098d69.png