PS03 - Classification

DS325, Gettysburg College
Prof Eatai Roth

Due Monday Mar 30, 2026 at 9a

Total pts: 20

There will be a quiz on Logistic Regression, covering this homework, on Thursday, April 2.

Your Name:

Your Collaborators:

IMPORTANT INSTRUCTIONS:

  • When you submit your code, make sure that every cell runs without returning an error.

  • Once you have the results you need, edit out any extraneous code and outputs.

Problem 1

Churn is the term service companies use to describe customers who leave the platform (e.g., switching from Verizon to T-Mobile). In this problem, you will develop a classifier to identify customers of an Iranian telecom service who are at risk of churn. The data come from the UC Irvine Machine Learning Repository.

Fit a LogisticRegression model to the data:

  • split your data into training and testing sets (split 70-30 train-test)

  • scale the feature data

  • fit your model. Use some form of regularization and try several values of C (the inverse of regularization strength, so smaller C means stronger regularization) using LogisticRegressionCV.

  • use the model to make predictions.

  • plot the confusion matrices of the training and test sets side by side.

  • print the classification report of the test set.

Then, answer the questions below.

# %%capture
# !pip install ucimlrepo

# You only need to run the install lines above once.
# Uncomment them, run the cell, and once the package has installed successfully,
# comment them out again so they never rerun.
from ucimlrepo import fetch_ucirepo 
import pandas as pd
import matplotlib.pyplot as plt
 
# fetch dataset 
iranian_churn = fetch_ucirepo(id=563) 
  
# data (as pandas dataframes) 
X = iranian_churn.data.features 
y = iranian_churn.data.targets 

Questions

  • What regularization did you use and with what value of C?

    • your answer here

  • Which were the three most informative features?

    • 1st most

    • 2nd most

    • 3rd most

  • What were the Recall, Precision, and Accuracy of your algorithm?

    • Recall:

    • Precision:

    • Accuracy:

  • Which metric is most useful for this problem? Why?

    • your answer here

  • Suppose your algorithm could predict churn 2 months before it happened. How might the company act on these predictions?

    • your answer here
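
As a reminder of how the three metrics relate to the confusion matrix, here is a small worked example. The counts below are invented for illustration (they are not results from the churn data), with churn treated as the positive class.

```python
# Hypothetical confusion-matrix counts, with churn (1) as the positive class.
tp = 60    # churners correctly flagged
fp = 20    # non-churners incorrectly flagged
fn = 40    # churners the model missed
tn = 880   # non-churners correctly left alone

recall = tp / (tp + fn)                      # share of actual churners caught
precision = tp / (tp + fp)                   # share of churn flags that were right
accuracy = (tp + tn) / (tp + fp + fn + tn)   # share of all predictions that were right

print(f"Recall:    {recall:.2f}")     # 0.60
print(f"Precision: {precision:.2f}")  # 0.75
print(f"Accuracy:  {accuracy:.2f}")   # 0.94
```

In this made-up case the accuracy is high even though 40% of churners are missed, which is worth keeping in mind when the classes are imbalanced.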

Problem 2

In this exercise, you’ll compare three models and how they classify a synthetic dataset, nested moons.

Fit the following models:

  • Plain LogisticRegression

  • LogisticRegression with polynomial features (up to degree 5)

  • LogisticRegression with polynomial features (up to degree 5) and Lasso (L1) regularization

Make each model as good as possible.
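
One possible way to set up the three models is with scikit-learn pipelines, sketched below. This is a starting point under stated assumptions rather than a tuned solution: the 70-30 split, the `random_state` values, and the `C` values are placeholders, and the names `models` and `scores` are made up for this example. The L1 model uses `penalty="l1"` with the `liblinear` solver; newer scikit-learn releases may prefer a different spelling for the L1 penalty, so check the documentation for your version.

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Same generator settings as the dataset cell in this problem.
X_m, y_m = make_moons(n_samples=800, noise=0.2, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X_m, y_m, test_size=0.3,
                                          random_state=0)

models = {
    # Linear decision boundary in the two original features.
    "Logistic": make_pipeline(StandardScaler(), LogisticRegression()),
    # Degree-5 polynomial expansion, scaled, with the default L2 penalty.
    "Logistic with Polynomial": make_pipeline(
        PolynomialFeatures(degree=5, include_bias=False),
        StandardScaler(),
        LogisticRegression(max_iter=5000)),
    # Same expansion, but an L1 (Lasso) penalty that can zero out terms.
    "Logistic with Polynomial and Lasso": make_pipeline(
        PolynomialFeatures(degree=5, include_bias=False),
        StandardScaler(),
        LogisticRegression(penalty="l1", solver="liblinear", C=1.0)),
}

# Compare training and test accuracy; a large gap hints at overfitting.
scores = {}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    scores[name] = (m.score(X_tr, y_tr), m.score(X_te, y_te))
    print(f"{name}: train={scores[name][0]:.3f}, test={scores[name][1]:.3f}")
```

Putting the scaler inside the pipeline means it is fit on the training data only each time the pipeline is fit, which also makes it safe to use with cross-validation when tuning `C`.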

For each model:

  • plot the confusion matrix

  • print the classification report

  • plot the decision boundaries

For a classification problem with 2 features, we can plot the decision boundary, which separates the plane into the regions that map to each category. For each model, plot the decision boundary along with the data points from the test set.
You may use Copilot/ChatGPT to create the decision boundary plots.
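
A decision boundary plot can be built by predicting the class at every point of a dense grid and shading the result. The sketch below shows one way to do that; `plot_decision_boundary` is a hypothetical helper written for this example (not a scikit-learn function), and the model and split are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler


def plot_decision_boundary(model, X, y, ax, n_grid=200, pad=0.5):
    """Shade the predicted class regions and overlay the points (X, y)."""
    cmap = ListedColormap(["red", "blue"])
    # Grid covering the data range, with a small margin on each side.
    x1 = np.linspace(X[:, 0].min() - pad, X[:, 0].max() + pad, n_grid)
    x2 = np.linspace(X[:, 1].min() - pad, X[:, 1].max() + pad, n_grid)
    xx, yy = np.meshgrid(x1, x2)
    # Predict a class for every grid point, then reshape back to the grid.
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    ax.contourf(xx, yy, Z, levels=[-0.5, 0.5, 1.5], cmap=cmap, alpha=0.2)
    ax.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap, s=8)
    ax.set_xlabel("feature_1")
    ax.set_ylabel("feature_2")
    return Z


# Placeholder data and model, only to demonstrate the helper.
X_m, y_m = make_moons(n_samples=800, noise=0.2, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X_m, y_m, test_size=0.3,
                                          random_state=0)
clf = make_pipeline(PolynomialFeatures(degree=5, include_bias=False),
                    StandardScaler(),
                    LogisticRegression(max_iter=5000))
clf.fit(X_tr, y_tr)

fig, ax = plt.subplots(figsize=(5, 5))
Z = plot_decision_boundary(clf, X_te, y_te, ax)
ax.set_title("Polynomial logistic regression (test set)")
plt.show()
```

The helper expects NumPy arrays; if a model was fit on a DataFrame, passing `df[['feature_1', 'feature_2']].to_numpy()` for the points is one way to match that.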

Then answer the questions below.

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
from sklearn.datasets import make_moons

# Build the nested-moons dataset: two features and a binary label y
df = pd.DataFrame()
df[['feature_1','feature_2']], df['y'] = make_moons(n_samples=800, noise=.2, random_state=1)


fig, ax = plt.subplots(1,1, figsize=(5,5))

cmap = mpl.colors.ListedColormap(['red', 'blue'])    # class 0 in red, class 1 in blue
df.plot(x = 'feature_1', y = 'feature_2', c = 'y',
        s = 3, colorbar = False,
        kind = 'scatter', cmap = cmap,
        ax = ax)

plt.show()
[Figure: scatter plot of the nested moons dataset, feature_1 vs. feature_2, with class 0 in red and class 1 in blue]

Questions

  • What are the Accuracy scores for each model?

    • Logistic

    • Logistic with Polynomial

    • Logistic with Polynomial and Lasso

  • Describe the decision boundary for each model.

    • Logistic

    • Logistic with Polynomial

    • Logistic with Polynomial and Lasso

  • Based on the decision boundaries and the confusion matrices, state which models, if any, are overfitting or underfitting.

    • Logistic

    • Logistic with Polynomial

    • Logistic with Polynomial and Lasso