2. Logistic Regression#
Is it regression or classification? Yes.
Let’s start differently, with an example.
2.1. Example: Simple Logistic Regression as a Classifier#
Let’s revisit the breast cancer dataset. The data comprise numerous physical features of a tumor (e.g. area, texture, symmetry, etc.) and each feature set is labeled with a binary target, benign or malignant.
Note: In the original data set, benign tumors are labeled 1 and malignant tumors 0. This seems backwards to me, and every time I look at these data, my wrong intuition beats out my terrible memory. So, in the example below, I’ve flipped the labels so that 1 and 0 correspond to malignant and benign, respectively:
0 = benign
1 = malignant
We’ll first fit a simple logistic regression, predicting malignancy based on just one feature.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
# Load the breast cancer dataset
bc_df, y = load_breast_cancer(return_X_y=True, as_frame=True)
# Adding the target to the DataFrame of features
# AND FLIPPING THE LABELS OF THE TARGET
# 1 - malignant
# 0 - benign
bc_df['y'] = 1-y
display(bc_df.describe())
display(bc_df['y'].describe())
mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension | y | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 569.000000 | 569.000000 | 569.000000 | 569.000000 | 569.000000 | 569.000000 | 569.000000 | 569.000000 | 569.000000 | 569.000000 | ... | 569.000000 | 569.000000 | 569.000000 | 569.000000 | 569.000000 | 569.000000 | 569.000000 | 569.000000 | 569.000000 | 569.000000 |
mean | 14.127292 | 19.289649 | 91.969033 | 654.889104 | 0.096360 | 0.104341 | 0.088799 | 0.048919 | 0.181162 | 0.062798 | ... | 25.677223 | 107.261213 | 880.583128 | 0.132369 | 0.254265 | 0.272188 | 0.114606 | 0.290076 | 0.083946 | 0.372583 |
std | 3.524049 | 4.301036 | 24.298981 | 351.914129 | 0.014064 | 0.052813 | 0.079720 | 0.038803 | 0.027414 | 0.007060 | ... | 6.146258 | 33.602542 | 569.356993 | 0.022832 | 0.157336 | 0.208624 | 0.065732 | 0.061867 | 0.018061 | 0.483918 |
min | 6.981000 | 9.710000 | 43.790000 | 143.500000 | 0.052630 | 0.019380 | 0.000000 | 0.000000 | 0.106000 | 0.049960 | ... | 12.020000 | 50.410000 | 185.200000 | 0.071170 | 0.027290 | 0.000000 | 0.000000 | 0.156500 | 0.055040 | 0.000000 |
25% | 11.700000 | 16.170000 | 75.170000 | 420.300000 | 0.086370 | 0.064920 | 0.029560 | 0.020310 | 0.161900 | 0.057700 | ... | 21.080000 | 84.110000 | 515.300000 | 0.116600 | 0.147200 | 0.114500 | 0.064930 | 0.250400 | 0.071460 | 0.000000 |
50% | 13.370000 | 18.840000 | 86.240000 | 551.100000 | 0.095870 | 0.092630 | 0.061540 | 0.033500 | 0.179200 | 0.061540 | ... | 25.410000 | 97.660000 | 686.500000 | 0.131300 | 0.211900 | 0.226700 | 0.099930 | 0.282200 | 0.080040 | 0.000000 |
75% | 15.780000 | 21.800000 | 104.100000 | 782.700000 | 0.105300 | 0.130400 | 0.130700 | 0.074000 | 0.195700 | 0.066120 | ... | 29.720000 | 125.400000 | 1084.000000 | 0.146000 | 0.339100 | 0.382900 | 0.161400 | 0.317900 | 0.092080 | 1.000000 |
max | 28.110000 | 39.280000 | 188.500000 | 2501.000000 | 0.163400 | 0.345400 | 0.426800 | 0.201200 | 0.304000 | 0.097440 | ... | 49.540000 | 251.200000 | 4254.000000 | 0.222600 | 1.058000 | 1.252000 | 0.291000 | 0.663800 | 0.207500 | 1.000000 |
8 rows × 31 columns
count 569.000000
mean 0.372583
std 0.483918
min 0.000000
25% 0.000000
50% 0.000000
75% 1.000000
max 1.000000
Name: y, dtype: float64
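Since the labels are 0/1, the mean of y (about 0.37) is just the fraction of malignant cases, so roughly 37% of the tumors are malignant. As a quick check, here is a small sketch (reusing the bc_df defined above) that counts the classes directly:
# Class balance: counts and proportions of benign (0) and malignant (1) tumors
print(bc_df['y'].value_counts())
print(bc_df['y'].value_counts(normalize=True).round(3))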
# sns.pairplot(bc_df)
# plt.show()
bc_corr = bc_df.corr()
fig, ax = plt.subplots(1,1,figsize = (15,15))
sns.heatmap(bc_corr, annot = True, fmt = '.2f')
plt.show()
(Figure: annotated heatmap of the correlation matrix for all features and the target.)
bc_corr = bc_df.corr()
bc_corr[['y']].sort_values(by = 'y', ascending = False)
y | |
---|---|
y | 1.000000 |
worst concave points | 0.793566 |
worst perimeter | 0.782914 |
mean concave points | 0.776614 |
worst radius | 0.776454 |
mean perimeter | 0.742636 |
worst area | 0.733825 |
mean radius | 0.730029 |
mean area | 0.708984 |
mean concavity | 0.696360 |
worst concavity | 0.659610 |
mean compactness | 0.596534 |
worst compactness | 0.590998 |
radius error | 0.567134 |
perimeter error | 0.556141 |
area error | 0.548236 |
worst texture | 0.456903 |
worst smoothness | 0.421465 |
worst symmetry | 0.416294 |
mean texture | 0.415185 |
concave points error | 0.408042 |
mean smoothness | 0.358560 |
mean symmetry | 0.330499 |
worst fractal dimension | 0.323872 |
compactness error | 0.292999 |
concavity error | 0.253730 |
fractal dimension error | 0.077972 |
symmetry error | -0.006522 |
texture error | -0.008303 |
mean fractal dimension | -0.012838 |
smoothness error | -0.067016 |
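The feature 'worst concave points' tops this list, which is why it is used as the single predictor below. If you would rather pick it programmatically, here is a minimal sketch (the variable name best_feature is mine) that drops the trivial correlation of y with itself and takes the largest remaining correlation in absolute value:
# Feature with the largest |correlation| with the target
best_feature = bc_corr['y'].drop('y').abs().idxmax()
print(best_feature)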
feature = 'worst concave points'
X = bc_df[[feature]]
y = bc_df['y']
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import ConfusionMatrixDisplay
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=59)
# Train a logistic regression model
model_LogReg = LogisticRegression(penalty = None, max_iter = 10000)
model_LogReg.fit(X_train, y_train)
# Make predictions
y_pred = model_LogReg.predict(X_test)
TP = (y_pred==1) & (y_test==1)
TN = (y_pred==0) & (y_test==0)
FP = (y_pred==1) & (y_test==0)
FN = (y_pred==0) & (y_test==1)
right = 'forestgreen'
wrong = 'firebrick'
positive = '+'
negative = 's'
fig, ax = plt.subplots(1,1, figsize = (15, 6))
ax.plot(X_test[TP], y_pred[TP], color = right, alpha = 0.5, marker = positive, linewidth = 0, label = 'True Positive')
ax.plot(X_test[TN], y_pred[TN], color = right, markerfacecolor='none',alpha = 0.5, marker = negative, linewidth = 0, label = 'True Negative')
ax.plot(X_test[FP], y_pred[FP], color = wrong, alpha = 0.5, marker = positive, linewidth = 0, label = 'False Positive')
ax.plot(X_test[FN], y_pred[FN], color = wrong, markerfacecolor='none', alpha = 0.5, marker = negative, linewidth = 0, label = 'False Negative')
plt.legend()
plt.show()
(Figure: predicted class (0 or 1) versus worst concave points, with true/false positives and negatives marked.)
y_pred_prob = model_LogReg.predict_proba(X_test)
y_pred_prob
array([[5.44077032e-01, 4.55922968e-01],
[9.62062910e-01, 3.79370899e-02],
[9.57331930e-01, 4.26680697e-02],
[9.99721296e-01, 2.78704480e-04],
[1.46279385e-01, 8.53720615e-01],
[2.08250039e-01, 7.91749961e-01],
[9.99721296e-01, 2.78704480e-04],
[8.87634417e-01, 1.12365583e-01],
[9.05480794e-01, 9.45192059e-02],
[8.75698342e-01, 1.24301658e-01],
[1.25467803e-01, 8.74532197e-01],
[1.18044894e-02, 9.88195511e-01],
[2.51504627e-01, 7.48495373e-01],
[6.74321801e-01, 3.25678199e-01],
[5.83207275e-01, 4.16792725e-01],
[9.13925978e-02, 9.08607402e-01],
[9.84872909e-01, 1.51270914e-02],
[9.90642327e-01, 9.35767331e-03],
[9.62771546e-01, 3.72284541e-02],
[5.37995787e-01, 4.62004213e-01],
[9.93737280e-01, 6.26271980e-03],
[9.56828937e-01, 4.31710631e-02],
[4.19100013e-02, 9.58089999e-01],
[2.81397609e-01, 7.18602391e-01],
[1.19545310e-01, 8.80454690e-01],
[6.51251169e-02, 9.34874883e-01],
[9.95730447e-01, 4.26955263e-03],
[2.82637326e-01, 7.17362674e-01],
[9.98423814e-01, 1.57618596e-03],
[1.67298032e-01, 8.32701968e-01],
[9.92748756e-01, 7.25124437e-03],
[8.21984874e-01, 1.78015126e-01],
[8.45805817e-03, 9.91541942e-01],
[2.57311209e-01, 7.42688791e-01],
[1.44000467e-01, 8.55999533e-01],
[9.45816607e-01, 5.41833934e-02],
[4.93604510e-02, 9.50639549e-01],
[9.94545476e-01, 5.45452389e-03],
[9.84624642e-01, 1.53753577e-02],
[7.49056048e-03, 9.92509440e-01],
[2.68934852e-02, 9.73106515e-01],
[7.21985647e-01, 2.78014353e-01],
[9.99721296e-01, 2.78704480e-04],
[4.21565347e-02, 9.57843465e-01],
[8.53803448e-05, 9.99914620e-01],
[4.57519380e-03, 9.95424806e-01],
[4.80376467e-03, 9.95196235e-01],
[9.48482438e-01, 5.15175616e-02],
[9.90081448e-01, 9.91855209e-03],
[2.03050833e-02, 9.79694917e-01],
[2.22736444e-01, 7.77263556e-01],
[9.28301925e-01, 7.16980749e-02],
[9.68693845e-01, 3.13061549e-02],
[1.73688304e-02, 9.82631170e-01],
[9.40399971e-04, 9.99059600e-01],
[9.93891585e-01, 6.10841495e-03],
[9.96122594e-01, 3.87740572e-03],
[9.74880458e-01, 2.51195423e-02],
[9.98907860e-01, 1.09213978e-03],
[3.90571874e-02, 9.60942813e-01],
[1.26930061e-02, 9.87306994e-01],
[9.57952801e-01, 4.20471993e-02],
[3.37541527e-01, 6.62458473e-01],
[2.63204709e-01, 7.36795291e-01],
[2.54884107e-02, 9.74511589e-01],
[6.04436804e-02, 9.39556320e-01],
[9.88205705e-01, 1.17942949e-02],
[4.93604510e-02, 9.50639549e-01],
[3.14502725e-04, 9.99685497e-01],
[9.97886484e-01, 2.11351555e-03],
[3.97524251e-02, 9.60247575e-01],
[9.64592586e-01, 3.54074144e-02],
[9.87007610e-01, 1.29923902e-02],
[9.99721296e-01, 2.78704480e-04],
[1.22328155e-03, 9.98776718e-01],
[9.97204301e-01, 2.79569920e-03],
[4.73877860e-02, 9.52612214e-01],
[8.55048130e-01, 1.44951870e-01],
[2.29160968e-01, 7.70839032e-01],
[9.41556212e-01, 5.84437883e-02],
[6.84922452e-05, 9.99931508e-01],
[9.88205705e-01, 1.17942949e-02],
[8.32482479e-01, 1.67517521e-01],
[6.32859195e-02, 9.36714080e-01],
[9.52374732e-01, 4.76252677e-02],
[6.70139405e-02, 9.32986060e-01],
[9.99721296e-01, 2.78704480e-04],
[8.95446175e-01, 1.04553825e-01],
[9.23081659e-01, 7.69183412e-02],
[9.91747022e-01, 8.25297820e-03],
[9.82019225e-01, 1.79807747e-02],
[1.60613162e-02, 9.83938684e-01],
[4.36650887e-02, 9.56334911e-01],
[9.55986599e-01, 4.40134013e-02],
[9.36329283e-01, 6.36707175e-02],
[9.25015210e-01, 7.49847904e-02],
[9.44579811e-01, 5.54201895e-02],
[2.06934066e-03, 9.97930659e-01],
[9.94034854e-01, 5.96514559e-03],
[9.87007610e-01, 1.29923902e-02],
[9.81910797e-01, 1.80892033e-02],
[4.29859288e-01, 5.70140712e-01],
[8.98053722e-01, 1.01946278e-01],
[8.73685169e-01, 1.26314831e-01],
[1.60087715e-03, 9.98399123e-01],
[9.89012068e-01, 1.09879322e-02],
[9.87575970e-01, 1.24240303e-02],
[9.99241979e-01, 7.58021348e-04],
[9.95845992e-01, 4.15400758e-03],
[9.79845732e-01, 2.01542680e-02],
[6.20354615e-04, 9.99379645e-01],
[3.94277545e-01, 6.05722455e-01],
[4.66180777e-01, 5.33819223e-01],
[9.91605518e-01, 8.39448225e-03]])
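Each row of y_pred_prob holds two numbers: column 0 is the predicted probability of class 0 (benign) and column 1 is the predicted probability of class 1 (malignant), so each row sums to 1. For a binary problem, predict is equivalent to thresholding column 1 at 0.5. A quick sketch to confirm (both lines should print True):
# Rows of predict_proba sum to 1
print(np.allclose(y_pred_prob.sum(axis=1), 1.0))
# Thresholding the probability of class 1 at 0.5 reproduces predict
print(np.array_equal((y_pred_prob[:, 1] > 0.5).astype(int), y_pred))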
fig, ax = plt.subplots(1,1, figsize = (15, 6))
ax.plot(X_test[TP], y_pred_prob[TP,1], color = right, alpha = 0.5, marker = positive, linewidth = 0, label = 'True Positive')
ax.plot(X_test[TN], y_pred_prob[TN,1], color = right, markerfacecolor='none',alpha = 0.5, marker = negative, linewidth = 0, label = 'True Negative')
ax.plot(X_test[FP], y_pred_prob[FP,1], color = wrong, alpha = 0.5, marker = positive, linewidth = 0, label = 'False Positive')
ax.plot(X_test[FN], y_pred_prob[FN,1], color = wrong, markerfacecolor='none', alpha = 0.5, marker = negative, linewidth = 0, label = 'False Negative')
ax.set_xlabel(feature)
ax.set_ylabel('Predicted Probability of Malignancy')
plt.legend()
plt.axhline(y = 0.5, color = 'k', linestyle = '--', alpha = 0.2)
plt.show()
(Figure: predicted probability of malignancy versus worst concave points; the points trace out an S-shaped curve, with the 0.5 decision threshold shown as a dashed line.)
What is this shape?
2.2. The Logistic Function#
The logistic function is a smooth, monotonically increasing curve (its value goes up as its argument goes up) with a range of (0, 1). It approaches 0 as \(t\) decreases, approaches 1 as \(t\) increases, and takes the value 0.5 at \(t=0\).
This is the logistic function:
\[
\sigma(t) = \frac{1}{1 + \exp(-t)}
\]
Let’s take a look at this function.
t = np.arange(-5, 5, 0.1)
logistic = lambda t: 1/(1 + np.exp(-t))
fig, ax = plt.subplots(2,1, figsize=(10, 10))
ax[0].plot(t, np.exp(t))
ax[0].set_title('Exponential function exp(t)')
ax[1].plot(t, logistic(t))
ax[1].set_title('Logistic function')
ax[1].set_xlabel('t')
ax[1].set_ylabel('Prob')
plt.show()
(Figure: the exponential function exp(t) on top; the logistic function, rising from 0 to 1, on the bottom.)
For the plots above:
as t gets to be a big positive number, \(\exp(-t)\) goes to 0 and \(\sigma(t)\) goes to 1.
as t gets to be a big negative number, \(\exp(-t)\) goes to \(+\infty\) and \(\sigma(t)\) goes to 0.
The output of the logistic function can be interpreted as a probability. When we use logistic regression for classification, we set a decision threshold on that probability, 50% by default.
We denote the predicted probability as \(\hat{p}\).
If \(\hat{p}=\sigma(t)>0.5\) then 1 is more likely than 0, so classify as 1
If \(\hat{p}=\sigma(t)<0.5\) then 0 is more likely than 1, so classify as 0
50% is just the default threshold. It’s a good choice when both kinds of misclassification carry equal consequences.
Improving Precision: We can raise the threshold if we want to be more discerning about what we classify as 1.
Improving Recall: Conversely, we can lower the threshold if we want to catch more instances of 1. A short sketch of thresholding by hand follows below.
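For example, a false negative here (calling a malignant tumor benign) is far worse than a false positive, so we might lower the threshold to favor recall. A hypothetical sketch, using the one-feature model fit above and an illustrative (not tuned) threshold of 0.3:
# Classify as malignant whenever the predicted probability of malignancy is at least 0.3
threshold = 0.3
y_pred_low_thresh = (model_LogReg.predict_proba(X_test)[:, 1] >= threshold).astype(int)
# Lowering the threshold can only add positive predictions, so recall goes up
# (possibly at the cost of precision)
print('positives at 0.5:', y_pred.sum(), ' positives at 0.3:', y_pred_low_thresh.sum())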
2.3. Logistic and Linear Regression#
Why are these two topics in the same chapter?
Let’s look back at the logistic function, \(\sigma(t) = 1/(1 + \exp(-t))\).
What are the parameters of this model? I don’t see any. That’s because they’re hidden inside \(t\).
The value of \(t\) is the output of a linear model, \(t = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n\) (and it can be any flavor of regularized linear model too!). So the parameters of a logistic regression are actually the coefficients of a linear regression (or Lasso, or Ridge, or ElasticNet). What does this linear model do?
The linear model maps feature vectors to the range \((-\infty, +\infty)\). Feature vectors that should be classified as 1 get assigned positive numbers, larger for more certain classifications; feature vectors that should be classified as 0 get assigned negative numbers, more negative for more certain classifications.
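To make this concrete, here is a minimal sketch (using the one-feature model_LogReg fit above) showing that pushing the linear model’s output through the logistic function reproduces predict_proba:
# t is the output of the linear part: intercept + coefficient * feature
# (model_LogReg.decision_function(X_test) computes the same quantity)
t_lin = model_LogReg.intercept_ + X_test.values @ model_LogReg.coef_.T
# Applying the logistic function to t gives the predicted probability of class 1
p_hat = 1 / (1 + np.exp(-t_lin.ravel()))
print(np.allclose(p_hat, model_LogReg.predict_proba(X_test)[:, 1]))  # should print True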
2.3.1. Cost Function#
The loss for a single prediction with estimated probability \(\hat{p}\) and true label \(y\) is:
\[
c(\hat{p}, y) =
\begin{cases}
-\log(\hat{p}) & \text{if } y = 1 \\
-\log(1-\hat{p}) & \text{if } y = 0
\end{cases}
\]
The cost function minimized for logistic regression is the log-loss: the average of this loss over all \(m\) training instances,
\[
J(\boldsymbol{\theta}) = -\frac{1}{m}\sum_{i=1}^{m}\left[\, y^{(i)}\log\big(\hat{p}^{(i)}\big) + \big(1-y^{(i)}\big)\log\big(1-\hat{p}^{(i)}\big) \right]
\]
np.log(0.5)
np.float64(-0.6931471805599453)
t = np.arange(0.001,1.1, 0.001)
y = np.log(t)
plt.title('Log function')
plt.axvline(x=1, color = 'k', linestyle = '--')
plt.axhline(y=0, color = 'k', linestyle = '--')
plt.xlabel('Probability')
plt.ylabel('Log(Prob)')
plt.plot(t, y)
plt.show()
(Figure: the log function on (0, 1]; it equals 0 at 1 and plunges toward negative infinity as the probability approaches 0.)
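To connect this plot to the cost function: a confident, correct prediction (\(\hat{p}\) close to the true label) contributes a loss near 0, while a confident, wrong prediction contributes an enormous loss. A small sketch computing the log-loss of our one-feature model by hand and with sklearn.metrics.log_loss (the two numbers should agree):
from sklearn.metrics import log_loss
# Predicted probability of malignancy from the one-feature model
p_hat = model_LogReg.predict_proba(X_test)[:, 1]
# Average of -log(p_hat) over the true 1s and -log(1 - p_hat) over the true 0s
manual_log_loss = -np.mean(y_test * np.log(p_hat) + (1 - y_test) * np.log(1 - p_hat))
print(manual_log_loss, log_loss(y_test, p_hat))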
2.4. Back to the Example#
Now, let’s use all of our features.
%%capture
from sklearn.preprocessing import StandardScaler
X = bc_df.drop('y', axis = 1)
y = bc_df['y']
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Scale the data, since we are using multiple features
ss = StandardScaler()
X_train_scaled = ss.fit_transform(X_train)
# Use the scaler fit on the training data to transform the test data (transform, not fit_transform)
X_test_scaled = ss.transform(X_test)
# Train a logistic regression model
model_LogReg = LogisticRegression(penalty = None, max_iter = 10000)
model_LogReg.fit(X_train_scaled, y_train)
# Make predictions
y_pred_train = model_LogReg.predict(X_train_scaled)
y_pred = model_LogReg.predict(X_test_scaled)
y_pred_prob = model_LogReg.predict_proba(X_test_scaled)
TP = (y_pred==1) & (y_test==1)
TN = (y_pred==0) & (y_test==0)
FP = (y_pred==1) & (y_test==0)
FN = (y_pred==0) & (y_test==1)
fig, ax = plt.subplots(1,1, figsize = (15, 6))
ax.plot(X_test[TP], y_pred[TP], color = right, alpha = 0.5, marker = positive, linewidth = 0, label = 'True Positive')
ax.plot(X_test[TN], y_pred[TN], color = right, markerfacecolor='none',alpha = 0.5, marker = negative, linewidth = 0, label = 'True Negative')
ax.plot(X_test[FP], y_pred[FP], color = wrong, alpha = 0.5, marker = positive, linewidth = 0, label = 'False Positive')
ax.plot(X_test[FN], y_pred[FN], color = wrong, markerfacecolor='none', alpha = 0.5, marker = negative, linewidth = 0, label = 'False Negative')
plt.show()
(Figure: predicted classes for the test set, with true/false positives and negatives marked.)
2.4.1. Assessing the model#
How’d we do?
fig, ax = plt.subplots(1,2, figsize = (12,6), sharey=True)
ConfusionMatrixDisplay.from_predictions(y_train, y_pred_train,
# normalize = 'true',
display_labels = ['Benign', 'Malignant'],
cmap = 'GnBu',
ax = ax[0])
ConfusionMatrixDisplay.from_predictions(y_test, y_pred,
# normalize = 'true',
display_labels = ['Benign', 'Malignant'],
cmap = 'GnBu',
ax = ax[1])
plt.show()
(Figure: confusion matrices for the training set (left) and the test set (right).)
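Beyond eyeballing the confusion matrices, it helps to put a number on the train/test gap. A short sketch (reusing the predictions computed above) comparing accuracy on the two sets:
from sklearn.metrics import accuracy_score
# A training accuracy noticeably higher than the test accuracy is a sign of over-fitting
print('train accuracy:', accuracy_score(y_train, y_pred_train))
print('test accuracy: ', accuracy_score(y_test, y_pred))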
2.5. Regularization in Logistic Regression#
Take a look at the documentation for LogisticRegression.
The five hyper-parameters you will most likely use:
penalty
‘l1’ - use Lasso
‘l2’ - use Ridge (Default)
‘elasticnet’ - use ElasticNet
None - use unregularized logistic regression
C - the regularization parameter, BUT C = 1/alpha: small C means strong regularization; large C means weak regularization. Annoying.
l1_ratio - if using ‘elasticnet’ penalty, this hyper-parameter balances the amount of L1 (Lasso) and L2 (Ridge) penalties.
closer to 0, more Ridge
closer to 1, more Lasso
max_iter - sometimes, fitting the model won’t converge. Try increasing the value of max_iter (the default is 100) and see if that fixes the problem.
solver - different regularizations may require different solvers (the algorithm used to find the optimal parameters). Let’s look at the documentation; a sketch of penalty/solver combinations that work together follows below.
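Not every penalty works with every solver: as of recent scikit-learn versions, 'lbfgs' (the default) supports only 'l2' or None, 'liblinear' supports 'l1' and 'l2', and 'saga' supports all of them, including 'elasticnet'. Here is a rough sketch of combinations that go together (unfitted models, shown only for the constructor arguments):
# Lasso-style (L1) penalty: use 'liblinear' or 'saga'
lasso_logreg = LogisticRegression(penalty='l1', C=1.0, solver='liblinear')
# ElasticNet penalty: only 'saga', and l1_ratio must be supplied
enet_logreg = LogisticRegression(penalty='elasticnet', C=1.0, l1_ratio=0.5,
                                 solver='saga', max_iter=10000)
# Ridge (L2) penalty: works with the default 'lbfgs' solver
ridge_logreg = LogisticRegression(penalty='l2', C=1.0)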
2.5.1. Improving our model#
The model we fit seems to be over-fitting the data. How do I know this from the confusion matrices above?
Let’s use some regularization to see if we can improve the fit.
from sklearn.linear_model import LogisticRegressionCV
%%capture
# Train a logistic regression model
model_LogReg_Ridge = LogisticRegressionCV(penalty = 'l2', Cs = np.logspace(-2,2, 9), cv = 5, max_iter = 10000)
model_LogReg_Ridge.fit(X_train_scaled, y_train)
# Make predictions
y_pred_train = model_LogReg_Ridge.predict(X_train_scaled)
y_pred = model_LogReg_Ridge.predict(X_test_scaled)
y_pred_prob = model_LogReg_Ridge.predict_proba(X_test_scaled)
fig, ax = plt.subplots(1,2, figsize = (12,6), sharey=True)
ConfusionMatrixDisplay.from_predictions(y_train, y_pred_train,
# normalize = 'true',
display_labels = ['Benign', 'Malignant'],
cmap = 'GnBu',
ax = ax[0])
ConfusionMatrixDisplay.from_predictions(y_test, y_pred,
# normalize = 'true',
display_labels = ['Benign', 'Malignant'],
cmap = 'GnBu',
ax = ax[1])
plt.show()
(Figure: confusion matrices for the regularized model on the training set (left) and the test set (right).)
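LogisticRegressionCV picks the best C from the grid by cross-validation and stores it in the fitted model. A quick sketch to see which value won (it also appears in the attribute dump in the next cell):
# The cross-validated choice of C (a single value for a binary problem)
print(model_LogReg_Ridge.C_)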
2.5.2. Assessing the model#
What were our most significant features?
model_LogReg_Ridge.__dict__
{'Cs': array([1.00000000e-02, 3.16227766e-02, 1.00000000e-01, 3.16227766e-01,
1.00000000e+00, 3.16227766e+00, 1.00000000e+01, 3.16227766e+01,
1.00000000e+02]),
'fit_intercept': True,
'cv': 5,
'dual': False,
'penalty': 'l2',
'scoring': None,
'tol': 0.0001,
'max_iter': 10000,
'class_weight': None,
'n_jobs': None,
'verbose': 0,
'solver': 'lbfgs',
'refit': True,
'intercept_scaling': 1.0,
'multi_class': 'deprecated',
'random_state': None,
'l1_ratios': None,
'n_features_in_': 30,
'classes_': array([0, 1]),
'Cs_': array([1.00000000e-02, 3.16227766e-02, 1.00000000e-01, 3.16227766e-01,
1.00000000e+00, 3.16227766e+00, 1.00000000e+01, 3.16227766e+01,
1.00000000e+02]),
'n_iter_': array([[[10, 7, 8, 12, 12, 16, 21, 23, 29],
[ 9, 7, 8, 9, 10, 14, 14, 20, 17],
[10, 7, 8, 10, 11, 16, 22, 26, 26],
[10, 6, 7, 9, 11, 13, 21, 30, 26],
[ 9, 8, 8, 12, 11, 15, 20, 15, 11]]], dtype=int32),
'scores_': {np.int64(1): array([[0.93406593, 0.97802198, 0.98901099, 0.98901099, 0.97802198,
0.98901099, 0.98901099, 0.96703297, 0.93406593],
[0.94505495, 0.96703297, 0.96703297, 0.96703297, 0.96703297,
0.97802198, 0.96703297, 0.95604396, 0.95604396],
[0.97802198, 0.98901099, 0.98901099, 1. , 1. ,
1. , 1. , 1. , 1. ],
[0.93406593, 0.95604396, 0.96703297, 0.97802198, 0.97802198,
0.97802198, 0.97802198, 0.97802198, 0.97802198],
[0.93406593, 0.94505495, 0.94505495, 0.93406593, 0.94505495,
0.94505495, 0.94505495, 0.94505495, 0.94505495]])},
'coefs_paths_': {np.int64(1): array([[[ 0.18877931, 0.15354924, 0.18835075, ..., 0.15799421,
0.05308207, -0.62707392],
[ 0.26001545, 0.25317046, 0.25722958, ..., 0.26765068,
0.06680208, -0.62149993],
[ 0.32315174, 0.36240604, 0.31657734, ..., 0.45050076,
0.07887859, -0.58505505],
...,
[-0.19992366, -0.18313838, -0.24642842, ..., 2.79406576,
-0.05705654, 0.22982167],
[-0.72115368, -0.18133544, -0.8247711 , ..., 4.61597812,
-0.33770871, 0.73177235],
[-1.27061238, 0.50945972, -1.6081652 , ..., 7.68106288,
-1.3848245 , 0.88623181]],
[[ 0.19407935, 0.1610312 , 0.19282929, ..., 0.15803426,
0.06208549, -0.71466605],
[ 0.2782481 , 0.26927561, 0.27376116, ..., 0.27343663,
0.07898245, -0.74263137],
[ 0.37187738, 0.38911002, 0.36187733, ..., 0.45587393,
0.09207409, -0.76980897],
...,
[ 0.48720426, -0.0536111 , 0.31852832, ..., 2.67600778,
-0.53317192, -0.68615334],
[ 0.39188048, 0.03247218, 0.0542682 , ..., 4.11402491,
-1.36680146, -0.3901461 ],
[ 0.22874718, 0.44243786, -0.29121275, ..., 6.33994447,
-2.42184226, 0.22317786]],
[[ 0.18850543, 0.17367384, 0.1880179 , ..., 0.15486926,
0.07403958, -0.59769658],
[ 0.2711116 , 0.29088627, 0.26713907, ..., 0.25437626,
0.10366637, -0.59128212],
[ 0.35563314, 0.41306985, 0.34485634, ..., 0.41666843,
0.14372282, -0.58508927],
...,
[ 0.16359844, 0.09733587, -0.07076539, ..., 2.52569141,
-0.01463229, 0.23649025],
[-0.282397 , 0.29898835, -0.80273291, ..., 3.84916428,
-0.4422473 , 0.94181455],
[-1.02417092, 0.93962937, -2.05384819, ..., 5.80342855,
-1.18953451, 1.68586285]],
[[ 0.19997914, 0.15811623, 0.19773728, ..., 0.16137979,
0.06280437, -0.65213172],
[ 0.28459406, 0.26940186, 0.27864513, ..., 0.26870448,
0.08220969, -0.66336185],
[ 0.36637957, 0.4077656 , 0.35470213, ..., 0.42896016,
0.097046 , -0.66788239],
...,
[-0.24784738, 0.55084161, -0.40231166, ..., 1.80653513,
0.01311299, 0.18548086],
[-0.99713236, 0.62963699, -1.34442754, ..., 2.47198482,
-0.09830574, 0.79818549],
[-2.25795274, 1.15047984, -2.98278922, ..., 3.56491544,
-0.40545261, 1.60598903]],
[[ 0.19100602, 0.1763199 , 0.18967846, ..., 0.15504789,
0.05368948, -0.67868101],
[ 0.2711364 , 0.29533977, 0.26551771, ..., 0.25671223,
0.07234754, -0.68779421],
[ 0.35676133, 0.41859913, 0.34305118, ..., 0.42267667,
0.1034524 , -0.67676617],
...,
[ 0.42997228, -0.08864118, 0.41415401, ..., 2.60677422,
1.39493816, 0.55349446],
[ 0.48506192, -0.37839758, 0.51076076, ..., 3.80606774,
2.300287 , 1.22929661],
[ 0.7696429 , -0.52645044, 0.82005258, ..., 5.46332712,
3.50667034, 1.95913911]]], shape=(5, 9, 31))},
'C_': array([3.16227766]),
'l1_ratio_': array([None], dtype=object),
'coef_': array([[ 0.30024566, 0.17240886, 0.2191313 , 0.38105667, -0.02093164,
-1.20860703, 1.10774301, 1.89822535, -0.50315616, 0.27157447,
2.04744451, -0.37269222, 0.57003749, 1.4691531 , 0.52954499,
-0.68159526, -0.57761771, 0.52984054, -0.77511646, -0.90946073,
1.20043014, 1.96459618, 0.51709265, 1.21674533, 0.43159841,
-0.1202286 , 1.45955511, 0.91213833, 1.78008332, 0.12361707]]),
'intercept_': array([-0.1816065]),
'l1_ratios_': array([None], dtype=object)}
coefs = model_LogReg_Ridge.coef_[0]
features = X.columns
feature_weights_df = pd.DataFrame({'features':features, 'weights':coefs, 'abs weights': np.abs(coefs)})
feature_weights_df.sort_values(by = 'abs weights', ascending=False)
features | weights | abs weights | |
---|---|---|---|
10 | radius error | 2.047445 | 2.047445 |
21 | worst texture | 1.964596 | 1.964596 |
7 | mean concave points | 1.898225 | 1.898225 |
28 | worst symmetry | 1.780083 | 1.780083 |
13 | area error | 1.469153 | 1.469153 |
26 | worst concavity | 1.459555 | 1.459555 |
23 | worst area | 1.216745 | 1.216745 |
5 | mean compactness | -1.208607 | 1.208607 |
20 | worst radius | 1.200430 | 1.200430 |
6 | mean concavity | 1.107743 | 1.107743 |
27 | worst concave points | 0.912138 | 0.912138 |
19 | fractal dimension error | -0.909461 | 0.909461 |
18 | symmetry error | -0.775116 | 0.775116 |
15 | compactness error | -0.681595 | 0.681595 |
16 | concavity error | -0.577618 | 0.577618 |
12 | perimeter error | 0.570037 | 0.570037 |
17 | concave points error | 0.529841 | 0.529841 |
14 | smoothness error | 0.529545 | 0.529545 |
22 | worst perimeter | 0.517093 | 0.517093 |
8 | mean symmetry | -0.503156 | 0.503156 |
24 | worst smoothness | 0.431598 | 0.431598 |
3 | mean area | 0.381057 | 0.381057 |
11 | texture error | -0.372692 | 0.372692 |
0 | mean radius | 0.300246 | 0.300246 |
9 | mean fractal dimension | 0.271574 | 0.271574 |
2 | mean perimeter | 0.219131 | 0.219131 |
1 | mean texture | 0.172409 | 0.172409 |
29 | worst fractal dimension | 0.123617 | 0.123617 |
25 | worst compactness | -0.120229 | 0.120229 |
4 | mean smoothness | -0.020932 | 0.020932 |
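Because the features were standardized, each coefficient is the change in the log-odds of malignancy per one standard deviation increase in that feature. Exponentiating a coefficient turns it into an odds ratio, which some people find easier to read. A small sketch reusing feature_weights_df from above (the new 'odds ratio' column is mine):
# exp(coefficient) = multiplicative change in the odds of malignancy
# per one-standard-deviation increase in the feature
feature_weights_df['odds ratio'] = np.exp(feature_weights_df['weights'])
display(feature_weights_df.sort_values(by='abs weights', ascending=False).head())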
2.6. Another Example: Classifying Wines#
In this classification problem, we are given chemical properties of wine as well as a target value corresponding to the vineyard the wine came from. Can we identify which vineyard a wine came from by analyzing the chemical content of the wine itself?
There are three categories of wine in this classification (0, 1, 2). Below, I’ve set up the problem so that you can fit either a binary logistic regression or a multi-class one just by toggling the comments on a few lines of code in the next cell. Everything else can stay the same; the LogisticRegression model will handle both cases.
For multi-class classification, we can leave the target variable as it is in the original data set.
For binary classification, we will relabel the target so that:
1 = the wine is from vineyard 1
0 = the wine is from either vineyard 0 or 2 (not 1).
from sklearn.datasets import load_wine
wine_df, y = load_wine(return_X_y=True, as_frame=True)
# Multi-class (3) problem
wine_df['y'] = y
vineyard_labels = ['Vineyard 0', 'Vineyard 1', 'Vineyard 2']
# # Binary Classification problem
# vineyard_labels = ['Not Vineyard 1', 'Vineyard 1']
# wine_df['y'] = 1 * (y==1)
print(wine_df['y'].unique())
wine_df.sample(10)
[0 1 2]
alcohol | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280/od315_of_diluted_wines | proline | y | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
71 | 13.86 | 1.51 | 2.67 | 25.0 | 86.0 | 2.95 | 2.86 | 0.21 | 1.87 | 3.38 | 1.36 | 3.16 | 410.0 | 1 |
29 | 14.02 | 1.68 | 2.21 | 16.0 | 96.0 | 2.65 | 2.33 | 0.26 | 1.98 | 4.70 | 1.04 | 3.59 | 1035.0 | 0 |
91 | 12.00 | 1.51 | 2.42 | 22.0 | 86.0 | 1.45 | 1.25 | 0.50 | 1.63 | 3.60 | 1.05 | 2.65 | 450.0 | 1 |
79 | 12.70 | 3.87 | 2.40 | 23.0 | 101.0 | 2.83 | 2.55 | 0.43 | 1.95 | 2.57 | 1.19 | 3.13 | 463.0 | 1 |
66 | 13.11 | 1.01 | 1.70 | 15.0 | 78.0 | 2.98 | 3.18 | 0.26 | 2.28 | 5.30 | 1.12 | 3.18 | 502.0 | 1 |
15 | 13.63 | 1.81 | 2.70 | 17.2 | 112.0 | 2.85 | 2.91 | 0.30 | 1.46 | 7.30 | 1.28 | 2.88 | 1310.0 | 0 |
67 | 12.37 | 1.17 | 1.92 | 19.6 | 78.0 | 2.11 | 2.00 | 0.27 | 1.04 | 4.68 | 1.12 | 3.48 | 510.0 | 1 |
177 | 14.13 | 4.10 | 2.74 | 24.5 | 96.0 | 2.05 | 0.76 | 0.56 | 1.35 | 9.20 | 0.61 | 1.60 | 560.0 | 2 |
164 | 13.78 | 2.76 | 2.30 | 22.0 | 90.0 | 1.35 | 0.68 | 0.41 | 1.03 | 9.58 | 0.70 | 1.68 | 615.0 | 2 |
61 | 12.64 | 1.36 | 2.02 | 16.8 | 100.0 | 2.02 | 1.41 | 0.53 | 0.62 | 5.75 | 0.98 | 1.59 | 450.0 | 1 |
# Split the data into features and target
y = wine_df['y']
X = wine_df.drop(columns = ['y'])
X
alcohol | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280/od315_of_diluted_wines | proline | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 14.23 | 1.71 | 2.43 | 15.6 | 127.0 | 2.80 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065.0 |
1 | 13.20 | 1.78 | 2.14 | 11.2 | 100.0 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.40 | 1050.0 |
2 | 13.16 | 2.36 | 2.67 | 18.6 | 101.0 | 2.80 | 3.24 | 0.30 | 2.81 | 5.68 | 1.03 | 3.17 | 1185.0 |
3 | 14.37 | 1.95 | 2.50 | 16.8 | 113.0 | 3.85 | 3.49 | 0.24 | 2.18 | 7.80 | 0.86 | 3.45 | 1480.0 |
4 | 13.24 | 2.59 | 2.87 | 21.0 | 118.0 | 2.80 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
173 | 13.71 | 5.65 | 2.45 | 20.5 | 95.0 | 1.68 | 0.61 | 0.52 | 1.06 | 7.70 | 0.64 | 1.74 | 740.0 |
174 | 13.40 | 3.91 | 2.48 | 23.0 | 102.0 | 1.80 | 0.75 | 0.43 | 1.41 | 7.30 | 0.70 | 1.56 | 750.0 |
175 | 13.27 | 4.28 | 2.26 | 20.0 | 120.0 | 1.59 | 0.69 | 0.43 | 1.35 | 10.20 | 0.59 | 1.56 | 835.0 |
176 | 13.17 | 2.59 | 2.37 | 20.0 | 120.0 | 1.65 | 0.68 | 0.53 | 1.46 | 9.30 | 0.60 | 1.62 | 840.0 |
177 | 14.13 | 4.10 | 2.74 | 24.5 | 96.0 | 2.05 | 0.76 | 0.56 | 1.35 | 9.20 | 0.61 | 1.60 | 560.0 |
178 rows × 13 columns
# train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y)
# Scale the data
ss = StandardScaler()
X_train_scaled = ss.fit_transform(X_train)
X_test_scaled = ss.transform(X_test)
# I'm choosing to use logistic regression with Lasso (L1) penalty
# One hyper-parameter is solver, and not every solver works with every penalty
logreg_lasso = LogisticRegressionCV(penalty = 'l1', Cs = np.logspace(-3,3,13), solver = 'liblinear', cv = 5)
logreg_lasso.fit(X_train_scaled,y_train)
# Make predictions
y_pred = logreg_lasso.predict(X_test_scaled)
y_pred_train = logreg_lasso.predict(X_train_scaled)
y_proba = logreg_lasso.predict_proba(X_test_scaled)
fig, ax = plt.subplots(1,2, figsize = (12,6), sharey=True)
ConfusionMatrixDisplay.from_predictions(y_train, y_pred_train,
# normalize = 'true',
display_labels = vineyard_labels,
cmap = 'GnBu',
ax = ax[0])
ConfusionMatrixDisplay.from_predictions(y_test, y_pred,
# normalize = 'true',
display_labels = vineyard_labels,
cmap = 'GnBu',
ax = ax[1])
plt.show()
(Figure: confusion matrices for the wine classifier on the training set (left) and the test set (right).)
What were the most important features?
Get the feature names from the columns of X
Get the coefficients from the model and calculate their absolute values
Create a dataframe with columns: features, weights, abs weights
Sort by abs weight
# Follow the instructions above
features = X.columns
# When we perform multi-class classification, we get one set of coefficients per class:
# with solver='liblinear', the algorithm fits one binary (one-vs-rest) logistic regression per class.
# coef_[2] holds the coefficients of the 'Vineyard 2 vs. the rest' model.
coefs = logreg_lasso.coef_[2]
features_df = pd.DataFrame({'features': features, 'coef':coefs, 'abs coef':np.abs(coefs)})
features_df.sort_values(by = 'abs coef', ascending = False)
features | coef | abs coef | |
---|---|---|---|
6 | flavanoids | -4.485358 | 4.485358 |
9 | color_intensity | 2.195638 | 2.195638 |
2 | ash | 0.914783 | 0.914783 |
10 | hue | -0.672959 | 0.672959 |
7 | nonflavanoid_phenols | -0.571479 | 0.571479 |
1 | malic_acid | 0.455491 | 0.455491 |
11 | od280/od315_of_diluted_wines | -0.092544 | 0.092544 |
0 | alcohol | 0.000000 | 0.000000 |
3 | alcalinity_of_ash | 0.000000 | 0.000000 |
4 | magnesium | 0.000000 | 0.000000 |
5 | total_phenols | 0.000000 | 0.000000 |
8 | proanthocyanins | 0.000000 | 0.000000 |
12 | proline | 0.000000 | 0.000000 |
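Because the liblinear solver fits one one-vs-rest model per class, coef_ has one row per vineyard (shape (3, 13) here), and coef_[2] above is just the 'Vineyard 2 vs. the rest' row. A short sketch to look at all three sets of coefficients at once:
# One row of coefficients per class; rows correspond to vineyards 0, 1, 2
print(logreg_lasso.coef_.shape)
all_coefs_df = pd.DataFrame(logreg_lasso.coef_, columns=X.columns, index=vineyard_labels)
display(all_coefs_df)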
y_proba
array([[9.97044576e-01, 9.57737319e-09, 2.95541450e-03],
[9.17236576e-06, 1.94012081e-02, 9.80589620e-01],
[1.03906676e-06, 9.72579941e-07, 9.99997988e-01],
[2.32391569e-05, 5.18195562e-01, 4.81781199e-01],
[9.99221796e-01, 1.38898356e-04, 6.39305349e-04],
[3.40815629e-04, 1.98792389e-07, 9.99658986e-01],
[1.68325005e-03, 2.04776169e-07, 9.98316545e-01],
[2.07327421e-04, 9.98141173e-01, 1.65149942e-03],
[2.08529639e-05, 2.47837114e-01, 7.52142033e-01],
[4.57257900e-07, 9.99851664e-01, 1.47879152e-04],
[3.64571984e-05, 9.98450820e-01, 1.51272242e-03],
[3.04025313e-03, 9.99708362e-10, 9.96959746e-01],
[9.99790673e-01, 1.27974707e-07, 2.09199353e-04],
[6.61183160e-03, 9.93088019e-01, 3.00149022e-04],
[3.49261222e-05, 9.62107699e-01, 3.78573745e-02],
[9.99547937e-01, 8.68880032e-07, 4.51194067e-04],
[3.07358237e-03, 9.96114025e-01, 8.12392202e-04],
[9.93489975e-01, 2.99365127e-03, 3.51637383e-03],
[9.97889001e-01, 3.66601824e-06, 2.10733333e-03],
[9.99694718e-01, 2.25197027e-05, 2.82762160e-04],
[9.78333644e-01, 2.14018075e-02, 2.64548884e-04],
[3.34345861e-03, 9.95579000e-01, 1.07754089e-03],
[9.14843055e-04, 2.59025676e-06, 9.99082567e-01],
[5.41164005e-04, 1.20024064e-06, 9.99457636e-01],
[6.71557979e-06, 8.43556428e-06, 9.99984849e-01],
[1.75840533e-04, 1.45820063e-05, 9.99809577e-01],
[8.65640512e-05, 1.62532752e-05, 9.99897183e-01],
[1.26218060e-06, 9.80984845e-01, 1.90138929e-02],
[9.99603087e-01, 3.88864641e-04, 8.04836320e-06],
[2.59411447e-05, 9.98827795e-01, 1.14626346e-03],
[6.72821999e-03, 9.81212736e-01, 1.20590441e-02],
[9.09297611e-06, 9.96234884e-01, 3.75602308e-03],
[1.71460590e-03, 9.98281964e-01, 3.43007892e-06],
[1.43038698e-05, 8.24435803e-01, 1.75549893e-01],
[1.08931470e-04, 1.73767473e-06, 9.99889331e-01],
[1.85295801e-04, 9.96680874e-01, 3.13383062e-03],
[2.65284871e-05, 2.89119029e-03, 9.97082281e-01],
[2.30393770e-07, 9.97869164e-01, 2.13060535e-03],
[4.49251602e-05, 9.93546278e-01, 6.40879650e-03],
[5.88574589e-01, 4.11353823e-01, 7.15879365e-05],
[2.03122770e-04, 4.65954809e-08, 9.99796831e-01],
[1.26129380e-07, 6.62007699e-04, 9.99337866e-01],
[2.34225821e-02, 3.11762572e-04, 9.76265655e-01],
[7.95939834e-01, 2.03950641e-01, 1.09525105e-04],
[9.60134745e-02, 9.03600916e-01, 3.85609471e-04]])
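Each row of y_proba holds the predicted probabilities for vineyards 0, 1, and 2 and sums to 1; in the multi-class case, predict returns the class with the highest predicted probability. A final sketch to confirm (both lines should print True):
# Rows of predict_proba sum to 1
print(np.allclose(y_proba.sum(axis=1), 1.0))
# Taking the most probable class reproduces predict
print(np.array_equal(logreg_lasso.classes_[np.argmax(y_proba, axis=1)], y_pred))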