2. Neural Networks with Keras#

First things first, I’m a realist…

I added instructions to install TensorFlow and Keras (they don't come pre-packaged with Anaconda). I'm assuming some of you have yet to do that, and these packages take time to install.

Try to run the code block below. If it raises an error that TensorFlow or Keras can't be found, you'll need to install those packages: uncomment the first line in the code block below and run it again.

While that’s happening, we’ll go over some background and vocabulary relating to neural networks.

#!conda install tensorflow keras

from keras.datasets import fashion_mnist
# Load the Fashion-MNIST dataset
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

2.1. What are Neural Networks?#

Neural networks are machine learning models (loosely) inspired by the structure of the human brain. They can be used for both regression and classification and are widely applied to complex tasks such as image and speech processing, language translation, time-series modeling and forecasting, and anomaly detection. There are numerous variants of neural networks. Some notable examples:

  • Multi-layer Perceptron (MLP) - the vanilla neural network we’ll use today

  • Convolutional Neural Network (CNN) - largely used in image and video processing, but can also be applied to time-series and forecasting problems

  • Recurrent Neural Network (RNN) - used for time-series modeling and forecasting

  • Long Short-Term Memory Neural Network (LSTM) - an upgrade of the RNN. This was the cutting edge of natural language processing before transformers (as in Generative Pre-trained Transformer) ushered in the age of LLMs.

  • Generative Adversarial Networks (GANs) - used for generating novel data (i.e., a genAI model) and for fraud detection. GANs comprise two parallel neural networks: a generator (which creates fake instances of data) and a discriminator (which is trained to distinguish real data from fake).

2.2. Why Neural Networks over other models? The PROs.#

  • Ability to model complex nonlinear relationships: Neural networks can automatically learn and represent intricate, nonlinear patterns between inputs and outputs, which many traditional models (like linear regression) cannot do without extensive manual feature engineering.

  • Handling of high-dimensional and unstructured data: NNs excel at processing large-scale, high-dimensional data (e.g. images, audio, and text) where other models often struggle.

  • Feature extraction and automatic feature engineering: NNs ‘discover’ and construct relevant features from raw data, reducing the need for manual intervention and domain expertise in feature selection.

  • Adaptability in deployment: NNs adapt to new data and improve performance over time, making them suitable for dynamic, real-world applications.

2.3. Why not Neural Networks? The CONs.#

  • Require large amounts of data: NNs have many, many parameters, so they typically need thousands to millions of labeled examples to perform well. For smaller data sets, other ML models are more suitable.

  • Lack interpretability (“black box”): For most NNs (especially deep NNs), we cannot glean meaning from the parameters.

  • High computational cost: Training neural networks, especially deep NNs, demands significant computational resources (powerful GPUs/TPUs) and can take much longer than training traditional models. Many such models are trained on remote cloud computing servers (pay per compute).

  • Risk of overfitting: Neural networks, with their large number of parameters, are prone to overfitting if not properly regularized, especially when trained on small or noisy datasets.

  • Complexity in development and tuning: Designing, training, and tuning neural networks (e.g., choosing architecture, hyperparameters) is often more complex and time-consuming than working with traditional models, which generally have fewer parameters and simpler structures.

2.4. Neural Network Anatomy and The Multi-Layer Perceptron#

The term perceptron has two common usages: a single artificial neuron or several artificial neurons arranged in a single layer. Either way, a perceptron is a sort of building block for more complex neural networks.

First, let’s consider the single ‘neuron’ interpretation.

perceptron, by Adam Weaver

In the diagram above, each input feature is assigned a weight (the weights and bias are the parameters), and the weighted sum of the features plus a bias term is fed through some non-linear activation function. The diagram above can be represented by the following equation:

\[ \hat{y} = f(b + w_0 \cdot x_0 + w_1 \cdot x_1+ w_2 \cdot x_2 + w_3 \cdot x_3) \]
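To make this concrete, here's that forward pass as a minimal NumPy sketch; the feature values, weights, bias, and sigmoid activation are all made up for illustration:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0, 0.7])   # input features x_0 ... x_3
w = np.array([0.1, 0.4, -0.2, 0.3])   # one weight per feature
b = 0.05                              # bias term

y_hat = sigmoid(b + np.dot(w, x))     # weighted sum plus bias, through the activation
print(y_hat)                          # a single scalar output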

2.4.1. Activation Functions#

Every perceptron that comprises a neural network has some activation function. Activation functions are non-linear, and without them a neural network (regardless of size) could be reduced to a single linear neuron. Activation functions also mean that, for any given region of the feature space, only some neurons participate while others stay dormant. So the feature space is parsed by different subsets of neurons.

Some example activation functions (sketched in code after this list) are:

  • Heaviside function - a step function

  • Sigmoid function - the logistic function from logistic regression

  • Rectified Linear Unit (ReLU) - linear for positive values and zero otherwise. This is the most common activation function.

  • Tanh - hyperbolic tangent function, similar to the sigmoid but ranging from -1 to 1.
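As promised, here is a quick NumPy sketch of these four activations (the grid of z values is arbitrary; np.heaviside's second argument sets the value at exactly zero):

import numpy as np

z = np.linspace(-5, 5, 101)

step = np.heaviside(z, 1.0)       # Heaviside: 0 for z < 0, 1 for z >= 0
sigmoid = 1 / (1 + np.exp(-z))    # squashes to (0, 1)
relu = np.maximum(0, z)           # linear for z > 0, zero otherwise
tanh = np.tanh(z)                 # squashes to (-1, 1)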

2.4.2. Multi-Layer Perceptron (the vanilla NN)#

MLP, Kishgore NG

An MLP comprises layers of perceptrons, and each layer may contain numerous neurons. In the diagram above, every neuron in one layer projects onto every neuron in the subsequent layer; layers connected this way are called dense layers.

Generally, in densely connected NNs, every perceptron in a given layer has the same form (the same number of parameters and the same activation function).
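In code, a whole dense layer is just the single-neuron equation vectorized: stack each neuron's weights as a row of a matrix W, and the layer becomes one matrix-vector product. A sketch with made-up sizes (4 inputs, 3 neurons, ReLU activation):

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)         # 4 input features
W = rng.standard_normal((3, 4))    # 3 neurons, each with 4 weights
b = rng.standard_normal(3)         # one bias per neuron

h = np.maximum(0, W @ x + b)       # all 3 neurons computed at once (ReLU)
print(h.shape)                     # (3,) -- one output per neuron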

Glossary:

  • Input layer - This layer accepts the features

  • Hidden layer - layers of perceptrons between input and output. Deep Neural Networks refer to NNs with many hidden layers. In the past, ‘many’ meant more than 3, but today, we have NNs with hundreds or thousands of layers. Deep is subjective and changes as technology improves.

  • Output layer - the layer where predictions are made

  • For regression, the output layer will have a single neuron for each predicted value (often one)

  • For binary classification, the output layer may have one or two neurons.

  • For multi-class classification, the output layer will have as many neurons as there are classes. (All three cases are sketched in code after this list.)
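Here's what those three output-layer choices might look like in Keras; the input size, hidden-layer width, and class count are placeholders, not recommendations:

from keras.models import Sequential
from keras.layers import Input, Dense

# Regression: a single linear output neuron
regressor = Sequential([Input(shape=(8,)),
                        Dense(16, activation='relu'),
                        Dense(1)])

# Binary classification: one sigmoid neuron giving P(positive class)
binary_clf = Sequential([Input(shape=(8,)),
                         Dense(16, activation='relu'),
                         Dense(1, activation='sigmoid')])

# Multi-class classification: one softmax neuron per class (here, 10)
multi_clf = Sequential([Input(shape=(8,)),
                        Dense(16, activation='relu'),
                        Dense(10, activation='softmax')])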

2.4.3. Architecture and Hyper-parameters#

When we first create the MLP, we have to decide on the architecture (how many layers, how many neurons per layer, activation functions, etc) and the hyper-parameters (regularization, batch size, epochs).

We haven’t had to worry about training time and computational demands with the models we’ve used thus far, but the complexity of NNs makes these issues non-trivial.

In fitting, we can adjust three hyper-parameters that govern learning: learning rate, batch size, and number of epochs.

  • Learning Rate - determines how much the parameters are adjusted at each update

  • Batch - a subset of the training data. The training data set is partitioned into batches and the parameters are updated after each batch is processed.

  • Epoch - one full pass through the entire training data. The training algorithm iterates through the training data set numerous times; each pass is an epoch.

Batch Size   Training Speed   Memory Usage   Generalization
Large        Faster           Higher         Risk of Overfitting
Small        Slower           Lower          Regularized

Rule of thumb: If we increase batch size, we should also increase learning rate.
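In Keras, the learning rate is a property of the optimizer, while batch size and epochs are arguments to fit. A sketch (the values are chosen only for illustration, and model is assumed to be an already-built Keras model):

from keras.optimizers import Adam

model.compile(optimizer=Adam(learning_rate=1e-3),  # learning rate: step size per update
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=64,   # parameters update after every 64 samples
          epochs=5)        # five full passes through the training data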

import numpy as np

import keras
from keras.models import Sequential
from keras.layers import Dense, Flatten

from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay, classification_report, accuracy_score

import matplotlib.pyplot as plt

2.5. Example: Fashion-MNIST, classifying articles of clothing#

The Fashion-MNIST dataset comprises 70,000 (60,000 training and 10,000 test) 28x28-pixel, gray-scale images of clothing items, each from one of the following categories.

  • 0 T-shirt/top

  • 1 Trouser

  • 2 Pullover

  • 3 Dress

  • 4 Coat

  • 5 Sandal

  • 6 Shirt

  • 7 Sneaker

  • 8 Bag

  • 9 Ankle boot

x_train.shape
(60000, 28, 28)
num_samples = 25
fig, ax = plt.subplots(5, 5, figsize = (10, 10), sharex = True, sharey = True)

# Plot the first 25 training images in a 5x5 grid
for k in range(num_samples):
    i, j = divmod(k, 5)
    ax[i, j].imshow(x_train[k, :, :] / 255, cmap = 'gray')
plt.show()
[Figure: a 5x5 grid of sample Fashion-MNIST training images]
# Normalize the data, pixel data is 0-255 (8-bit) but we want 0-1
x_train = x_train / 255.0
x_test = x_test / 255.0

# Create a vanilla neural network
model = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(x_train, y_train, epochs=10, batch_size=32, validation_split=0.2)
Epoch 1/10
/Users/eatai/.pyenv/versions/3.13.1/envs/datascience/lib/python3.13/site-packages/keras/src/layers/reshaping/flatten.py:37: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.
  super().__init__(**kwargs)
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 692us/step - accuracy: 0.8174 - loss: 0.5204 - val_accuracy: 0.8486 - val_loss: 0.4296
Epoch 2/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 604us/step - accuracy: 0.8621 - loss: 0.3878 - val_accuracy: 0.8608 - val_loss: 0.3761
Epoch 3/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 610us/step - accuracy: 0.8724 - loss: 0.3469 - val_accuracy: 0.8721 - val_loss: 0.3569
Epoch 4/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 599us/step - accuracy: 0.8822 - loss: 0.3203 - val_accuracy: 0.8750 - val_loss: 0.3516
Epoch 5/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 599us/step - accuracy: 0.8880 - loss: 0.3034 - val_accuracy: 0.8804 - val_loss: 0.3408
Epoch 6/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 614us/step - accuracy: 0.8945 - loss: 0.2883 - val_accuracy: 0.8815 - val_loss: 0.3330
Epoch 7/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 616us/step - accuracy: 0.8989 - loss: 0.2716 - val_accuracy: 0.8847 - val_loss: 0.3234
Epoch 8/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 608us/step - accuracy: 0.9024 - loss: 0.2623 - val_accuracy: 0.8842 - val_loss: 0.3244
Epoch 9/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 584us/step - accuracy: 0.9063 - loss: 0.2507 - val_accuracy: 0.8904 - val_loss: 0.3164
Epoch 10/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 592us/step - accuracy: 0.9098 - loss: 0.2410 - val_accuracy: 0.8884 - val_loss: 0.3171
<keras.src.callbacks.history.History at 0x120ecbe00>
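That trailing History object is fit's return value; its history attribute is a dict of per-epoch metrics. Had we captured it, we could plot learning curves like this (a sketch; note that calling fit again on the same model continues training where it left off):

history = model.fit(x_train, y_train, epochs=10, batch_size=32, validation_split=0.2)

plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()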
model.layers
[<Flatten name=flatten, built=True>,
 <Dense name=dense, built=True>,
 <Dense name=dense_1, built=True>]
model.layers[1].get_weights()
[array([[-0.0713469 ,  0.03250205,  0.06500132, ...,  0.27670878,
         -0.18487853, -0.08517335],
        [-0.05822118, -0.09768099,  0.07717109, ...,  0.41678566,
         -0.01116703, -0.11538623],
        [-0.05299133,  0.10303343, -0.09143736, ...,  0.17697497,
         -0.11204378, -0.11445503],
        ...,
        [-0.18283023, -0.05082365, -0.15573852, ...,  0.4768174 ,
         -0.13417463, -0.1478706 ],
        [-0.23174927, -0.29934692,  0.14626086, ...,  0.5767144 ,
         -0.3227887 , -0.11754484],
        [-0.22875336, -0.20766266, -0.07919479, ...,  0.42055562,
         -0.42195907, -0.07124062]], shape=(784, 128), dtype=float32),
 array([ 0.33520427,  0.41476032,  0.42854255,  0.26456   ,  0.28423107,
         0.4619906 , -0.01085375, -0.01410234,  0.46816376,  0.04695842,
         0.16199604,  0.1178596 , -0.15440884,  0.26269406, -0.02273692,
         0.39012593,  0.11480645, -0.16267306,  0.01186102, -0.20373592,
         0.77516353,  0.28331184,  0.04801618, -0.01358667,  0.27781582,
        -0.30025145, -0.01313267, -0.02819712,  0.05266671, -0.01129104,
         0.3718213 , -0.06276885,  0.07143743, -0.42740637,  0.27966768,
         0.5643312 ,  0.13939717, -0.02660735,  0.25482723,  0.15903096,
         0.02761725, -0.11521526, -0.018591  , -0.08635442,  0.1875941 ,
        -0.24020974,  0.33598614,  0.14443065,  0.16586447,  0.31778356,
        -0.01498642,  0.92979336, -0.01211132, -0.10277095, -0.1661265 ,
         0.00524938,  0.10501956, -0.11736097, -0.05673584,  0.83130604,
        -0.16295753,  0.4000649 , -0.28859657,  0.11708748,  0.2133803 ,
         0.5311273 , -0.07117038,  0.36213043,  0.09919515,  0.04499178,
        -0.06633466, -0.00488194, -0.06314942,  0.12584068,  0.316148  ,
         0.4582312 ,  0.26652548,  0.32099682,  0.27829206,  0.35091612,
         0.34667146,  0.51486707,  0.4186107 ,  0.12076598,  0.28654033,
         0.383458  ,  0.8813752 , -0.00307724,  0.44035587,  0.24539825,
         0.08246749,  0.34919336, -0.18238382, -0.0148306 , -0.42604852,
        -0.17117037,  0.29033217, -0.26033685,  0.19798407,  0.08194257,
        -0.01106311,  0.2647324 , -0.21065266,  0.45927092, -0.06942318,
        -0.23241717, -0.01085974,  0.51232344,  0.529505  ,  0.3539233 ,
        -0.01077797, -0.13007548, -0.01885176, -0.10258822,  0.37626743,
         0.24545035,  0.23259574, -0.03035248,  0.39216578,  0.68170875,
        -0.01603815,  0.38373974,  0.37374336,  0.31222183, -0.39119828,
         0.3422284 ,  0.01534443, -0.12235121], dtype=float32)]
# Unpack the hidden layer's parameters: a weight matrix and a bias vector
weights, biases = model.layers[1].get_weights()

print(weights.shape)
print(biases.shape)
(784, 128)
(128,)
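Sanity check: each of the 128 hidden neurons has 784 weights (one per input pixel) plus one bias, so this layer has 784 x 128 + 128 = 100,480 parameters. Keras's model.summary() reports the same counts layer by layer:

print(weights.size + biases.size)   # 784*128 + 128 = 100480
model.summary()                     # layer-by-layer parameter counts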
fig, ax = plt.subplots(16, 8, figsize = (10, 20), sharex = True, sharey = True)

# Each hidden neuron has 784 weights (one per pixel), so each can be viewed as a 28x28 image
for k, weight in enumerate(weights.transpose()):
    i, j = divmod(k, 8)
    ax[i, j].imshow(weight.reshape(28, 28), cmap = 'gray')
[Figure: a 16x8 grid visualizing each hidden neuron's weights as a 28x28 image]
# Evaluate the model on the test set
y_pred = np.argmax(model.predict(x_test), axis=1)
y_train_pred = np.argmax(model.predict(x_train), axis=1)
313/313 ━━━━━━━━━━━━━━━━━━━━ 0s 286us/step
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 0s 212us/step
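Keras can also score the model directly: model.evaluate returns the loss and any metrics specified at compile time (here, accuracy):

test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print(f'Test accuracy: {test_acc:.3f}')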
labels = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Create a confusion matrix
cm = confusion_matrix(y_test, y_pred)
ConfusionMatrixDisplay(cm, display_labels = labels).plot()

plt.xticks(rotation = 60)
plt.show()
[Figure: confusion matrix of test-set predictions]
print('TRAINING REPORT:')
print(classification_report(y_train, y_train_pred))

print('TESTING REPORT:')
print(classification_report(y_test, y_pred))
TRAINING REPORT:
              precision    recall  f1-score   support

           0       0.81      0.93      0.87      6000
           1       0.99      0.99      0.99      6000
           2       0.85      0.84      0.85      6000
           3       0.92      0.93      0.92      6000
           4       0.84      0.86      0.85      6000
           5       1.00      0.96      0.98      6000
           6       0.82      0.69      0.75      6000
           7       0.96      0.97      0.97      6000
           8       0.98      0.99      0.99      6000
           9       0.96      0.98      0.97      6000

    accuracy                           0.91     60000
   macro avg       0.91      0.91      0.91     60000
weighted avg       0.91      0.91      0.91     60000

TESTING REPORT:
              precision    recall  f1-score   support

           0       0.78      0.89      0.83      1000
           1       0.99      0.97      0.98      1000
           2       0.79      0.78      0.79      1000
           3       0.88      0.89      0.89      1000
           4       0.80      0.81      0.80      1000
           5       0.99      0.93      0.96      1000
           6       0.73      0.63      0.68      1000
           7       0.93      0.96      0.95      1000
           8       0.97      0.97      0.97      1000
           9       0.94      0.96      0.95      1000

    accuracy                           0.88     10000
   macro avg       0.88      0.88      0.88     10000
weighted avg       0.88      0.88      0.88     10000