2. Neural Networks with Keras#
First things first, I’m a realist…
I added instructions to install TensorFlow and Keras (they don't come pre-packaged with Anaconda). I'm assuming some of you have yet to do that, and these packages take time to install.
Try to run the code block below. If it gives an error that TensorFlow or Keras isn't found, you'll need to install those packages: uncomment the first line in the code block below and run it again.
While that’s happening, we’ll go over some background and vocabulary relating to neural networks.
#!conda install tensorflow keras
from keras.datasets import fashion_mnist
# Load the Fashion-MNIST dataset
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
2.1. What are Neural Networks?#
Neural networks are machine learning models (loosely) inspired by the structure of the human brain. Neural networks can be implemented for regression and classification tasks and are widely used in complex tasks such as image and speech processing, language translation, time-series modeling and forecasting, and anomaly detection. There are numerous variants of neural networks. Some notable examples:
Multi-layer Perceptron (MLP) - the vanilla neural network we’ll use today
Convolutional Neural Network (CNN) - largely used in image and video processing, but can also be applied to time-series and forecasting problems
Recurrent Neural Network (RNN) - used for time-series modeling and forecasting
Long Short-Term Memory Neural Network (LSTM) - an upgrade of the RNN. This was the cutting edge of natural language processing before transformers (as in Generative Pre-trained Transformer) ushered in the age of LLMs.
Generative Adversarial Networks (GANs) - used for generating novel data (i.e. a genAI model) and for fraud detection. GANs comprise two parallel neural networks: a generator (creating fake instances of data) and a discriminator (trained to distinguish real data from fake).
2.2. Why Neural Networks over other models? The PROs.#
Ability to model complex nonlinear relationships: Neural networks can automatically learn and represent intricate, nonlinear patterns between inputs and outputs, which many traditional models (like linear regression) cannot do without extensive manual feature engineering.
Handling of high-dimensional and unstructured data: NNs excel at processing large-scale, high-dimensional data (e.g. images, audio, and text) where other models often struggle.
Feature extraction and automatic feature engineering: NNs ‘discover’ and construct relevant features from raw data, reducing the need for manual intervention and domain expertise in feature selection.
Adaptability in deployment: NNs adapt to new data and improve performance over time, making them suitable for dynamic, real-world applications.
2.3. Why not Neural Networks? The CONs.#
Require large amounts of data: NNs have many, many parameters, so they typically need thousands to millions of labeled examples to perform well. For smaller datasets, other ML models are more suitable.
Lack interpretability (“black box”): For most NNs (especially deep NNs), we cannot glean meaning from the parameters.
High computational cost: Training neural networks, especially deep NNs, demands significant computational resources (powerful GPUs/TPUs) and can take much longer than training traditional models. Many such models are trained on remote cloud computing servers (pay per compute).
Risk of overfitting: Neural networks, with their large number of parameters, are prone to overfitting if not properly regularized, especially when trained on small or noisy datasets.
Complexity in development and tuning: Designing, training, and tuning neural networks (e.g., choosing architecture, hyperparameters) is often more complex and time-consuming than working with traditional models, which generally have fewer parameters and simpler structures.
2.4. Neural Network Anatomy and The Multi-Layer Perceptron#
The term perceptron has two common usages: a single artificial neuron or several artificial neurons arranged in a single layer. Either way, a perceptron is a sort of building block for more complex neural networks.
First, let’s consider the single ‘neuron’ interpretation.

In the diagram above, each input feature is assigned a weight (a parameter), and the weighted sum of the features plus a bias term is fed through some non-linear activation function. The diagram above can be represented by the following equation:

$$\hat{y} = \sigma\left(\sum_{i=1}^{n} w_i x_i + b\right)$$

where the $x_i$ are the input features, the $w_i$ are the weights, $b$ is the bias, and $\sigma$ is the activation function.
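To make this concrete, here is a minimal numpy sketch of a single perceptron (assuming a sigmoid activation; the input and weight values are made up for illustration):

```python
import numpy as np

def perceptron(x, w, b):
    """Weighted sum of inputs plus bias, passed through a sigmoid activation."""
    z = np.dot(w, x) + b
    return 1 / (1 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # input features (made-up)
w = np.array([0.8, 0.1, -0.4])   # one weight per feature (made-up)
b = 0.2                          # bias term
print(perceptron(x, w, b))
```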
2.4.1. Activation Functions#
Every perceptron in a neural network has some activation function. Activation functions are non-linear, and without them a neural network (regardless of size) could be reduced to a single linear model. Activation functions also mean that, for any given region of the feature space, only some neurons participate while others stay dormant; the feature space is thus parsed by different subsets of neurons.
Some example activation functions are:
Heaviside function - step function
Sigmoid function - the logistic function, familiar from logistic regression
Rectified Linear Unit (ReLU) - linear for positive values and zero otherwise. This is the most common activation function.
Tanh - hyperbolic tangent function, similar to sigmoid but ranges -1 to 1.
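As a quick illustration, here are those four activations as numpy one-liners (a sketch for intuition; Keras ships its own implementations):

```python
import numpy as np

def heaviside(z):
    return (z >= 0).astype(float)   # step: 0 below threshold, 1 at or above

def sigmoid(z):
    return 1 / (1 + np.exp(-z))     # squashes to (0, 1)

def relu(z):
    return np.maximum(0, z)         # linear for z > 0, zero otherwise

def tanh(z):
    return np.tanh(z)               # squashes to (-1, 1)

z = np.linspace(-3, 3, 7)
for f in (heaviside, sigmoid, relu, tanh):
    print(f.__name__, f(z).round(2))
```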
2.4.2. Multi-Layer Perceptron (the vanilla NN)#

An MLP comprises layers of perceptrons, and each layer may itself contain numerous perceptrons. In this diagram, every neuron in one layer projects onto every neuron in the subsequent layer; layers connected this way are called dense layers.
Generally, in densely connected NNs, each perceptron in a layer is the same (same number of parameters and same activation function).
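Because every neuron in a dense layer connects to every output of the previous layer, parameter counts grow quickly with layer width. A quick counting helper (a hypothetical function, just for illustration):

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per neuron."""
    return n_in * n_out + n_out

# e.g. a dense layer mapping 784 inputs to 128 neurons:
print(dense_params(784, 128))  # 100480
```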
Glossary:
Input layer - this layer accepts the input features
Hidden layer - layers of perceptrons between the input and output layers. Deep neural networks are NNs with many hidden layers. In the past, ‘many’ meant more than 3, but today we have NNs with hundreds or thousands of layers; ‘deep’ is subjective and changes as technology improves.
Output layer - the layer where predictions are made
For regression, the output layer has one neuron per predicted value (often just one)
For binary classification, the output layer may have one or two neurons.
For multi-class classification, the output layer has as many neurons as there are classes. (See the Keras sketch below.)
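In Keras terms, those output layers might look like the following sketch (layer sizes are placeholders):

```python
from keras.layers import Dense

# Regression: one linear neuron per predicted value
out_regression = Dense(1, activation='linear')

# Binary classification: a single sigmoid neuron giving P(positive class)
out_binary = Dense(1, activation='sigmoid')

# Multi-class classification: one neuron per class, softmax for probabilities
out_multiclass = Dense(10, activation='softmax')
```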
2.4.3. Architecture and Hyper-parameters#
When we first create the MLP, we have to decide on the architecture (how many layers, how many neurons per layer, activation functions, etc) and the hyper-parameters (regularization, batch size, epochs).
We haven’t had to worry about training time and computational demands with the models we’ve used thus far, but NNs’ complexity makes these issues non-trivial.
In fitting, we can adjust three hyper-parameters that govern learning: learning rate, batch size, and number of epochs.
Learning Rate - determines how much the parameters are adjusted at each update
Batch - a subset of the training data. The training data set is partitioned into batches and the parameters are updated after each batch is processed.
Epoch - one pass through the entire training data set. The training algorithm iterates through the training data numerous times; each pass is an epoch.
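For example, with a batch size of 32 and 48,000 training samples (60,000 images minus the 20% we'll hold out for validation below), each epoch comprises 48,000 / 32 = 1,500 parameter updates; you'll see exactly 1500 steps per epoch in the training log below.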
| Batch Size | Training Speed | Memory Usage | Generalization |
|---|---|---|---|
| Large | Faster | Higher | Risk of overfitting |
| Small | Slower | Lower | Regularized |
Rule of thumb: If we increase batch size, we should also increase learning rate.
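In Keras, these knobs show up in two places: the learning rate is set on the optimizer, while batch size and epochs are arguments to `fit()`. A minimal sketch (the toy model and random data here are placeholders, just to show where each argument goes; we use the defaults for our real model below):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Input, Dense
from keras.optimizers import Adam

# A throwaway model and random data, just to show where each knob goes
toy = Sequential([Input(shape=(4,)), Dense(8, activation='relu'), Dense(1)])
X, y = np.random.rand(100, 4), np.random.rand(100)

# Learning rate lives on the optimizer...
toy.compile(optimizer=Adam(learning_rate=1e-3), loss='mse')

# ...while batch size and number of epochs are arguments to fit()
toy.fit(X, y, batch_size=32, epochs=5, verbose=0)
```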
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense, Flatten
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay, classification_report, accuracy_score
import matplotlib.pyplot as plt
2.5. Example: Fashion-MNIST, classifying articles of clothing#
The Fashion-MNIST dataset comprises 60,000 training and 10,000 test 28x28-pixel, gray-scale images of clothing items, each from one of the following categories.
0 T-shirt/top
1 Trouser
2 Pullover
3 Dress
4 Coat
5 Sandal
6 Shirt
7 Sneaker
8 Bag
9 Ankle boot
x_train.shape
(60000, 28, 28)
num_samples = 25
fig, ax = plt.subplots(5, 5, figsize = (10, 10), sharex = True, sharey = True)
for k in range(num_samples):
    i, j = k // 5, k % 5
    ax[i, j].imshow(x_train[k, :, :] / 255, cmap = 'gray')
plt.show()
# Normalize the data, pixel data is 0-255 (8-bit) but we want 0-1
x_train = x_train / 255.0
x_test = x_test / 255.0
# Create a vanilla neural network
model = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])
# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# Train the model
model.fit(x_train, y_train, epochs=10, batch_size=32, validation_split=0.2)
Epoch 1/10
/Users/eatai/.pyenv/versions/3.13.1/envs/datascience/lib/python3.13/site-packages/keras/src/layers/reshaping/flatten.py:37: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.
super().__init__(**kwargs)
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 692us/step - accuracy: 0.8174 - loss: 0.5204 - val_accuracy: 0.8486 - val_loss: 0.4296
Epoch 2/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 604us/step - accuracy: 0.8621 - loss: 0.3878 - val_accuracy: 0.8608 - val_loss: 0.3761
Epoch 3/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 610us/step - accuracy: 0.8724 - loss: 0.3469 - val_accuracy: 0.8721 - val_loss: 0.3569
Epoch 4/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 599us/step - accuracy: 0.8822 - loss: 0.3203 - val_accuracy: 0.8750 - val_loss: 0.3516
Epoch 5/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 599us/step - accuracy: 0.8880 - loss: 0.3034 - val_accuracy: 0.8804 - val_loss: 0.3408
Epoch 6/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 614us/step - accuracy: 0.8945 - loss: 0.2883 - val_accuracy: 0.8815 - val_loss: 0.3330
Epoch 7/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 616us/step - accuracy: 0.8989 - loss: 0.2716 - val_accuracy: 0.8847 - val_loss: 0.3234
Epoch 8/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 608us/step - accuracy: 0.9024 - loss: 0.2623 - val_accuracy: 0.8842 - val_loss: 0.3244
Epoch 9/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 584us/step - accuracy: 0.9063 - loss: 0.2507 - val_accuracy: 0.8904 - val_loss: 0.3164
Epoch 10/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 592us/step - accuracy: 0.9098 - loss: 0.2410 - val_accuracy: 0.8884 - val_loss: 0.3171
<keras.src.callbacks.history.History at 0x120ecbe00>
model.layers
[<Flatten name=flatten, built=True>,
<Dense name=dense, built=True>,
<Dense name=dense_1, built=True>]
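We can also sanity-check the architecture with `model.summary()`, which tabulates each layer's output shape and parameter count. The counts work out by hand: the first Dense layer has 784 × 128 weights + 128 biases = 100,480 parameters, and the output layer has 128 × 10 + 10 = 1,290.

```python
# Print a table of layers, output shapes, and parameter counts
model.summary()
```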
model.layers[1].get_weights()
[array([[-0.0713469 , 0.03250205, 0.06500132, ..., 0.27670878,
-0.18487853, -0.08517335],
[-0.05822118, -0.09768099, 0.07717109, ..., 0.41678566,
-0.01116703, -0.11538623],
[-0.05299133, 0.10303343, -0.09143736, ..., 0.17697497,
-0.11204378, -0.11445503],
...,
[-0.18283023, -0.05082365, -0.15573852, ..., 0.4768174 ,
-0.13417463, -0.1478706 ],
[-0.23174927, -0.29934692, 0.14626086, ..., 0.5767144 ,
-0.3227887 , -0.11754484],
[-0.22875336, -0.20766266, -0.07919479, ..., 0.42055562,
-0.42195907, -0.07124062]], shape=(784, 128), dtype=float32),
array([ 0.33520427, 0.41476032, 0.42854255, 0.26456 , 0.28423107,
0.4619906 , -0.01085375, -0.01410234, 0.46816376, 0.04695842,
0.16199604, 0.1178596 , -0.15440884, 0.26269406, -0.02273692,
0.39012593, 0.11480645, -0.16267306, 0.01186102, -0.20373592,
0.77516353, 0.28331184, 0.04801618, -0.01358667, 0.27781582,
-0.30025145, -0.01313267, -0.02819712, 0.05266671, -0.01129104,
0.3718213 , -0.06276885, 0.07143743, -0.42740637, 0.27966768,
0.5643312 , 0.13939717, -0.02660735, 0.25482723, 0.15903096,
0.02761725, -0.11521526, -0.018591 , -0.08635442, 0.1875941 ,
-0.24020974, 0.33598614, 0.14443065, 0.16586447, 0.31778356,
-0.01498642, 0.92979336, -0.01211132, -0.10277095, -0.1661265 ,
0.00524938, 0.10501956, -0.11736097, -0.05673584, 0.83130604,
-0.16295753, 0.4000649 , -0.28859657, 0.11708748, 0.2133803 ,
0.5311273 , -0.07117038, 0.36213043, 0.09919515, 0.04499178,
-0.06633466, -0.00488194, -0.06314942, 0.12584068, 0.316148 ,
0.4582312 , 0.26652548, 0.32099682, 0.27829206, 0.35091612,
0.34667146, 0.51486707, 0.4186107 , 0.12076598, 0.28654033,
0.383458 , 0.8813752 , -0.00307724, 0.44035587, 0.24539825,
0.08246749, 0.34919336, -0.18238382, -0.0148306 , -0.42604852,
-0.17117037, 0.29033217, -0.26033685, 0.19798407, 0.08194257,
-0.01106311, 0.2647324 , -0.21065266, 0.45927092, -0.06942318,
-0.23241717, -0.01085974, 0.51232344, 0.529505 , 0.3539233 ,
-0.01077797, -0.13007548, -0.01885176, -0.10258822, 0.37626743,
0.24545035, 0.23259574, -0.03035248, 0.39216578, 0.68170875,
-0.01603815, 0.38373974, 0.37374336, 0.31222183, -0.39119828,
0.3422284 , 0.01534443, -0.12235121], dtype=float32)]
weights = model.layers[1].get_weights()[0]
biases = model.layers[1].get_weights()[1]
print(weights.shape)
print(biases.shape)
(784, 128)
(128,)
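Each of the 128 hidden neurons has 784 weights (one per pixel of the flattened 28x28 image) plus a single bias, which explains the shapes above. Below, we reshape each neuron's weight vector back to 28x28 and display it as an image.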
# Visualize each hidden neuron's weights as a 28x28 image
fig, ax = plt.subplots(16, 8, figsize = (10, 20), sharex = True, sharey = True)
for k, weight in enumerate(weights.transpose()):
    i, j = k // 8, k % 8
    ax[i, j].imshow(weight.reshape(28, 28), cmap = 'gray')
plt.show()
# Predict classes for the test and training sets
y_pred = np.argmax(model.predict(x_test), axis=1)
y_train_pred = np.argmax(model.predict(x_train), axis=1)
313/313 ━━━━━━━━━━━━━━━━━━━━ 0s 286us/step
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 0s 212us/step
labels = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
          'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
# Create a confusion matrix
cm = confusion_matrix(y_test, y_pred)
ConfusionMatrixDisplay(cm, display_labels = labels).plot()
plt.xticks(rotation = 60)
plt.show()
print('TRAINING REPORT:')
print(classification_report(y_train, y_train_pred))
print('TESTING REPORT:')
print(classification_report(y_test, y_pred))
TRAINING REPORT:
precision recall f1-score support
0 0.81 0.93 0.87 6000
1 0.99 0.99 0.99 6000
2 0.85 0.84 0.85 6000
3 0.92 0.93 0.92 6000
4 0.84 0.86 0.85 6000
5 1.00 0.96 0.98 6000
6 0.82 0.69 0.75 6000
7 0.96 0.97 0.97 6000
8 0.98 0.99 0.99 6000
9 0.96 0.98 0.97 6000
accuracy 0.91 60000
macro avg 0.91 0.91 0.91 60000
weighted avg 0.91 0.91 0.91 60000
TESTING REPORT:
precision recall f1-score support
0 0.78 0.89 0.83 1000
1 0.99 0.97 0.98 1000
2 0.79 0.78 0.79 1000
3 0.88 0.89 0.89 1000
4 0.80 0.81 0.80 1000
5 0.99 0.93 0.96 1000
6 0.73 0.63 0.68 1000
7 0.93 0.96 0.95 1000
8 0.97 0.97 0.97 1000
9 0.94 0.96 0.95 1000
accuracy 0.88 10000
macro avg 0.88 0.88 0.88 10000
weighted avg 0.88 0.88 0.88 10000
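For a single summary number, we can score the predictions directly with the `accuracy_score` we imported earlier; `model.evaluate` gives the same test accuracy along with the loss (a minimal sketch):

```python
# Overall test accuracy from the predicted labels
print('Test accuracy:', accuracy_score(y_test, y_pred))

# The equivalent check through Keras also reports the test loss
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print(f'Test loss: {test_loss:.4f}, test accuracy: {test_acc:.4f}')
```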