TensorFlow Guide: Building Your First Neural Network with Keras

3. Building Your First Neural Network with Keras

Keras is a high-level API for building and training deep learning models, fully integrated into TensorFlow (tf.keras). It’s designed for fast experimentation and ease of use, making it perfect for beginners. In this chapter, you’ll learn how to build, compile, and train your first neural networks using Keras.

3.1 Understanding Neural Network Basics

Before we build, let’s briefly revisit what a neural network is at a high level:

  • Input Layer: Receives the raw data.
  • Hidden Layers: Perform computations (weighted sums + activation functions) to learn complex patterns. A network can have one or many hidden layers.
  • Output Layer: Produces the final prediction.
  • Neurons (Units): Basic computational units in each layer.
  • Weights and Biases: Parameters that the network “learns” during training.
  • Activation Functions: Non-linear functions applied to the output of each neuron, allowing the network to learn non-linear relationships. Common ones include ReLU, Sigmoid, and Softmax.
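
To make these terms concrete, here is a minimal NumPy sketch of what a single Dense layer computes: a weighted sum plus bias, followed by an activation. The array values are arbitrary and chosen only for illustration.

import numpy as np

# A toy "layer": 3 input features feeding 2 neurons
x = np.array([0.5, -1.2, 3.0])           # input vector (3 features)
W = np.array([[0.1, -0.3],
              [0.8,  0.2],
              [-0.5, 0.4]])               # weights: one column per neuron
b = np.array([0.05, -0.1])                # one bias per neuron

z = x @ W + b                             # weighted sums (pre-activations)
a = np.maximum(0, z)                      # ReLU: max(0, z), element-wise

print("pre-activations:", z)
print("after ReLU:     ", a)

# Softmax turns arbitrary scores into probabilities that sum to 1
probs = np.exp(z) / np.exp(z).sum()
print("softmax:", probs, "sum =", probs.sum())

A Dense layer in Keras performs exactly this computation for every sample in a batch; training adjusts W and b to reduce the loss.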

3.2 Keras Model Types: Sequential and Functional API

Keras offers two primary ways to build models: the Sequential API and the Functional API.

3.2.1 Sequential Model

The Sequential model is the simplest way to build a neural network in Keras. It’s suitable for networks that are a linear stack of layers, where each layer has exactly one input tensor and one output tensor.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# 1. Define the model
# A Sequential model is a linear stack of layers.
model = keras.Sequential([
    # Input layer (often implicitly defined by the first layer's input_shape)
    # A Dense layer is a fully-connected layer.
    # It means every neuron in this layer is connected to every neuron in the previous layer.
    layers.Dense(units=64, activation='relu', input_shape=(784,)), # 784 input features
    layers.Dense(units=32, activation='relu'),
    layers.Dense(units=10, activation='softmax') # 10 output classes for classification
])

# Let's inspect the model architecture
model.summary()
  • input_shape=(784,): Specifies that the input to the network will be 784-dimensional vectors. This is common for flattened images (e.g., 28x28 MNIST digits).
  • units: The number of neurons (output dimensions) in the layer.
  • activation: The activation function to apply. relu (Rectified Linear Unit) is popular for hidden layers. softmax is used in the output layer for multi-class classification, as it converts raw outputs (logits) into probabilities that sum to 1.
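
As a sanity check on the summary output, a Dense layer holds units × (input_dim + 1) parameters (one weight per input connection plus one bias per neuron): the first layer above has 64 × (784 + 1) = 50,240 parameters, the second 32 × (64 + 1) = 2,080, and the output layer 10 × (32 + 1) = 330. You can also build the same model incrementally with model.add(); the sketch below is equivalent to the list-style definition above (model_v2 is just an illustrative name).

# Equivalent model, built one layer at a time
model_v2 = keras.Sequential()
model_v2.add(layers.Dense(64, activation='relu', input_shape=(784,)))
model_v2.add(layers.Dense(32, activation='relu'))
model_v2.add(layers.Dense(10, activation='softmax'))

model_v2.summary()  # same layers and parameter counts as the model above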

3.2.2 Functional API

The Functional API is more flexible and allows you to build models with arbitrary architectures, such as multi-input models, multi-output models, or models with shared layers. It’s used when your network isn’t a simple linear stack.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# 1. Define the input tensor
inputs = keras.Input(shape=(784,), name='input_features')

# 2. Define the layers and connect them
x = layers.Dense(64, activation='relu', name='hidden_layer_1')(inputs)
x = layers.Dense(32, activation='relu', name='hidden_layer_2')(x)
outputs = layers.Dense(10, activation='softmax', name='output_probabilities')(x)

# 3. Create the model by specifying its inputs and outputs
functional_model = keras.Model(inputs=inputs, outputs=outputs, name='my_functional_model')

# Inspect the model architecture
functional_model.summary()

The Functional API provides a more explicit way to define the flow of tensors through your network.
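
To see why this flexibility matters, here is a minimal sketch of a two-input model, an architecture the Sequential API cannot express. The input sizes and names are hypothetical, chosen only to illustrate the pattern.

# Hypothetical two-input model: image features plus a few metadata values
image_features = keras.Input(shape=(784,), name='image_features')
metadata = keras.Input(shape=(8,), name='metadata')

x1 = layers.Dense(64, activation='relu')(image_features)
x2 = layers.Dense(16, activation='relu')(metadata)

# Merge the two branches, then classify
merged = layers.concatenate([x1, x2])
outputs = layers.Dense(10, activation='softmax')(merged)

multi_input_model = keras.Model(inputs=[image_features, metadata], outputs=outputs)
multi_input_model.summary()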

3.3 The Training Workflow: Compile and Fit

Once your model architecture is defined, the next steps are to compile it and then fit (train) it with data.

3.3.1 Compiling the Model

Compiling configures the model for training. You need to specify:

  • optimizer: The algorithm used to update the model’s weights during training (e.g., ‘adam’, ‘sgd’, ‘rmsprop’). Adam is a good general-purpose choice.
  • loss: The function that measures how well the model’s predictions match the true labels. The goal of training is to minimize this loss.
    • 'sparse_categorical_crossentropy': For integer labels (e.g., 0, 1, 2 for classes).
    • 'categorical_crossentropy': For one-hot encoded labels (e.g., [0,0,1,0] for class 2); see the encoding example after the compile calls below.
    • 'binary_crossentropy': For binary classification.
    • 'mse' (Mean Squared Error): For regression tasks.
  • metrics: A list of metrics to monitor during training and testing (e.g., ‘accuracy’ for classification, ‘mae’ for regression).

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

functional_model.compile(optimizer='rmsprop',
                         loss='sparse_categorical_crossentropy',
                         metrics=['accuracy'])
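
The difference between the two categorical losses is purely in how the labels are encoded, not in what is computed. A quick illustration with keras.utils.to_categorical (the label values here are arbitrary):

from tensorflow.keras.utils import to_categorical
import numpy as np

int_labels = np.array([0, 2, 1])            # use with 'sparse_categorical_crossentropy'
one_hot = to_categorical(int_labels, num_classes=3)
print(one_hot)                              # use with 'categorical_crossentropy'
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]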

3.3.2 Training the Model (.fit())

The .fit() method trains the model for a fixed number of epochs.

  • x: Input data (features).
  • y: Target data (labels).
  • epochs: The number of times the model will iterate over the entire dataset.
  • batch_size (optional): The number of samples per gradient update. Training happens in mini-batches; Keras defaults to a batch size of 32 if you don't specify one.
  • validation_data (optional): Data on which to evaluate the loss and metrics at the end of each epoch. This helps in detecting overfitting (see the sketch after this list).
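
The MNIST example below uses validation_split, which carves the validation set out of the training data for you. If you already have a dedicated validation set, pass it with validation_data instead. A minimal sketch, assuming x_val and y_val are your own held-out arrays and model is any compiled Keras model:

history = model.fit(x_train, y_train,
                    epochs=10,
                    batch_size=32,
                    validation_data=(x_val, y_val))  # evaluated at the end of each epoch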

Let’s put it all together with a classic dataset: MNIST (handwritten digits).

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import matplotlib.pyplot as plt

# 1. Load and preprocess the MNIST dataset
# The dataset contains 60,000 training images and 10,000 test images.
# Each image is 28x28 grayscale, representing a digit from 0-9.
(train_images, train_labels), (test_images, test_labels) = keras.datasets.mnist.load_data()

# Scale pixel values from 0-255 down to 0-1; small, consistent input scales make training more stable
train_images = train_images.astype('float32') / 255.0
test_images = test_images.astype('float32') / 255.0

# Flatten the 28x28 images into 784-dimensional vectors
train_images = train_images.reshape((60000, 784))
test_images = test_images.reshape((10000, 784))

print(f"Training images shape: {train_images.shape}")
print(f"Training labels shape: {train_labels.shape}")
print(f"Test images shape: {test_images.shape}")
print(f"Test labels shape: {test_labels.shape}\n")

# 2. Define the model (Sequential API)
model = keras.Sequential([
    layers.Dense(units=128, activation='relu', input_shape=(784,)),
    layers.Dense(units=64, activation='relu'),
    layers.Dense(units=10, activation='softmax') # 10 classes (digits 0-9)
])

# 3. Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# 4. Train the model
print("Starting training...\n")
history = model.fit(train_images, train_labels,
                    epochs=10,
                    batch_size=32,
                    validation_split=0.1) # Use 10% of training data for validation

print("\nTraining finished!")

# 5. Evaluate the model on the test set
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print(f"\nTest accuracy: {test_acc:.4f}")

# 6. Make predictions
predictions = model.predict(test_images[:5])
print(f"\nPredictions for the first 5 test images:\n{np.argmax(predictions, axis=1)}")
print(f"True labels for the first 5 test images:\n{test_labels[:5]}")

# Plot training history
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Training and Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()

3.4 Key Components of a Keras Layer

Every Keras layer (like layers.Dense) exposes a number of useful attributes:

  • kernel: The weights matrix (learnable parameter).
  • bias: The bias vector (learnable parameter).
  • activation: The activation function.
  • input_shape: The shape of the input it expects.
  • output_shape: The shape of the output it produces.

You can inspect these once the model has been built. Building happens as soon as the input shape is known (here, because the first layer specifies input_shape) or otherwise on the first call to model.fit() or model.predict().

# Access weights and biases of a layer
print(f"\nWeights of the first dense layer (shape): {model.layers[0].kernel.shape}")
print(f"Bias of the first dense layer (shape): {model.layers[0].bias.shape}")

Exercise 3.1: Build and Train a Regression Model

  1. Objective: Apply your knowledge to build and train a simple regression model.
  2. Instructions:
    • Generate Synthetic Data: Create some simple synthetic data for a regression problem.
      • X = np.linspace(-10, 10, 1000) (1000 evenly spaced numbers between -10 and 10)
      • y = X**2 + np.random.normal(0, 5, 1000) (a quadratic relationship with some noise)
      • Reshape X to (1000, 1) to fit Keras input expectations.
    • Split Data: Split X and y into training and testing sets (e.g., 80% train, 20% test). sklearn.model_selection.train_test_split is useful here.
    • Define a Sequential Model: Create a keras.Sequential model with:
      • An Input layer (or input_shape in the first Dense layer) for a single input feature.
      • Two Dense hidden layers (e.g., 64 units, ‘relu’ activation).
      • A final Dense output layer with 1 unit and no activation function (for regression tasks, we predict raw values).
    • Compile the Model:
      • Use optimizer='adam'.
      • Use loss='mse' (Mean Squared Error) for regression.
      • Use metrics=['mae'] (Mean Absolute Error) to monitor.
    • Train the Model: Train for a suitable number of epochs (e.g., 50-100). Use validation_split to monitor performance.
    • Evaluate and Predict: Evaluate the model on the test set and make predictions on a few test samples.
    • Visualize: Plot the true values vs. predicted values for the test set to visually assess performance.
  3. Expected Output: Training progress, test MAE, and a scatter plot showing how well the model’s predictions align with the true values.

# Solution to Exercise 3.1
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# 1. Generate Synthetic Data
X = np.linspace(-10, 10, 1000)
y = X**2 + np.random.normal(0, 5, 1000) # Quadratic relationship with noise

# Reshape X to (1000, 1) for Keras
X = X.reshape(-1, 1)

print(f"X shape: {X.shape}")
print(f"y shape: {y.shape}\n")

# 2. Split Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"X_train shape: {X_train.shape}")
print(f"y_train shape: {y_train.shape}")
print(f"X_test shape: {X_test.shape}")
print(f"y_test shape: {y_test.shape}\n")

# 3. Define a Sequential Model for Regression
regression_model = keras.Sequential([
    # Input layer implicitly defined by the first Dense layer's input_shape
    layers.Dense(units=64, activation='relu', input_shape=(1,), name='hidden_layer_1_reg'),
    layers.Dense(units=64, activation='relu', name='hidden_layer_2_reg'),
    layers.Dense(units=1, name='output_layer_reg') # Single output unit, no activation for regression
])

regression_model.summary()

# 4. Compile the Model
regression_model.compile(optimizer='adam',
                         loss='mse', # Mean Squared Error for regression
                         metrics=['mae']) # Mean Absolute Error

# 5. Train the Model
print("\nStarting regression model training...\n")
reg_history = regression_model.fit(X_train, y_train,
                                   epochs=100,
                                   batch_size=32,
                                   validation_split=0.1,
                                   verbose=0) # Set verbose to 0 to suppress output per epoch

print("\nRegression model training finished!")

# 6. Evaluate and Predict
test_loss_reg, test_mae_reg = regression_model.evaluate(X_test, y_test, verbose=0)
print(f"Test Mean Squared Error (MSE): {test_loss_reg:.4f}")
print(f"Test Mean Absolute Error (MAE): {test_mae_reg:.4f}\n")

# Make predictions on the test set
y_pred = regression_model.predict(X_test)

# 7. Visualize Predictions
plt.figure(figsize=(10, 6))
plt.scatter(X_test, y_test, label='True Values', alpha=0.6)
plt.scatter(X_test, y_pred, label='Predictions', alpha=0.6, marker='x', color='red')
plt.title('True vs. Predicted Values for Regression Model')
plt.xlabel('X (Input Feature)')
plt.ylabel('Y (Target Value)')
plt.legend()
plt.grid(True)
plt.show()

# Plot training history
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(reg_history.history['loss'], label='Training Loss (MSE)')
plt.plot(reg_history.history['val_loss'], label='Validation Loss (MSE)')
plt.title('Training and Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss (MSE)')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(reg_history.history['mae'], label='Training MAE')
plt.plot(reg_history.history['val_mae'], label='Validation MAE')
plt.title('Training and Validation Mean Absolute Error')
plt.xlabel('Epoch')
plt.ylabel('MAE')
plt.legend()
plt.show()

You’ve successfully built and trained your first neural networks using Keras for both classification and regression tasks! This is a significant milestone. In the next chapter, we’ll explore how to efficiently handle and preprocess data using the tf.data API.