2. Core Concepts and Fundamentals
TensorFlow is built upon a few fundamental concepts that, once understood, unlock its full power. In this chapter, we’ll break down the core building blocks: Tensors, Operations, and the underlying concept of Graphs (even in TensorFlow 2.x’s eager execution model).
2.1 Tensors: The Universal Data Structure
In TensorFlow, all data—whether it’s raw input, model weights, biases, or outputs—is represented as tensors. A tensor is a multi-dimensional array, similar to NumPy arrays, but with the added benefit of being able to run on GPUs (for accelerated computation) and being part of a computation graph.
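For example, you can inspect where a tensor is placed. A small sketch; the device string depends entirely on your machine (GPU if one is visible to TensorFlow, otherwise CPU):
import tensorflow as tf

t = tf.constant([[1.0, 2.0], [3.0, 4.0]])
# Prints something like '/job:localhost/replica:0/task:0/device:GPU:0'
# on a machine with a visible GPU, or a CPU device string otherwise.
print(t.device)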
What makes a Tensor?
Every tensor has two key properties:
- Data Type (dtype): The type of elements it holds (e.g., tf.float32, tf.int32, tf.string, tf.bool). TensorFlow is very strict about dtype compatibility in operations (see the sketch after this list).
- Shape (shape): The number of dimensions (rank) and the size of each dimension.
  - Scalar (Rank 0): A single number. Shape: ()
  - Vector (Rank 1): An array of numbers. Shape: (D,)
  - Matrix (Rank 2): A 2D array. Shape: (D1, D2)
  - Higher-rank tensors: For images (Height, Width, Channels), or batches of images (Batch Size, Height, Width, Channels).
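Because TensorFlow does not implicitly convert dtypes, mixing them in a single operation raises an error; you cast explicitly with tf.cast instead. A minimal sketch:
import tensorflow as tf

ints = tf.constant([1, 2, 3])          # dtype inferred as tf.int32
floats = tf.constant([1.0, 2.0, 3.0])  # dtype inferred as tf.float32

try:
    _ = ints + floats  # no implicit dtype conversion in TensorFlow
except (TypeError, tf.errors.InvalidArgumentError):
    print("Mixed-dtype addition fails")

# Cast explicitly to a common dtype instead
print(tf.cast(ints, tf.float32) + floats)  # [2. 4. 6.]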
Creating Tensors
Let’s see how to create tensors using TensorFlow.
import tensorflow as tf
# 1. Scalar (Rank 0 tensor)
scalar_tensor = tf.constant(7)
print(f"Scalar Tensor: {scalar_tensor}")
print(f"Data type: {scalar_tensor.dtype}, Shape: {scalar_tensor.shape}\n")
# 2. Vector (Rank 1 tensor)
vector_tensor = tf.constant([1, 2, 3, 4, 5])
print(f"Vector Tensor: {vector_tensor}")
print(f"Data type: {vector_tensor.dtype}, Shape: {vector_tensor.shape}\n")
# 3. Matrix (Rank 2 tensor)
matrix_tensor = tf.constant([[10, 20], [30, 40]])
print(f"Matrix Tensor:\n{matrix_tensor}")
print(f"Data type: {matrix_tensor.dtype}, Shape: {matrix_tensor.shape}\n")
# 4. Higher-rank tensor (here rank 4)
# This could represent a batch of 2 images, each 2x2 pixels with 3 color channels (RGB): shape (2, 2, 2, 3)
image_tensor = tf.constant([
[[[255, 0, 0], [0, 255, 0]], [[0, 0, 255], [255, 255, 0]]],
[[[100, 100, 100], [200, 200, 200]], [[50, 50, 50], [150, 150, 150]]]
], dtype=tf.int32)
print(f"3D Image Tensor:\n{image_tensor}")
print(f"Data type: {image_tensor.dtype}, Shape: {image_tensor.shape}\n")
# Tensors with specified data types
float_tensor = tf.constant(3.14, dtype=tf.float64)
print(f"Float64 Tensor: {float_tensor}, Dtype: {float_tensor.dtype}\n")
# Tensors with NumPy arrays
import numpy as np
numpy_array = np.array([[1.0, 2.0], [3.0, 4.0]])
tf_from_numpy = tf.constant(numpy_array)
print(f"Tensor from NumPy array:\n{tf_from_numpy}")
print(f"Data type: {tf_from_numpy.dtype}, Shape: {tf_from_numpy.shape}\n")
# Tensors of zeros and ones
zeros_tensor = tf.zeros(shape=(2, 3), dtype=tf.float32)
ones_tensor = tf.ones(shape=(1, 5), dtype=tf.int32)
print(f"Zeros Tensor:\n{zeros_tensor}")
print(f"Ones Tensor:\n{ones_tensor}\n")
# Random tensors
random_uniform_tensor = tf.random.uniform(shape=(2, 2), minval=0, maxval=10, dtype=tf.int32)  # maxval is exclusive for integer dtypes
random_normal_tensor = tf.random.normal(shape=(3, 3), mean=0.0, stddev=1.0, dtype=tf.float32)
print(f"Random Uniform Tensor:\n{random_uniform_tensor}")
print(f"Random Normal Tensor:\n{random_normal_tensor}\n")
Tensor to NumPy Conversion
You can easily convert TensorFlow tensors to NumPy arrays and vice versa. These conversions are typically cheap: when the tensor lives in CPU memory, the array and the tensor can share the same underlying buffer.
import tensorflow as tf
import numpy as np
tf_tensor = tf.constant([[1, 2], [3, 4]])
numpy_from_tf = tf_tensor.numpy() # Convert to NumPy array
print(f"Tensor: {tf_tensor}")
print(f"NumPy array from Tensor: {numpy_from_tf}\n")
numpy_array = np.array([5, 6, 7])
tf_from_numpy = tf.convert_to_tensor(numpy_array) # Convert NumPy array to Tensor
print(f"NumPy array: {numpy_array}")
print(f"Tensor from NumPy array: {tf_from_numpy}\n")
Exercise 2.1: Tensor Manipulation
- Objective: Create various tensors and practice extracting information about them.
- Instructions:
  - Create a tensor A that is a 2x3 matrix of integers, containing values of your choice.
  - Create a tensor B that is a 4x1 vector of floating-point numbers, all set to 0.5.
  - Create a tensor C that is a 3x3 matrix filled with random numbers drawn from a normal distribution.
  - Print the shape, dtype, and rank (number of dimensions) of each tensor. (Hint: use tf.rank() or tensor.ndim for rank.)
  - Convert tensor A to a NumPy array and print it.
  - Create a NumPy array D of shape (2, 2) and convert it into a TensorFlow tensor.
- Expected Output: The shapes, data types, and ranks for A, B, and C, and correctly converted arrays/tensors.
# Your solution for Exercise 2.1 here
import tensorflow as tf
import numpy as np
# Create tensor A (2x3 matrix of integers)
A = tf.constant([[1, 2, 3], [4, 5, 6]], dtype=tf.int32)
print(f"Tensor A:\n{A}")
print(f"Shape of A: {A.shape}")
print(f"Dtype of A: {A.dtype}")
print(f"Rank of A: {tf.rank(A).numpy()}\n")
# Create tensor B (4x1 vector of floating-point numbers, all 0.5)
B = tf.constant([0.5, 0.5, 0.5, 0.5], dtype=tf.float32)
# The exercise asks for 4x1: (4,) is a rank-1 vector, so reshape to the rank-2 column shape (4, 1)
B_reshaped = tf.reshape(B, (4, 1))
print(f"Tensor B:\n{B_reshaped}")
print(f"Shape of B: {B_reshaped.shape}")
print(f"Dtype of B: {B_reshaped.dtype}")
print(f"Rank of B: {tf.rank(B_reshaped).numpy()}\n")
# Create tensor C (3x3 matrix of random numbers from a normal distribution)
C = tf.random.normal(shape=(3, 3), mean=0.0, stddev=1.0, dtype=tf.float32)
print(f"Tensor C:\n{C}")
print(f"Shape of C: {C.shape}")
print(f"Dtype of C: {C.dtype}")
print(f"Rank of C: {tf.rank(C).numpy()}\n")
# Convert tensor A to a NumPy array and print it
numpy_A = A.numpy()
print(f"NumPy array from Tensor A:\n{numpy_A}\n")
# Create a NumPy array D and convert it to a TensorFlow tensor
D_numpy = np.array([[10, 11], [12, 13]], dtype=np.int64)
D_tf = tf.convert_to_tensor(D_numpy)
print(f"NumPy array D:\n{D_numpy}")
print(f"Tensor from NumPy array D:\n{D_tf}")
print(f"Shape of D_tf: {D_tf.shape}")
print(f"Dtype of D_tf: {D_tf.dtype}")
print(f"Rank of D_tf: {tf.rank(D_tf).numpy()}\n")
2.2 Operations: Manipulating Tensors
TensorFlow provides a rich set of operations to manipulate tensors. These operations are essentially mathematical functions that take one or more tensors as input and produce one or more tensors as output.
Common Operations
import tensorflow as tf
tensor_a = tf.constant([1, 2, 3], dtype=tf.float32)
tensor_b = tf.constant([4, 5, 6], dtype=tf.float32)
matrix_c = tf.constant([[1, 2], [3, 4]], dtype=tf.float32)
matrix_d = tf.constant([[5, 6], [7, 8]], dtype=tf.float32)
# 1. Element-wise operations (like NumPy)
print("--- Element-wise Operations ---")
print(f"Addition: {tensor_a + tensor_b}")
print(f"Subtraction: {tensor_b - tensor_a}")
print(f"Multiplication: {tensor_a * tensor_b}")
print(f"Division: {tensor_b / tensor_a}")
print(f"Power: {tf.pow(tensor_a, 2)}\n")
# 2. Matrix Multiplication
# The inner dimensions must match: (A, B) @ (B, C) -> (A, C)
print("--- Matrix Multiplication ---")
matrix_product = tf.matmul(matrix_c, matrix_d)
print(f"Matrix C:\n{matrix_c}")
print(f"Matrix D:\n{matrix_d}")
print(f"Matrix product (C @ D):\n{matrix_product}\n")
# You can also use the '@' operator for matrix multiplication
matrix_product_op = matrix_c @ matrix_d
print(f"Matrix product (C @ D) using '@' operator:\n{matrix_product_op}\n")
# 3. Aggregation Operations (summarizing tensors)
print("--- Aggregation Operations ---")
large_tensor = tf.constant([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
], dtype=tf.float32)
print(f"Sum of all elements: {tf.reduce_sum(large_tensor)}")
print(f"Mean of all elements: {tf.reduce_mean(large_tensor)}")
print(f"Maximum element: {tf.reduce_max(large_tensor)}")
print(f"Minimum element: {tf.reduce_min(large_tensor)}\n")
# Aggregation along an axis
print(f"Sum along axis 0 (columns): {tf.reduce_sum(large_tensor, axis=0)}")
print(f"Mean along axis 1 (rows): {tf.reduce_mean(large_tensor, axis=1)}\n")
# 4. Reshaping Tensors
print("--- Reshaping Tensors ---")
original_tensor = tf.range(9)  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(f"Original Tensor: {original_tensor}, Shape: {original_tensor.shape}")
reshaped_tensor = tf.reshape(original_tensor, shape=(3, 3))
print(f"Reshaped to (3,3):\n{reshaped_tensor}, Shape: {reshaped_tensor.shape}\n")
# 5. Slicing and Indexing
print("--- Slicing and Indexing ---")
sliced_row = reshaped_tensor[0, :] # First row
print(f"First row: {sliced_row}\n")
sliced_column = reshaped_tensor[:, 1] # Second column
print(f"Second column: {sliced_column}\n")
sub_matrix = reshaped_tensor[0:2, 0:2] # Top-left 2x2 sub-matrix
print(f"Top-left 2x2 sub-matrix:\n{sub_matrix}\n")
# 6. Broadcasting
# TensorFlow automatically expands dimensions to make shapes compatible
print("--- Broadcasting ---")
tensor_val = tf.constant(10.0) # Scalar
vector_val = tf.constant([1.0, 2.0, 3.0])
result_broadcast = tensor_val * vector_val
print(f"Scalar {tensor_val} * Vector {vector_val} = {result_broadcast}\n")
# Example: adding a vector to each row of a matrix
matrix_e = tf.constant([[10, 20, 30], [40, 50, 60]]) # Shape (2,3)
vector_f = tf.constant([1, 2, 3]) # Shape (3,)
broadcast_sum = matrix_e + vector_f
print(f"Matrix E:\n{matrix_e}")
print(f"Vector F: {vector_f}")
print(f"Broadcast Sum (E + F):\n{broadcast_sum}\n")
Exercise 2.2: Tensor Operations Workout
- Objective: Practice using various TensorFlow operations.
- Instructions:
  - Create two 3x2 matrices, M1 and M2, with random integer values between 1 and 10.
  - Perform element-wise addition, subtraction, and multiplication between M1 and M2.
  - Calculate the matrix product of M1 (transposed) and M2. Remember, for tf.matmul(A, B), the number of columns in A must equal the number of rows in B. So if M1 is (3, 2) and M2 is (3, 2), you need to transpose one. Try tf.transpose(M1) @ M2.
  - Find the mean, maximum, and minimum values of M1 along axis=0 (columns).
  - Reshape M2 into a 6x1 vector.
  - Extract the second row and the first column of M1.
  - Create a scalar tensor S with value 5. Multiply M2 by S using broadcasting.
- Expected Output: The results of all operations, including shapes where relevant.
# Your solution for Exercise 2.2 here
import tensorflow as tf
# Create two 3x2 matrices, M1 and M2
# maxval is exclusive for integer dtypes, so maxval=11 yields values in [1, 10]
M1 = tf.random.uniform(shape=(3, 2), minval=1, maxval=11, dtype=tf.int32)
M2 = tf.random.uniform(shape=(3, 2), minval=1, maxval=11, dtype=tf.int32)
print(f"Matrix M1:\n{M1}")
print(f"Matrix M2:\n{M2}\n")
# Element-wise operations
print("--- Element-wise Operations ---")
print(f"M1 + M2:\n{M1 + M2}")
print(f"M1 - M2:\n{M1 - M2}")
print(f"M1 * M2:\n{M1 * M2}\n")
# Matrix product of M1 (transposed) and M2
# M1 is (3,2), M2 is (3,2). M1_T will be (2,3). (2,3) @ (3,2) -> (2,2)
matrix_product_M1T_M2 = tf.transpose(M1) @ M2
print(f"Matrix product (M1_transposed @ M2):\n{matrix_product_M1T_M2}\n")
# Find mean, max, min of M1 along axis=0
print("--- Aggregation along axis=0 for M1 ---")
mean_M1_axis0 = tf.reduce_mean(tf.cast(M1, tf.float32), axis=0) # Cast to float for mean
max_M1_axis0 = tf.reduce_max(M1, axis=0)
min_M1_axis0 = tf.reduce_min(M1, axis=0)
print(f"Mean of M1 along axis 0: {mean_M1_axis0}")
print(f"Max of M1 along axis 0: {max_M1_axis0}")
print(f"Min of M1 along axis 0: {min_M1_axis0}\n")
# Reshape M2 into a 6x1 vector
M2_reshaped = tf.reshape(M2, (6, 1))
print(f"M2 reshaped to (6,1):\n{M2_reshaped}")
print(f"Shape of M2_reshaped: {M2_reshaped.shape}\n")
# Extract second row and first column of M1
second_row_M1 = M1[1, :]
first_column_M1 = M1[:, 0]
print(f"Second row of M1: {second_row_M1}")
print(f"First column of M1: {first_column_M1}\n")
# Create a scalar tensor S and multiply M2 by S
S = tf.constant(5, dtype=tf.int32)
M2_multiplied_by_S = M2 * S
print(f"Scalar S: {S}")
print(f"M2 multiplied by S (broadcasting):\n{M2_multiplied_by_S}\n")
2.3 Computation Graphs and tf.function
Historically, TensorFlow 1.x relied heavily on static computation graphs, where you first defined the entire graph of operations, and then executed it in a session. This offered performance benefits but made debugging harder.
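For context, here is a hedged sketch of that 1.x style using the tf.compat.v1 API that TensorFlow 2.x still ships; nothing is computed until sess.run is called:
import tensorflow as tf

# Build a static graph: entering a Graph context disables eager execution
# inside the block, so these ops are symbolic nodes, not computed values.
graph = tf.Graph()
with graph.as_default():
    a = tf.constant(2.0)
    b = tf.constant(3.0)
    c = a * b  # no computation happens here

# Only running the graph in a (compat.v1) session produces a value.
with tf.compat.v1.Session(graph=graph) as sess:
    print(sess.run(c))  # 6.0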
TensorFlow 2.x introduced Eager Execution as the default. This means operations are executed immediately, much like in Python with NumPy, making development and debugging much more intuitive.
However, static graphs still offer significant advantages for performance and deployment. TensorFlow 2.x leverages this with tf.function.
Eager Execution
In eager execution, TensorFlow operations execute immediately and return their values. This behavior is similar to how operations work in Python with NumPy.
import tensorflow as tf
# Eager execution is the default
a = tf.constant([[10, 20], [30, 40]])
b = tf.constant([[1, 2], [3, 4]])
c = a + b
print(f"Result of eager execution:\n{c}\n")
# You can inspect tensor values directly
print(f"Value of c[0,0]: {c[0,0].numpy()}")
tf.function: Bridging Eager and Graph Modes
The @tf.function decorator allows you to selectively compile Python functions into high-performance TensorFlow graphs. When a function decorated with tf.function is called, TensorFlow traces the Python operations to build a callable graph. This graph can then be optimized (e.g., fusing operations, optimizing memory usage) and executed efficiently, often with significant speedups, especially for repetitive computations like training loops.
import tensorflow as tf
import time
# A simple Python function
def python_function(x, y):
return tf.matmul(x, y) + x
# The same function compiled into a TensorFlow graph
@tf.function
def tf_function(x, y):
return tf.matmul(x, y) + x
# Create some large tensors
size = 1000
matrix1 = tf.random.uniform(shape=(size, size), dtype=tf.float32)
matrix2 = tf.random.uniform(shape=(size, size), dtype=tf.float32)
print(f"Matrices shape: {matrix1.shape}\n")
# Time the Python function (eager execution)
start_time = time.time()
for _ in range(10):
_ = python_function(matrix1, matrix2)
end_time = time.time()
print(f"Eager execution time: {end_time - start_time:.4f} seconds\n")
# Time the tf.function (graph execution)
start_time = time.time()
for _ in range(10):
_ = tf_function(matrix1, matrix2) # First call triggers tracing, subsequent calls use the graph
end_time = time.time()
print(f"Graph execution time (@tf.function): {end_time - start_time:.4f} seconds")
print("Note: The first call to tf_function includes graph tracing overhead.\n")
# Let's call it again to see the pure graph execution speed
start_time = time.time()
for _ in range(10):
_ = tf_function(matrix1, matrix2)
end_time = time.time()
print(f"Pure graph execution time (subsequent calls): {end_time - start_time:.4f} seconds")
You’ll typically see a significant speedup with @tf.function for repetitive tasks. It’s best used for entire computation steps (like a single training step or a prediction step) rather than on individual operations.
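One way to see the tracing step for yourself: Python side effects such as print run only while tf.function traces the function, not on every call. A small sketch:
import tensorflow as tf

@tf.function
def traced_fn(x):
    print("Tracing with", x)  # Python side effect: runs only during tracing
    return x * 2

_ = traced_fn(tf.constant(1))    # prints: the first call triggers a trace
_ = traced_fn(tf.constant(2))    # silent: the cached int32 graph is reused
_ = traced_fn(tf.constant(1.0))  # prints again: a new dtype forces a retrace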
Why use tf.function?
- Performance: Graphs can be optimized by TensorFlow (e.g., fusing operations, simplifying computations) and run more efficiently on different hardware, including GPUs and TPUs.
- Portability: Graphs can be saved and run independently of the Python code, making deployment to various environments (TensorFlow Serving, TensorFlow Lite, TensorFlow.js) easier.
- Serialization: Allows saving and loading models in the portable SavedModel format (a minimal export sketch follows this list).
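As a minimal export sketch (the class name and path below are illustrative, not part of any required API):
import tensorflow as tf

class Doubler(tf.Module):  # hypothetical module name, for illustration only
    @tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.float32)])
    def __call__(self, x):
        return x * 2.0

module = Doubler()
tf.saved_model.save(module, "/tmp/doubler")    # illustrative path
restored = tf.saved_model.load("/tmp/doubler")
print(restored(tf.constant([1.0, 2.0, 3.0])))  # [2. 4. 6.]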
Exercise 2.3: Harnessing tf.function
- Objective: Experience the performance benefits of tf.function.
- Instructions:
  - Write a Python function complex_calculation(x) that takes a 2D tensor x (e.g., a 500x500 matrix) and performs the following sequence of operations:
    - Square x.
    - Take the square root of the result.
    - Add 5.0 to every element.
    - Perform a matrix multiplication of the result with its transpose (result @ tf.transpose(result)).
    - Compute the mean of all elements in the final tensor.
  - Create a large random tensor my_data = tf.random.uniform(shape=(500, 500), minval=0.0, maxval=100.0, dtype=tf.float32).
  - Run complex_calculation 100 times without @tf.function and measure the execution time.
  - Decorate complex_calculation with @tf.function to create tf_complex_calculation.
  - Run tf_complex_calculation 100 times and measure the execution time.
  - Compare the two execution times and note the difference.
- Expected Output: The final mean value for both calls (should be identical) and a clear comparison of execution times, demonstrating the speedup from tf.function.
# Your solution for Exercise 2.3 here
import tensorflow as tf
import time
# Original Python function
def complex_calculation(x):
# 1. Square x
squared_x = tf.square(x)
# 2. Take the square root of the result
sqrt_x = tf.sqrt(squared_x)
# 3. Add 5.0 to every element
added_five = sqrt_x + 5.0
# 4. Perform a matrix multiplication of the result with its transpose
matmul_result = added_five @ tf.transpose(added_five)
# 5. Compute the mean of all elements in the final tensor
final_mean = tf.reduce_mean(matmul_result)
return final_mean
# Compiled version with @tf.function
@tf.function
def tf_complex_calculation(x):
# 1. Square x
squared_x = tf.square(x)
# 2. Take the square root of the result
sqrt_x = tf.sqrt(squared_x)
# 3. Add 5.0 to every element
added_five = sqrt_x + 5.0
# 4. Perform a matrix multiplication of the result with its transpose
matmul_result = added_five @ tf.transpose(added_five)
# 5. Compute the mean of all elements in the final tensor
final_mean = tf.reduce_mean(matmul_result)
return final_mean
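# Note (sketch): instead of duplicating the body, you could equivalently wrap
# the existing function: tf_complex_calculation = tf.function(complex_calculation)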
# Create a large random tensor
my_data = tf.random.uniform(shape=(500, 500), minval=0.0, maxval=100.0, dtype=tf.float32)
num_runs = 100
print(f"Data tensor shape: {my_data.shape}\n")
# Run without @tf.function (eager execution)
eager_start_time = time.time()
eager_results = []
for _ in range(num_runs):
eager_results.append(complex_calculation(my_data))
eager_end_time = time.time()
eager_time = eager_end_time - eager_start_time
print(f"Eager execution time for {num_runs} runs: {eager_time:.4f} seconds")
print(f"First eager result: {eager_results[0].numpy():.4f}\n")
# Run with @tf.function (graph execution)
graph_start_time = time.time()
graph_results = []
for _ in range(num_runs):
graph_results.append(tf_complex_calculation(my_data))
graph_end_time = time.time()
graph_time = graph_end_time - graph_start_time
print(f"Graph execution time for {num_runs} runs (@tf.function): {graph_time:.4f} seconds")
print(f"First graph result: {graph_results[0].numpy():.4f}\n")
print(f"Speedup: {eager_time / graph_time:.2f}x faster with @tf.function")
You’ve now grasped the fundamental building blocks of TensorFlow: tensors, operations, and the power of tf.function to compile efficient graphs. These concepts are crucial as you move on to building more complex models.