Zero to Mastery: Helm and Kubernetes with AKS Cluster - A Comprehensive Learning Guide


Welcome to this comprehensive learning guide designed to take you from a complete novice to a master of Helm and Kubernetes, specifically within the Azure Kubernetes Service (AKS) environment. This document will walk you through the essential concepts, practical examples, and advanced techniques required to successfully deploy, manage, and scale your applications from development to production.

1. Introduction to Helm and Kubernetes with AKS

What is Kubernetes?

Kubernetes (often abbreviated as K8s) is an open-source system for automating deployment, scaling, and management of containerized applications. It groups containers that make up an application into logical units for easy management and discovery. Kubernetes provides a platform for running and managing these containers in a highly available and resilient manner.

What is Helm?

Helm is the package manager for Kubernetes. Just like you use package managers like apt on Ubuntu or yum on CentOS to install software, Helm helps you install and manage applications on Kubernetes. Helm packages are called “Charts,” and they contain all the necessary resources and configurations to deploy an application or service to a Kubernetes cluster.

What is Azure Kubernetes Service (AKS)?

Azure Kubernetes Service (AKS) is a managed Kubernetes offering from Microsoft Azure. AKS simplifies the deployment, management, and scaling of Kubernetes clusters in the cloud. With AKS, Azure handles the underlying infrastructure (like control plane management and automatic upgrades), allowing you to focus on your applications rather than the operational overhead of Kubernetes itself. AKS integrates seamlessly with other Azure services for monitoring, security, and networking.

Why learn Helm and Kubernetes with AKS?

Learning Helm and Kubernetes with AKS offers significant benefits for modern application development and operations:

  • Simplified Deployment and Management: Kubernetes automates many tasks related to container orchestration, while Helm streamlines the packaging and deployment of complex applications. AKS further simplifies Kubernetes management by handling the control plane.
  • Scalability: Kubernetes allows you to scale your applications up or down based on demand, ensuring optimal resource utilization and performance.
  • Portability: Kubernetes workloads can run on various cloud providers or on-premises, offering flexibility and avoiding vendor lock-in.
  • Resilience: Kubernetes automatically handles failures by restarting containers, rescheduling pods, and ensuring application availability.
  • Infrastructure as Code (IaC): Helm Charts enable you to define your application deployments as code, facilitating version control, collaboration, and consistent deployments across environments.
  • Azure Ecosystem Integration: AKS integrates deeply with Azure services like Azure Container Registry (ACR), Azure Monitor, Azure Key Vault, and Azure DevOps (including GitHub Actions), providing a powerful and cohesive platform for cloud-native development.
  • Industry Relevance: Kubernetes has become the de facto standard for container orchestration, and proficiency in AKS and Helm is highly sought after in the cloud and DevOps job markets.

Setting up your development environment

Before diving into Helm and Kubernetes, let’s set up your local machine and Azure environment.

Prerequisites:

  • Azure Account: You’ll need an active Azure subscription. If you don’t have one, you can sign up for a free Azure account.
  • Azure CLI: The Azure command-line interface (CLI) is essential for interacting with Azure resources.
  • kubectl: The Kubernetes command-line tool (kubectl) is used to run commands against Kubernetes clusters.
  • Helm CLI: The Helm client is used to manage Helm Charts.
  • Git: For version control of your code and configurations.
  • Code Editor: Visual Studio Code is highly recommended for Kubernetes and Helm development due to its rich extensions.

Step-by-step setup:

  1. Log in to Azure CLI:

    az login
    

    Follow the browser prompts to authenticate.

  2. Set your Azure subscription: If you have multiple subscriptions, set the one you want to use:

    az account set --subscription "Your Subscription Name or ID"
    
  3. Install/Verify kubectl and helm: Ensure you have the latest versions.

    kubectl version --client
    helm version
    

    If not installed, refer to the official installation documentation for the Azure CLI, kubectl, and Helm.

  4. Create an Azure Resource Group: A resource group is a logical container for your Azure resources.

    az group create --name myAKSResourceGroup --location eastus
    

    You can choose a different location closer to you.

  5. Create an AKS Cluster: This command will create a basic AKS cluster. For production, you would add more configurations (e.g., availability zones, network settings).

    az aks create --resource-group myAKSResourceGroup --name myAKSCluster --node-count 2 --generate-ssh-keys --enable-managed-identity
    

    This command creates a cluster with two nodes and enables a managed identity, which is a best practice for AKS.

  6. Get AKS cluster credentials: Configure kubectl to connect to your new AKS cluster.

    az aks get-credentials --resource-group myAKSResourceGroup --name myAKSCluster
    

    Now you can interact with your AKS cluster using kubectl. Test it:

    kubectl get nodes
    

    You should see your two nodes listed.

2. Core Concepts and Fundamentals

This section will introduce you to the fundamental building blocks of Kubernetes and Helm.

2.1 Kubernetes Core Concepts

2.1.1 Pods

A Pod is the smallest deployable unit in Kubernetes. It represents a single instance of a running process in your cluster. Pods typically contain one or more containers (e.g., a Docker container). All containers within a Pod share the same network namespace, IP address, and storage volumes.

Key characteristics:

  • Ephemeral: Pods are designed to be short-lived and disposable. A standalone Pod is not recreated if it dies or its node fails; in practice a controller such as a Deployment (covered next) is responsible for creating replacement Pods.
  • Single application instance: Generally, you run a single main application process per Pod, though sidecar containers (e.g., a logging agent or a proxy) can share the Pod.
  • Shared resources: Containers in a Pod share the same network and can communicate via localhost. They can also share storage volumes.

Code Example: Basic Nginx Pod

Let’s create a simple Pod running an Nginx web server.

# nginx-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-nginx-pod
  labels:
    app: nginx
spec:
  containers:
  - name: nginx-container
    image: nginx:latest
    ports:
    - containerPort: 80

To deploy this Pod:

kubectl apply -f nginx-pod.yaml

Check the status:

kubectl get pods

You should see my-nginx-pod in the Running state. To access the Nginx web page from outside the cluster you’d typically need a Service (covered later), but you can take a quick look with kubectl port-forward, as shown below.
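A quick way to check it without a Service is a local port forward (press Ctrl+C to stop it):

kubectl port-forward pod/my-nginx-pod 8080:80
# then browse to http://localhost:8080

When you're done, clean up: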

kubectl delete -f nginx-pod.yaml

Exercise/Mini-Challenge:

  1. Create a Pod named my-busybox-pod using the busybox image.
  2. Have the busybox container run a command that prints “Hello from BusyBox!” to standard output and then exits.
  3. Check the logs of the Pod after it runs.
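If you get stuck, one possible shape for steps 1 and 2 is sketched below; the key pieces are the command override and restartPolicy: Never, which lets the Pod complete instead of being restarted.

# busybox-pod.yaml (one possible solution sketch)
apiVersion: v1
kind: Pod
metadata:
  name: my-busybox-pod
spec:
  restartPolicy: Never # let the container exit without being restarted
  containers:
  - name: busybox
    image: busybox:latest
    command: ["sh", "-c", "echo 'Hello from BusyBox!'"]

kubectl apply -f busybox-pod.yaml
kubectl logs my-busybox-pod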

2.1.2 Deployments

Deployments are a higher-level abstraction in Kubernetes used to manage the lifecycle of Pods. They provide declarative updates for Pods and ReplicaSets, allowing you to define how many replicas of your application should be running and how to update them (e.g., rolling updates).

Key characteristics:

  • Manage ReplicaSets: A Deployment manages ReplicaSets, which in turn ensure a specified number of Pod replicas are always running.
  • Rolling Updates: Deployments enable zero-downtime application updates by gradually replacing old Pods with new ones.
  • Rollbacks: If an update causes issues, you can easily roll back to a previous stable version.
  • Desired State: You define the desired state of your application (e.g., 3 replicas of Nginx version 1.25), and Kubernetes works to maintain that state.

Code Example: Nginx Deployment

Let’s create a Deployment for our Nginx application, ensuring 3 replicas are always running.

# nginx-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx-container
        image: nginx:latest
        ports:
        - containerPort: 80

Deploy the application:

kubectl apply -f nginx-deployment.yaml

Check the status:

kubectl get deployments
kubectl get pods -l app=nginx

You should see 3 Nginx Pods running.

Now, let’s perform a rolling update by changing the Nginx image version.

# nginx-deployment-updated.yaml (modify image to nginx:1.25.3)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx-container
        image: nginx:1.25.3 # Updated image version
        ports:
        - containerPort: 80

Apply the update:

kubectl apply -f nginx-deployment-updated.yaml

Watch the rollout:

kubectl rollout status deployment/nginx-deployment

You’ll see the old Pods being terminated and new ones with nginx:1.25.3 being created.
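Although this update went smoothly, this is also the point where a rollback would come in. Kubernetes keeps the Deployment's revision history, so you can undo a bad rollout:

kubectl rollout history deployment/nginx-deployment
kubectl rollout undo deployment/nginx-deployment
# or roll back to a specific revision:
kubectl rollout undo deployment/nginx-deployment --to-revision=1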

Clean up:

kubectl delete -f nginx-deployment.yaml

Exercise/Mini-Challenge:

  1. Create a Deployment named my-web-app with 5 replicas using the httpd:2.4 image.
  2. Verify that 5 Pods are running.
  3. Scale the deployment down to 2 replicas using kubectl scale.
  4. Perform a rolling update to httpd:2.4.58-alpine.
  5. After the update, try to roll back to the previous version.

2.1.3 Services

Services in Kubernetes enable network access to a set of Pods. Since Pods are ephemeral and their IP addresses can change, Services provide a stable network endpoint for your applications. They act as an abstraction layer, allowing other applications or external users to communicate with your Pods without knowing their individual IP addresses.

Key types of Services:

  • ClusterIP (Default): Exposes the Service on an internal IP in the cluster. It’s only reachable from within the cluster.
  • NodePort: Exposes the Service on each Node’s IP at a static port (the NodePort). Makes the Service accessible from outside the cluster.
  • LoadBalancer: Exposes the Service externally using a cloud provider’s load balancer (e.g., Azure Load Balancer for AKS). This is the standard way to expose public-facing applications.
  • ExternalName: Maps the Service to a DNS name, not to a selector. Used for external services.
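Only the LoadBalancer type is demonstrated in full below; for reference, a minimal NodePort Service is the same manifest with a different type and an optional nodePort (a sketch, assuming Pods labeled app: nginx exist):

# nodeport-sketch.yaml (illustrative)
apiVersion: v1
kind: Service
metadata:
  name: nginx-nodeport
spec:
  type: NodePort
  selector:
    app: nginx
  ports:
    - port: 80
      targetPort: 80
      nodePort: 30080 # must fall within the cluster's NodePort range (default 30000-32767)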

Code Example: Nginx Deployment with LoadBalancer Service

Let’s expose our Nginx application to the internet using a LoadBalancer Service.

# nginx-deployment-service.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx-container
        image: nginx:latest
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  type: LoadBalancer

Deploy the resources:

kubectl apply -f nginx-deployment-service.yaml

Get the external IP address of the Service (it might take a few minutes for the LoadBalancer to provision):

kubectl get service nginx-service

Look for the EXTERNAL-IP column. Once it’s assigned, you can access your Nginx application through that IP in your web browser.

Clean up:

kubectl delete -f nginx-deployment-service.yaml

Exercise/Mini-Challenge:

  1. Create a Deployment for a simple “hello-world” web application (you can use an image like gcr.io/google-samples/node-hello:1.0).
  2. Create a NodePort Service to expose this application on port 30080.
  3. Access the application from your local machine using one of your AKS node’s IP addresses and the NodePort.
  4. Change the Service type to LoadBalancer and observe the external IP.

2.1.4 Namespaces

Namespaces provide a mechanism for isolating groups of resources within a single Kubernetes cluster. They are crucial for organizing resources, managing access control, and preventing naming collisions in multi-tenant or large clusters.

Key characteristics:

  • Resource Scoping: Resources in one namespace are logically isolated from resources in other namespaces.
  • Access Control: You can apply Role-Based Access Control (RBAC) policies to specific namespaces, limiting who can access or modify resources within them.
  • Default Namespaces: Kubernetes comes with default namespaces: default, kube-system (for Kubernetes system components), kube-public, and kube-node-lease.

Code Example: Deploying into a Custom Namespace

First, create a new namespace:

kubectl create namespace dev-environment

Now, let’s deploy our Nginx Pod into this new namespace.

# nginx-pod-dev.yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-nginx-dev-pod
  namespace: dev-environment # Specify the namespace
  labels:
    app: nginx
spec:
  containers:
  - name: nginx-container
    image: nginx:latest
    ports:
    - containerPort: 80

Deploy the Pod:

kubectl apply -f nginx-pod-dev.yaml

Check pods in the dev-environment namespace:

kubectl get pods --namespace dev-environment
# or
kubectl get pods -n dev-environment

To see pods across all namespaces:

kubectl get pods --all-namespaces

Clean up:

kubectl delete -f nginx-pod-dev.yaml --namespace dev-environment
kubectl delete namespace dev-environment

Exercise/Mini-Challenge:

  1. Create two namespaces: team-alpha and team-beta.
  2. Deploy a simple Deployment with 3 replicas of nginx into team-alpha.
  3. Deploy a simple Deployment with 2 replicas of httpd into team-beta.
  4. Verify that you can only see the nginx pods when listing resources in team-alpha and httpd pods in team-beta.
  5. Try to delete a resource in team-alpha while your current context is set to team-beta (without specifying -n team-alpha). Observe the error.
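For step 5, the namespace used by your current context can be switched with kubectl config (team-beta being the namespace from the challenge):

kubectl config set-context --current --namespace=team-beta
kubectl config view --minify | grep namespace:   # confirm the active namespace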

2.1.5 ConfigMaps and Secrets

ConfigMaps and Secrets are Kubernetes objects used to store configuration data and sensitive information, respectively. They allow you to decouple configuration from your application code, making your applications more portable and easier to manage.

ConfigMaps

ConfigMaps store non-confidential data in key-value pairs. They are useful for storing environment variables, command-line arguments, or configuration files.

Key characteristics:

  • Non-sensitive data: Intended for general configuration, not sensitive information.
  • Flexible consumption: Can be consumed as environment variables, command-line arguments, or files mounted into Pods.

Code Example: ConfigMap for Nginx configuration

Let’s use a ConfigMap to provide a custom Nginx configuration.

# nginx-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-custom-config
data:
  nginx.conf: |
    server {
      listen 80;
      location / {
        root /usr/share/nginx/html;
        index index.html index.htm;
      }
      location /healthz {
        return 200 'OK';
        add_header Content-Type text/plain;
      }
    }    

Create the ConfigMap:

kubectl apply -f nginx-configmap.yaml

Now, deploy an Nginx Pod that uses this ConfigMap. We’ll mount the nginx.conf as a file.

# nginx-pod-with-configmap.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-config-pod
  labels:
    app: nginx-config
spec:
  containers:
  - name: nginx-container
    image: nginx:latest
    ports:
    - containerPort: 80
    volumeMounts:
    - name: nginx-config-volume
      mountPath: /etc/nginx/conf.d/
      readOnly: true
  volumes:
  - name: nginx-config-volume
    configMap:
      name: nginx-custom-config
      items:
      - key: nginx.conf
        path: default.conf # Mounts nginx.conf from ConfigMap as default.conf in the container

Deploy the Pod:

kubectl apply -f nginx-pod-with-configmap.yaml

You can now verify that the custom health endpoint /healthz works if you expose the Pod with a service. Clean up:

kubectl delete -f nginx-pod-with-configmap.yaml
kubectl delete -f nginx-configmap.yaml

Secrets

Secrets are similar to ConfigMaps but are designed for storing sensitive information like passwords, API keys, and certificates. Kubernetes provides mechanisms to keep Secrets secure (though they are not encrypted at rest by default in all Kubernetes distributions; AKS offers encryption at rest for etcd).

Key characteristics:

  • Sensitive data: Used for passwords, tokens, keys, etc.
  • Base64 encoded: By default, Secrets are base64 encoded when stored in etcd, but this is not encryption. Anyone with API access can decode them.
  • Mount as files or environment variables: Can be mounted as files into Pods or injected as environment variables. Mounting as files is generally preferred.

Code Example: Secret for a database password

Let’s create a Secret for a mock database password.

# Create base64 encoded strings
echo -n 'mysecretpassword' | base64
# Output will be something like: bXlzZWNyZXRwYXNzd29yZA==

Now put that encoded value into the Secret manifest:

# db-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-db-secret
type: Opaque # General-purpose Secret type
data:
  DB_PASSWORD: bXlzZWNyZXRwYXNzd29yZA== # Base64 encoded 'mysecretpassword'

Create the Secret:

kubectl apply -f db-secret.yaml
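Alternatively, the same Secret can be created imperatively; kubectl base64-encodes the value for you:

kubectl create secret generic my-db-secret --from-literal=DB_PASSWORD=mysecretpassword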

Now, let’s deploy a Pod that consumes this Secret as an environment variable.

# app-pod-with-secret.yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app-secret-pod
spec:
  containers:
  - name: my-app-container
    image: busybox:latest
    command: ["sh", "-c", "echo 'Application started...'; echo 'DB Password: '$DB_PASSWORD; sleep 3600"]
    env:
    - name: DB_PASSWORD
      valueFrom:
        secretKeyRef:
          name: my-db-secret
          key: DB_PASSWORD

Deploy the Pod:

kubectl apply -f app-pod-with-secret.yaml

Check the logs to see the environment variable being used:

kubectl logs my-app-secret-pod

You should see “DB Password: mysecretpassword”.
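As noted above, mounting Secrets as files is generally preferred over environment variables. A minimal sketch of the same Secret consumed as a file (the Pod name and mount path are illustrative):

# secret-file-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: secret-file-pod
spec:
  containers:
  - name: app
    image: busybox:latest
    command: ["sh", "-c", "cat /etc/secrets/DB_PASSWORD; sleep 3600"]
    volumeMounts:
    - name: secret-volume
      mountPath: /etc/secrets
      readOnly: true
  volumes:
  - name: secret-volume
    secret:
      secretName: my-db-secret

Each key in the Secret becomes a file under the mount path, so /etc/secrets/DB_PASSWORD contains the decoded password.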

Clean up:

kubectl delete -f app-pod-with-secret.yaml
kubectl delete -f db-secret.yaml

Exercise/Mini-Challenge (ConfigMaps and Secrets):

  1. Create a ConfigMap named app-settings with two keys: API_URL set to https://api.example.com and DEBUG_MODE set to true.
  2. Create an Opaque Secret named api-token-secret with a key API_TOKEN containing a base64 encoded mock API token (e.g., my-super-secret-token).
  3. Create a Pod that uses the alpine/git image, consumes app-settings as environment variables, and mounts api-token-secret as a file at /etc/secrets/token.
  4. Inside the Pod, try to cat the mounted secret file and print the environment variables.

2.2 Helm Core Concepts

2.2.1 Charts

A Helm Chart is a collection of files that describe a related set of Kubernetes resources. Think of it as a package for your Kubernetes application. Charts can deploy anything from a simple web app to a complex microservices architecture.

Structure of a Helm Chart: A typical Helm Chart has the following directory structure:

mychart/
  Chart.yaml          # A YAML file containing information about the chart
  values.yaml         # The default values for this chart's templates
  charts/             # A directory containing any dependent charts (subcharts)
  templates/          # A directory of templates that, when combined with values, generate Kubernetes manifest files
  templates/NOTES.txt # A short plain text document describing the chart's deployment

Chart.yaml: This file contains metadata about the Chart, such as its name, version, and API version.

# mychart/Chart.yaml
apiVersion: v2
name: mywebapp
description: A Helm chart for my web application
version: 0.1.0 # Chart version
appVersion: "1.0.0" # Version of the application it deploys

values.yaml: This file defines the default configuration values for your application. These values can be overridden during deployment.

# mychart/values.yaml
replicaCount: 1

image:
  repository: nginx
  tag: latest
  pullPolicy: IfNotPresent

service:
  type: ClusterIP
  port: 80

templates/: This directory contains Kubernetes manifest files (e.g., deployment.yaml, service.yaml) that are templated using Go template syntax. Helm processes these templates with the values provided.

Code Example: Creating a Simple Helm Chart

Let’s create a basic Helm Chart for our Nginx application.

  1. Create a new chart:

    helm create mynginxapp
    

    This command generates a boilerplate chart structure.

  2. Explore the generated files: Navigate into the mynginxapp directory and inspect Chart.yaml, values.yaml, and the files in templates/.

  3. Customize values.yaml (optional for this example): For this basic deploy, the default values.yaml should be fine. It will deploy an Nginx container.

  4. Install the chart:

    helm install my-nginx-release mynginxapp/
    

    my-nginx-release is the name given to this specific deployment of the chart.

  5. Verify the deployment:

    kubectl get pods -l app.kubernetes.io/instance=my-nginx-release
    kubectl get svc -l app.kubernetes.io/instance=my-nginx-release
    
  6. Upgrade the chart (e.g., change replica count): Edit mynginxapp/values.yaml and change replicaCount: 1 to replicaCount: 3.

    Now, upgrade the release:

    helm upgrade my-nginx-release mynginxapp/
    

    Verify the new replica count:

    kubectl get pods -l app.kubernetes.io/instance=my-nginx-release
    
  7. Rollback (if needed): If an upgrade goes wrong, you can roll back to a previous revision.

    helm history my-nginx-release
    # Note the REVISION number of the previous stable release (e.g., 1)
    helm rollback my-nginx-release 1
    
  8. Uninstall the chart:

    helm uninstall my-nginx-release
    

    This removes all resources associated with the release.
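While iterating on a chart, a few standard Helm commands are worth keeping at hand: helm lint checks the chart for common problems, helm template renders the manifests locally without touching the cluster, and helm list shows the releases currently installed.

helm lint mynginxapp/
helm template my-nginx-release mynginxapp/
helm list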

Exercise/Mini-Challenge:

  1. Create a new Helm Chart named my-backend-app.
  2. Modify its values.yaml to include an image.tag for ubuntu:latest and command and args to run a simple sleep 3600 command.
  3. Install the chart.
  4. Upgrade the chart to change the image.tag to ubuntu:22.04.
  5. List all installed Helm releases.

2.2.2 Templates and Values

Helm Charts are powerful because they use Go templating to create dynamic Kubernetes manifests.

  • Templates: Files in the templates/ directory use Go template syntax (e.g., {{ .Values.replicaCount }}) to inject values.
  • Values: Data supplied to the templates, primarily from values.yaml, but also from --set flags on the helm install or helm upgrade command, or separate -f values files.
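For example, values can be overridden at install or upgrade time without editing values.yaml (the values and file name here are illustrative):

helm install my-nginx-release mynginxapp/ --set replicaCount=2
helm upgrade my-nginx-release mynginxapp/ -f custom-values.yaml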

Code Example: Customizing a Chart with Templates and Values

Let’s customize the mynginxapp chart to add an environment variable.

  1. Edit mynginxapp/templates/deployment.yaml: Find the container section and add an env block:

      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          env:
            - name: MY_ENV_VAR
              value: "{{ .Values.myCustomEnvVar }}" # New environment variable
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
    
  2. Edit mynginxapp/values.yaml: Add the myCustomEnvVar key:

    replicaCount: 1
    
    image:
      repository: nginx
      tag: latest
      pullPolicy: IfNotPresent
    
    service:
      type: ClusterIP
      port: 80
    
    myCustomEnvVar: "Hello from Helm!" # Default value for the new env var
    
  3. Install or upgrade the chart:

    helm install my-nginx-custom mynginxapp/
    # or if already installed
    helm upgrade my-nginx-custom mynginxapp/
    
  4. Verify the environment variable: Get the name of your Nginx pod:

    kubectl get pods -l app.kubernetes.io/instance=my-nginx-custom -o custom-columns=NAME:.metadata.name --no-headers
    

    Then, exec into the pod and check environment variables:

    POD_NAME=$(kubectl get pods -l app.kubernetes.io/instance=my-nginx-custom -o custom-columns=NAME:.metadata.name --no-headers)
    kubectl exec -it $POD_NAME -- env | grep MY_ENV_VAR
    

    You should see MY_ENV_VAR=Hello from Helm!.

Clean up:

helm uninstall my-nginx-custom

Exercise/Mini-Challenge:

  1. In your my-backend-app chart, add a new ConfigMap template (templates/configmap.yaml).
  2. Define a key in values.yaml like config.message (e.g., config.message: "Welcome to my app!").
  3. Have the ConfigMap use {{ .Values.config.message }} as its data.
  4. Mount this ConfigMap as a file into your my-backend-app Pod and ensure the Pod’s command prints the content of the mounted file.
  5. Install the chart and verify.

2.2.3 Releases and History

When you install a Helm Chart, it creates a Release. A release is an instance of a chart running in a Kubernetes cluster. Helm tracks the state of each release, including its configuration and revisions.

  • Helm Releases: Each helm install or helm upgrade operation creates a new revision for a release.
  • History: You can view the history of a release, including past configurations and statuses, using helm history. This is crucial for debugging and understanding changes over time.

Code Example: (Covered in 2.2.1 Chart section with helm install, helm upgrade, helm history, helm rollback).

Exercise/Mini-Challenge:

  1. Install a simple chart (e.g., bitnami/nginx from a Helm repository).
  2. Perform two upgrades, changing a different value each time (e.g., replicaCount, then service.type).
  3. Check the helm history for the release.
  4. Rollback to the first revision.
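If you have not used a chart repository yet, the Bitnami repository referenced in step 1 is added like this:

helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
helm install my-bitnami-nginx bitnami/nginx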

3. Intermediate Topics

Now that you have a solid foundation, let’s explore more advanced aspects of Kubernetes and Helm with AKS.

3.1 Advanced Kubernetes Concepts

3.1.1 Ingress Controllers

While Services of type LoadBalancer expose a single application, Ingress manages external access to services in a cluster, typically HTTP/S. An Ingress resource defines rules for routing external HTTP/S traffic to internal cluster Services. An Ingress Controller (like Nginx Ingress Controller or Azure Application Gateway Ingress Controller - AGIC) is the actual component that watches the Ingress resources and acts upon them.

Why Ingress?

  • Single IP for multiple services: Expose multiple services through a single external IP address.
  • Host-based routing: Route traffic to different services based on the hostname (e.g., app1.example.com to Service A, app2.example.com to Service B).
  • Path-based routing: Route traffic based on URL paths (e.g., /api to Service A, /web to Service B).
  • SSL/TLS termination: Handle SSL certificates at the Ingress layer.

Code Example: Deploying Nginx Ingress Controller and an Ingress Resource

First, let’s install the Nginx Ingress Controller using Helm:

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install nginx-ingress ingress-nginx/ingress-nginx \
    --namespace ingress-basic --create-namespace \
    --set controller.replicaCount=2 \
    --set controller.nodeSelector."kubernetes\.io/os"=linux \
    --set defaultBackend.nodeSelector."kubernetes\.io/os"=linux

This will deploy the Nginx Ingress Controller and a LoadBalancer Service to expose it. Get the external IP of the Ingress Controller:

kubectl get svc -n ingress-basic nginx-ingress-ingress-nginx-controller

Note down the EXTERNAL-IP.

Now, let’s deploy a simple Nginx application and an Ingress resource to route traffic to it.

# app-with-ingress.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp-deployment
  labels:
    app: webapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: webapp-container
        image: nginx:latest
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: webapp-service
spec:
  selector:
    app: webapp
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  type: ClusterIP # Internal service
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: webapp-ingress
spec:
  ingressClassName: nginx # use the IngressClass created by the ingress-nginx chart
  rules:
  - host: myapp.example.com # Replace with your desired hostname (Ingress hosts must be DNS names)
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: webapp-service
            port:
              number: 80

Important: For myapp.example.com to work, you would typically configure your DNS provider to point myapp.example.com to the EXTERNAL-IP of your Nginx Ingress Controller LoadBalancer. For local testing, you can modify your hosts file (e.g., /etc/hosts on Linux/macOS or C:\Windows\System32\drivers\etc\hosts on Windows) to map the IP to the hostname.

Deploy the application and Ingress:

kubectl apply -f app-with-ingress.yaml

Now, if you access http://myapp.example.com (or the IP if you used hosts file), you should see the Nginx welcome page.
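Alternatively, you can test the routing without DNS or hosts-file changes by sending the Host header explicitly (substitute the EXTERNAL-IP you noted earlier):

curl -H "Host: myapp.example.com" http://<EXTERNAL-IP>/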

Clean up:

kubectl delete -f app-with-ingress.yaml
helm uninstall nginx-ingress --namespace ingress-basic

Exercise/Mini-Challenge:

  1. Install the Nginx Ingress Controller (if not already installed).
  2. Deploy two different web applications (e.g., nginx and httpd), each with its own ClusterIP Service.
  3. Create an Ingress resource that routes traffic to nginx.example.com to the Nginx service and httpd.example.com to the Httpd service.
  4. Verify routing using curl or by modifying your hosts file.

3.1.2 Persistent Volumes and Persistent Volume Claims

Containers are ephemeral by nature, meaning any data stored inside them is lost when the container restarts or is deleted. For stateful applications (like databases), you need a way to store data persistently. Kubernetes addresses this with Persistent Volumes (PVs) and Persistent Volume Claims (PVCs).

  • Persistent Volume (PV): A piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned by a storage class. It’s a cluster resource, independent of a Pod’s lifecycle.
  • Persistent Volume Claim (PVC): A request for storage by a user. It consumes PV resources. Pods then use PVCs to access the storage.

In AKS, Azure provides dynamic provisioning of storage through Storage Classes. When you create a PVC, AKS can automatically provision an Azure Disk or Azure Files resource.
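You can check which StorageClasses your cluster offers, and which one is marked as the default, before writing a PVC:

kubectl get storageclass

In AKS you will typically see Azure Disk and Azure Files backed classes (for example default and managed-csi, though the exact names depend on the cluster version).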

Code Example: Nginx with Persistent Storage

Let’s deploy an Nginx application that serves content from a Persistent Volume.

# nginx-pvc-deployment.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-pvc
spec:
  accessModes:
    - ReadWriteOnce # Can be mounted as read-write by a single node
  resources:
    requests:
      storage: 1Gi # Request 1 Gigabyte of storage
  storageClassName: default # Use the default StorageClass for Azure Disks
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-pv-deployment
  labels:
    app: nginx-pv
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-pv
  template:
    metadata:
      labels:
        app: nginx-pv
    spec:
      volumes:
      - name: nginx-persistent-storage
        persistentVolumeClaim:
          claimName: nginx-pvc
      containers:
      - name: nginx-container
        image: nginx:latest
        ports:
        - containerPort: 80
        volumeMounts:
        - name: nginx-persistent-storage
          mountPath: /usr/share/nginx/html # Mount the PV to serve HTML content
        lifecycle:
          postStart: # Populate some content after container starts
            exec:
              command: ["/bin/sh", "-c", "echo '<h1>Hello from Persistent Volume!</h1>' > /usr/share/nginx/html/index.html"]
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-pv-service
spec:
  selector:
    app: nginx-pv
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  type: LoadBalancer

Deploy the resources:

kubectl apply -f nginx-pvc-deployment.yaml

Check the PVC and Deployment status. Once the LoadBalancer gets an IP, access it to see “Hello from Persistent Volume!”.

Even if you delete and recreate the nginx-pv-deployment (but not the PVC), the content will persist.

kubectl delete deployment nginx-pv-deployment
# Wait for pods to terminate
kubectl apply -f nginx-pvc-deployment.yaml # Deploy again, the data is still there

Clean up:

kubectl delete -f nginx-pvc-deployment.yaml

Important: Deleting the PVC will also delete the underlying Azure Disk unless the StorageClass’s reclaimPolicy is set to Retain. The default StorageClass in AKS uses a reclaimPolicy of Delete.

Exercise/Mini-Challenge:

  1. Create a PVC requesting 2Gi of storage.
  2. Deploy a Pod running ubuntu that mounts this PVC to /data.
  3. Inside the Pod, create a file message.txt with some content in /data.
  4. Delete the Pod, then create a new Pod that mounts the same PVC to /data.
  5. Verify that message.txt still exists in the new Pod.
  6. Delete the Pod and PVC.

3.1.3 Horizontal Pod Autoscaler (HPA)

The Horizontal Pod Autoscaler (HPA) automatically scales the number of Pod replicas in a Deployment or ReplicaSet based on observed CPU utilization or other select metrics. This ensures your application can handle varying loads efficiently without manual intervention.

How HPA works:

  • HPA continuously monitors the specified metrics (e.g., average CPU utilization) of the Pods targeted by a Deployment.
  • If the metrics exceed a predefined threshold, HPA increases the number of Pod replicas.
  • If the metrics fall below the threshold, HPA decreases the number of Pod replicas.

Prerequisites: For CPU/Memory-based HPA, your Pods must have resource requests defined. For custom metrics, you need a metrics server deployed (which is typically available in AKS by default).
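You can confirm that the metrics pipeline is working with kubectl top, which reads from the metrics server:

kubectl top nodes
kubectl top pods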

Code Example: HPA for Nginx Deployment

Let’s configure HPA for our Nginx Deployment to scale based on CPU utilization.

# nginx-deployment-hpa.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hpa-nginx-deployment
  labels:
    app: hpa-nginx
spec:
  replicas: 1 # Start with 1 replica
  selector:
    matchLabels:
      app: hpa-nginx
  template:
    metadata:
      labels:
        app: hpa-nginx
    spec:
      containers:
      - name: nginx-container
        image: nginx:latest
        resources:
          requests:
            cpu: "100m" # Request 100 millicores of CPU
          limits:
            cpu: "200m" # Limit to 200 millicores
        ports:
        - containerPort: 80
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-nginx-deployment
  minReplicas: 1
  maxReplicas: 5 # Scale up to 5 replicas
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50 # Target 50% CPU utilization

Deploy the resources:

kubectl apply -f nginx-deployment-hpa.yaml

Check HPA status:

kubectl get hpa

Initially, it will show 1 replica.

To simulate load, you can use a busybox Pod to continuously hit the Nginx service (if exposed, or use a port-forward). For simplicity, we won’t show load generation here, but you would see the replica count increase if CPU utilization goes above 50%.

Clean up:

kubectl delete -f nginx-deployment-hpa.yaml

Exercise/Mini-Challenge:

  1. Deploy a simple php-apache application (e.g., kubernetes/examples/hpa/php-apache). Ensure the Deployment has CPU requests defined.
  2. Create an HPA that targets this Deployment, with minReplicas: 1, maxReplicas: 10, and targetCPUUtilizationPercentage: 50.
  3. Simulate load on the php-apache service. You can use a busybox pod:
    kubectl run -it --rm load-generator --image=busybox:latest -- /bin/sh
    # Inside busybox:
    # while true; do wget -q -O- http://php-apache; done
    
  4. Observe the HPA increasing the replica count using kubectl get hpa -w.
  5. Stop the load generator and observe the HPA scaling down.

3.1.4 Network Policies

Network Policies allow you to define rules for how Pods communicate with each other and with external network endpoints. By default, Pods are non-isolated and accept traffic from any source; Network Policies enable a zero-trust approach by enforcing communication segmentation. Note that policies are only enforced when the cluster runs a network policy engine: on AKS you choose one at cluster creation (for example, az aks create ... --network-policy azure or calico). Without an engine, NetworkPolicy objects are accepted by the API server but have no effect.

Key concepts:

  • Isolation: A Pod becomes isolated for a given traffic direction once at least one Network Policy selects it; Pods not selected by any policy remain non-isolated.
  • Ingress/Egress rules: Define rules for incoming (Ingress) and outgoing (Egress) traffic.
  • Selectors: Use Pod selectors and Namespace selectors to specify which Pods or Namespaces the policy applies to.

Code Example: Restricting Nginx access

Let’s restrict our Nginx application to only be accessible from Pods with a specific label.

First, create a test-app Deployment that will try to access Nginx.

# restricted-nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: restricted-nginx-deployment
  labels:
    app: restricted-nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: restricted-nginx
  template:
    metadata:
      labels:
        app: restricted-nginx
    spec:
      containers:
      - name: nginx-container
        image: nginx:latest
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: restricted-nginx-service
spec:
  selector:
    app: restricted-nginx
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-client-deployment
  labels:
    app: test-client
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-client
  template:
    metadata:
      labels:
        app: test-client
    spec:
      containers:
      - name: busybox-container
        image: busybox:latest
        command: ["sh", "-c", "sleep 3600"]

Deploy these:

kubectl apply -f restricted-nginx.yaml

Get the cluster IP of the restricted-nginx-service:

kubectl get svc restricted-nginx-service

Note the CLUSTER-IP.

Now, exec into the test-client pod and try to wget the Nginx service:

CLIENT_POD=$(kubectl get pods -l app=test-client -o custom-columns=NAME:.metadata.name --no-headers)
NGINX_IP=$(kubectl get svc restricted-nginx-service -o jsonpath='{.spec.clusterIP}')
kubectl exec -it $CLIENT_POD -- wget -O- -T 2 http://$NGINX_IP

It should succeed (you’ll see the Nginx welcome page).

Now, let’s apply a Network Policy to isolate Nginx.

# nginx-network-policy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-specific-app
  namespace: default # Policy applies to pods in the default namespace
spec:
  podSelector:
    matchLabels:
      app: restricted-nginx # This policy applies to pods with label app: restricted-nginx
  policyTypes:
    - Ingress # Only applies to Ingress traffic
  ingress:
    - from:
      - podSelector:
          matchLabels:
            app: test-client # Only allow traffic from pods with label app: test-client

Apply the policy:

kubectl apply -f nginx-network-policy.yaml

Wait a few seconds for the policy to take effect. Then, run the wget again from the test-client pod:

kubectl exec -it $CLIENT_POD -- wget -O- -T 2 http://$NGINX_IP

This request should still succeed, because the test-client pod carries the app: test-client label that the policy’s from clause explicitly allows. Any pod without that label is now denied ingress to the Nginx pods.
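To watch the policy actually deny traffic, repeat the request from a throwaway pod that does not carry the allowed label (the pod name is arbitrary):

kubectl run unlabeled-client --rm -it --image=busybox:latest --restart=Never -- wget -O- -T 2 http://$NGINX_IP

This request should time out, because the pod does not match the podSelector in the policy's from clause.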

Clean up:

kubectl delete -f nginx-network-policy.yaml
kubectl delete -f restricted-nginx.yaml

Exercise/Mini-Challenge:

  1. Deploy two Deployments: frontend (with label app: frontend) and backend (with label app: backend). Each should have a ClusterIP Service.
  2. Deploy a busybox Pod (label app: admin-tool).
  3. Create a Network Policy that allows frontend Pods to communicate with backend Pods on a specific port (e.g., 8080) and allows the admin-tool Pod to communicate with backend Pods on a different port (e.g., 9000), but no other ingress traffic to backend.
  4. Test connectivity from frontend and admin-tool Pods to backend, and try from a third, unprivileged Pod to verify the policy.

3.1.5 RBAC (Role-Based Access Control)

Role-Based Access Control (RBAC) is a method of regulating access to computer or network resources based on the roles of individual users within an organization. In Kubernetes, RBAC allows you to define who can do what to which resources.

Key concepts:

  • Role: Defines permissions within a specific namespace.
  • ClusterRole: Defines permissions across the entire cluster.
  • RoleBinding: Grants the permissions defined in a Role to a user or ServiceAccount within a specific namespace.
  • ClusterRoleBinding: Grants the permissions defined in a ClusterRole to a user or ServiceAccount across the entire cluster.
  • ServiceAccount: An identity used by processes running in Pods. Pods that access the Kubernetes API do so using a ServiceAccount.

In AKS, you typically integrate Kubernetes RBAC with Azure Active Directory (Azure AD) RBAC. This allows you to manage Kubernetes permissions using your existing Azure AD identities.

Code Example: Granting Read-Only Access to a Namespace

  1. Create a new namespace and a test user/group in Azure AD: This step is external to Kubernetes manifests. You would create an Azure AD Group (e.g., aks-dev-readers) and add Azure AD users to it. (For this example, we’ll assume an Azure AD group ID exists, or you can simulate by creating a new ServiceAccount.)

  2. Create a Role for read-only access:

    # readonly-role.yaml
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: pod-reader
      namespace: dev-environment # Define role in 'dev-environment' namespace
    rules:
    - apiGroups: [""] # "" indicates the core API group
      resources: ["pods", "pods/log"]
      verbs: ["get", "watch", "list"]
    - apiGroups: ["apps"]
      resources: ["deployments"]
      verbs: ["get", "watch", "list"]
    
  3. Create a ServiceAccount and RoleBinding: Let’s create a ServiceAccount and bind the pod-reader role to it.

    # readonly-serviceaccount-rolebinding.yaml
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: dev-reader-sa
      namespace: dev-environment
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: dev-reader-binding
      namespace: dev-environment
    subjects:
    - kind: ServiceAccount
      name: dev-reader-sa
      namespace: dev-environment
    roleRef:
      kind: Role
      name: pod-reader
      apiGroup: rbac.authorization.k8s.io

    Deploy these:

    kubectl create namespace dev-environment
    kubectl apply -f readonly-role.yaml
    kubectl apply -f readonly-serviceaccount-rolebinding.yaml

  4. Test with the ServiceAccount: You can simulate using this ServiceAccount by creating a Pod that uses it.

    # test-pod-with-sa.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: test-sa-pod
      namespace: dev-environment
    spec:
      serviceAccountName: dev-reader-sa # Assign the ServiceAccount
      containers:
      - name: busybox-container
        image: busybox:latest
        command: ["sh", "-c", "sleep 3600"]
    

    Deploy the pod: kubectl apply -f test-pod-with-sa.yaml

    Now, from your local machine, try to list deployments as dev-reader-sa (this is an advanced kubectl trick for testing).

    # Get the token for the ServiceAccount
    SA_TOKEN=$(kubectl create token dev-reader-sa -n dev-environment --duration 1h)
    
    # Use the token to list deployments in dev-environment
    kubectl get deployments -n dev-environment --token=$SA_TOKEN
    # This should succeed, as the 'pod-reader' role allows listing deployments.
    
    # Try to delete a deployment (should fail due to insufficient permissions)
    kubectl delete deployment -n dev-environment my-deployment --token=$SA_TOKEN
    # You should get an "Error from server (Forbidden)" message.
    

    This demonstrates the concept of restricted access. For actual Azure AD integration, you’d map Azure AD groups to ClusterRoleBindings.

Clean up:

kubectl delete -f test-pod-with-sa.yaml -n dev-environment
kubectl delete -f readonly-serviceaccount-rolebinding.yaml -n dev-environment
kubectl delete -f readonly-role.yaml -n dev-environment
kubectl delete namespace dev-environment

Exercise/Mini-Challenge:

  1. Create a namespace hr-app.
  2. Create a ServiceAccount named hr-operator in hr-app.
  3. Define a Role called hr-app-full-access in the hr-app namespace that grants create, get, list, update, delete, and watch verbs on pods and deployments.
  4. Bind the hr-operator ServiceAccount to the hr-app-full-access Role.
  5. Try to list pods in hr-app using the hr-operator ServiceAccount token (similar to the example above).
  6. Try to list pods in kube-system using the hr-operator ServiceAccount token (should fail).

3.2 Advanced Helm Concepts

3.2.1 Chart Dependencies (Subcharts)

Complex applications often consist of multiple components. Helm allows you to manage these components as subcharts within a parent chart. This enables modularity and reusability.

Key characteristics:

  • Modularization: Break down large applications into smaller, manageable charts.
  • Reusability: Use existing, stable charts (e.g., from Bitnami) as dependencies.
  • Dependencies: Defined in the Chart.yaml of the parent chart.

Code Example: Application with a Database Subchart

Let’s create a parent chart (my-full-app) that depends on a database subchart (e.g., Bitnami’s PostgreSQL).

  1. Create the parent chart:

    helm create my-full-app
    cd my-full-app
    
  2. Add the dependency in Chart.yaml: Edit my-full-app/Chart.yaml and add a dependencies section:

    # my-full-app/Chart.yaml
    apiVersion: v2
    name: my-full-app
    description: A Helm chart for my full application with a database
    version: 0.1.0
    appVersion: "1.0.0"
    
    dependencies:
      - name: postgresql
        version: 12.x.x # Use a specific major version, check Bitnami repo for latest
        repository: https://charts.bitnami.com/bitnami
        condition: postgresql.enabled # Enable/disable subchart with values.yaml
    

    (Note: Check the Bitnami PostgreSQL chart for its latest stable version.)

  3. Update Helm dependencies: This downloads the subchart into the charts/ directory.

    helm dependency update .
    

    You should now see charts/postgresql-12.x.x.tgz.
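    To confirm which versions were actually resolved, list the chart's dependencies:

    helm dependency list .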

  4. Configure subchart values (optional for this example): You can override default values of the subchart by adding them under the subchart’s name in my-full-app/values.yaml. For example, to set the PostgreSQL password:

    # my-full-app/values.yaml
    replicaCount: 1
    
    image:
      repository: nginx
      tag: latest
      pullPolicy: IfNotPresent
    
    service:
      type: ClusterIP
      port: 80
    
    # PostgreSQL subchart specific values
    postgresql:
      enabled: true # Explicitly enable the subchart
      auth:
        postgresPassword: "mysecretpostgrespassword"
    
  5. Install the parent chart:

    helm install my-full-stack my-full-app/
    
  6. Verify both components:

    kubectl get pods -l app.kubernetes.io/instance=my-full-stack
    kubectl get svc -l app.kubernetes.io/instance=my-full-stack
    

    You should see both your Nginx pod (from the parent chart) and the PostgreSQL pod (from the subchart).

Clean up:

helm uninstall my-full-stack

Exercise/Mini-Challenge:

  1. Create a parent Helm Chart called ecommerce-stack.
  2. Add mongodb (from Bitnami) and redis (from Bitnami) as subchart dependencies in Chart.yaml.
  3. Update the dependencies using helm dependency update.
  4. In values.yaml of ecommerce-stack, configure custom passwords for both MongoDB and Redis subcharts.
  5. Install ecommerce-stack and verify that all pods and services are running.

3.2.2 Conditional Logic and Loops in Templates

Helm’s Go templating language allows for powerful conditional logic (if/else) and iteration (range) to create flexible and dynamic manifests.

Use cases:

  • Conditional resource creation: Create a resource only if a certain value is set (e.g., if .Values.ingress.enabled).
  • Dynamic configuration: Generate multiple environment variables or port mappings based on a list in values.yaml.

Code Example: Conditional Ingress and Multiple Ports

Let’s modify my-full-app to conditionally create an Ingress and expose multiple ports if defined.

  1. Edit my-full-app/values.yaml: Add ingress.enabled and service.additionalPorts.

    # my-full-app/values.yaml
    # ... other values ...
    
    ingress:
      enabled: false
      host: myapp.local
    
    service:
      type: ClusterIP
      port: 80
      additionalPorts: # List of additional ports
        - name: https
          port: 443
          targetPort: 443
          protocol: TCP
        - name: metrics
          port: 9090
          targetPort: 9090
          protocol: TCP
    
  2. Edit my-full-app/templates/service.yaml: Add a range loop to create additional ports.

    # my-full-app/templates/service.yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: {{ include "my-full-app.fullname" . }}
      labels:
        {{- include "my-full-app.labels" . | nindent 4 }}
    spec:
      type: {{ .Values.service.type }}
      ports:
        - port: {{ .Values.service.port }}
          targetPort: http
          protocol: TCP
          name: http
        {{- range .Values.service.additionalPorts }}
        - name: {{ .name }}
          port: {{ .port }}
          targetPort: {{ .targetPort }}
          protocol: {{ .protocol }}
        {{- end }}
      selector:
        {{- include "my-full-app.selectorLabels" . | nindent 4 }}
    
  3. Create my-full-app/templates/ingress.yaml (new file): Use an if block to conditionally create the Ingress.

    # my-full-app/templates/ingress.yaml
    {{- if .Values.ingress.enabled }}
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: {{ include "my-full-app.fullname" . }}
      labels:
        {{- include "my-full-app.labels" . | nindent 4 }}
      {{- with .Values.ingress.annotations }}
      annotations:
        {{- toYaml . | nindent 4 }}
      {{- end }}
    spec:
      rules:
        - host: {{ .Values.ingress.host }}
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: {{ include "my-full-app.fullname" . }}
                    port:
                      number: {{ .Values.service.port }}
    {{- end }}
    

    (For this Ingress to work, you might need an Ingress Controller installed separately, like Nginx Ingress Controller.)

  4. Install the chart with Ingress enabled:

    helm install my-conditional-app my-full-app/ --set ingress.enabled=true --set ingress.host=myconditionalapp.local
    
  5. Verify: Check the Service for multiple ports, and the Ingress resource for conditional creation.

    kubectl get svc my-conditional-app
    kubectl get ing my-conditional-app
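    To inspect exactly what Helm rendered for the release (useful when debugging template conditionals), dump the stored manifests:

    helm get manifest my-conditional-app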
    

Clean up:

helm uninstall my-conditional-app

Exercise/Mini-Challenge:

  1. In your ecommerce-stack chart, modify the deployment.yaml template.
  2. Add a conditional if block that, if app.envVars.enabled is true in values.yaml, iterates through a list app.envVars.list and creates environment variables for your application Pod.
  3. Test by installing the chart with app.envVars.enabled: true and a few custom environment variables, then with app.envVars.enabled: false.

4. Advanced Topics and Best Practices

This section delves into production-ready patterns, operations, and security for your Helm and Kubernetes deployments on AKS.

4.1 Infrastructure as Code (IaC) with Terraform for AKS

Infrastructure as Code (IaC) is the practice of managing and provisioning infrastructure through machine-readable definition files, rather than physical hardware configuration or interactive configuration tools. Terraform is a popular IaC tool that allows you to define and manage your Azure resources, including AKS clusters, declaratively.

Why use Terraform for AKS?

  • Reproducibility: Create identical environments reliably.
  • Version Control: Track infrastructure changes in Git.
  • Automation: Automate the provisioning of complex infrastructure.
  • State Management: Terraform keeps a state file to map real-world resources to your configuration.

Key Terraform resources for AKS:

  • azurerm_resource_group: To manage Azure Resource Groups.
  • azurerm_kubernetes_cluster: To provision the AKS cluster itself.
  • azurerm_kubernetes_cluster_node_pool: To manage additional node pools.
  • azurerm_container_registry: For Azure Container Registry.

Code Example: Provisioning AKS with Terraform

  1. Create a Terraform project directory:

    mkdir azure-aks-infra
    cd azure-aks-infra
    
  2. main.tf: Define the Azure provider and resources.

    # main.tf
    terraform {
      required_providers {
        azurerm = {
          source = "hashicorp/azurerm"
          version = "~> 3.0" # Pin to a major version
        }
      }
      backend "azurerm" { # Configure remote state storage in Azure Storage Account
        resource_group_name  = "tfstate-rg"
        storage_account_name = "tfstateuksouth2025"
        container_name       = "tfstate"
        key                  = "aks.terraform.tfstate"
      }
    }
    
    provider "azurerm" {
      features {}
    }
    
    resource "azurerm_resource_group" "aks_rg" {
      name     = "my-aks-terraform-rg"
      location = "East US"
    }
    
    resource "azurerm_kubernetes_cluster" "aks_cluster" {
      name                = "my-terraform-aks"
      location            = azurerm_resource_group.aks_rg.location
      resource_group_name = azurerm_resource_group.aks_rg.name
      dns_prefix          = "myterraformaks"
    
      default_node_pool {
        name       = "systempool"
        node_count = 2
        vm_size    = "Standard_DS2_v2"
        # only_critical_addons_enabled = true # Recommended for production
      }
    
      identity {
        type = "SystemAssigned"
      }
    
      tags = {
        environment = "dev"
        managed_by  = "terraform"
      }
    
      # Optional: for production, consider setting the cluster auto-upgrade channel
      # (the argument is named automatic_channel_upgrade in the 3.x provider pinned above):
      # automatic_channel_upgrade = "stable"
      # Features such as Deployment Safeguards and node auto-provisioning are also worth
      # evaluating for production; check the current azurerm provider docs for how to enable them.
    
      # Optional: Private Cluster for enhanced security
      # private_cluster_enabled = true
      # private_dns_zone_id = "..." # Reference existing private DNS zone
    }
    
    output "aks_cluster_name" {
      value       = azurerm_kubernetes_cluster.aks_cluster.name
      description = "The name of the AKS cluster."
    }
    
    output "aks_kube_config" {
      value       = azurerm_kubernetes_cluster.aks_cluster.kube_config_raw
      sensitive   = true # Mark as sensitive to prevent plain text output
      description = "The raw Kubernetes configuration for the AKS cluster."
    }
    
  3. Initialize Terraform and create state storage (if not already done): Before terraform init, you need an Azure Storage Account and container for the remote state. Storage account names must be globally unique, so replace tfstateuksouth2025 with your own name here and in main.tf.

    # Create resource group for state
    az group create --name tfstate-rg --location eastus
    # Create storage account
    az storage account create --name tfstateuksouth2025 --resource-group tfstate-rg --location eastus --sku Standard_LRS
    # Create storage container
    az storage container create --name tfstate --account-name tfstateuksouth2025
    

    Now, initialize Terraform:

    terraform init
    
  4. Plan and Apply:

    terraform plan
    terraform apply --auto-approve # --auto-approve for automated pipelines, otherwise manually type 'yes'
    

    This will provision your AKS cluster.

  5. Get Kubeconfig: After apply, you can use the outputted kubeconfig:

    # Store the kube_config_raw output into a file
    terraform output -raw aks_kube_config > kubeconfig_aks
    # Set KUBECONFIG environment variable to use it
    export KUBECONFIG=$(pwd)/kubeconfig_aks
    # Test connection
    kubectl get nodes
    
  6. Clean up:

    terraform destroy --auto-approve
    az group delete --name tfstate-rg --yes --no-wait # Delete the resource group for state storage
    

Best Practices for Terraform and AKS:

  • Module Usage: For complex setups, use official Terraform modules (e.g., Azure/aks/azurerm) to create well-architected clusters.
  • State Management: Always use remote state (like Azure Storage Blob) and state locking to enable collaboration and prevent concurrent modifications.
  • Version Pinning: Pin your azurerm provider version to prevent unexpected changes due to new provider versions.
  • Prevent Destroy: For production clusters, use lifecycle { prevent_destroy = true } on azurerm_kubernetes_cluster to prevent accidental deletion.
  • Private Clusters: Enable private_cluster_enabled for enhanced security where appropriate.
  • Managed Identities: Leverage Managed Identities for AKS and your applications to interact with other Azure services securely.

Exercise/Mini-Challenge:

  1. Modify the main.tf to add an additional azurerm_kubernetes_cluster_node_pool named apppool with node_count = 1 and a different vm_size (e.g., Standard_D2_v3).
  2. Set only_critical_addons_enabled = true on the default_node_pool.
  3. Deploy the infrastructure and verify the two node pools.
  4. Implement automatic_upgrade_channel = "stable" in your azurerm_kubernetes_cluster resource.

4.2 Logging, Debugging, and Tracing

Observability is crucial for understanding the health and performance of your applications and infrastructure.

4.2.1 Logging with Azure Monitor and Container Insights

Logging provides a record of events happening within your applications and Kubernetes cluster. Azure Monitor Container Insights is a feature of Azure Monitor that monitors the performance of container workloads deployed to AKS. It collects metrics and logs from containers, nodes, and the control plane.

Key features:

  • Automatic collection: Collects logs, metrics, and events from AKS automatically.
  • Log Analytics Workspace: Stores collected data in a centralized Log Analytics Workspace for querying and analysis.
  • Pre-built dashboards: Provides out-of-the-box dashboards for cluster health, node performance, and container activity.
  • Kubelet logs: You can also get kubelet logs for node-level troubleshooting.

Enabling Container Insights (the Azure portal enables monitoring by default when you create an AKS cluster; with the CLI you opt in): You can enable it during cluster creation or afterwards:

az aks create --resource-group myAKSResourceGroup --name myAKSCluster --enable-managed-identity --enable-addons monitoring
# Or for existing cluster:
az aks enable-addons --addons monitoring --name myAKSCluster --resource-group myAKSResourceGroup

Viewing Logs in Azure Portal:

  1. Navigate to your AKS cluster in the Azure portal.
  2. Under “Monitoring”, click “Insights”.
  3. Explore the various views (Cluster, Nodes, Controllers, Containers) to see performance metrics and logs.
  4. Use “Logs” (or “Log Analytics Workspace” directly) to write Kusto Query Language (KQL) queries.

Example KQL Queries:

  • Get container logs:
    ContainerLogV2
    | where TimeGenerated > ago(1h)
    | order by TimeGenerated desc
    
  • Get container CPU usage (Container Insights stores performance counters in the Perf table):
    Perf
    | where ObjectName == "K8SContainer" and CounterName == "cpuUsageNanoCores"
    | where TimeGenerated > ago(1h)
    | summarize MaxCpuNanoCores = max(CounterValue) by InstanceName
    | order by MaxCpuNanoCores desc
    
    
  • Kubelet Logs: For node-level troubleshooting, start a debug container on the node (kubectl debug node/<node-name> -it --image=mcr.microsoft.com/aks/fundamental/base-ubuntu:v0.0.12), then run chroot /host journalctl -u kubelet -o cat. Alternatively, query the node proxy directly: kubectl get --raw "/api/v1/nodes/<node-name>/proxy/logs/messages" | grep kubelet.

Best Practices for Logging:

  • Centralized Logging: Always send your application logs to a centralized logging solution (Azure Monitor, Splunk, ELK, Grafana Loki).
  • Structured Logging: Use JSON or other structured formats for application logs to make them easier to parse and query.
  • Log Levels: Implement appropriate log levels (DEBUG, INFO, WARN, ERROR) in your applications.
  • Sensitive Data: Avoid logging sensitive information.

Exercise/Mini-Challenge:

  1. Deploy a simple application that logs messages (e.g., “INFO: Application started” every 5 seconds); a starting-point sketch follows this list.
  2. Enable Container Insights on your AKS cluster if it’s not already.
  3. Go to the Log Analytics Workspace linked to your AKS cluster and write a KQL query to filter for your application’s logs.
  4. Modify the application to log an “ERROR: Something went wrong!” message occasionally and observe it in your logs.
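
For step 1 of this challenge, a minimal logger is enough to produce searchable entries in ContainerLogV2. The sketch below is only a starting point; the chatty-logger name and busybox image are arbitrary choices, not part of the guide's sample code.

# chatty-logger.yaml (starting-point sketch)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: chatty-logger
spec:
  replicas: 1
  selector:
    matchLabels:
      app: chatty-logger
  template:
    metadata:
      labels:
        app: chatty-logger
    spec:
      containers:
      - name: logger
        image: busybox:latest
        # Emit an INFO line every 5 seconds so it shows up in ContainerLogV2
        command: ["sh", "-c", "while true; do echo 'INFO: Application started'; sleep 5; done"]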

4.2.2 Debugging

Debugging involves identifying and resolving issues in your applications and Kubernetes environment.

Common Kubernetes debugging steps:

  • kubectl get pods, kubectl describe pod <pod-name>: Check pod status, events, and configuration.
  • kubectl logs <pod-name>: View container logs.
  • kubectl exec -it <pod-name> -- /bin/sh: Get a shell into a running container for interactive debugging.
  • kubectl port-forward <pod-name> <local-port>:<container-port>: Access a service running inside a Pod from your local machine.
  • kubectl debug: A powerful command for creating ephemeral debug containers, especially useful for debugging distroless images or containers without a shell.
    kubectl debug -it <pod-name> --image=ubuntu:latest --target=<container-name>
    
    This adds an ephemeral debug container to the Pod; --target shares the process namespace of the named app container, so you can inspect its processes from a full OS environment.
  • Inspect events: kubectl get events --all-namespaces can reveal scheduling issues, image pull failures, or other cluster-level problems.
  • Network troubleshooting: Use ping, curl, or netstat from within a busybox or debug Pod to diagnose connectivity issues (a throwaway Pod for this is sketched below).
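
For the network-troubleshooting bullet above, a disposable Pod is often all you need. The sketch below is one way to do it; the netdebug name is arbitrary, and busybox only ships basic tools (images such as nicolaka/netshoot carry a fuller toolset if you need it).

# netdebug.yaml (sketch of a disposable troubleshooting Pod)
apiVersion: v1
kind: Pod
metadata:
  name: netdebug
spec:
  restartPolicy: Never
  containers:
  - name: tools
    image: busybox:latest
    # Keep the Pod alive so you can exec into it
    command: ["sleep", "3600"]

Exec into it with kubectl exec -it netdebug -- sh, run your ping/wget tests, and delete it with kubectl delete pod netdebug when you are done.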

Debugging Pending Pods Example: If a Pod is stuck in Pending, kubectl describe pod is your first stop; look at the Events section. Common reasons include insufficient resources (CPU/memory), node selectors or affinity rules that match no node, and unbound Persistent Volume Claims.

Best Practices for Debugging:

  • Start small: Isolate the problem to the smallest possible component (Pod, Service, Ingress).
  • Check logs first: Application logs and Kubernetes events are often the quickest way to find clues.
  • Reproduce consistently: Try to reproduce the issue in a development environment.
  • Use ephemeral debug containers: Leverage kubectl debug for efficient in-cluster troubleshooting.

Exercise/Mini-Challenge:

  1. Create a Deployment that requests more CPU than any single node in your AKS cluster has (e.g., 4000m on a 2-core node); a sketch follows this list.
  2. Observe the Pods in a Pending state.
  3. Use kubectl describe deployment and kubectl describe pod <pending-pod> to identify the reason for the FailedScheduling error.
  4. Correct the CPU request in your Deployment manifest and redeploy to resolve the issue.
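
For step 1, a Deployment along these lines should stay Pending on Standard_DS2_v2 (2 vCPU) nodes; the cpu-hog name and nginx image are placeholders.

# cpu-hog.yaml (sketch: request deliberately larger than any node)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-hog
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cpu-hog
  template:
    metadata:
      labels:
        app: cpu-hog
    spec:
      containers:
      - name: app
        image: nginx:latest
        resources:
          requests:
            cpu: "4000m"   # More CPU than a 2-core node can offer

kubectl describe pod on the Pending Pod should then show a FailedScheduling event reporting insufficient cpu.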

4.2.3 Tracing

Tracing helps you understand the flow of requests through complex distributed systems (microservices). It visualizes how different services interact and where latency occurs.

Key concepts:

  • Spans: A unit of work within a trace, representing an operation (e.g., API call, database query).
  • Traces: A collection of spans that represent a single request’s journey through the system.
  • OpenTelemetry: A vendor-neutral set of APIs, SDKs, and tools to instrument, generate, collect, and export telemetry data (metrics, logs, and traces).
  • Azure Application Insights: An Application Performance Management (APM) service that can collect and visualize traces (among other telemetry).

Implementing Tracing:

  1. Instrument your applications: Use OpenTelemetry SDKs in your application code to generate trace data.
  2. Deploy an OpenTelemetry Collector: This component collects trace data from your applications and exports it to a tracing backend.
  3. Choose a tracing backend: Integrate with services like Azure Application Insights, Jaeger, or Zipkin to visualize traces.

Example (Conceptual): Application Instrumentation

# app.py (Python example with OpenTelemetry)
from opentelemetry import trace
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor  # BatchSpanProcessor is preferred outside local testing
from flask import Flask

# Configure TracerProvider
resource = Resource.create({"service.name": "my-web-app"})
provider = TracerProvider(resource=resource)
trace.set_tracer_provider(provider)

# Configure Jaeger Exporter (or Azure Application Insights exporter)
jaeger_exporter = JaegerExporter(
    agent_host_name="otel-collector.monitoring", # Or the IP of your collector
    agent_port=6831,
)
provider.add_span_processor(SimpleSpanProcessor(jaeger_exporter))

tracer = trace.get_tracer(__name__)

app = Flask(__name__)

@app.route("/")
def hello_world():
    with tracer.start_as_current_span("hello-request"):
        return "Hello, Traced World!"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=80)

You would then deploy an OpenTelemetry Collector and a Jaeger/Azure Application Insights instance in your cluster to visualize these traces.
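
To make the Collector side more concrete, a minimal configuration might look like the sketch below. Treat it as an assumption-laden example: the jaeger receiver on UDP 6831 matches the agent_port used in the Python snippet above, and the stdout exporter is called logging in older Collector releases and debug in newer ones, so check the version you deploy (for example via the community Helm charts).

# otel-collector-config.yaml (conceptual sketch)
receivers:
  jaeger:
    protocols:
      thrift_compact:
        endpoint: 0.0.0.0:6831   # Matches agent_port=6831 used by the app
  otlp:
    protocols:
      grpc: {}
exporters:
  logging: {}                    # Print spans to the Collector's stdout ("debug" in newer releases)
service:
  pipelines:
    traces:
      receivers: [jaeger, otlp]
      exporters: [logging]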

Best Practices for Tracing:

  • Consistency: Ensure all services in your distributed system are instrumented consistently.
  • Context Propagation: Use HTTP headers or other mechanisms to propagate trace context (trace ID, span ID) across service calls.
  • Sampling: Implement sampling to manage the volume of trace data, especially in high-traffic environments.
  • Link with Logs/Metrics: Correlate trace IDs with logs and metrics for a holistic view of your application’s health.

Exercise/Mini-Challenge (Conceptual):

  1. Research OpenTelemetry for a programming language you are familiar with (e.g., Python, Node.js, Java).
  2. Find an example of instrumenting a simple web service with OpenTelemetry.
  3. Outline the steps you would take to deploy this application to AKS, send its traces to an OpenTelemetry Collector (deployed via Helm), and visualize them in a tool like Jaeger (also deployed via Helm).

4.3 Handling Production Situations

4.3.1 Health Checks (Readiness and Liveness Probes)

Kubernetes uses probes to determine the health of your application Pods:

  • Liveness Probe: Checks if a container is running. If it fails, Kubernetes restarts the container. Essential for applications that might get into a broken state without crashing.
  • Readiness Probe: Checks if a container is ready to serve traffic. If it fails, Kubernetes removes the Pod from Service load balancers. Useful for applications that need time to warm up or load data.

Types of Probes:

  • HTTP GET: Makes an HTTP request to a specified path on a given port.
  • TCP Socket: Attempts to open a TCP socket on a specified port.
  • Exec: Executes a command inside the container and checks the exit code. (The TCP and exec variants are sketched just below.)
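
The nginx example that follows uses HTTP GET probes; for reference, the other two mechanisms look roughly like this inside a container spec (the Redis port and the /tmp/ready file are illustrative, not taken from the example):

# Alternative probe mechanisms (snippet for a container spec)
livenessProbe:
  tcpSocket:
    port: 6379                       # e.g. a Redis container: healthy if the port accepts a connection
  initialDelaySeconds: 10
  periodSeconds: 10
readinessProbe:
  exec:
    command: ["cat", "/tmp/ready"]   # Ready only while the file exists (exit code 0)
  periodSeconds: 5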

Code Example: Nginx with Liveness and Readiness Probes

# nginx-probes.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-probes-deployment
  labels:
    app: nginx-probes
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-probes
  template:
    metadata:
      labels:
        app: nginx-probes
    spec:
      containers:
      - name: nginx-container
        image: nginx:latest
        ports:
        - containerPort: 80
        livenessProbe:
          httpGet:
            path: /healthz
            port: 80
          initialDelaySeconds: 5 # Wait 5 seconds before first check
          periodSeconds: 5     # Check every 5 seconds
          failureThreshold: 3  # After 3 failures, restart
        readinessProbe:
          httpGet:
            path: /ready
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5
          failureThreshold: 1 # After 1 failure, take out of service
        lifecycle:
          postStart:
            exec:
              command: ["/bin/sh", "-c", "echo 'OK' > /usr/share/nginx/html/healthz; echo 'OK' > /usr/share/nginx/html/ready"] # Create health endpoints
          preStop:
            exec:
              command: ["/bin/sh", "-c", "rm -f /usr/share/nginx/html/ready && sleep 10"] # Mark the Pod unready first, then give endpoints time to drain

Deploy:

kubectl apply -f nginx-probes.yaml

To observe, you would expose this with a Service. To test liveness failure, you could kubectl exec into the pod and rm /usr/share/nginx/html/healthz. Kubernetes should restart the container. To test readiness, you could rm /usr/share/nginx/html/ready. The Pod would remain running but be removed from the service endpoints.

Clean up:

kubectl delete -f nginx-probes.yaml

Best Practices for Probes:

  • Realistic checks: Probes should reflect the actual health and readiness of your application.
  • Lightweight endpoints: Health endpoints should be fast and not resource-intensive.
  • Graceful shutdown: Use preStop hooks to gracefully drain traffic and perform cleanup before termination.
  • Configuration: Tune initialDelaySeconds, periodSeconds, timeoutSeconds, and failureThreshold based on application characteristics.

Exercise/Mini-Challenge:

  1. Deploy an application (e.g., a simple Python Flask app) with both a liveness probe (checking /healthz) and a readiness probe (checking /ready); a Deployment sketch follows this list.
  2. Implement a /healthz endpoint that always returns 200.
  3. Implement a /ready endpoint that initially returns 200, but after 30 seconds, starts returning 500 (simulating unreadiness).
  4. Observe the Pod’s status change to “Not Ready” after 30 seconds without the Pod restarting.
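
One possible shape for the Deployment in step 1, assuming you build and push the Flask image yourself (the myregistry.azurecr.io/flask-probes:latest reference is a placeholder):

# flask-probes.yaml (sketch; image name is a placeholder)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask-probes
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flask-probes
  template:
    metadata:
      labels:
        app: flask-probes
    spec:
      containers:
      - name: app
        image: myregistry.azurecr.io/flask-probes:latest
        ports:
        - containerPort: 5000
        livenessProbe:
          httpGet:
            path: /healthz
            port: 5000
          periodSeconds: 5
        readinessProbe:
          httpGet:
            path: /ready
            port: 5000
          periodSeconds: 5
          failureThreshold: 1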

4.3.2 Resource Quotas and Limit Ranges

Resource Quotas and Limit Ranges are mechanisms to manage and constrain resource consumption within a Kubernetes cluster.

  • Resource Quotas: Limits the total amount of resources (CPU, memory, storage) that can be consumed by all Pods within a namespace.
  • Limit Ranges: Enforces default resource limits and requests for Pods within a namespace, and can also define minimum and maximum resource constraints for containers.

Why use them?

  • Resource Governance: Prevent resource starvation and ensure fair resource distribution across teams/namespaces.
  • Cost Control: Manage cloud spending by setting limits on resource usage.
  • Stability: Prevent “noisy neighbor” issues where one application consumes excessive resources.

Code Example: Resource Quota and Limit Range in a Namespace

  1. Create a namespace:

    kubectl create namespace constrained-env
    
  2. Define a Resource Quota:

    # resource-quota.yaml
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: dev-quota
      namespace: constrained-env
    

    spec:
      hard:
        pods: "10"                    # Max 10 Pods
        requests.cpu: "1"             # Total CPU requests for all Pods: max 1 CPU core
        requests.memory: "2Gi"        # Total memory requests: max 2 GiB
        limits.cpu: "2"               # Total CPU limits for all Pods: max 2 CPU cores
        limits.memory: "4Gi"          # Total memory limits: max 4 GiB
        persistentvolumeclaims: "2"   # Max 2 PVCs
        requests.storage: "5Gi"       # Total storage requests: max 5 GiB


  3. Define a Limit Range:

    # limit-range.yaml
    apiVersion: v1
    kind: LimitRange
    metadata:
      name: cpu-mem-limit-range
      namespace: constrained-env
    spec:
      limits:
      - default: # Default limits if not specified by the container
          cpu: 500m
          memory: 512Mi
        defaultRequest: # Default requests if not specified by the container
          cpu: 100m
          memory: 256Mi
        max: # Maximum allowed for a single container
          cpu: 1
          memory: 1Gi
        type: Container
    

Apply them to the namespace:

kubectl apply -f resource-quota.yaml -n constrained-env
kubectl apply -f limit-range.yaml -n constrained-env

Now, try to deploy a Pod without specifying requests/limits in constrained-env:

# test-pod-no-limits.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod-no-limits
  namespace: constrained-env
spec:
  containers:
  - name: my-container
    image: busybox:latest
    command: ["sleep", "3600"]

Deploy and inspect it:

kubectl apply -f test-pod-no-limits.yaml -n constrained-env
kubectl describe pod test-pod-no-limits -n constrained-env

The describe output shows that the default requests and limits from the LimitRange were applied.

Now, try to create a Pod that violates the quota (e.g., requesting 3 CPU cores):

# test-pod-exceed-quota.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod-exceed-quota
  namespace: constrained-env
spec:
  containers:
  - name: my-container
    image: busybox:latest
    command: ["sleep", "3600"]
    resources:
      requests:
        cpu: "3" # Exceeds resource quota total requests.cpu: 1
      limits:
        cpu: "3" # Exceeds resource quota total limits.cpu: 2

Deploying this will result in an Error from server (Forbidden) due to resource quota violation.

Clean up:

kubectl delete -f test-pod-no-limits.yaml -n constrained-env
# kubectl delete -f test-pod-exceed-quota.yaml -n constrained-env (if it was created)
kubectl delete -f limit-range.yaml -n constrained-env
kubectl delete -f resource-quota.yaml -n constrained-env
kubectl delete namespace constrained-env

Best Practices for Resource Governance:

  • Define for every namespace: Apply Resource Quotas and Limit Ranges to all non-system namespaces.
  • Sensible defaults: Set reasonable defaultRequest and default limits in LimitRanges to prevent runaway containers.
  • Monitor usage: Monitor actual resource consumption against quotas to fine-tune your limits.

Exercise/Mini-Challenge:

  1. Create a namespace staging.
  2. Apply a ResourceQuota to staging that limits pods to 5 and requests.memory to 1Gi (a combined sketch for steps 2–3 follows this list).
  3. Apply a LimitRange to staging that sets a defaultRequest.memory of 128Mi and default.memory of 256Mi for containers.
  4. Deploy a Deployment with 4 replicas of a simple nginx app (without explicit memory requests/limits). Verify they use the default values and count towards the quota.
  5. Try to deploy a 6th pod into the staging namespace. Observe the quota error.
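
Steps 2 and 3 can be expressed in a single manifest; this is just a sketch with arbitrary object names:

# staging-governance.yaml (sketch for the staging namespace)
apiVersion: v1
kind: ResourceQuota
metadata:
  name: staging-quota
  namespace: staging
spec:
  hard:
    pods: "5"
    requests.memory: 1Gi
---
apiVersion: v1
kind: LimitRange
metadata:
  name: staging-limits
  namespace: staging
spec:
  limits:
  - defaultRequest:
      memory: 128Mi
    default:
      memory: 256Mi
    type: Container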

4.4 Automated Deployment with CI/CD (GitHub Actions)

Continuous Integration/Continuous Delivery (CI/CD) pipelines automate the process of building, testing, and deploying your applications. GitHub Actions provides a powerful, flexible, and fully integrated CI/CD solution directly within your GitHub repositories.

Workflow for deploying to AKS with GitHub Actions:

  1. Code Commit: Developer pushes code to GitHub.
  2. CI Build: GitHub Action triggers, builds the Docker image, runs tests.
  3. Image Push: Pushes the Docker image to Azure Container Registry (ACR).
  4. CD Deploy: GitHub Action triggers, logs into AKS, and deploys the Helm Chart (or Kubernetes manifests).

Key GitHub Actions components for AKS/Helm:

  • azure/login: Authenticate to Azure.
  • azure/docker-login: Authenticate to ACR.
  • azure/aks-set-context: Set kubectl context to AKS.
  • azure/setup-helm: Install the Helm CLI on the runner so a script step can run helm upgrade --install.
  • actions/checkout: Checkout your repository code.

Code Example: GitHub Actions for Helm Deployment to AKS

  1. Set up Azure Service Principal: GitHub Actions needs credentials to interact with Azure. Create a Service Principal with Contributor role on your AKS Resource Group and ACR.

    az ad sp create-for-rbac --name "github-actions-sp" --role contributor \
        --scopes /subscriptions/<subscription-id>/resourceGroups/<your-aks-resource-group> \
        --sdk-auth
    

    This will output a JSON object. Save it as a GitHub Secret (e.g., AZURE_CREDENTIALS).

  2. Create an ACR (Azure Container Registry):

    az acr create --resource-group myAKSResourceGroup --name myacr2025example --sku Basic --admin-enabled true
    

    Note your ACR login server (e.g., myacr2025example.azurecr.io).

  3. Create your application and Helm chart: Assume you have a simple Nginx application and the mynginxapp Helm chart from earlier.

  4. .github/workflows/deploy.yaml:

    # .github/workflows/deploy.yaml
    name: Deploy to AKS
    
    on:
      push:
        branches:
          - main
      workflow_dispatch: # Allows manual trigger
    
    env:
      AZURE_CONTAINER_REGISTRY: myacr2025example.azurecr.io # Replace with your ACR name
      AZURE_AKS_CLUSTER_NAME: myAKSCluster # Replace with your AKS cluster name
      RESOURCE_GROUP: myAKSResourceGroup # Replace with your AKS resource group
      HELM_CHART_PATH: mynginxapp # Path to your Helm chart directory
      HELM_RELEASE_NAME: my-webapp
    
    jobs:
      build-and-deploy:
        runs-on: ubuntu-latest
        steps:
        - name: Checkout repository
          uses: actions/checkout@v4
    
        - name: Azure Login
          uses: azure/login@v1
          with:
            creds: ${{ secrets.AZURE_CREDENTIALS }}
    
        - name: Docker Login to ACR
          uses: azure/docker-login@v1
          with:
            login-server: ${{ env.AZURE_CONTAINER_REGISTRY }}
            username: ${{ secrets.ACR_USERNAME }} # Use ACR Admin username
            password: ${{ secrets.ACR_PASSWORD }} # Use ACR Admin password
    
        - name: Build and Push Docker image
          run: |
            docker build . -t ${{ env.AZURE_CONTAINER_REGISTRY }}/mywebapp:${{ github.sha }}
            docker push ${{ env.AZURE_CONTAINER_REGISTRY }}/mywebapp:${{ github.sha }}        
    
        - name: Set AKS Kubeconfig
          uses: azure/aks-set-context@v3
          with:
            resource-group: ${{ env.RESOURCE_GROUP }}
            cluster-name: ${{ env.AZURE_AKS_CLUSTER_NAME }}
    
        - name: Set up Helm
          uses: azure/setup-helm@v4

        - name: Install or Upgrade Helm Chart
          run: |
            helm upgrade ${{ env.HELM_RELEASE_NAME }} ${{ env.HELM_CHART_PATH }} \
              --install \
              --namespace default \
              --set image.repository=${{ env.AZURE_CONTAINER_REGISTRY }}/mywebapp \
              --set image.tag=${{ github.sha }} \
              --set service.type=LoadBalancer \
              --set replicaCount=2 \
              --wait \
              --atomic
    

    GitHub Secrets:

    • AZURE_CREDENTIALS: The JSON output from az ad sp create-for-rbac.
    • ACR_USERNAME, ACR_PASSWORD: Get these from your ACR with az acr credential show --name <your-acr-name> --query username -o tsv and az acr credential show --name <your-acr-name> --query "passwords[0].value" -o tsv.
  5. Commit and push: Push this deploy.yaml to your main branch. GitHub Actions will trigger the workflow.

Best Practices for CI/CD with AKS/Helm:

  • GitOps: Adopt a GitOps approach where all desired state (infrastructure and application configs) is stored in Git, and an operator (like Argo CD or Flux CD) applies changes to the cluster. GitHub Actions can be used to update the Git repository that the GitOps operator watches.
  • Separate Environments: Use different branches or separate workflows/pipelines for dev, staging, and production environments.
  • Security: Use short-lived credentials (OpenID Connect with Azure AD for GitHub Actions) instead of long-lived Service Principal secrets when possible, and grant least-privilege permissions (a login sketch follows this list).
  • Testing: Integrate unit tests, integration tests, and end-to-end tests into your CI pipeline. Use helm lint and helm template --debug in CI.
  • Rollback Strategy: Ensure your deployments are atomic and have clear rollback procedures.
  • Notifications: Configure notifications for pipeline success/failure.
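
On the short-lived credentials point, an OIDC-based login removes the need for the AZURE_CREDENTIALS JSON secret: you configure a federated credential on an Azure AD app registration or managed identity, grant the job id-token: write permission, and pass only IDs (the secret names below are placeholders). This is a sketch of the relevant excerpt, not a drop-in change to the workflow above.

# Excerpt of a job using OpenID Connect instead of a client secret
permissions:
  id-token: write
  contents: read

steps:
- name: Azure Login (OIDC)
  uses: azure/login@v2
  with:
    client-id: ${{ secrets.AZURE_CLIENT_ID }}
    tenant-id: ${{ secrets.AZURE_TENANT_ID }}
    subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}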

Exercise/Mini-Challenge:

  1. Set up an Azure Service Principal and add it to your GitHub repository secrets.
  2. Create a simple Dockerfile for a basic web server (e.g., Python Flask).
  3. Create a Helm Chart for this web server.
  4. Implement a GitHub Actions workflow that:
    • Builds the Docker image.
    • Pushes the image to your ACR.
    • Deploys/upgrades the Helm Chart to your AKS cluster, using the newly built image tag.
  5. Trigger the workflow and verify the deployment in AKS.

4.5 Secret Management (Azure Key Vault Provider for Secrets Store CSI Driver)

Kubernetes Secrets store sensitive data, but they are only base64 encoded, and anyone with API or etcd access to the namespace can read them. For enhanced security, especially in production, integrate with a dedicated secret management solution. Azure Key Vault is Azure’s fully managed secret store, and the Secrets Store CSI Driver allows you to mount secrets from Key Vault directly into your Pods as volumes.

Benefits:

  • Centralized Secret Management: Manage all secrets in a secure, audited Key Vault.
  • Encryption at Rest: Key Vault encrypts secrets at rest.
  • Reduced Attack Surface: Secrets are not stored directly in Kubernetes etcd, but are injected dynamically.
  • Automatic Rotation: Leverage Key Vault’s secret rotation capabilities.

How it works:

  1. Install Secrets Store CSI Driver and Azure Key Vault Provider: These components are installed in your AKS cluster.
  2. SecretProviderClass: Defines which secrets to fetch from Key Vault.
  3. Pod volumeMounts: Pods reference the SecretProviderClass and mount the secrets as a volume.
  4. Managed Identity: The Pod uses an Azure AD Managed Identity to authenticate to Key Vault.

Code Example: Azure Key Vault Secrets with CSI Driver

  1. Enable the Azure Key Vault Secrets Provider Add-on for AKS: This is the easiest way to deploy the CSI driver and provider.

    az aks enable-addons --addons azure-keyvault-secrets-provider --name myAKSCluster --resource-group myAKSResourceGroup
    

    Ensure your AKS cluster has a Managed Identity (system-assigned or user-assigned).

  2. Create an Azure Key Vault and a Secret:

    # Use the access policy permission model so the set-policy command below works
    az keyvault create --name my-aks-keyvault-2025 --resource-group myAKSResourceGroup --location eastus --enable-rbac-authorization false
    az keyvault secret set --vault-name my-aks-keyvault-2025 --name MyDbPassword --value "SuperSecretP@ssw0rd!"
    
  3. Grant the AKS Managed Identity access to Key Vault: Get the object (principal) ID of the identity that will read secrets. For simplicity, we use the cluster’s system-assigned identity here:

    AKS_MI_OBJECT_ID=$(az aks show --resource-group myAKSResourceGroup --name myAKSCluster --query identity.principalId -o tsv)
    # Or, more granular, the kubelet identity's object ID:
    # KUBELET_MI_OBJECT_ID=$(az aks show -g myAKSResourceGroup -n myAKSCluster --query identityProfile.kubeletidentity.objectId -o tsv)
    
    # Grant Get and List permissions on secrets
    az keyvault set-policy --name my-aks-keyvault-2025 --resource-group myAKSResourceGroup --object-id $AKS_MI_OBJECT_ID --secret-permissions get list
    

    (Note: Using the cluster’s Kubelet identity or a dedicated user-assigned managed identity is a best practice for production. For simplicity, we used the main AKS identity above.)

  4. Create a SecretProviderClass:

    # secret-provider-class.yaml
    apiVersion: secrets-store.csi.k8s.io/v1
    kind: SecretProviderClass
    metadata:
      name: azure-kv-secrets
      namespace: default # Or your target namespace
    spec:
      provider: azure
      parameters:
        usePodIdentity: "false"
        useVMManagedIdentity: "true"          # Use the node's managed identity
        userAssignedIdentityID: ""            # Typically the client ID of the kubelet or add-on managed identity that was granted Key Vault access
        tenantId: "<your-azure-tenant-id>"    # Azure AD tenant of the Key Vault (required by the Azure provider)
        keyvaultName: my-aks-keyvault-2025    # Replace with your Key Vault name
        objects: |
          array:
            - |
              objectName: MyDbPassword
              objectType: secret
              objectVersion: "" # Use latest version      
    
  5. Deploy a Pod to consume the secret:

    # pod-with-kv-secret.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: myapp-kv-deployment
      labels:
        app: myapp-kv
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: myapp-kv
      template:
        metadata:
          labels:
            app: myapp-kv
        spec:
          containers:
          - name: myapp-container
            image: busybox:latest
            command: ["sh", "-c", "echo 'Application started...'; cat /mnt/secrets-store/MyDbPassword; sleep 3600"]
            volumeMounts:
            - name: secrets-store-volume
              mountPath: "/mnt/secrets-store"
              readOnly: true
          volumes:
          - name: secrets-store-volume
            csi:
              driver: secrets-store.csi.k8s.io
              readOnly: true
              volumeAttributes:
                secretProviderClass: azure-kv-secrets
    

Deploy the SecretProviderClass and the Pod:

kubectl apply -f secret-provider-class.yaml
kubectl apply -f pod-with-kv-secret.yaml

Check the pod logs:

kubectl logs -l app=myapp-kv

You should see “SuperSecretP@ssw0rd!” printed, demonstrating the secret was mounted.

Clean up:

kubectl delete -f pod-with-kv-secret.yaml
kubectl delete -f secret-provider-class.yaml
az keyvault delete --name my-aks-keyvault-2025 --resource-group myAKSResourceGroup
az keyvault purge --name my-aks-keyvault-2025 # Permanent delete if soft-delete is enabled

Best Practices for Secret Management:

  • Centralized Key Vault: Use Azure Key Vault for all application secrets.
  • Managed Identities: Always use Azure AD Managed Identities for AKS and application Pods to authenticate to Key Vault, following the principle of least privilege.
  • CSI Driver: Leverage the Secrets Store CSI Driver for dynamic secret injection, avoiding direct storage in etcd.
  • Rotation: Implement secret rotation policies in Key Vault.
  • Auditing: Use Key Vault’s auditing capabilities to track access to secrets.

Exercise/Mini-Challenge:

  1. Create a second secret in your Azure Key Vault (e.g., ApiToken).
  2. Modify the SecretProviderClass to also fetch this ApiToken (see the sketch after this list).
  3. Modify your myapp-kv-deployment Pod to mount and print the ApiToken as well.
  4. Verify that both secrets are successfully mounted and accessible by the Pod.
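
For steps 1 and 2, the objects list in the SecretProviderClass simply gains a second entry. This sketch assumes the new Key Vault secret is named ApiToken:

# Updated objects section of the azure-kv-secrets SecretProviderClass (sketch)
objects: |
  array:
    - |
      objectName: MyDbPassword
      objectType: secret
      objectVersion: ""
    - |
      objectName: ApiToken
      objectType: secret
      objectVersion: ""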

5. Guided Projects

These projects will consolidate your learning by guiding you through deploying more complete applications.

Project 1: Deploying a Multi-Tier Web Application with Helm to AKS

Objective: Deploy a simple multi-tier web application (e.g., a frontend, backend API, and a database) to AKS using a single Helm Chart. Configure Ingress for external access and persistent storage for the database.

Problem Statement: You need to deploy a Guestbook application, which consists of a Python Flask frontend, a Redis database, and potentially an Ingress.

Steps:

  1. Clone the Sample Application and Helm Chart (or create your own): For simplicity, we’ll outline the structure. You can create these files.

    guestbook-chart/
      Chart.yaml
      values.yaml
      charts/
        # redis subchart
      templates/
        frontend-deployment.yaml
        frontend-service.yaml
        backend-deployment.yaml # Optional, if you add a backend API
        backend-service.yaml    # Optional
        ingress.yaml            # Optional, if ingress is enabled
    
  2. Frontend (Python Flask) Dockerfile (Example guestbook-chart/app/Dockerfile):

    # Dockerfile
    FROM python:3.9-slim-buster
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install -r requirements.txt
    COPY . .
    EXPOSE 5000
    CMD ["python", "app.py"]
    

    (app.py would connect to Redis and display/store guestbook entries. requirements.txt would contain Flask, redis.)

  3. Create a Helm Chart (guestbook-chart):

    helm create guestbook-chart
    cd guestbook-chart
    
  4. Add Redis as a Subchart Dependency: Edit Chart.yaml:

    # guestbook-chart/Chart.yaml
    apiVersion: v2
    name: guestbook-app
    description: A multi-tier guestbook application
    version: 0.1.0
    appVersion: "1.0.0"
    dependencies:
      - name: redis
        version: 17.x.x # Use a recent stable version
        repository: https://charts.bitnami.com/bitnami
        condition: redis.enabled
    

    Run helm dependency update .

  5. Configure values.yaml:

    • Enable Redis and set a password.
    • Define frontend image (build your own and push to ACR).
    • Configure service type (e.g., LoadBalancer for frontend), or Ingress.
    • Set replica counts.
    # guestbook-chart/values.yaml
    frontend:
      image:
        repository: <your-acr-name>.azurecr.io/guestbook-frontend
        tag: latest
      replicaCount: 2
      service:
        type: LoadBalancer # Or ClusterIP if using Ingress
        port: 80
        targetPort: 5000
    
    redis:
      enabled: true
      password: "myredispassword"
      master:
        persistence:
          enabled: true
          size: 1Gi
    
    ingress:
      enabled: false # Set to true to enable ingress
      className: "nginx"
      host: guestbook.local
      annotations: {}
    
  6. Create Frontend Kubernetes Manifests in templates/:

    • frontend-deployment.yaml (using your custom image, connecting to Redis via redis-master service from subchart).
    • frontend-service.yaml (expose frontend).
  7. Build and Push Frontend Docker Image: You’ll need to build your Python Flask app Docker image and push it to your ACR.

    # From guestbook-chart/app directory
    docker build -t <your-acr-name>.azurecr.io/guestbook-frontend:latest .
    docker push <your-acr-name>.azurecr.io/guestbook-frontend:latest
    
  8. Deploy the Helm Chart:

    helm install guestbook-release guestbook-chart/
    
  9. Verify and Test:

    • kubectl get pods, kubectl get svc.
    • Access the frontend via LoadBalancer IP or Ingress hostname.
    • Verify guestbook entries persist across frontend pod restarts (Redis persistence).

Encourage independent problem-solving:

  • Task: Implement a /healthz and /ready endpoint in the Flask frontend and add liveness/readiness probes to its Deployment.
  • Task: Configure ResourceQuotas and LimitRanges for the namespace where the Guestbook app is deployed.
  • Task: Modify the chart to support an optional backend API service if backend.enabled is true in values.yaml (a template sketch follows).
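
For the optional-backend task, wrapping the whole manifest in an if block is the usual pattern. The sketch below assumes backend.enabled, backend.replicaCount, and backend.image.* keys in values.yaml, which you would add yourself:

# guestbook-chart/templates/backend-deployment.yaml (conditional sketch)
{{- if .Values.backend.enabled }}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-backend
spec:
  replicas: {{ .Values.backend.replicaCount | default 1 }}
  selector:
    matchLabels:
      app: {{ .Release.Name }}-backend
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}-backend
    spec:
      containers:
      - name: backend
        image: "{{ .Values.backend.image.repository }}:{{ .Values.backend.image.tag }}"
{{- end }}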

Project 2: Implementing CI/CD with GitHub Actions, IaC, and Secret Management

Objective: Extend Project 1 to include a full CI/CD pipeline using GitHub Actions for automated deployment to an AKS cluster provisioned with Terraform, securely managing secrets with Azure Key Vault.

Problem Statement: Automate the deployment of the Guestbook application from Project 1 to AKS using GitHub Actions, while the AKS cluster itself is managed by Terraform, and Redis password is fetched from Azure Key Vault.

Steps:

  1. Review Terraform AKS Setup: Ensure your AKS cluster is provisioned via Terraform as described in Section 4.1. The AKS cluster should have the Azure Key Vault Secrets Provider add-on enabled.

  2. Configure Azure Key Vault: Create an Azure Key Vault and store the Redis password (e.g., RedisPassword) there. Grant the AKS cluster’s managed identity (or a dedicated user-assigned managed identity) get and list permissions to secrets in the Key Vault.

  3. Create SecretProviderClass: Define a SecretProviderClass in Kubernetes that fetches the RedisPassword from your Azure Key Vault.

  4. Modify guestbook-chart to use Key Vault Secret:

    • In guestbook-chart/templates/redis-deployment.yaml (or where Redis password is configured), modify it to consume the secret mounted by the CSI driver.
    • Mount the secrets volume into the Redis Pod:
      # Example snippet for the Redis Pod in its deployment template (conceptual)
      spec:
        # ...
        volumes:
        - name: secrets-store-volume
          csi:
            driver: secrets-store.csi.k8s.io
            readOnly: true
            volumeAttributes:
              secretProviderClass: azure-kv-secrets # Name of your SecretProviderClass
        containers:
        - name: redis
          # ...
          volumeMounts:
          - name: secrets-store-volume
            mountPath: "/mnt/secrets-store" # Secrets from Key Vault appear here as files
          env:
          - name: REDIS_PASSWORD # The Redis chart usually expects this env var
            valueFrom:
              secretKeyRef:
                name: redis-secret-from-kv # Kubernetes Secret synced from Key Vault by the CSI driver
                key: RedisPassword         # Key inside that synced Secret
      # The CSI driver only creates the synced Kubernetes Secret if the
      # SecretProviderClass defines secretObjects (a spec-level field), e.g.:
      #   secretObjects:
      #   - secretName: redis-secret-from-kv
      #     type: Opaque
      #     data:
      #     - objectName: RedisPassword   # Secret name in Key Vault
      #       key: RedisPassword          # Key name in the synced Kubernetes Secret
      # Note: the synced Secret exists only while at least one Pod mounts the CSI volume.
      
  5. Create GitHub Actions Workflow (.github/workflows/deploy-guestbook.yaml):

    • IaC Pipeline (Terraform): A separate job/workflow that terraform plan and terraform apply your AKS infrastructure.
    • Application CI/CD Pipeline:
      • Checks out code.
      • Logs into Azure (using AZURE_CREDENTIALS secret).
      • Logs into ACR (ACR_USERNAME, ACR_PASSWORD secrets).
      • Builds the guestbook-frontend Docker image and pushes it to ACR (tagged with github.sha).
      • Sets kubectl context to your AKS cluster.
      • Deploys the SecretProviderClass (if not already managed by Terraform).
      • Installs or upgrades the guestbook-chart Helm release to AKS, passing the dynamic image tag.
    # .github/workflows/deploy-guestbook.yaml
    name: Guestbook App CI/CD to AKS
    
    on:
      push:
        branches:
          - main
        paths:
          - 'guestbook-chart/**'
          - 'azure-aks-infra/**' # If you include Terraform here
      workflow_dispatch:
    
    env:
      AZURE_CONTAINER_REGISTRY: myacr2025example.azurecr.io
      AZURE_AKS_CLUSTER_NAME: my-terraform-aks
      RESOURCE_GROUP: my-aks-terraform-rg
      HELM_CHART_PATH: guestbook-chart # Path to your Helm chart directory
      HELM_RELEASE_NAME: guestbook-app-release
      FRONTEND_IMAGE_NAME: guestbook-frontend
    
    jobs:
      terraform-apply:
        runs-on: ubuntu-latest
        steps:
          - name: Checkout repository
            uses: actions/checkout@v4
    
          - name: Azure Login
            uses: azure/login@v1
            with:
              creds: ${{ secrets.AZURE_CREDENTIALS }}
    
          - name: Setup Terraform
            uses: hashicorp/setup-terraform@v3
    
          - name: Terraform Init
            run: terraform init
            working-directory: azure-aks-infra # Your Terraform folder
    
          - name: Terraform Plan
            run: terraform plan
            working-directory: azure-aks-infra
    
          - name: Terraform Apply
            run: terraform apply -auto-approve
            working-directory: azure-aks-infra
            if: github.event_name == 'push' && github.ref == 'refs/heads/main' # Only apply on push to main
            # Consider manual approval for production IaC deployments
    
      build-and-deploy-app:
        needs: terraform-apply # Ensure infrastructure is ready
        runs-on: ubuntu-latest
        steps:
        - name: Checkout repository
          uses: actions/checkout@v4
    
        - name: Azure Login
          uses: azure/login@v1
          with:
            creds: ${{ secrets.AZURE_CREDENTIALS }}
    
        - name: Docker Login to ACR
          uses: azure/docker-login@v1
          with:
            login-server: ${{ env.AZURE_CONTAINER_REGISTRY }}
            username: ${{ secrets.ACR_USERNAME }}
            password: ${{ secrets.ACR_PASSWORD }}
    
        - name: Build and Push Frontend Docker image
          run: |
            docker build ${{ env.HELM_CHART_PATH }}/app -t ${{ env.AZURE_CONTAINER_REGISTRY }}/${{ env.FRONTEND_IMAGE_NAME }}:${{ github.sha }}
            docker push ${{ env.AZURE_CONTAINER_REGISTRY }}/${{ env.FRONTEND_IMAGE_NAME }}:${{ github.sha }}        
    
        - name: Set AKS Kubeconfig
            uses: azure/aks-set-context@v3
          with:
            resource-group: ${{ env.RESOURCE_GROUP }}
            cluster-name: ${{ env.AZURE_AKS_CLUSTER_NAME }}
    
        - name: Deploy SecretProviderClass
          # You can also manage SPC via Helm or Terraform
          # For this project, assuming SPC definition is static in a .yaml file
          run: kubectl apply -f secret-provider-class.yaml
          # working-directory: path/to/your/secret-provider-class-definition
    
        - name: Set up Helm
          uses: azure/setup-helm@v4

        - name: Install or Upgrade Helm Chart
          run: |
            # The Redis password is supplied via the Key Vault CSI driver, so it is set to empty here
            helm upgrade ${{ env.HELM_RELEASE_NAME }} ${{ env.HELM_CHART_PATH }} \
              --install \
              --namespace default \
              --set frontend.image.repository=${{ env.AZURE_CONTAINER_REGISTRY }}/${{ env.FRONTEND_IMAGE_NAME }} \
              --set frontend.image.tag=${{ github.sha }} \
              --set redis.enabled=true \
              --set redis.password="" \
              --set frontend.service.type=LoadBalancer \
              --set frontend.replicaCount=2 \
              --wait \
              --atomic
    

Encourage independent problem-solving:

  • Task: Implement separate dev, staging, and production workflows/jobs in GitHub Actions, possibly using environment-specific secrets and values files for Helm.
  • Task: Add a step in the GitHub Actions pipeline to run helm lint and helm template --debug to validate your Helm Chart before deployment (a step sketch follows this list).
  • Task: Configure monitoring alerts in Azure Monitor for your deployed application (e.g., high CPU usage, HTTP 500 errors).
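
For the chart-validation task, a single extra step placed before the deploy step is enough; this sketch assumes the Helm CLI is already on the runner (for example via azure/setup-helm):

# Extra workflow step: validate the chart before deploying
- name: Lint and render Helm chart
  run: |
    helm lint ${{ env.HELM_CHART_PATH }}
    helm template ${{ env.HELM_RELEASE_NAME }} ${{ env.HELM_CHART_PATH }} --debug > /dev/null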

6. Bonus Section: Further Learning and Resources

Congratulations on making it this far! This guide provides a strong foundation, but the cloud-native landscape is vast and ever-evolving. Here are some resources to continue your learning journey:

  • Official Kubernetes Documentation: The best place for in-depth concepts and examples.
  • Helm Documentation: Comprehensive guide for Helm Chart development and management.
  • Azure Kubernetes Service (AKS) Documentation: Microsoft’s official documentation for AKS, including best practices and integrations.
  • Pluralsight/Udemy/Coursera: Look for courses like “Certified Kubernetes Administrator (CKA)”, “Certified Kubernetes Application Developer (CKAD)”, or specific AKS and Helm courses.
  • KodeKloud: Offers excellent hands-on labs and courses for Kubernetes.


Blogs and Articles:

  • Microsoft Azure Blog: Stay updated on the latest Azure and AKS features.
  • Kubernetes Blog: Official blog for Kubernetes project updates.
  • Helm Blog: News and updates from the Helm community.
  • CNCF (Cloud Native Computing Foundation) Blog: Broader cloud-native ecosystem news.
  • Developer.microsoft.com/reactor/events: Look for live or recorded sessions on AKS and related technologies.

YouTube Channels:

  • Azure Friday: Weekly videos on various Azure services, often including AKS.
  • Kubernetes Official Channel: Talks, tutorials, and conference recordings.
  • TechWorld with Nana: Excellent beginner-friendly tutorials on Docker, Kubernetes, and DevOps.
  • Fireship: Quick, engaging overviews of new technologies.

Community Forums/Groups:

  • Stack Overflow: For specific technical questions.
  • Kubernetes Slack: Active community for discussions and help (#helm-users, #aks-users, etc.).
  • GitHub Issues/Discussions: For specific projects (Helm, AKS-engine, CSI drivers).
  • Reddit Communities: r/kubernetes, r/helm, r/azure, r/devops.

Next Steps/Advanced Topics:

  • Service Mesh (Istio/Linkerd): For advanced traffic management, security, and observability between microservices. AKS offers an Istio add-on.
  • GitOps with Argo CD/Flux CD: Advanced automated deployment where Git is the single source of truth, and a controller reconciles the cluster state.
  • Custom Resource Definitions (CRDs) and Operators: Extend Kubernetes functionality with custom resources and controllers to manage complex applications.
  • Advanced Networking (Azure CNI, Calico): Deeper dive into network policies, IP addressing, and connectivity.
  • AKS Security Best Practices: In-depth security hardening, vulnerability management, and compliance for AKS.
  • Performance Tuning and Cost Optimization: Optimize resource requests/limits, auto-scaling, and cluster sizing for efficiency.
  • Disaster Recovery and Business Continuity: Strategies for multi-region deployments and data backup/restore in AKS.
  • Open Policy Agent (OPA) Gatekeeper: Policy enforcement for Kubernetes clusters to ensure compliance and security.
  • Kubernetes with WebAssembly (Wasm): An emerging trend for running highly efficient, sandboxed workloads.

By continuously exploring these resources and engaging with the community, you’ll stay at the forefront of cloud-native development and master the complexities of Helm and Kubernetes on AKS. Happy learning!