In production environments, simply running a single Redis instance is often not enough. You need to ensure your Redis service is highly available (it remains operational even if a server fails) and scalable (it can handle increased load and data volume). Redis offers two primary solutions for these challenges: Redis Sentinel for high availability and Redis Cluster for horizontal scaling.
This chapter will guide you through:
- The concepts of High Availability (HA) and how Redis achieves it.
- Redis Sentinel: For automatic failover and monitoring of master-replica setups.
- Redis Cluster: For sharding data across multiple nodes and providing both HA and linear scalability.
- Understanding the trade-offs and when to use each.
1. High Availability with Redis Sentinel
Redis Sentinel is a distributed system that provides high availability for Redis. It continuously monitors your Redis instances (masters and replicas), and if a master goes down, it automatically promotes a replica to become the new master. Sentinel also reconfigures the other replicas to follow the new master and informs client applications about the change.
Key features of Sentinel:
- Monitoring: Sentinels constantly check if your master and replica instances are behaving as expected.
- Notification: They can notify system administrators or other computer programs when something goes wrong with a monitored Redis instance.
- Automatic Failover: When a master is not working as expected, Sentinel can start a failover process. This promotes a replica to master, reconfigures other replicas to use the new master, and notifies clients.
- Configuration Provider: Clients connect to Sentinels to ask for the current address of the Redis master for a given service.
How it works (Simplified):
- You run multiple Sentinel instances (at least 3 for robustness) in your infrastructure.
- Each Sentinel monitors a set of Redis masters and their replicas.
- When a master becomes unreachable, Sentinels vote on its status. If a quorum of Sentinels agree the master is down (“Subjectively Down” becomes “Objectively Down”), a failover is initiated.
- One Sentinel is elected as the leader to perform the failover.
- The leader Sentinel selects the best replica to promote, reconfigures other replicas, and updates client configuration.
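Before wiring up a full Sentinel-aware client, it helps to see the configuration-provider role in isolation. The sketch below is a minimal example, assuming a Sentinel listening on 127.0.0.1:26379 that monitors a master set named `mymaster` (both values are assumptions; substitute your own). It connects to the Sentinel directly and asks for the current master's address with the `SENTINEL GET-MASTER-ADDR-BY-NAME` command; Sentinels speak the normal Redis protocol, so an ordinary client connection works.
// sentinel_query.js
const Redis = require('ioredis');
async function askSentinelForMaster() {
  // Host, port, and the master set name 'mymaster' are assumptions for this
  // sketch; use the values from your own Sentinel configuration.
  const sentinel = new Redis({ host: '127.0.0.1', port: 26379 });
  // Returns a two-element array: [ip, port] of the current master.
  const [ip, port] = await sentinel.call('SENTINEL', 'get-master-addr-by-name', 'mymaster');
  console.log(`Current master for 'mymaster': ${ip}:${port}`);
  await sentinel.quit();
}
// askSentinelForMaster(); // Requires a running Sentinel instance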
Node.js Example (Sentinel-aware client):
Redis client libraries like ioredis are designed to be Sentinel-aware. Instead of connecting directly to a master, you provide the client with a list of Sentinel addresses and the name of the master set. The client then queries the Sentinels to discover the current master’s address.
// redis_sentinel_client.js
const Redis = require('ioredis');
// --- Configuration for Sentinel ---
// In a real setup, these would be IP addresses of your Sentinel instances
const sentinelHosts = [
{ host: '127.0.0.1', port: 26379 }, // Example Sentinel 1
{ host: '127.0.0.1', port: 26380 }, // Example Sentinel 2
{ host: '127.0.0.1', port: 26381 }, // Example Sentinel 3
];
const masterSetName = 'mymaster'; // The name you give your master in Sentinel config
// Create a Redis client configured for Sentinel
const redis = new Redis({
sentinels: sentinelHosts,
name: masterSetName,
// Other options like password, db, etc.
});
redis.on('connect', () => {
console.log(`Connected to Redis master via Sentinel: ${masterSetName}`);
});
redis.on('reconnecting', () => {
console.log('Redis client reconnecting...');
});
redis.on('error', (err) => {
console.error('Redis client error:', err);
});
async function runSentinelClientExample() {
try {
console.log('--- Redis Sentinel Client Example ---');
// Perform some operations
await redis.set('my_app_setting', 'value_from_sentinel');
console.log(`Set 'my_app_setting'.`);
const value = await redis.get('my_app_setting');
console.log(`Retrieved 'my_app_setting': ${value}`);
// Simulate a failover (manually stop the current master)
// The client should automatically reconnect to the new master
console.log('\n(To test failover: Manually stop the current Redis master instance.)');
console.log('(Sentinel should promote a replica, and this client will reconnect automatically)');
console.log('(You will see "Redis client reconnecting..." and then it should work again)');
// Keep trying to get the value, demonstrating resilience
let counter = 0;
const intervalId = setInterval(async () => {
if (counter >= 10) {
clearInterval(intervalId);
await redis.del('my_app_setting');
await redis.quit();
console.log('Sentinel client example finished.');
return;
}
try {
// CONFIG GET bind reports the configured bind address of the node we are
// currently connected to; it may be empty or 0.0.0.0 rather than a routable IP.
const bindConfig = await redis.call('CONFIG', 'GET', 'bind');
console.log(`[${new Date().toLocaleTimeString()}] Connected node bind config: ${bindConfig[1] || '(default)'} ... 'my_app_setting' -> ${await redis.get('my_app_setting')}`);
} catch (e) {
console.error(`[${new Date().toLocaleTimeString()}] Error during continuous GET: ${e.message}`);
}
counter++;
}, 3000); // Try every 3 seconds
} catch (err) {
console.error('Error in Sentinel client example:', err);
}
}
// runSentinelClientExample(); // Requires a running Sentinel setup (e.g., via Docker Compose)
To run this example: You need a Redis master with at least one replica (the failover needs a replica to promote) and at least three Sentinel instances. This is often done using Docker Compose for local development.
2. Horizontal Scaling with Redis Cluster
Redis Cluster is Redis’s solution for horizontal scaling. It automatically shards your data across multiple Redis nodes, providing both high availability and increased throughput and data capacity.
Key features of Redis Cluster:
- Automatic Sharding: Your dataset is automatically split across multiple Redis master nodes. Each master node handles a subset of the data.
- High Availability: Each master node can have one or more replica nodes. If a master node fails, one of its replicas is automatically promoted to master (similar to Sentinel’s role, but integrated into the cluster).
- Linear Scalability: You can add more master nodes to linearly increase read/write throughput and total data storage.
- Client-side Sharding Logic: Redis Cluster-aware clients automatically direct commands to the correct node based on the key being accessed.
How it works (Simplified):
- The entire keyspace is divided into 16384 hash slots.
- Each master node in the cluster is responsible for a subset of these hash slots.
- When a client sends a command, it hashes the key to determine which hash slot it belongs to.
- The client then knows which node owns that hash slot and sends the command directly to that node. If the client's slot map is stale (e.g., due to a recent slot migration or topology change), the node redirects the client to the correct node with a `MOVED` or `ASK` redirection.
- Data is automatically replicated to replicas within the cluster for fault tolerance.
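To make that routing concrete, here is a small sketch of the slot calculation a cluster-aware client performs. It is an illustrative reimplementation rather than ioredis internals: Redis Cluster hashes each key with CRC16 (the CCITT/XModem variant) modulo 16384, and the hash-tag rule means that when a key contains a non-empty `{...}` section, only that substring is hashed.
// hash_slot.js
// Illustrative sketch of Redis Cluster's key-to-slot mapping: CRC16 mod 16384.
function crc16(buf) {
  let crc = 0;
  for (const byte of buf) {
    crc ^= byte << 8;
    for (let bit = 0; bit < 8; bit++) {
      crc = crc & 0x8000 ? ((crc << 1) ^ 0x1021) & 0xffff : (crc << 1) & 0xffff;
    }
  }
  return crc;
}

function keyHashSlot(key) {
  // Hash-tag rule: if the key contains a non-empty {...} section,
  // only that substring is hashed, so related keys can share a slot.
  const open = key.indexOf('{');
  if (open !== -1) {
    const close = key.indexOf('}', open + 1);
    if (close > open + 1) key = key.slice(open + 1, close);
  }
  return crc16(Buffer.from(key)) % 16384;
}

console.log(keyHashSlot('user:1'));        // some slot in 0..16383
console.log(keyHashSlot('{user:cart}:1')); // same slot as the next line,
console.log(keyHashSlot('{user:cart}:2')); // because only 'user:cart' is hashed
On a real cluster, `CLUSTER KEYSLOT <key>` should report the same numbers, which is a handy way to sanity-check the sketch.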
Node.js Example (Cluster-aware client):
ioredis also supports Redis Cluster. You provide it with a list of one or more cluster nodes, and it automatically discovers the entire cluster topology.
// redis_cluster_client.js
const Redis = require('ioredis');
// --- Configuration for Cluster ---
// In a real setup, these would be IP addresses of your cluster nodes
const clusterNodes = [
{ host: '127.0.0.1', port: 7000 }, // Example Cluster Node 1
{ host: '127.0.0.1', port: 7001 }, // Example Cluster Node 2
// ... add all your cluster master nodes
];
// Create a Redis client configured for Cluster
const redis = new Redis.Cluster(clusterNodes, {
redisOptions: {
// Optional Redis connection options for individual nodes
password: 'my-password' // If your cluster nodes require authentication
},
// Other cluster-specific options
scaleReads: 'slave', // Read from replicas where possible for read scaling
clusterRetryStrategy: (times) => {
const delay = Math.min(times * 50, 2000); // Back off linearly, capped at 2 seconds
console.log(`Cluster reconnecting attempt ${times}. Delaying ${delay}ms.`);
return delay;
}
});
redis.on('connect', () => {
console.log('Connected to Redis Cluster!');
});
redis.on('reconnecting', () => {
console.log('Redis Cluster client reconnecting...');
});
redis.on('error', (err) => {
console.error('Redis Cluster client error:', err);
});
async function runClusterClientExample() {
try {
console.log('--- Redis Cluster Client Example ---');
// Set some keys. The client automatically routes them to the correct node.
await redis.set('user:settings:1', '{"theme": "dark"}');
await redis.set('product:stock:A', '100');
await redis.set('order:details:XYZ', 'processing');
console.log(`Set three keys: 'user:settings:1', 'product:stock:A', 'order:details:XYZ'`);
// Get keys
console.log(`\n'user:settings:1': ${await redis.get('user:settings:1')}`);
console.log(`'product:stock:A': ${await redis.get('product:stock:A')}`);
console.log(`'order:details:XYZ': ${await redis.get('order:details:XYZ')}`);
// Multi-key operations only work if all keys are in the same hash slot.
// Hash tags force keys into the same slot: only the {tag} part is hashed.
await redis.set('{user:cart}:1', 'item_X');
await redis.set('{user:cart}:2', 'item_Y');
const cartItems = await redis.mget('{user:cart}:1', '{user:cart}:2');
console.log(`\nItems in user cart (same hash slot): ${cartItems}`);
// Clean up
await redis.del('user:settings:1', 'product:stock:A', 'order:details:XYZ', '{user:cart}:1', '{user:cart}:2');
console.log('--- Redis Cluster Client Example Complete ---');
await redis.quit();
} catch (err) {
console.error('Error in Redis Cluster client example:', err);
}
}
// runClusterClientExample(); // Requires a running Redis Cluster setup
To run this example: You need a running Redis Cluster with multiple master nodes and, optionally, replicas. This is commonly set up with Docker Compose or with `redis-cli --cluster create`.
When to use Sentinel vs. Cluster
Redis Sentinel:
- Use case: Provides high availability for a single logical Redis master. All data resides on one master, replicated to multiple replicas. Best for smaller to medium datasets where the master can hold all data in memory and you need fault tolerance for the master.
- Scaling: Primarily provides read scaling (clients can read from replicas; see the sketch after this list). Write scaling is limited by the single master.
- Simplicity: Simpler to set up and manage than a full cluster.
- Data size: Limited by the memory of a single server.
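As a sketch of the read-scaling pattern mentioned above (assuming the same hypothetical Sentinel setup as earlier: a Sentinel at 127.0.0.1:26379 monitoring `mymaster`), ioredis can be told to connect to a replica instead of the master by passing `role: 'slave'`:
// sentinel_replica_reads.js
const Redis = require('ioredis');

// A read-only client that Sentinel routes to one of the replicas.
const replicaReader = new Redis({
  sentinels: [{ host: '127.0.0.1', port: 26379 }],
  name: 'mymaster',
  role: 'slave', // connect to a replica instead of the master
});

// Use this connection for reads only; keep a separate master client for writes.
// replicaReader.get('my_app_setting').then(console.log);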
Redis Cluster:
- Use case: Provides both high availability and horizontal scalability (sharding). Best for large datasets that cannot fit on a single server or applications requiring high write throughput that exceeds a single master’s capacity.
- Scaling: Provides linear read and write scaling by adding more master nodes.
- Complexity: More complex to set up, manage, and debug.
- Data size: Can scale to many terabytes across hundreds of nodes.
- Important limitation: Multi-key operations (like `MGET`, transactions, and Lua scripts) work atomically and efficiently only if all involved keys fall within the same hash slot (group related keys with hash tags, e.g. `{user:cart}`).
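One way to live with that limitation in application code is sketched below. This is an illustrative pattern, not an ioredis feature: try the multi-key command first, and if the server rejects it with a CROSSSLOT error, fall back to per-key commands that the client can route to each owning node individually.
// crossslot_fallback.js
// Sketch: attempt MGET, fall back to parallel per-key GETs on a CROSSSLOT error.
async function safeMget(clusterClient, keys) {
  try {
    // Succeeds only when all keys hash to the same slot (e.g. via {hash-tags}).
    return await clusterClient.mget(...keys);
  } catch (err) {
    if (err.message && err.message.includes('CROSSSLOT')) {
      // Each single-key GET is routed to the correct node independently.
      return Promise.all(keys.map((key) => clusterClient.get(key)));
    }
    throw err; // unrelated error: surface it
  }
}
Note the trade-off: the per-key fallback is no longer a single atomic operation, which is exactly why same-slot grouping matters for correctness-sensitive code.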
Practical Considerations for HA/Clustering
- Network Latency: In a distributed setup, network latency between nodes and clients becomes a critical factor.
- Quorum: For Sentinel, a quorum is the minimum number of Sentinels that must agree on a master’s state to initiate a failover. For Cluster, a majority of master nodes must be reachable for the cluster to remain operational.
- Data Integrity: Ensure your persistence strategy (RDB/AOF) is properly configured on all master and replica nodes in a cluster to minimize data loss.
- Client Libraries: Always use Redis client libraries that are explicitly “Sentinel-aware” or “Cluster-aware” as they handle topology discovery, failover, and redirection automatically.
- Monitoring: Robust monitoring of all Redis instances, Sentinels, and cluster nodes is crucial for production deployments; a minimal health-check sketch follows this list.
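As a minimal starting point for the monitoring bullet above (a sketch only, assuming an already-connected ioredis client; in production you would feed this into a real monitoring stack), the `INFO replication` section can be parsed to report a node's role and replica count:
// replication_check.js
// Minimal health-check sketch: report role and replica count from INFO.
async function checkReplication(redis) {
  const info = await redis.info('replication'); // raw "key:value" text block
  const fields = {};
  for (const line of info.split('\r\n')) {
    const sep = line.indexOf(':');
    if (sep !== -1) fields[line.slice(0, sep)] = line.slice(sep + 1);
  }
  console.log(`role=${fields.role}, connected replicas=${fields.connected_slaves}`);
  return fields;
}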
Exercises / Mini-Challenges
Sentinel Setup (Conceptual & Local Simulation):
- Goal: Understand the components of a Sentinel setup.
- Without actually deploying anything, draw a diagram of a Redis Sentinel deployment for a master named `my_product_cache`. Include one master, two replicas, and three Sentinel instances.
- Describe the steps involved if the master fails and a replica is promoted.
- Challenge: Research and write down the minimal `redis.conf` and `sentinel.conf` configurations needed for such a setup. (You can skip actual deployment for this exercise and focus on understanding the configuration.)
Cluster Sharding Observation:
- Goal: Observe how Redis Cluster shards data.
- If you have a local Redis Cluster running (e.g., via Docker Compose example projects online):
  - Connect using `redis-cli -c -p 7000` (where `7000` is one of your cluster node ports; `-c` enables cluster mode).
  - Run `SET user:1 value1`, `SET user:2 value2`, and `SET user:3 value3`.
  - Observe how `redis-cli` redirects your commands to different nodes. Use `CLUSTER KEYSLOT <key>` to see which hash slot a key belongs to.
  - Now use hash tags: `SET {myuser}1 valueA` and `SET {myuser}2 valueB`. Verify they land on the same node.
- Challenge: What happens if you try to run `MGET user:1 user:2` directly (without hash tags)? Why?
Client Failover Test (requires local Sentinel setup):
- Goal: Verify automatic client failover with Sentinel.
- Set up a minimal Redis Sentinel environment (1 master, 1 replica, 3 Sentinels – Docker Compose is easiest).
- Start your `redis_sentinel_client.js` (or a Python equivalent). Observe it connecting to the master.
- Manually stop the master Redis instance (e.g., `docker stop <master_container_name>`).
- Observe the Sentinel logs showing a failover being initiated and a replica being promoted.
- Observe your client application: Does it reconnect and continue functioning without interruption (or with minimal interruption)?
- Challenge: After the failover, manually start the old master. What happens? (It should rejoin as a replica of the new master).
By successfully completing these challenges, you’ll gain a deep understanding of how Redis provides high availability and scales horizontally, enabling you to design and operate robust Redis-backed systems capable of handling real-world loads and failures. Our next chapter focuses on Best Practices and Performance Tuning.