5. Guided Project 1: Building a Cached LLM Chatbot

In this project, you will build a basic chatbot that answers user questions. The core idea is to integrate Redis LangCache to minimize calls to an expensive LLM (simulated here by a mock function), thereby improving response times and reducing operational costs.

Project Objective

To develop a simple command-line chatbot that processes user queries. For each query:

  1. It first checks Redis LangCache for a semantically similar answer.
  2. If a cached answer is found (cache hit), it returns it immediately.
  3. If no cached answer is found (cache miss), it calls a mock LLM (simulating an actual LLM API call) to get a fresh response.
  4. The new prompt-response pair from the mock LLM is then stored in LangCache for future use.
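
In condensed form, this is the classic cache-aside pattern. The sketch below is illustrative only: the cache and mock_llm arguments stand in for the LangCache client and mock LLM you will build in Step 1, and the parameter names simply mirror the code later in this chapter.

async def answer(query, cache, mock_llm):
    """Cache-aside flow: check the semantic cache, fall back to the LLM, then store."""
    hits = await cache.search(prompt=query, similarity_threshold=0.8)  # 1. semantic lookup
    if hits:                                                           # 2. cache hit
        return hits[0].response
    response = await mock_llm(query)                                   # 3. cache miss -> call LLM
    await cache.set(prompt=query, response=response)                   # 4. store for future reuse
    return response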

Prerequisites

  • Completed “Setting Up Your Development Environment” (Chapter 1).
  • Understanding of “Core Concepts of Semantic Caching” (Chapter 2) and “Basic Operations” (Chapter 3).

Project Structure

Create a new directory for this project, e.g., learn-redis-langcache/projects/chatbot-project.

chatbot-project/index.js (for Node.js) or chatbot-project/chatbot.py (for Python)

chatbot-project/mock_llm.js (for Node.js; the Python example defines mock_llm_response directly in chatbot.py)

.env file in the root learn-redis-langcache directory (as set up in Chapter 1).
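
For reference, the code in this project reads three variables from that .env file. The values below are placeholders; use the credentials you created in Chapter 1.

LANGCACHE_API_HOST=your-langcache-api-host
LANGCACHE_CACHE_ID=your-cache-id
LANGCACHE_API_KEY=your-api-key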

Step-by-Step Instructions

Step 1: Initialize LangCache Client and Mock LLM

We’ll start by setting up our LangCache client and a simple mock LLM function. The mock LLM will simulate the behavior of a real LLM by providing responses for a few predefined queries and a generic fallback for others.

Node.js (projects/chatbot-project/index.js)

// projects/chatbot-project/index.js
require('dotenv').config({ path: '../../.env' }); // Adjust path to your .env file

const { LangCache } = require('@redis-ai/langcache');
const readline = require('readline');
const { mockLlmResponse } = require('./mock_llm'); // Will create this next

// Retrieve LangCache credentials
const LANGCACHE_API_HOST = process.env.LANGCACHE_API_HOST;
const LANGCACHE_CACHE_ID = process.env.LANGCACHE_CACHE_ID;
const LANGCACHE_API_KEY = process.env.LANGCACHE_API_KEY;

// Initialize LangCache client
const langCache = new LangCache({
    serverURL: `https://${LANGCACHE_API_HOST}`,
    cacheId: LANGCACHE_CACHE_ID,
    apiKey: LANGCACHE_API_KEY,
});

console.log("Chatbot initialized. Type 'exit' to quit.");

// Readline interface for user input
const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout
});

async function chat() {
    rl.question('You: ', async (query) => {
        if (query.toLowerCase() === 'exit') {
            console.log('Goodbye!');
            rl.close();
            return;
        }

        let response = '';
        let source = '';

        // Try to get response from LangCache
        try {
            const cacheResults = await langCache.search({
                prompt: query,
                similarityThreshold: 0.8 // Adjust as needed
            });

            if (cacheResults && cacheResults.results.length > 0) {
                response = cacheResults.results[0].response;
                source = 'Cache';
                console.log(`Bot (from Cache, score: ${cacheResults.results[0].score.toFixed(4)}): ${response}`);
            } else {
                // Cache Miss - call Mock LLM
                console.log('Cache Miss. Calling Mock LLM...');
                response = await mockLlmResponse(query);
                source = 'LLM';
                console.log(`Bot (from LLM): ${response}`);

                // Store new prompt-response pair in LangCache
                await langCache.set({ prompt: query, response: response });
                console.log('Response stored in LangCache for future use.');
            }
        } catch (error) {
            console.error('Error interacting with LangCache:', error.message);
            // Fallback to LLM directly if cache fails
            response = await mockLlmResponse(query);
            source = 'LLM (fallback)';
            console.log(`Bot (from LLM fallback): ${response}`);
        }

        chat(); // Continue the chat
    });
}

chat();

Node.js (projects/chatbot-project/mock_llm.js)

// projects/chatbot-project/mock_llm.js
async function mockLlmResponse(prompt) {
    // Simulate network delay for LLM call
    await new Promise(resolve => setTimeout(resolve, 1500)); 

    const lowerPrompt = prompt.toLowerCase();

    if (lowerPrompt.includes("hello") || lowerPrompt.includes("hi")) {
        return "Hello there! How can I assist you today?";
    } else if (lowerPrompt.includes("product features")) {
        return "Our latest product features include AI-powered analytics, real-time collaboration tools, and a secure cloud infrastructure.";
    } else if (lowerPrompt.includes("pricing")) {
        return "Our pricing plans start at $29 per month. Please visit our website for more details.";
    } else if (lowerPrompt.includes("contact support")) {
        return "You can reach our support team via email at support@example.com or call us at 1-800-555-0100.";
    } else if (lowerPrompt.includes("goodbye") || lowerPrompt.includes("bye")) {
        return "Goodbye! Have a great day!";
    } else {
        return `I'm a simple mock LLM. You asked: "${prompt}". I don't have a specific answer for that, but I can learn!`;
    }
}

module.exports = { mockLlmResponse };

Python (projects/chatbot-project/chatbot.py)

# projects/chatbot-project/chatbot.py
import os
import asyncio
from dotenv import load_dotenv
from langcache import LangCache

# Load environment variables from the parent .env file
load_dotenv(dotenv_path='../../.env')

# Retrieve LangCache credentials
LANGCACHE_API_HOST = os.getenv("LANGCACHE_API_HOST")
LANGCACHE_CACHE_ID = os.getenv("LANGCACHE_CACHE_ID")
LANGCACHE_API_KEY = os.getenv("LANGCACHE_API_KEY")

# Initialize LangCache client
lang_cache = LangCache(
    server_url=f"https://{LANGCACHE_API_HOST}",
    cache_id=LANGCACHE_CACHE_ID,
    api_key=LANGCACHE_API_KEY
)

print("Chatbot initialized. Type 'exit' to quit.")

async def mock_llm_response(prompt: str) -> str:
    """Simulates an LLM API call with a delay and predefined responses."""
    # Simulate network delay for LLM call
    await asyncio.sleep(1.5)

    lower_prompt = prompt.lower()

    if "hello" in lower_prompt or "hi" in lower_prompt:
        return "Hello there! How can I assist you today?"
    elif "product features" in lower_prompt:
        return "Our latest product features include AI-powered analytics, real-time collaboration tools, and a secure cloud infrastructure."
    elif "pricing" in lower_prompt:
        return "Our pricing plans start at $29 per month. Please visit our website for more details."
    elif "contact support" in lower_prompt:
        return "You can reach our support team via email at support@example.com or call us at 1-800-555-0100."
    elif "goodbye" in lower_prompt or "bye" in lower_prompt:
        return "Goodbye! Have a great day!"
    else:
        return f"I'm a simple mock LLM. You asked: \"{prompt}\". I don't have a specific answer for that, but I can learn!"

async def chat():
    while True:
        try:
            query = await asyncio.to_thread(input, 'You: ')
        except EOFError: # Handle Ctrl+D
            print('Goodbye!')
            break

        if query.lower() == 'exit':
            print('Goodbye!')
            break

        response = ''
        source = ''

        # Try to get response from LangCache
        try:
            cache_results = await lang_cache.search(
                prompt=query,
                similarity_threshold=0.8 # Adjust as needed
            )

            if cache_results:
                response = cache_results[0].response
                source = 'Cache'
                print(f"Bot (from Cache, score: {cache_results[0].score:.4f}): {response}")
            else:
                # Cache Miss - call Mock LLM
                print('Cache Miss. Calling Mock LLM...')
                response = await mock_llm_response(query)
                source = 'LLM'
                print(f"Bot (from LLM): {response}")

                # Store new prompt-response pair in LangCache
                await lang_cache.set(prompt=query, response=response)
                print('Response stored in LangCache for future use.')
        except Exception as e:
            print(f"Error interacting with LangCache: {e}")
            # Fallback to LLM directly if cache fails
            response = await mock_llm_response(query)
            source = 'LLM (fallback)'
            print(f"Bot (from LLM fallback): {response}")


if __name__ == "__main__":
    asyncio.run(chat())

Step 2: Run and Test the Chatbot

Node.js:

  1. Navigate to learn-redis-langcache/projects/chatbot-project.
  2. Run node index.js.

Python:

  1. Navigate to learn-redis-langcache/projects/chatbot-project.
  2. Run python chatbot.py.

Testing Scenario:

  • First Interaction (Cache Miss):
    • You: Hello there!
    • Bot: Cache Miss. Calling Mock LLM...
    • Bot: Hello there! How can I assist you today? (and “Response stored…”)
  • Second Interaction (Cache Hit):
    • You: Hi!
    • Bot: Bot (from Cache, score: X.XXXX): Hello there! How can I assist you today? (Notice the “from Cache” label, the similarity score, and the much faster response.)
  • Another First Interaction (Cache Miss):
    • You: What are your product features?
    • Bot: Cache Miss. Calling Mock LLM...
    • Bot: Our latest product features include AI-powered analytics...
  • Another Second Interaction (Cache Hit):
    • You: Tell me about the product features.
    • Bot: Bot (from Cache, score: X.XXXX): Our latest product features include AI-powered analytics...
  • Completely New Question (Cache Miss):
    • You: What is the meaning of life?
    • Bot: Cache Miss. Calling Mock LLM...
    • Bot: I'm a simple mock LLM. You asked: "What is the meaning of life?". I don't have a specific answer for that, but I can learn!

Observe the “Cache Miss” messages when a new or semantically distinct query is made, and the “from Cache” responses (returned noticeably faster) when a similar query is repeated.
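
If you want to quantify the speedup rather than just eyeball it, wrap the calls in a simple timer. This is an optional sketch using Python's standard time.perf_counter; the timed helper and its placement are not part of the project code above.

import time

async def timed(label, coro):
    """Await a coroutine and print how long it took."""
    start = time.perf_counter()
    result = await coro
    print(f"{label} took {time.perf_counter() - start:.2f}s")
    return result

# Hypothetical usage inside the chat loop:
# cache_results = await timed("Cache lookup", lang_cache.search(prompt=query, similarity_threshold=0.8))
# response = await timed("Mock LLM call", mock_llm_response(query))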

Step 3: Experiment and Refine

Challenge Yourself:

  1. Adjust similarity_threshold:

    • Change similarity_threshold in the search call (e.g., to 0.95 for stricter matches or 0.7 for looser matches). Rerun your chatbot and observe how the cache hit/miss behavior changes.
  2. Add more mock LLM responses:

    • Expand your mockLlmResponse (Node.js) or mock_llm_response (Python) function with more predefined questions and answers. Test if LangCache correctly identifies semantic similarities.
  3. Implement Per-Entry TTL:

    • Modify the langCache.set call to include a ttl for certain types of responses. For example, if a question is about “today’s news,” set a short TTL (e.g., 3600 seconds = 1 hour). See the TTL sketch after this challenge list.
    • Test by asking a time-sensitive question, waiting for the TTL to expire, and then asking it again.
  4. Add user-specific context using Attributes:

    • Modify the chat function to accept a user_id (e.g., a simple hardcoded string or ask the user for it at the start).
    • Store and retrieve responses using this user_id as an attribute. This ensures that one user’s cached responses don’t interfere with another’s if the context is user-specific.

    Hint for user-specific context:

    # Python
    user_id = "user_A" # Or get from input
    # Store:
    await lang_cache.set(prompt=query, response=response, attributes={"user_id": user_id})
    # Search:
    cache_results = await lang_cache.search(prompt=query, attributes={"user_id": user_id})
    
    // Node.js
    const userId = "user_A"; // Or get from input
    // Store:
    await langCache.set({ prompt: query, response: response, attributes: { userId: userId } });
    // Search:
    const cacheResults = await langCache.search({ prompt: query, attributes: { userId: userId } });
    
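Hint for per-entry TTL (Challenge 3): the sketch below assumes the set call accepts a TTL option, as discussed in Chapter 3. The exact parameter name may differ in your SDK version, so treat ttl here as an assumption and check the set() signature before relying on it.

# Python (sketch): `ttl` in seconds is an assumed parameter name -- verify against your SDK
if "news" in query.lower():
    # Time-sensitive answers expire after one hour (3600 seconds)
    await lang_cache.set(prompt=query, response=response, ttl=3600)
else:
    await lang_cache.set(prompt=query, response=response)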

This project provides a practical foundation for integrating Redis LangCache into real-world AI applications. By actively experimenting with parameters and features, you’ll gain deeper insights into its capabilities.