5. Guided Project 1: Building a Cached LLM Chatbot

In this project, you will build a basic chatbot that answers user questions. The core idea is to integrate Redis LangCache to minimize calls to an expensive LLM (simulated here by a mock function), thereby improving response times and reducing operational costs.

Project Objective

To develop a simple command-line chatbot that processes user queries. For each query:

  1. It first checks Redis LangCache for a semantically similar answer.
  2. If a cached answer is found (cache hit), it returns it immediately.
  3. If no cached answer is found (cache miss), it calls a mock LLM (simulating an actual LLM API call) to get a fresh response.
  4. The new prompt-response pair from the mock LLM is then stored in LangCache for future use.
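
In condensed form, this is the classic cache-aside pattern. The sketch below is illustrative only: the cache and mock_llm arguments stand in for the LangCache client and mock LLM you will build in Step 1, and the parameter names simply mirror the code later in this chapter.

async def answer(query, cache, mock_llm):
    """Cache-aside flow: check the semantic cache, fall back to the LLM, then store."""
    hits = await cache.search(prompt=query, similarity_threshold=0.8)  # 1. semantic lookup
    if hits:                                                           # 2. cache hit
        return hits[0].response
    response = await mock_llm(query)                                   # 3. cache miss -> call LLM
    await cache.set(prompt=query, response=response)                   # 4. store for future reuse
    return response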

Prerequisites

  • Completed “Setting Up Your Development Environment” (Chapter 1).
  • Understanding of “Core Concepts of Semantic Caching” (Chapter 2) and “Basic Operations” (Chapter 3).

Project Structure

Create a new directory for this project, e.g., learn-redis-langcache/projects/chatbot-project.

chatbot-project/index.js (for Node.js) or chatbot-project/chatbot.py (for Python)

chatbot-project/mock_llm.js (for Node.js; the Python example defines mock_llm_response directly in chatbot.py)

.env file in the root learn-redis-langcache directory (as set up in Chapter 1).
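
For reference, the code in this project reads three variables from that .env file. The values below are placeholders; use the credentials you created in Chapter 1.

LANGCACHE_API_HOST=your-langcache-api-host
LANGCACHE_CACHE_ID=your-cache-id
LANGCACHE_API_KEY=your-api-key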

Step-by-Step Instructions

Step 1: Initialize LangCache Client and Mock LLM

We’ll start by setting up our LangCache client and a simple mock LLM function. The mock LLM will simulate the behavior of a real LLM by providing responses for a few predefined queries and a generic fallback for others.

Node.js (projects/chatbot-project/index.js)

// projects/chatbot-project/index.js
require('dotenv').config({ path: '../../.env' }); // Adjust path to your .env file

const { LangCache } = require('@redis-ai/langcache');
const readline = require('readline');
const { mockLlmResponse } = require('./mock_llm'); // Will create this next

// Retrieve LangCache credentials
const LANGCACHE_API_HOST = process.env.LANGCACHE_API_HOST;
const LANGCACHE_CACHE_ID = process.env.LANGCACHE_CACHE_ID;
const LANGCACHE_API_KEY = process.env.LANGCACHE_API_KEY;

// Initialize LangCache client
const langCache = new LangCache({
    serverURL: `https://${LANGCACHE_API_HOST}`,
    cacheId: LANGCACHE_CACHE_ID,
    apiKey: LANGCACHE_API_KEY,
});

console.log("Chatbot initialized. Type 'exit' to quit.");

// Readline interface for user input
const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout
});

async function chat() {
    rl.question('You: ', async (query) => {
        if (query.toLowerCase() === 'exit') {
            console.log('Goodbye!');
            rl.close();
            return;
        }

        let response = '';
        let source = '';

        // Try to get response from LangCache
        try {
            const cacheResults = await langCache.search({
                prompt: query,
                similarityThreshold: 0.8 // Adjust as needed
            });

            if (cacheResults && cacheResults.results.length > 0) {
                response = cacheResults.results[0].response;
                source = 'Cache';
                console.log(`Bot (from Cache, score: ${cacheResults.results[0].score.toFixed(4)}): ${response}`);
            } else {
                // Cache Miss - call Mock LLM
                console.log('Cache Miss. Calling Mock LLM...');
                response = await mockLlmResponse(query);
                source = 'LLM';
                console.log(`Bot (from LLM): ${response}`);

                // Store new prompt-response pair in LangCache
                await langCache.set({ prompt: query, response: response });
                console.log('Response stored in LangCache for future use.');
            }
        } catch (error) {
            console.error('Error interacting with LangCache:', error.message);
            // Fallback to LLM directly if cache fails
            response = await mockLlmResponse(query);
            source = 'LLM (fallback)';
            console.log(`Bot (from LLM fallback): ${response}`);
        }

        chat(); // Continue the chat
    });
}

chat();

Node.js (projects/chatbot-project/mock_llm.js)

// projects/chatbot-project/mock_llm.js
async function mockLlmResponse(prompt) {
    // Simulate network delay for LLM call
    await new Promise(resolve => setTimeout(resolve, 1500)); 

    const lowerPrompt = prompt.toLowerCase();

    if (lowerPrompt.includes("hello") || lowerPrompt.includes("hi")) {
        return "Hello there! How can I assist you today?";
    } else if (lowerPrompt.includes("product features")) {
        return "Our latest product features include AI-powered analytics, real-time collaboration tools, and a secure cloud infrastructure.";
    } else if (lowerPrompt.includes("pricing")) {
        return "Our pricing plans start at $29 per month. Please visit our website for more details.";
    } else if (lowerPrompt.includes("contact support")) {
        return "You can reach our support team via email at support@example.com or call us at 1-800-555-0100.";
    } else if (lowerPrompt.includes("goodbye") || lowerPrompt.includes("bye")) {
        return "Goodbye! Have a great day!";
    } else {
        return `I'm a simple mock LLM. You asked: "${prompt}". I don't have a specific answer for that, but I can learn!`;
    }
}

module.exports = { mockLlmResponse };

Python (projects/chatbot-project/chatbot.py)

# projects/chatbot-project/chatbot.py
import os
import asyncio
from dotenv import load_dotenv
from langcache import LangCache

# Load environment variables from the parent .env file
load_dotenv(dotenv_path='../../.env')

# Retrieve LangCache credentials
LANGCACHE_API_HOST = os.getenv("LANGCACHE_API_HOST")
LANGCACHE_CACHE_ID = os.getenv("LANGCACHE_CACHE_ID")
LANGCACHE_API_KEY = os.getenv("LANGCACHE_API_KEY")

# Initialize LangCache client
lang_cache = LangCache(
    server_url=f"https://{LANGCACHE_API_HOST}",
    cache_id=LANGCACHE_CACHE_ID,
    api_key=LANGCACHE_API_KEY
)

print("Chatbot initialized. Type 'exit' to quit.")

async def mock_llm_response(prompt: str) -> str:
    """Simulates an LLM API call with a delay and predefined responses."""
    # Simulate network delay for LLM call
    await asyncio.sleep(1.5)

    lower_prompt = prompt.lower()

    if "hello" in lower_prompt or "hi" in lower_prompt:
        return "Hello there! How can I assist you today?"
    elif "product features" in lower_prompt:
        return "Our latest product features include AI-powered analytics, real-time collaboration tools, and a secure cloud infrastructure."
    elif "pricing" in lower_prompt:
        return "Our pricing plans start at $29 per month. Please visit our website for more details."
    elif "contact support" in lower_prompt:
        return "You can reach our support team via email at support@example.com or call us at 1-800-555-0100."
    elif "goodbye" in lower_prompt or "bye" in lower_prompt:
        return "Goodbye! Have a great day!"
    else:
        return f"I'm a simple mock LLM. You asked: \"{prompt}\". I don't have a specific answer for that, but I can learn!"

async def chat():
    while True:
        try:
            query = await asyncio.to_thread(input, 'You: ')
        except EOFError: # Handle Ctrl+D
            print('Goodbye!')
            break

        if query.lower() == 'exit':
            print('Goodbye!')
            break

        response = ''
        source = ''

        # Try to get response from LangCache
        try:
            cache_results = await lang_cache.search(
                prompt=query,
                similarity_threshold=0.8 # Adjust as needed
            )

            if cache_results:
                response = cache_results[0].response
                source = 'Cache'
                print(f"Bot (from Cache, score: {cache_results[0].score:.4f}): {response}")
            else:
                # Cache Miss - call Mock LLM
                print('Cache Miss. Calling Mock LLM...')
                response = await mock_llm_response(query)
                source = 'LLM'
                print(f"Bot (from LLM): {response}")

                # Store new prompt-response pair in LangCache
                await lang_cache.set(prompt=query, response=response)
                print('Response stored in LangCache for future use.')
        except Exception as e:
            print(f"Error interacting with LangCache: {e}")
            # Fallback to LLM directly if cache fails
            response = await mock_llm_response(query)
            source = 'LLM (fallback)'
            print(f"Bot (from LLM fallback): {response}")


if __name__ == "__main__":
    asyncio.run(chat())

Step 2: Run and Test the Chatbot

Node.js:

  1. Navigate to learn-redis-langcache/projects/chatbot-project.
  2. Run node index.js.

Python:

  1. Navigate to learn-redis-langcache/projects/chatbot-project.
  2. Run python chatbot.py.

Testing Scenario:

  • First Interaction (Cache Miss):
    • You: Hello there!
    • Bot: Cache Miss. Calling Mock LLM...
    • Bot: Hello there! How can I assist you today? (and “Response stored…”)
  • Second Interaction (Cache Hit):
    • You: Hi!
    • Bot: Bot (from Cache, score: X.XXXX): Hello there! How can I assist you today? (Notice the “from Cache” label, the similarity score, and the much faster response.)
  • Another First Interaction (Cache Miss):
    • You: What are your product features?
    • Bot: Cache Miss. Calling Mock LLM...
    • Bot: Our latest product features include AI-powered analytics...
  • Another Second Interaction (Cache Hit):
    • You: Tell me about the product features.
    • Bot: Bot (from Cache, score: X.XXXX): Our latest product features include AI-powered analytics...
  • Completely New Question (Cache Miss):
    • You: What is the meaning of life?
    • Bot: Cache Miss. Calling Mock LLM...
    • Bot: I'm a simple mock LLM. You asked: "What is the meaning of life?". I don't have a specific answer for that, but I can learn!

Observe the “Cache Miss” messages when a new or semantically distinct query is made, and the “from Cache” responses (returned noticeably faster) when a similar query is repeated.
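
If you want to quantify the speedup rather than just eyeball it, wrap the calls in a simple timer. This is an optional sketch using Python's standard time.perf_counter; the timed helper and its placement are not part of the project code above.

import time

async def timed(label, coro):
    """Await a coroutine and print how long it took."""
    start = time.perf_counter()
    result = await coro
    print(f"{label} took {time.perf_counter() - start:.2f}s")
    return result

# Hypothetical usage inside the chat loop:
# cache_results = await timed("Cache lookup", lang_cache.search(prompt=query, similarity_threshold=0.8))
# response = await timed("Mock LLM call", mock_llm_response(query))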

Step 3: Experiment and Refine

Challenge Yourself:

  1. Adjust similarity_threshold:

    • Change similarity_threshold in the search call (e.g., to 0.95 for stricter matches or 0.7 for looser matches). Rerun your chatbot and observe how the cache hit/miss behavior changes.
  2. Add more mock LLM responses:

    • Expand your mockLlmResponse (Node.js) or mock_llm_response (Python) function with more predefined questions and answers. Test if LangCache correctly identifies semantic similarities.
  3. Implement Per-Entry TTL:

    • Modify the langCache.set call to include a ttl for certain types of responses. For example, if a question is about “today’s news,” set a short TTL (e.g., 3600 seconds = 1 hour). See the TTL sketch after this challenge list.
    • Test by asking a time-sensitive question, waiting for the TTL to expire, and then asking it again.
  4. Add user-specific context using Attributes:

    • Modify the chat function to accept a user_id (e.g., a simple hardcoded string or ask the user for it at the start).
    • Store and retrieve responses using this user_id as an attribute. This ensures that one user’s cached responses don’t interfere with another’s if the context is user-specific.

    Hint for user-specific context:

    # Python
    user_id = "user_A" # Or get from input
    # Store:
    await lang_cache.set(prompt=query, response=response, attributes={"user_id": user_id})
    # Search:
    cache_results = await lang_cache.search(prompt=query, attributes={"user_id": user_id})
    
    // Node.js
    const userId = "user_A"; // Or get from input
    // Store:
    await langCache.set({ prompt: query, response: response, attributes: { userId: userId } });
    // Search:
    const cacheResults = await langCache.search({ prompt: query, attributes: { userId: userId } });
    
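Hint for per-entry TTL (Challenge 3): the sketch below assumes the set call accepts a TTL option, as discussed in Chapter 3. The exact parameter name may differ in your SDK version, so treat ttl here as an assumption and check the set() signature before relying on it.

# Python (sketch): `ttl` in seconds is an assumed parameter name -- verify against your SDK
if "news" in query.lower():
    # Time-sensitive answers expire after one hour (3600 seconds)
    await lang_cache.set(prompt=query, response=response, ttl=3600)
else:
    await lang_cache.set(prompt=query, response=response)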

This project provides a practical foundation for integrating Redis LangCache into real-world AI applications. By actively experimenting with parameters and features, you’ll gain deeper insights into its capabilities.