4. Advanced LangCache Features and Optimization

Beyond basic set and search operations, Redis LangCache offers several powerful features and configuration options to fine-tune its behavior. Understanding these allows you to optimize cache performance, cost efficiency, and relevance for your specific AI applications.

4.1 Fine-tuning Similarity Threshold

The similarity_threshold (Python) or similarityThreshold (Node.js) parameter in the search method is crucial. It determines how closely a new prompt’s embedding must match a cached embedding for it to be considered a “hit.”

  • Higher Threshold (e.g., 0.95-1.0):

    • Pros: Very precise matches. Reduces the chance of returning irrelevant responses.
    • Cons: More cache misses. May not catch subtle semantic variations, leading to more LLM calls.
    • Use Case: When accuracy and strict relevance are paramount, and false positives are unacceptable.
  • Lower Threshold (e.g., 0.7-0.8):

    • Pros: More cache hits, even for loosely similar prompts. Maximizes cost savings and speed.
    • Cons: Increased chance of returning less relevant or slightly off-topic responses.
    • Use Case: When a broader range of answers is acceptable, and optimizing for speed and cost is the priority (e.g., general chatbots where a “good enough” answer is often sufficient).

The optimal threshold depends heavily on your application’s domain, user expectations, and the quality of your embedding model. It’s often an iterative process of experimentation and monitoring.

Code Example: Experimenting with Thresholds

# python-examples/advanced_features.py
import os
import asyncio
from dotenv import load_dotenv
from langcache import LangCache

load_dotenv()

LANGCACHE_API_HOST = os.getenv("LANGCACHE_API_HOST")
LANGCACHE_CACHE_ID = os.getenv("LANGCACHE_CACHE_ID")
LANGCACHE_API_KEY = os.getenv("LANGCACHE_API_KEY")

lang_cache = LangCache(
    server_url=f"https://{LANGCACHE_API_HOST}",
    cache_id=LANGCACHE_CACHE_ID,
    api_key=LANGCACHE_API_KEY
)

async def threshold_experiment():
    print("--- Similarity Threshold Experiment (Python) ---")

    await lang_cache.set(prompt="What are the key features of the latest iPhone?", response="The latest iPhone typically features a powerful Bionic chip, advanced camera systems including ProRes video, Ceramic Shield front cover, and MagSafe technology.")
    await asyncio.sleep(2) # Allow indexing

    query_strict = "Tell me about the iPhone's camera capabilities."
    query_lenient = "What's new with iPhone?"

    print(f"\nQuery (Strict): '{query_strict}'")
    for threshold in [0.95, 0.85, 0.75]:
        results = await lang_cache.search(prompt=query_strict, similarity_threshold=threshold)
        status = "Hit" if results else "Miss"
        # Format on hit only; applying :.4f to the "N/A" string would raise a ValueError.
        score = f"{results[0].score:.4f}" if results else "N/A"
        print(f"  Threshold {threshold:.2f}: {status}, Score: {score}")

    print(f"\nQuery (Lenient): '{query_lenient}'")
    for threshold in [0.95, 0.85, 0.75]:
        results = await lang_cache.search(prompt=query_lenient, similarity_threshold=threshold)
        status = "Hit" if results else "Miss"
        # Format on hit only; applying :.4f to the "N/A" string would raise a ValueError.
        score = f"{results[0].score:.4f}" if results else "N/A"
        print(f"  Threshold {threshold:.2f}: {status}, Score: {score}")

    print("\nThreshold experiment complete.")

# In main(): await threshold_experiment()

// nodejs-examples/advanced_features.js
require('dotenv').config({ path: '../.env' });
const { LangCache } = require('@redis-ai/langcache');

const LANGCACHE_API_HOST = process.env.LANGCACHE_API_HOST;
const LANGCACHE_CACHE_ID = process.env.LANGCACHE_CACHE_ID;
const LANGCACHE_API_KEY = process.env.LANGCACHE_API_KEY;

const langCache = new LangCache({
    serverURL: `https://${LANGCACHE_API_HOST}`,
    cacheId: LANGCACHE_CACHE_ID,
    apiKey: LANGCACHE_API_KEY,
});

async function thresholdExperiment() {
    console.log("--- Similarity Threshold Experiment (Node.js) ---");

    await langCache.set({ prompt: "What are the key features of the latest iPhone?", response: "The latest iPhone typically features a powerful Bionic chip, advanced camera systems including ProRes video, Ceramic Shield front cover, and MagSafe technology." });
    await new Promise(resolve => setTimeout(resolve, 2000)); // Allow indexing

    const queryStrict = "Tell me about the iPhone's camera capabilities.";
    const queryLenient = "What's new with iPhone?";

    console.log(`\nQuery (Strict): '${queryStrict}'`);
    for (const threshold of [0.95, 0.85, 0.75]) {
        const results = await langCache.search({ prompt: queryStrict, similarityThreshold: threshold });
        const status = (results && results.results.length > 0) ? "Hit" : "Miss";
        const score = (results && results.results.length > 0) ? results.results[0].score.toFixed(4) : "N/A";
        console.log(`  Threshold ${threshold.toFixed(2)}: ${status}, Score: ${score}`);
    }

    console.log(`\nQuery (Lenient): '${queryLenient}'`);
    for (const threshold of [0.95, 0.85, 0.75]) {
        const results = await langCache.search({ prompt: queryLenient, similarityThreshold: threshold });
        const status = (results && results.results.length > 0) ? "Hit" : "Miss";
        const score = (results && results.results.length > 0) ? results.results[0].score.toFixed(4) : "N/A";
        console.log(`  Threshold ${threshold.toFixed(2)}: ${status}, Score: ${score}`);
    }
    console.log("\nThreshold experiment complete.");
}

// In main(): await thresholdExperiment();

4.2 Managing Time-To-Live (TTL)

TTL (Time-To-Live) defines how long a cached entry remains valid before it’s automatically evicted from the cache. This is crucial for managing cache size and ensuring data freshness.

  • Global TTL: Configured when you create your LangCache service on Redis Cloud. This applies to all entries unless overridden.
  • Per-Entry TTL: You can specify a ttl (in seconds) when using the set method. This overrides the global TTL for that specific entry.

When to use TTL:

  • Volatile Information: For data that changes frequently (e.g., stock prices, weather updates), a short TTL is appropriate.
  • Less Volatile Information: For more stable information (e.g., historical facts, general product descriptions), a longer TTL or no TTL might be suitable.
  • Cache Management: Prevents the cache from growing indefinitely and consuming excessive resources.

Code Example: Per-Entry TTL

# In python-examples/advanced_features.py
async def ttl_example():
    print("\n--- Per-Entry TTL Example (Python) ---")

    # Store an entry with a short TTL (e.g., 10 seconds)
    prompt_short_ttl = "What is the capital of Mars?"
    response_short_ttl = "There is no official capital of Mars, as it is not inhabited by humans with a governing body."
    short_ttl = 10
    print(f"Storing: '{prompt_short_ttl}' with TTL: {short_ttl} seconds")
    key_short = await lang_cache.set(prompt=prompt_short_ttl, response=response_short_ttl, ttl=short_ttl)
    print(f"Stored with key: {key_short}")

    # Search immediately (should be a hit)
    print(f"Searching immediately for '{prompt_short_ttl}'...")
    results_immediate = await lang_cache.search(prompt="Mars capital?")
    if results_immediate:
        print(f"  Immediate search: Cache Hit! Response: {results_immediate[0].response[:30]}..., Score: {results_immediate[0].score:.4f}")
    else:
        print("  Immediate search: Cache Miss.")

    # Wait for TTL to expire
    print(f"Waiting {short_ttl + 2} seconds for entry to expire...")
    await asyncio.sleep(short_ttl + 2)

    # Search after TTL expiration (should be a miss)
    print(f"Searching after {short_ttl} seconds for '{prompt_short_ttl}'...")
    results_expired = await lang_cache.search(prompt="Mars capital?")
    if results_expired:
        print(f"  Expired search: Cache Hit! (Unexpected) Response: {results_expired[0].response[:30]}..., Score: {results_expired[0].score:.4f}")
    else:
        print("  Expired search: Cache Miss. (Expected)")

    print("\nTTL example complete.")

# In main(): await ttl_example()

// In nodejs-examples/advanced_features.js
async function ttlExample() {
    console.log("\n--- Per-Entry TTL Example (Node.js) ---");

    // Store an entry with a short TTL (e.g., 10 seconds)
    const promptShortTtl = "What is the capital of Mars?";
    const responseShortTtl = "There is no official capital of Mars, as it is not inhabited by humans with a governing body.";
    const shortTtl = 10; // seconds
    console.log(`Storing: '${promptShortTtl}' with TTL: ${shortTtl} seconds`);
    const storeResultShort = await langCache.set({ prompt: promptShortTtl, response: responseShortTtl, ttl: shortTtl });
    console.log(`Stored with entry ID: ${storeResultShort.entryId}`);

    // Search immediately (should be a hit)
    console.log(`Searching immediately for '${promptShortTtl}'...`);
    const resultsImmediate = await langCache.search({ prompt: "Mars capital?" });
    if (resultsImmediate && resultsImmediate.results.length > 0) {
        console.log(`  Immediate search: Cache Hit! Response: ${resultsImmediate.results[0].response.substring(0, 30)}..., Score: ${resultsImmediate.results[0].score.toFixed(4)}`);
    } else {
        console.log("  Immediate search: Cache Miss.");
    }

    // Wait for TTL to expire
    console.log(`Waiting ${shortTtl + 2} seconds for entry to expire...`);
    await new Promise(resolve => setTimeout(resolve, (shortTtl + 2) * 1000));

    // Search after TTL expiration (should be a miss)
    console.log(`Searching after ${shortTtl} seconds for '${promptShortTtl}'...`);
    const resultsExpired = await langCache.search({ prompt: "Mars capital?" });
    if (resultsExpired && resultsExpired.results.length > 0) {
        console.log(`  Expired search: Cache Hit! (Unexpected) Response: ${resultsExpired.results[0].response.substring(0, 30)}..., Score: ${resultsExpired.results[0].score.toFixed(4)}`);
    } else {
        console.log("  Expired search: Cache Miss. (Expected)");
    }
    console.log("\nTTL example complete.");
}

// In main(): await ttlExample();

4.3 Scoping Cache Entries with Attributes

Attributes let you attach custom key-value pairs to cached entries. (The Python SDK accepts them as metadata in the set method but as attributes in search; the Node.js SDK uses attributes in both set and search.) This is incredibly powerful for scoping your cache operations.

Use Cases for Attributes:

  • Multi-tenant applications: Cache responses specific to a user_id, organization_id, or session_id.
  • Contextual caching: Store responses based on the topic, language, persona, or source of the query.
  • A/B testing: Cache results for different LLM models or prompt engineering strategies.

When you search with attributes, LangCache will only consider cached entries that match all the provided attributes in addition to semantic similarity.

Code Example: Attributes in Action

# In python-examples/advanced_features.py
async def attribute_caching_example():
    print("\n--- Attribute Caching Example (Python) ---")

    # Store responses with different user IDs and topics
    await lang_cache.set(prompt="How do I reset my password?", response="Visit the 'Forgot Password' page and follow the instructions.", metadata={"user_id": "user123", "topic": "account"})
    await lang_cache.set(prompt="How do I change my profile picture?", response="Go to 'Profile Settings' and upload a new image.", metadata={"user_id": "user123", "topic": "account"})
    await lang_cache.set(prompt="What are the billing options?", response="We accept credit card, PayPal, and bank transfers.", metadata={"user_id": "user456", "topic": "billing"})
    await asyncio.sleep(2)

    # Search for user123's account topic query
    query_user1 = "I forgot my login details."
    attributes_user1 = {"user_id": "user123", "topic": "account"}
    print(f"\nSearching for '{query_user1}' with attributes {attributes_user1}")
    results1 = await lang_cache.search(prompt=query_user1, attributes=attributes_user1)
    if results1:
        print(f"  Cache Hit! Response: {results1[0].response[:50]}..., Score: {results1[0].score:.4f}")
    else:
        print("  Cache Miss. (Unexpected)")

    # Search for user456's account topic query (should be a miss due to user_id mismatch)
    query_user2 = "I forgot my login details."
    attributes_user2 = {"user_id": "user456", "topic": "account"} # User ID mismatch
    print(f"\nSearching for '{query_user2}' with attributes {attributes_user2}")
    results2 = await lang_cache.search(prompt=query_user2, attributes=attributes_user2)
    if results2:
        print(f"  Cache Hit! (Unexpected) Response: {results2[0].response[:50]}..., Score: {results2[0].score:.4f}")
    else:
        print("  Cache Miss. (Expected due to user_id mismatch)")

    # Search for user456's billing topic query
    query_billing = "What are the ways to pay?"
    attributes_billing = {"user_id": "user456", "topic": "billing"}
    print(f"\nSearching for '{query_billing}' with attributes {attributes_billing}")
    results3 = await lang_cache.search(prompt=query_billing, attributes=attributes_billing)
    if results3:
        print(f"  Cache Hit! Response: {results3[0].response[:50]}..., Score: {results3[0].score:.4f}")
    else:
        print("  Cache Miss. (Unexpected)")

    print("\nAttribute caching example complete.")

# In main(): await attribute_caching_example()

// In nodejs-examples/advanced_features.js
async function attributeCachingExample() {
    console.log("\n--- Attribute Caching Example (Node.js) ---");

    // Store responses with different user IDs and topics
    await langCache.set({ prompt: "How do I reset my password?", response: "Visit the 'Forgot Password' page and follow the instructions.", attributes: { userId: "user123", topic: "account" } });
    await langCache.set({ prompt: "How do I change my profile picture?", response: "Go to 'Profile Settings' and upload a new image.", attributes: { userId: "user123", topic: "account" } });
    await langCache.set({ prompt: "What are the billing options?", response: "We accept credit card, PayPal, and bank transfers.", attributes: { userId: "user456", topic: "billing" } });
    await new Promise(resolve => setTimeout(resolve, 2000));

    // Search for user123's account topic query
    const queryUser1 = "I forgot my login details.";
    const attributesUser1 = { userId: "user123", topic: "account" };
    console.log(`\nSearching for '${queryUser1}' with attributes ${JSON.stringify(attributesUser1)}`);
    const results1 = await langCache.search({ prompt: queryUser1, attributes: attributesUser1 });
    if (results1 && results1.results.length > 0) {
        console.log(`  Cache Hit! Response: ${results1.results[0].response.substring(0, 50)}..., Score: ${results1.results[0].score.toFixed(4)}`);
    } else {
        console.log("  Cache Miss. (Unexpected)");
    }

    // Search for user456's account topic query (should be a miss due to userId mismatch)
    const queryUser2 = "I forgot my login details.";
    const attributesUser2 = { userId: "user456", topic: "account" }; // User ID mismatch
    console.log(`\nSearching for '${queryUser2}' with attributes ${JSON.stringify(attributesUser2)}`);
    const results2 = await langCache.search({ prompt: queryUser2, attributes: attributesUser2 });
    if (results2 && results2.results.length > 0) {
        console.log(`  Cache Hit! (Unexpected) Response: ${results2.results[0].response.substring(0, 50)}..., Score: ${results2.results[0].score.toFixed(4)}`);
    } else {
        console.log("  Cache Miss. (Expected due to userId mismatch)");
    }

    // Search for user456's billing topic query
    const queryBilling = "What are the ways to pay?";
    const attributesBilling = { userId: "user456", topic: "billing" };
    console.log(`\nSearching for '${queryBilling}' with attributes ${JSON.stringify(attributesBilling)}`);
    const results3 = await langCache.search({ prompt: queryBilling, attributes: attributesBilling });
    if (results3 && results3.results.length > 0) {
        console.log(`  Cache Hit! Response: ${results3.results[0].response.substring(0, 50)}..., Score: ${results3.results[0].score.toFixed(4)}`);
    } else {
        console.log("  Cache Miss. (Unexpected)");
    }
    console.log("\nAttribute caching example complete.");
}

// In main(): await attributeCachingExample();

4.4 Search Strategies (Exact vs. Semantic)

The LangCache SDKs (especially Python) allow you to specify search_strategies in the search method. This enables you to combine exact keyword matching with semantic similarity for more robust retrieval.

  • SearchStrategy.EXACT: Prioritizes exact matches of the prompt text. This is useful for very specific queries where even minor semantic variations might lead to different intended answers.
  • SearchStrategy.SEMANTIC: Relies solely on semantic similarity, as we’ve discussed.

When both are provided, LangCache typically attempts an exact match first and falls back to semantic search (governed by the similarity threshold) if no exact match is found.

Code Example: Combining Search Strategies (Python SDK specific)

The Node.js @redis-ai/langcache SDK currently doesn't expose a SearchStrategy enum the way the Python langcache package does; its search method focuses on semantic search and filtering via attributes. You can approximate the combined behavior in Node.js by performing an exact-match check before a semantic search, or by using a very high similarity threshold for the first pass, as in the sketch below.
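
As a rough illustration of that workaround, the sketch below makes two passes: a near-1.0 threshold as a stand-in for exact matching, then a looser semantic pass. It reuses the langCache client from the earlier examples; both threshold values are illustrative assumptions, not SDK defaults.

// In nodejs-examples/advanced_features.js
// Sketch: approximate "exact first, then semantic" with two passes.
// The thresholds (0.99, 0.8) are illustrative, not SDK defaults.
async function exactThenSemanticSearch(prompt) {
    // Pass 1: a near-1.0 threshold acts as a stand-in for exact matching.
    const strict = await langCache.search({ prompt, similarityThreshold: 0.99 });
    if (strict && strict.results.length > 0) {
        return { strategy: "exact-ish", entry: strict.results[0] };
    }

    // Pass 2: fall back to a looser semantic search.
    const loose = await langCache.search({ prompt, similarityThreshold: 0.8 });
    if (loose && loose.results.length > 0) {
        return { strategy: "semantic", entry: loose.results[0] };
    }

    return null; // Miss on both passes: the caller should invoke the LLM.
}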

Python Example:

# In python-examples/advanced_features.py
from langcache.models import SearchStrategy # Import for Python

async def search_strategies_example():
    print("\n--- Search Strategies Example (Python) ---")

    await lang_cache.set(prompt="What is the capital of Canada?", response="Ottawa")
    await lang_cache.set(prompt="Tell me about Ottawa, the capital city of Canada.", response="Ottawa is located on the Ottawa River and is known for its parliament buildings, Rideau Canal, and diverse cultural institutions.")
    await asyncio.sleep(2)

    query1 = "What is the capital of Canada?"
    query2 = "Canada's capital is which city?"

    # Exact search
    print(f"\nSearching for '{query1}' using EXACT strategy...")
    results_exact = await lang_cache.search(prompt=query1, search_strategies=[SearchStrategy.EXACT])
    if results_exact:
        print(f"  Exact Hit! Response: {results_exact[0].response[:50]}..., Score: {results_exact[0].score:.4f}")
    else:
        print("  Exact Miss. (Expected if prompt isn't exact in cache)")

    # Semantic search for a variation
    print(f"\nSearching for '{query2}' using SEMANTIC strategy...")
    results_semantic = await lang_cache.search(prompt=query2, search_strategies=[SearchStrategy.SEMANTIC], similarity_threshold=0.8)
    if results_semantic:
        print(f"  Semantic Hit! Response: {results_semantic[0].response[:50]}..., Score: {results_semantic[0].score:.4f}")
    else:
        print("  Semantic Miss.")

    # Combined strategy (default if not specified, often prioritizes exact)
    print(f"\nSearching for '{query1}' using DEFAULT strategies (Exact then Semantic)...")
    results_combined = await lang_cache.search(prompt=query1) # Default behavior
    if results_combined:
        print(f"  Combined Hit! Response: {results_combined[0].response[:50]}..., Score: {results_combined[0].score:.4f}")
    else:
        print("  Combined Miss.")

    print("\nSearch strategies example complete.")

# In main(): await search_strategies_example()

4.5 Best Practices for Optimal Performance

  1. Monitor Cache Hit Rate: Regularly check the cache hit rate in your Redis Cloud console. A low hit rate indicates that LangCache might not be effectively reducing LLM calls, suggesting a need to adjust thresholds, review cached content, or improve query patterns.
  2. Optimize TTL: Set appropriate TTLs for your cached data. Dynamic or time-sensitive information needs shorter TTLs, while static reference data can have longer ones or no TTL.
  3. Thoughtful Attribute Usage: Use attributes to segment your cache logically. This prevents irrelevant data from being considered during a search, improving both accuracy and performance. Avoid overly granular attributes if not necessary.
  4. Batch Operations (if available/applicable): If your application generates multiple responses at once, explore if the LangCache API or SDKs offer batch set operations to reduce network overhead. (As of current preview, this might not be explicitly exposed in the SDKs but is a general caching best practice).
  5. Error Handling and Fallbacks: Always wrap your LangCache calls in robust error handling. If LangCache is unavailable or returns an error, your application should gracefully fall back to calling the LLM directly to ensure continuity (see the sketch after this list).
  6. Pre-cache Common Queries: For frequently asked questions or critical prompts, consider pre-populating your LangCache with high-quality responses to ensure high hit rates from the start.
  7. Choose the Right Embedding Model: If you have the option to choose, ensure your embedding model is well-suited for the domain and language of your prompts to generate high-quality embeddings.
  8. Understand “Semantic Similarity”: Educate your team on what semantic similarity implies. It’s not magic; slight nuances in language can sometimes lead to different embeddings. Test thoroughly.
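
Practice 5 is worth a concrete illustration. The sketch below reuses the langCache client from the earlier Node.js examples and treats any cache failure as a miss, so the LLM path always stays live; callLlm(prompt) is a hypothetical stand-in for your own LLM wrapper, not part of the SDK.

// In nodejs-examples/advanced_features.js
// Sketch of best practice 5: a cache outage must never break the app.
// `callLlm(prompt)` is a hypothetical stand-in for your own LLM wrapper.
async function answerWithFallback(prompt) {
    try {
        const found = await langCache.search({ prompt });
        if (found && found.results.length > 0) {
            return found.results[0].response; // Cache hit: skip the LLM.
        }
    } catch (err) {
        // Treat any cache error (network, auth, timeout) as a miss.
        console.warn(`LangCache search failed, falling back to LLM: ${err.message}`);
    }

    const response = await callLlm(prompt); // Always keep a live path to the LLM.

    try {
        await langCache.set({ prompt, response }); // Best-effort write-back.
    } catch (err) {
        console.warn(`LangCache set failed (non-fatal): ${err.message}`);
    }
    return response;
}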

Complete advanced_features.py and advanced_features.js, each with a main() that runs all of the examples above (comment out any you want to skip):

python-examples/advanced_features.py

import os
import asyncio
from dotenv import load_dotenv
from langcache import LangCache
from langcache.models import SearchStrategy # Import for Python specific example

load_dotenv()

LANGCACHE_API_HOST = os.getenv("LANGCACHE_API_HOST")
LANGCACHE_CACHE_ID = os.getenv("LANGCACHE_CACHE_ID")
LANGCACHE_API_KEY = os.getenv("LANGCACHE_API_KEY")

lang_cache = LangCache(
    server_url=f"https://{LANGCACHE_API_HOST}",
    cache_id=LANGCACHE_CACHE_ID,
    api_key=LANGCACHE_API_KEY
)

async def threshold_experiment():
    print("--- Similarity Threshold Experiment (Python) ---")
    await lang_cache.set(prompt="What are the key features of the latest iPhone?", response="The latest iPhone typically features a powerful Bionic chip, advanced camera systems including ProRes video, Ceramic Shield front cover, and MagSafe technology.")
    await asyncio.sleep(2)

    query_strict = "Tell me about the iPhone's camera capabilities."
    query_lenient = "What's new with iPhone?"

    print(f"\nQuery (Strict): '{query_strict}'")
    for threshold in [0.95, 0.85, 0.75]:
        results = await lang_cache.search(prompt=query_strict, similarity_threshold=threshold)
        status = "Hit" if results else "Miss"
        # Format on hit only; applying :.4f to the "N/A" string would raise a ValueError.
        score = f"{results[0].score:.4f}" if results else "N/A"
        print(f"  Threshold {threshold:.2f}: {status}, Score: {score}")

    print(f"\nQuery (Lenient): '{query_lenient}'")
    for threshold in [0.95, 0.85, 0.75]:
        results = await lang_cache.search(prompt=query_lenient, similarity_threshold=threshold)
        status = "Hit" if results else "Miss"
        # Format on hit only; applying :.4f to the "N/A" string would raise a ValueError.
        score = f"{results[0].score:.4f}" if results else "N/A"
        print(f"  Threshold {threshold:.2f}: {status}, Score: {score}")
    print("\nThreshold experiment complete.")

async def ttl_example():
    print("\n--- Per-Entry TTL Example (Python) ---")
    prompt_short_ttl = "What is the capital of Mars?"
    response_short_ttl = "There is no official capital of Mars, as it is not inhabited by humans with a governing body."
    short_ttl = 10
    print(f"Storing: '{prompt_short_ttl}' with TTL: {short_ttl} seconds")
    key_short = await lang_cache.set(prompt=prompt_short_ttl, response=response_short_ttl, ttl=short_ttl)
    print(f"Stored with key: {key_short}")

    print(f"Searching immediately for '{prompt_short_ttl}'...")
    results_immediate = await lang_cache.search(prompt="Mars capital?")
    if results_immediate:
        print(f"  Immediate search: Cache Hit! Response: {results_immediate[0].response[:30]}..., Score: {results_immediate[0].score:.4f}")
    else:
        print("  Immediate search: Cache Miss.")

    print(f"Waiting {short_ttl + 2} seconds for entry to expire...")
    await asyncio.sleep(short_ttl + 2)

    print(f"Searching after {short_ttl} seconds for '{prompt_short_ttl}'...")
    results_expired = await lang_cache.search(prompt="Mars capital?")
    if results_expired:
        print(f"  Expired search: Cache Hit! (Unexpected) Response: {results_expired[0].response[:30]}..., Score: {results_expired[0].score:.4f}")
    else:
        print("  Expired search: Cache Miss. (Expected)")
    print("\nTTL example complete.")

async def attribute_caching_example():
    print("\n--- Attribute Caching Example (Python) ---")
    await lang_cache.set(prompt="How do I reset my password?", response="Visit the 'Forgot Password' page and follow the instructions.", metadata={"user_id": "user123", "topic": "account"})
    await lang_cache.set(prompt="How do I change my profile picture?", response="Go to 'Profile Settings' and upload a new image.", metadata={"user_id": "user123", "topic": "account"})
    await lang_cache.set(prompt="What are the billing options?", response="We accept credit card, PayPal, and bank transfers.", metadata={"user_id": "user456", "topic": "billing"})
    await asyncio.sleep(2)

    query_user1 = "I forgot my login details."
    attributes_user1 = {"user_id": "user123", "topic": "account"}
    print(f"\nSearching for '{query_user1}' with attributes {attributes_user1}")
    results1 = await lang_cache.search(prompt=query_user1, attributes=attributes_user1)
    if results1:
        print(f"  Cache Hit! Response: {results1[0].response[:50]}..., Score: {results1[0].score:.4f}")
    else:
        print("  Cache Miss. (Unexpected)")

    query_user2 = "I forgot my login details."
    attributes_user2 = {"user_id": "user456", "topic": "account"}
    print(f"\nSearching for '{query_user2}' with attributes {attributes_user2}")
    results2 = await lang_cache.search(prompt=query_user2, attributes=attributes_user2)
    if results2:
        print(f"  Cache Hit! (Unexpected) Response: {results2[0].response[:50]}..., Score: {results2[0].score:.4f}")
    else:
        print("  Cache Miss. (Expected due to user_id mismatch)")

    query_billing = "What are the ways to pay?"
    attributes_billing = {"user_id": "user456", "topic": "billing"}
    print(f"\nSearching for '{query_billing}' with attributes {attributes_billing}")
    results3 = await lang_cache.search(prompt=query_billing, attributes=attributes_billing)
    if results3:
        print(f"  Cache Hit! Response: {results3[0].response[:50]}..., Score: {results3[0].score:.4f}")
    else:
        print("  Cache Miss. (Unexpected)")
    print("\nAttribute caching example complete.")

async def search_strategies_example():
    print("\n--- Search Strategies Example (Python) ---")
    await lang_cache.set(prompt="What is the capital of Canada?", response="Ottawa")
    await lang_cache.set(prompt="Tell me about Ottawa, the capital city of Canada.", response="Ottawa is located on the Ottawa River and is known for its parliament buildings, Rideau Canal, and diverse cultural institutions.")
    await asyncio.sleep(2)

    query1 = "What is the capital of Canada?"
    query2 = "Canada's capital is which city?"

    print(f"\nSearching for '{query1}' using EXACT strategy...")
    results_exact = await lang_cache.search(prompt=query1, search_strategies=[SearchStrategy.EXACT])
    if results_exact:
        print(f"  Exact Hit! Response: {results_exact[0].response[:50]}..., Score: {results_exact[0].score:.4f}")
    else:
        print("  Exact Miss. (Expected if prompt isn't exact in cache)")

    print(f"\nSearching for '{query2}' using SEMANTIC strategy...")
    results_semantic = await lang_cache.search(prompt=query2, search_strategies=[SearchStrategy.SEMANTIC], similarity_threshold=0.8)
    if results_semantic:
        print(f"  Semantic Hit! Response: {results_semantic[0].response[:50]}..., Score: {results_semantic[0].score:.4f}")
    else:
        print("  Semantic Miss.")

    print(f"\nSearching for '{query1}' using DEFAULT strategies (Exact then Semantic)...")
    results_combined = await lang_cache.search(prompt=query1)
    if results_combined:
        print(f"  Combined Hit! Response: {results_combined[0].response[:50]}..., Score: {results_combined[0].score:.4f}")
    else:
        print("  Combined Miss.")
    print("\nSearch strategies example complete.")

async def main():
    await threshold_experiment()
    await ttl_example()
    await attribute_caching_example()
    await search_strategies_example()

if __name__ == "__main__":
    asyncio.run(main())

nodejs-examples/advanced_features.js

require('dotenv').config({ path: '../.env' });
const { LangCache } = require('@redis-ai/langcache');

const LANGCACHE_API_HOST = process.env.LANGCACHE_API_HOST;
const LANGCACHE_CACHE_ID = process.env.LANGCACHE_CACHE_ID;
const LANGCACHE_API_KEY = process.env.LANGCACHE_API_KEY;

const langCache = new LangCache({
    serverURL: `https://${LANGCACHE_API_HOST}`,
    cacheId: LANGCACHE_CACHE_ID,
    apiKey: LANGCACHE_API_KEY,
});

async function thresholdExperiment() {
    console.log("--- Similarity Threshold Experiment (Node.js) ---");
    await langCache.set({ prompt: "What are the key features of the latest iPhone?", response: "The latest iPhone typically features a powerful Bionic chip, advanced camera systems including ProRes video, Ceramic Shield front cover, and MagSafe technology." });
    await new Promise(resolve => setTimeout(resolve, 2000));

    const queryStrict = "Tell me about the iPhone's camera capabilities.";
    const queryLenient = "What's new with iPhone?";

    console.log(`\nQuery (Strict): '${queryStrict}'`);
    for (const threshold of [0.95, 0.85, 0.75]) {
        const results = await langCache.search({ prompt: queryStrict, similarityThreshold: threshold });
        const status = (results && results.results.length > 0) ? "Hit" : "Miss";
        const score = (results && results.results.length > 0) ? results.results[0].score.toFixed(4) : "N/A";
        console.log(`  Threshold ${threshold.toFixed(2)}: ${status}, Score: ${score}`);
    }

    console.log(`\nQuery (Lenient): '${queryLenient}'`);
    for (const threshold of [0.95, 0.85, 0.75]) {
        const results = await langCache.search({ prompt: queryLenient, similarityThreshold: threshold });
        const status = (results && results.results.length > 0) ? "Hit" : "Miss";
        const score = (results && results.results.length > 0) ? results.results[0].score.toFixed(4) : "N/A";
        console.log(`  Threshold ${threshold.toFixed(2)}: ${status}, Score: ${score}`);
    }
    console.log("\nThreshold experiment complete.");
}

async function ttlExample() {
    console.log("\n--- Per-Entry TTL Example (Node.js) ---");
    const promptShortTtl = "What is the capital of Mars?";
    const responseShortTtl = "There is no official capital of Mars, as it is not inhabited by humans with a governing body.";
    const shortTtl = 10;
    console.log(`Storing: '${promptShortTtl}' with TTL: ${shortTtl} seconds`);
    const storeResultShort = await langCache.set({ prompt: promptShortTtl, response: responseShortTtl, ttl: shortTtl });
    console.log(`Stored with entry ID: ${storeResultShort.entryId}`);

    console.log(`Searching immediately for '${promptShortTtl}'...`);
    const resultsImmediate = await langCache.search({ prompt: "Mars capital?" });
    if (resultsImmediate && resultsImmediate.results.length > 0) {
        console.log(`  Immediate search: Cache Hit! Response: ${resultsImmediate.results[0].response.substring(0, 30)}..., Score: ${resultsImmediate.results[0].score.toFixed(4)}`);
    } else {
        console.log("  Immediate search: Cache Miss.");
    }

    console.log(`Waiting ${shortTtl + 2} seconds for entry to expire...`);
    await new Promise(resolve => setTimeout(resolve, (shortTtl + 2) * 1000));

    console.log(`Searching after ${shortTtl} seconds for '${promptShortTtl}'...`);
    const resultsExpired = await langCache.search({ prompt: "Mars capital?" });
    if (resultsExpired && resultsExpired.results.length > 0) {
        console.log(`  Expired search: Cache Hit! (Unexpected) Response: ${resultsExpired.results[0].response.substring(0, 30)}..., Score: ${resultsExpired.results[0].score.toFixed(4)}`);
    } else {
        console.log("  Expired search: Cache Miss. (Expected)");
    }
    console.log("\nTTL example complete.");
}

async function attributeCachingExample() {
    console.log("\n--- Attribute Caching Example (Node.js) ---");
    await langCache.set({ prompt: "How do I reset my password?", response: "Visit the 'Forgot Password' page and follow the instructions.", attributes: { userId: "user123", topic: "account" } });
    await langCache.set({ prompt: "How do I change my profile picture?", response: "Go to 'Profile Settings' and upload a new image.", attributes: { userId: "user123", topic: "account" } });
    await langCache.set({ prompt: "What are the billing options?", response: "We accept credit card, PayPal, and bank transfers.", attributes: { userId: "user456", topic: "billing" } });
    await new Promise(resolve => setTimeout(resolve, 2000));

    const queryUser1 = "I forgot my login details.";
    const attributesUser1 = { userId: "user123", topic: "account" };
    console.log(`\nSearching for '${queryUser1}' with attributes ${JSON.stringify(attributesUser1)}`);
    const results1 = await langCache.search({ prompt: queryUser1, attributes: attributesUser1 });
    if (results1 && results1.results.length > 0) {
        console.log(`  Cache Hit! Response: ${results1.results[0].response.substring(0, 50)}..., Score: ${results1.results[0].score.toFixed(4)}`);
    } else {
        console.log("  Cache Miss. (Unexpected)");
    }

    const queryUser2 = "I forgot my login details.";
    const attributesUser2 = { userId: "user456", topic: "account" };
    console.log(`\nSearching for '${queryUser2}' with attributes ${JSON.stringify(attributesUser2)}`);
    const results2 = await langCache.search({ prompt: queryUser2, attributes: attributesUser2 });
    if (results2 && results2.results.length > 0) {
        console.log(`  Cache Hit! (Unexpected) Response: ${results2.results[0].response.substring(0, 50)}..., Score: ${results2.results[0].score.toFixed(4)}`);
    } else {
        console.log("  Cache Miss. (Expected due to userId mismatch)");
    }

    const queryBilling = "What are the ways to pay?";
    const attributesBilling = { userId: "user456", topic: "billing" };
    console.log(`\nSearching for '${queryBilling}' with attributes ${JSON.stringify(attributesBilling)}`);
    const results3 = await langCache.search({ prompt: queryBilling, attributes: attributesBilling });
    if (results3 && results3.results.length > 0) {
        console.log(`  Cache Hit! Response: ${results3.results[0].response.substring(0, 50)}..., Score: ${results3.results[0].score.toFixed(4)}`);
    } else {
        console.log("  Cache Miss. (Unexpected)");
    }
    console.log("\nAttribute caching example complete.");
}

async function main() {
    await thresholdExperiment();
    await ttlExample();
    await attributeCachingExample();
    // The Node.js SDK has no SearchStrategy enum equivalent to the Python SDK's.
    // To simulate it, perform an exact-match (or very-high-threshold) check before a semantic search.
}

main().catch(console.error);

Exercise/Mini-Challenge: Context-Aware Customer Support Bot

Objective: Build a mock customer support bot that uses LangCache to answer questions. The bot should be context-aware, meaning it can store and retrieve answers based on the user’s role (e.g., “admin”, “customer”) and the current session_id.

Instructions:

  1. Create a new file (e.g., support_bot.py or support_bot.js).
  2. Initialize LangCache client.
  3. Implement a mock LLM function mock_llm(prompt) that returns generic responses or specific responses for a few hardcoded prompts (e.g., “What is your refund policy?”, “How to reset admin password?”).
  4. Implement a handle_query(session_id, user_role, query) function:
    • It should first try to retrieve an answer from LangCache using the query and both session_id and user_role as attributes.
    • If a cache hit: return the cached response.
    • If a cache miss:
      • Call mock_llm(query) to get a fresh response.
      • Store this new prompt-response pair in LangCache, again using session_id and user_role as attributes.
      • Return the LLM’s response.
  5. Simulate a few interactions:
    • A customer (session ID s101) asks “What is the refund policy?”.
    • The same customer (s101) asks “Return policy?”. (Should be a cache hit).
    • An admin (session ID s202) asks “How to reset admin password?”.
    • A different customer (session ID s102) asks “How to reset admin password?”. (This should likely be a cache miss, as the user_role attribute won’t match the admin entry. The mock_llm should provide a different, perhaps generic, response for customers, which then gets cached with user_role: "customer").
  6. Experiment with different similarity_threshold values in your handle_query function.

This challenge will deepen your understanding of how to use attributes to manage distinct conversational contexts and prevent information leakage between different user roles or sessions, all while optimizing LLM usage.
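
If you'd like a starting point, here is a minimal Node.js skeleton of the handle_query flow. The mockLlm body, attribute names, and threshold value are placeholders for you to fill in and tune; none of them are prescribed by the exercise.

// nodejs-examples/support_bot.js -- starter skeleton (details left to you)
async function mockLlm(prompt) {
    // TODO: return specific answers for a few hardcoded prompts
    // (refund policy, admin password reset) and a generic one otherwise.
    return `Generic answer for: ${prompt}`;
}

async function handleQuery(sessionId, userRole, query) {
    const attributes = { sessionId, userRole }; // Scope the cache per session and role.

    // 1. Try the cache first, scoped by both attributes.
    const cached = await langCache.search({
        prompt: query,
        attributes,
        similarityThreshold: 0.85, // Experiment with this value (step 6).
    });
    if (cached && cached.results.length > 0) {
        return { source: "cache", response: cached.results[0].response };
    }

    // 2. Miss: call the mock LLM, then write back with the same attributes.
    const response = await mockLlm(query);
    await langCache.set({ prompt: query, response, attributes });
    return { source: "llm", response };
}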