4. Advanced LangCache Features and Optimization
Beyond basic set and search operations, Redis LangCache offers several powerful features and configuration options to fine-tune its behavior. Understanding these allows you to optimize cache performance, cost efficiency, and relevance for your specific AI applications.
4.1 Fine-tuning Similarity Threshold
The similarity_threshold (Python) or similarityThreshold (Node.js) parameter in the search method is crucial. It determines how closely a new prompt’s embedding must match a cached embedding for it to be considered a “hit.”
Higher Threshold (e.g., 0.95-1.0):
- Pros: Very precise matches. Reduces the chance of returning irrelevant responses.
- Cons: More cache misses. May not catch subtle semantic variations, leading to more LLM calls.
- Use Case: When accuracy and strict relevance are paramount, and false positives are unacceptable.
Lower Threshold (e.g., 0.7-0.8):
- Pros: More cache hits, even for loosely similar prompts. Maximizes cost savings and speed.
- Cons: Increased chance of returning less relevant or slightly off-topic responses.
- Use Case: When a broader range of answers is acceptable, and optimizing for speed and cost is the priority (e.g., general chatbots where a “good enough” answer is often sufficient).
The optimal threshold depends heavily on your application’s domain, user expectations, and the quality of your embedding model. It’s often an iterative process of experimentation and monitoring.
Code Example: Experimenting with Thresholds
# python-examples/advanced_features.py
import os
import asyncio
from dotenv import load_dotenv
from langcache import LangCache
load_dotenv()
LANGCACHE_API_HOST = os.getenv("LANGCACHE_API_HOST")
LANGCACHE_CACHE_ID = os.getenv("LANGCACHE_CACHE_ID")
LANGCACHE_API_KEY = os.getenv("LANGCACHE_API_KEY")
lang_cache = LangCache(
    server_url=f"https://{LANGCACHE_API_HOST}",
    cache_id=LANGCACHE_CACHE_ID,
    api_key=LANGCACHE_API_KEY,
)
async def threshold_experiment():
    print("--- Similarity Threshold Experiment (Python) ---")
    await lang_cache.set(prompt="What are the key features of the latest iPhone?", response="The latest iPhone typically features a powerful Bionic chip, advanced camera systems including ProRes video, Ceramic Shield front cover, and MagSafe technology.")
    await asyncio.sleep(2)  # Allow indexing
    query_strict = "Tell me about the iPhone's camera capabilities."
    query_lenient = "What's new with iPhone?"
    print(f"\nQuery (Strict): '{query_strict}'")
    for threshold in [0.95, 0.85, 0.75]:
        results = await lang_cache.search(prompt=query_strict, similarity_threshold=threshold)
        status = "Hit" if results else "Miss"
        score = f"{results[0].score:.4f}" if results else "N/A"  # only format the score on a hit
        print(f"  Threshold {threshold:.2f}: {status}, Score: {score}")
    print(f"\nQuery (Lenient): '{query_lenient}'")
    for threshold in [0.95, 0.85, 0.75]:
        results = await lang_cache.search(prompt=query_lenient, similarity_threshold=threshold)
        status = "Hit" if results else "Miss"
        score = f"{results[0].score:.4f}" if results else "N/A"
        print(f"  Threshold {threshold:.2f}: {status}, Score: {score}")
    print("\nThreshold experiment complete.")
# In main(): await threshold_experiment()
// nodejs-examples/advanced_features.js
require('dotenv').config({ path: '../.env' });
const { LangCache } = require('@redis-ai/langcache');
const LANGCACHE_API_HOST = process.env.LANGCACHE_API_HOST;
const LANGCACHE_CACHE_ID = process.env.LANGCACHE_CACHE_ID;
const LANGCACHE_API_KEY = process.env.LANGCACHE_API_KEY;
const langCache = new LangCache({
  serverURL: `https://${LANGCACHE_API_HOST}`,
  cacheId: LANGCACHE_CACHE_ID,
  apiKey: LANGCACHE_API_KEY,
});
async function thresholdExperiment() {
  console.log("--- Similarity Threshold Experiment (Node.js) ---");
  await langCache.set({ prompt: "What are the key features of the latest iPhone?", response: "The latest iPhone typically features a powerful Bionic chip, advanced camera systems including ProRes video, Ceramic Shield front cover, and MagSafe technology." });
  await new Promise(resolve => setTimeout(resolve, 2000)); // Allow indexing
  const queryStrict = "Tell me about the iPhone's camera capabilities.";
  const queryLenient = "What's new with iPhone?";
  console.log(`\nQuery (Strict): '${queryStrict}'`);
  for (const threshold of [0.95, 0.85, 0.75]) {
    const results = await langCache.search({ prompt: queryStrict, similarityThreshold: threshold });
    const status = (results && results.results.length > 0) ? "Hit" : "Miss";
    const score = (results && results.results.length > 0) ? results.results[0].score.toFixed(4) : "N/A";
    console.log(`  Threshold ${threshold.toFixed(2)}: ${status}, Score: ${score}`);
  }
  console.log(`\nQuery (Lenient): '${queryLenient}'`);
  for (const threshold of [0.95, 0.85, 0.75]) {
    const results = await langCache.search({ prompt: queryLenient, similarityThreshold: threshold });
    const status = (results && results.results.length > 0) ? "Hit" : "Miss";
    const score = (results && results.results.length > 0) ? results.results[0].score.toFixed(4) : "N/A";
    console.log(`  Threshold ${threshold.toFixed(2)}: ${status}, Score: ${score}`);
  }
  console.log("\nThreshold experiment complete.");
}
// In main(): await thresholdExperiment();
4.2 Managing Time-To-Live (TTL)
TTL (Time-To-Live) defines how long a cached entry remains valid before it’s automatically evicted from the cache. This is crucial for managing cache size and ensuring data freshness.
- Global TTL: Configured when you create your LangCache service on Redis Cloud. This applies to all entries unless overridden.
- Per-Entry TTL: You can specify a ttl (in seconds) when calling the set method. This overrides the global TTL for that specific entry.
When to use TTL:
- Volatile Information: For data that changes frequently (e.g., stock prices, weather updates), a short TTL is appropriate.
- Less Volatile Information: For more stable information (e.g., historical facts, general product descriptions), a longer TTL or no TTL might be suitable.
- Cache Management: Prevents the cache from growing indefinitely and consuming excessive resources.
Code Example: Per-Entry TTL
# In python-examples/advanced_features.py
async def ttl_example():
    print("\n--- Per-Entry TTL Example (Python) ---")
    # Store an entry with a short TTL (e.g., 10 seconds)
    prompt_short_ttl = "What is the capital of Mars?"
    response_short_ttl = "There is no official capital of Mars, as it is not inhabited by humans with a governing body."
    short_ttl = 10
    print(f"Storing: '{prompt_short_ttl}' with TTL: {short_ttl} seconds")
    key_short = await lang_cache.set(prompt=prompt_short_ttl, response=response_short_ttl, ttl=short_ttl)
    print(f"Stored with key: {key_short}")
    # Search immediately (should be a hit)
    print(f"Searching immediately for '{prompt_short_ttl}'...")
    results_immediate = await lang_cache.search(prompt="Mars capital?")
    if results_immediate:
        print(f"  Immediate search: Cache Hit! Response: {results_immediate[0].response[:30]}..., Score: {results_immediate[0].score:.4f}")
    else:
        print("  Immediate search: Cache Miss.")
    # Wait for the TTL to expire
    print(f"Waiting {short_ttl + 2} seconds for entry to expire...")
    await asyncio.sleep(short_ttl + 2)
    # Search after TTL expiration (should be a miss)
    print(f"Searching after {short_ttl + 2} seconds for '{prompt_short_ttl}'...")
    results_expired = await lang_cache.search(prompt="Mars capital?")
    if results_expired:
        print(f"  Expired search: Cache Hit! (Unexpected) Response: {results_expired[0].response[:30]}..., Score: {results_expired[0].score:.4f}")
    else:
        print("  Expired search: Cache Miss. (Expected)")
    print("\nTTL example complete.")
# In main(): await ttl_example()
// In nodejs-examples/advanced_features.js
async function ttlExample() {
  console.log("\n--- Per-Entry TTL Example (Node.js) ---");
  // Store an entry with a short TTL (e.g., 10 seconds)
  const promptShortTtl = "What is the capital of Mars?";
  const responseShortTtl = "There is no official capital of Mars, as it is not inhabited by humans with a governing body.";
  const shortTtl = 10; // seconds
  console.log(`Storing: '${promptShortTtl}' with TTL: ${shortTtl} seconds`);
  const storeResultShort = await langCache.set({ prompt: promptShortTtl, response: responseShortTtl, ttl: shortTtl });
  console.log(`Stored with entry ID: ${storeResultShort.entryId}`);
  // Search immediately (should be a hit)
  console.log(`Searching immediately for '${promptShortTtl}'...`);
  const resultsImmediate = await langCache.search({ prompt: "Mars capital?" });
  if (resultsImmediate && resultsImmediate.results.length > 0) {
    console.log(`  Immediate search: Cache Hit! Response: ${resultsImmediate.results[0].response.substring(0, 30)}..., Score: ${resultsImmediate.results[0].score.toFixed(4)}`);
  } else {
    console.log("  Immediate search: Cache Miss.");
  }
  // Wait for the TTL to expire
  console.log(`Waiting ${shortTtl + 2} seconds for entry to expire...`);
  await new Promise(resolve => setTimeout(resolve, (shortTtl + 2) * 1000));
  // Search after TTL expiration (should be a miss)
  console.log(`Searching after ${shortTtl + 2} seconds for '${promptShortTtl}'...`);
  const resultsExpired = await langCache.search({ prompt: "Mars capital?" });
  if (resultsExpired && resultsExpired.results.length > 0) {
    console.log(`  Expired search: Cache Hit! (Unexpected) Response: ${resultsExpired.results[0].response.substring(0, 30)}..., Score: ${resultsExpired.results[0].score.toFixed(4)}`);
  } else {
    console.log("  Expired search: Cache Miss. (Expected)");
  }
  console.log("\nTTL example complete.");
}
// In main(): await ttlExample();
4.3 Using Attributes for Targeted Caching and Search
Attributes (passed as metadata in the Python SDK's set method, and as attributes in its search method and throughout the Node.js SDK) allow you to add custom key-value pairs to cached entries. This is incredibly powerful for scoping your cache operations.
Use Cases for Attributes:
- Multi-tenant applications: Cache responses specific to a user_id, organization_id, or session_id.
- Contextual caching: Store responses based on the topic, language, persona, or source of the query.
- A/B testing: Cache results for different LLM models or prompt engineering strategies.
When you search with attributes, LangCache will only consider cached entries that match all the provided attributes in addition to semantic similarity.
Code Example: Attributes in Action
# In python-examples/advanced_features.py
async def attribute_caching_example():
    print("\n--- Attribute Caching Example (Python) ---")
    # Store responses with different user IDs and topics
    await lang_cache.set(prompt="How do I reset my password?", response="Visit the 'Forgot Password' page and follow the instructions.", metadata={"user_id": "user123", "topic": "account"})
    await lang_cache.set(prompt="How do I change my profile picture?", response="Go to 'Profile Settings' and upload a new image.", metadata={"user_id": "user123", "topic": "account"})
    await lang_cache.set(prompt="What are the billing options?", response="We accept credit card, PayPal, and bank transfers.", metadata={"user_id": "user456", "topic": "billing"})
    await asyncio.sleep(2)
    # Search for user123's account topic query
    query_user1 = "I forgot my login details."
    attributes_user1 = {"user_id": "user123", "topic": "account"}
    print(f"\nSearching for '{query_user1}' with attributes {attributes_user1}")
    results1 = await lang_cache.search(prompt=query_user1, attributes=attributes_user1)
    if results1:
        print(f"  Cache Hit! Response: {results1[0].response[:50]}..., Score: {results1[0].score:.4f}")
    else:
        print("  Cache Miss. (Unexpected)")
    # Search for user456's account topic query (should be a miss due to user_id mismatch)
    query_user2 = "I forgot my login details."
    attributes_user2 = {"user_id": "user456", "topic": "account"}  # User ID mismatch
    print(f"\nSearching for '{query_user2}' with attributes {attributes_user2}")
    results2 = await lang_cache.search(prompt=query_user2, attributes=attributes_user2)
    if results2:
        print(f"  Cache Hit! (Unexpected) Response: {results2[0].response[:50]}..., Score: {results2[0].score:.4f}")
    else:
        print("  Cache Miss. (Expected due to user_id mismatch)")
    # Search for user456's billing topic query
    query_billing = "What are the ways to pay?"
    attributes_billing = {"user_id": "user456", "topic": "billing"}
    print(f"\nSearching for '{query_billing}' with attributes {attributes_billing}")
    results3 = await lang_cache.search(prompt=query_billing, attributes=attributes_billing)
    if results3:
        print(f"  Cache Hit! Response: {results3[0].response[:50]}..., Score: {results3[0].score:.4f}")
    else:
        print("  Cache Miss. (Unexpected)")
    print("\nAttribute caching example complete.")
# In main(): await attribute_caching_example()
// In nodejs-examples/advanced_features.js
async function attributeCachingExample() {
  console.log("\n--- Attribute Caching Example (Node.js) ---");
  // Store responses with different user IDs and topics
  await langCache.set({ prompt: "How do I reset my password?", response: "Visit the 'Forgot Password' page and follow the instructions.", attributes: { userId: "user123", topic: "account" } });
  await langCache.set({ prompt: "How do I change my profile picture?", response: "Go to 'Profile Settings' and upload a new image.", attributes: { userId: "user123", topic: "account" } });
  await langCache.set({ prompt: "What are the billing options?", response: "We accept credit card, PayPal, and bank transfers.", attributes: { userId: "user456", topic: "billing" } });
  await new Promise(resolve => setTimeout(resolve, 2000));
  // Search for user123's account topic query
  const queryUser1 = "I forgot my login details.";
  const attributesUser1 = { userId: "user123", topic: "account" };
  console.log(`\nSearching for '${queryUser1}' with attributes ${JSON.stringify(attributesUser1)}`);
  const results1 = await langCache.search({ prompt: queryUser1, attributes: attributesUser1 });
  if (results1 && results1.results.length > 0) {
    console.log(`  Cache Hit! Response: ${results1.results[0].response.substring(0, 50)}..., Score: ${results1.results[0].score.toFixed(4)}`);
  } else {
    console.log("  Cache Miss. (Unexpected)");
  }
  // Search for user456's account topic query (should be a miss due to userId mismatch)
  const queryUser2 = "I forgot my login details.";
  const attributesUser2 = { userId: "user456", topic: "account" }; // User ID mismatch
  console.log(`\nSearching for '${queryUser2}' with attributes ${JSON.stringify(attributesUser2)}`);
  const results2 = await langCache.search({ prompt: queryUser2, attributes: attributesUser2 });
  if (results2 && results2.results.length > 0) {
    console.log(`  Cache Hit! (Unexpected) Response: ${results2.results[0].response.substring(0, 50)}..., Score: ${results2.results[0].score.toFixed(4)}`);
  } else {
    console.log("  Cache Miss. (Expected due to userId mismatch)");
  }
  // Search for user456's billing topic query
  const queryBilling = "What are the ways to pay?";
  const attributesBilling = { userId: "user456", topic: "billing" };
  console.log(`\nSearching for '${queryBilling}' with attributes ${JSON.stringify(attributesBilling)}`);
  const results3 = await langCache.search({ prompt: queryBilling, attributes: attributesBilling });
  if (results3 && results3.results.length > 0) {
    console.log(`  Cache Hit! Response: ${results3.results[0].response.substring(0, 50)}..., Score: ${results3.results[0].score.toFixed(4)}`);
  } else {
    console.log("  Cache Miss. (Unexpected)");
  }
  console.log("\nAttribute caching example complete.");
}
// In main(): await attributeCachingExample();
4.4 Search Strategies (Exact vs. Semantic)
The LangCache SDKs (especially Python) allow you to specify search_strategies in the search method. This enables you to combine exact keyword matching with semantic similarity for more robust retrieval.
- SearchStrategy.EXACT: Prioritizes exact matches of the prompt text. This is useful for very specific queries where even minor semantic variations might lead to different intended answers.
- SearchStrategy.SEMANTIC: Relies solely on semantic similarity, as we’ve discussed.
When both are provided, LangCache will typically try exact first and then fall back to semantic if no exact match is found within a certain threshold.
Code Example: Combining Search Strategies (Python SDK specific)
The Node.js SDK for @redis-ai/langcache currently doesn’t expose a SearchStrategy enum in the same way the Python langcache package does. Its search method focuses on semantic search and filtering via attributes. However, you can achieve a similar effect in Node.js by performing an exact match check before a semantic search, or by using a very high similarity threshold, as sketched below.
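Before the Python example, here is a minimal sketch of that exact-then-semantic pattern in Node.js. It reuses the langCache client from above; the exactMatches map and the helper names are hypothetical application-side constructs, not part of the @redis-ai/langcache SDK:

// In nodejs-examples/advanced_features.js -- a sketch of exact-then-semantic
// fallback. The exactMatches Map is an application-side index we maintain
// ourselves; it is NOT part of the @redis-ai/langcache SDK.
const exactMatches = new Map(); // verbatim prompt text -> cached response

async function setWithExactIndex(prompt, response) {
  exactMatches.set(prompt, response); // keep the local exact index in sync
  await langCache.set({ prompt, response });
}

async function searchExactThenSemantic(prompt, semanticThreshold = 0.8) {
  // 1. Exact check: a plain string lookup against prompts stored via setWithExactIndex.
  if (exactMatches.has(prompt)) {
    return { response: exactMatches.get(prompt), strategy: "exact" };
  }
  // 2. Fall back to LangCache's semantic search.
  const results = await langCache.search({ prompt, similarityThreshold: semanticThreshold });
  if (results && results.results.length > 0) {
    return { response: results.results[0].response, strategy: "semantic" };
  }
  return null; // miss on both strategies -- the caller should invoke the LLM
}

Note that the local map only lives for the lifetime of the process; a production variant would persist the exact index (e.g., in Redis) alongside the semantic cache.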
Python Example:
# In python-examples/advanced_features.py
from langcache.models import SearchStrategy # Import for Python
async def search_strategies_example():
    print("\n--- Search Strategies Example (Python) ---")
    await lang_cache.set(prompt="What is the capital of Canada?", response="Ottawa")
    await lang_cache.set(prompt="Tell me about Ottawa, the capital city of Canada.", response="Ottawa is located on the Ottawa River and is known for its parliament buildings, Rideau Canal, and diverse cultural institutions.")
    await asyncio.sleep(2)
    query1 = "What is the capital of Canada?"
    query2 = "Canada's capital is which city?"
    # Exact search
    print(f"\nSearching for '{query1}' using EXACT strategy...")
    results_exact = await lang_cache.search(prompt=query1, search_strategies=[SearchStrategy.EXACT])
    if results_exact:
        print(f"  Exact Hit! Response: {results_exact[0].response[:50]}..., Score: {results_exact[0].score:.4f}")
    else:
        print("  Exact Miss. (Expected if prompt isn't exact in cache)")
    # Semantic search for a variation
    print(f"\nSearching for '{query2}' using SEMANTIC strategy...")
    results_semantic = await lang_cache.search(prompt=query2, search_strategies=[SearchStrategy.SEMANTIC], similarity_threshold=0.8)
    if results_semantic:
        print(f"  Semantic Hit! Response: {results_semantic[0].response[:50]}..., Score: {results_semantic[0].score:.4f}")
    else:
        print("  Semantic Miss.")
    # Combined strategy (default if not specified, often prioritizes exact)
    print(f"\nSearching for '{query1}' using DEFAULT strategies (Exact then Semantic)...")
    results_combined = await lang_cache.search(prompt=query1)  # Default behavior
    if results_combined:
        print(f"  Combined Hit! Response: {results_combined[0].response[:50]}..., Score: {results_combined[0].score:.4f}")
    else:
        print("  Combined Miss.")
    print("\nSearch strategies example complete.")
# In main(): await search_strategies_example()
4.5 Best Practices for Optimal Performance
- Monitor Cache Hit Rate: Regularly check the cache hit rate in your Redis Cloud console. A low hit rate indicates that LangCache might not be effectively reducing LLM calls, suggesting a need to adjust thresholds, review cached content, or improve query patterns.
- Optimize TTL: Set appropriate TTLs for your cached data. Dynamic or time-sensitive information needs shorter TTLs, while static reference data can have longer ones or no TTL.
- Thoughtful Attribute Usage: Use attributes to segment your cache logically. This prevents irrelevant data from being considered during a search, improving both accuracy and performance. Avoid overly granular attributes if not necessary.
- Batch Operations (if available/applicable): If your application generates multiple responses at once, explore whether the LangCache API or SDKs offer batch set operations to reduce network overhead. (As of the current preview, this may not be explicitly exposed in the SDKs, but it is a general caching best practice.)
- Error Handling and Fallbacks: Always wrap your LangCache calls in robust error handling. If LangCache is unavailable or returns an error, your application should gracefully fall back to calling the LLM directly to ensure continuity (see the sketch after this list).
- Pre-cache Common Queries: For frequently asked questions or critical prompts, consider pre-populating your LangCache with high-quality responses to ensure high hit rates from the start.
- Choose the Right Embedding Model: If you have the option to choose, ensure your embedding model is well-suited for the domain and language of your prompts to generate high-quality embeddings.
- Understand “Semantic Similarity”: Educate your team on what semantic similarity implies. It’s not magic; slight nuances in language can sometimes lead to different embeddings. Test thoroughly.
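As a concrete illustration of the error-handling and hit-rate points above, here is a Node.js sketch reusing the langCache client from earlier; callLLM is a hypothetical stand-in for your real LLM client, and the 0.85 threshold is just an illustrative choice:

// Sketch: graceful LLM fallback with simple app-side hit-rate tracking.
// callLLM is a hypothetical placeholder for your actual LLM client call.
const stats = { hits: 0, misses: 0, errors: 0 };

async function answerWithFallback(prompt) {
  try {
    const results = await langCache.search({ prompt, similarityThreshold: 0.85 });
    if (results && results.results.length > 0) {
      stats.hits += 1;
      return results.results[0].response;
    }
    stats.misses += 1;
  } catch (err) {
    // If LangCache is unreachable or errors out, degrade to a direct LLM call.
    stats.errors += 1;
    console.warn(`LangCache unavailable, falling back to LLM: ${err.message}`);
  }
  const response = await callLLM(prompt); // hypothetical LLM call
  try {
    await langCache.set({ prompt, response }); // best-effort write-back
  } catch (err) {
    console.warn(`Could not cache response: ${err.message}`);
  }
  return response;
}

function hitRate() {
  const total = stats.hits + stats.misses;
  return total === 0 ? 0 : stats.hits / total; // app-side view alongside the console metrics
}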
For reference, here are the complete advanced_features.py and advanced_features.js files, with main functions that run all of the examples above:
python-examples/advanced_features.py
import os
import asyncio
from dotenv import load_dotenv
from langcache import LangCache
from langcache.models import SearchStrategy # Import for Python specific example
load_dotenv()
LANGCACHE_API_HOST = os.getenv("LANGCACHE_API_HOST")
LANGCACHE_CACHE_ID = os.getenv("LANGCACHE_CACHE_ID")
LANGCACHE_API_KEY = os.getenv("LANGCACHE_API_KEY")
lang_cache = LangCache(
    server_url=f"https://{LANGCACHE_API_HOST}",
    cache_id=LANGCACHE_CACHE_ID,
    api_key=LANGCACHE_API_KEY,
)
async def threshold_experiment():
    print("--- Similarity Threshold Experiment (Python) ---")
    await lang_cache.set(prompt="What are the key features of the latest iPhone?", response="The latest iPhone typically features a powerful Bionic chip, advanced camera systems including ProRes video, Ceramic Shield front cover, and MagSafe technology.")
    await asyncio.sleep(2)
    query_strict = "Tell me about the iPhone's camera capabilities."
    query_lenient = "What's new with iPhone?"
    print(f"\nQuery (Strict): '{query_strict}'")
    for threshold in [0.95, 0.85, 0.75]:
        results = await lang_cache.search(prompt=query_strict, similarity_threshold=threshold)
        status = "Hit" if results else "Miss"
        score = f"{results[0].score:.4f}" if results else "N/A"
        print(f"  Threshold {threshold:.2f}: {status}, Score: {score}")
    print(f"\nQuery (Lenient): '{query_lenient}'")
    for threshold in [0.95, 0.85, 0.75]:
        results = await lang_cache.search(prompt=query_lenient, similarity_threshold=threshold)
        status = "Hit" if results else "Miss"
        score = f"{results[0].score:.4f}" if results else "N/A"
        print(f"  Threshold {threshold:.2f}: {status}, Score: {score}")
    print("\nThreshold experiment complete.")
async def ttl_example():
    print("\n--- Per-Entry TTL Example (Python) ---")
    prompt_short_ttl = "What is the capital of Mars?"
    response_short_ttl = "There is no official capital of Mars, as it is not inhabited by humans with a governing body."
    short_ttl = 10
    print(f"Storing: '{prompt_short_ttl}' with TTL: {short_ttl} seconds")
    key_short = await lang_cache.set(prompt=prompt_short_ttl, response=response_short_ttl, ttl=short_ttl)
    print(f"Stored with key: {key_short}")
    print(f"Searching immediately for '{prompt_short_ttl}'...")
    results_immediate = await lang_cache.search(prompt="Mars capital?")
    if results_immediate:
        print(f"  Immediate search: Cache Hit! Response: {results_immediate[0].response[:30]}..., Score: {results_immediate[0].score:.4f}")
    else:
        print("  Immediate search: Cache Miss.")
    print(f"Waiting {short_ttl + 2} seconds for entry to expire...")
    await asyncio.sleep(short_ttl + 2)
    print(f"Searching after {short_ttl + 2} seconds for '{prompt_short_ttl}'...")
    results_expired = await lang_cache.search(prompt="Mars capital?")
    if results_expired:
        print(f"  Expired search: Cache Hit! (Unexpected) Response: {results_expired[0].response[:30]}..., Score: {results_expired[0].score:.4f}")
    else:
        print("  Expired search: Cache Miss. (Expected)")
    print("\nTTL example complete.")
async def attribute_caching_example():
    print("\n--- Attribute Caching Example (Python) ---")
    await lang_cache.set(prompt="How do I reset my password?", response="Visit the 'Forgot Password' page and follow the instructions.", metadata={"user_id": "user123", "topic": "account"})
    await lang_cache.set(prompt="How do I change my profile picture?", response="Go to 'Profile Settings' and upload a new image.", metadata={"user_id": "user123", "topic": "account"})
    await lang_cache.set(prompt="What are the billing options?", response="We accept credit card, PayPal, and bank transfers.", metadata={"user_id": "user456", "topic": "billing"})
    await asyncio.sleep(2)
    query_user1 = "I forgot my login details."
    attributes_user1 = {"user_id": "user123", "topic": "account"}
    print(f"\nSearching for '{query_user1}' with attributes {attributes_user1}")
    results1 = await lang_cache.search(prompt=query_user1, attributes=attributes_user1)
    if results1:
        print(f"  Cache Hit! Response: {results1[0].response[:50]}..., Score: {results1[0].score:.4f}")
    else:
        print("  Cache Miss. (Unexpected)")
    query_user2 = "I forgot my login details."
    attributes_user2 = {"user_id": "user456", "topic": "account"}
    print(f"\nSearching for '{query_user2}' with attributes {attributes_user2}")
    results2 = await lang_cache.search(prompt=query_user2, attributes=attributes_user2)
    if results2:
        print(f"  Cache Hit! (Unexpected) Response: {results2[0].response[:50]}..., Score: {results2[0].score:.4f}")
    else:
        print("  Cache Miss. (Expected due to user_id mismatch)")
    query_billing = "What are the ways to pay?"
    attributes_billing = {"user_id": "user456", "topic": "billing"}
    print(f"\nSearching for '{query_billing}' with attributes {attributes_billing}")
    results3 = await lang_cache.search(prompt=query_billing, attributes=attributes_billing)
    if results3:
        print(f"  Cache Hit! Response: {results3[0].response[:50]}..., Score: {results3[0].score:.4f}")
    else:
        print("  Cache Miss. (Unexpected)")
    print("\nAttribute caching example complete.")
async def search_strategies_example():
    print("\n--- Search Strategies Example (Python) ---")
    await lang_cache.set(prompt="What is the capital of Canada?", response="Ottawa")
    await lang_cache.set(prompt="Tell me about Ottawa, the capital city of Canada.", response="Ottawa is located on the Ottawa River and is known for its parliament buildings, Rideau Canal, and diverse cultural institutions.")
    await asyncio.sleep(2)
    query1 = "What is the capital of Canada?"
    query2 = "Canada's capital is which city?"
    print(f"\nSearching for '{query1}' using EXACT strategy...")
    results_exact = await lang_cache.search(prompt=query1, search_strategies=[SearchStrategy.EXACT])
    if results_exact:
        print(f"  Exact Hit! Response: {results_exact[0].response[:50]}..., Score: {results_exact[0].score:.4f}")
    else:
        print("  Exact Miss. (Expected if prompt isn't exact in cache)")
    print(f"\nSearching for '{query2}' using SEMANTIC strategy...")
    results_semantic = await lang_cache.search(prompt=query2, search_strategies=[SearchStrategy.SEMANTIC], similarity_threshold=0.8)
    if results_semantic:
        print(f"  Semantic Hit! Response: {results_semantic[0].response[:50]}..., Score: {results_semantic[0].score:.4f}")
    else:
        print("  Semantic Miss.")
    print(f"\nSearching for '{query1}' using DEFAULT strategies (Exact then Semantic)...")
    results_combined = await lang_cache.search(prompt=query1)
    if results_combined:
        print(f"  Combined Hit! Response: {results_combined[0].response[:50]}..., Score: {results_combined[0].score:.4f}")
    else:
        print("  Combined Miss.")
    print("\nSearch strategies example complete.")
async def main():
    await threshold_experiment()
    await ttl_example()
    await attribute_caching_example()
    await search_strategies_example()

if __name__ == "__main__":
    asyncio.run(main())
nodejs-examples/advanced_features.js
require('dotenv').config({ path: '../.env' });
const { LangCache } = require('@redis-ai/langcache');
const LANGCACHE_API_HOST = process.env.LANGCACHE_API_HOST;
const LANGCACHE_CACHE_ID = process.env.LANGCACHE_CACHE_ID;
const LANGCACHE_API_KEY = process.env.LANGCACHE_API_KEY;
const langCache = new LangCache({
  serverURL: `https://${LANGCACHE_API_HOST}`,
  cacheId: LANGCACHE_CACHE_ID,
  apiKey: LANGCACHE_API_KEY,
});
async function thresholdExperiment() {
  console.log("--- Similarity Threshold Experiment (Node.js) ---");
  await langCache.set({ prompt: "What are the key features of the latest iPhone?", response: "The latest iPhone typically features a powerful Bionic chip, advanced camera systems including ProRes video, Ceramic Shield front cover, and MagSafe technology." });
  await new Promise(resolve => setTimeout(resolve, 2000));
  const queryStrict = "Tell me about the iPhone's camera capabilities.";
  const queryLenient = "What's new with iPhone?";
  console.log(`\nQuery (Strict): '${queryStrict}'`);
  for (const threshold of [0.95, 0.85, 0.75]) {
    const results = await langCache.search({ prompt: queryStrict, similarityThreshold: threshold });
    const status = (results && results.results.length > 0) ? "Hit" : "Miss";
    const score = (results && results.results.length > 0) ? results.results[0].score.toFixed(4) : "N/A";
    console.log(`  Threshold ${threshold.toFixed(2)}: ${status}, Score: ${score}`);
  }
  console.log(`\nQuery (Lenient): '${queryLenient}'`);
  for (const threshold of [0.95, 0.85, 0.75]) {
    const results = await langCache.search({ prompt: queryLenient, similarityThreshold: threshold });
    const status = (results && results.results.length > 0) ? "Hit" : "Miss";
    const score = (results && results.results.length > 0) ? results.results[0].score.toFixed(4) : "N/A";
    console.log(`  Threshold ${threshold.toFixed(2)}: ${status}, Score: ${score}`);
  }
  console.log("\nThreshold experiment complete.");
}
async function ttlExample() {
  console.log("\n--- Per-Entry TTL Example (Node.js) ---");
  const promptShortTtl = "What is the capital of Mars?";
  const responseShortTtl = "There is no official capital of Mars, as it is not inhabited by humans with a governing body.";
  const shortTtl = 10;
  console.log(`Storing: '${promptShortTtl}' with TTL: ${shortTtl} seconds`);
  const storeResultShort = await langCache.set({ prompt: promptShortTtl, response: responseShortTtl, ttl: shortTtl });
  console.log(`Stored with entry ID: ${storeResultShort.entryId}`);
  console.log(`Searching immediately for '${promptShortTtl}'...`);
  const resultsImmediate = await langCache.search({ prompt: "Mars capital?" });
  if (resultsImmediate && resultsImmediate.results.length > 0) {
    console.log(`  Immediate search: Cache Hit! Response: ${resultsImmediate.results[0].response.substring(0, 30)}..., Score: ${resultsImmediate.results[0].score.toFixed(4)}`);
  } else {
    console.log("  Immediate search: Cache Miss.");
  }
  console.log(`Waiting ${shortTtl + 2} seconds for entry to expire...`);
  await new Promise(resolve => setTimeout(resolve, (shortTtl + 2) * 1000));
  console.log(`Searching after ${shortTtl + 2} seconds for '${promptShortTtl}'...`);
  const resultsExpired = await langCache.search({ prompt: "Mars capital?" });
  if (resultsExpired && resultsExpired.results.length > 0) {
    console.log(`  Expired search: Cache Hit! (Unexpected) Response: ${resultsExpired.results[0].response.substring(0, 30)}..., Score: ${resultsExpired.results[0].score.toFixed(4)}`);
  } else {
    console.log("  Expired search: Cache Miss. (Expected)");
  }
  console.log("\nTTL example complete.");
}
async function attributeCachingExample() {
  console.log("\n--- Attribute Caching Example (Node.js) ---");
  await langCache.set({ prompt: "How do I reset my password?", response: "Visit the 'Forgot Password' page and follow the instructions.", attributes: { userId: "user123", topic: "account" } });
  await langCache.set({ prompt: "How do I change my profile picture?", response: "Go to 'Profile Settings' and upload a new image.", attributes: { userId: "user123", topic: "account" } });
  await langCache.set({ prompt: "What are the billing options?", response: "We accept credit card, PayPal, and bank transfers.", attributes: { userId: "user456", topic: "billing" } });
  await new Promise(resolve => setTimeout(resolve, 2000));
  const queryUser1 = "I forgot my login details.";
  const attributesUser1 = { userId: "user123", topic: "account" };
  console.log(`\nSearching for '${queryUser1}' with attributes ${JSON.stringify(attributesUser1)}`);
  const results1 = await langCache.search({ prompt: queryUser1, attributes: attributesUser1 });
  if (results1 && results1.results.length > 0) {
    console.log(`  Cache Hit! Response: ${results1.results[0].response.substring(0, 50)}..., Score: ${results1.results[0].score.toFixed(4)}`);
  } else {
    console.log("  Cache Miss. (Unexpected)");
  }
  const queryUser2 = "I forgot my login details.";
  const attributesUser2 = { userId: "user456", topic: "account" };
  console.log(`\nSearching for '${queryUser2}' with attributes ${JSON.stringify(attributesUser2)}`);
  const results2 = await langCache.search({ prompt: queryUser2, attributes: attributesUser2 });
  if (results2 && results2.results.length > 0) {
    console.log(`  Cache Hit! (Unexpected) Response: ${results2.results[0].response.substring(0, 50)}..., Score: ${results2.results[0].score.toFixed(4)}`);
  } else {
    console.log("  Cache Miss. (Expected due to userId mismatch)");
  }
  const queryBilling = "What are the ways to pay?";
  const attributesBilling = { userId: "user456", topic: "billing" };
  console.log(`\nSearching for '${queryBilling}' with attributes ${JSON.stringify(attributesBilling)}`);
  const results3 = await langCache.search({ prompt: queryBilling, attributes: attributesBilling });
  if (results3 && results3.results.length > 0) {
    console.log(`  Cache Hit! Response: ${results3.results[0].response.substring(0, 50)}..., Score: ${results3.results[0].score.toFixed(4)}`);
  } else {
    console.log("  Cache Miss. (Unexpected)");
  }
  console.log("\nAttribute caching example complete.");
}
async function main() {
  await thresholdExperiment();
  await ttlExample();
  await attributeCachingExample();
  // The Node.js SDK has no direct equivalent of the Python SDK's SearchStrategy enum.
  // To simulate it, perform an exact match check before a semantic search.
}
main().catch(console.error);
Exercise/Mini-Challenge: Context-Aware Customer Support Bot
Objective: Build a mock customer support bot that uses LangCache to answer questions. The bot should be context-aware, meaning it can store and retrieve answers based on the user’s role (e.g., “admin”, “customer”) and the current session_id.
Instructions:
- Create a new file (e.g., support_bot.py or support_bot.js).
- Initialize the LangCache client.
- Implement a mock LLM function mock_llm(prompt) that returns generic responses, or specific responses for a few hardcoded prompts (e.g., “What is your refund policy?”, “How to reset admin password?”).
- Implement a handle_query(session_id, user_role, query) function:
  - It should first try to retrieve an answer from LangCache using the query and both session_id and user_role as attributes.
  - If a cache hit: return the cached response.
  - If a cache miss:
    - Call mock_llm(query) to get a fresh response.
    - Store this new prompt-response pair in LangCache, again using session_id and user_role as attributes.
    - Return the LLM’s response.
- Simulate a few interactions:
  - A customer (session ID s101) asks “What is the refund policy?”.
  - The same customer (s101) asks “Return policy?”. (Should be a cache hit.)
  - An admin (session ID s202) asks “How to reset admin password?”.
  - A different customer (session ID s102) asks “How to reset admin password?”. (This should likely be a cache miss, as the user_role attribute won’t match the admin entry. The mock_llm should provide a different, perhaps generic, response for customers, which then gets cached with user_role: "customer".)
- Experiment with different similarity_threshold values in your handle_query function.
This challenge will deepen your understanding of how to use attributes to manage distinct conversational contexts and prevent information leakage between different user roles or sessions, all while optimizing LLM usage.
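If you want a starting point, here is a minimal Node.js skeleton for the core of the challenge (Python works just as well). It assumes the same langCache client setup as in advanced_features.js; the attribute names and the 0.85 threshold are illustrative choices, not SDK requirements:

// nodejs-examples/support_bot.js -- starter skeleton for the challenge.
// Assumes the same dotenv/LangCache client setup as advanced_features.js.
function mockLlm(prompt) {
  // TODO: return canned answers for a few hardcoded prompts, generic otherwise.
  return `Generic answer for: ${prompt}`;
}

async function handleQuery(sessionId, userRole, query) {
  const attributes = { sessionId, userRole }; // scope cache entries per session and role
  const cached = await langCache.search({ prompt: query, attributes, similarityThreshold: 0.85 });
  if (cached && cached.results.length > 0) {
    return { source: "cache", response: cached.results[0].response };
  }
  const response = mockLlm(query); // cache miss: ask the (mock) LLM
  await langCache.set({ prompt: query, response, attributes }); // store for next time
  return { source: "llm", response };
}

// Example interactions:
// await handleQuery("s101", "customer", "What is the refund policy?");
// await handleQuery("s101", "customer", "Return policy?"); // expected cache hit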