## How it works

### The decision path
- Extract the last user message from messages; if none is found, bypass the cache and call the LLM.
- Compute the SHA-256 of the cache_context dict to produce context_hash.
- Embed the prompt text using the configured embedding model.
- Search the storage backend for entries matching namespace, embedding_model_id, and context_hash with cosine similarity ≥ threshold.
- On hit: return the cached response immediately.
- On miss: call the original function, store the result, and return the response.
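The steps above can be sketched as a minimal lookup flow. This is an illustration, not the library's implementation: the callables embed, search, and call_llm are hypothetical placeholders injected as parameters.

```python
import hashlib
import json

def semantic_lookup(messages, cache_context, embed, search, call_llm,
                    namespace="default", threshold=0.92):
    """Sketch of the decision path; embed/search/call_llm are placeholders."""
    # Step 1: extract the last user message; bypass the cache if none exists.
    prompt = next((m["content"] for m in reversed(messages)
                   if m.get("role") == "user"), None)
    if prompt is None:
        return call_llm(messages)

    # Step 2: hash the context dict so different contexts never share entries.
    context_hash = hashlib.sha256(
        json.dumps(cache_context, sort_keys=True).encode()
    ).hexdigest()

    # Steps 3-4: embed the prompt and search for a similar-enough entry.
    vector = embed(prompt)
    hit = search(namespace, context_hash, vector, threshold)
    if hit is not None:
        return hit  # cache hit: skip the LLM call

    # Step 5: cache miss, so call through (storing the result is omitted here).
    return call_llm(messages)
```

Canonicalizing the context dict with sort_keys=True before hashing keeps the hash stable across dicts that differ only in key insertion order.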
### The four conditions for a cache hit
| Condition | Value must match |
|---|---|
| Namespace | cache_namespace kwarg (default: "default") |
| Embedding model | embedding_model_id from the configured embedder |
| Context hash | SHA-256 of cache_context dict |
| Cosine similarity | ≥ threshold (default: 0.92) |
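A hit requires all four checks to pass. Here is a sketch of that predicate over a single stored entry; the entry field names and plain-dict representation are assumptions for illustration:

```python
import math

def is_hit(entry, namespace, model_id, context_hash, query_vec, threshold=0.92):
    """Return True only if all four conditions from the table hold."""
    if entry["namespace"] != namespace:
        return False
    if entry["embedding_model_id"] != model_id:
        return False
    if entry["context_hash"] != context_hash:
        return False
    # Cosine similarity between the stored vector and the query vector.
    dot = sum(a * b for a, b in zip(entry["vector"], query_vec))
    norm = (math.sqrt(sum(a * a for a in entry["vector"]))
            * math.sqrt(sum(b * b for b in query_vec)))
    return norm > 0 and dot / norm >= threshold
```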
### Conditions that prevent a cache hit
| Scenario | Outcome |
|---|---|
| cache_context missing | ValueError and original function is not called |
| cache_context is not a dict | TypeError and original function is not called |
| cache_namespace is not a string | TypeError and original function is not called |
| stream=True | Cache bypass, original function is called |
| No user message in messages | Cache bypass, original function is called |
| Different namespace | Miss |
| Different context hash | Miss |
| Different embedding model | Entry is invisible, miss |
| Best cosine similarity below threshold | Miss |
| Entry expired by TTL | Entry is ignored/evicted, miss |
| Namespace invalidated | Entry deleted, miss |
| Cache operation error | Fail open, original function is called |
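The error rows in the table amount to eager argument validation before any cache work happens. A minimal sketch of that validation, assuming the checks run in the order shown (the function name is illustrative):

```python
def validate_cache_args(cache_context, cache_namespace="default"):
    """Reject bad arguments before touching the cache, per the table above."""
    if cache_context is None:
        raise ValueError("cache_context is required")
    if not isinstance(cache_context, dict):
        raise TypeError("cache_context must be a dict")
    if not isinstance(cache_namespace, str):
        raise TypeError("cache_namespace must be a string")
```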
### Embedding model isolation
Each cache entry stores embedding_model_id, and lookups only consider entries with the same model identifier as the currently configured embedder. Without this isolation, changing models would compare vectors from incompatible distributions and silently return wrong matches that look valid but are semantically incorrect.
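In effect, the isolation is an extra equality filter on lookup. A sketch over an in-memory list of entries (the field names and list representation are assumptions):

```python
def candidates_for_model(entries, model_id):
    """Only entries embedded by the currently configured model are visible."""
    return [e for e in entries if e["embedding_model_id"] == model_id]
```

Switching embedders therefore makes all old entries invisible rather than comparable, so stale vectors can never produce a false hit.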
### Fail-open guarantees
If embedding, lookup, or store operations fail, SemanticCache logs and counts the error, then calls the original function as if caching were disabled for that request. This keeps your application behavior available even when the cache path is unhealthy, while preserving observability for incident response.