Skip to content

Configuration

CacheConfig reference

Field Type Default Description
threshold str \| float "balanced" Named profile ("strict", "balanced", "loose") or raw cosine similarity in [0.0, 1.0]
default_namespace str "default" Namespace used when cache_namespace is not passed
default_ttl float \| None None Time-to-live in seconds. None = no expiration
embedding_model str "BAAI/bge-small-en-v1.5" Embedding model identifier used for tagging cache entries. Must match the configured embedder's model_id
cache_timeout_seconds float 0.2 Timeout for cache operations (async path only). On timeout, a structlog warning (cache.timeout_exceeded) is emitted with elapsed_ms, timeout_ms, and action=bypass

Threshold profiles

  • strict (0.97): Use for code generation and factual answers where a false positive looks like confidently returning the wrong snippet or fact; a false negative looks like an extra provider call for a near-identical prompt.
  • balanced (0.92): Use for general assistants and summarization where you want practical savings without aggressive matching; a false positive looks like subtle context drift, and a false negative looks like lower hit rate on paraphrases.
  • loose (0.85): Use for repetitive support/FAQ bots where wording varies a lot; a false positive looks like a slightly off canned answer, and a false negative looks like avoidable misses in common support phrasing.

cache_timeout_seconds

In async mode, cache lookup/store operations run with a hard timeout and fail open if they exceed cache_timeout_seconds, which means your wrapped function is still called and your app keeps serving responses. When a timeout occurs, Recallm emits a structlog warning event (cache.timeout_exceeded) with elapsed_ms, timeout_ms, and action=bypass so you can track cold-start bypasses in your log aggregator. This timeout protection only applies to async code paths; sync callers (including sync RedisStorage usage) have no timeout guard and rely on the backend client's own behavior.

default_ttl

Use default_ttl when cached answers age out naturally, such as short-lived support responses or rapidly changing operational data. Avoid relying on TTL alone for correctness when upstream content changes in bulk (for example model swaps, document re-indexes, API version changes). For content-change scenarios, use namespace invalidation so you can remove stale entries immediately instead of waiting for expiry.

Per-call overrides

Pass cache_namespace on each wrapped call when you want request-level scoping, such as per-tenant or per-session isolation:

response = cached_create(
    model="gpt-4o-mini",
    messages=messages,
    cache_context={"user_id": "u-42"},
    cache_namespace="session:s-123",
)