Observability
Prometheus metrics
| Metric | Type | Labels | Measures |
|---|---|---|---|
| semantic_cache_hits_total | Counter | namespace | Cache hits |
| semantic_cache_misses_total | Counter | namespace | Cache misses |
| semantic_cache_errors_total | Counter | operation | Failed cache ops |
| semantic_cache_embedding_duration_seconds | Histogram | — | Embedding latency |
| semantic_cache_similarity_score | Histogram | — | Similarity score distribution |
| semantic_cache_stream_bypass_total | Counter | namespace | stream=True bypasses |
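
As a rough illustration of how the hit and miss counters combine, the sketch below parses a sample of Prometheus text exposition output and computes a per-namespace hit ratio. The sample values and the helper names (`parse_counters`, `hit_ratio`) are hypothetical, and the regex only handles the simple single-label lines shown here, not the full exposition format.

```python
import re
from collections import defaultdict

# Made-up sample in Prometheus text exposition format; values are illustrative only.
SAMPLE = """\
# TYPE semantic_cache_hits_total counter
semantic_cache_hits_total{namespace="default"} 30
semantic_cache_hits_total{namespace="support"} 5
# TYPE semantic_cache_misses_total counter
semantic_cache_misses_total{namespace="default"} 10
semantic_cache_misses_total{namespace="support"} 15
"""

# Matches lines of the form: name{label="value"} number
LINE_RE = re.compile(r'^(\w+)\{(\w+)="([^"]*)"\}\s+(\S+)$')


def parse_counters(text):
    """Return {metric_name: {label_value: float}} for single-label samples."""
    out = defaultdict(dict)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # comment lines and unsupported shapes are skipped
            name, _label, value, number = match.groups()
            out[name][value] = float(number)
    return dict(out)


def hit_ratio(counters, namespace):
    """hits / (hits + misses) for one namespace, or None if there was no traffic."""
    hits = counters.get("semantic_cache_hits_total", {}).get(namespace, 0.0)
    misses = counters.get("semantic_cache_misses_total", {}).get(namespace, 0.0)
    total = hits + misses
    return hits / total if total else None


counters = parse_counters(SAMPLE)
for ns in ("default", "support"):
    print(ns, hit_ratio(counters, ns))
```

Because counters only ever increase, a ratio of raw values like this covers the whole process lifetime. A windowed view needs the difference between two scrapes, or a rate computed on the Prometheus side.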
Structlog events
| Event name | Level | Key fields |
|---|---|---|
| cache.hit | info | namespace, best_score, threshold, embedding_model |
| cache.miss | info | namespace, best_score, threshold, embedding_model |
| cache.stream_bypass | info | namespace, embedding_model |
| cache.no_user_message_bypass | info | namespace, embedding_model |
| cache.embed_failed | error | namespace, error |
| cache.lookup_failed | error | namespace, error |
| cache.lookup_params_failed | error | namespace, error |
| cache.store_failed | error | error |
| cache.namespace_too_large | warning | namespace, size, threshold |
Grafana dashboard
Import path: Grafana → Dashboards → Import → upload dashboards/semantic-cache.json.
The dashboard includes eight panels:
- Cache Hit Rate — 5-minute hit-rate ratio for selected namespaces.
- Cache Hits/Misses — hit and miss request rates over time.
- Embedding Latency — p95 embedding latency from histogram buckets.
- Stream Bypass Rate — rate of requests with stream=True.
- Cache Errors — error rate grouped by operation.
- Similarity Score Distribution — bucketed similarity scores over the time window.
- Namespace Entry Counts — per-namespace current entry count.
- Total Cache Entries — sum of entries across namespaces.
Use the $namespace variable to filter all panels to one or more namespaces, or select “All” to view aggregate behavior.
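
To make the "p95 from histogram buckets" idea concrete, here is a minimal sketch of the linear interpolation that Prometheus-style quantile estimation performs over cumulative buckets. It is an approximation written for illustration, not the dashboard's actual query. The function name `histogram_quantile` is borrowed from PromQL, the bucket bounds and counts are invented, and the sketch assumes at least one finite bucket plus a final +Inf bucket.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    Assumes 0 < q <= 1, at least one finite bucket, and a last bucket
    with upper bound +Inf (as Prometheus histograms expose).
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations yet
    rank = q * total
    # Find the first bucket whose cumulative count reaches the target rank.
    for i, (upper, cumulative) in enumerate(buckets):
        if cumulative >= rank:
            break
    if math.isinf(upper):
        # The quantile lies beyond the last finite bound; report that bound.
        return buckets[-2][0]
    # The first bucket is assumed to start at 0; others start at the previous bound.
    lower, below = (0.0, 0) if i == 0 else buckets[i - 1]
    in_bucket = cumulative - below
    return lower + (upper - lower) * (rank - below) / in_bucket


# Hypothetical cumulative bucket counts for embedding latency, in seconds.
EMBEDDING_BUCKETS = [
    (0.01, 10),
    (0.025, 40),
    (0.05, 80),
    (0.1, 96),
    (0.25, 100),
    (math.inf, 100),
]

p95 = histogram_quantile(0.95, EMBEDDING_BUCKETS)
print(f"estimated p95 embedding latency: {p95:.4f}s")
```

The estimate is only as fine as the bucket layout: with these sample buckets, any p95 between 50 ms and 100 ms is interpolated inside a single bucket, so bucket boundaries are worth choosing around the latencies that matter.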
Example log output
Console (development):
2026-01-01T12:00:00Z [info ] cache.hit namespace=default best_score=0.96 threshold=0.92 embedding_model=all-MiniLM-L6-v2
2026-01-01T12:00:02Z [info ] cache.miss namespace=default best_score=0.81 threshold=0.92 embedding_model=all-MiniLM-L6-v2
2026-01-01T12:00:03Z [info ] cache.stream_bypass namespace=default embedding_model=all-MiniLM-L6-v2
2026-01-01T12:00:04Z [error ] cache.lookup_failed namespace=default error='redis timeout'
JSON (production):
{"event":"cache.hit","level":"info","namespace":"default","best_score":0.96,"threshold":0.92,"embedding_model":"all-MiniLM-L6-v2","timestamp":"2026-01-01T12:00:00Z"}
{"event":"cache.miss","level":"info","namespace":"default","best_score":0.81,"threshold":0.92,"embedding_model":"all-MiniLM-L6-v2","timestamp":"2026-01-01T12:00:02Z"}
{"event":"cache.stream_bypass","level":"info","namespace":"default","embedding_model":"all-MiniLM-L6-v2","timestamp":"2026-01-01T12:00:03Z"}
{"event":"cache.lookup_failed","level":"error","namespace":"default","error":"redis timeout","timestamp":"2026-01-01T12:00:04Z"}
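
Since the production format is one JSON object per line, the events can be summarized with the standard library alone. The sketch below tallies events by name and collects error records from the four sample lines above. The `summarize` helper is hypothetical, and it assumes every line carries the `event`, `level`, and `timestamp` keys shown in the samples.

```python
import json
from collections import Counter

# The four JSON lines from the example output, verbatim.
LOG_LINES = [
    '{"event":"cache.hit","level":"info","namespace":"default","best_score":0.96,"threshold":0.92,"embedding_model":"all-MiniLM-L6-v2","timestamp":"2026-01-01T12:00:00Z"}',
    '{"event":"cache.miss","level":"info","namespace":"default","best_score":0.81,"threshold":0.92,"embedding_model":"all-MiniLM-L6-v2","timestamp":"2026-01-01T12:00:02Z"}',
    '{"event":"cache.stream_bypass","level":"info","namespace":"default","embedding_model":"all-MiniLM-L6-v2","timestamp":"2026-01-01T12:00:03Z"}',
    '{"event":"cache.lookup_failed","level":"error","namespace":"default","error":"redis timeout","timestamp":"2026-01-01T12:00:04Z"}',
]


def summarize(lines):
    """Count events by name and collect (timestamp, event, error) for error-level records."""
    events = Counter()
    errors = []
    for line in lines:
        record = json.loads(line)
        events[record["event"]] += 1
        if record["level"] == "error":
            errors.append((record["timestamp"], record["event"], record.get("error")))
    return events, errors


events, errors = summarize(LOG_LINES)
print(dict(events))
print(errors)
```

The same loop works on a log file opened line by line. In that case, lines that are not valid JSON (for example, startup banners) would need to be skipped with a `try`/`except json.JSONDecodeError`.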