Cerbos supports production monitoring through Prometheus metrics, OpenTelemetry metrics and traces, and health check endpoints.
Health Checks
Cerbos exposes health check endpoints for both HTTP and gRPC protocols to verify service availability.
HTTP Health Endpoint
The HTTP health check endpoint is available at /_cerbos/health:
curl "http://localhost:3592/_cerbos/health?service=cerbos.svc.v1.CerbosService"
Response Codes:
200 OK: Service is healthy and serving requests
Non-200: Service is unavailable or experiencing issues
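The status-code rule above can be applied from a script as well as from curl. The following is a minimal sketch using only the Python standard library; the function names health_url and is_healthy are illustrative and not part of any Cerbos client.

```python
import urllib.error
import urllib.parse
import urllib.request


def health_url(base_url: str, service: str) -> str:
    """Build the health check URL for the given service name."""
    query = urllib.parse.urlencode({"service": service})
    return f"{base_url.rstrip('/')}/_cerbos/health?{query}"


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True only when the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers error responses (HTTPError), refused connections and timeouts.
        return False
```

For example, is_healthy(health_url("http://localhost:3592", "cerbos.svc.v1.CerbosService")) returns True when a local instance answers with 200 and False for any other outcome.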
gRPC Health Check
Cerbos implements the standard gRPC Health Checking Protocol, so any compatible client such as grpc_health_probe can query it:
grpc_health_probe -addr=localhost:3593 -service=cerbos.svc.v1.CerbosService
Using the Healthcheck Command
Cerbos includes a built-in healthcheck command for Docker and Kubernetes:
# Check gRPC endpoint using config file
cerbos healthcheck --config=/path/to/.cerbos.yaml
# Check HTTP endpoint
cerbos healthcheck --config=/path/to/.cerbos.yaml --kind=http
# Manual check without config
cerbos healthcheck --kind=grpc --host-port=localhost:3593
# Skip TLS verification (development only)
cerbos healthcheck --kind=http --host-port=localhost:3592 --insecure
Configuration Options:
--config: Path to Cerbos configuration file
--kind: Health check type (grpc or http)
--host-port: Target host and port
--timeout: Health check timeout (default: 2s)
--insecure: Skip certificate verification
--no-tls: Disable TLS
Docker Healthcheck
Add to your Dockerfile:
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD ["/cerbos", "healthcheck", "--config=/config/.cerbos.yaml"]
Kubernetes Probes
livenessProbe:
  httpGet:
    path: /_cerbos/health?service=cerbos.svc.v1.CerbosService
    port: 3592
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /_cerbos/health?service=cerbos.svc.v1.CerbosService
    port: 3592
  initialDelaySeconds: 3
  periodSeconds: 5
  timeoutSeconds: 2
  failureThreshold: 2
Prometheus Metrics
Cerbos exposes Prometheus-compatible metrics at /_cerbos/metrics on the HTTP port (default: 3592).
Enabling Metrics
Metrics are enabled by default. To disable:
server:
  metricsEnabled: false
Scraping Metrics
curl http://localhost:3592/_cerbos/metrics
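The endpoint returns the Prometheus text exposition format, which is simple enough to inspect without a Prometheus server. The sketch below parses that format and computes a lifetime cache hit ratio. The SAMPLE payload is made up for illustration (real output carries more metrics and may carry more labels), and parse_samples and cache_hit_ratio are hypothetical helper names.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

Sample = Tuple[str, Dict[str, str], float]

# Metric name, optional {labels}, then the value; trailing timestamps are ignored.
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Made-up payload for illustration only.
SAMPLE = """\
# TYPE cerbos_dev_cache_access_count counter
cerbos_dev_cache_access_count{result="hit"} 90
cerbos_dev_cache_access_count{result="miss"} 10
# TYPE cerbos_dev_cache_live_objects gauge
cerbos_dev_cache_live_objects 42
"""


def parse_samples(text: str) -> Iterator[Sample]:
    """Yield (name, labels, value) for every sample line, skipping comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        yield name, dict(_LABEL_RE.findall(raw_labels or "")), float(value)


def cache_hit_ratio(samples: Iterable[Sample]) -> float:
    """Hits divided by all cache accesses since process start."""
    hits = total = 0.0
    for name, labels, value in samples:
        if name == "cerbos_dev_cache_access_count":
            total += value
            if labels.get("result") == "hit":
                hits += value
    return hits / total if total else float("nan")
```

Because counters only grow, this ratio covers the whole process lifetime. A windowed hit rate needs two scrapes and a difference, which is what the rate() function does on the Prometheus side.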
Key Metrics
Metric | Type | Description
cerbos_dev_engine_check_latency | Histogram | Time to evaluate a policy decision (ms)
cerbos_dev_engine_check_batch_size | Histogram | Distribution of batch sizes in check requests
cerbos_dev_engine_plan_latency | Histogram | Time to generate a query plan (ms)
Policy Compilation
Metric | Type | Description
cerbos_dev_compiler_compile_duration | Histogram | Policy compilation time (ms)
Storage Operations
Metric | Type | Description
cerbos_dev_store_poll_count | Counter | Number of times remote store was polled
cerbos_dev_store_sync_error_count | Counter | Errors during store synchronization
cerbos_dev_store_last_successful_refresh | Gauge | Timestamp of last successful refresh
cerbos_dev_store_bundle_op_latency | Histogram | Bundle operation latency (ms)
cerbos_dev_store_bundle_fetch_errors_count | Counter | Bundle download errors
cerbos_dev_store_bundle_updates_count | Counter | Bundle updates from remote source
Cache
Metric | Type | Description
cerbos_dev_cache_access_count | Counter | Cache access attempts (with result label)
cerbos_dev_cache_live_objects | Gauge | Number of objects currently in cache
cerbos_dev_cache_max_size | Gauge | Maximum cache capacity
Policy Index
Metric | Type | Description
cerbos_dev_index_entry_count | Gauge | Number of entries in policy index
cerbos_dev_index_crud_count | Counter | Create/update/delete operations
Audit Logging
Metric | Type | Description
cerbos_dev_audit_error_count | Counter | Audit log write errors
cerbos_dev_audit_oversized_entry_count | Counter | Entries exceeding maximum size
Cerbos Hub
Metric | Type | Description
cerbos_dev_hub_connected | Gauge | Connection status (1=connected, 0=disconnected)
Runtime Metrics
Cerbos automatically exports Go runtime metrics including:
Memory allocation and GC statistics
Goroutine counts
CPU usage
Prometheus Configuration
scrape_configs:
  - job_name: 'cerbos'
    scrape_interval: 30s
    static_configs:
      - targets: ['cerbos:3592']
    metrics_path: /_cerbos/metrics
OpenTelemetry Integration
OTLP Metrics
Configure OTLP metrics export using environment variables:
# Enable OTLP metrics exporter
export OTEL_METRICS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=https://otel-collector:4318/v1/metrics
# Optional: Configure protocol (grpc or http/protobuf); it must match the endpoint.
# Port 4318 with the /v1/metrics path is the OTLP/HTTP convention; gRPC collectors usually listen on 4317.
export OTEL_EXPORTER_OTLP_METRICS_PROTOCOL=http/protobuf
# Optional: Configure export intervals
export OTEL_METRIC_EXPORT_INTERVAL=60000 # milliseconds
export OTEL_METRIC_EXPORT_TIMEOUT=30000 # milliseconds
TLS Configuration:
# Skip certificate validation (development only)
export OTEL_EXPORTER_OTLP_METRICS_INSECURE=true
# Custom CA certificate
export OTEL_EXPORTER_OTLP_METRICS_CERTIFICATE=/path/to/ca.crt
# Mutual TLS
export OTEL_EXPORTER_OTLP_METRICS_CLIENT_CERTIFICATE=/path/to/client.crt
export OTEL_EXPORTER_OTLP_METRICS_CLIENT_KEY=/path/to/client.key
Distributed Tracing
Enable distributed tracing to track request flows:
# Required: Set OTLP endpoint
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=https://jaeger:4317
# Optional: Service name in traces
export OTEL_SERVICE_NAME=cerbos-prod
# Sampling configuration
export OTEL_TRACES_SAMPLER=parentbased_traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.1 # Sample 10% of traces
Sampling Strategies:
Sampler | Description | Use Case
always_on | Record every trace | Development, debugging
always_off | No traces recorded | Tracing disabled
traceidratio | Sample based on trace ID | Production with controlled overhead
parentbased_always_on | Record if parent sampled | Distributed systems
parentbased_traceidratio | Ratio-based with parent context | Fine-grained control
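To make the difference between traceidratio and parentbased_traceidratio concrete, here is a simplified sketch of the two decisions. It illustrates the idea only: OpenTelemetry SDKs differ in the exact comparison they use, and this is not the code Cerbos runs. The function names are hypothetical.

```python
from typing import Optional

_MASK_64 = (1 << 64) - 1


def ratio_sampled(trace_id: int, ratio: float) -> bool:
    """Keep a trace when the low 64 bits of its ID fall below ratio * 2**64.

    The decision depends only on the trace ID, so every service that sees
    the same trace reaches the same answer without coordination.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    bound = round(ratio * (1 << 64))
    return (trace_id & _MASK_64) < bound


def parent_based_sampled(
    trace_id: int, ratio: float, parent_sampled: Optional[bool]
) -> bool:
    """Follow the parent's decision when there is one, else fall back to the ratio."""
    if parent_sampled is not None:
        return parent_sampled
    return ratio_sampled(trace_id, ratio)
```

With a ratio of 0.1, a request arriving with a sampled parent span is always recorded, while a request that starts a new trace is recorded roughly one time in ten.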
Protocol Options:
# gRPC (default)
export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=grpc
# HTTP
export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=http/protobuf
Logging Configuration
Cerbos uses structured logging with configurable log levels.
Log Levels
Set via configuration or environment variable:
export CERBOS_LOG_LEVEL=debug
Available Levels:
DEBUG or V1, V2, etc. - Verbose debugging
INFO - Standard operational information
WARN - Warning messages
ERROR - Error conditions
Cerbos automatically detects terminal output:
TTY detected: Colored console output
Non-TTY: JSON structured logs (ECS format)
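Since non-TTY output is one JSON object per line, it can be filtered with a few lines of code. The sketch below keeps entries at or above a minimum severity. The field names log.level and message follow the usual ECS naming and are assumptions here; check them against actual Cerbos output before relying on them.

```python
import json
from typing import Dict, Iterable, Iterator

_SEVERITY = {"debug": 10, "info": 20, "warn": 30, "error": 40}


def filter_logs(
    lines: Iterable[str],
    min_level: str = "warn",
    level_field: str = "log.level",  # assumed ECS field name
) -> Iterator[Dict[str, object]]:
    """Yield parsed log entries whose level is at least min_level."""
    threshold = _SEVERITY[min_level]
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(entry, dict):
            continue
        level = str(entry.get(level_field, "")).lower()
        # Unknown or missing levels rank below every threshold.
        if _SEVERITY.get(level, 0) >= threshold:
            yield entry
```

Passing sys.stdin as lines lets the function sit at the end of a pipe that streams container logs.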
Temporary Debug Logging
Send the SIGUSR1 signal to the Cerbos process to temporarily enable debug logging.
Debug logging automatically reverts after 10 minutes (configurable via CERBOS_TEMP_LOG_LEVEL_DURATION).
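Any tool that can deliver a POSIX signal works for this. As a minimal sketch, the helper below sends SIGUSR1 from Python; request_debug_logging is a hypothetical name, and how you discover the Cerbos process ID (PID file, process supervisor, container runtime) is left as an assumption.

```python
import os
import signal


def request_debug_logging(pid: int) -> None:
    """Send SIGUSR1 to the process with the given PID (POSIX only)."""
    os.kill(pid, signal.SIGUSR1)
```

The caller needs permission to signal the target process, which normally means running as the same user or as root.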
Request Payload Logging
For debugging, enable request/response payload logging:
server:
  logRequestPayloads: true
Payload logging impacts performance and may expose sensitive data. Only enable in controlled environments.
Audit Logging
Audit logs capture access decisions and policy evaluations. See the Audit configuration documentation for details.
Audit Metrics Integration
Monitor audit log health:
# Audit error rate
rate(cerbos_dev_audit_error_count[5m])
# Oversized entries
rate(cerbos_dev_audit_oversized_entry_count[5m])
Monitoring Best Practices
Critical Alerts
Service Health: Alert on failed health checks
High Latency: cerbos_dev_engine_check_latency > 100ms (p95)
Store Sync Failures: cerbos_dev_store_sync_error_count increasing
Audit Errors: cerbos_dev_audit_error_count > 0
Hub Disconnection: cerbos_dev_hub_connected == 0
# P95 check latency
histogram_quantile(0.95, rate(cerbos_dev_engine_check_latency_bucket[5m]))
# Check throughput
rate(cerbos_dev_engine_check_latency_count[5m])
# Cache hit rate
sum(rate(cerbos_dev_cache_access_count{result="hit"}[5m])) /
sum(rate(cerbos_dev_cache_access_count[5m]))
# Store freshness
time() - cerbos_dev_store_last_successful_refresh
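The P95 query relies on histogram_quantile, which estimates a quantile by linear interpolation inside the bucket that contains the target rank. The sketch below reproduces that idea for cumulative buckets with positive upper bounds, as is the case for latencies. It is a simplified illustration rather than Prometheus's exact implementation, and the bucket boundaries in the usage note are made up.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    The list must contain the +Inf bucket, whose count is the total.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(buckets)
    if not ordered or not math.isinf(ordered[-1][0]):
        raise ValueError("buckets must include the +Inf bucket")
    total = ordered[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in ordered:
        if count >= rank:
            if math.isinf(bound):
                # The quantile lies above every finite bound: report the highest one.
                return ordered[-2][0] if len(ordered) > 1 else math.nan
            if count == prev_count:
                return bound  # empty bucket, nothing to interpolate
            # Assume observations are spread evenly within the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan
```

With buckets [(10, 50), (50, 90), (100, 100), (inf, 100)], the 0.95 quantile lands halfway through the 50 to 100 bucket and evaluates to 75. The result is only as precise as the bucket layout, so a 100 ms alert threshold is most reliable when a bucket boundary sits near 100 ms.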
Dashboard Recommendations
Overview: Service health, request rate, error rate, latency
Performance: Latency percentiles, batch sizes, cache metrics
Storage: Sync status, bundle updates, policy count
Resources: Memory, CPU, goroutines, GC metrics
Admin API Metrics
When the Admin API is enabled, additional endpoints are available:
# gRPC reflection and channelz
grpcurl -plaintext localhost:3593 list
# Admin service health
grpcurl -plaintext -d '{"service":"cerbos.svc.v1.CerbosAdminService"}' \
  localhost:3593 grpc.health.v1.Health/Check
Observability Stack Examples
# docker-compose.yml (Cerbos with Prometheus and Grafana)
version: '3.8'
services:
  cerbos:
    image: ghcr.io/cerbos/cerbos:latest
    ports:
      - "3592:3592"
      - "3593:3593"
  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
# docker-compose.yml (Cerbos with an OpenTelemetry Collector)
version: '3.8'
services:
  cerbos:
    image: ghcr.io/cerbos/cerbos:latest
    environment:
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
      - OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://otel-collector:4317
      - OTEL_TRACES_SAMPLER=parentbased_traceidratio
      - OTEL_TRACES_SAMPLER_ARG=0.1
      - OTEL_METRICS_EXPORTER=otlp
  otel-collector:
    image: otel/opentelemetry-collector-contrib
    volumes:
      - ./otel-config.yaml:/etc/otel-config.yaml
    command: ["--config=/etc/otel-config.yaml"]
    ports:
      - "4317:4317" # OTLP gRPC
      - "4318:4318" # OTLP HTTP
# docker-compose.yml (Cerbos with Jaeger)
version: '3.8'
services:
  cerbos:
    image: ghcr.io/cerbos/cerbos:latest
    environment:
      - OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://jaeger:4317
      - OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=grpc
      - OTEL_TRACES_SAMPLER=always_on
      - OTEL_SERVICE_NAME=cerbos
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686" # Jaeger UI
      - "4317:4317" # OTLP gRPC