Health Check Failures
Symptoms
Health check endpoints return errors or non-200 status codes.
HTTP Health Check Returns 500
Possible Causes:
- Cerbos server not fully initialized
- Storage backend unavailable
- Critical service failure
Solutions:
- Check storage backend connectivity
- Review server startup logs for errors
- Verify configuration file is valid
- Ensure adequate resources (memory, CPU)
- Check for port conflicts
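To separate "the server answered with an error" from "the server could not be reached at all", a small probe can help. The following is a minimal sketch using only the Python standard library; the function name probe_health is hypothetical and the URL you pass in depends on your deployment.

```python
import urllib.error
import urllib.request
from typing import Tuple


def probe_health(url: str, timeout: float = 2.0) -> Tuple[bool, str]:
    """Run one HTTP health probe and return (healthy, detail)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200, f"HTTP {resp.status}"
    except urllib.error.HTTPError as err:
        # The server responded, but with an error status such as 500.
        return False, f"HTTP {err.code}"
    except (urllib.error.URLError, OSError) as err:
        # No usable response: connection refused, DNS failure, timeout.
        return False, f"unreachable: {err}"
```

For example, probe_health("http://localhost:3592/_cerbos/health") would probe a local instance, assuming the default HTTP port and health path; adjust both to match your configuration. An "HTTP 500" detail points at the causes listed above, while "unreachable" points at startup or port problems.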
gRPC Health Check Timeout
Possible Causes:
- Server not responding on gRPC port
- TLS configuration mismatch
- Network connectivity issues
- Firewall blocking gRPC port
Solutions:
- Verify gRPC port (3593) is accessible
- Check TLS configuration matches client
- Ensure gRPC server started successfully
- Review firewall/security group rules
- Test with cerbos healthcheck --kind=grpc
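Before debugging TLS or gRPC itself, it can be worth confirming that the port accepts TCP connections at all. This is a minimal sketch with the Python standard library; port_is_open is a hypothetical helper, and it checks only TCP reachability, not TLS or gRPC health.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

If port_is_open("cerbos.example.internal", 3593) returns False (the hostname is a placeholder), look at firewall rules and whether the gRPC server started. If it returns True but the health check still times out, a TLS mismatch is the more likely cause.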
Health Check Works but Requests Fail
Possible Causes:
- Policy loading errors
- Validation failures
- Partial service degradation
Solutions:
- Validate policy syntax: cerbos compile <policy-dir>
- Check policy store synchronization
- Review error logs for policy evaluation failures
- Verify schema validation if using schemas
Storage Backend Issues
Git Storage Problems
Git Clone/Fetch Failures
Common Fixes:
- Verify repository URL is correct
- Check authentication credentials are valid
- Ensure SSH key has no passphrase or use ssh-agent
- For private repos, verify access permissions
- Check network connectivity to git server
Policies Not Updating
Symptoms:
- Policy changes not reflected in Cerbos
- cerbos_dev_store_last_successful_refresh metric not updating
Solutions:
- Reduce updatePollInterval if needed
- Check for network issues to git server
- Review logs for fetch errors
- Verify branch name is correct
- Force refresh via Admin API if available
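One way to tell whether the store has stopped refreshing is to compare the refresh metric against the current time. The sketch below parses Prometheus text-format output; it assumes the metric value is a Unix timestamp in seconds, which you should confirm for your version, and last_refresh_age is a hypothetical helper name.

```python
import time
from typing import Optional

METRIC = "cerbos_dev_store_last_successful_refresh"


def last_refresh_age(metrics_text: str, now: Optional[float] = None) -> Optional[float]:
    """Return seconds since the last successful refresh, or None if absent.

    Assumes the metric value is a Unix timestamp in seconds.
    """
    now = time.time() if now is None else now
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line.startswith(METRIC):
            continue  # skips comments (# HELP, # TYPE) and other metrics
        rest = line[len(METRIC):]
        if rest.startswith("{"):
            rest = rest[rest.rfind("}") + 1:]  # drop the label set
        elif not rest.startswith(" "):
            continue  # a different metric that shares the prefix
        fields = rest.split()
        if fields:
            return now - float(fields[0])
    return None
```

An age that keeps growing well past updatePollInterval suggests fetches are failing; the logs should then show the underlying fetch error.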
Database Storage Problems
Connection Pool Exhaustion
Tuning Guidelines:
- Set maxOpen based on workload: (CPU cores × 2) + spindles
- Keep maxIdle at ~50% of maxOpen
- Ensure database max_connections > Cerbos instances × maxOpen
- Monitor for connection leaks in application
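The tuning guidelines above reduce to a little arithmetic. The following sketch works through it; PoolPlan, plan_pool, and fits_database are hypothetical names used only for illustration, and the results are starting points rather than definitive settings.

```python
from dataclasses import dataclass


@dataclass
class PoolPlan:
    max_open: int                 # suggested maxOpen per Cerbos instance
    max_idle: int                 # suggested maxIdle (~50% of maxOpen)
    required_db_connections: int  # total across all Cerbos instances


def plan_pool(cpu_cores: int, spindles: int, cerbos_instances: int) -> PoolPlan:
    """Apply the guideline: maxOpen = (CPU cores x 2) + spindles."""
    max_open = cpu_cores * 2 + spindles
    max_idle = max(1, max_open // 2)
    return PoolPlan(max_open, max_idle, cerbos_instances * max_open)


def fits_database(plan: PoolPlan, db_max_connections: int) -> bool:
    """True if max_connections exceeds Cerbos instances x maxOpen."""
    return db_max_connections > plan.required_db_connections
```

For instance, 4 cores, 1 spindle, and 3 Cerbos instances give maxOpen 9, maxIdle 4, and a requirement of more than 27 database connections.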
Database Connection Failures
Solutions:
- Verify database host and port
- Check username and password
- Ensure database exists
- Configure SSL/TLS properly
- Check firewall rules
- Verify connection retry settings
SSL Modes:
- disable: No SSL (development only)
- require: Require SSL, don’t verify cert
- verify-ca: Verify certificate authority
- verify-full: Verify cert and hostname (recommended)
Performance Issues
High Latency
Consistently High P50 Latency (> 10ms)
Common Causes:
- Cache misses: Low cache hit rate (< 90%)
- Complex policies: Heavy condition evaluation
- Storage latency: Slow policy loading
- Resource constraints: CPU/memory pressure
Solutions:
- Warm cache on startup
- Simplify policy conditions
- Optimize storage backend (see Performance guide)
- Increase CPU/memory allocation
- Review policy design patterns
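The two thresholds mentioned here (P50 above 10ms, cache hit rate below 90%) can be checked from raw numbers. This is a minimal sketch, assuming you have exported latency samples in milliseconds and cache hit/miss counts yourself; the function names are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def latency_summary(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Return P50 and P99 of the given latency samples (milliseconds)."""
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p99": cuts[98]}


def cache_hit_rate(hits: int, misses: int) -> float:
    """Return the cache hit rate as a fraction between 0 and 1."""
    total = hits + misses
    return hits / total if total else 0.0


def findings(samples_ms: Sequence[float], hits: int, misses: int) -> List[str]:
    """List which of the documented thresholds are exceeded."""
    out = []
    if latency_summary(samples_ms)["p50"] > 10:
        out.append("P50 latency above 10ms")
    if cache_hit_rate(hits, misses) < 0.90:
        out.append("cache hit rate below 90%")
    return out
```

If only the latency finding appears, cache misses are less likely to be the cause, and policy complexity, storage latency, or resource pressure deserve a closer look.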
Intermittent Latency Spikes
Common Causes:
- GC pressure: Frequent garbage collection
- Cache evictions: Memory pressure causing cache churn
- Storage sync: Policy updates during requests
- Network issues: Intermittent connectivity problems
Solutions:
- Increase memory allocation
- Reduce GC frequency by allocating more heap
- Stagger policy updates across instances
- Investigate network stability
- Monitor P99 latency trends
Storage Backend Latency
Solutions by Storage Type:
Git:
- Increase updatePollInterval to reduce fetch frequency
- Use local git cache/mirror
- Reduce repository size
Database:
- Tune connection pool (see Database section)
- Add database indexes
- Use read replicas
Blob storage:
- Use regional endpoints
- Enable CDN/caching layer
- Poll less frequently if network is slow
Memory Issues
Out of Memory (OOM) Errors
Optimization:
- Reduce policy count if excessive
- Decrease audit buffer size
- Implement policy archival strategy
- Monitor for memory leaks
Memory Leak Detection
Symptoms:
- Memory usage grows continuously
- OOM after days/weeks of operation
- GC cannot reclaim memory
Solutions:
- Update to latest Cerbos version
- Report issue with metrics/logs
- Implement periodic pod restarts as workaround
- Monitor for specific operations causing leaks
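"Grows continuously" can be made measurable by fitting a trend line to memory samples. The sketch below computes a least-squares slope over (hours, MiB) pairs that you collect yourself; growth_per_hour is a hypothetical helper, and a positive slope alone does not prove a leak, since caches also grow while warming up.

```python
from typing import Sequence, Tuple


def growth_per_hour(samples: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of memory (MiB) over time (hours)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t
```

A slope that stays positive across several days of samples, including after traffic drops, is worth attaching to an issue report together with the metrics and logs.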
TLS and Certificate Issues
Certificate Verification Failed
Solutions:
Expired Certificate:
- Check expiration: openssl x509 -enddate -noout -in cert.crt
- Renew certificate
- Cerbos auto-reloads on file change
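The openssl command above prints a line of the form notAfter=<date>. To turn that into "days remaining", for example in a monitoring script, something like the following sketch could be used; days_until_expiry is a hypothetical helper and it expects the GMT date format that openssl prints by default.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(enddate_output: str, now: Optional[float] = None) -> float:
    """Parse 'notAfter=Jan  5 09:34:43 2018 GMT' and return days remaining.

    A negative result means the certificate has already expired.
    """
    _, sep, stamp = enddate_output.strip().partition("=")
    if not sep:
        raise ValueError("expected output of the form notAfter=<date>")
    expires = ssl.cert_time_to_seconds(stamp)
    now = time.time() if now is None else now
    return (expires - now) / 86400
```

Renewing while the result is still comfortably positive avoids verification failures on clients.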
Certificate Not Reloading
Symptoms:
- Updated certificate on disk
- Cerbos still uses old certificate
- No reload logged
Solutions:
- Ensure Cerbos has read permissions
- Verify files are not symlinks to read-only mounts
- Update both cert and key atomically
- Check filesystem supports inotify (for auto-reload)
- Restart Cerbos if auto-reload fails
Admin API Issues
Authentication Failed
Common Issues:
- Password not base64-encoded
- Using bcrypt cost < 10
- Using wrong username
- Credentials from environment variables not set
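The first two issues in this list can be checked mechanically. The sketch below inspects a configured password hash value using only the standard library; it assumes the value is meant to be a base64-encoded bcrypt hash, and check_password_hash is a hypothetical helper that validates the format only, not the password itself.

```python
import base64
import binascii
import re
from typing import List

# bcrypt hashes look like $2a$10$<53 characters of salt and digest>
BCRYPT_RE = re.compile(r"^\$2[abxy]?\$(\d{2})\$[./A-Za-z0-9]{53}$")


def check_password_hash(encoded: str, min_cost: int = 10) -> List[str]:
    """Return a list of problems found with the encoded hash (empty if none)."""
    try:
        raw = base64.b64decode(encoded.strip(), validate=True).decode("ascii")
    except (binascii.Error, UnicodeDecodeError, ValueError):
        return ["value is not valid base64"]
    match = BCRYPT_RE.match(raw.strip())
    if not match:
        return ["decoded value does not look like a bcrypt hash"]
    cost = int(match.group(1))
    if cost < min_cost:
        return [f"bcrypt cost {cost} is below {min_cost}"]
    return []
```

A raw bcrypt hash pasted without encoding is reported as invalid base64, because the $ character is not part of the base64 alphabet.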
Admin API Disabled
Solution: Enable the Admin API in the Cerbos server configuration.
Audit Logging Issues
Audit Logs Not Written
Common Fixes:
- Verify backend is enabled in config
- Check file permissions (file backend)
- Verify Kafka broker connectivity
- Ensure topic exists and is writable
- Check for network/firewall issues
Oversized Audit Entries
Symptoms:
- cerbos_dev_audit_oversized_entry_count increasing
- Some requests not logged
Debug Mode
Enable debug logging temporarily.
Getting Help
Information to Collect
When reporting issues:
- Version: cerbos version
- Configuration: Sanitized config file
- Logs: Last 100 lines of logs
- Metrics: Relevant Prometheus metrics
- Environment: Deployment method (Docker, K8s, binary)
- Reproduction: Steps to reproduce the issue
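Sanitizing a config file by hand is error-prone, so a small script can do a first pass. This is a rough sketch that masks values of YAML keys whose names contain password, secret, token, or key; that keyword list is an assumption, and the output should still be reviewed manually before sharing.

```python
import re

# Matches "someKey: value" lines where the key name contains a sensitive word.
SENSITIVE = re.compile(
    r"^(\s*[\w.-]*(?:password|secret|token|key)[\w.-]*\s*:\s*)(\S.*)$",
    re.IGNORECASE,
)


def sanitize_config(text: str) -> str:
    """Replace the values of sensitive-looking keys with <redacted>."""
    lines = []
    for line in text.splitlines():
        # Keep the key and indentation, mask only the value.
        lines.append(SENSITIVE.sub(r"\1<redacted>", line))
    return "\n".join(lines)
```

Keys that only introduce a nested block (no value on the same line) are left untouched, as are values embedded in URLs such as connection strings, which need separate handling.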
Log Collection
Support Channels
- Community Slack: Join Cerbos Slack for community help
- GitHub Issues: Open issues for bugs
- Documentation: Check docs.cerbos.dev
- Enterprise Support: Contact Cerbos for enterprise support
Common Error Messages
| Error | Cause | Solution |
|---|---|---|
| failed to load config | Invalid YAML | Validate YAML syntax |
| store initialization failed | Storage backend unreachable | Check storage connectivity |
| policy compilation failed | Invalid policy syntax | Run cerbos compile |
| connection refused | Service not listening | Verify server started, check ports |
| certificate signed by unknown authority | TLS cert mismatch | Verify CA certificate |
| authentication failed | Wrong credentials | Regenerate password hash |
| context deadline exceeded | Timeout | Increase timeout, check network |
| resource exhausted | Rate limiting | Reduce request rate or increase limits |