Monitoring¶
Monitor USSO in production.
Health Endpoint¶
Response:
Metrics¶
USSO exposes Prometheus metrics at /metrics:
# Authentication metrics
usso_login_attempts_total
usso_login_success_total
usso_login_failure_total
# Token metrics
usso_token_issued_total
usso_token_verified_total
usso_token_expired_total
# API metrics
usso_http_requests_total
usso_http_request_duration_seconds
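As a rough illustration of how these counters might be consumed, the sketch below parses a Prometheus text-format payload and derives a login failure ratio from the metric names listed above. The sample payload, the `method` label, and the helper names `parse_counters` and `login_failure_ratio` are assumptions made for the example; the labels USSO actually attaches may differ.

```python
# Hypothetical sample of what /metrics might return (values and labels are made up).
SAMPLE = """\
# TYPE usso_login_attempts_total counter
usso_login_attempts_total 200
usso_login_success_total 180
usso_login_failure_total 20
usso_http_requests_total{method="GET",status="200"} 950
usso_http_requests_total{method="GET",status="500"} 50
"""


def parse_counters(text):
    """Sum sample values per metric name from Prometheus text exposition format."""
    totals = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comments.
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # Labelled sample: name{labels} value [timestamp]
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1:]
        else:
            # Unlabelled sample: name value [timestamp]
            name, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            continue
        totals[name] = totals.get(name, 0.0) + float(fields[0])
    return totals


def login_failure_ratio(totals):
    """Failed logins as a fraction of all attempts (0.0 when there are no attempts)."""
    attempts = totals.get("usso_login_attempts_total", 0.0)
    if attempts == 0:
        return 0.0
    return totals.get("usso_login_failure_total", 0.0) / attempts


if __name__ == "__main__":
    totals = parse_counters(SAMPLE)
    print(f"login failure ratio: {login_failure_ratio(totals):.2%}")
```

Because counters only ever increase, a ratio computed from raw totals covers the whole process lifetime; for a recent window, compare two scrapes or let Prometheus compute `rate()` instead.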
Logging¶
Log Levels¶
- ERROR - Errors requiring attention
- WARNING - Important events
- INFO - General information
- DEBUG - Detailed debugging
Log Format¶
{
  "timestamp": "2025-10-04T10:00:00Z",
  "level": "INFO",
  "message": "User login successful",
  "user_id": "user:abc123",
  "ip": "192.168.1.1",
  "user_agent": "Mozilla/5.0..."
}
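For services that sit alongside USSO and want their logs to line up with this shape, a formatter along the following lines could emit the same fields using Python's standard `logging` module. This is a minimal sketch, not USSO's own implementation: the `JsonFormatter` class, the `build_logger` helper, and the logger name `usso.example` are hypothetical.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with the fields shown above."""

    # Optional fields, copied only when the caller passes them via `extra=`.
    EXTRA_FIELDS = ("user_id", "ip", "user_agent")

    def format(self, record):
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "timestamp": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        for field in self.EXTRA_FIELDS:
            if hasattr(record, field):
                entry[field] = getattr(record, field)
        return json.dumps(entry)


def build_logger(stream):
    """Return a logger (hypothetical name) that writes JSON lines to `stream`."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("usso.example")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger(sys.stdout)
    log.info(
        "User login successful",
        extra={"user_id": "user:abc123", "ip": "192.168.1.1"},
    )
```

Emitting one JSON object per line keeps the output easy to ship to a log aggregator without a custom parser.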
Centralized Logging¶
ELK Stack:
CloudWatch:
Alerting¶
Key Alerts¶
- High Error Rate
  - Threshold: > 5% errors
  - Action: Investigate logs
- Slow Responses
  - Threshold: p95 > 1s
  - Action: Check database
- Failed Logins
  - Threshold: > 100/min
  - Action: Possible attack
- Database Issues
  - Health check fails
  - Action: Check connectivity
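To make the thresholds above concrete, here is a small sketch that evaluates them against statistics collected over one observation window. In practice these checks would normally live in Prometheus rules; the `WindowStats` structure, the nearest-rank `percentile` helper, and `firing_alerts` are hypothetical names introduced only for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Hypothetical aggregate of one observation window."""
    total_requests: int
    error_requests: int
    durations_s: list[float]   # per-request latency in seconds
    failed_logins: int
    window_minutes: float
    db_healthy: bool


def percentile(values, pct):
    """Nearest-rank percentile; returns 0.0 for an empty list."""
    if not values:
        return 0.0
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def firing_alerts(stats):
    """Return the names of the alerts whose thresholds are exceeded."""
    alerts = []
    # High Error Rate: more than 5% of requests failed.
    if stats.total_requests and stats.error_requests / stats.total_requests > 0.05:
        alerts.append("High Error Rate")
    # Slow Responses: 95th percentile latency above 1 second.
    if percentile(stats.durations_s, 95) > 1.0:
        alerts.append("Slow Responses")
    # Failed Logins: more than 100 failures per minute.
    if stats.window_minutes > 0 and stats.failed_logins / stats.window_minutes > 100:
        alerts.append("Failed Logins")
    # Database Issues: the health check reported a failure.
    if not stats.db_healthy:
        alerts.append("Database Issues")
    return alerts


if __name__ == "__main__":
    stats = WindowStats(1000, 60, [0.1] * 90 + [2.0] * 10, 600, 5.0, True)
    print(firing_alerts(stats))
```

Expressing each threshold as a ratio or a per-minute rate, rather than a raw count, keeps the checks independent of the window length.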
Alert Configuration¶
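# Prometheus alerting rules (evaluated by Prometheus, routed by Alertmanager)
groups:
  - name: usso
    rules:
      - alert: HighErrorRate
        # Fraction of requests that returned 5xx over the last 5 minutes
        expr: |
          sum(rate(usso_http_requests_total{status=~"5.."}[5m]))
            / sum(rate(usso_http_requests_total[5m])) > 0.05
        annotations:
          summary: "High error rate detected"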
Dashboards¶
Grafana Dashboard¶
Key panels:
- Request rate
- Error rate
- Response time (p50, p95, p99)
- Active sessions
- Database connections