Domain 2: Observability

Metrics, Logs, Traces, and Dashboards

Observability Bot | Observability | Max 30 Points

Score   Level
0-6     Ad-hoc
7-12    Foundational
13-18   Standardized
19-24   Advanced
25-30   Optimized

Scoring Criteria by Level

Level   Criteria
1       Minimal logging; no centralized metrics; debugging via SSH
2       Basic metrics/logs; some dashboards; siloed per team
3       Centralized observability stack; standard dashboards; basic tracing
4       Full pillars (metrics, logs, traces); correlation; self-service
5       Exemplars, continuous profiling; AI-assisted analysis

Assessment Questions

#   Question                                          Max
1   How comprehensive is your metrics coverage?       6
2   How mature is your logging infrastructure?        6
3   How well do you implement distributed tracing?    6
4   How effective are your dashboards?                6
5   Can you correlate across signals?                 6
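
The 30-point maximum follows from five questions worth up to 6 points each. As a minimal sketch of how a total could be mapped onto the score bands listed at the top of this domain, assuming each question is scored as a whole number from 0 to 6 and the total is a plain sum (the function and constant names are illustrative, not part of the assessment):

```python
from typing import Sequence, Tuple

# Score bands for this domain: (lowest total, highest total, level name).
BANDS = [
    (0, 6, "Ad-hoc"),
    (7, 12, "Foundational"),
    (13, 18, "Standardized"),
    (19, 24, "Advanced"),
    (25, 30, "Optimized"),
]

MAX_PER_QUESTION = 6  # from the "Max" column of the question table
NUM_QUESTIONS = 5


def maturity_level(scores: Sequence[int]) -> Tuple[int, str]:
    """Return (total, level name) for five per-question scores of 0 to 6.

    Assumption: the total is an unweighted sum of the question scores.
    """
    if len(scores) != NUM_QUESTIONS:
        raise ValueError(f"expected {NUM_QUESTIONS} scores, got {len(scores)}")
    for score in scores:
        if not 0 <= score <= MAX_PER_QUESTION:
            raise ValueError(f"score {score} is outside 0..{MAX_PER_QUESTION}")
    total = sum(scores)
    for low, high, name in BANDS:
        if low <= total <= high:
            return total, name
    raise AssertionError("unreachable: the bands cover every total from 0 to 30")
```

For example, scores of 3, 2, 4, 3, and 2 sum to 14, which falls in the 13-18 band (Standardized).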

Focus Areas

  • Metrics: RED/USE methods, cardinality management
  • Logs: Structured logging, centralized aggregation
  • Traces: Distributed tracing, context propagation
  • Dashboards: Service-oriented, actionable visualizations
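
To make the Metrics focus area concrete, here is a small sketch of the RED method (Rate, Errors, Duration) computed from raw request records and grouped by a single low-cardinality label. The Request type, its field names, and the choice to count HTTP 5xx responses as errors are assumptions for illustration; a real metrics platform would compute these from its own counters and histograms.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List


@dataclass
class Request:
    service: str        # low-cardinality label (a service name, not a user ID)
    status: int         # HTTP status code
    duration_ms: float  # time taken to serve the request


def red_summary(requests: List[Request], window_s: float) -> Dict[str, Dict[str, float]]:
    """Compute per-service Rate, Errors, and Duration over one time window."""
    by_service: Dict[str, List[Request]] = {}
    for req in requests:
        by_service.setdefault(req.service, []).append(req)

    summary: Dict[str, Dict[str, float]] = {}
    for service, reqs in by_service.items():
        durations = [r.duration_ms for r in reqs]
        # Assumption: only server-side failures (5xx) count as errors.
        errors = sum(1 for r in reqs if r.status >= 500)
        summary[service] = {
            "rate_per_s": len(reqs) / window_s,   # Rate
            "error_ratio": errors / len(reqs),    # Errors
            "duration_median_ms": median(durations),  # Duration
            "duration_max_ms": max(durations),
        }
    return summary


sample = [
    Request("checkout", 200, 100.0),
    Request("checkout", 500, 300.0),
    Request("checkout", 200, 200.0),
    Request("search", 200, 50.0),
]
report = red_summary(sample, window_s=2.0)
```

Grouping by service keeps the number of series bounded by the number of services. Grouping by an unbounded value such as a user or request ID is the usual way cardinality gets out of hand.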

Anti-Patterns (Red Flags)

  • Debugging production via SSH
  • Metrics without context (no labels/tags)
  • Logs without structured fields
  • Dashboard sprawl with no ownership
  • Observability as an afterthought
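
As a contrast to the "logs without structured fields" red flag, the following sketch uses Python's standard logging module to emit one JSON object per log line. The field names service and trace_id and the logger name checkout are hypothetical; the point is only that context travels as named fields an aggregator can index, rather than being buried in the message text.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    # Hypothetical context fields; use whatever your log backend indexes.
    CONTEXT_FIELDS = ("service", "trace_id")

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed via `extra=` become attributes on the record.
        for field in self.CONTEXT_FIELDS:
            value = getattr(record, field, None)
            if value is not None:
                payload[field] = value
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("checkout")
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger


buffer = io.StringIO()  # stands in for stdout or a log shipper
log = build_logger(buffer)
log.info("payment declined", extra={"service": "checkout", "trace_id": "abc123"})
```

Carrying a trace ID in every log line is also one simple way to support correlation between logs and traces.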

Evidence Checklist

  • Centralized metrics platform (Prometheus, Datadog, etc.)
  • Log aggregation with search capability
  • Tracing enabled for critical paths
  • Service-level dashboards exist
  • Runbooks link to relevant dashboards
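
When checking the "tracing enabled for critical paths" item, one thing worth verifying is that trace context survives each service hop. The sketch below assumes the W3C Trace Context traceparent header as the propagation format, which this checklist does not mandate, and uses only the standard library. It accepts version 00 headers only, expects header names to be lowercased already, and the helper names are illustrative.

```python
import re
import secrets
from typing import Dict, Optional, Tuple

# traceparent layout: version-traceid-parentid-flags, all lowercase hex.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent() -> str:
    """Start a new trace with a random 16-byte trace ID and 8-byte span ID."""
    # Flags "01" marks the trace as sampled.
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def parse_traceparent(header: str) -> Optional[Tuple[str, str, str]]:
    """Return (trace_id, parent_span_id, flags), or None if malformed."""
    match = TRACEPARENT_RE.match(header)
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero IDs are invalid in the W3C format.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return trace_id, parent_id, flags


def outgoing_headers(incoming: Dict[str, str]) -> Dict[str, str]:
    """Build headers for a downstream call, keeping the caller's trace ID."""
    parsed = parse_traceparent(incoming.get("traceparent", ""))
    if parsed is None:
        # No usable context arrived, so this service starts a new trace.
        return {"traceparent": new_traceparent()}
    trace_id, _parent_id, flags = parsed
    # Same trace ID, fresh span ID for this hop.
    return {"traceparent": f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"}
```

A request arriving with a valid traceparent leaves with the same trace ID and a new span ID, so spans from every service on the path can be stitched into one trace.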

Related Domains

Domain      Relationship
SLOs        SLIs derive from observability data
Alerting    Alerts query observability backend
Incidents   Dashboards critical for diagnosis

Observe, Don't Guess

Data-driven debugging at scale.