Measuring and Improving SRE Capabilities
Strategic Roadmap | Technical Operations Excellence

Assessment domains (15 domains, up to 30 points each, for a maximum total of 450):

| # | Domain | Max Points |
|---|---|---|
| 1 | SLOs & Error Budgets | 30 |
| 2 | Observability | 30 |
| 3 | Alerting Strategy | 30 |
| 4 | Incident Response | 30 |
| 5 | On-Call Health | 30 |
| 6 | Reliability Patterns | 30 |
| 7 | Capacity & Performance | 30 |
| 8 | Release Engineering | 30 |
| 9 | Toil & Automation | 30 |
| 10 | Culture & Organization | 30 |
| 11 | Chaos Engineering | 30 |
| 12 | Disaster Recovery | 30 |
| 13 | Security Reliability | 30 |
| 14 | Documentation | 30 |
| 15 | Dependency Management | 30 |

Maturity levels, based on the total score across all 15 domains (0-450):

| Level | Name | Total Score |
|---|---|---|
| 1 | Ad-hoc | 0-90 |
| 2 | Foundational | 91-180 |
| 3 | Standardized | 181-270 |
| 4 | Advanced | 271-360 |
| 5 | Optimized | 361-450 |
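
The level boundaries follow from the domain table. Fifteen domains at up to 30 points each give a 450-point maximum, which is split into five equal 90-point bands. The Python sketch below shows that arithmetic. It assumes integer scores and an unweighted sum. `total_score` and `maturity_level` are hypothetical helper names, not part of any published assessment tool.

```python
# Sketch of the SRE maturity scoring arithmetic (hypothetical helpers).
# Assumptions: each domain gets an integer score from 0 to 30 and the
# maturity level is derived from the unweighted sum of all 15 domains.

DOMAINS = [
    "SLOs & Error Budgets", "Observability", "Alerting Strategy",
    "Incident Response", "On-Call Health", "Reliability Patterns",
    "Capacity & Performance", "Release Engineering", "Toil & Automation",
    "Culture & Organization", "Chaos Engineering", "Disaster Recovery",
    "Security Reliability", "Documentation", "Dependency Management",
]
MAX_PER_DOMAIN = 30

# (inclusive upper bound of the total score, level number, level name)
LEVELS = [
    (90, 1, "Ad-hoc"),
    (180, 2, "Foundational"),
    (270, 3, "Standardized"),
    (360, 4, "Advanced"),
    (450, 5, "Optimized"),
]


def total_score(scores: dict[str, int]) -> int:
    """Sum per-domain scores after checking each domain is present and in range."""
    missing = set(DOMAINS) - set(scores)
    if missing:
        raise ValueError(f"missing domain scores: {sorted(missing)}")
    for domain in DOMAINS:
        value = scores[domain]
        if not 0 <= value <= MAX_PER_DOMAIN:
            raise ValueError(f"{domain}: score {value} outside 0-{MAX_PER_DOMAIN}")
    return sum(scores[domain] for domain in DOMAINS)


def maturity_level(total: int) -> tuple[int, str]:
    """Map a total score (0-450) to its maturity level number and name."""
    for upper, level, name in LEVELS:
        if 0 <= total <= upper:
            return level, name
    raise ValueError(f"total score {total} outside 0-450")


# Example: every domain scored 20 gives 15 * 20 = 300, i.e. level 4.
example = {domain: 20 for domain in DOMAINS}
print(total_score(example), maturity_level(total_score(example)))
# prints: 300 (4, 'Advanced')
```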

Per-domain scoring criteria (each domain is scored 0-30):

| Domain Score | Criteria |
|---|---|
| 0-6 | No formal practice |
| 7-12 | Basic/reactive approach |
| 13-18 | Documented processes |
| 19-24 | Proactive, measured |
| 25-30 | Optimized, automated |
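
The per-domain band upper bounds (6, 12, 18, 24, 30) are the overall level upper bounds divided by 15. A team whose domains all fall in one band therefore lands in the matching maturity level overall. For example, every domain at 12 totals 180, the top of Foundational. The minimal lookup below assumes integer scores and inclusive band bounds. `domain_criteria` is a hypothetical name.

```python
# Sketch: map a single domain score (0-30) to its criteria band.
# Assumptions: scores are integers and band bounds are inclusive.

# (inclusive upper bound of the domain score, criteria description)
CRITERIA_BANDS = [
    (6, "No formal practice"),
    (12, "Basic/reactive approach"),
    (18, "Documented processes"),
    (24, "Proactive, measured"),
    (30, "Optimized, automated"),
]


def domain_criteria(score: int) -> str:
    """Return the criteria description for a domain score between 0 and 30."""
    if not 0 <= score <= 30:
        raise ValueError(f"domain score {score} outside 0-30")
    for upper, description in CRITERIA_BANDS:
        if score <= upper:
            return description
    # Unreachable while the last band's upper bound is 30.
    raise AssertionError("criteria bands do not cover 0-30")


print(domain_criteria(13))  # prints: Documented processes
```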

Assessment cadence:

| Activity | Frequency |
|---|---|
| Full assessment | Quarterly |
| Progress review | Monthly |
| Action item tracking | Weekly |
| Stakeholder report | Quarterly |
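
Monthly progress reviews are easier when each quarterly assessment is stored in a comparable form. One possible approach is to keep each assessment as a mapping from domain name to score. You can then compute per-domain deltas and surface regressions for the weekly action-item list. The function names and data layout here are illustrative assumptions, not part of the assessment kit.

```python
# Sketch of a progress comparison between two assessments.
# Assumption: each assessment is a dict mapping domain name -> score (0-30).
# compare_assessments and regressions are hypothetical helper names.


def compare_assessments(previous: dict[str, int], current: dict[str, int]) -> dict[str, int]:
    """Return per-domain score changes for domains present in both assessments."""
    return {
        domain: current[domain] - previous[domain]
        for domain in previous
        if domain in current
    }


def regressions(deltas: dict[str, int]) -> list[str]:
    """List domains whose score dropped, largest drop first (ties by name)."""
    dropped = [(delta, domain) for domain, delta in deltas.items() if delta < 0]
    return [domain for _, domain in sorted(dropped)]


q1 = {"Observability": 12, "On-Call Health": 18, "Chaos Engineering": 5}
q2 = {"Observability": 15, "On-Call Health": 14, "Chaos Engineering": 3}
print(regressions(compare_assessments(q1, q2)))
# prints: ['On-Call Health', 'Chaos Engineering']
```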

Typical issues by domain:

| Domain | Typical Issue |
|---|---|
| SLOs | No error budgets enforced |
| Alerting | High noise, low signal |
| On-Call | Alert fatigue, burnout |
| Chaos | No regular practice |
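
Enforcing an error budget starts with computing how much of it remains in a form that can gate decisions. The sketch below assumes a request-based SLO, meaning a target success ratio over a fixed window. It also assumes a hypothetical policy of blocking risky releases once the budget is spent. The class, the function names, and the gating rule are illustrative, not a prescribed policy. Because the arithmetic uses floats, results exactly at the budget boundary can be off by a rounding error.

```python
# Sketch of an error-budget check for a request-based SLO.
# Assumptions: the SLO is a target success ratio over a fixed window, and
# "enforcement" means gating risky releases once the budget is exhausted.
# SloWindow, error_budget_remaining, and release_allowed are hypothetical.

from dataclasses import dataclass


@dataclass
class SloWindow:
    target: float         # e.g. 0.999 for a 99.9% success SLO
    total_requests: int   # requests observed in the window
    failed_requests: int  # requests that violated the SLO


def error_budget_remaining(w: SloWindow) -> float:
    """Fraction of the window's error budget still unspent (negative if overspent)."""
    if not 0 < w.target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    if w.total_requests == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_failures = (1 - w.target) * w.total_requests
    return 1 - w.failed_requests / allowed_failures


def release_allowed(w: SloWindow) -> bool:
    """Hypothetical gate: allow risky releases only while budget remains."""
    return error_budget_remaining(w) > 0


window = SloWindow(target=0.999, total_requests=1_000_000, failed_requests=250)
print(round(error_budget_remaining(window), 3), release_allowed(window))
# prints: 0.75 True
```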