Sustainable Rotations, Escalation Paths & Alert Quality
People & Culture | Technical Operations Excellence
| Metric | Target | Warning threshold |
|---|---|---|
| Pages/shift | <2 | >5 |
| Interrupt ratio | <25% | >50% |
| Night pages | 0 | >1 |
| False positive rate | <10% | >30% |

High alert volume = burnout risk. Fix alerts, not engineers.

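
The thresholds in the table above can be checked mechanically at the end of each shift. The following is a minimal sketch, not a definitive implementation: the metric keys, the `Threshold` class, and the `"watch"` label for values that fall between target and warning are all assumptions made for illustration, and ratios are expressed as fractions (0.25 for 25%).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    target: float                   # upper bound of the healthy range
    warning: float                  # values strictly above this are a warning
    target_inclusive: bool = False  # True when the target is an exact value such as 0


# Values copied from the metrics table; the key names are hypothetical.
THRESHOLDS = {
    "pages_per_shift": Threshold(target=2, warning=5),
    "interrupt_ratio": Threshold(target=0.25, warning=0.50),
    "night_pages": Threshold(target=0, warning=1, target_inclusive=True),
    "false_positive_rate": Threshold(target=0.10, warning=0.30),
}


def classify(metric: str, value: float) -> str:
    """Return 'on_target', 'watch', or 'warning' for one measured value."""
    t = THRESHOLDS[metric]
    if value > t.warning:
        return "warning"
    on_target = value <= t.target if t.target_inclusive else value < t.target
    # 'watch' is an assumed label for the band between target and warning.
    return "on_target" if on_target else "watch"


if __name__ == "__main__":
    shift = {"pages_per_shift": 6, "interrupt_ratio": 0.30,
             "night_pages": 0, "false_positive_rate": 0.05}
    for name, value in shift.items():
        print(f"{name}: {classify(name, value)}")
```

A shift that lands in `"warning"` on any metric is a prompt to review the alerts behind it, in line with the rule above.
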
| Parameter | Recommendation |
|---|---|
| Team size | 6-8 engineers minimum |
| Shift length | Max 3-4 consecutive days |
| Handoff | Overlapping 30-min window |
| Shadow period | 2 weeks for new members |
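
The rotation parameters above translate fairly directly into a schedule generator. The sketch below assumes a simple round-robin order and implements the 30-minute handoff by keeping the outgoing engineer on call for 30 minutes past the nominal end of the shift. The function name `build_rotation` and its arguments are hypothetical, and the shadow period for new members is not modeled.

```python
from datetime import datetime, timedelta

MIN_TEAM_SIZE = 6                        # "6-8 engineers minimum"
MAX_SHIFT_DAYS = 4                       # "Max 3-4 consecutive days"
HANDOFF_OVERLAP = timedelta(minutes=30)  # "Overlapping 30-min window"


def build_rotation(engineers, start, shift_days=3, shifts=None):
    """Return a list of (engineer, shift_start, shift_end) tuples.

    Each shift ends HANDOFF_OVERLAP after the next one starts, so the
    outgoing and incoming engineers are both on call during handoff.
    """
    if len(engineers) < MIN_TEAM_SIZE:
        raise ValueError(f"need at least {MIN_TEAM_SIZE} engineers")
    if not 1 <= shift_days <= MAX_SHIFT_DAYS:
        raise ValueError(f"shift_days must be between 1 and {MAX_SHIFT_DAYS}")

    count = len(engineers) if shifts is None else shifts
    schedule = []
    for i in range(count):
        shift_start = start + timedelta(days=i * shift_days)
        shift_end = shift_start + timedelta(days=shift_days) + HANDOFF_OVERLAP
        schedule.append((engineers[i % len(engineers)], shift_start, shift_end))
    return schedule


if __name__ == "__main__":
    team = ["ana", "ben", "chen", "dara", "eli", "fay"]  # placeholder names
    for who, begin, end in build_rotation(team, datetime(2024, 1, 1, 9, 0)):
        print(f"{who}: {begin:%a %d %b %H:%M} to {end:%a %d %b %H:%M}")
```

With six engineers and three-day shifts, each person is on call once every 18 days. A larger team widens that gap.
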

1. First responder, initial triage, known fixes
2. Domain expert, complex issues, escalation
3. SEV1 coordination, customer comms, exec updates

| Gate | Requirement |
|---|---|
| Actionable | Clear remediation steps |
| Urgent | Needs human intervention now |
| Documented | Runbook link in alert |
| Tuned | <10% false positive rate |

If it doesn't need to page someone, make it a ticket. If it's noise, delete it.

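
The four gates and the routing rule above can be expressed as a small check run against each alert definition. This is a sketch under stated assumptions: the `Alert` fields are hypothetical, and "noise" is interpreted here as an alert with no action a human can take. An alert that is actionable but fails any other gate becomes a ticket.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_FALSE_POSITIVE_RATE = 0.10  # "Tuned" gate: <10% false positive rate


@dataclass
class Alert:
    name: str
    actionable: bool               # clear remediation steps exist
    urgent: bool                   # needs human intervention now
    runbook_url: Optional[str]     # runbook link carried in the alert
    false_positive_rate: float     # fraction, e.g. 0.05 for 5%


def failed_gates(alert: Alert) -> List[str]:
    """Return the names of the quality gates this alert does not pass."""
    checks = {
        "actionable": alert.actionable,
        "urgent": alert.urgent,
        "documented": bool(alert.runbook_url),
        "tuned": alert.false_positive_rate < MAX_FALSE_POSITIVE_RATE,
    }
    return [gate for gate, passed in checks.items() if not passed]


def route(alert: Alert) -> str:
    """Return 'page', 'ticket', or 'delete' for an alert."""
    failed = failed_gates(alert)
    if not failed:
        return "page"
    # Assumption: an alert nobody can act on is treated as noise.
    if "actionable" in failed:
        return "delete"
    return "ticket"


if __name__ == "__main__":
    disk = Alert("disk_full", True, True,
                 "https://example.com/runbooks/disk-full", 0.02)
    cert = Alert("cert_expiring_30d", True, False,
                 "https://example.com/runbooks/certs", 0.01)
    blip = Alert("cpu_blip", False, False, None, 0.60)
    for a in (disk, cert, blip):
        print(a.name, route(a), failed_gates(a))
```

Returning the failed gates alongside the route shows what to fix before a ticketed alert is allowed to page.
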
| Sign | Intervention |
|---|---|
| Dreading shifts | Review alert load |
| Constant fatigue | Extend rotation gaps |
| Cynicism | Pair with supportive peer |
| Avoidance | Temporary rotation break |

Sustainable On-Call
Great on-call is boring on-call. Fix the system, not the people.