On-Call Excellence

Sustainable Rotations, Escalation Paths & Alert Quality

People & Culture | Technical Operations Excellence

Pages/Shift: <2
Rotation Size: 6-8
Interrupt Ratio: <25%
Coverage: 24/7

Healthy On-Call Metrics

Metric            Target   Warning
Pages/shift       <2       >5
Interrupt ratio   <25%     >50%
Night pages       0        >1
False positives   <10%     >30%

High alert volume = burnout risk. Fix alerts, not engineers.
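A minimal sketch of how the targets and warning levels above could be checked in code. The metric keys, the `Threshold` type, and the middle "watch" band (between target and warning) are assumptions for illustration; the source only defines the target and warning columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    target: float   # healthy when the value is under this
    warning: float  # warning when the value is over this


# Values copied from the Healthy On-Call Metrics table; percentages as fractions.
THRESHOLDS = {
    "pages_per_shift": Threshold(target=2, warning=5),
    "interrupt_ratio": Threshold(target=0.25, warning=0.50),
    "night_pages": Threshold(target=0, warning=1),
    "false_positive_rate": Threshold(target=0.10, warning=0.30),
}


def classify(metric: str, value: float) -> str:
    """Return 'healthy', 'watch' (assumed middle band), or 'warning'."""
    t = THRESHOLDS[metric]
    if value > t.warning:
        return "warning"
    # Night pages has an exact target of 0; the other targets are strict "<".
    healthy = value <= t.target if metric == "night_pages" else value < t.target
    return "healthy" if healthy else "watch"
```

For example, `classify("pages_per_shift", 6)` falls in the warning band, which per the note above points at the alerts rather than the engineer.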

Rotation Design

Parameter       Recommendation
Team size       6-8 engineers minimum
Shift length    Max 3-4 consecutive days
Handoff         Overlapping 30-min window
Shadow period   2 weeks for new members
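The parameters above can be sketched as a simple round-robin schedule builder. This is an illustration under assumptions: the function name, the dictionary keys, and the choice to reject teams under six people are hypothetical, and the shadow period is not modeled.

```python
from datetime import datetime, timedelta
from itertools import cycle, islice


def build_rotation(engineers, first_start, num_shifts,
                   shift_days=4, overlap_minutes=30):
    """Round-robin shifts with an overlapping handoff window before each start."""
    if len(engineers) < 6:
        raise ValueError("rotation needs at least 6 engineers")
    if not 1 <= shift_days <= 4:
        raise ValueError("shifts should not exceed 4 consecutive days")

    overlap = timedelta(minutes=overlap_minutes)
    schedule = []
    start = first_start
    for engineer in islice(cycle(engineers), num_shifts):
        end = start + timedelta(days=shift_days)
        schedule.append({
            "engineer": engineer,
            # Incoming engineer joins early so both people are present for handoff.
            "handoff_from": start - overlap,
            "start": start,
            "end": end,
        })
        start = end
    return schedule
```

With six engineers and 4-day shifts, each person is on call once every 24 days; growing the team toward eight widens that gap further.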

Escalation Tiers

L1: Primary On-Call

First responder, initial triage, known fixes

L2: Secondary/SME

Domain expert, complex issues, escalation

L3: Management

SEV1 coordination, customer comms, exec updates
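One way to express the tiers above as a routing rule, as a rough sketch. The function name and the `known_fix` flag are hypothetical, and the exact triggers (L2 when there is no known fix, L3 only for SEV1) are assumptions read from the tier descriptions, not a stated policy.

```python
def tiers_to_engage(severity: str, known_fix: bool) -> list[str]:
    """Return the escalation tiers to involve, in order."""
    tiers = ["L1"]  # primary on-call always triages first
    if not known_fix:
        tiers.append("L2")  # no known fix: bring in the domain expert
    if severity.upper() == "SEV1":
        tiers.append("L3")  # SEV1: management handles coordination and comms
    return tiers
```

For instance, `tiers_to_engage("SEV1", known_fix=False)` involves all three tiers, while a low-severity incident with a known fix stays with L1.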

Compensation & Fairness

  • Comp time: Time off after heavy shifts
  • Pay differential: Extra pay for on-call hours
  • Equitable rotation: Fair holiday distribution
  • Opt-out option: Accommodations for burnout

Alert Quality Gates

Gate         Requirement
Actionable   Clear remediation steps
Urgent       Needs human intervention now
Documented   Runbook link in alert
Tuned        <10% false positive rate

If it doesn't need to page someone, make it a ticket. If it's noise, delete it.
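The four gates and the page / ticket / delete rule can be sketched as a small check. The `Alert` fields and function names are hypothetical, and treating a false positive rate above 30% as "noise" is an assumption borrowed from the warning level in the metrics table.

```python
from dataclasses import dataclass

MAX_FP_RATE = 0.10    # "Tuned" gate: under 10% false positives
NOISE_FP_RATE = 0.30  # assumption: above this, treat the alert as noise


@dataclass
class Alert:
    name: str
    remediation_steps: str
    needs_human_now: bool
    runbook_url: str
    false_positive_rate: float


def failed_gates(alert: Alert) -> list[str]:
    """List the quality gates this alert does not pass."""
    failures = []
    if not alert.remediation_steps.strip():
        failures.append("actionable")
    if not alert.needs_human_now:
        failures.append("urgent")
    if not alert.runbook_url.strip():
        failures.append("documented")
    if alert.false_positive_rate >= MAX_FP_RATE:
        failures.append("tuned")
    return failures


def disposition(alert: Alert) -> str:
    """Page only if every gate passes; otherwise ticket it, or delete noise."""
    if alert.false_positive_rate > NOISE_FP_RATE:
        return "delete"
    return "ticket" if failed_gates(alert) else "page"
```

Running `failed_gates` over every paging alert gives a concrete worklist: each listed gate is something to fix before the alert is allowed to wake anyone up.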

Handoff Checklist

  • Active incidents briefed
  • Recent deployments noted
  • Pending changes flagged
  • Known issues documented
  • Contact info verified

Burnout Prevention

Sign               Intervention
Dreading shifts    Review alert load
Constant fatigue   Extend rotation gaps
Cynicism           Pair with supportive peer
Avoidance          Temporary rotation break

Continuous Improvement

  • Weekly: Review noisy alerts, tune or delete
  • Monthly: On-call retrospective
  • Quarterly: Rotation structure review

Sustainable On-Call

Great on-call is boring on-call. Fix the system, not the people.