DORA Metrics, Progressive Delivery & Safe Changes
Capacity & Release | Technical Operations Excellence
| Metric | Elite Target |
|---|---|
| Deploy Frequency | On-demand (multiple/day) |
| Lead Time | <1 hour commit to prod |
| Change Fail Rate | <5% of deploys cause issues |
| Time to Restore | <1 hour to recover |
Based on DORA research: elite performers deploy 182x more frequently than low performers
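
As a rough illustration of how the four metrics in the table could be computed from deployment records, here is a minimal sketch. The `Deploy` record, its field names, and the reading of "multiple/day" as more than one deploy per day are assumptions made for this example, not part of any DORA tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deploy:
    # Hypothetical record shape; real data would come from CI/CD and incident tools.
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None


def dora_summary(deploys, window_days):
    """Summarize the four DORA metrics over a reporting window."""
    if not deploys or window_days <= 0:
        raise ValueError("need at least one deploy and a positive window")
    lead_s = [(d.deployed_at - d.committed_at).total_seconds() for d in deploys]
    failures = [d for d in deploys if d.failed]
    restore_s = [
        (d.restored_at - d.deployed_at).total_seconds()
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time_min": median(lead_s) / 60,
        "change_fail_rate": len(failures) / len(deploys),
        # None when no failed deploy was restored inside the window.
        "median_restore_min": median(restore_s) / 60 if restore_s else None,
    }


def meets_elite_targets(summary):
    """Compare a summary against the elite targets in the table above."""
    restore = summary["median_restore_min"]
    return (
        summary["deploys_per_day"] > 1  # assumed reading of "multiple/day"
        and summary["median_lead_time_min"] < 60
        and summary["change_fail_rate"] < 0.05
        and (restore is None or restore < 60)
    )
```

Medians are used here rather than means so that a single slow change does not dominate the figure; that is a design choice of this sketch.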
Canary: route 1-5% of traffic to the new version, monitor, expand gradually
Blue-green: two identical environments, instant switchover, easy rollback
Feature flags: decouple deploy from release, targeted rollouts
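
Percentage-based exposure, whether for canary traffic or a feature flag, is often implemented with deterministic hash bucketing. The sketch below is one common way to do it, not a specific product's API; the function names and the `flag` salt value are hypothetical.

```python
import hashlib


def bucket(user_id: str, flag: str = "new-checkout") -> int:
    """Map a user to a stable bucket in 0..9999 for a given flag.

    Salting with the flag name keeps different rollouts from always
    exposing the same users first.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def in_rollout(user_id: str, percent: float, flag: str = "new-checkout") -> bool:
    """True if the user falls inside the first `percent` of buckets."""
    return bucket(user_id, flag) < percent * 100


# Expanding gradually means raising `percent`; users already exposed
# at 5% stay exposed at 25% because their bucket does not change.
stages = [1, 5, 25, 50, 100]
```

Because the bucket depends only on the user and flag, a user sees a consistent version across requests, and dropping `percent` to 0 acts as an immediate release rollback without a redeploy.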
40%+ of incidents stem from config/deployment errors
Commit → CI/CD → Canary (1-5%) → Rollout → Full Deploy
| Stage | Gate |
|---|---|
| Build | Tests pass, security scan |
| Canary | Error budget not exceeded |
| Rollout | Metrics within thresholds |
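
The canary and rollout gates above could be automated along these lines. This is a minimal sketch under stated assumptions: the `WindowStats` shape, the 99.9% SLO target, the 2x relative-increase limit, and the 500-request minimum sample are illustrative values, not recommendations from the source.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    # Hypothetical per-window counters scraped from monitoring.
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def budget_exceeded(stats: WindowStats, slo_target: float) -> bool:
    """True when errors exceed what the SLO allows for this window."""
    allowed_errors = (1.0 - slo_target) * stats.requests
    return stats.errors > allowed_errors


def canary_gate(
    canary: WindowStats,
    baseline: WindowStats,
    slo_target: float = 0.999,
    max_relative_increase: float = 2.0,
    min_requests: int = 500,
) -> str:
    """Return 'wait', 'rollback', or 'promote' for the canary stage."""
    if canary.requests < min_requests:
        return "wait"  # too little traffic to judge either way
    if budget_exceeded(canary, slo_target):
        return "rollback"  # gate: error budget not exceeded
    if (
        baseline.error_rate > 0
        and canary.error_rate > baseline.error_rate * max_relative_increase
    ):
        return "rollback"  # gate: metrics within thresholds
    return "promote"
```

A real gate would usually also compare latency and saturation; error rate alone is used here to keep the example short.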
Non-Abstract Large System Design - 4 essential questions:
| Question | Focus |
|---|---|
| Is it possible? | Can we build it at all? |
| Can we do better? | Optimize design choices |
| Is it feasible? | Cost, time, resources |
| Is it resilient? | Graceful degradation |
| Component | Approach |
|---|---|
| Demand Forecast | Historical trends + growth models |
| Headroom | N+1 minimum, N+2 for critical |
| Load Testing | Regular stress tests at 2x expected load |
| Auto-scaling | HPA/VPA with explicit min/max limits |
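
The forecast, headroom, and load-test rows can be tied together with some simple arithmetic. The sketch below assumes a compound monthly growth model and a known per-instance capacity; both are assumptions for illustration, and the function names are hypothetical.

```python
import math


def forecast_peak(current_peak_rps: float, monthly_growth: float, months: int) -> float:
    """Project peak load with compound growth (assumed growth model)."""
    return current_peak_rps * (1 + monthly_growth) ** months


def required_instances(peak_rps: float, per_instance_rps: float, spare: int = 1) -> int:
    """Instances needed to serve the peak, plus N+spare headroom.

    spare=1 gives N+1; use spare=2 (N+2) for critical services.
    """
    if per_instance_rps <= 0:
        raise ValueError("per_instance_rps must be positive")
    base = math.ceil(peak_rps / per_instance_rps)
    return base + spare


def load_test_target(expected_rps: float, factor: float = 2.0) -> float:
    """Load level for stress tests, defaulting to 2x expected."""
    return expected_rps * factor
```

For example, a 4,500 rps peak served by instances that each handle 500 rps needs 9 instances, so N+1 is 10 and N+2 is 11. The resulting count can also inform the minimum and maximum replica bounds given to an autoscaler.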
| Tier | Examples | Process |
|---|---|---|
| Low | Config, docs | Auto-deploy |
| Medium | App code | Canary + review |
| High | Infra, DB schema | Change board |
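
One way to apply the tiers automatically is to classify a change by the files it touches and take the highest tier found. The path patterns below are hypothetical examples of a repository layout, and falling back to the High tier for unmatched paths is a conservative assumption of this sketch.

```python
from fnmatch import fnmatchcase

# Ordered from most to least risky; the first matching tier wins per file.
TIER_PATTERNS = [
    ("high", ["infra/*", "migrations/*", "*.tf"]),
    ("medium", ["src/*", "app/*"]),
    ("low", ["docs/*", "config/*", "*.md"]),
]

PROCESS = {
    "low": "auto-deploy",
    "medium": "canary + review",
    "high": "change board",
}

RANK = {"low": 0, "medium": 1, "high": 2}


def file_tier(path: str) -> str:
    """Tier for a single file; unknown paths default to 'high'."""
    for tier, patterns in TIER_PATTERNS:
        if any(fnmatchcase(path, p) for p in patterns):
            return tier
    return "high"


def change_tier(paths) -> str:
    """A change takes the highest tier among the files it touches."""
    tiers = [file_tier(p) for p in paths]
    if not tiers:
        return "low"  # an empty change carries no risk
    return max(tiers, key=RANK.__getitem__)


def required_process(paths) -> str:
    return PROCESS[change_tier(paths)]
```

With these patterns, a change touching `docs/readme.md` and `src/app.py` is classified as Medium and routed to canary plus review.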
Ship Fast, Ship Safe
Elite teams deploy frequently with low failure rates.