Domain 6: Reliability Patterns

Circuit Breakers, Retries, Timeouts, Bulkheads

SRE Bot | Resilience | Max 30 Points

Score    Maturity
0-6      Ad-hoc
7-12     Foundational
13-18    Standardized
19-24    Advanced
25-30    Optimized

Scoring Criteria by Level

Level  Criteria
1      No defensive patterns; cascading failures common
2      Basic timeouts in some services; retry logic ad-hoc
3      Circuit breakers for critical paths; standardized timeouts
4      Bulkheads, load shedding; graceful degradation
5      Adaptive patterns; self-healing; antifragile design

Assessment Questions

#  Question                                   Max
1  How do you implement circuit breakers?     6
2  How standardized are timeouts/retries?     6
3  Do you use bulkhead isolation?             6
4  How do you handle graceful degradation?    6
5  How do you prevent cascading failures?     6
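
The five question scores add up to the 30-point total, which maps onto the maturity scale at the top of this page. A minimal sketch of that arithmetic follows. It assumes integer scores from 0 to 6 per question (the source gives only the maximum), and the names `BANDS` and `maturity_band` are hypothetical.

```python
# Score bands copied from the maturity scale: (upper bound, label).
BANDS = [(6, "Ad-hoc"), (12, "Foundational"), (18, "Standardized"),
         (24, "Advanced"), (30, "Optimized")]
MAX_PER_QUESTION = 6   # from the "Max" column
QUESTION_COUNT = 5


def maturity_band(question_scores):
    """Sum the per-question scores and return (total, band label)."""
    if len(question_scores) != QUESTION_COUNT:
        raise ValueError("expected one score per assessment question")
    for score in question_scores:
        # Assumption: 0 is the lowest score a question can receive.
        if not 0 <= score <= MAX_PER_QUESTION:
            raise ValueError(f"score {score} is outside 0-{MAX_PER_QUESTION}")
    total = sum(question_scores)
    for upper, label in BANDS:
        if total <= upper:
            return total, label
    raise AssertionError("unreachable: total cannot exceed 30")
```

For example, scores of 3, 3, 3, 2, 2 total 13, which falls in the Standardized band.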

Focus Areas

  • Circuit Breakers: Fail fast when dependencies are unhealthy
  • Timeouts: Bounded wait times for all calls
  • Retries: Exponential backoff with jitter
  • Bulkheads: Isolate failure domains
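
To make "fail fast" concrete, here is a minimal single-threaded circuit breaker sketch. It is an illustration under stated assumptions, not how any particular library works: the class name, the default threshold of 3 consecutive failures, and the 30-second reset timeout are all hypothetical choices.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    # Assumed defaults; tune per dependency. Not thread-safe as written.
    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0        # consecutive failures seen so far
        self.opened_at = None    # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: fail fast instead of waiting on an unhealthy dependency.
                raise CircuitOpenError("dependency marked unhealthy")
            # Reset timeout elapsed: half-open, let this trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if (self.opened_at is not None
                    or self.failures >= self.failure_threshold):
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Callers wrap each dependency call as `breaker.call(fetch, request)` and treat `CircuitOpenError` as a signal to use a fallback rather than wait.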

Anti-Patterns (Red Flags)

  • No timeouts (infinite waits)
  • Retry storms (no backoff)
  • All-or-nothing failures
  • Cascading failures across services
  • No graceful degradation paths
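
Retry storms usually come from clients retrying immediately and in lockstep. A minimal sketch of the counter-pattern, exponential backoff with full jitter, is shown here. The function names and the defaults (4 attempts, 0.1 s base, 5 s cap) are assumptions chosen for illustration.

```python
import random
import time


def backoff_delays(retries, base=0.1, cap=5.0, rng=random.random):
    """Yield one full-jitter delay per retry: rng() * min(cap, base * 2**n)."""
    for n in range(retries):
        yield rng() * min(cap, base * (2 ** n))


def retry(func, attempts=4, base=0.1, cap=5.0, sleep=time.sleep,
          retry_on=(Exception,)):
    """Call func; on a retryable error, wait a jittered delay and try again.

    Makes at most `attempts` calls in total and re-raises the last error.
    """
    delays = backoff_delays(attempts - 1, base, cap)
    while True:
        try:
            return func()
        except retry_on:
            delay = next(delays, None)
            if delay is None:
                raise  # retry budget exhausted: surface the failure
            sleep(delay)
```

The cap bounds the worst-case wait, and the random factor spreads clients out so they do not all retry at the same instant. In practice `retry_on` should list only errors that are safe to retry.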

Evidence Checklist

  • Circuit breaker library in use (e.g. Hystrix, Resilience4j)
  • Timeout policy documented
  • Retry strategy with backoff implemented
  • Load shedding mechanisms exist
  • Graceful degradation tested
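
One way to produce evidence for the last two items is a bulkhead that sheds excess calls and a test that exercises the degraded path. The sketch below is a minimal illustration: the `Bulkhead` class and its `fallback` argument are hypothetical names, and errors raised by the protected call are deliberately left to propagate.

```python
import threading


class Bulkhead:
    """Cap concurrent calls to one dependency; shed excess calls immediately."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, func, fallback):
        # Non-blocking acquire: if every slot is busy, degrade instead of queueing.
        if not self._slots.acquire(blocking=False):
            return fallback()
        try:
            return func()
        finally:
            self._slots.release()
```

A degradation test can saturate the bulkhead and assert that callers receive the fallback (for example, a cached response) rather than blocking.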

Related Domains

Domain             Relationship
Chaos Engineering  Test patterns via chaos experiments
Dependencies       Patterns protect against dependency failures
Capacity           Load shedding prevents overload

Design for Failure

Assume everything will fail.