Release It! Patterns

Stability Patterns for Production Systems

Stability Patterns: 15
Anti-Patterns: 12
First Edition: 2007
Timeout Default: 5s

Key Stability Patterns

Pattern | Purpose
Circuit Breaker | Stop cascading failures
Bulkhead | Isolate failures to partitions
Timeout | Prevent indefinite waits
Retry | Handle transient failures
Fallback | Graceful degradation
Shed Load | Reject excess traffic
Handshaking | Verify capacity before work
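
As a rough illustration of how Retry and Fallback from the table above can combine, here is a minimal sketch. The function name `call_with_retry`, the attempt count, the backoff delays, and the choice of `ConnectionError` as the "transient" failure are all assumptions made for this example, not values from the book.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    fallback: Callable[[], T],
    max_attempts: int = 3,       # assumed value for illustration
    base_delay: float = 0.1,     # assumed value, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try `operation` up to `max_attempts` times, then degrade to `fallback`."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            # Only ConnectionError is treated as transient here;
            # any other exception propagates to the caller unchanged.
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)  # exponential backoff
    # Every attempt failed: return a degraded answer instead of an error.
    return fallback()
```

A caller might pass a cached or default value as the fallback, for example `call_with_retry(fetch_live_price, lambda: cached_price)`, where both names are hypothetical. Retries multiply load on a struggling dependency, so in practice they are usually paired with a circuit breaker and a small attempt limit.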

Circuit Breaker States

State | Behavior
Closed | Normal operation, count failures
Open | Fast fail, don't call downstream
Half-Open | Test with limited traffic

Example thresholds: open after 5 failures, stay open for 30s, then allow 1 test request while half-open.
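
The three states and the thresholds above can be sketched as a small state machine. This is a single-threaded illustration, not a production implementation: the class and attribute names are hypothetical, it counts consecutive failures (one possible reading of "count failures"), and it takes an injectable clock so the timeout can be exercised without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,      # failures before opening
        reset_timeout: float = 30.0,     # seconds to stay open
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, operation: Callable[[], T]) -> T:
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: fail fast, do not touch the downstream system.
                raise CircuitOpenError("circuit open, failing fast")
            # Timeout elapsed: let this one call through as the test request.
            self.state = "half-open"
        try:
            result = operation()
        except Exception:
            self._record_failure()
            raise
        # Success in closed or half-open state resets the breaker.
        self.state = "closed"
        self.failures = 0
        return result

    def _record_failure(self) -> None:
        self.failures += 1
        # A failed test request re-opens immediately; otherwise open
        # once the consecutive-failure threshold is reached.
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
```

Usage would look like `breaker.call(lambda: client.get_inventory())`, where `client.get_inventory` stands in for any downstream call. A multi-threaded version would need a lock around the state transitions so that only one caller acts as the half-open test request.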

Bulkhead Strategies

  • Thread pool isolation: Separate pools per dependency
  • Semaphore isolation: Limit concurrent requests
  • Process isolation: Separate containers/pods
  • Network isolation: Separate subnets
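
Of the strategies listed, semaphore isolation is the simplest to show in a few lines. The sketch below is one possible shape, with hypothetical names (`Bulkhead`, `BulkheadFullError`); it rejects callers immediately when the partition is full rather than letting them queue behind a slow dependency.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


class BulkheadFullError(RuntimeError):
    """Raised when the partition has no free slot."""


class Bulkhead:
    def __init__(self, max_concurrent: int) -> None:
        # One semaphore per dependency caps its share of concurrent calls.
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, operation: Callable[[], T]) -> T:
        # Non-blocking acquire: a full partition rejects instead of waiting.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError("no free slot for this dependency")
        try:
            return operation()
        finally:
            # Always give the slot back, even if the operation raised.
            self._slots.release()
```

Creating a separate instance per dependency, for example `payments = Bulkhead(10)` and `search = Bulkhead(20)` (illustrative limits), keeps a stall in one dependency from consuming every worker thread.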

More Stability Patterns

Pattern | Use Case
Steady State | Self-cleaning logs/data
Test Harness | Simulate bad behaviors
Decoupling | Async via queues
Fail Fast | Check prereqs early

Stability Anti-Patterns

Anti-Pattern | Risk
Integration Points | Every call is a risk
Chain Reactions | One failure cascades
Cascading Failures | Avalanche effect
Users | Unpredictable traffic
Blocked Threads | Thread pool exhaustion
Unbounded Queues | Memory exhaustion
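
One common countermeasure to the Unbounded Queues risk is to give every queue a hard capacity and reject work once it is full, which is also a simple form of shedding load. The sketch below uses Python's standard `queue.Queue`; the `submit` helper and the capacity of 100 are assumptions for illustration.

```python
import queue

# Assumed capacity for illustration; size it from measured throughput.
work_queue: "queue.Queue[str]" = queue.Queue(maxsize=100)


def submit(q: "queue.Queue[str]", job: str) -> bool:
    """Enqueue `job` if there is room; return False to signal rejection."""
    try:
        q.put_nowait(job)  # never blocks the caller
        return True
    except queue.Full:
        # Queue is at capacity: shed this job instead of growing memory.
        return False
```

A caller that gets `False` back can return an error such as HTTP 503 to its own client, so the pressure is visible upstream instead of accumulating in memory.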

More Anti-Patterns

Anti-Pattern | Risk
Self-Denial | Self-inflicted DDoS from marketing campaigns
Unbalanced Capacity | Bottleneck fails first
Slow Responses | Worse than no response
SLA Inversion | Depend on weaker SLA

Timeout Guidelines

Type | Recommendation
Connect | 1-3 seconds
Read | 5-30 seconds
Total | Max acceptable latency

Always set timeouts explicitly. Never rely on language or library defaults, which often mean waiting forever.
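
To make the connect/read distinction concrete, here is a minimal sketch using Python's standard `socket` module. The function name `fetch_banner` and the two constants are assumptions picked from within the ranges in the table; the sketch does not enforce a total deadline across multiple reads.

```python
import socket

CONNECT_TIMEOUT = 3.0  # seconds, upper end of the connect guideline
READ_TIMEOUT = 5.0     # seconds, lower end of the read guideline


def fetch_banner(
    host: str,
    port: int,
    connect_timeout: float = CONNECT_TIMEOUT,
    read_timeout: float = READ_TIMEOUT,
) -> bytes:
    """Connect and read up to 1024 bytes, with explicit timeouts on both steps."""
    # The timeout passed here bounds the connection attempt.
    with socket.create_connection((host, port), timeout=connect_timeout) as sock:
        # Switch to the read timeout so each recv() is bounded separately.
        sock.settimeout(read_timeout)
        return sock.recv(1024)
```

If the peer accepts the connection but never sends data, `recv` raises `socket.timeout` after `read_timeout` seconds instead of blocking the thread indefinitely.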

Key Quote

Every integration point will eventually fail in some way.

- Michael Nygard, Release It!

Expect Failure

Design for failure; plan for success.