The circuit breaker is a design pattern used extensively in distributed systems to prevent cascading failures. In this post, we’ll go through the problem of cascading failures and how the circuit breaker pattern is used to prevent them.
Motivation: The problem of cascading failures
Before jumping into the circuit breaker pattern, let’s try and understand what problem it tries to solve.
When service A communicates with service B, it allocates a thread to make that call. Two kinds of failures can occur while making that call. We use the example of a user service making a call to a friends service.
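''' user service '''
def get_user_info(user_id: str):
    try:
        # Blocks the calling thread until friends service responds or fails.
        return friends_service.get_friends(user_id)
    except Exception as e:
        # Surface any downstream failure as a server error to our own caller.
        raise InternalServerError from e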
Immediate failures: In an immediate failure, an exception is raised immediately (for example, Connection Refused) and service A’s thread is freed.
Timeout failures: In a timeout failure, service B takes a long time to respond, and service A’s thread stays blocked until the call returns or times out. As new requests arrive at service A, more and more threads end up waiting on service B. If enough requests arrive while earlier calls are still waiting, this can exhaust service A’s thread pool and bring service A down.
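The thread-pool exhaustion described above can be reproduced in a few lines. This is a minimal sketch, not a model of any real service: `call_service_b` and `time_to_serve_fast_request` are hypothetical names, the slow dependency is simulated with `time.sleep`, and the pool sizes and delays are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def call_service_b(delay_s):
    """Stand-in for a call to a slow downstream service (simulated with sleep)."""
    time.sleep(delay_s)
    return "ok"


def time_to_serve_fast_request(pool_size, slow_calls, delay_s):
    """Return how long a trivial request waits while slow calls hold the pool."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # Fill the pool (and its queue) with calls stuck on the slow dependency.
        for _ in range(slow_calls):
            pool.submit(call_service_b, delay_s)
        # A request that needs no downstream call still has to wait for a free thread.
        start = time.monotonic()
        pool.submit(lambda: "healthy").result()
        return time.monotonic() - start


if __name__ == "__main__":
    print(f"idle pool: {time_to_serve_fast_request(2, 0, 0.2):.3f}s")
    print(f"busy pool: {time_to_serve_fast_request(2, 4, 0.2):.3f}s")
```

Even a request that never touches service B is delayed, because every worker thread is tied up waiting on the slow dependency.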
“Your code can’t just wait forever for a response that might never come, sooner or later, it needs to give up. Hope is not a design method.” - Michael T. Nygard, Release It!
Let’s walk through an example of a social media application to understand this better. Here we have an aggregator service, which is what the client interacts with. It aggregates results from a bunch of services, including user service. User service calls photo service and friends service, which in turn calls friends_db.
Here, friends service tries to make requests to friends_db. However, friends_db does not respond with an immediate failure; instead, it keeps the threads from friends service waiting. Friends service retries, thereby using more threads. As it gets new requests, more threads are left waiting for friends_db to respond.
We can now see how friends service becomes the source of timeouts for user service. User service exhausts its thread pool waiting for responses from friends service, just as friends service was waiting for friends_db. A failure in friends_db has caused a cascading failure in services that depend on it only indirectly.
Eventually the aggregator service will also go down for the same reason. The client calls the aggregator service, so our system is effectively shut down for users. One error in one component of our architecture has caused a cascading failure that brings all the other services down.
Circuit Breaker Pattern
A circuit breaker is usually implemented as an interceptor (also described as a chain of responsibility or a filter) that sits between the caller and the upstream service. It has three states:
- **Closed**: All requests are allowed to pass to the upstream service, and the interceptor passes the upstream service’s response back to the caller.
- **Open**: No requests are allowed to pass to the upstream service; the interceptor responds with a default response, usually an error response.
- **Half-Open**: Some requests are allowed to pass to the upstream service; the others are terminated and answered with the default response.
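The three states above can be sketched as a small state machine. This is a minimal, single-threaded sketch and not how any particular library implements the pattern. The names `SimpleCircuitBreaker` and `CircuitOpenError`, the `failure_threshold` and `recovery_timeout` parameters, and the transition rules are assumptions for illustration: open after N consecutive failures, move to half-open once a timeout has elapsed, close again after a successful trial call. For simplicity, the half-open state here lets calls through until one fails, rather than admitting only a fraction of them.

```python
import time


class CircuitOpenError(Exception):
    """Raised instead of calling upstream while the circuit is open."""


class SimpleCircuitBreaker:
    def __init__(self, failure_threshold=3, recovery_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.recovery_timeout = recovery_timeout    # seconds to stay open
        self.clock = clock                          # injectable so it can be tested
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.recovery_timeout:
            return "half-open"
        return "open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            # Fail fast: do not tie up a thread on an upstream we believe is down.
            raise CircuitOpenError("upstream call skipped")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call (half-open) or too many failures (closed) opens the circuit.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each upstream call, for example `breaker.call(friends_service.get_friends, user_id)`, and treat `CircuitOpenError` as the default error response.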
Create a request interceptor
We can add circuit breaking in code by wrapping service calls in a circuit breaker:
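from circuitbreaker import CircuitBreaker
from requests import RequestException

class MyCircuitBreaker(CircuitBreaker):
    FAILURE_THRESHOLD = 20                 # consecutive failures before the circuit opens
    RECOVERY_TIMEOUT = 60                  # seconds to stay open before a trial call
    EXPECTED_EXCEPTION = RequestException  # only these exceptions count as failures

@MyCircuitBreaker()
def get_friends(user_id):
    # Assumes friends_service raises RequestException on failure; it must
    # propagate out of the decorated function for the breaker to count it.
    return friends_service.get_friends(user_id)

def get_user_info(user_id):
    try:
        return get_friends(user_id)
    except Exception as e:
        # Covers upstream failures and CircuitBreakerError (circuit is open).
        raise InternalServerError from e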
We can also leverage the sidecar pattern to do this. In this approach we don’t have to modify our services by wrapping calls in circuit breakers; instead, we ship our applications with a sidecar proxy like Envoy. All outbound traffic from the service is proxied through Envoy, which supports circuit breaking out of the box. The following is an example circuit-breaking configuration for Envoy:
circuit_breakers:
  thresholds:
    - priority: DEFAULT
      max_connections: 1000
      max_requests: 1000
    - priority: HIGH
      max_connections: 2000
      max_requests: 2000
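Note that these thresholds cap concurrency rather than count failures: `max_connections` and `max_requests` limit how many connections and parallel requests Envoy will make to the upstream cluster, and traffic over the limit is rejected right away instead of queueing. The same fail-fast idea can be sketched in Python. This is only an analogy under stated assumptions, not Envoy’s implementation; `RequestLimiter` and `RequestLimitExceeded` are hypothetical names.

```python
import threading
from contextlib import contextmanager


class RequestLimitExceeded(Exception):
    """Raised when the cap on in-flight requests is already reached."""


class RequestLimiter:
    def __init__(self, max_requests):
        self._slots = threading.BoundedSemaphore(max_requests)

    @contextmanager
    def slot(self):
        # Non-blocking acquire: reject immediately instead of waiting behind slow calls.
        if not self._slots.acquire(blocking=False):
            raise RequestLimitExceeded("too many in-flight requests")
        try:
            yield
        finally:
            self._slots.release()
```

Wrapping each outbound call in `with limiter.slot():` means a slow upstream can hold at most `max_requests` threads; every additional caller gets an error immediately and keeps its thread free.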
Resources
- Martin Fowler, CircuitBreaker: http://martinfowler.com/bliki/CircuitBreaker.html
- circuitbreaker Python library: https://pypi.org/project/circuitbreaker/
- Release It! (book): https://books.google.com/books/about/Release_It.html?id=Ug9QDwAAQBAJ&source=kp_book_description
- Circuit breaking in Envoy: https://www.envoyproxy.io/learn/circuit-breaking