Service Cluster
In the previous topic, we talked about some basic facets of Microservices. In this one, we'll see how to operate microservices effectively to ensure a reliable system.
Cluster
Traditionally, we build and run a service as a single process. This might work initially, but if the machine or process crashes, the service goes down with it.
Therefore, for resilience, we need to deploy the service as a cluster of multiple instances (processes), ideally distributed across different machines. This setup ensures that the service remains operational even if some instances crash.
For example, consider running a service cluster of two instances residing in different machines. If one instance (or its machine) fails, the other instance still provides the service.
From this point onward, when we refer to a service, we imply a service cluster comprising multiple instances.
Service State
To maintain a service, we need to know its state. Two key metrics are used to assess this: Health and Availability.
Health
Health refers to an instance’s ability to perform its intended tasks. The instance reports its health as one of two states:
- Healthy (Up): the instance is willing to accept and handle requests from users.
- Unhealthy (Down): the instance has encountered a problem (e.g., a lost database connection, hardware failure…) and is no longer serving requests.
Health Interface
Typically, an instance exposes a health interface, reporting its status:
- Consumers (end-users or other services) can perform a health check to confirm they are interacting with a healthy instance.
- The system can also use this interface to isolate unhealthy instances.
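For illustration, here is a minimal sketch of such a health interface in Python, using only the standard library. The /health path, the port, and the dependency check are assumptions for the example, not a prescribed contract:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

def dependencies_ok() -> bool:
    # Placeholder: a real instance would ping its database,
    # check downstream connections, disk space, etc.
    return True

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            if dependencies_ok():
                self.send_response(200)  # Healthy (Up)
                body = b'{"status": "UP"}'
            else:
                self.send_response(503)  # Unhealthy (Down)
                body = b'{"status": "DOWN"}'
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
```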
Heartbeat Mechanism
A common technique to isolate unhealthy instances is the heartbeat mechanism. In essence, a health checker (Load Balancer or DNS) periodically accesses the health interfaces of instances to filter out the faulty ones.
For example, a health checker verifies the health status of each instance every 5 seconds. If an instance is found to be unhealthy, the checker stops forwarding traffic to it.
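A minimal sketch of such a checker, assuming each instance exposes a /health endpoint like the one above; the instance addresses and the 5-second interval are illustrative:

```python
import time
import urllib.request
from urllib.error import URLError

# Hypothetical instance addresses; in practice these come from service discovery.
INSTANCES = ["http://10.0.0.1:8080", "http://10.0.0.2:8080"]

def is_healthy(base_url: str) -> bool:
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=2) as resp:
            return resp.status == 200
    except (URLError, OSError):
        # Connection errors and non-2xx responses count as unhealthy.
        return False

while True:
    # Only healthy instances stay in rotation; traffic is no longer
    # forwarded to the ones that fail the check.
    in_rotation = [url for url in INSTANCES if is_healthy(url)]
    print("routing traffic to:", in_rotation)
    time.sleep(5)  # check every 5 seconds, as in the example above
```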
Service Availability
Availability is a critical metric that indicates the accessibility of a service from the user’s perspective.
For example, suppose a service depends on another service that is currently down. From a technical perspective, the dependency is the source of the problem and the service itself is still operational and healthy. However, users don’t care: they send a request and see that the service is unavailable.
This metric is essential when outlining a Service Level Agreement (SLA). Typically, it’s calculated in one of two ways.
Time-based Availability
The first approach uses the ratio of uptime to total time over a given period, typically a year:
$Availability = \frac{Uptime}{Uptime + Downtime}$
For example, if a service runs for a year with a downtime of approximately 3 days, the availability would be:
$Availability = \frac{362}{362 + 3} \approx 99\%$
This approach assumes that requests are evenly distributed over time, so it is insensitive to when an outage happens: a service may receive much higher traffic than usual during its downtime, pulling the measured availability away from the real user experience.
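The figure above as a quick calculation:

```python
# Time-based availability: uptime / (uptime + downtime), here in days.
uptime_days = 362
downtime_days = 3

availability = uptime_days / (uptime_days + downtime_days)
print(f"{availability:.2%}")  # 99.18%
```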
Request-based Availability
Request-based Availability instead calculates availability as the number of successful requests compared to the total number of requests:
$Availability = \frac{Successful\ requests}{Total\ requests}$
For example, we have a service that successfully handles 1000 out of 1010 requests:
$Availability = \frac{1000}{1010} \approx 99\%$
This approach gives a more precise result, but it can introduce bias.
- More active users affect the availability more strongly and can mask the experience of less active ones.
- During downtime, users tend to retry, generating many extra failed requests that make the measured availability look even worse.
Thus, Request-based Availability is rarely used for public-facing services and is better suited to internal workloads.
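For completeness, here is the request-based calculation sketched from raw request outcomes; the status codes and the success criterion are assumptions for the example:

```python
# Request-based availability: successful requests / total requests.
# Here a 5xx status counts as a failure; whether 4xx counts as "served"
# is a judgement call, not a universal rule.
statuses = [200] * 1000 + [500] * 10  # matches the 1000-out-of-1010 example

successful = sum(1 for s in statuses if s < 500)
availability = successful / len(statuses)
print(f"{availability:.2%}")  # 99.01%
```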
Aggregate Availability
In the Microservice topic, we discussed how some types of design-time coupling negatively impact the development process. In the operational environment, runtime dependencies also emerge when services communicate over a network:
- Location Coupling: services need to know the address (IP, domain name…) of others.
- Availability Coupling: when a service calls another service, its availability will be impacted by that service. Let’s focus on this critical one!
For instance, the Subscription Service cannot complete its task without successfully communicating with the Account Service. Therefore, if the Account Service is unavailable, the Subscription Service will also be affected.
Thus, the final availability of a service is the product of its own availability and that of every service it depends on:
$Availability = S_{self} \times S_1 \times S_2 \times \dots \times S_n$
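A small sketch of how this compounds; the per-service availabilities below are made-up figures:

```python
import math

# Hypothetical availabilities: the service itself plus three dependencies.
self_availability = 0.999
dependencies = [0.999, 0.995, 0.99]

aggregate = self_availability * math.prod(dependencies)
print(f"{aggregate:.1%}")  # 98.3% -- worse than any single service on its own
```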
This interdependency can become a huge issue when the communication between services forms a complex graph. Some services become a Single Point of Failure, meaning their failure halts the entire system.
For example, in this map, if D or E is unavailable, the entire chain stops working unexpectedly.
Availability Decoupling
We’ve discussed the role of Messaging in decoupling a Microservice system at design time. Helpfully, Messaging also helps in the runtime environment.
For example, when the Account Service is down, the Subscription Service becomes unavailable too, because any request it sends would simply be missed.
By introducing Messaging, the Subscription Service can simply publish messages and continue its workflow without waiting for a response. The Account Service can then process these messages whenever it is available, so its availability no longer directly impacts the Subscription Service.
This approach decouples the services, promoting greater resilience and flexibility in the system.
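A minimal in-memory sketch of this pattern, using a plain queue in place of a real message broker (in practice something durable like Kafka or RabbitMQ); the service functions and event shape are illustrative:

```python
import queue

# Stand-in for a message broker; a real system would use a durable broker.
broker: "queue.Queue[dict]" = queue.Queue()

def subscription_service_create(plan: str, account_id: int) -> None:
    # Publish and move on: no waiting for the Account Service to respond.
    broker.put({"event": "subscription_created",
                "account_id": account_id, "plan": plan})
    print("subscription created, event published")

def account_service_drain() -> None:
    # Runs whenever the Account Service is available; it catches up on
    # everything published while it was down.
    while not broker.empty():
        event = broker.get()
        print("account service processing:", event)

subscription_service_create("premium", account_id=42)  # works even if the Account Service is down
account_service_drain()                                # processed later, when it is available
```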
This approach is particularly beneficial in environments with numerous services. Services do not rely on each other; even if some of them fail, the rest continue to function.
For example, in this diagram, the final availability of Service A would be: $S_A = S_{A(self)} \times S_B \times S_C \times S_D$. With Messaging, it becomes: $S_A = S_{A(self)} \times S_{Broker}$.
In effect, we’ve shifted the complex interdependency onto the broker. The system now looks more manageable because each dependency ends after a single hop rather than a harmfully long chain. The trade-off is that the message broker can become a dangerous Single Point of Failure, so it must be highly available and fault-tolerant.
Cluster Types
Basically, service clusters are categorized into two types: Stateless and Stateful.
- Stateless services only contain logic and do not store state between requests.
- Stateful services store state, so requests may depend on one another.
Stateless Service
A stateless service operates with instances that share identical logic and don’t retain local data.
For instance, consider two instances of the Account Service. Both instances query the same database and are designed to return identical results. The specific instance a client connects to doesn’t matter: each instance has the same logic and data access patterns, ensuring a consistent response.
This consistent behavior makes scaling a stateless service trivial: we simply increase or decrease the number of identical instances.
Stateful Service
A stateful service, unlike a stateless one, has instances that may store local state. As a result, different instances of the service may behave differently based on their local state.
Stateful services are often paired with real-time features, which require maintaining client connections to push messages from the service side. A common example of this is a chat application that holds client connections (typically using WebSocket) for real-time messaging.
Consider a cluster of two instances: if Client A connects to Instance 1 and Client B connects to Instance 2, they cannot chat with each other, because each instance only handles its own socket connections.
Scaling Problem
Stateful services are more challenging to scale and are generally best avoided when possible. Simply increasing the number of instances is insufficient; additional strategies are required to manage and share state across instances.
Centralized Cluster
The first approach is to build a shared store between instances.
In the chat example, we introduce a shared component known as the Presence Store, which maps each user to the server instance they are currently connected to. Whenever a user connects to the system, their server instance creates or updates a presence record in this store. Service instances can then efficiently determine the location of any user and forward messages directly to the appropriate instance.
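A minimal sketch of the Presence Store idea, with a plain dictionary standing in for what would normally be a shared store such as Redis; all names are illustrative:

```python
# Shared Presence Store: user id -> instance id.
# In production this would live in a shared store, not in local memory.
presence: dict[str, str] = {}

def on_connect(user_id: str, instance_id: str) -> None:
    # Called by whichever instance accepts the user's socket connection.
    presence[user_id] = instance_id

def route_message(to_user: str, message: str) -> None:
    instance = presence.get(to_user)
    if instance is None:
        print(f"{to_user} is offline; message dropped or stored for later")
        return
    # Forward to the instance holding the recipient's connection.
    print(f"forwarding '{message}' to {to_user} via {instance}")

on_connect("alice", "instance-1")
on_connect("bob", "instance-2")
route_message("bob", "hi from alice")  # forwarded to instance-2
```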
The simplicity of this approach makes it the preferred solution in many systems. However, it limits availability: to process a message, an instance must first consult the Presence Store, so the store’s availability directly impacts its own.
Decentralized Cluster
To maximize availability, the system can eliminate the connection store and deterministically map users to instances.
Each instance is responsible for a specific group of users; when a user connects to the system, they are assigned to their owner instance.
For example, we define the groups of users with the modulo operation user id % the number of instances. Now, given a message containing a user id, any instance can quickly determine where to forward it.
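A quick sketch of this deterministic mapping; the cluster size and user ids are illustrative (real systems often prefer consistent hashing so that adding or removing instances does not remap every user):

```python
NUM_INSTANCES = 3  # illustrative cluster size

def owner_instance(user_id: int) -> int:
    # Deterministic mapping: every instance computes the same answer
    # without consulting any shared store.
    return user_id % NUM_INSTANCES

for user_id in (7, 12, 25):
    print(f"user {user_id} -> instance {owner_instance(user_id)}")
# user 7 -> instance 1, user 12 -> instance 0, user 25 -> instance 1
```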
The cluster is far cleaner, with no external dependency; the final availability depends only on the instances themselves.
However, this model is more complex than it appears. It’s extremely challenging to develop and maintain:
- How do we track cluster information consistently across instances without relying on a central store?
- How can we adapt to a dynamic number of instances as they scale up or down?
- How do we ensure the cluster stays operational when some instances fail?
These questions highlight the inherent complexity of decentralized communication. We’ll explore these challenges in greater depth in the Distributed Database topic, where a database cluster is treated as a stateful service.