Delivery Semantics
Numerous challenges arise when a change must be committed to both an Event Streaming Platform and another data source. The two commits are independent steps, and a failure between them can leave the systems inconsistent.
For example, a consumer might successfully process an event and apply changes to another data store but crash before committing the event offset. Upon recovery, the consumer may reprocess the same event unexpectedly.
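The failure mode above can be sketched with a tiny simulation; the consumer loop, store, and offset bookkeeping here are illustrative, not taken from any particular client library:

```python
# Minimal simulation of the dual-write problem: the consumer updates an
# external store and commits the offset as two independent steps, so a
# crash between them causes the event to be reprocessed after recovery.

def run_consumer(events, store, committed, crash_before_commit=False):
    """Process events starting from the last committed offset."""
    offset = committed["offset"]
    while offset < len(events):
        event = events[offset]
        store.append(event)                # step 1: apply change to the data store
        if crash_before_commit:
            return                         # crash: the offset is never committed
        committed["offset"] = offset + 1   # step 2: commit the offset
        offset += 1

events = ["e1", "e2"]
store, committed = [], {"offset": 0}

run_consumer(events, store, committed, crash_before_commit=True)
run_consumer(events, store, committed)  # restart after recovery

# "e1" was applied to the store twice because its offset was never committed.
```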
Delivery semantics define the guarantees provided for event delivery during production and consumption. There are three main types, each offering different trade-offs between latency, durability, and reliability.
At-most-once Delivery
This delivery model ensures that an event is delivered zero or one time.
Producer
The producer sends events with ACK=0 to achieve the lowest latency. Even if a request fails, it won’t be retried to avoid duplication.
For example, if the producer doesn’t receive a response from the broker due to a network error and retries the operation, duplicated events may occur.
To prevent this, retries are disabled, even in failure scenarios.
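A minimal sketch of this fire-and-forget behavior, with the broker and the network failure simulated (the function and names are illustrative):

```python
# At-most-once producing: send with ACK=0 and never retry. A failed send
# drops the event, but duplication is impossible.

def send_at_most_once(broker, event, network_ok=True):
    """Send without waiting for an acknowledgment; never retry."""
    if network_ok:
        broker.append(event)
    # On failure the event is silently lost: no retry, no duplicate.

broker = []
send_at_most_once(broker, "metric-1")
send_at_most_once(broker, "metric-2", network_ok=False)  # lost
send_at_most_once(broker, "metric-3")
```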
Consumer
The consumer commits the offset before handling the event. This ordering guarantees an event is never processed more than once: if the consumer crashes after committing but before (or during) processing, the event is skipped on recovery rather than reprocessed.
By contrast, if the consumer processed the event first and crashed before committing, the event would be reprocessed after recovery, violating at-most-once semantics.
Therefore, offsets must be committed before processing. The trade-off is that if the consumer fails while processing an event, it will never consume that event again, so the event is lost.
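The commit-before-process ordering can be sketched as follows (the crash hook and offset structure are illustrative assumptions):

```python
# At-most-once consuming: commit the offset BEFORE processing. A crash
# mid-processing loses the event, but nothing is ever processed twice.

def consume_at_most_once(events, committed, processed, crash_on=None):
    offset = committed["offset"]
    while offset < len(events):
        committed["offset"] = offset + 1   # commit first
        event = events[offset]
        if event == crash_on:
            return                         # simulated crash during processing
        processed.append(event)
        offset += 1

events = ["e1", "e2", "e3"]
committed, processed = {"offset": 0}, []

consume_at_most_once(events, committed, processed, crash_on="e2")
consume_at_most_once(events, committed, processed)  # restart after recovery

# "e2" is lost, but no event was processed more than once.
```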
This model guarantees delivery at most once and offers the lowest latency. It is suitable for scenarios where data loss is acceptable, such as metrics collection.
At-least-once Delivery
This model guarantees that every event is delivered at least once, possibly more.
Producer
The producer uses ACK=1 or ACK=ALL and enables retries on failure to ensure events are persisted. Naturally, enabling retries can lead to duplicate events.
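The duplication caused by retrying after a lost acknowledgment can be simulated like this (the retry loop and failure flag are illustrative):

```python
# At-least-once producing: retry when no acknowledgment arrives. If the
# broker actually persisted the event, the retry creates a duplicate.

def send_at_least_once(broker, event, ack_lost=False, max_retries=3):
    for _ in range(max_retries + 1):
        broker.append(event)   # the broker persists the event either way
        if not ack_lost:
            return True        # acknowledgment received, done
        ack_lost = False       # simulate the retry succeeding
    return False

broker = []
send_at_least_once(broker, "e1")
send_at_least_once(broker, "e2", ack_lost=True)  # the retry duplicates "e2"
```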
Consumer
The consumer commits the offset after processing the event.
If it crashes before committing, the event may be reprocessed.
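Mirroring the at-most-once sketch, flipping the order to process-then-commit yields reprocessing instead of loss (names are again illustrative):

```python
# At-least-once consuming: process the event, THEN commit the offset.
# A crash before the commit causes the event to be reprocessed.

def consume_at_least_once(events, committed, processed, crash_on=None):
    offset = committed["offset"]
    while offset < len(events):
        event = events[offset]
        processed.append(event)            # process first
        if event == crash_on:
            return                         # simulated crash before the commit
        committed["offset"] = offset + 1   # commit after processing
        offset += 1

events = ["e1", "e2"]
committed, processed = {"offset": 0}, []

consume_at_least_once(events, committed, processed, crash_on="e1")
consume_at_least_once(events, committed, processed)  # restart: "e1" reprocessed
```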
This method provides stronger durability guarantees but may lead to duplicates and higher latency compared to at-most-once delivery. It is well-suited for scenarios where duplicate data is acceptable or can be handled, such as in user activity tracking.
Exactly-once Delivery
This is the most reliable but also the most complex delivery model, ensuring each event is delivered exactly once. Event Streaming Platform solutions don’t fully support this out of the box, so additional techniques are required.
Exactly-once Producer
The producer functions similarly to the at-least-once model, allowing retries on failures and using ACK=ALL for durability.
To prevent duplication, it uses idempotency keys. Each producer is assigned a PID (producer ID) and a seq (sequence number), which it increments locally after receiving an acknowledgment.
The partition tracks the latest sequence number seen from each producer and ignores any event with an outdated sequence number, effectively preventing duplicates.
For example, if P1 sends an event but fails to receive an acknowledgment, it resends the event. Because the resent event's sequence number is outdated, the partition ignores it, avoiding duplication.
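The partition-side deduplication can be sketched as a small class; the broker keeps only the highest sequence number per producer ID (the class and method names are assumptions for illustration):

```python
# Idempotent-producer deduplication: the partition remembers the last
# sequence number per producer ID and drops anything not strictly newer.

class Partition:
    def __init__(self):
        self.log = []
        self.last_seq = {}  # producer ID -> highest sequence number seen

    def append(self, pid, seq, event):
        if seq <= self.last_seq.get(pid, -1):
            return False    # outdated sequence number: duplicate, ignore
        self.last_seq[pid] = seq
        self.log.append(event)
        return True

p = Partition()
p.append("P1", 0, "e1")   # accepted
p.append("P1", 0, "e1")   # resend after a lost acknowledgment: ignored
p.append("P1", 1, "e2")   # accepted
```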
Exactly-once Consumer
Event Streaming Platform cannot tell whether a consumer has processed an event or not. To achieve exactly-once semantics, we must introduce one of the following approaches:
Consume–Process–Produce Pipeline
If the event is simply transformed into another event (with no external side effects), Event Streaming Platform can manage this flow.
Transactional Commit
To support this, Transactional Commit is implemented, ensuring that consumers can only see committed events. If a failure occurs during processing, the transaction is aborted and all uncommitted changes are discarded.
This guarantees the transformation process won’t produce duplicates. However, it only applies to stateless processing with no external side effects.
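A simplified model of this flow, with the transactional output topic simulated in-process (the class and the commit/abort API are assumptions, not a real client interface):

```python
# Transactional consume-process-produce: output events are buffered and
# become visible to downstream consumers only when the transaction
# commits; on failure the buffered output is discarded.

class TransactionalTopic:
    def __init__(self):
        self.committed = []   # what downstream consumers can see
        self._pending = []    # uncommitted, invisible output

    def produce(self, event):
        self._pending.append(event)

    def commit(self):
        self.committed.extend(self._pending)
        self._pending = []

    def abort(self):
        self._pending = []

def transform(topic, events, fail=False):
    """Stateless transformation with no external side effects."""
    try:
        for e in events:
            topic.produce(e.upper())
            if fail:
                raise RuntimeError("processing failed")
        topic.commit()
    except RuntimeError:
        topic.abort()   # discard all uncommitted output

out = TransactionalTopic()
transform(out, ["a", "b"], fail=True)  # aborted: nothing becomes visible
transform(out, ["a", "b"])             # committed: output visible exactly once
```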
Two-phase Commit
To synchronize multiple systems, we can use Two-phase Commit.
However, Two-phase Commit introduces the risk of indefinite blocking and is not always supported. This is discussed further in a later topic.
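The core of the protocol fits in a few lines; this sketch models participants as simple callbacks and omits the coordinator logging and timeouts a real implementation needs (which is exactly where the blocking risk lives):

```python
# Minimal two-phase commit sketch: the coordinator commits only if every
# participant votes yes during the prepare phase; otherwise all roll back.

def two_phase_commit(participants):
    # Phase 1: ask every participant to prepare and collect votes.
    votes = [p["prepare"]() for p in participants]
    # Phase 2: commit everywhere, or roll back everywhere.
    if all(votes):
        for p in participants:
            p["commit"]()
        return True
    for p in participants:
        p["rollback"]()
    return False

def make_participant(name, state, vote=True):
    return {
        "prepare": lambda: vote,
        "commit": lambda: state.append((name, "committed")),
        "rollback": lambda: state.append((name, "rolled back")),
    }

state = []
ok = two_phase_commit([
    make_participant("stream", state),
    make_participant("db", state, vote=False),  # one "no" vote aborts both
])
```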
Event Idempotency
This approach requires each event to carry a unique identifier. Consumers then check for this ID before processing.
For example, a consumer verifies whether an event has already been processed before handling it.
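A minimal idempotent-consumer sketch, assuming each event carries a unique `id` field (the structure and the processed-ID store are illustrative; in practice the ID check and the state change must be persisted atomically):

```python
# Event idempotency: the consumer records the IDs of processed events,
# so a redelivered event becomes a no-op instead of a duplicate effect.

def handle(event, processed_ids, results):
    if event["id"] in processed_ids:
        return False                 # duplicate delivery: skip
    results.append(event["payload"])  # the actual side effect
    processed_ids.add(event["id"])
    return True

processed_ids, results = set(), []
handle({"id": "evt-1", "payload": "debit $10"}, processed_ids, results)
handle({"id": "evt-1", "payload": "debit $10"}, processed_ids, results)  # redelivered
handle({"id": "evt-2", "payload": "credit $10"}, processed_ids, results)
```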
This approach is usually preferred over Two-phase Commit due to its simplicity and better fault tolerance. It is commonly used in combination with the Saga pattern to manage long-running, distributed operations.
However, this method does not provide strong consistency across the system; there can be consistency drift between the streaming platform and the external data store. We'll explore this limitation in more detail in the Distributed Transaction topic.
Ultimately, exactly-once delivery requires additional mechanisms and is best suited for mission-critical systems where both data loss and duplication are unacceptable (e.g., banking platforms).