2 Minute Streaming

Kafka's High Watermark Offset

A cornerstone of Kafka's replication protocol

Stanislav Kozlovski
July 14, 2025

The Phantom Reads 👻📚

Any system can experience phantom reads if it’s not designed carefully.

⭐️ Phantom reads (AKA ghost reads): two identical read queries returning different messages.

The literature on the matter is mostly focused on databases in the context of Transaction Isolation, but the problem in Kafka is a matter of durability & consistency.

Imagine the following scenario:

1. The producer writes data at offsets 5-7.
2. The leader broker (kafka-1) accepts it.
3. Before the data is replicated to the followers, the consumer reads offsets 5-7.
4. The leader broker (kafka-1) dies. kafka-2 takes over.
5. Since neither kafka-2 nor kafka-3 had the messages at offsets 5-7, the data is lost. 👋 The new log ends at offset 4.
6. The producer continues writing. It writes 5 new records, which now occupy offsets 5-9.

The consumer resumes reading from offset 8 (its last read offset was 7), so it misses the new records at offsets 5-7. Worse, if the same consumer explicitly starts reading from offset 5 again, it will experience the ghost read: the data it receives for offsets 5-7 will be different from what it previously received.
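To make the failure mode concrete, here is a minimal toy model of that scenario in Python. It is only a sketch: the `Broker` class and its methods are hypothetical names invented for illustration, not Kafka code, and the "leader" here deliberately serves whatever is in its local log.

```python
# Toy model of a leader that serves unreplicated data.
# Every name here is hypothetical; this is not how Kafka is implemented.

class Broker:
    def __init__(self, name, log=None):
        self.name = name
        self.log = dict(log or {})  # offset -> record

    def append(self, records):
        """Append records at the next free offsets and return those offsets."""
        next_offset = max(self.log, default=-1) + 1
        offsets = []
        for record in records:
            self.log[next_offset] = record
            offsets.append(next_offset)
            next_offset += 1
        return offsets

    def read(self, start, end):
        """Naive read: returns local data whether or not it was replicated."""
        return [self.log[o] for o in range(start, end + 1) if o in self.log]


replicated = {o: f"old-{o}" for o in range(5)}  # offsets 0-4 exist everywhere
kafka_1 = Broker("kafka-1", replicated)
kafka_2 = Broker("kafka-2", replicated)

kafka_1.append(["a", "b", "c"])   # offsets 5-7, present only on the leader
first_read = kafka_1.read(5, 7)   # the consumer reads before replication

# kafka-1 dies; kafka-2 takes over with a log that ends at offset 4.
leader = kafka_2
new_offsets = leader.append(["v", "w", "x", "y", "z"])  # offsets 5-9 are reused
second_read = leader.read(5, 7)

print(first_read)   # ['a', 'b', 'c']
print(second_read)  # ['v', 'w', 'x']
```

The same query (offsets 5-7) returns two different answers, which is exactly the ghost read described above.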

🤫 PS: Whether the producer uses acks=1 or acks=all doesn’t matter much, because that’s simply a matter of whether the producer got a response or not.

In practice, the data is present on the leader broker the moment it arrives (regardless of the acks setting) - so the consumer could read it.

Learn more about the acks setting here:

Kafka Acks & Min Insync Replicas Explained

min insync replicas is commonly confused to be a config that controls quorum write functionality - but no such thing exists.

blog.2minutestreaming.com/p/kafka-acks-min-insync-replicas-explained

How Kafka Avoids This

Fortunately - the case I presented to you is not reality. Kafka does not suffer from this problem, because the devs already took measures against it.

The High Watermark (HWM) ✨

The fix is incredibly simple - Kafka only returns fully-replicated messages to consumers.

In our example, the consumer read messages 5,6,7 before they were replicated. This cannot happen in practice.

The max readable offset is denoted by the high watermark offset. This is the highest offset up to which every in-sync replica has caught up.
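As a rough illustration, a leader that honours this rule could cap every consumer read at the high watermark. The sketch below assumes this article's convention that the high watermark is the last readable offset (inclusive); `consumer_fetch` is a hypothetical helper, not a Kafka API.

```python
def consumer_fetch(log, high_watermark, start_offset):
    """Return (offset, record) pairs from start_offset up to the HWM, inclusive.

    Anything past the high watermark stays invisible, even though the
    leader already has it on disk.
    """
    return [
        (offset, log[offset])
        for offset in range(start_offset, high_watermark + 1)
        if offset in log
    ]


leader_log = {o: f"msg-{o}" for o in range(8)}  # offsets 0-7 on the leader
hwm = 4                                         # followers only have 0-4

visible = consumer_fetch(leader_log, hwm, start_offset=3)
hidden = consumer_fetch(leader_log, hwm, start_offset=5)

print(visible)  # [(3, 'msg-3'), (4, 'msg-4')]
print(hidden)   # []
```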

Our same example, this time with the high watermark offset

Follower brokers replicate by sending FetchRequests to the leader broker. The leader replies with the latest data and keeps a local map of follower broker → latest fetched offset per partition. (Each partition has its own HWM offset.)

The minimum of those latest fetched offsets becomes the next high watermark offset (i.e. the newest offset every in-sync follower has replicated).
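In code, that bookkeeping boils down to taking a minimum. This is a small sketch under stated assumptions: the map contains only in-sync followers, offsets are "last replicated offset" (inclusive), and `next_high_watermark` is a made-up name.

```python
def next_high_watermark(leader_last_offset, follower_fetched):
    """Compute the HWM for one partition.

    follower_fetched maps follower broker id -> latest offset it has
    replicated. The leader's own last offset is included so the result
    is still defined when there are no followers.
    """
    return min([leader_last_offset, *follower_fetched.values()])


fetched = {"kafka-2": 7, "kafka-3": 5}
hwm = next_high_watermark(leader_last_offset=7, follower_fetched=fetched)
print(hwm)  # 5, because kafka-3 is the slowest in-sync replica
```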

The high watermark changes on more or less every follower→leader fetch request.

When the leader receives confirmation that every follower replica of the partition has read up to offset 7, the high watermark gets incremented to that. Only then is the consumer sent message 7.

💡 Did you know:

The HWM bump also denotes when a broker responds to an acks=all Producer request. The high watermark IS the offset that denotes what the latest fully-replicated message across all in-sync replicas (ISR) is, so it makes sense to reuse that mechanism to respond to producers.
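One way to picture this reuse: the leader parks each acks=all request until the high watermark reaches the last offset of that request. The sketch below is an assumption-laden illustration (the `PendingProduce` class, its methods and the request ids are all hypothetical), not a description of the broker's actual code.

```python
class PendingProduce:
    """Holds acks=all produce requests until the HWM covers them."""

    def __init__(self):
        self._waiting = []  # list of (last_offset, request_id)

    def add(self, last_offset, request_id):
        self._waiting.append((last_offset, request_id))

    def on_hwm_advance(self, high_watermark):
        """Return the ids of requests that can now be answered."""
        ready = [rid for off, rid in self._waiting if off <= high_watermark]
        self._waiting = [(off, rid) for off, rid in self._waiting if off > high_watermark]
        return ready


pending = PendingProduce()
pending.add(last_offset=7, request_id="req-A")  # wrote offsets 5-7
pending.add(last_offset=9, request_id="req-B")  # wrote offsets 8-9

print(pending.on_hwm_advance(7))  # ['req-A']
print(pending.on_hwm_advance(8))  # []
print(pending.on_hwm_advance(9))  # ['req-B']
```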

HWM +1

The high watermark is actually advanced on the next fetch request, i.e. the one that follows the fetch which replicated the data.

💡 Example:

1. Follower broker A sends a FetchRequest(startOffset=5) and receives a response with messages 5-7. It persists them to disk.

2. The follower sends the next FetchRequest(startOffset=8). Asking for offset 8 tells the leader that everything up to 7 is safely stored, so only now does the leader bump the HWM to 7.

This is done because it’s the only way the leader can be certain that the response was received and persisted successfully.
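The two-round behaviour can be simulated with a few lines of Python. This is a sketch, assuming that a fetch starting at offset N implies the follower has persisted everything below N, and using this article's inclusive HWM convention. The `Leader` class and `handle_follower_fetch` are hypothetical names.

```python
class Leader:
    def __init__(self, log, followers, high_watermark):
        self.log = log  # list of records; list index == offset
        self.high_watermark = high_watermark
        # follower id -> last offset the leader knows that follower has stored
        self.replicated_up_to = {f: high_watermark for f in followers}

    def handle_follower_fetch(self, follower, start_offset):
        # A fetch from start_offset is the follower's implicit acknowledgement
        # of every offset below it.
        self.replicated_up_to[follower] = start_offset - 1
        slowest = min(self.replicated_up_to.values())
        self.high_watermark = max(self.high_watermark, slowest)
        return self.log[start_offset:]


log = [f"msg-{o}" for o in range(8)]  # the leader holds offsets 0-7
leader = Leader(log, followers=["kafka-2", "kafka-3"], high_watermark=4)

# Round one: both followers fetch from 5 and receive messages 5-7.
round_one = leader.handle_follower_fetch("kafka-2", 5)
leader.handle_follower_fetch("kafka-3", 5)
hwm_after_round_one = leader.high_watermark

# Round two: fetching from 8 proves that 5-7 were persisted.
leader.handle_follower_fetch("kafka-2", 8)
leader.handle_follower_fetch("kafka-3", 8)
hwm_after_round_two = leader.high_watermark

print(round_one)            # ['msg-5', 'msg-6', 'msg-7']
print(hwm_after_round_one)  # 4
print(hwm_after_round_two)  # 7
```

Note that the data itself travels in round one, but the high watermark only moves in round two.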

In this way, it actually takes two rounds of replication fetch requests to be ready to expose tail data to consumers. Here’s how it works in detail:

The high watermark is actually exposed to consumers in each FetchResponse - that’s how consumer lag is calculated.

In other words, for all intents and purposes, the end of the partition’s log from a consumer’s point of view is the high watermark offset.
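As a back-of-the-envelope sketch, lag is then just the distance between the high watermark and the consumer's position. The helper below is hypothetical and assumes this article's convention, where the high watermark is the last readable offset and the consumer tracks the last offset it has read.

```python
def consumer_lag(high_watermark, last_consumed_offset):
    """Number of readable records the consumer has not yet consumed."""
    return max(0, high_watermark - last_consumed_offset)


# The leader's log may already extend to offset 9, but only offsets up to
# the high watermark (7) count, because nothing beyond it is readable yet.
lag = consumer_lag(high_watermark=7, last_consumed_offset=4)
print(lag)  # 3 (offsets 5, 6 and 7)
```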

Simple solution! But it goes a long way. 🫡

Apache®, Apache Kafka®, Kafka, and the Kafka logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.