Kafka Acks & Min Insync Replicas Explained

uncovering the commonly-confused min insync replicas setting

Problem

There are three Kafka configs whose interplay is often misunderstood:

For some reason people keep mistaking that this results in some quorum write functionality in Kafka.

No such thing exists.

Let’s dissect the configs one by one. Starting from the simplest:

Replication Factor (RF)

Kafka’s replication protocol is such that for every partition:

  • N replicas of the data exist

these replicas live on brokers, and there is always:

  • one leader broker

  • N-1 follower brokers

replication.factor denotes what

N is. Usually 3.

This is set separately for each topic.

replication factor of one. the replica set is [1].

replication factor of three. the replica set is [1,2,3].

In-Sync Replicas (ISR)

Producer clients only write to the leader broker — the followers asynchronously replicate the data.

It’s a distributed system — how do you know the followers are keeping up with the leader (i.e have the latest data)?

Enter in-sync replicas.

💡 An in-sync replica (ISR) is a broker that has the latest data for a given partition

  • a leader is always an in-sync replica (by definition)

  • a follower is an in-sync replica only if it has fully caught up to the partition it’s following

    • if it falls behind, it’s no longer an in-sync replica

    • ❓ when do we determine it “fell behind”:

      • 1. when the follower hasn’t consumed up to the log’s last offset in the last replica.lag.time.max.ms

      • 2. when it loses ZooKeeper/KRaft quorum connectivity (zookeeper.session.timeout.ms / broker.session.timeout.ms)

Acks 🫡

Your producer sends data to Kafka. When do you expect a response?

  1. after all in-sync replicas persist the record?

  2. after the leader persists the record?

  3. immediately once sent?

The trade-off is between durability and speed. Different workloads have different requirements, so naturally - this is configurable.

💡 acks (acknowledgements) is a producer client config denoting the number of brokers that acknowledge receipt of a record before the producer considers the write as successful.

It supports three values - 0, 1 and all.

  • acks=0 - the producer won’t even wait for a response from the broker, it immediately considers the write successful once sent.

  • acks=1 - the producer considers the write successful when the leader acknowledges the record. The leader broker will know to respond the moment it persists the record to disk.

  • acks=all - the producer considers it successful when all of the in-sync replicas persist the record. The leader broker will respond once all the in-sync replicas persist the record.

The acks setting is a good way to configure your preferred trade-off between durability guarantees and performance.

☝️Minimum In-Sync Replica

Best to start with a FAQ:

If you have three in-sync replicas, acks=all and min.insync.replicas=1, is it true the broker won’t wait for the other two replicas before replying?

Random Engineer

No. It will wait for all the in-sync replicas.

If you use acks=all and there’s only one in-sync replica (the leader) - isn’t that equal to acks=1?

Random Engineer

No. Because this request is now prone to fail validation, whereas acks=1 will always pass under this circumstance.

💡 min.insync.replicas is a broker config that denotes the minimum number of in-sync replicas a broker requires to acknowledge a record before responding to a producer configured with acks=all.

If there are fewer in-sync replicas than the minimum, the broker will respond with an error and not persist the record. ❌

This setting denotes validation - the MINIMUM number of in-sync replicas a broker requires to ack a record, in order to NOT fail the produce request.

And it only applies on Produce requests with acks=all.

If you want me to name it explicitly so you never mistake it, I’d call it:

min.insync.replicas.to.not.fail.an.acks.all.produce.request

For posterity, here are the errors you might hit if you don’t pass this validation:

Server-side Error:

[2023-05-03 19:34:42,890] ERROR [ReplicaManager broker=2] Error processing append operation on partition testche-2 (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.NotEnoughReplicasException: The size of the current ISR Set(2) is insufficient to satisfy the min.isr requirement of 2 for partition testche-2

Client-side Error:

[2023-05-03 19:34:42,789] WARN [Producer clientId=console-producer] Got error produce response with correlation id 19 on topic-partition testc-2, retrying (0 attempts left). Error: NOT_ENOUGH_REPLICAS (org.apache.kafka.clients.producer.internals.Sender)

[2023-05-03 19:34:42,891] ERROR Error when sending message to topic testche with key: null, value: 1 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.NotEnoughReplicasException: Messages are rejected since there are fewer in-sync replicas than required.

Putting It All Together

  • Acks is about answering the question "when do you expect a response?".

    • it is a trade-off between durability (in motion) and speed

  • RF is about how many times to copy the data (durability at rest)

  • Min ISR is about how to validate your high-durability produce requests (acks=ALL)

Apache®, Apache Kafka®, Kafka, and the Kafka logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.