2 Minute Streaming

What's a Kafka Consumer Group

The clearest, basic explanation of Kafka Consumers and Consumer Groups out there

July 10, 2023

Consumer Groups

A consumer is an application that leverages the Kafka client library (in particular, its KafkaConsumer class) to consume messages from Kafka and do something with them.

A Kafka topic may have more messages than a single app would ever be able to process - so we need to scale the consumption up.

Enter Consumer Groups.

A consumer group is a collection of consumer applications that work in tandem to consume from the same topic(s).

The apps don’t talk to each other directly. Each one sets the same group_id in its client configuration, and a broker places every consumer that shares that group_id into the same consumer group.
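As a rough sketch of what that looks like, here are the settings two instances of the same app might share. The key names follow the keyword arguments of the kafka-python client's KafkaConsumer; the broker address, group name and client ids are made-up placeholders, not values from this post.

```python
# Hypothetical settings for two instances of the same application.
# Key names follow kafka-python's KafkaConsumer keyword arguments;
# every value below is a placeholder chosen for illustration.
base_config = {
    "bootstrap_servers": "localhost:9092",  # assumed broker address
    "group_id": "order-processors",         # same value => same consumer group
}

instance_a = {**base_config, "client_id": "order-processor-a"}
instance_b = {**base_config, "client_id": "order-processor-b"}

# The instances never reference each other; the shared group_id is the
# only thing that ties them together.
same_group = instance_a["group_id"] == instance_b["group_id"]
print(same_group)
```

With kafka-python installed and a broker running, each instance could pass its dict along the lines of KafkaConsumer("orders", **instance_a), where "orders" is again a placeholder topic name.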

Coordination

When we have N consumers in the same group, we need to make sure they’re coordinating their work appropriately.

At any one time, the goal is to have exactly one consumer in the group reading from a given partition - you wouldn’t want two of them reading duplicate records from the same partition.

This is done through the consumer group protocol:

- consumer clients join the group before consuming anything.
- to join the group, they talk to a specific broker - called the Group Coordinator.

The Group Coordinator’s job is to maintain the membership of the group, that is, which consumers are part of it. Because everything group-related is owned by one broker, it becomes possible to avoid these duplicate reads in the happy path.

This Group Coordinator broker is also the one that helps the consumers save their offsets (an operation called an offset commit). This data is stored in an internal topic named __consumer_offsets.

This topic also contains metadata about the group, so that the coordinator can fail over appropriately to another broker if the original one dies.
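How does a group end up with "its" broker? As far as we know, Kafka hashes the group id to one partition of __consumer_offsets, and whichever broker leads that partition acts as the Group Coordinator. The sketch below imitates that lookup in plain Python. The function names, the group id and the broker layout are invented for illustration, and 50 is used because it is, to our knowledge, the default partition count of that topic.

```python
def java_string_hash(s: str) -> int:
    """Mimic Java's String.hashCode() (signed 32-bit) for plain ASCII ids."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Convert the unsigned 32-bit value to a signed one.
    return h - 0x100000000 if h >= 0x80000000 else h


def offsets_partition_for(group_id: str, num_partitions: int = 50) -> int:
    """Pick the __consumer_offsets partition that holds this group's data."""
    h = java_string_hash(group_id)
    # abs() of the smallest 32-bit int overflows in Java, so map it to 0.
    non_negative = 0 if h == -0x80000000 else abs(h)
    return non_negative % num_partitions


# Made-up cluster layout: which broker currently leads each partition.
partition_leaders = {p: p % 3 for p in range(50)}

partition = offsets_partition_for("order-processors")
coordinator_broker = partition_leaders[partition]
print(partition, coordinator_broker)
```

In this picture, failover is just a leadership change: if another broker takes over that partition, it can replay the group metadata and offsets stored there and carry on as the coordinator.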

Consumer Group Rebalance

We rebalance when we want to move partition ownership from one consumer to another.

A consumer group rebalance is the act of having every member re-join the group. Through this process, each consumer receives the assigned partitions it should consume from.
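To make the "every member re-joins and gets an assignment" idea concrete, here is a toy round-robin assignment in plain Python. It is only a sketch: real clients use pluggable assignment strategies, and the member names and partition count here are made up.

```python
def assign_round_robin(members, partitions):
    """Give every partition exactly one owner, spread evenly over members."""
    ordered = sorted(members)
    if not ordered:
        return {}
    assignment = {member: [] for member in ordered}
    for i, partition in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(partition)
    return assignment


partitions = list(range(6))

# Two members in the group.
before = assign_round_robin(["consumer-a", "consumer-b"], partitions)

# A third member joins, so the whole group is assigned again.
after = assign_round_robin(["consumer-a", "consumer-b", "consumer-c"], partitions)

print(before)  # {'consumer-a': [0, 2, 4], 'consumer-b': [1, 3, 5]}
print(after)   # {'consumer-a': [0, 3], 'consumer-b': [1, 4], 'consumer-c': [2, 5]}
```

Note how partition ownership moves between consumers after the third member joins, while every partition still has exactly one owner.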

There are 6 reasons why a group can be forced to rebalance:

1. a consumer joins the group (sends a JoinGroup request).
2. a consumer shuts down gracefully.
   - it leaves the group via a LeaveGroup request
3. a consumer takes longer than max_poll_interval_ms between Consumer#poll() calls.
4. a consumer dies (its heartbeats stop, and the session times out after session_timeout_ms).
5. the Consumer#enforceRebalance API is called.
6. a new partition is added to a topic that the group is subscribed to.
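The two timeout-based reasons are easy to mix up, so here is a small sketch that tells them apart. It is a simplification: in a real deployment these checks do not all happen in one place, and the class name, function name and timeout values below are illustrative assumptions rather than recommended settings.

```python
from dataclasses import dataclass


@dataclass
class ConsumerTimings:
    session_timeout_ms: int
    max_poll_interval_ms: int


def rebalance_reason(now_ms, last_heartbeat_ms, last_poll_ms, timings):
    """Return why this member would drop out of the group, or None."""
    if now_ms - last_heartbeat_ms > timings.session_timeout_ms:
        return "session timeout: no heartbeat arrived in time"
    if now_ms - last_poll_ms > timings.max_poll_interval_ms:
        return "poll timeout: too long between poll() calls"
    return None


timings = ConsumerTimings(session_timeout_ms=10_000, max_poll_interval_ms=300_000)

# Healthy: heartbeating and polling recently.
print(rebalance_reason(5_000, 0, 0, timings))
# Dead process: heartbeats stopped 20 seconds ago.
print(rebalance_reason(20_000, 0, 15_000, timings))
# Stuck processing: still heartbeating, but no poll() for 400 seconds.
print(rebalance_reason(400_000, 395_000, 0, timings))
```

The third case is the sneaky one: the app is alive and heartbeating, yet it still triggers a rebalance because it is too slow to come back to poll().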

Heartbeats

Each consumer maintains a heartbeat to the group coordinator - sending a heartbeat every heartbeat_interval_ms.

This is the main way that the coordinator communicates the need for a rebalance to the consumer.

For example, when a consumer restarts and a rebalance is needed, how do the other consumers find out?

The Coordinator returns an error in its response to their next heartbeat request.

Example rebalance sequence due to a consumer restarting.

The lower the heartbeat_interval_ms setting, the sooner your consumers learn about a rebalance → the sooner the group completes it.
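Here is a toy model of that exchange in plain Python. The class and its methods are hypothetical and heavily simplified (no generations, no timeouts); only the error names REBALANCE_IN_PROGRESS, UNKNOWN_MEMBER_ID and NONE are borrowed from the Kafka protocol.

```python
class ToyCoordinator:
    """Tracks group members and which of them still need to re-join."""

    def __init__(self):
        self.members = set()
        self.pending_rejoin = set()

    def join(self, member_id):
        if member_id not in self.members:
            # A new member triggers a rebalance: everyone else must re-join.
            self.pending_rejoin = set(self.members)
            self.members.add(member_id)
        else:
            # An existing member re-joining as part of the rebalance.
            self.pending_rejoin.discard(member_id)

    def heartbeat(self, member_id):
        if member_id not in self.members:
            return "UNKNOWN_MEMBER_ID"
        if member_id in self.pending_rejoin:
            return "REBALANCE_IN_PROGRESS"
        return "NONE"


coordinator = ToyCoordinator()
coordinator.join("consumer-a")
print(coordinator.heartbeat("consumer-a"))  # NONE

coordinator.join("consumer-b")              # a second consumer shows up
print(coordinator.heartbeat("consumer-a"))  # REBALANCE_IN_PROGRESS

coordinator.join("consumer-a")              # consumer-a re-joins the group
print(coordinator.heartbeat("consumer-a"))  # NONE
```

In this model consumer-a only finds out about the rebalance on its next heartbeat, which is why a shorter interval shortens the time it takes the group to react.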

More Kafka? 🔥

Those are the basics that fit in 2 minutes. Here are some more consumer group concepts that we have covered:

💃 Visualization of the rebalance dance:

"SyncGroup + JoinGroup", a video by Stanislav Kozlovski on Vimeo:
vimeo.com/843832143?share=copy

🦩 Consumer Group Rebalance Dance (JoinGroup & SyncGroup)

Kafka Consumer Group Rebalance Visualized
It's a dance consisting of two requests: 💃
1. JoinGroup
2. SyncGroup
And once established, consumers regularly heartbeat to the group coordinator to signal they’re alive.
The whole group goes through the two steps sequentially.
❤️… twitter.com/i/web/status/1…
— Stanislav Kozlovski (@BdKozlovski)
3:03 PM • Jul 9, 2023

Read carefully:
Understanding Kafka's consumer group rebalance requires focus!
The protocol is very intricate. It was intentionally designed with clear separation of concerns:
🟣 the broker knows about group membership & subscribed topics
🟣 the consumers know about partition… twitter.com/i/web/status/1…
— Stanislav Kozlovski (@BdKozlovski)
2:13 PM • Jun 29, 2023

🦜 Eager vs Cooperative-Incremental Rebalances

Regular consumer group rebalances in Kafka can be pretty disruptive...
All consumers:
1. Stop consuming in order to give up their partition ownership
2. Re-join the group via the JoinGroup request
3. Receive a brand new partition assignment via the SyncGroup request, only once… twitter.com/i/web/status/1…
— Stanislav Kozlovski (@BdKozlovski)
2:24 PM • Jul 6, 2023

🦆 Dynamic vs Static Group Membership

The highest ROI config tweak you can make to your Kafka Consumer?
group_instance_id
The default consumer group configuration in Kafka can be a pain. Here are two main problems:
🤕 1. every consumer group rebalance stops the world - consumers stop reading from partitions until… twitter.com/i/web/status/1…
— Stanislav Kozlovski (@BdKozlovski)
6:24 PM • Jul 3, 2023