- 2 Minute Streaming
- Posts
- Kafka Consumer Groups Basics
Kafka Consumer Groups Basics
🐥 the most basic 101 + extra resources
Consumer Groups
A consumer is an application that leverages the Kafka client library (in particular, its KafkaConsumer
class) to consume messages from Kafka and do something with them.
A Kafka topic may have more messages than a single app would ever be able to process - so we need to scale the consumption up.
Enter Consumer Groups.
A consumer group is a collection of consumer applications that work in tandem to consume from the same topic(s).
The apps don’t talk to each other - they have a broker put them in the same consumer group as others via the client-configured group_id
config.
Coordination
When we have N consumers in the same group, we need to make sure they’re coordinating their work appropriately.
At any one time, the goal is to have one consumer reading from a partition - you wouldn’t want two reading duplicate records from the same partition.
This is done through the consumer group protocol:
consumer clients join the group before consuming anything.
to join the group, they talk to a specific broker - called the Group Coordinator.
The Group Coordinator’s job is to maintain the membership of the group - what consumers are part of it. Because we have everything group-related owned by one broker, it becomes possible to avoid these duplicate reads in the happy path.
This Group Coordinator broker is also the one that helps the consumers save their offsets — also called offset commits. This data is stored in an internal topic named __consumer_offsets
.
This topic also contains metadata about the group, so that the coordinator can fail over appropriately to another broker if the original one dies.
Consumer Group Rebalance
We rebalance when we want to move partition ownership from one consumer to another.
A consumer group rebalance is the act of having every member re-join the group. Through this process, each consumer receives the assigned partitions it should consume from.
There are 6 reasons why a group can be forced to rebalance:
a consumer joins the group (sends a JoinGroup request).
a consumer shuts down gracefully.
it leaves the group via a LeaveGroup request
max_poll_interval_ms passing between Consumer#poll() calls.
a consumer dies. (its heartbeat requests time out after session_timeout_ms)
the Consumer#enforceRebalance API is called.
a new partition is added to a topic that the group is subscribed to.
Heartbeats
Each consumer maintains a heartbeat to the group coordinator - sending a heartbeat every heartbeat_interval_ms.
This is the main way that the coordinator communicates the need for a rebalance to the consumer.
i.e a consumer restarts and a rebalance is needed - how do the other consumers realize this?
The Coordinator responds with an error in the heartbeat request.
Example rebalance sequence due to a consumer restarting.
The lower the interval setting, the faster your consumers will react to rebalances → the faster they’ll complete them.
Liked this?
Help support our growth so that we can continue to deliver valuable content!
More Kafka? 🔥
Those are the basics that fit in 2 minutes. Here are some more consumer group concepts that we have covered:
Kafka Consumer Group Rebalance Visualized
Its a dance consisting of two requests: 💃
1. JoinGroup
2. SyncGroupAnd once established, consumers regularly heartbeat to the group coordinator to signal they’re alive.
The whole group goes through the two steps sequentially.
❤️… twitter.com/i/web/status/1…
— Stanislav Kozlovski (@BdKozlovski)
3:03 PM • Jul 9, 2023
Read carefully:
Understanding Kafka's consumer group rebalance requires focus!
The protocol is very intricate. It was intentionally designed with clear separation of concerns:
🟣 the broker knows about group membership & subscribed topics
🟣 the consumers know about partition… httptwitter.com/i/web/status/1…p— Stanislav Kozlovski (@BdKozlovski)
2:13 PM • Jun 29, 2023
Regular consumer group rebalances in Kafka can be pretty disruptive...
All consumers:
1. Stop consuming in order to give up their partition ownership
2. Re-join the group via the JoinGroup request
3. Receive a brand new partition assignment via the SyncGroup request, only once… twitter.com/i/web/status/1…— Stanislav Kozlovski (@BdKozlovski)
2:24 PM • Jul 6, 2023
The highest ROI config tweak you can make to your Kafka Consumer?
group_instance_id
The default consumer group configuration in Kafka can be a pain. Here are two main problems:
🤕 1. every consumer group rebalance stops the world - consumers stop reading from partitions until… twitter.com/i/web/status/1…
— Stanislav Kozlovski (@BdKozlovski)
6:24 PM • Jul 3, 2023
More Content ⚡️💥
bad news: the letter took longer to post this time.
good news: because of that, we have more content to share!
We have quite a few interesting threads on different topics. I’m sharing the Twitter links solely because they allow more characters than LinkedIn.
Long, very good summary from many sources about this infamous outage. If you read anything out of this issue - I recommend this.
It’s a nice, refreshing spring day.
You open up your laptop in the morning and go straight to that design document you started last night.
You are excited to finish it before the meetings of the day start coming in.
Next, you plan to fix the critical Jira ticket.
Suddenly:… twitter.com/i/web/status/1…
— Stanislav Kozlovski (@BdKozlovski)
2:17 PM • Jul 7, 2023
✍️ Sketch Algorithms - 90% less memory, 4x less CPU and cost for roughly the same result.
Stochastic, Sublinear Streaming Algorithms.
What?
Stochastic. Sublinear. Streaming. Algorithms.
... What?
OK - let's start with the problem first 👇
You have a Kafka topic with billions of records representing latency metrics.
You want to compute their p95, p99, p999.
How?
— Stanislav Kozlovski (@BdKozlovski)
3:06 PM • Jun 24, 2023
🛡️ How Wise configured mTLS on Kafka in an easy way - Envoy + Spire
mTLS is one of the most secure ways you can configure your Apache Kafka cluster.
But. It's not the simplest.
Here is how Wise did it (without the hassle).
Let’s first define the terms:
☂️ TLS - is a protocol for encrypting the data before it travels over the wire, so that no… twitter.com/i/web/status/1…
— Stanislav Kozlovski (@BdKozlovski)
8:52 AM • Jun 28, 2023
Last week I officially made 5 years at @confluentinc
Super grateful for the chance to join so early. It’s the company that shaped my career as an engineer, and also my life as an individual to some extent (moving abroad, etc).
— Stanislav Kozlovski (@BdKozlovski)
5:41 PM • Jul 1, 2023
The great thing about Sketches?
They’re infinitely parallelizable. 🧬
Sketch Algorithms possess one very valuable quality - and that is their additivity.
Most algorithms allow you to combine their results together to generate a brand new sketch over the sum of all of the… twitter.com/i/web/status/1…
— Stanislav Kozlovski (@BdKozlovski)
5:34 PM • Jul 4, 2023
You want to solve the cardinality problem with 80 megabytes worth of unique IDs.
How do you do it using just 12 kilobytes?
Simple - read the story of how Reddit did it: 👇
In 2017, Reddit wanted to better communicate the scale of its communities to its users.
The easiest way… twitter.com/i/web/status/1…
— Stanislav Kozlovski (@BdKozlovski)
7:50 AM • Jul 1, 2023
Apache®, Apache Kafka®, Kafka, and the Kafka logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.