Announcing Apache Kafka 4.0
see the top 3 features, and some trivia around other major releases

🥳 Kafka 4.0 Just Released
Every few years, like a well-timed event stream, a major Kafka release hits Maven.
3.0 was released in September 2021. It’s been exactly 3.5 years since then.
To celebrate this epoch, I will quickly summarize the top features from 4.0, do a little retrospection and a little futurespection.
🏆 Top 3 Kafka 4.0 Features
40 KIPs made it to this release! Here are 3 major ones worth calling out:
1. KIP-848 goin’ GA
The new consumer group protocol is officially production-ready. 💥
It completely overhauls consumer rebalances by moving the logic to the broker and avoiding the stop-the-world effect (where all consumers had to pause when a new one came in).
I covered it in 2 minutes here (including a step-by-step video):
Noteworthy is that in 4.0, the feature is GA and enabled in the broker by default.
The consumer client default is still the old one, though. To opt in, the consumer needs to set group.protocol=consumer.
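As a minimal sketch of that opt-in, here is the setting expressed as a plain client config dict. The helper name consumer_config, the broker address and the group id are made up for illustration; only the group.protocol key and its consumer value come from the text above. With the Java client, the same key/value pair would go into the Properties handed to KafkaConsumer.

```python
def consumer_config(bootstrap_servers, group_id, use_new_protocol=True):
    """Build a consumer config dict; only 'group.protocol' relates to KIP-848."""
    config = {
        "bootstrap.servers": bootstrap_servers,
        "group.id": group_id,
    }
    if use_new_protocol:
        # Opt in to the new broker-side rebalance protocol (KIP-848).
        # Leaving this key unset keeps the client on the old protocol.
        config["group.protocol"] = "consumer"
    return config


# Hypothetical broker address and group id, for illustration only.
new_style = consumer_config("localhost:9092", "orders-service")
old_style = consumer_config("localhost:9092", "orders-service", use_new_protocol=False)
print(new_style)
print(old_style)
```

Keeping the switch in one place like this makes it easy to roll the new protocol out to one consumer group at a time.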
2. KIP-932: Queues (EA) 🚇
Perhaps the hottest new feature, Queues introduces a new type of consumer group - the Share Consumer - that gives you queue-like semantics:
per-message acknowledgement/retries
ability to have many consumers collaboratively share progress reading from the same partition (previously, only one consumer per consumer group could read a partition at any time)
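To make those two properties concrete, here is a toy in-memory model. It is not Kafka's API and not how the broker implements share groups; the class ToySharePartition, its methods and the max_deliveries limit are all hypothetical names chosen for this sketch. It only illustrates the semantics: any consumer can pick up the next record from the same partition, and each record is acknowledged or released for retry on its own.

```python
from collections import deque


class ToySharePartition:
    """Toy in-memory model of share-group delivery. Not Kafka's API."""

    def __init__(self, records, max_deliveries=3):
        # Each pending entry is (record, number of deliveries so far).
        self.pending = deque((record, 0) for record in records)
        self.in_flight = {}   # record -> (consumer name, delivery count)
        self.done = []        # records that were acknowledged successfully
        self.dead = []        # records that ran out of delivery attempts
        self.max_deliveries = max_deliveries  # hypothetical retry limit

    def acquire(self, consumer):
        """Hand the next available record to whichever consumer asks."""
        if not self.pending:
            return None
        record, deliveries = self.pending.popleft()
        self.in_flight[record] = (consumer, deliveries + 1)
        return record

    def accept(self, record):
        """Per-record acknowledgement: this record is finished."""
        del self.in_flight[record]
        self.done.append(record)

    def release(self, record):
        """Processing failed: make the record available for redelivery."""
        _consumer, deliveries = self.in_flight.pop(record)
        if deliveries >= self.max_deliveries:
            self.dead.append(record)
        else:
            self.pending.append((record, deliveries))


partition = ToySharePartition(["a", "b", "c"])
first = partition.acquire("consumer-1")    # "a"
second = partition.acquire("consumer-2")   # "b": same partition, other consumer
partition.accept(first)
partition.release(second)                  # consumer-2 failed, "b" will be retried
third = partition.acquire("consumer-1")    # "c"
partition.accept(third)
retry = partition.acquire("consumer-1")    # "b" again, now on consumer-1
partition.accept(retry)
print(partition.done)
```

Note that the retried record finishes after a later one, so strict per-partition ordering is traded away for queue-style sharing.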

It’s also available in a nice, 2-minute read here:
3. RIP ZooKeeper 💀
It’s been a long time coming.
After 14 years of service… ZooKeeper is officially gone from Apache Kafka. 🥲
KRaft (KIP-500) completely replaces it today.
KRaft has been a long time in the making. The first pieces of KIP-500 (the core Raft implementation) were merged in AK 2.7 (September 2020) - four and a half years ago!
For more information about KRaft, check out one of these posts:

📝 Less-Interesting Changes
MirrorMaker1 is removed
The Transaction Protocol is strengthened
KRaft is strengthened via Pre-Vote
Java 8 support is removed
Log4j was updated to v2
The log message format config (message.format.version) and versions v0 and v1 are deleted (you’re a 👴🏼 if you know these)
Talk is Cheap. Show Me The Code
Counting just Java, Scala and Python…
There are around 1.4 million lines of code in Kafka 4.0.
1,397,430 to be precise 😛
That is… a lot of code when you think about it.
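If you want to reproduce a count like this yourself, one rough way is to walk a checkout and sum raw lines per file extension. This is a sketch under assumptions: I am not claiming this is the exact method behind the number above, the function name count_lines_by_extension is made up, and it counts every line (blank lines and comments included), so a dedicated line-counting tool will report different totals.

```python
import os
from collections import Counter


def count_lines_by_extension(root, extensions=(".java", ".scala", ".py")):
    """Count raw lines per file extension under root (blanks and comments included)."""
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            extension = os.path.splitext(name)[1]
            if extension not in extensions:
                continue
            path = os.path.join(dirpath, name)
            # Read as bytes so odd encodings cannot break the count.
            with open(path, "rb") as handle:
                totals[extension] += sum(1 for _line in handle)
    return totals
```

Calling count_lines_by_extension on the path of a local Kafka checkout returns a Counter keyed by extension; summing its values gives a single total.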
Here is how it has evolved:


We see a reduction of Scala code, which is expected.
All new major features (KRaft, Tiered Storage, Consumer Groups v2 and Queues) are written exclusively in Java.
Old features, written in Scala, eventually get deleted (e.g. the ZooKeeper integration).
🏆 Retro
There have only been 3 other major Kafka releases. See how Kafka has evolved since:

Kafka only ever peaked at 206k lines of Scala 🥲
(Scala fans will say this is equivalent to 1M lines of Java)
Language wars aside, there is a more important metric:

The number of people contributing to Kafka keeps going up!
Kafka’s community keeps growing! This is the metric I’m most excited about, and the one you should be most excited about too, because it points to project longevity.
BTW, I recently did a community spotlight on Kafka contributors from Taiwan.
I was really surprised to see we have 6+ high-impact contributors just from there! Check it out here on my Substack:
Getting back to Kafka, let’s do a quick retro on the top features from previous major versions:
1.0 (Nov 2017)
the StreamsBuilder API was added (you’ll know if you’ve ever used KStreams)
lots of metrics added
Java 9 support added. This helped TLS perf a ton
JBOD mode was improved (a broker would previously die if a single disk went offline)
2.0 (July 2018)
KIP-290: the ability to add prefix-based ACLs
KIP-255: a framework for OAuth2 auth in Kafka
ability to update SSL certificates without restart
replication protocol strengthened
dropped Java 7 support
deleted old Scala consumer/producer clients
3.0 (Sep 2021)
changed Producer defaults to the strongest guarantees - acks=all, enable.idempotence=true
changed Consumer default session.timeout.ms from 10 seconds to 45 seconds (we really had some awful defaults didn’t we)
Java 8 was deprecated (3.5 years later, we finally get to remove it today)
Important KRaft improvements (snapshots, producer ID)
MirrorMaker1 is marked as deprecated (the code was deleted today)
a nicer API to restart connect tasks
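Default changes like the 3.0 ones only affect settings an application never set explicitly. The sketch below shows that idea. The pre-3.0 producer values (acks=1, idempotence off) are not stated in this post and are filled in from the Kafka documentation; the dict and function names are made up, and producer and consumer settings share one dict purely for brevity. Real clients also validate how settings interact, which this ignores.

```python
# Defaults before and since Kafka 3.0 for the settings mentioned above.
# Producer and consumer settings are mixed in one dict for brevity.
PRE_3_0_DEFAULTS = {
    "acks": "1",
    "enable.idempotence": "false",
    "session.timeout.ms": "10000",
}
SINCE_3_0_DEFAULTS = {
    "acks": "all",
    "enable.idempotence": "true",
    "session.timeout.ms": "45000",
}


def effective_config(defaults, overrides):
    """Explicit application settings always win over client defaults."""
    return {**defaults, **overrides}


def changed_by_upgrade(overrides):
    """Return {setting: (before, after)} for values that silently change."""
    before = effective_config(PRE_3_0_DEFAULTS, overrides)
    after = effective_config(SINCE_3_0_DEFAULTS, overrides)
    return {key: (before[key], after[key]) for key in before if before[key] != after[key]}


# Hypothetical app that pinned only the session timeout.
changes = changed_by_upgrade({"session.timeout.ms": "30000"})
print(changes)
```

Here the pinned timeout stays put, while acks and enable.idempotence move to the stronger values.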
I’m actually surprised how boring of a release this was. But history shows most major releases are. Recent large changes have all been introduced gradually and given multiple releases’ time to soak, get improved and tested.
🌌 The Future
Things have changed a lot since 2021, with major features going GA:
Tiered Storage (KIP-405)
KRaft (KIP-500)
The new consumer group protocol (KIP-848)
Now, the next chapter begins - Kafka 4.x. ⭐️
A lot of interesting ideas are already being worked on:
Others are still being discussed:
KIP-986: Cross-Cluster Replication - a sort of copy of Confluent’s Cluster Linking
KIP-1008: ParKa - the Marriage of Parquet and Kafka - Kafka writing directly in Parquet format (hello Iceberg?)
KIP-1134: Virtual Clusters in Kafka - first-class support for multi-tenancy in Kafka
💸 pro-tip: it costs nothing to join the mailing list and reply in any one of these discussion threads with what you think of the feature.
It helps the project by giving it a data point of user demand.
If any one of these features interests you in particular, you better notify us!
The only thing I’m waiting for is for someone to propose a KIP that makes Kafka topics leaderless and diskless (like WarpStream and Bufstream).

a brief overview of the leaderless/diskless/stateless Kafka wars
Here are a few interesting social media posts you may have missed:
Proprietary Kafka designs can save 90% of cluster costs by avoiding replication & writing directly to S3.
How hard would it actually be to extend the Open Source design to do this?
Surprisingly simple.
If you want to go the leaderless Kafka model - you'd probably need a full… x.com/i/web/status/1…
— Stanislav Kozlovski (@BdKozlovski)
3:47 PM • Feb 26, 2025

What if I told you that a 1 GiB/s Kafka topic streamed directly into your S3 data lakehouse as an Iceberg table could cost you... $10/hr?
Bufstream does it. It's literally too good to be true.
I spent 20 hours researching them. Here's their story (2 minute read) 🧵
— Stanislav Kozlovski (@BdKozlovski)
3:17 PM • Feb 14, 2025

insane numbers, you can’t get this setup with Kafka

The Tools 🛠️
At this point, the 2 Minute Streaming empire has expanded to offer a bunch of helpful tools:
🧮 Kafka Calculator (AKalculator) - calculate your Kafka costs on the cloud
💸 AWS Data Transfer calculator - simplest way to calculate the costs of transferring data in AWS (why doesn’t their native calculator have this???)
⚖️ Kafka Direct-to-S3 Break Even calculator - find the precise throughput after which writing directly to S3 would save your Kafka deployment money
(granted this tool isn’t really practical today because no open source solution exists)
Apache®, Apache Kafka®, Kafka, and the Kafka logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.