Announcing Apache Kafka 4.0

See the top 3 features and some trivia about other major releases

🥳 Kafka 4.0 Just Released

Every few years, like a well-timed event stream, a major Kafka release hits Maven.

3.0 was released in September 2021, exactly 3.5 years ago.

To celebrate this epoch, I will quickly summarize the top features in 4.0 and do a little retrospection and a little futurespection.

🏆 Top 3 Kafka 4.0 Features

40 KIPs made it to this release! Here are 3 major ones worth calling out:

1. KIP-848 goin’ GA

The new consumer group protocol is officially production-ready. 💥

It completely overhauls consumer rebalances by moving the logic to the broker and avoiding the stop-the-world effect (where all consumers had to pause whenever a member joined or left the group).

I covered it in 2 minutes here (including a step-by-step video):

Notably, in 4.0 the feature is GA and enabled on the broker by default.

The consumer client still defaults to the old (classic) protocol, though. To opt in, the consumer needs to set group.protocol=consumer.
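A minimal sketch of that opt-in, assuming the standard Java client's config keys. Only group.protocol=consumer comes from the text above; the bootstrap address, group id and String deserializers are placeholder assumptions. It only builds the Properties object, so it runs without Kafka on the classpath.

```java
import java.util.Properties;

public class NewProtocolConsumerConfig {

    // Builds consumer settings that opt in to the KIP-848 protocol.
    // bootstrap and groupId are placeholders supplied by the caller.
    public static Properties consumerProps(String bootstrap, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrap);
        props.setProperty("group.id", groupId);
        // Kafka 4.0 clients still default to "classic"; "consumer" opts in.
        props.setProperty("group.protocol", "consumer");
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        Properties props = consumerProps("localhost:9092", "my-group");
        // With kafka-clients on the classpath, pass props to
        // new KafkaConsumer<String, String>(props).
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Passing these to a KafkaConsumer against a 4.0 broker puts the consumer on the new protocol; leaving group.protocol unset keeps it on the classic one.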

2. KIP-932: Queues (EA) 🚇

Perhaps the hottest new feature, Queues introduces a new kind of group, the share group, whose members (Share Consumers) get queue-like semantics:

  1. per-message acknowledgement/retries

  2. ability to have many consumers collaboratively share progress reading from the same partition (previously, only one consumer per consumer group could read a partition at any time)
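To make those two properties concrete, here is a toy, in-memory model of one partition shared by several consumers. It is not Kafka's implementation or the Share Consumer API: the class, its methods and the delivery-attempt cap are hypothetical, chosen only to show records being handed to one consumer at a time, acknowledged individually, and redelivered when released.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of queue-like semantics on ONE partition.
// Not Kafka's implementation or API; all names here are hypothetical.
public class ToySharePartition {

    public enum State { AVAILABLE, ACQUIRED, ACKNOWLEDGED, ARCHIVED }

    private final List<State> states = new ArrayList<>();
    private final List<Integer> deliveries = new ArrayList<>();
    private final int maxDeliveries;

    public ToySharePartition(List<String> values, int maxDeliveries) {
        for (int i = 0; i < values.size(); i++) {
            states.add(State.AVAILABLE);
            deliveries.add(0);
        }
        this.maxDeliveries = maxDeliveries;
    }

    // Any consumer may call this. Each record goes to at most one consumer
    // at a time, so several consumers can read the same partition.
    public synchronized List<Integer> acquire(int max) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < states.size() && offsets.size() < max; i++) {
            if (states.get(i) == State.AVAILABLE) {
                states.set(i, State.ACQUIRED);
                deliveries.set(i, deliveries.get(i) + 1);
                offsets.add(i);
            }
        }
        return offsets;
    }

    // Per-message acknowledgement: marks a single record as done.
    public synchronized void accept(int offset) {
        states.set(offset, State.ACKNOWLEDGED);
    }

    // Per-message retry: the record becomes available again unless it
    // has used up its delivery attempts, in which case it is archived.
    public synchronized void release(int offset) {
        boolean exhausted = deliveries.get(offset) >= maxDeliveries;
        states.set(offset, exhausted ? State.ARCHIVED : State.AVAILABLE);
    }

    public synchronized State state(int offset) {
        return states.get(offset);
    }

    public static void main(String[] args) {
        ToySharePartition p = new ToySharePartition(List.of("a", "b", "c"), 2);
        List<Integer> c1 = p.acquire(2); // consumer 1 gets offsets 0 and 1
        List<Integer> c2 = p.acquire(2); // consumer 2 gets offset 2
        p.accept(c1.get(0));
        p.release(c1.get(1));            // offset 1 goes back for redelivery
        p.accept(c2.get(0));
        System.out.println(p.acquire(2)); // prints [1]
    }
}
```

Per KIP-932, real share groups add things this toy skips, such as time-limited acquisition locks and a configurable limit on delivery attempts.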

It’s also available in a nice, 2-minute read here:

3. RIP ZooKeeper 💀

It’s been a long time coming.

After 14 years of service… ZooKeeper is officially gone from Apache Kafka. 🥲

KRaft (KIP-500) completely replaces it today.

KRaft has been a long time in the making. The first pieces of KIP-500 (the core Raft implementation) were merged back in September 2020 and shipped in AK 2.7. That's four and a half years ago!

For more information about KRaft, check out one of these posts:

📝 Less-Interesting Changes

  • MirrorMaker1 is removed

  • The Transaction Protocol is strengthened

  • KRaft is strengthened via Pre-Vote

  • Java 8 support is removed

  • Log4j is upgraded to v2

  • The message format configs (message.format.version and the broker-level log.message.format.version) and message format versions v0 and v1 are removed (you’re a 👴🏼 if you know these)

Talk is Cheap. Show Me The Code

Counting just Java, Scala and Python…

There are around 1.4 million lines of code in Kafka 4.0.

1,397,430 to be precise 😛

That is… a lot of code when you think about it.
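Reproducing a number like this depends on the method, which isn't stated here. A rough sketch, assuming you count every physical line (blank and comment lines included) of .java, .scala and .py files under a source checkout; expect a somewhat different total from 1,397,430 depending on what you include.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CountLines {

    // Extensions counted: Java, Scala and Python.
    private static final Set<String> EXTENSIONS = Set.of("java", "scala", "py");

    // Counts physical lines (blank and comment lines included) per extension.
    public static Map<String, Long> count(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        Map<String, Long> totals = new TreeMap<>();
        for (Path file : files) {
            String name = file.getFileName().toString();
            int dot = name.lastIndexOf('.');
            String ext = dot < 0 ? "" : name.substring(dot + 1);
            if (!EXTENSIONS.contains(ext)) {
                continue;
            }
            // ISO-8859-1 maps every byte, so unusual encodings cannot throw here.
            try (Stream<String> lines = Files.lines(file, StandardCharsets.ISO_8859_1)) {
                totals.merge(ext, lines.count(), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        // Point this at a source checkout, e.g. java CountLines path/to/kafka
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        Map<String, Long> totals = count(root);
        long sum = totals.values().stream().mapToLong(Long::longValue).sum();
        System.out.println(totals + " total=" + sum);
    }
}
```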

Here is how it has evolved:

The amount of Scala code is shrinking, which is expected:

  • All new major features (KRaft, Tiered Storage, Consumer Groups v2 and Queues) are written exclusively in Java.

  • Old features, written in Scala, eventually get deleted (e.g. ZooKeeper)

🏆 Retro

There have only been 3 other major Kafka releases. Here is how Kafka has evolved across them:

Kafka only ever peaked at 206k lines of Scala 🥲
(Scala fans will say this is equivalent to 1M lines of Java)

Language wars aside, there is a more important metric:

The number of people contributing to Kafka keeps going up!

This is the metric I’m most excited about, and the one you should be most excited about too, because it points to project longevity.

BTW, I recently did a community spotlight on Kafka contributors from Taiwan.

I was really surprised to see we have 6+ high-impact contributors just from there! Check it out here on my Substack:

Getting back to Kafka, let’s do a quick retro on the top features from previous major versions:

1.0 (Nov 2017)

  • the StreamsBuilder API was added (you’ll know if you’ve ever used KStreams)

  • lots of metrics added

  • Java 9 support added. This helped TLS perf a ton

  • JBOD mode was improved (a broker would previously die if a single disk went offline)

2.0 (July 2018)

  • KIP-290: the ability to add prefix-based ACLs

  • KIP-255: a framework for OAuth2 auth in Kafka

  • ability to update SSL certificates without restart

  • replication protocol strengthened

  • dropped Java 7 support

  • deleted old Scala consumer/producer clients

3.0 (Sep 2021)

  • changed Producer defaults to the strongest guarantees - acks=all, enable.idempotence=true

  • changed Consumer default session.timeout.ms from 10 seconds to 45 seconds (we really had some awful defaults, didn’t we?)

  • Java 8 was deprecated

    • 3.5 years later, we finally get to remove it today

  • Important KRaft improvements (snapshots, producer ID)

  • MirrorMaker1 is marked as deprecated

    • the code was deleted today

  • a nicer API to restart Connect tasks
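If you run clients older than 3.0, or just want the intent visible in code, the new defaults can be pinned explicitly. A minimal sketch using the Java client's config keys; the bootstrap address, group id and serializer choices are placeholder assumptions, and the comments note the pre-3.0 defaults these values replaced.

```java
import java.util.Properties;

public class KafkaThreeDefaults {

    // Producer settings matching the Kafka 3.0 defaults, set explicitly.
    public static Properties producerProps(String bootstrap) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrap);
        props.setProperty("acks", "all");                // pre-3.0 default: 1
        props.setProperty("enable.idempotence", "true"); // pre-3.0 default: false
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    // Consumer settings with the 3.0 session timeout made explicit.
    public static Properties consumerProps(String bootstrap, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrap);
        props.setProperty("group.id", groupId);
        props.setProperty("session.timeout.ms", "45000"); // pre-3.0 default: 10000
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(producerProps("localhost:9092"));
        System.out.println(consumerProps("localhost:9092", "my-group"));
    }
}
```

session.timeout.ms belongs to the classic group protocol; the KIP-848 protocol moves session timeouts to the broker side, so it is best left out of a consumer that sets group.protocol=consumer.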

I’m actually surprised at how boring a release this was. But history shows that most major releases are. Recent large changes have all been introduced gradually and given several releases to soak, improve and get tested.

🌌 The Future

Things have changed a lot since 2021, with major features going GA:

  • Tiered Storage (KIP-405)

  • KRaft (KIP-500)

  • The new consumer group protocol (KIP-848)

Now, the next chapter begins - Kafka 4.x. ⭐️

A lot of interesting ideas are already being worked on:

Others are still being discussed:

💸 pro-tip: it costs nothing to join the mailing list and reply in any one of these discussion threads with what you think of the feature.

It helps the project by giving it a data point of user demand.

If any one of these features interests you in particular, you’d better let us know!

The only thing I’m waiting for is for someone to propose a KIP that makes Kafka topics leaderless and diskless (like WarpStream and Bufstream).

a brief overview of the leaderless/diskless/stateless Kafka wars

🗣 The Socials

Here are a few interesting social media posts you may have missed:

  • 🎖️ My own Napkin Math MVP design of a direct-to-S3 Kafka architecture

  • 🤩 On the same topic, announcing a tool to help you calculate at what throughput the direct-to-S3 approach starts saving you money

  • 💪 An introductory in-depth post on Bufstream (the new kid on the leaderless stateless Kafka block)

  • 🤯 Another Bufstream post where they announced insane 100 GiB/s cross-REGIONAL numbers (new kid has got some skills)

insane numbers, you can’t get this setup with Kafka

  • ☄️ 14 reasons why Tiered Storage makes Kafka simpler, better and cheaper. This is an amazing post that really got slept on. 😴

The Tools 🛠️

At this point, the 2 Minute Streaming empire has expanded to offer a bunch of helpful tools:

Apache®, Apache Kafka®, Kafka, and the Kafka logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.