KIP-392: Fetch From Follower
Why enabling this feature can save you 50% of your Apache Kafka deployment's total cost in the cloud
The Fetch Problem
Kafka is predominantly deployed across multiple data centers (or AZs in the cloud) for availability and durability purposes.
Kafka Consumers read from the leader replica.
But, in most cases, that leader will be in a separate data center. ❗️
In distributed systems, it is best practice to process data as locally as possible. The benefits are:
📉 better latency - your request has less distance to travel
💸 (massive) cloud cost savings from not sending data across availability zones
Cost
Any production Kafka environment spans at least three availability zones (AZs), which results in Kafka racking up a lot of cross-zone traffic.
Assuming even distribution:
2/3 of all producer traffic
all replication traffic
2/3 of all consumer traffic
will cross zone boundaries. 😱
network cost example of 100MiB/s produce / 300 MiB/s consume on AWS
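The fractions above can be turned into a quick back-of-the-envelope estimate. This is a minimal sketch, assuming three zones, a replication factor of 3 with one replica per zone, and evenly spread clients and leaders; the function name and parameters are made up for illustration.

```python
def cross_zone_traffic(produce_mib_s, consume_mib_s, zones=3, replication_factor=3):
    """Estimate cross-zone throughput in MiB/s under an even-distribution assumption."""
    # Chance that a partition leader sits in a different zone than the client.
    remote_fraction = (zones - 1) / zones
    producer = produce_mib_s * remote_fraction
    # Assumption: one replica per zone, so every follower copy crosses a zone boundary.
    replication = produce_mib_s * (replication_factor - 1)
    consumer = consume_mib_s * remote_fraction
    return {
        "producer": producer,
        "replication": replication,
        "consumer": consumer,
        "total": producer + replication + consumer,
    }


traffic = cross_zone_traffic(produce_mib_s=100, consume_mib_s=300)
for name, mib_s in traffic.items():
    print(f"{name:>11}: {mib_s:6.1f} MiB/s")
```

With 100 MiB/s produce and 300 MiB/s consume, consumer traffic alone accounts for 200 MiB/s of cross-zone data in this model.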
Cloud providers charge you egregiously for cross-zone networking.
Azure: Free. 🤩
GCP: $0.01/GiB, charged at the source
AWS: $0.02/GiB, charged 50% at the source & 50% at the destination
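To get a feel for what these prices mean, here is a rough sketch that converts a sustained cross-zone throughput into a monthly bill. The 30-day month and the 100 MiB/s produce / 300 MiB/s consume workload with replication factor 3 are assumptions, and the AWS price is the combined source plus destination charge.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: a 30-day month

# Cross-zone prices per GiB as listed above (AWS: source + destination combined).
PRICE_PER_GIB = {"Azure": 0.00, "GCP": 0.01, "AWS": 0.02}


def monthly_cross_zone_cost(cross_zone_mib_s, price_per_gib):
    """Cost in dollars of sustaining `cross_zone_mib_s` MiB/s across zones for a month."""
    gib_per_month = cross_zone_mib_s * SECONDS_PER_MONTH / 1024
    return gib_per_month * price_per_gib


# Assumed workload: 2/3 of produce, 2 follower copies, 2/3 of consume cross zones.
cross_zone_mib_s = 100 * 2 / 3 + 100 * 2 + 300 * 2 / 3

for provider, price in PRICE_PER_GIB.items():
    cost = monthly_cross_zone_cost(cross_zone_mib_s, price)
    print(f"{provider}: ${cost:,.0f} per month")
```

Under these assumptions the AWS bill comes out at roughly $23.6k a month for networking alone.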
How do we fix this?
There is no fundamental reason why the Consumer wouldn’t be able to read from the follower replicas in the same AZ.
💡 The log is immutable, so once written - the data isn’t subject to change.
Enter KIP-392.
KIP-392
⭐️ the feature: consumers read from follower brokers.
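The feature is pluggable: you can supply custom logic that the leader broker uses to choose the right replica for each consumer. The implementation that ships with Kafka, RackAwareReplicaSelector, chooses a replica in the same rack as the consumer. It has to be enabled explicitly, because without a configured selector the broker keeps returning the leader.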
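As a rough illustration, these are the settings typically involved. The property names (`broker.rack`, `replica.selector.class`, `client.rack`) are real Kafka configs, although `broker.rack` is not mentioned in this post; the zone name and the small `to_properties` helper are placeholders invented for this sketch.

```python
# Broker side (server.properties): tell Kafka where the broker lives
# and which selector to use when picking a preferred read replica.
BROKER_CONFIG = {
    "broker.rack": "us-east-1a",  # placeholder zone name
    "replica.selector.class": "org.apache.kafka.common.replica.RackAwareReplicaSelector",
}

# Consumer side: tell the broker which rack (zone) the consumer runs in.
CONSUMER_CONFIG = {
    "client.rack": "us-east-1a",  # placeholder, should match a broker.rack value
}


def to_properties(config):
    """Render a config dict in Java .properties format (hypothetical helper)."""
    return "\n".join(f"{key}={value}" for key, value in config.items())


print(to_properties(BROKER_CONFIG))
print(to_properties(CONSUMER_CONFIG))
```

The rack values only need to be consistent with each other; using the availability zone name is a common convention, not a requirement.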
Despite the data living closer, fetching the latest data from a follower actually comes with slightly higher latency. The follower only learns the leader's high watermark from a later fetch response, so there is a delay before the follower can “reveal” a new record to the consumer.
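The lag can be seen in a toy model. This is a heavily simplified sketch, not Kafka's actual replication code: the class names are invented, offsets stand in for records, and the high watermark is taken as the smallest offset replicated by every follower.

```python
class Leader:
    def __init__(self, follower_ids):
        self.log_end_offset = 0
        self.high_watermark = 0
        # What the leader knows about each follower's progress.
        self.follower_offsets = {fid: 0 for fid in follower_ids}

    def append(self, count=1):
        self.log_end_offset += count

    def handle_fetch(self, follower_id, fetch_offset):
        # The fetch offset tells the leader how far this follower has replicated.
        self.follower_offsets[follower_id] = fetch_offset
        self.high_watermark = min(self.log_end_offset, *self.follower_offsets.values())
        # The response carries new data plus the leader's current high watermark.
        return self.log_end_offset, self.high_watermark


class Follower:
    def __init__(self, follower_id, leader):
        self.follower_id = follower_id
        self.leader = leader
        self.log_end_offset = 0
        self.high_watermark = 0

    def fetch(self):
        leader_end, leader_hw = self.leader.handle_fetch(self.follower_id, self.log_end_offset)
        self.log_end_offset = leader_end
        self.high_watermark = min(leader_hw, self.log_end_offset)


leader = Leader(["a", "b"])
a, b = Follower("a", leader), Follower("b", leader)

leader.append()        # one new record on the leader
a.fetch(); b.fetch()   # both followers copy the record
a.fetch(); b.fetch()   # both report the new offset; the leader's high watermark advances

lag_before = leader.high_watermark - a.high_watermark
print("leader HW:", leader.high_watermark, "follower a HW:", a.high_watermark)

a.fetch()              # only now does follower "a" learn the new high watermark
print("follower a HW after one more fetch:", a.high_watermark)
```

In this model the record is already readable on the leader while follower "a" still hides it, and it takes one more fetch round trip for the follower to catch up.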
How it Works 👇
The client sends its configured client.rack to the broker in each fetch request.
For each partition the broker leads, it uses its configured replica.selector.class to choose what the PreferredReadReplica for that partition should be and returns it in the response (without any extra record data).
The consumer will connect to the follower and start fetching from it for that partition 🙌
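The selection step can be sketched roughly as follows. This is a simplified Python approximation of rack-aware selection, not the actual Java implementation; the `Replica` type and function name are hypothetical, and the replica list is assumed to contain only in-sync replicas.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Replica:
    broker_id: int
    rack: Optional[str]
    log_end_offset: int
    is_leader: bool = False


def select_preferred_read_replica(client_rack: Optional[str], replicas: List[Replica]) -> Replica:
    """Pick a replica in the client's rack, falling back to the leader."""
    leader = next(r for r in replicas if r.is_leader)
    if not client_rack:
        return leader  # client did not send client.rack
    same_rack = [r for r in replicas if r.rack == client_rack]
    if not same_rack:
        return leader  # no replica lives in the client's rack
    # Assumption: prefer the most caught-up replica if several share the rack.
    return max(same_rack, key=lambda r: r.log_end_offset)


replicas = [
    Replica(broker_id=1, rack="az-a", log_end_offset=100, is_leader=True),
    Replica(broker_id=2, rack="az-b", log_end_offset=100),
    Replica(broker_id=3, rack="az-c", log_end_offset=98),
]

print(select_preferred_read_replica("az-b", replicas).broker_id)  # follower in the same rack
print(select_preferred_read_replica("az-d", replicas).broker_id)  # unknown rack, leader
print(select_preferred_read_replica(None, replicas).broker_id)    # no rack sent, leader
```

Falling back to the leader keeps consumers working when they run in a zone that holds no replica of the partition.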
KIP-392 in Action
🤑 The Savings
KIP-392 can basically eliminate ALL of the consumer networking costs.
same AWS cluster setup, this time with KIP-392 enabled
Consumer traffic is always a significant chunk of the total networking costs. 💡
The higher the fanout, the higher the savings:
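The relationship between fanout and savings can be estimated with a small model. This sketch assumes three zones, replication factor 3 with one replica per zone, evenly spread clients, and that fetching from followers removes all consumer cross-zone traffic; the function name is made up for illustration.

```python
def consumer_share_of_cross_zone(fanout, zones=3, replication_factor=3):
    """Fraction of cross-zone traffic that comes from consumers.

    `fanout` is the consume-to-produce throughput ratio. All terms are
    expressed relative to one unit of produce traffic.
    """
    remote_fraction = (zones - 1) / zones
    producer = remote_fraction
    replication = replication_factor - 1  # assumption: one replica per zone
    consumer = fanout * remote_fraction
    return consumer / (producer + replication + consumer)


for fanout in (1, 3, 5, 10):
    share = consumer_share_of_cross_zone(fanout)
    print(f"fanout {fanout:>2}x: {share:.0%} of cross-zone traffic removed")
```

With the defaults the expression simplifies to fanout / (4 + fanout), so the savings keep growing with fanout but never reach 100%, because producer and replication traffic still cross zones.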
Support Table
Released in AK 2.4 (October 2019), this feature is 5+ years old yet there is STILL no wide support for it in the cloud:
I would never have expected MSK to have led the way here, especially by 3 years. 👏
They’re the least incentivized out of all the providers to do so - they make money off of cross-zone traffic.
Speaking of which… why aren’t any of these providers offering pricing discounts when FFF is used? 🤔
What more did we post on other platforms?
🤑 The Story of How Confluent Bought WarpStream for $220 MILLION (…after just 13 months of operation…)
An expensive Kafka cluster sells for $1M.
Cheap Kafka sells for … $220M
The story of how Confluent acquired WarpStream after just 13 months of operations 👇
— Stanislav Kozlovski (@BdKozlovski)
2:51 PM • Nov 2, 2024
✨ Apache Kafka 3.9 Release Summary
Apache Kafka 3.9.0 was just released! 🔥
The LAST ZooKeeper release 🫡 🥲
What comes with this new release?
Here are the top features you should know about:
(2-minute read) 🧵
— Stanislav Kozlovski (@BdKozlovski)
5:49 PM • Nov 9, 2024
😡 Why is Kafka in the Cloud so ridiculously EXPENSIVE?
A Kafka in the cloud doing 30MB/s costs more than $110,000 a year.
A $1,000 laptop can do 10x that.
Where did we go wrong? 👇
The Cloud. Namely - its absurd networking charges 👎
Let’s break it down simply:
• AWS charges you $0.01/GB for data crossing AZs (but in the same… x.com/i/web/status/1…
— Stanislav Kozlovski (@BdKozlovski)
2:52 PM • Sep 14, 2024
😱 Kafka HDD Costs on AWS vs Hetzner
Most people think the cloud saves them money.
Not with Kafka. ❌
Storage costs alone are 32 times more expensive than what they should be.
Even a minuscule cluster costs hundreds of thousands of dollars!
Let’s run the numbers. 🤓
Assume a small Kafka cluster consisting… x.com/i/web/status/1…
— Stanislav Kozlovski (@BdKozlovski)
2:53 PM • Sep 29, 2024
🤘 Bare Metal Kafka w/ AWS network link cost savings
Instead of spending $278k a year to manage Kafka in the cloud, this user is paying 4x less - $72.7k annually. 🔥
Over the 7 years since deployment, they would have spent $1.94M if running in AWS at retail prices.
Instead, they paid ~$509k to host it themselves in a data center… x.com/i/web/status/1…
— Stanislav Kozlovski (@BdKozlovski)
1:00 PM • Oct 6, 2024
🤫 5 Log Details that you probably didn’t know about
5 Apache Kafka Log Details that you probably didn’t know about 👇
(1 min read)
💡1. Log retention time is based on the RECORD's timestamp.
A producer can send a record with a timestamp of 01-01-1999 and Kafka will evaluate the retention time of that partition’s log via the… x.com/i/web/status/1…
— Stanislav Kozlovski (@BdKozlovski)
5:14 PM • Oct 23, 2024
More Content?
Make sure to follow me on all mediums to not miss anything:
Big Data Stream - my long-form newsletter on Substack
BdKozlovski - my X profile
/in/stanislavkozlovski - my LinkedIn profile
Apache®, Apache Kafka®, Kafka, and the Kafka logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.