no one will tell you the real cost of Kafka
Is Kafka really that expensive?
The open source streaming space is built on transparency, so when I see something wrong - I feel obliged to call it out.
WarpStream has been hailed for delivering a much cheaper Kafka, and allegedly they prove it through a pricing calculator on their website.
Like most Kafka providers, they build their brand on talking about how Kafka sucks and can’t be fixed.
Does Kafka Suck?
Their architecture is innovative - no doubt. It does allow for a cheaper deployment, because it writes directly to S3. That avoids all inter-zone networking and broker disks.
The WarpStream Architecture
So WarpStream has the design that should be cheaper. But after you account for the company’s cost - it’s actually not cheaper.
They parrot it being 10x cheaper than Kafka, but the sole reason is that they massively inflate Kafka’s cost in their calculator.
Luckily, the calculator is client-side JavaScript.
I reviewed the code in depth and found a ton of serious (intentional?) issues.
🙅‍♂️ Calculator Flaws
The full explanation is too long and nuanced to capture in 2 minutes.
I wrote up the full story of how I found out on my long-form blog - Big Data Stream. It’s investigative journalism at its finest. 👌
Here I’ll capture the summary:
The end result?
It deploys a 276-broker Kafka cluster where each broker runs on an r4.4xlarge (16 vCPU / 122 GiB RAM) with a 16 TiB gp2 SSD (supporting 16,000 IOPS & 250 MiB/s) to serve… 3.7 MiB/s of producer traffic. 🤯
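To put that over-provisioning in perspective, here is a quick back-of-the-envelope sketch. It uses only the figures quoted above; the function and constant names are mine, and it ignores replication and consumer traffic, so treat it as an illustration rather than a sizing model.

```python
# Figures quoted in the post for the deployment the calculator produces.
BROKERS = 276
VCPU_PER_BROKER = 16          # r4.4xlarge
RAM_GIB_PER_BROKER = 122      # r4.4xlarge
DISK_TIB_PER_BROKER = 16      # gp2 volume size
DISK_MIBPS_PER_BROKER = 250   # gp2 throughput ceiling
PRODUCE_MIBPS = 3.7           # producer traffic the cluster has to serve


def cluster_totals() -> dict:
    """Aggregate resources across all brokers."""
    return {
        "vcpus": BROKERS * VCPU_PER_BROKER,
        "ram_gib": BROKERS * RAM_GIB_PER_BROKER,
        "disk_tib": BROKERS * DISK_TIB_PER_BROKER,
        "disk_mibps": BROKERS * DISK_MIBPS_PER_BROKER,
    }


def produce_kibps_per_broker() -> float:
    """Producer traffic each broker sees if load is spread evenly (assumption)."""
    return PRODUCE_MIBPS * 1024 / BROKERS


def disk_throughput_headroom() -> float:
    """Aggregate disk throughput divided by producer traffic."""
    return cluster_totals()["disk_mibps"] / PRODUCE_MIBPS


if __name__ == "__main__":
    totals = cluster_totals()
    print(f"vCPUs: {totals['vcpus']}, RAM: {totals['ram_gib']} GiB, "
          f"disk: {totals['disk_tib']} TiB")
    print(f"Per-broker produce traffic: {produce_kibps_per_broker():.1f} KiB/s")
    print(f"Disk throughput headroom: {disk_throughput_headroom():,.0f}x")
```

That works out to 4,416 vCPUs and 4,416 TiB of SSD, with each broker handling under 14 KiB/s of producer traffic.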
There were other gotchas too, like the unintuitive input of pre-compression throughput.
The compression ratio changed post-acquisition and that further increased Kafka’s costs while keeping the WarpStream’s cost the same.
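Why does a pre-compression input matter? A minimal sketch, assuming the calculator simply divides the uncompressed throughput you enter by a compression ratio to get the bytes that actually hit the network and disks. The function name and the ratios below are hypothetical illustrations, not values taken from the WarpStream calculator.

```python
def wire_throughput_mibps(uncompressed_mibps: float, compression_ratio: float) -> float:
    """Throughput after compression, for a ratio such as 4.0 (meaning 4:1).

    Assumption: compressed = uncompressed / ratio. The real calculator
    may apply the ratio differently.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return uncompressed_mibps / compression_ratio


if __name__ == "__main__":
    entered = 100.0  # hypothetical pre-compression input, in MiB/s
    for ratio in (4.0, 2.0):  # hypothetical ratios
        compressed = wire_throughput_mibps(entered, ratio)
        print(f"{ratio}:1 ratio -> {compressed} MiB/s to provision for")
```

Under this assumption, halving the ratio doubles the throughput a Kafka cluster is sized for, even though the number the user typed in never changed.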
🔥 The Kicker: WarpStream is NOT Cheaper
In two out of the three public clouds, WarpStream is actually not cheaper than running Kafka yourself.
Only in AWS is it cheaper, and only by 30%.
If you account for cloud cost discounts (which customers would expect if they use a non-trivial BYOC deployment), self-hosted Kafka will probably be cheaper there too.
🤩 Announcement: AKalculator
I spent the last two months coding up an in-depth Apache Kafka deployment calculator, and today I’m releasing it.
Vendor-made calculators are incentivized to make Kafka seem more expensive and more complex than it is, in order to increase the relative value proposition of the vendor’s own product.
They’re all bound to suffer from the same fundamental misalignment in incentives.
This one isn’t.
Check it out here:
👀 The Full Scoop
I can only write 476 words here, but I have 7711 more to back up my claims.
The detailed story is in my Big Data Stream newsletter.
It’s truly investigative journalism at its finest. 👌
We go into:
• a code review of the calculator
• more gotchas
• in-depth cost calculation and comparison
You’ll learn a lot about Kafka and what deployment considerations you ought to have, how vendors gaslight you, the hidden incentives of other hosted Kafka providers and how open-source Kafka has a clear path to become as cheap as WarpStream.
So I suggest you set aside a moment this Christmas time, snuggle up with a hot cocoa or coffee and reserve 30 minutes to read it.
Happy holidays all. ✌️
Disclaimer
To end this on a happier note, I want to add a bit of a disclaimer.
I worked at Confluent for 6 years. Some people may see it as a bit of a backstab to speak out against a company Confluent acquired.
I wanted to write out my thoughts in detail so there aren’t any questions about intention, malice, agenda, etc.
• I have only good things to say about Confluent. The company was always transparent, the founders are great people and the culture there was great.
• I worked with talented, honest co-workers and friends there.
• I have also published multiple posts praising Confluent's product and the company, and I will continue to when the opportunity arises.
• I see WarpStream and Confluent as separate, at least in the actions they’ve taken and the culture they've shown.
• I do think WarpStream has a very strong team and an amazing product. What they managed to build is a true testament to excellence.
• I think what WarpStream did in marketing was not right with respect to the open source Apache Kafka project.
I therefore believe a call out and a bit of parody in response is fair game.
I don’t mean to build my brand off drama and controversy. I’ll be going back to my regularly-scheduled programming right after this post.
I realize the initial disclaimers on my newsletters were too vague regarding my intention, so I have updated them there too.
I don’t have a bone to pick. I realize other companies make similar untruthful claims in their marketing too. And I think they should be called out too. 👮‍♂️
Especially if it goes too far.
I don't think I should censor myself and cherry-pick who I call out simply because I have some indirect history with them.
I tried very hard to keep everything objective and backed by data. If there are any errors - please tell me, I will correct them and apologize.
I would prefer to tell the truth. I would tell it to a friend too.
If you don’t agree, then perhaps we can agree to disagree. 🤷‍♂️
You’re free to unfollow me and hold your own views. ✌️
It’s just some pixels of a post on the internet after all.
But I hold no negativity. Hopefully you don’t either. 😇
Want to Discuss?
Make sure to follow me on all platforms so you don’t miss anything:
Big Data Stream - my long-form newsletter on Substack
BdKozlovski - my X profile
stanislavkozlovski.bsky.social - my BlueSky profile
/in/stanislavkozlovski - my LinkedIn profile
Apache®, Apache Kafka®, Kafka, and the Kafka logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.