Slack 🤝 Kafka
Slack started their Kafka journey with… Redis.
They initially used Redis as a queue for async processing of any tasks that were too slow for a web request - e.g. unfurling links, sending notifications, updating search indexes, etc.

High Level Redis Architecture
Then, one day in 2016,
they had a big incident.
A database slowdown led to a:
job execution slowdown, which led to:
Redis hitting its memory limit.
When Redis exhausts memory, it doesn’t allow you to enqueue new jobs. Slack lost data during this incident (the jobs they couldn’t enqueue).
And ultimately - Redis got completely stuck. It turns out dequeueing something from Redis requires a tiny amount of memory too. 😬
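A toy simulation of that failure mode (not Redis itself - just a memory-capped queue with Redis-style "noeviction" behavior; all names here are illustrative):

```python
class BoundedQueue:
    """Toy model of a memory-capped queue that rejects writes when full,
    like Redis with maxmemory set and a noeviction policy."""

    def __init__(self, max_items):
        self.max_items = max_items  # stand-in for Redis's maxmemory limit
        self.items = []
        self.lost = 0  # jobs we could not enqueue

    def enqueue(self, job):
        if len(self.items) >= self.max_items:
            # Redis returns an OOM error here; the job is gone unless the
            # producer retries or buffers it somewhere durable.
            self.lost += 1
            return False
        self.items.append(job)
        return True

    def dequeue(self):
        return self.items.pop(0) if self.items else None


q = BoundedQueue(max_items=3)
for job in ["unfurl", "notify", "reindex", "bill"]:
    q.enqueue(job)

print(q.lost)  # 1 -- the fourth job was dropped, mirroring the data Slack lost
```

Putting a durable log (Kafka) in front of the queue is exactly what removes this failure mode: writes land on disk instead of competing for the queue's memory.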
They solved this by migrating to Kafka. Incrementally.
They first placed Kafka in front of Redis to act as a durable store.
They wrote kafkagate, an HTTP proxy placed in front of Kafka, for their PHP web apps to interface with.

Kafka + Redis Intermediate Architecture
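kafkagate's internals aren't public, but the core idea - an HTTP endpoint that turns a POSTed job into a Kafka produce call - can be sketched like this (the "jobs" topic name and the `produce` callback are assumptions; in practice `produce` would be a real client call such as kafka-python's `KafkaProducer.send`):

```python
import json


def handle_enqueue(raw_body, produce):
    """kafkagate-style request handler: decode the job payload sent by
    the PHP web app and hand it to Kafka via `produce(topic, value)`."""
    job = json.loads(raw_body)  # reject malformed payloads early
    produce("jobs", json.dumps(job).encode())  # topic name is an assumption
    return 204  # ack to the web app once the produce is issued


# Usage with a stub producer, so the sketch runs without a broker:
sent = []
status = handle_enqueue(
    b'{"type": "unfurl", "url": "https://example.com"}',
    lambda topic, value: sent.append((topic, value)),
)
print(status, len(sent))  # 204 1
```

The proxy keeps the PHP apps free of Kafka client code: they speak plain HTTP, and durability concerns live in one place.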
Kafka 🤝 Slack’s Data Warehouse
In 2017, Slack shared that Kafka is also used to collect data (logs, jobs, etc) & push it to their data warehouse in S3.
They used Pinterest’s Secor library as the service that persists Kafka messages to S3 (essentially the equivalent of a sink connector).

Kafka 🤝 Slack’s Distributed Tracing Events
Another use case they have is shepherding distributed tracing events into the appropriate data stores for visualization purposes.
This is at the following scale:
310M traces a day (3587/s)
8.5B spans a day (98.3k/s)
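Those per-second figures are just the daily totals divided by the 86,400 seconds in a day:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

traces_per_sec = 310_000_000 / SECONDS_PER_DAY    # 310M traces/day
spans_per_sec = 8_500_000_000 / SECONDS_PER_DAY   # 8.5B spans/day

print(int(traces_per_sec))  # 3587
print(int(spans_per_sec))   # 98379 -> ~98.3k/s
```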
The Latest Stats ✨
Slack continued to grow their Kafka usage across the org - with different teams adopting Kafka in their own setups. This eventually led to a fragmentation of versions & duplicate effort in managing Kafka.
Year by year, Kafka became an increasingly central nervous system at the company, moving mission-critical data.
In 2022, it powered:
logging pipelines
trace data
billing
enterprise analytics
security analytics
The latest numbers I could find were the following:
6.5 Gbps
1,000,000s of messages a second
700TB of data (0.7PB)
10 Kafka clusters
100s of nodes
Managing 10 clusters at this scale required some work - they invested in automating many processes:
topic/partition creation & reassignment
capacity planning
adding/removing brokers
replacing nodes / upgrading versions
observability
reducing general ops toil
And Slack’s Kafka usage is only growing.
They formed a new Data Streaming team to handle all current and future Kafka use cases.
Immediate future plans include a new Change Data Capture (CDC) project. It will support the caching needs for Slack’s Permission Service used to authorize actions in Slack and enable near real-time updates to their Data Warehouse.
Have any questions for Slack?
We managed to find that platform team’s manager - Ryan - on Twitter, and he agreed to an AMA 👇 (thanks Ryan!)
Stan: This is how Kafka seems to naturally proliferate inside companies. It starts with one thing, and then teams just continue to adopt it as the network effect of expertise & experience grows inside the company.
I’m curious to hear how adoption is going in your company. Please reply to this e-mail if you feel like sharing! We will make sure to highlight it in the next issue 🙂
Liked this edition?
Help support our growth so that we can continue to deliver valuable content!
Apache®, Apache Kafka®, Kafka, and the Kafka logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.


