Tech for PMs — what is Kafka?

What software architecture and Instagram have in common

Adil Dewan
5 min read · Sep 2, 2021

When I heard there was something in software called “Kafka”, I was expecting the stuff of nightmares.

Imagine my relief when I learned that the software version of Kafka is just a messaging framework that moves data around your app super fast, letting you build cool features and get business insights. Phew.

TL;DR

Apache Kafka is a messaging framework that lets data move really fast between the systems inside a product.

Most apps are not a single blob. They’re made up of a bunch of systems that talk to each other: the marketing website, the mobile app, the CRM system, the analytics tool etc.

Pre-Kafka, each system would send individual messages (usually in the form of API calls) to other systems directly.

With Kafka, a system can “publish” its data on a “topic” and other systems can “subscribe” to the topic to listen to that data. A bit like someone posting on Instagram and people following them to see their posts.

It has a couple of key advantages:

  • It’s lightning fast, allowing things to happen in real time in your app
  • It’s super scalable: it can move massive amounts of data per second, and it’s easy to tack on more systems as you grow

Kafka is the technology behind loads of features you use. It lets Uber find you a driver when you need a lift home and Chime / Revolut send you an instant push notification when you use your bank card.

Kafka is to systems what Instagram is to influencers

The year is 1354. The bubonic plague is over and you’re planning to live your best life. Naturally, you want to share what you’re doing with everyone. But it isn’t that easy…

You decide to write a memoir. Every time you do something cool, you come home and write about it. 10 years later, you send your memoir to a few acquaintances. If it’s interesting, maybe in a few thousand years it’ll be widely reproduced and you’ll be famous.

Fast forward to today and the world is starting to emerge from the COVID-19 pandemic. You’ve got big plans and thankfully there’s a much more efficient way to share your life with people. You download Instagram 📸

Instagram is an incredibly effective way of sharing:

  • lots of stuff
  • with lots of people
  • really quickly
  • as it happens

Once you’ve published a few great posts, it won’t be long before you start attracting some attention. People will come onto your feed, look at your posts and, if they find them interesting, hit the follow button. Once they’re following, they’ll be kept up to date whenever you post.

If you play your cards right, within a few short months you could be a social media influencer, reaching millions of people all around the world…

In order to understand how Kafka works, replace:

  • “Instagram” with “Kafka”
  • “People” with “systems”
  • “Post” with “message”
  • “Feed” with “topic”
  • “Follow” with “subscribe”

Kafka is just like Instagram, in that it is a great way to share lots of data with lots of systems in your app in real time⚡️

Kafka is a game changer

Before Kafka, if one system of your product wanted data from another system, it would have to make a specific request to the relevant system. That system would then send back the requested data to the original system. A bit like how someone goes to a shop to buy a book, asks where to find the book and then buys the book.

This is known as the “request-response” model of messaging.

The request-response model of messaging
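
To make the shop analogy concrete, here is a minimal sketch of request-response in Python. It is a toy: plain objects stand in for real systems, there is no network involved, and every name in it (`CrmSystem`, `AnalyticsTool`, `get_customer_name`) is made up for illustration.

```python
# Toy illustration only: plain Python objects stand in for real systems.
# All class and method names here are hypothetical.

class CrmSystem:
    """Holds customer data and answers direct requests for it."""

    def __init__(self):
        self._customers = {"u1": "Ada", "u2": "Grace"}

    def get_customer_name(self, user_id):
        # The "response" half: hand back exactly what was asked for.
        return self._customers.get(user_id)


class AnalyticsTool:
    """Needs customer data, so it has to know about the CRM and ask it."""

    def __init__(self, crm):
        self._crm = crm  # direct dependency on the other system

    def describe_user(self, user_id):
        # The "request" half: one call, to one system, for one piece of data.
        name = self._crm.get_customer_name(user_id)
        return f"user {user_id} is {name}"


analytics = AnalyticsTool(CrmSystem())
print(analytics.describe_user("u1"))  # prints "user u1 is Ada"
```

The thing to notice is that the analytics tool has to know the CRM exists and ask it directly. Every extra system that wants the same data needs its own connection and its own request.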

Kafka improves on the request-response model. It allows one system of your product (the “publisher”, which Kafka’s own documentation calls a “producer”) to share (or “stream”) its data (“events”) widely so that multiple other systems of your product (“consumers”) can listen to it in real time.

Just like someone sharing photos on Instagram, a publisher system (e.g. your website) shares its events via a feed (a “topic”).

A consumer system (e.g. an analytics tool) “subscribes” to the topics it is interested in, so that it is kept up to date with all the useful stuff it might need to know and use.

This is the “publisher-subscriber” model of messaging (known as “pub-sub”).

The publisher-subscriber model. Data is published from the website to its Kafka topic. The analytics tool, CRM system and push notifications system subscribe to the topic to listen for events.
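
The setup described above can be sketched in a few lines of Python. This is an in-memory toy, not Kafka's actual API: `ToyBroker`, the `website-events` topic name and the event fields are all invented for illustration. It also simplifies one thing: this toy pushes each event to its subscribers, whereas real Kafka consumers pull events from the broker at their own pace.

```python
from collections import defaultdict

# In-memory toy, NOT Kafka's API. Names are hypothetical.


class ToyBroker:
    """Keeps a list of subscriber callbacks per topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        # A system registers interest in a topic once.
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # The publisher does not know or care who is listening.
        for handler in self._subscribers[topic]:
            handler(event)


broker = ToyBroker()

# Three consumer systems, each just collecting what it hears.
analytics_seen = []
crm_seen = []
push_seen = []

broker.subscribe("website-events", analytics_seen.append)
broker.subscribe("website-events", crm_seen.append)
broker.subscribe("website-events", push_seen.append)

# The website publishes one event; all three subscribers receive it.
broker.publish("website-events", {"type": "signup", "user": "u1"})

print(len(analytics_seen), len(crm_seen), len(push_seen))  # prints "1 1 1"
```

Adding a fourth consumer is one more `subscribe` line. The website's `publish` call does not change, which is the heart of why pub-sub is easy to grow.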

Kafka has become a really important element of the modern software stack

At this point, it’d probably be easier to list the companies that aren’t using Kafka in some way.

It powers some really important features in your favourite products:

  • Call of Duty uses Kafka to calculate gameplay stats like shooting events and death location
  • Pinterest uses it for content recommendation and deciding which ads to show you
  • Tinder uses it to send you a push notification when you get matched with someone

What are the main benefits of Kafka?

  • It is lightning fast. It can get data from the source system to the destination system in a matter of milliseconds (often less than 10ms), allowing things to happen in real time. For example, Uber uses it to calculate the dreaded surge pricing on a Saturday night
  • It is big-data friendly. Because the “publisher-subscriber” way of streaming events is more efficient than sending individual API calls, Kafka can handle massive volumes of data, up to millions of messages per second
  • It is super scalable. It’s really easy to add more publisher and consumer systems into the mix for a relatively low overhead cost
  • It can act as a historical log of messages. Kafka keeps messages for a configurable period after they are published, so they can be replayed later or used for data analysis.
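
The “historical log” benefit is easier to picture with a small sketch. Kafka stores messages in an append-only log, and each message gets a numbered position called an offset. The toy below mimics that idea in memory. `ToyLog` and its methods are hypothetical names, not Kafka's API, and real Kafka adds things this sketch leaves out, such as partitions and retention limits.

```python
# In-memory toy of an append-only log with offsets. NOT Kafka's API.


class ToyLog:
    """Messages are only ever added to the end, never edited or removed."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        self._messages.append(message)
        # The offset is simply the message's position in the log.
        return len(self._messages) - 1

    def read_from(self, offset):
        # Any consumer can start reading from any offset it likes.
        return self._messages[offset:]


log = ToyLog()
for event in ["page_view", "signup", "purchase"]:
    log.append(event)

# A brand new analytics tool can replay everything from the start...
print(log.read_from(0))  # prints "['page_view', 'signup', 'purchase']"

# ...while a consumer that already handled the first two events
# picks up where it left off.
print(log.read_from(2))  # prints "['purchase']"
```

Because reading does not delete anything, a system added months later can still catch up on past events instead of only seeing new ones.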

Further reading

This is a very simple primer on Kafka. If you want to know more, here are some great resources to help you go deeper into the subject…
