Learning Eventing with Kafka

Arjun Chauhan
6 min read · Feb 11, 2024

As a working professional I recently got the chance to work on eventing. The whole process of publishing an event to a topic (don't worry if you don't know this term yet) and then consuming it was seamless, and I didn't have to worry about whether my code needed to implement any fault tolerance. My focus stayed on building the payload I wanted to publish and the logic to consume the event.

On the surface, the abstraction provided by the Kafka SDK looks very minimalistic, which is exactly the point of an abstraction. But I got curious and spent time understanding the structure of the whole Kafka ecosystem.

In this article I'll focus on the infrastructure of the Kafka ecosystem and how its various components work together to give us a seamless experience.

What is Kafka?

Kafka was first developed at LinkedIn and open sourced in 2011. Back then, LinkedIn had started generating lots of data that it wanted to use for insights. Much of that data was sitting idle, generating no insights, because it lived on different systems. To generate value, it needed to be delivered to the systems where it could actually be used.

When LinkedIn built Kafka, it sat at the center as a hub: the different systems connected to it to publish information and to consume it. Since then, both Kafka and its use cases have come a long way.

So how do we define it? Kafka is a distributed streaming platform, used primarily to create and process real-time data streams.

Any Kafka environment is broadly divided into three components:

  • Producer
  • Consumer
  • Broker

We'll go into the details of how the system is structured, but first let's understand some common terminology.

Common terms used

  • Producer: A producer is an application that sends data. A unit of data in Kafka is called a message, and sending one is called publishing.
  • Consumer: A consumer is an application that receives messages. Receiving data is referred to as consuming. So publishing and consuming are just embellishments for sending and receiving.
  • Broker: Producers and consumers don't interact with each other directly; they communicate through a broker. A broker is like a centralized server that receives messages from producers and delivers them to the consumers that need them.
  • Cluster: A group of servers each running the Kafka broker is called a cluster.
  • Topic: A Kafka topic is a unique name for a data stream. We will go into detail later, but for now you can think of it as a normal database table.
  • Topic partition: If we think of a topic as a table, that table can grow enormously in size. This is why we partition topics. Each partition holds a subset of the topic's data, and the number of partitions is a design decision made when we set up the infrastructure. Partitioning lets a topic's data be split across brokers instead of living on a single one (see the topic-creation sketch after this list).
  • Partition offset: This is a unique, immutable sequence id for a message within a partition. So to locate a particular message we need the topic name, the partition number, and the offset.
  • Consumer Group: As the name suggests, a consumer group is a group of consumers that divide a topic's partitions among themselves so that together they can work through a larger task.
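
To make topics and partitions concrete, here is a minimal sketch that creates the kind of topic we'll use later in this article. It assumes a broker at localhost:9092 and the kafka-python library (the article doesn't prescribe a client, so treat both as assumptions); the topic name and the counts are illustrative, not prescriptive.

```python
from kafka.admin import KafkaAdminClient, NewTopic

# Connect to a broker; localhost:9092 is an assumption for a local setup.
admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

# A topic named "weather", split into 3 partitions, each copied to 2 brokers.
# replication_factor=2 requires at least two brokers in the cluster.
topic = NewTopic(name="weather", num_partitions=3, replication_factor=2)
admin.create_topics([topic])
admin.close()
```

Both numbers are up-front design decisions: more partitions mean more parallelism for consumers, and the replication factor decides how many brokers hold a copy of each message, which is what the next section is about.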

Leader vs Follower Broker

To guard against data loss, the messages received by a broker are replicated to multiple other brokers. This ensures messages are not lost if a broker goes down. However, it isn't possible to replicate every message to every broker without delay, and it isn't necessary either.

For each partition, one of the brokers is assigned the role of leader. The leader is responsible for receiving messages, and all requests are directed to the leader only. The follower brokers continuously copy the messages received by the leader so their replicas stay up to date.

Kafka maintains an ISR (in-sync replicas) list, which, as the name suggests, contains the follower replicas that are in sync with the leader. This list is dynamic: nodes are constantly added to and removed from it. The followers that are in sync with the leader are the eligible candidates to take over the role of leader if the current leader broker goes down.

If the leader broker goes down, one of the follower brokers from the ISR can be promoted to become the new leader.
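
You can see the leader and ISR assignments for each partition by asking the cluster for topic metadata. A hedged sketch, again assuming kafka-python; the exact field names in the returned metadata are an assumption and can vary between library versions.

```python
from kafka.admin import KafkaAdminClient

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

# For each partition, the metadata reports the current leader broker id,
# the full replica set, and the in-sync replica (ISR) list.
for topic in admin.describe_topics(["weather"]):
    for p in topic["partitions"]:
        print(p["partition"], "leader:", p["leader"],
              "replicas:", p["replicas"], "isr:", p["isr"])
admin.close()
```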

Now the question is: how do we determine whether a broker is down, and which of the follower brokers are in sync with the leader and qualify to become the new leader?

Zookeeper

For a particular Kafka cluster, ZooKeeper is a centralized service responsible mainly for the metadata management of the cluster. It stores information such as metadata about the brokers, so it knows when a particular broker node goes down. Kafka is able to provide fault tolerance with the help of ZooKeeper.

As mentioned above, if the leader goes down, a follower broker can be assigned the role of leader. This whole process is managed by ZooKeeper. ZooKeeper keeps itself updated with the brokers' metadata and with which followers are in sync with the leader, so that the correct follower can be promoted.

Architecture

Now let's go over a common use case, through which we will understand the architecture and see the advantages of Kafka and where it really shines.

Consider a simple application that captures weather data every 10 minutes. This data is used by another application that analyses it and alerts the user on their phone if the weather is good enough for a nice stroll. It's a very simple example, but my aim is to show how using Kafka makes this problem a lot easier.

Let's say we take a conventional approach without any messaging service. The application capturing the weather data can store it in a database, and the application that needs the data can query the database whenever it wants, which is going to be every few minutes.

The challenge is that the size of the database will keep growing without any apparent benefit. (Let's assume we don't need the data for anything other than the notification on the user's phone.)

One argument for this approach is that we can clean up the database every few hours to make sure we are not storing unneeded data. That's true, but it adds another layer of effort to make the approach work. Also, consider a case where the mobile application is failing and cannot read from the database: if the cleanup happens before the application gets the data, our mobile application won't work as expected.

The Kafka Way

Now let's say we set up a Kafka environment instead.

Now, every 10 minutes, the producer pushes the data to the broker. In our case, our application needs a producer created on its end, with the endpoint of our broker specified. This does require some additional logic, but the abstraction provided by Kafka client libraries makes it very easy, as the sketch below shows.
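
A minimal sketch of that producer, with the same assumptions as before (kafka-python, a broker at localhost:9092); the "weather" topic and the payload fields are made up for this example.

```python
import json
import time

from kafka import KafkaProducer

# Serialize each payload dict to JSON bytes before publishing.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

while True:
    # A made-up weather reading; a real app would query a sensor or an API.
    reading = {"temp_c": 21.5, "condition": "clear", "ts": time.time()}
    producer.send("weather", value=reading)
    producer.flush()  # block until the broker has acknowledged the message
    time.sleep(600)   # wait 10 minutes before the next reading
```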

So now we won’t need a database to store the data that is getting generated every 10 minutes.

The broker partitions are now responsible for storing this data so the consumer application can pick it up. Strictly speaking, Kafka retains messages for a configurable retention period rather than deleting them once consumed, which suits us here: the data is short-lived by design, with no long-term storage burden.
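
Retention is configured per topic. A sketch of how that could look for our weather topic, again assuming kafka-python; the one-hour value is an arbitrary choice for this example.

```python
from kafka.admin import ConfigResource, ConfigResourceType, KafkaAdminClient

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

# retention.ms tells the broker to delete messages older than one hour,
# so old weather readings never pile up the way they would in a database.
admin.alter_configs([
    ConfigResource(ConfigResourceType.TOPIC, "weather",
                   configs={"retention.ms": "3600000"}),
])
admin.close()
```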

Every 10 minutes the consuming application can read the data from the broker and act on it. Again, a consumer must be configured on the application's side, but as said previously, the abstraction provided makes it very easy; a sketch follows.
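
And a matching sketch of the consumer, with the same assumptions (kafka-python, a local broker, the made-up "weather" topic); the strolling rule is obviously a placeholder.

```python
import json

from kafka import KafkaConsumer

# Joining a consumer group lets several instances split the partitions.
consumer = KafkaConsumer(
    "weather",
    bootstrap_servers="localhost:9092",
    group_id="stroll-alerts",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",  # start from the oldest retained message
)

for message in consumer:
    reading = message.value
    # Placeholder rule: mild and clear means it's a good time for a stroll.
    if reading["condition"] == "clear" and 15 <= reading["temp_c"] <= 25:
        print(f"partition {message.partition}, offset {message.offset}: "
              "nice weather, notify the user!")
```

Note how the topic, partition, and offset from the terminology section show up here: the consumer group tracks its offset per partition, which is how Kafka knows where each consumer left off.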

Now let's say this simple application gains a lot of traction and the number of users multiplies. Scaling a Kafka cluster is a lot easier than scaling a database.

In this article I gave a high-level idea of the use and architecture of a Kafka cluster.

There is a lot still to be discussed about the implementation side of consumers and producers and the kinds of configuration we can provide, which in turn gives a deeper understanding of how the whole system works. That is something I would like to cover in a later post.

Thanks
