Real-Time Event Streaming in Rails with Apache Kafka and Karafka

by Mohammad Hussain, System Analyst


Kafka — What is it?

At its core, Apache Kafka is a distributed messaging system that enables different services to communicate with each other efficiently. Kafka follows a distributed log-based architecture, often compared to message queues but designed for higher scalability and durability.

In this post, we’ll break down the essential Kafka concepts (brokers, topics, partitions, producers, and consumers), then set up a local Kafka cluster with Docker and integrate it with a Rails application using Karafka.

🔑 Key Kafka Concepts

  1. Broker: A broker is simply a Kafka server. It’s responsible for receiving, storing, and serving messages (events). Multiple brokers together form a Kafka cluster.

  2. Event: An event (or message) is the data produced to or consumed from Kafka. Events are stored as bytes on the broker’s disk.

  3. Producer & Consumer: Producers are services that send (produce) events to Kafka. Consumers are services that read (consume) events from Kafka. A single service can act as both producer and consumer.

  4. Topic: A topic organizes messages into categories. Think of it like a folder that holds messages of a specific type, such as payment-details or user-details.

  5. Partition: Each topic can be split into partitions to improve throughput. A partition is the smallest storage unit, holding a subset of messages for that topic.

  6. Replication Factor: Kafka ensures reliability using replication. A replication factor of 2 means each partition is stored on two brokers—one leader and one replica—ensuring redundancy if a broker fails.

  7. Offset: Kafka tracks how far a consumer has read using an offset (like a bookmark). If a consumer crashes, it can resume reading from its last saved offset (see the sketch after this list).

  8. Zookeeper: Traditionally, Zookeeper has been used to manage cluster metadata, broker coordination, and leader election. (Note: newer Kafka versions replace Zookeeper with KRaft, the Kafka Raft metadata mode.)

  9. Consumer Group: A consumer group is a set of consumers working together. Each partition of a topic is consumed by exactly one consumer in the group, ensuring no duplication while allowing parallel processing.
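
To make the offset idea concrete, here is a tiny Ruby sketch. It is purely conceptual (not real Kafka client code): a partition behaves like an append-only list of events, and the committed offset is the bookmark a consumer resumes from after a restart.

# Conceptual illustration only, not actual Kafka client code.
# A partition behaves like an append-only array of events, and the
# committed offset marks how far the consumer has read so far.
partition = ["event-0", "event-1", "event-2", "event-3", "event-4"]

committed_offset = 3 # the consumer has already processed offsets 0..2

# After a crash or restart, the consumer resumes from its committed offset:
remaining = partition[committed_offset..] # => ["event-3", "event-4"]
puts remaining.inspect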

Understanding Kafka Topics

In Kafka, a topic is like a category or channel where messages (events) are published. Producers send messages to a topic, and consumers subscribe to topics to read those messages.

  • A topic can be divided into partitions, which allow Kafka to scale horizontally.

  • Each partition is an ordered sequence of messages, and within a partition, the order of messages is always preserved.

  • Topics can have replication, meaning their data is copied across multiple brokers for fault tolerance.

  • Consumers don’t remove messages from a topic — Kafka retains them for a configured period (e.g., 7 days by default), so multiple consumers can read the same data at different times.

Think of a topic as a message queue, but unlike traditional queues, Kafka supports multiple independent consumers without deleting the data immediately.
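
Retention is configurable per topic. As a rough example (assuming the kafka-configs.sh script that ships with the standard Kafka distribution and a topic named orders), setting a 7-day retention could look like this:

# Set retention for the "orders" topic to 7 days (604,800,000 ms)
kafka-configs.sh --bootstrap-server localhost:9092 \
  --alter --entity-type topics --entity-name orders \
  --add-config retention.ms=604800000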

How Events Flow in Kafka

Let’s understand this with two simple scenarios:

Scenario 1: Single Partition

Imagine we have:

  • 1 broker

  • 1 topic (Topic-A)

  • 1 partition

  • 1 producer and 1 consumer

All events generated by the producer are sent to the single partition of Topic-A, which is managed by one broker.

Since there is only one partition, the order of events is preserved exactly as they are produced.

The consumer connected to this topic then reads the events sequentially, ensuring that the data is consumed in the same order in which it was originally written.

This setup is straightforward and guarantees message ordering, but it has limited scalability since everything flows through just one partition.


Scenario 2: Multiple Partitions

Now let’s say Topic-A has 2 partitions.

When Topic-A is divided into 2 partitions, the producer’s events are distributed between them.

Each event is written to exactly one partition, and Kafka does not duplicate messages across partitions by default.

This setup increases throughput and enables Kafka to process much larger volumes of data efficiently.
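
How does the producer decide which partition an event lands on? Broadly, events that carry a message key are hashed to a partition, so the same key always ends up on the same partition and per-key ordering is preserved, while keyless events are spread across partitions. The Ruby sketch below is only an approximation of the idea: Kafka’s real default partitioner uses murmur2 hashing, and CRC32 here is just a stand-in.

# Rough sketch of key-based partitioning (not Kafka's actual algorithm).
require "zlib"

PARTITION_COUNT = 2

def partition_for(key)
  # Hash the message key and map it onto one of the partitions.
  Zlib.crc32(key) % PARTITION_COUNT
end

partition_for("user-42") # the same key always maps to the same partition
partition_for("user-7")  # a different key may map to a different partition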


Starting a Kafka Cluster with Docker

Before integrating Kafka with your application, you need a running Kafka cluster. The easiest way to get started is by using the official Docker images.

📖 Official Kafka Docker docs

Pull the Docker image:

docker pull apache/kafka:4.0.0

Start a Kafka container:

docker run -p 9092:9092 apache/kafka:4.0.0

This runs Kafka on port 9092, making it accessible locally.
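
With the container running, you can create a topic to experiment with. This is a minimal sketch that assumes the Kafka CLI scripts live under /opt/kafka/bin inside the apache/kafka image (the path may differ between image versions) and uses products as an example topic name:

# Create a "products" topic with 2 partitions on the single local broker
docker exec -it <container_id> /opt/kafka/bin/kafka-topics.sh \
  --create --topic products \
  --partitions 2 --replication-factor 1 \
  --bootstrap-server localhost:9092

# Verify the topic and its partition layout
docker exec -it <container_id> /opt/kafka/bin/kafka-topics.sh \
  --describe --topic products --bootstrap-server localhost:9092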

Installing and Using Karafka with Rails

📖 Karafka Official Documentation

Prerequisite: Ensure your Apache Kafka cluster is up and running before integrating Karafka.

  1. Add the gem: In your Gemfile, include Karafka:
gem 'karafka'

Then run:

bundle install

  2. Install Karafka scaffold: Run the install command to generate the necessary files:

bundle exec karafka install

This generates the following files:

  • karafka.rb — configures Karafka and defines the routes that map topics to consumers

  • app/consumers/example_consumer.rb — a starter consumer

  • app/consumers/application_consumer.rb — the base class for your custom consumers

Once you’ve installed the karafka gem, you need to configure it with your Kafka cluster and define the topics and consumers.

Example karafka.rb

class KarafkaApp < Karafka::App
  setup do |config|
    config.kafka = { 'bootstrap.servers': ENV.fetch("KAFKA_SERVERS", "127.0.0.1:9092") }
    config.client_id = "my_rails_app"
  end

  # Subscribe to logging and error monitoring
  Karafka.monitor.subscribe(Karafka::Instrumentation::LoggerListener.new)

  routes.draw do
    # Define topics and assign consumers
    topic :products do
      consumer ProductsConsumer

      # Optional dead-letter queue to catch failed messages
      dead_letter_queue(
        topic: "products_dlq",
        max_retries: 3
      )
    end

    topic :orders do
      consumer OrdersConsumer
    end
  end
end

👉 In this example:

  • We point Karafka to Kafka brokers (bootstrap.servers).

  • Each topic is mapped to a consumer class (e.g., ProductsConsumer).

  • Optional Dead Letter Queues (DLQ) capture failed messages for later debugging.

Example Consumer

class ProductsConsumer < ApplicationConsumer
  def consume
    messages.each do |message|
      data = message.payload
      Rails.logger.info "Received product: #{data}"

      # Example: Save or update product records in the database
      product = Product.find_or_initialize_by(code: data["code"])
      product.update!(name: data["name"], price: data["price"])
    end
  end
end

👉 Here, every time a message is published to the products topic, this consumer will process it.
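
Consumers run in their own process, separate from your Rails web server. Start them with Karafka’s server command; it connects to the brokers and begins processing messages for every topic defined in your routes:

bundle exec karafka server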

Example Producer

To send data to Kafka, you can use Karafka’s producer:

def publish_order(order)
  Karafka.producer.produce_async(
    topic: "orders",
    payload: order.to_json
  )
end

👉 This sends order data asynchronously to the orders topic.
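
You can call a producer like this from anywhere in your Rails code. The sketch below is one illustrative placement (the after_create_commit callback and the Order model are assumptions for the example, not something Karafka requires): publishing an event whenever an order is created.

class Order < ApplicationRecord
  # Publish after the database transaction commits so consumers
  # never see an order that was rolled back.
  after_create_commit :publish_order_event

  private

  def publish_order_event
    Karafka.producer.produce_async(
      topic: "orders",
      payload: attributes.to_json
    )
  end
end

Karafka’s producer also provides produce_sync if you need to block until the broker acknowledges the write.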

How It Works

  1. Producer: Your Rails app (or another service) publishes messages to a Kafka topic (e.g., orders).

  2. Kafka Cluster: Kafka stores these messages in partitions, ensuring ordering within each partition.

  3. Consumer: Karafka consumers in your Rails app subscribe to specific topics. Each message is delivered to the matching consumer.

  4. Processing & DLQ: The consumer processes the message (e.g., updates the database). If processing fails, the message can be retried or moved into a Dead Letter Queue.

This architecture lets Rails handle real-time event streams, ensuring data flows smoothly between different services.
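
If you use a Dead Letter Queue, you can also subscribe a consumer to that topic and decide what to do with the failed messages. A minimal sketch, assuming the products_dlq topic configured earlier and a hypothetical ProductsDlqConsumer registered in karafka.rb:

# In karafka.rb routes:
#   topic :products_dlq do
#     consumer ProductsDlqConsumer
#   end

class ProductsDlqConsumer < ApplicationConsumer
  def consume
    messages.each do |message|
      # Inspect, alert on, or re-enqueue messages that exhausted their retries
      Rails.logger.error "Dead-lettered product message: #{message.raw_payload}"
    end
  end
end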

Thanks for Reading! 🎉

If you’ve made it this far, you’ve successfully taken your first steps into the world of Apache Kafka. You now understand the basics of topics, partitions, producers, and consumers, along with how to set up a cluster and integrate it with Rails using Karafka.

It might seem like a small start, but what you’ve unlocked is powerful—you’re stepping into real-time data streaming, event-driven architecture, and scalable systems that can handle massive workloads. 🚀

Here’s to building applications that don’t just process data, but react to it instantly and unlock new possibilities. ⚡📊
