Big Data Testing: Apache Kafka Performance Benchmarking

In this blog we will start with the basic tools/scripts that ship with Apache Kafka and discuss how performance testing and benchmarking can be done by running some load tests against the default configuration.

Overview:

Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system.

Let’s go through its messaging terminology first:

  • Kafka maintains feeds of messages in categories called topics.
  • We'll call processes that publish messages to a Kafka topic producers.
  • We'll call processes that subscribe to topics and process the feed of published messages consumers.
  • Kafka is run as a cluster of one or more servers, each of which is called a broker.

So, at a high level, producers send messages over the network to the Kafka cluster, which in turn serves them up to consumers.

For further information about Apache Kafka, please refer to the link below:

Kafka Documentation

So, while doing performance testing for Kafka, there are two aspects we need to take into consideration:
1. Performance at Producer End
2. Performance at Consumer End

We need to perform this test for both the Producer and the Consumer so that we can measure how many messages the Producer can produce and the Consumer can consume in a given time. For a large number of messages we can also check for data loss.
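
One simple way to check for loss is sketched below. It is only a rough illustration, assuming a freshly created single-partition topic, the producer perf script introduced later in this post, and Kafka's GetOffsetShell utility (run via kafka-run-class.sh): produce a known number of messages, then compare that count with the topic's latest offset.

```sh
# Run from the Kafka bin directory.
# Produce a known number of messages into a fresh topic...
./kafka-producer-perf-test.sh --broker-list localhost:9092 --topic test --messages 1000

# ...then ask the broker for the latest offset of each partition; for a fresh
# single-partition topic the reported offset should equal the number produced.
./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic test --time -1
```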

The main intent of this test is to find out the following stats:
1. Throughput (MB/sec) based on the size of data
2. Throughput (messages/sec) based on the number of messages
3. Total data sent
4. Total messages sent

Let’s go ahead and download and set up Kafka, then start ZooKeeper, the Kafka server, a producer, and a consumer.

  • To download Kafka, refer to this link: http://kafka.apache.org/downloads.html
  • Once it is downloaded, untar it and then switch to the directory:

    ```sh
    tar -xzf kafka_2.9.1-0.8.2.2.tgz
    cd kafka_2.9.1-0.8.2.2
    ```
  • As Kafka uses ZooKeeper, you first need to start it:

    ```sh
    bin/zookeeper-server-start.sh config/zookeeper.properties
    ```
  • Now start the Kafka server:

    ```sh
    bin/kafka-server-start.sh config/server.properties
    ```
  • Once the server has started, we need to create a topic, say “test”:

    ```sh
    bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
    ```
  • To check if the topic was created successfully, use the list command:

    ```sh
    bin/kafka-topics.sh --list --zookeeper localhost:2181
    ```
  • Now let’s start the Producer and the Consumer, each in its own terminal window, as shown below:

    ```sh
    bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
    bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test
    ```
  • Now send some messages by typing them on the Producer console; each time you press Enter, the same message should appear on the Consumer console.

Once the messages generated by the Producer are consumed by the Consumer, that shows you have set up Kafka correctly.
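
Before moving on to the perf tests, you can also double-check the topic’s partition and replication settings with the same kafka-topics.sh script. This is just a quick sanity check, not a required step:

```sh
# Show partition count, replication factor, leader and ISR for the "test" topic
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test
```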

Now let’s take the performance stats. To do this, follow the steps mentioned below:
1. Launch a new terminal window.
2. Change to the Kafka/bin directory.
3. Here you will find multiple shell scripts; we will be using the following two to take the performance stats:
kafka-producer-perf-test.sh
kafka-consumer-perf-test.sh

If you want to see the help for these shell scripts (the perf tools), just type

```sh
./kafka-producer-perf-test.sh --help
```

or

```sh
./kafka-consumer-perf-test.sh --help
```

for the Producer and Consumer respectively.

Performance at Producer End

Type the following command on the console and hit the Enter key.

```sh
./kafka-producer-perf-test.sh --broker-list localhost:9092 --topic test --messages 100
```

Let’s understand these command line options one by one:
– The first parameter is “broker-list”; here we specify the broker info, that is, the list of broker host:port pairs used for bootstrapping. This is a required parameter.
– The second parameter is “topic”; this is also a required parameter and identifies the message category, as we discussed earlier.
– The third one, “messages”, specifies how many messages you want to produce and send to take the stats; we set it to 100 for our first scenario.

Once the test has completed, some stats will be printed on the console, something like:

| start.time | end.time | compression | message.size | batch.size | total.data.sent.in.MB | MB.sec | total.data.sent.in.nMsg | nMsg.sec |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2016-02-03 21:38:28:094 | 2016-02-03 21:38:28:449 | 0 | 100 | 200 | 0.01 | 0.0269 | 100 | 281.6901 |

  1. start.time and end.time show when the test started and completed.
  2. If compression is ‘0’, as above, it means message compression was off (the default).
  3. message.size shows the size of each message in bytes.
  4. batch.size indicates how many messages will be sent in one batch; by default it is set to 200.
  5. total.data.sent.in.MB shows the total data sent to the cluster in MB.
  6. MB.sec indicates how much data was transferred per second, in MB (throughput on size).
  7. total.data.sent.in.nMsg shows the count of the total messages sent during this test.
  8. And last, nMsg.sec shows how many messages were sent per second (throughput on message count).
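
As a quick sanity check of these numbers, take the sample run above: the test ran from 21:38:28:094 to 21:38:28:449, i.e. roughly 0.355 seconds. 100 messages of 100 bytes each amount to 10,000 bytes, or about 0.0095 MB, which the tool rounds to the reported 0.01 MB of total data sent. That gives MB.sec ≈ 0.0095 / 0.355 ≈ 0.0269 and nMsg.sec ≈ 100 / 0.355 ≈ 281.69, matching the reported figures.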

There are some more parameters which you can use while doing this performance test, like:

–csv-reporter-enabled : If set, the CSV metrics reporter will be enabled.

–initial-message-id : This is used for generating test data. If set, messages will be tagged with an ID and sent by the producer sequentially, starting from this ID. The message content will be a String in the form 'Message:000…1:xxx…'; using this parameter you will be able to see the messages being consumed on the consumer.

–message-size : This indicates the size of each message; it can be useful when you want to load test Kafka with some large messages.

–vary-message-size : If set, the message size will vary up to the given maximum.

There are some other options as well, which can be used as needed during the Producer performance test.
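
As an illustration, a run that combines several of these options might look like the sketch below. It assumes the same local broker and “test” topic used earlier; the message count and size are arbitrary values chosen just for the example:

```sh
# 100,000 messages of 1,000 bytes each, tagged with sequential IDs starting at 0,
# with the CSV metrics reporter enabled for later analysis
./kafka-producer-perf-test.sh --broker-list localhost:9092 --topic test \
    --messages 100000 --message-size 1000 --initial-message-id 0 \
    --csv-reporter-enabled
```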

For this blog, I took some performance numbers based on the number of messages and presented the results in an inline graph.

Performance at Consumer End

Now let’s look at how we can take performance stats at the consumer end. Type the following command and hit the Enter key.

```sh
./kafka-consumer-perf-test.sh --topic test --zookeeper localhost:2181
```

Let’s understand its command line options:

– The first parameter is “topic”; this is a required parameter and identifies the message category.
– The second parameter is “zookeeper”; this is also a required parameter and specifies the connection string for the ZooKeeper connection, in the form host:port.
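
The consumer perf tool also takes optional flags; the exact set can differ between Kafka releases, so treat the run below as a sketch and confirm the names with --help first. Here --messages (the number of messages to consume) and --threads (the number of consumer threads) are assumed to be available, mirroring the producer tool:

```sh
# Consume up to 100,000 messages from the "test" topic using 2 consumer threads.
# (Option names assumed; verify against ./kafka-consumer-perf-test.sh --help.)
./kafka-consumer-perf-test.sh --zookeeper localhost:2181 --topic test \
    --messages 100000 --threads 2
```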

Once the test has completed, some stats will be printed on the console, something like:

| start.time | end.time | fetch.size | data.consumed.in.MB | MB.sec | data.consumed.in.nMsg | nMsg.sec |
| --- | --- | --- | --- | --- | --- | --- |
| 2016-02-04 11:29:41:806 | 2016-02-04 11:29:46:854 | 1048576 | 0.0954 | 1.9869 | 1001 | 20854.1667 |

  1. start.time and end.time show when the test started and completed.
  2. fetch.size shows the amount of data to fetch in a single request.
  3. data.consumed.in.MB shows the size of all the messages consumed, in MB.
  4. MB.sec indicates how much data was transferred per second, in MB (throughput on size).
  5. data.consumed.in.nMsg shows the count of the total messages consumed during this test.
  6. And last, nMsg.sec shows how many messages were consumed per second (throughput on message count).

The performance test for the Consumer is also based on the number of messages, and the result was presented in an inline graph.

By using these stats we can decide the batch size, message size, and the maximum number of messages that can be produced/consumed for a given configuration; in other words, we can establish benchmark numbers for Kafka.

All the above analysis was done using the default settings of Kafka. There are multiple scenarios in which we can test and take performance stats for the Kafka Producer and Consumer; some of those cases are:

  1. Change number of topics
  2. Change async batch size
  3. Change message size
  4. Change number of partitions
  5. Network Latency
  6. Change number of Brokers
  7. Change the number of Producers/Consumers, etc.
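
As one concrete example of these scenarios, changing the number of partitions (scenario 4) can be exercised with the same scripts used earlier: create a topic with more partitions and rerun the producer perf test against it. The topic name and message count below are arbitrary choices for the sketch; run the commands from the Kafka installation directory:

```sh
# Create a 4-partition topic and run the same producer perf test against it
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 \
    --partitions 4 --topic test-4p
bin/kafka-producer-perf-test.sh --broker-list localhost:9092 --topic test-4p --messages 100000
```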

The above-mentioned changes can be made in the properties files available in the folder:

```sh
/Kafka/kafka_2.9.1-0.8.2.2/config
```
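
For orientation, the sketch below lists the files in that directory most relevant to the scenarios above; the annotations are a rough summary based on the files shipped with this Kafka release, not an exhaustive reference:

```sh
# The config directory contains, among others:
#   server.properties     - broker settings (e.g. num.partitions, log.dirs)
#   producer.properties   - default producer settings
#   consumer.properties   - default consumer settings
#   zookeeper.properties  - settings for the bundled ZooKeeper
ls config/
```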

To understand the config files, you can also refer to the link provided at the beginning of the blog.

This blog is just meant to give an initial idea about Apache Kafka performance testing and benchmarking. In further blogs we will discuss some more complex Kafka performance aspects.
