
How to Add Partitions to an Existing Topic in Kafka


1. Overview

Kafka is an extremely popular distributed messaging platform with lots of features. In Kafka, we publish messages to topics, which are in turn divided into partitions where the messages are actually stored. Sometimes we need to increase the number of partitions of an existing topic. In this tutorial, we'll learn how to do exactly that.

2. Reasons to Increase Partitions

Before discussing how to increase partitions, it makes sense to look at why we would do it in the first place. Here are some common scenarios:

  • When producers publish messages faster than the existing partitions can keep up with
  • When we add new consumers to a consumer group and need more partitions for parallel processing
  • When certain partitions handle disproportionately more data than others
  • For fault tolerance
  • To proactively provision for anticipated future growth

So, we can see that there are many reasons to add new partitions. Now, the question is: how do we do this? In the next section, we're going to learn two ways to achieve it.

3. How to Add Partitions

Kafka provides two ways to add new partitions. One is the kafka-topics.sh script on the CLI, and the other is the Kafka Admin API, which lets us add partitions programmatically. Let's look at each in turn.

3.1. Using Kafka Script

Kafka provides the kafka-topics.sh script to add new partitions to a topic. Here is the CLI command for that:

$ bin/kafka-topics.sh --bootstrap-server <broker:port> --topic <topic-name> --alter --partitions <number>

Let's suppose our broker is running at localhost:9092, the topic name is my-topic, and the topic currently has two partitions. Let's add one more partition:
$ bin/kafka-topics.sh --bootstrap-server localhost:9092 --topic my-topic --alter --partitions 3

Notably, the number of partitions here is 3, which is the total partition count, including both the existing and the new partitions. After this command, the topic will have a total of three partitions.
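
If we want to verify the change, kafka-topics.sh also offers a --describe option. A quick check (assuming the same broker and topic as above) looks like this:

$ bin/kafka-topics.sh --bootstrap-server localhost:9092 --topic my-topic --describe

The output lists each partition of my-topic along with its leader and replica assignments.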

3.2. Using Kafka API

Kafka also offers a programmatic way to achieve the same task through its Admin API. The API is pretty simple and expects the same parameters as the CLI command above.

First, we need to add the Kafka Client library to our project:

<dependency>
     <groupId>org.apache.kafka</groupId>
     <artifactId>kafka-clients</artifactId>
     <version>3.9.0</version>
</dependency>

We can find the latest version of this dependency on Maven Central.

Now, let’s understand how to increase partitions programmatically:

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");

try (AdminClient adminClient = AdminClient.create(props)) {
    // increaseTo(3) sets the total partition count of my-topic to three
    adminClient.createPartitions(Collections.singletonMap("my-topic", NewPartitions.increaseTo(3)))
      .all()
      .get();
} catch (Exception e) {
    throw new RuntimeException(e);
}

As we can see, the AdminClient needs the broker address, the topic name, and the total number of partitions. Once again, as the method name increaseTo() suggests, its argument is the total partition count we want to end up with, covering both the existing partitions and the new ones.
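
To double-check the result, we can read the topic metadata back with the same AdminClient. Here's a minimal verification sketch (reusing the props and topic name from above; it isn't part of the original example) that prints the current partition count:

try (AdminClient adminClient = AdminClient.create(props)) {
    TopicDescription description = adminClient.describeTopics(Collections.singletonList("my-topic"))
      .allTopicNames()
      .get()
      .get("my-topic");
    // partitions() returns one TopicPartitionInfo per partition, so its size is the partition count
    System.out.println("my-topic now has " + description.partitions().size() + " partitions");
} catch (Exception e) {
    throw new RuntimeException(e);
}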

4. Common Pitfalls When Increasing Partitions

Although adding new partitions looks like a pretty easy task, it comes with several caveats. Let's discuss the most important pitfalls in this section.

4.1. Impact on Message Ordering

We know that Kafka guarantees message ordering within a partition, but not across partitions. The default partitioner assigns a keyed message to a partition based on the key's hash modulo the number of partitions, so when we increase partitions, some keys end up mapped to different partitions than before. This can disrupt the order of messages for a specific key.

If per-key message ordering is critical in our system, this remapping can cause real problems. To avoid it, we should either fix the partition count up front or use a custom partitioner whose key-to-partition mapping doesn't depend on the total number of partitions. The sketch below illustrates the remapping effect.
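
As a hypothetical illustration (not part of the article's example code), the following snippet uses Kafka's Utils.murmur2() helper, the same hash the default partitioner applies to record keys, to show how a sample key such as order-42 can land on a different partition once the partition count changes:

byte[] keyBytes = "order-42".getBytes(StandardCharsets.UTF_8);

// roughly what the default partitioner does for keyed records: positive murmur2 hash modulo partition count
int partitionBefore = Utils.toPositive(Utils.murmur2(keyBytes)) % 2; // with two partitions
int partitionAfter = Utils.toPositive(Utils.murmur2(keyBytes)) % 3;  // with three partitions

// the two values can differ, so new messages with the same key may go to a different partition
System.out.println(partitionBefore + " -> " + partitionAfter);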

4.2. Consumer Rebalancing

Adding partitions triggers consumer group rebalancing, which can temporarily interrupt message consumption. Consumers may stop processing for a brief period while the rebalance occurs.

Therefore, it's best to perform partition increases during off-peak times or scheduled maintenance windows. We can also soften the impact on the consumer side by using an incremental cooperative rebalancing strategy, as shown below.
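
For example, here's a minimal consumer configuration sketch (the group id my-group is just a placeholder) that opts into the cooperative-sticky assignment strategy, so a rebalance pauses only the partitions that actually move instead of stopping the whole group:

Properties consumerProps = new Properties();
consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
// incremental cooperative rebalancing: only reassigned partitions are revoked during a rebalance
consumerProps.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, CooperativeStickyAssignor.class.getName());

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps, new StringDeserializer(), new StringDeserializer());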

4.3. Increased Broker and Cluster Load

If the increased number of partitions is significant, then it may strain our Kafka cluster. This is because more partitions mean more metadata and management overhead on brokers.

That's why we need to ensure that our brokers have ample CPU, memory, and disk capacity before and after increasing partitions, and that the cluster is adequately scaled to handle the new load.

4.4. Repartitioning Complexity

While Kafka allows us to add partitions dynamically, it doesn’t redistribute existing data across the new partitions. As a result, only new data will be written to the additional partitions, which can lead to uneven data distribution.

So, the onus is on us to re-produce or reprocess the old data if we want it redistributed across the new partitions. It's also important to avoid adding partitions too frequently and to plan our partitioning strategy for long-term growth up front.

Apart from the above caveats, there are further challenges, such as client configuration issues, latency issues, and partitioning strategy issues, that we need to handle. It's therefore very important to plan and test on non-production environments to mitigate these risks.

5. Conclusion

In this article, we learned why we need to add new partitions in Kafka. We also looked at two ways to add new partitions – via CLI and Kafka Admin API. Lastly, we discussed some pitfalls that we can encounter while adding new partitions and how we can anticipate them.

All the code examples used in this article are available over on GitHub.
