1. Overview
In this tutorial, we’ll discuss a potent event streaming platform called Redpanda. It’s a competitor to the de facto industry-standard streaming platform, Kafka, and, interestingly, it’s also compatible with the Kafka APIs.
We’ll look at the key components, features, and use cases of Redpanda, create Java programs for publishing messages to Redpanda topics, and then read those messages back.
2. Redpanda vs. Kafka
Since the makers of Redpanda claim it’s a competitor to Kafka, let’s compare the two on a few important factors:
Feature | Redpanda | Kafka |
---|---|---|
Developer Experience | Ships as a single, easy-to-install binary with no dependency on a JVM or tools like ZooKeeper | Depends on a JVM and third-party tools like ZooKeeper, so there are more moving parts to operate |
Performance | Written in C++ with a thread-per-core model, delivering low latency and high throughput | JVM-based, with generally higher latency for comparable workloads |
Cost | Utilizes hardware efficiently, significantly reducing deployment cost | Typically requires more compute and memory for comparable throughput |
Connector | Comparatively young ecosystem with a limited set of connectors | Mature ecosystem with a wide range of connectors through Kafka Connect |
Community Support | Newer platform with a growing but smaller community | Widely adopted, with a large and well-established community |
3. Redpanda Architecture
Redpanda’s architecture is simple and easy to grasp. Interestingly, it has a single-binary installation package that’s easy to install, giving developers a quick head start, which is one reason for its popularity. Moreover, it delivers an extremely high-performing streaming platform with great throughput.
3.1. Key Components and Features
Let’s dive into the key components and features of Redpanda that make it extremely robust and performant:
The control plane supports the Kafka API for managing the broker, creating messaging topics, publishing and consuming messages, and much more. Hence, legacy systems relying on Kafka can migrate to Redpanda with significantly less effort. However, there’s a different set of Admin APIs for managing and configuring the Redpanda cluster.
Redpanda supports tiered storage. This means we can configure it to offload or archive its data logs from its local cache to cheaper object storage in the cloud. Also, when consumers demand older data, it’s moved back from the remote object storage to the local cache in real time.
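As a rough sketch, tiered storage can be toggled per topic through topic-level properties, set here with the Kafka AdminClient that we cover later in this tutorial. The redpanda.remote.write and redpanda.remote.read flags follow Redpanda’s documented topic properties, while the topic name is hypothetical:
// Hypothetical topic with Redpanda's tiered storage flags enabled;
// assumes object storage is already configured at the cluster level
Map<String, String> tieredConfig = new HashMap<>();
tieredConfig.put("redpanda.remote.write", "true"); // archive log segments to object storage
tieredConfig.put("redpanda.remote.read", "true");  // read archived data back on demand
NewTopic archivedTopic = new NewTopic("archived-topic", 1, (short) 1).configs(tieredConfig);
adminClient.createTopics(Collections.singleton(archivedTopic));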
Redpanda has a Raft consensus algorithm implementation layer that replicates topic-partition data across its nodes. This feature prevents data loss in the event of a failure. Naturally, it guarantees high data safety and fault tolerance.
Redpanda has robust authentication and authorization support. It can authenticate external users and applications using methods such as SASL, OAuth, OpenID Connect (OIDC), basic authentication, Kerberos, and others. Additionally, it enables fine-grained access control over its resources through the Role-Based Access Control (RBAC) mechanism.
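As an illustration, a Java Kafka client connects to a SASL/SCRAM-protected Redpanda cluster using the standard Kafka client security properties. This is only a sketch; the broker address and credentials are placeholders:
// Standard Kafka client security properties; broker address and credentials are placeholders
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("security.protocol", "SASL_PLAINTEXT"); // or SASL_SSL when TLS is enabled
props.put("sasl.mechanism", "SCRAM-SHA-256");
props.put("sasl.jaas.config",
  "org.apache.kafka.common.security.scram.ScramLoginModule required "
  + "username=\"redpanda-user\" password=\"redpanda-password\";");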
Schemas are essential in defining the data exchanged between the Redpanda broker, consumers, and producers. Hence, the cluster has a Schema Registry. The Schema Registry API helps register and modify the schemas.
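Since the Schema Registry exposes a Confluent-compatible REST interface, registering a schema boils down to an HTTP call. Here’s a sketch using Java’s built-in HttpClient; the port (8081, the registry’s default), the subject name, and the schema itself are assumptions:
// Registers a trivial schema under the hypothetical subject "sensor-value";
// assumes the Schema Registry listens on its default port 8081
HttpClient client = HttpClient.newHttpClient();
String schemaPayload = "{\"schema\": \"{\\\"type\\\": \\\"string\\\"}\"}";
HttpRequest request = HttpRequest.newBuilder()
  .uri(URI.create("http://localhost:8081/subjects/sensor-value/versions"))
  .header("Content-Type", "application/vnd.schemaregistry.v1+json")
  .POST(HttpRequest.BodyPublishers.ofString(schemaPayload))
  .build();
HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());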
The HTTP Proxy (pandaproxy) API provides a convenient way to interact with Redpanda for basic data operations like listing topics and brokers, getting events, producing events, and much more.
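For instance, listing topics through the HTTP Proxy is a plain GET request. Again, a sketch that assumes the proxy’s default port 8082:
// Lists topics via the HTTP Proxy; assumes pandaproxy's default port 8082
HttpRequest listTopics = HttpRequest.newBuilder()
  .uri(URI.create("http://localhost:8082/topics"))
  .GET()
  .build();
HttpResponse<String> topics = HttpClient.newHttpClient()
  .send(listTopics, HttpResponse.BodyHandlers.ofString());
System.out.println(topics.body()); // e.g., ["test-topic"]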
Finally, Redpanda provides metric endpoints for monitoring. These can be configured in Prometheus (a monitoring tool) to pull important metrics and show them on Grafana dashboards.
3.2. Single Binary Installation Package
Redpanda’s installation package comprises a single binary, hence its installation is significantly simpler than Kafka’s. Unlike Kafka, it doesn’t depend on a JVM or a cluster manager like ZooKeeper. Due to these factors, operating Redpanda is remarkably easy.
It’s developed in C++ and has a compelling thread-per-core programming model that helps utilize the CPU cores, memory, and network optimally. Consequently, the hardware cost for its deployment is significantly reduced. This model also results in low latency and high throughput.
Redpanda’s cluster comprises multiple nodes. Each node can be either a data plane or a control plane. All these nodes need is a single binary package installed on them with the appropriate configurations. If the nodes have high-end computing power, they can play both roles without performance bottlenecks.
3.3. Management Tools
Redpanda provides two management tools: a Web Console and a CLI called Redpanda Keeper (RPK). The Console is a user-friendly web application that cluster administrators can use.
RPK is mostly used for low-level cluster management and tuning, while the Console provides visibility into data streams, along with the capability to troubleshoot and manage the cluster.
4. Deployment
Redpanda supports both self-hosted and Redpanda Cloud deployments.
In a self-hosted deployment, customers can deploy the Redpanda cluster inside their private data centers or in their VPCs in the public cloud. It can be deployed on physical or virtual machines, as well as on Kubernetes. As a rule of thumb, each broker should have its own dedicated node. Currently, RHEL/CentOS and Ubuntu operating systems are supported.
Additionally, AWS Simple Storage Service (S3), Azure Blob Storage (ABS), and Google Cloud Storage (GCS) can be used to support tiered storage.
Interestingly, customers can also opt for Redpanda Cloud for managed services. They can either have the whole cluster completely on Redpanda Cloud or choose to own the data plane running in their private data centers or public cloud accounts. The control plane remains on the Redpanda Cloud where monitoring, provisioning, and upgrades are all taken care of.
5. Key Use Cases
Unlike Kafka, Redpanda is an extremely approachable streaming platform for developers because of its simple architecture and ease of installation. Let’s quickly look at its key use cases along the same lines:
In general, the participants in a streaming platform are:
- Source systems that generate feeds, such as monitoring events, metrics, and notifications
- Brokers in the cluster that manage the topics
- Producers that read feeds from the source systems and publish them to the topics
- Consumers that constantly poll the subscribed topics
- Target systems that receive the transformed messages from the consumers
Redpanda guarantees the delivery of live feeds from various sources, like monitoring tools, compliance and security platforms, IoT devices, and others, to target systems, with an average latency claimed to be 10x lower than Kafka’s.
It supports the consumer and producer model for processing live feeds or events from various sources. The producers are applications that read data from source systems and publish it to topics in the Redpanda cluster. The brokers in the cluster are highly reliable and fault-tolerant, guaranteeing message delivery.
The consumer applications subscribe to the topics in the cluster. Eventually, they read the data from the topics and, after further transforming the data, send them to various target systems like analytics platforms, NoSQL databases, relational databases, or other streaming platforms.
In a microservices architecture, Redpanda helps decouple microservices by facilitating asynchronous communication between them.
Consequently, it can play a substantial role across industries in developing:
- Observability platforms for event and log processing, reporting, troubleshooting, and auto-healing
- Real-time compliance and fraud-detection systems
- Real-time analytic dashboards and applications
6. Implement Redpanda Client With Kafka API
Notably, Redpanda supports the Kafka API. Hence, we’ll use the Kafka client to write programs that interact with the Redpanda stream.
For our examples, we use Java Testcontainers to deploy a single-node Redpanda cluster on a Windows desktop.
Furthermore, we’ll explore fundamental programs covering topic creation, message publishing, and message consumption. This is just for demonstration purposes and, hence, we won’t delve deeply into the Kafka API concepts.
6.1. Prerequisites
Before we begin, let’s import the necessary Maven dependency for the Kafka client library:
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>3.6.1</version>
</dependency>
6.2. Create Topic
For creating a topic on Redpanda, we’ll first instantiate the AdminClient class from the Kafka client library:
AdminClient createAdminClient() {
    Properties adminProps = new Properties();
    adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, getBrokerUrl());
    return KafkaAdminClient.create(adminProps);
}
To set up the AdminClient, we get the broker URL and pass it to the static create() method of KafkaAdminClient.
Now, let’s see how we create a topic:
void createTopic(String topicName) {
    try (AdminClient adminClient = createAdminClient()) {
        NewTopic topic = new NewTopic(topicName, 1, (short) 1);
        adminClient.createTopics(Collections.singleton(topic));
    } catch (Exception e) {
        LOGGER.error("Error occurred during topic creation:", e);
    }
}
The createTopics() method of the AdminClient class takes in the NewTopic object as an argument for creating a topic.
Finally, let’s take a look at the createTopic() method in action:
@Test
void whenCreateTopic_thenSuccess() throws ExecutionException, InterruptedException {
    String topic = "test-topic";
    createTopic(topic);
    try (AdminClient adminClient = createAdminClient()) {
        assertTrue(adminClient.listTopics()
          .names()
          .get()
          .contains(topic));
    }
}
The program creates the topic test-topic successfully on Redpanda. We also validate the presence of the topic in the broker with the method listTopics() of the AdminClient class.
6.3. Publish Message to a Topic
Understandably, the most basic requirement of a producer application is publishing messages to a topic. For this purpose, we’ll use a KafkaProducer:
KafkaProducer<String, String> createProducer() {
    Properties producerProps = new Properties();
    producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, getBrokerUrl());
    producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    return new KafkaProducer<>(producerProps);
}
We instantiated the producer by supplying essential properties like the broker URL and the StringSerializer class to the KafkaProducer constructor.
Now, let’s use the producer to publish the messages to a topic:
void publishMessage(String msgKey, String msg, String topic, KafkaProducer<String, String> producer)
  throws ExecutionException, InterruptedException {
    ProducerRecord<String, String> record = new ProducerRecord<>(topic, msgKey, msg);
    producer.send(record).get();
}
After creating the ProducerRecord object, we pass it to the send() method of the KafkaProducer object to publish the message. The send() method operates asynchronously, hence we call get() on the returned Future to block until the message is published.
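When blocking on every message isn’t desirable, send() also accepts a callback that fires once the broker acknowledges the record. A sketch of this non-blocking variant:
// Non-blocking variant: the callback runs when the broker acknowledges the record
producer.send(record, (metadata, exception) -> {
    if (exception != null) {
        LOGGER.error("Failed to publish message:", exception);
    } else {
        LOGGER.info("Published to partition {} at offset {}", metadata.partition(), metadata.offset());
    }
});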
Finally, let’s publish a message:
@Test
void givenTopic_whenPublishMsg_thenSuccess() {
    try (final KafkaProducer<String, String> producer = createProducer()) {
        assertDoesNotThrow(() -> publishMessage("test_msg_key_2", "Hello Redpanda!", "baeldung-topic", producer));
    }
}
First, we create the KafkaProducer object by invoking the method createProducer(). Then, we publish the message “Hello Redpanda!” to the topic baeldung-topic by calling the method publishMessage() that we covered earlier.
6.4. Consume Message From a Topic
As a next step, we’ll first create a KafkaConsumer before we can consume the messages from the stream:
KafkaConsumer<String, String> createConsumer() {
    Properties consumerProps = new Properties();
    consumerProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, getBrokerUrl());
    consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "test-consumer-group");
    consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    return new KafkaConsumer<>(consumerProps);
}
We instantiate the consumer by providing essential properties, like the broker URL, the StringDeserializer class, and others, to the KafkaConsumer constructor. Additionally, we ensure that the consumer reads messages from the beginning of the topic (“earliest”).
Moving on, let’s consume some messages:
@Test
void givenTopic_whenConsumeMessage_thenSuccess() {
    try (KafkaConsumer<String, String> kafkaConsumer = createConsumer()) {
        kafkaConsumer.subscribe(Collections.singletonList(TOPIC_NAME));
        while (true) {
            ConsumerRecords<String, String> records = kafkaConsumer.poll(Duration.ofMillis(1000));
            if (records.count() == 0) {
                continue;
            }
            assertTrue(records.count() >= 1);
            break;
        }
    }
}
The method, after creating a KafkaConsumer object, subscribes to a topic. Then, it polls the topic every 1000 ms to read messages from it. Here, for demonstration, we break out of the loop once messages arrive, but in the real world, applications poll continuously and process the messages further, as sketched below.
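A long-running consumer loop could look roughly like this sketch, where the running flag and processRecord() are hypothetical placeholders for application-specific shutdown handling and processing logic:
// A typical long-running consumer loop; 'running' would be flipped by a shutdown hook,
// and processRecord() is a hypothetical placeholder for application logic
try (KafkaConsumer<String, String> consumer = createConsumer()) {
    consumer.subscribe(Collections.singletonList(TOPIC_NAME));
    while (running) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
        for (ConsumerRecord<String, String> record : records) {
            processRecord(record); // transform and forward to a target system
        }
    }
}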
7. Conclusion
In this tutorial, we explored the Redpanda streaming platform. Conceptually, it’s similar to Apache Kafka but much easier to install, monitor, and manage. Additionally, with fewer compute and memory resources, it can achieve extremely high performance with high fault tolerance.
However, Redpanda still has a considerable distance to cover in terms of industry adoption when compared to Kafka. Additionally, the community support for Redpanda is not as strong as that for Kafka.
Finally, applications can migrate to Redpanda from Kafka with considerably less effort because it’s compatible with the Kafka API.
As usual, the code used in this article is available over on GitHub.