Working With Reactive Kafka Stream and Spring WebFlux

1. Overview

In this article, we’ll explore Reactive Kafka Streams, integrate them into a sample Spring WebFlux application, and examine how this combination enables us to build fully reactive, data-intensive applications with scalability, efficiency, and real-time processing.

To achieve this, we’ll use Spring Cloud Stream Reactive Kafka Binder, Spring WebFlux, and ClickHouse.

2. Spring Cloud Stream Reactive Kafka Binder

Spring Cloud Stream provides an abstraction layer over stream-based and message-driven microservices. The Reactive Kafka Binder enables the creation of fully reactive pipelines by connecting Kafka topics, message brokers, or Spring Cloud Stream applications. These pipelines leverage Project Reactor to process data streams reactively, ensuring non-blocking, asynchronous, and backpressure-aware processing throughout the data flow.

Unlike traditional Kafka Streams, which operate synchronously, Reactive Kafka Streams empower developers to define end-to-end reactive pipelines where each piece of data can be mapped, transformed, filtered, or reduced in real-time while still maintaining efficient resource utilization.

This approach is particularly well-suited for high-throughput, event-driven applications requiring reactive paradigms for better scalability and responsiveness.
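
To make the idea concrete, here's a minimal Project Reactor sketch, independent of the Kafka setup we'll build later, showing the operator style such pipelines are built from, where each element is mapped and filtered as it flows through:

import reactor.core.publisher.Flux;

public class ReactivePipelineSketch {
    public static void main(String[] args) {
        Flux.range(1, 10)                          // a stream of elements
          .map(i -> i * 10.0)                      // transform each element
          .filter(price -> price > 30.0)           // keep only the relevant ones
          .doOnNext(price -> System.out.println("processed: " + price))
          .blockLast();                            // block only for this standalone demo
    }
}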

2.1. Reactive Kafka Streams With Spring

With Spring Cloud Stream Reactive Kafka Binder, we can integrate Reactive Kafka Streams seamlessly into Spring WebFlux applications, enabling entirely reactive, non-blocking data processing. By leveraging the reactive APIs provided by Project Reactor, we can handle backpressure, achieve asynchronous data flow, and process streams efficiently without blocking threads.

This combination of Reactive Kafka Streams and Spring WebFlux offers a robust solution for building applications that require distributed, real-time, and reactive data pipelines.

Next, let’s dive into a sample application to demonstrate these capabilities in action.

3. Building a Reactive Kafka Stream Application

In this sample application, we’ll simulate a stock analytics application that receives, processes, and distributes stock price data. This application will showcase how well the Spring Cloud Stream, Kafka, and reactive programming paradigms work together in the Spring ecosystem.

First, let’s get all the dependencies needed for us to build such an application using Spring Boot:

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>2023.0.2</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

For this example, we’ll use the Spring Cloud BOM, which manages the versions of all our Spring Cloud dependencies. We’ll also use Spring Boot and the following modules:

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-stream-kafka</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-stream-binder-kafka-reactive</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-webflux</artifactId>
</dependency>

These modules allow us to build our web layer and data ingestion pipeline reactively. Besides the processing pipeline itself, we’ll also need some data persistence to store its outcome. Let’s use a simple yet very powerful analytics database for this:

<dependency>
    <groupId>com.clickhouse</groupId>
    <artifactId>clickhouse-r2dbc</artifactId>
    <version>0.7.1</version>
</dependency>

ClickHouse is a fast, open-source, column-oriented database management system that generates real-time analytical data reports using SQL queries. Since we aim to build an entirely reactive application, we’ll use its R2DBC driver.

3.1. Reactive Kafka Producer Setup

To start our data processing pipeline, we need a producer responsible for creating the data and submitting it to our application for ingestion. Let’s see how Spring helps us define and use such a producer:

@Component
public class StockPriceProducer {
    public static final String[] STOCKS = {"AAPL", "GOOG", "MSFT", "AMZN", "TSLA"};
    private static final String CURRENCY = "USD";
    private final ReactiveKafkaProducerTemplate<String, StockUpdate> kafkaProducer;
    private final NewTopic topic;
    private final Random random = new Random();
    public StockPriceProducer(KafkaProperties properties, 
                              @Qualifier(TopicConfig.STOCK_PRICES_IN) NewTopic topic) {
        this.kafkaProducer = new ReactiveKafkaProducerTemplate<>(
          SenderOptions.create(properties.buildProducerProperties())
        );
        this.topic = topic;
    }
    public Flux<SenderResult<Void>> produceStockPrices(int count) {
        return Flux.range(0, count)
          .map(i -> {
              String stock = STOCKS[random.nextInt(STOCKS.length)];
              double price = 100 + (200 * random.nextDouble());
              return MessageBuilder.withPayload(new StockUpdate(stock, price, CURRENCY, Instant.now()))
                .setHeader(MessageHeaders.CONTENT_TYPE, MediaType.APPLICATION_JSON_VALUE)
                .build();
          })
          .flatMap(stock -> {
              var newRecord = new ProducerRecord<>(
                topic.name(), 
                stock.getPayload().symbol(), 
                stock.getPayload());
              stock.getHeaders()
                .forEach((key, value) -> newRecord.headers().add(key, value.toString().getBytes()));
              return kafkaProducer.send(newRecord);
          });
    }
}

This class produces stock price updates and sends them to our Kafka topic.
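
The StockUpdate payload itself isn’t shown in the snippet above. Judging by how it’s used (symbol(), price(), a currency, and timestamp()), it can be modeled as a simple Java record along these lines:

import java.time.Instant;

// Assumed shape of the payload, inferred from its usage in the producer and processor
public record StockUpdate(String symbol, double price, String currency, Instant timestamp) {
}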

In StockPriceProducer, we inject the KafkaProperties defined in our application YAML file, which contains all the information required to connect to our Kafka cluster:

spring:
  kafka:
    producer:
      bootstrap-servers: localhost:9092
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.springframework.kafka.support.serializer.JsonSerializer
    properties:
      spring:
        json:
          trusted:
            packages: '*'

Then, NewTopic holds the reference to our Kafka topic, which is all we need to create our ReactiveKafkaProducerTemplate instance. This class abstracts most of the complexity involved in communication between our application and our Kafka topic.
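
The TopicConfig referenced by the @Qualifier annotations isn’t shown either. A minimal sketch, assuming the topics are declared as NewTopic beans through spring-kafka’s TopicBuilder (partition and replica counts here are illustrative), could look like this:

import org.apache.kafka.clients.admin.NewTopic;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.TopicBuilder;

@Configuration
public class TopicConfig {
    public static final String STOCK_PRICES_IN = "stock-prices-in";
    public static final String STOCK_PRICES_OUT = "stock-prices-out";

    @Bean(STOCK_PRICES_IN)
    public NewTopic stockPricesIn() {
        return TopicBuilder.name(STOCK_PRICES_IN).partitions(1).replicas(1).build();
    }

    @Bean(STOCK_PRICES_OUT)
    public NewTopic stockPricesOut() {
        return TopicBuilder.name(STOCK_PRICES_OUT).partitions(1).replicas(1).build();
    }
}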

In the produceStockPrices() method, we generate our StockUpdate objects and wrap them in Message objects. Spring provides the Message class, which encapsulates the details of message-based systems, such as the message payload and any headers we need to include, like the message’s content type. Finally, we create a ProducerRecord that defines the destination topic for the message and its partition key, and then we send it.
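
It’s worth noting that produceStockPrices() only assembles the reactive pipeline; nothing is sent until the returned Flux is subscribed to. As a hypothetical trigger (not part of the original setup, and with an illustrative mapping), a small endpoint could kick off the publication:

import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;

// Hypothetical endpoint used only to trigger the producer
@RestController
public class StockPricePublisherApi {
    private final StockPriceProducer producer;

    public StockPricePublisherApi(StockPriceProducer producer) {
        this.producer = producer;
    }

    @PostMapping("/publish/{count}")
    public Mono<Void> publish(@PathVariable int count) {
        // WebFlux subscribes to the returned Mono, which starts the Kafka sends
        return producer.produceStockPrices(count).then();
    }
}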

3.2. Reactive Kafka Streams Setup

Now, let’s imagine the producer is outside the same application. We need to connect to the stock price update topic and convert the stock prices from USD to EUR so that other application parts can use the data. At the same time, we need to save the history of the original stock price within a particular time window. So, let’s configure our data stream pipeline:

spring:
  cloud:
    stream:
      default-binder: kafka
      kafka:
        binder:
          brokers: localhost:9092
      bindings:
        default:
          content-type: application/json
        processStockPrices-in-0:
          destination: stock-prices-in
          group: live-stock-consumers-x
        processStockPrices-out-0:
          destination: stock-prices-out
          group: live-stock-consumers-y
          producer:
            useNativeEncoding: true

First, we use the default-binder property to define Kafka as our default binder. Spring Cloud Stream is vendor-agnostic, allowing us to use different messaging systems (e.g., Kafka and RabbitMQ) within the same application if necessary.

Next, we configure bindings, which act as bridges between the messaging system (e.g., Kafka topics) and the application’s producers and consumers:

  • The input channel processStockPrices-in-0 is bound to the stock-prices-in topic, where messages are consumed.
  • The output channel processStockPrices-out-0 is bound to the stock-prices-out topic, where processed messages are published.

Each binding is associated with the processStockPrices() method, which processes data from the input channel, applies transformations, and sends the result to the output channel.

We also define the content type as JSON, ensuring that messages are serialized and deserialized as JSON. Additionally, using useNativeEncoding: true in the producer ensures that the Kafka producer is responsible for encoding and serializing the data.

The group property (e.g., live-stock-consumers-x) enables message load balancing across consumers. All consumers in the same group share the work of processing messages from a topic, so each message is processed only once within the group.
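
One more detail: Spring Cloud Stream binds a single java.util.function.Function bean automatically, but if the application registers several functional beans, we also have to declare which ones to bind:

spring:
  cloud:
    function:
      definition: processStockPrices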

3.3. Reactive Kafka Streams Bindings Setup

As mentioned previously, bindings are the bridges between the input and output channels, allowing us to process data in transit. The name defined in the YAML file is essential, as it must correspond to the binding implementation, which in our case is a function that maps input messages to output messages.

Next, let’s see how Spring does it:

@Configuration
public class StockPriceProcessor {
    private static final String USD = "USD";
    private static final String EUR = "EUR";
    @Bean
    public Function<Flux<Message<StockUpdate>>, Flux<Message<StockUpdate>>> processStockPrices(
      ClickHouseRepository repository, 
      CurrencyRate currencyRate
    ) {
        return stockPrices -> stockPrices.flatMapSequential(message -> {
            StockUpdate stockUpdate = message.getPayload();
            return repository.saveStockPrice(stockUpdate)
              .flatMap(success -> Boolean.TRUE.equals(success) ? Mono.just(stockUpdate) : Mono.empty())
              .flatMap(stock -> currencyRate.convertRate(USD, EUR, stock.price()))
              .map(newPrice -> convertPrice(stockUpdate, newPrice))
              .map(priceInEuro -> MessageBuilder.withPayload(priceInEuro)
                .setHeader(KafkaHeaders.KEY, stockUpdate.symbol())
                .copyHeaders(message.getHeaders())
                .build());
        });
    }
    private StockUpdate convertPrice(StockUpdate stockUpdate, double newPrice) {
        return new StockUpdate(stockUpdate.symbol(), newPrice, EUR, stockUpdate.timestamp());
    }
}

This configuration demonstrates how to reactively process and transform stock price updates between two Kafka topics. The processStockPrices() function binds the input stock-prices-in topic to the output stock-prices-out topic, adding a processing layer between them. The flow is as follows:

  1. Message Processing: Each incoming StockUpdate message from the input topic is processed sequentially using flatMapSequential(). This ensures that the order of processing matches the order of the input messages, which can be important in maintaining consistency.
  2. Database Persistence: Each stock update is saved into the database using the ClickHouseRepository for future reference. Only successfully saved updates proceed further.
  3. Currency Conversion: The stock price, originally in USD, is converted to EUR using the CurrencyRate service.
  4. Message Transformation: The converted price is wrapped in a new StockUpdate object, retaining the original symbol as the Kafka message key via KafkaHeaders.KEY. This ensures proper message partitioning in the Kafka topic.
  5. Reactive Pipeline: The entire flow is reactive, leveraging Project Reactor’s non-blocking asynchronous capabilities for scalability and efficiency.

3.4. Auxiliary Services

ClickHouseRepository and CurrencyRate are simple interfaces; we keep their implementations minimal to illustrate the sample application:

public interface CurrencyRate {
    Mono<Double> convertRate(String from, String to, double amount);
}
public interface ClickHouseRepository {
    Mono<Boolean> saveStockPrice(StockUpdate stockUpdate);
    Flux<StockUpdate> findMinuteAvgStockPrices(Instant from, Instant to);
} 

These abstractions represent the kind of business logic an application can apply while processing such a data pipeline.
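
The concrete implementations are out of scope here, but a minimal sketch is enough to run the pipeline end to end, for example a fixed-rate converter and an in-memory stand-in for the ClickHouse-backed repository (the real one would issue SQL through the R2DBC driver). The rate, class names, and windowing behavior below are illustrative:

import java.time.Instant;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

import org.springframework.stereotype.Component;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

// Illustrative converter with an assumed static rate; a real implementation would look rates up
@Component
class FixedCurrencyRate implements CurrencyRate {
    private static final double USD_TO_EUR = 0.92;

    @Override
    public Mono<Double> convertRate(String from, String to, double amount) {
        return Mono.just(amount * USD_TO_EUR);
    }
}

// Illustrative in-memory stand-in; the actual repository would query ClickHouse reactively
@Component
class InMemoryStockRepository implements ClickHouseRepository {
    private final Queue<StockUpdate> store = new ConcurrentLinkedQueue<>();

    @Override
    public Mono<Boolean> saveStockPrice(StockUpdate stockUpdate) {
        return Mono.fromCallable(() -> store.add(stockUpdate));
    }

    @Override
    public Flux<StockUpdate> findMinuteAvgStockPrices(Instant from, Instant to) {
        // Returns raw updates in the window; the real query would compute per-minute averages in SQL
        return Flux.fromIterable(store)
          .filter(update -> !update.timestamp().isBefore(from) && !update.timestamp().isAfter(to));
    }
}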

3.5. Reactive Kafka Streams Consumer Setup

Once processed, the data sent to the output channel can be consumed by the same application or any other application. Such a consumer can also be implemented using a reactive Kafka template:

@Component
public class StockPriceConsumer {
    private static final Logger log = LoggerFactory.getLogger(StockPriceConsumer.class);
    private final ReactiveKafkaConsumerTemplate<String, StockUpdate> kafkaConsumerTemplate;
    public StockPriceConsumer(@NonNull KafkaProperties properties, 
                              @Qualifier(TopicConfig.STOCK_PRICES_OUT) NewTopic topic) {
        var receiverOptions = ReceiverOptions
          .<String, StockUpdate>create(properties.buildConsumerProperties())
          .subscription(List.of(topic.name()));
        this.kafkaConsumerTemplate = new ReactiveKafkaConsumerTemplate<>(receiverOptions);
    }
    @PostConstruct
    public void consume() {
       kafkaConsumerTemplate
         .receiveAutoAck()
         .doOnNext(consumerRecord -> {
             // simulate processing
             log.info(
               "received key={}, value={} from topic={}, offset={}, partition={}", consumerRecord.key(),
               consumerRecord.value(),
               consumerRecord.topic(),
               consumerRecord.offset(),
               consumerRecord.partition());
         })
         .doOnError(e -> log.error("Consumer error",  e))
         .doOnComplete(() -> log.info("Consumed all messages"))
         .subscribe();
    }
}

The StockPriceConsumer demonstrates consuming data from the stock-prices-out topic in a reactive way:

  1. Initialization: The constructor creates ReceiverOptions from the Kafka properties defined in the YAML configuration and subscribes to the stock-prices-out topic.
  2. Message Processing: The consume() method starts the reactive consumption using receiveAutoAck(). Each message is logged with its key, value, topic, offset, and partition details, simulating data processing.
  3. Reactive Features: The consumer processes messages reactively as they arrive, leveraging non-blocking, backpressure-aware processing. It also logs errors with doOnError() and signals completion with doOnComplete().

The following properties configure our consumer:

spring:
  kafka:
    consumer:
      bootstrap-servers: localhost:9092
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: org.springframework.kafka.support.serializer.JsonDeserializer
      group-id: my-group
      properties:
        reactiveAutoCommit: true

This consumer processes the stock-prices-out topic reactively, and this implementation highlights the seamless integration of reactive programming with Kafka for efficient stream processing.

3.6. Reactive WebFlux Application

Finally, now that the data is stored in our database, we can serve it to our users, since it’s persisted in our service and can be queried and processed as necessary:

@RestController
public class StocksApi {
    private final ClickHouseRepository repository;
    @Autowired
    public StocksApi(ClickHouseRepository repository) {
        this.repository = repository;
    }
    @GetMapping("/stock-prices-out")
    public Flux<StockUpdate> getAvgStockPrices(@RequestParam("from") @NotNull Instant from,  
                                               @RequestParam("to") @NotNull Instant to) {
        if (from.isAfter(to)) {
            throw new ResponseStatusException(HttpStatus.BAD_REQUEST, "'from' must come before 'to'");
        }
        return repository.findMinuteAvgStockPrices(from, to);
    }
}
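
Any reactive client can then stream the results from this endpoint. Here is a minimal sketch using WebFlux’s WebClient (the base URL and time window are illustrative):

import java.time.Duration;
import java.time.Instant;

import org.springframework.web.reactive.function.client.WebClient;

public class StocksApiClientExample {
    public static void main(String[] args) {
        Instant to = Instant.now();
        Instant from = to.minus(Duration.ofMinutes(10));

        WebClient.create("http://localhost:8080")
          .get()
          .uri(uriBuilder -> uriBuilder.path("/stock-prices-out")
            .queryParam("from", from)
            .queryParam("to", to)
            .build())
          .retrieve()
          .bodyToFlux(StockUpdate.class)
          .doOnNext(update -> System.out.println("average price: " + update))
          .blockLast();    // block only for this standalone demo
    }
}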

4. Connecting the Dots

We achieved an entirely reactive data processing pipeline with minimal code, connecting two Kafka topics, applying business logic, and ensuring high-throughput processing. This approach is ideal for event-driven systems requiring real-time data transformations. Spring Cloud Stream and Kafka form a powerful combination with extensive functionalities beyond what we covered here.

For example, bindings support multiple inputs and outputs, and Dead Letter Queues (DLQs) can enhance pipeline robustness. It’s also possible to integrate various messaging providers, enable transactional processing between channels, and more.

Spring Cloud Stream is a versatile tool. Combining it with the reactive paradigm unlocks potent data pipelines with resilience and high throughput. This article only scratches the surface of working with reactive Kafka Streams and Spring WebFlux, leaving much more to explore, but we have observed key benefits so far:

  • Real-Time Transformation: Enables live conversion and enrichment of event streams.
  • Backpressure Management: Handles data flow dynamically, avoiding system overload.
  • Seamless Integration: Combines Kafka’s event-driven power with Spring WebFlux’s non-blocking capabilities.
  • Scalable Design: Supports high-throughput systems with robust fault tolerance mechanisms like DLQs.

While this approach provides many benefits, as discussed in this article, there are also some points to pay attention to.

4.1. Practical Pitfalls and Best Practices

While reactive Kafka pipelines provide numerous advantages, they also introduce challenges:

  • Backpressure Handling: Failing to manage backpressure can lead to memory bloat or dropped messages. We should use .onBackpressureBuffer() or .onBackpressureDrop() where appropriate (see the sketch after this list).
  • Serialization Issues: Mismatched schemas between producers and consumers can cause deserialization failures. We must ensure schema compatibility.
  • Error Recovery: We must ensure proper retry mechanisms or use DLQs to handle transient issues effectively.
  • Resource Management: Inefficient message processing can overwhelm the application pipeline. In such cases, we can use the .limitRate() or .take() operators to control the processing rate within our reactive pipeline (also shown in the sketch after this list). We can also tune Kafka consumer fetch sizes and poll intervals to control the rate at which messages are retrieved from Kafka.
  • Data Consistency: Inconsistent data processing can arise without atomic operations or proper retry handling. We can use Kafka transactions for atomicity and/or write idempotent consumer logic to handle retries safely.
  • Schema Evolution: Evolving schemas without proper versioning can cause compatibility issues. We can use a schema registry for versioning and applying backward-compatible changes (e.g., adding optional fields).
  • Monitoring and Observability: Insufficient monitoring can make it challenging to identify bottlenecks or failures in the pipeline. We must integrate tools like Micrometer and Grafana (or any other preferred provider) for metrics and monitoring. We can also add trace IDs to Kafka messages for distributed tracking.
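
As referenced in the list above, here is a minimal Reactor sketch, assuming a fast source and a slow consumer, showing where these operators slot into a pipeline:

import java.time.Duration;

import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

public class BackpressureSketch {
    public static void main(String[] args) {
        Flux.interval(Duration.ofMillis(1))                                  // fast, unbounded source
          .onBackpressureDrop(i -> System.out.println("dropped " + i))       // drop what we can't keep up with
          .limitRate(100)                                                    // request at most 100 elements at a time upstream
          .concatMap(i -> Mono.just(i).delayElement(Duration.ofMillis(10)))  // simulate slow processing
          .take(50)                                                          // bound the demo
          .doOnNext(i -> System.out.println("processed " + i))
          .blockLast();
    }
}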

Paying attention to these points helps ensure a stable and scalable data processing pipeline for our system.

5. Conclusion

In this article, we demonstrated how Reactive Kafka Streams, integrated with Spring WebFlux, can enable fully reactive, data-intensive pipelines that are scalable, efficient, and capable of real-time processing. By leveraging the reactive paradigm, we built a seamless data flow between Kafka topics, applied business logic, and achieved high-throughput, event-driven processing with minimal code. This powerful combination underscores the potential of modern reactive technologies in creating robust and scalable systems tailored for real-time data transformations.

As usual, all code samples used in this article are available over on GitHub.
