1. Introduction
Spring Batch provides two different ways of implementing a job: tasklets and chunks.
In this article, we’ll learn how to configure and implement both methods using a simple real-life example.
2. Dependencies
Let’s get started by adding the required dependencies:
    <dependency>
        <groupId>org.springframework.batch</groupId>
        <artifactId>spring-batch-core</artifactId>
        <version>4.0.0.RELEASE</version>
    </dependency>
    <dependency>
        <groupId>org.springframework.batch</groupId>
        <artifactId>spring-batch-test</artifactId>
        <version>4.0.0.RELEASE</version>
        <scope>test</scope>
    </dependency>
To get the latest version of spring-batch-core and spring-batch-test, please refer to Maven Central.
3. Our Use Case
Let’s consider a CSV file with the following content:
    Mae Hodges,10/22/1972
    Gary Potter,02/22/1953
    Betty Wise,02/17/1968
    Wayne Rose,04/06/1977
    Adam Caldwell,09/27/1995
    Lucille Phillips,05/14/1992
The first field of each line is a person’s name and the second field is their date of birth.
Our use case is to generate another CSV file that contains each person’s name and age:
    Mae Hodges,45
    Gary Potter,64
    Betty Wise,49
    Wayne Rose,40
    Adam Caldwell,22
    Lucille Phillips,25
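Before involving Spring Batch, the core transformation can be sketched in plain Java. The class name AgeExample and the fixed reference date of 15 January 2018 are assumptions made for illustration (the date is chosen so the results match the sample output); the job built in this article uses the current date instead, so its output changes over time.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

class AgeExample {

    // Same date pattern as the input file (month/day/year).
    private static final DateTimeFormatter DOB_FORMAT =
        DateTimeFormatter.ofPattern("MM/dd/yyyy");

    // Full years between the date of birth and the given reference date.
    static long ageOn(String dob, LocalDate referenceDate) {
        LocalDate birthDate = LocalDate.parse(dob, DOB_FORMAT);
        return ChronoUnit.YEARS.between(birthDate, referenceDate);
    }

    public static void main(String[] args) {
        // Assumed reference date, not taken from the article.
        LocalDate referenceDate = LocalDate.of(2018, 1, 15);
        System.out.println("Mae Hodges," + ageOn("10/22/1972", referenceDate));
        System.out.println("Gary Potter," + ageOn("02/22/1953", referenceDate));
    }
}
```

With that reference date, the two printed lines are Mae Hodges,45 and Gary Potter,64, matching the first two records of the expected output.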
Now that our domain is clear let’s go ahead and build a solution using both approaches. We’ll start with tasklets.
4. Tasklets Approach
4.1. Introduction and Design
Tasklets are meant to perform a single task within a step. Our job will consist of several steps that execute one after the other. Each step should perform only one defined task.
Our job will consist of three steps:
- Read lines from the input CSV file.
- Calculate age for every person in the input CSV file.
- Write name and age of each person to a new output CSV file.
Now that the big picture is ready, let’s create one class per step.
LinesReader will be in charge of reading data from the input file:
    public class LinesReader implements Tasklet {
        // ...
    }
LinesProcessor will calculate the age for every person in the file:
    public class LinesProcessor implements Tasklet {
        // ...
    }
Finally, LinesWriter will have the responsibility of writing names and ages to an output file:
    public class LinesWriter implements Tasklet {
        // ...
    }
At this point, all our steps implement the Tasklet interface, which requires us to implement its execute method:
    @Override
    public RepeatStatus execute(StepContribution stepContribution,
      ChunkContext chunkContext) throws Exception {
        // ...
    }
This method is where we’ll add the logic for each step. Before starting with that code, let’s configure our job.
4.2. Configuration
We need to add some configuration to Spring’s application context. After adding the standard bean declarations for the classes created in the previous section, we’re ready to create our job definition:
    @Configuration
    @EnableBatchProcessing
    public class TaskletsConfig {

        @Autowired
        private JobBuilderFactory jobs;

        @Autowired
        private StepBuilderFactory steps;

        @Bean
        protected Step readLines() {
            return steps
              .get("readLines")
              .tasklet(linesReader())
              .build();
        }

        @Bean
        protected Step processLines() {
            return steps
              .get("processLines")
              .tasklet(linesProcessor())
              .build();
        }

        @Bean
        protected Step writeLines() {
            return steps
              .get("writeLines")
              .tasklet(linesWriter())
              .build();
        }

        @Bean
        public Job job() {
            return jobs
              .get("taskletsJob")
              .start(readLines())
              .next(processLines())
              .next(writeLines())
              .build();
        }

        // ...
    }
This means that our “taskletsJob” will consist of three steps. The first one (readLines) will execute the tasklet defined in the bean linesReader and move to the next step: processLines. The processLines step will perform the tasklet defined in the bean linesProcessor and go to the final step: writeLines.
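As a rough mental model of this start/next chain, the following plain-Java sketch runs named steps in order and stops at the first one that does not complete. It is only an analogy: SequentialFlowSketch, run and runThreeSteps are hypothetical names, and none of this is Spring Batch API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

class SequentialFlowSketch {

    // Runs the steps in insertion order and stops after the first one that fails.
    // Returns the names of the steps that were started.
    static List<String> run(Map<String, BooleanSupplier> steps) {
        List<String> started = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> step : steps.entrySet()) {
            started.add(step.getKey());
            boolean completed = step.getValue().getAsBoolean();
            if (!completed) {
                break; // later steps are skipped once a step fails
            }
        }
        return started;
    }

    // Hypothetical stand-in for the three-step job defined above.
    static List<String> runThreeSteps(boolean processingSucceeds) {
        Map<String, BooleanSupplier> steps = new LinkedHashMap<>();
        steps.put("readLines", () -> true);
        steps.put("processLines", () -> processingSucceeds);
        steps.put("writeLines", () -> true);
        return run(steps);
    }

    public static void main(String[] args) {
        System.out.println(runThreeSteps(true));  // all three steps run
        System.out.println(runThreeSteps(false)); // writeLines is never started
    }
}
```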
Our job flow is defined, and we’re ready to add some logic!
4.3. Model and Utils
As we’ll be manipulating lines in a CSV file, we’re going to create a class Line:
    public class Line implements Serializable {

        private String name;
        private LocalDate dob;
        private Long age;

        // standard constructor, getters, setters and toString implementation
    }
Please note that Line implements Serializable. That is because Line will act as a DTO to transfer data between steps through the job’s execution context, and Spring Batch expects objects stored in that context to be serializable, since the context can be persisted.
Next, we need a way to read and write CSV lines.
For that, we’ll make use of OpenCSV:
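To see what being serializable buys us, here is a minimal sketch that writes an object to bytes and reads it back, assuming standard Java serialization. SimpleLine is a simplified, hypothetical stand-in for Line, and roundTrip is an illustrative helper; how Spring Batch actually persists the context depends on its configuration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.time.LocalDate;

class SerializableLineSketch {

    // Simplified stand-in for the article's Line class.
    static class SimpleLine implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final LocalDate dob;

        SimpleLine(String name, LocalDate dob) {
            this.name = name;
            this.dob = dob;
        }
    }

    // Serializes the object to a byte array and deserializes a copy from it.
    static SimpleLine roundTrip(SimpleLine line) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(line);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SimpleLine) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Round trip failed", e);
        }
    }

    public static void main(String[] args) {
        SimpleLine copy = roundTrip(
            new SimpleLine("Mae Hodges", LocalDate.of(1972, 10, 22)));
        System.out.println(copy.name + "," + copy.dob);
    }
}
```

Without implements Serializable on SimpleLine, writeObject would fail with a NotSerializableException, which is the kind of error the marker interface prevents.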
    <dependency>
        <groupId>com.opencsv</groupId>
        <artifactId>opencsv</artifactId>
        <version>4.1</version>
    </dependency>
Look for the latest OpenCSV version in Maven Central.
Once OpenCSV is included, we’re also going to create a FileUtils class. It will provide methods for reading and writing CSV lines:
    public class FileUtils {

        public Line readLine() throws Exception {
            if (CSVReader == null)
              initReader();
            String[] line = CSVReader.readNext();
            if (line == null)
              return null;
            return new Line(
              line[0],
              LocalDate.parse(
                line[1],
                DateTimeFormatter.ofPattern("MM/dd/yyyy")));
        }

        public void writeLine(Line line) throws Exception {
            if (CSVWriter == null)
              initWriter();
            String[] lineStr = new String[2];
            lineStr[0] = line.getName();
            lineStr[1] = line
              .getAge()
              .toString();
            CSVWriter.writeNext(lineStr);
        }

        // ...
    }
Notice that readLine acts as a wrapper over OpenCSV’s readNext method and returns a Line object, or null when there is nothing left to read.
In the same way, writeLine wraps OpenCSV’s writeNext and receives a Line object. The full implementation of this class can be found in the GitHub project.
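The wrapper pattern (create the reader lazily, return null at the end of the input) can be illustrated without OpenCSV. The sketch below is a simplified stand-in with hypothetical names (LazyCsvReaderSketch, readFields, countRecords); it splits on commas naively and does not handle quoted fields, which is one reason to use a real CSV library.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class LazyCsvReaderSketch {

    private final String csvContent;
    private BufferedReader reader; // created lazily on the first read

    LazyCsvReaderSketch(String csvContent) {
        this.csvContent = csvContent;
    }

    // Returns the fields of the next record, or null when the input is exhausted.
    String[] readFields() {
        try {
            if (reader == null) {
                reader = new BufferedReader(new StringReader(csvContent));
            }
            String line = reader.readLine();
            // Naive split: assumes no quoted fields or embedded commas.
            return line == null ? null : line.split(",");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Drains the input the same way a caller of readLine would.
    static int countRecords(String csvContent) {
        LazyCsvReaderSketch sketch = new LazyCsvReaderSketch(csvContent);
        int count = 0;
        while (sketch.readFields() != null) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String csv = "Mae Hodges,10/22/1972\nGary Potter,02/22/1953";
        System.out.println(countRecords(csv) + " records");
    }
}
```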
At this point, we’re all set to start with each step implementation.
4.4. LinesReader
Let’s go ahead and complete our LinesReader class:
    public class LinesReader implements Tasklet, StepExecutionListener {

        private final Logger logger = LoggerFactory
          .getLogger(LinesReader.class);

        private List<Line> lines;
        private FileUtils fu;

        @Override
        public void beforeStep(StepExecution stepExecution) {
            lines = new ArrayList<>();
            fu = new FileUtils(
              "taskletsvschunks/input/tasklets-vs-chunks.csv");
            logger.debug("Lines Reader initialized.");
        }

        @Override
        public RepeatStatus execute(StepContribution stepContribution,
          ChunkContext chunkContext) throws Exception {
            Line line = fu.readLine();
            while (line != null) {
                lines.add(line);
                logger.debug("Read line: " + line.toString());
                line = fu.readLine();
            }
            return RepeatStatus.FINISHED;
        }

        @Override
        public ExitStatus afterStep(StepExecution stepExecution) {
            fu.closeReader();
            stepExecution
              .getJobExecution()
              .getExecutionContext()
              .put("lines", this.lines);
            logger.debug("Lines Reader ended.");
            return ExitStatus.COMPLETED;
        }
    }
LinesReader’s beforeStep method creates a FileUtils instance over the input file path. Then, execute adds lines to a list until there are no more lines to read.
Our class also implements StepExecutionListener, which provides two extra methods: beforeStep and afterStep. We’ll use those methods to initialize and close things before and after execute runs.
If we take a look at afterStep code, we’ll notice the line where the result list (lines) is put in the job’s context to make it available for the next step:
    stepExecution
      .getJobExecution()
      .getExecutionContext()
      .put("lines", this.lines);
At this point, our first step has already fulfilled its responsibility: load CSV lines into a List in memory. Let’s move to the second step and process them.
4.5. LinesProcessor
LinesProcessor will also implement StepExecutionListener and of course, Tasklet. That means that it will implement beforeStep, execute and afterStep methods as well:
    public class LinesProcessor implements Tasklet, StepExecutionListener {

        private Logger logger = LoggerFactory.getLogger(
          LinesProcessor.class);

        private List<Line> lines;

        @Override
        public void beforeStep(StepExecution stepExecution) {
            ExecutionContext executionContext = stepExecution
              .getJobExecution()
              .getExecutionContext();
            this.lines = (List<Line>) executionContext.get("lines");
            logger.debug("Lines Processor initialized.");
        }

        @Override
        public RepeatStatus execute(StepContribution stepContribution,
          ChunkContext chunkContext) throws Exception {
            for (Line line : lines) {
                long age = ChronoUnit.YEARS.between(
                  line.getDob(),
                  LocalDate.now());
                logger.debug("Calculated age " + age + " for line " + line.toString());
                line.setAge(age);
            }
            return RepeatStatus.FINISHED;
        }

        @Override
        public ExitStatus afterStep(StepExecution stepExecution) {
            logger.debug("Lines Processor ended.");
            return ExitStatus.COMPLETED;
        }
    }
The class loads the lines list from the job’s context in beforeStep and calculates the age of each person in execute.
There’s no need to put another result list in the context as modifications happen on the same object that comes from the previous step.
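The reason no second put is needed comes down to plain Java reference semantics, which the following sketch illustrates with an ordinary map standing in for the job’s context. SharedContextSketch and Person are hypothetical names, and the sketch assumes the context stays in memory for the whole run; a context that was persisted and reloaded would hold a copy instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SharedContextSketch {

    // Minimal mutable stand-in for the article's Line class.
    static class Person {
        final String name;
        Long age;

        Person(String name) {
            this.name = name;
        }
    }

    @SuppressWarnings("unchecked")
    static Long ageSeenByWriter() {
        Map<String, Object> context = new HashMap<>();

        // "Reader" step: stores the list once.
        List<Person> lines = new ArrayList<>();
        lines.add(new Person("Mae Hodges"));
        context.put("lines", lines);

        // "Processor" step: mutates the objects it got from the context.
        List<Person> fromContext = (List<Person>) context.get("lines");
        fromContext.get(0).age = 45L;

        // "Writer" step: sees the change without any further put().
        List<Person> seenByWriter = (List<Person>) context.get("lines");
        return seenByWriter.get(0).age;
    }

    public static void main(String[] args) {
        System.out.println("Age seen by the writer: " + ageSeenByWriter());
    }
}
```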
And we’re ready for our last step.
4.6. LinesWriter
LinesWriter’s task is to go over the lines list and write each name and age to the output file:
    public class LinesWriter implements Tasklet, StepExecutionListener {

        private final Logger logger = LoggerFactory
          .getLogger(LinesWriter.class);

        private List<Line> lines;
        private FileUtils fu;

        @Override
        public void beforeStep(StepExecution stepExecution) {
            ExecutionContext executionContext = stepExecution
              .getJobExecution()
              .getExecutionContext();
            this.lines = (List<Line>) executionContext.get("lines");
            fu = new FileUtils("output.csv");
            logger.debug("Lines Writer initialized.");
        }

        @Override
        public RepeatStatus execute(StepContribution stepContribution,
          ChunkContext chunkContext) throws Exception {
            for (Line line : lines) {
                fu.writeLine(line);
                logger.debug("Wrote line " + line.toString());
            }
            return RepeatStatus.FINISHED;
        }

        @Override
        public ExitStatus afterStep(StepExecution stepExecution) {
            fu.closeWriter();
            logger.debug("Lines Writer ended.");
            return ExitStatus.COMPLETED;
        }
    }
We’re done with our job’s implementation! Let’s create a test to run it and see the results.
4.7. Running the Job
To run the job, we’ll create a test:
    @RunWith(SpringJUnit4ClassRunner.class)
    @ContextConfiguration(classes = TaskletsConfig.class)
    public class TaskletsTest {

        @Autowired
        private JobLauncherTestUtils jobLauncherTestUtils;

        @Test
        public void givenTaskletsJob_whenJobEnds_thenStatusCompleted()
          throws Exception {
            JobExecution jobExecution = jobLauncherTestUtils.launchJob();
            assertEquals(ExitStatus.COMPLETED, jobExecution.getExitStatus());
        }
    }
The ContextConfiguration annotation points to the Spring context configuration class that holds our job definition.
We’ll need to add a couple of extra beans before running the test:
    @Bean
    public JobLauncherTestUtils jobLauncherTestUtils() {
        return new JobLauncherTestUtils();
    }

    @Bean
    public JobRepository jobRepository() throws Exception {
        MapJobRepositoryFactoryBean factory
          = new MapJobRepositoryFactoryBean();
        factory.setTransactionManager(transactionManager());
        return (JobRepository) factory.getObject();
    }

    @Bean
    public PlatformTransactionManager transactionManager() {
        return new ResourcelessTransactionManager();
    }

    @Bean
    public JobLauncher jobLauncher() throws Exception {
        SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
        jobLauncher.setJobRepository(jobRepository());
        return jobLauncher;
    }
Everything is ready! Go ahead and run the test!
After the job has finished, output.csv has the expected content and logs show the execution flow:
    [main] DEBUG o.b.t.tasklets.LinesReader - Lines Reader initialized.
    [main] DEBUG o.b.t.tasklets.LinesReader - Read line: [Mae Hodges,10/22/1972]
    [main] DEBUG o.b.t.tasklets.LinesReader - Read line: [Gary Potter,02/22/1953]
    [main] DEBUG o.b.t.tasklets.LinesReader - Read line: [Betty Wise,02/17/1968]
    [main] DEBUG o.b.t.tasklets.LinesReader - Read line: [Wayne Rose,04/06/1977]
    [main] DEBUG o.b.t.tasklets.LinesReader - Read line: [Adam Caldwell,09/27/1995]
    [main] DEBUG o.b.t.tasklets.LinesReader - Read line: [Lucille Phillips,05/14/1992]
    [main] DEBUG o.b.t.tasklets.LinesReader - Lines Reader ended.
    [main] DEBUG o.b.t.tasklets.LinesProcessor - Lines Processor initialized.
    [main] DEBUG o.b.t.tasklets.LinesProcessor - Calculated age 45 for line [Mae Hodges,10/22/1972]
    [main] DEBUG o.b.t.tasklets.LinesProcessor - Calculated age 64 for line [Gary Potter,02/22/1953]
    [main] DEBUG o.b.t.tasklets.LinesProcessor - Calculated age 49 for line [Betty Wise,02/17/1968]
    [main] DEBUG o.b.t.tasklets.LinesProcessor - Calculated age 40 for line [Wayne Rose,04/06/1977]
    [main] DEBUG o.b.t.tasklets.LinesProcessor - Calculated age 22 for line [Adam Caldwell,09/27/1995]
    [main] DEBUG o.b.t.tasklets.LinesProcessor - Calculated age 25 for line [Lucille Phillips,05/14/1992]
    [main] DEBUG o.b.t.tasklets.LinesProcessor - Lines Processor ended.
    [main] DEBUG o.b.t.tasklets.LinesWriter - Lines Writer initialized.
    [main] DEBUG o.b.t.tasklets.LinesWriter - Wrote line [Mae Hodges,10/22/1972,45]
    [main] DEBUG o.b.t.tasklets.LinesWriter - Wrote line [Gary Potter,02/22/1953,64]
    [main] DEBUG o.b.t.tasklets.LinesWriter - Wrote line [Betty Wise,02/17/1968,49]
    [main] DEBUG o.b.t.tasklets.LinesWriter - Wrote line [Wayne Rose,04/06/1977,40]
    [main] DEBUG o.b.t.tasklets.LinesWriter - Wrote line [Adam Caldwell,09/27/1995,22]
    [main] DEBUG o.b.t.tasklets.LinesWriter - Wrote line [Lucille Phillips,05/14/1992,25]
    [main] DEBUG o.b.t.tasklets.LinesWriter - Lines Writer ended.
That’s it for Tasklets. Now we can move on to the Chunks approach.
5. Chunks Approach
5.1. Introduction and Design
As the name suggests, this approach performs actions over chunks of data. That is, instead of reading, processing and writing all the lines at once, it’ll read, process and write a fixed amount of records (chunk) at a time.
Then, it’ll repeat the cycle until there’s no more data in the file.
As a result, the flow will be slightly different:
- While there are lines:
  - Read X lines, one at a time.
  - Process those X lines, one at a time.
  - Write the X lines at once.
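The loop above can be sketched in plain Java. This is only an illustration of the idea, not how Spring Batch is implemented: ChunkLoopSketch, run and demo are hypothetical names, and the sketch follows the order visible in the job’s logs, where all the items of a chunk are read first, then processed, and finally written together.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

class ChunkLoopSketch {

    // Reads up to chunkSize items, processes them, then writes them as one chunk.
    // Repeats until the source is exhausted and returns the chunks in write order.
    static <I, O> List<List<O>> run(Iterator<I> source,
                                    Function<I, O> processor,
                                    int chunkSize) {
        List<List<O>> written = new ArrayList<>();
        while (source.hasNext()) {
            // Read phase: collect at most chunkSize items, one at a time.
            List<I> items = new ArrayList<>();
            while (items.size() < chunkSize && source.hasNext()) {
                items.add(source.next());
            }
            // Process phase: transform each item of the chunk.
            List<O> processed = new ArrayList<>();
            for (I item : items) {
                processed.add(processor.apply(item));
            }
            // Write phase: the whole chunk is handed over at once.
            written.add(processed);
        }
        return written;
    }

    // Five items with a chunk size of two: the last chunk holds a single item.
    static List<List<String>> demo() {
        List<String> names = Arrays.asList("Mae", "Gary", "Betty", "Wayne", "Adam");
        Function<String, String> toUpperCase = String::toUpperCase;
        return run(names.iterator(), toUpperCase, 2);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [[MAE, GARY], [BETTY, WAYNE], [ADAM]]
    }
}
```

Only one chunk is held in memory at a time, which is the main practical difference from collecting every line in a list first.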
So, we also need to create three beans for the chunk-oriented approach:
    public class LineReader {
        // ...
    }

    public class LineProcessor {
        // ...
    }

    public class LinesWriter {
        // ...
    }
Before moving to implementation, let’s configure our job.
5.2. Configuration
The job definition will also look different:
    @Configuration
    @EnableBatchProcessing
    public class ChunksConfig {

        @Autowired
        private JobBuilderFactory jobs;

        @Autowired
        private StepBuilderFactory steps;

        @Bean
        public ItemReader<Line> itemReader() {
            return new LineReader();
        }

        @Bean
        public ItemProcessor<Line, Line> itemProcessor() {
            return new LineProcessor();
        }

        @Bean
        public ItemWriter<Line> itemWriter() {
            return new LinesWriter();
        }

        @Bean
        protected Step processLines(ItemReader<Line> reader,
          ItemProcessor<Line, Line> processor, ItemWriter<Line> writer) {
            return steps.get("processLines").<Line, Line> chunk(2)
              .reader(reader)
              .processor(processor)
              .writer(writer)
              .build();
        }

        @Bean
        public Job job() {
            return jobs
              .get("chunksJob")
              .start(processLines(itemReader(), itemProcessor(), itemWriter()))
              .build();
        }
    }
In this case, there’s only one step performing only one tasklet.
However, that tasklet defines a reader, a writer and a processor that will act over chunks of data.
Note that the commit interval, set here with chunk(2), is the number of items processed in one chunk. Our job will read, process and write two lines at a time.
Now we’re ready to add our chunk logic!
5.3. LineReader
LineReader will be in charge of reading one record and returning a Line instance with its content.
To become a reader, our class has to implement the ItemReader interface:
    public class LineReader implements ItemReader<Line> {

        @Override
        public Line read() throws Exception {
            Line line = fu.readLine();
            if (line != null)
              logger.debug("Read line: " + line.toString());
            return line;
        }
    }
The code is straightforward: it just reads one line and returns it. We’ll also implement StepExecutionListener for the final version of this class:
    public class LineReader implements
      ItemReader<Line>, StepExecutionListener {

        private final Logger logger = LoggerFactory
          .getLogger(LineReader.class);

        private FileUtils fu;

        @Override
        public void beforeStep(StepExecution stepExecution) {
            fu = new FileUtils("taskletsvschunks/input/tasklets-vs-chunks.csv");
            logger.debug("Line Reader initialized.");
        }

        @Override
        public Line read() throws Exception {
            Line line = fu.readLine();
            if (line != null)
              logger.debug("Read line: " + line.toString());
            return line;
        }

        @Override
        public ExitStatus afterStep(StepExecution stepExecution) {
            fu.closeReader();
            logger.debug("Line Reader ended.");
            return ExitStatus.COMPLETED;
        }
    }
Note that beforeStep and afterStep execute before and after the whole step, respectively, not around each chunk.
5.4. LineProcessor
LineProcessor follows pretty much the same logic as LineReader.
However, in this case, we’ll implement ItemProcessor and its method process():
    public class LineProcessor implements ItemProcessor<Line, Line> {

        private Logger logger = LoggerFactory.getLogger(LineProcessor.class);

        @Override
        public Line process(Line line) throws Exception {
            long age = ChronoUnit.YEARS
              .between(line.getDob(), LocalDate.now());
            logger.debug("Calculated age " + age + " for line " + line.toString());
            line.setAge(age);
            return line;
        }
    }
The process() method takes an input line, processes it and returns an output line. Again, we’ll also implement StepExecutionListener:
    public class LineProcessor implements
      ItemProcessor<Line, Line>, StepExecutionListener {

        private Logger logger = LoggerFactory.getLogger(LineProcessor.class);

        @Override
        public void beforeStep(StepExecution stepExecution) {
            logger.debug("Line Processor initialized.");
        }

        @Override
        public Line process(Line line) throws Exception {
            long age = ChronoUnit.YEARS
              .between(line.getDob(), LocalDate.now());
            logger.debug(
              "Calculated age " + age + " for line " + line.toString());
            line.setAge(age);
            return line;
        }

        @Override
        public ExitStatus afterStep(StepExecution stepExecution) {
            logger.debug("Line Processor ended.");
            return ExitStatus.COMPLETED;
        }
    }
5.5. LinesWriter
Unlike the reader and the processor, LinesWriter writes an entire chunk of lines, so it receives a List of Line objects:
    public class LinesWriter implements
      ItemWriter<Line>, StepExecutionListener {

        private final Logger logger = LoggerFactory
          .getLogger(LinesWriter.class);

        private FileUtils fu;

        @Override
        public void beforeStep(StepExecution stepExecution) {
            fu = new FileUtils("output.csv");
            logger.debug("Line Writer initialized.");
        }

        @Override
        public void write(List<? extends Line> lines) throws Exception {
            for (Line line : lines) {
                fu.writeLine(line);
                logger.debug("Wrote line " + line.toString());
            }
        }

        @Override
        public ExitStatus afterStep(StepExecution stepExecution) {
            fu.closeWriter();
            logger.debug("Line Writer ended.");
            return ExitStatus.COMPLETED;
        }
    }
LinesWriter code speaks for itself. And again, we’re ready to test our job.
5.6. Running the Job
We’ll create a new test, similar to the one we created for the tasklets approach:
    @RunWith(SpringJUnit4ClassRunner.class)
    @ContextConfiguration(classes = ChunksConfig.class)
    public class ChunksTest {

        @Autowired
        private JobLauncherTestUtils jobLauncherTestUtils;

        @Test
        public void givenChunksJob_whenJobEnds_thenStatusCompleted()
          throws Exception {
            JobExecution jobExecution = jobLauncherTestUtils.launchJob();
            assertEquals(ExitStatus.COMPLETED, jobExecution.getExitStatus());
        }
    }
After configuring ChunksConfig as explained above for TaskletsConfig, we’re all set to run the test!
Once the job is done, we can see that output.csv contains the expected result again, and the logs describe the flow:
    [main] DEBUG o.b.t.chunks.LineReader - Line Reader initialized.
    [main] DEBUG o.b.t.chunks.LinesWriter - Line Writer initialized.
    [main] DEBUG o.b.t.chunks.LineProcessor - Line Processor initialized.
    [main] DEBUG o.b.t.chunks.LineReader - Read line: [Mae Hodges,10/22/1972]
    [main] DEBUG o.b.t.chunks.LineReader - Read line: [Gary Potter,02/22/1953]
    [main] DEBUG o.b.t.chunks.LineProcessor - Calculated age 45 for line [Mae Hodges,10/22/1972]
    [main] DEBUG o.b.t.chunks.LineProcessor - Calculated age 64 for line [Gary Potter,02/22/1953]
    [main] DEBUG o.b.t.chunks.LinesWriter - Wrote line [Mae Hodges,10/22/1972,45]
    [main] DEBUG o.b.t.chunks.LinesWriter - Wrote line [Gary Potter,02/22/1953,64]
    [main] DEBUG o.b.t.chunks.LineReader - Read line: [Betty Wise,02/17/1968]
    [main] DEBUG o.b.t.chunks.LineReader - Read line: [Wayne Rose,04/06/1977]
    [main] DEBUG o.b.t.chunks.LineProcessor - Calculated age 49 for line [Betty Wise,02/17/1968]
    [main] DEBUG o.b.t.chunks.LineProcessor - Calculated age 40 for line [Wayne Rose,04/06/1977]
    [main] DEBUG o.b.t.chunks.LinesWriter - Wrote line [Betty Wise,02/17/1968,49]
    [main] DEBUG o.b.t.chunks.LinesWriter - Wrote line [Wayne Rose,04/06/1977,40]
    [main] DEBUG o.b.t.chunks.LineReader - Read line: [Adam Caldwell,09/27/1995]
    [main] DEBUG o.b.t.chunks.LineReader - Read line: [Lucille Phillips,05/14/1992]
    [main] DEBUG o.b.t.chunks.LineProcessor - Calculated age 22 for line [Adam Caldwell,09/27/1995]
    [main] DEBUG o.b.t.chunks.LineProcessor - Calculated age 25 for line [Lucille Phillips,05/14/1992]
    [main] DEBUG o.b.t.chunks.LinesWriter - Wrote line [Adam Caldwell,09/27/1995,22]
    [main] DEBUG o.b.t.chunks.LinesWriter - Wrote line [Lucille Phillips,05/14/1992,25]
    [main] DEBUG o.b.t.chunks.LineProcessor - Line Processor ended.
    [main] DEBUG o.b.t.chunks.LinesWriter - Line Writer ended.
    [main] DEBUG o.b.t.chunks.LineReader - Line Reader ended.
We have the same result and a different flow. Logs make evident how the job executes following this approach.
6. Conclusion
Different contexts will show the need for one approach or the other. While Tasklets feel more natural for ‘one task after the other’ scenarios, chunks provide a simple solution to deal with paginated reads or situations where we don’t want to keep a significant amount of data in memory.
The complete implementation of this example can be found in the GitHub project.