Merge Multiple PDF Files Into a Single PDF Using Java

1. Introduction

In the modern business and document management workflow, the ability to merge multiple PDF files into a single PDF document is a common requirement. Common use cases include presentations, consolidating reports, or compiling packages into a single package.

In Java, multiple libraries exist that provide out-of-the-box features to handle PDFs and merge them into a single PDF. Apache PDFBox and iText are among the popular ones.

In this tutorial, we’ll implement the PDF merge functionality using Apache PDFBox and iText.

2. Setup

Before diving into the implementation, let’s go through the necessary setup steps. We’ll add the required dependencies for the project, additionally, we’ll create helper methods for our tests.

2.1. Dependencies

We’ll use Apache PDFBox and iText for merging the PDF files. To use Apache PDFBox, We need to add the below dependency in the pom.xml file:

<dependency> 
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId> 
    <version>2.0.31</version> 
</dependency>

To use the iText, we need to add the below dependency in the pom.xml file:

<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itextpdf</artifactId>
    <version>5.5.13.3</version>
</dependency>

2.2. Test Setup

Let’s create a sample PDF file that we’ll use to test our logic. We can create a utility method to create PDF so that we can use it across different tests:

static void createPDFDoc(String content, String filePath) throws IOException {
    PDDocument document = new PDDocument();
    for (int i = 0; i < 3; i++) {
        PDPage page = new PDPage();
        document.addPage(page);
        try (PDPageContentStream contentStream = new PDPageContentStream(document, page)) {
            contentStream.beginText();
            contentStream.setFont(PDType1Font.HELVETICA_BOLD, 14);
            contentStream.showText(content + ", page:" + i);
            contentStream.endText();
        }
    }
    document.save("src/test/resources/temp/" + filePath);
    document.close();
}

In the above logic, we create a PDF document and add three pages to it using a custom font. Now that we have the createPDFDoc() method, let’s call it before each test and delete the file after finishing the test :

@BeforeEach
public void create() throws IOException {
    File tempDirectory = new File("src/test/resources/temp");
    tempDirectory.mkdirs();
    List.of(List.of("hello_world1", "file1.pdf"), List.of("hello_world2", "file2.pdf"))
        .forEach(pair -> {
            try {
                createPDFDoc(pair.get(0), pair.get(1));
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
}

@AfterEach
public void destroy() throws IOException {
    Stream<Path> paths = Files.walk(Paths.get("src/test/resources/temp/"));
    paths.sorted((p1, p2) -> -p1.compareTo(p2))
         .forEach(path -> {
            try {
                Files.delete(path);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
}

3. Using Apache PDFBox

Apache PDFBox is an open-source Java library for working with PDF documents. It provides a range of functionalities to create, manipulate, and extract content from PDF files programmatically.

PDFBox provides a PDFMergerUtility helper class to merge multiple PDF documents. We can use the addSource() method to add PDF files. The mergeDocuments() method merges all the added sources, which results in the final merged PDF document:

void mergeUsingPDFBox(List<String> pdfFiles, String outputFile) throws IOException {
    PDFMergerUtility pdfMergerUtility = new PDFMergerUtility();
    pdfMergerUtility.setDestinationFileName(outputFile);
    pdfFiles.forEach(file -> {
        try {
            pdfMergerUtility.addSource(new File(file));
        } catch (FileNotFoundException e) {
            throw new RuntimeException(e);
        }
    });
    pdfMergerUtility.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
}

As demonstrated above, the mergeDocuments() method takes an argument to configure memory usage when merging documents. We’re defining exclusively to use only main memory, i.e., the RAM, for buffering during the merging of documents. We can choose from plenty of other options for buffering memory, including disk, a combination of RAM and disk, and so on.

We can write a unit test to verify if the merge logic works as expected:

@Test
void givenMultiplePdfs_whenMergeUsingPDFBoxExecuted_thenPdfsMerged() throws IOException {
    List<String> files = List.of("src/test/resources/temp/file1.pdf", "src/test/resources/temp/file2.pdf");
    PDFMerge pdfMerge = new PDFMerge();
    pdfMerge.mergeUsingPDFBox(files, "src/test/resources/temp/output.pdf");
    try (PDDocument document = PDDocument.load(new File("src/test/resources/temp/output.pdf"))) {
        PDFTextStripper pdfStripper = new PDFTextStripper();
        String actual = pdfStripper.getText(document);
        String expected = """
            hello_world1, page:0
            hello_world1, page:1
            hello_world1, page:2
            hello_world2, page:0
            hello_world2, page:1
            hello_world2, page:2
            """;
        assertEquals(expected, actual);
    }
}

In the test above, we merged two PDF files using PDFBox into an output file and validated the merged content.

4. Using iText

iText is another popular Java library for creating and manipulating PDF documents. It provides a wide range of features, such as generating PDF files while including text, images, tables, and other elements such as hyperlinks and form fields.

iText provides the PdfReader and PdfWriter classes that are very handy in reading input files and writing them to the output files:

void mergeUsingIText(List<String> pdfFiles, String outputFile) throws IOException, DocumentException {
    List<PdfReader> pdfReaders = List.of(new PdfReader(pdfFiles.get(0)), new PdfReader(pdfFiles.get(1)));
    Document document = new Document();
    FileOutputStream fos = new FileOutputStream(outputFile);
    PdfWriter writer = PdfWriter.getInstance(document, fos);
    document.open();
    PdfContentByte directContent = writer.getDirectContent();
    PdfImportedPage pdfImportedPage;
    for (PdfReader pdfReader : pdfReaders) {
        int currentPdfReaderPage = 1;
        while (currentPdfReaderPage <= pdfReader.getNumberOfPages()) {
            document.newPage();
            pdfImportedPage = writer.getImportedPage(pdfReader, currentPdfReaderPage);
            directContent.addTemplate(pdfImportedPage, 0, 0);
            currentPdfReaderPage++;
        }
    }
    fos.flush();
    document.close();
    fos.close();
}

In the above logic, we read and then import the pages of PdfReader to PdfWrite using the getImportedPage() method and then add that to the directContent object, which essentially stores the read buffer of contents. Once we finish the reading, we flush the output stream fos, which writes to the output file.

We can verify our logic by writing a unit test:

@Test
void givenMultiplePdfs_whenMergeUsingITextExecuted_thenPdfsMerged() throws IOException, DocumentException {
    List<String> files = List.of("src/test/resources/temp/file1.pdf", "src/test/resources/temp/file2.pdf");
    PDFMerge pdfMerge = new PDFMerge();
    pdfMerge.mergeUsingIText(files, "src/test/resources/temp/output1.pdf");
    try (PDDocument document = PDDocument.load(new File("src/test/resources/temp/output1.pdf"))) {
        PDFTextStripper pdfStripper = new PDFTextStripper();
        String actual = pdfStripper.getText(document);
        String expected = """
            hello_world1, page:0
            hello_world1, page:1
            hello_world1, page:2
            hello_world2, page:0
            hello_world2, page:1
            hello_world2, page:2
            """;
        assertEquals(expected, actual);
    }
}

Our test is almost the same as in the previous section. The only difference is we’re calling the mergeUsingIText() method, which uses iText for merging the PDF files.

5. Conclusion

In this article, we explored how we can merge PDF files using Apache PDFBox and iText. Both libraries are feature-rich and allow us to handle different types of content inside PDF files. We implemented merge functionality and also wrote tests to verify the results.

As usual, the complete source code for the examples is available over on GitHub.

Merge Multiple PDF Files Into a Single PDF Using Java

1. Introduction

2. Setup

2.1. Dependencies

2.2. Test Setup

3. Using Apache PDFBox

4. Using iText

5. Conclusion

Trending Articles

Practice Sheet of Right form of verbs for HSC Students

Download: FK ft Shenky – Nakuyewa ”Prod by: Shenky”

How to win at Markstrat (Markstrat Tips and Tricks) – Vodites

Ominde Commission Report and Recommendations – Ominde Report of 1964

Bureau of Internal Revenue: Regional Offices (Directory)

GO 53 on Enhancement of Ex-gratia upto 5 Lakhs Toddy Tappers in Telangana

Cakewalk CA-2A Leveling Amplifier v2.0.1.97 WiN, v2.0.1.96 OSX Incl Keygen

Mp3 Download: Mdu - Kunjenjenjena

How the kill the job , when DTP request running for long hours.

Microsoft Intune から展開しているアプリのアップデートについて

18-year-old girl was beaten for half an hour by two Northampton men in 'an...

Car crash in Dunton Bassett leaves driver in critical condition

Macky 2, Two Others In Road Accident

Application log 00000000000000089514: Could not convert queue DLVST90CLNT

Detroit mafia: D’Anna Brothers agree to plea deal

Delivery block field greyed out using VA02

Muloraki Au

【個人撮影】スマホのプライベート映像♪「中に出さないで///」カラオケ屋での生ハメ撮りが流出ｗ【リベンジポルノ】＠PornHub

BREAKING NEWS: Diamond Platnumz Is Reported Dead After Ghastly Car Accident

FIAT 500 B0111 B0112