1. Overview
In this tutorial, we’ll learn how to generate Java classes from an Apache Avro schema.
First, we’ll familiarize ourselves with two methods: using the existing Gradle plugin and implementing a custom task for the build script. Then, we’ll identify the pros and cons of each approach and understand which scenarios they fit best.
2. Getting Started With Apache Avro
Our primary focus is on generating Java classes from Apache Avro schemas. Let’s briefly recap the essential concepts before diving into the intricacies of code generation.
2.1. Apache Avro Schema Definition
First, let’s prepare the dependencies required to process the Avro format. We’ll need the org.apache.avro:avro module for data serialization and deserialization, so we’ll add it to the libs.versions.toml and build.gradle files:
# libs.versions.toml
[versions]
# project dependency versions
avro = "1.11.0"
[libraries]
# project libraries
avro = {module = "org.apache.avro:avro", version.ref = "avro"}
# build.gradle
dependencies {
    implementation libs.avro
    // project dependencies
}
The next step is defining the Avro Schema. For demonstration purposes, let’s prepare two schemas, one for each method used in this tutorial:
- /src/main/avro/user.avsc: for the Gradle plugin approach
- /src/main/custom/pet.avsc: for the custom Gradle task approach
We place the schemas in separate folders to maintain the correct folder structure. This also helps prevent the ClassAlreadyExists exception and ensures that the Gradle build system correctly recognizes and processes our Avro schema definitions.
The folder structure above also affects the schema definition. The User schema belongs to the avro namespace:
{
    "type": "record",
    "name": "User",
    "namespace": "avro",
    "fields": [
        {
            "name": "firstName",
            "type": "string"
        },
        {
            "name": "lastName",
            "type": "string"
        },
        {
            "name": "phoneNumber",
            "type": "string"
        }
    ]
}
Similarly, let’s define a Pet schema under a custom namespace:
{
    "type": "record",
    "name": "Pet",
    "namespace": "custom",
    "fields": [
        {
            "name": "petId",
            "type": "string"
        },
        {
            "name": "name",
            "type": "string"
        },
        {
            "name": "species",
            "type": "string"
        },
        {
            "name": "age",
            "type": "int"
        }
    ]
}
Selecting an appropriate namespace is critical to prevent naming conflicts when the Java classes are generated, because the namespace becomes the Java package of the generated class. That’s why we follow the common practice of deriving the namespace from the folder hierarchy.
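The folder-to-namespace convention can be sketched in plain Java. This is only an illustration of the naming rule, not part of Avro or Gradle: the SchemaNamespaces class and its namespaceFor method are hypothetical names, and the sketch assumes that every folder below the source root contributes one namespace segment.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: derives a namespace from the folder that holds a schema file.
final class SchemaNamespaces {

    // Returns the dot-separated folder path of schemaFile relative to sourceRoot.
    static String namespaceFor(String sourceRoot, String schemaFile) {
        Path root = Path.of(sourceRoot).normalize();
        Path parent = Path.of(schemaFile).normalize().getParent();
        if (parent == null || !parent.startsWith(root)) {
            throw new IllegalArgumentException(schemaFile + " is not located under " + sourceRoot);
        }
        List<String> segments = new ArrayList<>();
        for (Path segment : root.relativize(parent)) {
            // A schema placed directly in the root yields a single empty segment; skip it.
            if (!segment.toString().isEmpty()) {
                segments.add(segment.toString());
            }
        }
        return String.join(".", segments);
    }
}
```

With src/main as the source root, the two schema locations used in this tutorial map to the avro and custom namespaces, matching the "namespace" values declared in the schema files.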
3. Java Classes Generation
Now that we’ve defined the schemas, it’s time to compile them!
3.1. Using Avro-Tools in Command Line
Out of the box, Apache Avro provides the avro-tools jar, which can generate code from the command line:
java -jar /path/to/avro-tools-1.11.1.jar compile schema <schema file> <destination>
Understanding how avro-tools works gives us a foundation for custom solutions. However, running it by hand isn’t convenient in most real-life scenarios, where the code should be generated as part of the build.
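To make the limitation concrete, here is a minimal sketch of what scripting that command from Java could look like. The AvroToolsCommand class and its methods are hypothetical names, the jar path and file locations are placeholders supplied by the caller, and the sketch assumes a java executable is available on the PATH.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical wrapper around the avro-tools command line shown above.
final class AvroToolsCommand {

    // Builds: java -jar <jar> compile schema <schema file> <destination>
    static List<String> compileSchema(String jarPath, String schemaFile, String destination) {
        return List.of("java", "-jar", jarPath, "compile", "schema", schemaFile, destination);
    }

    // Starts the command as a child process and returns its exit code.
    static int run(List<String> command) throws IOException, InterruptedException {
        return new ProcessBuilder(command)
                .inheritIO() // forward the tool's output to our console
                .start()
                .waitFor();
    }
}
```

Every caller would have to locate the jar, manage paths, and check the exit code, which is the kind of plumbing a build tool is better placed to handle.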
3.2. Using Open-Source Avro Gradle Plugin
One of the possible solutions to integrate code generation into our build is using the open-source avro-gradle-plugin by davidmc24.
We only need to import the dependency and extend the build.gradle file by including the plugin ID. Let’s use the latest release from the official release page:
# libs.versions.toml
[plugins]
avro = { id = "com.github.davidmc24.gradle.plugin.avro", version = "1.9.1" }
# build.gradle
plugins {
    id 'java'
    alias libs.plugins.avro
}
After that, the library is ready to be used!
By default, the plugin uses the /src/main/avro directory as its source and stores the generated classes in /build/generated-main-avro-java. We can customize this behavior by registering our own GenerateAvroJavaTask:
import com.github.davidmc24.gradle.plugin.avro.GenerateAvroJavaTask

def generateAvro = tasks.register("generateAvro", GenerateAvroJavaTask) {
    // Replace <custom> with the folder that holds the schemas.
    source("src/<custom>")
    outputDir = file("dest/avro")
}
At first glance, this method seems quite flexible and easy to use. However, the project has been archived. So, it might not be convenient for commercial use, as further updates to this library are unlikely. For such use cases, it may be best to implement a custom Gradle task leveraging the capabilities of the Apache Avro tools library.
3.3. Implementing Custom Gradle Task
The idea behind the custom Gradle task for code generation revolves around harnessing the robust mechanism offered by the Apache Avro framework with the avro-tools jar. For that, we’ll need to update our libs.versions.toml accordingly:
# libs.versions.toml
[versions]
avro = "1.11.0"
[libraries]
avro = {module = "org.apache.avro:avro", version.ref = "avro"}
avro-tools = {module = "org.apache.avro:avro-tools", version.ref = "avro"}
The avro and avro-tools libraries should use the same version to prevent conflicts, which is why both entries reference the same avro version.
In addition, we’ll need to add the avro-tools jar to the classpath of the build script itself. The timing matters here: Gradle evaluates the buildscript block before the rest of the script, so anything declared on its classpath is available to the script’s own code.
Our custom Gradle task calls avro-tools classes directly from the build script, so the library must be on the build script classpath rather than only among the project’s regular dependencies:
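# build.gradle
// Used by the custom code generation task defined later in this script.
import org.apache.avro.tool.SpecificCompilerTool

buildscript {
    // Assumption: avro-tools is resolved from Maven Central.
    repositories {
        mavenCentral()
    }
    dependencies {
        classpath libs.avro.tools
    }
}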
def avroSchemasDir = "src/main/custom"
def avroCodeGenerationDir = "build/generated-main-avro-custom-java"
// Add the generated Avro Java code to the Gradle source files.
sourceSets.main.java.srcDirs += [avroCodeGenerationDir]
In this step, we also define the schema source and output directories and add the output directory to sourceSets, so that the generated classes are compiled together with the rest of the project.
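It helps to know where a generated class ends up inside the output directory. As a rough sketch, assuming the compiler follows the usual Java layout of one folder per package segment, the location can be computed as follows. GeneratedSources and javaFileFor are hypothetical names used only for illustration.

```java
// Hypothetical helper: computes where the generated source file for a record is expected.
final class GeneratedSources {

    // Maps a namespace such as "com.example" to the folder "com/example" below outputDir.
    static String javaFileFor(String outputDir, String namespace, String recordName) {
        String packagePath = namespace.replace('.', '/');
        String prefix = packagePath.isEmpty() ? "" : packagePath + "/";
        return outputDir + "/" + prefix + recordName + ".java";
    }
}
```

For the Pet schema, this gives build/generated-main-avro-custom-java/custom/Pet.java, which is why the root of the output directory, and not the custom subfolder, is the one registered in sourceSets.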
The main engine driving our custom Gradle task is SpecificCompilerTool. This class is central to the Avro code generation process, offering functionality similar to executing the command we saw earlier:
java -jar /path/to/avro-tools-1.11.1.jar compile schema <schema file> <destination> [..args]
We can customize parameters such as encoding and field visibility. The official documentation provides more information on SpecificCompilerTool:
tasks.register('customAvroCodeGeneration') {
    // Define the task inputs and outputs for the Gradle up-to-date checks.
    inputs.dir(avroSchemasDir)
    outputs.dir(avroCodeGenerationDir)
    // The Avro code generation logs to the standard streams. Redirect the standard streams to the Gradle log.
    logging.captureStandardOutput(LogLevel.INFO)
    logging.captureStandardError(LogLevel.ERROR)
    doLast {
        new SpecificCompilerTool().run(System.in, System.out, System.err, List.of(
            "-encoding", "UTF-8",
            "-string",
            "-fieldVisibility", "private",
            "-noSetters",
            "schema", "$projectDir/$avroSchemasDir".toString(), "$projectDir/$avroCodeGenerationDir".toString()
        ))
    }
}
Lastly, to include the code generation in the build flow, let’s add the dependency on customAvroCodeGeneration:
tasks.withType(JavaCompile).configureEach {
    // Make Java compilation tasks depend on the Avro code generation task.
    dependsOn('customAvroCodeGeneration')
}
As a result, we’ll have an Avro code generation job triggered whenever the build command is called.
4. Conclusion
In this article, we explored two approaches to generating Java code from Avro schemas.
The first method uses the open-source avro-gradle-plugin, which is flexible and integrates easily into Gradle projects. However, the project has been archived, which may limit its suitability for commercial use.
The second approach implements a custom Gradle task built on the avro-tools library. It keeps dependencies to a minimum, limited to those that ship with the Apache Avro framework, which lowers the risk of conflicts between incompatible library versions. The task also gives us control over the generation flow, which helps when additional checks, such as custom schema validation, must run before the schemas are compiled to Java classes. This makes the approach well-suited for production environments where dependency management is critical.
The complete examples are available over on GitHub.