Channel: Baeldung

Parse Java Source Code and Extract Methods

1. Introduction

In this article, we’re going to investigate the JavaCompiler API. We’ll see what this API is, what we can do with it, and how to use it to extract the details of methods defined in our source files.

2. The JavaCompiler API

Java 6 introduced the ToolProvider mechanism, which gives us access to various built-in JVM tools. Amongst other things, this includes the JavaCompiler. This is the same functionality as in the javac application, only available programmatically.

Using this, we can compile Java source code. However, we can also extract information from the code as part of the compilation process.

To get access to the JavaCompiler, we need to use the ToolProvider, which will give us an instance if available:

JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();

Note that there’s no guarantee that the JavaCompiler will be available. It depends on the JVM being used and what tooling it makes available.
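Since getSystemJavaCompiler() returns null when no compiler is available, it can be worth failing early with a clear message. The following is a minimal sketch; the class and method names (CompilerLookup, requireCompiler) are purely illustrative:

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

class CompilerLookup {

    // Hypothetical helper: returns the system compiler or fails with a readable error.
    static JavaCompiler requireCompiler() {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            // Typically happens when running on a runtime that ships without the compiler.
            throw new IllegalStateException("No system Java compiler is available on this runtime");
        }
        return compiler;
    }

    public static void main(String[] args) {
        System.out.println("Using compiler: " + requireCompiler().getClass().getName());
    }
}
```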

However, interrogating the Java code instead of simply compiling it is implementation-dependent. In this article, we’re assuming the use of the Oracle compiler and that its com.sun.source classes are available, which on Java 8 and earlier means having the tools.jar file on the classpath. Note that since Java 9, tools.jar no longer exists; the same classes ship in the JDK’s jdk.compiler module instead, so we need to make sure we’re running on a JDK that includes it.

3. Processing Java Code

Once a JavaCompiler instance is available, we can process some Java code. We need an appropriate JavaFileManager instance and an appropriate collection of JavaFileObject instances to do this. Exactly how we do both of these things depends on the source of the code that we wish to process.

If we want to process code that exists as files on disk, we can rely on the JVM tooling. In particular, the StandardJavaFileManager that the JavaCompiler instance provides access to is intended precisely for this purpose:

StandardJavaFileManager fileManager = compiler.getStandardFileManager(null, null, StandardCharsets.UTF_8);

Once we’ve got this, we can then use it to access the files that we want to process:

Iterable<? extends JavaFileObject> compilationUnits = fileManager.getJavaFileObjectsFromFiles(Arrays.asList(new File(filename)));

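We can use other instances of these if we need to. For example, if we want to process code held in local variables, we can supply our own JavaFileObject implementation that returns the source from memory instead of reading it from disk.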
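For source code held in a String, one common approach is to extend SimpleJavaFileObject and return the text from getCharContent(). This is only a sketch; the StringSource class and its constructor parameters are our own invention, not part of the JDK:

```java
import java.net.URI;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;

// Hypothetical helper: presents a String to the compiler as if it were a source file.
class StringSource extends SimpleJavaFileObject {

    private final String code;

    StringSource(String className, String code) {
        // The URI only has to be well-formed and end in ".java"; nothing is read from disk.
        super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension),
              Kind.SOURCE);
        this.code = code;
    }

    @Override
    public CharSequence getCharContent(boolean ignoreEncodingErrors) {
        return code;
    }

    public static void main(String[] args) {
        JavaFileObject source = new StringSource("Hello", "class Hello {}");
        System.out.println(source.toUri() + " -> " + source.getKind());
    }
}
```

A collection containing such an object can then be passed wherever a collection of JavaFileObject instances is expected.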

Once we have these, we can then process our files:

JavacTask javacTask = 
  (JavacTask) compiler.getTask(null, fileManager, null, null, null, compilationUnits);
Iterable<? extends CompilationUnitTree> compilationUnitTrees = javacTask.parse();

Note that we’re casting the result of compiler.getTask() to a JavacTask instance. This class exists in the tools.jar file (or the jdk.compiler module from Java 9 onwards) and is the entry point to interrogating the processed Java source. We then use this to parse our input files into a collection of CompilationUnitTree types. Each of these represents one file that we provided to the compiler.
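Putting the steps of this section together, a self-contained version might look like the following. It’s a sketch under the same assumptions as the rest of the article (a JDK that exposes the com.sun.source classes); the SourceFileParser class and its parse() method are illustrative names:

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.util.JavacTask;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

class SourceFileParser {

    // Hypothetical helper: parses the given files and returns one tree per file.
    static List<CompilationUnitTree> parse(File... files) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        // The file manager holds resources, so we close it once parsing is done.
        try (StandardJavaFileManager fileManager =
                 compiler.getStandardFileManager(null, null, StandardCharsets.UTF_8)) {
            Iterable<? extends JavaFileObject> compilationUnits =
                fileManager.getJavaFileObjectsFromFiles(Arrays.asList(files));
            JavacTask javacTask = (JavacTask) compiler.getTask(
                null, fileManager, null, null, null, compilationUnits);
            List<CompilationUnitTree> result = new ArrayList<>();
            for (CompilationUnitTree unit : javacTask.parse()) {
                result.add(unit);
            }
            return result;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("Usage: SourceFileParser <file.java>...");
            return;
        }
        File[] files = Arrays.stream(args).map(File::new).toArray(File[]::new);
        for (CompilationUnitTree unit : parse(files)) {
            System.out.println("Parsed: " + unit.getSourceFile().getName());
        }
    }
}
```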

4. Compilation Unit Details

Once we’ve got this far, we have the parsed details of the compilation unit – that is, the source files that we’ve processed – available to us.

The first thing that we can do is to interrogate the top-level details. For example, we can see what package it represents using getPackageName() and get the list of imports using getImports(). We can also use getTypeDecls() to get the list of all top-level declarations – which typically means the class definitions but could be anything the Java language supports.
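As a rough illustration of these three accessors, the sketch below parses a small source string and prints its top-level details. The CompilationUnitDetails class, its parse() helper, and the sample source are all assumptions made for the example:

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.ImportTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import java.io.IOException;
import java.net.URI;
import java.util.Collections;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class CompilationUnitDetails {

    // Hypothetical helper: parses source held in a String rather than a file.
    static CompilationUnitTree parse(String source) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Sample.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, Collections.singletonList(file));
        return task.parse().iterator().next();
    }

    public static void main(String[] args) throws IOException {
        CompilationUnitTree unit = parse(
            "package com.example;\n"
            + "import java.util.List;\n"
            + "import java.util.Map;\n"
            + "class First {}\n"
            + "interface Second {}\n");

        System.out.println("Package: " + unit.getPackageName());
        for (ImportTree importTree : unit.getImports()) {
            System.out.println("Import: " + importTree.getQualifiedIdentifier());
        }
        for (Tree typeDecl : unit.getTypeDecls()) {
            System.out.println("Top-level declaration of kind: " + typeDecl.getKind());
        }
    }
}
```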

We’ll notice here that everything returned is an implementation of the Tree interface. The entire compilation unit is represented as a tree structure, allowing things to be appropriately nested. For example, it’s legal to have class definitions nested inside methods where the method is already nested inside another class.

One advantage this gives us is that the Tree structure implements the visitor pattern. This allows us to have code that can interrogate any instance of the structure without knowing ahead of time what it is.

This is very useful since getTypeDecls() returns a collection of arbitrary Tree types, so we don’t know at this point what we’re dealing with:

for (Tree tree : compilationUnitTree.getTypeDecls()) {
    tree.accept(new SimpleTreeVisitor<Void, Void>() {
        @Override
        public Void visitClass(ClassTree classTree, Void unused) {
            System.out.println("Found class: " + classTree.getSimpleName());
            return null;
        }
    }, null);
}

We can also determine the type of our Tree instances by querying them directly. All of our Tree instances have a getKind() method that returns an appropriate value from the Kind enumeration. For example, class definitions return Kind.CLASS.

We can then use this and cast the value ourselves if we don’t want to use the visitor pattern:

for (Tree tree : compilationUnitTree.getTypeDecls()) {
    if (tree.getKind() == Tree.Kind.CLASS) {
        ClassTree classTree = (ClassTree) tree;
        System.out.println("Found class: " + classTree.getSimpleName());
    }
}
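One caveat with the getKind() approach is that a ClassTree also represents interfaces, enums, and annotation types, which report Kind.INTERFACE, Kind.ENUM, and Kind.ANNOTATION_TYPE rather than Kind.CLASS. If we want all of them, checking for the ClassTree type itself is one option. The sketch below shows this; the TypeKinds class, its helpers, and the sample source are illustrative assumptions:

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class TypeKinds {

    // Hypothetical helper: parses source held in a String rather than a file.
    static CompilationUnitTree parse(String source) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Sample.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, Collections.singletonList(file));
        return task.parse().iterator().next();
    }

    // Collects "<kind> <name>" for every top-level type, whatever flavour it is.
    static List<String> describeTypes(CompilationUnitTree unit) {
        List<String> found = new ArrayList<>();
        for (Tree tree : unit.getTypeDecls()) {
            if (tree instanceof ClassTree) {
                ClassTree classTree = (ClassTree) tree;
                found.add(tree.getKind() + " " + classTree.getSimpleName());
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        CompilationUnitTree unit = parse("class A {} interface B {} enum C { X }");
        describeTypes(unit).forEach(System.out::println);
    }
}
```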

5. Class Details

Once we’ve got access to a ClassTree instance – however we manage that – we can start to interrogate this for details about the class definition. This includes class-level details such as the class name, the superclass, the list of interfaces, and so on.
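To make the class-level accessors concrete, here is a small sketch that prints the name, modifiers, superclass, and interfaces of a class. The ClassDetails class, its helpers, and the Circle sample source are assumptions for illustration only; since we only parse the code, the referenced types don’t need to exist:

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import java.io.IOException;
import java.net.URI;
import java.util.Collections;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class ClassDetails {

    // Hypothetical helper: parses a String and returns its first top-level class.
    static ClassTree firstClass(String source) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Sample.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, Collections.singletonList(file));
        return (ClassTree) task.parse().iterator().next().getTypeDecls().get(0);
    }

    public static void main(String[] args) throws IOException {
        ClassTree classTree = firstClass(
            "abstract class Circle extends Shape implements Comparable<Circle>, java.io.Serializable {}");

        System.out.println("Name: " + classTree.getSimpleName());
        System.out.println("Modifiers: " + classTree.getModifiers().getFlags());
        // getExtendsClause() is null when the class has no explicit superclass.
        System.out.println("Superclass: " + classTree.getExtendsClause());
        for (Tree implemented : classTree.getImplementsClause()) {
            System.out.println("Implements: " + implemented);
        }
    }
}
```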

We can also get the class members’ details – using getMembers(). This includes anything that can be a class member, such as methods, fields, nested classes, etc. Anything that you’re allowed to write directly into the body of a class will be returned by this.

This is the same as we saw with CompilationUnitTree.getTypeDecls(), where we can get a mixture of different types. As such, we need to treat it similarly, using the visitor pattern or the getKind() method.

For example, we can extract all of the methods out of a class:

for (Tree member : classTree.getMembers()) {
    member.accept(new SimpleTreeVisitor<Void, Void>() {
        @Override
        public Void visitMethod(MethodTree methodTree, Void unused) {
            System.out.println("Found method: " + methodTree.getName());
            return null;
        }
    }, null);
}

6. Method Details

If we wish, we can interrogate the MethodTree instance to get more information about the method itself. As we’d expect, we can get all the details about the method signature. This includes the method name, parameters, return type, and throws clause, but also details like generic type parameters, modifiers, and even – in the case of methods present in annotation classes – the default value.

As always, everything we’re given here is a Tree or some subtype of it. For example, the method parameters are always VariableTree instances because that’s the only legal thing in that position. We can then treat these as any other part of the source file.

For example, we can print out some of the details of a method:

System.out.println("Found method: " + classTree.getSimpleName() + "." + methodTree.getName());
System.out.println("Return value: " + methodTree.getReturnType());
System.out.println("Parameters: " + methodTree.getParameters());

This will produce output such as:

Found method: ExtractJavaLiveTest.visitClassMethods
Return value: void
Parameters: ClassTree classTree
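The remaining parts of the signature are available in the same way. The sketch below prints the modifiers, generic type parameters, individual parameters, and throws clause of each method in a class. The MethodDetails class, its helpers, and the Repo sample source are assumptions made for the example:

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.tree.Tree;
import com.sun.source.tree.VariableTree;
import com.sun.source.util.JavacTask;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class MethodDetails {

    // Hypothetical helper: parses source held in a String rather than a file.
    static CompilationUnitTree parse(String source) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Sample.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, Collections.singletonList(file));
        return task.parse().iterator().next();
    }

    // Picks the method declarations out of the mixed list of class members.
    static List<MethodTree> methodsOf(ClassTree classTree) {
        List<MethodTree> methods = new ArrayList<>();
        for (Tree member : classTree.getMembers()) {
            if (member instanceof MethodTree) {
                methods.add((MethodTree) member);
            }
        }
        return methods;
    }

    public static void main(String[] args) throws IOException {
        CompilationUnitTree unit = parse(
            "class Repo {\n"
            + "  public <T> java.util.List<T> find(String name, int limit)\n"
            + "      throws java.io.IOException { return null; }\n"
            + "}\n");
        ClassTree classTree = (ClassTree) unit.getTypeDecls().get(0);
        for (MethodTree method : methodsOf(classTree)) {
            System.out.println("Method: " + method.getName());
            System.out.println("Modifiers: " + method.getModifiers().getFlags());
            System.out.println("Type parameters: " + method.getTypeParameters());
            System.out.println("Return type: " + method.getReturnType());
            for (VariableTree parameter : method.getParameters()) {
                System.out.println("Parameter: " + parameter.getName()
                    + " of type " + parameter.getType());
            }
            System.out.println("Throws: " + method.getThrows());
        }
    }
}
```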

7. Method Body

We can go even further than this, though. The MethodTree instance gives us access to the parsed body of the method as a collection of statements.

This, more than anywhere else in the API, is where the fact that everything is a Tree really benefits us. In Java, there are a variety of statements that have special details about them, which can even include some statements containing other statements.

For example, the following Java code is a single statement:

for (Tree statement : methodTree.getBody().getStatements()) {
    System.out.println("Found statement: " + statement);
}

This statement is an “Enhanced for loop” and consists of:

  • A variable declaration – Tree statement
  • An expression – methodTree.getBody().getStatements()
  • A nested statement – The block containing System.out.println("Found statement: " + statement);

Our JavaCompiler represents this as an EnhancedForLoopTree instance, which gives us access to these different details. Every different type of statement that can be used in Java is represented by a subtype of StatementTree, allowing us to get the pertinent details back out again.
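To see the three parts of an enhanced for loop in practice, we could walk the whole tree with a TreeScanner and report every loop we find. This is a sketch only; the ForLoopFinder class, its helpers, and the sample source are illustrative assumptions:

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.EnhancedForLoopTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Hypothetical scanner: records a one-line summary of every enhanced for loop.
class ForLoopFinder extends TreeScanner<Void, List<String>> {

    @Override
    public Void visitEnhancedForLoop(EnhancedForLoopTree loop, List<String> found) {
        found.add(loop.getVariable().getName()       // the declared loop variable
            + " in " + loop.getExpression()          // the expression being iterated
            + " -> " + loop.getStatement().getKind()); // the nested statement
        // Keep scanning so that loops nested inside this one are found too.
        return super.visitEnhancedForLoop(loop, found);
    }

    // Hypothetical helper: parses source held in a String rather than a file.
    static CompilationUnitTree parse(String source) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Sample.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, Collections.singletonList(file));
        return task.parse().iterator().next();
    }

    static List<String> find(String source) throws IOException {
        List<String> found = new ArrayList<>();
        new ForLoopFinder().scan(parse(source), found);
        return found;
    }

    public static void main(String[] args) throws IOException {
        find("class A { void m(java.util.List<String> items) {"
            + " for (String item : items) { System.out.println(item); } } }")
            .forEach(System.out::println);
    }
}
```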

8. Future Proofing

Java pays a lot of attention to backward compatibility. However, forward compatibility is less well managed. This means it’s possible to have Java code that uses syntax our program doesn’t expect. For example, Java 5 introduced the enhanced for loop. We’d be surprised to see one of these if we were expecting code older than that.

However, all this means is that we must be prepared for Tree instances that we might not expect. Depending on exactly what we’re doing, this might be a serious concern, or it might not even be an issue. In general, though, we should be prepared to fail if we’re trying to parse Java code from a version newer than we’re expecting.
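One way to fail explicitly, rather than silently skip something we don’t understand, is to override defaultAction() in a SimpleTreeVisitor, since that’s where every unhandled node type ends up. The sketch below uses a nested class as a stand-in for an unexpected construct; the StrictMemberVisitor class and its helpers are illustrative assumptions:

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.tree.Tree;
import com.sun.source.tree.VariableTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.SimpleTreeVisitor;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Hypothetical visitor: understands methods and fields, and rejects everything else.
class StrictMemberVisitor extends SimpleTreeVisitor<String, Void> {

    @Override
    public String visitMethod(MethodTree method, Void unused) {
        return "method " + method.getName();
    }

    @Override
    public String visitVariable(VariableTree field, Void unused) {
        return "field " + field.getName();
    }

    @Override
    protected String defaultAction(Tree node, Void unused) {
        // Reached for any node type that has no dedicated visit method above.
        throw new UnsupportedOperationException("Unexpected tree kind: " + node.getKind());
    }

    // Parses a String and describes each member of its first top-level class.
    static List<String> describeMembers(String source) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Sample.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, Collections.singletonList(file));
        ClassTree classTree = (ClassTree) task.parse().iterator().next().getTypeDecls().get(0);

        List<String> described = new ArrayList<>();
        for (Tree member : classTree.getMembers()) {
            described.add(member.accept(new StrictMemberVisitor(), null));
        }
        return described;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(describeMembers("class A { int x; void m() {} }"));
    }
}
```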

9. Conclusion

We’ve seen how to use the JavaCompiler API to parse some Java source code and get information from it. In particular, we’ve seen how to get from the source file to the individual statements that make up method bodies.

You can do much more with this API, so why not try some of it out yourself?

As always, all of the code from this article can be found over on GitHub.

       
