
Introduction to SootUp


1. Introduction

In this article, we’ll examine the SootUp library. SootUp is a library for performing static analysis on JVM code, using either the original source code or compiled JVM bytecode. It’s a complete overhaul of the Soot library and aims to be more modular, testable, maintainable, and usable.

2. Dependencies

Before using SootUp, we need to add it to our build. The latest version is 1.3.0 at the time of writing:

<dependency>
    <groupId>org.soot-oss</groupId>
    <artifactId>sootup.core</artifactId>
    <version>1.3.0</version>
</dependency>
<dependency>
    <groupId>org.soot-oss</groupId>
    <artifactId>sootup.java.core</artifactId>
    <version>1.3.0</version>
</dependency>
<dependency>
    <groupId>org.soot-oss</groupId>
    <artifactId>sootup.java.sourcecode</artifactId>
    <version>1.3.0</version>
</dependency>
<dependency>
    <groupId>org.soot-oss</groupId>
    <artifactId>sootup.java.bytecode</artifactId>
    <version>1.3.0</version>
</dependency>
<dependency>
    <groupId>org.soot-oss</groupId>
    <artifactId>sootup.jimple.parser</artifactId>
    <version>1.3.0</version>
</dependency>

We’ve got several different dependencies here, so what do they all do?

  • org.soot-oss:sootup.core is the core library.
  • org.soot-oss:sootup.java.core is the core module for working with Java.
  • org.soot-oss:sootup.java.sourcecode is the module for analysing Java source code.
  • org.soot-oss:sootup.java.bytecode is the module for analysing compiled Java bytecode.
  • org.soot-oss:sootup.jimple.parser is the module for parsing Jimple – the intermediate representation that SootUp uses for representing Java.

Unfortunately, there isn’t a BOM dependency available, so we need to manage the version of each of these dependencies individually.
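One common way to keep the versions in step without a BOM is a shared Maven property. The fragment below is a minimal sketch: the property name sootup.version is our own choice, not something SootUp defines, and only the first dependency is shown. The remaining four would reference the same property.

```xml
<properties>
    <!-- Hypothetical property name; any name works as long as every reference matches -->
    <sootup.version>1.3.0</sootup.version>
</properties>

<dependency>
    <groupId>org.soot-oss</groupId>
    <artifactId>sootup.core</artifactId>
    <version>${sootup.version}</version>
</dependency>
```

With this in place, upgrading SootUp means changing the version in a single place.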

3. What is Jimple?

SootUp can analyze code in a number of different formats, including Java source code, compiled bytecode, or even classes internal to the JVM itself.

To do this, it converts the various inputs into an intermediate representation known as Jimple.

Jimple exists to represent everything that can be done with Java source code or bytecode, but in a form that’s easier to analyze. This means that it’s deliberately different from both of those possible inputs in certain ways.

JVM bytecode is stack-based in the way that some values are accessed. This is highly efficient for runtime but is much more difficult for analysis purposes. The Jimple representation of the code converts this into being entirely variable-based instead. This can produce the exact same functionality while being much easier to understand.
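To get a feel for what going from stack-based to variable-based means, here is a small self-contained sketch. It uses a made-up toy instruction set (push, add, mul, store) rather than real JVM bytecode, and it isn’t how SootUp performs the conversion. The class name StackToLocalsSketch and the $stackN naming are assumptions chosen to echo the Jimple listing later in this section.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative only: a toy stack machine, not real JVM bytecode or SootUp code.
class StackToLocalsSketch {

    // Rewrites "push x" / "add" / "mul" / "store x" instructions into
    // variable-based statements, naming each intermediate value $stackN.
    static List<String> convert(List<String> instructions) {
        Deque<String> operands = new ArrayDeque<>(); // names of the values currently on the stack
        List<String> statements = new ArrayList<>();
        int nextTemp = 1;

        for (String instruction : instructions) {
            String[] parts = instruction.split(" ");
            switch (parts[0]) {
                case "push":
                    operands.push(parts[1]);
                    break;
                case "add":
                case "mul": {
                    // The right operand was pushed last, so it is popped first.
                    String right = operands.pop();
                    String left = operands.pop();
                    String temp = "$stack" + nextTemp++;
                    String operator = parts[0].equals("add") ? "+" : "*";
                    statements.add(temp + " = " + left + " " + operator + " " + right);
                    operands.push(temp);
                    break;
                }
                case "store":
                    statements.add(parts[1] + " = " + operands.pop());
                    break;
                default:
                    throw new IllegalArgumentException("Unknown instruction: " + instruction);
            }
        }
        return statements;
    }

    public static void main(String[] args) {
        // Stack form of: c = (a + b) * a
        List<String> stackCode = List.of("push a", "push b", "add", "push a", "mul", "store c");
        convert(stackCode).forEach(System.out::println);
    }
}
```

Every value that only ever lived on the stack now has a name, so an analysis can refer to it directly instead of tracking the stack’s contents.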

Java source code, on the other hand, is already variable-based, but it has a nested structure. Nesting is easier for developers to work with but harder for software tools to analyze, so the Jimple representation converts it into a flat structure.
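The idea of flattening can be sketched in a few lines of plain Java. The example below walks a toy expression tree and emits one flat statement per nested operation. The types Leaf and Binary and the $tmpN names are hypothetical and exist only for this illustration; SootUp’s own model is far richer.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a toy expression tree, not SootUp's model of Java source.
class FlattenSketch {

    interface Expr {}

    // A leaf such as a variable name or a literal.
    static final class Leaf implements Expr {
        final String text;

        Leaf(String text) {
            this.text = text;
        }
    }

    // A binary operation whose operands may themselves be nested expressions.
    static final class Binary implements Expr {
        final Expr left;
        final String operator;
        final Expr right;

        Binary(Expr left, String operator, Expr right) {
            this.left = left;
            this.operator = operator;
            this.right = right;
        }
    }

    private final List<String> statements = new ArrayList<>();
    private int nextTemp = 1;

    // Emits one flat statement per nested operation and returns the name that holds the result.
    String flatten(Expr expr) {
        if (expr instanceof Leaf) {
            return ((Leaf) expr).text;
        }
        Binary binary = (Binary) expr;
        String left = flatten(binary.left);
        String right = flatten(binary.right);
        String temp = "$tmp" + nextTemp++;
        statements.add(temp + " = " + left + " " + binary.operator + " " + right);
        return temp;
    }

    List<String> statements() {
        return statements;
    }

    public static void main(String[] args) {
        // Nested source form: (a + b) * (c - d)
        Expr nested = new Binary(
            new Binary(new Leaf("a"), "+", new Leaf("b")),
            "*",
            new Binary(new Leaf("c"), "-", new Leaf("d")));

        FlattenSketch sketch = new FlattenSketch();
        String result = sketch.flatten(nested);
        sketch.statements().forEach(System.out::println);
        System.out.println("result is in " + result);
    }
}
```

Each emitted statement contains at most one operation, which is the property that makes a flat representation convenient for analysis tools.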

Jimple also exists as a language that we can read and write code in ourselves. For example, the Java source code:

public void demoMethod() {
    System.out.println("Inside method.");
}

can instead be written as Jimple as follows:

public void demoMethod() {
    java.io.PrintStream $stack1;
    target.exercise1.DemoClass this;
    this := @this: target.exercise1.DemoClass;
    $stack1 = <java.lang.System: java.io.PrintStream out>;
    virtualinvoke $stack1.<java.io.PrintStream: void println(java.lang.String)>("Inside method.");
    return;
}

This is much more verbose, but we can see that it has the same functionality. SootUp provides functionality for parsing and generating this Jimple code directly if we ever need to store and transform it in this format.

When we analyze our code, no matter what the original source was, it’ll be converted into this structure for us to work with. We’ll then be working with types such as SootClass, SootField, SootMethod, etc – that directly relate to this representation.

4. Analyzing Code

Before we can do anything with SootUp, we need to analyze some code. This is done by creating an appropriate instance of AnalysisInputLocation and constructing a JavaView around it.

The exact type of AnalysisInputLocation that we create depends on the source of the code that we want to analyze.

The simplest option, though probably the least useful on its own, is to analyze the classes in the JVM itself. We can do this with the JrtFileSystemAnalysisInputLocation class:

AnalysisInputLocation inputLocation = new JrtFileSystemAnalysisInputLocation();

More usefully, we can analyze a source file using OTFCompileAnalysisInputLocation:

AnalysisInputLocation inputLocation = new OTFCompileAnalysisInputLocation(
  Path.of("src/test/java/com/baeldung/sootup/AnalyzeUnitTest.java"));

This also has an alternative constructor for analyzing an entire list of source files in one go:

AnalysisInputLocation inputLocation = new OTFCompileAnalysisInputLocation(List.of(.....));

We can also use it to analyze source code that we have in memory as a String:

Path javaFile = Path.of("src/test/java/com/baeldung/sootup/AnalyzeUnitTest.java");
String javaContents = Files.readString(javaFile);
AnalysisInputLocation inputLocation = new OTFCompileAnalysisInputLocation("AnalyzeUnitTest.java", javaContents);

Finally, we can analyze already compiled bytecode. This is done using JavaClassPathAnalysisInputLocation, and we can point this at anything that can be considered a classpath – including JAR files or directories containing class files.

AnalysisInputLocation inputLocation = new JavaClassPathAnalysisInputLocation("target/classes");

There are also several other standard ways to access the code we want to analyze, including directly parsing Jimple representations, or reading Android APK files.

Once we’ve got our AnalysisInputLocation instance we can create a JavaView around it:

JavaView view = new JavaView(inputLocation);

This then allows us to access all of the types that exist in our input.

5. Accessing Classes

Once we’ve analyzed our code and built an instance of JavaView around it, we can start to access details about the code. This starts with accessing classes.

If we know the exact class that we’re after, we can access this directly using the fully qualified class name. SootUp uses various Signature classes to describe the elements that we want to access. In this case, we need a ClassType instance. Fortunately, we can easily generate one of these using the fully qualified class name by using an IdentifierFactory that SootUp makes available to us:

IdentifierFactory identifierFactory = view.getIdentifierFactory();
ClassType javaClass = identifierFactory.getClassType("com.baeldung.sootup.ClassUnitTest");

Once we’ve built our ClassType instance, we can use it to access the details of this class:

Optional<JavaSootClass> sootClass = view.getClass(javaClass);

This returns an Optional<JavaSootClass> because the class might not exist in our view. Alternatively, we have a getClassOrThrow() method that will directly return a SootClass – the superclass of JavaSootClass – but will throw an exception if the class isn’t available in our JavaView:

SootClass sootClass = view.getClassOrThrow(javaClass);

Once we’ve got a SootClass instance, we can use it to inspect the details of the class. This lets us determine details of the class itself, such as its visibility, whether it’s concrete or abstract, and so on:

assertTrue(sootClass.isPublic());
assertTrue(sootClass.isConcrete());
assertFalse(sootClass.isFinal());
assertFalse(sootClass.isEnum());

We can also navigate our parsed code, for example, by accessing the superclass or interfaces of our class:

Optional<? extends ClassType> superclass = sootClass.getSuperclass();
Set<? extends ClassType> interfaces = sootClass.getInterfaces();

Note that these return ClassType instead of SootClass instances. This is because there’s no guarantee that the actual class definitions are part of our view, only their names.

6. Accessing Fields and Methods

In addition to classes themselves, we can access the contents of classes, such as fields and methods.

If we’ve already got a SootClass available then we can query this directly to find the fields and methods:

Set<? extends SootField> fields = sootClass.getFields();
Set<? extends SootMethod> methods = sootClass.getMethods();

Unlike when we navigate from one class to another, this can safely return the entire representation of the field or method since they’re guaranteed to be in our view.

If we know exactly what we’re after, we can also go directly to it. For example, to access a field we only need to know its name:

Optional<? extends SootField> field = sootClass.getField("aField");

Accessing a method is slightly more complicated since we need to know both the method name and parameter types:

Optional<? extends SootMethod> method = sootClass.getMethod("someMethod", List.of());

If our method takes parameters then we need to provide a list of Type instances from our IdentifierFactory:

Optional<? extends SootMethod> method = sootClass.getMethod("anotherMethod",
  List.of(identifierFactory.getClassType("java.lang.String")));

This allows us to get the correct instance when we have overloaded methods. We can also list all of the overloaded methods that have the same name:

Set<? extends SootMethod> methods = sootClass.getMethodsByName("someMethod");

As before, once we’ve obtained our SootMethod or SootField instance then we can use it to inspect the details:

assertTrue(sootMethod.isPrivate());
assertFalse(sootMethod.isStatic());

7. Analyzing Method Bodies

Once we’ve got hold of a SootMethod instance, we can use this to analyze the method body itself. This means the method signature, the local variables in the method, and the call graph itself.

Before we can do any of this, we need to access the method body itself:

Body methodBody = sootMethod.getBody();

Using this, we can now access all of the details of the method body.

7.1. Accessing Local Variables

The first thing we can do is access any local variables that are available within the method:

Set<Local> methodLocals = methodBody.getLocals();

This gives us access to every variable that’s accessible within the method. This list may not be what we expect. It’s actually the list of variables from the Jimple representation of the method, so it will include some additional entries from the parsing process and may not have the original variable names.

For example, the following method has 5 locals:

private void someMethod(String name) {
    var capitals = name.toUpperCase();
    System.out.println("Hello, " + capitals);
}

These are:

  • this.
  • I1 – the method parameter.
  • I2 – the variable “capitals”.
  • $stack3 – a local variable pointing to System.out.
  • $stack4 – a local variable representing “Hello, ” + capitals.

The $stack3 and $stack4 local variables are generated by the Jimple representation and are not directly present in the original code.

7.2. Accessing Method Statement Graph

In addition to the local variables, we can also analyze the entire method statement graph. This describes every statement that the method will perform:

StmtGraph<?> stmtGraph = methodBody.getStmtGraph();
List<Stmt> stmts = stmtGraph.getStmts();

This gives us a list of every statement that the method will perform, in the order that it will perform them. Each of these will implement the Stmt interface, representing something that the method can do.

For example, our earlier method produces seven statements.

This seems like a lot more than the code we actually wrote, which was only two lines long. That’s because we’re looking at the Jimple representation of our code instead. But we can break it down to see exactly what’s going on.

We start with two JIdentityStmt instances. These represent the values passed into our method – the this value and our I1 that we saw before as being our first parameter.

Next, we have three JAssignStmt instances. These represent assignments to variables within our method. In this case, we’re assigning the result of I1.toUpperCase() to I2, the value System.out to $stack3 and the result of “Hello, ” + I2 into $stack4.

After this, we have a JInvokeStmt instance. This represents invoking the println() method on $stack3, and passing it the value of $stack4.

Finally, we have a JReturnVoidStmt instance, which represents the implicit return at the end of the method.

This is a very simple method with no branching or control statements, but we can clearly see that everything the method does is represented here. The same is true for anything that we can achieve in a Java application.

8. Summary

This was a quick introduction to SootUp. There’s a lot more that we can do with this library. Next time you need to analyze any Java code, why not give it a try?

As usual, all of the examples from this article are available over on GitHub.

The post Introduction to SootUp first appeared on Baeldung.
       
