GraalVM native executables can run faster than Scala/Java/JVM applications, with much less memory consumption

In two small tests I ran where GraalVM was able to create a native executable, the native executable ran significantly faster than the equivalent Scala/Java code running with the Java 8 JVM, and also reduced RAM consumption by a whopping 98% in a long-running example. On the negative side, GraalVM currently doesn’t seem to work with Swing applications.

Background/Intro

Over the past five years I’ve been writing Scala shell scripts — rather than scripts using Unix tools or other scripting languages — as a means of pushing the boundaries of what can be done with “Scala in the small.” This recently culminated in creating my own Scala ‘Sed’ library.

While I’ve enjoyed doing that, one thing that always bothers me with this approach is the JVM startup lag. Native binaries start running immediately, but whenever I start a JVM script I feel that slow startup.

This past week I finally decided to take GraalVM (which I’ll call Graal in this article) out for a spin, and as a means of creating native executables from command-line Scala (or Java) classes and JAR files, it looks like a big win.

Test system information

I conducted the following tests on a 2013 MacBook Pro running macOS Mojave (10.14.5), with a 2.3 GHz Intel Core i7, 16 GB of RAM, and an SSD. The Scala version is 2.12.8, the Java version is OpenJDK 1.8.0_222, and the GraalVM version is 19.1.1. The tests were run on July 20 and 21, 2019.

Creating native executables

After you get Graal and its native-image command installed, creating native executables with Graal is pretty easy. If you have a single Java class file named Find.class and it has a main method, you can create an executable named find with this command:

$ native-image Find

Note that the output filename is all lowercase. I assume the output is named find.exe on Windows systems, but I don’t know for sure.

To create an executable from a self-contained JAR file — meaning a JAR file you can run with java -jar — use this command:

$ native-image -jar Hyde.jar

If you have a JAR file that needs other resources, create a native image by supplying the necessary classpath:

# create a jar file from a scala class
$ scalac RenumberAllMdFiles.scala -d RenumberAllMdFiles.jar

# turn the jar file into a native executable
$ native-image -cp $SCALA_HOME/lib/scala-library.jar:RenumberAllMdFiles.jar RenumberAllMdFiles

Note that this last example also creates a lowercase executable file named renumberallmdfiles. (Insert sad emoji face here for tools that rename my stuff.)
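If the lowercase name bothers you too, the rename step can be scripted. This is only a minimal sketch: restore_name is a hypothetical helper name (it is not part of native-image), and it assumes the executable was written to the current directory under the all-lowercase form of the class name, as in the examples above.

```shell
# Hypothetical helper: rename the all-lowercase executable produced by
# native-image back to the original class name.
# Assumes the executable is in the current directory.
restore_name() {
    class_name=$1
    # Derive the lowercase file name that native-image wrote
    lower_name=$(printf '%s' "$class_name" | tr '[:upper:]' '[:lower:]')
    if [ "$lower_name" = "$class_name" ]; then
        return 0    # name is already lowercase, nothing to rename
    fi
    if [ ! -f "$lower_name" ]; then
        echo "restore_name: $lower_name not found" >&2
        return 1
    fi
    mv "$lower_name" "$class_name"
}
```

After a build, running restore_name RenumberAllMdFiles would turn renumberallmdfiles back into RenumberAllMdFiles.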

Test 1: Modifying 55 files with my Sed library

For my first test, I ran a Scala JAR file I had without using Graal, i.e., using the scala command and therefore the JVM:

$ time scala -cp $SCALA_HOME/lib/scala-library.jar:RenumberAllMdFiles.jar RenumberAllMdFiles

Then I created a native image of that JAR file, renamed it back to a camelcase name, and ran it like this:

$ time ./RenumberAllMdFiles

Per the Unix time tool, the results were an order of magnitude difference in Graal’s favor:

java/JVM     Graal
--------   --------
0m0.727s   0m0.058s

I could tell that the Graal executable was faster, but it took me a moment to realize there was a leading 0 before that 58:

0m0.058s
    -

That’s a 92% reduction in run time. Very cool.
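To double-check that percentage, here is a small sketch of the arithmetic using awk. The function name pct_reduction is made up for this example; it just compares the two wall-clock times from the table above.

```shell
# Percentage reduction in run time going from the JVM time to the native
# time, plus the speedup factor. Both arguments are times in seconds.
pct_reduction() {
    awk -v jvm="$1" -v native="$2" 'BEGIN {
        printf "%.1f%% less time, %.1fx speedup\n", (jvm - native) / jvm * 100, jvm / native
    }'
}

pct_reduction 0.727 0.058    # prints: 92.0% less time, 12.5x speedup
```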

And yes, I verified that the results were the same.

Test 2: A longer-running application

Getting rid of that startup lag time felt like a huge win for my Scala shell-script life. Next, I wanted to see a bigger test.

I don’t have any scripts that take a long time to run, so I looked around and found a Java file-finding class on this Oracle page and decided to put it to the test with Graal. I copied their code into a .java file, compiled it to a .class file, then created a Graal executable from the class file with this command:

$ native-image Find

After a quick bit of research to find something that would run for a while, I decided to search my ~/Projects directory — which contains over 500,000 files — for files named CatoGui.scala. This time I ran the .class file with the java command, and then ran the Graal native executable, and the time results were:

java/JVM    Graal
--------   -------
 27.087s   16.433s

If you prefer images:

[Image: Scala vs GraalVM performance]

This was a real surprise. I expected Graal to be a little faster due to the reduced startup lag time, but it cut the run time by a whopping 39% in a long-running script. How could this be?

Note: I re-ran these tests multiple times to verify them, including rebooting my laptop several times.

More details on Test 2

There are probably many ways to determine why Graal is so much faster for a use-case like this — and feel free to comment on those below — but I learned that a simple approach is to re-run the tests with the time -l command. Excluding the actual search results, here’s the output from that time command using the java/class approach:

$ /usr/bin/time -l java Find /Users/al/Projects -name "CatoGui.scala"
       27.85 real   2.31 user    14.34 sys
 109424640  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
     31133  page reclaims
         0  page faults
         0  swaps
         0  block input operations
         0  block output operations
         0  messages sent
         0  messages received
         2  signals received
     63223  voluntary context switches
      6407  involuntary context switches

And here are the results from the Graal native image:

$ /usr/bin/time -l ./find /Users/al/Projects -name "CatoGui.scala"
       16.15 real   0.59 user    6.94 sys
   1986560  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
       496  page reclaims
         0  page faults
         0  swaps
         0  block input operations
         0  block output operations
         0  messages sent
         0  messages received
         0  signals received
     52545  voluntary context switches
      2293  involuntary context switches

I’m not a performance-tuning expert, but one thing immediately stands out:

java:   109,424,640  maximum resident set size
graal:    1,986,560  maximum resident set size

I’ve seen multiple definitions for what the units of “maximum resident set size” are, but whatever the heck those units are, the java command with the .class file requires more than 55 times the memory that the Graal executable requires. The “page reclaims” required by the java version are also higher, by a factor of almost 63.
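Those two factors are easy to verify from the time -l output. As a rough sketch (the ratio function is just a name I made up for this), awk can do the division, and the same memory numbers can be turned into a percentage reduction:

```shell
# How many times larger is the first value than the second?
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

ratio 109424640 1986560    # maximum resident set size, java vs. Graal: prints 55.1
ratio 31133 496            # page reclaims, java vs. Graal: prints 62.8

# The same memory numbers expressed as a percentage reduction: prints 98.2%
awk 'BEGIN { printf "%.1f%%\n", (1 - 1986560 / 109424640) * 100 }'
```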

About the memory results

Despite some things I’ve read on the internet, it appears that those “maximum resident set size” units are bytes, and the Java version requires 104MB, while the Graal version requires only 2MB(!).

[Image: Java vs GraalVM memory use]

Two things I can say for sure: (a) the rss field of the ps command is documented as reporting kilobytes, and it showed ~105572 while the java/class command was running, which is 103MB; and (b) at about the same time, the RES field of the htop command showed 104M for the java/class command.
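The unit conversions behind those numbers can be sketched with awk. The two function names are made up for this example, and I am assuming 1 MB = 1024 * 1024 bytes, which is the convention that matches the figures above:

```shell
# Convert a byte count (as reported by time -l on my system) to megabytes,
# assuming 1 MB = 1024 * 1024 bytes
bytes_to_mb() {
    awk -v n="$1" 'BEGIN { printf "%.1f\n", n / (1024 * 1024) }'
}

# Convert a kilobyte count (the ps rss field) to megabytes
kb_to_mb() {
    awk -v n="$1" 'BEGIN { printf "%.1f\n", n / 1024 }'
}

bytes_to_mb 109424640    # java, from time -l: prints 104.4
bytes_to_mb 1986560      # Graal, from time -l: prints 1.9
kb_to_mb 105572          # java, from ps rss: prints 103.1
```

The time -l and ps figures for the java process land within about 1MB of each other, which is what you would expect if time -l is reporting bytes.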

As a bit of proof, here’s an image of the htop screen for the java/class command:

[Image: htop showing memory use for the Java/JVM test]

And here’s an htop image taken when the Graal executable was running, showing remarkably little memory use:

[Image: htop showing memory use for the GraalVM test]

If you want to repeat these tests on your system, Mac users can install htop with Homebrew, and this is the ps command I ran:

$ ps -awxm -o %mem,rss,comm | sort -nr | grep java

You can also put that code in a loop if you want:

while true
do
    ps -awxm -o %mem,rss,comm | sort -nr | grep java
    sleep 2
done

For the java/class/jvm command, that shows output like this:

0.4  68204 /usr/bin/java
0.5  91156 /usr/bin/java
0.6 105780 /usr/bin/java
0.6 106140 /usr/bin/java
0.6 106220 /usr/bin/java
.
.
.
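If you only care about the peak value for one process, the polling loop above can be adapted. This is a rough sketch rather than a precise measurement tool: peak_rss is a hypothetical helper name, it assumes your ps accepts -o rss= -p PID (the trailing = suppresses the header line), and because it only samples every few seconds it can miss short spikes.

```shell
# Hypothetical helper: poll one process's resident set size (in KB, per ps)
# until the process exits, then print the largest value seen.
# Usage: peak_rss PID [interval_seconds]
peak_rss() {
    pid=$1
    interval=${2:-2}
    peak=0
    # kill -0 sends no signal; it only checks that the process still exists
    while kill -0 "$pid" 2>/dev/null; do
        rss=$(ps -o rss= -p "$pid" 2>/dev/null | tr -d ' ')
        if [ -n "$rss" ] && [ "$rss" -gt "$peak" ]; then
            peak=$rss
        fi
        sleep "$interval"
    done
    echo "$peak"
}
```

For example, you could start the java command in the background with a trailing &, then run peak_rss $! to watch it until it finishes.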

As a final note, you can also use -Xmx with your java command to put a cap on the maximum memory used:

$ time java -Xmx512M Find /Users/al/Projects -name "CatoGui.scala"

On my system this had no effect on the run time.

Please note that my thought process in using this Find.java code was, “What code can I find that I can run from the command-line that will take a while to run,” and not something like, “What code can I find where Graal can reduce its memory use by 98%.”

GraalVM native-image notes

One thing to know about Graal is that when you run the native-image command it takes a while to compile your class or JAR file to a native executable. Here’s the output from running native-image on Find.class:

$ native-image Find
Build on Server(pid: 53748, port: 55574)*
[find:53748]    classlist:   1,491.50 ms
[find:53748]        (cap):   2,155.80 ms
[find:53748]        setup:   3,447.06 ms
[find:53748]   (typeflow):   2,380.97 ms
[find:53748]    (objects):   1,613.12 ms
[find:53748]   (features):     291.77 ms
[find:53748]     analysis:   4,370.96 ms
[find:53748]     (clinit):      97.72 ms
[find:53748]     universe:     796.64 ms
[find:53748]      (parse):     422.46 ms
[find:53748]     (inline):     943.66 ms
[find:53748]    (compile):   4,635.51 ms
[find:53748]      compile:   6,337.11 ms
[find:53748]        image:     552.88 ms
[find:53748]        write:     195.97 ms
[find:53748]      [total]:  17,379.96 ms

As shown, it takes a little over 17 seconds to create a native image on my system.

A second thing to know is that this command starts a background server by default, and keeps it running after the command is finished. That server consumes over 1GB of RAM, so you’ll want to stop/kill it. When it’s running, this command:

$ ps auxw | grep graalvm

shows a result like this:

al    53748   0.0  7.8 13148200 1307096   ??  S     1:43PM   1:28.48 /Users/al/bin/graalvm-ce-19.1.1/ ... much more here ...
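The server’s PID is also printed on the “Build on Server” line of the native-image output, so one way to find the process to kill is to pull it out of that line. This is a small sketch that assumes the line keeps the format shown above; server_pid is a made-up helper name.

```shell
# Read native-image output on standard input and print the PID from a
# "Build on Server(pid: ..., port: ...)" line. Prints nothing if there is
# no such line.
server_pid() {
    sed -n 's/.*Build on Server(pid: \([0-9][0-9]*\),.*/\1/p'
}

printf '%s\n' 'Build on Server(pid: 53748, port: 55574)*' | server_pid    # prints 53748
```

With the PID in hand, a plain kill 53748 asks the server process to stop.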

After a while I learned that you can run native-image without the server, like this:

$ native-image --no-server Find

For much more information, see the documentation for the GraalVM native-image command.

No joy for Swing apps :(

In sad news, there is no native-image joy for Swing applications. I’ve created several Swing apps that I use, and anything to make them start faster and use less RAM sounded awesome, but sadly they won’t compile with native-image:

$ native-image JFrameExample
Build on Server(pid: 18953, port: 55329)
.
.
.
Warning: Aborting stand-alone image build. Unsupported features in 2 methods
Detailed message:
Error: Detected a started Thread in the image heap. 
Threads running in the image generator are no longer running at image run time. 
The object was probably created by a class initializer and is reachable from a 
static field...

Trace:  object sun.awt.AWTAutoShutdown
    method sun.awt.AWTAutoShutdown.getInstance()
Call path from entry point to sun.awt.AWTAutoShutdown.getInstance(): 
    at sun.awt.AWTAutoShutdown.getInstance(AWTAutoShutdown.java:133)
    at java.awt.EventQueue.detachDispatchThread(EventQueue.java:1137)
    at java.awt.EventDispatchThread.run(EventDispatchThread.java:88)
    at com.oracle.svm.core.thread.JavaThreads.threadStartRoutine(JavaThreads.java:460)
    at ...
Trace:  object sun.java2d.opengl.OGLRenderQueue
    field sun.java2d.opengl.OGLRenderQueue.theInstance

Somebody else already filed a bug report on this, so (fingers crossed) maybe it will be fixable in the future.

You can also read more about GraalVM’s current limitations on this GitHub page.

More information

I hope all of that was helpful. It sure seems promising for my Scala shell-script world.

For more information on GraalVM, see the GraalVM website.

Also, I haven’t watched it yet, but I know that this ScalaDays 2018 video talks about Twitter using Graal with their microservices.