Java Mission Control and Flight Recorder for OpenJDK 11

Because I couldn’t find it anywhere, I’ll just leave it here: to use Flight Recorder with OpenJDK 11, add the following to your command line:

-XX:+FlightRecorder
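For example, a full command line that also starts a recording right at launch could look like this (app.jar is just a stand-in for your own application):

java -XX:+FlightRecorder -XX:StartFlightRecording=duration=60s,filename=recording.jfr -jar app.jar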

to your command line. The free version of Mission Control seems not to be ready yet but early access builds can be found here http://jdk.java.net/jmc/

Cloud Benchmarking JVMs

So, I made a mistake. I have been saying for a while that I’m not so sure that Java is actually a good fit for microservices, because of the JIT warm-up time and the high memory overhead. What I realized at JAX London was that what I actually meant is that I’m not so sure that Java is a good fit for the cloud, for the very same reasons. But apparently everyone is aware of that by now.

So my mistake was basically to look at pure request performance, which actually does not matter that much anymore, because Cloud: you can scale horizontally as far as you want. The real question is how low you can go with the memory.

I spent some time this summer hand-optimizing JVM memory limits for small microservices. And even then I realized that, with the way HotSpot configures its regions, you always have a lot of memory that is just sitting there idle. Even if you can serve all requests from your young generation and have a pretty static set of stuff in the old region, you cannot really tell that to HotSpot. And the issue is, in the cloud, this memory is just wasted money. You provision by memory and CPU; the CPU is pretty easy to over-commit, but the memory is not. It is reserved. It is there in your Docker memory limit. It will determine how many machines you actually have to run.
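To make that concrete, the kind of setup I am talking about looks roughly like this (a sketch; the image name is a placeholder and the exact flags depend on your environment):

docker run --memory=128m --cpus=3 my-service:latest

Since JDK 10 the JVM is container-aware, so inside such a limit you can also size the heap relative to the container memory, for example with -XX:MaxRAMPercentage=75.0 instead of a hand-tuned -Xmx.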

So I took the example that I have been benchmarking and was really mean to it: I gave it just 128 MB of RAM. These are the results for HotSpot JDK 11:

Summary:
Total: 10.0060 secs
Slowest: 0.1229 secs
Fastest: 0.0007 secs
Average: 0.0284 secs
Requests/sec: 1758.2388


Response time histogram:
0.001 [1] |
0.013 [11039] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.025 [2177] |■■■■■■■■
0.037 [25] |
0.050 [1] |
0.062 [0] |
0.074 [0] |
0.086 [1092] |■■■■
0.098 [2806] |■■■■■■■■■■
0.111 [447] |■■
0.123 [5] |


Latency distribution:
10% in 0.0033 secs
25% in 0.0048 secs
50% in 0.0090 secs
75% in 0.0238 secs
90% in 0.0927 secs
95% in 0.0959 secs
99% in 0.1019 secs

I didn’t check, but I’m pretty sure these outliers are GC, because with only 128 MB of visible memory the JVM will size its regions accordingly. I also bound it to only three of my hardware threads, in order to reduce the effects of Redis, which is bound to the fourth. So this brings us down to ~1700 requests per second.
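For reference, that kind of pinning can be done with something along these lines (a sketch, the jar name is made up; I am not showing my actual setup here):

taskset -c 0-2 java -jar service.jar    # the service on hardware threads 0-2
taskset -c 3 redis-server               # Redis pinned to the fourth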

So for OpenJ9, which is supposed to use less memory, I would actually expect faster results, but that is not what I’m seeing:

Summary:
Total: 10.0754 secs
Slowest: 0.4198 secs
Fastest: 0.0361 secs
Average: 0.1554 secs
Requests/sec: 319.7897


Response time histogram:
0.036 [1] |
0.074 [270] |■■■■■■■■■■■
0.113 [843] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.151 [226] |■■■■■■■■■
0.190 [642] |■■■■■■■■■■■■■■■■■■■■■■■■■
0.228 [1015] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.266 [159] |■■■■■■
0.305 [46] |■■
0.343 [9] |
0.381 [4] |
0.420 [7] |


Latency distribution:
10% in 0.0786 secs
25% in 0.0884 secs
50% in 0.1731 secs
75% in 0.2012 secs
90% in 0.2208 secs
95% in 0.2362 secs
99% in 0.2853 secs

With 256 MB the results look different; for OpenJ9 we are back to a more reasonable number of requests:

Summary:
Total: 10.0068 secs
Slowest: 0.0500 secs
Fastest: 0.0007 secs
Average: 0.0086 secs
Requests/sec: 5813.5583


Response time histogram:
0.001 [1] |
0.006 [13041] |■■■■■■■■■■■■■■■■■
0.011 [31274] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.016 [10533] |■■■■■■■■■■■■■
0.020 [2466] |■■■
0.025 [592] |■
0.030 [197] |
0.035 [30] |
0.040 [35] |
0.045 [4] |
0.050 [2] |


Latency distribution:
10% in 0.0045 secs
25% in 0.0059 secs
50% in 0.0078 secs
75% in 0.0104 secs
90% in 0.0135 secs
95% in 0.0160 secs
99% in 0.0222 secs

For HotSpot:

Summary:
Total: 10.0102 secs
Slowest: 0.0655 secs
Fastest: 0.0006 secs
Average: 0.0068 secs
Requests/sec: 7344.8957


Response time histogram:
0.001 [1] |
0.007 [46179] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.014 [24111] |■■■■■■■■■■■■■■■■■■■■■
0.020 [2889] |■■■
0.027 [284] |
0.033 [43] |
0.040 [10] |
0.046 [1] |
0.053 [0] |
0.059 [1] |
0.066 [5] |


Latency distribution:
10% in 0.0033 secs
25% in 0.0045 secs
50% in 0.0061 secs
75% in 0.0084 secs
90% in 0.0112 secs
95% in 0.0133 secs
99% in 0.0176 secs

And HotSpot smashed it again. So I guess I have to dig deeper, because I would expect OpenJ9 to perform better. I’ll keep you posted.

Benchmarking JVMs Vol II

A while ago I compared OpenJ9 to HotSpot using the jZenith Redis example app. Since then I did some more optimizations and got it up to 7k requests per second.

I was talking the other day about the new Oracle license system for the JDK, and the name Azul came up, which reminded me that there is another JVM out there: Zing. So in addition to my last post, here are the numbers for Zing. Startup times are comparable to HotSpot, but the request numbers are not so nice:

Summary:
Total: 10.0109 secs
Slowest: 0.0469 secs
Fastest: 0.0007 secs
Average: 0.0101 secs
Requests/sec: 4910.9484


Response time histogram:
0.001 [1] |
0.005 [8930] |■■■■■■■■■■■■■■■■■■■■
0.010 [17706] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.015 [14121] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.019 [5621] |■■■■■■■■■■■■■
0.024 [1791] |■■■■
0.028 [658] |■
0.033 [208] |
0.038 [86] |
0.042 [33] |
0.047 [8] |


Latency distribution:
10% in 0.0038 secs
25% in 0.0064 secs
50% in 0.0095 secs
75% in 0.0129 secs
90% in 0.0168 secs
95% in 0.0198 secs
99% in 0.0271 secs

This is basically the same as my unoptimized HotSpot benchmark. Even though it is not complaining about being unable to use the native transport, I suspect that Netty is just better optimized for HotSpot.

Optimizing Vert.x request throughput

While benchmarking HotSpot against OpenJ9 I realised that 5k requests per second is nice enough, but that there might still be some room for optimization.

Vert.x has two main knobs to improve request performance: the native transport and the number of Verticles (i.e. the concurrency) you allow for requests. So I started playing around with both.

Native Transport

So Vert.x has this ability to use different transport implementations, which basically replace the default NIO-based Netty event loop with an epoll-based one.

In order to enable that you just have to add

<dependency>
    <groupId>io.netty</groupId>
    <artifactId>netty-transport-native-epoll</artifactId>
    <version>4.1.19.Final</version>
    <classifier>linux-x86_64</classifier>
</dependency>

to your pom.xml. Make sure that the version matches the Netty version your Vert.x release is built against.

In order to prefer the native transport you have to set a Vert.x Option:

final Vertx vertx = Vertx.vertx(new VertxOptions().setPreferNativeTransport(true));

It also gives you some more options to play with the underlying TCP stack:

final HttpServerOptions options = new HttpServerOptions()
        .setTcpFastOpen(true)
        .setTcpNoDelay(true)
        .setTcpQuickAck(true);

So let’s see how that performs:

Summary:
Total: 10.0067 secs
Slowest: 0.0259 secs
Fastest: 0.0027 secs
Average: 0.0091 secs
Requests/sec: 5494.8290


Response time histogram:
0.003 [1] |
0.005 [43] |
0.007 [401] |
0.010 [49262] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.012 [3198] |■■■
0.014 [938] |■
0.017 [1033] |■
0.019 [74] |
0.021 [26] |
0.024 [7] |
0.026 [2] |


Latency distribution:
10% in 0.0081 secs
25% in 0.0085 secs
50% in 0.0089 secs
75% in 0.0093 secs
90% in 0.0097 secs
95% in 0.0107 secs
99% in 0.0151 secs

On average it gives you a plus of roughly 300 requests per second. It is faster, but not by much.

Verticle deployments

Vert.x has multiple event loops, in the default configuration as many as there are visible cores. So if you start your HttpServer in a Verticle, you can scale it on the same machine via the number of instances that you deploy. Aligning that with the number of hardware threads your system provides is in general a good idea.

Vert.x actually reuses the port binding, so it allows you to deploy multiple Verticles that bind to the same port; as long as they all do the same thing, that is generally not a problem.

vertx.deployVerticle(() -> new AbstractVerticle() {
    @Override
    public void start(Future<Void> startFuture) {
        vertx.createHttpServer(options)
             .requestHandler(handler)
             .listen(restConfiguration.getPort(), restConfiguration.getHost(), ar -> {
                 if (ar.succeeded()) {
                     startFuture.complete(null);
                 } else {
                     startFuture.fail(ar.cause());
                 }
             });
    }
}, new DeploymentOptions().setInstances(Runtime.getRuntime().availableProcessors()),
   completableHandler.handler());

I’m using availableProcessors(), which since Java 10 even takes processors visible through cgroups into account, meaning that inside your CPU-limited Docker container you only get as many instances as you can actually utilize.

Summary:
Total: 10.0053 secs
Slowest: 0.0313 secs
Fastest: 0.0006 secs
Average: 0.0070 secs
Requests/sec: 7096.2075


Response time histogram:
0.001 [1] |
0.004 [7472] |■■■■■■■■■■
0.007 [30415] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.010 [21795] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.013 [7789] |■■■■■■■■■■
0.016 [2382] |■■■
0.019 [701] |■
0.022 [281] |
0.025 [94] |
0.028 [49] |
0.031 [21] |


Latency distribution:
10% in 0.0036 secs
25% in 0.0049 secs
50% in 0.0065 secs
75% in 0.0086 secs
90% in 0.0110 secs
95% in 0.0129 secs
99% in 0.0176 secs

That gives another 2.5k requests per second and actually fully loads my poor laptop, so this massively increases throughput, up to 7k on HotSpot.

A quick look at OpenJ9 shows that it is consistently about 1k requests per second slower, so I guess most of the high-performance stuff is just better optimized for HotSpot.

Benchmarking JVMs

While playing around with jZenith I realized that I had somehow ignored OpenJ9 up to now. So I wanted to see if it actually makes a difference for a small test case.

As I wrote basically the same app over and over again in jZenith to play with different integrations, I wanted to see how it looks for the example app of the Redis plugin.

I use hey for REST benchmarks nowadays, because it draws these nice histograms.

Just a couple of quick numbers for future reference (and for when it finally runs with GraalVM). All numbers are from my laptop (Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz, 12 GB of RAM).

openjdk version "11" 2018-09-25
OpenJDK Runtime Environment (build 11+24-Ubuntu-118.04)
OpenJDK 64-Bit Server VM (build 11+24-Ubuntu-118.04, mixed mode, sharing)

Startup time is fairly consistent at around 1.7 seconds, with JVM startup adding 1 second (!) of overhead. I warm up with 60 seconds of requests, then run:

hey -z 10s http://localhost:8080/user/e01afce1-cf1d-49ab-a78d-53e5ca1032ad

Summary:
Total: 10.0086 secs
Slowest: 0.0401 secs
Fastest: 0.0038 secs
Average: 0.0098 secs
Requests/sec: 5103.0285


Response time histogram:
0.004 [1] |
0.007 [254] |
0.011 [43529] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.015 [5071] |■■■■■
0.018 [1446] |■
0.022 [518] |
0.026 [143] |
0.029 [79] |
0.033 [31] |
0.036 [1] |
0.040 [1] |


Latency distribution:
10% in 0.0084 secs
25% in 0.0088 secs
50% in 0.0091 secs
75% in 0.0096 secs
90% in 0.0127 secs
95% in 0.0142 secs
99% in 0.0193 secs

So 5k requests per second is quite good, I’d say.

Next OpenJ9:

openjdk version "11" 2018-09-25
OpenJDK Runtime Environment AdoptOpenJDK (build 11+28-201810022340)
Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.10.0-rc2, JRE 11 Linux amd64-64-Bit 20181002_42 (JIT enabled, AOT enabled)
OpenJ9 - e44c4716
OMR - 32df9563
JCL - e80f5bd084 based on jdk-11+28)

Startup is slightly faster (measured internally), but the VM overhead is more, around 1.5 secs.

Summary:
Total: 10.0055 secs
Slowest: 0.0378 secs
Fastest: 0.0030 secs
Average: 0.0116 secs
Requests/sec: 4313.1229


Response time histogram:
0.003 [1] |
0.007 [8] |
0.010 [961] |■
0.013 [40454] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.017 [738] |■
0.020 [52] |
0.024 [817] |■
0.027 [111] |
0.031 [2] |
0.034 [5] |
0.038 [6] |


Latency distribution:
10% in 0.0105 secs
25% in 0.0109 secs
50% in 0.0112 secs
75% in 0.0117 secs
90% in 0.0124 secs
95% in 0.0132 secs
99% in 0.0226 secs

So in this scenario HotSpot is definitely faster, but I guess measuring against a nightly of OpenJ9 is actually not really fair. We will see how it performs against Graal.

jZenith – an opinionated approach to building modern Java Microservices

So, after teasing about a Spring Boot for Vert.x I started coding a little bit. The result now runs under the name of jZenith.

It is a simple prototype with an example application that does simple CRUD on a simple entity. But it already has a lot of things that I think are needed, plus some bugs and some nice technologies.

The overall app setup currently looks like this:

JZenith.application(args)
       .withPlugins(
         RestPlugin.withResources(UserResource.class)
                   .withMapping(NoSuchUserException.class, 404),
         PostgresqlPlugin.create()
       )
       .withModules(new ServiceLayerModule(), new PersistenceLayerModule(), new MapperModule())
       .withConfiguration("postgresql.database", "test")
       .withConfiguration("postgresql.username", "test")
       .withConfiguration("postgresql.password", "test")
       .run();

I exchanged a lot of the technologies I have used over the last years. Basically it is Guice based, which I still prefer to any other DI framework. On the REST side it uses RESTEasy, which, as someone thankfully pointed out to me, has native support for RxJava-based resources, making for nice resources like this:

public Single<UserResponse> getUser(@NonNull @PathParam("id") final UUID id) {
    return userService.getById(id)
                      .map(userMapper::mapToUserResponse);
}

I’m basically using it right now to play around with other technologies and to have a little fun with low-level stuff. It is a great learning experience. Most of it is actually just glue code between the different frameworks; the only part with real code of its own is the configuration system.

The next steps will be to get some of the kinks out of the libraries that I’m using and to put a better abstraction on SQL-based databases, then maybe on one of the other databases like Cassandra. And I need to write more tests.

And it already has a website, thanks to GitHub Pages (jzenith.org), and a logo. Setting up a project has never been easier, except for the fact that it is really hard to find free .org domains nowadays.

No state, no borders

I have a lot of formal education in Object Oriented Programming: a couple of university courses (one even featuring Eiffel), some books and in general a lot of reading. One thing that comes up too short, in my opinion, is the fact that Object Orientation is all about reducing the state space of a program.

In general, there is a lot of focus in OO on carefully crafting your objects, finding the right abstractions and the right hierarchy for the domain you are trying to model. Given that I have had only a handful of cases in my entire career where this could be applied, I think this is the wrong focus. We mostly build layered applications with pretty simple domain objects that rarely call for very clever design.

What actually should be the focus is the number of different states any given object can have. There are a couple of rules of thumb that I have come to believe in while programming. I think a lot of them are influenced by Josh Bloch’s excellent Effective Java and one of my all-time favorite OO articles, Object Calisthenics.

1. Use Classes to give names to things

An idea that comes from Object Calisthenics is to create a class for everything, which in general is a good one. Strings, Lists and other basic library types are not things in themselves; they are just the very basic building blocks of your program. You rarely have just a String of random characters: you have a Name, an Id, a Title, a Position. You almost always want the additional safety of a specific type that signifies what this String of characters actually represents. This also gives you a defined state and the ability to create Null Objects in case a value is missing, and it makes for more readable signatures and return values. Plus you can attach additional functionality to that object and unit test it.
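A minimal sketch of what such a wrapper type could look like (the class and its methods are made up for illustration, not taken from any real code base):

import java.util.Objects;

public final class UserName {

    private static final UserName MISSING = new UserName("");

    private final String value;

    private UserName(final String value) {
        this.value = Objects.requireNonNull(value);
    }

    public static UserName of(final String value) {
        return new UserName(value);
    }

    // A defined Null Object for the "value is missing" case.
    public static UserName missing() {
        return MISSING;
    }

    public boolean isMissing() {
        return this == MISSING;
    }

    public String asString() {
        return value;
    }
}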

2. Have a way to create the object for all defined states

In a language like Java, where constructors are unnamed, I generally believe in having static constructors. They give you the ability to give a name to what you are actually doing, adding more semantics to the object creation than just “new”.

They also allow you to offload the hard computations into a static method, while the constructor itself only makes sure that everything is in a valid state and immutable. Use Guava’s Preconditions or Lombok to make sure you know the nullability of all your fields.

Plus you can have a defined Null Object for your class that comes in handy when you do need it.
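As a sketch of the pattern (the class and method names are mine, purely for illustration; Guava’s Preconditions does the validation):

import com.google.common.base.Preconditions;

public final class Temperature {

    private final double celsius;

    // The constructor only validates; it does no real work.
    private Temperature(final double celsius) {
        Preconditions.checkArgument(celsius >= -273.15, "below absolute zero");
        this.celsius = celsius;
    }

    // Named static constructors carry more semantics than a bare "new".
    public static Temperature fromCelsius(final double celsius) {
        return new Temperature(celsius);
    }

    public static Temperature fromFahrenheit(final double fahrenheit) {
        // The hard computation lives here, outside the constructor.
        return new Temperature((fahrenheit - 32.0) * 5.0 / 9.0);
    }
}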

3. Have only defined state transitions

I’m a huge believer in immutability, and with it comes the need for explicit state transitions and copies for all changes. This also makes sure that you can only transition from one valid state to another and don’t have something in your program that is in an unexpected state.
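A small sketch of what that can look like (again with made-up names):

public final class Order {

    public enum Status { OPEN, PAID, SHIPPED }

    private final Status status;

    private Order(final Status status) {
        this.status = status;
    }

    public static Order open() {
        return new Order(Status.OPEN);
    }

    // Each transition validates the current state and returns a new copy,
    // so an Order can never jump from OPEN straight to SHIPPED.
    public Order pay() {
        if (status != Status.OPEN) {
            throw new IllegalStateException("Only open orders can be paid");
        }
        return new Order(Status.PAID);
    }

    public Order ship() {
        if (status != Status.PAID) {
            throw new IllegalStateException("Only paid orders can be shipped");
        }
        return new Order(Status.SHIPPED);
    }
}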

In Java that obviously leads to a lot of verbosity, because you have to write a lot of code that actually validates state and ensures immutability, but it is usually worth it when you try to understand what the code does, at 4am, after a long night out, slightly drunk and with a cold coming up 😉

The V word – why is verbosity considered bad?

I came across an article the other day that was praising the fact that Java is becoming less and less verbose due to the recent changes to the language. And I have a confession to make: I don’t think that this is necessarily a good thing.

I think the craft of programming in general has a lot of problems that come from its roots in mathematics and the genius myth. Programmers have grown up to be these mythical creatures that sit in the basement, don’t like the sunlight and write cryptic code on white-on-black screens. And they are, all of them, geniuses. That needs to go away.

One of the things that I usually say about programming is that you usually do it when you are well rested and at your best: after a full night of sleep, with the necessary coffee (or water in my case) provided, and in general able to do deep, concentrated work. The problem is that you do emergency maintenance at 4 a.m., when you just caught an hour of sleep after a long night out, slightly drunk, with a cold coming up, and usually at your worst. So you should aim at that level of mental capacity. That’s a variation of the theme that one should write code for one’s future self, just slightly more radical.

And here comes my problem with the “verbose is bad” myth. If you get a very verbose explanation of something, that is usually a good thing. If the information you get is very verbose, that is usually also a good thing. Only in code does verbosity become a bad thing.

One example of this reduced verbosity is type inference. In general it has its use cases, but if you have type inference throughout your whole code base you often lack necessary local information. Sure, my IDE helps, but that somehow hinders my flow while reading code. I have a runtime in my head that can understand Java code, but obviously I can’t hold the whole program in it. So I need local points to start my reading and understanding, and if that is interrupted too much by looking things up it becomes just too hard. I know that the compiler is super clever and can infer all those types, but can I? The answer is no, in a lot of places.
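A small, made-up illustration of the point (not from the original post; the map is just an arbitrary example):

import java.util.List;
import java.util.Map;
import java.util.Set;

class Inference {

    static void example(final Map<String, List<Integer>> scoresByName) {
        // With inference, the reader has to reconstruct the type in their head:
        var entries = scoresByName.entrySet();

        // Spelled out, the local information is right there, however verbose:
        final Set<Map.Entry<String, List<Integer>>> sameEntries = scoresByName.entrySet();
    }
}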

Another example is mutability. Of course it is annoying to write final in front of everything, but it conveys so much of the original author’s intention, of what she had in mind when she wrote that piece of code, that it is a very helpful piece of information. That programming (be it object-oriented or functional does not matter) is mostly about reducing the number of possible states is a whole article in itself, but sometimes those verbose parts of the Java Programming Language are actually good.
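Again a tiny, hypothetical illustration:

import java.util.ArrayList;
import java.util.List;

class Intent {

    static List<String> normalize(final List<String> input) {
        // 'final' tells the reader: these references will not change below.
        final List<String> result = new ArrayList<>();
        for (final String s : input) {
            result.add(s.trim().toLowerCase());
        }
        return result;
    }
}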

In general I would expect anyone to put as much information into the few lines they are writing as possible. Of course that is more to type, but typing speed is in general not the limiting factor of programmer productivity. And yes, code is read many more times than it is written, and even if it is slightly more to read, it might help the reader understand all the state and all the implications faster, more efficiently and with fewer errors. Verbosity in code has to become a positive aspect again.

JSR 311 and RxJava 2 in “Vert.x Boot”

Regarding my earlier posts on something Spring Boot-like for Vert.x, and after my evaluation of Vert.x Zero Up, there is one idea that I still would like to try out: combining JSR 311 resources with RxJava 2 directly on the resource level.

Usually resource code would look like this:

@GET
@Path("example")
@Produces(MediaType.TEXT_PLAIN)
public void getExample(@Suspended final AsyncResponse asyncResponse) {
    Single.just("Hello World").subscribe(asyncResponse::resume, asyncResponse::resume);
}

There are two pieces of boilerplate in here that I really would like to get rid of: the start of the Rx flow and the actual response subscription.

From my point of view it would make sense to abstract that away and enable Rx flows directly:

@GET
@Path("example")
@Produces(MediaType.TEXT_PLAIN)
public Single<String> getExample() {
    return Single.just("Hello World");
}

@POST
@Path("examplePost")
@Produces(MediaType.TEXT_PLAIN)
public Single<String> postExample(Single<ExampleBody> request) {
    return request.map(service::doSomething).map(mapper::mapResponse);
}

Let’s see if there is a way to generate the boilerplate around that and use rxified resources.

Vert.x Zero Up?

Regarding my former post, Vert.x Boot, I was told that there is actually a similar project rising:

Vert.x Zero Up is apparently flying mostly under the radar; at least I have not heard anything about it yet. So it is time to check it out. I don’t really have time to play with it right now, but maybe I can see from the docs whether it is going in the direction that I want to go.

import io.vertx.up.VertxApplication;
import io.vertx.up.annotations.Up;

@Up
public class Driver {

    public static void main(final String[] args) {
        VertxApplication.run(Driver.class);
    }

}

That seems close enough to what Spring Boot is doing. I’m just wondering what the @Up annotation is needed for. Also, using the io.vertx package namespace is a bold choice.

In my imagination that would be more like:

public class Application {

    public static void main(final String... args) {
        VertxBoot.application(args)
                 .withResources(UserResource.class, AvatarResource.class)
                 .withModules(SqlDatabaseModule.class, RedisModule.class)
                 .withModules(BusinessLayerModule.class, DaoLayerModule.class)
                 .bind(8080);
    }

}

A clear entry point for the application that shows you what kind of features are used and what the structure of the application is.

I definitely don’t understand everything that Vert.x Zero Up does. Apparently there are multiple modes to run an application; they all look the same code-wise but require different YAML configuration files. It comes with JSR 311 support, which seems to be a custom implementation, something that would come for free when using Vert.x Jersey. There are some nice additions though: it allows @StreamParam with Vert.x’ own Buffer class, although it requires some custom annotations for that. It also does classpath scanning, something that I’d really love to avoid for startup performance reasons; we want to have microservices after all.

There is an interesting concept in there about offloading event loop work to worker threads with annotations (@Address) via the Vert.x event bus. I’m not a huge fan of this implicit coupling, which might get out of hand quite quickly. On the other hand, we are talking about microservices, so there should only be a few of those bridges in any given service. I’m wondering if one could achieve something similar that is transparent but still uses one given Rx2 flow, which would be easier to follow.

In general I’m pretty amazed by the amount of documentation there is, which is a good thing. A lot of it feels somehow generated or auto-translated, but it is pretty extensive.

JSR 303 is supported, but only inside the system and not for request objects, as far as I understand the documentation. I think that would be an essential part.

I still need to check whether all the JSR 311 and JSR 330 support is custom, because I strongly believe that one should use existing libraries for that, otherwise maintenance will be hell. From what I see in the docs it looks like it is, because there are documented limitations, e.g. for JSR 330 everything is a singleton.

I will play with it for an afternoon, but from what I’ve seen so far I want to go in a different direction, so it may still be worth spending some time on a prototype of the framework I imagine.

BTW, I’m still looking for a good name for Vert.x Boot that does not collide with Vert.x and Spring Boot.