Java Containerization Guide

I. System selection

For the most basic underlying image, most of us have only three choices: Alpine, Debian, or CentOS. Of the three, CentOS is generally the one ops people know best, but unfortunately CentOS no longer has a stable release, and its stability has long been a somewhat mysterious topic; this is a matter of opinion, but personally I don't use it if I can avoid it 😆.

Excluding CentOS, we're left with Alpine or Debian. Alpine clearly wins from an image-size point of view, but Alpine uses the musl C library, which may have compatibility issues with software that depends deeply on glibc. How deep that glibc dependency runs varies by application; so far I have only hit a few font-related bugs with the OpenJDK from the official Alpine repositories.

On balance, my personal recommendation: if the application depends deeply on glibc, for example it contains JNI-related code, then Debian or a Debian-based base image is the more stable choice; if there are no such heavy dependencies, you can use Alpine when image size matters. In fact, OpenJDK itself is not small; even the Alpine variant, after installing some common tooling, won't end up particularly slim, so I personally went with a Debian-based base image.

II. JDK or JRE

Many people don't seem to distinguish between the JDK and the JRE, so to choose correctly you first need to know what each one is:

  • JDK: Java Development Kit
  • JRE: Java Runtime Environment

The JDK is a development kit: it contains the debugging-related toolchain, commands such as javac, jps, jstack, and jmap, which are necessary for compiling and debugging Java programs, and as a development kit it includes the JRE. The JRE is only a Java runtime environment: it contains just the commands and dependent libraries needed to run Java programs, so the JRE is smaller and lighter than the JDK.

If you only need to run a Java program such as a jar package, the JRE is sufficient; but if you want to capture runtime information for debugging, you should choose the JDK. **My personal habit is to use the JDK as the base image so I can troubleshoot production problems, avoiding the need to mount a JDK toolchain into the container in special cases. Of course, if there is no such need and image size is sensitive, you can consider using the JRE as the base image.**
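For example, with a JDK base image the debugging toolchain is already inside the container; a rough sketch of what that enables (the container name my-app is hypothetical, and this assumes the JVM runs as PID 1):

docker exec my-app jps -l                        # list Java processes in the container
docker exec my-app jstack 1                      # thread dump of PID 1
docker exec my-app jmap -histo:live 1 | head     # histogram of live heap objects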

III. JDK selection

3.1. OracleJDK or OpenJDK

The choice between the two comes down to one direct question: does the application code use Oracle JDK private APIs?

Usually "using these private APIs" means importing classes or interfaces under the com.sun.* packages. Many of these APIs are private to the Oracle JDK and may be missing from OpenJDK entirely or changed; so if the code contains such calls, only the Oracle JDK can be used.

It's worth clarifying that in many cases using these APIs is not a real business requirement: most likely the developer "slipped" when auto-importing a package, and the imported class just happened to implement the needed functionality; such imports can be replaced smoothly, for example with the corresponding Apache Commons implementations. There is also the case where the developer imported something by mistake but never reformatted the code and cleaned up imports, leaving an unused import at the top of the file, which Java allows; for this case, just reformat the code and optimize the imports.

Tips: in IDEA, use Option + Command + L to format code and Control + Option + O to optimize imports.

3.2. OracleJDK rebuild problem

When there is no way around using the Oracle JDK, it is recommended to download the Oracle JDK package and write a Dockerfile to create your own base image. But this involves a core problem: Oracle does not provide historical versions for download, so if you care about future rebuilds, keep the downloaded Oracle JDK package.
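A minimal sketch of that workflow, assuming you archived jdk-8u202-linux-x64.tar.gz next to the Dockerfile (the file name, registry, and tag here are all hypothetical):

# Dockerfile for the private base image:
#   FROM debian:bullseye-slim
#   ADD jdk-8u202-linux-x64.tar.gz /opt/
#   ENV JAVA_HOME=/opt/jdk1.8.0_202
#   ENV PATH=$JAVA_HOME/bin:$PATH
# build from the archived package and push to your own registry,
# so future rebuilds never depend on Oracle's download page:
docker build -t registry.example.com/base/oracle-jdk:8u202 .
docker push registry.example.com/base/oracle-jdk:8u202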

3.3. OpenJDK distributions

As we all know, OpenJDK is the open source distribution; on top of its open source license, major vendors provide value-added services and also publish pre-compiled Docker images for us to use. Some of the current mainstream distributions are as follows:

  • AdoptOpenJDK
  • Amazon Corretto
  • IBM Semeru Runtime
  • Azul Zulu
  • Liberica JDK

Some distributions offer a wider choice of base images; for example, AdoptOpenJDK offers image variants based on Alpine, Ubuntu, and CentOS. Others offer alternative JVM implementations; for example, IBM Semeru Runtime offers a pre-compiled version of the OpenJ9 JVM.

I personally like AdoptOpenJDK because it is community-driven, made up of JUG members, vendors, and other community members; Amazon Corretto and IBM Semeru Runtime are, as the names suggest, from the big cloud players, and their usability is good. Others such as Azul Zulu and Liberica JDK come from JVM vendors, some of which I would rather not recommend given some of the controversy around them.

AdoptOpenJDK has since moved into the Eclipse Foundation and is now called Eclipse Adoptium; so if you want to use the AdoptOpenJDK image, you should use the eclipse-temurin image on Docker Hub (hub.docker.com/_/eclipse-temurin).

IV. JVM Selection

For the JVM itself, Oracle maintains a JVM specification that defines how compatible Java code should behave when running on a given VM; **as long as a JVM implementation satisfies this specification and is certified, it can in theory be used in production.** There are many JVM implementations on the market today:

  • Hotspot
  • OpenJ9
  • TaobaoVM
  • LiquidVM
  • Azul Zing

These JVM implementations differ in features and performance. For example, Hotspot is the most commonly used implementation, with the best overall performance and compatibility; OpenJ9, created by IBM and now part of the Eclipse Foundation, is more containerization-friendly, offering faster startup and a smaller memory footprint.

It is generally recommended to use the "standard" Hotspot if you are not very familiar with the JVM; if you have higher requirements and expect to tune JVM parameters yourself, consider Eclipse OpenJ9. I personally prefer OpenJ9 because its documentation is very well written and worth reading carefully. If you want to use an OpenJ9 image, it is recommended to use the pre-compiled ibm-semeru-runtimes images directly.
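For example, pulling and sanity-checking one of those images might look like this (tag assumed; check Docker Hub for the current list):

docker pull ibm-semeru-runtimes:open-11-jdk
docker run --rm ibm-semeru-runtimes:open-11-jdk java -version    # should report OpenJ9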

V. Signal Passing

When we need to stop a program, the system usually sends a termination signal to the process; similarly, when a container stops, Kubernetes or other container tooling sends a termination signal to the process with PID 1 in the container. If a Java program is running inside the container, the signal is passed to the JVM, and Java frameworks such as Spring Boot detect it and perform some cleanup before exiting; this is called a "graceful shutdown".

If we don't deliver the signal to the JVM correctly when containerizing a Java application, a scheduler like Kubernetes will force a shutdown after the container's termination grace period times out, **which is likely to leave some resources improperly released: database connections not closed, service registrations not deregistered, and so on.** To verify this, I created a Spring Boot sample project containing the following core files (see GitHub for the full code):

  • BeanTest.java: Use @PreDestroy to register a Hook to listen for shutdown events to simulate a graceful shutdown
  • Dockerfile.bad: Dockerfile for the failure demonstration
  • Dockerfile.direct: Run command directly to achieve graceful shutdown
  • Dockerfile.exec: Use exec to achieve graceful shutdown
  • Dockerfile.bash-c: Use bash -c for graceful shutdown
  • Dockerfile.tini: Verify that tini alone does not guarantee a graceful shutdown in some cases
  • Dockerfile.dumb-init: Verify that dumb-init alone does not guarantee a graceful shutdown in some cases

Since BeanTest only prints output and is shared by all the tests, here is its code:

package com.example.springbootgracefulshutdownexample;

import org.springframework.stereotype.Component;

import javax.annotation.PreDestroy;

@Component
public class BeanTest {
    @PreDestroy
    public void destroy() {
        System.out.println("==================================");
        System.out.println("Received termination signal, executing graceful closure...") ;
        System.out.println("==================================");
    }
}

5.1. Wrong signaling

Many legacy Java projects have a startup script, either hand-written or some old Tomcat startup script; when we use such a script to start the application and do not adjust the Dockerfile properly, the signal will not be passed correctly. For example, the following broken setup:

entrypoint.bad.sh: responsible for starting the application

#! /usr/bin/env bash

java -jar /SpringBootGracefulShutdownExample-0.0.1-SNAPSHOT.jar

Dockerfile.bad: starts via a bash script, which causes the termination signal not to be delivered

FROM eclipse-temurin:11-jdk

COPY entrypoint.bad.sh /
COPY target/SpringBootGracefulShutdownExample-0.0.1-SNAPSHOT.jar /

# The following methods fail to forward signals
#CMD /entrypoint.bad.sh
# CMD ["/entrypoint.bad.sh"]
CMD ["bash", "/entrypoint.bad.sh"]

After building and running with this Dockerfile, **it is obvious that docker stop stalls for a while (docker is actually waiting for the processes in the container to exit on their own), and when the timeout is reached, the processes in the container are forcibly terminated, so no graceful-shutdown log is printed:**

[screenshot: docker-run.png]
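The reproduction steps are roughly the following (image and container names here are mine, not from the repo):

docker build -f Dockerfile.bad -t graceful:bad .
docker run -d --name graceful-bad graceful:bad

# stalls for the default 10s grace period, then SIGKILLs the container
time docker stop graceful-bad

# no graceful-shutdown message in the log
docker logs graceful-bad | tail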

5.2. Proper signaling

5.2.1. Direct run method

There are many ways to solve this signaling problem; for example, it is common to run a java program directly using the CMD or ENTRYPOINT commands:

Dockerfile.direct: run the java program directly, it can receive the termination signal normally

FROM eclipse-temurin:11-jdk

COPY target/SpringBootGracefulShutdownExample-0.0.1-SNAPSHOT.jar /

CMD ["java", "-jar", "/SpringBootGracefulShutdownExample-0.0.1-SNAPSHOT.jar"]

As you can see, running the java command directly in the Dockerfile lets the JVM receive the signal and notify the application to shut down gracefully:

[screenshot: java.png]

5.2.2. Indirect Exec method

If you are familiar with Docker, you know that the exec form of CMD/ENTRYPOINT does not expand environment variables; sometimes we rely on a script to resolve variables, so we can resolve them in the script and use exec for the final command. This also preserves signal delivery (screenshot omitted):

entrypoint.exec.sh: exec executes the final command, and can forward the signal

#! /usr/bin/env bash

# Pretend to do some variable handling, etc...
export VERSION="0.0.1"

exec java -jar /SpringBootGracefulShutdownExample-${VERSION}-SNAPSHOT.jar

5.2.3. Bash-c method

In addition to direct execution and exec, there is also what I call an "unstable" solution: using bash -c to execute the command. When bash -c runs a simple command, its behavior is similar to exec: bash replaces itself with the command, so the command after -c receives system signals directly. **But note that this approach is not 100% reliable: if the command after -c contains a pipe, redirection, etc., bash may still fork, and the child command will then not complete a graceful shutdown.**

Dockerfile.bash-c: Execute with bash -c to do a graceful shutdown if the command is simple

FROM eclipse-temurin:11-jdk

COPY entrypoint.bad.sh /
COPY target/SpringBootGracefulShutdownExample-0.0.1-SNAPSHOT.jar /

CMD ["bash", "-c", "java -jar /SpringBootGracefulShutdownExample-0.0.1-SNAPSHOT.jar"]

For a discussion of bash -c, see [StackExchange](https://unix.stackexchange.com/questions/466496/why-is-there-no-apparent-clone-or-fork-in-simple-bash-command-and-how-its-done).
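You can see the difference for yourself with a quick experiment (using the same image this article already uses; container names are arbitrary):

# simple command: bash execs it, so the command itself becomes PID 1
docker run -d --name t1 eclipse-temurin:11-jdk bash -c 'sleep 300'
docker exec t1 cat /proc/1/comm     # -> sleep

# pipeline: bash has to stay alive as PID 1 and fork children
docker run -d --name t2 eclipse-temurin:11-jdk bash -c 'sleep 300 | cat'
docker exec t2 cat /proc/1/comm     # -> bash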

5.2.4. tini or dumb-init

An init process is not a panacea: both tini and dumb-init have their limits.

These two tools are familiar to most people, and even Docker itself integrates tini (via docker run --init); but many people seem to hold the misconception (I used to as well) that **adding tini or dumb-init means signals will be forwarded and shutdown will be graceful. This is not the case: often these two tools only guarantee that zombie processes get reaped, while child processes may still not shut down gracefully.** For example, the following case:

Dockerfile.tini: the case where adding tini does not shut down gracefully

FROM eclipse-temurin:11-jdk

RUN set -e \
    && apt update \
    && apt install tini psmisc -y

COPY entrypoint.bad.sh /
COPY target/SpringBootGracefulShutdownExample-0.0.1-SNAPSHOT.jar /

ENTRYPOINT ["tini", "-vvv", "--"]

CMD ["bash", "/entrypoint.bad.sh"]

The same problem exists for dumb-init, but the root of it lies in bash: **when bash runs a script, it forks a new child process for the command inside. The forwarding logic of both tini and dumb-init is to deliver the signal to the process group, and once the parent process in the group has responded to the signal, forwarding is considered complete; but the child process in the group may not have finished its graceful shutdown by the time the parent exits, so the child can still end up force-killed.**

5.3. Best practices

Based on the above test and verification results, here is a summary of best practices:

    1. Building tini or dumb-init into the container is good practice for preventing zombie processes.
    2. tini or dumb-init cannot guarantee a 100% graceful shutdown.
    3. Simple commands executed directly via CMD receive forwarded signals and shut down gracefully.
    4. Complex commands resolved in a script and run via exec also receive forwarded signals and shut down gracefully.
    5. Running a simple command directly with bash -c can also shut down gracefully, but test it to be sure.

VI. Memory Limits

I found that very few people dig deeper and actually test this problem; with the development of containerization over the past two years, a lot of received wisdom no longer applies, so I decided to test this memory question specifically and carefully (**if you just want the conclusion, skip straight to section 6.3**).

As we all know, Java runs on a virtual machine: Java code is compiled into class files that run in the JVM. By default the JVM sets the heap size automatically based on the environment it sees, and one of the challenges of containerizing Java applications is getting the JVM to see the right amount of available memory so it doesn't get killed.

6.1. Adaptive without configuration

By default, without configuration, the ideal JVM should recognize the memory limit we impose on the container and automatically adjust the heap memory size; to verify which versions of OpenJDK can do this, I took some specific versions and ran the following tests:

  • Use `docker run -m 512m …` to limit the container memory to 512m (the actual host has 16G)
  • Use `java -XX:+PrintFlagsFinal -version | grep MaxHeapSize` to see the JVM's default maximum heap (I later found `-XshowSettings:vm` clearer; the concrete commands are shown after this list)
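Concretely, each round of testing boils down to commands like these (swap in the image tag under test):

# limit the container to 512m and print the computed max heap
docker run --rm -m 512m eclipse-temurin:11-jdk \
    sh -c 'java -XX:+PrintFlagsFinal -version | grep MaxHeapSize'

# the same information in a more readable form
docker run --rm -m 512m eclipse-temurin:11-jdk java -XshowSettings:vm -version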

6.1.1. OpenJDK 8u111

This version of OpenJDK has no containerization support at all, so in theory it cannot see the container's memory limit:

[screenshot: jvm.png]

You can see that the JVM does not recognize the limit and still sizes the heap at about 1/4 of the host's memory, so if the Java application's memory usage climbs, it may simply be killed.

6.1.2. OpenJDK 8u131

Version 8u131 is chosen because it added the -XX:+UseCGroupMemoryLimitForHeap parameter to support memory adaptation; let's first test without turning it on:

[screenshot: OpenJDK 8u131 MaxHeapSize output]

Again, the memory limit is not recognized by default.

6.1.3. OpenJDK 8u222

Version 8u191 backported the -XX:+UseContainerSupport parameter from OpenJDK 10 to support JVM containerization, but that exact version is not available for download, so we use the later 8u222, again without enabling the parameter:

[screenshot: OpenJDK 8u222 MaxHeapSize output]

The memory limit is again not recognized correctly.

6.1.4. OpenJDK 11.0.15

OpenJDK 11 has full containerization support, e.g. -XX:+UseContainerSupport is enabled by default, so we again test without changing any settings:

[screenshot: OpenJDK 11.0.15 MaxHeapSize output]

As you can see, even with the UseContainerSupport switch turned on by default, it still does not adapt to memory properly.

6.1.5. OpenJDK 11.0.16

Many people may wonder, why test 11.0.16 when you have tested 11.0.15? Because there is a strange difference between these two versions without settings:

[screenshot: OpenJDK 11.0.16 MaxHeapSize output]

As you can see, the 11.0.16 version automatically adapts to the container memory limit without any settings, changing the heap memory from nearly 4G to 120M.

6.1.6. OpenJDK 17

OpenJDK 17 is the latest LTS version; here we test its memory adaptation without adjusting any parameters:

[screenshot: OpenJDK 17 MaxHeapSize output]

We can see that OpenJDK 17 adapts to memory just like OpenJDK 11.0.16.

6.2. Adaptive with configuration

We ran the tests above without any configuration, and the results up to and including 11.0.15 were "puzzling": in theory 11+ has container support enabled automatically, yet memory adaptation still failed on some versions. That made me wonder about the actual effect of the other parameters, so I ran more tests, manually enabling each parameter according to the version that introduced it.

6.2.1. OpenJDK 8u131

8u131 is where containerization support officially started: this version added a JVM option telling the JVM to use the memory limit set by the cgroup. I tested with the -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap parameters added, and the result is that this option does not seem to work at all in my current environment:

[screenshot: OpenJDK 8u131 with UseCGroupMemoryLimitForHeap]
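For reference, the test command was along these lines (image tag assumed):

docker run --rm -m 512m openjdk:8u131 sh -c \
    'java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap \
          -XX:+PrintFlagsFinal -version | grep MaxHeapSize'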

6.2.2. OpenJDK 8u222

Starting with version 8u191, another parameter enabling containerization support was added, -XX:+UseContainerSupport, backported from OpenJDK 10; I tested with this parameter, and it still didn't work:

[screenshot: OpenJDK 8u222 with UseContainerSupport]

6.2.3. OpenJDK 11+

Starting from 11, -XX:+UseContainerSupport is enabled automatically and no special settings are needed, so the result is the same as the no-configuration test: **adaptive from 11.0.16 onwards; earlier versions (including 11.0.15) do not adapt.**

6.3. Analysis and summary

After the tests above, you will find that the parameters described in many articles and docs mysteriously do not work; this is mainly due to a very important change in containerization over the past two years: Cgroups V2. For space reasons I won't include all the test screenshots here; below are just the conclusions.

6.3.1. Cgroups V1

For containerized environments using Cgroups V1, the "old" rules still apply (on newer kernels, add the kernel boot parameter systemd.unified_cgroup_hierarchy=0 to fall back to Cgroups V1; a sketch of that fallback follows the list below):

    1. OpenJDK 8u131 and later support memory adaptation with the -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap parameters.
    2. OpenJDK 8u191 and later support memory adaptation with the -XX:+UseContainerSupport parameter.
    3. OpenJDK 11 and later enable -XX:+UseContainerSupport by default and support memory adaptation automatically.
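On a systemd-based host, that fallback goes through the kernel command line, roughly like this (a sketch only; back up your grub config first, and a reboot is required):

# append the parameter to the default kernel command line
sudo sed -i 's/^GRUB_CMDLINE_LINUX="/&systemd.unified_cgroup_hierarchy=0 /' /etc/default/grub
sudo update-grub    # RHEL family: grub2-mkconfig -o /boot/grub2/grub.cfg
sudo reboot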

6.3.2. Cgroups V2

On newer distributions (check your own system) with newer containerd and other container tooling, the default has switched to Cgroups V2. **Note that memory adaptation under Cgroups V2 is only supported in OpenJDK 11.0.16 and later; before that, turning on any parameter is useless.**
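You can check which cgroup version a host is on with one command:

stat -fc %T /sys/fs/cgroup/    # cgroup2fs = v2, tmpfs = v1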

Please see JDK-8230305 for more details on Cgroups V2 support.


VII. DNS Caching

In most Java programs we access services by domain name, whether API endpoints or databases, and wherever a domain name is used, DNS caching is involved. **Java's DNS cache is controlled by the JVM; don't take it for granted that the JVM's DNS cache behaves in a friendly way, as it can sometimes exceed expectations.** To test the DNS cache, I borrowed a test script that checks the DNS cache policy of three OpenJDK versions:

jvm-dns-ttl-policy.sh

#!/usr/bin/env bash

set -e

for tag in 8-jdk 11-jdk 17-jdk; do

    tag_name="jvm-dns-ttl-policy"
    output_file="$(mktemp)"

    jvm_args=""
    if ! [ "${tag}" == "8-jdk" ]; then
        jvm_args="--add-exports java.base/sun.net=ALL-UNNAMED"
    fi

    ttl=""
    if ! [ "${1}" == "" ]; then
        ttl="-Dsun.net.inetaddr.ttl=${1}"
    fi

    dockerfile="
FROM        eclipse-temurin:${tag}
WORKDIR     /var/tmp
RUN         printf ' \\
              public class DNSTTLPolicy { \\
                public static void main(String args[]) { \\
                  System.out.printf(\"Implementation DNS TTL for JVM in Docker image based on 'eclipse-temurin:${tag}' is %%d seconds\\\\n\", sun.net.InetAddressCachePolicy.get()); \\
                } \\
              }' >DNSTTLPolicy.java
RUN         javac ${jvm_args} DNSTTLPolicy.java -XDignore.symbol.file
CMD         java ${jvm_args} ${ttl} DNSTTLPolicy
ENTRYPOINT  java ${jvm_args} ${ttl} DNSTTLPolicy
"

    dockerfile_security_manager="
FROM        eclipse-temurin:${tag}
WORKDIR     /var/tmp
RUN         printf ' \\
              public class DNSTTLPolicy { \\
                public static void main(String args[]) { \\
                  System.out.printf(\"Implementation DNS TTL for JVM in Docker image based on 'eclipse-temurin:${tag}' (with security manager enabled) is %%d seconds\\\\n\", sun.net.InetAddressCachePolicy.get()); \\
                } \\
              }' >DNSTTLPolicy.java
RUN         printf ' \\
              grant { \\
                permission java.security.AllPermission; \\
              };' >all-permissions.policy
RUN         javac ${jvm_args} DNSTTLPolicy.java -XDignore.symbol.file
CMD         java ${jvm_args} ${ttl} -Djava.security.manager -Djava.security.policy==all-permissions.policy DNSTTLPolicy
ENTRYPOINT  java ${jvm_args} ${ttl} -Djava.security.manager -Djava.security.policy==all-permissions.policy DNSTTLPolicy
"

    echo "Building Docker image based on eclipse-temurin:${tag} ..." >&2
    docker build -t "${tag_name}" - <<<"${dockerfile}" 2>&1 > /dev/null
    docker run --rm "${tag_name}" &>"${output_file}"
    cat "${output_file}"
    docker build -t "${tag_name}" - <<<"${dockerfile_security_manager}" 2>&1 > /dev/null
    docker run --rm "${tag_name}" &>"${output_file}"
    cat "${output_file}"
    echo ""

done
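The script takes an optional TTL as its first argument, so the two rounds of testing below are simply:

bash jvm-dns-ttl-policy.sh        # default policy (section 7.1)
bash jvm-dns-ttl-policy.sh 30     # with -Dsun.net.inetaddr.ttl=30 (section 7.2)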

7.1. Default DNS caching

The default DNS cache results without any settings are as follows (just run the script directly):

[screenshot: default DNS TTL results]

As you can see, the DNS TTL is 30s by default, but with the Security Manager enabled it becomes -1; here is what -1 means (taken from the OpenJDK 11 source code):

/* The Java-level namelookup cache policy for successful lookups:
 *
 * -1: caching forever
 * any positive value: the number of seconds to cache an address for
 *
 * default value is forever (FOREVER), as we let the platform do the
 * caching. For security reasons, this caching is made forever when
 * a security manager is set.
 */
private static volatile int cachePolicy = FOREVER;

/* The Java-level namelookup cache policy for negative lookups:
 *
 * -1: caching forever
 * any positive value: the number of seconds to cache an address for
 *
 * default value is 0. It can be set to some other value for
 * performance reasons.
 */
private static volatile int negativeCachePolicy = NEVER;

7.2. Setting up DNS caching

To avoid this weird DNS caching policy issue, it is best to manually set the DNS cache time at startup by adding the -Dsun.net.inetaddr.ttl=xxx parameter:

[screenshot: DNS TTL results with -Dsun.net.inetaddr.ttl set]

**As you can see, once we set the DNS cache manually, the JVM follows our setting regardless of whether the Security Manager is on.** For more detailed debugging of DNS caching, Alibaba's open source DCM tool is recommended.
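Besides the startup flag, the JVM also honors the networkaddress.cache.ttl property in the java.security file, which pins the TTL for every process using that JDK (the path below is for JDK 9+; on JDK 8 the file lives under jre/lib/security, and app.jar is a placeholder):

# per process, at startup
java -Dsun.net.inetaddr.ttl=30 -jar app.jar

# JVM-wide, via the security properties file
echo 'networkaddress.cache.ttl=30' >> "$JAVA_HOME/conf/security/java.security"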

VIII. Native Compilation

Native compilation means that Java code is compiled by GraalVM into a binary the platform can execute directly, and the compiled executable starts much faster. **But GraalVM requires code-level adjustments, framework upgrades, and other work, which is generally demanding; however, if you are starting a new project, it is best to build in GraalVM native compilation support during development, which is a huge boost to startup speed.**

The project used above for graceful-shutdown testing already has GraalVM support built in; just download GraalVM, set the JAVA_HOME and PATH variables, and compile with mvn clean package -Dmaven.test.skip=true -Pnative.
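In practice the setup is just a few lines (the GraalVM path and binary name here are assumptions based on the project name):

export JAVA_HOME=/opt/graalvm-ce-java11    # wherever GraalVM was unpacked
export PATH="$JAVA_HOME/bin:$PATH"

mvn clean package -Dmaven.test.skip=true -Pnative

# the native binary lands in target/ and runs directly
./target/SpringBootGracefulShutdownExample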

[screenshot: native compilation output]

After a successful build, a directly executable binary appears in the target directory; below is a comparison of startup speed:

[screenshot: startup speed comparison]

However, this approach is on the whole not yet particularly mature, and the domestic Java ecosystem is still dominated by OpenJDK 8, so old projects need real adjustment to satisfy GraalVM. So the conclusion: support it in new projects as much as possible, and don't torture old projects with it.