Thread-Affinity for CPU-bound threads

Introduction

Modern computer systems can have multiple CPUs, and each CPU can have multiple cores. To take full advantage of this hardware, Java supports multithreading, so that different threads can run on different CPUs or CPU cores at the same time. A Java programmer can control how many threads are created, but it is generally difficult to know which CPU a given thread is running on.

When the operating system schedules the same thread on different CPU cores over time, there may be a performance loss due to CPU switching, for example because the thread loses the data it had warmed up in the previous core's caches. Normally this loss is relatively small, but if your application is particularly sensitive to it, you can try Java Thread Affinity.

Introduction to Java Thread Affinity

Java Thread Affinity is used to bind threads in Java code to specific CPU cores in order to improve the performance of your program.

To interact with the underlying CPU, Java Thread Affinity has to call native methods. JNI is the official mechanism for calling native code from Java, but it is cumbersome to use. Java Thread Affinity therefore uses JNA, a library built on top of JNI that makes calling native methods easier.

First, let's introduce a few CPU-related concepts: CPU, CPU socket, and CPU core.

CPU stands for central processing unit. It is the key component used for task processing.

So what is a CPU socket? A socket is the slot into which a CPU is inserted. If you have ever assembled a desktop computer, you will know that the CPU is installed in a socket.

A CPU core is a processing unit inside a CPU. A long time ago, CPUs were single-core, but with the development of multi-core technology a CPU can contain multiple cores, and the cores are the units that actually do the processing.

On a Linux machine, you can check the CPU configuration of your system with the lscpu command:

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 1
On-line CPU(s) list: 0
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 94
Model name: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
Stepping: 3
CPU MHz: 2400.000
BogoMIPS: 4800.00
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 4096K
L3 cache: 28160K
NUMA node0 CPU(s): 0
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat

From the above output we can see that this server has one socket, one core per socket, and each core can handle 1 thread at a time.

This CPU information is called the CPU layout. On Linux, the CPU layout information is available in /proc/cpuinfo.

In Java Thread Affinity, there is a CpuLayout interface that corresponds to this information.

public interface CpuLayout {

    int cpus();

    int sockets();

    int coresPerSocket();

    int threadsPerCore();

    int socketId(int cpuId);

    int coreId(int cpuId);

    int threadId(int cpuId);
}
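
To make the meaning of these methods concrete, here is a minimal sketch of a layout for a perfectly regular machine. It is not the library's implementation: the class name RegularCpuLayout is hypothetical, it only mirrors the method names of CpuLayout without depending on the library, and it assumes that CPU ids are numbered socket by socket, then core by core, then thread by thread. Real machines often number their CPUs differently, which is why the real layout is read from the operating system.

```java
/**
 * Hypothetical, self-contained sketch that mirrors the CpuLayout methods.
 * Assumption: cpuId = (socketId * coresPerSocket + coreId) * threadsPerCore + threadId.
 */
public class RegularCpuLayout {
    private final int sockets;
    private final int coresPerSocket;
    private final int threadsPerCore;

    public RegularCpuLayout(int sockets, int coresPerSocket, int threadsPerCore) {
        this.sockets = sockets;
        this.coresPerSocket = coresPerSocket;
        this.threadsPerCore = threadsPerCore;
    }

    /** Total number of logical CPUs. */
    public int cpus() {
        return sockets * coresPerSocket * threadsPerCore;
    }

    public int sockets() {
        return sockets;
    }

    public int coresPerSocket() {
        return coresPerSocket;
    }

    public int threadsPerCore() {
        return threadsPerCore;
    }

    /** Socket that the logical CPU belongs to. */
    public int socketId(int cpuId) {
        return cpuId / (coresPerSocket * threadsPerCore);
    }

    /** Core index within its socket. */
    public int coreId(int cpuId) {
        return (cpuId / threadsPerCore) % coresPerSocket;
    }

    /** Hardware thread index within its core. */
    public int threadId(int cpuId) {
        return cpuId % threadsPerCore;
    }

    public static void main(String[] args) {
        // 2 sockets, 4 cores per socket, 2 threads per core = 16 logical CPUs.
        RegularCpuLayout layout = new RegularCpuLayout(2, 4, 2);
        for (int cpu = 0; cpu < layout.cpus(); cpu++) {
            System.out.println("cpu " + cpu
                    + ": socket=" + layout.socketId(cpu)
                    + " core=" + layout.coreId(cpu)
                    + " thread=" + layout.threadId(cpu));
        }
    }
}
```

With this numbering, CPU 11 on the 2 x 4 x 2 machine falls on socket 1, core 1, thread 1.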

Based on the CPU layout information, AffinityStrategies provides some basic affinity policies that describe where a thread should run relative to another thread. The main ones are the following.

    SAME_CORE - runs on the same core.
    SAME_SOCKET - runs on the same socket, but not on the same core.
    DIFFERENT_SOCKET - runs on a different socket.
    DIFFERENT_CORE - runs on a different core.
    ANY - any CPU is fine.

These policies are distinguished by the socketId and coreId from the CpuLayout. Let's take SAME_CORE as an example and look at its implementation.

SAME_CORE {
    @Override
    public boolean matches(int cpuId, int cpuId2) {
        // Two CPUs are on the same core when both the socket id and the core id agree.
        CpuLayout cpuLayout = AffinityLock.cpuLayout();
        return cpuLayout.socketId(cpuId) == cpuLayout.socketId(cpuId2) &&
                cpuLayout.coreId(cpuId) == cpuLayout.coreId(cpuId2);
    }
}

Affinity policies can be given in order. The first policy is tried first; if no CPU matches it, the second policy is tried, and so on.
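
The fallback order can be sketched without the library as follows. Everything in this example is a stand-in: the Strategy enum, the pickCpu helper and the fixed 2 sockets x 2 cores x 2 threads layout are hypothetical and only illustrate the idea of trying each policy in turn. The library's own selection logic may differ in its details.

```java
import java.util.Arrays;
import java.util.List;

public class StrategyFallbackDemo {

    // Assumed layout: CPUs 0-3 on socket 0, CPUs 4-7 on socket 1, two threads per core.
    static final int CORES_PER_SOCKET = 2;
    static final int THREADS_PER_CORE = 2;

    static int socketId(int cpuId) {
        return cpuId / (CORES_PER_SOCKET * THREADS_PER_CORE);
    }

    static int coreId(int cpuId) {
        return (cpuId / THREADS_PER_CORE) % CORES_PER_SOCKET;
    }

    /** Hypothetical stand-in for three of the policies; not the library's enum. */
    enum Strategy {
        SAME_CORE {
            @Override
            boolean matches(int cpuId, int cpuId2) {
                return socketId(cpuId) == socketId(cpuId2) && coreId(cpuId) == coreId(cpuId2);
            }
        },
        DIFFERENT_SOCKET {
            @Override
            boolean matches(int cpuId, int cpuId2) {
                return socketId(cpuId) != socketId(cpuId2);
            }
        },
        ANY {
            @Override
            boolean matches(int cpuId, int cpuId2) {
                return true;
            }
        };

        abstract boolean matches(int cpuId, int cpuId2);
    }

    /**
     * Tries the strategies in the order given and returns the first free CPU
     * that satisfies the earliest matching strategy, or -1 if none does.
     */
    static int pickCpu(int referenceCpu, List<Integer> freeCpus, Strategy... strategies) {
        for (Strategy strategy : strategies) {
            for (int cpu : freeCpus) {
                if (strategy.matches(referenceCpu, cpu)) {
                    return cpu;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // CPU 1 shares a core with CPU 0, so SAME_CORE succeeds immediately.
        System.out.println(pickCpu(0, Arrays.asList(1, 2, 5),
                Strategy.SAME_CORE, Strategy.DIFFERENT_SOCKET, Strategy.ANY)); // 1
        // CPU 1 is taken: SAME_CORE fails, DIFFERENT_SOCKET picks CPU 5.
        System.out.println(pickCpu(0, Arrays.asList(2, 3, 5),
                Strategy.SAME_CORE, Strategy.DIFFERENT_SOCKET, Strategy.ANY)); // 5
        // Only socket-0 CPUs on another core are free: ANY is the last resort.
        System.out.println(pickCpu(0, Arrays.asList(2, 3),
                Strategy.SAME_CORE, Strategy.DIFFERENT_SOCKET, Strategy.ANY)); // 2
    }
}
```

If the final ANY is left out of the last call, no policy matches and the helper returns -1, which corresponds to a thread that ends up without a CPU.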

Use of AffinityLock

Next, let's look at how Affinity is used in practice. The first step is to acquire a CPU lock. Before Java 7, we can write it like this.

AffinityLock al = AffinityLock.acquireLock();
try {
     // do some work locked to a CPU.
} finally {
     al.release();
}

From Java 7 onward, it can be written with try-with-resources like this.

try (AffinityLock al = AffinityLock.acquireLock()) {
    // do some work while locked to a CPU.
}

The acquireLock method acquires any available CPU for the thread. If you want to reserve a whole core rather than a single CPU, you can use acquireCore:

try (AffinityLock al = AffinityLock.acquireCore()) {
    // do some work while locked to a CPU.
}

acquireLock also has a bind parameter that indicates whether to bind the current thread to the acquired CPU lock. If bind is true, the current thread will run on the CPU acquired in acquireLock. If bind is false, the CPU is only reserved, and a thread can be bound to it at some point in the future.

Above we mentioned the AffinityStrategy, which can be used as an argument to acquireLock:

    public AffinityLock acquireLock(AffinityStrategy... strategies) {
        return acquireLock(false, cpuId, strategies);
    }

By calling the acquireLock method on an existing AffinityLock, you obtain another AffinityLock whose CPU is selected relative to the existing lock's CPU according to the given strategies.

AffinityLock also provides a dumpLocks method to view the current CPU and thread binding status. Let’s take an example.

private static final ExecutorService ES = Executors.newFixedThreadPool(4,
        new AffinityThreadFactory("bg", SAME_CORE, DIFFERENT_SOCKET, ANY));

// Submit 12 short tasks to the 4-thread pool.
for (int i = 0; i < 12; i++) {
    ES.submit(new Callable<Void>() {
        @Override
        public Void call() throws InterruptedException {
            Thread.sleep(100);
            return null;
        }
    });
}
// Wait a little so the pool threads are running before dumping the assignment.
Thread.sleep(200);
System.out.println("\nThe assignment of CPUs is\n" + AffinityLock.dumpLocks());
ES.shutdown();
ES.awaitTermination(1, TimeUnit.SECONDS);

In the above code, we create a thread pool of 4 threads whose ThreadFactory is an AffinityThreadFactory. The threads are named with the prefix bg, and 3 AffinityStrategy values are assigned. The idea is to first try the same core, then a different socket, and finally any available CPU.

During execution we submit 12 tasks, but our thread pool has at most 4 threads, so we can expect only 4 threads to be bound to CPUs in the result returned by the AffinityLock.dumpLocks method.

The assignment of CPUs is
0: CPU not available
1: Reserved for this application
2: Reserved for this application
3: Reserved for this application
4: Thread[bg-4,5,main] alive=true
5: Thread[bg-3,5,main] alive=true
6: Thread[bg-2,5,main] alive=true
7: Thread[bg,5,main] alive=true

As you can see from the output, CPU0 is not available. The other 7 CPUs are available, but only 4 threads are bound, which matches our previous analysis.
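
The reason only 4 threads show up is a property of the fixed thread pool, not of the affinity library, and it can be checked with the JDK alone. The sketch below uses a hypothetical CountingThreadFactory that does no CPU binding at all; it only counts how many threads the pool asks for and names them in the same bg, bg-2, bg-3 style seen in the output above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolSizeDemo {

    /** Plain factory that counts and names threads; it performs no CPU binding. */
    static class CountingThreadFactory implements ThreadFactory {
        private final String name;
        private final AtomicInteger created = new AtomicInteger();

        CountingThreadFactory(String name) {
            this.name = name;
        }

        @Override
        public Thread newThread(Runnable r) {
            int id = created.incrementAndGet();
            // First thread is "bg", later ones are "bg-2", "bg-3", ...
            return new Thread(r, id <= 1 ? name : name + "-" + id);
        }

        int createdCount() {
            return created.get();
        }
    }

    /** Submits the given number of tasks and reports how many threads were created. */
    static int threadsCreatedFor(int poolSize, int tasks) {
        CountingThreadFactory factory = new CountingThreadFactory("bg");
        ExecutorService es = Executors.newFixedThreadPool(poolSize, factory);
        for (int i = 0; i < tasks; i++) {
            es.submit(() -> {
                Thread.sleep(10);
                return null;
            });
        }
        es.shutdown();
        try {
            es.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the pool", e);
        }
        return factory.createdCount();
    }

    public static void main(String[] args) {
        // 12 tasks share the 4 pool threads, so the factory is called 4 times.
        System.out.println(threadsCreatedFor(4, 12)); // 4
    }
}
```

Because the thread factory is only invoked once per pool thread, a factory that binds each new thread to a CPU can bind at most 4 threads here, however many tasks are submitted.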

Next, let’s modify the AffinityStrategy of AffinityThreadFactory as follows.

new AffinityThreadFactory("bg", SAME_CORE)

This means that threads will only be bound to the same core. On the current hardware a core supports only one bound thread at a time, so we can expect only one thread to be bound in the end result, which looks as follows.

The assignment of CPUs is
0: CPU not available
1: Reserved for this application
2: Reserved for this application
3: Reserved for this application
4: Reserved for this application
5: Reserved for this application
6: Reserved for this application
7: Thread[bg,5,main] alive=true

You can see that only the first thread is bound to the CPU, which matches the previous analysis.

Use the API to allocate CPUs directly

The acquireLock method of AffinityLock that we mentioned above also accepts a CPU id parameter, which can be used to acquire the lock for the given CPU id directly, so that the thread then runs on the specified CPU.

    public static AffinityLock acquireLock(int cpuId) {
        return acquireLock(true, cpuId, AffinityStrategies.ANY);
    }

Internally, the affinity is stored in a BitSet. Each index of the BitSet is a CPU id, and the corresponding bit indicates whether that CPU is part of the affinity.

First look at the definition of the setAffinity method.

    public static void setAffinity(int cpu) {
        BitSet affinity = new BitSet(Runtime.getRuntime().availableProcessors());
        affinity.set(cpu);
        setAffinity(affinity);
    }

Look again at the use of setAffinity.

long currentAffinity = AffinitySupport.getAffinity();
Affinity.setAffinity(1L << 5); // lock to CPU 5.

Note that the affinity here is passed as a long bit mask in which bit n stands for CPU n. The argument is therefore not the decimal CPU index itself; the index has to be converted to a mask by shifting, so CPU 5 becomes 1L << 5.
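
The relationship between a CPU index, a long mask and a BitSet can be checked with the JDK alone. This is a minimal sketch: the class and helper names (AffinityMaskDemo, maskForCpu, toBitSet, toMask) are hypothetical and not part of the library, and the long form can only describe CPUs 0 to 63.

```java
import java.util.BitSet;

public class AffinityMaskDemo {

    /** Converts a CPU index into a single-bit mask, e.g. CPU 5 becomes 32. */
    static long maskForCpu(int cpu) {
        if (cpu < 0 || cpu >= Long.SIZE) {
            throw new IllegalArgumentException("a long mask only covers CPUs 0 to 63: " + cpu);
        }
        return 1L << cpu;
    }

    /** Expands a long mask into a BitSet whose set bits are the CPU ids. */
    static BitSet toBitSet(long mask) {
        return BitSet.valueOf(new long[] {mask});
    }

    /** Collapses a BitSet into a long mask; CPU ids of 64 and above are ignored. */
    static long toMask(BitSet cpus) {
        long[] words = cpus.toLongArray();
        return words.length == 0 ? 0L : words[0];
    }

    public static void main(String[] args) {
        long mask = maskForCpu(5);
        System.out.println(mask + " -> " + toBitSet(mask)); // 32 -> {5}

        // A mask may also allow several CPUs at once, here CPU 1 and CPU 5.
        BitSet cpus = new BitSet();
        cpus.set(1);
        cpus.set(5);
        System.out.println(Long.toBinaryString(toMask(cpus))); // 100010
    }
}
```

Passing 5 instead of 1L << 5 as a mask would mean binary 101, that is CPUs 0 and 2, which is why the shift matters.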

Summary

Java Thread Affinity lets you control, from Java code, which CPU a thread in your program runs on. It is a powerful tool and is worth trying when the cost of CPU switching matters to your application.