Houston, We Have a Problem - OpenJDK is on 26 Now!

Oracle JDK vs. OpenJDK#

Both are Java Development Kits, but because of the differences outlined below, this post covers new features of OpenJDK only, not of Oracle JDK.

Historically, Oracle JDK was seen as the “premium” option, sometimes with a slight performance edge and proprietary features like Java Flight Recorder (JFR) and Java Mission Control (JMC). However, since Java 11, Oracle has open-sourced these features and contributed them back to the OpenJDK project. As a result, the codebases have converged to the point where they are functionally identical for the vast majority of use cases.

A critical point to remember is that Oracle’s builds of the JDK are based on the OpenJDK source code. Think of OpenJDK as the “reference implementation” maintained by a community that includes Oracle engineers, Red Hat, IBM, and others. Oracle then takes that open source codebase, adds its own branding, packaging, and commercial support, and distributes it as Oracle JDK.

Oracle JDK offers professional, 24/7 commercial support and long-term stability with a fixed quarterly update schedule. An enterprise that needs guaranteed, timely patches and a service level agreement (SLA) would choose Oracle’s paid subscription. OpenJDK, on the other hand, has community-driven support. While security vulnerabilities and bugs are promptly fixed by the community, there is no official, central support system. However, many vendors (like Red Hat, Azul, and Amazon) offer their own builds of OpenJDK with paid support contracts, providing a flexible alternative to Oracle’s offering at a potentially lower cost.

Hence the content of this post is centered around the OpenJDK release page.

Java Learning Resources

Effective Java: “Bible” of the language

Inside the Java Virtual Machine

Java Performance: The Definitive Guide

Java Generics FAQs

Defensive Copy

VisualVM

jConsole

Java Decompiler

Querydsl

Diagnostic Tools

The jps Utility

The jstat Utility

The jinfo Utility

The jmap Utility

The jhat Utility

The jstack Utility

Memory#

The Java Virtual Machine defines various run-time data areas that are used during execution of a program. Some of these data areas are created on Java Virtual Machine start-up and are destroyed only when the Java Virtual Machine terminates. Other data areas are per thread. Per-thread data areas are created when a thread is created and destroyed when the thread terminates. Those run-time data areas are:

The pc Register
Java Virtual Machine Stacks
Heap
Method Area
Run-Time Constant Pool
Native Method Stacks

The pc Register#

The pc (program counter) register is one of the most fundamental memory areas and operates on a strict per-thread basis.

The Java Virtual Machine can support many threads of execution at once. Each Java Virtual Machine thread has its own pc (program counter) register. At any point, each Java Virtual Machine thread is executing the code of a single method, namely the current method for that thread. When a new thread is created, it is immediately allocated its own pc register. The primary responsibility of this register is to act as a “bookmark”, keeping track of exactly where the thread currently is in its execution sequence.

If that method is not native, the pc register contains the address of the Java Virtual Machine instruction currently being executed.
If the method currently being executed by the thread is native (meaning code written in another language like C or C++ that interacts directly with the host operating system), the JVM steps back, i.e. the value of the Java Virtual Machine’s pc register is undefined because the underlying native platform’s program counter takes over the tracking responsibility.

Here is a realistic illustration of how the JVM organizes the pc register across different threads as part of the Run-Time Data Areas.

This diagram demonstrates a few key aspects of the JVM specification:

Isolation: Thread 1 and Thread 2 are entirely separate containers with their own registers and stacks.

Tracking Java Code: For Thread 1, the PC Register points to a specific bytecode address (invokevirtual in the calculateValue method).

Handling Native Code: For Thread 2, which is executing code outside the JVM (a native method), the specification states that the value of the JVM pc register is undefined.

Java Virtual Machine Stacks#

Each Java Virtual Machine thread has a private Java Virtual Machine stack, created at the same time as the thread. JVM Stacks are private to each thread and store frames, which are created with every method invocation and destroyed upon method completion. Each frame is the storage location for local variables (like integers or object references), an operand stack (for expressions), and other data required for method execution. Think of the stack as a record of method calls, and the frames as the internal state of each of those active calls.
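The frame-per-invocation idea can be observed from inside a program. The following is a minimal sketch, assuming Java 9 or later for StackWalker; the class FrameDemo and its three methods are hypothetical names chosen only for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

class FrameDemo {

    // Each call below pushes one more frame onto the current thread's JVM stack.
    static List<String> outer() {
        return middle();
    }

    static List<String> middle() {
        return inner();
    }

    static List<String> inner() {
        // Walk the current thread's stack, innermost frame first,
        // and keep the method names of the three most recent frames.
        return StackWalker.getInstance().walk(frames -> frames
                .limit(3)
                .map(StackWalker.StackFrame::getMethodName)
                .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        System.out.println(outer()); // prints [inner, middle, outer]
    }
}
```

Once inner returns, its frame is popped and middle becomes the current frame again, which is why the list is ordered from the most recent call outward.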

Heap#

The Java Virtual Machine has a heap that is shared among all Java Virtual Machine threads. The heap is the run-time data area from which memory for all class instances and arrays is allocated.

The heap is created on virtual machine start-up. Heap storage for objects is reclaimed by an automatic storage management system (known as a garbage collector); objects are never explicitly deallocated. The Java Virtual Machine assumes no particular type of automatic storage management system, and the storage management technique may be chosen according to the implementor’s system requirements.
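Heap usage can be inspected at run time through the standard management API. This is a small sketch rather than a tuning recipe; HeapDemo and heapUsage are hypothetical names, and the exact numbers printed depend on the JVM, its settings, and when the garbage collector happens to run.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

class HeapDemo {

    // Snapshot of the heap shared by all threads of this JVM.
    static MemoryUsage heapUsage() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    public static void main(String[] args) {
        MemoryUsage before = heapUsage();

        // Arrays and class instances are allocated from the heap.
        byte[][] blocks = new byte[16][];
        for (int i = 0; i < blocks.length; i++) {
            blocks[i] = new byte[1024 * 1024];
        }

        MemoryUsage after = heapUsage();
        System.out.println("used before: " + before.getUsed() + " bytes");
        System.out.println("used after:  " + after.getUsed() + " bytes");
        // getMax() may return -1 when no maximum is defined.
        System.out.println("max:         " + after.getMax() + " bytes");
        System.out.println("blocks kept: " + blocks.length);
    }
}
```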

Method Area#

The Java Virtual Machine has a method area that is shared among all Java Virtual Machine threads. The method area is analogous to the storage area for compiled code of a conventional language or analogous to the “text” segment in an operating system process. It stores per-class structures such as the run-time constant pool, field and method data, and the code for methods and constructors, including the special methods used in class and interface initialization and in instance initialization.

Run-Time Constant Pool#

A run-time constant pool is a per-class or per-interface run-time representation of the constant_pool table in a class file. It contains several kinds of constants, ranging from numeric literals known at compile-time to method and field references that must be resolved at run-time. The run-time constant pool serves a function similar to that of a symbol table for a conventional programming language, although it contains a wider range of data than a typical symbol table.
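String literals are one kind of constant that lives in the constant pool, and their behavior is easy to observe. The sketch below (ConstantPoolDemo is a hypothetical name) relies only on the language guarantee that string literals and compile-time constant expressions refer to the same interned String instance.

```java
class ConstantPoolDemo {

    // Two identical literals resolve to the same interned String object.
    static boolean sameLiteral() {
        String a = "hello";
        String b = "hello";
        return a == b; // true
    }

    // "hel" + "lo" is a compile-time constant, so it is also the literal "hello".
    static boolean compileTimeConcat() {
        return ("hel" + "lo") == "hello"; // true
    }

    // new String(...) always creates a fresh object on the heap.
    static boolean freshObject() {
        return new String("hello") == "hello"; // false
    }

    // intern() returns the canonical instance shared with the literal.
    static boolean interned() {
        return new String("hello").intern() == "hello"; // true
    }

    public static void main(String[] args) {
        System.out.println(sameLiteral());
        System.out.println(compileTimeConcat());
        System.out.println(freshObject());
        System.out.println(interned());
    }
}
```

Reference comparison with == is used here on purpose to expose object identity; application code should compare strings with equals.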

Native Method Stacks#

An implementation of the Java Virtual Machine may use conventional stacks, colloquially called “C stacks,” to support native methods (methods written in a language other than the Java programming language). Native method stacks may also be used by the implementation of an interpreter for the Java Virtual Machine’s instruction set in a language such as C. Java Virtual Machine implementations that cannot load native methods and that do not themselves rely on conventional stacks need not supply native method stacks. If supplied, native method stacks are typically allocated per thread when each thread is created.

JDK 9#

Serialization Filter Configuration#

JEP 290: Filter Incoming Serialization Data
“Allow incoming streams of object-serialization data to be filtered in order to improve both security and robustness.”

JEP 290 introduced serialization filtering to Java, allowing control over which classes can be deserialized from an ObjectInputStream. As an example, suppose we have 2 serializable classes: AllowedClass and RestrictedClass:

package com.example.allowed;

import java.io.Serial;
import java.io.Serializable;

public class AllowedClass implements Serializable {
    @Serial private static final long serialVersionUID = 1L;
    public String message;

    public AllowedClass(String message) {
        this.message = message;
    }

    @Override
    public String toString() {
        return "AllowedClass: " + message;
    }
}

package com.example.restricted;

import java.io.Serial;
import java.io.Serializable;

public class RestrictedClass implements Serializable {
    @Serial private static final long serialVersionUID = 1L;
    public String secret;

    public RestrictedClass(String secret) {
        this.secret = secret;
    }

    @Override
    public String toString() {
        return "RestrictedClass: " + secret;
    }
}

To make sure AllowedClass can be deserialized while RestrictedClass cannot, we can use either pattern-based filters defined via system properties or security properties, or programmatically using the ObjectInputFilter API.

Pattern-Based Filter (System Property Example)#

This example demonstrates how to set a global filter using the jdk.serialFilter system property to allow only classes within com.example.allowed and reject all others.

If the JVM is started with -Djdk.serialFilter="com.example.allowed.*;!*", the following code round-trips AllowedClass without error, while deserializing RestrictedClass fails with an InvalidClassException (wrapped here in an IllegalStateException):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public static void serialize(Serializable obj, Class<? extends Serializable> clazz) {
    try (ByteArrayOutputStream bos = new ByteArrayOutputStream(); ObjectOutputStream oos = new ObjectOutputStream(bos)) {
        oos.writeObject(obj);

        try (ByteArrayInputStream bis = new ByteArrayInputStream(bos.toByteArray());
             ObjectInputStream ois = new ObjectInputStream(bis)) {
            Serializable result = clazz.cast(ois.readObject());
        }
    } catch (IOException | ClassNotFoundException exception) {
        throw new IllegalStateException(exception);
    }
}

AllowedClass allowed = new AllowedClass("Hello Allowed!");
RestrictedClass restricted = new RestrictedClass("Secret Data!");

serialize(allowed, AllowedClass.class);       // ✅
serialize(restricted, RestrictedClass.class); // ❌ runtime error

Programmatic Filter (ObjectInputFilter API)#

This example shows how to set a filter directly on an ObjectInputStream using the ObjectInputFilter interface.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public static void serialize(Serializable obj, Class<? extends Serializable> clazz) {
    try (ByteArrayOutputStream bos = new ByteArrayOutputStream(); ObjectOutputStream oos = new ObjectOutputStream(bos)) {
        oos.writeObject(obj);

        try (ByteArrayInputStream bis = new ByteArrayInputStream(bos.toByteArray());
             ObjectInputStream ois = new ObjectInputStream(bis)) {

            ois.setObjectInputFilter(info -> {
                // The trailing dot keeps look-alike packages such as
                // com.example.allowedextras from slipping through the prefix check.
                if (info.serialClass() != null && info.serialClass().getName().startsWith("com.example.allowed.")) {
                    return ObjectInputFilter.Status.ALLOWED;
                }
                return ObjectInputFilter.Status.REJECTED;
            });

            Serializable result = clazz.cast(ois.readObject());
        }
    } catch (IOException | ClassNotFoundException exception) {
        throw new IllegalStateException(exception);
    }
}

AllowedClass allowed = new AllowedClass("Hello Allowed!");
RestrictedClass restricted = new RestrictedClass("Secret Data!");

serialize(allowed, AllowedClass.class);       // ✅
serialize(restricted, RestrictedClass.class); // ❌ runtime error

In this example, the setObjectInputFilter method is used on the ObjectInputStream to apply a lambda expression as the filter. This filter explicitly allows classes starting with com.example.allowed and rejects all others.
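The two approaches can also be combined: the pattern syntax used by jdk.serialFilter can be compiled into a filter object with ObjectInputFilter.Config.createFilter and attached to a single stream. The sketch below is self-contained, so it uses two hypothetical nested classes (Allowed and Restricted) instead of the packages above, and builds the pattern from the class name rather than hard-coding it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class PatternFilterDemo {

    static class Allowed implements Serializable {
        private static final long serialVersionUID = 1L;
        int value = 42;
    }

    static class Restricted implements Serializable {
        private static final long serialVersionUID = 1L;
        int value = 7;
    }

    // Serializes obj to memory, then reads it back through a pattern-based filter.
    static Object roundTrip(Serializable obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        }
        // Pattern: allow exactly the Allowed class, reject every other class.
        ObjectInputFilter filter =
                ObjectInputFilter.Config.createFilter(Allowed.class.getName() + ";!*");
        try (ObjectInputStream ois =
                     new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            ois.setObjectInputFilter(filter); // must be set before the first read
            return ois.readObject();
        }
    }

    // True when the filter refused to deserialize the object.
    static boolean isRejected(Serializable obj) {
        try {
            roundTrip(obj);
            return false;
        } catch (InvalidClassException rejected) {
            return true;
        } catch (IOException | ClassNotFoundException other) {
            throw new IllegalStateException(other);
        }
    }

    public static void main(String[] args) {
        System.out.println(isRejected(new Allowed()));    // false
        System.out.println(isRejected(new Restricted())); // true
    }
}
```

A per-stream pattern filter keeps the rule next to the code that reads the data, while the system property applies one rule to the whole process.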

Compact Strings#

JEP 254: Compact Strings
“Adopt a more space-efficient internal representation for strings.”

Before compact strings, Java’s String class stored characters in a char[] array, where each char is 2 bytes (UTF-16 encoding). This meant that even strings containing only ASCII characters (which could be represented by 1 byte) would still consume 2 bytes per character.

With JEP 254, the internal representation of String changed from a UTF-16 char[] array to a byte[] array plus an encoding-flag field. If the string contains only Latin-1 characters (code points 0 to 255, each fitting in 1 byte), it is stored with 1 byte per character. If it contains any character outside Latin-1, the whole string is stored as UTF-16 with 2 bytes per character.

The compact string optimization is transparent to the developer. We continue to use the String class as before. The JVM automatically decides the most memory-efficient internal representation based on the characters present in the string. This significantly reduces the memory footprint of applications that deal extensively with ASCII or Latin-1 based strings, which is a very common scenario.
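The internal representation cannot be inspected through the public String API, but the rule described above is simple enough to model. The following sketch is an illustration of that rule only, not the JDK's internal code; CompactStringSketch, fitsLatin1, and payloadBytes are hypothetical names, and the estimate ignores object headers and array overhead.

```java
class CompactStringSketch {

    // True when every char is in the Latin-1 range (0 to 255).
    static boolean fitsLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Estimated bytes for the character data alone:
    // 1 byte per char for Latin-1 content, 2 bytes per char otherwise.
    static int payloadBytes(String s) {
        return fitsLatin1(s) ? s.length() : 2 * s.length();
    }

    public static void main(String[] args) {
        System.out.println(payloadBytes("hello"));       // 5
        System.out.println(payloadBytes("h\u00e9llo"));  // 5, U+00E9 is Latin-1
        System.out.println(payloadBytes("hello\u20ac")); // 12, U+20AC forces UTF-16
    }
}
```

Note that a single non-Latin-1 character doubles the payload of the entire string, not just of that character.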

Support Bounding the Size of Buffers Cached in the Per-Thread Buffer Caches#

JDK-8147468
JDK-8147468 is an enhancement included in OpenJDK 9 that addresses a memory management issue in the Java NIO (New I/O) package. It lets developers cap the size of the temporary buffers that are cached, preventing a form of native memory leak.

Java’s NIO framework uses DirectByteBuffers for certain I/O operations. These are buffers allocated outside the Java heap, in “native memory,” to provide better performance by avoiding an extra copy of data. To optimize performance and reduce the overhead of constant allocation and deallocation, the Java Virtual Machine (JVM) caches these temporary DirectByteBuffers in a per-thread cache.

The core issue that JDK-8147468 addresses is that in older JDK versions (before 8u102 and JDK 9), this cache had no size limit. If an application performed a large, but infrequent, NIO operation that required a very big buffer, this large buffer would then be stored in the thread-local cache. Over time, these caches could accumulate a number of large buffers, leading to excessive native memory consumption, which could ultimately result in a java.lang.OutOfMemoryError even if the Java heap itself had plenty of space. It’s a bit like a person who, while trying to be efficient, ends up hoarding very large, but rarely-used, items in their pockets. Eventually, their pockets get so full and heavy that they can’t move freely.

To solve this, JDK-8147468 introduced a new system property: jdk.nio.maxCachedBufferSize. By setting this property, we can specify a maximum size for any buffer that can be held in the per-thread cache. For example, if we set jdk.nio.maxCachedBufferSize=1048576 (1 MB), any NIO operation that requires a buffer larger than 1 MB will not store that buffer in the cache. Instead, the JVM will simply allocate a new, one-off buffer for the operation and then free it immediately afterward. This prevents the native memory cache from growing indefinitely and mitigates the risk of an OutOfMemoryError due to a native memory leak. This change gives developers greater control over their application’s native memory footprint.
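The caching rule described above can be modeled in a few lines. This is a sketch of the decision only, not the JDK's internal implementation; CachedBufferLimit, configuredLimit, and wouldBeCached are hypothetical helpers, and treating an unset or invalid property as "no limit" is an assumption made for the illustration.

```java
class CachedBufferLimit {

    static final String PROPERTY = "jdk.nio.maxCachedBufferSize";

    // Reads the limit from the system property.
    // Assumption: an unset, negative, or malformed value means "no limit".
    static long configuredLimit() {
        String raw = System.getProperty(PROPERTY);
        if (raw == null) {
            return Long.MAX_VALUE;
        }
        try {
            long value = Long.parseLong(raw);
            return value >= 0 ? value : Long.MAX_VALUE;
        } catch (NumberFormatException malformed) {
            return Long.MAX_VALUE;
        }
    }

    // A temporary buffer is kept in the per-thread cache only if it fits the limit.
    static boolean wouldBeCached(long bufferSize, long limit) {
        return bufferSize <= limit;
    }

    public static void main(String[] args) {
        long limit = configuredLimit();
        System.out.println("limit = " + limit);
        System.out.println("64 KB buffer cached: " + wouldBeCached(64 * 1024, limit));
        System.out.println("8 MB buffer cached:  " + wouldBeCached(8 * 1024 * 1024, limit));
    }
}
```

To try it with the 1 MB limit from the example, the property can be passed at launch, for instance java -Djdk.nio.maxCachedBufferSize=1048576 CachedBufferLimit.java. Passing it on the command line is the safe choice, since a value set later from application code may come too late to take effect.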

Why does NIO matter?#

Java NIO provides 2 main types of ByteBuffer: heap buffers and direct buffers. How they are managed by the garbage collector is fundamentally different.

Heap Buffers (ByteBuffer.allocate)

This buffer is essentially a wrapper around a standard Java byte[] array. Both the buffer object and the underlying byte array reside on the JVM heap.

Because it lives on the heap, a heap buffer is fully managed by the garbage collector. When the ByteBuffer object is no longer referenced by our application, it becomes eligible for garbage collection, and the GC will reclaim its memory just like any other Java object. This process is automatic and predictable within the normal workings of the GC.

Direct Buffers (ByteBuffer.allocateDirect)

Direct buffers are designed for high-performance I/O and behave very differently. A direct buffer allocates a block of memory outside the normal garbage-collected heap. This is often called “off-heap” or “native” memory. The JVM will try to perform native I/O operations directly on this memory, avoiding the overhead of copying data between the JVM heap and the native I/O layer.

It gets tricky when it comes to garbage collection. The native memory block itself is not directly managed by the GC. However, the DirectByteBuffer object, which is a small Java object that acts as a reference to this native memory, does live on the JVM heap and is garbage collected. The JVM uses a special mechanism to free the native memory. When the DirectByteBuffer object on the heap becomes unreachable, the garbage collector will eventually process it. This action triggers a cleanup process (using an object called a Cleaner in modern Java versions) that deallocates the corresponding block of off-heap, native memory.

This indirect management can cause problems. If our application creates many short-lived direct buffers, we might exhaust our available native memory and get an OutOfMemoryError: Direct buffer memory. This can happen even if our JVM heap has plenty of free space, because the garbage collector may not run frequently enough to clean up the DirectByteBuffer objects that would, in turn, free the native memory. For this reason, direct buffers are best suited for large, long-lived buffers where their performance benefits are most significant.
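The difference between the two kinds of buffer is visible through the public ByteBuffer API. A minimal sketch, where BufferKinds and describe are hypothetical names:

```java
import java.nio.ByteBuffer;

class BufferKinds {

    // Labels a buffer by where its storage lives.
    static String describe(ByteBuffer buffer) {
        return buffer.isDirect() ? "direct" : "heap";
    }

    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(1024);         // backed by a byte[] on the heap
        ByteBuffer direct = ByteBuffer.allocateDirect(1024); // backed by native memory

        System.out.println(describe(heap));   // heap
        System.out.println(heap.hasArray());  // true: the backing byte[] is accessible
        System.out.println(describe(direct)); // direct
    }
}
```

Both buffers are used through the same put, get, flip, and clear methods; only the allocation call and the memory behind them differ.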

Understanding such intricacies requires systematic study of the JVM’s memory structure and allocation. For example, we need a firm grasp of what actually happens in the “overhead of copying data between the JVM heap and the native I/O layer”. More broadly, understanding the JVM’s internal memory management is the key to writing truly high-performance, resilient applications and diagnosing complex production issues. While there are many resources, one book is consistently recommended as the definitive modern guide: Java Performance: The Definitive Guide by Scott Oaks.

This book is perfectly suited for a developer who wants a systematic and practical understanding of the JVM. It discusses garbage collection theory, the specifics of different GC algorithms (like G1, ZGC, and Shenandoah), and the tools (like profilers and command-line utilities) we need to observe the JVM in action. It provides a clear path from theory to practice.

This book, however, doesn’t talk too much about JVM memory structure. For example, it just mentions “heap memory” without defining or describing what it is. Those who need another “definitive” walkthrough on JVM internal structure could pick up Inside the Java Virtual Machine by Bill Venners.

While performance guides tell us how to make it run faster, Venners’ book explains what it is and why it is designed that way.

CAUTION
This book is a classic, with the 2nd edition published in 1999 for Java 2. This means the conceptual explanations of the architecture are timeless and unparalleled, but the implementation details are outdated. We won’t find information on modern garbage collectors like G1/ZGC, the JIT compiler, or the transition from PermGen to Metaspace.
The best approach is to use this book to build our fundamental understanding of the JVM’s blueprint and then supplement it with modern resources for the specifics of the HotSpot JVM, such as Java Virtual Machine Specification, the blueprint from which all JVMs are built. It’s not a tutorial and can be a dense, formal read, but it is the ultimate source of truth.

Java NIO#

Understanding Java NIO (New Input/Output) is crucial for building high-performance, scalable applications, especially those dealing with network communication or large file operations. Introduced in Java 1.4, NIO provides an alternative and often more efficient way to handle I/O compared to the traditional java.io package. In particular:

Non-blocking IO: Java NIO enables us to do non-blocking IO. For instance, a thread can ask a channel to read data into a buffer. While the channel reads data into the buffer, the thread can do something else. Once data is read into the buffer, the thread can then continue processing it. The same is true for writing data to channels.
Channels and Buffers: In the standard IO API we work with byte streams and character streams. In NIO we work with channels and buffers. Data is always read from a channel into a buffer, or written from a buffer to a channel.
Selectors: Java NIO contains the concept of “selectors”. A selector is an object that can monitor multiple channels for events (like: connection opened, data arrived etc.). Thus, a single thread can monitor multiple channels for data.

Java NIO consists of the following core components:

Channels
Buffers
Selectors

Java NIO has more classes and components than these, but Channel, Buffer and Selector form the core of the API. The rest of the components, like Pipe and FileLock, are merely utility classes to be used in conjunction with the 3 core components. Let’s then focus on these 3 components for now. The other components are explained in their own sections further below.
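To see a channel, a buffer, and a selector working together without any network setup, the sketch below uses an in-process Pipe. SelectorDemo and readViaSelector are hypothetical names, and the one-second select timeout is an arbitrary choice for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;

class SelectorDemo {

    // Writes a short message into a pipe and reads it back once the
    // selector reports that the source channel is readable.
    static String readViaSelector(String message) {
        try {
            Pipe pipe = Pipe.open();
            try (Selector selector = Selector.open();
                 Pipe.SinkChannel sink = pipe.sink();
                 Pipe.SourceChannel source = pipe.source()) {

                // Only non-blocking channels can be registered with a selector.
                source.configureBlocking(false);
                source.register(selector, SelectionKey.OP_READ);

                sink.write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)));

                StringBuilder received = new StringBuilder();
                if (selector.select(1000) > 0) {
                    for (SelectionKey key : selector.selectedKeys()) {
                        if (key.isReadable()) {
                            ByteBuffer buffer = ByteBuffer.allocate(64);
                            source.read(buffer); // channel fills the buffer
                            buffer.flip();       // switch the buffer to read mode
                            received.append(StandardCharsets.UTF_8.decode(buffer));
                        }
                    }
                    selector.selectedKeys().clear();
                }
                return received.toString();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readViaSelector("ping")); // prints ping
    }
}
```

With more channels registered on the same selector, the loop over selectedKeys is what lets a single thread service all of them.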

Channels and Buffers#

Typically, all IO in NIO starts with a Channel. A Channel is a formal representation of a connection to an entity capable of performing I/O operations, such as a file, a network socket, or a hardware device. From the Channel data can be read into a Buffer. Data can also be written from a Buffer into a Channel.

Here is a basic example that uses a FileChannel to read some data into a Buffer:

import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// try-with-resources closes the channel and the file even if a read fails
try (RandomAccessFile file = new RandomAccessFile("data.txt", "r");
     FileChannel fileChannel = file.getChannel()) {
    ByteBuffer buffer = ByteBuffer.allocate(512);

    int bytesRead = fileChannel.read(buffer); // the channel writes into the buffer
    while (bytesRead != -1) {
        buffer.flip();                        // switch the buffer to read mode

        while (buffer.hasRemaining()) {
            System.out.print((char) buffer.get());
        }

        buffer.clear();                       // make the buffer writable again
        bytesRead = fileChannel.read(buffer);
    }
}

TIP
Notice the buffer.flip() call above. First we read into a Buffer. Then we flip it to read out of it.
When we read from the FileChannel into the ByteBuffer using fileChannel.read(buffer), the buffer is in write mode. The position advances as bytes are written into it. The line buffer.flip() is a necessary transitional step. It changes the buffer’s state from write mode to read mode by:

Setting the limit to the current position. This tells the buffer that the data to be read extends only up to this point.

Resetting the position back to 0. This ensures that the next read operation starts at the beginning of the data that was just written.
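The effect of flip() on position and limit can be printed directly. A minimal sketch, with FlipDemo and state as hypothetical names:

```java
import java.nio.ByteBuffer;

class FlipDemo {

    // Formats the three indices that define a buffer's state.
    static String state(ByteBuffer buffer) {
        return "position=" + buffer.position()
                + " limit=" + buffer.limit()
                + " capacity=" + buffer.capacity();
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(8);
        System.out.println(state(buffer)); // position=0 limit=8 capacity=8

        // Write mode: each put advances the position.
        buffer.put((byte) 'a').put((byte) 'b').put((byte) 'c');
        System.out.println(state(buffer)); // position=3 limit=8 capacity=8

        // flip(): limit becomes the old position, position goes back to 0.
        buffer.flip();
        System.out.println(state(buffer)); // position=0 limit=3 capacity=8

        // clear(): back to write mode over the whole capacity.
        buffer.clear();
        System.out.println(state(buffer)); // position=0 limit=8 capacity=8
    }
}
```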

(To be continued…)

Caching#

A Brief Overview of Caching Algorithms#

There is no single exhaustive list, as new and hybrid algorithms are constantly being developed, but they all fall into a few key families. These algorithms, known as cache replacement policies, are rules that decide which piece of data to evict (remove) when the cache is full and new data needs to be added.

The goal is always to maximize the cache hit rate (finding the data in the cache) and minimize the cache miss rate (having to fetch the data from slower, underlying storage).

Here is a breakdown of the most important caching algorithms, from simple to advanced.

Simple Recency & Order-Based Algorithms#

These are the most basic algorithms, focusing on the order or time of access.

Least Recently Used (LRU): This is one of the most popular algorithms. It evicts the item that has not been used for the longest time. It’s based on the idea of temporal locality: if we used something recently, we are likely to use it again soon.
First-In, First-Out (FIFO): The first item added to the cache is the first one to be evicted, regardless of how often or recently it was used. It’s simple to implement but often inefficient, as it can evict popular items that were just loaded early.
Most Recently Used (MRU): This algorithm evicts the item that was most recently used. This seems counter-intuitive, but it’s effective in specific cases, such as database full-table scans, where the data is read once and is unlikely to be needed again soon.
Random Replacement (RR): As the name suggests, it just picks an item at random to evict. It’s very simple and avoids the overhead of tracking access, but its performance is unpredictable.

Frequency-Based Algorithms#

These algorithms track how many times an item is accessed.

Least Frequently Used (LFU): This algorithm evicts the item that has been accessed the fewest times. The idea is to keep the most popular items. LFU, however, struggles with cache pollution. An item that was popular in the past but is no longer needed can “pollute” the cache and never get evicted, while a new, instantly popular item might be evicted before its frequency count can build up.
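A deliberately naive LFU cache makes both the policy and its weakness concrete. This is a sketch for illustration, assuming a linear scan on eviction and oldest-first tie-breaking; LfuCacheSketch is a hypothetical name, and production implementations use more efficient structures.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class LfuCacheSketch<K, V> {

    private final int capacity;
    private final Map<K, V> values = new LinkedHashMap<>(); // insertion order breaks ties
    private final Map<K, Integer> hits = new HashMap<>();   // access count per key

    LfuCacheSketch(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    V get(K key) {
        if (!values.containsKey(key)) {
            return null;
        }
        hits.merge(key, 1, Integer::sum); // every hit raises the frequency
        return values.get(key);
    }

    void put(K key, V value) {
        if (!values.containsKey(key) && values.size() >= capacity) {
            evictLeastFrequentlyUsed();
        }
        values.put(key, value);
        hits.merge(key, 1, Integer::sum);
    }

    boolean containsKey(K key) {
        return values.containsKey(key);
    }

    // Linear scan for the smallest count; the oldest entry wins a tie.
    private void evictLeastFrequentlyUsed() {
        K victim = null;
        int fewest = Integer.MAX_VALUE;
        for (K key : values.keySet()) {
            int count = hits.get(key);
            if (count < fewest) {
                fewest = count;
                victim = key;
            }
        }
        values.remove(victim);
        hits.remove(victim);
    }

    public static void main(String[] args) {
        LfuCacheSketch<String, Integer> cache = new LfuCacheSketch<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" now has the higher count
        cache.put("c", 3); // evicts "b"
        cache.put("d", 4); // evicts "c": the newcomer never had time to build a count
        System.out.println(cache.containsKey("a")); // true
        System.out.println(cache.containsKey("c")); // false
    }
}
```

The last two puts show the pollution problem from the text: "a" stays because of its past popularity, while each new key is evicted by the next one.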

Advanced & Adaptive Algorithms#

These are more complex, high-performance algorithms that solve the problems of the simpler ones. They are often “scan-resistant,” meaning a single pass over a large dataset (like a backup or index) won’t wipe out all the useful, popular data in the cache.

Adaptive Replacement Cache (ARC) is a high-performance algorithm that constantly balances between LRU (recency) and LFU (frequency). It’s considered one of the best general-purpose algorithms.
2Q (Two-Queue) is a simpler algorithm that also solves the scan-resistance problem.
SIEVE is an algorithm that has gained significant attention for being simpler than LRU but higher-performing than many complex algorithms.
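To give a feel for why SIEVE is considered simple, here is a compact sketch of its core idea: entries sit in insertion order, a hit only sets a "visited" bit, and a moving hand evicts the first unvisited entry it finds while clearing bits along the way. SieveCacheSketch and its field names are hypothetical, it is not thread-safe, and treating an update of an existing key as a hit is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

class SieveCacheSketch<K, V> {

    private static final class Node<K, V> {
        final K key;
        V value;
        boolean visited;
        Node<K, V> newer; // neighbor toward the head
        Node<K, V> older; // neighbor toward the tail

        Node(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    private final int capacity;
    private final Map<K, Node<K, V>> index = new HashMap<>();
    private Node<K, V> head; // most recently inserted
    private Node<K, V> tail; // oldest
    private Node<K, V> hand; // where the next eviction scan resumes

    SieveCacheSketch(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    V get(K key) {
        Node<K, V> node = index.get(key);
        if (node == null) {
            return null;
        }
        node.visited = true; // a hit flips a bit; the list is not reordered
        return node.value;
    }

    void put(K key, V value) {
        Node<K, V> existing = index.get(key);
        if (existing != null) {
            existing.value = value;
            existing.visited = true;
            return;
        }
        if (index.size() >= capacity) {
            evict();
        }
        Node<K, V> node = new Node<>(key, value);
        node.older = head; // new entries always enter at the head
        if (head != null) {
            head.newer = node;
        }
        head = node;
        if (tail == null) {
            tail = node;
        }
        index.put(key, node);
    }

    boolean containsKey(K key) {
        return index.containsKey(key);
    }

    private void evict() {
        Node<K, V> candidate = (hand != null) ? hand : tail;
        // Visited entries get a second chance: clear the bit and move toward the head,
        // wrapping around to the tail after passing the head.
        while (candidate.visited) {
            candidate.visited = false;
            candidate = (candidate.newer != null) ? candidate.newer : tail;
        }
        hand = candidate.newer; // resume from here next time
        if (candidate.newer != null) {
            candidate.newer.older = candidate.older;
        } else {
            head = candidate.older;
        }
        if (candidate.older != null) {
            candidate.older.newer = candidate.newer;
        } else {
            tail = candidate.newer;
        }
        index.remove(candidate.key);
    }
}
```

For example, with capacity 3, inserting A, B, C, reading A, and then inserting D evicts B: the hand starts at the oldest entry A, sees its visited bit, clears it, and moves on to B, which was never read.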

Caching API in OpenJDK#

OpenJDK’s standard API does not provide built-in, “out-of-the-box” implementations of those algorithms. The Java standard library provides the building blocks (like HashMap and LinkedList), but it expects us to either build a cache ourselves or, more commonly, use a dedicated third-party library. The one algorithm we can easily implement with a built-in class is LRU. The java.util.LinkedHashMap class can be configured to function as a simple LRU cache by overriding its removeEldestEntry method:

import java.util.LinkedHashMap;
import java.util.Map;

/**
 * A cache class with Least Recently Used (LRU) eviction policy.
 *
 * @param <K>  The type of keys maintained by this cache
 * @param <V>  The type of cached values
 */
public class LruCache<K, V> extends LinkedHashMap<K, V> {

    private final int cacheSize;

    /**
     * Constructs an empty {@link LruCache} instance with the provided maximum number of entries held in the cache.
     *
     * @param cacheSize  Maximum number of entries in cache
     */
    private LruCache(final int cacheSize) {
        // accessOrder = true makes iteration order follow access recency
        super(cacheSize * 4 / 3, 0.75f, true);
        this.cacheSize = cacheSize;
    }

    /**
     * Creates a new instance of {@link LruCache} with the provided maximum number of entries held in the cache.
     *
     * @param cacheSize  Maximum number of entries in cache
     *
     * @param <K>  The type of keys maintained by this cache
     * @param <V>  The type of cached values
     *
     * @return a new initialized {@link LruCache} instance
     */
    public static <K, V> LruCache<K, V> ofSize(final int cacheSize) {
        return new LruCache<>(cacheSize);
    }

    @Override
    protected boolean removeEldestEntry(final Map.Entry<K, V> eldest) {
        return size() > getCacheSize();
    }

    private int getCacheSize() {
        return cacheSize;
    }
}
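The eviction behavior is easiest to see in a short run. To stay self-contained, the sketch below does not reuse the LruCache class; it builds an equivalent access-ordered LinkedHashMap inline, with LruUsageDemo and lruMap as hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LruUsageDemo {

    // An access-ordered LinkedHashMap that drops its eldest entry
    // once it holds more than maxEntries entries.
    static <K, V> Map<K, V> lruMap(final int maxEntries) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, Integer> cache = lruMap(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" becomes the most recently used entry
        cache.put("c", 3); // over capacity: "b", the least recently used, is evicted
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

Note that LinkedHashMap is not synchronized, so a cache built this way needs external synchronization when shared between threads.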

The javax.cache API, known as JCache (JSR 107), is the standard caching API for Java, but it is not bundled with OpenJDK; it ships as a separate API artifact. It is an interface specification, not an implementation. It provides a standard set of types and methods (CacheManager, Cache, get, put), but we must plug in a caching library (like Caffeine) that actually implements the logic.

For any serious caching, Java developers should use high-performance libraries. Here’s how they map to the algorithms we just listed:

Google Guava (The Predecessor): Guava’s cache uses a size-based LRU-like policy. It evicts the least recently used items when the cache reaches its maximum size.
Caffeine (The Modern Standard): This is the high-performance library that replaced Google’s Guava cache. It does not use a simple LRU or LFU. It uses W-TinyLFU, a much more advanced algorithm that provides near-optimal hit rates by combining the best parts of both LFU (frequency) and LRU (recency). It’s highly scan-resistant and generally superior to the simple algorithms.
Ehcache (The Enterprise Standard):
- Ehcache 2.x (older) allowed us to explicitly configure LRU, LFU, or FIFO.
- Ehcache 3.x (current) simplified this. It no longer lets us choose and primarily uses a sampling-based LRU for in-memory caching.