async-profiler

This project is a low-overhead sampling profiler for Java that does not suffer from the safepoint bias problem. It features HotSpot-specific APIs to collect stack traces and to track memory allocations. The profiler works with OpenJDK and other Java runtimes based on the HotSpot JVM.

async-profiler can trace the following kinds of events:

See our Wiki or the 3-hour playlist to learn about all features.

Download

Current release (3.0):

Previous releases

async-profiler also comes bundled with IntelliJ IDEA Ultimate 2018.3 and later.
For more information refer to IntelliJ IDEA documentation.

Supported platforms

         Officially maintained builds   Other available ports
Linux    x64, arm64                     x86, arm32, ppc64le, riscv64, loongarch64
macOS    x64, arm64

CPU profiling

In this mode, the profiler collects stack trace samples that include Java methods, native calls, JVM code and kernel functions.

The general approach is receiving call stacks generated by perf_events and matching them up with call stacks generated by AsyncGetCallTrace, in order to produce an accurate profile of both Java and native code. Additionally, async-profiler provides a workaround to recover stack traces in some corner cases where AsyncGetCallTrace fails.

This approach has the following advantages compared to using perf_events directly with a Java agent that translates addresses to Java method names:

If you wish to resolve frames within libjvm, the debug symbols are required.

Allocation profiling

The profiler can be configured to collect call sites where the largest amount of heap memory is allocated.

async-profiler does not use intrusive techniques like bytecode instrumentation or expensive DTrace probes, which have a significant performance impact. It also does not affect Escape Analysis or prevent JIT optimizations like allocation elimination. Only actual heap allocations are measured.

The profiler features TLAB-driven sampling. It relies on HotSpot-specific callbacks to receive two kinds of notifications:

The sampling interval can be adjusted with the --alloc option. For example, --alloc 500k will take one sample after 500 KB of allocated space on average. Prior to JDK 11, intervals smaller than the TLAB size will not take effect.

Installing Debug Symbols

Prior to JDK 11, the allocation profiler required HotSpot debug symbols. Some OpenJDK distributions (Amazon Corretto, Liberica JDK, Azul Zulu) already have them embedded in libjvm.so; other OpenJDK builds typically provide debug symbols in a separate package. For example, to install OpenJDK debug symbols on Debian / Ubuntu, run:

# apt install openjdk-17-dbg

(replace 17 with the desired version of JDK).

On CentOS, RHEL and some other RPM-based distributions, this can be done with the debuginfo-install utility:

# debuginfo-install java-1.8.0-openjdk

On Gentoo the icedtea OpenJDK package can be built with the per-package setting FEATURES="nostrip" to retain symbols.

The gdb tool can be used to verify if debug symbols are properly installed for the libjvm library. For example, on Linux:

$ gdb $JAVA_HOME/lib/server/libjvm.so -ex 'info address UseG1GC'

This command's output will either contain Symbol "UseG1GC" is at 0xxxxx or No symbol "UseG1GC" in current context.
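
The check above can be scripted. The following is a minimal sketch that classifies the gdb output based on the two messages quoted above; has_jvm_debug_symbols is a hypothetical helper name, not part of async-profiler.

```shell
# Reads gdb output on stdin and exits 0 if the UseG1GC symbol was resolved.
# has_jvm_debug_symbols is a hypothetical helper, not part of async-profiler.
has_jvm_debug_symbols() {
    grep -q 'Symbol "UseG1GC" is at'
}

# Usage (assumes gdb is installed and JAVA_HOME is set):
#   gdb "$JAVA_HOME/lib/server/libjvm.so" -batch -ex 'info address UseG1GC' 2>&1 \
#     | has_jvm_debug_symbols && echo "debug symbols found"
```

The usage line redirects stderr because gdb may print the "No symbol" message there.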

Wall-clock profiling

The -e wall option tells async-profiler to sample all threads equally every given period of time regardless of thread status: Running, Sleeping or Blocked. For instance, this can be helpful when profiling application start-up time.

The wall-clock profiler is most useful in per-thread mode (-t).

Example: asprof -e wall -t -i 5ms -f result.html 8983

Java method profiling

The -e ClassName.methodName option instruments the given Java method in order to record all invocations of this method along with their stack traces.

Example: -e java.util.Properties.getProperty will profile all places where getProperty method is called from.

Only non-native Java methods are supported. To profile a native method, use a hardware breakpoint event instead, e.g. -e Java_java_lang_Throwable_fillInStackTrace

Be aware that if you attach async-profiler at runtime, the first instrumentation of a non-native Java method may cause the deoptimization of all compiled methods. The subsequent instrumentation flushes only the dependent code.

The massive CodeCache flush doesn't occur if attaching async-profiler as an agent.

Here are some useful native methods that you may want to profile:

Building

Make sure the JAVA_HOME environment variable points to your JDK installation, and then run make. GCC or Clang is required. After building, the profiler binaries will be in the build subdirectory.

Basic Usage

As of Linux 4.6, capturing kernel call stacks using perf_events from a non-root process requires setting two runtime variables. You can set them using sysctl as follows:

# sysctl kernel.perf_event_paranoid=1
# sysctl kernel.kptr_restrict=0
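
Before profiling, it can help to confirm the current values. The sketch below assumes that perf_event_paranoid of 1 or lower and kptr_restrict of 0 are sufficient, which is this example's reading of the settings above; kernel_stacks_allowed is a hypothetical helper name.

```shell
# Exit 0 if the given values allow a non-root process to capture kernel stacks.
# $1: value of kernel.perf_event_paranoid
# $2: value of kernel.kptr_restrict
# Assumption: paranoid <= 1 and kptr_restrict == 0 are sufficient.
kernel_stacks_allowed() {
    [ "$1" -le 1 ] && [ "$2" -eq 0 ]
}

# Usage on Linux:
#   kernel_stacks_allowed "$(cat /proc/sys/kernel/perf_event_paranoid)" \
#                         "$(cat /proc/sys/kernel/kptr_restrict)" \
#     || echo "adjust the sysctl settings first"
```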

async-profiler works in the context of the target Java application, i.e. it runs as an agent in the process being profiled. asprof is a tool to attach and control the agent.

A typical workflow would be to launch your Java application, attach the agent and start profiling, exercise your performance scenario, and then stop profiling. The agent's output, including the profiling results, will be displayed on the console where you've started asprof.

Example:

$ jps
9234 Jps
8983 Computey
$ asprof start 8983
$ asprof stop 8983

The following may be used in lieu of the pid (8983):

Alternatively, you may specify -d (duration) argument to profile the application for a fixed period of time with a single command.

$ asprof -d 30 8983
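
For comparison, here is a rough manual equivalent of the -d option built from the start and stop commands shown earlier. It is only a sketch of the sequence, not how asprof is implemented; profile_for is a hypothetical helper name.

```shell
# Rough manual equivalent of `asprof -d SECONDS PID`.
# $1: duration in seconds, $2: PID of the target JVM
# profile_for is a hypothetical helper, not part of async-profiler.
profile_for() {
    asprof start "$2" || return 1   # attach the agent and start profiling
    sleep "$1"                      # let the workload run
    asprof stop "$2"                # stop profiling and print the results
}

# Usage:
#   profile_for 30 8983
```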

By default, the profiling frequency is 100 Hz (one sample every 10 ms of CPU time). Here is a sample of the output:

--- Execution profile ---
Total samples:           687
Unknown (native):        1 (0.15%)

--- 6790000000 (98.84%) ns, 679 samples
  [ 0] Primes.isPrime
  [ 1] Primes.primesThread
  [ 2] Primes.access$000
  [ 3] Primes$1.run
  [ 4] java.lang.Thread.run

... a lot of output omitted for brevity ...

          ns  percent  samples  top
  ----------  -------  -------  ---
  6790000000   98.84%      679  Primes.isPrime
    40000000    0.58%        4  __do_softirq

... more output omitted ...

This indicates that the hottest method was Primes.isPrime, and the hottest call stack leading to it comes from Primes.primesThread.
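
The numbers in this sample output are related by simple arithmetic: at 100 Hz each sample stands for 10 ms of CPU time. The sketch below reproduces them; samples_to_ns and sample_percent are hypothetical helper names, and real profiles are not guaranteed to come out as exact multiples.

```shell
# Convert a sample count to nanoseconds of CPU time at a given frequency.
# $1: number of samples, $2: sampling frequency in Hz
samples_to_ns() {
    echo $(( $1 * 1000000000 / $2 ))
}

# Share of samples as a percentage with two decimals.
# $1: samples in this stack or method, $2: total samples
sample_percent() {
    awk -v n="$1" -v total="$2" 'BEGIN { printf "%.2f\n", n * 100 / total }'
}

samples_to_ns 679 100    # 6790000000
sample_percent 679 687   # 98.84
sample_percent 4 687     # 0.58
```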

Launching as an Agent

If you need to profile some code as soon as the JVM starts up, instead of using asprof, it is possible to attach async-profiler as an agent on the command line. For example:

$ java -agentpath:/path/to/libasyncProfiler.so=start,event=cpu,file=profile.html ...

The agent library is configured through the JVMTI argument interface. The format of the arguments string is described in the source code. asprof actually converts command line arguments to that format.

For instance, -e wall is converted to event=wall, -f profile.html is converted to file=profile.html, and so on. However, some arguments are processed directly by asprof. E.g. -d 5 results in 3 actions: attaching profiler agent with start command, sleeping for 5 seconds, and then attaching the agent again with stop command.
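
To make the mapping concrete, here is an illustrative sketch that translates a few options into the agent argument syntax. It is not asprof's actual conversion code and covers only option pairs that appear in this document; to_agent_args is a hypothetical helper name.

```shell
# Illustrative mapping from a few asprof options to agent arguments.
# Not asprof's real conversion logic; to_agent_args is a hypothetical helper.
to_agent_args() {
    out=""
    while [ "$#" -gt 0 ]; do
        # Every option handled here takes exactly one value.
        [ "$#" -ge 2 ] || { echo "missing value for $1" >&2; return 1; }
        case "$1" in
            -e)      arg="event=$2" ;;
            -f)      arg="file=$2" ;;
            --alloc) arg="alloc=$2" ;;
            --lock)  arg="lock=$2" ;;
            *)       echo "unsupported option: $1" >&2; return 1 ;;
        esac
        out="${out:+$out,}$arg"   # join arguments with commas
        shift 2
    done
    printf '%s\n' "$out"
}

to_agent_args -e wall -f profile.html   # event=wall,file=profile.html
```

Options such as -d are deliberately rejected here, since asprof handles them itself rather than passing them to the agent.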

Multiple events

It is possible to profile CPU, allocations, and locks at the same time. Instead of CPU, you may choose any other execution event: wall-clock, perf event, tracepoint, Java method, etc.

The only output format that supports multiple events together is JFR. The recording will contain the following event types:

To start profiling cpu + allocations + locks together, specify

asprof -e cpu,alloc,lock -f profile.jfr ...

or use --alloc and --lock parameters with the desired threshold:

asprof -e cpu --alloc 2m --lock 10ms -f profile.jfr ...

The same, when starting profiler as an agent:

-agentpath:/path/to/libasyncProfiler.so=start,event=cpu,alloc=2m,lock=10ms,file=profile.jfr

Flame Graph visualization

async-profiler provides out-of-the-box Flame Graph support. Specify the -o flamegraph argument to dump profiling results as an interactive HTML Flame Graph. The Flame Graph output format is also chosen automatically if the target filename ends with .html.

$ jps
9234 Jps
8983 Computey
$ asprof -d 30 -f /tmp/flamegraph.html 8983

Profiler Options

asprof command-line options.

Profiling Java in a container

It is possible to profile Java processes running in a Docker or LXC container both from within a container and from the host system.

When profiling from the host, pid should be the Java process ID in the host namespace. Use ps aux | grep java or docker top <container> to find the process ID.

async-profiler should be run from the host by a privileged user; it will automatically switch to the proper pid/mount namespace and change user credentials to match the target process. Also make sure that the target container can access libasyncProfiler.so by the same absolute path as on the host.

By default, a Docker container restricts access to the perf_event_open syscall. There are three ways to allow profiling in a container:

  1. You can modify the seccomp profile or disable it altogether with --security-opt seccomp=unconfined option. In addition, --cap-add SYS_ADMIN may be required.
  2. You can use "fdtransfer": see the help for --fdtransfer.
  3. Last, you may fall back to -e ctimer profiling mode, see Troubleshooting.

Restrictions/Limitations

Troubleshooting

Failed to change credentials to match the target process: Operation not permitted

Due to a limitation of the HotSpot Dynamic Attach mechanism, the profiler must be run by exactly the same user (and group) as the owner of the target JVM process. If the profiler is run by a different user, it will try to automatically change the current user and group. This will likely succeed for root, but not for other users, resulting in the above error.

Could not start attach mechanism: No such file or directory

The profiler cannot establish communication with the target JVM through a UNIX domain socket.

Usually this happens in one of the following cases:

  1. Attach socket /tmp/.java_pidNNN has been deleted. It is a common practice to clean /tmp automatically with some scheduled script. Configure the cleanup software to exclude .java_pid* files from deletion.
    How to check: run lsof -p PID | grep java_pid
    If it lists a socket file, but the file does not exist, then this is exactly the described problem.
  2. JVM is started with -XX:+DisableAttachMechanism option.
  3. /tmp directory of Java process is not physically the same directory as /tmp of your shell, because Java is running in a container or in chroot environment. jattach attempts to solve this automatically, but it might lack the required permissions to do so.
    How to check: run strace build/jattach PID properties
  4. JVM is busy and cannot reach a safepoint. For instance, JVM is in the middle of long-running garbage collection.
    How to check: run kill -3 PID. Healthy JVM process should print a thread dump and heap info in its console.
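
The check for case 1 can be sketched as a small helper that combines the lsof output with a file existence test. The function name and the optional directory argument are assumptions made for illustration only.

```shell
# Reads `lsof -p PID` output on stdin.
# Exits 0 if the JVM holds an attach socket whose file no longer exists.
# $1: PID of the target JVM
# $2: directory to look in (defaults to /tmp)
# attach_socket_deleted is a hypothetical helper, not part of async-profiler.
attach_socket_deleted() {
    grep -q "\.java_pid$1" && [ ! -e "${2:-/tmp}/.java_pid$1" ]
}

# Usage:
#   lsof -p 8983 | attach_socket_deleted 8983 && echo "attach socket was deleted"
```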

Target JVM failed to load libasyncProfiler.so

The connection with the target JVM has been established, but the JVM is unable to load the profiler shared library. Make sure the user of the JVM process has permission to access libasyncProfiler.so by exactly the same absolute path. For more information see #78.

No access to perf events. Try --fdtransfer or --all-user option or 'sysctl kernel.perf_event_paranoid=1'

or

Perf events unavailable

perf_event_open() syscall has failed.

Typical reasons include:

  1. /proc/sys/kernel/perf_event_paranoid is set to restricted mode (>=2).
  2. seccomp disables perf_event_open API in a container.
  3. OS runs under a hypervisor that does not virtualize performance counters.
  4. perf_event_open API is not supported on this system, e.g. WSL.

For permissions-related reasons (such as 1 and 2), using --fdtransfer while running the profiler as a privileged user may solve the issue.

If changing the configuration is not possible, you may fall back to -e ctimer profiling mode. It is similar to cpu mode, but does not require perf_events support. As a drawback, there will be no kernel stack traces.

No AllocTracer symbols found. Are JDK debug symbols installed?

The OpenJDK debug symbols are required for allocation profiling. See Installing Debug Symbols for more details. If the error message persists after a successful installation of the debug symbols, it is possible that the JDK was upgraded when installing the debug symbols. In this case, profiling any Java process which had started prior to the installation will continue to display this message, since the process had loaded the older version of the JDK which lacked debug symbols. Restarting the affected Java processes should resolve the issue.

VMStructs unavailable. Unsupported JVM?

JVM shared library does not export gHotSpotVMStructs* symbols - apparently this is not a HotSpot JVM. Sometimes the same message can be also caused by an incorrectly built JDK (see #218). In these cases installing JDK debug symbols may solve the problem.

Could not parse symbols from <libname.so>

async-profiler was unable to parse non-Java function names because of corrupted contents in /proc/[pid]/maps. The problem is known to occur in a container when running Ubuntu with Linux kernel 5.x. This is an OS bug, see https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1843018.

Could not open output file

The output file is written by the target JVM process, not by the profiler script. Make sure the path specified in the -f option is correct and is accessible to the JVM.