AMD Research Instruction Based Sampling Toolkit

This repository contains tools which can be used to access the Instruction Based Sampling (IBS) mechanism in AMD microprocessors from Families 10h, 12h, 15h, 16h, and 17h. IBS is a hardware mechanism that samples a subset of all of the instructions going through a processor. For each sampled instruction, a large amount of information is gathered and saved off as a program runs.

This toolkit includes a Linux® kernel driver that helps gather these IBS samples, a user-level application to parse the raw binary dumped by the driver, and a helper application which will run other programs and collect IBS traces about them.

This toolkit was written by AMD Research as a simplified way to gather IBS samples on a wide range of Linux systems. Newer Linux kernels (beginning in the 3.2 timeframe) support IBS as part of the perf_events system. This toolkit offers a simplified interface to the IBS system, but it also includes a set of directions (ibs_with_perf_events.txt) for implementing the same functionality the "official" way. In essence, this toolkit may be useful for prototyping a system that uses IBS, which can later be ported to use perf_events.

AMD Research IBS Toolkit File Structure

The AMD Research IBS Toolkit is split into three major pieces, each of which is licensed separately. These three pieces are:

  1. The AMD Research IBS Driver
  2. A library to configure IBS and read IBS samples
  3. A collection of user-level tools to gather and analyze IBS samples:
     1. An application that tests the IBS driver
     2. An IBS monitoring program
     3. An application to decode binary IBS dumps
     4. An application to match IBS samples with their instructions
     5. An application that uses the libIBS daemon

Building and Installing the AMD Research IBS Driver and Toolkit

Everything in the AMD Research IBS Toolkit can be built from the main directory using the command:

make

This will build the driver, libIBS, and all of the tools. Alternately, it is also possible to go into each directory and use the make command to build only that tool.

The make command uses the CC and CXX environment variables to find its compilers, and it uses the system-wide cc and c++ compilers by default. You can override these to use other compilers (e.g. clang) by running, for example:

CC=clang CXX=clang++ make

Note that this also makes it possible to run the build under Clang's scan-build static analyzer:

scan-build make

In addition, compilation can be done in parallel with make -j {parallelism #}.

Finally, the cppcheck and pylint tools can be run on this repo with:

make check

Before using any tools that rely on IBS, you should install the IBS driver that you have built. There is a helper script in the ./driver/ directory for this:

./driver/install_ibs_driver.sh

Note that, if you don't run this script with sudo, it will attempt to install the driver using a sudo command that will likely ask for your password. You may need to do this every time you boot the system, unless you add the ibs.ko module to your boot-time list of modules to load.

After installing the driver, you should see IBS nodes in the file system at the following locations for each core ID <core_id>:

  1. /dev/cpu/<core_id>/ibs/op
  2. /dev/cpu/<core_id>/ibs/fetch
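As a quick sanity check after installation, the expected node paths can be generated and tested for existence. The following Python sketch is illustrative only and is not part of the toolkit. The function names are hypothetical, and it assumes that core IDs run from 0 to one less than the number of logical CPUs reported by the OS.

```python
import os


def ibs_device_paths(core_ids):
    """Return the expected (op, fetch) device node paths for each core ID."""
    return [
        (f"/dev/cpu/{core}/ibs/op", f"/dev/cpu/{core}/ibs/fetch")
        for core in core_ids
    ]


def missing_ibs_nodes(core_ids, exists=os.path.exists):
    """Return every expected IBS device node that is not present.

    The `exists` parameter can be swapped out, which makes the check
    easy to exercise on a machine without the driver installed.
    """
    return [
        path
        for pair in ibs_device_paths(core_ids)
        for path in pair
        if not exists(path)
    ]


if __name__ == "__main__":
    # Assumption: core IDs are 0 .. (logical CPU count - 1).
    cores = range(os.cpu_count() or 1)
    missing = missing_ibs_nodes(cores)
    if missing:
        print("Missing IBS nodes:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected IBS nodes are present.")
```

If any nodes are reported missing, the driver is most likely not loaded.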

To uninstall the IBS driver, you can either run:

rmmod ibs

Or you can use the helper script at:

./driver/remove_ibs_driver.sh

The user interface to the driver is documented in ./include/ibs-uapi.h. This file may be included by user application code. See ./tools/ibs_monitor/ for an example of how to interface with the driver.

AMD Research IBS Toolkit Compatibility

This toolkit has been tested to compile and install on the following systems:

In addition, it has been tested on the following processors, though its logic should work for any processors in AMD Families 10h, 12h, 14h, 15h, 16h, or 17h that support IBS:

Using the AMD Research IBS Toolkit

The AMD Research IBS Toolkit includes most of the tools necessary to analyze applications using IBS. This includes the driver to access IBS, a monitoring application which automatically gathers IBS samples from an application under test, an application to decode these IBS samples into a human-readable format, and a tool to annotate these samples with application-level information about each instruction.

All of the directions here assume that the IBS driver, contained in ./driver/, has been built and installed successfully.

The simplest mechanism to access IBS traces is the IBS Monitor application in ./tools/ibs_monitor/. This application allows users to pass a target application to be studied. The application will be run with system-wide IBS samples enabled, and the monitor will continually gather these until the program ends. In order to decrease the noise caused by saving these traces out to the target files, the monitor stores IBS traces in a raw format -- basically dumping the data structure directly to file.

After the trace has been gathered, the IBS decoder application can be used to decode these raw IBS traces into a human-readable CSV file. This application is found in ./tools/ibs_decoder/. This CSV file has one line per IBS sample, and each column describes one piece of information contained in that IBS sample.
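Once decoded, the CSV can be inspected with any CSV-aware tool. The sketch below is a minimal Python example and is not part of the toolkit. It assumes that the first row of the CSV is a header naming each column, which is not stated here, and the function name is hypothetical. It makes no assumption about what the columns are called.

```python
import csv
import io


def summarize_samples(csv_stream):
    """Return (column_names, sample_count) for a decoded IBS CSV stream.

    Assumption: the first row is a header; every later non-empty row
    is one IBS sample.
    """
    reader = csv.reader(csv_stream)
    header = next(reader, [])
    sample_count = sum(1 for row in reader if row)
    return header, sample_count


if __name__ == "__main__":
    # Stand-in data with made-up column names, used only to show the call.
    demo = io.StringIO("col_a,col_b\n1,2\n3,4\n")
    columns, count = summarize_samples(demo)
    print(f"{count} samples, {len(columns)} columns")
```

For a real trace, the same function can be called on a file opened with open("op.csv", newline="").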

An example of how to run the IBS Monitor and Decoder is as follows. These commands assume you are in the ./tools/ directory.

The following command will run the requested program with the given command line and produce two IBS traces: one for Op samples (app.op) and one for Fetch samples (app.fetch).

./ibs_monitor/ibs_monitor -o app.op -f app.fetch ${program command line}

The following command will then decode the two IBS traces and save them into their respective CSV files:

./ibs_decoder/ibs_decoder -i app.op -o op.csv -f app.fetch -g fetch.csv

The following command will run both of the above commands back-to-back and also annotate each IBS sample with information about the instruction that it sampled (such as its opcode and which line of code created it):

./ibs_run_and_annotate/ibs_run_and_annotate -o -f -d ${output directory} -t ${temp directory} -w ${program working directory} -- ${program command line}
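For scripted experiments, the separate monitor and decoder steps shown earlier can also be chained from Python. This is a minimal sketch and not part of the toolkit. The function names are hypothetical, the flags are the ones used in the commands above, and the default paths assume the script is run from the ./tools/ directory.

```python
import subprocess


def monitor_command(program_argv, op_trace="app.op", fetch_trace="app.fetch",
                    monitor="./ibs_monitor/ibs_monitor"):
    """Build the IBS Monitor command line for the program under test."""
    return [monitor, "-o", op_trace, "-f", fetch_trace, *program_argv]


def decoder_command(op_trace="app.op", op_csv="op.csv",
                    fetch_trace="app.fetch", fetch_csv="fetch.csv",
                    decoder="./ibs_decoder/ibs_decoder"):
    """Build the IBS Decoder command line for a pair of raw traces."""
    return [decoder, "-i", op_trace, "-o", op_csv,
            "-f", fetch_trace, "-g", fetch_csv]


def monitor_and_decode(program_argv):
    """Run the monitor, then the decoder, stopping if either step fails."""
    subprocess.run(monitor_command(program_argv), check=True)
    subprocess.run(decoder_command(), check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so this is safe anywhere.
    print(" ".join(monitor_command(["ls", "-l"])))
    print(" ".join(decoder_command()))
```

Passing the command as a list avoids shell quoting problems when the program under test takes arguments that contain spaces.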

Background on Instruction Based Sampling

AMD Instruction Based Sampling (IBS) is a hardware performance monitoring mechanism that is available on AMD CPUs starting with the Family 10h generation of cores (e.g. processors code-named "Barcelona" and "Shanghai" and Phenom™ II branded consumer CPUs were from this generation). It is supported on AMD CPUs up through and including the current Family 17h processors (e.g. the Ryzen™ branded consumer CPUs) with various features in each generation.

Traditionally, hardware performance counters increment whenever the CPU core sees some event (such as a cache miss). This can lead to overcounting in cores that perform speculative, out-of-order execution, because the instruction that caused the event may never actually commit.
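The overcounting effect can be illustrated with a toy model. The Python sketch below is purely illustrative and does not model any real core. Each instruction is a hypothetical pair of flags: whether it caused the event, and whether it eventually committed.

```python
def count_events(instructions):
    """Return (events_counted, events_from_committed_instructions).

    `instructions` is an iterable of (caused_event, committed) booleans.
    A traditional counter sees every event, including those caused by
    speculative instructions that are later squashed.
    """
    instructions = list(instructions)
    counted = sum(1 for caused, _ in instructions if caused)
    committed = sum(
        1 for caused, did_commit in instructions if caused and did_commit
    )
    return counted, committed


if __name__ == "__main__":
    # Three instructions cause the event, but only one of them commits.
    trace = [(True, True), (True, False), (False, True), (True, False)]
    print(count_events(trace))  # prints (3, 1)
```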

A related limitation of traditional performance counters becomes apparent when performing sampling. Traditional performance counters allow the application to be interrupted whenever a performance counter rolls over from '-1' to '0'. This is often referred to as event-based sampling, since it samples (interrupts on) every Nth event [1], depending on the initial negative value in the counter.
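The rollover behavior can be sketched in a few lines of Python. This is a simplified model rather than a description of any specific counter hardware. The function name is hypothetical, and it assumes the counter is reloaded with the same negative value after every interrupt.

```python
def event_based_sample_points(num_events, period):
    """Return the 1-based indices of the events that trigger an interrupt.

    The counter starts at -period, increments once per event, and
    interrupts when it rolls over from -1 to 0.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    counter = -period
    samples = []
    for event_index in range(1, num_events + 1):
        counter += 1
        if counter == 0:
            samples.append(event_index)
            counter = -period  # assumption: reload with the same value
    return samples


if __name__ == "__main__":
    print(event_based_sample_points(10, 3))  # prints [3, 6, 9]
```

A more negative initial value gives a longer sampling period and therefore fewer interrupts.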

Event-based sampling allows developers to learn where in an application events occur. However, out-of-order cores may not be able to precisely interrupt on the instruction that caused the Nth event (or, because of the reason mentioned above, may not even know which of many outstanding events is the Nth event). This produces a problem known as 'skid'. A developer who wants to know exactly which instruction causes an event will encounter many difficulties when using traditional performance counters in a speculative, out-of-order core [2].

AMD's solution to this problem is known as Instruction Based Sampling (IBS). In a nutshell, IBS tracks instructions rather than events (hence instruction-based sampling instead of event-based sampling). Every Nth instruction that goes through the core is 'marked'. As it flows through the pipeline, information about many events caused by that instruction is gathered. Then, when the instruction is completed, multiple pieces of information about that instruction's operation are available for logging [3, 4].

IBS on AMD processors is split into two parts: fetch sampling (front-end) and op sampling (back-end). AMD cores operate on AMD64/x86 instructions in the in-order front end of the processor. These are broken down into internal micro-operations for execution in the out-of-order back end of the processor. As such, IBS for front-end operations and IBS for back-end operations work in similar ways, but are completely separate from one another.

Fetch (front-end) sampling counts the number of completed (successfully sent to the decoder) fetches. After observing N fetches (where N is a programmable number), the next fetch attempt is sampled. Information about that fetch operation is gathered. When the fetch operation is either sent to the decoder (i.e. it completes) or is aborted (e.g. due to a page fault), the processor is interrupted and the IBS information about the sampled fetch is made available to the OS through a series of model-specific registers (MSRs).
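The selection rule for fetch sampling can be modeled with a short Python sketch. This is a toy model of the behavior described above, not of the hardware. The function name is hypothetical, and it assumes that the sampled fetch itself does not count toward the next period, which is not specified here.

```python
def sampled_fetch_indices(fetch_completed, period):
    """Return the indices of the fetch attempts that would be sampled.

    `fetch_completed` holds one boolean per fetch attempt: True if the
    fetch was sent to the decoder, False if it was aborted. Only
    completed fetches are counted, but the sampled fetch is simply the
    next attempt, whether or not it completes.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    completed = 0
    armed = False
    sampled = []
    for index, did_complete in enumerate(fetch_completed):
        if armed:
            sampled.append(index)
            armed = False
            completed = 0  # assumption: sampled fetch is not counted
            continue
        if did_complete:
            completed += 1
            if completed == period:
                armed = True
    return sampled


if __name__ == "__main__":
    attempts = [True, True, False, True, True, True, True]
    print(sampled_fetch_indices(attempts, 2))  # prints [2, 5]
```

In this example the first sampled fetch (index 2) is one that aborted, which shows that sampling is not limited to fetches that complete.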

Depending on the processor family, these Fetch IBS Samples can contain some or all of the following information:

Op (back-end) sampling can be configured to count either the number of clock cycles or the number of dispatched micro-ops. In either case, once the programmable number of counts has taken place, the next micro-op is tagged. As that micro-op flows through the out-of-order back end of the processor, information about the events it causes is stored. When the op is retired, the processor is interrupted and the IBS information about the sampled op is made available to the OS through a series of MSRs.
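The difference between the two counting modes can be seen in a small Python model. This sketch is illustrative only. The function name and the "cycles"/"ops" mode strings are hypothetical, and the reset behavior after a tag is an assumption rather than a description of the hardware.

```python
def tagged_ops(ops_per_cycle, period, count_mode):
    """Return the dispatch-order IDs of the micro-ops that would be tagged.

    `ops_per_cycle` lists how many micro-ops are dispatched in each
    clock cycle. `count_mode` is "cycles" or "ops". Once `period`
    counts have elapsed, the next dispatched micro-op is tagged and the
    count restarts (assumption).
    """
    if count_mode not in ("cycles", "ops"):
        raise ValueError("count_mode must be 'cycles' or 'ops'")
    if period <= 0:
        raise ValueError("period must be positive")
    count = 0
    armed = False
    next_op_id = 0
    tagged = []
    for dispatched in ops_per_cycle:
        for _ in range(dispatched):
            if armed:
                tagged.append(next_op_id)
                armed = False
                count = 0
            elif count_mode == "ops":
                count += 1
                armed = count == period
            next_op_id += 1
        # In cycle mode the count advances once per cycle, busy or idle.
        if count_mode == "cycles" and not armed:
            count += 1
            armed = count == period
    return tagged


if __name__ == "__main__":
    dispatch = [2, 0, 0, 3, 1, 0, 2]
    print(tagged_ops(dispatch, 3, "cycles"))  # prints [2, 6]
    print(tagged_ops(dispatch, 3, "ops"))     # prints [3, 7]
```

With the same period, the two modes tag different micro-ops, because idle cycles advance the count in cycle mode but not in dispatched-op mode.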

Depending on the processor family, these Op IBS Samples can contain some or all of the following information:

For more information about the technical details of AMD's Instruction Based Sampling, please refer to AMD's various processor manuals [5-17].

For more information about micro-ops in AMD cores, please refer to AMD's software optimization guides [5-6, 18-19]. In particular, note that some of the descriptions in these manuals refer to macro-ops and micro-ops. In Family 17h cores, for example, AMD64 instructions are broken into one or more macro-ops. These macro-ops are dispatched into the back end of the pipeline, where they may be split into one or two micro-ops. For instance, an instruction that needs both the ALU (to do math or logic operations) and the AGU (to calculate an address for a load or a store) will be split into two micro-ops. One of those micro-ops will go to the ALU scheduler units and the other will go to the AGU scheduler units. In these Family 17h cores, IBS op sampling actually samples macro-ops at dispatch time.

Background References

  1. S. V. Moore, "A Comparison of Counting and Sampling Modes of Using Performance Monitoring Hardware," in Proc. of the Int'l Conf. on Computational Science-Part II (ICCS), 2002.
  2. J. Dean, J. Hicks, C. A. Waldspurger, W. E. Weihl, G. Chrysos, "ProfileMe: Hardware Support for Instruction-Level Profiling on Out-of-Order Processors," in Proc. of the 30th IEEE/ACM Int'l Symp. on Microarchitecture (MICRO-30), 1997.
  3. P. J. Drongowski, "Instruction-Based Sampling: A New Performance Analysis Technique for AMD Family 10h Processors," AMD Technical Report, 2007.
  4. P. Drongowski, L. Yu, F. Swehosky, S. Suthikulpanit, R. Richter, "Incorporating Instruction-Based Sampling into AMD CodeAnalyst," in Proc. of the 2010 IEEE Int'l Symp. on Performance Analysis of Systems & Software (ISPASS), 2010.
  5. Advanced Micro Devices, Inc. "Software Optimization Guide for AMD Family 10h and 12h Processors". AMD Publication #40546. Rev. 3.13. Appendix G.
  6. Advanced Micro Devices, Inc. "Software Optimization Guide for AMD Family 15h Processors". AMD Publication #47414. Rev. 3.07. Appendix F.
  7. Advanced Micro Devices, Inc. "BIOS and Kernel Developer's Guide (BKDG) For AMD Family 10h Processors". AMD Publication #31116. Rev. 3.62.
  8. Advanced Micro Devices, Inc. "BIOS and Kernel Developer's Guide (BKDG) For AMD Family 12h Processors". AMD Publication #41131. Rev. 3.03.
  9. Advanced Micro Devices, Inc. "BIOS and Kernel Developer's Guide (BKDG) For AMD Family 14h Models 00h-0Fh Processors". AMD Publication #43170. Rev. 3.03.
  10. Advanced Micro Devices, Inc. "BIOS and Kernel Developer's Guide (BKDG) For AMD Family 15h Models 00h-0Fh Processors". AMD Publication #42301. Rev. 3.14.
  11. Advanced Micro Devices, Inc. "BIOS and Kernel Developer's Guide (BKDG) For AMD Family 15h Models 10h-1Fh Processors". AMD Publication #42300. Rev. 3.12.
  12. Advanced Micro Devices, Inc. "BIOS and Kernel Developer's Guide (BKDG) For AMD Family 15h Models 30h-3Fh Processors". AMD Publication #49125. Rev. 3.06.
  13. Advanced Micro Devices, Inc. "BIOS and Kernel Developer's Guide (BKDG) For AMD Family 15h Models 60h-6Fh Processors". AMD Publication #50742. Rev. 3.05.
  14. Advanced Micro Devices, Inc. "BIOS and Kernel Developer's Guide (BKDG) For AMD Family 15h Models 70h-7Fh Processors". AMD Publication #55072. Rev. 3.00.
  15. Advanced Micro Devices, Inc. "BIOS and Kernel Developer's Guide (BKDG) For AMD Family 16h Models 00h-0Fh Processors". AMD Publication #48751. Rev. 3.03.
  16. Advanced Micro Devices, Inc. "BIOS and Kernel Developer's Guide (BKDG) For AMD Family 16h Models 30h-3Fh Processors". AMD Publication #52740. Rev. 3.06.
  17. Advanced Micro Devices, Inc. "Processor Programming Reference (PPR) for AMD Family 17h Model 01h, Revision B1 Processors". AMD Publication #54945.
  18. Advanced Micro Devices, Inc. "Software Optimization Guide for AMD Family 16h Processors". AMD Publication #52128. Rev. 1.1.
  19. Advanced Micro Devices, Inc. "Software Optimization Guide for AMD Family 17h Processors". AMD Publication #55723. Rev. 3.00.

Trademark Attribution

© 2017-2019 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, AMD Phenom, Opteron, Ryzen, EPYC, and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Linux is a registered trademark of Linus Torvalds. Other names are for informational purposes only and may be trademarks of their respective owners.