Container Profiler

University of Washington Tacoma

<img src="./docs/logo.png" alt="drawing" width="400"/>

Table of Contents

FAQ

General

Why should I use the Container Profiler?

It provides easy-to-use profiling for applications or workflows running in a container.

Usage

How do I use the Container Profiler on my own container?

1. Install the Container Profiler

2. Here are the common use cases for applying the Container Profiler to profile a container

Miscellaneous

How should I reference the Container Profiler if I use it?

MANUAL

GENERAL INFORMATION

The Container Profiler profiles an application or workflow by taking interval snapshots of a collection of Linux resource utilization metrics throughout the course of the job. These snapshots are stored as JSON data, which can be used once the job is finished to see how the metrics changed over time.

In order to use the Container Profiler, a container with an application/workflow/script to be run and profiled is needed. Use of Linux as the Docker host operating system is recommended.

Overview: Running the Container Profiler

Container Profiler

ContainerProfiler includes two bash scripts: rudataall.sh, which profiles resource utilization at the VM level, container level and process level, and deltav2.sh, which computes the delta statistics of resource utilization between two time instances. Detailed usage of the profiler script can be found in the YouTube videos linked below (demo scripts can be found in the profiler_demo directory).

Authors: Wes Lloyd & Huazeng Deng & Ling-hong Hung & Varik Hoang

Version: 0.3

GitHub: https://github.com/wlloyduw/ContainerProfiler

Preprint: https://arxiv.org/abs/2005.11491

Prerequisite: Linux host operating system (recommended)

Metrics Description

The text below describes the metrics captured by the script rudataall.sh for profiling resource utilization on the virtual machine (VM) level, container level and process level. A complete metrics description spreadsheet can be found at https://github.com/wlloyduw/ContainerProfiler/blob/master/metrics_description_for_rudataall.xlsx

VM Level Metrics


| Attribute | Description |
| --- | --- |
| vCpuTime | Total CPU time (cpu_user+cpu_kernel) in centiseconds (cs) (hundredths of a second) |
| vCpuTimeUserMode | CPU time for processes executing in user mode in centiseconds (cs) |
| vCpuTimeKernelMode | CPU time for processes executing in kernel mode in centiseconds (cs) |
| vCpuIdleTime | CPU idle time in centiseconds (cs) |
| vCpuTimeIOWait | CPU time waiting for I/O to complete in centiseconds (cs) |
| vCpuTimeIntSrvc | CPU time servicing interrupts in centiseconds (cs) |
| vCpuTimeSoftIntSrvc | CPU time servicing soft interrupts in centiseconds (cs) |
| vCpuContextSwitches | The total number of context switches across all CPUs |
| vCpuNice | Time spent with niced processes executing in user mode in centiseconds (cs) |
| vCpuSteal | Time stolen by other operating systems running in a virtual environment in centiseconds (cs) |
| vCpuType | The model name of the processor |
| vCpuMhz | The precise speed in MHz for the processor to the thousandths decimal place |
| vDiskSectorReads | The number of disk sectors read, where a sector is typically 512 bytes; assumes /dev/sda1 |
| vDiskSectorWrites | The number of disk sectors written, where a sector is typically 512 bytes; assumes /dev/sda1 |
| vDiskSuccessfulReads | Number of disk reads completed successfully |
| vDiskMergedReads | Number of disk reads merged together (adjacent and merged for efficiency) |
| vDiskReadTime | Time spent reading from the disk in milliseconds (ms) |
| vDiskSuccessfulWrites | Number of disk writes completed successfully |
| vDiskMergedWrites | Number of disk writes merged together (adjacent and merged for efficiency) |
| vDiskWriteTime | Time spent writing in milliseconds (ms) |
| vMemoryTotal | Total amount of usable RAM in kilobytes (KB) |
| vMemoryFree | The amount of physical RAM left unused by the system in kilobytes (KB) |
| vMemoryBuffers | The amount of temporary storage for raw disk blocks in kilobytes (KB) |
| vMemoryCached | The amount of physical RAM used as cache memory in kilobytes (KB) |
| vNetworkBytesRecvd | Network bytes received, in bytes; assumes eth0 |
| vNetworkBytesSent | Network bytes sent, in bytes; assumes eth0 |
| vLoadAvg | The system load average as an average number of running plus waiting threads over the last minute |
| vPgFault | Page faults. A page fault is a type of exception raised by computer hardware when a running program accesses a memory page that is not currently mapped by the memory management unit (MMU) into the virtual address space of a process |
| vMajorPageFault | Major page faults. These are expected when a process starts or needs to read in additional data, and in these cases do not indicate a problem condition |
| vId | VM ID (default is "unavailable") |
| currentTime | Number of seconds (s) that have elapsed since January 1, 1970 (midnight UTC/GMT) |
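
The VM-level metrics mix several units, so a small conversion step can help when reading raw snapshots. The following is a minimal sketch, assuming the 512-byte sector size that the descriptions call typical. The function names `cs_to_seconds` and `sectors_to_bytes` are illustrative and are not part of the Container Profiler.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the Container Profiler) for converting
# the units used by the VM-level metrics.

# Convert centiseconds (e.g. vCpuTime, vCpuIdleTime) to seconds.
cs_to_seconds() {
    awk -v cs="$1" 'BEGIN { printf "%.2f\n", cs / 100 }'
}

# Convert a sector count (e.g. vDiskSectorReads) to bytes.
# $2 is the sector size in bytes; 512 is an assumed default.
sectors_to_bytes() {
    echo $(( $1 * ${2:-512} ))
}

cs_to_seconds 12345      # prints 123.45
sectors_to_bytes 2048    # prints 1048576
```

If the disk reports a different sector size, pass it as the second argument, for example `sectors_to_bytes 2048 4096`.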

Container Level Metrics


| Attribute | Description |
| --- | --- |
| cCpuTime | Total CPU time consumed by all tasks in this cgroup (including tasks lower in the hierarchy) in nanoseconds (ns) |
| cProcessorStats | Self-defined parameter |
| cCpu${i}TIME | CPU time consumed on each CPU by all tasks in this cgroup (including tasks lower in the hierarchy) in nanoseconds (ns) |
| cNumProcessors | Number of CPU processors |
| cCpuTimeUserMode | CPU time consumed by tasks in user mode in this cgroup in centiseconds (cs) |
| cCpuTimeKernelMode | CPU time consumed by tasks in kernel mode in this cgroup in centiseconds (cs) |
| cDiskSectorIO | Number of sectors transferred to or from specific devices by a cgroup |
| cDiskReadBytes | Number of bytes transferred from specific devices by a cgroup in bytes |
| cDiskWriteBytes | Number of bytes transferred to specific devices by a cgroup in bytes |
| cMemoryUsed | Total current memory usage by processes in the cgroup in bytes |
| cMemoryMaxUsed | Maximum memory used by processes in the cgroup in bytes |
| cNetworkBytesRecvd | The number of bytes each interface has received |
| cNetworkBytesSent | The number of bytes each interface has sent |
| cId | Container ID |
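
Note that cCpuTime is reported in nanoseconds while cCpuTimeUserMode and cCpuTimeKernelMode are reported in centiseconds, so the values have to be normalized before they are compared. The sketch below shows one way to do that; `user_mode_share` is a hypothetical helper, and because the counters may come from different cgroup files, the result should be treated as approximate.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Container Profiler): percentage of
# container CPU time spent in user mode.
# $1 = cCpuTime in nanoseconds, $2 = cCpuTimeUserMode in centiseconds.
user_mode_share() {
    awk -v total_ns="$1" -v user_cs="$2" 'BEGIN {
        if (total_ns <= 0) { print "0.0"; exit }
        total_s = total_ns / 1000000000   # nanoseconds  -> seconds
        user_s  = user_cs / 100           # centiseconds -> seconds
        printf "%.1f\n", 100 * user_s / total_s
    }'
}

user_mode_share 4000000000 300   # 3 s of 4 s in user mode, prints 75.0
```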

Process Level Metrics


| Attribute | Description |
| --- | --- |
| pId | Process ID |
| pNumThreads | Number of threads in this process |
| pCpuTimeUserMode | Total CPU time this process was scheduled in user mode, measured in clock ticks (divide by sysconf(_SC_CLK_TCK)) |
| pCpuTimeKernelMode | Total CPU time this process was scheduled in kernel mode, measured in clock ticks (divide by sysconf(_SC_CLK_TCK)) |
| pChildrenUserMode | Total time children processes of the parent were scheduled in user mode, measured in clock ticks |
| pChildrenKernelMode | Total time children processes of the parent were scheduled in kernel mode, measured in clock ticks |
| pVoluntaryContextSwitches | Number of voluntary context switches |
| pNonvoluntaryContextSwitches | Number of involuntary context switches |
| pBlockIODelays | Aggregated block I/O delays, measured in clock ticks |
| pVirtualMemoryBytes | Virtual memory size in bytes |
| pResidentSetSize | Resident Set Size: number of pages the process has in real memory. This is just the pages which count toward text, data, or stack space. This does not include pages which have not been demand-loaded in, or which are swapped out |
| pNumProcesses | Number of processes inside a container |
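
Several process-level metrics are expressed in clock ticks or pages rather than seconds or bytes. As a rough sketch, the conversions can be done from the shell with `getconf`, which exposes the same values as sysconf(_SC_CLK_TCK) and the system page size. The helper names and the fallback values (100 ticks per second, 4096-byte pages) are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the Container Profiler).
# Fall back to common Linux defaults if getconf is unavailable (assumption).
CLK_TCK=$(getconf CLK_TCK 2>/dev/null || echo 100)
PAGE_SIZE=$(getconf PAGESIZE 2>/dev/null || echo 4096)

# Convert clock ticks (e.g. pCpuTimeUserMode) to seconds.
# $2 optionally overrides the ticks-per-second value.
ticks_to_seconds() {
    awk -v t="$1" -v hz="${2:-$CLK_TCK}" 'BEGIN { printf "%.2f\n", t / hz }'
}

# Convert a page count (pResidentSetSize) to bytes.
# $2 optionally overrides the page size in bytes.
rss_pages_to_bytes() {
    echo $(( $1 * ${2:-$PAGE_SIZE} ))
}

ticks_to_seconds 250 100     # prints 2.50
rss_pages_to_bytes 3 4096    # prints 12288
```

The conversion should use the values of the host where the profile was collected, so pass them explicitly when analyzing data on a different machine.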

Tutorial: Profiling a Container

Prerequisites

The Container Profiler has been designed to operate where Linux is the host operating system. The tool may work on other platforms, but this has not been tested. For best results, use of a Linux host operating system is recommended.

Video Demonstration

Video Channel: https://www.youtube.com/@containerprofiler6371

  1. Getting Started with the Container Profiler tool - Part 1 & Part 2
  2. Profiling a bash script with the Container Profiler - Link
  3. Profiling applications with the Container Profiler using the install script - Link
  4. Graphing Resource Utilization with the Container Profiler tool - Link
  5. Profiling and graphing resource utilization of pgbench, the postgresql database benchmark - Link

Install the Container Profiler

git clone https://github.com/wlloyduw/ContainerProfiler

How do I build the ContainerProfiler to profile the total resource utilization

Building the ContainerProfiler should be completed using a Linux environment where Docker has been preinstalled.

sudo ./build.sh

1. How do I profile a task or application

sudo docker run --rm -v ${PWD}:/OUTPUT_DIR  profiler:custom profile -o /OUTPUT_DIR SET_OF_TASKS

For example:

sudo docker run --rm -v ${PWD}:/data  profiler:custom profile -o /data "sleep 5; ls -al"

OUTPUT_DIR: the directory that holds profiling files in JSON format

2. How do I perform time series profiling of a task or application

The idea is to add the '-t' argument to specify a time series sampling interval. (e.g. '-t 1' for 1-second sampling)

sudo docker run --rm \
    -v ${PWD}:/OUTPUT_DIR \
    profiler:custom profile -t TIME_INTERVAL -o /OUTPUT_DIR SET_OF_TASKS

For example:

sudo docker run --rm -v ${PWD}:/data  profiler:custom profile -t 1 -o /data "sleep 5; ls -al"
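
The two invocations above differ only in the '-t' argument. The sketch below makes that explicit with a hypothetical `profile_cmd` helper that prints, but does not run, the command for either mode. The helper, its argument order, and the fixed /data mount point are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Container Profiler): print the
# docker command for a profiling run without executing it.
# $1 = host output directory
# $2 = task string to profile
# $3 = optional sampling interval in seconds (enables time series mode)
profile_cmd() {
    out_dir=$1
    tasks=$2
    interval=${3:-}
    t_flag=""
    if [ -n "$interval" ]; then
        t_flag="-t $interval "
    fi
    printf 'sudo docker run --rm -v %s:/data profiler:custom profile %s-o /data "%s"\n' \
        "$out_dir" "$t_flag" "$tasks"
}

profile_cmd "$PWD" "sleep 5; ls -al"      # single profile of the whole run
profile_cmd "$PWD" "sleep 5; ls -al" 1    # time series, 1-second sampling
```

Printing the command first makes it easy to review the volume mapping and arguments before running it with sudo.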

How do I build a new container that integrates the ContainerProfiler into an existing container

You will need access to the Dockerfile used to build your container. Your container should already be configured to run a specified task or application; the build script simply integrates the Container Profiler so that the container is easy to profile. Point the build script at the folder containing your Dockerfile and any other dependencies.

sudo ./build.sh -d DOCKER_FILE_PATH

For example:

sudo ./build.sh -d docker/sysbench.docker

docker/sysbench.docker:

FROM ubuntu:20.04
MAINTAINER varikmp<varikmp@uw.edu>
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get -y update \
    && apt-get install -y sysbench \
    && rm -rf /var/lib/apt/lists/*
ENTRYPOINT ["sysbench"]

3. How do I profile my container once I've integrated the ContainerProfiler

sudo docker run --rm \
    -e TOOL=profile \
    -e TOOL_ARGUMENTS="-o /data" \
    -v ${PWD}:/data \
    profiler:CONTAINER_TAG YOUR_ARGUMENTS_GO_HERE

For example:

sudo docker run --rm \
    -e TOOL=profile \
    -e TOOL_ARGUMENTS="-o /data" \
    -v ${PWD}:/data \
    profiler:sysbench --test=cpu --cpu-max-prime=20000 --max-requests=4000 run

4. How do I perform time series sampling on my container once I've integrated the ContainerProfiler

The idea is to add the '-t' argument to specify a time series sampling interval. (e.g. '-t 1' for 1-second sampling)

sudo docker run --rm \
    -e TOOL=profile \
    -e TOOL_ARGUMENTS="-t TIME_INTERVAL -o /data" \
    -v ${PWD}:/data \
    profiler:CONTAINER_TAG YOUR_ARGUMENTS_GO_HERE

For example:

sudo docker run --rm \
    -e TOOL=profile \
    -e TOOL_ARGUMENTS="-t 1 -o /data" \
    -v ${PWD}:/data \
    profiler:sysbench --test=cpu --cpu-max-prime=20000 --max-requests=4000 run

How do I build a container that integrates the ContainerProfiler to profile software where I provide an installation script

Here the installation script should install all software dependencies required to run the application. It is not necessary to preface installation commands with 'sudo'.

sudo ./build.sh -i INSTALL_SCRIPT_PATH

For example:

sudo ./build.sh -i docker/install.sh

docker/install.sh:

# do NOT remove this command
apt-get update

# fill up your additional steps for the package installation
apt-get install -y build-essential gcc nano git \
	sysbench && echo "test"

# remove unnecessary build packages for execution
apt-get remove -y gcc build-essential nano git

# clean up the installation
apt-get autoclean -y && apt-get autoremove -y --purge && rm -rf /var/lib/apt/lists/* && rm -rf /var/cache/apk*

You will be asked to enter an entry point based on the software you attempt to install in your install script. The entry point is the name of the command (without any arguments) that will be run. For example, if the installation script installs sysbench, then the name of the command will be 'sysbench'. Later, when running the container you do not need to specify the command again, but just the arguments that are to be passed to the command.

5. How do I profile a task or application installed using the installation script

After the container name 'profiler:sysbench' you will need to specify the command line arguments for the application being profiled.

| Short Name | Long Name | Optional | Description |
| --- | --- | --- | --- |
| -o | --output-directory | No | specify the output directory for profiling files in JSON format |
| -m | --metric-level | Yes | specify the metric levels (v: virtual machine, c: container, p: process) |
| -t | --time-steps | Yes | specify the sampling interval between two snapshots (e.g. 1 for 1-second sampling) |
| -c | --clean-up | Yes | clean up the profiling files from the previous run |

sudo docker run --rm \
    -e TOOL=profile \
    -e TOOL_ARGUMENTS="-o /data" \
    -v ${PWD}:/data \
    profiler:CONTAINER_TAG YOUR_ARGUMENTS_GO_HERE

For example:

sudo docker run --rm \
	-e TOOL=profile \
	-e TOOL_ARGUMENTS="-o /data" \
	-v ${PWD}:/data \
	 profiler:sysbench --test=cpu --cpu-max-prime=20000 --max-requests=4000 run

6. How do I perform time series sampling of the task or application installed using the installation script

After the container name 'profiler:sysbench' you will need to specify the command line arguments for the application being profiled.

In addition, add the '-t' argument to specify a time series sampling interval. (e.g. '-t 1' for 1-second sampling)

sudo docker run --rm \
    -e TOOL=profile \
    -e TOOL_ARGUMENTS="-t TIME_INTERVAL -o /data" \
    -v ${PWD}:/data \
    profiler:CONTAINER_TAG YOUR_ARGUMENTS_GO_HERE

For example:

sudo docker run --rm \
	-e TOOL=profile \
	-e TOOL_ARGUMENTS="-t 1 -o /data" \
	-v ${PWD}:/data \
	 profiler:sysbench --test=cpu --cpu-max-prime=20000 --max-requests=4000 run

Delta: a tool to compute the delta statistics of resource utilization between time instances

After receiving profiling files from the previous step, we run the delta option to generate delta statistics in JSON format.

| Short Name | Long Name | Optional | Description |
| --- | --- | --- | --- |
| -i | --input-directory | No | specify the input directory for calculating aggregate values in JSON format |
| -o | --output-directory | No | specify the output directory for calculating aggregate values in JSON format |
| -a | --aggregate-config-file | Yes | specify the aggregate configuration file |
| -c | --clean-up | Yes | clean up the aggregate files from the previous run |

sudo docker run --rm \
	-e TOOL=delta \
	-e TOOL_ARGUMENTS="-i /data -o /data" \
	-v ${PWD}:/data \
	 profiler:sysbench
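
Conceptually, a delta over a cumulative counter is the difference between its value in a later snapshot and its value in an earlier one, and dividing by the elapsed currentTime gives an average rate. The sketch below illustrates that arithmetic only. `counter_rate` is a hypothetical helper, the sample numbers are made up, and the delta tool's actual computation may differ.

```shell
#!/bin/sh
# Hypothetical illustration (not the delta tool itself): delta and average
# rate of a cumulative counter between two snapshots.
# $1, $2 = counter value in the first and last snapshot
# $3, $4 = currentTime (seconds since the epoch) of each snapshot
counter_rate() {
    awk -v v0="$1" -v v1="$2" -v t0="$3" -v t1="$4" 'BEGIN {
        delta = v1 - v0
        elapsed = t1 - t0
        if (elapsed <= 0) {
            print "elapsed time must be positive" > "/dev/stderr"
            exit 1
        }
        printf "delta=%d rate=%.2f/s\n", delta, delta / elapsed
    }'
}

# Made-up sample values: 600 additional units over 5 seconds.
counter_rate 1000 1600 1700000000 1700000005   # prints delta=600 rate=120.00/s
```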

CSV generator: a tool to generate the statistics of resource utilization in CSV format

We need to specify the directory that holds statistic files. Those files are generated from the delta tool.

| Short Name | Long Name | Optional | Description |
| --- | --- | --- | --- |
| -i | --input-directory | No | specify the input directory of aggregate files |
| -o | --csv-output-file | No | specify the output file for CSV file generation |
| -w | --overwrite | Yes | overwrite the CSV file from the previous run |

sudo docker run --rm \
	-e TOOL=csv \
	-e TOOL_ARGUMENTS="-i /data -o /data/delta.csv" \
	-v ${PWD}:/data \
	 profiler:sysbench

Graph: a tool to generate graphs from the statistics CSV file

The tool generates graphs based on the statistics file in CSV format. A metric configuration file can also be provided to select the metrics to graph.

| Short Name | Long Name | Optional | Description |
| --- | --- | --- | --- |
| -r | --csv-input-file | No | specify the aggregate CSV file |
| -m | --metric-input-file | Yes | specify the metric file specifying metrics for graphing |
| -g | --graph-output-directory | No | specify the output directory for graph images |
| -s | --single-plot | Yes | plot single curve on a graph |

sudo docker run --rm \
	-e TOOL=graph \
	-e TOOL_ARGUMENTS="-r /data/delta.csv -g /data" \
	-v ${PWD}:/data \
	 profiler:sysbench

Sysbench profiling example

Starting from a fresh checkout of the ContainerProfiler sources, here is how to build a separate Container with a benchmark application (sysbench), and then use the ContainerProfiler to profile resource utilization.

# From ContainerProfiler checkout directory

# build ContainerProfiler
sudo ./build.sh

# Create a new Docker container to encapsulate sysbench for benchmarking:
mkdir sysb

# Create Dockerfile for sysbench container:
gedit sysb/sysbench

Here is the content of the sysbench Dockerfile:

FROM ubuntu
RUN apt-get update
RUN apt-get install -y sysbench

Next build the sysbench Docker container integrating the ContainerProfiler tool:

# build sysbench container integrating ContainerProfiler
sudo ./build.sh -d docker/sysbench.docker

docker/sysbench.docker:

FROM ubuntu:20.04
MAINTAINER varikmp<varikmp@uw.edu>
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get -y update \
    && apt-get install -y sysbench \
    && rm -rf /var/lib/apt/lists/*
ENTRYPOINT ["sysbench"]

Now perform delta resource utilization profiling to measure the resource consumption of running sysbench. All output files will go under the local data directory.

# make data directory
mkdir data

# profile sysbench
sudo docker run --rm -e TOOL=profile -e TOOL_ARGUMENTS="-o /data" -v ${PWD}/data:/data profiler:sysbench \
'sysbench --test=cpu --cpu-max-prime=2000000 --num-threads=2 --max-requests=10 run'

# calculate delta values
sudo docker run --rm -e TOOL=delta -e TOOL_ARGUMENTS="-i /data -o /data" -v ${PWD}/data:/data profiler:sysbench

# create csv output
sudo docker run --rm -e TOOL=csv -e TOOL_ARGUMENTS="-w -i /data -o /data/output.csv" -v ${PWD}/data:/data \
profiler:sysbench

Under the data directory, inspect the raw resource utilization sampling files which should be named using unique date/time stamps. Also you will find the delta JSON and delta CSV output files. Static.json contains attributes sampled by the ContainerProfiler that do not change.
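
A quick way to see which metrics ended up in the CSV output is to list its column names. The sketch below assumes the file starts with a comma-separated header row without quoted commas, which has not been verified against the tool's actual output. `csv_columns` is a hypothetical helper and the sample file contents are made up for demonstration.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Container Profiler): print the
# column names of a CSV file, one per line.
# Assumes the first row is a comma-separated header with no quoted commas.
csv_columns() {
    head -n 1 "$1" | tr ',' '\n'
}

# Demonstration on a made-up sample file; real output will differ.
sample=$(mktemp)
printf 'currentTime,vCpuTime,cCpuTime\n1700000000,120,3500000\n' > "$sample"
csv_columns "$sample"
rm -f "$sample"
```

For the example above, the equivalent call on real data would be `csv_columns data/output.csv`.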