Home

Awesome

nvidia-pstated

A daemon that automatically manages the performance states of NVIDIA GPUs.

How it works

flowchart TD
    A(Start) --> B
    subgraph For each GPU
    B("Check temperature[1]") -->|Below threshold| D
    B("Check temperature[1]") -->|Above threshold| C
    C("Enter low PState[2]") --> M
    D(Check utilization) -->|Is 0%| E
    D(Check utilization) -->|Is not 0%| J
    E(Check current PState) -->|High| F
    E(Check current PState) -->|Low| I
    F("Iterations counter exceeded threshold[3]") -->|Yes| G
    F("Iterations counter exceeded threshold[3]") -->|No| H
    G("Enter low PState[2]") --> H
    H(Increment iterations counter) --> M
    I(Do nothing) --> M
    J(Check current PState) -->|High| K
    J(Check current PState) -->|Low| L
    K(Reset iterations counter) --> M
    L("Enter high PState[4]") --> M
    end
    M(End) --> N
    N("Sleep[5]") --> A

1 - Threshold is controlled by option --temperature-threshold (default: 80 degrees C)
2 - Value is controlled by option --performance-state-low (default: 8)
3 - Threshold is controlled by option --iterations-before-switch (default: 30 iterations)
4 - Value is controlled by option --performance-state-high (default: 16)
5 - Value is controlled by option --sleep-interval (default: 100 milliseconds)

Installation

Prerequirements

Linux

Make sure the proprietary NVIDIA driver is installed.

You will need the following libraries:

Packages that provide these libraries:

On Debian derivatives, you can use apt search libnvidia-api.so.1 and apt search libnvidia-ml.so.1 to find the package you need.

Note that you MUST run this daemon at the host level, i.e. where the CUDA Driver is available. You can NOT run this daemon in a container.

nvidia-container-stack

Windows

Make sure the NVIDIA driver is installed.

Installation

Download the latest version of the executable for your OS from releases.

Building from source code

Prerequirements

Building

# Configure
cmake -B build

# Build
cmake --build build

Misc

Managing only specific GPUs

You can use -i/--ids option to manage only specific GPUs.

Suppose you have 8 GPUs and you want to manage only the first 4 (as in nvidia-smi):

./nvidia-pstated -i 0,1,2,3

systemd service

Install nvidia-pstated in /usr/local/bin. Then save the following as /etc/systemd/system/nvidia-pstated.service.

[Unit]
Description=A daemon that automatically manages the performance states of NVIDIA GPUs
StartLimitInterval=0

[Service]
DynamicUser=yes
ExecStart=/usr/local/bin/nvidia-pstated
Restart=on-failure
RestartSec=1s

[Install]
WantedBy=multi-user.target

vGPU manager

If you are using a hypervisor (KVM) with a vGPU manager, you cannot run nvidia-pstated in virtual machines. Instead, you can run it at the hypervisor level.

To do this, you need to:

  1. Extract libnvidia-api.so.1 from your guest driver (in my case Guest_Drivers/nvidia-linux-grid-535_535.183.06_amd64.deb/data.tar.xz/usr/lib/x86_64-linux-gnu/libnvidia-api.so.1) to some directory.
  2. Download nvidia-pstated to the same directory.
  3. Try running nvidia-pstated: LD_LIBRARY_PATH=. ./nvidia-pstated. You should get the following:
    $ LD_LIBRARY_PATH=. ./nvidia-pstated
    NvAPI_Initialize(): NVAPI_ERROR
    
    Check dmesg, you should get the following message:
    NVRM: API mismatch: the client has the version 535.183.06, but
    NVRM: this kernel module has the version 535.183.04.  Please
    NVRM: make sure that this kernel module and all NVIDIA driver
    NVRM: components have the same version.
    
  4. Use sed -i 's/535.183.06/535.183.04/g' libnvidia-api.so.1 (replace the values with what you got in dmesg) to replace the client version in libnvidia-api.so.1.
  5. Run nvidia-pstated: LD_LIBRARY_PATH=. ./nvidia-pstated. Enjoy.