Add sysctl knobs for protecting the working set

Protecting clean file pages (the page cache) can prevent thrashing, reduce I/O under memory pressure, avoid high latency, and prevent livelock in near-OOM conditions. The current le9 patches are based on patches originally created by Mandeep Singh Baines (2010) and Marcus Linsner (2018-2019). Let's give the floor to the original authors:

On ChromiumOS, we do not use swap. When memory is low, the only way to free memory is to reclaim pages from the file list. This results in a lot of thrashing under low memory conditions. We see the system become unresponsive for minutes before it eventually OOMs. We also see very slow browser tab switching under low memory. Instead of an unresponsive system, we'd really like the kernel to OOM as soon as it starts to thrash. If it can't keep the working set in memory, then OOM. Losing one of many tabs is a better behaviour for the user than an unresponsive system.

This patch creates a new sysctl, min_filelist_kbytes, which disables reclaim of file-backed pages when there are fewer than min_filelist_kbytes worth of such pages in the cache. This tunable is handy for low memory systems using solid-state storage where interactive response is more important than not OOMing.

With this patch and min_filelist_kbytes set to 50000, I see very little block layer activity during low memory. The system stays responsive under low memory and browser tab switching is fast. Eventually, a process gets killed by OOM. Without this patch, the system gets wedged for minutes before it eventually OOMs.

https://lore.kernel.org/lkml/20101028191523.GA14972@google.com/

The attached kernel patch (applied on top of 4.18.5) that I've tried, almost completely eliminates the disk thrashing (the constant reading of executable (and .so) files on every context switch) associated with freezing the OS and so, with this patch, the OOM-killer is triggered within a maximum of 1 second when it is needed, rather than, without this patch, freeze the OS for minutes (or just a long time, it may even auto reboot depending on your kernel .config options set to panic (reboot) on hang after xx seconds) with constant disk reading well before OOM-killer gets triggered.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/159356/comments/89

The original le9 patches (by Marcus Linsner) protected active file pages. Current versions (le9ec) allow protecting a specified amount of clean file pages and anonymous pages.

le9ec patch

The kernel does not provide a way to protect the working set under memory pressure. Userspace needs a certain amount of anonymous and clean file pages for normal operation. Above all, it needs a cache of shared libraries and executable binaries. If the amount of clean file pages falls below a certain level, thrashing and even livelock can occur.

The patch provides sysctl knobs for protecting the working set (anonymous and clean file pages) under memory pressure.

The vm.anon_min_kbytes sysctl knob provides hard protection of anonymous pages. The anonymous pages on the current node won't be reclaimed under any conditions when their amount is below vm.anon_min_kbytes. This knob may be used to prevent excessive swap thrashing when anonymous memory is low (for example, when memory is about to be filled with compressed data from the zram module).

The vm.clean_low_kbytes sysctl knob provides best-effort protection of clean file pages. The file pages on the current node won't be reclaimed under memory pressure when the amount of clean file pages is below vm.clean_low_kbytes unless we threaten to OOM. Protection of clean file pages using this knob may be used when swapping is still possible to

The vm.clean_min_kbytes sysctl knob provides hard protection of clean file pages. The file pages on the current node won't be reclaimed under memory pressure when the amount of clean file pages is below vm.clean_min_kbytes. Hard protection of clean file pages using this knob may be used to
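The three rules above can be summarized as a small decision model. This is only an illustrative sketch of the behaviour described in the text, not the kernel code from the patch: the `Knobs` class, the `reclaim_targets` function and the `near_oom` flag (standing in for "we threaten to OOM") are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Knobs:
    """Knob values in KiB; 0 disables the corresponding protection."""
    anon_min_kbytes: int = 0
    clean_low_kbytes: int = 0
    clean_min_kbytes: int = 0


def reclaim_targets(knobs, anon_kb, clean_file_kb, near_oom):
    """Return (may_reclaim_anon, may_reclaim_file) for one node.

    Illustrative model only; near_oom is a hypothetical flag.
    """
    # Hard protection: anonymous pages are not reclaimed below the floor.
    may_reclaim_anon = anon_kb >= knobs.anon_min_kbytes

    may_reclaim_file = True
    # Best-effort protection: dropped once an OOM is imminent.
    if clean_file_kb < knobs.clean_low_kbytes and not near_oom:
        may_reclaim_file = False
    # Hard protection: holds even when an OOM is imminent.
    if clean_file_kb < knobs.clean_min_kbytes:
        may_reclaim_file = False

    return may_reclaim_anon, may_reclaim_file
```

In this model, setting vm.clean_low_kbytes higher than vm.clean_min_kbytes gives a soft band above the hard floor: inside the band file pages are spared only while reclaim has other options, below the floor they are always spared.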

The le9ec patches provide three sysctl knobs (vm.anon_min_kbytes, vm.clean_low_kbytes, vm.clean_min_kbytes) that default to zero, so the working set is not protected by default (CONFIG_ANON_MIN_KBYTES=0, CONFIG_CLEAN_LOW_KBYTES=0, CONFIG_CLEAN_MIN_KBYTES=0). You can specify other values when building the kernel, or change the knob values on the fly.
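As a minimal sketch of changing the knobs at runtime: sysctl names under `vm.` map to files under `/proc/sys/vm/`, so the knobs can be read and written like ordinary files. The helper names `read_knobs` and `write_knob` are hypothetical, and the knob files exist only on a kernel built with the patch.

```python
from pathlib import Path

KNOBS = ("anon_min_kbytes", "clean_low_kbytes", "clean_min_kbytes")


def read_knobs(vm_dir="/proc/sys/vm"):
    """Return {knob: value in KiB}, or None where the knob file is absent."""
    values = {}
    for name in KNOBS:
        path = Path(vm_dir) / name
        values[name] = int(path.read_text()) if path.exists() else None
    return values


def write_knob(name, kbytes, vm_dir="/proc/sys/vm"):
    """Set one knob; writing under /proc/sys normally requires root."""
    if name not in KNOBS:
        raise ValueError(f"unknown knob: {name}")
    if kbytes < 0:
        raise ValueError("value must be non-negative")
    path = Path(vm_dir) / name
    if not path.exists():
        # Unpatched kernel (or wrong directory): do not create a stray file.
        raise FileNotFoundError(path)
    path.write_text(f"{kbytes}\n")
```

For example, `write_knob("clean_low_kbytes", 150000)` has the same effect as `sysctl -w vm.clean_low_kbytes=150000`; the value 150000 is an arbitrary illustration, not a recommendation.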

Effects

Note that the effects depend on the values of the sysctl tunables.

Testing

These tools may be used to monitor memory and PSI metrics during stress tests:

Please report your results here.
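For a quick look at memory pressure without extra tooling, PSI can also be read directly from `/proc/pressure/memory` on kernels built with PSI support. The following is a minimal sketch; `parse_psi` and `read_memory_pressure` are hypothetical helper names.

```python
def parse_psi(text):
    """Parse PSI text into {"some": {...}, "full": {...}} with float values."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # First field is the kind ("some" or "full"), the rest are key=value.
        kind, pairs = fields[0], fields[1:]
        result[kind] = {k: float(v) for k, v in (p.split("=", 1) for p in pairs)}
    return result


def read_memory_pressure(path="/proc/pressure/memory"):
    """Read and parse the current memory pressure figures."""
    with open(path) as f:
        return parse_psi(f.read())
```

Polling `read_memory_pressure()["full"]["avg10"]` once a second during a stress test shows the percentage of time over the last 10 seconds in which all non-idle tasks were stalled on memory.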

Demo

Warning

Review at LKML

What about non-x86?

No data. Testing is encouraged. Please report your results here.

le9 and Multigenerational LRU Framework

User feedback

See USER_FEEDBACK.md.

Resources

See RESOURCES.md.