bypass4netns: Accelerator for slirp4netns using SECCOMP_IOCTL_NOTIF_ADDFD (Kernel 5.9)
bypass4netns is as fast as --net=host
and almost as secure as traditional slirp4netns.
The current version of bypass4netns needs to be used in conjunction with slirp4netns; however, a future version may work without slirp4netns.
Benchmark
Workload: iperf3 -c HOST_IP from podman run
- --net=host (insecure): 57.9 Gbps
- bypass4netns: 56.5 Gbps
- slirp4netns: 7.56 Gbps
How it works
bypass4netns eliminates the overhead of slirp4netns by trapping socket syscalls and executing them in the host network namespace using SECCOMP_IOCTL_NOTIF_ADDFD.
See also the talks.
Requirements
- kernel >= 5.9
- runc >= 1.1, or crun >= 1.6
- libseccomp >= 2.5
- Rootless Docker, Rootless Podman, or Rootless containerd/nerdctl
Build-time requirement:
- golang >= 1.17
Compile
make
sudo make install
The following binaries will be installed into /usr/local/bin:
- bypass4netns: the bypass4netns binary.
- bypass4netnsd: an optional REST daemon for controlling bypass4netns processes from a non-initial network namespace. Used by nerdctl.
Usage
Hard way (docker|podman|nerdctl)
$ bypass4netns --ignore="127.0.0.0/8,10.0.0.0/8,auto" -p="8080:80"
--ignore=... is a list of the CIDRs that cannot be bypassed:
- loopback CIDRs (127.0.0.0/8)
- slirp4netns CIDR (10.0.0.0/8)
- CNI CIDRs inside the slirp's network namespace (auto)
$ ./test/seccomp.json.sh >$HOME/seccomp.json
$ $DOCKER run -it --rm --security-opt seccomp=$HOME/seccomp.json --runtime=runc alpine
$DOCKER is either docker, podman, or nerdctl.
Easy way (nerdctl)
bypass4netns is experimentally integrated into nerdctl (>= 0.17.0).
containerd-rootless-setuptool.sh install-bypass4netnsd
nerdctl run -it --rm -p 8080:80 --annotation nerdctl/bypass4netns=true alpine
NOTE: nerdctl prior to v2.0 needs --label instead of --annotation.
Also, the syntax will probably be replaced with --security-opt or something like --network-opt in a future version of nerdctl.
:warning: Caveats :warning:
Accesses to host abstract sockets and host loopback IPs (127.0.0.0/8) from containers are designed to be rejected.
However, it is probably possible to connect to host loopback IPs by exploiting TOCTOU of struct sockaddr * pointers.
TODOs
- Integration for Docker
- Integration for Podman
- Enable connecting to port-forwarded ports from other containers
  - This means that a container with a publish option like -p 8080:80 cannot be reached on port 80 from other containers in the same network namespace
- Handle protocol-specific publish options like -p 8080:80/udp.
  - Currently, bypass4netns ignores the protocol in the publish option.
- Bind the port when bypass4netns starts with a publish option like -p 8080:80
  - Currently, bypass4netns binds the socket to port 8080 when it handles bind(2) with target port 80.
  - bind(2) can fail if another process binds port 8080 before the container's process binds port 80
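
The publish behavior described in the last TODO item (a bind(2) on container port 80 ends up on host port 8080) can be sketched in Go as below. This is a minimal illustration, not the project's code: portMapping, parsePublish, and hostPortFor are hypothetical names, and the protocol suffix is dropped on the assumption that it mirrors the current "protocol is ignored" behavior noted above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// portMapping is a hypothetical type pairing a host port with a container port.
type portMapping struct {
	HostPort      int
	ContainerPort int
}

// parsePublish parses "HOST:CONTAINER" or "HOST:CONTAINER/PROTO".
// Any protocol suffix is discarded in this sketch.
func parsePublish(spec string) (portMapping, error) {
	if i := strings.IndexByte(spec, '/'); i >= 0 {
		spec = spec[:i]
	}
	parts := strings.Split(spec, ":")
	if len(parts) != 2 {
		return portMapping{}, fmt.Errorf("expected HOST:CONTAINER, got %q", spec)
	}
	ports := make([]int, 2)
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 1 || n > 65535 {
			return portMapping{}, fmt.Errorf("invalid port %q", p)
		}
		ports[i] = n
	}
	return portMapping{HostPort: ports[0], ContainerPort: ports[1]}, nil
}

// hostPortFor returns the host port to use when the container binds
// containerPort. The second result is false when no mapping matches;
// what should happen in that case is left open here.
func hostPortFor(containerPort int, mappings []portMapping) (int, bool) {
	for _, m := range mappings {
		if m.ContainerPort == containerPort {
			return m.HostPort, true
		}
	}
	return 0, false
}

func main() {
	m, err := parsePublish("8080:80")
	if err != nil {
		panic(err)
	}
	if hp, ok := hostPortFor(80, []portMapping{m}); ok {
		fmt.Printf("bind(2) on container port 80 -> host port %d\n", hp)
	}
}
```

Because the lookup in this sketch only happens when the container calls bind(2), nothing reserves the host port beforehand, which is the race the TODO item describes.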
Publications
- Naoki Matsumoto and Akihiro Suda. Accelerating TCP/IP Communications in Rootless Containers by Socket Switching. Presented in the 156th meeting of the Special Interest Groups on System Software and Operating System (SIGOS), SWoPP 2022, Shimonoseki, Japan, July 2022.
- Naoki Matsumoto and Akihiro Suda. bypass4netns: Accelerating TCP/IP Communications in Rootless Containers. arXiv:2402.00365 [cs.NI], February 2024.