Home

Awesome

Kinvolk Seccomp Agent

The Kinvolk Seccomp Agent is receiving seccomp file descriptors from container runtimes and handling system calls on behalf of the containers. Its goal is to support different use cases:

It is possible to write your own seccomp agent with a different behaviour by reusing the packages in the pkg/ directory. The Kinvolk Seccomp Agent is only about 100 lines of code. It relies on different packages:

Basic demo

sudo ./seccompagent -resolver=demo-basic

Demo of mount in a container without CAP_SYS_ADMIN:

/ # mount -t proc proc root
/ # ls /root/self/cmdline
/root/self/cmdline
/ # mkdir /abc
/ # ls -1d /ab*
/abc-pid-4072889
/ # chmod 777 /
chmod: /: Bad message

Demo on Kubernetes

Before you install the demo on k8s, please ensure all the requirements are satisfied.

This demo shows that the Seccomp Agent can have different behaviour depending on the Kubernetes pod (in this case, the pod's namespace and name).

{
   "architectures" : [
      "SCMP_ARCH_X86",
      "SCMP_ARCH_X32"
   ],
   "defaultAction" : "SCMP_ACT_ALLOW",
   "listenerPath": "/run/seccomp-agent.socket",
   "listenerMetadata": "MKDIR_TMPL=-{{.Namespace}}-{{.Pod}}-{{.Container}}\nEXEC_PATTERN=/bin/true\nEXEC_DURATION=2s\nMOUNT_PROC=true",
   "syscalls" : [
      {
         "action" : "SCMP_ACT_NOTIFY",
         "names" : [
            "openat",
            "open",
            "mkdir",
            "mount",
	    "chmod"
         ]
      }
   ]
}
kubectl apply -f deploy/seccompagent.yaml
apiVersion: v1
kind: Pod
metadata:
  name: mynotifypod
  # For older versions of Kubernetes (this annotation was deprecated in
  # Kubernetes v1.19 and completely removed in v1.27):
  annotations:
    seccomp.security.alpha.kubernetes.io/pod: localhost/notify.json
spec:
  restartPolicy: Never
  securityContext:
    # /var/lib/kubelet/seccomp/notify.json
    seccompProfile:
      type: Localhost
      localhostProfile: notify.json
  containers:
  - name: container1
    image: busybox
    command: ["sh"]
    args: ["-c", "sleep infinity"]
$ kubectl exec -it mynotifypod -- /bin/sh
/ # mkdir /abc
/ # ls -1d /abc*
/abc-default-mynotifypod-TODO
/ # mount -t proc proc root
/ # mount|grep /root
proc on /root type proc (rw,relatime)
/ # time -f %E /bin/echo -n ""
0m 0.00s
/ # time -f %E /bin/true
0m 2.00s

Combining with user namespaces

By combining this with Kubernetes's user namespace support it is possible to allow a user within a user namespace to perform some operations which would otherwise be limited to host root.

One example is mounting other filesystem types. This is most useful combined with user namespaces to allow mounting network file systems while a pod is running. This is far safer than giving the container privileged access but does expose more of the kernel to the pod, so you should consider your security carefully.

There is a possibility a process could change its user namespace after making the mount system call, which could result in a confusing state. To fix this the seccomp notify policy should use the SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV flag, however this is not yet available in runc and requires Linux >= 5.19.

Configure a policy, similar to above, but with the following metadata:

{
   "architectures" : [
      "SCMP_ARCH_X86",
      "SCMP_ARCH_X32"
   ],
   "defaultAction" : "SCMP_ACT_ALLOW",
   "listenerPath": "/run/seccomp-agent.socket",
   "listenerMetadata": "MOUNT_OTHER_FS_LIST=cifs\nMOUNT_NEED_CAP_ADMIN=true",
   "syscalls" : [
      {
         "action" : "SCMP_ACT_NOTIFY",
         "names" : [
            "mount"
         ]
      },
      {
         "action" : "SCMP_ACT_ALLOW",
         "names" : [
            "umount"
         ]
      }
   ]
}

(Policy cut down for sake of example, recommended to use a full policy that additionally configures notify for mount and allows umount.)

This has currently been successfully tested with cifs. Other filesystem types should work; NFS will need NFS client utilities installing within the container and on the host (e.g. to make upcalls work).

apiVersion: v1
kind: Pod
metadata:
  name: mynotifypod-userns
spec:
  restartPolicy: Never
  # Needs "UserNamespacesSupport" feature gate currently
  hostUsers: false
  securityContext:
    # /var/lib/kubelet/seccomp/notify.json
    seccompProfile:
      type: Localhost
      localhostProfile: notify.json
  containers:
  - name: container1
    image: alpine
    command: ["sh"]
    args: ["-c", "sleep infinity"]
    securityContext:
      capabilities:
        # This is safe combined with hostUsers: false
        add: [SYS_ADMIN]
$ kubectl exec -it mynotifypod-userns -- /bin/sh
/ # mkdir /mnt
/ # mount -t cifs -o username=user,password=pass '//10.0.0.1/C' /mnt
/ # df -h /mnt
/mnt # df -h /mnt
Filesystem                Size      Used Available Use% Mounted on
//10.0.0.1/C           95.4G     85.3G     10.1G  89% /mnt
/ # ls /mnt
$Recycle.Bin               Documents and Settings     Program files
[...]
/ # sed -i 's!^\(nobody.*/\)false!\1sh!' /etc/passwd
/ # su nobody
/ $ mount -t cifs -o username=user,password=pass '//10.0.0.1/C' /mnt
mount: permission denied (are you root?)