SilentMoonWalk

Description

Unwinder provides a full weaponization of the SilentMoonWalk technique, making it possible to obtain complete and stable call stack spoofing in Rust.

This technique comes with the following characteristics:

Credits

Kudos to the creators of the SilentMoonWalk technique:

And of course a huge shoutout to namazso for the Twitter thread that inspired this whole project.

Usage

Import this crate into your project by adding the following line to your Cargo.toml, and compile in release mode:

[dependencies]
unwinder = "0.1.3"

The main functionality of this crate has been wrapped in two macros:

Using either macro requires importing the std::ffi::c_void data type.

Both macros return a *mut c_void that can be used to retrieve the value returned by the executed function. See the examples section for more details.
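As a minimal sketch of what "retrieving the value" can look like, the helpers below reinterpret a raw *mut c_void as the types most Windows functions return. The helper names (as_ntstatus, as_handle_value, as_bool) are hypothetical and are not part of unwinder; they only assume that the pointer carries the raw value of the return register.

```rust
use std::ffi::c_void;

/// Hypothetical helper: reads the raw return value as an NTSTATUS.
/// NTSTATUS is a 32-bit signed value, so only the low 32 bits are kept.
fn as_ntstatus(ret: *mut c_void) -> i32 {
    ret as usize as i32
}

/// Hypothetical helper: reads the raw return value as a pointer-sized
/// integer, which is how a handle value is usually represented.
fn as_handle_value(ret: *mut c_void) -> isize {
    ret as isize
}

/// Hypothetical helper: reads the raw return value as a Win32 BOOL,
/// where any non-zero 32-bit value means TRUE.
fn as_bool(ret: *mut c_void) -> bool {
    (ret as usize as u32) != 0
}
```

Plain `as` casts are enough here because the pointer is never dereferenced; it is only a container for an integer-sized result.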

call_function macro

This macro is used to call any desired function with a clean call stack. The macro expects the following parameters:

indirect_syscall macro

This macro is used to perform any desired indirect syscall with a clean call stack. The macro expects the following parameters:

Parameter passing

In order to pass arguments of different types to these two macros, the following considerations must be taken into account:

Examples

Calling Sleep

let k32 = dinvoke_rs::dinvoke::get_module_base_address("kernel32.dll");
let sleep = dinvoke_rs::dinvoke::get_function_address(k32, "Sleep"); // Memory address of kernel32.dll!Sleep()
let milliseconds = 1000i32; // Sleep for one second
unwinder::call_function!(sleep, false, milliseconds);

Calling OpenProcess

let k32 = dinvoke_rs::dinvoke::get_module_base_address("kernel32.dll"); 
let open_process: isize = dinvoke_rs::dinvoke::get_function_address(k32, "OpenProcess"); // Export names are case-sensitive
let desired_access: u32 = 0x1000;
let inherit = 0i32;
let pid = 20628i32;
let handle = unwinder::call_function!(open_process, false, desired_access, inherit, pid); // returns *mut c_void
let handle: HANDLE = std::mem::transmute(handle);
println!("Handle id: {:x}", handle.0);

Notice that the macro returns a *mut c_void that can be directly converted to a HANDLE, since both data types have the same size. This gives access to the value returned by OpenProcess, which is the new handle to the target process.

Calling NtDelayExecution as indirect syscall

let large = 0x8000000000000000 as u64; // Sleep indefinitely
let large: *mut i64 = std::mem::transmute(&large);
let alertable = false;
let ntstatus = unwinder::indirect_syscall!("NtDelayExecution", false, alertable, large); // returns *mut c_void
println!("ntstatus: {:x}", ntstatus as i32);

Notice that the macro returns a *mut c_void that can be used to retrieve the NTSTATUS returned by NtDelayExecution.
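The DelayInterval argument of NtDelayExecution is expressed in 100-nanosecond units, and a negative value means a delay relative to the current time. The following sketch computes such a value from milliseconds; the function name relative_delay_100ns is a hypothetical helper introduced here for illustration and is not part of unwinder.

```rust
/// Hypothetical helper: converts a duration in milliseconds into a relative
/// NtDelayExecution interval (negative, in 100-nanosecond units).
fn relative_delay_100ns(milliseconds: u32) -> i64 {
    // 1 millisecond = 10_000 units of 100 nanoseconds.
    // The largest u32 times 10_000 still fits in an i64, so this cannot overflow.
    -(i64::from(milliseconds) * 10_000)
}
```

A pointer to the resulting value, for instance `&mut interval as *mut i64`, would then take the place of `large` in the call above. The constant used in that example, 0x8000000000000000, is the bit pattern of i64::MIN, the longest relative delay that can be represented.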

Concatenate macro calls

Spoofed calls can be chained any number of times without the call stack growing abnormally. The execution flow is preserved as well, as the following code shows:

fn main()
{
	function_a();
}

fn function_a()
{
	unsafe
	{
		let func_b = function_b as usize;
		call_function!(func_b, false);
		println!("function_a done.");
	}
}

fn function_b()
{
	unsafe
	{
		let func_c = function_c as usize;
		call_function!(func_c, false);
		println!("function_b done.");
	}
}

fn function_c()
{
	unsafe
	{
		let large = 0x0000000000000000 as u64; // Zero delay: return to function_b right away so the preserved execution flow can be checked.
		let large: *mut i64 = std::mem::transmute(&large);
		let alertable = false;
		let ntstatus = unwinder::indirect_syscall!("NtDelayExecution", false, alertable, large);
		println!("ntstatus: {:x}", (ntstatus as usize) as i32); // NTSTATUS is an i32, although the second cast is not strictly required here.
	}
}

Considerations

Initial frame

If you set the second parameter of either macro to true, the spoofing process will try to keep the frame of the thread's start address in the call stack to increase legitimacy.

Call stack spoofed keeping the main module.

Sometimes the thread's start function does not call a subsequent function (e.g. a jmp instruction is executed instead), meaning that no return address is pushed to the stack. In that scenario (and also if you set the second parameter to false), the spoofed call stack will start at BaseThreadInitThunk's frame.

Call stack spoofed without main module.

PoC

To test the implementation of the technique, PE-sieve has been used with the /threads flag. The results show that inspecting the call stack does not reveal the presence of the payload when this crate's functionality is used. As can be seen in the second image, the payload is detected when unwinder is not used.

PE-sieve results when unwinder is used.

PE-sieve results when unwinder is not used.

Stack replacement

Technique description

This is a call stack spoofing alternative to SilentMoonWalk that keeps the call stack clean during the execution of your program. The main idea behind this technique is that each called function inside your module takes care of the previously pushed return address, finding at runtime a legitimate function with the same frame size as that of the return address to be spoofed. Once a legitimate function with the same frame size has been located, an offset within it is calculated and the resulting address replaces the last return address, hiding any anomalous entry in the call stack and keeping it unwindable. The original return address is stored by unwinder and moved back to its position in the stack before a return instruction is executed, so the program continues its normal flow.

<p align="center"> <img src="/images/stack_replacement.png" alt="Stack replacement" width="700" > </p>
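The replace/restore cycle described above can be pictured with a toy model. The sketch below does not touch a real stack and is not unwinder's implementation: the stack is a plain slice, and KnownFunction, Spoofer, the addresses and the frame sizes are all made-up names and values used only to illustrate the idea of matching frame sizes.

```rust
/// A legitimate function known to the toy model (all values are invented).
#[derive(Clone, Copy)]
struct KnownFunction {
    start: u64,       // hypothetical start address of the function
    frame_size: u64,  // stack bytes its frame occupies
    call_offset: u64, // offset of an address inside the function body
}

/// Finds a function whose frame size matches the frame to be spoofed and
/// returns an address inside it, or None if no candidate exists.
fn find_replacement(table: &[KnownFunction], frame_size: u64) -> Option<u64> {
    table
        .iter()
        .find(|f| f.frame_size == frame_size)
        .map(|f| f.start + f.call_offset)
}

/// Remembers the original return addresses so they can be put back later.
struct Spoofer {
    saved: Vec<(usize, u64)>, // (stack slot index, original return address)
}

impl Spoofer {
    /// Overwrites the return address at `slot` with a matching legitimate one.
    /// Returns false and leaves the stack untouched if nothing matches.
    fn replace(&mut self, stack: &mut [u64], slot: usize, frame_size: u64, table: &[KnownFunction]) -> bool {
        match find_replacement(table, frame_size) {
            Some(address) => {
                self.saved.push((slot, stack[slot]));
                stack[slot] = address;
                true
            }
            None => false,
        }
    }

    /// Moves the most recently saved return address back to its slot.
    fn restore(&mut self, stack: &mut [u64]) -> bool {
        match self.saved.pop() {
            Some((slot, original)) => {
                stack[slot] = original;
                true
            }
            None => false,
        }
    }
}
```

Saving the originals in last-in, first-out order mirrors nested calls: the innermost function restores its return address first.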

This is an experimental feature. Although it is fully functional, it is still under development and research, so make sure to test your code if you decide to integrate this technique into it.

How to use it

To use the stack replacement functionality, add the following line to your Cargo.toml and compile in release mode:

[dependencies]
unwinder = {version = "0.1.3", features = ["Experimental"]}

The main functionality of this feature has been wrapped in the following macros:

Using these macros requires importing the std::ffi::c_void data type. Every function that uses any of these macros should be labeled with the #[no_mangle] or #[inline(never)] attribute to prevent the Rust compiler from inlining it during optimization.

Before diving into a practical example of how to use all of this, here is a quick look at the replace_and_call/replace_and_syscall pair of macros and how to pass them the expected arguments.

replace_and_call

This macro is used to call any desired function outside of the current module with a clean call stack while using stack replacement. The macro expects the following parameters:

replace_and_syscall

This macro is used to perform any desired indirect syscall with a clean call stack while using stack replacement. The macro expects the following parameters:

Example

I think the best way to show how these macros are used is through a practical example. Let's suppose we are creating a DLL that will be reflectively injected into memory. This DLL exports two functions, ExportedA and ExportedB, so we will consider these two functions the module's entry points. Both of them must call the start_replacement macro right at the beginning, and they must also call its counterpart, the end_replacement macro, before returning. The start_replacement macro expects the module's base address as its argument; if you don't know that address at runtime you can pass 0, and the macro will try to figure it out by itself.

#[no_mangle]
fn ExportedA(base_address: usize) -> bool
{
    unwinder::start_replacement!(base_address);
    ...
    unwinder::end_replacement!();

    true
}

#[no_mangle]
fn ExportedB() -> bool
{
    unwinder::start_replacement!(0);
    ...
    unwinder::end_replacement!();

    true
}

Starting the stack replacement process involves the manual crafting of a new stack that will be used until the end_replacement macro is called. The following picture illustrates what is going on under the hood:

<p align="center"> <img src="/images/start_stack_replacement.png" alt="Stack replacement" width="700" > </p>

Although theoretically it would not be necessary to start a new stack from scratch, I've decided to implement the process this way to ensure stability and to prevent anything from breaking.

Now, let's assume that our ExportedA function makes several calls to two other internal functions. When either of them is called, the return address pushed to the stack points to some place within ExportedA, which would break the call stack unless we take care of it. These internal functions are therefore responsible for replacing and later restoring that original return address, which is done by wrapping the internal function's code between the replace_and_continue and restore macros:

#[no_mangle]
fn ExportedA(base_address: usize) -> bool
{
    unwinder::start_replacement!(base_address);
    let ret_a = internal_a();
    let ret_b = internal_b(ret_a);
    unwinder::end_replacement!();

    ret_b
}

#[inline(never)] // This attribute is mandatory
fn internal_a() -> bool
{
    unwinder::replace_and_continue!();
    ...
    unwinder::restore!();
    
    some_value
} 

#[inline(never)] // This attribute is mandatory
fn internal_b(value: bool) -> bool
{
    unwinder::replace_and_continue!();
    ...
    unwinder::restore!();
    
    some_value
} 

Finally, both internal_a and internal_b make use of some Windows API functionality. To keep the call stack unwindable, these calls should be performed through the replace_and_call (normal call) or replace_and_syscall (indirect syscall) macros.

#[no_mangle] // This attribute is mandatory
fn ExportedA(base_address: usize) -> bool
{
    unwinder::start_replacement!(base_address);
    let ret_a = internal_a();
    let ret_b = internal_b(ret_a);
    unwinder::end_replacement!();

    ret_b
}

#[inline(never)] // This attribute is mandatory
fn internal_a() -> bool
{
    unwinder::replace_and_continue!();
    ...
    // Requires `use std::ffi::CString;` at the top of the file.
    let module_name = CString::new("advapi32.dll").expect("module name must not contain a NUL byte");
    let module_name_ptr = module_name.as_ptr() as *mut u8; // Pointer to the NUL-terminated name
    let k32 = dinvoke_rs::dinvoke::get_module_base_address("kernel32.dll");
    let load_library = dinvoke_rs::dinvoke::get_function_address(k32, "LoadLibraryA");
    let ret = unwinder::replace_and_call!(load_library, module_name_ptr); // Load a dll with an unwindable call stack
    println!("advapi32.dll base address: 0x{:x}", ret as usize);
    ...
    unwinder::restore!();
    
    some_value
} 

#[inline(never)] // This attribute is mandatory
fn internal_b(value: bool) -> bool
{
    unwinder::replace_and_continue!();
    ...
    let large = 0xFFFFFFFFFF676980 as u64; // Sleep one second (-10,000,000 in 100 ns units)
    let large: *mut i64 = std::mem::transmute(&large);
    let alertable = false;
    let ntstatus = unwinder::replace_and_syscall!("NtDelayExecution", alertable, large);
    println!("ntstatus: {:x}", ntstatus as usize);
    ...
    unwinder::restore!();
    
    some_value
} 

Remarks

Since this feature is still under development, some things must be taken into account:

Please report any bugs that arise when using this feature.