Content

SilentMoonWalk
Stack replacement

SilentMoonWalk

Description

Unwinder provides a full weaponization of the SilentMoonWalk technique, giving you complete and stable call stack spoofing in Rust.

This technique comes with the following characteristics:

Supports running any arbitrary function with up to 11 parameters.
Supports running indirect syscalls (no additional heap allocations) with up to 11 parameters.
The crate lets you retrieve the value returned by the functions called through it.
The spoofing process can be chained any number of times without increasing the call stack size.
TLS is used to increase efficiency during the spoofing process.
dinvoke_rs is used to make any Windows API call required by the crate.

Credits

Kudos to the creators of the SilentMoonWalk technique:

And of course a huge shoutout to namazso for the Twitter thread that inspired this whole project.

Usage

Import this crate into your project by adding the following line to your Cargo.toml, and compile in release mode:

[dependencies]
unwinder = "0.1.3"

The main functionality of this crate has been wrapped in two macros:

The call_function!() macro runs any arbitrary function with a clean call stack.
The indirect_syscall!() macro executes the specified (indirect) syscall with a clean call stack.

To use either macro, you must import the std::ffi::c_void data type.

Both macros return a *mut c_void that can be used to retrieve the value returned by the executed function. See the Examples section for details.

call_function macro

This macro is used to call any desired function with a clean call stack. The macro expects the following parameters:

The first parameter is the memory address to call after spoofing the call stack. This parameter should be passed as a usize, isize or a pointer.
The second parameter is a bool indicating whether to keep the start function's frame. If you are not sure about this, set it to false, which always guarantees a good call stack.
The following parameters are those arguments to send to the function once the call stack has been spoofed.

indirect_syscall macro

This macro is used to perform any desired indirect syscall with a clean call stack. The macro expects the following parameters:

The first parameter is a string that contains the name of the NT function whose syscall you want to execute.
The second parameter is a bool indicating whether to keep the start function's frame. If you are not sure about this, set it to false, which always guarantees a good call stack.
The following parameters are those arguments to send to the NT function.

Parameter passing

In order to pass arguments of different types to these two macros, the following considerations must be taken into account:

Any basic data type that can be converted to usize (u8-u64, i8-i64, bool, etc.) can be passed directly to the macros.
Structs and unions of size 8, 16, 32, or 64 bits are passed as if they were integers of the same size.
Structures and unions with a size larger than 64 bits must be passed as a pointer.
Strings (&str and String) must be passed as a pointer.
Null pointers (ptr::null(), ptr::null_mut(), etc.) are passed as 0 (no matter whether the literal is a u8, u16, i32 or any other integer type).
Floating-point and double-precision parameters are not currently supported.
Any other data type must be passed as a pointer.
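The rules above can be sketched with plain Rust conversions. This is only an illustration that uses the standard library; the Rgba struct and the helper names are hypothetical and are not part of unwinder. It also assumes the target API expects a NUL-terminated C string, which is why CString is used.

```rust
use std::ffi::{c_void, CString};
use std::ptr;

/// Hypothetical 4-byte struct, used only to illustrate the "small struct" rule.
#[repr(C)]
#[derive(Clone, Copy)]
struct Rgba {
    r: u8,
    g: u8,
    b: u8,
    a: u8,
}

/// Reinterprets a 32-bit struct as an integer of the same size.
fn rgba_as_u32(color: Rgba) -> u32 {
    // Sound because Rgba is repr(C), exactly 4 bytes long and has no padding.
    unsafe { std::mem::transmute::<Rgba, u32>(color) }
}

/// Builds a NUL-terminated copy of a string. Keep the CString alive and
/// pass the result of `as_ptr()` as the argument.
fn to_c_string(text: &str) -> CString {
    CString::new(text).expect("string contains an interior NUL byte")
}

/// A null pointer is just a pointer-sized zero.
fn null_as_usize() -> usize {
    ptr::null_mut::<c_void>() as usize
}
```

The values produced by these helpers (a u32, a raw pointer, a zero) are the kind of arguments the macros accept directly according to the list above.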

Examples

Calling Sleep

let k32 = dinvoke_rs::dinvoke::get_module_base_address("kernel32.dll");
let sleep = dinvoke_rs::dinvoke::get_function_address(k32, "Sleep"); // Memory address of kernel32.dll!Sleep() 
let milliseconds = 1000i32;
unwinder::call_function!(sleep, false, milliseconds);

Calling OpenProcess

let k32 = dinvoke_rs::dinvoke::get_module_base_address("kernel32.dll");
let open_process: isize = dinvoke_rs::dinvoke::get_function_address(k32, "OpenProcess"); // Export names are case-sensitive
let desired_access: u32 = 0x1000;
let inherit = 0i32;
let pid = 20628i32;
let handle = unwinder::call_function!(open_process, false, desired_access, inherit, pid); // returns *mut c_void
let handle: HANDLE = std::mem::transmute(handle);
println!("Handle id: {:x}", handle.0);

Notice that the macro returns a *mut c_void that can be directly converted to a HANDLE, since both data types have the same size. This gives access to the value returned by OpenProcess, which is the new handle to the target process.

Calling NtDelayExecution as indirect syscall

let large = 0x8000000000000000 as u64; // Sleep indefinitely
let large: *mut i64 = std::mem::transmute(&large);
let alertable = false;
let ntstatus = unwinder::indirect_syscall!("NtDelayExecution", false, alertable, large); // returns *mut c_void
println!("ntstatus: {:x}", ntstatus as i32);

Notice that the macro returns a *mut c_void that can be used to retrieve the NTSTATUS returned by NtDelayExecution.
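As a minimal sketch of that conversion, the helpers below turn the raw pointer-sized return value into an NTSTATUS and check it for success. The function names are hypothetical and not part of unwinder; the only facts relied on are that NTSTATUS is a 32-bit signed value and that non-negative codes indicate success.

```rust
use std::ffi::c_void;

/// Interprets a raw return value as an NTSTATUS.
/// Only the low 32 bits are meaningful, so the value is truncated to i32.
fn as_ntstatus(ret: *mut c_void) -> i32 {
    ret as usize as i32
}

/// Mirrors the NT_SUCCESS convention: non-negative codes mean success.
fn nt_success(status: i32) -> bool {
    status >= 0
}
```

With these helpers, the value returned by indirect_syscall!() could be checked with nt_success(as_ntstatus(ntstatus)) instead of being printed as a raw number.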

Concatenate macro calls

The spoofing process can be concatenated any number of times without an abnormal call stack size increment. The execution flow will be preserved as well. The following code is an example of this:

fn main()
{
	function_a();
}

fn function_a()
{
	unsafe
	{
		let func_b = function_b as usize;
		call_function!(func_b, false);
		println!("function_a done.");
	}
}

fn function_b()
{
	unsafe
	{
		let func_c = function_c as usize;
		call_function!(func_c, false);
		println!("function_b done.")
	}
}

fn function_c()
{
	unsafe
	{
		let large = 0x0000000000000000 as u64; // Don't sleep so we return to function_b, allowing to check the execution flow preservation.
		let large: *mut i64 = std::mem::transmute(&large);
		let alertable = false;
		let ntstatus = unwinder::indirect_syscall!("NtDelayExecution", false, alertable, large);
		println!("ntstatus: {:x}", (ntstatus as usize) as i32); // NTSTATUS is an i32, although the second cast is not strictly required in this case.
	}
}

Considerations

Initial frame

If you set the second parameter to true (both macros), the spoofing process will try to keep the thread start address' frame in the call stack to increase legitimacy.

Call stack spoofed keeping the main module.

Sometimes, the thread's start function does not call a subsequent function (e.g. a jmp instruction is executed instead), meaning that no return address is pushed to the stack. In that scenario (and also if you set the second parameter to false), the spoofed call stack will start at BaseThreadInitThunk's frame.

Call stack spoofed without main module.

PoC

To test the implementation of the technique, PE-sieve has been used with the /threads flag. The results of the test show that the inspection of the call stack does not reveal the presence of the payload when this crate's functionality is used. As can be seen in the second image, the payload is detected when unwinder is not used.

PE-sieve results when unwinder is used.

Stack replacement

Technique description

This is a call stack spoofing alternative to SilentMoonWalk that keeps the call stack clean during the execution of your program. The main idea behind this technique is that each called function inside your module takes care of the previously pushed return address, finding at runtime a legitimate function with the same frame size as that of the return address to be spoofed. Once a legitimate function with the same frame size has been located, an offset within it is calculated and the final address is used to replace the last return address, hiding any anomalous entry in the call stack and keeping it unwindable. The original return address is stored by unwinder and moved back to the right position in the stack before a return instruction is executed, so the normal flow of the program continues.
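The replace/restore cycle described above can be modeled with a toy example. This is not how unwinder is implemented: it is a simplified simulation in which the return address slot is an ordinary variable, and the Candidate table, the addresses and the frame sizes are all made-up values for illustration.

```rust
/// A "legitimate" function known to the model (hypothetical data).
struct Candidate {
    address: usize,    // start address of the function
    frame_size: usize, // size of its stack frame, in bytes
}

/// Keeps the original return addresses while they are replaced.
struct Replacer {
    saved: Vec<usize>, // most recently replaced address is last
}

impl Replacer {
    fn new() -> Self {
        Replacer { saved: Vec::new() }
    }

    /// Replaces the return address in `slot` with an address inside a
    /// candidate whose frame size matches. Returns false if none matches,
    /// in which case `slot` is left untouched.
    fn replace(
        &mut self,
        slot: &mut usize,
        frame_size: usize,
        table: &[Candidate],
        offset: usize,
    ) -> bool {
        match table.iter().find(|c| c.frame_size == frame_size) {
            Some(c) => {
                self.saved.push(*slot);
                *slot = c.address + offset;
                true
            }
            None => false,
        }
    }

    /// Moves the most recently saved return address back into `slot`.
    fn restore(&mut self, slot: &mut usize) -> bool {
        match self.saved.pop() {
            Some(original) => {
                *slot = original;
                true
            }
            None => false,
        }
    }
}
```

The stack-like storage reflects the nesting of calls: each function restores the address it replaced before it returns, so restorations happen in the reverse order of replacements.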

This is an experimental feature. Although it is fully functional, it is still under development and research, so make sure to test your code if you decide to integrate this technique into it.

How to use it

To use the stack replacement functionality, add the following line to your Cargo.toml and compile in release mode:

[dependencies]
unwinder = {version = "0.1.3", features = ["Experimental"]}

The main functionality of this feature has been wrapped in the following macros:

The start_replacement!()/end_replacement!() pair of macros tells unwinder to start/end the stack replacement process. These two macros must be called in your code's entry points (e.g. in your dll's exported functions).
The replace_and_continue!()/restore!() pair of macros performs the replacement/restoration of the last return address.
Finally, the replace_and_call!()/replace_and_syscall!() pair of macros is used to perform stack replacement when calling functions outside of the current module (e.g. when using the Windows API or calling any other dll's code). Both of these macros return a *mut c_void containing the value returned by the function called this way (i.e. they operate the same way as described for the call_function and indirect_syscall macros used to execute SilentMoonWalk).

To use these macros, you must import the std::ffi::c_void data type. All functions using any of these macros should be labeled with the #[no_mangle] or #[inline(never)] attribute to prevent the Rust compiler from inlining them during optimization.

Before diving into a practical example, here is a quick look at the replace_and_call/replace_and_syscall pair of macros and the arguments they expect.

replace_and_call

This macro is used to call any desired function outside of the current module with a clean call stack while using stack replacement. The macro expects the following parameters:

The first parameter is the memory address of the function to call. This parameter should be passed as a usize, isize or a pointer.
The following parameters are those arguments to send to the specified function. They follow the same rules specified in the Parameter passing section.

replace_and_syscall

This macro is used to perform any desired indirect syscall with a clean call stack while using stack replacement. The macro expects the following parameters:

The first parameter is a string that contains the name of the NT function whose syscall you want to execute.
The following parameters are those arguments to send to the NT function. They follow the same rules specified in the Parameter passing section.

Example

I think the best way to show how these macros are used is through a practical example. Let's suppose we are creating a dll that will be reflectively injected into memory. This dll exports two functions, ExportedA and ExportedB, so we will consider these two functions the module's entry points. Both of them must call the start_replacement macro right at the beginning, and they must also call the matching end_replacement macro before returning. The start_replacement macro expects the module's base address as its argument; if you don't know that address at runtime, you can pass 0 and the macro will try to figure it out by itself.

#[no_mangle]
fn ExportedA(base_address: usize) -> bool
{
    unwinder::start_replacement!(base_address);
    ...
    unwinder::end_replacement!();

    true
}

#[no_mangle]
fn ExportedB() -> bool
{
    unwinder::start_replacement!(0);
    ...
    unwinder::end_replacement!();

    true
}

Starting the stack replacement process involves the manual crafting of a new stack that will be used until the end_replacement macro is called. The following picture illustrates what is going on under the hood:

Although theoretically it would not be necessary to start a new stack from scratch, I've decided to implement the process this way to ensure stability and to prevent anything from breaking.

Now, let's assume that our ExportedA function makes several calls to two other internal functions. When each of them is called, the pushed return address points to some place within ExportedA, which would break the call stack unless we take care of it. These internal functions are therefore responsible for replacing and later restoring that return address, which involves wrapping their code between the replace_and_continue and restore macros:

#[no_mangle]
fn ExportedA(base_address: usize) -> bool
{
    unwinder::start_replacement!(base_address);
    let ret_a = internal_a();
    let ret_b = internal_b(ret_a);
    unwinder::end_replacement!();

    ret_b
}

#[inline(never)] // This attribute is mandatory
fn internal_a() -> bool
{
    unwinder::replace_and_continue!();
    ...
    unwinder::restore!();
    
    some_value
} 

#[inline(never)] // This attribute is mandatory
fn internal_b(value: bool) -> bool
{
    unwinder::replace_and_continue!();
    ...
    unwinder::restore!();
    
    some_value
}

Finally, both internal_a and internal_b functions make use of some Windows API functionality. To keep the unwindable call stack, these calls should be performed through the replace_and_call (normal call) or replace_and_syscall (indirect syscall) macros.

#[no_mangle] // This attribute is mandatory
fn ExportedA(base_address: usize) -> bool
{
    unwinder::start_replacement!(base_address);
    let ret_a = internal_a();
    let ret_b = internal_b(ret_a);
    unwinder::end_replacement!();

    ret_b
}

#[inline(never)] // This attribute is mandatory
fn internal_a() -> bool
{
    unwinder::replace_and_continue!();
    ...
    let module_name = "advapi32.dll";
    let module_name = CString::new(module_name.to_string()).expect("");
    let module_name_ptr: *mut u8 = std::mem::transmute(module_name.as_ptr());
    let k32 = dinvoke_rs::dinvoke::get_module_base_address("kernel32.dll");
    let load_library = dinvoke_rs::dinvoke::get_function_address(k32, "LoadLibraryA");
    let ret = unwinder::replace_and_call!(load_library, module_name_ptr); // Load a dll with an unwindable call stack
    println!("advapi32.dll base address: 0x{:x}", ret as usize);
    ...
    unwinder::restore!();
    
    some_value
} 

#[inline(never)] // This attribute is mandatory
fn internal_b(value: bool) -> bool
{
    unwinder::replace_and_continue!();
    ...
    let large = 0xFFFFFFFFFF676980 as u64; // Sleep one second
    let large: *mut i64 = std::mem::transmute(&large);
    let alertable = false;
    let ntstatus = unwinder::replace_and_syscall!("NtDelayExecution", alertable, large);
    println!("ntstatus: {:x}", ntstatus as usize);
    ...
    unwinder::restore!();
    
    some_value
}
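The constants passed to NtDelayExecution in these examples follow the NT convention for relative delays: a negative count of 100-nanosecond intervals. As a small sketch, the hypothetical helper below (not part of unwinder) derives such a value from a std::time::Duration, so the magic number does not have to be computed by hand.

```rust
use std::time::Duration;

/// Encodes a relative delay for NtDelayExecution as a negative number of
/// 100-nanosecond intervals. Returns None if the duration does not fit
/// in an i64.
fn relative_delay(duration: Duration) -> Option<i64> {
    let ticks = duration.as_nanos() / 100; // u128 count of 100 ns units
    i64::try_from(ticks).ok().map(|t| -t)
}
```

For a one-second delay this yields -10_000_000, whose bit pattern is the 0xFFFFFFFFFF676980 value used in internal_b.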

Remarks

Since this feature is still under development, a few things must be taken into account:

If you are removing your PE's headers during the loading process, you must pass the module's base address to the start_replacement macro. Right now, it won't be able to find it by itself (to be solved in the next update).
In case you are wondering, stack replacement uses the same combination of jmp rbx + concealment frame as the SilentMoonWalk technique. This happens only when using replace_and_call and replace_and_syscall macros and it is planned to be changed in the next update.
Both replace_and_call and replace_and_syscall macros return a *mut c_void that can be used to retrieve the value returned by the function executed through them. This is the same behaviour as the one described for the call_function and indirect_syscall macros.
replace_and_call and replace_and_syscall macros allow up to 11 arguments.

Please report any bugs that arise when using this feature.