SilentMoonWalk
Description
Unwinder provides a full weaponization of the SilentMoonWalk technique, making it possible to obtain complete and stable call stack spoofing in Rust.
This technique comes with the following characteristics:
- Support for running any arbitrary function with up to 11 parameters.
- Support for running indirect syscalls (with no additional heap allocations) with up to 11 parameters.
- The crate allows you to retrieve the value returned by the functions called through it.
- The spoofing process can be concatenated any number of times without increasing the call stack size.
- TLS is used to increase efficiency during the spoofing process.
- dinvoke_rs is used to make any Windows API call required by the crate.
Credits
Kudos to the creators of the SilentMoonWalk technique:
And of course a huge shoutout to namazso for the Twitter thread that inspired this whole project.
Usage
Import this crate into your project by adding the following line to your Cargo.toml, and compile in release mode:
[dependencies]
unwinder = "0.1.3"
The main functionality of this crate has been wrapped in two macros:
- The call_function!() macro runs any arbitrary function with a clean call stack.
- The indirect_syscall!() macro executes the specified (indirect) syscall with a clean call stack.
To use either of these macros, you must import the std::ffi::c_void data type.
Both macros return a *mut c_void that can be used to retrieve the value returned by the executed function. More detailed information is available in the Examples section.
call_function macro
This macro is used to call any desired function with a clean call stack. The macro expects the following parameters:
- The first parameter is the memory address to call after spoofing the call stack. This parameter should be passed as a usize, an isize, or a pointer.
- The second parameter is a bool indicating whether or not to keep the start function frame. If you are not sure about this, set it to false, which always guarantees a good call stack.
- The remaining parameters are the arguments to pass to the function once the call stack has been spoofed.
indirect_syscall macro
This macro is used to perform any desired indirect syscall with a clean call stack. The macro expects the following parameters:
- The first parameter is a string that contains the name of the NT function whose syscall you want to execute.
- The second parameter is a bool indicating whether or not to keep the start function frame. If you are not sure about this, set it to false, which always guarantees a good call stack.
- The remaining parameters are the arguments to pass to the NT function.
Parameter passing
In order to pass arguments of different types to these two macros, the following considerations must be taken into account:
- Any basic data type that can be converted to usize (u8-u64, i8-i64, bool, etc.) can be passed directly to the macros.
- Structs and unions of size 8, 16, 32, or 64 bits are passed as if they were integers of the same size.
- Structs and unions with a size larger than 64 bits must be passed as a pointer.
- Strings (&str and String) must be passed as a pointer.
- Null pointers (ptr::null(), ptr::null_mut(), etc.) are passed as a 0 (no matter whether the pointee type is u8, u16, i32 or any other).
- Floating-point and double-precision parameters are not currently supported.
- Any other data type must be passed as a pointer.
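The rules above can be illustrated with a small standalone sketch. It does not call unwinder at all; it only shows how each kind of value could be reduced to an integer or a pointer before being handed to the macros. The `Small` and `Large` structs and all helper function names are hypothetical and exist only for this illustration. The string case assumes a NUL-terminated `CString`, as in the LoadLibraryA example later in this document.

```rust
use std::ffi::{c_void, CString};
use std::mem;
use std::ptr;

/// Hypothetical 32-bit struct, used only to illustrate the "small struct" rule.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Small {
    pub low: u16,
    pub high: u16,
}

/// Hypothetical struct larger than 64 bits, which has to travel as a pointer.
#[repr(C)]
pub struct Large {
    pub a: u64,
    pub b: u64,
}

/// A struct that is exactly 32 bits wide is reinterpreted as a u32.
pub fn small_as_integer(value: Small) -> u32 {
    // Both types are 4 bytes and Small has no padding, so the sizes match.
    unsafe { mem::transmute::<Small, u32>(value) }
}

/// A struct wider than 64 bits is passed by address.
pub fn large_as_pointer(value: &mut Large) -> *mut c_void {
    value as *mut Large as *mut c_void
}

/// A string is passed as a pointer to its NUL-terminated buffer.
pub fn string_as_pointer(value: &CString) -> *const u8 {
    value.as_ptr() as *const u8
}

/// A null pointer becomes the integer 0, whatever its pointee type is.
pub fn null_as_integer() -> usize {
    ptr::null_mut::<c_void>() as usize
}
```

Keep in mind that a pointer produced this way is only valid while the value it points to is alive, so the `CString` or struct must outlive the macro call.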
Examples
Calling Sleep
let k32 = dinvoke_rs::dinvoke::get_module_base_address("kernel32.dll");
let sleep = dinvoke_rs::dinvoke::get_function_address(k32, "Sleep"); // Memory address of kernel32.dll!Sleep()
let milliseconds = 1000i32;
unwinder::call_function!(sleep, false, milliseconds);
Calling OpenProcess
let k32 = dinvoke_rs::dinvoke::get_module_base_address("kernel32.dll");
let open_process: isize = dinvoke_rs::dinvoke::get_function_address(k32, "OpenProcess"); // Export names are case-sensitive
let desired_access: u32 = 0x1000; // PROCESS_QUERY_LIMITED_INFORMATION
let inherit = 0i32;
let pid = 20628i32;
let handle = unwinder::call_function!(open_process, false, desired_access, inherit, pid); // returns *mut c_void
let handle: HANDLE = std::mem::transmute(handle);
println!("Handle id: {:x}", handle.0);
Notice that the macro returns a *mut c_void that can be directly converted to a HANDLE, since both data types have the same size. This gives access to the value returned by OpenProcess, which is the new handle to the target process.
Calling NtDelayExecution as indirect syscall
let large = 0x8000000000000000 as u64; // Sleep indefinitely
let large: *mut i64 = std::mem::transmute(&large);
let alertable = false;
let ntstatus = unwinder::indirect_syscall!("NtDelayExecution", false, alertable, large); // returns *mut c_void
println!("ntstatus: {:x}", ntstatus as i32);
Notice that the macro returns a *mut c_void that can be used to retrieve the NTSTATUS returned by NtDelayExecution.
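Turning the returned `*mut c_void` into an NTSTATUS is a plain cast, and it can be sketched without any dependency on unwinder. The helper names `as_ntstatus` and `nt_success` are hypothetical; the success check mirrors the usual Windows convention that non-negative NTSTATUS values indicate success or informational codes.

```rust
use std::ffi::c_void;

/// Keep the low 32 bits of the pointer-sized return value as an NTSTATUS (i32).
pub fn as_ntstatus(ret: *mut c_void) -> i32 {
    ret as usize as i32
}

/// Non-negative NTSTATUS values are success or informational codes.
pub fn nt_success(status: i32) -> bool {
    status >= 0
}
```

For example, a return value of 0 maps to STATUS_SUCCESS, while 0xC0000005 (STATUS_ACCESS_VIOLATION) becomes a negative i32 and is reported as a failure.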
Concatenate macro calls
The spoofing process can be concatenated any number of times without an abnormal increase in the call stack size. The execution flow is preserved as well. The following code is an example of this:
fn main()
{
function_a();
}
fn function_a()
{
unsafe
{
let func_b = function_b as usize;
call_function!(func_b, false);
println!("function_a done.");
}
}
fn function_b()
{
unsafe
{
let func_c = function_c as usize;
call_function!(func_c, false);
println!("function_b done.")
}
}
fn function_c()
{
unsafe
{
let large = 0x0000000000000000 as u64; // Don't sleep, so we return to function_b and can check that the execution flow is preserved.
let large: *mut i64 = std::mem::transmute(&large);
let alertable = false;
let ntstatus = unwinder::indirect_syscall!("NtDelayExecution", false, alertable, large);
println!("ntstatus: {:x}", (ntstatus as usize) as i32); // NTSTATUS is an i32, although the second cast is not strictly required in this case.
}
}
Considerations
Initial frame
If you set the second parameter to true (in either macro), the spoofing process will try to keep the thread start address's frame in the call stack to increase legitimacy.
Sometimes the thread's start function does not perform a call to a subsequent function (e.g. a jmp instruction is executed instead), meaning that no return address is pushed to the stack. In that scenario (and also if you set the second parameter to false), the spoofed call stack will start at BaseThreadInitThunk's frame.
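The fallback rule described above can be written down as a small decision function. This is only a sketch of the documented behaviour, not code taken from the crate; the `FirstFrame` enum, the `first_spoofed_frame` function and its `start_performs_call` flag are hypothetical names.

```rust
/// Where the spoofed call stack begins (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
pub enum FirstFrame {
    /// The frame of the thread's start function is kept.
    ThreadStart,
    /// The stack starts at BaseThreadInitThunk's frame.
    BaseThreadInitThunk,
}

/// `keep_start_frame` is the bool passed as second macro parameter;
/// `start_performs_call` says whether the start function pushed a return address.
pub fn first_spoofed_frame(keep_start_frame: bool, start_performs_call: bool) -> FirstFrame {
    if keep_start_frame && start_performs_call {
        FirstFrame::ThreadStart
    } else {
        FirstFrame::BaseThreadInitThunk
    }
}
```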
PoC
In order to test the implementation of the technique, PE-sieve has been used with the /threads flag. The results of the test show that inspecting the call stack does not reveal the presence of the payload when this crate's functionality is used. As can be seen in the second image, the payload is detected when unwinder is not used.
Stack replacement
Technique description
This is a call stack spoofing alternative to SilentMoonWalk that keeps the call stack clean during the execution of your program. The main idea behind this technique is that each function called inside your module takes care of the previously pushed return address, finding at runtime a legitimate function with the same frame size as that of the return address to be spoofed. Once a legitimate function with the same frame size has been located, an offset within it is calculated and the resulting address is used to replace the last return address, hiding any anomalous entry in the call stack and keeping it unwindable. The original return address is stored by unwinder and is moved back to the right position in the stack before a return instruction is executed, allowing the normal flow of the program to continue.
This is an experimental feature that, despite being fully functional, is still under development and research, so make sure to test your code if you decide to integrate this technique into it.
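The replace/restore cycle can be modelled with plain data to make the bookkeeping easier to follow. The following is a toy simulation, not the crate's implementation: the real feature works on the live thread stack and on unwind metadata, whereas here the "stack slot" is an ordinary variable. The `Candidate` and `Replacer` types, their fields and every address in the usage are invented for illustration.

```rust
/// Hypothetical description of a legitimate function located at runtime.
#[derive(Clone, Copy)]
pub struct Candidate {
    pub start: usize,      // address where the function begins
    pub frame_size: usize, // size of its stack frame in bytes
    pub offset: usize,     // offset inside the function to point at
}

/// Toy bookkeeping for the original return addresses.
pub struct Replacer {
    saved: Vec<usize>,
}

impl Replacer {
    pub fn new() -> Self {
        Replacer { saved: Vec::new() }
    }

    /// Overwrite the return address in `slot` with an address inside a
    /// candidate whose frame size matches. Returns false, leaving the slot
    /// untouched, when no candidate has the requested frame size.
    pub fn replace(&mut self, slot: &mut usize, frame_size: usize, candidates: &[Candidate]) -> bool {
        match candidates.iter().find(|c| c.frame_size == frame_size) {
            Some(candidate) => {
                self.saved.push(*slot); // remember the original address
                *slot = candidate.start + candidate.offset;
                true
            }
            None => false,
        }
    }

    /// Put the most recently saved return address back before returning.
    pub fn restore(&mut self, slot: &mut usize) -> bool {
        match self.saved.pop() {
            Some(original) => {
                *slot = original;
                true
            }
            None => false,
        }
    }
}
```

The saved addresses are kept in a stack-like `Vec` so that nested calls restore in reverse order, matching the way nested functions return.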
How to use it
To use the stack replacement functionality, add the following line to your Cargo.toml and compile in release mode:
[dependencies]
unwinder = {version = "0.1.3", features = ["Experimental"]}
The main functionality of this feature has been wrapped in the following macros:
- The start_replacement!()/end_replacement!() pair of macros tells unwinder to start/end the stack replacement process. These two macros must be called in your code's entry point (e.g. in your dll's exported functions).
- The replace_and_continue!()/restore!() pair of macros performs the replacement/restoration of the last return address.
- Finally, the replace_and_call!()/replace_and_syscall!() pair of macros is used to perform stack replacement when calling functions outside of the current module (e.g. when using the Windows API or calling any other dll's code). Both of these macros return a *mut c_void containing the value returned by the function called this way (i.e. they operate the same way as described for the call_function and indirect_syscall macros used to execute SilentMoonWalk).
To use these macros, you must import the std::ffi::c_void data type.
All the functions using any of these macros should be labeled with the #[no_mangle] or #[inline(never)] attributes to prevent the Rust compiler from inlining them during the optimization process.
Before diving into a practical example showing how to use all of this, here is a quick look at the replace_and_call/replace_and_syscall pair of macros and how to pass them the expected arguments.
replace_and_call
This macro is used to call any desired function outside of the current module with a clean call stack while using stack replacement. The macro expects the following parameters:
- The first parameter is the memory address of the function to call. This parameter should be passed as a usize, an isize, or a pointer.
- The remaining parameters are the arguments to pass to the specified function. They follow the same rules specified in the Parameter passing section.
replace_and_syscall
This macro is used to perform any desired indirect syscall with a clean call stack while using stack replacement. The macro expects the following parameters:
- The first parameter is a string that contains the name of the NT function whose syscall you want to execute.
- The following parameters are those arguments to send to the NT function. They follow the same rules specified in the Parameter passing section.
Example
The best way to show how these macros are used is through a practical example. Suppose we are creating a dll that will be reflectively injected into memory. This dll exports two functions, ExportedA and ExportedB, so we will consider these two functions as the module's entry points. Both of them must call the start_replacement macro right at the beginning, and they must also call the matching end_replacement macro before returning. The start_replacement macro expects the module's base address as its argument; you can pass 0 if you don't know that address at runtime, in which case the macro will try to figure it out by itself.
#[no_mangle]
fn ExportedA(base_address: usize) -> bool
{
unwinder::start_replacement!(base_address);
...
unwinder::end_replacement!();
true
}
#[no_mangle]
fn ExportedB() -> bool
{
unwinder::start_replacement!(0);
...
unwinder::end_replacement!();
true
}
Starting the stack replacement process involves manually crafting a new stack that will be used until the end_replacement macro is called. The following picture illustrates what is going on under the hood:
Although theoretically it would not be necessary to start a new stack from scratch, I've decided to implement the process this way to ensure stability and to prevent anything from breaking.
Now, let's assume that our ExportedA function makes several calls to two other internal functions. Each of these calls pushes a return address pointing somewhere within ExportedA, which would break the call stack unless we take care of it, so the two internal functions are responsible for replacing/restoring that original return address. This replacement process involves wrapping our internal function's code between the replace_and_continue and restore macros:
#[no_mangle]
fn ExportedA(base_address: usize) -> bool
{
unwinder::start_replacement!(base_address);
let ret_a = internal_a();
let ret_b = internal_b(ret_a);
unwinder::end_replacement!();
ret_b
}
#[inline(never)] // This attribute is mandatory
fn internal_a() -> bool
{
unwinder::replace_and_continue!();
...
unwinder::restore!();
some_value
}
#[inline(never)] // This attribute is mandatory
fn internal_b(value: bool) -> bool
{
unwinder::replace_and_continue!();
...
unwinder::restore!();
some_value
}
Finally, both the internal_a and internal_b functions make use of some Windows API functionality. To keep the call stack unwindable, these calls should be performed through the replace_and_call (normal call) or replace_and_syscall (indirect syscall) macros.
#[no_mangle] // This attribute is mandatory
fn ExportedA(base_address: usize) -> bool
{
unwinder::start_replacement!(base_address);
let ret_a = internal_a();
let ret_b = internal_b(ret_a);
unwinder::end_replacement!();
ret_b
}
#[inline(never)] // This attribute is mandatory
fn internal_a() -> bool
{
unwinder::replace_and_continue!();
...
let module_name = "advapi32.dll";
let module_name = CString::new(module_name.to_string()).expect("");
let module_name_ptr: *mut u8 = std::mem::transmute(module_name.as_ptr());
let k32 = dinvoke_rs::dinvoke::get_module_base_address("kernel32.dll");
let load_library = dinvoke_rs::dinvoke::get_function_address(k32, "LoadLibraryA");
let ret = unwinder::replace_and_call!(load_library, module_name_ptr); // Load a dll with an unwindable call stack
println!("advapi32.dll base address: 0x{:x}", ret as usize);
...
unwinder::restore!();
some_value
}
#[inline(never)] // This attribute is mandatory
fn internal_b(value: bool) -> bool
{
unwinder::replace_and_continue!();
...
let large = 0xFFFFFFFFFF676980 as u64; // Sleep one second
let large: *mut i64 = std::mem::transmute(&large);
let alertable = false;
let ntstatus = unwinder::replace_and_syscall!("NtDelayExecution", alertable, large);
println!("ntstatus: {:x}", ntstatus as usize);
...
unwinder::restore!();
some_value
}
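The constant 0xFFFFFFFFFF676980 passed to NtDelayExecution above is a negative 64-bit value: the interval is expressed in 100-nanosecond units, and a negative value means a delay relative to the current time. A minimal sketch of how such a value can be derived follows; the `relative_delay_ms` helper and the `UNITS_PER_MS` constant are hypothetical names, not part of unwinder.

```rust
/// Number of 100-nanosecond units in one millisecond.
const UNITS_PER_MS: i64 = 10_000;

/// Build a relative delay interval for the given number of milliseconds.
/// The result is negative (or zero) because negative intervals are relative.
pub fn relative_delay_ms(milliseconds: u32) -> i64 {
    -(milliseconds as i64 * UNITS_PER_MS)
}
```

With this helper, `relative_delay_ms(1000)` yields the same bit pattern as the one-second constant used in internal_b, and the 0x8000000000000000 value used earlier to sleep indefinitely is simply `i64::MIN` viewed as a u64.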
Remarks
Since this feature is still under development, some things must be taken into account:
- If you are removing your PE's headers during the loading process, you must pass the module's base address to the start_replacement macro. Right now, it won't be able to find it by itself (to be solved in the next update).
- In case you are wondering, stack replacement uses the same combination of jmp rbx + concealment frame as the SilentMoonWalk technique. This happens only when using the replace_and_call and replace_and_syscall macros, and it is planned to be changed in the next update.
- Both the replace_and_call and replace_and_syscall macros return a *mut c_void that can be used to retrieve the value returned by the function executed through them. This is the same behaviour as the one described for the call_function and indirect_syscall macros.
- The replace_and_call and replace_and_syscall macros allow up to 11 arguments.
Please report any bugs that may arise when using this feature.