4. Direct and Indirect Syscalls (shellcode runner)

#syscalls #directsyscalls #indirectsyscalls #Golang #EDREvasion

Intro

In the previous section we worked on a piece of code to detect if ntdll.dll was hooked by the EDR installed on the machine.

Once a hook is detected we have a few choices. We could calculate the SSN and replace the hook bytes with the original bytes. This is similar to loading a fresh copy from disk; the only difference is that we don't remap the whole DLL, we just unhook the functions of interest. The issue with this approach is that we still have to modify the DLL in memory, which means using suspicious WinAPIs such as VirtualProtect(Ex) and WriteProcessMemory. Another issue is that these functions might themselves be hooked.

This is where the Direct and Indirect syscalls come in handy.

What is a syscall ?

Sequence of events when calling windows APIs

Before going into detail on what a syscall is let's analyze the sequence of events that take place when a simple windows API function is called. The following code is used for analysis.

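package main

import (
	"time"
	"golang.org/x/sys/windows"
)

func main() {
	PROCESS_ALL_ACCESS := 0x1F0FFF
	time.Sleep(30 * time.Second) // time to attach windbg and set breakpoints
	println("run")
	pHandle, _ := windows.OpenProcess(uint32(PROCESS_ALL_ACCESS), false, 9340) // 9340 is the target PID
	windows.CloseHandle(pHandle)
}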

All we do in this code is sleep for 30 seconds (just to have enough time to attach windbg and set our breakpoints), get a handle on a process and then close the handle.

When we try to set a breakpoint on kernel32!OpenProcess we get the following error

To list all functions starting with O in kernel32 we use the following command

What we are interested in here is setting a breakpoint at 00007ffa9aa34730 KERNEL32!OpenProcessStub, shown on line 16 of the output. We will then follow execution to understand what happens.

Let's set a breakpoint in windbg using the following command

Sending the command g then resumes execution until our breakpoint is hit.

The jmp instruction on line 5 directs execution to the address held at 00007ffa`9aaa5180, which is the address of kernelbase!OpenProcess.

So far we called the OpenProcess api from kernel32 which forwards our request to kernelbase to execute. So where do syscalls come in?

Let's dig further into the kernelbase where the actual implementation of the OpenProcess function is.

Line 20: we can see another call which eventually takes us to ntdll!NtOpenProcess

NtOpenProcess in the ntdll.dll is where the syscall resides.

So here comes the question again. What is a syscall and what does it actually do ?

Under the hood, when a user-mode application calls one of these API functions, the call ends up in a small stub in ntdll.dll. The stub loads the System Service Number (SSN) into eax and executes the syscall instruction, which transfers execution from user mode to kernel mode. The kernel then uses the SSN to dispatch to the matching kernel routine.

So in order to get a handle on a process we don't really have to call any of the three functions. All we have to do is follow the x64 calling convention to prepare our registers and the stack, move 0x26 (the SSN of NtOpenProcess on this particular version of Windows) into eax and execute the syscall instruction.

The benefit of directly (or indirectly) calling the syscall instruction is that any EDR that relies on userland hooks for detection will be bypassed.

Direct or Indirect Syscalls.

Direct Syscalls

"Direct syscalls" means that an asm function written within our own executable executes the syscall instruction directly. Since legitimate syscalls normally originate from ntdll.dll, a syscall instruction executed from any other module can be treated as malicious, or at least flagged as anomalous.

Direct syscalls have served us well. One of the early articles I remember reading about syscalls was this one from outflank, written back in 2019. Although direct syscalls are effective to this day, EDR vendors have started catching up with the technique (elastic detection of Direct Syscall via Assembly Bytes).

The detection essentially checks whether the following sequence of instructions is executed from any module other than ntdll. If that's the case it's flagged as malicious.

Indirect Syscalls

The easiest way around this detection is to find the location of a syscall instruction within ntdll.dll. Instead of executing syscall in our own assembly function, we use the call instruction to transfer execution to the address within ntdll that holds the syscall. In our example above the function will look something like this:

In the previous blog we stored an address value in a variable called "trampoline". The trampoline variable was the syscall instruction address for each exported function of the ntdll.

With all the knowledge we have now let's write the shellcode runner using both direct and indirect syscalls.

Shellcode Runner using direct and indirect syscalls

Before we start writing our shellcode runner we need to modify our code first to perform the following:

  • Calculate the SSNs of the hooked functions using the adjacent unhooked functions

    • We can do this by sorting all functions by their address

    • The SSNs are sequential for both Zw and Nt functions

    • Find the last unhooked function and extrapolate the values

  • Develop the assembly functions to call the syscalls / indirect syscalls

  • Write wrapper functions to call from our golang main function

Let's continue from where we left off.

Calculate the SSNs of the hooked functions using the adjacent unhooked functions

It is fairly easy to find the unhooked values since we keep our values in a slice. We will loop through the values in the slice and if a function is hooked we will increase the SSN of the previous value by 1.

Here is the unhooking function.

As mentioned earlier we could restore the value in memory but that would require changing memory permissions therefore more opportunities for the defenders to be alerted.
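The extrapolation step described above can be sketched on mock data as follows. This is a minimal illustration, not the listing from the original post: the export type, its field names, the resolveSSNs helper and the placeholder stub names and addresses are all assumptions. Only the value 0x33 for NtOpenFile comes from the text.

```go
package main

import (
	"fmt"
	"sort"
)

// export is a hypothetical record for one Nt*/Zw* stub. The field names are
// assumptions for this sketch, not the types used in the previous section.
type export struct {
	Name    string
	Address uintptr
	SSN     uint16
	Hooked  bool
}

// resolveSSNs sorts a copy of the stubs by address and, for every hooked stub
// whose predecessor has a known SSN, derives the SSN as "previous SSN + 1".
// Nothing is written back to the loaded module; only our own records change.
func resolveSSNs(in []export) []export {
	out := make([]export, len(in))
	copy(out, in)
	sort.Slice(out, func(i, j int) bool { return out[i].Address < out[j].Address })
	for i := 1; i < len(out); i++ {
		if out[i].Hooked && !out[i-1].Hooked {
			out[i].SSN = out[i-1].SSN + 1
			out[i].Hooked = false // SSN recovered for this record
		}
	}
	return out
}

func main() {
	// Mock data: names and addresses are placeholders.
	exports := []export{
		{Name: "NtStubC", Address: 0x1040, SSN: 0x34},
		{Name: "NtOpenFile", Address: 0x1020, Hooked: true},
		{Name: "NtStubA", Address: 0x1000, SSN: 0x32},
	}
	for _, e := range resolveSSNs(exports) {
		fmt.Printf("%-12s addr=0x%x ssn=0x%x hooked=%v\n", e.Name, e.Address, e.SSN, e.Hooked)
	}
}
```

Because each recovered entry is marked as resolved before the loop moves on, a run of consecutive hooked stubs is handled in a single pass. A hooked stub at the very start of the sorted list has no predecessor and stays unresolved in this sketch.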

In our main function we run the UnhookFuncs function and print the exports again on the host running OpenEDR.

Before and after running the UnhookFuncs() function

Let's cross-check that 0x33 is the correct SSN for NtOpenFile against another identical Windows host that is not running an EDR.

NtOpenFile SSN

We have programmatically managed to get the correct SSN values of the hooked functions. We are now ready to start building the assembly functions that perform direct and indirect syscalls.

Assembly Functions

Since writing assembly is not the aim of this blog post I will not go into too much detail about it. If this is a subject of interest you can review the Go documentation and the Plan 9 assembler manual. Also keep an eye out for my upcoming goASM blog.

For the sake of simplicity we will take the ASM functions from two existing projects.

  1. Direct Syscall Function from BananaPhone Project is located here

  2. Indirect Syscall Function from acheron project is located here

Both of these functions are essentially modified versions of the Golang function (asmstdcall) used to perform windows API calls.

Let's have a quick look at the direct syscall function

Before we dive into the assembly let's note the main differences between goASM and what we will see in windbg (Intel syntax):

  • Parameter order

    • goASM: source before the destination.

    • Intel: destination before the source.

  • Register names

    • goASM: all registers are 64 bit, but instructions access low-order 8, 16 and 32 bits. For example, MOVL to AX puts a value in the low-order 32 bits and clears the top 32 bits to zero.

    • Intel: the register name encodes the width.

      • RAX is the 64-bit general-purpose register.

      • EAX is the 32-bit general-purpose register, and in 64-bit mode, it's the lower 32 bits of RAX.

      • AX is the 16-bit version of the register, and in 64-bit mode, it's the lower 16 bits of RAX.
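The relationship between the register widths can be modelled with plain integer conversions. The following is only an arithmetic sketch with hypothetical helper names (movl, eax, ax); it does not touch real registers.

```go
package main

import "fmt"

// movl models a 32-bit move into a 64-bit register: the low 32 bits receive
// the value and the upper 32 bits end up as zero.
func movl(value uint32) uint64 {
	return uint64(value)
}

// eax returns the low 32 bits of a 64-bit register value.
func eax(rax uint64) uint32 { return uint32(rax) }

// ax returns the low 16 bits of a 64-bit register value.
func ax(rax uint64) uint16 { return uint16(rax) }

func main() {
	rax := movl(0x33) // comparable to MOVL $0x33, AX in goASM
	fmt.Printf("RAX=%#x EAX=%#x AX=%#x\n", rax, eax(rax), ax(rax))

	wide := uint64(0x1122334455667788)
	fmt.Printf("RAX=%#x EAX=%#x AX=%#x\n", wide, eax(wide), ax(wide))
}
```

The second call shows the aliasing: reading EAX or AX from a full 64-bit value simply truncates it to the low 32 or 16 bits.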

The function receives a uint16 as its first argument: the SSN of the syscall we want to perform. The remaining uintptrs are the arguments passed to the syscall.

  • Line 5 sets the RAX register to 0 by performing the xor operation.

  • Line 6 moves the value of the first argument to RAX essentially recreating the mov eax,33 we have seen in the ntdll exported functions.

  • Line 7 takes the number of arguments into RCX.

  • Lines 17-45 check how many arguments were passed to the function and place them according to the x64 calling convention: the first 4 arguments go into the registers RCX, RDX, R8 and R9, and the rest are stored on the stack.

  • Line 46 is where the syscall instruction is called.
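The argument placement performed by lines 17-45 can be illustrated in plain Go. This is a sketch of the rule only, assuming a hypothetical placement type and placeArgs helper; it does not reproduce the assembly and ignores details such as the shadow space.

```go
package main

import "fmt"

// placement describes where each argument goes under the Windows x64 calling
// convention. The type is a hypothetical helper used for illustration.
type placement struct {
	Registers map[string]uintptr // RCX, RDX, R8, R9
	Stack     []uintptr          // fifth argument onwards, in order
}

// placeArgs mirrors the split described above: the first four arguments go
// into registers, the remainder goes on the stack.
func placeArgs(args ...uintptr) placement {
	regNames := []string{"RCX", "RDX", "R8", "R9"}
	p := placement{Registers: make(map[string]uintptr)}
	for i, a := range args {
		if i < len(regNames) {
			p.Registers[regNames[i]] = a
		} else {
			p.Stack = append(p.Stack, a)
		}
	}
	return p
}

func main() {
	// Six placeholder argument values.
	p := placeArgs(1, 2, 3, 4, 5, 6)
	fmt.Println(p.Registers["RCX"], p.Registers["RDX"], p.Registers["R8"], p.Registers["R9"])
	fmt.Println(p.Stack)
}
```

With six arguments, four land in registers and the last two are left for the stack, which is the case the assembly has to handle for calls with more than four parameters.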

The indirect syscall function is very similar:

The main differences are:

  • In addition to the ssn it receives a trampoline argument which is the address of the syscall;ret; located in ntdll.dll

  • Line 7: The trampoline address is stored in the R11 register

  • Line 65: Instead of using the syscall instruction we use the CALL R11 instruction, which calls the syscall in ntdll.dll

Wrapper Functions for our assembly functions

In order to be able to call the assembly functions in Go we need to save them in the same directory as our code. Since our functions only work on x64, the file name should end with _amd64.s. If a 32-bit implementation of the function were present we would have to create a separate file ending with _386.s. The suffix lets the Go toolchain know which architecture the assembly file targets.

Adding the assembly functions in the project directory

In our code we should also define the functions without a body

That's all needed before we can call the functions.

We will then write a wrapper function that receives the ntapi function as a string and the function arguments. It will then resolve the ssn and trampoline as needed before calling our assembly function.

Shellcode Runner code

In order to create our shellcode runner the following native APIs should be called:

  1. NtAllocateVirtualMemory (== VirtualAlloc)

  2. RtlMoveMemory

  3. NtProtectVirtualMemory (== VirtualProtect)

  4. NtCreateThreadEx (==CreateThread)

Let's write the functions one by one. At this point it makes no difference whether we use direct or indirect syscalls. We just have to call the respective function (IndirectSyscall or Syscall) and the code will do the work for us. We will run both implementations against openEDR and elasticEDR to see if any alerts are generated.

Let's create a wrapper function for each ntAPI

NtAllocateVirtualMemory

The arguments passed are identical to the VirtualAlloc function (which is not always the case).

The allocated address is stored in the BaseAddress variable defined before the syscall.

The easiest way to check that our stack and registers are correct is to set a breakpoint just before the syscall. An easy way to find the address of a function in golang is using the following code, which prints the address of IndirectSyscall() in memory. The sleep call gives us enough time to attach to the process and set our breakpoints.
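One possible way to print a function's address is through the reflect package. The sketch below is an assumption about how this could be done, not the listing from the original post; target is a placeholder for IndirectSyscall and funcAddr is a hypothetical helper.

```go
package main

import (
	"fmt"
	"reflect"
)

// target stands in for the function whose address we want to find
// (IndirectSyscall in the text). The name is a placeholder.
func target() {}

// funcAddr returns the code pointer behind a Go function value.
func funcAddr(fn interface{}) uintptr {
	return reflect.ValueOf(fn).Pointer()
}

func main() {
	fmt.Printf("target is at 0x%x\n", funcAddr(target))
}
```

Adding a time.Sleep after the print leaves a window to attach the debugger and set a breakpoint at the printed address. For functions implemented in assembly the value may point at a compiler-generated wrapper rather than the assembly body, so it is worth confirming the location by disassembling from that address in the debugger.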

In the main function we add the following code to call our wrapper function:

We have the memory allocated at 0x2ab19810000

RtlMoveMemory

We can use the RtlMoveMemory function to copy the bytes stored in the sc slice to the allocated memory.

NtProtectVirtualMemory

We then use this native api to adjust the memory permissions to RX.

In the main function we add this piece of code to call it:

Memory permissions changed successfully

NtCreateThreadEx

The final step is to create a thread pointing to our shellcode.

And in the main function:

Thread created and returned the handle

Detections ?

Using the default rules none of the EDRs generated any alerts other than the process creation.

OpenEDR (22-09-2023)

Successfully executed without generating any alerts

ElasticEDR (22-09-2023)

Successfully executed without generating any alerts

Complete Code

Ideally the syscall functionality should be turned into a package and then imported wherever needed.
