2. Windows x64 Shellcode Development intro
#x64 #shellcode #golang #asm
Introduction
Shellcode is a small piece of position-independent machine code, usually written in assembly language, that performs a specific function in the context of a software exploit. The term "shellcode" comes from the idea that the code often opens a shell, providing an attacker with command-line access to a compromised system.
Shellcode is commonly used in exploit development and penetration testing. It is typically injected into a vulnerable program's memory, for example through a buffer overflow, to take control of the program's execution flow.
The functionality of shellcode can vary widely, depending on the goals of the attacker. It might include actions like spawning a shell, downloading and executing malicious payloads, or performing other malicious activities.
It's important to note that while shellcode itself is not inherently malicious, it is commonly used as a component of exploits and attacks.
A common tool for generating shellcode is msfvenom. Any payload generated from this tool is heavily signatured by AV/EDR vendors. Being able to write custom shellcode is a great addition to the arsenal of any offensive security professional.
Assembly - Intel Syntax
Throughout this blog post I will be using Intel syntax, which I find much easier to read and write. It is also the default syntax for WinDbg, the debugger I will be using to test my assembly code.
Syntax
Intel syntax follows this convention: instruction destination, source
So let's take a real example: add rax, 1. The add instruction adds the value on the right to the value on the left.
In this case, if rax held the value 2 before the instruction ran, it will hold the value 3 after execution.
Common ASM instructions
Instructions | Explanation |
---|---|
mov rax,1; | Moves the value 1 (decimal) into rax. Prefix the number with 0x for hex values |
mov rax, qword ptr [r8]; | Moves the qword stored at the address held in r8 into rax |
add rax,1; | Adds 1 to rax |
sub rax,1; | Subtracts 1 from rax |
push rax; | Pushes the value of rax onto the stack |
pop rax; | Pops the value at the top of the stack into rax |
call rax; | Calls the function at the address stored in rax |
jmp rax; | Jumps to the address stored in rax |
xor rax,rax; | Logical xor of rax with itself, which zeros the contents of rax |
int3; | Breakpoint |
The above instructions are the most commonly used when we are writing shellcode. A few others will be used but they will be explained as we walk through the actual code.
Registers
The whole list of registers can be found on Microsoft's website. Let's have a quick look at how some of the registers will be used within the shellcode.
Let's take rax as an example.
register | size |
---|---|
rax | 64-bit register |
eax | 32-bit register (lower 32 bits of rax) |
ax | 16-bit register (lower 16 bits of eax) |
ah | 8-bit register (higher 8 bits of ax) |
al | 8-bit register (lower 8 bits of ax) |
Let's take r8 as another example.
register | size |
---|---|
r8 | 64-bit register |
r8d | 32-bit register (lower 32 bits of r8) |
r8w | 16-bit register (lower 16 bits of r8d) |
r8b | 8-bit register (lower 8 bits of r8w) |
d - double word
w - word
b - byte
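The relationship between a 64-bit register and its smaller views can be sketched in Go. This is only an illustration of the rax table above: the helper names (low32, low16, high8, low8) and the sample value are made up for this example.

```go
package main

import "fmt"

// Each helper returns one "view" of a 64-bit register value,
// mirroring the rax/eax/ax/ah/al table. The names are illustrative only.
func low32(r uint64) uint32 { return uint32(r) }     // eax: lower 32 bits
func low16(r uint64) uint16 { return uint16(r) }     // ax: lower 16 bits
func high8(r uint64) uint8  { return uint8(r >> 8) } // ah: bits 8-15
func low8(r uint64) uint8   { return uint8(r) }      // al: bits 0-7

func main() {
	rax := uint64(0x1122334455667788) // arbitrary sample value
	fmt.Printf("rax=%#x eax=%#x ax=%#x ah=%#x al=%#x\n",
		rax, low32(rax), low16(rax), high8(rax), low8(rax))
}
```

The r8 family behaves the same way for r8d, r8w and r8b, except that there is no counterpart to ah.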
A few things to note when using the different variations of these registers. Let's say the following instruction is used:
The source and destination operands must match in size. We cannot use a 64-bit register as the destination with a 32-bit (dword) source. Keystone will not generate the opcodes for us.
Shellcode template
I found a good template online, which can be found here.
I am not a big fan of Python, so I ported the above script to Go. To make development and debugging easier, I also include the shellcode runner script and automatically launch and attach WinDbg Preview.
The ported code can be downloaded from my GitHub page here. If you are more familiar with Python, you can use the original template from Exploit-DB and follow along. The only caveat is that you will have to write your own shellcode runner to execute the code.
Objective
With everything now in place, let's have a quick look at the code we are trying to execute, written in a higher-level language.
From the above code we essentially have 4 lines of code we would like to turn into assembly.
Line 12: Import kernel32.dll
Line 14: Get the address of WinExec
Line 16: Get a pointer to the null-terminated string "calc.exe"
Line 18: Call the WinExec function, passing the pointer from line 16 as the first argument and 1 as the second.
Shellcode
Finding kernel32.dll
As mentioned previously the first step of developing our shellcode is to find the base address of kernel32.dll in memory. Kernel32.dll is always loaded in the process memory on creation.
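To find the address we have to perform the following tasks:
Get the address of the PEB structure from the TEB
From the PEB we can get the PEB_LDR_DATA
From PEB_LDR_DATA we can get InMemoryOrderModuleList
Walk the linked list until we reach the entry for kernel32.dll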
Let's walk through the assembly code in WinDbg to ensure we get the expected results. We start by adding a breakpoint (int3;) at the top of our shellcode.
Get the address of the PEB structure from TEB
Line 3: On x64 Windows, the gs register points to the Thread Environment Block (TEB), which contains information about the current thread.
In Windbg we can view the structure using the following command
We can see that the PEB is located at offset 0x60 from the beginning of the TEB. Once we step over the following instruction we should get the address of PEB in rax
A quick sanity check confirms that the value in the rax register matches the one from the TEB.
From PEB we can get the PEB_LDR_DATA
Line 5: We have the value of PEB in RAX and we now try to get the address of the PEB_LDR_DATA.
We can then use the following command to view the PEB Structure and identify the offset for LDR
As we can see from above the offset to Ldr is 0x18. Let's step over line 5 that has the following instruction to see if we get the address of PEB_LDR_DATA in the RSI register
Let's do a quick sanity check using windbg
Great, we now have the address of the struct in rsi
We can get InMemoryOrderModuleList from PEB_LDR_DATA
Line 6: We now have the PEB_LDR_DATA address in rsi and we want to load the value of InMemoryOrderModuleList into rsi. Let's view the struct in WinDbg once again to make sure we have the correct offset in our shellcode.
The offset seems to be correct in our code
Let's step over in our code to see if we get the right value in rsi.
Walk the linked list
Kernel32 comes after the current executable and ntdll. So moving forward twice should give us the _LDR_DATA_TABLE_ENTRY of kernel32. Let's confirm this.
First entry: ks.exe
Second entry: ntdll.dll
Third entry: kernel32.dll
We can see that DllBase sits at offset 0x30 from the beginning of the structure. The pointer in r9 refers to the InMemoryOrderLinks field, which is 0x10 bytes into the structure (this is why we subtract 0x10 from r9 to get to the beginning of the structure), so we only need to add 0x20 to reach the DllBase value.
Let's confirm that after stepping over line 9 in our code the register r9 will hold the kernel32.dll base address.
Awesome. With the address of kernel32 in r9, we can now proceed to get the address of WinExec.
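The pointer chain walked in this section can be simulated in Go on a toy address space. This is a minimal sketch, not the shellcode itself: the memory type, the function names and every address in sampleMemory are hypothetical, and only the offsets (0x60, 0x18, 0x20 and 0x20) come from the walkthrough above.

```go
package main

import "fmt"

// Offsets on x64 Windows, as used in the walkthrough above.
const (
	tebPebOffset        = 0x60 // TEB -> PEB
	pebLdrOffset        = 0x18 // PEB -> Ldr (PEB_LDR_DATA)
	ldrInMemOrderOffset = 0x20 // PEB_LDR_DATA -> InMemoryOrderModuleList
	linkToDllBaseOffset = 0x20 // InMemoryOrderLinks (0x10) -> DllBase (0x30)
)

// memory is a toy address space: address -> 8-byte value stored there.
type memory map[uint64]uint64

// kernel32Base follows the same chain as the assembly:
// TEB -> PEB -> Ldr -> first entry -> second entry -> third entry -> DllBase.
func kernel32Base(mem memory, teb uint64) uint64 {
	peb := mem[teb+tebPebOffset]
	ldr := mem[peb+pebLdrOffset]
	first := mem[ldr+ldrInMemOrderOffset] // link of the executable
	second := mem[first]                  // Flink -> ntdll.dll
	third := mem[second]                  // Flink -> kernel32.dll
	return mem[third+linkToDllBaseOffset]
}

// sampleMemory builds a fake layout; all addresses are made up.
func sampleMemory() (memory, uint64) {
	const teb, peb, ldr = 0x1000, 0x2000, 0x3000
	const exeLink, ntdllLink, k32Link = 0x4010, 0x5010, 0x6010
	mem := memory{
		teb + tebPebOffset:            peb,
		peb + pebLdrOffset:            ldr,
		ldr + ldrInMemOrderOffset:     exeLink,
		exeLink:                       ntdllLink,
		ntdllLink:                     k32Link,
		k32Link + linkToDllBaseOffset: 0x7ffb00000000, // fake DllBase
	}
	return mem, teb
}

func main() {
	mem, teb := sampleMemory()
	fmt.Printf("kernel32 base: %#x\n", kernel32Base(mem, teb))
}
```

The sketch assumes the load order described above (executable, ntdll.dll, kernel32.dll); it is worth confirming that order in the debugger rather than relying on it blindly.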
GetProcAddress
The next step in our shellcode is to create a function that walks through the export directory of any given DLL (identified by its base address) and returns the absolute address of the requested function. Although in this example we will only call it once, in larger, more realistic scenarios we will most likely have to call this function multiple times.
Let's break the code down into smaller pieces to understand exactly what's happening.
parse_module
The parse_module function expects 2 values from the caller:
R9 -> should hold the base address of the dll
R8d -> should have the hash of the function (more on this later)
Line 2: The offset to the beginning of the NT header is moved to ecx
PE-bear is an excellent tool that can be used to cross check if the values we see in windbg are indeed the right ones.
Let's step over the code in line 2 to make sure we are getting the correct result. The value we expect to see in ecx is E8.
Lines 3-7: What happens on these lines is basically the following calculation
NtHeader = DllBase + 0xE8
Export Directory = NtHeader + 0x88
From a quick calculation (0xE8 + 0x88) we can see that the export directory entry is at offset 0x170 from the base address. Let's check in PE-bear whether that offset points to the export directory.
We can see that the offset 0x170 points to the RVA of the export directory.
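The same calculation can be reproduced in Go against a synthetic buffer. This is a sketch under stated assumptions: the buffer is not a real PE file, the RVA value 0x99080 is invented, and 0x3C is the standard location of the e_lfanew field in the DOS header (it is not spelled out in the text above). The 0x88 offset applies to 64-bit (PE32+) images.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	eLfanewOffset      = 0x3C // DOS header field holding the NT header offset
	exportDirRVAOffset = 0x88 // NT header -> export directory RVA (PE32+)
)

// exportDirectoryRVA mirrors lines 2-7 of parse_module: read the NT header
// offset, then read the export directory RVA relative to the NT header.
func exportDirectoryRVA(image []byte) (ntHeader, rva uint32) {
	ntHeader = binary.LittleEndian.Uint32(image[eLfanewOffset:])
	rva = binary.LittleEndian.Uint32(image[ntHeader+exportDirRVAOffset:])
	return ntHeader, rva
}

// sampleImage builds a fake image with only the two fields we read.
func sampleImage() []byte {
	image := make([]byte, 0x200)
	binary.LittleEndian.PutUint32(image[eLfanewOffset:], 0xE8)
	binary.LittleEndian.PutUint32(image[0x170:], 0x99080) // made-up RVA
	return image
}

func main() {
	nt, rva := exportDirectoryRVA(sampleImage())
	fmt.Printf("NT header at %#x, export directory RVA %#x\n", nt, rva)
}
```

Adding the DLL base address to the returned RVA gives the absolute address of the export directory, which is what the shellcode does next.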
Let's walk over the following instruction in windbg to ensure that r15 holds the RVA value we expect to see
When we move a dword into the lower 32 bits of a register, the upper 32 bits are filled with 0s.
This is not the case when we move a value into the lower 16 bits of the register: the remaining upper bits keep their previous contents.
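The difference between the two write sizes can be modelled in Go. This is an illustrative sketch only: movLow32 and movLow16 are hypothetical helpers that imitate what the CPU does to the full 64-bit register, and the sample values are arbitrary.

```go
package main

import "fmt"

// movLow32 models a 32-bit write such as `mov r15d, imm32`:
// the upper 32 bits of the register are cleared.
func movLow32(reg uint64, v uint32) uint64 {
	return uint64(v)
}

// movLow16 models a 16-bit write such as `mov r15w, imm16`:
// the upper 48 bits of the register are preserved.
func movLow16(reg uint64, v uint16) uint64 {
	return (reg &^ 0xFFFF) | uint64(v)
}

func main() {
	dirty := uint64(0xFFFFFFFFFFFFFFFF) // register with stale contents
	fmt.Printf("after 32-bit write: %#x\n", movLow32(dirty, 0x99080))
	fmt.Printf("after 16-bit write: %#x\n", movLow16(dirty, 0x1234))
}
```

This is why a register that receives a 32-bit RVA can be added to a 64-bit base address directly, while a register that received only a 16-bit value should be zeroed first.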
We can now add r9 which holds the dllBase address to calculate the absolute address of the Export directory.
In windbg we can confirm that we are indeed pointing to the right location by viewing the first two double words.
We can see the values of Characteristics (00000000) and ReproChecksum (2e35230e).
The last 3 lines store the number of function names in ecx and the address of names in r14.
A quick look in PE-bear reveals that we have the right values in both ecx, and r14. We can see that the first value at r14 is the same as the first Name RVA below.
search_function
The functionality of this code is fairly simple.
Line 2: Checks whether ecx is 0. If it is, execution jumps to line 10, which terminates our shellcode. An ecx of 0 means that our shellcode went through the whole export list without finding the requested function.
Line 3: Decrements ecx by 1 on every iteration
Line 4: Zeros rsi
Line 5: Moves the RVA of the current export name into esi. On the first iteration this is the last name RVA in the list
Line 6: Adds the base address to the RVA to get the absolute address in rsi
The second time the loop reaches this point this is the output from Windbg
It matches the exported functions from PE-bear
function_hashing
The shellcode author in this case came up with a smart algorithm that generates a hash based on the function name. It then compares the generated hash with the hash we provide. The caveat is that we have to write a piece of code to calculate that hash for us.
Lines 1-4: Zero rax and rdx and clear the direction flag (DF)
The iteration code is where the hashing happens.
Line 6: lodsb loads the byte at the address pointed to by rsi into the lowest byte of rax (al) and advances rsi to the next byte
Lines 8-9: Check whether the value is 0, which indicates the end of the string, and if so jump to the next function
Line 10: The x86-64 assembly instruction ror edx, 0x0d performs a "rotate right" operation on the contents of the edx register. In this case, the rotation is by 13 bits (0x0d in hexadecimal is 13 in decimal).
Imagine edx could only hold 4 bits. Here is an example of the ror effect after rotating right 1 bit.
Line 11: Adds eax to edx
Line 12: Loops to the next byte
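The rotate used on line 10 can be illustrated in Go. This is a small sketch: ror4 is a made-up helper for the imaginary 4-bit register, and ror32 wraps the standard library's bits.RotateLeft32, which rotates right when given a negative count.

```go
package main

import (
	"fmt"
	"math/bits"
)

// ror4 rotates a 4-bit value right by n bits. Bits that fall off the
// right-hand side re-enter on the left.
func ror4(v uint8, n uint) uint8 {
	v &= 0xF
	n %= 4
	return (v>>n | v<<(4-n)) & 0xF
}

// ror32 mirrors `ror edx, n` on a 32-bit value.
func ror32(v uint32, n int) uint32 {
	return bits.RotateLeft32(v, -n)
}

func main() {
	fmt.Printf("ror4(0011, 1)  = %04b\n", ror4(0b0011, 1))
	fmt.Printf("ror32(0x57,13) = %#08x\n", ror32(0x57, 13))
}
```

With 4 bits, rotating 0011 right by one gives 1001: the low 1 wraps around to the top. The 32-bit case works the same way, just with 13 bits wrapping at a time.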
compare_hash
The last part of the code is where the actual comparison takes place with the provided hash. Our hash will be located in r8d.
Line 2: Compares calculated hash from the previous function with the one we provided
Line 3: If they are not equal it jumps back to our search_function loop to get the next entry.
Lines 4-10 Only execute if the provided and calculated hashes match
Line 4: r15 holds the address of the export directory. The offset 0x24 points to the AddressOfNameOrdinals
Line 5: Adds base address to the RVA to get absolute address of the Address of name ordinal
Line 6: Adds the ordinal value of the function above the desired one in ecx
As we can see the ordinal value in ecx is pointing to the function WideCharToMultiByte
The ordinal value of WinExec is 639.
Lines 7-8: Point to the address-of-functions table. That's where we find the value we need to call the function.
Line 9: Gets the RVA of the address of function for WinExec in eax
Comparing with the previous screenshot we can see that it's a match
Line 10: We add the base address and we have the function in the rax register ready to be called as needed.
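The lookup performed by search_function and compare_hash can be sketched in Go over a synthetic export table. Everything here is an assumption made for illustration: the exportTable type, the resolve function, the table contents and the base address are invented, and the name hashes are taken as precomputed to keep the example short.

```go
package main

import "fmt"

// exportTable is a simplified stand-in for the three export arrays.
type exportTable struct {
	nameHashes []uint32 // one hash per entry in the names table
	ordinals   []uint16 // name-ordinals table, same order as names
	functions  []uint32 // address-of-functions table (RVAs)
}

// resolve walks the names from last to first, as the shellcode does with
// ecx, and maps a matching name index to ordinal -> function RVA -> address.
func resolve(base uint64, t exportTable, want uint32) (uint64, bool) {
	for i := len(t.nameHashes) - 1; i >= 0; i-- {
		if t.nameHashes[i] != want {
			continue // hashes differ: try the previous name
		}
		ordinal := t.ordinals[i]
		return base + uint64(t.functions[ordinal]), true
	}
	return 0, false // went through the whole list without a match
}

// sampleTable holds made-up values; only 0x0E8AFE98 is meaningful.
func sampleTable() exportTable {
	return exportTable{
		nameHashes: []uint32{0x11111111, 0x0E8AFE98, 0x22222222},
		ordinals:   []uint16{2, 0, 1},
		functions:  []uint32{0x1000, 0x2000, 0x3000},
	}
}

func main() {
	addr, ok := resolve(0x7ffb00000000, sampleTable(), 0x0E8AFE98)
	fmt.Printf("found=%v address=%#x\n", ok, addr)
}
```

The key point is the double indirection: the index of the matching name selects an entry in the ordinals table, and that ordinal in turn selects the RVA in the functions table.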
Helper code for hash calculation
If you made it to this point, you are probably wondering how you can calculate the hash and provide it to the assembly code.
The following code will calculate and print the hash for us:
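A minimal Go sketch of such a helper is shown here. It follows the steps described in the function_hashing section (rotate the running value right by 13 bits, then add the next byte, stopping before the null terminator); the function name ror13Hash is chosen for this example and may differ from the helper in the original template.

```go
package main

import (
	"fmt"
	"math/bits"
)

// ror13Hash reproduces the hashing loop from the shellcode:
// for every byte of the name, rotate the hash right by 13 and add the byte.
func ror13Hash(name string) uint32 {
	var hash uint32
	for i := 0; i < len(name); i++ {
		hash = bits.RotateLeft32(hash, -13) // ror edx, 0x0d
		hash += uint32(name[i])             // add edx, eax
	}
	return hash
}

func main() {
	fmt.Printf("0x%08x\n", ror13Hash("WinExec"))
}
```

Running it for "WinExec" prints 0x0e8afe98. Function names are case-sensitive, so "winexec" would produce a different hash.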
Call the WinExec function
We now reach the end of our code.
Referring back to the original Go code, we need to get a pointer to a null-terminated string, in this example 'calc.exe', and then call the function.
Line 2: we can use the helper code to calculate the function hash:
We then feed the value to r8d.
Line 3: We call the parse_module function. If everything went well, rax will hold the address of the function
Great, we now only have to pass the arguments to the function.
This is a good place to pause and have a quick look at the x64 calling convention. When calling a function on x64 Windows, the first four integer or pointer arguments go in the registers rcx, rdx, r8 and r9, and all the rest go on the stack from right to left, so the last argument is pushed first. The caller must also reserve 32 bytes of shadow space and keep the stack 16-byte aligned when making the call.
A great source of information is Microsoft's website.
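To make the rule concrete, the placement of integer or pointer arguments can be sketched in Go. This is an illustration under assumptions: argLocation and stackReserve are hypothetical helpers, the stack offsets are given from the caller's point of view at the moment of the call, and stackReserve assumes rsp is already 16-byte aligned before the reservation.

```go
package main

import "fmt"

// Registers used for the first four integer or pointer arguments.
var argRegs = [4]string{"rcx", "rdx", "r8", "r9"}

// argLocation returns where the zero-based argument i is passed.
// Stack arguments sit above the 32 bytes (0x20) of shadow space.
func argLocation(i int) string {
	if i < 4 {
		return argRegs[i]
	}
	return fmt.Sprintf("[rsp+%#x]", 0x20+8*(i-4))
}

// stackReserve returns how many bytes to reserve for a call with nargs
// arguments: at least four 8-byte slots, rounded up to a multiple of 16.
func stackReserve(nargs int) int {
	slots := nargs
	if slots < 4 {
		slots = 4 // shadow space is always reserved
	}
	return (slots*8 + 15) &^ 15
}

func main() {
	// WinExec takes two arguments: the command line and the show flag.
	for i := 0; i < 2; i++ {
		fmt.Printf("arg %d -> %s\n", i+1, argLocation(i))
	}
	fmt.Printf("reserve %#x bytes\n", stackReserve(2))
}
```

For WinExec both arguments fit in registers, so only the shadow space has to be reserved on the stack.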
With this knowledge let's pass the arguments to WinExec.
The WinExec definition from Microsoft states that the first argument should be a pointer to the null-terminated command string.
Lines 4-8:
Line 4: Zeros rcx
Line 5: Pushes 0 to the stack. This will act as the null terminator
Line 6: The hex values of calc.exe are moved to rcx
To convert ascii to hex I am using this online converter https://www.rapidtables.com/convert/number/ascii-to-hex.html
calc.exe = 63 61 6C 63 2E 65 78 65 + 00
Be careful: x64 is little-endian, so when the register is pushed onto the stack its least significant byte ends up at the lowest address. The bytes should therefore be written in the register in reverse order, as shown above.
Also, the register can only hold 8 bytes, so if the string is longer we will need to go through this process multiple times until the whole string is stored
Line 7: Pushes the string to the stack
Line 8: Gets a pointer to the string in the rcx register (first argument)
Lines 9-10: Zero rdx and increment it by 1 (second argument)
Line 11: Reserves argument storage space (shadow space) and aligns the stack
Line 12: Finally calling the function.
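The byte reversal described for line 6 can be automated with a few lines of Go. This is a sketch, not part of the original template: packString is a hypothetical helper that splits a string into 8-byte little-endian values and returns them in the order they would be pushed (last chunk first), zero-padding the final chunk.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// packString converts s into 64-bit immediates for mov/push sequences.
// Chunks are returned in push order, so the start of the string is pushed
// last and ends up at the lowest address (where rsp points afterwards).
func packString(s string) []uint64 {
	b := []byte(s)
	for len(b)%8 != 0 {
		b = append(b, 0) // pad the last chunk with null bytes
	}
	var out []uint64
	for i := len(b) - 8; i >= 0; i -= 8 {
		out = append(out, binary.LittleEndian.Uint64(b[i:i+8]))
	}
	return out
}

func main() {
	for _, q := range packString("calc.exe") {
		fmt.Printf("0x%016x\n", q)
	}
}
```

For "calc.exe" this yields the single value 0x6578652e636c6163, which is the byte sequence 63 61 6C 63 2E 65 78 65 read in reverse. Because that string is exactly 8 bytes long, no padding is added and the null terminator still has to be pushed separately, as line 5 does.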
Just before calling the function this is what we see in rcx,rdx which is exactly what we expect.
Stepping over the function should launch a calc.exe process
The whole shellcode template can be found here.