Shellcode is a sequence of machine code that attackers commonly use to run malicious code after exploiting a vulnerability, download the next payload, or beacon back to a C2 server. This article presents various techniques and tools for analyzing Windows shellcode. It has three main parts:
- Shellcode Introduction
- Dynamic Analysis via Emulation and Real Execution
- Code Analysis via Debugging and Static Analysis
Shellcode Introduction
Shellcode is a sequence of bytes, usually written as hex values, that encode CPU instructions and can be executed directly by the CPU. Below is an example of 32-bit shellcode:
56 64 a1 30 00 00 00 8b 40 0c 8b 70 1c ad 8b 40 08 5e c3 55 8b ec 8b 45 08 52 33 d2 c1 c2 03 32 10 40 80 38 00 75 f5 8b c2 5a 5d c2 04 00 55 8b ec 51 51 53 56 57 60 8b 5d 08 33 c0 8b 75 0c 8b fe 03 76 3c 8b 4e 78 03 cf 8b 51 1c 52 8b 51 24 52 8b 71 14 4e 89 75 fc 8b 71 20 03 f7 99 4a ad 42 60 3b 55 fc 75 04 33 c0 eb 37 33 ff 03 45 0c 97 8b cf ae 75 fd 2b f9 4f 51 e8 94 ff ff ff 3b c3 61 74 02 eb d9 8b 45 0c 92 5e
The CPU interprets “56” as “PUSH ESI” and “64 a1 30 00 00 00” as “MOV EAX, FS:[0x30]”. Figure 1 presents how the CPU interprets and executes instructions from the example above:
If you are interested in the inner workings of shellcode, which are useful for debugging and code analysis, I recommend chapter 19 of the Practical Malware Analysis book, in which the authors explain the steps a shellcode needs to execute properly (e.g. get the PEB, find module addresses, and load libraries). The chapter also includes a few samples as exercises. However, I rarely take this approach when analyzing shellcode, since it takes time, is more difficult, and there is a better way to achieve more or less similar results: dynamic analysis.
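As a small illustration of those inner workings, the second routine in the example above contains the bytes 33 d2 c1 c2 03 32 10 40 80 38 00 75 f5, which decode to XOR EDX, EDX followed by a loop of ROL EDX, 3 / XOR DL, [EAX] / INC EAX / CMP BYTE PTR [EAX], 0 / JNZ. The Python sketch below mirrors that loop for a non-empty string. The function name is my own, and reading the routine as a hash over API names is my interpretation of the bytes, not something documented for this sample.

```python
def rol3_xor_hash(name: bytes) -> int:
    """Mirror the loop `rol edx, 3 / xor dl, [eax] / inc eax` over a string."""
    h = 0  # xor edx, edx
    for byte in name:
        # rol edx, 3 on a 32-bit register
        h = ((h << 3) | (h >> 29)) & 0xFFFFFFFF
        # xor dl, [eax]: only the low 8 bits of edx can change
        h ^= byte
    return h


if __name__ == "__main__":
    # Example inputs only; which names the sample actually hashes is not
    # stated in the article.
    for api in (b"LoadLibraryA", b"GetProcAddress"):
        print(f"{api.decode()}: 0x{rol3_xor_hash(api):08x}")
```

Recomputing such values outside the debugger can help match a constant seen in the shellcode against a list of candidate names.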
Depending on the language of the malicious file being investigated, shellcode can be stored in multiple forms. Below are some common forms of shellcode embedded in a malicious file.
- Hex number:
56 64 a1 30 00 00
- Backslash:
\x56\x64\xa1\x30\x00\x00
- Percentage Unicode:
%u6456%u30a1%u0000
- Backslash Unicode:
\u6456\u30a1\u0000
- Byte array:
[0x56, 0x64, 0xa1, 0x30, 0x00, 0x00] // syntax depends on the programming language
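Before any tool can run the shellcode, these textual forms have to be turned back into raw bytes. The sketch below is one minimal way to do that in Python for the five forms listed above. The function name is hypothetical, and it assumes the input contains a single form and that the two Unicode forms store each 16-bit unit little-endian, as in the examples above.

```python
import re


def decode_shellcode(text: str) -> bytes:
    """Convert one of the textual shellcode forms listed above to raw bytes."""
    # %uXXXX and \uXXXX carry two bytes per unit, stored little-endian.
    units = re.findall(r"(?:%u|\\u)([0-9a-fA-F]{4})", text)
    if units:
        return b"".join(int(u, 16).to_bytes(2, "little") for u in units)
    # \xNN escapes and 0xNN array elements carry one byte per token.
    tokens = re.findall(r"(?:\\x|0x)([0-9a-fA-F]{2})", text)
    if tokens:
        return bytes(int(t, 16) for t in tokens)
    # Otherwise treat the input as a plain hex dump separated by whitespace.
    return bytes.fromhex("".join(text.split()))


if __name__ == "__main__":
    print(decode_shellcode("%u6456%u30a1%u0000").hex(" "))
```

The resulting bytes can then be written to a file, for example with pathlib's Path.write_bytes, and handed to the tools described in the rest of this article.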
When analyzing shellcode, you often need to execute or debug it to observe its behavior (covered in later sections), so it is convenient to convert the shellcode to an EXE file. There are multiple tools for this purpose; I find that the shellcode2exe batch script works well for both 32-bit and 64-bit shellcode.
shellcode2exe.bat 32/64 <shellcode filename> <EXE output filename>
Dynamic Analysis via Emulation and Real Execution
In my opinion, dynamic analysis is the most efficient way to analyze shellcode. Since shellcode is a small piece of machine instructions, it often doesn’t contain defensive mechanisms against dynamic analysis. In other words, by executing it, we should be able to observe most of its activities. There are two flavors of dynamic analysis, namely emulation and real execution:
- Emulation: as the name indicates, the shellcode is not actually executed, but only emulated to record all of its activities. scdbg is an excellent tool for this purpose. Figure 3 presents how simple it is to analyze a shellcode with scdbg.
scdbg -f <shellcode> -s <number of execution steps>
Note:
- The default number of steps is 2000000.
- If the "Stepcount" reaches the default number of execution steps, the shellcode still needs to execute beyond that number of steps. In that case, use "-s" to increase the number of execution steps.
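The rerun decision described in the note can be sketched in Python. This is only a sketch: both helper names are hypothetical, and the assumption that the scdbg output contains a line of the form "Stepcount <number>" should be checked against the output of your own scdbg build. Only the "-f" and "-s" options shown above are used.

```python
import re


def scdbg_command(shellcode_path: str, steps: int = 2_000_000) -> list[str]:
    """Build the argument list for the scdbg invocation shown above."""
    return ["scdbg", "-f", shellcode_path, "-s", str(steps)]


def hit_step_limit(output: str, steps: int = 2_000_000) -> bool:
    """Return True when the reported step count reached the step limit.

    Assumption: the output contains a line such as 'Stepcount 2000001'.
    """
    match = re.search(r"Stepcount\s+(\d+)", output)
    return match is not None and int(match.group(1)) >= steps


if __name__ == "__main__":
    # Made-up output fragment, used only to exercise the helper.
    if hit_step_limit("Stepcount 2000001"):
        print(" ".join(scdbg_command("sample.bin", steps=4_000_000)))
```

The printed command can be run by hand or passed to subprocess.run once scdbg is on the PATH.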
- Real execution:
The easiest way is to convert the shellcode to an EXE file and perform dynamic analysis. If you are not familiar with the topic, I have explained malware behavior analysis in detail in the article Malware Analysis Lab and Behavioral Analysis Steps.
However, sometimes you need more control over how the shellcode is started. For example, the shellcode may be launched with custom parameters, or it may need open file handles to run properly. If such control is needed, jmp2it is a great tool for customizing how the shellcode is started.
jmp2it <shellcode file> <offset to shellcode> [pause|addhandle]
- Option pause: jmp2it inserts a JMP that causes an infinite loop just before the shellcode. That way, the analyst can attach a debugger to jmp2it and debug the shellcode.
- Option addhandle: jmp2it creates a handle to the file.
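When the shellcode sits inside a larger file, the offset argument has to be found first. The sketch below searches a file's bytes for the first few known bytes of the shellcode and builds the matching jmp2it argument list. Both helper names are hypothetical, and passing the offset as a 0x-prefixed hex string is an assumption to verify against the jmp2it usage text.

```python
def find_offset(container: bytes, shellcode_prefix: bytes) -> int:
    """Return the file offset of the first occurrence of shellcode_prefix."""
    offset = container.find(shellcode_prefix)
    if offset < 0:
        raise ValueError("shellcode prefix not found in container")
    return offset


def jmp2it_command(path: str, offset: int, option: str = "pause") -> list[str]:
    """Build a jmp2it argument list; the hex offset format is an assumption."""
    if option not in ("pause", "addhandle"):
        raise ValueError("option must be 'pause' or 'addhandle'")
    return ["jmp2it", path, hex(offset), option]


if __name__ == "__main__":
    # Synthetic container: 0x15 padding bytes, then the start of the example.
    blob = b"\x00" * 0x15 + bytes.fromhex("5664a1300000")
    print(jmp2it_command("sample.bin", find_offset(blob, bytes.fromhex("5664a130"))))
```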
Code Analysis via Debugging and Static Analysis
The above technique, dynamic analysis, works well for shellcode that is not armed with defensive mechanisms. However, more complicated shellcode may perform a few environmental checks before executing its malicious activities. For example, it may check that no anti-virus software is installed on the targeted machine. In those cases, code analysis via debugging, static code analysis, or a combination of both helps analysts understand the code logic and bypass those evasion techniques.
- Debugging: similar to dynamic analysis in that it relies on executing the shellcode; hence, converting the shellcode to an EXE or using jmp2it is also the starting step. It differs from dynamic analysis in that execution is controlled. Dynamic analysis executes a shellcode without interruption, while in debugging mode an analyst can step into or step over a subroutine or instruction, pause, set breakpoints, and examine all data in memory and registers at every execution step. This gives the analyst a great view of the flow of the shellcode. There are quite a few user-mode debuggers; one of them is x64dbg. Despite its name, the tool can debug both 32-bit and 64-bit shellcode.
- Code analysis: statically examining the shellcode in a disassembler to understand its logic without executing it. This technique is similar to reading application code in an editor. There are a few popular options, both free and commercial (e.g. IDA, Ghidra, and Radare). IDA is undoubtedly the most popular disassembler; however, its commercial version is pretty expensive. I used to use IDA Free for code analysis; however, since the release of the NSA’s tool, Ghidra, I mainly use it because it offers many of IDA’s commercial features at no cost. When importing shellcode into Ghidra, it is important to select the correct language and compiler specification. For example, in the figure below, my shellcode is 32-bit and is designed to run in the Windows environment.
Initially, Ghidra treats the hex values in the shellcode file as data; hence, we need to explicitly tell Ghidra to interpret them as code.
To disassemble, select all rows --> right-click and choose “Disassemble” (or press D).