MS Office Forensic: VBA obfuscation via VBA Stomping and Purging

Tho Le
4 min read · Jan 6, 2021

VBA is one of the most common attack vectors against MS Office; hence, adversaries keep finding ways to evade antivirus, forensic tools, and even static analysis. In this article, I cover two interesting approaches: VBA Stomping and VBA Purging.

P/S: For details of MS Office document analysis, please refer to my previous article “Malicious Microsoft Office Document Analysis and Analyze a Cobalt Sample”.

The article has three parts:

  • VBA Hierarchy Explanation
  • VBA Stomping
  • VBA Purging

VBA Hierarchy Explanation

Microsoft provides rich details about VBA storage on its documentation page. The figure below gives an overview of the VBA hierarchy:

Figure 1: VBA Storage Hierarchy (source: https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-ovba/005bffd7-cd96-4f25-b75f-54433a646b88)

The actual structure of an MS Office document with macros can be viewed with GUI tools such as ssview and hexflex. Figure 2 shows an example in ssview:

Figure 2: VBA hierarchy from a Word file

Some important streams for VBA operation are explained below:

  • <module_stream>: in the example above, these are “NewMacros” and “ThisDocument”. Module streams contain the actual VBA code that is executed. Each module stream has two parts: PerformanceCache (VBA code compiled for performance, also called p-code) and CompressedSourceCode. Both components are abused to evade detection, as discussed in the VBA Stomping and VBA Purging sections below.
Figure 3: Module stream structure (Source: https://blog.nviso.eu/2020/02/25/evidence-of-vba-purging-found-in-malicious-documents/)
Figure 4: PerformanceCache (above, unreadable) and CompressedSourceCode (below, containing some readable characters)
  • dir: a compressed binary stream describing the layout of the VBA project. The offset at which CompressedSourceCode begins in each module stream (that is, the boundary between PerformanceCache and CompressedSourceCode) is defined in this stream.
  • _VBA_PROJECT: a binary stream that controls how the VBA engine executes the project. “If the MS Office version specified in the _VBA_PROJECT stream matches the MS Office version of the host program then the VBA source code in the module stream is ignored and the p-code is executed instead.” (source: https://outflank.nl/blog/2019/05/05/evil-clippy-ms-office-maldoc-assistant/)
  • __SRP_n: streams containing PerformanceCache-related data.
  • Project: a text stream holding project information.
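
To make the module stream layout concrete, here is a minimal Python sketch (standard library only) that splits a module stream at the offset taken from the dir stream and then decompresses the CompressedSourceCode part. It follows my reading of the compression scheme described in the MS-OVBA specification and does no validation; the function names, the `text_offset` parameter, and the sample bytes are made up for illustration. Real tools such as oledump and olevba do this far more robustly.

```python
from typing import Tuple


def split_module_stream(stream: bytes, text_offset: int) -> Tuple[bytes, bytes]:
    """Split a module stream into (PerformanceCache, CompressedSourceCode).

    text_offset is assumed to have been read from the dir stream already.
    """
    if not 0 <= text_offset <= len(stream):
        raise ValueError("text_offset lies outside the module stream")
    return stream[:text_offset], stream[text_offset:]


def decompress_source(container: bytes) -> bytes:
    """Decompress an MS-OVBA compressed container (simplified, no validation)."""
    if not container or container[0] != 0x01:
        raise ValueError("container must start with signature byte 0x01")
    out = bytearray()
    pos = 1
    while pos + 2 <= len(container):
        # 2-byte chunk header: bits 0-11 = chunk size - 3, bit 15 = compressed flag
        header = int.from_bytes(container[pos:pos + 2], "little")
        chunk_end = min(pos + (header & 0x0FFF) + 3, len(container))
        is_compressed = bool(header & 0x8000)
        pos += 2
        chunk_start = len(out)
        if not is_compressed:
            # Raw chunk: the data is stored as-is
            out += container[pos:chunk_end]
            pos = chunk_end
            continue
        while pos < chunk_end:
            flags = container[pos]  # one flag bit per following token
            pos += 1
            for bit in range(8):
                if pos >= chunk_end:
                    break
                if not (flags >> bit) & 1:
                    out.append(container[pos])  # literal byte
                    pos += 1
                    continue
                # Copy token: offset/length bit split depends on bytes written so far
                token = int.from_bytes(container[pos:pos + 2], "little")
                pos += 2
                written = len(out) - chunk_start
                bit_count = max((written - 1).bit_length(), 4)
                length = (token & (0xFFFF >> bit_count)) + 3
                offset = (token >> (16 - bit_count)) + 1
                for _ in range(length):
                    out.append(out[-offset])
    return bytes(out)


# Made-up module stream: 5 placeholder "p-code" bytes, then a tiny container
fake_pcode = b"\x01\x16\x03\x00\x00"
sample_stream = fake_pcode + b"\x01\x08\xb0\x00" + b"Sub a()\n"
cache, compressed = split_module_stream(sample_stream, len(fake_pcode))
print(decompress_source(compressed).decode("latin-1"))
```

Everything before the offset is PerformanceCache and everything after it is CompressedSourceCode, which is why both stomping and purging can edit one half of a module stream without touching the other.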

VBA Stomping

VBA Stomping is a technique that removes (or overwrites) the CompressedSourceCode in a module stream and leaves only the compiled code (PerformanceCache) for execution; hence, forensic tools that rely on the VBA source code will not detect malicious MS Office files that use this technique. It is important to note that MS Office only executes the compiled code if the MS Office version on the target host matches the value specified in the _VBA_PROJECT stream.

For more details, vbastomp.com has good resources for digging deeper. Furthermore, Outflank has released a tool named EvilClippy to stomp VBA code and confuse analysis tools.

Let's have a look at an example found on VT with SHA256: 23f7817eae61fda0a25292c4fd4f8d7e07150c657274c912776548c59e6c71f3

  • ssview shows an error when trying to open the file. This could be due to the removal of the compressed source code.
Figure 5: ssview failed to open a VBA stomping file
  • The oledump.py tool doesn’t detect any macro code. Hence, AV or analysts may classify the file as benign. However, VBA-related streams appear in the output, which hints at the presence of VBA.
Figure 6: VBA output
  • Since CompressedSourceCode has been stomped from “Module1”, the VBA source code can’t be seen, only the compiled version (PerformanceCache). The pcodedmp and olevba tools can be used to disassemble the embedded p-code.
Figure 7: PerformanceCache (compiled codes) output from pcodedmp tool
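
One way to turn the observations above into a check is to compare names recovered from the p-code with the decompressed source text. The sketch below is only a rough heuristic under stated assumptions: it presumes the analyst has already pulled identifiers and string literals out of a p-code disassembly (for example, by reading pcodedmp output by hand), and the function name and sample values are hypothetical.

```python
from typing import Iterable, List


def missing_from_source(pcode_names: Iterable[str], source_text: str) -> List[str]:
    """Return names seen in the p-code that never appear in the VBA source.

    A non-empty result suggests the source no longer matches the compiled
    code, which is what VBA Stomping produces. Matching is case-insensitive
    because VBA itself is case-insensitive.
    """
    source_lower = source_text.lower()
    return sorted(name for name in set(pcode_names)
                  if name.lower() not in source_lower)


# Hypothetical names taken from a p-code listing
names_in_pcode = ["AutoOpen", "Shell", "powershell"]

# Source that was replaced with harmless-looking code
benign_source = "Sub AutoOpen()\nEnd Sub\n"
print(missing_from_source(names_in_pcode, benign_source))

# Source that was removed entirely
print(missing_from_source(names_in_pcode, ""))
```

A plain substring match like this can miss or over-report names, so any hit is a reason to inspect the p-code manually rather than a verdict.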

VBA Purging

In February 2020, Didier Stevens published evidence of a new technique dubbed VBA Purging. VBA Purging is simply the reverse of VBA Stomping: it removes PerformanceCache and leaves CompressedSourceCode. This is clearly an obfuscation move to evade signature detection based on PerformanceCache scanning.

A few months later (November 2020), FireEye also published an excellent blog post explaining VBA Purging. They also released a command-line tool for testing this technique, called OfficePurge.

In a nutshell, this technique has three notable effects, as shown in Figure 8:

  • Compiled code is removed from the module streams. In this example, streams 7 and 8 have 0 bytes of compiled code, while their compressed source code is 137 and 924 bytes respectively.
  • All PerformanceCache-related information in _VBA_PROJECT is removed, leaving only the 7-byte header. As noted in the previous section, MS Office reads this stream to decide whether to use the compiled code or the compressed source code. Since the compiled code has been removed here, nothing beyond the header should remain in the stream.
  • All __SRP_n streams are removed (if present), as those streams also contain PerformanceCache-related data.
Figure 8: oledump output of a sample purged Word file produced with OfficePurge
  • As usual, the compressed VBA code can be viewed via analysis tools, such as oledump.
Figure 9: oledump output of compressed VBA code
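
The three purge indicators listed above can be sketched as a small Python check. This is a minimal sketch, not a detection tool: it assumes the stream bytes, the per-module source offsets (from the dir stream), and the stream names have already been extracted with an OLE parser, and it assumes the 7-byte header layout (2-byte magic 0x61CC, 2-byte version, 1 reserved byte, 2 reserved bytes) from my reading of MS-OVBA. The function name and sample values are hypothetical.

```python
import struct
from typing import Dict, Iterable, List

HEADER_SIZE = 7  # magic (2) + version (2) + reserved (1) + reserved (2)


def purge_indicators(vba_project: bytes,
                     module_offsets: Dict[str, int],
                     stream_names: Iterable[str]) -> List[str]:
    """Collect hints that a VBA project may have been purged."""
    findings = []

    # 1. _VBA_PROJECT reduced to its bare header
    if len(vba_project) >= HEADER_SIZE:
        magic, _version, _res2, _res3 = struct.unpack_from("<HHBH", vba_project)
        if magic == 0x61CC and len(vba_project) == HEADER_SIZE:
            findings.append("_VBA_PROJECT holds only the 7-byte header")

    # 2. Source starts at offset 0 in every module: no PerformanceCache at all
    if module_offsets and all(off == 0 for off in module_offsets.values()):
        findings.append("no module stream contains PerformanceCache")

    # 3. No SRP streams left in the project
    if not any("_SRP_" in name for name in stream_names):
        findings.append("no SRP streams present")

    return findings


# Illustrative values only; the version field here is a placeholder
purged_header = b"\xcc\x61\xff\xff\x00\x00\x00"
offsets = {"ThisDocument": 0, "NewMacros": 0}
names = ["Macros/VBA/dir", "Macros/VBA/_VBA_PROJECT", "Macros/VBA/NewMacros"]
for finding in purge_indicators(purged_header, offsets, names):
    print(finding)
```

None of these hints is conclusive on its own (a project may simply have no SRP streams, for instance), so they are best read together.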

Tho Le

Senior Cyber Security Analyst — be better than the yesterday self