pcodedmp.py - A VBA p-code disassembler

Introduction

It is not widely known, but macros written in VBA (Visual Basic for Applications; the macro programming language used in Microsoft Office) exist in three different executable forms, each of which can be what is actually executed at run time, depending on the circumstances. These forms are:

Since most of the time it is the p-code that determines what exactly a macro will do (even if neither source code nor execodes are present), it makes sense to have a tool that can display it. This is what prompted us to create this VBA p-code disassembler.

Installation

The script works with both Python 2.6+ and Python 3.x. The simplest way to install it is from PyPI with pip:

pip install pcodedmp -U

The above command installs the latest version of pcodedmp (upgrading an older one if it is already installed) along with all the necessary dependencies (currently only oletools and win_unicode_console, but there might be additional ones in the future).

If you would rather install it from the GitHub repository, you can do it like this:

git clone https://github.com/bontchev/pcodedmp.git
cd pcodedmp
pip install .

Usage

The script takes as a command-line argument a list of one or more names of files or directories. If the name is an OLE2 document, it will be inspected for VBA code and the p-code of each code module will be disassembled. If the name is a directory, all the files in this directory and its subdirectories will be similarly processed. In addition to the disassembled p-code, by default the script also displays the parsed records of the dir stream, as well as the identifiers (variable and function names) used in the VBA modules and stored in the _VBA_PROJECT stream.
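The file and directory handling described above can be sketched in a few lines of Python. This is only an illustration, not the code pcodedmp actually uses: `collect_files` and `is_ole2` are hypothetical helper names, and the check relies solely on the 8-byte signature that begins every OLE2 (Compound File Binary) file.

```python
import os

# Signature found at the start of every OLE2 (Compound File Binary) file.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def is_ole2(path):
    """Return True if the file starts with the OLE2 signature (hypothetical helper)."""
    try:
        with open(path, "rb") as handle:
            return handle.read(len(OLE2_MAGIC)) == OLE2_MAGIC
    except OSError:
        # Unreadable files are simply treated as "not OLE2" in this sketch.
        return False


def collect_files(names, recurse=True):
    """Expand a list of file and directory names into file paths (hypothetical helper)."""
    found = []
    for name in names:
        if os.path.isdir(name):
            # os.walk visits the top-level directory first, then its subdirectories.
            for root, _dirs, files in os.walk(name):
                found.extend(os.path.join(root, f) for f in sorted(files))
                if not recurse:
                    break  # stop after the top-level directory
        elif os.path.isfile(name):
            found.append(name)
    return found
```

A caller would then keep only the candidates worth inspecting, for example `[p for p in collect_files(names) if is_ole2(p)]`.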

The script supports VBA5 (Office 97, MacOffice 98), VBA6 (Office 2000 to Office 2009) and VBA7 (Office 2010 and higher).

The script also accepts the following command-line options:

-h, --help Displays a short explanation of how to use the script and what the command-line options are.

-v, --version Displays the version of the script.

-n, --norecurse If a name specified on the command line is a directory, process only the files in this directory; do not process the files in its subdirectories.

-d, --disasmonly Only the p-code will be disassembled, without the parsed contents of the dir stream or the identifiers in the _VBA_PROJECT stream.

-b, --verbose The contents of the dir and _VBA_PROJECT streams are dumped in hex and ASCII form. In addition, the raw bytes of each VBA line compiled into p-code are also dumped in hex and ASCII.

-o OUTFILE, --output OUTFILE Save the results to the specified output file, instead of sending them to the standard output.
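For readers who want to wrap or script the tool, the options listed above map naturally onto Python's argparse module. The following is a minimal sketch of such a parser, not the script's actual argument handling: the function name `build_parser`, the `names` and `outfile` attribute names, and the version string are placeholders chosen for illustration. The -h/--help option is provided by argparse automatically.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative sketch only)."""
    parser = argparse.ArgumentParser(
        prog="pcodedmp",
        description="Sketch of a command line matching the documented options.",
    )
    # The version string below is a placeholder, not the real version.
    parser.add_argument("-v", "--version", action="version",
                        version="%(prog)s 0.0.0-sketch")
    parser.add_argument("-n", "--norecurse", action="store_true",
                        help="do not descend into subdirectories")
    parser.add_argument("-d", "--disasmonly", action="store_true",
                        help="show only the disassembled p-code")
    parser.add_argument("-b", "--verbose", action="store_true",
                        help="also dump streams and raw bytes in hex and ASCII")
    parser.add_argument("-o", "--output", dest="outfile", metavar="OUTFILE",
                        default=None, help="write results to OUTFILE")
    # One or more file or directory names are required.
    parser.add_argument("names", nargs="+", metavar="NAME",
                        help="file or directory to process")
    return parser
```

Calling `build_parser().parse_args(["-d", "-o", "out.txt", "Word2013.doc"])` yields a namespace with `disasmonly` set, `outfile` equal to `"out.txt"`, and `names` containing the single document.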

For instance, using the script on one of the documents in the proof of concept mentioned above produces the following results:

python pcodedmp.py -d Word2013.doc

Processing file: Word2013.doc
===============================================================================
Module streams:
Macros/VBA/ThisDocument - 1517 bytes
Line #0:
        FuncDefn (Private Sub Document_Open())
Line #1:
        LitStr 0x001D "This could have been a virus!"
        Ld vbOKOnly
        Ld vbInformation
        Add
        LitStr 0x0006 "Virus!"
        ArgsCall MsgBox 0x0003
Line #2:
        LitStr 0x0008 "calc.exe"
        Paren
        ArgsCall Shell 0x0001
Line #3:
        EndSub

For reference, it is the result of compiling the following VBA code:

Private Sub Document_Open()
    MsgBox "This could have been a virus!", vbOKOnly + vbInformation, "Virus!"
    Shell("calc.exe")
End Sub
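Because the disassembly is plain text with a regular layout, it can be post-processed with a short script, for example to find the lines that call `Shell`. The sketch below assumes the layout shown in the listing above: a non-indented `Line #N:` header followed by indented instructions. The helper names `parse_listing` and `calls_to` are hypothetical, and only the simple `ArgsCall NAME COUNT` form from the example is recognized; other call forms would need extra handling.

```python
import re

# Matches a header such as "Line #2:" from the listing.
LINE_HEADER = re.compile(r"^Line #(\d+):$")


def parse_listing(text):
    """Group instructions by line number: {line_no: [(mnemonic, operands), ...]}."""
    parsed = {}
    current = None
    for raw in text.splitlines():
        stripped = raw.strip()
        header = LINE_HEADER.match(stripped)
        if header:
            current = int(header.group(1))
            parsed[current] = []
        elif not stripped:
            continue
        elif current is not None and raw[:1].isspace():
            # Indented text under a header is one instruction.
            mnemonic, _, operands = stripped.partition(" ")
            parsed[current].append((mnemonic, operands))
        else:
            # Any other non-indented text ends the current module listing.
            current = None
    return parsed


def calls_to(parsed, target):
    """Return the line numbers that contain a simple ArgsCall to `target`."""
    hits = []
    for line_no, instructions in sorted(parsed.items()):
        for mnemonic, operands in instructions:
            if mnemonic == "ArgsCall" and operands.split()[:1] == [target]:
                hits.append(line_no)
                break
    return hits
```

Applied to the listing above, `calls_to(parse_listing(text), "Shell")` would point at line 2, the line that launches calc.exe.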

Known problems

I do not have access to 64-bit Office 2016, and the few sample documents generated by this version of Office that I do have are not enough for me to figure out where the corresponding information resides. I know where it resides in the other versions of Office, but it has been moved elsewhere in 64-bit Office 2016 and the old algorithms no longer work.

To do

Change log

Version 1.2.6:

Version 1.2.5:

Version 1.2.4:

Version 1.2.3:

Version 1.2.2:

Version 1.2.1:

Version 1.2.0:

Version 1.1.0:

Version 1.0.0: