


A bulk_extractor scanner plug-in to detect and validate New Zealand Inland Revenue (IR) numbers, commonly referred to as IRD numbers or IR numbers. An IRD number serves a similar purpose to the US Social Security Number (SSN). IRD numbers identify both businesses and individuals in all tax, entitlement and personal records held by Inland Revenue, the New Zealand government department that administers government revenue and various social support programmes.

The purpose of this project is to provide a plug-in for the well-known bulk_extractor tool, a computer forensics tool used to scan electronic data and extract artifacts of interest. In this case, the artifacts of interest are IRD numbers. A notable strength of bulk_extractor is that it can process almost any type of data input, such as forensic disk images, folders of files and network traffic captures.

Quickstart: Ubuntu 18.04

Download bulk_extractor:

git clone --recursive https://github.com/simsong/bulk_extractor.git

Download IRDNumberScanner and copy its files to the bulk_extractor plugins directory:

git clone https://github.com/thomaslaurenson/IRDNumberScanner.git

cp ~/IRDNumberScanner/scan_ird.flex ~/bulk_extractor/plugins/scan_ird.flex

cp ~/IRDNumberScanner/Makefile.am ~/bulk_extractor/plugins/Makefile.am

Compile bulk_extractor:

cd ~/bulk_extractor

bash etc/CONFIGURE_UBUNTU18.bash

chmod u+x bootstrap.sh

./bootstrap.sh

./configure

make

cd plugins

make plugins

Associated Project Material

This repository hosts the code produced from Henry Gee's Master's thesis research at the University of Otago. I have since continued development of the project, adding tools and documentation so that it remains relevant and useful to computer forensic practitioners.

You can access Henry Gee's Master's thesis from OUR Archive, hosted by the University of Otago. The resulting academic paper, submitted to the 2015 Australian Digital Forensics Conference, is available online from the Edith Cowan University Research Online repository. Finally, the associated conference presentation, entitled Improving the Detection and Validation of Inland Revenue Numbers, is available on my personal website.

Project Structure

This project comes with a collection of useful programs and files. Below is a brief summary of the contents of the project:

IRD Number Format and Validation Method

New Zealand Inland Revenue (IR/IRD) numbers have a specified number format and validation method. The most recent documentation (2018) from the Inland Revenue department is available online as a PDF; the relevant material is on pages 33 to 35. Below is a summary of the IRD number structure and validation process.

An IRD number is an eight or nine digit number made up of a base number followed by a single check digit, which is used to validate the number.
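
The check-digit test can be sketched in bash as follows. This is a sketch based on a reading of the Inland Revenue specification, not the code in scan_ird.flex. An 8-digit number is padded with a leading zero, the first eight digits are multiplied by the weights 3, 2, 7, 6, 5, 4, 3, 2 and summed, and the expected check digit is 0 if the sum is divisible by 11, otherwise 11 minus the remainder. If that gives 10, the calculation is repeated with the secondary weights 7, 4, 3, 2, 5, 2, 7, 6, and a second result of 10 means the number is invalid. The valid range used below (10,000,000 to 150,000,000) is an assumption taken from the 2018 specification; confirm both the range and the weights against pages 33 to 35 of the current document. The function names are hypothetical.

```bash
#!/usr/bin/env bash

# ird_check_digit BASE W1 W2 ... W8   (hypothetical helper)
# Prints the check digit for the 8-digit BASE using the given weights:
# 0 if the weighted sum is divisible by 11, otherwise 11 - (sum mod 11).
# A printed value of 10 means no valid digit exists for these weights.
ird_check_digit() {
  local base="$1"
  shift
  local weights=("$@")
  local sum=0 i
  for ((i = 0; i < 8; i++)); do
    sum=$(( sum + ${base:i:1} * ${weights[i]} ))
  done
  local rem=$(( sum % 11 ))
  if (( rem == 0 )); then
    echo 0
  else
    echo $(( 11 - rem ))
  fi
}

# is_valid_ird DIGITS   (hypothetical helper)
# Succeeds (returns 0) if DIGITS, a bare 8 or 9 digit IRD number without
# spaces or dashes, passes the range and check-digit tests.
is_valid_ird() {
  local ird="$1"
  [[ "$ird" =~ ^[0-9]{8,9}$ ]] || return 1

  # Assumed valid range from the 2018 specification. "10#" forces base 10
  # so a leading zero is not read as octal.
  local value=$(( 10#$ird ))
  if (( value < 10000000 || value > 150000000 )); then
    return 1
  fi

  # Pad 8-digit numbers to 9 digits, then split off the check digit.
  if (( ${#ird} == 8 )); then
    ird="0$ird"
  fi
  local base="${ird:0:8}"
  local check="${ird:8:1}"

  local calc
  calc=$(ird_check_digit "$base" 3 2 7 6 5 4 3 2)
  if (( calc == 10 )); then
    # Primary weights gave 10: retry with the secondary weights.
    calc=$(ird_check_digit "$base" 7 4 3 2 5 2 7 6)
  fi
  # A second result of 10 can never equal a single digit, so it fails here.
  (( calc == check ))
}
```

For example, is_valid_ird 49091850 succeeds. Delimited values need their spaces or dashes removed first, for example is_valid_ird "$(printf '%s' '49-091-850' | tr -d ' -')". Under this sketch, the 9-digit example in the storage-format table (136410133) fails the check-digit test, so it illustrates layout only.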

IRD Number Format

New Zealand IRD numbers may be stored in a variety of different formats. Although an IRD number is an 8 or 9 digit number, it is commonly stored with space or dash delimiters, and there is no standardised storage method. The table below lists the potential storage formats; the scan_ird plug-in searches for IRD numbers in all of them:

IRD Number Description | IRD Number Structure | Example
--- | --- | ---
8 digits | NNNNNNNN | 49091850
8 digits with space delimiter | NN NNN NNN | 49 091 850
8 digits with dash delimiter | NN-NNN-NNN | 49-091-850
9 digits | NNNNNNNNN | 136410133
9 digits with space delimiter | NNN NNN NNN | 136 410 133
9 digits with dash delimiter | NNN-NNN-NNN | 136-410-133
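
The six layouts in the table above can be captured by one extended regular expression. The bash sketch below checks whether a single candidate string uses one of those layouts and, if so, prints it with the delimiters removed. It is a hypothetical helper for illustration only: it is not the flex pattern in scan_ird.flex, and it does not search through arbitrary data the way the plug-in does.

```bash
#!/usr/bin/env bash

# Bare 8/9 digits, or 2-3 digits followed by two 3-digit groups, where both
# delimiters are spaces or both are dashes (one delimiter type per value).
ird_format_re='^([0-9]{8,9}|[0-9]{2,3} [0-9]{3} [0-9]{3}|[0-9]{2,3}-[0-9]{3}-[0-9]{3})$'

# ird_digits CANDIDATE   (hypothetical helper)
# Prints CANDIDATE without delimiters if it matches one of the six layouts;
# returns 1 and prints nothing otherwise.
ird_digits() {
  local candidate="$1"
  [[ "$candidate" =~ $ird_format_re ]] || return 1
  # Strip the space or dash delimiters to leave the bare 8 or 9 digits.
  printf '%s\n' "${candidate//[ -]/}"
}
```

For example, ird_digits '136-410-133' prints 136410133, while a mixed value such as '49 091-850' is rejected because each layout uses a single delimiter type.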

Project Installation Instructions

This plug-in has been tested with bulk_extractor version 1.5.5 and the development code hosted on GitHub (dated: 2018/05/15, commit: ecb627d7b60d5a34b51639a8deaffc4db59fda27). The following instructions describe how to install IRDNumberScanner with the Git development version of bulk_extractor on Ubuntu 18.04 LTS.

Make sure git is installed:

sudo apt install git

These instructions assume you start in the home directory of your user account:

cd ~

Clone the official bulk_extractor repository:

git clone --recursive https://github.com/simsong/bulk_extractor.git

Change to the bulk_extractor directory:

cd ~/bulk_extractor

Run the provided script to configure the Ubuntu system; it installs the system dependencies that bulk_extractor needs:

bash etc/CONFIGURE_UBUNTU18.bash

Clone this (IRDNumberScanner) repository:

cd ~

git clone https://github.com/thomaslaurenson/IRDNumberScanner.git

Copy the two required files (scan_ird.flex and Makefile.am) to the plugins directory for bulk_extractor. These instructions expect that both bulk_extractor and IRDNumberScanner repositories have been cloned directly into your home directory:

cp ~/IRDNumberScanner/scan_ird.flex ~/bulk_extractor/plugins/scan_ird.flex

cp ~/IRDNumberScanner/Makefile.am ~/bulk_extractor/plugins/Makefile.am

To be clear: scan_ird.flex contains the IRD number scanner, implemented as a bulk_extractor plug-in, while Makefile.am is a modified version of the original bulk_extractor Makefile.am, used to compile the plug-in source code. The only modifications to Makefile.am are the inclusion of the scan_ird plug-in and the disabling of the scan_flexdemo plug-in (which was not operating correctly at the time of the last development on this project).

Make sure you are in the bulk_extractor directory:

cd ~/bulk_extractor

Make the bootstrap script executable, then run it:

chmod u+x bootstrap.sh

./bootstrap.sh

Run the configure script:

./configure

Now, compile (or make) the entire bulk_extractor project:

make

Finally, the plugins also require compilation:

cd ~/bulk_extractor/plugins/

make plugins

You can either install bulk_extractor system-wide using the following command:

sudo make install

Or use the compiled binaries directly from the project directory:

cd ~/bulk_extractor

Then run the bulk_extractor binary:

./src/bulk_extractor

You can also explicitly specify the plug-ins directory using the -P command-line argument:

./src/bulk_extractor -P plugins

You can verify that the scan_ird plug-in has been successfully compiled and is available using the following command:

./src/bulk_extractor -P plugins -h | grep ird
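
The example run below reads ~/sample-ird-numbers.txt, which these instructions do not otherwise create. One way to produce a small test input is to write the examples from the storage-format table into that file, as in the bash sketch below; the helper name make_sample_ird_file and the surrounding filler text are hypothetical. Which of these values scan_ird reports depends on its validation logic.

```bash
#!/usr/bin/env bash

# make_sample_ird_file PATH   (hypothetical helper)
# Writes the six storage-format examples from the table to PATH,
# one per line, each preceded by some filler text.
make_sample_ird_file() {
  local out="$1"
  cat > "$out" <<'EOF'
Customer A IRD: 49091850
Customer B IRD: 49 091 850
Customer C IRD: 49-091-850
Customer D IRD: 136410133
Customer E IRD: 136 410 133
Customer F IRD: 136-410-133
EOF
}
```

For example: make_sample_ird_file ~/sample-ird-numbers.txt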

Finally, below is an example command of how to run the IRD scanner:

./src/bulk_extractor -P plugins -E ird -o ~/test-run -R ~/sample-ird-numbers.txt

This command runs only the IRD scanner (-E ird), writes the output to a directory called test-run in the home directory (-o ~/test-run), and runs the scanner against a file named sample-ird-numbers.txt in the home directory (-R ~/sample-ird-numbers.txt).
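
bulk_extractor writes its results to feature files in the output directory. Each feature file line holds a tab-separated offset, feature and context, and header lines start with #. The bash sketch below counts how often each distinct feature appears in one feature file. The helper name is hypothetical, and the file name ird.txt used in the example is an assumption; list the contents of ~/test-run to find the file the scan_ird plug-in actually produces.

```bash
#!/usr/bin/env bash

# summarise_features FILE   (hypothetical helper)
# Prints "count<TAB>feature" for each distinct feature in a bulk_extractor
# feature file, most frequent first. Header lines starting with '#' and
# lines without a feature column are skipped.
summarise_features() {
  local file="$1"
  awk -F'\t' '!/^#/ && NF >= 2 { count[$2]++ }
              END { for (f in count) printf "%d\t%s\n", count[f], f }' "$file" |
    sort -rn
}
```

For example: summarise_features ~/test-run/ird.txt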