

Hadoop File System Forensics Toolkit (HDFS FTK)

View and extract files from an offline image of a Hadoop file system.



The Hadoop Distributed File System (HDFS) is one of the most widely used distributed file systems in the world. However, forensic techniques for analyzing and auditing it remain limited.

In HDFS, metadata is separated from the actual data blocks. The namenode holds the metadata (file names, timestamps, permissions), while the file contents are stored in blocks on the datanodes. Although HDFS ships with command-line client tools for extracting files, they only work against a running HDFS cluster. This tool aims to give investigators the ability to perform forensic analysis on offline evidence captures of Hadoop file system images.
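To make the metadata side concrete, here is a minimal sketch of reading inode records from an XML export of the namenode fsimage (the kind `hdfs oiv -p XML` produces). It is not the toolkit's own code. The function name `list_inodes` is hypothetical, and the element names (`INodeSection`, `inode`, `blocks`, `block`, `genstamp`, `numBytes`) are assumptions based on the Hadoop 2.x XML layout, so check them against your own export.

```python
import xml.etree.ElementTree as ET


def list_inodes(fsimage_xml):
    """Yield one dict per inode in an fsimage XML export.

    `fsimage_xml` may be a file path or a binary file object.
    Assumes the layout <fsimage><INodeSection><inode>...</inode></INodeSection>.
    """
    root = ET.parse(fsimage_xml).getroot()
    section = root.find("INodeSection")
    if section is None:
        return
    # Only direct <inode> children of INodeSection carry full metadata;
    # other sections may reuse the tag name for bare ID references.
    for inode in section.findall("inode"):
        blocks = [
            {
                "id": int(b.findtext("id")),
                "genstamp": int(b.findtext("genstamp")),
                "num_bytes": int(b.findtext("numBytes")),
            }
            for b in inode.iter("block")
        ]
        yield {
            "id": int(inode.findtext("id")),
            "type": inode.findtext("type"),          # e.g. FILE or DIRECTORY
            "name": inode.findtext("name") or "",    # root directory has an empty name
            "mtime": inode.findtext("mtime"),
            "permission": inode.findtext("permission"),
            "blocks": blocks,                        # empty for directories
        }
```

Each file inode lists the IDs of its data blocks in order. Those IDs are the link between the namenode metadata and the block files held on the datanodes.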



Evidence Acquisition Procedure

$namenode: hdfs dfsadmin -safemode enter
$namenode: hdfs dfsadmin -saveNamespace
$namenode: hdfs oiv -i <PATH_TO_FSIMAGE> -o <FILE> -p XML
$datanodes: tar czf datanodex.tar.gz $HADOOP_HOME/Hadoop_data
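Once the datanode archives from the procedure above have been extracted, a file can in principle be rebuilt by concatenating its blocks in the order given by the fsimage. The sketch below illustrates that idea only and is not the toolkit's recovery code. The helper names `index_block_files` and `reassemble` are hypothetical. It relies on the HDFS convention that block data files are named `blk_<id>` and sit alongside `blk_<id>_<genstamp>.meta` checksum files. It does not verify checksums or block lengths.

```python
import os
import shutil


def index_block_files(datanode_roots):
    """Map block ID -> path of its blk_<id> data file.

    `datanode_roots` is an iterable of directories, one per extracted
    datanode archive. With replication the same block exists on several
    datanodes; this sketch simply keeps the last copy it finds.
    """
    index = {}
    for root in datanode_roots:
        for dirpath, _dirs, files in os.walk(root):
            for fname in files:
                # Skip the .meta checksum files; keep only raw block data.
                if not fname.startswith("blk_") or fname.endswith(".meta"):
                    continue
                try:
                    block_id = int(fname[len("blk_"):])
                except ValueError:
                    continue  # not a plain blk_<id> file name
                index[block_id] = os.path.join(dirpath, fname)
    return index


def reassemble(block_ids, index, out_path):
    """Concatenate the given blocks, in order, into out_path."""
    missing = [b for b in block_ids if b not in index]
    if missing:
        raise FileNotFoundError("blocks not found in evidence: %s" % missing)
    with open(out_path, "wb") as out:
        for block_id in block_ids:
            with open(index[block_id], "rb") as src:
                shutil.copyfileobj(src, out)
    return out_path
```

For real evidence handling, work on copies of the extracted archives and record hashes of the recovered output, since this sketch does nothing to preserve or verify integrity.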


Hadoop Forensics File System Forensics Tool Kit (HDFS FTK) ver. 1.0
usage: hdfs_ftk.py [-h] [-v] [-displayfsimage] [-showfilesonly] [-showdironly]
                   [-filterByName FILTERBYNAME] [-d D] [-o O] [-r R] -f F

HDFS Forensics Toolkit.

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Verbose mode
  -displayfsimage       Print out fsimage table
  -showfilesonly        Show files only
  -showdironly          Show directories only
  -filterByName FILTERBYNAME
                        Filter fsimage display by filename
  -d D                  Number of Datanodes
  -o O, -output O       output directory
  -r R, -recover R      Recover file given ID number

Required named arguments:
  -f F, -fsimage F      Path to fsimage

Example commands:

Demo videos


