# comms-analyzer-toolbox
Docker image that provides a simplified OSINT toolset for importing and analyzing communications content from email MBOX files and other CSV data (such as text messages) using Elasticsearch and Kibana. A single command launches the full OSINT analytical software stack and imports all of your communications into it, ready for analysis with Kibana and Elasticsearch.
- [Summary](#summary)
- [Docker setup](#dockersetup)
- [Importing email from MBOX files](#mboxsummary)
- [Importing data from CSV files](#csvsummary)
- [Analyze previously imported data](#analyzeonly)
- [Expected warnings](#mboxwarn)
- [Help/Resources](#help)
- [Security/Privacy](#security)
<a id="summary"></a> Summary
This project manages a Dockerfile to produce an image that, when run, starts both Elasticsearch and Kibana and then optionally imports communications data using the following tools bundled within the container:

**IMPORTANT:** the links below are FORKS of the original projects, due to outstanding issues with the originals that were not fixed at the time of this project's development.
- elasticsearch-gmail python scripts, which import email data from an MBOX file. (See this link for issues this fork addresses)
- csv2es python scripts, which can import any data from a CSV file. (See this link for issues this fork addresses)
From there... well, you can analyze and visualize practically anything about your communications. Enjoy.
<a id="dockersetup"></a>Docker setup
Before running the example below, you need Docker installed.
- Docker for Mac
- Docker Toolbox for Windows 10+ home or earlier versions
- Docker for Windows 10+ pro, enterprise, hyper-v capable
**Windows note:** when you `git clone` this project on Windows prior to building, be sure to add the git clone flag `--config core.autocrlf=input`. Example: `git clone https://github.com/bitsofinfo/comms-analyzer-toolbox.git --config core.autocrlf=input`. Read more here.
Once Docker is installed, bring up a command-line shell and type the following to build the docker image for the toolbox:

```
docker build -t comms-analyzer-toolbox .
```
### Docker Toolbox for Windows notes
The default docker-machine VM created is likely too underpowered to run this out of the box. You will need to do the following to increase the CPU and memory of the local VirtualBox machine:

1. Bring up a "Docker Quickstart Terminal"
2. Remove the default machine: `docker-machine rm default`
3. Recreate it: `docker-machine create -d virtualbox --virtualbox-cpu-count=[N cpus] --virtualbox-memory=[XXXX megabytes] --virtualbox-disk-size=[XXXXXX] default`
### Troubleshooting error: "max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]"
If you see this error when starting the toolbox (the error is reported by Elasticsearch), you will need to do the following on the Docker host the container is being launched on:

```
sysctl -w vm.max_map_count=262144
```
If you are using Docker Toolbox, you first have to shell into the boot2docker VM with `docker-machine ssh default` to run this command. Or do the following to make it permanent: https://github.com/docker/machine/issues/3859
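A minimal sketch of both options (assuming a standard Linux Docker host, and the boot2docker `bootlocal.sh` workaround described in the issue linked above):

```
# On a Linux Docker host: apply now and persist across reboots
sudo sysctl -w vm.max_map_count=262144
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf

# On Docker Toolbox: re-apply the setting on every boot of the boot2docker VM
docker-machine ssh default \
  "echo 'sysctl -w vm.max_map_count=262144' | sudo tee -a /var/lib/boot2docker/bootlocal.sh && sudo chmod +x /var/lib/boot2docker/bootlocal.sh"
```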
<a id="mboxsummary"></a>MBOX import summary
For every email message in your MBOX file, each message becomes a separate document in Elasticsearch, where all email headers are indexed as individual fields and all body content is indexed with HTML/CSS/JS stripped.

For example, each email imported into the index has the following fields available for searching and analysis in Kibana (plus many, many more):
- date_ts (epoch_millis timestamp in GMT/UTC)
- to
- from
- cc
- bcc
- subject
- body
- body_size
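Once imported, these fields can be queried directly against Elasticsearch as well as through Kibana. A minimal sketch (assuming the default `comm_data` index name used below, Elasticsearch reachable on `localhost:9200`, and `invoice` as a placeholder search term):

```
# find the 5 most recent emails whose subject mentions "invoice"
curl -s -H 'Content-Type: application/json' \
  'http://localhost:9200/comm_data/_search?pretty' \
  -d '{
    "query": { "match": { "subject": "invoice" } },
    "sort": [ { "date_ts": "desc" } ],
    "size": 5
  }'
```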
<a id="gmailexample"></a>Example: export Gmail email to mbox file
Once Docker is available on your system, before you run `comms-analyzer-toolbox` you need to have some email to analyze in MBOX format. As an example, below is how to export email from Gmail.
1. Login to your Gmail account with a web browser on a computer and navigate to Google's "Download your data" (Takeout) page
2. On the screen that says "Download your data", under the section "Select data to include", click on the "Select None" button. This will grey out all the "Products" listed below it
3. Now scroll down and find the greyed-out section labeled "Mail" and click on the X checkbox on the right-hand side. It will now turn green, indicating this data will be prepared for you to download.
4. Scroll down and click on the blue "Next" button
5. Leave the "Customize archive format" settings as-is and hit the "Create Archive" button
6. This will take you to a "We're preparing your archive." screen. This might take a few hours depending on the size of all the email you have.
7. You will receive an email from Google when the archive is ready to download. When you get it, download the zip file to your local computer's hard drive; it will be named something like `takeout-[YYYYMMDD..].zip`
8. Once saved to your hard drive, unzip the file. All of your exported mail from Gmail will live in an mbox export file in the `Takeout/Mail/` folder; the file with all your mail is `All mail Including Spam and Trash.mbox`
9. You should rename this file to something simpler, like `my-email.mbox`
10. Take note of the location of your .mbox file, as you will use it below when running the toolbox.
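On macOS/Linux (or Git Bash on Windows), the unzip and rename steps above might look like this, assuming the default Takeout filenames shown above:

```
# extract the archive and rename the mbox file to something simpler
unzip takeout-*.zip
mv "Takeout/Mail/All mail Including Spam and Trash.mbox" my-email.mbox
```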
<a id="runningmbox"></a>Running: import emails for analysis
Before running the example below, you need Docker installed.
Bring up a terminal or command prompt on your computer and run the following. Before doing so, you need to replace `PATH/TO/YOUR/my-email.mbox` and `PATH/TO/ELASTICSEARCH_DATA_DIR` below with the proper paths on your local system.

**Note if using Docker Toolbox for Windows:** all of the mounted volumes below should live somewhere under your home directory (`c:\Users\[your username]\...`) due to permissions issues.
```
docker run --rm -ti -p 5601:5601 \
    --ulimit nofile=65536:65536 \
    -v PATH/TO/YOUR/my-email.mbox:/toolbox/email.mbox \
    -v PATH/TO/ELASTICSEARCH_DATA_DIR:/toolbox/elasticsearch/data \
    comms-analyzer-toolbox:latest \
    python /toolbox/elasticsearch-gmail/src/index_emails.py \
    --infile=/toolbox/email.mbox \
    --init=[True | False] \
    --index-bodies=True \
    --index-bodies-ignore-content-types=application,image \
    --index-bodies-html-parser=html5lib \
    --index-name=comm_data
```
Setting `--init=True` will delete and re-create the `comm_data` index. Setting `--init=False` will retain whatever data already exists.
The console will log output of what is going on. When the system is booted up, you can bring up a web browser on your desktop and go to http://localhost:5601 to start using Kibana to analyze your data. Note: if running Docker Toolbox, 'localhost' might not work; execute `docker-machine env default` to determine your Docker host's IP address, then go to http://[machine-ip]:5601
On the first screen that says `Configure an index pattern`, in the field labeled `Index name or pattern`, type `comm_data`. You will then see the `date_ts` field auto-selected; hit the `Create` button. From there Kibana is ready to use!
Launching does several things in the following order:
- Starts ElasticSearch (where your indexed emails are stored)
- Starts Kibana (the user-interface to query the index)
- Starts the mbox importer
When the mbox importer is running, you will see entries like the following in the logs as the system imports your mail from the mbox file:
```
...
[I 170825 18:46:53 index_emails:96] Upload: OK - upload took: 467ms, total messages uploaded: 1000
[I 170825 18:48:23 index_emails:96] Upload: OK - upload took: 287ms, total messages uploaded: 2000
...
```
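To verify progress, or the final message count, independently of the logs, you can query Elasticsearch directly. A sketch assuming the defaults used above (with Docker Toolbox, substitute your machine IP for `localhost`):

```
# returns {"count": <number of indexed emails>, ...}
curl -s 'http://localhost:9200/comm_data/_count?pretty'
```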
<a id="mboxoptions"></a>Toolbox MBOX import options
When running the `comms-analyzer-toolbox` image, one of the arguments invokes the elasticsearch-gmail script, which takes the following arguments. You can adjust the `docker run` command above to pass the following flags as you please:
```
Usage: /toolbox/elasticsearch-gmail/src/index_emails.py [OPTIONS]

Options:

  --help                            show this help information

/toolbox/elasticsearch-gmail/src/index_emails.py options:

  --batch-size                      Elasticsearch bulk index batch size (default
                                    500)
  --es-url                          URL of your Elasticsearch node (default
                                    http://localhost:9200)
  --index-bodies                    Will index all body content, stripped of
                                    HTML/CSS/JS etc. Adds fields: 'body',
                                    'body_size' and 'body_filenames' for any
                                    multi-part attachments (default False)
  --index-bodies-html-parser        The BeautifulSoup parser to use for
                                    HTML/CSS/JS stripping. Valid values
                                    'html.parser', 'lxml', 'html5lib' (default
                                    html.parser)
  --index-bodies-ignore-content-types
                                    If --index-bodies enabled, optional list of
                                    body 'Content-Type' header keywords to match
                                    to ignore and skip decoding/indexing. For
                                    all ignored parts, the content type will be
                                    added to the indexed field
                                    'body_ignored_content_types' (default
                                    application,image)
  --index-name                      Name of the index to store your messages
                                    (default gmail)
  --infile                          The mbox input file
  --init                            Force deleting and re-initializing the
                                    Elasticsearch index (default False)
  --num-of-shards                   Number of shards for ES index (default 2)
  --skip                            Number of messages to skip from the mbox
                                    file (default 0)
```
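For example, to resume an interrupted import without re-initializing the index, skipping messages already uploaded and using a larger bulk batch, the script portion of the `docker run` command above might become (the flag values here are illustrative):

```
python /toolbox/elasticsearch-gmail/src/index_emails.py \
    --infile=/toolbox/email.mbox \
    --init=False \
    --skip=2000 \
    --batch-size=1000 \
    --index-bodies=True \
    --index-name=comm_data
```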
<a id="mboxwarn"></a> MBOX import expected warnings
When importing MBOX email data, you may see warnings/errors like the following in the log output. They are expected and OK; they are simply warnings about some special characters that could not be decoded, etc.
```
...
/usr/lib/python2.7/site-packages/bs4/__init__.py:282: UserWarning: "https://someurl.com/whatever" looks like a URL. Beautiful Soup is not an HTTP client. You should probably use an HTTP client like requests to get the document behind the URL, and feed that document to Beautiful Soup.
  ' that document to Beautiful Soup.' % decoded_markup
[W 170825 18:41:56 dammit:381] Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
[W 170825 18:41:56 dammit:381] Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
...
```
<a id="csvsummary"></a>CSV import summary
The CSV import tool `csv2es` embedded in the toolbox can import ANY CSV file, not just the example format below.

For every row of data in a CSV file, each row becomes a separate document in Elasticsearch, where all CSV columns are indexed as individual fields.

For example, each line in the CSV data file below (text messages from an iPhone) imported into the index has the following fields available for searching and analysis in Kibana:
"Name","Address","date_ts","Message","Attachment","iMessage"
"Me","+1 555-555-5555","7/17/2016 9:21:39 AM","How are you doing?","","True"
"Joe Smith","+1 555-444-4444","7/17/2016 9:38:56 AM","Pretty good you?","","True"
"Me","+1 555-555-5555","7/17/2016 9:39:02 AM","Great!","","True"
....
- date_ts (epoch_millis timestamp in GMT/UTC)
- name
- address
- message
- attachment
- imessage
The text-message export CSV above is just an example. The `csv2es` tool bundled with the toolbox can import ANY data set you want, not just the example format above.
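Because `date_ts` is indexed as an `epoch_millis` date, time-bounded analysis is straightforward. A minimal sketch (assuming the `comm_data` index and defaults used below; the date range is a placeholder):

```
# all messages exchanged during July 2016
curl -s -H 'Content-Type: application/json' \
  'http://localhost:9200/comm_data/_search?pretty' \
  -d '{
    "query": {
      "range": {
        "date_ts": { "gte": "2016-07-01", "lte": "2016-07-31", "format": "yyyy-MM-dd" }
      }
    }
  }'
```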
<a id="iphoneexample"></a>Example: Export text messages from Iphone
Once Docker is available on your system, before you run `comms-analyzer-toolbox` you need to have some data to analyze in CSV format. As an example, below is how to export text messages from an iPhone to a CSV file.
1. Export iPhone messages using iExplorer for Mac or Windows
2. Edit the generated CSV file and change the first row's header value of `"Time"` to `"date_ts"`, then save and exit.
3. Take note of the location of your .csv file, as you will use it below when running the toolbox.
<a id="runningcsv"></a>Running: import CSV of text messages for analysis
Before running the example below, you need Docker installed.

The example below is specifically for a CSV data file containing text message data exported using iExplorer.
Contents of `data.csv`:
"Name","Address","date_ts","Message","Attachment","iMessage"
"Me","+1 555-555-5555","7/17/2016 9:21:39 AM","How are you doing?","","True"
"Joe Smith","+1 555-444-4444","7/17/2016 9:38:56 AM","Pretty good you?","","True"
"Me","+1 555-555-5555","7/17/2016 9:39:02 AM","Great!","","True"
....
Contents of `csvdata.mapping.json`:
```
{
    "dynamic": "true",
    "properties": {
        "date_ts": {"type": "date" },
        "name": {"type": "string", "index" : "not_analyzed"},
        "address": {"type": "string", "index" : "not_analyzed"},
        "imessage": {"type": "string", "index" : "not_analyzed"}
    }
}
```
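In this mapping, `"dynamic": "true"` lets any columns not listed be indexed with default settings, while `not_analyzed` keeps the listed columns as exact, un-tokenized values, which is what you want for filtering and aggregating on names and addresses in Kibana. Once the import has run, you can confirm what was applied (a sketch assuming the defaults used below):

```
# show the field mappings for the comm_data index
curl -s 'http://localhost:9200/comm_data/_mapping?pretty'
```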
Bring up a terminal or command prompt on your computer and run the following. Before doing so, you need to replace `PATH/TO/YOUR/data.csv`, `PATH/TO/YOUR/csvdata.mapping.json` and `PATH/TO/ELASTICSEARCH_DATA_DIR` below with the proper paths on your local system.

**Note if using Docker Toolbox for Windows:** all of the mounted volumes below should live somewhere under your home directory (`c:\Users\[your username]\...`) due to permissions issues.
```
docker run --rm -ti -p 5601:5601 \
    -v PATH/TO/YOUR/data.csv:/toolbox/data.csv \
    -v PATH/TO/YOUR/csvdata.mapping.json:/toolbox/csvdata.mapping.json \
    -v PATH/TO/ELASTICSEARCH_DATA_DIR:/toolbox/elasticsearch/data \
    comms-analyzer-toolbox:latest \
    python /toolbox/csv2es/csv2es.py \
    [--existing-index \]
    [--delete-index \]
    --index-name comm_data \
    --doc-type txtmsg \
    --mapping-file /toolbox/csvdata.mapping.json \
    --import-file /toolbox/data.csv \
    --delimiter ',' \
    --csv-clean-fieldnames \
    --csv-date-field date_ts \
    --csv-date-field-gmt-offset -1
```
If running against a pre-existing `comm_data` index, make sure to include the `--existing-index` flag only. If you want to re-create the `comm_data` index prior to import, include the `--delete-index` flag only.
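For instance, to append a second export into the already-populated index, the script portion of the `docker run` command would keep the same flags but switch to `--existing-index` (here `data2.csv` is a hypothetical second file, mounted into the container the same way as `data.csv` above):

```
python /toolbox/csv2es/csv2es.py \
    --existing-index \
    --index-name comm_data \
    --doc-type txtmsg \
    --mapping-file /toolbox/csvdata.mapping.json \
    --import-file /toolbox/data2.csv \
    --delimiter ',' \
    --csv-clean-fieldnames \
    --csv-date-field date_ts \
    --csv-date-field-gmt-offset -1
```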
The console will log output of what is going on. When the system is booted up, you can bring up a web browser on your desktop and go to http://localhost:5601 to start using Kibana to analyze your data. Note: if running Docker Toolbox, 'localhost' might not work; execute `docker-machine env default` to determine your Docker host's IP address, then go to http://[machine-ip]:5601
On the first screen that says `Configure an index pattern`, in the field labeled `Index name or pattern`, type `comm_data`. You will then see the `date_ts` field auto-selected; hit the `Create` button. From there Kibana is ready to use!
Launching does several things in the following order:
- Starts ElasticSearch (where your indexed CSV data is stored)
- Starts Kibana (the user-interface to query the index)
- Starts the CSV file importer
When the CSV importer is running, you will see entries in the logs as the system imports your data from the CSV file.
<a id="csvoptions"></a>Toolbox CSV import options
When running the `comms-analyzer-toolbox` image, one of the arguments invokes the csv2es script, which takes the following arguments. You can adjust the `docker run` command above to pass the following flags as you please:
```
Usage: /toolbox/csv2es/csv2es.py [OPTIONS]

  Bulk import a delimited file into a target Elasticsearch instance. Common
  delimited files include things like CSV and TSV.

  Load a CSV file:

    csv2es --index-name potatoes --doc-type potato --import-file potatoes.csv

  For a TSV file, note the tab delimiter option:

    csv2es --index-name tomatoes --doc-type tomato --import-file tomatoes.tsv --tab

  For a nifty pipe-delimited file (delimiters must be one character):

    csv2es --index-name pipes --doc-type pipe --import-file pipes.psv --delimiter '|'

Options:
  --index-name TEXT               Index name to load data into  [required]
  --doc-type TEXT                 The document type (like user_records)
                                  [required]
  --import-file TEXT              File to import (or '-' for stdin)  [required]
  --mapping-file TEXT             JSON mapping file for index
  --delimiter TEXT                The field delimiter to use, defaults to CSV
  --tab                           Assume tab-separated, overrides delimiter
  --host TEXT                     The Elasticsearch host
                                  (http://127.0.0.1:9200/)
  --docs-per-chunk INTEGER        The documents per chunk to upload (5000)
  --bytes-per-chunk INTEGER       The bytes per chunk to upload (100000)
  --parallel INTEGER              Parallel uploads to send at once, defaults
                                  to 1
  --delete-index                  Delete existing index if it exists
  --existing-index                Don't create index.
  --quiet                         Minimize console output
  --csv-clean-fieldnames          Strips double quotes and lower-cases all CSV
                                  header names for proper ElasticSearch
                                  fieldnames
  --csv-date-field TEXT           The CSV header name that represents a date
                                  string to be parsed (via python-dateutil)
                                  into an ElasticSearch epoch_millis
  --csv-date-field-gmt-offset INTEGER
                                  The GMT offset for the csv-date-field (i.e.
                                  +/- N hours)
  --tags TEXT                     Custom static key1=val1,key2=val2 pairs to
                                  tag all entries with
  --version                       Show the version and exit.
  --help                          Show this message and exit.
```
<a id="analyzeonly"></a>Running: analyze previously imported data
Running in this mode will just launch Elasticsearch and Kibana and will not import anything. It simply brings up the toolbox so you can analyze previously imported data that resides in Elasticsearch.
**Note if using Docker Toolbox for Windows:** all of the mounted volumes below should live somewhere under your home directory (`c:\Users\[your username]\...`) due to permissions issues.
```
docker run --rm -ti -p 5601:5601 \
    -v PATH/TO/ELASTICSEARCH_DATA_DIR:/toolbox/elasticsearch/data \
    comms-analyzer-toolbox:latest \
    analyze-only
```
To control the default Elasticsearch JVM memory heap options, set a Docker environment variable, i.e. `-e ES_JAVA_OPTS="-Xmx1g -Xms1g"`, etc.
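For example, an analyze-only launch with a 1GB heap might look like this (paths as in the examples above):

```
docker run --rm -ti -p 5601:5601 \
    -e ES_JAVA_OPTS="-Xmx1g -Xms1g" \
    -v PATH/TO/ELASTICSEARCH_DATA_DIR:/toolbox/elasticsearch/data \
    comms-analyzer-toolbox:latest \
    analyze-only
```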
<a id="help"></a>Help/Resources
### Gmail

### iPhone text messages

### Hotmail/Outlook
For Hotmail/Outlook, you need to export to PST and then, as a second step, convert to MBOX:
- https://support.microsoft.com/en-us/help/980534/export-windows-live-mail-email--contacts--and-calendar-data-to-outlook
- https://gallery.technet.microsoft.com/Convert-PST-to-MBOX-25f4bb0e
- http://www.hotmail.googleapps--backup.com/pst
- https://steemit.com/hotmail/@ariyantoooo/how-to-export-hotmail-to-pst
- http://www.techhit.com/outlook/convert_outlook_mbox.html
- https://gallery.technet.microsoft.com/office/PST-to-MBOX-Converter-to-e5ae03ae
### Kibana, graphs, searching
- Kibana 5 tutorial
- Kibana 101
- Kibana getting started
- Kibana introduction
- Kibana logz.io tutorial
- Kibana search syntax
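The Kibana search bar accepts Lucene query-string syntax over the fields indexed above, and the same syntax can be tested directly against Elasticsearch. A minimal sketch (the address and search terms are placeholders, and `comm_data` is the default index name used in the examples above):

```
# all emails from a given sender whose body mentions "invoice"
curl -s -H 'Content-Type: application/json' \
  'http://localhost:9200/comm_data/_search?pretty' \
  -d '{
    "query": {
      "query_string": { "query": "from:\"someone@example.com\" AND body:invoice" }
    }
  }'
```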
<a id="security"></a> Security/Privacy
Using this tool is completely local to whatever machine you are running it on (i.e. your Docker host). In the case of running it on your laptop or desktop computer, it's 100% local.

Data is not uploaded or transferred anywhere. The data does not go anywhere other than on disk locally on the Docker host this is running on.
To completely remove the data analyzed, you can `docker rm -f [container-id]` the `comms-analyzer-toolbox` container running on your machine.
If you mounted the Elasticsearch data directory via a volume on the host (i.e. `-v PATH/TO/ELASTICSEARCH_DATA_DIR:/toolbox/elasticsearch/data`), that local directory is where all the indexed data resides on disk.
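Put together, fully wiping everything might look like this (assuming the host data directory path used above; the `rm -rf` is irreversible):

```
# remove the running toolbox container
docker rm -f [container-id]

# wipe the locally persisted index data
rm -rf PATH/TO/ELASTICSEARCH_DATA_DIR
```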