Aleph Artifact Inspection Pipeline

This is the Aleph core, responsible for processing samples and extracting artifacts and intelligence.

Installing

I'm running this on a Debian VM. The default install expects everything to be running locally, but you can edit config.yaml to point at whatever hosts you want.

To install rabbitmq-server

$ sudo apt-get install rabbitmq-server
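If the service did not start automatically, you can check it with systemd (assuming a standard Debian systemd setup):

$ sudo systemctl status rabbitmq-server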

To install elasticsearch

See: https://www.elastic.co/guide/en/elasticsearch/reference/current/deb.html
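Once Elasticsearch is installed and started, a quick way to confirm it is reachable (assuming the default port 9200 on localhost) is:

$ curl http://localhost:9200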

Create a folder for aleph

$ mkdir -p /opt/aleph
$ cd /opt/aleph

Install Python 3 and dependencies

$ sudo apt-get install build-essential libffi-dev python3 python3-dev python3-pip libfuzzy-dev

Create a virtual environment

$ python3 -m venv .venv

Then clone the git repo into the same folder:

$ git clone git@bitbucket.org:janseidl/aleph2.git

Set up the config.yaml file

$ cp config.yaml.example config.yaml
$ vim config.yaml

Activate virtual environment

$ source .venv/bin/activate

Install python dependencies

$ pip3 install -r requirements.txt

Creating the index

The indexes must be created on ES and the Aleph mappings applied to them before running Aleph. Aleph won't try to create any indexes or configure ES on its own.

The example below creates the indexes and applies the mappings (assuming ES is on localhost):

$ curl -X PUT http://localhost:9200/aleph-samples
$ curl -X PUT http://localhost:9200/aleph-tracking
$ curl -X PUT http://localhost:9200/aleph-samples/_mapping/sample -d @sample.mapping.json --header 'Content-Type: application/json'
$ curl -X PUT http://localhost:9200/aleph-tracking/_mapping/sample -d @tracking.mapping.json --header 'Content-Type: application/json'
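To double-check that a mapping was applied, you can read it back with the standard Elasticsearch mapping API, e.g. for the samples index:

$ curl 'http://localhost:9200/aleph-samples/_mapping?pretty'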

Running

The example config file uses a local collector and local file storage. Please set the appropriate paths before launching Aleph.
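For example, if the collector watches a local inbox folder and storage writes to a local folder, create them before starting (the paths below are placeholders; use whatever you set in config.yaml):

$ mkdir -p /opt/aleph/samples /opt/aleph/storage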

Standalone

To run Aleph in a single worker, use the following Celery command:

$ celery worker -A aleph -B -c 1

Distributed

In the distributed configuration, you can run any combination of workers and queues you want. Just bind each worker to a queue with the -Q <queue> option.

Don't forget to give each worker a different name (-n), otherwise they will conflict and die.

Default Queues

The default queues used in the example below are collector, manager, store, plugins.generic and plugins.sandbox.

Running

$ celery worker -A aleph -Q collector -n worker1@%h -c 1
$ celery worker -A aleph -Q manager,store -n worker2@%h
$ celery worker -A aleph -Q plugins.generic -n worker3@%h
$ celery worker -A aleph -Q plugins.sandbox -n worker4@%h
...
$ celery -A aleph beat # Scheduler
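To confirm that each worker is bound to the queue you expect, Celery's standard inspect command lists the active queues per worker:

$ celery -A aleph inspect active_queues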

And that should have you up and running.

Concurrency / Number of concurrent workers

Each Celery command can run multiple forked worker processes by using the -c N parameter.
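For example, to run four concurrent processes for the generic plugins queue (queue and worker names taken from the distributed example above):

$ celery worker -A aleph -Q plugins.generic -n worker3@%h -c 4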

DO NOT RUN MORE THAN 1 WORKER (-c 1) FOR COLLECTORS OR GOD WILL MURDER ALL FIRSTBORNS

Logging

So far, logging is configured to pipe log entries to both stdout and aleph.log. If you want to suppress file logging, just omit the path directive in the logging section of the config file. Also, all logger calls currently emit DEBUG entries, which must be fixed (some should be promoted to INFO). Error logging is a @TODO as well.

In order to see the debug log, you must set the log level to debug on the worker:

$ celery worker -A aleph -B -l debug

Using Docker

The docker-compose.yml will spawn both rabbitmq and elasticsearch containers, alongside four Aleph containers with the same worker configuration as in the "Distributed" example.

Create a samples folder in the same folder as the Dockerfile and run docker-compose up to bring everything up.
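A minimal sequence, assuming you are in the directory containing the Dockerfile and docker-compose.yml:

$ mkdir samples
$ docker-compose up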