

References

ntop Webinar 2022 ntopconf 2023

Disclaimer

Please respect & protect the privacy of others.

The purpose of this software is not to spy on others, but to detect network anomalies and malicious traffic.

Abstract

nDPId is a set of daemons and tools to capture, process and classify network traffic. Its minimal dependencies (besides a half-way modern C library and POSIX threads) are libnDPI (>=4.9.0 or current github dev branch) and libpcap.

The daemon nDPId is capable of multithreading for packet processing, but w/o mutexes for performance reasons. Instead, synchronization is achieved by a packet distribution mechanism. To balance the workload to all threads (more or less) equally, a unique identifier represented as hash value is calculated using a 3-tuple consisting of: IPv4/IPv6 src/dst address; IP header value of the layer4 protocol; and (for TCP/UDP) src/dst port. Other protocols e.g. ICMP/ICMPv6 lack relevance for DPI, thus nDPId does not distinguish between different ICMP/ICMPv6 flows coming from the same host. This saves memory and performance, but might change in the future.
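The distribution idea described above can be sketched in a few lines of Python. This is only an illustration, not nDPId's actual C implementation: the function names, the use of CRC32 as the hash, and the sorting of endpoints (so that both directions of a flow map to the same thread) are assumptions made for this example.

```python
import ipaddress
import zlib

TCP, UDP = 6, 17  # IANA protocol numbers as found in the IP header


def flow_hash(src_ip, dst_ip, l4_proto, src_port=0, dst_port=0):
    """Hash addresses, layer4 protocol and (for TCP/UDP only) ports."""
    if l4_proto not in (TCP, UDP):
        # e.g. ICMP/ICMPv6: ports are ignored, so all such traffic
        # between two hosts collapses into one identifier
        src_port = dst_port = 0
    # Assumption: endpoints are sorted so that A->B and B->A hash equally.
    ends = sorted([
        (ipaddress.ip_address(src_ip).packed, src_port),
        (ipaddress.ip_address(dst_ip).packed, dst_port),
    ])
    key = b"".join(addr + port.to_bytes(2, "big") for addr, port in ends)
    key += bytes([l4_proto])
    return zlib.crc32(key)  # CRC32 is a stand-in for the real hash function


def thread_for_packet(src_ip, dst_ip, l4_proto, src_port, dst_port, n_threads):
    """Pick the processing thread that owns this flow."""
    return flow_hash(src_ip, dst_ip, l4_proto, src_port, dst_port) % n_threads
```

Because every packet of a flow yields the same hash, it always lands on the same thread, which is why per-flow state needs no mutex in such a design.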

nDPId uses libnDPI's JSON serialization interface to generate a JSON message for each event it receives from the library, which it then sends out to a UNIX-socket (default: /tmp/ndpid-collector.sock ). From such a socket, nDPIsrvd (or other custom applications) can retrieve incoming JSON-messages and process them further or distribute them to higher-level applications.

Unfortunately, nDPIsrvd does not yet support any encryption/authentication for TCP connections (TODO!).

Architecture

This project uses a kind of microservice architecture.

                connect to UNIX socket [1]        connect to UNIX/TCP socket [2]                
_______________________   |                                 |   __________________________
|     "producer"      |___|                                 |___|       "consumer"       |
|---------------------|      _____________________________      |------------------------|
|                     |      |        nDPIsrvd           |      |                        |
| nDPId --- Thread 1 >| ---> |>           |             <| ---> |< example/c-json-stdout |
| (eth0) `- Thread 2 >| ---> |> collector | distributor <| ---> |________________________|
|        `- Thread N >| ---> |>    >>> forward >>>      <| ---> |                        |
|_____________________|  ^   |____________|______________|   ^  |< example/py-flow-info  |
|                     |  |                                   |  |________________________|
| nDPId --- Thread 1 >|  `- send serialized data [1]         |  |                        |
| (eth1) `- Thread 2 >|                                      |  |< example/...           |
|        `- Thread N >|         receive serialized data [2] -'  |________________________|
|_____________________|                                                                   

where:

JSON stream format

JSON messages streamed by both nDPId and nDPIsrvd are presented with:

[5-digit-number][JSON message]

as with the following example:

01223{"flow_event_id":7,"flow_event_name":"detection-update","thread_id":12,"packet_id":307,"source":"wlan0", ...snip...}
00458{"packet_event_id":2,"packet_event_name":"packet-flow","thread_id":11,"packet_id":324,"source":"wlan0", ...snip...}
00572{"flow_event_id":1,"flow_event_name":"new","thread_id":11,"packet_id":324,"source":"wlan0", ...snip...}
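A consumer has to split the byte stream back into individual messages. The following Python sketch shows one way to do that. It assumes that the 5-digit prefix is the decimal length in bytes of the JSON message that follows it (including a trailing newline); check the schema files before relying on this. The names `parse_stream` and `frame` are made up for this example.

```python
import json

PREFIX_LEN = 5  # number of decimal digits in front of every message


def parse_stream(buffer):
    """Split a bytes buffer into decoded JSON messages.

    Returns (messages, rest); rest is an incomplete trailing message
    that should be prepended to the next chunk read from the socket.
    """
    messages = []
    while len(buffer) >= PREFIX_LEN:
        length = int(buffer[:PREFIX_LEN])  # assumed: length of the JSON part
        end = PREFIX_LEN + length
        if len(buffer) < end:
            break  # message not fully received yet
        messages.append(json.loads(buffer[PREFIX_LEN:end]))
        buffer = buffer[end:]
    return messages, buffer


def frame(message):
    """Inverse helper (for testing): serialize a dict in the same format."""
    body = json.dumps(message).encode() + b"\n"
    return b"%05d" % len(body) + body
```

Keeping the unparsed remainder around matters because a socket read may end in the middle of a message.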

The full stream of nDPId generated JSON-events can be retrieved directly from nDPId, without relying on nDPIsrvd, by providing a properly managed UNIX-socket.

Technical details about the JSON-message format can be obtained from the related .schema file included in the schema directory.

Events

nDPId generates JSON messages, each of which is assigned to a certain event. The event determines the contents (key-value-pairs) of the JSON message. Events are divided into four categories, each with a number of subevents.

Error Events

There are 17 distinct events, indicating that layer2 or layer3 packet processing failed or that not enough flow memory is available:

  1. Unknown datalink layer packet
  2. Unknown L3 protocol
  3. Unsupported datalink layer
  4. Packet too short
  5. Unknown packet type
  6. Packet header invalid
  7. IP4 packet too short
  8. Packet smaller than IP4 header
  9. nDPI IPv4/L4 payload detection failed
  10. IP6 packet too short
  11. Packet smaller than IP6 header
  12. nDPI IPv6/L4 payload detection failed
  13. TCP packet smaller than expected
  14. UDP packet smaller than expected
  15. Captured packet size is smaller than expected packet size
  16. Max flows to track reached
  17. Flow memory allocation failed

Detailed JSON-schema is available here

Daemon Events

There are 4 distinct events indicating startup/shutdown or status events as well as a reconnect event if there was a previous connection failure (collector):

  1. init: nDPId startup
  2. reconnect: (UNIX) socket connection lost previously and was established again
  3. shutdown: nDPId terminates gracefully
  4. status: statistics about the daemon itself e.g. memory consumption, zLib compressions (if enabled)

Detailed JSON-schema is available here

Packet Events

There are 2 events containing base64 encoded packet payloads either belonging to a flow or not:

  1. packet: does not belong to any flow
  2. packet-flow: belongs to a flow e.g. TCP/UDP or ICMP

Detailed JSON-schema is available here
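Since packet events carry their payload base64 encoded, a consumer has to decode it before inspecting the raw bytes. A minimal Python sketch follows. The key name `pkt` is a placeholder chosen for this example; look up the actual key in the packet event schema.

```python
import base64


def decode_packet(message, key="pkt"):
    """Return the raw packet bytes of a decoded packet event.

    `key` is a hypothetical field name; adjust it to the schema.
    """
    return base64.b64decode(message[key], validate=True)
```

With `validate=True`, malformed input raises an error instead of being silently truncated.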

Flow Events

There are 9 distinct events related to a flow:

  1. new: a new TCP/UDP/ICMP flow seen which will be tracked
  2. end: a TCP connection terminates
  3. idle: a flow timed out, because there was no packet on the wire for a certain amount of time
  4. update: inform nDPIsrvd or other apps about a long-lasting flow, whose detection was finished a long time ago but is still active
  5. analyse: provide some information about extracted features of a flow (Experimental; disabled by default, enable with -A)
  6. guessed: libnDPI was not able to reliably detect a layer7 protocol and falls back to IP/Port based detection
  7. detected: libnDPI successfully detected a layer7 protocol
  8. detection-update: libnDPI dissected more layer7 protocol data (after detection was already done)
  9. not-detected: neither detected nor guessed

Detailed JSON-schema is available here. Also, a graphical representation of Flow Events timeline is available here.
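An application consuming the stream usually starts by sorting messages into the four categories. The Python sketch below does this by looking at which `*_event_name` key a message contains. The keys `flow_event_name` and `packet_event_name` appear in the stream examples above; `daemon_event_name` and `error_event_name` are assumed by analogy and should be verified against the schema files.

```python
from collections import Counter

# The last two keys are assumptions made by analogy with the first two.
EVENT_KEYS = (
    "flow_event_name",
    "packet_event_name",
    "daemon_event_name",
    "error_event_name",
)
SUFFIX = "_event_name"


def event_of(message):
    """Return (category, event name) for a decoded message, or None."""
    for key in EVENT_KEYS:
        if key in message:
            return key[: -len(SUFFIX)], message[key]
    return None


def count_events(messages):
    """Count how often each (category, event name) pair occurs."""
    return Counter(e for e in map(event_of, messages) if e is not None)
```

For example, `count_events` applied to the three example messages above would report one `("flow", "detection-update")`, one `("packet", "packet-flow")` and one `("flow", "new")`.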

Flow States

A flow can have three different states while it is being tracked by nDPId.

  1. skipped: the flow will be tracked, but no detection will happen to reduce memory usage. See command line arguments -I and -E
  2. finished: detection finished and the memory used for the detection is freed
  3. info: detection is in progress and all flow memory required for libnDPI is allocated (this state consumes most memory)

Build (CMake)

The nDPId build system is based on CMake.

git clone https://github.com/utoni/nDPId.git
[...]
cd nDPId
mkdir build
cd build
cmake ..
[...]
make

See below for a full/test live-session.

Depending on your build environment and preferences, you may need:

mkdir build
cd build
ccmake ..

or to build with a statically linked libnDPI:

cmake -S . -B ./build \
    -DSTATIC_LIBNDPI_INSTALLDIR=[path/to/your/libnDPI/installdir] \
    -DNDPI_NO_PKGCONFIG=ON
cmake --build ./build

If you use the latter, make sure that you've configured libnDPI with ./configure --prefix=[path/to/your/libnDPI/installdir] and remember to set all necessary CMake variables to link against shared libraries used by your nDPI build. You'll also need to use -DNDPI_NO_PKGCONFIG=ON if STATIC_LIBNDPI_INSTALLDIR does not contain a pkg-config file.

e.g.:

cmake -S . -B ./build \
    -DSTATIC_LIBNDPI_INSTALLDIR=[path/to/your/libnDPI/installdir] \
    -DNDPI_NO_PKGCONFIG=ON \
    -DNDPI_WITH_GCRYPT=ON -DNDPI_WITH_PCRE=OFF -DNDPI_WITH_MAXMINDDB=OFF
cmake --build ./build

Or let a shell script do the work for you:

cmake -S . -B ./build \
    -DBUILD_NDPI=ON
cmake --build ./build

The CMake cache variable -DBUILD_NDPI=ON builds a version of libnDPI residing as a git submodule in this repository.

run

As mentioned above, nDPId needs a UNIX-socket to which it can stream its JSON-data.

Such a UNIX-socket can be provided either by the included nDPIsrvd daemon or, if you simply need a quick check, by the ncat utility:

ncat -U /tmp/listen.sock -l -k

Remember that OpenBSD netcat is not able to handle multiple connections reliably.

Once the socket is ready, you can run nDPId to capture and analyze your own traffic, with something similar to:

sudo nDPId -c /tmp/listen.sock

If you're using OpenBSD netcat, you need to run:

sudo nDPId -c /tmp/listen.sock -o max-reader-threads=1

Make sure that the UNIX socket is accessible by the user nDPId switches to (see -u; default: nobody).

Of course, both ncat and nDPId need to point to the same UNIX-socket. nDPId provides the -c option for exactly this purpose. By default, nDPId refers to /tmp/ndpid-collector.sock, and the same default path is also used by nDPIsrvd for the incoming socket.
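If neither nDPIsrvd nor ncat is at hand, a listening UNIX-socket can also be opened from a few lines of Python. This is a minimal sketch for quick checks only. It assumes a stream-type UNIX socket (as `ncat -U` uses by default), and the function names are made up for this example.

```python
import os
import socket


def open_collector(path):
    """Create a listening UNIX stream socket at `path`."""
    if os.path.exists(path):
        os.unlink(path)  # remove a stale socket file from a previous run
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen()
    return server


def read_all(conn, chunk_size=4096):
    """Read from a connected socket until the peer closes it."""
    chunks = []
    while True:
        chunk = conn.recv(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)
```

A quick check could call `server = open_collector("/tmp/listen.sock")`, then `conn, _ = server.accept()` and `read_all(conn)`. Used this way, the sketch serves a single connection only, so the max-reader-threads=1 setting mentioned for OpenBSD netcat presumably applies here as well.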

Give nDPId some real-traffic. You can capture your own traffic, with something similar to:

socat -u UNIX-Listen:/tmp/listen.sock,fork - # does the same as `ncat`
sudo chown nobody:nobody /tmp/listen.sock # default `nDPId` user/group, see `-u` and `-g`
sudo ./nDPId -c /tmp/listen.sock -l

nDPId also supports UDP collector endpoints:

nc -d -u 127.0.0.1 7000 -l -k
sudo ./nDPId -c 127.0.0.1:7000 -l

or you can generate an nDPId-compatible JSON dump with:

./nDPId-test [path-to-a-PCAP-file]

You can also start both nDPId and nDPIsrvd automatically, with:

Daemons:

make -C [path-to-a-build-dir] daemon

Or a manual approach with:

./nDPIsrvd -d
sudo ./nDPId -d

or for a usage printout:

./nDPIsrvd -h
./nDPId -h

And why not a flow-info example?

./examples/py-flow-info/flow-info.py

or anything below ./examples.

nDPId tuning

It is possible to change nDPId internals w/o recompiling by using -o subopt=value. But be careful: changing the default values may render nDPId useless and is not well tested.

Suboptions for -o:

Format: subopt (unit, comment): description

test

The recommended way to run regression / diff tests:

cmake -S . -B ./build-like-ci \
    -DBUILD_NDPI=ON -DENABLE_ZLIB=ON -DBUILD_EXAMPLES=ON
# optional: -DENABLE_CURL=ON -DENABLE_SANITIZER=ON
./test/run_tests.sh ./libnDPI ./build-like-ci/nDPId-test
# or: make -C ./build-like-ci test

Run ./test/run_tests.sh to see some usage information.

Remember that all test results are tied to a specific libnDPI commit hash as part of the git submodule. Using test/run_tests.sh for other commit hashes will most likely result in PCAP diffs.

Code Coverage

You may generate code coverage by using:

cmake -S . -B ./build-coverage \
    -DENABLE_COVERAGE=ON -DENABLE_ZLIB=ON
# optional: -DBUILD_NDPI=ON
make -C ./build-coverage coverage-clean
make -C ./build-coverage clean
make -C ./build-coverage all
./test/run_tests.sh ./libnDPI ./build-coverage/nDPId-test
make -C ./build-coverage coverage
make -C ./build-coverage coverage-view

Contributors

Special thanks to Damiano Verzulli (@verzulli) from GARRLab for providing server and test infrastructure.