Home

Awesome

Build Status Go Version Latest Version License Open Issues

<img src="static/dnsmonster-logo.png" width=60%>

Table of Contents

Dnsmonster

Passive DNS monitoring framework built on Golang. dnsmonster implements a packet sniffer for DNS traffic. It can accept traffic from a pcap file, a live interface or a dnstap socket, and can be used to index and store hundreds of thousands of DNS queries per second as it has shown to be capable of indexing 200k+ DNS queries per second on a commodity computer. It aims to be scalable, simple, and easy to use, and help security teams to understand the details about an enterprise's DNS traffic. dnsmonster doesn't look to follow DNS conversations, rather it aims to index DNS packets as soon as they come in. It also doesn't aim to breach the privacy of the end-users, with the ability to mask Layer 3 IPs (IPv4 and IPv6), enabling teams to perform trend analysis on aggregated data without being able to trace back the queries to an individual. Blogpost

The code before version 1.x is considered beta quality and is subject to breaking changes. Please visit the release notes for each tag to see the list of breaking scenarios between each release, and how to mitigate potential data loss.

graph TD
    subgraph Input
        B1["network input"]
        B2["pcap file"]
        B3["dnstap socket"]
    end
    
    subgraph "Process"
        C1["Sampling based of ratio"]
        C2["Packet Process"]
        C3["Dispatcher"]
        O11["Output1"]
        O12["Domain Skip (optional)"]
        O13["Domain Allow (optional)"]
        O21["Output2"]
        O22["Domain Skip (optional)"]
        O23["Domain Allow (optional)"]
        O31["Output3"]
        O32["Domain Skip (optional)"]
        O33["Domain Allow (optional)"]
    end
    
    B1 --> Process
    B2 --> Process
    B3 --> Process
    
    C1 --> C2
    C2 --> C3
    C3 --> O11
    C3 --> O21
    C3 --> O31
    
    O11 --> O12 --> O13
    O21 --> O22 --> O23
    O31 --> O32 --> O33
    
    subgraph Output
        Splunk
        Syslog
        H["ClickHouse"]
        Postgres
        Kafka
        I["JSON File"]
        Influx
        Elastic
        J["stdout"]
        Parquet
        Sentinel
    end
    
    O13 --> H
    O23 --> I
    O33 --> J

Main features

Installation

Linux

Best way to get started with dnsmonster is to download the binary from the release section. The binary is statically built against musl, hence it should work out of the box for many distros. For afpacket support, you must use kernel 3.x+. Any modern Linux distribution (CentOS/RHEL 7+, Ubuntu 14.0.4.2+, Debian 7+) is shipped with a 3.x+ version so it should work out of the box. If your distro isn't working with the pre-compiled version, please submit an issue with the details, and build dnsmonster manually using this section Build Manually.

Container

Since dnsmonster uses raw packet capture funcationality, Docker/Podman daemon must grant the capability to the container

sudo docker run --rm -it --net=host --cap-add NET_RAW --cap-add NET_ADMIN --name dnsmonster ghcr.io/mosajjal/dnsmonster:latest --devName lo --stdoutOutputType=1

Build manually

git clone https://github.com/mosajjal/dnsmonster --depth 1 /tmp/dnsmonster 
cd /tmp/dnsmonster
go get
go build -o dnsmonster ./cmd/dnsmonster
git clone https://github.com/mosajjal/dnsmonster --depth 1 /tmp/dnsmonster 
cd /tmp/dnsmonster
go get
go build -o dnsmonster -tags nolibpcap ./cmd/dnsmonster

The above build also works on ARMv7 (RPi4) and AArch64.

Build statically

If you have a copy of libpcap.a, you can build the statically link it to dnsmonster and build it fully statically. In the code below, please change /root/libpcap-1.9.1/libpcap.a to the location of your copy.

git clone https://github.com/mosajjal/dnsmonster --depth 1 /tmp/dnsmonster
cd /tmp/dnsmonster/
go get
go build --ldflags "-L /root/libpcap-1.9.1/libpcap.a -linkmode external -extldflags \"-I/usr/include/libnl3 -lnl-genl-3 -lnl-3 -static\"" -a -o dnsmonster ./cmd/dnsmonster

For more information on how the statically linked binary is created, take a look at this Dockerfile.

Windows

Bulding on Windows is much the same as Linux. Just make sure that you have npcap. Clone the repository (--history 1 works), and run go get and go build ./cmd/dnsmonster

As mentioned, Windows release of the binary depends on npcap to be installed. After installation, the binary should work out of the box. It's been tested in a Windows 10 environment and it executed without an issue. To find interface names to give --devName parameter and start sniffing, you'll need to do the following:

dnsmonster.exe --devName \Device\NPF_{16000000-0000-0000-0000-145C4638064C}

Note that you must change \Tcpip from getmac.exe to \NPF and then pass it to dnsmonster.exe.

FreeBSD and MacOS

Much the same as Linux and Windows, make sure you have git, libpcap and go installed, then follow the same instructions:

git clone https://github.com/mosajjal/dnsmonster --depth 1 /tmp/dnsmonster 
cd /tmp/dnsmonster
go get
go build -o dnsmonster ./cmd/dnsmonster

Architecture

All-in-one Installation using Docker

Basic AIO Diagram

In the example diagram, the egress/ingress of the DNS server traffic is captured, after that, an optional layer of packet aggregation is added before hitting the DNSMonster Server. The outbound data going out of DNS Servers is quite useful to perform cache and performance analysis on the DNS fleet. If an aggregator isn't available for you, you can have both TAPs connected directly to DNSMonster and have two DNSMonster Agents looking at the traffic.

running ./autobuild.sh creates multiple containers:

All-in-one Demo

AIO Demo

Enterprise Deployment

Basic AIO Diagram

Configuration

DNSMonster can be configured using 3 different methods. Command line options, Environment variables and configuration file. Order of precedence:

Command line options

Note that command line arguments are case-insensitive as of v0.9.5

# [capture]
# Device used to capture
--devname=

# Pcap filename to run
--pcapfile=

# dnstap socket path. Example: unix:///tmp/dnstap.sock, tcp://127.0.0.1:8080
--dnstapsocket=

# Port selected to filter packets
--port=53

# Capture Sampling by a:b. eg sampleRatio of 1:100 will process 1 percent of the incoming packets
--sampleratio=1:1

# Cleans up packet hash table used for deduplication
--dedupcleanupinterval=1m0s

# Set the dnstap socket permission, only applicable when unix:// is used
--dnstappermission=755

# Number of routines used to handle received packets
--packethandlercount=2

# Size of the tcp assembler
--tcpassemblychannelsize=10000

# Size of the tcp result channel
--tcpresultchannelsize=10000

# Number of routines used to handle tcp packets
--tcphandlercount=1

# Size of the channel to send packets to be defragged
--defraggerchannelsize=10000

# Size of the channel where the defragged packets are returned
--defraggerchannelreturnsize=10000

# Size of the packet handler channel
--packetchannelsize=1000

# Afpacket Buffersize in MB
--afpacketbuffersizemb=64

# BPF filter applied to the packet stream. If port is selected, the packets will not be defragged.
--filter=((ip and (ip[9] == 6 or ip[9] == 17)) or (ip6 and (ip6[6] == 17 or ip6[6] == 6 or ip6[6] == 44)))

# Use AFPacket for live captures. Supported on Linux 3.0+ only
--useafpacket

# The PCAP capture does not contain ethernet frames
--noetherframe

# Deduplicate incoming packets, Only supported with --devName and --pcapFile. Experimental 
--dedup

# Do not put the interface in promiscuous mode
--nopromiscuous

# [clickhouse_output]
# Address of the clickhouse database to save the results. multiple values can be provided.
--clickhouseaddress=localhost:9000

# Username to connect to the clickhouse database
--clickhouseusername=

# Password to connect to the clickhouse database
--clickhousepassword=

# Database to connect to the clickhouse database
--clickhousedatabase=default

# Interval between sending results to ClickHouse. If non-0, Batch size is ignored and batch delay is used
--clickhousedelay=0s

# Clickhouse connection LZ4 compression level, 0 means no compression
--clickhousecompress=0

# Debug Clickhouse connection
--clickhousedebug

# Use TLS for Clickhouse connection
--clickhousesecure

# Save full packet query and response in JSON format.
--clickhousesavefullquery

# What should be written to clickhouse. options:
#	0: Disable Output
#	1: Enable Output without any filters
#	2: Enable Output and apply skipdomains logic
#	3: Enable Output and apply allowdomains logic
#	4: Enable Output and apply both skip and allow domains logic
--clickhouseoutputtype=0

# Minimum capacity of the cache array used to send data to clickhouse. Set close to the queries per second received to prevent allocations
--clickhousebatchsize=100000

# Number of Clickhouse output Workers
--clickhouseworkers=1

# Channel Size for each Clickhouse Worker
--clickhouseworkerchannelsize=100000

# [elastic_output]
# What should be written to elastic. options:
#	0: Disable Output
#	1: Enable Output without any filters
#	2: Enable Output and apply skipdomains logic
#	3: Enable Output and apply allowdomains logic
#	4: Enable Output and apply both skip and allow domains logic
--elasticoutputtype=0

# elastic endpoint address, example: http://127.0.0.1:9200. Used if elasticOutputType is not none
--elasticoutputendpoint=

# elastic index
--elasticoutputindex=default

# Send data to Elastic in batch sizes
--elasticbatchsize=1000

# Interval between sending results to Elastic if Batch size is not filled
--elasticbatchdelay=1s

# [file_output]
# What should be written to file. options:
#	0: Disable Output
#	1: Enable Output without any filters
#	2: Enable Output and apply skipdomains logic
#	3: Enable Output and apply allowdomains logic
#	4: Enable Output and apply both skip and allow domains logic
--fileoutputtype=0

# Path to output folder. Used if fileoutputType is not none
--fileoutputpath=

# Interval to rotate the file in cron format
--fileoutputrotatecron=0 0 * * *

# Number of files to keep. 0 to disable rotation
--fileoutputrotatecount=4

# Output format for file. options:json, csv, csv_no_header, gotemplate. note that the csv splits the datetime format into multiple fields
--fileoutputformat=json

# Go Template to format the output as needed
--fileoutputgotemplate={{.}}

# [influx_output]
# What should be written to influx. options:
#	0: Disable Output
#	1: Enable Output without any filters
#	2: Enable Output and apply skipdomains logic
#	3: Enable Output and apply allowdomains logic
#	4: Enable Output and apply both skip and allow domains logic
--influxoutputtype=0

# influx Server address, example: http://localhost:8086. Used if influxOutputType is not none
--influxoutputserver=

# Influx Server Auth Token
--influxoutputtoken=dnsmonster

# Influx Server Bucket
--influxoutputbucket=dnsmonster

# Influx Server Org
--influxoutputorg=dnsmonster

# Minimum capacity of the cache array used to send data to Influx
--influxoutputworkers=8

# Minimum capacity of the cache array used to send data to Influx
--influxbatchsize=1000

# [kafka_output]
# What should be written to kafka. options:
#	0: Disable Output
#	1: Enable Output without any filters
#	2: Enable Output and apply skipdomains logic
#	3: Enable Output and apply allowdomains logic
#	4: Enable Output and apply both skip and allow domains logic
--kafkaoutputtype=0

# kafka broker address(es), example: 127.0.0.1:9092. Used if kafkaOutputType is not none
--kafkaoutputbroker=

# Kafka topic for logging
--kafkaoutputtopic=dnsmonster

# Minimum capacity of the cache array used to send data to Kafka
--kafkabatchsize=1000

# Output format. options:json, gob. 
--kafkaoutputformat=json

# Kafka connection timeout in seconds
--kafkatimeout=3

# Interval between sending results to Kafka if Batch size is not filled
--kafkabatchdelay=1s

# Compress Kafka connection
--kafkacompress

# Compression Type Kafka connection [snappy gzip lz4 zstd]; default(snappy).
--kafkacompressiontype=snappy

# Use TLS for kafka connection
--kafkasecure

# Path of CA certificate that signs Kafka broker certificate
--kafkacacertificatepath=

# Path of TLS certificate to present to broker
--kafkatlscertificatepath=

# Path of TLS certificate key
--kafkatlskeypath=

# [parquet_output]
# What should be written to parquet file. options:
#	0: Disable Output
#	1: Enable Output without any filters
#	2: Enable Output and apply skipdomains logic
#	3: Enable Output and apply allowdomains logic
#	4: Enable Output and apply both skip and allow domains logic
--parquetoutputtype=0

# Path to output folder. Used if parquetoutputtype is not none
--parquetoutputpath=

# Number of records to write to parquet file before flushing
--parquetflushbatchsize=10000

# Number of workers to write to parquet file
--parquetworkercount=4

# Size of the write buffer in bytes
--parquetwritebuffersize=256000

# [psql_output]
# What should be written to Microsoft Psql. options:
#	0: Disable Output
#	1: Enable Output without any filters
#	2: Enable Output and apply skipdomains logic
#	3: Enable Output and apply allowdomains logic
#	4: Enable Output and apply both skip and allow domains logic
--psqloutputtype=0

# Psql endpoint used. must be in uri format. example: postgres://username:password@hostname:port/database?sslmode=disable
--psqlendpoint=

# Number of PSQL workers
--psqlworkers=1

# Psql Batch Size
--psqlbatchsize=1

# Interval between sending results to Psql if Batch size is not filled. Any value larger than zero takes precedence over Batch Size
--psqlbatchdelay=0s

# Timeout for any INSERT operation before we consider them failed
--psqlbatchtimeout=5s

# Save full packet query and response in JSON format.
--psqlsavefullquery

# [sentinel_output]
# What should be written to Microsoft Sentinel. options:
#	0: Disable Output
#	1: Enable Output without any filters
#	2: Enable Output and apply skipdomains logic
#	3: Enable Output and apply allowdomains logic
#	4: Enable Output and apply both skip and allow domains logic
--sentineloutputtype=0

# Sentinel Shared Key, either the primary or secondary, can be found in Agents Management page under Log Analytics workspace
--sentineloutputsharedkey=

# Sentinel Customer Id. can be found in Agents Management page under Log Analytics workspace
--sentineloutputcustomerid=

# Sentinel Output LogType
--sentineloutputlogtype=dnsmonster

# Sentinel Output Proxy in URI format
--sentineloutputproxy=

# Sentinel Batch Size
--sentinelbatchsize=100

# Interval between sending results to Sentinel if Batch size is not filled. Any value larger than zero takes precedence over Batch Size
--sentinelbatchdelay=0s

# [splunk_output]
# What should be written to HEC. options:
#	0: Disable Output
#	1: Enable Output without any filters
#	2: Enable Output and apply skipdomains logic
#	3: Enable Output and apply allowdomains logic
#	4: Enable Output and apply both skip and allow domains logic
--splunkoutputtype=0

# splunk endpoint address, example: http://127.0.0.1:8088. Used if splunkOutputType is not none, can be specified multiple times for load balanace and HA
--splunkoutputendpoint=

# Splunk HEC Token
--splunkoutputtoken=00000000-0000-0000-0000-000000000000

# Splunk Output Index
--splunkoutputindex=temp

# Splunk Output Proxy in URI format
--splunkoutputproxy=

# Splunk Output Source
--splunkoutputsource=dnsmonster

# Splunk Output Sourcetype
--splunkoutputsourcetype=json

# Send data to HEC in batch sizes
--splunkbatchsize=1000

# Interval between sending results to HEC if Batch size is not filled
--splunkbatchdelay=1s

# [stdout_output]
# What should be written to stdout. options:
#	0: Disable Output
#	1: Enable Output without any filters
#	2: Enable Output and apply skipdomains logic
#	3: Enable Output and apply allowdomains logic
#	4: Enable Output and apply both skip and allow domains logic
--stdoutoutputtype=0

# Output format for stdout. options:json,csv, csv_no_header, gotemplate. note that the csv splits the datetime format into multiple fields
--stdoutoutputformat=json

# Go Template to format the output as needed
--stdoutoutputgotemplate={{.}}

# Number of workers
--stdoutoutputworkercount=8

# [syslog_output]
# What should be written to Syslog server. options:
#	0: Disable Output
#	1: Enable Output without any filters
#	2: Enable Output and apply skipdomains logic
#	3: Enable Output and apply allowdomains logic
#	4: Enable Output and apply both skip and allow domains logic
--syslogoutputtype=0

# Syslog endpoint address, example: udp://127.0.0.1:514, tcp://127.0.0.1:514. Used if syslogOutputType is not none
--syslogoutputendpoint=udp://127.0.0.1:514

# [zinc_output]
# What should be written to zinc. options:
#	0: Disable Output
#	1: Enable Output without any filters
#	2: Enable Output and apply skipdomains logic
#	3: Enable Output and apply allowdomains logic
#	4: Enable Output and apply both skip and allow domains logic
--zincoutputtype=0

# index used to save data in Zinc
--zincoutputindex=dnsmonster

# zinc endpoint address, example: http://127.0.0.1:9200/api/default/_bulk. Used if zincOutputType is not none
--zincoutputendpoint=

# zinc username, example: admin@admin.com. Used if zincOutputType is not none
--zincoutputusername=

# zinc password, example: password. Used if zincOutputType is not none
--zincoutputpassword=

# Send data to Zinc in batch sizes
--zincbatchsize=1000

# Interval between sending results to Zinc if Batch size is not filled
--zincbatchdelay=1s

# Zing request timeout
--zinctimeout=10s

# [general]
# Garbage Collection interval for tcp assembly and ip defragmentation
--gctime=10s

# Duration to calculate interface stats
--capturestatsdelay=1s

# Mask IPv4s by bits. 32 means all the bits of IP is saved in DB
--masksize4=32

# Mask IPv6s by bits. 32 means all the bits of IP is saved in DB
--masksize6=128

# Name of the server used to index the metrics.
--servername=default

# Set debug Log format
--logformat=text

# Set debug Log level, 0:PANIC, 1:ERROR, 2:WARN, 3:INFO, 4:DEBUG
--loglevel=3

# Size of the result processor channel size
--resultchannelsize=100000

# write cpu profile to file
--cpuprofile=

# write memory profile to file
--memprofile=

# GOMAXPROCS variable
--gomaxprocs=-1

# Limit of packets logged to clickhouse every iteration. Default 0 (disabled)
--packetlimit=0

# Skip outputing domains matching items in the CSV file path. Can accept a URL (http:// or https://) or path
--skipdomainsfile=

# Hot-Reload skipdomainsfile interval
--skipdomainsrefreshinterval=1m0s

# Allow Domains logic input file. Can accept a URL (http:// or https://) or path
--allowdomainsfile=

# Hot-Reload allowdomainsfile file interval
--allowdomainsrefreshinterval=1m0s

# Skip TLS verification when making HTTPS connections
--skiptlsverification

# [metric]
# Metric Endpoint Service
--metricendpointtype=

# Statsd endpoint. Example: 127.0.0.1:8125 
--metricstatsdagent=

# Prometheus Registry endpoint. Example: http://0.0.0.0:2112/metric
--metricprometheusendpoint=

# Format for  output.
--metricformat=json

# Interval between sending results to Metric Endpoint
--metricflushinterval=10s

Environment variables

all the flags can also be set via env variables. Keep in mind that the name of each parameter is always all upper case and the prefix for all the variables is "DNSMONSTER."

Example:

$ export DNSMONSTER_PORT=53
$ export DNSMONSTER_DEVNAME=lo
$ sudo -E dnsmonster

Configuration file

you can run dnsmonster using the following command to use configuration file:

$ sudo dnsmonster --config=dnsmonster.ini

# Or you can use environment variables to set the configuration file path
$ export DNSMONSTER_CONFIG=dnsmonster.ini
$ sudo -E dnsmonster

What's the retention policy

The default retention policy for the ClickHouse tables is set to 30 days. You can change the number by building the containers using ./autobuild.sh. Since ClickHouse doesn't have an internal timestamp, the TTL will look at incoming packet's date in pcap files. So while importing old pcap files, ClickHouse may automatically start removing the data as they're being written and you won't see any actual data in your Grafana. To fix that, you can change TTL to a day older than your earliest packet inside the PCAP file.

NOTE: to change a TTL at any point in time, you need to directly connect to the Clickhouse server using a clickhouse client and run the following SQL statement (this example changes it from 30 to 90 days):

ALTER TABLE DNS_LOG MODIFY TTL DnsDate + INTERVAL 90 DAY;`

NOTE: The above command only changes TTL for the raw DNS log data, which is the majority of your capacity consumption. To make sure that you adjust the TTL for every single aggregation table, you can run the following:

ALTER TABLE DNS_LOG MODIFY TTL DnsDate + INTERVAL 90 DAY;
ALTER TABLE `.inner.DNS_DOMAIN_COUNT` MODIFY TTL DnsDate + INTERVAL 90 DAY;
ALTER TABLE `.inner.DNS_DOMAIN_UNIQUE` MODIFY TTL DnsDate + INTERVAL 90 DAY;
ALTER TABLE `.inner.DNS_PROTOCOL` MODIFY TTL DnsDate + INTERVAL 90 DAY;
ALTER TABLE `.inner.DNS_GENERAL_AGGREGATIONS` MODIFY TTL DnsDate + INTERVAL 90 DAY;
ALTER TABLE `.inner.DNS_EDNS` MODIFY TTL DnsDate + INTERVAL 90 DAY;
ALTER TABLE `.inner.DNS_OPCODE` MODIFY TTL DnsDate + INTERVAL 90 DAY;
ALTER TABLE `.inner.DNS_TYPE` MODIFY TTL DnsDate + INTERVAL 90 DAY;
ALTER TABLE `.inner.DNS_CLASS` MODIFY TTL DnsDate + INTERVAL 90 DAY;
ALTER TABLE `.inner.DNS_RESPONSECODE` MODIFY TTL DnsDate + INTERVAL 90 DAY;
ALTER TABLE `.inner.DNS_SRCIP_MASK` MODIFY TTL DnsDate + INTERVAL 90 DAY;

UPDATE: in the latest version of clickhouse, the .inner tables don't have the same name as the corresponding aggregation views. To modify the TTL you have to find the table names in UUID format using SHOW TABLES and repeat the ALTER command with those UUIDs.

Sampling and Skipping

pre-process sampling

dnsmonster supports pre-processing sampling of packet using a simple parameter: sampleRatio. this parameter accepts a "ratio" value, like 1:2. 1:2 means for each 2 packet that arrives, only process one of them (50% sampling). Note that this sampling happens AFTER bpf filters and not before. if you have an issue keeping up with the volume of your DNS traffic, you can set this to something like 2:10, meaning 20% of the packets that pass your bpf filter, will be processed by dnsmonster.

skip domains

dnsmonster supports a post-processing domain skip list to avoid writing noisy, repetitive data to your Database. The domain skip list is a csv-formatted file, with only two columns: a string and a logic for that particular string. dnsmonster supports three logics: prefix, suffix and fqdn. prefix and suffix means that only the domains starting/ending with the mentioned string will be skipped to be written to DB. Note that since the process is being done on DNS questions, your string will most likely have a trailing . that needs to be included in your skip list row as well (take a look at skipdomains.csv.sample for a better view). You can also have a full FQDN match to avoid writing highly noisy FQDNs into your database.

allow domains

dnsmonster has the concept of allowdomains, which helps building the detection if certain FQDNs, prefixes or suffixes are present in the DNS traffic. Given the fact that dnsmonster supports multiple output streams with different logic for each one, it's possible to collect all DNS traffic in ClickHouse, but collect only allowlist domains in stdout or in a file in the same instance of dnsmonster.

SAMPLE in clickhouse SELECT queries

By default, the main tables created by tables.sql (DNS_LOG) file have the ability to sample down a result as needed, since each DNS question has a semi-unique UUID associated with it. For more information about SAMPLE queries in Clickhouse, please check out this document.

Supported Inputs

NOTE: if your pcap file is captured by one of Linux's meta-interfaces (for example tcpdump -i any), dnsmonster won't be able to read the Ethernet frame off of it since it doesn't exist. you can use a tool like tcprewrite to convert the pcap file to Ethernet.

Supported Outputs

Roadmap

Related projects