pdscan
Scan your data stores for unencrypted personal data (PII)
- Last names (US)
- Email addresses
- IP addresses (IPv4)
- Street addresses (US)
- Phone numbers
- Credit card numbers
- Social Security numbers (US)
- Dates of birth
- Location data
- OAuth tokens
- MAC addresses
Uses data sampling and naming, and works with compressed files
:boom: Zero runtime dependencies and minimal database load
Installation
Download the latest version:
You can also install it with Homebrew or Docker.
Data Stores
Elasticsearch
pdscan elasticsearch+http://user:pass@host:9200
For HTTPS, use elasticsearch+https://.
You can also specify indices.
pdscan elasticsearch+http://user:pass@host:9200/index1,index2
Wildcards are also supported.
pdscan "elasticsearch+http://user:pass@host:9200/index*"
Files
pdscan file://path/to/file.txt
You can also specify a directory.
pdscan file://path/to/directory
For absolute paths, use file:///.
pdscan file:///absolute/path/to/file.txt
For paths relative to your home directory on Mac and Linux, use:
pdscan file://$HOME/file.txt
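The relative and absolute forms above differ only in whether the path starts with a slash. As a minimal sketch, a shell helper can build the URI for you. The `to_file_uri` name is hypothetical and not part of pdscan, and it assumes paths without characters that need URI escaping.

```shell
# Hypothetical helper (not part of pdscan): turn a local path into a file:// URI.
to_file_uri() {
  case "$1" in
    /*) printf 'file://%s\n' "$1" ;;            # absolute path gives file:///...
    *)  printf 'file://%s/%s\n' "$PWD" "$1" ;;  # resolve against the current directory
  esac
}

to_file_uri /var/log/app.log
to_file_uri exports/users.csv
```

The result can be passed straight to pdscan, for example `pdscan "$(to_file_uri exports/users.csv)"`.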
MariaDB
pdscan mariadb://user:pass@host:3306/dbname
MongoDB
pdscan mongodb://user:pass@host:27017/dbname
MySQL
pdscan mysql://user:pass@host:3306/dbname
OpenSearch
pdscan opensearch+http://user:pass@host:9200
For HTTPS, use opensearch+https://.
You can also specify indices.
pdscan opensearch+http://user:pass@host:9200/index1,index2
Wildcards are also supported.
pdscan "opensearch+http://user:pass@host:9200/index*"
Postgres
pdscan postgres://user:pass@host:5432/dbname
Always make sure your connection is secure when connecting to a database over a network you don’t fully trust. Your best option is to connect over SSH or a VPN. Another option is to use sslmode=verify-full. If you don’t do this, your database credentials can be compromised.
If your connection doesn’t use SSL, append to the URI:
?sslmode=disable
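Both sslmode settings are passed as a query parameter on the connection URI. Here is a rough sketch of adding one from a script. The `with_param` helper is hypothetical and does no escaping, so it assumes the parameter is already URI-safe.

```shell
# Hypothetical helper (not part of pdscan): append a query parameter to a URI.
with_param() {
  case "$1" in
    *\?*) printf '%s&%s\n' "$1" "$2" ;;  # URI already has a query string
    *)    printf '%s?%s\n' "$1" "$2" ;;  # first parameter
  esac
}

uri=$(with_param "postgres://user:pass@host:5432/dbname" "sslmode=verify-full")
echo "$uri"
```

You can then run `pdscan "$uri"`. Quote the URI so the shell does not interpret the `?` or `&`.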
For best sampling, enable the tsm_system_rows extension (ships with Postgres 9.5+).
CREATE EXTENSION tsm_system_rows;
Redis
pdscan redis://user:pass@host:6379/db
S3
pdscan s3://bucket/path/to/file.txt
Requires s3:GetObject permission
You can also specify a prefix by ending with a /.
pdscan s3://bucket/path/to/directory/
Requires s3:ListBucket and s3:GetObject permissions
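For reference, here is one way an IAM policy granting those two permissions might look. This is a sketch, not a policy shipped with pdscan: the `print_policy` function is hypothetical, and the bucket name `bucket` is the placeholder from the examples above. In IAM, s3:ListBucket applies to the bucket ARN and s3:GetObject applies to the object ARNs.

```shell
# Hypothetical helper: print a minimal read-only IAM policy for scanning "bucket".
# Replace "bucket" with your own bucket name before using it.
print_policy() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::bucket"
    },
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::bucket/*"
    }
  ]
}
EOF
}

print_policy
```

If you only scan single objects, the s3:ListBucket statement can be dropped.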
SQLite
pdscan sqlite://path/to/dbname.sqlite3
Not available with prebuilt binaries
SQL Server
pdscan "sqlserver://user:pass@host:1433?database=dbname"
Options
Show the data found
pdscan --show-data
Show low confidence matches
pdscan --show-all
Change the sample size
pdscan --sample-size 50000
Specify the number of processes to use (defaults to 1)
pdscan --processes 4
Scan for only certain types of data
pdscan --only email,phone,location
Scan for all except certain types of data
pdscan --except ip,mac
Specify the minimum number of rows/documents/lines for a match (experimental)
pdscan --min-count 10
Specify a custom pattern (experimental)
pdscan --pattern "\d{16}"
Output newline delimited JSON (experimental)
pdscan --format ndjson
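The options above are shown one at a time. As a sketch, they could be wrapped in a small shell function for repeated scans. This assumes the flags can be combined in one invocation and placed after the connection URI, which this README does not state explicitly. The `scan_contacts` name is hypothetical.

```shell
# Hypothetical wrapper: scan one data store for emails and phone numbers only.
# Assumes the flags can be combined and can follow the connection URI.
scan_contacts() {
  pdscan "$1" --only email,phone --sample-size 50000 --processes 4
}

# Usage (not executed here):
# scan_contacts "postgres://user:pass@host:5432/dbname"
```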
Additional Installation Methods
Homebrew
With Homebrew, you can use:
brew install ankane/brew/pdscan
Docker
Get the Docker image with:
docker pull ankane/pdscan
And run it with:
docker run -ti ankane/pdscan <connection-uri>
For data stores on the host machine, use host.docker.internal as the hostname:
docker run -ti ankane/pdscan "postgres://user@host.docker.internal:5432/dbname?sslmode=disable"
On Linux, this requires Docker 20.10+ and --add-host=host.docker.internal:host-gateway
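Putting the Linux flag together with the run command gives something like the sketch below. The `pdscan_docker` wrapper is hypothetical. It only combines the `docker run` options already shown in this section.

```shell
# Hypothetical wrapper: run pdscan in Docker on Linux with the host machine
# reachable as host.docker.internal.
pdscan_docker() {
  docker run -ti \
    --add-host=host.docker.internal:host-gateway \
    ankane/pdscan "$@"
}

# Usage (not executed here):
# pdscan_docker "postgres://user@host.docker.internal:5432/dbname?sslmode=disable"
```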
For files on the host machine, use:
docker run -ti -v /path/to/files:/data ankane/pdscan file:///data
History
View the changelog
Contributing
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/pdscan.git
cd pdscan
make test