Dockerfiles for DevOps, CI/CD, Big Data & NoSQL
Contains 50+ DockerHub repos with 340+ tags, covering many different versions of standard official open source software. See the Full Inventory further down.
These docker images are tested by hundreds of tools and also used in the full functional test suites of various other GitHub repos.
See also the Kubernetes configs repo.
Overview - this repo contains:
- Hadoop & Big Data ecosystem technologies (Spark, Kafka, Presto, Drill, Nifi, ZooKeeper)
- NoSQL datastores (HBase, Cassandra, Riak, SolrCloud)
- OS & development images (Alpine, CentOS, Debian, Fedora, Ubuntu)
- DevOps, CI/CD (CircleCI, GitHub Actions, Jenkins, TeamCity etc), open source (RabbitMQ Cluster, Mesos, Consul)
- My GitHub repos containing hundreds of tools related to these technologies with all dependencies pre-built in the docker images
These images are all available pre-built on My DockerHub - https://hub.docker.com/u/harisekhon/.
- Quality and Testing - this repo has entire test suites run against it from various GitHub repositories to validate the docker images' functionality, that branches align with tagged versions, that latest contains the correct version from the master branch, and syntax checks covering all common build and file formats (Make/JSON/CSV/INI/XML/YAML configurations) etc.
These are reusable tests that anybody can implement. They can be found in my DevOps Python Tools and DevOps Bash Tools repos, as well as in the Advanced Nagios Plugins Collection, which contains hundreds of technology-specific API-level test programs to ensure the docker images are functioning as intended.
Continuous Integration is run on this and adjacent repos, forming a bi-directional validation between these docker images and several other repositories full of hundreds of programs. All of this is intended to keep the quality of this repo as high as possible.
Hari Sekhon
Cloud & Big Data Contractor, United Kingdom
(ex-Cloudera, former Hortonworks Consultant)
(you're welcome to connect with me on LinkedIn)
Ready to run Docker images
docker search harisekhon
docker run harisekhon/nagios-plugins
docker search returns at most 25 DockerHub repos (docker issue 23055), so to see more I wrote dockerhub_search.py using the DockerHub API, available in my DevOps Python Tools GitHub repo and as a pre-built docker image:
docker run harisekhon/pytools dockerhub_search.py harisekhon
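As a rough illustration of the general approach (this is not the code of dockerhub_search.py), a DockerHub namespace can be listed page by page over HTTP. The sketch assumes the public `https://hub.docker.com/v2/repositories/<namespace>/` endpoint with `results[].name` and `next` fields in its JSON response, and that `curl` and `jq` are installed. Both function names are hypothetical.

```shell
# Build the listing URL for a DockerHub namespace (endpoint is an assumption)
dockerhub_repos_url() {
    printf 'https://hub.docker.com/v2/repositories/%s/?page_size=100\n' "$1"
}

# Print every repo name in a namespace, following the "next" page links.
# The ( ) body runs in a subshell so the variables stay local.
dockerhub_list_repos() (
    url="$(dockerhub_repos_url "$1")"
    while [ -n "$url" ]; do
        page="$(curl -sSf "$url")" || exit 1
        printf '%s\n' "$page" | jq -r '.results[].name'
        # ".next" is null on the last page; "// empty" turns that into ""
        url="$(printf '%s\n' "$page" | jq -r '.next // empty')"
    done
)
```

Calling `dockerhub_list_repos harisekhon` would then print one repo name per line, without the 25 result cap.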
There are lots of tagged versions of official software in my repos to allow development testing across multiple versions, usually more versions than available from the official repos (and new version updates available on request, just raise a GitHub issue).
DockerHub tags are not shown by docker search (docker issue 17238), so I wrote dockerhub_show_tags.py, available in my DevOps Python Tools GitHub repo and as a pre-built docker image - eg. to see an organized list of all CentOS tags:
docker run harisekhon/pytools dockerhub_show_tags.py centos
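A minimal sketch of the same idea in shell, again not the actual dockerhub_show_tags.py implementation. It assumes the `https://hub.docker.com/v2/repositories/<namespace>/<repo>/tags/` endpoint, that official images such as centos sit under the `library/` namespace, and that `curl` and `jq` are available. The function names are made up for illustration.

```shell
# Official images such as "centos" live under the "library/" namespace,
# so prefix it when no namespace is given
dockerhub_repo_path() {
    case "$1" in
        */*) printf '%s\n' "$1" ;;
        *)   printf 'library/%s\n' "$1" ;;
    esac
}

# Print the first page (up to 100) of tag names for a repo, sorted
dockerhub_show_tags() (
    repo="$(dockerhub_repo_path "$1")"
    curl -sSf "https://hub.docker.com/v2/repositories/${repo}/tags/?page_size=100" |
        jq -r '.results[].name' |
        sort
)
```

For example, `dockerhub_show_tags centos` queries `library/centos`, while `dockerhub_show_tags harisekhon/hadoop` is used as given.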
For service technologies like Hadoop, HBase, ZooKeeper etc, for which you'll also want port mappings, each directory in the GitHub project contains both a standard docker-compose configuration and a make run shortcut (which doesn't require docker-compose to be installed). Either way you don't have to remember all the command line switches and port number specifics:
cd zookeeper
docker-compose up
or for technologies with interactive shells like Spark, ZooKeeper, HBase, Drill and Cassandra, where you want to be dropped into an interactive shell, use the make run shortcut instead:
cd zookeeper
make run
which is much easier to type and remember than the equivalent bigger commands like:
docker run -ti -p 2181:2181 harisekhon/zookeeper
and it avoids long commands like these for more complex services like Hadoop / HBase:
docker run -ti -p 2181:2181 -p 8080:8080 -p 8085:8085 -p 9090:9090 -p 9095:9095 -p 16000:16000 -p 16010:16010 -p 16201:16201 -p 16301:16301 harisekhon/hbase
docker run -ti -p 8020:8020 -p 8032:8032 -p 8088:8088 -p 9000:9000 -p 10020:10020 -p 19888:19888 -p 50010:50010 -p 50020:50020 -p 50070:50070 -p 50075:50075 -p 50090:50090 harisekhon/hadoop
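If you do need the raw docker run form, the repeated -p switches can be generated rather than typed. This is only a sketch: `port_flags` is a hypothetical helper, not something shipped in this repo, and it maps each port to the same port number on the host, as the commands above do.

```shell
# Turn a list of ports into "-p PORT:PORT" switches for docker run.
# The ( ) body runs in a subshell so the variables stay local.
port_flags() (
    flags=""
    for port in "$@"; do
        # Add a separating space only once there is something to separate
        flags="${flags:+$flags }-p ${port}:${port}"
    done
    printf '%s\n' "$flags"
)
```

Usage, leaving the substitution unquoted so the shell splits it into separate arguments: `docker run -ti $(port_flags 2181 8080 8085) harisekhon/hbase`.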
Full Inventory
Official Standard Open Source Technologies
More specific information can be found in the readme page under each respective directory in the Dockerfiles git repo.
- Alluxio - distributed in-memory filesystem for cluster computing frameworks by UC Berkeley's AMPLab - readme
- Apache Drill - distributed SQL engine by MapR (opens Drill SQL shell) - readme
- Awless - a Mighty CLI for AWS - readme
- AWS Elastic Beanstalk CLI - CLI for AWS Elastic Beanstalk - readme
- Backstage - Spotify's Backstage software catalog and developer portal - readme
- Cassandra - distributed NoSQL datastore by Facebook and DataStax (opens CQL shell, bundled with nagios-plugins)
- CircleCI Runner - CI/CD runner for CircleCI - readme
- Consul - distributed service discovery by HashiCorp
- FakeS3 - Amazon S3 API simulator for testing without incurring AWS S3 costs - readme
- GitHub Actions Runner - CI/CD runner for GitHub Actions - readme
- Git + Kustomize - minimal Git + Kustomize for CI/CD GitOps workflows - readme
- H2O - distributed machine learning framework by 0xdata
- Hadoop (HDFS + Yarn) - distributed storage and compute cluster by Yahoo, Cloudera and Hortonworks - readme
- HBase - distributed NoSQL datastore by Facebook (opens HBase shell) - readme
- Jenkins Agent with Docker - Jenkins inbound-agent with docker & docker-compose installed - readme
- Jenkins Agent with PHP + libs + New Relic - Jenkins inbound-agent with PHP + libs + New Relic installed - readme
- Jython - Python on Java JVM (useful for Hadoop python utilities using Hadoop's Java API. eg. DevOps Python Tools)
- Kafka - pub-sub data broker by LinkedIn and Confluent. Deprecated, see new Confluent docker images instead
- Mesos - datacenter resource manager by Mesosphere (mostly obsoleted by more free Hortonworks / Hadoop Yarn resource manager)
- Nifi - IOT data flow engine by NSA and Hortonworks
- OpenTSDB TCollector - metrics collector - sends metrics to OpenTSDB - readme
- Presto - distributed SQL engine by Facebook (opens Presto SQL shell) - readme
- Presto (Teradata distribution) - Teradata's Presto distribution including ODBC and JDBC drivers (opens Presto SQL shell) - readme
- RabbitMQ Cluster - pub-sub message queue broker by Pivotal (extension of RabbitMQ official image with added plugins)
- Riak KV - distributed NoSQL datastore by Basho
- Riak KV (bundled with nagios-plugins)
- Serf - decentralized cluster coordination engine by HashiCorp
- Solr - mature indexing engine built on Lucene search library
- SolrCloud - clustered distributed indexing engine version of Solr
- Spark - fast distributed cluster compute engine usually used on Hadoop, by UC Berkeley's AMPLab and Databricks (opens Spark shell)
- Superset - data visualization by Airbnb
- Tachyon (Alluxio < 1.0) - distributed in-memory filesystem for cluster computing frameworks by UC Berkeley's AMPLab
- tfenv - Terraform version manager - readme
- ZooKeeper (opens ZK shell) - distributed coordination and synchronization service by Yahoo
Repos suffixed with -dev are the official technologies + development & debugging tools + my GitHub repos with all dependencies pre-built.
My GitHub Repos (with all libs + deps pre-built)
You might like this Dockerfile trick for busting the Docker cache to get the latest repo updates:
# Cache Bust upon new commits
ADD https://api.github.com/repos/HariSekhon/DevOps-Bash-tools/git/refs/heads/master /.git-hashref
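The URL in that ADD line returns a small JSON document describing the branch's current commit, so its content changes whenever master gets a new commit. To see what the layer depends on, the SHA can be pulled out of the response by hand. A minimal sketch, assuming the response carries a 40-character hex `"sha"` field and that `curl` is installed; both function names are hypothetical.

```shell
# Extract the commit SHA from a GitHub "git/refs" API response read on stdin
ref_sha() {
    sed -n 's/.*"sha": *"\([0-9a-f]\{40\}\)".*/\1/p' | head -n 1
}

# Fetch the current master SHA for a repo such as HariSekhon/DevOps-Bash-tools
current_master_sha() {
    curl -sSf "https://api.github.com/repos/$1/git/refs/heads/master" | ref_sha
}
```

Running `current_master_sha HariSekhon/DevOps-Bash-tools` before and after a push should print two different values, which is the change that invalidates the cached layer.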
- Advanced Nagios Plugins Collection - 450+ nagios plugins for every Hadoop distribution and every major NoSQL technology - Hadoop, Redis, Elasticsearch, Solr, HBase, Cassandra & DataStax OpsCenter, MongoDB, MySQL, Kafka, Riak, Memcached, Couchbase, CouchDB, Mesos, Spark, Neo4j, Datameer, H2O, WanDisco, Yarn, HDFS, Impala, Apache Drill, Presto, ZooKeeper, Cloudera, Hortonworks, MapR, IBM BigInsights, Infrastructure - Linux, DNS, Whois, SSL Certs etc
harisekhon/tools
- DevOps Tools superset of the below images, containing hundreds of programs:
  - DevOps Python Tools - 80+ DevOps CLI tools for AWS, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, Ambari, Blueprints, CloudFormation, Elasticsearch, Solr, Pig etc.
  - DevOps Bash Tools - 750+ DevOps CLI tools for AWS, GCP, Kubernetes, Hadoop, Hive, Impala, Kafka, Docker, LDAP, Git, Code & build linting, package management for Linux / Mac / Python / Perl / Ruby / NodeJS / Golang, and lots more random goodies
  - DevOps Perl Tools - 25+ DevOps CLI Tools - Log Anonymizer, Hadoop HDFS & Hive tools, Solr/SolrCloud CLI, SQL ReCaser (MySQL, PostgreSQL, AWS Redshift, Snowflake, Apache Drill, Hive, Impala, Cassandra CQL, Microsoft SQL Server, Oracle, Couchbase N1QL, Dockerfiles, Pig Latin, Neo4j, InfluxDB), Linux, Nginx stats & HTTP(S) URL watchers for load balanced web farms, Ambari FreeIPA Kerberos, Datameer etc.
- all of the above repos come with tags for alpine, centos, debian, fedora and ubuntu builds
- Spotify Tools - Spotify API tools - eg. convert Spotify URIs to Artist - Track form by querying the Spotify API - readme
GitHub repos
My GitHub repos pre-built on major Linux distros, with CLI programs located at /github/<project>
Available as both harisekhon/github:<distro> and harisekhon/<distro>-github for convenience, and to allow shorter use of :latest by using just harisekhon/github
harisekhon/github:latest is the same as harisekhon/github:ubuntu
Base Images
Linux Distros + Development Tools
Available as both harisekhon/<distro>-dev and harisekhon/dev:<distro>
harisekhon/dev:latest is the same as harisekhon/dev:ubuntu
- Alpine latest with Java JDK, Perl, Python, Jython, Ruby, Scala, Groovy, GCC, Maven, SBT, Gradle, Make, Expect etc.
- CentOS latest with Java JDK, Perl, Python, Jython, Ruby, Scala, Groovy, GCC, Maven, SBT, Gradle, Make, Expect, EPEL etc.
- Debian latest with Java JDK, Perl, Python, Jython, Ruby, Scala, Groovy, GCC, Maven, SBT, Gradle, Make, Expect etc.
- Fedora latest with Java JDK, Perl, Python, Jython, Ruby, Scala, Groovy, GCC, Maven, SBT, Gradle, Make, Expect etc.
- Ubuntu latest with Java JDK, Perl, Python, Jython, Ruby, Scala, Groovy, GCC, Maven, SBT, Gradle, Make, Expect etc.
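The two naming schemes described above (a per-distro tag on one repo, or a per-distro repo) can be spelled out mechanically, which is handy for testing a command across every distro. This is a sketch with hypothetical helper names; it assumes Docker is installed and that the dev images accept a command override on docker run.

```shell
# Print both image names for a distro and a suffix ("dev" or "github")
image_names() {
    printf 'harisekhon/%s:%s\n' "$2" "$1"
    printf 'harisekhon/%s-%s\n' "$1" "$2"
}

# Run the same command in the dev image of every distro.
# The ( ) body runs in a subshell so the loop variable stays local.
run_on_all_distros() (
    for distro in alpine centos debian fedora ubuntu; do
        echo "== $distro =="
        docker run --rm "harisekhon/dev:$distro" "$@"
    done
)
```

For example, `run_on_all_distros cat /etc/os-release` would show which OS release each tag is built on.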
Base Images of Java / Scala
All builds use OpenJDK with jre and jdk numbered tags. See the article below for why it might be illegal to bundle Oracle Java (and why no Linux distributions do this either):
https://www.javacodegeeks.com/2016/03/running-java-docker-youre-breaking-law.html
- Alpine latest with Java 8
- CentOS latest combinations of Java 7 / 8 and Scala 2.10 / 2.11
- Debian latest with Java 7, 8
- Fedora latest combinations of Java 7 / 8 and Scala 2.10 / 2.11
- Ubuntu 14.04 with Java 7
- Ubuntu latest with Java 8, 9
Build from Source
All images come pre-built on DockerHub but if you want to compile from source for any reason such as developing improvements, I've made this easy to do:
git clone https://github.com/HariSekhon/Dockerfiles
cd Dockerfiles
To build all Docker images, just run the make command at the top level:
make
To build a specific Docker image, enter its directory and run make:
cd nagios-plugins
make
You can also build a specific version by checking out the git branch for the version and running the build:
cd consul
git checkout consul-0.9
make
or build all versions of a given software project like so:
cd hadoop
make build-versions
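To make the branch-per-version idea concrete, here is a rough sketch of looping over version branches by hand. It is not how make build-versions is implemented. The `<project>-<version>` branch naming is inferred from the consul-0.9 example above, and both function names are hypothetical.

```shell
# Keep only "<project>-<version>" branch names from a list on stdin,
# stripping any "origin/" prefix and removing duplicates
filter_version_branches() {
    sed 's|^origin/||' | grep "^$1-[0-9]" | sort -u
}

# Check out each version branch in turn and build it.
# Run inside the project directory; stops at the first failure.
build_all_versions() (
    project="$1"
    for branch in $(git branch -a --format='%(refname:short)' | filter_version_branches "$project"); do
        git checkout "$branch" && make || exit 1
    done
)
```

For example, `cd consul && build_all_versions consul` would build every branch whose name starts with consul- followed by a digit.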
See the top level Makefile as well as the Makefile.in which is sourced per project, with any project specific overrides in the <project_directory>/Makefile.
Support
Please raise tickets for issues and improvements at https://github.com/HariSekhon/Dockerfiles/issues
More Core Repos
<!-- OTHER_REPOS_START -->Knowledge
DevOps Code
Containerization
CI/CD
DBA - SQL
DevOps Reloaded
Templates
Misc
The rest of my original source repos are here.
Pre-built Docker images are available on my DockerHub.
<!-- OTHER_REPOS_END -->