Hari Sekhon - DevOps Python Tools

git.io/pytools

AWS, Docker, Spark, Hadoop, HBase, Hive, Impala, Python & Linux Tools

DevOps, Cloud, Big Data, NoSQL, Python & Linux tools. All programs have --help.

Hari Sekhon

Cloud & Big Data Contractor, United Kingdom

My LinkedIn (you're welcome to connect with me on LinkedIn)

When updating, make sure you run make update and not just git pull, as you will often need the latest library submodule and possibly new upstream libraries.

Quick Start

Ready to run Docker image

All programs and their pre-compiled dependencies can be found ready to run on DockerHub.

List all programs:

docker run harisekhon/pytools

Run any given program:

docker run harisekhon/pytools <program> <args>

Automated Build from source

Installs git and make, pulls the repo and builds the dependencies:

curl -L https://git.io/python-bootstrap | sh

or manually:

git clone https://github.com/HariSekhon/DevOps-Python-tools pytools
cd pytools
make

To only install pip dependencies for a single script, you can just type make and the filename with a .pyc extension instead of .py:

make anonymize.pyc

Make sure to read Detailed Build Instructions further down for more information.

Some Hadoop tools require Jython, see Jython for Hadoop Utils for details.

Usage

All programs come with a --help switch which includes a program description and the list of command line options.

Environment variables are supported for convenience and also to keep credentials from being exposed in the process list, eg. $PASSWORD, $TRAVIS_TOKEN. These are indicated in the --help descriptions in brackets next to each option, and often have more specific overrides with higher precedence, eg. $AMBARI_HOST and $HBASE_HOST take priority over $HOST.
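The precedence described above can be sketched roughly as follows. This is a minimal illustration, not the actual option-parsing code used by these programs: the function name `resolve_option` is hypothetical, and treating an explicit command line value as taking priority over all environment variables is an assumption.

```python
import os


def resolve_option(cli_value, env_names, environ=None, default=None):
    """Return the CLI value if given, else the first non-empty env var in order.

    Hypothetical helper: env_names is ordered from most specific to most
    generic, eg. ["AMBARI_HOST", "HOST"].
    """
    environ = os.environ if environ is None else environ
    if cli_value:
        # Assumption: an explicit command line switch wins over the environment
        return cli_value
    for name in env_names:
        value = environ.get(name)
        if value:
            return value
    return default


# Illustrative environment: the specific $AMBARI_HOST overrides the generic $HOST
env = {"HOST": "generic.example.com", "AMBARI_HOST": "ambari.example.com"}
host = resolve_option(None, ["AMBARI_HOST", "HOST"], environ=env)
print(host)
```

Passing the environment as a parameter keeps the lookup easy to test without modifying the real process environment.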

DevOps Python Tools - Inventory

Detailed Build Instructions

Python VirtualEnv localized installs

The automated build will use 'sudo' to install required Python PyPI libraries to the system unless it is running as root or detects that it is inside a VirtualEnv. If you want to install some of the common Python libraries using your OS packages instead of installing from PyPI then follow the Manual Setup section below.
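As a rough illustration of the decision described above, the following sketch shows one common way to detect a virtual environment and decide whether sudo would be needed. The build's own detection logic is not shown in this README, so the function names `in_virtualenv` and `pip_needs_sudo` and the exact checks are assumptions.

```python
import os
import sys


def in_virtualenv():
    """Best-effort check for an active virtual environment."""
    # venv and recent virtualenv make sys.prefix differ from sys.base_prefix
    if getattr(sys, "base_prefix", sys.prefix) != sys.prefix:
        return True
    # older virtualenv releases set sys.real_prefix instead
    if hasattr(sys, "real_prefix"):
        return True
    # activate scripts export $VIRTUAL_ENV
    return bool(os.environ.get("VIRTUAL_ENV"))


def pip_needs_sudo():
    """Return True only when not root and not inside a virtual environment."""
    is_root = hasattr(os, "geteuid") and os.geteuid() == 0
    return not (is_root or in_virtualenv())


print("in virtualenv:", in_virtualenv())
print("pip needs sudo:", pip_needs_sudo())
```

Running this inside an activated VirtualEnv should report that sudo is not needed, which matches the behaviour described for the automated build.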

Manual Setup

Enter the pytools directory and run git submodule init and git submodule update to fetch my library repo:

git clone https://github.com/HariSekhon/DevOps-Python-tools pytools
cd pytools
git submodule init
git submodule update
sudo pip install -r requirements.txt

Offline Setup

Download the DevOps Python Tools and Pylib git repos as zip files:

https://github.com/HariSekhon/DevOps-Python-tools/archive/master.zip

https://github.com/HariSekhon/pylib/archive/master.zip

Unzip both and move Pylib to the pylib folder under DevOps Python Tools.

unzip DevOps-Python-tools-master.zip
unzip pylib-master.zip

mv -v DevOps-Python-tools-master pytools
mv -v pylib-master pylib
mv -vf pylib pytools/

Proceed to install PyPI modules for whichever programs you want to use using your usual procedure - usually an internal mirror or proxy server to PyPI, or rpms / debs (some libraries are packaged by Linux distributions).

All PyPI modules are listed in the requirements.txt and pylib/requirements.txt files.
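When preparing an offline install it can help to merge both files into one de-duplicated list of modules to fetch. The sketch below is a simplified parser under stated assumptions: it skips blank lines, comment lines and option lines (those starting with `-`), and does not implement the full pip requirements file format. The function names and the sample module lists are illustrative only, not the real contents of either file.

```python
def parse_requirements(lines):
    """Return requirement specifiers from an iterable of requirements lines."""
    specs = []
    for raw in lines:
        line = raw.strip()
        # skip blanks, whole-line comments and pip options such as "-r other.txt"
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        # drop a trailing comment (a space followed by '#')
        line = line.split(" #", 1)[0].strip()
        specs.append(line)
    return specs


def merge_requirements(*line_sources):
    """Merge several requirements sources into one sorted, de-duplicated list."""
    merged = set()
    for lines in line_sources:
        merged.update(parse_requirements(lines))
    return sorted(merged, key=str.lower)


# Illustrative input only, not the real file contents
top_level = ["# core", "requests", "PyYAML>=3.0  # config parsing", ""]
library = ["requests", "-r other.txt", "beautifulsoup4"]
print(merge_requirements(top_level, library))
```

For the real files, open file handles can be passed directly, eg. `merge_requirements(open("requirements.txt"), open("pylib/requirements.txt"))`.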

Internal Mirror example (JFrog Artifactory or similar):

sudo pip install --index-url https://host.domain.com/api/pypi/repo/simple --trusted-host host.domain.com -r requirements.txt

Proxy example:

sudo pip install --proxy hari:mypassword@proxy-host:8080 -r requirements.txt

Mac OS X

The automated build also works on Mac OS X but you'll need to install Apple Xcode (on recent Macs just typing git is enough to trigger the Xcode install).

I also recommend you get Homebrew to install other useful tools and libraries you may need, like OpenSSL for development headers and tools such as wget (these are installed automatically if Homebrew is detected on Mac OS X):

bash-tools/install/install_homebrew.sh
brew install openssl wget

If an OpenSSL library dependency fails to build, just prefix the build command like so:

sudo OPENSSL_INCLUDE=/usr/local/opt/openssl/include OPENSSL_LIB=/usr/local/opt/openssl/lib ...

You may get errors trying to install to Python library paths even as root on newer versions of Mac. Sometimes this is caused by pip 10 vs pip 9, and downgrading will work around it:

sudo pip install --upgrade pip==9.0.1
make
sudo pip install --upgrade pip
make

Jython for Hadoop Utils

The 3 Hadoop utility programs listed below require Jython (as well as Hadoop to be installed and correctly configured):

hdfs_time_block_reads.jy
hdfs_files_native_checksums.jy
hdfs_files_stats.jy

Run like so:

jython -J-cp $(hadoop classpath) hdfs_time_block_reads.jy --help

The -J-cp $(hadoop classpath) part dynamically inserts the current Hadoop java classpath required to use the Hadoop APIs.

See below for procedure to install Jython if you don't already have it.
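If you want to launch these utilities from another Python script, the command line above can be assembled as an argument list. This is a minimal sketch: the helper names `hadoop_classpath` and `jython_command` are hypothetical, and the classpath in the example is made up so that the snippet runs without Hadoop installed.

```python
import shlex
import subprocess


def hadoop_classpath():
    """Return the output of `hadoop classpath` (requires hadoop in the $PATH)."""
    return subprocess.check_output(["hadoop", "classpath"], text=True).strip()


def jython_command(script, args, classpath, jython="jython"):
    """Build the equivalent of: jython -J-cp <classpath> <script> <args>."""
    return [jython, "-J-cp", classpath, script] + list(args)


# Made-up classpath for illustration only
cmd = jython_command(
    "hdfs_time_block_reads.jy",
    ["--help"],
    "/etc/hadoop/conf:/opt/hadoop/lib/*",
)
print(" ".join(shlex.quote(part) for part in cmd))

# To actually run it on a configured Hadoop node:
# subprocess.call(jython_command("hdfs_time_block_reads.jy", ["--help"], hadoop_classpath()))
```

Passing the classpath as a single list element avoids shell word splitting and glob expansion of entries such as `lib/*`.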

Automated Jython Install

This will download and install jython to /opt/jython-2.7.0:

make jython

Manual Jython Install

Jython is a simple download and unpack and can be fetched from http://www.jython.org/downloads.html

Then add the Jython install bin directory to the $PATH or specify the full path to the jython binary, eg:

/opt/jython-2.7.0/bin/jython hdfs_time_block_reads.jy ...

Configuration for Strict Domain / FQDN validation

Strict validation of hosts, domains and FQDNs uses TLDs populated from the official IANA list and is done via my PyLib library submodule. See there for details on configuring it to permit custom TLDs like .local, .intranet, .vm, .cloud etc. (all already included in there because they're common across companies' internal environments).
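To illustrate the idea of TLD-based validation, here is a minimal sketch. It is not the PyLib implementation: the function name `is_valid_fqdn`, the label regex and the tiny hardcoded TLD set are all assumptions for demonstration, whereas the real list is loaded from the IANA data plus custom additions.

```python
import re

# Assumption: a tiny illustrative TLD set, not the real IANA-derived list
TLDS = {"com", "org", "net", "uk", "local", "intranet", "vm", "cloud"}

# One DNS label: 1-63 chars, alphanumeric, hyphens allowed only in the middle
LABEL_REGEX = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?", re.IGNORECASE)


def is_valid_fqdn(name, tlds=TLDS):
    """Return True if every label is well formed and the last label is a known TLD."""
    if not name or len(name) > 253:
        return False
    # tolerate a single trailing dot (fully qualified root form)
    if name.endswith("."):
        name = name[:-1]
    labels = name.split(".")
    if len(labels) < 2:
        return False
    if not all(LABEL_REGEX.fullmatch(label) for label in labels):
        return False
    return labels[-1].lower() in tlds


for candidate in ["host.example.com", "server01.local", "host.example.invalidtld"]:
    print(candidate, is_valid_fqdn(candidate))
```

Permitting a custom internal TLD in this sketch is just a matter of adding it to the set passed as `tlds`.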

Python SSL certificate verification problems

If you end up with an error like:

./dockerhub_show_tags.py centos ubuntu
[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:765)

It can be caused by an issue with the underlying Python + libraries due to changes in OpenSSL and certificates. One quick fix is to do the following:

sudo pip uninstall -y certifi &&
sudo pip install certifi==2015.04.28
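Before changing packages, it may help to see which OpenSSL build and CA bundle your Python is actually using. The following diagnostic sketch only uses the standard library `ssl` module, plus `certifi` if it happens to be installed. The function name `ssl_diagnostics` is hypothetical, and the output will differ per system.

```python
import ssl


def ssl_diagnostics():
    """Collect basic facts about the Python SSL setup for troubleshooting."""
    paths = ssl.get_default_verify_paths()
    info = {
        "openssl_version": ssl.OPENSSL_VERSION,
        "cafile": paths.cafile,
        "capath": paths.capath,
    }
    try:
        import certifi  # third-party, may not be installed

        info["certifi_version"] = getattr(certifi, "__version__", "unknown")
        info["certifi_bundle"] = certifi.where()
    except ImportError:
        info["certifi_version"] = None
        info["certifi_bundle"] = None
    return info


if __name__ == "__main__":
    for key, value in ssl_diagnostics().items():
        print(f"{key}: {value}")
```

Comparing the reported certifi version before and after the fix above confirms whether the pinned version actually took effect.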

Updating

Run:

make update

This will git pull and then git submodule update which is necessary to pick up corresponding library updates.

If you update often and just want a quick git pull + submodule update without rebuilding all the dependencies each time, run make update-no-recompile. This will miss new library dependencies, so do a full make update if you encounter issues.

Testing

Continuous Integration is run on this repo with tests for success and failure scenarios.

To trigger all tests run:

make test

which will start with the underlying libraries, then move on to top level integration tests and functional tests using docker containers if docker is available.

Contributions

Patches, improvements and even general feedback are welcome in the form of GitHub pull requests and issue tickets.

You might also be interested in the following really nice Jupyter notebook for HDFS space analysis created by another Hortonworks guy Jonas Straub:

https://github.com/mr-jstraub/HDFSQuota/blob/master/HDFSQuota.ipynb

git.io/python-tools

More Core Repos

<!-- OTHER_REPOS_START -->

The rest of my original source repos are here.

Pre-built Docker images are available on my DockerHub.

<!-- OTHER_REPOS_END -->