Awesome
CityHash/FarmHash
Python wrapper for FarmHash and CityHash, a family of fast non-cryptographic hash functions.
Getting Started
To install from PyPI:
pip install cityhash
To install in a Conda environment:
conda install -c conda-forge python-cityhash
The package exposes Python APIs for CityHash and FarmHash under cityhash
and
farmhash
namespaces, respectively. Each provides 32-, 64- and 128-bit
implementations.
Usage Examples
Stateless hashing
Usage example for FarmHash:
>>> from farmhash import FarmHash32, FarmHash64, FarmHash128
>>> FarmHash32("abc")
1961358185
>>> FarmHash64("abc")
2640714258260161385
>>> FarmHash128("abc")
76434233956484675513733017140465933893
Hardware-independent fingerprints
Fingerprints are seedless hashes that are guaranteed to be hardware- and platform-independent. This can be useful for networking applications which require persisting hashed values.
>>> from farmhash import Fingerprint128
>>> Fingerprint128("abc")
76434233956484675513733017140465933893
Incremental hashing
CityHash and FarmHash do not support incremental hashing and thus are not ideal for hashing of character streams. If you require incremental hashing, consider another hashing library, such as MetroHash or xxHash.
Fast hashing of NumPy arrays
The Buffer Protocol allows Python objects to expose their data as raw byte arrays for fast access without having to copy to a separate location in memory. NumPy is one well-known library that extensively uses this protocol.
All hashing functions in this package will read byte arrays from objects that expose them via the buffer protocol. Here is an example showing hashing of a four-dimensional NumPy array:
>>> import numpy as np
>>> from farmhash import FarmHash64
>>> arr = np.zeros((256, 256, 4))
>>> FarmHash64(arr)
1550282412043536862
The NumPy arrays need to be contiguous for this to work. To convert a
non-contiguous array, use NumPy's ascontiguousarray()
function.
SSE4.2 support
For x86-64 platforms, the PyPI repository for this package includes wheels compiled with SSE4.2 support. The 32- and 64-bit (but not the 128-bit) variants of FarmHash significantly benefit from SSE4.2 instructions.
The vanilla CityHash functions (under cityhash
module) do not take advantage
of SSE4.2. Instead, one can use the cityhashcrc
module provided with this
package which exposes 128- and 256-bit CRC functions that do harness SSE4.2.
These functions are very fast, and beat FarmHash128
on speed (FarmHash does
not include a 256-bit function). Since FarmHash is the intended successor of
CityHash, I would be careful before using the CityHash-CRC functions, however,
and would verify whether they provide sufficient randomness for your intended
application.
Development
Local workflow
For those wanting to contribute, here is a quick start using Make commands:
git clone https://github.com/escherba/python-cityhash.git
cd python-cityhash
make env # create a virtual environment
make test # run Python tests
make cpp-test # run C++ tests
make shell # enter IPython shell
To find out which Make targets are available, enter:
make help
Distribution
The package wheels are built using cibuildwheel and are distributed to PyPI using GitHub actions. The wheels contain compiled binaries and are available for the following platforms: windows-amd64, ubuntu-x86, linux-x86_64, linux-aarch64, and macosx-x86_64.
See Also
For other fast non-cryptographic hash functions available as Python extensions, see MetroHash, MurmurHash, and xxHash.
Authors
The original CityHash Python bindings are due to Alexander [Amper] Marshalov. They were rewritten in Cython by Eugene Scherba, who also added the FarmHash bindings. The CityHash and FarmHash algorithms and their C++ implementation are by Google.
License
This software is licensed under the MIT License. See the included LICENSE file for details.