The parallel gzip

This is an implementation of a parallel gzip. It works by splitting the input into chunks (32 MB by default, but this can be configured). Each chunk is compressed independently and the results are concatenated. The result can be read and decompressed by the usual gzip implementations.
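The chunk-and-concatenate idea can be sketched in a few lines of Python. This is only an illustration of the technique under stated assumptions, not this project's code: the names `split_chunks` and `parallel_gzip` and the `workers` parameter are hypothetical, and a thread pool is used on the assumption that the underlying zlib calls release the interpreter lock while compressing.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

# 32 MB, mirroring the default chunk size mentioned above.
CHUNK_SIZE = 32 * 1024 * 1024


def split_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Split data into consecutive chunks of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def parallel_gzip(data: bytes, chunk_size: int = CHUNK_SIZE, workers: int = 4) -> bytes:
    """Compress each chunk as its own gzip member and concatenate the members.

    Hypothetical helper for illustration only.
    """
    # Empty input still yields one (empty) gzip member.
    chunks = split_chunks(data, chunk_size) or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so members are joined in order.
        members = list(pool.map(gzip.compress, chunks))
    return b"".join(members)
```

A stream built this way consists of several gzip members back to back, so an ordinary single-threaded reader such as `gzip.decompress` or the `gzip -d` command restores the original bytes.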

The motivation is to speed up transfers of large amounts of data across a fast network through ssh. On such a network, ssh throughput is limited by its compression or encryption routines, both of which are single-threaded. This implementation allows turning compression off in ssh and using multiple cores to compress the data instead. Because decompression is much faster than compression, parallel decompression is not necessary.

Limitations

There are certain limitations:

License

Licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.