Home

Awesome

Learning LZW compression algorithm

Just one of the things I'm learning. https://github.com/hchiam/learning

Tutorial

https://hackernoon.com/unixs-lzw-compression-algorithm-how-does-it-work-cp65347h which in turn adapts a version of the code here: https://github.com/adityagupta3006/LZW-Compressor-in-Python?ref=hackernoon.com

Notes

"Unlike Huffman Encoding, LZW doesn’t require this dictionary to be included in the compressed data. One of the unique characteristics of this algorithm is that this dictionary can be rebuilt while uncompressing the data, ... with the 256 ASCII codes as a preliminary step." LZW can be implemented with variable-width encoding. A maximum width would then be used to put a cap on memory consumption, so once the dictionary is "filled", no more words can be added. LZW works better on data that has redundancy.

Pseudocode

def lzw(input: array):
    output = compressed_data
    dictionary.pre_filled_with(basic_ascii_chars)
    for word in input:
        if word in dictionary:
            output.add(dictionary[word])
        elif dictionary.is_full():
            output.add(word)
        else: # not in dictionary and dictionary has space
            dictionary.add(word)
            output.add(dictionary[word])
    return output

Running the code in this repo

python encode_lzw.py # example_text.txt -> example_text.lzw
python decode_lzw.py # example_text.lzw -> example_text_decoded.txt
bash check_diffs.sh