
BTHOWeN

Code to accompany the paper:

Weightless Neural Networks for Efficient Edge Inference, Zachary Susskind, Aman Arora, Igor Dantas Dos Santos Miranda, Luis Armando Quintanilla Villon, Rafael Fontella Katopodis, Leandro Santiago de Araújo, Diego Leonel Cadette Dutra, Priscila Lima, Felipe Maia Galvão França, Mauricio Breternitz Jr., Lizy John

Presented at the 31st International Conference on Parallel Architectures and Compilation Techniques (PACT 2022)

Usage

Prerequisites

Our codebase was written for Python 3.8.10; other versions may well work but are untested.

We recommend constructing a virtual environment for dependency management:

python3 -m venv env
source env/bin/activate

From here, all dependencies can be installed with a single command:

pip install -r requirements.txt

If you'd like to build and simulate the generated RTL using our Make flow, you'll need a VCS installation and a valid license. Point the VCS_HOME environment variable to your top-level VCS installation directory (the executable should be at $(VCS_HOME)/bin/vcs). We derived our power and area estimates using Vivado; reports for the provided pre-trained models are available (see below).
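
As a quick sanity check before invoking make, the minimal Python sketch below confirms that VCS_HOME is set and that an executable exists at $(VCS_HOME)/bin/vcs, the location described above. The helper name find_vcs is hypothetical and is not part of this repository.

```python
import os
from pathlib import Path


def find_vcs(env=None):
    """Return the path to $VCS_HOME/bin/vcs if it exists and is executable, else None.

    Hypothetical helper: it only checks the layout described in this README.
    """
    env = os.environ if env is None else env
    vcs_home = env.get("VCS_HOME")
    if not vcs_home:
        return None  # VCS_HOME is unset or empty
    exe = Path(vcs_home) / "bin" / "vcs"
    # The Make flow expects the executable at $(VCS_HOME)/bin/vcs
    if exe.is_file() and os.access(exe, os.X_OK):
        return exe
    return None


if __name__ == "__main__":
    exe = find_vcs()
    if exe:
        print(f"Found VCS at {exe}")
    else:
        print("VCS_HOME is unset, or $VCS_HOME/bin/vcs is missing or not executable")
```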

Creating BTHOWeN Models

All relevant code lives in the software_model/ directory. Natively supported datasets are MNIST, Ecoli, Iris, Letter, Satimage, Shuttle, Vehicle, Vowel, and Wine.

train_swept_models.py is the primary script for programmatic model sweeping. It allows for specification of Bloom filter and encoding parameters; run with --help for more details.
Example usage: ./train_swept_models.py MNIST --filter_inputs 28 --filter_entries 1024 --filter_hashes 2 --bits_per_input 2
--filter_inputs, --filter_entries, --filter_hashes, and --bits_per_input can each be given multiple values, in which case every combination of values is tried.
Run-to-run variation in accuracy is expected, particularly on small models; this is largely a result of the random input mapping.
Note: dataset names are not case-sensitive.
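
Trying every combination of the swept values amounts to a Cartesian product, so the number of trained models grows multiplicatively with each extra value. The sketch below only illustrates that counting; sweep_configs is a hypothetical helper, not the script's actual implementation, and it does not show the script's command-line syntax for passing multiple values.

```python
from itertools import product


def sweep_configs(filter_inputs, filter_entries, filter_hashes, bits_per_input):
    """Yield one dict per combination of the swept hyperparameter values (hypothetical helper)."""
    for fi, fe, fh, bpi in product(filter_inputs, filter_entries, filter_hashes, bits_per_input):
        yield {
            "filter_inputs": fi,
            "filter_entries": fe,
            "filter_hashes": fh,
            "bits_per_input": bpi,
        }


# Two values for three of the four parameters: 2 * 2 * 1 * 2 combinations
configs = list(sweep_configs([28, 49], [1024, 2048], [2], [2, 3]))
print(len(configs))  # 8
```

Adding a third value to any one parameter would raise this example from 8 to 12 trained models, which is worth keeping in mind when sweeping large datasets such as MNIST.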

evaluate.py runs inference on a pre-trained model; invocation takes the form ./evaluate.py <model_fname> <dset_name>.

calc_model_size.py is a small script that reports the total Bloom filter size of the specified pre-trained model, in KiB (1 KiB = 1024 B).
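
The size can also be estimated by hand. The sketch below assumes each filter entry is a single bit and that each class discriminator splits the encoded input (inputs × bits per input) into ceil(total bits / filter inputs) filters. Under those assumptions it reproduces the Size column of the replication table for MNIST-Small, MNIST-Large, and Iris. It is a hypothetical illustration of the arithmetic, not the script's code. Hashes/Filter does not appear because, under this assumption, the hash count affects lookups rather than storage.

```python
import math


def model_size_kib(num_inputs, num_classes, bits_per_input, filter_inputs, filter_entries):
    """Estimate total Bloom filter storage in KiB (1 KiB = 1024 B).

    Assumes one bit per filter entry and ceil(total_bits / filter_inputs)
    filters per class discriminator (hypothetical sketch).
    """
    total_input_bits = num_inputs * bits_per_input
    filters_per_class = math.ceil(total_input_bits / filter_inputs)
    total_bits = num_classes * filters_per_class * filter_entries
    return total_bits / 8 / 1024


# MNIST-Small: 784 pixels, 10 classes, 2 bits/input, 28 bits/filter, 1024 entries/filter
print(model_size_kib(784, 10, 2, 28, 1024))  # 70.0
```

For MNIST-Small this works out to 784 × 2 = 1568 encoded bits, 1568 / 28 = 56 filters per class, and 10 × 56 × 1024 bits = 71680 B = 70.0 KiB.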

convert_dset.py is the script we used to binarize the MNIST dataset as an input to our RTL testbenches for checking correctness. You're unlikely to need this one, but we provide it for completeness.

Producing RTL for Pre-Trained BTHOWeN Models

All relevant code lives in the rtl/ directory.

We provide a Makefile for generating the RTL. Invoking make with no arguments will generate RTL for a sample model (our small MNIST model), and then attempt to build the RTL and testbenches. To generate the RTL without building it, run make template.
The Makefile also allows the model file, the number of hash units in the accelerator, and the data bus width to be specified as optional command-line arguments. For instance, you could run:

make template MODEL=../software_model/selected_models/letter.pickle.lzma HASH_UNITS=2 BUS_WIDTH=32

If HASH_UNITS is left as the default (-1), the script will choose the smallest number of hash units possible without causing the device to be memory-bound. This is a function of the exact model architecture and the bus width.

SystemVerilog sources are generated using the Mako templating library from the .sv.mako sources under rtl/mako_srcs/, and written under rtl/sv_srcs/.

Replication

Software Models

Models with the same sizes as those listed in Table 3 of the paper (replicated below) can be trained with:

./train_swept_models.py <Dataset name> --filter_inputs <Bits/Filter> --filter_entries <Entries/Filter> --filter_hashes <Hashes/Filter> --bits_per_input <Bits/Input>

Run-to-run variation in input mapping may prevent results from matching exactly, particularly on very small datasets (e.g., Wine). The pre-trained models used in the paper are available under software_model/selected_models/.
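
To see why the input mapping causes run-to-run variation, consider the minimal sketch below. It assumes the mapping is a random permutation of the encoded input bits, cut into consecutive groups of filter_inputs bits, one group per filter. The function random_input_mapping is hypothetical and may differ from the repository's implementation. Different seeds give different groupings, and therefore potentially different accuracy, while a fixed seed reproduces the same grouping.

```python
import random


def random_input_mapping(total_bits, filter_inputs, seed=None):
    """Shuffle encoded input bit indices and group them into filters (hypothetical sketch)."""
    rng = random.Random(seed)
    order = list(range(total_bits))
    rng.shuffle(order)
    # Each filter sees a disjoint, randomly chosen subset of the encoded input bits
    return [order[i:i + filter_inputs] for i in range(0, total_bits, filter_inputs)]


# Wine: 13 features x 9 bits/input = 117 encoded bits, 13 bits/filter -> 9 filters
mapping = random_input_mapping(117, 13, seed=0)
print(len(mapping))  # 9
```

If the total number of encoded bits is not a multiple of filter_inputs, the last group in this sketch is simply shorter; the repository may handle that case differently.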

Model Name	Bits/Input	Bits/Filter	Entries/Filter	Hashes/Filter	Size (KiB)	Test Accuracy
MNIST-Small	2	28	1024	2	70.0	0.934
MNIST-Medium	3	28	2048	2	210	0.943
MNIST-Large	6	49	8192	4	960	0.952
Ecoli	10	10	128	2	0.875	0.875
Iris	3	2	128	1	0.281	0.980
Letter	15	20	2048	4	78.0	0.900
Satimage	8	12	512	4	9.00	0.880
Shuttle	9	27	1024	2	2.63	0.999
Vehicle	16	16	256	3	2.25	0.762
Vowel	15	15	256	4	3.44	0.900
Wine	9	13	128	3	0.422	0.983

RTL Power and Area

Replicating RTL power/energy/area results requires a Vivado license. If you encounter timing violations, set intermediate_buffer = False on line 19 of rtl/mako_srcs/hash.sv.mako; this will insert an additional stage in the pipeline. We needed to do this for our medium and large MNIST models.
We also provide the Vivado reports which were used in our analysis under rtl/synthesis_reports.