Awesome
Create fast and efficient standard cell based adders, multipliers and multiply-adders.
Features
Fast
A 2 cycle 64 bit multiply-adder (64bit * 64bit + 128bit -> 128bit) built with the OpenROAD RTL to GDSII flow and the ASAP7 7nm academic PDK makes timing at 1.85 GHz. It takes up 3600 µm² of area.
A 4 cycle 32 bit multiplier (32bit * 32bit -> 64bit), also using OpenROAD and ASAP7, makes timing at 2.7 GHz. Both cases are likely to improve as OpenROAD improves (including better timing aware global placement and global routing, improvements to the resizer, improvements to clock tree synthesis and the use of LVT cells).
vlsiffra achieves this by using many well established techniques including Booth encoding, Dadda reduction and a choice of fast adders like Kogge-Stone.
For more details about these algorithms, check out this Twitter thread which details the implementation of the multiplier in the Bluegene Q supercomputer.
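To make the Booth encoding step more concrete, here is a minimal pure-Python sketch of radix-4 Booth recoding for an unsigned multiplier. It is an illustration of the general technique, not vlsiffra's actual code; the function names `booth_radix4_digits` and `booth_multiply` are made up for this example.

```python
def booth_radix4_digits(multiplier: int, bits: int) -> list[int]:
    """Recode an unsigned multiplier into radix-4 Booth digits in {-2, ..., 2}."""
    def bit(i: int) -> int:
        # Bits below 0 and at or above `bits` read as zero (unsigned operand).
        return (multiplier >> i) & 1 if 0 <= i < bits else 0

    # Digit i looks at bits 2i-1, 2i and 2i+1. One extra digit at the top
    # absorbs the +1 left over from the most significant bit pair.
    return [bit(2 * i - 1) + bit(2 * i) - 2 * bit(2 * i + 1)
            for i in range(bits // 2 + 1)]


def booth_multiply(a: int, b: int, bits: int) -> int:
    """Multiply by summing one partial product per Booth digit of b."""
    digits = booth_radix4_digits(b, bits)
    # Each partial product is a multiple of a in {-2a, -a, 0, a, 2a},
    # shifted left by two bit positions per digit.
    partial_products = [(a * d) << (2 * i) for i, d in enumerate(digits)]
    return sum(partial_products)


if __name__ == "__main__":
    print(booth_radix4_digits(11, 4))
    print(booth_multiply(13, 11, 4))
```

In this sketch a 64 bit multiplier produces 33 partial products instead of 64, which is what makes the following reduction tree smaller.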
Configurable
vlsiffra is written in Amaranth HDL, which allows it to be very configurable, including:
- Configurable number of bits
  Any power of two likely works, although Amaranth starts to slow down when building 64 bit multipliers due to a polynomial time complexity issue when adding signals. An issue has been opened to track this, and once it is fixed larger multipliers should be possible.
- Choice of algorithms
  Various addition algorithms are supported:
  - Brent-Kung (less area, lower performance)
  - Kogge-Stone (more area, higher performance)
  - Han-Carlson (a balance of area and performance)
  - Ripple (lowest area, lowest performance)
- Configurable number of stages
  From purely combinational to 4 register stages. All configurations are fully pipelined, so extra stages trade latency for frequency.
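As a rough illustration of why Kogge-Stone sits at the fast end of the adder choices above, the following is a bit-level Python model of a Kogge-Stone parallel prefix adder. It is a sketch of the textbook algorithm, not the Amaranth code in vlsiffra; `kogge_stone_add` and `prefix_levels` are names chosen for this example.

```python
def prefix_levels(bits: int) -> int:
    """Number of prefix levels a Kogge-Stone adder of this width needs."""
    levels, dist = 0, 1
    while dist < bits:
        levels += 1
        dist *= 2
    return levels


def kogge_stone_add(a: int, b: int, bits: int) -> int:
    """Bit-level model of a Kogge-Stone adder, returns (a + b) mod 2**bits."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(bits)]  # per-bit propagate
    g = [((a >> i) & (b >> i)) & 1 for i in range(bits)]  # per-bit generate
    gp, pp = g[:], p[:]  # group generate / propagate, widened level by level

    dist = 1
    while dist < bits:
        new_g, new_p = gp[:], pp[:]
        for i in range(dist, bits):
            # And-or combine: this group generates, or propagates a carry
            # generated by the group `dist` positions below it.
            new_g[i] = gp[i] | (pp[i] & gp[i - dist])
            new_p[i] = pp[i] & pp[i - dist]
        gp, pp = new_g, new_p
        dist *= 2

    result = 0
    for i in range(bits):
        # The carry into bit i is the group generate of bits 0 .. i-1.
        carry_in = gp[i - 1] if i > 0 else 0
        result |= (p[i] ^ carry_in) << i
    return result


if __name__ == "__main__":
    print(hex(kogge_stone_add(0x1234, 0x0FCD, 16)))
    print(prefix_levels(64))
```

The carry logic is only `prefix_levels(bits)` levels deep (6 for 64 bits), whereas a ripple adder passes the carry through every bit position in turn. The price is a combine node at almost every bit of every level, which is where the extra area comes from.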
Formally verified
Yosys is used to formally verify the standard cell implementation matches gold behavioural models. Amaranth unit tests and Verilator based tests are also used to further verify the design.
Support for many technologies
vlsiffra currently supports the SkyWater sky130hd, GlobalFoundries GF180MCU and ASAP7 PDKs and standard cell libraries.
Easy to add support for new technologies
vlsiffra only requires a few standard cells: full and half adders, 2 input xor, 2 input and, inverter, as well as a couple of more complicated cells (ao21, ao22, ao33).
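For readers unfamiliar with the and-or cell names, here is a small Python sketch of the boolean functions these cells conventionally implement. The pin names (`a1`, `b1`, ...) are assumptions for illustration; the actual pin names differ between standard cell libraries.

```python
from itertools import product


def ao21(a1: int, a2: int, b: int) -> int:
    """AND of two inputs, ORed with a third input."""
    return (a1 & a2) | b


def ao22(a1: int, a2: int, b1: int, b2: int) -> int:
    """OR of two 2-input ANDs."""
    return (a1 & a2) | (b1 & b2)


def ao33(a1: int, a2: int, a3: int, b1: int, b2: int, b3: int) -> int:
    """OR of two 3-input ANDs."""
    return (a1 & a2 & a3) | (b1 & b2 & b3)


def carry_via_ao21_matches_majority() -> bool:
    """Check that ao21 computes an adder carry from generate and propagate."""
    for a, b, cin in product((0, 1), repeat=3):
        generate, propagate = a & b, a ^ b
        majority = (a & b) | (a & cin) | (b & cin)
        if ao21(propagate, cin, generate) != majority:
            return False
    return True


if __name__ == "__main__":
    print(carry_via_ao21_matches_majority())
```

The last function shows one place such a cell is useful: the carry of an adder stage is "generate, or propagate and carry in", which is exactly the ao21 shape.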
Installation
vlsiffra is a Python package, so this will install it and any dependencies:
pip3 install git+https://github.com/antonblanchard/vlsiffra
Another option is to install it from a checked out source tree:
pip3 install .
Amaranth requires Yosys. If you don't have a version installed, you can use the amaranth-yosys package:
pip3 install amaranth-yosys
Example usage
Create a GF180MCU 64 bit Kogge-Stone adder:
vlsi-adder --bits=64 --algorithm=koggestone --tech=gf180mcu --output=adder.v
Create an ASAP7 32 bit multiplier, using a Brent-Kung adder:
vlsi-multiplier --bits=32 --algorithm=brentkung --tech=asap7 --output=multiplier.v
Create a sky130hd 2 cycle 64 bit multiply-adder, which was taped out in the OpenPOWER Microwatt core for the Google/Efabless/SkyWater MPW7 shuttle (one for the fixed point multiplier and another for the floating point multiplier):
vlsi-multiplier --bits=64 --multiply-add --algorithm=hancarlson --tech=sky130hd --register-post-ppg --output=multiply_adder_pipelined.v
The two multipliers on the Microwatt MPW7 tape out can be seen on the left side of the die.
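When comparing adder algorithms it can be handy to generate the same adder several times. The sketch below only builds the command lines, using the `vlsi-adder` flags shown in the adder example earlier in this section; the helper names and the output file naming scheme are invented for this example. The ripple adder is left out because its `--algorithm` value does not appear in the examples here.

```python
import shlex


def adder_command(bits: int, algorithm: str, tech: str, output: str) -> list[str]:
    """Build a vlsi-adder command line as an argument list."""
    return [
        "vlsi-adder",
        f"--bits={bits}",
        f"--algorithm={algorithm}",
        f"--tech={tech}",
        f"--output={output}",
    ]


def sweep_commands(bits: int, tech: str,
                   algorithms=("brentkung", "koggestone", "hancarlson")) -> list[list[str]]:
    """One command per algorithm, each writing to its own Verilog file."""
    return [
        adder_command(bits, alg, tech, f"adder_{tech}_{alg}_{bits}.v")
        for alg in algorithms
    ]


if __name__ == "__main__":
    for cmd in sweep_commands(64, "gf180mcu"):
        # Print only. To run a command, pass the list to
        # subprocess.run(cmd, check=True) with vlsiffra installed.
        print(shlex.join(cmd))
```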
Testing
Local testing requires an installation of both yosys and verilator. Run:
make check
Submitting a pull request will kick off the same set of tests.
Adding a new technology
Using ASAP7 as an example:
- A technology file that contains code to instantiate the standard cells required. Use one of the existing ones as a starting point.
  When creating instances, Amaranth uses the i_* prefix for inputs and the o_* prefix for outputs, i.e. i_VDD means the instance has an input called VDD. As an example, the definition below instantiates the XOR2x1_ASAP7_75t_R xor cell, which has A and B inputs and a Y output.
  Also note that ASAP7 inverts the outputs of the full and half adders, so you will see inverters in this file to undo this. Remove them if your technology has non-inverting outputs.
  e.g. adding the xor definition:
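def _generate_xor(self, a, b, o):
    # i_* keyword arguments connect cell inputs, o_* connect cell outputs
    xorgate = self._PoweredInstance(
        "XOR2x1_ASAP7_75t_R",
        i_A=a,
        i_B=b,
        o_Y=o
    )
    # Add the cell instance to the module being built
    self.m.submodules += xorgate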
- Modify get_tech() to hook the new tech up.
- Verilog behavioural models for the standard cells, used for verification.
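The note about inverted adder outputs can be checked with a small truth-table model. This is a plain Python sketch under the assumption that both the sum and the carry output of the cell are inverted; `inverting_full_adder` is a hypothetical stand-in for a library cell, not an actual ASAP7 cell definition.

```python
from itertools import product


def inverting_full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Hypothetical cell: a full adder whose sum and carry outputs are inverted."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s ^ 1, carry ^ 1


def inv(x: int) -> int:
    """Inverter cell."""
    return x ^ 1


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Wrap the inverting cell with inverters to recover true sum and carry."""
    sum_n, carry_n = inverting_full_adder(a, b, c)
    return inv(sum_n), inv(carry_n)


def inverters_restore_true_outputs() -> bool:
    """Exhaustively check that the wrapped cell really adds its three inputs."""
    for a, b, c in product((0, 1), repeat=3):
        s, carry = full_adder(a, b, c)
        if a + b + c != s + 2 * carry:
            return False
    return True


if __name__ == "__main__":
    print(inverters_restore_true_outputs())
```

For a technology whose adder cells have non-inverting outputs, the `inv` wrappers in this model would simply be dropped.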
Issues
- No support for signed multipliers. Planning to add this.
- No support for carry in or carry out of adders. Planning to add this.
- No support for clock gating yet.
- Formal verification of multipliers is slow, and gets unbearably slow as the multiplier reaches 64 bits. As a result, we formally verify smaller configurations only. We should check whether there are faster equivalence checking methods in Yosys. Another idea might be to verify each output bit in a different Yosys process, parallelising things.
- Adding more optional register stages. Splitting Dadda reduction into two cycles and perhaps final addition into two cycles would improve the multiplier frequency.
- We use OpenROAD for cell placement. We might be able to improve the area of the design by doing manual placement, but it's not clear the effort is worth it. We currently use Yosys to instantiate FFs, so we'd need to do this before attempting manual placement.
- Support for 4:2 compressors (basically 2 full adders). This is what Bluegene Q uses and might help to improve area and frequency a bit. We'd need to create a 4:2 compressor cell since none of the standard cell libraries provide one.
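To show what "basically 2 full adders" means for the 4:2 compressor idea, here is a minimal Python model of the textbook construction. It is a sketch, not a proposed cell design; the signal names are chosen for this example.

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, carry) of three input bits."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def compressor_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> tuple[int, int, int]:
    """4:2 compressor built from two chained full adders.

    Returns (sum, carry, cout). `carry` and `cout` both have weight 2.
    `cout` depends only on x1..x3, never on cin, so a row of compressors
    does not ripple a carry from column to column.
    """
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


def compressor_preserves_sum() -> bool:
    """Exhaustively check x1 + x2 + x3 + x4 + cin == sum + 2 * (carry + cout)."""
    for x1, x2, x3, x4, cin in product((0, 1), repeat=5):
        s, carry, cout = compressor_4_2(x1, x2, x3, x4, cin)
        if x1 + x2 + x3 + x4 + cin != s + 2 * (carry + cout):
            return False
    return True


if __name__ == "__main__":
    print(compressor_preserves_sum())
```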
Why vlsiffra?
My last attempt to name an Open Source project resulted in the impossible to Google for "Microwatt" OpenPOWER VHDL core. vlsiffra is a portmanteau of VLSI and siffra, the Swedish word for number. Thanks to @ruscur for the idea. Hello to all our Swedish readers.