WB2AXIP: Bus interconnects, bridges, and other components

The bus components and bridges within this repository are unique in that they are all designed for 100% throughput with no throughput overhead. They are also unique in that the vast majority of the cores within have been formally verified.

Where the protocol allows it, such as with AXI4, AXI-lite, and Wishbone B4 pipelined, multiple transactions may be in flight at a time so that protocol handling doesn't stall the bus.

This is uncommon among AXI4 implementations and almost unheard of in the example AXI-lite implementations I have examined.

Most AXI4 implementations will process a single burst transaction packet at a time and require some overhead to make this happen. Xilinx's AXI-lite implementations, both interconnect and slave implementations, only handle one request at a time. Other buses, such as Wishbone Classic, AHB, or APB, will only ever process one transaction word at a time.

If you are coming from AXI4, AXI-lite, or any one of these other bus implementations to the AXI4 or even AXI-lite components supported here, then you should expect to see a throughput increase by using one (or more) of the cores listed here--given of course that you have a bus master capable of issuing multiple requests at a time.

This performance improvement may be as significant as a 16x speedup when toggling an I/O, a 4x speedup when comparing this slave against Xilinx's block RAM memory controller (when processing single beat transactions), or as insignificant as a 2% improvement from using the AXI4 MM to Slave converters (according to Xilinx's data sheets; I haven't yet run the test myself). This increased performance extends to the crossbar implementations contained within this repository as well, and so you may notice the improvement only increases when using these crossbars.

A Pipelined Wishbone B4 to AXI4 bridge

Built out of necessity, this repository was originally built around a Wishbone (WB) to AXI4 bridge, which is designed to provide a conversion from a (simpler) pipelined wishbone bus to an AXI4 bus for the purposes of driving memory transactions through Xilinx's SDRAM controllers. The WB->AXI bridge is designed to connect a wishbone bus to an AXI bus which may be wider--such as from a 32-bit WB bus to a 128-bit AXI bus. Hence, if the Memory Interface Generator DDR3 controller is running at a 4:1 clock rate, memory clocks to AXI system clocks, then this bus translator should be able to accomplish one transaction per clock at a sustained (pipelined) rate (neglecting any stalls due to refresh cycles).

Since the initial build of the core, I've added the WB to AXI lite bridge. This is also a pipelined bridge, and like the original one it has also been formally verified.

AXI to Wishbone conversion

While not the original purpose of the project, it now has both AXI-lite to WB and AXI to WB bridges. Each of these bridges comes in two parts, a read and write half. These halves can be used either independently, generating separate inputs to a WB crossbar, or combined through a WB arbiter.

The AXI-lite to WB bridge has been both formally verified and FPGA proven. This includes both the write half as well as the read half. Given the reluctance of the major vendors to support high speed AXI-lite interfaces, you aren't likely to find this kind of performance elsewhere.

The AXI to WB bridge write and read components have only been formally verified through about a dozen steps or so. This proof is deep enough to verify most of the bus interactions, but not nearly deep enough to verify any issues associated with internal FIFO overflows.

Wishbone pipeline to WB Classic

There's also a Wishbone (pipelined, master) to Wishbone (classic, slave) bridge, as well as the reverse Wishbone (classic, master) to Wishbone (pipelined, slave) bridge. Both of these have passed their formal tests. They are accompanied by a set of formal properties for Wishbone classic, both for slaves as well as masters.

AXI3 bridging

I'm now in the process of adding AXI3 bridges to this repository. These will be necessary for working with the Zynq chips, and others, that are still using AXI3. While the work is ongoing, I do have an AXI3 to AXI4 bridge available that's undergoing testing. The bridge supports two algorithms for W* reordering, and should be suitable for most applications.

Formal Verification

Currently, the project contains formal specifications for Avalon, Wishbone (classic), Wishbone (pipelined), APB, and AXI-lite buses. There's also a (partial) formal property specification for an AXI (full) bus, but the one in the master branch is incomplete. The complete set of AXI properties are maintained elsewhere. These properties, and the cores they've been used to verify, have all been tested and verified using SymbiYosys.

Xilinx Cores

The formal properties were first tested on a pair of Xilinx AXI demonstration cores. These cores failed formal verification. You can read about them on my blog, zipcpu.com, in separate articles covering AXI-lite and AXI. The Xilinx cores referenced in those articles are also available for reference, for those who wish to repeat or examine my proofs.

Firewalls

A firewall is a guarantor: given an interface, of which only one side is trusted, the firewall guarantees the other side can trust the interface. More than that, a firewall can be used to trigger an in-circuit logic analyzer: if ever the interface rules are violated, the firewall will set an output fault indicator, which can then be used to trigger the logic analyzer. On top of that, the firewalls below are also built with an optional reset, allowing the design to safely return to functionality after triggering. In many cases, this requires resetting the downstream (untrusted) component.

AXILSAFETY is a bus fault isolator AXI-lite translator, sometimes called a firewall, designed to support a connection to a trusted AXI-lite master, and an untrusted AXI-lite slave. Should the slave attempt to return an illegal response, or perhaps a response beyond the user parameterized timeouts, then the untrusted slave will be "disconnected" from the bus, and a bus error will be returned for both the errant transaction and any following.

AXILSAFETY also has a mode where, once a fault has been detected, the slave is reset and then allowed to return to the bus infrastructure again until its next fault.

This core has been formally verified.
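As a rough illustration of the timeout half of this idea, the sketch below latches a fault once a response has been owed for too long. It is not the AXILSAFETY source: the module name, the OPT_TIMEOUT parameter, and the i_waiting/i_response ports are all hypothetical, and a real firewall must also check the protocol rules and generate the bus error responses itself.

```verilog
`default_nettype none
// Hypothetical response watchdog; all names here are illustrative only.
module timeout_sketch #(
        parameter OPT_TIMEOUT = 16   // Assumed: cycles allowed per response
    ) (
        input  wire i_clk, i_reset,
        input  wire i_waiting,       // A response is owed by the slave
        input  wire i_response,      // The slave responded this cycle
        output reg  o_fault          // Sticky until the next reset
    );

    localparam LGT = $clog2(OPT_TIMEOUT+1);
    reg [LGT-1:0] timer;

    // Count consecutive cycles spent waiting without a response
    initial timer = 0;
    always @(posedge i_clk)
    if (i_reset || !i_waiting || i_response)
        timer <= 0;
    else if (timer < OPT_TIMEOUT)
        timer <= timer + 1'b1;

    // Latch the fault once the wait has reached the limit
    initial o_fault = 1'b0;
    always @(posedge i_clk)
    if (i_reset)
        o_fault <= 1'b0;
    else if (i_waiting && !i_response && timer >= OPT_TIMEOUT)
        o_fault <= 1'b1;
endmodule
```

In a design along these lines, the fault output would both gate the untrusted slave off the bus and serve as the logic analyzer trigger described above.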
AXISAFETY is a bus fault isolator/firewall very similar to the AXILSAFETY bus fault isolator above with many of the same options. The difference is that the AXISAFETY core works with the full AXI4 specification, whereas the AXILSAFETY core works only with AXI4-lite.

As with the AXILSAFETY example, the AXISAFETY firewall also has a mode where, once a fault has been detected, the slave is reset and allowed to return to the bus infrastructure until its next fault. Unlike the AXILSAFETY example, this one will only ever process a single AXI4 burst at a time.

This core has been formally verified.

AXISSAFETY is a firewall for the AXI stream protocol. It guarantees the stream protocol, and optionally that the incoming stream will never be stalled for too long a period or that all packets downstream have the same length.

This core has been formally verified.

WBSAFETY is a bus fault isolator/firewall, very similar to the AXILSAFETY firewall above, only for the Wishbone bus. Unlike many vendor firewall implementations, this one is able to reset the downstream core following any error without impacting its ability to respond to the bus in a protocol compliant fashion.

This core has been formally verified.

Cross-bars and AXI demonstrators

This repository has since become a repository for all kinds of bus-based odds and ends in addition to the bus translators mentioned above. Some of these odds and ends include crossbar switches and AXI demonstrator cores. As mentioned above, these cores are unique in their 100% throughput capabilities.

WBXBAR is a fully functional N master to M slave Wishbone crossbar. Unlike my earlier WB interconnects, this one guarantees that the ACK responses won't get crossed, and that misbehaving slave accesses will be timed out. The core also has options for checking for starvation (where a master's request is not granted in a particular period of time), double-buffering all outputs (i.e. with skid buffers), and forcing idle channel values to zero in order to reduce power.

This core has been formally verified and used in several designs.

AXILXBAR is a fully functional, formally verified, N master to M slave AXI-lite crossbar interconnect. As such, it permits min(N,M) active channel connections between masters and slaves all at once. This core also has options for low power, whereby unused outputs are forced to zero, and lingering. Since the AXI protocol doesn't specify exactly when to close a channel, there's an OPT_LINGER allowing you to specify how many cycles the channel should be idle for in order for the channel to be closed. If the channel is not closed, a clock can be spared when reusing it. Otherwise, two clocks will be required to access a given channel.

This core has been formally verified.

While I haven't tested Xilinx's interconnect to know, if the quality of their demonstration AXI-lite slave core is any indication, then this cross-bar should easily outperform anything they have. The key unusual feature? The ability to maintain one transaction per clock over an extended period of time across any channel pair. (Their crossbar artificially limits AXI-lite interfaces to one transaction at a time.)

AXIL2AXIS converts from AXI-lite to AXI stream and back again. Its primary purpose is for testing AXI stream components at low speed, to make certain that they work before increasing the speed of the stream to the system clock rate. As such, writes to the core will generate writes to the AXI stream on the master side, and reads from the core will accept AXI stream reads on the slave side.

While this isn't really intended to be a high performance core, it can still handle 100% throughput like most of my IP here. Therefore, anything less than 100% throughput through this core will be a test of and reflection of how the rest of your system works.

This core has been formally verified.

AXIEMPTY is a cross bar helper. It's the simplest, most basic slave I could come up with that obeyed all the rules of AXI while returning a bus error for every request. It's designed to be used by the interconnect generator for those cases where there are no slaves on a given AXI bus.

This core has been formally verified.

AXILEMPTY is a cross bar helper along the same lines as the AXIEMPTY core above. It has a nearly identical purpose, save only that AXILEMPTY is built to be the empty slave on an AXI-lite bus, not an AXI one.

This core has been formally verified.

AXILSINGLE is designed to be a companion to AutoFPGA's AXI-lite support. Its purpose is to simplify connectivity logic when supporting multiple AXI-lite registers. This core takes a generic AXI-lite interface, and simplifies the interface so that multiple single-register cores can be connected to it at no loss in throughput. The single-register cores can either be full AXI-lite cores in their own right, subject to simplification rules (listed within), or even further simplified from that. They must never stall the bus, and must always return responses within one clock cycle. The AXILSINGLE handles all backpressure issues. If done right, the backpressure logic from any downstream slave core will be removed by the synthesis tool, allowing all backpressure logic to be condensed into a few shared wires.

This core has been formally verified.

AXILDOUBLE is the second AXI-lite companion to AutoFPGA's AXI-lite support. Its purpose is to simplify connectivity logic when supporting multiple AXI-lite slaves while imposing no throughput penalty. This core takes a generic AXI-lite interface, and simplifies the interface so that multiple peripherals can be connected to it. These peripheral cores can either be full AXI-lite cores in their own right, subject to simplification rules discussed within, or even simplified from that. They must never stall the bus, and must always return responses within one clock cycle. The AXILDOUBLE core handles all backpressure issues, address selection, and invalid address returns.

This core has been formally verified.

AXIXBAR is a fun project to develop a full NxM configurable crossbar using the full AXI protocol.

Unique to this (full) AXI crossbar is the ability to have multiple ongoing transactions on each of the master-to-slave channels. Were Xilinx's crossbar to do this, it would've broken their demonstration AXI-full slave core.

This core has been formally verified and used in several designs.

DEMOAXI is a demonstration AXI-lite slave core with more power and capability than Xilinx's demonstration AXI-lite slave core. Particular differences include 1) this one passes a formal verification check (Xilinx's core has bugs), and 2) this one can handle a maximum throughput of one transaction per clock. (Theirs did at best one transaction every other clock period.) You can read more about this demonstration AXI-lite slave core on ZipCPU.com in this article.

This core has been formally verified.

EASYAXIL is a second demonstration AXI-lite slave core, only this time re-engineered to look and feel simpler than the DEMOAXI core above. It's also designed to use internal registers, vice a memory, so that it can be more easily extended. The core can either use skidbuffers, in which case its performance matches the DEMOAXI core above, or not, in which case it has only half the throughput. The real key difference is that the skid buffers have been separated into an external module.

This core has been formally verified. While not used in any designs per se it has formed the basis for many successful AXI-lite designs.
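Since skid buffers come up repeatedly among these cores, here is a minimal sketch of the idea: a one-entry buffer that catches the beat already in flight when the downstream side stalls, so that the upstream ready signal can come straight from a register without giving up throughput. This is only an illustration, not the repository's own skid buffer module, and every name in it is hypothetical.

```verilog
`default_nettype none
// Hypothetical one-entry skid buffer with combinational outputs.
module skid_sketch #(
        parameter DW = 8             // Assumed data width
    ) (
        input  wire          i_clk, i_reset,
        // Upstream (incoming) side
        input  wire          i_valid,
        output wire          o_ready,
        input  wire [DW-1:0] i_data,
        // Downstream (outgoing) side
        output wire          o_valid,
        input  wire          i_ready,
        output wire [DW-1:0] o_data
    );

    reg          r_valid;            // The buffer holds a parked beat
    reg [DW-1:0] r_data;

    initial r_valid = 1'b0;
    always @(posedge i_clk)
    if (i_reset)
        r_valid <= 1'b0;
    else if ((i_valid && o_ready) && (o_valid && !i_ready))
        r_valid <= 1'b1;             // Downstream stalled: park the beat
    else if (i_ready)
        r_valid <= 1'b0;             // The parked beat has been accepted

    // Capture incoming data whenever the buffer is free
    always @(posedge i_clk)
    if (o_ready)
        r_data <= i_data;

    assign o_ready = !r_valid;       // Driven directly by a register
    assign o_valid = i_valid || r_valid;
    assign o_data  = r_valid ? r_data : i_data;
endmodule
```

While the buffer is empty, beats pass straight through. A beat is only parked on the cycle the downstream side refuses it, which is what allows one transfer per clock to continue once the stall clears.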
AXILGPIO is a basic GPIO controller derived from the EASYAXIL design above.

This core has been formally verified.

DEMOFULL is a fully capable AXI4 demonstration slave core rather than just the AXI-lite protocol. Well, okay, it doesn't do anything with the PROT, QOS, CACHE, and LOCK flags, so perhaps it isn't truly the full AXI protocol. Still, it's sufficient for most needs.

Unlike Xilinx's demonstration AXI4 slave core, this one can handle 100% loading on both read and write channels simultaneously. That is, it can handle one read and one write beat per channel per clock with no stalls between bursts if the environment will allow it.

This core has been formally verified and used in several designs.

AXI2AXILITE converts incoming AXI4 (full) requests for an AXI-lite slave. This conversion is fully pipelined, and capable of sending back to back AXI-lite requests on both channels.

This core has been formally verified and used in several designs.

AXIS2MM converts an incoming stream signal into outgoing AXI (full) requests. Supports bursting and aborted transactions. Also supports writes to a constant address, and continuous writes to concurrent addresses. This core depends upon all stream addresses being aligned.

This core has been formally verified and checked in simulation.

AXIMM2S reads from a given address, and writes it to a FIFO buffer and then to an eventual AXI stream. Read requests are not issued unless room already exists in the FIFO, yet for a sufficiently fast stream the read requests may maintain 100% bus utilization--but only if the rest of the bus does as well. Supports continuous, fixed address or incrementing, and aborted transactions.

Both this core and the one above it depend upon all stream words being aligned to the stream.

This core has been both formally verified and checked in simulation.

AXIDMA is a hardware assisted memory copy. Given a source address, destination address, and length, this core reads from the source address into a FIFO, and then writes the data from the FIFO to memory. As an optimization, memory address requests are not made unless the core is able to transfer at a 100% throughput rate.

This core has been formally verified and used in several designs.

AXISGDMA is a brand new scatter-gather/vector-io based DMA controller. Give it a pointer to a table of DMA descriptors, and it will issue commands to the DMA until the table is complete.

Both the internal FSM and the table reader have been separately verified. The AXISGDMA has not yet been verified.

AXIVCAMERA is an AXI-based frame-buffer writer. Given an AXI-stream video source, a frame start address, the number of lines in the image and the number of bytes per line, this core will copy one (or more) frames of video to memory.

This core has been formally verified, and used successfully in a simulation based demonstration.

AXIVDISPLAY is an AXI-based frame-buffer source. Given a frame start address in memory, the number of lines in an image and the number of bytes per line, this core will perpetually read a video image from memory and produce it on an outgoing stream interface.

This particular version can only handle bus aligned transfers.

This core has been formally verified.

You can find a demonstration of this core being used in my VGA simulator--supporting both VGA and HDMI outputs.

AXISINGLE is a (to be written) bus simplifier core along the lines of the AXILSINGLE, AXILDOUBLE and AXIDOUBLE cores, in that it can handle all of the bus logic for multiple AXI slaves while simplifying the bus interactions for each but at no throughput penalty. Once built, this will also be an AutoFPGA companion core. Slaves of type "SINGLE" (one register, one clock to generate a response) can be ganged together using it. This core will then essentially turn an AXI core into an AXI-lite core, with the same interface as AXILSINGLE above. When implemented, it will look very similar to the AXIDOUBLE core mentioned below.

AXIDOUBLE is the second AXI4 (full) companion to AutoFPGA's AXI4 (full) support. Its purpose is to simplify connectivity logic when supporting multiple AXI4 (full) slaves. This core takes a generic AXI4 (full) interface, and simplifies the interface so that peripherals can be connected to it with a minimal amount of logic. These peripheral cores can either be AXI4 (full) cores in their own right, subject to simplification rules discussed within, simplified AXI-lite slaves as one might use with AXILDOUBLE, or even simpler than that. Key to this simplification is the assumption that the simplified slaves must never stall the bus, and that they must always return responses within one clock cycle. The AXIDOUBLE core handles all backpressure issues, ID logic, burst logic, address selection, invalid address return and exclusive access logic.

This core has been formally verified.

WBXCLK can be used to cross clock domains on a pipelined Wishbone bus. It's conceptually an asynchronous request FIFO coupled with an asynchronous acknowledgment FIFO to cross clock domains. A counter in the original clock domain guarantees that the number of outstanding transactions remains smaller than the FIFO size. The design is complicated by the master's ability to arbitrarily lower CYC at any time mid-cycle and reliably be able to cancel any outgoing transactions in the downstream channel direction.

This core has been formally verified.

APBXCLK can be used to cross clock domains on an APB bus. Unlike other solutions in this repository, this implementation is not pipelined--simply because the APB bus specification will not let it be so.

This core has been formally verified.

AXIVFIFO implements a virtual FIFO. A virtual FIFO is basically a memory backed FIFO. Hence, after data gets written to this core it is then burst across an AXI bus to whatever memory device is connected to the bus. This allows you to build FIFOs of arbitrarily large length for ... whatever task.

This core has been formally verified.

AXIXCLK can be used to cross clock domains in an AXI context. As implemented, it is little more than a set of asynchronous FIFOs applied to each of the AXI channels. The asynchronous FIFOs have been formally verified.

AXIPERF is an AXI4 performance measurement peripheral. It has an AXI4 monitor interface, for use with monitoring an AXI4 (full) bus. A second AXI4-lite interface allows you to start, stop, or clear the data collection, as well as the ability to read the results back out. This core has been used to successfully measure bus latency and throughput, as well as to gain other valuable insights from any monitored AXI4 interface.

This core has been demonstrated in simulation. The AXI-lite interface has been formally verified.

AXI Stream

AXISBROADCAST is a quick AXI stream processing engine that takes a single AXI-stream source, and "broadcasts" it to multiple downstream AXI-stream sinks.

This core has been formally verified.

AXISPACKER packs AXI stream beats by removing bytes where TKEEP is zero.

This core has been formally verified.

AXISRANDOM is a quick AXI stream source generating random numbers via a linear feedback shift register.

This core has been formally verified.
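For readers unfamiliar with the technique, a linear feedback shift register driving an AXI stream can be sketched in a few lines. What follows is not the AXISRANDOM source: the module name, the 16-bit width, the seed, and the port names are assumptions made for illustration. The tap mask 16'hB400 is the textbook maximal-length choice for a 16-bit Galois LFSR.

```verilog
`default_nettype none
// Hypothetical LFSR stream source; names and widths are illustrative only.
module lfsr_stream_sketch (
        input  wire        S_AXI_ACLK,
        input  wire        S_AXI_ARESETN,
        output reg         M_AXIS_TVALID,
        input  wire        M_AXIS_TREADY,
        output wire [15:0] M_AXIS_TDATA
    );

    localparam [15:0] TAPS = 16'hB400;   // x^16 + x^14 + x^13 + x^11 + 1
    localparam [15:0] SEED = 16'hACE1;   // Any nonzero value will do

    reg [15:0] lfsr;

    // TVALID stays low during reset, and rises on the clock after it
    initial M_AXIS_TVALID = 1'b0;
    always @(posedge S_AXI_ACLK)
        M_AXIS_TVALID <= S_AXI_ARESETN;

    // Galois LFSR: step only when the current word has been accepted,
    // so that TDATA holds steady while TVALID && !TREADY
    initial lfsr = SEED;
    always @(posedge S_AXI_ACLK)
    if (!S_AXI_ARESETN)
        lfsr <= SEED;
    else if (M_AXIS_TVALID && M_AXIS_TREADY)
        lfsr <= (lfsr >> 1) ^ (lfsr[0] ? TAPS : 16'h0000);

    assign M_AXIS_TDATA = lfsr;
endmodule
```

For brevity this sketch advances the register by a single bit per beat, so successive words overlap heavily. A practical source would step the register several times per beat.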
AXISSWITCH is a quick switch for AXI streams. Given N stream inputs, select from among them to produce a stream output. Guarantees that the switch takes place at packet boundaries. Provides an AXI-lite interface for controlling which AXI stream gets forwarded downstream.

This core has been formally verified.

APB

There are now two APB cores in this repository:

APBSLAVE is a demonstration APB slave.

This core has been formally verified.

AXIL2APB -- a high throughput AXI-lite to APB bridge. Unlike other bridges, this one bridges to a single APB slave only. It can also maintain PSEL high across multiple bursts, achieving a maximum throughput rate of 50%.

This core has been formally verified.

Frequently Asked Questions and Common Issues

default_nettype none

It is my practice to set all of my design modules to default_nettype none. This tells the synthesis tool to generate an error any time I reference a signal which hasn't yet been defined. The default value, default_nettype wire, instructs the synthesis tool to instead generate a value, which will be a single bit of type wire, any time it sees an undefined value. The default also creates an opportunity for a misspelling to quickly turn into a design bug in many, many ways.

The annoying part of default_nettype none is that all inputs must be declared as wires. input signal_name; is not good enough, it must be input wire signal_name;. I find this to be a small nuisance to pay for the tremendous benefit default_nettype none offers.
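A minimal sketch of what this looks like in practice follows. The module and signal names are hypothetical. In Verilog source the directive is written with a leading backtick.

```verilog
`default_nettype none

module nettype_sketch (
        // Every port names its net type explicitly
        input  wire       i_clk,
        input  wire [7:0] i_data,
        output reg  [7:0] o_data
    );

    wire [7:0] inverted;

    // Misspelling the left-hand side here, as in
    //     assign invertd = ~i_data;
    // is an error under default_nettype none. Under the default of
    // default_nettype wire, it would instead quietly create a one-bit
    // wire named invertd and leave inverted undriven.
    assign inverted = ~i_data;

    always @(posedge i_clk)
        o_data <= inverted;
endmodule
```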

Where this becomes a problem is when interfacing with other tools or other IP. Vivado HLS is known for producing logic which is not compatible with default_nettype none. I know of at least one ASIC foundry which produces simulation models for its components that are not default_nettype none safe. It's a common problem.

One solution to this problem is to remove the default_nettype none line. This defeats the whole purpose of using the flag. Another solution is to place default_nettype wire at the bottom of the file at issue. For some tools (Yosys), this will also defeat the benefits of default_nettype none.
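The second option can be sketched as follows, using two hypothetical modules in one file to stand in for a strict design file followed by third-party logic. The directive stays in force until the next default_nettype directive, so the order in which sources are read determines which setting each module sees.

```verilog
`default_nettype none

// A strict module: all nets must be declared
module strict_sketch (
        input  wire i_a, i_b,
        output wire o_y
    );

    assign o_y = i_a ^ i_b;
endmodule

// Restore the language default for whatever is compiled next
`default_nettype wire

// Hypothetical third-party module which relies on implicit nets
module legacy_sketch (a, b, y);
    input  a, b;
    output y;

    assign n = a & b;   // "n" is never declared: an implicit one-bit wire
    assign y = n;
endmodule
```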

A better solution is to fix the offending logic.

An easier solution is to adjust the synthesis file order. Because Verilog directives are processed as though all of the files were concatenated together, a change of file order can often fix this issue.

Licensing

This repository is licensed under the Apache 2 license.

Thanks

I'd like to thank @wallento for his initial work on a Wishbone to AXI converter, and his encouragement to improve upon it. While this isn't a fork of his work, the initial pipelined wishbone to AXI bridge which formed the core seed for this project took its initial motivation from his work.

Many of the rest of these projects have been motivated by the desire to learn and develop my formal verification skills. For that, I would thank the staff of Symbiotic EDA for their tools and their encouragement.