Awesome
nocgen
This package includes a Perl script that generates Verilog HDL codes of on-chip network consisting of virtual-channel routers.
RTL simulation, logic synthesis, place-and-route, post-layout simulation, and power-estimation can be performed for the generated Verilog HDL codes with standard EDA tools. These EDA scripts for Nangate Open Cell Library 45nm library are also included.
By modifying the Perl script, you can customize the network topology, routing algorithm, flit width, number of virtual channels, input buffer depth per VC, traffic pattern, packet length, etc.
Generating Verilog HDL model
$ ./nocgen.pl
You can customize the network topology. Below are some examples.
2D Mesh topology (4x4 = 16 nodes):
$array_size = 4;
$topology_type = mesh;
$routing_type = mesh2d;
Linear topology (16 nodes):
$array_size = 16;
$topology_type = linear;
$routing_type = mesh1d;
You can customize various router parameters. Below are the default values.
$data_width = 32;
$vch_num = 8;
$buf_size = 16;
$arbiter_type = fixed; # fixed or roundrobin
You can customize the traffic pattern. Below are the default values.
$traffic_ptn = random; # random or uniform
$packet_len = 5;
$packet_num = 40;
RTL simulation (Icarus Verilog or Cadence NC-Verilog)
For Icarus Verilog:
$ make isim
For NC-Verilog:
$ make nsim
Below are performance results ($packet_num=32, $traffic_ptn=uniform, $packet_len=15, $buf_size=15).
2D Mesh (16 nodes, 1 VC) 920 cycles
2D Mesh (16 nodes, 8 VCs) 693 cycles
Linear (16 nodes, 1 VC) 2402 cycles
Linear (16 nodes, 8 VCs) 1634 cycles
2D Mesh is better than Linear. Performance improves when using more VCs.
Design Synthesis (Synopsys Design Compiler)
$ make syn
Place and Route (Cadence SoC Encounter)
$ make par
Static Timing Analysis (Synopsys Design Compiler)
$ make sta
Gate-level simulation with SDF file (Cadence NC-Verilog)
$ make dsim
Power Estimation (Synopsys Design Compiler)
$ make power
Then you can estimate the energy-per-bit of a large router ($vch_num=8, $buf_size=16), as follows.
Power with 0 stream: 45.7 mW
Power with 1 stream: 46.0 mW
Power with 2 streams: 46.4 mW
Power with 3 streams: 46.8 mW
Power with 4 streams: 47.2 mW
Power with 5 streams: 47.6 mW
Delta is approx 0.4 mW. So, 0.4 mJ is consumed in 1 sec.
Frequency: 200 MHz
Link utilization: 4/13
Flit width: 32 bit
0.4[mJ] / 200[MHz] * 13/4 / 32[bit]
= ((0.4 * 10^(-3) / (200 * 10^6)) * 13/4 / 32) * 10^12
= 0.203125 [pJ/bit]
Also you can estimate the energy-per-bit of a large router ($vch_num=2, $buf_size=5), as follows.
Power with 0 stream: 4.59 mW
Power with 1 stream: 4.75 mW
Power with 2 streams: 4.92 mW
Power with 3 streams: 5.08 mW
Power with 4 streams: 5.25 mW
Power with 5 streams: 5.43 mW
Delta is approx 0.17 mW. So, 0.17 mJ is consumed in 1 sec.
Frequency: 200 MHz
Link utilization: 4/13
Flit width: 32 bit
0.17[mJ] / 200[MHz] * 13/4 / 32[bit]
= (0.17 * 10^(-3) / (200 * 10^6)) * 13/4 / 32) * 10^12
= 0.08632812 [pJ/bit]
Remove unused files
$ make allclean
$ ./nocgen.pl clean
Reference
If you use our NoC generator, please cite our original paper as follows.
"An open-source on-chip router model originally developed for [Matsutani_HPCA09]"
[Matsutani_HPCA09] Hiroki Matsutani, Michihiro Koibuchi, Hideharu Amano, Tsutomu Yoshinaga, "Prediction Router: Yet Another Low Latency On-Chip Router Architecture", Proc. of the 15th IEEE International Symposium on High-Performance Computer Architecture (HPCA'09), pp.367-378, Feb 2009.