MuchiSim is a simulation framework for design exploration of multi-chip manycore systems. We evaluate MuchiSim by simulating systems with up to a million interconnected processing elements (PEs) while modeling data movement and communication in a cycle-accurate manner. In addition to performance, MuchiSim reports the energy, area, and cost of the simulated system, and it comes with a benchmark application suite and two data visualization tools. MuchiSim supports various parallelization strategies and communication primitives, such as task-based parallelization and message passing, making it highly relevant for architectures with software-managed coherence and distributed memory. Via a case study, we show that MuchiSim helps users explore the balance between memory and computation units and the constraints related to chiplet integration and inter-chip communication. MuchiSim enables scaling up the systems in which new techniques or design parameters are evaluated, opening the door to further research in this area.
For detailed information, check out our full paper. To cite MuchiSim, please use:
@inproceedings{muchisim,
title={Muchisim: A Simulation Framework for Design Exploration of Multi-Chip Manycore Systems},
author={Orenes-Vera, Marcelo and Tureci, Esin and Martonosi, Margaret and Wentzlaff, David},
booktitle={Proceedings of 2024 International Symposium on Performance Analysis of Systems and Software (ISPASS)},
year={2024},
url={https://doi.org/10.48550/arXiv.2312.10244}
}
<img width="323" alt="muchiSim" src="https://github.com/PrincetonUniversity/muchiSim/assets/55038083/c25721f2-7702-4a78-bc56-ca2d4f39a9ce">
Requirements
This simulator is written entirely in C/C++. It requires C++11 or later.
By default, it uses the compiler set in the environment variable CXX, or g++ if that variable is not set.
It has been tested with G++ versions 12 and 13, Apple Clang version 15, and Intel C++ Compiler version 2021.1.2.
For parallelization, it uses pthreads by default; alternatively, it can use OpenMP (see exp/run.sh).
The scripts inside the plots folder require Python version 3. These scripts parse the simulator traces generated by the experiments to create plots.
Usage
Because many parameters are passed as macros, the simulator is recompiled every time a new experiment is launched. The scripts that launch experiments are stored in the exp/ folder. For example, the following command runs Sparse Matrix Vector Multiplication (SPMV), which corresponds to application #4 among the applications included in this simulator by default (SSSP, Pagerank, BFS, WCC, SPMV, Histogram, 3D-FFT, and SPMM):
exp/run_app.sh 4 0 A
The '0' refers to the configuration set inside the run_app.sh script.
The 'A' is the name given to the experiment. It is also used as the name of the binary created inside the bin folder.
Simulator parameters
The simulator has many configuration parameters inside src/configs. Other parameters are set as C macros inside exp/run.sh, which is the file where the simulator gets compiled. Some parameters are macros for simulation efficiency: the code inside the corresponding ifdef blocks is compiled in, and therefore executed, only if that parameter is set.
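To illustrate why macro parameters help simulation speed, here is a minimal sketch of an ifdef-guarded statistic. The macro name SKETCH_COUNT_HOPS and the Stats and simulate names are hypothetical and chosen for this example only; they are not actual MuchiSim parameters or functions.

```cpp
#include <cstdint>

// Hypothetical example; none of these names come from the MuchiSim sources.
// An instrumented build would be compiled with, for instance:
//   g++ -std=c++11 -DSKETCH_COUNT_HOPS ...
struct Stats {
    std::uint64_t cycles = 0;
    std::uint64_t hops = 0;  // only updated when SKETCH_COUNT_HOPS is defined
};

// Advance the (toy) simulation by n cycles.
inline void simulate(Stats &s, std::uint64_t n) {
    for (std::uint64_t i = 0; i < n; ++i) {
        ++s.cycles;
#ifdef SKETCH_COUNT_HOPS
        // This bookkeeping exists only in builds that define the macro,
        // so default builds pay no cost for it in the inner loop.
        s.hops += 2;
#endif
    }
}
```

Without the -D flag, the preprocessor removes the guarded statement before compilation, so the inner loop carries no runtime check for the disabled feature. The trade-off is that changing such a parameter requires recompiling, which matches the per-experiment compilation described above.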
Folder structure
src: contains the source files of the simulator.
doall: contains the sequential or doall implementations of some of the applications included as benchmarks in this simulator.
sim_logs: the folder where simulation traces are written.
bin: contains the binary files created by compiling the different experiments.
datasets: contains the datasets in binary format and TSV format.
exp: contains experiment scripts.
gui: contains a PyQt5 GUI to show plots based on simulation traces (installation instructions are inside that folder).
plots: contains Python scripts to plot heatmaps and other characterization plots for different experiments.
More information
More information about MuchiSim concepts can be found in src/README.md.
Research using MuchiSim
MuchiSim has helped evaluate "Tascade: Hardware Support for Atomic-free, Asynchronous and Efficient Reduction Trees" and "DCRA: A Distributed Chiplet-based Reconfigurable Architecture for Irregular Applications".