# Calculon - Co-design for large scale parallel applications
## Running
Run Calculon like this:

`$> PYTHONPATH=. ./bin/calculon <args>`
Calculon is a hierarchical command line. To see the commands it accepts, use `--help` or `-h`:

`$> PYTHONPATH=. ./bin/calculon -h`
You can also see how to use any command specifically by using `--help` or `-h` on the command:

`$> PYTHONPATH=. ./bin/calculon llm -h`
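A hierarchical command line like this is typically built from nested subcommand parsers. Below is a minimal sketch using Python's `argparse` subparsers — an illustration of the pattern only, not Calculon's actual implementation; the argument names are assumptions:

```python
import argparse

def build_parser():
    # Top-level parser; each subcommand registers its own parser,
    # which is why -h works both at the top level and per subcommand.
    parser = argparse.ArgumentParser(prog="calculon")
    sub = parser.add_subparsers(dest="command", required=True)

    # Hypothetical 'llm' subcommand mirroring the example invocation.
    llm = sub.add_parser("llm", help="run a single LLM calculation")
    llm.add_argument("model")      # e.g. models/megatron-1T.json
    llm.add_argument("execution")  # e.g. examples/3072_t4_p64_d12_mbs4_full.json
    llm.add_argument("system")     # e.g. systems/a100_80g.json
    llm.add_argument("output")     # '-' writes to stdout
    return parser

args = build_parser().parse_args(
    ["llm", "models/megatron-1T.json",
     "examples/3072_t4_p64_d12_mbs4_full.json",
     "systems/a100_80g.json", "-"])
print(args.command)  # llm
```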
## LLM Example
Run a single calculation for LLM (~1 sec):

`$> PYTHONPATH=. ./bin/calculon llm models/megatron-1T.json examples/3072_t4_p64_d12_mbs4_full.json systems/a100_80g.json -`
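If the execution filename is read as tensor-parallel 4 × pipeline-parallel 64 × data-parallel 12 (our interpretation of the `t4_p64_d12` naming, not documented here), the parallelism degrees must multiply to the GPU count:

```python
# Assumed decoding of 3072_t4_p64_d12_mbs4_full.json:
# 3072 GPUs split as tensor 4 x pipeline 64 x data 12.
tensor, pipeline, data = 4, 64, 12
num_procs = tensor * pipeline * data
print(num_procs)  # 3072
```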
Run a system execution optimizer for LLM (~1 min):

`$> PYTHONPATH=. ./bin/calculon llm-optimal-execution models/turing-530B.json 5128 2520 float16 systems/a100_80g.json output.json -m`

`output.json` will contain the optimal way to run Turing-530B across 5128 A100 GPUs.
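One piece of what such an optimizer must explore is how to split the GPUs across parallelism dimensions. The sketch below enumerates every (tensor, pipeline, data) split of a GPU count — a simplification for illustration only; the real search space also covers micro-batch sizes and other execution choices:

```python
def parallelism_splits(num_procs):
    """Enumerate every (tensor, pipeline, data) triple whose product
    equals num_procs. Each triple is one candidate way to lay the
    model out across the GPUs."""
    splits = []
    for t in range(1, num_procs + 1):
        if num_procs % t:
            continue
        rest = num_procs // t
        for p in range(1, rest + 1):
            if rest % p:
                continue
            splits.append((t, p, rest // p))
    return splits

# 5128 = 2^3 * 641, so only a handful of splits exist.
print(len(parallelism_splits(5128)))  # 30
```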
To store results from all successful runs of the same experiment, run a special system optimizer (~1 min):

`$> PYTHONPATH=. ./bin/calculon llm-all-executions models/turing-530B.json 5128 2520 float16 systems/a100_80g.json all_output.csv`
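With all successful runs in one CSV, you can post-process it yourself, e.g. to pick the fastest configuration. A minimal sketch with the standard `csv` module — the `total_time` column name is an assumption; check the header of your `all_output.csv` for the actual field names:

```python
import csv

def best_row(path, metric="total_time"):
    # 'total_time' is a hypothetical column name used for
    # illustration; substitute the real metric column.
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    # Smaller is better for a time-like metric.
    return min(rows, key=lambda r: float(r[metric]))
```

Usage: `best_row("all_output.csv")` returns the row (as a dict) with the smallest value in the chosen metric column.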
## Testing and validation (optional)
To make sure that the current build is working, use:

`$> make test`
To validate Calculon's performance modeling against Megatron runs on NVIDIA's Selene A100-based supercomputer, with results published in the "Sequence parallelism" paper, use:

`$> PYTHONPATH=. ./bin/calculon llm-validation`