Awesome

MOE

Collecting Activations for Large Models

Run python main.py --model=xxx --sharding. The script loads the pretrained weights from Hugging Face (HF) into our customized model and saves them in a sharded format under ./result/[DATABASE]/[MODEL]/ShardedCkpt.
Run python main.py --model=xxx to run inference with HF Accelerate's load_checkpoint_and_dispatch and collect the activations for later use.
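The sharding step can be pictured as greedy packing of parameters into size-capped files, plus an index that maps each parameter name to the file holding it. The sketch below is illustrative only: it assumes a Hugging Face-style index layout (metadata.total_size and weight_map), and the function names plan_shards and build_index, the shard file naming scheme, and the byte sizes are hypothetical. The actual ShardedCkpt format written by main.py may differ.

```python
import json
from typing import Dict, List


def plan_shards(param_sizes: Dict[str, int], max_shard_bytes: int) -> List[List[str]]:
    """Greedily group parameter names into shards of at most max_shard_bytes.

    Hypothetical helper. Parameters keep their insertion order so a layer's
    weights stay contiguous; a single parameter larger than the budget
    simply gets a shard of its own.
    """
    shards: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in param_sizes.items():
        # Close the current shard if adding this parameter would exceed the budget.
        if current and current_size + size > max_shard_bytes:
            shards.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        shards.append(current)
    return shards


def build_index(param_sizes: Dict[str, int], shards: List[List[str]], prefix: str = "shard") -> dict:
    """Build an HF-style index mapping each parameter to its shard file.

    Hypothetical helper; the file naming scheme is an assumption.
    """
    total = len(shards)
    weight_map = {}
    for i, names in enumerate(shards, start=1):
        filename = f"{prefix}-{i:05d}-of-{total:05d}.bin"
        for name in names:
            weight_map[name] = filename
    return {"metadata": {"total_size": sum(param_sizes.values())}, "weight_map": weight_map}


# Example with made-up parameter sizes in bytes (hypothetical, not from a real model).
sizes = {"embed.weight": 400, "layer0.w": 300, "layer0.b": 50, "layer1.w": 300, "head.weight": 400}
shards = plan_shards(sizes, max_shard_bytes=700)
index = build_index(sizes, shards)
print(json.dumps(index, indent=2))
```

Packing in insertion order rather than by size keeps related tensors together, so loading one layer during inference usually touches only one or two shard files.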

TODO:

[ ] Add a disk offload function.
[ ] Process the sharded format when the model size exceeds main memory.