
<p align="center"> <img src="https://i.imgur.com/7eR7Pan.png" width="400"><br> Run large language models at home, BitTorrent-style.<br> Fine-tuning and inference <a href="https://github.com/bigscience-workshop/petals#benchmarks">up to 10x faster</a> than offloading <br><br> <a href="https://pypi.org/project/petals/"><img src="https://img.shields.io/pypi/v/petals.svg?color=green"></a> <a href="https://discord.gg/tfHfe8B34k"><img src="https://img.shields.io/discord/865254854262652969?label=discord&logo=discord&logoColor=white"></a> <br> </p>

Generate text with distributed Llama 3.1 (up to 405B), Mixtral (8x22B), Falcon (40B+) or BLOOM (176B) and fine-tune them for your own tasks, right from your desktop computer or Google Colab:

from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

# Choose any model available at https://health.petals.dev
model_name = "meta-llama/Meta-Llama-3.1-405B-Instruct"

# Connect to a distributed network hosting model layers
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

# Run the model as if it were on your computer
inputs = tokenizer("A cat sat", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))  # A cat sat on a mat...
<p align="center"> 🚀 &nbsp;<b><a href="https://colab.research.google.com/drive/1uCphNY7gfAUkdDrTx21dZZwCOUDCMPw8?usp=sharing">Try now in Colab</a></b> </p>
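The same distributed model can also be fine-tuned from your machine. Below is a minimal prompt-tuning sketch in the spirit of the Petals tutorials; the `tuning_mode="ptune"` and `pre_seq_len` arguments and the toy training step are illustrative assumptions and may differ between Petals versions:

import torch
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "meta-llama/Meta-Llama-3.1-405B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Trainable soft-prompt embeddings live (and train) locally;
# the servers hosting the transformer blocks stay frozen.
model = AutoDistributedModelForCausalLM.from_pretrained(
    model_name, tuning_mode="ptune", pre_seq_len=16
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# One toy training step on a single example
batch = tokenizer("A cat sat on a mat", return_tensors="pt")
loss = model(input_ids=batch["input_ids"], labels=batch["input_ids"]).loss
loss.backward()
optimizer.step()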

🦙 Want to run Llama? Request access to its weights, then run `huggingface-cli login` in the terminal before loading the model. Or just try it in our chatbot app.
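If you are working in Colab or another environment where running the terminal command is inconvenient, the programmatic login from `huggingface_hub` does the same thing (the token below is a placeholder):

# Programmatic alternative to `huggingface-cli login`
from huggingface_hub import login
login(token="hf_...")  # paste your Hugging Face access token here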

πŸ” Privacy. Your data will be processed with the help of other people in the public swarm. Learn more about privacy here. For sensitive data, you can set up a private swarm among people you trust.

💬 Any questions? Ping us in our Discord!

Connect your GPU and increase Petals capacity

Petals is a community-run system: we rely on people sharing their GPUs. You can help serve one of the available models or host a new model from 🤗 Model Hub!

As an example, here is how to host a part of Llama 3.1 (405B) Instruct on your GPU:

🦙 Want to host Llama? Request access to its weights, then run `huggingface-cli login` in the terminal before loading the model.

🐧 Linux + Anaconda. Run these commands for NVIDIA GPUs (or follow this for AMD):

# Install PyTorch with CUDA support and the latest Petals from GitHub
conda install pytorch pytorch-cuda=11.7 -c pytorch -c nvidia
pip install git+https://github.com/bigscience-workshop/petals

# Serve a portion of the model's blocks to the swarm
python -m petals.cli.run_server meta-llama/Meta-Llama-3.1-405B-Instruct

🪟 Windows + WSL. Follow this guide on our Wiki.

πŸ‹ Docker. Run our Docker image for NVIDIA GPUs (or follow this for AMD):

# Map the server port, give the container GPU access, and keep downloaded weights in the petals-cache volume
sudo docker run -p 31330:31330 --ipc host --gpus all --volume petals-cache:/cache --rm \
    learningathome/petals:main \
    python -m petals.cli.run_server --port 31330 meta-llama/Meta-Llama-3.1-405B-Instruct

🍏 macOS + Apple M1/M2 GPU. Install Homebrew, then run these commands:

brew install python
python3 -m pip install git+https://github.com/bigscience-workshop/petals
python3 -m petals.cli.run_server meta-llama/Meta-Llama-3.1-405B-Instruct
<p align="center"> 📚 &nbsp;<b><a href="https://github.com/bigscience-workshop/petals/wiki/FAQ:-Frequently-asked-questions#running-a-server">Learn more</a></b> (how to use multiple GPUs, start the server on boot, etc.) </p>

🔒 Security. Hosting a server does not allow others to run custom code on your computer. Learn more here.

💬 Any questions? Ping us in our Discord!

πŸ† Thank you! Once you load and host 10+ blocks, we can show your name or link on the swarm monitor as a way to say thanks. You can specify them with --public_name YOUR_NAME.

How does it work?

<p align="center"> <img src="https://i.imgur.com/RTYF3yW.png" width="800"> </p> <p align="center"> 📜 &nbsp;<b><a href="https://arxiv.org/pdf/2209.01188.pdf">Read paper</a></b> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 📚 &nbsp;<b><a href="https://github.com/bigscience-workshop/petals/wiki/FAQ:-Frequently-asked-questions">See FAQ</a></b> </p>

📚 Tutorials, examples, and more

Basic tutorials:

Useful tools:

Advanced guides:

Benchmarks

Please see Section 3.3 of our paper.

🛠️ Contributing

Please see our FAQ on contributing.

📜 Citations

Alexander Borzunov, Dmitry Baranchuk, Tim Dettmers, Max Ryabinin, Younes Belkada, Artem Chumachenko, Pavel Samygin, and Colin Raffel. Petals: Collaborative Inference and Fine-tuning of Large Models. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations). 2023.

@inproceedings{borzunov2023petals,
  title = {Petals: Collaborative Inference and Fine-tuning of Large Models},
  author = {Borzunov, Alexander and Baranchuk, Dmitry and Dettmers, Tim and Riabinin, Maksim and Belkada, Younes and Chumachenko, Artem and Samygin, Pavel and Raffel, Colin},
  booktitle = {Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)},
  pages = {558--568},
  year = {2023},
  url = {https://arxiv.org/abs/2209.01188}
}

Alexander Borzunov, Max Ryabinin, Artem Chumachenko, Dmitry Baranchuk, Tim Dettmers, Younes Belkada, Pavel Samygin, and Colin Raffel. Distributed inference and fine-tuning of large language models over the Internet. Advances in Neural Information Processing Systems 36 (2023).

@inproceedings{borzunov2023distributed,
  title = {Distributed inference and fine-tuning of large language models over the {I}nternet},
  author = {Borzunov, Alexander and Ryabinin, Max and Chumachenko, Artem and Baranchuk, Dmitry and Dettmers, Tim and Belkada, Younes and Samygin, Pavel and Raffel, Colin},
  booktitle = {Advances in Neural Information Processing Systems},
  volume = {36},
  pages = {12312--12331},
  year = {2023},
  url = {https://arxiv.org/abs/2312.08361}
}

<p align="center"> This project is a part of the <a href="https://bigscience.huggingface.co/">BigScience</a> research workshop. </p> <p align="center"> <img src="https://petals.dev/bigscience.png" width="150"> </p>