
llama3.np

llama3.np is a pure NumPy implementation of the Llama 3 model. To check that the implementation is accurate, I ran it with the stories15M model trained by Andrej Karpathy.

For a detailed explanation in English, see Llama 3 implemented in pure NumPy.
If you're interested in a CUDA implementation, see Llama 3 implemented in pure C/CUDA.

Usage

$ python llama3.py "I have a dream"
"""
I have a dream. He dream of a big, beautiful garden full of flower and tree. He dream of playing with hi friend and eating yummy snack.
One day, he wa walking in the garden when he saw

Token count: 50, elapsed: 1.53s, 33 tokens/s
"""
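
The throughput figure on the last line of the output follows from the token count and the elapsed time. Here is a minimal sketch of that arithmetic, assuming the rate is rounded to the nearest whole number. The helper name `tokens_per_second` is hypothetical and is not taken from llama3.py.

```python
def tokens_per_second(token_count: int, elapsed_s: float) -> int:
    """Return generation throughput, rounded to the nearest whole token per second.

    Hypothetical helper for illustration; not part of llama3.py.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return round(token_count / elapsed_s)


# Values from the sample run above: 50 tokens in 1.53 seconds.
# 50 / 1.53 is about 32.68, which rounds to 33.
print(tokens_per_second(50, 1.53))
```

Expect the figure to vary between runs and machines, since elapsed time depends on the hardware and on the NumPy build in use.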

Citing llama3.np

If you use or discuss llama3.np in your academic research, please cite the project to help spread awareness:

@misc{llama3.np,
  title = {llama3.np: pure NumPy implementation for Llama 3 model},
  author = {Sang Park},
  howpublished = {\url{https://github.com/likejazz/llama3.np}},
  note = {llama3.np, MIT License},
  year = {2024},
}

References

Thank you to the creators of the following libraries and tools and their contributors:

llama2.c - @karpathy
llama.np - @hscspring
modeling_llama.py - Hugging Face's Transformers

I also learned a lot from the following articles:

42dot LLM 1.3B - 42dot
Exploring and building the LLaMA 3 Architecture : A Deep Dive into Components, Coding, and Inference Techniques - @vi.ai_
Rotary Embeddings: A Relative Revolution - EleutherAI
Mastering LLM Techniques: Inference Optimization - NVIDIA

The title image was generated by DALL-E.

License

MIT