.. -*- mode: rst -*-

PixelBytes+

PixelBytes+ is a Python project that generates and processes multimodal sequences (pixels/video, audio, action-states, and text) in a unified representation.

Installation

Requires Python 3.8+. Install from the GitHub repository with pip:

.. code-block:: bash

    pip install git+https://github.com/fabienfrfr/PixelBytes.git@main

Overview

PixelBytes+ builds on theoretical foundations including Image Transformers, PixelRNN/PixelCNN, Bi-Mamba+, and MambaByte to create a unified representation for coherent multimodal generation and processing. It handles:

- Pixel/video sequences
- Audio data
- Action-state control
- Text

The model seamlessly manages transitions between modalities and maintains dimensional consistency.
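
As a rough illustration of what a unified representation can look like, the sketch below places several modalities in one shared token vocabulary by giving each modality its own contiguous id range. This is not the project's tokenizer: the range sizes and the names ``OFFSETS``, ``encode``, and ``decode`` are hypothetical and chosen only for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical vocabulary layout (NOT taken from PixelBytes+):
# each modality owns a contiguous range of token ids.
MODALITY_SIZES = [
    ("text", 256),    # raw bytes
    ("pixel", 64),    # assumed palette size
    ("audio", 32),    # assumed number of quantized audio levels
    ("action", 16),   # assumed number of discrete actions
]

OFFSETS: Dict[str, Tuple[int, int]] = {}
_start = 0
for _name, _size in MODALITY_SIZES:
    OFFSETS[_name] = (_start, _size)
    _start += _size
VOCAB_SIZE = _start  # total number of token ids across all modalities


def encode(modality: str, values: List[int]) -> List[int]:
    """Map modality-local values to ids in the shared vocabulary."""
    start, size = OFFSETS[modality]
    for v in values:
        if not 0 <= v < size:
            raise ValueError(f"{v} is out of range for modality {modality!r}")
    return [start + v for v in values]


def decode(token: int) -> Tuple[str, int]:
    """Recover the modality and the local value from a shared token id."""
    for name, (start, size) in OFFSETS.items():
        if start <= token < start + size:
            return name, token - start
    raise ValueError(f"unknown token id {token}")


# One sequence that moves from text to pixels to an action.
sequence = encode("text", list(b"hi")) + encode("pixel", [3, 7]) + encode("action", [1])
```

Because every id maps back to exactly one modality, a single sequence model can consume the mixed stream, and ``decode`` indicates where one modality ends and the next begins.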

Usage

Basic usage:

.. code-block:: python

    from torch.utils.data import DataLoader

    # The PixelBytes+ classes and the upper-case constants are assumed to be imported or defined beforehand.
    tokenizer = ActionPixelBytesTokenizer(data_slicing=DATA_REDUCTION)
    config = ModelConfig(vocab_size=VOCAB_SIZE, embed_size=EMBED_SIZE, hidden_size=HIDDEN_SIZE,
                         num_layers=NUM_LAYERS, pxby_dim=PXBY_DIM, auto_regressive=AR, model_type=MODEL_TYPE)
    model = aPxBySequenceModel(config).to(DEVICE)
    dataset = TokenPxByDataset(ds, tokenizer, SEQ_LENGTH, STRIDE)
    train_dataloader = DataLoader(dataset, batch_size=BATCH_SIZE, collate_fn=collate_fn, shuffle=True)
    # val_dataloader, optimizer, criterion, and scaler must also be created before training.
    model.train_model(train_dataloader, val_dataloader, optimizer, criterion, DEVICE, scaler, EPOCHS, ACCUMULATION_STEPS)
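
The ``SEQ_LENGTH`` and ``STRIDE`` arguments suggest that the dataset cuts a long token stream into fixed-length, possibly overlapping windows. The sketch below shows that idea in plain Python. It assumes this interpretation and is not the implementation of ``TokenPxByDataset``; the function name ``sliding_windows`` is hypothetical.

```python
from typing import List, Sequence


def sliding_windows(tokens: Sequence[int], seq_length: int, stride: int) -> List[List[int]]:
    """Return every full window of `seq_length` tokens, moving `stride` tokens each step."""
    if seq_length <= 0 or stride <= 0:
        raise ValueError("seq_length and stride must be positive")
    last_start = len(tokens) - seq_length  # last index at which a full window still fits
    return [list(tokens[i:i + seq_length]) for i in range(0, last_start + 1, stride)]


# Ten tokens, windows of four, moving three tokens at a time:
# consecutive windows overlap by one token.
windows = sliding_windows(list(range(10)), seq_length=4, stride=3)
```

A stride smaller than the sequence length yields overlapping windows and therefore more training samples from the same data; a stride equal to the sequence length yields disjoint windows.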

For detailed documentation, see the `docs folder <docs/>`_.

Dataset

Use the PixelBytes-Pokemon dataset from Hugging Face: `ffurfaro/PixelBytes-Pokemon <https://huggingface.co/datasets/ffurfaro/PixelBytes-Pokemon>`_
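
One way to fetch this dataset is the Hugging Face ``datasets`` library, sketched below. The wrapper name ``load_pixelbytes_dataset`` is hypothetical, and the sketch assumes ``datasets`` is installed separately (``pip install datasets``). The available splits and column names are not stated here, so inspect the returned object before relying on them.

```python
DATASET_ID = "ffurfaro/PixelBytes-Pokemon"  # dataset id from the link above


def load_pixelbytes_dataset(dataset_id: str = DATASET_ID):
    """Download (or load from the local cache) the dataset from the Hugging Face Hub."""
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(dataset_id)
```

Calling ``load_pixelbytes_dataset()`` needs network access on first use; printing the result lists the splits and features that are actually available.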

Cloud Deployment

Build and push the Docker image:

.. code-block:: bash

    docker build -t $USER/img_name .
    docker push $USER/img_name

    docker-compose up --build

Deploy to your preferred cloud provider (OVH, Azure, AWS, Google Cloud).

Contributing

Contributions welcome. Fork, create a feature branch, and submit a pull request.

License

MIT License

Contact

fabien.furfaro_at_gmail.com

Citation

.. code-block:: bibtex

    @article{furfaro:hal-04683349,
      TITLE = {{PixelBytes: Catching Unified Representation for Multimodal Generation}},
      AUTHOR = {Furfaro, Fabien},
      URL = {https://hal.science/hal-04683349},
      NOTE = {working paper or preprint},
      YEAR = {2024},
      KEYWORDS = {Embedding ; Multimodal representation learning ; Sequence generation},
      HAL_ID = {hal-04683349},
    }

    @misc{furfaro2024pixelbytes_project,
      author = {Furfaro, Fabien},
      title = {PixelBytes: A Unified Multimodal Representation Learning Project},
      year = {2024},
      howpublished = {
        GitHub: \url{https://github.com/fabienfrfr/PixelBytes},
        Models: \url{https://huggingface.co/ffurfaro/PixelBytes-Pokemon} and \url{https://huggingface.co/ffurfaro/aPixelBytes-Pokemon},
        Datasets: \url{https://huggingface.co/datasets/ffurfaro/PixelBytes-Pokemon} and \url{https://huggingface.co/datasets/ffurfaro/PixelBytes-PokemonAll}
      },
      note = {GitHub repository, Hugging Face Model Hub, and Datasets Hub}
    }