.. -*- mode: rst -*-
PixelBytes+ is a Python project that generates and processes multimodal sequences (pixels/video, audio, action-states, and text) in a unified representation.
Requires Python 3.8+. Install with pip from the Git repository:

.. code-block:: bash

    pip install git+
PixelBytes+ builds on theoretical foundations including Image Transformers, PixelRNN/PixelCNN, Bi-Mamba+, and MambaByte to create a unified representation for coherent multimodal generation and processing. It handles:

- Pixel/video sequences
- Audio data
- Action-state control
- Text

The model handles transitions between modalities and keeps dimensions consistent across them.
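
As a rough illustration of what a unified representation can look like, the sketch below maps each modality to its own disjoint range of one shared vocabulary, so that a mixed sequence becomes a flat list of integer ids. This is a toy under stated assumptions, not the PixelBytes+ tokenizer: the alphabet sizes in ``MODALITY_SIZES`` and the ``encode``/``decode`` helpers are hypothetical.

.. code-block:: python

    from itertools import accumulate

    # Hypothetical per-modality alphabet sizes (assumptions, not PixelBytes+ values).
    MODALITY_SIZES = {"pixel": 256, "audio": 256, "action": 16, "text": 256}

    # Each modality gets a disjoint id range inside one shared vocabulary.
    _starts = [0, *accumulate(MODALITY_SIZES.values())]
    OFFSETS = dict(zip(MODALITY_SIZES, _starts))
    SHARED_VOCAB_SIZE = _starts[-1]


    def encode(items):
        """Map (modality, local_value) pairs to ids in the shared vocabulary."""
        ids = []
        for modality, value in items:
            size = MODALITY_SIZES[modality]
            if not 0 <= value < size:
                raise ValueError(f"{value} out of range for {modality}")
            ids.append(OFFSETS[modality] + value)
        return ids


    def decode(ids):
        """Invert encode(): recover (modality, local_value) from shared ids."""
        items = []
        for token in ids:
            for modality, start in OFFSETS.items():
                if start <= token < start + MODALITY_SIZES[modality]:
                    items.append((modality, token - start))
                    break
            else:
                raise ValueError(f"{token} is outside the shared vocabulary")
        return items


    # A mixed-modality sequence round-trips through the shared id space.
    mixed = [("text", ord("A")), ("pixel", 200), ("action", 3), ("audio", 17)]
    ids = encode(mixed)
    assert decode(ids) == mixed

Because every id belongs to exactly one range, a single embedding table of size ``SHARED_VOCAB_SIZE`` can serve all modalities in this toy setup.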
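Basic usage:

.. code-block:: python

    from torch.utils.data import DataLoader
    # ActionPixelBytesTokenizer, ModelConfig, aPxBySequenceModel, TokenPxByDataset and collate_fn
    # are assumed to come from the PixelBytes+ package; upper-case names are user-defined constants.

    tokenizer = ActionPixelBytesTokenizer(data_slicing=DATA_REDUCTION)
    config = ModelConfig(vocab_size=VOCAB_SIZE, embed_size=EMBED_SIZE, hidden_size=HIDDEN_SIZE,
                         num_layers=NUM_LAYERS, pxby_dim=PXBY_DIM, auto_regressive=AR, model_type=MODEL_TYPE)
    model = aPxBySequenceModel(config).to(DEVICE)

    # Training split; build val_dataloader the same way from a validation split.
    dataset = TokenPxByDataset(ds, tokenizer, SEQ_LENGTH, STRIDE)
    train_dataloader = DataLoader(dataset, batch_size=BATCH_SIZE, collate_fn=collate_fn, shuffle=True)

    # optimizer, criterion and scaler must be created beforehand (not shown here).
    model.train_model(train_dataloader, val_dataloader, optimizer, criterion, DEVICE, scaler, EPOCHS, ACCUMULATION_STEPS)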
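
The ``SEQ_LENGTH`` and ``STRIDE`` arguments suggest that training samples are windows cut from longer token sequences. The sketch below shows one common way to do this, assuming next-token prediction. ``sliding_windows`` is a hypothetical helper, and ``TokenPxByDataset`` may slice its data differently.

.. code-block:: python

    def sliding_windows(tokens, seq_length, stride):
        """Return (input, target) pairs; each target is its input shifted by one token."""
        if seq_length < 1 or stride < 1:
            raise ValueError("seq_length and stride must be positive")
        pairs = []
        # Stop early enough that every window still has a full shifted target.
        for start in range(0, len(tokens) - seq_length, stride):
            window = tokens[start:start + seq_length]
            target = tokens[start + 1:start + seq_length + 1]
            pairs.append((window, target))
        return pairs


    tokens = list(range(10))
    pairs = sliding_windows(tokens, seq_length=4, stride=3)
    for window, target in pairs:
        print(window, "->", target)

A stride smaller than the sequence length yields overlapping windows (more samples, more redundancy), while a stride equal to it yields disjoint ones.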
For detailed documentation, see the `docs folder <docs/>`_.
Use the PixelBytes-Pokemon dataset from Hugging Face: ``ffurfaro/PixelBytes-Pokemon``.
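
One way to fetch the dataset is the Hugging Face ``datasets`` library (``pip install datasets``). The sketch below assumes the dataset loads with its default configuration. ``load_pixelbytes_pokemon`` is a hypothetical wrapper name, and the split and column names are not documented here, so inspect the returned object before relying on them.

.. code-block:: python

    DATASET_ID = "ffurfaro/PixelBytes-Pokemon"


    def load_pixelbytes_pokemon(split=None):
        """Download (or reuse the cached copy of) the dataset from the Hugging Face Hub."""
        # Imported lazily so this module can be imported without `datasets` installed.
        from datasets import load_dataset

        # With split=None this returns a DatasetDict keyed by split name.
        return load_dataset(DATASET_ID, split=split)


    # Usage (needs network access):
    #   ds = load_pixelbytes_pokemon()
    #   print(ds)  # lists the available splits and columns
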
Cloud Deployment
----------------

Build and push the Docker image, then build and start the services with Docker Compose:

.. code-block:: bash

    docker build -t $USER/img_name .
    docker push $USER/img_name
    docker-compose up --build

Deploy to your preferred cloud provider (OVH, Azure, AWS, Google Cloud).
Contributions welcome. Fork, create a feature branch, and submit a pull request.
MIT License
.. code-block:: bibtex

    @article{furfaro:hal-04683349,
      TITLE = {{PixelBytes: Catching Unified Representation for Multimodal Generation}},
      AUTHOR = {Furfaro, Fabien},
      URL = {},
      NOTE = {working paper or preprint},
      YEAR = {2024},
      KEYWORDS = {Embedding ; Multimodal representation learning ; Sequence generation},
      HAL_ID = {hal-04683349},
    }

    @misc{furfaro2024pixelbytes_project,
      author = {Furfaro, Fabien},
      title = {PixelBytes: A Unified Multimodal Representation Learning Project},
      year = {2024},
      howpublished = {GitHub: \url{}, Models: \url{} and \url{}, Datasets: \url{} and \url{}},
      note = {GitHub repository, Hugging Face Model Hub, and Datasets Hub}
    }