<picture> <source media="(prefers-color-scheme: dark)" srcset="https://github.com/bentoml/BentoML/assets/489344/d3e6c95d-d224-49a5-9cff-0789f094e127"> <source media="(prefers-color-scheme: light)" srcset="https://github.com/bentoml/BentoML/assets/489344/de4da660-6aeb-4e5a-bf76-b7177435444d"> <img alt="BentoML: Unified Model Serving Framework" src="https://github.com/bentoml/BentoML/assets/489344/de4da660-6aeb-4e5a-bf76-b7177435444d" width="370" style="max-width: 100%;"> </picture>

Unified Model Serving Framework

🍱 Build model inference APIs and multi-model serving systems with any open-source or custom AI models. 👉 Join our Slack community!

What is BentoML?

BentoML is a Python library for building online serving systems optimized for AI apps and model inference.

Getting started

Install BentoML:

# Requires Python≥3.8
pip install -U bentoml

Define APIs in a service.py file.

from __future__ import annotations

import bentoml

@bentoml.service(
    resources={"cpu": "4"}
)
class Summarization:
    def __init__(self) -> None:
        import torch
        from transformers import pipeline

        device = "cuda" if torch.cuda.is_available() else "cpu"
        self.pipeline = pipeline('summarization', device=device)

    @bentoml.api(batchable=True)
    def summarize(self, texts: list[str]) -> list[str]:
        results = self.pipeline(texts)
        return [item['summary_text'] for item in results]
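The `summarize` API above takes a list of inputs and returns a list of outputs, which is the shape a batchable endpoint needs. The following is a conceptual sketch only, not BentoML's implementation: it assumes a hypothetical helper `run_merged` that merges several request payloads, calls a list-in, list-out function once, and splits the outputs back per request. `summarize_stub` is a stand-in for the real model call.

```python
def summarize_stub(texts):
    """Hypothetical stand-in for a batchable API: one output per input, same order."""
    return [text[:20] for text in texts]


def run_merged(batch_fn, requests):
    """Merge several request payloads, call batch_fn once, split the results."""
    # Flatten all requests into one list so the function runs a single time.
    merged = [item for request in requests for item in request]
    outputs = batch_fn(merged)
    if len(outputs) != len(merged):
        raise ValueError("a batchable function must return one output per input")
    # Hand each caller back the slice that corresponds to its own inputs.
    results, start = [], 0
    for request in requests:
        end = start + len(request)
        results.append(outputs[start:end])
        start = end
    return results
```

The sketch shows why the function must preserve both the length and the order of its input: otherwise the outputs cannot be mapped back to the callers that sent them.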

Run the service code locally (serving at http://localhost:3000 by default):

pip install torch transformers  # additional dependencies for local run

bentoml serve service.py:Summarization

Now you can run inference from your browser at http://localhost:3000 or with a Python script:

import bentoml

with bentoml.SyncHTTPClient('http://localhost:3000') as client:
    summarized_text: str = client.summarize([bentoml.__doc__])[0]
    print(f"Result: {summarized_text}")
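If you prefer not to depend on the BentoML client, the same call can be sketched with the Python standard library. This is a minimal example under two assumptions: that the API is exposed as `POST /summarize`, and that the JSON body is an object keyed by the parameter name (`texts`). The helper names `build_summarize_request` and `summarize_over_http` are illustrative and not part of BentoML.

```python
import json
import urllib.request


def build_summarize_request(texts, base_url="http://localhost:3000"):
    """Build a POST request for the summarize endpoint without sending it."""
    # Assumption: the JSON body is an object keyed by the API's parameter name.
    body = json.dumps({"texts": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/summarize",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_over_http(texts, base_url="http://localhost:3000", timeout=60.0):
    """Send the request and decode the JSON response (expected: a list of strings)."""
    request = build_summarize_request(texts, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running locally, `summarize_over_http(["some long text"])` should return a list containing one summary.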

Deploying your first Bento

To deploy your BentoML Service code, first create a bentofile.yaml file that defines its dependencies and environment. The full list of bentofile options is in the BentoML documentation.

service: "service:Summarization" # Entry service import path
include:
  - "*.py" # Include all .py files in current directory
python:
  packages: # Python dependencies to include
  - torch
  - transformers
docker:
  python_version: "3.11" # Quoted so YAML reads it as a string, not a float

Then, choose one of the following ways for deployment:

<details> <summary>🐳 Docker Container</summary>

Run bentoml build to package the necessary code, models, and dependency configs into a Bento, the standardized deployable artifact in BentoML:

bentoml build

Ensure Docker is running. Generate a Docker container image for deployment:

bentoml containerize summarization:latest

Run the generated image:

docker run --rm -p 3000:3000 summarization:latest
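The container can take a while to load the model before it accepts requests. A minimal sketch of a readiness check, assuming the server answers `GET /readyz` with HTTP 200 once it is ready; the helper names `http_probe` and `wait_until_ready` are illustrative and not part of BentoML:

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=2.0):
    """Return True if a GET on url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_ready(url="http://localhost:3000/readyz", attempts=30, delay=1.0,
                     probe=http_probe, sleep=time.sleep):
    """Poll url until the probe succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        # Skip the pause after the final failed attempt.
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_until_ready()` after `docker run` returns `True` once the endpoint responds, or `False` after roughly 30 seconds with the defaults. The `probe` and `sleep` parameters exist only to make the function easy to test without a running server.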
</details> <details> <summary>☁️ BentoCloud</summary>

BentoCloud provides compute infrastructure for rapid and reliable GenAI adoption. It speeds up BentoML development by using cloud compute resources, and it simplifies how you deploy, scale, and operate BentoML in production.

Sign up for BentoCloud for personal access; for enterprise use cases, contact our team.

# After signup, run the following command to create an API token:
bentoml cloud login

# Deploy from current directory:
bentoml deploy .

</details>

For detailed explanations, read the Quickstart guide.

Use cases

Check out the examples folder for more sample code and usage.

Advanced topics

See Documentation for more tutorials and guides.

Community

Get involved and join our Community Slack 💬, where thousands of AI/ML engineers help each other, contribute to the project, and talk about building AI products.

To report a bug or request a feature, use GitHub Issues.

Contributing

There are many ways to contribute to the project.

Thanks to all of our amazing contributors!

<a href="https://github.com/bentoml/BentoML/graphs/contributors"> <img src="https://contrib.rocks/image?repo=bentoml/BentoML" /> </a>

Usage tracking and feedback

The BentoML framework collects anonymous usage data that helps our community improve the product. Only BentoML's internal API calls are reported. No sensitive information is included, such as user code, model data, model names, or stack traces. The usage tracking code is available in the BentoML repository. You can opt out of usage tracking with the --do-not-track CLI option:

bentoml [command] --do-not-track

Or by setting the environment variable:

export BENTOML_DO_NOT_TRACK=True
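When BentoML is used from a Python script or notebook rather than a shell, the same variable can be set in-process. This is a small sketch, assuming the variable is read when BentoML is imported or first used, so it is set before the import to be safe. `disable_usage_tracking` is a hypothetical helper, not a BentoML function.

```python
import os


def disable_usage_tracking(env=os.environ):
    """Set the opt-out variable described above; call before importing bentoml."""
    env["BENTOML_DO_NOT_TRACK"] = "True"
    return env


disable_usage_tracking()
# import bentoml  # import only after the variable has been set
```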

License

Apache License 2.0