<img src="flagopen.png">

About

FlagScale is a comprehensive toolkit designed to support the entire lifecycle of large models, developed with the backing of the Beijing Academy of Artificial Intelligence (BAAI). It builds on the strengths of several prominent open-source projects, including Megatron-LM and vLLM, to provide a robust, end-to-end solution for managing and scaling large models.

The primary objective of FlagScale is to let large models scale across diverse hardware architectures while using computational resources efficiently and improving model performance. To that end, it provides the essential components for model development, training, and deployment, with the goal of improving both the speed and the effectiveness of large model workflows.

FlagScale is also a part of FlagAI-Open, an open-source initiative by BAAI that aims to foster an open-source ecosystem for AI technologies. It serves as a platform where developers, researchers, and AI enthusiasts can collaborate on various AI projects, contribute to the development of cutting-edge AI solutions, and share their work with the global community.

Quick Start

FlagScale leverages Hydra for configuration management. The configurations are organized into two levels: an outer experiment-level YAML file and an inner task-level YAML file.

Every valid option in the task-level YAML file corresponds to an argument of the backend engine in use, such as Megatron-LM or vLLM, with hyphens (-) replaced by underscores (_). For a complete list of available options, refer to the backend engine's documentation. To get started, copy and modify the existing YAML files in the examples folder.
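The naming rule above can be illustrated with a short Python sketch. The helper `cli_flag_to_yaml_key` is hypothetical and is not part of FlagScale; it only shows how a backend command-line flag would map to a task-level YAML key under the hyphen-to-underscore convention, using a few Megatron-LM flags as sample input.

```python
def cli_flag_to_yaml_key(flag: str) -> str:
    """Map a backend CLI flag to the key name used in the task-level YAML.

    Hypothetical helper for illustration only; FlagScale does not ship it.
    """
    # Drop the leading dashes that mark a command-line option.
    name = flag.lstrip("-")
    # Apply the documented rule: hyphens become underscores.
    return name.replace("-", "_")


# Sample Megatron-LM style flags, used here purely as input data.
flags = ["--tensor-model-parallel-size", "--micro-batch-size", "--seq-length"]
for flag in flags:
    print(f"{flag} -> {cli_flag_to_yaml_key(flag)}")
```

Going the other way (from a YAML key back to a flag) is not always unambiguous if a backend argument already contains underscores, so the backend documentation remains the reference for valid names.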

Setup

We recommend using the latest release of NGC's PyTorch container for setup.

  1. Clone the repository:

    git clone https://github.com/FlagOpen/FlagScale.git
    
  2. Install the dependencies:

    cd FlagScale
    pip install -r requirements/requirements-dev.txt
    

    To install only the packages required by a specific backend engine, modify the requirements file accordingly.

  3. Install the packages with customized extensions:

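    # Build and install the customized vLLM from its subdirectory.
    cd vllm
    pip install .
    cd ..

    # From the repository root, install megatron-energon and copy it into the Megatron package.
    pip install -e ./megatron-energon
    cp -r megatron-energon/src/megatron/energon megatron/megatron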
    

Run a Task

FlagScale provides a unified runner for various tasks, including training, inference, and serving. Specify the configuration file to run a task with a single command; the runner automatically loads the configuration and executes the task. The following examples demonstrate how to run a distributed training task and how to start a server.

Train

  1. Start the distributed training job:

    python run.py --config-path ./examples/aquila/conf --config-name config action=run
    

    The data_path in the demo is the path to the training dataset in Megatron-LM format. To run the pretraining process quickly, we also provide a small preprocessed dataset (bin and idx files) from the Pile dataset.

  2. Stop the distributed training job:

    python run.py --config-path ./examples/aquila/conf --config-name config action=stop
    

Serve

  1. Start the server:

    python run.py --config-path ./examples/qwen/conf --config-name config_qwen2.5_7b action=run
    
  2. Stop the server:

    python run.py --config-path ./examples/qwen/conf --config-name config_qwen2.5_7b action=stop
    
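The training and serving commands above share one shape: a config directory, a config name, and an action. The following Python sketch assembles that command line so it can be scripted. The function `build_runner_command` is a hypothetical wrapper, not part of FlagScale, and it assumes only the two actions shown in this README (`run` and `stop`).

```python
import shlex
from typing import List

# Actions that appear in this README; other values are not assumed here.
KNOWN_ACTIONS = ("run", "stop")


def build_runner_command(config_path: str, config_name: str, action: str) -> List[str]:
    """Build the argument list for the unified runner (hypothetical helper)."""
    if action not in KNOWN_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return [
        "python", "run.py",
        "--config-path", config_path,
        "--config-name", config_name,
        f"action={action}",  # Hydra-style key=value override
    ]


# Print the start and stop commands for the Aquila training example.
for action in KNOWN_ACTIONS:
    cmd = build_runner_command("./examples/aquila/conf", "config", action)
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch or stop the job.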

For more details, please refer to the Quick Start guide.

License

This project is licensed under the Apache License (Version 2.0). This project also contains other third-party components under other open-source licenses. See the LICENSE file for more information.