
<div align="center"> <img src="figures/ultra_logo.png" width="250px">

Large-scale, Informative, and Diverse Multi-round Dialogue Data, and Models

<p align="center"> <a href="#UltraLM"> UltraLM</a> • <a href="http://39.101.77.220/">Data Explorer</a> • <a href="https://atlas.nomic.ai/map/0ce65783-c3a9-40b5-895d-384933f50081/a7b46301-022f-45d8-bbf4-98107eabdbac">Nomic AI Atlas Explorer</a> • <a href="#data">Data Release</a> • <a href="#construction-of-ultrachat">Construction Process</a> • <a href="https://arxiv.org/abs/2305.14233">Paper</a> </p> </div>

News

UltraLM

UltraLM is a series of chat language models trained on UltraChat. Currently, we have released the 13B version, which ranks #1 among open-source models and #4 among all models on the AlpacaEval Leaderboard (as of June 28, 2023). UltraLM-13B is built on LLaMA-13B and trained with the BMTrain framework.

Download

| Model | Link | Version |
| ----- | ---- | ------- |
| UltraLM-13B | Huggingface Repo | v1.0 |
| UltraLM-65B | Huggingface Repo | v1.0 |
| UltraLM-13B | Huggingface Repo | v2.0 |
| UltraRM-13B | Huggingface Repo | v1.0 |
| UltraCM-13B | Huggingface Repo | v1.0 |

Use UltraLM

Note: Different hyper-parameters or system prompts will affect the outputs. Refer to /UltraLM/inference_cli.py for the details of our default setting.
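For a quick start, below is a minimal inference sketch using Hugging Face `transformers`. The repository id and the `User:`/`Assistant:` prompt template are assumptions for illustration; consult /UltraLM/inference_cli.py for the exact defaults.

```python
# Minimal inference sketch with Hugging Face transformers.
# The model id and the prompt template are assumptions for illustration;
# see /UltraLM/inference_cli.py for the authors' default setting.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openbmb/UltraLM-13b"  # hypothetical id; substitute the released checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "User: How can cross training benefit runners?\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Strip the prompt tokens and decode only the generated continuation.
response = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(response)
```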

<details><summary> <b> Performance </b> </summary> <p>

We report three evaluations in this section: AlpacaEval from Stanford, Evol-Instruct from Microsoft's WizardLM, and our curated evaluation set. Evaluations of modern LLMs may be biased and affected by many factors; we are also actively working on more comprehensive evaluation methods.

Alpaca-Eval

AlpacaEval is a leaderboard specifically designed for evaluating LLMs. It ranks models by win-rate against Text-Davinci-003, judged automatically by GPT-4.

<img src="figures/alpaca.png" width="550px">

Evol-Instruct

This dataset is constructed with an evolutionary strategy that rewrites instructions over multiple rounds to obtain instructions at different complexity levels. The benchmark is developed by the WizardLM project, another excellent chat language model!

Results

Our Evaluation Set

We curate an evaluation set encompassing the Vicuna Benchmark and an additional 300 questions and instructions generated by GPT-4. The questions/instructions cover a wide range of topics, including commonsense, world knowledge, professional knowledge (specifically physics and biology), mathematics, and writing tasks at different levels of difficulty. We use GPT-4 for evaluation. Here is the dataset.

Results

</p> </details> <details><summary> <b> Examples of UltraLM </b> </summary> <p> </p> </details>

Overview of UltraChat

This project aims to construct open-source, large-scale, and multi-round dialogue data, powered by Turbo APIs, to facilitate the construction of powerful language models with general conversational capability. To safeguard privacy, among other considerations, we do not directly use any data available on the Internet as prompts.

<details><summary> <b> UltraChat is composed of three sectors </b> </summary> <p>

- Questions about the World
- Writing and Creation
- Assistance on Existent Materials

See Construction of UltraChat below for how each sector is built.

</p> </details>

Disclaimer: Although the process of building UltraChat does NOT involve any publicly available benchmark data, scaling to a certain extent may still result in some overlap with existing evaluation benchmarks. We would like to emphasize again that all the data is automatically generated (including the instructions and responses), and we do not insert any open benchmark data. For example, UltraChat was released (April 2023) earlier than AlpacaEval (May 2023). We encourage users to closely monitor such phenomena, while we are also actively considering how to evaluate LLMs more properly.

<details><summary> <b>An Example of UltraChat </b> </summary> <p> <div align="center"> <img src="https://i.328888.xyz/2023/04/02/iHh8DC.png" width="900px"> </div> </p> </details>

Data

The dataset is intended solely for research and educational purposes and should not be construed as reflecting the opinions or views of its creators, owners, or contributors. It is distributed under the MIT license.

Data Release

Explore the data before downloading, or use the Nomic AI Atlas explorer.

Direct Download links:

Data Format

Each line in the downloaded data file is a JSON dict containing the data ID and the dialogue in a list format, where user and assistant turns alternate, starting with the user. Below is an example line.

{
  "id": "0", 
  "data": [
    "How can cross training benefit groups like runners, swimmers, or weightlifters?", 
    "Cross training can benefit groups like runners, swimmers, or weightlifters in the following ways: ...", 
    "That makes sense. I've been wanting to improve my running time, but I never thought about incorporating strength training. Do you have any recommendations for specific exercises?", 
    "Sure, here are some strength training exercises that can benefit runners: ...", 
    "Hmm, I'm not really a fan of weightlifting though. Can I incorporate other forms of exercise into my routine to improve my running time?", 
    "Yes, absolutely! ...",
    "..."
    ]
}
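The file can be streamed line by line with the standard library; the filename below is illustrative.

```python
# Sketch of reading one released data file (the filename is illustrative).
import json

with open("ultrachat_part.jsonl", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        # Turns alternate: even indices are the user, odd indices the assistant.
        for i, utterance in enumerate(record["data"]):
            speaker = "User" if i % 2 == 0 else "Assistant"
            print(f"[{record['id']}] {speaker}: {utterance}")
```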

Training

We provide training code to fine-tune LLaMA (however, we are not distributing the weights of LLaMA) on UltraChat in .src/; the training is accelerated by BMTrain.

We also provide a training script to fine-tune GPT-J on UltraChat in .src/train_legacy/, which is implemented with OpenPrompt.

Construction of UltraChat

The general idea of UltraChat is to use separate LLMs to generate opening lines, simulate users, and respond to queries, as sketched below. Each sector of UltraChat has its own challenges and requires particular strategy designs. We will specify the construction process once a sector of UltraChat is released.
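As a concrete illustration, the following sketch alternates between an assistant model and a user-simulator model through an OpenAI-style chat API. The prompts, model name, and turn budget are illustrative assumptions, not the exact pipeline.

```python
# Sketch of the two-LLM loop: one model answers, another simulates the user.
# Prompts, model name, and round count are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

def chat(system_prompt, messages):
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "system", "content": system_prompt}] + messages,
    )
    return resp.choices[0].message.content

dialogue = ["How can cross training benefit groups like runners?"]  # generated opening line
for _ in range(3):  # continue the conversation for three more rounds
    # The assistant model sees user turns at even indices.
    as_assistant = [
        {"role": "user" if i % 2 == 0 else "assistant", "content": u}
        for i, u in enumerate(dialogue)
    ]
    dialogue.append(chat("You are a helpful assistant.", as_assistant))
    # The user simulator sees the roles flipped and asks a follow-up.
    as_user = [
        {"role": "assistant" if i % 2 == 0 else "user", "content": u}
        for i, u in enumerate(dialogue)
    ]
    dialogue.append(chat("Act as a curious user and ask one follow-up question.", as_user))
```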

<div align="center"> <img src="figures/ultra-process.png" width="700px"> </div> <details><summary> <b>Questions about the World</b> </summary> <p>

Meta Topics & Sub-Topics

<div align="center"> <img src="figures/meta_topic.png" width="650px"> </div> </p> <p>

Common Real-world Entities

</p> </details> <details><summary> <b>Writing and Creation</b> </summary> <p> <div align="center"> <img src="https://github.com/thunlp/UltraChat/raw/main/figures/figure.png" width="650px"> </div> </p> </details> <details><summary> <b>Assistance on Existent Materials</b> </summary> <p> </p> </details>

To Do

Limitations

Citation

Feel free to cite the repo if you think UltraChat is useful.

@article{ding2023enhancing,
  title={Enhancing Chat Language Models by Scaling High-quality Instructional Conversations},
  author={Ding, Ning and Chen, Yulin and Xu, Bokai and Qin, Yujia and Zheng, Zhi and Hu, Shengding and Liu, Zhiyuan and Sun, Maosong and Zhou, Bowen},
  journal={arXiv preprint arXiv:2305.14233},
  year={2023}
}