<div align="center"> <h1> Index-1.9B </h1> </div> <p align="center"> <a href="./README_zh.md" target="_blank">切换到中文</a> | Online: <a href="https://huggingface.co/spaces/IndexTeam/Index-1.9B" target="_blank">Chat</a> and <a href="https://huggingface.co/spaces/IndexTeam/Index-1.9B-Character" target="_blank">Role-playing</a> | QQ: <a href="media/group_qrcode.jpg" target="_blank">QQ Group</a> </p>

Recent Updates :star2:

  1. Open-sourced the 32K long-context model Index-1.9B-32K. Details: 📖 Index-1.9B-32K_Long_Context_Technical_Report.md
  2. Adapted to llama.cpp and Ollama; see Index-1.9B-Chat-GGUF
  3. Open-sourced the pre-decay checkpoint (constant learning rate) for research; see Index-1.9B-Constant-LR

Model Introduction

The Index-1.9B series is the lightweight branch of the Index model family. It currently includes Index-1.9B-Base, Index-1.9B-Base-Pure, Index-1.9B-Chat, Index-1.9B-Character (role-playing), and Index-1.9B-32K (32K long context); see the Model Download section below for all checkpoints.

Evaluation Results

| Model | Average score | Average English score | MMLU | CEVAL | CMMLU | HellaSwag | Arc-C | Arc-E |
|---|---|---|---|---|---|---|---|---|
| Google Gemma 2B | 41.58 | 46.77 | 41.81 | 31.36 | 31.02 | 66.82 | 36.39 | 42.07 |
| Phi-2 (2.7B) | 58.89 | 72.54 | 57.61 | 31.12 | 32.05 | 70.94 | 74.51 | 87.1 |
| Qwen1.5-1.8B | 58.96 | 59.28 | 47.05 | 59.48 | 57.12 | 58.33 | 56.82 | 74.93 |
| Qwen2-1.5B (report) | 65.17 | 62.52 | 56.5 | 70.6 | 70.3 | 66.6 | 43.9 | 83.09 |
| MiniCPM-2.4B-SFT | 62.53 | 68.75 | 53.8 | 49.19 | 50.97 | 67.29 | 69.44 | 84.48 |
| Index-1.9B-Pure | 50.61 | 52.99 | 46.24 | 46.53 | 45.19 | 62.63 | 41.97 | 61.1 |
| Index-1.9B | 64.92 | 69.93 | 52.53 | 57.01 | 52.79 | 80.69 | 65.15 | 81.35 |
| Llama2-7B | 50.79 | 60.31 | 44.32 | 32.42 | 31.11 | 76 | 46.3 | 74.6 |
| Mistral-7B (report) | / | 69.23 | 60.1 | / | / | 81.3 | 55.5 | 80 |
| Baichuan2-7B | 54.53 | 53.51 | 54.64 | 56.19 | 56.95 | 25.04 | 57.25 | 77.12 |
| Llama2-13B | 57.51 | 66.61 | 55.78 | 39.93 | 38.7 | 76.22 | 58.88 | 75.56 |
| Baichuan2-13B | 68.90 | 71.69 | 59.63 | 59.21 | 61.27 | 72.61 | 70.04 | 84.48 |
| MPT-30B (report) | / | 63.48 | 46.9 | / | / | 79.9 | 50.6 | 76.5 |
| Falcon-40B (report) | / | 68.18 | 55.4 | / | / | 83.6 | 54.5 | 79.2 |

Evaluation code is based on OpenCompass with compatibility modifications. See the evaluate folder for details.

Model Download

| HuggingFace | ModelScope |
|---|---|
| 🤗 Index-1.9B-Chat | Index-1.9B-Chat |
| 🤗 Index-1.9B-Character (Role-playing) | Index-1.9B-Character (Role-playing) |
| 🤗 Index-1.9B-Base | Index-1.9B-Base |
| 🤗 Index-1.9B-Base-Pure | Index-1.9B-Base-Pure |
| 🤗 Index-1.9B-32K (32K Long Context) | Index-1.9B-32K (32K Long Context) |
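
If you prefer to fetch the weights from a script, a minimal sketch using huggingface_hub is shown below. The repo id IndexTeam/Index-1.9B-Chat is assumed from the model path used in the examples later in this README; adjust it for the other checkpoints.

from huggingface_hub import snapshot_download

# Download the chat checkpoint into a local directory that matches the
# default --model_path used in the demos below.
snapshot_download(repo_id="IndexTeam/Index-1.9B-Chat",
                  local_dir="./IndexTeam/Index-1.9B-Chat")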

Usage Instructions

Environment Setup

  1. Download this repository:
git clone https://github.com/bilibili/Index-1.9B
cd Index-1.9B
  2. Install dependencies using pip:
pip install -r requirements.txt
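
As a quick sanity check that the dependencies are importable (assuming torch and transformers are pinned in requirements.txt), you can print the installed versions:

python -c "import torch, transformers; print(torch.__version__, transformers.__version__)"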

Loading with Transformers

You can load the Index-1.9B-Chat model for dialogue using the following code:

import argparse
from transformers import AutoTokenizer, pipeline

# Note: the local model directory name must not contain "."; replace any "." with "_".
parser = argparse.ArgumentParser()
parser.add_argument('--model_path', default="./IndexTeam/Index-1.9B-Chat/", type=str, help="")
parser.add_argument('--device', default="cpu", type=str, help="")  # can also be "cuda", or "mps" on Apple silicon
args = parser.parse_args()

tokenizer = AutoTokenizer.from_pretrained(args.model_path, trust_remote_code=True)
generator = pipeline("text-generation",
                    model=args.model_path,
                    tokenizer=tokenizer, trust_remote_code=True, 
                    device=args.device)


# System prompt (Chinese): "You are a large language model independently developed by Bilibili, named 'Index'. Based on the information the user provides, you help complete the specified task and generate an appropriate, compliant reply."
system_message = "你是由哔哩哔哩自主研发的大语言模型,名为“Index”。你能够根据用户传入的信息,帮助用户完成指定的任务,并生成恰当的、符合要求的回复。"
# Example query (Chinese): "Continue writing: 天不生我金坷垃"
query = "续写 天不生我金坷垃"
model_input = []
model_input.append({"role": "system", "content": system_message})
model_input.append({"role": "user", "content": query})

model_output = generator(model_input, max_new_tokens=300, top_k=5, top_p=0.8, temperature=0.3, repetition_penalty=1.1, do_sample=True)

print('User:', query)
print('Model:', model_output)
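
If you prefer to call the model directly instead of going through pipeline, the sketch below loads the weights with AutoModelForCausalLM and builds the prompt via the tokenizer's chat template (this assumes the chat checkpoint ships a chat template in its tokenizer config; the sampling parameters mirror the pipeline example above):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./IndexTeam/Index-1.9B-Chat/"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16,
                                             device_map="auto", trust_remote_code=True)

messages = [
    {"role": "system", "content": "你是由哔哩哔哩自主研发的大语言模型,名为“Index”。你能够根据用户传入的信息,帮助用户完成指定的任务,并生成恰当的、符合要求的回复。"},
    {"role": "user", "content": "续写 天不生我金坷垃"},
]
# Build input ids from the chat template and generate a reply.
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                          return_tensors="pt").to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=300, do_sample=True, top_k=5,
                            top_p=0.8, temperature=0.3, repetition_penalty=1.1)
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))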

Web Demo

Depends on Gradio, install with:

pip install gradio==4.29.0

Start a web server with the following command, then open the printed address in your browser to chat with the Index-1.9B-Chat model:

python demo/web_demo.py --port='port' --model_path='/path/to/model/'

Terminal Demo

Note: Index-1.9B-32K can only be launched via demo/cli_long_text_demo.py.

Start a terminal demo with the following command to chat with the Index-1.9B-Chat model:

python demo/cli_demo.py  --model_path='/path/to/model/'

OpenAI API Demo

Depends on Flask, install with:

pip install flask==2.2.5

Start the Flask API server with the following command:

python demo/openai_demo.py --model_path='/path/to/model/'

You can conduct dialogues via command line:

curl http://127.0.0.1:8010/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
    "messages": [
    {"role": "system", "content": "你是由哔哩哔哩自主研发的大语言模型,名为“Index”。你能够根据用户传入的信息,帮助用户完成指定的任务,并生成恰当的、符合要求的回复。"},
    {"role": "user", "content": "花儿为什么这么红?"}
    ]
    }'
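
Because the demo exposes an OpenAI-compatible /v1/chat/completions route, you can also call it with the official openai Python client. The sketch below assumes openai >= 1.0; the api_key value is a placeholder since the local demo is not expected to check it, and the model field is likewise a placeholder the local server may ignore.

from openai import OpenAI

# Point the client at the local Flask demo instead of api.openai.com.
client = OpenAI(base_url="http://127.0.0.1:8010/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Index-1.9B-Chat",  # placeholder model name
    messages=[
        {"role": "system", "content": "你是由哔哩哔哩自主研发的大语言模型,名为“Index”。你能够根据用户传入的信息,帮助用户完成指定的任务,并生成恰当的、符合要求的回复。"},
        {"role": "user", "content": "花儿为什么这么红?"},
    ],
)
print(response.choices[0].message.content)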

Index-1.9B-32K Long Context Model Introduction

Model Overview

Index-1.9B-32K is a language model with only 1.9 billion parameters that supports a 32K context length (i.e., this extremely small model can read a document of over 35,000 words in one pass). It has undergone continued pre-training and supervised fine-tuning (SFT) specifically for texts longer than 32K tokens, based on carefully curated long-text training data and self-built long-text instruction sets. The model is now open-source on both Hugging Face and ModelScope.

Despite its small size (about 2% of models like GPT-4), Index-1.9B-32K demonstrates excellent long-text processing capabilities. As shown in the figure below, our 1.9B-sized model's score even surpasses that of the 7B-sized model. Below is a comparison with models like GPT-4 and Qwen2:

<p align="center"> <img src="media/pk-all.png" alt="" width="800"> </p> <p align="center"><strong>Comparison of Index-1.9B-32K with GPT-4, Qwen2, and other models in Long Context capability</strong> </p>

In a 32K-length needle-in-a-haystack test, Index-1.9B-32K achieved excellent results, as shown in the figure below. The only exception is a small yellow spot (91.08 points) at (32K length, 10% depth); all other regions score well and are largely green.

<p align="center"> <img src="media/needle-bench-en.png" alt="" width="900"> </p> <p align="center"><strong>NeedleBench Evaluation</strong></p>

Index-1.9B-32K Model Download, Usage, and Technical Report

For details on downloading, usage, and the technical report for Index-1.9B-32K, see:

<a href="https://github.com/bilibili/Index-1.9B/blob/main/Index-1.9B-32K_Long_Context_Technical_Report.md" style="color: blue;"> 📖 <strong>Index-1.9B-32K Long Context Technical Report</strong> </a>

Details and Disclaimer for the Index Series Models

Index-1.9B-Chat Output Examples

Role Playing

We have also open-sourced the role-playing model together with its accompanying framework and a Gradio demo.

For detailed usage, please refer to the roleplay folder.

Long Text Translation and Summary (Index-1.9B-32K)

cd demo/
CUDA_VISIBLE_DEVICES=0 python cli_long_text_demo.py --model_path '/path/to/model/' --input_file_path data/user_long_text.txt
<p align="center"> <img src="media/qa-mark.png" alt="" width="1000"> </p> <p align="center"><strong>Translation and Summary (Bilibili financial report released on 2024.8.22)</strong></p>

Quantization

Depends on bitsandbytes, install with:

pip install bitsandbytes==0.43.0

You can use the following script to perform int4 quantization, which incurs little performance loss and further reduces GPU memory usage.

import torch
import argparse
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig
)

parser = argparse.ArgumentParser()
parser.add_argument('--model_path', default="", type=str, help="")
parser.add_argument('--save_model_path', default="", type=str, help="")
args = parser.parse_args()

tokenizer = AutoTokenizer.from_pretrained(args.model_path, trust_remote_code=True)
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    llm_int8_threshold=6.0,
    llm_int8_has_fp16_weight=False,
)
model = AutoModelForCausalLM.from_pretrained(args.model_path, 
                                             device_map="auto",
                                             torch_dtype=torch.float16,
                                             quantization_config=quantization_config,
                                             trust_remote_code=True)
model.save_pretrained(args.save_model_path)
tokenizer.save_pretrained(args.save_model_path)
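
After saving, the quantized checkpoint can be reloaded like any normal model directory; a minimal sketch (the path is the --save_model_path used above):

from transformers import AutoModelForCausalLM, AutoTokenizer

save_model_path = "/path/to/quantized/model/"  # same as --save_model_path above
tokenizer = AutoTokenizer.from_pretrained(save_model_path, trust_remote_code=True)
# The 4-bit quantization config is stored with the checkpoint, so no extra
# BitsAndBytesConfig is needed when loading.
model = AutoModelForCausalLM.from_pretrained(save_model_path,
                                             device_map="auto",
                                             trust_remote_code=True)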

Fine-tuning

Follow the steps in the fine-tuning tutorial to quickly fine-tune the Index-1.9B-Chat model. Give it a try and build your own customized Index model!

Limitations and Disclaimer

Index-1.9B may generate inaccurate, biased, or otherwise objectionable content in certain situations. The model cannot understand, express personal opinions, or make value judgments. Its outputs do not represent the views and positions of the model developers. Therefore, please use the generated content with caution. Users should independently evaluate and verify the content generated by the model and should not disseminate harmful content. Developers should conduct safety tests and fine-tuning according to specific applications before deploying any related applications.

We strongly advise against using these models to create or disseminate harmful information or engage in activities that may harm public, national, or social security or violate regulations. Do not use the models for internet services without proper safety review and filing. We have made every effort to ensure the compliance of the training data, but due to the complexity of the model and data, unforeseen issues may still exist. We will not be held responsible for any problems arising from the use of these models, whether related to data security, public opinion risks, or any risks and issues caused by misunderstanding, misuse, dissemination, or non-compliant use of the model.

Model Open Source License

Use of the source code in this repository must comply with the Apache-2.0 license. Use of the Index-1.9B model weights must comply with the INDEX_MODEL_LICENSE.

The Index-1.9B model weights are fully open for academic research and support free commercial use.

Citation

If you find our work helpful, please cite it!

@article{Index,
  title={Index1.9B Technical Report},
  year={2024}
}

Extended Works

libllm: https://github.com/ling0322/libllm/blob/main/examples/python/run_bilibili_index.py

chatllm.cpp: https://github.com/foldl/chatllm.cpp/blob/master/docs/rag.md#role-play-with-rag

ollama: https://ollama.com/milkey/bilibili-index

self-llm: https://github.com/datawhalechina/self-llm/blob/master/bilibili_Index-1.9B/04-Index-1.9B-Chat%20Lora%20微调.md