<div align="center"> <h1>Smurfs<br><a href=https://yoursmiles.org/h-smurf.php><img src=https://yoursmiles.org/hsmile/smurf/h3602.gif></a><a href=https://yoursmiles.org/h-smurf.php><img src=https://yoursmiles.org/hsmile/smurf/h3607.gif></a><a href=https://yoursmiles.org/h-smurf.php><img src=https://yoursmiles.org/hsmile/smurf/h3623.gif></a><a href=https://yoursmiles.org/h-smurf.php><img src=https://yoursmiles.org/hsmile/smurf/h3625.gif></a></h1> </div> <p align="center"> <img src="assets/logo.webp" width="512"> </p>

🤖 This project aims to build a synergistic multi-agent system that can handle complex multi-tool instructions without requiring any extra training. The system is called Smurfs: like the beloved cartoon characters of the same name, its agents symbolize unity and resourcefulness and excel at using tools to overcome any challenge they encounter.

✨ What's New

🗓 Coming Soon

✨ Here is an overview of the Smurfs framework.

<br> <div align="center"> <img src="assets/overview.png" width="800px"> </div> <br>

✨✨ Here is a demo of using Smurfs.

<div align="center">

https://github.com/FreedomIntelligence/Smurfs/assets/99324175/2edd6d2e-e7f1-4e8e-a78e-56c613d2ba13

</div>

✨✨ You can also try it in our Hugging Face Space here.

🚀 Inference

Add your tool functions to Smurfs/tools/tool_env.py and register every available tool function in the tool_env variable, for example:

# Define a tool environment class with one method per tool
class HotpotToolEnv: ...

HPEnv = HotpotToolEnv()

# Register every available tool by mapping its name to the corresponding callable
tool_env = {
    "BingSearch": HPEnv.BingSearch,
    "Retrieve": HPEnv.Retrieve,
    "Lookup": HPEnv.Lookup,
}
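
If you need a reference for the shape of such a class, here is a minimal hypothetical sketch; the method names mirror the mapping above, but the signatures and bodies are placeholders rather than the repository's actual implementation:

# Hypothetical sketch only: signatures and bodies are placeholders.
class HotpotToolEnv:
    def BingSearch(self, query: str) -> str:
        """Search the web for `query` and return a short textual result."""
        raise NotImplementedError("plug in your own search backend here")

    def Retrieve(self, title: str) -> str:
        """Retrieve the passage associated with a page title."""
        raise NotImplementedError

    def Lookup(self, keyword: str) -> str:
        """Look up a keyword inside the most recently retrieved passage."""
        raise NotImplementedError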

Then add the tool descriptions to a JSON file, for example:

[
    {
        "api_name": "BingSearch",
        "api_description": "BingSearch can search for rich external knowledge on the Internet based on keywords, which can compensate for knowledge fallacy and knowledge outdated.",
        "required_parameters": [
            {
                "name": "query",
                "type": "string",
                "description": "query used to search on the Internet. Should be specific and precise with your query to increase the chances of getting relevant results.",
                "default": ""
            }
        ],
        "optional_parameters": []
    },
   ... 
]
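
To make the contract between the JSON schema and the tool_env mapping concrete, here is a small hypothetical dispatch sketch (execute_tool and demo_env are illustrative and not part of the repository): the model selects an api_name and fills in the parameters declared in the JSON, and the mapping resolves that name to a Python callable.

def execute_tool(api_name, arguments, tool_env):
    """Hypothetical helper: resolve a model-chosen tool name to a callable and invoke it."""
    if api_name not in tool_env:
        return f"Error: unknown tool '{api_name}'"
    try:
        # Argument names must match the parameter names declared in the JSON schema
        return tool_env[api_name](**arguments)
    except TypeError as err:
        return f"Error calling {api_name}: {err}"

# Toy example with a stand-in implementation (not the repository's BingSearch)
demo_env = {"BingSearch": lambda query: f"search results for: {query}"}
print(execute_tool("BingSearch", {"query": "capital of France"}, demo_env))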

Then run

python Smurfs/deploy/cli_inference.py

and type in the input query.

To launch the Gradio web demo instead, run

python Smurfs/deploy/gradio_inference.py

📚 Data

To perform the experiments, you first need to get the StableToolBench dataset and server cache by following the instructions in their repo, and then deploy the API server.

The reproduction data of Smurfs can be found at reproduction_data. You can use it to reproduce our experimental results.

🧐 Experiment

model_path="/home/Mistral-7B-Instruct-v0.2"
model_name="Mistral-7B-Instruct-v0.2"
tensor_parallel_size=4

cd $model_path
cd ..
python -m vllm.entrypoints.openai.api_server --model $model_name --dtype=half --tensor-parallel-size $tensor_parallel_size
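
Once the server is up, you can sanity-check it with an OpenAI-compatible request. This is a minimal sketch: port 8000 and the /v1 base path are vLLM's defaults, and the api_key value is a dummy because the local server does not verify it.

from openai import OpenAI

# The local vLLM server exposes an OpenAI-compatible endpoint on port 8000 by default
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Mistral-7B-Instruct-v0.2",  # must match the --model value passed to vLLM
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)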

Note that some models, such as Vicuna, do not ship a chat template in their tokenizer config file. For these models you need to download a chat template from the internet (for example here) and use the script below:

model_name="Your/Model/Name"
tensor_parallel_size=4
chat_template_path="Your/Template/Path"

cd $model_path
cd ..
python -m vllm.entrypoints.openai.api_server --model $model_name --dtype=half --tensor-parallel-size $tensor_parallel_size --chat-template $chat_template_path

The vLLM server provides easy, fast, and cheap LLM serving for most popular open-source models, and using it can significantly speed up the experiments. For more information, see vLLM.

With the model server and the StableToolBench API server running, start inference with your ToolBench key:

export toolbench_key="Your_key"

model_name="Mistral-7B-Instruct-v0.2"
method_name="smurfs"
test_query_id_path="toolbench_data/data/test_query_ids"
query_file_dir="toolbench_data/data/test_instruction"
tool_env_dir="toolbench_data/data/toolenv/tools"


python Smurfs/inference/inference.py \
    --model_name $model_name \
    --toolbench_key $toolbench_key \
    --method_name $method_name \
    --test_query_id_path $test_query_id_path \
    --query_file_dir $query_file_dir \
    --tool_env_dir $tool_env_dir
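
If you want to inspect the benchmark queries before running, the sketch below assumes the standard ToolBench layout, where each file under test_instruction is a JSON list of records containing at least query_id and query (field names follow ToolBench; adjust if your copy differs):

import json
from pathlib import Path

# Assumes the ToolBench layout: one JSON list of query records per test set
query_file = Path("toolbench_data/data/test_instruction") / "G2_instruction.json"
with open(query_file) as f:
    queries = json.load(f)

for record in queries[:3]:
    print(record["query_id"], "-", record["query"][:80])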

If you want to run inference with a customized RapidAPI account, pass your RapidAPI key through rapidapi_key and specify the use_rapidapi_key argument in the script:

export rapidapi_key="Your_key"

model_name="Mistral-7B-Instruct-v0.2"
method_name="smurfs"
test_query_id_path="toolbench_data/data/test_query_ids"
query_file_dir="toolbench_data/data/test_instruction"
tool_env_dir="toolbench_data/data/toolenv/tools"


python Smurfs/inference/inference.py \
    --model_name $model_name \
    --toolbench_key $toolbench_key \
    --method_name $method_name \
    --test_query_id_path $test_query_id_path \
    --query_file_dir $query_file_dir \
    --tool_env_dir $tool_env_dir \
    --use_rapidapi_key

After inference, post-process the raw outputs:

test_sets=("G2_category" "G2_instruction" "G3_instruction")
input_dir="data/smurfs"
example_dir="reproduction_data/mistral_smurfs"

python Smurfs/data/post_process.py \
    --input_dir $input_dir \
    --test_sets "${test_sets[@]}" \
    --example_dir $example_dir

📊 Experiment Result

In our main experiments on StableToolBench, Smurfs improves the base model's ability to handle complex multi-tool instructions to a level that matches or even exceeds the capabilities of GPT-4 with DFSDT. Below are the main results. The win rate for each model is compared with ChatGPT-ReACT.

Pass Rate:

| Backbone | Method | I1-Inst. | I1-Cat. | I1-Tool. | I2-Cat. | I2-Inst. | I3-Inst. | Average |
|---|---|---|---|---|---|---|---|---|
| GPT-3.5 Turbo | ReACT | 41.6±1.2 | 48.4±0.5 | 52.5±0.5 | 52.2±1.0 | 31.6±1.2 | 39.9±2.0 | 44.4±1.1 |
| GPT-3.5 Turbo | DFSDT | 54.1±1.0 | 60.1±0.0 | 59.9±1.7 | 60.9±0.9 | 52.8±3.7 | 44.3±4.8 | 55.4±2.0 |
| GPT-3.5 Turbo | Smurfs | 60.3±1.5 | 67.0±1.0 | 60.3±1.3 | 54.3±0.4 | 42.6±1.6 | 60.1±1.0 | 57.4±1.1 |
| Mistral-7B | ReACT | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Mistral-7B | DFSDT | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Mistral-7B | Smurfs | 76.3±0.8 | 86.7±1.2 | 81.0±1.9 | 70.4±2.7 | 63.8±2.4 | 85.2±0.7 | 77.2±1.6 |
| GPT-4 Turbo | ReACT | 41.1±1.5 | 53.2±1.3 | 42.2±1.1 | 50.0±0.7 | 38.7±0.8 | 37.7±1.3 | 43.8±1.1 |
| GPT-4 Turbo | DFSDT | 52.7±1.4 | 58.2±0.9 | 59.7±1.2 | 59.3±0.7 | 52.2±2.3 | 61.5±1.8 | 57.3±1.4 |
| GPT-4 Turbo | Smurfs | 59.3±1.4 | 73.3±1.3 | 67.4±0.7 | 66.7±1.9 | 55.5±1.4 | 70.5±0.0 | 65.5±1.1 |

Win Rate:

| Backbone | Method | I1-Inst. | I1-Cat. | I1-Tool. | I2-Cat. | I2-Inst. | I3-Inst. | Average |
|---|---|---|---|---|---|---|---|---|
| GPT-3.5 Turbo | ReACT | / | / | / | / | / | / | / |
| GPT-3.5 Turbo | DFSDT | 64.4 | 61.4 | 53.8 | 62.9 | 66.0 | 54.1 | 60.4 |
| GPT-3.5 Turbo | Smurfs | 65.0 | 69.9 | 54.4 | 63.7 | 64.2 | 57.4 | 62.4 |
| Mistral-7B | ReACT | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Mistral-7B | DFSDT | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Mistral-7B | Smurfs | 63.8 | 62.7 | 58.2 | 54.0 | 67.0 | 57.4 | 60.5 |
| GPT-4 Turbo | ReACT | 60.1 | 62.1 | 48.1 | 57.3 | 65.1 | 47.5 | 56.7 |
| GPT-4 Turbo | DFSDT | 69.9 | 66.0 | 58.2 | 62.1 | 67.9 | 65.6 | 65.0 |
| GPT-4 Turbo | Smurfs | 71.2 | 72.5 | 69.6 | 73.4 | 66.0 | 72.1 | 70.8 |

Citation

@misc{chen2024smurfs,
      title={Smurfs: Leveraging Multiple Proficiency Agents with Context-Efficiency for Tool Planning}, 
      author={Junzhi Chen and Juhao Liang and Benyou Wang},
      year={2024},
      eprint={2405.05955},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

We are from the School of Data Science, the Chinese University of Hong Kong, Shenzhen (CUHKSZ), and the Shenzhen Research Institute of Big Data (SRIBD).

Acknowledgement

We are aware that our work is inspired by the following works, including but not limited to: