Awesome
Imitater
A unified language model server built upon vllm and infinity.
Usage
Install
pip install -e .
Launch Server
imitater -c config/example.yaml
<details><summary>Show configuration instruction.</summary>
Add an OpenAI model
- name: OpenAI model name
- token: OpenAI token
Add a chat model
- name: Display name
- path: Model name on hub or local model path
- device: Device IDs
- port: Port ID
- maxlen: Maximum model length (optional)
- agent_type: Agent type (optional) {react, aligned}
- template: Template jinja file (optional)
- gen_config: Generation config folder (optional)
Add an embedding model
- name: Display name
- path: Model name on hub or local model path
- device: Device IDs (does not support multi-gpus)
- port: Port ID
- batch_size: Batch size (optional)
</details>
[!NOTE] Chat template is required for the chat models.
Use
export USE_MODELSCOPE_HUB=1
to download model from modelscope.
Test Server
python tests/test_openai.py -c config/example.yaml
Roadmap
- Response choices.
- Rerank model support.