

MOSS Vortex

MOSS Vortex is a lightweight, high-performance deployment and inference backend engineered specifically for MOSS 003. Built on MOSEC and Torch, it provides a range of features aimed at improving performance and functionality.

Deploying a MOSS 003 inference server on your own GPU server takes only a few commands and a few minutes.

The features encompass:

The main limitation of MOSS Vortex is that it does not yet implement Token Batching, which is crucial for LLM inference. I will implement it shortly.

<img src="./img/search_case.jpeg" alt="example" width="400" height="400">


To quickly deploy MOSS Vortex using Docker:

git clone https://github.com/piglaker/vortex.git
cd vortex
bash install_run.sh


To run a test on MOSS Vortex:

bash scripts/test.sh


curl -X POST -d \
'{"x": "<|Human|>: hello<eoh>\n<|Inner Thoughts|>: None.<eot>\n<|Commands|>: None<eoc>\n<|Results|>: None<eor>\n<|MOSS|>:",
"max_iterations":"128",
"temperature":"0.7",

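The request body in the curl example can also be assembled programmatically. The following is a minimal sketch, not the project's own client: `build_prompt`, `build_payload`, and `send` are hypothetical helper names, only the three fields visible in the example are included, and the server URL is left as a parameter because it is not given here. It also assumes the server replies with JSON.

```python
import json
import urllib.request


def build_prompt(user_text: str) -> str:
    """Wrap a user message in the MOSS 003 turn format shown in the curl example."""
    return (
        f"<|Human|>: {user_text}<eoh>\n"
        "<|Inner Thoughts|>: None.<eot>\n"
        "<|Commands|>: None<eoc>\n"
        "<|Results|>: None<eor>\n"
        "<|MOSS|>:"
    )


def build_payload(user_text: str, max_iterations: int = 128,
                  temperature: float = 0.7) -> bytes:
    """Serialize the request body; values are sent as strings, as in the example."""
    body = {
        "x": build_prompt(user_text),
        "max_iterations": str(max_iterations),
        "temperature": str(temperature),
    }
    return json.dumps(body).encode("utf-8")


def send(url: str, user_text: str) -> dict:
    """POST the payload to `url` (an assumption: supply your server's address)."""
    request = urllib.request.Request(url, data=build_payload(user_text), method="POST")
    with urllib.request.urlopen(request) as response:
        # Assumes the server answers with a JSON document.
        return json.loads(response.read().decode("utf-8"))
```

Calling `send(url, "hello")` with the address of your own deployment would issue the same kind of request as the curl command; the real server may accept further fields that this sketch does not set.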
Return Format:

>> bash scripts/short_vortex_test.sh
{
  "pred": "<|Human|>: hello<eoh>\n<|Inner Thoughts|>: None.<eot>\n<|Commands|>: None<eoc>\n<|Results|>: None<eor>\n<|MOSS|>: Hello! How may I assist you today?<eom>",
  "input_token_num": 331,
  "new_generations_token_num": 10,
  "new_generations": " Hello! How may I assist you today?<eom>"
}

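To use the reply in an application, the fields above usually need a little post-processing. The sketch below assumes the server returns a JSON object with the four fields shown in the return format; `extract_reply` is a hypothetical helper, and treating `<eom>` as a marker to strip is an assumption based on the sample.

```python
import json

EOM = "<eom>"  # end-of-message marker seen at the end of the sample reply


def extract_reply(response_text: str) -> dict:
    """Return the cleaned reply text and the token counts from a response body."""
    data = json.loads(response_text)
    reply = data["new_generations"]
    if reply.endswith(EOM):
        reply = reply[: -len(EOM)]  # drop the trailing marker
    return {
        "reply": reply.strip(),
        "input_tokens": data["input_token_num"],
        "new_tokens": data["new_generations_token_num"],
    }


# Sample body mirroring the return format documented above.
sample = json.dumps({
    "pred": "<|Human|>: hello<eoh>\n<|Inner Thoughts|>: None.<eot>\n"
            "<|Commands|>: None<eoc>\n<|Results|>: None<eor>\n"
            "<|MOSS|>: Hello! How may I assist you today?<eom>",
    "input_token_num": 331,
    "new_generations_token_num": 10,
    "new_generations": " Hello! How may I assist you today?<eom>",
})

result = extract_reply(sample)
print(result["reply"])
```

In the sample, `pred` is the full prompt followed by the generated text, while `new_generations` holds only the newly generated part, so the latter is the simpler field to read the answer from.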

To check the metrics:


Logs Format:

2023-04-18 00:50:46,707 - 210 - INFO - mosec_server.py:652 - <|Human|>: Write a Python quicksort snippet<eoh>
<|Inner Thoughts|>: None<eot>
<|Commands|>: None<eoc>
<|Results|>: None<eor>
<|MOSS|>: Here is a simple Python quicksort example:

def quick_sort(arr):
    if len(arr) <= 1:
        return arr

    pivot = arr[0]  # use the first element as the pivot
    left = [x for x in arr[1:] if x <= pivot]
    right = [x for x in arr[1:] if x > pivot]

    return quick_sort(left) + [pivot] + quick_sort(right)

# Example
print(quick_sort([3, 6, 8, 10, 1, 2]))

The output is `[1, 2, 3, 6, 8, 10]`.<eom>
2023-04-18 00:51:30,113 - 213 - INFO - mosec_server.py:743 - [MOSEC] [FORWARD] First Token Generation Cost: 0.09637761116027832
2023-04-18 00:51:30,461 - 213 - INFO - mosec_server.py:747 - [MOSEC] [FORWARD] Recent Token Generation Cost: 0.04331459999084473
2023-04-18 00:51:30,912 - 213 - INFO - mosec_server.py:542 - [MOSEC] [STREAM] Graceful close websockets 
2023-04-18 00:51:30,912 - 213 - INFO - mosec_server.py:623 - [MOSEC] [INFER] Request Cost: 0.8990638256072998
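The timing lines in these logs are regular enough to extract with a small script. This is a rough sketch based only on the sample above: the regular expressions and the `parse_costs` helper are hypothetical, the meaning of the second field (`210`, `213`) is not documented here so it is captured under the neutral name `ident`, and the unit of the cost values is not stated in the logs.

```python
import re

# Layout inferred from the sample: timestamp - ident - level - file:line - message
LOG_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - (?P<ident>\d+) - "
    r"(?P<level>[A-Z]+) - (?P<source>[\w.]+:\d+) - (?P<message>.*)$"
)
# Messages such as "[MOSEC] [FORWARD] First Token Generation Cost: 0.0963..."
COST = re.compile(
    r"\[MOSEC\] \[(?P<stage>[A-Z]+)\] (?P<label>.+? Cost): (?P<value>[0-9.]+)"
)


def parse_costs(lines):
    """Map each cost label to the last value logged for it."""
    costs = {}
    for line in lines:
        header = LOG_LINE.match(line)
        if header is None:
            continue  # continuation lines of a multi-line message
        cost = COST.search(header.group("message"))
        if cost is not None:
            costs[cost.group("label")] = float(cost.group("value"))
    return costs


sample_log = [
    "2023-04-18 00:51:30,113 - 213 - INFO - mosec_server.py:743 - [MOSEC] [FORWARD] First Token Generation Cost: 0.09637761116027832",
    "2023-04-18 00:51:30,461 - 213 - INFO - mosec_server.py:747 - [MOSEC] [FORWARD] Recent Token Generation Cost: 0.04331459999084473",
    "2023-04-18 00:51:30,912 - 213 - INFO - mosec_server.py:542 - [MOSEC] [STREAM] Graceful close websockets",
    "2023-04-18 00:51:30,912 - 213 - INFO - mosec_server.py:623 - [MOSEC] [INFER] Request Cost: 0.8990638256072998",
]

costs = parse_costs(sample_log)
for label, value in costs.items():
    print(f"{label}: {value:.4f}")
```

Keeping only the last value per label is a simplification; collecting every value into a list would suit latency tracking across many requests better.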


The following configurations are available for MOSS Vortex:


If you use MOSS Vortex in your work, please cite it as follows:

  title = {{Moss Vortex: An advanced deployment and inference backend for MOSS based on MOSEC and Torch}},  
  author = {Xiaotian Zhang and Zhengfu He and Tianxiang Sun},
  url = {https://github.com/piglaker/Vortex},  
  year = {2023}  