Data-Copilot

Overview

Data-Copilot is an LLM-based system that helps you address data-related tasks.

Data-Copilot connects data sources from different domains and serves diverse user needs, with the ability to autonomously manage, process, analyze, predict, and visualize data.

<img src="./assets/Word Art.png" alt="Image" style="width: 900px;">

See our paper: Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow, Wenqi Zhang, Yongliang Shen, Weiming Lu, Yueting Zhuang

🔥 Demo video

Because GPT-3.5 has only a 4k-token input limit, Data-Copilot can currently access only Chinese stocks, funds, and some economic data.

Data-Copilot can query and predict data autonomously:

<img src="./demo1.png" alt="Image" style="width: 900px;">

Supported models and data sources:

| Model | CHN Stock | CHN Fund | CHN Economic data | CHN Financial data |
|---|---|---|---|---|
| Openai-GPT3.5 | ✓ | ✓ | ✓ | ✓ |
| Azure-GPT3.5 | ✓ | ✓ | ✓ | ✓ |
| Qwen-72b-Chat | ✓ | ✓ | ✓ | ✓ |

<img src="./assets/fig1.png" alt="Image" style="width: 900px;">

We propose Data-Copilot, an LLM-based system that links data from Chinese financial markets, including stocks, funds, economic data, financial data, and live news.

🌳 QuickStart

First, replace openai.key and the Tushare token in main.py with your own OpenAI key and Tushare token. The project is organized as follows:

|-- README.md
|-- app.py
|-- create_tool
|   |-- Atomic_api_json.py
|   `-- all_atomic_api.json
|-- lab_gpt4_call.py
|-- lab_llms_call.py
|-- main.py
|-- output
|-- prompt_lib
|   |-- prompt_economic.json
|   |-- prompt_financial.json
|   |-- prompt_fund.json
|   |-- prompt_intent_detection.json
|   |-- prompt_stock.json
|   |-- prompt_task.json
|   `-- prompt_visualization.json
|-- requirements.txt
|-- tool.py
|-- tool_lib
|   |-- atomic_api.json
|   |-- tool_backup.json
|   |-- tool_economic.json
|   |-- tool_financial.json
|   |-- tool_fund.json
|   |-- tool_stock.json
|   `-- tool_visualization.json

app.py starts the Gradio demo. main.py implements the interface-scheduling workflow, and lab_gpt4_call.py calls the GPT-3.5 model. tool_lib and tool.py contain the interface tools produced by the first phase, interface design. The prompt_lib folder contains the prompt designs and the in-context demonstrations.
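To inspect the prompt and tool definitions described above, the JSON files can be loaded in one pass. The following is a minimal sketch, not code from the repository: the helper name load_json_dir is hypothetical, and it makes no assumption about the internal structure of each JSON file.

```python
import json
from pathlib import Path


def load_json_dir(directory: str, prefix: str = "") -> dict:
    """Load every `<prefix>*.json` file in `directory`, keyed by file stem.

    Hypothetical helper for illustration; it is not part of Data-Copilot.
    """
    loaded = {}
    for path in sorted(Path(directory).glob(f"{prefix}*.json")):
        with path.open(encoding="utf-8") as handle:
            loaded[path.stem] = json.load(handle)
    return loaded
```

For example, load_json_dir("prompt_lib", prefix="prompt_") would return a dictionary with keys such as prompt_stock and prompt_fund, assuming those files contain valid JSON.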

Requirements

pip install -r requirements.txt

Then run the following command:

For Local

python main.py

You can select the LLM in main.py by setting:

model='<the model you choose>'
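A typo in the model setting is easier to catch with an early check. The sketch below is illustrative only: the labels are copied from the supported-models table in this README, the exact strings that main.py expects may differ, and check_model is a hypothetical helper rather than a function in the project.

```python
# Labels taken from the supported-models table; the actual values accepted
# by main.py are an assumption and should be verified against the code.
SUPPORTED_MODELS = ("Openai-GPT3.5", "Azure-GPT3.5", "Qwen-72b-Chat")


def check_model(model: str) -> str:
    """Return `model` unchanged if it is a known label, else raise ValueError."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(
            f"Unknown model {model!r}; expected one of {', '.join(SUPPORTED_MODELS)}"
        )
    return model
```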

Remember to fill in the API key for the LLM you chose. Also fill in the Tushare token before running the code.

In tool.py, the Tushare token is read from the TUSHARE_TOKEN environment variable:

import os
import tushare as ts
tushare_token = os.getenv('TUSHARE_TOKEN')
pro = ts.pro_api(tushare_token)
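Because os.getenv returns None when a variable is unset, a missing token may only show up later as a failed API call. A small guard can fail fast with a clearer message. This is a sketch under that assumption; require_env is a hypothetical helper, not part of tool.py.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error.

    Hypothetical helper for illustration; it is not part of Data-Copilot.
    """
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; export it before running."
        )
    return value
```

With this helper, the token lookup could be written as tushare_token = require_env("TUSHARE_TOKEN").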

For Gradio

The Gradio demo is now hosted on Hugging Face Space. You can also run the following commands to start the demo locally:

python app.py

🌿 How to play

You can try Data-Copilot for Chinese financial markets in our Hugging Face Space. It has access to Chinese stocks, funds, and some economic data. Because GPT-3.5 has only a 4k-token input length, the amount of accessible data is still relatively small. In the future, Data-Copilot will support more data from foreign financial markets.

<img src="./assets/app.png" alt="Image" style="width: 900px">

🍺 Some cases

A case for the request "Check the northbound inflow on every trading date":

<img src="./assets/case2.png" alt="Image" style="width: 900px">

Citation

If you find this work useful in your research, please cite the paper as follows:

@article{zhang2023data,
  title={Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow},
  author={Zhang, Wenqi and Shen, Yongliang and Lu, Weiming and Zhuang, Yueting},
  journal={arXiv preprint arXiv:2306.07209},
  year={2023}
}

Contact

If you have any questions, please contact us by email: zhangwenqi@zju.edu.cn

Acknowledgement