
AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework

<p align="center"> <img src="assets/readme_logo.png" width="40%"/> </p>

Disclaimer

The content in this repository is for ACADEMIC RESEARCH AND EDUCATIONAL PURPOSES ONLY. Although StockGPT provides financial services across a variety of tasks and scenarios, the model should be used only as a reference, and nothing generated by the model should be construed as financial, legal, or investment advice. The authors and contributors are not responsible for the accuracy, completeness, or usefulness of the information generated by StockGPT, and users are encouraged to exercise their own judgment and seek professional advice before making any financial, legal, or investment decisions. Use of the software and information in this repository is at the user's own risk.

By using or accessing the information in this repository, you agree to indemnify, defend, and hold harmless the authors, contributors, and any affiliated organizations or persons from any and all claims or damages.

Brief Introduction

<p align="center"> <img src="assets/case.png" width="50%"/> </p>

We open-source the AlphaFin series, which currently includes the AlphaFin dataset; two chat models trained on AlphaFin, StockGPT-Stage1 and StockGPT-Stage2; and Stock-Chain, a retrieval-augmented financial analysis framework.

We focus on two real-world financial tasks: stock trend prediction and financial Q&A. By integrating retrieval-augmented generation (RAG), we address two weaknesses of LLMs: hallucinated outputs and the inability to generate real-time content.
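To illustrate the general retrieval-augmented idea, here is a minimal, self-contained Python sketch. It is not Stock-Chain's implementation: it swaps in a toy bag-of-words cosine similarity where a real system would use a dense text encoder (BGE-Large-zh, listed in the Stage 2 Quick Start, is one such embedding model), and the function names (`retrieve`, `build_prompt`), prompt wording, and sample documents are hypothetical. Retrieved documents are prepended to the question so the model can ground its answer in them rather than in its parametric memory alone.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Toy tokenizer: lowercase, keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would use a dense encoder."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Prepend the retrieved documents to the question (hypothetical prompt format)."""
    context = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(retrieve(query, docs, k)))
    return f"Reference documents:\n{context}\n\nQuestion: {query}\nAnswer using only the references above."


# Hypothetical document store, for illustration only.
docs = [
    "Company A reported quarterly revenue growth of 12 percent.",
    "Company B announced a new chip factory.",
    "Analysts expect Company A revenue to keep growing next quarter.",
]
print(build_prompt("How is Company A revenue growing?", docs))
```

In practice, the retrieved context would come from an up-to-date store of news, research reports, and price data, which is what lets the model answer about recent events.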

Quick Start

Preparation

git clone https://github.com/AlphaFin-proj/AlphaFin.git
cd AlphaFin
pip install -r requirements.txt

Stage 1

First, download the ChatGLM2-6B and StockGPT-Stage1 model checkpoints locally. Then, fill in your Tushare API token and the local paths of these checkpoints in scripts/stage1_trend_prediction.sh. Finally, run the following command:

bash scripts/stage1_trend_prediction.sh
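Before launching the script, it can help to confirm that the token is set and the checkpoint paths exist. The following is a minimal pre-flight sketch, assuming each checkpoint was downloaded as a local directory; the function name `preflight`, the `TUSHARE_TOKEN` environment variable, and the `/path/to/...` values are hypothetical placeholders, not settings the project reads.

```python
import os


def preflight(token: str, ckpt_paths: dict[str, str]) -> list[str]:
    """Collect configuration problems; an empty list means nothing obvious is wrong."""
    problems = []
    if not token.strip():
        problems.append("tushare token is empty")
    # Assumes each checkpoint is stored as a local directory.
    for name, path in ckpt_paths.items():
        if not os.path.isdir(path):
            problems.append(f"{name}: directory not found at {path}")
    return problems


# Hypothetical values: substitute the token and paths you entered in the script.
for msg in preflight(
    token=os.environ.get("TUSHARE_TOKEN", ""),  # hypothetical environment variable
    ckpt_paths={
        "ChatGLM2-6B": "/path/to/chatglm2-6b",
        "StockGPT-Stage1": "/path/to/stockgpt-stage1",
    },
):
    print("config problem:", msg)
```

The same check applies to Stage 2 by passing the ChatGLM2-6B, StockGPT-Stage2, and BGE-Large-zh paths and skipping the token.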

Stage 2

First, download the ChatGLM2-6B, StockGPT-Stage2, and BGE-Large-zh model checkpoints locally. Then, specify the local paths of these checkpoints in scripts/stage2_financial_qa.sh. Finally, run the following command:

bash scripts/stage2_financial_qa.sh

For Stage 2, we provide 200 sample documents (news, research reports, and stock prices) so you can try out the project. More document data will be released as soon as it has been organized.

Dataset

<p align="center"> <img src="assets/datasource.png" width="600"/> </p>

The figure above shows the data sources and preprocessing of the proposed AlphaFin datasets. We ensure that our dataset covers a wide range of core financial analysis tasks, including NLI, financial QA, stock trend prediction, and more. AlphaFin contains both Chinese and English datasets to eliminate potential linguistic biases. The English data primarily covers traditional financial and NLP-related tasks, while the Chinese data mainly consists of financial research reports and stock predictions.

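| Dataset | Size | Input Len. | Output Len. | Lang. |
|---|---|---|---|---|
| Research | 42K | 712.8 | 5.6 | en |
| StockQA | 21K | 1313.6 | 40.8 | zh |
| Fin. News | 79K | 497.8 | 64.2 | zh |
| Fin. Reports | 120K | 2203 | 17.2 | zh |
| Fin. Reports CoT | 200 | 2184.8 | 407.8 | zh |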
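The language balance of AlphaFin can be read off the table with a little arithmetic. This sketch simply sums the rounded sizes listed above (treating 200 samples as 0.2K), so the totals are approximate; the variable names are illustrative.

```python
# Sizes from the table, in thousands of samples (200 samples = 0.2K).
sizes_k = {
    "Research": (42.0, "en"),
    "StockQA": (21.0, "zh"),
    "Fin. News": (79.0, "zh"),
    "Fin. Reports": (120.0, "zh"),
    "Fin. Reports CoT": (0.2, "zh"),
}

total = sum(size for size, _ in sizes_k.values())

# Aggregate sample counts per language.
by_lang: dict[str, float] = {}
for size, lang in sizes_k.values():
    by_lang[lang] = by_lang.get(lang, 0.0) + size

for lang, size in sorted(by_lang.items()):
    print(f"{lang}: {size:.1f}K ({size / total:.1%})")
print(f"total: {total:.1f}K")
```

Roughly 262K samples in total, with the Chinese subsets accounting for about 220K of them, consistent with the Chinese data carrying the research-report and stock-prediction tasks.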

Performance

<p align="center"> <img src="assets/long-short-test.png" width="1000"/> </p>

Stock-Chain achieves the highest AR and maintains an upward trend starting from 2023, indicating its effectiveness for investment. It also achieves the highest ARR, at 30.8%.
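This section does not define AR and ARR. Assuming they follow the common convention of accumulated (cumulative) return and annualized rate of return, the sketch below shows how such metrics are typically computed from a series of daily portfolio returns. The 252-trading-day year, the function names, and the sample returns are assumptions for illustration, not the evaluation code behind the figure.

```python
def cumulative_return(daily_returns: list[float]) -> float:
    """Compound simple per-period returns into the total return over the period."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth - 1.0


def annualized_return(daily_returns: list[float], periods_per_year: int = 252) -> float:
    """Geometrically rescale the compounded return to a one-year horizon.

    periods_per_year=252 assumes daily returns on a typical trading calendar.
    """
    if not daily_returns:
        raise ValueError("need at least one return")
    growth = 1.0 + cumulative_return(daily_returns)
    return growth ** (periods_per_year / len(daily_returns)) - 1.0


# Hypothetical daily returns of a long-short portfolio, for illustration only.
daily = [0.004, -0.002, 0.003, 0.001, -0.001]
print(f"cumulative: {cumulative_return(daily):.4%}")
print(f"annualized: {annualized_return(daily):.2%}")
```

Geometric annualization is used here because returns compound; a simple linear scaling (mean daily return times 252) is another convention some backtests use.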

Financial Q&A Cases


Citation

If you use AlphaFin in your work, please cite our paper.

@misc{li2024alphafin,
      title={AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework}, 
      author={Xiang Li and Zhenyu Li and Chen Shi and Yong Xu and Qing Du and Mingkui Tan and Jun Huang and Wei Lin},
      year={2024},
      eprint={2403.12582},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}