
<div align= "center"> <h1> <img src="img/dbagent.png" width="100px"> LLM As Database Administrator</h1> </div> <!-- <p align="center"> <a href="https://github.com/TsinghuaDatabaseGroup/DB-GPT/blob/main/LICENSE"> <img alt="License: Apache2" src="https://img.shields.io/badge/License-Apache_2.0-green.svg"> </a> <a href="https://github.com/OpenBMB/AgentVerse/blob/main/LICENSE"> <img alt="License: Apache2" src="https://img.shields.io/badge/License-Apache_2.0-green.svg"> </a> </p> --> <div align="center">

<a href="https://github.com/TsinghuaDatabaseGroup/DB-GPT/tree/main/diagnostic_files/public_testing_set/all_anomalies.jsonl">Dialogues</a> <a href="https://github.com/TsinghuaDatabaseGroup/DB-GPT/tree/main/multiagents/tools">Tools</a> <a href="https://github.com/TsinghuaDatabaseGroup/DB-GPT/tree/main/multiagents/localized_llms/training_data">Training Data</a> <a href="https://cloud.tsinghua.edu.cn/f/6e8a3ad547204303a5ae/?dl=1">Sentence Embedding Model</a> <a href="https://github.com/TsinghuaDatabaseGroup/DB-GPT/tree/main/multiagents/llms">LLMs</a>

</div> <!-- <div align="center"> <a href="https://github.com/TsinghuaDatabaseGroup/DB-GPT/tree/main/localized_llms"> <em style="color: red;">DiagLLM</em> </a> πŸ”₯ </div> --> <p align="center"> <!-- <a href="#-features">Features</a> β€’ --> <a href="#-demo">Demo</a> β€’ <a href="#-quickstart">QuickStart</a> β€’ <a href="#-anomalies">Alerts And Anomalies</a> β€’ <a href="#-customize">Knowledge And Tools</a> β€’ <a href="#-docker">Dockers</a> β€’ <a href="#-FAQ">FAQ</a> β€’ <a href="#-community">Community</a> β€’ <a href="#-citation">Citation</a> β€’ <a href="#-contributors">Contributors</a> β€’ <a href="https://aihomea.com">OpenAI, Azure aggregated API discounted access plan.</a> </p> <p align="center"> <a href="#-contact">πŸ‘« Join Us on WeChat!</a> <bar> <a href="https://www.benchcouncil.org/evaluation/opencs/annual.html#Achievements">πŸ† Top 100 Open Project!</a> <bar> <a href="#-citation">🌟 VLDB 2024!</a> </p> <p align="center"> 【English | <a href="README_Chinese.md">δΈ­ζ–‡</a>】 </p> <p align="center"> <img src="img/overview_v4.png" width="800px"> </p>

🦾 Build your personal database administrator (D-Bot)πŸ§‘β€πŸ’», which is good at solving database problems by reading documents, using various tools, and writing analysis reports! <a href="#-news">Undergoing An Upgrade!</a>

<!-- **An important, major [version update](https://github.com/TsinghuaDatabaseGroup/DB-GPT/tree/chatchat) is coming soon, stay tuned!** πŸ“£ πŸ”œ --> <!-- >Besides, to extend the database maintenance capability, we are also finetuning LLMs to support localized diagnosis, *query rewriting* and *anomaly simulation* (comming soon). -->

<span id="-demo"></span>

πŸ—Ί Online Demo

  1. After launching the local service (adopting the frontend and configs from Chatchat), you can easily import documents into the knowledge base, and use the knowledge base for well-founded Q&A and diagnosis analysis of abnormal alerts.
<!-- <p align="center"> <a href="http://dbgpt.dbmind.cn"> <img src="img/demo-dbot_en.png" width="800px"> </a> </p> -->


  2. With the user feedback function πŸ”—, you can (1) send feedback to make D-Bot follow and refine the intermediate diagnosis results, and (2) edit the diagnosis result by clicking the β€œEdit” button. D-Bot accumulates refinement patterns from the user feedback (stored in a vector database) and adaptively aligns to the user's diagnosis preferences.
<!-- extracted and stored refinement patterns from the feedbacks, with which D-Bot can adapt its performance by RAG techniques. --> <p align="center"> <img src="img/feedback-demo.png" width="800px"> </p>
  3. On the online website (http://dbgpt.dbmind.cn), you can browse all historical diagnosis results, the metrics used, and the detailed diagnosis processes.
<p align="center"> <a href="http://dbgpt.dbmind.cn"> <img src="img/frontend_v2_2.png" width="800px" alt="frontend_v2"> </a> </p>
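The feedback loop described above (storing refinement patterns and retrieving them for similar diagnoses) can be sketched in a few lines. This is a toy illustration, assuming bag-of-words vectors and an in-memory store; the `FeedbackStore` name is hypothetical, and the real system uses learned embeddings in a vector database.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; the real system uses learned embeddings.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class FeedbackStore:
    """Hypothetical in-memory stand-in for the vector database of refinement patterns."""
    def __init__(self):
        self.patterns = []  # list of (embedding, refinement_text)

    def add(self, diagnosis: str, refinement: str) -> None:
        self.patterns.append((embed(diagnosis), refinement))

    def retrieve(self, diagnosis: str):
        # Return the refinement whose source diagnosis is most similar, if any.
        q = embed(diagnosis)
        scored = [(cosine(q, e), r) for e, r in self.patterns]
        best = max(scored, default=(0.0, None))
        return best[1] if best[0] > 0 else None

store = FeedbackStore()
store.add("high cpu usage caused by sequential scans", "check missing indexes first")
store.add("lock contention on orders table", "inspect long-running transactions")
print(store.retrieve("cpu usage spike with many sequential scans"))
```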

Old Version 1: [Gradio for Diag Game] (no langchain)

Old Version 2: [Vue for Report Replay] (no langchain)

<span id="-news"></span>

πŸ“° Updates

<!-- - [x] An end-to-end framework is available! <a href="#-diagnosis">πŸš€ link</a> --> <!-- - [x] **[2023/8/25]** Support vue-based website interface. More flexible and beautiful! <a href="#-frontend">πŸ”— link</a> -->

This project is evolving with new features πŸ‘«πŸ‘«<br/> Don't forget to star ⭐ and watch πŸ‘€ to stay up to date :)

<span id="-quickstart"></span>

πŸ•Ή QuickStart

1. Environment Setup

1.1 Backend Setup

```bash
$ python --version
Python 3.10.12
```

```bash
# Clone the repository
$ git clone https://github.com/TsinghuaDatabaseGroup/DB-GPT.git

# Enter the directory
$ cd DB-GPT

# Install all dependencies
$ pip3 install -r requirements.txt
$ pip3 install -r requirements_api.txt  # if only running the API, installing the dependencies in requirements_api.txt is enough
```

Default dependencies include the basic runtime environment (Chroma vector store). If you want to use another vector store, uncomment the respective dependencies in requirements.txt before installation.
<!-- > You can comment the Llama2Chat/CodeLlamaChat/Baichuan2Chat imports in ./llms/__init__.py if using openai models only -->

If installing google-colab fails, try `conda install -c conda-forge google-colab`.

  1. Download the model parameters of the Sentence Transformer.

Create a new directory `./multiagents/localized_llms/sentence_embedding/`.

Place the downloaded sentence-transformer.zip in the `./multiagents/localized_llms/sentence_embedding/` directory and unzip the archive.

  2. Download LLM and embedding models from Hugging Face.

To download the models, first install Git LFS, then run:

```bash
$ git lfs install
$ git clone https://huggingface.co/moka-ai/m3e-base
$ git clone https://huggingface.co/Qwen/Qwen-1_8B-Chat
```
  3. Adapt the model configuration to the downloaded model paths, e.g.,

```python
EMBEDDING_MODEL = "m3e-base"
LLM_MODELS = ["Qwen-1_8B-Chat"]
MODEL_PATH = {
    "embed_model": {
        "m3e-base": "m3e-base",  # download path of the embedding model
    },

    "llm_model": {
        "Qwen-1_8B-Chat": "Qwen-1_8B-Chat",  # download path of the LLM
    },
}
```
  4. Download and configure localized LLMs.

1.2 Frontend Setup

```bash
$ node -v
v18.15.0
```

Install pnpm and the dependencies:

```bash
cd webui
# pnpm docs: https://pnpm.io/zh/motivation
# install dependencies (pnpm is recommended)
# you can use "npm i -g pnpm" to install pnpm
pnpm install
```

2. Initialize Knowledge Base and Configuration Files

Copy the configuration files:

```bash
$ python copy_config_example.py
```

The generated configuration files are in the configs/ directory:

- basic_config.py: basic configuration file; no modification needed.
- diagnose_config.py: diagnosis configuration file; modify it according to your environment.
- kb_config.py: knowledge base configuration file; you can modify DEFAULT_VS_TYPE to specify the vector store backing the knowledge base, or modify the related paths.
- model_config.py: model configuration file; modify LLM_MODELS to specify the models used. This currently mainly affects knowledge base search; the diagnosis models are still hardcoded in the code and will be unified here later.
- prompt_config.py: prompt configuration file, mainly for LLM dialogue and knowledge base prompts.
- server_config.py: server configuration file, mainly for server port numbers, etc.

!!! Attention: please modify the following configurations before initializing the knowledge base; otherwise, the database initialization may fail.

```python
# EMBEDDING_MODEL   vectorization model; a local model must be downloaded to the root directory as required
# LLM_MODELS        LLMs; a local model must be downloaded to the root directory as required
# ONLINE_LLM_MODEL  modify this configuration if you use an online model
# WEBUI_SERVER.api_base_url   modify this parameter if you deploy the project on a server

# Choose one diagnosis configuration file:
DIAGNOSTIC_CONFIG_FILE = "config.yaml"
DIAGNOSTIC_CONFIG_FILE = "config_feedback.yaml"
DIAGNOSTIC_CONFIG_FILE = "config_qwen.yaml"
```

```bash
$ python init_database.py --recreate-vs
```
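For intuition, `init_database.py --recreate-vs` rebuilds the knowledge base's vector store: documents are split into chunks, each chunk is embedded, and the vectors are persisted. A toy, dependency-free sketch of that pipeline (the hash-based `embed` is purely illustrative; the real pipeline uses the configured EMBEDDING_MODEL and vector library):

```python
import hashlib

def chunk(text: str, size: int = 40):
    """Naive fixed-size splitting; real pipelines split on document structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(chunk_text: str, dim: int = 8):
    """Deterministic toy 'embedding' from a hash digest (illustration only)."""
    digest = hashlib.sha256(chunk_text.encode()).digest()
    return [b / 255 for b in digest[:dim]]

def build_vector_store(docs: dict):
    """Return (doc_name, chunk_text, vector) records for every chunk."""
    store = []
    for name, text in docs.items():
        for c in chunk(text):
            store.append((name, c, embed(c)))
    return store

store = build_vector_store(
    {"tuning.md": "Increase shared_buffers. Check indexes on hot tables."}
)
print(len(store), len(store[0][2]))
```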

3. One-click Start

Start the project with the following command:

```bash
$ python startup.py -a
```

4. Launch Interface Examples

If started correctly, you will see the following interface

  1. FastAPI Docs Interface

  2. Web UI Launch Interface Examples:


πŸ‘©πŸ»β€βš•οΈ Anomaly Diagnosis

<span id="-prerequisites"></span>

1. Prerequisites

Save time by trying out the <a href="#-docker">docker deployment</a>.

<!-- Step 2: Configure environment variables. - Export your OpenAI API key ```bash # macos export OPENAI_API_KEY="your_api_key_here" ``` ```bash # windows set OPENAI_API_KEY="your_api_key_here" ``` --> <!-- Step 2: Add database/anomaly/prometheus settings into [diagnose_config.py](configs/tool_config_example.yaml) and rename into *tool_config.yaml*: ```bash POSTGRESQL: host: 182.92.xxx.x port: 5432 user: xxxx password: xxxxx dbname: postgres DATABASESERVER: server_address: 182.92.xxx.x username: root password: xxxxx remote_directory: /var/lib/pgsql/12/data/log PROMETHEUS: api_url: http://8.131.xxx.xx:9090/ postgresql_exporter_instance: 172.27.xx.xx:9187 node_exporter_instance: 172.27.xx.xx:9100 ``` > *remote_directory* in the DATABASESERVER setting indicates where the slow query log file is located at (<a href="#-prerequisites">link</a>). - If accessing openai service via vpn, execute this command: ```bash # macos export https_proxy=http://127.0.0.1:7890 http_proxy=http://127.0.0.1:7890 all_proxy=socks5://127.0.0.1:7890 ``` - Test your openai key ```bash cd others python openai_test.py ``` --> <!-- Step 4: Download [Sentence Trasformer](https://cloud.tsinghua.edu.cn/f/6e8a3ad547204303a5ae/?dl=1) model parameters - Create new directory *./localized_llms/sentence_embedding* - Move the downloaded sentence-transformer.zip to *./localized_llms/sentence_embedding/* directory, and unzip it. -->

2. Test typical cases

We provide multiple test cases under the test_cases folder. You can select a case file on the front-end page for diagnosis, or use the command line:

```bash
python3 run_diagnose.py --anomaly_file ./test_cases/testing_cases_5.json --config_file config.yaml
```
<!-- - Test single case ```shell python main.py ``` - Test in batch ```shell python batch_main.py ``` -->

<span id="-anomalies"></span>

🎩 Alerts And Anomalies

Alert Management

Check out how to deploy Prometheus and Alertmanager in `prometheus_service_docker`.

<!-- You can find more information about how to configure alertmanager here: [alertmanager.md](https://prometheus.io/docs/alerting/latest/configuration/). - We provide configuration files associated with AlertManager, The value can be [alertmanager.yml](./prometheus_service_docker/alertmanager.yml) and [node_rules.yml](prometheus_service_docker/node_rules) .yml) and [pgsql_rules.yml](prometheus_service_docker/pgsql_rules.yml). You can deploy it to your Prometheus server to retrieve the associated exceptions. - We also provide webhook server that supports getting alerts. The path is a webhook folder in the root directory that you can deploy to your server to get and store Prometheus's alerts in files. - Currently, the alert file is obtained using SSh. You need to configure your server information in the [tool_config.yaml](./config/tool_config_example.yaml) in the config folder. - [node_rules.yml](prometheus_service_docker/node_rules.yml) and [pgsql_rules.yml](prometheus_service_docker/pgsql_rules.yml) is a reference https://github.com/Vonng/pigsty code in this open source project, their monitoring do very well, thank them for their effort. -->

Anomaly Simulation

Script-Triggered Anomalies

We provide scripts that trigger typical anomalies (see the anomalies directory) using highly concurrent operations (e.g., inserts, deletes, updates) in combination with specific test benchmarks.

Single Root Cause Anomalies:

Execute the following command to trigger a single type of anomaly with customized parameters:

```bash
python anomaly_trigger/main.py --anomaly MISSING_INDEXES --threads 100 --ncolumn 20 --colsize 100 --nrow 20000
```

Parameters:

Multiple Root Cause Anomalies:

To trigger anomalies caused by multiple factors, use the following command:

```bash
python anomaly_trigger/multi_anomalies.py
```

Modify the script as needed to simulate different types of anomalies.
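For intuition, a MISSING_INDEXES anomaly boils down to flooding the database with queries that filter on an unindexed column of a wide table. Below is a database-free sketch of the workload such a script might generate, mirroring the `--ncolumn`/`--colsize` flags above; the function and table names are illustrative, and the actual scripts connect to PostgreSQL and run the statements concurrently:

```python
import random
import string

def build_missing_index_workload(ncolumn: int = 20, colsize: int = 100,
                                 nrow: int = 20000, nqueries: int = 50):
    """Generate DDL plus point queries on a column that has no index.

    nrow would control how many rows get inserted before querying; the
    insert phase is omitted in this sketch.
    """
    cols = ", ".join(f"c{i} varchar({colsize})" for i in range(ncolumn))
    ddl = f"CREATE TABLE anomaly_table (id int, {cols});"
    queries = []
    for _ in range(nqueries):
        val = "".join(random.choices(string.ascii_lowercase, k=8))
        # Filtering on c0 (unindexed) forces a sequential scan per query.
        queries.append(f"SELECT * FROM anomaly_table WHERE c0 = '{val}';")
    return ddl, queries

ddl, queries = build_missing_index_workload()
print(len(queries))
```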

| Root Cause | Description | Potential Alerts |
| --- | --- | --- |
| INSERT_LARGE_DATA | Long execution time for large data insert | |
| FETCH_LARGE_DATA | Long execution time for large data fetch | |
| REDUNDANT_INDEX | Unnecessary and redundant indexes in tables | |
| VACUUM | Unused space caused by data modifications | |
| POOR_JOIN_PERFORMANCE | Poor performance of join operators | |
| CORRELATED_SUBQUERY | Non-promotable subqueries in SQL statements | |
| LOCK_CONTENTION | Lock contention issues | |
| CPU_CONTENTION | Severe CPU resource contention | |
| IO_CONTENTION | IO resource contention affecting SQL performance | |
| COMMIT_CONTENTION | Highly concurrent commits affecting SQL execution | |
| SMALL_MEMORY_ALLOC | Too small allocated memory space | |

Check detailed use cases at http://dbgpt.dbmind.cn.

Manually Designed Anomalies

Click to check 29 typical anomalies together with expert analysis (supported by the DBMind team)

<span id="-customize"></span>

πŸ“Ž Customize Knowledge And Tools

<span id="-doc2knowledge"></span>

1. Knowledge Extraction

(Basic version by Zui Chen)

(1) If you only need simple document splitting, you can directly use the document import function in the "Knowledge Base Management Page".

(2) For structured extraction, the document itself must contain chapter heading information; currently only the docx format is supported.

Step 1. Configure the `ROOT_DIR_NAME` path in `./doc2knowledge/doc_to_section.py` and store all docx-format documents in `ROOT_DIR_NAME`.

Step 2. Configure your OPENAI_API_KEY.

```bash
export OPENAI_API_KEY=XXXXX
```

Step 3. Split the document into separate chapter files by chapter index.

```bash
cd doc2knowledge/
python doc_to_section.py
```
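The chapter-splitting idea can be illustrated on plain text: detect numbered headings and group the lines under each chapter index. This sketch is only an analogy; the actual `doc_to_section.py` works on docx structure:

```python
import re

# Matches numbered headings like "2" or "2.1" followed by a title.
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+(.+)$")

def split_into_sections(text: str) -> dict:
    """Map a chapter index like '2.1' to the body lines under that heading."""
    sections, current = {}, None
    for line in text.splitlines():
        m = HEADING.match(line.strip())
        if m:
            current = m.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {k: "\n".join(v).strip() for k, v in sections.items()}

doc = """1 Overview
General tuning advice.
1.1 Memory
Increase shared_buffers carefully.
2 Indexing
Create indexes on frequent predicates."""
print(sorted(split_into_sections(doc)))
```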

Step 4. Modify parameters in the doc2knowledge.py script and run the script:

```bash
python doc2knowledge.py
```

Step 5. With the extracted knowledge, you can visualize the clustering results:

```bash
python knowledge_clustering.py
```
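Clustering groups the extracted knowledge chunks by embedding similarity. A dependency-free sketch using greedy assignment by cosine similarity (the `greedy_cluster` helper is illustrative; the real script presumably uses learned embeddings and a standard clustering algorithm):

```python
import math

def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def greedy_cluster(vectors, threshold: float = 0.8):
    """Assign each vector to the first cluster whose representative is similar enough."""
    clusters = []  # list of (representative_vector, member_indices)
    for i, v in enumerate(vectors):
        for rep, members in clusters:
            if cosine(rep, v) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((v, [i]))  # start a new cluster
    return [members for _, members in clusters]

# Two groups of nearly parallel toy embeddings.
vecs = [(1, 0, 0), (0.9, 0.1, 0), (0, 1, 0), (0, 0.95, 0.05)]
print(greedy_cluster(vecs))
```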

<span id="-tools"></span>

2. Tool Preparation

Index Advisor Tool

We utilize the db2advis heuristic algorithm to recommend indexes for given workloads. The function API is `optimize_index_selection`.
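The flavor of such an advisor can be sketched as: extract candidate (table, column) indexes from workload predicates, then greedily pick the candidates with the best estimated benefit per unit of storage within a budget. Everything below (cost numbers, helper names) is illustrative, not the actual `optimize_index_selection` implementation:

```python
import re

def candidate_indexes(workload):
    """Extract (table, column) candidates from simple equality predicates."""
    pat = re.compile(r"FROM\s+(\w+)\s+WHERE\s+(\w+)\s*=", re.IGNORECASE)
    cands = set()
    for sql in workload:
        m = pat.search(sql)
        if m:
            cands.add((m.group(1), m.group(2)))
    return cands

def select_indexes(benefits, sizes, budget):
    """Greedy selection: highest benefit-per-size first, within a storage budget."""
    ranked = sorted(benefits, key=lambda c: benefits[c] / sizes[c], reverse=True)
    chosen, used = [], 0
    for c in ranked:
        if used + sizes[c] <= budget:
            chosen.append(c)
            used += sizes[c]
    return chosen

workload = [
    "SELECT * FROM orders WHERE customer_id = 42",
    "SELECT * FROM orders WHERE status = 'open'",
    "SELECT * FROM users WHERE email = 'a@b.c'",
]
cands = candidate_indexes(workload)
# Illustrative estimated benefit (cost reduction) and size (MB) per candidate.
benefits = {c: 10.0 for c in cands}
benefits[("orders", "customer_id")] = 50.0
sizes = {c: 20 for c in cands}
print(select_indexes(benefits, sizes, budget=40))
```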

<span id="-docker"></span>

🐳 Docker Start

You can use Docker to quickly and safely set up the monitoring platform and database.

1. Install Docker and Docker-Compose

Refer to tutorials (e.g., for CentOS) to install Docker and Docker-Compose.

2. Start service

We use docker-compose to build and manage multiple containers for metric monitoring (prometheus), alerting (alertmanager), the database (postgres_db), and alert recording (python_app).

```bash
cd prometheus_service_docker
docker-compose -p prometheus_service -f docker-compose.yml up --build
```

The next time you start prometheus_service, you can run "docker-compose -p prometheus_service -f docker-compose.yml up" directly, without rebuilding the containers.

3. Run anomaly files and generate new alerts

Configure the settings in anomaly_trigger/utils/database.py (e.g., replace "host" with the IP address of the server) and execute an anomaly generation command, like:

```bash
cd anomaly_trigger
python3 main.py --anomaly MISSING_INDEXES --threads 100 --ncolumn 20 --colsize 100 --nrow 20000
```

You may need to increase argument values like "--threads 100" if no alert is recorded after execution.

After a request is sent to http://127.0.0.1:8023/alert by prometheus_service, the alert summary will be recorded in prometheus_and_db_docker/alert_history.txt, like:

<p align="center"> <img src="img/example_alert.png" width="800px"> </p>
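The recording step can be sketched as a small handler that turns an Alertmanager webhook payload into one summary line per alert and appends it to alert_history.txt. The payload fields follow the standard Alertmanager webhook format; the summary layout here is illustrative:

```python
import json

def summarize_alerts(payload: dict):
    """Produce one summary line per alert: status, name, instance, start time."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        lines.append("{} | {} | {} | {}".format(
            alert.get("status", "unknown"),
            labels.get("alertname", "?"),
            labels.get("instance", "?"),
            alert.get("startsAt", "?"),
        ))
    return lines

def record(payload: dict, path: str = "alert_history.txt") -> None:
    """Append alert summaries to the history file."""
    with open(path, "a") as f:
        for line in summarize_alerts(payload):
            f.write(line + "\n")

# A minimal payload in the Alertmanager webhook format.
sample = json.loads("""{
  "alerts": [{
    "status": "resolved",
    "labels": {"alertname": "HighCPU", "instance": "172.27.0.5:9100"},
    "startsAt": "2024-01-01T00:00:00Z"
  }]
}""")
print(summarize_alerts(sample)[0])
```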

This way, you can use alerts marked as "resolved" as new anomalies (under the ./diagnostic_files directory) for diagnosis by D-Bot.

<span id="-FAQ"></span>

πŸ’ FAQ

<details><summary><b>🀨 The '.sh' script command cannot be executed on Windows.</b></summary> Switch the shell to *git bash*, or use *git bash* to execute the '.sh' script. </details> <details><summary><b>🀨 "No module named 'xxx'" on Windows.</b></summary> This error is caused by issues with the Python runtime environment path. Perform the following steps:

Step 1: Check Environment Variables.

<div align="center"> <img src="img/faq2.png" width="800px"> </div>

You must add the Python "Scripts" directory to your environment variables.

Step 2: Check IDE Settings.

For VS Code, install the Python extension. For PyCharm, specify the Python interpreter for the current project.

</details>

⏱ Todo

<span id="-community"></span>

πŸ‘« Community

<span id="-projects"></span>

πŸ€— Relevant Projects

https://github.com/OpenBMB/AgentVerse

https://github.com/Vonng/pigsty

https://github.com/UKPLab/sentence-transformers

https://github.com/chatchat-space/Langchain-Chatchat

https://github.com/shreyashankar/spade-experiments

<span id="-citation"></span>

πŸ“’ Citation

Feel free to cite us (paper link) if you like this project.

```bibtex
@misc{zhou2023llm4diag,
      title={D-Bot: Database Diagnosis System using Large Language Models},
      author={Xuanhe Zhou and Guoliang Li and Zhaoyan Sun and Zhiyuan Liu and Weize Chen and Jianming Wu and Jiesi Liu and Ruohang Feng and Guoyang Zeng},
      year={2023},
      eprint={2312.01454},
      archivePrefix={arXiv},
      primaryClass={cs.DB}
}

@article{zhou2023dbgpt,
      title={DB-GPT: Large Language Model Meets Database},
      author={Xuanhe Zhou and Zhaoyan Sun and Guoliang Li},
      journal={Data Science and Engineering},
      year={2023}
}
```

<span id="-contributors"></span>

πŸ“§ Contributors

<!-- Copy-paste in your Readme.md file --> <a href="https://github.com/TsinghuaDatabaseGroup/DB-GPT/network/dependencies"> <img src="https://contrib.rocks/image?repo=TsinghuaDatabaseGroup/DB-GPT" /> </a>

Other Collaborators: Wei Zhou, Kunyi Li.

We thank all the contributors to this project. Do not hesitate if you would like to get involved or contribute!

<span id="-contact"></span>

Contact Information

πŸ‘πŸ»Welcome to our wechat group! If the QR code has expired, you can add the WeChat ID: ming621431. <span style="color: red">Please be sure to note [DB-GPT]!!!</span>

<div align= "center"> <img src="img/dbgpt_group_2024-09-11.jpg" width="400px"> </div> <!-- ## ⭐️ Star History [![Star History Chart](https://api.star-history.com/svg?repos=TsinghuaDatabaseGroup/DB-GPT&type=Date)](https://star-history.com/#TsinghuaDatabaseGroup/DB-GPT&Date) -->