nvBench: Natural Language to Visualization (NL2VIS) Benchmarks

nvBench is a large dataset for complex and cross-domain NL2VIS tasks. It covers 105 domains, supports seven common types of visualizations, and contains 25,750 (NL, VIS) pairs. This repository contains the NL2VIS corpus in JSON format and Vega-Lite format.

Introduction to nvBench

nvBench.json

(NL, VIS) JSON format

Each (NL, VIS) pair is denoted as a JSON object in NVBench.json, keyed by its ID, with the fields vis_query, chart, hardness, db_id, vis_obj, and nl_queries.

Below is an example:

    "8": {
        "vis_query": {
            "vis_part": "Visualize PIE",
            "data_part": {
                "sql_part": "SELECT Rank , COUNT(Rank) FROM Faculty GROUP BY Rank",
                "binning": ""
            },
            "VQL": "Visualize PIE SELECT Rank , COUNT(Rank) FROM Faculty GROUP BY Rank"
        },
        "chart": "Pie",
        "hardness": "Easy",
        "db_id": "activity_1",
        "vis_obj": {
            "chart": "pie",
            "x_name": "Rank",
            "y_name": "CNT(Rank)",
            "x_data": [
                [
                    "AssocProf",
                    "AsstProf",
                    "Instructor",
                    "Professor"
                ]
            ],
            "y_data": [
                [
                    8,
                    15,
                    8,
                    27
                ]
            ],
            "classify": [],
            "describe": "GROUP BY Rank"
        },
        "nl_queries": [
            "A pie chart showing the number of faculty members for each rank.",
            "What is the number of the faculty members for each rank? Return a pie.",
            "Compute the total the number of rank across rank as a pie chart."
        ]
    }
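
The entry above can be read with any JSON parser. The following is a minimal Python sketch, not part of the nvBench tooling: it embeds a trimmed copy of the example entry instead of reading the corpus file, and the helper names `chart_rows` and `vql_is_consistent` are hypothetical. It assumes that `x_data` and `y_data` each hold one list per data series, and that `VQL` is `vis_part` followed by `sql_part` when `binning` is empty. Both assumptions hold for this one example and may not hold for every entry.

```python
import json

# Trimmed copy of the example entry shown above (only one NL query kept).
SAMPLE = """
{
  "8": {
    "vis_query": {
      "vis_part": "Visualize PIE",
      "data_part": {
        "sql_part": "SELECT Rank , COUNT(Rank) FROM Faculty GROUP BY Rank",
        "binning": ""
      },
      "VQL": "Visualize PIE SELECT Rank , COUNT(Rank) FROM Faculty GROUP BY Rank"
    },
    "chart": "Pie",
    "hardness": "Easy",
    "db_id": "activity_1",
    "vis_obj": {
      "chart": "pie",
      "x_name": "Rank",
      "y_name": "CNT(Rank)",
      "x_data": [["AssocProf", "AsstProf", "Instructor", "Professor"]],
      "y_data": [[8, 15, 8, 27]],
      "classify": [],
      "describe": "GROUP BY Rank"
    },
    "nl_queries": [
      "A pie chart showing the number of faculty members for each rank."
    ]
  }
}
"""


def chart_rows(entry):
    """Pair each x value with its y value for the first data series.

    Hypothetical helper: assumes x_data[0] and y_data[0] are aligned lists.
    """
    vis = entry["vis_obj"]
    return list(zip(vis["x_data"][0], vis["y_data"][0]))


def vql_is_consistent(entry):
    """Check that VQL equals vis_part + sql_part (assumed only when binning is empty)."""
    query = entry["vis_query"]
    data_part = query["data_part"]
    if data_part["binning"]:
        return True  # the binning syntax is not covered by this sketch
    return query["VQL"] == query["vis_part"] + " " + data_part["sql_part"]


corpus = json.loads(SAMPLE)
for entry_id, entry in corpus.items():
    print(entry_id, entry["chart"], entry["hardness"], chart_rows(entry))
```

To run the same loop over the full corpus, replace `json.loads(SAMPLE)` with `json.load` on an open handle to the corpus file.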

Citation

When you use the nvBench dataset and the corresponding baseline models, we would appreciate it if you cite the following:

@inproceedings{nvBench_SIGMOD21,
  author    = {Yuyu Luo and
               Nan Tang and
               Guoliang Li and
               Chengliang Chai and
               Wenbo Li and
               Xuedi Qin},
  title     = {Synthesizing Natural Language to Visualization (NL2VIS) Benchmarks from NL2SQL Benchmarks},
  booktitle = {Proceedings of the 2021 International Conference on Management of
               Data, {SIGMOD} Conference 2021, June 20–25, 2021, Virtual Event, China},
  publisher = {{ACM}},
  year      = {2021},
}

NL2VIS Baselines

Please adapt the Seq2Seq Baselines at the Spider repository. Replace the data preprocessing part and feed the (NL, VIS) pairs of nvBench for training and testing.
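
As a rough illustration of the preprocessing step, the sketch below flattens nvBench entries into (source, target) pairs for a sequence-to-sequence model. It is an assumption-laden sketch, not the Spider baseline's actual input format: using the `VQL` string as the target, the helper names `to_pairs` and `split_pairs`, and the random split are all choices made for this example. A random split over pairs also places paraphrases of the same query on both sides, so a real experiment would likely split by entry or by database.

```python
import random

# Minimal in-memory corpus with the fields used here, copied from the example entry.
corpus = {
    "8": {
        "vis_query": {
            "VQL": "Visualize PIE SELECT Rank , COUNT(Rank) FROM Faculty GROUP BY Rank"
        },
        "nl_queries": [
            "A pie chart showing the number of faculty members for each rank.",
            "What is the number of the faculty members for each rank? Return a pie.",
            "Compute the total the number of rank across rank as a pie chart.",
        ],
    }
}


def to_pairs(corpus):
    """Flatten entries into (NL query, VQL string) tuples, one per NL query."""
    pairs = []
    for entry_id in sorted(corpus):
        entry = corpus[entry_id]
        target = entry["vis_query"]["VQL"]  # assumed training target
        for nl in entry["nl_queries"]:
            pairs.append((nl, target))
    return pairs


def split_pairs(pairs, test_ratio=0.2, seed=0):
    """Shuffle a copy of the pairs with a fixed seed and return (train, test)."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


pairs = to_pairs(corpus)
train, test = split_pairs(pairs, test_ratio=0.5)
print(len(pairs), len(train), len(test))
```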

Publications

For more details, please refer to our research paper.

Contributors

| # | Contributor | Affiliation | Contact |
|---|-------------|-------------|---------|
| 1 | Guoliang Li | Professor, Tsinghua University | LastName+FirstName@tsinghua.edu.cn |
| 2 | Nan Tang | Senior Scientist, Qatar Computing Research Institute | ntang@hbku.edu.qa |
| 3 | Yuyu Luo | PhD Student, Tsinghua University | luoyy18@mails.tsinghua.edu.cn |

If you have any questions or feedback about this project, please feel free to contact Yuyu Luo (luoyy18@mails.tsinghua.edu.cn).

License

nvBench is available under the MIT license.