
<div align="center"> <!-- TITLE -->

VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models

</div>

VNHSGE is a dataset for large language models, collected from the Vietnamese National High School Graduation Examination and similar exams.


VNHSGE and other datasets: the performance of ChatGPT and BingChat on the VNHSGE dataset compared with the results on other datasets reported in the GPT-4 Technical Report.


Latest News

<!--- We will update the evaluation set, including 30 essays on literature and 1,700 multiple-choice questions on other subjects [5/30/2023] --> <!-- SETUP -->

Dataset structure

The VNHSGE dataset covers nine subjects, with 300 essays on Literature and 19,000 multiple-choice questions on the other subjects.

| Subject | Type | Questions per exam | Number of exams | Question total |
|---|---|---|---|---|
| Mathematics | Multiple choice | 50 | 50 | 2500 |
| Literature | Essay | 6 | 50 | 300 |
| English | Multiple choice | 50 | 50 | 2500 |
| Physics | Multiple choice | 40 | 50 | 2000 |
| Chemistry | Multiple choice | 40 | 50 | 2000 |
| Biology | Multiple choice | 40 | 50 | 2000 |
| History | Multiple choice | 40 | 50 | 2000 |
| Geography | Multiple choice | 40 | 50 | 2000 |
| Civic Education | Multiple choice | 40 | 50 | 2000 |

Dataset folder

```text
VNHSGE
├── VNHSGE-V                                 # Vietnamese versions
│   ├── JSON format                          # JSON folder
│   │   ├── eval                             # eval set
│   │   │   ├── Mathematics                  # VNHSGE mathematics dataset
│   │   │   │   ├── MET_Math_IE_2023.json    # JSON file
│   │   │   │   ├── MET_Math_IE_2023         # Image folder
│   │   │   ├── ..........
│   │   │   ├── Civic Education              # VNHSGE civic education dataset
│   │   ├── test                             # test set
│   │   ├── train                            # train set
│   └── Word format                          # Word folder
│       ├── eval
│       │   ├── Mathematics                  # VNHSGE mathematics dataset
│       │   │   ├── MET_Math_IE_2023.docx    # Word file
│       │   ├── ..........
│       │   ├── Civic Education              # VNHSGE civic education dataset
└── VNHSGE-E                                 # English versions
```
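Below is a minimal Python sketch for collecting the JSON records of one split. It assumes the folder layout shown above. It also assumes that each `.json` file holds either a single question object or a list of them, because the file-level structure is not documented here. `load_split` is a hypothetical helper, not part of any released VNHSGE tooling. Image folders such as `MET_Math_IE_2023` are skipped because only `*.json` files are read.

```python
import json
from pathlib import Path


def load_split(root: Path, split: str = "eval", version: str = "VNHSGE-V") -> dict[str, list[dict]]:
    """Collect question records per subject from <root>/<version>/JSON format/<split>/<Subject>/*.json.

    Assumption: each JSON file holds either one question object or a list of them.
    """
    split_dir = root / version / "JSON format" / split
    records: dict[str, list[dict]] = {}
    for subject_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        subject_records: list[dict] = []
        # Only *.json files are read, so image folders next to them are ignored.
        for json_file in sorted(subject_dir.glob("*.json")):
            with json_file.open(encoding="utf-8") as f:
                data = json.load(f)
            # Normalize to a list so single-object files and list files are handled alike.
            subject_records.extend(data if isinstance(data, list) else [data])
        records[subject_dir.name] = subject_records
    return records
```

For example, `load_split(Path("VNHSGE"))` would return a dictionary keyed by subject folder name (such as `Mathematics`). Passing `version="VNHSGE-E"` would read the English version, assuming it mirrors the Vietnamese layout.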

Word format

| ID | IQ | Q | C | IA | E |
|---|---|---|---|---|---|
| 1 |  | The volume of a cube with edge 2a is: A. 8a^3 B. 2a^3 C. a^3 D. 6a^3 | A |  | The volume of a cube with edge 2a is: V=(2a)^3=8a^3 |

ID refers to the ID of the question; IQ refers to the images of the question; Q refers to the question content (including the answer options for multiple-choice questions); C refers to the correct answer choice; IA refers to the images of the explanation; and E refers to the explanation content.
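One way to hold a record in Python is a small dataclass keyed by the column names above. This is a hypothetical sketch. The class name `Question` and its attribute names are assumptions made for illustration. So is the rule that whitespace-only image fields (the `" "` values in the example records) mean "no image".

```python
from dataclasses import dataclass


@dataclass
class Question:
    """One VNHSGE record; attribute names are illustrative, source keys are the dataset columns."""
    id: str                  # ID: question identifier
    question_images: str     # IQ: images of the question ("" when there are none)
    question: str            # Q: question content
    answer: str              # C: answer choice, e.g. "A"
    explanation_images: str  # IA: images of the explanation ("" when there are none)
    explanation: str         # E: explanation content

    @classmethod
    def from_record(cls, record: dict) -> "Question":
        # Whitespace-only image fields such as " " are treated as "no image".
        return cls(
            id=record["ID"],
            question_images=record.get("IQ", "").strip(),
            question=record["Q"],
            answer=record.get("C", "").strip(),
            explanation_images=record.get("IA", "").strip(),
            explanation=record.get("E", ""),
        )

    @property
    def has_images(self) -> bool:
        """True if either the question or the explanation references images."""
        return bool(self.question_images or self.explanation_images)
```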

JSON format

```json
{
    "ID": "1",
    "IQ": " ",
    "Q": "1) The volume of a cube with edge 2a is:\nA. 8a^3.\t\nB. 2a^3.\t\nC. a^3.\t\nD. 6a^3.",
    "C": "A",
    "IA": " ",
    "E": "The volume of a cube with edge 2a is: V=(2a)^3=8a^3."
}
```
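In the JSON example, the answer options are embedded in the `Q` string, one per line after the stem. The sketch below splits a question into its stem and options and looks up the text of the keyed answer `C`. It assumes every multiple-choice `Q` follows that `A. `/`B. `/`C. `/`D. ` line layout. Only one example is shown here, so other records may differ. The function name `split_question` is hypothetical.

```python
import json
import re

# Strict-JSON copy of the example record (raw string so the \n and \t escapes reach the JSON parser).
RECORD = r'''{
    "ID": "1",
    "IQ": " ",
    "Q": "1) The volume of a cube with edge 2a is:\nA. 8a^3.\t\nB. 2a^3.\t\nC. a^3.\t\nD. 6a^3.",
    "C": "A",
    "IA": " ",
    "E": "The volume of a cube with edge 2a is: V=(2a)^3=8a^3."
}'''


def split_question(q: str) -> tuple[str, dict[str, str]]:
    """Split a Q string into its stem and a {letter: option text} mapping.

    Assumes each option starts on a new line as "A. ", "B. ", "C. " or "D. ".
    """
    parts = re.split(r"\n(?=[A-D]\.\s)", q)
    # Remove the leading question number such as "1) " from the stem.
    stem = re.sub(r"^\s*\d+\)\s*", "", parts[0]).strip()
    # part[2:] drops the letter and its period; strip() removes surrounding spaces and tabs.
    options = {part[0]: part[2:].strip() for part in parts[1:]}
    return stem, options


record = json.loads(RECORD)
stem, options = split_question(record["Q"])
print(stem)                  # The volume of a cube with edge 2a is:
print(options[record["C"]])  # 8a^3.
```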
<!-- LLMs Performance -->

ChatGPT and Bing AI Chat Performance

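JSON format with model responses

Each record extends the question format with two fields: CC, the answer choice given by the model, and CE, the model's explanation.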

```json
{
    "ID": "1",
    "IQ": " ",
    "Q": "1) The volume of a cube with edge 2a is:\nA. 8a^3.\t\nB. 2a^3.\t\nC. a^3.\t\nD. 6a^3.",
    "C": "A",
    "IA": " ",
    "E": "The volume of a cube with edge 2a is: V=(2a)^3=8a^3.",
    "CC": "A",
    "CE": "The formula for the volume of a cube is V = s^3, where s is the length of one of its sides. Therefore, the volume of the cube with a side length of 2a is: V = (2a)^3 = 8a^3"
}
```
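Given records that carry both the key `C` and the model's answer `CC`, a multiple-choice exam can be scored as the percentage of questions where the two agree. This is an assumed scoring rule for illustration. It is consistent with, for example, the 40-question exams receiving scores in steps of 2.5. It does not cover the Literature essays, and the two records in the usage lines are made-up toy data.

```python
import json


def score_exam(records: list[dict]) -> float:
    """Return the percentage of questions whose model answer (CC) matches the key (C).

    Assumed scoring rule for multiple-choice exams only; Literature essays are not scored this way.
    """
    if not records:
        raise ValueError("no records to score")
    correct = sum(
        1 for r in records
        if r.get("CC", "").strip().upper() == r["C"].strip().upper()
    )
    return 100.0 * correct / len(records)


# Two made-up records in the response format (only the fields used here).
responses = json.loads('[{"ID": "1", "C": "A", "CC": "A"}, {"ID": "2", "C": "B", "CC": "D"}]')
print(score_exam(responses))  # 50.0
```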

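We evaluated the performance of ChatGPT and BingChat on the VNHSGE dataset. The table reports their scores (%) on each year's exam from 2019 to 2023, together with the average (AVG).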

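| Year | Math ChatGPT | Math BingChat | Lit ChatGPT | Lit BingChat | Eng ChatGPT | Eng BingChat | Phy ChatGPT | Phy BingChat | Che ChatGPT | Che BingChat | Bio ChatGPT | Bio BingChat | His ChatGPT | His BingChat | Geo ChatGPT | Geo BingChat | Civ ChatGPT | Civ BingChat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2019 | 52 | 56 | 75 | 52.75 | 76 | 92 | 60 | 55 | 40 | 55 | 60 | 67.5 | 42.5 | 82.5 | 50 | 75 | 60 | 75 |
| 2020 | 66 | 56 | 68.9 | 51.25 | 86 | 96 | 62.5 | 67.5 | 42.5 | 57.5 | 60 | 72.5 | 47.5 | 85 | 52.5 | 70 | 70 | 87.5 |
| 2021 | 60 | 66 | 75 | 60.25 | 76 | 86 | 60 | 67.5 | 62.5 | 50 | 52.5 | 67.5 | 55 | 90 | 75 | 82.5 | 62.5 | 92.5 |
| 2022 | 62 | 60 | 56.3 | 70 | 80 | 94 | 65 | 67.5 | 47.5 | 47.5 | 57.5 | 72.5 | 60 | 92.5 | 62.5 | 85 | 82.5 | 90 |
| 2023 | 54 | 62 | 64.8 | 49.75 | 78 | 94 | 57.5 | 72.5 | 47.5 | 52.5 | 60 | 65 | 77.5 | 92.5 | 67.5 | 85 | 77.5 | 82.5 |
| AVG | 58.8 | 60 | 68 | 56.8 | 79.2 | 92.4 | 61 | 66 | 48 | 52.8 | 58 | 69 | 56.5 | 88.5 | 61.5 | 79.5 | 70.5 | 85.5 |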
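The AVG row appears to be the arithmetic mean of the five exam years (2019 to 2023) in each column. The Python snippet below is a quick check for two columns, with the values copied from the table.

```python
from statistics import mean

# Yearly scores (%) for 2019-2023, copied from two columns of the table.
MATH_CHATGPT = [52, 66, 60, 62, 54]
ENG_BINGCHAT = [92, 96, 86, 94, 94]

# Each average is rounded to one decimal place, as in the AVG row.
print(round(mean(MATH_CHATGPT), 1))  # 58.8
print(round(mean(ENG_BINGCHAT), 1))  # 92.4
```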

For subjects that require complex calculation and reasoning, such as mathematics, physics, chemistry, and biology, the average scores of ChatGPT and BingChat range from 48% to 69%. For subjects that rely more on language skills, such as literature, English, history, geography, and civic education, their average scores range from 56.5% to 92.4%.
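The two ranges quoted above can be read off the AVG row. The sketch below groups the per-subject averages exactly as the paragraph does and takes the minimum and maximum over both models. The `AVG` dictionary is a hand transcription of that row.

```python
# AVG row of the table: subject -> (ChatGPT, BingChat) average score in %.
AVG = {
    "Mathematics": (58.8, 60.0),
    "Literature": (68.0, 56.8),
    "English": (79.2, 92.4),
    "Physics": (61.0, 66.0),
    "Chemistry": (48.0, 52.8),
    "Biology": (58.0, 69.0),
    "History": (56.5, 88.5),
    "Geography": (61.5, 79.5),
    "Civic Education": (70.5, 85.5),
}

REASONING = ["Mathematics", "Physics", "Chemistry", "Biology"]
LANGUAGE = ["Literature", "English", "History", "Geography", "Civic Education"]


def score_range(subjects: list[str]) -> tuple[float, float]:
    """Lowest and highest average score over both models for the given subjects."""
    scores = [s for subject in subjects for s in AVG[subject]]
    return min(scores), max(scores)


print(score_range(REASONING))  # (48.0, 69.0)
print(score_range(LANGUAGE))   # (56.5, 92.4)
```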


<!-- CITATION -->

Citation

If you find this work useful for your research, please feel free to use the dataset, and please cite our papers:

```bibtex
@article{xuan2023vnhsge,
  title={VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models},
  author={Dao, Xuan-Quy and Le, Ngoc-Bich and Vo, The-Duy and Phan, Xuan-Dung and Ngo, Bac-Bien and Nguyen, Van-Tien and Nguyen, Thi-My-Thanh and Nguyen, Hong-Phuoc},
  journal={arXiv preprint arXiv:2305.12199},
  year={2023}
}

@article{dao2023can,
  title={Can ChatGPT pass the Vietnamese National High School Graduation Examination?},
  author={Dao, Xuan-Quy and Le, Ngoc-Bich and Phan, Xuan-Dung and Ngo, Bac-Bien},
  journal={arXiv preprint arXiv:2306.09170},
  year={2023}
}

@article{dao2023chatgpt,
  title={ChatGPT is Good but Bing Chat is Better for Vietnamese Students},
  author={Dao, Xuan-Quy and Le, Ngoc-Bich and Phan, Xuan-Dung and Ngo, Bac-Bien},
  journal={arXiv preprint arXiv:2307.08272},
  year={2023}
}
```