Awesome-Education-LLM

Large Language Models (LLMs) are increasingly prevalent in every aspect of our lives, and their impact on education is particularly noteworthy. While some instructors are embracing the new technology to increase learning efficiency, others are concerned about potential negative impacts such as over-reliance. To help instructors, students, and AI researchers keep track of the latest progress in AI applications in education, we have curated a reading list on this topic in this repo.

LLM-Assisted Teaching
LLM for Exercise Generation
LLM as Evaluator
AIGC Detection in Education
Education-related LLM Benchmarks
- Programming
- Others

LLM-Assisted Teaching

Computer Science

"The Robots Are Coming: Exploring the Implications of OpenAI Codex on Introductory Programming" [ACE 2022] [2022-02] [paper]
"Experiences from Using Code Explanations Generated by Large Language Models in a Web Software Development E-Book" [SIGCSE 2023] [2022-11] [paper]
"ChatGPT and Software Testing Education: Promises & Perils" [ICSTW 2023] [2023-02] [paper]
"Studying the effect of AI Code Generators on Supporting Novice Learners in Introductory Programming" [CHI 2023] [2023-02] [paper]
"ChatGPT for Education and Research: Opportunities, Threats, and Strategies" [Applied Sciences] [2023-05] [paper]
"Exploring the Responses of Large Language Models to Beginner Programmers' Help Requests" [ICER 2023] [2023-06] [paper]
"Learning from Teaching Assistants to Program with Subgoals: Exploring the Potential for AI Teaching Assistants" [2023-09] [paper]
"Teaching CS50 with AI: Leveraging Generative Artificial Intelligence in Computer Science Education" [SIGCSE 2024] [2024-03] [paper]
"An Exploratory Study on Upper-Level Computing Students' Use of Large Language Models as Tools in a Semester-Long Project" [ASEE 2024] [2024-03] [paper]
"Exploring How Multiple Levels of GPT-Generated Programming Hints Support or Disappoint Novices" [CHI 2024] [2024-04] [paper]
"AI-Tutoring in Software Engineering Education" [2024-04] [paper]
"Enhancing Educational Efficiency: Generative AI Chatbots and DevOps in Education 4.0" [2024-04] [paper]
"CS1-LLM: Integrating LLMs into CS1 Instruction" [2024-04] [paper]
"The Widening Gap: The Benefits and Harms of Generative AI for Novice Programmers" [ICER 2024] [2024-05] [paper]
"Insights from Social Shaping Theory: The Appropriation of Large Language Models in an Undergraduate Programming Course" [2024-06] [paper]
"Estimating Difficulty Levels of Programming Problems with Pre-trained Model" [2024-06] [paper]
"Enhancing Computer Programming Education with LLMs: A Study on Effective Prompt Engineering for Python Code Generation" [2024-07] [paper]

Writing

"ChatGPT User Experience: Implications for Education" [2022-12] [paper]

Math

"Learning gain differences between ChatGPT and human tutor generated algebra hints" [2023-02] [paper]

Medicine

"How Does ChatGPT Perform on the Medical Licensing Exams? The Implications of Large Language Models for Medical Education and Knowledge Assessment" [2022-12] [paper]
"Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models" [PLOS Digit Health] [2023-02] [paper]
"LLM-empowered Chatbots for Psychiatrist and Patient Simulation: Application and Evaluation" [2023-05] [paper]
"Simulation-Based Education in the Artificial Intelligence Era" [2023-06] [paper]
"PATIENT-Ψ: Using Large Language Models to Simulate Patients for Training Mental Health Professionals" [2024-05] [paper]

Finance

"Exploring the Role of Artificial Intelligence in Enhancing Academic Performance: A Case Study of ChatGPT" [2023-01] [paper]

Social Skills

"Social Skill Training with Large Language Models" [2024-04] [paper]

Teacher Training

"Using Large Language Models to Provide Explanatory Feedback to Human Tutors" [2023-06] [paper]
"GPTeach: Interactive TA Training with GPT-based Students" [L@S 2023] [2023-07] [paper]

Miscellaneous

"The AI Teacher Test: Measuring the Pedagogical Ability of Blender and GPT-3 in Educational Dialogues" [EDM 2022] [2022-05] [paper]
"A Systematic Review of Generative AI for Teaching and Learning Practice" [2024-06] [paper]
"Simulating Classroom Education with LLM-Empowered Agents" [2024-06] [paper]
"Educational Personalized Learning Path Planning with Large Language Models" [2024-07] [paper]

LLM for Exercise Generation

Language

"EQG-RACE: Examination-Type Question Generation" [AAAI 2021] [2020-12] [paper]
"Question Generation for Reading Comprehension Assessment by Modeling How and What to Ask" [ACL Findings 2022] [2022-04] [paper]
"ReadingQuizMaker: A Human-NLP Collaborative System that Supports Instructors to Design High-Quality Reading Quiz Questions" [CHI 2023] [2023-04] [paper]
"Student Answer Forecasting: Transformer-Driven Answer Choice Prediction for Language Learning" [EDM 2024] [2024-05] [paper]
"Generating Educational Materials with Different Levels of Readability using LLMs" [2024-06] [paper]

Computer Science

"Automatic Generation of Programming Exercises and Code Explanations using Large Language Models" [ICER 2022] [2022-06] [paper]
"Quiz Maker: Automatic Quiz Generation from Text Using NLP" [FTNCT] [2022-11] [paper]
"A Survey Study on the State of the Art of Programming Exercise Generation using Large Language Models" [CSEE&T 2024] [2024-05] [paper]
"Evaluating Contextually Personalized Programming Exercises Created with Generative AI" [2024-07] [paper]

Math

"A Multi-language Platform for Generating Algebraic Mathematical Word Problems" [ICIIS 2019] [2019-11] [paper]
"Generative AI for Enhancing Active Learning in Education: A Comparative Study of GPT-3.5 and GPT-4 in Crafting Customized Test Questions" [2024-06] [paper]

Medicine

"CliniQG4QA: Generating Diverse Questions for Domain Adaptation of Clinical Question Answering" [BIBM 2021] [2020-10] [paper]
"LLM-Generated Multiple Choice Practice Quizzes for Pre-Clinical Medical Students; Use and Validity" [2024-05] [paper]

Other Disciplines

"Towards Human-Like Educational Question Generation with Large Language Models" [AIED 2022] [2022-07] [paper]
"Scalable Educational Question Generation with Pre-trained Language Models" [AIED 2023] [2023-05] [paper]
"Exploring the Capabilities of Prompted Large Language Models in Educational and Assessment Applications" [EDM 2024] [2024-05] [paper]
"How Effective is GPT-4 Turbo in Generating School-Level Questions from Textbooks Based on Bloom's Revised Taxonomy?" [2024-06] [paper]

LLM as Evaluator

Writing Evaluation

"Automated Essay Scoring based on Two-Stage Learning" [2019-01] [paper]
"Language models and Automated Essay Scoring" [2019-09] [paper]
"Should You Fine-Tune BERT for Automated Essay Scoring?" [BEA 2020] [2020-07] [paper]
"Domain-Adaptive Neural Automated Essay Scoring" [SIGIR 2020] [2020-07] [paper]
"Enhancing Automated Essay Scoring Performance via Fine-tuning Pre-trained Language Models with Combination of Regression and Ranking" [EMNLP Findings 2020] [2020-11] [paper]
"Automated essay scoring: A review of the field" [CITS 2021] [2021-11] [paper]
"On the Use of BERT for Automated Essay Scoring: Joint Learning of Multi-Scale Essay Representation" [NAACL 2022] [2022-05] [paper]
"Exploring the potential of using an AI language model for automated essay scoring" [Research Methods in Appl. Ling.] [2023-04] [paper]
"Automated evaluation of written discourse coherence using GPT-4" [BEA 2023] [2023-07] [paper]
"Rating Short L2 Essays on the CEFR Scale with GPT-4" [BEA 2023] [2023-07] [paper]
"Is GPT-4 a reliable rater? Evaluating Consistency in GPT-4 Text Ratings" [2023-08] [paper]
"FABRIC: Automated Scoring and Feedback Generation for Essays" [2023-10] [paper]
"From Automation to Augmentation: Large Language Models Elevating Essay Scoring Landscape" [2024-01] [paper]
"Can Language Models Evaluate Human Written Text? Case Study on Korean Student Writing for Education" [2024-07] [paper]

Math Evaluation

"Automatic Short Math Answer Grading via In-context Meta-learning" [2022-05] [paper]
"Algebra Error Classification with Large Language Models" [2023-05] [paper]
"Modeling and Analyzing Scorer Preferences in Short-Answer Math Questions" [2023-06] [paper]
"Automatically Detecting Incoherent Written Math Answers of Fourth-Graders" [Systems 2023] [2023-07] [paper]
"GPT-4 in Education: Evaluating Aptness, Reliability, and Loss of Coherence in Solving Calculus Problems and Grading Submissions" [2024-05] [paper]
"LLMs can Find Mathematical Reasoning Mistakes by Pedagogical Chain-of-Thought" [2024-05] [paper]

Programming Evaluation

"Generating Feedback-Ladders for Logical Errors in Programming using Large Language Models" [2024-05] [paper]
"Evaluating Language Models for Generating and Judging Programming Feedback" [2024-07] [paper]

Other Subjects

[Astronomy] "Grading Massive Open Online Courses Using Large Language Models" [2024-06] [paper]
[AI] "Large Language Model as an Assignment Evaluator: Insights, Feedback, and Challenges in a 1000+ Student Course" [2024-07] [paper]
[Pronunciation] "Pronunciation Assessment with Multi-modal Large Language Models" [2024-07] [paper]

AIGC Detection in Education

"Chatting and cheating: Ensuring academic integrity in the era of ChatGPT" [IETI] [2023-03] [paper]
"ArguGPT: evaluating, understanding and identifying argumentative essays generated by GPT models" [2023-04] [paper]
"Comparing scientific abstracts generated by ChatGPT to real abstracts with detectors and blinded human reviewers" [npj Digital Medicine] [2023-04] [paper]
"Perception, performance, and detectability of conversational artificial intelligence across 32 university courses" [2023-05] [paper]
"Ghostbuster: Detecting Text Ghostwritten by Large Language Models" [2023-05] [paper]
"On the Detectability of ChatGPT Content: Benchmarking, Methodology, and Evaluation through the Lens of Academic Writing" [2023-06] [paper]
"Hidding the Ghostwriters: An Adversarial Evaluation of AI-Generated Student Essay Detection" [EMNLP 2023] [2023-12] [paper]
"Do teachers spot AI? Evaluating the detectability of AI-generated texts among student essays" [Computers and Education: AI] [2024-01] [paper]
"Delving into ChatGPT usage in academic writing through excess vocabulary" [2024-06] [paper]

Education-related LLM Benchmarks

Programming

"Benchmarking Educational Program Repair" [2024-05] [paper]
"Hints-In-Browser: Benchmarking Language Models for Programming Feedback Generation" [2024-06] [paper]
"CREF: An LLM-based Conversational Software Repair Framework for Programming Tutors" [2024-06] [paper]

Others

"EduNLP: Towards a Unified and Modularized Library for Educational Resources" [2024-06] [paper]
