
A Survey on the Honesty of Large Language Models

<img src="https://img.shields.io/badge/Version-1.0-blue.svg" alt="Version"> License: MIT

This repository offers a comprehensive collection of papers on the honesty of LLMs, covering what honesty means for an LLM, how it is evaluated, and strategies for improving it. Dive deeper into these studies by reading our in-depth survey: A Survey on the Honesty of Large Language Models.

Table of Contents

🌟 Honesty in LLMs

What is Honesty in LLMs

<div align="center"> <img src="./assets/main_figure.jpg"> <p><em>Figure 1: An illustration of an honest LLM that demonstrates both self-knowledge and self-expression.</em></p> </div>

In this paper, we consider an LLM to be honest if it fulfills these two widely accepted criteria: <i>possessing both self-knowledge and self-expression</i>. Self-knowledge involves the model being aware of its own capabilities, recognizing what it knows and what it doesn’t, allowing it to acknowledge limitations or convey uncertainty when necessary. Self-expression refers to the model’s ability to faithfully express its knowledge, leading to reliable outputs. An illustrated example is shown in Fig. 1.

Self-knowledge

The self-knowledge capacity of LLMs hinges on their ability to recognize what they know and what they don’t know. This enables them to explicitly state “I don’t know” when they lack the necessary knowledge, thereby avoiding incorrect statements. It also allows them to provide confidence or uncertainty indicators in their responses to reflect the likelihood of correctness.
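
As a toy illustration of how such an uncertainty indicator can be derived (a minimal sketch under simplifying assumptions, not the method of any specific paper listed here; `samples` stands in for answers drawn from repeated LLM sampling), one common training-free recipe is to use agreement among sampled answers as a confidence score and abstain when it falls below a threshold:

```python
from collections import Counter

def aggregate_confidence(samples, threshold=0.6):
    """Majority-vote answer plus a frequency-based confidence score.

    `samples` holds answers drawn from the same prompt at non-zero
    temperature; agreement among them serves as a crude proxy for
    the model's certainty. Below `threshold`, the model abstains.
    """
    counts = Counter(samples)
    answer, votes = counts.most_common(1)[0]
    confidence = votes / len(samples)
    if confidence < threshold:
        return "I don't know", confidence
    return answer, confidence

# Hypothetical samples for one question:
print(aggregate_confidence(["Paris", "Paris", "Paris", "Lyon", "Paris"]))
# high agreement -> ("Paris", 0.8)
```

With mostly agreeing samples the model answers with high confidence; with scattered samples it falls back to “I don’t know”, mirroring the behavior described above.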

Self-expression

Self-expression refers to the model’s ability to faithfully express its knowledge, whether parametric knowledge acquired during training or in-context knowledge provided at inference time. This enables the model to ground its responses in its knowledge rather than fabricating information.
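
As a deliberately crude sketch of what checking faithfulness to in-context knowledge can look like (the overlap heuristic and threshold are illustrative assumptions; the evaluation approaches surveyed below are far more sophisticated), one can flag response sentences that share little content with the provided context:

```python
def unsupported_sentences(context, response, min_overlap=0.5):
    """Flag response sentences with low lexical overlap with the context.

    A crude proxy for faithfulness: a sentence grounded in the
    in-context knowledge should share most of its content words
    (here, words longer than 3 characters) with that context.
    """
    ctx_words = set(context.lower().split())
    flagged = []
    for sentence in response.split("."):
        words = [w for w in sentence.lower().split() if len(w) > 3]
        if not words:
            continue
        overlap = sum(w in ctx_words for w in words) / len(words)
        if overlap < min_overlap:
            flagged.append(sentence.strip())
    return flagged

ctx = "The Eiffel Tower was completed in 1889 in Paris"
resp = "The Eiffel Tower was completed in 1889. It stands beside the Colosseum"
print(unsupported_sentences(ctx, resp))
# the second, fabricated sentence is flagged
```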

📈 Evaluation of LLM Honesty

Self-knowledge

<div align="center"> <img src="./assets/evaluation_self_knowledge.jpg"> <p><em>Figure 2: Illustrations of self-knowledge evaluation, encompassing the recognition of known/unknown, calibration, and selective prediction. “Conf” indicates the LLM’s confidence score and “Acc” represents the accuracy of the response.</em></p> </div>

Recognition of Known/Unknown

Calibration
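
Calibration asks how well the “Conf” scores in Figure 2 track the “Acc” of the corresponding responses. A widely used metric is expected calibration error (ECE); the minimal sketch below (bin count and inputs are illustrative) partitions predictions into confidence bins and averages the gap between mean confidence and accuracy, weighted by bin size:

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence, then take the size-weighted
    average of |mean confidence - accuracy| over the non-empty bins."""
    assert len(confidences) == len(correct)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 -> last bin
        bins[idx].append((conf, ok))
    ece, total = 0.0, len(confidences)
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(ok for _, ok in bucket) / len(bucket)
        ece += len(bucket) / total * abs(avg_conf - accuracy)
    return ece
```

A perfectly calibrated model (e.g., 90% of its 0.9-confidence answers are correct) scores an ECE of zero; systematic over- or under-confidence inflates it.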

Selective Prediction
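
Selective prediction evaluates the trade-off a model makes when it answers only if its confidence clears a threshold. A minimal sketch of the two basic quantities, coverage and selective accuracy (the scores and labels below are made up for illustration):

```python
def selective_metrics(confidences, correct, threshold):
    """Coverage (fraction of questions answered) and accuracy on the
    answered subset, when the model abstains below `threshold`."""
    answered = [ok for conf, ok in zip(confidences, correct)
                if conf >= threshold]
    coverage = len(answered) / len(confidences)
    accuracy = sum(answered) / len(answered) if answered else 0.0
    return coverage, accuracy

# Raising the threshold trades coverage for accuracy:
print(selective_metrics([0.9, 0.8, 0.4, 0.2], [True, True, False, True], 0.5))
# -> (0.5, 1.0)
```

Sweeping the threshold traces out a risk–coverage curve, which is how selective prediction is commonly reported.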

Self-expression

<div align="center"> <img src="./assets/evaluation_self_expression.jpg"> <p><em>Figure 3: Illustrations of self-expression evaluation, encompassing both identification-based and identification-free approaches.</em></p> </div>

Identification-based Evaluation

Identification-free Evaluation

🚀 Improvement of Self-knowledge

<div align="center"> <img src="./assets/improvement_self_knowledge.jpg"> <p><em>Figure 4: Improvement of self-knowledge, encompassing both training-based and training-free approaches.</em></p> </div>

Training-free Approaches

Predictive Probability

Prompting

Sampling and Aggregation

Training-based Approaches

Supervised Fine-tuning

Reinforcement Learning

Probing

🚀 Improvement of Self-expression

<div align="center"> <img src="./assets/improvement_self_expression.jpg"> <p><em>Figure 5: Improvement of self-expression, encompassing both training-based and training-free approaches.</em></p> </div>

Training-free Approaches

Prompting

Decoding-time Intervention

Sampling and Aggregation

Post-generation Revision

Training-based Approaches

Self-aware Fine-tuning

Self-supervised Fine-tuning

📌 Citation

If you find this resource valuable for your research, we would appreciate it if you could cite our paper. Thank you!

@article{li2024survey,
      title={A Survey on the Honesty of Large Language Models},
      author={Siheng Li and Cheng Yang and Taiqiang Wu and Chufan Shi and Yuji Zhang and Xinyu Zhu and Zesen Cheng and Deng Cai and Mo Yu and Lemao Liu and Jie Zhou and Yujiu Yang and Ngai Wong and Xixin Wu and Wai Lam},
      year={2024},
      journal={arXiv preprint arXiv:2409.18786}
}