Awesome Language Model Analysis

This paper list focuses on the theoretical and empirical analysis of language models, especially large language models (LLMs). The papers collected here investigate the learning behavior, generalization ability, and other properties of language models through theoretical analysis, empirical analysis, or a combination of both.

Scope of this list:

Limitations of this list:

Statistics of this list:

If you have any suggestions or want to contribute, please feel free to open an issue or a pull request.

For details on how to contribute, please refer to the contribution guidelines.

You can also share your thoughts and discuss with others in the Discussions.

> [!NOTE]
> For the uncategorized version, please refer to here.

Table of Contents

<!--ts-->
<!--te-->

Phenomena of Interest

^ back to top ^

Categories focusing on different phenomena, properties, and behaviors observed in large language models (LLMs) and transformer-based models.

In-Context Learning

^ back to top ^

Papers focusing on the theoretical and empirical analysis of in-context learning in large language models.

<details open> <summary><em>paper list (click to fold / unfold)</em></summary> <br> </details>
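
For readers new to the topic, the sketch below illustrates the setup these papers study: the model is shown input-output exemplars in its prompt and must infer the task without any weight update. The task and exemplars are only illustrative, and no real model or API is called.

```python
# Minimal illustration of in-context learning (ICL): a pretrained LM is
# conditioned on a few input-output exemplars and must infer the task
# (here, English -> French translation) purely from the prompt.
exemplars = [
    ("sea otter", "loutre de mer"),
    ("cheese", "fromage"),
]

def build_icl_prompt(pairs, query):
    """Concatenate exemplars and the query into a single few-shot prompt."""
    blocks = [f"English: {x}\nFrench: {y}" for x, y in pairs]
    blocks.append(f"English: {query}\nFrench:")
    return "\n\n".join(blocks)

print(build_icl_prompt(exemplars, "house"))
# Conditioned on this prompt, the model is expected to continue with
# "maison" -- no gradient step ever sees the translation task.
```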

Chain-of-Thought

^ back to top ^

Papers analyzing the chain-of-thought phenomenon in large language models, exploring theoretical and empirical perspectives.

<details open> <summary><em>paper list (click to fold / unfold)</em></summary> <br> </details>
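
To make the phenomenon concrete, the sketch below contrasts a direct-answer exemplar with a chain-of-thought exemplar; the arithmetic example is adapted from the standard illustration in the CoT literature, and no model is called.

```python
# Chain-of-thought (CoT) prompting: the exemplar demonstrates intermediate
# reasoning steps rather than just the final answer, nudging the model to
# emit a step-by-step derivation before it answers.
question = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?"
)
direct_exemplar = question + "\nA: The answer is 11."
cot_exemplar = (
    question
    + "\nA: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. "
    "5 + 6 = 11. The answer is 11."
)

# Zero-shot CoT drops the exemplars entirely and appends a generic trigger:
zero_shot_suffix = "A: Let's think step by step."
print(cot_exemplar)
```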

Hallucination

^ back to top ^

Papers examining the hallucination phenomenon in language models, including both theoretical and empirical analysis.

<details open> <summary><em>paper list (click to fold / unfold)</em></summary> <br> </details>

Reversal Curse

^ back to top ^

Papers that analyze the reversal curse phenomenon in large language models, i.e., the observation that a model trained on statements of the form "A is B" often fails to infer the reverse, "B is A".

<details open> <summary><em>paper list (click to fold / unfold)</em></summary> <br> </details>

Scaling Laws / Emergent Abilities / Grokking / etc.

^ back to top ^

Papers exploring how model performance scales with model size, data size, or compute, the emergence of unexpected abilities, and grokking (generalization that appears long after the training data has been fit).

<details open> <summary><em>paper list (click to fold / unfold)</em></summary> <br> </details>
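
As one canonical example of the quantitative relationships these papers study, the parametric loss form fitted by Hoffmann et al. (2022) (the "Chinchilla" analysis) models the pretraining loss as a function of parameter count $N$ and training tokens $D$:

```math
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

Here $E$ is the irreducible loss and $A$, $B$, $\alpha$, $\beta$ are fitted constants. Loosely speaking, work on emergent abilities and grokking asks when and why such smooth fits fail to describe what models can actually do.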

Knowledge / Memory Mechanisms

^ back to top ^

Papers focusing on how large language models store, retrieve, and utilize knowledge, analyzing the memory mechanisms involved.

<details open> <summary><em>paper list (click to fold / unfold)</em></summary> <br> </details>
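
One influential framing in this category, due to Geva et al. (2021), reads transformer feed-forward layers as key-value memories; schematically,

```math
\mathrm{FFN}(x) = f\left(x K^{\top}\right) V
```

where the rows of $K$ act as keys matched against the input $x$, the rows of $V$ as stored values ("memories"), and the activation pattern $f(xK^{\top})$ as the mixing weights. This is offered only as one representative lens, not a summary of the whole category.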

Training Dynamics / Landscape / Optimization / Fine-tuning / etc.

^ back to top ^

Papers discussing various aspects of the training process, including optimization, fine-tuning, and the training landscape of large language models.

<details open> <summary><em>paper list (click to fold / unfold)</em></summary> <br> </details>

Learning / Generalization / Reasoning / Weak to Strong Generalization

^ back to top ^

Papers analyzing the learning capabilities and generalization performance of language models, including weak-to-strong generalization, where supervision from a weaker model is used to train a stronger one.

<details open> <summary><em>paper list (click to fold / unfold)</em></summary> <br> </details>

Other Phenomena / Discoveries

^ back to top ^

Papers discussing other interesting phenomena or discoveries related to the behavior and properties of language models.

<details open> <summary><em>paper list (click to fold / unfold)</em></summary> <br> </details>

Representational Capacity

^ back to top ^

Categories focused on the representational capacities and limitations of transformers and language models.

What Can Transformers Do? / Properties of Transformers

^ back to top ^

Papers providing positive results on the capabilities and properties of transformer-based models, e.g., expressiveness and learnability.

<details open> <summary><em>paper list (click to fold / unfold)</em></summary> <br> </details>

What Can Transformers Not Do? / Limitations of Transformers

^ back to top ^

Papers investigating the limitations of transformer-based models, including constraints on expressiveness and learning, e.g., limits on reasoning.

<details open> <summary><em>paper list (click to fold / unfold)</em></summary> <br> </details>

Architectural Effectiveness

^ back to top ^

Categories analyzing different architectural components and their effects in transformer models.

Layer Normalization

^ back to top ^

Papers discussing the role, effects, and optimization of layer normalization in transformer models.

<details open> <summary><em>paper list (click to fold / unfold)</em></summary> <br> </details>
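
For reference, the operation these papers dissect is, for a single $d$-dimensional input vector $x$:

```math
\mathrm{LN}(x) = \gamma \odot \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta,
\qquad
\mu = \frac{1}{d}\sum_{i=1}^{d} x_i,
\quad
\sigma^2 = \frac{1}{d}\sum_{i=1}^{d} (x_i - \mu)^2
```

with learned per-dimension scale $\gamma$ and shift $\beta$, and a small constant $\epsilon$ for numerical stability. A recurring question in this category is where the normalization sits (pre-LN vs. post-LN) and how that placement shapes gradient flow.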

Tokenization / Embedding

^ back to top ^

Papers focused on tokenization, embedding strategies, and input representations in language models.

<details open> <summary><em>paper list (click to fold / unfold)</em></summary> <br> </details>

Linear Attention / State Space Models / Recurrent Language Models / etc.

^ back to top ^

Papers analyzing alternative architectures to the standard transformer models, such as linear attention and state space models.

<details open> <summary><em>paper list (click to fold / unfold)</em></summary> <br> </details>
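
To make the contrast with standard attention concrete, a common sketch (in the style of kernelized linear attention, e.g. Katharopoulos et al., 2020) replaces the softmax kernel $\exp(q^{\top}k)$ with a feature-map inner product $\phi(q)^{\top}\phi(k)$:

```math
\mathrm{Attn}(Q, K, V)_i
= \frac{\sum_{j} \bigl(\phi(q_i)^{\top} \phi(k_j)\bigr)\, v_j}{\sum_{j} \phi(q_i)^{\top} \phi(k_j)}
= \frac{\phi(q_i)^{\top} \bigl(\sum_{j} \phi(k_j)\, v_j^{\top}\bigr)}{\phi(q_i)^{\top} \sum_{j} \phi(k_j)}
```

Because the sums over $j$ no longer pass through a softmax, they can be maintained as a running state, cutting the cost from quadratic to linear in sequence length and exposing the recurrent view that connects these models to state space models.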

Training Paradigms

^ back to top ^

Papers discussing various training methodologies and paradigms for language models.

<details open> <summary><em>paper list (click to fold / unfold)</em></summary> <br> </details>

Mechanistic Engineering / Probing / Interpretability

^ back to top ^

Papers exploring the internal mechanisms and interpretability of language models.

<details open> <summary><em>paper list (click to fold / unfold)</em></summary> <br> </details>
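
As one concrete instance of the probing methodology covered here, a linear probe fits a deliberately low-capacity classifier on frozen hidden states to test whether some property is linearly decodable from them. The activations and labels below are random stand-ins, so the probe should land near chance; with real cached activations the same scaffold applies.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical setup: 1000 cached hidden states (d = 768) from a frozen LM,
# each labeled with a binary property (e.g., "subject is plural").
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(1000, 768))  # stand-in for real activations
labels = rng.integers(0, 2, size=1000)        # stand-in for real annotations

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, labels, test_size=0.2, random_state=0
)

# The probe is deliberately simple: if a linear model can recover the
# property, the representation encodes it (almost) linearly.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"probe accuracy: {probe.score(X_test, y_test):.2f}")  # ~0.5 on random data
```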

Miscellanea

^ back to top ^

Papers that do not fit neatly into the other categories but discuss theoretical or empirical aspects of language models.

<details open> <summary><em>paper list (click to fold / unfold)</em></summary> <br> </details>

Detailed Statistics


Related links:


Contact