# awesome-long-context
## Efficient Inference, Sparse Attention, Efficient KV Cache

- [2020/01] Reformer: The Efficient Transformer
- [2020/06] Linformer: Self-Attention with Linear Complexity
- [2022/12] Parallel Context Windows for Large Language Models
- [2023/04] Unlocking Context Constraints of LLMs: Enhancing Context Efficiency of LLMs with Self-Information-Based Content Filtering
- [2023/05] Landmark Attention: Random-Access Infinite Context Length for Transformers
- [2023/05] Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time
- [2023/06] Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
- [2023/06] H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
- [2023/07] Scaling In-Context Demonstrations with Structured Attention
- [2023/08] LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
- [2023/09] Efficient Streaming Language Models with Attention Sinks
- [2023/10] HyperAttention: Long-context Attention in Near-Linear Time
- [2023/10] TRAMS: Training-free Memory Selection for Long-range Language Modeling
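Several of the entries above (Scissorhands, H$_2$O, LM-Infinite, and the attention-sinks paper) share the idea of keeping only a subset of the KV cache during decoding. The snippet below is a minimal illustrative sketch of one such policy, attention sinks plus a sliding window; the class name, parameters, and shapes are assumptions of this sketch and do not come from any of the listed papers' code, and the positional re-indexing that streaming methods also need is omitted.

```python
from collections import deque

import torch


class SinkAndWindowKVCache:
    """Minimal illustrative KV cache: keep the first `num_sink` tokens
    ("attention sinks") plus a sliding window of the most recent tokens,
    evicting everything in between. Keys/values are (num_heads, head_dim)
    per token."""

    def __init__(self, num_sink: int = 4, window: int = 1024):
        self.num_sink = num_sink
        self.sink_k, self.sink_v = [], []          # never evicted
        self.recent_k = deque(maxlen=window)       # deque drops the oldest entry itself
        self.recent_v = deque(maxlen=window)

    def append(self, k: torch.Tensor, v: torch.Tensor) -> None:
        if len(self.sink_k) < self.num_sink:
            self.sink_k.append(k)
            self.sink_v.append(v)
        else:
            self.recent_k.append(k)
            self.recent_v.append(v)

    def materialize(self) -> tuple[torch.Tensor, torch.Tensor]:
        # Concatenate sinks + window into (cache_len, num_heads, head_dim).
        keys = torch.stack(self.sink_k + list(self.recent_k))
        values = torch.stack(self.sink_v + list(self.recent_v))
        return keys, values


if __name__ == "__main__":
    cache = SinkAndWindowKVCache(num_sink=4, window=8)
    for _ in range(100):                      # simulate 100 decoding steps
        cache.append(torch.randn(2, 16), torch.randn(2, 16))
    k, v = cache.materialize()
    print(k.shape)                            # torch.Size([12, 2, 16]): 4 sinks + 8 recent
```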
## External Memory & Information Retrieval

- [2023/06] Augmenting Language Models with Long-Term Memory
- [2023/06] Long-range Language Modeling with Self-retrieval
- [2023/07] Focused Transformer: Contrastive Training for Context Scaling
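The papers in this section pair a bounded attention window with retrieval from an external memory of past chunks or activations. The toy sketch below only illustrates that overall pattern (store chunk embeddings, retrieve the top-k most similar ones before generation); the `ChunkMemory` class and the random `embed` placeholder are assumptions of this sketch, not the mechanism of any specific paper above.

```python
import numpy as np


class ChunkMemory:
    """Toy external memory: store (embedding, text) pairs for past context
    chunks and return the k most similar chunks for a query embedding."""

    def __init__(self, dim: int):
        self.embeddings = np.empty((0, dim), dtype=np.float32)
        self.chunks: list[str] = []

    def add(self, embedding: np.ndarray, chunk: str) -> None:
        embedding = embedding / np.linalg.norm(embedding)      # unit-normalize
        self.embeddings = np.vstack([self.embeddings, embedding[None, :]])
        self.chunks.append(chunk)

    def retrieve(self, query: np.ndarray, k: int = 3) -> list[str]:
        query = query / np.linalg.norm(query)
        scores = self.embeddings @ query                        # cosine similarity
        top = np.argsort(-scores)[:k]
        return [self.chunks[i] for i in top]


# In practice embed() would be a trained chunk/sentence encoder; it is random
# here only to keep the sketch self-contained and runnable.
rng = np.random.default_rng(0)
embed = lambda text: rng.standard_normal(64).astype(np.float32)

memory = ChunkMemory(dim=64)
for i in range(10):
    memory.add(embed(f"chunk {i}"), f"chunk {i}")
print(memory.retrieve(embed("query"), k=3))   # three stored chunks, ranked by similarity
```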
## Positional Encoding

- [2021/04] RoFormer: Enhanced Transformer with Rotary Position Embedding
- [2022/03] Transformer Language Models without Positional Encodings Still Learn Positional Information
- [2022/04] Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
- [2022/05] KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation
- [2022/12] Dissecting Transformer Length Extrapolation via the Lens of Receptive Field Analysis
- [2022/12] The Impact of Positional Encoding on Length Generalization in Transformers
- [2023/05] Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings
- [2023/06] Extending Context Window of Large Language Models via Positional Interpolation
- [2023/07] Exploring Transformer Extrapolation
- [2023/09] YaRN: Efficient Context Window Extension of Large Language Models
- [2023/09] Effective Long-Context Scaling of Foundation Models
- [2023/10] CLEX: Continuous Length Extrapolation for Large Language Models
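Much of the extrapolation work above builds on rotary embeddings (RoFormer) and rescales position indices at inference time (Positional Interpolation, YaRN, and CLEX each in a different way). The sketch below shows plain RoPE plus the simplest rescaling, linear positional interpolation, where positions are divided by a scale factor so a longer sequence maps back into the trained range; the function and parameter names are assumptions of this sketch, and the NTK-aware and YaRN refinements are not reproduced here.

```python
import torch


def rope_angles(positions: torch.Tensor, head_dim: int, base: float = 10000.0,
                scale: float = 1.0) -> torch.Tensor:
    """Rotation angles for RoPE. `scale > 1` implements linear positional
    interpolation: position t is treated as t / scale, squeezing a longer
    context back into the position range seen during training."""
    inv_freq = base ** (-torch.arange(0, head_dim, 2).float() / head_dim)
    return torch.outer(positions.float() / scale, inv_freq)   # (seq_len, head_dim / 2)


def apply_rope(x: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
    """Rotate consecutive feature pairs of x (seq_len, head_dim) by the angles."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = angles.cos(), angles.sin()
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out


if __name__ == "__main__":
    seq_len, head_dim = 8192, 64
    q = torch.randn(seq_len, head_dim)
    # Trained on 2048 positions, running on 8192 -> interpolation factor 4.
    angles = rope_angles(torch.arange(seq_len), head_dim, scale=8192 / 2048)
    print(apply_rope(q, angles).shape)   # torch.Size([8192, 64])
```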
## Context Compression

- [2022/12] Structured Prompting: Scaling In-Context Learning to 1,000 Examples
- [2023/05] Efficient Prompting via Dynamic In-Context Learning
- [2023/05] Adapting Language Models to Compress Contexts
- [2023/05] Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
- [2023/07] In-context Autoencoder for Context Compression in a Large Language Model
- [2023/10] Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs
- [2023/10] RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation
- [2023/10] Compressing Context to Enhance Inference Efficiency of Large Language Models
- [2023/10] LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models
- [2023/10] LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression
- [2023/10] TCRA-LLM: Token Compression Retrieval Augmented Large Language Model for Inference Cost Reduction
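Several of these works (and the self-information filtering paper in the first section, plus LLMLingua/LongLLMLingua) compress a prompt by scoring tokens with a small causal LM and dropping the least informative ones. The sketch below shows only that scoring step with GPT-2 via Hugging Face `transformers`; the flat keep-ratio heuristic is an assumption of this sketch, and the word/segment grouping and instruction-protection used by the actual methods are simplified away.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def compress_by_self_information(text: str, keep_ratio: float = 0.5) -> str:
    """Score each token by its negative log-probability under a small causal LM
    and keep only the highest-surprisal fraction of tokens. Illustrative only."""
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    ids = tokenizer(text, return_tensors="pt").input_ids              # (1, n)
    with torch.no_grad():
        logits = model(ids).logits                                    # (1, n, vocab)
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    # Self-information of token t given the tokens before it.
    surprisal = -log_probs.gather(-1, ids[:, 1:, None]).squeeze(-1)[0]  # (n - 1,)

    n_keep = max(1, int(keep_ratio * surprisal.numel()))
    keep = torch.topk(surprisal, n_keep).indices.sort().values + 1    # +1: token 0 has no score
    kept_ids = torch.cat([ids[0, :1], ids[0][keep]])                  # always keep the first token
    return tokenizer.decode(kept_ids)


if __name__ == "__main__":
    prompt = "The quick brown fox jumps over the lazy dog near the river bank."
    print(compress_by_self_information(prompt, keep_ratio=0.5))
```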
## Architecture Variants

- [2021/11] Efficiently Modeling Long Sequences with Structured State Spaces
- [2022/12] Hungry Hungry Hippos: Towards Language Modeling with State Space Models
- [2023/02] Hyena Hierarchy: Towards Larger Convolutional Language Models
- [2023/04] Scaling Transformer to 1M tokens and beyond with RMT
- [2023/06] Block-State Transformer
- [2023/07] Retentive Network: A Successor to Transformer for Large Language Models
- [2023/10] Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors
- [2023/10] Mamba: Linear-Time Sequence Modeling with Selective State Spaces
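S4, H3, RetNet, and Mamba all replace (or augment) attention with a recurrence that runs in time linear in sequence length. The snippet below is only the generic diagonal linear state-space recurrence $h_t = A h_{t-1} + B x_t$, $y_t = C h_t$ written as a sequential scan; it shows the $O(L)$ structure but none of the parameterization, discretization, or input-dependent gating that the individual papers contribute, and the parameter values are placeholders.

```python
import numpy as np


def diagonal_ssm_scan(x: np.ndarray, A: np.ndarray, B: np.ndarray,
                      C: np.ndarray) -> np.ndarray:
    """Sequential scan of a diagonal linear state-space model:
        h_t = A * h_{t-1} + B * x_t
        y_t = C . h_t
    x: (L,) input sequence; A, B, C: (N,) state parameters. Runs in O(L * N)
    time with O(N) state, with no pairwise attention matrix."""
    h = np.zeros(A.shape[0])
    y = np.empty_like(x)
    for t in range(x.shape[0]):
        h = A * h + B * x[t]     # elementwise: diagonal state transition
        y[t] = C @ h             # scalar readout per step
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, N = 1024, 16
    A = np.full(N, 0.95)                     # stable decay, |A_i| < 1
    B, C = rng.standard_normal((2, N))
    y = diagonal_ssm_scan(rng.standard_normal(L), A, B, C)
    print(y.shape)   # (1024,)
```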
## White-Box

- [2019/06] Theoretical Limitations of Self-Attention in Neural Sequence Models
- [2020/06] $O(n)$ Connections are Expressive Enough: Universal Approximability of Sparse Transformers
- [2022/02] Overcoming a Theoretical Limitation of Self-Attention
- [2023/05] Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer
- [2023/10] JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
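A recurring observation behind the first and third papers above is that softmax attention with bounded logits dilutes the influence of any single token as the input length grows. The bound below is a hedged paraphrase of that common setup, not a reproduction of any one paper's theorem statement.

$$
\alpha_i \;=\; \frac{e^{s_i}}{\sum_{j=1}^{n} e^{s_j}} \;\le\; \frac{e^{c}}{n\,e^{-c}} \;=\; \frac{e^{2c}}{n}
\qquad \text{whenever } |s_j| \le c \text{ for all } j.
$$

So if every value vector satisfies $\|v_i\| \le M$, replacing any single $v_i$ changes the attention output $\sum_i \alpha_i v_i$ by at most $2Me^{2c}/n$, which vanishes as $n \to \infty$; Overcoming a Theoretical Limitation of Self-Attention counteracts exactly this effect by scaling attention logits with $\log n$.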
## Long Context Modeling

- [2023/07] LongNet: Scaling Transformers to 1,000,000,000 Tokens
- [2023/08] Giraffe: Adventures in Expanding Context Lengths in LLMs
- [2023/09] LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
- [2023/10] Mistral 7B
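Mistral 7B relies on sliding-window attention, LongLoRA fine-tunes with a shifted variant of local attention, and LongNet uses dilated patterns. The snippet below only constructs the basic banded causal mask underlying the sliding-window case so the shapes are concrete; it is not the dilated or shifted scheme from those papers, and the function name is my own.

```python
import torch


def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask of shape (seq_len, seq_len): position i may attend to
    position j iff j <= i (causal) and i - j < window (local band). Each row
    has at most `window` True entries, so attention cost grows linearly in
    seq_len instead of quadratically."""
    i = torch.arange(seq_len).unsqueeze(1)      # query positions, column vector
    j = torch.arange(seq_len).unsqueeze(0)      # key positions, row vector
    return (j <= i) & (i - j < window)


if __name__ == "__main__":
    mask = sliding_window_causal_mask(seq_len=8, window=3)
    print(mask.int())
    # Row 5 attends to keys 3, 4, 5 only; information from earlier tokens
    # reaches it indirectly through the stacked layers.
```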
## Benchmarks

- [2020/11] Long Range Arena: A Benchmark for Efficient Transformers
- [2022/01] SCROLLS: Standardized CompaRison Over Long Language Sequences
- [2023/01] LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization
- [2023/05] ZeroSCROLLS: A Zero-Shot Benchmark for Long Text Understanding
- [2023/08] LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
- [2023/10] M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models
## Data

- [2023/12] Structured Packing in LLM Training Improves Long Context Utilization
- [2024/01] LongAlign: A Recipe for Long Context Alignment of Large Language Models
- [2024/02] Data Engineering for Scaling Language Models to 128K Context
## Others

- [2023/07] Zero-th Order Algorithm for Softmax Attention Optimization
- [2023/10] (Dynamic) Prompting might be all you need to repair Compressed LLMs