Readings in Databases

A list of papers essential to understanding databases and building new data systems. The list is curated and maintained by Reynold Xin (@rxin). If you think a paper should be part of this list, please submit a pull request. Review might take a while, since I need to go over the paper first.

If you are reading this and making the effort to understand these papers, we would love to talk to you about opportunities at Databricks.

<a name='TOC'>Table of Contents</a>

  1. Basics and Algorithms
  2. Essentials of Relational Databases
  3. Classic System Design
  4. Columnar Databases
  5. Data-Parallel Computation
  6. Consensus and Consistency
  7. Trends (Cloud Computing, Warehouse-scale Computing, New Hardware)
  8. Miscellaneous
  9. External Reading Lists

<a name='basic-and-algo'> Basics and Algorithms

<a name='essentials'> Essentials of Relational Databases

<a name='system-design'> Classic System Design

<a name='column'> Columnar Databases

Columnar storage and column-oriented query engines are critical to analytical workloads such as OLAP. It has been about 20 years since the approach first came out (the MonetDB paper in 1999), and almost every commercial warehouse database has a columnar engine by now.
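To make the row-versus-column distinction concrete, here is a minimal Python sketch. It is not taken from any of the papers listed; the sample data and the names `to_columns`, `total_amount_row_store`, and `total_amount_column_store` are hypothetical and chosen only for illustration. It pivots a few row tuples into one array per column and runs the same aggregate over both layouts.

```python
from array import array

# Hypothetical sample data: (order_id, region, amount)
rows = [
    (1, "east", 10.0),
    (2, "west", 25.5),
    (3, "east", 4.5),
]


def to_columns(rows):
    """Pivot row tuples into one container per column."""
    return {
        # Typed arrays keep each numeric column contiguous in memory.
        "order_id": array("q", (r[0] for r in rows)),
        "region": [r[1] for r in rows],
        "amount": array("d", (r[2] for r in rows)),
    }


def total_amount_row_store(rows):
    # Row layout: every full tuple is visited, even though
    # the aggregate only needs the third field.
    return sum(r[2] for r in rows)


def total_amount_column_store(columns):
    # Column layout: the aggregate scans a single array and
    # never touches order_id or region.
    return sum(columns["amount"])


columns = to_columns(rows)
print(total_amount_row_store(rows))
print(total_amount_column_store(columns))
```

Both functions return the same total. The difference is in how much data each one has to read, which is the property analytical scans over wide tables rely on. Real columnar engines add compression and vectorized execution on top of this layout, which the sketch does not attempt to show.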

<a name='data-parallel'> Data-Parallel Computation

<a name='consensus'> Consensus and Consistency

<a name='trends'> Trends (Cloud Computing, Warehouse-scale Computing, New Hardware)

<a name='misc'> Miscellaneous

<a name='external'> External Reading Lists

A number of schools have their own reading lists for graduate students in databases.