Home

Awesome

pgsql-bloat-estimation

Queries to mesure statistical bloat in btree indexes and tables for PostgreSQL.

Three different kind of non used space should be considered:

ToC:

Bloat in Tables

The queries from the "table" folder estimate the bloat for tables. They expose these fields:

AS 7.4, 8.0 and 8.1 do not have fillfactor, extra_size, extra_pct and bloat_size are not reported.

Bloat in btree indexes

The queries from the "btree" folder estimate the bloat for btree indexes. They expose these fields:

The query in file btree_bloat-superuser.sql is much faster than other one. However, it must be executed by a superuser role only.

Caveats

The is_na column

This field allows you to filter out statistics considered wrong by the query itself. Just uncomment the WHERE clause.

This excludes:

Toasted fields

PostgreSQL has an internal mechanism to compress/slice large values from large rows outside of the heap space in a relation called TOAST. See the following page for more informations: https://www.postgresql.org/docs/current/storage-toast.html

Depending on your values, some fields might be toasted and some other not among the same table. Unfortunately, there's no statistics available about how much data has been toasted away from the heap, how much has been compressed inside the heap, their real size, etc. In short, the average field statistics in heap do not consider if the value is a toast pointer or a real value.

Because of this, statistics on variable length fields might be largely underestimated, even leading to negative bloat for some tables.

There's no way to include such situation in the existing is_na column.

Alignment padding

Unfortunately, as it is not possible to compute the space wasted by the alignment paddings, it is always reported in the bloat fields. Sometime, this space can takes up to 10% or more of the table size. See the chapter "The alignment deviation" from this page for more information.

This means you can estimate this space by running the query on non-bloated table. The bloat fields will then only report this alignment padding space. For large table, you can sample it in a smaller table of 100 pages or so, keeping the same field order. The bloat estimation query will report the same average space wasted by alignment padding from this table.

Size of tables/indexes

Small table or indexes (few pages) will certainly reports high bloat percentage. Each pages beeing 8kB, the less you have rows to fill them, the smaller they are, the more you will have natural spaces in there.

As example, if you need 100 rows to fill one page and your table have 150 rows, your table will be on 2 pages, 16kB. The second page having only 50 rows, You'll have a natural bloat of 4kB, 25% of your table.