2024-07-08
BRIN is designed for handling very large tables in which certain columns have some natural correlation with their physical location within the table.
BRIN works in terms of block ranges (or “page ranges”). A block range is a group of pages that are physically adjacent in the table; for each block range, some summary info is stored by the index. For example, a table storing a store's sale orders might have a date column on which each order was placed, and most of the time the entries for earlier orders will appear earlier in the table as well; a table storing a ZIP code column might have all codes for a city grouped together naturally.
In general, BRIN incurs very low INSERT-time overhead and produces a very small index.
The documentation provides a few examples already. I have found it most useful when the data has a timestamp column that is always increasing.
Data stored in tables is arranged in units called "pages", each storing 8 kB of data by default. A table can thus be described on physical disk as a collection of pages.
Within each page, rows are stored at the front, and gaps of empty space can appear as data is removed or updated.
Some spare space at the end of the page may also be reserved for future data.
PAGE representation:
| row_1 | row_2 | row_3 | | row_4 | spare |
|-------|-------|-------|-------|-------|-------|
| data | data | data | empty | data | empty |
If a table is narrow (that is, it has few columns, and those columns store small values on disk), a page can hold many rows' worth of data.
Therefore, for each page, we can determine the minimum and maximum values of a particular column inside that page. When Postgres searches for a value, it can consult these summaries and skip a page entirely when its min-max range does not overlap the range specified in the search, scanning only the pages that might contain matches.
This works best when the physical layout of the table on disk and the ordering of the column being filtered on are strongly correlated.
TABLE representation on disk
with strong datestamp correlation
(min-max ranges)
| page_1 | page_2 | page_3 | page_4 | page_n... |
|-----------|-----------|-----------|-----------|-----------------|
| 2010-2011 | 2011-2012 | 2012-2013 | 2013-2014 | 2014-newest... |
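The skip-or-scan decision can be illustrated with a toy sketch in Python. The page summaries below are hypothetical (mirroring the years in the table above), and this is a simplified model, not Postgres internals:

```python
# Toy model of BRIN pruning: each page (range) stores the (min, max) of
# the indexed column. A query only scans pages whose summary overlaps
# the requested range. All values here are made-up years.
page_summaries = {
    "page_1": (2010, 2011),
    "page_2": (2011, 2012),
    "page_3": (2012, 2013),
    "page_4": (2013, 2014),
}

def pages_to_scan(lo, hi):
    """Return the pages whose min-max summary overlaps [lo, hi]."""
    return [page for page, (pmin, pmax) in page_summaries.items()
            if pmin <= hi and pmax >= lo]

print(pages_to_scan(2012, 2013))  # ['page_2', 'page_3', 'page_4']
```

Note that page_1 is skipped entirely: its maximum (2011) is below the queried range, so none of its rows can match.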
The degree of correlation can be checked with the following query (values close to 1 or -1 indicate a strong correlation):
/* Check correlation between the disk layout ordering
* and the ordering of the column to filter for:
*/
SELECT correlation
FROM pg_stats
WHERE tablename = 'table_name' AND attname = 'column_name';
The BRIN index is just a simple structure mapping ranges of values to ranges of pages:
| page_range | 1-2 | 2-4 | ... |
|-------------|-----------|-----------|------|
| value_range | 2010-2012 | 2012-2014 | ... |
The user can choose how many pages each block range covers, the default being 128 pages per range. Tuning this number can make a considerable difference in query performance.
pages_per_range
First you need to know how many rows "fit" within a page of your table. To get a rough estimation you can perform these queries:
-- Get the number of pages/blocks for the table
select relpages from pg_class where relname = 'my_table';
-- Get the table count
select count(*) from my_table;
-- Divide the number of rows above by
-- the number of pages/blocks to get an
-- estimated rough number of how many
-- rows fit in a single page/block.
The value is not perfect, but it is accurate enough to get an estimation.
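The division itself, sketched with hypothetical figures (these are not real numbers from any particular table):

```python
# Hypothetical results of the two queries above:
relpages = 10_000       # pages in the table, from pg_class
row_count = 1_000_000   # rows in the table, from count(*)

# Rough estimate of how many rows fit in one 8 kB page:
rows_per_page = row_count // relpages
print(rows_per_page)  # 100
```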
The next step is to understand the use of your table.
In my case, I had a table that I wanted to query within a range from 1h to about 24h.
SELECT *
FROM my_table
WHERE date_time BETWEEN '2000-01-01 00:00:00+00'
                    AND '2000-01-01 23:59:59+00';
So, assuming the rows matching a typical 24-hour query span roughly 2,150 pages on disk, I want enough entries in the index map that the number of pages per range is not so high that Postgres has to scan too many non-matching rows to compute the correct result. That means mapping fewer than 2,150 pages per index entry, and in fact much less than that.
I would reduce that number by an order of magnitude and pick 215 as my value of pages_per_range. I could potentially go further than that and reduce the number by another order of magnitude (21).
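The sizing reasoning above, as a quick sketch (the 2,150-page figure is from the scenario above; the rest is just arithmetic):

```python
# A typical 24h query's rows span roughly this many pages on disk:
pages_spanned_by_query = 2150

# Pick pages_per_range an order of magnitude smaller, so each query
# covers many ranges and Postgres can skip the non-matching ones:
pages_per_range = pages_spanned_by_query // 10
print(pages_per_range)        # 215
print(pages_per_range // 10)  # 21, going one order of magnitude further
```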
TIP: If the index performance isn't good enough, halve the value of pages_per_range and try again. This would make the index roughly twice as fast, but twice as big. The process can be repeated until the proper final value is found.
Similarly, if performance is sufficient but the index size is too big, the value of pages_per_range can be increased.
In most cases you want to turn autosummarize on for your index. This means that once a range becomes full, it is summarised automatically.
If this value is false, summarisation will only occur during vacuum or autovacuum. Depending on your infrastructure, these operations might not happen frequently, so many unsummarised block ranges could be left hanging around.
For our test scenario above, our SQL would be:
CREATE INDEX CONCURRENTLY "mytable_column_brin_idx" ON "my_table"
USING brin ("column_name") WITH (autosummarize = on, pages_per_range = 215);