Created at: 2024-11-04
This extension helps with removing bloat from tables and indexes. It can also optionally restore the physical order of clustered indexes.
Differently from VACUUM FULL, and CLUSTER this works without an exclusive lock on the table. This extension claims to perform as well as CLUSTER itself.
Here is what it is capable of doing:
VACUUM is one of the key maintenance operations in PostgreSQL, responsible for cleaning up dead tuples from the table pages, thereby freeing up space. This space can either be re-used for future insertions and updates or remain unused and lead to fragmentation. With massive data purge operations involving deletion of a significant percentage of data from a table, the amount of fragmentation may be significantly larger. Eventually, these fragmented pages increase the size of the table and utilize more disk space than required. Such fragmentation might distribute tuples across an unnecessarily large number of pages, creating performance bottlenecks. source
The problem with VACUUM FULL is that it blocks reads and writes.
An article worth reading is Amazon's "Reducing bloat in tables and indexes with pg_repac source
This is how you'd rebuild a table using pg_repack
# Perform a dry-run pass to see whether we can rebuild the table online
$ pg_repack --table foo -d my_db --dry-run
# To execute the rebuild
$ pg_repack --table foo -d my_db
INFO: repacking table "public.foo"
# It may throw this warning if the table doesn't have a PK or unique index
# non-nullable column:
WARNING: relation "public.foo" must have a primary key or not-null unique keys
Useful flags:
A useful flag is -e
, it will give more verbose information back to the
caller.
There are 7 steps performed by pg_repack here:
id | pk | row
----+----------+--------------------------------------------------------
1 | | (100012,my_name001,42,"2025-02-04 18:14:53.780287+13")
2 | | (100013,my_name003,42,"2025-02-04 18:14:53.785157+13")
3 | (100013) | (100013,my_name003,43,"2025-02-04 18:14:53.785157+13")
4 | (100012) |
5 | (100013) |
The above means that rows 100012 and 100013 were inserted and then deleted
(empty data in the row).Sometimes you need to rebuild an old table, creating a new table, and swap it by the new one.
One such case I've come across in production is that someone wanted to add an EXCLUSION constraint to a table with over 100 million rows and a lot of access across the codebase. EXCLUSION constraints can't be created without blocking reads and writes. In this case, a new table needed to be created, with the new exclusion constraint, and swapped with the old one.
I prefer to build things from source whenever I can. If you built postgres from source, you'll have to also install pg_repack from source as well.
POSTGRES_DIR=$HOME/workspace/postgres15
REPACK_VERSION=1.5.1
cd $POSTGRES_DIR
# I use the "build" folder for building artifacts:
cd build/
git clone git@github.com:reorg/pg_repack.git
cd pg_repack
git checkout tags/ver_$REPACK_VERSION
# PG repack needs to know where `pg_config` is. Remove the "bear" part if you
# don't want a compile_commands.json file at the end.
PG_CONFIG=$POSTGRES_DIR/build/bin/pg_config bear --output compile_commands.json -- make
PG_CONFIG=$POSTGRES_DIR/build/bin/pg_config make install
# To use the binary (command line) copy it to the postgres directory so that
# all binaries are in the same place.
cp bin/pg_repack $POSTGRES_DIR/build/bin
Now hop on psql
and create the extension in the database of choice:
CREATE EXTENSION pg_repack;
Here's some instructions to run it via docker.
Create tables and data:
Note: The bellow requires the btree_gist
extension to be installed.
DROP TABLE IF EXISTS table_for_repack;
CREATE TABLE table_for_repack (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
value INTEGER NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW(),
CONSTRAINT value_check CHECK (value >= 0),
CONSTRAINT name_unique UNIQUE (name)
);
CREATE INDEX idx_value ON table_for_repack (value);
CREATE INDEX idx_created_at ON table_for_repack (created_at);
-- Insert 100_000 rows
INSERT INTO table_for_repack (name, value)
SELECT
'item_' || generate_series(1, 100000) AS name,
(generate_series(1, 100000) % 100000) + 1 AS value;
DROP TABLE IF EXISTS table_with_fk;
CREATE TABLE table_with_fk (
id SERIAL PRIMARY KEY,
table_for_repack_id INTEGER NOT NULL,
CONSTRAINT fk_table_for_repack FOREIGN KEY (table_for_repack_id)
REFERENCES table_for_repack(id)
ON DELETE CASCADE
ON UPDATE CASCADE
);
-- Insert 100_000 rows
INSERT INTO table_with_fk (table_for_repack_id)
SELECT generate_series(1, 100000);
Now let's run pg_repack in and print the statements:
POSTGRES_DIR=$HOME/workspace/postgres15
PG_REPACK_BIN=$POSTGRES_DIR/build/bin/pg_repack
TABLE=table_for_repack
DB_NAME=repack_test
$PG_REPACK_BIN --table $TABLE -d $DB_NAME --echo --elevel=DEBUG
This is what I get in return:
$PG_REPACK_BIN --table table_for_repack -d repack_test --echo --elevel=DEBUG
Process 67229 launched: '/Users/marcelo.fernandes/workspace/postgres15/build/bin/pg_repack' (arm64)
DEBUG: No workers to disconnect.
LOG: (query) SET search_path TO pg_catalog, pg_temp, public
LOG: (query) SET search_path TO pg_catalog, pg_temp, public
LOG: (query) select repack.version(), repack.version_sql()
LOG: (query) SET statement_timeout = 0
LOG: (query) SET search_path = pg_catalog, pg_temp, public
LOG: (query) SET client_min_messages = warning
LOG: (query) SELECT r FROM (VALUES ($1, 'r')) AS given_t(r,kind) WHERE NOT EXISTS( SELECT FROM repack.tables WHERE relid=to_regclass(given_t.r)) AND NOT EXISTS( SELECT FROM pg_catalog.pg_class c WHERE c.oid=to_regclass(given_t.r) AND c.relkind = given_t.kind AND given_t.kind = 'p')
LOG: (param:0) = table_for_repack
LOG: (query) SELECT t.*, coalesce(v.tablespace, t.tablespace_orig) as tablespace_dest FROM repack.tables t, (VALUES ($1::text)) as v (tablespace) WHERE (relid = $2::regclass) ORDER BY t.relname, t.schemaname
LOG: (param:0) = (null)
LOG: (param:1) = table_for_repack
INFO: repacking table "public.table_for_repack"
DEBUG: ---- repack_one_table ----
DEBUG: target_name : public.table_for_repack
DEBUG: target_oid : 2094620
DEBUG: target_toast : 0
DEBUG: target_tidx : 0
DEBUG: pkid : 2094626
DEBUG: ckid : 0
DEBUG: create_pktype : SELECT repack.create_index_type(2094626,2094620)
DEBUG: create_log : SELECT repack.create_log_table(2094620)
DEBUG: create_trigger : CREATE TRIGGER repack_trigger AFTER INSERT OR DELETE OR UPDATE ON public.table_for_repack FOR EACH ROW EXECUTE PROCEDURE repack.repack_trigger('id')
DEBUG: enable_trigger : ALTER TABLE public.table_for_repack ENABLE ALWAYS TRIGGER repack_trigger
DEBUG: create_table : SELECT repack.create_table($1, $2)
DEBUG: dest_tablespace : pg_default
DEBUG: copy_data : INSERT INTO repack.table_2094620 SELECT id,name,value,created_at FROM ONLY public.table_for_repack
DEBUG: alter_col_storage : (skipped)
DEBUG: drop_columns : (skipped)
DEBUG: delete_log : DELETE FROM repack.log_2094620
DEBUG: lock_table : LOCK TABLE public.table_for_repack IN ACCESS EXCLUSIVE MODE
DEBUG: sql_peek : SELECT * FROM repack.log_2094620 ORDER BY id LIMIT $1
DEBUG: sql_insert : INSERT INTO repack.table_2094620 VALUES ($1.*)
DEBUG: sql_delete : DELETE FROM repack.table_2094620 WHERE (id) = ($1.id)
DEBUG: sql_update : UPDATE repack.table_2094620 SET (id, name, value, created_at) = ($2.id, $2.name, $2.value, $2.created_at) WHERE (id) = ($1.id)
DEBUG: sql_pop : DELETE FROM repack.log_2094620 WHERE id IN (
DEBUG: ---- setup ----
LOG: (query) SELECT pg_try_advisory_lock($1, CAST(-2147483648 + $2::bigint AS integer))
LOG: (param:0) = 16185446
LOG: (param:1) = 2094620
LOG: (query) BEGIN ISOLATION LEVEL READ COMMITTED
LOG: (query) SET LOCAL lock_timeout = 100
LOG: (query) LOCK TABLE public.table_for_repack IN ACCESS EXCLUSIVE MODE
LOG: (query) RESET lock_timeout
LOG: (query) SELECT pg_get_indexdef(indexrelid) FROM pg_index WHERE indrelid = $1 AND NOT indisvalid
LOG: (param:0) = 2094620
LOG: (query) SELECT indexrelid, repack.repack_indexdef(indexrelid, indrelid, $2, FALSE) FROM pg_index WHERE indrelid = $1 AND indisvalid
LOG: (param:0) = 2094620
LOG: (param:1) = (null)
DEBUG: index[0].target_oid : 2094626
DEBUG: index[0].create_index : CREATE UNIQUE INDEX index_2094626 ON repack.table_2094620 USING btree (id)
DEBUG: index[1].target_oid : 2094628
DEBUG: index[1].create_index : CREATE UNIQUE INDEX index_2094628 ON repack.table_2094620 USING btree (name)
DEBUG: index[2].target_oid : 2094630
DEBUG: index[2].create_index : CREATE INDEX index_2094630 ON repack.table_2094620 USING btree (value)
DEBUG: index[3].target_oid : 2094631
DEBUG: index[3].create_index : CREATE INDEX index_2094631 ON repack.table_2094620 USING btree (created_at)
LOG: (query) SELECT repack.conflicted_triggers($1)
LOG: (param:0) = 2094620
LOG: (query) SELECT repack.create_index_type(2094626,2094620)
LOG: (query) SELECT repack.create_log_table(2094620)
LOG: (query) CREATE TRIGGER repack_trigger AFTER INSERT OR DELETE OR UPDATE ON public.table_for_repack FOR EACH ROW EXECUTE PROCEDURE repack.repack_trigger('id')
LOG: (query) ALTER TABLE public.table_for_repack ENABLE ALWAYS TRIGGER repack_trigger
LOG: (query) SELECT repack.disable_autovacuum('repack.log_2094620')
LOG: (query) BEGIN ISOLATION LEVEL READ COMMITTED
LOG: (query) SELECT pg_backend_pid()
DEBUG: LOCK TABLE public.table_for_repack IN ACCESS SHARE MODE
LOG: (query) SELECT pid FROM pg_locks WHERE locktype = 'relation' AND granted = false AND relation = 2094620 AND mode = 'AccessExclusiveLock' AND pid <> pg_backend_pid()
DEBUG: No competing DDL to cancel.
LOG: (query) COMMIT
DEBUG: Waiting on ACCESS SHARE lock...
DEBUG: ---- copy tuples ----
LOG: (query) BEGIN ISOLATION LEVEL SERIALIZABLE
LOG: (query) SELECT set_config('work_mem', current_setting('maintenance_work_mem'), true)
LOG: (query) SELECT coalesce(array_agg(l.virtualtransaction), '{}') FROM pg_locks AS l LEFT JOIN pg_stat_activity AS a ON l.pid = a.pid LEFT JOIN pg_database AS d ON a.datid = d.oid WHERE l.locktype = 'virtualxid' AND l.pid NOT IN (pg_backend_pid(), $1) AND (l.virtualxid, l.virtualtransaction) <> ('1/1', '-1/0') AND (a.application_name IS NULL OR a.application_name <> $2) AND a.query !~* E'^\\s*vacuum\\s+' AND a.query !~ E'^autovacuum: ' AND ((d.datname IS NULL OR d.datname = current_database()) OR l.database = 0)
LOG: (param:0) = 67243
LOG: (param:1) = pg_repack
LOG: (query) DELETE FROM repack.log_2094620
LOG: (query) SAVEPOINT repack_sp1
LOG: (query) SELECT pid FROM pg_locks WHERE locktype = 'relation' AND granted = false AND relation = 2094620 AND mode = 'AccessExclusiveLock' AND pid <> pg_backend_pid()
DEBUG: No competing DDL to cancel.
LOG: (query) SET LOCAL lock_timeout = 100
LOG: (query) LOCK TABLE public.table_for_repack IN ACCESS SHARE MODE
LOG: (query) RESET lock_timeout
LOG: (query) SELECT repack.create_table($1, $2)
LOG: (param:0) = 2094620
LOG: (param:1) = pg_default
LOG: (query) INSERT INTO repack.table_2094620 SELECT id,name,value,created_at FROM ONLY public.table_for_repack
LOG: (query) SELECT repack.disable_autovacuum('repack.table_2094620')
LOG: (query) COMMIT
LOG: (query) SELECT 'repack.table_2094620'::regclass::oid
DEBUG: ---- create indexes ----
DEBUG: Have 4 indexes and num_workers=0
DEBUG: set up index_jobs [0]
DEBUG: target_oid : 2094626
DEBUG: create_index : CREATE UNIQUE INDEX index_2094626 ON repack.table_2094620 USING btree (id)
LOG: (query) CREATE UNIQUE INDEX index_2094626 ON repack.table_2094620 USING btree (id)
DEBUG: set up index_jobs [1]
DEBUG: target_oid : 2094628
DEBUG: create_index : CREATE UNIQUE INDEX index_2094628 ON repack.table_2094620 USING btree (name)
LOG: (query) CREATE UNIQUE INDEX index_2094628 ON repack.table_2094620 USING btree (name)
DEBUG: set up index_jobs [2]
DEBUG: target_oid : 2094630
DEBUG: create_index : CREATE INDEX index_2094630 ON repack.table_2094620 USING btree (value)
LOG: (query) CREATE INDEX index_2094630 ON repack.table_2094620 USING btree (value)
DEBUG: set up index_jobs [3]
DEBUG: target_oid : 2094631
DEBUG: create_index : CREATE INDEX index_2094631 ON repack.table_2094620 USING btree (created_at)
LOG: (query) CREATE INDEX index_2094631 ON repack.table_2094620 USING btree (created_at)
LOG: (query) SELECT repack.repack_apply($1, $2, $3, $4, $5, $6)
LOG: (param:0) = SELECT * FROM repack.log_2094620 ORDER BY id LIMIT $1
LOG: (param:1) = INSERT INTO repack.table_2094620 VALUES ($1.*)
LOG: (param:2) = DELETE FROM repack.table_2094620 WHERE (id) = ($1.id)
LOG: (param:3) = UPDATE repack.table_2094620 SET (id, name, value, created_at) = ($2.id, $2.name, $2.value, $2.created_at) WHERE (id) = ($1.id)
LOG: (param:4) = DELETE FROM repack.log_2094620 WHERE id IN (
LOG: (param:5) = 1000
LOG: (query) SELECT pid FROM pg_locks WHERE locktype = 'virtualxid' AND pid <> pg_backend_pid() AND virtualtransaction = ANY($1)
LOG: (param:0) = {}
DEBUG: ---- swap ----
LOG: (query) SAVEPOINT repack_sp1
LOG: (query) SET LOCAL lock_timeout = 100
LOG: (query) LOCK TABLE public.table_for_repack IN ACCESS EXCLUSIVE MODE
LOG: (query) RESET lock_timeout
LOG: (query) SAVEPOINT repack_sp1
LOG: (query) SET LOCAL lock_timeout = 100
LOG: (query) LOCK TABLE repack.table_2094620 IN ACCESS EXCLUSIVE MODE
LOG: (query) RESET lock_timeout
LOG: (query) SELECT repack.repack_apply($1, $2, $3, $4, $5, $6)
LOG: (param:0) = SELECT * FROM repack.log_2094620 ORDER BY id LIMIT $1
LOG: (param:1) = INSERT INTO repack.table_2094620 VALUES ($1.*)
LOG: (param:2) = DELETE FROM repack.table_2094620 WHERE (id) = ($1.id)
LOG: (param:3) = UPDATE repack.table_2094620 SET (id, name, value, created_at) = ($2.id, $2.name, $2.value, $2.created_at) WHERE (id) = ($1.id)
LOG: (param:4) = DELETE FROM repack.log_2094620 WHERE id IN (
LOG: (param:5) = 0
LOG: (query) SELECT repack.repack_swap($1)
LOG: (param:0) = 2094620
LOG: (query) COMMIT
DEBUG: ---- drop ----
LOG: (query) BEGIN ISOLATION LEVEL READ COMMITTED
LOG: (query) SAVEPOINT repack_sp1
LOG: (query) SET LOCAL lock_timeout = 100
LOG: (query) LOCK TABLE public.table_for_repack IN ACCESS EXCLUSIVE MODE
LOG: (query) RESET lock_timeout
LOG: (query) SELECT repack.repack_drop($1, $2)
LOG: (param:0) = 2094620
LOG: (param:1) = 4
LOG: (query) COMMIT
DEBUG: ---- analyze ----
LOG: (query) BEGIN ISOLATION LEVEL READ COMMITTED
LOG: (query) ANALYZE public.table_for_repack
LOG: (query) COMMIT
LOG: (query) SELECT pg_advisory_unlock($1, CAST(-2147483648 + $2::bigint AS integer))
LOG: (param:0) = 16185446
LOG: (param:1) = 2094620
DEBUG: No workers to disconnect.
Process 67229 exited with status = 0 (0x00000000)
TODO: Check log of all queries on postgres side
The best way to understand this is by reading through the code of the main
function that gets executed when the binary runs. The code can be accessed
here.
Straight away, this function uses the pgut
library a lot. This library
is present also in pg_reorg
. The commit that added it into pg_repack
(574b6dc2) only states "Support PGXS".
In summary:
pgut
with the command line arguments (via pgut_getopt
). This
will set the program name to pg_repack
, set the postgres locally for the
command so that it can run from the scripts folder, and set up interruption
handlers so that pgut disconnects at the end.--tablespace=TBLSPC
. Performs this query:select spcname from pg_tablespace where spcname = $1;
orderby
).repack_one_database
or repack_all_databases
.This is just a wrapper for repacking a table. It will collate a few bits of info:
And then will try to do a few things:
SET statement_timeout = 0;
SET search_path = pg_catalog, pg_temp, public;
SET client_min_messages = warning;
(gets rid of annoying "create
implicit..." messages)-- creates a temporary table with two columns: r and kind.
-- $1 is a placeholder for a parameter that will be provided when the
-- query is executed. In this case, it's the table name "table_for_repack".
-- AS given_t(r, kind) renames the columns of the temporary values table to r
-- and kind.
-- given_t is the alias of the temporary table.
--
SELECT r FROM (VALUES ($1, 'r')) AS given_t(r,kind)
WHERE NOT EXISTS (
SELECT FROM repack.tables
WHERE relid=to_regclass(given_t.r)
) AND NOT EXISTS (
SELECT FROM pg_catalog.pg_class c
WHERE c.oid=to_regclass(given_t.r)
AND c.relkind = given_t.kind
-- Check if the kind is 'p' (which stands for partitioned tables in
-- PostgreSQL). So, this is filtering for partitioned tables only.
-- Partitioned tables aren't supported.
AND given_t.kind = 'p'
)
SELECT t.*, coalesce(v.tablespace, t.tablespace_orig) as tablespace_dest
FROM repack.tables t,
(VALUES ($1::text)) as v (tablespace)
WHERE (relid = $2::regclass) ORDER BY t.relname, t.schemaname
repack_table
. This is an
example of what it holds in our sandbox case:(repack_table) {
target_name = 0x00000001398087d8 "public.table_for_repack"
target_oid = 2093608
target_toast = 0
target_tidx = 0
pkid = 2093614
ckid = 0
temp_oid = 0
create_pktype = 0x000000013980880b "SELECT repack.create_index_type(2093614,2093608)"
create_log = 0x000000013980883c "SELECT repack.create_log_table(2093608)"
create_trigger = 0x0000000139808864 "CREATE TRIGGER repack_trigger AFTER INSERT OR DELETE OR UPDATE ON public.table_for_repack FOR EACH ROW EXECUTE PROCEDURE repack.repack_trigger('id')"
enable_trigger = 0x00000001398088f9 "ALTER TABLE public.table_for_repack ENABLE ALWAYS TRIGGER repack_trigger"
create_table = 0x0000000139808942 "SELECT repack.create_table($1, $2)"
dest_tablespace = 0x0000000139808b8b "pg_default"
copy_data = 0x0000000139808970 "INSERT INTO repack.table_2093608 SELECT id,name,value,created_at FROM ONLY public.table_for_repack"
alter_col_storage = 0x0000000000000000
drop_columns = 0x0000000000000000
delete_log = 0x00000001398089d3 "DELETE FROM repack.log_2093608"
lock_table = 0x0000000139808a08 "LOCK TABLE public.table_for_repack IN ACCESS EXCLUSIVE MODE"
sql_peek = 0x0000000139808a44 "SELECT * FROM repack.log_2093608 ORDER BY id LIMIT $1"
sql_insert = 0x0000000139808a7a "INSERT INTO repack.table_2093608 VALUES ($1.*)"
sql_delete = 0x0000000139808aa9 "DELETE FROM repack.table_2093608 WHERE (id) = ($1.id)"
sql_update = 0x0000000139808adf "UPDATE repack.table_2093608 SET (id, name, value, created_at) = ($2.id, $2.name, $2.value, $2.created_at) WHERE (id) = ($1.id)"
sql_pop = 0x0000000139808b5e "DELETE FROM repack.log_2093608 WHERE id IN ("
n_indexes = 0
indexes = NULL
}
repack_one_table
.This function effectivelly performs the repacking. During its set up, it will print the following under DEBUG mode:
INFO: repacking table "public.table_for_repack"
DEBUG: ---- repack_one_table ----
DEBUG: target_name : public.table_for_repack
DEBUG: target_oid : 2093608
DEBUG: target_toast : 0
DEBUG: target_tidx : 0
DEBUG: pkid : 2093614
DEBUG: ckid : 0
DEBUG: create_pktype : SELECT repack.create_index_type(2093614,2093608)
DEBUG: create_log : SELECT repack.create_log_table(2093608)
DEBUG: create_trigger : CREATE TRIGGER repack_trigger AFTER INSERT OR DELETE OR UPDATE ON public.table_for_repack FOR EACH ROW EXECUTE PROCEDURE repack.repack_trigger('id')
DEBUG: enable_trigger : ALTER TABLE public.table_for_repack ENABLE ALWAYS TRIGGER repack_trigger
DEBUG: create_table : SELECT repack.create_table($1, $2)
DEBUG: dest_tablespace : pg_default
DEBUG: copy_data : INSERT INTO repack.table_2093608 SELECT id,name,value,created_at FROM ONLY public.table_for_repack
DEBUG: alter_col_storage : (skipped)
DEBUG: drop_columns : (skipped)
DEBUG: delete_log : DELETE FROM repack.log_2093608
DEBUG: lock_table : LOCK TABLE public.table_for_repack IN ACCESS EXCLUSIVE MODE
DEBUG: sql_peek : SELECT * FROM repack.log_2093608 ORDER BY id LIMIT $1
DEBUG: sql_insert : INSERT INTO repack.table_2093608 VALUES ($1.*)
DEBUG: sql_delete : DELETE FROM repack.table_2093608 WHERE (id) = ($1.id)
DEBUG: sql_update : UPDATE repack.table_2093608 SET (id, name, value, created_at) = ($2.id, $2.name, $2.value, $2.created_at) WHERE (id) = ($1.id)
DEBUG: sql_pop : DELETE FROM repack.log_2093608 WHERE id IN (
Then:
pg_repack
of happening on the same table.table_for_repack
in access exclusive mode. This helps
with doing pg_get_indexdef
calls (which require an access share lock
anyway) and making sure they won't be blocked.SELECT pg_get_indexdef(indexrelid)
FROM pg_index WHERE indrelid = $1 AND NOT indisvalid
-- LOG: (query) SELECT pg_get_indexdef(indexrelid) FROM pg_index
-- WHERE indrelid = $1 AND NOT indisvalid
-- LOG: (param:0) = 2093608
This query checks if there's any invalid index present on the table, and
raises an error if --error-on-invalid-index
is not set.SELECT indexrelid, repack.repack_indexdef(indexrelid, indrelid, $2, FALSE)
FROM pg_index WHERE indrelid = $1 AND indisvalid
-- LOG: (param:0) = 2093608
This query will store the indexrelid
information, and the
repack_indexdef
function figures out what the definition of that index has
to be for the table to be repacked.SELECT repack.create_index_type(2093614,2093608)
.SELECT repack.create_log_table(2093608)
.CREATE TRIGGER repack_trigger AFTER INSERT OR DELETE OR UPDATE ON
public.table_for_repack
FOR EACH ROW EXECUTE PROCEDURE repack.repack_trigger('id')
ALTER TABLE public.table_for_repack ENABLE ALWAYS TRIGGER repack_trigger
.SELECT repack.disable_autovacuum('repack.log_2093608')
I'm not sure why this is needed. Worth checking history of this (see
bin/pg_repack.c:1389)table_for_repack
in
AccessShare mode to avoid DDLs trying to change that table of running
before we repack. That would affect our repacking process. DDLs are killed
if that's the case.
LOCK TABLE public.table_for_repack IN ACCESS SHARE MODE
.Now the process of starting to backfill starts
SET work_mem = maintanence_work_mem
SELECT set_config('work_mem', current_setting('maintenance_work_mem'), true)", 0, NULL);
DELETE FROM repack.log_2093608
. This truncates the
table.default
. (Not sure when this happens).SELECT repack.create_table($1, $2)
LOG: (query) SELECT repack.create_table($1, $2)
LOG: (param:0) = 2093608
LOG: (param:1) = pg_default
INSERT INTO repack.table_2093608 SELECT id,name,value,created_at
FROM ONLY public.table_for_repack
SELECT repack.disable_autovacuum('repack.table_2093608')
SELECT 'repack.table_2093608'::regclass::oid
apply_log
on a ethernal loop which runsLOG: (query) SELECT repack.repack_apply($1, $2, $3, $4, $5, $6)
LOG: (param:0) = SELECT * FROM repack.log_2093608 ORDER BY id LIMIT $1
LOG: (param:1) = INSERT INTO repack.table_2093608 VALUES ($1.*)
LOG: (param:2) = DELETE FROM repack.table_2093608 WHERE (id) = ($1.id)
LOG: (param:3) = UPDATE repack.table_2093608 SET (id, name, value, created_at) = ($2.id, $2.name, $2.value, $2.created_at) WHERE (id) = ($1.id)
LOG: (param:4) = DELETE FROM repack.log_2093608 WHERE id IN (
LOG: (param:5) = 1000
TODO: Learn about apply_log
.ANALYZE public.table_for_repack
(why?)repack_drop
The function has 6 arguments and returns the number of performed operations.
* @param sql_peek SQL to pop tuple from log table.
* @param sql_insert SQL to insert into temp table.
* @param sql_delete SQL to delete from temp table.
* @param sql_update SQL to update temp table.
* @param sql_pop SQL to bulk-delete tuples from log table.
* @param count Max number of operations, or no count iff <=0.
* @retval Number of performed operations.
In my example:
sql_peek: SELECT * FROM repack.log_2093608 ORDER BY id LIMIT $1
sql_insert: INSERT INTO repack.table_2093608 VALUES ($1.*)
sql_delete: DELETE FROM repack.table_2093608 WHERE (id) = ($1.id)
sql_update: UPDATE repack.table_2093608 SET (id, name, value, created_at) = (
$2.id, $2.name, $2.value, $2.created_at
) WHERE (id) = ($1.id)
sql_pop: DELETE FROM repack.log_2093608 WHERE id IN (
count: 1000
The repack_prepare
function parses and prepares the SQL to be executed, but
doesn't run it. It's the job of execute_plan
to do so.
I've used this script to quickly recompile pg_repack
:
run_lldb() {
POSTGRES_DIR=$HOME/workspace/postgres15
REPACK_VERSION=1.5.1
cd $POSTGRES_DIR
cd build/
cd pg_repack
PG_CONFIG=$POSTGRES_DIR/build/bin/pg_config make
PG_CONFIG=$POSTGRES_DIR/build/bin/pg_config make install
cp bin/pg_repack $POSTGRES_DIR/build/bin
PG_REPACK_BIN=$POSTGRES_DIR/build/bin/pg_repack
TABLE=table_for_repack
DB_NAME=repack_test
lldb -- $PG_REPACK_BIN
}
run_lldb
(lldb) b b
(lldb) run --table table_for_repack -d repack_test --echo --elevel=DEBUG
Additionally, some of the queries triggered from inside the extension don't
really show up on the logs. I had to manually hack the places where I wanted
to debug with elog
calls such as:
elog(LOG, "[REPACK_TEST] SPI_execute: %s", sql);
For this to take effect, remember to recompile the C project and then:
DROP EXTENSION pg_repack CASCADE; CREATE EXTENSION pg_repack;
Also, if I wanted to pause the program I dropped some sleep()
calls, this was
helpful when I wanted to throw some more data in the table_to_repack while the
repack apply process was running.
INSERT INTO table_for_repack (name, value) VALUES ('my_name000', 42);
INSERT INTO table_for_repack (name, value) VALUES ('my_name001', 42);
INSERT INTO table_for_repack (name, value) VALUES ('my_name003', 42);
UPDATE table_for_repack SET value = 43 WHERE name = 'my_name003';
DELETE FROM table_for_repack WHERE name = 'my_name001';
DELETE FROM table_for_repack WHERE name = 'my_name002';
DELETE FROM table_for_repack WHERE name = 'my_name003';
-- This has to be run very fast after these update queries!
SELECT * FROM repack.log_2094620;
--
-- id | pk | row
------+----------+--------------------------------------------------------
-- 1 | | (100012,my_name001,42,"2025-02-04 18:14:53.780287+13")
-- 2 | | (100013,my_name003,42,"2025-02-04 18:14:53.785157+13")
-- 3 | (100013) | (100013,my_name003,43,"2025-02-04 18:14:53.785157+13")
-- 4 | (100012) |
-- 5 | (100013) |