Marcelo Fernandes

Profiling a Django Migration in Postgres

2025-02-17T00:00:00Z

Created at: 2025-02-17

In this post I want to start from the end. I want to look into the SQL for a particular schema change, and then verify whether a Django migration that produces this change is safe to run in production or not.

Let's start with a question: Is the following schema change safe to run in a production database?

ALTER TABLE foo ADD COLUMN bar int NOT NULL DEFAULT 1234;

In this hypothetical scenario, foo is:

Fairly large (over 100GB).
Used in anger in production.
Running on a supported Postgres version (> v12).

Without answering the question yet, I want you to consider this other statement:

ALTER TABLE foo ADD COLUMN buzz int NOT NULL DEFAULT (random() * 10000)::int;

So, have you figured out whether either (or both) of those are safe to run?

If not, you might want to start thinking about what Postgres would have to do in order to have a NOT NULL column with a DEFAULT value.

Would it need to scan the table and store those values in existing rows? What if the new rows didn't fit in the page? Is there a way to do it so that Postgres doesn't need to scan the table?

One of the worst things that can happen when you perform a schema change is for it to end up rewriting the table. Rewriting takes time, and while the table is being rewritten, the DDL statement will be holding an access exclusive lock, not permitting any other sessions and transactions to read or write to the table.

Supposition: The table is rewritten

So let's investigate whether these statements rewrite the table, starting with the one that uses random() as its default. We will first need to create the foo table and populate it.

-- Create the table
DROP TABLE IF EXISTS foo;
CREATE TABLE foo (id SERIAL PRIMARY KEY);

-- Insert 100_000 rows.
INSERT INTO foo (id) SELECT generate_series(1, 100000);

Next, we want to know what Postgres is doing internally. For that, we'll need to profile what happens when the ALTER TABLE command is running.

Note: As I am writing this post on a Mac, I will use "Instruments" to profile Postgres, but if you are on a Linux machine you can use perf instead. I wrote a guide here for the Linux users.

The first step is to grab the process id of the Postgres backend serving the psql session we are going to profile:

SELECT pg_backend_pid();

Then, open the "Time Profiler" tool on Instruments.

And find the Postgres process. In terms of configuration I mostly use the defaults. I only change the frequency to "High", and recording mode to "Deferred":

Now we hit RECORD, and perform these statements on psql:

BEGIN;
ALTER TABLE foo ADD COLUMN buzz int NOT NULL DEFAULT (random() * 10000)::int;

And then we hit STOP. The profiler result would look something like this:

There is a suspicious call to ATRewriteTable... This is not good!

Let's see what the other alter table with a constant default does. But first, let's rollback that transaction.

ROLLBACK;

And now let's run our Time Profiler and then execute the command:

BEGIN;
ALTER TABLE foo ADD COLUMN bar int NOT NULL DEFAULT 12345;

Wait a minute... Is this calling ATRewriteTables?

Yes! But this is a false positive... Calling this function doesn't mean that it is actually rewriting the table. Perhaps a better name for that function would be ATMaybeRewriteTables? ...

In any case, if ATRewriteTables is going to actually do anything, it will call the ATRewriteTable (note the singular) function, where the magic happens.

But also, scrolling down that function I see the pattern:

    if (newrel || needscan)
    {
        if (newrel)
            ereport(DEBUG1,
                    (errmsg_internal("rewriting table \"%s\"",
                                     RelationGetRelationName(oldrel))));
        else
            ereport(DEBUG1,
                    (errmsg_internal("verifying table \"%s\"",
                                     RelationGetRelationName(oldrel))));

So Postgres emits a DEBUG1-level message whenever it rewrites or verifies a table. We can have these messages sent to our psql session with:

SET client_min_messages=debug1;

So if we run the SQL statements again, we'll see that log message showing up in the psql shell:

BEGIN;

ALTER TABLE foo ADD COLUMN buzz int NOT NULL DEFAULT (random() * 10000)::int;
-- DEBUG:  rewriting table "foo"

-- This one doesn't print anything, as the table is not rewritten.
ALTER TABLE foo ADD COLUMN bar int NOT NULL DEFAULT 12345;

ROLLBACK;
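
The same check can be scripted. The sketch below is a minimal, hypothetical helper (the names `classify_notice` and `rewritten_tables` are mine, not part of Postgres or any driver) that classifies the DEBUG1 messages shown above. It assumes the messages arrive as plain strings in the format psql prints them; with psycopg2, for instance, such messages are collected in the `connection.notices` list.

```python
import re
from typing import List, Optional, Tuple

# Matches the two DEBUG1 messages emitted by ATRewriteTable, e.g.
#   DEBUG:  rewriting table "foo"
_NOTICE_RE = re.compile(r'(rewriting|verifying) table "([^"]+)"')


def classify_notice(notice: str) -> Optional[Tuple[str, str]]:
    """Return (action, table) for a rewrite/verify message, else None."""
    match = _NOTICE_RE.search(notice)
    if match is None:
        return None
    return match.group(1), match.group(2)


def rewritten_tables(notices: List[str]) -> List[str]:
    """Collect the names of tables that the notices report as rewritten."""
    found = []
    for notice in notices:
        parsed = classify_notice(notice)
        if parsed is not None and parsed[0] == "rewriting":
            found.append(parsed[1])
    return found
```

A migration test could run the DDL inside a transaction with client_min_messages set to debug1, pass the collected notices to `rewritten_tables`, and fail if the list is not empty.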

The Django Equivalent

Say we have the following "dumb" model:

class Foo(models.Model):
    pass

Let's add a new integer field with a default:

class Foo(models.Model):
    bar = models.IntegerField(null=False, default=10)

Django will create the following migration automatically:

# Generated by Django 5.1.6 on 2025-02-17 05:54

from django.db import migrations, models


class Migration(migrations.Migration):

    dependencies = [
        ('app', '0001_initial'),
    ]

    operations = [
        migrations.AddField(
            model_name='foo',
            name='bar',
            field=models.IntegerField(default=10),
        ),
    ]

Which results in these SQL statements:

BEGIN;
--
-- Add field bar to foo
--
ALTER TABLE "myfoo" ADD COLUMN "bar" integer DEFAULT 10 NOT NULL;
ALTER TABLE "myfoo" ALTER COLUMN "bar" DROP DEFAULT;
COMMIT;

Why is Django creating a default and dropping it immediately? This follows from three considerations:

Django allows default to be a callable:

def my_default():
    import random
    return random.randint(0, 42)

class Foo(models.Model):
    bar = models.IntegerField(null=False, default=my_default)

In this case, Django calls my_default once and uses the returned value to generate the DDL statement. If you run sqlmigrate multiple times, it will even generate different outputs!

-- ./manage.py sqlmigrate app 0004

BEGIN;
ALTER TABLE "foo" ADD COLUMN "bar" integer DEFAULT 15 NOT NULL;
ALTER TABLE "foo" ALTER COLUMN "bar" DROP DEFAULT;
COMMIT;

-- ./manage.py sqlmigrate app 0004

BEGIN;
--
-- Add field bar to foo
--
ALTER TABLE "foo" ADD COLUMN "bar" integer DEFAULT 4 NOT NULL;
ALTER TABLE "foo" ALTER COLUMN "bar" DROP DEFAULT;
COMMIT;

As a consequence of the above, the callable may contain logic that is too complex to reproduce in SQL. This means that Django has to enforce the default at the application level, not at the database level.

If the above is true, why have a DEFAULT at all? That's because adding a NOT NULL column without a default to an existing table that already contains rows is an error in Postgres:
```
ALTER TABLE foo ADD COLUMN buzz_buzz int NOT NULL;
-- ERROR:  column "buzz_buzz" of relation "foo" contains null values
```

We can see these limitations as a consequence of Django's design to allow the default argument to work with callables.
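
To make the "evaluated once" behaviour concrete, here is a toy model of it in plain Python. It is a sketch, not Django's schema editor: `render_add_column` is a hypothetical helper, and it only handles defaults whose `str()` form is a valid SQL literal (such as integers).

```python
import random


def render_add_column(table, column, sql_type, default):
    """Render ADD COLUMN + DROP DEFAULT, resolving a callable default once."""
    # A callable is invoked a single time; its result is frozen into the
    # statement as a literal, so the database never sees the callable.
    value = default() if callable(default) else default
    return [
        f'ALTER TABLE "{table}" ADD COLUMN "{column}" {sql_type} '
        f"DEFAULT {value} NOT NULL;",
        f'ALTER TABLE "{table}" ALTER COLUMN "{column}" DROP DEFAULT;',
    ]


def my_default():
    return random.randint(0, 42)


if __name__ == "__main__":
    # Rendering the "same" migration twice can embed different literals.
    for _ in range(2):
        for statement in render_add_column("foo", "bar", "integer", my_default):
            print(statement)
```

Every existing row gets the one value that was frozen into the statement. Rows inserted later get their value from the application, because the database default has already been dropped.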

Further Problems

If your database can be used by people from outside the Django application, the defaults won't be honoured. From a data-integrity perspective, it is better to enforce rules in the database than in the application.

A Little Plot Twist

Luckily, Django 5.0 introduced the parameter Field.db_default, which allows the default to be enforced at the database level!

So you can have this change:

class Foo(models.Model):
    bar = models.IntegerField(null=False, db_default=10)

Which creates these changes:

BEGIN;
--
-- Add field bar to foo
--
ALTER TABLE "myfoo" ADD COLUMN "bar" integer DEFAULT 10 NOT NULL;
COMMIT;

Note how the DEFAULT is not dropped in this case.

Should you not use Postgres varchar(n) by default?

2025-02-01T00:00:00Z

Created at: 2025-02-01

The Postgres wiki has a page called "Don't Do This" where general good practices are discussed.

Amongst them, there is a section titled "Don't use varchar(n) by default", which is copied verbatim below:

Why not? varchar(n) is a variable width text field that will throw an error if you try and insert a string longer than n characters (not bytes) into it.

varchar (without the (n)) or text are similar, but without the length limit. If you insert the same string into the three field types they will take up exactly the same amount of space, and you won't be able to measure any difference in performance.

If what you really need is a text field with an length limit then varchar(n) is great, but if you pick an arbitrary length and choose varchar(20) for a surname field you're risking production errors in the future when Hubert Blaine Wolfeschlegelsteinhausenbergerdorff signs up for service.

Some databases don't have a type that can hold arbitrary long text, or if they do it's not as convenient or efficient or well-supported as varchar(n). Users from those databases will often use something like varchar(255) when what they really want is text.

If you need to constrain the value in a field you probably need something more specific than a maximum length - maybe a minimum length too, or a limited set of characters - and a check constraint can do all of those things as well as a maximum string length.

When should you?

When you want to, really. If what you want is a text field that will throw an error if you insert too long a string into it, and you don't want to use an explicit check constraint then varchar(n) is a perfectly good type. Just don't use it automatically without thinking about it.

Also, the varchar type is in the SQL standard, unlike the text type, so it might be the best choice for writing super-portable applications.

The reasons for using varchar (without the (n)) are compelling:

No performance penalties.
Reduced risk of errors if you misrepresented the size of the data.

What the wiki doesn't do well is lay out the downsides of the approach it steers you towards, namely using a bare varchar by default.

Let's go over them.

Denial-of-Service (DoS) via Uncontrolled Data Insertion

If you choose a bare varchar for your surname field, you'll need validation somewhere in the application to ensure this field doesn't become a vector for attacks.

If a malicious party finds this free-text field without an upper-limit validation, they can perform database stuffing by storing an enormous volume of data in the database, resulting in a DoS attack.

Even if the application pre-validates the data before storing it, this level of validation is much weaker as a guarantee of data integrity than delegating the job to the database. The database is excellent for data integrity guarantees, application code is not.

Most services have hard constraints on such inputs. For example, the below is the limit for names on Twitter:

Storing free-text in a database may not be a good idea

Databases are optimised for structured data. There are better alternatives for storing free-text like S3, CDNs, or even just a dumb static file server.

Having large free-text fields stored in a table will reduce the performance of the server. At a minimum you have the overhead of a TOAST table for some large rows, but you are also slowing down many database maintenance activities and backup tasks for data that you might not always need to have at hand.

Increasing the size of a varchar(n) is not a problem

Performing an ALTER TABLE to pump the value of n up is a catalogue-only operation and won't culminate in a database outage.

Of course, at that point you might have had a few angry customers complaining about errors in the application. You have to weigh whether that is worse than the risk of a DoS attack via uncontrolled data insertion.

There is a real problem though if you want to decrease the value of n. This will rewrite the table:

-- This will tell you if a table is being re-written.
SET client_min_messages=debug1;

DROP TABLE IF EXISTS test;

CREATE TABLE test (id SERIAL PRIMARY KEY, str varchar(6));

INSERT INTO test (str) SELECT generate_series(1, 1000);

-- Increasing the value of `n`, no problem here.
ALTER TABLE test ALTER COLUMN str TYPE varchar(7);

-- Also completely removing `n`, no problem!
-- Caveat: this will trigger a potential "building_index" operation for the
-- TOAST table.
ALTER TABLE test ALTER COLUMN str TYPE varchar;

-- Decreasing the value of `n`. This is a risky operation!
ALTER TABLE test ALTER COLUMN str TYPE varchar(4);
-- DEBUG:  rewriting table "test"
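
As a rule of thumb, the behaviour observed above can be captured in a few lines of Python. The helper name is hypothetical, and it only encodes the cases exercised in this post, starting from a varchar(n) column. It is not a description of how Postgres decides internally.

```python
from typing import Optional


def varchar_change_rewrites(old_n: int, new_n: Optional[int]) -> bool:
    """True if changing varchar(old_n) to varchar(new_n) rewrites the table.

    new_n=None stands for a bare varchar (no length limit).
    """
    if new_n is None:
        return False  # removing the limit: no rewrite observed
    return new_n < old_n  # only shrinking the limit forced a rewrite
```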

I haven't seen cases of having to reduce the value of n before in production.

But even then, there is a way to set a lower upper bound without downtime via check constraints:

-- [OPTIONAL] you can promote the field to a bare varchar first
ALTER TABLE test ALTER COLUMN str TYPE varchar;

-- Add a NOT VALID constraint, so that it does not scan the table while holding
-- an AccessExclusive Lock.
ALTER TABLE test
ADD CONSTRAINT chk_str_length CHECK (LENGTH(str) <= 4)
NOT VALID;

-- This will only acquire a ShareUpdateExclusiveLock
ALTER TABLE test VALIDATE CONSTRAINT chk_str_length;

Note that the check constraint performance may be slower than the native varchar(n) check due to the function evaluations behind performing a constraint check.
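
If you apply this two-step pattern often, it can be templated. Below is a minimal sketch in Python: `bounded_length_ddl` is a hypothetical helper of mine, the constraint naming scheme simply mirrors the example above, and the identifier check is deliberately strict (plain Python-style identifiers only) because the names are interpolated straight into the SQL.

```python
from typing import List


def bounded_length_ddl(table: str, column: str, max_len: int) -> List[str]:
    """Return the two statements that add a length limit without a long lock."""
    for name in (table, column):
        # Names are interpolated into SQL, so refuse anything unusual.
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    if max_len < 1:
        raise ValueError("max_len must be a positive integer")

    constraint = f"chk_{column}_length"
    return [
        # Step 1: NOT VALID skips the scan of existing rows.
        f"ALTER TABLE {table} ADD CONSTRAINT {constraint} "
        f"CHECK (LENGTH({column}) <= {max_len}) NOT VALID;",
        # Step 2: validate existing rows under a weaker lock.
        f"ALTER TABLE {table} VALIDATE CONSTRAINT {constraint};",
    ]
```

Running the two statements in separate transactions keeps the stronger lock taken by the first one as short as possible.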

Conclusions

Think thoroughly about upper/lower bounds of your data before creating a field.
Weigh the risk of length-limit errors against a potential DoS attack surface.
Do not reach for free-text fields by default, unless you always add a CHECK constraint to enforce input limits.

Code Reviews In Vim

2024-11-21T00:00:00Z

Created at: 2024-11-21

A common way of reviewing code today is by performing the review using the repository host UI (GitHub, GitLab, etc.).

I have done that for a while, and I still do it when code changes are trivial.

"Trivial" means I don't need to play with the branch locally first to have confidence the changes are correct.

However, often I will work with code that is hard to "only see" and feel confident it does the right thing.

For more complex cases, having the branch locally allows me to inspect and alter the code better than I can using the repository host UI.

Fetching

The first step is to get the branch locally:

git fetch origin branch_name && git checkout branch_name

You can simplify this command and skip to git checkout if you always fetch the entire remote (which I don't do because it takes too much space/time).

Showing the commits

The next step is to find exactly which commits the new branch includes.

git log -p master..HEAD

This command basically means "show me all the commits that this branch (HEAD) introduced since it diverged from master".

This effectively shows the commits created by the branch author and nothing else.

The -p (patch) flag includes the diff for each commit in the result. I sometimes skip it if I am visualising the patches in a different way.

When in Vim, I use the vim-fugitive plugin. Running the same command through the plugin wrapper gives me a quick-fix window containing the commits from the pull request.

Running :Gclog master..HEAD looks like this:

Now I can quickly navigate between commits to see what's changed.

If you are using a different tool but still want to see the commit changes in your editor, you can try:

git show <commit_hash>

Checking Out Each Commit

I currently work on codebases that use atomic commits. This essentially means that for each commit:

The test suite must fully pass.
The integrity of the codebase isn't in jeopardy (no half-done changes between commits).
The codebase is in a deployable state.
A commit explains a single change, not multiple.

For example, a commit title "Change X and Y" is an indication that the commit isn't atomic. Multiple things are changing in the same commit.

Having atomic commits means that I can code-review a Pull Request commit-by-commit.

Of course there are many more benefits to this practice. For example, I can use git blame effectively. No change will be buried in a 40-commit rebased branch that lacks a detailed explanation in the commit description.

So after I have run git log -p master..HEAD, I will go through each commit and perform:

# Checkout the relevant commit I want to play with
git checkout <commit hash>
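
When a branch has many commits, the list of hashes to walk through can be produced by a small script. This is a sketch in Python under my own assumptions: the function names are hypothetical, and it relies on `git log --reverse --format=%h%x09%s`, which prints one "short hash, tab, subject" line per commit, oldest first.

```python
import subprocess
from typing import List, Tuple


def parse_commits(log_output: str) -> List[Tuple[str, str]]:
    """Parse lines of '<hash><TAB><subject>' into (hash, subject) pairs."""
    commits = []
    for line in log_output.splitlines():
        if not line.strip():
            continue
        commit_hash, _, subject = line.partition("\t")
        commits.append((commit_hash, subject))
    return commits


def branch_commits(base: str = "master", head: str = "HEAD") -> List[Tuple[str, str]]:
    """List the commits in base..head, oldest first, by calling git."""
    result = subprocess.run(
        ["git", "log", "--reverse", "--format=%h%x09%s", f"{base}..{head}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_commits(result.stdout)


# Example usage, from inside the repository being reviewed:
#   for commit_hash, subject in branch_commits():
#       print(f"git checkout {commit_hash}  # {subject}")
```

Listing oldest first means the commits are reviewed in the order the author wrote them.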

If I have messed things up, I can just check the reflog and go back to the place where I got the branch from.

# List the "reference logs" to find the record when the tip of the branch
# reference changed.

git reflog

# Now go back to the hash recorded when I first checked out the branch.
# It will look something like:
# 43c0eb3 HEAD@{3}: checkout: moving from main to my_branch

git checkout 43c0eb3

Taking Notes

If I am reviewing a complicated branch, I will usually open a new file in the /tmp/ folder to take some notes in.

There isn't anything fancy about that. It comes from the principle of wanting to have vim-editing capabilities when writing down a comment on a pull request.

Often, I will write code blocks in reply to a commit anyway, so writing comments in vim is easier than in, say, the GitHub UI.

Note: There are ways to embed vim into a browser nowadays, but it often feels strange. I prefer to not use an embedded vim.

About That Postgres Json Field

2024-10-10T00:00:00Z

Created at: 2024-10-10

The json type was introduced in Postgres 9.2. Since then, the json type has gone through multiple enhancements, including the addition of the jsonb type (9.4).

Because the json type stores an exact copy of the input text, it will preserve semantically-insignificant white space between tokens, as well as the order of keys within json objects. Also, if a json object within the value contains the same key more than once, all the key/value pairs are kept. (The processing functions consider the last value as the operative one.) By contrast, jsonb does not preserve white space, does not preserve the order of object keys, and does not keep duplicate object keys. If duplicate keys are specified in the input, only the last value is kept. source

This sounds great, but there are downsides to storing json fields (especially big ones) in Postgres.

The performance of some database operations degrades roughly linearly with the size of the json.

To analyse this behaviour, I set up a sandbox script. The script creates a table with 500,000 rows, with varying sizes of json fields and runs common operations against it.

The script and the data are included verbatim at the bottom of this post.

VACUUM

VACUUM becomes more resource-intensive for tables with large json values. Each dead tuple linked to a large json field adds to the overhead.

This makes VACUUM run longer, possibly delaying other maintenance tasks or DDLs.

INSERTS

UPDATES

SELECT (queries)

Big json fields can degrade query performance. Tables with such fields take longer to read from disk and use more storage when cached in memory.

This makes querying these tables less efficient. SELECTs have to do more work to scan the same amount of data.

If the json field doesn't exceed a particular threshold (2KB by default, tunable per table with, for example, CREATE TABLE ... WITH (toast_tuple_target=128)), it won't be moved to a toast table. Instead, it will be stored inline on the page.

If a table doesn't have a high ratio of HOT updates, the volume of dead rows will increase. This further affects performance.

Considerations

Consider omitting the json field from your SELECT queries when applicable. This will remove the overhead added by querying and decoding the data from the TOAST table.

If you don't update the json data after it has been stored, consider a different type of storage. One option is storing the data on a bucket like S3.

Buckets can be a good option if you are satisfied with the following trade-offs:

| Factor                   | Keep json in PostgreSQL                  | Move json to S3                              |
|--------------------------|------------------------------------------|----------------------------------------------|
| Database size            | Increases database size                  | Keeps DB lean; reduces size                  |
| Performance              | json querying is slower for large fields | Keeps queries fast; json retrieved separately|
| Cost                     | Higher storage costs                     | Cheaper for large, unstructured data         |
| Atomicity & Transactions | Full transactional consistency           | No transactional guarantees                  |
| Querying                 | Direct SQL querying on json              | No direct querying                           |
| Simplicity               | All data in one place                    | Separate management of S3 and DB             |
| Access Latency           | Low-latency access                       | Potential latency in fetching from S3        |

Inspecting the TOAST table

You can find the name of a toast table with this query:

SELECT
    c.relname AS main_table,
    t.relname AS toast_table
FROM
    pg_class c
JOIN
    pg_class t ON c.reltoastrelid = t.oid
WHERE
    c.relname = 'my_table';

After you find the name of the toast table, you can look up its statistics (assuming the toast table is named pg_toast_24683):

SELECT *
FROM pg_stat_all_tables
WHERE relid = 'pg_toast.pg_toast_24683'::regclass;

You can also see the size of each id and its associated chunks:

SELECT
    chunk_id,
    COUNT(*) as chunks,
    pg_size_pretty(sum(octet_length(chunk_data)::bigint))
FROM pg_toast.pg_toast_24683
GROUP BY 1 ORDER BY 1;

And the size of the toast table in comparison to the table itself.

SELECT
    c1.relname,
    pg_size_pretty(pg_relation_size(c1.relname::regclass)) AS size,
    c2.relname AS toast_relname,
    pg_size_pretty(pg_relation_size(('pg_toast.' || c2.relname)::regclass)) AS toast_size
FROM
    pg_class c1
    JOIN pg_class c2 ON c1.reltoastrelid = c2.oid
WHERE
    c1.relkind = 'r'
    AND c1.relname = 'table_name';

And you can query its rows as a regular table:

SELECT *
FROM pg_toast.pg_toast_24683 LIMIT 100;

For more on toast follow these links:

blog post hakibenita

The benchmark script

import psycopg2
import time
import os
import json
import random
import string

"""
This code creates a benchmark for Postgres tables with
json fields.

For a table with NUM_OF_ROWS, for each value of JSON_SIZE_IN_BYTES:

- Check how long it takes to update PERCENTAGE rows in the table.
- Check how long it takes to insert PERCENTAGE rows in the table.
- Without vacuuming yet, check the average time to query NUM_QUERIES. There
  will be dead tuples impacting the performance from the operations above.
- Check how long it takes to vacuum.

"""

NUM_OF_ROWS = 500_000

JSON_SIZE_IN_BYTES = [
    10,
    100,
    200,
    500,
    1000,
    3_000,
    5_000,
    10_000,
    15_000,
    20_000,
    40_000,
]
PERCENTAGE = 0.1

NUM_QUERIES = 10_000
QUERY_LIMIT = 500


def get_cursor_and_connection():
    # Update connection details as per your PostgreSQL setup
    conn = psycopg2.connect(
        dbname="test_db",
        user="postgres",
        password="postgres",
        host="localhost",
        port="5441",
    )
    conn.autocommit = True
    return conn.cursor(), conn


def vacuum_table(cursor):
    print("- Vacuuming json_bench...")
    start_time = time.time()
    cursor.execute("VACUUM ANALYZE json_bench;")
    duration = time.time() - start_time
    print(f"- Vacuum took: {duration:.2f} seconds.")
    return duration


def create_table(cursor):
    print("- Creating table...")
    cursor.execute("""
        -- Idempotency for convenience.
        DROP TABLE IF EXISTS json_bench;
        CREATE TABLE json_bench (
            id SERIAL PRIMARY KEY,
            json_field JSONB
        );

        -- Disable autovacuum to not interfere with results.
        ALTER TABLE json_bench
        SET (autovacuum_enabled = false);

        -- Make sure the json field will be toasted at 2kb
        -- and compressed too.
        ALTER TABLE json_bench
        ALTER COLUMN json_field
        SET STORAGE EXTENDED;

        -- The default toast threshold is 2kb (comp time)
        -- #define TOAST_TUPLE_THRESHOLD 2048
    """)


def generate_json(json_size_in_bytes):
    return json.dumps(
        {
            "data": "".join(
                random.choices(
                    string.ascii_letters + string.digits, k=json_size_in_bytes
                )
            )
        }
    )


def populate_table(cursor, json_size_in_bytes):
    print(f"- Populating table with {NUM_OF_ROWS:,} rows...")

    json_data = generate_json(json_size_in_bytes)
    json_rows = [(json_data,) for _ in range(NUM_OF_ROWS)]

    start_time = time.time()
    cursor.executemany("INSERT INTO json_bench (json_field) VALUES (%s);", json_rows)
    duration = time.time() - start_time

    print(f"- Populating took: {duration:.2f} seconds.")
    return duration


def update_data(cursor, json_size_in_bytes):
    """
    This will generate some dead rows.
    """
    update_count = int(PERCENTAGE * NUM_OF_ROWS)
    print(f"- Updating {update_count:,} rows...")

    json_data = generate_json(json_size_in_bytes)

    update_query = """
    UPDATE json_bench
    SET json_field = %s
    WHERE id IN (
        SELECT id
        FROM json_bench
        ORDER BY id DESC
        LIMIT %s
    );
    """

    start_time = time.time()
    cursor.execute(update_query, (json_data, update_count))
    duration = time.time() - start_time

    print(f"- Update took: {duration:.2f} seconds.")
    return duration


def insert_data(cursor, json_size_in_bytes):
    insertion_count = int(PERCENTAGE * NUM_OF_ROWS)
    print(f"- Inserting {insertion_count:,} rows into the table...")

    json_data = generate_json(json_size_in_bytes)
    json_rows = [(json_data,) for _ in range(insertion_count)]

    start_time = time.time()
    cursor.executemany("INSERT INTO json_bench (json_field) VALUES (%s);", json_rows)
    duration = time.time() - start_time

    print(f"- Insertion took: {duration:.2f} seconds.")
    return duration


def benchmark_queries(cursor):
    print(f"- Benchmarking {NUM_QUERIES:,} queries against the table...")
    start_time = time.time()

    for _ in range(NUM_QUERIES):
        cursor.execute(
            f"SELECT * FROM json_bench ORDER BY RANDOM() LIMIT {QUERY_LIMIT};"
        )
        cursor.fetchall()

    duration = time.time() - start_time
    print(f"- Average query time: {(duration/NUM_QUERIES):.5f} seconds.")
    return duration / NUM_QUERIES


def clear_cache():
    print("- Clearing cache by restarting docker container...")
    os.system("docker restart postgres15")
    print("- sleeping for 10s")
    time.sleep(10)


def query_dead_tuples(cursor):
    cursor.execute(
        "SELECT n_dead_tup FROM pg_stat_user_tables WHERE relname = 'json_bench';"
    )
    print(f"- There are {cursor.fetchone()[0]} dead tuples...")


def query_hot_updates(cursor):
    cursor.execute(
        """
        SELECT n_tup_hot_upd
        FROM pg_stat_user_tables
        WHERE relname = 'json_bench';
        """
    )
    print(f"- There were {cursor.fetchone()[0]} hot updates...")


def run_tests():
    results = {}
    print(
        f"\nReport details:\n"
        f"  - rows in the table: {NUM_OF_ROWS:,}\n"
        f"  - percentage of updates and inserts: {PERCENTAGE*100:.2f}%\n"
        f"  - number of queries to benchmark: {NUM_QUERIES:,} with limit {QUERY_LIMIT:,}"
    )

    for json_size in JSON_SIZE_IN_BYTES:
        print(f"\nRunning tests with json's of {json_size:,} bytes...")

        # Get a cursor to run queries.
        cursor, conn = get_cursor_and_connection()

        # Create the table and insert data.
        create_table(cursor)
        populate_table(cursor, json_size)

        # Vacuum so there are no dead rows.
        vacuum_table(cursor)

        # Update PERCENTAGE of the rows to create some dead rows
        # for a more realistic scenario.
        update_duration = update_data(cursor, json_size)
        query_dead_tuples(cursor)
        query_hot_updates(cursor)

        # Insert PERCENTAGE new rows while we have dead rows
        insert_duration = insert_data(cursor, json_size)
        query_dead_tuples(cursor)
        query_hot_updates(cursor)

        # With the dead rows, check the query performance
        query_avg_duration = benchmark_queries(cursor)

        # Check how long vacuuming the dead rows takes
        vacuum_duration = vacuum_table(cursor)

        # Restart the container to clear Postgres caches, then reconnect
        clear_cache()
        cursor, conn = get_cursor_and_connection()

        results[json_size] = {
            "update_duration": update_duration,
            "insert_duration": insert_duration,
            "query_avg_duration": query_avg_duration,
            "vacuum_duration": vacuum_duration,
        }

    cursor.close()
    conn.close()

    # Print final results
    print("\nFinal Results:")
    for json_size, times in results.items():
        print(f"json size: {json_size:,} bytes")
        print(f" - Update duration: {times['update_duration']:.2f} seconds")
        print(f" - Insert duration: {times['insert_duration']:.2f} seconds")
        print(f" - Query AVG duration: {times['query_avg_duration']:.5f} seconds")
        print(f" - Vacuum duration: {times['vacuum_duration']:.2f} seconds")


if __name__ == "__main__":
    run_tests()

The results in CSV format

json_length,update_time,insert_time,avg_query_time,vacuum_time
10,0.22,41.2,0.0689,0.08
100,0.22,40.29,0.07287,0.1
200,0.24,40.73,0.07694,0.16
500,0.35,42.35,0.12824,0.2
1000,0.51,46.06,0.13995,0.36
3000,1.65,71.18,0.10875,2.4
5000,2.17,59.36,0.10136,3.57
10000,9.67,62.27,0.15999,10.67
15000,8.77,91.57,0.16642,16.69
20000,12.7,111.92,0.22364,20.71
40000,28.16,184.93,0.32687,35.98
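
To read these numbers more easily, each row can be expressed as a multiple of the smallest (10-byte) case. The snippet below is a small sketch that does only that arithmetic on the CSV above; the function name is mine, and the figures come from a single run on one laptop, so treat the ratios as indicative rather than precise.

```python
import csv
import io

RESULTS_CSV = """json_length,update_time,insert_time,avg_query_time,vacuum_time
10,0.22,41.2,0.0689,0.08
100,0.22,40.29,0.07287,0.1
200,0.24,40.73,0.07694,0.16
500,0.35,42.35,0.12824,0.2
1000,0.51,46.06,0.13995,0.36
3000,1.65,71.18,0.10875,2.4
5000,2.17,59.36,0.10136,3.57
10000,9.67,62.27,0.15999,10.67
15000,8.77,91.57,0.16642,16.69
20000,12.7,111.92,0.22364,20.71
40000,28.16,184.93,0.32687,35.98
"""


def slowdown_factors(csv_text):
    """Map json size -> {metric: time relative to the first (smallest) row}."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    baseline = rows[0]
    metrics = [name for name in baseline if name != "json_length"]
    factors = {}
    for row in rows:
        factors[int(row["json_length"])] = {
            metric: float(row[metric]) / float(baseline[metric])
            for metric in metrics
        }
    return factors


if __name__ == "__main__":
    for size, ratios in slowdown_factors(RESULTS_CSV).items():
        rounded = {metric: round(value, 1) for metric, value in ratios.items()}
        print(f"{size:>6} bytes: {rounded}")
```

In this run, vacuum and update times are the most sensitive to json size, while insert and average query times grow far more slowly.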

The machine that ran the tests

[~] neofetch
                   -`                    x@archlinux
                  .o+`                   -----------
                 `ooo/                   OS: Arch Linux x86_64
                `+oooo:                  Host: 20W0005AAU ThinkPad T14 Gen 2i
               `+oooooo:                 Kernel: 6.6.52-1-lts
               -+oooooo+:                Uptime: 3 days, 8 hours, 27 mins
             `/:-:++oooo+:               Packages: 1273 (pacman)
            `/++++/+++++++:              Shell: bash 5.2.37
           `/++++++++++++++:             Resolution: 1920x1080, 1920x1080
          `/+++ooooooooooooo/`           WM: i3
         ./ooosssso++osssssso+`          Theme: Adwaita [GTK2/3]
        .oossssso-````/ossssss+`         Icons: Adwaita [GTK2/3]
       -osssssso.      :ssssssso.        Terminal: alacritty
      :osssssss/        osssso+++.       Terminal Font: LiterationMono Nerd Font
     /ossssssss/        +ssssooo/-       CPU: 11th Gen Intel i5-1135G7 (8) @ 4.200GHz
   `/ossssso+/:-        -:/+osssso+-     GPU: Intel TigerLake-LP GT2 [Iris Xe Graphics]
  `+sso+:-`                 `.-/+oso:    Memory: 6629MiB / 15717MiB
 `++:.                           `-/+/

Thoughts on 3 years of management

2024-10-08T00:00:00Z

Created at: 2024-10-08

I've been managing for the past 3 years at my current job. This is not extensive experience and I still see myself as a junior manager.

As I reflect on my journey, I talk through lessons learnt from managing other developers and recurring patterns I have observed.

Now, a post about management is not cringe enough unless it includes an "advice top list".

The one I could come up with is made of 5 musts.

You must facilitate.
You must take active interest in the development of your reports.
You must reassure through real recognition.
You must listen.
You must keep an open mind about your own management skills.

Note: Having innate charisma would greatly help. It'd make it easier to act on most of the points above. Sadly the nerdy type often lacks charisma. We need to work harder on it. More on that later.

You Must Facilitate

“The manager’s function is not to make people work, but to make it possible for people to work.” (Peopleware)

Being promoted to manager after performing well as a developer is a regular occurrence in many companies. Another regular occurrence is for a former developer to not be good at managing their peers.

Promoting a good developer has many positives. It ensures that the new manager won't be a mere conduit for communication. Having experience in the field helps facilitate technical discussions.

However, the intersection in the Venn diagram between the skills required to be a good developer and those required to be a good manager is narrow. Facilitation skills do not feature in the good-developer skill set as much as they should.

I did not have guidance and mentorship once I became a manager. On top of that, I wanted to keep coding at the same pace as before. This situation made it harder for me to manage my own conflicts of interest between coding and managing.

That meant that I wasn't facilitating much. The lack of leadership and direction quickly hampered the development of one of my teams.

Once that team grew to about 7 developers the situation became unsustainable.

I didn't have quality time to spend on important coding tasks because I was constantly on management duties. At the same time I wasn't helping to unblock my team's work as well as I could since I spent considerable time coding.

By sheer luck, the team was independent enough to perform well without strong management guidance. That could equally have been the other way around.

I had to step back and rethink my approach so that I could balance facilitation duties against coding responsibilities.

Being proactive about facilitation helped save time for myself and relieve stress on my team. It sounds cliché, but finding problems before they happen and answering questions before they are asked are important ways to reduce stress across your team.

The most important thing was realising that unblocking 7 people and letting them do good work was more important than me, as a single contributor, writing some of the code.

This didn't make my work easier, though. There are many ways to facilitate progress, and some situations are harder than others.

Technical facilitation is usually straightforward. For example, the product manager asks for a feature but the details aren't clear enough for a developer to jump on the task. You set up a meeting with the client and the P.M. Together you clear up the requirements so that a developer is not stuck with vague requirements and a lack of direction.

The goal isn't to detail the design for the solution (the developer will do that). Instead, the goal is to make sure requirements are understood and there is a definition of what "done" means for that task.

Technical facilitation only takes time and organising. I.e., getting the relevant people together and taking the time to write up the details of what was discussed and agreed.

My experience as a developer helped greatly in these areas. I was already used to going to client meetings and explaining what was possible versus what was not possible. I also frequently helped clients with estimations backed up by my knowledge of the tech and current architecture of the project.

The hard type of facilitation is the human one. For example, when a team member is having a hard time because of external factors or conflicts between colleagues. This is the real human-factor part of the job. I have little experience with this.

These situations take a lot of time and energy to solve. Each situation of this type is different and facilitation might have to be defined on a case-by-case basis.

The tools made available by the business will be relevant here and the manager needs to be aware of them to be able to put them to good use. E.g., unlimited leave, team rotation, mental-health days, external mediation, etc.

My general observation is to treat the situation with compassion and empathy. At the end of the day a manager will be dealing with people-problems frequently. In the eyes of the person being troubled, their problems will be more concerning than they may be comfortable telling you. You are at risk of mischaracterising the problem if you don't take it seriously.

There is little point in trying to play a hard hand as an authoritarian manager that only dictates solutions without taking the time to care and understand the situation. "It's your job, just do it!". This almost always goes wrong and makes the manager lose respect even from the people not involved in the situation. Luckily this is something I learned from observation and not from direct experience.

You Must Take An Active Interest In The Development Of Your Reports

Of all the "musts", this is the one that took me the longest to think and write about. It's hard to find a recipe for what "taking an active interest" is, even though most people have a good intuition about what this phrase means.

Different people need different things depending on where they are in their careers, what they want to achieve, and who they are.

The first step is understanding what each report needs based on their career stage, goals, and personality. But there is a meta step here too. Both you and the report need to be willing to learn and improve together for that relationship to work and those questions to be answered. It's not just about the report's growth but also the manager's willingness to evolve in response to their team's needs.

In the best manager-managed relationships I have had, both people were learning together what they needed from each other and what they could provide.

To use an analogy from pedagogy theory:

Through dialogue, the teacher-of-the-students and the students-of-the-teacher cease to exist and a new term emerges: teacher-student with students-teachers. The teacher is no longer merely the-one-who-teaches, but one who is himself taught in dialogue with the students, who in turn while being taught also teach. They become jointly responsible for a process in which all grow − Paulo Freire.

Having a manager who allows themselves to not know things and is comfortable asking dumb questions is a great thing. I had managers on the other side of the spectrum who pretended to be acquainted with things they had no clue about. I believe they were insecure about showing potential shortcomings to their reports. This strikes me as a bad thing.

It is difficult to provide quality help to develop a report's career if the manager is not an active member of the team in some capacity.

It is hard to relate to the work of a report if you have no skin in the game.

This is why I am in favour of managers that get their hands dirty on the factory line at least some of the time, even if only on smaller tasks.

This helps build team spirit and camaraderie. Frankly, you can provide much richer feedback as a manager if you are close to the work yourself.

That is not to say that there aren't exceptions where the manager cannot be part of the team. For example, a project may be so novel that no one except the few reports deep in the weeds can actively contribute to the advancement of that project.

This problem also seems to occur the higher up the management chain you go. It must be difficult for the manager-of-managers to stay on top of each manager's team work.

However, this does not mean that the manager shouldn't try their best to understand the project. Even if it is to support the team when collateral damage happens.

It is hard to take an active interest in someone without getting to know them. Make sure to invest in the relationship early as it takes time to build trust.

This is not necessarily popular advice. The book Peopleware makes this observation below.

managers are usually not part of the teams that they manage. Teams are made up of peers, equals that function as equals. The manager is most often outside the team, giving occasional direction from above and clearing away administrative and procedural obstacles. By definition, the manager is not a peer and so can’t be part of the peer group. (Peopleware).

I recommend not taking the hierarchical manager as an example. It may work in certain business areas, but I haven't seen it working well in software development so far.

You Must Listen

"Listening is an active skill". In this case not only one of "paying attention" but taking proactive effort to reflect on what has been reported and to act on it.

A good way of creating space for listening is through recurring one-on-one meetings. Those are the minimum to keep a relationship flowing. Otherwise, without the space to exchange ideas, get feedback, or otherwise just rant, there'll be a barrier between manager and report that might not serve either.

It is easy to get caught up in the "busyness-of-it" and put off 1:1's, both as a report and as a manager. If this becomes a frequent occurrence, it might point to an underlying problem.

Here is a small list of tips for active listening. It serves well just to be aware of these things before you jump on a 1:1 with a report:

Never interrupt. It frustrates your report and affects the full understanding of the message. If you have international people in your team, your way of communication might differ from theirs. Some cultures tend to provide low context communication whereas others provide high context.
Defer judgement. Remain open and neutral. If your report is telling you what problems they are going through, it is unhelpful to start expressing your personal opinions on the problem without being asked to or without taking into consideration social cues for when to do it.
Avoid distractions. Please, just close Slack.

You Must Reassure Through Real Recognition

“People who feel untrusted have little inclination to bond together into a cooperative team.”

A pat on the back goes a long way, especially when it's done publicly. But reassuring is not just about rewarding; it is also about building trust and cooperation.

Everyone likes to feel like they're winning and getting validation for their good work. Many software development managers are introverts who struggle with giving compliments.

I think that more than a skill that can be learned, you need to be constantly aware of opportunities to provide recognition (this is harder than it sounds).

I have worked for companies that had free-pizza events (or insert another snack here) to celebrate team achievements. There's nothing wrong with that.

Celebration is necessary for a team to function. However, some organisations tend to over-characterise such acts as proof of their generosity and reassurance that employees are doing a good job.

I think that most employees can see through this over-characterisation and that leaves them with a bitter taste in their mouths. Without proper reassurance and true recognition, it doesn't matter how many pizza events there are, people won't feel reassured and valued in their job.

It might even be the opposite: "Why is the company going to such lengths with these events while people aren't getting paid enough?" Panem et circenses.

One of the best kinds of recognition is the financial one, especially when that financial acknowledgement is made proactively from the manager-side before "official" raise ceremonies take place.

It might be harder to give financial recognition in start-up companies struggling for cash. There are other means to account for that like granting share options or title promotions.

However, navigate the "title promotion" situation with care. Although such promotions are a good way to recognise the work that someone has done, they can also be a problem. Eagerly promoting people who aren't ready for the role as a retention strategy can backfire. This is not true recognition.

It is incredible that in certain places salary raises only happen once a year. If you missed the date but got a lot of responsibility on your back, you need to perform the role for a year without financial recognition.

It is very hard to be a manager in such companies as you don't have freedom to really manage your team.

You Must Keep An Open Mind About Your Management Skills

The same divisive effect occurs in connection with the so-called “leadership training courses,” which are (although carried out without any such intention by many of their organizers) in the last analysis alienating. These courses are based on the naïve assumption that one can promote the community by training its leaders—as if it were the parts that promote the whole and not the whole which, in being promoted, promotes the parts.

Paulo Freire.

Managing is hard, and there is so much material out there that isn't necessarily relevant or particularly useful to your situation.

From that, it is easy to grow cynical thinking that "no one can teach management". But that is also not a helpful way to see things. I bet your reports would raise an eyebrow if you said that to them.

Material on how to manage creative workers like software developers seems particularly scarce. The creative-types also seem to require a different kind of management than the general "KPI-based" literature teaches.

To make matters worse, there are usually not many clear metrics you can track on how much impact a creative has made (usually in terms of revenue) to the business. Of course you should be able to tell whether someone is performing to the level of their role, but overall impact is a harder thing to measure.

For example, you might have productive developers that look good on paper but aren't generating tons of concrete value. They may, at the same time, demand a lot of resources from across the team for code review.

In the same way, you might have workers that seem slower but always provide high-quality and impactful changes (and feedback) that are aligned with core-business values.

These situations are tricky. Part of growing as a manager is recognising those types of workers exist and providing feedback that enables them both to grow.

Even though improving-as-a-manager is a slow process and difficult in a different way than improving-as-a-developer, progressive improvement is possible.

Managing takes a different source of energy than coding. If you are not a real people person you have to be prepared to spend more time and energy being a manager.

Quite frankly, it's already very hard to keep on top of these "5 musts", especially as the number of reports goes up.

The final advice here is to keep an open mind. Check the literature and try and read some books on management. Observe how your team members interact with each other and listen to what they have to say. Learn from teams that perform well, and from teams that don't perform well. What are the differences? Read other posts like this one about what other managers are thinking and think critically about what you read. Don't take our word for it!

Some recommendations for reading:

The Culture Map (Erin Meyer)
Peopleware (Tom DeMarco)
Ruined by Design (Mike Monteiro)
The Manager's Path (Camille Fournier)

Closing Remarks

Even after writing about all of these topics, I myself am not able to follow this advice to the letter every single day.

If not by an act of human fallibility, in some situations it is just not possible to follow general advice. I think that this is OK.

What matters the most is:

Constantly learning and taking an active interest in how to be better at management.
Checking whether you'd like someone to manage you the way you manage your team.
Critically thinking about management and leadership decisions.
Keeping track of progress!

On Git Commit Messages

2024-10-05T00:00:00Z

On Git Commit Messages

Created at: 2024-10-05

There are two web pages that provide a great summary of Git commit best practices.

I recommend reading them before going through the rest of this post:

Those articles are "old". One is from 2008 and the other is from 2014. This means that the following comment pops up now and then:

It's 2024, do we really have to restrict ourselves to 72-long commit titles? I think it's acceptable to simply:

git commit -m "ISSUE:1234X Add date of birth, salary, ethnicity, pronouns, height (in centimetres), salary text fields, and more, to the request loan submit form for Chameleon MVP."

We also have so many good tools around git that make it easier to see the changes and the diffs. We can also link to rich context on JIRA and Asana, why are we focusing so much on terminal limitations?

We are agile and always fix forward. We never have use for the old bits of git like git bisect, git revert, or even git log... What do these do again?!...

Given that I have little to contribute to the excellent content in the articles above, I'll limit my contribution to talking about the reason these posts have aged so well.

A Disclaimer

Learning a tool like git takes time. To make the most of git, one needs to learn it well. The same way there are programmers who debug exclusively with print(), there are programmers who only use three git commands: pull, commit, and push.

That is fine. You can go a long way without ever needing more advanced git commands (or debugging tools). That also means, however, that the justifications behind good-practice advice will be harder to understand. I will try to make those clearer in this post even if you don't go beyond these three git commands.

The tragedy of it all, however, is that not knowing those advanced use cases before creating a repository might jeopardise the ability of advanced users to take advantage of good commit etiquette.

Some code bases have a "before good commits" and "after good commits". The "before" is usually a dark place we don't like to go.

Make up your reasons wisely and trust the advice of people who have been there before and learned the hard lessons.

The Summary

Before going into the reasons why the advice from those posts is still sound, here is a very short summary of the two articles above:

Do not mix two unrelated functional changes in the same commit: It's hard to catch flaws during review when changes are mixed together. If the commit needs to be reverted, the two changes need to be untangled first. Similarly, it is harder to bisect and find which change created a bug if multiple functional changes are included in a commit.
Do not assume the reviewer uses the same tools as you.
Do not assume the reviewer has access to an external website.

These provide justification for seemingly arbitrary advice in the linked blog posts such as "commit titles should be no longer than 50 characters and commit bodies no longer than 72 characters".

Do not mix two unrelated functional changes in the same commit

So commit messages to me are almost as important as the code change itself. Sometimes the code change is so obvious that no message is really required, but that is very very rare. And so one of the things I hope developers are thinking about, the people who are actually writing code, is not just the code itself, but explaining why the code does something, and why some change was needed. Because that then in turn helps the managerial side of the equation, where if you can explain your code to me, I will trust the code...

Linus Torvalds.

Let's start with a bad example:

commit e5b18b256c0f4f5d369c62785248632075790867 (HEAD -> master)
Author: John Doe <john.doe@gmail.com>
Date:   Sat Oct 5 18:33:21 2024 +1300

    Revamp customer profile page

    This commit:

      - Refactor ResetPassword form UI to reuse textbox component.
      - Add an index to the "users" table to lookup emails faster.
      - Apply compression to user's uploaded profile pictures.
      - Change hash algorithm for profile picture names.
      - Fix broken layout on mobile devices using landscape format.
      - Add a new canary flag to control "Under Maintenance" banner.

Although John Doe's commit title evokes the idea that there's only one thing happening, a closer look at the commit description reveals that there are many unrelated changes sneaking in at the same time.

Why is this bad? Let's start with a simple example.

Suppose the third change has a bug:

"Apply compression to user's uploaded profile pictures".

The profile_update.c file where all the operations for updating a user's profile live has a code-path that crashes the server.

Naturally you want to revert that commit. But in the meantime John's colleague Mary has changed one of the UI layout files that John's commit had also touched as part of an unrelated change:

commit 85cf1a1501a2062dbc9310d6b598dcf72e284cbc (HEAD -> master)
Author: Mary Silva <mary.silva@gmail.com>
Date:   Sun Oct 6 20:45:40 2024 +1300

    Upgrade profile page UI layout

    This commit:

      - Move css classes to the new file "user_layout.css".
      - Refactor text boxes to use the same css style.
      - Remove unreachable (dead) JavaScript code.

Now you can't revert John's commit because you get a conflict.

git revert e5b18b256c0f4f5d369c62785248632075790867

CONFLICT (modify/delete): README.md deleted in (empty tree) and modified in
HEAD.  Version HEAD of README.md left in tree.

error: could not revert e5b18b2... Revamp customer profile page

The bug in the profile_update.c file has nothing to do with the UI layout in the profile page.

When John added all of those unrelated changes in a single commit, his commit became a conflict magnet. Conflict magnet commits are very hard to revert.

Bisecting

In this case we already knew that John's commit introduced a bug, but what if we didn't? git bisect is a git tool built for finding where a bug was introduced.

To use git bisect you give it two arguments: A "bad" commit that is known to contain the bug (even if not introduced by that commit itself), and a "good" commit that is known to be before the bug was introduced.

The short version of what bisect does is: Bisect will pick a commit between the "bad" and "good" one and ask you whether it's good or bad. It is up to you to decide.

How you do that depends on the project: you might run the test suite with a test that reproduces the bug, or simply look at the diff. In each iteration, git bisect shrinks the search window until John's offending commit is found.

Inevitably you find John's commit. However, the commit has two changes in the same profile_update.c file that could be causing the bug.
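To make the "shrinking window" concrete, here is a minimal sketch of the search in Python. It is a simplification and not git's actual implementation: it assumes a linear history ordered from oldest to newest, whereas real git bisect also handles branching histories. The commit ids other than John's and Mary's, and the names find_first_bad_commit and is_bad, are made up for illustration.

```python
from typing import Callable, Sequence


def find_first_bad_commit(
    commits: Sequence[str],
    is_bad: Callable[[str], bool],
) -> str:
    """Return the first bad commit in a linear history.

    Assumes commits are ordered oldest to newest, that commits[0] is
    known to be good and that commits[-1] is known to be bad.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")

    good = 0                 # newest commit known to be good
    bad = len(commits) - 1   # oldest commit known to be bad
    while bad - good > 1:
        middle = (good + bad) // 2
        if is_bad(commits[middle]):
            bad = middle     # the bug is at or before the middle commit
        else:
            good = middle    # the bug was introduced after the middle commit
    return commits[bad]


# Hypothetical history in which the bug arrives with John's commit.
history = ["a1a1a1a", "b2b2b2b", "c3c3c3c", "e5b18b2", "85cf1a1", "f6f6f6f"]
first_bad_index = history.index("e5b18b2")
checked = []


def is_bad(commit: str) -> bool:
    # Stand-in for "run the test suite" or "read the diff".
    checked.append(commit)
    return history.index(commit) >= first_bad_index


culprit = find_first_bad_commit(history, is_bad)
print(culprit, "found after checking", len(checked), "commits")
```

Each answer halves the window, which is why the number of commits you have to inspect grows roughly with the logarithm of the history length. The search can only ever point at a whole commit, though, which is where non-atomic commits start to hurt.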

  - Apply compression to user's uploaded profile pictures.
  - Change hash algorithm for profile picture names.

So which of the changes is the bad one?

That question might not be trivial to answer. It might be hard to untangle which change actually broke the code, especially if the compression algorithm and the hash algorithm use the same underlying routines.

Function evolutions

Another tool that can't be used as well if multiple changes are present in the same commit is git log -L.

git log -L :<funcname>:<file_name>

By running the command above, git will display a diff with all the commits that touched that function in the past.

This presents a way to see the "evolution" of a function over time.

You want commits to be split so that you can see which individual patches changed that function.

Having a single commit with too much noise makes it more difficult to understand why that function changed.

If you are not convinced yet, there are many more git tools that are affected by non-atomic commits. Check the list below and see if you use any of the following:

git blame: Atomic commits give you the direct answer to: "Why was this change made?"
git rebase: For rebasing, dropping changes, re-editing commit messages, or adding fix-ups.
git cherry-pick: For applying a specific commit from one branch to another.
git diff: For seeing one change at a time.

It's reasonable to conclude that one commit per functional change is still relevant today.

Do not assume the reviewer uses the same tools as you

Word-wrapping is a property of the text. And the tool you use to visualize things cannot know. End result: you do word-wrapping at the only stage where you can do it, namely when writing it. Not when showing it.

Some things should not be word-wrapped. They may be some kind of quoted text - long compiler error messages, oops reports, whatever. Things that have a certain specific format.

The tool displaying the thing can't know. The person writing the commit message can. End result: you'd better do word-wrapping at commit time, because that's the only time you know the difference.

(And the rule is not 80 characters, because you do want to allow the standard indentation from git log, and you do want to leave some room for quoting).

Linus Torvalds

This is what a long commit looks like with the default pager (less on most *nix systems).

Those white arrows at the right-hand side show where the text was truncated.

Although less supports wrapping text, it may not be on by default depending on how your OS came configured.

Every command that shows the commit summary (top line) truncates it, making it unreadable. More examples of such commands are found in Tim Pope's blog post.

There isn't much I can add here. I think that it is important to have good writing skills for both commit titles and commit messages with the goal of keeping them succinct and informative at the same time. You can see for yourself how nice Linux's kernel git log reads for inspiration.

The kernel has a restrictive line-length limit.

the summary must be no more than 70-75 characters.

No text gets wrapped or truncated, and everything is nice and in good style.

It is true that there is no strong evidence about what the "ideal" line-length for coding is. But we know that number for human-readable text:

Research has led to recommendations that line length should not exceed about 70 characters per line. The reason behind this finding is that both very short and very long lines slow down reading by interrupting the normal pattern of eye movements and movements throughout the text.

source

Throw no stones! We are talking about human-readable text, i.e., books, magazines, papers, and git logs!

As Linus explained in the quote at the top of this section, the writer is responsible for wrapping the text because the pager tool might not be able to do it for the reader.

The default pager (less) is not the only tool that truncates instead of wrapping. Even in 2024 we haven't found the magic solution for perfect text-wrapping yet.

Even GitHub truncates long commit titles to 72 characters. Be mindful of that when committing long titles and messages.
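Wrapping at write time is easy to automate. The sketch below uses Python's textwrap module and the 50 and 72 character limits quoted from the linked articles. The function name and the rule that indented paragraphs count as quoted output are assumptions made for this example, not something git enforces.

```python
import textwrap

SUMMARY_LIMIT = 50  # title limit quoted from the linked articles
BODY_WIDTH = 72     # body limit quoted from the linked articles


def format_commit_message(summary: str, body: str) -> str:
    """Build a commit message whose body is hard-wrapped by the writer."""
    if len(summary) > SUMMARY_LIMIT:
        raise ValueError(
            f"summary is {len(summary)} characters, limit is {SUMMARY_LIMIT}"
        )

    wrapped_paragraphs = []
    for paragraph in body.strip("\n").split("\n\n"):
        lines = paragraph.splitlines()
        if lines and all(line.startswith((" ", "\t")) for line in lines):
            # Indented paragraphs are treated as quoted output (compiler
            # errors, logs) and kept verbatim, however long they are.
            wrapped_paragraphs.append(paragraph)
        else:
            wrapped_paragraphs.append(
                textwrap.fill(
                    " ".join(paragraph.split()),
                    width=BODY_WIDTH,
                    break_long_words=False,  # never split a URL or a hash
                    break_on_hyphens=False,
                )
            )
    return summary + "\n\n" + "\n\n".join(wrapped_paragraphs)


message = format_commit_message(
    "Compress uploaded profile pictures",
    "Profile pictures were stored exactly as uploaded, which made the "
    "profile page slow to load on mobile connections. Compress them at "
    "upload time instead.\n"
    "\n"
    "    error: could not revert e5b18b2... Revamp customer profile page, "
    "quoted here on one long line on purpose",
)
print(message)
```

The prose paragraph comes out wrapped at 72 characters, while the indented line is left alone. Only the writer knows which is which, and that is the point Linus makes in the quote above.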

Do not assume the reviewer has access to an external website

I worked for a company that used to use GitHub for issue tracking as well as repository hosting. Many of our commit messages merely pointed at GitHub links and had no description at all.

It was sad to see the company getting acquired and the parent company moving to BitBucket while deleting the old GitHub account.

It is OK to link to GitHub, Jira, Asana, etc., but the most important thing is to make sure the commit message has everything you need in it so that you don't depend on external services.

Outro

Linus on Tim Pope's post:

Postgres Unique Constraints Without Downtime

2024-10-01T00:00:00Z

Postgres Unique Constraints Without Downtime

Created at: 2024-10-01
Updated at: 2024-10-08

The syntax for adding a unique constraint in Postgres is as follows:

ALTER TABLE "table_name"
ADD CONSTRAINT "unique_constraint_on_foo"
UNIQUE ("foo");

This constraint will prevent multiple rows having the same value stored in the foo column.

However, this operation acquires an ACCESS EXCLUSIVE lock, blocking all reads and writes to the table until it's finished.

If you are adding a unique constraint to a large table, the amount of time spent to create the constraint might be prohibitive.

How To Safely Add a Unique Constraint Without Downtime

From the Postgres documentation:

PostgreSQL automatically creates a unique index when a unique constraint or primary key is defined for a table.

Creating this index in the background while holding an ACCESS EXCLUSIVE lock is the problem we are trying to avoid.

What we want to do is create the index first, and CONCURRENTLY, so that when we add the constraint to the table, the table can use the already existing index. This will make the subsequent ALTER TABLE much faster to run.

If you have the following table:

CREATE TABLE example_table (
    id SERIAL PRIMARY KEY,
    int_field INT
);

You can create a unique index concurrently (this won't block any reads or writes on this table), with the following command:

SET lock_timeout = '0';

CREATE UNIQUE INDEX CONCURRENTLY IF NOT EXISTS unique_int_field_idx
ON example_table (int_field);

Side Note 1: If you are using any value of lock_timeout that is not zero, you have to set it to zero before you create the index. This will prevent leaving an invalid index behind if the operation fails due to a timeout.

Side Note 2: You cannot use a partial index here. Postgres allows the creation of partial unique indexes, but it does not allow the creation of partial unique constraints. The documentation states (source):

A uniqueness restriction covering only some rows cannot be written as a unique constraint, but it is possible to enforce such a restriction by creating a unique partial index.

If you try to use a partial index to create a unique constraint, Postgres will raise the following error:

ERROR:  "unique_int_field_idx" is a partial index

Therefore, if you need a partial unique restriction, just keep your index. It will be enough.

Once this command has finished, you can add the new constraint USING the index above:

SET lock_timeout = '10s';

ALTER TABLE example_table
ADD CONSTRAINT unique_int_field UNIQUE USING INDEX unique_int_field_idx;

The operation above takes virtually no time.

Note: I have set lock_timeout back to a reasonable value (10s). This is a safeguard. If there is a long-running transaction that would block the ALTER TABLE statement, which in turn would block all reads and writes, the statement will time out instead of causing a potential outage.
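The whole procedure can be wrapped in a small helper. What follows is a sketch under a few assumptions: the function name, its parameters, and the RecordingCursor stand-in are hypothetical, and the identifier check is a crude guard rather than proper SQL quoting. With a real driver such as psycopg2 the connection has to be in autocommit mode, because CREATE INDEX CONCURRENTLY cannot run inside a transaction block.

```python
import re


def add_unique_constraint_safely(
    cursor,
    table: str,
    column: str,
    index_name: str,
    constraint_name: str,
    alter_lock_timeout: str = "10s",
) -> list:
    """Run the two-step procedure on any DB-API style cursor.

    Returns the statements that were executed, in order.
    """
    # Crude guard: the names are interpolated into SQL below, so only
    # accept plain identifiers (letters, digits, underscores).
    for name in (table, column, index_name, constraint_name):
        if not name.isidentifier():
            raise ValueError(f"not a plain SQL identifier: {name!r}")
    if not re.fullmatch(r"\d+(ms|s|min)?", alter_lock_timeout):
        raise ValueError(f"unexpected lock_timeout: {alter_lock_timeout!r}")

    statements = [
        # Step 1: build the index without blocking reads or writes.
        "SET lock_timeout = '0'",
        f"CREATE UNIQUE INDEX CONCURRENTLY IF NOT EXISTS {index_name} "
        f"ON {table} ({column})",
        # Step 2: attach the constraint to the existing index, giving up
        # quickly if the lock cannot be acquired.
        f"SET lock_timeout = '{alter_lock_timeout}'",
        f"ALTER TABLE {table} ADD CONSTRAINT {constraint_name} "
        f"UNIQUE USING INDEX {index_name}",
    ]
    for statement in statements:
        cursor.execute(statement)
    return statements


class RecordingCursor:
    """Stand-in cursor that records SQL instead of talking to Postgres."""

    def __init__(self):
        self.executed = []

    def execute(self, sql):
        self.executed.append(sql)


cursor = RecordingCursor()
executed = add_unique_constraint_safely(
    cursor,
    table="example_table",
    column="int_field",
    index_name="unique_int_field_idx",
    constraint_name="unique_int_field",
)
print(";\n".join(executed) + ";")
```

Against a real database you would pass a cursor from an autocommit connection instead of RecordingCursor. If the ALTER TABLE times out, it is safe to retry just that statement since the index already exists.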

Why not just use the index for constraint validation?

Both the unique index and constraint raise the same error when an insert attempt fails: "duplicate key value violates unique constraint."

So why would one bother even creating the constraint if the index suffices?

From an old (v9.4) Postgres documentation:

Note: The preferred way to add a unique constraint to a table is ALTER TABLE ... ADD CONSTRAINT. The use of indexes to enforce unique constraints could be considered an implementation detail that should not be accessed directly. One should, however, be aware that there's no need to manually create indexes on unique columns; doing so would just duplicate the automatically-created index.

source

This note was removed from the Postgres documentation in version 9.5. The commit that removed the note (049a7799dfc) says:

docs: remove outdated note about unique indexes

There is no guidance on why that was outdated and how unique indexes should be interpreted.

The differences remaining are:

Constraints can be deferred.
Indexes can be partial, which is useful if uniqueness is restricted to a subset of data. You cannot add a table constraint from a partial index. Make sure your index wasn't created with "WHERE ...".
If you care about the SQL standard, constraints are part of it, whereas indexes aren't (they're an implementation detail).
External tools that expect uniqueness to be defined through constraints might not work properly if the constraint isn't defined in the schema.

Timing Different Approaches

The Python script below times how long it takes to add a constraint using two approaches:

ALTER TABLE without a pre-existing index.
ALTER TABLE with a pre-existing index.

Note: These results were taken from a local database without any concurrency.

TLDR: Creating an index concurrently first, and then using it to create the constraint takes a little longer in total, but is a much safer approach.

First the results, and then the script:

CSV:

rows,unique constraint without index,unique index,unique constraint using index
1000000, 0.18, 0.23, 0.0
2000000, 0.38, 0.51, 0.0
3000000, 0.57, 0.8, 0.0
4000000, 0.74, 1.06, 0.0
5000000, 0.9, 1.3, 0.0
6000000, 1.11, 1.55, 0.0
7000000, 1.25, 1.84, 0.01
8000000, 1.58, 2.14, 0.0
9000000, 1.61, 2.4, 0.0
10000000, 1.78, 2.52, 0.0
20000000, 3.72, 4.97, 0.0
30000000, 5.52, 8.07, 0.0
40000000, 7.93, 10.77, 0.0
50000000, 10.16, 13.91, 0.0
60000000, 12.17, 17.24, 0.0
70000000, 15.74, 22.68, 0.01
80000000, 25.58, 38.55, 0.0
90000000, 40.45, 58.64, 0.0
100000000, 52.37, 60.59, 0.0
200000000, 124.62, 168.0, 0.0
300000000, 198.78, 293.85, 0.0
400000000, 250.16, 340.47, 0.01
500000000, 323.56, 435.08, 0.0
600000000, 395.54, 541.21, 0.0
700000000, 491.53, 724.37, 0.0
800000000, 589.03, 782.12, 0.0
900000000, 635.95, 909.58, 0.0
1000000000, 734.21, 1059.35, 0.01
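To put "a little longer in total" into numbers, the sketch below takes four rows from the CSV above and divides the total time of the index-first approach (unique index plus constraint using index) by the time of the direct ALTER TABLE. The helper name overhead_ratio and the choice of sample rows are mine, for illustration only. On these samples the ratio lands somewhere between about 1.15x and 1.45x, and the constraint-using-index step is close to instantaneous.

```python
# (rows, constraint without index, unique index, constraint using index)
# Values copied from the CSV above, in seconds.
samples = [
    (1_000_000, 0.18, 0.23, 0.0),
    (10_000_000, 1.78, 2.52, 0.0),
    (100_000_000, 52.37, 60.59, 0.0),
    (1_000_000_000, 734.21, 1059.35, 0.01),
]


def overhead_ratio(direct, index, using_index):
    """Total time of the index-first approach relative to the direct one."""
    return (index + using_index) / direct


ratios = {
    rows: overhead_ratio(direct, index, using_index)
    for rows, direct, index, using_index in samples
}

for rows, ratio in ratios.items():
    print(f"{rows:>13,} rows: index-first takes {ratio:.2f}x the direct time")
```

The extra time is spent in CREATE UNIQUE INDEX CONCURRENTLY, while the ALTER TABLE step that follows it barely registers in the measurements.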

The script

import psycopg2
import time
import os


def get_cursor_and_connection():
    # Update connection details as per your PostgreSQL setup
    conn = psycopg2.connect(
        dbname="test_db",
        user="postgres",
        password="postgres",
        host="localhost",
        port="5441",
    )
    conn.autocommit = True
    return conn.cursor(), conn


def vacuum_table(cursor):
    """
    Vacuum is necessary to optimise the table
    structure before we perform the benchmark.

    It ensures that the performance tests are
    not affected by any leftover internal
    inconsistencies or unnecessary disk overhead
    from unvacuumed data.

    It also prevents autovacuum from interfering
    with the test results.

    The ANALYZE part is there to give the Postgres
    planner up-to-date statistics before running
    the ALTER TABLE / index
    """
    print("Vacuuming example_table...")
    cursor.execute("VACUUM ANALYZE example_table;")


def create_table(cursor):
    print("Creating table...")
    cursor.execute("""
        DROP TABLE IF EXISTS example_table;
        CREATE TABLE example_table (
            id SERIAL PRIMARY KEY,
            int_field INT
        );
    """)


def insert_data(cursor, num_rows):
    print(f"Inserting {num_rows} unique rows...")
    cursor.execute(f"""
        INSERT INTO example_table (int_field)
        SELECT s
        FROM (
            SELECT generate_series(1, {num_rows}) AS s
            ORDER BY RANDOM()
        ) AS shuffled;
    """)


def clear_cache():
    print("Clearing cache by restarting docker container...")
    os.system("docker restart postgres15")
    print("sleeping for 10s")
    time.sleep(10)


def add_unique_constraint(cursor, conn):
    print("Adding unique constraint directly...")
    start_time = time.time()
    cursor.execute("""
        ALTER TABLE example_table
        ADD CONSTRAINT unique_int_field UNIQUE (int_field);
    """)
    conn.commit()
    duration = time.time() - start_time
    print(f"Time taken to add unique constraint: {duration:.2f} seconds")
    return duration


def add_unique_constraint_with_index_first(cursor, conn):
    print("Creating unique index...")
    idx_start_time = time.time()
    cursor.execute("""
        CREATE UNIQUE INDEX CONCURRENTLY IF NOT EXISTS unique_int_field_idx
        ON example_table (int_field);
    """)
    conn.commit()
    idx_duration = time.time() - idx_start_time
    print(f"Time taken to add index: {idx_duration:.2f} seconds")

    print("Adding unique constraint using index...")
    start_time = time.time()
    cursor.execute("""
        ALTER TABLE example_table
        ADD CONSTRAINT unique_int_field UNIQUE USING INDEX unique_int_field_idx;
    """)
    conn.commit()
    constraint_duration = time.time() - start_time
    print(
        f"Time taken to add unique constraint with index first: {constraint_duration:.2f} seconds"
    )
    return idx_duration, constraint_duration


def run_tests(table_sizes):
    results = {}

    for num_rows in table_sizes:
        print(f"\nRunning tests with {num_rows} rows...")

        # Get a cursor to run queries.
        cursor, conn = get_cursor_and_connection()

        # Create the table and insert data
        create_table(cursor)
        insert_data(cursor, num_rows)
        conn.commit()

        # Vacuum the table after inserting rows
        vacuum_table(cursor)

        # Clear OS cache and get a new cursor
        clear_cache()
        cursor, conn = get_cursor_and_connection()

        # Test 1: Add unique constraint directly
        time_direct = add_unique_constraint(cursor, conn)

        # Clear OS cache again and get a new cursor
        clear_cache()
        cursor, conn = get_cursor_and_connection()

        # Test 2: Create index first, then add unique constraint
        create_table(cursor)  # Drop and recreate the table
        insert_data(cursor, num_rows)
        conn.commit()
        vacuum_table(cursor)

        # Clear OS cache and get a new cursor
        clear_cache()
        cursor, conn = get_cursor_and_connection()

        idx_duration, constraint_duration = add_unique_constraint_with_index_first(
            cursor, conn
        )

        results[num_rows] = {
            "direct_constraint": time_direct,
            "index_then_constraint": {
                "idx_duration": idx_duration,
                "constraint_duration": constraint_duration,
            },
        }

    cursor.close()
    conn.close()

    # Print final results
    print("\nTest Results:")
    for num_rows, times in results.items():
        print(f"Rows: {num_rows}")
        print(f" - Direct constraint: {times['direct_constraint']:.2f} seconds")
        print(
            f" - Index then constraint: \n"
            f"   - {times['index_then_constraint']['idx_duration']:.2f} seconds (idx)\n"
            f"   - {times['index_then_constraint']['constraint_duration']:.2f} seconds (constraint)"
        )


table_sizes = [
    1_000_000,
    2_000_000,
    3_000_000,
    4_000_000,
    5_000_000,
    6_000_000,
    7_000_000,
    8_000_000,
    9_000_000,
    10_000_000,
    20_000_000,
    30_000_000,
    40_000_000,
    50_000_000,
    60_000_000,
    70_000_000,
    80_000_000,
    90_000_000,
    100_000_000,
    200_000_000,
    300_000_000,
    400_000_000,
    500_000_000,
    600_000_000,
    700_000_000,
    800_000_000,
    900_000_000,
    1_000_000_000,
]
run_tests(table_sizes)

Why Is This Site Built With C

2024-08-26T00:00:00Z

Why Is This Site Built With C

Created at: 2024-08-26

I've been writing about things on a personal website since 2017.

Most of what I have written falls into the category of notes-to-self. Mostly on how to do A or B.

Only recently have I started polishing notes and pulling them together into posts on specific topics.

I realised that what was preventing me from writing more frequently wasn't a lack of ideas (or motivation), but the trouble of dealing with the website builder and platform I was using at the time.

GitHub Pages didn't exist at the time and the canonical way was to have an Apache server running the website with some web hosting provider. I didn't know anything about Apache and the little I saw didn't interest me, so I looked for an alternative.

I built my first website with Django (served by Nginx) on a server hosted on Digital Ocean. This was before the Droplets era, so I had to rent an Ubuntu machine which cost $5.00 USD per month. That was a bit steep for a dev on a Brazilian salary considering I had to pay for other services too (registrar, email, etc).

I was highly motivated to post things as I was still fresh in the web development world and wanted to know how everything worked. I also had no idea what I was doing and wanted my own website to be a sandbox where I could try new things out.

That was my first mistake. Building a "static" website with Django is too cumbersome. You have to set up views, templates, run the server, get GitHub hooks for resetting the remote server in Digital Ocean when new commits are pushed, etc.

Once the romantic view of a newbie blog-poster faded away, handling the whole apparatus to publish a note took more time than writing the note itself.

At some point I had to make a switch before the website grew too big.

My second take was to ditch the whole website and start from scratch using a static website generator. I decided to use Nuxt because I was using Vue at work and the whole setup looked simple to start with.

It was nice in the beginning. I set it up with GitHub Pages. I only had to get the static site that Nuxt creates via a CLI command pushed to my git repo and GitHub handled the rest for me. That was a major improvement over the previous infrastructure. On top of that, I could do cool dynamic things with embedded JavaScript and a framework to interact with it.

But I only had one blog post where I needed fancy JavaScript tooling. Soon it became painful to maintain the website again. Publishing posts involved writing things in Vue and that was just not an ergonomic way to write regular prose.

Also the framework was a new technology, and maintainers were pushing updates that broke backwards compatibility. Handling versioning of Vue and Nuxt along with all their JavaScript dependencies was a big pain point and I had to give up at some point.

Now

Learning from these two past mistakes, I came up with a set of requirements for my next (and hopefully final) website:

Starting a post must be as easy as typing into a blank file.
The website must be statically generated. And fast.
There should be few to no dependencies for generating the website.
It needs to last for at least the next 10 years.

The first requirement is satisfied by writing using markdown files. Writing this blog post in Neovim looks like this:

The second requirement is a bit trickier, but it is directly related to the third.

Writing posts in markdown means that there needs to be a parser to convert the files to html. I could either code this parser myself and have a beautiful static site generator with zero dependencies, or allow myself a single dependency.

The problem is that the move from zero dependencies to one dependency is huge. It feels way bigger than going from 10 dependencies to 100 dependencies.

But writing a markdown parser isn't a trivial enterprise. At the same time, the parser was the only dependency I needed to have. I managed to convince myself that a dependency was okay and then I moved on.

My first instinct was to reach for Pandoc. I did so and implemented a small shell script that could read my directory tree of markdown files and convert them to html.

That worked fine for about 20 to 30 markdown files. After that, the process of converting files to html started to deteriorate in speed. Pandoc is written in Haskell, and it is not known for being fast at parsing large volumes of files.

An alternative for saving time with recompilation was to update my script so that only new markdown files or changed ones are marked for recompilation. That would involve too much wizardry if I wanted to make the script nice and robust. I didn't want to do that. I didn't want my script to grow so much that I would need to start adding test cases and coverage.

I also knew that parsing hundreds or even thousands of small files should be doable in single-digit seconds. The problem was that Pandoc slowed everything down, so my second requirement was not met.

Moreover, the whole Pandoc ecosystem requires a lot of dependencies: 227 packages and over 400MB of installed size, to be exact:

Packages (227) ghc-libs-9.2.8-1  haskell-aeson-2.1.2.1-47  haskell-aeson-pretty-0.8.10-7
               haskell-ansi-terminal-0.11.4-66  haskell-ansi-wl-pprint-0.6.9-418
               haskell-appar-0.1.8-14  haskell-asn1-encoding-0.9.6-230
               haskell-asn1-parse-0.9.5-230  haskell-asn1-types-0.3.4-209  haskell-assoc-1.0.2-266
               haskell-async-2.2.5-27  haskell-attoparsec-0.14.4-74
               haskell-attoparsec-aeson-2.1.0.0-31  haskell-attoparsec-iso8601-1.1.0.0-50
               haskell-auto-update-0.1.6-339  haskell-base-compat-0.12.2-2
               haskell-base-compat-batteries-0.12.2-83  haskell-base-orphans-0.8.8.2-13
               haskell-base-unicode-symbols-0.2.4.2-14  haskell-base16-bytestring-1.0.2.0-80
               haskell-base64-0.4.2.4-69  haskell-base64-bytestring-1.2.1.0-104
               haskell-basement-0.0.16-2  haskell-bifunctors-5.6-77  haskell-bitvec-1.1.3.0-94
               haskell-blaze-builder-0.4.2.3-2  haskell-blaze-html-0.9.1.2-226
               haskell-blaze-markup-0.8.3.0-10  haskell-boring-0.2.1-3
               haskell-bsb-http-chunked-0.0.0.4-383  haskell-byteorder-1.0.4-25
               haskell-call-stack-0.4.0-184  haskell-case-insensitive-1.2.1.0-203
               haskell-cassava-0.5.3.1-4  haskell-cereal-0.5.8.3-2  haskell-citeproc-0.8.1-105
               haskell-cmdargs-0.10.22-2  haskell-colour-2.3.6-210  haskell-commonmark-0.2.4.1-1
               haskell-commonmark-extensions-0.2.4-2  haskell-commonmark-pandoc-0.2.1.3-82
               haskell-comonad-5.0.8-261  haskell-conduit-1.3.5-53  haskell-conduit-extra-1.3.6-134
               haskell-constraints-0.13.4-50  haskell-contravariant-1.5.5-4  haskell-cookie-0.4.6-2
               haskell-crypton-0.34-11  haskell-crypton-connection-0.3.2-8
               haskell-crypton-x509-1.7.6-28  haskell-crypton-x509-store-1.6.9-28
               haskell-crypton-x509-system-1.6.7-28  haskell-crypton-x509-validation-1.6.12-28
               haskell-data-array-byte-0.1.0.1-55  haskell-data-default-0.7.1.1-306
               haskell-data-default-class-0.1.2.0-25
               haskell-data-default-instances-containers-0.0.1-37
               haskell-data-default-instances-dlist-0.0.1-319
               haskell-data-default-instances-old-locale-0.0.1-37  haskell-data-fix-0.3.2-102
               haskell-dec-0.0.5-5  haskell-digest-0.0.1.7-2  haskell-digits-0.3.1-21
               haskell-distributive-0.6.2.1-209  haskell-dlist-1.0-241
               haskell-doclayout-0.4.0.1-29  haskell-doctemplates-0.11-71
               haskell-easy-file-0.2.5-21  haskell-emojis-0.1.3-10  haskell-erf-2.0.0.0-25
               haskell-fast-logger-3.1.2-74  haskell-file-embed-0.0.15.0-2
               haskell-foldable1-classes-compat-0.1-77  haskell-generically-0.1.1-2
               haskell-ghc-bignum-orphans-0.1.1-2  haskell-glob-0.10.2-90
               haskell-gridtables-0.1.0.0-48  haskell-haddock-library-1.11.0-17
               haskell-hashable-1.4.3.0-46  haskell-hourglass-0.2.12-246  haskell-hslua-2.3.0-52
               haskell-hslua-aeson-2.3.0.1-34  haskell-hslua-classes-2.3.0-53
               haskell-hslua-core-2.3.1-45  haskell-hslua-list-1.1.1-60
               haskell-hslua-marshalling-2.3.1-5  haskell-hslua-module-doclayout-1.1.0-58
               haskell-hslua-module-path-1.1.0-53  haskell-hslua-module-system-1.1.0.1-27
               haskell-hslua-module-text-1.1.0.1-27  haskell-hslua-module-version-1.1.0-53
               haskell-hslua-module-zip-1.1.1-22  haskell-hslua-objectorientation-2.3.0-49
               haskell-hslua-packaging-2.3.1-14  haskell-hslua-repl-0.1.2-11
               haskell-hslua-typing-0.1.1-7  haskell-http-api-data-0.5.1-54
               haskell-http-client-0.7.15-23  haskell-http-client-tls-0.3.6.3-58
               haskell-http-date-0.0.11-136  haskell-http-media-0.8.1.1-14
               haskell-http-types-0.12.4-6  haskell-http2-4.1.0-22  haskell-hunit-1.6.2.0-227
               haskell-indexed-traversable-0.1.3-69
               haskell-indexed-traversable-instances-0.1.1.2-44
               haskell-integer-logarithms-1.0.3.1-7  haskell-iproute-1.7.12-82
               haskell-ipynb-0.2-139  haskell-isocline-1.0.9-2  haskell-jira-wiki-markup-1.5.1-22
               haskell-juicypixels-3.3.8-31  haskell-lexer-1.1.1-2  haskell-libyaml-0.1.4-5
               haskell-lpeg-1.0.4-26  haskell-lua-2.3.2-6  haskell-memory-0.18.0-8
               haskell-mime-types-0.1.2.0-2  haskell-mmorph-1.2.0-6
               haskell-monad-control-1.0.3.1-102  haskell-mono-traversable-1.0.17.0-8
               haskell-network-3.1.4.0-20  haskell-network-byte-order-0.1.7-2
               haskell-network-uri-2.6.4.2-31  haskell-old-locale-1.0.0.7-31
               haskell-old-time-1.1.0.4-2  haskell-onetuple-0.3.1-75  haskell-only-0.1-23
               haskell-optparse-applicative-0.17.1.0-29  haskell-ordered-containers-0.2.3-2
               haskell-pandoc-3.1.8-34  haskell-pandoc-lua-engine-0.2.1.2-23
               haskell-pandoc-lua-marshal-0.2.4-2  haskell-pandoc-server-0.1.0.5-39
               haskell-pandoc-types-1.23.1-21  haskell-pem-0.2.4-286  haskell-pretty-show-1.10-15
               haskell-prettyprinter-1.7.1-165  haskell-primitive-0.7.4.0-111
               haskell-psqueues-0.2.8.0-10  haskell-quickcheck-2.14.3-64  haskell-random-1.2.1.2-8
               haskell-recv-0.1.0-30  haskell-regex-base-0.94.0.2-3  haskell-regex-tdfa-1.3.2.2-44
               haskell-resourcet-1.2.6-51  haskell-safe-0.3.21-5
               haskell-safe-exceptions-0.1.7.4-21  haskell-scientific-0.3.7.0-113
               haskell-semialign-1.2.0.1-160  haskell-semigroupoids-5.3.7-142
               haskell-servant-0.20.1-12  haskell-servant-server-0.20-23  haskell-sha-1.6.4.4-20
               haskell-simple-sendfile-0.2.32-36  haskell-singleton-bool-0.1.7-3
               haskell-skylighting-0.14-15  haskell-skylighting-core-0.14-14
               haskell-skylighting-format-ansi-0.1-121
               haskell-skylighting-format-blaze-html-0.1.1.2-8
               haskell-skylighting-format-context-0.1.0.2-86
               haskell-skylighting-format-latex-0.1-121  haskell-socks-0.6.1-237
               haskell-some-1.0.5-2  haskell-sop-core-0.5.0.2-2  haskell-split-0.2.5-6
               haskell-splitmix-0.1.0.5-22  haskell-statevar-1.2.2-3
               haskell-streaming-commons-0.2.2.6-26  haskell-strict-0.4.0.1-240
               haskell-string-conversions-0.4.0.1-171  haskell-syb-0.7.2.4-8
               haskell-tagged-0.8.8-2  haskell-tagsoup-0.14.8-226  haskell-temporary-1.3-585
               haskell-texmath-0.12.8.4-15  haskell-text-conversions-0.3.1.1-63
               haskell-text-icu-0.8.0.5-2  haskell-text-short-0.1.5-79
               haskell-th-abstraction-0.4.5.0-2  haskell-th-compat-0.1.5-2  haskell-th-lift-0.8.4-2
               haskell-th-lift-instances-0.1.20-47  haskell-these-1.1.1.1-267
               haskell-time-compat-1.9.6.1-97  haskell-time-manager-0.0.1-35  haskell-tls-1.8.0-29
               haskell-toml-parser-1.3.1.3-18  haskell-transformers-base-0.4.6-102
               haskell-transformers-compat-0.7.2-2  haskell-type-equality-1.0.1-1
               haskell-typed-process-0.2.11.1-15  haskell-typst-0.3.2.1-32
               haskell-typst-symbols-0.1.4-2  haskell-unicode-collation-0.1.3.6-12
               haskell-unicode-data-0.4.0.1-33  haskell-unicode-transforms-0.4.0.1-74
               haskell-uniplate-1.6.13-223  haskell-unix-compat-0.7.1-16
               haskell-unix-time-0.4.13-1  haskell-unliftio-0.2.25.0-10
               haskell-unliftio-core-0.2.1.0-2  haskell-unordered-containers-0.2.20-18
               haskell-utf8-string-1.0.2-150  haskell-uuid-types-1.0.5.1-16
               haskell-vault-0.3.1.5-185  haskell-vector-0.13.1.0-31
               haskell-vector-algorithms-0.9.0.2-3  haskell-vector-stream-0.1.0.1-2
               haskell-wai-3.2.4-19  haskell-wai-app-static-3.1.9-14  haskell-wai-cors-0.2.7-355
               haskell-wai-extra-3.1.15-2  haskell-wai-logger-2.4.0-443  haskell-warp-3.3.30-59
               haskell-witherable-0.4.2-101  haskell-word8-0.1.3-23  haskell-xml-1.3.14-31
               haskell-xml-conduit-1.9.1.3-53  haskell-xml-types-0.3.8-9  haskell-yaml-0.11.11.2-49
               haskell-zip-archive-0.4.3.2-2  haskell-zlib-0.6.3.0-60  hslua-cli-1.4.1-49
               lua-lpeg-1.1.0-2  numactl-2.0.18-1  pandoc-cli-0.1.1.1-113

Total Download Size:    65.52 MiB
Total Installed Size:  473.35 MiB

There are too many dependencies for me to trust the environment will be stable for a long time. The last thing I want to do is deal with backward incompatible changes on my wee blog.

I looked for a better alternative and found md4c, which is a parser written in C with no dependencies other than the standard C library. It also has only one header file and one source file, making it easy to embed it straight into any C project.

The only work I needed to do was to write a C script (which turned out to be ~250 LOC) to call md4c functions and parse my md files, and then chuck those converted files into the GitHub Pages repo.

My website converter script, which is all in this 250 LOC source file (less md4c), is feature-complete and compiles with any compiler that supports the C standard from 1999 onwards. There's no platform-dependent code and it's portable to Windows, Linux, and macOS.

It runs incredibly well. I have 87 markdown files at the moment and parsing all those files at the same time from scratch can be done virtually instantaneously:

[~] time ./scripts/website/converter.bin

real    0m0.115s
user    0m0.087s
sys     0m0.091s

This allows me to completely flush the whole repo away and create it from scratch in almost no time. I do not have to worry about creating specific logic to just re-parse files that have changed or anything fancy like that, which reduces the burden of maintenance and makes my script smaller and easier to reason about.

This result is far more reasonable than the time Pandoc took to parse a mere 87 markdown files, which was in the double digits of seconds.

Outro

One popular alternative these days is Hugo. There is nothing inherently wrong with Hugo. It is decently fast (written in Go) and it is easy to get going for a simple website. It seems better than some alternatives like Pelican, which is written in Python and thus will be slower to parse md files.

However, Hugo doesn't particularly appeal to me because the framework seems too big and opinionated for what I need:

Hugo takes data files, i18n bundles, configuration, templates for layouts, static files, assets, and content written in Markdown, HTML, AsciiDoctor, or Org-mode and renders a static website. Some notable features are multilingual support, image processing, asset management, custom output formats, markdown render hooks and shortcodes. Nested sections allow for different types of content to be separated, e.g. for a website containing a blog and a podcast.

source

There's a lot in there, so much so that big websites that need a lot of features (e.g. Smashing Magazine) are capable of relying heavily on Hugo.

Also, as much as Hugo looks satisfactory today, I expect it to keep growing and changing in ways that would force me to keep up with it every now and then.

I just need a parser that performs a one-off job of parsing the given markdown file. There is no benefit in bringing a GC-based language into this type of problem.

I also wanted my website to use tech that I know will continue to work in the upcoming decades (my last requirement). There is virtually nothing that beats C compilers in that area as of today. For any new platform out there, the first thing that needs to happen is getting a C compiler built along with the standard library (which is probably the only standard library of a popular programming language that fits in a commented book of 500 pages...). Otherwise, nothing can run on the platform. So I'm hoping this bet will pay off.

The Dumbest Compiler Imaginable

2024-08-20T00:00:00Z

The Dumbest Compiler Imaginable

Created at: 2024-08-20

Python is about having the simplest, dumbest compiler imaginable, and the official runtime semantics actively discourage cleverness in the compiler like parallelizing loops or turning recursions into loops.

− Guido van Rossum (creator of Python). source

This might be a shocking statement to read for someone used to languages that compile to machine code directly like C, C++, Zig, Rust, etc.

The compiler is usually a major source of optimisation for human-written code, being capable of statically analysing code and removing unnecessary computation.

This basically allows developers to write whatever code they think is the most readable and expressive, while handing over the work of optimising the actual code to the compiler.

A classic example in C is:

int foo() {
  int bar1 = 42; // Unused variable.
  int bar2 = 100;
  return bar2;
}

Which when compiled with the most basic level of optimisation via gcc -O1 produces the following assembly code:

foo:
  mov  eax, 100
  ret

The function just returns the value 100. There is no use of the stack to store values, no load operation to fetch variables, and no other form of allocations. The compiler is able to reason that the code returns 100 and thus creates the machine code to do just that.

Compare this with the Python function below:

def foo():
    bar1 = 42
    bar2 = 100
    return bar2

Which produces the following bytecode via dis.dis(foo):

LOAD_CONST               1 (42)
STORE_FAST               0 (bar1)
LOAD_CONST               2 (100)
STORE_FAST               1 (bar2)
LOAD_FAST                1 (bar2)
RETURN_VALUE

The bar1 variable hasn't been discarded, even though it hasn't been used by the code. A full description of Python opcodes is provided in the official documentation (source), but basically the instructions above do:

LOAD_CONST: Pushes the value 42 onto the stack. The big switch case in CPython uses this underlying C code:

        TARGET(LOAD_CONST) {
            frame->instr_ptr = next_instr;
            next_instr += 1;
            INSTRUCTION_STATS(LOAD_CONST);
            _PyStackRef value;
            value = PyStackRef_FromPyObjectNew(GETITEM(FRAME_CO_CONSTS, oparg));
            stack_pointer[0] = value;
            stack_pointer += 1;
            assert(WITHIN_STACK_BOUNDS());
            DISPATCH();
        }

STORE_FAST: Pops the last value of the stack (42) into the local variable (bar1).

      TARGET(STORE_FAST) {
          frame->instr_ptr = next_instr;
          next_instr += 1;
          INSTRUCTION_STATS(STORE_FAST);
          _PyStackRef value;
          value = stack_pointer[-1];
          SETLOCAL(oparg, value);
          stack_pointer += -1;
          assert(WITHIN_STACK_BOUNDS());
          DISPATCH();
      }

LOAD_FAST: Pushes a reference to bar2 onto the stack.

      TARGET(LOAD_FAST) {
          frame->instr_ptr = next_instr;
          next_instr += 1;
          INSTRUCTION_STATS(LOAD_FAST);
          _PyStackRef value;
          assert(!PyStackRef_IsNull(GETLOCAL(oparg)));
          value = PyStackRef_DUP(GETLOCAL(oparg));
          stack_pointer[0] = value;
          stack_pointer += 1;
          assert(WITHIN_STACK_BOUNDS());
          DISPATCH();
      }

RETURN_VALUE: pops the stack, returning the value (bar2) back to the caller. (C code is too long to add here).

Note that those operations aren't light on CPU time or on stores/loads. But that is Python, so there's not much we can do there.
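To check on your own interpreter that the dead store to bar1 survives compilation, here is a minimal sketch using dis.get_instructions from the standard library. The variable name stored_names is only for illustration, and the surrounding opcodes may differ between CPython versions, so the sketch looks only at the STORE_FAST instructions.

```python
import dis


def foo():
    bar1 = 42  # Never read again.
    bar2 = 100
    return bar2


# Collect the local variable names targeted by STORE_FAST instructions.
stored_names = [
    instruction.argval
    for instruction in dis.get_instructions(foo)
    if instruction.opname == "STORE_FAST"
]

print(stored_names)
# The unused variable still occupies a slot in the function's locals.
print(foo.__code__.co_varnames)
```

If bar1 shows up in both outputs, the compiler has kept the unused assignment rather than removing it.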

For comparison, this is what a function returning 100 would do:

def foo():
    return 100

RETURN_CONST             1 (100)

In theory, a bytecode optimiser could reduce our original function to the single-instruction version above once it had established that the function always returns the same constant value. This type of optimisation is called Peephole Optimisation, a term coined back in 1965.

Optimising the snippet above seems simple, but one immediate problem is Python's dynamically typed nature. It isn't trivial to know whether an operation has been overloaded or not, but if we could tell the compiler "I haven't overloaded anything for these classes" we could possibly have some nice optimisations.

An interesting paper from 1998 has a lot to say about optimising Python bytecode.

For example, one snippet provided in the paper above talks about a common unpacking pattern:

a,b,c = 1,2,3

Which in current versions of Python generates the bytecode:

LOAD_CONST               0 ((1, 2, 3))
UNPACK_SEQUENCE          3
STORE_NAME               0 (a)
STORE_NAME               1 (b)
STORE_NAME               2 (c)
RETURN_CONST             1 (None)

Which is less desirable than the code:

a = 1
b = 2
c = 3

Which skips the tuple allocation and unpacking overhead, dealing with variables directly stored in the stack:

LOAD_CONST               0 (1)
STORE_NAME               0 (a)
LOAD_CONST               1 (2)
STORE_NAME               1 (b)
LOAD_CONST               2 (3)
STORE_NAME               2 (c)

Interestingly, Python tuples are immutable, and thus can be loaded as constants. That's why the LOAD_CONST opcode managed to load the three values in one chunk.
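The two spellings can be compared side by side on whatever interpreter you have at hand. This is a minimal sketch: the helper opnames and the variable names are mine, and the exact opcodes depend on the CPython version, so the sketch prints them rather than assuming them. The only things it relies on are that both snippets bind the same values and that the one-assignment-per-line version never needs UNPACK_SEQUENCE.

```python
import dis

# Compile both spellings as module-level code, as in the listings above.
packed = compile("a, b, c = 1, 2, 3", "<packed>", "exec")
unpacked = compile("a = 1\nb = 2\nc = 3", "<unpacked>", "exec")


def opnames(code):
    """Return the opcode names of a code object, in order."""
    return [instruction.opname for instruction in dis.get_instructions(code)]


packed_ops = opnames(packed)
unpacked_ops = opnames(unpacked)
print("packed:  ", packed_ops)
print("unpacked:", unpacked_ops)

# Both versions must leave the same bindings behind.
packed_ns, unpacked_ns = {}, {}
exec(packed, packed_ns)
exec(unpacked, unpacked_ns)
packed_values = {name: packed_ns[name] for name in "abc"}
unpacked_values = {name: unpacked_ns[name] for name in "abc"}
print(packed_values == unpacked_values)
```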

There are several other examples of optimisation opportunities, but the key question is: Why isn't CPython already doing those?

Guido's quote provided at the top of this post isn't sufficient to elucidate what perks we are getting from having an unoptimised bytecode compiler. One obvious one is maintainability. A dumb compiler is way easier to maintain and change than a smart compiler that optimises a lot.

I know that compiler experts will say that provided the right compiler architecture, pattern-matching optimisation becomes easy as such optimisations can be injected as "plugins". Since I am not a compiler expert I have no way to validate this idea. All I have is Guido's quote.

I am not particularly fond of having simplicity in the compiler at the expense of all Python programs being slowed down because of it. But maybe if Python had an optimiser for bytecode, Python wouldn't have existed in the first place.

It seems like the tides have been changing though as Python 3.13 will get a JIT.

Another perk from a dumb compiler is debugging. Written Python code directly translates into bytecode, and thus we can do this:

def foo():
    a = 10
    breakpoint()
    b = 20
    return b

In the process of debugging we know that the variable a is there and won't be optimised away. However... This is a bit of a straw man argument as we could generate the bytecode with different levels of optimisation as in gcc's -O1, -O2, and -O3, thus using a lower level of optimisation when debugging.

Pypy

Pypy only has a couple of special bytecodes on top of what CPython already has, and Pypy in general doesn't perform a lot of bytecode optimisations either.

But there are two interesting opcodes that make a significant difference on a program's performance. For the following code:

class Foo:
    def bar(self, x: int, y: int) -> int:
        return x + y


foo = Foo()
x = 1
y = 2

Running this function call on the CPython interpreter:

# Runs foo.bar(x, y) bytecode.
dis.dis(lambda: foo.bar(x, y))

Gives me the bytecode:

LOAD_GLOBAL              0 (foo)
LOAD_ATTR                3 (NULL|self + bar)
LOAD_GLOBAL              4 (x)
LOAD_GLOBAL              6 (y)
CALL                     2
RETURN_VALUE

Whereas in Pypy we have:

LOAD_GLOBAL              0 (foo)
LOAD_METHOD              1 (bar)
LOAD_GLOBAL              2 (x)
LOAD_GLOBAL              3 (y)
CALL_METHOD              2
RETURN_VALUE

Both have 6 bytecode instructions each, but the first difference is that Pypy uses its special LOAD_METHOD opcode instead of a LOAD_ATTR instruction.

The LOAD_METHOD opcode pushes two values to the stack instead of one. It pushes the unbound Python function object (Foo.bar) and the object itself (foo).

The CALL_METHOD opcode (which received a parameter of N = 2) will pop the two variables from the stack (x, y) as well as the "self" argument for the unbound function ("foo") and call Foo.bar(self, x, y).

To understand why this optimises the underlying code one must understand the difference between bound and unbound methods in Python.

# Using the object.
print(foo.bar)
>> <bound method Foo.bar of <__main__.Foo object at 0x7646b4648bf0>>

# Using the class.
print(Foo.bar)
>> <function Foo.bar at 0x7646b46972e0>

In the first case, the function bar comes from an instantiated object and is called "bound", because the function is linked to that object and thus has a "self" variable bound to it.

In the second case, where we use the Foo class, the function bar is not bound: it didn't come from an instantiated object and thus does not have a self object. Trying to call it without one results in an error:

>>> Foo.bar(x=1, y=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Foo.bar() missing 1 required positional argument: 'self'

However, you can do this:

>>> Foo.bar(self=foo, x=1, y=2)
3

So why is the original LOAD_ATTR instruction a problem?

The problem comes from the performance penalty imposed by creating bound methods and calling them.

CPython creates bound methods on demand, i.e., every time a bound method is needed, it is initialised and allocated in memory right there.

>>> obj_1 = Foo()
>>> obj_2 = Foo()
>>>
>>> obj_1.bar is obj_2.bar
False
>>> obj_1.bar == obj_2.bar
False
>>> obj_1.bar
<bound method Foo.bar of <__main__.Foo object at 0x7646b49ebf80>>
>>> obj_2.bar
<bound method Foo.bar of <__main__.Foo object at 0x7646b49e8320>>

Note how obj_1.bar and obj_2.bar are different objects, each tied to its own instance. CPython will create instances of those bound methods for each object before it can call the bound .bar function (allocation on demand). Pypy, however, will use the stack to cache the unbound method and call it with the "self" object that is already stored on the stack, so there is no overhead of allocating and creating bound methods when an object's function needs to be called. It operates similarly to Foo.bar(self=obj, x=1, y=2).
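A small sketch, assuming CPython, makes the relationship explicit. The attributes __self__ and __func__ are standard on bound method objects; the variable names are my own:

```python
class Foo:
    def bar(self, x: int, y: int) -> int:
        return x + y


obj = Foo()

# Each attribute access builds a fresh bound-method object...
first = obj.bar
second = obj.bar
print(first is second)

# ...that merely pairs the instance with the plain function from the class.
print(first.__self__ is obj, first.__func__ is Foo.bar)

# Calling the function through the class with an explicit self
# gives the same result as calling the bound method.
print(obj.bar(1, 2) == Foo.bar(obj, 1, 2))
```

So a bound method is little more than a (function, instance) pair, which is why keeping those two values on the stack is enough to make the call.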

This strategy provides a considerable performance improvement for heavily OOP programs. According to Pypy:

Another optimization, or rather set of optimizations, that has a uniformly good effect are the two ‘method optimizations’, i.e. the method cache and the LOOKUP_METHOD and CALL_METHOD opcodes. On a heavily object-oriented benchmark (richards) they combine to give a speed-up of nearly 50%, and even on the extremely un-object-oriented pystone benchmark, the improvement is over 20%.

source

Outro

It is important to note that there have been many attempts to make Python faster, many of which have failed [1] [2].

As much as it would be nice to have another Python interpreter fully JIT'ed and full of bytecode optimisations, in reality it is really hard to compete against CPython. Many 3rd-party libraries use CPython's C-extensions directly, which aren't necessarily available in other Python interpreters (excluding some forks), rendering such libraries unusable.

It might be too far to say that Python is a mono-implementation language, but it does feel like it. If a fork is successful it may be merged up-stream instead of remaining a fork. If the interpreter itself is built without CPython's C-extensions in mind, it will not provide a rich ecosystem for all the performance-dependent 3rd-party libs out there and will thus probably be less used.

Managing Python Environments

2024-08-06T00:00:00Z

Managing Python Environments

Created at: 2024-08-06

The ecosystem for managing Python environments is huge, and so is the number of tools that are used to manage these environments.

We have: pyenv, virtualenv, virtualenvwrapper, asdf, conda, anaconda, uv, poetry, pipenv etc.

It is very easy to break your local environment if you are new to all of this. This cartoon from xkcd sums it up well:

source

The purpose of this post is to argue whether we need any of these tools when working on multiple Python projects that potentially have incompatible Python versions and dependencies.

Mind you that I program for work. Some of the knowledge and intentions behind the way I do things have been learned through trial and error over time. What is written here assumes that you either have a similar background or are willing to find common ground on why some of those decisions are harder for me than others.

Also, I expect that you don't need convincing that using your global Python executable for everything is a bad idea, and that isolated virtual environments for each project are the best solution to avoid dependency headaches.

What Do These Tools Provide?

Every tool I cited above is a little bit different, but I will pick pyenv as an example since it is one of the tools with the smallest footprint.

pyenv lets you:

Install a new Python version: pyenv install 3.10.4
Automatically pick a Python environment when you cd into a folder: pyenv local <version>
Use a plugin framework that lets you add virtualenv among other tools.
Auto-complete commands.

This all sounds pretty neat, but is it worth installing this tool made of 101,263 lines of code and data files (as per v2.4.9) just so that you have these commands plus a plugin framework?

My answer is no. You are not going to need it and you are better off with the default tools (more on that later).

There are three main points that I consider undesirable behaviour coming from the abstraction provided by pyenv:

Bash shims. A shim is merely a proxy. If you add the pyenv collection of shims at the beginning of your $PATH variable, as in:
```
$(pyenv root)/shims:/usr/local/bin:/usr/bin:/bin
```
these shims will intercept Python commands like pip so that the correct virtualised pip version for the directory you previously cd'd into is chosen. Effectively, these shims replace your Python environment commands, like pip, with commands hardcoded by pyenv that do the magic for you. But besides their magic nature, those shims can be very slow (example). If you have many Python projects, the shims start to become a bit of dark magic, and you won't have direct access to the Python tools if anything bad happens. Plus, if pyenv is an order of magnitude slower than running the Python binary itself, it becomes a painful tool to use.
Python versions are hidden from you, which is another magical feature that makes it a little less ergonomic for you to control or debug a particular environment yourself. This problem is compounded when the bug is in pyenv itself. Checking recent issues in the repository, one can see many distinct problems: incompatible changes within pyenv itself, weird missing C++ links in the Python executable, failures to create a virtual environment for a specific version of Python, unavailable or unsupported Python binaries, operating system upgrades breaking the tool, etc. There are 1,700+ issues to date to pick from.
So much bash. Assuming you are one of the people on the issues page who need support, jumping into the source code isn't trivial. Almost half of the repo is composed of bash scripts. That's about 50,000 lines of bash code according to GitHub. I like bash for small scripts, especially my own. Debugging thousands of lines of someone else's bash is a much harder problem.
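The interception described in the first point is nothing more than PATH lookup order. Here is a minimal sketch, assuming a POSIX system, that uses the standard shutil.which function. The directory layout and the fake pip files are hypothetical stand-ins, not what pyenv actually installs:

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path


def make_fake_tool(directory: Path, name: str) -> Path:
    """Create an empty executable file called `name` inside `directory`."""
    directory.mkdir(parents=True, exist_ok=True)
    tool = directory / name
    tool.write_text("#!/bin/sh\n")
    tool.chmod(tool.stat().st_mode | stat.S_IXUSR)
    return tool


root = Path(tempfile.mkdtemp())
shim = make_fake_tool(root / "shims", "pip")  # stands in for a pyenv shim
real = make_fake_tool(root / "bin", "pip")    # stands in for the real pip

# The shims directory is listed first, so the lookup stops there.
search_path = os.pathsep.join([str(root / "shims"), str(root / "bin")])
resolved = shutil.which("pip", path=search_path)
print(resolved)
```

Whatever sits first in the search path wins, so every pip or python invocation goes through the shim before it reaches a real binary.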

After reading all that, you might still find that pyenv is actually useful for you and the drawbacks aren't that meaningful. If that is the case, please go for it! If pyenv wasn't useful it wouldn't be so popular. But developers come in different flavours, and given past experience I can tell that pyenv isn't for me.

I personally am not a big fan of magical tools and I like to have control and understanding of how to fundamentally control my work environment as this is an important part of my job. Any breakage in my local environment in the past has caused me great pain and stress. Most of these problems have been caused by mismanagement of dependencies; problems either created by me (lack of knowledge of how underlying tools work), by the operating system (Ubuntu and macOS in particular), or by magic tools changing in backwards incompatible ways.

On the other hand, I have frequently been surprised by how easy it is to learn and use the basic tools provided by the OS or the programming language itself, which has only added to my scepticism of magical tools adding value in exchange for their added cognitive load and potential bugs.

I also mentioned at the top of this section that pyenv is one of the tools with the smallest footprint. That is true. Other tools such as conda, asdf and, heck, nix are on a higher level of abstraction. To me, they are even less desirable for the task of managing Python environments locally.

There are also other caveats with these tools such as the fact that they change, grow bigger, and sometimes these changes create backwards incompatibility with their own earlier versions as we saw above with pyenv.

It is not hard to find issues on those repositories where some conflicting dependency has broken the dependency resolver tool itself [1]. If you are in a situation where you need a version management tool to manage your version management tool, things get complicated. It is a fact that software breaks, and if your environment management that is built upon high levels of abstraction has failed you, how will you fix the issue without knowing enough about this 100,000-line code repository?

So Why Do People Use These Tools?

I can only speculate based on my own experience since I don't have any hard data to reach for, so take this with a grain of salt.

I imagine that whether someone will choose to use an environment manager tool comes down to their background. Preferring to pick a tool over another is a choice compounded by many factors:

How junior or senior a developer is. Being a junior developer generally means that there are many pressing things to learn at once: the programming language, the specific auxiliary technology ubiquitous in their areas, the product of the business they are working for, text editors, frameworks, developer hype, etc. It is totally understandable that when it comes to understanding tools for managing an environment, spending time to analyse all choices and select the best one is lower on their list of priorities. They will pick the one that magically handles everything for them so that they can move on. I have done that myself many times in many different problem areas. Magic isn't by itself a bad thing, but I think that as one progresses to more senior levels and becomes interested in particular topics, it is important to materialise current knowledge and evolve it into deep knowledge about how things work, and to make an effort to help the community simplify things if at all possible.
How much they care about their environment being deterministic at all times. If a developer only works in a single codebase and the requirements don't change often, why care about managing environments at all? This is a bit of a moot point, but I know that developers who come from projects like this have a hard time when they get a job at a company that has several codebases with different tools and requirements for each and struggle to understand or care about this type of problem.
Popularity of a given tool. There is trust that popular projects will be stable enough and have a community of people backing them up. The reasoning goes that if tool X is what professionals in the field use to solve their problems, it must be good.

What do I do then?

I am writing this article in 2024. Building Python from source is incredibly easy yet surprisingly very few people actually do it. Yes... Building from source! What a crazy idea, nobody builds from source these days and many people don't know how to.

It is possible to download a specific Python version and set up a virtual environment using Python's own venv tool without any extra dependency whatsoever.

Here's a short list of bash commands that download Python 3.11.5 and set a virtual environment for it:

# You'll be installing your Python binaries at $HOME/.python_bin.
mkdir -p $HOME/.python_bin/ && cd $HOME/.python_bin/

# Download the tar for the Python version you want.
curl -O https://www.python.org/ftp/python/3.11.5/Python-3.11.5.tgz

# Decompress, build, and install it under $HOME/.python_bin/python-3.11.5.
tar -xzf Python-3.11.5.tgz && cd Python-3.11.5
./configure --prefix=$HOME/.python_bin/python-3.11.5 && make && make install

# Create your environment anywhere you like.
$HOME/.python_bin/python-3.11.5/bin/python3 -m venv my_env
source my_env/bin/activate

That is it. Now you know how the whole process works (it is so easy) and you're using venv, which was introduced to core Python in 3.3. You can also play with compilation flags and build the binary with some extensions (but you don't have to!).
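There is very little magic inside venv itself. As a minimal sketch, the same step can be driven from Python through the standard venv module. The temporary location is just a placeholder of mine, and pip is skipped to keep the example fast:

```python
import tempfile
import venv
from pathlib import Path

# Hypothetical location; in practice this would be your project's env directory.
env_dir = Path(tempfile.mkdtemp()) / "my_env"

# Equivalent in spirit to `python -m venv my_env`, minus the pip bootstrap.
venv.create(env_dir, with_pip=False)

# Every venv carries a pyvenv.cfg that points back at the interpreter it came from.
print((env_dir / "pyvenv.cfg").read_text())
```

The pyvenv.cfg file records which interpreter the environment was created from, which is handy when you keep several self-built Python versions around.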

Of course this is still using some tools that abstract the burden of building the binary for the project. If you have never built a big C project like CPython before, you might be asking yourself what this ./configure script is, what make is, and so on. In a nutshell, this is how binaries are usually packaged - at some level either you are doing this or your operating system has come up with a standardised way to build from source for you via a package manager.

So now you can run however many virtual environments you want from that binary, and put them anywhere you like. If you want extra convenience to activate that environment for a particular project, just create an alias:

alias myproj="source /somewhere/my_env/bin/activate && cd /somewhere/myproj"

If you aren't using Python 3.3+, just swap venv for anything else that works for your version, or heck, just directly use that Python executable for your project - it is totally disposable and you can download another one any time you like. Now that you know how the process works, it is very easy to change it to your taste, and that's exactly what I wanted to show in this post.

If you want some further ideas, this is the script I am using in my .bashrc file.

install_python_version() {
  # call this function with a version of Python
  # like `install_python_version 3.11.9`.

  # Clean up first
  rm -rf /tmp/python-install

  # This is where the different Python executables will be installed.
  DIR=$HOME/.python_bin/python-$1
  mkdir -p $DIR

  # This is where temporary installation files will be available.
  mkdir -p /tmp/python-install && cd /tmp/python-install

  # Download the python version
  curl -O https://www.python.org/ftp/python/$1/Python-$1.tgz

  tar -xzf Python-$1.tgz && cd /tmp/python-install/Python-$1
  ./configure --prefix=$DIR && make && make install

  echo "Now you can install your virtualenv:"
  echo "$HOME/.python_bin/python-$1/bin/python3 -m venv /tmp/my_env"
}

You can invoke it from the shell with install_python_version 3.11.5.

But If Building From Source Was Good Package Managers Wouldn't Exist...

While this is generally true, and package managers are incredibly useful tools, I think that it is worth picking a battle now and then and building something from source when it makes sense to do so. I think that, at a minimum, being comfortable building from source your main tools, plus other tools that are notorious for having conflicting versions, is good general advice.

In my case, I rely on my OS package manager a lot for my secondary tools. But even though pacman is a great package manager, it is not without its drawbacks. It only builds dependencies with the default flags. If I need more customisation, I have to step out of the manager or understand how the manager works so that I can apply the particular building flags I want.

This is also a problem in a rolling release system like Arch Linux, as installing multiple versions of the same dependency will point you towards some form of virtualisation (using docker, for example) or building from source.

Which Assembly Syntax to Choose?

2024-07-24T00:00:00Z

Which Assembly Syntax to Choose?

Created at: 2024-07-24
Updated at: 2024-08-13

TLDR: Use the Intel syntax, but AT&T isn't that bad.

I usually prefer not to post content that is already easily searchable on the internet. But the problem is, the really great information on this topic seems to be distributed across just a few different places which sometimes are tricky to find and often not argumentative enough to prescribe a syntax recommendation.

That means that when I eventually forget why I picked one versus another, I have to scramble across various posts to figure out which syntax to use for a new project.

Top results on Google don't help much as many link to Reddit threads. Due to the nature of Reddit, the arguments are rare or non-existent.

As you already figured out from the TLDR at the top, I prefer the Intel syntax. I think that a good approach is to be contrarian and start with the differences that seem to make the Intel Syntax look less desirable. I am a fan of honest downsides being up front, and I think it makes an article more honest. So here we go.

Order of Operands

In the Intel syntax, the first operand is the destination and the second operand is the source, whereas in AT&T it is the opposite. This is just about the most confusing thing when you are comparing AT&T assembly with Intel assembly.

If you don't read assembly often, it is easy to forget which order each syntax uses.

| Intel         | AT&T             |
| --------------|------------------|
| mov rax, 0xFF | movq $0xFF, %rax |

I prefer the AT&T syntax here because it flows better in English. E.g. "Move the value 0xFF into rax".

The counter argument here for some people is that they still prefer the Intel syntax in this case because it reads like C:

mov rax, rdx        ; rax = rdx
sub rbx, rdi        ; rbx -= rdi
shlx rax, rbx, rdi  ; rax = rbx << rdi

If that mode of thinking fits your brain well you probably won't see that as a problem. For me, I always have to "reverse think".

Update (2024-08-13): There is another counter argument. I've come to realise that ABI rules favour the Intel syntax. So for example the function:

long sum(long foo, long bar);
// foo -> %rdi
// bar -> %rsi

foo is stored in rdi ("d" standing for destination), and bar is stored in rsi ("s" standing for source). The convention is to have the destination first then the source, just like in Intel syntax.

AT&T is The Default on GCC, objdump, and GDB

This point isn't about syntax at all, but I often find tooling characteristics relevant when making an important choice and thus I can't ignore them. I spend a great deal of time inside gdb and also printing objdumps, and if there was a major inconvenience about using a syntax that would put a damper on my use of gcc, objdump and gdb, I'd probably consider learning a new syntax.

For historical reasons GAS (the GNU Assembler, which GCC uses as its backend) originally used the AT&T syntax. Support for Intel syntax was only added later, and naturally the default remained AT&T.

This can be changed by configurations, of course, so I have the following line in my ~/.config/gdb/gdbinit file:

set disassembly-flavor intel

And when asking gcc to emit assembly I use the following:

gcc -S -masm=intel

And finally for objdump I have to run:

objdump -Mintel

This isn't a problem on my local machine since I can use aliases. But on another dev environment, or when someone is sharing some code from theirs, it isn't absurd to expect they'll be using the defaults. This was a strong reason for me to commit to learning both syntaxes well. I do have to spin my brain on hyperthreaded mode to read AT&T syntax. Writing is a bit harder for me because I keep forgetting the instruction suffixes, and the % and $ signs as I'm more used to writing Intel.

Comments

Intel syntax uses ; for comments, whereas AT&T uses # or C style comments. I do have a slight preference for AT&T style here (C style comments!) but this is the last point where I think AT&T syntax is better.

Now the cons... I will follow course and start with the minor problems and go up to bigger problems.

Suffixes

Many instructions require suffixes in AT&T syntax when the size of the operands matters:

# AT&T operator suffixes
movb %al, %bl
movw %ax, %bx
movl %eax, %ebx
movq %rax, %rbx

b is for byte, w is for word (16 bits), l is for long-word (32 bits), and q is for quadword (64 bits).

I don't know why the 32bit length is called "long-word". I imagine it's because it was added when 32 bits were seen as the limit and "long" made sense then.

As soon as we got 64 bits, "long" became a confusing word, especially because C has the long keyword, and on most 64-bit Linux and macOS systems a long is 64 bits wide rather than 32. In Intel syntax a 32-bit value is called a "double word", which in my opinion is a much clearer name.

This is a minor issue and you get used to it. In the Intel syntax you often don't need size specifiers because the operands give you this information implicitly:

; because esi is 32bits, this is
; the equivalent of "movl" in AT&T
mov esi, 8

However, other operations in Intel syntax may also require a size specifier if the operands alone aren't sufficient to determine the size of the operation. For example:

; how many bytes??
mov  [rbp-20], 20

You are moving 20 to the address in memory calculated by the value rbp-20 but how many bytes from the value "20" are you moving? You need to clarify:

mov DWORD PTR [rbp-20], 20

Prefixes

Both registers and immediate values have prefixes in AT&T syntax.

# AT&T
movl $25, %edi

The fact that Intel doesn't use prefixes for registers and immediate values already shows the reader that prefixes aren't necessary.

The only "downside" I can think of (and please reader correct me if I am wrong), is that we can't have symbols with register names in Intel i.e., rax is not a valid symbol name.

For example this code fails to compile:

main:
  mov eax, ebx
  call ax
  ret
ax:
  mov bl, cl
  ret

Changing ax to something other than a register name will fix the code. This may only be a problem when writing code manually. But note that if you are overriding gcc defaults the following code blows up when running gcc -masm=intel main.c

#include <stdio.h>

long rax(int a, int b) {
  return 32*a << b;
}

int main() {
  long a;
  a = rax(42, 42);
  printf("%ld", a);
}
// Error:
// gcc -masm=intel main.c
// A.s: Assembler messages:
// Error: .size expression for rax does not evaluate to a constant
//

That blows up because a symbol (the function named rax) uses the name of a register. Changing the name of the function to something else fixes the problem.

Memory Operands

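This is the biggest pain point of AT&T: the way memory operands are written.

| Intel                              | AT&T                              |
| -----------------------------------|-----------------------------------|
| instr foo, [base+index*scale+disp] | instr disp(base,index,scale), foo |
| add rax, [rbx+rcx*0x4-0x22]        | addq -0x22(%rbx,%rcx,0x4), %rax   |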

Note that displacements aren't the same as immediate values and thus don't require a $ prefix. I'm sure some will think of it as an inconsistency.

This is where everything packs together: the suffixes, the prefixes, and a strange way to calculate memory addresses. At least the form never changes, so once you're used to the expression it becomes more familiar.
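Both notations describe the same computation, base + index*scale + disp. As a memory aid, here is a minimal Python sketch that formats one set of components both ways and evaluates the address. The function names are my own, and the register values are made up for illustration:

```python
def intel_operand(base: str, index: str, scale: int, disp: int) -> str:
    """Format a memory operand as [base+index*scale+disp]."""
    return f"[{base}+{index}*{scale}{disp:+#x}]"


def att_operand(base: str, index: str, scale: int, disp: int) -> str:
    """Format the same operand as disp(%base,%index,scale)."""
    return f"{disp:#x}(%{base},%{index},{scale})"


def effective_address(base: int, index: int, scale: int, disp: int) -> int:
    """Address that both notations describe: base + index*scale + disp."""
    return base + index * scale + disp


print(intel_operand("rbx", "rcx", 4, -0x22))  # [rbx+rcx*4-0x22]
print(att_operand("rbx", "rcx", 4, -0x22))    # -0x22(%rbx,%rcx,4)

# With rbx = 0x1000 and rcx = 2: 0x1000 + 2*4 - 0x22 = 0xfe6
print(hex(effective_address(0x1000, 2, 4, -0x22)))
```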

Final Remarks

Is that all? Why! It doesn't look so bad!

Well, it doesn't look so bad because it isn't that bad! But also keep in mind that I didn't show you any long snippets of assembly code. Take a file with 200 lines of assembly and naturally the AT&T syntax will be more visually daunting.

There are also other arguments I didn't add here regarding documentation. Intel manuals naturally use the Intel syntax, and there are plenty of Intel manuals out there, so chances are you'll be reading some. Also some of the MCUs I've worked with on embedded systems follow a syntax that is closer to Intel.

If you are writing a new project in Assembly I'd recommend the Intel syntax.

But considering that you will likely come across both when reading code, my recommendation is to learn both syntaxes, and if you don't use assembly that often just keep a cheatsheet handy so that you can quickly navigate between the discrepancies.

mov edi, edi

2024-06-08T00:00:00Z

mov edi, edi

Created at: 2024-06-08
Updated at: 2024-07-27

I had a surprise today when I saw the instruction mov edi, edi as the first instruction of a function.

This is my C code:

unsigned int func(unsigned int idx) {
  static unsigned int my_table[] = {10, 20, 30, 40};
  return my_table[idx];
}

Which produced the following x86 assembly (compiled via gcc):

func:
  mov  edi, edi
  lea  rax, my_table.0[rip]
  mov  eax, DWORD PTR [rax+rdi*4]
  ret
  .size  func, .-func
  .section  .rodata
  .align 16
  .type  my_table.0, @object
  .size  my_table.0, 16
my_table.0:
  .long  10
  .long  20
  .long  30
  .long  40

This code was compiled with the flag -O3, which I thought was going to eliminate all useless instructions. To my surprise, when I removed all the unsigned keywords from the function, the mov edi, edi disappeared in favour of a movsx rdi, edi! Here's the equivalent asm code:

func:
  movsx  rdi, edi
  lea  rax, my_table.0[rip]
  mov  eax, DWORD PTR [rax+rdi*4]
  ret
  .size  func, .-func
  .section  .rodata
  .align 16
  .type  my_table.0, @object
  .size  my_table.0, 16
my_table.0:
  .long  10
  .long  20
  .long  30
  .long  40

I went on a spiral of research, and I found many links pointing to this instruction being necessary in Microsoft Windows, so that the OS could support hot-patching. source

However, I compiled this on Linux. This should not be relevant to me. Here is the catch; that mov edi, edi operation is used for zero'ing the most significant 32 bits of the rdi register.

It does not seem obvious, but the answer can be found in the x86 tour of Intel manuals source

General-purpose Registers (...)

32-bit operands generate a 32-bit result, zero-extended to a 64-bit result in the destination general-purpose register. (...)

UPDATE: In case it wasn't clear from the quote above, the zero-extension only works for 32 bit operands. If you run mov di, di (di is 16 bits long), the zero-extension will not happen.

The zero-extension is indeed what happens when I try to run the following mock assembly code below:

main:
  ; Load `rdi` with all one's.
  mov rdi, 0xFFFFFFFFFFFFFFFF
  ; After the instruction below,
  ; rdi will be 0x00000000FFFFFFFF
  mov edi, edi
  ret

This was not obvious to me at all. The initial instruction mov edi, edi just looked like a nop equivalent with two bytes...
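The rule can be modelled with plain integer masks. This is a rough Python sketch of the register semantics, not of any real emulator. The helper names write32 and write16 are my own:

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def write32(reg: int, value: int) -> int:
    """Writing a 32-bit sub-register zero-extends into the full 64 bits."""
    return value & 0xFFFF_FFFF


def write16(reg: int, value: int) -> int:
    """Writing a 16-bit sub-register leaves the upper 48 bits untouched."""
    return (reg & ~0xFFFF & MASK64) | (value & 0xFFFF)


rdi = MASK64
after_mov_edi_edi = write32(rdi, rdi & 0xFFFF_FFFF)  # mov edi, edi
after_mov_di_di = write16(rdi, rdi & 0xFFFF)         # mov di, di

print(f"{after_mov_edi_edi:#018x}")  # 0x00000000ffffffff
print(f"{after_mov_di_di:#018x}")    # 0xffffffffffffffff
```

The 32-bit write throws away the old upper half, while the 16-bit write keeps it, which is why mov edi, edi is not a nop but mov di, di effectively is.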

Coming back to my original function:

unsigned int func(unsigned int idx) {
  static unsigned int my_table[] = {10, 20, 30, 40};
  return my_table[idx];
}

Since I am using unsigned integers, the compiler can trust that the arguments passed to that function in assembly won't be more than 32 bits long on my machine.

UPDATE: The compiler actually doesn't need to "trust" anything, it actually does not matter. The movsx instruction accepts operands of different sizes. This means that the 32 bits in edi will be moved with sign-extension to fit the 64 bits of rdi. The underlying 32 bit value will remain the same, and it doesn't matter what bits were in the most-significant upper 32bits of rdi before the mov operation - they will just be completely ignored. That is why the instruction mov edi, edi is not necessary beforehand!
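To convince myself that the upper half really is ignored, the sign-extension can be sketched in Python with integer masks. The function name movsx_32_to_64 is my own, and this models only the arithmetic, not the instruction encoding:

```python
def movsx_32_to_64(rdi: int) -> int:
    """Model `movsx rdi, edi`: sign-extend the low 32 bits to 64 bits."""
    low = rdi & 0xFFFF_FFFF           # edi: the upper half is ignored
    if low & 0x8000_0000:             # sign bit of the 32-bit value
        low |= 0xFFFF_FFFF_0000_0000  # replicate it into the upper half
    return low


# Garbage in the upper 32 bits does not affect the result.
print(hex(movsx_32_to_64(0xDEAD_BEEF_0000_0005)))  # 0x5
print(hex(movsx_32_to_64(0x0000_0000_0000_0005)))  # 0x5

# A negative 32-bit value (-1) stays -1 when viewed as 64 bits.
print(hex(movsx_32_to_64(0x1234_5678_FFFF_FFFF)))  # 0xffffffffffffffff
```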

Remember that under the calling convention used for C functions on x86-64 Linux (the System V ABI), the first integer argument to the function, in this case idx, is passed in the register rdi.

So this instruction is cleaning up the most significant bits of rdi for us. I am still not totally sure why this is necessary, but perhaps the compiler assumes that some garbage could be held in the most significant bits of rdi and tries to clean that up first to avoid potential bugs.

This assumption makes sense to me at first, because down in the assembly function body, we rely on rdi for finding the address offset of the element in the table that we want to return: mov eax, DWORD PTR [rax+rdi*4].

Now remains the question: "Why is there an assumption by the compiler that rdi can contain garbage in the most significant bits?".

This can happen if the function is called with a cast value, given that casting per se does not clean up the unused bits of a 64-bit register. For example, a 64-bit integer could have been cast down to a 32-bit one.

Again, this is very much based on my own understanding of how assembly works on my platform. If you think that I got something wrong, please send me an email at marceelofernandes@gmail.com.

Goodbye ZSH

2024-05-08T00:00:00Z

Goodbye ZSH

Created: 2024-05-08
Updated: 2024-07-06

After 7 years using zsh and oh-my-zsh, I've completely ditched both of them today.

I would like to state at the top that there isn't anything inherently bad or wrong with zsh and oh-my-zsh. It is just that these technologies don't fit well within my way of doing things, and have become unnecessary over time.

There are many reasons for this, but I will start with the reasons for getting rid of oh-my-zsh first.

oh-my-zsh

One may think that oh-my-zsh is zsh itself, but that is not true. oh-my-zsh is simply a "plugin manager" for zsh.

The oh-my-zsh package promises wonders. From their website:

Oh My Zsh will not make you a 10x developer...but you may feel like one!

Once installed, your terminal shell will become the talk of the town or your money back! With each keystroke in your command prompt, you'll take advantage of the hundreds of powerful plugins and beautiful themes. Strangers will come up to you in cafés and ask you, "that is amazing! are you some sort of genius?"

oh-my-zsh comes with hundreds of plugins pre-installed, many of which you will never use or hear of, and that includes themes as well.

Even though this isn't a problem in itself, as those plugins are just text files that will hang around in your system, they are still there when you didn't ask for them.

This is something that I personally have been trying to reduce in my system as the burden of maintenance rises with every package added.

The fewer unused files, dependencies, libs, etc., the less risk there is of something crashing, requiring updates, or being a security risk. This is particularly relevant as oh-my-zsh plugins are just a bunch of zsh shell scripts.

But this is just me preaching a particular philosophy. A more important practical problem is the set of shell aliases that oh-my-zsh brings with it, many of which are for applications you may not even have installed.

For example, these are some of the aliases available with oh-my-zsh:

alias help='man'
alias _='sudo '
alias :3='echo'
alias dud='du -d 1 -h'
alias drm='docker container rm'
alias p='ps -f'
alias rm='rm -i'
alias ldot='ls -ld .*'
alias lS='ls -1FSsh'
alias hadat='heroku addons:attach'

This might not be a problem unless you have a clashing alias. But still, did you know you had all those aliases available? Are they really that useful to you? Have you come across them accidentally and been surprised? The help alias to man particularly bothers me. But also, I don't want heroku aliases in my user land.

Even if some aliases or plugins are useful, you can copy the ones you want and just plug them into your .bashrc file.

At the end of the day, oh-my-zsh plugins are just shell scripts written in zsh syntax, many of which are compatible with plain bash or can be easily ported.

There is no version control or anything fancy like that. Just files. This is one of the reasons many people won't categorise oh-my-zsh as a plugin manager (and why I put it between quote marks when I mentioned it earlier), as it lacks so many features to that end.

For me it comes down to: I don't need this technology, and it does not add much value to my daily use of my computer, therefore it must go.

Next are the reasons why I stopped using zsh.

zsh

One thing that people aren't really aware of is that zsh doubles as a scripting language of its own. They might not realise this until they share a script with someone, and that script doesn't run on their machine.

For example, the syntax below is only available in zsh:

# All files that are NOT .c files (^ provides negation)
ls -d ^*.c

# Grouping
ls (foo|bar).*

# Recursive search with **
ls **/*bar

There are more advanced filename-generation patterns, but you get the idea.

zsh also allows you to cd into a directory just by typing its name:

% cd /
% setopt autocd
% bin
% pwd
/bin

The official introduction page has a lot more examples of what is available. You can check it here.

Some functionalities like expanding /u/lo/b to /usr/local/bin are things that I do not want to have in my shell, and they strike me as bad patterns due to the high risk of doing the wrong matching and expanding to the wrong dir or file.

But the biggest problem for me is that the zsh scripting language adds way too many non-POSIX features that end up confusing me a lot. I always have to look up the syntax to make sure my zsh script isn't going to fail with syntax errors in another environment that doesn't use zsh.

This diminishes my ability to write good portable scripts.

Part of this is a skill issue on my side (everything is!), as we know every bash script should be POSIX compliant (joking, not even bash is POSIX compliant). Nonetheless, for newcomers like I once was, how "cool" a shell looked was a big factor in picking one.

This type of problem is more pronounced for me because I have several shell scripts that I created over time using zsh syntax, not even knowing I was using zsh scripting. This is a common newbie mistake, but when you just want to get something going you often make these types of trade-offs, which become more visible later once you have mastered a few tools.

So what is the alternative to all of this?

bash

Yep. I'm just using plain bash now and trying to figure out how far I can get with it. So far I haven't got a reason to get anything more featureful than bash.

I have been using fzf in the terminal, which is a dependency I already had and am familiar with, to deal with autocompletion and recursive command search instead. The experience is much better than the zsh autocompletion.

These are the lines in my .bashrc that turn on the fzf integration.

source /usr/share/fzf/key-bindings.bash
source /usr/share/fzf/completion.bash

This will enable:

Ctrl+t list files+folders in current directory (e.g., type git add , press Ctrl+t, select a few files using Tab, finally Enter)
Ctrl+r search history of shell commands
Alt+c fuzzy change directory

This is handy for me, as the functionality is similar to the Telescope plugin I have been using in neovim, and I can see a quick preview of files but also a fuzzy-search output of the reverse search I'm performing at the time.

Also, I use alacritty as my terminal. Alacritty has vi key bindings, so I don't need my shell to provide that for me. One less feature I need from the shell!

Most terminal emulators have some form of emacs or vi key bindings these days, so this isn't something necessary for a shell to support.

That said, I can still turn on vi mode in bash with set -o vi, so you can choose between using vi mode in your shell or in your terminal.

And that is pretty much it.

Nothing fancy - just getting rid of new technology that doesn't add value to my day-to-day activities.

I'm on a journey to make my installation script as lean as possible, to make updating my system as fast as possible, and also to give my system fewer entry points to break or be exploited.

Granted, I haven't had any bad experiences with zsh, but that alone doesn't mean I shouldn't re-check my previous assumptions and switch a particular technology for something better (or just pick the boring tech that has always been there to begin with).

I have no plans to go more basic and further switch to sh at this stage, but I will be looking at dash next to get the sweet performance enhancements and something that is more POSIX compliant than bash.

Branchless Programming Experiments in C++ and Python

2023-08-22T00:00:00Z

Branchless Programming Experiments in C++ and Python

Created: 2023-08-22
Updated: 2024-07-28

This article talks about high-level theoretical concepts of branchless programming, along with examples of branchless programming in C++ and Python.

What's branchless programming and why does it matter?

A branchless program is a program that doesn't include any conditional operator (if, else, switch, ...).

The reason why people would go through the trouble of branchless programming is onefold: performance.

Modern CPUs try to read future instructions before they are executed so that they can stay ahead of the game. This is called "instruction pipelining", and is meant to implement instruction-level parallelism on single processors.

However, when the CPU is pipelining and a branch is present, the CPU won't be able to know which path it needs to run, so it takes a guess. When this guess is incorrect, the CPU discards the instructions it has already read and reads the instructions for the correct path instead. This takes time and valuable clock cycles.

UPDATE: According to the CSAPP book, microprocessors are architected to achieve branch prediction success rates of about 90%. The book also estimates 15 to 30 clock cycles of wasted work when a branch prediction fails.
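Putting those two figures together gives a rough feel for the average cost of a branch. This is only back-of-the-envelope arithmetic: the function name and its percent-based signature are made up for illustration, and real predictors behave very differently depending on the code.

```python
def expected_branch_penalty(success_rate_percent: int, penalty_cycles: int) -> float:
    """Average cycles lost per branch, given the predictor's success rate."""
    miss_rate_percent = 100 - success_rate_percent
    return miss_rate_percent * penalty_cycles / 100


# Figures quoted above: about 90% success, 15 to 30 cycles lost per miss.
low = expected_branch_penalty(90, 15)
high = expected_branch_penalty(90, 30)
print(f"average cost per branch: {low} to {high} cycles")
```

In other words, with those numbers every branch costs somewhere between 1.5 and 3 cycles on average, which only matters in hot loops.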

How does Instruction Pipelining work?

The CPU is composed of multiple processor units. Each processor unit performs an instruction such as adding two numbers, comparing two numbers, jumping to a different part of a program, loading and storing data in memory, etc. Those operations are hardwired into the circuitry of the processor inside the CPU.

When the CPU is asked to perform an instruction, it will receive an opcode, which is just a unique binary number that the CPU will decode into controlling signals that will orchestrate the behaviour of the CPU.

The CPU executes an instruction by fetching it from memory (either the computer's memory or the CPU cache), followed by decoding the opcode, executing the instruction itself in the processor, and storing the result back to memory.

In a nutshell, a pipeline consists of four stages: fetch, decode, execute, write-back.

Each one of those stages will be handled by a circuit in the CPU. So whenever an instruction needs to be run, there are 4 high-level steps until the result is finally stored in memory.

Pipeline analogy time

Imagine you are going to a buffet restaurant with 4 different dishes. This is a peculiar restaurant, and you need to wait for the person in front of you to go through all the 4 dishes and pay for it before you can go down and start serving yourself.

This is a waste of time. A better way of serving people is to only wait for the person in front of you to go through the first dish before you start serving yourself.

This is what CPUs try to do by "pipelining" the work. While one instruction is being decoded, the following one is already being fetched. When the first instruction is decoded and starts being executed, the second one starts being decoded, a third one is fetched, and so on and so forth...

This is how it looks visually (image borrowed from wikipedia):

What happens if the person in front of you grabs all the chips from the buffet tray, and, had you known that in advance, you would have put another spoon of mashed potatoes on your plate?

That happens a lot in the CPU when the next instruction depends on the execution of the current one. In this case, the CPU needs to wait for the first instruction to resolve before executing the next one, and this incurs a time penalty.

In the example below, during cycle 3 the purple instruction can only be decoded once the green one is executed. A bubble is created to represent that during cycle 3 the decode step will be idle, subsequently on cycle 4 the execute step will be idle, and so on and so forth until the bubble is out of the pipeline, at which point execution resumes normally.
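To make the buffet arithmetic concrete, here is a toy cycle counter. It is a sketch under simplifying assumptions (every stage takes exactly one cycle, and each bubble costs exactly one cycle), and the function name is made up for this post.

```python
def total_cycles(instructions: int, stages: int = 4, bubbles: int = 0,
                 pipelined: bool = True) -> int:
    """Clock cycles needed to finish `instructions`, one cycle per stage."""
    if instructions == 0:
        return 0
    if not pipelined:
        # Each instruction walks through every stage before the next starts.
        return instructions * stages
    # The first instruction fills the pipeline; after that one instruction
    # finishes per cycle, and each bubble delays everything behind it by one.
    return stages + (instructions - 1) + bubbles


print(total_cycles(4, pipelined=False))  # 16
print(total_cycles(4))                   # 7
print(total_cycles(4, bubbles=1))        # 8
```

Four instructions take 16 cycles without pipelining, 7 with it, and 8 once a single bubble sneaks in.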

Sometimes it is even worse than this. You might have an if/else statement in your code, and the CPU tried to guess which branch to load beforehand, but it guessed the wrong one. Now it has to flush all of those instructions out of the pipeline and load the correct ones.

Here is where branchless programming comes in handy. Code that doesn't have conditionals will likely have fewer erroneously-guessed instructions loaded than the equivalent code with conditionals.

How do branches look in assembly language?

Let's start with the strawman example. Here's some simple C++ code with a branch:

int max(int a, int b) {
  if (a > b) {
    return a;
  } else {
    return b;
  }
}

And the resulting assembly code (note: no optimisation flag turned on):

max(int, int):
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], edi
        mov     DWORD PTR [rbp-8], esi
        mov     eax, DWORD PTR [rbp-4]
        cmp     eax, DWORD PTR [rbp-8]
        jle     .L2
        mov     eax, DWORD PTR [rbp-4]
        jmp     .L3
.L2:
        mov     eax, DWORD PTR [rbp-8]
.L3:
        pop     rbp
        ret

You will notice that we have two jumps: a conditional one (jle) and an unconditional one (jmp). The equivalent branchless code looks like this:

int max(int a, int b) {
    return a*(a > b) + b*(b >= a);
}

max(int, int):
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], edi
        mov     DWORD PTR [rbp-8], esi
        mov     eax, DWORD PTR [rbp-4]
        cmp     eax, DWORD PTR [rbp-8]
        setg    al
        movzx   eax, al
        imul    eax, DWORD PTR [rbp-4]
        mov     edx, eax
        mov     eax, DWORD PTR [rbp-8]
        cmp     eax, DWORD PTR [rbp-4]
        setge   al
        movzx   eax, al
        imul    eax, DWORD PTR [rbp-8]
        add     eax, edx
        pop     rbp
        ret

This looks a bit more convoluted, and it has more instructions. However, we got rid of those jumps.

This example is terrible, and it's chosen on purpose. The first function can be very easily optimised by the compiler if we use the flag -O3, generating this assembly code:

max(int, int):
        cmp     edi, esi
        mov     eax, esi
        cmovge  eax, edi
        ret

Whereas for the second code, even with the optimisation flag on, the underlying assembly code is worse as the compiler can't optimise it further:

max(int, int):
        xor     eax, eax
        cmp     edi, esi
        cmovle  edi, eax
        cmovg   esi, eax
        lea     eax, [rdi+rsi]
        ret

In this case, the branchless C++ code fell apart due to the compiler being really good at optimisations. One such optimisation is branchless code itself: cmov is a conditional move, not a jump. This illustrates why it's important to actually see what the compiled code looks like. Still, all else being equal, code without hard-to-predict branches tends to be faster at the assembly level, and there will be many times where the compiler can't optimise the code (like when you have volatile variables all over).

What about interpreted languages?

Many interpreted languages don't have the optimisation cleverness of a compiler like GCC, and in many cases, code run by the virtual machine is murky to the outsider's eyes. Nevertheless, I work with Python at the moment and it would be interesting to see what happens once branchless programming takes over.

Using the same example in Python we have:

def max(a, b):
    if a > b:
        return a
    else:
        return b

And this is the Python bytecode disassembled into mnemonics:

  2           0 LOAD_FAST                0 (a)
              2 LOAD_FAST                1 (b)
              4 COMPARE_OP               4 (>)
              6 POP_JUMP_IF_FALSE        6 (to 12)

  3           8 LOAD_FAST                0 (a)
             10 RETURN_VALUE

  5     >>   12 LOAD_FAST                1 (b)
             14 RETURN_VALUE
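A listing like the one above can be produced with the standard library's dis module. A minimal sketch follows; the function is renamed to max_branch (my name, to avoid shadowing the builtin max), and the exact opcode names and offsets vary between CPython versions, so your output may not match the listing line by line.

```python
import dis


def max_branch(a, b):
    if a > b:
        return a
    else:
        return b


def opcode_names(func):
    """Return the opcode names of a function, in execution order."""
    return [instruction.opname for instruction in dis.get_instructions(func)]


# Human-readable listing, similar to the one above.
dis.dis(max_branch)

# The same information as a plain list, handy for counting opcodes.
print(opcode_names(max_branch))
```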

First things first, what is happening under the hood? For every bytecode instruction that is executed, the interpreter will branch out many times. The comparison operator > for example, requires a branch to check for the opcode equivalent of >, another branch to verify if the object being compared has a __gt__ method, more branches to verify if both objects being compared are valid for the comparison being performed, and many other branches until the value of the function call is actually computed and returned.

We cannot compare a Python bytecode instruction with a single machine-level instruction, because a single bytecode instruction will execute many machine-level instructions inside the interpreter. Also, some Python bytecode instructions, like calling a function, are more expensive than simpler ones, like adding two numbers.

With all the conditional compilation clutter removed from CPython, the code that evaluates each bytecode instruction in C is as follows:

PyObject* _PyEval_EvalFrameDefault(/* ... */ ) {
    // context setup
    for (;;) {
        // periodic check
        switch (opcode) {
            case TARGET(LOAD_FAST): {
                PyObject *value = GETLOCAL(oparg);
                if (value == NULL) {
                    format_exc_check_arg(/* ... */ );
                    goto error;
                }
                Py_INCREF(value);
                PUSH(value);
                FAST_DISPATCH();
            }
            case TARGET(STORE_FAST): {
                PyObject *value = POP();
                SETLOCAL(oparg, value);
                FAST_DISPATCH();
            }
            case TARGET(BINARY_MULTIPLY): {
                PyObject *right = POP();
                PyObject *left = TOP();
                PyObject *res = PyNumber_Multiply(left, right);
                Py_DECREF(left);
                Py_DECREF(right);
                SET_TOP(res);
                if (res == NULL)
                    goto error;
                DISPATCH();
            }
        /* ... */
        }
error:
        // exception unwinding
    }
    // context cleanup
}

The full implementation is here. The interesting bit is that even for a simple instruction like LOAD_FAST, we can see a branch in the top-level case statement handler.

This means that to get a rough estimation of how two functions compare, we'll need to check how many bytecode instructions there are, and how expensive those bytecode instructions are.

At the time of writing, I haven't found a handy table of Python bytecodes ordered from more-overhead to less-overhead, so we'll analyse one by one.

Our max(a, b) function above had the following instructions:

LOAD_FAST (4x): Performs an index lookup in the local variables array to load the variable. This is pretty fast.
COMPARE_OP (1x): Has a very high overhead when the comparison operator is not just checking object identity as it needs to look at what is in the dunder method for the particular comparison.
POP_JUMP_IF_FALSE (1x): Has a low overhead from the interpreter's perspective as the next position to jump to is not hard to find out by reading the bytecode.
RETURN_VALUE (2x): This just pops the stack, nice and easy.

How about the branchless version?

def max(a, b):
    return a*(a > b) + b*(b >= a)

opcodes:

  2           0 LOAD_FAST                0 (a)
              2 LOAD_FAST                0 (a)
              4 LOAD_FAST                1 (b)
              6 COMPARE_OP               4 (>)
              8 BINARY_MULTIPLY
             10 LOAD_FAST                1 (b)
             12 LOAD_FAST                1 (b)
             14 LOAD_FAST                0 (a)
             16 COMPARE_OP               5 (>=)
             18 BINARY_MULTIPLY
             20 BINARY_ADD
             22 RETURN_VALUE

We already can tell that this will not be light on the interpreter due to having double COMPARE_OP instructions. The other differences here are:

BINARY_MULTIPLY: Surprisingly this has a considerable amount of overhead. The interpreter needs to figure out the types being multiplied and find their underlying multiply function before they can actually be multiplied. So a "binary multiply" does not mean the interpreter will just process a C * between the two operands.
BINARY_ADD: Very similar to the above. Curiously enough, it seems like someone tried to optimise int addition but failed, judging by this comment in the CPython source:

/* NOTE(haypo): Please don't try to micro-optimize int+int on
   CPython using bytecode, it is simply worthless.
   See http://bugs.python.org/issue21955 and
   http://bugs.python.org/issue10044 for the discussion. In short,
   no patch shown any impact on a realistic benchmark, only a minor
   speedup on microbenchmarks. */

In conclusion, this kind of branchless optimisation does not quite work with Python. However, due to time constraints, I haven't really analysed other branchless techniques, like bit masking, that are superior in many situations.
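If you want to check this on your own machine, a small timeit harness is enough. The sketch below is not a rigorous benchmark: the function names are mine, the bit-masking variant is just one possible formulation that only works for ints, and the numbers will change with the CPython version and hardware.

```python
import timeit


def max_branch(a, b):
    if a > b:
        return a
    else:
        return b


def max_branchless(a, b):
    return a * (a > b) + b * (b >= a)


def max_bitmask(a, b):
    # -(a > b) is -1 (all bits set) when a > b, and 0 otherwise,
    # so the mask either keeps a ^ b or wipes it out. Ints only.
    return b ^ ((a ^ b) & -(a > b))


for func in (max_branch, max_branchless, max_bitmask):
    seconds = timeit.timeit(lambda: func(3, 7), number=100_000)
    print(f"{func.__name__}: {seconds:.4f}s for 100,000 calls")
```

All three return the same result for ints; only the printed timings tell you which one your interpreter prefers.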

A Critique of SOLID

2023-04-15T00:00:00Z

A Critique of SOLID

Created: 2023-04-15
Updated: 2024-07-06

Introduction

SOLID is an acronym for five design principles promoted by Robert C. Martin (also known as Uncle Bob), aimed at making Object Oriented Programming designs easier to understand, maintain, and adapt.

The original paper (archived) introducing the principles in 2000 is a quick read worth checking, even if only for historical context.

Before the paper starts to talk about SOLID, it mentions the 4 symptoms of "rotting software" (a very popular term between 1998-2006 according to google ngram). Those 4 symptoms are: rigidity, fragility, immobility and viscosity.

It is important to know what those terms mean, because they are the reason that SOLID principles exist in the first place. Here is a brief summary:

Rigidity: How difficult it is to change the code.
Fragility: How easy it is to break the code.
Immobility: How hard it is to reuse existing code.
Viscosity: How hard it is to preserve the existing design of code when developing new changes.

The general expectation set by Robert C. Martin is that if you follow the SOLID principles your code will experience less software rot.

SOLID

The 5 principles of object oriented class design (as the paper calls them) are:

Single responsibility principle.
Open-closed principle.
Liskov substitution principle.
Interface segregation principle.
Dependency inversion principle.

My plan is to analyse each of them critically, understanding their weaknesses and providing evidence to counter their adoption.

The Open Closed Principle (OCP)

According to Martin this is the most important principle, and we will start with it. Here is its definition:

A module should be open for extension but closed for modification.

This sounds simple and reasonable. Perfect code that addresses a particular problem only needs to be written once. To address further problems, this perfect code doesn't need to change, only to be extended.

When developers keep this in mind, they try to produce perfect code that can be extended whilst keeping its original elegance and efficiency. If that materialises, the developer can be said to be following the OCP.

You can probably see where this is going by my wording ("perfect code", whatever that actually means). But let's not diverge from the theme just yet. We will go through an example of code provided by Martin that violates the OCP:

struct Modem {
    enum Type {hayes, courrier, ernie} type;
};

struct Hayes {
    Modem::Type type;
    // Hayes related stuff
};

struct Courrier {
    Modem::Type type;
    // Courrier related stuff
};

struct Ernie
{
    Modem::Type type;
    // Ernie related stuff
};

void LogOn(Modem& m, string& pno, string& user, string& pw) {
    if (m.type == Modem::hayes)
        DialHayes((Hayes&)m, pno);
    else if (m.type == Modem::courrier)
        DialCourrier((Courrier&)m, pno);
    else if (m.type == Modem::ernie)
        DialErnie((Ernie&)m, pno);
    // ...you get the idea
}

The LogOn function violates Martin's OCP because its inner code needs to change every time a new modem is added. Moreover, every modem type depends on the struct Modem. Therefore if we need to add a new type of modem to that struct, all the existing modems need to be recompiled.

So how do we make this code better? According to Martin:

Abstraction is the key to the OCP

There are a few possible abstraction techniques that conform to OCP. Let's follow the first one Martin provides, Dynamic Polymorphism.

Dynamic Polymorphism

In Object Oriented Programming this means we will have an abstract class with abstract methods (virtual functions), and concrete child classes implementing the actual code for each method.

Here the word "Dynamic" means that the form (concrete class implementation) is found during run time. Putting it all together and rewriting the code above, we end up with something like the example provided in Martin's paper:

class Modem {
public:
    virtual void Dial(const string& pno) = 0;
    // other virtual methods here, you get the idea.
};

void LogOn(Modem& m, string& pno, string& user, string& pw) {
    m.Dial(pno);
    // you get the idea.
}

Now the class Modem is closed for modification when we need to add new types of modems. To use the LogOn function, you need code like this:

class Hayes : public Modem {
public:
    void Dial(const string& pno) override {
        // do something...
    }
};


int main() {
    Hayes hayes;
    string pno = "1";
    string user = "user";
    string pw = "secret";
    LogOn(hayes, pno, user, pw);
}

Now that we see the whole picture we can tell that the function LogOn calls the method Dial on the child instance. This child instance is only available at runtime. Note that Martin omits this initialisation code in his original paper. He also does not discuss the downsides of this approach.

However, I don't think we should toss this example away too quickly.

When types are only resolved at runtime, our code tends to become slower. But why? For each class that declares or inherits virtual functions, the compiler creates a vtable. A vtable is essentially a table of function pointers. When an object is created at runtime, a pointer to its vtable is added to its memory layout, and when a virtual function is called on this object, the code looks up the appropriate function pointer in the object's vtable and then calls the function through that pointer.

This all means that by using Dynamic Polymorphism our code requires an additional level of indirection for each virtual function call, and every object carries an extra vtable pointer, which increases the program's memory usage too.

Furthermore, the given example is a simplistic one. In cases where dynamic polymorphism is overused, the code may have convoluted inheritance hierarchies. Such complexity impairs compiler optimisation (inlining, for instance), as virtual function calls often can't be resolved until runtime.

You might be thinking that the code above looks clean and that there's no reason to not do it. After all, reduced performance and more memory usage is a fair price to pay for cleaner code, right? Well. What if we didn't have to pay this price and still have "clean code"?

We will get there, but to keep things fair we need to go through another abstraction example given by Martin.

Static Polymorphism

What if there was a way to have something similar to Dynamic Polymorphism but without the runtime overhead, with more compiler optimisation, and more control over object types? Here is where static polymorphism comes in handy.

Static here means that the child type will be defined at compile time. But how? The answer is by using generic programming templates.

template <typename MODEM>
void LogOn(MODEM& m, string& pno, string& user, string& pw) {
    m.Dial(pno);
    // you get the idea.
}

This looks alright, but is this a real improvement? What is the trade off?

Each time a template is instantiated with a new type, a new copy of the code is created. This is how compilers make generic programming type-safe at compile time. The more instantiations your code has, the larger your executable becomes, and the longer compilation takes.

This all means that we are buying runtime performance at the cost of compile time and executable size. It might be OK to do so in certain applications, but what if we didn't have to?

Addressing the open-closed principle

Martin's original argument stressed the following:

The dependency between the Modem struct and its implementation structs is bad.

My problem with this assertion is that it sounds like a straw man fallacy. You absolutely don't need to create one struct for each type of modem. Let's fix the example:

enum ModemType { HAYES, COURRIER, ERNIE };

struct Modem {
    ModemType type;
    // other attributes, you get the idea.
};

void LogOn(Modem& m, string& pno) {
    switch (m.type) {
        case HAYES:
            // do something with hayes
            break;
        case COURRIER:
            // do something with courrier
            break;
        // ...you get the idea
    }
}

Now the compiler knows all the code paths the code goes through at compile time. Code can easily be optimised.
No layers of indirection and thus no performance costs at runtime.
No memory overhead.
No executable size overhead.
Fewer lines of code.

Is the code above objectively bad? Is it difficult to read? I could argue that any CS student could make sense of this code in a very short amount of time.

The idea behind polymorphism is that it makes code flexible and easier to read. However, the flexibility of using virtual functions can be costly as we have seen. If you don't know whether you need that flexibility yet, you will be imagining future use cases that may use your code. If this flexibility ends up not being needed, the programmer has fundamentally over-scoped their own code and caused unnecessary memory and CPU degradation to their program with no added benefit.

Here is where the trade-off lies. Imagine that you need to add a new type of modem. Using the switch above you'd have to change the LogOn function and recompile it. You will also need to recompile everything that depends on the LogOn function, and that can be a lot. If you were using polymorphism, you would just need to add a new type (potentially in a new file), and you would only need to compile that single file plus the place where it is instantiated (and everything that depends on that).

But what if you need to add new functionality to the abstract class? In the polymorphism case you'd need to add a new function to every single type, and that would induce a lot of recompilation across the project for a fairly small piece of code.

In contrast, in the switch case, you could add a new function (potentially in a new file) and you would only need to compile that one single file.
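The two directions of change can be sketched side by side. This is a Python rendering for brevity rather than Martin's code: ModemType mirrors the enum above, while dial, hang_up and the class bodies are hypothetical. Python has no recompilation step, so read "touched" as "needs editing".

```python
from enum import Enum


class ModemType(Enum):
    HAYES = "hayes"
    COURRIER = "courrier"


# Switch style: one function per operation, branching on the type.
def dial(modem_type: ModemType, pno: str) -> str:
    if modem_type is ModemType.HAYES:
        return f"hayes dialing {pno}"
    if modem_type is ModemType.COURRIER:
        return f"courrier dialing {pno}"
    raise ValueError(modem_type)


# Adding an operation is a brand-new function; dial() is not touched.
def hang_up(modem_type: ModemType) -> str:
    if modem_type is ModemType.HAYES:
        return "hayes hanging up"
    if modem_type is ModemType.COURRIER:
        return "courrier hanging up"
    raise ValueError(modem_type)


# Polymorphic style: one class per type, one method per operation.
class Modem:
    def dial(self, pno: str) -> str:
        raise NotImplementedError


class Hayes(Modem):
    def dial(self, pno: str) -> str:
        return f"hayes dialing {pno}"


# Adding a type is a brand-new class; Modem and Hayes are not touched.
class Ernie(Modem):
    def dial(self, pno: str) -> str:
        return f"ernie dialing {pno}"


print(hang_up(ModemType.COURRIER))
print(Ernie().dial("1"))
```

Adding an ERNIE member to the switch style would mean editing both dial and hang_up, and adding a hang_up method to the polymorphic style would mean editing every subclass. Each layout makes one kind of change cheap and the other expensive.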

Someone might argue that a potential downside from the switch approach is that when a new type of Modem is added, you need to track all the functions that have a switch (m.type) to change their code, and this can induce human error.

I would tend to agree. However, compiler warnings will let you know of every switch that is missing a case. For example:

#include <stdio.h>

enum ModemType { HAYES, COURRIER, ERNIE };

struct Modem {
    enum ModemType type;
};

void LogOn(struct Modem* m) {
  switch (m->type) {
    // Note how we are only handling HAYES and
    // forgot to handle COURRIER and ERNIE.
    case HAYES:
      printf("LogOn HAYES");
      break;
  }
}

int main() {
  struct Modem m = {HAYES};
  LogOn(&m);
  return 0;
}

If I try to compile this code with:

gcc -Wall <file_name> -o /tmp/a.o

I get the following warnings:

/tmp/main.c: In function ‘LogOn’:
/tmp/main.c:10:3: warning: enumeration value ‘COURRIER’ not handled in switch [-Wswitch]
   10 |   switch (m->type) {
      |   ^~~~~~
/tmp/main.c:10:3: warning: enumeration value ‘ERNIE’ not handled in switch [-Wswitch]

In other words, the compiler catches this mistake for us.

In conclusion, this design choice will determine what will be easier and what will be harder for your program. Ask yourself the question: "Do I want it to be easier to add more functionality to my program, or do I prioritise adding more types?". The answer will dictate what is best for your situation.

Polymorphism isn't a silver bullet that will always make your code OCP compliant. In fact, the more functional your code looks, the greater the penalty of using polymorphism.

In my professional experience, I spend much more time adding new functionality to existing software than I spend adding new types. So for me, it would be inadequate to prefer an architecture that focuses on types.

The Liskov Substitution Principle (LSP)

This principle states that an object (such as an abstract class) may be replaced by a sub-object (such as a child class) without breaking the program.

This means that if we have a function foo(Parent bar), we should also be able to call foo as foo(Child bar) without altering the correctness of the program.

Martin uses an example of a parent class called Ellipse and a child class called Circle. As every circle is an ellipse with a very particular configuration, this sounds about right. However, Martin only uses this example to stress that it is bad inheritance practice.

void f(Ellipse& e) {
    Point a(-1,0);
    Point b(1,0);
    e.SetFoci(a,b);
    e.SetMajorAxis(3);
    assert(e.GetFocusA() == a);
    assert(e.GetFocusB() == b);
    assert(e.GetMajorAxis() == 3);
}

The function f above, for example, can't receive a Circle instead of an Ellipse. The reason is that the method SetFoci will alter the circle and turn it into an ellipse. This can become a subtle bug in the application code.

A safe option is for the Circle::SetFoci method to add an extra validation, asserting that a == b. This violates LSP. The child object now has an extra restriction that the parent object doesn't. Martin concludes that:

derived methods should expect no more and provide no less.
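The violation is easy to reproduce. Below is a minimal Python sketch of the same idea, not Martin's code: the class layout is an assumption, and this particular Circle keeps itself circular by collapsing both foci into the first point, which is just one of several ways a Circle could protect its invariant.

```python
class Ellipse:
    def __init__(self):
        self.focus_a = (0, 0)
        self.focus_b = (0, 0)
        self.major_axis = 1

    def set_foci(self, a, b):
        self.focus_a, self.focus_b = a, b

    def set_major_axis(self, length):
        self.major_axis = length


class Circle(Ellipse):
    def set_foci(self, a, b):
        # Keep the circle a circle: both foci collapse into the first point.
        self.focus_a = self.focus_b = a


def f(e: Ellipse) -> None:
    a, b = (-1, 0), (1, 0)
    e.set_foci(a, b)
    e.set_major_axis(3)
    assert e.focus_a == a
    assert e.focus_b == b
    assert e.major_axis == 3


f(Ellipse())  # passes silently

try:
    f(Circle())
except AssertionError:
    print("Circle broke the contract that f relies on")
```

The caller did nothing wrong here. It relied on what Ellipse promises, and the child class provided less.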

I don't have much to critique about this principle itself. To be honest it sounds alright to me. However, this is one more thing to remember when you're using polymorphism, which counts as another negative against achieving the OCP through abstraction, as we discussed above.

Apart from performance and memory degradation, the programmer also has to worry about loose/tight contracts between the parent and the child. This means that the functionality may not work that well if the type implementation is faulty.

Martin himself comments that LSP violation can be costly. If the interface is being used in many different places, the cost of repairing this violation might be too much to take. A possible solution, as stated by Martin, is to provide an if/else statement to make sure the Ellipse is indeed an Ellipse.

Another problem of this type of polymorphism is that the Circle object is way simpler than an Ellipse. This means that the Circle class will inherit many methods that it doesn't need to use, generating method spam. This violates another principle of SOLID, the Interface Segregation Principle that we will look into later on.

The Dependency Inversion Principle (DIP)

Depend upon Abstractions. Do not depend upon concretions.

In essence, this principle means that code in high level modules should not depend on code in low level modules. High level modules are usually top abstractions that express the policies of the application, whereas low level modules contain the implementation details. In other words, the abstraction should not worry about the implementation.

By inverting the dependency, i.e., making the low level modules depend on the high ones and not the opposite, a developer will be DIP compliant.

You might be noticing a trend now. Most of these SOLID rules are greatly related to polymorphism. After all, they were created for OOP. As I feel like I have addressed the main trade-off in that approach, I could easily end up repeating myself here. Let's proceed nonetheless :')

Inverting dependencies in a codebase must be seen as a trade-off. While your abstraction can now be changed without having to worry about the implementation details, you can't change the implementation details without having to worry about the abstraction. This means that if the interface changes often, it will be harder to manage the concrete implementations.

More over, classes that could simply be a concrete implementation alone, with no dependency on a high-level interface, now may have an artificial interface so that this principle isn't violated. This usually happens because "you never know whether another concrete implementation that needs to use the interface might come about". One regular example is an interface ShipmentPolicy which has only one concrete implementation, often called ShipmentPolicyImpl. One could say that this is a code redundancy.

The Interface Segregation Principle (ISP)

ISP states that no code should depend on methods it does not use. This implies having smaller interfaces so that clients that depend on them only need to know a few relevant methods.
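A small sketch of the idea in Python, assuming two hypothetical interfaces (Readable and Writable) and a hypothetical DictStore that satisfies both. The client function depends only on the one method it actually uses.

```python
from typing import Protocol


class Readable(Protocol):
    def read(self, key: str) -> str: ...


class Writable(Protocol):
    def write(self, key: str, value: str) -> None: ...


class DictStore:
    """Concrete store that happens to satisfy both small interfaces."""

    def __init__(self) -> None:
        self._data = {}

    def read(self, key: str) -> str:
        return self._data[key]

    def write(self, key: str, value: str) -> None:
        self._data[key] = value


def render_greeting(source: Readable) -> str:
    # This client only reads, so it only depends on Readable.
    return f"Hello, {source.read('name')}!"
```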

This idea sounds reasonable, and it's a way of controlling inheritance method spam when inheriting from bloated interfaces. But I think this technique is more of a refactoring tool than a principle.

Treating ISP as a principle can add unnecessary complexity. You should design your code to solve the problem at hand rather than worrying about whether your interface will become too bloated in future when more use cases are added. As you don't know how your codebase will progress in future, you shouldn't make compromises from early on. If an interface has grown too much, and you now have a legitimate reason to split it into smaller ones, then go and refactor it.

As with all the other principles we have discussed so far, take ISP as a trade-off instead of a principle.

Here are a few quotes from the Code Complete book that are relevant for the matter:

If the derived class isn't going to adhere completely to the same interface contract defined by the base class, inheritance is not the right implementation.

Be suspicious of base classes of which there is only one derived class.

Single Responsibility Principle (SRP)

This principle wasn't included in the original publication I linked above, and more details can be found in this blog post from clean coder.

This principle builds upon the "Separation of Concerns" term that was popularised by Dijkstra in his famous article "On the role of scientific thought" (source).

The principle can be summarised as:

The Single Responsibility Principle (SRP) states that each software module should have one and only one reason to change.

The definition lacks specification and my major criticism here is the specification itself. What is a reasonable "reason to change"?

Martin gives more information in the blog post linked above, saying:

Imagine you took your car to a mechanic in order to fix a broken electric window. He calls you the next day saying it’s all fixed. When you pick up your car, you find the window works fine; but the car won’t start. It’s not likely you will return to that mechanic because he’s clearly an idiot. ... That’s how customers and managers feel when we break things they care about that they did not ask us to change. ... Another wording for the Single Responsibility Principle is: Gather together the things that change for the same reasons. Separate those things that change for different reasons. ... However, as you think about this principle, remember that the reasons for change are people. It is people who request changes. And you don’t want to confuse those people, or yourself, by mixing together the code that many different people care about for different reasons.

I agree that mixing code from different teams with different responsibilities isn't ideal. I would call that unnecessary coupling.

It is obviously bad, for example, for a change in a back-end billing engine of a bank to affect its front-end application and display data in a different format.
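To make that concrete, here is a minimal Python sketch that keeps the two concerns apart. The function names, the interest formula, and the GBP default are all assumptions made up for this example; the only point is that the billing code returns raw amounts and a separate function decides how they are displayed.

```python
from decimal import Decimal


def monthly_interest(balance: Decimal, annual_rate: Decimal) -> Decimal:
    """Billing concern: returns a raw amount, with no presentation logic."""
    return balance * annual_rate / Decimal(12)


def format_amount(amount: Decimal, currency: str = "GBP") -> str:
    """Display concern: the only place that decides how money is shown."""
    return f"{amount.quantize(Decimal('0.01'))} {currency}"
```

With this split, a change to how interest is calculated has no reason to alter the format shown to the user, and vice versa.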

Although I think that the definition isn't ideal, the principle here does sound like the most reasonable in the list.

Conclusions

In my opinion, those shouldn't be called "principles". The word principle as it is defined below should be reserved for terms that are really hard to debunk:

Principle: a fundamental truth or proposition that serves as the foundation for a system of belief or behaviour or for a chain of reasoning. "the basic principles of justice"

Most of what we have seen here fits within the saying "different horses for different courses". I do appreciate that when those principles were written OOP was all the rage, with many people investing resources in it and many books being written about it. So I can accept that they were the single truth at the time.

However, with many decades having passed since the inception of those principles, many other things have improved in the industry. I feel like we have moved forward as a whole.

OOP is no longer considered the only way of developing software, though it is still very popular. Polymorphism, inheritance, and most importantly multi-level inheritance are frowned upon. More and more people have come to appreciate composition over inheritance.

The classic textbook example of inheritance, Shape->Ellipse->Circle, is very hard to derive in the real world without forcing it. Inheritance, and thus polymorphism, has become a way of getting generic functionality from parent classes instead of sharing identities between classes of the same base implementation. As a result, much tangled code has been created so that unrelated classes could get the same shared behaviour. I feel like a lot of people nowadays have scars to prove that.

Nonetheless, I feel positive that Martin has created these principles even though I don't agree with them fully. It is easy to look back on the past and point fingers about decisions that don't apply to the present. I think that overall the popularity of SOLID and the outcome of having more people thinking about designs and their own set of principles is a positive thing.

Linux Rice

2020-08-30T00:00:00Z

Created: 2020-08-30
Updated: 2024-07-06

Introduction

When someone is "ricing" their unix system, they are making functional and visual customisations to their desktop. These changes could be anything from changing the colour of a status bar to completely restructuring their computer environment.

Why?

Productivity: You can customise your applications and keyboard shortcuts to suit your workflow.
Performance: You are in control of what gets installed on your system and don't have to worry about unknown apps running in the background.
Privacy: It is your system, and the defaults in some distributions can contain software that can spy on your behaviour, like Canonical has done in the past.
Visual Satisfaction: Whatever colour scheme you like.
Because it is fun.

What does it look like?

$ neofetch
                   -`                    x@x
                  .o+`                   ---
                 `ooo/                   OS: Arch Linux x86_64
                `+oooo:                  Host: 20Q5S01400 ThinkPad L490
               `+oooooo:                 Kernel: 6.6.39-1-lts
               -+oooooo+:                Uptime: 1 hour, 32 mins
             `/:-:++oooo+:               Packages: 519 (pacman)
            `/++++/+++++++:              Shell: bash 5.2.26
           `/++++++++++++++:             Resolution: 1920x1080
          `/+++ooooooooooooo/`           WM: i3
         ./ooosssso++osssssso+`          Theme: Adwaita [GTK2/3]
        .oossssso-````/ossssss+`         Icons: Adwaita [GTK2/3]
       -osssssso.      :ssssssso.        Terminal: alacritty
      :osssssss/        osssso+++.       Terminal Font: LiterationMono Nerd Font
     /ossssssss/        +ssssooo/-       CPU: Intel i7-8565U (8) @ 4.600GHz
   `/ossssso+/:-        -:/+osssso+-     GPU: Intel WhiskeyLake-U GT2 [UHD Graphics 620]
  `+sso+:-`                 `.-/+oso:    Memory: 1571MiB / 7134MiB
 `++:.                           `-/+/
 .`                                 `/

du -h /
# 5.6G

Workflow Tools

Most of my tools revolve around rofi which is an application launcher.

For example, my "TODO" list is a script that lets me add, read, and remove entries from a list via rofi.

You can configure rofi to be able to run any script you like. Some of my scripts include: two-factor authentication, vpn connections, getting a wayback-machine link to a website, compiling my website and rss feed, etc.

Software used

Neovim

I use Neovim for coding (work and personal projects), for taking notes, and journaling.

This website, for example, is entirely written from Neovim. The website is built using markdown files that are parsed through a C program capable of converting the .md files into .html files.

After having tried so many auto-generators and converters, I decided to build myself a simple and fast one. It was also such a fun C project.

This is what editing this website feels like along with a snippet of the C code used to compile the .md files into .html:

Arch Linux

Arch has been my daily driver since 2019. Before that, I used Linux Mint and Ubuntu at work, back when I didn't care about and didn't know the differences between all the flavours of Linux. I have also had to use macOS for a job where the company mandated that developers use Apple machines.

I have tried other different flavours of Linux on virtual machines in the past, but I decided to stick with Arch Linux given how simple it is to customise.

That means I can use my simple bash script to download and auto-configure my system without manual intervention. I can also sync my environment between my work laptop and my personal laptop with one command. Apart from the hardware, all my machines are identical from a user's experience.

A basic understanding of Linux is necessary to use this distro. You don't need much more than being comfortable around the terminal these days. This is because everything you need to learn is already in the Arch wiki, and Arch now has assisted installation scripts via archinstall.

Some other perks of using Arch are:

Rolling releases
Rich user repository (AUR)
The fantastic Arch wiki
No corporations behind it (community support only)
Helpful community

i3-gaps

i3-gaps is a fork of i3wm (a tiling window manager) for X11. Instead of having stacked windows that overlap (like in Microsoft Windows or macOS), windows are organised side-by-side by default, with gaps between them. The benefits are:

Able to customise keyboard shortcuts to navigate through windows.
Easy to set up.

polybar

Polybar was the easiest and most user-friendly status bar I could find. With a lot of pre-configured setups and out-of-the-box integrations, getting up to speed was very simple.

rofi

A very simple and configurable application/script launcher.

pywal

For setting colours and themes.

More about Ricing

When it comes to finding inspiration for ricing in Linux, a good place to look is /r/unixporn. Most of my setup came from picking apart different rices that users have shared in that community. It is also a good place to visit once in a while to stay up-to-date with what the rest of the community is using and trying.
