February 28, 2024
When a Django QuerySet is evaluated, it caches the entire result on the QuerySet object. For example:
qs = Account.objects.only("id", "created_at", "number")[:10]
assert qs._result_cache is None
result = list(qs)
# Try to evaluate again, this doesn't trigger another SQL statement because
# the query has already been evaluated.
result_again = list(qs)
# This is where the result is stored.
qs._result_cache
This triggers the following query:
SELECT "account"."id",
"account"."created_at"
"account"."number"
FROM "account"
ORDER BY "accounts_account"."created_at" DESC
LIMIT 10
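To confirm that the second list(qs) is served from the cache (a quick check with Django's test utilities, not part of the original example), you can count the SQL statements actually executed:
from django.db import connection
from django.test.utils import CaptureQueriesContext
qs = Account.objects.only("id", "created_at", "number")[:10]
with CaptureQueriesContext(connection) as ctx:
    list(qs)  # first evaluation runs the query above
    list(qs)  # answered from qs._result_cache, no extra SQL
# Only one statement was sent to the database.
assert len(ctx.captured_queries) == 1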
This is fine as long as your query does not return a very large number of objects. You can check how much memory the cached result takes:
# Use pympler because sys.getsizeof doesn't follow references, so it understates memory use!
from pympler import asizeof
ten_accounts = list(Account.objects.all()[:10])
asizeof.asizeof(ten_accounts)
# Or, for a detailed breakdown:
print(asizeof.asized(ten_accounts, detail=1).format())
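To get a feel for how the cached result grows with the number of rows, a rough sketch (same Account model; the numbers will vary with your fields) is to size progressively larger slices:
from pympler import asizeof
for n in (10, 100, 1000):
    accounts = list(Account.objects.all()[:n])
    # Memory grows roughly linearly with the number of cached rows.
    print(n, asizeof.asizeof(accounts))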
iterator() can help us reduce the memory overhead of queries, as it only pulls data from the database in chunks, as needed.
from pympler import asizeof
qs = Account.objects.iterator(chunk_size=2)
for account in qs:
    # Rows are fetched from the database in chunks of 2 as we iterate.
    print(asizeof.asizeof(qs))
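One thing to keep in mind: unlike a plain QuerySet, iterator() does not populate _result_cache, so the results can only be consumed once. A small sketch:
qs = Account.objects.iterator(chunk_size=2)
first_pass = list(qs)
second_pass = list(qs)  # the generator is exhausted and nothing was cached
assert second_pass == []
If you need the data again, you have to run the query again.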
If you use iterator() inside a transaction, the transaction will hold its locks until all of the Python code in the block has finished.
Run this in one shell:
import time
from django.db import transaction
with transaction.atomic():
    for account in Account.objects.iterator():
        time.sleep(1)
And this in another:
from django.db import connection
with connection.cursor() as cursor:
    cursor.execute(
        """
        LOCK TABLE accounts_account IN ACCESS EXCLUSIVE MODE;
        SELECT pg_sleep(1);
        """,
    )
The second shell will block until the first transaction finishes. But this is true whether or not you use iterator(): any open transaction holds its locks until it commits or rolls back.
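If holding locks for the duration of the loop is a problem, one common pattern (just a sketch; process_account is a hypothetical placeholder for the slow work) is to fetch the primary keys first and then do the work in short per-row transactions:
from django.db import transaction
account_ids = list(Account.objects.values_list("pk", flat=True))
for pk in account_ids:
    with transaction.atomic():
        account = Account.objects.get(pk=pk)
        process_account(account)  # hypothetical placeholder for the slow work
Each transaction commits quickly, so locks are released between iterations instead of being held for the entire run.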