What are N+1 queries?
N+1 queries are when you unwittingly query the database in a for-loop and should be avoided so you don’t DOS your database.
To illustrate the N+1 issue, let’s start with a classic blog site model.
class Author(models.Model):
name = models.TextField()
class Post(models.Model):
name = models.CharField(max_length=100)
content = models.TextField()
author = models.ForeignKey(Author, on_delete=models.CASCADE)
Note the one-to-many relationship. One Author
has many Posts
.
The problem occurs when we access data from a related model in a loop, where each time we access the related field we make another call to the database.
posts = Post.objects.all() # No query.
for post in posts: # Query database for posts
print(post.author.name) # Query author
Here we make one query to get all the posts and N queries to get the author.name. Hence N+1 queries.
When does this happen?
Now you might say, well duh, this is obvious and who would make database queries in a for-loop?
Django. Django would.
To be fair, it’s not really Django’s fault. It’s our fault for not thinking about how the database is queried when designing our views. Django makes database queries in a for-loop when your view uses a related field on a queryset. This isn’t the only time but this happens to me whenever I have a nested serializer, or serializer with a related field, when using the Django Rest Framework.
Consider this example where we’re nesting a list of Posts
inside the Author
JSON in our API response.
class PostSerializer(serializers.Serializer):
name = serializers.CharField()
content = serializer.CharField()
class AuthorSerializer(serializers.Serializer):
name = serializers.CharField()
posts = PostSerializer(many=True, read_only=True)
Given a Rest-style api layout, running GET /api/v1/authors
using these serializers would make N+1 queries for every author in the database. Oops.
The fix
Django Rest Framework makes a note of the fact that it’s the programmer’s responsibility to not make this mistake as fixing it in Django would be “too much magic”. Fair enough. The docs also provides two solutions to the problem:
select_related
makes a query with joins to avoid N+1 queries but doesn’t work for many-to-many relationships.
Using select_related
, we can fix our previous code.
- posts = Post.objects.all() # No query.
+ posts = Post.objects.select_related("author").all() # No query
for post in posts: # evaluate posts. Make a single query using joins.
print(post.author.name) # no more calls to db
Now when posts
is accessed and the database is queried, Django makes a query with joins instead of multiple queries. Every time post.author.name
is accessed, the cached queryset is used rather than making another call to the database.
prefetch_related
works in a similar way but queries each table and does the joins in python. In general, it’s better do joins in the database and not in python but prefetch_selected
also works for many-to-many and GenericRelations
and select_related
does not.
Summary
- N+1 queries are when we iteratively make calls to the database.
- N+1 queries occur because we’re accessing columns in a related table but Django’s ORM doesn’t automatically sort this for us.
- Remember to use
select_related
orprefetch_selected
when accessing many instances of related models to mitigate the issue.