Django: does prefetch_related improve performance only for non-paginated requests?

For example, I have 1000 Users, each with lots of related objects that I use in a template.
Is it right that this:
User.objects.all()[:10]
will always perform better than this:
User.objects.all().prefetch_related('educations', 'places')[:10]

This line will do one extra query per prefetched relation (so two extra queries here) to fetch the related educations and places:
User.objects.all().prefetch_related('educations', 'places')[:10]
However, it will only fetch the related objects for the sliced queryset User.objects.all()[:10], so you don't have to worry that it will fetch the related objects for the thousands of other users in your database.
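A minimal sketch of how you could verify this yourself (model and relation names as in the question; DEBUG must be True so that connection.queries is populated):

from django.db import connection, reset_queries

reset_queries()
users = list(User.objects.prefetch_related('educations', 'places')[:10])
# One query for the ten users, plus one query per prefetched relation,
# each constrained to only those ten users' primary keys:
print(len(connection.queries))        # 3
print(connection.queries[-1]['sql'])  # ... WHERE ... IN (<the 10 user ids>)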

Related

How do you avoid SQL Injection attacks in your Django Rest APIs if using native ORM?

They say that by using the Django ORM you are already protected against most SQL injection attacks. However, I wanted to know if there are any additional measures that should or can be used to process user input. Any libraries, like bleach?
The main danger of using the Django ORM this way is that you might hand users a powerful tool to select, filter and aggregate over arbitrary fields.
Say, for example, that you make a form that enables users to select the fields to return; you could implement this as:
data = MyModel.objects.values(*request.GET.getlist('fields'))
If MyModel has a ForeignKey to the user model named owner, then a user could forge a request with owner__password as one of the fields, and thus retrieve the (hashed) passwords. While Django's default User model only stores a hashed password, the hashes are still exposed, and that can make it easier to eventually recover passwords.
But even if there is no link to a user model, users can forge requests that follow relations to sensitive data, and thus retrieve large amounts of it. The same can happen with arbitrary filtering, annotating, aggregating, etc.
What you thus should do is keep a whitelist of acceptable values, and check that the request only contains these values, for example:
acceptable = {'title', 'description', 'created_at'}
fields = [field for field in request.GET.getlist('fields') if field in acceptable]
data = MyModel.objects.values(*fields)
If you make use of packages like django-filter [readthedocs.io], you explicitly list the fields that can be filtered on and what lookups are allowed for those fields. Any other data in request.GET is ignored, which prevents filtering on arbitrary fields.
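A minimal sketch of such an explicit whitelist with django-filter (MyModel and its fields are the hypothetical ones from above):

import django_filters

class MyModelFilter(django_filters.FilterSet):
    class Meta:
        model = MyModel
        # Only these fields, with only these lookups, can be filtered on;
        # anything else in request.GET is silently ignored.
        fields = {
            'title': ['exact', 'icontains'],
            'created_at': ['gte', 'lte'],
        }

# In a view:
filterset = MyModelFilter(request.GET, queryset=MyModel.objects.all())
results = filterset.qs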

Delete entries from ManyToMany table using _raw_delete

I have a huge amount of data in my database.
I cannot use the .delete() method because the performance of the Django ORM is insufficient in my case.
The _raw_delete() method suits me, because I can do it in Python instead of using raw SQL.
But I have a problem: I have no idea how I can delete the relation (through) tables using _raw_delete(). They need to be deleted before the models themselves, because I have RESTRICT constraints in the database. Any ideas how I can achieve this?
I have found a solution.
You can operate on the link (through) model like this:
# Resolve the auto-generated through model of the ManyToManyField:
link_model = MyModel._meta.get_field('my_m2m_field').remote_field.through
qs = link_model.objects.filter(mymodel_id__in=mymodel_ids)
qs._raw_delete(qs.db)
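Once the restricting link rows are gone, the parent rows can be raw-deleted the same way (a sketch; keep in mind that _raw_delete() is a private API that bypasses signals and cascade handling):

parent_qs = MyModel.objects.filter(pk__in=mymodel_ids)
parent_qs._raw_delete(parent_qs.db)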

Django select_related does nothing

I have a comment model in Django which contains a foreign key reference to the User model, and I'm trying to look up comments (of a certain post ID) and then join/get the user data of the author of the comment. This is what I'm doing:
result = Comment.objects.filter(post=post).select_related('user').order_by('-created_at')
When I return the result, I get the exact same objects I got before I added the select_related() call. Am I missing something here?
The .select_related(…) [Django-doc] function makes a JOIN in the query, and thus uses that single query to also load the data of the related .user object. If you do not use .select_related(…), then accessing .user on a Comment results in an extra query. If you need to load the users of N Comments, that takes N+1 queries (this is the famous N+1 problem).
.select_related(…) thus makes no functional difference: you get the same objects back. It does, however, give a (significant) performance boost if you plan to access .user on all the Comments.
You thus can, for example, print the .username of the .user of each Comment with:
for comment in Comment.objects.select_related('user'):
    print(comment.user.username)
If you do this without the .select_related(…) clause, it will result in a large number of queries.
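A minimal sketch to see the difference in query counts yourself (post is the variable from the question; requires DEBUG=True so that connection.queries is populated):

from django.db import connection, reset_queries

reset_queries()
for comment in Comment.objects.filter(post=post):
    _ = comment.user.username  # one extra query per comment: N+1 in total
print(len(connection.queries))

reset_queries()
for comment in Comment.objects.filter(post=post).select_related('user'):
    _ = comment.user.username  # no extra queries: user came in via the JOIN
print(len(connection.queries))  # 1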

Django ORM: Filter results by values from list, limit answers per value?

I'm using Django 2.0 and have a Content model with a ForeignKey(User, ...). I also have a list of user IDs for which I'd like to fetch that Content, ordered by "newest first", but only up to 25 elements per user. I know I can do this:
Content.objects.filter(user_id__in=[1, 2, 3, ...]).order_by('-id')
...to fetch all the Content objects created by each of these users, and I'll get it all sorted with the newest elements first. But I'd like to fetch up to 25 elements per user (some users might create hundreds of these objects, some might create zero). There's of course the dumb way:
for user in [1, 2, 3, ...]:
    Content.objects.filter(user_id=user).order_by('-id')[:25]
This however hits the database as many times as there are IDs in the user ID list, and that gets quite high (around 100 or so per page view). Is there any way to optimize this case? (I've tried looking at select_related, but that seems to fetch as many related models as possible.)
There are plenty of ways to form a greatest-n-per-group query, but in this case you could form a union of per-user top-n queries:
contents = Content.objects.none().union(
    *[
        Content.objects.filter(user_id=uid).order_by('-id')[:25]
        for uid in user_ids
    ],
    all=True,
)
Using prefetch_related() you could then produce a queryset that fetches the users and injects an attribute holding their latest content:
from django.db import models

users = User.objects.filter(id__in=user_ids).prefetch_related(
    models.Prefetch('content_set', queryset=contents, to_attr='latest_content')
)
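Each user in the result then carries at most 25 content objects (content_set is the default related name here and may differ in your models):

for user in users:
    for content in user.latest_content:
        print(user.pk, content.pk)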
Does it actually hit the database that many times? I have not looked at the raw SQL, but according to the documentation slicing is equivalent to the LIMIT clause, and it also states: "Generally, slicing a QuerySet returns a new QuerySet – it doesn't evaluate the query."
https://docs.djangoproject.com/en/2.0/topics/db/queries/#limiting-querysets
I would be curious to see the raw SQL if you are looking at it and it does NOT do this, as I use this paradigm myself.
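For what it's worth, both observations are true at once: each sliced queryset is lazy and compiles to a single LIMIT query, but evaluating one per loop iteration still issues one query per user. A quick way to inspect this (the table name in the output is hypothetical):

qs = Content.objects.filter(user_id=1).order_by('-id')[:25]
print(qs.query)  # SELECT ... FROM "app_content" ... ORDER BY "id" DESC LIMIT 25
list(qs)         # evaluating it here is what actually hits the database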

How to limit/offset a SQLAlchemy ORM relation's result?

Say I have a User model and an Article model, where user and article are a one-to-many relation, so I can access articles like this:
user = session.query(User).filter(User.id == 1).one()
print(user.articles)
But this will list all of the user's articles. What if I want to limit the articles to 10? In Rails there is an all() method which can take limit/offset arguments. In SQLAlchemy there is also an all() method, but it takes no params. How can I achieve this?
Edit:
It seems user.articles[10:20] is valid, but the SQL didn't use 10/20 in the queries. So in fact it will load all matched data, and slice it in Python?
The solution is to use a dynamic relationship as described in the collection configuration techniques section of the SQLAlchemy documentation.
By specifying the relationship as
class User(...):
    # ...
    articles = relationship('Articles', order_by='desc(Articles.date)', lazy='dynamic')
you can then write user.articles.limit(10), which will generate and execute a query to fetch the last ten articles by the user. Or you can use the [x:y] slicing syntax if you prefer, which will automatically generate a LIMIT (and OFFSET) clause.
Performance should be reasonable unless you want to query the past ten articles for 100 or so users (in which instance at least 101 queries will be sent to the server).
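A short usage sketch of the dynamic relationship (names as in the answer; with lazy='dynamic' the slice now compiles into the SQL instead of being applied in Python):

user = session.query(User).filter(User.id == 1).one()
recent = user.articles.limit(10).all()  # ... LIMIT 10
page = user.articles[10:20]             # ... LIMIT 10 OFFSET 10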
