Does django core pagination retrieve all data first?

Does django core pagination retrieve all data first? - python

I am using django 1.5. I need to split pages to data. I read docs here. I am not sure about whether it retrieves all data first or not. Since I have a large table, it should be better to using something like 'limit'. Thanks.
EDIT
I am using queryset in ModelManager.
example:
class KeywordManager(models.Manager):
def currentkeyword(self, kw, bd, ed):
wholeres = super(KeywordManager, self).get_query_set() \
.values("sc", "begindate", "enddate") \
.filter(keyword=kw, begindate__gte=bd, enddate__lte=ed) \
.order_by('enddate')
return wholeres

First, a queryset is a lazy object, and django will retrieve the data as soon you request it, but if you dont, django won't hit the DB. If you use over a queryset any list methods as len(), you will evaluate all the queryset and forcing django to retrieve all the data.
If you pass a queryset to the Paginator, it would not retrieve all the data, because, as docs says, if you pass a queryset, it will use .count() methods avoiding converting the queryset into a list and the use of len() method.

If your data is not coming from the database, then yes - Paginator will have to load all the information first in order to determine how to "split" it.
If you're not and you're simply interacting with the database with Django's auto-generated SQL, then the Paginator performs a query to determine the number of items in the database (i.e. an SQL COUNT()) and uses the value you supplied to determine how many pages to generate. Example: count() returns 43, and you want pages of 10 results - the number of pages generated is equivalent to: 43 % 10 + 1 = 5

Related

Django large queryset return as response efficiently

I have a model in django called "Sample"
I want to query and return a large number of rows ~ 100k based on filters.
However, it's taking up to 4-5 seconds to return the response and I was wondering whether I could make it faster.
(Need to improve converting from queryset to df to response json. Not querying from DB)
My current code looks like this:
#api_view(['POST'])
def retrieve_signal_asset_weight_ts_by_signal(request):
#code to get item.id here based on request
qs = Sample.objects.filter(
data_date__range=[start_date, end_date],
item__id = item.id).values(*columns_required)
df = pd.DataFrame(list(qs), columns=columns_required)
response = df .to_json(orient='records')
return Response(response, status=status.HTTP_200_OK)
Based on multiple test cases -- I've noticed that the slow part isn't actually getting the data from DB, it's converting it to a DataFrame and then returning as JSON.
It's actually taking about 2 seconds just for this part df = pd.DataFrame(list(qs), columns=columns_required). Im looking for a faster way to convert queryset to a json which I can send as part of my "response" object!
Based on this link I've tried other methods including django-pandas and using .values_list() but they seem to be slower than this, and I noticed many of the answers are quite old so I was wondering whether Django 3 has anything to make it faster.
Thanks
Django version : 3.2.6

With your code, you can't write:
(Need to improve converting from queryset to df to response json. Not querying from DB)
It's actually taking about 2 seconds just for this part
df = pd.DataFrame(list(qs), columns=columns_required)
Get data from database is a lazy operation, so the query will be executed only when data is needed list(qs). According to the documentation:
QuerySets are lazy – the act of creating a QuerySet doesn’t involve any database activity. You can stack filters together all day long, and Django won’t actually run the query until the QuerySet is evaluated. Take a look at this example:
Try to separate operation:
records = list(qs)
df = pd.DataFrame(records, columns=columns_required))
Now, you can determine which operation is time-consuming.
Maybe, you look at StreamingHttpResponse

Django: Admin - order results based on regular expression

I have a problem where I want to order model rows in Django admin based on a number within a string and ignore the letters - like XX0345XX, X0346XXX, XXX0347XX. I'm thinking that I should probably use a regular expression for that. I have an SQL query (PostgreSQL) that returns what I want:
select * from mytable order by substring(my_field, '\d+')::int DESC
But I'm having trouble applying it to get the same result in the Django admin get_queryset().
I've tried doing something like:
def get_queryset():
return Model.objects.raw("select * from mytable order by substring(my_field, '\d+')::int DESC")
but the problem is that I'm not returning the right type this way. Model.objects.raw('...') returns RawQuerySet, but get_queryset() should return QuerySet instance not RawQuerySet one, so I can't do it this way.
Any suggestions on how to solve this problem? Thanks!

You can use the .extra() method of to convert from a rawqueryset to a queryset, see here
This example is taken from, here
class CustomManager(manager.Manager):
def get_queryset():
qs = self.get_queryset()
sql = "myapp_model_b.id IN (SELECT UNNEST(myapp_model_a.pk_values) FROM myapp_model_a WHERE myapp_model_a.id='%s')" % index_id
return qs.extra(where=[sql])

As pointed out by #Azy_Crw4282, you can use QuerySet.extra() to run raw SQL queries and still have the results returned as a QuerySet instance.
Here's how I managed to do a regex based "ORDER BY" on objects in admin view, based on a number within a string. I.e. order fields like XX0345XX, X0346XXX, XXX0347XX, based on the number they contain - basically get a substring that matches '\d+' regular expression.
Just override the get_queryset() function in your admin.ModelAdmin class.
Solution:
def get_queryset(self, request):
sql_query = "CAST(substring(booking_reference, '\d+') as int)"
bookings = Booking.objects.extra(select={'book_ref_digits': sql_query}).order_by('-book_ref_digits')
return bookings
As far as I understand, it adds a new temporary field, e.g. book_ref_digits in the QuerySet object and then you can do .order_by() on that.
Note: Using older version of Django 1.10.5

How to serialize and deserialize Django ORM query (not queryset)?

My use case is that I need to store queries in DB and retrieve them from time to time and evaluate. Thats needed for mailing-app where every user can subscribe to a web-site content selected by individually customized query.
Most basic solution is to store raw SQL and use it with RawQuerySet. But I wonder is there better solutions?

At first glance, it is really dangerous to hand out query building job to others, since they can do anything (even delete all your data in your database or drop entire table etc.)
Even you let them build a specific part of the query, it is still open to Sql Injection. If it is ok for all those dangers, then you may try the following.
This is and old script I used and let users set a specific part of the query. Basics are using string.Template and eval (the evil part)
Define your Model:
class SomeModel(Model):
usr = ForeingKey(User)
ct = ForeignKey(ContentType) # we will choose related DB table with this
extra_params = TextField() # store extra filtering criteria in here
Lets execute all queries belongs to a user. Say we have a User query with extra_params is_staff and 'username__iontains'
usr: somebody
ct: User
extra_params: is_staff=$stff_stat, username__icontains='$uname'
$ defines placeholders in extra_params
from string import Template
for _qry in SomeModel.objects.filter(usr='somebody'): # filter somebody's queries
cts = Template(_qry.extra_params) # take extras with Template
f_cts = cts.substitute(stff_stat=True, uname='Lennon') # sustitute placeholders with real time filtering values
# f_cts is now `is_staff=True, username__icontains='Lennon'`
qry = Template('_qry.ct.model_class().objects.filter($f_cts)') # Now, use Template again to place our extras into a django `filter` query. We also select related model in here with `_qry.ct.model_class()`
exec_qry = qry.substitute(f_cts=f_cts)
# now we have `User.objects.filter(is_staff=True, username__icontains='Lennon')
query = eval(exec_qry) # lets evaluate it!
If you have all relted imports done,then you an use Q or any other query building option in your extra_params. Also You can use other methods to form Create or Update queries.
You can read more about Template form there. But as I said. It is REALLY DANGEROUS to give a such option to other users.
Also you may need to read about Django Content Type
Update: As #GillBates mentioned, you can use a dictonary structure to create the query. In this case, you will not need Template anymore. You can use json for such data transfer (or any other if you wish). Assuming you use json to get the data from an outer source following code is a scratch that uses some variables from the upper code block.
input_data : '{"is_staff"=true, "username__icontains"="Lennon"}'
import json
_data = json.loads(input_data)
result_set = _qry.ct.model_class().objects.filter(**_data)

According to your answer,
User passes some content-specific parameters into a form, then view function, that recieves POST, constructs query
one option is to store parameters (pickle'd or json'ed, or in a model) and reconstruct query with regular django means. This is somewhat more robust solution, since it can handle some datastructure changes.

You could create a new model user_option and store the selections in this table.
From your question, it's hard to determine whether it is a better solution, but it would make your user's choices more explicit in your data structure.

In django, is there a way to directly annotate a query with a related object in single query?

Consider this query:
query = Novel.objects.< ...some filtering... >.annotate(
latest_chapter_id=Max("volume__chapter__id")
)
Actually what I need is to annotate each Novel with its latest Chapter object, so after this query, I have to execute another query to select actual objects by annotated IDs. IMO this is ugly. Is there a way to combine them into a single query?

Yes, it's possible.
To get a queryset containing all Chapters which are the last in their Novels, simply do:
from django.db.models.expressions import F
from django.db.models.aggregates import Max
Chapters.objects.annotate(last_chapter_pk=Max('novel__chapter__pk')
).filter(pk=F('last_chapter_pk'))
Tested on Django 1.7.

Possible with Django 3.2+
Make use of django.db.models.functions.JSONObject (added in Django 3.2) to combine multiple fields (in this example, I'm fetching the latest object, however it is possible to fetch any arbitrary object provided that you can get LIMIT 1) to yield your object):
MainModel.objects.annotate(
last_object=RelatedModel.objects.filter(mainmodel=OuterRef("pk"))
.order_by("-date_created")
.values(
data=JSONObject(
id="id", body="body", date_created="date_created"
)
)[:1]
)

Yes, using Subqueries, docs: https://docs.djangoproject.com/en/3.0/ref/models/expressions/#subquery-expressions
latest_chapters = Chapter.objects.filter(novel = OuterRef("pk"))\
.order_by("chapter_order")
novels_with_chapter = Novel.objects.annotate(
latest_chapter = Subquery(latest_chapters.values("chapter")[:1]))
Tested on Django 3.0
The subquery creates a select statement inside the select statement for the novels, then adds this as an annotation. This means you only hit the database once.
I also prefer this to Rune's answer as it actually annotates a Novel object.
Hope this helps, anyone who came looking like much later like I did.

No, it's not possible to combine them into a single query.
You can read the following blog post to find two workarounds.

Djapian - filtering results

I use Djapian to search for object by keywords, but I want to be able to filter results. It would be nice to use Django's QuerySet API for this, for example:
if query.strip():
results = Model.indexer.search(query).prefetch()
else:
results = Model.objects.all()
results = results.filter(somefield__lt=somevalue)
return results
But Djapian returns a ResultSet of Hit objects, not Model objects. I can of course filter the objects "by hand", in Python, but it's not realistic in case of filtering all objects (when query is empty) - I would have to retrieve the whole table from database.
Am I out of luck with using Djapian for this?

I went through its source and found that Djapian has a filter method that can be applied to its results. I have just tried the below code and it seems to be working.
My indexer is as follows:
class MarketIndexer( djapian.Indexer ):
fields = [ 'name', 'description', 'tags_string', 'state']
tags = [('state', 'state'),]
Here is how I filter results (never mind the first line that does stuff for wildcard usage):
objects = model.indexer.search(q_wc).flags(djapian.resultset.xapian.QueryParser.FLAG_WILDCARD).prefetch()
objects = objects.filter(state=1)
When executed, it now brings Markets that have their state equal to "1".

I dont know Djapian, but i am familiar with xapian. In Xapian you can filter the results with a MatchDecider.
The decision function of the match decider gets called on every document which matches the search criteria so it's not a good idea to do a database query for every document here, but you can of course access the values of the document.
For example at ubuntuusers.de we have a xapian database which contains blog posts, forum posts, planet entries, wiki entries and so on and each document in the xapian database has some additional access information stored as value. After the query, an AuthMatchDecider filters the potential documents and returns the filtered MSet which are then displayed to the user.
If the decision procedure is as simple as somefield < somevalue, you could also simply add the value of somefield to the values of the document (using the sortable_serialize function provided by xapian) and add (using OP_FILTER) an OP_VALUE_RANGE query to the original query.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Does django core pagination retrieve all data first? - python

Related

Django large queryset return as response efficiently

Django: Admin - order results based on regular expression

How to serialize and deserialize Django ORM query (not queryset)?

In django, is there a way to directly annotate a query with a related object in single query?

Djapian - filtering results

Categories

Resources