Django: Store Q query objects for repeatable search?

In my Django based web app users can perform a search; the query consists of several dynamically constructed complex Q objects.
Depending on the user's search parameters, the search will query a variable number of columns and can also stretch over multiple models.
The user should be able to save her search to repeat it at some later point.
For that I'd like to store the Q objects (I guess) in a database table.
Is this good practice? How would you approach this?
Thanks in advance.

If you have just one or a fixed number of Q objects in the filter, you can save the arguments passed to Q as a dict.
For example, this:
Q(buy_book__entity__type=ENTITY.INTERNAL)
Is equivalent to this:
q_filter = {"buy_book__entity__type": ENTITY.INTERNAL}
Q(**q_filter)
You can save q_filter in your datastore.
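Since the dict holds plain lookups and values, one simple approach is to JSON-encode it into a text column and rebuild the Q object on demand. A minimal sketch, assuming a hypothetical SavedSearch model and JSON-serializable filter values:

import json

from django.db import models
from django.db.models import Q

class SavedSearch(models.Model):
    # Hypothetical model for persisting a user's search parameters
    owner = models.ForeignKey('auth.User', on_delete=models.CASCADE)
    params_json = models.TextField()

def save_search(owner, q_filter):
    # q_filter must contain only JSON-serializable keys/values
    return SavedSearch.objects.create(owner=owner, params_json=json.dumps(q_filter))

def rebuild_query(saved):
    # Recreate the Q object from the stored parameters
    return Q(**json.loads(saved.params_json))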

Related

django icontains to search postgres database

I currently have a Django application and a PostgreSQL database. I want to have a search bar that allows users to enter a value, and it will search some of the fields of the model for matching values. I want this to work even when the value is "". I currently have:
MyModel.objects.filter(myfield__icontains=search_query).order_by(...)
How would I make this so that it can search multiple fields of the model at the same time? What is the most efficient way to do so? Is "icontains" okay for this?
Any help would be greatly appreciated!
Doing this through regular filter queries and icontains is not advisable, as it becomes inefficient pretty quickly - you certainly don't want to be doing that on multiple large text fields.
However, PostgreSQL comes with a full text search engine which is designed for exactly this purpose, and Django provides support for it.
You can define a SearchVector to perform full text search on multiple fields at once, e.g.:
from django.contrib.postgres.search import SearchVector

MyModel.objects.annotate(
    search=SearchVector('field_1') + SearchVector('field_2'),
).filter(search='search_query')
The Django full text search documentation provides a lot of additional information, e.g. on how to perform ranking on search results.
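For instance, ranking matches by relevance might look like this (a sketch against Django's django.contrib.postgres.search API; the field names are assumptions):

from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector

vector = SearchVector('field_1') + SearchVector('field_2')
query = SearchQuery('search_query')
results = (
    MyModel.objects
    .annotate(rank=SearchRank(vector, query))
    .filter(rank__gte=0.1)  # drop rows that barely match
    .order_by('-rank')
)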
Another alternative is to use a search engine like Elasticsearch - whether this is necessary depends on how many objects you have and what kind of filtering and ranking you need to do on results.
You can use Q objects to search multiple fields. For example, given the fields that you want to search:
field0
field1
field2
the Django search code would be:
from django.db.models import Q
search_result = MyModel.objects.filter(
    Q(field0__icontains=search_query) |
    Q(field1__icontains=search_query) |
    Q(field2__icontains=search_query)
).order_by(...)
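If the list of fields isn't fixed, the same OR chain can be built dynamically; a sketch, assuming the field names are held in a list:

import operator
from functools import reduce

from django.db.models import Q

fields = ['field0', 'field1', 'field2']
# OR together one icontains lookup per field
combined = reduce(operator.or_, (Q(**{f'{f}__icontains': search_query}) for f in fields))
search_result = MyModel.objects.filter(combined)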

Django ORM: Filter results by values from list, limit answers per value?

I'm using Django 2.0 and have a Content model with a ForeignKey(User, ...). I also have a list of user IDs for which I'd like to fetch that Content, ordered by "newest first", but only up to 25 elements per user. I know I can do this:
Content.objects.filter(user_id__in=[1, 2, 3, ...]).order_by('-id')
...to fetch all the Content objects created by each of these users, plus I'll get it all sorted with newest elements first. But I'd like to fetch up to 25 elements for each of these users (some users might create hundreds of these objects, some might create zero). There's of course the dumb way:
for user in [1, 2, 3, ...]:
    Content.objects.filter(user_id=user).order_by('-id')[:25]
This however hits the database as many times as there are IDs in the user ID list, and that gets quite high (around 100 or so per page view). Is there any way to optimize this case? (I've tried looking at select_related, but that seems to fetch as many related models as possible.)
There are plenty of ways to form a greatest-n-per-group query, but in this case you could form a union of top-n queries of all users:
contents = Content.objects.none().union(
    *[Content.objects.filter(user_id=uid).order_by('-id')[:25]
      for uid in user_ids],
    all=True,
)
Using prefetch_related() you could then produce a queryset that fetches the users and injects an attribute with each user's latest content:
from django.db import models

users = User.objects.filter(id__in=user_ids).prefetch_related(
    models.Prefetch(
        'content_set',
        queryset=contents,
        to_attr='latest_content',
    )
)
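With to_attr set, each fetched user carries its own pre-limited list, so the whole page renders from two queries:

for user in users:
    # latest_content was injected by Prefetch(to_attr='latest_content')
    for content in user.latest_content:
        print(user.id, content.id)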
Does it actually hit the database that many times? I have not looked at the raw SQL, but according to the documentation slicing is equivalent to the LIMIT clause, and it also states: "Generally, slicing a QuerySet returns a new QuerySet – it doesn't evaluate the query."
https://docs.djangoproject.com/en/2.0/topics/db/queries/#limiting-querysets
I would be curious to see the raw SQL if you are looking at it and it does NOT behave this way, as I use this pattern myself.

MongoEngine: Limiting number of responses from DBRef

I have a document with around 7k DBRefs in one field to other objects. I want to limit the number of the objects coming back when I query the DBRef field but I cannot find an obvious way of doing it.
project = Project.objects.find({'id': 1})
users = project.users[:10]
On line 2 MongoEngine performs a query to retrieve ALL the users not just the first 10. What can I do to limit the query to only retrieve the first 10?
users = project.users[:10]
This slice is a client-side operation: it is applied to the users array after MongoDB has already returned all 7k DBRef values.
What can I do to limit the query to only retrieve the first 10?
You need to include a projection operation to select just the first 10 elements of the users array. In raw MongoDB syntax:
db.project.find({"id": 1}, {"users": {"$slice": 10}})
The equivalent in MongoEngine:
Project.objects(id=1).fields(slice__users=[0, 10])
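Put together, a minimal sketch (the document definitions here are assumptions, not the asker's actual models):

from mongoengine import Document, ListField, ReferenceField, StringField

class User(Document):
    name = StringField()

class Project(Document):
    users = ListField(ReferenceField(User))

# Only the first 10 entries of the users array leave the server
project = Project.objects(id=1).fields(slice__users=[0, 10]).first()
first_ten = project.users  # at most 10 references to dereference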
If I understand you correctly, you can pick and choose which fields you return, but there is no way to return just a portion of one field.

Django: storing/querying a dictionary-like data set?

I apologize if this has been asked already, or if this is answered somewhere else.
Anyways, I'm working on a project that, in short, stores image metadata and then allows the user to search said metadata (which resembles a long list of key-value pairs). This wouldn't be too big of an issue if the metadata were standardized. However, the problem is that any given image in the database can have any number of key/value pairs in its metadata, and there is no standard list of keys.
Basically, I need to find a way to store a dictionary for each model, but with arbitrary key/value pairs. And I need to be able to query them. And the organization I'm working for is planning on uploading thousands of images to this program, so it has to query reasonably fast.
I have one model in my database, an image model, with a filefield.
So, I'm torn between two options, and I could really use some help from people with more experience in choosing the best one (or any other solution that would work better):
1. Using a traditional relational database like MySQL, and creating a separate model with a ForeignKey to the image model, a key field, and a value field. Then, when I need to query the data, I'll fetch every row of that table that relates to an image and search those rows for the key/value combination I need.
2. Using something like MongoDB, with django-toolbox and its DictField to store the metadata. Then, when I need to query, I'll access the dict and search it for the key/value combination I need.
While I feel like option 1 would be much better in terms of query time, each image may have up to 40 key/value pairs of metadata, and that makes me worry about the separate "dictionary" table growing far too large if there are thousands of images.
Any advice would be much appreciated. Thanks!
What's the type of the metadata? Are both key and value strings? I'll assume that's the case.
The scale of your dataset matters. If you will have up to thousands of images and each image has up to 40 key-value pairs, then in option 1 the separate table would have at most around 400k records. That's no problem for a modern database, as long as you have reasonable hardware and correct DB settings. One thing to take care of is a composite index over the fields in the table. In the Django ORM, it would be something like:
from django.db import models

class ImageMeta(models.Model):
    image = models.ForeignKey('Image')
    key = models.CharField(max_length=XXXX)
    value = models.CharField(max_length=XXXX)

    class Meta:
        index_together = [["image", "key", "value"]]  # Django 1.5 and above
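Querying then goes through the reverse relation; a sketch of the lookup described above (the camera=Nikon pair is made up):

# Images whose metadata contains the pair key='camera', value='Nikon'
images = Image.objects.filter(
    imagemeta__key='camera',
    imagemeta__value='Nikon',
).distinct()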
In a Django project you've got 4 alternatives for this kind of problem, in no particular order:
using PostgreSQL, you can use the hstore field type, which is essentially a string-to-string dictionary stored in a single column. It's not very helpful in terms of querying, but it does its job of saving your data.
using Django-NoRel with MongoDB, you get the DictField field type, which does the same thing and can be queried just like anything else in Mongo. (option 2)
using Django-eav to create an entity-attribute-value store with your data. An elegant solution, but painfully slow queries. (option 1)
storing your data as a JSON string in a long enough TextField, with your own functions for serializing and deserializing the data, and giving up the ability to query it.
In my own experience, if you need to query over the data at all, option 2 is by far the best choice. EAV in Django, without composite keys, is painful.
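For the last option, a minimal sketch of the TextField approach (the metadata_json field name is made up):

import json

from django.db import models

class Image(models.Model):
    file = models.FileField(upload_to='images/')
    metadata_json = models.TextField(default='{}')

    @property
    def metadata(self):
        # Deserialize on access; no querying on the contents
        return json.loads(self.metadata_json)

    @metadata.setter
    def metadata(self, value):
        self.metadata_json = json.dumps(value)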

Designing a Tag table that tells how many times it's used

I am trying to design a tagging system with a model like this:
class Tag(models.Model):
    content = models.CharField(...)
    creator = models.ForeignKey(...)
    used = models.IntegerField()
It is a many-to-many relationship between tags and what's been tagged.
Every time I insert a record into the association table, Tag.used is incremented by one, and decremented by one in case of deletion.
Tag.used is maintained because I want to speed up answering the question "How many times is this tag used?".
However, this obviously slows insertion down.
Please tell me how to improve this design.
Thanks in advance.
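For reference, the bookkeeping the question describes is usually done with atomic F() updates so concurrent inserts don't lose counts; a minimal sketch, assuming a hypothetical TaggedItem association model:

from django.db.models import F

def tag_item(tag, item):
    TaggedItem.objects.create(tag=tag, item=item)  # hypothetical association model
    # Atomic increment, done in the database rather than in Python
    Tag.objects.filter(pk=tag.pk).update(used=F('used') + 1)

def untag_item(tag, item):
    TaggedItem.objects.filter(tag=tag, item=item).delete()
    Tag.objects.filter(pk=tag.pk).update(used=F('used') - 1)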
http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html
If your database supports materialized indexed views, you might want to create one for this. You can get a large performance boost for frequently run queries that aggregate data, which I think you have here.
Your view would be over a query like:
SELECT TagID, COUNT(*)
FROM YourTable
GROUP BY TagID
The aggregations can be precomputed and stored in the index to minimize expensive computations during query execution.
I don't think it's a good idea to denormalize your data like that.
A more elegant solution is to use Django aggregation to track how many times the tag has been used: http://docs.djangoproject.com/en/dev/topics/db/aggregation/
You could attach the used count to your tag object by calling something like this:
from django.db.models import Count

my_tag = Tag.objects.annotate(used=Count('post'))[0]
and then accessing it like this:
my_tag.used
assuming that you have a Post model class that has a ManyToMany field to your Tag class
You can order the Tags by the named annotated field if needed:
Tag.objects.annotate(used=Count('post')).order_by('-used')
