Djapian - filtering results - python

I use Djapian to search for object by keywords, but I want to be able to filter results. It would be nice to use Django's QuerySet API for this, for example:
if query.strip():
results = Model.indexer.search(query).prefetch()
else:
results = Model.objects.all()
results = results.filter(somefield__lt=somevalue)
return results
But Djapian returns a ResultSet of Hit objects, not Model objects. I can of course filter the objects "by hand", in Python, but it's not realistic in case of filtering all objects (when query is empty) - I would have to retrieve the whole table from database.
Am I out of luck with using Djapian for this?

I went through its source and found that Djapian has a filter method that can be applied to its results. I have just tried the below code and it seems to be working.
My indexer is as follows:
class MarketIndexer( djapian.Indexer ):
fields = [ 'name', 'description', 'tags_string', 'state']
tags = [('state', 'state'),]
Here is how I filter results (never mind the first line that does stuff for wildcard usage):
objects = model.indexer.search(q_wc).flags(djapian.resultset.xapian.QueryParser.FLAG_WILDCARD).prefetch()
objects = objects.filter(state=1)
When executed, it now brings Markets that have their state equal to "1".

I dont know Djapian, but i am familiar with xapian. In Xapian you can filter the results with a MatchDecider.
The decision function of the match decider gets called on every document which matches the search criteria so it's not a good idea to do a database query for every document here, but you can of course access the values of the document.
For example at ubuntuusers.de we have a xapian database which contains blog posts, forum posts, planet entries, wiki entries and so on and each document in the xapian database has some additional access information stored as value. After the query, an AuthMatchDecider filters the potential documents and returns the filtered MSet which are then displayed to the user.
If the decision procedure is as simple as somefield < somevalue, you could also simply add the value of somefield to the values of the document (using the sortable_serialize function provided by xapian) and add (using OP_FILTER) an OP_VALUE_RANGE query to the original query.

Related

Finding document containing array of nested names in pymongo (CrossRef data)

I have a dataset of CrossRef works records stored in a collection called works in MongoDB and I am using a Python application to query this database.
I am trying to find documents based on one author's name. Removing extraneous details, a document might look like this:
{'DOI':'some-doi',
'author':[{'given': 'Albert','family':'Einstein',affiliation:[]},
{'given':'R.','family':'Feynman',affiliation:[]},
{'given':'Paul','family':'Dirac',affiliation:['University of Florida']}]
}
It isn't clear to me how to combine the queries to get just Albert Einstein's papers.
I have indexes on author.family and author.given, I've tried:
cur = works.find({'author.family':'Einstein','author.given':'Albert'})
This returns all of the documents by people called 'Albert' and all of those by people called 'Einstein'. I can filter this manually, but it's obviously less than ideal.
I also tried:
cur = works.find({'author':{'given':'Albert','family':'Einstein','affiliation':[]}})
But this returns nothing (after a very long delay). I've tried this with and without 'affiliation'. There are a few questions on SO about querying nested fields, but none seem to concern the case where we're looking for 2 specific things in 1 nested field.
Your issue is that author is a list.
You can use an aggregate query to unwind this list to objects, and then your query would work:
cur = works.aggregate([{'$unwind': '$author'},
{'$match': {'author.family':'Einstein', 'author.given':'Albert'}}])
Alternatively, use $elemMatch which matches on arrays that match all the elements specified.
cur = works.find({"author": {'$elemMatch': {'family': 'Einstein', 'given': 'Albert'}}})
Also consider using multikey indexes.

Extract only the data from a Django Query Set

I am working to learn Django, and have built a test database to work with. I have a table that provides basic vendor invoice information, so, and I want to simply present a user with the total value of invoices that have been loaded to into the database. I found that the following queryset does sum the column as I'd hoped:
total_outstanding: object = Invoice.objects.aggregate(Sum('invoice_amount'))
but the result is presented on the page in the following unhelpful way:
Total $ Outstanding: {'invoice_amount__sum': Decimal('1965')}
The 1965 is the correct total for the invoices that I populated the database with, so the queryset is pulling what I want it to, but I just want to present that portion of the result to the user, without the other stuff.
Someone else asked a similar question (basically the same) here: how-to-extract-data-from-django-queryset, but the answer makes no sense to me, it is just:
k = k[0] = {'name': 'John'}
Queryset is list .
Can anyone help me with a plain-English explanation of how I can extract just the numerical result of that query for presentation to a user?
What you here get is a dictionary that maps the name of the aggregate to the corresponding value. You can use subscripting to obtain the corresponding value:
object = Invoice.objects.aggregate(
Sum('in,voice_amount')
)['invoice_amount__sum']

How to filter on calculated column of a query and meanwhile preserve mapped entities

I have a query which selects an entity A and some calculated fields
q = session.query(Recipe,func.avg(Recipe.somefield).join(.....)
I then use what I select in a way which assumes I can subscript result with "Recipe" string:
for entry in q.all():
recipe=entry.Recipe # Access KeyedTuple by Recipe attribute
...
Now I need to wrap my query in an additional select, say to filter by calculated field AVG:
q=q.subquery();
q=session.query(q).filter(q.c.avg_1 > 1)
And now I cannot access entry.Recipe anymore!
Is there a way to make SQLAlchemy adapt a query to an enclosing one, like aliased(adapt_on_names=True) orselect_from_entity()`?
I tried using those but was given an error
As Michael Bayer mentioned in a relevant Google Group thread, such adaptation is already done via Query.from_self() method. My problem was that in this case I didn't know how to refer a column which I want to filter on
This is due to the fact, that it is calculated i.e. there is no table to refer to!
I might resort to using literals(.filter('avg_1>10')), but 'd prefer to stay in the more ORM-style
So, this is what I came up with - an explicit column expression
row_number_column = func.row_number().over(
partition_by=Recipe.id
).label('row_number')
query = query.add_column(
row_number_column
)
query = query.from_self().filter(row_number_column == 1)

How can I make a Django query for the first occurrence of a foreign key in a column?

Basically, I have a table with a bunch of foreign keys and I'm trying to query only the first occurrence of a particular key by the "created" field. Using the Blog/Entry example, if the Entry model has a foreign key to Blog and a foreign key to User, then how can I construct a query to select all Entries in which a particular User has written the first one for the various Blogs?
class Blog(models.Model):
...
class User(models.Model):
...
class Entry(models.Model):
blog = models.Foreignkey(Blog)
user = models.Foreignkey(User)
I assume there's some magic I'm missing to select the first entries of a blog and that I can simple filter further down to a particular user by appending:
query = Entry.objects.magicquery.filter(user=user)
But maybe there's some other more efficient way. Thanks!
query = Entry.objects.filter(user=user).order_by('id')[0]
Basically order by id (lowest to highest), and slice it to get only the first hit from the QuerySet.
I don't have a Django install available right now to test my line, so please check the documentation if somehow I have a type above:
order by
limiting querysets
By the way, interesting note on 'limiting queysets" manual section:
To retrieve a single object rather
than a list (e.g. SELECT foo FROM bar
LIMIT 1), use a simple index instead
of a slice. For example, this returns
the first Entry in the database, after
ordering entries alphabetically by
headline:
Entry.objects.order_by('headline')[0]
EDIT: Ok, this is the best I could come up so far (after yours and mine comment). It doesn't return Entry objects, but its ids as entry_id.
query = Entry.objects.values('blog').filter(user=user).annotate(Count('blog')).annotate(entry_id=Min('id'))
I'll keep looking for a better way.
Ancient question, I realise - #zalew's response is close but will likely result in the error:
ProgrammingError: SELECT DISTINCT ON expressions must match initial
ORDER BY expressions
To correct this, try aligning the ordering and distinct parts of the query:
Entry.objects.filter(user=user).distinct("blog").order_by("blog", "created")
As a bonus, in case of multiple entries being created at exactly the same time (unlikely, but you never know!), you could add determinism by including id in the order_by:
To correct this, try aligning the ordering and distinct parts of the query:
Entry.objects.filter(user=user).distinct("blog").order_by("blog", "created", "id")
Can't test it in this particular moment
Entry.objects.filter(user=user).distinct("blog").order_by("id")

In Django, how does one filter a QuerySet with dynamic field lookups?

Given a class:
from django.db import models
class Person(models.Model):
name = models.CharField(max_length=20)
Is it possible, and if so how, to have a QuerySet that filters based on dynamic arguments? For example:
# Instead of:
Person.objects.filter(name__startswith='B')
# ... and:
Person.objects.filter(name__endswith='B')
# ... is there some way, given:
filter_by = '{0}__{1}'.format('name', 'startswith')
filter_value = 'B'
# ... that you can run the equivalent of this?
Person.objects.filter(filter_by=filter_value)
# ... which will throw an exception, since `filter_by` is not
# an attribute of `Person`.
Python's argument expansion may be used to solve this problem:
kwargs = {
'{0}__{1}'.format('name', 'startswith'): 'A',
'{0}__{1}'.format('name', 'endswith'): 'Z'
}
Person.objects.filter(**kwargs)
This is a very common and useful Python idiom.
A simplified example:
In a Django survey app, I wanted an HTML select list showing registered users. But because we have 5000 registered users, I needed a way to filter that list based on query criteria (such as just people who completed a certain workshop). In order for the survey element to be re-usable, I needed for the person creating the survey question to be able to attach those criteria to that question (don't want to hard-code the query into the app).
The solution I came up with isn't 100% user friendly (requires help from a tech person to create the query) but it does solve the problem. When creating the question, the editor can enter a dictionary into a custom field, e.g.:
{'is_staff':True,'last_name__startswith':'A',}
That string is stored in the database. In the view code, it comes back in as self.question.custom_query . The value of that is a string that looks like a dictionary. We turn it back into a real dictionary with eval() and then stuff it into the queryset with **kwargs:
kwargs = eval(self.question.custom_query)
user_list = User.objects.filter(**kwargs).order_by("last_name")
Additionally to extend on previous answer that made some requests for further code elements I am adding some working code that I am using
in my code with Q. Let's say that I in my request it is possible to have or not filter on fields like:
publisher_id
date_from
date_until
Those fields can appear in query but they may also be missed.
This is how I am building filters based on those fields on an aggregated query that cannot be further filtered after the initial queryset execution:
# prepare filters to apply to queryset
filters = {}
if publisher_id:
filters['publisher_id'] = publisher_id
if date_from:
filters['metric_date__gte'] = date_from
if date_until:
filters['metric_date__lte'] = date_until
filter_q = Q(**filters)
queryset = Something.objects.filter(filter_q)...
Hope this helps since I've spent quite some time to dig this up.
Edit:
As an additional benefit, you can use lists too. For previous example, if instead of publisher_id you have a list called publisher_ids, than you could use this piece of code:
if publisher_ids:
filters['publisher_id__in'] = publisher_ids
Django.db.models.Q is exactly what you want in a Django way.
This looks much more understandable to me:
kwargs = {
'name__startswith': 'A',
'name__endswith': 'Z',
***(Add more filters here)***
}
Person.objects.filter(**kwargs)
A really complex search forms usually indicates that a simpler model is trying to dig it's way out.
How, exactly, do you expect to get the values for the column name and operation?
Where do you get the values of 'name' an 'startswith'?
filter_by = '%s__%s' % ('name', 'startswith')
A "search" form? You're going to -- what? -- pick the name from a list of names? Pick the operation from a list of operations? While open-ended, most people find this confusing and hard-to-use.
How many columns have such filters? 6? 12? 18?
A few? A complex pick-list doesn't make sense. A few fields and a few if-statements make sense.
A large number? Your model doesn't sound right. It sounds like the "field" is actually a key to a row in another table, not a column.
Specific filter buttons. Wait... That's the way the Django admin works. Specific filters are turned into buttons. And the same analysis as above applies. A few filters make sense. A large number of filters usually means a kind of first normal form violation.
A lot of similar fields often means there should have been more rows and fewer fields.

Categories