I want to filter by key value (or, if I can, by ID).
The Developers Console offers searching data by key value.
I want to do the same in my code, something like:
DataModel.query(DataModel.key > ndb.Key('DataModel', id_value)).order(
    DataModel.date,
    DataModel.times).fetch(2000)
...which raises an error. My id_value is an integer.
How can I search and filter to get data that have a higher ID than id_value?
Filtering by inequality on the key is fine; what's wrong is that you cannot combine an inequality filter on one property with a sort order on another. To quote https://cloud.google.com/appengine/docs/python/ndb/queries under Limitations:
combining too many filters, using inequalities for multiple
properties, or combining an inequality with a sort order on a
different property are all currently disallowed.
The last one of these three limitations is what you're running into.
One option is to fetch all the results from the query and then sort them in your program (instead of in the .order() clause).
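For example, a minimal sketch of that approach, reusing the query from the question and sorting client-side on the same two properties:

results = DataModel.query(
    DataModel.key > ndb.Key('DataModel', id_value)).fetch(2000)
# Sort in Python instead of in the query.
results.sort(key=lambda e: (e.date, e.times))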
A second option is to query on all three fields to reduce the number of results, and then sort them as above:
DataModel.query(DataModel.key > ndb.Key('DataModel', id_value),
                DataModel.date > some_date,
                DataModel.times > some_times)
A third option is to use MapReduce to process large quantities of data.
Related
Let's say I have a Solr 6.5 core containing metadata for each entry, and I want to eliminate a few entries during specific queries based on the values of the fields.
For example, every entry (document) has many fields; let's consider two of them, field1 and field2.
How do I remove only entries that have both field1 = 'x' and field2 = 'a' (the intersection)?
I'm querying Solr from Python via a GET request.
I have tried sending the 'fq' parameter in the request as a string, where,
'fq' : '-field1:x AND -field2:a'
Also, this part of the fq is part of a larger fq covering several other fields, but for those fields I'm not worried about intersections.
It reduces the number of found entries significantly. I suspect that entries with field1:x and entries with field2:a are being eliminated independently, but I want the elimination to happen only when both conditions are met.
Am I handling this correctly, or is there a better/correct way to ensure that only the intersection is eliminated?
Would adding parentheses around that part of the larger fq make it work?
e.g.:
'fq' : '....other-conditions..... AND (-field1:x AND -field2:a) AND ....other-conditions.....'
Will this help? Am I using the AND condition improperly?
Here https://solr.apache.org/guide/6_6/common-query-parameters.html I found that an & is used, but I assume that applies to building the URL for a curl command.
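A sketch of how such a parenthesized filter could be sent from Python with the requests library (host, port, and core name are placeholders):

import requests

params = {
    'q': '*:*',
    # Negate the whole group so that only documents matching BOTH
    # clauses (the intersection) are excluded. Some Solr versions
    # need an explicit match-all in front: '*:* -(...)'.
    'fq': '-(field1:x AND field2:a)',
    'wt': 'json',
}
# 'mycore' is a placeholder core name.
resp = requests.get('http://localhost:8983/solr/mycore/select', params=params)
print(resp.json()['response']['numFound'])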
I am currently using peewee as an ORM in a python project.
I am trying to determine, given a set of objects, whether any of them already exist in the database. For objects where uniqueness is based on a single key, that is simple: I can generate a list of the keys and do a select in the database:
Model.select().where(Model.column << ids)
However, in some cases the uniqueness is determined by two columns. (Note that I don't have the primary key in hand at this moment, which is why I can't just rely on id.)
I tried to genericize the logic, where a list of all the column names that determined uniqueness could be passed in. Here is my code:
import operator
from functools import reduce

clauses = []
for obj in db_objects:
    # _get_unique_key returns a tuple of all the column values
    # that determine uniqueness for this object.
    uniq_key = self._get_unique_key(obj)
    subclause = [getattr(self.model_class, uniq_column) == value
                 for uniq_column, value in zip(self.uniq_columns, uniq_key)]
    clauses.append(reduce(operator.and_, subclause))

dups = self.model_class.select().where(reduce(operator.or_, clauses)).execute()
Note that self.uniq_columns contains the names of all the columns that together determine uniqueness, and _get_unique_key returns a tuple of those column values.
When I run this I get an error that the maximum recursion depth has been exceeded. I suppose this is due to how peewee resolves expressions. One way around it might be to break my clauses up into batches (i.e. build a clause for every 100 objects, issue the query, and repeat until all the objects have been processed).
I wanted to see if there is a better way.
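A sketch of that batching idea (the chunk size of 100 is arbitrary, and the method names mirror the code above):

import operator
from functools import reduce

def find_dups(self, db_objects, chunk_size=100):
    # Issue one query per chunk instead of one huge OR expression.
    dups = []
    for i in range(0, len(db_objects), chunk_size):
        clauses = []
        for obj in db_objects[i:i + chunk_size]:
            uniq_key = self._get_unique_key(obj)
            subclause = [getattr(self.model_class, col) == val
                         for col, val in zip(self.uniq_columns, uniq_key)]
            clauses.append(reduce(operator.and_, subclause))
        dups.extend(self.model_class.select()
                    .where(reduce(operator.or_, clauses)))
    return dups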
I have two models, Track and Pair. Each Pair has a track1, track2 and popularity. I'm trying to get an ordered list by popularity (descending) of pairs, with no two pairs having the same track1. Here's what I've tried so far:
lstPairs = Pair.objects.order_by('-popularity','track1__id').distinct('track1__id')[:iNumPairs].values_list('track1__id', 'track2__id', 'popularity')
This gave me the following error:
ProgrammingError: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
...so I tried this:
lstPairs = Pair.objects.order_by('-popularity','track1__id').distinct('popularity', 'track1__id')[:iNumPairs].values_list('track1__id', 'track2__id', 'popularity')
This gave me entries with duplicate track1__ids. Does anyone know of a way to solve this problem? I'm guessing I'll have to use raw() or something similar, but I don't know how I'd approach a problem like this. I'm using PostgreSQL for the DB backend, so DISTINCT ON should be supported.
First, let's clarify: DISTINCT is standard SQL, while DISTINCT ON is a PostgreSQL extension.
The error (DISTINCT ON expressions must match initial ORDER BY expressions) indicates that you should fix your ORDER BY clause, not the DISTINCT ON (if you change the latter, you'll end up with different results, as you already experienced).
The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). The ORDER BY clause will normally contain additional expression(s) that determine the desired precedence of rows within each DISTINCT ON group.
This will give you your expected results:
lstPairs = Pair.objects.order_by('track1__id','-popularity').distinct('track1__id')[:iNumPairs].values_list('track1__id', 'track2__id', 'popularity')
In SQL:
SELECT DISTINCT ON (track1__id) track1__id, track2__id, popularity
FROM pairs
ORDER BY track1__id, popularity DESC
But probably in the wrong order.
If you want your original order, you can use a sub-query here:
SELECT *
FROM (
  SELECT DISTINCT ON (track1__id) track1__id, track2__id, popularity
  FROM pairs
  ORDER BY track1__id, popularity DESC
  -- LIMIT here, if necessary
) AS sub
ORDER BY popularity DESC, track1__id
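If you'd rather stay in the ORM, one option is to evaluate the DISTINCT ON queryset and re-sort in Python; a sketch:

pairs = (Pair.objects
         .order_by('track1__id', '-popularity')
         .distinct('track1__id')
         .values_list('track1__id', 'track2__id', 'popularity'))
# Re-sort the deduplicated rows by popularity, descending.
lstPairs = sorted(pairs, key=lambda p: p[2], reverse=True)[:iNumPairs]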
See the documentation on distinct.
First:
On PostgreSQL only, you can pass positional arguments (*fields) in order to specify the names of fields to which the DISTINCT should apply.
If your database backend is not PostgreSQL, there is no way to make this work (you mention you're on PostgreSQL, so you should be fine here).
Second:
When you specify field names, you must provide an order_by() in the QuerySet, and the fields in order_by() must start with the fields in distinct(), in the same order.
I think you should use raw(), or fetch the entire list of Pairs ordered by popularity and then filter for track1 uniqueness in Python.
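A sketch of that second suggestion, keeping only the most popular pair per track1 (the loop stops once iNumPairs pairs are collected):

seen = set()
lstPairs = []
for t1, t2, pop in (Pair.objects.order_by('-popularity')
                    .values_list('track1__id', 'track2__id', 'popularity')):
    if t1 not in seen:
        seen.add(t1)
        lstPairs.append((t1, t2, pop))
        if len(lstPairs) == iNumPairs:
            break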
I have the following sqlalchemy queries:
score = Scores.query.group_by(Scores.email).order_by(Scores.date).subquery()
students = db.session.query(Students, score.c.id).filter_by(archive=0).order_by(Students.exam_date).outerjoin(score, Students.email == score.c.email)
And then I render the things with:
return render_template('students.html', students=students.all())
Now, the issue is that I want the last score for a student to be displayed, as there are many of them corresponding to each user. But the first one seems to be returned. I tried some sortings and order_by on the first query, score, but without success.
How can I affect and pick only one latest result from the "score" to be paired with the corresponding row in "students"?
Thanks!
First of all you want to make sure that the subquery selects only rows for a particular student. You do that with the correlate method:
score = db.session.query(Scores.id).order_by(Scores.date.desc()).correlate(Students)
This alone does nothing, as you do not access the students yet. The idea of correlate is that if you use Students in your subquery, it will not be added to the subquery's FROM list; instead, the subquery relies on the outer query to provide it. Now you will want to refine your query (the join condition, if you wish):
score = score.filter(Students.email == Scores.email)
This produces a subquery that returns only the scores for a single student each time. The remaining question is whether each student can have multiple scores. If so, you need to limit it (and if not, you don't need the order_by part above either):
score = score.limit(1)
Now you have made sure your query returns a single scalar value. Since you are using the query in a select context you will want to turn it into a scalar value:
students = db.session.query(Students, score.as_scalar()).filter_by(archive=0).order_by(Students.exam_date)
The as_scalar method is a way of telling SQLAlchemy that this query returns a single row and column; otherwise you could not put it into the select.
Note: I am not 100% sure you need that limit if you use as_scalar. Try it out and experiment. If each student has only one score anyway, then you don't need to worry about any of this.
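Putting the pieces together, the whole thing might look like this (a sketch built from the models in the question; keep or drop the limit(1) per the note above):

score = (db.session.query(Scores.id)
         .filter(Students.email == Scores.email)
         .order_by(Scores.date.desc())
         .correlate(Students)
         .limit(1))
students = (db.session.query(Students, score.as_scalar())
            .filter_by(archive=0)
            .order_by(Students.exam_date))
# students.all() now pairs each student with their latest score id.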
A little hint on the way: a Query instance is itself iterable. As long as you don't print it or similar, you can pass a query around just like a list, with the difference that it only runs on access (lazily) and that you can refine it dynamically. For example, you could slice it: students[:10] would select only the first 10 students, and doing this on the query executes it as a LIMIT instead of slicing a list, which is much more efficient.
Is there a way to substitute:
def get_objects(attr1, attr2, ...):
    objects = Entities.all()
    if attr1 is not None:
        # the old db API expects the comparison operator in the string
        objects = objects.filter('attr1 =', attr1)
    if attr2 is not None:
        objects = objects.filter('attr2 =', attr2)
    ...
    return objects
With a single query:
Entities.all().filter('attr1 =', attr1).filter('attr2 =', attr2)
By using some sort of 'match all' sign (maybe a regexp query)?
The problem with the first query is that (apart from being ugly) it creates indexes for all possible filter sequences.
The datastore doesn't support regex queries or OR queries.
However, if you're only using equality filters, indexes shouldn't be automatically created: these types of queries can be served using a merge-join strategy as long as the number of filters remains low. (If you try to add too many filters, you'll get an error indicating that the existing indexes can't be used to execute the query efficiently; trying to add the required indexes in a case like this will usually run into the exploding indexes problem.)
The ugliness in the first approach can probably be solved by passing a list to your function instead of individual variables, then using a list comprehension instead of a bunch of if statements.
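For instance, a sketch of that suggestion using (name, value) pairs, where None means "don't filter on this property" (the '=' in the filter string is the old db API's equality syntax):

def get_objects(filters):
    # filters: list of (property_name, value) pairs.
    objects = Entities.all()
    for name, value in [(n, v) for n, v in filters if v is not None]:
        objects = objects.filter('%s =' % name, value)
    return objects

# Usage: only attr1 is actually filtered on here.
results = get_objects([('attr1', 'foo'), ('attr2', None)])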