NDB query by keys value - python

New to using Python NDB.
I have something like:
from google.appengine.ext import ndb

class User(ndb.Model):
    seen_list = ndb.KeyProperty(kind='Survey', repeated=True)

class Survey(ndb.Model):
    name = ndb.StringProperty(required=True)
I want to be able to query for users that have not seen certain surveys.
What I am doing now is:
users = User.query(seen_list != 'survey name').fetch()
This does not work. What would be the proper way to do this? Should I first query the Survey list to get the key of the survey with a certain name? Is the != part correct?
I could not find any examples similar to this.
Thanks.

Unfortunately, because seen_list is a repeated property, it won't work that way. When you query a repeated property, the datastore tests EVERY value in your list, and if any one of them matches, it returns the entity. So when you say "!= survey name 1", if you have at least ONE entry in your list that isn't "survey name 1", the entity comes back as a match, even if another entry IS "survey name 1". (Also note that seen_list stores Survey keys, so the value you compare against has to be a key rather than a name.)
It's counterintuitive if you come from an SQL background, I know. The only way around it is to do it programmatically and evaluate the entities your query returns. It comes from the fact that, for repeated values, Bigtable "flattens" your results: it creates one index entry for EVERY value in your repeated attribute. So as it scans, it eventually finds one "correct" row with your value, grabs the entity key from there, and returns the entity.
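To make the "flattening" behaviour concrete, here is a small pure-Python simulation that needs no App Engine. The users dict and the plain-string survey keys are hypothetical stand-ins for User entities and Survey keys, and datastore_not_equal only mimics how a != filter matches a repeated property (one index row per list value); it is not the datastore's actual implementation.

```python
from typing import Dict, List

# Hypothetical stand-ins: user name -> the survey keys in their seen_list.
users: Dict[str, List[str]] = {
    "alice": ["survey1", "survey2"],
    "bob": ["survey2"],
    "carol": [],
}


def datastore_not_equal(users: Dict[str, List[str]], value: str) -> List[str]:
    """Mimic a != filter on a repeated property.

    The index holds one row per list element, so an entity matches as soon
    as ANY of its values differs. Entities with an empty list have no index
    rows for the property and therefore never match.
    """
    return sorted(name for name, seen in users.items()
                  if any(v != value for v in seen))


def not_seen(users: Dict[str, List[str]], value: str) -> List[str]:
    """What the question actually wants: users whose list lacks the value."""
    return sorted(name for name, seen in users.items() if value not in seen)


print(datastore_not_equal(users, "survey1"))  # ['alice', 'bob']
print(not_seen(users, "survey1"))             # ['bob', 'carol']
```

Note that alice is returned by the != emulation even though she has seen survey1. In NDB terms, the second filter is the part you run in application code: look up the survey's key first (as the question suggests), then keep only the User entities whose seen_list does not contain it. Since that means fetching every candidate user, it can get expensive on large kinds.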

Related

Why is COUNT returning the wrong number?

I'm very new to programming and trying to figure out what I'm doing wrong. I have a database with two tables. One is called "addresses" and the other is called "tablePlayers". I'm trying to count the number of times a specific person's name appears in the "winner" column and then store that count in the "W" column of the "tablePlayers" table, on the row with that same person's name.
Here's the code I'm using
c.execute("UPDATE tablePlayers SET W = COUNT(winner) FROM addresses WHERE winner ='Mika'")
Here's what the tables look like in DB Browser for SQLite. As you can see, "Mika" appears only once in the "winner" column, but the count in the other table says 6, and it is written on only one row, not the one with the matching name.
(Screenshots: the addresses and tablePlayers tables.)
I can't tell you exactly why it goes wrong, but note that nothing in your statement ties the update to Mika's row in tablePlayers. I would recommend using parentheses and a SELECT statement to build a nested query. For example, your query could look like this:
UPDATE tablePlayers SET W = (SELECT COUNT(winner) FROM addresses WHERE winner ='Mika') WHERE Name='Mika'
You could also handle the general case and update all names at once:
UPDATE tablePlayers SET W = (SELECT COUNT(winner) FROM addresses WHERE winner=tablePlayers.Name)
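A minimal, self-contained way to try both statements is the standard-library sqlite3 module with an in-memory database. The two-column tables and sample rows below are assumptions (the real tables likely have more columns), and the player name is passed as a query parameter instead of being written into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()

# Hypothetical minimal versions of the two tables from the question.
c.execute("CREATE TABLE addresses (winner TEXT)")
c.execute("CREATE TABLE tablePlayers (Name TEXT, W INTEGER)")
c.executemany("INSERT INTO addresses (winner) VALUES (?)",
              [("Mika",), ("Lee",), ("Lee",)])
c.executemany("INSERT INTO tablePlayers (Name, W) VALUES (?, 0)",
              [("Mika",), ("Lee",), ("Sam",)])

# Single player: the outer WHERE picks which tablePlayers row is changed.
c.execute(
    "UPDATE tablePlayers SET W = "
    "(SELECT COUNT(winner) FROM addresses WHERE winner = ?) "
    "WHERE Name = ?",
    ("Mika", "Mika"),
)

# All players at once: the subquery is correlated with the row being updated.
c.execute(
    "UPDATE tablePlayers SET W = "
    "(SELECT COUNT(winner) FROM addresses WHERE winner = tablePlayers.Name)"
)
conn.commit()

for name, wins in c.execute("SELECT Name, W FROM tablePlayers ORDER BY Name"):
    print(name, wins)
# Lee 2
# Mika 1
# Sam 0
```

Players with no wins get 0 rather than NULL, because COUNT over an empty set returns 0.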

How do I use web2py smart_query for a GET request?

So I'm trying to use smart_query in web2py to find specific values in a db, but the only explanation I can find is in the web2py book and it's not very clear. The example GET request from the book is formatted like this:
def GET(search):
    try:
        rows = db.smart_query([db.person, db.pet], search).select()
        return dict(result=rows)
    except:
        ...
I'm confused as to what values I would put in place of db.person and db.pet. Here is what the book says on it:
The method db.smart_query takes two arguments:
a list of field or table that should be allowed in the query
a string containing the query expressed in natural language
I'm thinking the first value would be the database I'm searching, but then I don't know what the second value would be. The book makes it sound like it should be the string I'm searching for, but I think that that's what the variable search is for.
Could someone please help me understand what exactly each argument is supposed to do?
The first argument to smart_query is a list of DAL Table and/or Field objects (a Table object in the list will simply be expanded to include all of the table's fields). This list determines which fields can be included in the query.
The second argument is the query itself, which can include field names and comparison operators (and their natural language counterparts) as well as "and" and "or" to express conjunctions and disjunctions. For an idea of what is allowed, you can examine the relevant code here.
The SQLFORM.grid advanced search widget generates queries that are ultimately parsed by smart_query, so to get a better idea of how to generate such queries, try creating a test SQLFORM.grid and play with the search widget in the UI to see the queries it generates.

DynamoDB Querying in Python (Count with GroupBy)

This may be trivial, but I loaded a local DynamoDB instance with 30GB worth of Twitter data that I aggregated.
The primary key is id (tweet_id from the Tweet JSON), and I also store the date/text/username/geocode.
I basically am interested in mentions of two topics (let's say "Bees" and "Booze"). I want to get a count of each of those by state by day.
So by the end, I should know for each state, how many times each was mentioned on a given day. And I guess it'd be nice to export that as a CSV or something for later analysis.
Some issues I had with doing this...
First, the geocode info is a tuple of [latitude, longitude] so for each entry, I need to map that to a state. That I can do.
Second, is the most efficient approach to go through each entry, manually check whether it mentions either keyword, and keep a dictionary for each keyword that maps date/location to a count?
EDIT:
Since it took me 20 hours to load all the data into my table, I don't want to delete and re-create it. Perhaps I should create a global secondary index (?) and use that to search other fields in a query? That way I don't have to scan everything. Is that the right track?
EDIT 2:
Well, since the table is on my computer locally I should be OK with just using expensive operations like a Scan right?
So if I did something like this:
query = table.scan(
FilterExpression=Attr('text').contains("Booze"),
ProjectionExpression='id, text, date, geo',
Limit=100)
And did one scan for each keyword, then I would be able to go through the resulting filtered list and get a count of mentions of each topic for each state on a given day, right?
EDIT3:
response = table.scan(
    FilterExpression=Attr('text').contains("Booze"),
    Limit=100)
# do something with this set
while 'LastEvaluatedKey' in response:
    response = table.scan(
        FilterExpression=Attr('text').contains("Booze"),
        Limit=100,
        ExclusiveStartKey=response['LastEvaluatedKey']
    )
    # do something with each batch of 100 entries
So something like that, for both keywords. That way I'll be able to go through the resulting filtered set and do what I want (in this case, figure out the location and day and create a final dataset with that info). Right?
EDIT 4
If I add:
ProjectionExpression='date, location, user, text'
into the scan request, I get an error saying "botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the Scan operation: Invalid ProjectionExpression: Attribute name is a reserved keyword; reserved keyword: location". How do I fix that?
NVM I got it. Answer is to look into ExpressionAttributeNames (see: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ExpressionPlaceholders.html)
Yes, scanning the table for "Booze" and counting the items in the result should give you the total count. Please note that you need to keep scanning until the response no longer contains a LastEvaluatedKey.
See ExclusiveStartKey in the Scan documentation as well.
EDIT:
Yes, the code looks good. One thing to note: the result set won't always contain 100 items. Please refer to the Limit definition below (it does not mean the same thing as LIMIT in SQL).
Limit (Integer): The maximum number of items to evaluate (not necessarily the number of matching items). If DynamoDB processes the number of items up to the limit while processing the results, it stops the operation and returns the matching values up to that point, and a key in LastEvaluatedKey to apply in a subsequent operation, so that you can pick up where you left off. Also, if the processed data set size exceeds 1 MB before DynamoDB reaches this limit, it stops the operation and returns the matching values up to the limit, and a key in LastEvaluatedKey to apply in a subsequent operation to continue the operation. For more information, see Query and Scan in the Amazon DynamoDB Developer Guide.
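Putting the pieces from this thread together, here is a minimal sketch of the paginated scan plus the per-state, per-day count the question is after. scan_all only assumes a callable that behaves like boto3's Table.scan (returning "Items" and, while more data remains, "LastEvaluatedKey"); to_state is a hypothetical placeholder for your own latitude/longitude-to-state mapping, and the ISO-8601 date format is an assumption about how the tweets were stored.

```python
import csv
import io
from collections import Counter
from typing import Any, Callable, Dict, Iterable, Iterator


def scan_all(scan: Callable[..., Dict[str, Any]], **kwargs: Any) -> Iterator[Dict[str, Any]]:
    """Yield every item of a paginated Scan.

    A page can hold fewer items than Limit (even none), so keep requesting
    pages until the response no longer carries a LastEvaluatedKey.
    """
    params = dict(kwargs)
    while True:
        response = scan(**params)
        yield from response.get("Items", [])
        if "LastEvaluatedKey" not in response:
            return
        params["ExclusiveStartKey"] = response["LastEvaluatedKey"]


def count_by_state_and_day(items: Iterable[Dict[str, Any]],
                           to_state: Callable[[Any], str]) -> Counter:
    """Count matching items per (state, day).

    Assumes "geo" is whatever your to_state function accepts and "date" is
    an ISO-8601 string whose first 10 characters are YYYY-MM-DD.
    """
    counts: Counter = Counter()
    for item in items:
        counts[(to_state(item["geo"]), item["date"][:10])] += 1
    return counts


def counts_to_csv(counts: Counter, keyword: str) -> str:
    """Render one CSV row per (keyword, state, day, count)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["keyword", "state", "day", "count"])
    for (state, day), n in sorted(counts.items()):
        writer.writerow([keyword, state, day, n])
    return buf.getvalue()


# Usage with boto3 (not executed here), one scan per keyword. Attribute
# names go through ExpressionAttributeNames placeholders, which sidesteps
# reserved words as in EDIT 4:
#
#   from boto3.dynamodb.conditions import Attr
#   items = scan_all(table.scan,
#                    FilterExpression=Attr("text").contains("Booze"),
#                    ProjectionExpression="#dt, #geo",
#                    ExpressionAttributeNames={"#dt": "date", "#geo": "geo"})
#   csv_text = counts_to_csv(count_by_state_and_day(items, to_state), "Booze")
```

Keeping pagination separate from counting means the aggregation can be checked against a fake scan function before pointing it at the 30 GB table.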

Order_by in sqlalchemy with outer join

I have the following sqlalchemy queries:
score = Scores.query.group_by(Scores.email).order_by(Scores.date).subquery()
students = db.session.query(Students, score.c.id).filter_by(archive=0).order_by(Students.exam_date).outerjoin(score, Students.email == score.c.email)
And then I render the things with:
return render_template('students.html', students=students.all())
Now, the issue is that I want the latest score for each student to be displayed, as there are many scores per user, but the first one seems to be returned instead. I tried various orderings with order_by on the first query, score, but without success.
How can I affect and pick only one latest result from the "score" to be paired with the corresponding row in "students"?
Thanks!
First of all you want to make sure that the subquery selects only rows for a particular student. You do that with the correlate method:
score = db.session.query(Scores.id).order_by(Scores.date.desc()).correlate(Students)
This alone does nothing yet, as the subquery does not reference Students. The idea of correlate is that if you use Students in your subquery, SQLAlchemy will not add it to the subquery's FROM list but will instead rely on the outer query to provide it. Now you will want to refine your query (think of it as the join condition):
score = score.filter(Students.email == Scores.email)
This will produce a subquery that only ever returns scores for a single student. The remaining question is whether each student can have multiple scores. If so, you need to limit it (if not, you don't need the order_by part from above either):
score = score.limit(1)
Now you have made sure your query returns a single value. Since you are using the query in a select context, you will want to turn it into a scalar:
students = db.session.query(Students, score.as_scalar()).filter_by(archive=0).order_by(Students.exam_date)
The as_scalar method tells SQLAlchemy that this subquery returns a single row and column; otherwise you could not put it into a select. (In SQLAlchemy 1.4 and later, as_scalar() is deprecated in favor of scalar_subquery().)
Note: I am not 100% sure you need that limit if you use as_scalar. Try it out and experiment. If each student has only one score anyway, you don't need to worry about any of this.
A little hint along the way: a Query instance is itself an iterable. That means that as long as you don't print it or similar, you can pass a query around just like a list, except that it only runs when accessed (lazily) and that you can dynamically refine it. For example, you could slice it: students[:10] would select only the first 10 students (and on a query, this is executed as a LIMIT rather than by slicing a list, which is much more efficient).
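For reference, this is roughly the SQL that the correlated, limited scalar subquery corresponds to, run with the standard-library sqlite3 module so it needs no SQLAlchemy setup. The table and column names are assumptions mirroring the question's Students and Scores models, and dates are stored as ISO strings so they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schemas mirroring the Students/Scores models.
conn.executescript("""
    CREATE TABLE students (email TEXT PRIMARY KEY, exam_date TEXT, archive INTEGER);
    CREATE TABLE scores (id INTEGER PRIMARY KEY, email TEXT, date TEXT);
    INSERT INTO students VALUES
        ('a@example.com', '2020-01-02', 0),
        ('b@example.com', '2020-01-01', 0),
        ('c@example.com', '2020-01-04', 1),
        ('d@example.com', '2020-01-03', 0);
    INSERT INTO scores VALUES
        (1, 'a@example.com', '2020-01-01'),
        (2, 'a@example.com', '2020-03-01'),
        (3, 'b@example.com', '2020-02-01');
""")

# Correlated scalar subquery: for each outer student row, pick the id of that
# student's most recent score (NULL when there is none, like an outer join).
rows = conn.execute("""
    SELECT s.email,
           (SELECT sc.id FROM scores AS sc
             WHERE sc.email = s.email
             ORDER BY sc.date DESC
             LIMIT 1) AS latest_score_id
      FROM students AS s
     WHERE s.archive = 0
     ORDER BY s.exam_date
""").fetchall()

print(rows)
# [('b@example.com', 3), ('a@example.com', 2), ('d@example.com', None)]
```

Because the subquery sits in the select list, there is no separate outer join: students without scores still appear, with None as their latest score id.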

How can I make a Django query for the first occurrence of a foreign key in a column?

Basically, I have a table with a bunch of foreign keys and I'm trying to query only the first occurrence of a particular key by the "created" field. Using the Blog/Entry example, if the Entry model has a foreign key to Blog and a foreign key to User, then how can I construct a query to select all Entries in which a particular User has written the first one for the various Blogs?
from django.db import models

class Blog(models.Model):
    ...

class User(models.Model):
    ...

class Entry(models.Model):
    blog = models.ForeignKey(Blog, on_delete=models.CASCADE)
    user = models.ForeignKey(User, on_delete=models.CASCADE)
I assume there's some magic I'm missing to select the first entries of a blog, and that I can simply filter further down to a particular user by appending:
query = Entry.objects.magicquery.filter(user=user)
But maybe there's some other more efficient way. Thanks!
query = Entry.objects.filter(user=user).order_by('id')[0]
Basically, order by id (lowest to highest) and index the QuerySet to get only the first hit.
I don't have a Django install available right now to test this line, so please check the documentation in case I have a typo above:
order by
limiting querysets
By the way, here is an interesting note from the "Limiting QuerySets" section of the manual:
To retrieve a single object rather than a list (e.g. SELECT foo FROM bar LIMIT 1), use a simple index instead of a slice. For example, this returns the first Entry in the database, after ordering entries alphabetically by headline:
Entry.objects.order_by('headline')[0]
EDIT: OK, this is the best I could come up with so far (after your comment and mine). It doesn't return Entry objects, but their ids as entry_id.
from django.db.models import Count, Min
query = Entry.objects.values('blog').filter(user=user).annotate(Count('blog')).annotate(entry_id=Min('id'))
I'll keep looking for a better way.
Ancient question, I realise, but @zalew's response is close and will likely result in the error:
ProgrammingError: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
To correct this, try aligning the ordering and distinct parts of the query:
Entry.objects.filter(user=user).distinct("blog").order_by("blog", "created")
As a bonus, in case of multiple entries being created at exactly the same time (unlikely, but you never know!), you could add determinism by including id in the order_by:
Entry.objects.filter(user=user).distinct("blog").order_by("blog", "created", "id")
Can't test it at the moment:
Entry.objects.filter(user=user).distinct("blog").order_by("id")
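To see what distinct("blog").order_by("blog", "created", "id") asks the database for, here is a plain-Python emulation of the DISTINCT ON semantics (Django only supports distinct() with field names on PostgreSQL). The entries list is hypothetical sample data, with dicts standing in for Entry objects and ISO date strings for created.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Dict, List

# Hypothetical rows standing in for Entry objects.
entries: List[Dict[str, Any]] = [
    {"id": 1, "blog": "b1", "user": "u1", "created": "2020-01-02"},
    {"id": 2, "blog": "b1", "user": "u1", "created": "2020-01-01"},
    {"id": 3, "blog": "b2", "user": "u1", "created": "2020-01-05"},
    {"id": 4, "blog": "b2", "user": "u2", "created": "2020-01-01"},
    {"id": 5, "blog": "b2", "user": "u1", "created": "2020-01-05"},
]


def first_entry_per_blog(entries: List[Dict[str, Any]], user: str) -> List[Dict[str, Any]]:
    """Emulate filter(user=user).distinct("blog").order_by("blog", "created", "id").

    DISTINCT ON keeps the first row of each run of equal "blog" values, where
    "first" is decided by the remaining ORDER BY columns (created, then id).
    """
    mine = sorted((e for e in entries if e["user"] == user),
                  key=itemgetter("blog", "created", "id"))
    return [next(group) for _, group in groupby(mine, key=itemgetter("blog"))]


print([e["id"] for e in first_entry_per_blog(entries, "u1")])  # [2, 3]
```

Entries 3 and 5 share the same created value, and the trailing "id" in the ordering is what makes entry 3 the deterministic pick, which is the tie-breaking point made above.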
