App engine query retrieve data with index reference - python

class Entry(Base):
    amount = db.IntegerProperty()
entries = Entry.gql("WHERE amount > 0")
Is there a way to access the entries result by index, like an array? For example:
my_entry = entries[4]

entries = [x for x in Entry.gql("WHERE amount > 0")]
The distinction between this and previous answers is that it filters at the datastore rather than in the handler, and doesn't require you to guess the maximum number of entities that will be returned.

You could use the fetch() method on the Query instance:
class Entry(Base):
    amount = db.IntegerProperty()
entries = Entry.gql("WHERE amount > 0").fetch(5)
print entries[4].amount

You have to call fetch(), which gives you a list of entries. In that case my_entry = entries[4] gives you the fifth object. You were trying to index the gql object directly, which won't work. Try this:
class Entry(Base):
    amount = db.IntegerProperty()
entries = Entry.gql("WHERE amount > 0").fetch(1000)
print entries[4].amount

If you want a single object at a specific index in the query results, you can use the fetch method of db.Query with the offset parameter:
entry = Entry.gql("WHERE amount > 0").fetch(1, offset=4)[0]
print entry.amount
However, if you want several objects from the query results, fetch them and index the result like a normal Python list:
entries = Entry.gql("WHERE amount > 0").fetch(1000)
print entries[4].amount
print entries[5].amount
print entries[7].amount
# etc.
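If only one position is needed and the result is consumed as an iterator, the item can also be picked without building a list first. The sketch below is plain Python and makes no datastore calls: nth is a hypothetical helper, and the generator amounts() merely stands in for iterating over a query.

```python
from itertools import islice

def nth(iterable, index, default=None):
    """Return the item at position `index` of any iterable, or `default` if it is too short."""
    return next(islice(iterable, index, None), default)

def amounts():
    # Stand-in for a query iterator: yields values lazily, one at a time.
    for value in (10, 20, 30, 40, 50, 60):
        yield value

fifth = nth(amounts(), 4)      # the item a list would expose as entries[4]
missing = nth(amounts(), 99)   # falls back to the default: only six items exist
```

Unlike a list index, this never raises IndexError; a too-short result simply yields the default.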

entries = [entry for entry in Entry.all() if entry.amount > 0]
print entries[4]

How to Make Iterator for do sum of a field from n number of objects

Let's say I have a model:
class Testmodel(models.Model):
    amount = models.IntegerField(null=True)
    contact = models.CharField()
Now I am making a query like:
obj1 = Testmodel.objects.filter(contact = 123)
Suppose it returns n objects, like (obj1, obj2, obj3, ...).
What is the best way to sum amount over all of the returned objects?
Any help will be appreciated.
It is usually better to do this at the database level, than in Python. We can use .aggregate(..) for that:
from django.db.models import Sum
Testmodel.objects.filter(contact=123).aggregate(total=Sum('amount'))['total']
The .aggregate(total=Sum('amount')) will return a dictionary that contains a single key-value pair: 'total' will be associated with the sum of the amount of the rows. In case no rows are selected (i.e. the filter does not match anything), then it will associate None with the key.
Provided the database supports summing values (most databases do), Django constructs a query similar to:
SELECT SUM(amount) AS total
FROM app_testmodel
WHERE contact = 123
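The behaviour of SUM on an empty selection can be checked without Django. The following is a minimal sketch using the standard-library sqlite3 module; the table name app_testmodel is taken from the query above, while the helper total_amount and the sample rows are made up for illustration.

```python
import sqlite3

def total_amount(conn, contact):
    """Sum `amount` for one contact inside the database; None if no row matches."""
    row = conn.execute(
        "SELECT SUM(amount) AS total FROM app_testmodel WHERE contact = ?",
        (contact,),
    ).fetchone()
    return row[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app_testmodel (amount INTEGER, contact TEXT)")
conn.executemany(
    "INSERT INTO app_testmodel (amount, contact) VALUES (?, ?)",
    [(10, "123"), (20, "123"), (5, "456")],
)

matched = total_amount(conn, "123")     # sum over the two matching rows
unmatched = total_amount(conn, "999")   # SUM over zero rows is NULL, i.e. None
safe_total = unmatched or 0             # fall back to 0 when nothing matched
```

The `or 0` fallback is one way to avoid passing None on to arithmetic; the same idea applies to the ['total'] value returned by aggregate.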
Use aggregate:
from django.db.models import Sum
Testmodel.objects.filter(contact=123).aggregate(
    total_sum=Sum('amount')
)

Composite key querying in couchbase 4.0

I got a view like this:
function (doc, meta) {
    if (doc.type) {
        var id = doc.id ? doc.id : "";
        var company = doc.company ? doc.company : "";
        var store = doc.store ? doc.store : "";
        emit([doc.type, id, company, store]);
    }
}
And documents which all contain a type and a combination of the other 3 fields, depending on the type.
I want to query generically via this view with the following function:
def find_by_type_pageing_by_id_company_store(self, format_function=None, page=None, rows=None, recent=None, type=None, id="", company="", store="", include_docs=True):
    if not type:
        logger.error("No Type Provided in find by type query")
        raise exceptions.InvalidQueryParams("No Type Provided in find by type query")
    view = VIEW_BY_TYPE_VIN_COMPANY_STORE
    cb = self.get_cb_bucket()
    query = Query()
    # 'recent' and 'rows' are equivalent and will be unified to 'limit' here
    if recent and rows:
        raise exceptions.InvalidQueryParams(detail="Query may not contain both 'recent' and 'rows'")
    limit = rows or recent
    if limit:
        try:
            rows_per_page = int(limit)
        except ValueError:
            raise exceptions.InvalidQueryParams(detail="Query params 'recent' and 'rows' have to be integers")
        if rows_per_page > settings.PAGINATION_MAX_ROWS_LIMIT:
            raise exceptions.InvalidQueryParams(detail="Query params 'recent' and 'rows' may not exceed %s. "
                                                       "Use the additional param 'page=2', 'page=3', etc. to access "
                                                       "more objects" % settings.PAGINATION_MAX_ROWS_LIMIT)
        try:
            page = 1 if page is None else int(page)
        except ValueError:
            raise exceptions.InvalidQueryParams(detail="Query param 'page' has to be an integer")
        skip = rows_per_page * (page - 1)
        query.limit = rows_per_page
        query.skip = skip
    query.mapkey_range = [
        [type, id, company, store],
        [type, id + query.STRING_RANGE_END, company + query.STRING_RANGE_END, store + query.STRING_RANGE_END]
    ]
    rows = cb.query(view['doc'], view['view'], include_docs=include_docs, query=query)
    if format_function is None:
        format_function = self.format_function_default
    return_array = format_function(rows)
    return return_array
It works flawlessly when querying only for a certain type, or for a type and an id range.
But if I want, for example, all docs of a certain type belonging to one company, disregarding id and store, docs of other companies are delivered as well.
I tried:
query.mapkey_range = [
    ["Vehicle", "", "abc", ""],
    ["Vehicle", query.STRING_RANGE_END, "abc", query.STRING_RANGE_END]
]
I know the order of the values in the composite key matters somehow, which is probably why the query for an id range succeeds.
But I could not find a detailed explanation of how the order matters or how to handle this use case.
Any idea or hint on how to cope with this?
Thank you in advance.
With compound keys, the order of the values in emit determines how the index is sorted internally. A range query relies on that order.
In your case:
index contains all Vehicles
all the Vehicles are then sorted by id
for each similar id, Vehicles are sorted by company
for each similar id and company, Vehicles are then sorted by store
Let's take an example of 4 vehicles. Here is what the index would look like:
Vehicle,a,ACME,store100
Vehicle,c,StackOverflow,store1001
Vehicle,d,ACME,store100
Vehicle,e,StackOverflow,store999
Here is what happens with a range query:
The view engine finds the first row >= to the startKey from your range
It then finds the last one that is <= to the endKey of your range
It returns every row in between in the array
You can see how, depending on the ids, this can lead to seemingly bad results: for [["Vehicle", "", "ACME", ""], ["Vehicle", RANGE_END, "ACME", RANGE_END]] here is what happens:
row 1 (a) is identified as the lowest row matching the startKey
keys are compared element by element, from left to right, and the comparison stops at the first element that differs
row 4 (e) is therefore still <= the endKey: Vehicle = Vehicle, then e < RANGE_END decides the comparison before company or store are ever looked at
hence rows 1-4 are all returned, including rows 2 and 4 from "StackOverflow"
TL;DR: The order in emit matters. You cannot use sparse "jokers" (wildcard ranges) on the left side of the compound key and still filter on the components to their right.
Change the map function to emit([doc.type, company, store, id]) (most generic to least generic attribute) and it should work fine after you rework your query accordingly.
Here is a link from the doc explaining compound keys and ranges with dates: Partial Selection With Compound Keys
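The effect of key order can be reproduced in plain Python, since tuples also compare element by element from left to right. This is only a sketch under assumptions: range_scan is a hypothetical helper that imitates an inclusive startKey/endKey scan, "\u0fff" is assumed to play the role of the SDK's STRING_RANGE_END, and Python's code-point ordering stands in for the view engine's collation.

```python
from bisect import bisect_left, bisect_right

RANGE_END = "\u0fff"  # assumed high sentinel, standing in for STRING_RANGE_END

def range_scan(keys, start_key, end_key):
    """Return every key k with start_key <= k <= end_key from a sorted copy of `keys`."""
    index = sorted(keys)
    return index[bisect_left(index, start_key):bisect_right(index, end_key)]

vehicles = [  # (id, company, store), the four example rows from the answer
    ("a", "ACME", "store100"),
    ("c", "StackOverflow", "store1001"),
    ("d", "ACME", "store100"),
    ("e", "StackOverflow", "store999"),
]

# Original key order: [type, id, company, store]
by_id = [("Vehicle", i, c, s) for i, c, s in vehicles]
leaky = range_scan(
    by_id,
    ("Vehicle", "", "ACME", ""),
    ("Vehicle", RANGE_END, "ACME", RANGE_END),
)
leaky_companies = {key[2] for key in leaky}    # other companies leak in

# Reworked key order: [type, company, store, id]
by_company = [("Vehicle", c, s, i) for i, c, s in vehicles]
exact = range_scan(
    by_company,
    ("Vehicle", "ACME", "", ""),
    ("Vehicle", "ACME", RANGE_END, RANGE_END),
)
exact_companies = {key[1] for key in exact}    # only the requested company
```

With the company moved directly after the type, the fixed part of the key forms a contiguous block in the index, so the range no longer spans rows of other companies.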
You have two options for querying your documents by a variable number/order of fields:
Use a multidimensional view (a.k.a. spatial view), which lets you omit parts of the compound key in the query. Here is an example of using such a view: http://developer.couchbase.com/documentation/server/4.0/views/sv-example2.html
Use N1QL, which lets you actually query on any number of fields dynamically. Make sure you add indexes for the fields you intend to query, and use the EXPLAIN statement to check that your queries execute as you expect them to. Here is how you use N1QL in Python: http://developer.couchbase.com/documentation/server/4.0/sdks/python-2.0/n1ql-queries.html
As you've already discovered, you cannot use a regular view, because you can only query it by the exact order of fields in your compound key.

how to get some fields list by sqlalchemy?

This is how I get all of the field topicid values in Topics table.
all_topicid = [i.topicid for i in session.query(Topics)]
But when the Topics table has lots of rows, the VPS kills this process. Is there a good way to resolve this?
Thanks everyone. I edited my code again; it is now:
last = session.query(Topics).order_by('-topicid')[0].topicid
all_topicid = [i.topicid for i in session.query(Topics.topicid)]
all_id = range(1, last+1)
diff = list(set(all_id).difference(set(all_topicid)))
I want to get diff. It is faster than before. Are there other ways to improve this code?
You could try changing your query to return only the ids, with something like:
all_topic_id = session.query(Topics.topicid).all()
If the table contains duplicate topicids, you could add distinct to the above to return unique values:
from sqlalchemy import distinct
all_topic_id = session.query(distinct(Topics.topicid)).all()
If this still causes an issue, I would probably write a stored procedure that returns the list of topicids and have SQLAlchemy call it.
For the second part I would do something like the below.
from sqlalchemy import distinct, func
# each result row is a one-element tuple, so unpack the id
all_topic_ids = [row[0] for row in session.query(distinct(Topics.topicid))]  # gets all ids
max_id = session.query(func.max(Topics.topicid)).scalar()  # gets the last id
all_ids = range(1, max_id + 1)  # every id that could exist
missing_ids = list(set(all_ids) - set(all_topic_ids))  # creates a list of missing ids
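If memory is the bottleneck, the gaps can also be found by streaming the ids in sorted order instead of building two large sets. The following is a minimal sketch with the standard-library sqlite3 module rather than SQLAlchemy; the table topics, the generator missing_ids and the sample ids are assumptions made for illustration, and ids are assumed to start at 1.

```python
import sqlite3

def missing_ids(conn):
    """Yield every id between 1 and MAX(topicid) that has no row, one at a time."""
    expected = 1
    query = "SELECT DISTINCT topicid FROM topics ORDER BY topicid"
    for (topicid,) in conn.execute(query):
        # Every number between the previous id and this one is a gap.
        while expected < topicid:
            yield expected
            expected += 1
        expected = max(expected, topicid + 1)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE topics (topicid INTEGER)")
conn.executemany("INSERT INTO topics (topicid) VALUES (?)", [(1,), (2,), (4,), (7,)])

gaps = list(missing_ids(conn))
```

Only one id is held in memory at a time, at the cost of asking the database to sort; with an index on topicid that sort is usually cheap.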

Django get a random object

I am trying to get a random object from a model A
For now, it is working well with this code:
random_idx = random.randint(0, A.objects.count() - 1)
random_object = A.objects.all()[random_idx]
But I feel this code is better:
random_object = A.objects.order_by('?')[0]
Which one is best? Is there a possible problem with deleted objects in the first snippet? For example, I can have 10 objects, but the object with id 10 may no longer exist. Have I misunderstood something about A.objects.all()[random_idx]?
Just been looking at this. The line:
random_object = A.objects.order_by('?')[0]
has reportedly brought down many servers.
Unfortunately, Erwan's code caused an error when accessing non-sequential ids.
There is another short way to do this:
import random
items = list(Product.objects.all())
# change 3 to how many random items you want
random_items = random.sample(items, 3)
# if you want only a single random item
random_item = random.choice(items)
The good thing about this is that it handles non-sequential ids without error.
Improving on all of the above:
from random import choice
pks = A.objects.values_list('pk', flat=True)
random_pk = choice(pks)
random_obj = A.objects.get(pk=random_pk)
We first get a list of potential primary keys without loading any Django object, then we randomly choose one primary key, and then we load the chosen object only.
The second bit of code is correct, but can be slower, because in SQL that generates an ORDER BY RANDOM() clause that shuffles the entire set of results, and then takes a LIMIT based on that.
The first bit of code still has to evaluate the entire set of results. E.g., what if your random_idx is near the last possible index?
A better approach is to pick a random ID from your database and look that up (a primary key lookup, so it's fast). We can't assume that every id between 1 and MAX(id) is available, since you may have deleted something. So the following is an approximation that works out well:
import random
# grab the max id in the database
max_id = A.objects.order_by('-id')[0].id
# grab a random possible id. we don't know if this id does exist in the database, though
# (randint includes both endpoints, so the upper bound is max_id itself)
random_id = random.randint(1, max_id)
# return an object with that id, or the first object with an id greater than that one
# this is a fast lookup, because the primary key is indexed
random_object = A.objects.filter(id__gte=random_id).order_by('id')[0]
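The same idea can be tried outside Django. This is a sketch with the standard-library sqlite3 module, assuming a table named a with a positive integer primary key; random_row and the sample rows are hypothetical.

```python
import random
import sqlite3

def random_row(conn, rng=random):
    """Pick a random id in [1, MAX(id)] and return the first row at or after it."""
    (max_id,) = conn.execute("SELECT MAX(id) FROM a").fetchone()
    if max_id is None:
        return None  # empty table
    random_id = rng.randint(1, max_id)  # randint includes both endpoints
    return conn.execute(
        "SELECT id, name FROM a WHERE id >= ? ORDER BY id LIMIT 1",
        (random_id,),
    ).fetchone()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO a (id, name) VALUES (?, ?)",
    [(1, "one"), (2, "two"), (9, "nine"), (10, "ten")],
)

picked = [random_row(conn) for _ in range(200)]
```

This shows why the method is only an approximation: with ids 1, 2, 9 and 10, every random id from 3 to 9 lands on row 9, so rows that follow a gap are chosen more often than the others.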
How about calculating maximal primary key and getting random pk?
The book ‘Django ORM Cookbook’ compares execution time of the following functions to get random object from a given model.
import random
from django.db.models import Max
from myapp.models import Category
def get_random():
    return Category.objects.order_by("?").first()
def get_random3():
    max_id = Category.objects.all().aggregate(max_id=Max("id"))['max_id']
    while True:
        pk = random.randint(1, max_id)
        category = Category.objects.filter(pk=pk).first()
        if category:
            return category
Test was made on a million DB entries:
In [14]: timeit.timeit(get_random3, number=100)
Out[14]: 0.20055226399563253
In [15]: timeit.timeit(get_random, number=100)
Out[15]: 56.92513192095794
See source.
After seeing those results I started using the following snippet:
from django.db.models import Max
import random
def get_random_obj_from_queryset(queryset):
    max_pk = queryset.aggregate(max_pk=Max("pk"))['max_pk']
    while True:
        obj = queryset.filter(pk=random.randint(1, max_pk)).first()
        if obj:
            return obj
So far it did do the job as long as there is an id.
Notice that the get_random3 (get_random_obj_from_queryset) function won’t work if you replace model id with uuid or something else. Also, if too many instances were deleted the while loop will slow the process down.
Yet another way:
from random import randint
pks = A.objects.values_list('pk', flat=True)
random_idx = randint(0, len(pks) - 1)
random_obj = A.objects.get(pk=pks[random_idx])
Works even if there are larger gaps in the pks, for example if you want to filter the queryset before picking one of the remaining objects at random.
EDIT: fixed the call to randint (thanks to @Quique). The stop argument is inclusive.
https://docs.python.org/3/library/random.html#random.randint
I'm sharing my latest test result with Django 2.1.7, PostgreSQL 10.
students = Student.objects.all()
for i in range(500):
    student = random.choice(students)
    print(student)
# 0.021996498107910156 seconds
for i in range(500):
    student = Student.objects.order_by('?')[0]
    print(student)
# 0.41299867630004883 seconds
In this test, random fetching with random.choice() is roughly 19x faster (about 0.022 s versus 0.413 s).
In Python, to get a random member of a sequence such as a list or tuple, you can use the random module.
The random module has a function named choice, which takes a sequence and returns one of its members at random.
Because random.choice only needs an object that supports len() and indexing, you can also use it on a Django queryset.
First import the random module:
import random
Then create a list:
my_iterable_object = [1, 2, 3, 4, 5, 6]
or create a queryset like this:
my_iterable_object = mymodel.objects.filter(name='django')
To get a random member of your object, use the choice function:
random_member = random.choice(my_iterable_object)
print(random_member) # my_iterable_object is [1, 2, 3, 4, 5, 6]
3
full code:
import random
my_list = [1, 2, 3, 4, 5, 6]
random.choice(my_list)
2
import random
def get_random_obj(model, length=-1):
    if length == -1:
        length = model.objects.count()
    return model.objects.all()[random.randint(0, length - 1)]
# to use this function
random_obj = get_random_obj(A)

Why/how does iterating over a list and calling 'pass' each time fix this function?

I have written the following function:
def auto_update_ratings(amounts, assessment_entries_qs, lowest_rating=-1):
    start = 0
    rating = lowest_rating
    ids = assessment_entries_qs.values_list('id', flat=True)
    for i in ids: # I have absolutely no idea why this seems to be required:
        pass      # without this loop, the last AssessmentEntries fail to update
                  # in the following for loop.
    for amount in amounts:
        end_mark = start + amount
        entries = ids[start:end_mark]
        a = assessment_entries_qs.filter(id__in=entries).update(rating=rating)
        start = end_mark
        rating += 1
It does what it is supposed to do (i.e. update the relevant number of entries in assessment_entries_qs with each rating (starting at lowest_rating) as specified in amounts). Here is a simple example:
>>> assessment_entries = AssessmentEntry.objects.all()
>>> print [ae.rating for ae in assessment_entries]
[None, None, None, None, None, None, None, None, None, None]
>>>
>>> auto_update_ratings((2,4,3,1), assessment_entries, 1)
>>> print [ae.rating for ae in assessment_entries]
[1, 1, 2, 2, 2, 2, 3, 3, 3, 4]
However, if I do not iterate through ids before iterating through amounts, the function only updates a subset of the queryset: with my current test data (approximately 250 AssessmentEntries in the queryset), it always results in exactly 84 AssessmentEntries not being updated.
Interestingly, it is always the last iteration of the second for loop that does not result in any updates (although the rest of the code in that iteration executes properly), as well as a portion of the previous iteration. The querysets are ordered with order_by('?') before being passed to this function, and the intended results are achieved if I simply add the 'empty' for loop shown above, so it does not appear to be an issue with my data.
A few more details, just in case they prove to be relevant:
AssessmentEntry.rating is a standard IntegerField(null=True,blank=True).
I am using this function purely for testing purposes, so I have only been executing it from iPython.
Test database is SQLite.
Question: Can someone please explain why I appear to need to iterate through ids, despite not actually touching the data in any way, and why without doing so the function still (sort of) executes correctly, but always fails to update the last few items in the queryset despite apparently still iterating through them?
QuerySets and QuerySet slicing are evaluated lazily. Iterating ids executes the query and makes ids behave like a static list instead of a QuerySet. So when you loop through ids, it causes entries later on to be a fixed set of values; but if you don't loop through ids, then entries is just a subquery with a LIMIT clause added to represent the slicing you do.
Here is what is happening in detail:
def auto_update_ratings(amounts, assessment_entries_qs, lowest_rating=-1):
    # assessment_entries_qs is an unevaluated QuerySet
    # from your calling code, it would probably generate a query like this:
    # SELECT * FROM assessments ORDER BY RANDOM()
    start = 0
    rating = lowest_rating
    ids = assessment_entries_qs.values_list('id', flat=True)
    # ids is a ValueQuerySet that adds "SELECT id"
    # to the query that assessment_entries_qs would generate.
    # So ids is now something like:
    # SELECT id FROM assessments ORDER BY RANDOM()
    # we omit the loop
    for amount in amounts:
        end_mark = start + amount
        entries = ids[start:end_mark]
        # entries is now another QuerySet with a LIMIT clause added:
        # SELECT id FROM assessments ORDER BY RANDOM() LIMIT amount OFFSET start
        # When filter() gets a QuerySet, it adds a subquery
        a = assessment_entries_qs.filter(id__in=entries).update(rating=rating)
        # FINALLY, we now actually EXECUTE a query which is something like this:
        # UPDATE assessments SET rating=? WHERE id IN
        # (SELECT id FROM assessments ORDER BY RANDOM() LIMIT amount OFFSET start)
        start = end_mark
        rating += 1
Since the subquery in entries is executed again on every update and has a random order, the slicing you do is meaningless! This function does not have deterministic behavior.
However when you iterate ids you actually execute the query, so your slicing has deterministic behavior again and the code does what you expect.
Let's see what happens when you use a loop instead:
ids = assessment_entries_qs.values_list('id', flat=True)
# Iterating ids causes the query to actually be executed
# This query was sent to the DB:
# SELECT id FROM assessments ORDER BY RANDOM()
for id in ids:
    pass
# ids has now been "realized" and contains the *results* of the query
# e.g., [5,1,2,3,4]
# Iterating again (or slicing) will now return values rather than modify the query
for amount in amounts:
    end_mark = start + amount
    entries = ids[start:end_mark]
    # because ids was executed, entries contains definite values
    # When filter() gets actual values, it adds a simple condition
    a = assessment_entries_qs.filter(id__in=entries).update(rating=rating)
    # The query executed is something like this:
    # UPDATE assessments SET rating=? WHERE id IN (5,1)
    # "(5,1)" will change on each iteration, but it will always be a set of
    # scalar values rather than a subquery.
    start = end_mark
    rating += 1
If you ever need to eagerly evaluate a QuerySet to get all its values at a moment in time, rather than perform a do-nothing iteration just convert it to a list:
ids = list(assessment_entries_qs.values_list('id', flat=True))
Also the Django docs go into detail about when exactly a QuerySet is evaluated.
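The difference between slicing a lazy, randomly ordered query and slicing an evaluated list can be imitated without a database. The class below is a toy stand-in, not Django's QuerySet: ReshuffledIds and split are hypothetical names, and "re-running the query" is simulated by reshuffling on every access.

```python
import random

class ReshuffledIds:
    """Toy stand-in for an unevaluated queryset ordered by '?':
    every slice or iteration re-runs the "query" and sees a new random order."""

    def __init__(self, ids, seed=0):
        self._ids = list(ids)
        self._rng = random.Random(seed)

    def _run_query(self):
        rows = self._ids[:]
        self._rng.shuffle(rows)
        return rows

    def __getitem__(self, key):
        return self._run_query()[key]

    def __iter__(self):
        return iter(self._run_query())

def split(ids, amounts):
    """Cut `ids` into consecutive chunks whose sizes are given by `amounts`."""
    chunks, start = [], 0
    for amount in amounts:
        chunks.append(list(ids[start:start + amount]))
        start += amount
    return chunks

lazy = ReshuffledIds(range(10))
lazy_chunks = split(lazy, (2, 4, 3, 1))       # chunks may repeat or skip ids
frozen = list(lazy)                           # evaluate once, keep that order
frozen_chunks = split(frozen, (2, 4, 3, 1))   # chunks cover every id exactly once
```

Each lazy chunk is cut from a different shuffle, so the chunks are not guaranteed to partition the ids; once the order is frozen in a list, they always do.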
