Composite key querying in couchbase 4.0 - python

I got a view like this:
function (doc, meta) {
  if (doc.type) {
    var id = doc.id ? doc.id : "";
    var company = doc.company ? doc.company : "";
    var store = doc.store ? doc.store : "";
    emit([doc.type, id, company, store]);
  }
}
The documents all contain a type and some combination of the other three fields, depending on the type.
I want to query generically via this view with the following function:
def find_by_type_pageing_by_id_company_store(self, format_function=None, page=None, rows=None, recent=None, type=None, id="", company="", store="", include_docs=True):
    if not type:
        logger.error("No Type Provided in find by type query")
        raise exceptions.InvalidQueryParams("No Type Provided in find by type query")
    view = VIEW_BY_TYPE_VIN_COMPANY_STORE
    cb = self.get_cb_bucket()
    query = Query()
    # 'recent' and 'rows' are equivalent and will be unified to 'limit' here
    if recent and rows:
        raise exceptions.InvalidQueryParams(detail="Query may not contain both 'recent' and 'rows'")
    limit = rows or recent
    if limit:
        try:
            rows_per_page = int(limit)
        except ValueError:
            raise exceptions.InvalidQueryParams(detail="Query params 'recent' and 'rows' have to be integers")
        if rows_per_page > settings.PAGINATION_MAX_ROWS_LIMIT:
            raise exceptions.InvalidQueryParams(detail="Query params 'recent' and 'rows' may not exceed %s. "
                                                       "Use the additional param 'page=2', 'page=3', etc. to access "
                                                       "more objects" % settings.PAGINATION_MAX_ROWS_LIMIT)
        try:
            page = 1 if page is None else int(page)
        except ValueError:
            raise exceptions.InvalidQueryParams(detail="Query param 'page' has to be an integer")
        skip = rows_per_page * (page - 1)
        query.limit = rows_per_page
        query.skip = skip
    query.mapkey_range = [
        [type, id, company, store],
        [type, id + query.STRING_RANGE_END, company + query.STRING_RANGE_END, store + query.STRING_RANGE_END]
    ]
    rows = cb.query(view['doc'], view['view'], include_docs=include_docs, query=query)
    if format_function is None:
        format_function = self.format_function_default
    return_array = format_function(rows)
    return return_array
It works flawlessly when only querying for a certain type, or a type and an id range.
But if I want, for example, all docs of a certain type belonging to a company, regardless of id and store, docs of other companies are returned as well.
I tried this:
query.mapkey_range = [
    ["Vehicle", "", "abc", ""],
    ["Vehicle", query.STRING_RANGE_END, "abc", query.STRING_RANGE_END]
]
I know that the order of the values in the composite key matters somehow; that is probably why the query for an id range is successful.
But I could not find a detailed explanation of how the order matters or how to handle this use case.
Any idea or hint on how to cope with this?
Thank you in advance.

With compound keys, the order in emit determines the internal "sorting" of the index. A range query relies on this order.
In your case:
index contains all Vehicles
all the Vehicles are then sorted by id
for each similar id, Vehicles are sorted by company
for each similar id and company, Vehicles are then sorted by store
Let's take an example of 4 vehicles. Here is what the index would look like:
Vehicle,a,ACME,store100
Vehicle,c,StackOverflow,store1001
Vehicle,d,ACME,store100
Vehicle,e,StackOverflow,store999
Here is what happens with a range query:
The view engine finds the first row >= to the startKey from your range
It then finds the last one that is <= to the endKey of your range
It returns every row in between in the array
You can see how this leads to seemingly bad results. Keys are compared component by component, from left to right, and the first component that differs decides the order. For [["Vehicle", "", "ACME", ""], ["Vehicle", RANGE_END, "ACME", RANGE_END]] here is what happens:
row 1 (a) is identified as the lowest row matching the startKey
every row has Vehicle as its first component and an id (a, c, d, e) lower than RANGE_END, so every row already sorts before the endKey on the second component; the company and store components are never compared
row 4 (e) is therefore the upper bound
hence rows 1-4 are returned, including rows 2 and 4 from "StackOverflow"
TL;DR: Ordering in the emit matters; you cannot query with sparse "jokers" (wildcards) on the left side of the compound key.
Change the map function to emit([doc.type, company, store, id]) (most generic to least generic attribute) and it should work fine after you rework your query accordingly.
Here is a link from the doc explaining compound keys and ranges with dates: Partial Selection With Compound Keys
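The effect of reordering the key can be sketched in plain Python, without a Couchbase server. This is only an approximation under stated assumptions: the index is modelled as a sorted list of keys, Python's list comparison stands in for the view collation (the real collation of strings differs in details such as case handling), and RANGE_END, build_index and range_query are hypothetical names made up for this illustration.

```python
from bisect import bisect_left, bisect_right

# Stand-in for the SDK's string range end marker (assumption for this sketch).
RANGE_END = u"\u0fff"

# The four example vehicles from the answer above.
docs = [
    {"type": "Vehicle", "id": "a", "company": "ACME", "store": "store100"},
    {"type": "Vehicle", "id": "c", "company": "StackOverflow", "store": "store1001"},
    {"type": "Vehicle", "id": "d", "company": "ACME", "store": "store100"},
    {"type": "Vehicle", "id": "e", "company": "StackOverflow", "store": "store999"},
]


def build_index(docs, fields):
    """Model a view index: one compound key per doc, kept in sorted order."""
    return sorted([doc[f] for f in fields] for doc in docs)


def range_query(index, start_key, end_key):
    """Return every key k with start_key <= k <= end_key."""
    lo = bisect_left(index, start_key)
    hi = bisect_right(index, end_key)
    return index[lo:hi]


# Key order follows the suggested emit: type, company, store, id.
index = build_index(docs, ["type", "company", "store", "id"])

# Fix type and company on the left, leave store and id open on the right.
acme_rows = range_query(
    index,
    ["Vehicle", "ACME", "", ""],
    ["Vehicle", "ACME", RANGE_END, RANGE_END],
)
acme_ids = [row[3] for row in acme_rows]
```

With the company as the second component, the open-ended parts of the range sit on the right side of the key, so acme_ids contains only the ids of the two ACME vehicles.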

You have two options for querying your documents by a variable number/order of fields:
Use a multidimensional view (aka spatial view), which lets you omit parts of the compound key in the query. Here is an example of using such a view: http://developer.couchbase.com/documentation/server/4.0/views/sv-example2.html
Use N1QL, which lets you actually query on any number of fields dynamically. Make sure you add indexes for the fields you intend to query, and use the EXPLAIN statement to check that your queries execute as you expect them to. Here is how you use N1QL in Python: http://developer.couchbase.com/documentation/server/4.0/sdks/python-2.0/n1ql-queries.html
As you've already discovered, you cannot use a regular view, because you can only query it by the exact order of fields in your compound key.


Sqlalchemy single query for multiple rows from one column in one table

I have encountered some problems with sqlite3 and SQLAlchemy. For some time I have been trying to write a specific query, and so far I have failed. The database is composed of two tables, users and properties. Their schemas are shown below.
sqlite> .schema users
CREATE TABLE users (
id INTEGER NOT NULL,
name VARCHAR(50) NOT NULL,
PRIMARY KEY (id)
);
sqlite> .schema properties
CREATE TABLE properties (
id INTEGER NOT NULL,
property_number INTEGER,
user_id INTEGER,
PRIMARY KEY (id),
FOREIGN KEY(user_id) REFERENCES users (id)
);
The content of the users table is pretty straightforward, but properties deserves some explanation. In the property_number column I store different properties, each with its unique number; for example, property bald has number 3 and property tan has number 4. If a user has multiple properties, each of them occupies one row in the properties table. I chose this style so that new properties can be added easily, without messing with migrations and the like.
The problem is that I do not know how to write a query that involves multiple properties. My current best solution is to ask for every single property in a separate query. This gives me two lists of sets: one for the positive and one for the negative instances of the given properties (positive means properties I want the user to have, negative means properties I do not want the user to have). In the next step I take the difference of the two and get the final list, which contains the ids of the users who have the properties I am interested in. Then I query for those users' names. It seems very complicated, and maybe it is, but it is certainly ugly. I also do not like making a separate query for every single property. Here is the Python code, if someone is interested.
def prop_dicts():
    """Create dictionaries of properties
    contained in table properties in db.
    Returns:
        tuple:
            prop_names (dict)
            prop_values (dict)."""
    prop_names = {'higins': 10000,
                  'tall': 1,
                  'fat': 2,
                  'bald': 3,
                  'tan': 4,
                  'hairry': 5}
    prop_values = {1000: 'higins',
                   1: 'tall',
                   2: 'fat',
                   3: 'bald',
                   4: 'tan',
                   5: 'hairry'}
    dictionaries = (prop_names, prop_values)
    return dictionaries
def list_of_sets_intersection(set_list):
    """Makes intersection of all sets in list.
    Args:
        param1 (list): list containing sets to check.
    Returns:
        set (values): contains intersected values."""
    if not set_list:
        return set()
    result = set_list[0]
    for s in set_list[1:]:
        result &= s
    return result
def list_of_sets_union(set_list):
    """Makes union of elements in all sets in list.
    Args:
        param1 (list): list containing sets to check.
    Returns:
        set (values): contains union values."""
    if not set_list:
        return set()
    result = set_list[0]
    for s in set_list[1:]:
        result |= s
    return result
def db_search():
    """Search database against positive and negative values.
    Returns:
        list (sets): one set in list for every property in
        table properties db."""
    n, v = prop_dicts()
    positive = [2, 3]
    negative = [4, 5]
    results_p = []
    results_n = []
    #Positive properties.
    for element in xrange(0, len(positive)):
        subresult = []
        for u_id, in db.query(Property.user_id).\
                filter_by(property_number = positive[element]):
            subresult.append(u_id)
        subresult = set(subresult)
        results_p.append(subresult)
    #Negative properties.
    for element in xrange(0, len(negative)):
        subresult = []
        for u_id, in db.query(Property.user_id).\
                filter_by(property_number = negative[element]):
            subresult.append(u_id)
        subresult = set(subresult)
        results_n.append(subresult)
    print 'positive --> ', results_p
    print 'negative --> ', results_n
    results_p = list_of_sets_intersection(results_p)
    results_n = list_of_sets_union(results_n)
    print 'positive --> ', results_p
    print 'negative --> ', results_n
    final_result = results_p.difference(results_n)
    return list(final_result)
print db_search()
Is there a way to do it in a single query? I am new to databases, so I apologize if the question seems basic. There are so many possibilities that I really do not know how to do it the "right" way. I have searched extensively on this topic, and the best solution I found used a WHERE clause with the AND operator. But that does not work when both conditions apply to the same column of one table.
SELECT user_id FROM properties WHERE property_number=3 AND property_number=4;
Or in SQLAlchemy:
db.query(User.id).join(Property).filter(and_(Property.property_number == 3, Property.property_number == 4)).all()
This SQLAlchemy example may contain errors, because I cannot test it here, but it should get the point across.
You can do this by using aggregation
SELECT user_id
FROM properties
WHERE property_number in (3, 4)
GROUP BY user_id
HAVING count(*) = 2
In SQLAlchemy
from sqlalchemy import func
properties = [3, 4]
db.session.query(Property.user_id)\
.filter(Property.property_number.in_(properties))\
.group_by(Property.user_id)\
.having(func.count()==len(properties))\
.all()
Update, to handle both positive and negative properties:
positive = [2, 3]
negative = [4, 5]
positive_query = db.session.query(Property.user_id)\
.filter(Property.property_number.in_(positive))\
.group_by(Property.user_id)\
.having(func.count()==len(positive))
negative_query = db.session.query(Property.user_id)\
.filter(Property.property_number.in_(negative))\
.distinct()
final_result = positive_query.except_(negative_query).all()
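The same idea can be tried directly against SQLite with the standard sqlite3 module. The following is a minimal sketch, not the answer's own code: users_with_properties is a hypothetical helper, both lists are assumed to be non-empty, and COUNT(DISTINCT property_number) is used instead of count(*) so that an accidentally duplicated (user_id, property_number) row does not distort the count.

```python
import sqlite3


def users_with_properties(conn, positive, negative):
    """Return ids of users having ALL positive and NONE of the negative properties."""
    pos_marks = ",".join("?" * len(positive))
    neg_marks = ",".join("?" * len(negative))
    sql = (
        "SELECT user_id FROM properties "
        "WHERE property_number IN (%s) "
        "GROUP BY user_id "
        "HAVING COUNT(DISTINCT property_number) = ? "
        "EXCEPT "
        "SELECT user_id FROM properties "
        "WHERE property_number IN (%s)" % (pos_marks, neg_marks)
    )
    # Parameter order matches the placeholders: positives, count, negatives.
    params = list(positive) + [len(set(positive))] + list(negative)
    return sorted(row[0] for row in conn.execute(sql, params))


# Small in-memory data set, made up for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE properties ("
    "id INTEGER PRIMARY KEY, property_number INTEGER, user_id INTEGER)"
)
conn.executemany(
    "INSERT INTO properties (property_number, user_id) VALUES (?, ?)",
    [
        (2, 1), (3, 1),          # user 1: fat, bald
        (2, 2), (3, 2), (4, 2),  # user 2: fat, bald, tan
        (2, 3),                  # user 3: fat only
    ],
)

matching_users = users_with_properties(conn, positive=[2, 3], negative=[4, 5])
```

Only user 1 has both positive properties and none of the negative ones, so matching_users holds just that id.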

Why/how does iterating over a list and calling 'pass' each time fix this function?

I have written the following function:
def auto_update_ratings(amounts, assessment_entries_qs, lowest_rating=-1):
start = 0
rating = lowest_rating
ids = assessment_entries_qs.values_list('id', flat=True)
for i in ids: # I have absolutely no idea why this seems to be required:
pass # without this loop, the last AssessmentEntries fail to update
# in the following for loop.
for amount in amounts:
end_mark = start + amount
entries = ids[start:end_mark]
a = assessment_entries_qs.filter(id__in=entries).update(rating=rating)
start = end_mark
rating += 1
It does what it is supposed to do (i.e. update the relevant number of entries in assessment_entries_qs with each rating (starting at lowest_rating) as specified in amounts). Here is a simple example:
>>> assessment_entries = AssessmentEntry.objects.all()
>>> print [ae.rating for ae in assessment_entries]
[None, None, None, None, None, None, None, None, None, None]
>>>
>>> auto_update_ratings((2,4,3,1), assessment_entries, 1)
>>> print [ae.rating for ae in assessment_entries]
[1, 1, 2, 2, 2, 2, 3, 3, 3, 4]
However, if I do not iterate through ids before iterating through amounts, the function only updates a subset of the queryset: with my current test data (approximately 250 AssessmentEntries in the queryset), it always results in exactly 84 AssessmentEntries not being updated.
Interestingly, it is always the last iteration of the second for loop that does not result in any updates (although the rest of the code in that iteration does execute properly), as well as a portion of the previous iteration. The querysets are ordered with order_by('?') prior to being passed to this function, and the intended results are achieved if I simply add the previous 'empty' for loop, so it does not appear to be an issue with my data.
A few more details, just in case they prove to be relevant:
AssessmentEntry.rating is a standard IntegerField(null=True,blank=True).
I am using this function purely for testing purposes, so I have only been executing it from iPython.
Test database is SQLite.
Question: Can someone please explain why I appear to need to iterate through ids, despite not actually touching the data in any way, and why without doing so the function still (sort of) executes correctly, but always fails to update the last few items in the queryset despite apparently still iterating through them?
QuerySets and QuerySet slicing are evaluated lazily. Iterating ids executes the query and makes ids behave like a static list instead of a QuerySet. So when you loop through ids, it causes entries later on to be a fixed set of values; but if you don't loop through ids, then entries is just a subquery with a LIMIT clause added to represent the slicing you do.
Here is what is happening in detail:
def auto_update_ratings(amounts, assessment_entries_qs, lowest_rating=-1):
    # assessment_entries_qs is an unevaluated QuerySet
    # from your calling code, it would probably generate a query like this:
    # SELECT * FROM assessments ORDER BY RANDOM()
    start = 0
    rating = lowest_rating
    ids = assessment_entries_qs.values_list('id', flat=True)
    # ids is a ValuesListQuerySet that adds "SELECT id"
    # to the query that assessment_entries_qs would generate.
    # So ids is now something like:
    # SELECT id FROM assessments ORDER BY RANDOM()
    # we omit the loop
    for amount in amounts:
        end_mark = start + amount
        entries = ids[start:end_mark]
        # entries is now another QuerySet with a LIMIT clause added:
        # SELECT id FROM assessments ORDER BY RANDOM() LIMIT amount OFFSET start
        # When filter() gets a QuerySet, it adds a subquery
        a = assessment_entries_qs.filter(id__in=entries).update(rating=rating)
        # FINALLY, we now actually EXECUTE a query which is something like this:
        # UPDATE assessments SET rating=? WHERE id IN
        # (SELECT id FROM assessments ORDER BY RANDOM() LIMIT amount OFFSET start)
        start = end_mark
        rating += 1
Since the subquery in entries is re-executed on every update and returns its rows in a new random order each time, the slicing you do is meaningless! This function does not have deterministic behavior.
However when you iterate ids you actually execute the query, so your slicing has deterministic behavior again and the code does what you expect.
Let's see what happens when you use a loop instead:
ids = assessment_entries_qs.values_list('id', flat=True)
# Iterating ids causes the query to actually be executed
# This query was sent to the DB:
# SELECT id FROM assessments ORDER BY RANDOM()
for id in ids:
    pass
# ids has now been "realized" and contains the *results* of the query
# e.g., [5,1,2,3,4]
# Iterating again (or slicing) will now return values rather than modify the query
for amount in amounts:
    end_mark = start + amount
    entries = ids[start:end_mark]
    # because ids was executed, entries contains definite values
    # When filter() gets actual values, it adds a simple condition
    a = assessment_entries_qs.filter(id__in=entries).update(rating=rating)
    # The query executed is something like this:
    # UPDATE assessments SET rating=? WHERE id IN (5,1)
    # "(5,1)" will change on each iteration, but it will always be a set of
    # scalar values rather than a subquery.
    start = end_mark
    rating += 1
If you ever need to eagerly evaluate a QuerySet to get all its values at a moment in time, rather than perform a do-nothing iteration just convert it to a list:
ids = list(assessment_entries_qs.values_list('id', flat=True))
Also the Django docs go into detail about when exactly a QuerySet is evaluated.
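The difference between slicing a lazy, randomly ordered query and slicing a materialized list can be imitated without Django. The sketch below is only an analogy: LazyRandomIds and assign_ratings are hypothetical names, not Django APIs. The class re-shuffles its rows on every slice, much as an unevaluated order_by('?') QuerySet re-runs its query each time it is used.

```python
import random


class LazyRandomIds(object):
    """Hypothetical stand-in for an unevaluated QuerySet ordered by '?'."""

    def __init__(self, ids):
        self._ids = list(ids)

    def _run_query(self):
        # Every "execution" returns the rows in a fresh random order.
        rows = self._ids[:]
        random.shuffle(rows)
        return rows

    def __getitem__(self, item):
        # Slicing re-runs the query, like a subquery with LIMIT/OFFSET.
        return self._run_query()[item]

    def __iter__(self):
        return iter(self._run_query())


def assign_ratings(ids, amounts, lowest_rating):
    """Give consecutive slices of ids consecutive ratings."""
    ratings = {}
    start = 0
    rating = lowest_rating
    for amount in amounts:
        for entry_id in ids[start:start + amount]:
            ratings[entry_id] = rating
        start += amount
        rating += 1
    return ratings


lazy_ids = LazyRandomIds(range(10))

# list() plays the role of evaluating the QuerySet once: the order is now fixed,
# so the slices are disjoint and together cover every id exactly once.
frozen_ids = list(lazy_ids)
ratings = assign_ratings(frozen_ids, (2, 4, 3, 1), 1)
```

Passing lazy_ids itself to assign_ratings would slice a freshly shuffled result on each iteration, so some ids could be rated twice and others not at all, which mirrors the missing updates described in the question.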

error about get data

I have a function:
def tong_thoigian (self,cr,uid,ids,context={}):
    obj=self.browse(cr,uid,ids,context=context)[0]
    cr.execute('''select name,giolam from x_giolam where name=%s'''%(obj.ma_luong))
    kq=cr.fetchall()
    tong=0.00000
    for i in kq:
        tong+=kq[1]
    self.write(cr,uid,ids,{'tonggiolam':tong},context=context)
And this is the table x_giolam:
class x_giolam(osv.osv):
    _name = 'x_giolam'
    _description = 'Gio Lam'
    _columns = {
        'name': fields.integer('Lọai',size=64,required="true"),
        'giolam' : fields.float('Gio lam',size=64,required="True"),
        'time_in': fields.char('Gio vào',size=20),
        'time_out' :fields.char('Gio về',size=20),
        'congviec' :fields.char('Cong viec',size=50),
    }
x_giolam()
Here 'self' is the table x_salary. I don't think its details are important, because I only want to write a function that sums the salary of a staff member where name equals ma_luong of table x_salary.
The error is:
IndexError: list index out of range
The type of giolam is float.
I am writing this in OpenERP, and I think the error is in the line 'tong+=kq[1]'.
How can I fix it?
Thanks!
Using my magic crystal ball, I'm guessing that cr.execute is a call to the standard database API. So cr.fetchall() will return a list of rows. However, it seems that your SQL is returning only a single row.
You probably mean tong += kq[0][1], i.e. the second column (giolam) of the first row of the result. Alternatively, use cr.fetchone() to get just a single row; then you can keep it as kq[1]. Either way, you should check that your db call actually returns results.
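If the loop in the question is instead meant to add up giolam over every matching row, each iteration should read from the current row rather than from the result list. Here is a minimal sketch of that reading, using the standard sqlite3 module as a stand-in for the OpenERP cursor; total_hours and the sample data are made up for this illustration. Note that sqlite3 uses ? as its parameter placeholder, while the PostgreSQL driver behind OpenERP uses %s passed as a separate argument to execute.

```python
import sqlite3


def total_hours(cursor, ma_luong):
    """Sum the giolam column over all rows whose name equals ma_luong."""
    # The value is passed as a query parameter instead of being formatted into the SQL.
    cursor.execute("SELECT name, giolam FROM x_giolam WHERE name = ?", (ma_luong,))
    rows = cursor.fetchall()
    # row[1] is the giolam column of the current row; an empty result sums to 0.
    return sum(row[1] for row in rows)


# In-memory table with made-up sample rows.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE x_giolam (name INTEGER, giolam REAL)")
cur.executemany(
    "INSERT INTO x_giolam (name, giolam) VALUES (?, ?)",
    [(1, 2.5), (1, 4.0), (2, 8.0)],
)

hours_for_1 = total_hours(cur, 1)
hours_for_3 = total_hours(cur, 3)
```

Because the sum runs over whatever rows come back, the function also copes with zero or one matching row without raising an IndexError.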

App engine query retrieve data with index reference

class Entry(Base):
    amount = db.IntegerProperty()
entries = Entry.gql("WHERE amount > 0")
Is there a way to access the entries result by index, like an array? For example:
my_entry = entries[4]
entries = [x for x in Entry.gql("WHERE amount > 0")]
The distinction between this and previous answers is that it filters at the datastore rather than in the handler, and doesn't require you to guess the maximum number of entities that will be returned.
You could use the fetch() method on the Query instance:
class Entry(Base):
    amount = db.IntegerProperty()
entries = Entry.gql("WHERE amount > 0").fetch(5)
print entries[4].amount
You have to call fetch(), which will give you a list of entries. Then my_entry = entries[4] will give you the fifth object. What you were trying to do is index the GQL query object itself, which won't work. Try this:
class Entry(Base):
    amount = db.IntegerProperty()
entries = Entry.gql("WHERE amount > 0").fetch(1000)
print entries[4].amount
If you want to refer to a single object at a specific index in your query result, you can use the fetch method of db.Query with the offset parameter:
entry = Entry.gql("WHERE amount > 0").fetch(1, offset=4)[0]
print entry.amount
However, if you want to refer to multiple objects from the query results, fetch them all and index them like a normal Python list:
entries = Entry.gql("WHERE amount > 0").fetch(1000)
print entries[4].amount
print entries[5].amount
print entries[7].amount
# etc.
entries = [entry for entry in Entry.all() if entry.amount > 0]
print entries[4]

GQL does not work for GET parameters for keys

I am trying to filter results by key in GQL in Python, but neither direct comparison nor casting to int works. Therefore, I am forced to use the workaround shown in the uncommented lines below. Any clues?
row = self.request.get("selectedrow")
#mydbobject = DbModel.gql("WHERE key=:1", row).fetch(1)
#mydbobject = DbModel.gql("WHERE key=:1", int(row)).fetch(1)#invalid literal for int() with base 10
#print mydbobject,row
que = db.Query(DbModel)
results = que.fetch(100)
mydbobject = None
for item in results:
    if item.key().__str__() in row:
        mydbobject = item
EDIT 1: one more attempt that does not retrieve the record, even though the key exists in the Datastore along with the record:
mydbobject = DbModel.gql("WHERE key = KEY('%s')"%row).fetch(1)
Am I correct in my assumption that you basically just want to retrieve an object with a particular key? If so, the get and get_by_id methods may be of help:
mydbobject = DbModel.get_by_id(int(self.request.get("selectedrow")))
The error "invalid literal for int()" indicates that the parameter passed to int was not a string representing an integer. Try printing the value of "row" for debugging; I bet it is an empty string.
The correct way to retrieve an element by its key is simply to use the method "get" or "get_by_id".
In your case:
row = self.request.get("selectedrow")
mydbobject = DbModel.get(row)
