Use .contains() to retrieve values from DynamoDB - python

I'm new to NoSQL databases and am stuck on a problem. I want to get the keys from a DynamoDB table whose attribute contains a specific value. I know that for key equality I can use:
from boto3.dynamodb.conditions import Key

response = table.query(
    KeyConditionExpression=Key('year').eq(1992)
)
But I can't use:
response = table.query(
    KeyConditionExpression=Key('year').contain('1992')
)
The error is:
Key object has no attribute contain.

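DynamoDB doesn't allow contains on key attributes in the Query API. For the partition key you can only use an equality condition; the sort key additionally supports comparisons, BETWEEN and begins_with.
CONTAINS can't be used in a KeyConditionExpression at all; it is only available in filter (and condition) expressions, where it works on String, Set and List attributes.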
CONTAINS: Checks for a subsequence, or value in a set.
CONTAINS is supported for lists: When evaluating "a CONTAINS b", "a" can be a list; however, "b" cannot be a set, a map, or a list.

People looking at this question in 2023 (at least with the current version of Boto3) may be misled by the other answers here.
You can now check if a value contains a substring by using the Attr object, not Key.
from boto3.dynamodb.conditions import Attr

response = table.scan(
    FilterExpression=Attr('key').contains(string)  # string: the substring to look for
)
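A Scan with a FilterExpression still reads the whole table, and each call returns at most 1 MB of data (measured before the filter is applied), so in practice the call above has to be repeated with ExclusiveStartKey until no LastEvaluatedKey comes back. A minimal sketch of that loop, assuming `table` is a boto3 Table resource (anything with a compatible scan method works); scan_all and the 'title' attribute are hypothetical names:

```python
def scan_all(table, **scan_kwargs):
    """Yield every item matched by a DynamoDB Scan, following pagination.

    `table` is assumed to be a boto3 Table resource (or anything with a
    compatible scan(**kwargs) method returning a dict).
    """
    while True:
        response = table.scan(**scan_kwargs)
        # each page holds at most 1 MB of data read *before* the filter is applied
        yield from response.get('Items', [])
        last_key = response.get('LastEvaluatedKey')
        if last_key is None:
            break  # no more pages
        scan_kwargs['ExclusiveStartKey'] = last_key  # resume where this page stopped


# Example (requires boto3, credentials and a real table; 'title' is a hypothetical attribute):
# from boto3.dynamodb.conditions import Attr
# matches = list(scan_all(table, FilterExpression=Attr('title').contains('1992')))
```

Because the filter runs after the read, a Scan consumes read capacity for every item it touches, not just the matches.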

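Here you can check all the methods of the Key class:
https://github.com/boto/boto3/blob/develop/boto3/dynamodb/conditions.py
Key inherits from AttributeBase, which only provides comparisons such as eq, lt, gt, between and begins_with; there is nothing like contains. contains is defined on the Attr class instead, which is why it works in a FilterExpression but not in a KeyConditionExpression.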

Related

Does Python dictionary ".values()" return the list in a particular order? [Python 2]

I'm using a dictionary in Python 2 and am wondering whether dictionary.values() returns a list of values in a particular order,
i.e. the order in which they were added to the dictionary?
Example
dict = {}
dict[1] = "potato"
dict[2] = "tomato"
dict[3] = "orange"
dict[4] = "carrot"
list_val = dict.values()
Is list_val in the order potato->tomato->orange->carrot, or is it in some other order?
This is a simple example; I mean, will it return the values in the same order for more complex structures as well?
For example, replace the strings with dictionaries,
making it a dictionary of dictionaries.
NOTE: This was answered in the other thread, which is linked.
Another good answer to refer to for a conceptual explanation is:
This
UPDATE:
After reading some documentation on hash tables (which Python's dict uses to store its keys), I found that the order in which you assign keys/values is not what determines their location in the hash table. There is a really good video HERE that does a great job of explaining how a hash table works and what happens when a key is assigned to it.
Basically, each key is given a hash number, and it is placed in the hash table according to that hash number rather than the order in which it was assigned. The key goes into the slot that corresponds to its hash number unless something is already assigned to that slot. If another entry already occupies that slot, Python does some calculations to choose the next best location, which can be before or after the desired one. HERE is some documentation, but I highly recommend watching the video I linked, as it does a great job of explaining this concept.
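To make the placement idea concrete, here is a toy open-addressing hash table with linear probing. It is a simplified model, not CPython's actual dict: the real implementation uses a more elaborate probe sequence and resizes as it fills, and since Python 3.7 dict is guaranteed to preserve insertion order anyway. ToyHashTable is a hypothetical name.

```python
class ToyHashTable:
    """Open addressing with linear probing: a simplified model, not CPython's real dict."""

    def __init__(self, size=8):
        self.slots = [None] * size  # each slot holds (key, value) or None

    def insert(self, key, value):
        start = hash(key) % len(self.slots)  # the "desired location"
        for step in range(len(self.slots)):
            i = (start + step) % len(self.slots)  # on collision, try the next slot (wrapping)
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return
        raise RuntimeError('table is full (this toy never resizes)')

    def values(self):
        # iteration walks the slots, so order follows hash positions, not insertion time
        return [slot[1] for slot in self.slots if slot is not None]


table = ToyHashTable()
for key, value in [(4, 'carrot'), (1, 'potato'), (3, 'orange'), (2, 'tomato')]:
    table.insert(key, value)
print(table.values())  # ['potato', 'tomato', 'orange', 'carrot']
```

Small non-negative integers hash to themselves in CPython, so even though the keys were inserted as 4, 1, 3, 2, the values come out in slot order 1, 2, 3, 4.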

CouchDB: query with multiple keys, get all docs by one of the keys

I have a view in CouchDB that returns results with a two-element key:
[<string_key>, <date>]
I need to ignore the second element, "date", and return all docs that match the "string" key. But when I try to ignore the "date" element, the view returns all values. Any advice?
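You need to use startkey and endkey for this. You can ignore the second element by using {} in the end key, e.g. startkey=["foo"] and endkey=["foo", {}]. Since objects sort after every other JSON type in CouchDB's view collation, this returns all the rows whose first key element is "foo".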
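A minimal sketch of building such a view request with only the standard library. CouchDB expects the keys JSON-encoded in the query string; the server URL, database, design-document and view names are hypothetical placeholders, and view_range_params/view_url are illustrative helper names.

```python
import json
from urllib.parse import urlencode


def view_range_params(string_key):
    """Query parameters selecting every row whose key is [string_key, <anything>]."""
    return {
        # a shorter array sorts before any longer array with the same prefix
        "startkey": json.dumps([string_key]),
        # {} collates after null, booleans, numbers, strings and arrays: "any date"
        "endkey": json.dumps([string_key, {}]),
        "include_docs": "true",
    }


def view_url(base_url, db, ddoc, view, string_key):
    """Build the GET URL for the view query (all names are placeholders)."""
    query = urlencode(view_range_params(string_key))
    return f"{base_url}/{db}/_design/{ddoc}/_view/{view}?{query}"


# Example:
# view_url("http://localhost:5984", "mydb", "mydesign", "by_key_and_date", "foo")
```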
For reference you can look at the summary of this article: http://ryankirkman.com/2011/03/30/advanced-filtering-with-couchdb-views.html

Python-Eve: Prevent inserting duplicates without using unique fields

I am trying to prevent inserting duplicate documents by the following approach:
Get a list of all documents from the desired endpoint, which will contain all the documents in JSON format. This list is called available_docs.
Use a pre_POST_<endpoint> hook to handle the request before the data is inserted. I am not using the on_insert hook, since I need to do this before validation.
Since the hook has access to the request object, use request.json to get the JSON payload.
Check whether request.json is already contained in available_docs.
Insert the new document only if it is not a duplicate; abort otherwise.
Using this approach I got the following snippet:
import flask

def check_duplicate(request):
    # available_docs: the list of existing documents fetched beforehand
    if request.json not in available_docs:
        print('Not a duplicate')
    else:
        print('Duplicate')
        flask.abort(422, description='Document is a duplicate and already in database.')
The available_docs list looks like this:
available_docs = [{'foo': ObjectId('565e12c58b724d7884cd02bb'), 'bar': [ObjectId('565e12c58b724d7884cd02b9'), ObjectId('565e12c58b724d7884cd02ba')]}]
The payload request.json looks like this:
{'foo': '565e12c58b724d7884cd02bb', 'bar': ['565e12c58b724d7884cd02b9', '565e12c58b724d7884cd02ba']}
As you can see, the only difference between the document passed to the API and the document already stored in the DB is the data type of the IDs. Because of that, the if-statement in my snippet above evaluates to True and treats the document as not being a duplicate, whereas it definitely is one.
Is there a way to check whether a passed document is already in the database? I can't use unique fields, since only the combination of all document fields needs to be unique. There is a unique identifier (which I left out of this example), but it isn't suitable for this comparison since it is essentially a timestamp.
I think casting the IDs at the keys foo and bar to ObjectIds would do the trick, but I don't know how to do this, since I don't know where to get the ObjectId data type from.
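The in-memory comparison from the question can also be made type-agnostic by normalising both sides before the `in` check, instead of casting. A sketch, assuming the stored documents come from PyMongo, whose ObjectId converts to its 24-character hex string under str() (ObjectId itself is importable with `from bson import ObjectId`, from the bson package that ships with PyMongo). stringify_ids and is_duplicate are hypothetical helper names; note the helper stringifies any non-JSON value, datetimes included.

```python
def stringify_ids(value):
    """Recursively replace non-JSON scalars (e.g. bson ObjectId) with their str() form."""
    if isinstance(value, dict):
        return {key: stringify_ids(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [stringify_ids(item) for item in value]
    if value is None or isinstance(value, (str, int, float, bool)):
        return value  # already JSON-compatible, keep as-is
    return str(value)  # str(ObjectId(...)) is the 24-character hex string


def is_duplicate(payload, available_docs):
    """True if payload equals some stored document once ids are compared as strings."""
    normalized = [stringify_ids(doc) for doc in available_docs]
    return stringify_ids(payload) in normalized
```

This still fetches and scans every stored document, so it only fixes the type mismatch, not the cost of the approach.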
Your approach would be much slower than setting a unique rule for the field.
Since, from your example, you are going to compare objectids, can't you simply use those as the _id field for the collection? In Mongo (and Eve of course) that field is unique by default. Actually, you typically don't even define it. You would not need to do anything at all, as a POST of a document with an already existing id would fail right away.
If you can't go that way (maybe you need to compare a different ObjectId field and, for some reason, you can't simply set a unique rule on it), I would query the db for the field value instead of fetching all the documents from the db and scanning them sequentially in code. Something like db.collection.find_one({db_field: new_document_field_value}): if that returns a document, the new document is a duplicate. Make sure db_field is indexed (fields with a unique rule are usually indexed too).
EDIT after the comments. A trivial implementation would probably be something like this:
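import flask
from bson import ObjectId
from eve import Eve

def pre_POST_callback(resource, request):
    # retrieve mongodb collection using eve connection
    docs = app.data.driver.db['docs']
    # <value> in the original; assumed here to be the payload's 'foo' converted to ObjectId
    value = ObjectId(request.json['foo'])
    if docs.find_one({'foo': value}):
        flask.abort(422, description='Document is a duplicate and already in database.')

app = Eve()
app.on_pre_POST += pre_POST_callback  # register the hook for POSTs on every resource
app.run()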
Here's my approach to preventing duplicate records:
from bson import ObjectId
from flask import abort

def on_insert_subscription(items):
    c_subscription = app.data.driver.db['subscription']
    user = decode_token()  # the app's own helper that decodes the JWT
    if user:
        for item in items:
            if c_subscription.find_one({
                'topic': ObjectId(item['topic']),
                'client': ObjectId(user['user_id'])
            }):
                abort(422, description="Client already subscribed to this topic")
            else:
                item['client'] = ObjectId(user['user_id'])
    else:
        abort(401, description='Please provide proper credentials')
What I'm doing here is creating subscriptions for clients. If a client is already subscribed to a topic I throw 422.
Note: the client ID is decoded from the JWT token.

Override return values of a column in SQLAlchemy; hybrid property or custom type?

I need some quick advice. I have a table column that can contain NULL or one or more strings separated by ';'.
At the moment the column is defined in the model as usual:
aliases = Column(String(255))
I have a hybrid property that splits the strings and returns a list:
@hybrid_property  # from sqlalchemy.ext.hybrid import hybrid_property
def my_aliases(self):
    if self.aliases:
        return [i.strip() for i in self.aliases.split(';')]
How can I change the model's default behaviour to get rid of the now-useless self.aliases and always get the list (or None) from self.my_aliases?
Is it possible to override the attribute?
Using mapper or the declarative API you can create a computed attribute in your class. Options would include:
An attribute computed from a query
Using a descriptor to parse/assemble your semicolon-separated list
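The descriptor option can be sketched in plain Python. The assumption is that you rename the mapped attribute (for example `_aliases = Column('aliases', String(255))` in a declarative model, which keeps the database column name) and expose `aliases` through a descriptor that splits on read and joins on write. SemicolonList and Person are hypothetical names, and queries still have to filter on `_aliases`, since the descriptor is invisible to SQL.

```python
class SemicolonList:
    """Descriptor exposing a ';'-separated string attribute as a list (or None)."""

    def __init__(self, storage_attr):
        self.storage_attr = storage_attr  # name of the attribute holding the raw string

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class itself
        raw = getattr(obj, self.storage_attr)
        if not raw:
            return None  # NULL / empty column
        return [part.strip() for part in raw.split(';')]

    def __set__(self, obj, values):
        # an empty or None list is stored as NULL
        setattr(obj, self.storage_attr, ';'.join(values) if values else None)


class Person:
    # in a declarative model this would be: _aliases = Column('aliases', String(255))
    aliases = SemicolonList('_aliases')

    def __init__(self, raw=None):
        self._aliases = raw
```

Usage: `Person('a; b ;c').aliases` gives `['a', 'b', 'c']`, and assigning `person.aliases = ['x', 'y']` stores `'x;y'` in the underlying column attribute.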
And I'm assuming here that you don't have the option of changing the columns of your tables. But if you do, putting lists that you have to parse inside a single column is a "smell". For example, what happens when your list outgrows the column's 255 characters? It is better to have a separate table for that data and use a straightforward join to get your aliases list.
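A minimal sketch of that normalised layout using the standard-library sqlite3 module; the person/alias tables, their columns and the sample rows are hypothetical. Each alias becomes its own row, and a join returns the list.

```python
import sqlite3


def demo_connection():
    """In-memory database with hypothetical person/alias tables (one row per alias)."""
    conn = sqlite3.connect(':memory:')
    conn.executescript("""
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE alias (
            person_id INTEGER NOT NULL REFERENCES person(id),
            alias TEXT NOT NULL
        );
        INSERT INTO person (id, name) VALUES (1, 'Ada'), (2, 'Bob');
        INSERT INTO alias (person_id, alias) VALUES (1, 'Countess'), (1, 'Enchantress');
    """)
    return conn


def aliases_for(conn, name):
    """Return the aliases of the named person via a join (empty list if none)."""
    rows = conn.execute(
        "SELECT a.alias FROM person AS p JOIN alias AS a ON a.person_id = p.id "
        "WHERE p.name = ? ORDER BY a.alias",
        (name,),
    ).fetchall()
    return [alias for (alias,) in rows]
```

With this layout, a person without aliases yields an empty list rather than NULL, and there is no length limit on the number of aliases.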

Ordered Dictionary in Python: add to MongoDB

I have a list of two-element tuples, where the first element is a string (the name of some parameter) and the second element is a float (the value of that parameter). For example,
thelist = [('costperunit', 200), ('profit', 10000), ('fixedcost', 5000),
('numpeople', 300)]
There are many more such tuples, and the names are different in the real case. I want to add these to a MongoDB database as key: value pairs. Here is how I want to add it:
db.collection.insert( {'paramvalues': {'costperunit': 200, 'profit': 10000,
'fixedcost': 5000, 'numpeople': 300} } )
One way to do this is:
dictform = dict(thelist)
db.collection.insert( {'paramvalues': dictform} )
This, however, does not preserve the order of the parameter names and values, because dict changes the order.
I tried
from collections import OrderedDict
dictform = OrderedDict(thelist)
db.collection.insert( {'paramvalues': dictform} )
This preserves the original order of the parameter names and values; however, it inserts them as a list of lists.
I am very new to MongoDB and trying to learn it. Is there a trick either in Python or in MongoDB that would achieve what I want? The reason I want the value of the key paramvalues in the MongoDB database to be a dictionary (or JavaScript object) is that I can then filter results using the value of some parameter. For example, I can do:
db.collection.find( {'paramvalues.costperunit': 200} )
If you are sure there is no way to do this, I would appreciate it if you let me know.
Thanks.
PyMongo offers an ordered subclass of dict, bson.son.SON (http://api.mongodb.org/python/current/api/bson/son.html), for cases where you need ordering, such as sending commands.
Dicts in Python (before 3.7) and objects in JavaScript/BSON (MongoDB) should not be relied on to be ordered. Either store an explicit sort-index number as part of each dict/object and sort at the application level, or insert your data into a list, which is of course ordered.
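On Python 3.7 and later a plain dict keeps insertion order as a language guarantee, so dict(thelist) no longer scrambles the parameters, and, as far as I know, PyMongo encodes a mapping's keys in iteration order. A minimal sketch under that assumption; build_document is a hypothetical helper, and the insert is left as a comment because it needs a running MongoDB:

```python
def build_document(pairs):
    """Turn [(name, value), ...] into {'paramvalues': {name: value, ...}}, keeping order."""
    # On Python 3.7+ dict preserves insertion order; on older versions use an
    # ordered mapping such as bson.son.SON (see the answer above) instead.
    return {'paramvalues': dict(pairs)}


thelist = [('costperunit', 200), ('profit', 10000), ('fixedcost', 5000), ('numpeople', 300)]
doc = build_document(thelist)
# With PyMongo 3.x/4.x: db.collection.insert_one(doc)
# Filtering on a nested field still works: db.collection.find({'paramvalues.costperunit': 200})
```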
