Python boto SimpleDB with nested dictionaries

Am I correct to assume that nested dictionaries are not supported in aws simpledb? Should I just serialize everything into json and push to the database?
For example,
test = dict(company='test company', users={'username':'joe', 'password': 'test'})
This gives test with keys of 'company' and 'users'; however, once stored, the value of 'users' is just represented as a string.

Simply, yes: SimpleDB supports only a first level of keys.
So if you want to store data with deeper key nesting, you will have to serialize the data to a string, and you will not be able to use simple select commands to query the nested data (you can still match it as a string, but you will not have direct access to sub-key values).
Note that one key (in one record) can hold multiple values, but this behaves like a list (often used to store multiple tags), not a dictionary.
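The serialize-to-JSON approach can be sketched as follows, using the dict from the question; the boto put/get calls themselves are omitted, since only the flattening step matters here:

```python
import json

# Nested dict from the question; SimpleDB stores only flat string attributes,
# so the nested 'users' dict must be serialized to a JSON string first.
test = dict(company='test company',
            users={'username': 'joe', 'password': 'test'})

flat = {key: json.dumps(value) if isinstance(value, dict) else value
        for key, value in test.items()}
# flat['users'] is now a plain string, suitable as a SimpleDB attribute value

# After reading the item back, the nested part must be decoded again:
users = json.loads(flat['users'])
assert users['username'] == 'joe'
```

The trade-off described above applies: a select can still match the 'users' attribute as an exact string, but not query 'username' inside it.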

Related

AWS DynamoDB execute_statement Without Data Types in Python

I am using boto3 to query my DynamoDB table using PartiQL,
dynamodb = boto3.client(
'dynamodb',
aws_access_key_id='<aws_access_key_id>',
aws_secret_access_key='<aws_secret_access_key>',
region_name='<region_name>'
)
resp = dynamodb.execute_statement(Statement='SELECT * FROM TryDaxTable')
Now, the response you get contains a list of dictionaries that looks something like this,
{'some_data': {'S': 'XXX'},
'sort_key': {'N': '1'},
'partition_key': {'N': '7'}
}
Along with the attribute name (e.g. partition_key), you get the data type of the value (e.g. 'N') and then the actual value (e.g. '7'). Note that the value does not actually come in the specified data type either (e.g. partition_key is supposed to be a number (N), but the value is a string).
Is there some way I can get my results in a list of dictionaries without the types and also with the types applied?
That would mean something like this,
{'some_data': 'XXX',
'sort_key': 1,
'partition_key': 7
}
Notice that in addition to removing the data types, the values have also been converted to the correct type.
This is a simple record, but more complex ones can have lists and nested dictionaries. More information is available here,
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html#DynamoDB.Client.execute_statement
Is there some way that I can get the data in the format I desire?
Or has somebody already written a function to parse the data?
I know there are several questions regarding this posted already, but most of them relate to SDKs in other languages. For instance,
AWS DynamoDB data with and/or without types?
I did not find one that has addressed this issue in Python.
Note: I want to continue to use PartiQL to query my table.
If you can use .resource instead of .client you'll live at a higher abstraction layer.
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html#service-resource
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('name')
You want to use PartiQL which returns the lower-level data format, so you'll probably have to follow the advice at How to convert a boto3 Dynamo DB item to a regular dictionary in Python?
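boto3 also ships a converter for this wire format, boto3.dynamodb.types.TypeDeserializer. If you want a dependency-free version, here is a minimal sketch of the same idea; it is a hypothetical helper that handles only the type tags shown above plus maps and lists, not the full DynamoDB type set:

```python
from decimal import Decimal

def plain(value):
    """Convert one DynamoDB-typed value, e.g. {'N': '7'}, to a plain Python value.

    Minimal sketch: handles only the S, N, BOOL, NULL, L and M type tags.
    """
    (tag, payload), = value.items()
    if tag == 'S':
        return payload
    if tag == 'N':
        # DynamoDB numbers arrive as strings; use int for whole numbers,
        # Decimal otherwise to avoid float rounding surprises
        return int(payload) if payload.lstrip('-').isdigit() else Decimal(payload)
    if tag == 'BOOL':
        return payload
    if tag == 'NULL':
        return None
    if tag == 'L':
        return [plain(v) for v in payload]
    if tag == 'M':
        return {k: plain(v) for k, v in payload.items()}
    raise ValueError('unhandled type tag: %r' % tag)

item = {'some_data': {'S': 'XXX'},
        'sort_key': {'N': '1'},
        'partition_key': {'N': '7'}}
converted = {k: plain(v) for k, v in item.items()}
# converted == {'some_data': 'XXX', 'sort_key': 1, 'partition_key': 7}
```

With boto3 available, TypeDeserializer().deserialize({'N': '7'}) does the same job (it returns Decimal for all numbers), applied per attribute of each item in resp['Items'].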

Checking whether specified multiple keys exist in Datastore table without fetching the entity

I have, let's say, 1000 key names whose existence I want to check in the Google App Engine datastore, but without fetching the entities themselves. One of the reasons, besides possible speedups, is that keys-only fetching is free (no cost).
ndb.get_multi() allows me to pass in the list of keys, but it will retrieve the entities. I need a function that does just that, but without fetching the entities: just True or False based on whether the specified keys exist.
I'd probably use a keys-only query...:
q = EntityKind.query(EntityKind.key.IN(wanted_keys))
keys_present = set(q.iter(keys_only=True))
That gives you keys_present as a set of those keys in wanted_keys that are actually present in the datastore. Not quite the same as your desired mapping from key to bool, but the latter can easily be built:
key_there = {k: (k in keys_present) for k in wanted_keys}
...should you actually want it (though a dict with bool values is usually just a less wieldy equivalent of the set!).

Store dictionary in database

I created a Berkeley database and operate on it using the bsddb module. I need to store information in it in a structure like this:
username = '....'
notes = {
    'name_of_note1': {
        'password': '...',
        'comments': '...',
        'title': '...'
    },
    'name_of_note2': {
        # same keys as previous, but other values
    }
}
This is how I open database
db = bsddb.btopen['data.db','c']
How do I do that?
So, first, I guess you should open your database using parentheses:
db = bsddb.btopen('data.db','c')
Keep in mind that Berkeley's pattern is key -> value, where both key and value are string objects (not unicode). The best way in your case would be to use:
db[str(username)] = json.dumps(notes)
since your notes are compatible with the json syntax.
However, this is not a very good choice, say, if you want to query only usernames' comments. You should use a relational database, such as sqlite, which is also built-in in Python.
A simple solution was described by @Falvian.
For a start, there is also a column pattern for ordered key/value stores, so plain key/value is not the only option.
I think that bsddb is a viable solution when you don't want to rely on sqlite. The first approach is to create a documents = bsddb.btopen('documents.db', 'c') store and put json values inside. Regarding the keys you have several options:
Name the keys yourself, like you do "name_of_note_1", "name_of_note_2"
Generate random identifiers using uuid.uuid4 (don't forget to check it's not already used ;)
Or use a row inside this documents database with key=0 to store a counter that you will use to create uids (unique identifiers).
If you use integers, don't forget to pack them with lambda x: struct.pack('>q', x) before storing them.
If you need to create an index, I recommend you have a look at my other answer introducing composite keys to build indexes in bsddb.
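A sketch of the JSON-per-record pattern described above; since the bsddb module was removed from the Python 3 standard library, this uses the stdlib dbm module, which exposes the same dict-like string-key/string-value interface (the file name and note structure are from the question):

```python
import dbm
import json
import os
import tempfile

# dbm, like bsddb.btopen, yields a dict-like store of byte-string keys/values
path = os.path.join(tempfile.mkdtemp(), 'data.db')
db = dbm.open(path, 'c')

username = 'joe'
notes = {'name_of_note1': {'password': '...',
                           'comments': '...',
                           'title': '...'}}

# nested dicts cannot be stored directly, so serialize to a JSON string
db[username] = json.dumps(notes)

# reading back requires decoding the JSON again
restored = json.loads(db[username])
assert restored == notes
db.close()
```

As noted above, this layout cannot be queried by sub-key (e.g. only comments); for that, a relational store like sqlite is the better fit.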

Which one is more efficient?

I have a Python program for deleting duplicates from a list of names.
But I'm in a dilemma about which of the two approaches is more efficient.
I have uploaded a list of names to a SQLite DB, into a column in a table.
Is it better to compare the names and delete the duplicates inside the DB, or to load them into Python, delete the duplicates there, and push them back to the DB?
I'm confused; here is the piece of SQLite code I have so far:
INSERT INTO dup_killer (member_id, date) SELECT member_id, date FROM talks GROUP BY member_id, date;
If you use the names as a key in the database, the database will make sure they are not duplicated. So there would be no reason to ship the list to Python and de-dup there.
If you haven't inserted the names into the database yet, you might as well de-dup them in Python first. It is probably faster to do it in Python using the built-in features than to incur the overhead of repeated attempts to insert to the database.
(By the way: you can really speed up the insertion of many names if you wrap all the inserts in a single transaction. Start a transaction, insert all the names, and finish the transaction. The database does some work to make sure that the database is consistent, and it's much more efficient to do that work once for a whole list of names, rather than doing it once per name.)
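Both points (a unique key enforced by the database, and one transaction for the whole batch) can be sketched with the stdlib sqlite3 module; the table and column names here are made up for illustration:

```python
import sqlite3

names = ['alice', 'bob', 'alice', 'carol', 'bob']

conn = sqlite3.connect(':memory:')
# a UNIQUE constraint makes the database reject duplicates by itself
conn.execute('CREATE TABLE members (name TEXT UNIQUE)')

# one transaction for the whole batch: the context manager commits once at the end
with conn:
    # INSERT OR IGNORE silently skips rows violating the UNIQUE constraint
    conn.executemany('INSERT OR IGNORE INTO members (name) VALUES (?)',
                     [(n,) for n in names])

stored = [row[0] for row in conn.execute('SELECT name FROM members ORDER BY name')]
# stored == ['alice', 'bob', 'carol']
```

Without the single enclosing transaction, sqlite3 in its default mode would pay the commit overhead once per statement instead of once per batch.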
If you have the list in Python, you can de-dup it very quickly using built-in features. The two common features that are useful for de-duping are the set and the dict.
I have given you three examples. The simplest case is where you have a list that just contains names, and you want to get a list with just unique names; you can just put the list into a set. The second case is that your list contains records and you need to extract the name part to build the set. The third case shows how to build a dict that maps a name onto a record, then inserts the record into a database; like a set, a dict will only allow unique values to be used as keys. When the dict is built, it will keep the last value from the list with the same name.
# list already contains names
unique_names = set(list_of_all_names)
unique_list = list(unique_names)  # now contains only unique names

# extract the name field from each record and make a set
unique_names = set(x.name for x in list_of_all_records)
unique_list = list(unique_names)  # now contains only unique names

# make a dict mapping name to a complete record
d = dict((x.name, x) for x in list_of_records)
# insert each complete record into the database using the name as key
for name in d:
    insert_into_database(d[name])

Ordered Dictionary in Python: add to MongoDB

I have a list of two element tuples, where the first element is a string (name of some parameter) and the second element is a float (the value of that parameter). For example,
thelist = [('costperunit', 200), ('profit', 10000), ('fixedcost', 5000),
           ('numpeople', 300)]
There are many more such tuples and the names are different in the real case. I want to add these to a mongoDB database as key: value pairs. Here is how I want to add it.
db.collection.insert( {'paramvalues': {'costperunit': 200, 'profit': 10000,
                                       'fixedcost': 5000, 'numpeople': 300}} )
One way to do this is:
dictform = dict(thelist)
db.collection.insert( {'paramvalues': dictform} )
This, however, does not ensure the order of the parameter names and values as dict changes the order.
I tried
from collections import OrderedDict
dictform = OrderedDict(thelist)
db.collection.insert( {'paramvalues': dictform} )
This maintains the original order of parameter names and values; however, it inserts them as a list of lists.
I am very new to mongoDB and trying to learn it. Is there a trick either in Python or in mongoDB that would achieve what I want? The reason I want the value of the key paramvalues in the Mongodb database as a dictionary (or Javascript object) is that I can then filter results using the value of some parameter. For example, I can do:
db.collection.find( {'paramvalues.costperunit': 200} )
If you are sure there is no way to do this, I would appreciate if you let me know.
Thanks.
Pymongo offers a subclass of dict, bson.son.SON (http://api.mongodb.org/python/current/api/bson/son.html), which is ordered, for cases where you need that, such as sending commands.
Dicts in Python and objects in Javascript/BSON (MongoDB) are not guaranteed to be ordered. Either you store an explicit sort-index number as part of the document and perform app-level sorting on it, or you insert your data into a list, which is of course ordered.
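A sketch of the ordered-mapping side in plain Python, using the list from the question; note that on Python 3.7+ even a plain dict preserves insertion order, so dict(thelist) keeps the original order too (the pymongo insert call itself is omitted):

```python
from collections import OrderedDict

thelist = [('costperunit', 200), ('profit', 10000), ('fixedcost', 5000),
           ('numpeople', 300)]

# OrderedDict keeps the tuples' order and is still a mapping, so pymongo
# would encode it as a single embedded document, not a list of lists
dictform = OrderedDict(thelist)
assert list(dictform) == ['costperunit', 'profit', 'fixedcost', 'numpeople']

# the document you would pass to the insert call:
doc = {'paramvalues': dictform}
```

With pymongo installed, bson.son.SON(thelist) can be used the same way, and a dotted query such as db.collection.find({'paramvalues.costperunit': 200}) then works as desired.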
