Ordered Dictionary in Python: add to MongoDB

I have a list of two element tuples, where the first element is a string (name of some parameter) and the second element is a float (the value of that parameter). For example,
thelist = [('costperunit', 200), ('profit', 10000), ('fixedcost', 5000),
('numpeople', 300)]
There are many more such tuples and the names are different in the real case. I want to add these to a mongoDB database as key: value pairs. Here is how I want to add it.
db.collection.insert( {'paramvalues': {'costperunit': 200, 'profit': 10000,
'fixedcost': 5000, 'numpeople': 300}} )
One way to do this is:
dictform = dict(thelist)
db.collection.insert( {'paramvalues': dictform} )
This, however, does not ensure the order of the parameter names and values, as a plain dict does not preserve insertion order.
I tried
from collections import OrderedDict
dictform = OrderedDict(thelist)
db.collection.insert( {'paramvalues': dictform} )
This maintains the original order of the parameter names and values; however, it inserts them as a list of lists.
I am very new to MongoDB and trying to learn it. Is there a trick either in Python or in MongoDB that would achieve what I want? The reason I want the value of the key paramvalues in the MongoDB database to be a dictionary (or JavaScript object) is that I can then filter results using the value of some parameter. For example, I can do:
db.collection.find( {'paramvalues.costperunit': 200} )
If you are sure there is no way to do this, I would appreciate if you let me know.
Thanks.

Pymongo offers a subclass of dict, bson.son.SON (http://api.mongodb.org/python/current/api/bson/son.html), which is ordered, for cases where you need that, such as sending commands.
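For example, a minimal sketch of the SON approach, reusing thelist and the collection from the question (insert is the older pymongo call used above; newer versions spell it insert_one):

from bson.son import SON

# SON is a dict subclass that remembers insertion order, so the stored
# subdocument keeps the order of thelist
dictform = SON(thelist)
db.collection.insert({'paramvalues': dictform})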

Dicts in Python and objects in JavaScript/BSON (MongoDB) are not ordered. Either you store an explicit sort-index number as part of each entry and sort at the application level, or you insert your data into a list, which of course preserves order.
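A sketch of the list approach, reusing thelist from the question: each parameter becomes a small subdocument, the array position records the order, and filtering still works with $elemMatch:

db.collection.insert({'paramvalues': [{'name': name, 'value': value}
                                      for name, value in thelist]})

# the array preserves order; filtering now matches on subdocument fields
db.collection.find({'paramvalues': {'$elemMatch': {'name': 'costperunit',
                                                   'value': 200}}})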

Related

Store dictionary in database

I create a Berkeley database and operate on it using the bsddb module. I need to store information in it in a style like this:
username = '....'
notes = {'name_of_note1': {
             'password': '...',
             'comments': '...',
             'title': '...'
         },
         'name_of_note2': {
             # same keys as above, but different values
         }
        }
This is how I open the database:
db = bsddb.btopen['data.db','c']
How do I do that?
So, first, I guess you should open your database using parentheses:
db = bsddb.btopen('data.db','c')
Keep in mind that Berkeley's pattern is key -> value, where both key and value are string objects (not unicode). The best way in your case would be to use:
db[str(username)] = json.dumps(notes)
since your notes are compatible with the json syntax.
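A minimal sketch of the round trip, assuming Python 2's bsddb module and the username and notes variables from the question:

import bsddb
import json

db = bsddb.btopen('data.db', 'c')

# keys and values must be plain byte strings, so serialize the nested dict
db[str(username)] = json.dumps(notes)

# reading it back: parse the JSON string into a dict again
stored_notes = json.loads(db[str(username)])

db.close()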
However, this is not a very good choice if, say, you want to query only a username's comments. In that case you should use a relational database, such as sqlite, which is also built into Python.
A simple solution was described by @Falvian.
For a start, there is also a column pattern in ordered key/value stores, so the plain key/value pattern is not the only one.
I think that bsddb is a viable solution when you don't want to rely on sqlite. The first approach is to create a database with documents = bsddb.btopen('documents.db', 'c') and store JSON values inside it. Regarding the keys you have several options:
Name the keys yourself, like you do "name_of_note_1", "name_of_note_2"
Generate random identifiers using uuid.uuid4 (don't forget to check it's not already used ;)
Or use a row inside this documents database with key=0 to store a counter that you will use to create uids (unique identifiers).
If you use integers, don't forget to pack them with struct.pack('>q', uid) before storing them (see the sketch below).
If you need to create an index, I recommend having a look at my other answer introducing composite keys to build indexes in bsddb.
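A rough sketch of the counter idea (option 3), assuming the reserved key 0 is never used for a real document:

import bsddb
import json
import struct

documents = bsddb.btopen('documents.db', 'c')

# reserved key for the counter: the integer 0 packed as 8 big-endian bytes
COUNTER_KEY = struct.pack('>q', 0)

def next_uid(db):
    # read the current counter, starting from 1 when the db is new
    if db.has_key(COUNTER_KEY):
        uid = struct.unpack('>q', db[COUNTER_KEY])[0] + 1
    else:
        uid = 1
    db[COUNTER_KEY] = struct.pack('>q', uid)
    return uid

uid = next_uid(documents)
# packed integer keys keep their numeric order in the btree
documents[struct.pack('>q', uid)] = json.dumps({'title': '...'})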

python boto simpledb with nested dictionaries

Am I correct to assume that nested dictionaries are not supported in aws simpledb? Should I just serialize everything into json and push to the database?
For example,
test = dict(company='test company', users={'username':'joe', 'password': 'test'})
This returns test with keys of 'company' and 'users'; however, 'users' is just stored as a string.
Simply, YES: SimpleDB provides only a first level of keys.
So if you want to store data with deeper key nesting, you will have to serialize the data to a string, and you will not have simple select commands for querying the nested data (you will be able to test it as a string, but will not have simple access to subkey values).
Note that one key (in one record) can store multiple values, but this is a sort of list (often used to store multiple tags), not a dictionary.
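So in practice you serialize the nested part yourself. A sketch with boto (the domain name here is an assumption; the domain must already exist):

import json
import boto

test = dict(company='test company',
            users={'username': 'joe', 'password': 'test'})

sdb = boto.connect_sdb()
domain = sdb.get_domain('test_domain')

# flatten the nested dict to a JSON string; attributes only hold flat values
domain.put_attributes('item1', {'company': test['company'],
                                'users': json.dumps(test['users'])})

# reading it back means parsing the string again
item = domain.get_item('item1')
users = json.loads(item['users'])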

Which one is more efficient?

I have a Python program for deleting duplicates from a list of names.
But I'm torn between two approaches and trying to work out which is more efficient.
I have uploaded a list of names to a SQLite DB, into a column in a table.
Is it better to compare the names and delete the duplicates inside the DB, or to load them into Python, delete the duplicates there, and push them back to the DB?
I'm confused and here is a piece of code to do it on SQLite:
dup_killer (member_id, date) SELECT * FROM talks GROUP BY member_id,
If you use the names as a key in the database, the database will make sure they are not duplicated. So there would be no reason to ship the list to Python and de-dup there.
If you haven't inserted the names into the database yet, you might as well de-dup them in Python first. It is probably faster to do it in Python using the built-in features than to incur the overhead of repeated attempts to insert to the database.
(By the way: you can really speed up the insertion of many names if you wrap all the inserts in a single transaction. Start a transaction, insert all the names, and finish the transaction. The database does some work to make sure that the database is consistent, and it's much more efficient to do that work once for a whole list of names, rather than doing it once per name.)
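A sketch of both points with the built-in sqlite3 module, assuming a table names(name TEXT PRIMARY KEY) so the database itself enforces uniqueness:

import sqlite3

conn = sqlite3.connect('talks.db')
conn.execute('CREATE TABLE IF NOT EXISTS names (name TEXT PRIMARY KEY)')

# one transaction around all the inserts; INSERT OR IGNORE silently
# skips names that would violate the PRIMARY KEY constraint
with conn:
    conn.executemany('INSERT OR IGNORE INTO names (name) VALUES (?)',
                     ((n,) for n in list_of_all_names))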
If you have the list in Python, you can de-dup it very quickly using built-in features. The two common features that are useful for de-duping are the set and the dict.
I have given you three examples below. The simplest case is where you have a list that just contains names and you want a list with only the unique names; you can just put the list into a set. The second case is that your list contains records and you need to extract the name part to build the set. The third case shows how to build a dict that maps a name onto a record, then inserts the records into a database; like a set, a dict will only allow unique values to be used as keys. When the dict is built, it will keep the last record from the list for any given name.
# list already contains names
unique_names = set(list_of_all_names)
unique_list = list(unique_names)  # unique_list now contains only unique names

# extract the name field from each record and make a set
unique_names = set(x.name for x in list_of_all_records)
unique_list = list(unique_names)  # unique_list now contains only unique names

# make a dict mapping name to a complete record
d = dict((x.name, x) for x in list_of_records)

# insert complete records into the database using name as key
for name in d:
    insert_into_database(d[name])

Python memcache preserve order in get_multi()

How can I preserve the order of the values fetched with memcache's get_multi() function? By default, the order returned is random. Thanks.
Python's memcache library returns a dictionary, and dictionaries in Python are unordered, so you have to get the values out of the dictionary in the right order manually:
result = cache.get_multi(keys)
values = [result.get(key) for key in keys]
As I remember, memcache has a GET PRESERVE ORDER flag; try adding it to the function's flags.

Django models - how to filter out duplicate values by PK after the fact?

I build a list of Django model objects by making several queries. Then I want to remove any duplicates, (all of these objects are of the same type with an auto_increment int PK), but I can't use set() because they aren't hashable.
Is there a quick and easy way to do this? I'm considering using a dict instead of a list with the id as the key.
In general it's better to combine all your queries into a single query if possible, i.e.
q = Model.objects.filter(Q(field1=f1)|Q(field2=f2))
instead of
q1 = Model.objects.filter(field1=f1)
q2 = Model.objects.filter(field2=f2)
If the first query is returning duplicated Models then use distinct()
q = Model.objects.filter(Q(field1=f1)|Q(field2=f2)).distinct()
If your query really is impossible to execute with a single command, then you'll have to resort to using a dict or other technique recommended in the other answers. It might be helpful if you posted the exact query on SO and we could see if it would be possible to combine into a single query. In my experience, most queries can be done with a single queryset.
Is there a quick and easy way to do this? I'm considering using a dict instead of a list with the id as the key.
That's exactly what I would do if you were locked into your current structure of making several queries. Then a simple dictionary.values() call will return your list back.
If you have a little more flexibility, why not use Q objects? Instead of actually making the queries, store each condition in a Q object and combine them with a bitwise or ("|") to execute a single query. This will achieve your goal and save database hits.
Django Q objects
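A sketch of that idea (the field names are placeholders):

from django.db.models import Q

# accumulate the conditions instead of running one query per condition
conditions = Q(field1=f1) | Q(field2=f2) | Q(field3=f3)

# one database hit, with duplicates removed by the database
results = Model.objects.filter(conditions).distinct()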
You can use a set if you add the __hash__ function to your model definition so that it returns the id (assuming this doesn't interfere with other hash behaviour you may have in your app):
class MyModel(models.Model):
    def __hash__(self):
        return self.pk
If the order doesn't matter, use a dict.
Remove "duplicates" depends on how you define "duplicated".
If you want EVERY column (except the PK) to match, that's a pain in the neck -- it's a lot of comparing.
If, on the other hand, you have some "natural key" column (or a short set of columns), then you can easily query and remove them.
master = MyModel.objects.get(id=theMasterKey)
# exclude the master itself so it is not deleted along with its duplicates
dups = MyModel.objects.filter(fld1=master.fld1, fld2=master.fld2).exclude(pk=master.pk)
dups.delete()
If you can identify some shorter set of key fields for duplicate identification, this works pretty well.
Edit
If the model objects haven't been saved to the database yet, you can make a dictionary keyed on a tuple of these fields.
unique = {}
...
key = (anObject.fld1, anObject.fld2)
if key not in unique:
    unique[key] = anObject
I use this one:
dict(zip(map(lambda x: x.pk, items), items)).values()
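On Python 2.7+ the same trick reads more clearly as a dict comprehension:

# later items with a duplicate pk overwrite earlier ones, as with the zip version
unique_items = {x.pk: x for x in items}.values()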
