I am creating a Berkeley database and operating on it with the bsddb module. I need to store information in it in a structure like this:
username = '....'
notes = {
    'name_of_note1': {
        'password': '...',
        'comments': '...',
        'title': '...'
    },
    'name_of_note2': {
        # same keys as above, but different values
    }
}
This is how I open the database:
db = bsddb.btopen['data.db','c']
How do I do that?
So, first, I guess you should open your database using parentheses:
db = bsddb.btopen('data.db','c')
Keep in mind that Berkeley's pattern is key -> value, where both key and value are string objects (not unicode). The best way in your case would be to use:
db[str(username)] = json.dumps(notes)
since your notes are compatible with the json syntax.
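For instance, here is a minimal sketch of the round trip; the username 'alice' and the note values are placeholders:
import bsddb
import json

db = bsddb.btopen('data.db', 'c')

notes = {'name_of_note1': {'password': 'secret',
                           'comments': 'a comment',
                           'title': 'a title'}}

# bsddb keys and values must be plain strings, so serialize the dict
db['alice'] = json.dumps(notes)

# ...and deserialize it on the way back out
notes_again = json.loads(db['alice'])
db.close()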
However, this is not a very good choice if, say, you want to query only the usernames' comments. In that case you should use a relational database, such as sqlite, which is also built into Python.
A simple solution was described by @Falvian.
For a start, there is a column pattern in ordered key/value stores, so the plain key/value pattern is not the only one.
I think that bsddb is a viable solution when you don't want to rely on sqlite. The first approach is to create a documents = bsddb.btopen('documents.db', 'c') and store JSON values inside it. Regarding the keys you have several options:
Name the keys yourself, like you do "name_of_note_1", "name_of_note_2"
Generate random identifiers using uuid.uuid4 (don't forget to check that they aren't already used ;))
Or use a row inside this documents database with key=0 to store a counter that you will use to create uids (unique identifiers).
If you use integers, don't forget to pack them with struct.pack('>q', uid) before storing them, as in the sketch below.
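A minimal sketch of that packing; pack_uid is just an illustrative helper name, and it reuses the documents handle and json import from above:
import struct

def pack_uid(uid):
    # big-endian 8-byte packing keeps the btree's lexicographic
    # key order consistent with numeric order
    return struct.pack('>q', uid)

documents[pack_uid(1)] = json.dumps({'title': 'first document'})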
If you need to create an index, I recommend you have a look at my other answer introducing composite keys to build indexes in bsddb.
I am new to Python and Pyramid. In a test application I am using to learn more about Pyramid, I want to query a database and create a dictionary based on the results of a sqlalchemy query object and finally send the dictionary to the chameleon template.
So far I have the following code (which works fine), but I wanted to know if there is a better way to create my dictionary.
...
index = 0
clients = {}
q = self.request.params['q']
for client in DBSession.query(Client).filter(Client.name.like('%%%s%%' % q)).all():
    clients[index] = {"id": client.id, "name": client.name}
    index += 1
output = {"clients": clients}
return output
While learning Python, I found a nice way to create a list in a for loop statement like the following:
myvar = [user.name for user in users]
So, the other question I had: is there a similar 'one line' way like the above to create a dictionary of a sqlalchemy query object?
Thanks in advance.
Well, yes, we can tighten this up a bit.
First, this pattern:
index = 0
for item in seq:
    frobnicate(index, item)
    index += 1
is common enough that there's a builtin function that does it automatically, enumerate(), used like this:
for index, item in enumerate(seq):
    frobnicate(index, item)
But I'm not sure you need it. Associating things with an integer index starting from zero is exactly what a list does; you don't really need a dict for that. Unless you want to have holes, or need some of the other special features of dicts, just do:
stuff = []
stuff.extend(seq)
When you're only interested in a small subset of the attributes of a database entity, it's a good idea to tell sqlalchemy to emit a query that returns only that:
query = DBSession.query(Client.id, Client.name) \
                 .filter(Client.name.contains(q))
In the above I've also replaced the .name.like('%%%s%%' % q) with Client.name.contains(q); they mean the same thing (sqlalchemy expands contains() into the correct LIKE expression for you).
Queries constructed in this way return a special thing that looks like a tuple, and can be easily turned into a dict by calling _asdict() on it:
So, to put it all together:
output = [row._asdict() for row in DBSession.query(Client.id, Client.name)
                                            .filter(Client.name.contains(q))]
or, if you really desperately need it to be a dict, you can use a dict comprehension:
output = {index: row._asdict()
          for index, row
          in enumerate(DBSession.query(Client.id, Client.name)
                                .filter(Client.name.contains(q)))}
@TokenMacGuy gave a nice and detailed answer to your question. However, I have a feeling you've asked the wrong question :)
You don't need to convert SQLAlchemy objects to dictionaries before passing them to the template - that would be quite inconvenient. You can pass the result of a query as is and use the SQLAlchemy mapped objects directly in your template:
q = self.request.params['q']
clients = DBSession.query(Client).filter(Client.name.contains(q)).all()
return {'clients': clients}
If you want to turn a SQLAlchemy object into a dict, you can use this code:
from sqlalchemy import orm as sqlalchemy_orm

def obj_to_dict(obj):
    mapped_table = sqlalchemy_orm.class_mapper(obj.__class__).mapped_table
    return dict((col.name, getattr(obj, col.name)) for col in mapped_table.c)
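Hypothetical usage, with the Client model from the question above standing in for any mapped class:
client = DBSession.query(Client).first()
client_dict = obj_to_dict(client)   # e.g. {'id': 1, 'name': 'Acme'}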
There is another attribute of the mapped table that has the relationships in it, but the code gets dicey.
You don't need to cast an object into a dict for any of the template libraries, but if you decide to persist the data (memcached, session, pickle, etc.) you'll either need to use dicts or write some code to 'merge' the persisted data back into the session.
A quick side note: if you render any of this data through json, you'll either need a custom json renderer that can handle datetime objects, or you'll need to convert those values in a function. For example:
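One common approach is to pass a default hook to json.dumps that converts datetimes to ISO 8601 strings; this is a sketch, and json_default is only an illustrative name:
import json
import datetime

def json_default(value):
    # json.dumps calls this for values it can't serialize itself
    if isinstance(value, (datetime.datetime, datetime.date)):
        return value.isoformat()
    raise TypeError('%r is not JSON serializable' % (value,))

json.dumps({'created': datetime.datetime.utcnow()}, default=json_default)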
I want to build a table in python with three columns and later on fetch the values as necessary.
I am thinking dictionaries are the best way to do it, with a key mapping to two values.
| column 1 | column 2    | column 3 |
| MAC      | PORT NUMBER | DPID     |
| Key      | Value 1     | Value 2  |
Proposed way:
# define a global learning table
globe_learning_table = defaultdict(set)
# add the port number and dpid of a switch, keyed by its MAC address
# packet.src will give you the MAC address in this case
globe_learning_table[packet.src].add(event.port)
globe_learning_table[packet.src].add(dpid_to_str(connection.dpid))
# getting the value of DPID based on its MAC address
globe_learning_table[packet.src][????]
I am not sure if one key points to two values how can I get the particular value associated with that key.
I am open to use any another data structure as well, if it can build this dynamic table and give me the particular values when necessary.
Why a dictionary? Why not a list of named tuples, or a collection (list, dictionary) of objects from some class which you define (with attributes for each column)?
What's wrong with:
class myRowObj(object):
    def __init__(self, mac, port, dpid):
        self.mac = mac
        self.port = port
        self.dpid = dpid

myTable = list()
for each in some_inputs:
    myTable.append(myRowObj(*each.split()))
... or something like that?
(Note: myTable can be a list, or a dictionary or whatever is suitable to your needs. Obviously if it's a dictionary then you have to ask what sort of key you'll use to access these "rows").
The advantage of this approach is that your "row objects" (which you'd name in some way that makes more sense for your application domain) can implement whatever semantics you choose. These objects can validate and convert any values supplied at instantiation, compute any derived values, etc. You can also define string and code representations of your object (implicit conversions for when one of your rows is used as a string, or in certain types of development, debugging, or serialization; the __str__ and __repr__ special methods, for example).
The named tuples (added in Python 2.6) are a sort of lightweight object class which can offer some performance advantages and lighter memory footprint over normal custom classes (for situations where you only want the named fields without binding custom methods to these objects, for example).
Something like this perhaps?
>>> import collections
>>> PortDpidPair = collections.namedtuple("PortDpidPair", ["port", "dpid"])
>>> global_learning_table = collections.defaultdict(PortDpidPair)
>>> global_learning_table["ff:" * 7 + "ff"] = PortDpidPair(80, 1234)
>>> global_learning_table
defaultdict(<class '__main__.PortDpidPair'>, {'ff:ff:ff:ff:ff:ff:ff:ff': PortDpidPair(port=80, dpid=1234)})
>>>
Named tuples might be appropriate for each row, but depending on how large this table is going to be, you may be better off with a sqlite db or something similar.
If it is small enough to store in memory and you want it to be a data structure, you could create a class that contains Values 1 & 2 and use that as the value for your dictionary mapping.
However, as Mr E pointed out, it is probably better design to use a database to store the information and retrieve as necessary from there. This will likely not result in significant performance loss.
Another option to keep in mind is an in-memory SQLite table. See the Python SQLite docs for a basic example: sqlite3 — DB-API 2.0 interface for SQLite databases, http://docs.python.org/2/library/sqlite3.html
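For instance, a minimal sketch of such an in-memory table; the table and column names are only illustrative:
import sqlite3

conn = sqlite3.connect(':memory:')  # the table vanishes when the process exits
conn.execute('CREATE TABLE learning (mac TEXT PRIMARY KEY, port INTEGER, dpid TEXT)')
conn.execute('INSERT INTO learning VALUES (?, ?, ?)',
             ('00:00:00:00:00:01', 23, '42'))

# fetch both values for a given MAC address
port, dpid = conn.execute('SELECT port, dpid FROM learning WHERE mac = ?',
                          ('00:00:00:00:00:01',)).fetchone()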
I think you're getting two distinct objectives mixed up. You want a representative data structure, and (as I read it) you want to print it in a readable form. What gets printed as a table is not stored internally in the computer in two dimensions; the table presentation is a visual metaphor.
Assuming I'm right about what you want to accomplish, the way I'd go about it is by a) keeping it simple and b) using the right modules to save effort.
The simplest data structure that correctly represents your information is in my opinion a dictionary within a dictionary. Like this:
foo = {'00:00:00:00:00:00': {'port': 22, 'dpid': 42},
       '00:00:00:00:00:01': {'port': 23, 'dpid': 43}}
The best module I have found for quick and dirty table printing is prettytable. Your code would look something like this:
from prettytable import PrettyTable

foo = {'00:00:00:00:00:00': {'port': 22, 'dpid': 42},
       '00:00:00:00:00:01': {'port': 23, 'dpid': 43}}

t = PrettyTable(['MAC', 'Port', 'dpid'])
for row in foo:
    t.add_row([row, foo[row]['port'], foo[row]['dpid']])
print t
Am I correct to assume that nested dictionaries are not supported in AWS SimpleDB? Should I just serialize everything into JSON and push it to the database?
For example,
test = dict(company='test company', users={'username':'joe', 'password': 'test'})
This returns test with keys of 'company' and 'users'; however, 'users' is stored as just a string.
Simply, YES: SimpleDB provides only the first level of keys.
So if you want to store data with deeper key nesting, you will have to serialize the data to a string, and you will not have simple select commands to query the nested data (you will only be able to test it as a string, without simple access to subkey values).
Note that one key (in one record) can store multiple values, but this is a sort of list (often used to store multiple tags), not a dictionary. For example:
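A sketch of the serialization approach; the actual put/get calls depend on your SimpleDB client library (e.g. boto), so they are elided here:
import json

test = dict(company='test company',
            users={'username': 'joe', 'password': 'test'})

# flatten the nested dict to a JSON string before storing
item = {'company': test['company'],
        'users': json.dumps(test['users'])}

# ... store `item` in SimpleDB, read it back, then:
users = json.loads(item['users'])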
In the Java low-level API, there is a way to turn an entity key into a string so you can pass it around to a client via JSON if you want. Is there a way to do this for python?
Depending on whether you use keynames or not, obj.key().name() or obj.key().id() can be used to retrieve the keyname or ID, respectively. Neither of those contains the name of the entity class, so they are not sufficient to retrieve the original object from the datastore. Granted, in most cases you know the entity kind when working with it, so that's not a problem.
A universal solution, working in both cases (keynames or not), is obj.key().id_or_name(). This way you can retrieve the original object as follows:
from google.appengine.ext import db
#...
obj_key = db.Key.from_path('EntityClass', id_or_name)
obj = db.get(obj_key)
If you don't mind passing around a long, cryptic string that also contains some extra data (like the name of your GAE app), you can use the string representation of the key, str(obj.key()), and pass it directly to db.get to retrieve the object:
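A minimal sketch of that round trip, assuming obj is a db.Model instance:
from google.appengine.ext import db

key_string = str(obj.key())             # encoded key, safe to hand to a client
obj_again = db.get(db.Key(key_string))  # db.Key accepts the encoded form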
str(entity.key()) will return a base64-encoded representation of the key.
entity.key().name() or entity.key().id() will return just the name or ID, omitting the kind and the ancestry.
Better (with ndb):
string_key = entity.key.urlsafe()  # in ndb, key is a property, not a method
and afterwards you can decode the key with:
key = ndb.Key(urlsafe=string_key)
You should be able to do:
entity.key().name()
This should return the string representation of the key.
Is that what you're looking for?
I build a list of Django model objects by making several queries. Then I want to remove any duplicates, (all of these objects are of the same type with an auto_increment int PK), but I can't use set() because they aren't hashable.
Is there a quick and easy way to do this? I'm considering using a dict instead of a list with the id as the key.
In general, it's better to combine all your queries into a single query if possible, i.e.
q = Model.objects.filter(Q(field1=f1)|Q(field2=f2))
instead of
q1 = Models.object.filter(field1=f1)
q2 = Models.object.filter(field2=f2)
If the first query returns duplicated models, then use distinct():
q = Model.objects.filter(Q(field1=f1)|Q(field2=f2)).distinct()
If your query really is impossible to execute with a single command, then you'll have to resort to using a dict or other technique recommended in the other answers. It might be helpful if you posted the exact query on SO and we could see if it would be possible to combine into a single query. In my experience, most queries can be done with a single queryset.
Is there a quick and easy way to do this? I'm considering using a dict instead of a list with the id as the key.
That's exactly what I would do if you were locked into your current structure of making several queries. Then a simple dictionary.values() call will give you your list back. For example:
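A sketch of that approach, with qs1 and qs2 standing in for your actual queries:
unique = {}
for obj in list(qs1) + list(qs2):
    unique[obj.pk] = obj    # later duplicates simply overwrite earlier ones

deduped = unique.values()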
If you have a little more flexibility, why not use Q objects? Instead of actually making the queries, store each query in a Q object and use a bitwise or ("|") to execute a single query. This will achieve your goal and save database hits.
Django Q objects
You can use a set if you add the __hash__ function to your model definition so that it returns the id (assuming this doesn't interfere with other hash behaviour you may have in your app):
class MyModel(models.Model):
    def __hash__(self):
        return self.pk
If the order doesn't matter, use a dict.
Remove "duplicates" depends on how you define "duplicated".
If you want EVERY column (except the PK) to match, that's a pain in the neck -- it's a lot of comparing.
If, on the other hand, you have some "natural key" column (or short set of columns), then you can easily query and remove these.
master = MyModel.objects.get(id=theMasterKey)
# exclude the master row itself so it survives the delete
dups = MyModel.objects.filter(fld1=master.fld1, fld2=master.fld2).exclude(id=master.id)
dups.delete()
If you can identify some shorter set of key fields for duplicate identification, this works pretty well.
Edit
If the model objects haven't been saved to the database yet, you can make a dictionary on a tuple of these keys.
unique = {}
...
key = (anObject.fld1, anObject.fld2)
if key not in unique:
    unique[key] = anObject
I use this one:
dict(zip(map(lambda x: x.pk, items), items)).values()
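If you're on Python 2.7 or later, a dict comprehension spells the same thing a little more readably:
deduped = {item.pk: item for item in items}.values()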