Why use MongoAlchemy when you could subclass a Python Dict?

Why use MongoAlchemy when you could subclass a Python Dict? - python

A friend recently showed me that you can create an instance that is a subclass of dict in Python, and then use that instance to save, update, etc. Seems like you have more control, and it looks easier as well.
class Marker(dict):
def __init__(self, username, email=None):
self.username = username
if email:
self.email = email
#property
def username(self):
return self.get('username')
#username.setter
def username(self, val):
self['username'] = val
def save(self):
db.collection.save(self)

Author here. The general reason you'd want to use it (or one of the many similar libraries) is for safety. When you assign a value to a MongoAlchemy Document it does a check a check to make sure all of the constraints you specified are satisfied (e.g. type, lengths of strings, numeric bounds).
It also has a query DSL that can be more pleasant to use than the json-like built in syntax. Here's an example from the docs:
>>> query = session.query(BloodDonor)
>>> for donor in query.filter(BloodDonor.first_name == 'Jeff', BloodDonor.age < 30):
>>> print donor
Jeff Jenkins (male; Age: 28; Type: O+)
The MongoAlchemy Session object also allows you to simulate transactions:
with session:
do_stuff()
session.insert(doc1)
do_more_stuff()
session.insert(doc2)
do_even_more_stuff()
session.insert(doc3)
# note that at this point nothing has been inserted
# now things are inserted
This doesn't mean that these inserts are one atomic operation—or even that all of the write will succeed—but it does mean that if your application has errors in the "do_stuff" functions that you won't have done half of the inserts. So it prevents a specific and reasonably common type of error

Related

Can an association class be implemented in Python?

I have just started learning software development and I am modelling my system in a UML Class diagram. I am unsure how I would implement this in code.
To keep things simple let’s assume the followimg example:
There is a Room and a Guest Class with association Room(0..)-Guest(0..) and an association class RoomBooking, which contains booking details. How would I model this in Python if my system wants to see all room bookings made by a particular guest?

Most Python applications developed from a UML design are backed by a relational database, usually via an ORM. In which case your design is pretty trivial: your RoomBooking is a table in the database, and the way you look up all RoomBooking objects for a given Guest is just an ORM query. Keeping it vague rather than using a particular ORM syntax, something like this:
bookings = RoomBooking.select(Guest=guest)
With an RDBMS but no ORM, it's not much different. Something like this:
sql = 'SELECT Room, Guest, Charge, Paid FROM RoomBooking WHERE Guest = ?'
cur = db.execute(sql, (guest.id))
bookings = [RoomBooking(*row) for row in cur]
And this points to what you'd do if you're not using a RDBMS: any relation that would be stored as a table with a foreign key is instead stored as some kind of dict in memory.
For example, you might have a dict mapping guests to sets of room bookings:
bookings = guest_booking[guest]
Or, alternatively, if you don't have a huge number of hotels, you might have this mapping implicit, with each hotel having a 1-to-1 mapping of guests to bookings:
bookings = [hotel.bookings[guest] for hotel in hotels]
Since you're starting off with UML, you're probably thinking in strict OO terms, so you'll want to encapsulate this dict in some class, behind some mutator and accessor methods, so you can ensure that you don't accidentally break any invariants.
There are a few obvious places to put it—a BookingManager object makes sense for the guest-to-set-of-bookings mapping, and the Hotel itself is such an obvious place for the per-hotel-guest-to-booking that I used it without thinking above.
But another place to put it, which is closer to the ORM design, is in a class attribute on the RoomBooking type, accessed by classmethods. This also allows you to extend things if you later need to, e.g., look things up by hotel—you'd then put two dicts as class attributes, and ensure that a single method always updates both of them, so you know they're always consistent.
So, let's look at that:
class RoomBooking
guest_mapping = collections.defaultdict(set)
hotel_mapping = collections.defaultdict(set)
def __init__(self, guest, room):
self.guest, self.room = guest, room
#classmethod
def find_by_guest(cls, guest):
return cls.guest_mapping[guest]
#classmethod
def find_by_hotel(cls, hotel):
return cls.hotel_mapping[hotel]
#classmethod
def add_booking(cls, guest, room):
booking = cls(guest, room)
cls.guest_mapping[guest].add(booking)
cls.hotel_mapping[room.hotel].add(booking)
Of course your Hotel instance probably needs to add the booking as well, so it can raise an exception if two different bookings cover the same room on overlapping dates, whether that happens in RoomBooking.add_booking, or in some higher-level function that calls both Hotel.add_booking and RoomBooking.add_booking.
And if this is multi-threaded (which seems like a good possibility, given that you're heading this far down the Java-inspired design path), you'll need a big lock, or a series of fine-grained locks, around the whole transaction.
For persistence, you probably want to store these mappings along with the public objects. But for a small enough data set, or for a server that rarely restarts, it might be simpler to just persist the public objects, and rebuild the mappings at load time by doing a bunch of add_booking calls as part of the load process.
If you want to make it even more ORM-style, you can have a single find method that takes keyword arguments and manually executes a "query plan" in a trivial way:
#classmethod
def find(cls, guest=None, hotel=None):
if guest is None and hotel is None:
return {booking for bookings in cls.guest_mapping.values()
for booking in bookings}
elif hotel is None:
return cls.guest_mapping[guest]
elif guest is None:
return cls.hotel_mapping[hotel]
else:
return {booking for booking in cls.guest_mapping[guest]
if booking.room.hotel == hotel}
But this is already pushing things to the point where you might want to go back and ask whether you were right to not use an ORM in the first place. If that sounds ridiculously heavy duty for your simple toy app, take a look at sqlite3 for the database (which comes with Python, and which takes less work to use than coming up with a way to pickle or json all your data for persistence) and SqlAlchemy for the ORM. There's not much of a learning curve, and not much runtime overhead or coding-time boilerplate.

Sure you can implement it in Python. But there is not a single way. Quite often you have a database layer where the association class is used with two foreign keys (in your case to the primaries of Room and Guest). So in order to search you would just code an according SQL to be sent. In case you want to cache this table you would code it like this (or similarly) with an associative array:
from collections import defaultdict
class Room():
def __init__(self, num):
self.room_number = num
def key(self):
return str(self.room_number)
class Guest():
def __init__(self, name):
self.name = name
def key(self):
return self.name
def nested_dict(n, type):
if n == 1:
return defaultdict(type)
else:
return defaultdict(lambda: nested_dict(n-1, type))
room_booking = nested_dict(2, str)
class Room_Booking():
def __init__(self, date):
self.date = date
room1 = Room(1)
guest1 = Guest("Joe")
room_booking[room1.key()][guest1.key()] = Room_Booking("some date")
print(room_booking[room1.key()][guest1.key()])

SQLAlchemy hybrid_property and expressions

I am working on storing some data produced by an external process in a postgres database using sqlalchemy. The external data has several dates stored as strings that I would like to use as datetime objects for comparison and duration calculation and I'd like the conversion to happen in the data model to maintain consistency. I'm trying to use a hybrid_property but I am running into problems based on the different ways that SQLAlchemy uses the hybrid_property as an instance or class.
A (simplified) case looks like this...
class Contact(Base):
id = Column(String(100), primary_key=True)
status_date = Column(String(100))
#hybrid_property
def real_status_date(self):
return convert_from_outside_date(self.status_date)
with the conversion function something like this (the function can return a date, False on conversion failure or None on being passed None)...
def convert_from_outside_date(in_str):
out_date = None
if in_str != None:
try:
out_date = datetime.datetime.strptime(in_str,"%Y-%m-%d")
except ValueError:
out_date = False
return out_date
When I use an instance of Contact, contact.real_status_date properly works as a datetime. The problem is when Contact.real_status_date is used in a query filter.
db_session.query(Contact).filter(
Contact.real_status_date > datetime.datetime.now())
Gets me a "TypeError: Boolean value of this clause is not defined" exception, with the
in_str != None
line of the conversion function as the last part of the stack trace.
Some answers (https://stackoverflow.com/a/14504695/416308) show the use of a setter function and the addition of new column in the data model. Other answers (https://stackoverflow.com/a/13642708/416308) show the addition of #property.expression function that returns something sqlalchemy can interpret into a sql expression.
Adding a setter to the Contact class works but the addition of new columns seems like it shouldn't be necessary and makes some table metadata parsing more difficult later and I'd like to avoid it if I can.
_real_status_date = Column(DateTime())
#hybrid_property
def real_status_date(self):
return self._real_status_date
#real_status_date.setter
def value(self):
self._real_status_date = convert_from_outside_date(self.status_date)
If I used an #.expression decorator would I have to implement a strptime function that is more sql compatible? What would that look like? Is there something wrong with the conversion function that is causing trouble here?

As zzzeek mentions, you could add the following to your class
Depending on your DB, it might already interpret a python datetime object
So it could work only modifying your conversion function to:
def convert_from_outside_date(in_str):
if in_str:
try:
return datetime.datetime.strptime(in_str,"%Y-%m-%d")
# Return None for a Null date
return None
Otherwise you need to add an expression function:
#real_status_date.expression
def real_status_date(self):
return sqlalchemy.Date(self.real_status_date)

python database implementation

I am trying to implement a simple database program in python. I get to the point where I have added elements to the db, changed the values, etc.
class db:
def __init__(self):
self.database ={}
def dbset(self, name, value):
self.database[name]=value
def dbunset(self, name):
self.dbset(name, 'NULL')
def dbnumequalto(self, value):
mylist = [v for k,v in self.database.items() if v==value]
return mylist
def main():
mydb=db()
cmd=raw_input().rstrip().split(" ")
while cmd[0]!='end':
if cmd[0]=='set':
mydb.dbset(cmd[1], cmd[2])
elif cmd[0]=='unset':
mydb.dbunset(cmd[1])
elif cmd[0]=='numequalto':
print len(mydb.dbnumequalto(cmd[1]))
elif cmd[0]=='list':
print mydb.database
cmd=raw_input().rstrip().split(" ")
if __name__=='__main__':
main()
Now, as a next step I want to be able to do nested transactions within this python code.I begin a set of commands with BEGIN command and then commit them with COMMIT statement. A commit should commit all the transactions that began. However, a rollback should revert the changes back to the recent BEGIN. I am not able to come up with a suitable solution for this.

A simple approach is to keep a "transaction" list containing all the information you need to be able to roll-back pending changes:
def dbset(self, name, value):
self.transaction.append((name, self.database.get(name)))
self.database[name]=value
def rollback(self):
# undo all changes
while self.transaction:
name, old_value = self.transaction.pop()
self.database[name] = old_value
def commit(self):
# everything went fine, drop undo information
self.transaction = []

If you are doing this as an academic exercise, you might want to check out the Rudimentary Database Engine recipe on the Python Cookbook. It includes quite a few classes to facilitate what you might expect from a SQL engine.
Database is used to create database instances without transaction support.
Database2 inherits from Database and provides for table transactions.
Table implements database tables along with various possible interactions.
Several other classes act as utilities to support some database actions that would normally be supported.
Like and NotLike implement the LIKE operator found in other engines.
date and datetime are special data types usable for database columns.
DatePart, MID, and FORMAT allow information selection in some cases.
In addition to the classes, there are functions for JOIN operations along with tests / demonstrations.

This is all available for free in the built in sqllite module. The commits and rollbacks for sqllite are discussed in more detail than I can understand here

Filter by an object in SQLAlchemy

I have a declared model where the table stores a "raw" path identifier of an object. I then have a #hybrid_property which allows directly getting and setting the object which is identified by this field (which is not another declarative model). Is there a way to query directly on this high level?
I can do this:
session.query(Member).filter_by(program_raw=my_program.raw)
I want to be able to do this:
session.query(Member).filter_by(program=my_program)
where my_program.raw == "path/to/a/program"
Member has a field program_raw and a property program which gets the correct Program instance and sets the appropriate program_raw value. Program has a simple raw field which identifies it uniquely. I can provide more code if necessary.
The problem is that currently, SQLAlchemy simply tries to pass the program instance as a parameter to the query, instead of its raw value. This results in a Error binding parameter 0 - probably unsupported type. error.
Either, SQLAlchemy needs to know that when comparing the program, it must use Member.program_raw and match that against the raw property of the parameter. Getting it to use Member.program_raw is done simply using #program.expression but I can't figure out how to translate the Program parameter correctly (using a Comparator?), and/or
SQLAlchemy should know that when I filter by a Program instance, it should use the raw attribute.
My use-case is perhaps a bit abstract, but imagine I stored a serialized RGB value in the database and had a property with a Color class on the model. I want to filter by the Color class, and not have to deal with RGB values in my filters. The color class has no problems telling me its RGB value.

Figured it out by reading the source for relationship. The trick is to use a custom Comparator for the property, which knows how to compare two things. In my case it's as simple as:
from sqlalchemy.ext.hybrid import Comparator, hybrid_property
class ProgramComparator(Comparator):
def __eq__(self, other):
# Should check for case of `other is None`
return self.__clause_element__() == other.raw
class Member(Base):
# ...
program_raw = Column(String(80), index=True)
#hybrid_property
def program(self):
return Program(self.program_raw)
#program.comparator
def program(cls):
# program_raw becomes __clause_element__ in the Comparator.
return ProgramComparator(cls.program_raw)
#program.setter
def program(self, value):
self.program_raw = value.raw
Note: In my case, Program('abc') == Program('abc') (I've overridden __new__), so I can just return a "new" Program all the time. For other cases, the instance should probably be lazily created and stored in the Member instance.

Consistent indexing for objects with variable attributes in ZODB

I have a ZODB installation where I have to organize several million objects of about a handful of different types. I have a generic container class Table, which contains BTrees to index objects by attributes or combinations of these attributes. Data consistency is quite essential, and so I want to enforce, that the indices are automatically updated, when I write to any of the attributes, which are covered by the indexing. So a simple obj.a = x should be sufficient to calculate all new dependent index entries, check if there are any collisions, and finally write the indices and the value.
In general, I'd be happy to use a library for that, so I was looking at repoze.catalog and IndexedCatalog, but was not really happy with that. IndexedCatalog seems dead for quite a while, and not providing the kind of consistency for changes to the objects. repoze.catalog seems to be more used and active, but also not providing this kind of consistency, as far as I understand. If I missed something here, I'd love to hear about it and prefer reusing over reinventing.
So, how I see it besides trying to find a library for the problem, I'd have to intercept the write access to the dataobject attributes with descriptors and let the Table class do the magic of changing the indices. For that, the descriptor instances have to know, with which Table instances they have to talk with. The current implementation goes someting like that:
class DatabaseElement(Persistent):
name = Property(constant_parameters)
...
class Property(object):
...
def __set__(self, obj, name, val):
val = self.check_value(val)
setattr(obj, '_' + name, val)
When these DatabaseElement classes are generated, the database and the objects within are not yet created. So as mentioned in this nice answer, I'd probably have to create some singleton lookup mechanism, to find the Table objects, without handing them to Property as an instantiation argument. Is there a more elegant way? Persisting the descriptors itself? Any suggestions and best-practice examples welcome!

So I finally figured out myself. The solution comes in three parts. No ugly Singleton required. Table provides the logic to check for collisions, DatabaseElement gets the ability to lookup the responsible Table without ugly workarounds and Property takes care, that the indices are updated, before any indexed values are written. Here some snippets, the main clue is the table lookup of DatabaseElement. I also didn't see that documented anywhere. Nice extra: It not only verifies writes to single values, I can also check for changes of several indexed values in one go.
class Table(PersistentMapping):
....
def update_indices(self, inst, updated_values_dict):
changed_indices_keys = self._check_collision(inst, updated_values_dict)
original_keys = [inst.key(index) for index, tmp_key in changed_indices_keys]
for (index, tmp_key), key in zip(changed_indices_keys, original_keys):
self[index][tmp_key] = inst
try:
del self[index][key]
except KeyError:
pass
class DatabaseElement(Persistent):
....
#property
def _table(self):
return self._p_jar and self._p_jar.root()[self.__class__.__name__]
def _update_indices(self, update_dict, verify=True):
if verify:
update_dict = dict((key, getattr(type(self), key).verify(val))
for key, val in update_dict.items()
if key in self._key_properties)
if not update_dict:
return
table = self._table
table and table.update_indices(self, update_dict)
class Property(object):
....
def __set__(self, obj, val):
validated_val = self.validator(obj, self.name, val)
if self.indexed:
obj._update_indices({self.name: val}, verify=False)
setattr(obj, self.hidden_name, validated_val)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Why use MongoAlchemy when you could subclass a Python Dict? - python

Related

Can an association class be implemented in Python?

SQLAlchemy hybrid_property and expressions

python database implementation

Filter by an object in SQLAlchemy

Consistent indexing for objects with variable attributes in ZODB

Categories

Resources