MongoEngine: Create Pickle field - python

I'm using MongoEngine and trying to create a field that works like SQLAlchemy's PickleType field. Basically, I just need to pickle objects before they're written to the database, and unpickle them when they're loaded.
However it looks like MongoEngine's fields don't provide proper conversion methods I could override, instead having two coercion methods (to_python and to_mongo). If I understand correctly, these functions can be called anytime, that is, a call to to_python(v) does not guarantee that v comes from the database. I've thought of writing something like this:
class PickleField(fields.BinaryField):
    def to_python(self, value):
        value = super().to_python(value)
        if <<value was pickled by the field>>:
            return pickle.loads(value)
        else:
            return value
Unfortunately, if I want to be as general as possible, I don't see a way to check whether the value should be unpickled or not. For instance,
a = pickle.dumps(x)
PickleField().to_python(a) # should return a, will return x
I also don't think I can store any state in the PickleField, since that's shared by all instances.
Is there a way around this?
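The ambiguity described above can be reproduced without MongoEngine. The following is only a minimal sketch: NaivePickleCoercer is a hypothetical stand-in that mimics the two coercion methods and is not MongoEngine's actual field API.

```python
import pickle


class NaivePickleCoercer:
    """Hypothetical stand-in that mimics a field's to_mongo/to_python pair."""

    def to_mongo(self, value):
        # Serialize on the way to the database.
        return pickle.dumps(value)

    def to_python(self, value):
        # Naive rule: any bytes that unpickle cleanly are assumed to be ours.
        if isinstance(value, bytes):
            try:
                return pickle.loads(value)
            except Exception:
                return value
        return value


field = NaivePickleCoercer()
x = {"answer": 42}

# A round trip through the "database" works as intended.
stored = field.to_mongo(x)
loaded = field.to_python(stored)

# But a user value that happens to be pickled bytes gets unpickled as well.
a = pickle.dumps(x)
coerced = field.to_python(a)
```

Here coerced ends up equal to x rather than a, which is exactly the case the question is worried about.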

Related

Filter by an object in SQLAlchemy

I have a declared model where the table stores a "raw" path identifier of an object. I then have a @hybrid_property which allows directly getting and setting the object which is identified by this field (which is not another declarative model). Is there a way to query directly on this high level?
I can do this:
session.query(Member).filter_by(program_raw=my_program.raw)
I want to be able to do this:
session.query(Member).filter_by(program=my_program)
where my_program.raw == "path/to/a/program"
Member has a field program_raw and a property program which gets the correct Program instance and sets the appropriate program_raw value. Program has a simple raw field which identifies it uniquely. I can provide more code if necessary.
The problem is that currently, SQLAlchemy simply tries to pass the program instance as a parameter to the query, instead of its raw value. This results in an "Error binding parameter 0 - probably unsupported type." error.
Either SQLAlchemy needs to know that, when comparing the program, it must use Member.program_raw and match that against the raw property of the parameter (getting it to use Member.program_raw is simple with @program.expression, but I can't figure out how to translate the Program parameter correctly; with a Comparator?), and/or
SQLAlchemy should know that when I filter by a Program instance, it should use the raw attribute.
My use-case is perhaps a bit abstract, but imagine I stored a serialized RGB value in the database and had a property with a Color class on the model. I want to filter by the Color class, and not have to deal with RGB values in my filters. The color class has no problems telling me its RGB value.
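The binding error can be reproduced at the driver level. This is a sketch that assumes an SQLite backend (the quoted message matches the wording of Python's sqlite3 module); the table layout and the Program class here are simplified stand-ins, not the asker's actual model.

```python
import sqlite3


class Program:
    """Simplified stand-in for the asker's Program class."""

    def __init__(self, raw):
        self.raw = raw


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE member (id INTEGER PRIMARY KEY, program_raw TEXT)")
conn.execute("INSERT INTO member (program_raw) VALUES (?)", ("path/to/a/program",))

my_program = Program("path/to/a/program")

# Passing the object itself: the driver has no idea how to bind it.
try:
    conn.execute("SELECT id FROM member WHERE program_raw = ?", (my_program,))
    bind_failed = False
except sqlite3.Error:
    bind_failed = True

# Passing the raw value works, which is what the query layer has to do for us.
rows = conn.execute(
    "SELECT id FROM member WHERE program_raw = ?", (my_program.raw,)
).fetchall()
```

Whatever sits between the filter and the driver therefore has to unwrap the Program into its raw value before the parameter is bound.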
Figured it out by reading the source for relationship. The trick is to use a custom Comparator for the property, which knows how to compare two things. In my case it's as simple as:
from sqlalchemy.ext.hybrid import Comparator, hybrid_property

class ProgramComparator(Comparator):
    def __eq__(self, other):
        # Should check for case of `other is None`
        return self.__clause_element__() == other.raw

class Member(Base):
    # ...
    program_raw = Column(String(80), index=True)

    @hybrid_property
    def program(self):
        return Program(self.program_raw)

    @program.comparator
    def program(cls):
        # program_raw becomes __clause_element__ in the Comparator.
        return ProgramComparator(cls.program_raw)

    @program.setter
    def program(self, value):
        self.program_raw = value.raw
Note: In my case, Program('abc') == Program('abc') (I've overridden __new__), so I can just return a "new" Program all the time. For other cases, the instance should probably be lazily created and stored in the Member instance.
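For the "other cases" mentioned in the note, the lazy creation could look roughly like the sketch below. It uses a plain property instead of hybrid_property so that it runs without SQLAlchemy, and the _program_cache attribute name is an assumption made for illustration.

```python
class Program:
    """Simplified stand-in: no __new__ override, so each call makes a new object."""

    def __init__(self, raw):
        self.raw = raw


class Member:
    def __init__(self, program_raw):
        self.program_raw = program_raw

    @property
    def program(self):
        # getattr with a default, since an ORM may build instances without __init__.
        cached = getattr(self, "_program_cache", None)
        if cached is None or cached.raw != self.program_raw:
            # Create lazily, and recreate if program_raw was changed directly.
            cached = Program(self.program_raw)
            self._program_cache = cached
        return cached

    @program.setter
    def program(self, value):
        self.program_raw = value.raw
        self._program_cache = value


member = Member("path/to/a/program")
first = member.program
second = member.program
```

Repeated reads return the same Program object, and the cache is refreshed when program_raw no longer matches it.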

Consistent indexing for objects with variable attributes in ZODB

I have a ZODB installation where I have to organize several million objects of a handful of different types. I have a generic container class Table, which contains BTrees to index objects by attributes or combinations of attributes. Data consistency is essential, so I want to enforce that the indices are updated automatically whenever I write to any attribute covered by the indexing. A simple obj.a = x should be sufficient to calculate all new dependent index entries, check for collisions, and finally write the indices and the value.
In general, I'd be happy to use a library for that, so I looked at repoze.catalog and IndexedCatalog, but was not really happy with either. IndexedCatalog seems to have been dead for quite a while and does not provide this kind of consistency for changes to the objects. repoze.catalog seems to be more widely used and active, but as far as I understand it does not provide this kind of consistency either. If I missed something here, I'd love to hear about it; I prefer reusing over reinventing.
So, as I see it, besides trying to find a library for the problem, I'd have to intercept write access to the data object attributes with descriptors and let the Table class do the magic of changing the indices. For that, the descriptor instances have to know which Table instances to talk to. The current implementation goes something like this:
class DatabaseElement(Persistent):
    name = Property(constant_parameters)
    ...

class Property(object):
    ...
    def __set__(self, obj, name, val):
        val = self.check_value(val)
        setattr(obj, '_' + name, val)
When these DatabaseElement classes are generated, the database and the objects within it are not yet created. So, as mentioned in this nice answer, I'd probably have to create some singleton lookup mechanism to find the Table objects without handing them to Property as an instantiation argument. Is there a more elegant way? Persisting the descriptors themselves? Any suggestions and best-practice examples welcome!
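One detail about the descriptor in the question: Python's descriptor protocol calls __set__(self, obj, value) and passes no attribute name. Since Python 3.6, a descriptor can learn its name through __set_name__ instead. A minimal sketch, in which check_value is a hypothetical validator and DatabaseElement is a plain class rather than a Persistent one:

```python
class Property:
    def __set_name__(self, owner, name):
        # Called once, when the owning class body is executed.
        self.name = name
        self.hidden_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class: return the descriptor itself.
            return self
        return getattr(obj, self.hidden_name)

    def __set__(self, obj, val):
        # The protocol passes only the instance and the new value.
        setattr(obj, self.hidden_name, self.check_value(val))

    def check_value(self, val):
        # Hypothetical validation rule: only non-empty strings are accepted.
        if not isinstance(val, str) or not val:
            raise ValueError("expected a non-empty string")
        return val


class DatabaseElement:
    name = Property()


element = DatabaseElement()
element.name = "alpha"
```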
So I finally figured it out myself. The solution comes in three parts, with no ugly singleton required. Table provides the logic to check for collisions, DatabaseElement gets the ability to look up the responsible Table without ugly workarounds, and Property takes care that the indices are updated before any indexed values are written. Here are some snippets; the main clue is the table lookup in DatabaseElement, which I didn't see documented anywhere. Nice extra: it not only verifies writes to single values, I can also check changes to several indexed values in one go.
class Table(PersistentMapping):
    ....
    def update_indices(self, inst, updated_values_dict):
        changed_indices_keys = self._check_collision(inst, updated_values_dict)
        original_keys = [inst.key(index) for index, tmp_key in changed_indices_keys]
        for (index, tmp_key), key in zip(changed_indices_keys, original_keys):
            self[index][tmp_key] = inst
            try:
                del self[index][key]
            except KeyError:
                pass

class DatabaseElement(Persistent):
    ....
    @property
    def _table(self):
        return self._p_jar and self._p_jar.root()[self.__class__.__name__]

    def _update_indices(self, update_dict, verify=True):
        if verify:
            update_dict = dict((key, getattr(type(self), key).verify(val))
                               for key, val in update_dict.items()
                               if key in self._key_properties)
        if not update_dict:
            return
        table = self._table
        table and table.update_indices(self, update_dict)

class Property(object):
    ....
    def __set__(self, obj, val):
        validated_val = self.validator(obj, self.name, val)
        if self.indexed:
            obj._update_indices({self.name: val}, verify=False)
        setattr(obj, self.hidden_name, validated_val)
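The order of operations in these snippets (check for collisions, update the indices, then write the value) can be tried out without ZODB. The sketch below is a simplified stand-in under several assumptions: plain dicts replace the BTrees, the table is passed to the element directly instead of being looked up through _p_jar, and validation is left out.

```python
class Table:
    """Stand-in for the persistent Table: one dict index per indexed attribute."""

    def __init__(self, indexed_names):
        self.indices = {name: {} for name in indexed_names}

    def update_indices(self, inst, updated_values):
        # Check every new key first, so a collision leaves all indices untouched.
        for name, new_key in updated_values.items():
            owner = self.indices[name].get(new_key)
            if owner is not None and owner is not inst:
                raise ValueError(f"{name}={new_key!r} is already taken")
        for name, new_key in updated_values.items():
            hidden = "_" + name
            if hasattr(inst, hidden):
                # Drop the entry for the value that is about to be replaced.
                self.indices[name].pop(getattr(inst, hidden), None)
            self.indices[name][new_key] = inst


class Property:
    def __set_name__(self, owner, name):
        self.name = name
        self.hidden_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.hidden_name)

    def __set__(self, obj, val):
        # Indices first; the value is only written if no collision was raised.
        obj._update_indices({self.name: val})
        setattr(obj, self.hidden_name, val)


class Element:
    name = Property()

    def __init__(self, table, name):
        self._table = table
        self.name = name

    def _update_indices(self, update_dict):
        if self._table is not None:
            self._table.update_indices(self, update_dict)


table = Table(["name"])
a = Element(table, "a")
b = Element(table, "b")
```

Assigning b.name = "a" now raises ValueError and leaves both the index and b.name unchanged, while a.name = "c" moves the index entry from "a" to "c".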

Meaning of the map function in couchdb-pythons ViewField

I'm using the couchdb.mapping in one of my projects. I have a class called SupportCase derived from Document that contains all the fields I want.
My database (called admin) contains multiple document types. I have a type field in all the documents which I use to distinguish between them. I have many documents of type "case" which I want to get at using a view. I have a design document called support with a view inside it called cases. If I request the results of this view using db.view("support/cases"), I get back a list of Rows which have what I want.
However, I want to somehow have this wrapped by the SupportCase class so that I can call a single function and get back a list of all the SupportCases in the system. I created a ViewField property:
@ViewField.define('cases')
def all(self, doc):
    if doc.get("type","") == "case":
        yield doc["_id"], doc
Now, if I call SupportCase.all(db), I get back all the cases.
What I don't understand is whether this view is precomputed and stored in the database or done on demand similar to db.query. If it's the latter, it's going to be slow and I want to use a precomputed view. How do I do that?
I think what you need is:
@classmethod
def all(cls):
    result = cls.view(db, "support/all", include_docs=True)
    return result.rows
The Document class has a classmethod, view, which wraps each row in the class it is called on. So the following returns a ViewResult with rows of type SupportCase, and taking .rows of that gives a list of support cases.
SupportCase.view(db, viewname, include_docs=True)
And I don't think you need to get into the ViewField magic. But let me explain how it works. Consider the following example from the CouchDB-python documentation.
class Person(Document):
    @ViewField.define('people')
    def by_name(doc):
        yield doc['name'], doc
I think this is equivalent to:
class Person(Document):
    @classmethod
    def by_name(cls, db, **kw):
        return cls.view(db, **kw)
With the original function attached to Person.by_name.map_fun.
The map function is in some ways analogous to an index in a relational database. It is not done again every time, and when new documents are added the way it is updated does not require everything to be redone (it's a kind of tree structure).
This has a pretty good summary
ViewField uses a pre-defined view so, once built, will be fast. It definitely doesn't use a temporary view.
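To make the "index" analogy concrete, here is a toy model of a stored view. It is not how CouchDB is implemented (CouchDB keeps view rows in a B-tree and also handles updates and deletions); ToyView and its attributes are hypothetical names, and the sketch only shows that rows emitted by the map function are kept and that only documents not seen before are mapped on a later query.

```python
def cases_map(doc):
    # Same shape as the map function in the question.
    if doc.get("type", "") == "case":
        yield doc["_id"], doc


class ToyView:
    """Toy stored view: emitted rows are kept, only unseen docs are mapped."""

    def __init__(self, map_fun):
        self.map_fun = map_fun
        self.rows = []            # (key, value) pairs emitted so far
        self.indexed_ids = set()  # documents that were already mapped
        self.map_calls = 0

    def query(self, docs):
        for doc in docs:
            if doc["_id"] in self.indexed_ids:
                continue  # already indexed, no need to run the map function again
            self.map_calls += 1
            self.rows.extend(self.map_fun(doc))
            self.indexed_ids.add(doc["_id"])
        # View results come back ordered by key.
        return sorted(self.rows, key=lambda row: row[0])


docs = [{"_id": "c1", "type": "case"}, {"_id": "u1", "type": "user"}]
view = ToyView(cases_map)
first = view.query(docs)

docs.append({"_id": "c0", "type": "case"})
second = view.query(docs)
```

After the second query the map function has run three times in total, once per document, rather than five.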

Dynamically building up types in python

Suppose I am building a composite set of types:
def subordinate_type(params):
    # Dink with stuff
    a = type(myname, (), dict_of_fields)
    return a()

def toplevel(params):
    lots_of_types = dict(zip(keys, values))
    myawesomedynamictype = type(toplevelname, (), lots_of_types)
    # Now I want to edit some of the values in myawesomedynamictype's
    # lots_of_types.
    return myawesomedynamictype()
In this particular case, I want a reference to the "typeclass" myawesomedynamictype inserted into lots_of_types.
I've tried to iterate through lots_of_types and set it, supposing that the references were pointed at the same thing, but I found that the myawesomedynamictype got corrupted and lost its fields.
The problem I'm trying to solve is that I get values related to the type subordinate_type, and I need to generate a toplevel instantiation based on subordinate_type.
This is an ancient question, and because it's not clear what the code is trying to do (being a code gist rather than working code), it's a little hard to answer.
But it sounds like you want a reference to the dynamically created class "myawesomedynamictype" on the class itself. The contents of the dictionary lots_of_types were copied into the __dict__ of this new class when you called type() to construct it, so editing lots_of_types afterwards does not change the class.
So, just set a new attribute on the class whose value is the class you just constructed. Is that what you were after?
def toplevel(params):
    lots_of_types = dict(zip(keys, values))
    myawesomedynamictype = type(toplevelname, (), lots_of_types)
    myawesomedynamictype.myawesomedynamictype = myawesomedynamictype
    return myawesomedynamictype()
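A runnable version of the same idea, with made-up names (TopLevel, self_type, the x field) standing in for the undefined ones in the gist. It also shows that the namespace dictionary passed to type() is copied, so the class and the original dictionary are independent afterwards.

```python
def toplevel(name, fields):
    # Build the class, then attach a reference to the class onto itself.
    cls = type(name, (), fields)
    cls.self_type = cls  # hypothetical attribute name
    return cls()


fields = {"x": 1}
instance = toplevel("TopLevel", fields)
cls = type(instance)

# Mutating the original dict afterwards does not touch the class.
fields["x"] = 2
```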

How do I get the key value of a db.ReferenceProperty without a database hit?

Is there a way to get the key (or id) value of a db.ReferenceProperty, without dereferencing the actual entity it points to? I have been digging around - it looks like the key is stored as the property name preceded with an _, but I have been unable to get any code working. Examples would be much appreciated. Thanks.
EDIT: Here is what I have unsuccessfully tried:
class Comment(db.Model):
    series = db.ReferenceProperty(reference_class=Series)

    def series_id(self):
        return self._series
And in my template:
more
The result:
more
Actually, the way you are accessing the key for a ReferenceProperty might well not exist in the future. Attributes that begin with '_' in Python are generally accepted to be "protected": code that is closely bound to the implementation can use them, but any such code must change when the implementation changes.
However, there is a way through the public interface that you can access the key for your reference-property so that it will be safe in the future. I'll revise the above example:
class Comment(db.Model):
    series = db.ReferenceProperty(reference_class=Series)

    def series_id(self):
        return Comment.series.get_value_for_datastore(self)
When you access a property via the class it is associated with, you get the property object itself, which has a public method that returns the underlying value.
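The mechanism behind this is the ordinary descriptor protocol, which can be shown without App Engine. The sketch below is a toy stand-in, not the real db.ReferenceProperty: ToyReferenceProperty, DATASTORE, and fetches are hypothetical names, and only the method name get_value_for_datastore is taken from the answer above.

```python
DATASTORE = {"series-1": {"title": "Pilot"}}  # pretend datastore
fetches = []  # records every simulated datastore hit


class ToyReferenceProperty:
    def __set_name__(self, owner, name):
        self.attr = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed via the class: hand back the property object itself.
            return self
        key = getattr(obj, self.attr)
        fetches.append(key)  # instance access dereferences the key
        return DATASTORE[key]

    def __set__(self, obj, key):
        setattr(obj, self.attr, key)

    def get_value_for_datastore(self, obj):
        # Return the stored key without touching the datastore.
        return getattr(obj, self.attr)


class Comment:
    series = ToyReferenceProperty()

    def __init__(self, series_key):
        self.series = series_key

    def series_id(self):
        return Comment.series.get_value_for_datastore(self)


comment = Comment("series-1")
key = comment.series_id()
hits_after_key = list(fetches)
entity = comment.series
```

Reading the key through the class-level property records no fetch, while reading comment.series does.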
You're correct - the key is stored as the property name prefixed with '_'. You should just be able to access it directly on the model object. Can you demonstrate what you're trying? I've used this technique in the past with no problems.
Edit: Have you tried calling series_id() directly, or referencing _series in your template directly? I'm not sure whether Django automatically calls methods with no arguments if you specify them in this context. You could also try putting the @property decorator on the method.
