Expunge object from SQLAlchemy session - python

I want to pass an instance of a mapped class to a non-SQLAlchemy-aware method (in another process) and only need the values of my attributes. The problem is that an UnboundExecutionError occurs every time the method tries to read an attribute value. I understand why this happens, but I would like a solution to the problem.
I only need the values of my defined attributes (id, name and dirty in the example) and do not need the SQLAlchemy overhead in the destination method.
Example class:
Base = declarative_base()
class Record(Base):
    __tablename__ = 'records'
    _id = Column('id', Integer, primary_key=True)
    _name = Column('name', String(50))
    _dirty = Column('dirty', Boolean, index=True)
    @synonym_for('_id')
    @property
    def id(self):
        return self._id
    @property
    def name(self):
        return self._name
    @name.setter
    def name(self, value):
        self._name = value
        self._dirty = True
    name = synonym('_name', descriptor=name)
    @synonym_for('_dirty')
    @property
    def dirty(self):
        return self._dirty
Example call:
...
def do_it(self):
    records = self.query.filter(Record.dirty == True)
    for record in records:
        pass_to_other_process(record)
I've tried using session.expunge() and copy.copy(), but without success.

You need to remove the SQLAlchemy object from the session, i.e. 'expunge' it. Then you can request any already-loaded attribute without it attempting to reuse its last known session or an unbound session.
self.expunge(record)
Be aware, though, that any unloaded attribute will return its last known value or None. If you would like to work with the object again later, you can call 'add' again, or 'merge':
self.add(record)

My guess is that you're running afoul of SQLAlchemy's lazy loading. Since I don't actually know a whole lot about SQLAlchemy's internals, here's what I recommend:
class RecordData(object):
    __slots__ = ('id', 'name', 'dirty')
    def __init__(self, rec):
        self.id = rec.id
        self.name = rec.name
        self.dirty = rec.dirty
Then later on...
def do_it(self):
    records = self.query.filter(Record.dirty == True)
    for record in records:
        pass_to_other_process(RecordData(record))
Now, I think there is a way to tell SQLAlchemy to turn your object into a 'dumb' object that has no connection to the database and looks very much like what I just made here. But I don't know what it is.
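A variation on the same idea, as a minimal sketch using only the standard library: copy the loaded values into a plain value object, then hand over a dict of builtins, which pickles without any reference to the mapped class. `FakeRecord` is a hypothetical stand-in for the mapped `Record` class and `snapshot` is an illustrative helper, not a SQLAlchemy API.

```python
import pickle
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RecordData:
    """Plain value object holding only the attribute values."""
    id: int
    name: str
    dirty: bool


def snapshot(record):
    """Copy the already-loaded values out of a record-like object (hypothetical helper)."""
    return RecordData(id=record.id, name=record.name, dirty=record.dirty)


class FakeRecord:
    """Hypothetical stand-in for the mapped Record class."""
    def __init__(self, id, name, dirty):
        self.id, self.name, self.dirty = id, name, dirty


data = snapshot(FakeRecord(1, "first", True))
# Round-trip a dict of builtins through pickle, which is roughly what
# multiprocessing does when it passes arguments to another process.
restored = pickle.loads(pickle.dumps(asdict(data)))
```

The copy has to happen while the record is still attached to its session (or after its attributes have been loaded), since that is the point at which the values are readable.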

You may need to make the object transient as well:
This describes one of the major object states which an object can have
within a Session; a transient object is a new object that doesn’t have
any database identity and has not been associated with a session yet.
When the object is added to the session, it moves to the pending
state.
You can do that by expunging the old object, making it transient, and then using it to create the new object.
Something like:
from sqlalchemy import select
from sqlalchemy.orm import make_transient
# Get the original object (new query style)
query = select(MyTable).where(MyTable.id == old_object_id)
result = session.execute(query)
old_object = result.scalar_one_or_none()
# Remove it from the session & make it transient
session.expunge(old_object)
make_transient(old_object)
# Use it to create the new object
new_object = MyTable(attr1=old_object.attr1 ...)
# Add & commit the new object
session.add(new_object)
session.commit()

Related

SQLAlchemy strategy: ORM + Core for classes with large amounts of data

Apparently use of ORM and Core in tandem is possible, but I haven't been able to find any solid explanation of a strategy for this.
Here's the use case class:
class DataHolder(Base):
    __tablename__ = 'data_holder'
    id = Column(Integer, primary_key=True)
    dataset_id = Column(Integer, ForeignKey('data_set.id'))
    name = Column(String)
    _dataset_table = Table('data_set', Base.metadata,
        Column('id', Integer, primary_key=True),
    )
    _datarows_table = Table('data_rows', Base.metadata,
        Column('id', Integer, primary_key=True),
        Column('dataset_id', None, ForeignKey('data_set.id')),
        Column('row', Integer),
        Column('col_0', Integer),
        Column('col_1', Integer),
        Column('col_2', Integer),
    )
    def __init__(self, name=None, data=None):
        self.name = name
        self.data = data
    def _pack_info(self):
        # Return __class__ and other info needed for packing.
        pass
    def _unpack_info(self):
        # Return info needed for unpacking.
        pass
name should be persisted via the ORM. data, which would be a large NumPy array (or similar type), should be persisted via the Core.
There is a go-between table 'data_set' that exists for the purpose of a many-to-one relationship between DataHolder and the data. This allows data sets to exist independently within some library. (The sole purpose of this table is to generate IDs for new data sets.)
Actual persistence would be accomplished through a class that implements some listeners, such as the following.
class PersistenceManager:
    def __init__(self):
        self.init_db()
        self.init_listeners()
    def init_db(self):
        engine = create_engine('sqlite:///path/to/database.db')
        self.sa_engine = engine
        self.sa_sessionmaker = sessionmaker(bind=engine)
        Base.metadata.create_all(engine)
    def init_listeners(self):
        @event.listens_for(Session, 'transient_to_pending')
        def pack_data(session, instance):
            try:
                pack_info = instance._pack_info()
                # Use Core to execute INSERT for bulky data.
            except AttributeError:
                pass
        @event.listens_for(Session, 'loaded_as_persistent')
        def unpack_data(session, instance):
            try:
                unpack_info = instance._unpack_info()
                # Use Core to execute SELECT for bulky data.
            except AttributeError:
                pass
    def persist(self, obj):
        session.add(obj)
    def load(self, class_, spec):
        obj = session.query(class_).filter_by(**spec).all()[-1]
        return obj
    def session_scope(self):
        session = self.sa_sessionmaker()
        try:
            yield session
            session.commit()
        except:
            session.rollback()
            raise
        finally:
            session.close()
The idea is that whenever a DataHolder is persisted, its data is also persisted at the same (or nearly the same) time.
Listening for 'transient_to_pending' (for "packing") and 'loaded_as_persistent' (for "unpacking") events will work for simple saving and loading. However, it seems care should be taken to also listen for the 'pending_to_transient' event. In the case of a rollback, the data added via Core will not be pulled back out of the database in the same way the ORM-related data will.
Is there another, better way to handle this besides listening for 'pending_to_transient'? This could cause problems in the case where two different DataHolders reference the same data set: one DataHolder could roll back, removing the data set from the database so that the other DataHolder can no longer use it.
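As a side note on the rollback path: session_scope above is a generator method, so it would need contextlib.contextmanager before it could be used in a with statement. The following is a minimal sketch of that pattern, with a hypothetical `FakeSession` that only records calls, to show the order in which commit, rollback and close run. If the Core statements were issued through the same session (an assumption about the design, not something shown in the question), they would take part in that same transaction.

```python
from contextlib import contextmanager


class FakeSession:
    """Hypothetical stand-in for a Session; it only records the calls made."""
    def __init__(self):
        self.calls = []

    def commit(self):
        self.calls.append("commit")

    def rollback(self):
        self.calls.append("rollback")

    def close(self):
        self.calls.append("close")


@contextmanager
def session_scope(session_factory):
    """Commit on success, roll back on error, always close."""
    session = session_factory()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()


# Success path: the block finishes, so the scope commits and closes.
with session_scope(FakeSession) as ok:
    pass

# Failure path: the exception triggers a rollback, then propagates.
try:
    with session_scope(FakeSession) as failed:
        raise ValueError("simulated failure while packing data")
except ValueError:
    pass
```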

Changes to model class's __init__ method don't seem to be taking effect

I have two particular model classes that are giving me errors during testing. After inspecting the methods of each, I was fairly certain the issues were the result of typos on my part.
I've made the changes in both classes as needed, but re-running the tests produces the same error. I even tried dropping my schema and re-creating it with Flask-SQLAlchemy's create_all() method, but I still run into issues.
In the Metrics class, the variables in the __init__ method were wrong and were missing underscores (i.e. self.name instead of self._name). I addressed that by changing them to self._name and self._metric_type.
In the HostMetricMapping class, I needed to add the host_id parameter to the __init__ method, since I had forgotten it the first time. So, I added it.
class Metrics(_database.Model):
    __tablename__ = 'Metrics'
    _ID = _database.Column(_database.Integer, primary_key=True)
    _name = _database.Column(_database.String(45), nullable=False)
    _metric_type = _database.Column(_database.String(45))
    _host_metric_mapping = _database.relationship('HostMetricMapping', backref='_parent_metric', lazy=True)
    def __init__(self, name, metric_type):
        self._name = name # This line used to say self.name, but was changed to self._name to match the column name
        self._metric_type = metric_type # This line used to say self.metric_type, but was changed to self._metric_type to match the column name
    def __repr__(self):
        return '{0}'.format(self._ID)
class HostMetricMapping(_database.Model):
    __tablename__ = 'HostMetricMapping'
    _ID = _database.Column(_database.Integer, primary_key=True)
    _host_id = _database.Column(_database.Integer, _database.ForeignKey('Hosts._ID'), nullable=False)
    _metric_id = _database.Column(_database.Integer, _database.ForeignKey('Metrics._ID'), nullable=False)
    _metric = _database.relationship('MetricData', backref='_metric_hmm', lazy=True)
    _threshold = _database.relationship('ThresholdMapping', backref='_threshold_hmm', lazy=True)
    def __init__(self, host_id, metric_id):
        self._host_id = host_id # This line and its corresponding parameter were missing, and were added
        self._metric_id = metric_id
    def __repr__(self):
        return '{0}'.format(self._ID)
The issues I encounter are:
When trying to instantiate an instance of Metrics and add it into the database, SQLAlchemy raises an IntegrityError because I have the _name column set to not null, and SQLAlchemy inherits the values for both _name and _metric_type as None or NULL, even though I instantiate it with values for both parameters.
For HostMetricMapping, Python raises an exception because it still treats that class as only having the metric_id parameter, instead of also having the host_id parameter I've added.
A better way to override __init__ when using Flask-SQLAlchemy is to use reconstructor. Object initialization with SQLAlchemy is a little tricky, and Flask-SQLAlchemy might be complicating it as well. Anyway, here's how we do it:
from sqlalchemy.orm import reconstructor
class MyModel(db.Model):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.init_on_load()
    @reconstructor
    def init_on_load(self):
        # put your init stuff here
        pass

Adding to sqlalchemy mapping class non db attributes

The app works like this: a list of people is stored in the database, and each man has a rating that is calculated in real time and never stored in the database. I want to use one class to work with both the database fields (name, age, etc.) and the non-database field (rating).
Is this possible in SQLAlchemy? Right now I'm using inheritance, Man -> ManMapping:
class Man:
    rating = None
    def get_rating(self):
        return self.rating
    ...
class ManMapping(Base, Man):
    __tablename__ = 'man'
    id = Column('man_id', Integer, primary_key=True)
    name = Column(Unicode)
    ...
It works, but it looks terrible to me. Is this the right approach, or should I do something else?
This is the correct solution: https://docs.sqlalchemy.org/en/13/orm/constructors.html
Hybrid properties are somewhat less flexible than this. The accepted answer is not an actual answer to the problem.
The SQLAlchemy ORM does not call __init__ when recreating objects from database rows. The ORM’s process is somewhat akin to the Python standard library’s pickle module, invoking the low level __new__ method and then quietly restoring attributes directly on the instance rather than calling __init__.
If you need to do some setup on database-loaded instances before they’re ready to use, there is an event hook known as InstanceEvents.load() which can achieve this; it is also available via a class-specific decorator called reconstructor(). When using reconstructor(), the mapper will invoke the decorated method with no arguments every time it loads or reconstructs an instance of the class. This is useful for recreating transient properties that are normally assigned in __init__:
from sqlalchemy import orm
class MyMappedClass(object):
    def __init__(self, data):
        self.data = data
        # we need stuff on all instances, but not in the database.
        self.stuff = []
    @orm.reconstructor
    def init_on_load(self):
        self.stuff = []
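To see why an attribute assigned only in __init__ goes missing on loaded instances, here is a plain-Python sketch of the mechanism the documentation describes. It involves no SQLAlchemy at all: `load_from_row` is a hypothetical loader that only imitates, conceptually, allocating with __new__ and restoring attributes directly, and `init_on_load` plays the role of the method decorated with reconstructor.

```python
class MyMappedClass:
    init_calls = 0  # counts how often __init__ actually runs

    def __init__(self, data):
        type(self).init_calls += 1
        self.data = data
        self.stuff = []  # transient, not stored anywhere

    def init_on_load(self):
        # Plays the role of the method decorated with @orm.reconstructor.
        self.stuff = []


def load_from_row(cls, row, call_hook):
    """Hypothetical loader: build an instance without calling __init__."""
    obj = cls.__new__(cls)      # allocate only; __init__ is skipped
    obj.__dict__.update(row)    # restore persisted attributes directly
    if call_hook:
        obj.init_on_load()
    return obj


bare = load_from_row(MyMappedClass, {"data": 42}, call_hook=False)
hooked = load_from_row(MyMappedClass, {"data": 42}, call_hook=True)
```

Here `bare` has `data` but no `stuff` attribute, while `hooked` has both, which is the gap the reconstructor hook is meant to close.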
If you are using any data from the DB to calculate rating, I would recommend looking at hybrid properties. Otherwise I would add self.rating to __init__ and put your function inside the ManMapping class. Something like:
class ManMapping(Base):
    __tablename__ = 'man'
    id = Column('man_id', Integer, primary_key=True)
    name = Column(Unicode)
    def __init__(self):
        self.rating = None
    def get_rating(self):
        return self.rating
In my view, you should have two distinct classes.
One for the logic in your code and one to communicate with your DB.
class Man(object):
    """This class is for your application"""
    def __init__(self, name, rating):
        # If the identifier is only used by the DB it should not be in this class
        self.name = name
        self.rating = rating
class ManModel(Base):
    """This model is only to communicate with the DB"""
    __tablename__ = 'man'
    id = Column('man_id', Integer, primary_key=True)
    name = Column(Unicode)
You should have a provider that queries the DB for ManModel objects, maps the results to Man objects, and returns the mapped data to the caller.
Your application will only use Man objects and your provider will do the mapping.
Something like this:
class DbProvider(object):
    def get_man(self, id):
        man_model = session.query(ManModel).filter(ManModel.id == id).one_or_none()
        return self.man_mapper(man_model) if man_model else None
    def get_men(self):
        men_model = session.query(ManModel).all()
        return [self.man_mapper(man_model) for man_model in men_model]
    def man_mapper(self, man_model):
        return Man(man_model.name, self.calculate_rating(man_model))
class Test(object):
    def display_man(self):
        man = db_provider.get_man(15)
        if man:
            print(man.name, man.rating)

Attaching extra information to model instance - django

I have a django model that I want to attach an extra piece of information to, depending on the environment the instance is in (which user is logged in). For this reason, I don't want to do it at the database level.
Is this okay to do? Or are there problems that I don't foresee?
in models.py
class FooOrBar(models.Model):
    """Type is 'foo' or 'bar'
    """
    def __init__(self, type):
        self.type = type
in views.py
class FooCheck(FooOrBar):
    """Never saved to the database
    """
    def __init__(self, foo_or_bar):
        self.__dict__ = foo_or_bar.__dict__.copy()
    def check_type(self, external_type):
        if external_type == 'foo':
            self.is_foo = True
        else:
            self.is_foo = False
foos_or_bars = FooOrBar.objects.all()
foochecks = map(FooCheck, foos_or_bars)
for foocheck in foochecks:
    foocheck.check_type('foo')
Extra credit question: is there a more efficient way of calling a method on multiple objects, i.e. replacing the last for loop with something clever?
Okay, this does not work. Trying to delete a FooOrBar object throws a complaint about
OperationalError at /
no such table: test_FooCheck
To get around this I'm just not going to inherit from FooOrBar, but if anyone has a suggestion for a better way to do it, I'd be interested in hearing it.
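A minimal sketch of what not inheriting could look like: a plain wrapper that holds the instance and delegates attribute reads to it, so no table is ever associated with the wrapper. `FakeFooOrBar` is a hypothetical stand-in for a FooOrBar instance so the example runs without Django. Having check_type return self is an illustrative tweak that lets the final loop collapse into a list comprehension.

```python
class FooCheck:
    """Wraps an instance instead of inheriting from the model class."""

    def __init__(self, foo_or_bar):
        self._wrapped = foo_or_bar
        self.is_foo = None

    def __getattr__(self, name):
        # Only called when normal lookup fails: delegate to the wrapped object.
        return getattr(self._wrapped, name)

    def check_type(self, external_type):
        self.is_foo = (external_type == 'foo')
        return self  # returning self allows use inside a comprehension


class FakeFooOrBar:
    """Hypothetical stand-in for a FooOrBar model instance."""
    def __init__(self, type):
        self.type = type


foos_or_bars = [FakeFooOrBar('foo'), FakeFooOrBar('bar')]
foochecks = [FooCheck(obj).check_type('foo') for obj in foos_or_bars]
```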
I had a similar issue. I did something like:
class Foo(models.Model):
    # specific info goes here
    pass
class Bar(models.Model):
    # specific info goes here
    pass
class FooBar(models.Model):
    CLASS_TYPES = {
        "foo": Foo,
        "bar": Bar
    }
    type = models.CharField(choices=CLASS_TYPES)
    id = models.IntegerField()
    # field to identify FooBar
Then you can get the object back using
object = FooBar.CLASS_TYPES[instance.type].objects.get(id=instance.id)
where instance is the FooBar instance

Equivalent of objects.latest() in App Engine

What would be the best way to get the latest inserted object using App Engine?
I know in Django this can be done using
MyObject.objects.latest()
In App Engine I'd like to be able to do this:
class MyObject(db.Model):
    time = db.DateTimeProperty(auto_now_add=True)
# Return latest entry from MyObject.
MyObject.all().latest()
Any ideas?
Your best bet will be to implement a latest() classmethod directly on MyObject and call it like
latest = MyObject.latest()
Anything else would require monkeypatching the built-in Query class.
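A minimal sketch of such a classmethod follows. So that it runs outside App Engine, `FakeQuery` is a hypothetical stand-in that mimics only the order() and get() calls of the query object, and `MyObject` here is a plain class with an in-memory store rather than a db.Model. With the real API, the body of latest() is intended to stay the same.

```python
class FakeQuery:
    """Hypothetical stand-in for the query object: order() and get() only."""
    def __init__(self, rows):
        self._rows = list(rows)

    def order(self, spec):
        # A leading '-' means descending, as in MyObject.all().order('-time').
        descending = spec.startswith('-')
        field = spec.lstrip('-')
        ordered = sorted(self._rows, key=lambda r: getattr(r, field),
                         reverse=descending)
        return FakeQuery(ordered)

    def get(self):
        # First result, or None when there are no results.
        return self._rows[0] if self._rows else None


class MyObject:
    """Stand-in for the model class, keeping instances in memory."""
    _store = []

    def __init__(self, time):
        self.time = time
        MyObject._store.append(self)

    @classmethod
    def all(cls):
        return FakeQuery(cls._store)

    @classmethod
    def latest(cls):
        return cls.all().order('-time').get()


MyObject(1)
MyObject(3)
MyObject(2)
newest = MyObject.latest()
```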
Update
I thought I'd see how ugly it would be to implement this functionality. Here's a mixin class you can use if you really want to be able to call MyObject.all().latest():
class LatestMixin(object):
    """A mixin for db.Model objects that will add a `latest` method to the
    `Query` object returned by cls.all(). Requires that the ORDER_FIELD
    contain the name of the field by which to order the query to determine the
    latest object."""
    # What field do we order by?
    ORDER_FIELD = None
    @classmethod
    def all(cls):
        # Get the real query
        q = super(LatestMixin, cls).all()
        # Define our custom latest method
        def latest():
            if cls.ORDER_FIELD is None:
                raise ValueError('ORDER_FIELD must be defined')
            return q.order('-' + cls.ORDER_FIELD).get()
        # Attach it to the query
        q.latest = latest
        return q
# How to use it
class Foo(LatestMixin, db.Model):
    ORDER_FIELD = 'timestamp'
    timestamp = db.DateTimeProperty(auto_now_add=True)
latest = Foo.all().latest()
MyObject.all() returns an instance of the Query class
Order the results by time:
MyObject.all().order('-time')
So, assuming there is at least one entry, you can get the most recent MyObject directly by:
MyObject.all().order('-time')[0]
or
MyObject.all().order('-time').fetch(limit=1)[0]
