Instantiate declarative classes in SQLAlchemy without instrumentation

I'm working on an application that uses sqlalchemy to pull a lot of data. The application does a lot of computations on the resulting objects, but never makes any changes to those objects. The application is too slow, and profiling suggests a lot of the time is spent accessing attributes that are managed by sqlalchemy. Therefore, I'm trying to figure out how to prevent sqlalchemy from instrumenting these objects.
So far, I've made a little progress based on this answer and this example. I've found two different ways to do it, but neither one seems to work with relationships. I'm mostly proceeding by poking around without any comprehensive understanding of the internals of sqlalchemy. In both cases, I'm using the following test session and data:
from sqlalchemy import create_engine, Column, Integer, String, ForeignKey
from sqlalchemy.orm import sessionmaker, relationship
from sqlalchemy.ext.declarative import declarative_base

engine = create_engine('sqlite://')
engine.execute('CREATE TABLE foo (id INTEGER, name TEXT)')
engine.execute('INSERT INTO foo VALUES (0, \'Jeremy\')')
engine.execute('CREATE TABLE bar (id INTEGER, foo_id INTEGER)')
engine.execute('INSERT INTO bar VALUES (0, 0)')
engine.execute('INSERT INTO bar VALUES (1, 0)')
Session = sessionmaker(bind=engine)
session = Session()
Then I either (1) use a custom subclass of InstrumentationManager:
from sqlalchemy.ext.instrumentation import InstrumentationManager

class ReadOnlyInstrumentationManager(InstrumentationManager):
    def install_descriptor(self, class_, key, inst):
        pass

Base = declarative_base()
Base.__sa_instrumentation_manager__ = ReadOnlyInstrumentationManager

class Bar(Base):
    __tablename__ = 'bar'
    id = Column(Integer, primary_key=True)
    foo_id = Column(Integer, ForeignKey('foo.id'))

class Foo(Base):
    __tablename__ = 'foo'
    id = Column(Integer, primary_key=True)
    name = Column(String(32))
    bars = relationship(Bar)

f = session.query(Foo).first()
print type(f.bars)
which gives:
<class 'sqlalchemy.orm.relationships.RelationshipProperty'>
instead of the expected list of bars.
Or, (2) use a custom subclass of ClassManager:
from sqlalchemy.orm.instrumentation import ClassManager

class MyClassManager(ClassManager):
    def new_instance(self, state=None):
        if hasattr(self.class_, '__readonly_type__'):
            instance = self.class_.__readonly_type__.__new__(self.class_.__readonly_type__)
        else:
            instance = self.class_.__new__(self.class_)
        self.setup_instance(instance, state)
        return instance

Base = declarative_base()
Base.__sa_instrumentation_manager__ = MyClassManager

class ReadonlyFoo(object):
    pass

class ReadonlyBar(object):
    pass

class Bar(Base, ReadonlyBar):
    __tablename__ = 'bar'
    __readonly_type__ = ReadonlyBar
    id = Column(Integer, primary_key=True)
    foo_id = Column(Integer, ForeignKey('foo.id'))

class Foo(Base, ReadonlyFoo):
    __tablename__ = 'foo'
    __readonly_type__ = ReadonlyFoo
    id = Column(Integer, primary_key=True)
    name = Column(String(32))
    bars = relationship(Bar)

f = session.query(Foo).first()
print f.bars
which gives:
AttributeError: 'ReadonlyFoo' object has no attribute 'bars'
Is there a way to modify one of these approaches so that relationships still work? Or, is there another approach to this problem that's better?
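One alternative worth mentioning (not covered in the question itself) is to sidestep instrumentation entirely for read-only work by querying plain column tuples instead of mapped entities; the rows that come back are lightweight named tuples with no attribute instrumentation. A minimal sketch against the Foo/Bar models above:

# Fetch plain tuples; nothing returned here is an instrumented ORM object.
rows = (session.query(Foo.id, Foo.name, Bar.id.label('bar_id'))
        .outerjoin(Bar, Bar.foo_id == Foo.id)
        .all())
for foo_id, name, bar_id in rows:
    pass  # heavy computations run on plain Python values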

Related

How do I change the schema for both a table and a foreign key?

I have the following simplified database access layer and two tables:
class DataAccessLayer():
    def __init__(self):
        conn_string = "mysql+mysqlconnector://root:root@localhost/"
        self.engine = create_engine(conn_string)
        Base.metadata.create_all(self.engine)
        Session = sessionmaker()
        Session.configure(bind=self.engine)
        self.session = Session()

class MatchesATP(Base):
    __tablename__ = "matches_atp"
    __table_args__ = {"schema": "belgarath", "extend_existing": True}
    ID_M = Column(Integer, primary_key=True)
    ID_T_M = Column(Integer, ForeignKey("oncourt.tours_atp.ID_T"))

class TournamentsATP(Base):
    __tablename__ = "tours_atp"
    __table_args__ = {"schema": "oncourt", "extend_existing": True}
    ID_T = Column(Integer, primary_key=True)
    NAME_T = Column(String(255))
I want to be able to switch the schema names for the two tables to test databases as follows:
belgarath to belgarath_test
oncourt to oncourt_test
I've tried adding:
self.session.connection(execution_options={"schema_translate_map": {"belgarath": belgarath, "oncourt": oncourt}})
To the bottom of DataAccessLayer and then initialising the class with two variables as follows:
def __init__(self, belgarath, oncourt):
However, when I build the following query:
dal = DataAccessLayer("belgarath_test", "oncourt_test")
query = dal.session.query(MatchesATP)
print(query)
I get the following SQL:
SELECT belgarath.matches_atp.`ID_M` AS `belgarath_matches_atp_ID_M`, belgarath.matches_atp.`ID_T_M` AS `belgarath_matches_atp_ID_T_M`
FROM belgarath.matches_atp
This is still referencing the belgarath schema.
I also can't figure out a way of changing the schema of the foreign key of oncourt.tours_atp.ID_T at the same time as the tables.
Are there individual solutions or a combined solution to my issues?
You might want to decorate your subclassed Base declarative model with the @declared_attr decorator.
Try this--
In a base class for your models, say __init__.py...
from sqlalchemy.ext.declarative import declarative_base, declared_attr

SCHEMA_MAIN = 'belgarath'  # figure out how you want to retrieve this
SCHEMA_TEST = 'belgarath_test'

class _Base(object):
    @declared_attr
    def __table_args__(cls):
        return {'schema': SCHEMA_MAIN}
    ...

Base = declarative_base(cls=_Base)
Base.metadata.schema = SCHEMA_MAIN
Now that you have a Base that subclasses _Base with the main schema already defined, all your other models will subclass Base and do the following:
from . import Base, declared_attr, SCHEMA_TEST

class TestModel(Base):
    @declared_attr
    def __table_args__(cls):
        return {'schema': SCHEMA_TEST}
Changing a schema for a foreign key could look like this:
class TournamentsATP(Base):
    __tablename__ = "tours_atp"
    __table_args__ = {"schema": "oncourt", "extend_existing": True}
    ID_T = Column(Integer, primary_key=True)
    NAME_T = Column(String(255))
    match_id = Column('match_id', Integer,
                      ForeignKey(f'{__table_args__.get("schema")}.matches_atp.id'))
Here match_id is a foreign key to matches_atp.id, built from the __table_args__["schema"] entry defined at the class level via @declared_attr.
It only took me 18 months to figure this out. Turns out I needed to add the schema_translate_map to an engine and then create the session with this engine:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine(conn_str, echo=False)
schema_engine = engine.execution_options(schema_translate_map={<old_schema_name>: <new_schema_name>})
NewSession = sessionmaker(bind=schema_engine)
session = NewSession()
All ready to roll...
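Applied to the schemas in the question, that might look like the following (a sketch, not part of the original answer; the map is applied when statements execute, so queries issued through this session hit the *_test schemas):

schema_engine = engine.execution_options(schema_translate_map={
    "belgarath": "belgarath_test",
    "oncourt": "oncourt_test",
})
session = sessionmaker(bind=schema_engine)()
matches = session.query(MatchesATP).all()  # runs against belgarath_test.matches_atp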
Assuming your goal is to:
have dev/test/prod schemas on a single mysql host
allow your ORM classes to be flexible enough to be used in three different environments without modification
Then John has you most of the way to one type of solution. You could use #declared_attr to dynamically generate __table_args__ as he has suggested.
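For example, here is a sketch of an environment-driven __table_args__ (the APP_ENV variable and the __base_schema__ attribute are illustrative names, not from either answer):

import os
from sqlalchemy.ext.declarative import declarative_base, declared_attr

SCHEMA_SUFFIX = "_test" if os.getenv("APP_ENV") == "test" else ""

class _Base(object):
    __base_schema__ = "belgarath"  # override per model, e.g. "oncourt"

    @declared_attr
    def __table_args__(cls):
        return {"schema": cls.__base_schema__ + SCHEMA_SUFFIX}

Base = declarative_base(cls=_Base)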
You could also consider using something like flask-sqlalchemy that comes with a built-in solution for this:
import os

DB_ENV = os.getenv("DB_ENV")

SQLALCHEMY_BINDS = {
    'belgarath': 'mysql+mysqlconnector://root:root@localhost/belgarath{}'.format(DB_ENV),
    'oncourt': 'mysql+mysqlconnector://root:root@localhost/oncourt{}'.format(DB_ENV)
}
class MatchesATP(Base):
    __bind_key__ = "belgarath"
    ID_M = Column(Integer, primary_key=True)
    ID_T_M = Column(Integer, ForeignKey("oncourt.tours_atp.ID_T"))

class TournamentsATP(Base):
    __bind_key__ = "oncourt"
    ID_T = Column(Integer, primary_key=True)
    NAME_T = Column(String(255))
Basically this method allows you to create a link to a schema (a bind key), and that schema is defined at run-time via the connection string. More information at the flask-sqlalchemy link.
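For reference, a minimal sketch of how those binds get wired into an app (assumes Flask and Flask-SQLAlchemy; the names mirror the snippet above):

from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"] = "mysql+mysqlconnector://root:root@localhost/belgarath{}".format(DB_ENV)
app.config["SQLALCHEMY_BINDS"] = SQLALCHEMY_BINDS  # the dict defined above
db = SQLAlchemy(app)  # models declaring __bind_key__ resolve through SQLALCHEMY_BINDS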

How to avoid inserting duplicate entries when adding values via a sqlalchemy relationship?

Let's assume we have two tables in a many to many relationship as shown below:
class User(db.Model):
    __tablename__ = 'user'
    uid = db.Column(db.String(80), primary_key=True)
    languages = db.relationship('Language', lazy='dynamic',
                                secondary='user_language')

class UserLanguage(db.Model):
    __tablename__ = 'user_language'
    __table_args__ = (db.UniqueConstraint('uid', 'lid', name='user_language_ff'),)
    id = db.Column(db.Integer, primary_key=True)
    uid = db.Column(db.String(80), db.ForeignKey('user.uid'))
    lid = db.Column(db.String(80), db.ForeignKey('language.lid'))

class Language(db.Model):
    __tablename__ = 'language'
    lid = db.Column(db.String(80), primary_key=True)
    language_name = db.Column(db.String(30))
Now in the python shell:
In [4]: user = User.query.all()[0]
In [11]: user.languages = [Language('1', 'English')]
In [12]: db.session.commit()
In [13]: user2 = User.query.all()[1]
In [14]: user2.languages = [Language('1', 'English')]
In [15]: db.session.commit()
IntegrityError: (IntegrityError) column lid is not unique u'INSERT INTO language (lid, language_name) VALUES (?, ?)' ('1', 'English')
How can I let the relationship know that it should ignore duplicates and not break the unique constraint for the Language table? Of course, I could insert each language separately and check if the entry already exists in the table beforehand, but then much of the benefit offered by sqlalchemy relationships is gone.
The SQLAlchemy wiki has a collection of examples, one of which is how you might check uniqueness of instances.
The examples are a bit convoluted though. Basically, create a classmethod get_unique as an alternate constructor, which will first check a session cache, then try a query for existing instances, then finally create a new instance. Then call Language.get_unique(id, name) instead of Language(id, name).
I've written a more detailed answer in response to OP's bounty on another question.
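A minimal sketch of that recipe, written as a classmethod you could add to the Language model above (the session-cache attribute name is illustrative, not taken from the wiki):

@classmethod
def get_unique(cls, session, lid, language_name):
    # session-local cache so repeated calls in one unit of work reuse the
    # same pending instance instead of creating duplicates
    cache = getattr(session, '_unique_cache', None)
    if cache is None:
        cache = session._unique_cache = {}
    key = (cls, lid)
    obj = cache.get(key)
    if obj is None:
        obj = session.query(cls).filter_by(lid=lid).first()
        if obj is None:
            obj = cls(lid=lid, language_name=language_name)
            session.add(obj)
        cache[key] = obj
    return obj

# usage: append existing-or-new languages without violating the constraint
user.languages.append(Language.get_unique(db.session, '1', 'English'))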
I would suggest reading Association Proxy: Simplifying Association Objects. In this case your code would translate into something like the below:

from sqlalchemy.ext.associationproxy import association_proxy

# NEW: need this function to auto-generate the PK for newly created Language
# here using uuid, but could be any generator
def _newid():
    import uuid
    return str(uuid.uuid4())

def _language_find_or_create(language_name):
    language = Language.query.filter_by(language_name=language_name).first()
    return language or Language(language_name=language_name)

class User(Base):
    __tablename__ = 'user'
    uid = Column(String(80), primary_key=True)
    languages = relationship('Language', lazy='dynamic',
                             secondary='user_language')
    # proxy the 'language_name' attribute from the 'languages' relationship
    langs = association_proxy('languages', 'language_name',
                              creator=_language_find_or_create,
                              )

class UserLanguage(Base):
    __tablename__ = 'user_language'
    __table_args__ = (UniqueConstraint('uid', 'lid', name='user_language_ff'),)
    id = Column(Integer, primary_key=True)
    uid = Column(String(80), ForeignKey('user.uid'))
    lid = Column(String(80), ForeignKey('language.lid'))

class Language(Base):
    __tablename__ = 'language'
    # NEW: added a *default* here; replace with your implementation
    lid = Column(String(80), primary_key=True, default=_newid)
    language_name = Column(String(30))

# test code
user = User(uid="user-1")
# NEW: add languages using association_proxy property
user.langs.append("English")
user.langs.append("Spanish")
session.add(user)
session.commit()

user2 = User(uid="user-2")
user2.langs.append("English")  # this will not create a new Language row...
user2.langs.append("German")
session.add(user2)
session.commit()
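As a hypothetical follow-up (not in the original answer), the underlying relationship still works for ordinary queries, e.g. to find every user who speaks a given language:

english_speakers = (session.query(User)
                    .join(User.languages)
                    .filter(Language.language_name == "English")
                    .all())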

SQLAlchemy Bidirectional Association Proxy

I'm trying to create a simple many to many relationship with a mapping table containing metadata about the relationship it represents with association proxies on both ends using SQLAlchemy. However, I can't seem to get it to work. Here's the toy example I've been working with to try to figure it out:
from sqlalchemy import create_engine, Column, Integer, String, Boolean, ForeignKey
from sqlalchemy.orm import relationship, sessionmaker
from sqlalchemy.ext.associationproxy import association_proxy
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()
engine = create_engine('sqlite://')  # not shown in the original snippet; any engine will do

def bar_creator(bar):
    _ = FooBar(bar=bar)
    return bar

class Foo(Base):
    __tablename__ = 'foo'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    bars = association_proxy('bar_associations', 'bar',
                             creator=bar_creator)

def foo_creator(foo):
    _ = FooBar(foo=foo)
    return foo

class Bar(Base):
    __tablename__ = 'bar'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    foos = association_proxy('foo_associations', 'foo',
                             creator=foo_creator)

class FooBar(Base):
    __tablename__ = 'fooBar'
    foo_id = Column(Integer, ForeignKey('foo.id'), primary_key=True)
    bar_id = Column(Integer, ForeignKey('bar.id'), primary_key=True)
    bazed = Column(Boolean)
    foo = relationship(Foo, backref='bar_associations')
    bar = relationship(Bar, backref='foo_associations')

Base.metadata.create_all(engine)
make_session = sessionmaker(bind=engine)
session = make_session()
foo0 = Foo(name='foo0')
session.add(foo0)
bar0 = Bar(name='bar0')
foo0.bars.append(bar0)
I added the creator functions so I could avoid writing an __init__ that won't work for my actual use case (it takes a single argument), and I included the creation of the FooBar in each because I read in some of the documentation that the item being appended needs to already have a linking table instance associated with it. I'm sure I'm just missing something obvious (or maybe trying to do something that simply can't be done), but after much digging through the docs and Googling, I can't figure out why it doesn't work. What am I doing wrong?
Your problem lies in the creator: it should return the new instance of FooBar, not bar or foo:

def bar_creator(value):
    return FooBar(bar=value)

And analogously for foo_creator.
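Spelled out, the analogous fix for the other side is:

def foo_creator(value):
    return FooBar(foo=value)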

SQLAlchemy declarative property from join (single attribute, not whole object)

I wish to create a mapped attribute of an object which is populated from another table.
Using the SQLAlchemy documentation example, I wish to make a user_name field exist on the Address class such that it can be both easily queried and easily accessed (without a second round trip to the database)
For example, I wish to be able to query and filter by user_name: Address.query.filter(Address.user_name == 'wcdolphin').first()
I also want to access the user_name attribute of every Address object without a performance penalty, and have writes to it persist properly, as would be expected of an attribute mapped to a column of the table:
class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    addresses = relation("Address", backref="user")

class Address(Base):
    __tablename__ = 'addresses'
    id = Column(Integer, primary_key=True)
    email = Column(String(50))
    user_name = Column(Integer, ForeignKey('users.name'))  # This line is wrong
How do I do this?
I found the documentation relatively difficult to understand, as it did not seem to conform to most examples, especially the Flask-SQLAlchemy examples.
You can do this with a join on the query object, no need to specify this attribute directly. So your model would look like:
from sqlalchemy import create_engine, Column, Integer, String, ForeignKey
from sqlalchemy.orm import sessionmaker, relation
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()
engine = create_engine('sqlite:///')
Session = sessionmaker(bind=engine)

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    addresses = relation("Address", backref="user")

class Address(Base):
    __tablename__ = 'addresses'
    id = Column(Integer, primary_key=True)
    email = Column(String(50))
    user_id = Column(Integer, ForeignKey("users.id"))

Base.metadata.create_all(engine)
A query after addresses with filtering the username looks like:
>>> session = Session()
>>> session.add(Address(user=User(name='test')))
>>> session.query(Address).join(User).filter(User.name == 'test').first()
<__main__.Address object at 0x02DB3730>
Edit: Since you can access the user directly from an Address object, there is no need to add such an attribute to the Address class:
>>> a = session.query(Address).join(User).filter(User.name == 'test').first()
>>> a.user.name
'test'
If you truly want Address to have a SQL enabled version of "User.name" without the need to join explicitly, you need to use a correlated subquery. This will work in all cases but tends to be inefficient on the database side (particularly with MySQL), so there is possibly a performance penalty on the SQL side versus using a regular JOIN. Running some EXPLAIN tests may help to analyze how much of an effect there may be.
Another example of a correlated column_property() is at http://docs.sqlalchemy.org/en/latest/orm/mapped_sql_expr.html#using-column-property.
Since it is built from a correlated subquery, the attribute is read-only, but a "set" event can be used to intercept changes and apply them to the parent User row. Two approaches to this are presented below: one uses regular identity map mechanics, which will incur a load of the User row if not already present; the other emits a direct UPDATE to the row:
from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    addresses = relation("Address", backref="user")

class Address(Base):
    __tablename__ = 'addresses'
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey('users.id'))
    email = Column(String(50))

Address.user_name = column_property(select([User.name]).where(User.id == Address.user_id))

from sqlalchemy import event

@event.listens_for(Address.user_name, "set")
def _set_address_user_name(target, value, oldvalue, initiator):
    # use ORM identity map + flush
    target.user.name = value

    # or use a direct UPDATE instead:
    # object_session(target).query(User).with_parent(target).update({'name': value})

e = create_engine("sqlite://", echo=True)
Base.metadata.create_all(e)

s = Session(e)
s.add_all([
    User(name='u1', addresses=[Address(email='e1'), Address(email='e2')])
])
s.commit()

a1 = s.query(Address).filter(Address.user_name == "u1").first()
assert a1.user_name == "u1"

a1.user_name = 'u2'
s.commit()

a1 = s.query(Address).filter(Address.user_name == "u2").first()
assert a1.user_name == "u2"

SQLAlchemy: avoiding repetition in declarative style class definition

I'm using SQLAlchemy, and many classes in my object model have the same two attributes: id (an integer primary key) and name (a string). I'm trying to avoid declaring them in every class like so:
class C1(declarative_base()):
    id = Column(Integer, primary_key=True)
    name = Column(String)
    # ...

class C2(declarative_base()):
    id = Column(Integer, primary_key=True)
    name = Column(String)
    # ...
What's a good way to do that? I tried using metaclasses but it didn't work yet.
You could factor out your common attributes into a mixin class, and multiply inherit it alongside declarative_base():
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

class IdNameMixin(object):
    id = Column(Integer, primary_key=True)
    name = Column(String)

class C1(declarative_base(), IdNameMixin):
    __tablename__ = 'C1'

class C2(declarative_base(), IdNameMixin):
    __tablename__ = 'C2'

print C1.__dict__['id'] is C2.__dict__['id']
print C1.__dict__['name'] is C2.__dict__['name']
EDIT: You might think this would result in C1 and C2 sharing the same Column objects, but as noted in the SQLAlchemy docs, Column objects are copied when originating from a mixin class. I've updated the code sample to demonstrate this behavior.
Could you also use the Column's copy method? This way, fields can be defined independently of tables, and those fields that are reused are just field.copy()-ed.
id = Column(Integer, primary_key=True)
name = Column(String)

class C1(declarative_base()):
    id = id.copy()
    name = name.copy()
    # ...

class C2(declarative_base()):
    id = id.copy()
    name = name.copy()
    # ...
I think I got it to work.
I created a metaclass that derives from DeclarativeMeta, and made that the metaclass of C1 and C2. In that new metaclass, I simply said
def __new__(mcs, name, base, attr):
    attr['__tablename__'] = name.lower()
    attr['id'] = Column(Integer, primary_key=True)
    attr['name'] = Column(String)
    return super().__new__(mcs, name, base, attr)
And it seems to work fine.
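For completeness, here is a sketch of how that approach might be wired up (the IdNameMeta name is illustrative, not from the original answer; DeclarativeMeta is SQLAlchemy's declarative metaclass):

from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base, DeclarativeMeta

Base = declarative_base()

class IdNameMeta(DeclarativeMeta):
    def __new__(mcs, name, bases, attrs):
        # inject the shared columns before declarative maps the class
        attrs['__tablename__'] = name.lower()
        attrs['id'] = Column(Integer, primary_key=True)
        attrs['name'] = Column(String)
        return super().__new__(mcs, name, bases, attrs)

class C1(Base, metaclass=IdNameMeta):
    pass

class C2(Base, metaclass=IdNameMeta):
    pass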
