SQLAlchemy: can a deferred column be eagerly loaded?

I have a declarative SQLAlchemy object with deferred columns, declared like this:
class Review(Base):
    __tablename__ = 'review'

    id = Column(Integer, primary_key=True)
    name = Column(String(255))
    large_field = deferred(Column(Text))
Sometimes I'd like queries to eagerly load these columns, or "undefer" them. I've tried this, but looking at the SQL output shows it isn't doing anything.
reviews = session.query(Review).options(eagerload('large_field')).all()
Is selective eager loading possible?

Yep, you can undefer it:
http://www.sqlalchemy.org/docs/orm/mapper_config.html?highlight=deferred#sqlalchemy.orm.undefer
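For example, a minimal sketch (untested) based on the models above:

from sqlalchemy.orm import undefer

# undefer the column for this query only
reviews = session.query(Review).options(undefer('large_field')).all()

# newer SQLAlchemy versions also accept the mapped attribute itself
reviews = session.query(Review).options(undefer(Review.large_field)).all()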

Related

Why am I unable to generate a query using relationships?

I'm experimenting with the relationship functionality in SQLAlchemy, but I've not been able to crack it. The following is a simple MRE:
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import relationship, sessionmaker

Base = declarative_base()

class Tournament(Base):
    __tablename__ = "tournament"
    __table_args__ = {"schema": "belgarath", "extend_existing": True}

    id_ = Column(Integer, primary_key=True)
    tournament_master_id = Column(Integer, ForeignKey("belgarath.tournament_master.id_"))
    tournament_master = relationship("TournamentMaster", back_populates="tournament")

class TournamentMaster(Base):
    __tablename__ = "tournament_master"
    __table_args__ = {"schema": "belgarath", "extend_existing": True}

    id_ = Column(Integer, primary_key=True)
    tour_id = Column(Integer, index=True)
    tournament = relationship("Tournament", back_populates="tournament_master")

engine = create_engine("mysql+mysqlconnector://root:root@localhost/")
Session = sessionmaker(bind=engine)
session = Session()

qry = session.query(Tournament.tournament_master.id_).limit(100)
I was hoping to be able to query the id_ field from the tournament_master table through a relationship specified in the tournament table. However I get the following error:
AttributeError: Neither 'InstrumentedAttribute' object nor 'Comparator' object associated with Tournament.tournament_master has an attribute 'id_'
I've also tried replacing the two relationship lines with a single backref line in TournamentMaster:
tournament = relationship("Tournament", backref="tournament_master")
However I then get the error:
AttributeError: type object 'Tournament' has no attribute 'tournament_master'
Where am I going wrong?
(I'm using SQLAlchemy v1.3.18)
Your ORM classes look fine. It's the query that's incorrect.
In short, you're getting that "InstrumentedAttribute" error because you are misusing the session.query method.
Per the docs, session.query takes "entities" as arguments. You have two mapped classes defined, Tournament and TournamentMaster. These "entities" are typically either your mapped classes (ORM objects) or a Column of those mapped classes.
However, you are passing in Tournament.tournament_master.id_, which is neither a mapped class nor a column, and thus not an "entity" that session.query can consume.
Another way to look at it: by writing Tournament.tournament_master.id_ you are trying to access a TournamentMaster record (or instance) from the Tournament class itself, which doesn't make sense.
It's not super clear to me what exactly you're hoping to return from the query, but in any case here's a start.
Instead of
qry = session.query(Tournament.tournament_master.id_).limit(100)
try
qry = session.query(Tournament, TournamentMaster).join(TournamentMaster).limit(100)
This may also work (untested) to return only the id_ field, if that is your intention:
qry = session.query(Tournament).join(TournamentMaster).with_entities(TournamentMaster.id_).limit(100)
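And since a plain column is itself a valid entity, here is a sketch (again untested) that selects only the id_ values by making TournamentMaster the thing being queried and joining across the foreign key:

# select just the related column; the join condition is inferred from the FK
qry = session.query(TournamentMaster.id_).join(Tournament).limit(100)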

Can a sqlalchemy relationship be accessed using a new session?

When using yield_per, I am forced to use a separate session if I want to perform another query while the results of the yield_per query have not yet all been fetched.
Let's take these models for our example:
class Parent(Base):
    __tablename__ = 'parent'

    id = Column(Integer, primary_key=True)
    children = relationship("Child", backref="parent")

class Child(Base):
    __tablename__ = 'child'

    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey('parent.id'))
Here are three ways to achieve the same thing, but only the first one works:
query = session.query(Parent).yield_per(5)
for p in query:
    print(p.id)

    # 1: Using a new session (will work):
    c = newsession.query(Child).filter_by(parent_id=p.id).first()
    print(c.id)

    # 2: Using the same session (will not work):
    c = session.query(Child).filter_by(parent_id=p.id).first()
    print(c.id)

    # 3: Using the relationship (will not work):
    c = p.children[0]
    print(c.id)
Indeed (when using MySQL), both 2 and 3 will throw an exception and stop execution with the following error: "Commands out of sync; you can't run this command now".
My question is: is there a way I can make the relationship lookup work in this context? Is there maybe a way to trick SQLAlchemy into using a new session when the first one is busy?
Try selectinload, as it supports eager loading in combination with yield_per.
import sqlalchemy as sa

query = session.query(
    Parent
).options(
    sa.orm.selectinload(Parent.children)
).yield_per(5)

for parent in query:
    for child in parent.children:
        print(child.id)
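This works because selectinload fetches the children with a separate SELECT ... WHERE parent_id IN (...) query per batch instead of lazy-loading them row by row. Per the SQLAlchemy docs, "select in" is the one collection eager-load strategy that is potentially compatible with yield_per, provided the database driver supports multiple independent cursors.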

SQLAlchemy ORM tables on-demand

Running into something here, guys, and was hoping to get some ideas/help.
I have a database with a tree structure, where a leaf can participate in several parents as a foreign key. The typical example is a city, which belongs to a country and to a continent. Needless to say, countries and continents should not be repeated, so before adding another city I need to look for the object in the DB, and if it doesn't exist I have to create it. But if, for instance, the country doesn't exist yet, then I have to check for the continent, and if that doesn't exist either I have to run the creation process for it as well.
So far I have got around this by creating the whole bunch of items when everything runs from a single file, but if I push the SQLAlchemy code into a module the story becomes different. For some reason the metadata scope becomes limited, and if a table doesn't exist yet the code starts throwing ProgrammingError exceptions when I query for the presence of the foreign key (from the city for the country). I have intercepted that, and in the __init__ constructor of the class I am looking for (country) I check whether the table exists and create it if it doesn't. There are two things I have a problem with and need advice on:
1) Verification of the table is inefficient: I have to loop through the Base.metadata.sorted_tables list and figure out whether each table is the one that matches my class's __tablename__. Such as:
for table in Base.metadata.sorted_tables:
    # Find the right table in the list of tables
    if table.name == self.__tablename__:
        if __DEBUG__:
            print 'DEBUG: Found table {} that equals the class table {}'.format(table.name, self.__tablename__)
        if not table.exists():
            table.create(bind=session.get_bind())
Needless to say, this takes time, and I am looking for a more efficient way to do the same.
2) The second issue is with inheritance from the declarative base (declarative_base()) with respect to OOP in Python. I want to remove some of the code repetition and pull it into one class from which the other classes are derived. For instance, the code above can be taken out into a separate method, giving something like this:
Base = declarative_base()

class OnDemandTables(Base):
    __tablename__ = 'no_table'
    # id = Column(Integer, Sequence('id'), nullable=False, unique=True, primary_key=True, autoincrement=True)

    def create_my_table(self, session):
        if __DEBUG__:
            print 'DEBUG: Creating tables for the class {}'.format(self.__class__)
            print 'DEBUG: Base.metadata.sorted_tables exists returns {}'.format(Base.metadata.sorted_tables)
        for table in Base.metadata.sorted_tables:
            # Find the right table in the list of tables
            if table.name == self.__tablename__:
                if __DEBUG__:
                    print 'DEBUG: Found table {} that equals the class table {}'.format(table.name, self.__tablename__)
                if not table.exists():
                    table.create(bind=session.get_bind())

class Continent(OnDemandTables):
    __tablename__ = 'continent'

    id = Column(Integer, Sequence('id'), nullable=False, unique=True, primary_key=True, autoincrement=True)
    name = Column(String(64), unique=True, nullable=False)

    def __init__(self, session, continent_description):
        if type(continent_description) != dict:
            raise AttributeError('Continent should be described by the dictionary!')
        else:
            self.create_my_table(session)
            if 'continent' not in continent_description:
                raise ReferenceError('No continent can be created without a name! Dictionary is {}'.format(continent_description))
            else:
                self.name = continent_description['continent']
                print 'DEBUG: Continent name is {} '.format(self.name)
The problem here is that the metadata is trying to link unrelated classes together and requires __tablename__ and some index column to be present in the parent OnDemandTables class, which doesn't make any sense to me.
Any ideas?
Cheers
Wanted to post the solution here for the rest of the gang to keep in mind. Apparently SQLAlchemy doesn't see the classes in a module if they are not used, so to speak. After a couple of days of trying to work around things, the simplest solution I found was to do it in a semi-manual way: not rely on the ORM to construct and build up the database for you, but rather handle this part manually using class methods. The code is:
__DEBUG__ = True

from sqlalchemy import String, Integer, Column, ForeignKey, BigInteger, Float, Boolean, Sequence
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship
from sqlalchemy.orm.exc import MultipleResultsFound, NoResultFound
from sqlalchemy.exc import ProgrammingError
from sqlalchemy import create_engine, schema
from sqlalchemy.orm import sessionmaker

Base = declarative_base()
engine = create_engine("mysql://test:test123@localhost/test", echo=True)
Session = sessionmaker(bind=engine, autoflush=False)
session = Session()
schema.MetaData.bind = engine

class TemplateBase(object):
    __tablename__ = None

    @classmethod
    def create_table(cls, session):
        if __DEBUG__:
            print 'DEBUG: Creating tables for the class {}'.format(cls.__class__)
            print 'DEBUG: Base.metadata.sorted_tables exists returns {}'.format(Base.metadata.sorted_tables)
        for table in Base.metadata.sorted_tables:
            # Find the right table in the list of tables
            if table.name == cls.__tablename__:
                if __DEBUG__:
                    print 'DEBUG: Found table {} that equals the class table {}'.format(table.name, cls.__tablename__)
                if not table.exists():
                    if __DEBUG__:
                        print 'DEBUG: Session is {}, engine is {}, table is {}'.format(session, session.get_bind(), dir(table))
                    table.create()

    @classmethod
    def is_provisioned(cls):
        for table in Base.metadata.sorted_tables:
            # Find the right table in the list of tables
            if table.name == cls.__tablename__:
                if __DEBUG__:
                    print 'DEBUG: Found table {} that equals the class table {}'.format(table.name, cls.__tablename__)
                return table.exists()

class Continent(Base, TemplateBase):
    __tablename__ = 'continent'

    id = Column(Integer, Sequence('id'), nullable=False, unique=True, primary_key=True, autoincrement=True)
    name = Column(String(64), unique=True, nullable=False)

    def __init__(self, session, provision, continent_description):
        if type(continent_description) != dict:
            raise AttributeError('Continent should be described by the dictionary!')
        else:
            if 'continent' not in continent_description:
                raise ReferenceError('No continent can be created without a name! Dictionary is {}'.format(continent_description))
            else:
                self.name = continent_description['continent']
                if __DEBUG__:
                    print 'DEBUG: Continent name is {} '.format(self.name)
It gives the following:
1. The class methods is_provisioned and create_table can be called during initial start-up and will reflect the database state.
2. Class inheritance is done from a second class in which these methods are kept; it does not interfere with the ORM classes, and hence is not linked into the metadata.
As the result of the Base.metadata.sorted_tables loop is just the class's own table, the code can be optimized even further by removing the loop; see the sketch below. The following action would be to organize the classes in a list, keeping their linkages in mind, so that their tables can be checked and, if necessary, created by looping through them with the is_provisioned and create_table methods.
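A minimal sketch of that loop-free lookup (untested; it assumes the table is registered in Base.metadata under its plain name, i.e. with no schema prefix):

    @classmethod
    def create_table(cls, session):
        # direct dictionary lookup instead of scanning sorted_tables
        table = Base.metadata.tables[cls.__tablename__]
        if not table.exists(bind=session.get_bind()):
            table.create(bind=session.get_bind())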
Hope it helps others.
Regards

Remove double quotes in table name during SQLAlchemy code execution (Teradata)

I'm trying to write a basic ORM SQLAlchemy class to access a Teradata table. However, when SQLAlchemy creates and executes the SQL code, it puts my table name in double quotes, which prevents Teradata from recognizing the table as a valid table name (it's expecting the table name without quotes). Is there any way to remove the quotes that SQLAlchemy is executing with?
For example:
class d_game_info(Base):
    __tablename__ = 'dbo.d_game_info'

    game_id = Column(Integer, primary_key=True)
    game_name = Column()

Session = sessionmaker(bind=td_engine)
session = Session()

for instance in session.query(d_game_info).order_by(d_game_info.game_id):
    print(instance.game_name)
Results in the error:
"Object 'dbo.d_game_info' does not exist."
because the code SQLAlchemy tries to execute is
... FROM "dbo.d_game_info" ...
instead of
... FROM dbo.d_game_info ...
So... is there a way to force it to execute code without the double quotes?
Thanks!
dbo is not part of the table's name; it's the schema name of the table. The way to specify the schema in SQLAlchemy is like this:
class d_game_info(Base):
    __tablename__ = 'd_game_info'
    __table_args__ = {'schema': 'dbo'}
You can use the quote parameter:
class d_game_info(Base):
    __tablename__ = 'dbo.d_game_info'
    __table_args__ = {'quote': False}

    game_id = Column(Integer, primary_key=True)
    game_name = Column()
The quote parameter can also be used with Column() in case it is putting column names in quotes:
game_name = Column('GAME_NAME', String(50), quote=False)
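If you also keep the schema separate, as the previous answer suggests, the two table arguments can be combined; a sketch, untested:

class d_game_info(Base):
    __tablename__ = 'd_game_info'
    __table_args__ = {'schema': 'dbo', 'quote': False}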

Bulk inserts with Flask-SQLAlchemy

I'm using Flask-SQLAlchemy to do a rather large bulk insert of 60k rows. I also have a many-to-many relationship on this table, so I can't use db.engine.execute for this. Before inserting, I need to find similar items in the database, and change the insert to an update if a duplicate item is found.
I could do this check beforehand, and then do a bulk insert via db.engine.execute, but I need the primary key of the row upon insertion.
Currently, I am doing a db.session.add() and db.session.commit() on each insert, and I get a measly 3-4 inserts per second.
I ran a profiler to see where the bottleneck is, and it seems that the db.session.commit() is taking 60% of the time.
Is there some way that would allow me to make this operation faster, perhaps by grouping commits, but which would give me primary keys back?
This is what my models look like:
class Item(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.String(1024), nullable=True)
    created = db.Column(db.DateTime())
    tags_relationship = db.relationship('Tag', secondary=tags, backref=db.backref('items', lazy='dynamic'))
    tags = association_proxy('tags_relationship', 'text')

class Tag(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    text = db.Column(db.String(255))
My insert operation is:
for item in items:
    if duplicate:
        update_existing_item
    else:
        x = Item()
        x.title = "string"
        x.created = datetime.datetime.utcnow()
        for tag in tags:
            if not tag_already_exists:
                y = Tag()
                y.text = "tagtext"
                x.tags_relationship.append(y)
                db.session.add(y)
                db.session.commit()
            else:
                x.tags_relationship.append(existing_tag)
        db.session.add(x)
        db.session.commit()
Perhaps you should try db.session.flush() to send the data to the server, which means any primary keys will be generated. At the end you can db.session.commit() to actually commit the transaction.
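A minimal sketch of that pattern, reusing the names from the question (use_primary_key is a hypothetical stand-in for whatever needs the new id):

for item in items:
    x = Item()
    x.title = "string"
    x.created = datetime.datetime.utcnow()
    db.session.add(x)
    db.session.flush()     # emits the INSERT; x.id is now populated, nothing committed yet
    use_primary_key(x.id)  # hypothetical consumer of the generated key
db.session.commit()        # one commit for the whole batch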
I use the following code to quickly read the content of a pandas DataFrame into SQLite. Note that it circumvents the ORM features of SQLAlchemy. myClass in this context is a db.Model-derived class that has a table name assigned to it. As the code snippet mentions, I adapted it from the gist referenced in the comment:
l = df.to_dict('records')

# bulk save the dictionaries, circumventing the slow ORM interface
# c.f. https://gist.github.com/shrayasr/5df96d5bc287f3a2faa4
connection.engine.execute(
    myClass.__table__.insert(),
    l
)
from app import db

data = [{"attribute": "value"}, {...}, {...}, ... ]
db.engine.execute(YourModel.__table__.insert(), data)
For more information, refer to https://gist.github.com/shrayasr/5df96d5bc287f3a2faa4
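Note that Core-level inserts like the above do not hand back per-row primary keys. If you need the keys while still batching, one possible middle ground (assuming SQLAlchemy >= 1.0) is Session.bulk_save_objects with return_defaults=True, though fetching the defaults costs much of the bulk speedup:

items = [Item(title='item {}'.format(i)) for i in range(60000)]
db.session.bulk_save_objects(items, return_defaults=True)  # populates item.id
db.session.commit()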
