SQLAlchemy use of inheritance in Postgres - Python

In an attempt to learn SQLAlchemy (and Python), I am trying to duplicate an already existing project, but I am having trouble figuring out SQLAlchemy and inheritance with Postgres.
Here is an example of what our Postgres database does (obviously, this is simplified):
CREATE TABLE system (system_id SERIAL PRIMARY KEY,
                     system_name VARCHAR(24) NOT NULL UNIQUE);
CREATE TABLE file_entry (file_entry_id SERIAL,
                         file_entry_msg VARCHAR(256) NOT NULL,
                         file_entry_system_name VARCHAR(24) REFERENCES system(system_name) NOT NULL);
CREATE TABLE ops_file_entry (CONSTRAINT ops_file_entry_id_pkey PRIMARY KEY (file_entry_id),
                             CONSTRAINT ops_system_name_check CHECK ((file_entry_system_name = 'ops'::bpchar))) INHERITS (file_entry);
CREATE TABLE eng_file_entry (CONSTRAINT eng_file_entry_id_pkey PRIMARY KEY (file_entry_id),
                             CONSTRAINT eng_system_name_check CHECK ((file_entry_system_name = 'eng'::bpchar))) INHERITS (file_entry);
CREATE INDEX ops_file_entry_index ON ops_file_entry USING btree (file_entry_system_name);
CREATE INDEX eng_file_entry_index ON eng_file_entry USING btree (file_entry_system_name);
And then the inserts would be done with a trigger, so that they were properly inserted into the child tables. Something like:
CREATE FUNCTION file_entry_insert_trigger() RETURNS "trigger"
AS $$
DECLARE
BEGIN
    IF NEW.file_entry_system_name = 'eng' THEN
        INSERT INTO eng_file_entry (file_entry_id, file_entry_msg, file_entry_system_name)
        VALUES (NEW.file_entry_id, NEW.file_entry_msg, NEW.file_entry_system_name);
    ELSEIF NEW.file_entry_system_name = 'ops' THEN
        INSERT INTO ops_file_entry (file_entry_id, file_entry_msg, file_entry_system_name)
        VALUES (NEW.file_entry_id, NEW.file_entry_msg, NEW.file_entry_system_name);
    END IF;
    -- returning NULL from a BEFORE trigger keeps the row out of the parent table
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;
-- attach the function to the parent table (the trigger name is illustrative)
CREATE TRIGGER file_entry_insert BEFORE INSERT ON file_entry
    FOR EACH ROW EXECUTE PROCEDURE file_entry_insert_trigger();
In summary, I have a parent table with a foreign key to another table. Then I have 2 child tables, and the inserts are routed based on a given value: in my example above, if file_entry_system_name is 'ops', the row goes into the ops_file_entry table; 'eng' goes into the eng_file_entry table. We have hundreds of child tables in our production environment, and considering the amount of data, it really speeds things up, so I would like to keep this same structure. I can query the parent, and as long as I give it the right system_name, it immediately knows which child table to look into.
My desire is to emulate this with SQLAlchemy, but I can't find any examples that go into this much detail. I looked at the SQL generated by the SQLAlchemy examples, and I can tell it is not doing anything similar to this on the database side.
The best I can come up with is something like:
from sqlalchemy import Column, ForeignKey, Integer, Sequence, String
from sqlalchemy.ext.declarative import declarative_base

_Base = declarative_base()

class System(_Base):
    __tablename__ = 'system'
    system_id = Column(Integer, Sequence('system_id_seq'), primary_key=True)
    # unique, so that file_entry can reference it with a foreign key
    system_name = Column(String(24), nullable=False, unique=True)

    def __init__(self, name):
        self.system_name = name

class FileEntry(_Base):
    __tablename__ = 'file_entry'
    file_entry_id = Column(Integer, Sequence('file_entry_id_seq'), primary_key=True)
    file_entry_msg = Column(String(256), nullable=False)
    file_entry_system_name = Column(String(24), ForeignKey('system.system_name'), nullable=False)
    __mapper_args__ = {'polymorphic_on': file_entry_system_name}

    def __init__(self, msg, name):
        self.file_entry_msg = msg
        self.file_entry_system_name = name

class ops_file_entry(FileEntry):
    __tablename__ = 'ops_file_entry'
    ops_file_entry_id = Column(None, ForeignKey('file_entry.file_entry_id'), primary_key=True)
    __mapper_args__ = {'polymorphic_identity': 'ops_file_entry'}
In the end, what am I missing? How do I tell SQLAlchemy that anything inserted into FileEntry with a system name of 'ops' should go to the 'ops_file_entry' table? Is my understanding way off?
Some insight into what I should do would be amazing.

You just create a new instance of ops_file_entry (shouldn't this be OpsFileEntry?), add it to the session, and upon flush, one row will be inserted into table file_entry as well as into table ops_file_entry.
You don't need to set the file_entry_system_name attribute (the polymorphic identity fills it in), nor do you need the trigger.
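A minimal sketch of that joined-table inheritance, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database instead of Postgres; the message text is made up. Note that polymorphic_identity is 'ops' here, so it matches the value stored in file_entry_system_name rather than the table name. This is SQLAlchemy's own joined-table inheritance, not Postgres INHERITS: the row is split across both tables instead of living only in the child.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, text
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class System(Base):
    __tablename__ = "system"
    system_id = Column(Integer, primary_key=True)
    # UNIQUE so that file_entry can reference it with a foreign key
    system_name = Column(String(24), nullable=False, unique=True)


class FileEntry(Base):
    __tablename__ = "file_entry"
    file_entry_id = Column(Integer, primary_key=True)
    file_entry_msg = Column(String(256), nullable=False)
    file_entry_system_name = Column(
        String(24), ForeignKey("system.system_name"), nullable=False
    )
    # the discriminator column decides which subclass a row belongs to
    __mapper_args__ = {"polymorphic_on": file_entry_system_name}


class OpsFileEntry(FileEntry):
    __tablename__ = "ops_file_entry"
    # joined-table inheritance: the child PK is also a FK to the parent PK
    file_entry_id = Column(
        Integer, ForeignKey("file_entry.file_entry_id"), primary_key=True
    )
    # must equal the discriminator value ('ops'), not the table name
    __mapper_args__ = {"polymorphic_identity": "ops"}


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(System(system_name="ops"))
    session.flush()
    entry = OpsFileEntry(file_entry_msg="disk full")  # system name not set by hand
    session.add(entry)
    session.flush()  # one INSERT into file_entry, one into ops_file_entry
    discriminator = entry.file_entry_system_name
    parent_rows = session.execute(text("SELECT COUNT(*) FROM file_entry")).scalar()
    child_rows = session.execute(text("SELECT COUNT(*) FROM ops_file_entry")).scalar()
```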

I don't really know Python or SQLAlchemy, but I figured I'd give it a shot for old times' sake. ;)
Have you tried basically setting up your own trigger at the application level? Something like this might work:
from sqlalchemy import event

def my_after_insert_listener(mapper, connection, target):
    # set up your constraints to store the data as you want
    if target.file_entry_system_name == 'eng':
        pass  # do your child table insert
    elif target.file_entry_system_name == 'ops':
        pass  # do your child table insert
    # ...

# associate the listener function with the mapped FileEntry class,
# to execute during the "after_insert" hook
event.listen(FileEntry, 'after_insert', my_after_insert_listener)
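The target argument is the FileEntry instance that was just inserted (mapper is its Mapper), and connection is the Connection used by the flush, so the child-table insert can be issued with connection.execute().
The SQLAlchemy events documentation (especially the mapper event after_insert) will probably be helpful.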
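A runnable version of that idea, assuming SQLAlchemy 1.4+ and SQLite in place of Postgres; the child tables are plain Core Table objects, and the make_child_table and CHILD_TABLES names are hypothetical. Unlike the Postgres trigger, which returns NULL so the row never lands in the parent, this listener runs after the parent INSERT, so the row ends up in both file_entry and the child table.

```python
from sqlalchemy import Column, Integer, String, Table, create_engine, event, text
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class FileEntry(Base):
    __tablename__ = "file_entry"
    file_entry_id = Column(Integer, primary_key=True)
    file_entry_msg = Column(String(256), nullable=False)
    file_entry_system_name = Column(String(24), nullable=False)


def make_child_table(system_name):
    # hypothetical helper: one Core table per system, e.g. eng_file_entry
    return Table(
        f"{system_name}_file_entry",
        Base.metadata,
        Column("file_entry_id", Integer, primary_key=True),
        Column("file_entry_msg", String(256), nullable=False),
        Column("file_entry_system_name", String(24), nullable=False),
    )


CHILD_TABLES = {name: make_child_table(name) for name in ("eng", "ops")}


def my_after_insert_listener(mapper, connection, target):
    # target is the FileEntry that was just INSERTed (its PK is already set);
    # connection is the Connection the flush is using
    child = CHILD_TABLES.get(target.file_entry_system_name)
    if child is not None:
        connection.execute(
            child.insert().values(
                file_entry_id=target.file_entry_id,
                file_entry_msg=target.file_entry_msg,
                file_entry_system_name=target.file_entry_system_name,
            )
        )


event.listen(FileEntry, "after_insert", my_after_insert_listener)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(FileEntry(file_entry_msg="deploy done", file_entry_system_name="ops"))
    session.commit()
    ops_rows = session.execute(text("SELECT COUNT(*) FROM ops_file_entry")).scalar()
    eng_rows = session.execute(text("SELECT COUNT(*) FROM eng_file_entry")).scalar()
```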

Related

SQLAlchemy - defining a foreign key relationship in a different database

I'm using SQLAlchemy declarative and Python 2.7 to read asset information from an existing database. The database uses a number of foreign keys for constant values. Many of the foreign keys point to tables in a different database.
How can I specify a foreign key relationship where the data exists on a separate database?
I've tried to use two separate Base classes, with the models inheriting from them separately.
I've also looked into specifying the primaryjoin keyword in relationship, but I've been unable to understand how it would be done in this case.
I think the problem is that I can only bind one engine to a session object. I can't see any way to ask sqlalchemy to use a different engine when making a query on a nested foreign key item.
OrgBase = declarative_base()
CommonBase = declarative_base()

class SomeClass:
    def __init__(self, sql_user, sql_pass, sql_host, org_db, common_host, common):
        self.engine = create_engine("{type}://{user}:{password}@{url}/{name}".format(
            type=db_type,
            user=sql_user,
            password=sql_pass,
            url=sql_host,
            name=org_db))
        self.engine_common = create_engine("{type}://{user}:{password}@{url}/{name}".format(
            type=db_type,
            user=sql_user,
            password=sql_pass,
            url=common_host,
            name="common"))
        self.session = sessionmaker(bind=self.engine)()
        OrgBase.metadata.bind = self.engine
        CommonBase.metadata.bind = self.engine_common

models.py:

class FrameRate(CommonBase):
    __tablename__ = 'content_frame_rates'
    __table_args__ = {'autoload': True}

class VideoAsset(OrgBase):
    __tablename__ = 'content_video_files'
    __table_args__ = {'autoload': True}
    frame_rate_id = Column(Integer, ForeignKey('content_frame_rates.frame_rate_id'))
    frame_rate = relationship(FrameRate, foreign_keys=[frame_rate_id])
Error with this code:
NoReferencedTableError: Foreign key associated with column 'content_video_files.frame_rate_id' could not find table 'content_frame_rates' with which to generate a foreign key to target column 'frame_rate_id'
It is raised when I run:
asset = self.session.query(self.VideoAsset).filter_by(uuid=asset_uuid).first()
My hope is that the VideoAsset model can nest frame_rate properly, finding the value on the separate database.
Thank you!
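One direction that may fit, not part of the original thread and sketched under assumptions (SQLAlchemy 1.4+, two in-memory SQLite engines standing in for the two databases, explicit columns instead of autoload, and a hypothetical label column). The NoReferencedTableError arises because ForeignKey('content_frame_rates.frame_rate_id') is resolved inside OrgBase.metadata, which does not contain that table. Here the ForeignKey is dropped, the join condition is given with primaryjoin and foreign(), and a single Session routes each base to its own engine through binds. The two databases are not joined in SQL; the related row comes from a separate lazy-load query.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, foreign, relationship, sessionmaker

OrgBase = declarative_base()
CommonBase = declarative_base()


class FrameRate(CommonBase):
    __tablename__ = "content_frame_rates"
    frame_rate_id = Column(Integer, primary_key=True)
    label = Column(String(16))  # hypothetical column


class VideoAsset(OrgBase):
    __tablename__ = "content_video_files"
    id = Column(Integer, primary_key=True)
    # plain column: no ForeignKey(), the target table is in another MetaData
    frame_rate_id = Column(Integer)
    frame_rate = relationship(
        FrameRate,
        primaryjoin=lambda: foreign(VideoAsset.frame_rate_id) == FrameRate.frame_rate_id,
        viewonly=True,
    )


org_engine = create_engine("sqlite://")     # stands in for the org database
common_engine = create_engine("sqlite://")  # stands in for the "common" database
OrgBase.metadata.create_all(org_engine)
CommonBase.metadata.create_all(common_engine)

# every class under each base is routed to that base's engine
Session = sessionmaker(binds={OrgBase: org_engine, CommonBase: common_engine})

with Session() as session:
    session.add_all([FrameRate(frame_rate_id=1, label="25fps"),
                     VideoAsset(id=10, frame_rate_id=1)])
    session.commit()

with Session() as session:
    asset = session.query(VideoAsset).filter_by(id=10).first()
    # this lazy load issues its own SELECT against common_engine
    frame_label = asset.frame_rate.label
```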

SQL Alchemy ORM tables on-demand

I'm running into something, guys, and was hoping to get some ideas/help.
I have a database with a tree structure where a leaf can participate in several parents as a foreign key. The typical example is a city, which belongs to a country and to a continent. Needless to say, countries and continents should not be repeated, so before adding another city I need to find the parent objects in the DB. If one doesn't exist I have to create it, and if, for instance, the country doesn't exist yet, then I have to check for the continent, and if that one doesn't exist either, I have to create it too.
So far I managed to create a whole bunch of items when I run everything from a single file, but if I move the SQLAlchemy code into a module the story is different. For some reason the metadata scope becomes limited, and if a table doesn't exist yet, the code starts throwing ProgrammingError exceptions when I query for the foreign key's presence (the country, from the city). I have intercepted that, and in the __init__ constructor of the class I am looking for (country) I check whether the table exists and create it if it doesn't. There are two things I have a problem with and need advice on:
1) Verification of the table is inefficient - I am working with the Base.metadata.sorted_tables array through which I have to look through and figure out if the table structure is the one that matches my class __tablename__. Such as:
for table in Base.metadata.sorted_tables:
    # Find the right table in the list of tables
    if table.name == self.__tablename__:
        if __DEBUG__:
            print('DEBUG: Found table {} that equal to the class table {}'.format(table.name, self.__tablename__))
        if not table.exists():
            # create() runs the DDL itself; it returns None, so don't pass it to execute()
            table.create(bind=session.get_bind())
Needless to say, this takes time; I am looking for a more efficient way to do the same.
2) The second issue is with the inheritance of the declarative base (declarative_base()) with respect to the OOP in Python. I want to take some of the code repetitions away and pull them into one class from which the other classes will be derived from. For instance code above can be taken out into the separate function and have something like this:
Base = declarative_base()

class OnDemandTables(Base):
    __tablename__ = 'no_table'
    # id = Column(Integer, Sequence('id'), nullable=False, unique=True, primary_key=True, autoincrement=True)

    def create_my_table(self, session):
        if __DEBUG__:
            print('DEBUG: Creating tables for the class {}'.format(self.__class__))
            print('DEBUG: Base.metadata.sorted_tables exists returns {}'.format(Base.metadata.sorted_tables))
        for table in Base.metadata.sorted_tables:
            # Find the right table in the list of tables
            if table.name == self.__tablename__:
                if __DEBUG__:
                    print('DEBUG: Found table {} that equal to the class table {}'.format(table.name, self.__tablename__))
                if not table.exists():
                    table.create(bind=session.get_bind())

class Continent(OnDemandTables):
    __tablename__ = 'continent'
    id = Column(Integer, Sequence('id'), nullable=False, unique=True, primary_key=True, autoincrement=True)
    name = Column(String(64), unique=True, nullable=False)

    def __init__(self, session, continent_description):
        if type(continent_description) != dict:
            raise AttributeError('Continent should be described by the dictionary!')
        else:
            self.create_my_table(session)
            if 'continent' not in continent_description:
                raise ReferenceError('No continent can be created without a name!. Dictionary is {}'.
                                     format(continent_description))
            else:
                self.name = continent_description['continent']
                print('DEBUG: Continent name is {} '.format(self.name))
The problem here is that the metadata is trying to link unrelated classes together and requires __tablename__ and some index column to be present in the parent OnDemandTables class, which doesn't make any sense to me.
Any ideas?
Cheers
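For the second issue, declarative has a switch for this case: a class with __abstract__ = True is not mapped, so it needs no __tablename__ or primary key. A minimal sketch, assuming SQLAlchemy 1.4+ and SQLite; unlike the question's code, create_my_table takes an engine rather than a session, and it uses the subclass's own cls.__table__ instead of searching Base.metadata.sorted_tables.

```python
from sqlalchemy import Column, Integer, String, create_engine, inspect
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class OnDemandTables(Base):
    # not mapped: declarative won't ask for a __tablename__ or primary key here
    __abstract__ = True

    @classmethod
    def create_my_table(cls, engine):
        # cls.__table__ is the concrete subclass's Table;
        # checkfirst=True skips the CREATE if the table already exists
        cls.__table__.create(bind=engine, checkfirst=True)


class Continent(OnDemandTables):
    __tablename__ = "continent"
    id = Column(Integer, primary_key=True)
    name = Column(String(64), unique=True, nullable=False)


engine = create_engine("sqlite://")
Continent.create_my_table(engine)
Continent.create_my_table(engine)  # second call is a no-op
table_names = inspect(engine).get_table_names()
```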
Wanted to post the solution here for the rest of the gang to keep in mind. Apparently, SQLAlchemy doesn't see the classes in the module if they are not being used, so to speak. After a couple of days of trying to work around things, the simplest solution I found was a semi-manual one: rather than relying on the ORM to construct and build up the database for you, do that part manually using class methods. The code is:
__DEBUG__ = True

from sqlalchemy import String, Integer, Column, ForeignKey, BigInteger, Float, Boolean, Sequence
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship
from sqlalchemy.orm.exc import MultipleResultsFound, NoResultFound
from sqlalchemy.exc import ProgrammingError
from sqlalchemy import create_engine, schema
from sqlalchemy.orm import sessionmaker

Base = declarative_base()
engine = create_engine("mysql://test:test123@localhost/test", echo=True)
Session = sessionmaker(bind=engine, autoflush=False)
session = Session()
schema.MetaData.bind = engine

class TemplateBase(object):
    __tablename__ = None

    @classmethod
    def create_table(cls, session):
        if __DEBUG__:
            print('DEBUG: Creating tables for the class {}'.format(cls))
            print('DEBUG: Base.metadata.sorted_tables exists returns {}'.format(Base.metadata.sorted_tables))
        for table in Base.metadata.sorted_tables:
            # Find the right table in the list of tables
            if table.name == cls.__tablename__:
                if __DEBUG__:
                    print('DEBUG: Found table {} that equal to the class table {}'.format(table.name, cls.__tablename__))
                if not table.exists():
                    if __DEBUG__:
                        print('DEBUG: Session is {}, engine is {}, table is {}'.format(session, session.get_bind(), dir(table)))
                    table.create()

    @classmethod
    def is_provisioned(cls):
        for table in Base.metadata.sorted_tables:
            # Find the right table in the list of tables
            if table.name == cls.__tablename__:
                if __DEBUG__:
                    print('DEBUG: Found table {} that equal to the class table {}'.format(table.name, cls.__tablename__))
                return table.exists()

class Continent(Base, TemplateBase):
    __tablename__ = 'continent'
    id = Column(Integer, Sequence('id'), nullable=False, unique=True, primary_key=True, autoincrement=True)
    name = Column(String(64), unique=True, nullable=False)

    def __init__(self, session, provision, continent_description):
        if type(continent_description) != dict:
            raise AttributeError('Continent should be described by the dictionary!')
        else:
            if 'continent' not in continent_description:
                raise ReferenceError('No continent can be created without a name!. Dictionary is {}'.
                                     format(continent_description))
            else:
                self.name = continent_description['continent']
                if __DEBUG__:
                    print('DEBUG: Continent name is {} '.format(self.name))
It gives the following:
1. The class methods is_provisioned and create_table can be called when the code starts and will reflect the database state.
2. Class inheritance is done from a second class that holds these methods and does not interfere with the ORM classes, so it is not linked into the mapping.
Since the Base.metadata.sorted_tables loop only ever finds the class's own table, the code can be optimized further by removing the loop. The next step would be to keep the classes in a list ordered by their linkages (parents first), then loop through them, calling is_provisioned and, if necessary, create_table.
Hope it helps the others.
Regards
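A sketch of the optimization described in that last paragraph, under assumptions: SQLAlchemy 1.4+, SQLite instead of MySQL, engine arguments instead of the module-level bind, and a hypothetical Country model standing in for the rest of the tree. Each class reads its own table from cls.__table__, the database check uses inspect(engine).has_table(), and the models are listed parents-first. If creating every table at once is acceptable, Base.metadata.create_all(engine) performs the same ordering and existence checks by itself.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, inspect
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class TemplateBase:
    # plain mixin: not a declarative class, so it is never mapped

    @classmethod
    def is_provisioned(cls, engine):
        # ask the database directly instead of scanning Base.metadata
        return inspect(engine).has_table(cls.__tablename__)

    @classmethod
    def create_table(cls, engine):
        cls.__table__.create(bind=engine, checkfirst=True)


class Continent(Base, TemplateBase):
    __tablename__ = "continent"
    id = Column(Integer, primary_key=True)
    name = Column(String(64), unique=True, nullable=False)


class Country(Base, TemplateBase):  # hypothetical child model
    __tablename__ = "country"
    id = Column(Integer, primary_key=True)
    name = Column(String(64), unique=True, nullable=False)
    continent_id = Column(Integer, ForeignKey("continent.id"), nullable=False)


engine = create_engine("sqlite://")

# parents before children, so every foreign key target already exists
for model in (Continent, Country):
    if not model.is_provisioned(engine):
        model.create_table(engine)

provisioned = {m.__tablename__: m.is_provisioned(engine) for m in (Continent, Country)}
```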

(pymysql.err.IntegrityError) (1451, 'Cannot delete or update a parent row: a foreign key constraint fails...') in Flask

I'm aware that there are many questions about the error in the title, but I couldn't find a suitable solution. My problem is that deleting a row using Session.delete() throws:
sqlalchemy.exc.IntegrityError: (pymysql.err.IntegrityError) (1451, 'Cannot delete or update a parent row: a foreign key constraint fails (`transport`.`driver`, CONSTRAINT `driver_ibfk_1` FOREIGN KEY (`owner_id`) REFERENCES `truckcompany` (`id`))') [SQL: 'DELETE FROM truckcompany WHERE truckcompany.id = %(id)s'] [parameters: {'id': 4}]
Models:
class Truck_company(Base):
    __tablename__ = 'truckcompany'
    id = Column(BigInteger, primary_key=True)

class Driver(Base):
    __tablename__ = 'driver'
    id = Column(BigInteger, primary_key=True)
    owner_id = Column(BigInteger, ForeignKey('truckcompany.id'))
    owner = relationship(Truck_company)
The view with the failing delete:
@app.route('/values/deleteuser/<int:id>', methods=['POST', 'GET'])
def delete_truck(id):
    value_truckcompany = sqlsession.query(Truck_company).filter(Truck_company.id == id).first()
    if value_truckcompany:
        sqlsession.delete(value_truckcompany)
        sqlsession.commit()
    return redirect('/static/truckcompanyview', )
Why
In your Driver model there's a foreign key constraint referencing Truck_company:
class Driver(Base):
    ...
    owner_id = Column(BigInteger, ForeignKey('truckcompany.id'))
You've omitted the ON DELETE action, so MySQL defaults to RESTRICT. Also there are no SQLAlchemy ORM relationships with cascades that would delete the related drivers. So when you try to delete the truck company in the view, the DB stops you from doing that, because you'd violate your foreign key constraint and in other words referential integrity. This is an issue with how you've modeled your DB, not Flask etc.
What can I do
The most important thing to do, when you're creating your model, is to decide what you would like to happen when you delete a truck company with related drivers. Your options include, but are not limited to:
Deleting the drivers also.
Setting their owner_id to NULL, effectively detaching them. This is what SQLAlchemy does, if an ORM relationship is present in its default configuration in the parent.
It is also a perfectly valid solution to restrict deleting parent rows with children, as you've implicitly done.
You've expressed in the comments that you'd like to remove the related drivers. A quick solution is to just manually issue a DELETE:
# WARNING: Allowing GET in a data modifying view is a terrible idea.
# Prepare yourself for when Googlebot, some other spider, or an overly
# eager browser nukes your DB.
@app.route('/values/deleteuser/<int:id>', methods=['POST', 'GET'])
def delete_truck(id):
    value_truckcompany = sqlsession.query(Truck_company).get(id)
    if value_truckcompany:
        # bulk-delete the company's drivers first, then the company itself
        sqlsession.query(Driver).\
            filter_by(owner=value_truckcompany).\
            delete(synchronize_session=False)
        sqlsession.delete(value_truckcompany)
        sqlsession.commit()
    return redirect('/static/truckcompanyview', )
This on the other hand fixes this one location only. If you decide that a Driver has no meaning without its Truck_company, you could alter the foreign key constraint to include ON DELETE CASCADE, and use passive deletes in related SQLAlchemy ORM relationships:
class Truck_company(Base):
    ...
    # Remember to use passive deletes with ON DELETE CASCADE
    drivers = relationship('Driver', passive_deletes=True)

class Driver(Base):
    ...
    # Let the DB handle deleting related rows
    owner_id = Column(BigInteger, ForeignKey('truckcompany.id',
                                             ondelete='CASCADE'))
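A small self-contained check of that combination, assuming SQLAlchemy 1.4+ and SQLite instead of MySQL; SQLite only enforces foreign keys (and their ON DELETE actions) after PRAGMA foreign_keys=ON, which the connect listener switches on. The company and driver ids are made up. Because the drivers collection is never loaded in the deleting session, passive_deletes=True leaves the cleanup to the database.

```python
from sqlalchemy import BigInteger, Column, ForeignKey, create_engine, event, text
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Truck_company(Base):
    __tablename__ = "truckcompany"
    id = Column(BigInteger, primary_key=True)
    # don't load the drivers just to delete them; the database does it
    drivers = relationship("Driver", passive_deletes=True)


class Driver(Base):
    __tablename__ = "driver"
    id = Column(BigInteger, primary_key=True)
    owner_id = Column(BigInteger, ForeignKey("truckcompany.id", ondelete="CASCADE"))


engine = create_engine("sqlite://")


@event.listens_for(engine, "connect")
def enable_sqlite_fks(dbapi_connection, connection_record):
    # SQLite ignores foreign keys unless this pragma is on for the connection
    cursor = dbapi_connection.cursor()
    cursor.execute("PRAGMA foreign_keys=ON")
    cursor.close()


Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Truck_company(id=4))
    session.add_all([Driver(id=1, owner_id=4), Driver(id=2, owner_id=4)])
    session.commit()

with Session(engine) as session:
    session.delete(session.get(Truck_company, 4))
    session.commit()  # ON DELETE CASCADE removes both drivers
    remaining = session.execute(text("SELECT COUNT(*) FROM driver")).scalar()
```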
Alternatively you could leave it to the SQLAlchemy ORM level cascades to remove related objects, but it seems you've had some problems with that in the past. Note that the SQLAlchemy cascades define how an operation on the parent should propagate to its children, so you define delete and optionally delete-orphan on the parent side relationship, or the one-to-many side:
class Truck_company(Base):
    ...
    # If a truck company is deleted, delete the related drivers as well
    drivers = relationship('Driver', cascade='save-update, merge, delete')
In your current model you have no relationship defined from Truck_company to Driver, so no cascades take place.
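The ORM-level alternative can be verified in isolation too. A minimal sketch, assuming SQLAlchemy 1.4+ and SQLite with foreign-key enforcement switched on; the ids are made up. No ondelete is set on the column, so the database alone would refuse the delete, and the cascade is done entirely by SQLAlchemy, which loads the drivers and deletes them before the company.

```python
from sqlalchemy import BigInteger, Column, ForeignKey, create_engine, event, text
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Truck_company(Base):
    __tablename__ = "truckcompany"
    id = Column(BigInteger, primary_key=True)
    # ORM cascade: deleting a company also deletes its drivers
    drivers = relationship("Driver", cascade="save-update, merge, delete")


class Driver(Base):
    __tablename__ = "driver"
    id = Column(BigInteger, primary_key=True)
    # no ON DELETE clause: the database would refuse on its own
    owner_id = Column(BigInteger, ForeignKey("truckcompany.id"))


engine = create_engine("sqlite://")


@event.listens_for(engine, "connect")
def enable_sqlite_fks(dbapi_connection, connection_record):
    cursor = dbapi_connection.cursor()
    cursor.execute("PRAGMA foreign_keys=ON")
    cursor.close()


Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Truck_company(id=4, drivers=[Driver(id=1), Driver(id=2)]))
    session.commit()

with Session(engine) as session:
    # the ORM loads the drivers and emits their DELETEs before the company's
    session.delete(session.get(Truck_company, 4))
    session.commit()
    remaining = session.execute(text("SELECT COUNT(*) FROM driver")).scalar()
```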
Note that modifying Driver such as:
class Driver(Base):
    ...
    owner_id = Column(BigInteger, ForeignKey('truckcompany.id',
                                             ondelete='CASCADE'))
will not magically migrate the existing DB table and its constraints. If you wish to take that route, you'll have to either migrate manually or using some tool.

Change SqlAlchemy declarative model table schema at runtime

I am trying to build a declarative table that runs in both postgres and sqlite. The only difference between the tables is that the postgres table is going to run within a specific schema and the sqlite one will not. So far I've gotten the tables to build without a schema with the code below.
from sqlalchemy import Column, MetaData, String, create_engine
from sqlalchemy.ext.declarative import declarative_base

metadata = MetaData()

class Base(object):
    __table_args__ = {'schema': None}

Base = declarative_base(cls=Base, metadata=metadata)

class Configuration(Base):
    """
    Object representation of a row in the configuration table
    """
    __tablename__ = 'configuration'
    name = Column(String(90), primary_key=True)
    value = Column(String(256))

    def __init__(self, name="", value=""):
        self.name = name
        self.value = value

def build_tables(conn_str, schema=None):
    global metadata
    engine = create_engine(conn_str, echo=True)
    if schema:
        metadata.schema = schema
    metadata.create_all(engine)
However, whenever I try to set a schema in build_tables(), the schema doesn't appear to be set in the newly built tables. It only seems to work if I set the schema initially at metadata = MetaData(schema='my_project') which I don't want to do until I know which database I will be running.
Is there another way to set the table schema dynamically using the declarative model? Is changing the metadata the wrong approach?
Although this is not 100% the answer to what you are looking for, I think @Ilja Everilä was right that the answer is partly in https://stackoverflow.com/a/9299021/3727050.
What I needed to do was to "copy" a model to a new declarative_base. In doing so I faced a problem similar to yours. I needed to:
Change the base class of my model to the new Base
Also change the autogenerated __table__ attribute of the model to point to the new metadata. Otherwise I was getting a lot of errors when looking up the PK in that table
The solution that seems to be working for me is to clone the model the following way:
def rebase(klass, new_base):
    new_dict = {
        k: v
        for k, v in klass.__dict__.items()
        if not k.startswith("_") or k in {"__tablename__", "__table_args__"}
    }
    # Associate the new table with the new metadata instead
    # of the old/other pool
    new_dict["__table__"] = klass.__table__.to_metadata(new_base.metadata)
    # Construct and return a new type
    return type(klass.__name__, (new_base,), new_dict)
This in your case can be used as:
...
# Your old base
Base = declarative_base(cls=Base, metadata=metadata)
# New metadata and base (the new base must use the new metadata)
metadata2 = MetaData(schema="<new_schema>")
Base2 = declarative_base(metadata=metadata2)
# Register Model/Table in the new base and meta
NewConfiguration = rebase(Configuration, Base2)
metadata2.create_all(engine)
Notes/Warnings:
The above code is not tested
It looks to me too verbose and hacky ... there has to be a better solution for what you need (maybe via Pool configs?)
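A different route, not mentioned in this thread and sketched under assumptions (SQLAlchemy 1.4+, and SQLite with an attached in-memory database named my_project standing in for a Postgres schema): leave schema as None in the models and let the engine rewrite it at execution time with the schema_translate_map execution option. Setting metadata.schema inside build_tables() has no effect because each Table captured its schema when the model class was declared. The build_tables signature below differs from the question's (it takes an engine).

```python
from sqlalchemy import Column, MetaData, String, create_engine, event, inspect
from sqlalchemy.orm import Session, declarative_base

metadata = MetaData()  # no schema fixed at import time
Base = declarative_base(metadata=metadata)


class Configuration(Base):
    __tablename__ = "configuration"
    name = Column(String(90), primary_key=True)
    value = Column(String(256))


engine = create_engine("sqlite://")


@event.listens_for(engine, "connect")
def attach_schema(dbapi_connection, connection_record):
    # SQLite has no CREATE SCHEMA; an attached database plays that role here
    cursor = dbapi_connection.cursor()
    cursor.execute("ATTACH DATABASE ':memory:' AS my_project")
    cursor.close()


def build_tables(engine, schema=None):
    if schema:
        # tables declared with schema=None are rendered as <schema>.<table>
        engine = engine.execution_options(schema_translate_map={None: schema})
    metadata.create_all(engine)
    return engine


project_engine = build_tables(engine, schema="my_project")

with Session(project_engine) as session:
    session.add(Configuration(name="mode", value="test"))
    session.commit()

inspector = inspect(engine)
in_project = inspector.get_table_names(schema="my_project")
in_main = inspector.get_table_names()
```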

Bulk inserts with Flask-SQLAlchemy

I'm using Flask-SQLAlchemy to do a rather large bulk insert of 60k rows. I also have a many-to-many relationship on this table, so I can't use db.engine.execute for this. Before inserting, I need to find similar items in the database, and change the insert to an update if a duplicate item is found.
I could do this check beforehand, and then do a bulk insert via db.engine.execute, but I need the primary key of the row upon insertion.
Currently, I am doing a db.session.add() and db.session.commit() on each insert, and I get a measly 3-4 inserts per second.
I ran a profiler to see where the bottleneck is, and it seems that the db.session.commit() is taking 60% of the time.
Is there some way that would allow me to make this operation faster, perhaps by grouping commits, but which would give me primary keys back?
This is what my models look like:
class Item(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.String(1024), nullable=True)
    created = db.Column(db.DateTime())
    tags_relationship = db.relationship('Tag', secondary=tags, backref=db.backref('items', lazy='dynamic'))
    tags = association_proxy('tags_relationship', 'text')

class Tag(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    text = db.Column(db.String(255))
My insert operation is:
for item in items:
    if duplicate:
        update_existing_item
    else:
        x = Item()
        x.title = "string"
        x.created = datetime.datetime.utcnow()
        for tag in tags:
            if not tag_already_exists:
                y = Tag()
                y.text = "tagtext"
                x.tags_relationship.append(y)
                db.session.add(y)
                db.session.commit()
            else:
                x.tags_relationship.append(existing_tag)
        db.session.add(x)
        db.session.commit()
Perhaps you should try db.session.flush() to send the data to the server, which means any primary keys will be generated. At the end you can call db.session.commit() to actually commit the transaction.
I use the following code to quickly read the content of a pandas DataFrame into SQLite. Note that it circumvents the ORM features of SQLAlchemy. myClass in this context is a db.Model-derived class that has a table name assigned to it. As the comment in the snippet mentions, I adapted it from a gist:
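A sketch of that flush-then-commit pattern, assuming plain SQLAlchemy 1.4+ with SQLite rather than Flask-SQLAlchemy (db.session behaves the same way), a trimmed Item model without the tags, and a hypothetical batch size of 1000. Each flush() sends the pending INSERTs and fills in the primary keys, and the whole load ends with a single COMMIT instead of one per row.

```python
from sqlalchemy import Column, Integer, String, create_engine, func, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Item(Base):
    __tablename__ = "item"
    id = Column(Integer, primary_key=True)
    title = Column(String(1024), nullable=True)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

titles = [f"item {n}" for n in range(2500)]  # stand-in for the incoming rows
BATCH_SIZE = 1000  # hypothetical; tune for your data
ids = []

with Session(engine) as session:
    pending = []
    for title in titles:
        item = Item(title=title)
        session.add(item)
        pending.append(item)
        if len(pending) == BATCH_SIZE:
            session.flush()  # INSERTs are sent, primary keys are now populated
            ids.extend(p.id for p in pending)
            pending.clear()
    session.flush()  # the last, partial batch
    ids.extend(p.id for p in pending)
    session.commit()  # one COMMIT for the whole load
    total = session.scalar(select(func.count()).select_from(Item))
```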
l = df.to_dict('records')
# bulk save the dictionaries, circumventing the slow ORM interface
# c.f. https://gist.github.com/shrayasr/5df96d5bc287f3a2faa4
connection.engine.execute(
    myClass.__table__.insert(),
    l
)
from app import db
data = [{"attribute": "value"}, {...}, {...}, ... ]
db.engine.execute(YourModel.__table__.insert(), data)
For more information, refer to https://gist.github.com/shrayasr/5df96d5bc287f3a2faa4
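Engine.execute(), used in both snippets above, was deprecated in SQLAlchemy 1.4 and removed in 2.0. Assuming 1.4 or later, the same Core bulk insert can be written with an explicit transaction block; YourModel and its attribute column are placeholders as in the snippet above, and SQLite stands in for the real database (under Flask-SQLAlchemy, db.engine plays the role of engine). Like the original, this bypasses the ORM, so no objects or per-row primary keys come back.

```python
from sqlalchemy import Column, Integer, String, create_engine, func, select
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class YourModel(Base):  # placeholder model
    __tablename__ = "your_model"
    id = Column(Integer, primary_key=True)
    attribute = Column(String(50))


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

data = [{"attribute": f"value {n}"} for n in range(100)]

# passing a list of dicts gives executemany-style execution;
# engine.begin() commits when the block exits without an error
with engine.begin() as conn:
    conn.execute(YourModel.__table__.insert(), data)

with engine.connect() as conn:
    row_count = conn.execute(
        select(func.count()).select_from(YourModel.__table__)
    ).scalar()
```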
