How do I execute inserts and updates in an Alembic upgrade script?

I need to alter data during an Alembic upgrade.
I currently have a 'players' table in a first revision:
def upgrade():
    op.create_table('players',
        sa.Column('id', sa.Integer(), nullable=False),
        sa.Column('name', sa.Unicode(length=200), nullable=False),
        sa.Column('position', sa.Unicode(length=200), nullable=True),
        sa.Column('team', sa.Unicode(length=100), nullable=True),
        sa.PrimaryKeyConstraint('id')
    )
I want to introduce a 'teams' table. I've created a second revision:
def upgrade():
    op.create_table('teams',
        sa.Column('id', sa.Integer(), nullable=False),
        sa.Column('name', sa.String(length=80), nullable=False)
    )
    op.add_column('players', sa.Column('team_id', sa.Integer(), nullable=False))
I would like the second migration to also add the following data:
Populate teams table:
INSERT INTO teams (name) SELECT DISTINCT team FROM players;
Update players.team_id based on players.team name:
UPDATE players AS p JOIN teams AS t SET p.team_id = t.id WHERE p.team = t.name;
How do I execute inserts and updates inside the upgrade script?

What you are asking for is a data migration, as opposed to the schema migration that is most prevalent in the Alembic docs.
This answer assumes you are using declarative (as opposed to class-Mapper-Table or core) to define your models. It should be relatively straightforward to adapt this to the other forms.
Note that Alembic provides some basic data functions: op.bulk_insert() and op.execute(). If the operations are fairly minimal, use those. If the migration requires relationships or other complex interactions, I prefer to use the full power of models and sessions as described below.
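If the operations really are minimal, a sketch with those helpers might look like the following (the ad-hoc table and row values here are illustrative, not taken from the question):
from alembic import op
import sqlalchemy as sa

# ad-hoc table: just enough columns for the statements below
teams = sa.table('teams',
    sa.column('id', sa.Integer),
    sa.column('name', sa.String),
)

def upgrade():
    # insert several rows at once
    op.bulk_insert(teams, [
        {'name': 'Aces'},
        {'name': 'Kings'},
    ])
    # run arbitrary SQL
    op.execute("UPDATE players SET team_id = 1 WHERE team = 'Aces'")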
The following is an example migration script that sets up some declarative models that will be used to manipulate data in a session. The key points are:
Define the basic models you need, with the columns you'll need. You don't need every column, just the primary key and the ones you'll be using.
Within the upgrade function, use op.get_bind() to get the current connection, and make a session with it.
Alternatively, use bind.execute() with SQLAlchemy's lower-level Core API to write SQL queries directly. This is useful for simple migrations.
Use the models and session as you normally would in your application.
"""create teams table
Revision ID: 169ad57156f0
Revises: 29b4c2bfce6d
Create Date: 2014-06-25 09:00:06.784170
"""
revision = '169ad57156f0'
down_revision = '29b4c2bfce6d'
from alembic import op
import sqlalchemy as sa
from sqlalchemy import orm
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class Player(Base):
__tablename__ = 'players'
id = sa.Column(sa.Integer, primary_key=True)
name = sa.Column(sa.String, nullable=False)
team_name = sa.Column('team', sa.String, nullable=False)
team_id = sa.Column(sa.Integer, sa.ForeignKey('teams.id'), nullable=False)
team = orm.relationship('Team', backref='players')
class Team(Base):
__tablename__ = 'teams'
id = sa.Column(sa.Integer, primary_key=True)
name = sa.Column(sa.String, nullable=False, unique=True)
def upgrade():
bind = op.get_bind()
session = orm.Session(bind=bind)
# create the teams table and the players.team_id column
Team.__table__.create(bind)
op.add_column('players', sa.Column('team_id', sa.ForeignKey('teams.id'), nullable=False)
# create teams for each team name
teams = {name: Team(name=name) for name in session.query(Player.team).distinct()}
session.add_all(teams.values())
# set player team based on team name
for player in session.query(Player):
player.team = teams[player.team_name]
session.commit()
# don't need team name now that team relationship is set
op.drop_column('players', 'team')
def downgrade():
bind = op.get_bind()
session = orm.Session(bind=bind)
# re-add the players.team column
op.add_column('players', sa.Column('team', sa.String, nullable=False)
# set players.team based on team relationship
for player in session.query(Player):
player.team_name = player.team.name
session.commit()
op.drop_column('players', 'team_id')
op.drop_table('teams')
The migration defines separate models because the models in your code represent the current state of the database, while the migrations represent steps along the way. Your database might be in any state along that path, so the models might not sync up with the database yet. Unless you're very careful, using the real models directly will cause problems with missing columns, invalid data, etc. It's clearer to explicitly state exactly what columns and models you will use in the migration.

You can also use direct SQL (see the Alembic Operations reference) as in the following example:
from alembic import op

# revision identifiers, used by Alembic.
revision = '1ce7873ac4ced2'
down_revision = '1cea0ac4ced2'
branch_labels = None
depends_on = None


def upgrade():
    # ### commands made by andrew ###
    op.execute('UPDATE STOCK SET IN_STOCK = -1 WHERE IN_STOCK IS NULL')
    # ### end Alembic commands ###


def downgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    pass
    # ### end Alembic commands ###
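If the statement needs values computed in Python, it is safer to pass a text() construct with bound parameters to op.execute() than to build the string by hand. A minimal sketch (the table and value here are illustrative):
import sqlalchemy as sa
from alembic import op

def upgrade():
    # bound parameters are escaped by the driver, unlike string interpolation
    op.execute(
        sa.text("UPDATE stock SET in_stock = :value WHERE in_stock IS NULL")
        .bindparams(value=-1)
    )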

I recommend using SQLAlchemy Core statements with an ad-hoc table, as detailed in the official documentation, because this allows database-agnostic SQL, Pythonic code, and a self-contained script. SQLAlchemy Core is the best of both worlds for migration scripts.
Here is an example of the concept:
from sqlalchemy.sql import table, column
from sqlalchemy import String
from alembic import op

account = table('account',
    column('name', String)
)

op.execute(
    account.update().\
    where(account.c.name == op.inline_literal('account 1')).\
    values({'name': op.inline_literal('account 2')})
)

# If an insert is required
from sqlalchemy.sql import insert
from sqlalchemy import orm

bind = op.get_bind()
session = orm.Session(bind=bind)

data = {
    "name": "John",
}
ret = session.execute(insert(account).values(data))
# for use in other insert calls
account_id = ret.lastrowid
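The same ad-hoc table also works with op.bulk_insert() when several rows are needed at once; a short sketch (the row values are illustrative):
from alembic import op

op.bulk_insert(account, [
    {'name': 'account 3'},
    {'name': 'account 4'},
])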

Related

Alembic recreates foreign keys every time I migrate causing duplicate foreign keys on my tables

I have a Flask app using SQLAlchemy and Flask-Migrate to handle database changes. Every time I run a migration to create a script for Alembic to update the database, the script contains commands to create foreign keys that already exist in my database.
The table definition in my models.py is
class Airline(db.Model):
    __tablename__ = 'Airlines'

    AirlineID = db.Column(db.Integer, primary_key=True)
    AirlineShortCode = db.Column(db.String(3), index=True, unique=True, nullable=False)
    FullName = db.Column(db.String(256), unique=False, nullable=True)
    ShortName = db.Column(db.String(64), unique=False, nullable=True)


class CabinClass(db.Model):
    __tablename__ = 'CabinClasses'

    CabinClassID = db.Column(db.Integer, primary_key=True)
    AirlineShortCode = db.Column(db.ForeignKey("Airlines.AirlineShortCode"), nullable=True)
    CabinClassShortCode = db.Column(db.String(32), unique=False, nullable=False)
    CabinClassName = db.Column(db.String(64), unique=False, nullable=True)
The line in the migration database update script that is generated to create the foreign key is
op.create_foreign_key(None, 'CabinClasses', 'Airlines', ['AirlineShortCode'], ['AirlineShortCode'])
This line is generated every time I create the migration script, resulting in multiple foreign key entries in the CabinClasses table.
I see that the name of each foreign key created is different, and that the create_foreign_key command in the database migration script gives the name as None. I believe this is correct if you are using an automated naming scheme, which I believe is what happens by default:
For setups that use an automated naming scheme such as that described at Configuring Constraint Naming Conventions, name here can be None, as the event listener will apply the name to the constraint object when it is associated with the table.
https://alembic.sqlalchemy.org/en/latest/naming.html
Can anyone identify what would cause these foreign keys to be created every time I update the database?
The names of the constraints that you are getting look like they come from your database, not SQLAlchemy. You need to add the constraint naming templates for all types of constraints to the SQLAlchemy metadata, and then I think you will get consistent names. See how to do this in the Flask-SQLAlchemy documentation. I'm copying the code example from the docs below for your convenience:
from sqlalchemy import MetaData
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

convention = {
    "ix": 'ix_%(column_0_label)s',
    "uq": "uq_%(table_name)s_%(column_0_name)s",
    "ck": "ck_%(table_name)s_%(constraint_name)s",
    "fk": "fk_%(table_name)s_%(column_0_name)s_%(referred_table_name)s",
    "pk": "pk_%(table_name)s"
}

metadata = MetaData(naming_convention=convention)
db = SQLAlchemy(app, metadata=metadata)
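With the convention in place, autogenerate should emit stable, explicit constraint names instead of None, which should let it match constraints it has already created. For the models above, the "fk" template would expand to something like:
# fk_%(table_name)s_%(column_0_name)s_%(referred_table_name)s
op.create_foreign_key(
    'fk_CabinClasses_AirlineShortCode_Airlines',
    'CabinClasses', 'Airlines',
    ['AirlineShortCode'], ['AirlineShortCode'],
)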

Python SQLAlchemy: Reflecting the database breaks default/onupdate methods?

I have two separate SQLAlchemy interfaces to a Postgres database. The first interface, in the context of a Flask App, contains this model:
import datetime

app = create_app()  # sets the SQLAlchemy Database URI, etc.
db = SQLAlchemy(app)


class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    created_at = db.Column(db.DateTime, default=datetime.datetime.utcnow)
    updated_at = db.Column(db.DateTime, onupdate=datetime.datetime.utcnow)
    name = db.Column(db.String, nullable=False)
The second interface is not through Flask -- rather, it's a script that listens for a particular event, in which case it is meant to perform some computations and update a row in the database. To accomplish this, I have SQLAlchemy reflect the existent database:
from sqlalchemy import create_engine, MetaData, Table
from sqlalchemy.orm import mapper, sessionmaker
from sqlalchemy.ext.automap import automap_base
from os import environ

dbPath = "postgresql://" + ...

engine = create_engine(dbPath)

Base = automap_base()
Base.prepare(engine, reflect=True)
metadata = MetaData(engine)


class User(object):
    pass


users = Table('user', metadata, autoload=True, autoload_with=engine)
mapper(User, users)

Session = sessionmaker(bind=engine)
session = Session()
The issue I'm now running into is this: when I'm using the first interface to create a new entry or update one, things work as expected, and the created_at and updated_at fields are updated appropriately.
However, when I'm using the second interface -- importing the code and using session.query(User) to get an entry and to update it, the updated_at field doesn't change. Moreover, when I'm using this interface to create a new User, while it creates the new row as expected, it populates neither the created_at nor updated_at fields.
My questions:
Why is this happening? Why does the reflection seemingly break the default/onupdate methods?
How can I fix this?
default and onupdate are handled entirely client side in Python and so cannot be reflected from the DB. See "Limitations of Reflection". In case of default you could use server_default:
from sqlalchemy import text


class User(db.Model):
    ...
    created_at = db.Column(db.DateTime,
                           server_default=text("now() at time zone 'UTC'"))
and for onupdate you'd have to write a DB trigger and use server_onupdate=FetchedValue().
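For illustration, the trigger route might look roughly like this on Postgres (the function and trigger names are made up, and the DDL would normally live in a migration or setup script):
from sqlalchemy import FetchedValue, text

# server-side trigger that stamps updated_at on every UPDATE (hypothetical names)
ddl = text("""
CREATE FUNCTION set_updated_at() RETURNS trigger AS $$
BEGIN
    NEW.updated_at := now() AT TIME ZONE 'UTC';
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER user_updated_at BEFORE UPDATE ON "user"
FOR EACH ROW EXECUTE PROCEDURE set_updated_at();
""")


class User(db.Model):
    ...
    # tell SQLAlchemy the value is generated by the server on update
    updated_at = db.Column(db.DateTime, server_onupdate=FetchedValue())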
On the other hand you could avoid all that and just separate your models from your application code to a module, used by both your Flask application and your script. This would of course be a bit more involved as you'd have to use vanilla SQLAlchemy declarative instead of the customized db.Model base of Flask-SQLAlchemy. Or, you could use custom commands with Flask to implement your scripts, which would allow using the Flask-SQLAlchemy extensions.

Properly cascade delete with sqlalchemy association proxy

I have a self-referential relationship in sqlalchemy that is based heavily on the example found in this answer.
I have a table of users, and an association table that links a primary user to a secondary user. User A can be primary for user B, and B may or may not also be a primary for user A. It works exactly like the twitter analogy in the answer I linked above.
This works fine, except that I don't know how to establish cascade rules for an association proxy. Currently, if I delete a user, the association record remains, but it nulls out any FKs to the deleted user. I would like the delete to cascade to the association table and remove the record.
I also need to be able to disassociate users, which would only remove the association record, but would propagate to the "is_primary_of" and "is_secondary_of" association proxies of the users.
Can anyone help me figure out how to integrate these behaviors into the models that I have? Code is below. Thanks!
import sqlalchemy
import sqlalchemy.orm
import sqlalchemy.ext.declarative
import sqlalchemy.ext.associationproxy

# This is the base class from which all sqlalchemy table objects must inherit
SAModelBase = sqlalchemy.ext.declarative.declarative_base()


class UserAssociation(SAModelBase):
    __tablename__ = 'user_associations'

    # Columns
    id = sqlalchemy.Column(sqlalchemy.Integer, primary_key=True)

    # Foreign key columns
    primary_user_id = sqlalchemy.Column(sqlalchemy.Integer,
        sqlalchemy.ForeignKey('users.id', name='user_association_primary_user_fk'))
    secondary_user_id = sqlalchemy.Column(sqlalchemy.Integer,
        sqlalchemy.ForeignKey('users.id', name='user_association_secondary_user_fk'))

    # Foreign key relationships
    primary_user = sqlalchemy.orm.relationship('User',
        foreign_keys=primary_user_id,
        backref='secondary_users')
    secondary_user = sqlalchemy.orm.relationship('User',
        foreign_keys=secondary_user_id,
        backref='primary_users')

    def __init__(self, primary, secondary, **kwargs):
        self.primary_user = primary
        self.secondary_user = secondary
        for kw, arg in kwargs.items():
            setattr(self, kw, arg)


class User(SAModelBase):
    __tablename__ = 'users'

    # Columns
    id = sqlalchemy.Column(sqlalchemy.Integer, primary_key=True)
    first_name = sqlalchemy.Column(sqlalchemy.String)
    last_name = sqlalchemy.Column(sqlalchemy.String)

    is_primary_of = sqlalchemy.ext.associationproxy.association_proxy('secondary_users', 'secondary_user')
    is_secondary_of = sqlalchemy.ext.associationproxy.association_proxy('primary_users', 'primary_user')

    def associate(self, user, **kwargs):
        UserAssociation(primary=self, secondary=user, **kwargs)
Turns out to be pretty straightforward. The backrefs in the original code were just strings, but they can instead be backref objects. This allows you to set cascade behavior. See the sqlalchemy documentation on backref arguments.
The only changes required here are in the UserAssociation object. It now reads:
# Foreign key columns
primary_user_id = sqlalchemy.Column(sqlalchemy.Integer,
    sqlalchemy.ForeignKey('users.id',
                          name='user_association_primary_user_fk'),
    nullable=False)
secondary_user_id = sqlalchemy.Column(sqlalchemy.Integer,
    sqlalchemy.ForeignKey('users.id',
                          name='user_association_secondary_user_fk'),
    nullable=False)

# Foreign key relationships
primary_user = sqlalchemy.orm.relationship('User',
    foreign_keys=primary_user_id,
    backref=sqlalchemy.orm.backref('secondary_users',
                                   cascade='all, delete-orphan'))
secondary_user = sqlalchemy.orm.relationship('User',
    foreign_keys=secondary_user_id,
    backref=sqlalchemy.orm.backref('primary_users',
                                   cascade='all, delete-orphan'))
The backref keyword argument is now a backref object instead of a string. I was also able to make the foreign key columns non-nullable, since it now cascades deleted users such that the associations are deleted as well.
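A quick usage sketch of the resulting behavior (session setup elided; the names are illustrative):
# create two users and associate them
a = User(first_name='Alice', last_name='Smith')
b = User(first_name='Bob', last_name='Jones')
a.associate(b)
session.add_all([a, b])
session.commit()

# deleting a user now removes its association rows as well,
# instead of leaving them behind with NULLed foreign keys
session.delete(a)
session.commit()
assert session.query(UserAssociation).count() == 0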

How to create a field with a list of foreign keys in SQLAlchemy?

I am trying to store a list of models within the field of another model. Here is a trivial example below, where I have an existing model, Actor, and I want to create a new model, Movie, with the field Movie.list_of_actors:
import uuid
from sqlalchemy import Boolean, Column, Integer, String, DateTime
from sqlalchemy.schema import ForeignKey
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

Base = declarative_base()


class Actor(Base):
    __tablename__ = 'actors'

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    name = Column(String)
    nickname = Column(String)
    academy_awards = Column(Integer)


# This is my new model:
class Movie(Base):
    __tablename__ = 'movies'

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    title = Column(String)

    # How do I make this a list of foreign keys???
    list_of_actors = Column(UUID(as_uuid=True), ForeignKey('actors.id'))
I understand that this can be done with a many-to-many relationship, but is there a simpler solution? Note that I don't need to look up which Movies an Actor is in - I just want to create a new Movie model and access the list of its Actors. And ideally, I would prefer not to add any new fields to my Actor model.
I've gone through the tutorials using the relationships API, which outline the various one-to-many/many-to-many combinations using back_populates and backref here: http://docs.sqlalchemy.org/en/latest/orm/basic_relationships.html But I can't seem to implement my list of foreign keys without creating a full-blown many-to-many implementation.
But if a many-to-many implementation is the only way to proceed, is there a way to implement it without having to create an "association table"? The "association table" is described here: http://docs.sqlalchemy.org/en/latest/orm/basic_relationships.html#many-to-many Either way, an example would be very helpful!
Also, if it matters, I am using Postgres 9.5. I see from this post there might be support for arrays in Postgres, so any thoughts on that could be helpful.
Update
It looks like the only reasonable approach here is to create an association table, as shown in the selected answer below. I tried using ARRAY from SQLAlchemy's Postgres Dialect but it doesn't seem to support Foreign Keys. In my example above, I used the following column:
list_of_actors = Column('actors', postgresql.ARRAY(ForeignKey('actors.id')))
but it gives me an error. It seems like support for Postgres ARRAY with Foreign Keys is in progress, but still isn't quite there. Here is the most up to date source of information that I found: http://blog.2ndquadrant.com/postgresql-9-3-development-array-element-foreign-keys/
If you want many actors to be associated to a movie, and many movies be associated to an actor, you want a many-to-many. This means you need an association table. Otherwise, you could chuck away normalisation and use a NoSQL database.
An association table solution might resemble:
class Actor(Base):
    __tablename__ = 'actors'

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    name = Column(String)
    nickname = Column(String)
    academy_awards = Column(Integer)


class Movie(Base):
    __tablename__ = 'movies'

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    title = Column(String)
    actors = relationship('ActorMovie', uselist=True, backref='movies')


class ActorMovie(Base):
    __tablename__ = 'actor_movies'

    # a composite primary key, which the ORM requires to map this class
    actor_id = Column(UUID(as_uuid=True), ForeignKey('actors.id'), primary_key=True)
    movie_id = Column(UUID(as_uuid=True), ForeignKey('movies.id'), primary_key=True)
If you don't want ActorMovie to be an object inheriting from Base, you could use sqlalchemy.schema.Table.
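A minimal sketch of that variant, passing a plain Table as the secondary argument of relationship() (assuming the same Actor model as above):
import uuid
from sqlalchemy import Table, Column, String
from sqlalchemy.schema import ForeignKey
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import relationship

# plain association table; no mapped class required
actor_movies = Table(
    'actor_movies', Base.metadata,
    Column('actor_id', UUID(as_uuid=True), ForeignKey('actors.id'), primary_key=True),
    Column('movie_id', UUID(as_uuid=True), ForeignKey('movies.id'), primary_key=True),
)


class Movie(Base):
    __tablename__ = 'movies'

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    title = Column(String)
    # movie.actors is now a plain list of Actor objects
    actors = relationship('Actor', secondary=actor_movies, backref='movies')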

How should I use SQLAlchemy to access the schema

I have a file called models.py in which I state:
Base = declarative_base()


class Vehicle(Base):
    __tablename__ = 'vehicle'

    id = Column(Integer, primary_key=True)
    code = Column(String(15), nullable=False)
    description = Column(String(100), default='')
    vehicletype_id = ForeignKey('VehicleType.id')


Base.metadata.create_all(engine)
This creates the database tables in my PostgreSQL database.
In my app.py, should I now use:
from models import Vehicle
<do something with the Vehicle object>
or should I use something like:
meta = MetaData()
meta.reflect(bind=engine)
vehicle = meta.tables['vehicle']
when I want to access the schema of the table and the data in that table.
I want to be able to create an API call (flask-jsonrpc) that gives the schema of a table, and another API call that returns the data from that table in the PostgreSQL database.
Since you're already using the declarative ORM approach (by declaring your Vehicle class), there is no point in reflecting it. Reflection is normally used when you're dealing with an existing database and advanced features (such as defining custom relationships) are not important to you.
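For the schema API call, the declarative class already carries everything needed via its Table object; a minimal sketch (the flask-jsonrpc wiring is assumed, not shown):
from models import Vehicle

def vehicle_schema():
    # column names and types straight from the declarative model
    return {col.name: str(col.type) for col in Vehicle.__table__.columns}

# e.g. {'id': 'INTEGER', 'code': 'VARCHAR(15)', 'description': 'VARCHAR(100)', ...}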
