Alembic bulk_insert to table with schema - python

I'm looking at the example in the Alembic bulk_insert documentation:
# Create an ad-hoc table to use for the insert statement.
accounts_table = table('account',
column('id', Integer),
column('name', String),
column('create_date', Date)
)
I would like to bulk insert to a specific schema called kpi but I can't figure out how to do it. Can someone help me out?

The answer I've found so far is to use a full SQLAlchemy Table instead, since it accepts a schema argument:
accounts_table = sa.Table('accounts_table', sa.MetaData(),
sa.Column('id', sa.Integer),
sa.Column('name', sa.String(100)),
sa.Column('create_date', sa.Date),
schema='kpi'
)
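Putting it together with bulk_insert, a minimal sketch (assuming the kpi schema already exists and the account table has the same columns as the docs example):
from datetime import date
import sqlalchemy as sa
from alembic import op

# Ad-hoc table pointing at the kpi schema (assumed to exist already).
accounts_table = sa.Table('account', sa.MetaData(),
    sa.Column('id', sa.Integer),
    sa.Column('name', sa.String(100)),
    sa.Column('create_date', sa.Date),
    schema='kpi'
)

def upgrade():
    # The insert targets kpi.account because the Table carries the schema.
    op.bulk_insert(accounts_table, [
        {'id': 1, 'name': 'John Smith', 'create_date': date(2010, 10, 5)},
    ])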

Related

SQLAlchemy: ForeignKey across schemas

In my Postgres server we have a database (literally named database) with 2 schemas: public and api.
public has several tables, and I need to create a table in api with a foreign key to a table in public called model.
So it's:
-Schemas
--public
---tables
----model
--api
---tables
Using SQLAlchemy I have the following class:
from sqlalchemy import create_engine, MetaData, Table, Column, String, ForeignKey

class __PostgresService:
    def __init__(self):
        self.__client = create_engine("postgresql://postgres@localhost:5432/database")
        metadata = MetaData(self.__client, schema="public")
        self.__table = Table("training", metadata,
                             Column("id", String, primary_key=True, nullable=False),
                             Column("model_id", ForeignKey("model.id"), nullable=False),
                             schema="api")
        metadata.create_all()

postgres_service = __PostgresService()
However upon launch I receive the following error:
sqlalchemy.exc.NoReferencedTableError: Foreign key associated with column 'training.model_id' could not find table 'public.model' with which to generate a foreign key to target column 'id'
It seems it does look for the correct thing but can't find it? I'm very confused as to why this is happening, especially because the error refers to not finding "public", which is created by default by postgres, rather than "api" which I created myself in pgAdmin.
Am I missing some crucial config?
The error you are getting means that you are trying to create a foreign key referencing a table that SQLAlchemy does not know about. You can tell SQLAlchemy about it by creating a Table describing the referenced table on the same MetaData, or by using SQLAlchemy's reflection capabilities. For example:
from sqlalchemy import create_engine, MetaData, Table, Column, String, ForeignKey

class __PostgresService:
    def __init__(self):
        self.__client = create_engine("postgresql://postgres@localhost:5432/database")
        metadata = MetaData(self.__client, schema="public")
        metadata.reflect(schema="public", only=["model"])
        self.__table = Table("training", metadata,
                             Column("id", String, primary_key=True, nullable=False),
                             Column("model_id", ForeignKey("model.id"), nullable=False),
                             schema="api")
        metadata.create_all()

postgres_service = __PostgresService()
By default, MetaData.create_all() will check for the existence of tables first, before creating them, but you can also specify the exact tables to create: metadata.create_all(tables=[self.__table])
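If you would rather not reflect, a sketch of the alternative: declare a minimal stand-in for the referenced public.model table on the same MetaData (the Integer type of model.id is an assumption here):
from sqlalchemy import create_engine, ForeignKey, Integer, MetaData, String, Table, Column

engine = create_engine("postgresql://postgres@localhost:5432/database")
metadata = MetaData(schema="public")

# Stand-in for the existing table; only the referenced primary key column
# is needed for the foreign key to resolve. Integer is an assumption.
model = Table("model", metadata,
    Column("id", Integer, primary_key=True),
    schema="public")

training = Table("training", metadata,
    Column("id", String, primary_key=True, nullable=False),
    Column("model_id", ForeignKey("public.model.id"), nullable=False),
    schema="api")

# Create only the new table; public.model already exists in the database.
metadata.create_all(engine, tables=[training])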

Alembic operations with multiple schemas

NOTE: This is a question I have already found the answer for, but wanted to share it so it could help other people facing the same problem.
I was trying to perform some Alembic operations on my multi-schema PostgreSQL database, such as op.add_column or op.alter_column (the same question applies to op.create_table or op.drop_table). For example: op.add_column('table_name', sa.Column('new_column_name', sa.Integer()))
However, I was getting an error saying basically that the table name could not be found. As far as I understand it, this is because Alembic does not recognize the schema and searches for the table in the public schema. I then tried specifying the schema in the table name as 'schema_name.table_name', but had no luck.
I came across similar questions Perform alembic upgrade in multiple schemas or Alembic support for multiple Postgres schemas, but didn't find a satisfactory answer.
After searching the Alembic documentation, I found that there is actually a schema argument for the different operations. For example:
op.add_column('table_name', sa.Column('column_name', sa.Integer()), schema='schema_name')
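The same keyword applies to the other operations; a hedged sketch with placeholder names:
from alembic import op
import sqlalchemy as sa

def upgrade():
    # All table/column names below are placeholders; the point is schema=.
    op.alter_column('table_name', 'column_name', nullable=True, schema='schema_name')
    op.create_table('other_table',
        sa.Column('id', sa.Integer(), primary_key=True),
        schema='schema_name'
    )
    op.drop_table('other_table', schema='schema_name')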
Alembic will automatically pick up the schema from a table if it is already defined in a declarative SQLAlchemy model.
For example, with the following setup:
# models.py
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class SomeClass(Base):
    __tablename__ = 'some_table'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    __table_args__ = {"schema": "my_schema"}
# alembic/env.py
from models import Base
target_metadata = Base.metadata
[...]
Running:
alembic revision --autogenerate -m "test"
Would result in a default migration script with a schema specified:
def upgrade_my_db():
    # ### commands auto generated by Alembic - please adjust! ###
    op.create_table('some_table',
        sa.Column('id', sa.Integer(), nullable=False),
        sa.Column('name', sa.String(length=50), nullable=True),
        sa.PrimaryKeyConstraint('id'),
        schema='my_schema'
    )
    # ### end Alembic commands ###
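Depending on your setup you may also need include_schemas=True in env.py so that autogenerate compares tables outside the default schema; a sketch of the usual online-migration configuration:
# alembic/env.py (excerpt)
from alembic import context
from sqlalchemy import engine_from_config, pool

from models import Base

config = context.config
target_metadata = Base.metadata

def run_migrations_online():
    connectable = engine_from_config(
        config.get_section(config.config_ini_section),
        prefix="sqlalchemy.",
        poolclass=pool.NullPool,
    )
    with connectable.connect() as connection:
        context.configure(
            connection=connection,
            target_metadata=target_metadata,
            include_schemas=True,  # also compare non-default schemas
        )
        with context.begin_transaction():
            context.run_migrations()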

Get existing table using SQLAlchemy MetaData

I have a table that already exists:
USERS_TABLE = Table("users", META_DATA,
Column("id", Integer, Sequence("user_id_seq"), primary_key=True),
Column("first_name", String(255)),
Column("last_name", String(255))
)
I created this table by running this:
CONN = create_engine(DB_URL, client_encoding="UTF-8")
META_DATA = MetaData(bind=CONN, reflect=True)
# ... table code
META_DATA.create_all(CONN, checkfirst=True)
the first time it worked and I was able to create the table. However, the 2nd time around I got this error:
sqlalchemy.exc.InvalidRequestError: Table 'users' is already defined for this MetaData instance. Specify 'extend_existing=True' to redefine options and columns on an existing Table object.
which makes sense since the table users already exists. I'm able to see if the table exists like so:
TABLE_EXISTS = CONN.dialect.has_table(CONN, "users")
However, how do I actually get the existing table object? I can't find this anywhere in the documentation. Please help.
We have 3 different approaches here:
assume that the required tables have already been created, reflect them, and fetch them from the MetaData.tables dictionary field, like:
from sqlalchemy import MetaData, create_engine
CONN = create_engine(DB_URL, client_encoding="UTF-8")
META_DATA = MetaData(bind=CONN, reflect=True)
USERS_TABLE = META_DATA.tables['users']
removing the reflect flag from the MetaData initialization, because we don't use it and, moreover, it means we end up defining tables that have already been reflected:
from sqlalchemy import (MetaData, Table, Column, Integer, String, Sequence,
create_engine)
CONN = create_engine('sqlite:///db.sql')
META_DATA = MetaData(bind=CONN)
USERS_TABLE = Table("users", META_DATA,
Column("id", Integer, Sequence("user_id_seq"),
primary_key=True),
Column("first_name", String(255)),
Column("last_name", String(255)))
META_DATA.create_all(CONN, checkfirst=True)
keeping the reflected table if it was previously created, by setting the keep_existing flag to True in the Table initializer:
from sqlalchemy import (MetaData, Table, Column, Integer, String, Sequence,
create_engine)
CONN = create_engine('sqlite:///db.sql')
META_DATA = MetaData(bind=CONN, reflect=True)
USERS_TABLE = Table("users", META_DATA,
Column("id", Integer, Sequence("user_id_seq"),
primary_key=True),
Column("first_name", String(255)),
Column("last_name", String(255)),
keep_existing=True)
META_DATA.create_all(CONN, checkfirst=True)
Which one to choose? It depends on your use case, but I prefer the second one, since it looks like you aren't using reflection; it is also the simplest modification: just remove the flag from the MetaData initializer.
P.S.
We can always perform reflection after initializing the MetaData object with the MetaData.reflect method:
META_DATA.reflect()
We can also specify which tables to reflect with the only parameter (any iterable of str objects):
META_DATA.reflect(only=['users'])
and many more.
This works for me pretty well -
import sqlalchemy as db
engine = db.create_engine("your_connection_string")
meta_data = db.MetaData(bind=engine)
db.MetaData.reflect(meta_data)
USERS = meta_data.tables['users']
# View the columns present in the users table
print(USERS.columns)
# You can run sqlalchemy queries
query = db.select([
USERS.c.id,
USERS.c.first_name,
USERS.c.last_name,
])
result = engine.execute(query).fetchall()
Note that using the reflect parameter in MetaData(bind=engine, reflect=True) is deprecated and will be removed in a future release. The code above takes care of it.
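For SQLAlchemy 1.4/2.0 style, a sketch of the same thing without the deprecated bind= and with an explicit reflect call (connection string is a placeholder):
import sqlalchemy as db

engine = db.create_engine("your_connection_string")

meta_data = db.MetaData()
meta_data.reflect(bind=engine)  # reflect explicitly instead of MetaData(bind=..., reflect=True)

USERS = meta_data.tables['users']

# 2.0-style select: columns are passed positionally, not as a list
with engine.connect() as conn:
    result = conn.execute(
        db.select(USERS.c.id, USERS.c.first_name, USERS.c.last_name)
    ).fetchall()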
Another option is to add __table_args__ = {'extend_existing': True} right below __tablename__ in your model.
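In a declarative model that would look roughly like this (model and column names here are assumptions that mirror the users table above):
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'
    # Redefine a table name that is already present in this MetaData
    # instead of raising "Table 'users' is already defined".
    __table_args__ = {'extend_existing': True}

    id = Column(Integer, primary_key=True)
    first_name = Column(String(255))
    last_name = Column(String(255))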
If you're using async SQLAlchemy, you can reflect through run_sync (engine here is assumed to be an AsyncEngine created with create_async_engine):
from sqlalchemy import MetaData

metadata = MetaData()
async with engine.connect() as conn:
    await conn.run_sync(metadata.reflect, only=["harshit_table"])

# the reflected table is now available directly from the MetaData
harshit_table = metadata.tables["harshit_table"]
print("tables: ", harshit_table, type(harshit_table))
I'm quite new to this, but what worked for me was this (variables are declared in the original question)
USERS_TABLE_NEW = Table("users", META_DATA, autoload_with=CONN)

Does the Storm or SQLAlchemy ORM allow creating schemas from an existing database?

Considering Storm, a python ORM, I would like to automatically generate the schema for a (mysql) database. The home page states
"Storm works well with existing database schemas." ( https://storm.canonical.com/FrontPage ),
hence I was hoping not to have to create model classes. However, the 'getting started' tutorial ( https://storm.canonical.com/Tutorial ) suggests that a class, like the one below, needs to be created manually for each table, and each field needs to be manually specified:
class Person(object):
    __storm_table__ = "person"
    id = Int(primary=True)
    name = Unicode()
Alternatively, SQLAlchemy doesn't seem to support a reverse engineering feature either, but does need a schema like this one:
user = Table('user', metadata,
Column('user_id', Integer, primary_key = True),
Column('user_name', String(16), nullable = False),
Column('email_address', String(60)),
Column('password', String(20), nullable = False)
)
Of course, these classes/schemas make sense, since each table will likely represent some 'object of interest' and one can extend them with all kinds of functionality. However, they are tedious to create, and their (initial) content is straightforward when a database already exists.
One ORM that does allow for reverse engineering is:
http://docs.doctrine-project.org/en/2.0.x/reference/tools.html
Are there similar reverse engineering tools for Storm or SQLAlchemy or any python ORM or python database fancyfier?
I am not aware of how Storm manages this process, but you can certainly reflect tables in a database with SQLAlchemy. Below is a basic example using a SQL Server instance that I have access to at the moment.
AN ENTIRE DATABASE
>>> from sqlalchemy import create_engine, MetaData
>>> engine = create_engine('mssql+pyodbc://<username>:<password>@<host>/<database>') # replace <username> with user name etc.
>>> meta = MetaData()
>>> meta.reflect(bind=engine)
>>> funds_table = meta.tables['funds'] # tables are stored in meta.tables dict
>>> funds_table # now stores database schema object
Table(u'funds', MetaData(bind=None), Column(u'fund_token', INTEGER(), table=<funds>, primary_key=True, nullable=False), Column(u'award_year_token', INTEGER(), ForeignKey(u'award_year_defn.award_year_token'), table=<funds>, nullable=False), ... Column(u'fin_aid_disclosure_category', VARCHAR(length=3, collation=u'SQL_Latin1_General_CP1_CI_AS'), table=<funds>), Column(u'report_as_additional_unsub', BIT(), table=<funds>, server_default=DefaultClause(<sqlalchemy.sql.expression.TextClause object at 0x000000000545B6D8>, for_update=False)), schema=None)
If you merely want to reflect one table at a time, you can use the following code instead.
ONE TABLE AT A TIME (much faster)
>>> from sqlalchemy import Table, create_engine, MetaData
>>> engine = create_engine('mssql+pyodbc://<username>:<password>@<host>/<database>')
>>> meta = MetaData()
>>> funds_table = Table('funds', meta, autoload=True, autoload_with=engine) # indicate table name (here 'funds') with a string passed to Table as the first argument
>>> funds_table # now stores database schema object
Table(u'funds', MetaData(bind=None), Column(u'fund_token', INTEGER(), table=<funds>, primary_key=True, nullable=False), Column(u'award_year_token', INTEGER(), ForeignKey(u'award_year_defn.award_year_token'), table=<funds>, nullable=False), ... Column(u'fin_aid_disclosure_category', VARCHAR(length=3, collation=u'SQL_Latin1_General_CP1_CI_AS'), table=<funds>), Column(u'report_as_additional_unsub', BIT(), table=<funds>, server_default=DefaultClause(<sqlalchemy.sql.expression.TextClause object at 0x000000000545B6D8>, for_update=False)), schema=None)
As you can probably imagine, you can then save the reflected table metadata so that you can access the tables more quickly again in the future.
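For example, the reflected MetaData can be pickled to disk and loaded back later, so you don't pay for reflection on every start (a sketch; the file name is arbitrary):
import pickle
from sqlalchemy import MetaData, create_engine

engine = create_engine('mssql+pyodbc://<username>:<password>@<host>/<database>')

meta = MetaData()
meta.reflect(bind=engine)

# Save the reflected schema once...
with open('schema.pickle', 'wb') as f:
    pickle.dump(meta, f)

# ...then reuse it later without hitting the database for reflection.
with open('schema.pickle', 'rb') as f:
    meta = pickle.load(f)
funds_table = meta.tables['funds']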

Creating seed data in a flask-migrate or alembic migration

How can I insert some seed data in my first migration? If the migration is not the best place for this, then what is the best practice?
"""empty message
Revision ID: 384cfaaaa0be
Revises: None
Create Date: 2013-10-11 16:36:34.696069
"""
# revision identifiers, used by Alembic.
revision = '384cfaaaa0be'
down_revision = None
from alembic import op
import sqlalchemy as sa
def upgrade():
    ### commands auto generated by Alembic - please adjust! ###
    op.create_table('list_type',
        sa.Column('id', sa.Integer(), nullable=False),
        sa.Column('name', sa.String(length=80), nullable=False),
        sa.PrimaryKeyConstraint('id'),
        sa.UniqueConstraint('name')
    )
    op.create_table('job',
        sa.Column('id', sa.Integer(), nullable=False),
        sa.Column('list_type_id', sa.Integer(), nullable=False),
        sa.Column('record_count', sa.Integer(), nullable=False),
        sa.Column('status', sa.Integer(), nullable=False),
        sa.Column('sf_job_id', sa.Integer(), nullable=False),
        sa.Column('created_at', sa.DateTime(), nullable=False),
        sa.Column('compressed_csv', sa.LargeBinary(), nullable=True),
        sa.ForeignKeyConstraint(['list_type_id'], ['list_type.id'], ),
        sa.PrimaryKeyConstraint('id')
    )
    ### end Alembic commands ###
    # ==> INSERT SEED DATA HERE <==

def downgrade():
    ### commands auto generated by Alembic - please adjust! ###
    op.drop_table('job')
    op.drop_table('list_type')
    ### end Alembic commands ###
Alembic has, as one of its operations, bulk_insert(). The documentation gives the following example (with some fixes I've included):
from datetime import date
from sqlalchemy.sql import table, column
from sqlalchemy import String, Integer, Date
from alembic import op
# Create an ad-hoc table to use for the insert statement.
accounts_table = table('account',
column('id', Integer),
column('name', String),
column('create_date', Date)
)
op.bulk_insert(accounts_table,
[
{'id':1, 'name':'John Smith',
'create_date':date(2010, 10, 5)},
{'id':2, 'name':'Ed Williams',
'create_date':date(2007, 5, 27)},
{'id':3, 'name':'Wendy Jones',
'create_date':date(2008, 8, 15)},
]
)
Note too that Alembic has an execute() operation, which is just like the normal execute() function in SQLAlchemy: you can run any SQL you wish, as the documentation example shows:
from sqlalchemy.sql import table, column
from sqlalchemy import String
from alembic import op
account = table('account',
column('name', String)
)
op.execute(
account.update().\
where(account.c.name==op.inline_literal('account 1')).\
values({'name':op.inline_literal('account 2')})
)
Notice that the table used to build the metadata for the update statement is defined directly in the migration script. This might seem like it breaks DRY (isn't the table already defined in your application?), but it is actually quite necessary. If you were to use the table or model definition that is part of your application, you would break this migration when you make changes to your table/model in your application. Your migration scripts should be set in stone: a change to a future version of your models should not change migration scripts. Using the application models would mean that the definitions change depending on what version of the models you have checked out (most likely the latest). Therefore, you need the table definition to be self-contained in the migration script.
Another thing to consider is whether you should put your seed data into a script that runs as its own command (such as a Flask-Script command, as shown in the other answer). This can work, but you should be careful about it. If the data you're loading is test data, that's one thing. But I've understood "seed data" to mean data that is required for the application to work correctly; for example, records for "admin" and "user" in the "roles" table. This data SHOULD be inserted as part of the migrations. Remember that a script will only work with the latest version of your database, whereas a migration works with the specific version that you are migrating to or from. If you wanted a script to load the roles info, you would need a script for every version of the database that has a different schema for the "roles" table.
Also, by relying on a script, you would make it more difficult to run it between migrations (say migration 3->4 requires that the seed data from the initial migration already be in the database). You would then need to modify Alembic's default way of running in order to run these scripts. And that still ignores the problem that these scripts would have to change over time, and who knows which version of your application you will have checked out from source control.
Migrations should be limited to schema changes only. Not only that, it is important that when a migration is applied, up or down, data that already existed in the database is preserved as much as possible. Inserting seed data as part of a migration may mess up pre-existing data.
As most things with Flask, you can implement this in many ways. Adding a new command to Flask-Script is a good way to do this, in my opinion. For example:
@manager.command
def seed():
    "Add seed data to the database."
    db.session.add(...)
    db.session.commit()
So then you run:
python manager.py seed
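If you are on a newer Flask without Flask-Script, the equivalent with Flask's built-in CLI would look roughly like this (app, db and the ListType model are assumptions mirroring the examples above); run it with flask seed:
# cli.py
from myapp import app, db            # assumed application and SQLAlchemy objects
from myapp.models import ListType    # assumed model for the list_type table

@app.cli.command("seed")
def seed():
    """Add seed data to the database."""
    db.session.add(ListType(name="best list"))
    db.session.add(ListType(name="bester list"))
    db.session.commit()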
MarkHildreth has supplied an excellent explanation of how Alembic can handle this. However, the question was specifically about how to modify a Flask-Migrate migration script. I'm going to post an answer to that below to save people the time of having to look into Alembic at all.
Warning
Miguel's answer is accurate with respect to normal database information. That is to say, one should follow his advice and absolutely not use this approach to populate a database with "normal" rows. This approach is specifically for database rows which are required for the application to function, a kind of data which I think of as "seed" data.
OP's script modified to seed data:
"""empty message
Revision ID: 384cfaaaa0be
Revises: None
Create Date: 2013-10-11 16:36:34.696069
"""
# revision identifiers, used by Alembic.
revision = '384cfaaaa0be'
down_revision = None
from alembic import op
import sqlalchemy as sa
def upgrade():
    ### commands auto generated by Alembic - please adjust! ###
    list_type_table = op.create_table('list_type',
        sa.Column('id', sa.Integer(), nullable=False),
        sa.Column('name', sa.String(length=80), nullable=False),
        sa.PrimaryKeyConstraint('id'),
        sa.UniqueConstraint('name')
    )
    op.create_table('job',
        sa.Column('id', sa.Integer(), nullable=False),
        sa.Column('list_type_id', sa.Integer(), nullable=False),
        sa.Column('record_count', sa.Integer(), nullable=False),
        sa.Column('status', sa.Integer(), nullable=False),
        sa.Column('sf_job_id', sa.Integer(), nullable=False),
        sa.Column('created_at', sa.DateTime(), nullable=False),
        sa.Column('compressed_csv', sa.LargeBinary(), nullable=True),
        sa.ForeignKeyConstraint(['list_type_id'], ['list_type.id'], ),
        sa.PrimaryKeyConstraint('id')
    )
    ### end Alembic commands ###
    op.bulk_insert(
        list_type_table,
        [
            {'name': 'best list'},
            {'name': 'bester list'}
        ]
    )

def downgrade():
    ### commands auto generated by Alembic - please adjust! ###
    op.drop_table('job')
    op.drop_table('list_type')
    ### end Alembic commands ###
Context for those new to flask_migrate
Flask-Migrate generates migration scripts at migrations/versions. These scripts are run in sequence on a database to bring it up to the latest version. The OP includes an example of one of these auto-generated migration scripts. In order to add seed data, one must manually modify the appropriate auto-generated migration file. The code I have posted above is an example of that.
What changed?
Very little. You will note that in the new file I am storing the table returned from create_table for list_type in a variable called list_type_table. We then operate on that table using op.bulk_insert to create a few example rows.
You can also use Python's faker library, which may be a bit quicker as you don't need to come up with any data yourself. One way of configuring it would be to put a seed method in the model class that you want to generate data for, as shown below.
from extensions import bcrypt, db

class User(db.Model):
    # this config is used by sqlalchemy to store model data in the database
    __tablename__ = 'users'
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(150))
    email = db.Column(db.String(100), unique=True)
    password = db.Column(db.String(100))

    def __init__(self, name, email, password):
        self.name = name
        self.email = email
        self.password = password

    @classmethod
    def seed(cls, fake):
        user = User(
            name=fake.name(),
            email=fake.email(),
            password=cls.encrypt_password(fake.password()),
        )
        user.save()

    @staticmethod
    def encrypt_password(password):
        return bcrypt.generate_password_hash(password).decode('utf-8')

    def save(self):
        db.session.add(self)
        db.session.commit()
And then implement a script that calls the seed method, which could look something like this:
from faker import Faker
from users.models import User
fake = Faker()
for _ in range(100):
    User.seed(fake)
If you prefer to have a separate function to seed your data, you could do something like this:
from alembic import op
import sqlalchemy as sa
from models import User
def upgrade():
    op.create_table('users',
        sa.Column('id', sa.Integer(), nullable=False),
        sa.Column('name', sa.String(length=80), nullable=False),
        sa.PrimaryKeyConstraint('id'),
        sa.UniqueConstraint('name')
    )
    # data seed
    seed()

def seed():
    op.bulk_insert(User.__table__,
        [
            {'name': 'user1'},
            {'name': 'user2'},
            ...
        ]
    )
This way, you don't need to save the return of create_table into a separate variable to then pass it on to bulk_insert.
