I am currently using SQLAlchemy ORM to handle my db operations. Now I have a SQL command which requires ON CONFLICT (id) DO UPDATE. The method on_conflict_do_update() seems to be the correct one to use. But the post here says the code has to switch to SQLAlchemy Core and the high-level ORM functionality is lost. I am confused by this statement, since I think code like the demo below can achieve what I want while keeping the functionality of the SQLAlchemy ORM.
from sqlalchemy.dialects.postgresql import insert

class Foo(Base):
    ...
    bar = Column(Integer)

foo = Foo(bar=1)
insert_stmt = insert(Foo).values(bar=foo.bar)
do_update_stmt = insert_stmt.on_conflict_do_update(
    set_=dict(
        bar=insert_stmt.excluded.bar,
    )
)
session.execute(do_update_stmt)
I haven't tested it on my project yet, since it would require a huge amount of modification. Can I ask if this is the correct way to handle ON CONFLICT (id) DO UPDATE with the SQLAlchemy ORM?
As noted in the documentation, the constraint= argument is
The name of a unique or exclusion constraint on the table, or the constraint object itself if it has a .name attribute.
so we need to pass the name of the PK constraint to .on_conflict_do_update().
We can get the PK constraint name via the inspection interface:
from sqlalchemy import inspect
from sqlalchemy.dialects.postgresql import insert
from sqlalchemy.orm import Session

insp = inspect(engine)
pk_constraint_name = insp.get_pk_constraint(Foo.__tablename__)["name"]
print(pk_constraint_name)  # tbl_foo_pkey

new_bar = 123
insert_stmt = insert(Foo).values(id=12345, bar=new_bar)
do_update_stmt = insert_stmt.on_conflict_do_update(
    constraint=pk_constraint_name, set_=dict(bar=new_bar)
)
with Session(engine) as session, session.begin():
    session.execute(do_update_stmt)
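For completeness, a minimal alternative sketch (reusing the Foo model, engine and new_bar from above): instead of looking up the constraint name, the PostgreSQL insert construct also accepts index_elements=, which names the conflicting column(s) directly.

# Alternative sketch: target the conflict column instead of the constraint name.
do_update_stmt = insert_stmt.on_conflict_do_update(
    index_elements=['id'], set_=dict(bar=new_bar)
)
with Session(engine) as session, session.begin():
    session.execute(do_update_stmt)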
Let's say I have a simple table. In raw SQL it looks like this:
CREATE TABLE events (id INT PRIMARY KEY) WITH (fillfactor=60);
My question is how to specify fillfactor for a table using sqlalchemy declarative base?
Of course I can achieve that by running raw SQL in SQLAlchemy, e.g. execute("ALTER TABLE events SET (fillfactor=60)"), but I'm interested in whether there is a way to do that using native SQLAlchemy tools.
I've tried the following approach, but it didn't work:
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class SimpleExampleTable(Base):
    __tablename__ = 'events'
    __table_args__ = {'comment': 'events table', "fillfactor": 60}

    id = Column(Integer, primary_key=True)
TypeError: Additional arguments should be named <dialectname>_<argument>, got 'fillfactor'
Looking through the documentation, I've only managed to find information about fillfactor usage for indexes.
My environment:
python 3.9
sqlalchemy 1.3.22
PostgreSQL 11.6
To be honest, I have the same question, and the only thing related to fillfactor that I found in the SQLAlchemy docs is the index option (link to docs):
PostgreSQL allows storage parameters to be set on indexes. The storage
parameters available depend on the index method used by the index.
Storage parameters can be specified on Index using the postgresql_with
keyword argument:
Index('my_index', my_table.c.data, postgresql_with={"fillfactor": 50})
But it seems there is no option to set the fillfactor directly on the table.
There is still the option of running a raw SQL query (as an alembic migration, say; see the sketch after the quoted note below):
ALTER TABLE mytable SET (fillfactor = 70);
Note that setting fillfactor on an existing table will not rearrange
the data, it will only apply to future inserts. But you can use
VACUUM to rewrite the table, which will respect the new fillfactor
setting.
The previous quote is taken from here
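As a rough sketch of the alembic-migration route mentioned above (the events table name comes from the question; everything else is ordinary alembic boilerplate):

from alembic import op

def upgrade():
    # Storage parameters only affect future writes; run VACUUM FULL separately
    # (outside the migration transaction) if existing rows should be rewritten.
    op.execute("ALTER TABLE events SET (fillfactor = 60)")

def downgrade():
    op.execute("ALTER TABLE events RESET (fillfactor)")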
Extending the answer from Max Kapustin, you can use an event listener to automatically execute the ALTER TABLE statement when the table is created.
import sqlalchemy as sa

engine = sa.create_engine('postgresql:///test', echo=True, future=True)

tablename = 't65741211'
tbl = sa.Table(
    tablename,
    sa.MetaData(),
    sa.Column('id', sa.Integer, primary_key=True),
    listeners=[
        (
            'after_create',
            sa.schema.DDL(
                f"""ALTER TABLE "{tablename}" SET (fillfactor = 70)"""
            ),
        )
    ],
)

tbl.drop(engine, checkfirst=True)
tbl.create(engine)
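If you are using the declarative model from the question rather than a plain Table, the same DDL can presumably be attached through the event system instead of the listeners= argument; a minimal sketch, assuming the SimpleExampleTable model and engine defined earlier:

import sqlalchemy as sa

# Attach the DDL to the model's underlying Table; it fires right after CREATE TABLE.
sa.event.listen(
    SimpleExampleTable.__table__,
    'after_create',
    sa.DDL('ALTER TABLE events SET (fillfactor = 60)'),
)

Base.metadata.create_all(engine)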
TL;DR: Django does not include database names in SQL queries; can I somehow force it to do this, or is there a workaround?
The long version:
I have two legacy MySQL databases (Note: I have no influence on the DB layout) for which I'm creating a readonly API using DRF on Django 1.11 and python 3.6
I'm working around the referential integrity limitation of MyISAM DBs by using the SpanningForeignKey field suggested here: https://stackoverflow.com/a/32078727/7933618
I'm trying to connect a table from DB1 to a table from DB2 via a ManyToMany through table on DB1. That's the query Django is creating:
SELECT "table_b"."id" FROM "table_b" INNER JOIN "throughtable" ON ("table_b"."id" = "throughtable"."b_id") WHERE "throughtable"."b_id" = 12345
Which of course gives me the error "Table 'DB2.throughtable' doesn't exist", because throughtable is on DB1 and I have no idea how to force Django to prefix the tables with the database name. The query should be:
SELECT table_b.id FROM DB2.table_b INNER JOIN DB1.throughtable ON (table_b.id = throughtable.b_id) WHERE throughtable.b_id = 12345
Models for app1 db1_app/models.py: (DB1)
class TableA(models.Model):
    id = models.AutoField(primary_key=True)
    # some other fields
    relations = models.ManyToManyField(TableB, through='Throughtable')

class Throughtable(models.Model):
    id = models.AutoField(primary_key=True)
    a_id = models.ForeignKey(TableA, to_field='id')
    b_id = SpanningForeignKey(TableB, db_constraint=False, to_field='id')
Models for app2 db2_app/models.py: (DB2)
class TableB(models.Model):
    id = models.AutoField(primary_key=True)
    # some other fields
Database router:
def db_for_read(self, model, **hints):
    if model._meta.app_label == 'db1_app':
        return 'DB1'
    if model._meta.app_label == 'db2_app':
        return 'DB2'
    return None
Can I force Django to include the database name in the query? Or is there any workaround for this?
A solution exists for Django 1.6+ (including 1.11) on the MySQL and sqlite backends: set ForeignKey.db_constraint=False and use an explicit Meta.db_table. If the database name and table name are quoted with a backtick ` (for MySQL) or with " (for other backends), e.g. db_table = '"db2"."table2"', Django does not quote them again and the dot ends up outside the quotes, so the ORM compiles valid queries. A slightly better variant of the same trick is db_table = 'db2"."table2' (it allows not only joins but is also one issue closer to cross-database constraint migration).
db2_name = settings.DATABASES['db2']['NAME']

class Table1(models.Model):
    fk = models.ForeignKey('Table2', on_delete=models.DO_NOTHING, db_constraint=False)

class Table2(models.Model):
    name = models.CharField(max_length=10)
    ....

    class Meta:
        db_table = '`%s`.`table2`' % db2_name  # for MySQL
        # db_table = '"db2"."table2"'          # for all other backends
        managed = False
Query set:
>>> qs = Table2.objects.all()
>>> str(qs.query)
'SELECT "DB2"."table2"."id" FROM DB2"."table2"'
>>> qs = Table1.objects.filter(fk__name='B')
>>> str(qs.query)
SELECT "app_table1"."id"
FROM "app_table1"
INNER JOIN "db2"."app_table2" ON ( "app_table1"."fk_id" = "db2"."app_table2"."id" )
WHERE "db2"."app_table2"."b" = 'B'
Queries written this way are parsed by all database backends in Django; the other necessary steps, however, have to be discussed backend by backend. I'm trying to answer more generally because I found a similar important question.
The db_constraint option is necessary for migrations, because Django cannot create the referential integrity constraint
ADD foreign key table1(fk_id) REFERENCES db2.table2(id),
but for MySQL it can be created manually.
A question for particular backends is whether another database can be connected to the default one at run time and whether cross-database foreign keys are supported. These models are also writable. The indirectly connected database should be used as a legacy database with managed=False (because only one django_migrations table for migration tracking is created, and only in the directly connected database; that table should describe only tables in the same database). Indexes for foreign keys can, however, be created automatically on the managed side if the database system supports such indexes.
Sqlite3: The other database has to be attached to the default sqlite3 database at run time (see the answer SQLite - How do you join tables from different databases), ideally via the connection_created signal:
from django.db.backends.signals import connection_created

def signal_handler(sender, connection, **kwargs):
    if connection.alias == 'default' and connection.vendor == 'sqlite':
        cur = connection.cursor()
        cur.execute("attach '%s' as db2" % db2_name)
        # cur.execute("PRAGMA foreign_keys = ON")  # optional

connection_created.connect(signal_handler)
Then no database router is needed, of course, and a normal django...ForeignKey can be used with db_constraint=False. An advantage is that db_table is not necessary if the table names are unique across databases.
In MySQL, foreign keys between different databases are easy. All commands like SELECT, INSERT and DELETE support any database names without attaching them first.
This question was about legacy databases. I have however some interesting results also with migrations.
I have a similar setup with PostgreSQL, utilizing search_path to make cross-schema references possible in Django (a schema in Postgres corresponds to a database in MySQL). Unfortunately, it seems MySQL doesn't have such a mechanism.
However, you might try your luck creating views for it: make views in one database that reference the other database, and use them to select data. I think it's the best option since you want your data read-only anyway.
It's not a perfect solution, however; executing raw queries might be more useful in some cases.
UPD: Providing more details about my setup with PostgreSQL (as requested by the bounty later). I couldn't find anything like search_path in the MySQL documentation.
Quick intro
PostgreSQL has schemas. They are synonymous with MySQL databases, so if you are a MySQL user, imaginatively replace the word "schema" with the word "database". Queries can join tables between schemas, create foreign keys, etc... Each user (role) has a search_path:
This variable [search_path] specifies the order in which schemas are searched when
an object (table, data type, function, etc.) is referenced by a simple
name with no schema specified.
Special attention on "no schema specified", because that's exactly what Django does.
Example: Legacy databases
Let's say we have a couple of legacy schemas, and since we are not allowed to modify them, we also want one new schema to store the NM relation in it.
old1 is the first legacy schema, it has old1_table (which is also the model name, for convenience sake)
old2 is the second legacy schema, it has old2_table
django_schema is a new one, it will store the required NM relation
All we need to do is:
alter role django_user set search_path = django_schema, old1, old2;
That's it. Yes, it's that simple. Django has no schema ("database") names specified anywhere; Django actually has no idea what is going on, everything is managed by PostgreSQL behind the scenes. Since django_schema is first in the list, new tables will be created there. So the following code ->
class Throughtable(models.Model):
    a_id = models.ForeignKey('old1_table', ...)
    b_id = models.ForeignKey('old2_table', ...)
-> will result in a migration that creates table throughtable that references old1_table and old2_table.
Problems: if you happen to have several tables with the same name, you will either need to rename them or still trick Django into using a dot inside table names.
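A related sketch of the same search_path idea, applied in Django settings instead of ALTER ROLE (assuming the psycopg2 PostgreSQL backend; the schema names are the hypothetical ones from the example above):

# settings.py (sketch): pass search_path through libpq's "options" parameter,
# so every connection Django opens resolves unqualified names in this order.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'mydb',  # hypothetical database name
        'OPTIONS': {
            'options': '-c search_path=django_schema,old1,old2',
        },
    },
}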
Django does have the ability to work with multiple databases. See https://docs.djangoproject.com/en/1.11/topics/db/multi-db/.
You can also use raw SQL queries in Django. See https://docs.djangoproject.com/en/1.11/topics/db/sql/.
I'm starting a new application and looking at using an ORM -- in particular, SQLAlchemy.
Say I've got a column 'foo' in my database and I want to increment it. In straight sqlite, this is easy:
db = sqlite3.connect('mydata.sqlitedb')
cur = db.cursor()
cur.execute('update stuff set foo = foo + 1')
I figured out the SQLAlchemy SQL-builder equivalent:
engine = sqlalchemy.create_engine('sqlite:///mydata.sqlitedb')
md = sqlalchemy.MetaData(engine)
table = sqlalchemy.Table('stuff', md, autoload=True)
upd = table.update(values={table.c.foo:table.c.foo+1})
engine.execute(upd)
This is slightly slower, but there's not much in it.
Here's my best guess for a SQLAlchemy ORM approach:
# snip definition of Stuff class made using declarative_base
# snip creation of session object
for c in session.query(Stuff):
c.foo = c.foo + 1
session.flush()
session.commit()
This does the right thing, but it takes just under fifty times as long as the other two approaches. I presume that's because it has to bring all the data into memory before it can work with it.
Is there any way to generate the efficient SQL using SQLAlchemy's ORM? Or using any other python ORM? Or should I just go back to writing the SQL by hand?
SQLAlchemy's ORM is meant to be used together with the SQL layer, not hide it. But you do have to keep one or two things in mind when using the ORM and plain SQL in the same transaction. Basically, from one side, ORM data modifications will only hit the database when you flush the changes from your session. From the other side, SQL data manipulation statements don't affect the objects that are in your session.
So if you say
for c in session.query(Stuff).all():
c.foo = c.foo+1
session.commit()
it will do what it says: fetch all the objects from the database, modify them all, and then, when it's time to flush the changes to the database, update the rows one by one.
Instead you should do this:
session.execute(update(stuff_table, values={stuff_table.c.foo: stuff_table.c.foo + 1}))
session.commit()
This will execute as one query as you would expect, and because at least the default session configuration expires all data in the session on commit you don't have any stale data issues.
In the almost-released 0.5 series you could also use this method for updating:
session.query(Stuff).update({Stuff.foo: Stuff.foo + 1})
session.commit()
That will basically run the same SQL statement as the previous snippet, but also select the changed rows and expire any stale data in the session. If you know you aren't using any session data after the update you could also add synchronize_session=False to the update statement and get rid of that select.
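For reference, a minimal sketch of that synchronize_session=False variant (same Stuff model assumed):

# Skips the post-update synchronization of in-session objects; only appropriate
# if you don't read those objects again before they are expired/committed.
session.query(Stuff).update(
    {Stuff.foo: Stuff.foo + 1}, synchronize_session=False
)
session.commit()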
session.query(Clients).filter(Clients.id.in_(client_id_list)).update({'status': status})
session.commit()
Try this =)
There are several ways to UPDATE using sqlalchemy:
1) for c in session.query(Stuff).all():
       c.foo += 1
   session.commit()
2) session.query(Stuff).update({"foo": Stuff.foo + 1})
   session.commit()
3) conn = engine.connect()
   table = Stuff.__table__
   stmt = table.update().values({'foo': Stuff.foo + 1})
   conn.execute(stmt)
   conn.commit()
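In newer SQLAlchemy (1.4+/2.0) there is also an ORM-enabled UPDATE that issues a single statement; a minimal sketch, assuming the same Stuff model and engine:

from sqlalchemy import update
from sqlalchemy.orm import Session

# 4) 2.0-style ORM-enabled UPDATE: one UPDATE statement, no rows loaded into memory.
with Session(engine) as session:
    session.execute(update(Stuff).values(foo=Stuff.foo + 1))
    session.commit()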
Here's an example of how to solve the same problem without having to map the fields manually:
from sqlalchemy import Column, ForeignKey, Integer, String, Date, DateTime, text, create_engine
from sqlalchemy.exc import IntegrityError
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy.orm.attributes import InstrumentedAttribute

engine = create_engine('postgresql://postgres@localhost:5432/database')
session = sessionmaker()
session.configure(bind=engine)
Base = declarative_base()

class Media(Base):
    __tablename__ = 'media'

    id = Column(Integer, primary_key=True)
    title = Column(String, nullable=False)
    slug = Column(String, nullable=False)
    type = Column(String, nullable=False)

    def update(self):
        s = session()
        mapped_values = {}
        for field_name, field_type in Media.__dict__.items():
            # collect only mapped columns, skipping methods and dunder attributes
            if isinstance(field_type, InstrumentedAttribute):
                mapped_values[field_name] = getattr(self, field_name)
        s.query(Media).filter(Media.id == self.id).update(mapped_values)
        s.commit()
So to update a Media instance, you can do something like this:
media = Media(id=123, title="Titular Line", slug="titular-line", type="movie")
media.update()
If it is because of the overhead in terms of creating objects, then it probably can't be sped up at all with SA.
If it is because it is loading up related objects, then you might be able to do something with lazy loading. Are there lots of objects being created due to references? (IE, getting a Company object also gets all of the related People objects).
Without testing, I'd try:
for c in session.query(Stuff).all():
    c.foo = c.foo + 1
session.commit()
(IIRC, commit() works without flush()).
I've found that at times doing a large query and then iterating in python can be up to 2 orders of magnitude faster than lots of queries. I assume that iterating over the query object is less efficient than iterating over a list generated by the all() method of the query object.
[Please note comment below - this did not speed things up at all].
I have a SQLAlchemy ORM model that currently looks a bit like this:
Base = declarative_base()

class Database(Base):
    __tablename__ = "databases"
    __table_args__ = (
        saschema.PrimaryKeyConstraint('db', 'role'),
        {
            'schema': 'defines',
        },
    )

    db = Column(String, nullable=False)
    role = Column(String, nullable=False)
    server = Column(String)
Here's the thing: in practice this model exists in multiple databases, and in those databases it'll exist in multiple schemas. For any one operation I'll only use one (database, schema) tuple.
Right now, I can set the database engine using this:
Session = scoped_session(sessionmaker())
Session.configure(bind=my_db_engine)
# ... do my operations on the model here.
But I'm not sure how I can change the __table_args__ at execution time so that the schema will be the right one.
One option is to use create_engine to bind the models to a schema/database at connection time rather than doing so in the model definition.
# first connect to the database that holds the DB and schema
engine1 = create_engine('mysql://user:pass@db.com/schema1')
Session = sessionmaker()
session = Session(bind=engine1)

# fetch the first database
database = session.query(Database).first()

engine2 = create_engine('mysql://user:pass@%s/%s' % (database.server, database.db))
session2 = Session(bind=engine2)
I don't know that this is ideal, but it is one way to do it. If you cache the list of databases before hand then in most cases you are only having to create one session.
I'm also looking for an elegant solution for this problem. If there are standard tables/models, but different table names, databases, and schemas at runtime, how is this handled? Others have suggested writing some sort of function that takes a tablename and schema argument, and constructs the model for you. I've found that using __abstract__ helps. I've suggested a solution here that may be useful. It involves adding a Base with a specific schema/metadata into the inheritance.
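Not mentioned in the answers above, but worth a hedged sketch: newer SQLAlchemy (1.1+) offers schema_translate_map, which keeps the declared 'defines' schema on the model and remaps it per engine or connection at run time (the connection URL and target schema below are hypothetical):

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

# 'defines' is the schema declared in __table_args__ on the Database model above;
# remap it to whatever schema the current operation needs.
engine = create_engine('postgresql://user:pass@host/somedb')
runtime_engine = engine.execution_options(
    schema_translate_map={'defines': 'runtime_schema'}
)

Session = scoped_session(sessionmaker())
Session.configure(bind=runtime_engine)
# Queries against Database now compile with "runtime_schema" instead of "defines".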