I'm trying to implement dynamic filtering using SQLAlchemy ORM.
I was looking through Stack Overflow and found a very similar question: SQLAlchemy dynamic filter_by.
It's useful for me, but not quite enough.
So here is an example of the code I'm trying to write:
# engine - MySQL engine
session_maker = sessionmaker(bind=engine)
session = session_maker()

# my custom model
model = User

def get_query(session, filters):
    if type(filters) == tuple:
        query = session.query(model).filter(*filters)
    elif type(filters) == dict:
        query = session.query(model).filter(**filters)
    return query
Then I try to reuse it with something very similar:
filters = (User.name == 'Johny')
get_query(session, filters)  # it works just fine
filters = {'name': 'Johny'}
get_query(session, filters)
After the second run, there are some issues:
TypeError: filter() got an unexpected keyword argument 'name'
When I try to change my filters to:
filters = {User.name: 'Johny'}
it returns:
TypeError: filter() keywords must be strings
But it works fine for manual querying:
session.query(User).filter(User.name == 'Johny')
What is wrong with my filters?
BTW, it looks like it works fine for this case:
filters = {'name':'Johny'}
s.query(User).filter_by(**filters)
But following the recommendations from the mentioned post, I'm trying to use just filter.
If it's only possible to use filter_by here instead of filter, are there any differences between these two methods?
Your problem is that filter_by takes keyword arguments, but filter takes expressions. So expanding a dict with **mydict works for filter_by. With filter, you normally pass a single argument that happens to be an expression, so when you expand your **filters dict into filter, you pass filter a bunch of keyword arguments that it doesn't understand.
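A minimal illustration of the difference, reusing the User model from the question:

# filter_by: keyword arguments matched against column names
session.query(User).filter_by(name='Johny')

# filter: a full SQL expression built from the model's column attribute
session.query(User).filter(User.name == 'Johny')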
If you want to build up a set of filters from a dict of stored filter args, you can use the generative nature of the query to keep applying filters. For example:
# assuming a model class, User, with attributes name_last and name_first
my_filters = {'name_last': 'Duncan', 'name_first': 'Iain'}
query = session.query(User)
for attr, value in my_filters.items():  # use iteritems() on Python 2
    query = query.filter(getattr(User, attr) == value)
# now we can run the query
results = query.all()
The great thing about the above pattern is that you can use it across multiple joined columns, you can construct 'ands' and 'ors' with and_ and or_, and you can do <= or date comparisons, whatever you need. It's much more flexible than using filter_by with keywords. The only caveat is that for joins you have to be a bit careful not to accidentally join a table twice, and you might have to specify the join condition for complex filtering. I use this in some very complex filtering over a pretty involved domain model and it works like a charm; I just keep a dict of entities_joined to keep track of the joins.
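For example, a sketch of mixing and_/or_ with a non-equality comparison in the same pattern (the signup_date column is an assumption, added purely for illustration):

from datetime import date
from sqlalchemy import and_, or_

# signup_date is a hypothetical column, used only to show a date comparison
query = session.query(User).filter(
    and_(
        User.name_last == 'Duncan',
        or_(
            User.name_first == 'Iain',
            User.signup_date <= date(2020, 1, 1),
        ),
    )
)
results = query.all()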
I had a similar issue when I tried to filter from a dictionary:
filters = {"field": "value"}
Wrong:
...query(MyModel).filter(**filters).all()
Good:
...query(MyModel).filter_by(**filters).all()
FWIW, there's a Python library designed to solve this exact problem: sqlalchemy-filters.
It allows you to filter dynamically using all operators, not only ==.
from sqlalchemy_filters import apply_filters
# `query` should be a SQLAlchemy query object
filter_spec = [{'field': 'name', 'op': '==', 'value': 'name_1'}]
filtered_query = apply_filters(query, filter_spec)
more_filters = [{'field': 'foo_id', 'op': 'is_not_null'}]
filtered_query = apply_filters(filtered_query, more_filters)
result = filtered_query.all()
For people using FastAPI and SQLAlchemy, here is an example of dynamic filtering:
api/app/app/crud/order.py
from typing import Optional

from pydantic import UUID4
from sqlalchemy.orm import Session

from app.crud.base import CRUDBase
from app.models.order import Order
from app.schemas.order import OrderCreate, OrderUpdate


class CRUDOrder(CRUDBase[Order, OrderCreate, OrderUpdate]):

    def get_orders(
        self,
        db: Session,
        owner_id: UUID4,
        status: str,
        trading_type: str,
        pair: str,
        skip: int = 0,
        limit: int = 100,
    ) -> Optional[Order]:
        filters = {
            arg: value
            for arg, value in locals().items()
            if arg != "self" and arg != "db" and arg != "skip" and arg != "limit" and value is not None
        }
        query = db.query(self.model)
        for attr, value in filters.items():
            query = query.filter(getattr(self.model, attr) == value)
        return (
            query
            .offset(skip)
            .limit(limit)
            .all()
        )


order = CRUDOrder(Order)
class Place(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    search_id = db.Column(db.Integer, db.ForeignKey('search.id'), nullable=False)

    @classmethod
    def dynamic_filter(model_class, filter_condition):
        '''
        Return a filtered queryset based on the given conditions.
        :param filter_condition: a list of tuples, i.e. [(key, operator, value)]
        operator list:
            eq for ==
            lt for <
            ge for >=
            in for in_
            like for like
        value could be a list or a string
        :return: queryset
        '''
        __query = db.session.query(model_class)
        for raw in filter_condition:
            try:
                key, op, value = raw
            except ValueError:
                raise Exception('Invalid filter: %s' % raw)
            column = getattr(model_class, key, None)
            if not column:
                raise Exception('Invalid filter column: %s' % key)
            if op == 'in':
                if isinstance(value, list):
                    filt = column.in_(value)
                else:
                    filt = column.in_(value.split(','))
            else:
                try:
                    attr = list(filter(lambda e: hasattr(column, e % op), ['%s', '%s_', '__%s__']))[0] % op
                except IndexError:
                    raise Exception('Invalid filter operator: %s' % op)
                if value == 'null':
                    value = None
                filt = getattr(column, attr)(value)
            __query = __query.filter(filt)
        return __query
Execute like:
places = Place.dynamic_filter([('search_id', 'eq', 1)]).all()
I have the following API call to tda-api:
orders = client.get_account(config.account_id, fields=['positions'])
It gives the error:
File "/opt/anaconda3/lib/python3.7/site-packages/tda/client/base.py", line 361, in get_account
fields = self.convert_enum_iterable(fields, self.Account.Fields)
File "/opt/anaconda3/lib/python3.7/site-packages/tda/utils.py", line 66, in convert_enum_iterable
self.type_error(value, required_enum_type)
File "/opt/anaconda3/lib/python3.7/site-packages/tda/utils.py", line 41, in type_error
possible_members_message))
ValueError: expected type "Fields", got type "str". (initialize with enforce_enums=False to disable this checking)
The documentation reads:
Client.get_account(account_id, *, fields=None)
And if I replace it with:
client.get_account(config.account_id, fields=positions)
I get: 'positions' is not defined
And if I look into the API, the code for the get_account() function looks like:
class Fields(Enum):
    '''Account fields passed to :meth:`get_account` and
    :meth:`get_accounts`'''
    POSITIONS = 'positions'
    ORDERS = 'orders'


def get_account(self, account_id, *, fields=None):
    fields = self.convert_enum_iterable(fields, self.Account.Fields)
    params = {}
    if fields:
        params['fields'] = ','.join(fields)
Most likely not the correct way to go about this, but I found a fix for now.
I commented out the following type check in the utils.py file of the tda package.
def convert_enum_iterable(self, iterable, required_enum_type):
    if iterable is None:
        return None

    if isinstance(iterable, required_enum_type):
        return [iterable.value]

    values = []
    for value in iterable:
        if isinstance(value, required_enum_type):
            values.append(value.value)
        # elif self.enforce_enums:
        #     self.type_error(value, required_enum_type)
        else:
            values.append(value)
    return values
I believe that if you create your own client you can set enforce_enums to False as well.
And then I am able to use the following to get my current positions:
data = client.get_account(config.account_id,fields=['positions'])
At one point, I had these working but haven't tried in a while. I do recall it being difficult to decipher what a field type was. Depending on your import statements, you might need the entire tda.client.Client.Account.Fields.POSITIONS, for example.
r_acct_orders = client.get_account(config.ACCOUNT_ID, fields=tda.client.Client.Account.Fields.ORDERS).json() # field options None, ORDERS, POSITIONS
r = client.get_account(config.ACCOUNT_ID, fields=tda.client.Client.Account.Fields.POSITIONS).json() # field options None, ORDERS, POSITIONS
r = client.get_accounts(fields=tda.client.Client.Account.Fields.ORDERS).json() # field options None, ORDERS, POSITIONS
r = client.get_accounts(fields=None).json() # field options None, ORDERS, POSITIONS
Also, I use a config.py for account_id but you can just use your syntax as needed.
A simple query looks like this:
User.query.filter(User.name == 'admin')
In my code, I need to check the parameters that are being passed and then filter the results from the database based on the parameter.
For example, if the User table contains columns like username, location and email, the request parameters can contain any one of them or a combination of columns. Instead of checking each parameter as shown below and chaining the filters, I'd like to build one dynamic set of criteria that can be passed to a single filter call to get the results back. I'd like to create a separate function that evaluates all the parameters and generates those criteria; once they are generated, I can pass them along and get the desired result. I want to avoid using a raw SQL query, as that defeats the purpose of using an ORM.
if location:
    User.query.filter(User.name == 'admin', User.location == location)
elif email:
    User.query.filter(User.email == email)
You can apply filter to the query repeatedly:
query = User.query
if location:
    query = query.filter(User.location == location)
if email:
    query = query.filter(User.email == email)
If you only need exact matches, there’s also filter_by:
criteria = {}
# If you already have a dict, there are easier ways to get a subset of its keys
if location: criteria['location'] = location
if email: criteria['email'] = email
query = User.query.filter_by(**criteria)
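For instance, if the parameters already arrive in a dict (say, from request.args), one way to pull out just the allowed, non-empty keys might look like this (params and allowed are assumptions for illustration):

# params is assumed to be a plain dict of incoming request arguments
params = {'location': 'NYC', 'email': '', 'page': '2'}
allowed = {'location', 'email'}

criteria = {k: v for k, v in params.items() if k in allowed and v}
query = User.query.filter_by(**criteria)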
If you don’t like those for some reason, the best I can offer is this:
from sqlalchemy.sql.expression import and_

def get_query(table, lookups, form_data):
    conditions = [
        getattr(table, field_name) == form_data[field_name]
        for field_name in lookups if form_data[field_name]
    ]
    return table.query.filter(and_(*conditions))
get_query(User, ['location', 'email', ...], form_data)
Late to the party, but if anyone is still looking for an answer, sqlalchemy-json-querybuilder can be useful. It can be installed with:
pip install sqlalchemy-json-querybuilder
e.g.
filter_by = [{
    "field_name": "SomeModel.field1",
    "field_value": "somevalue",
    "operator": "contains"
}]
order_by = ['-SomeModel.field2']

results = Search(session, "pkg.models", (SomeModel,), filter_by=filter_by,
                 order_by=order_by, page=1, per_page=5).results
https://github.com/kolypto/py-mongosql/
MongoSQL is a query builder that uses JSON as the input.
Capable of:
Choosing which columns to load
Loading relationships
Filtering using complex conditions
Ordering
Pagination
Example:
{
  project: ['id', 'name'],   // Only fetch these columns
  sort: ['age+'],            // Sort by age, ascending
  filter: {
    // Filter condition
    sex: 'female',           // Girls
    age: { $gte: 18 },       // Age >= 18
  },
  join: ['user_profile'],    // Load the 'user_profile' relationship
  limit: 100,                // Display 100 per page
  skip: 10,                  // Skip first 10 rows
}
I have a table of equipment, and each piece of equipment has dates for each maintenance level. The user can select the maintenance level, so I have to adjust my SQLAlchemy query for each combination of maintenance levels chosen. For example:
SELECT * WHERE (equipment IN []) AND m_level1 = DATE AND m_level2 = DATE ...
So there can be a combination for each if condition, depending on the checkboxes. I used multiple strings to reach my goal, but I want to improve the query using SQLAlchemy.
I assume you are using the ORM.
In that case, the filter function returns a query object, so you can conditionally build up the query by doing something like:
query = Session.query(schema.Object).filter_by(attribute=value)
if condition:
    query = query.filter_by(condition_attr=condition_val)
if another_condition:
    query = query.filter_by(another=another_val)

# then finally execute it
results = query.all()
The signature filter(*criterion) means you can pass a tuple as its argument; @Wolph covers this in detail in SQLAlchemy dynamic filter_by.
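For example, unpacking a tuple of expressions into filter() (reusing the User model from the question above; the location column is an assumption):

# a tuple of expressions, unpacked into a single filter() call
filters = (User.name == 'Johny', User.location == 'NYC')
query = session.query(User).filter(*filters)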
If we speak of SQLAlchemy Core, there is another way:
from sqlalchemy import and_
filters = [table.c.col1 == filter1, table.c.col2 > filter2]
query = table.select().where(and_(*filters))
If you're trying to filter based on incoming form criteria:
form = request.form.to_dict()

filters = []
for col in form:
    sqlalchemybinaryexpression = (getattr(MODEL, col) == form[col])
    filters.append(sqlalchemybinaryexpression)

query = table.select().where(and_(*filters))
Where MODEL is your SQLAlchemy Model
Here is another take on this question; this one is a bit more robust, since it verifies that the field to be filtered actually exists on the model.
It also lets you encode an operator in the value itself instead of adding a new query parameter, e.g. ?foo=>1, ?foo=<1, ?foo=>=1, ?foo=<=1, ?foo=!1, ?foo=1, and finally a "between", which would look like ?foo=a,b.
from sqlalchemy.orm import class_mapper
import re

# input parameters
filter_by = {
    "column1": "!1",  # not equal to
    "column2": "1",   # equal to
    "column3": ">1",  # greater than, etc...
}

def computed_operator(column, v):
    if re.match(r"^!", v):
        """__ne__"""
        val = re.sub(r"!", "", v)
        return column.__ne__(val)
    if re.match(r">(?!=)", v):
        """__gt__"""
        val = re.sub(r">(?!=)", "", v)
        return column.__gt__(val)
    if re.match(r"<(?!=)", v):
        """__lt__"""
        val = re.sub(r"<(?!=)", "", v)
        return column.__lt__(val)
    if re.match(r">=", v):
        """__ge__"""
        val = re.sub(r">=", "", v)
        return column.__ge__(val)
    if re.match(r"<=", v):
        """__le__"""
        val = re.sub(r"<=", "", v)
        return column.__le__(val)
    if re.match(r"(\w*),(\w*)", v):
        """between"""
        a, b = re.split(r",", v)
        return column.between(a, b)
    """ default __eq__ """
    return column.__eq__(v)

query = Table.query
filters = []
for k, v in filter_by.items():
    mapper = class_mapper(Table)
    if not hasattr(mapper.columns, k):
        continue
    filters.append(computed_operator(mapper.columns[k], "{}".format(v)))
query = query.filter(*filters)
query.all()
Here is a solution that works for both AND and OR.
Just replace or_ with and_ in the code if you need that case:
from sqlalchemy import select, or_, and_

my_filters = set()  # <-- use a set to contain only unique values and avoid duplicates
if condition_1:
    my_filters.add(MySQLClass.id == some_id)
if condition_2:
    my_filters.add(MySQLClass.name == some_name)

fetched = db_session.execute(select(MySQLClass).where(or_(*my_filters))).scalars().all()
Playing with the new Google App Engine MapReduce library's filters for input_reader, I would like to know how I can filter by ndb.Key.
I read this post and I've played with datetime, string, int and float in filter tuples, but how can I filter by ndb.Key?
When I try to filter by a ndb.Key I get this error:
BadReaderParamsError: Expected Key, got u"Key('Clients', 406)"
Or this error:
TypeError: Key('Clients', 406) is not JSON serializable
I tried passing an ndb.Key object and the string representation of the ndb.Key.
Here are my two filters tuples:
Sample 1:
'input_reader': {
    'input_reader': 'mapreduce.input_readers.DatastoreInputReader',
    'entity_kind': 'model.Sales',
    'filters': [("client", "=", ndb.Key('Clients', 406))]
}
Sample 2:
'input_reader': {
    'input_reader': 'mapreduce.input_readers.DatastoreInputReader',
    'entity_kind': 'model.Sales',
    'filters': [("client", "=", "%s" % ndb.Key('Clients', 406))]
}
This is a bit tricky.
If you look at the code on Google Code you can see that mapreduce.model defines a JSON_DEFAULTS dict which determines the classes that get special-case handling in JSON serialization/deserialization: by default, just datetime. So, you can monkey-patch the ndb.Key class into there, and provide it with functions to do that serialization/deserialization - something like:
from mapreduce import model

def _JsonEncodeKey(o):
    """Json encode an ndb.Key object."""
    return {'key_string': o.urlsafe()}

def _JsonDecodeKey(d):
    """Json decode an ndb.Key object."""
    return ndb.Key(urlsafe=d['key_string'])

model.JSON_DEFAULTS[ndb.Key] = (_JsonEncodeKey, _JsonDecodeKey)
model._TYPE_IDS['Key'] = ndb.Key
You may also need to repeat those last two lines to patch mapreduce.lib.pipeline.util as well.
Also note that if you do this, you'll need to ensure it gets run on any instance that runs any part of a mapreduce. The easiest way to do this is to write a wrapper script that imports the above registration code as well as mapreduce.main.APP, and to override the mapreduce URL in your app.yaml so it points to your wrapper.
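A minimal sketch of such a wrapper (the module names here are assumptions, not part of the library):

# wrapper.py - hypothetical module; the registration code above is assumed
# to live in key_json_patch.py, so importing it installs the ndb.Key
# (de)serializers before any mapreduce handler runs.
import key_json_patch  # imported purely for its side effects

from mapreduce.main import APP

app = APP

Your app.yaml would then route the mapreduce URL to this wrapper's app instead of the library's own handler.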
Make your own input reader based on DatastoreInputReader, which knows how to decode key-based filters:
class DatastoreKeyInputReader(input_readers.DatastoreKeyInputReader):
    """Augment the base input reader to accommodate ReferenceProperty filters"""

    def __init__(self, *args, **kwargs):
        try:
            filters = kwargs['filters']
            decoded = []
            for f in filters:
                value = f[2]
                if isinstance(value, list):
                    value = db.Key.from_path(*value)
                decoded.append((f[0], f[1], value))
            kwargs['filters'] = decoded
        except KeyError:
            pass
        super(DatastoreKeyInputReader, self).__init__(*args, **kwargs)
Run this function on your filters before passing them in as options:
def encode_filters(filters):
    if filters is not None:
        encoded = []
        for f in filters:
            value = f[2]
            if isinstance(value, db.Model):
                value = value.key()
            if isinstance(value, db.Key):
                value = value.to_path()
            entry = (f[0], f[1], value)
            encoded.append(entry)
        filters = encoded
    return filters
Are you aware of the to_old_key() and from_old_key() methods?
I had the same problem and came up with a workaround using computed properties.
You can add a new ndb.ComputedProperty holding the key's id to your Sales model. Ids are just strings, so you won't have any JSON problems.
client_id = ndb.ComputedProperty(lambda self: self.client.id())
And then add that condition to your mapreduce query filters
'input_reader': {
    'input_reader': 'mapreduce.input_readers.DatastoreInputReader',
    'entity_kind': 'model.Sales',
    'filters': [("client_id", "=", '406')]
}
The only drawback is that computed properties are not indexed and stored until you call put(), so you will have to traverse all the Sales entities and save them:
for sale in Sales.query().fetch():
    sale.put()
Is there an elegant way to do an INSERT ... ON DUPLICATE KEY UPDATE in SQLAlchemy? I mean something with a syntax similar to inserter.insert().execute(list_of_dictionaries) ?
ON DUPLICATE KEY UPDATE post version-1.2 for MySQL
This functionality is now built into SQLAlchemy for MySQL only. somada141's answer below has the best solution:
https://stackoverflow.com/a/48373874/319066
ON DUPLICATE KEY UPDATE in the SQL statement
If you want the generated SQL to actually include ON DUPLICATE KEY UPDATE, the simplest way involves using a @compiles decorator.
The code (linked from a good thread on the subject on reddit) for an example can be found on GitHub:
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Insert

@compiles(Insert)
def append_string(insert, compiler, **kw):
    s = compiler.visit_insert(insert, **kw)
    if 'append_string' in insert.kwargs:
        return s + " " + insert.kwargs['append_string']
    return s

my_connection.execute(my_table.insert(append_string='ON DUPLICATE KEY UPDATE foo=foo'), my_values)
But note that with this approach you have to manually create the append_string. You could probably change the append_string function so that it automatically turns the insert string into one with ON DUPLICATE KEY UPDATE, but I'm not going to do that here out of laziness.
ON DUPLICATE KEY UPDATE functionality within the ORM
SQLAlchemy does not provide an interface to ON DUPLICATE KEY UPDATE or MERGE or any other similar functionality in its ORM layer. Nevertheless, it has the session.merge() function that can replicate the functionality only if the key in question is a primary key.
session.merge(ModelObject) first checks if a row with the same primary key value exists by sending a SELECT query (or by looking it up locally). If it does, it sets a flag somewhere indicating that ModelObject is in the database already, and that SQLAlchemy should use an UPDATE query. Note that merge is quite a bit more complicated than this, but it replicates the functionality well with primary keys.
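A minimal sketch of that primary-key behaviour (assuming a User model whose primary key is id):

# If a row with primary key 5 exists, this becomes an UPDATE; otherwise an INSERT
detached = User(id=5, name='Johny')
merged = session.merge(detached)  # merge returns the instance attached to the session
session.commit()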
But what if you want ON DUPLICATE KEY UPDATE functionality with a non-primary key (for example, another unique key)? Unfortunately, SQLAlchemy doesn't have any such function. Instead, you have to create something that resembles Django's get_or_create(). Another StackOverflow answer covers it, and I'll just paste a modified, working version of it here for convenience.
from sqlalchemy.sql.expression import ClauseElement

def get_or_create(session, model, defaults=None, **kwargs):
    instance = session.query(model).filter_by(**kwargs).first()
    if instance:
        return instance
    else:
        params = dict((k, v) for k, v in kwargs.items() if not isinstance(v, ClauseElement))
        if defaults:
            params.update(defaults)
        instance = model(**params)
        return instance
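Usage might look like this (the column names are assumptions):

# returns the existing row matching the kwargs, or a new, not-yet-persisted instance
user = get_or_create(session, User, defaults={'location': 'NYC'}, name='Johny')
session.add(user)
session.commit()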
I should mention that ever since the v1.2 release, SQLAlchemy 'core' has a built-in solution to the above, which can be seen here (snippet copied below):
from sqlalchemy.dialects.mysql import insert

insert_stmt = insert(my_table).values(
    id='some_existing_id',
    data='inserted value')

on_duplicate_key_stmt = insert_stmt.on_duplicate_key_update(
    data=insert_stmt.inserted.data,
    status='U'
)

conn.execute(on_duplicate_key_stmt)
Based on phsource's answer, and for the specific use-case of using MySQL and completely overriding the data for the same key without performing a DELETE statement, one can use the following @compiles decorated insert expression:
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Insert

@compiles(Insert)
def append_string(insert, compiler, **kw):
    s = compiler.visit_insert(insert, **kw)
    if insert.kwargs.get('on_duplicate_key_update'):
        fields = s[s.find("(") + 1:s.find(")")].replace(" ", "").split(",")
        generated_directive = ["{0}=VALUES({0})".format(field) for field in fields]
        return s + " ON DUPLICATE KEY UPDATE " + ",".join(generated_directive)
    return s
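Usage would presumably mirror the append_string example above; the flag name below is an assumption that simply matches the insert.kwargs check in the compiler hook:

# the on_duplicate_key_update kwarg only serves to trigger the compiler hook above
my_connection.execute(
    my_table.insert(on_duplicate_key_update=True),
    my_values,
)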
It depends on what you want. If you want to replace, then pass OR REPLACE in prefixes:
def bulk_insert(self, objects, table):
    # table: your table class; objects is a list of dictionaries [{col1: val1, col2: val2}]
    for counter, row in enumerate(objects):
        inserter = table.__table__.insert(prefixes=['OR IGNORE'], values=row)
        try:
            self.db.execute(inserter)
        except Exception as E:
            print(E)
        if counter % 100 == 0:
            self.db.commit()
    self.db.commit()
The commit interval can be changed to speed things up or slow them down.
My way
import typing
from datetime import datetime

from sqlalchemy.dialects import mysql


class MyRepository:

    def model(self):
        return MySqlAlchemyModel

    def upsert(self, data: typing.List[typing.Dict]):
        if not data:
            return

        model = self.model()
        if hasattr(model, 'created_at'):
            for item in data:
                item['created_at'] = datetime.now()

        stmt = mysql.insert(getattr(model, '__table__')).values(data)

        for_update = []
        for k, v in data[0].items():
            for_update.append(k)

        dup = {k: getattr(stmt.inserted, k) for k in for_update}
        stmt = stmt.on_duplicate_key_update(**dup)

        self.db.session.execute(stmt)
        self.db.session.commit()
Usage:
myrepo.upsert([
    {
        "field11": "value11",
        "field21": "value21",
        "field31": "value31",
    },
    {
        "field12": "value12",
        "field22": "value22",
        "field32": "value32",
    },
])
The other answers have this covered, but I figured I'd reference another good example for MySQL that I found in this gist. It also uses LAST_INSERT_ID, which may be useful depending on your InnoDB auto-increment settings and whether your table has a unique key. Lifting the code here for easy reference, but please give the author a star if you find it useful.
from app import db
from sqlalchemy import func
from sqlalchemy.dialects.mysql import insert


def upsert(model, insert_dict):
    """model can be a db.Model or a table(); insert_dict should contain a primary or unique key."""
    inserted = insert(model).values(**insert_dict)
    upserted = inserted.on_duplicate_key_update(
        id=func.LAST_INSERT_ID(model.id), **{k: inserted.inserted[k]
                                             for k, v in insert_dict.items()})
    res = db.engine.execute(upserted)
    return res.lastrowid
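Usage might look something like this; the Setting model and its columns are assumptions for illustration:

# upsert a settings row keyed on a unique `name` column (hypothetical model)
new_id = upsert(Setting, {"name": "theme", "value": "dark"})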
ORM
Use an upsert method based on on_duplicate_key_update:
from sqlalchemy.dialects.mysql import insert
from sqlalchemy.orm import Session


class Model():
    __input_data__ = dict()

    def __init__(self, **kwargs) -> None:
        self.__input_data__ = kwargs
        self.session = Session(engine)

    def save(self):
        self.session.add(self)
        self.session.commit()

    def upsert(self, *, ignore_keys=[]):
        column_keys = self.__table__.columns.keys()
        update_data = dict()
        for key in self.__input_data__.keys():
            if key not in column_keys:
                continue
            else:
                update_data[key] = self.__input_data__[key]

        insert_stmt = insert(self.__table__).values(**update_data)

        all_ignore_keys = ['id']
        if isinstance(ignore_keys, list):
            all_ignore_keys = [*all_ignore_keys, *ignore_keys]
        else:
            all_ignore_keys.append(ignore_keys)

        update_columns = dict()
        for key in self.__input_data__.keys():
            if key not in column_keys or key in all_ignore_keys:
                continue
            else:
                update_columns[key] = insert_stmt.inserted[key]

        on_duplicate_key_stmt = insert_stmt.on_duplicate_key_update(
            **update_columns
        )
        # self.session.add(self)
        self.session.execute(on_duplicate_key_stmt)
        self.session.commit()
class ManagerAssoc(ORM_Base, Model):

    def __init__(self, **kwargs):
        self.id = idWorker.get_id()
        column_keys = self.__table__.columns.keys()
        update_data = dict()
        for key in kwargs.keys():
            if key not in column_keys:
                continue
            else:
                update_data[key] = kwargs[key]
        ORM_Base.__init__(self, **update_data)
        Model.__init__(self, **kwargs, id=self.id)
....
# you can call it as follows:
manager_assoc.upsert()
manager.upsert(ignore_keys=['manager_id'])
Got a simpler solution:
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Insert

@compiles(Insert)
def replace_string(insert, compiler, **kw):
    s = compiler.visit_insert(insert, **kw)
    s = s.replace("INSERT INTO", "REPLACE INTO")
    return s

my_connection.execute(my_table.insert(replace_string=""), my_values)
I just used plain SQL, as in:
insert_stmt = "REPLACE INTO tablename (column1, column2) VALUES (:column_1_bind, :column_2_bind)"
session.execute(insert_stmt, data)
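Note that on SQLAlchemy 1.4+/2.0, a raw string has to be wrapped in text() before session.execute() will accept it; a minimal sketch:

from sqlalchemy import text

insert_stmt = text(
    "REPLACE INTO tablename (column1, column2) "
    "VALUES (:column_1_bind, :column_2_bind)"
)
session.execute(insert_stmt, data)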
Update Feb 2023: SQLAlchemy version 2 was recently released and supports on_duplicate_key_update in the MySQL dialect. Many many thanks to Federico Caselli of the SQLAlchemy project who helped me develop sample code in a discussion at https://github.com/sqlalchemy/sqlalchemy/discussions/9328
Please see https://stackoverflow.com/a/75538576/1630244
If it's ok to post the same answer twice (?) here is my small self-contained code example:
import sqlalchemy as db
import sqlalchemy.dialects.mysql as mysql
from sqlalchemy import delete, select, String
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
class Base(DeclarativeBase):
    pass


class User(Base):
    __tablename__ = "foo"
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(30))


engine = db.create_engine('mysql+mysqlconnector://USER-NAME-HERE:PASS-WORD-HERE@localhost/SCHEMA-NAME-HERE')
conn = engine.connect()
# setup step 0 - ensure the table exists
Base().metadata.create_all(bind=engine)
# setup step 1 - clean out rows with id 1..5
del_stmt = delete(User).where(User.id.in_([1, 2, 3, 4, 5]))
conn.execute(del_stmt)
conn.commit()
sel_stmt = select(User)
users = list(conn.execute(sel_stmt))
print(f'Table size after cleanout: {len(users)}')
# setup step 2 - insert 4 rows
ins_stmt = mysql.insert(User).values(
    [
        {"id": 1, "name": "x"},
        {"id": 2, "name": "y"},
        {"id": 3, "name": "w"},
        {"id": 4, "name": "z"},
    ]
)
conn.execute(ins_stmt)
conn.commit()
users = list(conn.execute(sel_stmt))
print(f'Table size after insert: {len(users)}')
# demonstrate upsert
ups_stmt = mysql.insert(User).values(
    [
        {"id": 1, "name": "xx"},
        {"id": 2, "name": "yy"},
        {"id": 3, "name": "ww"},
        {"id": 5, "name": "new"},
    ]
)
ups_stmt = ups_stmt.on_duplicate_key_update(name=ups_stmt.inserted.name)
# if you want to see the compiled result
# x = ups_stmt.compile(dialect=mysql.dialect())
# print(x.string, x.construct_params())
conn.execute(ups_stmt)
conn.commit()
users = list(conn.execute(sel_stmt))
print(f'Table size after upsert: {len(users)}')
None of these solutions seem all that elegant. A brute-force way is to query to see if the row exists; if it does, delete the row and then insert, otherwise just insert. There's obviously some overhead involved, but it does not rely on modifying the raw SQL, and it works on non-ORM stuff too.
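A rough sketch of that approach, assuming the User model from earlier with a unique name column (the column names are assumptions):

# check-delete-insert: not atomic, so wrap it in a transaction if that matters
existing = session.query(User).filter_by(name='Johny').first()
if existing:
    session.delete(existing)
    session.flush()
session.add(User(name='Johny', location='NYC'))
session.commit()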