Using SQLAlchemy, I am trying to print out all of the attributes of each model that I have in a manner similar to:
SELECT * from table;
However, I would like to do something with each model's instance information as I get it. So far the best that I've been able to come up with is:
for m in session.query(model).all():
    print([getattr(m, x.__str__().split('.')[1]) for x in model.__table__.columns])
    # additional code
And this will give me what I'm looking for, but it's a fairly roundabout way of getting it. I was kind of hoping for an attribute along the lines of:
m.attributes
# or
m.columns.values
I feel I'm missing something and that there is a much better way of doing this. I'm doing this because I'll be printing everything to .CSV files, and I don't want to have to specify the columns/attributes that I'm interested in. I want everything (there are a lot of columns in a lot of models to be printed).
This is an old post, but I ran into a problem with the actual database column names not matching the mapped attribute names on the instance. We ended up going with this:
from sqlalchemy import inspect
inst = inspect(model)
attr_names = [c_attr.key for c_attr in inst.mapper.column_attrs]
Hope that helps somebody with the same problem!
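For the CSV goal in the original question, a minimal sketch building on this (assuming a session and a mapped model class from the question; the file name is made up):
import csv
from sqlalchemy import inspect

attr_names = [c_attr.key for c_attr in inspect(model).mapper.column_attrs]

with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(attr_names)  # header row
    for instance in session.query(model):
        writer.writerow([getattr(instance, name) for name in attr_names])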
Probably the shortest solution (see the recent documentation):
from sqlalchemy.inspection import inspect
columns = [column.name for column in inspect(model).c]
The last line might be more readable if rewritten as three lines:
table = inspect(model)
for column in table.c:
    print(column.name)
Building on Rodney L's answer:
model = MYMODEL
columns = [m.key for m in model.__table__.columns]
Take a look at SQLAlchemy's metadata reflection feature.
A Table object can be instructed to load information about itself from the corresponding database schema object already existing within the database. This process is called reflection.
print(repr(model.__table__))
Or just the columns:
print(str(list(model.__table__.columns)))
I believe this is the easiest way:
print([cname for cname in m.__dict__.keys()])
EDIT: The answer above me using sqlalchemy.inspection.inspect() seems to be a better solution.
Put this together and found it helpful:
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
engine = create_engine('mysql+pymysql://testuser:password@localhost:3306/testdb')
DeclarativeBase = declarative_base()
metadata = DeclarativeBase.metadata
metadata.bind = engine
# configure Session class with desired options
Session = sessionmaker()
# associate it with our custom Session class
Session.configure(bind=engine)
# work with the session
session = Session()
And then:
d = {k: metadata.tables[k].columns.keys() for k in metadata.tables.keys()}
Example output of print(d):
{'orderdetails': ['orderNumber', 'productCode', 'quantityOrdered', 'priceEach', 'orderLineNumber'],
'offices': ['addressLine1', 'addressLine2', 'city', 'country', 'officeCode', 'phone', 'postalCode', 'state', 'territory'],
'orders': ['comments', 'customerNumber', 'orderDate', 'orderNumber', 'requiredDate', 'shippedDate', 'status'],
'products': ['MSRP', 'buyPrice', 'productCode', 'productDescription', 'productLine', 'productName', 'productScale', 'productVendor', 'quantityInStock'],
'employees': ['employeeNumber', 'lastName', 'firstName', 'extension', 'email', 'officeCode', 'reportsTo', 'jobTitle'],
'customers': ['addressLine1', 'addressLine2', 'city', 'contactFirstName', 'contactLastName', 'country', 'creditLimit', 'customerName', 'customerNumber', 'phone', 'postalCode', 'salesRepEmployeeNumber', 'state'],
'productlines': ['htmlDescription', 'image', 'productLine', 'textDescription'],
'payments': ['amount', 'checkNumber', 'customerNumber', 'paymentDate']}
Or, alternatively:
from sqlalchemy.sql import text
cmd = "SELECT * FROM information_schema.columns WHERE table_schema = :db ORDER BY table_name,ordinal_position"
result = session.execute(
    text(cmd),
    {"db": "classicmodels"}
)
result.fetchall()
I'm using SQLAlchemy v1.0.14 on Python 3.5.2.
Assuming you can connect to an engine with create_engine(), I was able to display all columns using the following code. Replace "my connection string" and "my table name" with the appropriate values.
from sqlalchemy import create_engine, MetaData, Table, select
engine = create_engine('my connection string')
conn = engine.connect()
metadata = MetaData(conn)
t = Table("my table name", metadata, autoload=True)
columns = [m.key for m in t.columns]
columns
The last line just displays the column names from the previous statement.
You may be interested in what I came up with to do this.
from sqlalchemy.orm import class_mapper, ColumnProperty, RelationshipProperty
import collections

# structure returned by get_metadata function.
MetaDataTuple = collections.namedtuple("MetaDataTuple",
        "coltype, colname, default, m2m, nullable, uselist, collection")

def get_metadata_iterator(class_):
    for prop in class_mapper(class_).iterate_properties:
        name = prop.key
        if name.startswith("_") or name == "id" or name.endswith("_id"):
            continue
        md = _get_column_metadata(prop)
        if md is None:
            continue
        yield md

def get_column_metadata(class_, colname):
    prop = class_mapper(class_).get_property(colname)
    md = _get_column_metadata(prop)
    if md is None:
        raise ValueError("Not a column name: %r." % (colname,))
    return md

def _get_column_metadata(prop):
    name = prop.key
    m2m = False
    default = None
    nullable = None
    uselist = False
    collection = None
    proptype = type(prop)
    if proptype is ColumnProperty:
        coltype = type(prop.columns[0].type).__name__
        try:
            default = prop.columns[0].default
        except AttributeError:
            default = None
        else:
            if default is not None:
                default = default.arg(None)
        nullable = prop.columns[0].nullable
    elif proptype is RelationshipProperty:
        coltype = RelationshipProperty.__name__
        m2m = prop.secondary is not None
        nullable = prop.local_side[0].nullable
        uselist = prop.uselist
        if prop.collection_class is not None:
            collection = type(prop.collection_class()).__name__
        else:
            collection = "list"
    else:
        return None
    return MetaDataTuple(coltype, str(name), default, m2m, nullable, uselist, collection)
I use this because it's slightly shorter:
for m in session.query(*model.__table__.columns).all():
    print(m)
Related
I'm looking to create a new object from q2, which fails because the Question class expects options to be a list of Option objects, and it's receiving a list of dicts instead.
So, unpacking obviously fails with a nested model.
What is the best approach to handle this? Is there something that's equivalent to the elegance of the **dict for a nested model?
main.py
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
import models.base
from models.question import Question
from models.option import Option
engine = create_engine('sqlite:///:memory:')
models.base.Base.metadata.create_all(engine, checkfirst=True)
Session = sessionmaker(bind=engine)
session = Session()
def create_question(q):
    # The following hard coding works:
    # q = Question(text='test text',
    #              frequency='test frequency',
    #              options=[Option(text='test option')]
    #              )
    question = Question(**q)
    session.add(question)
    session.commit()
q1 = {
    'text': 'test text',
    'frequency': 'test frequency'
}
q2 = {
    'text': 'test text',
    'frequency': 'test frequency',
    'options': [
        {'text': 'test option 123'},
    ]
}
create_question(q1)
# create_question(q2) FAILS
base.py
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
question.py
from sqlalchemy import *
from sqlalchemy.orm import relationship
from .base import Base
class Question(Base):
    __tablename__ = 'questions'

    id = Column(Integer, primary_key=True)
    text = Column(String(120), nullable=False)
    frequency = Column(String(20), nullable=False)
    active = Column(Boolean(), default=True, nullable=False)
    options = relationship('Option', back_populates='question')

    def __repr__(self):
        return "<Question(id={0}, text={1}, frequency={2}, active={3})>".format(self.id, self.text, self.frequency, self.active)
option.py
from sqlalchemy import *
from sqlalchemy.orm import relationship
from .base import Base
class Option(Base):
    __tablename__ = 'options'

    id = Column(Integer, primary_key=True)
    question_id = Column(Integer, ForeignKey('questions.id'))
    text = Column(String(20), nullable=False)
    question = relationship('Question', back_populates='options')

    def __repr__(self):
        return "<Option(id={0}, question_id={1}, text={2})>".format(self.id, self.question_id, self.text)
I liked the answer provided by @Abdou, but wanted to see if I couldn't make it a bit more generic.
I ended up coming up with the following, which should handle any nested model.
from sqlalchemy import event, inspect

@event.listens_for(Question, 'init')
@event.listens_for(Option, 'init')
def received_init(target, args, kwargs):
    for rel in inspect(target.__class__).relationships:
        rel_cls = rel.mapper.class_
        if rel.key in kwargs:
            kwargs[rel.key] = [rel_cls(**c) for c in kwargs[rel.key]]
Listens for the init event of any specified models, checks for relationships that match the kwargs passed in, and then converts those to the matching class of the relationship.
If anyone knows how to set this up so it can work on all models instead of specifying them, I would appreciate it.
Given that you need to create an Option object every time there is an options key in the dictionary passed to the create_question function, you should use dictionary comprehension to create your options before passing the result to the Question instantiator. I would rewrite the function as follows:
def create_question(q):
    # The following hard coding works:
    # q = Question(text='test text',
    #              frequency='test frequency',
    #              options=[Option(text='test option')]
    #              )
    q = dict((k, [Option(**x) for x in v]) if k == 'options' else (k, v) for k, v in q.items())
    print(q)
    question = Question(**q)
    session.add(question)
    session.commit()
The dictionary comprehension part basically checks if there is an options key in the given dictionary; and if there is one, then it creates Option objects with the values. Otherwise, it carries on as normal.
The above function generated the following:
# {'text': 'test text', 'frequency': 'test frequency'}
# {'text': 'test text', 'frequency': 'test frequency', 'options': [<Option(id=None, question_id=None, text=test option 123)>]}
I hope this helps.
For SQLAlchemy objects you can simply use Model.__dict__
Building on @Searle's answer, this avoids needing to directly list all models in the decorators, and also provides handling for when uselist=False (e.g. 1:1, many:1 relationships):
from sqlalchemy import event, inspect
from sqlalchemy.orm import Mapper

@event.listens_for(Mapper, 'init')
def received_init(target, args, kwargs):
    """Allow initializing nested relationships with dicts only"""
    for rel in inspect(target).mapper.relationships:
        if rel.key in kwargs:
            if rel.uselist:
                kwargs[rel.key] = [rel.mapper.class_(**c) for c in kwargs[rel.key]]
            else:
                kwargs[rel.key] = rel.mapper.class_(**kwargs[rel.key])
Possible further improvements:
add handling for when kwargs[rel.key] is already a model instance (right now this fails if you pass a model instance for relationships instead of a dict); see the sketch below
allow relationships to be specified as None (right now requires empty lists or dicts)
source: SQLAlchemy "event.listen" for all models
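As a sketch of the first improvement above, reusing the imports from the previous snippet (the isinstance check is my untested assumption, not part of the original answer):
@event.listens_for(Mapper, 'init')
def received_init(target, args, kwargs):
    """As above, but tolerate values that are already model instances."""
    for rel in inspect(target).mapper.relationships:
        if rel.key not in kwargs:
            continue
        rel_cls = rel.mapper.class_
        if rel.uselist:
            kwargs[rel.key] = [c if isinstance(c, rel_cls) else rel_cls(**c)
                               for c in kwargs[rel.key]]
        elif not isinstance(kwargs[rel.key], rel_cls):
            kwargs[rel.key] = rel_cls(**kwargs[rel.key])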
This is very similar to another question that's over 3 years old: What's a good general way to look SQLAlchemy transactions, complete with authenticated user, etc?
I'm working on an application where I'd like to log all changes to particular tables. There's currently a really good "recipe" that does versioning, but I need to modify it to instead record a datetime when the change occurred and a user id of who made the change. I took the history_meta.py example that's packaged with SQLAlchemy and made it record times instead of version numbers, but I'm having trouble figuring out how to pass in a user id.
The question I referenced above suggests including the user id in the session object. That makes a lot of sense, but I'm not sure how to do that. I've tried something simple like session.userid = authenticated_userid(request) but in history_meta.py that attribute doesn't seem to be on the session object any more.
I'm doing all of this in the Pyramid framework and the session object that I'm using is defined as DBSession = scoped_session(sessionmaker(extension=ZopeTransactionExtension())). In a view I do session = DBSession() and then proceed to use session. (I'm not really sure if that's necessary, but that's what's going on)
Here's my modified history_meta.py in case someone might find it useful:
from sqlalchemy.ext.declarative import declared_attr
from sqlalchemy.orm import mapper, class_mapper, attributes, object_mapper
from sqlalchemy.orm.exc import UnmappedClassError, UnmappedColumnError
from sqlalchemy import Table, Column, ForeignKeyConstraint, DateTime
from sqlalchemy import event
from sqlalchemy.orm.properties import RelationshipProperty
from datetime import datetime
def col_references_table(col, table):
    for fk in col.foreign_keys:
        if fk.references(table):
            return True
    return False
def _history_mapper(local_mapper):
    cls = local_mapper.class_

    # set the "active_history" flag on column-mapped attributes so that
    # the old version of the info is always loaded
    # (currently sets it on all attributes)
    for prop in local_mapper.iterate_properties:
        getattr(local_mapper.class_, prop.key).impl.active_history = True

    super_mapper = local_mapper.inherits
    super_history_mapper = getattr(cls, '__history_mapper__', None)

    polymorphic_on = None
    super_fks = []
    if not super_mapper or local_mapper.local_table is not super_mapper.local_table:
        cols = []
        for column in local_mapper.local_table.c:
            if column.name == 'version_datetime':
                continue

            col = column.copy()
            col.unique = False

            if super_mapper and col_references_table(column, super_mapper.local_table):
                super_fks.append((col.key, list(super_history_mapper.local_table.primary_key)[0]))

            cols.append(col)

            if column is local_mapper.polymorphic_on:
                polymorphic_on = col

        if super_mapper:
            super_fks.append(('version_datetime', super_history_mapper.base_mapper.local_table.c.version_datetime))
            cols.append(Column('version_datetime', DateTime, default=datetime.now, nullable=False, primary_key=True))
        else:
            cols.append(Column('version_datetime', DateTime, default=datetime.now, nullable=False, primary_key=True))

        if super_fks:
            cols.append(ForeignKeyConstraint(*zip(*super_fks)))

        table = Table(
            local_mapper.local_table.name + '_history',
            local_mapper.local_table.metadata,
            *cols
        )
    else:
        # single table inheritance. take any additional columns that may have
        # been added and add them to the history table.
        for column in local_mapper.local_table.c:
            if column.key not in super_history_mapper.local_table.c:
                col = column.copy()
                col.unique = False
                super_history_mapper.local_table.append_column(col)
        table = None

    if super_history_mapper:
        bases = (super_history_mapper.class_,)
    else:
        bases = local_mapper.base_mapper.class_.__bases__
    versioned_cls = type.__new__(type, "%sHistory" % cls.__name__, bases, {})

    m = mapper(
        versioned_cls,
        table,
        inherits=super_history_mapper,
        polymorphic_on=polymorphic_on,
        polymorphic_identity=local_mapper.polymorphic_identity
    )
    cls.__history_mapper__ = m

    if not super_history_mapper:
        local_mapper.local_table.append_column(
            Column('version_datetime', DateTime, default=datetime.now, nullable=False, primary_key=False)
        )
        local_mapper.add_property("version_datetime", local_mapper.local_table.c.version_datetime)
class Versioned(object):
    @declared_attr
    def __mapper_cls__(cls):
        def map(cls, *arg, **kw):
            mp = mapper(cls, *arg, **kw)
            _history_mapper(mp)
            return mp
        return map
def versioned_objects(iter):
    for obj in iter:
        if hasattr(obj, '__history_mapper__'):
            yield obj
def create_version(obj, session, deleted=False):
    obj_mapper = object_mapper(obj)
    history_mapper = obj.__history_mapper__
    history_cls = history_mapper.class_

    obj_state = attributes.instance_state(obj)

    attr = {}

    obj_changed = False

    for om, hm in zip(obj_mapper.iterate_to_root(), history_mapper.iterate_to_root()):
        if hm.single:
            continue

        for hist_col in hm.local_table.c:
            if hist_col.key == 'version_datetime':
                continue

            obj_col = om.local_table.c[hist_col.key]

            # get the value of the attribute based on the MapperProperty
            # related to the mapped column. this will allow usage of
            # MapperProperties that have a different keyname than that
            # of the mapped column.
            try:
                prop = obj_mapper.get_property_by_column(obj_col)
            except UnmappedColumnError:
                # in the case of single table inheritance, there may be
                # columns on the mapped table intended for the subclass only.
                # the "unmapped" status of the subclass column on the
                # base class is a feature of the declarative module as of sqla 0.5.2.
                continue

            # expired object attributes and also deferred cols might not be in the
            # dict. force it to load no matter what by using getattr().
            if prop.key not in obj_state.dict:
                getattr(obj, prop.key)

            a, u, d = attributes.get_history(obj, prop.key)

            if d:
                attr[hist_col.key] = d[0]
                obj_changed = True
            elif u:
                attr[hist_col.key] = u[0]
            else:
                # if the attribute had no value.
                attr[hist_col.key] = a[0]
                obj_changed = True

    if not obj_changed:
        # not changed, but we have relationships. OK
        # check those too
        for prop in obj_mapper.iterate_properties:
            if isinstance(prop, RelationshipProperty) and \
                    attributes.get_history(obj, prop.key).has_changes():
                obj_changed = True
                break

    if not obj_changed and not deleted:
        return

    attr['version_datetime'] = obj.version_datetime
    hist = history_cls()
    for key, value in attr.items():
        setattr(hist, key, value)
    session.add(hist)
    print(dir(session))
    obj.version_datetime = datetime.now()
def versioned_session(session):
    @event.listens_for(session, 'before_flush')
    def before_flush(session, flush_context, instances):
        for obj in versioned_objects(session.dirty):
            create_version(obj, session)
        for obj in versioned_objects(session.deleted):
            create_version(obj, session, deleted=True)
UPDATE:
Okay, it seems that in the before_flush() method the session I get is of type sqlalchemy.orm.session.Session where the session I attached the user_id to was sqlalchemy.orm.scoping.scoped_session. So, at some point an object layer is stripped off. Is it safe to assign the user_id to the Session within the scoped_session? Can I be sure that it won't be there for other requests?
Old question, but still very relevant.
You should avoid trying to place web session information on the database session. It combines unrelated concerns, and each has its own lifecycle (which don't match). Here's an approach I use in Flask with SQLAlchemy (not Flask-SQLAlchemy, but that should work too). I've tried to comment where Pyramid would be different.
from flask import has_request_context # How to check if in a Flask session
from sqlalchemy import inspect
from sqlalchemy.orm import class_mapper
from sqlalchemy.orm.attributes import get_history
from sqlalchemy.event import listen
from YOUR_SESSION_MANAGER import get_user # This would be something in Pyramid
from my_project import models # Where your models are defined
def get_object_changes(obj):
    """ Given a model instance, returns dict of pending
    changes waiting for database flush/commit.

    e.g. {
        'some_field': {
            'before': *SOME-VALUE*,
            'after': *SOME-VALUE*
        },
        ...
    }
    """
    inspection = inspect(obj)
    changes = {}
    for attr in class_mapper(obj.__class__).column_attrs:
        if getattr(inspection.attrs, attr.key).history.has_changes():
            if get_history(obj, attr.key)[2]:
                before = get_history(obj, attr.key)[2].pop()
                after = getattr(obj, attr.key)
                if before != after:
                    if before or after:
                        changes[attr.key] = {'before': before, 'after': after}
    return changes
def my_model_change_listener(mapper, connection, target):
    changes = get_object_changes(target)
    changes.pop("modify_ts", None)  # remove fields you don't want to track

    user_id = None
    if has_request_context():
        # Call your function to get active user and extract id
        user_id = getattr(get_user(), 'id', None)

    if user_id is None:
        # What do you want to do if user can't be determined
        pass

    # You now have the model instance (target), the user_id who is logged in,
    # and a dictionary of changes.

    # Either do something "quick" with it here or call an async task (e.g.
    # Celery) to do something with the information that may take longer
    # than you want the request to take.

# Add the listener
listen(models.MyModel, 'after_update', my_model_change_listener)
After a bunch of fiddling I seem to be able to set values on the session object within the scoped_session by doing the following:
DBSession = scoped_session(sessionmaker(extension=ZopeTransactionExtension()))
session = DBSession()
inner_session = session.registry()
inner_session.user_id = "test"
versioned_session(session)
Now the session object being passed around in history_meta.py has a user_id attribute on it which I set. I'm a little concerned about whether this is the right way of doing this as the object in the registry is a thread-local one and the threads are being re-used for different http requests.
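An alternative worth considering (assuming SQLAlchemy 0.9+, where Session.info is available) is the session's info dictionary, which is explicitly intended for arbitrary per-session data and avoids poking attributes onto the registry object; a minimal sketch using the names from the question:
session = DBSession()
session.info['user_id'] = authenticated_userid(request)

# later, e.g. inside before_flush():
user_id = session.info.get('user_id')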
I ran into this old question recently. My requirement is to log all changes to a set of tables.
I'll post the code I ended up with here in case anyone finds it useful. It has some limitations, especially around deletes, but works for my purposes. The code supports logging audit records for selected tables to either a log file, or an audit table in the db.
from app import db
import datetime
from flask import current_app, g
# your own session user goes here
# you'll need an id and an email in that model
from flask_user import current_user as user
import importlib
import logging
from sqlalchemy import event, inspect
from sqlalchemy.orm.attributes import get_history
from sqlalchemy.orm import ColumnProperty, class_mapper
from uuid import uuid4
class AuditManager(object):
    config = {'storage': 'log',
              # define the Audit model class for your project, if saving audit records in db
              'auditModel': 'app.models.user_models.Audit'}

    def __init__(self, app):
        if 'AUDIT_CONFIG' in app.config:
            app.before_request(self.before_request_handler)
            self.config.update(app.config['AUDIT_CONFIG'])
            event.listen(
                db.session,
                'after_flush',
                self.db_after_flush
            )
            event.listen(
                db.session,
                'before_flush',
                self.db_before_flush
            )
            event.listen(
                db.session,
                'after_bulk_delete',
                self.db_after_bulk_delete
            )
            if self.config['storage'] == 'log':
                self.logger = logging.getLogger(__name__)
            elif self.config['storage'] == 'db':
                # Load the Audit model class at runtime, so that log file users don't need to define it
                module_name, class_name = self.config['auditModel'].rsplit(".", 1)
                self.AuditModel = getattr(importlib.import_module(module_name), class_name)

    # Create a global request id.
    # Use this to group transactions together.
    def before_request_handler(self):
        g.request_id = uuid4()

    def db_after_flush(self, session, flush_context):
        for instance in session.new:
            if instance.__tablename__ in self.config['tables']:
                # Record the inserts for this table
                data = {}
                auditFields = getattr(instance.__class__, 'Meta', None)
                auditFields = getattr(auditFields,
                                      'auditFields',  # prefer to list auditable fields explicitly in the model's Meta class
                                      self.get_fields(instance))  # or derive them otherwise
                for attr in auditFields:
                    data[attr] = str(getattr(instance, attr, 'not set'))  # make every value a string in audit
                self.log_it(session, 'insert', instance, data)

    def db_before_flush(self, session, flush_context, instances):
        for instance in session.dirty:
            # Record the changes for this table
            if instance.__tablename__ in self.config['tables']:
                inspection = inspect(instance)
                data = {}
                auditFields = getattr(instance.__class__, 'Meta', None)
                auditFields = getattr(auditFields,
                                      'auditFields',
                                      self.get_fields(instance))
                for attr in auditFields:
                    if getattr(inspection.attrs, attr).history.has_changes():  # we only log the new data
                        data[attr] = str(getattr(instance, attr, 'not set'))
                self.log_it(session, 'change', instance, data)
        for instance in session.deleted:
            # Record the deletes for this table.
            # For this to be triggered, you must use the session-based delete object construct,
            # e.g.: session.delete(query.first())
            if instance.__tablename__ in self.config['tables']:
                data = {}
                auditFields = getattr(instance.__class__, 'Meta', None)
                auditFields = getattr(auditFields,
                                      'auditFields',
                                      self.get_fields(instance))
                for attr in auditFields:
                    data[attr] = str(getattr(instance, attr, 'not set'))
                self.log_it(session, 'delete', instance, data)

    def db_after_bulk_delete(self, delete_context):
        instance = delete_context.query.column_descriptions[0]['type']  # only works for single table deletes
        if delete_context.result.returns_rows:
            # Not sure exactly how after_bulk_delete is expected to work, since context.result is empty,
            # as delete statements return no results
            for row in delete_context.result:
                data = {}
                auditFields = getattr(instance.__class__, 'Meta', None)
                auditFields = getattr(auditFields,
                                      'auditFields',
                                      self.get_fields(instance))
                for attr in auditFields:
                    data[attr] = str(getattr(row, attr, 'not set'))  # make every value a string in audit
                self.log_it(delete_context.session, 'delete', instance, data)
        else:
            # Audit what we can when we don't have individual rows to look at
            self.log_it(delete_context.session, 'delete', instance,
                        {"rowcount": delete_context.result.rowcount})

    def log_it(self, session, action, instance, data):
        if self.config['storage'] == 'log':
            self.logger.info(
                "request_id: %s, table: %s, action: %s, user id: %s, user email: %s, date: %s, data: %s"
                % (getattr(g, 'request_id', None), instance.__tablename__, action,
                   getattr(user, 'id', None), getattr(user, 'email', None),
                   datetime.datetime.now(), data))
        elif self.config['storage'] == 'db':
            audit = self.AuditModel(request_id=str(getattr(g, 'request_id', None)),
                                    table=str(instance.__tablename__),
                                    action=action,
                                    user_id=getattr(user, 'id', None),
                                    user_email=getattr(user, 'email', None),
                                    date=datetime.datetime.now(),
                                    data=data)
            session.add(audit)

    def get_fields(self, instance):
        fields = []
        for attr in class_mapper(instance.__class__).column_attrs:
            fields.append(attr.key)
        return fields
Suggested model, if you want to store audit records in the database:
class Audit(db.Model):
    __tablename__ = 'audit'

    id = db.Column(db.Integer, primary_key=True)
    request_id = db.Column(db.Unicode(50), nullable=True, index=True, server_default=u'')
    table = db.Column(db.Unicode(50), nullable=False, index=True, server_default=u'')
    action = db.Column(db.Unicode(20), nullable=False, server_default=u'')
    user_id = db.Column(db.Integer, db.ForeignKey('user.id', ondelete='SET NULL'), nullable=True)
    user_email = db.Column(db.Unicode(255), nullable=False, server_default=u'')
    date = db.Column(db.DateTime, default=db.func.now())
    data = db.Column(db.JSON)
In settings:
AUDIT_CONFIG = {
    "tables": ['user', 'order', 'batch']
}
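For context, a hypothetical way to wire this into a Flask app factory (the names follow the code above; treat this as a sketch, not part of the original answer):
from flask import Flask

def create_app():
    app = Flask(__name__)
    app.config['AUDIT_CONFIG'] = {'storage': 'log',
                                  'tables': ['user', 'order', 'batch']}
    db.init_app(app)
    AuditManager(app)  # registers the flush listeners
    return app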
I am trying to create a program that loads in over 100 tables from a database so that I can change all appearances of a user's user id.
Rather than map all of the tables individually, I decided to use a loop to map each of the tables using an array of objects. This way, the table definitions can be stored in a config file and later updated.
Here is my code so far:
def init_model(engine):
    """Call me before using any of the tables or classes in the model"""
    meta.Session.configure(bind=engine)
    meta.engine = engine

class Table:
    tableID = ''
    primaryKey = ''
    pkType = sa.types.String()

    class mappedClass(object):
        pass

WIW_TBL = Table()
LOCATIONS_TBL = Table()

WIW_TBL.tableID = "wiw_tbl"
WIW_TBL.primaryKey = "PORTAL_USERID"
WIW_TBL.pkType = sa.types.String()

LOCATIONS_TBL.tableID = "locations_tbl"
LOCATIONS_TBL.primaryKey = "LOCATION_CODE"
LOCATIONS_TBL.pkType = sa.types.Integer()

tableList = [WIW_TBL, LOCATIONS_TBL]

for i in tableList:
    i.tableID = sa.Table(i.tableID.upper(), meta.metadata,
                         sa.Column(i.primaryKey, i.pkType, primary_key=True),
                         autoload=True,
                         autoload_with=engine)
    orm.mapper(i.mappedClass, i.tableID)
The error that this code returns is:
sqlalchemy.exc.ArgumentError: Class '<class 'changeofname.model.mappedClass'>' already has a primary mapper defined. Use non_primary=True to create a non primary Mapper. clear_mappers() will remove *all* current mappers from all classes.
I can't use clear_mappers, as it wipes all of the classes, and the entity_name scheme doesn't seem to apply here.
It seems that every object wants to use the same class, although they all should have their own instance of it.
Does anyone have any ideas?
Well, in your case it is the same class that you try to map to different Tables. To solve this, create a class dynamically for each Table:
class Table(object):
    tableID = ''
    primaryKey = ''
    pkType = sa.types.String()

    def __init__(self):
        self.mappedClass = type('TempClass', (object,), {})
But I would prefer a slightly cleaner version:
class Table2(object):
    def __init__(self, table_id, pk_name, pk_type):
        self.tableID = table_id
        self.primaryKey = pk_name
        self.pkType = pk_type
        self.mappedClass = type('Class_' + self.tableID, (object,), {})

# ...

WIW_TBL = Table2("wiw_tbl", "PORTAL_USERID", sa.types.String())
LOCATIONS_TBL = Table2("locations_tbl", "LOCATION_CODE", sa.types.Integer())
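With Table2, the mapping loop from the question could presumably stay almost unchanged (assuming the same sa, orm, meta, and engine from the question's setup; storing the reflected Table on a separate attribute avoids clobbering tableID):
tableList = [WIW_TBL, LOCATIONS_TBL]

for t in tableList:
    t.table = sa.Table(t.tableID.upper(), meta.metadata,
                       sa.Column(t.primaryKey, t.pkType, primary_key=True),
                       autoload=True,
                       autoload_with=engine)
    orm.mapper(t.mappedClass, t.table)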
Is there an elegant way to do an INSERT ... ON DUPLICATE KEY UPDATE in SQLAlchemy? I mean something with a syntax similar to inserter.insert().execute(list_of_dictionaries) ?
ON DUPLICATE KEY UPDATE post version-1.2 for MySQL
This functionality is now built into SQLAlchemy for MySQL only. somada141's answer below has the best solution:
https://stackoverflow.com/a/48373874/319066
ON DUPLICATE KEY UPDATE in the SQL statement
If you want the generated SQL to actually include ON DUPLICATE KEY UPDATE, the simplest way involves using a @compiles decorator.
The code (linked from a good thread on the subject on reddit) for an example can be found on github:
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Insert

@compiles(Insert)
def append_string(insert, compiler, **kw):
    s = compiler.visit_insert(insert, **kw)
    if 'append_string' in insert.kwargs:
        return s + " " + insert.kwargs['append_string']
    return s

my_connection.execute(my_table.insert(append_string='ON DUPLICATE KEY UPDATE foo=foo'), my_values)
But note that in this approach, you have to manually create the append_string. You could probably change the append_string function so that it automatically changes the insert string into an insert with 'ON DUPLICATE KEY UPDATE' string, but I'm not going to do that here due to laziness.
ON DUPLICATE KEY UPDATE functionality within the ORM
SQLAlchemy does not provide an interface to ON DUPLICATE KEY UPDATE or MERGE or any other similar functionality in its ORM layer. Nevertheless, it has the session.merge() function that can replicate the functionality only if the key in question is a primary key.
session.merge(ModelObject) first checks if a row with the same primary key value exists by sending a SELECT query (or by looking it up locally). If it does, it sets a flag somewhere indicating that ModelObject is in the database already, and that SQLAlchemy should use an UPDATE query. Note that merge is quite a bit more complicated than this, but it replicates the functionality well with primary keys.
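For illustration, a minimal sketch of that behavior (User is a hypothetical mapped model with primary key id):
# merge() first SELECTs by primary key (or checks the identity map); if the row
# exists it issues an UPDATE on flush, otherwise an INSERT.
merged = session.merge(User(id=42, name='new name'))
session.commit()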
But what if you want ON DUPLICATE KEY UPDATE functionality with a non-primary key (for example, another unique key)? Unfortunately, SQLAlchemy doesn't have any such function. Instead, you have to create something that resembles Django's get_or_create(). Another StackOverflow answer covers it, and I'll just paste a modified, working version of it here for convenience.
from sqlalchemy.sql.expression import ClauseElement

def get_or_create(session, model, defaults=None, **kwargs):
    instance = session.query(model).filter_by(**kwargs).first()
    if instance:
        return instance
    else:
        params = dict((k, v) for k, v in kwargs.items() if not isinstance(v, ClauseElement))
        if defaults:
            params.update(defaults)
        instance = model(**params)
        return instance
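A hypothetical call, assuming a mapped User model (note the function does not add the instance to the session itself):
user = get_or_create(session, User, defaults={'active': True}, name='alice')
session.add(user)
session.commit()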
I should mention that since the v1.2 release, SQLAlchemy Core has a built-in solution to the above, which can be seen here (copied snippet below):
from sqlalchemy.dialects.mysql import insert

insert_stmt = insert(my_table).values(
    id='some_existing_id',
    data='inserted value')

on_duplicate_key_stmt = insert_stmt.on_duplicate_key_update(
    data=insert_stmt.inserted.data,
    status='U'
)

conn.execute(on_duplicate_key_stmt)
Based on phsource's answer, and for the specific use-case of using MySQL and completely overriding the data for the same key without performing a DELETE statement, one can use the following @compiles-decorated insert expression:
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Insert

@compiles(Insert)
def append_string(insert, compiler, **kw):
    s = compiler.visit_insert(insert, **kw)
    if insert.kwargs.get('on_duplicate_key_update'):
        fields = s[s.find("(") + 1:s.find(")")].replace(" ", "").split(",")
        generated_directive = ["{0}=VALUES({0})".format(field) for field in fields]
        return s + " ON DUPLICATE KEY UPDATE " + ",".join(generated_directive)
    return s
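Usage would then presumably mirror the append_string example earlier, passing the flag the compiler checks for (an untested sketch):
my_connection.execute(my_table.insert(on_duplicate_key_update=True), my_values)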
It depends on what you want. If you want to replace, then pass OR REPLACE in prefixes:
def bulk_insert(self, objects, table):
    # table: your table class; objects is a list of dictionaries [{col1: val1, col2: val2}]
    for counter, row in enumerate(objects):
        inserter = table.__table__.insert(prefixes=['OR IGNORE'], values=row)
        try:
            self.db.execute(inserter)
        except Exception as E:
            print(E)
        if counter % 100 == 0:
            self.db.commit()
    self.db.commit()
The commit interval here can be tuned to speed things up or slow them down.
My way
import typing
from datetime import datetime

from sqlalchemy.dialects import mysql

class MyRepository:
    def model(self):
        return MySqlAlchemyModel

    def upsert(self, data: typing.List[typing.Dict]):
        if not data:
            return

        model = self.model()
        if hasattr(model, 'created_at'):
            for item in data:
                item['created_at'] = datetime.now()

        stmt = mysql.insert(getattr(model, '__table__')).values(data)

        for_update = []
        for k, v in data[0].items():
            for_update.append(k)

        dup = {k: getattr(stmt.inserted, k) for k in for_update}
        stmt = stmt.on_duplicate_key_update(**dup)

        self.db.session.execute(stmt)
        self.db.session.commit()
Usage:
myrepo.upsert([
    {
        "field11": "value11",
        "field21": "value21",
        "field31": "value31",
    },
    {
        "field12": "value12",
        "field22": "value22",
        "field32": "value32",
    },
])
The other answers have this covered, but I figured I'd reference another good example for MySQL that I found in this gist. This also includes the use of LAST_INSERT_ID, which may be useful depending on your InnoDB auto-increment settings and whether your table has a unique key. Lifting the code here for easy reference, but please give the author a star if you find it useful.
from app import db
from sqlalchemy import func
from sqlalchemy.dialects.mysql import insert

def upsert(model, insert_dict):
    """model can be a db.Model or a table(); insert_dict should contain a primary or unique key."""
    inserted = insert(model).values(**insert_dict)
    upserted = inserted.on_duplicate_key_update(
        id=func.LAST_INSERT_ID(model.id),
        **{k: inserted.inserted[k] for k, v in insert_dict.items()})
    res = db.engine.execute(upserted)
    return res.lastrowid
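A hypothetical call (the model and values are made up):
new_id = upsert(MyModel, {'id': 1, 'name': 'example'})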
ORM
An upsert method based on on_duplicate_key_update:
# assuming the MySQL dialect's insert(), which provides on_duplicate_key_update()
from sqlalchemy.dialects.mysql import insert
from sqlalchemy.orm import Session

class Model():
    __input_data__ = dict()

    def __init__(self, **kwargs) -> None:
        self.__input_data__ = kwargs
        self.session = Session(engine)

    def save(self):
        self.session.add(self)
        self.session.commit()

    def upsert(self, *, ignore_keys=[]):
        column_keys = self.__table__.columns.keys()
        update_data = dict()
        for key in self.__input_data__.keys():
            if key not in column_keys:
                continue
            else:
                update_data[key] = self.__input_data__[key]

        insert_stmt = insert(self.__table__).values(**update_data)

        all_ignore_keys = ['id']
        if isinstance(ignore_keys, list):
            all_ignore_keys = [*all_ignore_keys, *ignore_keys]
        else:
            all_ignore_keys.append(ignore_keys)

        update_columns = dict()
        for key in self.__input_data__.keys():
            if key not in column_keys or key in all_ignore_keys:
                continue
            else:
                update_columns[key] = insert_stmt.inserted[key]

        on_duplicate_key_stmt = insert_stmt.on_duplicate_key_update(
            **update_columns
        )
        # self.session.add(self)
        self.session.execute(on_duplicate_key_stmt)
        self.session.commit()

class ManagerAssoc(ORM_Base, Model):
    def __init__(self, **kwargs):
        self.id = idWorker.get_id()
        column_keys = self.__table__.columns.keys()
        update_data = dict()
        for key in kwargs.keys():
            if key not in column_keys:
                continue
            else:
                update_data[key] = kwargs[key]
        ORM_Base.__init__(self, **update_data)
        Model.__init__(self, **kwargs, id=self.id)
    # ...

# you can call it as follows:
manager_assoc.upsert()
manager.upsert(ignore_keys=['manager_id'])
Got a simpler solution:
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Insert

@compiles(Insert)
def replace_string(insert, compiler, **kw):
    s = compiler.visit_insert(insert, **kw)
    s = s.replace("INSERT INTO", "REPLACE INTO")
    return s

my_connection.execute(my_table.insert(replace_string=""), my_values)
I just used plain sql as:
insert_stmt = "REPLACE INTO tablename (column1, column2) VALUES (:column_1_bind, :columnn_2_bind) "
session.execute(insert_stmt, data)
Update Feb 2023: SQLAlchemy version 2 was recently released and supports on_duplicate_key_update in the MySQL dialect. Many many thanks to Federico Caselli of the SQLAlchemy project who helped me develop sample code in a discussion at https://github.com/sqlalchemy/sqlalchemy/discussions/9328
Please see https://stackoverflow.com/a/75538576/1630244
If it's ok to post the same answer twice (?) here is my small self-contained code example:
import sqlalchemy as db
import sqlalchemy.dialects.mysql as mysql
from sqlalchemy import delete, select, String
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
class Base(DeclarativeBase):
    pass

class User(Base):
    __tablename__ = "foo"

    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(30))
engine = db.create_engine('mysql+mysqlconnector://USER-NAME-HERE:PASS-WORD-HERE@localhost/SCHEMA-NAME-HERE')
conn = engine.connect()
# setup step 0 - ensure the table exists
Base.metadata.create_all(bind=engine)
# setup step 1 - clean out rows with id 1..5
del_stmt = delete(User).where(User.id.in_([1, 2, 3, 4, 5]))
conn.execute(del_stmt)
conn.commit()
sel_stmt = select(User)
users = list(conn.execute(sel_stmt))
print(f'Table size after cleanout: {len(users)}')
# setup step 2 - insert 4 rows
ins_stmt = mysql.insert(User).values(
    [
        {"id": 1, "name": "x"},
        {"id": 2, "name": "y"},
        {"id": 3, "name": "w"},
        {"id": 4, "name": "z"},
    ]
)
conn.execute(ins_stmt)
conn.commit()
users = list(conn.execute(sel_stmt))
print(f'Table size after insert: {len(users)}')
# demonstrate upsert
ups_stmt = mysql.insert(User).values(
    [
        {"id": 1, "name": "xx"},
        {"id": 2, "name": "yy"},
        {"id": 3, "name": "ww"},
        {"id": 5, "name": "new"},
    ]
)
ups_stmt = ups_stmt.on_duplicate_key_update(name=ups_stmt.inserted.name)
# if you want to see the compiled result
# x = ups_stmt.compile(dialect=mysql.dialect())
# print(x.string, x.construct_params())
conn.execute(ups_stmt)
conn.commit()
users = list(conn.execute(sel_stmt))
print(f'Table size after upsert: {len(users)}')
None of these solutions seems especially elegant. A brute-force way is to query to see if the row exists; if it does, delete the row and then insert, otherwise just insert. Obviously some overhead is involved, but it does not rely on modifying the raw SQL, and it works for non-ORM usage too.
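A minimal sketch of that brute-force approach with SQLAlchemy Core (table, column, and connection names are made up; conn.commit() assumes a 2.0-style connection):
from sqlalchemy import delete, insert, select

def brute_force_upsert(conn, table, row, key_col='key'):
    # Delete any existing row carrying the same unique key, then insert fresh.
    existing = conn.execute(
        select(table).where(table.c[key_col] == row[key_col])
    ).first()
    if existing is not None:
        conn.execute(delete(table).where(table.c[key_col] == row[key_col]))
    conn.execute(insert(table).values(**row))
    conn.commit()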
I have some problems with setting up the dictionary collection in Python's SQLAlchemy:
I am using declarative definition of tables. I have Item table in 1:N relation with Record table. I set up the relation using the following code:
_Base = declarative_base()

class Record(_Base):
    __tablename__ = 'records'
    item_id = Column(String(M_ITEM_ID), ForeignKey('items.id'))
    id = Column(String(M_RECORD_ID), primary_key=True)
    uri = Column(String(M_RECORD_URI))
    name = Column(String(M_RECORD_NAME))

class Item(_Base):
    __tablename__ = 'items'
    id = Column(String(M_ITEM_ID), primary_key=True)
    records = relation(Record, collection_class=column_mapped_collection(Record.name), backref='item')
Now I want to work with the Items and Records. Let's create some objects:
i1 = Item(id='id1')
r = Record(id='mujrecord')
And now I want to associate these objects using the following code:
i1.records['source_wav'] = r
but the Record r doesn't have its name attribute set (the foreign key). Is there any way to ensure this automatically? (I know that setting the foreign key during Record creation works, but it doesn't feel right to me.)
Many thanks
You want something like this:
from sqlalchemy.orm import validates

class Item(_Base):
    [...]

    @validates('records')
    def validate_record(self, key, record):
        assert record.name is not None, "Record fails validation, must have a name"
        return record
With this, you get the desired validation:
>>> i1 = Item(id='id1')
>>> r = Record(id='mujrecord')
>>> i1.records['source_wav'] = r
Traceback (most recent call last):
[...]
AssertionError: Record fails validation, must have a name
>>> r.name = 'foo'
>>> i1.records['source_wav'] = r
>>>
I can't comment yet, so I'm just going to write this as a separate answer:
from sqlalchemy.orm import validates

class Item(_Base):
    [...]

    @validates('records')
    def validate_record(self, key, record):
        record.name = key
        return record
This is basically a copy of Gunnlaugur's answer but abusing the validates decorator to do something more useful than exploding.
You have:
backref='item'
Is this a typo for
backref='name'
?