I have two tables, say A and B. Both have a primary key id. They have a many-to-many relationship, SEC.
SEC = Table('sec', Base.metadata,
Column('a_id', Integer, ForeignKey('A.id'), primary_key=True, nullable=False),
Column('b_id', Integer, ForeignKey('B.id'), primary_key=True, nullable=False)
)
class A():
...
id = Column(Integer, primary_key=True)
...
rels = relationship(B, secondary=SEC)
class B():
...
id = Column(Integer, primary_key=True)
...
Let's consider this piece of code.
a = A()
b1 = B()
b2 = B()
a.rels = [b1, b2]
...
#some place later
b3 = B()
a.rels = [b1, b3] # errors sometimes
Sometimes, I get an error at the last line saying
duplicate key value violates unique constraint a_b_pkey
In my understanding, I think it tries to add (a.id, b.id) into 'sec' table again resulting in a unique constraint error. Is that what it is? If so, how can I avoid this? If not, why do I have this error?
The problem is you want to make sure the instances you create are unique. We can create an alternate constructor that checks a cache of existing uncommited instances or queries the database for existing commited instance before returning a new instance.
Here is a demonstration of such a method:
from sqlalchemy import Column, Integer, String, ForeignKey, Table
from sqlalchemy.engine import create_engine
from sqlalchemy.ext.declarative.api import declarative_base
from sqlalchemy.orm import sessionmaker, relationship
engine = create_engine('sqlite:///:memory:', echo=True)
Session = sessionmaker(engine)
Base = declarative_base(engine)
session = Session()
class Role(Base):
__tablename__ = 'role'
id = Column(Integer, primary_key=True)
name = Column(String, nullable=False, unique=True)
#classmethod
def get_unique(cls, name):
# get the session cache, creating it if necessary
cache = session._unique_cache = getattr(session, '_unique_cache', {})
# create a key for memoizing
key = (cls, name)
# check the cache first
o = cache.get(key)
if o is None:
# check the database if it's not in the cache
o = session.query(cls).filter_by(name=name).first()
if o is None:
# create a new one if it's not in the database
o = cls(name=name)
session.add(o)
# update the cache
cache[key] = o
return o
Base.metadata.create_all()
# demonstrate cache check
r1 = Role.get_unique('admin') # this is new
r2 = Role.get_unique('admin') # from cache
session.commit() # doesn't fail
# demonstrate database check
r1 = Role.get_unique('mod') # this is new
session.commit()
session._unique_cache.clear() # empty cache
r2 = Role.get_unique('mod') # from database
session.commit() # nop
# show final state
print session.query(Role).all() # two unique instances from four create calls
The create_unique method was inspired by the example from the SQLAlchemy wiki. This version is much less convoluted, favoring simplicity over flexibility. I have used it in production systems with no problems.
There are obviously improvements that can be added; this is just a simple example. The get_unique method could be inherited from a UniqueMixin, to be used for any number of models. More flexible memoizing of arguments could be implemented. This also puts aside the problem of multiple threads inserting conflicting data mentioned by Ants Aasma; handling that is more complex but should be an obvious extension. I leave that to you.
The error you mention is indeed from inserting a conflicting value to the sec table. To be sure that it is from the operation you think it is, not some previous change, turn on SQL logging and check what values is it trying to insert before erroring out.
When overwriting a many-to-many collection value, SQLAlchemy compares the new contents of the collection with the state in the database and correspondingly issues delete and insert statements. Unless you are poking around in SQLAlchemy internals, there should be two ways to encounter this error.
First is concurrent modification: Process 1 fetches the value a.rels and notices that it is empty, meanwhile Process 2 also fetches a.rels, sets it to [b1, b2] and commits flushing the (a,b1),(a,b2) tuples, Process 1 sets a.rels to [b1, b3] noticing that the previous contents was empty and when it tries to flush the sec tuple (a,b1) it gets a duplicate key error. The correct action in such cases is usually to retry the transaction from the top. You can use serializable transaction isolation to instead get a serialization error in this case that is distinct from a business logic error causing a duplicate key error.
The second case happens when you have managed to convince SQLAlchemy that you don't need to know the database state by setting the loading strategy of the rels attribute to noload. This can be done when defining the relationship by adding the lazy='noload' parameter, or when querying, calling .options(noload(A.rels)) on the query. SQLAlchemy will assume that sec table has no matching rows for objects loaded with this strategy in effect.
Related
I'm attempting to check if the key of a particular entity (as defined with an ORM class) already exists in a database.
Note that it needs to work regardless of the names of the (key) attributes. It needs to work generically across all ORM models.
Here's the basic structure of what I need:
# User class definition using ORM
class Employee(Base):
__tablename__ = 'User'
USER_ID = Column(String(50), primary_key=True)
... # OTHER FIELDS EXIST HERE, NOT PART OF PRIMARY KEY
# We have a particular ORM instance
# We need to check if its PK exists in table
# Other fields may have any values
user_instance = User(USER_ID = 1)
# We have a session with DB
session = Session()
# ===
# Now, how do we check if User with USER_ID = 1 (dynamically - we might have different data)
# ===
I am aware of sqlalchemy.inspect but I'm not really sure how it would be applied in this case.
Any help is greatly appreciated.
My SQLAlchemy application (running on top of MariaDB) includes two models MyModelA and MyModelB where the latter is a child-record of the former:
class MyModelA(db.Model):
a_id = db.Column(db.Integer, nullable=False, primary_key=True)
my_field1 = db.Column(db.String(1024), nullable=True)
class MyModelB(db.Model):
b_id = db.Column(db.Integer, nullable=False, primary_key=True)
a_id = db.Column(db.Integer, db.ForeignKey(MyModelA.a_id), nullable=False)
my_field2 = db.Column(db.String(1024), nullable=True)
These are the instances of MyModelA and MyModelB that I create:
>>> my_a = MyModelA(my_field1="A1")
>>> my_a.aid
1
>>> MyModelB(a_id=my_a.aid, my_field2="B1")
I have the following code that deletes the instance of MyModelA where a_id==1:
db.session.commit()
try:
my_a = MyModelA.query.get(a_id=1)
assert my_a is not None
print "#1) Number of MyModelAs: %s\n" % MyModelA.query.count()
db.session.delete(my_a)
db.session.commit()
except IntegrityError:
print "#2) Cannot delete instance of MyModelA because it has child record(s)!"
db.session.rollback()
print "#3) Number of MyModelAs: %s\n" % MyModelA.query.count()
When I run this code look at the unexpected results I get:
#1) Number of MyModelAs: 1
#2) Cannot delete instance of MyModelA because it has child record(s)!
#3) Number of MyModelAs: 0
The delete supposedly fails and the DB throws an exception which causes a rollback. However even after the rollback, the number of rows in the table indicates that the row -- which supposedly wasn't deleted -- is actually gone!!!
Why is this happening? How can I fix this? It seems like a bug in SQLAlchemy.
TL;DR
Your problem might be related to the lack of explicit relationship declaration.
For example, here there is a sample of objects' relationship. In addition to the usage of a ForeignKey field, the class explicitly uses the relationship directive to define that connection. In the session API documentation, the following text appears:
object references should be constructed at the object level, not at the foreign key level
Which might imply to the way of SQLAlchemy to manage relations. I am not deeply familiar with the underlying mechanisms, but it is possible that this is what happens. Your session only includes the MyModelA object. Since you did not use the relationship() directive in the definition of MyModelB, objects of MyModelA type are not aware to the fact that some other object might refer them through a ForeignKey. Hence, when the session is about to commit, it is not aware to the fact that deleting the object affects some other MyModelB object, and its transaction mechanism does not take that into account.
I suggest that adding the relationship explicitly might prevent that behavior.
I'm starting a new application and looking at using an ORM -- in particular, SQLAlchemy.
Say I've got a column 'foo' in my database and I want to increment it. In straight sqlite, this is easy:
db = sqlite3.connect('mydata.sqlitedb')
cur = db.cursor()
cur.execute('update table stuff set foo = foo + 1')
I figured out the SQLAlchemy SQL-builder equivalent:
engine = sqlalchemy.create_engine('sqlite:///mydata.sqlitedb')
md = sqlalchemy.MetaData(engine)
table = sqlalchemy.Table('stuff', md, autoload=True)
upd = table.update(values={table.c.foo:table.c.foo+1})
engine.execute(upd)
This is slightly slower, but there's not much in it.
Here's my best guess for a SQLAlchemy ORM approach:
# snip definition of Stuff class made using declarative_base
# snip creation of session object
for c in session.query(Stuff):
c.foo = c.foo + 1
session.flush()
session.commit()
This does the right thing, but it takes just under fifty times as long as the other two approaches. I presume that's because it has to bring all the data into memory before it can work with it.
Is there any way to generate the efficient SQL using SQLAlchemy's ORM? Or using any other python ORM? Or should I just go back to writing the SQL by hand?
SQLAlchemy's ORM is meant to be used together with the SQL layer, not hide it. But you do have to keep one or two things in mind when using the ORM and plain SQL in the same transaction. Basically, from one side, ORM data modifications will only hit the database when you flush the changes from your session. From the other side, SQL data manipulation statements don't affect the objects that are in your session.
So if you say
for c in session.query(Stuff).all():
c.foo = c.foo+1
session.commit()
it will do what it says, go fetch all the objects from the database, modify all the objects and then when it's time to flush the changes to the database, update the rows one by one.
Instead you should do this:
session.execute(update(stuff_table, values={stuff_table.c.foo: stuff_table.c.foo + 1}))
session.commit()
This will execute as one query as you would expect, and because at least the default session configuration expires all data in the session on commit you don't have any stale data issues.
In the almost-released 0.5 series you could also use this method for updating:
session.query(Stuff).update({Stuff.foo: Stuff.foo + 1})
session.commit()
That will basically run the same SQL statement as the previous snippet, but also select the changed rows and expire any stale data in the session. If you know you aren't using any session data after the update you could also add synchronize_session=False to the update statement and get rid of that select.
session.query(Clients).filter(Clients.id == client_id_list).update({'status': status})
session.commit()
Try this =)
There are several ways to UPDATE using sqlalchemy
1) for c in session.query(Stuff).all():
c.foo += 1
session.commit()
2) session.query(Stuff).update({"foo": Stuff.foo + 1})
session.commit()
3) conn = engine.connect()
table = Stuff.__table__
stmt = table.update().values({'foo': Stuff.foo + 'a'})
conn.execute(stmt)
conn.commit()
Here's an example of how to solve the same problem without having to map the fields manually:
from sqlalchemy import Column, ForeignKey, Integer, String, Date, DateTime, text, create_engine
from sqlalchemy.exc import IntegrityError
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy.orm.attributes import InstrumentedAttribute
engine = create_engine('postgres://postgres#localhost:5432/database')
session = sessionmaker()
session.configure(bind=engine)
Base = declarative_base()
class Media(Base):
__tablename__ = 'media'
id = Column(Integer, primary_key=True)
title = Column(String, nullable=False)
slug = Column(String, nullable=False)
type = Column(String, nullable=False)
def update(self):
s = session()
mapped_values = {}
for item in Media.__dict__.iteritems():
field_name = item[0]
field_type = item[1]
is_column = isinstance(field_type, InstrumentedAttribute)
if is_column:
mapped_values[field_name] = getattr(self, field_name)
s.query(Media).filter(Media.id == self.id).update(mapped_values)
s.commit()
So to update a Media instance, you can do something like this:
media = Media(id=123, title="Titular Line", slug="titular-line", type="movie")
media.update()
If it is because of the overhead in terms of creating objects, then it probably can't be sped up at all with SA.
If it is because it is loading up related objects, then you might be able to do something with lazy loading. Are there lots of objects being created due to references? (IE, getting a Company object also gets all of the related People objects).
Withough testing, I'd try:
for c in session.query(Stuff).all():
c.foo = c.foo+1
session.commit()
(IIRC, commit() works without flush()).
I've found that at times doing a large query and then iterating in python can be up to 2 orders of magnitude faster than lots of queries. I assume that iterating over the query object is less efficient than iterating over a list generated by the all() method of the query object.
[Please note comment below - this did not speed things up at all].
I'm trying to use association proxies to make dealing with tag-style records a little simpler, but I'm running into a problem enforcing uniqueness and getting objects to reuse existing tags rather than always create new ones.
Here is a setup similar to what I have. The examples in the documentation have a few recipes for enforcing uniqueness, but they all rely on having access to a session and usually require a single global session, which I cannot do in my case.
from sqlalchemy import Column, Integer, String, create_engine, ForeignKey
from sqlalchemy.orm import sessionmaker, relationship
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.ext.associationproxy import association_proxy
Base = declarative_base()
engine = create_engine('sqlite://', echo=True)
Session = sessionmaker(bind=engine)
def _tag_find_or_create(name):
# can't use global objects here, may be multiple sessions and engines
# ?? No access to session here, how to do a query
tag = session.query(Tag).filter_by(name=name).first()
tag = Tag.query.filter_by(name=name).first()
if not tag:
tag = Tag(name=name)
return tag
class Item(Base)
__tablename__ = 'item'
id = Column(Integer, primary_key=True)
tags = relationship('Tag', secondary='itemtag')
tagnames = association_proxy('tags', 'name', creator=_tag_find_or_create)
class ItemTag(Base)
__tablename__ = 'itemtag'
id = Column(Integer, primary_key=True)
item_id = Column(Integer, ForeignKey('item.id'))
tag_id = Column(Integer, ForeignKey('tag.id'))
class Tag(Base)
__tablename__ = 'tag'
id = Column(Integer, primary_key=True)
name = Column(String(50), nullable=False)
# Scenario 1
session = Session()
item = Item()
session.add(item)
item.tagnames.append('red')
# Scenario 2
item2 = Item()
item2.tagnames.append('blue')
item2.tagnames.append('red')
session.add(item2)
Without the creator function, I just get tons of duplicate Tag items. The creator function seems like the most obvious place to put this type of check, but I'm unsure how to do a query from inside the creator function.
Consider the two scenarios provided at the bottom of the example. In the first example, it seems like there should be a way to get access to the session in the creator function, since the object the tags are being added to is already associated with a session.
In the second example, the Item object isn't yet associated with a session, so the validation check can't happen in the creator function. It would have to happen later when the object is actually added to a session.
For the first scenario, how would I go about getting access to the session object in the creator function?
For the second scenario, is there a way to "listen" for when the parent object is added to a session and validate the association proxies at that point?
For the first scenario, you can use object_session.
As for the question overall: true, you need access to the current session; if using scoped_session in your application is appropriate, then the second part of the Recipe you link to should work fine to use. See Contextual/Thread-local Sessions for more info.
Working with events and change objects when they change from transient to persistent state will not make your code pretty or very robust. So I would immediately add new Tag objects to the session, and if the transaction is rolled back, they would not be in the database.
Note that in a multi-user environment you are likely to have race condition: the same tag is new and created in simultaneously by two users. The user who commits last will fail (if you have a unique constraint on the database).
In this case you might consider be without the unique constraint, and have a (daily) procedure to clean those duplicates up (and reassign relations). With time there would be less and less new items, and less possibilities for such clashes.
I'm starting a new application and looking at using an ORM -- in particular, SQLAlchemy.
Say I've got a column 'foo' in my database and I want to increment it. In straight sqlite, this is easy:
db = sqlite3.connect('mydata.sqlitedb')
cur = db.cursor()
cur.execute('update table stuff set foo = foo + 1')
I figured out the SQLAlchemy SQL-builder equivalent:
engine = sqlalchemy.create_engine('sqlite:///mydata.sqlitedb')
md = sqlalchemy.MetaData(engine)
table = sqlalchemy.Table('stuff', md, autoload=True)
upd = table.update(values={table.c.foo:table.c.foo+1})
engine.execute(upd)
This is slightly slower, but there's not much in it.
Here's my best guess for a SQLAlchemy ORM approach:
# snip definition of Stuff class made using declarative_base
# snip creation of session object
for c in session.query(Stuff):
c.foo = c.foo + 1
session.flush()
session.commit()
This does the right thing, but it takes just under fifty times as long as the other two approaches. I presume that's because it has to bring all the data into memory before it can work with it.
Is there any way to generate the efficient SQL using SQLAlchemy's ORM? Or using any other python ORM? Or should I just go back to writing the SQL by hand?
SQLAlchemy's ORM is meant to be used together with the SQL layer, not hide it. But you do have to keep one or two things in mind when using the ORM and plain SQL in the same transaction. Basically, from one side, ORM data modifications will only hit the database when you flush the changes from your session. From the other side, SQL data manipulation statements don't affect the objects that are in your session.
So if you say
for c in session.query(Stuff).all():
c.foo = c.foo+1
session.commit()
it will do what it says, go fetch all the objects from the database, modify all the objects and then when it's time to flush the changes to the database, update the rows one by one.
Instead you should do this:
session.execute(update(stuff_table, values={stuff_table.c.foo: stuff_table.c.foo + 1}))
session.commit()
This will execute as one query as you would expect, and because at least the default session configuration expires all data in the session on commit you don't have any stale data issues.
In the almost-released 0.5 series you could also use this method for updating:
session.query(Stuff).update({Stuff.foo: Stuff.foo + 1})
session.commit()
That will basically run the same SQL statement as the previous snippet, but also select the changed rows and expire any stale data in the session. If you know you aren't using any session data after the update you could also add synchronize_session=False to the update statement and get rid of that select.
session.query(Clients).filter(Clients.id == client_id_list).update({'status': status})
session.commit()
Try this =)
There are several ways to UPDATE using sqlalchemy
1) for c in session.query(Stuff).all():
c.foo += 1
session.commit()
2) session.query(Stuff).update({"foo": Stuff.foo + 1})
session.commit()
3) conn = engine.connect()
table = Stuff.__table__
stmt = table.update().values({'foo': Stuff.foo + 'a'})
conn.execute(stmt)
conn.commit()
Here's an example of how to solve the same problem without having to map the fields manually:
from sqlalchemy import Column, ForeignKey, Integer, String, Date, DateTime, text, create_engine
from sqlalchemy.exc import IntegrityError
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy.orm.attributes import InstrumentedAttribute
engine = create_engine('postgres://postgres#localhost:5432/database')
session = sessionmaker()
session.configure(bind=engine)
Base = declarative_base()
class Media(Base):
__tablename__ = 'media'
id = Column(Integer, primary_key=True)
title = Column(String, nullable=False)
slug = Column(String, nullable=False)
type = Column(String, nullable=False)
def update(self):
s = session()
mapped_values = {}
for item in Media.__dict__.iteritems():
field_name = item[0]
field_type = item[1]
is_column = isinstance(field_type, InstrumentedAttribute)
if is_column:
mapped_values[field_name] = getattr(self, field_name)
s.query(Media).filter(Media.id == self.id).update(mapped_values)
s.commit()
So to update a Media instance, you can do something like this:
media = Media(id=123, title="Titular Line", slug="titular-line", type="movie")
media.update()
If it is because of the overhead in terms of creating objects, then it probably can't be sped up at all with SA.
If it is because it is loading up related objects, then you might be able to do something with lazy loading. Are there lots of objects being created due to references? (IE, getting a Company object also gets all of the related People objects).
Withough testing, I'd try:
for c in session.query(Stuff).all():
c.foo = c.foo+1
session.commit()
(IIRC, commit() works without flush()).
I've found that at times doing a large query and then iterating in python can be up to 2 orders of magnitude faster than lots of queries. I assume that iterating over the query object is less efficient than iterating over a list generated by the all() method of the query object.
[Please note comment below - this did not speed things up at all].