composite UniqueConstraint and on_conflict_do_nothing don't combine?

I have a database with a table that has a composite unique constraint. So the table in flask-sqlalchemy looks something like this:
class MyModel(db.Model):
    id = db.Column(db.Integer, primary_key=True, index=True)
    name = db.Column(db.String(80))
    collection = db.Column(db.String(80))
    entry_type = db.Column(db.String(80))
    __table_args__ = (
        db.UniqueConstraint(name, collection),
    )
Now I want to be able to insert several rows at a time, but my input could have duplicates, so I wanted the unique constraint to protect me.
I found that sqlite has an option for ON CONFLICT DO NOTHING, so I tried to do this with flask-sqlalchemy.
Following this stackoverflow answer and the documentation, what I have done so far is:
from sqlalchemy.dialects.sqlite import insert

my_vals = {'id': id, 'name': name, 'collection': collection, 'entry_type': entry_type}
insert_command = insert(MyModel.__table__).values(my_vals).on_conflict_do_nothing(
    index_elements=(MyModel.name, MyModel.collection))
db.session.execute(insert_command)
db.session.commit()
However, I keep getting (IntegrityError) UNIQUE constraint failed: mymodel.name, mymodel.collection,
and when I inspect the resulting SQLite command, the ON CONFLICT part is not added.
If I instead write on_conflict_do_nothing(index_elements=(MyModel.name,)),
I get OperationalError: ON CONFLICT clause does not match any PRIMARY KEY or UNIQUE constraint, but this time the ON CONFLICT (name) DO NOTHING part is added to the SQLite command.
Am I doing something wrong? Is it a bug? The documentation clearly states that the method expects a sequence, so why doesn't it work for more than one column?
PS: Please ignore any typos, the code is in another computer that has no access to internet.
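For what it's worth, here is a self-contained reconstruction of the attempt (hedged: the setup follows the model above, and SQLite's ON CONFLICT support only exists in SQLAlchemy 1.4 or newer):

from sqlalchemy.dialects.sqlite import insert  # SQLite-dialect insert, new in SQLAlchemy 1.4

rows = [
    {'name': 'a', 'collection': 'c1', 'entry_type': 't1'},
    {'name': 'a', 'collection': 'c1', 'entry_type': 't2'},  # duplicate on (name, collection)
]
stmt = insert(MyModel.__table__).values(rows).on_conflict_do_nothing(
    index_elements=['name', 'collection'])  # plain column names are also accepted
db.session.execute(stmt)
db.session.commit()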

Related

Postgres SQLAlchemy auto increment not working

I have a model class:
class User(PBase):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False, unique=True)
Now, as per the documentation, when the Integer type is used together with primary_key, a sequence is generated automatically. Here is the output table:
id | integer | not null default nextval('users_id_seq'::regclass)
As you can see, a default sequence is generated in the modifiers column.
But when I try to add a second user, I get an integrity error on the primary key constraint:
(IntegrityError) duplicate key value violates unique constraint "users_pkey"
DETAIL: Key (id)=(1) already exists.
What is wrong here?
Edit: code for adding the user, a snapshot:
def create(self, name, email, roleid):
    with self._session_context() as session:
        user = User(name=name, email=email, roleid=roleid)
        session.add(user)
        session.commit()
So, I figured it out and am answering here in case it helps others. With Postgres, if you supply the id explicitly when inserting a record, the table's sequence is not used and, crucially, not advanced. On later inserts where no id is given, the sequence hands out values that have already been taken, hence the duplicate key error. In my app, a few default records were loaded from a JSON file with explicit ids, while no id was supplied for anything inserted afterwards.
It can be solved by issuing the following query on your database, which resets the sequence to the current maximum id:
SELECT setval('users_id_seq', MAX(id)) FROM users;
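If you'd rather run the fix from application code than psql, a minimal sketch (assuming a SQLAlchemy session bound to the same database):

from sqlalchemy import text

# Reset users_id_seq so that nextval() continues after the highest existing id.
session.execute(text("SELECT setval('users_id_seq', (SELECT MAX(id) FROM users))"))
session.commit()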

How can I tell SQLAlchemy to use a different identity rule for Session.merge (instead of the PK)?

I have a legacy DB which was blindly created with auto-increment IDs even though there's a perfectly valid natural key in the table.
This ends up with code littered with blocks along these lines:
Fetch row with natural key 'x'
if exists:
    update row with NK 'x'
else:
    insert row with NK 'x'
Essentially an upsert.
This use-case (upsert) is covered by Session.merge() in SQLAlchemy. But SA will only look at the primary key of the table to reconcile whether it has to do an insert or update. In the existing DB, however, the PK does not (contrary to what it should do) represent the true identity of the row, so the same identity can appear under multiple auto-increment IDs. There are some other business rules in place to ensure uniqueness, but the ID 1 of today can be ID 3246 tomorrow!
There is currently no good way to modify the DB in a sensible manner as too many legacy applications are dependent on the structure as it is.
For the sake of a tangible example, assume we have network devices in the table, and take their hostname as natural key. The current DB would look something like this:
CREATE TABLE device (
    id SERIAL PRIMARY KEY,
    hostname TEXT UNIQUE,
    some_other_column TEXT
)
The corresponding SA model:
class Device(Base):
    __tablename__ = 'device'
    id = Column(Integer, primary_key=True)
    hostname = Column(String(256))
    some_other_column = Column(String(20))
I would like to be able to do the following:
mydevice = Device(hostname='hello-world', some_other_column='foo')
merged_device = session.merge(mydevice)
session.commit()
In this example, I would like SA to do an "insert or update". But with the current model, this would actually result in an error (due to the unique hostname constraint).
I could specify the hostname column as primary key in the SA model (and leave the PK in the DB as-is). But that looks a bit hacky. Is there not a more explicit and understandable way to tell SQLAlchemy that it should use "hostname" as identity? And if yes, how?
In situations like this, I find it best to lie to SQLAlchemy: tell it that the natural key is primary.
class Device(Base):
    __tablename__ = 'device'
    hostname = Column(String(256), primary_key=True)
    some_other_column = Column(String(20))
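With that mapping, the snippet from the question becomes an insert-or-update (a sketch; it assumes the id column stays in the database as-is and keeps auto-incrementing):

# First merge: no row with hostname 'hello-world' exists yet, so this INSERTs.
session.merge(Device(hostname='hello-world', some_other_column='foo'))
session.commit()

# Second merge: the mapped identity is now the hostname, so this UPDATEs the
# existing row instead of tripping the unique constraint.
session.merge(Device(hostname='hello-world', some_other_column='bar'))
session.commit()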

SQLAlchemy bulk insert with one-to-one relation

I have the following model, where TableA and TableB have a one-to-one relationship:
class TableA(db.Model):
    id = Column(BigInteger, primary_key=True)
    title = Column(String(1024))
    table_b = relationship('TableB', uselist=False, back_populates='a')  # back_populates must name TableB.a

class TableB(db.Model):
    id = Column(BigInteger, ForeignKey(TableA.id), primary_key=True)
    a = relationship('TableA', back_populates='table_b')
    name = Column(String(1024))
when I insert 1 record everything goes fine:
rec_a = TableA(title='hello')
rec_b = TableB(a=rec_a, name='world')
db.session.add(rec_b)
db.session.commit()
but when I try to do this for a bulk of records:
bulk_ = []
for title, name in zip(titles, names):
    rec_a = TableA(title=title)
    bulk_.append(TableB(a=rec_a, name=name))
db.session.bulk_save_objects(bulk_)
db.session.commit()
I get the following exception:
sqlalchemy.exc.InternalError: (pymysql.err.InternalError) (1364, "Field 'id' doesn't have a default value")
Am I doing something wrong? Did I configure the model wrong?
Is there a way to bulk commit this type of data?
The error you see is thrown by MySQL. It is complaining that no value was supplied for table_b.id, which has no default: bulk_save_objects does not cascade along relationships, so the TableA objects attached via a are ignored and the TableB rows arrive without an id.
One technique could be to write all the titles in one bulk statement, then write all the names in a second bulk statement. Also, I've never passed relationships successfully to bulk operations, so this method relies on inserting simple values.
bulk_titles = [TableA(title=title) for title in titles]
session.bulk_save_objects(bulk_titles, return_defaults=True)
bulk_names = [TableB(id=title.id, name=name) for title, name in zip(bulk_titles, names)]
session.bulk_save_objects(bulk_names)
return_defaults=True is needed above because we need title.id in the second bulk operation, but it greatly reduces the performance gains of the bulk operation.
To avoid the performance degradation of return_defaults=True, you could generate the primary keys in the application rather than the database, e.g. using UUIDs, or by fetching the max id in each table and generating a range from that start value, as sketched below.
Another technique might be to write your bulk insert statement using sqlalchemy core or plain text.
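Combining those two ideas, a minimal sketch (hypothetical: it assumes integer keys and a single writer, since concurrent inserts could otherwise race for the same id range):

from sqlalchemy import func

# Generate the primary keys application-side from the current max id...
start = (db.session.query(func.max(TableA.id)).scalar() or 0) + 1

# ...then insert both tables with Core-level executemany, bypassing the ORM.
db.session.execute(TableA.__table__.insert(),
                   [{'id': start + i, 'title': t} for i, t in enumerate(titles)])
db.session.execute(TableB.__table__.insert(),
                   [{'id': start + i, 'name': n} for i, n in enumerate(names)])
db.session.commit()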

SQLAlchemy: how should I define a column's default value computed using a reference to the table containing that column?

Let's use these tables as an example (SQLite):
CREATE TABLE department (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE employee (
    id INTEGER,
    name TEXT NOT NULL,
    department_id INTEGER NOT NULL,
    FOREIGN KEY (department_id) REFERENCES department(id),
    PRIMARY KEY (id, department_id)
);
I want each employee's ID to be unique only with respect to their department. On INSERT, a new employee ID should be generated that is one larger than the previously-highest employee ID in that department.
Put in raw SQL, here's what I'm looking to do:
INSERT INTO employee (
    id,
    name,
    department_id
)
VALUES (
    (
        SELECT coalesce(MAX(id), 0) + 1
        FROM employee
        WHERE department_id = ?
    ),
    ?,
    ?
)
What's the best way to do this using SQLAlchemy?
I think I'm looking for something similar to the third column example here. Something like this:
employee_table = Table("employee", meta,
    Column('id', Integer, primary_key=True, autoincrement=False,
        default=employee_table.select(
            func.max(employee_table.c.id)
        ).filter_by(department_id=??)),
    Column('department_id', Integer, ForeignKey('department.id'),
        nullable=False, primary_key=True, autoincrement=False),
    Column('name', String(127), nullable=False),
)
That doesn't work, of course: I don't have a reference to the employee table yet (since I'm still defining it) and because I don't know how to reference the "current" department_id in the filter_by clause. (There are quite possibly other problems, too)
Alternatively, if it is not possible to do this through the Python API, is there any way I can just specify a column's default value (applied at INSERT time) using raw SQL? Or do I need to use raw SQL for the entire insert?
Note: my situation is basically the same as in this question, but the solution I'm looking for is different: I want to use a nested SELECT in my inserts rather than create a DB trigger.
EDIT
I'm getting closer to solving the problem, but I'm still not there yet.
agronholm in #sqlalchemy explained that with just default there would be no way to fill in the department_id: although it's possible to have a selectable used as the default on INSERT, there is no way to fill in its parameters (the department_id).
Instead, agronholm suggested the best solution is to create the subquery within the constructor. By assigning the query (not running it and assigning the result!), the id will be fetched in a sub-SELECT. This avoids the race condition that would result from performing the SELECT first on the Python side, and then assigning the result.
I'm trying out something like this:
def __init__(self, department, name):
    self.id = db.select(
        db.func.max(Employee.id)
    ).filter_by(department_id=department.id).as_scalar()
    self.department = department
    self.name = name
Unfortunately, this also doesn't work, because the calculated column is used as part of the primary key. It throws:
InvalidRequestError: Instance <XXXXX at 0x3d15d10> cannot be refreshed - it's not persistent and does not contain a full primary key.
In my original raw-SQLite version, I would access the newly-created row with the cursor's lastrowid. Is something similar possible in SQLAlchemy?
I ran into a similar problem and finally arrived at this solution. There's still room for improvement (it does the SELECT before the INSERT rather than inlining it), but it seems to work.
from sqlalchemy import sql
...
def default_employee_id(context):
    return context.connection.execute(
        sql.select(
            [sql.func.ifnull(sql.func.max(employee_table.c.id), 0) + 1]
        ).where(
            employee_table.c.department_id == context.current_parameters['department_id']
        )
    ).scalar()

employee_table = Table("employee", meta,
    Column('id', Integer, primary_key=True, autoincrement=False,
        default=default_employee_id),
    Column('department_id', Integer, ForeignKey('department.id'),
        nullable=False, primary_key=True, autoincrement=False),
    Column('name', String(127), nullable=False)
)
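A quick usage sketch (hypothetical: the select([...]) list syntax is SQLAlchemy 1.x style, so this assumes a 1.x install, plus a department_table defined on the same meta):

from sqlalchemy import create_engine

engine = create_engine('sqlite://')
meta.create_all(engine)

with engine.begin() as conn:
    conn.execute(department_table.insert(), {'id': 1, 'name': 'R&D'})
    conn.execute(employee_table.insert(), {'department_id': 1, 'name': 'Alice'})
    conn.execute(employee_table.insert(), {'department_id': 1, 'name': 'Bob'})
    # Alice is inserted with id 1 and Bob with id 2, numbered per department.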
The next thing I would try is a trigger, even though the docs say it's a bad idea for a primary key.
Hooking into the "before_flush" event would probably have the same pre-select issue.
It may also be possible to alter or replace the context.compiled statement in order to inject the SELECT into the INSERT, but that seems extreme for what we're trying to accomplish.

Why is SQLAlchemy/associationproxy duplicating my tags?

I'm trying to use association proxy for tags, in a very similar scenario to the example in the docs. Here is a subset of my schema (it's a blog), using declarative:
class Tag(Base):
    __tablename__ = 'tags'
    id = Column(Integer, primary_key=True)
    tag = Column(Unicode(255), unique=True, nullable=False)

class EntryTag(Base):
    __tablename__ = 'entrytags'
    entry_id = Column(Integer, ForeignKey('entries.id'), key='entry', primary_key=True)
    tag_id = Column(Integer, ForeignKey('tags.id'), key='tag', primary_key=True)

class Entry(Base):
    __tablename__ = 'entries'
    id = Column(Integer, primary_key=True)
    subject = Column(Unicode(255), nullable=False)
    # some other fields here
    _tags = relation('Tag', backref='entries', secondary=EntryTag.__table__)
    tags = association_proxy('_tags', 'tag')
Here's how I'm trying to use it:
>>> e = db.query(Entry).first()
>>> e.tags
[u'foo']
>>> e.tags = [u'foo', u'bar']  # really this is from a comma-separated input
>>> db.commit()
Traceback (most recent call last):
[...]
sqlalchemy.exc.IntegrityError: (IntegrityError) duplicate key value violates unique constraint "tags_tag_key"
'INSERT INTO tags (id, tag) VALUES (%(id)s, %(tag)s)' {'tag': 'bar', 'id': 11L}
>>> map(lambda t:(t.id,t.tag), db.query(Tag).all())
[(1, u'foo'), (2, u'bar'), (3, u'baz')]
The tag u'bar' already existed with id 2; why didn't SQLAlchemy just attach that one instead of trying to create it? Is my schema wrong somehow?
Disclaimer: it's been ages since I used SQLAlchemy so this is more of a guess than anything.
It looks like you're expecting SQLAlchemy to magically take the string 'bar' and look up the relevant Tag for it when performing the insert on the many-to-many table. I expect this is invalid, because the field in question ('tag') is not a primary key.
Imagine a similar situation where your Tag table is actually Comment, also with an id and a text field. You'd expect to be able to add Comments to an Entry with the same e.comments = [u'Foo', u'Bar'] syntax that you've used above, but you'd want it to just perform INSERTs, not check for existing comments with the same content.
So that is probably what it's doing here, but it hits the uniqueness constraint on your tag name and fails, assuming that you're attempting to do the wrong thing.
How to fix it? Making tags.tag the primary key is arguably the correct thing to do, although I don't know how efficient that is nor how well SQLAlchemy handles it. Failing that, try querying for Tag objects by name before assigning them to the entry. You may have to write a little utility function that takes a unicode string and either returns an existing Tag or creates a new one for you.
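For instance, a hypothetical helper along those lines (get_or_create_tag is not from the original post; it assumes the models above and a session named db):

def get_or_create_tag(db, tag_name):
    # Reuse an existing Tag row when one matches, otherwise create it.
    tag = db.query(Tag).filter_by(tag=tag_name).first()
    if tag is None:
        tag = Tag(tag=tag_name)
        db.add(tag)
    return tag

e._tags = [get_or_create_tag(db, t) for t in [u'foo', u'bar']]
db.commit()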
I haven't used SQLAlchemy 0.5 yet (my last app using it was 0.4 based), but I can see one quirk in your code: you should modify the association_proxy collection, not reassign it.
Try doing something like:
e.tags.append(u"bar")
Instead of
e.tags = ...
If that doesn't work, try pasting a complete working example for those tables (including the imports, please!) and I'll give you some more advice.
