I have a model class:
class User(PBase):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False, unique=True)
Now, as per the documentation, when the Integer type is used along with primary_key, a sequence is generated automatically. Here is the resulting table definition:
id | integer | not null default nextval('users_id_seq'::regclass)
As you can see, a default sequence appears in the modifiers column.
But when I try to add the second user, I get an integrity error on the primary key constraint:
(IntegrityError) duplicate key value violates unique constraint "users_pkey"
DETAIL: Key (id)=(1) already exists.
What is wrong here?
Edit: Code for adding the user, a snapshot:
def create(self, name, email, roleid):
    with self._session_context() as session:
        user = User(name, email, roleid)
        session.add(user)
        session.commit()
So, I figured it out and am answering here in case it helps others. With Postgres, if you supply the id field yourself when inserting a new record, the table's sequence is not advanced. When a later insert omits the id, the sequence hands out a value that was already inserted explicitly, and you get a duplicate key error. In my app a few default records were loaded from a JSON file with their ids specified, while all other records were inserted without an id.
It can be solved by issuing the following query on your database, which moves the sequence up to the current maximum id:
SELECT setval('users_id_seq', MAX(id)) FROM users;
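To see why the collision happens, here is a toy model in plain Python. It is not Postgres, and the Sequence and Table classes are made up for illustration; it only mimics the rule that an explicit id bypasses the sequence.

```python
class Sequence:
    """Toy stand-in for a Postgres sequence such as users_id_seq."""

    def __init__(self):
        self.last_value = 0

    def nextval(self):
        self.last_value += 1
        return self.last_value

    def setval(self, value):
        self.last_value = value


class Table:
    """Toy table with a primary key that defaults to seq.nextval()."""

    def __init__(self):
        self.rows = {}
        self.seq = Sequence()

    def insert(self, name, id=None):
        # An explicit id bypasses the sequence entirely.
        if id is None:
            id = self.seq.nextval()
        if id in self.rows:
            raise KeyError(f"duplicate key: id={id}")
        self.rows[id] = name
        return id


users = Table()
users.insert("admin", id=1)  # loaded with explicit ids, e.g. from JSON
users.insert("guest", id=2)

try:
    users.insert("alice")  # sequence hands out 1, which is taken
    collided = False
except KeyError:
    collided = True

# Equivalent of: SELECT setval('users_id_seq', MAX(id)) FROM users;
users.seq.setval(max(users.rows))
new_id = users.insert("alice")
```

After the setval step the next generated id lands just past the explicitly inserted ones.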
Related
I have a database with a table that has a composite unique constraint. So the table in flask-sqlalchemy looks something like this:
class MyModel(db.Model):
    id = db.Column(db.Integer, index=True)
    name = db.Column(db.String(80))
    collection = db.Column(db.String(80))
    entry_type = db.Column(db.String(80))
    __table_args__ = (
        db.UniqueConstraint(name, collection),
    )
Now I want to be able to insert several rows at a time, but my input could have duplicates, so I wanted the unique constraint to protect me.
I found that SQLite has an ON CONFLICT DO NOTHING option, so I tried to use it with flask-sqlalchemy.
Following this stackoverflow answer and the documentation, what I have done so far is:
from sqlalchemy.dialects.sqlite import insert
my_vals = (id, name, collection, entry_type)
insert_command = insert(MyModel.__table__).values(my_vals).on_conflict_do_nothing(
    index_elements=(MyModel.name, MyModel.collection))
db.session.execute(insert_command)
db.session.commit()
However, I keep getting (IntegrityError) UNIQUE constraint failed: mymodel.name, mymodel.collection
and when I look at the resulting SQLite command, the ON CONFLICT part has not been added.
If I write: on_conflict_do_nothing(index_elements=(MyModel.name,))
I get OperationalError: ON CONFLICT clause does not match any PRIMARY KEY or UNIQUE constraint, but the ON CONFLICT (name) DO NOTHING part is added to the SQLite command.
Am I doing something wrong? Is it a bug? The documentation clearly states that the method expects a sequence, so why doesn't it work for more than one column?
PS: Please ignore any typos; the code is on another computer that has no internet access.
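For reference, this is the behaviour being asked for, sketched at the SQL level with the standard-library sqlite3 module instead of flask-sqlalchemy. The table name my_model and the sample rows are made up, and it assumes the bundled SQLite is 3.24 or newer, which is when the ON CONFLICT clause for INSERT was added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE my_model (
        id INTEGER,
        name TEXT,
        collection TEXT,
        entry_type TEXT,
        UNIQUE (name, collection)
    )
    """
)

rows = [
    (1, "a", "c1", "x"),
    (2, "a", "c1", "y"),  # same (name, collection) as the first row
    (3, "a", "c2", "x"),
]

# The conflict target lists both columns of the composite unique constraint.
conn.executemany(
    "INSERT INTO my_model (id, name, collection, entry_type) "
    "VALUES (?, ?, ?, ?) "
    "ON CONFLICT (name, collection) DO NOTHING",
    rows,
)
conn.commit()

stored = conn.execute(
    "SELECT id, name, collection, entry_type FROM my_model ORDER BY id"
).fetchall()
```

The duplicate row is skipped silently, so SQLite itself accepts a multi-column conflict target as long as it matches the unique constraint exactly.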
I have a legacy DB which was blindly created with auto-increment IDs even though there's a perfectly valid natural key in the table.
This ends up with code littered with blocks along these lines:
Fetch row with natural key 'x'
if exists:
    update row with NK 'x'
else:
    insert row with NK 'x'
Essentially an upsert.
This use case (upsert) is covered by Session.merge() in SQLAlchemy. But SA only looks at the table's primary key to decide whether it has to do an insert or an update. In the existing DB, however, the PK does not represent the true identity of the row, contrary to what it should do. So the same identity can appear with multiple auto-increment IDs. There are some other business rules in place to ensure uniqueness. But the ID 1 of today can be ID 3246 tomorrow!
There is currently no good way to modify the DB in a sensible manner as too many legacy applications are dependent on the structure as it is.
For the sake of a tangible example, assume we have network devices in the table, and take their hostname as natural key. The current DB would look something like this:
CREATE TABLE device (
    id SERIAL PRIMARY KEY,
    hostname TEXT UNIQUE,
    some_other_column TEXT
);
The corresponding SA model:
class Device(Base):
    __tablename__ = 'device'
    id = Column(Integer, primary_key=True)
    hostname = Column(String(256))
    some_other_column = Column(String(20))
I would like to be able to do the following:
mydevice = Device(hostname='hello-world', some_other_column='foo')
merged_device = session.merge(mydevice)
session.commit()
In this example, I would like SA to do an "insert or update". But with the current model, this would actually result in an error (due to the unique hostname constraint).
I could specify the hostname column as primary key in the SA model (and leave the PK in the DB as-is). But that looks a bit hacky. Is there not a more explicit and understandable way to tell SQLAlchemy that it should use "hostname" as identity? And if yes, how?
In situations like this, I find it best to lie to SQLAlchemy: tell it that the natural key is the primary key.
class Device(Base):
    __tablename__ = 'device'
    hostname = Column(String(256), primary_key=True)
    some_other_column = Column(String(20))
I have the following model, where TableA and TableB have a one-to-one relationship:
class TableA(db.Model):
    id = Column(db.BigInteger, primary_key=True)
    title = Column(String(1024))
    # back_populates must name the attribute on TableB, which is "a"
    table_b = relationship('TableB', uselist=False, back_populates='a')

class TableB(db.Model):
    id = Column(BigInteger, ForeignKey(TableA.id), primary_key=True)
    a = relationship('TableA', back_populates='table_b')
    name = Column(String(1024))
When I insert one record, everything goes fine:
rec_a = TableA(title='hello')
rec_b = TableB(a=rec_a, name='world')
db.session.add(rec_b)
db.session.commit()
But when I try to do this for a batch of records:
bulk_ = []
for title, name in zip(titles, names):
    rec_a = TableA(title=title)
    bulk_.append(TableB(a=rec_a, name=name))
db.session.bulk_save_objects(bulk_)
db.session.commit()
I get the following exception:
sqlalchemy.exc.InternalError: (pymysql.err.InternalError) (1364, "Field 'id' doesn't have a default value")
Am I doing something wrong? Did I configure the model wrong?
Is there a way to bulk commit this type of data?
The error you see is raised by MySQL. It is complaining that the INSERT into table_b supplies no value for id, and the column has no default. bulk_save_objects() does not cascade along relationships, so the TableA rows are never inserted and TableB.id is never filled in from them.
One technique is to write all the titles in one bulk statement, then write all the names in a second bulk statement. I've never managed to pass relationships to bulk operations, so this method relies on inserting simple values.
bulk_titles = [TableA(title=title) for title in titles]
session.bulk_save_objects(bulk_titles, return_defaults=True)
bulk_names = [TableB(id=title.id, name=name) for title, name in zip(bulk_titles, names)]
session.bulk_save_objects(bulk_names)
return_defaults=True is needed above because we need title.id in the second bulk operation. But this greatly reduces the performance gains of the bulk operation.
To avoid the performance degradation due to return_defaults=True, you could generate the primary keys in the application rather than in the database, e.g. using UUIDs, or by fetching the max id in each table and generating a range from that start value.
Another technique might be to write your bulk insert statement using SQLAlchemy Core or plain SQL text.
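A rough sketch of the "generate a range from the max id" idea, written against the standard-library sqlite3 module rather than MySQL so it runs anywhere. The table names table_a and table_b and the sample titles and names are assumptions. It is only safe when nothing else inserts into table_a between the MAX(id) query and the inserts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE table_b (
        id INTEGER PRIMARY KEY REFERENCES table_a(id),
        name TEXT
    );
    """
)

titles = ["hello", "goodbye"]
names = ["world", "moon"]

# Reserve a contiguous block of ids starting just past the current maximum.
start = conn.execute(
    "SELECT COALESCE(MAX(id), 0) + 1 FROM table_a"
).fetchone()[0]
ids = range(start, start + len(titles))

# Two plain bulk inserts; the shared ids tie each table_b row to its table_a row.
conn.executemany("INSERT INTO table_a (id, title) VALUES (?, ?)", zip(ids, titles))
conn.executemany("INSERT INTO table_b (id, name) VALUES (?, ?)", zip(ids, names))
conn.commit()

pairs = conn.execute(
    "SELECT a.title, b.name FROM table_a a "
    "JOIN table_b b ON a.id = b.id ORDER BY a.id"
).fetchall()
```

Because the ids are known up front, neither insert has to read anything back from the database.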
I have a class like this. My goal is to update the last_seen_date each time a duplicate link is encountered.
I've defined the id column, because this will be used as a foreign key into another table. This is auto-incrementing. The true duplicate is the url.
class Link(Base):
    __tablename__ = 'links'
    id = Column(BigInteger, primary_key=True, unique=True)
    url = Column(String(500), nullable=False, index=True, unique=True)
    # pass the callable, not its result, so the default is evaluated per insert
    last_seen_date = Column(DateTime, nullable=False, default=datetime.datetime.now)
I'd like to do an INSERT ... ON DUPLICATE KEY UPDATE, but I don't know how to define the fields to accomplish this. The way I have it now, I can never get a duplicate, because the combined key always includes the auto-incrementing id.
How do I need to change my definition so that, when the url is a duplicate, an ON DUPLICATE KEY UPDATE statement fires and I can update last_seen_date?
ALTER TABLE `tablename` ADD UNIQUE INDEX `ttt_url` (`url`);
ttt_url is the name of the key; the ttt_ prefix is just something I do out of habit. With the unique index in place, INSERT ... ON DUPLICATE KEY UPDATE fires when the url is a duplicate, so you can update the date/timestamp and preserve your original PK.
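Here is a small runnable sketch of the same idea. It uses the standard-library sqlite3 module, so the statement is SQLite's ON CONFLICT ... DO UPDATE (SQLite 3.24+) standing in for MySQL's ON DUPLICATE KEY UPDATE. The record_link helper, the URLs and the timestamps are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE links (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        url TEXT NOT NULL UNIQUE,
        last_seen_date TEXT NOT NULL
    )
    """
)


def record_link(conn, url, seen_at):
    """Insert the link, or refresh last_seen_date if the url already exists."""
    conn.execute(
        """
        INSERT INTO links (url, last_seen_date) VALUES (?, ?)
        ON CONFLICT (url) DO UPDATE SET last_seen_date = excluded.last_seen_date
        """,
        (url, seen_at),
    )


record_link(conn, "http://example.com/a", "2024-01-01T00:00:00")
record_link(conn, "http://example.com/b", "2024-01-02T00:00:00")
record_link(conn, "http://example.com/a", "2024-01-03T00:00:00")  # duplicate url
conn.commit()

links = conn.execute(
    "SELECT id, url, last_seen_date FROM links ORDER BY id"
).fetchall()
```

The duplicate url keeps its original id, which is what matters when other tables reference it as a foreign key.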
SQLAlchemy: how should I define a column's default value computed using a reference to the table containing that column?
Let's use these tables as an example (SQLite):
CREATE TABLE department (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE employee (
    id INTEGER,
    name TEXT NOT NULL,
    department_id INTEGER NOT NULL,
    FOREIGN KEY (department_id) REFERENCES department(id),
    PRIMARY KEY (id, department_id)
);
I want each employee's ID to be unique only with respect to their department. On INSERT, a new employee ID should be generated that is one larger than the previously highest employee ID in that department.
Put in raw SQL, here's what I'm looking to do:
INSERT INTO employee (
    id,
    name,
    department_id
)
VALUES (
    (
        SELECT coalesce(MAX(id), 0) + 1
        FROM employee
        WHERE department_id = ?
    ),
    ?,
    ?
)
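For reference, the raw statement above can be exercised with the standard-library sqlite3 module like this. The add_employee helper and the sample departments and names are made up for the example. Note that department_id has to be bound twice, once for the sub-SELECT and once for the column value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE department (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE employee (
        id INTEGER,
        name TEXT NOT NULL,
        department_id INTEGER NOT NULL,
        FOREIGN KEY (department_id) REFERENCES department(id),
        PRIMARY KEY (id, department_id)
    );
    INSERT INTO department (id, name) VALUES (1, 'eng'), (2, 'ops');
    """
)


def add_employee(conn, name, department_id):
    """Insert an employee whose id is the next free one in the department."""
    conn.execute(
        """
        INSERT INTO employee (id, name, department_id)
        VALUES (
            (SELECT COALESCE(MAX(id), 0) + 1
             FROM employee
             WHERE department_id = ?),
            ?,
            ?
        )
        """,
        (department_id, name, department_id),
    )


for name, dept in [("ann", 1), ("bob", 1), ("cy", 2)]:
    add_employee(conn, name, dept)
conn.commit()

rows = conn.execute(
    "SELECT department_id, id, name FROM employee ORDER BY department_id, id"
).fetchall()
```

Each department gets its own counter starting at 1.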
What's the best way to do this using SQLAlchemy?
I think I'm looking for something similar to the third column example in here. Something like this:
employee_table = Table("employee", meta,
    Column('id', Integer, primary_key=True, autoincrement=False,
           default=keyvalues.select(
               func.max(employee_table.c.id)
           ).filter_by(department_id=??)),
    Column('department_id', Integer, ForeignKey('department.id'),
           nullable=False, primary_key=True, autoincrement=False),
    Column('name', String(127), nullable=False),
)
That doesn't work, of course: I don't have a reference to the employee table yet (since I'm still defining it) and because I don't know how to reference the "current" department_id in the filter_by clause. (There are quite possibly other problems, too)
Alternatively, if it is not possible to do this through the Python API, is there any way I can just specify a column's default value (applied at INSERT time) using raw SQL? Or do I need to use raw SQL for the entire insert?
Note: my situation is basically the same as in this question, but the solution I'm looking for is different: I want to use a nested SELECT in my inserts rather than create a DB trigger.
EDIT
I'm getting closer to solving the problem, but I'm still not there yet.
agronholm in #sqlalchemy explained that by just using default there would be no way to fill in the department_id because although it's possible to have the selectable used as the default on INSERT, there is no way to fill in parameters (the department_id)
Instead, agronholm suggested the best solution is to create the subquery within the constructor. By assigning the query (not running it and assigning the result!), the id will be fetched in a sub-SELECT. This avoids the race condition that would result from performing the SELECT first on the Python side, and then assigning the result.
I'm trying out something like this:
def __init__(self, department, name):
    self.id = db.select(
        db.func.max(Employee.id)
    ).filter_by(department_id=department.id).as_scalar()
    self.department = department
    self.name = name
Unfortunately, this also doesn't work, because the calculated column is used as part of the primary key. It throws:
InvalidRequestError: Instance <XXXXX at 0x3d15d10> cannot be refreshed - it's not persistent and does not contain a full primary key.
In my original raw-SQLite version, I would access the newly-created row with the cursor's lastrowid. Is something similar possible in SQLAlchemy?
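For comparison, this is roughly what the raw-SQLite approach looks like with the standard-library sqlite3 module. The insert_employee helper is made up for the example. It relies on the fact that an ordinary SQLite table with a composite primary key still has an implicit rowid, which is what cursor.lastrowid reports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        id INTEGER,
        name TEXT NOT NULL,
        department_id INTEGER NOT NULL,
        PRIMARY KEY (id, department_id)
    )
    """
)


def insert_employee(conn, name, department_id):
    """Insert with a computed id and return the (id, department_id) key."""
    cur = conn.execute(
        """
        INSERT INTO employee (id, name, department_id)
        VALUES (
            (SELECT COALESCE(MAX(id), 0) + 1
             FROM employee
             WHERE department_id = ?),
            ?,
            ?
        )
        """,
        (department_id, name, department_id),
    )
    # lastrowid is the implicit rowid of the new row, not the employee id,
    # so look the row up by rowid to read the computed key back.
    return conn.execute(
        "SELECT id, department_id FROM employee WHERE rowid = ?",
        (cur.lastrowid,),
    ).fetchone()


first = insert_employee(conn, "ann", 7)
second = insert_employee(conn, "bob", 7)
other = insert_employee(conn, "cy", 9)
conn.commit()
```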
I ran into a similar problem and finally arrived at this solution. There's still room for improvement -- it does the SELECT before the INSERT rather than inlining it -- but it seems to work.
from sqlalchemy import sql
...
def default_employee_id(context):
    # Runs a separate SELECT just before the INSERT to compute the next id
    # within the department that is being inserted into.
    # (SQLAlchemy 1.4+ spells this sql.select(expr), without the list.)
    return context.connection.execute(
        sql.select(
            [sql.func.ifnull(sql.func.max(employee_table.c.id), 0) + 1]
        ).where(
            employee_table.c.department_id == context.current_parameters['department_id']
        )
    ).scalar()

employee_table = Table("employee", meta,
    Column('id', Integer, primary_key=True, autoincrement=False,
           default=default_employee_id),
    Column('department_id', Integer, ForeignKey('department.id'),
           nullable=False, primary_key=True, autoincrement=False),
    Column('name', String(127), nullable=False)
)
The next thing I would try is a trigger, even though the docs say it's a bad idea for a primary key.
Hooking into the "before_flush" event would probably have the same pre-select issue.
It may also be possible to alter or replace context.compiled argument in order to inject the SELECT into the INSERT, but that seems extreme for what we're trying to accomplish.