SQLAlchemy bulk insert with one-to-one relation - python

I have the following model, where TableA and TableB have a one-to-one relationship:
class TableA(db.Model):
    id = Column(db.BigInteger, primary_key=True)
    title = Column(String(1024))
    # back_populates must name the matching attribute on TableB, which is `a`
    table_b = relationship('TableB', uselist=False, back_populates='a')
class TableB(db.Model):
    id = Column(BigInteger, ForeignKey(TableA.id), primary_key=True)
    a = relationship('TableA', back_populates='table_b')
    name = Column(String(1024))
When I insert one record, everything works fine:
rec_a = TableA(title='hello')
rec_b = TableB(a=rec_a, name='world')
db.session.add(rec_b)
db.session.commit()
But when I try to do this for a batch of records:
bulk_ = []
for title, name in zip(titles, names):
    rec_a = TableA(title=title)
    bulk_.append(TableB(a=rec_a, name=name))
db.session.bulk_save_objects(bulk_)
db.session.commit()
I get the following exception:
sqlalchemy.exc.InternalError: (pymysql.err.InternalError) (1364, "Field 'id' doesn't have a default value")
Am I doing something wrong? Did I configure the model wrong?
Is there a way to bulk commit this type of data?

The error you see is raised by MySQL. It is complaining that the INSERT into table_b supplies no value for id, a column that has no default. bulk_save_objects() does not cascade along relationships, so the related TableA rows are never inserted and their primary keys are never copied into TableB.id.
One technique is to write all the titles in one bulk statement, then write all the names in a second bulk statement. I've never passed relationships successfully to bulk operations, so this method relies on inserting simple values.
bulk_titles = [TableA(title=title) for title in titles]
session.bulk_save_objects(bulk_titles, return_defaults=True)
bulk_names = [TableB(id=title.id, name=name) for title, name in zip(bulk_titles, names)]
session.bulk_save_objects(bulk_names)
return_defaults=True is needed above because we need title.id in the second bulk operation. However, this greatly reduces the performance gains of the bulk operation.
To avoid the performance degradation due to return_defaults=True, you could generate the primary keys in the application rather than in the database, e.g. by using UUIDs, or by fetching the max id in each table and generating a range from that start value.
Another technique might be to write your bulk insert statement using SQLAlchemy Core or plain SQL text.
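To make the last two suggestions concrete, here is a minimal sketch that combines application-generated ids with SQLAlchemy Core inserts. It assumes SQLAlchemy 1.4 or later and an in-memory SQLite database; the Core tables table_a and table_b are hypothetical stand-ins for the models in the question, and the max-id approach assumes nothing else writes to table_a at the same time.

```python
from sqlalchemy import (
    Column, ForeignKey, Integer, MetaData, String, Table,
    create_engine, func, insert, select,
)

metadata = MetaData()

# Hypothetical Core tables mirroring TableA / TableB from the question.
table_a = Table(
    "table_a", metadata,
    Column("id", Integer, primary_key=True, autoincrement=False),
    Column("title", String(1024)),
)
table_b = Table(
    "table_b", metadata,
    Column("id", Integer, ForeignKey("table_a.id"), primary_key=True),
    Column("name", String(1024)),
)

engine = create_engine("sqlite://")  # in-memory database, illustration only
metadata.create_all(engine)

titles = ["hello", "goodbye"]
names = ["world", "moon"]

with engine.begin() as conn:
    # Read the current maximum id and hand out a contiguous range after it.
    # Assumption: no concurrent writer inserts into table_a meanwhile.
    current_max = conn.execute(
        select(func.coalesce(func.max(table_a.c.id), 0))
    ).scalar()
    ids = range(current_max + 1, current_max + 1 + len(titles))

    # One executemany-style INSERT per table: parents first, then children.
    conn.execute(
        insert(table_a),
        [{"id": i, "title": t} for i, t in zip(ids, titles)],
    )
    conn.execute(
        insert(table_b),
        [{"id": i, "name": n} for i, n in zip(ids, names)],
    )

with engine.connect() as conn:
    result = conn.execute(
        select(table_a.c.title, table_b.c.name)
        .select_from(table_a.join(table_b, table_a.c.id == table_b.c.id))
        .order_by(table_a.c.id)
    )
    rows = [tuple(row) for row in result]
```

Because the ids are known before either INSERT runs, nothing has to be fetched back from the database between the two statements.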

Related

composite UniqueConstraint and on_conflict_do_nothing don't combine?

I have a database with a table that has a composite unique constraint. So the table in flask-sqlalchemy looks something like this:
class MyModel(db.Model):
    id = db.Column(db.Integer, index=True)
    name = db.Column(db.String(80))
    collection = db.Column(db.String(80))
    entry_type = db.Column(db.String(80))
    __table_args__ = (
        db.UniqueConstraint(name, collection),
    )
Now I want to be able to insert several rows at a time, but my input could have duplicates, so I wanted the unique constraint to protect me.
I found that sqlite has an option for ON CONFLICT DO NOTHING, so I tried to do this with flask-sqlalchemy.
Following this stackoverflow answer and the documentation, what I have done so far is:
my_vals = (id, name, collection, entry_type)
insert_command = insert(MyModel.__table__).values(my_vals).on_conflict_do_nothing(
    index_elements=(MyModel.name, MyModel.collection))
db.session.execute(insert_command)
db.session.commit()
However, I keep getting (IntegrityError) UNIQUE constraint failed: mymodel.name, mymodel.collection
and when I look at the resulting sqlite command, the ON CONFLICT part has not been added.
If I write: on_conflict_do_nothing(index_elements=(MyModel.name,))
I get OperationalError ON CONFLICT clause does not match any PRIMARY KEY or UNIQUE constraint, but the ON CONFLICT (name) DO NOTHING part is added to the sqlite command.
Am I doing something wrong? Is it a bug? The documentation clearly states that the method expects a sequence, so why doesn't it work for more than one column?
PS: Please ignore any typos, the code is in another computer that has no access to internet.
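For comparison, here is a minimal sketch in which a composite conflict target does get rendered. It is not a diagnosis of the problem above. It assumes SQLAlchemy 1.4 or later, SQLite 3.24 or later, and that insert is imported from sqlalchemy.dialects.sqlite rather than from sqlalchemy itself; the Core table my_table is a hypothetical stand-in for MyModel, with id declared as the primary key.

```python
from sqlalchemy import (
    Column, Integer, MetaData, String, Table, UniqueConstraint,
    create_engine, func, select,
)
from sqlalchemy.dialects.sqlite import insert  # SQLite-specific INSERT construct

metadata = MetaData()

# Hypothetical Core table mirroring MyModel from the question.
my_table = Table(
    "mymodel", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(80)),
    Column("collection", String(80)),
    Column("entry_type", String(80)),
    UniqueConstraint("name", "collection"),
)

engine = create_engine("sqlite://")  # in-memory database, illustration only
metadata.create_all(engine)

new_rows = [
    {"name": "a", "collection": "c1", "entry_type": "x"},
    {"name": "a", "collection": "c1", "entry_type": "y"},  # duplicate key pair
    {"name": "a", "collection": "c2", "entry_type": "z"},
]

# The conflict target lists both columns of the composite unique constraint.
stmt = insert(my_table).on_conflict_do_nothing(
    index_elements=["name", "collection"]
)
compiled_sql = str(stmt.compile(dialect=engine.dialect))

with engine.begin() as conn:
    conn.execute(stmt, new_rows)  # executemany; the duplicate is skipped
    count = conn.execute(select(func.count()).select_from(my_table)).scalar()
```

Printing compiled_sql shows whether the ON CONFLICT clause made it into the statement, which is the first thing to check when the clause appears to be missing.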

How to rename an existing table?

I want to create 200+ tables using declarative base on the fly. I learnt it's not possible, so my idea was to create a common table and rename it 200+ times.
class Movie(Base):
    id = Column(Integer, primary_key=True)
    title = Column(String)
    release_date = Column(Date)
    name = Column(String)
    __tablename__ = 'titanic'
    def __init__(self, newname, title, release_date):
        self.title = title
        self.release_date = release_date
What is the code to change the table name from "titanic" to "wild"?
In Postgresql it is
ALTER TABLE table_name
RENAME TO new_table_name;
I am not finding a solution in sqlalchemy.
There are no foreign keys to this table.
The objective of this question is to rename an existing table thru a solution (if) available in sqlalchemy, not in a purely python way (as mentioned in the other question).
The easiest way to rename a table is to create a new table, dumping the data into it with an INSERT INTO statement.
More from the web:
You must issue the appropriate ALTER statements to your database to change the name of the table. As far as the Table metadata itself, you can attempt to set table.name = 'newname', and re-place the Table object within metadata.tables with its new name, but this may have lingering side effects regarding foreign keys that reference the old name. In general, the pattern is not supported - it's intended that a SQLAlchemy application runs with a fixed database structure (only new tables can be added on the fly).
(Source)
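Issuing the ALTER statement through SQLAlchemy can be sketched as follows. This assumes SQLAlchemy 1.4 or later and uses an in-memory SQLite database purely for illustration; the table names titanic and wild come from the question, the column layout is simplified, and any mapped class or Table object that still points at the old name would have to be dealt with separately.

```python
from sqlalchemy import create_engine, inspect, text

engine = create_engine("sqlite://")  # in-memory database, illustration only

with engine.begin() as conn:
    # Simplified stand-in for the table from the question.
    conn.execute(text("CREATE TABLE titanic (id INTEGER PRIMARY KEY, title TEXT)"))
    conn.execute(text("INSERT INTO titanic (title) VALUES ('some title')"))
    # The rename itself is plain DDL passed through SQLAlchemy.
    conn.execute(text("ALTER TABLE titanic RENAME TO wild"))

# Reflect the table names to confirm the rename took effect.
table_names = inspect(engine).get_table_names()
```

The same ALTER TABLE ... RENAME TO statement is the one quoted for PostgreSQL in the question, so only the connection URL should need to change there.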

How can I tell SQLAlchemy to use a different identity rule for Session.merge (instead of the PK)?

I have a legacy DB which was blindly created with auto-increment IDs even though there's a perfectly valid natural key in the table.
This results in code littered with logic along these lines:
Fetch row with natural key 'x'
if exists:
    update row with NK 'x'
else:
    insert row with NK 'x'
Essentially an upsert.
This use-case (upsert) is covered by Session.merge() from SQLAlchemy. But SA will only look at the primary key of the table to reconcile whether it has to do an insert or update. In the existing DB, the PK does however - contrary to what it should do - not represent the true identity of the row. So the same identity can appear with multiple auto-increment IDs. There are some other business rules in place to ensure uniqueness. But the ID 1 of today can be ID 3246 tomorrow!
There is currently no good way to modify the DB in a sensible manner as too many legacy applications are dependent on the structure as it is.
For the sake of a tangible example, assume we have network devices in the table, and take their hostname as natural key. The current DB would look something like this:
CREATE TABLE device (
    id SERIAL PRIMARY KEY,
    hostname TEXT UNIQUE,
    some_other_column TEXT
)
The corresponding SA model:
class Device(Base):
    id = Column(Integer, primary_key=True)
    hostname = Column(String(256))
    some_other_column = Column(String(20))
I would like to be able to do the following:
mydevice = Device(hostname='hello-world', some_other_column='foo')
merged_device = session.merge(mydevice)
session.commit()
In this example, I would like SA to do an "insert or update". But with the current model, this would actually result in an error (due to the unique hostname constraint).
I could specify the hostname column as primary key in the SA model (and leave the PK in the DB as-is). But that looks a bit hacky. Is there not a more explicit and understandable way to tell SQLAlchemy that it should use "hostname" as identity? And if yes, how?
In situations like this, I find it best to lie to SQLAlchemy. Tell it that the natural key is primary.
class Device(Base):
    hostname = Column(String(256), primary_key=True)
    some_other_column = Column(String(20))
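A minimal sketch of how Session.merge() behaves with that mapping, assuming SQLAlchemy 1.4 or later. An in-memory SQLite table stands in for the legacy table (INTEGER PRIMARY KEY instead of SERIAL), and the __tablename__ is an addition needed to make the class mappable on its own.

```python
from sqlalchemy import Column, String, create_engine, text
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Device(Base):
    __tablename__ = "device"
    # The mapping claims hostname is the primary key; the surrogate id
    # column that exists in the database is simply left unmapped.
    hostname = Column(String(256), primary_key=True)
    some_other_column = Column(String(20))


engine = create_engine("sqlite://")  # in-memory database, illustration only
with engine.begin() as conn:
    # SQLite stand-in for the legacy table from the question.
    conn.execute(text(
        "CREATE TABLE device ("
        " id INTEGER PRIMARY KEY,"
        " hostname TEXT UNIQUE,"
        " some_other_column TEXT)"
    ))

with Session(engine) as session:
    # First merge: no row with this hostname exists, so an INSERT is emitted.
    session.merge(Device(hostname="hello-world", some_other_column="foo"))
    session.commit()
    # Second merge: the row is found by hostname, so an UPDATE is emitted.
    session.merge(Device(hostname="hello-world", some_other_column="bar"))
    session.commit()
    result = session.execute(
        text("SELECT hostname, some_other_column FROM device")
    )
    stored = [tuple(row) for row in result]
```

The database keeps generating the surrogate id on its own, so legacy applications that rely on it should be unaffected by the mapping.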

Handling Redshift identity columns in SQLAlchemy

I'm using the redshift-sqlalchemy package to connect SQLAlchemy to Redshift. In Redshift I have a simple "companies" table:
create table if not exists companies (
    id bigint identity primary key,
    name varchar(1024) not null
);
On the SQLAlchemy side I have mapped it like so:
Base = declarative_base()
class Company(Base):
    __tablename__ = 'companies'
    id = Column(BigInteger, primary_key=True)
    name = Column(String)
If I try to create a company:
company = Company(name = 'Acme')
session.add(company)
session.commit()
then I get this error:
sqlalchemy.exc.StatementError: (raised as a result of Query-invoked autoflush;
consider using a session.no_autoflush block if this flush is occurring prematurely)
(sqlalchemy.exc.ProgrammingError) (psycopg2.ProgrammingError)
relation "companies_id_seq" does not exist
[SQL: 'select nextval(\'"companies_id_seq"\')']
[SQL: u'INSERT INTO companies (id, name)
VALUES (%(id)s, %(name)s)'] [parameters: [{'name': 'Acme'}]]
The problem is surely that SQLAlchemy is expecting an auto-incrementing sequence - standard technique with Postgres and other conventional DBs. But Redshift doesn't have sequences, instead it offers "identity columns" for auto-generated unique values (not necessarily sequential). Any advice on how to make this work? To be clear, I don't care about auto-incrementing, just need unique primary key values.
Just like you said, Redshift doesn't support sequences so you can remove this part:
select nextval(\'"companies_id_seq"\')
And your insert statement should simply be:
INSERT INTO companies
(name)
VALUES
('Acme')
In your table, you will see that 'Acme' has an id column with a unique value. You can't insert a value into the id column, so you don't specify it in the insert statement. It will be auto-populated.
Here is more explanation:
http://docs.aws.amazon.com/redshift/latest/dg/c_Examples_of_INSERT_30.html

SQLAlchemy: how should I define a column's default value computed using a reference to the table containing that column?

Let's use these tables as an example (SQLite):
CREATE TABLE department (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE employee (
    id INTEGER,
    name TEXT NOT NULL,
    department_id INTEGER NOT NULL,
    FOREIGN KEY (department_id) REFERENCES department(id),
    PRIMARY KEY (id, department_id)
);
I want each employee's ID to be unique only with respect to their department. On INSERT, a new employee ID should be generated that is one larger than the previously-highest employee ID in that department.
Put in raw SQL, here's what I'm looking to do:
INSERT INTO employee(
    id,
    name,
    department_id
)
VALUES (
    (
        SELECT coalesce(MAX(id),0)+1
        FROM employee
        WHERE department_id=?
    ),
    ?,
    ?
)
What's the best way to do this using SQLAlchemy?
I think I'm looking for something similar to the third column example in here. Something like this:
employee_table = Table("employee", meta,
    Column('id', Integer, primary_key=True, autoincrement=False,
           default=keyvalues.select(
               func.max(employee_table.c.id)
           ).filter_by(department_id=??)),
    Column('department_id', Integer, ForeignKey('department.id'),
           nullable=False, primary_key=True, autoincrement=False),
    Column('name', String(127), nullable=False),
)
That doesn't work, of course: I don't have a reference to the employee table yet (since I'm still defining it) and because I don't know how to reference the "current" department_id in the filter_by clause. (There are quite possibly other problems, too)
Alternatively, if it is not possible to do this through the Python API, is there any way I can just specify a column's default value (applied at INSERT time) using raw SQL? Or do I need to use raw SQL for the entire insert?
Note: my situation is basically the same as in this question, but the solution I'm looking for is different: I want to use a nested SELECT in my inserts rather than create a DB trigger.
EDIT
I'm getting closer to solving the problem, but I'm still not there yet.
agronholm in #sqlalchemy explained that by just using default there would be no way to fill in the department_id because although it's possible to have the selectable used as the default on INSERT, there is no way to fill in parameters (the department_id)
Instead, agronholm suggested the best solution is to create the subquery within the constructor. By assigning the query (not running it and assigning the result!), the id will be fetched in a sub-SELECT. This avoids the race condition that would result from performing the SELECT first on the Python side, and then assigning the result.
I'm trying out something like this:
def __init__(self, department, name):
    self.id = db.select(
        db.func.max(Employee.id)
    ).filter_by(department_id=department.id).as_scalar()
    self.department = department
    self.name = name
Unfortunately, this also doesn't work, because the calculated column is used as part of the primary key. It throws:
InvalidRequestError: Instance <XXXXX at 0x3d15d10> cannot be refreshed - it's not persistent and does not contain a full primary key.
In my original raw-SQLite version, I would access the newly-created row with the cursor's lastrowid. Is something similar possible in SQLAlchemy?
I ran into a similar problem and finally arrived at this solution. There's still room for improvement -- it does the SELECT before the INSERT rather than inlining it -- but it seems to work.
from sqlalchemy import sql
...
def default_employee_id(context):
    return context.connection.execute(
        sql.select(
            [sql.func.ifnull(sql.func.max(employee_table.c.id), 0) + 1]
        ).where(
            employee_table.c.department_id == context.current_parameters['department_id']
        )
    ).scalar()
employee_table = Table("employee", meta,
    Column('id', Integer, primary_key=True, autoincrement=False,
           default=default_employee_id),
    Column('department_id', Integer, ForeignKey('department.id'),
           nullable=False, primary_key=True, autoincrement=False),
    Column('name', String(127), nullable=False)
)
The next thing I would try is a trigger, even though the docs say it's a bad idea for a primary key.
Hooking into the "before_flush" event would probably have the same pre-select issue.
It may also be possible to alter or replace context.compiled argument in order to inject the SELECT into the INSERT, but that seems extreme for what we're trying to accomplish.
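As a less invasive way to get the SELECT inlined, the scalar subquery can be passed directly to a Core insert. The following is a minimal sketch, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database. The helper insert_employee is hypothetical, and the tables are re-declared here only so the example runs on its own. Concurrent transactions could still compute the same id, in which case the primary key constraint would reject one of the inserts.

```python
from sqlalchemy import (
    Column, ForeignKey, Integer, MetaData, String, Table,
    create_engine, func, insert, select,
)

meta = MetaData()
department_table = Table(
    "department", meta,
    Column("id", Integer, primary_key=True),
    Column("name", String(127), nullable=False),
)
employee_table = Table(
    "employee", meta,
    Column("id", Integer, primary_key=True, autoincrement=False),
    Column("department_id", Integer, ForeignKey("department.id"),
           primary_key=True, autoincrement=False, nullable=False),
    Column("name", String(127), nullable=False),
)


def insert_employee(conn, department_id, name):
    """Hypothetical helper: INSERT with the next per-department id inlined."""
    next_id = (
        select(func.coalesce(func.max(employee_table.c.id), 0) + 1)
        .where(employee_table.c.department_id == department_id)
        .correlate(None)      # keep "FROM employee" inside the subquery
        .scalar_subquery()
    )
    stmt = (
        insert(employee_table)
        .values(id=next_id, department_id=department_id, name=name)
        .inline()             # do not try to fetch the generated key back
    )
    conn.execute(stmt)


engine = create_engine("sqlite://")  # in-memory database, illustration only
meta.create_all(engine)

with engine.begin() as conn:
    conn.execute(
        insert(department_table),
        [{"id": 1, "name": "R&D"}, {"id": 2, "name": "Ops"}],
    )
    # One execute per row: each subquery must see the previous insert.
    insert_employee(conn, 1, "alice")
    insert_employee(conn, 1, "bob")
    insert_employee(conn, 2, "carol")
    result = conn.execute(
        select(
            employee_table.c.department_id,
            employee_table.c.id,
            employee_table.c.name,
        ).order_by(employee_table.c.department_id, employee_table.c.id)
    )
    rows = [tuple(row) for row in result]
```

This works at the Core level because the statement never needs the new id on the Python side. The ORM flush path still needs a full primary key for the new instance, which is the limitation described in the question's edit.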
