I'm using the redshift-sqlalchemy package to connect SQLAlchemy to Redshift. In Redshift I have a simple "companies" table:
create table if not exists companies (
    id bigint identity primary key,
    name varchar(1024) not null
);
On the SQLAlchemy side I have mapped it like so:
Base = declarative_base()

class Company(Base):
    __tablename__ = 'companies'
    id = Column(BigInteger, primary_key=True)
    name = Column(String)
If I try to create a company:
company = Company(name='Acme')
session.add(company)
session.commit()
then I get this error:
sqlalchemy.exc.StatementError: (raised as a result of Query-invoked autoflush;
consider using a session.no_autoflush block if this flush is occurring prematurely)
(sqlalchemy.exc.ProgrammingError) (psycopg2.ProgrammingError)
relation "companies_id_seq" does not exist
[SQL: 'select nextval(\'"companies_id_seq"\')']
[SQL: u'INSERT INTO companies (id, name)
VALUES (%(id)s, %(name)s)'] [parameters: [{'name': 'Acme'}]]
The problem is surely that SQLAlchemy is expecting an auto-incrementing sequence, the standard technique with Postgres and other conventional databases. But Redshift doesn't have sequences; instead it offers "identity columns" for auto-generated unique values (not necessarily sequential). Any advice on how to make this work? To be clear, I don't care about auto-incrementing, I just need unique primary key values.
Just like you said, Redshift doesn't support sequences, so you can remove this part:
select nextval(\'"companies_id_seq"\')
And your insert statement should simply be:
INSERT INTO companies (name)
VALUES ('Acme');
In your table, you will see that 'Acme' has an id column with a unique value. You can't insert a value into the id column, so don't specify it in the INSERT statement; it will be auto-populated.
Here is more explanation:
http://docs.aws.amazon.com/redshift/latest/dg/c_Examples_of_INSERT_30.html
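On the SQLAlchemy side, a minimal sketch of one way to get an INSERT of that shape, keeping the declarative mapping from the question: mark id as server-populated so SQLAlchemy neither sends a value nor calls nextval(). Whether the generated id can be read back after the INSERT depends on the dialect, so treat this as a starting point to verify against Redshift:

from sqlalchemy import Column, BigInteger, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.schema import FetchedValue

Base = declarative_base()

class Company(Base):
    __tablename__ = 'companies'
    # FetchedValue marks the column as populated by the server, so the
    # INSERT omits id entirely and no sequence is consulted.
    id = Column(BigInteger, primary_key=True, server_default=FetchedValue())
    name = Column(String)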
Related
I have a database with a table that has a composite unique constraint. So the table in flask-sqlalchemy looks something like this:
class MyModel(db.Model):
    id = db.Column(db.Integer, index=True)
    name = db.Column(db.String(80))
    collection = db.Column(db.String(80))
    entry_type = db.Column(db.String(80))
    __table_args__ = (
        db.UniqueConstraint(name, collection),
    )
Now I want to be able to insert several rows at a time, but my input could have duplicates, so I wanted the unique constraint to protect me.
I found that SQLite has an option for ON CONFLICT DO NOTHING, so I tried to do this with flask-sqlalchemy.
Following this stackoverflow answer and the documentation, what I have done so far is:
my_vals = (id, name, collection, entry_type)
insert_command = insert(MyModel.__table__).values(my_vals).on_conflict_do_nothing(
    index_elements=(MyModel.name, MyModel.collection))
db.session.execute(insert_command)
db.session.commit()
db.session.commit()
However, I keep getting (IntegrityError) UNIQUE constraint failed: mymodel.name, mymodel.collection
and I can see in the resulting sqlite command that the ON CONFLICT part is not added.
If I write on_conflict_do_nothing(index_elements=(MyModel.name,)) instead,
I get OperationalError: ON CONFLICT clause does not match any PRIMARY KEY or UNIQUE constraint, but this time the ON CONFLICT (name) DO NOTHING part is added to the sqlite command.
Am I doing something wrong? Is it a bug? The documentation clearly states that the method expects a sequence, so why doesn't it work for more than one column?
PS: Please ignore any typos, the code is on another computer that has no internet access.
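For later readers, a minimal sketch of a composite-target ON CONFLICT DO NOTHING against the SQLite dialect. This assumes SQLAlchemy 1.4+, where the sqlite dialect gained ON CONFLICT support, and that index_elements lists every column of the composite unique constraint so SQLite can match it:

from sqlalchemy.dialects.sqlite import insert

stmt = insert(MyModel.__table__).values(
    id=1, name='a', collection='c', entry_type='t',
).on_conflict_do_nothing(
    index_elements=['name', 'collection'],
)
db.session.execute(stmt)
db.session.commit()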
(sorry, there are many similar questions on SO but none I could find that match well enough)
Attempting to upsert to a Postgres RDS table via a temp table...
import sqlalchemy as sa

# assume db_engine is already set up
with db_engine.connect() as conn:
    conn.execute(sa.text("DROP TABLE IF EXISTS temp_table"))

    build_temp_table = """
        CREATE TABLE temp_table (
            unique_id VARCHAR(40) NOT NULL,
            date TIMESTAMP,
            amount NUMERIC,
            UNIQUE (unique_id)
        );
    """
    conn.execute(sa.text(build_temp_table))

    upsert_sql_string = """
        INSERT INTO production_table (unique_id, date, amount)
        SELECT unique_id, date, amount FROM temp_table
        ON CONFLICT (unique_id)
        DO UPDATE SET
            date = excluded.date,
            amount = excluded.amount
    """
    conn.execute(sa.text(upsert_sql_string))
Note: production_table is configured identically to temp_table.
Other methods I have tried include:
Specifying unique_id as PRIMARY KEY or UNIQUE in table definition
Running ALTER TABLE temp_table ADD PRIMARY KEY (unique_id) after creating temp_table
Regardless of what I do, I get the error:
psycopg2.errors.InvalidColumnReference: there is no unique or exclusion constraint matching the ON CONFLICT specification
Thanks
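A hedged aside, since the question is left open here: Postgres resolves ON CONFLICT (unique_id) against the table named in the INSERT, so the unique constraint has to exist on production_table itself; a constraint on temp_table doesn't participate. A minimal sketch of the fix under that assumption (the constraint name is illustrative):

# Assumption: production_table is missing a unique constraint on unique_id.
# ON CONFLICT is matched against the INSERT target, not the temp table.
with db_engine.begin() as conn:
    conn.execute(sa.text(
        "ALTER TABLE production_table "
        "ADD CONSTRAINT production_table_unique_id_key UNIQUE (unique_id)"
    ))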
I'm using SQLAlchemy to insert a row into a table. The table is defined like this:
class MyTable(Base):
    __tablename__ = "my_table"
    id = Column(BigInteger, primary_key=True)
    stuff1 = Column(Numeric)
Here's the alchemy line:
MyTable(stuff1=100)
Here's the query it generates:
INSERT INTO my_table (id, stuff1) VALUES (null, 100)
And i get this error:
IntegrityError('(psycopg2.IntegrityError) null value in column \"id\" violates not-null constraint
Since the id is the primary key, I expected it to get generated automatically. But it seems like I have to manually apply a sequence to it? What am I doing wrong?
This has happened to me before, and usually it was because I either didn't use SQLAlchemy to create the table, or my SQLAlchemy table definition was incomplete or incorrect when I did. If you have a 'create table if not exists' setup, then your changes won't be synchronized with the table definition in Postgres. I would check the table definition with the psql command-line tool to verify whether a server default is set up for the primary key. It should be a serial data type or use an external sequence.
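For comparison, a minimal sketch of a mapping that does set up server-side key generation when SQLAlchemy creates the table (the Sequence name is illustrative; a plain autoincrementing Integer primary key would render as SERIAL on PostgreSQL, BigInteger as BIGSERIAL):

from sqlalchemy import BigInteger, Column, Numeric, Sequence
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class MyTable(Base):
    __tablename__ = "my_table"
    # With an attached Sequence (or plain autoincrement), the database
    # generates the id and SQLAlchemy no longer sends NULL for it.
    id = Column(BigInteger, Sequence("my_table_id_seq"), primary_key=True)
    stuff1 = Column(Numeric)

Base.metadata.create_all(engine)  # engine assumed to be defined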
I have the following model where TableA and TableB have 1 to 1 relationship:
class TableA(db.Model):
    id = Column(db.BigInteger, primary_key=True)
    title = Column(String(1024))
    table_b = relationship('TableB', uselist=False, back_populates='a')

class TableB(db.Model):
    id = Column(BigInteger, ForeignKey(TableA.id), primary_key=True)
    a = relationship('TableA', back_populates='table_b')
    name = Column(String(1024))
when I insert 1 record everything goes fine:
rec_a = TableA(title='hello')
rec_b = TableB(a=rec_a, name='world')
db.session.add(rec_b)
db.session.commit()
but when I try to do this for bulk of records:
bulk_ = []
for title, name in zip(titles, names):
    rec_a = TableA(title=title)
    bulk_.append(TableB(a=rec_a, name=name))
db.session.bulk_save_objects(bulk_)
db.session.commit()
I get the following exception:
sqlalchemy.exc.InternalError: (pymysql.err.InternalError) (1364, "Field 'id' doesn't have a default value")
Am I doing something wrong? Did I configure the model wrong?
Is there a way to bulk commit this type of data?
The error you see is thrown by MySQL. It is complaining that the rows being inserted into table_b carry no value for id: the bulk operations don't cascade relationships, so the a=rec_a link is ignored and no primary key gets generated.
One technique could be to write all the titles in one bulk statement, then write all the names in a second bulk statement. Also, I've never successfully passed relationships to bulk operations, so this method relies on inserting simple values.
bulk_titles = [TableA(title=title) for title in titles]
session.bulk_save_objects(bulk_titles, return_defaults=True)

bulk_names = [TableB(id=rec_a.id, name=name)
              for rec_a, name in zip(bulk_titles, names)]
session.bulk_save_objects(bulk_names)
return_defaults=True is needed above because we need the generated ids in the second bulk operation. But this greatly reduces the performance gains of the bulk operation.
To avoid the performance degradation caused by return_defaults=True, you could generate the primary keys from the application rather than the database, e.g. using UUIDs, or by fetching the max id in each table and generating a range from that start value, as sketched below.
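A rough sketch of the max-id variant (this assumes nothing else writes to these tables concurrently; a real system would take a lock or use a collision-free scheme such as UUIDs):

import sqlalchemy as sa

# Fetch the current high-water mark once, then assign ids client-side.
start = db.session.query(
    sa.func.coalesce(sa.func.max(TableA.id), 0)
).scalar() + 1

bulk_a = [TableA(id=start + i, title=title) for i, title in enumerate(titles)]
bulk_b = [TableB(id=start + i, name=name) for i, name in enumerate(names)]
db.session.bulk_save_objects(bulk_a)
db.session.bulk_save_objects(bulk_b)
db.session.commit()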
Another technique might be to write your bulk insert statement using SQLAlchemy Core or plain text, for example:
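A sketch at the Core level, reusing the client-assigned ids from the previous snippet (passing a list of dicts makes this an executemany):

db.session.execute(
    TableA.__table__.insert(),
    [{"id": start + i, "title": title} for i, title in enumerate(titles)],
)
db.session.execute(
    TableB.__table__.insert(),
    [{"id": start + i, "name": name} for i, name in enumerate(names)],
)
db.session.commit()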
SQLAlchemy: how should I define a column's default value computed using a reference to the table containing that column?
Let's use these tables as an example (SQLite):
CREATE TABLE department (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);

CREATE TABLE employee (
    id INTEGER,
    name TEXT NOT NULL,
    department_id INTEGER NOT NULL,
    FOREIGN KEY (department_id) REFERENCES department(id),
    PRIMARY KEY (id, department_id)
);
I want each employee's ID to be unique only with respect to their department. On INSERT, a new employee ID should be generated that is one larger than the previously highest employee ID in that department.
Put in raw SQL, here's what I'm looking to do:
INSERT INTO employee (id, name, department_id)
VALUES (
    (
        SELECT coalesce(MAX(id), 0) + 1
        FROM employee
        WHERE department_id = ?
    ),
    ?,
    ?
);
What's the best way to do this using SQLAlchemy?
I think I'm looking for something similar to the third column example in here. Something like this:
employee_table = Table("employee", meta,
    Column('id', Integer, primary_key=True, autoincrement=False,
           default=keyvalues.select(
               func.max(employee_table.c.id)
           ).filter_by(department_id=??)),
    Column('department_id', Integer, ForeignKey('department.id'),
           nullable=False, primary_key=True, autoincrement=False),
    Column('name', String(127), nullable=False),
)
That doesn't work, of course: I don't have a reference to the employee table yet (since I'm still defining it) and because I don't know how to reference the "current" department_id in the filter_by clause. (There are quite possibly other problems, too)
Alternatively, if it is not possible to do this through the Python API, is there any way I can just specify a column's default value (applied at INSERT time) using raw SQL? Or do I need to use raw SQL for the entire insert?
Note: my situation is basically the same as in this question, but the solution I'm looking for is different: I want to use a nested SELECT in my inserts rather than create a DB trigger.
EDIT
I'm getting closer to solving the problem, but I'm still not there yet.
agronholm in #sqlalchemy explained that just using default won't work here: although a selectable can be used as the default on INSERT, there is no way to fill in its parameters (the department_id).
Instead, agronholm suggested that the best solution is to create the subquery within the constructor. By assigning the query itself (not running it and assigning the result!), the id is computed in a sub-SELECT at INSERT time. This avoids the race condition that would result from performing the SELECT first on the Python side and then assigning the result.
I'm trying out something like this:
def __init__(self, department, name):
    self.id = db.select(
        db.func.max(Employee.id)
    ).filter_by(department_id=department.id).as_scalar()
    self.department = department
    self.name = name
Unfortunately, this also doesn't work, because the calculated column is used as part of the primary key. It throws:
InvalidRequestError: Instance <XXXXX at 0x3d15d10> cannot be refreshed - it's not persistent and does not contain a full primary key.
In my original raw-SQLite version, I would access the newly-created row with the cursor's lastrowid. Is something similar possible in SQLAlchemy?
I ran into a similar problem and finally arrived at this solution. There's still room for improvement -- it does the SELECT before the INSERT rather than inlining it -- but it seems to work.
from sqlalchemy import sql
...

def default_employee_id(context):
    # Compute max(id) + 1 within the department being inserted, reading
    # the department_id from the in-flight INSERT's parameters.
    return context.connection.execute(
        sql.select(
            [sql.func.ifnull(sql.func.max(employee_table.c.id), 0) + 1]
        ).where(
            employee_table.c.department_id == context.current_parameters['department_id']
        )
    ).scalar()

employee_table = Table("employee", meta,
    Column('id', Integer, primary_key=True, autoincrement=False,
           default=default_employee_id),
    Column('department_id', Integer, ForeignKey('department.id'),
           nullable=False, primary_key=True, autoincrement=False),
    Column('name', String(127), nullable=False)
)
The next thing I would try is a trigger, even though the docs say it's a bad idea for a primary key.
Hooking into the "before_flush" event would probably have the same pre-select issue.
It may also be possible to alter or replace the context.compiled argument in order to inject the SELECT into the INSERT, but that seems extreme for what we're trying to accomplish.