Using SQLAlchemy on PostgreSQL, I am trying to improve insertion performance (about 100k edges to insert) by executing "nested inserts" in a single query for one edge and its nodes.
Using Insert.from_select, I get the following error and I don't really understand why.
CompileError: bindparam() name 'name' is reserved for automatic usage in the VALUES or SET clause of this insert/update statement. Please use a name other than column name when using bindparam() with insert() or update() (for example, 'b_name').
from sqlalchemy import *
metadata = MetaData()
node = Table('node', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String),
)
edge = Table('edge', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String),
    Column('source_id', Integer(), ForeignKey(node.c.id)),
    Column('target_id', Integer(), ForeignKey(node.c.id)),
)
engine = create_engine('postgres://postgres:postgres@db:5432')
metadata.create_all(engine)
e1_source = insert(node).values(name='e1_source').returning(node.c.id).cte('source')
e1_target = insert(node).values(name='e1_target').returning(node.c.id).cte('target')
e1 = insert(edge).from_select(
    ['source_id', 'target_id', 'name'], # bindparam error
    # ['source_id', 'target_id', 'b_name'], # key error
    # [edge.c.source_id, edge.c.target_id, edge.c.name], # bindparam error
    select([
        e1_source.c.id,
        e1_target.c.id,
        literal('e1'),
    ])
)
engine.execute(e1)
EDIT: Below is the SQL query I expected it to produce. I remain open to any other suggestions for achieving this, though.
CREATE TABLE node (
    id SERIAL PRIMARY KEY,
    name VARCHAR
);
CREATE TABLE edge (
    id SERIAL PRIMARY KEY,
    source_id INTEGER REFERENCES node (id),
    target_id INTEGER REFERENCES node (id),
    name VARCHAR
);
WITH source AS (
    INSERT INTO node (name)
    VALUES ('e1_source')
    RETURNING id
), target AS (
    INSERT INTO node (name)
    VALUES ('e1_target')
    RETURNING id
)
INSERT INTO edge (source_id, target_id, name)
SELECT source.id, target.id, 'e1'
FROM source, target;
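I have finally figured out where SQLAlchemy was implicitly using bindparam, which solved my issue: it was in the node queries, not in the edge query as I first thought. Each values(name=...) call creates an automatic bind parameter called 'name', so the two node inserts collide with each other once they are combined into one statement.
But I am still not sure whether this is the proper way to perform nested insert queries with SQLAlchemy, or whether it will improve execution time.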
e1_source = insert(node).values(name=bindparam('source_name')).returning(node.c.id).cte('source')
e1_target = insert(node).values(name=bindparam('target_name')).returning(node.c.id).cte('target')
e1 = insert(edge).from_select(
    ['source_id', 'target_id', 'name'],
    select([
        e1_source.c.id,
        e1_target.c.id,
        literal('e1'),
    ])
)
engine.execute(e1, {
    'source_name': 'e1_source',
    'target_name': 'e1_target',
})
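To see what the parameterised statement sends to the server without needing a live database, it can be compiled against the PostgreSQL dialect. This is only a minimal sketch, assuming SQLAlchemy 1.4 or later (where select() takes its columns positionally). The bind names source_name, target_name and edge_name are illustrative choices, and nothing here measures whether this approach is actually faster than a bulk insert.

```python
from sqlalchemy import (Column, ForeignKey, Integer, MetaData, String, Table,
                        bindparam, insert, select)
from sqlalchemy.dialects import postgresql

metadata = MetaData()
node = Table('node', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String),
)
edge = Table('edge', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String),
    Column('source_id', Integer, ForeignKey(node.c.id)),
    Column('target_id', Integer, ForeignKey(node.c.id)),
)

# Explicit, distinct bind names avoid the clash between the two
# automatically generated 'name' parameters of the node inserts.
source = insert(node).values(name=bindparam('source_name')).returning(node.c.id).cte('source')
target = insert(node).values(name=bindparam('target_name')).returning(node.c.id).cte('target')

# The edge name is also a named parameter (an assumption of this sketch),
# so one statement can be reused for every edge.
stmt = insert(edge).from_select(
    ['source_id', 'target_id', 'name'],
    select(source.c.id, target.c.id, bindparam('edge_name', type_=String)),
)

# Compile for PostgreSQL; this does not open a connection.
compiled = stmt.compile(dialect=postgresql.dialect())
sql = str(compiled)
print(sql)
```

With a real connection, the same stmt could then be executed with a list of parameter dictionaries (one per edge) rather than being rebuilt for each edge. Whether that helps at 100k edges would need to be timed.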
Related
I have a function that I use to update tables in PostgreSQL. It works great to avoid duplicate insertions by creating a temp table and dropping it upon completion. However, I have a few tables with serial ids and I have to pass the serial id in a column. Otherwise, I get an error that the keys are missing. How can I insert values in those tables and have the serial key get assigned automatically? I would prefer to modify the function below if possible.
def export_to_sql(df, table_name):
    from sqlalchemy import create_engine
    engine = create_engine(f'postgresql://{user}:{password}@{host}:5432/{user}')
    df.to_sql(con=engine,
              name='temporary_table',
              if_exists='append',
              index=False,
              method='multi')
    with engine.begin() as cnx:
        insert_sql = f'INSERT INTO {table_name} (SELECT * FROM temporary_table) ON CONFLICT DO NOTHING; DROP TABLE temporary_table'
        cnx.execute(insert_sql)
Code used to create the tables:
CREATE TABLE symbols
(
symbol_id serial NOT NULL,
symbol varchar(50) NOT NULL,
CONSTRAINT PK_symbols PRIMARY KEY ( symbol_id )
);
CREATE TABLE tweet_symols(
tweet_id varchar(50) REFERENCES tweets,
symbol_id int REFERENCES symbols,
PRIMARY KEY (tweet_id, symbol_id),
UNIQUE (tweet_id, symbol_id)
);
CREATE TABLE hashtags
(
hashtag_id serial NOT NULL,
hashtag varchar(140) NOT NULL,
CONSTRAINT PK_hashtags PRIMARY KEY ( hashtag_id )
);
CREATE TABLE tweet_hashtags
(
tweet_id varchar(50) NOT NULL,
hashtag_id integer NOT NULL,
CONSTRAINT FK_344 FOREIGN KEY ( tweet_id ) REFERENCES tweets ( tweet_id )
);
CREATE INDEX fkIdx_345 ON tweet_hashtags
(
tweet_id
);
The INSERT statement does not define the target columns, so PostgreSQL maps the selected values onto all of the table's columns in order, including the column that was defined as SERIAL.
We can work around this by providing a list of target columns, omitting the serial types. To do this we use SQLAlchemy to fetch the metadata of the table that we are inserting into from the database, then make a list of target columns. SQLAlchemy doesn't tell us if a column was created using SERIAL, but we will assume that it is if it is a primary key and is set to autoincrement. Primary key columns defined with GENERATED ... AS IDENTITY will also be filtered out - this is probably desirable as they behave in the same way as SERIAL columns.
import sqlalchemy as sa
def export_to_sql(df, table_name):
    engine = sa.create_engine(f'postgresql://{user}:{password}@{host}:5432/{user}')
    df.to_sql(con=engine,
              name='temporary_table',
              if_exists='append',
              index=False,
              method='multi')
    # Fetch table metadata from the database
    table = sa.Table(table_name, sa.MetaData(), autoload_with=engine)
    # Get the names of columns to be inserted,
    # assuming auto-incrementing PKs are serial types
    column_names = ','.join(
        [f'"{c.name}"' for c in table.columns
         if not (c.primary_key and c.autoincrement)]
    )
    with engine.begin() as cnx:
        insert_sql = sa.text(
            f'INSERT INTO {table_name} ({column_names}) (SELECT * FROM temporary_table) ON CONFLICT DO NOTHING; DROP TABLE temporary_table'
        )
        cnx.execute(insert_sql)
I'm trying to understand what set_ means in SQLAlchemy's on_conflict_do_update method. I have the following Table:
Table(
    "test",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("firstname", String(100)),
    Column("lastname", String(100)),
)
and want to insert something like this (if I wrote it in psql):
INSERT INTO test (id, firstname, lastname) VALUES (1, 'John', 'Doe')
ON CONFLICT (id) DO UPDATE SET firstname = EXCLUDED.firstname, lastname = EXCLUDED.lastname
I did some due diligence and saw people write in the set_ like this:
from sqlalchemy import inspect
from sqlalchemy.dialects import postgresql
insert_stmt = postgresql.insert(target).values([{'id':1,'firstname':'John','lastname':'Doe'}])
primary_keys = [key.name for key in inspect(target).primary_key]
update_dict = {c.name: c for c in insert_stmt.excluded if not c.primary_key}
stmt = insert_stmt.on_conflict_do_update(index_elements = primary_keys , set_ = update_dict)
engine.execute(stmt)
Is update_dict just looking at the EXCLUDED values (the ones I want to update with) that I set in my insert_stmt? If I call str(update_dict), I get a dictionary with specific information about each column: {'firstname': Column('firstname', VARCHAR(length=100), table=<excluded>), 'lastname': Column('lastname', VARCHAR(length=100), table=<excluded>)}. Is the method above the only way to retrieve the data? Can you write it out manually?
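As a sketch of what "writing it out manually" could look like: set_ is a dictionary whose keys are the columns to update and whose values are the expressions to assign, and insert_stmt.excluded exposes the EXCLUDED pseudo-table as column attributes. The example below compiles the statement for the PostgreSQL dialect instead of executing it, so it needs no database. It assumes the test table from the question, and it passes the values as keyword arguments rather than as a list of dictionaries.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table
from sqlalchemy.dialects import postgresql

metadata = MetaData()
test = Table(
    "test",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("firstname", String(100)),
    Column("lastname", String(100)),
)

insert_stmt = postgresql.insert(test).values(id=1, firstname="John", lastname="Doe")

# Spell out each assignment by hand: column name -> EXCLUDED.column
stmt = insert_stmt.on_conflict_do_update(
    index_elements=["id"],
    set_={
        "firstname": insert_stmt.excluded.firstname,
        "lastname": insert_stmt.excluded.lastname,
    },
)

# Render the SQL without connecting to a server
sql = str(stmt.compile(dialect=postgresql.dialect()))
print(sql)
```

The dictionary comprehension in the question builds the same mapping automatically for every non-primary-key column. The manual form is mainly useful when only some columns should be overwritten.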
Consider the following database table:
ID  ticker  description
1   GDBR30  30YR
2   GDBR10  10YR
3   GDBR5   5YR
4   GDBR2   2YR
It can be replicated with this piece of code:
from sqlalchemy import (
    Column,
    Integer,
    MetaData,
    String,
    Table,
    create_engine,
    insert,
    select,
)
engine = create_engine("sqlite+pysqlite:///:memory:", echo=True, future=True)
metadata = MetaData()
# Creating the table
tickers = Table(
    "tickers",
    metadata,
    Column("id", Integer, primary_key=True, autoincrement=True),
    Column("ticker", String, nullable=False),
    Column("description", String(), nullable=False),
)
metadata.create_all(engine)
# Populating the table
with engine.connect() as conn:
    result = conn.execute(
        insert(tickers),
        [
            {"ticker": "GDBR30", "description": "30YR"},
            {"ticker": "GDBR10", "description": "10YR"},
            {"ticker": "GDBR5", "description": "5YR"},
            {"ticker": "GDBR2", "description": "2YR"},
        ],
    )
    conn.commit()
I need to filter tickers for some values:
search_list = ["GDBR10", "GDBR5", "GDBR30"]
records = conn.execute(
    select(tickers.c.description).where((tickers.c.ticker).in_(search_list))
)
print(records.fetchall())
# Result
# [('30YR',), ('10YR',), ('5YR',)]
However, I need the resulting list of tuples ordered in the way search_list has been ordered. That is, I need the following result:
print(records.fetchall())
# Expected result
# [('10YR',), ('5YR',), ('30YR',)]
Using SQLite, you could create a CTE with two columns (id and ticker). Applying the following code leads to the expected result (see Maintain order when using SQLite WHERE-clause and IN operator). Unfortunately, I am not able to transfer the SQLite solution to SQLAlchemy.
WITH cte(id, ticker) AS (VALUES (1, 'GDBR10'), (2, 'GDBR5'), (3, 'GDBR30'))
SELECT t.*
FROM tbl t INNER JOIN cte c
ON c.ticker = t.ticker
ORDER BY c.id
Suppose I have search_list_tuple as follows. How should I write the SQLAlchemy query?
search_list_tuple = [(1, 'GDBR10'), (2, 'GDBR5'), (3, 'GDBR30')]
The following works and is effectively equivalent to VALUES (...) on SQLite, albeit somewhat more verbose:
from sqlalchemy import literal, union_all
# construct the CTE: one single-row SELECT per search term, numbered by position
sub_queries = [
    select(literal(i).label("id"), literal(v).label("ticker"))
    for i, v in enumerate(search_list)
]
cte = union_all(*sub_queries).cte("cte")
# desired query
records = conn.execute(
    select(tickers.c.description)
    .join(cte, cte.c.ticker == tickers.c.ticker)
    .order_by(cte.c.id)
)
print(records.fetchall())
# [('10YR',), ('5YR',), ('30YR',)]
The version below uses the values() construct. Unfortunately the resulting query fails on SQLite, but it works perfectly on PostgreSQL:
from sqlalchemy import column, values
cte = select(
    values(
        column("id", Integer), column("ticker", String), name="subq"
    ).data(list(zip(range(len(search_list)), search_list)))
).cte("cte")
qq = (
    select(tickers.c.description)
    .join(cte, cte.c.ticker == tickers.c.ticker)
    .order_by(cte.c.id)
)
records = conn.execute(qq)
print(records.fetchall())
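A different technique that avoids the CTE altogether, and should therefore also run on SQLite, is to order by a CASE expression that maps each ticker to its position in search_list. This is a separate approach from the two shown above, given as a minimal sketch assuming SQLAlchemy 1.4 or later. The name position is purely illustrative.

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table, case,
                        create_engine, insert, select)

engine = create_engine("sqlite+pysqlite:///:memory:")
metadata = MetaData()
tickers = Table(
    "tickers",
    metadata,
    Column("id", Integer, primary_key=True, autoincrement=True),
    Column("ticker", String, nullable=False),
    Column("description", String, nullable=False),
)
metadata.create_all(engine)

search_list = ["GDBR10", "GDBR5", "GDBR30"]

# CASE tickers.ticker WHEN 'GDBR10' THEN 0 WHEN 'GDBR5' THEN 1 ... END
position = case(
    {ticker: index for index, ticker in enumerate(search_list)},
    value=tickers.c.ticker,
)

with engine.begin() as conn:
    conn.execute(
        insert(tickers),
        [
            {"ticker": "GDBR30", "description": "30YR"},
            {"ticker": "GDBR10", "description": "10YR"},
            {"ticker": "GDBR5", "description": "5YR"},
            {"ticker": "GDBR2", "description": "2YR"},
        ],
    )
    # Filter with IN, then sort by each ticker's index in search_list
    ordered = conn.execute(
        select(tickers.c.description)
        .where(tickers.c.ticker.in_(search_list))
        .order_by(position)
    ).scalars().all()

print(ordered)
```

The CASE expression grows with the length of search_list, so for very long lists the CTE-based join may remain the better fit.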
I'd like some of my tables to be able to access related models via foreign keys, using SQLAlchemy's relationship function. Unfortunately, I'm running into an issue where the foreign keys do not appear to resolve.
What I'm doing differently from the standard SQLAlchemy docs is using an inheritance structure where most tables inherit from the same base.
Tables:
Noun:
    nounId INT
Person:
    nounId INT PRIMARY KEY References(Noun.nounId)
    name STRING
Place:
    nounId INT PRIMARY KEY References(Noun.nounId)
    location STRING
Plan:
    nounId INT PRIMARY KEY References(Noun.nounId)
    personId INT References(Person.nounId)
    plan STRING
Trip:
    nounId INT PRIMARY KEY References(Noun.nounId)
    planId INT References(Plan.nounId)
    placeId INT References(Place.nounId)
    plan STRING
Currently, I'm unable to get SQLAlchemy to resolve the plan.person model and I'm unsure why. (I'd also like to be able to resolve person.places via SQLAlchemy's secondary, but I think the two issues might be related.)
The following code will raise an error at the "assert" line:
from sqlalchemy import *
from sqlalchemy.orm import *
def test():
    metadata = MetaData()
    # Base Table
    nounTable = Table(
        'Nouns', metadata,
        Column('nounId', Integer, primary_key=True)
    )
    personsTable = Table(
        'Persons', metadata,
        Column('nounId', Integer, ForeignKey('Nouns.nounId'), primary_key=True),
        Column('name', String)
    )
    placesTable = Table(
        'Places', metadata,
        Column('nounId', Integer, ForeignKey('Nouns.nounId'), primary_key=True),
        Column('location', String)
    )
    plansTable = Table(
        'Plans', metadata,
        Column('nounId', Integer, ForeignKey('Nouns.nounId'), primary_key=True),
        Column('personId', Integer, ForeignKey('Persons.nounId')),
        Column('plan', String)
    )
    tripsTable = Table(
        'Trips', metadata,
        Column('nounId', Integer, ForeignKey('Nouns.nounId'), primary_key=True),
        Column('planId', Integer, ForeignKey('Plans.nounId')),
        Column('placeId', Integer, ForeignKey('Places.nounId')),
        Column('plan', String)
    )
    class Noun(object): pass
    class Person(Noun): pass
    class Place(Noun): pass
    class Plan(Noun): pass
    class Trip(Noun): pass
    mapper(Noun, nounTable)
    mapper(Trip, tripsTable, inherits=Noun)
    mapper(Place, placesTable, inherits=Noun)
    mapper(Plan, plansTable, inherits=Noun, properties={
        # SQLAlchemy will raise an exception if `foreign_keys` is not explicitly defined
        'person': relationship(Person, foreign_keys=[personsTable.c.nounId], backref='plans')
    })
    mapper(Person, personsTable, inherits=Noun, properties={
        # This is not resolved either
        'places': relationship(Place,
            secondary = join(Plan, Trip, Plan.nounId==Trip.planId),
            secondaryjoin = lambda: Trip.placeId==Place.nounId,
            primaryjoin = lambda: Person.nounId==Plan.personId
        )
    })
    engine = create_engine('sqlite://')
    metadata.create_all(engine)
    session = sessionmaker(bind=engine)()
    alice = Person()
    alice.name = "alice"
    session.add(alice)
    session.commit()
    planA = Plan()
    planA.personId = alice.nounId
    planA.plan = "This is a plan"
    session.add(planA)
    session.commit()
    # The reference isn't resolved
    assert planA.person, "No person found"
    print("Plan: {}".format([planA.nounId, planA.person]))
test()
Ideally, I'd like to be able to retrieve all Persons at a Place and vice versa, but currently it cannot even resolve the simple Plan->Person relationship.
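One possible reading, offered as a sketch and not a confirmed diagnosis: foreign_keys is meant to name the column that holds the reference, which for Plan->Person is Plans.personId, whereas the code above passes Persons.nounId (the column being referred to). The example below shows only the simple Plan->Person case. It uses the declarative style and assumes SQLAlchemy 1.4 or later, so it is not a drop-in replacement for the classical mapper() calls, and it leaves the Person.places relationship out entirely.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()

class Noun(Base):
    __tablename__ = "Nouns"
    nounId = Column(Integer, primary_key=True)

class Person(Noun):
    __tablename__ = "Persons"
    nounId = Column(Integer, ForeignKey("Nouns.nounId"), primary_key=True)
    name = Column(String)

class Plan(Noun):
    __tablename__ = "Plans"
    nounId = Column(Integer, ForeignKey("Nouns.nounId"), primary_key=True)
    personId = Column(Integer, ForeignKey("Persons.nounId"))
    plan = Column(String)
    # Point foreign_keys at the referencing column on Plans,
    # not at the primary key of Persons.
    person = relationship(Person, foreign_keys=[personId], backref="plans")

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

alice = Person(name="alice")
session.add(alice)
session.commit()

plan_a = Plan(personId=alice.nounId, plan="This is a plan")
session.add(plan_a)
session.commit()

found = plan_a.person                       # many-to-one side
plans_of_alice = [p.plan for p in alice.plans]  # backref side
print(found.name, plans_of_alice)
```

If this reading is right, the equivalent change in the classical mapping would be foreign_keys=[plansTable.c.personId], though that has not been verified against the original code here.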
I have a database table whose id primary key is not auto-incrementing (there is no sequence), so it is up to the user to supply a unique id or the insert will fail.
This table is not under my control, so I cannot change the database structure.
from sqlalchemy import create_engine, Table, MetaData
import psycopg2
db = create_engine('postgresql://...', echo=False).connect()
meta = MetaData()
meta.reflect(bind=db)
t = Table("mytable", meta, autoload=True, autoload_with=db)
values = { "title":"title", "id": ... }# ???
t.insert(bind=db, values=values).execute()
Given this is a "single-user" / "single-client" system, you should be able to use Column defaults: Python-Executed Functions. The example in the documentation linked to is enough to get you started. I would, however, use a Python function that is properly initialized from the database, with the counter still stored in a global variable:
def new_id_factory():
    # Declare the counter as global so that it persists between calls
    global _MYTABLE_ID_
    if '_MYTABLE_ID_' not in globals():
        # First call: seed the counter from the current maximum id
        q = db.execute("select max(mytable.id) as max_id from mytable").fetchone()
        _MYTABLE_ID_ = (q and q.max_id) or 0
    _MYTABLE_ID_ += 1
    return _MYTABLE_ID_
t = Table("mytable", Base.metadata,
    Column('id', Integer, primary_key=True, default=new_id_factory),
    autoload=True, autoload_with=db,
)