I'm using SQLAlchemy 1.3.18, Python 3.8.5 and PostgreSQL 12.
I have the following table declaration with a CheckConstraint that spans multiple columns and conditions:
Table(
    'my_table',
    MetaData(),
    Column('id', Integer, primary_key=True),
    Column('start', DateTime(), nullable=False),
    Column('end', DateTime(), nullable=False),
    CheckConstraint(
        and_(
            or_(
                func.date_trunc('month', column('start')) == func.date_trunc('month', column('end')),
                func.extract('day', column('end')) == 1
            ),
            (column('end') - (column('start') + func.make_interval(0, 1)) <= func.make_interval())
        )
    )
)
Although the application DOES create the check constraint in the database correctly, I'm getting the following warnings:
C:\Python38\lib\site-packages\sqlalchemy\sql\base.py:559: SAWarning:
Column 'end' on table None being replaced by
<sqlalchemy.sql.elements.ColumnClause at 0x26522ab0e50; end>, which
has the same key. Consider use_labels for select() statements.
C:\Python38\lib\site-packages\sqlalchemy\sql\base.py:559: SAWarning:
Column 'start' on table None being replaced by
<sqlalchemy.sql.elements.ColumnClause at 0x26522ab0b80; start>, which
has the same key. Consider use_labels for select() statements.
C:\Python38\lib\site-packages\sqlalchemy\sql\base.py:559: SAWarning:
Column 'end' on table None being replaced by
<sqlalchemy.sql.elements.ColumnClause at 0x26522ab0c70; end>, which
has the same key. Consider use_labels for select() statements.
What am I doing wrong?
Thanks to Ilja Everilä for the comment that solved the problem.
This is the solution: put the columns in variables, so that every reference is the same object in memory.
my_table_start = column('start')
my_table_end = column('end')

Table(
    'my_table',
    MetaData(),
    Column('id', Integer, primary_key=True),
    Column('start', DateTime(), nullable=False),
    Column('end', DateTime(), nullable=False),
    CheckConstraint(
        and_(
            or_(
                func.date_trunc('month', my_table_start) == func.date_trunc('month', my_table_end),
                func.extract('day', my_table_end) == 1
            ),
            (my_table_end - (my_table_start + func.make_interval(0, 1)) <= func.make_interval())
        )
    )
)
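An alternative sketch (a variation of my own, not from the original answer) is to create the Table first and append the constraint afterwards, so that the expression references the real Column objects through my_table.c and no free-standing column() clauses are needed at all:

my_table = Table(
    'my_table',
    MetaData(),
    Column('id', Integer, primary_key=True),
    Column('start', DateTime(), nullable=False),
    Column('end', DateTime(), nullable=False),
)

# my_table.c.start and my_table.c.end are each a single object,
# so the duplicate-key warning cannot occur.
my_table.append_constraint(
    CheckConstraint(
        and_(
            or_(
                func.date_trunc('month', my_table.c.start) == func.date_trunc('month', my_table.c.end),
                func.extract('day', my_table.c.end) == 1
            ),
            (my_table.c.end - (my_table.c.start + func.make_interval(0, 1)) <= func.make_interval())
        )
    )
)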
Using SQLAlchemy on PostgreSQL, I am trying to improve insertion performance (about 100k edges to insert) by executing "nested inserts" in a single query for one edge and its nodes.
Using Insert.from_select, I get the following error and I don't really understand why.
CompileError: bindparam() name 'name' is reserved for automatic usage in the VALUES or SET clause of this insert/update statement. Please use a name other than column name when using bindparam() with insert() or update() (for example, 'b_name').
from sqlalchemy import *

metadata = MetaData()

node = Table('node', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String),
)

edge = Table('edge', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String),
    Column('source_id', Integer(), ForeignKey(node.c.id)),
    Column('target_id', Integer(), ForeignKey(node.c.id)),
)

engine = create_engine('postgres://postgres:postgres@db:5432')
metadata.create_all(engine)

e1_source = insert(node).values(name='e1_source').returning(node.c.id).cte('source')
e1_target = insert(node).values(name='e1_target').returning(node.c.id).cte('target')

e1 = insert(edge).from_select(
    ['source_id', 'target_id', 'name'],  # bindparam error
    # ['source_id', 'target_id', 'b_name'],  # key error
    # [edge.c.source_id, edge.c.target_id, edge.c.name],  # bindparam error
    select([
        e1_source.c.id,
        e1_target.c.id,
        literal('e1'),
    ])
)

engine.execute(e1)
EDIT: Below is the SQL query I expected this to produce. I remain open to any suggestions to achieve my purpose, though.
CREATE TABLE node (
    id SERIAL PRIMARY KEY,
    name VARCHAR
);

CREATE TABLE edge (
    id SERIAL PRIMARY KEY,
    source_id INTEGER REFERENCES node (id),
    target_id INTEGER REFERENCES node (id),
    name VARCHAR
);

WITH source AS (
    INSERT INTO node (name)
    VALUES ('e1_source')
    RETURNING id
), target AS (
    INSERT INTO node (name)
    VALUES ('e1_target')
    RETURNING id
)
INSERT INTO edge (source_id, target_id, name)
SELECT source.id, target.id, 'e1'
FROM source, target;
I have finally figured out where bindparam was implicitly used by SQLAlchemy: in the node queries, not the edge query as I first thought.
But I am still not sure if this is the proper way to perform nested insert queries with SQLAlchemy and if it will improve execution time.
e1_source = insert(node).values(name=bindparam('source_name')).returning(node.c.id).cte('source')
e1_target = insert(node).values(name=bindparam('target_name')).returning(node.c.id).cte('target')

e1 = insert(edge).from_select(
    ['source_id', 'target_id', 'name'],
    select([
        e1_source.c.id,
        e1_target.c.id,
        literal('e1'),
    ])
)

engine.execute(e1, {
    'source_name': 'e1_source',
    'target_name': 'e1_target',
})
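Since the statement is now fully parameterized, it can presumably also be executed executemany-style with a list of parameter dictionaries, one per edge, which may help with the 100k-edge workload; whether that actually improves timing would need measuring. A sketch, where the edge name is also turned into a bind parameter (edge_name is a hypothetical name chosen to avoid the reserved column name 'name'):

e1_batch = insert(edge).from_select(
    ['source_id', 'target_id', 'name'],
    select([
        e1_source.c.id,
        e1_target.c.id,
        bindparam('edge_name'),  # avoids the reserved 'name' bindparam
    ])
)

# One dictionary per edge; the statement is executed once per parameter set.
engine.execute(e1_batch, [
    {'source_name': 'a', 'target_name': 'b', 'edge_name': 'e1'},
    {'source_name': 'c', 'target_name': 'd', 'edge_name': 'e2'},
])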
My problem is: I have multiple identical databases and I want to merge them into one, but I may end up with duplicate primary keys. What I'm trying to do is handle the duplicates before putting them into MySQL.
My current code is:
df = pd.DataFrame()
duplicates = pd.DataFrame()
size = []
lignes = []
chunksize = 100000

for db in dbtuple:  # For each database in the tuple given as entry
    engine = create_engine(URL(
        drivername="mysql",
        username="xxx",
        password="xxx",
        host="localhost",
        database=db
    ))
    conn = engine.connect()

    # Get the data from the table given as entry
    sql = "SELECT * FROM " + tableName

    # Execution of the query above
    generator_df = pd.read_sql(sql=sql, con=conn, chunksize=chunksize)

    # Init of sizechunk value
    sizechunk = 0

    # Because the query can return a very big number of rows, there's a
    # separation every 100k rows so that dataframe size <= 100k
    for dataframe in generator_df:
        df = pd.concat([df, dataframe], ignore_index=True, axis=0, sort=False)
        # We add the size of the chunk to know how many rows we have per database
        sizechunk += dataframe.shape[0]

    size.append(sizechunk)

    if tableName == 'table1':
        duplicates = df.duplicated(subset='id')
        for i in range(0, len(df)):
            if duplicates[i]:
                df.id[i] = numligne + '_' + df.id[i]
    # same for all tables
But this is not a pythonic way at all, and it takes very long to execute. Do you have any suggestions on how to improve the code and make it faster?
Here is my DB schema, for a better understanding:
table1 = Table('table1', metadata,
    Column('id', VARCHAR(40), primary_key=True, nullable=False),
    mysql_engine='InnoDB'
)

table2 = Table('table2', metadata,
    Column('id', VARCHAR(40), primary_key=True, nullable=False),
    Column('id_of', VARCHAR(20), ForeignKey("table1.id"), nullable=False, index=True)
)

table3 = Table('table3', metadata,
    Column('index', BIGINT(10), primary_key=True, nullable=False, autoincrement=True),
    Column('id', VARCHAR(40), nullable=False),
    Column('id_produit', VARCHAR(40), ForeignKey("table2.id"), nullable=False, index=True),
    Column('id_produit_enfant', VARCHAR(40), ForeignKey("table2.id"), nullable=False, index=True)
)

table4 = Table('table4', metadata,
    Column('index', BIGINT(10), primary_key=True, nullable=False, autoincrement=True),
    Column('id', VARCHAR(40), nullable=False),
    Column('id_produit', VARCHAR(40), ForeignKey("table2.id"), nullable=False, index=True)
)

table5 = Table('table5', metadata,
    Column('index', BIGINT(10), primary_key=True, nullable=False, autoincrement=True),
    Column('id', VARCHAR(40), nullable=False),
    Column('id_produit', VARCHAR(40), ForeignKey("table2.id"), nullable=False, index=True)
)

table6 = Table('table6', metadata,
    Column('index', BIGINT(10), primary_key=True, nullable=False, autoincrement=True),
    Column('id', VARCHAR(40), nullable=False),
    Column('id_produit', VARCHAR(40), ForeignKey("table2.id"), nullable=False, index=True)
)
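For the duplicate handling itself, the row-by-row loop can usually be replaced by a single vectorized assignment. A minimal sketch of the idea, assuming numligne is a string prefix identifying the source database (it is not defined in the snippet above):

# Mark duplicated ids once, then rewrite them all in one vectorized step
# instead of iterating row by row.
dup_mask = df.duplicated(subset='id')
df.loc[dup_mask, 'id'] = numligne + '_' + df.loc[dup_mask, 'id']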
I'd like some of my models to be able to access related models via foreign keys, using SqlAlchemy's relationship function. Unfortunately, I'm running into an issue where the foreign keys do not appear to resolve.
What I'm doing differently from the standard SqlAlchemy docs is using an inheritance structure where most tables inherit from the same base.
Tables:

Noun:
    nounId INT

Person:
    nounId INT PRIMARY KEY References(Noun.nounId)
    name STRING

Place:
    nounId INT PRIMARY KEY References(Noun.nounId)
    location STRING

Plan:
    nounId INT PRIMARY KEY References(Noun.nounId)
    personId INT References(Person.nounId)
    plan STRING

Trip:
    nounId INT PRIMARY KEY References(Noun.nounId)
    planId INT References(Plan.nounId)
    placeId INT References(Place.nounId)
    plan STRING
Currently, I'm unable to get SqlAlchemy to resolve the plan.person model and I'm unsure as to why. (I'd also like to be able to resolve person.places via SqlAlchemy's secondary, but I think the two issues might be related).
The following code will raise an error at the "assert" line:
from sqlalchemy import *
from sqlalchemy.orm import *

def test():
    metadata = MetaData()

    # Base Table
    nounTable = Table(
        'Nouns', metadata,
        Column('nounId', Integer, primary_key=True)
    )
    personsTable = Table(
        'Persons', metadata,
        Column('nounId', Integer, ForeignKey('Nouns.nounId'), primary_key=True),
        Column('name', String)
    )
    placesTable = Table(
        'Places', metadata,
        Column('nounId', Integer, ForeignKey('Nouns.nounId'), primary_key=True),
        Column('location', String)
    )
    plansTable = Table(
        'Plans', metadata,
        Column('nounId', Integer, ForeignKey('Nouns.nounId'), primary_key=True),
        Column('personId', Integer, ForeignKey('Persons.nounId')),
        Column('plan', String)
    )
    tripsTable = Table(
        'Trips', metadata,
        Column('nounId', Integer, ForeignKey('Nouns.nounId'), primary_key=True),
        Column('planId', Integer, ForeignKey('Plans.nounId')),
        Column('placeId', Integer, ForeignKey('Places.nounId')),
        Column('plan', String)
    )

    class Noun(object): pass
    class Person(Noun): pass
    class Place(Noun): pass
    class Plan(Noun): pass
    class Trip(Noun): pass

    mapper(Noun, nounTable)
    mapper(Trip, tripsTable, inherits=Noun)
    mapper(Place, placesTable, inherits=Noun)
    mapper(Plan, plansTable, inherits=Noun, properties={
        # SqlAlchemy will raise an exception if `foreign_keys` is not explicitly defined
        'person': relationship(Person, foreign_keys=[personsTable.c.nounId], backref='plans')
    })
    mapper(Person, personsTable, inherits=Noun, properties={
        # This is not resolved either
        'places': relationship(Place,
            secondary=join(Plan, Trip, Plan.nounId==Trip.planId),
            secondaryjoin=lambda: Trip.placeId==Place.nounId,
            primaryjoin=lambda: Person.nounId==Plan.personId
        )
    })

    engine = create_engine('sqlite://')
    metadata.create_all(engine)
    session = sessionmaker(bind=engine)()

    alice = Person()
    alice.name = "alice"
    session.add(alice)
    session.commit()

    planA = Plan()
    planA.personId = alice.nounId
    planA.plan = "This is a plan"
    session.add(planA)
    session.commit()

    # The reference isn't resolved
    assert planA.person, "No person found"
    print("Plan: {}".format([planA.nounId, planA.person]))

test()
Ideally, I'd like to be able to retrieve all Persons at a place and vice versa, but currently, it is not able to resolve the simple Plan->Person relationship.
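For what it's worth, a likely culprit (my reading, not confirmed in the post) is that foreign_keys points at personsTable.c.nounId rather than at the column that actually holds the reference, plansTable.c.personId. A minimal sketch of a corrected mapping under that assumption:

mapper(Plan, plansTable, inherits=Noun, properties={
    # foreign_keys should name the referring column on Plans; an explicit
    # primaryjoin also sidesteps ambiguity from the Noun inheritance join.
    'person': relationship(
        Person,
        primaryjoin=plansTable.c.personId == personsTable.c.nounId,
        foreign_keys=[plansTable.c.personId],
        backref='plans',
    )
})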
I'm having problems with SQLAlchemy's select_from statement when using the Core component. I'm trying to construct an outer join query, which currently looks like:
query = select([b1.c.id, b1.c.num, n1.c.name, n1.c.num, ...]
).where(and_(
    ... some conditions ...
)
).select_from(
    ???.outerjoin(
        n1,
        and_(
            ... some conditions ...
        )
    )
).select_from(... more outer joins similar to the above ...)
According to the docs, the structure should look like this:
table1 = table('t1', column('a'))
table2 = table('t2', column('b'))

s = select([table1.c.a]).\
    select_from(
        table1.join(table2, table1.c.a==table2.c.b)
    )
My problem is that I don't have a table1 object in this case, as the select ... part consists of columns rather than a single table (see the question marks in my query). I've tried using n1.outerjoin(n1..., but that caused an exception (ProgrammingError: table name "n1" specified more than once).
The snippet above is derived from a working session-based (ORM) query, which I am trying to convert (with limited success):
b = Table('b', metadata,
    Column('id', Integer, Sequence('seq_b_id')),
    Column('num', Integer, nullable=False),
    Column('active', Boolean, default=False),
)

n = Table('n', metadata,
    Column('b_id', Integer, nullable=False),
    Column('num', Integer, nullable=False),
    Column('active', Boolean, default=False),
)

p = Table('p', metadata,
    Column('b_id', Integer, nullable=False),
    Column('num', Integer, nullable=False),
    Column('active', Boolean, default=False),
)

n1 = aliased(n, name='n1')
n2 = aliased(n, name='n2')
b1 = aliased(b, name='b1')
b2 = aliased(b, name='b2')
p1 = aliased(p, name='p1')
p2 = aliased(p, name='p2')

result = sess.query(b1.id, b1.num, n1.c.name, n1.c.num, p1.par, p1.num).filter(
    b1.active==False,
    b1.num==sess.query(func.max(b2.num)).filter(
        b2.id==b1.id
    )
).outerjoin(
    n1,
    and_(
        n1.c.b_id==b1.id,
        n1.c.num<=num,
        n1.c.active==False,
        n1.c.num==sess.query(func.max(n2.num)).filter(
            n2.id==n1.c.id
        )
    )
).outerjoin(
    p1,
    and_(
        p1.b_id==b1.id,
        p1.num<=num,
        p1.active==False,
        p1.num==sess.query(func.max(p2.num)).filter(
            p2.id==p1.id
        )
    )
).order_by(b1.id)
How do I go about converting this ORM query into a plain Core query?
Update:
I was able to narrow down the problem: it seems to be caused by the combination of two select_from calls.
customer = Table('customer', metadata,
    Column('id', Integer),
    Column('name', String(50)),
)

order = Table('order', metadata,
    Column('id', Integer),
    Column('customer_id', Integer),
    Column('order_num', Integer),
)

address = Table('address', metadata,
    Column('id', Integer),
    Column('customer_id', Integer),
    Column('city', String(50)),
)

metadata.create_all(db)

customer1 = aliased(customer, name='customer1')
order1 = aliased(order, name='order1')
address1 = aliased(address, name='address1')

columns = [
    customer1.c.id, customer.c.name,
    order1.c.id, order1.c.order_num,
    address1.c.id, address1.c.city
]

query = select(columns)
query = query.select_from(
    customer1.outerjoin(
        order1,
        and_(
            order1.c.customer_id==customer1.c.id,
        )
    )
)
query = query.select_from(
    customer1.outerjoin(
        address1,
        and_(
            customer1.c.id==address1.c.customer_id
        )
    )
)

result = connection.execute(query)
for r in result.fetchall():
    print(r)
The above code causes the following exception:
ProgrammingError: (ProgrammingError) table name "customer1" specified more than once
'SELECT customer1.id, customer.name, order1.id, order1.order_num, address1.id, address1.city \nFROM customer, customer AS customer1 LEFT OUTER JOIN "order" AS order1 ON order1.customer_id = customer1.id, customer AS customer1 LEFT OUTER JOIN address AS address1 ON customer1.id = address1.customer_id' {}
If I were a bit more experienced with SQLAlchemy, I would say this could be a bug...
I finally managed to solve the problem. Instead of cascading select_from calls, the additional joins need to be chained onto the actual join. The above query then reads:
query = select(columns)
query = query.select_from(
    customer1.outerjoin(
        order1,
        and_(
            order1.c.customer_id==customer1.c.id,
        )
    ).outerjoin(
        address1,
        and_(
            customer1.c.id==address1.c.customer_id
        )
    )
)
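Applied back to the original b/n/p query, the same chaining pattern would look roughly like this (a sketch with the join conditions abbreviated, reusing the aliases defined earlier):

query = select([b1.c.id, b1.c.num, n1.c.num, p1.c.num]).select_from(
    b1.outerjoin(
        n1,
        and_(n1.c.b_id == b1.c.id, n1.c.active == False)  # abbreviated conditions
    ).outerjoin(
        p1,
        and_(p1.c.b_id == b1.c.id, p1.c.active == False)  # abbreviated conditions
    )
).order_by(b1.c.id)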
I wanted to optimize my database query:
link_list = select(
    columns=[link_table.c.rating, link_table.c.url, link_table.c.donations_in],
    whereclause=and_(
        not_(link_table.c.id.in_(
            select(
                columns=[request_table.c.recipient],
                whereclause=request_table.c.donator==donator.id
            ).as_scalar()
        )),
        link_table.c.id!=donator.id,
    ),
    limit=20,
).execute().fetchall()
and tried to merge those two selects into one query:
link_list = select(
    columns=[link_table.c.rating, link_table.c.url, link_table.c.donations_in],
    whereclause=and_(
        link_table.c.active==True,
        link_table.c.id!=donator.id,
        request_table.c.donator==donator.id,
        link_table.c.id!=request_table.c.recipient,
    ),
    limit=20,
    order_by=[link_table.c.rating.desc()]
).execute().fetchall()
The database schema looks like this:
link_table = Table('links', metadata,
    Column('id', Integer, primary_key=True, autoincrement=True),
    Column('url', Unicode(250), index=True, unique=True),
    Column('registration_date', DateTime),
    Column('donations_in', Integer),
    Column('active', Boolean),
)

request_table = Table('requests', metadata,
    Column('id', Integer, primary_key=True, autoincrement=True),
    Column('recipient', Integer, ForeignKey('links.id')),
    Column('donator', Integer, ForeignKey('links.id')),
    Column('date', DateTime),
)
There are several links (donators) in request_table pointing to one link in link_table. I want the links from link_table that have not yet been "requested".
But this does not work. Is what I'm trying to do actually possible? If so, how would you do it?
Thank you very much in advance!
You may be looking for the SQL NOT EXISTS construct:
http://www.sqlalchemy.org/docs/orm/tutorial.html#using-exists
Riffing on masida's answer:
First, the original query:
>>> print(select(
...     columns=[link_table.c.url, link_table.c.donations_in],
...     whereclause=and_(
...         not_(link_table.c.id.in_(
...             select(
...                 columns=[request_table.c.recipient],
...                 whereclause=request_table.c.donator==5
...             ).as_scalar()
...         )),
...         link_table.c.id!=5,
...     ),
...     limit=20,
... ))
SELECT links.url, links.donations_in
FROM links
WHERE links.id NOT IN (SELECT requests.recipient
FROM requests
WHERE requests.donator = :donator_1) AND links.id != :id_1
LIMIT 20
And rewritten in terms of exists():
>>> print(select(
...     columns=[link_table.c.url, link_table.c.donations_in],
...     whereclause=and_(
...         not_(exists().where(request_table.c.donator==5)),
...         #    ^^^^^^^^^^^^^^
...         link_table.c.id!=5,
...     ),
...     limit=20,
... ))
SELECT links.url, links.donations_in
FROM links
WHERE NOT (EXISTS (SELECT *
FROM requests
WHERE requests.donator = :donator_1)) AND links.id != :id_1
LIMIT 20
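One caveat worth noting (my observation, not part of the original answer): as written, the EXISTS subquery is not correlated with links, so it only tests whether the donator has any request at all. To stay equivalent to the NOT IN version, the recipient has to be tied back to links.id, roughly:

not_(exists().where(and_(
    request_table.c.donator==5,
    request_table.c.recipient==link_table.c.id,  # correlate to the outer links row
)))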