SQLAlchemy convert outer join ORM query to Core - python

I'm having problems with SQLAlchemy's select_from statement when using the Core component. I'm trying to construct an outer join query, which currently looks like:
query = select([b1.c.id, b1.c.num, n1.c.name, n1.c.num, ...]
).where(and_(
    ... some conditions ...
)
).select_from(
    ???.outerjoin(
        n1,
        and_(
            ... some conditions ...
        )
    )
).select_from(... more outer joins similar to the above ...)
According to the docs, the structure should look like this:
table1 = table('t1', column('a'))
table2 = table('t2', column('b'))
s = select([table1.c.a]).\
    select_from(
        table1.join(table2, table1.c.a==table2.c.b)
    )
My problem is that I don't have a table1 object in this case, since the select ... part consists of columns rather than a single table (see the question marks in my query). I've tried using n1.outerjoin(n1..., but that raised an exception (Exception: (ProgrammingError) table name "n1" specified more than once).
The above snippet is derived from a working session-based (ORM) query, which I'm trying to convert (with limited success).
b = Table('b', metadata,
    Column('id', Integer, Sequence('seq_b_id')),
    Column('num', Integer, nullable=False),
    Column('active', Boolean, default=False),
)
n = Table('n', metadata,
    Column('b_id', Integer, nullable=False),
    Column('num', Integer, nullable=False),
    Column('active', Boolean, default=False),
)
p = Table('p', metadata,
    Column('b_id', Integer, nullable=False),
    Column('num', Integer, nullable=False),
    Column('active', Boolean, default=False),
)
n1 = aliased(n, name='n1')
n2 = aliased(n, name='n2')
b1 = aliased(b, name='b1')
b2 = aliased(b, name='b2')
p1 = aliased(p, name='p1')
p2 = aliased(p, name='p2')
result = sess.query(b1.id, b1.num, n1.c.name, n1.c.num, p1.par, p1.num).filter(
    b1.active==False,
    b1.num==sess.query(func.max(b2.num)).filter(
        b2.id==b1.id
    )
).outerjoin(
    n1,
    and_(
        n1.c.b_id==b1.id,
        n1.c.num<=num,
        n1.c.active==False,
        n1.c.num==sess.query(func.max(n2.num)).filter(
            n2.id==n1.c.id
        )
    )
).outerjoin(
    p1,
    and_(
        p1.b_id==b1.id,
        p1.num<=num,
        p1.active==False,
        p1.num==sess.query(func.max(p2.num)).filter(
            p2.id==p1.id
        )
    )
).order_by(b1.id)
How do I go about converting this ORM query into a plain Core query?
Update:
I was able to narrow it down. It seems that the combination of two select_from calls causes the problem.
customer = Table('customer', metadata,
    Column('id', Integer),
    Column('name', String(50)),
)
order = Table('order', metadata,
    Column('id', Integer),
    Column('customer_id', Integer),
    Column('order_num', Integer),
)
address = Table('address', metadata,
    Column('id', Integer),
    Column('customer_id', Integer),
    Column('city', String(50)),
)
metadata.create_all(db)
customer1 = aliased(customer, name='customer1')
order1 = aliased(order, name='order1')
address1 = aliased(address, name='address1')
columns = [
    customer1.c.id, customer.c.name,
    order1.c.id, order1.c.order_num,
    address1.c.id, address1.c.city
]
query = select(columns)
query = query.select_from(
    customer1.outerjoin(
        order1,
        and_(
            order1.c.customer_id==customer1.c.id,
        )
    )
)
query = query.select_from(
    customer1.outerjoin(
        address1,
        and_(
            customer1.c.id==address1.c.customer_id
        )
    )
)
result = connection.execute(query)
for r in result.fetchall():
    print r
The above code causes the following exception:
ProgrammingError: (ProgrammingError) table name "customer1" specified more than once
'SELECT customer1.id, customer.name, order1.id, order1.order_num, address1.id, address1.city \nFROM customer, customer AS customer1 LEFT OUTER JOIN "order" AS order1 ON order1.customer_id = customer1.id, customer AS customer1 LEFT OUTER JOIN address AS address1 ON customer1.id = address1.customer_id' {}
If I were a bit more experienced in using SQLAlchemy, I would say this could be a bug...

I finally managed to solve the problem. Instead of cascading select_from calls, additional joins need to be chained onto the first join. The above query would read:
query = select(columns)
query = query.select_from(
    customer1.outerjoin(
        order1,
        and_(
            order1.c.customer_id==customer1.c.id,
        )
    ).outerjoin(
        address1,
        and_(
            customer1.c.id==address1.c.customer_id
        )
    )
)
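For reference, here is a rough sketch of how the same chaining pattern could apply to the original b/n/p query in plain Core. This is my own untested sketch: it uses Table.alias() instead of the ORM aliased(), assumes num is bound elsewhere, and guesses the correlation columns for the max() subqueries from the table definitions (n and p have no id column, so b_id is used):
b1, b2 = b.alias('b1'), b.alias('b2')
n1, n2 = n.alias('n1'), n.alias('n2')
p1, p2 = p.alias('p1'), p.alias('p2')

# chain both outer joins onto b1, then hand the whole join to one select_from()
joined = b1.outerjoin(
    n1,
    and_(
        n1.c.b_id==b1.c.id,
        n1.c.num<=num,
        n1.c.active==False,
        n1.c.num==select([func.max(n2.c.num)]).where(n2.c.b_id==n1.c.b_id).as_scalar(),
    )
).outerjoin(
    p1,
    and_(
        p1.c.b_id==b1.c.id,
        p1.c.num<=num,
        p1.c.active==False,
        p1.c.num==select([func.max(p2.c.num)]).where(p2.c.b_id==p1.c.b_id).as_scalar(),
    )
)

query = select([b1.c.id, b1.c.num, n1.c.num, p1.c.num]).where(
    and_(
        b1.c.active==False,
        b1.c.num==select([func.max(b2.c.num)]).where(b2.c.id==b1.c.id).as_scalar(),
    )
).select_from(joined).order_by(b1.c.id)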

Related

Maintain order when using sqlalchemy WHERE-clause and IN operator

Consider the following database table:
ID  ticker  description
1   GDBR30  30YR
2   GDBR10  10YR
3   GDBR5   5YR
4   GDBR2   2YR
It can be replicated with this piece of code:
from sqlalchemy import (
    Column,
    Integer,
    MetaData,
    String,
    Table,
    create_engine,
    insert,
    select,
)
engine = create_engine("sqlite+pysqlite:///:memory:", echo=True, future=True)
metadata = MetaData()

# Creating the table
tickers = Table(
    "tickers",
    metadata,
    Column("id", Integer, primary_key=True, autoincrement=True),
    Column("ticker", String, nullable=False),
    Column("description", String(), nullable=False),
)
metadata.create_all(engine)

# Populating the table
with engine.connect() as conn:
    result = conn.execute(
        insert(tickers),
        [
            {"ticker": "GDBR30", "description": "30YR"},
            {"ticker": "GDBR10", "description": "10YR"},
            {"ticker": "GDBR5", "description": "5YR"},
            {"ticker": "GDBR2", "description": "2YR"},
        ],
    )
    conn.commit()
I need to filter tickers for some values:
search_list = ["GDBR10", "GDBR5", "GDBR30"]
records = conn.execute(
    select(tickers.c.description).where((tickers.c.ticker).in_(search_list))
)
print(records.fetchall())
# Result
# [('30YR',), ('10YR',), ('5YR',)]
However, I need the resulting list of tuples ordered in the way search_list has been ordered. That is, I need the following result:
print(records.fetchall())
# Expected result
# [('10YR',), ('5YR',), ('30YR',)]
Using SQLite, you could create a CTE with two columns (id and ticker). Applying the following code leads to the expected result (see Maintain order when using SQLite WHERE-clause and IN operator). Unfortunately, I am not able to transfer the SQLite solution to SQLAlchemy.
WITH cte(id, ticker) AS (VALUES (1, 'GDBR10'), (2, 'GDBR5'), (3, 'GDBR30'))
SELECT t.*
FROM tbl t INNER JOIN cte c
ON c.ticker = t.ticker
ORDER BY c.id
Suppose I have search_list_tuple as follows; how am I supposed to write the SQLAlchemy query?
search_list_tuple = [(1, 'GDBR10'), (2, 'GDBR5'), (3, 'GDBR30')]
The following works and is actually equivalent to the VALUES (...) on SQLite, albeit somewhat more verbose:
# construct the CTE
sub_queries = [
    select(literal(i).label("id"), literal(v).label("ticker"))
    for i, v in enumerate(search_list)
]
cte = union_all(*sub_queries).cte("cte")

# desired query
records = conn.execute(
    select(tickers.c.description)
    .join(cte, cte.c.ticker == tickers.c.ticker)
    .order_by(cte.c.id)
)
print(records.fetchall())
# [('10YR',), ('5YR',), ('30YR',)]
Below is a version using the values() construct; unfortunately the resulting query fails on SQLite, although it works perfectly on PostgreSQL:
cte = select(
    values(
        column("id", Integer), column("ticker", String), name="subq"
    ).data(list(zip(range(len(search_list)), search_list)))
).cte("cte")
qq = (
    select(tickers.c.description)
    .join(cte, cte.c.ticker == tickers.c.ticker)
    .order_by(cte.c.id)
)
records = conn.execute(qq)
print(records.fetchall())
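As another portable sketch (my own suggestion, not part of the answers above): a CASE expression maps each ticker to its position in search_list, needs no CTE, and also works on SQLite. This assumes the same tickers table and conn as before:
from sqlalchemy import case

# map each ticker to its position in search_list, then order by that position
order_expr = case(
    {ticker: i for i, ticker in enumerate(search_list)},
    value=tickers.c.ticker,
)
records = conn.execute(
    select(tickers.c.description)
    .where(tickers.c.ticker.in_(search_list))
    .order_by(order_expr)
)
print(records.fetchall())
# expected: [('10YR',), ('5YR',), ('30YR',)]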

SQLAlchemy CheckConstraint with multiple conditions raises warning

I'm using SQLAlchemy 1.3.18, Python 3.8.5 and PostgreSQL 12.
I have the following table declaration with a Check Constraint with multiple columns and conditions:
Table(
    'my_table',
    MetaData(),
    Column('id', Integer, primary_key=True),
    Column('start', DateTime(), nullable=False),
    Column('end', DateTime(), nullable=False),
    CheckConstraint(
        and_(
            or_(
                func.date_trunc('month', column('start')) == func.date_trunc('month', column('end')),
                func.extract('day', column('end')) == 1
            ),
            (column('end') - (column('start') + func.make_interval(0, 1)) <= func.make_interval())
        )
    )
)
Although the application DOES create the check constraint in the database correctly, I'm getting the following warning:
C:\Python38\lib\site-packages\sqlalchemy\sql\base.py:559: SAWarning:
Column 'end' on table None being replaced by
<sqlalchemy.sql.elements.ColumnClause at 0x26522ab0e50; end>, which
has the same key. Consider use_labels for select() statements.
C:\Python38\lib\site-packages\sqlalchemy\sql\base.py:559: SAWarning:
Column 'start' on table None being replaced by
<sqlalchemy.sql.elements.ColumnClause at 0x26522ab0b80; start>, which
has the same key. Consider use_labels for select() statements.
C:\Python38\lib\site-packages\sqlalchemy\sql\base.py:559: SAWarning:
Column 'end' on table None being replaced by
<sqlalchemy.sql.elements.ColumnClause at 0x26522ab0c70; end>, which
has the same key. Consider use_labels for select() statements.
What am I doing wrong?
Thanks to Ilja Everilä for the comment that solved the problem.
The solution is to put the columns in variables so that each reference is the same object in memory:
my_table_start = column('start')
my_table_end = column('end')

Table(
    'my_table',
    MetaData(),
    Column('id', Integer, primary_key=True),
    Column('start', DateTime(), nullable=False),
    Column('end', DateTime(), nullable=False),
    CheckConstraint(
        and_(
            or_(
                func.date_trunc('month', my_table_start) == func.date_trunc('month', my_table_end),
                func.extract('day', my_table_end) == 1
            ),
            (my_table_end - (my_table_start + func.make_interval(0, 1)) <= func.make_interval())
        )
    )
)
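A variation on the same idea (my own sketch, not from the original answer): since Column objects can be used directly in SQL expressions, you could instead keep the real Column instances in variables and reference those in the constraint, avoiding the textual column() clones entirely:
# keep references to the actual Column objects
start_col = Column('start', DateTime(), nullable=False)
end_col = Column('end', DateTime(), nullable=False)

Table(
    'my_table',
    MetaData(),
    Column('id', Integer, primary_key=True),
    start_col,
    end_col,
    CheckConstraint(
        and_(
            or_(
                func.date_trunc('month', start_col) == func.date_trunc('month', end_col),
                func.extract('day', end_col) == 1
            ),
            (end_col - (start_col + func.make_interval(0, 1)) <= func.make_interval())
        )
    )
)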

(Beginner) SQLAlchemy JOIN with SQLITE

I am studying SQLAlchemy and I cannot understand the reason for this error.
from sqlalchemy import create_engine, Table, Column, MetaData, INTEGER, String, ForeignKey, ForeignKeyConstraint
from sqlalchemy.sql import Select

engine = create_engine('sqlite:///:memory:')
with engine.connect() as conn:
    meta = MetaData(engine)
    cars = Table('Cars', meta,
        Column('Id', INTEGER, primary_key=True, autoincrement=True),
        Column('Name', String, nullable=False),
        Column('BrandId', INTEGER, ForeignKey('Brands.Id')))
    brands = Table('Brands', meta,
        Column('Id', INTEGER, primary_key=True, autoincrement=True),
        Column('Name', String))
    meta.create_all()
    cars_values = [{'Name': 'Escort', 'BrandId': 1}]
    brands_values = [{'Id': 1, 'Name': 'Ford'}]
    insert1 = brands.insert().values(brands_values)
    insert2 = cars.insert().values(cars_values)
    conn.execute(insert1)
    conn.execute(insert2)
    query = Select([cars]).join(brands, brands.c.Id == cars.c.BrandId)
    #query = 'select * from cars c JOIN brands b on b.id = c.brandid'
    result = conn.execute(query)
    print(result.fetchall())
When I run it this way, I get an error:
Select([cars]).join(brands, brands.c.Id == cars.c.BrandId)
sqlalchemy.exc.ObjectNotExecutableError: Not an executable object: <sqlalchemy.sql.selectable.Join at 0x1e62654ed88; Join object on Select object(2087997204936) and Brands(2087997203848)>
But if I run the raw SQL, the JOIN is accepted:
'select * from cars c JOIN brands b on b.id = c.brandid'
[(1, 'Escort', 1, 1, 'Ford')]
query = select(['*']).select_from(cars.join(brands, brands.c.Id == cars.c.BrandId))
# print(query)
# SELECT * FROM "Cars" JOIN "Brands" ON "Brands"."Id" = "Cars"."BrandId"
see Using Joins
This is another way (note: the original post joined on b.Id = b.Id, which is always true; the condition below is corrected to b.Id = c.BrandId):
from sqlalchemy import text
query = text("select c.Id, c.Name, c.BrandId, b.Id, b.Name "
             "from Cars c left join Brands b on b.Id = c.BrandId")
result = engine.execute(query)
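For reference, a sketch of the same join in the newer 1.4/2.0-style select() (an assumption on my part: SQLAlchemy 1.4+ and the lowercase select, reusing cars, brands, and conn from the question):
from sqlalchemy import select

# select columns from both tables, making the join explicit with join_from()
query = (
    select(cars, brands)
    .join_from(cars, brands, brands.c.Id == cars.c.BrandId)
)
result = conn.execute(query)
print(result.fetchall())
# expected: [(1, 'Escort', 1, 1, 'Ford')]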

Concat databases and update duplicate rows

My problem is: I have multiple identical databases and I want to merge them into one, but I may have duplicate entries as primary keys. What I'm trying to do is handle the duplicates before putting them in MySQL.
My current code is:
df = pd.DataFrame()
duplicates = pd.DataFrame()
size = []
lignes = []
chunksize = 100000
for db in dbtuple:  # for each database in the tuple given as entry
    engine = create_engine(URL(
        drivername="mysql",
        username="xxx",
        password="xxx",
        host="localhost",
        database=db
    ))
    conn = engine.connect()
    # Get the data from the table given as entry
    sql = "SELECT * FROM " + tableName
    # Execution of the query above
    generator_df = pd.read_sql(sql=sql, con=conn, chunksize=chunksize)
    # Init of sizechunk value
    sizechunk = 0
    # Because the query can return a very big number of rows, there's a split
    # every 100k rows so that dataframe size <= 100k
    for dataframe in generator_df:
        df = pd.concat([df, dataframe], ignore_index=True, axis=0, sort=False)
        # We add the size of the chunk to know how many rows we have per database
        sizechunk += dataframe.shape[0]
    size.append(sizechunk)
    if tableName == 'table1':
        duplicates = df.duplicated(subset='id')
        for i in range(0, len(df)):
            if duplicates[i]:
                df.id[i] = numligne + '_' + df.id[i]
# same for all tables
But this is not a pythonic way at all, and it takes very long to execute. Do you have any suggestions on how to improve the code to make it faster? (See the sketch after the schema below.)
Here is my db schema to understand better:
table1 = Table('table1', metadata,
    Column('id', VARCHAR(40), primary_key=True, nullable=False),
    mysql_engine='InnoDB'
)
table2 = Table('table2', metadata,
    Column('id', VARCHAR(40), primary_key=True, nullable=False),
    Column('id_of', VARCHAR(20), ForeignKey("table1.id"), nullable=False, index=True)
)
table3 = Table('table3', metadata,
    Column('index', BIGINT(10), primary_key=True, nullable=False, autoincrement=True),
    Column('id', VARCHAR(40), nullable=False),
    Column('id_produit', VARCHAR(40), ForeignKey("table2.id"), nullable=False, index=True),
    Column('id_produit_enfant', VARCHAR(40), ForeignKey("table2.id"), nullable=False, index=True)
)
table4 = Table('table4', metadata,
    Column('index', BIGINT(10), primary_key=True, nullable=False, autoincrement=True),
    Column('id', VARCHAR(40), nullable=False),
    Column('id_produit', VARCHAR(40), ForeignKey("table2.id"), nullable=False, index=True)
)
table5 = Table('table5', metadata,
    Column('index', BIGINT(10), primary_key=True, nullable=False, autoincrement=True),
    Column('id', VARCHAR(40), nullable=False),
    Column('id_produit', VARCHAR(40), ForeignKey("table2.id"), nullable=False, index=True)
)
table6 = Table('table6', metadata,
    Column('index', BIGINT(10), primary_key=True, nullable=False, autoincrement=True),
    Column('id', VARCHAR(40), nullable=False),
    Column('id_produit', VARCHAR(40), ForeignKey("table2.id"), nullable=False, index=True)
)
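As a hedged suggestion for the row-by-row loop in the question (my own sketch, not an accepted answer): pandas can rewrite the duplicated ids in a single vectorized assignment, assuming numligne is the per-database prefix string used in the question and id is a string column:
# mark rows whose 'id' already appeared earlier in the concatenated frame
mask = df.duplicated(subset='id')

# prefix only the duplicated ids, in one vectorized step instead of a Python loop
df.loc[mask, 'id'] = str(numligne) + '_' + df.loc[mask, 'id']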

SQLAlchemy: select over multiple tables

I wanted to optimize my database query:
link_list = select(
    columns=[link_table.c.rating, link_table.c.url, link_table.c.donations_in],
    whereclause=and_(
        not_(link_table.c.id.in_(
            select(
                columns=[request_table.c.recipient],
                whereclause=request_table.c.donator==donator.id
            ).as_scalar()
        )),
        link_table.c.id!=donator.id,
    ),
    limit=20,
).execute().fetchall()
and tried to merge those two selects in one query:
link_list = select(
    columns=[link_table.c.rating, link_table.c.url, link_table.c.donations_in],
    whereclause=and_(
        link_table.c.active==True,
        link_table.c.id!=donator.id,
        request_table.c.donator==donator.id,
        link_table.c.id!=request_table.c.recipient,
    ),
    limit=20,
    order_by=[link_table.c.rating.desc()]
).execute().fetchall()
the database-schema looks like:
link_table = Table('links', metadata,
    Column('id', Integer, primary_key=True, autoincrement=True),
    Column('url', Unicode(250), index=True, unique=True),
    Column('registration_date', DateTime),
    Column('donations_in', Integer),
    Column('active', Boolean),
)
request_table = Table('requests', metadata,
    Column('id', Integer, primary_key=True, autoincrement=True),
    Column('recipient', Integer, ForeignKey('links.id')),
    Column('donator', Integer, ForeignKey('links.id')),
    Column('date', DateTime),
)
There are several links (donator) in request_table pointing to one link in link_table. I want to get links from link_table which have not yet been "requested".
But this does not work. Is what I'm trying to do actually possible? If so, how would you do it?
Thank you very much in advance!
You may be looking for the SQL NOT EXISTS construct:
http://www.sqlalchemy.org/docs/orm/tutorial.html#using-exists
Riffing on masida's answer:
First, the original query:
>>> print select(
...     columns=[link_table.c.url, link_table.c.donations_in],
...     whereclause=and_(
...         not_(link_table.c.id.in_(
...             select(
...                 columns=[request_table.c.recipient],
...                 whereclause=request_table.c.donator==5
...             ).as_scalar()
...         )),
...         link_table.c.id!=5,
...     ),
...     limit=20,
... )
SELECT links.url, links.donations_in
FROM links
WHERE links.id NOT IN (SELECT requests.recipient
FROM requests
WHERE requests.donator = :donator_1) AND links.id != :id_1
LIMIT 20
And rewritten in terms of exists():
>>> print select(
...     columns=[link_table.c.url, link_table.c.donations_in],
...     whereclause=and_(
...         not_(exists().where(request_table.c.donator==5)),
...         # ^^^^^^^^^^^^^^
...         link_table.c.id!=5,
...     ),
...     limit=20,
... )
SELECT links.url, links.donations_in
FROM links
WHERE NOT (EXISTS (SELECT *
FROM requests
WHERE requests.donator = :donator_1)) AND links.id != :id_1
LIMIT 20
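One caveat (my own observation, not part of the original answer): the EXISTS above is not correlated to the outer links row, so it is not equivalent to the NOT IN version. A correlated sketch, using the same old-style API, adds the recipient comparison so that SQLAlchemy auto-correlates links into the subquery:
>>> print select(
...     columns=[link_table.c.url, link_table.c.donations_in],
...     whereclause=and_(
...         not_(exists().where(and_(
...             request_table.c.donator==5,
...             request_table.c.recipient==link_table.c.id,
...         ))),
...         link_table.c.id!=5,
...     ),
...     limit=20,
... )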
