I have a table in PostgreSQL with an interval field. There is a possibility that someone may want to store something like INTERVAL '1 MONTH' in this table. In my Python application, I have a timedelta object which is substituted into a query string:
with sqla_engine.connect() as conn:
# 'params' contains parametrised SQL where one of the fields is a timedelta object
return conn.execute(text(query).execution_options(autocommit=autocommit), params)
I want to replace my timedelta object with something that is translated as INTERVAL '1 MONTH' by SQLAlchemy Engine. Is that possible?
And, in reverse, how can I read an INTERVAL '1 MONTH' value from PostgreSQL into something usable in Python?
I want to replace my timedelta object with something that is translated as INTERVAL '1 MONTH' by SQLAlchemy Engine. Is that possible?
PostgreSQL accepts string values for interval columns, so this works:
from sqlalchemy import (
create_engine,
Table,
MetaData,
Column,
Integer,
text,
Interval,
)
engine = create_engine(
"postgresql://scott:tiger#192.168.0.199/test", echo=True
)
tbl = Table(
"tbl",
MetaData(),
Column("id", Integer, primary_key=True, autoincrement=False),
Column("intrvl", Interval()),
)
tbl.drop(engine, checkfirst=True)
tbl.create(engine)
"""SQL emitted:
CREATE TABLE tbl (
id INTEGER NOT NULL,
intrvl INTERVAL,
PRIMARY KEY (id)
)
"""
with engine.begin() as conn:
conn.execute(tbl.insert(), {"id": 1, "intrvl": "1 MONTH"})
"""SQL emitted:
2021-03-29 17:32:27,427 INFO sqlalchemy.engine.Engine INSERT INTO tbl (id, intrvl) VALUES (%(id)s, %(intrvl)s)
2021-03-29 17:32:27,428 INFO sqlalchemy.engine.Engine [generated in 0.00032s] {'id': 1, 'intrvl': '1 MONTH'}
"""
and if we query the table from psql we can see that the value has been stored:
gord@gord-dv7-xubuntu0:~$ psql -h 192.168.0.199 test scott
Password for user scott:
psql (12.6 (Ubuntu 12.6-1.pgdg18.04+1), server 12.3 (Debian 12.3-1.pgdg100+1))
Type "help" for help.
test=# select * from tbl;
id | intrvl
----+--------
1 | 1 mon
(1 row)
And, in reverse, how can I read an INTERVAL '1 MONTH' value from PostgreSQL into something usable in Python?
psycopg2 will return the value as a timedelta, but timedelta has no months= argument, so it simply assumes a month is 30 days:
results = conn.execute(text("SELECT * FROM tbl")).fetchall()
print(results)
# [(1, datetime.timedelta(days=30))]
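If the month semantics matter, one option (a sketch, not part of the original answer) is to let PostgreSQL perform the calendar-aware arithmetic server-side, before the value reaches Python:
# sketch: apply the interval in SQL, where '1 month' keeps its calendar
# meaning instead of being flattened to 30 days
result = conn.execute(text("SELECT CURRENT_DATE + intrvl FROM tbl WHERE id = 1"))
print(result.scalar())
# a datetime.datetime exactly one calendar month from today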
Update:
Is there a way to make a correct transformation, for example translate it into the string value "1 month"?
Your SQL query could ask for cast(intrvl as varchar(50)) as intrvl_str to get back a string, and if you wanted to make that automatic you could define intrvl_str as a Computed (generated) column in the table:
from sqlalchemy import Computed, String

tbl = Table(
"tbl",
MetaData(),
Column("id", Integer, primary_key=True, autoincrement=False),
Column("intrvl", Interval()),
Column("intrvl_str", String(50), Computed("cast (intrvl as varchar(50))")),
)
tbl.drop(engine, checkfirst=True)
tbl.create(engine)
"""SQL emitted:
CREATE TABLE tbl (
id INTEGER NOT NULL,
intrvl INTERVAL,
intrvl_str VARCHAR(50) GENERATED ALWAYS AS (cast (intrvl as varchar(50))) STORED,
PRIMARY KEY (id)
)
"""
with engine.begin() as conn:
conn.execute(tbl.insert(), {"id": 1, "intrvl": "1 MONTH"})
"""SQL emitted: (same as before)
2021-03-29 17:32:27,427 INFO sqlalchemy.engine.Engine INSERT INTO tbl (id, intrvl) VALUES (%(id)s, %(intrvl)s)
2021-03-29 17:32:27,428 INFO sqlalchemy.engine.Engine [generated in 0.00032s] {'id': 1, 'intrvl': '1 MONTH'}
"""
results = conn.execute(text("SELECT * FROM tbl")).fetchall()
print(results)
# [(1, datetime.timedelta(days=30), '1 mon')]
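If you would rather not add a generated column, the ad-hoc cast mentioned above works just as well directly in a query (a minimal sketch):
# sketch: cast the interval to text on the way out
result = conn.execute(
    text("SELECT id, cast(intrvl as varchar(50)) AS intrvl_str FROM tbl")
)
print(result.fetchall())
# expected: [(1, '1 mon')]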
I need to test a Python Flask app that uses MySQL to run its queries via SQLAlchemy, using sqlite3.
I've encountered an exception when trying to test an upsert function using an ON DUPLICATE clause:
(sqlite3.OperationalError) near "DUPLICATE": syntax error
After a brief search for a solution, I found that the correct syntax for SQLite upsert queries is ON CONFLICT(id) DO UPDATE SET ...; I tried it, but MySQL doesn't recognize this syntax.
What can I do? How can I do an upsert query so sqlite3 and mySQL will both execute it properly?
Example:
employees table:
id | name
---+------------
 1 | Jeff Bezos
 2 | Bill Gates
INSERT INTO employees(id,name)
VALUES(1, 'Donald Trump')
ON DUPLICATE KEY UPDATE name = VALUES(name);
Should update the table to be:
id | name
---+--------------
 1 | Donald Trump
 2 | Bill Gates
Thanks in advance!
How can I do an upsert query so sqlite3 and mySQL will both execute it properly?
You can achieve the same result by attempting an UPDATE and, if no match is found, doing an INSERT. The following code uses SQLAlchemy Core constructs, which also provide protection from the subtle differences between MySQL and SQLite. For example, if your table had a column named "order" then SQLAlchemy would emit this DDL for MySQL …
CREATE TABLE employees (
id INTEGER NOT NULL,
name VARCHAR(50),
`order` INTEGER,
PRIMARY KEY (id)
)
… and this DDL for SQLite
CREATE TABLE employees (
id INTEGER NOT NULL,
name VARCHAR(50),
"order" INTEGER,
PRIMARY KEY (id)
)
import logging
import sqlalchemy as sa
# pick one
connection_url = "mysql+mysqldb://scott:tiger#localhost:3307/mydb"
# connection_url = "sqlite://"
engine = sa.create_engine(connection_url)
def _dump_table():
with engine.begin() as conn:
print(conn.exec_driver_sql("SELECT * FROM employees").all())
def _setup_example():
employees = sa.Table(
"employees",
sa.MetaData(),
sa.Column("id", sa.Integer, primary_key=True, autoincrement=False),
sa.Column("name", sa.String(50)),
)
employees.drop(engine, checkfirst=True)
employees.create(engine)
# create initial example data
with engine.begin() as conn:
conn.execute(
employees.insert(),
[{"id": 1, "name": "Jeff Bezos"}, {"id": 2, "name": "Bill Gates"}],
)
def upsert_employee(id_, name):
employees = sa.Table("employees", sa.MetaData(), autoload_with=engine)
with engine.begin() as conn:
result = conn.execute(
employees.update().where(employees.c.id == id_), {"name": name}
)
logging.debug(f" {result.rowcount} row(s) updated.")
if result.rowcount == 0:
result = conn.execute(
employees.insert(), {"id": id_, "name": name}
)
logging.debug(f" {result.rowcount} row(s) inserted.")
if __name__ == "__main__":
logging.basicConfig(level=logging.DEBUG)
_setup_example()
_dump_table()
"""
[(1, 'Jeff Bezos'), (2, 'Bill Gates')]
"""
upsert_employee(3, "Donald Trump")
"""
DEBUG:root: 0 row(s) updated.
DEBUG:root: 1 row(s) inserted.
"""
_dump_table()
"""
[(1, 'Jeff Bezos'), (2, 'Bill Gates'), (3, 'Donald Trump')]
"""
upsert_employee(1, "Elon Musk")
"""
DEBUG:root: 1 row(s) updated.
"""
_dump_table()
"""
[(1, 'Elon Musk'), (2, 'Bill Gates'), (3, 'Donald Trump')]
"""
I have a function that I use to update tables in PostgreSQL. It works great for avoiding duplicate insertions by creating a temp table and dropping it upon completion. However, a few of my tables have serial ids, and I have to include the serial id as a column; otherwise I get an error that keys are missing. How can I insert values into those tables and have the serial key assigned automatically? I would prefer to modify the function below if possible.
def export_to_sql(df, table_name):
from sqlalchemy import create_engine
    engine = create_engine(f'postgresql://{user}:{password}@{host}:5432/{user}')
df.to_sql(con=engine,
name='temporary_table',
if_exists='append',
index=False,
method = 'multi')
with engine.begin() as cnx:
insert_sql = f'INSERT INTO {table_name} (SELECT * FROM temporary_table) ON CONFLICT DO NOTHING; DROP TABLE temporary_table'
cnx.execute(insert_sql)
Code used to create the tables:
CREATE TABLE symbols
(
symbol_id serial NOT NULL,
symbol varchar(50) NOT NULL,
CONSTRAINT PK_symbols PRIMARY KEY ( symbol_id )
);
CREATE TABLE tweet_symols(
tweet_id varchar(50) REFERENCES tweets,
symbol_id int REFERENCES symbols,
PRIMARY KEY (tweet_id, symbol_id),
UNIQUE (tweet_id, symbol_id)
);
CREATE TABLE hashtags
(
hashtag_id serial NOT NULL,
hashtag varchar(140) NOT NULL,
CONSTRAINT PK_hashtags PRIMARY KEY ( hashtag_id )
);
CREATE TABLE tweet_hashtags
(
tweet_id varchar(50) NOT NULL,
hashtag_id integer NOT NULL,
CONSTRAINT FK_344 FOREIGN KEY ( tweet_id ) REFERENCES tweets ( tweet_id )
);
CREATE INDEX fkIdx_345 ON tweet_hashtags
(
tweet_id
);
The INSERT statement does not define the target columns, so PostgreSQL will attempt to insert values into a column that was defined as SERIAL.
We can work around this by providing a list of target columns, omitting the serial types. To do this we use SQLAlchemy to fetch the metadata of the table that we are inserting into from the database, then make a list of target columns. SQLAlchemy doesn't tell us if a column was created using SERIAL, but we will assume that it is if it is a primary key and is set to autoincrement. Primary key columns defined with GENERATED ... AS IDENTITY will also be filtered out - this is probably desirable as they behave in the same way as SERIAL columns.
import sqlalchemy as sa
def export_to_sql(df, table_name):
    engine = sa.create_engine(f'postgresql://{user}:{password}@{host}:5432/{user}')
df.to_sql(con=engine,
name='temporary_table',
if_exists='append',
index=False,
method='multi')
# Fetch table metadata from the database
table = sa.Table(table_name, sa.MetaData(), autoload_with=engine)
# Get the names of columns to be inserted,
# assuming auto-incrementing PKs are serial types
column_names = ','.join(
[f'"{c.name}"' for c in table.columns
if not (c.primary_key and c.autoincrement)]
)
with engine.begin() as cnx:
insert_sql = sa.text(
f'INSERT INTO {table_name} ({column_names}) (SELECT * FROM temporary_table) ON CONFLICT DO NOTHING; DROP TABLE temporary_table'
)
cnx.execute(insert_sql)
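For example, with the symbols table from the question, column_names would resolve to just "symbol" (symbol_id is an auto-incrementing primary key and is filtered out), so a hypothetical call would look like this:
import pandas as pd

# hypothetical usage: the DataFrame carries only the non-serial columns
df = pd.DataFrame({"symbol": ["AAPL", "TSLA"]})
export_to_sql(df, "symbols")
# emits: INSERT INTO symbols ("symbol") (SELECT * FROM temporary_table)
#        ON CONFLICT DO NOTHING; DROP TABLE temporary_table
# symbol_id is then assigned by its sequence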
Consider the following database table:
ID ticker description
1 GDBR30 30YR
2 GDBR10 10YR
3 GDBR5 5YR
4 GDBR2 2YR
It can be replicated with this piece of code:
from sqlalchemy import (
Column,
Integer,
MetaData,
String,
Table,
create_engine,
insert,
select,
)
engine = create_engine("sqlite+pysqlite:///:memory:", echo=True, future=True)
metadata = MetaData()
# Creating the table
tickers = Table(
"tickers",
metadata,
Column("id", Integer, primary_key=True, autoincrement=True),
Column("ticker", String, nullable=False),
Column("description", String(), nullable=False),
)
metadata.create_all(engine)
# Populating the table
with engine.connect() as conn:
result = conn.execute(
insert(tickers),
[
{"ticker": "GDBR30", "description": "30YR"},
{"ticker": "GDBR10", "description": "10YR"},
{"ticker": "GDBR5", "description": "5YR"},
{"ticker": "GDBR2", "description": "2YR"},
],
)
conn.commit()
I need to filter tickers for some values:
search_list = ["GDBR10", "GDBR5", "GDBR30"]
records = conn.execute(
select(tickers.c.description).where((tickers.c.ticker).in_(search_list))
)
print(records.fetchall())
# Result
# [('30YR',), ('10YR',), ('5YR',)]
However, I need the resulting list of tuples ordered in the way search_list has been ordered. That is, I need the following result:
print(records.fetchall())
# Expected result
# [('10YR',), ('5YR',), ('30YR',)]
Using SQLite, you could create a CTE with two columns (id and ticker). Applying the following code will lead to the expected result (see Maintain order when using SQLite WHERE-clause and IN operator). Unfortunately, I am not able to transfer the SQLite solution to SQLAlchemy.
WITH cte(id, ticker) AS (VALUES (1, 'GDBR10'), (2, 'GDBR5'), (3, 'GDBR30'))
SELECT t.*
FROM tbl t INNER JOIN cte c
ON c.ticker = t.ticker
ORDER BY c.id
Supposing I have search_list_tuple as follows, how am I supposed to write the SQLAlchemy query?
search_list_tuple = [(1, 'GDBR10'), (2, 'GDBR5'), (3, 'GDBR30')]
The following works and is actually equivalent to the VALUES (...) CTE on SQLite, albeit somewhat more verbose:
from sqlalchemy import literal, union_all

# construct the CTE
sub_queries = [
select(literal(i).label("id"), literal(v).label("ticker"))
for i, v in enumerate(search_list)
]
cte = union_all(*sub_queries).cte("cte")
# desired query
records = conn.execute(
select(tickers.c.description)
.join(cte, cte.c.ticker == tickers.c.ticker)
.order_by(cte.c.id)
)
print(records.fetchall())
# [('10YR',), ('5YR',), ('30YR',)]
The following uses the values() construct; unfortunately the resulting query fails on SQLite, but it works perfectly on PostgreSQL:
from sqlalchemy import column, values

cte = select(
values(
column("id", Integer), column("ticker", String), name="subq"
).data(list(zip(range(len(search_list)), search_list)))
).cte("cte")
qq = (
select(tickers.c.description)
.join(cte, cte.c.ticker == tickers.c.ticker)
.order_by(cte.c.id)
)
records = conn.execute(qq)
print(records.fetchall())
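A third option (a sketch, not in the original answer) avoids the CTE entirely: SQLAlchemy's case() construct can map each ticker to its position in search_list and order by that:
from sqlalchemy import case

# sketch: ORDER BY a CASE expression built from the positions in search_list
ordering = case(
    {ticker: idx for idx, ticker in enumerate(search_list)},
    value=tickers.c.ticker,
)
records = conn.execute(
    select(tickers.c.description)
    .where(tickers.c.ticker.in_(search_list))
    .order_by(ordering)
)
print(records.fetchall())
# expected: [('10YR',), ('5YR',), ('30YR',)]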
I'd like to run raw SQL queries through SQLAlchemy and have the resulting rows use python types which are automatically mapped from the database type. This AutoMap functionality is available for tables in the database. Is it available for any arbitrary resultset?
As an example, we build a small SQLite database:
import sqlite3
con = sqlite3.connect('test.db')
cur = con.cursor()
cur.execute("CREATE TABLE Trainer (id INTEGER PRIMARY KEY, first_name VARCHAR(50), last_name VARCHAR(50), dob DATE, tiger_skill FLOAT);")
cur.execute("INSERT INTO Trainer VALUES (1, 'Joe', 'Exotic', '1963-03-05', 0.6)")
cur.execute("INSERT INTO Trainer VALUES (2, 'Carole', 'Baskin', '1961-06-06', 0.3)")
cur.close()
con.commit()
con.close()
And using SQLAlchemy, I query the newly created database "test.db":
from sqlalchemy import create_engine
engine = create_engine("sqlite:///test.db")
connection = engine.connect()
CUSTOM_SQL_QUERY = "SELECT count(*) as total_trainers, min(dob) as first_dob from Trainer"
result = connection.execute(CUSTOM_SQL_QUERY)
for r in result:
print(r)
>>> (2, '1961-06-06')
Notice that the second column in the result set is a python string, not a python datetime.date object. Is there a way for sqlalchemy to automap an arbitrary result set? Or is this automap reflection capability limited to just actual tables in the database?
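One possibility worth sketching: SQLAlchemy's text().columns() method can attach result types to an arbitrary textual query, and the dialect's result processors then coerce the raw values (a sketch, assuming the table built above):
from sqlalchemy import text, Integer, Date

# sketch: declare the result types so SQLAlchemy coerces the raw values
stmt = text(
    "SELECT count(*) AS total_trainers, min(dob) AS first_dob FROM Trainer"
).columns(total_trainers=Integer, first_dob=Date)
for r in connection.execute(stmt):
    print(r)
# expected: (2, datetime.date(1961, 6, 6))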
Using SQLAlchemy on PostgreSQL, I am trying to improve insertion performance (about 100k edges to insert) by executing "nested inserts" in a single query for one edge and its nodes.
Using Insert.from_select, I get the following error, and I don't really understand why.
CompileError: bindparam() name 'name' is reserved for automatic usage in the VALUES or SET clause of this insert/update statement. Please use a name other than column name when using bindparam() with insert() or update() (for example, 'b_name').
from sqlalchemy import *
metadata = MetaData()
node = Table('node', metadata,
Column('id', Integer, primary_key=True),
Column('name', String),
)
edge = Table('edge', metadata,
Column('id', Integer, primary_key=True),
Column('name', String),
Column('source_id', Integer(), ForeignKey(node.c.id)),
Column('target_id', Integer(), ForeignKey(node.c.id)),
)
engine = create_engine('postgres://postgres:postgres@db:5432')
metadata.create_all(engine)
e1_source = insert(node).values(name='e1_source').returning(node.c.id).cte('source')
e1_target = insert(node).values(name='e1_target').returning(node.c.id).cte('target')
e1 = insert(edge).from_select(
['source_id', 'target_id', 'name'], # bindparam error
# ['source_id', 'target_id', 'b_name'], # key error
# [edge.c.source_id, edge.c.target_id, edge.c.name], # bindparam error
select([
e1_source.c.id,
e1_target.c.id,
literal('e1'),
])
)
engine.execute(e1)
EDIT: Below is the SQL query I expected to produce. I remain open to any suggestions for achieving my purpose, though.
CREATE TABLE node (
id SERIAL PRIMARY KEY,
name VARCHAR
);
CREATE TABLE edge (
id SERIAL PRIMARY KEY,
source_id INTEGER REFERENCES node (id),
target_id INTEGER REFERENCES node (id),
name VARCHAR
);
WITH source AS (
INSERT INTO node (name)
VALUES ('e1_source')
RETURNING id
), target as (
INSERT INTO node (name)
VALUES ('e1_target')
RETURNING id
)
INSERT INTO edge (source_id, target_id, name)
SELECT source.id, target.id, 'e1'
FROM source, target;
I have finally figured out where bindparam was implicitly used by SQLAlchemy: in the node queries, not in the edge query as I first thought.
But I am still not sure whether this is the proper way to perform nested insert queries with SQLAlchemy, or whether it will improve execution time.
e1_source = insert(node).values(name=bindparam('source_name')).returning(node.c.id).cte('source')
e1_target = insert(node).values(name=bindparam('target_name')).returning(node.c.id).cte('target')
e1 = insert(edge).from_select(
['source_id', 'target_id', 'name'],
select([
e1_source.c.id,
e1_target.c.id,
literal('e1'),
])
)
engine.execute(e1, {
'source_name': 'e1_source',
'target_name': 'e1_target',
})
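For the bulk goal (~100k edges), the same statement can be executed with a list of parameter dictionaries (executemany style). A sketch, keeping in mind that the edge name is still baked in as the literal 'e1':
# sketch: executemany-style invocation; only the node names vary here
engine.execute(e1, [
    {'source_name': 'a_source', 'target_name': 'a_target'},
    {'source_name': 'b_source', 'target_name': 'b_target'},
])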