Upserting with SQLAlchemy into Postgres to Table with a '.' in the name - python

I am trying to upsert into a Postgres table where some of the column names contain a '.'.
Example column name: country.name.
I would prefer not to rename the columns.
When I try the following, I get an error.
def upsert(df: DataFrame, engine: sql_engine) -> None:
    with engine.connect() as conn:
        base = automap_base()
        base.prepare(engine, reflect=True, schema="some_schema")
        table1 = Table('table1', base.metadata,
                       autoload=True, autoload_with=engine, schema="some_schema")
        stmt = insert(table1).values(df.to_dict('records'))
        conn.execute(stmt.on_conflict_do_update(
            constraint='table1_pkey',
            set_=dict(country.name=stmt.excluded.country.name
        )))
I get the following error:
SyntaxError: expression cannot contain assignment, perhaps you meant "=="?
I was trying to follow this recipe which was working fine until the name of the columns had a '.'
https://docs.sqlalchemy.org/en/14/dialects/postgresql.html#updating-using-the-excluded-insert-values
Any tips?
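A note on where the error comes from: the SyntaxError is raised by Python while parsing the call, before SQLAlchemy runs at all, because keyword argument names must be plain identifiers and country.name is not one. The small sketch below illustrates this with the built-in compile() and shows that a dict with a string key is accepted. The value 1 is only a placeholder and is not part of the original code.

```python
# Keyword arguments must be valid identifiers, so this source text
# cannot even be parsed.
bad_source = "dict(country.name=1)"

try:
    compile(bad_source, "<example>", "eval")
    parsed = True
except SyntaxError:
    parsed = False

# A dict literal (or dict(**{...})) accepts any string as a key.
good = {"country.name": 1}
also_good = dict(**{"country.name": 1})

print(parsed)             # False
print(good == also_good)  # True
```

So any approach that builds the set_ dictionary with string keys sidesteps the problem.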

The SQLAlchemy insert statement exposes an excluded attribute that contains all of the columns; if you build the update from it, this will work.
I created an update_dict that maps each column name to the corresponding column object from excluded, filtering out the primary keys.
This way it doesn't matter how the column names are constructed.
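from pandas import DataFrame  # assumed source of DataFrame
from sqlalchemy import Table
from sqlalchemy.dialects.postgresql import insert
from sqlalchemy.ext.automap import automap_base

def upsert(df: DataFrame, engine: sql_engine) -> None:
    with engine.connect() as conn:
        base = automap_base()
        base.prepare(engine, reflect=True, schema="some_schema")
        table1 = Table('table1', base.metadata,
                       autoload=True, autoload_with=engine, schema="some_schema")
        stmt = insert(table1).values(df.to_dict('records'))
        # Map every non-primary-key column name to its EXCLUDED counterpart.
        update_dict = {
            c.name: c
            for c in stmt.excluded
            if not c.primary_key
        }
        conn.execute(stmt.on_conflict_do_update(
            constraint='table1_pkey',
            set_=update_dict))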
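If only a single dotted column needs to be updated, it should also be possible to name it directly, since excluded supports indexing by string. The following is a minimal sketch under assumptions: table1 is defined by hand here as a stand-in for the reflected table, the id column and the sample row are made up, and the statement is only compiled for the PostgreSQL dialect rather than executed.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import insert

metadata = MetaData()

# Hypothetical stand-in for the reflected table; only the dotted
# column name matters for this example.
table1 = Table(
    "table1",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("country.name", String),
    schema="some_schema",
)

stmt = insert(table1).values({"id": 1, "country.name": "France"})

# Use a string key and string indexing instead of attribute access.
stmt = stmt.on_conflict_do_update(
    index_elements=[table1.c.id],
    set_={"country.name": stmt.excluded["country.name"]},
)

# Render the SQL for PostgreSQL without connecting to a database.
sql = str(stmt.compile(dialect=postgresql.dialect()))
print(sql)
```

Printing the compiled statement is a convenient way to check that the dotted name ends up quoted as one identifier in the SET clause.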

Related

understanding insert, on conflict (upsert) in SQLAlchemy

I'm trying to understand what the set_ means in SQLAlchemy's on_conflict_do_update method. I have the following Table:
Table(
    "test",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("firstname", String(100)),
    Column("lastname", String(100)),
)
and want to insert something like this (if I wrote it in psql):
INSERT INTO test (id, firstname, lastname) VALUES (1, 'John', 'Doe')
ON CONFLICT (id) DO UPDATE SET firstname = EXCLUDED.firstname, lastname = EXCLUDED.lastname
I did some due diligence and saw people write the set_ like this:
from sqlalchemy.dialects import postgresql
insert_stmt = postgresql.insert(target).values([{'id':1,'firstname':'John','lastname':'Doe'}])
primary_keys = [key.name for key in inspect(target).primary_key]
update_dict = {c.name: c for c in insert_stmt.excluded if not c.primary_key}
stmt = insert_stmt.on_conflict_do_update(index_elements = primary_keys , set_ = update_dict)
engine.execute(stmt)
Is update_dict just looking at the EXCLUDED values (the ones I want to update with) that I set in my insert_stmt? If I print str(update_dict), I get a dictionary with specific information about each column: {'firstname': Column('firstname', VARCHAR(length=100), table=<excluded>), 'lastname': Column('lastname', VARCHAR(length=100), table=<excluded>)}. Is the method above the only way to retrieve the data? Can you write it out manually?

Why am I getting a "relation does not exist" error for existing table with sqlalchemy Metadata?

I have the following code which throws the following error
engine = create_engine('postgresql+psycopg2:....', convert_unicode=True)
metadata = sqlalchemy.MetaData()
table = sqlalchemy.Table('omni.all_order', metadata,
sqlalchemy.Column('o_id', sqlalchemy.Integer),
sqlalchemy.Column('order', sqlalchemy.String),
)
ins = table.insert().values(all_rows)
engine.execute(ins)
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) relation
"omni.all_order" does not exist
But the following two snippets work fine
engine = create_engine('postgresql+psycopg2:....', convert_unicode=True)
result = engine.execute("SELECT * from omni.all_order ")
rows = result.fetchall()
print(rows)
--
engine = create_engine('postgresql+psycopg2:....', convert_unicode=True)
engine.execute("INSERT INTO omni.all_order (o_id) VALUES (1) ")
Creating another table first in the same schema (omni) throws the same error
engine = create_engine('postgresql+psycopg2:....', convert_unicode=True)
result = engine.execute("CREATE TABLE omni.all_order_s(o_id INT, order VARCHAR(80))")
metadata = sqlalchemy.MetaData()
table = sqlalchemy.Table('omni.all_order_s', metadata,
sqlalchemy.Column('o_id', sqlalchemy.Integer),
sqlalchemy.Column('order', sqlalchemy.String),
)
ins = table.insert().values(all_rows)
engine.execute(ins)
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) relation
"omni.all_order_s" does not exist
but creating it outside of the schema works fine
engine = create_engine('postgresql+psycopg2:....', convert_unicode=True)
result = engine.execute("CREATE TABLE all_order_s(o_id INT, order VARCHAR(80))")
metadata = sqlalchemy.MetaData()
table = sqlalchemy.Table('all_order_s', metadata,
sqlalchemy.Column('o_id', sqlalchemy.Integer),
sqlalchemy.Column('order', sqlalchemy.String),
)
ins = table.insert().values(all_rows)
engine.execute(ins)
Any ideas why this is?
Pass the table's schema using the schema= keyword argument instead of including it in the table's name:
table = sqlalchemy.Table('all_order', metadata,
    sqlalchemy.Column('o_id', sqlalchemy.Integer),
    sqlalchemy.Column('order', sqlalchemy.String),
    schema='omni',
)
Currently the name is quoted as a whole, so Postgres looks for a single relation literally named "omni.all_order" rather than the table all_order in the schema omni.
I had the same problem and I found the solution in this link: https://dba.stackexchange.com/questions/192897/postgres-relation-does-not-exist-error.
When you create the table name from a variable, the name is passed with quotes, so the name is case sensitive and needs the quotes whenever you refer to it again.
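The difference between a qualified name and a name quoted as a whole can be reproduced without Postgres. The sketch below uses Python's built-in sqlite3 module as an analogy only: an attached in-memory database named omni plays the role of the schema, which is an assumption made for illustration and not how the original setup works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach a second in-memory database under the name "omni" so that
# omni.all_order is a qualified name (SQLite's closest analogue to a schema).
conn.execute("ATTACH DATABASE ':memory:' AS omni")
conn.execute('CREATE TABLE omni.all_order (o_id INTEGER, "order" TEXT)')
conn.execute("INSERT INTO omni.all_order (o_id) VALUES (1)")

# Qualified name: schema and table are two separate identifiers.
rows = conn.execute("SELECT o_id FROM omni.all_order").fetchall()

# Whole string quoted: one identifier that happens to contain a dot.
try:
    conn.execute('SELECT o_id FROM "omni.all_order"').fetchall()
    quoted_lookup_failed = False
except sqlite3.OperationalError:
    quoted_lookup_failed = True

print(rows)                  # [(1,)]
print(quoted_lookup_failed)  # True
```

Passing 'omni.all_order' as the table name to Table() produces the second form, which is why the lookup fails even though the table exists.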

Executing a sqlalchemy exists query

I'm having trouble understanding how to execute a query to check and see if a matching record already exists in sqlalchemy. Most of the examples I can find online seem to reference "session" and "query" objects that I don't have.
Here's a short complete program that illustrates my problem:
1. sets up in-memory sqlite db with "person" table.
2. inserts two records into the person table.
3. check if a particular record exists in the table. This is where it barfs.
from sqlalchemy import create_engine, Table, Column, Integer, String, MetaData
from sqlalchemy.sql.expression import exists
engine = create_engine('sqlite:///:memory:', echo=False)
metadata = MetaData()
person = Table('person', metadata,
Column('id', Integer, primary_key=True),
Column('name', String(255), nullable=False))
metadata.create_all(engine)
conn = engine.connect()
s = person.insert()
conn.execute(s, name="Alice")
conn.execute(s, name="Bob")
print("I can see the names in the table:")
s = person.select()
result = conn.execute(s)
print(result.fetchall())
print('This query looks like it should check to see if a matching record exists:')
s = person.select().where(person.c.name == "Bob")
s = exists(s)
print(s)
print("But it doesn't run...")
result = conn.execute(s)
The output of this program is:
I can see the names in the table:
[(1, 'Alice'), (2, 'Bob')]
This query looks like it should check to see if a matching record exists:
EXISTS (SELECT person.id, person.name
FROM person
WHERE person.name = :name_1)
But it doesn't run...
Traceback (most recent call last):
File "/project_path/db_test/db_test_env/exists_example.py", line 30, in <module>
result = conn.execute(s)
File "/project_path/db_test/db_test_env/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 945, in execute
return meth(self, multiparams, params)
File "/project_path/db_test/db_test_env/lib/python3.6/site-packages/sqlalchemy/sql/elements.py", line 265, in _execute_on_connection
raise exc.ObjectNotExecutableError(self)
sqlalchemy.exc.ObjectNotExecutableError: Not an executable object: <sqlalchemy.sql.selectable.Exists object at 0x105797438>
The exists(s) call only builds the EXISTS clause; it is not an executable statement on its own. All you need to do to get your code to work is to generate a select for it.
s = exists(s).select()
Here's your full example:
from sqlalchemy import create_engine, Table, Column, Integer, String, MetaData
from sqlalchemy.sql.expression import exists
engine = create_engine('sqlite:///:memory:', echo=False)
metadata = MetaData()
person = Table('person', metadata,
Column('id', Integer, primary_key=True),
Column('name', String(255), nullable=False))
metadata.create_all(engine)
conn = engine.connect()
s = person.insert()
conn.execute(s, name="Alice")
conn.execute(s, name="Bob")
print("I can see the names in the table:")
s = person.select()
result = conn.execute(s)
print(result.fetchall())
print('This query looks like it should check to see if a matching record exists:')
s = person.select().where(person.c.name == "Bob")
s = exists(s).select()
print(s)
print("And it runs fine...")
result = conn.execute(s)
print(result.fetchall())
exists is used in SQL subqueries. If you had a table posts containing blog posts with an author_id, mapping back to people, you might use a query like the following to find people who had made a blog post:
select * from people where exists (select author_id from posts where author_id = people.id);
You can't have an exists as the outermost statement in an SQL query; it's an operator to use in SQL boolean clauses.
So, SQLAlchemy is not letting you execute that query because it's not well-formed.
If you want to see if a row exists, just construct a select statement with a where clause and see how many rows the query returns.
Try this instead (select needs to be imported from sqlalchemy):
...
s = person.select().where(person.c.name == "Bob")
s = select(exists(s))
print(s)
...
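At the SQL level, wrapping EXISTS in a SELECT is what makes the statement well-formed. A minimal sketch with Python's built-in sqlite3 module, independent of SQLAlchemy, shows the shape of the query; the helper name person_exists is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO person (name) VALUES (?)", [("Alice",), ("Bob",)])

def person_exists(name):
    # EXISTS sits inside a SELECT list, so the statement is well-formed
    # and returns a single row holding 1 or 0.
    query = "SELECT EXISTS (SELECT 1 FROM person WHERE name = ?)"
    return bool(conn.execute(query, (name,)).fetchone()[0])

print(person_exists("Bob"))    # True
print(person_exists("Carol"))  # False
```

The database can stop at the first matching row, and only a single boolean-like value travels back to the application.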
Unless someone suggests a better answer, here's what I've come up with that works. It has the DB count the matching records and send just the count to the Python app.
from sqlalchemy import select, func # more imports not in my example code above
s = select([func.count(1)]).select_from(person).where(person.c.name == "Bob")
print(s)
record_count = conn.execute(s).scalar()
print("Matching records: ", record_count)
Example output:
SELECT count(:count_2) AS count_1
FROM person
WHERE person.name = :name_1
Matching records: 1

Sqlalchemy - Auto-instantiate all tables

Here is a simple SQLAlchemy task, where I try to create instances of each table present in the database:
from sqlalchemy import MetaData, create_engine, Table
engine = create_engine("here my engine details...")
metadata = MetaData()
If I type engine.table_names(), I can see all my tables' names, for instance ['indicators', 'prices', 'scripts'].
I would normally create an instance of each of them as follows:
scripts = Table('scripts', metadata, autoload = True, autoload_with=engine)
indicators = Table('indicators', metadata, autoload = True, autoload_with=engine)
prices = Table('prices', metadata, autoload = True, autoload_with=engine)
But is there a way to create the Table instances without coding them explicitly?
Doing this:
tables = engine.table_names()
for table in tables:
    table = Table(table, metadata, autoload=True, autoload_with=engine)
obviously doesn't work.
Any suggestions appreciated.
You can do just that. This code will get you a list of tables:
my_tables = [Table(table,metadata,autoload=True,autoload_with=engine) for
table in engine.table_names()]
If you prefer a dictionary do this:
my_tables = {table:Table(table,metadata,autoload=True,autoload_with=engine)
for table in engine.table_names()}
With the dictionary you get O(1) lookup of tables when accessing the elements of your dictionary:
my_tables['indicators']
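Another option that may be worth considering is MetaData.reflect(), which loads every table in one call and exposes the results through metadata.tables. The sketch below is an illustration under assumptions: it uses an in-memory SQLite database and three placeholder tables with a single id column, not the real schema from the question.

```python
from sqlalchemy import Column, Integer, MetaData, Table, create_engine

engine = create_engine("sqlite:///:memory:")

# Create three empty tables so there is something to reflect
# (placeholder schema, not the asker's real one).
setup = MetaData()
for name in ("indicators", "prices", "scripts"):
    Table(name, setup, Column("id", Integer, primary_key=True))
setup.create_all(engine)

# Reflect every table in one call; the Table objects land in metadata.tables.
metadata = MetaData()
metadata.reflect(bind=engine)

table_names = sorted(metadata.tables)
indicators = metadata.tables["indicators"]
print(table_names)  # ['indicators', 'prices', 'scripts']
```

metadata.tables behaves like the dictionary built by hand above, keyed by table name.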

How to specify the primary id when inserting rows with sqlalchemy when id does not have autoincrement?

I have a database table with an id primary key that is not an auto-increment (sequence). So it's up to the user to create a unique id or the insert will fail.
This table is not under my control, so I cannot change the database structure.
from sqlalchemy import create_engine, Table, MetaData
import psycopg2
db = create_engine('postgresql://...', echo=False).connect()
meta = MetaData()
meta.reflect(bind=db)
t = Table("mytable", meta, autoload=True, autoload_with=db)
values = { "title":"title", "id": ... }# ???
t.insert(bind=db, values=values).execute()
Given this is a "single-user" / "single-client" system, you should be able to use the Column defaults: Python-Executed Functions. The example in the documentation linked to is enough to get you started. I would, however, use a Python function with proper initialization from the database, with the counter still stored in a global variable:
def new_id_factory():
    global _MYTABLE_ID_
    if '_MYTABLE_ID_' not in globals():
        # Seed the counter once from the current maximum id in the table.
        q = db.execute("select max(mytable.id) as max_id from mytable").fetchone()
        _MYTABLE_ID_ = (q and q.max_id) or 0
    _MYTABLE_ID_ += 1
    return _MYTABLE_ID_
t = Table("mytable", Base.metadata,
    Column('id', Integer, primary_key=True, default=new_id_factory),
    autoload=True, autoload_with=db,
)
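The same idea, seeding a counter once from the current maximum and then counting in memory, can be sketched without a module-level global. This version uses Python's built-in sqlite3 module and itertools.count purely for illustration; the table contents and the helper name make_id_factory are made up, and like the answer above it is only safe when a single client writes to the table.

```python
import itertools
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO mytable (id, title) VALUES (?, ?)",
                 [(5, "first"), (9, "second")])

def make_id_factory(connection):
    # Read the current maximum once, then count upward in memory.
    max_id = connection.execute("SELECT MAX(id) FROM mytable").fetchone()[0]
    counter = itertools.count((max_id or 0) + 1)
    return lambda: next(counter)

next_id = make_id_factory(conn)
new_ids = [next_id(), next_id()]
for new_id in new_ids:
    conn.execute("INSERT INTO mytable (id, title) VALUES (?, ?)", (new_id, "title"))

print(new_ids)  # [10, 11]
```

A callable like next_id could then be passed as the default= of the id column in the same way as new_id_factory.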
