I'm trying to understand what the set_ argument means in SQLAlchemy's on_conflict_do_update method. I have the following Table:
Table(
    "test",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("firstname", String(100)),
    Column("lastname", String(100)),
)
and I want to insert something like this (if I wrote it in psql):
INSERT INTO test (id, firstname, lastname) VALUES (1, 'John', 'Doe')
ON CONFLICT (id) DO UPDATE SET firstname = EXCLUDED.firstname, lastname = EXCLUDED.lastname
I did some due diligence and saw people populate set_ like this:
from sqlalchemy import inspect
from sqlalchemy.dialects import postgresql

insert_stmt = postgresql.insert(target).values([{'id': 1, 'firstname': 'John', 'lastname': 'Doe'}])
primary_keys = [key.name for key in inspect(target).primary_key]
update_dict = {c.name: c for c in insert_stmt.excluded if not c.primary_key}
stmt = insert_stmt.on_conflict_do_update(index_elements=primary_keys, set_=update_dict)
engine.execute(stmt)
Is the update_dict just referencing the EXCLUDED values (the ones I want to update with) that I set in my insert_stmt? If I str(update_dict) I get a dictionary of specific information regarding the columns: {'firstname': Column('firstname', VARCHAR(length=100), table=<excluded>), 'lastname': Column('lastname', VARCHAR(length=100), table=<excluded>)}. Is the method above the only way to build that dictionary, or can you write it out manually?
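For reference, writing the set_ dict out manually would look something like this (a sketch, building on the excluded namespace shown above rather than any new API):

insert_stmt = postgresql.insert(target).values(id=1, firstname='John', lastname='Doe')
stmt = insert_stmt.on_conflict_do_update(
    index_elements=['id'],
    set_={
        # these are the EXCLUDED.firstname / EXCLUDED.lastname references
        'firstname': insert_stmt.excluded.firstname,
        'lastname': insert_stmt.excluded.lastname,
    },
)
engine.execute(stmt)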
Related
I am using SQLAlchemy to pull data from my database, specifically via the db.select method. I can pull out either only the values from the columns or only the names of the columns, but I need the results in the format NAME: VALUE. How can I do this?
import sqlalchemy as db

connection = engine.connect()
metadata = db.MetaData()
report = db.Table('report', metadata, autoload=True, autoload_with=engine)
query = db.select([report])
ResultProxy = connection.execute(query)
ResultSet = ResultProxy.fetchall()
With SQLAlchemy 1.4+ we can use .mappings() to return results in a dictionary-like format:
import sqlalchemy as sa
from pprint import pprint
# …
t = sa.Table(
    "t",
    sa.MetaData(),
    sa.Column("id", sa.Integer, primary_key=True, autoincrement=False),
    sa.Column("txt", sa.String),
)
t.create(engine)

# insert some sample data
with engine.begin() as conn:
    conn.exec_driver_sql(
        "INSERT INTO t (id, txt) VALUES (1, 'foo'), (2, 'bar')"
    )

# test code
with engine.begin() as conn:
    results = conn.execute(sa.select(t)).mappings().fetchall()
    pprint(results)
# [{'id': 1, 'txt': 'foo'}, {'id': 2, 'txt': 'bar'}]
As the docs state, ResultProxy.fetchall() returns a list of RowProxy objects. These behave like namedtuples, but can also be used like dictionaries:
>>> ResultSet[0]['column_name']
column_value
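So, to get every row in NAME: VALUE form under 1.3, one option is to convert each RowProxy to a plain dict (a small sketch using the ResultSet above):

rows_as_dicts = [dict(row) for row in ResultSet]
# [{'column_name': column_value, ...}, ...]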
For more info, see https://docs.sqlalchemy.org/en/13/core/tutorial.html#coretutorial-selecting
Using Python and SQLAlchemy, is it possible to insert a None / NIL_UUID / NULL value into a PostgreSQL foreign key column that links to a primary key, both stored as UUIDs?
Using None returns column "none" does not exist:
statement = "INSERT INTO tb_person (pk_person, first_name, last_name, fk_person_parent) VALUES ('9ce131...985
fea06', 'John', 'Doe', None)"
parameters = {}, context = <sqlalchemy.dialects.postgresql.psycopg2.PGExecutionContext_psycopg2 object at 0x7fbff5ea2730>
def do_execute(self, cursor, statement, parameters, context=None):
> cursor.execute(statement, parameters)
E psycopg2.errors.UndefinedColumn: column "none" does not exist
E LINE 1: '9ce131...985','John', 'Doe', None)
E ^
E HINT: Perhaps you meant to reference the column "tb_person.last_name".
../../.local/share/virtualenvs/project/lib/python3.8/site-packages/sqlalchemy/engine/default.py:593: UndefinedColumn
A NIL_UUID (i.e. a valid UUID formed entirely of zeros) returns psycopg2.errors.ForeignKeyViolation:
E psycopg2.errors.ForeignKeyViolation: insert or update on table "tb_person" violates foreign key constraint "tb_person_fk_person_parent_fkey"
E DETAIL: Key (fk_person_parent)=(00000000-0000-0000-0000-000000000000) is not present in table "tb_person".
MORE DETAILS
I use SQLAlchemy classical mapping (SQLAlchemy Core); my table is defined like this:
tb_person = Table(
    "tb_person",
    metadata,
    Column(
        "pk_person",
        UUID(as_uuid=True),
        default=uuid.uuid4,
        unique=True,
        nullable=False
    ),
    Column("first_name", String(255)),
    Column("last_name", String(255)),
    Column(
        "fk_person_parent", UUID(as_uuid=True),
        ForeignKey("tb_person.pk_person"),
        nullable=True
    )
)
The mapper is defined like this:
client_mapper = mapper(
    domain.model.Person,
    tb_person,
    properties={
        "child": relationship(domain.model.Person),
    },
)
The unit test works well when inserting a UUID that already exists in the database in the pk_person field.
I was using raw SQL, and as suggested by @AdrianKlaver it is much preferable to use params, which fixes the problem.
# In the context of this unit test,
# we have to handle UUID generation here
uuid_1 = uuid.uuid4()
uuid_2 = uuid.uuid4()
uuid_3 = uuid.uuid4()
session.execute(
    """
    INSERT INTO tb_person
        (pk_person, first_name, last_name, fk_person_parent)
    VALUES
        (:uuid_1, 'John', 'Doe', :uuid_none),
        (:uuid_2, 'Jean', 'Dupont', :uuid_none),
        (:uuid_3, 'Baby', 'Doe', :uuid_1)
    """,
    {"uuid_1": uuid_1, "uuid_2": uuid_2, "uuid_3": uuid_3, "uuid_none": None}
)
It effectively translates to NULL in the query.
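For comparison, the same insert at the Core level (a sketch against the tb_person table defined above; passing None as a bind value renders NULL in exactly the same way):

session.execute(
    tb_person.insert().values(
        pk_person=uuid_1,
        first_name='John',
        last_name='Doe',
        fk_person_parent=None,  # bound as NULL
    )
)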
Using SQLAlchemy on PostgreSQL, I am trying to improve insert performance (about 100k edges to insert) by executing "nested inserts" in a single query for one edge and its nodes.
Using Insert.from_select, I get the following error and I don't really understand why.
CompileError: bindparam() name 'name' is reserved for automatic usage in the VALUES or SET clause of this insert/update statement. Please use a name other than column name when using bindparam() with insert() or update() (for example, 'b_name').
from sqlalchemy import *

metadata = MetaData()
node = Table('node', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String),
)
edge = Table('edge', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String),
    Column('source_id', Integer(), ForeignKey(node.c.id)),
    Column('target_id', Integer(), ForeignKey(node.c.id)),
)
engine = create_engine('postgres://postgres:postgres@db:5432')
metadata.create_all(engine)
e1_source = insert(node).values(name='e1_source').returning(node.c.id).cte('source')
e1_target = insert(node).values(name='e1_target').returning(node.c.id).cte('target')
e1 = insert(edge).from_select(
    ['source_id', 'target_id', 'name'],  # bindparam error
    # ['source_id', 'target_id', 'b_name'],  # key error
    # [edge.c.source_id, edge.c.target_id, edge.c.name],  # bindparam error
    select([
        e1_source.c.id,
        e1_target.c.id,
        literal('e1'),
    ])
)
engine.execute(e1)
EDIT: Below is the SQL query I expected to produce. I remain open to any other suggestions to achieve my purpose, though.
CREATE TABLE node (
    id SERIAL PRIMARY KEY,
    name VARCHAR
);
CREATE TABLE edge (
    id SERIAL PRIMARY KEY,
    source_id INTEGER REFERENCES node (id),
    target_id INTEGER REFERENCES node (id),
    name VARCHAR
);

WITH source AS (
    INSERT INTO node (name)
    VALUES ('e1_source')
    RETURNING id
), target AS (
    INSERT INTO node (name)
    VALUES ('e1_target')
    RETURNING id
)
INSERT INTO edge (source_id, target_id, name)
SELECT source.id, target.id, 'e1'
FROM source, target;
I have finally figured out where bindparam was implicitly used by SQLAlchemy, which solved my issue: it was in the node queries, not the edge query as I first thought.
But I am still not sure whether this is the proper way to perform nested insert queries with SQLAlchemy, or whether it will improve execution time.
e1_source = insert(node).values(name=bindparam('source_name')).returning(node.c.id).cte('source')
e1_target = insert(node).values(name=bindparam('target_name')).returning(node.c.id).cte('target')
e1 = insert(edge).from_select(
    ['source_id', 'target_id', 'name'],
    select([
        e1_source.c.id,
        e1_target.c.id,
        literal('e1'),
    ])
)
engine.execute(e1, {
    'source_name': 'e1_source',
    'target_name': 'e1_target',
})
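To check that this renders the CTE query shown in the EDIT above, the statement can be compiled against the PostgreSQL dialect (a quick sanity check; assumes the dialects import below):

from sqlalchemy.dialects import postgresql
print(e1.compile(dialect=postgresql.dialect()))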
I am refactoring some old SQLite3 SQL statements in Python into SQLAlchemy. In our framework, we have the following SQL statements that take in a dict with certain known keys and potentially any number of unexpected keys and values (depending on what information was provided).
import sqlite3

def dict_factory(cursor, row):
    d = {}
    for idx, col in enumerate(cursor.description):
        d[col[0]] = row[idx]
    return d

def Create_DB(db):
    # Delete the database
    from os import remove
    remove(db)
    # Recreate it and format it as needed
    with sqlite3.connect(db) as conn:
        conn.row_factory = dict_factory
        conn.text_factory = str
        cursor = conn.cursor()
        cursor.execute("CREATE TABLE [Listings] ([ID] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL UNIQUE, [timestamp] REAL NOT NULL DEFAULT(( datetime ( 'now' , 'localtime' ) )), [make] VARCHAR, [model] VARCHAR, [year] INTEGER);")

def Add_Record(db, data):
    with sqlite3.connect(db) as conn:
        conn.row_factory = dict_factory
        conn.text_factory = str
        cursor = conn.cursor()
        # get column names already in table
        cursor.execute("SELECT * FROM 'Listings'")
        col_names = list(map(lambda x: x[0], cursor.description))
        # check if column doesn't exist in table, then add it
        for i in data.keys():
            if i not in col_names:
                cursor.execute("ALTER TABLE 'Listings' ADD COLUMN '{col}' {type}".format(col=i, type='INT' if type(data[i]) is int else 'VARCHAR'))
        # Insert record into table (join keys/values so this works on Python 3)
        cursor.execute("INSERT INTO Listings({cols}) VALUES({vals});".format(
            cols=', '.join(data.keys()),
            vals=', '.join(repr(data[i]) for i in data)
        ))
# Database filename
db = 'test.db'
Create_DB(db)

data = {'make': 'Chevy',
        'model': 'Corvette',
        'year': 1964,
        'price': 50000,
        'color': 'blue',
        'doors': 2}
Add_Record(db, data)

data = {'make': 'Chevy',
        'model': 'Camaro',
        'year': 1967,
        'price': 62500,
        'condition': 'excellent'}
Add_Record(db, data)
This level of dynamism is necessary because there's no way we can know what additional information will be provided, but, regardless, it's important that we store all information given to us. This has never been a problem in our framework, as we've never expected an unwieldy number of columns in our tables.
While the above code works, it's obviously not a clean implementation, which is why I'm trying to refactor it into SQLAlchemy's cleaner, more robust ORM paradigm. I started going through SQLAlchemy's official tutorials and various examples and have arrived at the following code:
from sqlalchemy import Column, String, Integer
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Listing(Base):
    __tablename__ = 'Listings'
    id = Column(Integer, primary_key=True)
    make = Column(String)
    model = Column(String)
    year = Column(Integer)

engine = create_engine('sqlite:///')
session = sessionmaker()
session.configure(bind=engine)
Base.metadata.create_all(engine)

data = {'make': 'Chevy',
        'model': 'Corvette',
        'year': 1964}

record = Listing(**data)
s = session()
s.add(record)
s.commit()
s.close()
and it works beautifully with that data dict. Now, when I add a new keyword, such as
data = {'make': 'Chevy',
        'model': 'Corvette',
        'year': 1964,
        'price': 50000}
I get a TypeError: 'price' is an invalid keyword argument for Listing error. To try and solve the issue, I modified the class to be dynamic, too:
class Listing(Base):
    __tablename__ = 'Listings'
    id = Column(Integer, primary_key=True)
    make = Column(String)
    model = Column(String)
    year = Column(Integer)

    def __checker__(self, data):
        for i in data.keys():
            if i not in [a for a in dir(self) if not a.startswith('__')]:
                if type(i) is int:
                    setattr(self, i, Column(Integer))
                else:
                    setattr(self, i, Column(String))
            else:
                self[i] = data[i]
But I quickly realized this would not work at all for several reasons, e.g. the class was already initialized, the data dict cannot be fed into the class without reinitializing it, and it's a hack more than anything. The more I think about it, the less obvious the solution using SQLAlchemy seems to me. So, my main question is: how do I implement this level of dynamism using SQLAlchemy?
I've researched a bit to see if anyone has a similar issue. The closest I've found was Dynamic Class Creation in SQLAlchemy, but it only talks about the constant attributes ("tablename" et al.). I believe the unanswered https://stackoverflow.com/questions/29105206/sqlalchemy-dynamic-attribute-change may be asking the same question. While Python is not my forte, I consider myself a highly skilled programmer (C++ and JavaScript are my strongest languages) in the context of scientific/engineering applications, so I may not be hitting the correct Python-specific keywords in my searches.
I welcome any and all help.
class Listing(Base):
    __tablename__ = 'Listings'
    id = Column(Integer, primary_key=True)
    make = Column(String)
    model = Column(String)
    year = Column(Integer)

    def __init__(self, **kwargs):
        for k, v in kwargs.items():
            if hasattr(self, k):
                setattr(self, k, v)
            else:
                # add the missing column to the table, then to the class
                engine.execute("ALTER TABLE %s ADD COLUMN %s VARCHAR" % (self.__tablename__, k))
                setattr(self.__class__, k, Column(k, String))
                setattr(self, k, v)
might work ... maybe ... I am not entirely sure; I did not test it.
a better solution would be to use a related attributes table:
class Attribs(Base):
    __tablename__ = 'ListingAttribs'  # table name assumed for this sketch
    id = Column(Integer, primary_key=True)
    listing_id = Column(Integer, ForeignKey("Listings.id"))
    name = Column(String)
    val = Column(String)

class Listing(Base):
    __tablename__ = 'Listings'
    id = Column(Integer, primary_key=True)
    attributes = relationship("Attribs", backref="listing")

    def __init__(self, **kwargs):
        # each key/value becomes a child row; the relationship
        # cascades the inserts when the Listing is committed
        self.attributes = [Attribs(name=k, val=str(v)) for k, v in kwargs.items()]

    def __str__(self):
        return "\n".join(["A LISTING"] + ["%s:%s" % (a.name, a.val) for a in self.attributes])
another solution would be to store JSON:
import json

class Listing(Base):
    __tablename__ = 'Listings'
    id = Column(Integer, primary_key=True)
    data = Column(String)

    def __init__(self, **kwargs):
        self.data = json.dumps(kwargs)
        self.data_dict = kwargs  # plain attribute, not persisted
the best solution would be to use a NoSQL key/value store (maybe even just a simple JSON file? or perhaps shelve? or even pickle, I guess)
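For the JSON variant, reading a record back would look something like this (a sketch; json.loads simply reverses the dumps done in __init__):

record = s.query(Listing).first()
data = json.loads(record.data)
print(data['price'])  # 50000, assuming the sample data above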
I have a database table with an id primary key that is not an auto-increment (sequence), so it's up to the user to create a unique id or the insert will fail.
This table is not under my control, so I cannot change the database structure.
from sqlalchemy import create_engine, Table, MetaData
import psycopg2

db = create_engine('postgresql://...', echo=False).connect()
meta = MetaData()
meta.reflect(bind=db)

t = Table("mytable", meta, autoload=True, autoload_with=db)
values = {"title": "title", "id": ...}  # ???
t.insert(bind=db, values=values).execute()
Given this is a "single-user" / "single-client" system, you should be able to use Column defaults: Python-Executed Functions. The example in the documentation linked to is enough to get you started. I would, however, use a Python function, but with proper initialization from the database, and still store the counter in a global variable:
def new_id_factory():
    global _MYTABLE_ID_  # without this, the counter never persists between calls
    if '_MYTABLE_ID_' not in globals():
        # seed the counter from the current max id in the table
        q = db.execute("select max(mytable.id) as max_id from mytable").fetchone()
        _MYTABLE_ID_ = (q and q.max_id) or 0
    _MYTABLE_ID_ += 1
    return _MYTABLE_ID_

t = Table("mytable", Base.metadata,
    Column('id', Integer, primary_key=True, default=new_id_factory),
    autoload=True, autoload_with=db,
)
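Hypothetical usage, in the question's own style (the id column's default fires whenever no id is supplied, so each insert calls new_id_factory):

t.insert(bind=db, values={"title": "first"}).execute()
t.insert(bind=db, values={"title": "second"}).execute()  # gets the next id, and so on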