SQLAlchemy bulk update in MySQL works very slowly - Python

I'm using SQLAlchemy 1.0.0 and want to run some UPDATE ONLY queries (update if the primary key matches, else do nothing) in batch.
I've made some experiments and found that bulk update looks much slower than bulk insert or bulk upsert.
Could you please help me point out why it works so slowly, or is there any alternative way/idea to do a BULK UPDATE (not BULK UPSERT) with SQLAlchemy?
Below is the table in MySQL:
CREATE TABLE `test` (
`id` int(11) unsigned NOT NULL,
`value` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
And the test code:
from sqlalchemy import create_engine, text
import time
driver = 'mysql'
host = 'host'
user = 'user'
password = 'password'
database = 'database'
url = "{}://{}:{}#{}/{}?charset=utf8".format(driver, user, password, host, database)
engine = create_engine(url)
engine.connect()
engine.execute('TRUNCATE TABLE test')
num_of_rows = 1000
rows = []
for i in xrange(0, num_of_rows):
rows.append({'id': i, 'value': i})
print '--------- test insert --------------'
sql = '''
INSERT INTO test (id, value)
VALUES (:id, :value)
'''
start = time.time()
engine.execute(text(sql), rows)
end = time.time()
print 'Cost {} seconds'.format(end - start)
print '--------- test upsert --------------'
for r in rows:
r['value'] = r['id'] + 1
sql = '''
INSERT INTO test (id, value)
VALUES (:id, :value)
ON DUPLICATE KEY UPDATE value = VALUES(value)
'''
start = time.time()
engine.execute(text(sql), rows)
end = time.time()
print 'Cost {} seconds'.format(end - start)
print '--------- test update --------------'
for r in rows:
r['value'] = r['id'] * 10
sql = '''
UPDATE test
SET value = :value
WHERE id = :id
'''
start = time.time()
engine.execute(text(sql), rows)
end = time.time()
print 'Cost {} seconds'.format(end - start)
The output when num_of_rows = 100:
--------- test insert --------------
Cost 0.568960905075 seconds
--------- test upsert --------------
Cost 0.569655895233 seconds
--------- test update --------------
Cost 20.0891299248 seconds
The output when num_of_rows = 1000:
--------- test insert --------------
Cost 0.807548999786 seconds
--------- test upsert --------------
Cost 0.584554195404 seconds
--------- test update --------------
Cost 206.199367046 seconds
The network latency to database server is around 500ms.
It looks like the bulk update sends and executes each query one by one, not in a batch?
Thanks in advance.

You can speed up bulk update operations with a trick, even if the database server (as in your case) has very bad latency. Instead of updating your table directly, you use a stage table to insert your new data very fast, then do one join-update against the destination table. This also has the advantage that it dramatically reduces the number of statements you have to send to the database.
How does this work with UPDATEs?
Say you have a table entries and you have new data coming in all the time, but you only want to update those which have already been stored. You create a copy of your destination-table entries_stage with only the relevant fields in it:
entries = Table('entries', metadata,
Column('id', Integer, autoincrement=True, primary_key=True),
Column('value', Unicode(64), nullable=False),
)
entries_stage = Table('entries_stage', metadata,
Column('id', Integer, autoincrement=False, unique=True),
Column('value', Unicode(64), nullable=False),
)
Then you insert your data with a bulk-insert. This can be sped up even further if you use MySQL's multiple value insert syntax, which isn't natively supported by SQLAlchemy, but can be built without much difficulty.
INSERT INTO entries_stage (`id`, `value`)
VALUES
(1, 'string1'), (2, 'string2'), (3, 'string3'), ...;
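For illustration only (not from the original answer), here is a minimal sketch of how such a multi-row statement could be assembled in Python with numbered bind parameters, assuming SQLAlchemy 1.x as in the question; the chunk size is an arbitrary choice to stay well under the bind-parameter limit:
from sqlalchemy import text

def bulk_insert_stage(engine, rows, chunk_size=1000):
    # rows is a list of dicts like {'id': 1, 'value': 'string1'}
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        values_clause = ", ".join(
            "(:id{0}, :value{0})".format(i) for i in range(len(chunk))
        )
        params = {}
        for i, row in enumerate(chunk):
            params["id{}".format(i)] = row["id"]
            params["value{}".format(i)] = row["value"]
        sql = "INSERT INTO entries_stage (id, value) VALUES " + values_clause
        engine.execute(text(sql), **params)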
In the end, you update the values of the destination-table with the values from the stage-table like this:
UPDATE entries e
JOIN entries_stage es ON e.id = es.id
SET e.value = es.value
WHERE e.value != es.value;
Then you're done.
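A rough sketch of the whole round trip with SQLAlchemy 1.x (table names taken from above; the use of plain text() statements inside one engine.begin() block is my assumption, not part of the original answer):
from sqlalchemy import text

def bulk_update_via_stage(engine, rows):
    # rows is a list of dicts like {'id': 1, 'value': 'string1'}
    with engine.begin() as conn:
        # clear the stage table before loading the new batch
        conn.execute(text("DELETE FROM entries_stage"))
        # executemany-style insert into the stage table
        conn.execute(
            text("INSERT INTO entries_stage (id, value) VALUES (:id, :value)"),
            rows,
        )
        # one single join-update against the destination table
        conn.execute(text(
            "UPDATE entries e "
            "JOIN entries_stage es ON e.id = es.id "
            "SET e.value = es.value "
            "WHERE e.value != es.value"
        ))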
What about inserts?
This also works to speed up inserts, of course. As you already have the data in the stage table, all you need to do is issue an INSERT INTO ... SELECT statement with the data which is not in the destination table yet.
INSERT INTO entries (id, value)
SELECT es.id, es.value
FROM entries_stage es
LEFT JOIN entries e ON e.id = es.id
WHERE e.id IS NULL;
The nice thing about this is that you don't have to use INSERT IGNORE, REPLACE or ON DUPLICATE KEY UPDATE, which increment your auto-increment counter even when they end up doing nothing.

Related

Dynamic Partition creation in Postgres with Python

I am trying to create a Python program to dynamically create a partition on a partitioned Postgres table ("my_test_hist") and load data into the partition based on the eff_date column. If the partition exists then the records should be loaded into this existing partition, else a new partition should be created and the data should be loaded into the new partition.
With the sample data below, there should be 2 partitions created in the "my_test_hist" partitioned table.
The partitioned table ("my_test_hist") will source data from the "my_test" table via:
INSERT INTO my_test_hist SELECT * from my_test;
I am getting the following error while running the Python program:-
no partition of relation "my_test_hist" found for row
DETAIL: Partition key of the failing row contains (eff_date) = (2022-07-15)
The code snippets are as follows:
create table my_test -- Source table non-partitioned
(
id int,
area_code varchar(10),
fname varchar(10),
eff_date date
) ;
INSERT INTO my_test VALUES(1, 'A505', 'John', DATE '2022-07-15');
INSERT INTO my_test VALUES(2, 'A506', 'Mike', DATE '2022-07-20');
COMMIT;
create table my_test_hist -- Target table partitioned
(
id int,
area_code varchar(10),
fname varchar(10),
eff_date date
) PARTITION BY LIST (eff_date) ; -- Partitioned by List on eff_date col
DB Func:
CREATE FUNCTION insert_my_test_hist_part() RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
BEGIN
/* try to create a table for the new partition */
EXECUTE
format(
'CREATE TABLE %I (LIKE my_test_hist INCLUDING DEFAULTS)',
'my_test_hist_' || to_char(NEW.eff_date, 'YYYY_MM_DD')
);
/* tell listener to attach the partition (only if a new table was created) */
EXECUTE
format(
'NOTIFY my_test_hist, %L', to_char(NEW.eff_date, 'YYYY_MM_DD')
);
EXCEPTION
WHEN duplicate_table THEN
NULL; -- ignore
END;
/* insert into the new partition */
EXECUTE
format(
'INSERT INTO %I VALUES ($1.*)', 'my_test_hist_' || to_char(NEW.eff_date, 'YYYY_MM_DD') )
USING NEW;
RETURN NULL;
END;
$$;
DB Trigger:
CREATE OR REPLACE TRIGGER insert_my_test_hist_part_trigger
BEFORE INSERT ON MY_TEST_HIST FOR EACH ROW
WHEN (pg_trigger_depth() < 1)
EXECUTE FUNCTION insert_my_test_hist_part();
Python Listener Program:
try:
conn = psycopg2.connect(host=hostName, dbname=dbName, user=userName, password=password, port=port)
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
cursor = conn.cursor()
def listen_notify():
cursor.execute(f"LISTEN my_test_hist;")
query1 = "ALTER TABLE my_test_hist ADD PARTITION my_test_hist_{} FOR VALUES IN ('{}');"
query2 = "INSERT INTO my_test_hist SELECT * FROM my_test ;"
while True:
cursor.execute(query2) # Trigger the insert to notify
if select.select([conn], [], [], 5) == ([],[],[]):
print("Timeout")
else:
conn.poll()
while conn.notifies:
notify = conn.notifies.pop()
var = notify.payload
query = query1.format(var, var)
cursor.execute(query)
conn.notifies.clear()
# Call the function
listen_notify()
except Exception as e:
print("Exception occurred: " +str(e))
Can anyone please help me fix the error in Python? Also, please let me know how to use asyncio in the program, and how I can terminate the infinite loop once the message is caught.
Thanks.

How to upsert pandas DataFrame to MySQL with SQLAlchemy

I'm pushing data from a DataFrame into MySQL; right now it only adds new data to the table if the data does not exist (appending). This works perfectly, however I also want my code to check if the record already exists and, in that case, update it. So I need it to append + update. I really don't know where to start as I got stuck... has someone tried this before?
This is my code:
engine = create_engine("mysql+pymysql://{user}:{pw}#localhost/{db}"
.format(user="root",
pw="*****",
db="my_db"))
my_df.to_sql('my_table', con = engine, if_exists = 'append')
You can use the following solution on the DB side:
First: create a table for the data inserted from Pandas (let's call it test):
CREATE TABLE `test` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`name` VARCHAR(100) NOT NULL,
`capacity` INT(11) NOT NULL,
PRIMARY KEY (`id`)
);
Second: create a table for the resulting data (let's call it cumulative_test) with exactly the same structure as test:
CREATE TABLE `cumulative_test` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`name` VARCHAR(100) NOT NULL,
`capacity` INT(11) NOT NULL,
PRIMARY KEY (`id`)
);
Third: set up a trigger so that each insert into the test table will insert or update a record in the second table, like:
DELIMITER $$
CREATE
/*!50017 DEFINER = 'root'@'localhost' */
TRIGGER `before_test_insert` BEFORE INSERT ON `test`
FOR EACH ROW BEGIN
DECLARE _id INT;
SELECT id INTO _id
FROM `cumulative_test` WHERE `cumulative_test`.`name` = new.name;
IF _id IS NOT NULL THEN
UPDATE cumulative_test
SET `cumulative_test`.`capacity` = `cumulative_test`.`capacity` + new.capacity;
ELSE
INSERT INTO `cumulative_test` (`name`, `capacity`)
VALUES (NEW.name, NEW.capacity);
END IF;
END;
$$
DELIMITER ;
So you keep inserting values into the test table as before and get the calculated results in the second table. The logic inside the trigger can be adapted to your needs.
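On the pandas side nothing changes compared to the question; a minimal sketch (the DataFrame contents here are made up, and the connection string mirrors the one in the question):
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mysql+pymysql://root:*****@localhost/my_db")

# append into `test`; the BEFORE INSERT trigger keeps `cumulative_test` up to date
my_df = pd.DataFrame({"name": ["foo", "bar"], "capacity": [10, 20]})
my_df.to_sql("test", con=engine, if_exists="append", index=False)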
Similar to the approach used for PostgreSQL here, you can use INSERT … ON DUPLICATE KEY UPDATE in MySQL:
with engine.begin() as conn:
# step 0.0 - create test environment
conn.execute(sa.text("DROP TABLE IF EXISTS main_table"))
conn.execute(
sa.text(
"CREATE TABLE main_table (id int primary key, txt varchar(50))"
)
)
conn.execute(
sa.text(
"INSERT INTO main_table (id, txt) VALUES (1, 'row 1 old text')"
)
)
# step 0.1 - create DataFrame to UPSERT
df = pd.DataFrame(
[(2, "new row 2 text"), (1, "row 1 new text")], columns=["id", "txt"]
)
# step 1 - create temporary table and upload DataFrame
conn.execute(
sa.text(
"CREATE TEMPORARY TABLE temp_table (id int primary key, txt varchar(50))"
)
)
df.to_sql("temp_table", conn, index=False, if_exists="append")
# step 2 - merge temp_table into main_table
conn.execute(
sa.text(
"""\
INSERT INTO main_table (id, txt)
SELECT id, txt FROM temp_table
ON DUPLICATE KEY UPDATE txt = VALUES(txt)
"""
)
)
# step 3 - confirm results
result = conn.execute(
sa.text("SELECT * FROM main_table ORDER BY id")
).fetchall()
print(result) # [(1, 'row 1 new text'), (2, 'new row 2 text')]

Fast insert (on conflict) of many rows into a Postgres DB with Python

I want to write messages from a websocket to a Postgres DB running on a Raspberry Pi.
The average message rate from the websocket is about 30 messages/second, but at peaks it reaches up to 250 messages/second.
I implemented a Python program to receive the messages and write them to the database with SQLAlchemy ORM. After each message I first check if the same primary key already exists and then do an update or an insert; afterwards I always do a commit, and so it gets very slow. I can write at most 30 messages/second to the database. In peak times this is a problem.
So I tested several approaches to speed things up.
This is my best approach:
I first build all the single queries (with psycopg2), then join them together and send the complete query string to the database to execute it at once --> this speeds it up to 580 messages/second.
Create the table for the test data:
CREATE TABLE transactions (
id int NOT NULL PRIMARY KEY,
name varchar(255),
description varchar(255),
country_name varchar(255),
city_name varchar(255),
cost varchar(255),
currency varchar(255),
created_at DATE,
billing_type varchar(255),
language varchar(255),
operating_system varchar(255)
);
example copied from https://medium.com/technology-nineleaps/mysql-sqlalchemy-performance-b123584eb833
Python test script:
import random
import time
from faker import Faker
import psycopg2
from psycopg2.extensions import AsIs
"""psycopg2"""
psycopg2_conn = {'host':'192.168.176.101',
'dbname':'test',
'user':'blabla',
'password':'blabla'}
connection_psycopg2 = psycopg2.connect(**psycopg2_conn)
myFactory = Faker()
def random_data():
billing_type_list = ['cheque', 'cash', 'credit', 'debit', 'e-wallet']
language = ['English', 'Bengali', 'Kannada']
operating_system = 'linux'
random_dic = {}
for i in range(0, 300):
id = int(i)
name = myFactory.name()
description = myFactory.text()
country_name = myFactory.country()
city_name = myFactory.city()
cost = str(myFactory.random_digit_not_null())
currency = myFactory.currency_code()
created_at = myFactory.date_time_between(start_date="-30y", end_date="now", tzinfo=None)
billing_type = random.choice(billing_type_list)
language = random.choice(language)
operating_system = operating_system
random_dic[id] = {}
for xname in ['id', 'description', 'country_name','city_name','cost','currency',
'created_at', 'billing_type','language','operating_system']:
random_dic[id][xname]=locals()[xname]
print(id)
return random_dic
def single_insert_on_conflict_psycopg2(idic, icur):
cur=icur
columns = idic.keys()
columns_with_excludephrase = ['EXCLUDED.{}'.format(column) for column in columns]
values = [idic[column] for column in columns]
insert_statement = """
insert into transactions (%s) values %s
ON CONFLICT ON CONSTRAINT transactions_pkey
DO UPDATE SET (%s) = (%s)
"""
#insert_statement = 'insert into transactions (%s) values %s'
print(','.join(columns))
print(','.join(columns_with_excludephrase))
print(tuple(values))
xquery = cur.mogrify(insert_statement,(
AsIs (','.join(columns)) ,
tuple(values),
AsIs (','.join(columns)) ,
AsIs (','.join(columns_with_excludephrase))
))
print(xquery)
return xquery
def complete_run_psycopg2(random_dic):
querylist=[]
starttime = time.time()
cur = connection_psycopg2.cursor()
for key in random_dic:
print(key)
query=single_insert_on_conflict_psycopg2(idic=random_dic[key],
icur=cur)
querylist.append(query.decode("utf-8") )
complete_query = ';'.join(tuple(querylist))
cur.execute(complete_query)
connection_psycopg2.commit()
cur.close()
endtime = time.time()
xduration=endtime-starttime
write_sec=len(random_dic)/xduration
print('complete Duration:{}'.format(xduration))
print('writes per second:{}'.format(write_sec))
return write_sec
def main():
random_dic = random_data()
complete_run_psycopg2(random_dic)
return
if __name__ == '__main__':
main()
Now my question: is this a proper approach? Are there any hints I didn’t consider?
First, you cannot insert column names like that. I would use .format to inject the column names, and then use %s placeholders for the values.
SQL = 'INSERT INTO transactions ({}) VALUES (%s,%s,%s,%s,%s,%s)'.format(','.join(columns))
db.Pcursor().execute(SQL, value1, value2, value3)
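If you stay with psycopg2, psycopg2.extras.execute_values is another way to send many rows in very few statements; a minimal sketch that mirrors the same ON CONFLICT statement from the question, assuming rows is a list of dicts keyed by column name (for example list(random_dic.values()) from the script above):
from psycopg2.extras import execute_values

def upsert_batch(conn, rows):
    columns = list(rows[0].keys())
    values = [[row[c] for c in columns] for row in rows]
    sql = (
        "INSERT INTO transactions ({cols}) VALUES %s "
        "ON CONFLICT ON CONSTRAINT transactions_pkey "
        "DO UPDATE SET ({cols}) = ({excl})"
    ).format(
        cols=", ".join(columns),
        excl=", ".join("EXCLUDED.{}".format(c) for c in columns),
    )
    with conn.cursor() as cur:
        # execute_values expands the single %s into batched multi-row VALUES lists
        execute_values(cur, sql, values, page_size=1000)
    conn.commit()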
Second, you will get better speed if you use async processes.
Fortunately for you, I wrote a gevent async library for psycopg2 that you can use. It makes the process far easier; it is async, threaded and pooled.
Python Postgres psycopg2 ThreadedConnectionPool exhausted

SQLAlchemy - performing a bulk upsert (if exists, update, else insert) in postgresql

I am trying to write a bulk upsert in python using the SQLAlchemy module (not in SQL!).
I am getting the following error on a SQLAlchemy add:
sqlalchemy.exc.IntegrityError: (IntegrityError) duplicate key value violates unique constraint "posts_pkey"
DETAIL: Key (id)=(TEST1234) already exists.
I have a table called posts with a primary key on the id column.
In this example, I already have a row in the db with id=TEST1234. When I attempt to db.session.add() a new posts object with the id set to TEST1234, I get the error above. I was under the impression that if the primary key already exists, the record would get updated.
How can I upsert with Flask-SQLAlchemy based on primary key alone? Is there a simple solution?
If there is not, I can always check for and delete any record with a matching id, and then insert the new record, but that seems expensive for my situation, where I do not expect many updates.
There is an upsert-esque operation in SQLAlchemy:
db.session.merge()
After I found this command, I was able to perform upserts, but it is worth mentioning that this operation is slow for a bulk "upsert".
The alternative is to get a list of the primary keys you would like to upsert, and query the database for any matching ids:
# Imagine that post1, post5, and post1000 are posts objects with ids 1, 5 and 1000 respectively
# The goal is to "upsert" these posts.
# we initialize a dict which maps id to the post object
my_new_posts = {1: post1, 5: post5, 1000: post1000}
for each in posts.query.filter(posts.id.in_(my_new_posts.keys())).all():
# Only merge those posts which already exist in the database
db.session.merge(my_new_posts.pop(each.id))
# Only add those posts which did not exist in the database
db.session.add_all(my_new_posts.values())
# Now we commit our modifications (merges) and inserts (adds) to the database!
db.session.commit()
You can leverage the on_conflict_do_update variant. A simple example would be the following:
from sqlalchemy.dialects.postgresql import insert
class Post(Base):
"""
A simple class for demonstration
"""
id = Column(Integer, primary_key=True)
title = Column(Unicode)
# Prepare all the values that should be "upserted" to the DB
values = [
{"id": 1, "title": "mytitle 1"},
{"id": 2, "title": "mytitle 2"},
{"id": 3, "title": "mytitle 3"},
{"id": 4, "title": "mytitle 4"},
]
stmt = insert(Post).values(values)
stmt = stmt.on_conflict_do_update(
# Let's use the constraint name which was visible in the original posts error msg
constraint="post_pkey",
# The columns that should be updated on conflict
set_={
"title": stmt.excluded.title
}
)
session.execute(stmt)
See the Postgres docs for more details about ON CONFLICT DO UPDATE.
See the SQLAlchemy docs for more details about on_conflict_do_update.
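If the target database is MySQL rather than Postgres, the MySQL dialect has the analogous on_duplicate_key_update; a minimal sketch reusing the same Post model and values list (here stmt.inserted plays the role that stmt.excluded plays in the Postgres dialect):
from sqlalchemy.dialects.mysql import insert as mysql_insert

stmt = mysql_insert(Post).values(values)
stmt = stmt.on_duplicate_key_update(title=stmt.inserted.title)
session.execute(stmt)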
Side-Note on duplicated column names
The above code uses the column names as dict keys both in the values list and the argument to set_. If the column-name is changed in the class-definition this needs to be changed everywhere or it will break. This can be avoided by accessing the column definitions, making the code a bit uglier, but more robust:
coldefs = Post.__table__.c
values = [
{coldefs.id.name: 1, coldefs.title.name: "mytitlte 1"},
...
]
stmt = stmt.on_conflict_do_update(
...
set_={
coldefs.title.name: stmt.excluded.title
...
}
)
An alternative approach using compilation extension (https://docs.sqlalchemy.org/en/13/core/compiler.html):
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Insert
@compiles(Insert)
def compile_upsert(insert_stmt, compiler, **kwargs):
"""
converts every SQL insert to an upsert i.e;
INSERT INTO test (foo, bar) VALUES (1, 'a')
becomes:
INSERT INTO test (foo, bar) VALUES (1, 'a') ON CONFLICT(foo) DO UPDATE SET (bar = EXCLUDED.bar)
(assuming foo is a primary key)
:param insert_stmt: Original insert statement
:param compiler: SQL Compiler
:param kwargs: optional arguments
:return: upsert statement
"""
pk = insert_stmt.table.primary_key
insert = compiler.visit_insert(insert_stmt, **kwargs)
ondup = f'ON CONFLICT ({",".join(c.name for c in pk)}) DO UPDATE SET'
updates = ', '.join(f"{c.name}=EXCLUDED.{c.name}" for c in insert_stmt.table.columns)
upsert = ' '.join((insert, ondup, updates))
return upsert
This should ensure that all insert statements behave as upserts. This implementation is in Postgres dialect, but it should be fairly easy to modify for MySQL dialect.
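A MySQL-flavoured variant of the same idea might look like this (a sketch only; registering it for the 'mysql' dialect keeps other dialects unchanged, and it assumes every column may be overwritten on duplicate key):
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Insert

@compiles(Insert, 'mysql')
def compile_mysql_upsert(insert_stmt, compiler, **kwargs):
    # render the normal INSERT, then append the ON DUPLICATE KEY UPDATE clause
    insert = compiler.visit_insert(insert_stmt, **kwargs)
    updates = ', '.join(
        '{0}=VALUES({0})'.format(c.name) for c in insert_stmt.table.columns
    )
    return '{} ON DUPLICATE KEY UPDATE {}'.format(insert, updates)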
I started looking at this and I think I've found a pretty efficient way to do upserts in sqlalchemy with a mix of bulk_insert_mappings and bulk_update_mappings instead of merge.
import time
import sqlite3
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import scoped_session, sessionmaker
from contextlib import contextmanager
engine = None
Session = sessionmaker()
Base = declarative_base()
def creat_new_database(db_name="sqlite:///bulk_upsert_sqlalchemy.db"):
global engine
engine = create_engine(db_name, echo=False)
local_session = scoped_session(Session)
local_session.remove()
local_session.configure(bind=engine, autoflush=False, expire_on_commit=False)
Base.metadata.drop_all(engine)
Base.metadata.create_all(engine)
@contextmanager
def db_session():
local_session = scoped_session(Session)
session = local_session()
session.expire_on_commit = False
try:
yield session
except BaseException:
session.rollback()
raise
finally:
session.close()
class Customer(Base):
__tablename__ = "customer"
id = Column(Integer, primary_key=True)
name = Column(String(255))
def bulk_upsert_mappings(customers):
entries_to_update = []
entries_to_put = []
with db_session() as sess:
t0 = time.time()
# Find all customers that needs to be updated and build mappings
for each in (
sess.query(Customer.id).filter(Customer.id.in_(customers.keys())).all()
):
customer = customers.pop(each.id)
entries_to_update.append({"id": customer["id"], "name": customer["name"]})
# Bulk mappings for everything that needs to be inserted
for customer in customers.values():
entries_to_put.append({"id": customer["id"], "name": customer["name"]})
sess.bulk_insert_mappings(Customer, entries_to_put)
sess.bulk_update_mappings(Customer, entries_to_update)
sess.commit()
print(
"Total time for upsert with MAPPING update "
+ str(len(customers))
+ " records "
+ str(time.time() - t0)
+ " sec"
+ " inserted : "
+ str(len(entries_to_put))
+ " - updated : "
+ str(len(entries_to_update))
)
def bulk_upsert_merge(customers):
entries_to_update = 0
entries_to_put = []
with db_session() as sess:
t0 = time.time()
# Find all customers that needs to be updated and merge
for each in (
sess.query(Customer.id).filter(Customer.id.in_(customers.keys())).all()
):
values = customers.pop(each.id)
sess.merge(Customer(id=values["id"], name=values["name"]))
entries_to_update += 1
# Bulk mappings for everything that needs to be inserted
for customer in customers.values():
entries_to_put.append({"id": customer["id"], "name": customer["name"]})
sess.bulk_insert_mappings(Customer, entries_to_put)
sess.commit()
print(
"Total time for upsert with MERGE update "
+ str(len(customers))
+ " records "
+ str(time.time() - t0)
+ " sec"
+ " inserted : "
+ str(len(entries_to_put))
+ " - updated : "
+ str(entries_to_update)
)
if __name__ == "__main__":
batch_size = 10000
# Only inserts
customers_insert = {
i: {"id": i, "name": "customer_" + str(i)} for i in range(batch_size)
}
# 50/50 inserts update
customers_upsert = {
i: {"id": i, "name": "customer_2_" + str(i)}
for i in range(int(batch_size / 2), batch_size + int(batch_size / 2))
}
creat_new_database()
bulk_upsert_mappings(customers_insert.copy())
bulk_upsert_mappings(customers_upsert.copy())
bulk_upsert_mappings(customers_insert.copy())
creat_new_database()
bulk_upsert_merge(customers_insert.copy())
bulk_upsert_merge(customers_upsert.copy())
bulk_upsert_merge(customers_insert.copy())
The results for the benchmark:
Total time for upsert with MAPPING: 0.17138004302978516 sec inserted : 10000 - updated : 0
Total time for upsert with MAPPING: 0.22074174880981445 sec inserted : 5000 - updated : 5000
Total time for upsert with MAPPING: 0.22307634353637695 sec inserted : 0 - updated : 10000
Total time for upsert with MERGE: 0.1724097728729248 sec inserted : 10000 - updated : 0
Total time for upsert with MERGE: 7.852903842926025 sec inserted : 5000 - updated : 5000
Total time for upsert with MERGE: 15.11970829963684 sec inserted : 0 - updated : 10000
This is not the safest method, but it is very simple and very fast. I was just trying to selectively overwrite a portion of a table. I deleted the known rows that I knew would conflict and then I appended the new rows from a pandas dataframe. Your pandas dataframe column names will need to match your sql table column names.
eng = create_engine('postgresql://...')
conn = eng.connect()
conn.execute("DELETE FROM my_table WHERE col = %s", val)
df.to_sql('my_table', con=eng, if_exists='append')
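If you are worried about losing the deleted rows when the append fails halfway, both steps can go into one transaction; a minimal sketch, assuming SQLAlchemy 1.4+ (so to_sql can take the open connection) and the same df and val as above:
import sqlalchemy as sa

eng = sa.create_engine('postgresql://...')
with eng.begin() as conn:
    # the DELETE and the append either both succeed or both roll back
    conn.execute(sa.text("DELETE FROM my_table WHERE col = :val"), {"val": val})
    df.to_sql('my_table', con=conn, if_exists='append', index=False)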
I know this is kind of late, but I have built on the answer given by @Emil Wåreus and turned it into a function that can be used on any model (table):
def upsert_data(self, entries, model, key):
entries_to_update = []
entries_to_insert = []
# get all entries to be updated
for each in session.query(model).filter(getattr(model, key).in_(entries.keys())).all():
entry = entries.pop(str(getattr(each, key)))
entries_to_update.append(entry)
# get all entries to be inserted
for entry in entries.values():
entries_to_insert.append(entry)
session.bulk_insert_mappings(model, entries_to_insert)
session.bulk_update_mappings(model, entries_to_update)
session.commit()
entries should be a dictionary, with the primary key values as the keys, and the values should be mappings (mappings of the values against the columns of the database).
model is the ORM model that you want to upsert to.
key is the primary key of the table.
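A hypothetical call, reusing the Customer model from the benchmark above (note that the dictionary keys must be strings, because the function pops with str(getattr(each, key))):
entries = {
    "1": {"id": 1, "name": "customer_1"},
    "2": {"id": 2, "name": "customer_2"},
}
# assuming upsert_data is a method on a helper object `db` that owns the session
db.upsert_data(entries, Customer, "id")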
You can even use this function to get the model for the table you want to insert to from a string,
def get_table(self, table_name):
for c in self.base._decl_class_registry.values():
if hasattr(c, '__tablename__') and c.__tablename__ == table_name:
return c
Using this, you can just pass the name of the table as a string to the upsert_data function,
def upsert_data(self, entries, table, key):
model = get_table(table)
entries_to_update = []
entries_to_insert = []
# get all entries to be updated
for each in session.query(model).filter(getattr(model, key).in_(entries.keys())).all():
entry = entries.pop(str(getattr(each, key)))
entries_to_update.append(entry)
# get all entries to be inserted
for entry in entries.values():
entries_to_insert.append(entry)
session.bulk_insert_mappings(model, entries_to_insert)
session.bulk_update_mappings(model, entries_to_update)
session.commit()

Getting the id of the last record inserted for Postgresql SERIAL KEY with Python

I am using SQLAlchemy without the ORM, i.e. using hand-crafted SQL statements to directly interact with the backend database. I am using PG as my backend database (psycopg2 as DB driver) in this instance - I don't know if that affects the answer.
I have statements like this,for brevity, assume that conn is a valid connection to the database:
conn.execute("INSERT INTO user (name, country_id) VALUES ('Homer', 123)")
Assume also that the user table consists of the columns (id [SERIAL PRIMARY KEY], name, country_id)
How may I obtain the id of the new user, ideally, without hitting the database again?
You might be able to use the RETURNING clause of the INSERT statement like this:
result = conn.execute("INSERT INTO user (name, country_id) VALUES ('Homer', 123)
RETURNING *")
If you only want the resulting id:
result = conn.execute("INSERT INTO user (name, country_id) VALUES ('Homer', 123)
RETURNING id")
[new_id] = result.fetchone()
Use lastrowid
result = conn.execute("INSERT INTO user (name, country_id) VALUES ('Homer', 123)")
result.lastrowid
Current SQLAlchemy documentation suggests
result.inserted_primary_key should work!
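Note that inserted_primary_key is available when you execute a Core insert() construct rather than a raw SQL string; a minimal sketch (the Table definition here is an assumption mirroring the question):
from sqlalchemy import Table, Column, Integer, String, MetaData

metadata = MetaData()
user = Table(
    'user', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String),
    Column('country_id', Integer),
)

result = conn.execute(user.insert().values(name='Homer', country_id=123))
new_id = result.inserted_primary_key[0]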
Python + SQLAlchemy
After commit, you get the primary_key column id (auto-incremented) updated in your object.
db.session.add(new_usr)
db.session.commit() #will insert the new_usr data into database AND retrieve id
idd = new_usr.usrID # usrID is the autoincremented primary_key column.
return jsonify(idd),201 #usrID = 12, correct id from table User in Database.
This question has been asked many times on Stack Overflow and no answer I have seen is comprehensive. Googling 'sqlalchemy insert get id of new row' brings up a lot of them.
There are three levels to SQLAlchemy.
Top: the ORM.
Middle: Database abstraction (DBA) with Table classes etc.
Bottom: SQL using the text function.
To an OO programmer the ORM level looks natural, but to a database programmer it looks ugly and the ORM gets in the way. The DBA layer is an OK compromise. The SQL layer looks natural to database programmers and would look alien to an OO-only programmer.
Each level has its own syntax, similar but different enough to be frustrating. On top of this there is almost too much documentation online, which makes it very hard to find the answer.
I will describe how to get the inserted id AT THE SQL LAYER for the RDBMS I use.
Table: User(user_id integer primary autoincrement key, user_name string)
conn: Is a Connection obtained within SQLAlchemy to the DBMS you are using.
SQLite
======
insstmt = text(
'''INSERT INTO user (user_name)
VALUES (:usernm) ''' )
# Execute within a transaction (optional)
txn = conn.begin()
result = conn.execute(insstmt, usernm='Jane Doe')
# The id!
recid = result.lastrowid
txn.commit()
MS SQL Server
=============
insstmt = text(
'''INSERT INTO user (user_name)
OUTPUT inserted.user_id
VALUES (:usernm) ''' )
txn = conn.begin()
result = conn.execute(insstmt, usernm='Jane Doe')
# The id!
recid = result.fetchone()[0]
txn.commit()
MariaDB/MySQL
=============
insstmt = text(
'''INSERT INTO user (user_name)
VALUES (:usernm) ''' )
txn = conn.begin()
result = conn.execute(insstmt, usernm='Jane Doe')
# The id!
recid = conn.execute(text('SELECT LAST_INSERT_ID()')).fetchone()[0]
txn.commit()
Postgres
========
insstmt = text(
'''INSERT INTO user (user_name)
VALUES (:usernm)
RETURNING user_id ''' )
txn = conn.begin()
result = conn.execute(insstmt, usernm='Jane Doe')
# The id!
recid = result.fetchone()[0]
txn.commit()
result.inserted_primary_key
Worked for me. The only thing to note is that this returns a list that contains the last inserted id.
Make sure you use fetchrow/fetch to receive the returning object
insert_stmt = user.insert().values(name="homer", country_id="123").returning(user.c.id)
row_id = await conn.fetchrow(insert_stmt)
For Postgres inserts from Python code, it is simple to use the RETURNING keyword with the col_id (the name of the column whose last inserted row id you want) at the end of the insert statement.
Syntax:
from sqlalchemy import create_engine
conn_string = "postgresql://USERNAME:PSWD#HOSTNAME/DATABASE_NAME"
db = create_engine(conn_string)
conn = db.connect()
INSERT INTO emp_table (col_id, Name ,Age)
VALUES(3,'xyz',30) RETURNING col_id;
or
(if col_id column is auto increment)
insert_sql = "INSERT INTO emp_table (Name, Age) VALUES('xyz', 30) RETURNING col_id;"
result = conn.execute(insert_sql)
[last_row_id] = result.fetchone()
print(last_row_id)
#output = 3
