So I'm using SQLAlchemy for a project I'm working on. I've got an issue where I will eventually have thousands of records that need to be saved every hour. These records may be inserted or updated. I've been using bulk_save_objects for this and it's worked great. However, now I have to introduce a history for these records, which means I need the IDs returned so I can link each entry to an entry in a history table. I know about using return_defaults, and that works, but it causes bulk_save_objects to insert and update one entry at a time instead of in bulk, which defeats the purpose. Is there another option where I can bulk insert and update at the same time but retain the IDs?
The desired result can be achieved using a technique similar to the one described in the answer here: upload the rows to a temporary table, then perform an UPDATE followed by an INSERT that returns the inserted ID values. For SQL Server, that means an OUTPUT clause on the INSERT statement:
import pandas as pd
import sqlalchemy as sa
from pprint import pprint

# (engine creation omitted; a SQL Server engine, e.g. mssql+pyodbc, is assumed)
main_table = "team"

# <set up test environment>
with engine.begin() as conn:
    conn.execute(sa.text(f"DROP TABLE IF EXISTS [{main_table}]"))
    conn.execute(
        sa.text(
            f"""
            CREATE TABLE [dbo].[{main_table}](
                [id] [int] IDENTITY(1,1) NOT NULL,
                [prov] [varchar](2) NOT NULL,
                [city] [varchar](50) NOT NULL,
                [name] [varchar](50) NOT NULL,
                [comments] [varchar](max) NULL,
                CONSTRAINT [PK_team] PRIMARY KEY CLUSTERED
                (
                    [id] ASC
                )
            )
            """
        )
    )
    conn.execute(
        sa.text(
            f"""
            CREATE UNIQUE NONCLUSTERED INDEX [UX_team_prov_city] ON [dbo].[{main_table}]
            (
                [prov] ASC,
                [city] ASC
            )
            """
        )
    )
    conn.execute(
        sa.text(
            f"""
            INSERT INTO [{main_table}] ([prov], [city], [name])
            VALUES ('AB', 'Calgary', 'Flames')
            """
        )
    )

# <data for upsert>
df = pd.DataFrame(
    [
        ("AB", "Calgary", "Flames", "hard-working, handsome lads"),
        ("AB", "Edmonton", "Oilers", "ruffians and scalawags"),
    ],
    columns=["prov", "city", "name", "comments"],
)

# <perform upsert, returning IDs>
temp_table = "#so65525098"
with engine.begin() as conn:
    df.to_sql(temp_table, conn, index=False, if_exists="replace")
    conn.execute(
        sa.text(
            f"""
            UPDATE main SET main.name = temp.name,
                main.comments = temp.comments
            FROM [{main_table}] main INNER JOIN [{temp_table}] temp
                ON main.prov = temp.prov AND main.city = temp.city
            """
        )
    )
    inserted = conn.execute(
        sa.text(
            f"""
            INSERT INTO [{main_table}] (prov, city, name, comments)
            OUTPUT INSERTED.prov, INSERTED.city, INSERTED.id
            SELECT prov, city, name, comments FROM [{temp_table}] temp
            WHERE NOT EXISTS (
                SELECT * FROM [{main_table}] main
                WHERE main.prov = temp.prov AND main.city = temp.city
            )
            """
        )
    ).fetchall()
print(inserted)
"""console output:
[('AB', 'Edmonton', 2)]
"""

# <check results>
with engine.begin() as conn:
    pprint(conn.execute(sa.text(f"SELECT * FROM {main_table}")).fetchall())
"""console output:
[(1, 'AB', 'Calgary', 'Flames', 'hard-working, handsome lads'),
 (2, 'AB', 'Edmonton', 'Oilers', 'ruffians and scalawags')]
"""
The user adds information via a form, and the information gets added to the shoes table.
I want to insert ShoeImage, ShoeName, ShoeStyle, ShoeColor, ShoePrice, and ShoeDescr, and NOT ShoeID (which is autoincrement), ShoeBrandID, or ShoeSizeID.
My insert statement:
$sql = "INSERT INTO $tblShoes VALUES (NULL, '$ShoeImage', '$ShoeName', '$ShoeStyle', '$ShoeColor',
'$ShoePrice', '$ShoeDescr')";
How do I write this insert statement with an inner join?
This might work:
INSERT INTO shoes
(
    `ShoeImage`,
    `ShoeName`,
    `ShoeStyle`,
    `ShoeColor`,
    `ShoePrice`,
    `ShoeDescr`,
    `ShoeBrandID`,
    `ShoeSizeID`
)
VALUES (
    '$ShoeImage',
    '$ShoeName',
    '$ShoeStyle',
    '$ShoeColor',
    '$ShoePrice',
    '$ShoeDescr',
    (SELECT BrandID FROM shoebrand WHERE BrandName = '$ShoeBrand'),
    (SELECT SizeID FROM shoesize WHERE Size = '$ShoeSize')
)
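Since the question asked about a join: the same lookups can also be written as an INSERT … SELECT over a cross join of the two lookup tables, which inserts nothing if either lookup fails to match. A sketch, assuming BrandName and Size each match exactly one row:

INSERT INTO shoes
    (`ShoeImage`, `ShoeName`, `ShoeStyle`, `ShoeColor`, `ShoePrice`, `ShoeDescr`, `ShoeBrandID`, `ShoeSizeID`)
SELECT
    '$ShoeImage', '$ShoeName', '$ShoeStyle', '$ShoeColor', '$ShoePrice', '$ShoeDescr',
    b.BrandID, s.SizeID
FROM shoebrand b
CROSS JOIN shoesize s
WHERE b.BrandName = '$ShoeBrand'
  AND s.Size = '$ShoeSize'

Either way, interpolating PHP variables straight into the SQL string is open to SQL injection; a prepared statement with bound parameters is the safer route.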
I need to test a Python Flask app that uses MySQL to run its queries via SQLAlchemy, but the tests run against sqlite3.
I've encountered an exception when trying to test an upsert function that uses an ON DUPLICATE clause:
(sqlite3.OperationalError) near "DUPLICATE": syntax error
After a brief search for a solution, I found that the correct syntax for SQLite to execute upsert queries is ON CONFLICT(id) DO UPDATE SET ...; I tried it, but MySQL doesn't recognize this syntax.
What can I do? How can I write an upsert query that both sqlite3 and MySQL will execute properly?
Example:
employees table:

id | name
---+-----------
 1 | Jeff Bezos
 2 | Bill Gates
INSERT INTO employees(id,name)
VALUES(1, 'Donald Trump')
ON DUPLICATE KEY UPDATE name = VALUES(name);
Should update the table to be:

id | name
---+-------------
 1 | Donald Trump
 2 | Bill Gates
Thanks in advance!
How can I do an upsert query so sqlite3 and mySQL will both execute it properly?
You can achieve the same result by attempting an UPDATE and, if no match is found, doing an INSERT. The following code uses SQLAlchemy Core constructs, which provide further protection from the subtle differences between MySQL and SQLite. For example, if your table had a column named "order" then SQLAlchemy would emit this DDL for MySQL …
CREATE TABLE employees (
    id INTEGER NOT NULL,
    name VARCHAR(50),
    `order` INTEGER,
    PRIMARY KEY (id)
)

… and this DDL for SQLite:

CREATE TABLE employees (
    id INTEGER NOT NULL,
    name VARCHAR(50),
    "order" INTEGER,
    PRIMARY KEY (id)
)
import logging

import sqlalchemy as sa

# pick one
connection_url = "mysql+mysqldb://scott:tiger@localhost:3307/mydb"
# connection_url = "sqlite://"

engine = sa.create_engine(connection_url)


def _dump_table():
    with engine.begin() as conn:
        print(conn.exec_driver_sql("SELECT * FROM employees").all())


def _setup_example():
    employees = sa.Table(
        "employees",
        sa.MetaData(),
        sa.Column("id", sa.Integer, primary_key=True, autoincrement=False),
        sa.Column("name", sa.String(50)),
    )
    employees.drop(engine, checkfirst=True)
    employees.create(engine)
    # create initial example data
    with engine.begin() as conn:
        conn.execute(
            employees.insert(),
            [{"id": 1, "name": "Jeff Bezos"}, {"id": 2, "name": "Bill Gates"}],
        )


def upsert_employee(id_, name):
    employees = sa.Table("employees", sa.MetaData(), autoload_with=engine)
    with engine.begin() as conn:
        result = conn.execute(
            employees.update().where(employees.c.id == id_), {"name": name}
        )
        logging.debug(f" {result.rowcount} row(s) updated.")
        if result.rowcount == 0:
            result = conn.execute(
                employees.insert(), {"id": id_, "name": name}
            )
            logging.debug(f" {result.rowcount} row(s) inserted.")


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)

    _setup_example()
    _dump_table()
    """
    [(1, 'Jeff Bezos'), (2, 'Bill Gates')]
    """
    upsert_employee(3, "Donald Trump")
    """
    DEBUG:root: 0 row(s) updated.
    DEBUG:root: 1 row(s) inserted.
    """
    _dump_table()
    """
    [(1, 'Jeff Bezos'), (2, 'Bill Gates'), (3, 'Donald Trump')]
    """
    upsert_employee(1, "Elon Musk")
    """
    DEBUG:root: 1 row(s) updated.
    """
    _dump_table()
    """
    [(1, 'Elon Musk'), (2, 'Bill Gates'), (3, 'Donald Trump')]
    """
I have a function that I use to update tables in PostgreSQL. It works well for avoiding duplicate insertions: it creates a temp table and drops it upon completion. However, a few of my tables have serial IDs, and I have to pass the serial ID in a column; otherwise I get an error that the keys are missing. How can I insert values into those tables and have the serial key assigned automatically? I would prefer to modify the function below if possible.
def export_to_sql(df, table_name):
    from sqlalchemy import create_engine
    engine = create_engine(f'postgresql://{user}:{password}@{host}:5432/{user}')
    df.to_sql(con=engine,
              name='temporary_table',
              if_exists='append',
              index=False,
              method='multi')
    with engine.begin() as cnx:
        insert_sql = f'INSERT INTO {table_name} (SELECT * FROM temporary_table) ON CONFLICT DO NOTHING; DROP TABLE temporary_table'
        cnx.execute(insert_sql)
The code used to create the tables:
CREATE TABLE symbols
(
    symbol_id serial NOT NULL,
    symbol varchar(50) NOT NULL,
    CONSTRAINT PK_symbols PRIMARY KEY ( symbol_id )
);

CREATE TABLE tweet_symols (
    tweet_id varchar(50) REFERENCES tweets,
    symbol_id int REFERENCES symbols,
    PRIMARY KEY (tweet_id, symbol_id),
    UNIQUE (tweet_id, symbol_id)
);

CREATE TABLE hashtags
(
    hashtag_id serial NOT NULL,
    hashtag varchar(140) NOT NULL,
    CONSTRAINT PK_hashtags PRIMARY KEY ( hashtag_id )
);

CREATE TABLE tweet_hashtags
(
    tweet_id varchar(50) NOT NULL,
    hashtag_id integer NOT NULL,
    CONSTRAINT FK_344 FOREIGN KEY ( tweet_id ) REFERENCES tweets ( tweet_id )
);

CREATE INDEX fkIdx_345 ON tweet_hashtags
(
    tweet_id
);
Your INSERT statement does not specify the target columns, so PostgreSQL attempts to insert values into every column, including the one defined as SERIAL.
We can work around this by providing a list of target columns that omits the serial types. To do this, we use SQLAlchemy to fetch the metadata of the table we are inserting into from the database, then build the list of target columns. SQLAlchemy doesn't tell us whether a column was created using SERIAL, but we can assume it was if it is a primary key and is set to autoincrement. Primary key columns defined with GENERATED ... AS IDENTITY will also be filtered out; this is probably desirable, as they behave the same way as SERIAL columns.
import sqlalchemy as sa

def export_to_sql(df, table_name):
    engine = sa.create_engine(f'postgresql://{user}:{password}@{host}:5432/{user}')
    df.to_sql(con=engine,
              name='temporary_table',
              if_exists='append',
              index=False,
              method='multi')
    # Fetch table metadata from the database
    table = sa.Table(table_name, sa.MetaData(), autoload_with=engine)
    # Get the names of columns to be inserted,
    # assuming auto-incrementing PKs are serial types
    column_names = ','.join(
        [f'"{c.name}"' for c in table.columns
         if not (c.primary_key and c.autoincrement)]
    )
    with engine.begin() as cnx:
        insert_sql = sa.text(
            f'INSERT INTO {table_name} ({column_names}) (SELECT * FROM temporary_table) ON CONFLICT DO NOTHING; DROP TABLE temporary_table'
        )
        cnx.execute(insert_sql)
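For example, a hypothetical usage sketch against the symbols table from the DDL above: pandas creates temporary_table with only a symbol column, and the generated column list then excludes the serial symbol_id:

import pandas as pd

# hypothetical example data; symbol_id is assigned by the database
df = pd.DataFrame({"symbol": ["AAPL", "MSFT"]})
export_to_sql(df, "symbols")
# emits roughly:
#   INSERT INTO symbols ("symbol") (SELECT * FROM temporary_table)
#   ON CONFLICT DO NOTHING; DROP TABLE temporary_table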
I'm pushing data from a DataFrame into MySQL. Right now it only adds new data to the table if the data does not exist (appending). This works perfectly, but I also want my code to check whether the record already exists and, if so, update it. So I need it to append and update. I really don't know how to start fixing this as I got stuck... has someone tried this before?
This is my code:
engine = create_engine("mysql+pymysql://{user}:{pw}@localhost/{db}"
                       .format(user="root",
                               pw="*****",
                               db="my_db"))

my_df.to_sql('my_table', con=engine, if_exists='append')
You can use the following solution on the DB side:
First, create a table to receive the data from Pandas (let's call it test):
CREATE TABLE `test` (
    `id` INT(11) NOT NULL AUTO_INCREMENT,
    `name` VARCHAR(100) NOT NULL,
    `capacity` INT(11) NOT NULL,
    PRIMARY KEY (`id`)
);
Second, create a table for the resulting data (let's call it cumulative_test) with exactly the same structure as test:
CREATE TABLE `cumulative_test` (
    `id` INT(11) NOT NULL AUTO_INCREMENT,
    `name` VARCHAR(100) NOT NULL,
    `capacity` INT(11) NOT NULL,
    PRIMARY KEY (`id`)
);
Third, set up a trigger so that each insert into the test table inserts or updates a record in the second table:
DELIMITER $$

CREATE
    /*!50017 DEFINER = 'root'@'localhost' */
    TRIGGER `before_test_insert` BEFORE INSERT ON `test`
    FOR EACH ROW
BEGIN
    DECLARE _id INT;

    SELECT id INTO _id
    FROM `cumulative_test`
    WHERE `cumulative_test`.`name` = NEW.name;

    IF _id IS NOT NULL THEN
        UPDATE `cumulative_test`
        SET `cumulative_test`.`capacity` = `cumulative_test`.`capacity` + NEW.capacity
        -- without this WHERE clause, every row in cumulative_test would be updated
        WHERE `cumulative_test`.`id` = _id;
    ELSE
        INSERT INTO `cumulative_test` (`name`, `capacity`)
        VALUES (NEW.name, NEW.capacity);
    END IF;
END;
$$

DELIMITER ;
So you keep inserting values into the test table as before and get the accumulated results in the second table. The logic inside the trigger can be adapted to your needs.
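With the trigger in place, the pandas side stays as it was: keep appending to test, and read the accumulated rows back from cumulative_test. A minimal sketch, assuming the engine and my_df from the question:

import pandas as pd

# unchanged append; the trigger maintains cumulative_test as rows arrive
my_df.to_sql('test', con=engine, if_exists='append', index=False)

# read back the accumulated results
result = pd.read_sql('SELECT * FROM cumulative_test', con=engine)
print(result)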
Similar to the approach used for PostgreSQL here, you can use INSERT … ON DUPLICATE KEY UPDATE in MySQL:
with engine.begin() as conn:
    # step 0.0 - create test environment
    conn.execute(sa.text("DROP TABLE IF EXISTS main_table"))
    conn.execute(
        sa.text(
            "CREATE TABLE main_table (id int primary key, txt varchar(50))"
        )
    )
    conn.execute(
        sa.text(
            "INSERT INTO main_table (id, txt) VALUES (1, 'row 1 old text')"
        )
    )
    # step 0.1 - create DataFrame to UPSERT
    df = pd.DataFrame(
        [(2, "new row 2 text"), (1, "row 1 new text")], columns=["id", "txt"]
    )
    # step 1 - create temporary table and upload DataFrame
    conn.execute(
        sa.text(
            "CREATE TEMPORARY TABLE temp_table (id int primary key, txt varchar(50))"
        )
    )
    df.to_sql("temp_table", conn, index=False, if_exists="append")
    # step 2 - merge temp_table into main_table
    conn.execute(
        sa.text(
            """\
            INSERT INTO main_table (id, txt)
            SELECT id, txt FROM temp_table
            ON DUPLICATE KEY UPDATE txt = VALUES(txt)
            """
        )
    )
    # step 3 - confirm results
    result = conn.execute(
        sa.text("SELECT * FROM main_table ORDER BY id")
    ).fetchall()
    print(result)  # [(1, 'row 1 new text'), (2, 'new row 2 text')]
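One caveat, not from the original answer: the VALUES() function inside ON DUPLICATE KEY UPDATE is deprecated as of MySQL 8.0.20. On newer servers the same merge can reference the source table's columns directly, which is allowed for INSERT … SELECT as long as the SELECT has no GROUP BY:

INSERT INTO main_table (id, txt)
SELECT id, txt FROM temp_table
ON DUPLICATE KEY UPDATE txt = temp_table.txt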
Here is my code:
import sqlite3

def insert(fields=(), values=()):
    connection = sqlite3.connect('database.db')
    # g.db is the database connection
    cur = connection.cursor()
    query = 'INSERT INTO this_database (%s) VALUES (%s)' % (
        ', '.join(fields),
        ', '.join(['?'] * len(values))
    )
    cur.execute(query, values)
    connection.commit()
    id = cur.lastrowid
    cur.close()
    print(id)

Test example:

insert(fields=("id", "file_name", "url", "time", "type", "description"),
       values=(2, "file1", "wwww.test.com", "1", "photo", "my first database test"))

I don't want to supply the id manually; I want it to be assigned automatically, incremented by one each time. How can I do that?
You have an INTEGER PRIMARY KEY column, which, if you leave it out when inserting items, automatically increments:
INSERT INTO this_database(file_name, url, time, type, description)
VALUES (?,?,?,?,?)
Since id is omitted, every time you insert a row using the above statement, it is automatically assigned a number by SQLite.
The documentation explains this behavior.
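Applied to the insert() function from the question, that just means leaving "id" out of both tuples; the function already builds its column list from fields, and cur.lastrowid then reports the value SQLite assigned. A sketch:

# id is omitted from fields and values, so SQLite assigns it;
# insert() prints cur.lastrowid, i.e. the auto-assigned id
insert(fields=("file_name", "url", "time", "type", "description"),
       values=("file1", "wwww.test.com", "1", "photo", "my first database test"))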