ON DUPLICATE KEY UPDATE non-index columns - python

I have code that updates a few MySQL tables with data coming from a Sybase database. The table structures are exactly the same.
Since the number of tables may increase in the future, I wrote a Python script that loops over an array of table names, and based on the number of columns in each of those tables, the insert statement dynamically changes:
'''insert into databaseName.{} ({}) values ({})'''.format(table, columns, parameters)
As you can see, the value parameters are not hardcoded, which is why I can't work out how to modify this query to do an "ON DUPLICATE KEY UPDATE".
For example, the insert statement may look like:
insert into databaseName.table_foo (col1,col2,col3,col4,col5) values (%s,%s,%s,%s,%s)
or
insert into databaseName.table_bar (col1,col2,col3) values (%s,%s,%s)
How can I use "ON DUPLICATE KEY UPDATE" here to update the non-index columns with their corresponding new values?
I can update this question by including more details if needed.

The easiest solution is this:
'''replace into databaseName.{} ({}) values ({})'''.format(table, columns, parameters)
This works similarly to INSERT ... ON DUPLICATE KEY UPDATE (IODKU), in that if the values conflict with a PRIMARY KEY or UNIQUE KEY of the table, it replaces the row, overwriting the other columns, instead of causing a duplicate key error.
The difference is that REPLACE does a DELETE of the old row followed by an INSERT of the new row, whereas IODKU does either an INSERT or an UPDATE. We know this because if you create triggers on the table, you'll see which triggers are activated.
Anyway, using REPLACE would make your task a lot simpler in this case.
If you must use IODKU, you would need to add more syntax at the end of the statement. Unfortunately, there is no syntax for "assign all the columns respectively to the new row's values." You must assign them individually.
For MySQL 8.0.19 or later use this syntax:
INSERT INTO t1 (a,b,c) VALUES (?,?,?) AS new
ON DUPLICATE KEY UPDATE a = new.a, b = new.b, c = new.c;
In earlier MySQL, use this syntax:
INSERT INTO t1 (a,b,c) VALUES (?,?,?)
ON DUPLICATE KEY UPDATE a = VALUES(a), b = VALUES(b), c = VALUES(c);
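Since the question builds its statements dynamically, the assignment list can be generated from the same column list. The following is a minimal sketch, not a definitive implementation: the function name build_upsert and the key_columns argument are hypothetical, and the table and column names are assumed to come from a trusted source because they are interpolated into the SQL text.

```python
def build_upsert(table, columns, key_columns=(), use_alias=True):
    """Build an INSERT ... ON DUPLICATE KEY UPDATE statement with %s placeholders.

    Hypothetical helper: `columns` is the full column list, `key_columns`
    are the primary/unique key columns that should not be reassigned.
    """
    column_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    # Only the non-key columns need an assignment in the UPDATE part.
    update_cols = [c for c in columns if c not in key_columns]
    if not update_cols:
        raise ValueError("no non-key columns to update")
    if use_alias:
        # MySQL 8.0.19 or later: refer to the new row through an alias.
        alias = " AS new"
        assignments = ", ".join(f"{c} = new.{c}" for c in update_cols)
    else:
        # Earlier MySQL: use the VALUES() function.
        alias = ""
        assignments = ", ".join(f"{c} = VALUES({c})" for c in update_cols)
    return (
        f"INSERT INTO databaseName.{table} ({column_list}) "
        f"VALUES ({placeholders}){alias} "
        f"ON DUPLICATE KEY UPDATE {assignments}"
    )


sql = build_upsert("table_bar", ["col1", "col2", "col3"], key_columns=["col1"])
legacy_sql = build_upsert("t1", ["a", "b"], key_columns=["a"], use_alias=False)
```

The resulting string can be passed to the driver's execute call together with the row values, exactly as the plain insert statement was.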

Related

Insert or update if primary key exists into postgreSQL table with .to_sql()

I have a pandas DataFrame that consists of multiple columns that I want to store into the postgreSQL database, using .to_sql():
my_table.to_sql('table', con=engine, schema='wrhouse', if_exists='append', index=False)
I have set a primary key (date) in order to avoid duplicate entries, so the command above works when the primary key value does not yet exist in the database.
However, if that key exists I am getting the following error:
IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "table_pkey"
DETAIL: Key (date)=(2022-07-01 00:00:00) already exists.
Now, what I would like to do is:
Update the row if the Key(date) already exists
Insert a new row in case the Key(date) does not exist
I checked the documentation at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html but I couldn't find any option for this in the DataFrame.to_sql() function.
Additionally, if I change the if_exists='append' parameter to if_exists='replace', it deletes the whole table and that is not what I want.
Is there any way to update/insert rows using the .to_sql() function?
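You could convert the my_table DataFrame (which holds the new values you'd like to send to the table in the database) to a NumPy record array and splice it into the query used in the execute function in your comment. Note that this builds the SQL by string interpolation, so it is only suitable for trusted data whose string form is valid SQL: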
values = str(list(my_table.to_records(index=False)))[1:-1]
conn.execute(f"INSERT INTO wrschema.table (date, first_hour, last_hour, quantity) VALUES {values} ON CONFLICT (date) DO UPDATE SET first_hour = EXCLUDED.first_hour, last_hour = EXCLUDED.last_hour, quantity = EXCLUDED.quantity;")
(this is something that worked for me, hope it helps!)
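A variant that avoids interpolating the values into the SQL text is to generate the statement with placeholders and pass the rows separately. This is only a sketch under assumptions: build_pg_upsert is a hypothetical helper, wrhouse.my_table is a placeholder table name, and the %s placeholder style is the one used by psycopg2.

```python
def build_pg_upsert(table, columns, conflict_columns):
    """Build an INSERT ... ON CONFLICT ... DO UPDATE statement with %s placeholders."""
    column_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    conflict_list = ", ".join(conflict_columns)
    # Every column that is not part of the conflict target is overwritten
    # with the value from the row that failed to insert (EXCLUDED).
    updates = ", ".join(
        f"{col} = EXCLUDED.{col}" for col in columns if col not in conflict_columns
    )
    return (
        f"INSERT INTO {table} ({column_list}) VALUES ({placeholders}) "
        f"ON CONFLICT ({conflict_list}) DO UPDATE SET {updates}"
    )


columns = ["date", "first_hour", "last_hour", "quantity"]
sql = build_pg_upsert("wrhouse.my_table", columns, ["date"])

# Stand-in for the DataFrame contents; with pandas, my_table.to_dict("records")
# yields a list of dicts in this shape.
records = [{"date": "2022-07-01", "first_hour": 0, "last_hour": 23, "quantity": 10.5}]
rows = [tuple(rec[col] for col in columns) for rec in records]
```

With a DB-API cursor such as psycopg2's, the statement and rows would then be sent with cursor.executemany(sql, rows).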

Do not insert duplicates into mysql in python [duplicate]

I started by googling and found the article How to write INSERT if NOT EXISTS queries in standard SQL which talks about mutex tables.
I have a table with ~14 million records. If I want to add more data in the same format, is there a way to ensure the record I want to insert does not already exist without using a pair of queries (i.e., one query to check and one to insert if the result set is empty)?
Does a unique constraint on a field guarantee the insert will fail if it's already there?
It seems that with merely a constraint, when I issue the insert via PHP, the script croaks.
Use INSERT IGNORE INTO table.
There's also INSERT … ON DUPLICATE KEY UPDATE syntax, and you can find explanations in 13.2.6.2 INSERT ... ON DUPLICATE KEY UPDATE Statement.
Post from bogdan.org.ua according to Google's webcache:
18th October 2007
To start: as of the latest MySQL, syntax presented in the title is not
possible. But there are several very easy ways to accomplish what is
expected using existing functionality.
There are 3 possible solutions: using INSERT IGNORE, REPLACE, or
INSERT … ON DUPLICATE KEY UPDATE.
Imagine we have a table:
CREATE TABLE `transcripts` (
`ensembl_transcript_id` varchar(20) NOT NULL,
`transcript_chrom_start` int(10) unsigned NOT NULL,
`transcript_chrom_end` int(10) unsigned NOT NULL,
PRIMARY KEY (`ensembl_transcript_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Now imagine that we have an automatic pipeline importing transcripts
meta-data from Ensembl, and that due to various reasons the pipeline
might be broken at any step of execution. Thus, we need to ensure two
things:
repeated executions of the pipeline will not destroy our database
repeated executions will not die due to ‘duplicate primary key’ errors.
Method 1: using REPLACE
It’s very simple:
REPLACE INTO `transcripts`
SET `ensembl_transcript_id` = 'ENSORGT00000000001',
`transcript_chrom_start` = 12345,
`transcript_chrom_end` = 12678;
If the record exists, it will be overwritten; if it does not yet
exist, it will be created. However, using this method isn’t efficient
for our case: we do not need to overwrite existing records, it’s fine
just to skip them.
Method 2: using INSERT IGNORE
Also very simple:
INSERT IGNORE INTO `transcripts`
SET `ensembl_transcript_id` = 'ENSORGT00000000001',
`transcript_chrom_start` = 12345,
`transcript_chrom_end` = 12678;
Here, if the ‘ensembl_transcript_id’ is already present in the
database, it will be silently skipped (ignored). (To be more precise,
here’s a quote from MySQL reference manual: “If you use the IGNORE
keyword, errors that occur while executing the INSERT statement are
treated as warnings instead. For example, without IGNORE, a row that
duplicates an existing UNIQUE index or PRIMARY KEY value in the table
causes a duplicate-key error and the statement is aborted.”.) If the
record doesn’t yet exist, it will be created.
This second method has several potential weaknesses, including
non-abortion of the query in case any other problem occurs (see the
manual). Thus it should be used if previously tested without the
IGNORE keyword.
Method 3: using INSERT … ON DUPLICATE KEY UPDATE:
The third option is to use INSERT … ON DUPLICATE KEY UPDATE
syntax, and in the UPDATE part just do some meaningless
(empty) operation, like calculating 0+0 (Geoffray suggests doing the
id=id assignment for the MySQL optimization engine to ignore this
operation). Advantage of this method is that it only ignores duplicate
key events, and still aborts on other errors.
As a final notice: this post was inspired by Xaprb. I’d also advise to
consult his other post on writing flexible SQL queries.
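The difference between Method 1 and Method 2 above can be seen in a small runnable sketch. It uses Python's built-in sqlite3 module purely as a stand-in so that it runs without a server; note that SQLite spells the second form INSERT OR IGNORE rather than MySQL's INSERT IGNORE, and the column types are simplified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transcripts ("
    " ensembl_transcript_id TEXT PRIMARY KEY,"
    " transcript_chrom_start INTEGER NOT NULL,"
    " transcript_chrom_end INTEGER NOT NULL)"
)
conn.execute(
    "INSERT INTO transcripts VALUES (?, ?, ?)",
    ("ENSORGT00000000001", 12345, 12678),
)

select_sql = "SELECT transcript_chrom_start, transcript_chrom_end FROM transcripts"

# Method 2 equivalent: the duplicate key is skipped, the stored row is untouched.
conn.execute(
    "INSERT OR IGNORE INTO transcripts VALUES (?, ?, ?)",
    ("ENSORGT00000000001", 1, 2),
)
after_ignore = conn.execute(select_sql).fetchall()

# Method 1: the existing row is replaced by the new one.
conn.execute(
    "REPLACE INTO transcripts VALUES (?, ?, ?)",
    ("ENSORGT00000000001", 1, 2),
)
after_replace = conn.execute(select_sql).fetchall()
```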
Solution:
INSERT INTO `table` (`value1`, `value2`)
SELECT 'stuff for value1', 'stuff for value2' FROM DUAL
WHERE NOT EXISTS (SELECT * FROM `table`
WHERE `value1`='stuff for value1' AND `value2`='stuff for value2' LIMIT 1)
Explanation:
The innermost query
SELECT * FROM `table`
WHERE `value1`='stuff for value1' AND `value2`='stuff for value2' LIMIT 1
used as the WHERE NOT EXISTS-condition detects if there already exists a row with the data to be inserted. After one row of this kind is found, the query may stop, hence the LIMIT 1 (micro-optimization, may be omitted).
The intermediate query
SELECT 'stuff for value1', 'stuff for value2' FROM DUAL
represents the values to be inserted. DUAL refers to a special one row, one column table present by default in all Oracle databases (see https://en.wikipedia.org/wiki/DUAL_table). On a MySQL-Server version 5.7.26 I got a valid query when omitting FROM DUAL, but older versions (like 5.5.60) seem to require the FROM information. By using WHERE NOT EXISTS the intermediate query returns an empty result set if the innermost query found matching data.
The outer query
INSERT INTO `table` (`value1`, `value2`)
inserts the data, if any is returned by the intermediate query.
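The same pattern can be tried out with bound parameters instead of literal strings. The sketch below is an illustration only: it runs against an in-memory SQLite database (which accepts a SELECT without FROM DUAL) rather than MySQL, and the table name items and the helper insert_if_absent are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (value1 TEXT, value2 TEXT)")

INSERT_IF_ABSENT = (
    "INSERT INTO items (value1, value2) "
    "SELECT ?, ? "
    "WHERE NOT EXISTS ("
    "SELECT 1 FROM items WHERE value1 = ? AND value2 = ?)"
)


def insert_if_absent(conn, value1, value2):
    # The values are bound twice: once for the row to insert,
    # once for the existence check in the subquery.
    conn.execute(INSERT_IF_ABSENT, (value1, value2, value1, value2))


insert_if_absent(conn, "stuff for value1", "stuff for value2")
insert_if_absent(conn, "stuff for value1", "stuff for value2")  # no second row
insert_if_absent(conn, "stuff for value1", "something else")    # differs, so inserted

row_count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```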
In MySQL, ON DUPLICATE KEY UPDATE or INSERT IGNORE can be viable solutions.
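An example of ON DUPLICATE KEY UPDATE, based on mysql.com:
INSERT INTO table (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE c=c+1;
If column a is declared UNIQUE and already contains the value 1, the statement above has the same effect as:
UPDATE table SET c=c+1 WHERE a=1;
The INSERT syntax, including the IGNORE keyword, based on mysql.com: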
INSERT [LOW_PRIORITY | DELAYED | HIGH_PRIORITY] [IGNORE]
[INTO] tbl_name [(col_name,...)]
{VALUES | VALUE} ({expr | DEFAULT},...),(...),...
[ ON DUPLICATE KEY UPDATE
col_name=expr
[, col_name=expr] ... ]
Or:
INSERT [LOW_PRIORITY | DELAYED | HIGH_PRIORITY] [IGNORE]
[INTO] tbl_name
SET col_name={expr | DEFAULT}, ...
[ ON DUPLICATE KEY UPDATE
col_name=expr
[, col_name=expr] ... ]
Or:
INSERT [LOW_PRIORITY | HIGH_PRIORITY] [IGNORE]
[INTO] tbl_name [(col_name,...)]
SELECT ...
[ ON DUPLICATE KEY UPDATE
col_name=expr
[, col_name=expr] ... ]
Any simple constraint should do the job, if an exception is acceptable. Examples:
primary key if not surrogate
unique constraint on a column
multi-column unique constraint
Sorry if this seems deceptively simple. I know it looks bad compared to the link you shared with us. ;-(
But I nevertheless give this answer, because it seems to fill your need. (If not, it may prompt you to update your requirements, which would also be "a Good Thing"(TM).)
If an insert would break the database unique constraint, an exception is thrown at the database level and relayed by the driver. It will certainly stop your script with a failure. It must be possible in PHP to handle that case...
Try the following (in MySQL, an IF block like this is only valid inside a stored program such as a stored procedure):
IF (SELECT COUNT(*) FROM beta WHERE name = 'John') > 0 THEN
UPDATE alfa SET c1 = (SELECT id FROM beta WHERE name = 'John');
ELSE
INSERT INTO beta (name) VALUES ('John');
INSERT INTO alfa (c1) VALUES (LAST_INSERT_ID());
END IF;
REPLACE INTO `transcripts`
SET `ensembl_transcript_id` = 'ENSORGT00000000001',
`transcript_chrom_start` = 12345,
`transcript_chrom_end` = 12678;
If the record exists, it will be overwritten; if it does not yet exist, it will be created.
Here is a PHP function that will insert a row only if no existing row already has all the specified column values.
If one of the columns differs, the row will be added.
If the table is empty, the row will be added.
If a row exists where all the specified columns have the specified values, the row won't be added.
function insert_unique($table, $vars)
{
    if (count($vars)) {
        $table = mysql_real_escape_string($table);
        $vars = array_map('mysql_real_escape_string', $vars);
        $req = "INSERT INTO `$table` (`". join('`, `', array_keys($vars)) ."`) ";
        $req .= "SELECT '". join("', '", $vars) ."' FROM DUAL ";
        $req .= "WHERE NOT EXISTS (SELECT 1 FROM `$table` WHERE ";
        foreach ($vars AS $col => $val)
            $req .= "`$col`='$val' AND ";
        $req = substr($req, 0, -5) . ") LIMIT 1";
        $res = mysql_query($req) OR die();
        return mysql_insert_id();
    }
    return False;
}
Example usage:
<?php
insert_unique('mytable', array(
'mycolumn1' => 'myvalue1',
'mycolumn2' => 'myvalue2',
'mycolumn3' => 'myvalue3'
)
);
?>
There are several answers that cover how to solve this if you have a UNIQUE index that you can check against with ON DUPLICATE KEY or INSERT IGNORE. That is not always the case, and as UNIQUE has a length constraint (1000 bytes) you might not be able to change that. For example, I had to work with metadata in WordPress (wp_postmeta).
I finally solved it with two queries:
UPDATE wp_postmeta SET meta_value = ? WHERE meta_key = ? AND post_id = ?;
INSERT INTO wp_postmeta (post_id, meta_key, meta_value) SELECT DISTINCT ?, ?, ? FROM wp_postmeta WHERE NOT EXISTS(SELECT * FROM wp_postmeta WHERE meta_key = ? AND post_id = ?);
Query 1 is a regular UPDATE query without any effect when the data set in question is not there. Query 2 is an INSERT which depends on a NOT EXISTS, i.e. the INSERT is only executed when the data set doesn't exist.
Something worth noting is that INSERT IGNORE will still increment the primary key whether the statement was a success or not just like a normal INSERT would.
This will cause gaps in your primary keys that might make a programmer mentally unstable. Or if your application is poorly designed and depends on perfect incremental primary keys, it might become a headache.
Look into innodb_autoinc_lock_mode = 0 (server setting, and comes with a slight performance hit), or use a SELECT first to make sure your query will not fail (which also comes with a performance hit and extra code).
Update or insert without known primary key
If you already have a unique or primary key, the other answers with either INSERT INTO ... ON DUPLICATE KEY UPDATE ... or REPLACE INTO ... should work fine (note that replace into deletes if exists and then inserts - thus does not partially update existing values).
But suppose you have the values for some_column_id and some_type, the combination of which is known to be unique, and you want to update some_value if the row exists or insert it if it does not, all in a single query (to avoid using a transaction). This might be a solution:
INSERT INTO my_table (id, some_column_id, some_type, some_value)
SELECT t.id, t.some_column_id, t.some_type, t.some_value
FROM (
SELECT id, some_column_id, some_type, some_value
FROM my_table
WHERE some_column_id = ? AND some_type = ?
UNION ALL
SELECT s.id, s.some_column_id, s.some_type, s.some_value
FROM (SELECT NULL AS id, ? AS some_column_id, ? AS some_type, ? AS some_value) AS s
) AS t
LIMIT 1
ON DUPLICATE KEY UPDATE
some_value = ?
Basically, the query executes this way (less complicated than it may look):
Select an existing row via the WHERE clause match.
Union that result with a potential new row (table s), where the column values are explicitly given (s.id is NULL, so it will generate a new auto-increment identifier).
If an existing row is found, then the potential new row from table s is discarded (due to LIMIT 1 on table t), and it will always trigger an ON DUPLICATE KEY which will UPDATE the some_value column.
If an existing row is not found, then the potential new row is inserted (as given by table s).
Note: Every table in a relational database should have at least a primary auto-increment id column. If you don't have this, add it, even when you don't need it at first sight. It is definitely needed for this "trick".
INSERT INTO table_name (columns) VALUES (values) ON CONFLICT (id) DO NOTHING;

Mysql 'VALUES function' is deprecated

This is my python code which prints the sql query.
def generate_insert_statement(column_names, values_format, table_name, items, insert_template=INSERT_TEMPLATE):
    return insert_template.format(
        column_names=",".join(column_names),
        values=",".join(
            map(
                lambda x: generate_raw_values(values_format, x),
                items
            )
        ),
        table_name=table_name,
        updates_on=create_updates_on_columns(column_names)
    )
query = generate_insert_statement(table_name=property['table_name'],
                                  column_names=property['column_names'],
                                  values_format=property['values_format'], items=batch)
print(query)  # here
execute_commit(query)
When printing the MySQL query, my Django project shows the following error in the terminal:
'VALUES function' is deprecated and will be removed in a future release. Please use an alias (INSERT INTO ... VALUES (...) AS alias) and replace VALUES(col) in the ON DUPLICATE KEY UPDATE clause with alias.col instead
The MySQL documentation does not say much about it. What does this mean and how can I fix it?
INSERT_TEMPLATE = "INSERT INTO {table_name} ({column_names}) VALUES {values} ON DUPLICATE KEY UPDATE {updates_on};"
Basically, MySQL is moving toward removing a long-standing non-standard use of the VALUES function. This clears the way for future work where the SQL standard uses the VALUES keyword for something very different, and it removes a source of surprise: the way the VALUES function behaves in subqueries, or outside an ON DUPLICATE KEY UPDATE clause, is not what most people expect.
You need to add an alias to the VALUES clause and then use that alias instead of the non-standard VALUES function in the ON DUPLICATE KEY UPDATE clause, e.g. change
INSERT INTO foo (bar, baz) VALUES (1,2)
ON DUPLICATE KEY UPDATE baz=VALUES(baz)
to
INSERT INTO foo (bar, baz) VALUES (1,2) AS new_foo
ON DUPLICATE KEY UPDATE baz=new_foo.baz
(This only works on MySQL 8.0.19 and later, not on older versions or in any version of MariaDB through at least 10.8.3.)
Note that this is no different if you are updating multiple rows:
INSERT INTO foo (bar, baz) VALUES (1,2),(3,4),(5,6) AS new_foo
ON DUPLICATE KEY UPDATE baz=new_foo.baz
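Applied to the question's template, the change could look like the sketch below. The question does not show create_updates_on_columns, so the version here is a hypothetical reconstruction, and the alias name new_rows is arbitrary.

```python
# Template from the question, with a row alias added after the VALUES list.
INSERT_TEMPLATE = (
    "INSERT INTO {table_name} ({column_names}) VALUES {values} AS new_rows "
    "ON DUPLICATE KEY UPDATE {updates_on};"
)


def create_updates_on_columns(column_names, alias="new_rows"):
    # Hypothetical replacement for the helper in the question:
    # emits "col = alias.col" instead of "col = VALUES(col)".
    return ", ".join(f"{name} = {alias}.{name}" for name in column_names)


query = INSERT_TEMPLATE.format(
    table_name="foo",
    column_names=",".join(["bar", "baz"]),
    values="(1,2),(3,4),(5,6)",
    updates_on=create_updates_on_columns(["baz"]),
)
```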
From https://dev.mysql.com/worklog/task/?id=13325:
According to the SQL standard, VALUES is a table value constructor that returns a table. In MySQL this is true for simple INSERT and REPLACE statements, but MySQL also uses VALUES to refer to values in INSERT ... ON DUPLICATE KEY UPDATE statements. E.g.:
INSERT INTO t(a,b) VALUES (1, 2) ON DUPLICATE KEY
UPDATE a = VALUES (b) + 1;
VALUES (b) refers to the value for b in the table value constructor for the INSERT, in this case 2.
To make the value available in simple arithmetic expressions, it is part of the parser rule for simple_expr. Unfortunately, this also means that VALUES can be used in this way in a lot of other statements, e.g.:
SELECT a FROM t WHERE a=VALUES(a);
In all such statements, VALUES returns NULL, so the above query would not have the intended effect. The only meaningful usage of VALUES as a function, rather than a table value constructor, is in INSERT ... ON DUPLICATE KEY UPDATE. Also, the non-standard use in INSERT ... ON DUPLICATE KEY UPDATE does not extend to subqueries. E.g.:
INSERT INTO t1 VALUES(1,2) ON DUPLICATE KEY
UPDATE a=(SELECT a FROM t2 WHERE b=VALUES(b));
This does not do what the user expects. VALUES(b) will return NULL, even if it is in an INSERT .. ON DUPLICATE KEY UPDATE statement.
The non-standard syntax also makes it harder (impossible?) to implement standard behavior of VALUES as specified in feature F641 "Row and table constructors".

SQLite: Batch aggregation with insert

I would like to do the following:
cur.execute("SELECT key, SUM(val) FROM table GROUP BY key")
cur.executemany("INSERT INTO table_sums VALUES(?,?)",(row for row in cur))
in a single SQLite statement with batch processing if possible, that is it does the sum only for a number of keys, inserts, continues till all are processed.
Apparently I am using Python right now but as I am asking for a single statement (if exists), I don't think this should matter. If it doesn't exist, perhaps there is an efficient(!) work-around in Python?
EDIT: To avoid a SELECT WHERE query, it would actually be desirable not to produce complete sums for a subset of keys, but to just sum over the first n rows and store the resulting sums so far, then continue with the next n...
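The two statements can be combined into one using a common table expression (CTE). SQLite cannot bind a whole list to a single parameter, so the IN list needs one placeholder per key in the batch:
WITH tempsums AS
(SELECT key, SUM(val) FROM table
WHERE key IN (:key1, :key2)
GROUP BY key)
INSERT INTO table_sums SELECT * FROM tempsums;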
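A batched version of the question's two statements can be driven from Python as sketched below. This is one possible approach under assumptions: the source table is called items here (table is a reserved word), the batch size is arbitrary, and sum_batch is a hypothetical helper that sums complete keys per batch rather than the first n rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (key TEXT, val INTEGER)")
conn.execute("CREATE TABLE table_sums (key TEXT PRIMARY KEY, total INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("a", 1), ("a", 2), ("b", 10), ("c", 5), ("c", 7)],
)


def sum_batch(conn, keys):
    # One "?" per key, because a list cannot be bound to a single parameter.
    placeholders = ", ".join("?" * len(keys))
    conn.execute(
        "INSERT INTO table_sums (key, total) "
        f"SELECT key, SUM(val) FROM items WHERE key IN ({placeholders}) GROUP BY key",
        keys,
    )


all_keys = [row[0] for row in conn.execute("SELECT DISTINCT key FROM items ORDER BY key")]
batch_size = 2
for start in range(0, len(all_keys), batch_size):
    # Aggregation and insert happen inside SQLite; only the keys cross into Python.
    sum_batch(conn, all_keys[start:start + batch_size])

sums = conn.execute("SELECT key, total FROM table_sums ORDER BY key").fetchall()
```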

Copy row from Cassandra database and then insert it using Python

I'm using the DataStax Python Driver for Apache Cassandra.
I want to read 100 rows from database and then insert them again into database after changing one value. I do not want to miss previous records.
I know how to get my rows:
rows = session.execute('SELECT * FROM columnfamily LIMIT 100;')
for myrecord in rows:
    print(myrecord.timestamp)
I know how to insert new rows into database:
stmt = session.prepare('''
INSERT INTO columnfamily (rowkey, qualifier, info, act_date, log_time)
VALUES (?, ?, ?, ?, ?)
IF NOT EXISTS
''')
results = session.execute(stmt, [arg1, arg2, ...])
My problems are that:
I do not know how to change only one value in a row.
I don't know how to insert rows into the database without writing out the CQL by hand. My columnfamily has more than 150 columns, and writing all their names in the query does not seem like the best idea.
To conclude:
Is there a way to get the rows, modify one value in each of them and then insert these rows into the database without spelling out every column in CQL?
First, select only the columns you need from Cassandra, as less data is faster to transfer. You need to include all columns of the primary key plus the column that you want to change.
After you get the data, you can use UPDATE command to update only necessary column (example from documentation):
UPDATE cycling.cyclist_name
SET comments = 'Rides hard, gets along with others, a real winner'
WHERE id = fb372533-eb95-4bb4-8685-6ef61e994caa
You can also use prepared statement to make it more performant...
But be careful - the UPDATE & INSERT in CQL are really UPSERTs, so if you change columns that are part of primary key, then it will create new entry...
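The UPDATE statement only needs the primary key columns and the one column being changed, so it can be generated without listing all 150 columns. The sketch below is an assumption-laden illustration: the primary key (rowkey, qualifier) and the changed column info are guesses based on the question's INSERT, and both helpers are hypothetical.

```python
def build_update_cql(table, key_columns, changed_column):
    """Build a CQL UPDATE with ? placeholders for use as a prepared statement."""
    where = " AND ".join(f"{col} = ?" for col in key_columns)
    return f"UPDATE {table} SET {changed_column} = ? WHERE {where}"


def update_params(row, key_columns, new_value):
    # Parameter order matches the placeholders: new value first, then the key.
    return [new_value] + [row[col] for col in key_columns]


key_columns = ["rowkey", "qualifier"]  # assumed primary key
cql = build_update_cql("columnfamily", key_columns, "info")

# Stand-in for one row read from the table, as a plain dict.
row = {"rowkey": "r1", "qualifier": "q1", "info": "old"}
params = update_params(row, key_columns, "new")
```

With the driver, the statement would be prepared once with session.prepare(cql) and then executed per row with session.execute(prepared, params).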
