I started by googling and found the article How to write INSERT if NOT EXISTS queries in standard SQL which talks about mutex tables.
I have a table with ~14 million records. If I want to add more data in the same format, is there a way to ensure the record I want to insert does not already exist without using a pair of queries (i.e., one query to check and one to insert is the result set is empty)?
Does a unique constraint on a field guarantee the insert will fail if it's already there?
It seems that with merely a constraint, when I issue the insert via PHP, the script croaks.
Use INSERT IGNORE INTO table.
There's also INSERT … ON DUPLICATE KEY UPDATE syntax, and you can find explanations in 13.2.6.2 INSERT ... ON DUPLICATE KEY UPDATE Statement.
Post from bogdan.org.ua according to Google's webcache:
18th October 2007
To start: as of the latest MySQL, syntax presented in the title is not
possible. But there are several very easy ways to accomplish what is
expected using existing functionality.
There are 3 possible solutions: using INSERT IGNORE, REPLACE, or
INSERT … ON DUPLICATE KEY UPDATE.
Imagine we have a table:
CREATE TABLE `transcripts` (
`ensembl_transcript_id` varchar(20) NOT NULL,
`transcript_chrom_start` int(10) unsigned NOT NULL,
`transcript_chrom_end` int(10) unsigned NOT NULL,
PRIMARY KEY (`ensembl_transcript_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Now imagine that we have an automatic pipeline importing transcripts
meta-data from Ensembl, and that due to various reasons the pipeline
might be broken at any step of execution. Thus, we need to ensure two
things:
repeated executions of the pipeline will not destroy our
> database
repeated executions will not die due to ‘duplicate
> primary key’ errors.
Method 1: using REPLACE
It’s very simple:
REPLACE INTO `transcripts`
SET `ensembl_transcript_id` = 'ENSORGT00000000001',
`transcript_chrom_start` = 12345,
`transcript_chrom_end` = 12678;
If the record exists, it will be overwritten; if it does not yet
exist, it will be created. However, using this method isn’t efficient
for our case: we do not need to overwrite existing records, it’s fine
just to skip them.
Method 2: using INSERT IGNORE Also very simple:
INSERT IGNORE INTO `transcripts`
SET `ensembl_transcript_id` = 'ENSORGT00000000001',
`transcript_chrom_start` = 12345,
`transcript_chrom_end` = 12678;
Here, if the ‘ensembl_transcript_id’ is already present in the
database, it will be silently skipped (ignored). (To be more precise,
here’s a quote from MySQL reference manual: “If you use the IGNORE
keyword, errors that occur while executing the INSERT statement are
treated as warnings instead. For example, without IGNORE, a row that
duplicates an existing UNIQUE index or PRIMARY KEY value in the table
causes a duplicate-key error and the statement is aborted.”.) If the
record doesn’t yet exist, it will be created.
This second method has several potential weaknesses, including
non-abortion of the query in case any other problem occurs (see the
manual). Thus it should be used if previously tested without the
IGNORE keyword.
Method 3: using INSERT … ON DUPLICATE KEY UPDATE:
Third option is to use INSERT … ON DUPLICATE KEY UPDATE
syntax, and in the UPDATE part just do nothing do some meaningless
(empty) operation, like calculating 0+0 (Geoffray suggests doing the
id=id assignment for the MySQL optimization engine to ignore this
operation). Advantage of this method is that it only ignores duplicate
key events, and still aborts on other errors.
As a final notice: this post was inspired by Xaprb. I’d also advise to
consult his other post on writing flexible SQL queries.
Solution:
INSERT INTO `table` (`value1`, `value2`)
SELECT 'stuff for value1', 'stuff for value2' FROM DUAL
WHERE NOT EXISTS (SELECT * FROM `table`
WHERE `value1`='stuff for value1' AND `value2`='stuff for value2' LIMIT 1)
Explanation:
The innermost query
SELECT * FROM `table`
WHERE `value1`='stuff for value1' AND `value2`='stuff for value2' LIMIT 1
used as the WHERE NOT EXISTS-condition detects if there already exists a row with the data to be inserted. After one row of this kind is found, the query may stop, hence the LIMIT 1 (micro-optimization, may be omitted).
The intermediate query
SELECT 'stuff for value1', 'stuff for value2' FROM DUAL
represents the values to be inserted. DUAL refers to a special one row, one column table present by default in all Oracle databases (see https://en.wikipedia.org/wiki/DUAL_table). On a MySQL-Server version 5.7.26 I got a valid query when omitting FROM DUAL, but older versions (like 5.5.60) seem to require the FROM information. By using WHERE NOT EXISTS the intermediate query returns an empty result set if the innermost query found matching data.
The outer query
INSERT INTO `table` (`value1`, `value2`)
inserts the data, if any is returned by the intermediate query.
In MySQL, ON DUPLICATE KEY UPDATE or INSERT IGNORE can be viable solutions.
An example of ON DUPLICATE KEY UPDATE update based on mysql.com:
INSERT INTO table (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE c=c+1;
UPDATE table SET c=c+1 WHERE a=1;
An example of INSERT IGNORE based on mysql.com
INSERT [LOW_PRIORITY | DELAYED | HIGH_PRIORITY] [IGNORE]
[INTO] tbl_name [(col_name,...)]
{VALUES | VALUE} ({expr | DEFAULT},...),(...),...
[ ON DUPLICATE KEY UPDATE
col_name=expr
[, col_name=expr] ... ]
Or:
INSERT [LOW_PRIORITY | DELAYED | HIGH_PRIORITY] [IGNORE]
[INTO] tbl_name
SET col_name={expr | DEFAULT}, ...
[ ON DUPLICATE KEY UPDATE
col_name=expr
[, col_name=expr] ... ]
Or:
INSERT [LOW_PRIORITY | HIGH_PRIORITY] [IGNORE]
[INTO] tbl_name [(col_name,...)]
SELECT ...
[ ON DUPLICATE KEY UPDATE
col_name=expr
[, col_name=expr] ... ]
Any simple constraint should do the job, if an exception is acceptable. Examples:
primary key if not surrogate
unique constraint on a column
multi-column unique constraint
Sorry if this seems deceptively simple. I know it looks bad confronted to the link you share with us. ;-(
But I nevertheless give this answer, because it seems to fill your need. (If not, it may trigger you updating your requirements, which would be "a Good Thing"(TM) also).
If an insert would break the database unique constraint, an exception is throw at the database level, relayed by the driver. It will certainly stop your script, with a failure. It must be possible in PHP to address that case...
Try the following:
IF (SELECT COUNT(*) FROM beta WHERE name = 'John' > 0)
UPDATE alfa SET c1=(SELECT id FROM beta WHERE name = 'John')
ELSE
BEGIN
INSERT INTO beta (name) VALUES ('John')
INSERT INTO alfa (c1) VALUES (LAST_INSERT_ID())
END
REPLACE INTO `transcripts`
SET `ensembl_transcript_id` = 'ENSORGT00000000001',
`transcript_chrom_start` = 12345,
`transcript_chrom_end` = 12678;
If the record exists, it will be overwritten; if it does not yet exist, it will be created.
Here is a PHP function that will insert a row only if all the specified columns values don't already exist in the table.
If one of the columns differ, the row will be added.
If the table is empty, the row will be added.
If a row exists where all the specified columns have the specified values, the row won't be added.
function insert_unique($table, $vars)
{
if (count($vars)) {
$table = mysql_real_escape_string($table);
$vars = array_map('mysql_real_escape_string', $vars);
$req = "INSERT INTO `$table` (`". join('`, `', array_keys($vars)) ."`) ";
$req .= "SELECT '". join("', '", $vars) ."' FROM DUAL ";
$req .= "WHERE NOT EXISTS (SELECT 1 FROM `$table` WHERE ";
foreach ($vars AS $col => $val)
$req .= "`$col`='$val' AND ";
$req = substr($req, 0, -5) . ") LIMIT 1";
$res = mysql_query($req) OR die();
return mysql_insert_id();
}
return False;
}
Example usage:
<?php
insert_unique('mytable', array(
'mycolumn1' => 'myvalue1',
'mycolumn2' => 'myvalue2',
'mycolumn3' => 'myvalue3'
)
);
?>
There are several answers that cover how to solve this if you have a UNIQUE index that you can check against with ON DUPLICATE KEY or INSERT IGNORE. That is not always the case, and as UNIQUE has a length constraint (1000 bytes) you might not be able to change that. For example, I had to work with metadata in WordPress (wp_postmeta).
I finally solved it with two queries:
UPDATE wp_postmeta SET meta_value = ? WHERE meta_key = ? AND post_id = ?;
INSERT INTO wp_postmeta (post_id, meta_key, meta_value) SELECT DISTINCT ?, ?, ? FROM wp_postmeta WHERE NOT EXISTS(SELECT * FROM wp_postmeta WHERE meta_key = ? AND post_id = ?);
Query 1 is a regular UPDATE query without any effect when the data set in question is not there. Query 2 is an INSERT which depends on a NOT EXISTS, i.e. the INSERT is only executed when the data set doesn't exist.
Something worth noting is that INSERT IGNORE will still increment the primary key whether the statement was a success or not just like a normal INSERT would.
This will cause gaps in your primary keys that might make a programmer mentally unstable. Or if your application is poorly designed and depends on perfect incremental primary keys, it might become a headache.
Look into innodb_autoinc_lock_mode = 0 (server setting, and comes with a slight performance hit), or use a SELECT first to make sure your query will not fail (which also comes with a performance hit and extra code).
Update or insert without known primary key
If you already have a unique or primary key, the other answers with either INSERT INTO ... ON DUPLICATE KEY UPDATE ... or REPLACE INTO ... should work fine (note that replace into deletes if exists and then inserts - thus does not partially update existing values).
But if you have the values for some_column_id and some_type, the combination of which are known to be unique. And you want to update some_value if exists, or insert if not exists. And you want to do it in just one query (to avoid using a transaction). This might be a solution:
INSERT INTO my_table (id, some_column_id, some_type, some_value)
SELECT t.id, t.some_column_id, t.some_type, t.some_value
FROM (
SELECT id, some_column_id, some_type, some_value
FROM my_table
WHERE some_column_id = ? AND some_type = ?
UNION ALL
SELECT s.id, s.some_column_id, s.some_type, s.some_value
FROM (SELECT NULL AS id, ? AS some_column_id, ? AS some_type, ? AS some_value) AS s
) AS t
LIMIT 1
ON DUPLICATE KEY UPDATE
some_value = ?
Basically, the query executes this way (less complicated than it may look):
Select an existing row via the WHERE clause match.
Union that result with a potential new row (table s), where the column values are explicitly given (s.id is NULL, so it will generate a new auto-increment identifier).
If an existing row is found, then the potential new row from table s is discarded (due to LIMIT 1 on table t), and it will always trigger an ON DUPLICATE KEY which will UPDATE the some_value column.
If an existing row is not found, then the potential new row is inserted (as given by table s).
Note: Every table in a relational database should have at least a primary auto-increment id column. If you don't have this, add it, even when you don't need it at first sight. It is definitely needed for this "trick".
INSERT INTO table_name (columns) VALUES (values) ON CONFLICT (id) DO NOTHING;
I am making a script, that should create a schema for each customer. I’m fetching all metadata from a database that defines how each customer’s schema should look like, and then create it. Everything is well defined, the types, names of tables, etc. A customer has many tables (fx, address, customers, contact, item, etc), and each table has the same metadata.
My procedure now:
get everything I need from the metadataDatabase.
In a for loop, create a table, and then Alter Table and add each metadata (This is done for each table).
Right now my script runs in about a minute for each customer, which I think is too slow. It has something to do with me having a loop, and in that loop, I’m altering each table.
I think that instead of me altering (which might be not so clever approach), I should do something like the following:
Note that this is just a stupid but valid example:
for table in tables:
con.execute("CREATE TABLE IF NOT EXISTS tester.%s (%s, %s);", (table, "last_seen date", "valid_from timestamp"))
But it gives me this error (it seems like it reads the table name as a string in a string..):
psycopg2.errors.SyntaxError: syntax error at or near "'billing'"
LINE 1: CREATE TABLE IF NOT EXISTS tester.'billing' ('last_seen da...
Consider creating tables with a serial type (i.e., autonumber) ID field and then use alter table for all other fields by using a combination of sql.Identifier for identifiers (schema names, table names, column names, function names, etc.) and regular format for data types which are not literals in SQL statement.
from psycopg2 import sql
# CREATE TABLE
query = """CREATE TABLE IF NOT EXISTS {shm}.{tbl} (ID serial)"""
cur.execute(sql.SQL(query).format(shm = sql.Identifier("tester"),
tbl = sql.Identifier("table")))
# ALTER TABLE
items = [("last_seen", "date"), ("valid_from", "timestamp")]
query = """ALTER TABLE {shm}.{tbl} ADD COLUMN {col} {typ}"""
for item in items:
# KEEP IDENTIFIER PLACEHOLDERS
final_query = query.format(shm="{shm}", tbl="{tbl}", col="{col}", typ=i[1])
cur.execute(sql.SQL(final_query).format(shm = sql.Identifier("tester"),
tbl = sql.Identifier("table"),
col = sql.Identifier(item[0]))
Alternatively, use str.join with list comprehension for one CREATE TABLE:
query = """CREATE TABLE IF NOT EXISTS {shm}.{tbl} (
"id" serial,
{vals}
)"""
items = [("last_seen", "date"), ("valid_from", "timestamp")]
val = ",\n ".join(["{{}} {typ}".format(typ=i[1]) for i in items])
# KEEP IDENTIFIER PLACEHOLDERS
pre_query = query.format(shm="{shm}", tbl="{tbl}", vals=val)
final_query = sql.SQL(pre_query).format(*[sql.Identifier(i[0]) for i in items],
shm = sql.Identifier("tester"),
tbl = sql.Identifier("table"))
cur.execute(final_query)
SQL (sent to database)
CREATE TABLE IF NOT EXISTS "tester"."table" (
"id" serial,
"last_seen" date,
"valid_from" timestamp
)
However, this becomes heavy as there are too many server roundtrips.
How many tables with how many columns are you creating that this is slow? Could you ssh to a machine closer to your server and run the python there?
I don't get that error. Rather, I get an SQL syntax error. A values list is for conveying data. But ALTER TABLE is not about data, it is about metadata. You can't use a values list there. You need the names of the columns and types in double quotes (or no quotes) rather than single quotes. And you can't have a comma between name and type. And you can't have parentheses around each pair. And each pair needs to be introduced with "ADD", you can't have it just once. You are using the wrong tool for the job. execute_batch is almost the right tool, except it will use single quotes rather than double quotes around the identifiers. Perhaps you could add a flag to it tell it to use quote_ident.
Not only is execute_values the wrong tool for the job, but I think python in general might be as well. Why not just load from a .sql file?
I have a table using SQL Lite with Python. The size of the table always has 3 columns and could have many rows. Each of the cells are strings. Here is example table:
serial_num date_measured status
1234A 1-1-2015 passed
4321B 6-21-2015 failed
1423C 12-25-2015 passed
......
My program prompts me for a serial number. This is saved as a variable called serialNum. How can I delete (or overwrite) an entire row if serialNum equals any of the strings in the serial_num column in my table?
I've seen many examples on how to delete (or overwrite) a row in a table if I know all the values in each cell of that row, but my trouble is that the only cell that could ever be the same in each row would be the serial number. I need to so a search through the serial_number column and if any string in that column equals the current value of my serialNum variable, I need to delete (or overwrite) that row.
import sqlite3
conn = sqlite3.connect('example.db')
c = conn.cursor()
c.execute('''CREATE TABLE test (serial_num text, date_measured text, status text)''')
c.execute("INSERT INTO test VALUES ('1234A', '1-1-2015', 'passed')")
c.execute("INSERT INTO test VALUES ('4321B', '6-21-2015', 'failed')")
c.execute("INSERT INTO test VALUES ('1423C', '12-25-2015', 'passed')")
conn.commit()
Does anyone know a simple way to do this? I've seen others say that an ID must be used or a temporary table, but I would hope there might be an easier way to accomplish my task. Any advice would be great.
SQL suports this: simply use delete
"delete from test where serial_num=<some input>;"
or in this case
c.execute("delete from test where serial_num=%s;", serialNum);
There's no need to search through the list when using SQL. SQL is declarative: you tell it what to do using your query, not how to do it. Don't loop though all your rows to check which to delete: tell it what to delete and the database engine will find the best/fastest way to satisfy that goal.
Hope I well interpreted your question
for row in c.execute('SELECT * FROM test WHERE serial_num = ?', serialNum'):
# do whatever you want on row
print row
I was able to figure out a working solution:
sql = "DELETE FROM test WHERE serial_num = ?"
c.execute(sql, (serialNum,))
The comma after serialNum for some reason has to be there. Thank you #Michiel Arienfor the head start
I am writing a genetic algorithm in python that makes use of a sqlite3 database to store information about a structure and its properties.
I have a separate table for structure and property and therefore need to know the index of the newly added structure so that it can be referenced in the property table. I would like to use the struct_id INT IDENTITY(1,0) PRIMARY KEY, command, but I can't seem to immediately retrieve the new id from the cursor.
This wouldn't normally be a problem as the additions happen serially, but the plan is to have dozens of individual processes writing to this database simultaneously.
My current attempt is below, but I've found that the structure ids are being overwritten when many processes rapidly write to the database.
Thanks in advance.
def add_structure(self, struct):
'''
inserts a structure and its properties into a sqlite database and returns the struct_id
'''
conn = self.get_conn()
cursor = conn.cursor()
# insert structure
new_id = -1 # initialize
while True:
# select current max id number and add one
new_id = cursor.execute('SELECT max(struct_id) FROM structure').fetchone()[0] + 1
# attemts to insert. If structure with id already exists, error is returned and loop restarts
try:
cursor.execute('INSERT INTO structure (struct_id, input_id, stoic, geo) \
VALUES (?, ?, ?, ?)',
(new_id, self.input_ref, self.stoichiometry.get_string(), struct.get_geometry()))
prop_list = []
for prop in struct.properties.iteritems():
prop_list.append((new_id, prop[0], str(prop[1]),))
cursor.executemany('INSERT INTO property (struct_id, key, value) \
VALUES (?, ?, ?)', prop_list)
conn.commit()
except Exception,e: print str(e); continue # structure id clash, re-evaluate new id
break # if successful
# insert attributes
conn.commit()
cursor.close()
conn.close()
if new_id > 0: # no errors
struct.index = new_id
print 'structure added to DB with ID: ' + str(new_id)
return new_id
else: raise Exception # insertion problem
My mistake!
I made an error in testing and the index was actually the proper one. So in short, my code works. If you're running in to the same problem, feel free to use mine as a framework. Be sure to surround any execute statement in a try-catch block, even if it's just a read, the database could be locked and throw an exception.
I created a table by following query:
with sqlite3.connect('example.db', detect_types=sqlite3.PARSE_DECLTYPES) as conn:
cur = conn.cursor()
cur.execute("CREATE TABLE Surveys(Id INTEGER PRIMARY KEY, Name TEXT, Desc TEXT, DictObject BLOB, Hash TEXT)")
conn.commit()
Now I have to add some Survey data to Surveys table for every request. Surveys table has Id as primary integer value. This has to be increased upon every insertion - And What is the proper way to do it? Do I have to fetch every row and check what the lastIdis upon every request?
sqlite will automatically provide an id for a INTEGER PRIMARY KEY column on INSERT on a table if you do not provide a value yourself. Just insert data for every column except for the id column.
with sqlite3.connect('example.db', detect_types=sqlite3.PARSE_DECLTYPES) as conn:
cur = conn.cursor()
cur.execute("INSERT INTO Surveys(Name, Desc, DictObject, Hash) VALUES (?, ?, ?, ?",
('somename', 'some description\nof sorts\n',
"{'dump': 'of a dictionary'}", '0xhash'))
You may want to add the keyword AUTOINCREMENT to your id column though; the default is to pick the highest existing row id plus 1, and if you delete data from the table on a regular basis that can lead to reuse of ids (delete the current highest id and it'll be re-used next time round). AUTOINCREMENT guarantees that each generated number will only be used once for a table, independent of deletes.
Note that when you use a sqlite3 connection as a context manager (using the with statement), the commit() is automatically applied for you. Quoting the documentation:
Connection objects can be used as context managers that automatically commit or rollback transactions. In the event of an exception, the transaction is rolled back; otherwise, the transaction is committed.
In other words, you can safely remove the conn.commit() line in your code, it is entirely redundant.