Python MySQLdb: Duplicate entry '2147483647' for key 1

I am getting this error when I run my program in Python.
Here's the table of my database :
Field         Type         Collation           Null  Key  Default
articleCode   varchar(25)  latin1_swedish_ci   NO    UNI
dateReceived  datetime     NULL                NO    MUL  0000-00-00 00:00:00
s100RSD       datetime     NULL                YES        0000-00-00 00:00:00
remarks       longtext     latin1_swedish_ci   YES
And to simplify the problem of my program, I will isolate the part of the program that makes an error, here:
import MySQLdb

def main():
    dateReceived = '2011-10-07 01:06:30'
    articleCode = 'name'
    s100rsd = '2011-10-07 01:06:30'
    remark_text = 'This is a remark'

    db = MySQLdb.connect('server', 'user', 'passwd', 'table_name', port)
    cur = db.cursor()
    db_query = cur.execute("INSERT INTO tblS100CurrentListing (articleCode, dateReceived, s100RSD, remarks) VALUES ('articleCode', 'dateReceived', 's100rsd', 'remark_text')")
    cur.close()
    db.close()

if __name__ == '__main__':
    main()
Here's the error that I get : _mysql_exceptions.IntegrityError: (1062, "Duplicate entry '2147483647' for key 1")
Thanks for all your help!

You seem to be inserting string constants into the database, not your actual values. Instead, try something similar to:
db_query = cur.execute("INSERT INTO tblS100CurrentListing " +
                       "(articleCode, dateReceived, s100RSD, remarks) VALUES (%s, %s, %s, %s)",
                       (articleCode, dateReceived, s100rsd, remark_text))

This happens because of the size limit of the key column. If it is an INTEGER, 2147483647 is the largest value it can hold, so every record after that point is written with the value 2147483647, which is why you get the duplicate-key error. Change the column to BIGINT or another type larger than INTEGER.
Hope it helps.
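For example, widening the key column might look roughly like this (the column name id is only an assumption here, since the question's table dump doesn't show the auto-increment key):
# Hypothetical key column name `id`; use the table's real auto-increment column.
cur.execute("ALTER TABLE tblS100CurrentListing MODIFY id BIGINT NOT NULL AUTO_INCREMENT")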

The unique key on the "articleCode" field prevents MySQL from storing two rows with the same value in that column. It looks like you already inserted one on the first program run.
Remove the previously inserted record with articleCode = 'name', OR remove the UNIQUE KEY on the articleCode field, OR insert a different value for articleCode.
Hope this helps!

After correcting the code as described in other answers, you should modify the table in order to reset its auto_increment counter.
ALTER TABLE tblS100CurrentListing auto_increment=1
should reset the counter to the lowest possible value.
Removing or repairing the erroneous values from the table is required; otherwise the change won't have any effect.
Besides, do you really need to insert a value into a column that is set to auto_increment? Or is this part of a restore process? Otherwise the two things are redundant: either you let the database assign the value automatically or you insert it yourself. Doing both can (as seen) lead to conflicts.
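For completeness, a minimal corrected version of the question's script, combining the fixes above, might look like this (connection details are the placeholders from the question; the commit matters because MySQLdb does not autocommit by default):
import MySQLdb

def main():
    dateReceived = '2011-10-07 01:06:30'
    articleCode = 'name'
    s100rsd = '2011-10-07 01:06:30'
    remark_text = 'This is a remark'

    # host, user, password, database placeholders from the question
    db = MySQLdb.connect('server', 'user', 'passwd', 'table_name')
    try:
        cur = db.cursor()
        cur.execute(
            "INSERT INTO tblS100CurrentListing "
            "(articleCode, dateReceived, s100RSD, remarks) VALUES (%s, %s, %s, %s)",
            (articleCode, dateReceived, s100rsd, remark_text))
        db.commit()  # MySQLdb does not autocommit by default
        cur.close()
    finally:
        db.close()

if __name__ == '__main__':
    main()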

Related

update the last entered value from a selection of values in a database with python, mysql

Okay, so I have a table that uses the student id as the identifier for which row to edit. But what if the same student lends a book twice? Then every row with that student id will be edited, which I don't want. I want only the last entered row for that student id to be edited, and using a serial number isn't a practical solution here. I am using the Python connector. Please help :) Thanks in advance
code i use right now :
con = mysql.connect(host='localhost', user='root',
                    password='monkey123', database='BOOK')
c = con.cursor()
c.execute(
    f"UPDATE library set `status`='Returned',`date returned`='{str(cal.selection_get())}' WHERE `STUDENT ID`='{e_sch.get()}';")
c.execute('commit')
con.close()
messagebox.showinfo(
    'Success', 'Book has been returned successfully')
If I followed you correctly, you want to update just one record that matches the where condition. For this to be done in a reliable manner, you need a column to define the ordering of the records. It could be a date, an incrementing id, or else. I assume that such column exists in your table and is called ordering_column.
A simple option is to use ORDER BY and LIMIT in the UPDATE statement, like so:
sql = """
UPDATE library
SET status = 'Returned', date returned = %s
WHERE student_id = %s
ORDER BY ordering_column DESC
LIMIT 1
"""
c = con.cursor()
c.execute(sql, (str(cal.selection_get()), e_sch.get(), )
Note that I modified your code so the input values are given as parameters rather than concatenated into the query string. This is an important change that makes your code safer and more efficient.
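Putting that back into the rest of your original snippet, the surrounding calls would look roughly like this (connection details and widgets taken from your question):
con = mysql.connect(host='localhost', user='root',
                    password='monkey123', database='BOOK')
c = con.cursor()
c.execute(sql, (str(cal.selection_get()), e_sch.get()))
con.commit()
con.close()
messagebox.showinfo('Success', 'Book has been returned successfully')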

make MariaDB update from Python much faster

I have a python script that aggregates data from multiple sources to one, for technical reasons.
In this script, I create an employees table, fill it with data, and in a second step fetch each employee's first and last name from another data source. My code is the following:
Create the table and fill it with data:
def createIdentite(mariaConnector, fmsConnector):
    print('Creating table "Identite"...')
    mariadbCursor = mariaConnector.cursor()

    # verify we have the destination tables we need
    print(' Checking for table Identite...')
    if mariaCheckTableExists(mariadbConnector, 'Identite') == False:
        print(' Table doesn\'t exist, creating it...')
        mariadbCursor.execute("""
            CREATE TABLE Identite (
                PK_FP VARCHAR(50) NOT NULL,
                LieuNaissance TEXT,
                PaysNaissance TEXT,
                Name TEXT,
                LastName TEXT,
                Nationalite TEXT,
                PaysResidence TEXT,
                PersonneAPrevenir TEXT,
                Tel1_PAP TEXT,
                Tel2_PAP TEXT,
                CategorieMutuelle TEXT,
                Ep1_MUTUELLE BOOLEAN,
                TypeMutuelle BOOLEAN,
                NiveauMutuelle BOOLEAN,
                NiveauMutuelle2 BOOLEAN,
                NiveauMutuelle3 BOOLEAN,
                PartMutuelleSalarie FLOAT,
                PartMutuelleSalarieOption FLOAT,
                PRIMARY KEY (PK_FP)
            )
        """)
        mariadbCursor.execute("CREATE INDEX IdentitePK_FP ON Identite(PK_FP)")
    else:
        # flush the table
        print(' Table exists, flushing it...')
        mariadbCursor.execute("DELETE FROM Identite")

    # now fill it with fresh data
    print(' Retrieving the data from FMS...')
    fmsCursor = fmsConnector.cursor()
    fmsCursor.execute("""
        SELECT
            PK_FP,
            Lieu_Naiss_Txt,
            Pays_Naiss_Txt,
            Nationalite_Txt,
            Pays_Resid__Txt,
            Pers_URG,
            Tel1_URG,
            Tel2_URG,
            CAT_MUTUELLE,
            CASE WHEN Ep1_MUTUELLE = 'OUI' THEN 1 ELSE 0 END as Ep1_MUTUELLE,
            CASE WHEN TYPE_MUT = 'OUI' THEN 1 ELSE 0 END as TYPE_MUT,
            CASE WHEN Niv_Mutuelle IS NULL THEN 0 ELSE 1 END as Niv_Mutuelle,
            CASE WHEN NIV_MUTUELLE[2] IS NULL THEN 0 ELSE 1 END as Niv_Mutuelle2,
            CASE WHEN NIV_MUTUELLE[3] IS NULL THEN 0 ELSE 1 END as Niv_Mutuelle3,
            PART_MUT_SAL,
            PART_MUT_SAL_Option
        FROM B_EMPLOYE
        WHERE PK_FP IS NOT NULL
    """)

    print(' Transferring...')
    #for row in fmsCursor:
    insert = """INSERT INTO Identite (
        PK_FP,
        LieuNaissance,
        PaysNaissance,
        Nationalite,
        PaysResidence,
        PersonneAPrevenir,
        Tel1_PAP,
        Tel2_PAP,
        CategorieMutuelle,
        Ep1_MUTUELLE,
        TypeMutuelle,
        NiveauMutuelle,
        NiveauMutuelle2,
        NiveauMutuelle3,
        PartMutuelleSalarie,
        PartMutuelleSalarieOption
    ) VALUES (
        %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s
    )"""
    values = fmsCursor.fetchall()
    mariadbCursor.executemany(insert, values)
    mariaConnector.commit()

    print(' Inserted ' + str(len(values)) + ' values')
    return len(values)
And the part where I retrieve first name and last name:
def updateEmployeeNames(mariaConnector, mssqlConnector):
    print("Updating employee names...")
    mariadbCursor = mariaConnector.cursor()
    mssqlCursor = mssqlConnector.cursor()

    mssqlCursor.execute("SELECT Name, LastName, PK_FP FROM F_Person")
    rows = mssqlCursor.fetchall()
    query = """
        UPDATE Identite
        SET Name = %s, LastName = %s
        WHERE PK_FP = %s
    """
    mariadbCursor.executemany(query, rows)
    mariadbConnector.commit()
As you might have guessed, the first function takes almost no time to execute (less than 2 seconds), whereas the second one takes almost 20.
Python's not my strong suit, but there might be another way; the aim is to make it much faster.
I already tried appending the name values to each of createIdentite's tuples before the executemany, but the MySQL connector won't let me do that.
Thanks a lot for your help.
So the UPDATE to the existing MariaDB table is the bottleneck, in which case it might be faster to do the update on a pandas DataFrame and then push the result the MariaDB table using pandas to_sql method. A simplified example would be ...
df_main = pd.read_sql_query(fms_query, fms_engine, index_col='PK_FP')
df_mssql = pd.read_sql_query(mssql_query, mssql_engine, index_col='PK_FP')
df_main.update(df_mssql)
df_main.to_sql('Identite', mariadb_engine, if_exists='replace',
               dtype={'PK_FP': sqlalchemy.types.String(50)})
... where fms_query and mssql_query are the queries from your question. fms_engine, mssql_engine, and mariadb_engine would be SQLAlchemy Engine objects.
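A rough sketch of building one such engine (the driver, credentials, and database name below are placeholders, not taken from the question):
import sqlalchemy

# pymysql is just one possible MySQL/MariaDB driver; any SQLAlchemy-supported driver works
mariadb_engine = sqlalchemy.create_engine("mysql+pymysql://user:password@localhost:3306/mydb")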
In all MySQL Python drivers, executemany is rewritten, since bulk operations are not supported by MySQL. They are supported only via the binary protocol in MariaDB since 10.2; full support (including delete and update) was added later and is available in the latest 10.2, 10.3 and 10.4 versions of MariaDB Server.
The Python driver rewrites an insert query: it iterates over the rows and transforms the statement into
INSERT INTO t1 VALUES (row1_id, row1_data), (row2_id, row2_data), ... (rowN_id, rowN_data)
This is quite fast, but the SQL syntax doesn't allow it for UPDATE or DELETE. In those cases the driver needs to execute the statement n times (n = number of rows), passing the values for each row in a single statement.
The MariaDB binary protocol allows preparing the statement and executing it by sending all the data at once (the execute packet also contains the data).
If C would be an alternative, take a look at the bulk unit tests in the GitHub repository of MariaDB Connector/C. Otherwise you'll have to wait; MariaDB will likely release its own Python driver next year.
Create the index as you create the temp table.
These combined statements work: CREATE TABLE ... SELECT ...; and INSERT INTO table ... SELECT .... However, they may be difficult to perform from Python.
It is unclear whether you need the temp table at all.
Learn how to use JOIN to get information simultaneously from two tables.
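For the first point, a sketch based on the question's own DDL: the separate CREATE INDEX call can be folded into the CREATE TABLE statement (and since PK_FP is already the primary key, the extra index is arguably redundant anyway):
mariadbCursor.execute("""
    CREATE TABLE Identite (
        PK_FP VARCHAR(50) NOT NULL,
        -- ... the remaining columns from the question ...
        PRIMARY KEY (PK_FP),
        INDEX IdentitePK_FP (PK_FP)
    )
""")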

SQLite with Python "Table has X columns but Y were supplied"

I have a python script that executes some simple SQL.
c.execute("CREATE TABLE IF NOT EXISTS simpletable (id integer PRIMARY KEY, post_body text, post_id text, comment_id text, url text);")
command = "INSERT OR IGNORE INTO simpletable VALUES ('%s', '%s', '%s', '%s')" % (comments[-1].post_body, comments[-1].post_id, comments[-1].comment_id,
comments[-1].url)
c.execute(command)
c.commit()
But when I execute it, I get an error
sqlite3.OperationalError: table simpletable has 5 columns but 4 values were supplied
Why is it not automatically filling in the id key?
In Python 3.6 I did as shown below and the data was inserted successfully.
I used None for the auto-incrementing ID, since Python has no NULL literal.
conn.execute("INSERT INTO CAMPAIGNS VALUES (?, ?, ?, ?)", (None, campaign_name, campaign_username, campaign_password))
The ID structure is as follows.
ID INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL
If you don't specify the target columns, VALUES is expected to provide values for all columns, and that is not what you did.
INSERT OR IGNORE INTO simpletable
            (post_body,
             post_id,
             comment_id,
             url)
VALUES ('%s',
        '%s',
        '%s',
        '%s');
Specifying the target columns is advisable in any case. The query won't break if, for any reason, the order of the columns in the table changes.
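A minimal corrected version of the question's insert, naming the columns and letting sqlite3 bind the parameters instead of formatting them into the string (c is the same connection as in the question):
c.execute(
    "INSERT OR IGNORE INTO simpletable (post_body, post_id, comment_id, url) "
    "VALUES (?, ?, ?, ?)",
    (comments[-1].post_body, comments[-1].post_id,
     comments[-1].comment_id, comments[-1].url))
c.commit()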
Try to specify the column names, so that the destination of the values doesn't depend on their order.
ex:
INTO simpletable
    (post_body,
     post_id,
     comment_id,
     url)
And if you want the id column to be incremented automatically, make sure it has an identity property or your DBMS's equivalent auto-increment feature (in SQLite, an INTEGER PRIMARY KEY column already gets a rowid assigned automatically).
ex (SQL Server-style syntax):
CREATE TABLE IF NOT EXISTS simpletable (id integer PRIMARY KEY Identity(1,1),
and remember that your script only creates the table; it is not prepared to alter the structure of an existing one.
If your code is correct but you still get this error, delete your SQLite file (name.db) and run your code again; sometimes that solves the problem.
Imagine this is your code:
cursor.execute('''CREATE TABLE IF NOT EXISTS food(name TEXT , price TEXT)''')
cursor.execute('INSERT INTO food VALUES ("burger" , "20")')
connection.commit()
and you see an error like this:
table has 1 column but 2 values were supplied
This happens because, for example, you first created the file with a one-column table and later changed your code to use two columns, but kept the same file name; CREATE TABLE IF NOT EXISTS does not overwrite the existing table, so the old schema stays in effect.
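If deleting the whole database file is too drastic, an alternative sketch is to drop just the offending table before recreating it (note that this also wipes that table's data):
cursor.execute("DROP TABLE IF EXISTS food")
cursor.execute('''CREATE TABLE IF NOT EXISTS food(name TEXT , price TEXT)''')
connection.commit()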

Pandas to_sql fails on duplicate primary key

I'd like to append to an existing table, using pandas df.to_sql() function.
I set if_exists='append', but my table has primary keys.
I'd like to do the equivalent of insert ignore when trying to append to the existing table, so I would avoid a duplicate entry error.
Is this possible with pandas, or do I need to write an explicit query?
There is unfortunately no option to specify "INSERT IGNORE". This is how I got around that limitation, inserting only the rows that were not duplicates (the dataframe name is df):
from sqlalchemy.exc import IntegrityError

for i in range(len(df)):
    try:
        df.iloc[i:i+1].to_sql(name="Table_Name", if_exists='append', con=Engine)
    except IntegrityError:
        pass  # or any other action
You can do this with the method parameter of to_sql:
from sqlalchemy.dialects.mysql import insert

def insert_on_duplicate(table, conn, keys, data_iter):
    insert_stmt = insert(table.table).values(list(data_iter))
    on_duplicate_key_stmt = insert_stmt.on_duplicate_key_update(insert_stmt.inserted)
    conn.execute(on_duplicate_key_stmt)

df.to_sql('trades', dbConnection, if_exists='append', chunksize=4096, method=insert_on_duplicate)
for older versions of sqlalchemy, you need to pass a dict to on_duplicate_key_update. i.e., on_duplicate_key_stmt = insert_stmt.on_duplicate_key_update(dict(insert_stmt.inserted))
Please note that if_exists='append' relates to whether the table itself exists and what to do if it does not.
if_exists says nothing about the content of the table.
See the doc here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_sql.html
if_exists : {‘fail’, ‘replace’, ‘append’}, default ‘fail’
fail: If table exists, do nothing.
replace: If table exists, drop it, recreate it, and insert data.
append: If table exists, insert data. Create if does not exist.
Pandas has no option for it currently, but here is the Github issue. If you need this feature too, just upvote for it.
The for-loop method above slows things down significantly. There's a method parameter you can pass to pandas.DataFrame.to_sql to customize the insert query:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_sql.html#pandas.DataFrame.to_sql
The code below should work for Postgres and does nothing if there's a conflict with the primary key "unique_code". Change the insert dialect for your DB.
def insert_do_nothing_on_conflicts(sqltable, conn, keys, data_iter):
    """
    Execute SQL statement inserting data

    Parameters
    ----------
    sqltable : pandas.io.sql.SQLTable
    conn : sqlalchemy.engine.Engine or sqlalchemy.engine.Connection
    keys : list of str
        Column names
    data_iter : Iterable that iterates the values to be inserted
    """
    from sqlalchemy.dialects.postgresql import insert
    from sqlalchemy import table, column

    columns = []
    for c in keys:
        columns.append(column(c))

    if sqltable.schema:
        table_name = '{}.{}'.format(sqltable.schema, sqltable.name)
    else:
        table_name = sqltable.name

    mytable = table(table_name, *columns)

    insert_stmt = insert(mytable).values(list(data_iter))
    do_nothing_stmt = insert_stmt.on_conflict_do_nothing(index_elements=['unique_code'])
    conn.execute(do_nothing_stmt)

df.to_sql('mytable', con=sql_engine, method=insert_do_nothing_on_conflicts)
Pandas doesn't support editing the actual SQL syntax of the .to_sql method, so you might be out of luck. There's some experimental programmatic workarounds (say, read the Dataframe to a SQLAlchemy object with CALCHIPAN and use SQLAlchemy for the transaction), but you may be better served by writing your DataFrame to a CSV and loading it with an explicit MySQL function.
CALCHIPAN repo: https://bitbucket.org/zzzeek/calchipan/
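A rough sketch of that CSV route, using MySQL's LOAD DATA with the IGNORE keyword so duplicate-key rows are skipped (conn is assumed to be an open MySQL connection, trades a placeholder table name, and LOCAL INFILE must be enabled on both client and server):
df.to_csv('/tmp/trades.csv', index=False, header=False)
cur = conn.cursor()
cur.execute(
    "LOAD DATA LOCAL INFILE '/tmp/trades.csv' "
    "IGNORE INTO TABLE trades "
    "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'")
conn.commit()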
I had trouble where I was still getting the IntegrityError
...strange but I just took the above and worked it backwards:
for i, row in df.iterrows():
    sql = "SELECT * FROM `Table_Name` WHERE `key` = '{}'".format(row.Key)
    found = pd.read_sql(sql, con=Engine)
    if len(found) == 0:
        df.iloc[i:i+1].to_sql(name="Table_Name", if_exists='append', con=Engine)
In my case, I was trying to insert new data into an empty table, but some of the rows were duplicated, almost the same issue as here. I could have fetched the existing data, merged it with the new data, and continued, but that is not optimal and may work only for small data sets, not huge tables.
As pandas does not provide any handling for this situation right now, I was looking for a suitable workaround and made my own. I'm not sure whether it will work for you, but I decided to control my data first rather than trust to luck: I remove duplicates before calling .to_sql, so if any error happens, I know more about my data and what is going on:
import pandas as pd

def write_to_table(table_name, data):
    # Sort by price first, so drop_duplicates keeps only the lowest-priced row per key
    data.sort(key=lambda row: row['price'])
    df = pd.DataFrame(data)
    df.drop_duplicates(subset=['id_key'], keep='first', inplace=True)
    df.to_sql(table_name, engine, index=False, if_exists='append', schema='public')
So in my case, I wanted to keep only the lowest price per row (by the way, I was passing a list of dicts as data), and for that I sorted first. The sort isn't strictly necessary, but it's an example of what I mean by controlling which data I want to keep.
I hope this helps someone in almost the same situation as mine.
When you use SQL Server you'll get a SQL error when you enter a duplicate value into a table that has a primary key constraint. You can fix it by altering your table:
CREATE TABLE [dbo].[DeleteMe](
[id] [uniqueidentifier] NOT NULL,
[Value] [varchar](max) NULL,
CONSTRAINT [PK_DeleteMe]
PRIMARY KEY ([id] ASC)
WITH (IGNORE_DUP_KEY = ON)); <-- add
Taken from https://dba.stackexchange.com/a/111771.
Now your df.to_sql() should work again.
The solutions by Jayen and Huy Tran helped me a lot, but they didn't work straight out of the box. The problem I faced with Jayen's code is that it requires the DataFrame columns to be exactly those of the database table. This was not true in my case, as there were some DataFrame columns that I won't write to the database.
I modified the solution so that it considers the column names.
from sqlalchemy.dialects.mysql import insert
import itertools

def insertWithConflicts(sqltable, conn, keys, data_iter):
    """
    Execute SQL statement inserting data, whilst taking care of conflicts
    Used to handle duplicate key errors during database population
    This is my modification of the code snippet
    from https://stackoverflow.com/questions/30337394/pandas-to-sql-fails-on-duplicate-primary-key
    The help page from https://docs.sqlalchemy.org/en/14/core/dml.html#sqlalchemy.sql.expression.Insert.values
    proved useful.

    Parameters
    ----------
    sqltable : pandas.io.sql.SQLTable
    conn : sqlalchemy.engine.Engine or sqlalchemy.engine.Connection
    keys : list of str
        Column names
    data_iter : Iterable that iterates the values to be inserted. It is a zip object.
        The length of it is equal to the chunk size passed in df.to_sql()
    """
    vals = [dict(zip(z[0], z[1])) for z in zip(itertools.cycle([keys]), data_iter)]
    insertStmt = insert(sqltable.table).values(vals)
    doNothingStmt = insertStmt.on_duplicate_key_update(dict(insertStmt.inserted))
    conn.execute(doNothingStmt)
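Usage then mirrors the earlier answers, for example (dbConnection and the chunk size as in Huy Tran's answer):
df.to_sql('trades', dbConnection, if_exists='append', chunksize=4096, method=insertWithConflicts)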
I faced the same issue and I adopted the solution provided by #Huy Tran for a while, until my tables started to have schemas.
I had to improve his answer a bit and this is the final result:
# Imports added for completeness; on_conflict_do_nothing assumes the PostgreSQL
# (or SQLite) dialect's insert construct.
from sqlalchemy import table, column
from sqlalchemy.dialects.postgresql import insert

def do_nothing_on_conflicts(sql_table, conn, keys, data_iter):
    """
    Execute SQL statement inserting data

    Parameters
    ----------
    sql_table : pandas.io.sql.SQLTable
    conn : sqlalchemy.engine.Engine or sqlalchemy.engine.Connection
    keys : list of str
        Column names
    data_iter : Iterable that iterates the values to be inserted
    """
    columns = []
    for c in keys:
        columns.append(column(c))

    if sql_table.schema:
        my_table = table(sql_table.name, *columns, schema=sql_table.schema)
        # table_name = '{}.{}'.format(sql_table.schema, sql_table.name)
    else:
        my_table = table(sql_table.name, *columns)
        # table_name = sql_table.name

    # my_table = table(table_name, *columns)

    insert_stmt = insert(my_table).values(list(data_iter))
    do_nothing_stmt = insert_stmt.on_conflict_do_nothing()
    conn.execute(do_nothing_stmt)
How to use it:
history.to_sql('history', schema=schema, con=engine, method=do_nothing_on_conflicts)
The idea is the same as Nfern's, but this uses a recursive function that divides the df in half at each iteration to skip the row or rows causing the integrity violation.
def insert(df):
    try:
        # inserting into backup table
        df.to_sql("table", con=engine, if_exists='append', index=False, schema='schema')
    except:
        rows = df.shape[0]
        if rows > 1:
            df1 = df.iloc[:int(rows/2), :]
            df2 = df.iloc[int(rows/2):, :]
            insert(df1)
            insert(df2)
        else:
            print(f"{df} not inserted. Integrity violation, duplicate primary key/s")

SQLite insert or ignore and return original _rowid_

I've spent some time reading the SQLite docs, various questions and answers here on Stack Overflow, and this thing, but have not come to a full answer.
I know that there is no way to do something like INSERT OR IGNORE INTO foo VALUES(...) with SQLite and get back the rowid of the original row, and that the closest to it would be INSERT OR REPLACE but that deletes the entire row and inserts a new row and thus gets a new rowid.
Example table:
CREATE TABLE foo(
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    data TEXT
);
Right now I can do:
sql = sqlite3.connect(":memory:")
# create database
sql.execute("INSERT OR IGNORE INTO foo(data) VALUES(?);", ("Some text.", ))
the_id_of_the_row = None
for row in sql.execute("SELECT id FROM foo WHERE data = ?", ("Some text.", )):
    the_id_of_the_row = row[0]
But something ideal would look like:
the_id_of_the_row = sql.execute("INSERT OR IGNORE INTO foo(data) VALUES(?)", ("Some text", )).lastrowid
What is the best (read: most efficient) way to insert a row into a table and return the rowid, or to ignore the row if it already exists and just get the rowid? Efficiency is important because this will be happening quite often.
Is there a way to INSERT OR IGNORE and return the rowid of the row that the ignored row was compared to? This would be great, as it would be just as efficient as an insert.
The way that worked best for me was to insert or ignore the values, and then select the rowid, in two separate steps. I used a unique constraint on the data column both to speed up selects and to avoid duplicates.
sql.execute("INSERT OR IGNORE INTO foo(data) VALUES(?);", ("Some text.", ))
last_row_id = sql.execute("SELECT id FROM foo WHERE data = ?;", ("Some text.", )).fetchone()[0]
The select statement isn't as slow as I thought it would be. This, it seems, is because SQLite automatically creates an index for unique columns.
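Wrapped up as a small helper, the two steps from this answer might look like this (a self-contained sketch assuming the UNIQUE constraint on data mentioned above):
import sqlite3

sql = sqlite3.connect(":memory:")
sql.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY AUTOINCREMENT, data TEXT UNIQUE)")

def get_or_create_id(conn, value):
    # Insert the value if it is not there yet, then return the row's id either way.
    conn.execute("INSERT OR IGNORE INTO foo(data) VALUES (?)", (value,))
    return conn.execute("SELECT id FROM foo WHERE data = ?", (value,)).fetchone()[0]

print(get_or_create_id(sql, "Some text."))  # inserts, prints 1
print(get_or_create_id(sql, "Some text."))  # row already exists, prints 1 again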
INSERT OR IGNORE is for situations where you do not care about the identity of the record; where the goal is only to have some record with that specific value.
If you want to know whether a new record is inserted or not, you have to check by hand:
the_id_of_the_row = None
for row in sql.execute("SELECT id FROM foo WHERE data = ?", ...):
    the_id_of_the_row = row[0]
if the_id_of_the_row is None:
    c = sql.cursor()
    c.execute("INSERT INTO foo(data) VALUES(?)", ...)
    the_id_of_the_row = c.lastrowid
As for efficiency: when SQLite checks the data column for duplicates, it has to do exactly the same query that you're doing with the SELECT, and once you've done that, the access path is in the cache, so performance should not be a problem. In any case, it is necessary to execute two separate INSERT/SELECT queries (in either order; both your code and mine work, but yours is simpler).
