Avoid duplicate data in mysql - python

I have a CSV file of stock data that is updated every day.
I want to load this data into a table, adding only the new rows each day.
This is my code:
# -*- coding: utf-8 -*-
import csv
import mysql.connector
from datetime import datetime

cnx = mysql.connector.connect(host='localhost',
                              user='root',
                              passwd='pass',
                              db='stock')
cursor = cnx.cursor()
cursor.execute("""CREATE TABLE IF NOT EXISTS stock(id INT AUTO_INCREMENT,
    name VARCHAR(50), day DATE UNIQUE, open FLOAT, high FLOAT, low FLOAT,
    close FLOAT, vol FLOAT, PRIMARY KEY(id))""")

a = 0
with open("file path") as f:
    data = csv.reader(f)
    for row in data:
        if a == 0:
            a += 1  # skip the header row
        else:
            cursor.execute('''INSERT INTO stock(name,day,open,high,low,close,vol)
                VALUES("%s","%s","%s","%s","%s","%s","%s")''',
                (row[0], int(row[1]), float(row[2]), float(row[3]), float(row[4]),
                 float(row[5]), float(row[6])))
cnx.commit()
cnx.close()
But I cannot prevent duplicate rows from being inserted.

Assuming that you want to avoid duplicates on (name, day), one approach is to set a unique key on this pair of columns. You can then use INSERT IGNORE, or better yet the ON DUPLICATE KEY syntax, to skip the duplicate rows.
You create the table like:
create table if not exists stock(
    id int auto_increment,
    name varchar(50),
    day date,
    open float,
    high float,
    low float,
    close float,
    vol float,
    primary key(id),
    unique (name, day) -- unique constraint
);
Then:
insert into stock(name,day,open,high,low,close,vol)
values(%s, %s, %s, %s, %s, %s, %s)
on duplicate key update name = values(name) -- dummy update
Notes:
you should not have double quotes around the %s placeholders
your original create table code had a unique constraint on column day; this does not fit with your question, as I understood it. In any case, you should put the unique constraint on the column (or set of columns) on which you want to avoid duplicates.
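A runnable end-to-end sketch of the accepted approach (shown here with Python's built-in sqlite3 so it is self-contained; with mysql.connector the last clause would be ON DUPLICATE KEY UPDATE name = VALUES(name) and the placeholders %s instead of ?):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE stock(
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT, day DATE,
    open REAL, high REAL, low REAL, close REAL, vol REAL,
    UNIQUE (name, day))""")

rows = [
    ("ACME", "2020-07-08", 1.0, 2.0, 0.5, 1.5, 1000.0),
    ("ACME", "2020-07-08", 1.0, 2.0, 0.5, 1.5, 1000.0),  # duplicate (name, day)
    ("ACME", "2020-07-09", 1.5, 2.5, 1.0, 2.0, 900.0),
]
# sqlite's upsert clause; MySQL equivalent: ON DUPLICATE KEY UPDATE name = VALUES(name)
cur.executemany("""INSERT INTO stock(name,day,open,high,low,close,vol)
                   VALUES (?,?,?,?,?,?,?)
                   ON CONFLICT(name, day) DO NOTHING""", rows)
conn.commit()
cur.execute("SELECT COUNT(*) FROM stock")
print(cur.fetchone()[0])  # 2 -- the duplicate row was skipped
```

Re-running the whole script leaves the table unchanged, which is exactly the "only add new data every day" behavior the question asks for.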

Related

automatically insert only one row in table after calculating the sum of column

I have 3 tables in my database:
CREATE TABLE IF NOT EXISTS depances (
    id SERIAL PRIMARY KEY UNIQUE NOT NULL,
    type VARCHAR NOT NULL,
    nom VARCHAR,
    montant DECIMAL(100,2) NOT NULL,
    date DATE,
    temp TIME)

CREATE TABLE IF NOT EXISTS transactions (
    id SERIAL PRIMARY KEY UNIQUE NOT NULL,
    montant DECIMAL(100,2),
    medecin VARCHAR,
    patient VARCHAR,
    acte VARCHAR,
    date_d DATE,
    time_d TIME,
    users_id INTEGER)

CREATE TABLE IF NOT EXISTS total_jr (
    id SERIAL PRIMARY KEY UNIQUE NOT NULL,
    total_revenu DECIMAL(100,2),
    total_depance DECIMAL(100,2),
    total_différence DECIMAL(100,2),
    date DATE)
My idea is to insert values into the depances and transactions tables through a GUI interface,
and then store the SUM of depances.montant in total_jr.total_depance
and the SUM of transactions.montant in total_jr.total_revenu, grouping rows that share the same date.
That's the easy part, using this code:
self.cur.execute('''SELECT SUM(montant) AS totalsum FROM depances WHERE date = %s''', (date,))
result = self.cur.fetchall()
for i in result:
    o = i[0]
    self.cur_t = self.connection.cursor()
    self.cur_t.execute('''INSERT INTO total_jr(total_depance)
                          VALUES (%s)''', (o,))
    self.connection.commit()
    self.cur.execute('''UPDATE total_jr SET total_depance = %s WHERE date = %s''', (o, date))
    self.connection.commit()
But every time it adds a new row to total_jr.
How can I write the value of SUM(montant) into the existing row for that date, instead of adding a new row each time?
The result should look like this:
id|total_revenu |total_depance|total_différence|date
--+-------------+-------------+----------------+----------
 1|sum(montant1)|value        |value           |08/07/2020
 2|sum(montant2)|value        |value           |08/09/2020
 3|sum(montant3)|value        |value           |08/10/2020
but it only gives me this result:
id|total_revenu|total_depance|total_différence|date
--+------------+-------------+----------------+----------
 1|           1|value        |value           |08/07/2020
 2|           2|value        |value           |08/07/2020
 3|           3|value        |value           |08/07/2020
Any idea or hint would be helpful.
You didn't mention which DBMS or SQL module you're using so I'm guessing MySQL.
In your process, run the UPDATE first and check how many rows were changed. If zero rows changed, then insert a new row for that date.
self.cur.execute('''SELECT SUM(montant) AS totalsum FROM depances WHERE date = %s''', (date,))
result = self.cur.fetchall()
for i in result:
    o = i[0]
    self.cur.execute('''UPDATE total_jr SET total_depance = %s WHERE date = %s''', (o, date))
    rowcnt = self.cur.rowcount  # number of rows updated - psycopg2
    self.connection.commit()
    if rowcnt == 0:  # no rows updated, need to insert new row
        self.cur_t = self.connection.cursor()
        self.cur_t.execute('''INSERT INTO total_jr(total_depance, date)
                              VALUES (%s, %s)''', (o, date))
        self.connection.commit()
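The update-first logic can be condensed into a small helper (sketched with sqlite3 in memory so the example runs anywhere; cursor.rowcount behaves the same way in psycopg2):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE total_jr (id INTEGER PRIMARY KEY, total_depance REAL, date TEXT)")

def upsert_total(cur, total, date):
    cur.execute("UPDATE total_jr SET total_depance = ? WHERE date = ?", (total, date))
    if cur.rowcount == 0:  # no row for that date yet -> insert one
        cur.execute("INSERT INTO total_jr(total_depance, date) VALUES (?, ?)", (total, date))

upsert_total(cur, 100.0, "2020-07-08")  # first call inserts
upsert_total(cur, 250.0, "2020-07-08")  # second call updates the same row
cur.execute("SELECT COUNT(*), MAX(total_depance) FROM total_jr")
print(cur.fetchone())  # (1, 250.0)
```

The table keeps exactly one row per date, which is the behavior the question asks for.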
I found a solution, for anyone who needs it in the future. First of all, we need to update the table:
create_table_total_jr = '''CREATE TABLE IF NOT EXISTS total_jr (
    id SERIAL PRIMARY KEY UNIQUE NOT NULL,
    total_revenu DECIMAL(100,2),
    total_depance DECIMAL(100,2),
    total_différence DECIMAL(100,2),
    date DATE UNIQUE)'''  # add UNIQUE to the date column
After that, we use an UPSERT with ON CONFLICT:
self.cur_t.execute('''INSERT INTO total_jr(date) VALUES (%s)
                      ON CONFLICT (date) DO NOTHING''', (date,))
self.connection.commit()
With this code, inserting a value with an existing date does nothing.
After that, we update the value of the SUM:
self.cur.execute( '''UPDATE total_jr SET total_depance = %s WHERE date = %s''',(o, date))
self.connection.commit()
Special thanks to Mike67 for his help
You do not need 2 database calls for this. As @Mike67 suggested, UPSERT functionality is what you want. However, you need to send both date and total_depance. In SQL that becomes:
insert into total_jr(date, total_depance)
values (date_value, total_value)
on conflict (date)
do update
set total_depance = excluded.total_depance;
or, if the input total_depance is just the transaction value while the table's total_depance is an accumulation:
insert into total_jr(date, total_depance)
values (date_value, total_value)
on conflict (date)
do update
set total_depance = total_depance + excluded.total_depance;
I believe your code then becomes something like (assuming the 1st insert is correct):
self.cur_t.execute('''INSERT INTO total_jr(date, total_depance) VALUES (%s, %s)
                      ON CONFLICT (date) DO UPDATE SET total_depance = excluded.total_depance''',
                   (date, total_depance))
self.connection.commit()
But that could be off; you will need to verify it.
Tip of the day: you should change the column name date to something else. date is a reserved word in both Postgres and the SQL standard, with predefined meanings based on context. While you may get away with using it as a column name, Postgres reserves the right to change that at any time without notice; unlikely, but still true. If that happened, your code (and most code using those tables) would fail, and tracking down why would be extremely difficult. Basic rule: do not use reserved words as data names; using reserved words as data or db object names is a bug just waiting to bite.
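A self-contained sketch of that single-statement upsert (using sqlite3 here, which shares the ON CONFLICT ... DO UPDATE ... excluded syntax with Postgres; with psycopg2 the placeholders would be %s instead of ?):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE total_jr (id INTEGER PRIMARY KEY, total_depance REAL, date TEXT UNIQUE)")

sql = """INSERT INTO total_jr(date, total_depance) VALUES (?, ?)
         ON CONFLICT(date) DO UPDATE SET total_depance = excluded.total_depance"""
cur.execute(sql, ("2020-07-08", 100.0))  # inserts a new row
cur.execute(sql, ("2020-07-08", 250.0))  # conflicts on date -> updates in place
cur.execute("SELECT total_depance FROM total_jr WHERE date = '2020-07-08'")
print(cur.fetchone()[0])  # 250.0
```

One statement replaces the separate UPDATE + rowcount check, and the UNIQUE constraint on date guarantees one row per day.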

insert row if not exist in database

Hello, how can I insert unique rows without duplicates?
cursor.execute("CREATE TABLE IF NOT EXISTS tab1 (id varchar(36) primary key, cap1 VARCHAR(4), cap2 varchar(55), cap3 int(6), Version VARCHAR(4));")
id = uuid.uuid1()
id = str(id)
cursor.execute("INSERT IGNORE INTO tab1 (id, cap1, cap2, cap3, Version) VALUES (%s, %s, %s, %s, %s )", (vals))
The third row should not be inserted, since it is the same as the first row.
Hope I'm clear.
Thank you in advance,
The problem is that uuid1() always gives a unique identifier, and since id is the primary key, the row gets inserted even when every other column is a duplicate; only the id column differs.
I think this link might answer your question; otherwise, create a unique index on the columns that you want to be unique.
Let me know if it helps!!
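A minimal sketch of the unique-index approach (using sqlite3 and INSERT OR IGNORE, the SQLite spelling of MySQL's INSERT IGNORE; the column names are taken from the question):

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE tab1 (id TEXT PRIMARY KEY,
               cap1 TEXT, cap2 TEXT, cap3 INTEGER, Version TEXT)""")
# unique index on the *content* columns, not on the always-unique uuid primary key
cur.execute("CREATE UNIQUE INDEX idx_content ON tab1(cap1, cap2, cap3, Version)")

row = ("AAAA", "some value", 123, "v1")
for _ in range(2):  # the second insert hits the unique index and is ignored
    cur.execute("INSERT OR IGNORE INTO tab1 VALUES (?,?,?,?,?)",
                (str(uuid.uuid1()),) + row)
cur.execute("SELECT COUNT(*) FROM tab1")
print(cur.fetchone()[0])  # 1
```

Because duplicates are now defined by the content columns rather than the generated id, the second identical row is silently skipped.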

make MariaDB update from Python much faster

I have a Python script that aggregates data from multiple sources into one, for technical reasons.
In this script, I create an employees table, fill it with data, and in a second step fetch each employee's first and last name from another data source. My code is the following.
Create the table and fill it with data:
def createIdentite(mariaConnector, fmsConnector):
    print('Creating table "Identite"...')
    mariadbCursor = mariaConnector.cursor()
    # verify we have the destination tables we need
    print('  Checking for table Identite...')
    if mariaCheckTableExists(mariaConnector, 'Identite') == False:
        print('  Table doesn\'t exist, creating it...')
        mariadbCursor.execute("""
            CREATE TABLE Identite (
                PK_FP VARCHAR(50) NOT NULL,
                LieuNaissance TEXT,
                PaysNaissance TEXT,
                Name TEXT,
                LastName TEXT,
                Nationalite TEXT,
                PaysResidence TEXT,
                PersonneAPrevenir TEXT,
                Tel1_PAP TEXT,
                Tel2_PAP TEXT,
                CategorieMutuelle TEXT,
                Ep1_MUTUELLE BOOLEAN,
                TypeMutuelle BOOLEAN,
                NiveauMutuelle BOOLEAN,
                NiveauMutuelle2 BOOLEAN,
                NiveauMutuelle3 BOOLEAN,
                PartMutuelleSalarie FLOAT,
                PartMutuelleSalarieOption FLOAT,
                PRIMARY KEY (PK_FP)
            )
        """)
        mariadbCursor.execute("CREATE INDEX IdentitePK_FP ON Identite(PK_FP)")
    else:
        # flush the table
        print('  Table exists, flushing it...')
        mariadbCursor.execute("DELETE FROM Identite")

    # now fill it with fresh data
    print('  Retrieving the data from FMS...')
    fmsCursor = fmsConnector.cursor()
    fmsCursor.execute("""
        SELECT
            PK_FP,
            Lieu_Naiss_Txt,
            Pays_Naiss_Txt,
            Nationalite_Txt,
            Pays_Resid__Txt,
            Pers_URG,
            Tel1_URG,
            Tel2_URG,
            CAT_MUTUELLE,
            CASE WHEN Ep1_MUTUELLE = 'OUI' THEN 1 ELSE 0 END as Ep1_MUTUELLE,
            CASE WHEN TYPE_MUT = 'OUI' THEN 1 ELSE 0 END as TYPE_MUT,
            CASE WHEN Niv_Mutuelle IS NULL THEN 0 ELSE 1 END as Niv_Mutuelle,
            CASE WHEN NIV_MUTUELLE[2] IS NULL THEN 0 ELSE 1 END as Niv_Mutuelle2,
            CASE WHEN NIV_MUTUELLE[3] IS NULL THEN 0 ELSE 1 END as Niv_Mutuelle3,
            PART_MUT_SAL,
            PART_MUT_SAL_Option
        FROM B_EMPLOYE
        WHERE PK_FP IS NOT NULL
    """)
    print('  Transferring...')
    insert = """INSERT INTO Identite (
        PK_FP,
        LieuNaissance,
        PaysNaissance,
        Nationalite,
        PaysResidence,
        PersonneAPrevenir,
        Tel1_PAP,
        Tel2_PAP,
        CategorieMutuelle,
        Ep1_MUTUELLE,
        TypeMutuelle,
        NiveauMutuelle,
        NiveauMutuelle2,
        NiveauMutuelle3,
        PartMutuelleSalarie,
        PartMutuelleSalarieOption
    ) VALUES (
        %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s
    )"""
    values = fmsCursor.fetchall()
    mariadbCursor.executemany(insert, values)
    mariaConnector.commit()
    print('  Inserted ' + str(len(values)) + ' values')
    return len(values)
And the part where I retrieve first name and last name:
def updateEmployeeNames(mariaConnector, mssqlConnector):
    print("Updating employee names...")
    mariadbCursor = mariaConnector.cursor()
    mssqlCursor = mssqlConnector.cursor()
    mssqlCursor.execute("SELECT Name, LastName, PK_FP FROM F_Person")
    rows = mssqlCursor.fetchall()
    query = """
        UPDATE Identite
        SET Name = %s, LastName = %s
        WHERE PK_FP = %s
    """
    mariadbCursor.executemany(query, rows)
    mariaConnector.commit()
As you might have guessed, the first function takes almost no time to execute (less than 2 seconds), while the second one takes almost 20.
Python's not my strong suit, but there might be a better way; the aim is to make it much faster.
I already tried adding the values to each of createIdentite's tuples before the executemany, but the MySQL connector won't let me do that.
Thanks a lot for your help.
So the UPDATE to the existing MariaDB table is the bottleneck, in which case it might be faster to do the update on a pandas DataFrame and then push the result to the MariaDB table using pandas' to_sql method. A simplified example would be ...
df_main = pd.read_sql_query(fms_query, fms_engine, index_col='PK_FP')
df_mssql = pd.read_sql_query(mssql_query, mssql_engine, index_col='PK_FP')
df_main.update(df_mssql)
df_main.to_sql('Identite', mariadb_engine, if_exists='replace',
               dtype={'PK_FP': sqlalchemy.types.String(50)})
... where fms_query and mssql_query are the queries from your question. fms_engine, mssql_engine, and mariadb_engine would be SQLAlchemy Engine objects.
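The key step is DataFrame.update, which aligns both frames on the index (PK_FP here) and overwrites matching cells in place; a tiny illustration with made-up values:

```python
import pandas as pd

# destination frame, names not yet filled in
df_main = pd.DataFrame({"Name": [None, None]},
                       index=pd.Index(["A1", "A2"], name="PK_FP"))
# source frame from the other database, only covers A1
df_mssql = pd.DataFrame({"Name": ["Alice"]},
                        index=pd.Index(["A1"], name="PK_FP"))

df_main.update(df_mssql)          # in-place, aligned on PK_FP
print(df_main.loc["A1", "Name"])  # Alice
```

Rows with no match in the source frame (A2 here) keep their original values, so the update is effectively a bulk left-join assignment done in memory.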
In all MySQL Python drivers, execute_many is rewritten, since bulk operations are not supported by MySQL. They are supported via the binary protocol in MariaDB since 10.2; full support (including DELETE and UPDATE) was added later and is available in the latest 10.2, 10.3 and 10.4 versions of MariaDB Server.
The Python driver rewrites an INSERT query by iterating over the rows and transforming the statement into
INSERT INTO t1 VALUES (row1_id, row1_data), (row2_id, row2_data), ..., (rown_id, rown_data)
This is quite fast, but the SQL syntax doesn't allow this for UPDATE or DELETE. In that case, the driver needs to execute the statement n times (n = number of rows), passing the values for each row in a single statement.
MariaDB's binary protocol allows preparing the statement once and executing it with all of the data sent at once (the execute packet also contains the data).
If C would be an alternative, take a look at the bulk unit tests in the MariaDB Connector/C repository on GitHub. Otherwise you will have to wait; MariaDB will likely release its own Python driver next year.
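The rewrite described above can be mimicked by hand to see why multi-row INSERT is cheap (a simplified sketch; real drivers also escape the values, and expand_insert is a hypothetical helper name):

```python
def expand_insert(table, columns, rows):
    """Mimic the driver's executemany() rewrite: one multi-row INSERT statement."""
    placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table, ", ".join(columns), ", ".join([placeholder] * len(rows)))
    params = [v for row in rows for v in row]  # flatten the row tuples
    return sql, params

sql, params = expand_insert("t1", ["id", "data"], [(1, "a"), (2, "b")])
print(sql)     # INSERT INTO t1 (id, data) VALUES (%s, %s), (%s, %s)
print(params)  # [1, 'a', 2, 'b']
```

No equivalent single-statement form exists for UPDATE in plain SQL, which is why the UPDATE path degrades to one round trip per row.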
Create the index as you create the temp table.
These combined statements work: CREATE TABLE ... SELECT ...; and INSERT INTO table ... SELECT .... However, they may be difficult to perform from Python.
It is unclear whether you need the temp table at all.
Learn how to use JOIN to get information simultaneously from two tables.
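For the last point, a small sketch of replacing the second pass of per-row UPDATEs with a single JOIN, which works once both data sets land in the same database (sqlite3 in memory here, with made-up rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Identite (PK_FP TEXT PRIMARY KEY, Nationalite TEXT)")
cur.execute("CREATE TABLE F_Person (PK_FP TEXT PRIMARY KEY, Name TEXT, LastName TEXT)")
cur.execute("INSERT INTO Identite VALUES ('A1', 'FR')")
cur.execute("INSERT INTO F_Person VALUES ('A1', 'Alice', 'Martin')")

# one JOIN instead of a second round of UPDATEs
cur.execute("""SELECT i.PK_FP, p.Name, p.LastName, i.Nationalite
               FROM Identite i JOIN F_Person p ON p.PK_FP = i.PK_FP""")
print(cur.fetchone())  # ('A1', 'Alice', 'Martin', 'FR')
```

In the question's setup the two sources live on different servers, so this only applies after both have been loaded into MariaDB.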

SQLite with Python "Table has X columns but Y were supplied"

I have a python script that executes some simple SQL.
c.execute("CREATE TABLE IF NOT EXISTS simpletable (id integer PRIMARY KEY, post_body text, post_id text, comment_id text, url text);")
command = "INSERT OR IGNORE INTO simpletable VALUES ('%s', '%s', '%s', '%s')" % (
    comments[-1].post_body, comments[-1].post_id, comments[-1].comment_id, comments[-1].url)
c.execute(command)
c.commit()
But when I execute it, I get an error
sqlite3.OperationalError: table simpletable has 5 columns but 4 values were supplied
Why is it not automatically filling in the id key?
In Python 3.6 I did as shown below, and the data was inserted successfully.
I used None for the auto-incrementing ID, since Python has no NULL keyword.
conn.execute("INSERT INTO CAMPAIGNS VALUES (?, ?, ?, ?)", (None, campaign_name, campaign_username, campaign_password))
The ID structure is as follows.
ID INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL
If you don't specify the target columns, VALUES is expected to provide values for all columns, and that you didn't do.
INSERT
OR IGNORE INTO simpletable
    (post_body,
     post_id,
     comment_id,
     url)
VALUES ('%s',
        '%s',
        '%s',
        '%s');
Specifying the target columns is advisable in any case: the query won't break if, for any reason, the order of the columns in the table changes.
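A runnable sketch of this advice, with an explicit column list, and also switching to ? placeholders, which is safer than % string formatting (the sample values are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("""CREATE TABLE IF NOT EXISTS simpletable
             (id INTEGER PRIMARY KEY, post_body TEXT, post_id TEXT,
              comment_id TEXT, url TEXT)""")

# explicit column list + ? placeholders: id is omitted and filled in automatically
c.execute("""INSERT OR IGNORE INTO simpletable
             (post_body, post_id, comment_id, url) VALUES (?, ?, ?, ?)""",
          ("hello", "p1", "c1", "http://example.com"))
conn.commit()
print(c.execute("SELECT id, post_body FROM simpletable").fetchone())  # (1, 'hello')
```

Because id is an INTEGER PRIMARY KEY and is left out of the column list, SQLite assigns it automatically.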
Try to specify the column names so that the destination of the values doesn't depend on column order.
ex:
INSERT INTO simpletable
    (post_body,
     post_id,
     comment_id,
     url)
And if you want the id column to be automatically incremented, make sure to add the Identity property, or your DBMS's equivalent auto-increment mechanism.
ex:
CREATE TABLE IF NOT EXISTS simpletable (id integer PRIMARY KEY Identity(1,1),
And remember: your script only creates the table if it doesn't exist; it is not prepared to alter an existing table's structure.
If your code is correct, delete your SQLite file (name.db) and run your code again; sometimes that solves the problem.
Imagine this is your code:
cursor.execute('''CREATE TABLE IF NOT EXISTS food(name TEXT , price TEXT)''')
cursor.execute('INSERT INTO food VALUES ("burger" , "20")')
connection.commit()
and you see an error like this:
table has 1 column but 2 values were supplied
This happens because, for example, you created the database file with a one-column table and then modified your code to use two columns; since the file still exists, CREATE TABLE IF NOT EXISTS does not overwrite the old table.

Saving a pandas dataframe into sqlite with different column names?

I have a sqlite database and a dataframe with different column names, but they refer to the same thing. E.g.
My database Cars has the car Id, Name and Price.
My dataframe df has the car Identity, Value and Name.
Additional: I would also like to add a 'date' column in the database that is not in the df, filled in based on the current date.
I would like to save the df in the database so that Id = Identity, Price = Value, Name = Name, and date = something specified by the user or the current date.
So I cannot do the usual df.to_sql (unless I rename the column names, but I wonder if there is a better way to do this).
I first tried to sync the names, just without the date column:
cur.execute("INSERT INTO Cars VALUES(?,?,?)", df.to_records(index=False))
However, the above does not work and gives me an error that the binding is incorrect. Plus, the order of the columns in the DB and the df is different.
I'm not even sure how to handle the part where I have the extra date column, so any help would be great. Below is sample code to generate all the values.
import sqlite3 as lite
con = lite.connect('test.db')
with con:
    cur = con.cursor()
    cur.execute("CREATE TABLE Cars(Id INT, Name TEXT, Price INT, date TEXT)")
df = pd.DataFrame({'Identity': range(5), 'Value': range(5,10), 'Name': range(10,15)})
You can do:
cur.executemany("INSERT INTO Cars (Id, Price, Name) VALUES(?,?,?)", list(df.to_records(index=False)))
Besides, you should specify the dtype attribute of your dataframe as numpy.int32 to meet the constraints of the 'Cars' table:
con = sqlite3.connect('test.db')
cur = con.cursor()
cur.execute("CREATE TABLE Cars(Id INT, Name TEXT, Price INT, date TEXT)")
df = pandas.DataFrame({'Identity': range(5), 'Value': range(5,10), 'Name': range(10,15)}, dtype=numpy.int32)
cur.executemany("INSERT INTO Cars (Id, Price, Name) VALUES(?,?,?)", list(df[['Identity', 'Value', 'Name']].to_records(index=False)))
query = "SELECT * from Cars"
cur.execute(query)
rows = cur.fetchall()
for row in rows:
    print(row)
