CSV to MSSQL using pymssql - python

My goal is to continuously check my CSV for new records and insert them into MSSQL using the pymssql library.
The CSV initially has 244 rows. I add one row at a time and want only the new row to be inserted each time the scheduler runs the script.
The script runs every 15 seconds. The first run inserts the values, but the second run throws 'Cannot insert duplicate key in object' because my first column, DateID, is set as the PK. The statement terminates at the very first record, so the new row is never inserted.
How do I overcome this?
Code:
def trial():
    try:
        for row in df.itertuples():
            datevalue = datetime.datetime.strptime(row.OrderDate, format)
            query= "INSERT INTO data (OrderDate, Region, City, Category) VALUES (%s,%s,%s,%s)"
            cursor.execute(query, (datevalue, row.Region,row.City,row.Category))
        print('"Values inserted')
        conn.commit()
        conn.close()
    except Exception as e:
        print("Handle error", e)
        pass
schedule.every(15).seconds.do(trial)
Library used: pymssql
SQL: MSSQL server 2019

To avoid duplicate values, consider adjusting the query to use an EXCEPT clause (part of the same set-operator family as UNION and INTERSECT) against the data already in the table. Also consider using executemany, passing a nested list of all row/column data built with DataFrame.to_numpy().tolist().
By the way, if the OrderDate column is a datetime type in both the data frame and the database table, you do not need to re-format the value.
def trial():
    try:
        query= (
            "INSERT INTO data (OrderDate, Region, City, Category) "
            "SELECT %s, %s, %s, %s "
            "EXCEPT "
            "SELECT OrderDate, Region, City, Category "
            "FROM data"
        )
        vals = df[["OrderDate", "Region", "City", "Category"]].to_numpy()
        vals = tuple(map(tuple, vals))
        cur.executemany(query, vals)
        print('Values inserted')
        conn.commit()
    except Exception as e:
        print("Handle error", e)
    finally:
        cur.close()
        conn.close()
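To see the EXCEPT idea working without a SQL Server instance, here is a minimal sketch that uses the standard library's sqlite3 as a stand-in. That is an assumption made purely for illustration: sqlite3 uses ? placeholders where pymssql uses %s, and the sample rows and the load helper are hypothetical. Running the load twice shows that rows already in the table are skipped rather than raising an error.

```python
import sqlite3

# Stand-in database (assumption): sqlite3 instead of SQL Server / pymssql.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (OrderDate TEXT, Region TEXT, City TEXT, Category TEXT)"
)

# Same shape as the query above, with sqlite3's "?" placeholders.
INSERT_NEW = (
    "INSERT INTO data (OrderDate, Region, City, Category) "
    "SELECT ?, ?, ?, ? "
    "EXCEPT "
    "SELECT OrderDate, Region, City, Category FROM data"
)


def load(rows):
    """Insert only rows not already present; return the table's row count."""
    conn.executemany(INSERT_NEW, rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM data").fetchone()[0]


# Hypothetical sample rows standing in for the CSV contents.
rows = [
    ("2021-01-01", "East", "Boston", "Bars"),
    ("2021-01-02", "West", "Los Angeles", "Cookies"),
]
first = load(rows)  # both rows are new

# Second scheduled run: the CSV now has one extra row.
rows.append(("2021-01-03", "East", "New York", "Crackers"))
second = load(rows)  # only the extra row is inserted

print(first, second)
```

The same two-run behaviour is what the scheduled script relies on: every run can pass the whole CSV, and only unseen rows reach the table.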
For a faster bulk insert, consider loading into a staging temp table first:
# CREATE EMPTY TEMP TABLE
query = "SELECT TOP 0 OrderDate, Region, City, Category INTO #pydata FROM data"
cur.execute(query)
# INSERT INTO TEMP TABLE
query= (
"INSERT INTO #pydata (OrderDate, Region, City, Category) "
"VALUES (%s, %s, %s, %s) "
)
vals = df[["OrderDate", "Region", "City", "Category"]].to_numpy()
vals = tuple(map(tuple, vals))
cur.execute("BEGIN TRAN")
cur.executemany(query, vals)
# MIGRATE TO FINAL TABLE
query= (
"INSERT INTO data (OrderDate, Region, City, Category) "
"SELECT OrderDate, Region, City, Category "
"FROM #pydata "
"EXCEPT "
"SELECT OrderDate, Region, City, Category "
"FROM data"
)
cur.execute(query)
conn.commit()
print("Values inserted")

Related

MySQL and Python error: while trying to insert a row into my table from Tkinter widgets (Entry, ComboBox, Radio Button), I'm getting an error

While trying to insert a row into my table from Tkinter widgets (Entry, ComboBox, Radio Button), I'm getting the following error:
ERROR 1136(21S01): Column count doesn't match value count at row 1.
Column names:
Course,
U_Id,
Subject,
Years,
Semester,
Student_Names,
Roll_No,
Gender,
DOB,
Email,
Mobile,
Address,
Photo
where U_Id is auto increment,
and values:
crsVar.get(), sbVar.get(), yrsVar.get(), smVar.get(),nVar.get(), rollVar.get(), genVar.get(), dVar.get(), eVar.get(), mobVar.get(), adVar.get(), rdVar.get()
Please help me out, this is my code
try:
    conn = mysql.connector.connect(host="localhost", username="root",
                                   password="Sahil#12", database="attendancesystem")
    c = conn.cursor()
    c.execute('insert into `students_detail` values(crsVar.get(), sbVar.get(),
               yrsVar.get(), smVar.get(), nVar.get(), rollVar.get(),
               genVar.get(), dVar.get(), eVar.get(), mobVar.get(),
               adVar.get(), rdVar.get()))
    conn.commit()
    conn.close()
    messagebox.showinfo("Success", "Students details has been submitted",
                        parent=self.master)
except Exception as e:
    messagebox.showerror("Error", f"Due to {str(e)}")
The issue here is that you have 12 values and 13 columns because of the auto-incremented U_Id. Your best option is to specify the columns of the table explicitly, and to pass the values as query parameters rather than writing the Python expressions inside the SQL string, i.e.:
c.execute('insert into `students_detail` (Course, Subject, Years, ...) values (%s, %s, %s, ...)',
          (crsVar.get(), sbVar.get(), yrsVar.get(), ...))
(Note that you do NOT include the U_Id field in either the columns or the values.)
There are alternatives (see "How to insert new row to database with AUTO_INCREMENT column without specifying column names?"); however, specifying the column/value pairs is a more robust solution.
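As a rough illustration of listing the columns explicitly, the sketch below builds the INSERT statement from the twelve column names given in the question, leaving out U_Id. The build_insert helper is hypothetical, and the pairing of each Tkinter variable with a column is an assumption based only on the order in which the question lists them.

```python
# Column names from the question, minus the auto-incremented U_Id.
COLUMNS = [
    "Course", "Subject", "Years", "Semester", "Student_Names", "Roll_No",
    "Gender", "DOB", "Email", "Mobile", "Address", "Photo",
]


def build_insert(table, columns):
    """Return an INSERT statement with one %s placeholder per column.

    Hypothetical helper: table and column names must come from trusted
    code, never from user input.
    """
    column_list = ", ".join(f"`{name}`" for name in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO `{table}` ({column_list}) VALUES ({placeholders})"


sql = build_insert("students_detail", COLUMNS)
print(sql)

# Usage with the question's widgets (assumed order, not executed here):
# values = (crsVar.get(), sbVar.get(), yrsVar.get(), smVar.get(),
#           nVar.get(), rollVar.get(), genVar.get(), dVar.get(),
#           eVar.get(), mobVar.get(), adVar.get(), rdVar.get())
# c.execute(sql, values)
```

Because the placeholder count is derived from the column list, the two can no longer drift apart, which is exactly what error 1136 complains about.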
Try it this way (use %s placeholders to avoid SQL injection):
c.execute("INSERT INTO students_detail "
          "(value1, value2, value3, value4, value5) "
          "VALUES (%s, %s, %s, %s, %s) ",
          (
              widget[0].get(),
              widget[1].get(),
              widget[2].get(),
              widget[3].get(),
              widget[4].get(),
          ))  # widget[n].get() is only an example; replace the column names and values with your own

importing single .csv into mysql with python

When running this code I get the error "Error while connecting to MySQL Not all parameters were used in the SQL statement".
I have also tried to ingest the file with another technique (shown further down).
import mysql.connector as msql
from mysql.connector import Error
import pandas as pd
empdata = pd.read_csv('path_to_file', index_col=False, delimiter = ',')
empdata.head()
try:
    conn = msql.connect(host='localhost', user='test345',
                        password='test123')
    if conn.is_connected():
        cursor = conn.cursor()
        cursor.execute("CREATE DATABASE timetheft")
        print("Database is created")
except Error as e:
    print("Error while connecting to MySQL", e)
try:
    conn = msql.connect(host='localhost', database='timetheft', user='test345', password='test123')
    if conn.is_connected():
        cursor = conn.cursor()
        cursor.execute("select database();")
        record = cursor.fetchone()
        print("You're connected to database: ", record)
        cursor.execute('DROP TABLE IF EXISTS company;')
        print('Creating table....')
        create_company_table = """
        CREATE TABLE company ( ID VARCHAR(40) PRIMARY KEY,
        Company_Name VARCHAR(40),
        Country VARCHAR(40),
        City VARCHAR(40),
        Email VARCHAR(40),
        Industry VARCHAR(30),
        Employees VARCHAR(30)
        );
        """
        cursor.execute(create_company_table)
        print("Table is created....")
        for i,row in empdata.iterrows():
            sql = "INSERT INTO timetheft.company VALUES (%S, %S, %S, %S, %S,%S,%S,%S)"
            cursor.execute(sql, tuple(row))
            print("Record inserted")
            # the connection is not auto committed by default, so we must commit to save our changes
            conn.commit()
except Error as e:
    print("Error while connecting to MySQL", e)
The second technique I tried:
LOAD DATA LOCAL INFILE 'path_to_file'
INTO TABLE company
FIELDS TERMINATED BY ';'
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;
This worked better, but with many errors: only 20% of the rows were ingested.
Finally, here is an excerpt from the .csv (the data is consistent throughout all 1K rows):
"ID";"Company_Name";"Country";"City";"Email";"Industry";"Employees"
217520699;"Enim Corp.";"Germany";"Bamberg";"posuere#diamvel.edu";"Internet";"51-100"
352428999;"Lacus Vestibulum Consulting";"Germany";"Villingen-Schwenningen";"egestas#lacusEtiambibendum.org";"Food Production";"100-500"
371718299;"Dictum Ultricies Ltd";"Germany";"Anklam";"convallis.erat#sempercursus.co.uk";"Primary/Secondary Education";"100-500"
676789799;"A Consulting";"Germany";"Andernach";"massa#etrisusQuisque.ca";"Government Relations";"100-500"
718526699;"Odio LLP";"Germany";"Eisenhüttenstadt";"Quisque.varius#euismod.org";"E-Learning";"11-50"
I fixed these issues to get the code to work:
The number of placeholders in the insert statement must equal the number of columns.
The placeholders should be lower-case '%s'.
The cell delimiter appears to be a semicolon, not a comma.
For simply reading a csv with ~1000 rows Pandas is overkill (and iterrows seems not to behave as you expect). I've used the csv module from the standard library instead.
import csv
...
sql = "INSERT INTO company VALUES (%s, %s, %s, %s, %s, %s, %s)"
with open("67359903.csv", "r", newline="") as f:
    reader = csv.reader(f, delimiter=";")
    # Skip the header row.
    next(reader)
    # For large files it may be more efficient to commit
    # rows in batches.
    cursor.executemany(sql, reader)
    conn.commit()
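The comment about committing in batches can be sketched roughly as follows. This is only an illustration: the batched helper and the batch size of 2 are hypothetical, and the sample text is an in-memory copy of the first rows of the excerpt so the snippet runs without a file or a database. It also shows that parsing with a semicolon delimiter yields seven fields per row, matching the seven placeholders.

```python
import csv
import io
from itertools import islice

# In-memory copy of the header and first three rows of the excerpt.
SAMPLE = (
    '"ID";"Company_Name";"Country";"City";"Email";"Industry";"Employees"\n'
    '217520699;"Enim Corp.";"Germany";"Bamberg";"posuere#diamvel.edu";"Internet";"51-100"\n'
    '352428999;"Lacus Vestibulum Consulting";"Germany";"Villingen-Schwenningen";'
    '"egestas#lacusEtiambibendum.org";"Food Production";"100-500"\n'
    '371718299;"Dictum Ultricies Ltd";"Germany";"Anklam";'
    '"convallis.erat#sempercursus.co.uk";"Primary/Secondary Education";"100-500"\n'
)


def batched(rows, size):
    """Yield lists of at most `size` rows until the iterator is exhausted."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


reader = csv.reader(io.StringIO(SAMPLE), delimiter=";")
next(reader)  # skip the header row
batches = list(batched(reader, 2))

for chunk in batches:
    # With a real connection each chunk would be sent and committed:
    # cursor.executemany(sql, chunk)
    # conn.commit()
    print(len(chunk), "rows,", len(chunk[0]), "fields each")
```

Committing per chunk keeps each transaction small; the right batch size depends on the table and the server, so treat the value here as a placeholder.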
If using the csv module is not convenient, the dataframe's itertuples method may be used to iterate over the data:
empdata = pd.read_csv('67359903.csv', index_col=False, delimiter=';')
for tuple_ in empdata.itertuples(index=False):
    cursor.execute(sql, tuple_)
conn.commit()
Or the dataframe can be dumped to the database directly.
import sqlalchemy as sa
engine = sa.create_engine('mysql+mysqlconnector:///test')
empdata.to_sql('company', engine, index=False, if_exists='replace')

SQL Injection using Python

I have the following problem: I need a dynamic CREATE statement that depends on the attributes of my object.
This is the object:
class Table:
    columns = []
    def __init__(self, name, columns):
        self.columns = columns
        self.name = name
    def columnsNumber(self) -> int:
        return self.columns.__len__()
This is what the insert looks like:
sql = "INSERT INTO tableOverview (tableName, columns, datum) VALUES(%s, %s, CURRENT_TIMESTAMP);"
val = (table.name, table.columns.__len__())
await cursor.execute(sql, (val))
for x in table.columns:
    sql = "ALTER TABLE %s ADD COLUMN %s VARCHAR(100) UNIQUE " % (table.name,x)
    await cursor.execute(sql)
Now I don't know how to prevent SQL injection.
For the ALTER TABLE statements you can quote the identifier names with backticks as described here.
for x in table.columns:
    sql = "ALTER TABLE `%s` ADD COLUMN `%s` VARCHAR(100) UNIQUE " % (table.name,x)
    await cursor.execute(sql)
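One caveat with backtick quoting is that a name which itself contains a backtick can still break out of the quotes, so embedded backticks need to be doubled (MySQL's escape for a backtick inside a quoted identifier). The sketch below is a hedged illustration of that, not part of the original answer: quote_identifier, checked_identifier and build_alter are hypothetical helpers, and the stricter allow-list pattern is an assumption about which names you want to permit.

```python
import re


def quote_identifier(name):
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    if not name or "\x00" in name:
        raise ValueError("invalid identifier")
    return "`" + name.replace("`", "``") + "`"


# Assumed allow-list: letters, digits and underscores, not starting with a digit.
_PLAIN_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def checked_identifier(name):
    """Stricter variant: reject anything that is not a plain name, then quote."""
    if not _PLAIN_NAME.fullmatch(name):
        raise ValueError(f"unexpected identifier: {name!r}")
    return quote_identifier(name)


def build_alter(table_name, column_name):
    """Build the ALTER TABLE statement from the question with quoted names."""
    return "ALTER TABLE %s ADD COLUMN %s VARCHAR(100) UNIQUE" % (
        quote_identifier(table_name),
        quote_identifier(column_name),
    )


hostile = "x` VARCHAR(1); DROP TABLE tableOverview; -- "
print(build_alter("tableOverview", "colour"))
print(build_alter("tableOverview", hostile))  # stays one quoted identifier
```

If the column names come from end users, the allow-list variant is the more conservative choice, since it rejects odd names outright instead of trying to quote them.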
In the insert statement, the code is already correctly using parameter substitution to ensure the inserted values are correctly quoted.
sql = "INSERT INTO tableOverview (tableName, columns, datum) VALUES(%s, %s, CURRENT_TIMESTAMP);"
val = (table.name, len(table.columns))
await cursor.execute(sql, val)

Python "INSERT INTO" vs. "INSERT INTO...ON DUPLICATE KEY UPDATE"

I am trying to use python to insert a record into a MySQL database and then update that record. To do this I have created 2 functions:
def insert_into_database():
    query = "INSERT INTO pcf_dev_D.users(user_guid,username) VALUES (%s, %s) "
    data = [('1234', 'user1234')]
    parser = ConfigParser()
    parser.read('db/db_config.ini')
    db = {}
    section = 'mysql'
    if parser.has_section(section):
        items = parser.items(section)
        for item in items:
            db[item[0]] = item[1]
    else:
        raise Exception('{0} not found in the {1} file'.format(section, 'db/db_config.ini'))
    try:
        conn = MySQLConnection(**db)
        cursor = conn.cursor()
        cursor.executemany(query, data)
        conn.commit()
    except Error as e:
        print('Error:', e)
    finally:
        # print("done...")
        cursor.close()
        conn.close()
This works fine and inserts 1234, user1234 into the db.
Now I want to update this particular user's username to '5678', so I have created another function:
def upsert_into_database():
    query = "INSERT INTO pcf_dev_D.users(user_guid,username) " \
            "VALUES (%s, %s) ON DUPLICATE KEY UPDATE username='%s'"
    data = [('1234', 'user1234', 'user5678')]
    parser = ConfigParser()
    parser.read('db/db_config.ini')
    db = {}
    section = 'mysql'
    if parser.has_section(section):
        items = parser.items(section)
        for item in items:
            db[item[0]] = item[1]
    else:
        raise Exception('{0} not found in the {1} file'.format(section, 'db/db_config.ini'))
    try:
        conn = MySQLConnection(**db)
        cursor = conn.cursor()
        cursor.executemany(query, data)
        conn.commit()
    except Error as e:
        print('Error:', e)
    finally:
        # print("done...")
        cursor.close()
        conn.close()
Which produces the following error:
Error: Not all parameters were used in the SQL statement
What's interesting is if I modify query and data to be:
query = "INSERT INTO pcf_dev_D.users(user_guid,username) " \
"VALUES (%s, %s) ON DUPLICATE KEY UPDATE username='user5678'"
data = [('1234', 'user1234')]
Then python updates the record just fine...what am I missing?
You enclosed the third parameter in single quotes in the update clause, so it is interpreted as part of a string literal, not as a placeholder for a parameter. You must not enclose a parameter in quotes:
query = "INSERT INTO pcf_dev_D.users(user_guid,username) " \
"VALUES (%s, %s) ON DUPLICATE KEY UPDATE username=%s"
UPDATE
If you want to use the on duplicate key update clause with a bulk insert (e.g. executemany()), then you should not provide any parameters in the update clause because you can only have one update clause in the bulk insert statement. Use the values() function instead:
query = "INSERT INTO pcf_dev_D.users(user_guid,username) " \
"VALUES (%s, %s) ON DUPLICATE KEY UPDATE username=VALUES(username)"
In assignment value expressions in the ON DUPLICATE KEY UPDATE clause, you can use the VALUES(col_name) function to refer to column values from the INSERT portion of the INSERT ... ON DUPLICATE KEY UPDATE statement. In other words, VALUES(col_name) in the ON DUPLICATE KEY UPDATE clause refers to the value of col_name that would be inserted, had no duplicate-key conflict occurred. This function is especially useful in multiple-row inserts. The VALUES() function is meaningful only in the ON DUPLICATE KEY UPDATE clause or INSERT statements and returns NULL otherwise.
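To make the bulk form concrete, here is a small sketch that assembles the statement with a VALUES(col) assignment for each column to update, so every row passed to executemany() carries only the values for the INSERT part. The build_upsert helper and the second data row are hypothetical, and the helper assumes table and column names are trusted constants.

```python
def build_upsert(table, columns, update_columns):
    """Build INSERT ... ON DUPLICATE KEY UPDATE using VALUES(col) references.

    Hypothetical helper; identifiers are assumed to be trusted constants.
    """
    column_list = ",".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    assignments = ", ".join(f"{col}=VALUES({col})" for col in update_columns)
    return (
        f"INSERT INTO {table}({column_list}) "
        f"VALUES ({placeholders}) ON DUPLICATE KEY UPDATE {assignments}"
    )


query = build_upsert("pcf_dev_D.users", ["user_guid", "username"], ["username"])

# One tuple per row, two values each: nothing extra for the update clause.
# The second row is made up for illustration.
data = [("1234", "user5678"), ("9999", "user9999")]

print(query)
# With a real connection (not executed here):
# cursor.executemany(query, data)
# conn.commit()
```

With this shape the number of placeholders always equals the length of each data tuple, which avoids the "Not all parameters were used" error from the question.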

MySQL not accepting executemany() INSERT, running Python from Excel (datanitro)

I HAVE ADDED MY OWN ANSWER THAT WORKS BUT OPEN TO IMPROVEMENTS
After seeing a project at datanitro, I took on getting a connection to MySQL (they use SQLite), and I was able to import a small test table into Excel from MySQL.
Inserting new or updated data from the Excel sheet was the next task, and so far I can get one row to work like so...
import MySQLdb
db = MySQLdb.connect("xxx","xxx","xxx","xxx")
c = db.cursor()
c.execute("""INSERT INTO users (id, username, password, userid, fname, lname)
VALUES (%s, %s, %s, %s, %s, %s);""",
(Cell(5,1).value,Cell(5,2).value,Cell(5,3).value,Cell(5,4).value,Cell(5,5).value,Cell(5,6).value,))
db.commit()
db.close()
...but attempts at multiple rows fail. I suspect issues while traversing rows in Excel. Here is what I have so far...
import MySQLdb
db = MySQLdb.connect(host="xxx.com", user="xxx", passwd="xxx", db="xxx")
c = db.cursor()
c.execute("select * from users")
usersss = c.fetchall()
updates = []
row = 2 # starting row
while True:
    data = tuple(CellRange((row,1),(row,6)).value)
    if data[0]:
        if data not in usersss: # new record
            updates.append(data)
        row += 1
    else: # end of table
        break
c.executemany("""INSERT INTO users (id, username, password, userid, fname, lname) VALUES (%s, %s, %s, %s, %s, %s)""", updates)
db.commit()
db.close()
...as of now, I don't get any errors, but my new line is not added (id 3). This is what my table looks like in Excel...
The database holds the same structure, minus id 3. There has to be a simpler way to traverse the rows and pull the unique content for INSERT, but after 6 hours trying different things (and 2 new Python books) I am going to ask for help.
If I run either...
print '[%s]' % ', '.join(map(str, updates))
or
print updates
my result is
[]
So this is likely not passing any data to MySQL in the first place.
LATEST UPDATE AND WORKING SCRIPT
Not exactly what I want, but this has worked for me...
c = db.cursor()
row = 2
while Cell(row,1).value != None:
    c.execute("""INSERT IGNORE INTO users (id, username, password, userid, fname, lname)
                 VALUES (%s, %s, %s, %s, %s, %s);""",
              (CellRange((row,1),(row,6)).value))
    row = row + 1
Here is your problem:
while True:
    if data[0]:
        ...
    else:
        break
Your first id is 0, so in the first iteration of the loop data[0] is falsy and the loop exits without ever adding any data. What you probably meant is:
while True:
    if data[0] is not None:
        ...
    else:
        break
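The difference between the two checks can be reproduced without Excel. In the sketch below the sheet is simulated with a plain list (an assumption for illustration: the real script reads CellRange values, and a row of None marks the end of the table). The collect helper is hypothetical.

```python
# Simulated sheet rows; the first id is 0, and a None row marks the end.
SHEET = [(0, "user0"), (1, "user1"), (2, "user2"), (None, None)]


def collect(rows, has_data):
    """Gather rows until has_data(id) reports the end of the table."""
    collected = []
    for data in rows:
        if has_data(data[0]):
            collected.append(data)
        else:
            break
    return collected


# Truthiness check: id 0 is falsy, so the loop stops on the first row.
by_truthiness = collect(SHEET, lambda value: bool(value))

# Explicit None check: only a missing id ends the loop.
by_none_check = collect(SHEET, lambda value: value is not None)

print(by_truthiness)
print(by_none_check)
```

The empty list from the truthiness check matches the empty updates list reported in the question.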
I ended up finding a solution that gets me an Insert on new and allows for UPDATE of those that are changed. Not exactly a Python selection based on a single query, but will do.
import MySQLdb
db = MySQLdb.connect("xxx","xxx","xxx","xxx")
c = db.cursor()
row = 2
while Cell(row,1).value is not None:
    c.execute("INSERT INTO users (id, username, password, \
        userid, fname, lname) \
        VALUES (%s, %s, %s, %s, %s, %s) \
        ON DUPLICATE KEY UPDATE \
        id=VALUES(id), username=VALUES(username), password=VALUES(password), \
        userid=VALUES(userid), fname=VALUES(fname), lname=VALUES(lname);",
        (CellRange((row,1),(row,6)).value))
    row = row + 1
db.commit()
db.close()
