Load data infile updating table on duplicate values from csv - python

I'm trying to update a MySQL table from my CSV data: each sha1 in the CSV should either insert a new row or, on a duplicate, update suggested with the CSV's suggestedName. What am I doing wrong here? It gives me this error:
ProgrammingError: 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'where sha1=#col1' at line 1
Here is my table structure:
date_sourced, sha1, suggested, vsdt, trendx, falcon, notes, mtf
CSV structure:
SHA1,suggestedName
Code:
import mysql.connector
mydb = mysql.connector.connect(user='root', password='',
host='localhost',database='jeremy_db')
cursor = mydb.cursor()
query = "LOAD DATA INFILE %s IGNORE INTO TABLE jeremy_table_test FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n' IGNORE 1 LINES (#col1,#col2) set suggested=#col2 where sha1=#col1"
cursor.execute(query, (fullPath))
mydb.commit()

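LOAD DATA INFILE does not accept a WHERE condition. You can instead read the file with pandas and insert the values into the table, but you need to set up a unique index on sha1 in advance. Otherwise my script will not work, because ON DUPLICATE KEY UPDATE only fires when an insert collides with a unique index or primary key.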
import pandas as pd
import mysql.connector as mysql
path = "1.xls"
df = pd.read_excel(path)
_sha1 = df["SHA1"].tolist()
_suggestedName = df["suggestedName"].tolist()
conn = mysql.connect(user="xx",passwd="xx",db="xx")
cur = conn.cursor()
sql = """INSERT INTO jeremy_table_test (sha1,suggested) VALUES (%s,%s) ON DUPLICATE KEY UPDATE suggested=VALUES(suggested)"""
try:
    cur.executemany(sql,list(zip(_sha1,_suggestedName)))
    conn.commit()
except Exception as e:
    conn.rollback()
    raise e
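The upsert pattern above can be tried end to end without a MySQL server. The sketch below is a stand-in, not the answer's code: it uses Python's built-in sqlite3 module, whose counterpart to ON DUPLICATE KEY UPDATE is ON CONFLICT(...) DO UPDATE (this assumes SQLite 3.24 or newer), and it reads the CSV with the csv module instead of pandas. The sample hashes, names, and the notes value are made up.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content with the same header as in the question.
CSV_TEXT = "SHA1,suggestedName\naaa111,first_name\nbbb222,second_name\n"


def upsert_suggested(conn, csv_file):
    """Insert each (sha1, suggested) pair, or update suggested if sha1 exists."""
    reader = csv.DictReader(csv_file)
    rows = [(r["SHA1"], r["suggestedName"]) for r in reader]
    conn.executemany(
        "INSERT INTO jeremy_table_test (sha1, suggested) VALUES (?, ?) "
        "ON CONFLICT(sha1) DO UPDATE SET suggested = excluded.suggested",
        rows,
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
# The UNIQUE constraint on sha1 is what makes the conflict clause fire.
conn.execute(
    "CREATE TABLE jeremy_table_test (sha1 TEXT UNIQUE, suggested TEXT, notes TEXT)"
)
# Pre-existing row whose suggested name should be overwritten.
conn.execute(
    "INSERT INTO jeremy_table_test (sha1, suggested, notes) VALUES (?, ?, ?)",
    ("aaa111", "old_name", "keep me"),
)
conn.commit()

processed = upsert_suggested(conn, io.StringIO(CSV_TEXT))
result = conn.execute(
    "SELECT sha1, suggested, notes FROM jeremy_table_test ORDER BY sha1"
).fetchall()
conn.close()
print(processed, result)
```

The existing row keeps its other columns and only suggested changes, while the unseen sha1 is inserted as a new row. Without the UNIQUE constraint every CSV line would simply be inserted again.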

Related

Uploading data with psycopg2 and python

With the following commands I am trying to upload a CSV file whose columns are separated by tabs and where a column can sometimes hold a null value.
conn = psycopg2.connect(host="localhost",
port="5432",
user="postgres",
password="somepwd",
database="mydb",
options="-c search_path=dbo")
...
cur = conn.cursor()
with open(opath, "r") as opath_file:
    next(opath_file) # skip the header row
    cur.copy_from(opath_file, table_name[3:], null='', columns=cols.split(','))
cols has a string with the column names separated by ','
the table with name table_name[3:] belongs to the dbo schema
This code runs, no error is reported but no data is uploaded. The owner of the db is postgres.
Any ideas?
Would you believe me if I said the problem was that I needed to run
conn.commit()
after the cur.copy_from call?
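The effect of a missing commit can be reproduced without PostgreSQL. The following is a minimal sketch that swaps in Python's built-in sqlite3 module for psycopg2 (an assumption made only so the example runs anywhere); the table name demo and the sample rows are hypothetical. It writes rows, closes the connection with or without committing, and then counts what a fresh connection can see.

```python
import os
import sqlite3
import tempfile

ROWS = [("a", "1"), ("b", "2")]  # made-up sample data


def load_rows(db_path, commit):
    """Insert ROWS and close the connection, committing only if asked to."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS demo (name TEXT, value TEXT)")
    conn.commit()
    conn.executemany("INSERT INTO demo VALUES (?, ?)", ROWS)
    if commit:
        conn.commit()
    conn.close()  # closing without a commit discards the pending inserts


def count_rows(db_path):
    """Open a fresh connection and count what was actually stored."""
    conn = sqlite3.connect(db_path)
    n = conn.execute("SELECT COUNT(*) FROM demo").fetchall()[0][0]
    conn.close()
    return n


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "demo.db")
    load_rows(db_path, commit=False)
    without_commit = count_rows(db_path)
    load_rows(db_path, commit=True)
    with_commit = count_rows(db_path)

print(without_commit, with_commit)
```

The first load reports no error and still leaves the table empty, which matches the symptom described in the question.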

Running select query on db for different variables using python

I am using Python to establish a DB connection and read a CSV file. For each line in the CSV I want to run a PostgreSQL query and get the value corresponding to that line.
The DB connection and file reading are working fine. If I run the query with a hardcoded value, it also works fine. But if I run the query for each row in the CSV file using a Python variable, I do not get the correct value.
cursor.execute("select team from users.teamdetails where p_id = '123abc'")
The above query works fine.
But when I try it for multiple values fetched from the CSV file, I do not get the correct value.
cursor.execute("select team from users.teamdetails where p_id = queryPID")
Complete code for Reference:
import psycopg2
import csv
conn = psycopg2.connect(dbname='', user='', password='', host='', port='')
cursor = conn.cursor()
with open('playerid.csv','r') as csv_file:
    csv_reader = csv.reader(csv_file)
    for line in csv_reader:
        queryPID = line[0]
        cursor.execute("select team from users.teamdetails where p_id = queryPID")
        team = cursor.fetchone()
        print (team[0])
conn.close()
DO NOT concatenate the csv data. Use a parameterised query.
Use %s inside your string, then pass the additional variable:
cursor.execute('select team from users.teamdetails where p_id = %s', (queryPID,))
Concatenation of text leaves your application vulnerable to SQL injection.
https://www.psycopg.org/docs/usage.html
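In the question's code the name queryPID sits inside the SQL string, so the database receives the literal text queryPID rather than the variable's value. The sketch below shows the parameterised loop in runnable form. It is an illustration under assumptions: it uses Python's built-in sqlite3 module, whose placeholder is ? where psycopg2 uses %s, and the table contents and CSV text are made up.

```python
import csv
import io
import sqlite3

# Hypothetical stand-ins for playerid.csv and users.teamdetails.
CSV_TEXT = "123abc\n456def\n"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teamdetails (p_id TEXT PRIMARY KEY, team TEXT)")
conn.executemany(
    "INSERT INTO teamdetails VALUES (?, ?)",
    [("123abc", "red"), ("456def", "blue")],
)

teams = []
for line in csv.reader(io.StringIO(CSV_TEXT)):
    query_pid = line[0]
    # The value travels separately from the SQL text; note the one-element tuple.
    cur = conn.execute("SELECT team FROM teamdetails WHERE p_id = ?", (query_pid,))
    row = cur.fetchone()
    # fetchone() returns None when no row matches, so guard before indexing.
    teams.append(row[0] if row is not None else None)

conn.close()
print(teams)
```

Guarding against a None result also avoids a TypeError when the CSV holds an id that is not in the table.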

Store XML File into MS SQL DB using Python

My MSSQL DB table contains following structure:
create table TEMP
(
MyXMLFile XML
)
Using Python, I am trying to load a locally stored .XML file into the MS SQL DB (no XML parsing required).
Following is Python code:
import pyodbc
import xlrd
import xml.etree.ElementTree as ET
print("Connecting..")
# Establish a connection between Python and SQL Server
conn = pyodbc.connect('Driver={SQL Server};'
'Server=TEST;'
'Database=test;'
'Trusted_Connection=yes;')
print("DB Connected..")
# Get XMLFile
XMLFilePath = open('C:HelloWorld.xml')
# Create Table in DB
CreateTable = """
create table test.dbo.TEMP
(
XBRLFile XML
)
"""
# execute create table
cursor = conn.cursor()
try:
    cursor.execute(CreateTable)
    conn.commit()
except pyodbc.ProgrammingError:
    pass
print("Table Created..")
InsertQuery = """
INSERT INTO test.dbo.TEMP (
XBRLFile
) VALUES (?)"""
# Assign values from each row
values = (XMLFilePath)
# Execute SQL Insert Query
cursor.execute(InsertQuery, values)
# Commit the transaction
conn.commit()
# Close the database connection
conn.close()
But the code is storing the XML path in the MyXMLFile column and not the XML file. I looked at the lxml library and other tutorials, but I did not find a straightforward approach to storing the file.
Can anyone please help me with it? I have just started working with Python.
Here is a solution that loads the .XML file directly into the MS SQL DB using Python.
import pyodbc
from lxml import etree  # assumption: the etree calls below come from lxml
print("Connecting..")
# Establish a connection between Python and SQL Server
conn = pyodbc.connect('Driver={SQL Server};'
                      'Server=TEST;'
                      'Database=test;'
                      'Trusted_Connection=yes;')
print("DB Connected..")
# Get XMLFile
XMLFilePath = open('C:HelloWorld.xml')
x = etree.parse(XMLFilePath) # Updated Code line
with open("FileName", "wb") as f: # Updated Code line
    f.write(etree.tostring(x)) # Updated Code line
# Create Table in DB
CreateTable = """
create table test.dbo.TEMP
(
XBRLFile XML
)
"""
# execute create table
cursor = conn.cursor()
try:
    cursor.execute(CreateTable)
    conn.commit()
except pyodbc.ProgrammingError:
    pass
print("Table Created..")
InsertQuery = """
INSERT INTO test.dbo.TEMP (
XBRLFile
) VALUES (?)"""
# Assign values from each row
values = etree.tostring(x) # Updated Code line
# Execute SQL Insert Query
cursor.execute(InsertQuery, values)
# Commit the transaction
conn.commit()
# Close the database connection
conn.close()
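The core of the fix is to pass the file's contents, not the path or the open file object, as the query parameter. A minimal sketch of that idea follows. It uses Python's built-in sqlite3 module and a TEXT column as a stand-in for pyodbc and the XML column (an assumption so that it runs without SQL Server), and the file content and the table name temp_xml are hypothetical.

```python
import os
import sqlite3
import tempfile

XML_TEXT = "<greeting><to>World</to></greeting>"  # made-up document

with tempfile.TemporaryDirectory() as tmp:
    xml_path = os.path.join(tmp, "HelloWorld.xml")
    with open(xml_path, "w", encoding="utf-8") as f:
        f.write(XML_TEXT)

    # Read the file's contents; passing the path or the file object
    # would store the wrong thing.
    with open(xml_path, "r", encoding="utf-8") as f:
        xml_content = f.read()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp_xml (xbrl_file TEXT)")
# The parameter is a one-element tuple holding the XML text itself.
conn.execute("INSERT INTO temp_xml (xbrl_file) VALUES (?)", (xml_content,))
conn.commit()
stored = conn.execute("SELECT xbrl_file FROM temp_xml").fetchone()[0]
conn.close()
print(stored == XML_TEXT)
```

Since no parsing is required, reading the file as text is enough here; a parser is only needed if the document should be validated or normalised first.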

Python PYODBC Insert Into SQL Server DB with Parameters

I am currently trying to use pyodbc to insert data from a .csv into an Azure SQL Server database. I found a majority of this syntax on Stack Overflow, however for some reason I keep getting one of two different errors.
1) Whenever I use the following code, I get an error that states 'The SQL contains 0 parameter markers, but 7 parameters were supplied'.
import pyodbc
import csv
cnxn = pyodbc.connect('driver', user='username', password='password', database='database')
cnxn.autocommit = True
cursor = cnxn.cursor()
csvfile = open('CSV File')
csv_data = csv.reader(csvfile)
SQL="insert into table([Col1],[Col2],[Col3],[Col4],[Col5],[Col6],[Col7]) values ('?','?','?','?','?','?','?')"
for row in csv_data:
    cursor.execute(SQL, row)
    time.sleep(1)
cnxn.commit()
cnxn.close()
2) In order to get rid of that error, I am defining the parameter markers by adding '=?' to each of the columns in the insert statement (see code below), however this then gives the following error: ProgrammingError: ('42000'"[42000] [Microsoft] [ODBC SQL Server Driver][SQL Server] Incorrect syntax near '=').
import pyodbc
import csv
cnxn = pyodbc.connect('driver', user='username', password='password', database='database')
cnxn.autocommit = True
cursor = cnxn.cursor()
csvfile = open('CSV File')
csv_data = csv.reader(csvfile)
SQL="insert into table([Col1]=?,[Col2]=?,[Col3]=?,[Col4]=?,[Col5]=?,[Col6]=?,[Col7]=?) values ('?','?','?','?','?','?','?')"
for row in csv_data:
    cursor.execute(SQL, row)
    time.sleep(1)
cnxn.commit()
cnxn.close()
This is the main error I am having trouble with. I have searched all over Stack Overflow and can't seem to find a solution. I know this error is probably very trivial, but I am new to Python and would greatly appreciate any advice or help.
Since SQL Server can import your entire CSV file with a single statement, this is a reinvention of the wheel.
BULK INSERT my_table FROM 'CSV_FILE'
WITH ( FIELDTERMINATOR=',', ROWTERMINATOR='\n');
If you want to persist with using python, just execute the above query with pyodbc!
If you would still prefer to execute thousands of statements instead of just one
SQL="insert into table([Col1],[Col2],[Col3],[Col4],[Col5],[Col6],[Col7]) values (?,?,?,?,?,?,?)"
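Note that the ' surrounding each ? shouldn't be there: a quoted '?' is a string literal, not a parameter marker, which is why the driver counted 0 markers.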
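The difference between bare and quoted markers can be checked locally. The sketch below is an illustration under assumptions: it uses Python's built-in sqlite3 module, which shares the ? marker style with pyodbc, and the table demo, the three column names, and the rows are made up.

```python
import sqlite3

COLUMNS = ["Col1", "Col2", "Col3"]           # hypothetical column names
ROWS = [("a", "b", "c"), ("d", "e", "f")]    # stand-in for the csv.reader rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (Col1 TEXT, Col2 TEXT, Col3 TEXT)")

# One bare ? per column: "?,?,?"
markers = ",".join("?" for _ in COLUMNS)
good_sql = "INSERT INTO demo ({}) VALUES ({})".format(",".join(COLUMNS), markers)
conn.executemany(good_sql, ROWS)
conn.commit()
inserted = conn.execute("SELECT COUNT(*) FROM demo").fetchone()[0]

# Quoted '?' are string literals, so the statement has zero markers.
bad_sql = "INSERT INTO demo (Col1, Col2, Col3) VALUES ('?','?','?')"
try:
    conn.execute(bad_sql, ROWS[0])
    quoted_error = None
except sqlite3.ProgrammingError as exc:
    quoted_error = str(exc)

conn.close()
print(inserted, quoted_error)
```

Using executemany also sends all rows in one call instead of looping over execute.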
# creating column list for insertion
colsInsert = "["+"],[".join([str(i) for i in mydata.columns.tolist()]) +']'
# Insert DataFrame records one by one.
for i,row in mydata.iterrows():
    # pyodbc parameter markers are bare ? characters
    sql = "INSERT INTO Test (" +colsInsert + ") VALUES (" + "?,"*(len(row)-1) + "?)"
    cursor.execute(sql, tuple(row))
# the connection is not autocommitted by default, so we must commit to save our changes
c.commit()

Insert data from file into database

I have a .sql file with multiple insert statements (1000+) and I want to run the statements in this file against my Oracle database.
For now, I'm using Python with ODBC to connect to my database with the following:
import pyodbc
from ConfigParser import SafeConfigParser
def db_call(self, cfgFile, sql):
    parser = SafeConfigParser()
    parser.read(cfgFile)
    dsn = parser.get('odbc', 'dsn')
    uid = parser.get('odbc', 'user')
    pwd = parser.get('odbc', 'pass')
    try:
        con = pyodbc.connect('DSN=' + dsn + ';PWD=' + pwd + ';UID=' + pwd)
        cur = con.cursor()
        cur.execute(sql)
        con.commit()
    except pyodbc.DatabaseError, e:
        print 'Error %s' % e
        sys.exit(1)
    finally:
        if con and cur:
            cur.close()
            con.close()
with open('theFile.sql','r') as f:
    cfgFile = 'c:\\dbinfo\\connectionInfo.cfg'
    #here goes the code to insert the contents into the database using db_call_many
    statements = f.read()
    db_call(cfgFile,statements)
But when I run it I receive the following error:
pyodbc.Error: ('HY000', '[HY000] [Oracle][ODBC][Ora]ORA-00911: invalid character\n (911) (SQLExecDirectW)')
But the file contains nothing except statements like this:
INSERT INTO table (movie,genre) VALUES ('moviename','horror');
Edit
Adding print '<{}>'.format(statements) before the db_call(cfgFile,statements) call, I get this result (100+ statements):
<INSERT INTO table (movie,genre) VALUES ('moviename','horror');INSERT INTO table (movie,genre) VALUES ('moviename_b','horror');INSERT INTO table (movie,genre) VALUES ('moviename_c','horror');>
Thanks for your time on reading this.
Now it's somewhat clarified: you have a lot of separate SQL statements such as INSERT INTO table (movie,genre) VALUES ('moviename','horror');
Then you're effectively after something like cur.executescript() rather than the current approach. (That method is a sqlite3 extension rather than part of the DB API, and I have no idea whether pyodbc offers an equivalent.) Is there any reason you can't just run the file against the database itself?
When you read a file using the read() function, the newline (\n) at the end of the file is read too. I think you should use db_call(cfgFile,statements[:-1]) to strip that trailing newline.
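One likely reading of the error is that the whole file, semicolons included, reaches Oracle as a single statement. If that is the case, stripping only the final newline would not be enough, and a simple workaround is to split the script and execute each statement on its own. The sketch below shows that approach under assumptions: it runs against Python's built-in sqlite3 module instead of Oracle over pyodbc, the table name movies is hypothetical, and split_statements is an illustrative helper, not a library function.

```python
import sqlite3

# Same shape as the file contents printed in the question; the table name
# movies is hypothetical.
SCRIPT = (
    "INSERT INTO movies (movie,genre) VALUES ('moviename','horror');"
    "INSERT INTO movies (movie,genre) VALUES ('moviename_b','horror');"
    "INSERT INTO movies (movie,genre) VALUES ('moviename_c','horror');\n"
)


def split_statements(script):
    """Naive split on ';'. Breaks if a string literal contains a semicolon."""
    return [part.strip() for part in script.split(";") if part.strip()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (movie TEXT, genre TEXT)")

statements = split_statements(SCRIPT)
cur = conn.cursor()
for statement in statements:
    cur.execute(statement)  # one statement per call, no trailing ';'
conn.commit()

stored = conn.execute("SELECT COUNT(*) FROM movies").fetchone()[0]
conn.close()
print(len(statements), stored)
```

For files whose values may contain semicolons, a proper SQL tokenizer or the database's own command-line client is the safer choice.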
