Python/MySQL - LOAD DATA LOCAL INFILE

I am using the mysql connector for Python and I'm trying to run the following SQL statement via Python (Windows) - It's a .csv file:
sql1 = ('SET GLOBAL local_infile = "ON";')
cursor.execute(sql1)
sql2 = ('LOAD DATA LOCAL INFILE "' + path[1:-1] + '" INTO TABLE mytable COLUMNS TERMINATED BY "," LINES TERMINATED BY "\\r\\n" (COL0, COL1, COL2, COL3, COL4, COL5, COL6) SET COL7 = "' + some_data + '";')
cursor.execute(sql2)
but when I try to execute I receive the following exception:
1148 (42000): The used command is not allowed with this MySQL version
If I try to execute LOAD DATA LOCAL INFILE on mysql console, everything runs fine.

LOAD DATA INFILE is disabled by default with Connector/Python.
While creating the connection, set the LOCAL_FILES client flag like this:
from mysql.connector.constants import ClientFlag
conn = mysql.connector.connect(...., client_flags=[ClientFlag.LOCAL_FILES])
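Put together, a minimal sketch (the connection values here are placeholders; recent Connector/Python versions also expose an allow_local_infile=True connect argument for the same purpose):

import mysql.connector
from mysql.connector.constants import ClientFlag

# Hypothetical connection values; the client flag enables LOCAL INFILE on the client side
conn = mysql.connector.connect(host='localhost', user='user', password='pass',
                               database='mydb',
                               client_flags=[ClientFlag.LOCAL_FILES])
cursor = conn.cursor()
cursor.execute('LOAD DATA LOCAL INFILE "data.csv" INTO TABLE mytable '
               'COLUMNS TERMINATED BY "," LINES TERMINATED BY "\\r\\n"')
conn.commit()

The server side must allow it too (local_infile=ON), which is what the SET GLOBAL statement in the question is for.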

There are a lot of security issues with LOAD DATA, so the server is really picky. Are you logging in to localhost, not the public IP of the server? Often one IP will be granted LOAD DATA, but the other won't.
See the fine manual

You could iterate through each line of the file, inserting each as a row. This would be easy since you already mentioned each column is delimited by , and each row is delimited by newlines.
For example, assuming your table mytable had 8 string columns (COL0 to COL7):
input_file = open(path[1:-1], 'r')
# Loop through the lines of the input file, inserting each as a row in mytable
for line_of_input_file in input_file:
    values_from_file = line_of_input_file.rstrip('\r\n').split(',')  # get the columns from the line read from the file
    if len(values_from_file) == 7:  # ensure that 7 columns are accounted for on this line of the file
        # let the driver quote and escape the values instead of concatenating them into the SQL string
        sql_insert_row = "INSERT INTO mytable VALUES (%s, %s, %s, %s, %s, %s, %s, %s)"
        cursor.execute(sql_insert_row, values_from_file + [some_data])
input_file.close()

With the MySQLdb driver:
import MySQLdb
from MySQLdb.constants import CLIENT
then, along with the other arguments to MySQLdb.connect(), pass client_flag=CLIENT.LOCAL_FILES.
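Put together, a minimal sketch (connection values are placeholders):

import MySQLdb
from MySQLdb.constants import CLIENT

# Hypothetical connection values; client_flag enables LOCAL INFILE for this session
conn = MySQLdb.connect(host='localhost', user='user', passwd='pass',
                       db='mydb', client_flag=CLIENT.LOCAL_FILES)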
Discovered by studying the source, and then trying it out.

Related

Jupyter - Using psycopg2 in Python and concatenating problems

I'm trying to concatenate the date and the entity code in order to query a database, but I can't get it to work. Here is my code. The line where my code works (without concatenating the date and entity code) is commented. Please help me change my code.
# PACKAGES
import os
import pandas as pd
import psycopg2 as pg2

entidadinput = "00022"
fechainput = "202202"
tabla = "admcrcd.v_rcd_anexo6"
variables = "*"
userRCD = os.getenv('JUPYTERHUB_USER')

con = pg2.connect(user=userRCD,
                  password="post",
                  host="172.XX.ABC.EF",
                  port="3456",
                  database="BDCreditos")
with con:
    cur = con.cursor()
    #cur.execute("SELECT " + variables + " from " + tabla + " WHERE fecha='202202'" + " AND cdg_emp='00022'") # This works
    cur.execute("SELECT " + variables + " from " + tabla + " WHERE fecha= '" + fechainput + "' AND cdg_emp='" + entidadinput + "'") # This doesn't work
    #cur.execute("SELECT * from admcrcd.v_rcd_anexo6 WHERE fecha=202204")
    version = cur.fetchone()[0]
    print(version)
    BD = cur.fetchall()
    #for row in rows:
    #    print(f"{row[0]} {row[1]} {row[2]}")
con.close()
First, the second line should be import os. I regenerated both query strings, and the only difference I found is that the non-working string has one extra space after fecha=, which normally should not cause a problem. Otherwise, the two strings are the same. You can try that (code below).
cur.execute("SELECT " + variables + " from " + tabla + " WHERE fecha='" + fechainput +"' AND cdg_emp=" + entidadinput) # This does not work
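As an aside, a parameterized version (a sketch; psycopg2 binds and quotes the values itself, which sidesteps quoting and spacing mistakes entirely):

# Let psycopg2 bind the values instead of concatenating them into the string
query = "SELECT " + variables + " FROM " + tabla + " WHERE fecha = %s AND cdg_emp = %s"
cur.execute(query, (fechainput, entidadinput))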

Convert the whole (large) schema into hdf5

I am trying to export a whole database schema (around 20 GB) via a PostgreSQL query in order to create a single final hdf5 file.
Because this doesn't fit in my computer's memory, I am using the chunksize argument.
First I use this function to establish the connection:
def make_connectstring(prefix, db, uname, passa, hostname, port):
    """return an sql connectstring"""
    connectstring = prefix + "://" + uname + ":" + passa + "@" + hostname + \
                    ":" + port + "/" + db
    return connectstring
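For illustration, calling it with hypothetical values yields a standard SQLAlchemy URL:

# Hypothetical values, shown only to illustrate the URL shape
url = make_connectstring("postgresql", "mydb", "alice", "secret", "localhost", "5432")
print(url)  # postgresql://alice:secret@localhost:5432/mydb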
Then I created a temporary folder to save each hdf5 chunk file.
def query_to_hdf5(connectstring, query, verbose=False, chunksize=50000):
    engine = sqlalchemy.create_engine(connectstring,
                                      server_side_cursors=True)
    # get the data into temp chunk files
    i = 0
    paths_chunks = []
    with tempfile.TemporaryDirectory() as td:
        for df in pd.read_sql_query(sql=query, con=engine, chunksize=chunksize):
            path = td + "/chunk" + str(i) + ".hdf5"
            df.to_hdf(path, key='data')
            print(path)
            if verbose:
                print("wrote", path)
            paths_chunks.append(path)
            i += 1

connectstring = make_connectstring(prefix, db, uname, passa, hostname, port)
query = "SELECT * FROM public.zz_ges"
df = query_to_hdf5(connectstring, query)
What is the best way to merge all these files into a single file that represents the whole dataframe?
I tried something like this:
df = pd.DataFrame()
for path in paths_chunks:
    df_scratch = pd.read_hdf(path)
    df = pd.concat([df, df_scratch])
    if verbose:
        print("read", path)
However, the memory goes up very fast. I need something that could be more efficient.
Update:
def make_connectstring(prefix, db, uname, passa, hostname, port):
    """return an sql connectstring"""
    connectstring = prefix + "://" + uname + ":" + passa + "@" + hostname + \
                    ":" + port + "/" + db
    return connectstring

def query_to_df(connectstring, query, verbose=False, chunksize=50000):
    engine = sqlalchemy.create_engine(connectstring,
                                      server_side_cursors=True)
    # stream the data straight into the HDF5 store, chunk by chunk
    with pd.HDFStore('output.h5', 'w') as store:
        for df in pd.read_sql_query(sql=query, con=engine, chunksize=chunksize):
            store.append('data', df)
I'd suggest using an HDFStore directly; that way you can append chunks as you get them from the database, something like:
with pd.HDFStore('output.h5', 'w') as store:
    for df in pd.read_sql_query(sql=query, con=engine, chunksize=chunksize):
        store.append('data', df)
This is based around your existing code, so it isn't complete; let me know if anything is unclear.
Note I'm opening the store in w mode, so it'll delete the file every time. Otherwise append would just keep adding the same rows to the end of the table; alternatively, you could remove the key first.
When you open the store you also get lots of options, like which compression to use, but this doesn't seem to be well documented; help(pd.HDFStore) describes complevel and complib for me.
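For instance, a sketch of enabling compression (complevel and complib are the parameters mentioned above; 'blosc' is one of the supported libraries):

# Compressed store: same append loop, smaller file on disk
with pd.HDFStore('output.h5', 'w', complevel=9, complib='blosc') as store:
    for df in pd.read_sql_query(sql=query, con=engine, chunksize=chunksize):
        store.append('data', df)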

Inserting a Python List in Python into an SQL Table

I have code that reads from a socket and creates a list called i. The socket is read, the list is created from the socket, and the list gets printed and then cleared; this repeats in a while True loop. Instead of just printing the list, I'd like to insert it into a table in my DB. I already have the cursor and connection established in the code. I was messing around with some other approaches but keep getting errors. I would like to use REPLACE INTO instead of INSERT INTO. Thank you very much for the help.
This is an example of what the list will look like.
['Dec-11-2018,', '12:28:43,', 'iPhone,', 'alpha,', 'lib,', 'lib,', '(45.67.67)\n']
My table name is StudentPrototype and it has 7 columns
Columns - (Date,Time,Device,ID,AP,APGroup,MACAdd)
#!/bin/python
import socket
import os, os.path
import MySQLdb as mdb

con = mdb.connect('localhost', 'user', 'pass', 'StudentTacker')
cur = con.cursor()
cur.execute("SELECT VERSION()")

i = []

def ParseArray(l):  # parses a line from the socket
    i.append(l.split()[0] + '-')    # Gets Day
    i.append(l.split()[1] + '-')    # Gets Month
    i.append(l.split()[3] + ',')    # Gets Year
    i.append(l.split()[2] + ',')    # Gets Time
    i.append(l.split()[-2] + ',')   # Gets Device
    i.append(l.split()[9] + ',')    # Gets ID
    i.append(l.split()[18] + ',')   # Gets AP
    i.append(l.split()[19] + ',')   # Gets AP Group
    i.append(l.split()[16] + '\n')  # Gets MAC
    # This is where I want to REPLACE INTO my table called StudentTest using list i
    print(i)
    del i[:]

if os.path.exists("/-socket"):
    os.remove("/-socket")
sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
sock.bind("/home/socket")
infile = sock.makefile('r')
while True:
    l = sock.recv(4096).decode()
    ParseArray(l)
Update: I tried another method that I found on this site for how to insert python lists in a db.
Here is my new code that I used inside my function:
def ParseArray(l):  # parses a line from the socket
    i.append(l.split()[0] + '-')    # Gets Day
    i.append(l.split()[1] + '-')    # Gets Month
    i.append(l.split()[3] + ',')    # Gets Year
    i.append(l.split()[2] + ',')    # Gets Time
    i.append(l.split()[-2] + ',')   # Gets Device
    i.append(l.split()[9] + ',')    # Gets FSU ID
    i.append(l.split()[18] + ',')   # Gets AP
    i.append(l.split()[19] + ',')   # Gets AP Group
    i.append(l.split()[16] + '\n')  # Gets MAC
    # insert the line into the db, or update it if the primary key (MAC address) already exists
    params = ['?' for item in i]
    sql = ('REPLACE INTO SocketTest (month, day, year, time, device, '
           'Id, ap, ApGroup, MacAdd) VALUES (%s);' % ', '.join(params))
    cur.execute(sql, i)
Using that I'm getting an error:
Traceback (most recent call last):
  File "./UnixSocketReader9.py", line 55, in <module>
    ParseArray(l)
  File "./UnixSocketReader9.py", line 28, in ParseArray
    cur.execute(sql, i)
  File "/usr/lib64/python2.7/site-packages/MySQLdb/cursors.py", line 187, in execute
    query = query % tuple([db.literal(item) for item in args])
TypeError: not all arguments converted during string formatting
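For reference, MySQLdb uses the %s paramstyle rather than ?, which is what this TypeError is complaining about; a sketch of the call using the table and columns from the question:

# MySQLdb (paramstyle 'format') expects %s placeholders, not ?
placeholders = ', '.join(['%s'] * len(i))
sql = ('REPLACE INTO SocketTest (month, day, year, time, device, '
       'Id, ap, ApGroup, MacAdd) VALUES ({})'.format(placeholders))
cur.execute(sql, i)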

Reading .sql File in for Execution in Python (pymysql)

I'm attempting to create a multiscript tool, that will take an argument of a .sql file and execute it.
I've set up a simple test, just executing on one database; however, the syntax is giving me issues every time.
DELIMITER $$
CREATE DEFINER=`a_user`@`%` PROCEDURE `a_procedure`(
    IN DirectEmployeeID TEXT,
    IN StartRange DATE,
    IN EndRange DATE
)
BEGIN
    SELECT aColumn
    WHERE thisThing = 1;
END$$
DELIMITER ;
To be clear, this script has been tested, and works when passed like :
mysql -uuser -p -hhost -Pport databaseName < file.sql
and also works through MySQL Workbench.
I saw this type of solution on another site:
with conn.cursor() as cursor:
    f = sys.argv[1]
    file = open(f, 'r')
    sql = " ".join(file.readlines())
    cursor.execute(sql)
which gives me a MySQL syntax error:
pymysql.err.ProgrammingError: (1064, u"You have an error in your SQL syntax;
check the manual that corresponds to your MySQL server version for the right
syntax to use near 'DELIMITER $$\n CREATE DEFINER=`a_user`@`%` PROCEDURE
`MyCommissionsDirect`(\n \tIN ' at line 1")
As you can see, there are newline characters within the script that MySQL isn't liking.
I then tried this:
with conn.cursor() as cursor:
f = sys.argv[1]
file = open(f, 'r')
sql = ''
line = file.readline()
while line:
sql += ' ' + line.strip('\n').strip('\t')
line = file.readline()
print sql
cursor.execute(sql)
and get another syntax error. The print shows that the whole script is now one line, which doesn't work in MySQL Workbench either; it doesn't even try to execute it, which is strange.
When I put the DELIMITER $$ on a separate line first, it executes in MySQL Workbench.
This is one of those situations where I feel like I may be making this more and more complicated. I'm very surprised pymysql doesn't have a way of simply executing a sql file directly. I'm wary of trying to do string manipulation to get this working for this particular file, because then the dream of making this tool generic and reusable kind of goes out the door.
Am I going about this in the complete incorrect way?
Thanks!
Here is my solution for using an SQL file with PyMySQL. The file contains many statements, each ended by ;, which is used to split the file into a list of statements. So beware that the trailing ; is missing from each entry in the list.
I decided to add the missing ; back at execution time rather than in the function, to spare a for loop. Maybe there is a better way.
create-db-loff.sql:
DROP DATABASE IF EXISTS loff;
CREATE DATABASE loff CHARACTER SET 'utf8';
USE loff;

CREATE TABLE product(
    `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    `code` BIGINT UNSIGNED NOT NULL UNIQUE,
    `name` VARCHAR(200),
    `nutrition_grades` VARCHAR(1)
);

CREATE TABLE category(
    `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    `name` VARCHAR(200)
);

CREATE TABLE asso_prod_cat(
    `category_id` INT UNSIGNED NOT NULL,
    `product_id` INT UNSIGNED NOT NULL,
    CONSTRAINT `fk_asso_prod_cat_category`
        FOREIGN KEY(category_id)
        REFERENCES category(id)
        ON DELETE CASCADE,
    CONSTRAINT `fk_asso_prod_cat_product`
        FOREIGN KEY(product_id)
        REFERENCES product(id)
        ON DELETE CASCADE
);
db.py:
DB_CONFIG = {
    'host': 'localhost',
    'user': 'loff',
    'pass': 'loff',
    'db': 'loff',
    'char': 'utf8',
    'file': 'create-db-loff.sql'
}

def get_sql_from_file(filename=DB_CONFIG['file']):
    """
    Get the SQL instructions from a file
    :return: a list of each SQL query without the trailing ";"
    """
    from os import path
    # File does not exist
    if path.isfile(filename) is False:
        print("File load error: {}".format(filename))
        return False
    else:
        with open(filename, "r") as sql_file:
            # Split file into a list of statements
            ret = sql_file.read().split(';')
            # drop the last empty entry
            ret.pop()
            return ret

request_list = self.get_sql_from_file()
if request_list is not False:
    for idx, sql_request in enumerate(request_list):
        self.message = self.MSG['request'].format(idx, sql_request)
        cursor.execute(sql_request + ';')
DELIMITER is a command used by a MySQL interpreter, such as the command line or Workbench, not an actual MySQL statement.
I ended up working some logic into my Python application to disable execution of MySQL queries while a custom DELIMITER is defined, and to resume executing once DELIMITER has been set back:
import MySQLdb
import re
import sys

file = 'somesql.sql'
conn = MySQLdb.Connection(mysqlserver, mysqluser, mysqlpass, mysqldb)
curs = conn.cursor()

ignorestatement = False  # by default, each time we get a ';' that's our cue to execute
statement = ""

for line in open(file):
    if line.startswith('DELIMITER'):
        if not ignorestatement:
            ignorestatement = True  # disable executing when we get a ';'
            continue
        else:
            ignorestatement = False  # re-enable execution of sql queries on ';'
            line = " ;"  # rewrite the DELIMITER command to allow the block of sql to execute
    if re.match(r'--', line):  # ignore sql comment lines
        continue
    if not re.search(r'[^-;]+;', line) or ignorestatement:  # keep appending lines that don't end in ';', or while a custom DELIMITER is active
        statement = statement + line
    else:  # when a line ends in ';' and no custom DELIMITER is active, execute the statement and reset
        statement = statement + line
        # print "\n\n[DEBUG] Executing SQL statement:\n%s" % (statement)
        try:
            curs.execute(statement)
            conn.commit()
            statement = ""
        except curs.Error, e:
            print(file + " - Error applying (" + str(e) + ")\nTerminating.")
            sys.exit(1)
It's a bit hacky, but seems to work well enough.
Most SQL files contain interpreter commands such as DELIMITER that make passing the commands through to pymysql somewhat difficult; this code snippet allows you to separate out the statements in the sql file into a list for sequential execution.
def parse_sql(filename):
    data = open(filename, 'r').readlines()
    stmts = []
    DELIMITER = ';'
    stmt = ''

    for lineno, line in enumerate(data):
        if not line.strip():
            continue
        if line.startswith('--'):
            continue
        if 'DELIMITER' in line:
            DELIMITER = line.split()[1]
            continue
        if DELIMITER not in line:
            stmt += line.replace(DELIMITER, ';')
            continue
        if stmt:
            stmt += line
            stmts.append(stmt.strip())
            stmt = ''
        else:
            stmts.append(line.strip())
    return stmts
Usage example:
conn = pymysql.connect('test')
stmts = parse_sql('my_sql_file.sql')
with conn.cursor() as cursor:
    for stmt in stmts:
        cursor.execute(stmt)
conn.commit()
It's simple code:
import pymysql

class ScriptRunner:

    def __init__(self, connection, delimiter=";", autocommit=True):
        self.connection = connection
        self.delimiter = delimiter
        self.autocommit = autocommit

    def run_script(self, sql):
        try:
            script = ""
            for line in sql.splitlines():
                strip_line = line.strip()
                if "DELIMITER $$" in strip_line:
                    self.delimiter = "$$"
                    continue
                if "DELIMITER ;" in strip_line:
                    self.delimiter = ";"
                    continue
                if strip_line and not strip_line.startswith("//") and not strip_line.startswith("--"):
                    script += line + "\n"
                    if strip_line.endswith(self.delimiter):
                        if self.delimiter == "$$":
                            script = script[:-1].rstrip("$") + ";"
                        cursor = self.connection.cursor()
                        print(script)
                        cursor.execute(script)
                        script = ""
            if script.strip():
                raise Exception("Line missing end-of-line terminator (" + self.delimiter + ") => " + script)
            if not self.connection.get_autocommit():
                self.connection.commit()
        except Exception:
            if not self.connection.get_autocommit():
                self.connection.rollback()
            raise

if __name__ == '__main__':
    connection = pymysql.connect(host="127.0.0.1", user="root", password="root", db="test", autocommit=True)
    sql = ""
    ScriptRunner(connection).run_script(sql)

Python-xml parse using beautifulsoup4 and writing the output to mysql db - unicode error

I'm trying to parse an xml file using beautifulsoup4.
IDE: LiClipse
Python version: 2.7
XML encoding: utf-8
Sample xml file : http://pastebin.com/RhjvyKDN
Below is the code I used to parse the xml files and write the extracted information to a local mysql database.
from bs4 import BeautifulSoup
import pymysql
import os, os.path

# strips apostrophes from the text and then just adds them at the beginning and end for the query
def apostro(text):
    text = text.replace("'", "")
    text = text.replace(",", "")
    text = "'" + text + "'"
    return text

# sets up the MYSQL connection
conn = pymysql.connect(host='127.0.0.1', user='xxxx', passwd='xxxx', db='mysql', port=3306)
cur = conn.cursor()

# drop all of the previous values from the database
cur.execute("DELETE FROM db WHERE title is not null")
conn.commit()

# loop through all of the files
for root, _, files in os.walk("C:/usc/xml"):
    for f in files:
        # j is a counter for how many sections we have processed
        j = 0
        # fullpath is the location of the file we're parsing
        fullpath = os.path.join(root, f)
        print(fullpath)
        # open file using BeautifulSoup
        soup = BeautifulSoup(open("" + fullpath + ""), 'xml')
        sec = soup.find_all("section", {"style": "-uslm-lc:I80"})
        t = soup.main.title
        t_num = t.num['value']
        # 'if not' clauses are needed in case there is a blank, otherwise an error is thrown
        if not t.heading.text:
            t_head = ''
        else:
            t_head = t.heading.text.encode('ascii', 'ignore').encode("UTF-8")
        for element in sec:
            if not element.num['value']:
                section = ''
            else:
                section = element.num['value'].encode('ascii', 'ignore').encode("UTF-8")
            if not element.heading:
                s_head = ''
            else:
                s_head = element.heading.text.encode('ascii', 'ignore').encode("UTF-8")
            if not element.text:
                s_text = ''
            else:
                s_text = element.text.encode('ascii', 'ignore').encode("UTF-8")
            # inserttest is the sql command that 'cur' executes; the counter is printed every time a section is written to show the program is still alive
            inserttest = ("insert into deadlaws.usc_new (title, t_head, section, s_head, s_text) values (" +
                          t_num + "," + apostro(t_head) + "," + apostro(section) + "," +
                          apostro(s_head) + "," + apostro(s_text) + ")")
            j = j + 1
            cur.execute(inserttest)
            conn.commit()
            print(fullpath + " " + str(j))
conn.commit()
cur.close()
conn.close()
Everything went well until I noticed that the program ignores the hyphens '-' in the section numbers, which makes the entire exercise wrong.
I know I have used 'ignore' in the encode statement, but a hyphen '-' is a legitimate character in ASCII, right? Shouldn't it be writing the character to the db instead of ignoring it?
I did a lot of reading on SO and elsewhere.
I've tried including from_encoding="utf-8" in the soup statement, 'xmlcharrefreplace' in the encode() statement, and other methods, which have resulted in the output below: it writes a mangled sequence (some mis-encoded unicode character) instead of a hyphen '-' to the database.
Sample output: (screenshot omitted)
The data is huge and I'm afraid there could be other characters like '-' that are being ignored by the program. It's ok if it ignores special characters in the t_head, s_head and s_text fields, as they are free text, but not in the section column.
Any help in resolving this issue would be greatly appreciated.
Don't encode; the MySQL library is perfectly capable of inserting Unicode text into the database directly. Use SQL parameters, not string interpolation, and specify the character set to use when connecting to the database:
conn = pymysql.connect(host='127.0.0.1', user='xxxx', passwd='xxxx',
                       db='mysql', port=3306,
                       charset='utf8')
Don't encode:
t_head = t.heading.text or ''
for element in sec:
    if not element.num['value']:
        section = ''
    else:
        section = element.num.get('value', '')
    s_head = element.heading.text or ''
    s_text = element.text or ''
    # five %s placeholders, one per column; pymysql uses the %s paramstyle
    inserttest = ("insert into deadlaws.usc_new (title, t_head, section, s_head, s_text) "
                  "values (%s, %s, %s, %s, %s)")
    cur.execute(inserttest, (t_num, t_head, section, s_head, s_text))
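As a quick illustration of why the dashes disappeared in the first place (a sketch; the dash in the source XML is presumably a Unicode dash such as U+2013, not the ASCII hyphen):

# Python 2: a non-ASCII dash is silently dropped by the 'ignore' error handler
>>> u'1\u20132'.encode('ascii', 'ignore')
'12'
>>> u'1-2'.encode('ascii', 'ignore')   # a true ASCII hyphen survives
'1-2'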
