Empty set after LOAD DATA LOCAL INFILE in MySQL - Python

I'm using pymysql to load a large CSV file into a database. Because of memory limitations I'm using LOAD DATA INFILE rather than INSERT. However, after the code completes, when I query the server for the data in the table it returns an empty set.
import pymysql

conn = pymysql.connect(host='localhost', port=3306, user='root',
                       passwd='', local_infile=True)
cur = conn.cursor()
cur.execute("CREATE SCHEMA IF NOT EXISTS `test` DEFAULT "
            "CHARACTER SET utf8 COLLATE utf8_unicode_ci;")
cur.execute("CREATE TABLE IF NOT EXISTS "
            "`test`.`scores` (`date` DATE NOT NULL, "
            "`name` VARCHAR(15) NOT NULL, "
            "`score` DECIMAL(10,3) NOT NULL);")
conn.commit()

def push(fileName='/home/pi/test.csv', tableName='`test`.`scores`'):
    push = """LOAD DATA LOCAL INFILE "%s" INTO TABLE %s
              FIELDS TERMINATED BY ','
              LINES TERMINATED BY '\r\n'
              IGNORE 1 LINES
              (date, name, score);""" % (fileName, tableName)
    cur.execute(push)
    conn.commit()

push()
I get some truncation warnings but no other errors or warnings to work from. Any ideas on how to fix this?

I did a few things to fix this. First I changed the config files for my MySQL server to allow LOAD DATA LOCAL INFILE, following MySQL: Enable LOAD DATA LOCAL INFILE. Then the problem was with the line

LINES TERMINATED BY '\r\n'

The fix was to change it to

LINES TERMINATED BY '\n'

After that the script runs fine, and it is significantly faster than inserting row by row.
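Whether '\r\n' or '\n' is right depends on how the CSV was produced. A small sketch (my own helper, not from the thread) that inspects the first line in binary mode to pick the terminator before building the LOAD DATA statement:

```python
def sniff_line_ending(path):
    """Return the line terminator a text file uses: '\\r\\n' or '\\n'.

    Reads in binary mode so universal-newline translation
    doesn't hide the '\\r'.
    """
    with open(path, 'rb') as f:
        first = f.readline()
    return '\r\n' if first.endswith(b'\r\n') else '\n'
```

The result can then be interpolated into the LINES TERMINATED BY clause instead of hard-coding one ending.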

Related

How to load csv into an empty SQL table, using python?

So, I have this empty table which I created (see code below) and I need to load it with data from a CSV file, using a Python-SQL connection. As I do this, I need to replace the HTML codes, change to the correct datatypes (clean the file), and finally load it into this empty SQL table.
This is the code I wrote, but without any success - when I check the table in SQL it just returns an empty table:
Python code:
import csv

with open('UFOGB_Observations.csv', 'r') as UFO_Obsr:
    ## Write to the csv file, to clean it and change the html codes:
    with open('UFO_Observations.csv', 'w') as UFO_Obsw:
        for line in UFO_Obsr:
            line = line.replace('&#44', ',')
            line = line.replace('&#39', "'")
            line = line.replace('&#33', '!')
            line = line.replace('&amp;', '&')
            UFO_Obsw.write(line)
## To connect Python to SQL:
import pyodbc

print('Connecting...')
conn = pyodbc.connect('Trusted_Connection=yes',
                      driver='{ODBC Driver 13 for SQL Server}',
                      server='.\SQLEXPRESS',
                      database='QA_DATA_ANALYSIS')
print('Connected')
cursor = conn.cursor()
print('cursor established')
cursor.execute('''DROP TABLE IF EXISTS UFO_GB_1;
CREATE TABLE UFO_GB_1 (Index_No VARCHAR(10) NOT NULL,
    date_time VARCHAR(15) NULL, city_or_state VARCHAR(50) NULL,
    country_code VARCHAR(50) NULL, shape VARCHAR(200) NULL,
    duration VARCHAR(50) NULL, date_posted VARCHAR(15) NULL,
    comments VARCHAR(700) NULL);
''')
print('Commands successfully completed')

# To insert that csv into the table:
cursor.execute('''BULK INSERT QA_DATA_ANALYSIS.dbo.UFO_GB_1
FROM 'F:\GSS\QA_DATA_ANALYSIS_LEVEL_4\MODULE_2\Challenge_2\TASK_2\UFO_Observations.csv'
WITH ( fieldterminator = '', rowterminator = '\n')''')
conn.commit()
conn.close()
I was expecting to see a table with all 1,900+ rows when I type SELECT * FROM table, with correct data types (i.e. the date_time and date_posted columns as timestamp).
(Apologies in advance. New here so not allowed to comment.)
1) Why are you creating the table each time? Is this meant to be a temporary table?
2) What do you get as a response to your query?
3) What happens when you break the task down into parts?
Does the code create the table?
If the table already exists and you run just the insert code, does it work? When you import the csv and then write back to the same file, does that produce the result you are looking for, or crash? What if you wrote to a different file and imported that?
You are writing your queries like you would in SQL, but you need to re-write them for Python. Python needs to see the query as a Python string, which it then passes to SQL; i.e. don't wrap the statement with '''.
This is not tested, but try something like this:
bulk_load_sql = r"""
BULK INSERT QA_DATA_ANALYSIS.dbo.UFO_GB_1
FROM 'F:\GSS\QA_DATA_ANALYSIS_LEVEL_4\MODULE_2\Challenge_2\TASK_2\UFO_Observations.csv'
WITH ( fieldterminator = '', rowterminator = '\n')
"""
cursor.execute(bulk_load_sql)
This uses a triple-quoted string to put the SQL on multiple lines, but you may want to use a regular string.
Here is an answer that goes over formatting your query for pyodbc
https://stackoverflow.com/a/43855693/4788717
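As an aside on the cleaning loop in the question: Python's standard library already decodes HTML character references, including the semicolon-less forms like &#44 that appear in that file, so the chain of replace() calls can be collapsed. A sketch, assuming the file's oddities really are HTML entities:

```python
import html

def clean_line(line):
    """Decode HTML character references (&#44, &#39, &#33, &amp;, ...)
    back to the literal characters before the row is loaded."""
    return html.unescape(line)
```

Each line written to the cleaned file would then be clean_line(line) instead of four hand-maintained replacements.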

Load CSV MySQLdb - Python

I have a CSV that I'm attempting to load into a MySQL database. I'm running the following code:
import MySQLdb

con = MySQLdb.connect(host="myhost",
                      user="me",
                      passwd="mypw",
                      db="mydb")
cur = con.cursor()
sqlscript = r"""
DROP TABLE IF EXISTS MyTable;
CREATE TABLE MyTable
(Col1 VARCHAR(255),
 Col2 VARCHAR(255),
 CONSTRAINT PK_MyTable PRIMARY KEY (Col1));
LOAD DATA LOCAL INFILE 'C:\\Users\\me\\Documents\\Rec\\New Files\\mycsv.csv'
INTO TABLE MyTable
CHARACTER SET UTF8
FIELDS TERMINATED BY ','
ESCAPED BY '!'
ENCLOSED BY '"'
OPTIONALLY ENCLOSED BY '\''
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES;"""
cur.execute(sqlscript)
cur.close()
This runs without error, but does not load data from my CSV file. It correctly drops the table and creates it using the script. When I then query the table, it has zero rows. What am I missing?
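No answer is recorded for this one, but a plausible culprit (my assumption, not confirmed in the thread) is that errors from the later statements of a multi-statement script never surface through a single execute() call. Running the statements one execute() at a time makes a failing LOAD DATA raise immediately; a minimal splitter, assuming the script contains no semicolons inside string literals:

```python
def split_sql_script(script):
    """Naively split a ';'-separated SQL script into a list of
    individual statements, dropping empty fragments.

    Assumes no ';' occurs inside quoted values -- good enough for
    simple DROP/CREATE/LOAD scripts like the one above.
    """
    return [s.strip() for s in script.split(';') if s.strip()]
```

Each element would then be passed to its own cur.execute() call, so an error from the LOAD DATA step (for example, local_infile being disabled) cannot be silently swallowed.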

LOAD DATA LOCAL INFILE success in interactive mode, fails in script

I'm using Ubuntu 16.04, MySQL 5.6.34, Python 3.5.2.
I cannot seem to get my script to perform the LOAD DATA INFILE statement, but it works fine on the same machine using python3 interactive mode.
Here is my code:
#!/usr/bin/python3
import mysql.connector

db = mysql.connector.connect(passwd=dbpwd, db=dbname, host=dbhostname,
                             port=port_no, user=dbusername)
cursor = db.cursor()
insert_file = '/home/ubuntu/insert.csv'
db.get_warnings = True
q_event = ("LOAD DATA LOCAL INFILE '%s' INTO TABLE my_table FIELDS TERMINATED BY "
           "',' OPTIONALLY ENCLOSED BY '\\\"' (col1,col2,col3)")
print(q_event.__repr__())
cursor.execute(q_event % insert_file)
print(cursor.rowcount)
print(cursor.statement.__repr__())
print(cursor.fetchwarnings())
db.commit()
My output looks like this:
'LOAD DATA LOCAL INFILE \'%s\' INTO TABLE my_table FIELDS TERMINATED BY \',\' OPTIONALLY ENCLOSED BY \'\\"\' (col1,col2,col3)'
0
'LOAD DATA LOCAL INFILE '/home/ubuntu/insert.csv\' INTO TABLE my_table FIELDS TERMINATED BY \',\' OPTIONALLY ENCLOSED BY \'\\"\' (col1,col2,col3)'
None
The row count is always 0. No matter how I change the formatting of the Load statement, I can't seem to get the script result to change; it simply fails, without error.
Meanwhile, things work just fine when running in interactive mode:
>>> import mysql.connector; db = mysql.connector.connect(passwd="...",db="...",host="...",port=...,user="..."); cursor = db.cursor();db.get_warnings=True;
>>> cursor.execute("LOAD DATA LOCAL INFILE '%s' INTO TABLE my_table FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\\\"' (col1,col2,col3)" % "/home/ubuntu/insert.csv")
>>> cursor.rowcount
31
>>> cursor.statement
'LOAD DATA LOCAL INFILE \'/home/ubuntu/insert.csv\' INTO TABLE my_table FIELDS TERMINATED BY \',\' OPTIONALLY ENCLOSED BY \'\\"\' (col1,col2,col3)'
>>> cursor.fetchwarnings()
>>>
Is there a reason this should work in interactive mode but not in a script?
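No answer is recorded here either. Since the printed statement is byte-for-byte the same in both runs, the SQL text itself can be ruled out; one environmental difference worth checking (an assumption on my part, not confirmed in the thread) is the client-side opt-in for LOCAL INFILE, which mysql-connector-python exposes as a connect() argument:

```python
# Hypothetical connection settings: some mysql-connector-python versions
# refuse LOAD DATA LOCAL INFILE unless the client explicitly opts in.
conn_kwargs = dict(
    user='dbusername', passwd='dbpwd', db='dbname',
    host='dbhostname', port=3306,
    allow_local_infile=True,  # client-side opt-in for LOCAL INFILE
)
# mysql.connector.connect(**conn_kwargs) would then permit the LOAD;
# the server must also have local_infile enabled.
```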

import csv file into Mysql Database using python

This is my code:
#!/usr/bin/python
import MySQLdb
import csv

db = MySQLdb.connect(host="host",            # the host
                     user="username",        # username
                     passwd="pwd",           # password
                     db="databasename")      # name of the database

sqlLoadData = 'LOAD DATA LOCAL INFILE "csv?_file_name.csv" INTO TABLE tablename '
sqlLoadData += 'FIELDS TERMINATED BY "," LINES TERMINATED BY "\n"'
sqlLoadData += 'IGNORE 1 LINES'
sqlLoadData += 'ENCLOSED BY '"' ESCAPED BY "\\" '

try:
    curs = db.cursor()
    curs.execute(sqlLoadData)
    resultSet = curs.fetchall()
except StandardError, e:
    print e
    db.rollback()

db.close()
I receive the error message: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version.
When I remove the part sqlLoadData += 'ENCLOSED BY '"' ESCAPED BY "\\" ' everything works perfectly. I used that last part just to remove the quotes from the values.
I also tried:
cursor = mydb.cursor()
reader = csv.reader(open('Cumulative.csv', 'rb'))
reader.next()   # skip the header row
for row in reader:
    cursor.execute('INSERT INTO Cumulative (C1, C2, C3, C4, C5, C6) VALUES(%s, %s, %s, %s, %s, %s)', row)
cursor.commit()
# close the connection to the database.
cursor.close()
I just want to remove the quotes so that the integer fields will accept the data; with quotes, "1" is treated as a string instead of an integer.
Can anyone please help me understand this?
Thanks!
Looks like you forgot to terminate the preceding line with a space or newline character. This is causing a syntax error when the parser tries to understand LINESENCLOSED, which obviously isn't a keyword.
sqlLoadData += 'IGNORE 1 LINES \n'
sqlLoadData += 'ENCLOSED BY \'"\' ESCAPED BY "\\\\" '
As a rule of thumb: when you're debugging, and you're able to fix your code by removing a line, don't rule out the line immediately above.
EDIT: Modified the quotes around the second line. I think it was breaking in the "enclosed by" statement.
After 2 days worth of research I found the answer:
#!/usr/bin/python
import MySQLdb
import csv

db = MySQLdb.connect(host="host",            # the host
                     user="username",        # username
                     passwd="pwd",           # password
                     db="databasename")      # name of the database
cursor = db.cursor()
Query = """LOAD DATA LOCAL INFILE 'url to csv file' INTO TABLE
           table_name FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED
           BY '"' LINES TERMINATED BY '\n' IGNORE 1 LINES"""
cursor.execute(Query)
db.commit()
cursor.close()
hope it will help somebody out there.
After days and hours of searching the internet and running into all sorts of errors and warnings, this worked perfectly. I hope this saves someone some time.
import MySQLdb
import os
import string

db = MySQLdb.connect(host="host",
                     user="user",
                     passwd="pwd",
                     db="database_name",
                     local_infile=1)  # grants permission to load from an input file; without this
                                      # you get sql Error: (1148, 'The used command is not allowed with this MySQL version')
print "\nConnection to DB established\n"

# The 'IGNORE 1 LINES' clause below makes MySQL skip the first line of the csv file.
# You can run the sql below in the mysql shell to test that it works.
sqlLoadData = """load data local infile 'file.csv' into table table_name
                 FIELDS TERMINATED BY ',' ENCLOSED BY '"'
                 LINES TERMINATED BY '\n' IGNORE 1 LINES;"""

try:
    curs = db.cursor()
    curs.execute(sqlLoadData)
    db.commit()
    print "SQL execution complete"
    resultSet = curs.fetchall()
except StandardError, e:
    print "Error incurred: ", e
    db.rollback()

db.close()
print "Data loading complete.\n"
Thanks, I hope this helps :)
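For the row-by-row INSERT fallback attempted in the question, the parsing half can be separated out and tested on its own. A sketch with the question's hypothetical table and file names; note the commit belongs on the connection, not the cursor:

```python
import csv

def rows_from_csv(path):
    """Parse a csv file, skip the header row, and return the remaining
    rows as lists of strings, ready for executemany()."""
    with open(path, newline='') as f:
        reader = csv.reader(f)
        next(reader)                 # skip the header line
        return list(reader)

# With a live MySQLdb connection `db`, the rows would then be loaded with:
#   cur = db.cursor()
#   cur.executemany('INSERT INTO Cumulative (C1, C2, C3, C4, C5, C6) '
#                   'VALUES (%s, %s, %s, %s, %s, %s)',
#                   rows_from_csv('Cumulative.csv'))
#   db.commit()   # commit on the connection, not the cursor
```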

Insert data from file into database

I have a .sql file with multiple INSERT statements (1000+) and I want to run the statements in this file against my Oracle database.
For now, I'm using Python with pyodbc to connect to my database with the following:
import sys
import pyodbc
from ConfigParser import SafeConfigParser

def db_call(cfgFile, sql):
    parser = SafeConfigParser()
    parser.read(cfgFile)
    dsn = parser.get('odbc', 'dsn')
    uid = parser.get('odbc', 'user')
    pwd = parser.get('odbc', 'pass')
    try:
        con = pyodbc.connect('DSN=' + dsn + ';PWD=' + pwd + ';UID=' + uid)
        cur = con.cursor()
        cur.execute(sql)
        con.commit()
    except pyodbc.DatabaseError, e:
        print 'Error %s' % e
        sys.exit(1)
    finally:
        if con and cur:
            cur.close()
            con.close()

with open('theFile.sql', 'r') as f:
    cfgFile = 'c:\\dbinfo\\connectionInfo.cfg'
    # here goes the code to insert the contents into the database using db_call_many
    statements = f.read()
    db_call(cfgFile, statements)
But when I run it I receive the following error:
pyodbc.Error: ('HY000', '[HY000] [Oracle][ODBC][Ora]ORA-00911: invalid character\n (911) (SQLExecDirectW)')
But the contents of the file are only lines like:
INSERT INTO table (movie,genre) VALUES ('moviename','horror');
Edit
Adding print '<{}>'.format(statements) before the db_call(cfgFile, statements) call, I get the results (100+):
<INSERT INTO table (movie,genre) VALUES ('moviename','horror');INSERT INTO table (movie,genre) VALUES ('moviename_b','horror');INSERT INTO table (movie,genre) VALUES ('moviename_c','horror');>
Thanks for taking the time to read this.
Now it's somewhat clarified - you have a lot of separate SQL statements, such as INSERT INTO table (movie,genre) VALUES ('moviename','horror');
You're effectively after something like cur.executescript() rather than the current single execute() (I have no idea if pyodbc supports that part of the DB API) - but is there any reason you can't just execute the statements against the database one at a time?
When you read a file using the read() function, the newline (\n) at the end of the file is read too. I think you should use db_call(cfgFile, statements[:-1]) to eliminate the trailing newline.
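ORA-00911 ("invalid character") is commonly the statement terminator itself: Oracle through ODBC expects each statement without its trailing ';'. A hedged sketch of the per-statement cleanup, with the statements then executed one at a time through something like the db_call above rather than as one blob:

```python
def oracle_safe(stmt):
    """Strip surrounding whitespace and the trailing ';' from a single
    SQL statement; Oracle rejects the terminator with ORA-00911 when a
    statement arrives through ODBC."""
    return stmt.strip().rstrip(';').rstrip()

# Applied to the file's contents, each INSERT would be executed
# individually:
#   for stmt in statements.split(';'):
#       if stmt.strip():
#           cur.execute(oracle_safe(stmt))
#   con.commit()
```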