pyodbc creating corrupt excel file - python

I'm using pyodbc to create a new Excel file, and the operations seem to execute fine, but the resulting .xlsx file is corrupt.
I've stripped the code down to this minimal code snippet:
import pyodbc

# Set up path and driver connection string
spreadsheet_path = "C:\\temp\\test_spreadsheet.xlsx"
conn_str = (r'Driver={{Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)}};'
            r'DBQ={}; ReadOnly=0').format(spreadsheet_path)

with pyodbc.connect(conn_str, autocommit=True) as conn:
    # Create table
    cursor = conn.cursor()
    query = "create table sheet1 (COL1 TEXT, COL2 NUMBER);"
    cursor.execute(query)
    cursor.commit()

    # Insert a row
    query = "insert into sheet1 (COL1, COL2) values (?, ?);"
    cursor.execute(query, "apples", 10)
    cursor.commit()

    # Check the row is there
    query = "select * from sheet1;"
    cursor.execute(query)
    for r in cursor.fetchall():
        print(r)

print("done")
Note this will create a new spreadsheet in the location specified by spreadsheet_path. I had to use a full path because the ODBC driver doesn't accept relative paths.
I have autocommit enabled, and I also manually called cursor.commit() just to test whether it makes a difference; it does not.
Any ideas?
--
After doing some searching, I found this guide to using the Excel ODBC driver with PowerShell, which mentions:
The problem is in the Workbook you create. Whether you name it XLS or
XSLX it produces an XLSX spreadsheet, in the latest zipped Office Open
XML form. The trouble is that, with my version of the driver, I can
only get Excel to read it with the XLS filetype, since it says that
there is an error if you try to open it as an .XLSX file. I suspect
that the ODBC driver hasn’t been that well tested by Microsoft.
If I rename the file to .xls, I can open it in Excel (although it warns that the format and extension don't match), and the data is intact. Is this all due to Microsoft's crappy driver, or is there something I'm doing wrong here?
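For what it's worth, a quick way to see which format the driver actually wrote is to check whether the file is a ZIP container (the Office Open XML layout behind .xlsx) or a legacy binary workbook. This is only a diagnostic sketch using the standard library, assuming the same spreadsheet_path as in the snippet above:

import zipfile

spreadsheet_path = "C:\\temp\\test_spreadsheet.xlsx"

# .xlsx files are ZIP archives (Office Open XML); legacy .xls files are not.
if zipfile.is_zipfile(spreadsheet_path):
    with zipfile.ZipFile(spreadsheet_path) as zf:
        # A well-formed .xlsx should at least contain [Content_Types].xml
        print("ZIP container, first entries:", zf.namelist()[:5])
else:
    with open(spreadsheet_path, "rb") as f:
        # Legacy .xls (BIFF/OLE2) files start with the magic bytes D0 CF 11 E0
        print("Not a ZIP container, first bytes:", f.read(8).hex())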

Related

How to automatically pull data from access into excel using a python script

I am looking to automatically pull data from Access into Excel using a Python script. Manually, I have to do the following inside Excel:
Step 1: Click on "Data" --> "From Access".
Step 2: Select the data source (.accdb).
Step 3: Input the credentials in the Oracle ODBC Driver Connect dialog.
I would like to automate this fairly easy process inside Excel with a Python script. Could you give me an idea of how this could be achieved? Could I possibly run a macro from Python that does these 3 steps?
Thanks for your help!
Greetings,
Daniel
Your question has two parts:
Reading the Access database: you can use pyodbc to connect to the .accdb file and run queries against it, e.g.:
import pyodbc

conn = pyodbc.connect(r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};'
                      r'DBQ=path where you stored the Access file\file name.accdb;')
cursor = conn.cursor()
cursor.execute('select * from table name')
for row in cursor.fetchall():
    print(row)
Writing your data to an Excel file: for this you can use XlsxWriter, e.g.:
import xlsxwriter
workbook = xlsxwriter.Workbook('hello.xlsx')
worksheet = workbook.add_worksheet()
worksheet.write('A1', 'Hello world')
workbook.close()
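To tie the two parts together, here is a rough sketch that reads rows from the Access database and writes them into a worksheet; the database path, table name, and output filename are placeholders you would need to adjust:

import pyodbc
import xlsxwriter

# Read rows from the Access database (path and table name are placeholders)
conn = pyodbc.connect(r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};'
                      r'DBQ=C:\path\to\database.accdb;')
cursor = conn.cursor()
cursor.execute('select * from your_table')
rows = cursor.fetchall()
conn.close()

# Write them out to an Excel file, one cell at a time
workbook = xlsxwriter.Workbook('output.xlsx')
worksheet = workbook.add_worksheet()
for r, row in enumerate(rows):
    for c, value in enumerate(row):
        worksheet.write(r, c, value)
workbook.close()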

can't connect to MySQL database when using df.to_sql (pandas, pyodbc)

I am trying to move information from several Excel spreadsheets into a single table in a SQL database. First, I read the sheets using pandas and convert them into a single dataframe (this part works).
import pandas as pd

def condense(sheet):
    # Read all Excel files and combine them into a single pandas dataframe
    dfs = []
    for school in SCHOOLS:
        path = filePaths[school]
        try:
            opened = open(path, "rb")
        except:
            print("There was an error with the Excel file path.")
        dataframe = pd.read_excel(opened, sheetname=sheet)
        dfs.append(dataframe)
    return pd.concat(dfs)
Then I want to upload the dataframe to the database. I have read a lot of documentation but still don't really know where to begin. This is what I have currently.
connection = pyodbc.connect('''Driver={SQL Server};
Server=serverName;
Database=dbName;
Trusted_Connection=True''')
df.to_sql(tableName, connection)
I have also tried using an engine, but I'm confused about how to format the connection string, especially as I do not want to use a password. (What I tried below does use a password, and it doesn't work.)
connection_string= 'mysql://username:password#localhost/dbName'
engine = create_engine(connection_string)
engine.connect()
df.to_sql(tableName, engine)
Any suggestions on how to create the engine without using a password, or pointers to where my code is wrong, would be very appreciated.
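One likely fix, sketched under the assumption that the target is the SQL Server database from the pyodbc snippet (not MySQL) and that Windows authentication is available: build a SQLAlchemy engine with an mssql+pyodbc URL and a trusted connection, so no password is needed, and pass that engine to to_sql. The server, database, and table names below are placeholders:

import pandas as pd
from sqlalchemy import create_engine

# Windows authentication (trusted connection) means no password in the URL.
# serverName, dbName, and the table name are placeholders from the question.
engine = create_engine(
    "mssql+pyodbc://serverName/dbName?driver=SQL+Server&trusted_connection=yes"
)

df = pd.DataFrame({"school": ["A", "B"], "value": [1, 2]})  # stand-in for condense()
df.to_sql("tableName", engine, if_exists="append", index=False)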

Fastest way to load .xlsx file into MySQL database

I'm trying to import data from a .xlsx file into a SQL database.
Right now, I have a python script which uses the openpyxl and MySQLdb modules to
establish a connection to the database
open the workbook
grab the worksheet
loop through the rows of the worksheet, extracting the columns I need
and inserting each record into the database, one by one
Unfortunately, this is painfully slow. I'm working with a huge data set, so I need to find a faster way to do this (preferably with Python). Any ideas?
import openpyxl
import MySQLdb

wb = openpyxl.load_workbook(filename="file", read_only=True)
ws = wb['My Worksheet']

conn = MySQLdb.connect()
cursor = conn.cursor()
cursor.execute("SET autocommit = 0")

for row in ws.iter_rows(row_offset=1):
    sql_row = ...  # the data I need from this row
    cursor.execute("INSERT sql_row")  # placeholder insert statement

conn.commit()
Disable autocommit if it is on! Autocommit is a setting that causes MySQL to immediately try to push your data to disk after each statement. This is fine if you only have one insert, but it is what makes each individual insert take so long. Instead, turn it off and insert the data all at once, committing only after you've run all of your insert statements.
Something like this might work:
import MySQLdb

con = MySQLdb.connect(
    host="your db host",
    user="your username",
    passwd="your password",
    db="your db name"
)
cursor = con.cursor()
cursor.execute("SET autocommit = 0")

data = ...  # some code to get data from excel
for datum in data:
    cursor.execute("your insert statement".format(datum))

con.commit()
con.close()
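If the rows are already collected in a list, cursor.executemany with a parameterized statement is usually faster still than issuing the inserts one at a time, since the binding happens in one round. A rough sketch, with the table and column names as placeholders:

import MySQLdb

con = MySQLdb.connect(host="your db host", user="your username",
                      passwd="your password", db="your db name")
cursor = con.cursor()

# rows is a list of tuples pulled from the worksheet, e.g. [(col1, col2), ...]
rows = [("apples", 10), ("pears", 20)]

# One executemany call; the driver batches the inserts for you.
cursor.executemany(
    "INSERT INTO myTable (col1, col2) VALUES (%s, %s)",
    rows,
)
con.commit()
con.close()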
Consider saving the workbook's worksheet as a CSV, then use MySQL's LOAD DATA INFILE. This is often a very fast bulk load.
sql = """LOAD DATA INFILE '/path/to/data.csv'
INTO TABLE myTable
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '\"'
LINES TERMINATED BY '\n'"""
cursor.execute(sql)
con.commit()
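A minimal sketch of the CSV step, reusing the workbook and worksheet names from the question (the output path is a placeholder); LOAD DATA INFILE can then pick the file up as above:

import csv
import openpyxl

# Same workbook/worksheet as in the question; the CSV path is a placeholder.
wb = openpyxl.load_workbook(filename="file", read_only=True)
ws = wb['My Worksheet']

# Dump every row of the worksheet to a CSV that LOAD DATA INFILE can read.
with open('/path/to/data.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for row in ws.iter_rows():
        writer.writerow([cell.value for cell in row])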

How to export parsed data from Python to an Oracle table in SQL Developer?

I have used Python to parse a txt file for specific information (dates, $ amounts, lbs, etc) and now I want to export that data to an Oracle table that I made in SQL Developer.
I have successfully connected Python to Oracle with the cx_Oracle module, but I am struggling to export or even print any data to my database from Python.
I am not proficient at using SQL; I know simple queries and that's about it. I have explored the Oracle docs and haven't found straightforward export commands. When exporting data to an Oracle table via Python, is it Python code I'll be using, or SQL? Is it the same as importing a CSV file, for example?
I would like to understand how to write to an Oracle table from Python; I need to parse and export a very large amount of data, so this won't be a one-time export/import. Ideally I would also like a way to preview the import to ensure it aligns correctly with the Oracle table I already created, or, failing that, a simple way to undo it.
If my problem is unclear I am more than happy to clarify it. Thanks for all help.
My code so far:
import cx_Oracle
dsnStr = cx_Oracle.makedsn("sole.wh.whoi.edu", "1526", "sole")
con = cx_Oracle.connect(user="myusername", password="mypassword", dsn=dsnStr)
print (con.version)
#imp 'Book1.csv' [this didn't work]
cursor = con.cursor()
print (cursor)
con.close()
From Import a CSV file into Oracle using CX_Oracle & Python 2.7 you can see the overall plan.
So if you have already parsed your data into a CSV, you can do it like this:
import cx_Oracle
import csv

dsnStr = cx_Oracle.makedsn("sole.wh.whoi.edu", "1526", "sole")
con = cx_Oracle.connect(user="myusername", password="mypassword", dsn=dsnStr)
print(con.version)

cursor = con.cursor()

text_sql = '''
INSERT INTO tablename (firstfield, secondfield) VALUES (:1, :2)
'''

my_file = r'C:\CSVData\Book1.csv'
with open(my_file, newline='') as f:
    cr = csv.reader(f)
    for row in cr:
        print(row)
        cursor.execute(text_sql, row)

con.commit()  # make the inserts permanent
print('Imported')
con.close()
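On the preview/undo point: until you call con.commit(), the inserted rows are visible only to your own session, so you can query them back to check that they line up with your table and call con.rollback() to discard them. For large files, cursor.executemany is also much faster than executing row by row. A sketch, using the same placeholder table and columns as above:

import cx_Oracle
import csv

dsnStr = cx_Oracle.makedsn("sole.wh.whoi.edu", "1526", "sole")
con = cx_Oracle.connect(user="myusername", password="mypassword", dsn=dsnStr)
cursor = con.cursor()

with open(r'C:\CSVData\Book1.csv', newline='') as f:
    rows = list(csv.reader(f))

# One round trip instead of one execute per row.
cursor.executemany(
    "INSERT INTO tablename (firstfield, secondfield) VALUES (:1, :2)", rows)

# Preview what landed before making it permanent.
cursor.execute("SELECT * FROM tablename")
for row in cursor.fetchmany(10):
    print(row)

con.commit()      # keep the data ...
# con.rollback()  # ... or discard it instead
con.close()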

Python with Mysql - pdf file insertion during runtime

I have a script that stores results as PDF files in a particular folder. I want to create a MySQL database (which succeeds with the code below) and populate it with the PDF results. What would be the better approach: storing the file itself, or storing a reference to its location? The files are around 2 MB each. Could someone explain this with some working examples? I am new to both Python and MySQL. Thanks in advance.
To clarify further: I tried using LOAD DATA INFILE and the BLOB type for the result-file column, but it doesn't seem to work. I am using the pymysql module to connect to the database. The code below connects to the database and is successful.
import pymysql

conn = pymysql.connect(host='hostname', port=3306, user='root', passwd='abcdef', db='mydb')
cur = conn.cursor()
cur.execute("SELECT * FROM userlogin")
for r in cur.fetchall():
    print(r)
cur.close()
conn.close()
Since you already have MySQL storing strings for you (user names), your best bet is to stick with that approach and store the file path, just as you stored the strings in your userlogin table, but in a separate table with a foreign key to userlogin. It will probably be the most efficient approach in the long run anyway, especially if you store useful metadata along with the file path (keywords, or even complete n-gram sets)... but then you're talking about a file indexing system like Google Desktop or Xapian, just so you know what you're up against if you want to do this the "best" way.
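A rough sketch of that layout, using pymysql as in the question; the table and column names, and the assumption that userlogin has an integer id primary key, are illustrative rather than taken from your actual schema:

import pymysql

conn = pymysql.connect(host='hostname', port=3306, user='root', passwd='abcdef', db='mydb')
cur = conn.cursor()

# One row per PDF, pointing back at the user it belongs to.
cur.execute("""
    CREATE TABLE IF NOT EXISTS result_pdf (
        id INT AUTO_INCREMENT PRIMARY KEY,
        user_id INT NOT NULL,
        pdf_path VARCHAR(512) NOT NULL,
        FOREIGN KEY (user_id) REFERENCES userlogin(id)
    )
""")

# Store only the path; the ~2 MB file itself stays on disk.
cur.execute("INSERT INTO result_pdf (user_id, pdf_path) VALUES (%s, %s)",
            (1, '/results/2016-05-01/report.pdf'))

conn.commit()
cur.close()
conn.close()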