How to export a query to Excel using Python?

I have been trying to loop through a list of parameters for a database query and export each result to xlsx format, using the pyodbc, pandas, and xlsxwriter modules.
However, despite much trial and error, the message below keeps appearing:
The first argument to execute must be a string or unicode query.
Could this have something to do with the query itself, or with the pandas module?
Thank you.
This is for exporting a query result to an Excel spreadsheet using pandas and pyodbc, with Python 3.7.
import os
import time

import pyodbc
import pandas as pd

# Database connection
conn = pyodbc.connect(driver='xxxxxx', server='xxxxxxx', database='xxxxxx',
                      user='xxxxxx', password='xxxxxxxx')
cursor = conn.cursor()

depts = ['Human Resources', 'Accounting', 'Marketing']

query = """
SELECT *
FROM device ID
WHERE
Department like ?
AND
Status like 'Active'
"""

target = r'O:\Example'
today = target + os.sep + time.strftime('%Y%m%d')
if not os.path.exists(today):
    os.mkdir(today)

for i in depts:
    cursor.execute(query, i)
    #workbook = Workbook(today + os.sep + i + '.xlsx')
    #worksheet = workbook.add_worksheet()
    data = cursor.fetchall()
    P_data = pd.read_sql(data, conn)
    P_data.to_excel(today + os.sep + i + '.xlsx')

When you read data into a dataframe using pandas.read_sql(), pandas expects the first argument to be the query to execute (as a string), not the result set of a query you have already executed.
Instead of your line:
P_data = pd.read_sql(data, conn)
You'd want to use:
P_data = pd.read_sql(query, conn)
And to filter by department, you'd want to serialize the list into a SQL syntax string:
depts = ['Human Resources','Accounting','Marketing']
# gives you the string to use in your sql query:
depts_query_string = "('{query_vals}')".format(query_vals="','".join(depts))
To use the new SQL string in your query, use str.format:
query = """
SELECT *
FROM device ID
WHERE
Department in {query_vals}
AND
Status like 'Active'
""".format(query_vals=depts_query_string)
All together now:
import os
import time

import pyodbc
import pandas as pd

# Database connection
conn = pyodbc.connect(driver='xxxxxx', server='xxxxxxx', database='xxxxxx',
                      user='xxxxxx', password='xxxxxxxx')
cursor = conn.cursor()

depts = ['Human Resources', 'Accounting', 'Marketing']
# gives you the string to use in your sql query:
depts_query_string = "('{query_vals}')".format(query_vals="','".join(depts))

query = """
SELECT *
FROM device ID
WHERE
Department in {query_vals}
AND
Status like 'Active'
""".format(query_vals=depts_query_string)

target = r'O:\Example'
today = target + os.sep + time.strftime('%Y%m%d')
if not os.path.exists(today):
    os.mkdir(today)

for i in depts:
    #workbook = Workbook(today + os.sep + i + '.xlsx')
    #worksheet = workbook.add_worksheet()
    P_data = pd.read_sql(query, conn)
    P_data.to_excel(today + os.sep + i + '.xlsx')
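As a side note, formatting values into the SQL string works, but it quotes poorly and writes the same combined result to every department's file. A hedged alternative sketch, keeping the original ? placeholder and letting pandas bind it per department via the params argument (assumes the conn, depts, and today variables from above):
query = """
SELECT *
FROM device ID
WHERE Department like ?
AND Status like 'Active'
"""

for dept in depts:
    # pandas forwards `params` to the pyodbc cursor, so each file
    # contains only that department's rows
    P_data = pd.read_sql(query, conn, params=[dept])
    P_data.to_excel(today + os.sep + dept + '.xlsx', index=False)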

Once you have your query sorted, you can load the result directly into a dataframe and export it:
P_data = pd.read_sql_query(query, conn)
P_data.to_excel('desired_filename.format')
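If you need more formatting control over the workbook (the question already imports xlsxwriter), pandas can hand the writing off to that engine. A minimal sketch, assuming P_data is the dataframe from above; the filename and sheet name are placeholders:
# Route pandas' Excel output through the xlsxwriter engine.
with pd.ExcelWriter('report.xlsx', engine='xlsxwriter') as writer:
    P_data.to_excel(writer, sheet_name='Active', index=False)
    # the underlying xlsxwriter objects remain accessible for formatting
    worksheet = writer.sheets['Active']
    worksheet.set_column(0, 0, 20)  # widen the first column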

Related

TypeError: Query dictionary values must be strings or sequences of strings

I was previously using only pyodbc and pandas to reach out to a SQL Server, run a query, and save that information into a csv file. That method does work, but it produces warnings that I feel were slowing down the program.
I'm trying to implement SQLAlchemy instead, but am getting the TypeError above.
I read from the docs: SQLAlchemy 1.4 Documentation
import pyodbc
import pandas as pd
import numpy as np
from sqlalchemy.engine import URL
from sqlalchemy import create_engine

# Setting display options for the dataframe
pd.set_option('display.max_columns', None)
pd.set_option('max_colwidth', None)

# Turning each row from the main .csv file into a dataframe.
df = pd.read_csv(r'''C:\Users\username\Downloads\py_tma_tables.csv''')

# Turning the dataframe into a list.
tma_table = df['table_name'].tolist()

# Connection setup.
cnxn = pyodbc.connect('DSN=DB_NAME;Trusted_Connection=yes')
connection_url = URL.create("mssql+pyodbc", query={"odbc_connect": cnxn})
engine = create_engine(connection_url)

df_list = []
count = 0
while count < 1:
    df1 = pd.read_sql_query("SELECT * FROM " + tma_table[count], engine)
    df_list.append(df1)
    count += 1

df_count = 0
while df_count < len(df_list):
    for item in df_list:
        df_list[df_count].to_csv(tma_table[df_count] + ".csv", index=False, header=None, encoding='utf-8')
        df_count += 1
Running this returns:
TypeError: Query dictionary values must be strings or sequences of strings
Found a solution:
servername = 'server'
dbname = 'database'
sqlcon = create_engine('mssql+pyodbc://@' + servername + '/' + dbname + '?driver=ODBC+Driver+17+for+SQL+Server')
Was then able to plug sqlcon in:
df_list = []
count = 0
while count < 1:
    df1 = pd.read_sql_query("SELECT * FROM " + tma_table[count], sqlcon)
    df_list.append(df1)
    count += 1
cnxn = pyodbc.connect('DSN=DB_NAME;Trusted_Connection=yes')
connection_url = URL.create("mssql+pyodbc", query={"odbc_connect": cnxn})
You are trying to pass a pyodbc Connection object in the query= dict, but odbc_connect expects the ODBC connection string itself:
connection_string = "DSN=DB_NAME;Trusted_Connection=yes"
connection_url = URL.create("mssql+pyodbc", query={"odbc_connect": connection_string})
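Putting it together, a minimal sketch of the corrected flow (reusing the DSN and the tma_table list from the question):
from sqlalchemy import create_engine
from sqlalchemy.engine import URL
import pandas as pd

# Build the engine from the ODBC connection string,
# not from an open pyodbc connection.
connection_string = "DSN=DB_NAME;Trusted_Connection=yes"
connection_url = URL.create("mssql+pyodbc", query={"odbc_connect": connection_string})
engine = create_engine(connection_url)

# `tma_table` is the list of table names built in the question
df1 = pd.read_sql_query("SELECT * FROM " + tma_table[0], engine)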

Incorrect date value when loading xlsx file to table using pymysql and xlrd

(Very) beginner python user here. I'm trying to load an xlsx file into a MySQL table using xlrd and pymysql python libraries and I'm getting an error:
pymysql.err.InternalError: (1292, "Incorrect date value: '43500' for column 'invoice_date' at row 1")
The datatype for invoice_date in my table is DATE. The format for this field in my xlsx file is also Date. Things work fine if I change the table datatype to varchar, but I'd prefer to have the data load into my table as a date instead of converting after the fact. Any ideas as to why I'm getting this error? It appears that xlrd or pymysql is reading '2/4/2019' in my xlsx file as '43500', and MySQL is rejecting it due to a datatype mismatch.
import xlrd
import pymysql as MySQLdb

# Open workbook and define first sheet
book = xlrd.open_workbook("2019_Complete.xlsx")
sheet = book.sheet_by_index(0)

# MySQL connection
database = MySQLdb.connect(host="localhost", user="root", passwd="password", db="vendor")

# Get cursor, which is used to traverse the database, line by line
cursor = database.cursor()

# INSERT INTO SQL query
query = """insert into table values (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)"""

# Iterate through each row in the xlsx file, starting at row 2 to skip the headers
for r in range(1, sheet.nrows):
    lp = sheet.cell(r, 0).value
    pallet_lp = sheet.cell(r, 1).value
    bol = sheet.cell(r, 2).value
    invoice_date = sheet.cell(r, 3).value
    date_received = sheet.cell(r, 4).value
    date_repaired = sheet.cell(r, 5).value
    time_in_repair = sheet.cell(r, 6).value
    date_shipped = sheet.cell(r, 7).value
    serial_number = sheet.cell(r, 8).value
    upc = sheet.cell(r, 9).value
    product_type = sheet.cell(r, 10).value
    product_description = sheet.cell(r, 11).value
    repair_code = sheet.cell(r, 12).value
    condition = sheet.cell(r, 13).value
    repair_cost = sheet.cell(r, 14).value
    parts_cost = sheet.cell(r, 15).value
    total_cost = sheet.cell(r, 16).value
    repair_notes = sheet.cell(r, 17).value
    repair_cap = sheet.cell(r, 18).value
    complaint = sheet.cell(r, 19).value
    delta = sheet.cell(r, 20).value

    # Assign values from each row
    values = (lp, pallet_lp, bol, invoice_date, date_received, date_repaired, time_in_repair, date_shipped, serial_number, upc, product_type, product_description, repair_code, condition, repair_cost, parts_cost, total_cost, repair_notes, repair_cap, complaint, delta)

    # Execute sql Query
    cursor.execute(query, values)

# Close the cursor
cursor.close()

# Commit the transaction
database.commit()

# Close the database connection
database.close()

# Print results
print("")
columns = str(sheet.ncols)
rows = str(sheet.nrows)
print("I just imported " + columns + " columns and " + rows + " rows to MySQL!")
You can see this answer for a more detailed explanation, but basically Excel stores dates as a serial day count (for dates after February 1900 the effective day zero is 1899-12-30), so to convert your value to an actual date you need to turn that number into an ISO-format date that MySQL will accept. You can do that using date.fromordinal and date.isoformat. For example:
from datetime import date

dval = 43500
d = date.fromordinal(dval + 693594)  # 693594 is the proleptic ordinal of 1899-12-30
print(d.isoformat())
Output:
2019-02-04
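Alternatively, xlrd ships a helper that does this conversion while honoring the workbook's date mode. A minimal sketch, assuming the book, sheet, and row index r from the question:
from xlrd.xldate import xldate_as_datetime

raw = sheet.cell(r, 3).value  # e.g. 43500.0
invoice_date = xldate_as_datetime(raw, book.datemode).date().isoformat()
print(invoice_date)  # '2019-02-04'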

How to return a list from SQL query using pyodbc?

I am trying to run a select query to retrieve data from SQL Server using pyodbc in python 2.7. I want the data to be returned in a list. The code I have written is below.
It works, kinda, but not in the way I expected. My returned list looks something like below:
Index Type Size Value
0 Row 1 Row object of pyodbc module
1 Row 1 Row object of pyodbc module
...
105 Row 1 Row object of pyodbc module
I was hoping to see something like below (i.e. my table in SQL)
ActionId AnnDate Name SaleValue
128929 2018-01-01 Bob 105.3
193329 2018-04-05 Bob 1006.98
...
23654 2018-11-21 Bob 103.32
Is a list not the best way to return data from a SQL query using pyodbc?
Code
import pyodbc

def GetSQLData(dbName, query):
    sPass = 'MyPassword'
    sServer = 'MyServer\\SQL1'
    uname = 'MyUser'
    cnxn = pyodbc.connect("Driver={SQL Server Native Client 11.0};"
                          "Server=" + sServer + ";"
                          "Database=" + dbName + ";"
                          "uid=" + uname + ";pwd=" + sPass)
    cursor = cnxn.cursor()
    cursor.execute(query)
    return list(cursor.fetchall())
If you want to return your query results as a list of lists with your column names as the first sublist (similar to the example output in your question), then you can do something like the following:
import pyodbc
cnxn = pyodbc.connect("YOUR_CONNECTION_STRING")
cursor = cnxn.cursor()
cursor.execute("YOUR_QUERY")
columns = [column[0] for column in cursor.description]
results = [columns] + [row for row in cursor.fetchall()]
for result in results:
    print result
# EXAMPLE OUTPUT
# ['col1', 'col2']
# ['r1c1', 'r1c2']
# ['r2c1', 'r2c2']
Depending on how you are using the results, I often find it more useful to have a list of dicts. For example:
results = [dict(zip(columns, row)) for row in cursor.fetchall()]
for result in results:
    print result
# EXAMPLE OUTPUT
# {'col1': 'r1c1', 'col2':'r1c2'}
# {'col1': 'r2c1', 'col2':'r2c2'}
There is an even better option than a list: try a pandas DataFrame!
It helps you deal with column names and apply column-wise operations.
import pandas as pd
import pyodbc

def GetSQLData(dbName, query):
    sPass = 'MyPassword'
    sServer = 'MyServer\\SQL1'
    uname = 'MyUser'
    cnxn = pyodbc.connect("Driver={SQL Server Native Client 11.0};"
                          "Server=" + sServer + ";"
                          "Database=" + dbName + ";"
                          "uid=" + uname + ";pwd=" + sPass)
    df = pd.read_sql(query, cnxn)  # note: the query comes first, then the connection
    return df  # pandas DataFrame
EDIT:
If you prefer a list of lists (that is, one list per row), you can obtain it with:
df.values.tolist()  # list of lists
But I highly recommend working with pandas.
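For completeness, a hedged usage sketch tying this back to the Excel-export theme of the main question (the database name and query are placeholders):
# Fetch a table into a DataFrame and export it to Excel.
df = GetSQLData('MyDatabase', "SELECT ActionId, AnnDate, Name, SaleValue FROM Sales")
print(df.head())  # quick sanity check of the first rows
df.to_excel('sales.xlsx', index=False)  # needs openpyxl or xlsxwriter installed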

Creating a function to turn a SQL query into Pandas df

I'd like to create a function that allows the user to input a SQL query and have it converted into a Pandas df. So far I've tried the following:
def dataset():
    raw_sql_query = input("Enter your SQL query: ")
    sql_query = """" " + raw_sql_query + " """"
    sql3 =
    sql_query
    df = pd.io.sql.read_sql(sql3, cnxn)
    df.head()
Which yields the error:
File "<ipython-input-18-6b10c2bc776f>", line 4
sql_query = """" " + raw_sql_query + " """"
^
SyntaxError: EOL while scanning string literal
I've also tried a few similar versions of the above code, including:
def dataset():
    raw_sql_query = input("Enter your SQL query: ")
    sql_query = """"" + raw_sql_query + """""
    sql3 =
    sql_query
    df = pd.io.sql.read_sql(sql3, cnxn)
    df.head()
Which led to the following error:
File "<ipython-input-23-e501c9746878>", line 5
sql3 =
^
SyntaxError: invalid syntax
Is a function like this possible? If so, how would I go about creating a working function for this action?
All the documentation I've read about functions only includes examples like printing "Hello World" or basic addition/subtraction, so it hasn't been very useful here.
EDIT:
Using pandas.read_sql_query like this:
def dataset():
    """This function allows you to input a SQL query and have it transformed into a Pandas dataframe"""
    raw_sql_query = input("Enter your SQL query: ")
    sql_query = """"" + raw_sql_query + """""
    sql3 = sql_query
    df = pd.io.sql.read_sql(sql3, cnxn)
    df.head()
This doesn't return an error, but also doesn't return the expected results. It returns nothing.
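A likely culprit: input() already returns a string, so no extra quoting is needed at all, and the function computes df.head() but never returns it, so the caller sees nothing. A minimal working sketch, assuming cnxn is an open connection:
import pandas as pd

def dataset(cnxn):
    raw_sql_query = input("Enter your SQL query: ")
    df = pd.read_sql_query(raw_sql_query, cnxn)
    return df  # return the dataframe so the caller can display or reuse it

# df = dataset(cnxn)
# print(df.head())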
I like the flexibility of sqlalchemy combined with pandas.read_sql. This is the code that I use:
import pandas as pd
import sqlalchemy as sa

def bindQuery(query, **params):
    for key, value in params.items():
        key = f":{key}"
        if isinstance(value, str):
            value = f"'{value}'"
        query = query.replace(key, str(value))
    query = query.replace("\n", " ").replace("\t", " ")
    return query

def readQuery(query, engine, **params):
    query = bindQuery(query, **params)
    return pd.read_sql(query, engine)
So when I have to run the following QUERY:
QUERY = """
SELECT count(*)
FROM table
where id in :ids
"""

ids = (1, 2, 3)
df = readQuery(query=QUERY,
               engine=my_engine,
               ids=ids)

Get MSSQL table column names using pyodbc in python

I am trying to get the mssql table column names using pyodbc, and getting an error saying
ProgrammingError: No results. Previous SQL was not a query.
Here is my code:
class get_Fields:
    def GET(self, r):
        web.header('Access-Control-Allow-Origin', '*')
        web.header('Access-Control-Allow-Credentials', 'true')
        fields = []
        datasetname = web.input().datasetName
        tablename = web.input().tableName
        cnxn = pyodbc.connect(connection_string)
        cursor = cnxn.cursor()
        query = "USE" + "[" +datasetname+ "]" + "SELECT COLUMN_NAME,* FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = " + "'"+ tablename + "'"
        cursor.execute(query)
        DF = DataFrame(cursor.fetchall())
        columns = [column[0] for column in cursor.description]
        return json.dumps(columns)
How can I solve this?
You can avoid this by using some of pyodbc's built-in methods. For example, instead of:
query = "USE" + "[" +datasetname+ "]" + "SELECT COLUMN_NAME,* FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = " + "'"+ tablename + "'"
cursor.execute(query)
DF = DataFrame(cursor.fetchall())
Try:
column_data = cursor.columns(table=tablename, catalog=datasetname, schema='dbo').fetchall()
print(column_data)
That will return the column names (and other column metadata). I believe the column name is the fourth element per row. This also relieves the very valid concerns about SQL injection. You can then figure out how to build your DataFrame from the resulting data.
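For instance, a hedged extraction of just the names (column_name is the fourth field of each metadata row):
# pyodbc's cursor.columns() follows ODBC SQLColumns:
# table_cat, table_schem, table_name, column_name, ...
column_names = [row[3] for row in column_data]
print(column_names)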
Good luck!
Your line
query = "USE" + "[" +datasetname+ "]" + "SELECT COLUMN_NAME,*...
Will produce something like
USE[databasename]SELECT ...
In SSMS this would work, but I'd suggest using proper spacing and separating the USE statement with a semicolon:
query = "USE " + "[" +datasetname+ "]; " + "SELECT COLUMN_NAME,*...
Better still, set the database context using the Database attribute when building the connection string, and use parameters any time you are passing user input (especially from HTTP requests!) to a WHERE clause. These changes eliminate the need for dynamic SQL, which can be insecure and difficult to maintain.
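A minimal sketch of both suggestions (the server name and driver are placeholders, and datasetname should still be validated against a whitelist since it lands in the connection string):
import pyodbc

connection_string = (
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=myserver;"
    "Database=" + datasetname + ";"
    "Trusted_Connection=yes;"
)
cnxn = pyodbc.connect(connection_string)
cursor = cnxn.cursor()
# parameterized, so tablename is never spliced into the SQL text
cursor.execute(
    "SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = ?",
    tablename,
)
columns = [row[0] for row in cursor.fetchall()]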
