Passing a parameter to SQL Server query using read_sql_query - python

I have a one column list of Member_IDs called Member_ID_only2 similar to below:
Member ID
'123',
'456',
'758',
.
.
.
I'm trying to pass this as a list of params to read_sql_query. My code sample is below:
import pandas as pd
import pyodbc
conn = pyodbc.connect()
query = open(path + 'XXXXX_PROD.SQL', 'r')
SQL_Query = pd.read_sql_query(query, conn, params=(Member_ID_only2,))
My XXXXX_PROD.SQL has a where statement below:
WHERE MEME_CK IN '%s'
I'm getting the error below:
DatabaseError: Execution failed on sql '<_io.TextIOWrapper name='C:\Folder\XXXXX_PROD.SQL' mode='r' encoding='cp1252'>': The first argument to execute must be a string or unicode query.
I'm not sure how to fix it. Can someone help?
Thank you for any help that I can get.

The open function opens the file and returns a TextIOWrapper object, but it does not read the content of the file.
To actually get the content of the file, you need to call the read method on that object, like so:
# Read the sql file
query = open('filename.sql', 'r')
DF = pd.read_sql_query(query.read(), connection, params=your_params)
Using with open ensures the file is properly closed:
with open('filename.sql', 'r') as query:
    DF = pd.read_sql_query(query.read(), connection, params=your_params)
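The original question also tries to bind a whole list to a single placeholder; a '%s' inside the .SQL file won't work that way with pyodbc, which uses "?" (qmark) placeholders. Here is a rough, untested sketch, assuming the file is edited so the clause reads WHERE MEME_CK IN (%s) as a plain substitution target:

import pandas as pd
import pyodbc

conn = pyodbc.connect(connection_string)    # connection_string is a placeholder

member_ids = list(Member_ID_only2)          # e.g. ['123', '456', '758']

with open(path + 'XXXXX_PROD.SQL', 'r') as f:
    sql_template = f.read()                 # assumed to contain: WHERE MEME_CK IN (%s)

# Build one "?" placeholder per ID, then let pyodbc bind the actual values.
placeholders = ', '.join('?' for _ in member_ids)
sql = sql_template % placeholders

df = pd.read_sql_query(sql, conn, params=member_ids)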

Related

How to connect to snowflake using multiple SQL code in Python vscode?

I am trying to connect to the Snowflake database using Python. I have a .sql file in VS Code that contains multiple SQL statements. For example:
select * from table1;
select * from table2;
select * from table3;
So, I tried this code to get the result but it returned an error:
"Multiple SQL statements in a single API call are not supported; use one API call per statement instead."
My Python code is
#!/usr/bin/env python
import snowflake.connector

# Gets the version
ctx = snowflake.connector.connect(
    user='<user_name>',
    password='<password>',
    account='<account_identifier>'
)
cs = ctx.cursor()
try:
    with open('<file_directory>') as f:
        lines = f.readlines()
    cs.execute(lines)
    data_frame = cs.fetch_pandas_all()
    data_frame.to_csv('filename.csv')
finally:
    cs.close()
    ctx.close()
What can I try next?
Perhaps do as the error suggests, and limit each API call to a single SQL statement?
import pandas as pd

dfs = []
for line in lines:
    cs.execute(line)
    dfs.append(cs.fetch_pandas_all())
df = pd.concat(dfs)
df.to_csv('filename.csv')
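If memory serves, the Snowflake connector also provides execute_string on the connection object, which splits a script into individual statements and runs one API call per statement, returning a cursor for each. A rough, untested sketch reusing ctx from the question (and assuming every statement is a SELECT, so fetch_pandas_all applies):

import pandas as pd

with open('<file_directory>') as f:
    sql_script = f.read()

dfs = []
for cursor in ctx.execute_string(sql_script):
    dfs.append(cursor.fetch_pandas_all())   # one result set per statement

pd.concat(dfs).to_csv('filename.csv')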

How to get a CSV string from querying a relational DB?

I'm querying a relational database and I need the result as a CSV string. I can't save it on disk as it is running in a serverless environment (I don't have access to the disk).
Any idea?
My solution was to use the PyGreSQL library and define this function:
import pg

def get_csv_from_db(query, cols):
    """
    Given the SQL #query and the expected #cols,
    a string formatted CSV (containing headers) is returned
    :param str query:
    :param list of str cols:
    :return str:
    """
    connection = pg.DB(
        dbname=my_db_name,
        host=my_host,
        port=my_port,
        user=my_username,
        passwd=my_password)
    header = ','.join(cols) + '\n'
    records_list = []
    for row in connection.query(query).dictresult():
        record = []
        for c in cols:
            record.append(str(row[c]))
        records_list.append(",".join(record))
    connection.close()
    return header + "\n".join(records_list)
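A hypothetical call, assuming a users table with id and name columns:

csv_str = get_csv_from_db("SELECT id, name FROM users", ["id", "name"])
print(csv_str)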
Unfortunately this solution expects the column names as input (which is not too bad IMHO) and iterates over the dictionary result in Python code.
Other solutions (especially out of the box) using other packages are more than welcome.
This is another solution based on PsycoPG and Pandas:
import psycopg2
import pandas as pd

def get_csv_from_db(query):
    """
    Given the SQL #query a string formatted CSV (containing headers) is returned
    :param str query:
    :return str:
    """
    conn = psycopg2.connect(
        dbname=my_db_name,
        host=my_host,
        port=my_port,
        user=my_username,
        password=my_password)
    cur = conn.cursor()
    cur.execute(query)
    df = pd.DataFrame(cur.fetchall(), columns=[desc[0] for desc in cur.description])
    cur.close()
    conn.commit()
    return df.to_csv()
I haven't had a chance to test it yet, though.
Here is a different approach from the other answers, using Pandas.
I suppose you have a database connection already.
For example, I'm using an Oracle database; the same can be done by using the respective library for your relational DB.
Only these two lines do the trick:
df = pd.read_sql(query, con)
df.to_csv("file_name.csv")
Here is a full example using an Oracle database:
import cx_Oracle
import pandas as pd

dsn = cx_Oracle.makedsn(ip, port, service_name)
con = cx_Oracle.connect("user", "password", dsn)
query = """select * from YOUR_TABLE"""
df = pd.read_sql(query, con)
df.to_csv("file_name.csv")
PyGreSQL's Cursor has a copy_to method. It accepts as stream a file-like object (which must have a write() method). io.StringIO meets this condition and does not need access to the disk, so it should be possible to do:
import io
csv_io = io.StringIO()
# here connect to your DB and get cursor
cursor.copy_to(csv_io, "SELECT * FROM table", format="csv", decode=True)
csv_io.seek(0)
csv_str = csv_io.read()
Explanation: many Python modules accept a file-like object, meaning you can use io.StringIO() or io.BytesIO() in place of true file handles. These mimic files opened in text and bytes modes respectively. As with files, there is a read position, so I seek back to the beginning after writing. The last line creates csv_str, which is just a plain str. Remember to adjust the SQL query to your needs.
Note: I have not tested the above code; please try it yourself and let me know whether it works as intended.

LOAD DATA LOCAL INFILE with incremental field

I have multiple unstructured txt files in a directory and I want to insert all of them into MySQL; basically, the entire content of each text file should be placed into a row. In MySQL, I have 2 columns: ID (auto increment) and LastName (nvarchar(45)). I used Python to connect to MySQL and LOAD DATA LOCAL INFILE to insert the whole content, but when I run the code I only see some messages in the Python console.
Also, when I check MySQL, I see nothing but a bunch of empty rows with IDs being automatically generated.
Here is the code:
import MySQLdb
import sys
import os

result = os.listdir("C:\\Users\\msalimi\\Google Drive\\s\\Discharge_Summary")
for x in result:
    db = MySQLdb.connect("localhost", "root", "Pass", "myblog")
    cursor = db.cursor()
    file1 = os.path.join(r'C:\\Discharge_Summary\\' + x)
    cursor.execute("LOAD DATA LOCAL INFILE '%s' INTO TABLE clamp_test" % (file1,))
    db.commit()
    db.close()
Can someone please tell me what is wrong with the code? What is the right way to achieve my goal?
I edited my code with:
.....cursor.execute("LOAD DATA LOCAL INFILE '%s' INTO TABLE clamp_test LINES TERMINATED BY '\r' (Lastname) SET id = NULL" %(file1,))
and it worked :)
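A different sketch, not part of the accepted fix: read each file in Python and insert its contents with a parameterized INSERT, which avoids LOAD DATA's line-termination handling entirely (the table and column names are reused from the question):

import os
import MySQLdb

directory = r'C:\Discharge_Summary'    # hypothetical path to the txt files
db = MySQLdb.connect("localhost", "root", "Pass", "myblog")
cursor = db.cursor()

for name in os.listdir(directory):
    with open(os.path.join(directory, name), 'r') as f:
        contents = f.read()
    # One row per file; the ID column is auto-incremented by MySQL
    cursor.execute("INSERT INTO clamp_test (LastName) VALUES (%s)", (contents,))

db.commit()
db.close()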

Python Running SQL Query With Temp Tables

I am new to the Python-SQL connectivity world. My goal is to retrieve data from SQL in a pandas DataFrame format by executing long SQL queries through my Python script.
Most of my SQL queries are long, with multiple interim temp tables before the final SELECT statement from the last temp table. When I run such a monolithic query in Python I get an error saying:
"pandas.io.sql.DatabaseError: Execution failed on sql"
Though they run absolutely fine in MS SQL Management Studio.
I suspect this is due to the interim temp tables, because if I split my long query into two pieces (everything before the final SELECT in the first section and the final SELECT in the second section) and run the two sections sequentially, they run fine.
Can someone explain why this is so, or alternatively what is the best way to run long queries with temp tables/views and retrieve the results in a pandas DataFrame?
Here is my sample Python code that ideally should take a file name as input and run the SQL to retrieve results in a DataFrame; however, it fails in the case of a query with temp tables:
import pyodbc as db
import pandas as pd

filename = 'file.sql'
username = 'XXXX'
password = 'YYYYY'
driver = '{ODBC Driver 13 for SQL Server}'
database = 'DB'
server = 'local'

conn = db.connect('DRIVER=' + driver + '; PORT=1433; SERVER=' + server +
                  '; PORT=1443; DATABASE=' + database + '; UID=' + username + '; PWD=' + password)

fd = open(filename, 'r')
sqlfile = fd.read()
fd.close()

sqlcommand1 = sqlfile
df_table = pd.read_sql(sqlcommand1, conn)
If I break my SQL query into two pieces (one with all the temp tables and a second with the final SELECT), then it runs fine. Below is a modified script that splits the long query after finding '/**/', and it works fine:
"""
This Function Reads a SQL Script From an Extrenal File and Executes The
Script in SQL. If The SQL Script Has Bunch of Tem Tables/Views
Followed By a Select Statement to Retrieve Data From Those Views Then Input
SQL File Should Have '/**/' Immediately Before the Final
Select Statement. This is to Esnure Final Select Statement is Executed on
the Temporary Views Already Run by Python.
Input is a SQL File Name and Output is a DataFrame
"""
import pyodbc as db
import pandas as pd

filename = 'filename.sql'
username = 'XXXX'
password = 'YYYYY'
driver = '{ODBC Driver 13 for SQL Server}'
database = 'DB'
server = 'local'

conn = db.connect('DRIVER=' + driver + '; PORT=1433; SERVER=' + server +
                  '; PORT=1443; DATABASE=' + database + '; UID=' + username + '; PWD=' + password)

fd = open(filename, 'r')
sqlfile = fd.read()
fd.close()

sql = sqlfile.split('/**/')
sqlcommand1 = sql[0]  # 1st section of the query with temp tables
sqlcommand2 = sql[1]  # 2nd section of the query with the final SELECT statement

conn.execute(sqlcommand1)
df_table = pd.read_sql(sqlcommand2, conn)
Quick and dirty answer: if using T-SQL put the line SET NOCOUNT ON at the beginning of your query.
Like #Parfait mentioned above, the pandas read_sql method can only support one result set. However, when you generate a temp table in T-SQL you create a result set in the form "(XX row(s) affected)", which is what causes your original query to fail. By setting NOCOUNT ON you eliminate any early returns and only get the results from your final SELECT statement.
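For example, a hedged sketch (the table and column names here are made up) of a script that would now hand pandas a single result set:

import pandas as pd

# conn is the pyodbc connection created earlier
sql = """
SET NOCOUNT ON;                                   -- suppress "(N row(s) affected)" result sets
SELECT col1, col2 INTO #stage FROM some_table;    -- interim temp table (hypothetical)
SELECT * FROM #stage;                             -- final SELECT, the only result set returned
"""
df_table = pd.read_sql(sql, conn)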
Alternatively, if using pyodbc cursor instead of pandas you can utilize nextset() to skip the result sets from the temp table(s). More info on pyodbc here.
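A rough, untested sketch of that cursor-based route, assuming sqlfile holds the whole script and conn is the pyodbc connection from above:

import pandas as pd

cursor = conn.cursor()
cursor.execute(sqlfile)                 # run the whole batch in one go
while cursor.description is None and cursor.nextset():
    pass                                # skip row-count-only result sets from the temp-table steps
columns = [col[0] for col in cursor.description]
rows = [tuple(r) for r in cursor.fetchall()]
df_table = pd.DataFrame.from_records(rows, columns=columns)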

Psycopg2 "copy_from" command, possible to ignore delimiter in quote (getting error)?

I am trying to load rows of data into Postgres in a CSV-like structure using the copy_from command (a function that utilizes the COPY command in Postgres). My data is delimited with commas (and unfortunately, since I am not the data owner, I cannot just change the delimiter). I run into a problem when I try to load a row that has a value in quotes containing a comma (i.e. that comma should not be treated as a delimiter).
For example this row of data is fine:
",Madrid,SN,,SEN,,,SN,173,157"
This row of data is not fine:
","Dominican, Republic of",MC,,YUO,,,MC,65,162",
Some code:
conn = get_psycopg_conn()
cur = conn.cursor()
_io_buffer.seek(0) #This buffer is holding the csv-like data
cur.copy_from(_io_buffer, str(table_name), sep=',', null='', columns=column_names)
conn.commit()
It looks like copy_from doesn't expose the csv mode or quote options, which are available from the underlying PostgreSQL COPY command. So you'll need to either patch psycopg2 to add them, or use copy_expert.
I haven't tried it, but something like
curs.copy_expert("""COPY mytable FROM STDIN WITH (FORMAT CSV)""", _io_buffer)
might be sufficient.
I had this same error and was able to get close to a fix based on the single line of code listed by craig-ringer. The other item I needed was to include quotes for the initial object by using df.to_csv(index=False, header=False, quoting=csv.QUOTE_NONNUMERIC, sep=','), and specifically quoting=csv.QUOTE_NONNUMERIC.
The full example of pulling one data source from MySQL and storing it in Postgres is below:
#run in python 3.6
import MySQLdb
import psycopg2
import os
from io import StringIO
import pandas as pd
import csv

mysql_db = MySQLdb.connect(host="host_address",  # your host, usually localhost
                           user="user_name",     # your username
                           passwd="source_pw",   # your password
                           db="source_db")       # name of the database
postgres_db = psycopg2.connect("host=dest_address dbname=dest_db_name user=dest_user password=dest_pw")

my_list = ['1', '2', '3', '4']

# you must create a Cursor object. It will let you execute all the queries you need
mysql_cur = mysql_db.cursor()
postgres_cur = postgres_db.cursor()

for item in my_list:
    # Pull cbi data for each state and write it to postgres
    print(item)
    mysql_sql = 'select * from my_table t \
                 where t.important_feature = \'' + item + '\';'
    # Do something to create your dataframe here...
    df = pd.read_sql_query(mysql_sql, mysql_db)

    # Initialize a string buffer
    sio = StringIO()
    sio.write(df.to_csv(index=False, header=False, quoting=csv.QUOTE_NONNUMERIC, sep=','))  # Write the Pandas DataFrame as a csv to the buffer
    sio.seek(0)  # Be sure to reset the position to the start of the stream

    # Copy the string buffer to the database, as if it were an actual file
    with postgres_db.cursor() as c:
        print(c)
        c.copy_expert("""COPY schema.new_table FROM STDIN WITH (FORMAT CSV)""", sio)
        postgres_db.commit()

mysql_db.close()
postgres_db.close()
