I am new to the Python-SQL connectivity world. My goal is to retrieve data from SQL into a pandas DataFrame by executing long SQL queries through my Python script.
Most of my SQL queries are long, with multiple interim temp tables before the final SELECT statement from the last temp table. When I run such a monolithic query in Python, I get this error:
"pandas.io.sql.DatabaseError: Execution failed on sql"
The same queries run absolutely fine in MS SQL Server Management Studio.
I suspect this is due to the interim temp tables, because if I split my long query into two pieces (everything before the final SELECT in the first section and the final SELECT in the second) and run the two sections sequentially, they run fine.
Can someone explain why this happens, or alternatively, what is the best way to run long queries with temp tables/views and retrieve the results in a pandas DataFrame?
Here is my sample Python code, which ideally should take a file name as input and run the SQL to retrieve the results in a DataFrame. However, it fails for a query with temp tables:
import pyodbc as db
import pandas as pd
filename = 'file.sql'
username = 'XXXX'
password = 'YYYYY'
driver= '{ODBC Driver 13 for SQL Server}'
database = 'DB'
server = 'local'
conn = db.connect('DRIVER='+driver+'; PORT=1433; SERVER='+server+
                  '; DATABASE='+database+'; UID='+username+'; PWD='+ password)
fd = open(filename, 'r')
sqlfile = fd.read()
fd.close()
sqlcommand1 = sqlfile
df_table = pd.read_sql(sqlcommand1, conn)
If I break my SQL query into two pieces (one with all the temp tables and a second with the final SELECT), then it runs fine. Below is a modified script that splits the long query at the '/**/' marker, and it works fine:
"""
This Function Reads a SQL Script From an External File and Executes The
Script in SQL. If The SQL Script Has a Bunch of Temp Tables/Views
Followed By a Select Statement to Retrieve Data From Those Views Then Input
SQL File Should Have '/**/' Immediately Before the Final
Select Statement. This is to Ensure Final Select Statement is Executed on
the Temporary Views Already Run by Python.
Input is a SQL File Name and Output is a DataFrame
"""
import pyodbc as db
import pandas as pd
filename = 'filename.sql'
username = 'XXXX'
password = 'YYYYY'
driver= '{ODBC Driver 13 for SQL Server}'
database = 'DB'
server = 'local'
conn = db.connect('DRIVER='+driver+'; PORT=1433; SERVER='+server+
                  '; DATABASE='+database+'; UID='+username+'; PWD='+ password)
fd = open(filename, 'r')
sqlfile = fd.read()
fd.close()
sql = sqlfile.split('/**/')
sqlcommand1 = sql[0] #1st Section of Query with temp tables
sqlcommand2 = sql[1] #2nd section of Query with final SELECT statement
conn.execute(sqlcommand1)
df_table = pd.read_sql(sqlcommand2, conn)
Quick and dirty answer: if using T-SQL, put the line SET NOCOUNT ON at the beginning of your query.
As @Parfait mentioned above, the pandas read_sql method can only handle one result set. However, when you populate a temp table in T-SQL, the statement reports a result of its own in the form "(XX row(s) affected)", which is what causes your original query to fail. Setting NOCOUNT ON suppresses these early results, so you only get the results from your final SELECT statement.
Alternatively, if you use a pyodbc cursor instead of pandas, you can call nextset() to skip the results from the temp table statement(s). See the pyodbc documentation for more info.
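One way to apply this without editing every .sql file is to prepend the statement in Python before handing the script to pd.read_sql. The helper below is a minimal sketch: with_nocount is a hypothetical name, not part of pandas or pyodbc, and it only checks whether the script already starts with the statement.

```python
def with_nocount(sql_text):
    """Return sql_text with SET NOCOUNT ON prepended, unless it already starts with it."""
    stripped = sql_text.lstrip()
    if stripped.upper().startswith("SET NOCOUNT ON"):
        # Already present: leave the script untouched
        return sql_text
    return "SET NOCOUNT ON;\n" + sql_text
```

With the variables from the question, the call would then be pd.read_sql(with_nocount(sqlfile), conn).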
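The nextset() approach might look like the sketch below. It relies only on the standard DB-API cursor members description, nextset() and fetchall(), which pyodbc cursors provide; first_result_with_columns is a hypothetical helper name. It assumes the script has already been run with cursor.execute() and that the first result carrying columns is the final SELECT.

```python
def first_result_with_columns(cursor):
    """Skip results that have no columns (e.g. row counts) and fetch the first real one.

    Returns (column_names, rows); both are empty if no result has columns.
    """
    # description is None for statements that return no rows (INSERT, SELECT ... INTO, ...)
    while cursor.description is None:
        if not cursor.nextset():
            return [], []
    columns = [col[0] for col in cursor.description]
    return columns, cursor.fetchall()
```

After cursor = conn.cursor() and cursor.execute(sqlfile), the returned values can be turned into a DataFrame with pd.DataFrame.from_records([tuple(r) for r in rows], columns=columns).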
I am trying to connect to a Snowflake database using Python. I have a .sql file in VS Code that contains multiple SQL statements. For example:
select * from table1;
select * from table2;
select * from table3;
So, I tried this code to get the result but it returned an error:
"Multiple SQL statements in a single API call are not supported; use one API call per statement instead."
My Python code is
#!/usr/bin/env python
import snowflake.connector
# Open the connection
ctx = snowflake.connector.connect(
    user='<user_name>',
    password='<password>',
    account='<account_identifier>'
)
cs = ctx.cursor()
try:
    with open('<file_directory>') as f:
        lines = f.readlines()
    cs.execute(lines)
    data_frame = cs.fetch_pandas_all()
    data_frame.to_csv('filename.csv')
finally:
    cs.close()
ctx.close()
What can I try next?
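Perhaps do as the error suggests, and limit each API call to a single SQL statement?
import pandas as pd
dfs = []
for line in lines:  # assumes one complete statement per line
    if not line.strip():
        continue  # skip blank lines
    cs.execute(line)
    dfs.append(cs.fetch_pandas_all())
pd.concat(dfs).to_csv('filename.csv')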
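Looping over lines only works if every statement sits on exactly one line. A slightly more general sketch splits the file contents on semicolons instead. split_statements_naive is a hypothetical helper, and it is deliberately naive: it would break on semicolons inside string literals or comments. (The Snowflake connector also offers execute_string() on the connection object for running multi-statement text, which may be a better fit.)

```python
def split_statements_naive(sql_text):
    """Split SQL text on ';' and drop empty fragments.

    Naive: does not understand string literals or comments that contain ';'.
    """
    return [part.strip() for part in sql_text.split(";") if part.strip()]
```

Each element can then be passed to cs.execute() in turn, as in the loop above, with the file read via f.read() rather than f.readlines().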
I am using Python to establish a DB connection and read a CSV file. For each line in the CSV, I want to run a PostgreSQL query and get the value corresponding to that line.
The DB connection and file reading work fine. Also, if I run the query with a hardcoded value, it works fine. But if I try to run the query for each row in the CSV file using a Python variable, I do not get the correct value.
cursor.execute("select team from users.teamdetails where p_id = '123abc'")
The above query works fine.
But when I try it for multiple values fetched from the CSV file, I do not get the correct value.
cursor.execute("select team from users.teamdetails where p_id = queryPID")
Complete code for Reference:
import psycopg2
import csv
conn = psycopg2.connect(dbname='', user='', password='', host='', port='')
cursor = conn.cursor()
with open('playerid.csv','r') as csv_file:
    csv_reader = csv.reader(csv_file)
    for line in csv_reader:
        queryPID = line[0]
        cursor.execute("select team from users.teamdetails where p_id = queryPID")
        team = cursor.fetchone()
        print (team[0])
conn.close()
DO NOT concatenate the csv data. Use a parameterised query.
Use %s inside your string, then pass the additional variable:
cursor.execute('select team from users.teamdetails where p_id = %s', (queryPID,))
Concatenation of text leaves your application vulnerable to SQL injection.
https://www.psycopg.org/docs/usage.html
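To see the difference in a self-contained way, the sketch below uses Python's built-in sqlite3 module with an in-memory table standing in for users.teamdetails. Note that sqlite3 uses ? as its placeholder where psycopg2 uses %s; the table contents and the lookup_team helper are made up for illustration.

```python
import sqlite3

def lookup_team(conn, p_id):
    """Return the team for p_id using a parameterised query, or None if absent."""
    row = conn.execute(
        "SELECT team FROM teamdetails WHERE p_id = ?", (p_id,)
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teamdetails (p_id TEXT, team TEXT)")
conn.executemany(
    "INSERT INTO teamdetails VALUES (?, ?)",
    [("123abc", "red"), ("456def", "blue")],
)

# Values as they might come out of csv.reader, including a hostile one
for p_id in ["123abc", "456def", "x' OR '1'='1"]:
    print(p_id, "->", lookup_team(conn, p_id))
```

The hostile value is treated as plain data and simply matches no row, and a p_id that is missing from the table yields None instead of raising on team[0].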
I have a SQLite db with three relational tables. I'm trying to return the max record from a log table along with related columns from the other tables based on the ID relationships.
I created the query in DB Browser and verified that it returns the expected record. However, when I use the exact same query statement in my Python code, it never steps into the 'for' loop.
SQL statement in python -
def GetLastLogEntry():
    readings = ()
    conn = sqlite3.connect(dbName)
    conn.row_factory = sqlite3.Row
    cursor = conn.cursor()
    cursor.execute("""SELECT f.FoodCategory, f.FoodName, gs.FoodWeight,
        gsl.GrillDateTime, gsl.CurrentGrillTemp, gsl.TargetGrillTemp,
        gsl.CurrentFoodTemp, gsl.TargetFoodTemp, gsl.CurrentOutsideTemp,
        gsl.CurrentOutsideHumidity FROM Food as f, GrillSession as gs,
        GrillSessionLog as gsl WHERE f.FoodId = gs.FoodId AND
        gs.GrillSessionID = gsl.GrillSessionID AND gsl.GrillSessionLogID =
        (SELECT MAX(GrillSessionLog.GrillSessionLogID) FROM
        GrillSessionLog, GrillSession WHERE GrillSessionLog.GrillSessionID
        = GrillSession.GrillSessionID AND GrillSession.ActiveSession =
        1)""")
    for row in cursor:
        print("In for loop")
        readings = readings + (row['FoodCategory'], row['FoodName'])
        print("Food Cat = " + row['FoodCategory'])
    cursor.close()
    return readings
The query in DB Browser returns only one row which is what I'm trying to have happen in the python code.
Just discovered the issue....
Using DB Browser, I updated a record I'm using for testing but failed to "write" the change to the database table. As a result, every time I was executing my python code against the table it was executing the query with the original record values because my change wasn't yet committed via DB Browser.
Huge brain fart on that one.... Hopefully it will be a lesson learned for someone else in the future.
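The same effect can be reproduced from Python alone. In the sketch below, two sqlite3 connections to one database file play the roles of DB Browser (the writer) and the script (the reader). The file name and the single-column table are made up for illustration, and the sketch assumes the sqlite3 module's default transaction handling, which opens a transaction implicitly before an UPDATE.

```python
import os
import sqlite3
import tempfile

def read_flag(conn):
    """Read the single ActiveSession value through the given connection."""
    return conn.execute("SELECT ActiveSession FROM GrillSession").fetchall()[0][0]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "grill.db")

    writer = sqlite3.connect(path)   # stands in for DB Browser
    writer.execute("CREATE TABLE GrillSession (ActiveSession INTEGER)")
    writer.execute("INSERT INTO GrillSession VALUES (0)")
    writer.commit()

    reader = sqlite3.connect(path)   # stands in for the Python script

    writer.execute("UPDATE GrillSession SET ActiveSession = 1")  # not committed yet
    before_commit = read_flag(reader)

    writer.commit()                  # the equivalent of writing the changes
    after_commit = read_flag(reader)

    reader.close()
    writer.close()

print(before_commit, after_commit)
```

Until the writer commits, the reader keeps seeing the old value, which is why a query filtering on ActiveSession = 1 can come back empty even though the edit is visible in the tool that made it.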
I have multiple unstructured txt files in a directory and I want to insert all of them into MySQL; basically, the entire content of each text file should be placed into one row. In MySQL, I have 2 columns: ID (auto increment) and LastName (nvarchar(45)). I used Python to connect to MySQL and LOAD DATA LOCAL INFILE to insert the whole content. But when I run the code, I see the following messages in the Python console:
.
Also, when I check MySQL, I see nothing but a bunch of empty rows with automatically generated IDs.
Here is the code:
import MySQLdb
import sys
import os
result = os.listdir("C:\\Users\\msalimi\\Google Drive\\s\\Discharge_Summary")
for x in result:
    db = MySQLdb.connect("localhost", "root", "Pass", "myblog")
    cursor = db.cursor()
    file1 = os.path.join(r'C:\\Discharge_Summary\\'+x)
    cursor.execute("LOAD DATA LOCAL INFILE '%s' INTO TABLE clamp_test" %(file1,));
    db.commit()
    db.close()
Can someone please tell me what is wrong with the code? What is the right way to achieve my goal?
I edited my code with:
.....cursor.execute("LOAD DATA LOCAL INFILE '%s' INTO TABLE clamp_test LINES TERMINATED BY '\r' (Lastname) SET id = NULL" %(file1,))
and it worked :)
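An alternative that sidesteps LOAD DATA's line handling altogether is to read each file in Python and insert its text with a parameterised INSERT. The sketch below only covers the reading half; read_text_files is a hypothetical helper, and the UTF-8 encoding is an assumption about the files.

```python
import os

def read_text_files(directory, encoding="utf-8"):
    """Return one (content,) tuple per .txt file in directory, sorted by file name."""
    rows = []
    for name in sorted(os.listdir(directory)):
        if not name.lower().endswith(".txt"):
            continue  # ignore anything that is not a text file
        with open(os.path.join(directory, name), encoding=encoding) as fh:
            rows.append((fh.read(),))
    return rows
```

The result is shaped for the DB-API call cursor.executemany("INSERT INTO clamp_test (LastName) VALUES (%s)", rows) followed by db.commit(). Keep in mind that a 45-character LastName column would not hold a whole file; a TEXT column is probably closer to what is needed.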
I'm trying to upload a pandas data frame to an SQL table. It seemed to me that pandas to_sql function is the best solution for larger data frames, but I can't get it to work. I can easily extract data, but get an error message when trying to write it to a new table:
# connect to Exasol DB
exaString='DSN=exa'
conDB = pyodbc.connect(exaString)
# get some data from somewhere, works without error
sqlString = "SELECT * FROM SOMETABLE"
data = pd.read_sql(sqlString, conDB)
# now upload this data to a new table
data.to_sql('MYTABLENAME', conDB, flavor='mysql')
conDB.close()
The error message I get is
pyodbc.ProgrammingError: ('42000', "[42000] [EXASOL][EXASolution driver]syntax error, unexpected identifier_chain2, expecting
assignment_operator or ':' [line 1, column 6] (-1)
(SQLExecDirectW)")
Unfortunately, I have no idea what the query that caused this syntax error looks like, or what else is wrong. Can someone please point me in the right direction?
(Second) EDIT:
Following Humayun's and Joris's suggestions, I now use pandas version 0.14 and SQLAlchemy in combination with the Exasol dialect (?). Since I am connecting to a defined schema, I am using the metadata option, but the program crashes with "Bus error (core dumped)".
engine = create_engine('exa+pyodbc://uid:passwd@exa/mySchemaName', echo=True)
# get some data
sqlString = "SELECT * FROM SOMETABLE" # SOMETABLE is a view in mySchemaName
df = pd.read_sql(sqlString, con=engine) # works
print engine.has_table('MYTABLENAME') # MYTABLENAME is a view in mySchemaName
# prints "True"
# upload it to a new table
meta = sqlalchemy.MetaData(engine, schema='mySchemaName')
meta.reflect(engine, schema='mySchemaName')
pdsql = sql.PandasSQLAlchemy(engine, meta=meta)
pdsql.to_sql(df, 'MYTABLENAME')
I am not sure about setting "mySchemaName" in create_engine(..), but the outcome is the same.
Pandas does not support the EXASOL syntax out of the box, so it needs to be adjusted a bit. Here is a working example of your code without SQLAlchemy:
import pyodbc
import pandas as pd
con = pyodbc.connect('DSN=EXA')
con.execute('OPEN SCHEMA TEST2')
# configure pandas to understand EXASOL as mysql flavor
pd.io.sql._SQL_TYPES['int']['mysql'] = 'INT'
pd.io.sql._SQL_SYMB['mysql']['br_l'] = ''
pd.io.sql._SQL_SYMB['mysql']['br_r'] = ''
pd.io.sql._SQL_SYMB['mysql']['wld'] = '?'
pd.io.sql.PandasSQLLegacy.has_table = \
lambda self, name: name.upper() in [t[0].upper() for t in con.execute('SELECT table_name FROM cat').fetchall()]
data = pd.read_sql('SELECT * FROM services', con)
data.to_sql('SERVICES2', con, flavor = 'mysql', index = False)
If you use the EXASolution Python package, the code would look as follows:
import exasol
con = exasol.connect(dsn='EXA') # normal pyodbc connection with additional functions
con.execute('OPEN SCHEMA TEST2')
data = con.readData('SELECT * FROM services') # pandas data frame per default
con.writeData(data, table = 'services2')
The problem is that, even in pandas 0.14, the read_sql and to_sql functions cannot deal with schemas, and using EXASOL without schemas makes no sense. This will be fixed in 0.15. If you want to use it now, look at this pull request: https://github.com/pydata/pandas/pull/7952