Set a date variable in a SQL query in Python [duplicate]

This question already has answers here:
How to pass variable values dynamically in pandas sql query
(2 answers)
Closed 4 years ago.
I want to run a SQL query in Python. I can use cx_Oracle to connect to the database:
# Build connection
import cx_Oracle

conn_str = u'username/password@host:1521/sid'
conn = cx_Oracle.connect(conn_str)
Now I'm trying to retrieve data from the database using a SQL query:
sql_select_statement = """SELECT * FROM TABLE
WHERE DATE BETWEEN '20-oct-2017' AND '30-oct-2017'"""
Assume we don't know the starting date; we only have a date variable called starting_time, whose value is a datetime in %m/%d/%Y format. Also, the ending time is yesterday, so I would like to modify my SQL query as:
sql_select_statement = """SELECT * FROM TABLE
WHERE DATE BETWEEN '20-oct-2017' AND sysdate-1"""
df = pd.read_sql(sql_select_statement, conn)
It works and generates a new df, but how do I replace '20-oct-2017' with the variable starting_time? It's inside the SQL query, and it's in datetime format, so a general Python method like '%d' % variable doesn't work. How can I solve this problem? Thanks!

Consider SQLAlchemy to connect pandas to the database, and use the params argument of pandas.read_sql to bind the variable to the SQL statement:
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("oracle+cx_oracle://username:password@host:1521/sid")
sql_select_statement = "SELECT * FROM TABLE WHERE DATE BETWEEN :my_date AND sysdate-1"
my_var = '20-oct-2017'
df = pd.read_sql(sql_select_statement, engine, params={'my_date':my_var})
Alternatively, continue to use the raw cx_Oracle connection with parameterization:
sql_select_statement = "SELECT * FROM TABLE WHERE DATE BETWEEN :my_date AND sysdate-1"
my_var = '20-oct-2017'
df = pd.read_sql(sql_select_statement, conn, params={'my_date':my_var})
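To answer the original question directly: parse starting_time into a real datetime and bind it, so the driver passes an Oracle DATE rather than a string. A minimal sketch, assuming starting_time holds a value like '10/20/2017' in the %m/%d/%Y format described above:
import datetime as dt
import pandas as pd

starting_time = '10/20/2017'  # hypothetical value in %m/%d/%Y format

# Parse the string into a datetime so the driver binds an Oracle DATE
start_date = dt.datetime.strptime(starting_time, '%m/%d/%Y')

sql_select_statement = """SELECT * FROM TABLE
WHERE DATE BETWEEN :my_date AND sysdate-1"""
df = pd.read_sql(sql_select_statement, conn, params={'my_date': start_date})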

Related

Pass Argument Through Sql Queries Pandas

I want to convert a lot of database tables into dataframes, so I tried this step manually first.
sql_query = pd.read_sql_query('''
    SELECT
    *
    FROM attendance
''', test_db_engine)
test_db_attendance_df = pd.DataFrame(sql_query)
where test_db_engine is the database connection.
This method works and I can create a dataframe for the attendance table.
Now I want to put this into a function so I can do it with any table, not just one. So I tried this method:
def sql_to_df(table_name):
    sql_query = pd.read_sql_query('''
        SELECT
        *
        FROM table_name
    ''', test_db_engine)
    test_db_df = pd.DataFrame(sql_query)
    return test_db_df
sql_to_df(attendance)
It threw me an error:
NameError: name 'attendance' is not defined
Can anyone tell me how to pass a function argument through the SQL query so I can convert any number of database tables into pandas dataframes? I need to pass attendance inside the SQL query, replacing table_name.
Python thinks that attendance is a variable, but you need to pass a string to the function and then use string replacement:
def sql_to_df(table_name):
    sql_query = pd.read_sql_query('''
        SELECT
        *
        FROM %s
    ''' % (table_name), test_db_engine)
    test_db_df = pd.DataFrame(sql_query)
    return test_db_df
sql_to_df('attendance')
Use f-strings to format your query, and pass attendance as a string (your error occurred because no variable attendance was set). Note that read_sql_query already returns a dataframe, so there is no need to wrap the result in pd.DataFrame:
def sql_to_df(table_name):
    return pd.read_sql_query(f'''
        SELECT
        *
        FROM {table_name}
    ''', test_db_engine)
sql_to_df("attendance")

use string as columns definition for DataFrame(cursor.fetchall(),columns

I would like to use a string as the column names for a pandas DataFrame.
The problem that arises is that the pandas DataFrame interprets the string variable as a single column instead of multiple ones, and thus the error:
ValueError: 1 columns passed, passed data had 11 columns
The first part of my code is intended to get the column names from the MySQL database I am about to query:
cursor1.execute("SELECT GROUP_CONCAT(COLUMN_NAME) AS cols FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'or_red' AND TABLE_NAME = 'nomen_prefix'")
for colsTableMysql in cursor1.fetchall():
    colsTable = colsTableMysql[0]
    colsTable = "'" + colsTable.replace(",", "','") + "'"
The second part uses the created variable "colsTable":
cursor = connection.cursor()
cursor.execute("SELECT * FROM or_red.nomen_prefix WHERE C_emp IN ("+emplazamientos+")")
tabla = pd.DataFrame(cursor.fetchall(),columns=[colsTable])
#tabla = exec("pd.DataFrame(cursor.fetchall(),columns=["+colsTable+"])")
#tabla = pd.DataFrame(cursor.fetchall())
I have tried other approaches like the use of exec(). In that case there is no error, but there is no information returned either, and the result of print(tabla) is None.
Is there any direct way of passing the columns dynamically as a string to a pandas DataFrame?
Thanks in advance
I am going to answer my own question since I've already found the way.
The first part of my code is intended to get the column names from the MySQL database table I am about to query:
cursor1.execute("SELECT GROUP_CONCAT(COLUMN_NAME) AS cols FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'or_red' AND TABLE_NAME = 'nomen_prefix'")
for colsTableMysql in cursor1.fetchall():
    colsTable = colsTableMysql[0]
    colsTable = "'" + colsTable.replace(",", "','") + "'"
The second part uses the created variable "colsTable" as input in the statement to define the columns.
cursor = connection.cursor()
cursor.execute("SELECT * FROM or_red.nomen_prefix WHERE C_emp IN ("+emplazamientos+")")
tabla = eval("pd.DataFrame(cursor.fetchall(),columns=["+colsTable+"])")
Using eval, the string is parsed and evaluated as a Python expression.
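As a side note, here is a sketch of an eval-free alternative, assuming the same cursors and variables as above: fetch the column names as individual rows and keep them in a Python list, which can be passed directly to DataFrame:
# Fetch the column names one per row, in table order
cursor1.execute("SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS "
                "WHERE TABLE_SCHEMA = 'or_red' AND TABLE_NAME = 'nomen_prefix' "
                "ORDER BY ORDINAL_POSITION")
cols = [row[0] for row in cursor1.fetchall()]

cursor.execute("SELECT * FROM or_red.nomen_prefix WHERE C_emp IN (" + emplazamientos + ")")
tabla = pd.DataFrame(cursor.fetchall(), columns=cols)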

Value error inserting into Postgres table with psycopg2

I've been trying to use this piece of code:
# df is the dataframe
if len(df) > 0:
    df_columns = list(df)
    # create (col1,col2,...)
    columns = ",".join(df_columns)
    # create VALUES('%s', '%s',...) one '%s' per column
    values = "VALUES({})".format(",".join(["%s" for _ in df_columns]))
    # create INSERT INTO table (columns) VALUES('%s',...)
    insert_stmt = "INSERT INTO {} ({}) {}".format(table, columns, values)
    cur = conn.cursor()
    psycopg2.extras.execute_batch(cur, insert_stmt, df.values)
    conn.commit()
    cur.close()
So that I could connect to the Postgres DB and insert values from a df.
I get these two errors for this code:
LINE 1: INSERT INTO mrr.shipments (mainFreight_freight_motherVesselD...
psycopg2.errors.UndefinedColumn: column "mainfreight_freight_mothervesseldepartdatetime" of relation "shipments" does not exist
For some reason, the columns can't be matched to the values properly.
What can I do to fix it?
You should not do your own string interpolation; let psycopg2 handle it. From the docs:
Warning Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.
Since you also have dynamic column names, you should use psycopg2.sql to create the statement and then use the standard method of passing query parameters to psycopg2 instead of using format.
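A minimal sketch of that approach, assuming the df, conn, and cur from the question (and psycopg2 2.8+ for the schema-qualified Identifier). The sql.Identifier calls double-quote each name, which also preserves the mixed case that caused the UndefinedColumn error:
from psycopg2 import sql
import psycopg2.extras

df_columns = list(df)

# Compose the statement so the table and column identifiers are safely quoted
insert_stmt = sql.SQL("INSERT INTO {} ({}) VALUES ({})").format(
    sql.Identifier("mrr", "shipments"),  # schema-qualified table name
    sql.SQL(", ").join(map(sql.Identifier, df_columns)),
    sql.SQL(", ").join(sql.Placeholder() * len(df_columns)),
)

cur = conn.cursor()
# Render the composed statement to a plain string for execute_batch and
# convert the numpy values to native Python types
psycopg2.extras.execute_batch(cur, insert_stmt.as_string(cur), df.values.tolist())
conn.commit()
cur.close()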

Passing a list of values from Python to the IN clause of an SQL query

I am trying to pass a list like the one below to a SQL query:
x = ['1000000000164774783','1000000000253252111']
I am using sqlalchemy and pyodbc to connect to SQL Server:
import pandas as pd
from pandas import Series,DataFrame
import pyodbc
import sqlalchemy
cnx = sqlalchemy.create_engine("mssql+pyodbc://Omnius:MainBrain1@172.31.163.135:1433/Basis?driver=/opt/microsoft/sqlncli/lib64/libsqlncli-11.0.so.1790.0")
I tried using various string-formatting functions in the SQL query. Below is one of them:
xx = ', '.join(x)
sql = "select * from Pretty_Txns where Send_Customer in (%s)" % xx
df = pd.read_sql(sql,cnx)
All of them seem to convert the list into a numeric format,
(1000000000164774783, 1000000000253252111), instead of ('1000000000164774783','1000000000253252111'),
and hence the query fails because the datatype of Send_Customer is varchar(50) in SQL:
ProgrammingError: (pyodbc.ProgrammingError) ('42000', '[42000] [Microsoft][SQL Server Native Client 11.0]
[SQL Server]Error converting data type varchar to numeric. (8114) (SQLExecDirectW)')
[SQL: 'select * from Pretty_Txns where Send_Customer in (1000000000164774783, 1000000000253252111)']
As stated in the comment to the other answer, that approach can fail for a variety of reasons. What you really want to do is create an SQL statement with the required number of parameter placeholders and then use the params= parameter of .read_sql_query() to supply the values:
x = ['1000000000164774783','1000000000253252111']
placeholders = ','.join('?' for i in range(len(x))) # '?,?'
sql = f"select * from Pretty_Txns where Send_Customer in ({placeholders})"
df = pd.read_sql_query(sql, cnx, params=x)
Here's the SQL query you need (note that this interpolates the values directly rather than binding them, and it breaks for a single-element list, where tuple(x) renders a trailing comma):
sql = f"select * from Pretty_Txns where Send_Customer in {tuple(x)}"
df = pd.read_sql(sql,cnx)
I used the below approach and it worked fine:
sql = "select * from Pretty_Txns where Send_Customer in %s" % str(tuple(x))
df = pd.read_sql(sql,cnx)
Making sqlalchemy and pyodbc work with pandas read_sql() is a hairy and messy thing. After much frustration and bumping into various solutions and documentation from pandas and psycopg, here's the correct (so far) way to do a query with a named parameter:
import pandas as pd
import psycopg2
# import pyodbc
import sqlalchemy
from sqlalchemy import text  # this is crucial

cnx = sqlalchemy.create_engine(...)
x = ['1000000000164774783', '1000000000253252111']

# Note how the sql string is cast with text() and a key-value pair is
# passed for the named parameter 'id'
sql = "select * from Pretty_Txns where Send_Customer in (:id);"  # named parameter
df = pd.read_sql(text(sql), cnx, params={'id': x})
df.head()
I've made it work with a PostgreSQL database. I hope it isn't too different for MySQL.
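For reference, SQLAlchemy's documented way to bind a whole list to an IN clause is an "expanding" bind parameter, which renders one placeholder per list element at execution time. A sketch, assuming the cnx and x above:
from sqlalchemy import text, bindparam

# The expanding parameter turns "in :ids" into "in (?, ?, ...)" when executed
stmt = text("select * from Pretty_Txns where Send_Customer in :ids")
stmt = stmt.bindparams(bindparam('ids', expanding=True))
df = pd.read_sql(stmt, cnx, params={'ids': x})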

How to Use a Wildcard (%) in Pandas read_sql()

I am trying to run a MySQL query that has a text wildcard in it, as demonstrated below:
import sqlalchemy
import pandas as pd
#connect to mysql database
engine = sqlalchemy.create_engine('mysql://user:@localhost/db?charset=utf8')
conn = engine.connect()
#read sql into pandas dataframe
mysql_statement = """SELECT * FROM table WHERE field LIKE '%part%'; """
df = pd.read_sql(mysql_statement, con=conn)
When run I get the error as shown below related to formatting.
TypeError: not enough arguments for format string
How can I use a wild card when reading MySQL with Pandas?
You can use the text() function from sqlalchemy:
from sqlalchemy import create_engine, text
import pandas as pd

mysql_statement = """SELECT * FROM table WHERE field LIKE '%part%'; """
df = pd.read_sql(text(mysql_statement), con=conn)
Under the hood, pandas uses whatever SQL engine you've given it to parse the statement. In your case that's sqlalchemy, so you need to figure out how it handles %. It might be as easy as escaping it with LIKE '%%part%%' (see the sketch after the psycopg example below).
In the case of psycopg, you use the params variable like this:
mysql_statement = """SELECT * FROM table WHERE field LIKE %s; """
df = pd.read_sql(mysql_statement, con=conn, params=("%part%",))
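For the escaping route mentioned above, a minimal sketch (assuming the same conn; whether the doubling is needed depends on how the driver applies its pyformat parameter substitution):
# %% is a literal % where pyformat-style substitution is applied
mysql_statement = """SELECT * FROM table WHERE field LIKE '%%part%%'; """
df = pd.read_sql(mysql_statement, con=conn)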
You need to use the params option of the pandas read_sql method:
# string interpolation
mysql_statement = """SELECT * FROM table WHERE field LIKE %s; """
df = pd.read_sql(mysql_statement, con=conn, params=("%part%",))

# variable interpolation
val = 'part'
mysql_statement = """SELECT * FROM table WHERE field LIKE %s; """
df = pd.read_sql(mysql_statement, con=conn, params=('%' + val + '%',))
select * from _table_ where "_column_name_" like 'somethin%' order by ...
Putting double quotes around the _column_name_ solved this problem for me.
