Pandas read_sql with where clause using "in"


Help!
I need to query a table with an "in" clause, where the SQL looks like this:
select * from some_table where some_field in (?)
I originally took a naive approach and tried this:
in_items = [1,2,3,4]
df = pd.read_sql(MY_SQL_STATEMENT, con=con, params=[in_items])
Which did not work, it throws the following error:
The SQL contains 1 parameter markers, but 4 parameters were supplied
Where I'm stuck is figuring out how to pass a list of items as a single parameter.
I can do a string concatenation approach, something like:
MY_SQL = 'select * from tableA where fieldA in ({})'.format(
    ','.join([str(x) for x in in_items]))
df = pd.read_sql(MY_SQL, con=con)
I would rather avoid this approach if possible. Does anybody know of a way to pass a list of values as a single parameter?
I'm also open to a possibly cleverer way to do this. :)

Simply string-format the placeholders, then pass your params into pandas.read_sql. Note that the placeholder marker depends on the DB-API driver: pyodbc and sqlite3 use the qmark style ?, while most others (e.g. psycopg2) use %s. The code below assumes the former:
in_items = [1,2,3,4]
MY_SQL = 'select * from tableA where fieldA in ({})'\
.format(', '.join(['?' for _ in in_items]))
# select * from tableA where fieldA in (?, ?, ?, ?)
df = pd.read_sql(MY_SQL, con=con, params=in_items)  # the list itself: one value per ? marker
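A minimal runnable sketch of the same idea, assuming sqlite3 (qmark style) and a made-up in-memory table; `build_in_query` is a hypothetical helper name. The returned `(sql, params)` pair is what you would hand to `pd.read_sql(sql, con, params=params)`; here a plain sqlite3 connection runs it so the example needs only the standard library.

```python
import sqlite3


def build_in_query(base_sql, items):
    """Expand the single '{}' slot in base_sql into one '?' per item (hypothetical helper)."""
    items = list(items)
    if not items:
        # Many databases reject an empty "IN ()", so fail early instead.
        raise ValueError("IN list must not be empty")
    placeholders = ", ".join("?" for _ in items)
    # Only the ? markers go into the SQL text; the values stay as bound parameters.
    return base_sql.format(placeholders), items


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tableA (fieldA INTEGER, label TEXT)")
con.executemany("INSERT INTO tableA VALUES (?, ?)",
                [(1, "a"), (2, "b"), (3, "c"), (5, "e")])

sql, params = build_in_query(
    "SELECT fieldA, label FROM tableA WHERE fieldA IN ({}) ORDER BY fieldA",
    [1, 2, 3, 4])
rows = con.execute(sql, params).fetchall()
print(sql)   # SELECT fieldA, label FROM tableA WHERE fieldA IN (?, ?, ?, ?) ORDER BY fieldA
print(rows)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

The empty-list guard is a design choice, not something the answer requires; drop it if your database accepts `IN ()`.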

Using sqlite3, this worked for me:
import sqlite3
import pandas as pd
cell_list = pd.read_excel('.table_with_entries.xlsx')
list_of_entries_to_retrive = cell_list['entries'].tolist()
conn = sqlite3.connect('DataBase.db')
queryString = "SELECT * FROM table WHERE attribute IN ('{}');".format("','".join(list_of_entries_to_retrive))
df = pd.read_sql(queryString, con=conn)
It did not work this way (queryString has no ? placeholders, so there is nothing to bind the list to):
df = pd.read_sql(queryString, con=conn, params=[list_of_entries_to_retrive])
Thanks
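The quoted-string approach above breaks as soon as a value contains an apostrophe, while one `?` per value does not. A small sketch contrasting the two, assuming an in-memory SQLite table with made-up names (`people`, `name`):

```python
import sqlite3

names = ["Alice", "O'Brien"]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (name TEXT)")
con.executemany("INSERT INTO people VALUES (?)", [(n,) for n in names])

# String-built query: the apostrophe in O'Brien ends the SQL literal early.
unsafe_sql = "SELECT name FROM people WHERE name IN ('{}')".format("','".join(names))
try:
    con.execute(unsafe_sql)
    unsafe_failed = False
except sqlite3.OperationalError:
    unsafe_failed = True

# Placeholder query: one ? per value, and the driver binds the values.
safe_sql = "SELECT name FROM people WHERE name IN ({}) ORDER BY name".format(
    ", ".join("?" for _ in names))
rows = con.execute(safe_sql, names).fetchall()
print(unsafe_failed)  # True
print(rows)           # [('Alice',), ("O'Brien",)]
```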

Related

use python variable to read specific rows from access table using sqlalchemy

I have an access table called "Cell_list" with a key column called "Cell_#". I want to read the table into a dataframe, but only the rows that match indices which are specified in a python list "cell_numbers".
I tried several variations on:
import pyodbc
import pandas as pd
cell_numbers = [1,3,7]
cnn_str = r'Driver={Microsoft Access Driver (*.mdb,*.accdb)};DBQ=C:\folder\myfile.accdb;'
conn = pyodbc.connect(cnn_str)
query = ('SELECT * FROM Cell_list WHERE Cell_# in '+tuple(cell_numbers))
df = pd.read_sql(query, conn)
But no matter what I try I get a syntax error.
How do I do this?

Consider the best practice of parameterization, which pandas.read_sql supports:
# PREPARED STATEMENT, NO DATA
query = (
'SELECT * FROM Cell_list '
'WHERE [Cell_#] IN (?, ?, ?)'
)
# RUN SQL WITH BOUND PARAMS
df = pd.read_sql(query, conn, params=cell_numbers)
You can also build the qmark placeholders dynamically from the length of cell_numbers:
qmarks = ', '.join('?' for _ in cell_numbers)
query = (
'SELECT * FROM Cell_list '
f'WHERE [Cell_#] IN ({qmarks})'
)
df = pd.read_sql(query, conn, params=cell_numbers)
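Why the question's concatenation fails, and the dynamic-placeholder fix, sketched with SQLite standing in for Access (an assumption made so the example runs anywhere; SQLite also accepts `[bracketed]` identifiers, so `[Cell_#]` works unchanged). Note that even `str(tuple(...))` would not give valid SQL in general: a one-element tuple renders with a trailing comma.

```python
import sqlite3

cell_numbers = [1, 3, 7]

# The question's code: str + tuple is a TypeError, not a SQL problem.
try:
    query = 'SELECT * FROM Cell_list WHERE Cell_# in ' + tuple(cell_numbers)
    concat_failed = False
except TypeError:
    concat_failed = True

# str() of a one-element tuple keeps the trailing comma, which is invalid SQL.
print(str(tuple([1])))  # (1,)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cell_list ([Cell_#] INTEGER, name TEXT)")
conn.executemany("INSERT INTO Cell_list VALUES (?, ?)",
                 [(n, f"cell{n}") for n in range(1, 9)])

# One ? per list element, values bound by the driver.
qmarks = ', '.join('?' for _ in cell_numbers)
query = f'SELECT [Cell_#], name FROM Cell_list WHERE [Cell_#] IN ({qmarks}) ORDER BY [Cell_#]'
rows = conn.execute(query, cell_numbers).fetchall()
print(rows)  # [(1, 'cell1'), (3, 'cell3'), (7, 'cell7')]
```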

Convert (join) cell_numbers to text, giving:
cell_text = '(1,3,7)'
and concatenate this onto the query.
The finished SQL should read (you may need brackets around the weird field name Cell_#):
SELECT * FROM Cell_list WHERE [Cell_#] IN (1,3,7)
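If you do build the literal text, a sketch that derives it from `cell_numbers` instead of hard-coding it. Coercing each value with `int()` is an assumed safeguard that makes this tolerable only for integer keys (non-numeric input raises instead of reaching the SQL); note that `int()` also silently truncates floats such as 3.9, so parameter placeholders remain the safer option.

```python
cell_numbers = [1, 3, 7]

# int() raises ValueError for anything that is not integer-like,
# e.g. the string "1; DROP TABLE Cell_list", so it never reaches the SQL text.
cell_text = "(" + ",".join(str(int(n)) for n in cell_numbers) + ")"
query = "SELECT * FROM Cell_list WHERE [Cell_#] IN " + cell_text
print(query)  # SELECT * FROM Cell_list WHERE [Cell_#] IN (1,3,7)
```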

How To Pass Multiple Values As Parameters In Cursor.Execute Method Of HANA Python Connector

I am using the HANA Python Connector's Cursor.execute(sql, hanaparams) method, whose parameters are the SQL statement and hanaparams. My query looks like this:
"SELECT * FROM TABLE WHERE COLUMN1 IN(?)" and my parameters are VALUE1, VALUE2, passed as a list/tuple.
I am unable to retrieve the result set, whereas when I run the query in HANA with the input parameters hard-coded, it runs perfectly fine.
I am following this tutorial https://developers.sap.com/tutorials/hana-clients-python.html
Any pointers on how I should pass multiple values in params?

Something simple like this seems to work just fine. The number of ? markers must equal the number of parameters you pass; in your case, the single ? takes only VALUE1.
from hdbcli import dbapi
conn = dbapi.connect(
key='HDBKEY'
)
cursor = conn.cursor()
parameters = [11, "2020-12-24"]
params = '?,'*len(parameters)
params2 = params[0:-1]
sql_command2 = f"SELECT {params2} FROM DUMMY;"
cursor.execute(sql_command2, parameters)
rows = cursor.fetchall()
for row in rows:
    for col in row:
        print("%s" % col, end=" ")
    print(" ")
cursor.close()
conn.close()
So instead of SELECT * FROM TABLE WHERE COLUMN1 IN(?), it should be SELECT * FROM TABLE WHERE COLUMN1 IN(?, ?).
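The same rule applies to any qmark-style driver. Since hdbcli needs a running HANA server, here is a sketch using Python's sqlite3 as a stand-in (an assumption; HANA's error text and behaviour on a mismatch will differ): one `?` bound to two values is rejected, and generating one `?` per value works.

```python
import sqlite3

values = ["VALUE1", "VALUE2"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column1 TEXT)")
conn.executemany("INSERT INTO t VALUES (?)",
                 [("VALUE1",), ("VALUE2",), ("VALUE3",)])

# One marker, two values: sqlite3 rejects the binding-count mismatch.
try:
    conn.execute("SELECT * FROM t WHERE column1 IN (?)", values)
    single_marker_ok = True
except sqlite3.Error as exc:
    single_marker_ok = False
    print(exc)

# One marker per value.
marks = ", ".join(["?"] * len(values))
rows = conn.execute(
    f"SELECT * FROM t WHERE column1 IN ({marks}) ORDER BY column1", values
).fetchall()
print(rows)  # [('VALUE1',), ('VALUE2',)]
```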

INSERT INTO SELECT based on a dataframe

I have a dataframe df and I want to execute a query that inserts all the values from the dataframe into a table. Basically I am trying to run the following query:
INSERT INTO mytable
SELECT *
FROM mydataframe
For that I have the following code:
import pyodbc
import pandas as pd
connection = pyodbc.connect('Driver={' + driver + '};'
    'Server=' + server + ';'
    'UID=' + user + ';'
    'PWD=' + password + ';')
cursor = connection.cursor()
query = 'SELECT * FROM [myDB].[dbo].[myTable]'
df = pd.read_sql_query(query, connection)
sql = 'INSERT INTO [dbo].[new_date] SELECT * FROM :x'
cursor.execute(sql, x=df)
connection.commit()
However, I am getting the following error:
TypeError: execute() takes no keyword arguments
Does anyone know what I am doing wrong?

For a raw DB-API insert from pandas, consider DataFrame.to_numpy() with executemany and avoid any Python-level for loop. However, the append query must list its columns explicitly: adjust the columns and qmark parameter placeholders below to correspond to the data frame's columns.
# PREPARED STATEMENT
sql = '''INSERT INTO [dbo].[new_date] (Col1, Col2, Col3, ...)
VALUES (?, ?, ?, ...)
'''
# EXECUTE PARAMETERIZED QUERY
cursor.executemany(sql, df.to_numpy().tolist())
connection.commit()
(As an aside, it is generally best practice in SQL to reference columns explicitly and avoid SELECT * for readability, maintainability, and even performance.)
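A runnable sketch of the executemany pattern, with sqlite3 standing in for SQL Server (an assumption). `rows` stands in for `df.to_numpy().tolist()` and `columns` for `list(df.columns)`; the column names are hypothetical. Generating both the column list and the placeholders from the same list keeps them from drifting apart.

```python
import sqlite3

columns = ["Col1", "Col2", "Col3"]       # stand-in for list(df.columns)
rows = [[1, "a", 2.5], [2, "b", 3.5]]    # stand-in for df.to_numpy().tolist()

col_sql = ", ".join(f"[{c}]" for c in columns)
marks = ", ".join("?" for _ in columns)
sql = f"INSERT INTO new_date ({col_sql}) VALUES ({marks})"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE new_date (Col1 INTEGER, Col2 TEXT, Col3 REAL)")
cursor = conn.cursor()
cursor.executemany(sql, rows)  # one statement, many parameter rows
conn.commit()

print(sql)  # INSERT INTO new_date ([Col1], [Col2], [Col3]) VALUES (?, ?, ?)
print(conn.execute("SELECT COUNT(*) FROM new_date").fetchone()[0])  # 2
```

With pyodbc against SQL Server, setting `cursor.fast_executemany = True` before calling `executemany` is often considerably faster for large frames.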

I had some issues connecting pandas to SQL Server too, but this solution worked for me to write my df:
import pandas as pd
import pyodbc
import sqlalchemy
engine = sqlalchemy.create_engine('mssql+pyodbc://{0}:{1}@{2}:{3}/{4}?driver={5}'.format(username,password,server,port,bdName,driver))
df.to_sql("TableName", con=engine, if_exists="append")
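In a SQLAlchemy URL the credentials are separated from the host by `@`, and a password or driver name containing special characters (spaces, `@`, `/`) has to be URL-encoded. A stdlib-only sketch of building that string; `mssql_pyodbc_url` and all the values are hypothetical placeholders. On SQLAlchemy 1.4+, `sqlalchemy.engine.URL.create(...)` is an alternative that handles the escaping for you.

```python
from urllib.parse import quote, quote_plus


def mssql_pyodbc_url(username, password, server, port, db_name, driver):
    """Build a mssql+pyodbc URL string (hypothetical helper)."""
    # Credentials: percent-encode everything, including '@', '/' and spaces.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    # Query-string value: spaces become '+', as in SQLAlchemy's own examples.
    drv = quote_plus(driver)
    return f"mssql+pyodbc://{user}:{pwd}@{server}:{port}/{db_name}?driver={drv}"


url = mssql_pyodbc_url("user", "p@ss word", "myserver", 1433, "myDB",
                       "ODBC Driver 17 for SQL Server")
print(url)
# mssql+pyodbc://user:p%40ss%20word@myserver:1433/myDB?driver=ODBC+Driver+17+for+SQL+Server
```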

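See below my favourite solution, with an UPSERT statement included. Note that INSERT ... ON CONFLICT ... DO UPDATE with EXCLUDED is PostgreSQL syntax, and %s is psycopg2's placeholder style, so this will not run as-is on SQL Server.
df_columns = list(df)
columns = ','.join(df_columns)
values = 'VALUES({})'.format(','.join(['%s' for col in df_columns]))
update_list = ['{} = EXCLUDED.{}'.format(col, col) for col in df_columns]
update_str = ','.join(update_list)
insert_stmt = "INSERT INTO {} ({}) {} ON CONFLICT ([your_pkey_here]) DO UPDATE SET {}".format(table, columns, values, update_str)
# execute once per DataFrame row (cursor/connection from a psycopg2 connection)
cursor.executemany(insert_stmt, df.values.tolist())
connection.commit()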
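The same statement-building idea, made runnable against SQLite, which (from version 3.24) also supports `INSERT ... ON CONFLICT ... DO UPDATE` with an `excluded` row. SQLite is an assumed stand-in for PostgreSQL here, so the placeholders are `?` instead of `%s`; the table, columns and `id` primary key are hypothetical. Leaving the key out of the SET list is a design choice: updating it to its own value is harmless but redundant.

```python
import sqlite3

table = "prices"
df_columns = ["id", "item", "price"]  # stand-in for list(df)
pkey = "id"

columns = ",".join(df_columns)
values = "VALUES({})".format(",".join("?" for _ in df_columns))
update_str = ",".join(f"{c} = excluded.{c}" for c in df_columns if c != pkey)
insert_stmt = (f"INSERT INTO {table} ({columns}) {values} "
               f"ON CONFLICT ({pkey}) DO UPDATE SET {update_str}")

conn = sqlite3.connect(":memory:")  # UPSERT needs SQLite 3.24+
conn.execute("CREATE TABLE prices (id INTEGER PRIMARY KEY, item TEXT, price REAL)")
conn.executemany(insert_stmt, [(1, "apple", 1.0), (2, "pear", 2.0)])
conn.executemany(insert_stmt, [(2, "pear", 2.5), (3, "plum", 3.0)])  # id 2 is updated
rows = conn.execute("SELECT * FROM prices ORDER BY id").fetchall()
print(rows)  # [(1, 'apple', 1.0), (2, 'pear', 2.5), (3, 'plum', 3.0)]
```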

cursor.execute does not accept keyword arguments. One way to do the insert is the snippet below, which uses bracket-quoted column names and one ? marker per column for pyodbc:
cols = ",".join(f"[{c}]" for c in df.columns)
placeholders = ",".join("?" for _ in df.columns)
sql = "INSERT INTO [dbo].[new_date] (" + cols + ") VALUES (" + placeholders + ")"
# Insert DataFrame records one by one.
for i, row in df.iterrows():
    cursor.execute(sql, tuple(row))
connection.commit()
Here you iterate through each row and insert it into the table, one statement per row.

Thank you for your answers :) but I used the following code to solve my problem:
import urllib.parse
import pandas as pd
import sqlalchemy
params = urllib.parse.quote_plus("DRIVER={SQL Server};SERVER=servername;DATABASE=database;UID=user;PWD=pass")
engine = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
# query and connection are the SELECT statement and pyodbc connection from the question
df = pd.read_sql_query(query, connection)
df.to_sql(name='new_table', con=engine, index=False, if_exists='append')
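`quote_plus` is needed here because the raw ODBC string contains `;`, `=`, `{`, `}` and spaces, which cannot appear as-is in a URL query value. A small stdlib sketch of what it produces, using the placeholder connection details from the answer:

```python
from urllib.parse import quote_plus, unquote_plus

odbc_str = "DRIVER={SQL Server};SERVER=servername;DATABASE=database;UID=user;PWD=pass"
params = quote_plus(odbc_str)  # percent-encodes ;={} and turns spaces into '+'
url = "mssql+pyodbc:///?odbc_connect=%s" % params

print(params)
# DRIVER%3D%7BSQL+Server%7D%3BSERVER%3Dservername%3BDATABASE%3Ddatabase%3BUID%3Duser%3BPWD%3Dpass
print(unquote_plus(params) == odbc_str)  # True: decoding restores the original string
```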

Correct way to "select * from tbl where field in ?" and the placeholder is a list without string interpolation

I have a query of this form using pysqlite:
query = "select * from tbl where field1 in ?"
variables = ['Aa', 'Bb']
In a query, I'd like this to work:
with conn.cursor() as db:
res = db.execute(query, (variables,)).fetchall()
i.e., interpreted on the SQLite command line as:
select * from tbl where field1 in ('Aa', 'Bb');
But this fails with:
pysqlite3.dbapi2.InterfaceError: Error binding parameter 0 - probably unsupported type.
I understand I could just join the list into the query string, but this is unsafe. How can I use placeholder parameters and a list in SQLite with Python?
Update
What differentiates this from similar questions on Stack Overflow: they seem to be looking to use %s string interpolation, which I am looking to avoid.

Question: WHERE field IN ? and the placeholder is a list without string interpolation
Values are a tuple of ints:
values = (42, 43, 44)
Prepare your query with the matching number of bindings (only the ? markers are formatted into the string; the values themselves are still bound by the driver):
bindings = '?,'*len(values)
QUERY = "SELECT * FROM t1 WHERE id IN ({});".format(bindings[:-1])
print(QUERY)
Output:
SELECT * FROM t1 WHERE id IN (?,?,?);
Execute the query:
cur.execute(QUERY, values)
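If you really want a single placeholder for the whole list, as the question asks, one alternative (not what the answer above does) is to pass the list as one JSON string and unpack it inside SQLite with `json_each`. A sketch, assuming your SQLite build includes the JSON functions (built in by default since SQLite 3.38 and commonly enabled in earlier builds) and a made-up `tbl` table:

```python
import json
import sqlite3

variables = ['Aa', 'Bb']

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (field1 TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?)", [("Aa",), ("Bb",), ("Cc",)])

# One ? regardless of list length: json_each turns the JSON array into rows.
query = ("SELECT field1 FROM tbl "
         "WHERE field1 IN (SELECT value FROM json_each(?)) ORDER BY field1")
res = conn.execute(query, (json.dumps(variables),)).fetchall()
print(res)  # [('Aa',), ('Bb',)]
```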

Pandas read_sql_query using multiple AND statements

I am trying to put together a SQL query in Python pandas. I have attempted different methods, but I always get the following error:
Incorrect number of bindings supplied. The current statement uses 6, and there are 3 supplied.
My code is as follows. What am I doing wrong?
conn = sqlite3.connect(my_db)
df = pd.read_sql_query(
con = conn,
sql = """SELECT * FROM {table}
WHERE @var1 = (?)
AND @var2 = (?)
AND @var3 = (?)
;""".format(table=table),
params= (value1, value2, value3),
)

As you've said, @var1, @var2 and @var3 are all column names.
However, SQLite interprets a name prefixed with @ as a named parameter whose value you supply later in your code.
So your SQL expects 6 values: three for the (?) markers and three for the @vars. But you're only supplying 3 values (for the (?) markers), which is why the error occurs.
I would recommend renaming your columns without '@' so that there is less chance for errors.
See this question for further clarification.

SQLite interprets the @ symbol, like ?, as a parameter placeholder (search for "Parameters" in the SQLite documentation). If @var1 is the name of a column, it must be escaped by surrounding it with backticks (SQLite also accepts the SQL-standard double quotes):
df = pd.read_sql_query(
    con=conn,
    sql="""SELECT * FROM {table}
WHERE `@var1` = (?)
AND `@var2` = (?)
AND `@var3` = (?)""".format(table=table),
    params=(value1, value2, value3),
)
I agree with @AdiC, though: it would be more convenient to rename your columns so that they do not use characters with special meaning.
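A small reproduction of the error and the fix, assuming an in-memory SQLite table with hypothetical columns named `@var1` to `@var3`. SQLite numbers the unquoted `@var` names and the `?` markers together, which is where the "uses 6" count comes from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t ("@var1" INTEGER, "@var2" INTEGER, "@var3" INTEGER)')
conn.execute("INSERT INTO t VALUES (1, 2, 3)")
params = (1, 2, 3)

# Unquoted, each @name is itself a named parameter, so SQLite sees 6 bindings.
bad_sql = "SELECT * FROM t WHERE @var1 = (?) AND @var2 = (?) AND @var3 = (?)"
try:
    conn.execute(bad_sql, params)
    bad_failed = False
except sqlite3.Error as exc:
    bad_failed = True
    print(exc)  # should report 6 bindings used but 3 supplied

# Quoted as identifiers, only the three ? markers remain.
good_sql = "SELECT * FROM t WHERE `@var1` = (?) AND `@var2` = (?) AND `@var3` = (?)"
good_rows = conn.execute(good_sql, params).fetchall()
print(good_rows)  # [(1, 2, 3)]
```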
