How to send parameters with f-strings in a SQLite query in Python

How can I send a parameter to a query? This is my code:
import pandas as pd
import sqlite3
def query_brand(filter):
    sql_query = pd.read_sql(f'SELECT * FROM ps_lss_brands WHERE label = {filter}',
                            self.conn_brand)
    df = pd.DataFrame(sql_query, columns=['id_brand', 'label'])
    # print(df["id_brand"][0])
    print(df)

query_brand("ACURA")
This is the error that I get:
pandas.errors.DatabaseError: Execution failed on sql 'SELECT * FROM ps_lss_brands WHERE label=ACURA': no such column: ACURA
My column is label, but the query is trying to look for an ACURA column.

There is an issue in the line that calls pd.read_sql.
Please change your SQL query to include quotation marks around {filter}.
Specifically, make that line something like this:
sql_query = pd.read_sql(f"SELECT * FROM ps_lss_brands WHERE label = '{filter}'",
self.conn_brand)
However, you should try to avoid this altogether, and instead use parameterized queries. This will prevent SQL injection.
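For example, a minimal parameterized sketch of the same lookup (using sqlite3's ? placeholder, which pandas passes through via the params argument) could look like this:
sql_query = pd.read_sql('SELECT * FROM ps_lss_brands WHERE label = ?',
                        self.conn_brand, params=(filter,))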

Related

Python SQLite3: how to get records that match a list of values in a column, then place them into a pandas df

I am not experienced with SQL or SQLite3.
I have a list of ids from another table. I want to use the list as a key in my query and get all records based on the list. I want the SQL query to feed directly into a DataFrame.
import pandas as pd
import sqlite3
cnx = sqlite3.connect('c:/path/to/data.sqlite')
# the below values are ones found in "s_id"
id_list = ['C20','C23','C25','C28', ... ,'C83']
# change list to sql string.
id_sql = ", ".join(str(x) for x in id_list)
df = pd.read_sql_query(f"SELECT * FROM table WHERE s_id in ({id_sql})", cnx)
I am getting a DatabaseError: Execution failed on sql 'SELECT * FROM ... : no such column: C20.
When I saw this error, I thought the code just needed a simple switch, so I tried this:
df = pd.read_sql_query(f"SELECT * FROM table WHERE ({id_sql}) in s_id", cnx)
it did not work.
So how can I get this to work?
The table looks like this:

id  | s_id | date      | assigned_to | date_complete | notes
----+------+-----------+-------------+---------------+----------
0   | C10  | 1/6/2020  | Jack        | 1/8/2020      | None
1   | C20  | 1/10/2020 | Jane        | 1/12/2020     | Call back
2   | C23  | 1/11/2020 | Henry       | 1/12/2020     | finished
n   | C83  | ... rows of more data ...
n+1 | D85  | 9/10/2021 | Jeni        | 9/12/2021     | Call back
Currently, you are missing the single quotes around your literal values, and consequently the SQLite engine assumes you are attempting to query columns. However, avoid concatenation of values altogether and instead bind them to parameters, which pandas.read_sql supports with the params argument:
# the below values are ones found in "s_id"
id_list = ['C20','C23','C25','C28', ... ,'C83']
# build equal length string of ? place holders
prm_list = ", ".join("?" for _ in id_list)
# build prepared SQL statement
sql = f"SELECT * FROM table WHERE s_id IN ({prm_list})"
# run query, passing parameters and values separately
df = pd.read_sql(sql, con=cnx, params=id_list)
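For a five-element id_list, the generated statement would read SELECT * FROM table WHERE s_id IN (?, ?, ?, ?, ?); SQLite binds each value at execution time, so no quoting or escaping is needed.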
So the problem is that the SQL string is missing single quote marks: each s_id value in the IN list needs a ' on each side. (Note this still interpolates values directly into the SQL, so the parameterized version above is preferable for untrusted input.)
import pandas as pd
import sqlite3
cnx = sqlite3.connect('c:/path/to/data.sqlite')
# the below values are ones found in "s_id"
id_list = ['C20','C23','C25','C28', ... ,'C83']
# change list to sql string.
id_sql = "', '".join(str(x) for x in id_list)
df = pd.read_sql_query(f"SELECT * FROM table WHERE s_id in ('{id_sql}')", cnx)

SQL Parameter in Python script

I'm looking to do a simple read_sql in pandas, using a variable to extract data from SQL Server, as such:
import pyodbc as cnn
import pandas as pd
cursor = cnn.Cursor
cnxn = cnn.connect('DRIVER={SQL Server};SERVER=SQLSERVER;DATABASE=DATABASE')
x = "FirstName"
tableResult = pd.read_sql(("SELECT * FROM TABLE where COLUMN = ?"),cnxn,index_col=None, coerce_float=True, params=x)
I get the following error however:
Execution failed on sql 'SELECT * FROM TABLE where COLUMN = ?': ('The SQL contains 1 parameter markers, but 9 parameters were supplied', 'HY000')
What am I doing wrong here?
I also tried this, which runs, but I don't know how to actually grab the results here:
tableResult2 = cnxn.execute("SELECT * FROM TABLE where COLUMN = ?", x)
It would be like this:
import pyodbc as cnn
import pandas as pd

cnxn = cnn.connect('DRIVER={SQL Server};SERVER=SQLSERVER;DATABASE=DATABASE')
x = "FirstName"
# note the trailing space after TABLE so the two pieces
# don't run together as "TABLEwhere"
query = ("SELECT * FROM TABLE "
         f"where COLUMN = '{x}'")
tableResult = pd.read_sql(query, cnxn)
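As an aside, the original "9 parameters were supplied" error happened because params expects a sequence: a bare string like "FirstName" is iterated character by character, so pyodbc saw nine parameters for a single ? marker. If you would rather keep the query parameterized (which also avoids SQL injection), wrapping the value in a list should work:
tableResult = pd.read_sql("SELECT * FROM TABLE where COLUMN = ?", cnxn, params=[x])
As for the cnxn.execute attempt, pyodbc returns a cursor, so tableResult2.fetchall() would retrieve the rows.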

Reading an SQL query into a Dask DataFrame

I'm trying to create a function that takes an SQL SELECT query as a parameter and uses dask to read its results into a dask DataFrame via the dask.dataframe.read_sql_query function. I am new to dask and to SQLAlchemy.
I first tried this:
import dask.dataframe as dd
query = "SELECT name, age, date_of_birth from customer"
df = dd.read_sql_query(sql=query, con=con_string, index_col="name", npartitions=10)
As you probably already know, this won't work because the sql parameter has to be an SQLAlchemy selectable and more importantly, TextClause isn't supported.
I then wrapped the query behind a select like this:
import dask.dataframe as dd
from sqlalchemy import sql
query = "SELECT name, age, date_of_birth from customer"
sa_query = sql.select(sql.text(query))
df = dd.read_sql_query(sql=sa_query, con=con_string, index_col="name")
This fails too, with a very weird error that I have been trying to solve. The problem is that dask needs to infer the types of the columns, and it does so by reading the first head_rows rows in the table - 5 rows by default - and inferring the types from them. This line in the dask codebase adds a LIMIT ? to the query, which ends up being
SELECT name, age, date_of_birth from customer LIMIT param_1
The param_1 doesn't get substituted at all with the right value - 5 in this case. It then fails on the next line, https://github.com/dask/dask/blob/main/dask/dataframe/io/sql.py#L119, which evaluates the SQL expression.
sqlalchemy.exc.ProgrammingError: (mariadb.ProgrammingError) You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'SELECT name, age, date_of_birth from customer
LIMIT ?' at line 1
[SQL: SELECT SELECT name, age, date_of_birth from customer
LIMIT ?]
[parameters: (5,)]
(Background on this error at: https://sqlalche.me/e/14/f405)
I can't understand why param_1 wasn't substituted with the value of head_rows. One can see from the error message that it detects there's a parameter to substitute, but for some reason it doesn't actually substitute it.
Perhaps, I didn't correctly create the SQLAlchemy selectable?
I can simply use pandas.read_sql and create a dask dataframe from the resulting pandas dataframe but that defeats the purpose of using dask in the first place.
I have the following constraints:

- I cannot change the function to accept a ready-made sqlalchemy selectable. This feature will be added to a private library used at my company, and various projects using this library do not use sqlalchemy.
- Passing meta to the custom function is not an option because it would require the caller to create it. However, passing a meta attribute to read_sql_query and setting head_rows=0 is completely OK as long as there's an efficient way to retrieve/create it.
- While dask-sql might work for this case, using it is not an option, unfortunately.
How can I go about correctly reading an SQL query into dask dataframe?
The crux of the problem is this line:
sa_query = sql.select(sql.text(query))
What is happening is that we are constructing a nested SELECT query, which can cause a problem downstream.
Let's first create a test database:
# create a test database (using https://stackoverflow.com/a/64898284/10693596)
from sqlite3 import connect
from dask.datasets import timeseries

con = "delete_me_test.sqlite"
db = connect(con)

# create a pandas df and store it (timestamp is dropped to make sure
# that the index is numeric)
df = (
    timeseries(start="2000-01-01", end="2000-01-02", freq="1h", seed=0)
    .compute()
    .reset_index()
)
df.to_sql("ticks", db, if_exists="replace")
Next, let's try to get things working with pandas without sqlalchemy:
from pandas import read_sql_query
con = "sqlite:///test.sql"
query = "SELECT * FROM ticks LIMIT 3"
meta = read_sql_query(sql=query, con=con).set_index("index")
print(meta)
# id name x y
# index
# 0 998 Ingrid 0.760997 -0.381459
# 1 1056 Ingrid 0.506099 0.816477
# 2 1056 Laura 0.316556 0.046963
Now, let's add sqlalchemy functions:
from pandas import read_sql_query
from sqlalchemy.sql import text, select
con = "sqlite:///test.sql"
query = "SELECT * FROM ticks LIMIT 3"
sa_query = select(text(query))
meta = read_sql_query(sql=sa_query, con=con).set_index("index")
# OperationalError: (sqlite3.OperationalError) near "SELECT": syntax error
# [SQL: SELECT SELECT * FROM ticks LIMIT 3]
# (Background on this error at: https://sqlalche.me/e/14/e3q8)
Note the SELECT SELECT due to running sqlalchemy.select on an existing query. This can cause problems. How to fix this? In general, I don't think there's a safe and robust way of transforming arbitrary SQL queries into their sqlalchemy equivalent, but if this is for an application where you know that users will only run SELECT statements, you can manually sanitize the query before passing it to sqlalchemy.select:
from dask.dataframe import read_sql_query
from sqlalchemy.sql import select, text
con = "sqlite:///test.sql"
query = "SELECT * FROM ticks"
def _remove_leading_select_from_query(query):
    if query.startswith("SELECT "):
        return query.replace("SELECT ", "", 1)
    else:
        return query
sa_query = select(text(_remove_leading_select_from_query(query)))
ddf = read_sql_query(sql=sa_query, con=con, index_col="index")
print(ddf)
print(ddf.head(3))
# Dask DataFrame Structure:
# id name x y
# npartitions=1
# 0 int64 object float64 float64
# 23 ... ... ... ...
# Dask Name: from-delayed, 2 tasks
# id name x y
# index
# 0 998 Ingrid 0.760997 -0.381459
# 1 1056 Ingrid 0.506099 0.816477
# 2 1056 Laura 0.316556 0.046963

Use a string as the columns definition for DataFrame(cursor.fetchall(), columns=...)

I would like to use a string as column names for a pandas DataFrame.
The problem that arises is that pandas DataFrame interprets the string variable as a single column instead of multiple ones, and thus the error:
ValueError: 1 columns passed, passed data had 11 columns
The first part of my code is intended to get the column names from the Mysql database I am about to query:
cursor1.execute("SELECT GROUP_CONCAT(COLUMN_NAME) AS cols FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'or_red' AND TABLE_NAME = 'nomen_prefix'")
for colsTableMysql in cursor1.fetchall():
    colsTable = colsTableMysql[0]
    colsTable = "'" + colsTable.replace(",", "','") + "'"
The second part uses the created variable "colsTable" :
cursor = connection.cursor()
cursor.execute("SELECT * FROM or_red.nomen_prefix WHERE C_emp IN ("+emplazamientos+")")
tabla = pd.DataFrame(cursor.fetchall(),columns=[colsTable])
#tabla = exec("pd.DataFrame(cursor.fetchall(),columns=["+colsTable+"])")
#tabla = pd.DataFrame(cursor.fetchall())
I have tried other approaches like the use of exec(). In that case, there is no error, but there is no response with information either, and the result of print(tabla) is None.
Is there any direct way of passing the columns dynamically as a string to a pandas DataFrame?
Thanks in advance.
I am going to answer my own question, since I've already found the way.
The first part of my code is intended to get the column names from the Mysql database table I am about to query:
cursor1.execute("SELECT GROUP_CONCAT(COLUMN_NAME) AS cols FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'or_red' AND TABLE_NAME = 'nomen_prefix'")
for colsTableMysql in cursor1.fetchall():
    colsTable = colsTableMysql[0]
    colsTable = "'" + colsTable.replace(",", "','") + "'"
The second part uses the created variable "colsTable" as input in the statement to define the columns.
cursor = connection.cursor()
cursor.execute("SELECT * FROM or_red.nomen_prefix WHERE C_emp IN ("+emplazamientos+")")
tabla = eval("pd.DataFrame(cursor.fetchall(),columns=["+colsTable+"])")
Using eval the string is parsed and evaluated as a Python expression.
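For the record, exec() always returns None, which is why the exec() attempt in the question printed None. A sketch of an eval-free alternative, assuming the same cursors and variables as above: since GROUP_CONCAT returns a single comma-separated string, splitting it yields the column list directly, with no quoting or eval needed.
cursor1.execute("SELECT GROUP_CONCAT(COLUMN_NAME) AS cols FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'or_red' AND TABLE_NAME = 'nomen_prefix'")
cols_list = cursor1.fetchone()[0].split(",")  # plain Python list of column names
cursor.execute("SELECT * FROM or_red.nomen_prefix WHERE C_emp IN (" + emplazamientos + ")")
tabla = pd.DataFrame(cursor.fetchall(), columns=cols_list)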

How to Use a Wildcard (%) in Pandas read_sql()

I am trying to run a MySQL query that has a text wildcard in it, as demonstrated below:
import sqlalchemy
import pandas as pd
#connect to mysql database
engine = sqlalchemy.create_engine('mysql://user:#localhost/db?charset=utf8')
conn = engine.connect()
#read sql into pandas dataframe
mysql_statement = """SELECT * FROM table WHERE field LIKE '%part%'; """
df = pd.read_sql(mysql_statement, con=conn)
When run, I get the error shown below, related to formatting.
TypeError: not enough arguments for format string
How can I use a wild card when reading MySQL with Pandas?
from sqlalchemy import create_engine, text
import pandas as pd
mysql_statement = """SELECT * FROM table WHERE field LIKE '%part%'; """
df = pd.read_sql( text(mysql_statement), con=conn)
You can use the text() function from sqlalchemy.
Under the hood of pandas, it's using whatever sql engine you've given it to parse the statement. In your case that's sqlalchemy, so you need to figure out how it handles %. It might be as easy as escaping it with LIKE '%%part%%'.
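For instance, a minimal sketch of that escaping route (assuming the driver uses %s-style parameter markers, as mysqlclient and pymysql do, so a literal % must be doubled):
mysql_statement = """SELECT * FROM table WHERE field LIKE '%%part%%'; """
df = pd.read_sql(mysql_statement, con=conn)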
In the case of psycopg, you use the params variable like this:
mysql_statement = """SELECT * FROM table WHERE field LIKE %s; """
df = pd.read_sql(mysql_statement, con=conn, params=("%part%",))
You need to use the params option of the pandas read_sql method.
# string interpolation
mysql_statement = """SELECT * FROM table WHERE field LIKE %s; """
df = pd.read_sql(mysql_statement, con=conn, params=("%part%",))

# variable interpolation
val = 'part'
mysql_statement = """SELECT * FROM table WHERE field LIKE %s; """
df = pd.read_sql(mysql_statement, con=conn, params=('%' + val + '%',))
select * from _table_ where "_column_name_" like 'somethin%' order by ...
Putting double quotes around the _column_name_ solved this problem for me.
