I have the following problem: I have a really large SQL statement inside Python code in string form:
sql = f"""
*many statements here*
"""
Part of that SQL statement is:
where 1 = 1
and selector in ('YES', 'NO')
AND parameter1 = value1
AND parameter2 = value2.1 OR parameter2 = value2.2
AND ...
where those AND/OR statements are given by a Python dictionary in the form
{ parameter1: [value1], parameter2: [value2.1, value2.2], ...}
I've written a function which takes that dictionary and unfolds it to become a string in the form:
AND (parameter1 = value1) AND ((parameter2 = value2.1) OR (parameter2 = value2.2)) AND ...
and inserted that string into the large SQL statement via this function:
where 1 = 1
and selector in ('YES', 'NO')
{form_sql_statement_from_dictionary(dictionary)}
but it seems that this approach is vulnerable to SQL injection attacks. The safe way would be to parametrise the large SQL statement, but since I don't know how many parameters and values there will be in the dictionary, I don't know how to do that. Also, I can't change the large SQL statement; I have to build that AND/OR fragment and insert it into the existing string in a safe way. Is there any way of doing that, rather than trying to police the dictionary values themselves?
Full Python script looks like this:
async def query_for_data(
    connection: "PgService", dictionary: Dict[str, Any]
) -> pd.DataFrame:
    sql = f"""
    *multiple SQL statements*
    where 1 = 1
    and selector in ('YES', 'NO')
    {form_sql_statement_from_dictionary(dictionary)}
    """
    res = await connection.fetch(sql)
    data = pd.DataFrame(res, columns=[k for k in res[0].keys()])
    return data
Function looks like this:
def form_sql_statement_from_dictionary(
        dictionary: Dict[str, Any]) -> str:
    # `prefix` is not defined in this snippet; presumably it is a table alias such as 'x.'
    hashvalue = list(dictionary.values())
    scope = hashvalue[0]["scope"]
    dictionary_element_names = list(scope.keys())
    statement_elements = []
    for element_name in dictionary_element_names:
        dictionary_element_values = scope[element_name]
        if len(dictionary_element_values) == 1:
            dictionary_element_value = dictionary_element_values[0]
            statement_element = (
                f"( {prefix}{element_name} = '{dictionary_element_value}' )"
            )
            statement_elements.append(statement_element)
        else:
            statement_or_elements = []
            for dictionary_element_value in dictionary_element_values:
                statement_element = (
                    f"{prefix}{element_name} = '{dictionary_element_value}'"
                )
                statement_or_elements.append(statement_element)
            final_or_statement = "( " + " OR ".join(statement_or_elements) + ")"
            statement_elements.append(final_or_statement)
    final_statement = " AND " + " AND ".join(statement_elements)
    return final_statement
Details found here:
https://www.psycopg.org/docs/usage.html
First, build the SQL where clause with positional arguments, such as...
(x.col1 = %s) AND (x.col2 = %s OR x.col3 = %s)
At the same time create a list of those parameters
['foo', 'foo', 'bar']
Then use a parameterised query...
cursor = connection.cursor()
cursor.execute(sql, parameters)
data = await cursor.fetchall()
The parameterisation will quote and escape all the parameters for you, so no SQL Injection attacks from those.
BUT the column names are still being substituted into the query directly. There is no built-in way to protect you from that. If a user has direct control over those strings, they can still hack you that way.
As such it is vital that you police, validate, whatever, those column names yourself, through whatever means are appropriate to your use case.
All in all, the revised python would look something like...
def form_sql_statement_from_dictionary(dictionary):
    hashvalue = list(dictionary.values())
    scope = hashvalue[0]["scope"]
    dictionary_element_names = list(scope.keys())
    prefix='ummmmmmmmmm.'
    statement_elements = []
    statement_params = []
    for element_name in dictionary_element_names:
        statement_elements.append(
            ' OR '.join(
                f"{prefix}{element_name} = %s"
                for item in scope[element_name]
            )
        )
        statement_params += scope[element_name]
    return '(' + ') AND ('.join(statement_elements) + ')', statement_params
async def query_for_data(
    connection: "PgService", dictionary: Dict[str, Any]
) -> pd.DataFrame:
    sql_where_clause, sql_params = form_sql_statement_from_dictionary(dictionary)
    sql = f"""
    *multiple SQL statements*
    where 1 = 1
    and selector in ('YES', 'NO')
    and {sql_where_clause}
    """
    cursor = connection.cursor()
    cursor.execute(sql, sql_params)
    res = await cursor.fetchall()
    data = pd.DataFrame(res, columns=[k for k in res[0].keys()])
    return data
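One way to police the column names is an allowlist check inside the builder itself. The sketch below is only an illustration of that idea: the name build_where_clause, the allowed_columns argument and the 'x.' prefix are assumptions made for the example, not part of the original code, and it takes the inner scope dictionary directly.

```python
from typing import Any, Dict, Iterable, List, Tuple


def build_where_clause(
    scope: Dict[str, List[Any]],
    allowed_columns: Iterable[str],
    prefix: str = "x.",  # hypothetical table alias
) -> Tuple[str, List[Any]]:
    """Return a parameterised WHERE fragment plus the list of values to bind."""
    allowed = set(allowed_columns)
    groups: List[str] = []
    params: List[Any] = []
    for column, values in scope.items():
        # Column names cannot be bound as parameters, so reject anything unexpected.
        if column not in allowed:
            raise ValueError(f"unexpected column name: {column!r}")
        if not values:
            continue  # nothing to filter on for this column
        groups.append(" OR ".join(f"{prefix}{column} = %s" for _ in values))
        params.extend(values)
    if not groups:
        return "1 = 1", []  # neutral condition keeps "and {clause}" valid
    return "(" + ") AND (".join(groups) + ")", params


clause, params = build_where_clause(
    {"col1": ["foo"], "col2": ["a", "b"]}, allowed_columns={"col1", "col2"}
)
```

The values never touch the SQL text; only names that passed the allowlist do.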
Say for example, I have a table of students, and I have a Python dictionary
mydict = {"fname" : "samwise", "lname" : "gamgee", "age" : 13}
How can I safely generate a Python function that can UPDATE this into my student table? (In my use-case I can safely assume that the student already exists in the table, AND I KNOW the id already)
I have created a function that achieves this functionality, but I can't help but think it's a bit crude, and perhaps open to SQL injection attacks
def sqlite_update(conn, table, data, pkeyname, pkeyval):
    set_lines = ""
    for k,v in data.items():
        set_lines += "{} = '{}', ".format(k,v)
    set_lines = set_lines[:-2] #remove space and comma from last item
    sql = "UPDATE {0} SET {1} WHERE {2} = '{3}'"
    statement = sql.format(table, set_lines, pkeyname, pkeyval)
    conn.execute(statement)
    conn.commit()
And to update I just call
sqlite_update(conn, "student", mydict, "id", 1)
I assume you are using sqlalchemy. In this case, you can use the sqlalchemy.sql.text function with bound parameters, which escapes strings if required.
You can try to adjust your function as below.
from sqlalchemy.sql import text
def sqlite_update(conn, table, data, pkeyname, pkeyval):
    set_lines = ",".join([f"{k}=:{k}" for k in data.keys()])
    statement = text(f"UPDATE {table} SET {set_lines} WHERE {pkeyname} = :pkeyval")
    args = dict(data)
    args["pkeyval"] = pkeyval
    conn.execute(statement, args)
    conn.commit()
For more details, refer to sqlalchemy official documentation on text function.
EDIT
As for sqlite3 connection you can do basically the same thing as above, with slight changes.
def sqlite_update(conn, table, data, pkeyname, pkeyval):
    set_lines = ",".join([f"{k}=:{k}" for k in data.keys()])
    statement = f"UPDATE {table} SET {set_lines} WHERE {pkeyname} = :pkeyval"
    args = dict(data)
    args["pkeyval"] = pkeyval
    conn.execute(statement, args)
    conn.commit()
Refer to sqlite3 execute
This is indeed wide open to SQL injection, because you build the query as a string including its values, instead of using a parameterized query.
Building a parameterized query with Python is easy:
def sqlite_update(conn, table, data, pkeyname, pkeyval):
    query = f"UPDATE {table} SET " + ', '.join(
        "{}=?".format(k) for k in data.keys()) + f" WHERE {pkeyname}=?"
    # uncomment next line for debugging
    # print(query, list(data.values()) + [pkeyval])
    conn.execute(query, list(data.values()) + [pkeyval])
With your example, the query displayed by the debug print line is:
UPDATE student SET fname=?, lname=?, age=? WHERE id=?
with the following values list: ['samwise', 'gamgee', 13, 1]
But beware, to be fully protected from SQL injection, you should sanitize the table and field names to ensure they contain no dangerous characters like ;
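As a rough illustration of that last point, the sketch below rejects any table or field name that is not a plain identifier before it is interpolated. The helper names check_identifier and safe_sqlite_update and the regular expression are assumptions made for this example; a stricter variant could compare the names against the actual schema instead.

```python
import re
import sqlite3

# Assumption: only plain ASCII identifiers are legitimate table/column names here.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def check_identifier(name: str) -> str:
    """Return name unchanged if it is a plain identifier, otherwise raise."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe SQL identifier: {name!r}")
    return name


def safe_sqlite_update(conn, table, data, pkeyname, pkeyval):
    check_identifier(table)
    check_identifier(pkeyname)
    # Names are validated and interpolated; values are bound with ? placeholders.
    assignments = ", ".join(f"{check_identifier(k)} = ?" for k in data)
    query = f"UPDATE {table} SET {assignments} WHERE {pkeyname} = ?"
    conn.execute(query, [*data.values(), pkeyval])
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, fname TEXT, lname TEXT, age INTEGER)"
)
conn.execute("INSERT INTO student (id, fname, lname, age) VALUES (1, 'sam', 'g', 12)")
safe_sqlite_update(
    conn, "student", {"fname": "samwise", "lname": "gamgee", "age": 13}, "id", 1
)
row = conn.execute("SELECT fname, lname, age FROM student WHERE id = 1").fetchone()
```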
I am using case when in sqlAlchemy as shown below:
abc = "%abc%"
def = "%def%"
proj1 = "%project1%"
proj2 = "%project2%"
case_condition = case([
(text('FPro.status = "ON" and Ab.name.like (''' + abc + ''') and F.project.like (''' + proj1 ''')'''), 'value1'),
(text('FPro.status = "ON" and Ab.name.like (''' + abc + ''') and F.project.like (''' + proj2 + ''')'''), 'value2'),
(text('FPro.status = "OFF" and Ab.name.like (''' + def + ''') and F.project.like (''' + abc + ''')'''), 'value3')]).label (deriver_vals)
query = db.session.query(F)\
.join(FPro, F.id == FPro.f_id)\
.join(Ab, Ab.id == F.ab_id).with_entities(FPro.f_id, case_condition,
F.f_name, F.proj,
FPro.status).subquery()
main_query = db.session.query(Tags).join(query, Tags.tag == query.c.derived_vals).\
with_entities(query.c.f_id.label('f_id'), query.c.derived_vals.label('derived_vals'), Tags.id.label('tag_id')).all()
The above code generates a sql statement like below:
SELECT anon_1.f_id AS f_id, anon_1.derived_vals AS derived_vals, fw_tags.id AS tag_id
FROM fw_tags INNER JOIN (SELECT fpro.f_id AS f_id, CASE WHEN FPro.status = "ON" and Ab.name.like (%%abc%%) and F.project.like (%%proj1%%) THEN %(param_1)s WHEN FPro.status = "ON" and Ab.name.like ( %%abc%%) and Flow.project.like ( %%proj2%%) THEN %(param_2)s WHEN FPro.status = "ON" and Ab.name.like (%%def%%) and F.project.like (%%proj1%%) THEN %(param_3)s END AS derived_vals, F.f_name , F.proj AS project,
FROM F INNER JOIN FPro ON f.id = Fpro.f_id INNER JOIN Ab ON Ab.id = F.ab_id) AS anon_1 ON fw_tags.tag = anon_1.derived_vals]
This is exactly the query I want but I am getting below error while executing the script which contains above code:
sqlalchemy.exc.ProgrammingError: (pymysql.err.ProgrammingError) (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '(%abc%) and F.project.like (%proj1%) THEN 'value1' WHEN FPro' at line 2")
I am guessing the error is caused by the '%' getting appended, but I am not sure, so any help regarding why this error is occurring or what can be done to prevent the % getting added will be appreciated.
Is there a way to do this without using text or literal()?
In standard SQL "ON" is a delimited/quoted identifier, not a (text) literal, identifying a column, table, or some other schema object. Use single quotes instead: 'ON'. Some DBMS have modes that allow using double quotes for literals, or even attempt to infer the meaning from context, but perhaps it is not a good idea to get into the habit.
In Python 'FPro.status ...''' is the concatenation of 2 string literals, 'FPro.status ...' and '', not a single literal with an escaped single quote in it.
Please do not concatenate or otherwise manually format values to SQL queries, it is error prone as you have found out. It should be obvious from the generated SQL that the values are concatenated as is, without proper quoting, and so produce the incorrect statement. The correct way to pass values to (raw) SQL queries is to use placeholders, or in case of SQLAlchemy, use the SQL Expression Language.
Using placeholders:
abc = "%abc%"
def_ = "%def%"
proj1 = "%project1%"
proj2 = "%project2%"
# Using **raw SQL** fragments
case_condition = case([
    (text("FPro.status = 'ON' AND Ab.name LIKE :abc AND F.project LIKE :proj1"), 'value1'),
    (text("FPro.status = 'ON' AND Ab.name LIKE :abc AND F.project LIKE :proj2"), 'value2'),
    (text("FPro.status = 'OFF' AND Ab.name LIKE :def_ AND F.project LIKE :abc"), 'value3')
])
query = db.session.query(F)\
    .join(FPro, F.id == FPro.f_id)\
    .join(Ab, Ab.id == F.ab_id)\
    .with_entities(
        FPro.f_id,
        case_condition.label('derived_vals'),
        F.f_name,
        F.proj,
        FPro.status)\
    .subquery()
main_query = db.session.query(Tags)\
    .join(query, Tags.tag == query.c.derived_vals)\
    .with_entities(
        query.c.f_id.label('f_id'),
        query.c.derived_vals.label('derived_vals'),
        Tags.id.label('tag_id'))\
    .params(abc=abc, def_=def_, proj1=proj1, proj2=proj2)\
    .all()
Using the expression language:
from sqlalchemy import and_
abc = "%abc%"
def_ = "%def%"
proj1 = "%project1%"
proj2 = "%project2%"
# Using SQLAlchemy SQL Expression Language DSL **in Python**
case_condition = case([
    (and_(FPro.status == 'ON', Ab.name.like(abc), F.project.like(proj1)), 'value1'),
    (and_(FPro.status == 'ON', Ab.name.like(abc), F.project.like(proj2)), 'value2'),
    (and_(FPro.status == 'OFF', Ab.name.like(def_), F.project.like(abc)), 'value3')
])
query = db.session.query(F)\
    .join(FPro, F.id == FPro.f_id)\
    .join(Ab, Ab.id == F.ab_id)\
    .with_entities(
        FPro.f_id,
        case_condition.label('derived_vals'),
        F.f_name,
        F.proj,
        FPro.status)\
    .subquery()
main_query = db.session.query(Tags)\
    .join(query, Tags.tag == query.c.derived_vals)\
    .with_entities(
        query.c.f_id.label('f_id'),
        query.c.derived_vals.label('derived_vals'),
        Tags.id.label('tag_id'))\
    .all()
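To see the placeholder idea in isolation, without the models from the question, here is a small sketch using only the standard-library sqlite3 module. The table f and its rows are made up for the example; the point is just that the LIKE patterns, including their % wildcards, are passed as bound parameters and never concatenated into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single table standing in for the joined F/FPro/Ab models.
conn.execute(
    "CREATE TABLE f (id INTEGER PRIMARY KEY, status TEXT, name TEXT, project TEXT)"
)
conn.executemany(
    "INSERT INTO f (status, name, project) VALUES (?, ?, ?)",
    [
        ("ON", "xx abc xx", "my project1"),
        ("ON", "abc", "project2 beta"),
        ("OFF", "def", "abc"),
    ],
)

query = """
    SELECT id,
           CASE
               WHEN status = 'ON' AND name LIKE :abc AND project LIKE :proj1 THEN 'value1'
               WHEN status = 'ON' AND name LIKE :abc AND project LIKE :proj2 THEN 'value2'
               WHEN status = 'OFF' AND name LIKE :def_ AND project LIKE :abc THEN 'value3'
           END AS derived_vals
    FROM f
    ORDER BY id
"""
# The wildcards live in the parameter values, not in the SQL string.
params = {"abc": "%abc%", "def_": "%def%", "proj1": "%project1%", "proj2": "%project2%"}
rows = conn.execute(query, params).fetchall()
```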
I am pretty new to Python development. I have a long Python script that "clones" a database and adds additional stored functions and procedures. Clone here means copying only the schema of the DB. These steps work fine.
My question is about pymysql insert execution:
I have to copy some table contents into the new DB. I don't get any SQL error. If I debug or print the generated INSERT INTO command, it is correct (I've tested it in an SQL editor). The insert seems to execute correctly because the result contains the exact row count... but all rows are missing from the destination table in the destination DB.
(Of course the DB_* variables have been defined!)
import datetime
import pymysql
liveDbConn = pymysql.connect(DB_HOST, DB_USER, DB_PWD, LIVE_DB_NAME)
testDbConn = pymysql.connect(DB_HOST, DB_USER, DB_PWD, TEST_DB_NAME)
tablesForCopy = ['role', 'permission']
for table in tablesForCopy:
    with liveDbConn.cursor() as liveCursor:
        # Get name of columns
        liveCursor.execute("DESCRIBE `%s`;" % (table))
        columns = ''
        for column in liveCursor.fetchall():
            columns += '`' + column[0] + '`,'
        columns = columns.strip(',')
        # Get and convert values
        values = ''
        liveCursor.execute("SELECT * FROM `%s`;" % (table))
        for result in liveCursor.fetchall():
            data = []
            for item in result:
                if type(item)==type(None):
                    data.append('NULL')
                elif type(item)==type('str'):
                    data.append("'"+item+"'")
                elif type(item)==type(datetime.datetime.now()):
                    data.append("'"+str(item)+"'")
                else: # for numeric values
                    data.append(str(item))
            v = '(' + ', '.join(data) + ')'
            values += v + ', '
        values = values.strip(', ')
        print("### table: %s" % (table))
        testDbCursor = testDbConn.cursor()
        testDbCursor.execute("INSERT INTO `" + TEST_DB_NAME + "`.`" + table + "` (" + columns + ") VALUES " + values + ";")
        print("Result: {}".format(testDbCursor._result.message))
liveDbConn.close()
testDbConn.close()
Result is:
### table: role
Result: b"'Records: 16 Duplicates: 0 Warnings: 0"
### table: permission
Result: b'(Records: 222 Duplicates: 0 Warnings: 0'
What am I doing wrong? Thanks!
You have 2 main issues here:
You don't call conn.commit() (here that would be testDbConn.commit(), since that is the connection doing the inserts). Changes to the database are not persisted until they are committed. Statements that modify data need committing; a SELECT, for example, does not.
Your query is open to SQL injection. This is a serious problem.
Table names cannot be parameterized, so there's not much we can do about that, but you'll want to parameterize your values. I've made several corrections to the code, both to the type checking and to the parameterization.
for table in tablesForCopy:
    with liveDbConn.cursor() as liveCursor:
        # Table names cannot be bound as parameters; `table` comes from the fixed list above
        liveCursor.execute("SELECT * FROM `%s`;" % (table))
        name_of_columns = [item[0] for item in liveCursor.description]
        insert_list = []
        for result in liveCursor.fetchall():
            data = []
            for item in result:
                if item is None: # test identity against the None singleton
                    data.append(None) # the driver sends None as SQL NULL
                elif isinstance(item, str): # Use isinstance to check type
                    data.append(item)
                elif isinstance(item, datetime.datetime):
                    data.append(item.strftime('%Y-%m-%d %H:%M:%S'))
                else: # for numeric values
                    data.append(str(item))
            insert_list.append(data)
    testDbCursor = testDbConn.cursor()
    # Quote each column name and use one bare %s placeholder per column
    columns = ', '.join('`{}`'.format(name) for name in name_of_columns)
    placeholders = ', '.join(['%s'] * len(name_of_columns))
    testDbCursor.executemany("INSERT INTO `{}`.`{}` ({}) VALUES ({})".format(
        TEST_DB_NAME,
        table,
        columns,
        placeholders),
        insert_list)
    testDbConn.commit()
From this github thread, I notice that executemany does not work as expected in psycopg2; it instead sends each entry as a single query. If you are using psycopg2 rather than pymysql, you'll need to use execute_batch (and note that PostgreSQL quotes identifiers with double quotes, not backticks):
from psycopg2.extras import execute_batch
pg_columns = ', '.join('"{}"'.format(name) for name in name_of_columns)
execute_batch(testDbCursor,
              'INSERT INTO "{}"."{}" ({}) VALUES ({})'.format(TEST_DB_NAME,
                                                              table,
                                                              pg_columns,
                                                              placeholders),
              insert_list)
testDbConn.commit()
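The effect of the missing commit can be reproduced without a MySQL server. The sketch below uses the standard-library sqlite3 module and a temporary file purely as a stand-in (that substitution is an assumption of this example, not what the question uses): the insert reports a row count, yet a second connection sees nothing until commit() is called.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")
    writer = sqlite3.connect(path)  # plays the role of testDbConn
    reader = sqlite3.connect(path)  # a second, independent connection
    writer.execute("CREATE TABLE role (id INTEGER PRIMARY KEY, name TEXT)")
    writer.commit()

    cur = writer.execute("INSERT INTO role (name) VALUES (?)", ("admin",))
    rows_reported = cur.rowcount  # the driver already reports the inserted row

    # The insert is still inside an open transaction, so other connections see nothing.
    visible_before = reader.execute("SELECT COUNT(*) FROM role").fetchall()[0][0]

    writer.commit()
    visible_after = reader.execute("SELECT COUNT(*) FROM role").fetchall()[0][0]

    writer.close()
    reader.close()
```

This matches the symptom in the question: a plausible "Records: N" message followed by an empty destination table.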
How to insert data into a table using Python pymysql
Find my solution below
import pymysql
import datetime
# Create a connection object
dbServerName = "127.0.0.1"
port = 8889
dbUser = "root"
dbPassword = ""
dbName = "blog_flask"
# charSet = "utf8mb4"
conn = pymysql.connect(host=dbServerName, user=dbUser, password=dbPassword, db=dbName, port=port)
try:
    # Create a cursor object
    cursor = conn.cursor()
    # Insert rows into the MySQL Table
    now = datetime.datetime.utcnow()
    my_datetime = now.strftime('%Y-%m-%d %H:%M:%S')
    # Values are passed separately so the driver quotes and escapes them
    cursor.execute(
        'INSERT INTO posts (post_id, post_title, post_content, filename, post_time) '
        'VALUES (%s, %s, %s, %s, %s)',
        (5, 'title2', 'description2', 'filename2', my_datetime))
    conn.commit()
except Exception as e:
    print("Exception occurred: {}".format(e))
finally:
    conn.close()
I am trying to write code that will allow a user to select specific columns from a sqlite database, which will then be transformed into a pandas data frame. I am using a test database titled test_database.db with a table titled test. The table has three columns: id, value_one, and value_two. The function I am showing exists within a class that establishes a connection to the database, and the user only needs to pass the table name and a list of columns that they would like to extract. For instance, in command-line sqlite I might type the command select value_one, value_two from test if I wanted to read only the columns value_one and value_two from the table test. If I type this command into the command line, it works. However, when I use Python to build the text string that is fed into pandas.read_sql_query(), the method does not work. My code is shown below:
class ReadSQL:
    def __init__(self, database):
        self.database = database
        self.conn = sqlite3.connect(self.database)
        self.cur = self.conn.cursor()
    def query_columns_to_dataframe(table, columns):
        query = 'select '
        for i in range(len(columns)):
            query = query + columns[I] + ', '
        query = query[:-2] + ' from ' + table
        # print(query)
        df = pd.read_sql_query(query, self.conn)
        return
    def close_database()
        self.conn.close
        return
test = ReadSQL(test_database.db)
df = query_columns_to_dataframe('test', ['value_one', 'value_two'])
I am assuming my problem has something to do with the way that query_columns_to_dataframe() pre-processes the information, because if I uncomment the print command in query_columns_to_dataframe() I get a text string that looks identical to what works if I just type it directly into the command line. Any help is appreciated.
I mopped up a few mistakes in your code to produce this, which works. Note that I inadvertently changed the names of the fields in your test db.
import sqlite3
import pandas as pd
class ReadSQL:
    def __init__(self, database):
        self.database = database
        self.conn = sqlite3.connect(self.database)
        self.cur = self.conn.cursor()
    def query_columns_to_dataframe(self, table, columns):
        query = 'select '
        for i in range(len(columns)):
            query = query + columns[i] + ', '
        query = query[:-2] + ' from ' + table
        #~ print(query)
        df = pd.read_sql_query(query, self.conn)
        return df
    def close_database(self):
        # close is a method, so it must be called
        self.conn.close()
        return
test = ReadSQL('test_database.db')
df = test.query_columns_to_dataframe('test', ['value_1', 'value_2'])
print (df)
Output:
value_1 value_2
0 2 3
Your code is full of syntax errors and other issues:
The return in query_columns_to_dataframe should be return df. This is the primary reason why your code does not return anything.
self.cur is not used
Missing self parameter when declaring query_columns_to_dataframe
Missing colon at the end of the line def close_database()
Missing self parameter when declaring close_database
Missing parentheses here: self.conn.close
This df = query_columns_to_dataframe should be df = test.query_columns_to_dataframe
Fix these errors and your code should work.
I had a question pertaining to mysql as being used in Python. Basically I have a dropdown menu on a webpage using flask, that provides the parameters to change the mysql queries. Here is a code snippet of my problem.
select = request.form.get('option')
select_2 = request.form.get('option_2')
conn = mysql.connect()
cursor = conn.cursor()
query = "SELECT * FROM tbl_user WHERE %s = %s;"
cursor.execute(query, (select, select_2))
data = cursor.fetchall()
This returns no data from the query because there are single quotes around the first variable, i.e.
Select * from tbl_user where 'user_name' = 'Adam'
versus
Select * from tbl_user where user_name = 'Adam'.
Could someone explain how to remove these single quotes around the column name for me? When I hard-code the columns I want to use, it gives me back my desired data, but when I try to do it this way, it merely returns []. Any help is appreciated.
I have a working solution for pymysql, which is to override the escape method of the class 'pymysql.connections.Connection', since that is what adds the "'" around your string. Maybe you can try something similar; check this:
from pymysql.connections import Connection, converters
class MyConnect(Connection):
    def escape(self, obj, mapping=None):
        """Escape whatever value you pass to it.
        Non-standard, for internal use; do not use this in your applications.
        """
        if isinstance(obj, str):
            return self.escape_string(obj) # by default, it is :return "'" + self.escape_string(obj) + "'"
        if isinstance(obj, (bytes, bytearray)):
            ret = self._quote_bytes(obj)
            if self._binary_prefix:
                ret = "_binary" + ret
            return ret
        return converters.escape_item(obj, self.charset, mapping=mapping)
config = {'host':'', 'user':'', ...}
conn = MyConnect(**config)
cur = conn.cursor()
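A different approach, which leaves the driver's escaping untouched (so string values keep their quotes), is to treat the column name as something to look up rather than something to escape. The sketch below is only an illustration: the ALLOWED_COLUMNS mapping, the user_email entry and the build_user_query helper are hypothetical names invented for this example.

```python
from typing import Any, Dict, Tuple

# Hypothetical mapping from dropdown option values to real column names.
ALLOWED_COLUMNS: Dict[str, str] = {
    "user_name": "user_name",
    "user_email": "user_email",
}


def build_user_query(option: str, value: Any) -> Tuple[str, Tuple[Any, ...]]:
    """Return the SQL text and the parameters to pass to cursor.execute()."""
    try:
        column = ALLOWED_COLUMNS[option]
    except KeyError:
        raise ValueError(f"unsupported filter column: {option!r}") from None
    # The column name comes from the allowlist; the value stays a bound parameter.
    return f"SELECT * FROM tbl_user WHERE `{column}` = %s", (value,)


query, params = build_user_query("user_name", "Adam")
```

With the cursor from the question this would then be run as cursor.execute(query, params).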