I'm new to Python and SQL, but I need to delete multiple entries in a table on a remote server. I would also prefer to preserve the input structure of a function I was given, because it is used in other colleagues' code.
I came up with a solution similar to the one presented below. I deliberately avoided using any of the executemany() methods because (if I am not mistaken) they can be terribly slow.
import sqlalchemy as sa
import urllib

def delete_rows(tablename, colnames, data):
    """
    tablename - name of db table with dbname, like RiskData..factors
    colnames - column names to use as keys in deletion
    data - a list of tuples, one tuple per row; the number of elements
           in each tuple must be the same as the number of column names
    """
    # Connection details
    engine = sa.create_engine("mssql+pyodbc://some_server")
    connection = engine.connect()
    # Data has to be a list - throw an exception if it is not
    if not isinstance(data, list):
        raise Exception('Data must be a list')
    # Assemble one long query statement
    query = "DELETE " + tablename + " WHERE "
    query_dp = "or (" + " = '{}' and ".join(colnames) + " = '{}') "
    query_tail = ""
    for record_entries in data:
        query_tail += query_dp.format(*record_entries)
    query += query_tail[3:-1]
    connection.execute(query)
    connection.close()
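For illustration, a hypothetical call (table and column names made up) assembles one statement:
delete_rows("RiskData..factors", ["region", "factor"],
            [("EU", "rates"), ("US", "fx")])
# executes a single statement of the form:
# DELETE RiskData..factors WHERE (region = 'EU' and factor = 'rates')
#                             or (region = 'US' and factor = 'fx')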
I would like to ask whether this solution is inefficient and will be slow for large amounts of data. If so, what would a more elegant solution be?
Don't know about speed, but as far as elegance goes, don't use string formatting for passing values to SQL queries. Since you're already using SQLAlchemy, you can leverage its query building capabilities:
def delete_rows(tablename, colnames, data):
    """
    tablename - name of db table with dbname, like RiskData..factors
    colnames - column names to use as keys in deletion
    data - a list of tuples, one tuple per row; the number of elements
           in each tuple must be the same as the number of column names
    """
    # Data has to be a list - throw an exception if it is not
    if not isinstance(data, list):
        raise Exception('Data must be a list')
    # Connection details
    engine = sa.create_engine("mssql+pyodbc://some_server")
    # Create `column()` objects for producing bindparams
    cols = [sa.column(name) for name in colnames]
    # Create a list of predicates, to be joined with OR
    preds = []
    for record_entries in data:
        pred = sa.and_(*[c == e for c, e in zip(cols, record_entries)])
        preds.append(pred)
    # Assemble one query statement with bound parameters
    query = sa.table(tablename).delete().where(sa.or_(*preds))
    with engine.begin() as connection:
        connection.execute(query)
Whether or not executemany() is slow depends on the DB-API driver in use. In the case of pyodbc this used to be true, but there has been work to improve it.
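For what it's worth, recent pyodbc exposes a fast executemany mode, and SQLAlchemy can switch it on via a create_engine() flag (a sketch; the server URL is a placeholder):
import sqlalchemy as sa

# fast_executemany batches the parameters instead of doing a round trip per row
engine = sa.create_engine("mssql+pyodbc://some_server", fast_executemany=True)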
I am implementing a student database project which has multiple tables, such as student, class, and section.
I wrote a delete_tables function which takes a table name, a column name and a value as parameters and deletes a row from the given table, but there seems to be some sort of syntax error in my code:
def delete_tables(tab_name, attr, value):
    c.execute("delete from table=:tab_name where attribute=:attr is value=:value ",
              {'tab_name': tab_name, 'attr': attr, 'value': value})
Input:
delete_tables('section', 'sec_name', 'S1')
Error text:
c.execute("delete from table=:tab_name where attribute=:attr is value=:value ", {'tab_name': tab_name, 'attr': attr, 'value': value})
sqlite3.OperationalError: near "table": syntax error
I've tried all the mentioned answers, and what you're all suggesting is that even if it works out, it will be insecure. So do I have to write a separate function to delete from every table individually instead of one single function, or is there an alternative where I need not keep writing n functions for n tables?
Thanks in advance :)
The problem is that you can't use parametrized queries (that :tab_name) on things other than values (not sure "values" is the right term here): table names, column names and SQL keywords cannot be bound as parameters.
where age > :max_age is OK.
where :some_col > :max_age is not OK.
where age :comparison_operator :max_age is not OK either.
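A quick way to see the difference in sqlite3 (table and column names are made up for the demo):
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (name TEXT, age INTEGER)")

# OK: binding a value
con.execute("SELECT * FROM people WHERE age > :max_age", {"max_age": 30})

# Not OK: :some_col binds the *string* 'age' as a literal, so this compares
# the text 'age' to 30 on every row instead of using the column - no error,
# just a meaningless query.
con.execute("SELECT * FROM people WHERE :some_col > :max_age",
            {"some_col": "age", "max_age": 30})

# Not OK: a bind parameter where a table name is expected is a syntax error,
# much like the OperationalError in the question.
try:
    con.execute("DELETE FROM :tab_name", {"tab_name": "people"})
except sqlite3.OperationalError as exc:
    print(exc)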
Now, you can build your own query using string concatenation or f-strings, but... 🧨 this is a massive, massive SQL injection risk. See Bobby Tables. Not to mention that concatenating values into SQL query strings quickly runs into trouble with strings, numbers and None (None => NULL, strings need quotes, numbers don't).
You could possibly build a query using string substitutions that accept only known values for the table and column names, and then drive the deletion criterion's value with a parametrized query on :value.
(While this seems restrictive, letting a random caller determine which tables to delete from is just not safe in the least.)
Something like:
def delete_tables(tab_name, attr, value):
    safe_tab_name = my_dict_of_known_table_names[tab_name]
    safe_attr = my_dict_of_known_column_names[attr]
    # you have to use `=`, not `is`, here 👇
    qry = f"delete from {safe_tab_name} where {safe_attr} = :value"
    # sqlite3 supports :name-style parameters passed in a dict
    c.execute(qry, dict(value=value))
Assuming a user only enters value directly, that at least is protected from SQL injection.
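For example, the whitelist dictionaries could be as simple as identity mappings over the identifiers you actually allow (names here are hypothetical):
# Only identifiers listed here ever reach the SQL string.
my_dict_of_known_table_names = {"section": "section", "student": "student"}
my_dict_of_known_column_names = {"sec_name": "sec_name", "class_id": "class_id"}

delete_tables("section", "sec_name", "S1")               # fine
delete_tables("section; DROP TABLE student", "x", "y")   # KeyError - injection blocked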
You need to look at the exact SQL command that the Python method will execute.
For the method call delete_tables('section', 'sec_name', 'S1'), the SQL command that will be generated is
delete from table=section where attribute=sec_name is value=S1
This is an invalid command in SQL. The correct command would be
delete from section where sec_name='S1'
So you need to change your Python function accordingly. The changes should be as follows:
def delete_tables(tab_name, attr, value):
    c.execute("delete from :tab_name where :attr = ':value'",
              {'tab_name': tab_name, 'attr': attr, 'value': value})

def delete_tables(tab_name, attr, value):
    c.execute("delete from " + tab_name + " where " + attr + " = " + value)
I think something like that will work; the issue is that your query always used the literal word attribute instead of the actual column name, so you need to pass the column name in and handle it properly.
Hope it helped.
Edit:
Check this SQLite python
What c.execute does is 'execute' an SQL query, so you can write something like c.execute("select * from clients") if you have a clients table.
execute runs a query and brings you the result set (if there is one). So if you wanted to delete from your table with a plain SQL query, you would type delete from clients where client_id = 12 in the console, and that statement would delete the client with id equal to 12.
Now, if you are using SQLite in python, you will do
c.execute("delete from clients where client_id = 12")
but as you wish it to be for any table and any field (attribute) it turns in the table name, the field name and the value of that field being variables.
tableName = "clients"
field = "client_id"
value = "12"  # must be a string, because it is concatenated into the query
"""
If value is a varchar you must write
value = "'12'" because the quotes '' are needed.
"""
c.execute("delete from " + tableName + " where " + field + " = " + value)
and on top of that, since you want it to be a function:
def delete_tables(tableName, field, value):
    c.execute("delete from " + tableName + " where " + field + " = " + value)
Edit 2:
aaron's comment is true, it is not secure; the next step you would take is

def delete_tables(tab_name, attr, value):
    # no quotes around ':value' (quoting it limits the value to text)
    c.execute("delete from :tab_name where :attr = :value",
              {'tab_name': tab_name, 'attr': attr, 'value': value})
It is from Vatsal's answer
I'm looking for a way to implement alternating SQL queries - i.e. a function that allows me to filter entries based on different columns. Take the following example:
import sqlite3

el = [["a", "b", 1], ["a", "b", 3]]

def save_sql(foo):
    with sqlite3.connect("fn.db") as db:
        cur = db.cursor()
        cur.execute("CREATE TABLE IF NOT EXISTS et"
                    "(var1 VARCHAR, var2 VARCHAR, var3 INT)")
        cur.executemany("INSERT INTO et VALUES "
                        "(?,?,?)", foo)
        db.commit()

def load_sql(v1, v2, v3):
    with sqlite3.connect("fn.db") as db:
        cur = db.cursor()
        cur.execute("SELECT * FROM et WHERE var1=? AND var2=? AND var3=?",
                    (v1, v2, v3))
        return cur.fetchall()

save_sql(el)
Now if I were to use load_sql("a","b",1), it would work. But assume I want to query only on the first and third columns, i.e. load_sql("a", None, 1) (the None is just intended as a placeholder), or only on the last column, load_sql(None, None, 5); this wouldn't work.
This could of course be done with if statements checking which variables were supplied in the function call, but in tables with larger numbers of columns this might get messy.
Is there a good way to do this?
What if load_sql() accepted an arbitrary number of keyword arguments, where the keyword argument names correspond to column names? Something along these lines:
def load_sql(**values):
    with sqlite3.connect("fn.db") as db:
        cur = db.cursor()
        query = "SELECT * FROM et"
        conditions = [f"{column_name} = :{column_name}" for column_name in values]
        if conditions:
            query = query + " WHERE " + " AND ".join(conditions)
        cur.execute(query, values)
        return cur.fetchall()
Note that here we trust keyword argument names to be valid and existing column names (and string-format them into the query) which may potentially be used as an SQL injection attack vector.
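If that is a concern, one option is to validate the names first (a sketch; the column set is assumed from the et table above):
ALLOWED_COLUMNS = {"var1", "var2", "var3"}  # assumed from the example schema

def load_sql_checked(**values):
    unknown = set(values) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unknown column(s): {unknown}")
    return load_sql(**values)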
As a side note, I cannot stop but think that this feels like a reinventing-the-wheel step towards an actual ORM. Look into lightweight PonyORM or Peewee abstraction layers between Python and a database.
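For comparison, a minimal Peewee sketch of the same table (model and field names are assumptions based on the example schema):
from peewee import SqliteDatabase, Model, CharField, IntegerField

db = SqliteDatabase("fn.db")

class Et(Model):
    var1 = CharField()
    var2 = CharField()
    var3 = IntegerField()

    class Meta:
        database = db
        table_name = "et"

# Multiple where() arguments are ANDed together; column names are never
# string-formatted into the query.
rows = Et.select().where(Et.var1 == "a", Et.var3 == 1)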
It will inevitably get messy if you want your SQL statements to remain sanitized/safe, but as long as you control your function signature it can remain reasonably safe, e.g.:
def load_sql(var1, var2, var3):
    # keep only the arguments that were actually supplied
    fields = dict(field for field in locals().items() if field[1] is not None)
    query = "SELECT * FROM et"
    if fields:  # if at least one field is not None
        query += " WHERE " + " AND ".join(k + "=?" for k in fields)
    with sqlite3.connect("fn.db") as db:
        cur = db.cursor()
        # sqlite3 wants a sequence, so materialize the dict values as a tuple
        cur.execute(query, tuple(fields.values()))
        return cur.fetchall()
You can replace the function signature with load_sql(**kwargs) and then use kwargs.items() instead of locals.items() so that you can pass arbitrary column names, but that can be very dangerous and is certainly not recommended.
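For instance, with the example data from the question (a fresh fn.db assumed; results shown as comments):
save_sql(el)
print(load_sql("a", None, 1))      # [('a', 'b', 1)] - var2 is skipped
print(load_sql(None, None, None))  # both rows - no WHERE clause at all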
I'm trying to execute a raw SQL query and safely pass an ORDER BY column and ASC/DESC direction based on user input. This is the back end for a paginated datagrid. I cannot for the life of me figure out how to do this safely. Parameters get converted to strings, so Oracle can't execute the query. I can't find any examples of this anywhere on the internet. What is the best way to safely accomplish this? (I am not using the ORM; it must be raw SQL.)
My workaround is just setting ASC/DESC to a variable that I set myself. This works fine and is safe. However, how do I bind a column name to the ORDER BY? Is that even possible? I could just whitelist a bunch of columns and do something similar to what I do with ASC/DESC. I was just curious whether there's a way to bind it. Thanks.
@default.route('/api/barcodes/<sort_by>/<sort_dir>', methods=['GET'])
@json_enc
def fetch_barcodes(sort_by, sort_dir):
    # time.sleep(5)
    # Can't use sort_dir as a parameter, so assign to a variable to sanitize it
    ord_dir = "DESC" if sort_dir.lower() == 'desc' else 'ASC'
    records = []
    stmt = text("SELECT bb_request_id,bb_barcode,bs_status, "
                "TO_CHAR(bb_rec_cre_date, 'MM/DD/YYYY') AS bb_rec_cre_date "
                "FROM bars_barcodes,bars_status "
                "WHERE bs_status_id = bb_status_id "
                "ORDER BY :ord_by :ord_dir ")
    stmt = stmt.bindparams(ord_by=sort_by, ord_dir=ord_dir)
    rs = db.session.execute(stmt)
    records = [dict(zip(rs.keys(), row)) for row in rs]
DatabaseError: (cx_Oracle.DatabaseError) ORA-01036: illegal variable name/number
[SQL: "SELECT bb_request_id,bb_barcode,bs_status, TO_CHAR(bb_rec_cre_date, 'MM/DD/YYYY') AS bb_rec_cre_date FROM bars_barcodes,bars_status WHERE bs_status_id = bb_status_id ORDER BY :ord_by :ord_dir "] [parameters: {'ord_by': u'bb_rec_cre_date', 'ord_dir': 'ASC'}]
UPDATE: Solution based on the accepted answer:
from sqlalchemy import select, table, column, func, asc, desc

def fetch_barcodes(sort_by, sort_dir, page, rows_per_page):
    ord_dir_func = desc if sort_dir.lower() == 'desc' else asc
    query_limit = int(rows_per_page)
    query_offset = (int(page) - 1) * query_limit
    stmt = select([column('bb_request_id'),
                   column('bb_barcode'),
                   column('bs_status'),
                   func.to_char(column('bb_rec_cre_date'),
                                'MM/DD/YYYY').label('bb_rec_cre_date')]).\
        select_from(table('bars_barcodes')).\
        select_from(table('bars_status')).\
        where(column('bs_status_id') == column('bb_status_id')).\
        order_by(ord_dir_func(column(sort_by))).\
        limit(query_limit).offset(query_offset)
    result = db.session.execute(stmt)
    records = [dict(row) for row in result]
    response = json_return()
    response.addRecords(records)
    # response.setTotal(len(records))
    response.setTotal(1001)
    response.setSuccess(True)
    response.addMessage("Records retrieved successfully. Limit: " + str(query_limit) +
                        ", Offset: " + str(query_offset) + " SQL: " + str(stmt))
    return response
You could use Core constructs such as table() and column() for this instead of raw SQL strings. That'd make your life easier in this regard:
from sqlalchemy import select, table, column, func, asc, desc

ord_dir = desc if sort_dir.lower() == 'desc' else asc

stmt = select([column('bb_request_id'),
               column('bb_barcode'),
               column('bs_status'),
               func.to_char(column('bb_rec_cre_date'),
                            'MM/DD/YYYY').label('bb_rec_cre_date')]).\
    select_from(table('bars_barcodes')).\
    select_from(table('bars_status')).\
    where(column('bs_status_id') == column('bb_status_id')).\
    order_by(ord_dir(column(sort_by)))
table() and column() represent the syntactic part of a full blown Table object with Columns and can be used in this fashion for escaping purposes:
The text handled by column() is assumed to be handled like the name of a database column; if the string contains mixed case, special characters, or matches a known reserved word on the target backend, the column expression will render using the quoting behavior determined by the backend.
Still, whitelisting might not be a bad idea.
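A sketch of such a whitelist (the column set is assumed from the SELECT list above):
SORTABLE_COLUMNS = {'bb_request_id', 'bb_barcode', 'bs_status', 'bb_rec_cre_date'}

if sort_by not in SORTABLE_COLUMNS:
    raise ValueError(f"cannot sort by {sort_by!r}")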
Note that you don't need to manually zip() the row proxies in order to produce dictionaries. They act as mappings as is, and if you need dict() for serialization reasons or such, just do dict(row).
I'm currently building SQL queries depending on input from the user. An example how this is done can be seen here:
def generate_conditions(table_name, nameValues):
    sql = u""
    for field in nameValues:
        sql += u" AND {0}.{1}='{2}'".format(table_name, field, nameValues[field])
    return sql

search_query = u"SELECT * FROM Enheter e LEFT OUTER JOIN Handelser h ON e.Id == h.Enhet WHERE 1=1"
if "Enhet" in args:
    search_query += generate_conditions("e", args["Enhet"])
c.execute(search_query)
Since the SQL changes every time, I cannot insert the values in the execute call, which means that I should escape the strings manually. However, when I search, everyone points to execute...
I'm also not that satisfied with how I generate the query, so if someone has any idea for another way that would be great also!
You have two options:
Switch to using SQLAlchemy; it'll make generating dynamic SQL a lot more pythonic and ensures proper quoting.
Since you cannot use parameters for table and column names, you'll still have to use string formatting to include these in the query. Your values, on the other hand, should always be passed as SQL parameters, if only so the database can prepare the statement.
It's not advisable to just interpolate table and column names taken straight from user input, it's far too easy to inject arbitrary SQL statements that way. Verify the table and column names against a list of such names you accept instead.
So, to build on your example, I'd go in this direction:
tables = {
    'e': ('unit1', 'unit2', ...),  # tablename: tuple of column names
}

def generate_conditions(table_name, nameValues):
    if table_name not in tables:
        raise ValueError('No such table %r' % table_name)
    sql = u""
    params = []
    for field in nameValues:
        if field not in tables[table_name]:
            raise ValueError('No such column %r' % field)
        sql += u" AND {0}.{1}=?".format(table_name, field)
        params.append(nameValues[field])
    return sql, params
search_query = u"SELECT * FROM Enheter e LEFT OUTER JOIN Handelser h ON e.Id == h.Enhet WHERE 1=1"
search_params = []
if "Enhet" in args:
sql, params = generate_conditions("e",args["Enhet"])
search_query += sql
search_params.extend(params)
c.execute(search_query, search_params)
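For example (hypothetical input; results shown as comments):
args = {"Enhet": {"unit1": "foo"}}
sql, params = generate_conditions("e", args["Enhet"])
# sql    == " AND e.unit1=?"
# params == ['foo']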