I have a function that executes many SQL queries with different dates.
What I want is to pass all dates and other query variables as function parameters and then just execute the function. I have figured out how to do this for datetime variables as below. But I also have a query that looks at specific campaign_names in a database and pulls those as strings. I want to be able to pass those strings as function parameters but I haven't figured out the correct syntax for this in the SQL query.
def Camp_eval(start_date,end_1M,camp1,camp2,camp3):
query1 = f"""SELECT CONTACT_NUMBER, OUTCOME_DATE
FROM DATABASE1
where OUTCOME_DATE >= (to_date('{start_date}', 'dd/mm/yyyy'))
and OUTCOME_DATE < (to_date('{end_1M}', 'dd/mm/yyyy'))"""
query2 = """SELECT CONTACT_NUMBER
FROM DATABASE2
WHERE (CAMP_NAME = {camp1} or
CAMP_NAME = {camp2} or
CAMP_NAME = {camp3})"""
Camp_eval('01/04/2022','01/05/2022','Camp_2022_04','Camp_2022_05','Camp_2022_06')
The parameters start_date and end_1M work fine with the {} brackets but the camp variables, which are strings don't return any results even though there are results in the database with those conditions if I were to write them directly in the query.
Any help would be appreciated!!
Please, do not use f-strings for creating SQL queries!
Most likely, any library you use for accessing a database already has a way of creating queries: SQLite docs (check code examples).
Another example: cur.execute("SELECT * FROM tasks WHERE priority = ?", (priority,)).
Not only this way is safer (fixes SQL Injection problem mentioned by #d-malan in comments), but it also eliminates the need to care about how data is represented in SQL - the library will automatically cast dates, strings, etc. in what they need to be casted into. Therefore, your problem can be fixed by using proper instruments.
Related
I am wanting to run a Presto SQL query in a for loop so that the query will pull hourly data based on my date variables.
Example query is along the lines of:
x = datetime.strptime('12-10-22', '%d-%m-%y').date()
y = datetime.strptime('13-10-22', '%d-%m-%y').date()
for dt in rrule.rrule(rrule.HOURLY, dtstart=nextProcStart, until=nextProcEnd):
sql_query = "SELECT SUM(sales) FROM a WHERE date between x and y"
I will note I'm using the syntax of writing the SQL query as a variable so along the lines of:
sql_query = """ SELECT... FROM..."""
I have tried just adding the variables into the query but no luck. Unsure what steps will work.
I've also tried using .format(x,y) at the end of my SQL query but keep getting an error saying
KeyError: 'x'
Remember that your SQL statement is no more than a string, so you just need to know how to incorporate a variable into a string.
Try:
sql_query = "SELECT SUM(sales) FROM a WHERE date between {} and {}".format(x, y)
Read How do I put a variable’s value inside a string (interpolate it into the string)? for more info or alternative methods.
Hopefully this answers your immediate question above on how to incorporate variable into string and get your code, as is, to work. However, as #nbk, mentions in comment below, this method is NOT recommended as it is insecure.
Using concatenations in SQL statements like this does open the code up to injection attacks. Even if your database does not contain sensitive information, it is bad practice.
Prepared statements have many advantages, not least of all that they are more secure and more efficient. I would certainly invest some time in researching and understanding SQL prepared statements.
I am new to this and trying to learn python. I wrote a select statement in python where I used a parameter
Select """cln.customer_uid = """[(num_cuid_number)])
TypeError: string indices must be integers
Agree with the others, this doesn't look really like Python by itself.
I will see even without seeing the rest of that code I'll guess the [(num_cuid_number)] value(s) being returned is a string, so you'll want to convert it to integer for the select statement to process.
num_cuid_number is most likely a string in your code; the string indices are the ones in the square brackets. So please first check your data variable to see what you received there. Also, I think that num_cuid_number is a string, while it should be in an integer value.
Let me give you an example for the python code to execute: (Just for the reference: I have used SQLAlchemy with flask)
#app.route('/get_data/')
def get_data():
base_sql="""
SELECT cln.customer_uid='%s' from cln
""" % (num_cuid_number)
data = db.session.execute(base_sql).fetchall()
Pretty sure you are trying to create a select statement with a "where" clause here. There are many ways to do this, for example using raw sql, the query should look similar to this:
query = "SELECT * FROM cln WHERE customer_uid = %s"
parameters = (num_cuid_number,)
separating the parameters from the query is secure. You can then take these 2 variables and execute them with your db engine like
results = db.execute(query, parameters)
This will work, however, especially in Python, it is more common to use a package like SQLAlchemy to make queries more "flexible" (in other words, without manually constructing an actual string as a query string). You can do the same thing using SQLAlchemy core functionality
query = cln.select()
query = query.where(cln.customer_uid == num_cuid_number)
results = db.execute(query)
Note: I simplified "db" in both examples, you'd actually use a cursor, session, engine or similar to execute your queries, but that wasn't your question.
I am aware that queries in Python can be parameterized using either ? or %s in execute query here or here
However I have some long query that would use some constant variable defined at the beginning of the query
Set #my_const = 'xyz';
select #my_const;
-- Query that use #my_const 40 times
select ... coalesce(field1, #my_const), case(.. then #my_const)...
I would like to do the least modif possible to the query from Mysql. So that instead of modifying the query to
pd.read_sql(select ... coalesce(field1, %s), case(.. then %s)... , [my_const, my_const, my_const, ..]
,I could write something along the line of the initial query. Upon trying the following, however, I am getting a TypeError: 'NoneType' object is not iterable
query_str = "Set #null_val = \'\'; "\
" select #null_val"
erpur_df = pd.read_sql(query_str, con = db)
Any idea how to use the original variable defined in Mysql query ?
The reason
query_str = "Set #null_val = \'\'; "\
" select #null_val"
erpur_df = pd.read_sql(query_str, con = db)
throws that exception is because all you are doing is setting null_value to '' and then selecting that '' - what exactly would you have expected that to give you? EDIT read_sql only seems to execute one query at a time, and as the first query returns no rows it results in that exception.
If you split them in to two calls to read_sql then it will in fact return you the value of your #null value in the second call. Due to this behaviour read_sql is clearly not a good way to do this. I strongly suggest you use one of my suggestions below.
Why are you wanting to set the variable in the SQL using '#' anyway?
You could try using the .format style of string formatting.
Like so:
query_str = "select ... coalesce(field1, {c}), case(.. then {c})...".format(c=my_const)
pd.read_sql(query_str)
Just remember that if you do it this way and your my_const is a user input then you will need to sanitize it manually to prevent SQL injection.
Another possibility is using a dict of params like so:
query_str = "select ... coalesce(field1, %(my_const)s, case(.. then %(my_const)s)..."
pd.read_sql(query_str, params={'my_const': const_value})
However this is dependent on which database driver you use.
From the pandas.read_sql docs:
Check your database driver documentation for which of the five syntax
styles, described in PEP 249’s paramstyle, is supported. Eg. for
psycopg2, uses %(name)s so use params={‘name’ : ‘value’}
I am trying to do a simple filter operation on a query in sqlalchemy, like this:
q = session.query(Genotypes).filter(Genotypes.rsid.in_(inall))
where
inall is a list of strings
Genotypes is mapped to a table:
class Genotypes(object):
pass
Genotypes.mapper = mapper(Genotypes, kg_table, properties={'rsid': getattr(kg_table.c, 'rs#')})
This seems pretty straightforward to me, but I get the following error when I execute the above query by doing q.first():
"sqlalchemy.exc.OperationalError: (OperationalError) too many SQL
variables u'SELECT" followed by a list of the 1M items in the inall
list. But they aren't supposed to be SQL variables, just a list whose
membership is the filtering criteria.
Am I doing the filtering incorrectly?
(the db is sqlite)
If the table where you are getting your rsids from is available in the same database I'd use a subquery to pass them into your Genotypes query rather than passing the one million entries around in your Python code.
sq = session.query(RSID_Source).subquery()
q = session.query(Genotypes).filter(Genotypes.rsid.in_(sq))
The issue is that in order to pass that list to SQLite (or any database, really), SQLAlchemy has to pass over each entry for your in clause as a variable. The SQL translates roughly to:
-- Not valid SQLite SQL
DECLARE #Param1 TEXT;
SET #Param1 = ?;
DECLARE #Param2 TEXT;
SET #Param2 = ?;
-- snip 999,998 more
SELECT field1, field2, -- etc.
FROM Genotypes G
WHERE G.rsid IN (#Param1, #Param2, /* snip */)
The below workaround worked for me:
q = session.query(Genotypes).filter(Genotypes.rsid.in_(inall))
query_as_string = str(q.statement.compile(compile_kwargs={"literal_binds": True}))
session.execute(query_as_string).first()
This basically forces the query to compile as a string before execution, which bypasses the whole variables issue. Some details on this are available in SQLAlchemy's docs here.
BTW, if you're not using SQLite you can make use of the ANY operator to pass the list object as a single parameter (see my answer to this question here).
I'm trying to create a python script that constructs valid sqlite queries. I want to avoid SQL Injection, so I cannot use '%s'. I've found how to execute queries, cursor.execute('sql ?', (param)), but I want how to get the parsed sql param. It's not a problem if I have to execute the query first in order to obtain the last query executed.
If you're trying to transmit changes to the database to another computer, why do they have to be expressed as SQL strings? Why not pickle the query string and the parameters as a tuple, and have the other machine also use SQLite parameterization to query its database?
If you're not after just parameter substitution, but full construction of the SQL, you have to do that using string operations on your end. The ? replacement always just stands for a value. Internally, the SQL string is compiled to SQLite's own bytecode (you can find out what it generates with EXPLAIN thesql) and ? replacements are done by just storing the value at the correct place in the value stack; varying the query structurally would require different bytecode, so just replacing a value wouldn't be enough.
Yes, this does mean you have to be ultra-careful. If you don't want to allow updates, try opening the DB connection in read-only mode.
Use the DB-API’s parameter substitution. Put ? as a placeholder wherever you want to use a value, and then provide a tuple of values as the second argument to the cursor’s execute() method.
# Never do this -- insecure!
symbol = 'hello'
c.execute("SELECT * FROM stocks WHERE symbol = '%s'" % symbol)
# Do this instead
t = (symbol,)
c.execute('SELECT * FROM stocks WHERE symbol=?', t)
print c.fetchone()
More reference is in the manual.
I want how to get the parsed 'sql param'.
It's all open source so you have full access to the code doing the parsing / sanitization. Why not just reading this code and find out how it works and if there's some (possibly undocumented) implementation that you can reuse ?