Python MySQLdb prevent SQL injection - not working as expected - python

I am trying to query a MySQL database in a secure way, avoiding SQL injection. I am getting an error when trying to execute the SQL in the DB cursor.
My code looks like this:
reseller_list = ('138',)
for reseller in reseller_list:
    cur1 = db.cursor()
    dbQuery = """
        SELECT
            TRIM(CONCAT(TRIM(c1.first_name), ' ', TRIM(c1.name))) AS 'User name',
            FORMAT(sum(cost1),2) AS 'cost1',
            FORMAT(sum(cost2),2) AS 'cost2',
        FROM
            client as c1,
            client as c2
        WHERE
            c2.id = %s
            AND start BETWEEN DATE_FORMAT(CURRENT_DATE - INTERVAL 1 MONTH, '%Y-%m-01 00:00:00')
            AND DATE_FORMAT(LAST_DAY(CURRENT_DATE - INTERVAL 1 MONTH), '%Y-%m-%d 23:59:59')
        GROUP BY
            c1.name
        ORDER BY
            CONCAT(c1.first_name, ' ', c1.name);
    """
    cur1.execute(dbQuery, (reseller_id,))
And what happens is this:
cur1.execute(dbQuery, (reseller_id,))
File "/usr/lib64/python2.7/site-packages/MySQLdb/cursors.py", line 159, in execute
query = query % db.literal(args)
TypeError: not enough arguments for format string
I have read a number of pages both on this site and others but can't see what I am doing wrong. I can easily do this using string substitution into the query but want to do it the right way!

You have % signs in your date_format calls, so you'll need to escape them from the param substitution by doubling them.
WHERE
c2.id = %s
AND start BETWEEN DATE_FORMAT(CURRENT_DATE - INTERVAL 1 MONTH, '%%Y-%%m-01 00:00:00')
AND DATE_FORMAT(LAST_DAY(CURRENT_DATE - INTERVAL 1 MONTH), '%%Y-%%m-%%d 23:59:59')
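Why doubling works can be seen without a database. As the traceback shows, MySQLdb builds the final statement with Python's % operator, so every bare % is read as the start of a format specifier. The following is a minimal sketch only: the query is shortened, and the hand-quoted args tuple is a stand-in (an assumption for illustration) for what db.literal(args) would produce.

```python
# Shortened query: one real placeholder plus a DATE_FORMAT pattern.
unescaped = "SELECT 1 FROM client WHERE id = %s AND start >= DATE_FORMAT(NOW(), '%Y-%m-01')"
escaped = "SELECT 1 FROM client WHERE id = %s AND start >= DATE_FORMAT(NOW(), '%%Y-%%m-01')"

# Stand-in for db.literal(args): the real driver quotes the values itself.
args = ("'138'",)

try:
    unescaped % args
    failed = False
except (TypeError, ValueError):
    # %Y is treated as a second format specifier, so formatting fails.
    failed = True

# With doubled percent signs only %s is substituted and each %% collapses to %.
final_sql = escaped % args

print(failed)
print(final_sql)
```

The server therefore still receives '%Y-%m-01' with single percent signs; the doubling exists only to get the pattern through the client-side substitution.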

Related

Airflow and Templates reference and PostgresHook

I want to use the Templates reference variable {{ds}} in my SQL.
When it is substituted in a PostgresOperator, everything works (as far as I can tell), but with PostgresHook it does not work:
def prc_mymys_update(procedure: str, type_agg: str):
    with PostgresHook(postgres_conn_id=CONNECTION_ID_GP).get_conn() as conn:
        with conn.cursor() as cur:
            with open(URL_YML_2, "r", encoding="utf-8") as f:
                ya_2 = yaml.safe_load(f)
                yml_mymts_2 = ya_2['type_agg']
                query_pg = ""
                if yml_mymts_2[0]['type_agg_name'] == "day" and type_agg == "day":
                    sql_1 = yml_mymts_2[0]['sql']
                    query_pg = f"""{sql_1}"""
                elif yml_mymts_2[1]['type_agg_name'] == "retention" and type_agg == "retention":
                    sql_2 = yml_mymts_2[1]['sql']
                    query_pg = f"""{sql_2}"""
                elif yml_mymts_2[2]['type_agg_name'] == "mau" and type_agg == "mau":
                    sql_3 = yml_mymts_2[2]['sql']
                    query_pg = f"""{sql_3}"""
            cur.execute(query_pg)
            dates_list = cur.fetchall()
            for date_res in dates_list:
                cur.execute(
                    "select from {}(%(date)s::date);".format(procedure),
                    {"date": date_res[0].strftime("%Y-%m-%d")},
                )
    conn.close()
The SQL comes from this YAML file:
type_agg:
- type_agg_name: day
sql: select calendar_date from entertainment_dds.v_calendar where calendar_date between '{{ds}}'::date - interval '7 days' and '{{ds}}'::date - 1 order by 1 desc
- type_agg_name: retention
sql: SELECT t.date::date AS date FROM generate_series((date_trunc('month','{{execution_date.strftime('%Y-%m-%d')}}'::date) - interval '11 month'), date_trunc('month','{{execution_date.strftime('%Y-%m-%d')}}'::date) , '1 month'::interval) t(date) order by 1 asc
- type_agg_name: mau
sql: select dt::date date_ from generate_series('{{execution_date.strftime('%Y-%m-%d')}}'::date - interval '7 days', '{{execution_date.strftime('%Y-%m-%d')}}'::date - interval '1 days', interval '1 days') dt order by 1 asc
When I run the DAG, it fails at the task that uses this entry:
- type_agg_name: retention
sql: SELECT t.date::date AS date FROM generate_series((date_trunc('month','{{execution_date.strftime('%Y-%m-%d')}}'::date) - interval '11 month'), date_trunc('month','{{execution_date.strftime('%Y-%m-%d')}}'::date) , '1 month'::interval) t(date) order by 1 asc
I get this error:
psycopg2.errors.UndefinedColumn: column "y" does not exist
LINE 1: ...((date_trunc('month','{{execution_date.strftime('%Y-%m-%d')}...
I tried to find information on how the Templates reference interacts with PostgresHook, but found nothing:
https://airflow.apache.org/docs/apache-airflow/stable/templates-ref.html#templates-reference
This is expected. template_fields is an attribute of the BaseOperator in Airflow, from which all operators inherit, and only the operator fields listed there are rendered as Jinja templates. This is why passing in a Jinja expression when using the PostgresOperator works just fine.
If you need to write a custom task, you need to render the template values explicitly. Something like this (untested, but I'm sure it can be adapted to your function):
def prc_mymys_update(procedure: str, type_agg: str, ti):
    ti.render_templates()
    with PostgresHook(postgres_conn_id=CONNECTION_ID_GP).get_conn() as conn:
        with conn.cursor() as cur:
            ...
The ti kwarg represents the Airflow Task Instance and is directly accessible as part of the execution context pushed to every task in Airflow. That object has a render_templates() method which will translate the Jinja expressions to values.
If the PostgresOperator doesn't fit your needs you can always subclass the operator and tailor it accordingly.
Also, the sql string itself has single quotes which cause string parsing issues as you're seeing:
'{{execution_date.strftime('%Y-%m-%d')}}'
Should be something like:
'{{execution_date.strftime("%Y-%m-%d")}}'
Note the single quotes in the following query:
sql: SELECT t.date::date AS date FROM generate_series((date_trunc('month','{{execution_date.strftime('%Y-%m-%d')}}'::date) - interval '11 month'), date_trunc('month','{{execution_date.strftime('%Y-%m-%d')}}'::date) , '1 month'::interval) t(date) order by 1 asc
Specifically, this part:
'{{execution_date.strftime('%Y-%m-%d')}}'
You have two separate strings here, separated by the date format. Here's the first string:
'{{execution_date.strftime('
This causes the date format to be rendered separately. If you wrap the date format in double quotes instead of single quotes, it should resolve this error. For example:
sql: SELECT t.date::date AS date FROM generate_series((date_trunc('month','{{execution_date.strftime("%Y-%m-%d")}}'::date) - interval '11 month'), date_trunc('month','{{execution_date.strftime("%Y-%m-%d")}}'::date) , '1 month'::interval) t(date) order by 1 asc
Note that you might need to swap the double and single quotes if double quotes in the RDBMS are used for other purposes, for example:
"{{execution_date.strftime('%Y-%m-%d')}}"

Error with SQL string: "Error while connecting to PostgreSQL operator does not exist: date = integer"

I have a Python(3) script that is supposed to run each morning. In it, I call some SQL. However I'm getting an error message:
Error while connecting to PostgreSQL operator does not exist: date = integer
The SQL is based on the concatenation of a string:
ecom_dashboard_query = """
with
days_data as (
select
s.date,
s.user_type,
s.channel_grouping,
s.device_category,
sum(s.sessions) as sessions,
count(distinct s.dimension2) as daily_users,
sum(s.transactions) as transactions,
sum(s.transaction_revenue) as revenue
from ga_flagship_ecom.sessions s
where date = """ + run.start_date + """
group by 1,2,3,4
)
insert into flagship_reporting.ecom_dashboard
select *
from days_data;
"""
Here is the full error:
09:31:25 Error while connecting to PostgreSQL operator does not exist: date = integer
09:31:25 LINE 14: where date = 2020-01-19
09:31:25 ^
09:31:25 HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
I tried wrapping run.start_date within str like so: str(run.start_date) but I received the same error message.
I suspect it may be to do with the way I concatenate the SQL query string, but I am not sure.
The query runs fine in SQL directly with a hard coded date and no concatenation:
where date = '2020-01-19'
How can I get the query string to work correctly?
It's better to pass query parameters to the cursor.execute method. From the docs:
Warning Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.
So instead of concatenating strings, pass run.start_date as the second argument of cursor.execute.
In your query, use %s in place of the concatenation:
where date = %s
group by 1,2,3,4
In your Python code, add a second argument to the execute call:
cur.execute(ecom_dashboard_query , (run.start_date,))
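To see why the concatenated version fails and the parameterized one does not, here is a self-contained sketch that uses the standard-library sqlite3 module as a stand-in for PostgreSQL. That substitution is an assumption made only so the example runs anywhere: sqlite3 uses ? as its placeholder where psycopg2 uses %s, and the sessions table below is a hypothetical miniature of the real one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (date TEXT, sessions INTEGER)")
conn.execute("INSERT INTO sessions VALUES ('2020-01-19', 42)")

start_date = "2020-01-19"

# Concatenation: the SQL text contains 2020-01-19 without quotes, which the
# database evaluates as the integer expression 2020 - 1 - 19.
as_integer = conn.execute("SELECT " + start_date).fetchone()[0]

# Parameter binding: the value travels separately and stays a string.
rows = conn.execute(
    "SELECT sessions FROM sessions WHERE date = ?", (start_date,)
).fetchall()

print(as_integer)
print(rows)
conn.close()
```

The first result is the number that PostgreSQL was trying to compare with the date column, which is what the "date = integer" message refers to.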
Your statement is wrong:
where date = """ + run.start_date + """
The date is concatenated without quotes, so PostgreSQL reads 2020-01-19 as integer arithmetic and ends up comparing a date with an integer, which is not possible. Convert the string to a date first (note %Y for a four-digit year):
date_value = datetime.strptime(your_date_string, '%Y-%m-%d').date()
and then compare against it as a quoted literal:
where date = '{}'
Final code:
from datetime import datetime
date_value = datetime.strptime(your_date_string, '%Y-%m-%d').date()
ecom_dashboard_query = """
with
days_data as (
select
s.date,
s.user_type,
s.channel_grouping,
s.device_category,
sum(s.sessions) as sessions,
count(distinct s.dimension2) as daily_users,
sum(s.transactions) as transactions,
sum(s.transaction_revenue) as revenue
from ga_flagship_ecom.sessions s
where date = '{}'
group by 1,2,3,4
)
insert into flagship_reporting.ecom_dashboard
select *
from days_data;
""".format(date_value)

Using Wildcards in SQL Query with mysql-connector-python Python 3.6

I have a SQL query that works fine when copy pasted into my Python code. There is a line with a parameter that I want to make a variable in my Python script,
AND TimeStamp like '%2017-04-17%'
So I set a variable in the Python script:
mydate = datetime.date(2017, 4, 17) #Printing mydate gives 2017-04-17
and change the line in the query to:
AND TimeStamp like %s
Firstly, when I run the script with the date copy pasted in the query, cursor.execute(query) gives no errors and I can print the results with cursor.fetchall().
When I set the date to the variable mydate and use %s and try to run the script, any of these will give me an error:
cursor.execute(query,mydate) #"You have an error in your SQL Syntax..."
cursor.execute(query, ('%' + 'mydate' + '%',)) #"Not enough parameters for the SQL statement"
cursor.execute(query, ('%' + 'mydate' + '%')) #"You have an error in your SQL Syntax..."
cursor.execute(query, ('%' + mydate + '%')) #"must be str, not datetime.date"
I simply want '%2017-04-17%' where the %s is.
If the column in your MySQL table is of type TIMESTAMP, then you will need to query a range instead:
SELECT whatever FROM table where TimeStamp between '2017-04-17' and '2017-04-18'
Having created your initial datetime.date value, add datetime.timedelta(days=1) to it to get the end of the range.
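The range approach can be sketched as follows. Note that this sketch swaps BETWEEN for a half-open range (>= start, < next day) so that midnight of the following day is excluded; my_table is a hypothetical table name, and the cursor is assumed to be an open mysql-connector-python cursor.

```python
import datetime

mydate = datetime.date(2017, 4, 17)
next_day = mydate + datetime.timedelta(days=1)

# Half-open range [mydate, next_day): covers every timestamp on 2017-04-17.
query = "SELECT whatever FROM my_table WHERE TimeStamp >= %s AND TimeStamp < %s"
params = (mydate, next_day)

print(params)
# cursor.execute(query, params)  # cursor is assumed to exist
```

Because both bounds are passed as parameters, no quoting or string conversion of the dates is needed in the SQL text.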
You could format your query before executing it, like this:
import datetime
mydate = datetime.date(2017,4,17)
query="select * from table where Column_A = 'A' AND TimeStamp like '%{}%'".format(mydate)
query
The query will be like:
"select * from table where Column_A = 'A' AND TimeStamp like '%2017-04-17%'"
After that you can pass it to cursor to query:
cursor.execute(query)
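If you would rather keep the value out of the SQL text, the wildcard pattern can also be built in Python and passed as a parameter, which is what the attempts in the question were aiming for. A minimal sketch, assuming a hypothetical table my_table and an already open cursor:

```python
import datetime

mydate = datetime.date(2017, 4, 17)

# The SQL keeps a bare placeholder: no quotes and no wildcards in the text.
query = "SELECT * FROM my_table WHERE Column_A = 'A' AND TimeStamp LIKE %s"

# str() turns the date into '2017-04-17'; the trailing comma makes a 1-tuple.
params = ("%" + str(mydate) + "%",)

print(params)
# cursor.execute(query, params)  # cursor is assumed to exist
```

The two details that the failing attempts missed are the str() conversion of the datetime.date and the trailing comma that makes the second argument a tuple.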

Parameterize a quoted string in Python's SQL DBI

I am using pg8000 to connect to a PostgreSQL database via Python. I would like to be able to send in dates as parameters via the cursor.execute method:
def info_by_month(cursor, year, month):
    query = """
    SELECT *
    FROM info
    WHERE date_trunc('month', info.created_at) =
    date_trunc('month', '%s-%s-01')
    """
    cursor.execute(query, (year, month))
    return cursor
This will raise the error: InterfaceError: '%s' not supported in a quoted string within the query string. It's possible to use Python's string formatting to insert the date in there. The use of the string formatting mini language provides a measure of data validation to prevent SQL injection attacks, but it's still pretty ugly.
def info_by_month(cursor, year, month):
    query = """
    SELECT *
    FROM info
    WHERE date_trunc('month', info.created_at) =
    date_trunc('month', '{:04}-{:02}-01')
    """.format(year, month)
    cursor.execute(query)
    return cursor
How do I send a quoted string into the cursor.execute method?
Do the format ahead of time, and then pass the resulting string into execute. That way you avoid the SQL injection potential, but still get the formatting you want.
e.g. the query becomes:
query = """
SELECT *
FROM info
WHERE date_trunc('month', info.created_at) =
date_trunc('month', %s)"""
And then the format and execute becomes:
dateStr = "{:04}-{:02}-01".format(year, month)
cursor.execute(query, (dateStr,))
I use psycopg2, but it appears pg8000 adheres to the same DBI standard, so I would expect this to work in pg8000, too.
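A variation on the same idea, offered only as a sketch: build the string with datetime.date instead of a format string, so that an impossible month is rejected before anything reaches the database. The helper name month_start is hypothetical, and the cursor is assumed to be an open DB-API cursor.

```python
import datetime


def month_start(year, month):
    """Return the first day of the month as a 'YYYY-MM-DD' string."""
    # datetime.date raises ValueError for an impossible month such as 13.
    return datetime.date(year, month, 1).isoformat()


query = """
    SELECT *
    FROM info
    WHERE date_trunc('month', info.created_at) =
          date_trunc('month', %s)
"""

# A one-element tuple: the whole date string is a single parameter.
params = (month_start(2017, 4),)

print(params)
# cursor.execute(query, params)  # cursor is assumed to exist
```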
It's possible to do this via concatenation, to the detriment of readability.
query = """
SELECT *
FROM info
WHERE date_trunc('month', info.created_at) =
date_trunc('month', %s || '-' || %s || '-01')
"""
cursor.execute(query, (year, month))

Inserting a Python datetime.datetime object into MySQL

I have a date column in a MySQL table. I want to insert a datetime.datetime() object into this column. What should I be using in the execute statement?
I have tried:
now = datetime.datetime(2009,5,5)
cursor.execute("INSERT INTO table (name, id, datecolumn) VALUES (%s, %s, %s)", ("name", 4, now))
I am getting an error as: "TypeError: not all arguments converted during string formatting"
What should I use instead of %s?
For a time field, use:
import time
time.strftime('%Y-%m-%d %H:%M:%S')
I think strftime also applies to datetime.
You are most likely getting the TypeError because you need quotes around the datecolumn value.
Try:
now = datetime.datetime(2009, 5, 5)
cursor.execute("INSERT INTO table (name, id, datecolumn) VALUES (%s, %s, '%s')",
("name", 4, now))
With regards to the format, I had success with the above command (which includes the milliseconds) and with:
now.strftime('%Y-%m-%d %H:%M:%S')
Hope this helps.
Try using now.date() to get a Date object rather than a DateTime.
If that doesn't work, then converting that to a string should work:
now = datetime.datetime(2009,5,5)
str_now = now.date().isoformat()
cursor.execute('INSERT INTO table (name, id, datecolumn) VALUES (%s,%s,%s)', ('name',4,str_now))
Use Python method datetime.strftime(format), where format = '%Y-%m-%d %H:%M:%S'.
import datetime
now = datetime.datetime.utcnow()
cursor.execute("INSERT INTO table (name, id, datecolumn) VALUES (%s, %s, %s)",
("name", 4, now.strftime('%Y-%m-%d %H:%M:%S')))
Timezones
If timezones are a concern, the MySQL timezone can be set for UTC as follows:
cursor.execute("SET time_zone = '+00:00'")
And the timezone can be set in Python:
now = datetime.datetime.now(datetime.timezone.utc)
MySQL Documentation
MySQL recognizes DATETIME and TIMESTAMP values in these formats:
As a string in either 'YYYY-MM-DD HH:MM:SS' or 'YY-MM-DD HH:MM:SS'
format. A “relaxed” syntax is permitted here, too: Any punctuation
character may be used as the delimiter between date parts or time
parts. For example, '2012-12-31 11:30:45', '2012^12^31 11+30+45',
'2012/12/31 11*30*45', and '2012#12#31 11^30^45' are equivalent.
The only delimiter recognized between a date and time part and a
fractional seconds part is the decimal point.
The date and time parts can be separated by T rather than a space. For
example, '2012-12-31 11:30:45' '2012-12-31T11:30:45' are equivalent.
As a string with no delimiters in either 'YYYYMMDDHHMMSS' or
'YYMMDDHHMMSS' format, provided that the string makes sense as a date.
For example, '20070523091528' and '070523091528' are interpreted as
'2007-05-23 09:15:28', but '071122129015' is illegal (it has a
nonsensical minute part) and becomes '0000-00-00 00:00:00'.
As a number in either YYYYMMDDHHMMSS or YYMMDDHHMMSS format, provided
that the number makes sense as a date. For example, 19830905132800 and
830905132800 are interpreted as '1983-09-05 13:28:00'.
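As a quick illustration of the string formats quoted above, this sketch produces three of them from a Python datetime. It only builds the strings; whether to send a string or the datetime object itself is up to your driver.

```python
import datetime

now = datetime.datetime(2012, 12, 31, 11, 30, 45)

# 'YYYY-MM-DD HH:MM:SS' with the usual delimiters.
delimited = now.strftime("%Y-%m-%d %H:%M:%S")

# Same value with T between the date and time parts.
with_t = now.isoformat()

# 'YYYYMMDDHHMMSS' with no delimiters.
compact = now.strftime("%Y%m%d%H%M%S")

print(delimited)
print(with_t)
print(compact)
```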
What database are you connecting to? I know Oracle can be picky about date formats and likes ISO 8601 format.
Note: Oops, I just read you are on MySQL. Just format the date and try it as a separate direct SQL call to test.
In Python, you can get an ISO date like
now.isoformat()
For instance, Oracle likes dates like
insert into x values(99, '31-may-09');
Depending on your database, if it is Oracle you might need to TO_DATE it:
insert into x
values(99, to_date('2009/05/31:12:00:00AM', 'yyyy/mm/dd:hh:mi:ssam'));
The general usage of TO_DATE is:
TO_DATE(<string>, '<format>')
If using another database (I saw the cursor and thought Oracle; I could be wrong) then check their date format tools. For MySQL it is DATE_FORMAT() and SQL Server it is CONVERT.
Also using a tool like SQLAlchemy will remove differences like these and make your life easy.
If you're just using a python datetime.date (not a full datetime.datetime), just cast the date as a string. This is very simple and works for me (mysql, python 2.7, Ubuntu). The column published_date is a MySQL date field, the python variable publish_date is datetime.date.
# make the record for the passed link info
sql_stmt = "INSERT INTO snippet_links (" + \
    "link_headline, link_url, published_date, author, source, coco_id, link_id) " + \
    "VALUES (%s, %s, %s, %s, %s, %s, %s);"
sql_data = (title, link, str(publish_date),
            author, posted_by,
            str(coco_id), str(link_id))
try:
    dbc.execute(sql_stmt, sql_data)
except Exception as e:
    ...
dt= datetime.now()
query = """INSERT INTO table1(python_Date_col)
VALUES (%s)
"""
conn = ...... # Connection creating process
cur = conn.cursor()
cur.execute(query,(dt))
Above code will fail as "datetime.now()" produces "datetime.datetime(2014, 2, 11, 1, 16)" as a parameter value to insert statement.
Use the following method to capture the datetime which gives string value.
dt= datetime.now().strftime("%Y%m%d%H%M%S")
I was able to successfully run the code after the change...
For example, if the date is 5/5/22, convert it into the MySQL date format 2022-05-05 before inserting the record into the MySQL database. The format codes used are:
%m month
%d day of the month
%Y four-digit year
%y two-digit year
Code below:
from datetime import datetime
now='5/5/22'
print("Before", now)
now = datetime.strptime(now, '%m/%d/%y').strftime('%Y-%m-%d')
print("After", now)
cursor.execute("INSERT INTO table (name, id, datecolumn) VALUES (%s, %s, %s)",(name, 4,now))
Output:
Before 5/5/22
After 2022-05-05
(mysql format you can easily insert into database)
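When inserting into T-SQL, a timestamp with six fractional digits fails:
select CONVERT(datetime,'2019-09-13 09:04:35.823312',21)
while one with three fractional digits works:
select CONVERT(datetime,'2019-09-13 09:04:35.823',21)
An easy way to convert one into the other: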
import re

regexp = re.compile(r'\.(\d{6})')

def to_splunk_iso(dt):
    """Trims the microseconds in an ISO-format datetime string to milliseconds."""
    # Six-digit microseconds string, e.g. '823312'.
    microseconds = regexp.search(dt).group(1)
    # Keep the first three digits so leading zeros survive (truncates, does not round).
    return regexp.sub('.' + microseconds[:3], dt)
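If you still have the datetime object rather than a string, the standard library can produce the millisecond form directly. This is an alternative sketch, not the approach above: it relies on the timespec argument of datetime.isoformat (available since Python 3.6), which truncates the fractional part rather than rounding it.

```python
import datetime

dt = datetime.datetime(2019, 9, 13, 9, 4, 35, 823312)

# sep=" " gives a space instead of "T"; timespec trims to three digits.
millis = dt.isoformat(sep=" ", timespec="milliseconds")

print(millis)
```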
