The problem is that I have a long SQL script (with variables, comments, and many lines of code) whose results I need to reproduce in a Jupyter Notebook.
I have already tried "tidying" the SQL into a string, but it is many lines and would take too long. For system architecture reasons I cannot create a procedure or view that encapsulates the script.
# The basic structure of my problem, minus the actual detail, which isn't needed
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine("server/database=connect")
SQL = "select * from foo"  # Insert very long script here
SQL_DF = pd.read_sql(SQL, engine)
I am hoping there is an informal way or trick to convert a whole cell (containing my script text) to a string variable, or that someone has another method of turning long SQL scripts into strings that can be used easily with SQLAlchemy.
You can try using triple quotes:
"""select * from foo
some more code
and more
etc"""
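Or you can put \ at the end of each line. The backslash removes the line break, so keep a space before it, otherwise the last word of one line runs into the first word of the next:
"select * from foo \
some more code \
and more \
etc"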
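To make the difference between the two styles concrete, here is a minimal sketch. It uses the standard-library sqlite3 module as a stand-in for the SQLAlchemy engine in the question, and the table `foo` with its `region` and `sales` columns is made up for illustration. A triple-quoted string keeps line breaks and `--` comments exactly as written, so a long script can usually be pasted in unchanged.

```python
import sqlite3

# Triple-quoted string: line breaks and SQL comments survive as written.
SQL = """
-- total sales per region (hypothetical table and columns)
SELECT region, SUM(sales) AS total
FROM foo
GROUP BY region
ORDER BY region
"""

# Backslash continuation: the line break is dropped, so every line
# needs its own trailing space before the backslash.
SQL_ONE_LINE = "SELECT region, SUM(sales) AS total \
FROM foo \
GROUP BY region \
ORDER BY region"

# In-memory database standing in for the real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (region TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?)",
    [("north", 10), ("south", 5), ("north", 7)],
)

rows_triple = conn.execute(SQL).fetchall()
rows_backslash = conn.execute(SQL_ONE_LINE).fetchall()
```

Because the backslash form collapses everything onto one line, a `--` comment in the script would comment out the rest of the statement, which is one more reason the triple-quoted form tends to be the safer choice for long scripts.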
I want to run a Presto SQL query in a for loop so that the query pulls hourly data based on my date variables.
An example query is along the lines of:
x = datetime.strptime('12-10-22', '%d-%m-%y').date()
y = datetime.strptime('13-10-22', '%d-%m-%y').date()
for dt in rrule.rrule(rrule.HOURLY, dtstart=nextProcStart, until=nextProcEnd):
    sql_query = "SELECT SUM(sales) FROM a WHERE date between x and y"
I will note I'm using the syntax of writing the SQL query as a variable so along the lines of:
sql_query = """ SELECT... FROM..."""
I have tried just adding the variables into the query, but no luck. I'm unsure what steps will work.
I've also tried using .format(x,y) at the end of my SQL query, but I keep getting an error saying:
KeyError: 'x'
Remember that your SQL statement is no more than a string, so you just need to know how to incorporate a variable into a string.
Try:
sql_query = "SELECT SUM(sales) FROM a WHERE date between {} and {}".format(x, y)
Read "How do I put a variable’s value inside a string (interpolate it into the string)?" for more info or alternative methods.
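The KeyError: 'x' from the question is what str.format raises when the template contains named fields such as {x} but the values are passed positionally. The sketch below assumes that is what the original query looked like; the quoting of the dates is also an assumption, since the exact date literal syntax depends on the database.

```python
from datetime import date

x = date(2022, 10, 12)
y = date(2022, 10, 13)

# Assumed shape of the failing template: named fields {x} and {y}.
template_named = "SELECT SUM(sales) FROM a WHERE date BETWEEN '{x}' AND '{y}'"

try:
    # Positional arguments cannot fill named fields.
    template_named.format(x, y)
    error = None
except KeyError as exc:
    error = exc

# Passing the values by keyword matches the named fields.
sql_query = template_named.format(x=x, y=y)
```

This only explains the error message. It is still plain string interpolation, so it carries the same security caveat as any other string-building approach.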
Hopefully this answers your immediate question about how to incorporate a variable into a string and get your code, as is, to work. However, as @nbk mentions in a comment below, this method is NOT recommended because it is insecure.
Building SQL statements by concatenation or formatting like this opens the code up to injection attacks. Even if your database does not contain sensitive information, it is bad practice.
Prepared statements have many advantages, not least of all that they are more secure and more efficient. I would certainly invest some time in researching and understanding SQL prepared statements.
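As a rough illustration of the parameterized approach applied to an hourly loop, here is a sketch using the standard-library sqlite3 module in place of a Presto client. The table `a`, its `ts` column, and the sample rows are invented for the example, and the placeholder syntax (`?` here) differs between drivers, so check what your Presto client expects.

```python
import sqlite3
from datetime import datetime, timedelta

# In-memory database standing in for the real server (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (ts TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?)",
    [
        ("2022-10-12 00:15:00", 3),
        ("2022-10-12 00:45:00", 4),
        ("2022-10-12 01:30:00", 5),
    ],
)

start = datetime(2022, 10, 12)
end = datetime(2022, 10, 12, 3)

# The ? placeholders are filled in by the driver, not by string formatting.
sql_query = "SELECT COALESCE(SUM(sales), 0) FROM a WHERE ts >= ? AND ts < ?"

hourly_totals = []
hour = start
while hour < end:
    next_hour = hour + timedelta(hours=1)
    params = (
        hour.strftime("%Y-%m-%d %H:%M:%S"),
        next_hour.strftime("%Y-%m-%d %H:%M:%S"),
    )
    (total,) = conn.execute(sql_query, params).fetchone()
    hourly_totals.append((hour, total))
    hour = next_hour
```

The query text stays constant across iterations and only the bound values change, which is the property that makes this style both safer and friendlier to statement caching.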
I have a function that executes many SQL queries with different dates.
What I want is to pass all dates and other query variables as function parameters and then just execute the function. I have figured out how to do this for datetime variables as below. But I also have a query that looks at specific campaign_names in a database and pulls those as strings. I want to be able to pass those strings as function parameters but I haven't figured out the correct syntax for this in the SQL query.
def Camp_eval(start_date,end_1M,camp1,camp2,camp3):
    query1 = f"""SELECT CONTACT_NUMBER, OUTCOME_DATE
    FROM DATABASE1
    where OUTCOME_DATE >= (to_date('{start_date}', 'dd/mm/yyyy'))
    and OUTCOME_DATE < (to_date('{end_1M}', 'dd/mm/yyyy'))"""
    query2 = """SELECT CONTACT_NUMBER
    FROM DATABASE2
    WHERE (CAMP_NAME = {camp1} or
    CAMP_NAME = {camp2} or
    CAMP_NAME = {camp3})"""
Camp_eval('01/04/2022','01/05/2022','Camp_2022_04','Camp_2022_05','Camp_2022_06')
The parameters start_date and end_1M work fine with the {} brackets, but the camp variables, which are strings, don't return any results, even though the database has rows matching those conditions when I write the values directly in the query.
Any help would be appreciated!!
Please, do not use f-strings for creating SQL queries!
Most likely, any library you use for accessing a database already has a way of creating queries: SQLite docs (check code examples).
Another example: cur.execute("SELECT * FROM tasks WHERE priority = ?", (priority,)).
Not only is this way safer (it fixes the SQL injection problem mentioned by @d-malan in the comments), but it also removes the need to care about how data is represented in SQL: the library automatically converts dates, strings, etc. into whatever form they need. Your problem can therefore be fixed by using the proper tools.
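For the campaign-name query specifically, note that query2 in the question has no f prefix, so {camp1} is sent to the database literally, and even with the prefix the values would arrive without the quotes that query1 supplies by hand. Bound parameters avoid both problems. The following is a sketch only, using sqlite3 and its named `:name` placeholders; the lower-case table, the sample rows, and the `camp_eval` signature are assumptions for illustration, and your own driver may use a different placeholder style.

```python
import sqlite3

# In-memory database standing in for DATABASE2 (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE database2 (contact_number TEXT, camp_name TEXT)")
conn.executemany(
    "INSERT INTO database2 VALUES (?, ?)",
    [("111", "Camp_2022_04"), ("222", "Camp_2022_05"), ("333", "Camp_2021_12")],
)

def camp_eval(conn, camp1, camp2, camp3):
    """Return contact numbers for three campaign names using bound parameters."""
    # No quotes around the placeholders: the driver handles string quoting.
    query2 = """SELECT contact_number
    FROM database2
    WHERE (camp_name = :camp1 OR
           camp_name = :camp2 OR
           camp_name = :camp3)
    ORDER BY contact_number"""
    params = {"camp1": camp1, "camp2": camp2, "camp3": camp3}
    return [row[0] for row in conn.execute(query2, params)]

contacts = camp_eval(conn, "Camp_2022_04", "Camp_2022_05", "Camp_2022_06")
```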
I use the simple query below to select from a table based on the date:
select * from tbl where date = '2019-10-01'
The simple query is part of a much larger query that extracts information from many tables on the same server. I don't have execute access on the server, so I can't install a stored procedure to make my life easier. Instead, I read the query into Python and try to replace certain values inside single quote strings, such as:
select * from tbl where date = '<InForceDate>'
I use a simple Python function (below) to replace the placeholder with a value like 2019-10-01, but str.replace() doesn't appear to replace anything when I look at the output. However, when I tried this with a placeholder that wasn't in quotes, it worked. I'm sure I'm missing something fundamental, but I haven't uncovered why it works without quotes and fails with quotes.
Python:
def generate_sql(sql_path, inforce_date):
    # Read the SQL template from the path passed in, then swap the placeholder.
    with open(sql_path, 'r') as sql_file:
        sql_string = sql_file.read()
    sql_final = str.replace(sql_string, r'<InForceDate>', inforce_date)
    return(sql_final)
Can anyone point me in the right direction?
Never mind, folks: problem solved, but I haven't quite figured out why. File encoding is my guess.
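Since encoding is only a guess, one way to rule it in or out is to make the encoding explicit and fail loudly when the placeholder is not found. The sketch below is a possible variant of generate_sql, not the original fix; the `encoding` parameter, the "utf-8-sig" default, and the ValueError are additions for illustration.

```python
import tempfile
from pathlib import Path

def generate_sql(sql_path, inforce_date, encoding="utf-8-sig"):
    """Read a SQL template and substitute the <InForceDate> placeholder."""
    # "utf-8-sig" reads plain UTF-8 and also strips a byte-order mark if present.
    sql_string = Path(sql_path).read_text(encoding=encoding)
    if "<InForceDate>" not in sql_string:
        # Surface the problem instead of silently returning unchanged SQL.
        raise ValueError("placeholder <InForceDate> not found; check the file encoding")
    return sql_string.replace("<InForceDate>", inforce_date)

# Small self-contained check using a temporary template file.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "query.sql"
    path.write_text("select * from tbl where date = '<InForceDate>'", encoding="utf-8")
    sql_final = generate_sql(path, "2019-10-01")
```

The quotes around the placeholder make no difference to str.replace itself, so a silent failure points at the text that was actually read from disk rather than at the replacement call.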
Can somebody please recommend a Python DBAL library that will best suit my requirements? I would like to write my SQL statements directly. Most of the logic will be in database stored procedures (PostgreSQL), so I only need to invoke the procedures, pass arguments to them, and fetch the results. The library should help me with quoting (preventing SQL injection).
I played with SQLAlchemy, but I think there is no quoting helper when writing a SQL statement directly to the engine.execute method.
Thank you
You should have given SQLAlchemy a deeper look; it does a fine job of quoting placeholders:
>>> engine = sqlalchemy.create_engine("sqlite:///:memory:")
>>> engine.execute("select ?", 5).fetchall()
[(5,)]
>>> engine.execute("select ?", "; drop table users; --").fetchall()
[(u'; drop table users; --',)]
psycopg2 (via DB-API) will automatically quote to prevent SQL injection, IF you use it properly. (Using Python's own string formatting is the wrong way; you have to pass the parameters as arguments to the query command itself.)
WRONG:
cur.execute('select * from table where last="%s" and first="%s"'
            % (last, first))
RIGHT:
cur.execute('select * from table where last=%s and first=%s',
            (last, first))
Note: you don't use %, and you don't put quotes around your values.
The syntax is slightly different for MySQLdb and sqlite3. (For example, sqlite uses ? instead of %s.)
Also, for psycopg2, always use %s even if you're dealing with numbers or some other type.
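Each DB-API module reports its placeholder style in a module-level `paramstyle` attribute, which is a quick way to check which syntax applies. The sketch below uses sqlite3 (style "qmark", i.e. `?`) because it ships with Python; psycopg2 documents "pyformat" (`%s`). The `people` table and the hostile input string are invented for the demonstration.

```python
import sqlite3

# DB-API modules advertise their placeholder style here.
style = sqlite3.paramstyle

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (last TEXT, first TEXT)")
# A value containing a quote needs no manual escaping when bound.
conn.execute("INSERT INTO people VALUES (?, ?)", ("O'Brien", "Pat"))

# An injection attempt is treated as an ordinary string value.
hostile = "x'; DROP TABLE people; --"
rows_hostile = conn.execute(
    "SELECT * FROM people WHERE last = ?", (hostile,)
).fetchall()

rows_quote = conn.execute(
    "SELECT * FROM people WHERE last = ? AND first = ?", ("O'Brien", "Pat")
).fetchall()

# The table is still there and still holds its single row.
table_count = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
```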
I've been trying to find a postgres interface for python 2.x that supports real prepared statements, but can't seem to find anything. I don't want one that just escapes quotes in the params you pass in and then interpolates them into the query before executing it. Anyone have any suggestions?
Either py-postgresql for Python3 or pg_proboscis for Python2 will do this.
Python-pgsql will also do this but is not threadsafe. Notably, SQLAlchemy does not make use of prepared statements.
Have a look at web.py's db module.
Examples can be found at:
http://webpy.org/cookbook/select
http://webpy.org/cookbook/update
http://webpy.org/cookbook/delete
http://webpy.org/Insert
These links hint at the answer when using psycopg2. You don't need special API extensions.
Re: psycopg2 and prepared statements
Prepared Statements in Postgresql
Transparently execute SQL queries as prepared statements with Postgresql (Python recipe)
Here's an example that I played with. A word of caution though, it didn't give me the expected performance increase I had hoped for. In fact, it was even slower (just slightly) in a contrived case where I tried to read the whole table of one million rows, one row at a time.
cur.execute('''
PREPARE prepared_select(text, int) AS
SELECT * FROM test
WHERE (name = $1 and rowid > $2) or name > $1
ORDER BY name, rowid
LIMIT 1
''')
name = ''
rowid = 0
cur.execute('EXECUTE prepared_select(%s, %s)', (name, rowid))