Use same parameter multiple times in SQL query - Python

I am using pyodbc and Microsoft SQL Server.
I am trying to replicate a stored procedure in Python where this query is executed for every @currentSurveyId:
SELECT *
FROM
(
    SELECT
        SurveyId,
        QuestionId,
        1 as InSurvey
    FROM
        SurveyStructure
    WHERE
        SurveyId = @currentSurveyId
    UNION
    SELECT
        @currentSurveyId as SurveyId,
        Q.QuestionId,
        0 as InSurvey
    FROM
        Question as Q
    WHERE NOT EXISTS
    (
        SELECT *
        FROM SurveyStructure as S
        WHERE S.SurveyId = @currentSurveyId AND S.QuestionId = Q.QuestionId
    )
) as t
ORDER BY QuestionId
In Python, this is what I have so far:
cursor.execute("""SELECT UserId FROM dbo.[User]""")
allSurveyID = cursor.fetchall()
for i in allSurveyID:
p = i
test = cursor.execute("""SELECT *
FROM
(
SELECT
SurveyId,
QuestionId,
1 as InSurvey
FROM
SurveyStructure
WHERE
SurveyId = (?)
UNION
SELECT
(?) as SurveyId,
Q.QuestionId,
0 as InSurvey
FROM
Question as Q
WHERE NOT EXISTS
(
SELECT *
FROM SurveyStructure as S
WHERE S.SurveyId = (?)AND S.QuestionId = Q.QuestionId
)
) as t
ORDER BY QuestionId""",p)
for i in test:
print(i)
The parameter works when used once (if I delete everything from UNION onwards). When I try to use the same parameter in the rest of the query, I get the following error: ('The SQL contains 3 parameter markers, but 1 parameters were supplied', 'HY000')
Is it possible to use the same parameter multiple times in the same query?
Thank you

pyodbc itself only supports "qmark" (positional) parameters, but with T-SQL (Microsoft SQL Server) we can use an anonymous code block to avoid having to pass the same parameter value multiple times:
cnxn = pyodbc.connect(connection_string)
crsr = cnxn.cursor()
sql = """\
SET NOCOUNT ON;
DECLARE @my_param int = ?;
SELECT @my_param AS original, @my_param * 2 AS doubled;
"""
results = crsr.execute(sql, 2).fetchone()
print(results) # (2, 4)
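The SET NOCOUNT ON at the top matters here: it suppresses the row-count messages that statements inside the block can emit, which pyodbc may otherwise surface as extra (empty) result sets ahead of the one you actually want.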

If reusing the same parameter value, simply multiply a one-item list of the parameter:
cursor.execute(sql, [p]*3)
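Applied to the loop in the question, the fix looks roughly like the sketch below; sql is assumed to hold the original three-marker query string, and each fetched Row is unpacked to its first column before binding (p = i above binds the whole Row object rather than the scalar id):
# sql holds the original query from the question, with its three ? markers
cursor.execute("SELECT UserId FROM dbo.[User]")
survey_ids = [row[0] for row in cursor.fetchall()]  # unpack each Row to a scalar

for survey_id in survey_ids:
    cursor.execute(sql, [survey_id] * 3)  # same value bound to all three markers
    for row in cursor.fetchall():
        print(row)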
Consider also refactoring your SQL to a LEFT JOIN (or FULL JOIN), which requires only two qmarks:
SELECT DISTINCT
       ISNULL(S.SurveyId, ?) AS SurveyId,
       Q.QuestionId,
       IIF(S.SurveyId IS NOT NULL, 1, 0) AS InSurvey
FROM Question Q
LEFT JOIN SurveyStructure S
       ON S.QuestionId = Q.QuestionId
      AND S.SurveyId = ?
ORDER BY Q.QuestionId
Possibly even to a version requiring just one parameter:
SELECT MAX(S.SurveyId) AS SurveyId,
       Q.QuestionId,
       IIF(S.SurveyId IS NOT NULL, 1, 0) AS InSurvey
FROM Question Q
LEFT JOIN SurveyStructure S
       ON S.QuestionId = Q.QuestionId
      AND S.SurveyId = ?
GROUP BY Q.QuestionId,
         IIF(S.SurveyId IS NOT NULL, 1, 0)
ORDER BY Q.QuestionId
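Either refactored statement then binds straightforwardly, e.g. (left_join_sql and single_param_sql are placeholder names for the two statements above, and survey_id is the loop variable from the earlier sketch):
cursor.execute(left_join_sql, [survey_id] * 2)   # two markers, same value
cursor.execute(single_param_sql, survey_id)      # pyodbc also accepts a single scalar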

Related

Oracle SQL show records WHERE there are more than one record for each pair of (a,b)

I am using Python to access an Oracle Exadata database, which is HUGE. The documentation for the table is rather poor and I need to understand strange cases. Coming from an R/Python world, I ran the following query:
query = ("""
SELECT COUNT(counter) as freq, counter
FROM (
SELECT COUNT(*) as counter
FROM schema.table
WHERE x = 1 AND y = 1
GROUP BY a,b )
GROUP BY counter""")
with cx_Oralce.connct(dsn=tsn, encoding = "UTF-8") as con:
df = pd.read_sql(con=con, query=sql)
This essentially counts the frequency of observations for a given (a,b) pair. My prior was that they are all 1 (they are not). So I would like to see the observations that drive this:
query = ("""
SELECT *
FROM schema.table
WHERE x = 1 and y = 1
AND (for each (a,b) there is more than one record)""")
I am struggling to translate this into proper Oracle SQL.
In R (dplyr) this would be a combination of group_by and mutate (instead of summarise) and in Python pandas this could be done with transform.
I am new to SQL and may use incorrect terminology. I appreciate being corrected.
You can use window functions:
SELECT ab.*
FROM (SELECT t.*, COUNT(*) OVER (PARTITION BY a, b) as cnt
      FROM schema.table t
      WHERE x = 1 AND y = 1
     ) ab
WHERE cnt > 1;
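For completeness, a minimal sketch of running that from Python with pandas, mirroring the connection style in the question (tsn is the question's DSN placeholder; note that cx_Oracle rejects a trailing semicolon, so it is dropped here):
import cx_Oracle
import pandas as pd

query = """
    SELECT ab.*
    FROM (SELECT t.*, COUNT(*) OVER (PARTITION BY a, b) AS cnt
          FROM schema.table t
          WHERE x = 1 AND y = 1) ab
    WHERE cnt > 1"""

with cx_Oracle.connect(dsn=tsn, encoding="UTF-8") as con:
    df = pd.read_sql(query, con=con)  # the rows driving the duplicate (a,b) pairs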

Can't convert this SQL String (with VALUES ... AS) to SQLAlchemy Code

The SQL query I have can identify the Max Edit Time from the 3 tables that it is joining together:
Select Identity.SSN, Schedule.First_Class, Students.Last_Name,
(SELECT Max(v)
FROM (VALUES (Students.Edit_DtTm), (Schedule.Edit_DtTm),
(Identity.Edit_DtTm)) AS value(v)) as [MaxEditDate]
FROM Schedule
LEFT JOIN Students ON Schedule.stdnt_id=Students.Student_Id
LEFT JOIN Identity ON Schedule.std_id=Identity.std_id
I need this to be in SQLAlchemy so I can reference the columns being used elsewhere in my code. Below is the simplest version of what I'm trying to do, but it doesn't work. I've tried changing how I query it, but I either get a SQL error that I'm using VALUES incorrectly, or it doesn't join properly and returns the actual highest value in those columns without matching it to the outer query:
max_edit_subquery = sa.func.values(Students.Edit_DtTm, Schedule.Edit_DtTm, Identity.Edit_DtTm)
base_query = (sa.select([Identity.SSN, Schedule.First_Class, Students.Last_Name,
                         (sa.select([sa.func.max(max_edit_subquery)]))])
              .select_from(Schedule.__table__.join(Students, Schedule.stdnt_id == Students.Student_Id)
                           .join(Identity, Schedule.std_id == Identity.std_id)))
I am not an expert at SQLAlchemy, but you could exchange VALUES for UNION ALL:
Select Identity.SSN, Schedule.First_Class, Students.Last_Name,
(SELECT Max(v)
FROM (SELECT Students.Edit_DtTm AS v
UNION ALL SELECT Schedule.Edit_DtTm
UNION ALL SELECT Identity.Edit_DtTm) s
) as [MaxEditDate]
FROM Schedule
LEFT JOIN Students ON Schedule.stdnt_id=Students.Student_Id
LEFT JOIN Identity ON Schedule.std_id=Identity.std_id;
Another approach is to use the GREATEST function (not available in T-SQL):
Select Identity.SSN, Schedule.First_Class, Students.Last_Name,
GREATEST(Students.Edit_DtTm, Schedule.Edit_DtTm,Identity.Edit_DtTm)
as [MaxEditDate]
FROM Schedule
LEFT JOIN Students ON Schedule.stdnt_id=Students.Student_Id
LEFT JOIN Identity ON Schedule.std_id=Identity.std_id;
I hope this helps you translate it to the ORM version.
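If the full ORM translation proves stubborn, one pragmatic fallback is to wrap the rewritten SQL in text() so the columns can still be referenced by name from Python. A sketch in SQLAlchemy 1.x style, assuming an engine already exists:
import sqlalchemy as sa

sql = sa.text("""
    SELECT Identity.SSN, Schedule.First_Class, Students.Last_Name,
           (SELECT MAX(v)
            FROM (SELECT Students.Edit_DtTm AS v
                  UNION ALL SELECT Schedule.Edit_DtTm
                  UNION ALL SELECT Identity.Edit_DtTm) s
           ) AS MaxEditDate
    FROM Schedule
    LEFT JOIN Students ON Schedule.stdnt_id = Students.Student_Id
    LEFT JOIN Identity ON Schedule.std_id = Identity.std_id""")

with engine.connect() as conn:
    for row in conn.execute(sql):
        print(row["SSN"], row["MaxEditDate"])  # columns addressable by name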
I had a similar problem and solved it using the approach below. I have added the full code and the resulting query. The code was executed against MS SQL Server. I used different tables and masked them with the tables and columns from your requirement in the snippet below.
from sqlalchemy import *
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.types import String
from sqlalchemy.sql.expression import FromClause

class values(FromClause):
    def __init__(self, *args):
        self.column_names = args

@compiles(values)
def compile_values(element, compiler, asfrom=False, **kwrgs):
    values = "VALUES %s" % ", ".join("(%s)" % compiler.render_literal_value(elem, String()) for elem in element.column_names)
    if asfrom:
        values = "(%s)" % values
    return values

base_query = self.db_session.query(Schedule.Edit_DtTm.label("Schedule_Edit_DtTm"),
                                   Identity.Edit_DtTm.label("Identity_Edit_DtTm"),
                                   Students.Edit_DtTm.label("Students_Edit_DtTm"),
                                   Identity.SSN
                                   ).outerjoin(Students, Schedule.stdnt_id == Students.Student_Id
                                   ).outerjoin(Identity, Schedule.std_id == Identity.std_id).subquery()
values_at_from_clause = values(("Students_Edit_DtTm"), ("Schedule_Edit_DtTm"), ("Identity_Edit_DtTm")
                               ).alias('values(MaxEditDate)')
get_max_from_values = self.db_session.query(func.max(text('MaxEditDate'))
                                            ).select_from(values_at_from_clause)
output_query = self.db_session.query(get_max_from_values.subquery()
                                     ).label("MaxEditDate")
Running print output_query produces:
SELECT
    anon_1.Schedule_Edit_DtTm AS anon_1_Schedule_Edit_DtTm,
    anon_1.Students_Edit_DtTm AS anon_1_Students_Edit_DtTm,
    anon_1.Identity_Edit_DtTm AS anon_1_Identity_Edit_DtTm,
    anon_1.SSN AS anon_1_SSN,
    (
        SELECT anon_2.max_1
        FROM (
            SELECT max( MaxEditDate ) AS max_1
            FROM (
                VALUES (Students_Edit_DtTm),
                       (Schedule_Edit_DtTm),
                       (Identity_Edit_DtTm)
            ) AS values(MaxEditDate)
        ) AS anon_2
    ) AS MaxEditDate
FROM (
    SELECT
        Schedule.Edit_DtTm AS Schedule_Edit_DtTm,
        Students.Edit_DtTm AS Students_Edit_DtTm,
        Identity.Edit_DtTm AS Identity_Edit_DtTm,
        Identity.SSN AS SSN
    FROM Schedule WITH(NOLOCK)
    LEFT JOIN Students WITH(NOLOCK) ON Schedule.stdnt_id = Students.Student_Id
    LEFT JOIN Identity WITH(NOLOCK) ON Schedule.std_id = Identity.std_id
) AS anon_1

Increase Query Speed in SQLite

I am very much a newbie with Python and SQLite. I am trying to create a script that reads data from a table (rawdata) and then performs some calculations, which are then stored in a new table. I am counting the number of races that a player has won before that date at a particular track position and calculating the percentage. There are 15 track positions in total. Overall the script is very slow. Any suggestions to improve its speed? I have already used the PRAGMA parameters.
Below is the script.
for item in result:
    l1 = str(item[0])
    l2 = item[1]
    l3 = int(item[2])
    winpost = []
    key = l1.split("|")
    dt = l2
    ### Denominator --------------
    cursor.execute(
        "SELECT rowid FROM rawdata WHERE Track = ? AND Date < ? AND Distance = ? AND Surface = ? AND OfficialFinish = 1",
        (key[2], dt, str(key[4]), str(key[5]),))
    result_den1 = cursor.fetchall()
    cursor.execute(
        "SELECT rowid FROM rawdata WHERE Track = ? AND RaceSN <= ? AND Date = ? AND Distance = ? AND Surface = ? AND OfficialFinish = 1",
        (key[2], int(key[3]), dt, str(key[4]), str(key[5]),))
    result_den2 = cursor.fetchall()
    totalmat = len(result_den1) + len(result_den2)
    if totalmat > 0:
        for i in range(1, 16):
            cursor.execute(
                "SELECT rowid FROM rawdata WHERE Track = ? AND Date < ? AND PolPosition = ? AND Distance = ? AND Surface = ? AND OfficialFinish = 1",
                (key[2], dt, i, str(key[4]), str(key[5]),))
            result_num1 = cursor.fetchall()
            cursor.execute(
                "SELECT rowid FROM rawdata WHERE Track = ? AND RaceSN <= ? AND Date = ? AND PolPosition = ? AND Distance = ? AND Surface = ? AND OfficialFinish = 1",
                (key[2], int(key[3]), dt, i, str(key[4]), str(key[5]),))
            result_num2 = cursor.fetchall()
            winpost.append(len(result_num1) + len(result_num2))
        winpost = [float(x) / totalmat for x in winpost]
        rank = rankmin(winpost)  # rankmin() is a user-defined ranking helper
        franks = list(rank)
        franks.insert(0, int(key[3]))
        franks.insert(0, dt)
        franks.insert(0, l1)
        table1.append(franks)
        franks = []

cursor.executemany("INSERT INTO posttable VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)", table1)
Sending and retrieving an SQL query is "expensive" in terms of time. The easiest way to speed things up would be to use SQL functions to reduce the number of queries.
For example, the first two queries can be reduced to a single call using COUNT(), UNION, and aliases.
SELECT COUNT(*)
FROM
( SELECT rowid FROM rawdata WHERE ...
  UNION
  SELECT rowid FROM rawdata WHERE ...
) totalmatch
In this case we take the two original queries (with your conditions in place of the "..."), combine them with a UNION statement, give that union the alias "totalmatch", and count all the rows in it.
The same thing can be done with the second set of queries. Instead of cycling 15 times over 2 queries (resulting in 30 calls to the SQL engine), you can replace them with a single query by also using GROUP BY.
SELECT PolPosition, COUNT(PolPosition)
FROM
( SELECT PolPosition FROM rawdata WHERE ...
  UNION ALL
  SELECT PolPosition FROM rawdata WHERE ...
) totalmatch
GROUP BY PolPosition
In this case we take the same combined query as before, but use UNION ALL (plain UNION would collapse duplicate positions across the two branches and skew the counts; the two WHERE clauses are disjoint, so nothing is double-counted) and group by PolPosition, using COUNT to display how many rows are in each group.
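Dropped back into the Python loop, the numerator block then becomes a single round trip per item instead of 30. A sketch, reusing the names from the question's code:
# one round trip replaces the 15-iteration numerator loop
cursor.execute(
    "SELECT PolPosition, COUNT(*) FROM "
    "( SELECT PolPosition FROM rawdata WHERE Track = ? AND Date < ? "
    "  AND Distance = ? AND Surface = ? AND OfficialFinish = 1 "
    "  UNION ALL "
    "  SELECT PolPosition FROM rawdata WHERE Track = ? AND RaceSN <= ? AND Date = ? "
    "  AND Distance = ? AND Surface = ? AND OfficialFinish = 1 "
    ") totalmatch GROUP BY PolPosition",
    (key[2], dt, str(key[4]), str(key[5]),
     key[2], int(key[3]), dt, str(key[4]), str(key[5])))
counts = dict(cursor.fetchall())                    # {PolPosition: wins}
winpost = [counts.get(i, 0) for i in range(1, 16)]  # positions 1..15, 0 if absent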
W3Schools is a great resource for how these functions work:
http://www.w3schools.com/sql/default.asp

Using a Python loop to create SQL databases from lists [duplicate]

if count == 1:
    cursor.execute("SELECT * FROM PacketManager WHERE ? = ?", filters[0], parameters[0])
    all_rows = cursor.fetchall()
elif count == 2:
    cursor.execute("SELECT * FROM PacketManager WHERE ? = ? AND ? = ?", filters[0], parameters[0], filters[1], parameters[1])
    all_rows = cursor.fetchall()
elif count == 3:
    cursor.execute("SELECT * FROM PacketManager WHERE ? = ? AND ? = ? AND ? = ?", filters[0], parameters[0], filters[1], parameters[1], filters[2], parameters[2])
    all_rows = cursor.fetchall()
This is a code snippet from my program. What I'm planning to do is pass the column name and the parameter into the query.
The filters array contains the column names and the parameters array contains the parameters. The count is the number of filters set by the user. The filters and parameters arrays are already built and have no problem; I just need to pass them to the query for it to execute. This gives me the error "TypeError: function takes at most 2 arguments".
You cannot use SQL parameters to interpolate column names. You'll have to use classic string formatting for those parts. That's the point of SQL parameters; they quote values so they cannot possibly be interpreted as SQL statements or object names.
The following, using string formatting for the column name, works, but be 100% certain that the filters[0] value doesn't come from user input:
cursor.execute("SELECT * FROM PacketManager WHERE {} = ?".format(filters[0]), (parameters[0],))
You probably want to validate the column name against a set of permissible column names, to ensure no injection can take place.
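A sketch of that validation, generalized to any count; ALLOWED_COLUMNS is a hypothetical whitelist, not something from the original post:
ALLOWED_COLUMNS = {"foo", "bar", "baz"}  # hypothetical set of real column names

clauses, values = [], []
for col, val in zip(filters[:count], parameters[:count]):
    if col not in ALLOWED_COLUMNS:
        raise ValueError("unexpected column name: %r" % col)
    clauses.append("{} = ?".format(col))  # column name via validated formatting
    values.append(val)                    # value via a real SQL parameter

cursor.execute("SELECT * FROM PacketManager WHERE " + " AND ".join(clauses), values)
all_rows = cursor.fetchall()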
You can only set parameters using ?, not table or column names.
You could build a dict with predefined queries.
queries = {
    "foo": "SELECT * FROM PacketManager WHERE foo = ?",
    "bar": "SELECT * FROM PacketManager WHERE bar = ?",
    "foo_bar": "SELECT * FROM PacketManager WHERE foo = ? AND bar = ?",
}
# count == 1
cursor.execute(queries[filters[0]], (parameters[0],))
# count == 2
cursor.execute(queries[filters[0] + "_" + filters[1]], (parameters[0], parameters[1]))
This approach also keeps you safe from SQL injection via filters[0].

SQL- How to grab a large list of data instead of iterating each query

I gather a list of items, and for each item I check the database with an SQL query using the following code:
SELECT *
FROM task_activity as ja
JOIN task as j ON ja.task_id = j.id
WHERE j.name = '%s'
  AND ja.avg_runtime <> 0
  AND ja.avg_runtime is not NULL
  AND ja.id = (SELECT MAX(id) FROM task_activity
               WHERE task_id = ja.task_id
                 AND avg_runtime <> 0
                 AND ja.avg_runtime is not NULL)
% str(task.get('name'))).fetchall()
Do I really need to iterate through the list and make a query for each item? The list is quite large at times. Can I just make one query and get back a full data set?
In this particular query I'm only looking for the avg_runtime column for each task; the row with the maximum id holds the last calculated runtime.
I don't have access to the database other then to make queries. Using Microsoft SQL Server 2012 (SP1) - 11.0.3349.0 (X64)
You might be able to speed this up using row_number(). Note, I think there's a bug in your original query. Should ja.avg_runtime in the subquery just be avg_runtime?
sql = """with x as (
select
task_id,
avg_runtime,
id,
row_number() over (partition by ja.task_id order by ja.id desc) rn
from
task_activity as ja
join
task as j
on ja.task_id = j.id
where
j.name in ({0}) and
ja.avg_runtime <> 0 and
ja.avg_runtime is not null
) select
task_id,
avg_runtime,
id
from
x
where
rn = 1;"""
# build up ?,?,? for parameter substitution
# assume tasknames is the list containing the task names
params = ",".join("?" for _ in tasknames)
# connection is your db connection
cursor = connection.cursor()
# interpolate the ?,?,? and bind the parameters
cursor.execute(sql.format(params), tasknames)
rows = cursor.fetchall()
The following index should make this query pretty fast (although it depends on how many rows are being excluded by the filters on ja.avg_runtime):
create index ix_task_id_id on task_activity (task_id, id desc);
