SQLAlchemy filter in_ operator - python

I am trying to do a simple filter operation on a query in sqlalchemy, like this:
q = session.query(Genotypes).filter(Genotypes.rsid.in_(inall))
where
inall is a list of strings
Genotypes is mapped to a table:
class Genotypes(object):
pass
Genotypes.mapper = mapper(Genotypes, kg_table, properties={'rsid': getattr(kg_table.c, 'rs#')})
This seems pretty straightforward to me, but I get the following error when I execute the above query by doing q.first():
"sqlalchemy.exc.OperationalError: (OperationalError) too many SQL
variables u'SELECT" followed by a list of the 1M items in the inall
list. But they aren't supposed to be SQL variables, just a list whose
membership is the filtering criteria.
Am I doing the filtering incorrectly?
(the db is sqlite)

If the table where you are getting your rsids from is available in the same database I'd use a subquery to pass them into your Genotypes query rather than passing the one million entries around in your Python code.
sq = session.query(RSID_Source).subquery()
q = session.query(Genotypes).filter(Genotypes.rsid.in_(sq))
The issue is that in order to pass that list to SQLite (or any database, really), SQLAlchemy has to pass over each entry for your in clause as a variable. The SQL translates roughly to:
-- Not valid SQLite SQL
DECLARE #Param1 TEXT;
SET #Param1 = ?;
DECLARE #Param2 TEXT;
SET #Param2 = ?;
-- snip 999,998 more
SELECT field1, field2, -- etc.
FROM Genotypes G
WHERE G.rsid IN (#Param1, #Param2, /* snip */)

The below workaround worked for me:
q = session.query(Genotypes).filter(Genotypes.rsid.in_(inall))
query_as_string = str(q.statement.compile(compile_kwargs={"literal_binds": True}))
session.execute(query_as_string).first()
This basically forces the query to compile as a string before execution, which bypasses the whole variables issue. Some details on this are available in SQLAlchemy's docs here.
BTW, if you're not using SQLite you can make use of the ANY operator to pass the list object as a single parameter (see my answer to this question here).

Related

Add non-datetime variables in SQL python as function parameters

I have a function that executes many SQL queries with different dates.
What I want is to pass all dates and other query variables as function parameters and then just execute the function. I have figured out how to do this for datetime variables as below. But I also have a query that looks at specific campaign_names in a database and pulls those as strings. I want to be able to pass those strings as function parameters but I haven't figured out the correct syntax for this in the SQL query.
def Camp_eval(start_date,end_1M,camp1,camp2,camp3):
query1 = f"""SELECT CONTACT_NUMBER, OUTCOME_DATE
FROM DATABASE1
where OUTCOME_DATE >= (to_date('{start_date}', 'dd/mm/yyyy'))
and OUTCOME_DATE < (to_date('{end_1M}', 'dd/mm/yyyy'))"""
query2 = """SELECT CONTACT_NUMBER
FROM DATABASE2
WHERE (CAMP_NAME = {camp1} or
CAMP_NAME = {camp2} or
CAMP_NAME = {camp3})"""
Camp_eval('01/04/2022','01/05/2022','Camp_2022_04','Camp_2022_05','Camp_2022_06')
The parameters start_date and end_1M work fine with the {} brackets but the camp variables, which are strings don't return any results even though there are results in the database with those conditions if I were to write them directly in the query.
Any help would be appreciated!!
Please, do not use f-strings for creating SQL queries!
Most likely, any library you use for accessing a database already has a way of creating queries: SQLite docs (check code examples).
Another example: cur.execute("SELECT * FROM tasks WHERE priority = ?", (priority,)).
Not only this way is safer (fixes SQL Injection problem mentioned by #d-malan in comments), but it also eliminates the need to care about how data is represented in SQL - the library will automatically cast dates, strings, etc. in what they need to be casted into. Therefore, your problem can be fixed by using proper instruments.

How does a SQLAlchemy Session manage transactions, when executing multiple raw SQL statements at once?

None of the "similar questions" really get at this specific topic, but I am trying to find out how SQLAlchemy's Session handles transactions, when:
Passing raw SQL text to the execute() method, rather than utilizing any SQLAlchemy model objects, AND
The raw SQL text contains multiple distinct commands.
For instance:
bulk_operation = """
DELETE FROM the_table WHERE id = ...;
INSERT INTO the_table (id, name) VALUES (...);
"""
sql = text(bulk_operation)
session.execute(sql.bindparams(id=foo, name=bar))
The goal here is to restore the original state, if either the DELETE or the INSERT fails for any reason.
But does Session.execute() actually guarantee this, in this context? Is it necessary to include BEGIN and COMMIT commands within the raw SQL text itself, or manage from the Python level with session.commit() or something else? Thanks in advance!

MySql read_sql python query with variable #

I am aware that queries in Python can be parameterized using either ? or %s in execute query here or here
However I have some long query that would use some constant variable defined at the beginning of the query
Set #my_const = 'xyz';
select #my_const;
-- Query that use #my_const 40 times
select ... coalesce(field1, #my_const), case(.. then #my_const)...
I would like to do the least modif possible to the query from Mysql. So that instead of modifying the query to
pd.read_sql(select ... coalesce(field1, %s), case(.. then %s)... , [my_const, my_const, my_const, ..]
,I could write something along the line of the initial query. Upon trying the following, however, I am getting a TypeError: 'NoneType' object is not iterable
query_str = "Set #null_val = \'\'; "\
" select #null_val"
erpur_df = pd.read_sql(query_str, con = db)
Any idea how to use the original variable defined in Mysql query ?
The reason
query_str = "Set #null_val = \'\'; "\
" select #null_val"
erpur_df = pd.read_sql(query_str, con = db)
throws that exception is because all you are doing is setting null_value to '' and then selecting that '' - what exactly would you have expected that to give you? EDIT read_sql only seems to execute one query at a time, and as the first query returns no rows it results in that exception.
If you split them in to two calls to read_sql then it will in fact return you the value of your #null value in the second call. Due to this behaviour read_sql is clearly not a good way to do this. I strongly suggest you use one of my suggestions below.
Why are you wanting to set the variable in the SQL using '#' anyway?
You could try using the .format style of string formatting.
Like so:
query_str = "select ... coalesce(field1, {c}), case(.. then {c})...".format(c=my_const)
pd.read_sql(query_str)
Just remember that if you do it this way and your my_const is a user input then you will need to sanitize it manually to prevent SQL injection.
Another possibility is using a dict of params like so:
query_str = "select ... coalesce(field1, %(my_const)s, case(.. then %(my_const)s)..."
pd.read_sql(query_str, params={'my_const': const_value})
However this is dependent on which database driver you use.
From the pandas.read_sql docs:
Check your database driver documentation for which of the five syntax
styles, described in PEP 249’s paramstyle, is supported. Eg. for
psycopg2, uses %(name)s so use params={‘name’ : ‘value’}

Changing where clause without generating subquery in SQLAlchemy

I'm trying to build a relatively complex query and would like to manipulate the where clause of the result directly, without cloning/subquerying the returned query. An example would look like:
session = sessionmaker(bind=engine)()
def generate_complex_query():
return select(
columns=[location.c.id.label('id')],
from_obj=location,
whereclause=location.c.id>50
).alias('a')
query = generate_complex_query()
# based on this query, I'd like to add additional where conditions, ideally like:
# `query.where(query.c.id<100)`
# but without subquerying the original query
# this is what I found so far, which is quite verbose and it doesn't solve the subquery problem
query = select(
columns=[query.c.id],
from_obj=query,
whereclause=query.c.id<100
)
# Another option I was considering was to map the query to a class:
# class Location(object):pass
# mapper(Location, query)
# session.query(Location).filter(Location.id<100)
# which looks more elegant, but also creates a subquery
result = session.execute(query)
for r in result:
print r
This is the generated query:
SELECT a.id
FROM (SELECT location.id AS id
FROM location
WHERE location.id > %(id_1)s) AS a
WHERE a.id < %(id_2)s
I would like to obtain:
SELECT location.id AS id
FROM location
WHERE id > %(id_1)s and
id < %(id_2)s
Is there any way to achieve this? The reason for this is that I think query (2) is slightly faster (not much), and the mapper example (2nd example above) which I have in place messes up the labels (id becomes anon_1_id or a.id if I name the alias).
Why don't you do it like this:
query = generate_complex_query()
query = query.where(location.c.id < 100)
Essentially you can refine any query like this. Additionally, I suggest reading the SQL Expression Language Tutorial which is pretty awesome and introduces all the techniques you need. The way you build a select is only one way. Usually, I build my queries more like this: select(column).where(expression).where(next_expression) and so on. The FROM is usually automatically inferred by SQLAlchemy from the context, i.e. you rarely need to specify it.
Since you don't have access to the internals of generate_complex_query try this:
query = query.where(query.c.id < 100)
This should work in your case I presume.
Another idea:
query = query.where(text("id < 100"))
This uses SQLAlchemy's text expression. This could work for you, however, and this is important: If you want to introduce variables, read the description of the API linked above, because just using format strings intead of bound parameters will open you up to SQL injection, something that normally is a no-brainer with SQLAlchemy but must be taken care of if working with such literal expressions.
Also note that this works because you label the column as id. If you don't do that and don't know the column name, then this won't work either.

Python: Number of rows affected by cursor.execute("SELECT ...)

How can I access the number of rows affected by:
cursor.execute("SELECT COUNT(*) from result where server_state='2' AND name LIKE '"+digest+"_"+charset+"_%'")
Try using fetchone:
cursor.execute("SELECT COUNT(*) from result where server_state='2' AND name LIKE '"+digest+"_"+charset+"_%'")
result=cursor.fetchone()
result will hold a tuple with one element, the value of COUNT(*).
So to find the number of rows:
number_of_rows=result[0]
Or, if you'd rather do it in one fell swoop:
cursor.execute("SELECT COUNT(*) from result where server_state='2' AND name LIKE '"+digest+"_"+charset+"_%'")
(number_of_rows,)=cursor.fetchone()
PS. It's also good practice to use parametrized arguments whenever possible, because it can automatically quote arguments for you when needed, and protect against sql injection.
The correct syntax for parametrized arguments depends on your python/database adapter (e.g. mysqldb, psycopg2 or sqlite3). It would look something like
cursor.execute("SELECT COUNT(*) from result where server_state= %s AND name LIKE %s",[2,digest+"_"+charset+"_%"])
(number_of_rows,)=cursor.fetchone()
From PEP 249, which is usually implemented by Python database APIs:
Cursor Objects should respond to the following methods and attributes:
[…]
.rowcount
This read-only attribute specifies the number of rows that the last .execute*() produced (for DQL statements like 'select') or affected (for DML statements like 'update' or 'insert').
But be careful—it goes on to say:
The attribute is -1 in case no .execute*() has been performed on the cursor or the rowcount of the last operation is cannot be determined by the interface. [7]
Note:
Future versions of the DB API specification could redefine the latter case to have the object return None instead of -1.
So if you've executed your statement, and it works, and you're certain your code will always be run against the same version of the same DBMS, this is a reasonable solution.
The number of rows effected is returned from execute:
rows_affected=cursor.execute("SELECT ... ")
of course, as AndiDog already mentioned, you can get the row count by accessing the rowcount property of the cursor at any time to get the count for the last execute:
cursor.execute("SELECT ... ")
rows_affected=cursor.rowcount
From the inline documentation of python MySQLdb:
def execute(self, query, args=None):
"""Execute a query.
query -- string, query to execute on server
args -- optional sequence or mapping, parameters to use with query.
Note: If args is a sequence, then %s must be used as the
parameter placeholder in the query. If a mapping is used,
%(key)s must be used as the placeholder.
Returns long integer rows affected, if any
"""
In my opinion, the simplest way to get the amount of selected rows is the following:
The cursor object returns a list with the results when using the fetch commands (fetchall(), fetchone(), fetchmany()). To get the selected rows just print the length of this list. But it just makes sense for fetchall(). ;-)
print len(cursor.fetchall)
# python3
print(len(cur.fetchall()))
To get the number of selected rows I usually use the following:
cursor.execute(sql)
count = len(cursor.fetchall())
when using count(*) the result is {'count(*)': 9}
-- where 9 represents the number of rows in the table, for the instance.
So, in order to fetch the just the number, this worked in my case, using mysql 8.
cursor.fetchone()['count(*)']

Categories