SQLAlchemy match with or - python

I'm getting myself tied up in knots with some SQLAlchemy I'm trying to work out. I've got an old web app I'm trying to tart up, and have decided to rewrite it from scratch. As part of that, I'm playing with SQLAlchemy and trying to improve my Python skills. I've got a search I'm trying to run that checks whether the customer query appears in either the account name or the customer name field, and it should match against either of them. However, SQLAlchemy renders it as an AND.
If I add extra or_ blocks, it fails to recognise them and process appropriately.
I've moved it so it's the first query, but the query planner in sqlalchemy leaves it exactly the same.
Any ideas?
def CustomerCountryMatch(query, page):
    customer=models.Customer
    country=models.CustomerCodes
    query=customer.query.order_by(customer.account_name).\
        group_by(customer.account_name).having(func.max(customer.renewal_date)).\
        join(country, customer.country_code==country.CODE).\
        add_columns(customer.account_name,
                    customer.customer_name,
                    customer.account_id,
                    customer.CustomerNote,
                    country.COUNTRY,
                    country.SupportRegion,
                    customer.renewal_date,
                    customer.contract_type,
                    customer.CCGroup).\
        filter(customer.account_name.match(query)).filter(or_(customer.customer_name.match(query))).\
        paginate(page, 50, False)
The query as executed is below:
sqlalchemy.engine.base.Engine SELECT customer.customer_id AS customer_customer_id,
customer.customer_code AS customer_customer_code,
customer.address_code AS customer_address_code,
customer.customer_name AS customer_customer_name,
customer.account_id AS customer_account_id,
customer.account_name AS customer_account_name,
customer.`CustomerNote` AS `customer_CustomerNote`,
customer.renewal_date AS customer_renewal_date,
customer.contract_type AS customer_contract_type,
customer.country_code AS customer_country_code,
customer.`CCGroup` AS `customer_CCGroup`,
customer.`AgentStatus` AS `customer_AgentStatus`,
customer.comments AS customer_comments,
customer.`SCR` AS `customer_SCR`,
customer.`isDummy` AS `customer_isDummy`,
customer_codes.`COUNTRY` AS `customer_codes_COUNTRY`,
customer_codes.`SupportRegion` AS `customer_codes_SupportRegion`
FROM customer INNER JOIN
customer_codes ON customer.country_code=customer_codes.`CODE` WHERE
MATCH (customer.account_name) AGAINST (%s IN BOOLEAN MODE) AND
MATCH (customer.customer_name) AGAINST (%s IN BOOLEAN MODE) GROUP BY
customer.account_name HAVING max(customer.renewal_date) ORDER BY
customer.account_name LIMIT %s, %s
2015-11-06 03:32:52,035 INFO sqlalchemy.engine.base.Engine ('bob', 'bob', 0, 50)

The filter clause should be:
filter(
    or_(
        customer.account_name.match(query),
        customer.customer_name.match(query)
    )
)
Calling filter twice, as in filter(clause1).filter(clause2), joins the criteria using AND (see the docs).
The construct filter(clause1).filter(or_(clause2)) does not do what you intend: an or_ with a single clause is just that clause, so it is translated into SQL as clause1 AND clause2.
The following example makes sense: filter(clause1).filter(or_(clause2, clause3)), which is translated into SQL as clause1 AND (clause2 OR clause3).
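The difference can be checked without a database by compiling both forms and reading the SQL. This is only a sketch: it assumes SQLAlchemy 1.4 or newer, and the `customer` table below is a hypothetical three-column stand-in for the question's model, not the real schema.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, or_, select
from sqlalchemy.dialects import mysql

metadata = MetaData()
# Hypothetical stand-in for the question's Customer model.
customer = Table(
    "customer", metadata,
    Column("customer_id", Integer, primary_key=True),
    Column("account_name", String(100)),
    Column("customer_name", String(100)),
)

term = "bob"
by_account = customer.c.account_name.match(term)
by_customer = customer.c.customer_name.match(term)

# Two chained where()/filter() calls: the criteria are joined with AND.
chained = select(customer.c.customer_id).where(by_account).where(by_customer)
# One or_() holding both criteria: the criteria are joined with OR.
either = select(customer.c.customer_id).where(or_(by_account, by_customer))


def render(stmt):
    """Compile against the MySQL dialect; no connection or driver is needed."""
    return str(stmt.compile(dialect=mysql.dialect()))


print(render(chained))
print(render(either))
```

The same reasoning applies to Query.filter() in the ORM, since it builds the WHERE clause in the same way.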

A simpler approach is to put the OR inside the match expression itself, using the '|' operator, if you want to find all rows that contain one or more of the words you are searching for, e.g.:
query = query.filter(Table.text_searchable_column.match('findme | orme'))

Related

How to use sqlparse to parse sql statements

I am trying to parse all the queries executed by users (within a period of time) in PostgreSQL DB (by querying the pg_stat_statements table) and trying to create a report of which tables are used by users to run either a Select or an Insert or a Delete query. Basically running something like Select query, queryid, userid from pg_stat_state and then parsing each query to check if it was a Select or an Insert or a Delete query and also extract the table_Name from the query.
I am using sqlparse python module but very new to it so need help.
I am able to get the table name by using something like:
import sqlparse
from sqlparse.sql import Where, Comparison, Parenthesis, Identifier
for tokens in sqlparse.parse(sql_statement)[0]:
    if isinstance(tokens, Identifier):
        print(str(tokens))
but not sure how to get the type of statement (Select/Insert/Delete) together with the name of the table. Also, need to incorporate COPY statements as Selects too.
I tried using psqlparse but I did not see much info/help online regarding this module.
Please suggest.
Thanks.
This is not trivial, and I don't think sqlparse really helps very much. INSERT and DELETE are pretty easy, because they usually start out "INSERT INTO table" and "DELETE FROM table", but "SELECT" is the wild wild west. Clearly the tables will be mentioned in a FROM clause, but it could be "FROM table1 t1, table2 t2, table3 t3 WHERE" or "FROM table1 t1 LEFT JOIN table2 t2 INNER JOIN table3 t3 WHERE".
You might have nested queries, and a SELECT doesn't even have to have a table. Plus, there could be UNIONs that mention further tables. And, of course, "SELECT INTO" is just another way of doing "INSERT". I believe you should start out just doing text processing, looking for the key words. You might get far enough.
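A minimal sketch of that keyword-based text processing, using only the standard library. The function name `classify` and the keyword list are assumptions for illustration. It treats COPY as a SELECT, as the question asks, and it is deliberately naive: comma-separated FROM lists, quoted identifiers, and constructs such as EXTRACT(year FROM col) or COPY ... FROM stdin will confuse it.

```python
import re

# Keywords that are usually followed directly by a table name.
TABLE_KEYWORDS = re.compile(
    r"\b(?:FROM|JOIN|INTO|COPY)\s+([A-Za-z_][\w.]*)",
    re.IGNORECASE,
)


def classify(sql):
    """Return (statement_type, table_names) using plain keyword matching."""
    words = sql.strip().split(None, 1)
    kind = words[0].upper() if words else "UNKNOWN"
    if kind == "COPY":
        kind = "SELECT"  # the question wants COPY reported as a SELECT
    tables = sorted({m.group(1).lower() for m in TABLE_KEYWORDS.finditer(sql)})
    return kind, tables


for statement in (
    "SELECT a FROM t1 JOIN t2 ON t1.id = t2.id",
    "insert into logs (msg) values ('x')",
    "DELETE FROM users WHERE id = 1",
    "COPY orders TO STDOUT",
):
    print(classify(statement))
```

Only the first table of a "FROM t1, t2" list is picked up, so treat the result as a starting point for the report rather than a complete answer.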

Using top level item in nested subquery with SQLAlchemy

I have a query when I'm attempting to find a link between two tables, but I require few checks with association tables within the same query. I think my problem stems from having to check across multiple levels of relationships, where I want to filter a subquery based on the top level item, but I've hit an issue and have no idea how to proceed.
More specifically I want to query Script using the name of an Application, but narrow the results down to when the Application's Language matches the Script's Language.
Tables: Script (id, language_id), Application (id, name), Language (id)
Association Tables: ApplicationLanguage (app_id, language_id), ScriptApplication (script_id, app_id)
Current attempt: (it's important this stays as a single query)
value = 'appname'
# Search applications for a value
app_search = select([Application.id]).where(Application.name==value).as_scalar()
# Search for applications matching the language of the script
lang_search = select([ApplicationLanguage.app_id]).where(
    ApplicationLanguage.language_id==Script.language_id
).as_scalar()
# Find the script based on which applications appear in both subqueries.
script_search = select([ScriptApplication.script_id]).where(and_(
    ScriptApplication.app_id.in_(app_search),
    ScriptApplication.app_id.in_(lang_search),
)).as_scalar()
# Turn it into an SQL expression
query = Script.id.in_(script_search)
Resulting SQL code:
SELECT script.id AS script_id
FROM script
WHERE script.id IN (SELECT script_application.script_id
FROM script_application
WHERE script_application.application_id IN (SELECT application.id
FROM application
WHERE application.name = ?) AND script_application.application_id IN (SELECT application_language.application_id
FROM application_language, script
WHERE script.language_id = application_language.language_id))
My theory
I believe the issue is on the line ApplicationLanguage.language_id==Script.language_id, because if I change it to ApplicationLanguage.language_id==3 (3 being the value I'm expecting), then it works perfectly. In the SQL code, I assume it's the FROM application_language, script which is overriding the top-level script.
How would I go about either rearranging or fixing this query? My current method seems to work fine if it's across a single relationship, just doesn't work if I try and do anything more complex.
I'd still love to know how I'd go about fixing the original query as I believe it'll come in useful in the future, but I managed to rearrange it.
I reversed the lang_search to grab languages for each application from app_search, and used that as part of the final query, instead of attempting to combine it in a subquery.
value = 'appname'
app_search = select([Application.id]).where(Application.name==value).as_scalar()
lang_search = select([ApplicationLanguage.language_id]).where(
    ApplicationLanguage.app_id.in_(app_search)
).as_scalar()
script_search = select([ScriptApplication.script_id]).where(and_(
    ScriptApplication.app_id.in_(app_search),
)).as_scalar()
query = and_(
    Script.id.in_(script_search),
    Script.language_id.in_(lang_search),
)
Final SQL query:
SELECT script.id AS script_id
FROM script
WHERE script.id IN (SELECT script_application.script_id
FROM script_application
WHERE script_application.application_id IN (SELECT application.id
FROM application
WHERE lower(application.name) = ?)) AND script.language_id IN (SELECT application_language.language_id
FROM application_language
WHERE application_language.application_id IN (SELECT application.id
FROM application
WHERE lower(application.name) = ?))
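On fixing the original query: one possibility is to correlate the innermost subquery to script explicitly. As far as I can tell, SQLAlchemy's automatic correlation only looks at the immediately enclosing SELECT (here the one over script_application), whereas an explicit correlate() is matched against any enclosing SELECT. The sketch below assumes SQLAlchemy 1.4 or newer and uses hypothetical Core tables named after the question's models; it only compiles the statement and does not run it.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, and_, select

metadata = MetaData()
# Hypothetical Core tables mirroring the question's models.
script = Table("script", metadata,
               Column("id", Integer, primary_key=True),
               Column("language_id", Integer))
application = Table("application", metadata,
                    Column("id", Integer, primary_key=True),
                    Column("name", String(50)))
application_language = Table("application_language", metadata,
                             Column("app_id", Integer),
                             Column("language_id", Integer))
script_application = Table("script_application", metadata,
                           Column("script_id", Integer),
                           Column("app_id", Integer))

app_search = select(application.c.id).where(application.c.name == "appname")

# Explicitly tell SQLAlchemy that "script" comes from an enclosing SELECT,
# so it is not added to this subquery's own FROM list.
lang_search = (
    select(application_language.c.app_id)
    .where(application_language.c.language_id == script.c.language_id)
    .correlate(script)
)

script_search = select(script_application.c.script_id).where(and_(
    script_application.c.app_id.in_(app_search),
    script_application.c.app_id.in_(lang_search),
))

stmt = select(script.c.id).where(script.c.id.in_(script_search))
sql = str(stmt)
print(sql)
```

With the ORM models from the question, the equivalent call would presumably be .correlate(Script) on lang_search.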

How to get Peewee ORM contains column working with join

I'm doing a join across two tables, pretty simple set up, but when I add a contains or startswith that references a column in the table being joined I can never get the results. No errors, but the count is always 0, despite me knowing that the records exist and being able to write the equivalent query in raw SQL and have it return all the results I expect.
Here's what it looks like, assume A and B are tables, they're related through a foreign key, and both the fields I'm using in the where clause are CharField.
This version does not work despite me expecting it to:
(A.select().join(B).where(
    A.some_column.contains(B.other_column)
))
But this does work as expected:
(A.select().join(B).where(
    SQL("t1.some_column ILIKE '%%' || t2.other_column || '%%'")
))
I would expect those two to be equivalent, but they're not. Looking at the output SQL from the first one it looks like this:
(SELECT "t1"."some_column" from "A" as "t1"
INNER JOIN "B" as "t2" ON ("t1"."b_id" = "t2"."id")
WHERE ("t1"."some_column" ILIKE %s)', ['%<CharField: B.other_column>%'])
The interesting thing to me about the SQL output is at the end where it's referencing B.other_column. I'm guessing that if it were t2.other_column instead then the query would work, but how do I make peewee do that? I've tried everything I can think of and I can't figure out a pure ORM way to get this working.
The contains method performs interpolation of the parameter.
To achieve what you're trying to do, you would stay away from the "contains" method and use the ILIKE operation.
A.select().join(B).where(
    A.some_column % ('%' + B.other + '%'))
The "%" between the column and the parenthesised value is peewee's operator overload for LIKE (its case-insensitive counterpart, ILIKE, is "**"). The '%' + B.other + '%' part concatenates the wildcards for a substring search.
UPDATE: I felt like this was a legit issue, so I've made a small change to make the .contains(), .startswith() and .endswith() methods work properly when the right-hand-side value is, for example, a field. Going forward it should work more intuitively.
Commit here: https://github.com/coleifer/peewee/commit/0c98f3e1f556eba10cbbdf7c386c49c64f4da41c
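The difference between the failing and the working query can be reproduced without peewee. The sketch below uses the standard-library sqlite3 module with two hypothetical tables, a and b, standing in for the question's models. The first query passes the pattern as a bound literal (which is what the logged parameter '%<CharField: B.other_column>%' amounts to), and the second concatenates the wildcards around the joined column inside the SQL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE b (id INTEGER PRIMARY KEY, other_column TEXT);
CREATE TABLE a (id INTEGER PRIMARY KEY, b_id INTEGER REFERENCES b(id), some_column TEXT);
INSERT INTO b VALUES (1, 'needle'), (2, 'absent');
INSERT INTO a VALUES (1, 1, 'hay needle stack'), (2, 2, 'nothing here');
""")

join = "SELECT a.id FROM a JOIN b ON a.b_id = b.id WHERE "

# The failing form: the field's repr was sent as the literal search pattern.
as_literal = con.execute(
    join + "a.some_column LIKE ?", ("%<CharField: B.other_column>%",)
).fetchall()

# The intended form: wildcards are concatenated around the joined column in SQL.
as_column = con.execute(
    join + "a.some_column LIKE '%' || b.other_column || '%'"
).fetchall()

print(as_literal)  # []
print(as_column)   # [(1,)]
```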

How can I search a record in MySQL using Python

def search(title="",author="",year="",isbn=""):
    con = mysql.connector.connect(host="localhost", user="root", passwd="junai2104", database="book")
    cur = con.cursor()
    sql_statement = "SELECT * FROM book WHERE title={} or author={} or year={} or isbn={} ".format(title,author,year,isbn)
    cur.execute(sql_statement)
    rows=cur.fetchall()
    con.close()
    return rows
print(search(title='test2'))
How can I search for a value in MySQL using a Python argument?
How do I get the values from the arguments into the query?
You have a couple of issues with your code:
1. In your SQL SELECT statement you are looking for values in text columns (TEXT, VARCHAR etc.). To do so you must put single quotes around your search criteria, since you want to indicate a text literal. So WHERE title={} should be WHERE title='{}' (the same goes for the other parameters).
2. When one or more of your arguments are empty, you will search for rows where the respective value is an empty text. So in your example search(title='test2') will trigger a search for an entry where the title column has the value 'test2' or any of the other three columns (author, year and isbn) has an empty text. If you intend to look for a title 'test2', this will only work if none of the other columns ever contains an empty text. And even then, because of the three OR operators in your query, performance will be poor. What you should do instead is evaluate each parameter individually and construct the query only with the parameters that are not empty.
3. By constructing your query through string formatting, you create a massive security issue if the values of your search parameters come from user input. Your code is wide open to SQL injection, which is one of the simplest and most effective attacks on your system. You should always parametrize your queries to prevent this attack. As a general principle, never create SQL queries by formatting or concatenating strings with their parameters. Note that with parametrized queries you do not need to add the single quotes described in point 1.
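Points 2 and 3 can be combined in a short sketch. To keep it runnable without a MySQL server, it uses the standard-library sqlite3 module and passes the connection in as an argument, both of which are assumptions made for illustration. With mysql.connector the structure stays the same, but the placeholder is %s instead of ?.

```python
import sqlite3


def search(con, title="", author="", year="", isbn=""):
    """Search the book table using only the arguments that were supplied."""
    criteria = {"title": title, "author": author, "year": year, "isbn": isbn}
    # Column names come from this fixed dict, never from user input.
    used = {col: val for col, val in criteria.items() if val != ""}
    sql = "SELECT * FROM book"
    if used:
        sql += " WHERE " + " OR ".join(f"{col} = ?" for col in used)
    cur = con.cursor()
    # Values are sent separately from the SQL text, so they cannot alter it.
    cur.execute(sql, tuple(used.values()))
    return cur.fetchall()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE book (title TEXT, author TEXT, year INTEGER, isbn TEXT)")
con.executemany(
    "INSERT INTO book VALUES (?, ?, ?, ?)",
    [("test1", "A", 2001, "111"), ("test2", "B", 2002, "222")],
)
print(search(con, title="test2"))
```

The conditions are joined with OR to mirror the original query; switch the join string to " AND " if every supplied field should have to match.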

Changing where clause without generating subquery in SQLAlchemy

I'm trying to build a relatively complex query and would like to manipulate the where clause of the result directly, without cloning/subquerying the returned query. An example would look like:
session = sessionmaker(bind=engine)()
def generate_complex_query():
    return select(
        columns=[location.c.id.label('id')],
        from_obj=location,
        whereclause=location.c.id>50
    ).alias('a')
query = generate_complex_query()
# based on this query, I'd like to add additional where conditions, ideally like:
# `query.where(query.c.id<100)`
# but without subquerying the original query
# this is what I found so far, which is quite verbose and it doesn't solve the subquery problem
query = select(
    columns=[query.c.id],
    from_obj=query,
    whereclause=query.c.id<100
)
# Another option I was considering was to map the query to a class:
# class Location(object):pass
# mapper(Location, query)
# session.query(Location).filter(Location.id<100)
# which looks more elegant, but also creates a subquery
result = session.execute(query)
for r in result:
    print(r)
This is the generated query:
SELECT a.id
FROM (SELECT location.id AS id
FROM location
WHERE location.id > %(id_1)s) AS a
WHERE a.id < %(id_2)s
I would like to obtain:
SELECT location.id AS id
FROM location
WHERE id > %(id_1)s and
id < %(id_2)s
Is there any way to achieve this? The reason for this is that I think query (2) is slightly faster (not much), and the mapper example (2nd example above) which I have in place messes up the labels (id becomes anon_1_id or a.id if I name the alias).
Why don't you do it like this:
query = generate_complex_query()
query = query.where(location.c.id < 100)
Essentially you can refine any query like this. Additionally, I suggest reading the SQL Expression Language Tutorial which is pretty awesome and introduces all the techniques you need. The way you build a select is only one way. Usually, I build my queries more like this: select(column).where(expression).where(next_expression) and so on. The FROM is usually automatically inferred by SQLAlchemy from the context, i.e. you rarely need to specify it.
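A small sketch of that chained style, assuming SQLAlchemy 1.4 or newer and a hypothetical one-column location table. Note that it also assumes generate_complex_query returns the select itself rather than an alias of it, since an alias cannot be refined with where().

```python
from sqlalchemy import Column, Integer, MetaData, Table, select

metadata = MetaData()
# Hypothetical minimal version of the question's location table.
location = Table("location", metadata, Column("id", Integer, primary_key=True))


def generate_complex_query():
    # Return the select itself (no .alias()), so callers can keep refining it.
    return select(location.c.id.label("id")).where(location.c.id > 50)


# Each where() call adds another criterion to the same SELECT, joined with AND.
query = generate_complex_query().where(location.c.id < 100)
sql = str(query)
print(sql)
```

The printed statement contains a single SELECT with both conditions in one WHERE clause, which is the shape the question asks for.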
Since you don't have access to the internals of generate_complex_query try this:
query = query.where(query.c.id < 100)
This should work in your case I presume.
Another idea:
query = query.where(text("id < 100"))
This uses SQLAlchemy's text expression. This could work for you, however, and this is important: if you want to introduce variables, read the description of the API linked above, because just using format strings instead of bound parameters will open you up to SQL injection, something that normally is a no-brainer with SQLAlchemy but must be taken care of when working with such literal expressions.
Also note that this works because you label the column as id. If you don't do that and don't know the column name, then this won't work either.
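For the variable case, a hedged sketch of a text() fragment with a bound parameter, again assuming SQLAlchemy 1.4 or newer and a hypothetical location table. The parameter name max_id is made up for the example.

```python
from sqlalchemy import Column, Integer, MetaData, Table, select, text

metadata = MetaData()
# Hypothetical minimal version of the question's location table.
location = Table("location", metadata, Column("id", Integer, primary_key=True))

base = select(location.c.id.label("id")).where(location.c.id > 50)

# The upper bound travels as a bound parameter, not as part of the SQL string.
upper_bound = text("id < :max_id").bindparams(max_id=100)
query = base.where(upper_bound)

compiled = query.compile()
print(str(compiled))
print(compiled.params)
```

The value 100 shows up only in the parameter dictionary, never in the SQL text itself, which is what protects against injection.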
