Formatting a SQL query - python

I have as input a string containing a SQL query. I need to get all the tables the query uses (as in FROM table or table1 INNER JOIN table2). The problem is that the query does not follow any standard formatting. So my question is whether there is a method to format the query so that searching for these table names is easier.
My current method is to search for the keywords FROM and JOIN and take whatever line comes after the keyword (or before it, in the case of JOIN). But there are exceptions: sometimes FROM is not followed by a newline, and I have to handle every exception like this separately. I don't think a regex works either, because table names have the form schema_name.table_name, and some column references are written the same way.
# text is the query split into lines, e.g. text = query.split('\n')
rows = []
from_indexes = []
for index, row in enumerate(text):
    to_append = None
    split_row = row.strip('\r').strip(' ').strip('\r').split(' ')
    if split_row[-1].lower() == "from" and len(split_row) > 1:
        from_indexes.append(index)
    if (("join" in split_row or "JOIN" in split_row)
            and split_row[-1] != "join" and split_row[-1] != "JOIN"):
        # Split the line right after the JOIN keyword
        for ind in range(len(split_row)):
            if split_row[ind].lower() == "join":
                to_append = split_row[ind + 1:]
                row = split_row[:ind + 1]
                row = ' '.join(row)
    rows.append(row.strip('\r').strip(' ').strip('\t'))
    if to_append is not None:
        rows.append(' '.join(to_append))
So I am looking for a method that can normalize the SQL query, or for another way to extract the table names from it.
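Since the exceptions come from working line by line, one option is to ignore line boundaries entirely: str.split() with no argument splits on any run of spaces, tabs, \r or \n, so FROM followed by a newline and FROM followed by a space look the same. The following is a minimal sketch (the function name tables_after_keywords is hypothetical). It takes the token right after every FROM/JOIN and skips subqueries that start with "(", but it does not handle comma-separated FROM lists, quoted identifiers, or constructs such as EXTRACT(YEAR FROM col):

```python
def tables_after_keywords(sql: str) -> list[str]:
    """Return the token that follows each FROM/JOIN keyword, ignoring layout."""
    # split() with no argument treats any run of whitespace (spaces, tabs,
    # \r, \n) as a single separator, so line breaks no longer matter.
    tokens = sql.split()
    tables = []
    # Walk over (keyword, next token) pairs.
    for keyword, following in zip(tokens, tokens[1:]):
        # Skip derived tables such as "(select ...".
        if keyword.lower() in ("from", "join") and not following.startswith("("):
            # Drop a trailing ";" or "," glued to the name.
            tables.append(following.rstrip(";,"))
    return tables


query = "SELECT a.x FROM\r\nschema1.t1 a INNER JOIN schema2.t2 b\r\n  ON a.id = b.id;"
print(tables_after_keywords(query))  # ['schema1.t1', 'schema2.t2']
```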

I think a more straightforward approach would be to use regular expressions:
import re
sql = """select t1.*, t2.y, sq.z, table3.q from table1 t1 join
table2 t2 on t1.x = t2.x left join
(select 5 as x, 9 as z) sq JOIN
table3 on sq.x = table3.x
;"""
matches = re.findall(r'(\s+(from|join)\s+)(\w+)', sql, re.DOTALL|re.IGNORECASE)
for match in matches:
    print(match[2])
Note that it will not consider (select 5 as x, 9 as z) as a table.
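The pattern above uses \w+, which stops at the dot in schema_name.table_name. A possible variation, sketched below with a hypothetical helper name, allows dots in the captured identifier and removes duplicates. Because only the token directly after FROM/JOIN is captured, dotted column references in the SELECT list are not picked up. It still inherits the regex limitations (for example, it would also match the FROM inside EXTRACT(YEAR FROM col)):

```python
import re

# FROM or JOIN as whole words, then whitespace (including newlines), then an
# identifier that may contain dots, e.g. schema_name.table_name.
TABLE_RE = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def find_tables(sql: str) -> list[str]:
    """Return distinct table names after FROM/JOIN, in order of appearance."""
    found = []
    for name in TABLE_RE.findall(sql):
        if name not in found:
            found.append(name)
    return found


sql = """select o.id, c.name from sales.orders o
join sales.customers c on o.customer_id = c.id
left join (select 1 as x) sq on 1 = 1
join sales.orders o2 on o2.id = o.id;"""
print(find_tables(sql))  # ['sales.orders', 'sales.customers']
```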

You should use an ORM tool to write cleaner queries (see https://en.wikipedia.org/wiki/Object-relational_mapping), or at least a query-builder module.
I recently found a remake of Laravel's "Eloquent" ORM here: https://pypi.org/project/eloquent/.
Other ORMs, like peewee, are also commonly used.

Related

Format SQL query to Python

So far I have copied simple queries from SQL into Python using the following format:
sql = ("SELECT column1, column2, column3, column4 "
       "FROM table1 "
       "LEFT OUTER JOIN table2 ON x = y "
       "LEFT OUTER JOIN table3 ON table3.z = table1.y ")
However, I have now started copying larger and more complicated SQL queries into Python, and I find it quite difficult to use the same format, because the columns start to contain subqueries. I have seen some Python packages that format SQL code for Python, and I was wondering which one you would suggest, or what the best and quickest way to handle this is.
You can use Python multiline strings, which start and end with three quotes (""" or '''):
"""This is
a multi
line
string"""
and then you don't have to worry about formatting. This is what I generally use for such purposes, but ideally you should go with an ORM.
For reference, please check
https://www.w3schools.com/python/python_strings.asp
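A triple-quoted string can be handed to the database driver as-is, because SQL treats newlines and indentation as ordinary whitespace. Below is a small sketch using the standard-library sqlite3 module and a made-up sales table. The filter value is passed as a parameter (?) rather than formatted into the string:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (country TEXT, product TEXT, profit REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("FR", "tea", 10.0), ("FR", "tea", 5.0), ("DE", "coffee", 7.0)],
)

# Line breaks and indentation inside the string do not change the query.
sql = """
    SELECT country,
           product,
           SUM(profit) AS total
    FROM sales
    WHERE profit > ?
    GROUP BY country, product
    ORDER BY country
"""
rows = conn.execute(sql, (1,)).fetchall()
print(rows)  # [('DE', 'coffee', 7.0), ('FR', 'tea', 15.0)]
```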
For readability, you can run the query through a SQL formatter.
For example:
sql = """ SELECT country, product, SUM(profit) FROM sales left join
x on x.id=sales.k GROUP BY country, product having f > 7 and fk=9
limit 5; """
will result in:
sql = """
SELECT
country,
product,
SUM(profit)
FROM
sales
LEFT JOIN x ON
x.id = sales.k
GROUP BY
country,
product
HAVING
f > 7
AND fk = 9
LIMIT 5; """

How to filter if this id does not exist in another table?

I'm trying to filter rows whose ID in column A does not exist in column B, using this code:
query = db.session.query().select_from(Spare_Parts, Vendors, Replacement)\
.filter(Vendors.vendor_code == Spare_Parts.vendor_code,\
~ exists().where(Spare_Parts.spare_part_code == Replacement.spare_part_code))
I want to query the rows of Spare_Parts whose ID does not exist in Replacement as a foreign key, but I got this error:
Select statement 'SELECT *
FROM spare_parts, replacement
WHERE spare_parts.spare_part_code = replacement.spare_part_code' returned no FROM clauses due to auto-correlation; specify correlate(<tables>) to control correlation manually.
So what is the problem, and how do I fix it?
Try using a subquery like this instead, to select the spare_part_code values from spare_parts that are not in the replacement table:
SELECT *
FROM spare_parts
WHERE spare_parts.spare_part_code NOT IN
    (SELECT DISTINCT
         replacement.spare_part_code
     FROM replacement)
Or you can use NOT EXISTS:
SELECT *
FROM spare_parts
WHERE NOT EXISTS
    (SELECT 1
     FROM replacement
     WHERE spare_parts.spare_part_code = replacement.spare_part_code)
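The two queries return the same rows as long as replacement.spare_part_code contains no NULLs. If it does, NOT IN returns no rows at all, because comparing with NULL yields "unknown", while NOT EXISTS is unaffected. The sketch below shows this with sqlite3 and minimal hypothetical tables that have only the code column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE spare_parts (spare_part_code TEXT PRIMARY KEY);
    CREATE TABLE replacement (spare_part_code TEXT);
    INSERT INTO spare_parts VALUES ('A'), ('B'), ('C');
    INSERT INTO replacement VALUES ('A'), ('A');
""")

not_in = """
    SELECT spare_part_code FROM spare_parts
    WHERE spare_part_code NOT IN (SELECT spare_part_code FROM replacement)
    ORDER BY spare_part_code
"""
not_exists = """
    SELECT spare_part_code FROM spare_parts AS sp
    WHERE NOT EXISTS (
        SELECT 1 FROM replacement AS r
        WHERE r.spare_part_code = sp.spare_part_code
    )
    ORDER BY spare_part_code
"""


def codes(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]


print(codes(not_in), codes(not_exists))  # ['B', 'C'] ['B', 'C']

# A single NULL in the subquery makes NOT IN filter out every row.
conn.execute("INSERT INTO replacement VALUES (NULL)")
print(codes(not_in), codes(not_exists))  # [] ['B', 'C']
```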

MYSQL: how to insert statement without specifying col names or question marks?

I have a list of tuples that I'm inserting into a table.
Each tuple has 50 values. How do I insert them without having to specify the column names and the number of ? placeholders?
col1 is an auto-increment column, so my INSERT statement starts at col2 and ends at col51.
Current code:
l = [(1,2,3,.....),(2,4,6,.....),(4,6,7,.....)...]
for tup in l:
    cur.execute(
        """insert into TABLENAME(col2,col3,col4.........col50,col51) VALUES(?,?,?,.............)
        """, tup)
What I want:
insert into TABLENAME(col*) VALUES(*)
MySQL's syntax for INSERT is documented here: http://dev.mysql.com/doc/refman/5.7/en/insert.html
There is no wildcard syntax like you show. The closest thing is to omit the column names:
INSERT INTO MyTable VALUES (...);
But I don't recommend doing that. It works only if you are certain you're going to specify a value for every column in the table (even the auto-increment column), and your values are guaranteed to be in the same order as the columns of the table.
Instead, you should learn to write code that builds the SQL query from arrays of values in your application. Here's a Python example of the way I do it. Suppose you have a dict of column: value pairs called data_values.
placeholders = ['%s'] * len(data_values)
sql_template = """
    INSERT INTO MyTable ({columns}) VALUES ({placeholders})
"""
sql = sql_template.format(
    columns=','.join(data_values.keys()),
    placeholders=','.join(placeholders)
)
cur = db.cursor()
# Pass the values in the same order as the column names above
cur.execute(sql, list(data_values.values()))
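The same idea also covers the list-of-tuples case from the question: generate the column list and the placeholders once, then let executemany() run the statement for every tuple. The sketch below uses sqlite3, which expects ? placeholders (MySQL drivers such as MySQLdb or PyMySQL use %s instead). The table name and the shortened column range col2..col6 are hypothetical; for the question, n_values would be 50. Column names cannot be bound as parameters, so they must come from your own code, never from user input:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: an auto-increment col1 followed by col2..col6.
n_values = 5
value_cols = [f"col{i}" for i in range(2, 2 + n_values)]
conn.execute(
    "CREATE TABLE tablename (col1 INTEGER PRIMARY KEY AUTOINCREMENT, "
    + ", ".join(f"{c} INTEGER" for c in value_cols)
    + ")"
)

# Build "INSERT INTO tablename (col2, ..., col6) VALUES (?, ..., ?)" once.
columns = ", ".join(value_cols)
placeholders = ", ".join(["?"] * len(value_cols))
sql = f"INSERT INTO tablename ({columns}) VALUES ({placeholders})"

rows = [(1, 2, 3, 4, 5), (2, 4, 6, 8, 10), (4, 6, 7, 8, 9)]
# executemany() runs the same statement once per tuple.
conn.executemany(sql, rows)

# The dict variant: iterating a dict yields its keys, in the same order
# as .values().
data_values = {"col2": 7, "col3": 8}
dict_sql = "INSERT INTO tablename ({}) VALUES ({})".format(
    ", ".join(data_values), ", ".join(["?"] * len(data_values))
)
conn.execute(dict_sql, list(data_values.values()))

print(conn.execute("SELECT COUNT(*), MAX(col1) FROM tablename").fetchone())  # (4, 4)
```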
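Example code to put before your code:
cols = "(" + ",".join("col" + str(x) for x in range(2, 52)) + ")"
# One placeholder per value; use ? instead of %s if your driver expects that style
placeholders = "(" + ",".join(["%s"] * 50) + ")"
Inside your loop:
for tup in l:
    # Let the driver fill in the values instead of formatting them into the string
    cur.execute("insert into TABLENAME " + cols + " VALUES " + placeholders, tup)
This is off the top of my head, with no error checking.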

SQLAlchemy group_by where select differs from aggregate target

I'm struggling to write an aggregating GROUP BY query with SQLAlchemy that aggregates over a table "lower down" but returns a joined entity "higher up", which happens to be the grouping key, instead of returning the aggregated entity, e.g.:
qry = session.query(PSU, func.count(PSU.id)).join(PSU).join(StockUnit).join(Part).group_by(Part)
but I want to return (Part, the_count), not (PSU, the_count). Writing session.query(Part, func.count(...)) builds the joins the wrong way round.
Here is the SQL I want to express with SQLAlchemy:
select
    psu.package_id,
    p.*,           -- the joined entity
    count(psu.*)   -- the aggregate
from packaged_stock_unit psu
inner join stock_unit su
    on su.id = psu.stock_unit_id
inner join part p
    on p.id = su.part_id
where
    psu.some_value = 1
    and psu.package_id = 1
group by psu.package_id, p.sku;
Perhaps this is possible with the SQLAlchemy base functions?
Use select_from() to control the "left" side of the join in case you need it:
qry = session.query(Part, func.count(PSU.id)).\
    select_from(PSU).\
    join(StockUnit).\
    join(Part).\
    group_by(Part)
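To see roughly what SQL shape this query is aiming for, here is a simplified version of the question's SQL (the package_id and some_value filters are left out), run with the standard-library sqlite3 module against tiny hypothetical tables. The FROM clause names the aggregated table, the joins walk up to part, and part is both the grouping key and the returned entity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE part (id INTEGER PRIMARY KEY, sku TEXT);
    CREATE TABLE stock_unit (id INTEGER PRIMARY KEY, part_id INTEGER REFERENCES part(id));
    CREATE TABLE packaged_stock_unit (
        id INTEGER PRIMARY KEY,
        stock_unit_id INTEGER REFERENCES stock_unit(id)
    );
    INSERT INTO part VALUES (1, 'BOLT'), (2, 'NUT');
    INSERT INTO stock_unit VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO packaged_stock_unit VALUES (100, 10), (101, 10), (102, 11), (103, 12);
""")

# Start FROM the "lower" table, join upwards, group by the "higher" entity.
sql = """
    SELECT p.id, p.sku, COUNT(psu.id) AS the_count
    FROM packaged_stock_unit AS psu
    JOIN stock_unit AS su ON su.id = psu.stock_unit_id
    JOIN part AS p ON p.id = su.part_id
    GROUP BY p.id, p.sku
    ORDER BY p.id
"""
print(conn.execute(sql).fetchall())  # [(1, 'BOLT', 3), (2, 'NUT', 1)]
```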

How to count rows with SELECT COUNT(*) with SQLAlchemy?

I'd like to know if it's possible to generate a SELECT COUNT(*) FROM TABLE statement in SQLAlchemy without explicitly asking for it with execute().
If I use:
session.query(table).count()
then it generates something like:
SELECT count(*) AS count_1 FROM
(SELECT table.col1 as col1, table.col2 as col2, ... from table)
which is significantly slower in MySQL with InnoDB. I am looking for a solution that doesn't require the table to have a known primary key, as suggested in Get the number of rows in table using SQLAlchemy.
Query for just a single known column:
session.query(MyTable.col1).count()
I managed to render the following SELECT with SQLAlchemy on both layers.
SELECT count(*) AS count_1
FROM "table"
Usage from the SQL Expression layer
from sqlalchemy import select, func, Integer, Table, Column, MetaData

metadata = MetaData()
table = Table("table", metadata,
              Column('primary_key', Integer),
              Column('other_column', Integer)  # just to illustrate
              )
# select(func.count()) is the SQLAlchemy 1.4+/2.0 form; older versions used select([func.count()])
print(select(func.count()).select_from(table))
Usage from the ORM layer
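You just subclass Query (you probably have already) and provide a specialized count method, like this one:
from sqlalchemy import func
from sqlalchemy.orm import Query

class BaseQuery(Query):
    def count_star(self):
        # Replace the selected columns with count(*) while keeping the FROM list
        # (maintain_column_froms needs SQLAlchemy 1.4.23+), and drop ORDER BY
        count_query = (self.statement
                       .with_only_columns(func.count(), maintain_column_froms=True)
                       .order_by(None))
        return self.session.execute(count_query).scalar()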
Please note that order_by(None) resets the ordering of the query, which is irrelevant to the count.
Using this method you can run a count(*) on any ORM Query, and it will honor all the filter and join conditions already specified.
I needed to do a count of a very complex query with many joins. I was using the joins as filters, so I only wanted to know the count of the actual objects. count() was insufficient, but I found the answer in the docs here:
http://docs.sqlalchemy.org/en/latest/orm/tutorial.html
The code would look something like this (to count user objects):
from sqlalchemy import func
session.query(func.count(User.id)).scalar()
Addition to the Usage from the ORM layer in the accepted answer: count(*) can be done for ORM using the query.with_entities(func.count()), like this:
session.query(MyModel).with_entities(func.count()).scalar()
It can also be used in more complex cases, when we have joins and filters - the important thing here is to place with_entities after joins, otherwise SQLAlchemy could raise the Don't know how to join error.
For example:
we have User model (id, name) and Song model (id, title, genre)
we have user-song data: the UserSong model (user_id, song_id, is_liked), where user_id + song_id is the primary key
We want to get a number of user's liked rock songs:
SELECT count(*)
FROM user_song
JOIN song ON user_song.song_id = song.id
WHERE user_song.user_id = %(user_id)s
AND user_song.is_liked IS 1
AND song.genre = 'rock'
This query can be generated in the following way:
from sqlalchemy import and_, func

user_id = 1
query = session.query(UserSong)
query = query.join(Song, Song.id == UserSong.song_id)
query = query.filter(
and_(
UserSong.user_id == user_id,
UserSong.is_liked.is_(True),
Song.genre == 'rock'
)
)
# Note: important to place `with_entities` after the join
query = query.with_entities(func.count())
liked_count = query.scalar()
Complete example is here.
If you are using the SQL Expression Language approach, there is another way to construct the count statement if you already have your table object.
First, get the table object (there are several ways to do this):
import sqlalchemy
database_engine = sqlalchemy.create_engine("connection string")
# Populate existing database via reflection into sqlalchemy objects
database_metadata = sqlalchemy.MetaData()
database_metadata.reflect(bind=database_engine)
table_object = database_metadata.tables.get("table_name") # This is just for illustration how to get the table_object
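Issuing the count query on the table_object (this is the older API: Table.count() was deprecated in SQLAlchemy 1.1 and is no longer available in 1.4 and later, and Engine.scalar() was removed in 2.0; on current versions, execute select(func.count()).select_from(table_object) on a connection instead):
query = table_object.count()
# This produces something like the following, where id is a primary key column of "table_name" picked automatically by SQLAlchemy:
# 'SELECT count(table_name.id) AS tbl_row_count FROM table_name'
count_result = database_engine.scalar(query)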
I'm not clear on what you mean by "without explicitly asking for it with execute()", so this might be exactly what you are not asking for.
On the other hand, it might help others.
You can just run the textual SQL:
from sqlalchemy import text

your_query = """
SELECT count(*) from table
"""
the_count = session.execute(text(your_query)).scalar()
from sqlalchemy import text

def test_query(session, val: str):
    # Bind val as a parameter instead of formatting it into the SQL string
    query = text("select count(*) from table where col1 = :val")
    return session.execute(query, {"val": val}).scalar()
query = session.query(table.column).filter().with_entities(func.count(table.column.distinct()))
count = query.scalar()
This worked for me.
Gives the query:
SELECT count(DISTINCT table.column) AS count_1
FROM table where ...
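Which aggregate you ask for changes the number you get: COUNT(*) counts rows, COUNT(col) skips NULLs, and COUNT(DISTINCT col) counts distinct non-NULL values. This matters when choosing between func.count(), func.count(column) and func.count(column.distinct()). The sketch below shows the plain SQL semantics with sqlite3 and a hypothetical single-column table t, plus a filtered count that uses a bound parameter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("a",), ("a",), ("b",), (None,)])

# All rows, non-NULL values, distinct non-NULL values.
counts = conn.execute(
    "SELECT COUNT(*), COUNT(col1), COUNT(DISTINCT col1) FROM t"
).fetchone()
print(counts)  # (4, 3, 2)

# Filter with a bound parameter rather than formatting the value into the SQL.
(n_a,) = conn.execute("SELECT COUNT(*) FROM t WHERE col1 = ?", ("a",)).fetchone()
print(n_a)  # 2
```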
Below is a way to find the count of any query:
from sqlalchemy import func

subq = query.subquery()
db.session.query(func.count()).select_from(subq).scalar()
Here is the link to the reference document if you want to explore more options or read details.
