How to improve query performance when selecting info from Postgres? - python

I have a Flask app:

db = SQLAlchemy(app)

@app.route('/')
def home():
    query = "SELECT door_id FROM table WHERE id = 2422628557;"
    result = db.session.execute(query)
    return json.dumps([dict(r) for r in result])
When I execute curl http://127.0.0.1:5000/
I get the result very quickly: [{"door_id": 2063805}]
But when I reverse the query: query = "SELECT id FROM table WHERE door_id = 2063805;"
everything works very, very slowly.
I probably have an index on the id column and no such index on door_id.
How can I improve performance? How do I add an index on door_id?

If you want an index on that column, just create it:
create index i1 on table (door_id)
Then, depending on your settings, you may have to analyze the table so the statistics are available to the query planner, e.g.:
analyze table;
Keep in mind that every index adds extra I/O on data manipulation (INSERT/UPDATE/DELETE).
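Both statements can also be issued straight from Python through a DB-API connection. A minimal sketch, using the stdlib sqlite3 driver as a stand-in for Postgres (with psycopg2 the execute() calls take the same shape):

```python
import sqlite3

# In-memory stand-in for the Postgres database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doors (id INTEGER, door_id INTEGER)")
conn.execute("INSERT INTO doors VALUES (2422628557, 2063805)")

# Create the index on the column we filter by, then refresh planner statistics.
conn.execute("CREATE INDEX i1 ON doors (door_id)")
conn.execute("ANALYZE doors")
conn.commit()

# The previously slow lookup can now use the index.
row = conn.execute(
    "SELECT id FROM doors WHERE door_id = ?", (2063805,)
).fetchone()
print(row)  # (2422628557,)
```

On a busy Postgres instance you might prefer CREATE INDEX CONCURRENTLY, which avoids blocking writes while the index builds.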

Look at the explain plan for your query:
EXPLAIN SELECT door_id FROM table WHERE id = 2422628557;
It is likely you are seeing something like this:
QUERY PLAN
------------------------------------------------------------
 Seq Scan on table  (cost=0.00..483.00 rows=99999 width=244)
   Filter: (id = 2422628557)
The Seq Scan checks every single row in the table, and the result is then filtered by the id you are restricting on.
What you should do in this case is add an index to the column you are filtering on (door_id for your slow query).
The plan will change to something like:
QUERY PLAN
-----------------------------------------------------------------------------
 Index Scan using [INDEX_NAME] on table  (cost=0.00..8.27 rows=1 width=244)
   Index Cond: (id = 2422628557)
The optimiser will use the index to reduce the number of row look-ups for your query, which will speed it up.
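The scan-to-index-lookup switch can be observed programmatically too. A sketch using sqlite3's EXPLAIN QUERY PLAN as a stand-in (Postgres's EXPLAIN wording differs, but the before/after pattern is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doors (id INTEGER, door_id INTEGER)")

def plan(sql):
    # Concatenate the human-readable plan details for a query.
    return " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan("SELECT id FROM doors WHERE door_id = 2063805")  # full scan
conn.execute("CREATE INDEX i1 ON doors (door_id)")
after = plan("SELECT id FROM doors WHERE door_id = 2063805")   # index search

print(before)
print(after)
```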


How to delete only the rows in Postgres (but not drop the table) using the pandas read_sql_query method?

I wanted to perform an operation where I delete all the rows in Postgres (but do not drop the table) and then fill it with new rows. I wanted to use the pd.read_sql_query() method from pandas:
qry = 'delete from "table_name"'
pd.read_sql_query(qry, connection, **kwargs)
But it throws the error ResourceClosedError: This result object does not return rows. It has been closed automatically.
I half expected this, because the method should return an empty dataframe. But instead of an empty dataframe I only get the above error. Could you please help me resolve it?
I use MySQL, but the logic is the same:
Query 1: select all ids from your table.
Query 2: delete all those ids.
As a result you have:
DELETE FROM table_name WHERE id IN (SELECT id FROM table_name)
This statement does not return anything; it just deletes all rows with a matching id. I recommend running the command with psycopg only, not pandas.
Then you need another query to fetch something from the db, like:
pd.read_sql_query("SELECT * FROM table_name", connection, **kwargs)
Probably (I do not use pandas to read from the db) in this case you'll get an empty dataframe with the column names.
You may be able to combine all the actions the following way:
pd.read_sql_query('''DELETE FROM table_name WHERE id IN (SELECT id FROM table_name); SELECT * FROM table_name''', connection, **kwargs)
Please try it and share your results.
You can follow these steps:
Check row existence in the table first, and then delete the rows.
Example code:
check_row_query = "select exists(select * from tbl_name limit 1)"
check_exist = pd.read_sql_query(check_row_query, con)
if check_exist.exists[0]:
    delete_query = 'DELETE FROM tbl_name WHERE condition(s)'
    con.execute(delete_query)  # delete rows using a SQLAlchemy function
    print('Deleted all rows!')
else:
    pass
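The delete-then-read split the answers describe can be sketched end to end. This uses the stdlib sqlite3 driver in place of Postgres/psycopg, with a hypothetical table_name table; the point is that the DELETE stays out of the row-returning call:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER, val TEXT)")
conn.executemany("INSERT INTO table_name VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c")])

# Step 1: run the DELETE on its own -- it returns no rows, which is
# exactly why pd.read_sql_query() raises ResourceClosedError on it.
conn.execute("DELETE FROM table_name")
conn.commit()

# Step 2: a separate SELECT; with pandas this would be
# pd.read_sql_query("SELECT * FROM table_name", conn).
rows = conn.execute("SELECT * FROM table_name").fetchall()
print(rows)  # [] -- the table is empty but still exists
```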

Is there any way to get the total row count and the table data in a single query?

The problem is that to get the count and the table data, I have to hit the database twice in Django. For example:
count = queryset.count()  # To get the count
data = queryset.values('columns')  # To get the data
Is there any way to get the data in a single query? One solution is to use the len() function, but that is not good for a bigger table, which would have to be loaded into RAM.
In MySQL I can do this, but how do I execute it through the Django ORM?
SELECT t1.count, id FROM table1, (select count(*) as count FROM table1) as t1 limit 10;
Any help will be appreciated.
len() should just count what is already in memory if you fetch the data first, so it should be better than two queries to the database:
data = queryset.values('columns')  # To get the data
count = len(data)  # To get the count from the records already in memory
Anyway, for direct database queries without the model layer, from the Django docs:
from django.db import connection

def my_custom_sql(self):
    with connection.cursor() as cursor:
        cursor.execute("SELECT t1.count, id FROM table1, (select count(*) as count FROM table1) as t1 limit 10")
        row = dictfetchall(cursor)
    return row
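The snippet above relies on a dictfetchall() helper that is defined in the Django docs but not shown here; it turns cursor rows into dicts keyed by column name. A self-contained version, demonstrated against the stdlib sqlite3 driver:

```python
import sqlite3

def dictfetchall(cursor):
    """Return all rows from a cursor as a list of dicts (as in the Django docs)."""
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?)", [(1,), (2,), (3,)])

cur = conn.cursor()
# The one-row count subquery is cross-joined in, attaching the total to every row.
cur.execute("SELECT t1.count, id FROM table1, "
            "(SELECT count(*) AS count FROM table1) AS t1 LIMIT 10")
result = dictfetchall(cur)
print(result)  # e.g. [{'count': 3, 'id': 1}, {'count': 3, 'id': 2}, {'count': 3, 'id': 3}]
```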

SQL: SELECT where one of many columns contains 'x' and result is not "NULL"

I have a piece of code that I realized is probably quite inefficient, though I'm not sure how to improve it.
Basically, I have a database table like this:
Example DB table
Any or several of columns A-G might match my search query. If that is the case, I want to fetch VALUE from that row. I need VALUE not to be NULL, though, so if it is, the code should keep looking. If my query were abc, I'd want to obtain correct.
Below is my current code, using a database named db with a table table:
cur = db.cursor()
data = "123"
fields_to_check = ["A", "B", "C", "D", "E", "F", "G"]
for field in fields_to_check:
    cur.execute("SELECT Value FROM table WHERE {}='{}'".format(field, data))
    result = cur.fetchone()
    if result and result != "NULL":
        break
db.close()
I think that running up to seven queries like this is likely very inefficient.
cur = db.cursor()
data = "123"
fields_to_check = ["A", "B", "C", "D", "E", "F", "G"]
sub_query = ""
for field in fields_to_check:
    sub_query = sub_query + "or {}='{}' ".format(field, data)
if sub_query:
    query = "SELECT Value FROM table WHERE (" + sub_query[2:] + ") and Value IS NOT NULL;"
    cur.execute(query)
    rows = cur.fetchall()
    if rows:
        for row in rows:
            print(row)
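One caveat: building WHERE clauses with string formatting is open to SQL injection. The same single-statement idea can parameterize the values while keeping the column names from a fixed whitelist. A sketch with the stdlib sqlite3 driver, reusing the table layout from the question:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE "table" (A TEXT, B TEXT, C TEXT, D TEXT,
                                    E TEXT, F TEXT, G TEXT, Value TEXT)""")
db.execute("""INSERT INTO "table" VALUES ('abc', '', '', '', '', '', '', NULL)""")
db.execute("""INSERT INTO "table" VALUES ('', 'abc', '', '', '', '', '', 'correct')""")

data = "abc"
fields_to_check = ["A", "B", "C", "D", "E", "F", "G"]
# Only the values are parameterized; the column names come from our own
# fixed list, so nothing user-controlled is interpolated into the SQL.
conditions = " OR ".join("{} = ?".format(f) for f in fields_to_check)
sql = 'SELECT Value FROM "table" WHERE ({}) AND Value IS NOT NULL'.format(conditions)

row = db.execute(sql, [data] * len(fields_to_check)).fetchone()
print(row)  # ('correct',)
```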

Put data retrieved from a MySQL query into a pandas DataFrame in a for loop

I have one database with two tables; both have a column called barcode. The aim is to retrieve a barcode from one table and search for the entries in the other table where extra information for that barcode is stored. I would like both sets of retrieved data to be saved in a DataFrame. The problem is that when I insert the data retrieved by the second query into a DataFrame, only the last entry is stored:
import mysql.connector
import pandas as pd

cnx = mysql.connector.connect(user=user, password=password, host=host, database=database)
query_barcode = "SELECT barcode FROM barcode_store"
cursor = cnx.cursor()
cursor.execute(query_barcode)
data_barcode = cursor.fetchall()
Up to this point everything works smoothly, but here is the part with the problem:
query_info = "SELECT product_code FROM product_info WHERE barcode=%s"
for each_barcode in data_barcode:
    cursor.execute(query_info, each_barcode)
    pro_info = pd.DataFrame(cursor.fetchall())
pro_info contains only the last matching barcode's information, while I want to retrieve the information for every data_barcode match.
That's because you are overwriting pro_info with new data on each loop iteration. You should rather do something like:
query_info = "SELECT product_code FROM product_info"
cursor.execute(query_info)
pro_info = pd.DataFrame(cursor.fetchall())
Making so many SELECTs is redundant, since you can get all records in one SELECT and insert them into your DataFrame at once.
Edit: however, if you need the WHERE clause to fetch only specific products, you need to collect the records in a list before inserting them into the DataFrame. Your code would eventually look like:
pro_list = []
query_info = "SELECT product_code FROM product_info WHERE barcode=%s"
for each_barcode in data_barcode:
    cursor.execute(query_info, each_barcode)
    pro_list.append(cursor.fetchone())
pro_info = pd.DataFrame(pro_list)
Cheers!
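Another way to avoid the per-barcode loop entirely is to let the database pair the two tables with a JOIN and build the DataFrame from the joined rows in one pd.DataFrame(...) call. A sketch using the stdlib sqlite3 driver with the table names from the question (the sample values are made up for illustration):

```python
import sqlite3

cnx = sqlite3.connect(":memory:")
cnx.execute("CREATE TABLE barcode_store (barcode TEXT)")
cnx.execute("CREATE TABLE product_info (barcode TEXT, product_code TEXT)")
cnx.executemany("INSERT INTO barcode_store VALUES (?)", [("b1",), ("b2",)])
cnx.executemany("INSERT INTO product_info VALUES (?, ?)",
                [("b1", "p100"), ("b2", "p200"), ("b3", "p300")])

# One round-trip: each stored barcode is matched with its product info.
rows = cnx.execute(
    "SELECT s.barcode, p.product_code "
    "FROM barcode_store s JOIN product_info p ON p.barcode = s.barcode "
    "ORDER BY s.barcode"
).fetchall()
print(rows)  # [('b1', 'p100'), ('b2', 'p200')]
```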

How to count rows with SELECT COUNT(*) with SQLAlchemy?

I'd like to know if it's possible to generate a SELECT COUNT(*) FROM TABLE statement in SQLAlchemy without explicitly asking for it with execute().
If I use:
session.query(table).count()
then it generates something like:
SELECT count(*) AS count_1 FROM
(SELECT table.col1 as col1, table.col2 as col2, ... from table)
which is significantly slower in MySQL with InnoDB. I am looking for a solution that doesn't require the table to have a known primary key, as suggested in Get the number of rows in table using SQLAlchemy.
Query for just a single known column:
session.query(MyTable.col1).count()
I managed to render the following SELECT with SQLAlchemy on both layers:
SELECT count(*) AS count_1
FROM "table"
Usage from the SQL Expression layer:
from sqlalchemy import select, func, Integer, Table, Column, MetaData

metadata = MetaData()
table = Table("table", metadata,
              Column('primary_key', Integer),
              Column('other_column', Integer)  # just to illustrate
              )
print(select([func.count()]).select_from(table))
Usage from the ORM layer:
You just subclass Query (you probably have already) and provide a specialized count() method, like this one:
from sqlalchemy.orm import Query
from sqlalchemy.sql.expression import func

class BaseQuery(Query):
    def count_star(self):
        count_query = (self.statement.with_only_columns([func.count()])
                       .order_by(None))
        return self.session.execute(count_query).scalar()
Please note that order_by(None) resets the ordering of the query; it is irrelevant to the counting.
Using this method you get a count(*) on any ORM Query that honors all the filter and join conditions already specified.
I needed to do a count of a very complex query with many joins. I was using the joins as filters, so I only wanted to know the count of the actual objects. count() was insufficient, but I found the answer in the docs here:
http://docs.sqlalchemy.org/en/latest/orm/tutorial.html
The code would look something like this (to count user objects):
from sqlalchemy import func
session.query(func.count(User.id)).scalar()
In addition to the "Usage from the ORM layer" part of the accepted answer: count(*) can be done for the ORM using query.with_entities(func.count()), like this:
session.query(MyModel).with_entities(func.count()).scalar()
It can also be used in more complex cases, when we have joins and filters; the important thing here is to place with_entities after the joins, otherwise SQLAlchemy could raise a Don't know how to join error.
For example:
we have a User model (id, name) and a Song model (id, title, genre)
we have user-song data: the UserSong model (user_id, song_id, is_liked), where user_id + song_id is the primary key
We want to get the number of a user's liked rock songs:
SELECT count(*)
FROM user_song
JOIN song ON user_song.song_id = song.id
WHERE user_song.user_id = %(user_id)s
AND user_song.is_liked IS 1
AND song.genre = 'rock'
This query can be generated in the following way:
from sqlalchemy import and_, func

user_id = 1
query = session.query(UserSong)
query = query.join(Song, Song.id == UserSong.song_id)
query = query.filter(
    and_(
        UserSong.user_id == user_id,
        UserSong.is_liked.is_(True),
        Song.genre == 'rock'
    )
)
# Note: it is important to place `with_entities` after the join
query = query.with_entities(func.count())
liked_count = query.scalar()
A complete example is here.
If you are using the SQL Expression style approach, there is another way to construct the count statement if you already have your table object.
Preparations to get the table object (there are also other ways):
import sqlalchemy

database_engine = sqlalchemy.create_engine("connection string")
# Populate sqlalchemy objects from the existing database via reflection
database_metadata = sqlalchemy.MetaData()
database_metadata.reflect(bind=database_engine)
table_object = database_metadata.tables.get("table_name")  # just to illustrate how to get the table object
Issuing the count query on the table_object:
query = table_object.count()
# This will produce something like the following, where id is a primary key
# column in "table_name" automatically selected by sqlalchemy:
# 'SELECT count(table_name.id) AS tbl_row_count FROM table_name'
count_result = database_engine.scalar(query)
I'm not clear on what you mean by "without explicitly asking for it with execute()", so this might be exactly what you are not asking for. OTOH, it might help others.
You can just run the textual SQL:
from sqlalchemy import text

your_query = """
SELECT count(*) FROM table
"""
the_count = session.execute(text(your_query)).scalar()
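The same textual count(*) is easy to verify outside SQLAlchemy; in this stdlib sqlite3 sketch, cursor.fetchone()[0] plays the role of .scalar():

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5)])

# count(*) counts rows, not column values, so it never skips NULLs.
the_count = conn.execute("SELECT count(*) FROM t").fetchone()[0]
print(the_count)  # 5
```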
def test_query(val: str):
    query = f"select count(*) from table where col1='{val}'"
    rtn = database_engine.query(query)
    cnt = rtn.one().count
You can find the right attribute to read by inspecting the result in a debug watch.
query = session.query(table.column).filter().with_entities(func.count(table.column.distinct()))
count = query.scalar()
This worked for me. It gives the query:
SELECT count(DISTINCT table.column) AS count_1
FROM table WHERE ...
Below is a way to find the count of any query:
aliased_query = alias(query)
db.session.query(func.count('*')).select_from(aliased_query).scalar()
Here is the link to the reference document if you want to explore more options or read the details.
