SQLAlchemy query is very slow after using the in_() method - python

filters.append(Flow.time_point >= datetime.strptime(start_time, '%Y-%m-%d %H:%M:%S'))
filters.append(Flow.time_point <= datetime.strptime(end_time, '%Y-%m-%d %H:%M:%S'))
if domain_name != 'all':
    filters.append(Bandwidth.domain_name.in_(domain_name.split('|')))
flow_list = db.session.query(Flow.time_point, db.func.sum(Flow.value).label('value')).filter(*filters).group_by(Flow.time_point).order_by(Flow.time_point.asc()).all()
The query takes 3 to 4 seconds when domain_name is 'all'; otherwise it takes 5 minutes. I have tried adding an index to a column, but to no avail. What could be the reason for this?

When domain_name is not 'all' you end up performing an implicit CROSS JOIN between Flow and Bandwidth. When you add the IN predicate to your list of filters SQLAlchemy also picks up Bandwidth as a FROM object. As there is no explicit join between the two, the query will end up as something like:
SELECT flow.time_point, SUM(flow.value) AS value FROM flow, bandwidth WHERE ...
-- The comma join between flow and bandwidth, with no join condition, is the problem
In the worst case the planner produces a query that first joins every row from Flow with every row from Bandwidth. If your tables are even moderately big, the resulting set of rows can be huge.
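To get a feel for the size of the problem, here is a minimal sketch using the standard-library sqlite3 module. The schema, the bandwidth_id column, and the row counts are assumptions made up for illustration, since the real models are not shown in the question.

```python
import sqlite3

# Hypothetical schema: the real Flow/Bandwidth models are not shown in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bandwidth (id INTEGER PRIMARY KEY, domain_name TEXT);
    CREATE TABLE flow (id INTEGER PRIMARY KEY, bandwidth_id INTEGER, value INTEGER);
""")
conn.executemany(
    "INSERT INTO bandwidth (id, domain_name) VALUES (?, ?)",
    [(i, f"domain{i}.example") for i in range(1, 101)],
)
conn.executemany(
    "INSERT INTO flow (bandwidth_id, value) VALUES (?, ?)",
    [(i % 100 + 1, 1) for i in range(1000)],
)

# Comma join with no join condition: every flow row pairs with every bandwidth row.
cross_rows = conn.execute("SELECT COUNT(*) FROM flow, bandwidth").fetchone()[0]

# Explicit join on the assumed foreign key: one bandwidth row per flow row.
joined_rows = conn.execute(
    "SELECT COUNT(*) FROM flow JOIN bandwidth ON flow.bandwidth_id = bandwidth.id"
).fetchone()[0]

domains = ("domain1.example", "domain2.example")

# With the IN filter, the cross join also inflates the sum: each flow row is
# counted once per matching bandwidth row.
cross_sum = conn.execute(
    "SELECT SUM(flow.value) FROM flow, bandwidth "
    "WHERE bandwidth.domain_name IN (?, ?)", domains
).fetchone()[0]
joined_sum = conn.execute(
    "SELECT SUM(flow.value) FROM flow JOIN bandwidth "
    "ON flow.bandwidth_id = bandwidth.id "
    "WHERE bandwidth.domain_name IN (?, ?)", domains
).fetchone()[0]

print(cross_rows, joined_rows)  # 100000 1000
print(cross_sum, joined_sum)    # 2000 20
```

Besides being slow, the unjoined query in this sketch returns a different sum, so the missing join is a correctness issue as well as a performance one.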
Without seeing your models it is impossible to produce an exact solution, but in general you should add the proper join to your query whenever you include Bandwidth:
query = db.session.query(Flow.time_point, db.func.sum(Flow.value).label('value'))
filters.append(Flow.time_point >= datetime.strptime(start_time, '%Y-%m-%d %H:%M:%S'))
filters.append(Flow.time_point <= datetime.strptime(end_time, '%Y-%m-%d %H:%M:%S'))
if domain_name != 'all':
    query = query.join(Bandwidth)
    filters.append(Bandwidth.domain_name.in_(domain_name.split('|')))
flow_list = query.\
    filter(*filters).\
    group_by(Flow.time_point).\
    order_by(Flow.time_point.asc()).\
    all()
If there are no foreign keys connecting your models, you must provide the ON clause as the second argument to Query.join() explicitly.

Related

There are some problems when using SQLAlchemy to query the data between time1 and time2

My database is SQL Server 2008.
The column I want to filter on (for example FinishDate) has type datetime2.
I just want the rows between "10-11" and "10-17".
With SQLAlchemy, I use
cast(FinishDate, DATE).between(cast(time1, DATE),cast(time2, DATE))
to filter by date, but it does not return any data (I have confirmed that some rows fall within the queried time range).
==============================================
from sqlalchemy import DATE, cast
bb = "2021-10-11 12:21:23"
cc = "2021-10-17 16:12:34"
record = session.query(sa.Name, cast(sa.FinishDate, DATE)).filter(
    cast(sa.SamplingTime, DATE).between(cast(bb, DATE), cast(cc, DATE)),
    sa.SamplingType != 0
).all()
or
record = session.query(sa.Name, cast(sa.FinishDate, DATE)).filter(
    cast(sa.SamplingTime, DATE) >= cast(bb, DATE),
    sa.SamplingType != 0
).all()
Both return []
Something is wrong with my code and I don't know what the trouble is.
It works for me; the only change I made was replacing the DATE you are using with Date:
from sqlalchemy import Date, cast
record = session.query(
    sa.Name, cast(sa.FinishDate, Date)
).filter(
    cast(sa.SamplingTime, Date).between(
        cast(bb, Date), cast(cc, Date)
    ),
    sa.SamplingType != 0
).all()
In fact, the first parameter of cast can also be a string, so in this case it is fine to pass the date as a string to cast. As the docstring for cast says:
:param expression: A SQL expression, such as a
:class:`_expression.ColumnElement`
expression or a Python string which will be coerced into a bound
literal value.
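A possible alternative, sketched here as an assumption rather than something from the answer above, is to compute the day boundaries in Python and compare the column directly, which avoids casting the column at all. The helper name day_bounds is hypothetical; bb and cc are the strings from the question.

```python
from datetime import datetime, timedelta

bb = "2021-10-11 12:21:23"
cc = "2021-10-17 16:12:34"
FMT = "%Y-%m-%d %H:%M:%S"


def day_bounds(start_text, end_text):
    """Return (start, end) datetimes covering whole calendar days, end exclusive."""
    start_day = datetime.strptime(start_text, FMT).date()
    end_day = datetime.strptime(end_text, FMT).date()
    start = datetime.combine(start_day, datetime.min.time())
    # Midnight of the day after end_day, so the whole of end_day is included.
    end = datetime.combine(end_day + timedelta(days=1), datetime.min.time())
    return start, end


start, end = day_bounds(bb, cc)
print(start, end)  # 2021-10-11 00:00:00 2021-10-18 00:00:00

# Possible use in the query (not executed here):
#   .filter(sa.SamplingTime >= start, sa.SamplingTime < end)
```

The half-open range selects the same calendar days as the between() on the cast dates, while leaving the column itself untouched in the comparison.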

Search data based on date in SQLAlchemy

I have a column (scheduledStartDateTime) of type datetime in the database, and I have to find the previous row of data based on a datetime entered by the user.
My query looks like this:
order = self.trips_session.query(Order).filter(
    and_(
        Order.driverSystemId == driver_system_id,
        func.date(Order.scheduledStartDateTime) < func.date(start_date)
    )).order_by(
    DispatchOrder.scheduledStartDateTime.desc()).first()
My search date is 2020-01-13 07:16:06, i.e. order number 5673, so ideally I am looking for order number 5677, but I am getting 5679. How can I compare the dates including hours, minutes and seconds as well?
You want to compare full datetimes, but func.date() casts the datetime to a date, so the hours, minutes and seconds are lost. You just need to perform the comparison as normal, without casting your datetimes:
order = self.trips_session.query(Order).filter(
    and_(
        Order.driverSystemId == driver_system_id,
        Order.scheduledStartDateTime < start_date
    )).order_by(
    DispatchOrder.scheduledStartDateTime.desc()).first()
Alternatively, if one of the provided datetimes is not already a datetime value, you can use func.datetime() to convert it without losing the time information, provided your database has a datetime() function.
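The difference between the two comparisons can be reproduced with the standard-library sqlite3 module. This is only a sketch: the table layout and the scheduled times for orders 5677 and 5679 are made up for illustration, and only the search datetime comes from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_number INTEGER, scheduled_start TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    (5679, "2020-01-12 22:00:00"),  # previous day (assumed time)
    (5677, "2020-01-13 06:00:00"),  # same day, earlier (assumed time)
    (5673, "2020-01-13 07:16:06"),  # the order being searched from
])
start_date = "2020-01-13 07:16:06"

# Comparing dates only: everything on 2020-01-13 is excluded.
by_date = conn.execute(
    "SELECT order_number FROM orders "
    "WHERE date(scheduled_start) < date(?) "
    "ORDER BY scheduled_start DESC LIMIT 1",
    (start_date,),
).fetchone()[0]

# Comparing full datetimes: earlier rows from the same day still qualify.
by_datetime = conn.execute(
    "SELECT order_number FROM orders "
    "WHERE scheduled_start < ? "
    "ORDER BY scheduled_start DESC LIMIT 1",
    (start_date,),
).fetchone()[0]

print(by_date, by_datetime)  # 5679 5677
```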

timedelta - most elegant way to pass 'days=-5' from string

I am trying to call a function that triggers a report to be generated with a starting date that is either hours or days ago. The code below works fine, but I would like to store the timedelta offset in a MySQL database.
starting_date = datetime.today() - timedelta(days=-5)
I had hoped to store 'days=-5' in the database, extract that database column to variable 'delta_offset' and then run
starting_date = datetime.today() - timedelta(delta_offset)
This fails because delta_offset is a string. I know I could modify the function to accept just the offset and store -5 in my database, like what is below. But I really want to store 'days=-5', because my offset can be in hours as well. I could always express the offset in hours and store -120, but I was wondering whether there is an elegant way to store 'days=-5' in the database without causing type issues.
starting_date = datetime.today() - timedelta(days=delta_offset)
Instead of storing 'days=-5' in your database as a single column, you could break this into two columns named 'value' and 'unit' or similar.
Then you can pass these to timedelta by building a dictionary and unpacking it, like so:
unit = 'days'
value = -5
starting_date = datetime.today() - timedelta(**{unit: value})
This will unpack the dictionary so you get the same result as doing timedelta([unit]=value).
Alternatively, if you really would like to keep 'days=-5' as a value of a single column in your database, you could split the string on '=' then take a similar approach. Here's how:
offset = 'days=-5'
unit, value = offset.split('=')
starting_date = datetime.today() - timedelta(**{unit: int(value)})
I would do it this way:
date_offset_split = date_offset.split("=")
kwargs = {date_offset_split[0]: int(date_offset_split[1])}
starting_date = datetime.today() - timedelta(**kwargs)
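Since the string comes from a database, it may be worth validating it before unpacking it into timedelta. The following is a minimal sketch of that idea; the helper name parse_offset and the ALLOWED_UNITS set are hypothetical choices, not part of either answer above.

```python
from datetime import datetime, timedelta

# Units accepted from the database; restricting them avoids passing
# arbitrary keyword names to timedelta.
ALLOWED_UNITS = {"minutes", "hours", "days", "weeks"}


def parse_offset(text):
    """Turn a string such as 'days=-5' into a timedelta."""
    unit, sep, value = text.partition("=")
    unit = unit.strip()
    if not sep or unit not in ALLOWED_UNITS:
        raise ValueError(f"unsupported offset: {text!r}")
    # int() raises ValueError itself if the value is not a whole number.
    return timedelta(**{unit: int(value)})


starting_date = datetime.today() - parse_offset("days=-5")
```

With this approach 'hours=-120' and 'days=-5' produce equal timedelta objects, so either form can be stored in the column.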

Variables in a Postgres view?

I have a view in Postgres which queries a master table (150 million rows) and retrieves data from the prior day (a function which returns SELECT yesterday; it was the only way to get the view to respect my partition constraints) and then joins it with two dimension tables. This works fine, but how would I loop through this query in Python? Is there a way to make the date dynamic?
for date in date_range('2016-06-01', '2017-07-31'):
    (query from the view, replacing the date with the date in the loop)
My workaround was to literally copy and paste the entire view as a huge select statement format string, and then pass in the date in a loop. This worked, but it seems like there must be a better solution to utilize an existing view or to pass in a variable which might be useful in the future.
To loop day by day over the interval in a for loop, you could do something like:
import datetime
initialDate = datetime.datetime(2016, 6, 1)
finalDate = datetime.datetime(2017, 7, 31)
for day in range((finalDate - initialDate).days + 1):
    current = (initialDate + datetime.timedelta(days=day)).date()
    print("query from the view, replacing the date with " + current.strftime('%m/%d/%Y'))
Replace the print with the call to your query. If the dates are strings, you can do something like:
initialDate = datetime.datetime.strptime("06/01/2016", '%m/%d/%Y')
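The same loop can be wrapped in a small generator that matches the date_range call from the question, with the date passed as a query parameter instead of being formatted into the SQL text. This is a sketch under assumptions: the table name my_view_source, the column event_date, and the %s placeholder style (as used by psycopg2) are all hypothetical.

```python
from datetime import date, timedelta


def date_range(start, end):
    """Yield every date from start to end, inclusive."""
    for offset in range((end - start).days + 1):
        yield start + timedelta(days=offset)


# Hypothetical parameterized query; the real view and column names are not shown.
QUERY = "SELECT * FROM my_view_source WHERE event_date = %s"

days = list(date_range(date(2016, 6, 1), date(2017, 7, 31)))
for day in days[:3]:
    # With a DB-API cursor this would be: cursor.execute(QUERY, (day,))
    print(QUERY, day.isoformat())
```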

Python peewee, filtering query based on elapsed time

This is my "Request" class/Table:
class Request(BaseModel):
    TIME_STAMP = DateTimeField(default=datetime.datetime.now)
    MESSAGE_ID = IntegerField()
With peewee, I want to select from the table, all of the "requests" that have occurred over the past 10 minutes. Something like this:
rs = Request.select().where(Request.TIME_STAMP-datetime.datetime.now<(10 minutes))
But well, I'm not entirely sure how to get the difference between the TIME_STAMP and the current time in minutes.
EDIT:
I've tried Gerrat's suggestion, but Peewee seems to cry:
/usr/local/lib/python2.7/dist-packages/peewee.py:2356: Warning: Truncated incorrect DOUBLE value: '2014-07-19 15:51:24'
cursor.execute(sql, params or ())
/usr/local/lib/python2.7/dist-packages/peewee.py:2356: Warning: Truncated incorrect DOUBLE value: '0 0:10:0'
cursor.execute(sql, params or ())
I've never used peewee, but if you're just subtracting two timestamps, then you'll get back a datetime.timedelta object. You should be able to just compare it to another timedelta object then. Something like:
rs = Request.select().where(
    Request.TIME_STAMP - datetime.datetime.now() < datetime.timedelta(seconds=10 * 60))
EDIT
...Looking a little closer at peewee, the above may not work. If it doesn't, then something like the following should:
ten_min_ago = datetime.datetime.now() - datetime.timedelta(seconds=10 * 60)
rs = Request.select().where(Request.TIME_STAMP > ten_min_ago)
It seems slightly odd that the timestamp in your database would be greater than the current time (which is what your select assumes), so you may want to add the time delta and subtract the values the other way around (e.g. if you want to select records from the last 10 minutes).
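The cutoff approach can be tried without peewee by using the standard-library sqlite3 module. This is only an illustration of the comparison itself: the table layout, the sample rows, and the fixed "current time" are assumptions chosen so the result is reproducible.

```python
import sqlite3
from datetime import datetime, timedelta

# Fixed "current time" so the example gives the same result on every run.
now = datetime(2014, 7, 19, 16, 0, 0)
ten_min_ago = now - timedelta(minutes=10)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE request (time_stamp TEXT, message_id INTEGER)")
conn.executemany("INSERT INTO request VALUES (?, ?)", [
    ("2014-07-19 15:45:00", 1),  # 15 minutes old
    ("2014-07-19 15:51:24", 2),  # under 10 minutes old
    ("2014-07-19 15:59:30", 3),  # 30 seconds old
])

# The cutoff is computed in Python and passed as a parameter, so the database
# only has to compare the stored timestamp with a single value.
recent = [row[0] for row in conn.execute(
    "SELECT message_id FROM request WHERE time_stamp > ? ORDER BY message_id",
    (ten_min_ago.strftime("%Y-%m-%d %H:%M:%S"),),
)]
print(recent)  # [2, 3]
```

Computing the cutoff on the Python side sidesteps the date arithmetic inside the SQL statement, which is where the warnings quoted in the question appear to come from.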
