how to compare date (in query criteria) in pyral

I need to find the tasks that have not been updated.
The query criterion looks like this:
'iteration.Name = \"iterationName\" and State!=Completed and LastUpdateDate<\''+str(datetime.datetime.now())+"'"
This results in:
iteration.Name = "iterationName" and State!=Completed and LastUpdateDate<'2015-12-27 20:17:08.769000'
I didn't get any results.
The Rally task object has a LastUpdateDate of 2015-12-16T09:54:30.600Z 8.0
How do I compare the LastUpdateDate in the query criterion?

I had the same problem with multiple conditions; in the end I had to add brackets to get it to work.
"((iteration.Name = \"iterationName\") AND (State!=Completed)) AND (LastUpdateDate<'" + str(datetime.datetime.now()) + "')"
There are probably too many brackets in there; the key thing, I think, was to get a bracket around the first two conditions. If adding another condition, it would look something like this:
((((condition1) AND (condition2)) AND (condition3)) AND (condition4))
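The pairwise nesting shown above can be generated rather than typed by hand. This is only a sketch: nest_conditions is a hypothetical helper, not part of pyral, and it does nothing beyond string formatting.

```python
def nest_conditions(conditions, op="AND"):
    """Combine conditions pairwise: (((c1) AND (c2)) AND (c3))."""
    if not conditions:
        raise ValueError("at least one condition is required")
    # Start with the first condition in its own brackets.
    query = "(%s)" % conditions[0]
    # Wrap the running query together with each further condition.
    for cond in conditions[1:]:
        query = "(%s %s (%s))" % (query, op, cond)
    return query


query = nest_conditions([
    'iteration.Name = "iterationName"',
    "State != Completed",
    'LastUpdateDate < "2015-12-27T20:17:08.769Z"',
])
```

The resulting string can then be passed as the query argument in the same way as a hand-written one.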

This question was asked ages ago, but the answer might be useful for people looking to use queries with dates in pyral.
There are several issues with the code in the question:
datetime.datetime.now() will probably not return a date in the format needed by Rally, so it is better to use strftime to get the proper format.
Multiple conditions need a correct set of parentheses; for example: ((((condition1) AND (condition2)) AND (condition3)) AND (condition4))
LastUpdateDate will necessarily be earlier than the current date (unless the users have been able to jump to the future).
It is better to put dates in double quotes (") instead of single quotes (').
Here is the code I came up with for identifying tasks that have not been updated in the last 5 days and are not completed.
import datetime

# `rally` is an already connected pyral Rally instance
iter_name = "2018-Iteration-4"
five_days_ago = datetime.datetime.now() - datetime.timedelta(days=5)
str_date = five_days_ago.strftime("%Y-%m-%dT%H:%M:%S.%fZ")
tasks_not_updated = rally.get(
    'Task',
    query='(((iteration.Name = "%s")'
          ' and (State != Completed))'
          ' and (LastUpdateDate < "%s"))' % (iter_name, str_date)
)
for task in tasks_not_updated:
    print("%s (%s)" % (task.Name, task.State))
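The timestamp quoted in the question (2015-12-16T09:54:30.600Z) is in UTC with millisecond precision, whereas %f produces six digits and datetime.now() is local time. If you want the string to match that shape exactly, something like the following might help. It is a sketch under assumptions: to_rally_timestamp is a hypothetical helper name, and whether Rally requires this exact precision is not confirmed here.

```python
from datetime import datetime, timedelta, timezone


def to_rally_timestamp(dt):
    """Format an aware datetime as UTC ISO 8601 with millisecond precision."""
    dt_utc = dt.astimezone(timezone.utc)
    # strftime has no millisecond directive, so append the milliseconds by hand.
    millis = dt_utc.microsecond // 1000
    return dt_utc.strftime("%Y-%m-%dT%H:%M:%S.") + "%03dZ" % millis


five_days_ago = datetime.now(timezone.utc) - timedelta(days=5)
str_date = to_rally_timestamp(five_days_ago)
```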

Related

MySQL SQLALCHEMY Python Getting Max Count for Timestamp

I have data recorded for several timestamps ... I want to get the max amount of all timestamps.
This is my code:
for timestamp in timestamps:
    count = db.query(models.Appointment.id).filter(models.Appointment.place == place) \
        .filter(models.Appointment.date == date) \
        .filter(models.Appointment.timestamp == timestamp).count()
    data.append(count)
return max(data)
Sadly, this takes about 1.5 seconds per timestamp to calculate the requested value.
Is there a single query that can handle this in around 3-10 seconds in total?
Regards,
Martin

If using MySQL 8 or later, you could give the following a go:
from sqlalchemy import func

return db.query(func.max(func.count()).over()).\
    filter(models.Appointment.place == place).\
    filter(models.Appointment.date == date).\
    filter(models.Appointment.timestamp.in_(timestamps)).\
    group_by(models.Appointment.timestamp).\
    limit(1).\
    scalar()
This uses the (slightly non-obvious) fact that window functions are evaluated after forming group rows, and without a partition and order the window is over all the group rows.
If using a version of MySQL that does not yet support window functions, use a subquery instead:
counts = db.query(func.count().label('count')).\
    filter(models.Appointment.place == place).\
    filter(models.Appointment.date == date).\
    filter(models.Appointment.timestamp.in_(timestamps)).\
    group_by(models.Appointment.timestamp).\
    subquery()
return db.query(func.max(counts.c.count)).scalar()
The difference between these and the original approach is that both make only a single trip to the database. That is usually desirable, but may require thinking a bit differently about the problem, due to SQL being a (more or less) declarative language: you mostly describe the answer you want, not how you want it✝.
✝ "I want coffee" vs. "Start by pouring some water in the..."
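To see the shape of the single-trip subquery without a MySQL server, here is a small sketch using the standard-library sqlite3 module instead of SQLAlchemy. The table layout and sample rows are made up for illustration; only the "MAX over grouped COUNTs" pattern is the point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE appointment ("
    "id INTEGER PRIMARY KEY, place TEXT, date TEXT, timestamp INTEGER)"
)
# Illustrative rows: timestamp 1 has two bookings at 'court', timestamp 2 has three.
rows = [
    ("court", "2020-01-01", 1),
    ("court", "2020-01-01", 1),
    ("court", "2020-01-01", 2),
    ("court", "2020-01-01", 2),
    ("court", "2020-01-01", 2),
    ("hall", "2020-01-01", 2),
]
conn.executemany(
    "INSERT INTO appointment (place, date, timestamp) VALUES (?, ?, ?)", rows
)

timestamps = [1, 2, 3]
placeholders = ", ".join("?" for _ in timestamps)
# The inner query counts rows per timestamp; the outer query takes the maximum.
sql = (
    "SELECT MAX(cnt) FROM ("
    " SELECT COUNT(*) AS cnt FROM appointment"
    " WHERE place = ? AND date = ? AND timestamp IN (%s)"
    " GROUP BY timestamp"
    ")" % placeholders
)
max_count = conn.execute(sql, ["court", "2020-01-01", *timestamps]).fetchone()[0]
```

However many timestamps are in the list, the database is asked once.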

Fixing a meeting room function schedule with double and triple bookings to determine space usage

I need to calculate the total amount of time each group uses a meeting space. But the data set has double and triple bookings, so I think I need to fix the data first. Disclosure: My coding experience consists solely of working through a few Dataquest courses, and this is my first Stack Overflow posting, so I apologize for errors and transgressions.
Each line of the data set contains the group ID and a start and end time. It also includes the booking type, i.e. reserved, meeting, etc. Generally, the staff reserve a space for the entire period, which creates a single line, and then add a line for each individual function when the details are known. They should segment the original reserved line so it only holds space in between functions, but instead they double book the space, so I need to add multiple lines for these interim RES holds, based on the actual functions.
Here's what the data basically looks like:
Existing data:
functions = [['Function', 'Group', 'FunctionType', 'StartTime', 'EndTime'],
             [1, 1, 'RES', '2019/10/04 07:00', '2019/10/06 17:00'],
             [2, 1, 'MTG', '2019/10/05 09:00', '2019/10/05 12:00'],
             [3, 1, 'LUN', '2019/10/05 12:30', '2019/10/05 13:30'],
             [4, 1, 'MTG', '2019/10/05 14:00', '2019/10/05 17:00'],
             [5, 1, 'MTG', '2019/10/06 09:00', '2019/10/06 12:00']]
I've tried to iterate using a for loop:
for index, row in enumerate(functions):
    last_row_index = len(functions) - 1
    if index == last_row_index:
        pass
    else:
        current_index = index
        next_index = index + 1
        if row[3] <= functions[next_index][2]:
            next
        elif row[4] == 'RES' or row[6] < functions[next_index][6]:
            copied_current_row = row.copy()
            row[3] = functions[next_index][2]
            copied_current_row[2] = functions[next_index][3]
            functions.append(copied_current_row)
There seems to be a logical problem in here, because that last append line seems to put the program into some kind of loop and I have to manually interrupt it. So I'm sure it's obvious to someone experienced, but I'm pretty new.
The reason I've done the comparison to see if a function is RES is that reserved should be subordinate to actual functions. But sometimes there are overlaps between actual functions, so I'll need to create another comparison to decide which one takes precedence, but this is where I'm starting.
How I (think I) want it to end up:
[['Function', 'Group', 'FunctionType', 'StartTime', 'EndTime'],
 [1, 1, 'RES', '2019/10/04 07:00', '2019/10/05 09:00'],
 [2, 1, 'MTG', '2019/10/05 09:00', '2019/10/05 12:00'],
 [1, 1, 'RES', '2019/10/05 12:00', '2019/10/05 12:30'],
 [3, 1, 'LUN', '2019/10/05 12:30', '2019/10/05 13:30'],
 [1, 1, 'RES', '2019/10/05 13:30', '2019/10/05 14:00'],
 [4, 1, 'MTG', '2019/10/05 14:00', '2019/10/05 17:00'],
 [1, 1, 'RES', '2019/10/05 17:00', '2019/10/06 09:00'],
 [5, 1, 'MTG', '2019/10/06 09:00', '2019/10/06 12:00'],
 [1, 1, 'RES', '2019/10/06 12:00', '2019/10/06 17:00']]
This way, I could do a simple calculation of elapsed time for each function line and add it up to see how much time they had the space booked for.
What I'm looking for here is just some direction I should pursue, and I'm definitely not expecting anyone to do the work for me. For example, am I on the right path here, or would it be better to use pandas and vectorized functions? If I can get the basic direction right, I think I can muddle through the specifics.
Thank-you very much,
AF
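One possible direction, sketched under assumptions: the loop in the question appends to functions while iterating over it, so the iteration keeps finding new rows and never reaches the end. Building a separate list avoids that. The helper below (split_hold is a hypothetical name) assumes timestamps are strings in the "YYYY/MM/DD HH:MM" form and that there is a single RES hold per group; it does not decide precedence between overlapping real functions.

```python
from datetime import datetime

FMT = "%Y/%m/%d %H:%M"  # assumed timestamp format


def split_hold(hold, functions):
    """Return the pieces of `hold` that no row in `functions` covers."""
    func_id, group, ftype, start, end = hold
    cursor = datetime.strptime(start, FMT)
    hold_end = datetime.strptime(end, FMT)
    pieces = []
    # Visit the real functions in start order and emit each uncovered gap.
    for row in sorted(functions, key=lambda r: datetime.strptime(r[3], FMT)):
        f_start = datetime.strptime(row[3], FMT)
        f_end = datetime.strptime(row[4], FMT)
        if f_start > cursor:
            gap_end = min(f_start, hold_end)
            pieces.append([func_id, group, ftype,
                           cursor.strftime(FMT), gap_end.strftime(FMT)])
        cursor = max(cursor, f_end)
        if cursor >= hold_end:
            break
    # Whatever remains after the last function is still held.
    if cursor < hold_end:
        pieces.append([func_id, group, ftype,
                       cursor.strftime(FMT), hold_end.strftime(FMT)])
    return pieces


hold = [1, 1, 'RES', '2019/10/04 07:00', '2019/10/06 17:00']
booked = [
    [2, 1, 'MTG', '2019/10/05 09:00', '2019/10/05 12:00'],
    [3, 1, 'LUN', '2019/10/05 12:30', '2019/10/05 13:30'],
    [4, 1, 'MTG', '2019/10/05 14:00', '2019/10/05 17:00'],
    [5, 1, 'MTG', '2019/10/06 09:00', '2019/10/06 12:00'],
]
res_pieces = split_hold(hold, booked)
```

The same idea carries over to pandas if the data set grows, but plain lists are enough to check the logic first.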

invalid filter: Only one property per query may have inequality filters (>=, <=, >, <)

I have a number of items that are bookable in certain timeslots, e.g. a tennis court. Each item has a number of associated availability slots, each defined by a begintime and an endtime. Begintime and endtime are stored as datetime objects, so an availability slot from 09.00 to 11.30 is stored as, e.g., 2013-12-13 09.00 (begintime) to 2013-12-13 11.30 (endtime).
When a booking request comes in, I need to find out whether the tennis court is available for the desired timeslot.
So I am trying to filter availability slots based on start-time and end-time, and my query looks like this:
desired_availability_start = datetime(2013, 12, 13, 9,0,0)
desired_availability_end = datetime(2013, 12, 13, 10,0,0)
availability_slots = self.availability_slots.filter("begin <= ", desired_availability_start).filter("end >= ", desired_availability_end).fetch(limit=10)
but I get the following error
invalid filter: Only one property per query may have inequality filters (>=, <=, >, <)
Because I am trying to filter on both the begin- and end-property.
Based on the input and from some of the other posts on the topic Inequality Filter in AppEngine Datastore and BadFilterError: invalid filter: Only one property per query may have inequality filters (<=, >=, <, >) my current solution is to first filter on begin:
filtered_availability_slots = self.availability_slots.filter("begin <= ", desired_availability_start).fetch(limit=10)
and then filter on end and append the filtered items to a list:
final_availability_slots = []
for availability in filtered_availability_slots:
    if availability.end >= desired_availability_end:
        final_availability_slots.append(availability)
But is this the best way of achieving what I want to achieve?
I am using Google App Engine and Python
Any help is appreciated
thanks
Thomas

As I guess you already know, you can't use inequality filters on more than one property in a datastore query. Unless you really need both, you can filter on the 'begin' time only and still get pretty accurate results.
calitem = self.appointments.filter("begin >= ", start).filter("begin <= ", end).fetch(limit=10)
If you really need both, you can use your application logic to show only the items that don't go beyond the 'end' value. I don't see any other way around it.

It's a bit unclear what you're asking. It's not clear whether you understand the problem: You're trying to use two inequality filters, and it's simply not allowed. Can't do it.
You must work around this datastore limitation.
The most basic option is to brute force it yourself. Use one filter, and manually filter out the results yourself. It may help to filter on begin, and sort on end, but you'll have to go through the results and pick the actual entities you want.
calitem = [x for x in self.appointments.filter("begin >= ", start).filter("begin <= ", end) if x.end <= end]
In most cases, you'd want to restructure your data so that you don't need two inequality filters. This may or may not be possible.
I'm trying to guess at what you're doing, but if you're trying to see if someone is busy at 11am based on their calendar, a way to do this is:
Break the day down into fixed time periods instead of using arbitrary times, i.e. 15-minute blocks.
Store an event as a list of the time blocks that it uses.
Query for events that contain the time block for 11am.
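The block idea can be sketched in plain Python like this. The helper name time_blocks and the 15-minute size are illustrative assumptions; in the datastore the resulting list would be stored on the entity and queried with a single equality filter, which is not shown here.

```python
from datetime import datetime, timedelta

BLOCK = timedelta(minutes=15)


def time_blocks(start, end):
    """Return the start of every 15-minute block that overlaps [start, end)."""
    # Round the start down to the enclosing block boundary.
    current = start.replace(
        minute=start.minute - start.minute % 15, second=0, microsecond=0
    )
    blocks = []
    while current < end:
        blocks.append(current)
        current += BLOCK
    return blocks


# An event from 09:00 to 11:30 occupies ten blocks: 09:00, 09:15, ..., 11:15.
event_blocks = time_blocks(datetime(2013, 12, 13, 9, 0), datetime(2013, 12, 13, 11, 30))
busy_at_11 = datetime(2013, 12, 13, 11, 0) in event_blocks
```

The membership test at the end stands in for the equality query: it needs no inequality filter at all.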

I have a similar requirement: pick entities out of Datastore that should be rendered/delivered now. Since Datastore cannot handle this, application logic is required. I make two separate queries for keys, one for each end of the constraint, and then take the intersection of them:
satisfies "begin" criteria: k1, k3, |k4, k5|, k6
                                    +------+
satisfies "end" criteria:       k2, |k4, k5|, k7, k8
The intersection of "begin" and "end" is the keys k4, k5.
now = datetime.now()
start_dt_q = FooBar.all()
start_dt_q.filter('start_datetime <', now)
start_dt_q.filter('omit_from_feed =', False)
start_dt_keys = start_dt_q.fetch(None, keys_only=True)
end_dt_q = FooBar.all()
end_dt_q.filter('end_datetime >', now)
end_dt_q.filter('omit_from_feed =', False)
end_dt_keys = end_dt_q.fetch(None, keys_only=True)
# Get "intersection" of two queries; equivalent to
# single query with multiple criteria
valid_dt_keys = list(set(start_dt_keys) & set(end_dt_keys))
I then iterate over those keys, getting the entities I need:
for key in valid_dt_keys:
    foobar = FooBar.all().filter('__key__ =', key).get()
    ...
OR:
foobars = FooBar.all().filter('__key__ IN', valid_dt_keys)
for foobar in foobars:
    ...

How can I make my code more efficient?

I have a list of tuples that contains a tool_id, a time, and a message. I want to select from this list all the elements where the message matches some string, and all the other elements where the time is within some diff of any matching message for that tool.
Here is how I am currently doing this:
# record time for each message matching the specified message for each tool
messageTimes = {}
for row in cdata: # tool, time, message
    if self.message in row[2]:
        messageTimes[row[0], row[1]] = 1
# now pull out each message that is within the time diff for each matched message
# as well as the matched messages themselves
def determine(tup):
    if self.message in tup[2]: return True # matched message
    for (tool, date_time) in messageTimes:
        if tool == tup[0]:
            if abs(date_time-tup[1]) <= tdiff:
                return True
    return False
cdata[:] = [tup for tup in cdata if determine(tup)]
This code works, but it takes way too long to run - e.g. when cdata has 600,000 elements (which is typical for my app) it takes 2 hours for this to run.
This data came from a database. Originally I was getting just the data I wanted using SQL, but that was taking too long also. I was selecting just the messages I wanted, then for each one of those doing another query to get the data within the time diff of each. That was resulting in tens of thousands of queries. So I changed it to pull all the potential matches at once and then process it in python, thinking that would be faster. Maybe I was wrong.
Can anyone give me some suggestions on speeding this up?
Updating my post to show what I did in SQL as was suggested.
What I did in SQL was pretty straightforward. The first query was something like:
SELECT tool, date_time, message
FROM event_log
WHERE message LIKE '%foo%'
AND other selection criteria
That was fast enough, but it may return 20 or 30 thousand rows. So then I looped through the result set, and for each row ran a query like this (where dt and t are the date_time and tool from a row from the above select):
SELECT date_time, message
FROM event_log
WHERE tool = t
AND ABS(TIMESTAMPDIFF(SECOND, date_time, dt)) <= timediff
That was taking about an hour.
I also tried doing in one nested query where the inner query selected the rows from my first query, and the outer query selected the time diff rows. That took even longer.
So now I am selecting without the message LIKE '%foo%' clause and I am getting back 600,000 rows and trying to pull out the rows I want from python.

The way to optimize the SQL is to do it all in one query, instead of iterating over 20K rows and doing another query for each one.
Usually this means you need to add a JOIN, or occasionally a sub-query. And yes, you can JOIN a table to itself, as long as you rename one or both copies. So, something like this:
SELECT el2.date_time, el2.message
FROM event_log AS el1 JOIN event_log AS el2
  ON el2.tool = el1.tool
WHERE el1.message LIKE '%foo%'
  AND other selection criteria
  AND ABS(TIMESTAMPDIFF(SECOND, el2.date_time, el1.date_time)) <= timediff
Now, this probably won't be fast enough out of the box, so there are two steps to improve it.
First, look for any columns that obviously need to be indexed. Clearly tool and date_time need simple indices. message may benefit from either a simple index or, if your database offers one, something fancier, but given that the initial query was fast enough, you probably don't need to worry about it.
Occasionally, that's sufficient. But usually, you can't guess everything correctly. And there may also be a need to rearrange the order of the queries, etc. So you're going to want to EXPLAIN the query, and look through the steps the DB engine is taking, and see where it's doing a slow iterative lookup when it could be doing a fast index lookup, or where it's iterating over a large collection before a small collection.
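To experiment with the self-join locally, here is a sketch using the standard-library sqlite3 module rather than MySQL. Several things are assumptions for illustration: the sample rows, storing date_time as integer seconds, and the index name. It also swaps ABS(TIMESTAMPDIFF(...)) for a BETWEEN range, which is a different formulation of the same condition that leaves the indexed column bare so the engine has a chance to use the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_log (tool TEXT, date_time INTEGER, message TEXT)")
# Composite index so the join can look up rows by tool and time range.
conn.execute("CREATE INDEX idx_tool_time ON event_log (tool, date_time)")
conn.executemany(
    "INSERT INTO event_log VALUES (?, ?, ?)",
    [
        ("A", 100, "foo failed"),
        ("A", 105, "retrying"),
        ("A", 500, "idle"),
        ("B", 102, "unrelated"),
    ],
)

timediff = 10  # seconds
rows = conn.execute(
    "SELECT DISTINCT el2.tool, el2.date_time, el2.message"
    " FROM event_log AS el1"
    " JOIN event_log AS el2 ON el2.tool = el1.tool"
    " WHERE el1.message LIKE '%foo%'"
    " AND el2.date_time BETWEEN el1.date_time - ? AND el1.date_time + ?"
    " ORDER BY el2.date_time",
    (timediff, timediff),
).fetchall()
```

DISTINCT is there because a row close to two different matching messages would otherwise be returned twice.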

For tabular data, you can't go past the Python pandas library, which contains highly optimised code for queries like this.

I fixed this by changing my code as follows:
- First, I made messageTimes a dict of lists keyed by the tool:
import bisect
from collections import defaultdict

# a dict with sorted lists (they stay sorted only if cdata is ordered by time)
messageTimes = defaultdict(list)
for row in cdata: # tool, time, module, message
    if self.message in row[3]:
        messageTimes[row[0]].append(row[1])
- Then, in the determine function, I used bisect:
def determine(tup):
    if self.message in tup[3]: return True # matched message
    times = messageTimes[tup[0]]
    le = bisect.bisect_right(times, tup[1])  # times[le-1] is the last time <= tup[1]
    ge = bisect.bisect_left(times, tup[1])   # times[ge] is the first time >= tup[1]
    return (le and tup[1]-times[le-1] <= tdiff) or (ge != len(times) and times[ge]-tup[1] <= tdiff)
With these changes the code that was taking over 2 hours took under 20 minutes, and even better, a query that was taking 40 minutes took 8 seconds!
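For anyone who wants to try the bisect idea in isolation, here is a minimal standalone sketch. It uses plain numbers in place of datetimes, and near_any is a hypothetical name; it checks only the two neighbours of the insertion point instead of scanning every matched time.

```python
import bisect


def near_any(times, t, tdiff):
    """Return True if the sorted list `times` has a value within tdiff of t."""
    i = bisect.bisect_left(times, t)
    # times[i] is the first value >= t; times[i - 1] is the last value < t.
    if i < len(times) and times[i] - t <= tdiff:
        return True
    if i > 0 and t - times[i - 1] <= tdiff:
        return True
    return False


matched_times = [10, 50, 90]
hit = near_any(matched_times, 55, 5)
miss = near_any(matched_times, 70, 5)
```

Each lookup is a binary search, so the cost per row grows with the logarithm of the number of matched times rather than linearly.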

I made 2 more changes and now that 20 minute query is taking 3 minutes:
found = defaultdict(int) # per-tool index where the previous search ended
def determine(tup):
    if self.message in tup[3]: return True # matched message
    times = messageTimes[tup[0]]
    idx = found[tup[0]]
    le = bisect.bisect_right(times, tup[1], idx)
    found[tup[0]] = le # resume from here next time; assumes cdata is in ascending time order
    return (le and tup[1]-times[le-1] <= tdiff) or (le != len(times) and times[le]-tup[1] <= tdiff)

Python timedelta receiving unexpected results

In a web application I'm writing for an existing database I need to calculate the difference between now and a timestamp stored in the database (in a text field, it's stupid, I know). Here's my sqlalchemy Ban class and the relevant method.
class Ban(base):
    __tablename__= 'bans'
    id= Column('banID', Integer, primary_key=True)
    unban_timestamp= Column('UNBAN', Text)
    banned_steamid= Column('ID', Text, ForeignKey('rp_users.steamid'))
    banned_name= Column('NAME', Text)
    banner_steamid= Column('BANNER_STEAMID', Text, ForeignKey('rp_users.steamid'))
    banner_name= Column('BANNER', Text)
    reason= Column('REASON', Text)
    def unbanned_in(self, mode):
        if self.unban_timestamp == '0':
            return 'Never'
        else:
            now= datetime.datetime.utcnow()
            time= datetime.datetime.fromtimestamp(int(self.unban_timestamp))
            if now > time:
                return 'Expired'
            else:
                remaining= time - now
                if mode=='readable':
                    return remaining
                elif mode=='int':
                    return str(remaining.seconds).zfill(10)
I need both the integer and the pretty string representations because I'll be presenting this in a html table and javascript needs a simple way to sort it. The problem I'm facing is that the integers and strings are not matching up, as you can see in this screenshot here:
If anyone can make sense of why the output is so screwed up, that would be appreciated! If there's any more information that you need to answer my question, I'll gladly add it.
Edit
For the record, at the top of the screenshot the unbanned_in timestamp is 1320247970. If I run that through my function, this is the result I get:
>>> ban = session.query(db.Ban).filter_by(id=3783).one()
>>> print ban.unbanned_in('int'), ban.unbanned_in('readable')
0000049044 2 days, 13:37:24.179045

If you want to get the number of seconds between time and now, use
remaining.days * 24 * 3600 + remaining.seconds
instead of just remaining.seconds
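To illustrate with the values from the question's edit: a timedelta stores days and seconds separately, so .seconds only ever holds the part smaller than one day. The snippet below rebuilds that timedelta by hand; timedelta.total_seconds() is the built-in way to get the same full figure.

```python
from datetime import timedelta

# The remaining time printed in the question: 2 days, 13:37:24.179045
remaining = timedelta(days=2, hours=13, minutes=37, seconds=24, microseconds=179045)

partial = remaining.seconds                              # excludes the 2 days
full = remaining.days * 24 * 3600 + remaining.seconds    # includes them
also_full = int(remaining.total_seconds())               # same value, built in

sortable = str(full).zfill(10)
```

Here partial is the 49044 that showed up in the question's output, which is why the integer column did not agree with the readable one.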

The output is screwed up because your calculation to convert a number of seconds to days, hours, minutes and seconds is wrong. Since you didn't post that bit of code that's all I can say, but note that there are 86400 seconds in a day and all of the counts of seconds you output are smaller than this.
The values you output for hours, minutes and seconds look fine, just your calculation for days is wrong.
