How to advance the clock and go through all the events - python

Reading this answer (point 2) to a question related to Twisted's task.Clock for testing purposes, I found it very weird that there is no way to advance the clock from t0 to t1 while catching all the callLater calls within t0 and t1.
Of course, you could solve this problem by doing something like:
clock = task.Clock()
reactor.callLater = clock.callLater
...
def advance_clock(total_elapsed, step=0.01):
    elapsed = 0
    while elapsed < total_elapsed:
        clock.advance(step)
        elapsed += step
...
time_to_advance = 10  # seconds
advance_clock(time_to_advance)
But then we have shifted the problem toward choosing a sufficiently small step, which could be very tricky for callLater calls that sample the time from a probability distribution, for instance.
Can anybody think of a solution to this problem?

I found it very weird that there is no way to advance the clock from t0 to t1 while catching all the callLater calls within t0 and t1.
Based on what you wrote later in your question, I'm going to suppose that the case you're pointing out is the one demonstrated by the following example program:
from twisted.internet.task import Clock

def foo(reactor, n):
    if n == 0:
        print("Done!")
    reactor.callLater(1, foo, reactor, n - 1)

reactor = Clock()
foo(reactor, 10)
reactor.advance(10)
One might expect this program to print Done! but it does not. If the last line is replaced with:
for i in range(10):
    reactor.advance(1)
Then the resulting program does print Done!.
The reason Clock works this way is that it's exactly the way real clocks work. As far as I know, there are no computer clocks that operate with a continuous time system. I won't say it is impossible to implement a timed-event system on top of a clock with discrete steps such that it appears to offer a continuous flow of time - but I will say that Twisted makes no attempt to do so.
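The discrete-step behaviour can be modelled with a toy clock. The sketch below is a minimal illustration, not Twisted's actual implementation: its advance() only runs the calls that were already scheduled when the step began, so calls scheduled by those callbacks must wait for the next advance(), even when their due time falls inside the step just taken.

```python
class ToyClock:
    """Minimal discrete clock (an illustration, not Twisted's Clock)."""

    def __init__(self):
        self.now = 0.0
        self.calls = []  # list of (due_time, fn, args)

    def call_later(self, delay, fn, *args):
        self.calls.append((self.now + delay, fn, args))

    def advance(self, amount):
        self.now += amount
        # Snapshot what is due *now*; callbacks scheduled while these
        # run must wait for the next advance(), even if their due time
        # falls inside the step we just took.
        due = sorted((c for c in self.calls if c[0] <= self.now),
                     key=lambda c: c[0])
        self.calls = [c for c in self.calls if c[0] > self.now]
        for _, fn, args in due:
            fn(*args)
```

With this toy clock, a countdown like the foo example above finishes only when advance() is called once per step; a single advance(10) runs just the first pending callback.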
The only real difference between Clock and the real reactor implementations is that with Clock you can make the time-steps much larger than you are likely to encounter in typical usage of a real reactor.
However, it's quite possible for a real reactor to get into a situation where a very large chunk of time all passes in one discrete step. This could be because the system clock changes (there's some discussion of making it possible to schedule events independent of the system clock so that this case goes away) or it could be because some application code blocked the reactor for a while (actually, application code always blocks the reactor! But in typical programs it only blocks it for a period of time short enough for most people to ignore).
Giving Clock a way to mimic these large steps makes it possible to write tests for what your program does when one of these cases arises. For example, perhaps you really care that, when the kernel decides not to schedule your program for 2.1 seconds because of a weird quirk in the Linux I/O elevator algorithm, your physics engine nevertheless computes 2.1 seconds of physics even though 420 calls of your 200Hz simulation loop have been skipped.
It might be fair to argue that the default (standard? only?) time-based testing tool offered by Twisted should be somewhat more friendly towards the common case... Or not. Maybe that would encourage people to write programs that only work in the common case and break in the real world when the uncommon (but, ultimately, inevitable) case arises. I'm not sure.
Regarding Mike's suggestion to advance exactly to the next scheduled call, you can do this easily and without hacking any internals. clock.advance(clock.getDelayedCalls()[0].getTime() - clock.seconds()) will do exactly this (perhaps you could argue Clock would be better if it at least offered an obvious helper function for this to ease testing of the common case). Just remember that real clocks do not advance like this so if your code has a certain desirable behavior in your unit tests when you use this trick, don't be fooled into thinking this means that same desirable behavior will exist in real usage.

Given that the typical use-case for Twisted is to mix hardware events and timers, I'm confused why you would want to do this, but...
My understanding is that internally Twisted tracks callLater events via a number of lists inside the reactor object (see: http://twistedmatrix.com/trac/browser/tags/releases/twisted-15.2.0/twisted/internet/base.py#L437 - the xxxTimedCalls lists inside of class ReactorBase).
I haven't done any work to figure out if those lists are exposed anywhere, but if you want to take the reactor's life into your own hands I'm sure you could hack your way in.
With access to the timing lists you could simply forward time to whenever the next element of the list is due... though if you're trying to test code that interacts with IO events, I can't imagine this is going to do anything but confuse you...
Best of luck

Here's a function that will advance the reactor to the next IDelayedCall by iterating over reactor.getDelayedCalls. This has the problem Mike mentioned of not catching IO events, so you can specify a minimum and maximum time that it should wait, as well as a maximum time step.
def advance_through_delayeds(reactor, min_t=None, max_t=None, max_step=None):
    elapsed = 0
    while True:
        if max_t is not None and elapsed >= max_t:
            break
        try:
            step = min(d.getTime() - reactor.seconds() for d in reactor.getDelayedCalls())
        except ValueError:
            # nothing else pending
            if min_t is not None and elapsed < min_t:
                step = min_t - elapsed
            else:
                break
        if max_step is not None:
            step = min(step, max_step)
        if max_t is not None:
            step = min(step, max_t - elapsed)
        reactor.advance(step)
        elapsed += step
    return elapsed
If you need to wait for some I/O to complete, then set min_t and max_step to reasonable values.
# wait at least 10s, advancing the reactor by no more than 0.1s at a time
advance_through_delayeds(reactor, min_t=10, max_step=0.1)
If min_t is set, it will exit once getDelayedCalls returns an empty list after that time is reached.
It's probably a good idea to always set max_t to a sane value to prevent the test suite from hanging. For example, on the foo function above by JPC, it does reach the print("Done!") statement, but would then hang forever, as the callback chain never completes.

Speed up my API calls using Python THREADS

So this is what I currently have. This code makes about 5,000 calls to the NBA API and returns the total Games Played and Points Scored of every NBA player who has ever played in the playoffs. The players (names as keys, stats as values) are all added to the 'stats_dict' dictionary.
MY QUESTION IS THIS: does anybody know how I could significantly increase the speed of this process by using threading? Right now, it takes about 30 minutes to make all these API calls, which of course I would love to significantly improve upon. I've never used threads before and would appreciate any guidance.
Thanks
import pandas as pd
from nba_api.stats.endpoints import commonallplayers
from nba_api.stats.endpoints import playercareerstats
import numpy as np

player_data = commonallplayers.CommonAllPlayers(timeout=30)
player_df = player_data.common_all_players.get_data_frame().set_index('PERSON_ID')
id_list = player_df.index.tolist()

def playoff_stats(person_id):
    player_stats = playercareerstats.PlayerCareerStats(person_id, timeout=30)
    yield player_stats.career_totals_post_season.get_data_frame()[['GP', 'PTS']].values.tolist()

stats_dict = {}

def run_it():
    for i in id_list:
        try:
            stats_call = next(playoff_stats(i))
            if len(stats_call) > 0:
                stats_dict[player_df.loc[i]['DISPLAY_FIRST_LAST']] = [stats_call[0][0], stats_call[0][1]]
        except KeyError:
            continue
You're asking the wrong question. The real question is: why is my program taking 30 minutes?
In other words, where is my program spending time? What is it doing that's taking so long?
You can speed up a program by using threads ONLY if these two things are true:
The program is spending a significant fraction of its time waiting on some external resource (the internet or a printer, for example)
There is something useful that it could do in another thread while it's waiting
It is far from clear whether both of those things are true in your case.
Check out the time module in the standard Python library. If you go through your code and insert print(time.time()) statements at critical points, you will quickly see where the program is spending its time. Until you figure that out, you might be totally wasting your effort by writing a threaded version.
By the way, there are more sophisticated ways to get a handle on a program's performance, but your program is so incredibly slow that a few simple print statements should point you toward a better understanding.
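The print(time.time()) idea can be sketched with a small helper; the timed wrapper and label below are illustrative, not from the answer:

```python
import time

def timed(label, fn, *args, **kwargs):
    # Wrap a single call and print how long it took.
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - start:.3f}s")
    return result

# e.g. in the loop above: stats_call = timed("API call", next, playoff_stats(i))
```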
Firstly, as others have mentioned, your program is not particularly optimized, which should be your number one step. I would recommend debugging it using some print statements or measuring run time (How to measure time taken between lines of code in python?).
Another possible solution that is a little more brute force is concurrent.futures. This can help to run a lot of things at once, but once again it won't matter if your code isn't optimized as you'll just be running unoptimized code a lot.
This link is for web scraping, but it might be helpful.
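As a minimal sketch of the concurrent.futures pattern: fake_api_call below is a stand-in for the real nba_api request, and the worker count is just an example, not a tuned value.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fake_api_call(player_id):
    # Stand-in for one network request: sleeps to simulate latency.
    time.sleep(0.1)
    return player_id, player_id * 2

ids = list(range(20))

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    # map() preserves input order and runs the calls on worker threads
    results = dict(pool.map(fake_api_call, ids))
elapsed = time.perf_counter() - start
# 20 calls of 0.1s each finish in roughly 0.2s with 10 workers,
# instead of roughly 2s when run serially
```

Because the time here is spent waiting on I/O (during which the GIL is released), threads give a near-linear speedup; CPU-bound work would not benefit in the same way.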

Is it possible to force a 2 second looping callback in Python?

I'm trying to get a looping call to run every 2 seconds. Sometimes I get the desired functionality, but other times I have to wait up to ~30 seconds, which is unacceptable for my application's purposes.
I reviewed this SO post and found that looping call might not be reliable for this by default. Is there a way to fix this?
My usage/reason for needing a consistent ~2 seconds:
The function I am calling scans an image (using CV2) for a dollar value and if it finds that amount it sends a websocket message to my point of sale client. I can't have customers waiting 30 seconds for the POS terminal to ask them to pay.
My source code is very long and not well commented as of yet, so here is a short example of what I'm doing:
from twisted.internet import reactor
from twisted.internet.task import LoopingCall

# scan the image for sales every 2 seconds
def scanForSale():
    print("Now Scanning for sale requests")

# retrieve a new image every 2 seconds
def getImagePreview():
    print("Loading Image From Capture Card")

lc = LoopingCall(scanForSale)
lc.start(2)
lc2 = LoopingCall(getImagePreview)
lc2.start(2)
reactor.run()
I'm using a Raspberry Pi 3 for this application, which is why I suspect it hangs for so long. Can I utilize multithreading to fix this issue?
Raspberry Pi is not a real time computing platform. Python is not a real time computing language. Twisted is not a real time computing library.
Any one of these by itself is enough to eliminate the possibility of a guarantee that you can run anything once every two seconds. You can probably get close but just how close depends on many things.
The program you included in your question doesn't actually do much. If this program can't reliably print each of the two messages once every two seconds then presumably you've overloaded your Raspberry Pi - a Linux-based system with multitasking capabilities. You need to scale back your usage of its resources until there are enough available to satisfy the needs of this (or whatever) program.
It's not clear whether multithreading will help - however, I doubt it. It's not clear because you've only included an over-simplified version of your program. I would have to make a lot of wild guesses about what your real program does in order to think about making any suggestions of how to improve it.
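If profiling does show that the scan work itself is what blocks the loop, one generic pattern (plain stdlib threading, a sketch rather than Twisted's own API) is to run the periodic work on a dedicated thread against absolute deadlines, so one slow call does not push every later call back cumulatively:

```python
import threading
import time

def every(interval, fn, stop_event):
    # Fire fn about every `interval` seconds, scheduling against
    # absolute deadlines so delays do not accumulate.
    next_t = time.monotonic() + interval
    while not stop_event.is_set():
        stop_event.wait(max(0.0, next_t - time.monotonic()))
        if stop_event.is_set():
            break
        fn()
        next_t += interval

# usage sketch:
# stop = threading.Event()
# threading.Thread(target=every, args=(2.0, scanForSale, stop), daemon=True).start()
```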

python: run tasks at different precise times without busy loop

Input: array of float time values (in seconds) relative to program start. [0.452, 0.963, 1.286, 2.003, ... ]. They are not evenly spaced apart.
Desired Output: Output text to console at those times (i.e. printing '#')
My question is what is the best design principle to go about this. Below is my naive solution using time.time.
import time

times = [0.452, 0.963, 1.286, 2.003]
start_time = time.time()
for event_time in times:
    while 1:
        if time.time() - start_time >= event_time:
            print('#')
            break
The above feels intuitively wrong using that busy loop (even if its in its own thread).
I'm leaning towards scheduling but want to make sure there aren't better design options: Executing periodic actions in Python
There is also the timer object: timers
Edit: Events only need 10ms precision, so +/- 10ms from exact event time.
A better pattern than busy waiting might be to use time.sleep(). This suspends execution rather than using the CPU.
import time

time_diffs = [0.452, 0.511, 0.323, 0.717]
for diff in time_diffs:
    time.sleep(diff)
    print('#')
Threading can also be used to similar effect. However, both of these solutions only work if the action you want to perform at each event time takes negligible time (perhaps not true of printing).
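One way around that caveat (a sketch, not from the original answer) is to sleep toward absolute deadlines computed from the start time, so the cost of each action is subtracted from the next sleep instead of being added to every later event:

```python
import time

times = [0.452, 0.963, 1.286, 2.003]
start = time.monotonic()
for event_time in times:
    # Sleep only for whatever time remains until this event's deadline;
    # a slow action shortens the next sleep instead of shifting it.
    remaining = event_time - (time.monotonic() - start)
    if remaining > 0:
        time.sleep(remaining)
    print('#')
```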
That being said, no pattern is going to work if you are after 10ms precision and want to use Python on a standard OS. I recommend this question on Real time operating via Python, which explains that GUI events (i.e. printing to a screen) are too slow and unreliable for that level of precision, that the typical OSs where Python runs do not guarantee that level of precision, and that Python's garbage collection and memory management also play havoc if you want 'real-time' events.

Python minecraft pyglet glClear() skipping frames

I recently downloaded fogleman's excellent "Minecraft in 500 lines" demo from https://github.com/fogleman/Craft. I used the 2to3 tool and corrected some details by hand to make it runnable under Python 3. I am now wondering about the call to self.clear() in the render method. This is my modified rendering method, which is called every frame by pyglet:
def on_draw(self):
    """ Called by pyglet to draw the canvas.
    """
    frameStart = time.time()
    self.clear()
    clearTime = time.time()
    self.set_3d()
    glColor3d(1, 1, 1)
    self.model.batch.draw()
    self.draw_focused_block()
    self.set_2d()
    self.draw_label()
    self.draw_reticle()
    renderTime = time.time()
    self.clearBuffer.append(str(clearTime - frameStart))
    self.renderBuffer.append(str(renderTime - clearTime))
As you can see, I took the execution times of self.clear() and the rest of the rendering method. The call of self.clear() calls this method of pyglet, that can be found at .../pyglet/window/__init__.py:
def clear(self):
    '''Clear the window.

    This is a convenience method for clearing the color and depth
    buffer. The window must be the active context (see `switch_to`).
    '''
    gl.glClear(gl.GL_COLOR_BUFFER_BIT | gl.GL_DEPTH_BUFFER_BIT)
So I basically make a call to glClear().
I noticed some frame drops while testing the game (at 60 FPS), so I added the above code to measure the execution time of the commands, and especially that one of glClear(). I found out that the rendering itself never takes longer than 10 ms. But the duration of glClear() is a bit of a different story, here is the distribution for 3 measurements under different conditions:
[Figure: duration of glClear() under different conditions]
The magenta lines show the time limit of a frame. So everything behind the first line means there was a frame drop.
The execution time of glClear() seems to have some kind of "echo" after the first frame expires. Can you explain why? And how can I make the call faster?
Unfortunately I am not an OpenGL expert, so I am thankful for any advice. ;)
Your graph is wrong. Well, at least it's not a suitable graph for the purpose of measuring performance. Don't ever trust a gl* function to execute when you tell it to, and don't ever trust it to execute as fast as you'd expect it to.
Most gl* functions aren't executed right away; they couldn't be. Remember, we're dealing with the GPU, and telling it to do stuff directly is slow. So, instead, we write a to-do list (a command queue) for the GPU, and dump it into VRAM when we really need the GPU to output something. This "dump" is part of a process called synchronisation, and we can trigger one with glFlush. Though OpenGL is user friendly (compared to, say, Vulkan, at least), and as such it doesn't rely on us to explicitly flush the command queue. Many gl* functions (exactly which ones depends on your graphics driver) will implicitly synchronise the CPU and GPU state, which includes a flush.
Since glClear usually initiates a frame, it is possible that your driver thinks it'd be good to perform such an implicit synchronisation. As you might imagine, synchronisation is a very slow process and blocks CPU execution until it's finished.
And this is probably what's going on here. Part of the synchronisation process is to perform memory transactions (like glBufferData, glTexImage*), which are probably queued up until they're flushed with your glClear call. Which makes sense in your example; the spikes that we can observe are probably the frames after you've uploaded a lot of block data.
But bear in mind, this is just pure speculation, and I'm not a total expert on this sort of stuff either, so don't trust me on the exact details. This page on the OpenGL wiki is a good resource on this sort of thing.
Though one thing is certain, your glClear call does not take as long as your profiler says it does. You should be using a profiler dedicated to profiling graphics.

How to add random delays between the queries sent to Google to avoid getting blocked in python

I have written a program which sends more than 15 queries to Google in each iteration, for about 50 iterations in total. For testing, I have to run this program several times; after a while, Google blocks me. Is there any way I can fool Google, maybe by adding delays between each iteration? I have also heard that Google can learn the timesteps, so I need these delays to be random so that Google cannot find a pattern in them and learn my behavior. They should also be short, so the whole process doesn't take too long.
Does anyone know something, or can you provide me a piece of code in Python?
Thanks
First, Google is probably blocking you because they don't like it when you take too many of their resources. The best way to fix this is to slow it down, not delay randomly. Stick a 1 second wait after every request and you'll probably stop having problems.
That said:
from random import randint
from time import sleep
sleep(randint(10,100))
will sleep a random number of seconds (between 10 and 100).
Best to use:
from numpy import random
from time import sleep
sleeptime = random.uniform(2, 4)
print("sleeping for:", sleeptime, "seconds")
sleep(sleeptime)
print("sleeping is over")
as a start, and slowly decrease the range to see what works best (fastest).
Since you're not testing Google's speed, figure out some way to simulate it when doing your testing (as @bstpierre suggested in his comment). This should solve your problem and factor its variable response times out at the same time.
You can also try using a few proxy servers to prevent a ban by IP address. urllib supports proxies via a special constructor parameter, and httplib can use a proxy too.
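As a sketch of the urllib route in modern Python (the proxy address below is a placeholder from the documentation range, not a working proxy):

```python
import urllib.request

# Placeholder proxy address: substitute one of your own proxies.
proxies = {
    "http": "http://203.0.113.1:8080",
    "https": "http://203.0.113.1:8080",
}
opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
# opener.open("http://example.com")  # requests would now go through the proxy
```

Rotating through several such openers spreads the queries across IP addresses, though it does not change how many requests Google receives overall.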
For anyone stumbling here for the general "how to add random delay to my routine" case in 2022, numpy's recommended method [1] is to use their random number generator class:
from numpy.random import default_rng
from time import sleep
rng = default_rng()
# generates a scalar [single] value greater than or equal to 1
# but less than 3
time_to_sleep = rng.uniform(1, 3)
sleep(time_to_sleep)
[1] https://numpy.org/doc/stable/reference/random/index.html#quick-start
