Limit the rate of requests to an API in a CGI script - python

The script that I'm writing sometimes makes requests to an API and the API requires that requests are limited to a maximum of 1 per second.
What is the most straight forward way of limiting my requests to the API to 1 every second?
Would it involve storing the current time in a file each time a request is made?

You could use a separate thread for the CGI calls and a queuing mechanism that loops with a call to sleep on each iteration.
From 15.3. time
time.sleep(secs)
Suspend execution for the given number of seconds. The argument may be a floating point number to indicate a more precise sleep time. The actual suspension time may be less than that requested because any caught signal will terminate the sleep() following execution of that signal’s catching routine. Also, the suspension time may be longer than requested by an arbitrary amount because of the scheduling of other activity in the system.

One can use a rate-limiting python decorator on the function one wishes to rate-limit, like this one from Greg Burek:
import time
def RateLimited(maxPerSecond):
minInterval = 1.0 / float(maxPerSecond)
def decorate(func):
lastTimeCalled = [0.0]
def rateLimitedFunction(*args,**kargs):
elapsed = time.clock() - lastTimeCalled[0]
leftToWait = minInterval - elapsed
if leftToWait>0:
time.sleep(leftToWait)
ret = func(*args,**kargs)
lastTimeCalled[0] = time.clock()
return ret
return rateLimitedFunction
return decorate
#RateLimited(2) # 2 per second at most
def PrintNumber(num):
print num
if __name__ == "__main__":
print "This should print 1,2,3... at about 2 per second."
for i in range(1,100):
PrintNumber(i)

Related

Python most accurate method to measure time (ms)

I need to meassure the time certain parts of my code take. While executing my code on a powerfull server, I get 10 diffrent results
I tried comparing time measured with time.time(), time.perf_counter(), time.perf_counter_ns(), time.process_time() and time.process_time_ns().
import time
for _ in range(10):
start = time.perf_counter()
i = 0
while i < 100000:
i = i + 1
time.sleep(1)
end = time.perf_counter()
print(end - start)
I'm expecting when executing the same code 10 times, to be the same (the results to have a resolution of at least 1ms) ex. 1.041XX and not 1.030sec - 1.046sec.
When executing my code on a 16 cpu, 32gb memory server I'm receiving this result:
1.045549364
1.030857833
1.0466020120000001
1.0309665050000003
1.0464690349999994
1.046397238
1.0309525370000001
1.0312070380000007
1.0307592159999999
1.046095523
Im expacting the result to be:
1.041549364
1.041857833
1.0416020120000001
1.0419665050000003
1.0414690349999994
1.041397238
1.0419525370000001
1.0412070380000007
1.0417592159999999
1.041095523
Your expectations are wrong. If you want to measure code average time consumption use the timeit module. It executes your code multiple times and averages over the times.
The reason your code has different runtimes lies in your code:
time.sleep(1) # ensures (3.5+) _at least_ 1000ms are waited, won't be less, might be more
You are calling it in a tight loop,resulting in accumulated differences:
Quote from time.sleep(..) documentation:
Suspend execution of the calling thread for the given number of seconds. The argument may be a floating point number to indicate a more precise sleep time. The actual suspension time may be less than that requested because any caught signal will terminate the sleep() following execution of that signal’s catching routine. Also, the suspension time may be longer than requested by an arbitrary amount because of the scheduling of other activity in the system.
Changed in version 3.5: The function now sleeps at least secs even if the sleep is interrupted by a signal, except if the signal handler raises an exception (see PEP 475 for the rationale).
Emphasis mine.
Perfoming a code do not take the same time at each loop iteration because of the scheduling of the system (system puts on hold your process to perform another process then back to it...).

Sleeping for the remaining time

I'm trying to run a method every minute.
The method does some operations on the internet so it might take anywhere from 1 second to 30 seconds.
What I want to do is calculate the time spent by this method and then sleep for the remaining time, to make sure that the method itself runs every minute.
Currently my code looks like this:
def do_operation():
access_db()
sleep(60)
As you can see this does not take into account the delay whatsoever, and although it works, it will at some point fail and skip a minute completely, which should never happen.
import time
def do_operation():
start = time.time()
access_db()
time.sleep(60-time.time()+start)
This code will allow you to run a callable in defined intervals:
import time
import random
def recurring(interval, callable):
i = 0
start = time.time()
while True:
i += 1
callable()
remaining_delay = max(start + (i * interval) - time.time(), 0)
time.sleep(remaining_delay)
def tick_delay():
print('tick start')
time.sleep(random.randrange(1, 4))
print('tick end')
recurring(5, tick_delay)
Notes
The function tick_delay sleeps for some seconds to simulate a function which can take an undefined amount of time.
If the callable takes longer than the defined loop interval, the next iteration will be scheduled immediately after the last ended. To have the callable run in parallel you need to use threading or asyncio

How to profile/benchmark Python ASYNCIO code?

How do I profile/benchmark an assynchronous Python script (which uses ASYNCIO)?
I you would usualy do
totalMem = tracemalloc.get_traced_memory()[0]
totalTime = time.time()
retValue = myFunction()
totalTime = time.time() - totalTime
totalMem = tracemalloc.get_traced_memory()[0] - totalMem
This way I would save the total time taken by the function.
I learned how to use decorators and I did just that - and dumped all stats into a text file for later analysis.
But, when you have ASYNCIO script, things get pretty different: the function will block while doing an "await aiohttpSession.get()", and control will go back to the event loop, which will run other functions.
This way, the elapsed time and changes in total allocated memory won't reveal anything, because I will have measured more than just that function.
The only way it would work would be something like
class MyTracer:
def __init__(self):
self.totalTime = 0
self.totalMem = 0
self.startTime = time.time()
self.startMem = tracemalloc.get_traced_memory()[0]
def stop(self):
self.totalTime += time.time() - self.startTime
self.totalMem += tracemalloc.get_traced_memory()[0] - self.startMem
def start(self):
self.startTime = time.time()
self.startMem = tracemalloc.get_traced_memory()[0]
And now, somehow, insert it in the code:
def myFunction():
tracer = MyTracer()
session = aiohttp.ClientSession()
# do something
tracer.stop()
# the time elapsed here, and the changes in the memory allocation, are not from the current function
retValue = await(await session.get('https://hoochie-mama.org/cosmo-kramer',
headers={
'User-Agent': 'YoYo Mama! v3.0',
'Cookies': 'those cookies are making me thirsty!',
})).text()
tracer.start()
# do more things
tracer.stop()
# now "tracer" has the info about total time spent in this function, and the memory allocated by it
# (the memory stats could be negative if the function releases more than allocates)
Is there a way to accomplish this, I mean, profile all my asyncio code without having to insert all this code?
Or is there a module already capable of doing just that?
Check out Yappi profiler which has support for coroutine profiling. Their page on coroutine profiling describes the problem you're facing very clearly:
The main issue with coroutines is that, under the hood when a coroutine yields or in other words context switches, Yappi receives a return event just like we exit from the function. That means the time spent while the coroutine is in yield state does not get accumulated to the output. This is a problem especially for wall time as in wall time you want to see whole time spent in that function or coroutine. Another problem is call count. You see every time a coroutine yields, call count gets incremented since it is a regular function exit.
They also describe very high level how Yappi solves this problem:
With v1.2, Yappi corrects above issues with coroutine profiling. Under the hood, it differentiates the yield from real function exit and if wall time is selected as the clock_type it will accumulate the time and corrects the call count metric.

How to use sleep effectively in python 2.7 without increasing the active threads

I need to recursively call API request if an error occurs after making it sleep for few seconds. But I cannot allow to sleep because only limited number of active threads are allowed in Apache. So there comes situation like all active threads are in the sleep state. How can I handle this situation
currently, I am using like
from time import sleep
def api_request(flag=0):
time_delay = 2**flag
flag = flag + 1
result = apicall()
if result['status']="succes":
return(result)
else:
if flag < 5:
sleep(time_delay)
api_request(flag)
else:
print "api request failed"
I am looking alternative solution that won't sleep the thread but it calls the function after few seconds

Control the speed of a loop

I'm currently reading physics in the university, and im learning python as a little hobby.
To practise both at the same time, i figured I'll write a little "physics engine" that calculates the movement of an object based on x,y and z coordinates. Im only gonna return the movement in text (at least for now!) but i want the position updates to be real-time.
To do that i need to update the position of an object, lets say a hundred times a second, and print it back to the screen. So every 10 ms the program prints the current position.
So if the execution of the calculations take 2 ms, then the loop must wait 8ms before it prints and recalculate for the next position.
Whats the best way of constructing a loop like that, and is 100 times a second a fair frequency or would you go slower, like 25 times/sec?
The basic way to wait in python is to import time and use time.sleep. Then the question is, how long to sleep? This depends on how you want to handle cases where your loop misses the desired timing. The following implementation tries to catch up to the target interval if it misses.
import time
import random
def doTimeConsumingStep(N):
"""
This represents the computational part of your simulation.
For the sake of illustration, I've set it up so that it takes a random
amount of time which is occasionally longer than the interval you want.
"""
r = random.random()
computationTime = N * (r + 0.2)
print("...computing for %f seconds..."%(computationTime,))
time.sleep(computationTime)
def timerTest(N=1):
repsCompleted = 0
beginningOfTime = time.clock()
start = time.clock()
goAgainAt = start + N
while 1:
print("Loop #%d at time %f"%(repsCompleted, time.clock() - beginningOfTime))
repsCompleted += 1
doTimeConsumingStep(N)
#If we missed our interval, iterate immediately and increment the target time
if time.clock() > goAgainAt:
print("Oops, missed an iteration")
goAgainAt += N
continue
#Otherwise, wait for next interval
timeToSleep = goAgainAt - time.clock()
goAgainAt += N
time.sleep(timeToSleep)
if __name__ == "__main__":
timerTest()
Note that you will miss your desired timing on a normal OS, so things like this are necessary. Note that even with asynchronous frameworks like tulip and twisted you can't guarantee timing on a normal operating system.
Since you cannot know in advance how long each iteration will take, you need some sort of event-driven loop. A possible solution would be using the twisted module, which is based on the reactor pattern.
from twisted.internet import task
from twisted.internet import reactor
delay = 0.1
def work():
print "called"
l = task.LoopingCall(work)
l.start(delay)
reactor.run()
However, as has been noted, don't expect a true real-time responsiveness.
A piece of warning. You may not expect a real time on a non-realtime system. The sleep family of calls guarantees at least a given delay, but may well delay you for more.
Therefore, once you returned from sleep, query current time, and make the calculations into the "future" (accounting for the calculation time).

Categories