I have a Python library which must be fast enough for an online application. If a particular request (function call) takes too long, I want to bypass that request and return an empty result.
The function looks like the following:
def fast_function(text):
    result = mylibrary.process(text)
    ...
If mylibrary.process takes longer than a threshold, e.g. 100 milliseconds, I want to bypass this request and proceed to the next 'text'.
What's the normal way to handle this? Is this a normal scenario? My application can afford to bypass a small number of requests like this if they take too long.
One way is to use a signal timer. As an example:
import signal
def took_too_long(signum, frame):  # a signal handler receives (signum, frame)
    raise TimeoutError
signal.signal(signal.SIGALRM, took_too_long)
signal.setitimer(signal.ITIMER_REAL, 0.1)  # 0.1 seconds
try:
    result = mylibrary.process(text)
    signal.setitimer(signal.ITIMER_REAL, 0)  # success, reset to 0 to disable the timer
except TimeoutError:
    # took too long, do something
    pass
You'll have to experiment to see if this does or does not add too much overhead.
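Putting it together with the bypass behaviour from the question, a minimal sketch might look like the following (mylibrary.process is the function from the question; returning an empty string as the "empty result" is an assumption, and SIGALRM only works in the main thread on Unix):
import signal
def _took_too_long(signum, frame):
    raise TimeoutError
def fast_function(text, limit=0.1):
    signal.signal(signal.SIGALRM, _took_too_long)  # main thread only, Unix only
    signal.setitimer(signal.ITIMER_REAL, limit)    # arm the timer (default 100 ms)
    try:
        return mylibrary.process(text)
    except TimeoutError:
        return ''                                  # bypass: hand back an empty result
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)    # always disarm the timer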
You can add a timeout to your function.
One way to implement it is to use a timeout decorator, which will throw an exception if the function runs for more than the defined timeout. To move on to the next operation, you can catch the exception thrown on timeout.
Install this one for example: pip install timeout-decorator
import timeout_decorator
@timeout_decorator.timeout(5)  # timeout of 5 seconds
def fast_function(text):
    result = mylibrary.process(text)
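To actually bypass the slow request and move on, catch the timeout around the call; a minimal sketch, assuming the decorator raises its default timeout_decorator.TimeoutError:
try:
    result = fast_function(text)
except timeout_decorator.TimeoutError:
    result = None  # took too long; bypass and move to the next text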
Related
I'm desperate.
My code reads every nth frame from videos; sometimes the code just stops for no reason, with no error.
So I decided to somehow raise an error.
The thing is, the code does raise an error, but then it ignores it for some reason and just carries on as normal.
I've provided a code block below in which exactly the same method works.
handler:
def handler(signum, frame):
    print("error")  ## This is printed
    raise Exception('time out')  ## I guess this is getting raised
Code part I want to wrap:
for i in range(0, int(frame_count), nframe):  # basically loads every nth frame from the video
    try:
        frame = video.set(1, i)
        signal.signal(signal.SIGALRM, handler)
        signal.alarm(1)  # At this point, the 'handler' did raise the error, but it did not kill this 'try' block.
        _n, frame = video.read()  # This line sometimes hangs for an infinite amount of time, and I want to wrap it
    except Exception as e:
        print('test')  # Code does not get here, yet the 'handler' does raise an exception
        raise e
# Here I need to return False, or raise an error, but the code just does not get here.
An example where exactly the same method will work:
import signal
import time
def handler(signum, frame):
    raise Exception('time out')
def function():
    try:
        signal.signal(signal.SIGALRM, handler)
        signal.alarm(5)  # 5 seconds till raise
        time.sleep(10)  # does not get here; an Exception is raised after 5 seconds
    except Exception as e:
        raise e  # This will indeed work
My guess is that the read() call is blocked somewhere inside C code. The signal handler runs, puts an exception into the Python interpreter somewhere, but the exception isn't handled until the Python interpreter regains control. This is a limitation documented in the signal module:
A long-running calculation implemented purely in C (such as regular expression matching on a large body of text) may run uninterrupted for an arbitrary amount of time, regardless of any signals received. The Python signal handlers will be called when the calculation finishes.
One possible workaround is to read frames on a separate process using the multiprocessing module, and return them to the main process using a multiprocessing.Queue (from which you can get with a timeout). However, there will be extra overhead in sending the frames between processes.
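A rough sketch of that workaround (the file name, queue size, and 1-second limit are placeholders, not the asker's actual values):
import multiprocessing as mp
import queue
import cv2 as cv

def frame_reader(path, q):
    # Runs in a child process, so a blocking read() can no longer hang the main process.
    video = cv.VideoCapture(path)
    while True:
        ok, frame = video.read()
        q.put((ok, frame))
        if not ok:
            break

if __name__ == '__main__':
    q = mp.Queue(maxsize=10)
    p = mp.Process(target=frame_reader, args=('input.mp4', q), daemon=True)
    p.start()
    while True:
        try:
            ok, frame = q.get(timeout=1.0)   # give up on a frame after 1 second
        except queue.Empty:
            p.terminate()                    # reader is stuck; kill it and move on
            break
        if not ok:
            break
        # ... process frame ...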
Another approach might be to try and avoid the root of the problem. OpenCV has different video backends (V4L, GStreamer, ffmpeg, ...); one of them might work where another doesn't. Using the second argument to the VideoCapture constructor, you can indicate a preference for which backend to use:
cv.VideoCapture(..., cv.CAP_FFMPEG)
See the documentation for the full list of backends. Depending on your platform and OpenCV build, not all of them will be available.
I have the following function,
import requests
def get_url_type(data):
x = {}
for i in range(0,len(data)):
print i
try:
x[i] = requests.head(data['url'][i]).headers.get('content-type')
except:
x[i] = 'Not Available'
return(x)
This function returns the URL type of each URL passed to it, and whenever there is no response it throws an error, which is caught by the exception handler. My problem is that some of the requests take more than 5-10 minutes, which is too much in a production environment. I want the function to return "Not Available" when it takes more than 5 minutes. When I researched this, the suggestion was to convert the function to an asynchronous one. I have been trying to change it without much success.
The following is what I have tried,
import asyncio
import time
from datetime import datetime
async def custom_sleep():
    print('SLEEP', datetime.now())
    time.sleep(5)
My objective is that whenever the request takes more than 5 minutes, it should return "Not Available" and move to the next iteration.
Can anybody help me in doing this?
Thanks in advance !
It seems you just want a request to time out after a given time has passed without reply and move on to the next request. For this functionality there is a timeout parameter you can add to your request. The documentation on this: http://docs.python-requests.org/en/master/user/quickstart/#timeouts.
With a 300-second (5-minute) timeout, your code becomes:
requests.head(data['url'][i], timeout=300)
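Dropped into the original function, with the timeout exception mapped to "Not Available", it might look roughly like this (requests' Timeout is a subclass of RequestException, so the broader catch covers both timeouts and connection errors):
import requests
def get_url_type(data, timeout=300):
    x = {}
    for i in range(len(data)):
        try:
            x[i] = requests.head(data['url'][i], timeout=timeout).headers.get('content-type')
        except requests.exceptions.RequestException:
            x[i] = 'Not Available'  # no response, timeout, or connection error
    return x
Note that requests' timeout bounds the connect and read waits, not the total wall-clock time of the request.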
The asynchronous functionality you mention actually has a different objective: it would allow your code to avoid waiting the 5 minutes at all before continuing execution, but I believe that would be a different question.
I have a little script which filters out domain names that are not registered yet. I use the pywhois module. The problem is that it suddenly freezes and does nothing after several (sometimes hundreds of) requests. I think it is not a ban, because I can run the program right after the freeze and it works.
I would like to avoid this freezing. My idea is to time the function call and, if the runtime crosses some limit (for example 10 seconds), retry the call.
Do you have any advice how to avoid the freezing? Or the better way to check domains?
Here is the code:
for keyword in keywords:
    try:
        details = pythonwhois.get_whois(keyword+'.com')
    except Exception as e:
        print e
        continue
    if 'status' not in details.keys():
        print 'Free domain!'
        print keyword
This approach is prone to break if the underlying library changes; however, you can call internal socket functions to set a timeout for all pythonwhois network calls. For example:
TIMEOUT = 5.0 # timeout in seconds
pythonwhois.net.socket.setdefaulttimeout(TIMEOUT)
pythonwhois.get_whois("example.com")
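To get the "retry on freeze" behaviour from the question, you can combine this with a retry loop; a sketch, assuming the timeout surfaces as socket.timeout rather than being wrapped by the library, with an arbitrary limit of 3 attempts and a 10-second cutoff:
import socket
import pythonwhois

pythonwhois.net.socket.setdefaulttimeout(10.0)   # 10-second limit per lookup

for keyword in keywords:
    details = None
    for attempt in range(3):                     # retry a frozen lookup up to 3 times
        try:
            details = pythonwhois.get_whois(keyword + '.com')
            break
        except socket.timeout:
            continue                             # lookup froze; try again
        except Exception as e:
            print(e)
            break                                # some other error; skip this keyword
    if details is None:
        continue
    if 'status' not in details:
        print('Free domain!')
        print(keyword)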
Maybe you could try dnspython. It looks like you just want to check if a domain name is registered. For example:
import dns.resolver
for keyword in keywords:
    try:
        dns.resolver.query(keyword+'.com')
    except dns.resolver.NXDOMAIN:
        print(keyword+'.com is available!')
DNS resolver has a default timeout of 2 seconds. If you want to change that, you can make a new instance of dns.resolver.Resolver with a different timeout.
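A sketch of that, with a hypothetical 10-second limit (dnspython's lifetime attribute caps the total time for a query across nameservers):
import dns.exception
import dns.resolver

resolver = dns.resolver.Resolver()
resolver.timeout = 10    # seconds per nameserver attempt
resolver.lifetime = 10   # overall limit for the whole query

for keyword in keywords:
    try:
        resolver.query(keyword + '.com')
    except dns.resolver.NXDOMAIN:
        print(keyword + '.com is available!')
    except dns.exception.Timeout:
        print(keyword + '.com timed out; skipping')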
To make it multithreaded, a thread pool would be the best choice if you can use Python 3:
import dns.resolver
from multiprocessing.pool import ThreadPool
def check_keyword(keyword):
    try:
        dns.resolver.query(keyword+'.com')
    except dns.resolver.NXDOMAIN:
        # You probably want to change this to a return
        print(keyword+'.com is available!')
if __name__ == '__main__':
    keywords = [...]
    p = ThreadPool(5)
    print(p.map(check_keyword, keywords))
I'm pretty new to Twisted. I have an HTTP client that queries a server with a rate limit; when I hit this limit, the server responds with HTTP 204. When handling the response I'm probably doing something nasty, like this:
def handleResponse(r, ip):
    if r.code == 204:
        print 'Got 204, sleeping'
        time.sleep(120)
        return None
    else:
        jsonmap[ip] = ''
        whenFinished = twisted.internet.defer.Deferred()
        r.deliverBody(PrinterClient(whenFinished, ip))
        return whenFinished
I'm doing this because I want to pause all the tasks.
There are two behaviours I have in mind: either re-run the tasks that hit 204 later in the same execution (I don't know if that's possible), or just log the errors and re-run them in another execution of the program. Another problem that may arise is that I've set a timeout on the connection in order to cancel the deferred after a pre-defined amount of time (see the code below) if there's no response from the server:
timeoutCall = reactor.callLater(60, d.cancel)
def completed(passthrough):
    if timeoutCall.active():
        timeoutCall.cancel()
    return passthrough
d.addCallback(handleResponse, ip)
d.addErrback(handleError, ip)
d.addBoth(completed)
Another problem I may encounter is that if I'm sleeping I may hit this timeout, and all my requests will be cancelled.
I hope I've been precise enough.
Thank you in advance.
Jeppo
Don't use time.sleep(120) in any Twisted-based code. This violates basic assumptions made by any other Twisted-based code you might be using.
Instead, if you want to delay something by N seconds, use reactor.callLater(N, someFunction).
Once you remove the sleep calls from your program, the problem of unrelated timeouts being hit just because you've stopped the reactor from processing events will go away.
For anyone stumbling across this thread, it's imperative that you never call time.sleep(...); however, it is possible to create a Deferred that does nothing but sleep... which you can use to compose delays into a deferred chain:
from twisted.internet import reactor
from twisted.internet.defer import Deferred
def make_delay_deferred(seconds, result=None):
    d = Deferred()
    reactor.callLater(seconds, d.callback, result)
    return d
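For example, handleResponse could return such a Deferred chained to a retry instead of blocking; retry_request below is a hypothetical helper for re-issuing the original request:
def handleResponse(r, ip):
    if r.code == 204:
        d = make_delay_deferred(120)                 # wait 120 s without blocking the reactor
        d.addCallback(lambda _: retry_request(ip))   # hypothetical retry helper
        return d
    jsonmap[ip] = ''
    whenFinished = twisted.internet.defer.Deferred()
    r.deliverBody(PrinterClient(whenFinished, ip))
    return whenFinished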
I have scripts in both Python and Ruby that run for days at a time and rely on the internet to go to certain domains and collect data. Is there a way to implement a network connectivity check in my script so that I can pause/retry iterations of a loop when there is no connectivity, and only resume when connectivity returns?
There may be a more elegant solution, but I'd do this:
require 'open-uri'
def internet_connectivity?
  open('http://google.com')
  true
rescue => ex
  false
end
Well, in Python I do something similar with a try/except block, like the following:
import requests
try:
    response = requests.get(URL)
except Exception as e:
    print "Something went wrong:"
    print e
This is just a sample of what you could do; you can check the error code or other information on the exception and decide what to do based on that. I usually put the script to sleep for 10 minutes when something goes wrong with the request.
import time
time.sleep(600)
Here's a Unix-specific solution:
In [18]: import subprocess
In [19]: subprocess.call(['/bin/ping', '-c1', 'blahblahblah.com'])
Out[19]: 1
In [20]: subprocess.call(['/bin/ping', '-c1', 'google.com'])
Out[20]: 0
i.e., ping returns 0 if the ping is successful.
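Wrapped into a small boolean helper (the host and ping path are placeholders), that might look like:
import subprocess

def internet_connectivity(host='google.com'):
    # ping exits with 0 on success, non-zero otherwise
    return subprocess.call(['/bin/ping', '-c1', host],
                           stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL) == 0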
Inline way of doing it:
require 'open-uri'
def internet_access?; begin open('http://google.com'); true; rescue => e; false; end; end
puts internet_access?
In Python you can do something like this:
import time
import requests
def get_with_retry(url, tries=5, wait=1, backoff=2, ceil=60):
    while True:
        try:
            return requests.get(url)
        except requests.exceptions.ConnectionError:
            tries -= 1
            if not tries:
                raise
            time.sleep(wait)
            wait = min(ceil, wait * backoff)
This tries each request up to tries times, initially delaying wait seconds between attempts, but increasing the delay by a factor of backoff for each attempt up to a maximum of ceil seconds. (The default values mean it will wait 1 second, then 2, then 4, then 8, then fail.) By setting these values, you can set the maximum amount of time you want to wait for the network to come back, before your main program has to worry about it. For infinite retries, use a negative value for tries since that'll never reach 0 by subtracting 1.
At some point you want the program to tell you if it can't get on the network, and you can do that by wrapping the whole program in a try/except that notifies you in some way if ConnectionError occurs.
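A sketch of that outer wrapper; main, urls_to_fetch, process, and notify_operator are hypothetical placeholders standing in for your real program:
import requests

def main():
    for url in urls_to_fetch:             # hypothetical work list
        response = get_with_retry(url)    # retries transient network drops
        process(response)                 # hypothetical processing step

if __name__ == '__main__':
    try:
        main()
    except requests.exceptions.ConnectionError:
        notify_operator('network is down, giving up')  # hypothetical notification hook
        raise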