Python function with request might hang, how to timeout? - python

I'm building a script to process messages using the O365 module (https://pypi.org/project/O365/).
The script runs great, but for some reason, after a random amount of time (usually about 20 hours), it gets stuck on a request that never receives a response, and the script just hangs there waiting.
It's not a server throttling issue as I've slowed my script down to one request every minute and it still hangs.
I think it might be a bug in the O365 module where it doesn't time out its requests, so I'm thinking of making the calls on a separate thread and killing it if it doesn't return within a certain amount of time.
But from what I understand, if I just try to join the thread it will try to wait until it finishes (which is never), is there a way to avoid this?
Thanks!

You can use multithreading and the join method. As explained in the documentation: "This blocks the calling thread until the thread whose join() method is called terminates – either normally or through an unhandled exception – or until the optional timeout occurs."
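join() will return either because the request has completed or because the maximum time limit has been reached. Note that a timeout does not stop the thread: it keeps running in the background, so call is_alive() after join() to tell the two cases apart, and mark the thread as a daemon so that a hung request cannot keep the script from exiting.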
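A minimal sketch of the join-with-timeout idea, assuming the hanging call can be wrapped in a plain function. The helper name call_with_timeout is hypothetical and is not part of O365. Python has no supported way to kill a thread, so the sketch marks the worker as a daemon and simply stops waiting for it; a hung request would keep occupying its thread until the process exits.

```python
import threading


def call_with_timeout(func, timeout, *args, **kwargs):
    """Run func in a daemon thread and wait at most `timeout` seconds.

    Returns (finished, result). If func raises, finished is True and
    result is None, because the exception stays in the worker thread.
    """
    result = {}

    def _target():
        result["value"] = func(*args, **kwargs)

    worker = threading.Thread(target=_target, daemon=True)
    worker.start()
    worker.join(timeout)       # returns after `timeout` even if func hangs
    if worker.is_alive():      # still running, so the timeout expired
        return False, None
    return True, result.get("value")
```

A caller would check the first element of the returned pair and retry or log when it is False, instead of waiting forever.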

Related

Abort long running http operation

In my (python) code I have a thread listening for changes from a couchdb feed (continuous changes). The changes request has a timeout parameter which is too big in certain circumstances (for example when a user wants to interrupt the program manually with ^C).
How can I abort a long-running blocking http request?
Is this possible, or do I need to reduce the timeout to make my program more responsive?
This would be unfortunate, because having a timeout small enough to make the program really responsive (say, 1s), means that there are lots of connections being created (one per second!), which defeats the purpose of listening to changes, and makes it very difficult to make sure that we are not missing any changes (in the re-connecting timespan we can indeed miss changes, so that special code is needed to handle that case)
The other option is to forcefully abort the thread, but that is not really an option in python.
If I understand correctly, you are waiting too long between requests before deciding whether to respond to the user. You are right that continuously closing and creating new connections would defeat the purpose of the changes feed.
A solution could be to use the heartbeat query parameter, which makes CouchDB keep sending newlines to tell the client that the connection is still alive.
http://localhost:5984/hello/_changes?feed=continuous&heartbeat=1000&include_docs=true
As long as you are receiving heartbeats (newlines), you can be sure that the connection is alive and that you are not missing changes. A bare newline indicates that no changes have occurred, whereas an actual change is reported as data. There is no need to close the connection. Respond to your clients if resp != "\n".
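The heartbeat handling can be sketched as a small generator, assuming the continuous feed is read line by line and that each non-empty line is one JSON document. The name iter_changes is hypothetical, and how the lines are obtained (which HTTP client, which read timeout) is left open here.

```python
import json


def iter_changes(lines):
    """Yield parsed changes from a continuous feed, skipping heartbeats.

    `lines` is any iterable of str or bytes lines, for example a
    streaming HTTP response read line by line.
    """
    for raw in lines:
        line = raw.strip()
        if not line:
            continue           # empty line: heartbeat, nothing changed
        yield json.loads(line)
```

Because a heartbeat arrives at a known interval, the reader can also treat a long silence as a dead connection and reconnect.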
Blocking the thread's execution generally prevents the thread from being terminated. You need to wait until the request has timed out. But this is already clear.
Using a library that supports non-blocking requests may be a solution, but I don't know if there is one.
Anyway ... you've mentioned that reducing the timeout will lead to more connections. I'd suggest implementing a waiting loop between requests that can be interrupted by an external signal to terminate the thread. With this loop you can control the number of requests independently of the timeout.
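One way to sketch such an interruptible waiting loop is with threading.Event, whose wait() returns as soon as the event is set. The function name poll_until_stopped and the do_request callable are hypothetical placeholders for the real request code.

```python
import threading


def poll_until_stopped(do_request, interval, stop_event):
    """Call do_request repeatedly, pausing `interval` seconds between calls.

    The pause ends early when stop_event is set, so the loop reacts to a
    shutdown request without waiting for the full interval.
    Returns the number of requests made.
    """
    calls = 0
    while not stop_event.is_set():
        do_request()
        calls += 1
        if stop_event.wait(interval):   # True means a stop was requested
            break
    return calls
```

The main thread would run this in a worker, catch KeyboardInterrupt, and call stop_event.set(). The request itself still needs its own short timeout, since the event only interrupts the pause between requests.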

Listening for subprocess failure in python

Using subprocess.Popen(), I'm launching a process that is supposed to take a long time. However, there is a chance that the process will fail shortly after it launches (producing a return code of 1). If that happens, I want to intercept the failure and present an explanatory message to the user. Is there a way to "listen" to the process and respond if it fails? I can't just use Popen.wait() because my python program has to keep running.
The hack I have in place right now is to time.sleep() my python program for .5 seconds (which should be enough time for the subprocess to fail if it's going to do so). After the python program resumes, it polls the subprocess to determine if it has failed or not.
I imagine that a better solution might use threading and Popen.wait(), but I'm a relative beginner to python.
Edit:
The subprocess is a Java daemon that I'm launching. If another instance of the daemon is already running on the system, the Java subprocess will exit with a return code of 1, and I want to intercept the messy Java exception stack trace and present an understandable error message to the user.
Two approaches:
Call Popen.wait() on a thread as you suggested yourself, then call an error handler function if the exit code is non-zero. Make sure that the error handler is thread safe, preferably by dispatching the error message to the main thread if your application has an event loop.
Rewrite your application to use an event loop that already supports monitoring child processes, such as pyev. If you just want to monitor one subprocess, this is probably overkill.
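The first approach might look like the sketch below, assuming a callback is an acceptable way to report the failure. The names watch_process and on_failure are hypothetical. The callback runs on the watcher thread, so it should only do thread-safe work.

```python
import subprocess
import sys
import threading


def watch_process(proc, on_failure):
    """Wait for `proc` on a background thread.

    Calls on_failure(returncode) from that thread if the process exits
    with a non-zero code. Returns the watcher thread.
    """
    def _wait():
        returncode = proc.wait()        # blocks only the watcher thread
        if returncode != 0:
            on_failure(returncode)

    watcher = threading.Thread(target=_wait, daemon=True)
    watcher.start()
    return watcher


if __name__ == "__main__":
    # Stand-in for the real daemon: a child that exits with code 1.
    child = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(1)"])
    watch_process(child, lambda code: print("child failed with code", code)).join()
```

Passing the put method of a queue.Queue as on_failure is one way to hand the error over to the main thread, which can then poll the queue from its own loop.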

How to handle timeouts when a process receives SIGSTOP and SIGCONT?

I have some Python code which uses threading.Timer to implement a 60-second timeout for an operation.
The problem is that this code runs in a job-control environment where it may get pre-empted by a higher priority job. In this case it will be sent SIGSTOP, and then some time later, SIGCONT. I need a way to somehow notice that this has happened and reset the timeout: obviously the operation hasn't really timed out if it's been suspended for the whole 60 seconds.
I tried to add a signal handler for SIGCONT but this seems to get executed after the code provided to threading.Timer has been executed.
Is there some way to achieve this?
A fairly simple answer that occurred to me after posting this is to break up the timer into multiple sub-timers, e.g. ten 6-second timers instead of one 60-second timer, where each one starts the next in a chain. That way, if I get suspended, I only lose one of the timers and still get most of the wait before timing out.
This is of course not foolproof, especially if I get repeatedly suspended and restarted, but it's easy to do and seems like it might be good enough.
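The chained sub-timer idea could be sketched as follows. The class name ChainedTimer and its parameters are hypothetical, and the sketch assumes that a suspension consumes at most the slice that was pending at the time.

```python
import threading


class ChainedTimer:
    """A timeout built from `slices` short timers, each starting the next."""

    def __init__(self, total_seconds, slices, on_timeout):
        self._slice = total_seconds / slices
        self._remaining = slices
        self._on_timeout = on_timeout
        self._lock = threading.Lock()
        self._timer = None
        self._cancelled = False

    def start(self):
        self._schedule()

    def _schedule(self):
        with self._lock:
            if self._cancelled:
                return
            self._timer = threading.Timer(self._slice, self._tick)
            self._timer.daemon = True
            self._timer.start()

    def _tick(self):
        with self._lock:
            self._remaining -= 1
            finished = self._remaining <= 0 and not self._cancelled
        if finished:
            self._on_timeout()      # the last slice has elapsed
        else:
            self._schedule()        # start the next slice (no-op if cancelled)

    def cancel(self):
        with self._lock:
            self._cancelled = True
            if self._timer is not None:
                self._timer.cancel()
```

For the case in the question, ChainedTimer(60, 10, handler).start() would replace the single 60-second threading.Timer.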
You need to rethink what you're asking for; a timeout reflects elapsed time (wall time); you want to know the time used by your process.
Fortunately you can measure this with getrusage: http://docs.python.org/library/resource.html
You'll still need to set a timeout; when it returns, measure the increase in user or system time usage since the start of the operation and terminate the operation if it exceeds the limit, else reschedule the timeout appropriately.
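A rough sketch of that check-and-reschedule scheme, assuming a Unix system (the resource module is not available on Windows). The names cpu_seconds and start_cpu_timeout and the min_wait parameter are hypothetical.

```python
import resource
import threading


def cpu_seconds():
    """User plus system CPU time consumed by this process so far."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_utime + usage.ru_stime


def start_cpu_timeout(cpu_limit, on_timeout, min_wait=0.05):
    """Call on_timeout once the process has used cpu_limit CPU seconds from now.

    Returns an Event; setting it cancels the timeout.
    """
    start = cpu_seconds()
    cancelled = threading.Event()

    def check():
        if cancelled.is_set():
            return
        used = cpu_seconds() - start
        if used >= cpu_limit:
            on_timeout()
        else:
            schedule(cpu_limit - used)   # wait for the unused budget

    def schedule(delay):
        # min_wait avoids a tight loop when very little budget remains
        timer = threading.Timer(max(delay, min_wait), check)
        timer.daemon = True
        timer.start()

    schedule(cpu_limit)
    return cancelled
```

A suspended process consumes no CPU time, so time spent stopped does not count against the limit. The same is true of time spent blocked on I/O, which may or may not be what the operation needs.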
If your application is multi-threaded, the docs say that "only the main thread can set a new signal handler, and the main thread will be the only one to receive signals".
Make sure you are handling your signals from the main thread.

Can AppEngine python threads last longer than the original request?

We're trying to use the new python 2.7 threading ability in Google App Engine and it seems like the created thread is getting killed before it finishes running. Our scenario:
User sends a message to the server
We update the user's data
We spawn a thread to do some more heavy duty processing
We return a response to the user before waiting for the heavy duty processing to finish
My assumption was that the thread would continue to run after the request had returned, as long as it did not exceed the total request time limit. What we're seeing though is that the thread is randomly killed partway through its execution. No exceptions, no errors, nothing. It just stops running.
Are threads allowed to exist after the response has been returned? This does not repro on the dev server, only on live servers.
We could of course use a task queue instead, but that's a real pain since we'd have to set up a url for the action and serialize/deserialize the data.
The 'Sandboxing' section of this page:
http://code.google.com/appengine/docs/python/python27/using27.html#Sandboxing
indicates that threads cannot run past the end of the request.
Deferred tasks are the way to do this. You don't need a URL or serialization to use them:
from google.appengine.ext import deferred
deferred.defer(myfunction, arg1, arg2)

monitor stuck python processes

I have a python script that performs URL requests using the urllib2. I have a pool of 5 processes that run asynchronously and perform a function. This function is the one that makes the url calls, gets data, parses it into the required format, performs calculations and inserts data. The amount of data varies for each url request.
I run this script every 5 minutes using a cron job. Sometimes when I do ps -ef | grep python, I see stuck processes. Is there a way, within the multiprocessing module, to keep track of the processes and their state (completed, stuck, dead and so on)? Here is a code snippet:
This is how I call the async processes:
pool = Pool(processes=5)
pool.apply_async(getData, )
And the following is a part of getData which performs urllib2 requests:
try:
    Url = "http://gotodatasite.com"
    data = urllib2.urlopen(Url).read().split('\n')
except URLError, e:
    print "Error:",e.code
    print e.reason
    sys.exit(0)
Is there a way to track stuck processes and rerun them again?
Implement a ping mechanism if you are set on using multiprocessing. You're looking for processes that have become stuck because of slow I/O, I assume?
Personally I would go with a queue (not necessarily a queue server), say for example ~/jobs is a list of URLs to work on, then have a program that takes the first job and performs it. Then it's just a matter of bookkeeping - say, have the program note when it was started and what its PID is. If you need to kill slow jobs, just kill the PID and mark the job as failed.
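The bookkeeping described above might be sketched like this, assuming each job can be launched as a separate command. The function name run_job and the record layout are hypothetical. The sketch keeps the record in memory rather than in a file such as ~/jobs.

```python
import subprocess
import sys
import time


def run_job(cmd, max_seconds):
    """Run `cmd` as a child process and kill it if it overruns.

    Returns a record with the PID, the start time and a final status of
    "done" or "failed".
    """
    proc = subprocess.Popen(cmd)
    record = {"pid": proc.pid, "started": time.time(), "status": "running"}
    try:
        proc.wait(timeout=max_seconds)
        record["status"] = "done" if proc.returncode == 0 else "failed"
    except subprocess.TimeoutExpired:
        proc.kill()              # the job is stuck or too slow
        proc.wait()              # reap the killed child
        record["status"] = "failed"
    return record


if __name__ == "__main__":
    # Stand-in for a stuck job: a child that sleeps far past the limit.
    print(run_job([sys.executable, "-c", "import time; time.sleep(30)"], 1))
```

A failed record can then be put back on the queue for another attempt.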
Look up the timeout parameter of urllib2.urlopen. If the timeout is reached you get an exception, and the process is not stuck anymore.
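A minimal sketch of a request with a timeout, written for Python 3, where urllib2 became urllib.request. The function name fetch_lines and the opener parameter are hypothetical; the latter exists only so the function can be exercised without a network.

```python
import socket
import urllib.error
import urllib.request


def fetch_lines(url, timeout=10.0, opener=urllib.request.urlopen):
    """Fetch `url` and return its body split into lines.

    Returns None if the request fails or no data arrives within
    `timeout` seconds, instead of blocking forever.
    """
    try:
        with opener(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except (socket.timeout, urllib.error.URLError) as exc:
        print("Request failed or timed out:", exc)
        return None
    return body.split("\n")
```

The timeout applies to each blocking socket operation rather than to the whole download, so a server that keeps trickling data can still take longer than `timeout` seconds in total.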
