python restart after unexpected exit with `while true` loop - python

l wrote a python script with while true to do task to catch email's attachment, but sometimes l found out it would exit unexpectedly on server.
l run it on my local for more than 4 hours with no problem, so l can confirm that the code is correct.
So is there a kind of mechanism to restart python when it exit unexpectedly, such as process monitoring? l am a novice in linux.
remark: l run this python script like python attachment.py & in a shell script.

While #triplee's comment will definitely do the trick, I would worry that there is something going on that you would be better-off understanding. That is, why the script is failing.
Without further details, it's difficult to speculate what might be happening. As a first debugging effort, you might try wrapping the entire body within the while True in a try ... except... block, and use the except block to log the error and/or the program state. That is,
while True:
try:
... do some stuff...
except:
... log the exception, print to screen, record the values of key variables, etc.
continue
This would allow you to understand what is happening during the failure, and to write more robust code that handles that event.

l run it on my local for more than 4 hours with no problem, so l can confirm that the code is correct.
You could be surprised by the number of bugs that only reveals after months if not years of correct processing... What you confirm is that the code does not break on first action, but unless you have tested it with all possible corner cases in input (including badly formatted ones) you cannot confirm that it will never break.
That is the reason why a program that is intented to run unattendedly should be carefully designed to always (try to*) leave a trace before exiting. try: except: and the logging module are your best friends here.
* Of cause in case of a system crash or a power outage there's nothing you can do at user program level...

You can try to use Supervisor to manage your process. The Supervisor able to configure the bevhiour of the process exit status and try to restart it.
Attached is the official document and the example in Ubuntu:
example configuration
[program:nodehook]
command=/usr/bin/node /srv/http.js
directory=/srv
autostart=true
autorestart=true
startretries=3
stderr_logfile=/var/log/webhook/nodehook.err.log
stdout_logfile=/var/log/webhook/nodehook.out.log
user=www-data
environment=SECRET_PASSPHRASE='this is secret',SECRET_TWO='another secret

Related

How to hard suspend or pause a python script after it runs so it doesn’t force close upon completion?

Hi so I’m working on a python script that involves a loop function, so far the loop function process is failing for some reason(although I kinda know why) but the problem I’ve got os.system(‘pause’) and also input(“prompt:”) at end of the code in order to pause all activity so I can read the error messages prior to script completion and termination but the script still shuts down, I need a way to HARD pause it or freeze before the window closes abruptly. Need help and any further insight.
Ps. Let me know if you need any more info to better describe this problem.
I assume you are just 'double clicking' the icon on Window Explorer. This has the disadvantage which you are encountering here in that the shell (terminal window) closes when the process finishes so you can't tell what went wrong if it terminated due to an error.
A better method would be to use the command prompt. If you are not familiar with this, there are many tutorials online.
The reason this will help with your problem is that, once navigating to the script's containing directory, you can use python your_script.py (assuming python is in your path environmental variable) to run the script within the same window.
Then, even if it fails, you can read the error messages as you will only be returned to the command line.
An alternative hacky method would be to create a script called something like run_pythons.py which will use the subprocess module to call your actual script in the same window, and then (no matter how it terminates), wait for your input before terminating itself so that you can read the error messages.
So something like:
import subprocess
subprocess.call(('python', input('enter script name: ')))
input('press ENTER to kill me')
I needed something like this at one point. I had a wrapper that loaded a bunch of modules and data and then waited for a prompt to run something. If I had a stupid mistake in a module, it would quit, and that time that it spent loading all that data into memory would be wasted, which was >1min. For me, I wanted a way to keep that data in memory even if I had an error in a module so that I could edit the module and rerun the script.
To do this:
while True:
update = raw_input("Paused. Enter = start, 'your input' = update params, C-C = exit")
if update:
update = update.split()
#unrelevant stuff used to parse my update
#custom thing to reload all my modules
fullReload()
try:
#my main script that needed all those modules and data loaded
model_starter.main(stuff, stuff2)
except Exception as e:
print(e)
traceback.print_exc()
continue
except KeyboardInterrupt:
print("I think you hit C-C. Do it again to exit.")
continue
except:
print("OSERROR? sys.exit()? who knows. C-C to exit.")
continue
This kept all the data loaded that I grabbed from before my while loop started, and prevented exiting on errors. It also meant that I could still ctrl+c to quit, I just had to do it from this wrapper instead of once it got to the main script.
Is this somewhat what you're looking for?
The answer is basically, you have to catch all your exceptions and have a method to restart your loop once you figured out and fixed the issue.

Large dataset - no error - but it wont run - python memory issue?

So I am trying to run various large images which gets put into an array using numpy so that I can then do some calculations. The calculations get done per image and the opening and closing of each image is done in a loop. I a have reached a frustration point because I have no errors in the code (well none to my knowledge nor any that python is complaining about), and as a matter of fact my code runs for one loop, and then it simply does not run for the second, third, or other loops.
I get no errors! No memory error, no syntax error, no nothing. I have used Spyder and even IDLE, and it simply runs all the calculations sometimes only for one image, sometimes for two, then it just quits the loop (again WITH NO ERROR) as if it had completed running for all images (when it has only ran for one/two images).
I am assuming its a memory error? - I mean it runs one loop , sometimes two, but never the rest? -- so ...
I have attempted to clear the tracebacks using this:
sys.exc_clear()
sys.exc_traceback = sys.last_traceback = None
I have also even tried to delete each variable when I am done with it
ie. del variable
However, nothing seems to fix it --
Any ideas of what could be wrong would be appreciated!
The exit code of the python process should reveal the reason for the process exiting. In the event of an adverse condition, the exit code will be something other than 0. If you are running in a Bash shell or similar, you can run "echo $?" in your shell after running Python to see its exit status.
If the exit status is indeed 0, try putting some print statements in your code to trace the execution of your program. In any case, you would do well to post your code for better feedback.
Good luck!

Android's MonkeyRunner occasionally throws exceptions

I am running an automated test using an Android emulator driving an app with a Monkey script written in Python.
The script is copying files onto the emulator, clicks buttons in the app and reacts depending on the activities that the software triggers during its operation. The script is supposed to be running the cycle a few thousand times so I have this in a loop to run the adb tool to copy the files, start the activities and see how the software is reacting by calling the getProperty method on the device with the parameter 'am.current.comp.class'.
So here is a very simplified version of my script:
for target in targets:
androidSDK.copyFile(emulatorName, target, '/mnt/sdcard')
# Runs the component
device.startActivity(component='com.myPackage/com.myPackage.myactivity')
while 1:
if device.getProperty('am.current.comp.class') == 'com.myPackage.anotheractivity':
time.sleep(1) # to allow the scree to display the new activity before I click on it
device.touch(100, 100, 'DOWN_AND_UP')
# Log the result of the operation somewhere
break
time.sleep(0.1)
(androidSDK is a small class I've written that wraps some utility functions to copy and delete files using the adb tool).
On occasions the script crashes with one of a number of exceptions, for instance (I am leaving out the full stack trace)
[com.android.chimpchat.adb.AdbChimpDevice]com.android.ddmlib.ShellCommandUnresponsiveException
or
[com.android.chimpchat.adb.AdbChimpDevice] Unable to get variable: am.current.comp.class
[com.android.chimpchat.adb.AdbChimpDevice]java.net.SocketException: Software caused connectionabort: socket write error
I have read that sometimes the socket connection to the device becomes unstable and may need a restart (adb start-server and adb kill-server come in useful).
The problem I'm having is that the tools are throwing Java exceptions (Monkey runs in Jython), but I am not sure how those can be trapped from within my Python script. I would like to be able to determine the exact cause of the failure inside the script and recover the situation so I can carry on with my iterations (re-establish the connection, for instance? Would for instance re-initialising my device with another call to MonkeyRunner.waitForConnection be enough?).
Any ideas?
Many thanks,
Alberto
EDIT. I thought I'd mention that I have discovered that it is possible to catch Java-specific exceptions in a Jython script, should anyone need this:
from java.net import SocketException
...
try:
...
except(SocketException):
...
It is possible to catch Java-specific exceptions in a Jython script:
from java.net import SocketException
...
try:
...
except(SocketException):
...
(Taken from OP's edit to his question)
This worked for me:
device.shell('exit')# Exit the shell

Why is my threading/multiprocessing python script not exiting properly?

I have a server script that I need to be able to shutdown cleanly. While testing the usual try..except statements I realized that Ctrl-C didn't work the usual way. Normally I'd wrap long running tasks like this
try:
...
except KeyboardInterrupt:
#close the script cleanly here
so the task could be shutdown cleanly on Ctrl-C. I have never ran into any problems with this before, but somehow when I hit Ctrl-C when this particular script is running the script just exits without catching the Ctrl-C.
The initial version was implemented using Process from multiprocessing. I rewrote the script using Thread from threading, but same issue there. I have used threading many times before, but I am new to the multiprocessing library. Either way, I have never experienced this Ctrl-C behavior before.
Normally I have always implemented sentinels etc to close down Queues and Thread instances in an orderly fashion, but this script just exits without any response.
Last, I tried overriding signal.SIGINT as well like this
def handler(signal, frame):
print 'Ctrl+C'
signal.signal(signal.SIGINT, handler)
...
Here Ctrl+C was actually caught, but the handler doesn't execute, it never prints anything.
Besides the threading / multiprocessing aspect, parts of the script contains C++ SWIG objects. I don't know if that has anything to do with it. I am running Python 2.7.2 on OS X Lion.
So, a few questions:
What's going on here?
How can I debug this?
What do I need to learn in order to understand the root cause?
PLEASE NOTE: The internals of the script is proprietary so I can't give code examples. I am however very willing to receive pointers so I could debug this myself. I am experienced enough to be able to figure it out if someone could point me in the right direction.
EDIT: I started commenting out imports etc to see what caused the weird behavior, and I narrowed it down to an import of a C++ SWIG library. Any ideas why importing a C++ SWIG library 'steals' Ctrl-C? I am not the author of the guilty library however and my SWIG experience is limited so don't really know where to start...
EDIT 2: I just tried the same script on a windows machine, and in Windows 7 the Ctrl-C is caught as expected. I'm not really going to bother with the OS X part, the script will be run in an Windows environment anyway.
This might have to do with the way Python manages threads, signals and C calls.
In short - Ctrl-C cannot interrupt C calls, since the implementation requires that a python thread will handle the signal, and not just any thread, but the main thread (often blocked, waiting for other threads).
In fact, long operations can block everything.
Consider this:
>>> nums = xrange(100000000)
>>> -1 in nums
False (after ~ 6.6 seconds)
>>>
Now, Try hitting Ctrl-C (uninterruptible!)
>>> nums = xrange(100000000)
>>> -1 in nums
^C^C^C (nothing happens, long pause)
...
KeyboardInterrupt
>>>
The reason Ctrl-C doesn't work with threaded programs is that the main thread is often blocked on an uninterruptible thread-join or lock (e.g, any 'wait', 'join' or just a plain empty 'main' thread, which in the background causes python to 'join' on any spawned threads).
Try to insert a simple
while True:
time.sleep(1)
in your main thread.
If you have a long running C function, do signal handling in C-level (May the Force be with you!).
This is largely based on David Beazley's video on the subject.
It exits because something else is likely catching the KeyboardInterupt and then raising some other exception, or simply returning None. You should still get a traceback to help debug. You need to capture the stderr output or run your script with the -i commandline option so you can see traceback. Also, add another except block to catch all other exceptions.
If you suspect the C++ function call to be catching the CTRL+C try catching it's output. If the C function is not returning anything then there isn't much you can do except ask the author to add some exception handling, return codes, etc.
try:
#Doing something proprietary ...
#catch the function call output
result = yourCFuncCall()
#raise an exception if it's not what you expected
if result is None:
raise ValueError('Unexpected Result')
except KeyboardInterupt:
print('Must be a CTRL+C')
return
except:
print('Unhandled Exception')
raise
how about atexit?
http://docs.python.org/library/atexit.html#module-atexit

debugging: how to check what where my Python program is hanging?

A fairly large Python program I write, runs, but sometimes, after running for minutes or hours, in a non easily reproducible moment, hangs and outputs nothing to the screen.
I have no idea what it is doing at that moment, and in what part of code it is.
How can I run this in a debugger or something to see what lines of code is the program executing in the moment it hangs?
Its too large to put "print" statements all over the place.
I did:
python -m trace --trace /usr/local/bin/my_program.py
but that gives me so much output that I can't really see anything, just millions of lines scrolling on the screen.
Best would be if I could send some signal to the program with "kill -SIGUSR1" or something, and at that moment the program would drop into a debugger and show me the line it stopped at and possibly allow me to step through the program then.
I've tried:
pdb usr/local/bin/my_program.py
and then:
(Pdb) cont
but what do I do to see where I am when it hangs?
It doesn't throw and exception, just seems like it waits for something, possibly in an infinite loop.
One more detail: when the program hangs, and I press ^C and then (not sure if that is necessary) the program continues normally (without throwing any exception and without giving me any hint on the screen why did it stop).
This could be useful to you. I usually do
>>> import pdb
>>> import program2debug
>>> pdb.run('program2debug.test()')
I usually add a -v option to my programs, which enables tons of print statements explaining what I'm doing in detail. When you write a program in the future, consider doing the same before it gets thousands of lines big.
You could try running it in debug mode in an IDE like pydev (eclipse) or pycharm. You can break the program at any moment and get to its current execution point.
No program is ever too big to put print statements all over the place. You need to read up on the logging module and insert lots of logging.debug() statements. This is just a better form of print statement that outputs to a file, and can be turned off easily in production software. But years from now, when you need to modify the code, you can easily turn it all back on and get the benefit of the insight of the original programmer.

Categories