Handling subprocess in Python

I am running os.system(cmd) in a for-loop. Since it sometimes hangs, I am trying to use process = subprocess.Popen(cmd) in the for-loop instead. But I want to know the following:
If I do sleep(60) and then check whether the process is still running using process.poll(), how do I differentiate between a process that is legitimately still working after a minute and a process that has hung?
If I kill the process that hung, will the for-loop continue or will it exit?
Thanks!

I don't know of any general way to tell whether a process is hung or working. If a process hangs due to a locking issue, then it might consume 0% CPU and you might be able to guess that it is hung and not working; but if it hangs with an infinite loop, the process might make the CPU 100% busy but not accomplish any useful work. And you might have a process communicating on the network, talking to a really slow host with long timeouts; that would not be hung but would consume 0% CPU while waiting.
I think that, in general, the only hope you have is to set up some sort of "watchdog" system, where your sub-process uses inter-process communication to periodically send a signal that means "I'm still alive".
If you can't modify the program you are running as a sub-process, then at least try to figure out why it hangs, and see if you can then figure out a way to guess that it has hung. Maybe it normally has a balanced mix of CPU and I/O, but when it hangs it goes in a tight infinite loop and the CPU usage goes to 100%; that would be your clue that it is time to kill it and restart. Or, maybe it writes to a log file every 30 seconds, and you can monitor the size of the file and restart it if the file doesn't grow. Or, maybe you can put the program in a "verbose" mode where it prints messages as it works (either to stdout or stderr) and you can watch those. Or, if the program works as a daemon, maybe you can actively query it and see if it is alive; for example, if it is a database, send a simple query and see if it succeeds.
So I can't give you a general answer, but I have some hope that you should be able to figure out a way to detect when your specific program hangs.
Finally, the best possible solution would be to figure out why it hangs, and fix the problem so it doesn't happen anymore. This may not be possible, but at least keep it in mind. You don't need to detect the program hanging if the program never hangs anymore!
P.S. I suggest you do a Google search for "how to monitor a process" and see if you get any useful ideas from that.
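To answer the mechanical part of the question: here is a minimal sketch of the poll-and-kill pattern (the commands are hypothetical). poll() returns None while the child is still running, whether working or hung, and killing one child does not break out of the for-loop; the loop simply moves on to the next command.

import subprocess
import time

cmds = ["cmd1", "cmd2"]                 # hypothetical commands

for cmd in cmds:
    process = subprocess.Popen(cmd, shell=True)
    time.sleep(60)                      # give the child one minute
    if process.poll() is None:          # None: still running (or hung)
        process.kill()                  # the for-loop continues afterwards
        process.wait()                  # reap the child to avoid a zombie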

A common way to detect things that have stopped working is to have them emit a signal at roughly regular intervals and have another process monitor the signal. If the monitor sees that no signal has arrived after, say, twice the interval it can take action such as killing and restarting the process.
This general idea can be used not only for software but also for hardware. I have used it to restart embedded controllers by simply charging a capacitor from an a.c.-coupled signal on an output bit. A simple detector monitors the capacitor, and if the voltage ever falls below a threshold it just pulls the reset line low and at the same time holds the capacitor charged for long enough for the controller to restart.
The principle for software is similar; one way is for the process to simply touch a file at regular intervals. The monitor checks the file's modification time periodically and, if it is too old, kills and restarts the process.
In the OP's case the subprocess could also write a status code to the file to say how far it has got in its work.
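A minimal sketch of that touch-file scheme (the path, command, and threshold are hypothetical): the worker touches a heartbeat file as it makes progress, and the monitor restarts it when the file goes stale.

import os
import subprocess
import time

HEARTBEAT = "/tmp/worker.heartbeat"     # hypothetical path
STALE_AFTER = 60                        # seconds without progress = presumed hung

def start_worker():
    # hypothetical command; the worker periodically does
    # open(HEARTBEAT, "w").close() to signal it is still alive
    return subprocess.Popen(["python", "worker.py"])

proc = start_worker()
while True:
    time.sleep(STALE_AFTER)
    try:
        age = time.time() - os.path.getmtime(HEARTBEAT)
    except OSError:                     # no heartbeat file yet: treat as stale
        age = STALE_AFTER + 1
    if age > STALE_AFTER:
        proc.kill()
        proc.wait()
        proc = start_worker()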

Related

Would adding an animation slow down my program in Python?

In Python, while testing a bruteforce script, I noticed that not printing something like Trying Password: *password* with every attempt significantly decreases the time it takes to find the password. I just let it run on a blank screen, but if I put something as simple as a loading animation (Running . . .) at the beginning to let me know it's working fine, would that slow down my program too?
(Excuse me if any of what I said was hard to understand. I'm confused as well)
When attempting a bruteforce, it's best to have as much processing power available as possible. A constant call from Python to update the screen (with a loading status, in this case) takes up some processing power and would indeed slow down the bruteforce.
How much it slows down depends on how your script is written and the hardware it's running on. Better hardware - faster. Better threading in the script - faster. You might be able to avoid a noticeable impact if you offload the "animation" to a thread which isn't fully utilized (if your script leaves any such threads free in the first place).
Though unless you are on a very slow PC, the main slowdown probably doesn't come from the CPU but from the data bus. Sending information between components at a very rapid pace can cause a bottleneck, and if your script waits for that bottleneck to clear before it continues cycling passwords, it gets slowed down. Try to separate the "loading" status from the rest of the logic, so that the CPU can keep cycling passwords without waiting for each screen refresh.
I hope this helped.
I/O bound operations like printing are very slow compared to CPU bound ops like calculations.
So every time you printed "Trying password", your program could have tried 1000 more combinations.
Printing once at the beginning won't slow it down; printing repeatedly will.
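A quick way to measure the cost yourself (a rough benchmark sketch; the numbers will vary with terminal and hardware):

import time

N = 10_000

start = time.perf_counter()
for i in range(N):
    pass                                # stands in for one password attempt
quiet = time.perf_counter() - start

start = time.perf_counter()
for i in range(N):
    print(f"Trying password: {i}")      # same loop, printing every attempt
noisy = time.perf_counter() - start

print(f"silent loop: {quiet:.4f}s, printing loop: {noisy:.4f}s")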

Best way to restart (from beginning) a thread that is sleeping?

I have some Python code I'm writing that interfaces with real-world hardware. It's replacing a hardware PLC. What I'm planning is that when an event trigger happens, I kick off multiple threads to perform certain 'on' actions, then sleep for a set interval, and then perform the corresponding 'off' actions. For example: on a trigger, spawn a thread that turns the room lights on, sleeps for 20 minutes, then turns the lights off and terminates.
However, I will have situations where the event trigger re-occurs. In that scenario I want the entire sequence to start over. My original plan was to use threads with unique names, so if a trigger occurs, check whether the 'lights' thread exists; if it does, kill it and then re-spawn a new 'lights' thread. But in researching around these parts, it seems like people are suggesting that killing a thread is a Very Bad Thing to do.
So what would a better approach be to handling my situation? Note that in my example I only talked about one thread, but in reality there will be many different threads controlling many different devices.
This is Python 3.x on a Raspberry Pi running Raspbian, using RPi.GPIO to monitor my input triggers and an I2C relay board for my output devices, in case any of that info is useful.
Thanks!
The reason for not killing off threads is that it's easy to do it in a way that doesn't give the code any chance to "clean up" appropriately, e.g. finally blocks not run, resources leaked, etc.
There are various ways to get around this; you could wait on an Event as suggested by @Jérôme, treating a timeout as the signal to carry on, as sketched below.
asyncio is another alternative, as CancelledError exceptions tend to be used to clean up nicely and don't have the problems associated with killing native threads.
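A minimal sketch of the Event approach (the GPIO helpers are hypothetical stubs): setting the event restarts the sequence from the beginning, while a timeout means the interval elapsed normally.

import threading

def turn_lights_on():                   # hypothetical GPIO helper
    print("lights on")

def turn_lights_off():                  # hypothetical GPIO helper
    print("lights off")

def lights_sequence(restart, interval=20 * 60):
    while True:
        turn_lights_on()
        # wait() returns True if the event was set (a re-trigger),
        # False if the timeout elapsed normally.
        if restart.wait(timeout=interval):
            restart.clear()             # re-trigger: start the sequence over
            continue
        turn_lights_off()
        return                          # finished normally

restart = threading.Event()
threading.Thread(target=lights_sequence, args=[restart], daemon=True).start()
# On a repeated trigger, call restart.set() instead of killing the thread.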

Will I run into trouble with python's Global Interpreter Lock?

I am aware that this question is rather high-level and may be vague. Please ask if you need any more details and I will try to edit.
I am using QuickFix with Python bindings to consume high-throughput market data from circa 30 markets simultaneously. Most of the computing work is done in separate CPUs via the multiprocessing module. These parallel processes are spawned by the main process on startup. If I wish to interact with the market in any way via QuickFix, I have to do this within the main process, thus any commands (to enter orders, for example) which come from the child processes must be piped (via an mp.Queue object we will call Q) to the main process before execution.
This raises the problem of monitoring Q, which must be done within the main process. I cannot use Q.get(), since this method blocks and my entire main process would hang until something shows up in Q. In order to decrease latency, I must check Q frequently, on the order of 50 times per second. I have been using apscheduler to do this, but I keep getting warnings stating that the run time was missed. These warnings are a serious issue because they prevent me from easily viewing important information.
I have therefore refactored my application to use the code posted by MestreLion as an answer to this question. This is working for me because it starts a new thread from the main process, and it does not print error messages. However, I am worried that this will cause nasty problems down the road.
I am aware of the Global Interpreter Lock in Python (this is why I used the multiprocessing module to begin with), but I don't really understand it. Owing to the high-frequency nature of my application, I do not know if the Q-monitoring thread and the main process consuming lots of incoming messages will compete for resources and slow each other down.
My questions:
Am I likely to run into trouble in this scenario?
If not, can I add more monitoring threads using the present approach and still be okay? There are at least two other things I would like to monitor at high frequency.
Thanks.
@MestreLion's solution that you've linked creates 50 threads per second in your case.
All you need is a single thread to consume the queue without blocking the rest of the main process:
import threading

def consume(queue, sentinel=None):
    # iter() calls queue.get() repeatedly until it returns the sentinel
    for item in iter(queue.get, sentinel):
        pass_to_quickfix(item)

threading.Thread(target=consume, args=[queue], daemon=True).start()
GIL may or may not matter for performance in this case. Measure it.
Without knowing your scenario, it's difficult to say anything specific. Your question suggests that the threads are waiting most of the time in get, so the GIL is not a problem. Interprocess communication may cause problems much earlier. There you could think of switching to another protocol, using some kind of TCP sockets. Then you can write the scheduler more efficiently with select instead of threads, as threads are also slow and resource-consuming. select is a system function that allows you to monitor many socket connections at once; it therefore scales extremely well with the number of connections and needs nearly no CPU power for the monitoring.
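For reference, a minimal sketch of such a select-style loop, here using Python's selectors module (the port and handler are hypothetical and unrelated to QuickFix):

import selectors
import socket

def handle_message(data):               # hypothetical handler
    print("received", data)

sel = selectors.DefaultSelector()

server = socket.socket()
server.bind(("localhost", 9000))        # hypothetical port
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ)

while True:
    # Blocks until a registered socket is ready; one thread can watch many.
    for key, _ in sel.select(timeout=1.0):
        if key.fileobj is server:
            conn, _ = server.accept()
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ)
        else:
            data = key.fileobj.recv(4096)
            if data:
                handle_message(data)
            else:                       # peer closed the connection
                sel.unregister(key.fileobj)
                key.fileobj.close()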

Best practice: Monitor processes

I was wondering what the best-practice solution would be to constantly monitor and restart processes, since there are multiple ways of doing it.
Additional info:
I have a Unix program which uses multiple processes to work. There's a main process; it always starts first and is not likely to die or terminate without stopping the whole program.
Then I spawn multiple "module" processes, which take care of some work and communicate through the main process. Those modules sometimes die because of exceptions, and because it's an external program I can't resolve the issues, so I have to restart them if they die.
I've made a program to check if any of the modules have died and restart them, but I need to run it manually. My program checks whether the PID files of the modules exist and whether they listen on a specific TCP port. If the PID file doesn't exist or a socket connection can't be established, it restarts the module.
My thoughts so far:
Cron job to run the checks every minute and restart any dead modules. (Kind of overkill, because they don't die that frequently.)
Daemon running in the background, which starts the modules and receives notifications if they die, so it doesn't have to check them constantly. (SIGCHLD signal, os.wait)
If I use the daemon method, how should I communicate with the daemon through my interface? (socket, or maybe a file which gets read if the daemon receives a specific signal)
Usually I would just go with the daemon because it seems to be the best-practice method to restart the modules ASAP (cron only runs once a minute), but I wanted to get some opinions from more experienced users. (I've never done something like this before, and asking doesn't hurt anyone :D)
I apologize if these questions are answered somewhere else, but I couldn't find any related question.
P.S. If I forgot something or you need more info, please feel free to ask. :)
I would investigate running the monitoring process as part of a dedicated monitoring framework. Monit is one example; there are of course others.
This has the advantage of providing additional features which might be useful, such as email alerts and analytics. In my experience, you should be able to use your existing program without too much modification, and Monit itself uses few system resources if that is a concern.
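If you do roll your own daemon as described in the question, a minimal sketch of the restart-on-death loop built on os.wait() might look like this (the module commands are hypothetical; os.wait() is Unix-only):

import os
import subprocess

MODULES = {
    "module_a": ["python", "module_a.py"],   # hypothetical commands
    "module_b": ["python", "module_b.py"],
}

procs = {subprocess.Popen(cmd).pid: name for name, cmd in MODULES.items()}

while True:
    pid, status = os.wait()             # blocks until any child dies
    name = procs.pop(pid, None)
    if name is not None:
        child = subprocess.Popen(MODULES[name])  # restart the dead module
        procs[child.pid] = name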

How to detect unresponsive/frozen processes?

I have several scripts that I use to do some web crawling. They are always running, and should never stop. However, after about a week, they systematically "freeze": there is no output anymore, no response to Ctrl+C or anything. The only way is to kill the process and restart it.
I suspect that these issues come from the library I use for retrieving the data (urllib2), but the issue is very hard to reproduce.
I am thus wondering how I could check the state of the process and kill/restart it automatically if it is frozen. I was thinking of creating a PID file and updating it regularly. Another script could then periodically check the last modification date of this PID file and restart the process if it's too old. I could use something like Monit to do the monitoring.
Is this how I should do it? Is there another best practice/common way for checking the responsiveness of a process?
If you have a process that is always running, has no connected terminal, and is the process group leader - that is a daemon. You undoubtedly know all that.
There are some de facto practices in coding programs like that. One is to have a signal handler which takes SIGHUP and forces the program to reinitialize itself. This means closing all of the open log files, rereading config scripts, etc. I do not know how applicable that is to your problem, but it sometimes solves issues like frozen daemons at my work.
You can customize the idea by employing the SIGUSR1 and SIGUSR2 signals to do special things, like writing status to a file, or anything else. Since signals come in on an interrupt, the trap statement in shell scripts and signal handlers in Python itself will push program state onto the interrupt stack and do "stuff".
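A minimal sketch of that status-file idea (the path is hypothetical; SIGUSR1 is Unix-only):

import os
import signal
import time

STATUS_FILE = "/tmp/crawler.status"     # hypothetical path

def dump_status(signum, frame):
    # Runs when SIGUSR1 arrives; keep the handler short and simple.
    with open(STATUS_FILE, "w") as f:
        f.write(f"alive at {time.time()}, pid {os.getpid()}\n")

signal.signal(signal.SIGUSR1, dump_status)

# A monitor can now send `kill -USR1 <pid>` and check that STATUS_FILE
# was just updated; if it wasn't, the process is presumed frozen.
while True:
    time.sleep(1)                       # stands in for the crawling loop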
In your case you may want the program to fork/exec itself and then kill the parent.
