Python Multiprocessing, Trouble Terminating Processes on Restart and Preventing Zombies

Solution:
Thanks to Rick Sanders, calling this function after terminating a process resolves the issue:
os.waitpid(pid, options)
Zombie processes are created when a process is terminated but never reaped (i.e., no one requests its exit code). They linger so that the parent can retrieve that exit code; since my script never truly exits (its process is replaced by execv(file, args)), the parent never requests the exit code and the zombie persists. This fix works on both my OSX and Debian systems.
I am working on a very large script and have recently implemented multiprocessing and IMAP to listen for emails. Before I implemented this, I had implemented a restart command that I can enter at the command line to refresh the script after editing; in a nutshell it does this:
if ipt == ':rs':
    execv(__file__, sys.argv)
It prints out a bunch of crap in the interim, though.
I also have a process running in another object that listens to Google's IMAP server in a while loop, like so:
while True:
    mail = imaplib.IMAP4_SSL('imap.gmail.com')
    mail.login('myemail@gmail.com', 'mypassword')
    mail.list()
    mail.select("inbox")
    result, data = mail.uid('search', None, 'ALL')
    latest_email_uid = data[0].split()[-1]  # grabs the most recent email by unique id number
    if int(latest_email_uid) != int(last_email_uid):  # last_email_uid set earlier from sql database
        pass  # do stuff with the mail
    else:
        continue
Watching top, I noticed I was creating zombies when I restarted, so I created a termination function:
def process_terminator(self):
    self.imap_listener.terminate()
And I called it from restart:
if ipt == ':rs':
    self.process_object.terminate()
    execv(__file__, sys.argv)
However, the zombie processes still persisted. After a few hours of work I realized that adding a time.sleep() period after calling the function, AND either assigning the process' exitcode to a local variable OR printing the process' exitcode, would allow the process to terminate, even if the sleep was just 0.1 seconds:
if ipt == ':rs':
    self.process_object.terminate()
    time.sleep(.1)
    print(self.process_object.imap_listener.exitcode)
    execv(__file__, sys.argv)
This is not the case on OSX, though: there, simply calling a process' .terminate() method ends the process. On my Debian machine, however, I HAVE to have a sleep(n) period AND HAVE to refer to the process' exitcode in some form or fashion to prevent it from becoming a zombie.
I have also tried using .join(), though that hangs my entire script. I have tried creating variables so the process breaks its while loop when (for example) self.terminated = 1, then joining, but that does not work either.
I don't have this issue when running exec('quit'), so long as I terminate the process; .join() does not work.
Can someone please point out any misunderstandings on my part? I have tried doing my own research but have not found a sufficient solution. I am aware that processes should not be explicitly terminated, as they will not exit cleanly, but I've found no other way after hours of work.
Sorry that I do not have more code to provide; I will do my best to provide more if needed. These are just the relevant snippets from my script (1000+ lines).

You might start here: https://en.wikipedia.org/wiki/Zombie_process. The parent process has to reap its children when they exit, for example by using waitpid():
os.waitpid(pid, options)
Waits for a particular child process to terminate and returns the pid of the deceased process, or -1 if there is no such child process. On some systems, a value of 0 indicates that there are processes still running.
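In the asker's setup, that means reaping the listener right after terminating it and before calling execv(). Here is a minimal sketch under those assumptions (the listener body and the ':rs' handling are placeholders based on the question):

import multiprocessing
import os
import sys
import time

def imap_listener():
    while True:
        time.sleep(1)  # stand-in for the IMAP polling loop

if __name__ == "__main__":
    proc = multiprocessing.Process(target=imap_listener)
    proc.start()
    ipt = input()
    if ipt == ':rs':
        proc.terminate()         # sends SIGTERM to the child
        os.waitpid(proc.pid, 0)  # reap the child so no zombie survives the restart
        os.execv(sys.executable, [sys.executable] + sys.argv)

Note that multiprocessing's proc.join() reaps the child the same way; os.waitpid() is simply the lower-level equivalent, and either must happen before the parent's image is replaced by execv().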

Related

subprocess.Popen, kill process started with sudo

I am trying to start and later kill a process that requires sudo via a Python script. Even though the Python script itself is run with sudo and kill() does not give any permission errors, the process is not killed (and never receives SIGKILL).
Investigating this, I found out that Popen() returns the process id of the sudo process (I assume, at least) rather than of the process I want to control. So even when I correctly kill it later, the underlying process keeps running. (Although if I kill the Python program before killing the sudo process in Python code, the underlying process is also killed, so I guess there must be a way to do this manually, too.)
I know it might be an option to use pgrep or pidof to search for the correct process, but as the process name might not be unique, that seems unnecessarily error prone (a process with the same name might also be started around the same time, so taking the latest one might not help).
Is there any solution to reliably get the pid of the underlying process started with sudo in Python?
Using Python 3.
My code for conducting the tests, slightly modified from https://stackoverflow.com/a/43417395/1171541:
import subprocess, time

cmd = ["sudo", "testscript.sh"]

def myfunction(action, process=None):
    if action == "start":
        process = subprocess.Popen(cmd)
        return process
    if action == "stop":
        # kill() and send_signal(signal.SIGTERM) do not work either
        process.terminate()

process = myfunction("start")
time.sleep(5)
myfunction("stop", process)
Okay, I can answer my own question here (I found the answer at https://izziswift.com/how-to-terminate-a-python-subprocess-launched-with-shelltrue/). The trick was to open the process with:
subprocess.Popen(cmd, stdout=subprocess.PIPE, shell=True, preexec_fn=os.setsid)
and then kill it:
os.killpg(os.getpgid(process.pid), signal.SIGTERM)
This time I use a shell to open the process and use the os module to kill all the processes in the process group.
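Put together, a minimal sketch of the whole start/stop cycle under these assumptions (testscript.sh is the placeholder command from the question; with shell=True the command is passed as a single string):

import os
import signal
import subprocess
import time

# Start the command in its own session/process group, so sudo's child is included.
process = subprocess.Popen("sudo ./testscript.sh", stdout=subprocess.PIPE,
                           shell=True, preexec_fn=os.setsid)
time.sleep(5)
# Signal the entire process group rather than just the sudo wrapper.
os.killpg(os.getpgid(process.pid), signal.SIGTERM)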

os.system and subprocess.run make my multi threaded process freeze until call ends

I am new to Python and am having some problems.
I wrote an update_manager class that can communicate with the user via TCP and perform installations of different components.
My update_manager class uses 2 other classes (they are its members) to accomplish this. The first is used for TCP communication and the second for the actual installation. The installation class runs from the main thread, and the communication runs in a thread created with threading.Thread().
My main looks like this:
if __name__ == "__main__":
    new_update = UpdateManager()
    #time.sleep(10)
    new_update.run()
and the run function is:
def run(self):
    comm_thread = threading.Thread(
        target=self._comm_agent.start_server_tcp_comunication)
    comm_thread.start()
    while True:
        if (False == self.is_recovery_required()):
            self.calculate_free_storage_for_update_zip_extraction()
            self.start_communication_with_client_in_state_machine()
            self._comm_agent.disable_synchronized_communication()
            self.start_update_install()
            self._comm_agent.enable_synchronized_communication()
            if (True == self.is_dry_run_requested()):
                self.preform_cleanup_after_dry_run()
        else:
            self.reset_valid_states()
            self.preform_clean_up_after_update_cycle()
I use 2 multiprocessing.Queue() objects to sync between the threads and with the user: one for incoming messages and one for outgoing messages.
At first, TCP communication is synchronous; the user provides the installation file and a few other things. Once installation begins, TCP communication is no longer synchronous.
During the installation I use 4 different install methods, and all but one work just fine with no problem (the user can poll the update_manager process, ask progress questions, and get an immediate reply).
The problematic one is the installation of rpm files. For this I tried calling os.system() and subprocess.run(), and it works, but for big rpm files I noticed the entire process with my threads freezes until the call finishes (I can see the progress bar of the rpm installation on my screen during this freeze).
What I noticed and tried:
1. There is no freeze during the other installation methods, which use Python libraries.
2. Once the user connects via TCP there are only 2 threads for the update_manager; once the first request is sent and a reply is sent back, 2 more threads appear (I assume it has something to do with the queues I use).
3. I created a third thread that prints the time (and has nothing to do with the queues), and I start it as soon as the update_manager process starts. When the 2 threads freeze, this one keeps going.
4. On some rare occasions the process will unfreeze just long enough for a message to go through from the client to the update_manager, then freeze back.
Edit: I forgot one more important point:
5. The freeze occurs when calling:
os.system("rpm --nodeps --force -ivh rpm_file_name")
But it does not happen when calling:
os.system("sleep 5")
I would really appreciate some insight. Thanks.
The problem was with the incoming queue.
I used:
if (True == self._outgoing_message_queue.empty()):
    temp: dict = self._outgoing_message_queue.get()
This is a simple bug: the thread just got stuck on an empty queue.
But even if the code is changed to
if (False == self._outgoing_message_queue.empty()):
    temp: dict = self._outgoing_message_queue.get()
it might cause the same behavior, because between the moment the if statement is evaluated and the moment get() is called, a context switch might occur, the queue might become empty, and the thread would get stuck on .get() just as in my original code.
A better solution is to use get_nowait() and handle the empty case explicitly:
import queue

try:
    temp = self._outgoing_message_queue.get_nowait()
except queue.Empty:  # raised when the queue has nothing to hand out
    temp = None

Windows python script to run a server and continue

I am working on a Python script to launch a server, perhaps in the background or in a different process, and then do some further processing. Once the rest of the processing is over, the script should kill the launched server.
For Example
server_cmd = 'launch_server.exe -source '+ inputfile
print server_cmd
cmd_pid = subprocess.Popen(server_cmd).pid
...
...
... #Continue doing some processing
cmd_pid.terminate() # Once the processing is done, terminate the server
Somehow the script does not continue after launching the server, as the server may be running in an infinite loop listening for requests. Is there a good way to send this process to the background so that it doesn't wait for command-line input?
I am using Python 2.7.8
It's odd that your script does not continue after launching the server command. In the subprocess module, Popen starts another child process while the parent process (your script) moves on.
However, there is already a bug in your code: cmd_pid is an int object and does not have a terminate method. You should call terminate on the subprocess.Popen object instead.
Making a small change resolved the problem:
server_proc = subprocess.Popen(server_cmd, stdout=subprocess.PIPE)
server_proc.terminate()
Thanks Xu for the correction regarding terminate.
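For completeness, a minimal sketch of the corrected flow (launch_server.exe and inputfile are placeholders from the question; the argument-list form avoids shell quoting issues):

import subprocess

server_cmd = ['launch_server.exe', '-source', inputfile]
server_proc = subprocess.Popen(server_cmd, stdout=subprocess.PIPE)
# ... continue doing some processing while the server runs ...
server_proc.terminate()  # terminate via the Popen object, not the bare pid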

Python Daemon: checking to have one daemon run at all times

myalert.py
from daemon import Daemon
import os, time, sys

class alertDaemon(Daemon):
    def run(self):
        while True:
            time.sleep(1)

if __name__ == "__main__":
    alert_pid = '/tmp/ex.pid'
    # if pid file doesn't exist, run
    if os.path.isfile(alert_pid):  # is this check enough?
        sys.exit(0)
    daemon = alertDaemon(alert_pid)
    daemon.start()
Given that no other programs or users will create the pid file:
1) Is there a case where the pid file does not exist, yet the daemon process is still running?
2) Is there a case where the pid file exists, yet the daemon isn't running?
Because if the answer is yes to at least one of the questions above, then simply checking for the existence of the pid file isn't enough if my goal is to have one daemon running at all times.
Q: If I have to check for the process, I am hoping to avoid something like a system call to ps -ef and grepping for the name of the script. Is there a standard way of doing this?
Note: the script, myalert.py, will be a cronjob
The python-daemon library, which is the reference implementation for PEP 3143 ("Standard daemon process library"), handles this by using a file lock (via the lockfile library) on the pid file you pass to the DaemonContext object. The underlying OS guarantees that the file lock will be released when the daemon process exits, even if it exits uncleanly. Here's a simple usage example:
import daemon
from daemon.pidfile import PIDLockFile

context = daemon.DaemonContext(
    pidfile=PIDLockFile('/var/run/spam.pid'),
)

with context:
    main()
So, if a new instance starts up, it doesn't have to determine if the process that created the existing pid file is still running via the pid itself; if it can acquire the file lock, then no other instances are running (since they'd have acquired the lock). If it can't acquire the lock, then another daemon instance must be running.
The only way you'd run into trouble is if someone came along and manually deleted the pid file while the daemon was running. But I don't think you need to worry about someone deliberately breaking things in that way.
Ideally, python-daemon would be part of the standard library, as was the original goal of PEP 3143. Unfortunately, the PEP got deferred, essentially because there was no one willing to do the remaining work needed to get it added to the standard library:
Further exploration of the concepts covered in this PEP has been
deferred for lack of a current champion interested in promoting the
goals of the PEP and collecting and incorporating feedback, and with
sufficient available time to do so effectively.
Several ways in which I saw this implemented:
Check whether the pidfile exists -> if so, exit with an error message like "pid file exists -- rm it if you're sure no process is running".
Check whether the pidfile exists -> if so, check whether a process with that pid exists -> if that's the case, die telling the user "process is running...". The risk of a conflicting (reused for another process) PID number is so small that it is simply ignored; instead, tell the user how to make the program start again in case an error occurred.
Hint: to check for a process' existence, you can check for the /proc/<pid> directory.
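Besides /proc/<pid>, which is Linux-specific, a portable POSIX check is to send signal 0, which performs the existence and permission checks without delivering anything. A small sketch (the helper name is mine, not from the question):

import os

def pid_running(pid):
    # Signal 0 does not touch the process; it only checks that it exists.
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True   # exists, but owned by another user
    return True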
Also make sure you do everything possible to remove the pidfile when your script exits, e.g.:
Wrap the code in a try .. finally:
# Check & create pidfile
try:
    # your application logic
finally:
    # remove pidfile
You can even install signal handlers (via the signal module) to remove the pidfile upon receiving signals that would not normally raise an exception but would instead exit directly.
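For instance, a sketch of that approach, assuming the pidfile path from the question (the SIGTERM handler turns the signal into a normal SystemExit, so finally blocks and atexit hooks still run):

import atexit
import os
import signal
import sys

alert_pid = '/tmp/ex.pid'

def remove_pidfile():
    if os.path.isfile(alert_pid):
        os.remove(alert_pid)

atexit.register(remove_pidfile)

def handle_sigterm(signum, frame):
    sys.exit(0)  # raises SystemExit, so cleanup handlers get to run

signal.signal(signal.SIGTERM, handle_sigterm)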

Cleaning up temp folder after long-running subprocess exits

I have a Python script (running inside another application) which generates a bunch of temporary images. I then use subprocess to launch an application to view these.
When the image-viewing process exits, I want to remove the temporary images.
I can't do this from Python, as the Python process may have exited before the subprocess completes; i.e., I cannot do the following:
p = subprocess.Popen(["imgviewer", "/example/image1.jpg", "/example/image2.jpg"])
p.communicate()
os.unlink("/example/image1.jpg")
os.unlink("/example/image2.jpg")
...as this blocks the main thread; nor could I check for the pid exiting in a thread, etc.
The only solution I can think of means I have to use shell=True, which I would rather avoid:
import pipes
import subprocess

cmd = ['imgviewer']
cmd.append("/example/image2.jpg")

for x in cleanup:
    cmd.extend(["&&", "rm", pipes.quote(x)])

cmdstr = " ".join(cmd)
subprocess.Popen(cmdstr, shell=True)
This works, but is hardly elegant..
Basically, I have a background subprocess, and want to remove the temp files when it exits, even if the Python process no longer exists.
If you're on any variant of Unix, you could fork your Python program and have the parent process go on with its life while the child process daemonizes, runs the viewer (it doesn't matter in the least if that blocks the child process, which has no other job in life anyway;-), and cleans up after it. The original Python process may or may not exist at this point, but the "waiting to clean up" child process of course will (some process or other has to do the clean-up, after all, right?-).
If you're on Windows, or need cross-platform code, then have your Python program "spawn" (i.e., just start with subprocess, then go on with life) another (much smaller) one, which is the one tasked to run the viewer (blocking, who cares) and then do the clean-up. (If on Unix, even in this case you may want to daemonize, otherwise the child process might go away when the parent process does).
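A minimal POSIX sketch of the fork-and-clean-up idea (imgviewer and the image paths come from the question; full daemonization is reduced here to starting a new session):

import os
import subprocess

def view_then_cleanup(images):
    # Fork: the parent returns immediately and goes on with its life.
    if os.fork() != 0:
        return
    os.setsid()  # detach the child from the parent's session
    try:
        subprocess.call(["imgviewer"] + images)  # blocks until the viewer exits
    finally:
        for path in images:
            try:
                os.unlink(path)
            except OSError:
                pass
    os._exit(0)  # never return into the parent's code path

view_then_cleanup(["/example/image1.jpg", "/example/image2.jpg"])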
