gevent - hub.loop.reinit() does not work after fork - python

The do_magic function is called twice in the following example, in both the parent and the child process.
My confusion: os.fork has been replaced with gevent.fork (by monkey.patch_all()), and hub.loop.reinit() is called in the child process. If so, why is do_magic still called in the child?
import gevent
from gevent import monkey
monkey.patch_all()
import os, time

def do_magic():
    print 'magic...'

def main():
    g = gevent.spawn_later(1, do_magic)
    pid = os.fork()
    if pid != 0:  # parent
        g.join()
    else:         # child
        gevent.get_hub().loop.reinit()
        time.sleep(3)

main()
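For illustration, a hedged sketch of one workaround (not an explanation of loop.reinit()): kill the copy of the greenlet that the child inherits, so the timer only fires in the parent. This assumes gevent's Greenlet.kill() prevents a not-yet-started greenlet from ever running, as its documentation describes.

import gevent
from gevent import monkey
monkey.patch_all()
import os, time

def do_magic():
    print 'magic... pid=%d' % os.getpid()

def main():
    g = gevent.spawn_later(1, do_magic)
    pid = os.fork()
    if pid != 0:  # parent
        g.join()
    else:         # child
        gevent.get_hub().loop.reinit()
        g.kill()  # drop the inherited timer greenlet; do_magic now runs only in the parent
        time.sleep(3)

main()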

Related

Tornado ioloop instance seems to be shared across processes

In a multiprocessing application, a main process spawns multiple sub processes. Each process is meant to run its own Tornado ioloop. However, I noticed that when the processes are started, all the instances of IOLoop.current() (in main and in all the sub processes) are the same. Wouldn't that mean that ioloop.spawn_callback(my_func) runs everything in one ioloop context (in the main process)?
Here's a minimal example that I could extract:
from tornado.ioloop import IOLoop
import time
from multiprocessing import Process

def sub(i):
    print('sub %d: %s' % (i, hex(id(IOLoop.current(True)))))
    for i in range(10):
        time.sleep(1)

def main():
    print('main ', hex(id(IOLoop.current(True))))
    for i in range(2):
        sub_process = Process(target=sub, args=(i, ))
        sub_process.daemon = True
        sub_process.start()
    time.sleep(5)

main()
Output:
main 0x7f14a09cf750
sub 0: 0x7f14a09cf750
sub 1: 0x7f14a09cf750
Are the processes created correctly and isn't the expected behaviour that there would be multiple ioloop instances?
This is mentioned in Tornado's docs
it is important that nothing touches the global IOLoop instance (even indirectly) before the fork
You can get the behavior you want using a slightly modified main function:
def main():
    processes = []
    for i in range(2):
        process = Process(target=sub, args=(i,))
        process.daemon = True
        process.start()
        processes.append(process)
    print('main ', hex(id(IOLoop.current(True))))
    time.sleep(5)
Output:
main 0x7fbd4ca0da30
sub 0: 0x7fbd4ca0db50
sub 1: 0x7fbd4ca0dc40
Edit
As for the explanation: the sharing is due to how fork is implemented on Linux, using COW (copy-on-write). This means that unless the child process writes to a shared object, parent and child keep sharing the same object; as soon as the child modifies it, the memory is copied and changed (and those changes are not visible in the parent).
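A minimal sketch of that effect with os.fork alone (POSIX-only; the variable name is illustrative): an object created before the fork reports the same id() in parent and child, because the child starts with a lazily copied duplicate of the parent's address space.

import os

before = object()                      # created before the fork
print('parent:', hex(id(before)))

pid = os.fork()
if pid == 0:                           # child
    # same virtual address as in the parent, hence the identical id()
    print('child: ', hex(id(before)))
    os._exit(0)
os.waitpid(pid, 0)                     # parent reaps the child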

Python callback for a multiprocess Queue or Pipe

Is there a way to create a callback that executes whenever something is sent to the main process from a child process initiated via multiprocessing? The best I can think of thus far is:
import multiprocessing as mp
import threading
import time

class SomeProcess(mp.Process):
    def run(self):
        while True:
            time.sleep(1)
            self.queue.put(time.time())

class ProcessListener(threading.Thread):
    def run(self):
        while True:
            value = self.queue.get()
            do_something(value)

if __name__ == '__main__':
    queue = mp.Queue()

    sp = SomeProcess()
    sp.queue = queue

    pl = ProcessListener()
    pl.queue = queue

    sp.start()
    pl.start()
No, there is no cleaner way to do this than the one you already posted.
This is how concurrent.futures.ProcessPoolExecutor and multiprocessing.Pool are actually implemented: they have a dedicated thread which drains the tasks/results queue and runs any associated callback.
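For comparison, a minimal sketch of that machinery as exposed by concurrent.futures (the produce/on_result names are placeholders): each future's add_done_callback fires in an executor-internal thread in the parent process once the worker's result arrives.

import concurrent.futures
import time

def produce():
    # runs in a worker process
    time.sleep(1)
    return time.time()

def on_result(future):
    # runs in the parent process, in an executor-internal thread
    print('got', future.result())

if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor() as pool:
        for _ in range(3):
            pool.submit(produce).add_done_callback(on_result)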
If you want to save some resources, you can use a SimpleQueue in this case.
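A minimal sketch of that variant, assuming Python 3's multiprocessing.SimpleQueue (the worker, listener, and None sentinel are illustrative, not from the original post):

import multiprocessing as mp
import threading
import time

def worker(queue):
    # child process: push a value every second, then a sentinel
    for _ in range(3):
        time.sleep(1)
        queue.put(time.time())
    queue.put(None)

def listener(queue, callback):
    # parent-side thread: drain the queue and fire the callback
    while True:
        value = queue.get()
        if value is None:
            break
        callback(value)

if __name__ == '__main__':
    queue = mp.SimpleQueue()   # lighter than mp.Queue: no feeder thread or buffer
    proc = mp.Process(target=worker, args=(queue,))
    thread = threading.Thread(target=listener, args=(queue, print))
    proc.start()
    thread.start()
    proc.join()
    thread.join()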

Register SIGTERM handler only for parent process

I have the following program (this is a stripped-down reproduction of a more complex program, but it pretty much covers my problem).
A SIGTERM handler is registered before spawning the new process. I cannot find a way to prevent the child process from inheriting this handler. I want to do some cleanup activities, but only once, in the parent; the child process should not have any SIGTERM handler.
One way might be to overwrite the SIGTERM handler (after the process is spawned) with one that does nothing, but that seems like redundant code. Can someone help me explore other ways to do this?
from multiprocessing import Process
import signal
import os
import time
import psutil

def terminateChildProcesses():
    """
    Terminate all child processes.
    """
    current = psutil.Process()
    children = current.children(recursive=True)
    for child in children:
        print "Terminating %s: %s" % (child.pid, ''.join(child.cmdline()))
        child.terminate()

def target():
    time.sleep(100)

if __name__ == "__main__":
    def handle_sigterm(*a):
        print "I am handled: {}".format(os.getpid())
        # terminate child processes
        terminateChildProcesses()
        os.kill(os.getpid(), 9)
    signal.signal(signal.SIGTERM, handle_sigterm)

    p = Process(target=target)
    p.start()

    target()
I think you can use multiprocessing.current_process() to register this handler conditionally:
from multiprocessing import current_process

if current_process().name == 'MainProcess':
    signal.signal(signal.SIGTERM, handle_sigterm)
Starting with Python 3.7, you can also use os.register_at_fork() to restore the previous handler:
import os
import signal
handler = signal.signal(signal.SIGTERM, handle_sigterm)
os.register_at_fork(after_in_child=lambda: signal.signal(signal.SIGTERM, handler))

Pythonic way to detach a process?

I'm running an etcd process, which stays active until you kill it. (It doesn't provide a daemon mode option.) I want to detach it so I can keep running more python.
What I would do in the shell:
etcd & next_cmd
I'm using python's sh library, at the enthusiastic recommendation of the whole internet. I'd rather not dip into subprocess or Popen, but I haven't found solutions using those either.
What I want:
sh.etcd(detach=True)
sh.next_cmd()
or
sh.etcd("&")
sh.next_cmd()
Unfortunately detach is not a kwarg and sh treats "&" as a flag to etcd.
Am I missing anything here? What's the good way to do this?
To implement the shell's & behaviour, avoid cargo-cult programming and use the subprocess module directly:
import subprocess
etcd = subprocess.Popen('etcd') # continue immediately
next_cmd_returncode = subprocess.call('next_cmd') # wait for it
# ... run more python here ...
etcd.terminate()
etcd.wait()
This ignores exception handling and your mention of "daemon mode" (if you want to implement a daemon in Python, use python-daemon; to run a process as a system service, use whatever your OS provides or a supervisor program such as supervisord).
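For reference, a minimal sketch of the python-daemon route (run_my_service is a hypothetical long-running function):

import daemon   # the python-daemon package (PEP 3143)

def run_my_service():
    ...          # hypothetical long-running work

with daemon.DaemonContext():
    run_my_service()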
Author of sh here. I believe you want to use the _bg special keyword parameter http://amoffat.github.io/sh/#background-processes
This will fork your command and return immediately. The process will continue to run even after your script exits.
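A minimal sketch of that, reusing the question's placeholder command names:

import sh

etcd = sh.etcd(_bg=True)   # forks etcd and returns immediately
sh.next_cmd()              # the question's placeholder for the next command
# ... run more python here while etcd keeps running ...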
Note: in the following two examples there is a call to time.sleep(...) to give etcd time to finish starting up before we send it a request. A real solution would probably involve probing the API endpoint to see if it is available, and looping if not.
Option 1 (abusing the multiprocessing module):
import sh
import requests
import time
from multiprocessing import Process

etcd = Process(target=sh.etcd)
try:
    # start etcd
    etcd.start()
    time.sleep(3)

    # do other stuff
    r = requests.get('http://localhost:4001/v2/keys/')
    print r.text
finally:
    etcd.terminate()
This uses the multiprocessing module to handle the mechanics of spawning a background task. With this model you won't see the output from etcd.
Option 2 (tried and true):
import os
import signal
import time
import requests

pid = os.fork()
if pid == 0:
    # start etcd
    os.execvp('etcd', ['etcd'])

try:
    # do other stuff
    time.sleep(3)
    r = requests.get('http://localhost:4001/v2/keys/')
    print r.text
finally:
    os.kill(pid, signal.SIGTERM)
This uses the traditional fork and exec model, which works just as well in Python as it does in C. In this model, the output of etcd will show up on your console, which may or may not be what you want. You can control this by redirecting stdout and stderr in the child process.
subprocess can do this easily enough too. This approach works (Python 3); the key is using start_new_session=True.
UPDATE: despite the Popen docs saying this works, it did not for me. I found that by forking in the child and then calling os.setsid(), it works the way I want.
client.py:
#!/usr/bin/env python3
import time
import subprocess

subprocess.Popen("python3 child.py", shell=True, start_new_session=True)

i = 0
while True:
    i += 1
    print("demon: %d" % i)
    time.sleep(1)
child.py:
#!/usr/bin/env python3
import time
import subprocess
import os

pid = os.fork()
if (pid == 0):
    os.setsid()
    i = 0
    while True:
        i += 1
        print("child: %d" % i)
        time.sleep(1)
        if i == 10:
            print("child exiting")
            break
output:
./client.py
demon: 1
child: 1
demon: 2
child: 2
^CTraceback (most recent call last):
File "./client.py", line 9, in <module>
time.sleep(1)
KeyboardInterrupt
$ child: 3
child: 4
child: 5
child: 6
child: 7
child: 8
child: 9
child: 10
child exiting
Posting this, if for no other reason than to find it the next time I google the same question:
import os
import subprocess
import sys

if os.fork() == 0:
    os.close(0)
    os.close(1)
    os.close(2)
    subprocess.Popen(('etcd'), close_fds=True)
    sys.exit(0)
Popen's close_fds=True closes file descriptors other than 0, 1, and 2 in the child, so the code closes those three explicitly.

how to kill zombie processes created by multiprocessing module?

I'm very new to the multiprocessing module. I just tried to create the following: I have one process whose job is to get messages from RabbitMQ and pass them to an internal queue (multiprocessing.Queue). Then I want to spawn a process whenever a new message comes in. This works, but after the job is finished it leaves a zombie process that is not terminated by its parent. Here is my code:
Main Process:
#!/usr/bin/env python
import multiprocessing
import logging
import consumer
import producer
import worker
import time
import base

conf = base.get_settings()
logger = base.logger(identity='launcher')

request_order_q = multiprocessing.Queue()
result_order_q = multiprocessing.Queue()
request_status_q = multiprocessing.Queue()
result_status_q = multiprocessing.Queue()

CONSUMER_KEYS = [{'queue': 'product.order',
                  'routing_key': 'product.order',
                  'internal_q': request_order_q}]
                 # {'queue': 'product.status',
                 #  'routing_key': 'product.status',
                 #  'internal_q': request_status_q}]

def main():
    # Launch consumers
    for key in CONSUMER_KEYS:
        cons = consumer.RabbitConsumer(rabbit_q=key['queue'],
                                       routing_key=key['routing_key'],
                                       internal_q=key['internal_q'])
        cons.start()

    # Check request_order_q; if not empty, spawn a process to handle the message
    while True:
        time.sleep(0.5)
        if not request_order_q.empty():
            handler = worker.Worker(request_order_q.get())
            logger.info('Launching Worker')
            handler.start()

if __name__ == "__main__":
    main()
And here is my Worker:
import multiprocessing
import sys
import time
import base

conf = base.get_settings()
logger = base.logger(identity='worker')

class Worker(multiprocessing.Process):

    def __init__(self, msg):
        super(Worker, self).__init__()
        self.msg = msg
        self.daemon = True

    def run(self):
        logger.info('%s' % self.msg)
        time.sleep(10)
        sys.exit(1)
So after all the messages are processed I can still see the processes with the ps aux command, but I would really like them to be terminated once they finish.
Thanks.
Using multiprocessing.active_children is better than Process.join. The function active_children reaps any zombies created since the last call to active_children. The method join awaits the selected process; during that time, other processes can terminate and become zombies, but the parent process will not notice until the awaited process is joined. To see this in action:
import multiprocessing as mp
import time

def main():
    n = 3
    c = list()
    for i in range(n):
        d = dict(i=i)
        p = mp.Process(target=count, kwargs=d)
        p.start()
        c.append(p)
    for p in reversed(c):
        p.join()
        print('joined')

def count(i):
    print(f'{i} going to sleep')
    time.sleep(i * 10)
    print(f'{i} woke up')

if __name__ == '__main__':
    main()
The above will create 3 processes that terminate 10 seconds apart each. As the code is, the last process is joined first, so the other two, which terminated earlier, will be zombies for 20 seconds. You can see them with:
ps aux | grep Z
There will be no zombies if the processes are awaited in the sequence in which they terminate. Remove the call to reversed to see this case. However, in real applications we rarely know the sequence in which children will terminate, so using multiprocessing.Process.join will result in some zombies.
The alternative active_children does not leave any zombies.
In the above example, replace the loop for p in reversed(c): with:
while True:
    time.sleep(1)
    if not mp.active_children():
        break
and see what happens.
A couple of things:
Make sure the parent joins its children, to avoid zombies. See Python Multiprocessing Kill Processes
You can check whether a child is still running with the is_alive() member function. See http://docs.python.org/2/library/multiprocessing.html#multiprocessing.Process
Use active_children.
multiprocessing.active_children
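Applied to the question's launcher loop, a hedged sketch: calling multiprocessing.active_children() on each pass reaps any Worker processes that have already exited (request_order_q, worker, and logger are the names from the question):

import multiprocessing
import time

while True:
    time.sleep(0.5)
    multiprocessing.active_children()   # joins children that have exited, so no zombies linger
    if not request_order_q.empty():
        handler = worker.Worker(request_order_q.get())
        logger.info('Launching Worker')
        handler.start()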
