Strange process clone appears with python multiprocessing

I have faced very strange behavior in Python. When I start a parallel program that uses multiprocessing and spawns 2 more processes (producer, consumer) from the main process, I see 4 processes running. I think there should be only 3: the main, Producer, and Consumer. But after some time the 4th process appears.
I have made a minimal example of the code to reproduce the problem. It creates two processes which calculate Fibonacci numbers using recursion:
from multiprocessing import Process, Queue
import os, sys
import time
import signal

def fib(n):
    if n == 1 or n == 2:
        return 1
    result = fib(n-1) + fib(n-2)
    return result

def worker(queue, amount):
    pid = os.getpid()
    def workerProcess(a, b):
        print a, b
        print 'This is Writer(', pid, ')'
    signal.signal(signal.SIGUSR1, workerProcess)
    print 'Worker', os.getpid()
    for i in range(0, amount):
        queue.put(fib(35 - i % 4))
    queue.put('end')
    print 'Worker finished'

def writer(queue):
    pid = os.getpid()
    def writerProcess(a, b):
        print a, b
        print 'This is Writer(', pid, ')'
    signal.signal(signal.SIGUSR1, writerProcess)
    print 'Writer', os.getpid()
    working = True
    while working:
        if not queue.empty():
            value = queue.get()
            if value != 'end':
                fib(32 + value % 4)
            else:
                working = False
        else:
            time.sleep(1)
    print 'Writer finished'

def daemon():
    print 'Daemon', os.getpid()
    while True:
        time.sleep(1)

def useProcesses(amount):
    q = Queue()
    writer_process = Process(target=writer, args=(q,))
    worker_process = Process(target=worker, args=(q, amount))
    writer_process.daemon = True
    worker_process.daemon = True
    worker_process.start()
    writer_process.start()

def run(amount):
    print 'Main', os.getpid()
    pid = os.getpid()
    def killThisProcess(a, b):
        print a, b
        print 'Main killed by signal(', pid, ')'
        sys.exit(0)
    signal.signal(signal.SIGTERM, killThisProcess)
    useProcesses(amount)
    print 'Ready to exit main'
    while True:
        time.sleep(1)

def main():
    run(1000)

if __name__=='__main__':
    main()
What I see in the output is:
$ python python_daemon.py
Main 13257
Ready to exit main
Worker 13258
Writer 13259
but in htop I see the following (htop screenshot omitted):
And it looks like the process with PID 13322 is actually a thread. The question is: what is it? Who spawned it? Why?
If I send SIGUSR1 to this PID, the following appears in the output:
10 <frame object at 0x7f05c14ed5d8>
This is Writer( 13258 )
This question is slightly related to: Python multiprocessing: more processes than requested

The thread belongs to the Queue object.
Internally, it uses a thread to dispatch the data over a Pipe.
From the docs:
class multiprocessing.Queue([maxsize])
Returns a process shared queue implemented using a pipe and a few locks/semaphores. When a process first puts an item on the queue a feeder thread is started which transfers objects from a buffer into the pipe.
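A quick way to see this for yourself is to list the interpreter's threads before and after the first put(). This is only a minimal sketch (assuming CPython, where the feeder is a regular threading.Thread named QueueFeederThread); htop simply displays that thread as an extra line under the owning process:
from multiprocessing import Queue
import threading

q = Queue()
print [t.name for t in threading.enumerate()]   # only MainThread so far
q.put('hello')                                  # the first put() starts the feeder thread
print [t.name for t in threading.enumerate()]   # MainThread plus QueueFeederThread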

Related

The same process in the process pool will be executed multiple times

I wrote a Python script to test the multiprocessing pool and used apply_async to call a class method. But why does the same process (same pid) appear multiple times in the output?
OS: centos-7.4
PYTHON: python-2.7
#!/usr/bin/env python
import time
import os
from multiprocessing import Pool

class New(object):
    def __init__(self):
        self.pid = os.getpid()

    def gen(self, num):
        pid = os.getpid()
        print 'NEW PROCESS PID IS {}'.format(pid)
        return (pid, num)

    def log(self, pid):
        print 'START WRITE {} INTO FILE'.format(pid[0])
        with open('log', 'a') as f:
            f.write('CURRENT PROCESS IS {} <--> NUM IS {}\n'.format(pid[0], pid[1]))

    def start(self):
        print 'CREATE MAIN PROCESS {}'.format(self.pid)
        self.pool = Pool()
        num = 0
        while True:
            narg = num
            self.pool.apply_async(self, args=(narg,), callback=self.log)
            num += 1
            time.sleep(2)
        self.pool.close()
        self.pool.join()

    def __call__(self, num):
        return self.gen(num)

    def __getstate__(self):
        self_dict = self.__dict__.copy()
        del self_dict['pool']
        return self_dict

    def __setstate__(self, state):
        self.__dict__.update(state)

if __name__ == '__main__':
    new = New()
    new.start()
The following is the output printed by the script; the same process id appears twice. For example:
NEW PROCESS PID IS 14459
START WRITE 14459 INTO FILE
NEW PROCESS PID IS 14459
START WRITE 14459 INTO FILE
The callback of apply_async writes some lines into a file. Its output at the same time is as follows:
CURRENT PROCESS IS 14459 <--> NUM IS 29
CURRENT PROCESS IS 14459 <--> NUM IS 30
I just want one print and one write per process.
The behavior you're observing is expected. The point of using a multiprocessing.Pool() is to distribute the work over a pool of workers (i.e., processes), so the same worker process will pick up many tasks. See multiprocessing.Pool with maxtasksperchild produces equal PIDs for one way to achieve what you want. But honestly, it seems to me you should just be using multiprocessing.Process() if you want to spawn a new process for each iteration of the inner loop.
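For reference, here is a minimal module-level sketch of the maxtasksperchild approach that the linked question describes (gen and log are simplified stand-ins for the methods in the snippet above): with maxtasksperchild=1, each worker process handles at most one task before being replaced, so every task is printed by a distinct pid.
import os
from multiprocessing import Pool

def gen(num):
    pid = os.getpid()
    print 'NEW PROCESS PID IS {}'.format(pid)
    return (pid, num)

def log(result):
    print 'START WRITE {} INTO FILE'.format(result[0])

if __name__ == '__main__':
    pool = Pool(maxtasksperchild=1)   # replace each worker after a single task
    for num in range(4):
        pool.apply_async(gen, args=(num,), callback=log)
    pool.close()
    pool.join()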

Timeouts for multiprocessing?

I've searched StackOverflow and although I've found many questions on this, I haven't found an answer that fits my situation; I'm not a strong enough Python programmer to adapt the existing answers to my needs.
I've looked here to no avail:
kill a function after a certain time in windows
Python: kill or terminate subprocess when timeout
signal.alarm replacement in Windows [Python]
I am using multiprocessing to run multiple SAP windows at once to pull reports. It is set up to run on a schedule every 5 minutes. Every once in a while, one of the reports stalls due to the GUI interface and never ends. I don't get an error or exception; it just stalls forever. What I would like is a timeout so that if the part of the code that runs in SAP takes longer than 4 minutes, it times out, closes SAP, skips the rest of the code, and waits for the next scheduled report time.
I am using Python 2.7 on Windows.
import multiprocessing
from multiprocessing import Manager, Process
import time
import datetime

### OPEN SAP ###
def start_SAP():
    print 'opening SAP program'

### REPORTS IN SAP ###
def report_1(q, lock):
    while True:  # logic to get shared queue
        if not q.empty():
            lock.acquire()
            k = q.get()
            time.sleep(1)
            lock.release()
            break
        else:
            time.sleep(1)
    print 'running report 1'

def report_2(q, lock):
    while True:  # logic to get shared queue
        if not q.empty():
            lock.acquire()
            k = q.get()
            time.sleep(1)
            lock.release()
            break
        else:
            time.sleep(1)
    print 'running report 2'

def report_3(q, lock):
    while True:  # logic to get shared queue
        if not q.empty():
            lock.acquire()
            k = q.get()
            time.sleep(1)
            lock.release()
            break
        else:
            time.sleep(1)
    time.sleep(60000)  # mimicking the stall for report 3 that takes longer than allotted time
    print 'running report 3'

def report_N(q, lock):
    while True:  # logic to get shared queue
        if not q.empty():
            lock.acquire()
            k = q.get()
            time.sleep(1)
            lock.release()
            break
        else:
            time.sleep(1)
    print 'running report N'

### CLOSES SAP ###
def close_SAP():
    print 'closes SAP'

def format_file():
    print 'formatting files'

def multi_daily_pull():
    lock = multiprocessing.Lock()  # creating a lock in multiprocessing
    shared_list = range(6)  # creating a shared list for all functions to use
    q = multiprocessing.Queue()  # creating an empty queue in multiprocessing
    for n in shared_list:  # putting list into the queue
        q.put(n)
    print 'Starting process at ', time.strftime('%m/%d/%Y %H:%M:%S')
    print 'Starting SAP Pulls at ', time.strftime('%m/%d/%Y %H:%M:%S')
    StartSAP = Process(target=start_SAP)
    StartSAP.start()
    StartSAP.join()
    report1 = Process(target=report_1, args=(q, lock))
    report2 = Process(target=report_2, args=(q, lock))
    report3 = Process(target=report_3, args=(q, lock))
    reportN = Process(target=report_N, args=(q, lock))
    report1.start()
    report2.start()
    report3.start()
    reportN.start()
    report1.join()
    report2.join()
    report3.join()
    reportN.join()
    EndSAP = Process(target=close_SAP)
    EndSAP.start()
    EndSAP.join()
    formatfile = Process(target=format_file)
    formatfile.start()
    formatfile.join()

if __name__ == '__main__':
    multi_daily_pull()
One way to do what you want would be to use the optional timeout argument that the Process.join() method accepts. This makes it block the calling thread for at most that length of time.
I also set the daemon attribute of each Process instance so your main thread will be able to terminate even if one of the processes it started is still "running" (or has hung).
One final point: you don't need a multiprocessing.Lock to control access to a multiprocessing.Queue, because it handles that aspect of things automatically, so I removed it. You may still want one for some other reason, such as controlling access to stdout so printing from the various processes doesn't overlap and mess up what is output to the screen.
import multiprocessing
from multiprocessing import Process
import time
import datetime

def start_SAP():
    print 'opening SAP program'

### REPORTS IN SAP ###
def report_1(q):
    while True:  # logic to get shared queue
        if q.empty():
            time.sleep(1)
        else:
            k = q.get()
            time.sleep(1)
            break
    print 'report 1 finished'

def report_2(q):
    while True:  # logic to get shared queue
        if q.empty():
            time.sleep(1)
        else:
            k = q.get()
            time.sleep(1)
            break
    print 'report 2 finished'

def report_3(q):
    while True:  # logic to get shared queue
        if q.empty():
            time.sleep(1)
        else:
            k = q.get()
            time.sleep(60000)  # Take longer than allotted time
            break
    print 'report 3 finished'

def report_N(q):
    while True:  # logic to get shared queue
        if q.empty():
            time.sleep(1)
        else:
            k = q.get()
            time.sleep(1)
            break
    print 'report N finished'

def close_SAP():
    print 'closing SAP'

def format_file():
    print 'formatting files'

def multi_daily_pull():
    shared_list = range(6)  # creating a shared list for all functions to use
    q = multiprocessing.Queue()  # creating an empty queue in multiprocessing
    for n in shared_list:  # putting list into the queue
        q.put(n)
    print 'Starting process at ', time.strftime('%m/%d/%Y %H:%M:%S')
    print 'Starting SAP Pulls at ', time.strftime('%m/%d/%Y %H:%M:%S')
    StartSAP = Process(target=start_SAP)
    StartSAP.start()
    StartSAP.join()
    report1 = Process(target=report_1, args=(q,))
    report1.daemon = True
    report2 = Process(target=report_2, args=(q,))
    report2.daemon = True
    report3 = Process(target=report_3, args=(q,))
    report3.daemon = True
    reportN = Process(target=report_N, args=(q,))
    reportN.daemon = True
    report1.start()
    report2.start()
    report3.start()
    reportN.start()
    report1.join(30)
    report2.join(30)
    report3.join(30)
    reportN.join(30)
    EndSAP = Process(target=close_SAP)
    EndSAP.start()
    EndSAP.join()
    formatfile = Process(target=format_file)
    formatfile.start()
    formatfile.join()

if __name__ == '__main__':
    multi_daily_pull()
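One possible follow-up to this approach (not part of the original answer): once a join(30) call times out, the hung report process is still alive while close_SAP() runs, and it only dies with the main process because of the daemon flag. If you want to clean it up explicitly, you could add something like the following right after the join(30) calls inside multi_daily_pull():
    for report in (report1, report2, report3, reportN):
        if report.is_alive():   # this report's join() timed out
            report.terminate()  # ask the hung process to stop (SIGTERM on Unix)
            report.join()       # reap it so no zombie is left behind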

threads not running parallel in python script

I am new to Python and threading. I am trying to run multiple threads at a time. Here is my basic code:
import threading
import time

threads = []
print "hello"

class myThread(threading.Thread):
    def __init__(self, i):
        threading.Thread.__init__(self)
        print "i = ", i
        for j in range(0, i):
            print "j = ", j
            time.sleep(5)

for i in range(1, 4):
    thread = myThread(i)
    thread.start()
While one thread is waiting in time.sleep(5), I want another thread to start. In short, all the threads should run in parallel.
You might have some misunderstanding of how to subclass threading.Thread. First of all, the __init__() method is roughly what represents a constructor in Python: it gets executed every time you create an instance, so in your case, when thread = myThread(i) executes, it blocks until __init__() returns.
You should instead move your activity into run(), so that the thread starts working when start() is called. For example:
import threading
import time

threads = []
print "hello"

class myThread(threading.Thread):
    def __init__(self, i):
        threading.Thread.__init__(self)
        self.i = i

    def run(self):
        print "i = ", self.i
        for j in range(0, self.i):
            print "j = ", j
            time.sleep(5)

for i in range(1, 4):
    thread = myThread(i)
    thread.start()
P.S. Because of the GIL in CPython, you might not be able to take full advantage of all your processors if the task is CPU-bound.
Here is an example of how you could use threading based on your code:
import threading
import time

threads = []
print "hello"

def doWork(i):
    print "i = ", i
    for j in range(0, i):
        print "j = ", j
        time.sleep(5)

for i in range(1, 4):
    thread = threading.Thread(target=doWork, args=(i,))
    threads.append(thread)
    thread.start()

# you need to wait for the threads to finish
for thread in threads:
    thread.join()
print "Finished"
import threading
import subprocess

def obj_func(simid):
    simid = simid
    workingdir = './' + str(simid)  # the working directory for the simulation
    cmd = './run_delwaq.sh'  # cmd is a bash command to launch the external execution
    subprocess.Popen(cmd, cwd=workingdir).wait()

def example_subprocess_files():
    num_threads = 4
    jobs = []
    # Launch the threads and give them access to the objective function
    for i in range(num_threads):
        workertask = threading.Thread(target=obj_func(i))
        jobs.append(workertask)
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()
    print('All the work finished!')

if __name__ == '__main__':
    example_subprocess_files()
This one does not work for my case, where the task is not a print but a CPU-intensive task. The threads are executed serially.
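The serial behaviour in the snippet above most likely comes from how the threads are created, not from the GIL: threading.Thread(target=obj_func(i)) calls obj_func(i) immediately in the main thread and passes its return value (None) as the target, so all the work has already run serially before start() is ever called. A corrected sketch of that loop (my reading of the snippet, not confirmed in the thread):
for i in range(num_threads):
    # pass the callable and its argument separately; obj_func then runs in the new thread
    workertask = threading.Thread(target=obj_func, args=(i,))
    jobs.append(workertask)
If the work really is CPU-bound pure Python (rather than waiting on a subprocess, as obj_func does), threads will still be limited by the GIL, and multiprocessing.Process can be substituted with the same target/args signature.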

Python threading: Queue workers - hang the program

I'm trying to implement a simple Queue with workers that do something.
The program should wait until the workers have finished emptying the queue, and continue execution.
I took the documentation example and tried to implement it in a class, since this is how it's gonna be implemented in my project.
Like this:
class Test:
    def __init__(self, n, q):
        self.q = Queue()
        print "Starting workers..."
        for i in range(n):
            t = threading.Thread(target=self.worker)
            t.daemon = True
            t.start()
        print "Workers started"
        for i in range(q):
            self.q.put(i)
        self.q.join()
        print "Exiting"

    def worker(self):
        name = threading.currentThread().getName()
        print "Thread %s started" % name
        while True:
            item = self.q.get()
            print "Processing item %d" % item
            sleep(1)
            self.q.task_done()
When I instantiate the class with t = Test(2, 100), all I see are the "Thread ... started" messages, and then the program hangs.
What is wrong with the code?
EDIT:
I just noticed that while this code hangs in IDLE (where I tested it), it performs flawlessly on the command line.
Looks like an environmental problem.
Yes, this has to be an environmental problem. I even tested it on a few different editors and PCs.
Output
Starting workers...
Thread Thread-1 started
Thread Thread-2 started
Workers started
Processing item 0
Processing item 1
Processing item 2
Processing item 3
Processing item 4
Processing item 5
Processing item 6
Processing item 7
Processing item 8
Processing item 9
Exiting
Code:
from Queue import Queue
import threading
from time import sleep

class Test:
    def __init__(self, n, q):
        self.q = Queue()
        print "Starting workers..."
        for i in range(n):
            t = threading.Thread(target=self.worker)
            t.daemon = True
            t.start()
        print "Workers started"
        for i in range(q):
            self.q.put(i)
        self.q.join()
        print "Exiting"

    def worker(self):
        name = threading.currentThread().getName()
        print "Thread %s started" % name
        while True:
            item = self.q.get()
            print "Processing item %d" % item
            sleep(1)
            self.q.task_done()

t = Test(2, 10)

Deliberately make an orphan process in python

I have a python script (unix-like, based on RHEL), called MyScript, that has two functions, called A and B. I'd like them to run in different, independent processes (detach B and A):
Start script MyScript
Execute function A
Spawn a new process, passing data from function A to B
While function B runs, continue with function A
When function A completes, exit MyScript even if B is still running
I thought I should use multiprocessing to create a daemon process, but the documentation suggests that's not the right use case. So I decided to spawn a child process and a child^2 process (the child's child), and then force the child to terminate. While this workaround appears to work, it seems really ugly.
Can you help me make it more pythonic? Does the subprocess module have a method that will operate on a function? Sample code below.
import multiprocessing
import time
import sys
import os

def parent_child():
    p = multiprocessing.current_process()
    print 'Starting parent child:', p.name, p.pid
    sys.stdout.flush()
    cc = multiprocessing.Process(name='childchild', target=child_child)
    cc.daemon = False
    cc.start()
    print 'Exiting parent child:', p.name, p.pid
    sys.stdout.flush()

def child_child():
    p = multiprocessing.current_process()
    print 'Starting child child:', p.name, p.pid
    sys.stdout.flush()
    time.sleep(30)
    print 'Exiting child child:', p.name, p.pid
    sys.stdout.flush()

def main():
    print 'starting main', os.getpid()
    d = multiprocessing.Process(name='parentchild', target=parent_child)
    d.daemon = False
    d.start()
    time.sleep(5)
    d.terminate()
    print 'exiting main', os.getpid()

main()
Here is one version of your original code, reworked to move the functionality into a single call, spawn_detached(callable). It keeps the detached process running even after the program exits:
import time
import os
from multiprocessing import Process, current_process

def spawn_detached(callable):
    p = _spawn_detached(0, callable)
    # give the process a moment to set up
    # and then kill the first child to detach
    # the second.
    time.sleep(.001)
    p.terminate()

def _spawn_detached(count, callable):
    count += 1
    p = current_process()
    print 'Process #%d: %s (%d)' % (count, p.name, p.pid)
    if count < 2:
        name = 'child'
    elif count == 2:
        name = callable.func_name
    else:
        # we should now be inside of our detached process
        # so just call the function
        return callable()
    # otherwise, spawn another process, passing the counter as well
    p = Process(name=name, target=_spawn_detached, args=(count, callable))
    p.daemon = False
    p.start()
    return p

def operation():
    """ Just some arbitrary function """
    print "Entered detached process"
    time.sleep(15)
    print "Exiting detached process"

if __name__ == "__main__":
    print 'starting main', os.getpid()
    p = spawn_detached(operation)
    print 'exiting main', os.getpid()
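For completeness, the same "orphan a function" effect can be achieved without the intermediate multiprocessing child by using the classic double fork. This is only a Unix-only sketch, not from the answer above; spawn_orphan is a hypothetical helper and func/args are placeholders:
import os

def spawn_orphan(func, *args):
    pid = os.fork()
    if pid > 0:
        os.waitpid(pid, 0)   # reap the first child, then carry on with function A
        return
    os.setsid()              # first child: start a new session, detach from the terminal
    if os.fork() > 0:        # fork again; the grandchild gets re-parented to init
        os._exit(0)
    try:
        func(*args)          # run B in the orphaned grandchild
    finally:
        os._exit(0)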
