I have three processes running in one script. Process 1 passes data to Process 2, which then passes data to Process 3. When I put data on queue2, I get the error "Global name 'queue2' is not defined", and I am stuck on it...
if __name__ == '__main__':
    queue1 = mp.Queue()
    queue2 = mp.Queue()
    p1 = mp.Process(target=f2, args=(queue1,))
    p1.start()
    p2 = mp.Process(target=f3, args=(queue2,))
    p2.start()
    f1()

def f1():
    # do something to get x
    queue1.put(x)

def f2(q):
    a = q.get()
    # do something to a, to produce b
    queue2.put(b)  # error happens here: Global name "queue2" is not defined

def f3(q):
    c = q.get()
    # keep processing c...
Just as you passed queue1 to f2, you also need to pass queue2.
Alternatively, you can declare the queue as global inside f2 (note this only works when the child process inherits the parent's globals, i.e. with the fork start method):

def f2(q):
    global queue2
    a = q.get()
    queue2.put(b)
This works (the __main__ guard makes it safe under the spawn start method as well):

import multiprocessing as mp

queue1 = mp.Queue()
queue2 = mp.Queue()

def f1(q):
    x = 5
    # do something to get x
    q.put(x)

def f2(in_queue, out_queue):
    a = in_queue.get()
    b = a + 2
    # do something to a, to produce b
    out_queue.put(b)

def f3(q):
    c = q.get()
    print c

if __name__ == '__main__':
    f1(queue1)
    p1 = mp.Process(target=f2, args=(queue1, queue2))
    p1.start()
    p2 = mp.Process(target=f3, args=(queue2,))
    p2.start()
Your code doesn't actually raise the error you quote; it fails earlier with "f2 is not defined", because when you create the process p1, f2 has not been defined yet. The rule is that at creation time your processes must be able to see the objects they use, i.e. those names must be in scope.
Put simply, at process-creation time the child inherits the current namespace of the parent process.
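To make the scope rule concrete, here is a minimal spawn-safe sketch of the same three-stage pipeline: every function is defined at module level before any process is created, the queues are passed explicitly, and a third result queue (my addition, not in the original) lets the parent read the final value:

```python
import multiprocessing as mp

def f1(q):
    # stage 1: produce a value and hand it to the next stage
    q.put(5)

def f2(in_q, out_q):
    # stage 2: transform the value and pass it along
    a = in_q.get()
    out_q.put(a + 2)

def f3(q, result_q):
    # stage 3: report the result back to the parent
    result_q.put(q.get())

if __name__ == '__main__':
    queue1, queue2, result = mp.Queue(), mp.Queue(), mp.Queue()
    p1 = mp.Process(target=f2, args=(queue1, queue2))
    p2 = mp.Process(target=f3, args=(queue2, result))
    p1.start()
    p2.start()
    f1(queue1)           # stage 1 runs in the parent
    print(result.get())  # -> 7
    p1.join()
    p2.join()
```

Because the functions live at module level and every queue travels through args, this works with both the fork and spawn start methods.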
I am pretty new to multiprocessing in Python and am trying to achieve something that should be rather common, but I cannot find an easy way when searching the web.
I want to put data in a queue and then make this queue available to several consumer functions. When getting an element from the queue, every consumer function should receive the same element. The following example should make clear what I want to achieve:
from multiprocessing import Process, Queue

def producer(q):
    for i in range(10):
        q.put(i)
    q.put(None)

def consumer1(q):
    while True:
        data = q.get()
        if data is None:
            break
        print(data)

def consumer2(q):
    while True:
        data = q.get()
        if data is None:
            break
        print(data)

def main():
    q = Queue()
    p1 = Process(target=producer, args=(q,))
    p2 = Process(target=consumer1, args=(q,))
    p3 = Process(target=consumer2, args=(q,))
    p1.start()
    p2.start()
    p3.start()
    p1.join()
    p2.join()
    p3.join()

if __name__ == '__main__':
    main()
Since the script does not terminate and I only get the print output of one function, I guess this is not the way to do it. I suppose sharing a queue implies some things to consider? It works fine when using only one consumer function.
Appreciate the help!
If the values you are storing can be represented by one of the fundamental data types defined in the ctypes module, then the following could work. Here we are implementing a "queue" that can hold int values or None:
from multiprocessing import Process, Condition
import ctypes
from multiprocessing.sharedctypes import RawArray, RawValue
from threading import local
import time

my_local = local()
my_local.current = 0

class StructuredInt(ctypes.Structure):
    """
    This class is necessary because we want to be able to store in the RawArray
    either an int or None, which requires using ctypes.c_void_p as the array type.
    But, unfortunately, ctypes.c_void_p(0) is interpreted as None.
    So we need a way to represent 0. Field 'value' is the
    actual int value being stored and we use an arbitrary 'ptr'
    field value that will not be interpreted as None.
    To store a None value, we set 'ptr' to ctypes.c_void_p(None) and field
    'value' is irrelevant.
    To store an integer, we set 'ptr' to ctypes.c_void_p(1) and field
    'value' has the actual value.
    """
    _fields_ = [('ptr', ctypes.c_void_p), ('value', ctypes.c_int)]

class MultiIntQueue:
    """
    An integer queue that can be processed by multiple threads where each thread
    can retrieve all the values added to the queue.

    :param maxsize: The maximum queue capacity (defaults to 20 if specified as None)
    :type maxsize: int
    """
    def __init__(self, maxsize=None):
        if maxsize is None:
            maxsize = 20
        self.maxsize = maxsize
        self.q = RawArray(StructuredInt, maxsize)
        self.condition = Condition()
        self.size = RawValue(ctypes.c_int, 0)

    def get(self):
        with self.condition:
            while my_local.current >= self.size.value:
                self.condition.wait()
            i = self.q[my_local.current]
            my_local.current += 1
            return None if i.ptr is None else i.value

    def put(self, i):
        assert 0 <= self.size.value < self.maxsize
        with self.condition:
            self.q[self.size.value] = (ctypes.c_void_p(None), 0) if i is None else (ctypes.c_void_p(1), i)
            self.size.value += 1
            self.condition.notify_all()
def producer(q):
    for i in range(10):
        q.put(i)
        time.sleep(.3)  # simulate processing
    q.put(None)

def consumer1(q):
    while True:
        data = q.get()
        if data is None:
            break
        time.sleep(.1)  # simulate processing
        print('Consumer 1:', data)

def consumer2(q):
    while True:
        data = q.get()
        if data is None:
            break
        time.sleep(.1)  # simulate processing
        print('Consumer 2:', data)

def main():
    q = MultiIntQueue()
    p1 = Process(target=producer, args=(q,))
    p2 = Process(target=consumer1, args=(q,))
    p3 = Process(target=consumer2, args=(q,))
    p1.start()
    p2.start()
    p3.start()
    p1.join()
    p2.join()
    p3.join()

if __name__ == '__main__':
    main()
Prints:
Consumer 1: 0
Consumer 2: 0
Consumer 2: 1
Consumer 1: 1
Consumer 2: 2
Consumer 1: 2
Consumer 2: 3
Consumer 1: 3
Consumer 2: 4
Consumer 1: 4
Consumer 1: 5
Consumer 2: 5
Consumer 1: 6
Consumer 2: 6
Consumer 1: 7
Consumer 2: 7
Consumer 2: 8
Consumer 1: 8
Consumer 1: 9
Consumer 2: 9
Your question exemplifies a common misunderstanding:
"all consumer functions should get the same element"
That's just not how queues work. Queues are managed automatically (there's quite a lot under the hood) such that if one item is put in, only one item can be taken out; that item is not duplicated to all consumers. It seems you actually need two separate queues to guarantee that each consumer gets every input without competing against the other consumer:
from multiprocessing import Process, Queue

def producer(q1, q2):
    for i in range(10):
        q1.put(i)
        q2.put(i)
    q1.put(None)
    q2.put(None)

def consumer1(q):
    while True:
        data = q.get()
        if data is None:
            break
        print(data)

def consumer2(q):
    while True:
        data = q.get()
        if data is None:
            break
        print(data)

def main():
    q1 = Queue()
    q2 = Queue()
    p1 = Process(target=producer, args=(q1, q2))
    p2 = Process(target=consumer1, args=(q1,))
    p3 = Process(target=consumer2, args=(q2,))
    p1.start()
    p2.start()
    p3.start()
    p1.join()
    p2.join()
    p3.join()

if __name__ == '__main__':
    main()
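If you do want broadcast semantics without writing each put twice, one option (a sketch of my own, not a standard-library feature) is a small fan-out helper that puts every item, including the sentinel, on each consumer's queue:

```python
from multiprocessing import Process, Queue

def broadcast(queues, item):
    # put a copy of the item on every consumer's queue
    for q in queues:
        q.put(item)

def producer(queues):
    for i in range(10):
        broadcast(queues, i)
    broadcast(queues, None)  # one sentinel per consumer

def consumer(name, q):
    while True:
        data = q.get()
        if data is None:
            break
        print(name, data)

if __name__ == '__main__':
    queues = [Queue() for _ in range(2)]
    workers = [Process(target=consumer, args=('Consumer %d' % n, q))
               for n, q in enumerate(queues, 1)]
    for w in workers:
        w.start()
    producer(queues)
    for w in workers:
        w.join()
```

Each consumer still owns a private queue, so there is no competition; the producer simply fans each item out to all of them.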
I am trying to run the following snippet, which appends data to the lists 'tests1' and 'tests2'. But when I print 'tests1' and 'tests2', the displayed lists are empty. Is anything incorrect here?
import multiprocessing as mp

tests1 = []
tests2 = []

def func1():
    for i in range(25, 26):
        tests1.append(test_loader.get_tests(test_prefix=new_paths[i],
                                            tags=params.get('tags', None),
                                            exclude=params.get('exclude', False)))

def func2():
    for i in range(26, 27):
        tests2.append(test_loader.get_tests(test_prefix=new_paths[i],
                                            tags=params.get('tags', None),
                                            exclude=params.get('exclude', False)))

p1 = mp.Process(target=func1)
p2 = mp.Process(target=func2)
p1.start()
p2.start()
p1.join()
p2.join()
print tests1
print tests2
The worker processes don't actually share the same list objects: each child works on its own copy (inherited on fork, or pickled over on spawn), so appends in the children never reach the parent.
You can send values between processes using a multiprocessing.Queue (or by various other means). See this simple example (in which I've turned your tests into integers for simplicity):
from multiprocessing import Process, Queue

def add_tests1(queue):
    for i in range(10):
        queue.put(i)
    queue.put(None)

def add_tests2(queue):
    for i in range(100, 110):
        queue.put(i)
    queue.put(None)

def run_tests(queue):
    while True:
        test = queue.get()
        if test is None:
            break
        print test

if __name__ == '__main__':
    queue1 = Queue()
    queue2 = Queue()
    add_1 = Process(target=add_tests1, args=(queue1,))
    add_2 = Process(target=add_tests2, args=(queue2,))
    run_1 = Process(target=run_tests, args=(queue1,))
    run_2 = Process(target=run_tests, args=(queue2,))
    add_1.start(); add_2.start(); run_1.start(); run_2.start()
    add_1.join(); add_2.join(); run_1.join(); run_2.join()
Note that the parent program can also access the queues.
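Since the parent can read the queues too, the original goal (filling tests1 and tests2 in the parent) can be met by having each worker ship its whole list back. A minimal sketch of that pattern with the test-loading replaced by plain integers (test_loader etc. are not reproducible here):

```python
from multiprocessing import Process, Queue

def worker(queue, start, stop):
    # build the list in the child, then ship the whole thing back at once
    queue.put(list(range(start, stop)))

if __name__ == '__main__':
    q = Queue()
    p1 = Process(target=worker, args=(q, 0, 3))
    p2 = Process(target=worker, args=(q, 100, 103))
    p1.start()
    p2.start()
    results = [q.get(), q.get()]  # one list per worker, in arrival order
    p1.join()
    p2.join()
    print(results)
```

Note the parent drains the queue before joining: a child that still has a large item buffered in the queue may never exit, so get() should come before join().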
The end goal is to execute a method in the background, but not in parallel: when multiple objects call this method, each should wait for its turn to proceed. To run in the background, I have to run the method in a subprocess (not a thread), and I need to start it using spawn (not fork). To prevent parallel executions, the obvious solution is a global lock shared between processes.
When processes are forked, which is the default on Unix, this is easy to achieve, as both of the following snippets show.
We can share the lock as a class variable:
import multiprocessing as mp
from time import sleep

class OneAtATime:
    l = mp.Lock()

    def f(self):
        with self.l:
            sleep(1)
            print("Hello")

if __name__ == "__main__":
    a = OneAtATime()
    b = OneAtATime()
    p1 = mp.Process(target=a.f)
    p2 = mp.Process(target=b.f)
    p1.start()
    p2.start()
Or we can pass it to the method:

import multiprocessing as mp
from time import sleep

class OneAtATime:
    def f(self, l):
        with l:
            sleep(1)
            print("Hello")

if __name__ == "__main__":
    a = OneAtATime()
    b = OneAtATime()
    m = mp.Manager()
    l = mp.Lock()
    p1 = mp.Process(target=a.f, args=(l,))
    p2 = mp.Process(target=b.f, args=(l,))
    p1.start()
    p2.start()
Both of these snippets behave correctly, printing "Hello" twice at one-second intervals.
However, when the start method is changed to 'spawn', they break.
The first one (1) prints both "Hello"s at the same time. This is because the internal state of the class is not pickled across to the children, so the two processes do not hold the same lock.
The second one (2) fails with FileNotFoundError at runtime. I think this has to do with the fact that locks cannot be pickled: see Python sharing a lock between processes.
In that answer, two fixes are suggested (side note: I cannot use a pool, because I want to create an arbitrary number of processes at random times).
I haven't found a way to adapt the second fix, but I tried to implement the first one:
import multiprocessing as mp
from time import sleep

if __name__ == "__main__":
    mp.set_start_method('spawn')

class OneAtATime:
    def f(self, l):
        with l:
            sleep(1)
            print("Hello")

if __name__ == "__main__":
    a = OneAtATime()
    b = OneAtATime()
    m = mp.Manager()
    l = m.Lock()
    p1 = mp.Process(target=a.f, args=(l,))
    p2 = mp.Process(target=b.f, args=(l,))
    p1.start()
    p2.start()
This fails with AttributeError and FileNotFoundError (3). In fact it also fails (BrokenPipeError) when the fork method is used (4).
What is the proper way of sharing a lock between spawned processes?
A quick explanation of the four failures I numbered would be nice, too.
I'm running Python 3.6 on Arch Linux.
Congratulations, you got yourself 90% of the way there. The last step is actually not very hard.
Yes, your final code block fails with an AttributeError, but what specifically is the error? "Can't get attribute 'OneAtATime' on <module '__main__' ...>". This is very similar to a problem you've already encountered: the class OneAtATime cannot be pickled from __main__.
I made the following change and it worked as you'd like:
file oaat.py:
from time import sleep

class OneAtATime:
    def f(self, l):
        with l:
            sleep(1)
            print("Hello")
interactive shell:
import multiprocessing as mp
from oaat import OneAtATime

if __name__ == "__main__":
    mp.set_start_method('spawn')
    a = OneAtATime()
    b = OneAtATime()
    m = mp.Manager()
    l = m.Lock()
    p1 = mp.Process(target = a.f, args = (l,))
    p2 = mp.Process(target = b.f, args = (l,))
    p1.start()
    p2.start()
You may notice I didn't really do anything: I just split your code into two separate files. Try it out; you'll see it works fine. (At least it did for me, using Python 3.5 on Ubuntu.)
The last code snippet works, provided the script does not exit prematurely. Joining the processes is enough:
import multiprocessing as mp
from time import sleep

class OneAtATime:
    def f(self, l):
        with l:
            sleep(1)
            print("Hello")

if __name__ == "__main__":
    mp.set_start_method('spawn')
    a = OneAtATime()
    b = OneAtATime()
    m = mp.Manager()
    l = m.Lock()
    p1 = mp.Process(target = a.f, args = (l,))
    p2 = mp.Process(target = b.f, args = (l,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
More info on the error it was causing here: https://stackoverflow.com/a/25456494/8194503.
I am very new to Python, so I am possibly asking a simple question.
I am writing multiprocess code with Python:
from multiprocessing import Process
from multiprocessing import Queue

class myClass(object):
    def __init__(self):
        self.__i = 0
        self.__name = 'rob'
        return

    def target_func(self, name, q):
        self.__name = name
        print 'Hello', self.__name
        self.__i += 1
        print self.__i
        q.put([self.__i, self.__name])
        return

    def name(self):
        return self.__name

    def i(self):
        return self.__i

if __name__ == '__main__':
    mc = myClass()
    q = Queue()
    p = Process(target = mc.target_func, args = ('bob', q,))
    p.start()
    ret = q.get()
    p.join()
    p2 = Process(target = mc.target_func, args = ('tom', q,))
    p2.start()
    ret = q.get()
    p2.join()
I expect the printout to be
Hello bob
1
Hello tom
2
But actually, the print out is
Hello bob
1
Hello tom
1 <------------------ Why it's not 2?
May I know what I am doing wrong?
Many thanks.
target_func is called in a separate process. mc is copied into each subprocess; it is not shared between processes.
Using Thread instead, you will get the expected(?) result. For safety you should use a lock; I omitted it in the following code.
from threading import Thread
from Queue import Queue

....

if __name__ == '__main__':
    mc = myClass()
    q = Queue()
    p = Thread(target = mc.target_func, args = ('bob', q,))
    p.start()
    ret = q.get()
    p.join()
    p2 = Thread(target = mc.target_func, args = ('tom', q,))
    p2.start()
    ret = q.get()
    p2.join()
Processes don't share memory, unlike threads. The name __i in the second process refers to a different variable, whose initial value was copied from the original process when you launched the subprocess.
You can use the Value or Array data types to transfer information from one process to another, or you can use a Queue to push data from the subprocess back to the original. All of these classes are included in the multiprocessing module:
http://docs.python.org/2/library/multiprocessing.html#multiprocessing.Queue
http://docs.python.org/2/library/multiprocessing.html#multiprocessing.Value
http://docs.python.org/2/library/multiprocessing.html#multiprocessing.Array
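As a sketch of the Value approach (the counter here is hypothetical, not taken from the question): the synchronized wrapper returned by multiprocessing.Value carries its own lock, so the increment can be made atomic and the new value is visible to every process:

```python
from multiprocessing import Process, Value

def bump(counter):
    # Value carries its own lock; hold it so read-modify-write is atomic
    with counter.get_lock():
        counter.value += 1

if __name__ == '__main__':
    counter = Value('i', 0)  # shared C int, initially 0
    ps = [Process(target=bump, args=(counter,)) for _ in range(2)]
    for p in ps:
        p.start()
    for p in ps:
        p.join()
    print(counter.value)  # -> 2, both increments are visible in the parent
```

This is the behavior the question expected from self.__i: one counter that all processes actually mutate.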
The value of the variable is unchanged because each process you create gets a full copy of the parent's memory space, including a copy of the mc instance you created earlier. Hence, when you modify an instance variable of mc from within a child process, it does not affect the variable in your main process. Here's a more concise example of this behavior:
from multiprocessing import Process

class A(object):
    def __init__(self):
        self.var = 1
        print "Initialized class: ", self

    def test(self):
        print self
        print "Variable value:", self.var
        self.var += 1

if __name__ == '__main__':
    a = A()
    p1 = Process(target = a.test)
    # Creates a copy of the current memory space and will print "Variable value: 1"
    p1.start()
    p2 = Process(target = a.test)
    # Will still print "Variable value: 1"
    p2.start()
I have two processes, and the data of one process has to be communicated to the other. I wrote a basic queue to communicate in real time, but it doesn't serve the purpose.
The following is example code:
from multiprocessing import Process, Pipe, Queue

a, b = Pipe()
q = Queue()

def f(name):
    i = 0
    while i < 4:
        q.put(i)
        i += 1

def t():
    print q.get()

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
    p1 = Process(target=t, args= (''))
    p1.start()
    p1.join()
The expected output was 0 1 2 3 (the loop puts four values), but I only get 0.
How can I resolve this?
Try this version, which keeps reading until the queue has been empty for a second:

def t():
    while True:
        try:
            print q.get(timeout=1)
        except Exception:
            break
You're only calling get() once, and it returns one item at a time.
(As an aside, your function f is very non-Pythonic. Try:

def f(name):
    for i in range(4):
        q.put(i)

You're also using q as a global, which is best avoided.)
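A cleaner alternative to the timeout hack is the sentinel pattern used in the earlier answers: the producer puts None when it is done, and the consumer loops until it sees it. This sketch also passes the queue explicitly instead of reaching it through a global:

```python
from multiprocessing import Process, Queue

def f(q):
    for i in range(4):
        q.put(i)
    q.put(None)  # sentinel: tells the consumer the stream is finished

def t(q):
    # loop until the sentinel arrives, printing each value
    while True:
        item = q.get()
        if item is None:
            break
        print(item)

if __name__ == '__main__':
    q = Queue()  # passed as an argument, not used as a global
    prod = Process(target=f, args=(q,))
    cons = Process(target=t, args=(q,))
    prod.start()
    cons.start()
    prod.join()
    cons.join()
```

Both processes can also run concurrently here; there is no need to join the producer before starting the consumer.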