Share queue between processes - python

I am pretty new to multiprocessing in Python and am trying to achieve something that should be a rather common thing to do, but I cannot find an easy way when searching the web.
I want to put data in a queue and then make this queue available to different consumer functions. Of course when getting an element from the queue, all consumer functions should get the same element. The following example should make clear what I want to achieve:
from multiprocessing import Process, Queue

def producer(q):
    for i in range(10):
        q.put(i)
    q.put(None)

def consumer1(q):
    while True:
        data = q.get()
        if data is None:
            break
        print(data)

def consumer2(q):
    while True:
        data = q.get()
        if data is None:
            break
        print(data)

def main():
    q = Queue()
    p1 = Process(target=producer, args=(q,))
    p2 = Process(target=consumer1, args=(q,))
    p3 = Process(target=consumer2, args=(q,))
    p1.start()
    p2.start()
    p3.start()
    p1.join()
    p2.join()
    p3.join()

if __name__ == '__main__':
    main()
Since the script does not terminate and I only get the print output from one function, I guess this is not the way to do it. I suppose sharing a queue implies some things to consider? It works fine when using only one consumer function.
Appreciate the help!

If the values you are storing can be represented by one of the fundamental data types defined in the ctypes module, then the following could work. Here we are implementing a "queue" that can hold int values or None:
from multiprocessing import Process, Condition
import ctypes
from multiprocessing.sharedctypes import RawArray, RawValue
from threading import local
import time

my_local = local()
my_local.current = 0

class StructuredInt(ctypes.Structure):
    """
    This class is necessary because we want to be able to store in the RawArray
    either an int or None, which requires using ctypes.c_void_p as the array type.
    But, unfortunately, ctypes.c_void_p(0) is interpreted as None.
    So we need a way to represent 0. Field 'value' is the
    actual int value being stored and we use an arbitrary 'ptr'
    field value that will not be interpreted as None.
    To store a None value, we set 'ptr' to ctypes.c_void_p(None) and field
    'value' is irrelevant.
    To store an integer, we set 'ptr' to ctypes.c_void_p(1) and field
    'value' has the actual value.
    """
    _fields_ = [('ptr', ctypes.c_void_p), ('value', ctypes.c_int)]

class MultiIntQueue:
    """
    An integer queue that can be processed by multiple threads where each thread
    can retrieve all the values added to the queue.

    :param maxsize: The maximum queue capacity (defaults to 20 if specified as None)
    :type maxsize: int
    """

    def __init__(self, maxsize=None):
        if maxsize is None:
            maxsize = 20
        self.maxsize = maxsize
        self.q = RawArray(StructuredInt, maxsize)
        self.condition = Condition()
        self.size = RawValue(ctypes.c_int, 0)

    def get(self):
        with self.condition:
            while my_local.current >= self.size.value:
                self.condition.wait()
            i = self.q[my_local.current]
            my_local.current += 1
            return None if i.ptr is None else i.value

    def put(self, i):
        assert 0 <= self.size.value < self.maxsize
        with self.condition:
            self.q[self.size.value] = (ctypes.c_void_p(None), 0) if i is None else (ctypes.c_void_p(1), i)
            self.size.value += 1
            self.condition.notify_all()

def producer(q):
    for i in range(10):
        q.put(i)
        time.sleep(.3)  # simulate processing
    q.put(None)

def consumer1(q):
    while True:
        data = q.get()
        if data is None:
            break
        time.sleep(.1)  # simulate processing
        print('Consumer 1:', data)

def consumer2(q):
    while True:
        data = q.get()
        if data is None:
            break
        time.sleep(.1)  # simulate processing
        print('Consumer 2:', data)

def main():
    q = MultiIntQueue()
    p1 = Process(target=producer, args=(q,))
    p2 = Process(target=consumer1, args=(q,))
    p3 = Process(target=consumer2, args=(q,))
    p1.start()
    p2.start()
    p3.start()
    p1.join()
    p2.join()
    p3.join()

if __name__ == '__main__':
    main()
Prints:
Consumer 1: 0
Consumer 2: 0
Consumer 2: 1
Consumer 1: 1
Consumer 2: 2
Consumer 1: 2
Consumer 2: 3
Consumer 1: 3
Consumer 2: 4
Consumer 1: 4
Consumer 1: 5
Consumer 2: 5
Consumer 1: 6
Consumer 2: 6
Consumer 1: 7
Consumer 2: 7
Consumer 2: 8
Consumer 1: 8
Consumer 1: 9
Consumer 2: 9

Your question exemplifies the misunderstanding:
"all consumer functions should get the same element"
That's just not how queues work. Queues are automatically managed (there's quite a lot under the hood) such that if one item is put in, only one item can be taken out. That item is not duplicated to all consumers. It seems like you actually need two separate queues to guarantee that each consumer gets every input without competing against the other consumer:
from multiprocessing import Process, Queue

def producer(q1, q2):
    for i in range(10):
        q1.put(i)
        q2.put(i)
    q1.put(None)
    q2.put(None)

def consumer1(q):
    while True:
        data = q.get()
        if data is None:
            break
        print(data)

def consumer2(q):
    while True:
        data = q.get()
        if data is None:
            break
        print(data)

def main():
    q1 = Queue()
    q2 = Queue()
    p1 = Process(target=producer, args=(q1, q2))
    p2 = Process(target=consumer1, args=(q1,))
    p3 = Process(target=consumer2, args=(q2,))
    p1.start()
    p2.start()
    p3.start()
    p1.join()
    p2.join()
    p3.join()

if __name__ == '__main__':
    main()
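If you later add more consumers, the same idea generalizes to one queue per consumer. Here is a minimal sketch (my own variation, not part of the answer above; the NUM_CONSUMERS constant and the consumer names are arbitrary) where the producer broadcasts every item, plus a final sentinel, to a list of queues:

from multiprocessing import Process, Queue

NUM_CONSUMERS = 3  # hypothetical number of consumers

def producer(queues):
    # Put every item on every consumer's queue, then one sentinel per queue.
    for i in range(10):
        for q in queues:
            q.put(i)
    for q in queues:
        q.put(None)

def consumer(q, name):
    while True:
        data = q.get()
        if data is None:
            break
        print(name, data)

if __name__ == '__main__':
    queues = [Queue() for _ in range(NUM_CONSUMERS)]
    procs = [Process(target=producer, args=(queues,))]
    procs += [Process(target=consumer, args=(q, 'consumer %d' % i))
              for i, q in enumerate(queues)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()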

Related

100 percent load with multiprocessing queues

This only replicates my problem: the main Python script runs at 100% load when it tries to run a control loop over a shared queue.
import multiprocessing
import random

def func1(num, q):
    while True:
        num = random.randint(1, 101)
        if q.empty():
            q.put(num)

def func2(num, q):
    while True:
        num = q.get()
        num = num ** 2
        if q.empty():
            q.put(num)

num = 2
q = multiprocessing.Queue()
p1 = multiprocessing.Process(target=func1, args=(num, q))
p2 = multiprocessing.Process(target=func2, args=(num, q))
p1.daemon = True
p2.daemon = True
p1.start()
p2.start()

running = True
while running:
    if not q.empty():
        num = q.get(True, 0.1)
        print(num)
Would there be a better method to control multiple worker processes from a script? Better in the sense of not producing this constant load!?
I'm not sure I understand your program:
What's with the num parameter of func1() and func2()? It never gets used.
func2 will discard its result if func1 happens to have posted another number after func2 got the last number out of the queue.
Why do you daemonize the workers? Are you quite sure this is what you want?
The if not q.empty(): q.get() construct in the main code will sooner or later raise a queue.Empty exception because it's a race between it and the q.get() in func2.
The uncaught queue.Empty exception will terminate the main process, leaving the two workers orphaned - and running.
General advice:
Use different queues for issuing jobs (request queue) and collecting results (response queue). Include the request in the response if necessary.
Think about how to terminate the workers. Consider a "poison pill", i.e. a value in the request queue that causes workers to die, i.e. exit/terminate (a minimal sketch of this follows the sample code below).
Be really really sure you understand the race conditions in your code, like the one I mentioned above (empty vs. get).
Here's some sample code I hacked up:
import multiprocessing
import time
import random
import os

def request_generator(requests):
    while True:
        requests.put(random.randint(1, 101))
        time.sleep(0.01)

def worker(requests, responses):
    worker_id = os.getpid()
    while True:
        request = requests.get()
        response = request ** 2
        responses.put((request, response, worker_id))

def main():
    requests = multiprocessing.Queue()
    responses = multiprocessing.Queue()
    gen = multiprocessing.Process(target=request_generator, args=(requests,))
    w1 = multiprocessing.Process(target=worker, args=(requests, responses))
    w2 = multiprocessing.Process(target=worker, args=(requests, responses))
    gen.start()
    w1.start()
    w2.start()
    while True:
        req, resp, worker_id = responses.get()
        print("worker {}: {} => {}".format(worker_id, req, resp))

if __name__ == "__main__":
    main()
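To illustrate the "poison pill" advice above, here is a minimal sketch (not part of the original sample) of how the workers could be shut down cleanly; the sentinel value None and the fixed number of jobs are arbitrary choices for the example:

import multiprocessing
import os

def worker(requests, responses):
    worker_id = os.getpid()
    while True:
        request = requests.get()
        if request is None:        # poison pill: stop this worker
            break
        responses.put((request, request ** 2, worker_id))

if __name__ == "__main__":
    requests = multiprocessing.Queue()
    responses = multiprocessing.Queue()
    workers = [multiprocessing.Process(target=worker, args=(requests, responses))
               for _ in range(2)]
    for w in workers:
        w.start()
    for job in range(10):          # issue a fixed number of jobs
        requests.put(job)
    for _ in workers:              # one poison pill per worker
        requests.put(None)
    for _ in range(10):            # collect all responses
        req, resp, worker_id = responses.get()
        print("worker {}: {} => {}".format(worker_id, req, resp))
    for w in workers:
        w.join()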

Cannot obtain values while parallelizing 2 for loops

I am trying to run the following snippet, which appends data to the lists 'tests1' and 'tests2'. But when I print 'tests1' and 'tests2', the displayed lists are empty. Is anything incorrect here?
tests1 = []
tests2 = []

def func1():
    for i in range(25, 26):
        tests1.append(test_loader.get_tests(test_prefix=new_paths[i], tags=params.get('tags', None),
                                            exclude=params.get('exclude', False)))

def func2():
    for i in range(26, 27):
        tests2.append(test_loader.get_tests(test_prefix=new_paths[i], tags=params.get('tags', None),
                                            exclude=params.get('exclude', False)))

p1 = mp.Process(target=func1)
p2 = mp.Process(target=func2)
p1.start()
p2.start()
p1.join()
p2.join()

print tests1
print tests2
The worker processes don't actually share the same object. It gets copied (pickled).
You can send values between processes using a multiprocessing.Queue (or by various other means). See my simple example (in which I've made your tests into integers for simplicity).
from multiprocessing import Process, Queue

def add_tests1(queue):
    for i in range(10):
        queue.put(i)
    queue.put(None)

def add_tests2(queue):
    for i in range(100, 110):
        queue.put(i)
    queue.put(None)

def run_tests(queue):
    while True:
        test = queue.get()
        if test is None:
            break
        print test

if __name__ == '__main__':
    queue1 = Queue()
    queue2 = Queue()
    add_1 = Process(target=add_tests1, args=(queue1,))
    add_2 = Process(target=add_tests2, args=(queue2,))
    run_1 = Process(target=run_tests, args=(queue1,))
    run_2 = Process(target=run_tests, args=(queue2,))

    add_1.start(); add_2.start(); run_1.start(); run_2.start()
    add_1.join(); add_2.join(); run_1.join(); run_2.join()
Note that the parent program can also access the queues.
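If the goal really is to end up with populated lists in the parent process (as with the original tests1/tests2 code), another option is a multiprocessing.Manager list, whose appends are proxied back to the parent. A minimal sketch, with integers standing in for the return values of test_loader.get_tests(...):

from multiprocessing import Process, Manager

def func1(results):
    for i in range(25, 26):
        results.append(i)   # stands in for test_loader.get_tests(...)

def func2(results):
    for i in range(26, 27):
        results.append(i)

if __name__ == '__main__':
    with Manager() as manager:
        tests1 = manager.list()
        tests2 = manager.list()
        p1 = Process(target=func1, args=(tests1,))
        p2 = Process(target=func2, args=(tests2,))
        p1.start(); p2.start()
        p1.join(); p2.join()
        print(list(tests1))
        print(list(tests2))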

Second queue is not defined [python]

I have 3 processes running in one script. Process 1 passes data to Process 2, and then Process 2 passes data to Process 3. When I put data into queue2, an error occurs: "Global name 'queue2' is not defined". I am stuck on this error now...
if __name__ == '__main__':
    queue1 = mp.Queue()
    queue2 = mp.Queue()
    p1 = mp.Process(target=f2, args=(queue1,))
    p1.start()
    p2 = mp.Process(target=f3, args=(queue2,))
    p2.start()
    f1()

def f1():
    # do something to get x
    queue1.put(x)

def f2(q):
    a = q.get()
    # do something to a, to produce b
    queue2.put(b)  # error happens here: Global name "queue2" is not defined

def f3(q):
    c = q.get()
    # keep processing c...
Just as you passed queue1 to f2, you also need to pass queue2.
You can declare the queues as global:
def f2(q):
    global queue2
    a = q.get()
    queue2.put(b)
This works:
import multiprocessing as mp

queue1 = mp.Queue()
queue2 = mp.Queue()

def f1(q):
    x = 5
    # do something to get x
    q.put(x)

def f2(in_queue, out_queue):
    a = in_queue.get()
    b = a + 2
    # do something to a, to produce b
    out_queue.put(b)

def f3(q):
    c = q.get()
    print c

f1(queue1)
p1 = mp.Process(target=f2, args=(queue1, queue2))
p1.start()
p2 = mp.Process(target=f3, args=(queue2,))
p2.start()
Your code doesn't produce the error you describe; it produces "f2 is not defined", because at the moment you spawn the process p1, f2 is not a defined name yet. The rule when you fork is that, at creation time, your processes must be able to see the variables they use, i.e. those names must be in the current scope.
To put it clearly: at process-spawning time you inherit the current namespace from the parent process.
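For completeness, here is a minimal sketch (my own rearrangement, not part of the answer) of the same pipeline with the functions defined first and the process setup kept under the __main__ guard; the values for x and b are placeholders, as in the question. This layout also avoids the name-visibility problem on platforms that spawn rather than fork:

import multiprocessing as mp

def f1(q):
    x = 5                      # placeholder for "do something to get x"
    q.put(x)

def f2(in_queue, out_queue):
    a = in_queue.get()
    b = a + 2                  # placeholder for "do something to a, to produce b"
    out_queue.put(b)

def f3(q):
    c = q.get()
    print(c)

if __name__ == '__main__':
    queue1 = mp.Queue()
    queue2 = mp.Queue()
    p1 = mp.Process(target=f2, args=(queue1, queue2))
    p2 = mp.Process(target=f3, args=(queue2,))
    p1.start()
    p2.start()
    f1(queue1)
    p1.join()
    p2.join()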

Producer consumer 3 threads for each in python

I'm trying to write a producer-consumer program. I got it working just fine with one thread of each, and I'm trying to modify it to run three threads of each. It appears that each of the consumer threads is trying to consume every released item.
import threading
import time
import random

# N is the number of slots in the buffer
N = 8
n = 0
i = 0
j = 0
# initialise buf with the right length, but without values
buf = N * [None]

free = threading.Semaphore(N)
items = threading.Semaphore(0)
block = threading.Semaphore(1)

# a function for the producer thread
def prod(n, j):
    while True:
        time.sleep(random.random())
        free.acquire()
        # produce a number and add it to the buffer
        buf[i] = n
        #print("produced")
        j = (j + 1) % N
        n += 1
        items.release()

# a function for the consumer thread
def cons(th):
    global i
    while True:
        time.sleep(random.random())
        # acquire items to allow the consumer to print.
        items.acquire()
        print(buf[i])
        print("consumed, th:{} i:{}".format(th, i))
        i = (i + 1) % N
        #time.sleep(3)
        free.release()

# a main function
def main():
    p1 = threading.Thread(target=prod, args=[n, j])
    p2 = threading.Thread(target=prod, args=[n, j])
    p3 = threading.Thread(target=prod, args=[n, j])
    c1 = threading.Thread(target=cons, args=[1])
    c2 = threading.Thread(target=cons, args=[2])
    c3 = threading.Thread(target=cons, args=[3])
    p1.start()
    p2.start()
    p3.start()
    c1.start()
    c2.start()
    c3.start()
    p1.join()
    p2.join()
    p3.join()
    c1.join()
    c2.join()
    c3.join()

main()
Any help is appreciated. I'm really at a loss with this one.
When your code in a thread acquires a semaphore, it should then subsequently release the same semaphore. So instead of:
items.acquire()
...
free.release()
your code must do, e.g.
items.acquire()
...
items.release()

How to communicate between processes in real time?

I have two processes and the data of one process has to be communicated to the other. I wrote a basic queue in order to communicate in real time but it doesn't serve the purpose.
The following is example code:
from multiprocessing import Process, Pipe, Queue

a, b = Pipe()
q = Queue()

def f(name):
    i = 0
    while i < 4:
        q.put(i)
        i += 1

def t():
    print q.get()

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()

    p1 = Process(target=t, args=(''))
    p1.start()
    p1.join()
The expected output was 0 1 2 3, but I only get 0.
How can I resolve this?
Try this version:
def t():
    while True:
        try:
            print q.get(timeout=1)
        except:
            break
You're only calling get() once. It returns one item at a time.
(As an aside, your function f is very non-Pythonic; try:
def f(name):
    for i in range(4):
        q.put(i)
You're also using q as a global...
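Building on that last point, here is a minimal sketch (my own variation, not from the answer) that passes the queue as an argument instead of using a global, and uses a None sentinel rather than relying on a get timeout:

from multiprocessing import Process, Queue

def f(q):
    for i in range(4):
        q.put(i)
    q.put(None)          # sentinel: no more data

def t(q):
    while True:
        item = q.get()
        if item is None:
            break
        print(item)

if __name__ == '__main__':
    q = Queue()
    producer = Process(target=f, args=(q,))
    consumer = Process(target=t, args=(q,))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()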
