Python socketio with multiprocessing

I have been struggling with a pickling error that is driving me crazy. I have the following master Engine class:
import eventlet
import socketio
import multiprocessing
from multiprocessing import Queue
from multi import SIOSerever

class masterEngine:
    if __name__ == '__main__':
        serverObj = SIOSerever()
        try:
            receiveData = multiprocessing.Process(target=serverObj.run)
            receiveData.start()

            receiveProcess = multiprocessing.Process(target=serverObj.fetchFromQueue)
            receiveProcess.start()

            receiveData.join()
            receiveProcess.join()
        except Exception as error:
            print(error)
and I have another file called multi that looks like this:
import multiprocessing
from multiprocessing import Queue
import eventlet
import socketio

class SIOSerever:
    def __init__(self):
        self.cycletimeQueue = Queue()
        self.sio = socketio.Server(cors_allowed_origins='*', logger=False)
        self.app = socketio.WSGIApp(self.sio, static_files={'/': 'index.html'})
        self.ws_server = eventlet.listen(('0.0.0.0', 5000))

        @self.sio.on('production')
        def p_message(sid, message):
            self.cycletimeQueue.put(message)
            print("I logged : " + str(message))

    def run(self):
        eventlet.wsgi.server(self.ws_server, self.app)

    def fetchFromQueue(self):
        while True:
            cycle = self.cycletimeQueue.get()
            print(cycle)
As you can see, I am trying to create two processes, one for run and one for fetchFromQueue, which I want to run independently.
My run function starts the python-socketio server, to which I'm sending some data from an HTML web page (this runs perfectly without multiprocessing). I am then trying to push the received data to a Queue so that my other function can retrieve it and work with it.
I have a set of time-consuming operations that I need to carry out on the data received from the socket, which is why I'm pushing it all into a Queue.
On running the master Engine class I receive the following:
Can't pickle <class 'threading.Thread'>: it's not the same object as threading.Thread
I ended!
[Finished in 0.5s]
Can you please help with what I am doing wrong?

From the multiprocessing programming guidelines:
Explicitly pass resources to child processes
On Unix using the fork start method, a child process can make use of a shared resource created in a parent process using a global resource. However, it is better to pass the object as an argument to the constructor for the child process.
Apart from making the code (potentially) compatible with Windows and the other start methods this also ensures that as long as the child process is still alive the object will not be garbage collected in the parent process. This might be important if some resource is freed when the object is garbage collected in the parent process.
Therefore, I slightly modified your example by removing everything unnecessary, but showing an approach where the shared queue is explicitly passed to all processes that use it:
import multiprocessing

MAX = 5

class SIOSerever:
    def __init__(self, queue):
        self.cycletimeQueue = queue

    def run(self):
        for i in range(MAX):
            self.cycletimeQueue.put(i)

    @staticmethod
    def fetchFromQueue(cycletimeQueue):
        while True:
            cycle = cycletimeQueue.get()
            print(cycle)
            if cycle >= MAX - 1:
                break

def start_server(queue):
    server = SIOSerever(queue)
    server.run()

if __name__ == '__main__':
    try:
        queue = multiprocessing.Queue()

        receiveData = multiprocessing.Process(target=start_server, args=(queue,))
        receiveData.start()

        receiveProcess = multiprocessing.Process(target=SIOSerever.fetchFromQueue, args=(queue,))
        receiveProcess.start()

        receiveData.join()
        receiveProcess.join()
    except Exception as error:
        print(error)
0
1
...
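To map this back to the original socketio setup: below is a rough, untested sketch of the same idea, where the eventlet listener, the socketio.Server and the WSGI app are all created inside the child process, so nothing unpicklable ever has to cross the process boundary and only the queue is passed in explicitly (the 'production' event name is assumed from your question):
import multiprocessing
import eventlet
import socketio

def start_sio_server(queue):
    # everything socket-related lives in this process only
    sio = socketio.Server(cors_allowed_origins='*', logger=False)
    app = socketio.WSGIApp(sio, static_files={'/': 'index.html'})

    @sio.on('production')
    def p_message(sid, message):
        queue.put(message)

    eventlet.wsgi.server(eventlet.listen(('0.0.0.0', 5000)), app)

def fetch_from_queue(queue):
    while True:
        print(queue.get())

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    multiprocessing.Process(target=start_sio_server, args=(queue,)).start()
    multiprocessing.Process(target=fetch_from_queue, args=(queue,)).start()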

Related

How do I share a list across processes with SyncManager in Python without passing references

In Python, the multiprocessing module provides managers that can generate shared lists/dicts between processes.
However, I'm having an issue using these shared objects if the processes accessing the manager are not child processes, but are instead connecting to the manager via Manager.connect.
Here's a very basic example: I'm trying to create a shared list that can be accessed by a group of processes. For this example, I'm just launching this same code twice in two terminal windows:
import os, time
from multiprocessing.managers import SyncManager

def main() -> None:
    print(f"I am process {os.getpid()}")
    print(f"Starting proxy server...")
    manager = SyncManager(address=("127.0.0.1", 8000), authkey=b"noauth")
    try:
        manager.start()    # will start the sync process if it doesn't exist
    except:
        manager.connect()  # if it does already exist, connect to it instead
    print(f"Proxy server started/connected")

    # would like to generate a shared list that each process can access.
    sharedList = manager.list()  # this generates a new list, so each process gets their own, which is not what I want
    sharedList.append(os.getpid())

    time.sleep(20)

if __name__ == '__main__':
    main()
Python's documentation on using remote managers seems similar to what I'm looking for, but there's no information on how to get a Manager.list or Manager.dict shared.
Note: I'd also be perfectly happy sharing a namespace object.
Here's how I ended up solving the problem: you need to manually spawn a manager process that owns the shared list.
import multiprocessing
import os, time, sys
from multiprocessing.managers import SyncManager, ListProxy

class SharedStorage(SyncManager):
    pass

def ManagerProcess():
    sys.stdout = open(os.devnull, 'w')
    sys.stderr = open(os.devnull, 'w')
    l = list()
    SharedStorage.register('get_list', lambda: l, ListProxy)
    try:
        ss = SharedStorage(address=("127.0.0.1", 8000), authkey=b"noauth")
        ss.get_server().serve_forever()
    except OSError:
        # failed to listen on port - already in use.
        pass

def main() -> None:
    print(f"I am process {os.getpid()}")
    print(f"Starting proxy server...")
    mainProcess = multiprocessing.Process(target=ManagerProcess, daemon=True)
    mainProcess.start()

    SharedStorage.register('get_list')
    manager = SharedStorage(address=("127.0.0.1", 8000), authkey=b"noauth")
    manager.connect()
    print(f"Proxy server started/connected")

    # required - see https://bugs.python.org/issue7503
    multiprocessing.current_process().authkey = b"noauth"

    # get reference to the shared list object
    shared_list = manager.get_list()
    shared_list.append(os.getpid())
    for i in shared_list:
        print(i)

    time.sleep(20)

if __name__ == '__main__':
    main()
This can be run several times safely, as manager processes spawned by later processes will exit after they cannot listen on the port.

Python callback for a multiprocess Queue or Pipe

Is there a way to create a callback that executes whenever something is sent to the main process from a child process initiated via multiprocessing? The best I can think of thus far is:
import multiprocessing as mp
import threading
import time

class SomeProcess(mp.Process):
    def run(self):
        while True:
            time.sleep(1)
            self.queue.put(time.time())

class ProcessListener(threading.Thread):
    def run(self):
        while True:
            value = self.queue.get()
            do_something(value)

if __name__ == '__main__':
    queue = mp.Queue()

    sp = SomeProcess()
    sp.queue = queue

    pl = ProcessListener()
    pl.queue = queue

    sp.start()
    pl.start()
No, there is no cleaner way to do this than the one you already posted.
This is how concurrent.futures.ProcessPoolExecutor and multiprocessing.Pool are actually implemented: they have a dedicated thread which drains the tasks/results queue and runs any associated callback.
If you want to save some resources, you can use a SimpleQueue in this case.
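For illustration, a minimal sketch (not from the question) of letting concurrent.futures do this for you: ProcessPoolExecutor's own machinery collects each result and runs the callback registered with add_done_callback, so no hand-rolled listener thread is needed:
import concurrent.futures
import time

def work(n):
    # stand-in for the child process's work
    time.sleep(1)
    return time.time()

def on_result(future):
    # invoked in the parent once the worker's result has arrived
    print("got:", future.result())

if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for n in range(3):
            executor.submit(work, n).add_done_callback(on_result)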

Python - Non-empty shared list on separate thread appears empty

I have two classes - MessageProducer and MessageConsumer.
MessageConsumer does the following:
receives messages and puts them in its message list "_unprocessed_msgs"
on a separate worker thread, moves the messages to internal list "_in_process_msgs"
on the worker thread, processes messages from "_in_process_msgs"
In my development environment, I'm facing an issue with #2 above - after adding a message in step #1, when the worker thread checks the length of "_unprocessed_msgs", it gets zero.
When step #1 is repeated, the list properly shows 2 items on the thread on which the item was added. But in step #2, on the worker thread, len(_unprocessed_msgs) again returns zero.
I'm not sure why this is happening and would really appreciate any help.
I'm using Ubuntu 16.04 having Python 2.7.12.
Below is the sample source code. Please let me know if more information is required.
import logging
import threading
import time

# LOG is used below but was not defined in the original snippet; a module logger is assumed
LOG = logging.getLogger(__name__)

class MessageConsumerThread(threading.Thread):
    def __init__(self):
        super(MessageConsumerThread, self).__init__()
        self._unprocessed_msg_q = []
        self._in_process_msg_q = []
        self._lock = threading.Lock()
        self._stop_processing = False

    def start_msg_processing_thread(self):
        self._stop_processing = False
        self.start()

    def stop_msg_processing_thread(self):
        self._stop_processing = True

    def receive_msg(self, msg):
        with self._lock:
            LOG.info("Before: MessageConsumerThread::receive_msg: "
                     "len(self._unprocessed_msg_q)=%s" %
                     len(self._unprocessed_msg_q))
            self._unprocessed_msg_q.append(msg)
            LOG.info("After: MessageConsumerThread::receive_msg: "
                     "len(self._unprocessed_msg_q)=%s" %
                     len(self._unprocessed_msg_q))

    def _queue_unprocessed_msgs(self):
        with self._lock:
            LOG.info("MessageConsumerThread::_queue_unprocessed_msgs: "
                     "len(self._unprocessed_msg_q)=%s" %
                     len(self._unprocessed_msg_q))
            if self._unprocessed_msg_q:
                LOG.info("Moving messages from unprocessed to in_process queue")
                self._in_process_msg_q += self._unprocessed_msg_q
                self._unprocessed_msg_q = []
                LOG.info("Moved messages from unprocessed to in_process queue")

    def run(self):
        while not self._stop_processing:
            # Allow other threads to add messages to message queue
            time.sleep(1)
            # Move unprocessed listeners to in-process listener queue
            self._queue_unprocessed_msgs()
            # If nothing to process continue the loop
            if not self._in_process_msg_q:
                continue
            for msg in self._in_process_msg_q:
                self.consume_message(msg)
            # Clean up processed messages
            del self._in_process_msg_q[:]

    def consume_message(self, msg):
        print(msg)

class MessageProducerThread(threading.Thread):
    def __init__(self, producer_id, msg_receiver):
        super(MessageProducerThread, self).__init__()
        self._producer_id = producer_id
        self._msg_receiver = msg_receiver

    def start_producing_msgs(self):
        self.start()

    def run(self):
        for i in range(1, 10):
            msg = "From: %s; Message:%s" % (self._producer_id, i)
            self._msg_receiver.receive_msg(msg)

def main():
    msg_receiver_thread = MessageConsumerThread()
    msg_receiver_thread.start_msg_processing_thread()

    msg_producer_thread = MessageProducerThread(producer_id='Producer-01',
                                                msg_receiver=msg_receiver_thread)
    msg_producer_thread.start_producing_msgs()

    msg_producer_thread.join()
    msg_receiver_thread.stop_msg_processing_thread()
    msg_receiver_thread.join()

if __name__ == '__main__':
    main()
Following is the log that I get:
INFO: MessageConsumerThread::_queue_unprocessed_msgs: len(self._unprocessed_msg_q)=0
INFO: Before: MessageConsumerThread::receive_msg: len(self._unprocessed_msg_q)=0
INFO: After: MessageConsumerThread::receive_msg: **len(self._unprocessed_msg_q)=1**
INFO: MessageConsumerThread::_queue_unprocessed_msgs: **len(self._unprocessed_msg_q)=0**
INFO: MessageConsumerThread::_queue_unprocessed_msgs: len(self._unprocessed_msg_q)=0
INFO: Before: MessageConsumerThread::receive_msg: len(self._unprocessed_msg_q)=1
INFO: After: MessageConsumerThread::receive_msg: **len(self._unprocessed_msg_q)=2**
INFO: MessageConsumerThread::_queue_unprocessed_msgs: **len(self._unprocessed_msg_q)=0**
This is not a good design for your application.
I spent some time trying to debug this - but threading code is naturally complicated, so we should try to simplify it, instead of making it even more confusing.
When I see threading code in Python, I usually see it written in a procedural form: a normal function that is passed to threading.Thread as the target argument and drives each thread. That way, you don't need to write code for a new class that will only ever have a single instance.
Another thing is that, although Python's global interpreter lock itself guarantees lists won't get corrupted if modified in two separate threads, lists are not a recommended "thread data passing" data structure. You should probably look at threading.Queue for that.
The thing that looks wrong in this code at first sight is probably not the cause of your problem, given your use of locks, but it might be. Instead of
self._unprocessed_msg_q = []
which creates a new list object that the other thread momentarily has no reference to (so it might write data to the old list), you should do:
self._unprocessed_msg_q[:] = []
Or just use the del-slice approach you use in the other method.
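To see the difference, here is a tiny standalone illustration (not from the question's code):
shared = [1, 2, 3]
alias = shared        # e.g. the reference another thread is holding

shared = []           # rebinds the name only; a new list object is created
print(alias)          # -> [1, 2, 3]  (the other reference still sees the old object)

shared = alias
shared[:] = []        # clears the existing list in place
print(alias)          # -> []  (both names see the same, now empty, object)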
But to be on the safe side, and to have more maintainable and less surprising code, you really should switch to a procedural approach there, assuming you stay with Python threading. Treat "Thread" as the "final" object that does its thing, and use Queues around it:
# coding: utf-8

from __future__ import print_function
from __future__ import unicode_literals

from threading import Thread
try:
    from queue import Queue, Empty
except ImportError:
    from Queue import Queue, Empty

import time
import random

TERMINATE_SENTINEL = object()
NO_DATA_SENTINEL = object()

class Receiver(object):
    def __init__(self, queue):
        self.queue = queue
        self.in_process = []

    def receive_data(self, data):
        self.in_process.append(data)

    def consume_data(self):
        print("received data:", self.in_process)
        del self.in_process[:]

    def receiver_loop(self):
        queue = self.queue
        while True:
            try:
                data = queue.get(block=False)
            except Empty:
                print("got no data from queue")
                data = NO_DATA_SENTINEL

            if data is TERMINATE_SENTINEL:
                print("Got sentinel: exiting receiver loop")
                break

            self.receive_data(data)

            time.sleep(random.uniform(0, 0.3))
            if queue.empty():
                # Only process data if we have nothing to receive right now:
                self.consume_data()
                print("sleeping receiver")
                time.sleep(1)
        if self.in_process:
            self.consume_data()

def producer_loop(queue):
    for i in range(10):
        time.sleep(random.uniform(0.05, 0.4))
        print("putting {0} in queue".format(i))
        queue.put(i)

def main():
    msg_queue = Queue()
    msg_receiver_thread = Thread(target=Receiver(msg_queue).receiver_loop)
    time.sleep(0.1)
    msg_producer_thread = Thread(target=producer_loop, args=(msg_queue,))

    msg_receiver_thread.start()
    msg_producer_thread.start()

    msg_producer_thread.join()
    msg_queue.put(TERMINATE_SENTINEL)
    msg_receiver_thread.join()

if __name__ == '__main__':
    main()
Note that since you want multiple methods in the receiver thread to do things with the data, I used a class - but it does not inherit from Thread, and does not have to worry about its workings. All its methods are called within the same thread: no need for locks, no worries about race conditions within the receiver class itself. For communication outside the class, the Queue class is structured to handle any race conditions for us.
The producer loop, as it is just a dummy producer, has no need at all to be written in class form. But it would look just the same, if it had more methods.
(The random sleeps help visualize what would happen in "real world" message receiving)
Also, you might want to take a look at something like:
https://www.thoughtworks.com/insights/blog/composition-vs-inheritance-how-choose
Finally I was able to solve the issue. In the actual code, I have a Manager class that is responsible for instantiating MessageConsumerThread as the last thing in its initializer:
class Manager(object):
    def __init__(self):
        ...
        ...
        self._consumer = MessageConsumerThread(self)
        self._consumer.start_msg_processing_thread()
The problem seems to be with passing 'self' to the MessageConsumerThread initializer while Manager is still executing its own initializer (even though those are the last two steps). The moment I moved the creation of the consumer out of the initializer, the consumer thread was able to see the elements in "_unprocessed_msg_q".
Please note that the issue is still not reproducible with the above sample code; it manifests itself in the production environment only. Without the above fix, I tried a queue and a dictionary as well but observed the same issue. After the fix, I tried it with a queue and a list and was able to successfully execute the code.
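For illustration only, a minimal sketch of what "moving the consumer creation out of the initializer" could look like (the real Manager class is not shown here, so its structure and arguments are assumed):
class Manager(object):
    def __init__(self):
        # ... other initialisation ...
        self._consumer = None

    def start(self):
        # the consumer is created only after __init__ has fully completed
        self._consumer = MessageConsumerThread(self)
        self._consumer.start_msg_processing_thread()

# manager = Manager()
# manager.start()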
I really appreciate and thank @jsbueno and @ivan_pozdeev for their time and help! The Stack Overflow community is very helpful!

Running a class method multiple times in parallel in Python

I have implemented a Python socket server. It sends image data from multiple cameras to a client. My request handler class looks like:
class RequestHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        while True:
            data = self.request.recv(1024)
            if data.endswith('0000000050'):  # client requests data
                for camera_id, camera_path in _video_devices.iteritems():
                    message = self.create_image_transfer_message(camera_id, camera_path)
                    self.request.sendto(message, self.client_address)

    def create_image_transfer_message(self, camera_id, camera_path):
        # somecode ...
I am forced to stick to the socket server because of the client. It works, but the problem is that it runs sequentially, so there are large delays between the camera images being uploaded. I would like to create the transfer messages in parallel, with a small delay between the calls.
I tried to use the Pool class from multiprocessing:
import multiprocessing

class RequestHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        ...
        pool = multiprocessing.Pool(processes=4)
        messages = [pool.apply(self.create_image_transfer_message, args=(camera_id, camera_path))
                    for camera_id, camera_path in _video_devices.iteritems()]
But this throws:
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
I want to know if there is another way to create those transfer messages in parallel with a defined delay between the calls?
EDIT:
I create the response messages using data from multiple cameras. The problem is that if I run the image-grabbing routines too close to each other I get image artifacts, because the USB bus is overloaded. I figured out that calling the image grabbing sequentially with a 0.2 sec delay between calls solves the problem. The cameras are not sending data the whole time the image-grabbing function is running, so the delayed parallel calls result in good images with only a small delay between them.
I think you're on the right path already, no need to throw away your work.
Here's an answer on how to use a class method with multiprocessing that I found via Google after searching for "multiprocessing class method":
from multiprocessing import Pool
import time

pool = Pool(processes=2)

def unwrap_self_f(arg, **kwarg):
    return RequestHandler.create_image_transfer_message(*arg, **kwarg)

class RequestHandler(SocketServer.BaseRequestHandler):

    @classmethod
    def create_image_transfer_message(cls, camera_id, camera_path):
        # your logic goes here
        pass

    def handle(self):
        while True:
            data = self.request.recv(1024)
            if not data.endswith('0000000050'):  # client requests data
                continue
            pool.map(unwrap_self_f,
                     (
                         (camera_id, camera_path)
                         for camera_id, camera_path in _video_devices.iteritems()
                     ))
Note, if you want to return values from the workers then you'll need to explore using a shared resource - see this answer here: How can I recover the return value of a function passed to multiprocessing.Process?
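For example, a bare-bones sketch of that shared-resource idea (the worker below is a made-up placeholder, not your create_image_transfer_message) could pass a multiprocessing.Queue into each Process and read the results back in the parent:
import multiprocessing

def worker(camera_id, result_queue):
    # placeholder for the real message-building logic
    result_queue.put((camera_id, "message for %s" % camera_id))

if __name__ == '__main__':
    result_queue = multiprocessing.Queue()
    jobs = [multiprocessing.Process(target=worker, args=(cam, result_queue))
            for cam in ('cam0', 'cam1')]
    for p in jobs:
        p.start()
    for p in jobs:
        p.join()
    while not result_queue.empty():
        print(result_queue.get())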
This code did the trick for me:
class RequestHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        while True:
            data = self.request.recv(1024)
            if data.endswith('0000000050'):  # client requests data
                process_manager = multiprocessing.Manager()
                messaging_queue = process_manager.Queue()
                jobs = []

                for camera_id, camera_path in _video_devices.iteritems():
                    p = multiprocessing.Process(target=self.create_image_transfer_message,
                                                args=(camera_id, camera_path, messaging_queue))
                    jobs.append(p)
                    p.start()
                    time.sleep(0.3)

                # wait for all processes to finish
                for p in jobs:
                    p.join()

                while not messaging_queue.empty():
                    self.request.sendto(messaging_queue.get(), self.client_address)

Python multiprocessing with twisted's reactor

I am working on an xmlrpc server which has to perform certain tasks cyclically. I am using twisted as the core of the xmlrpc service, but I am running into a little problem:
class cemeteryRPC(xmlrpc.XMLRPC):
    def __init__(self, dic):
        xmlrpc.XMLRPC.__init__(self)

    def xmlrpc_foo(self):
        return 1

    def cycle(self):
        print "Hello"
        time.sleep(3)

class cemeteryM( base ):
    def __init__(self, dic):  # dic is for cemetery
        multiprocessing.Process.__init__(self)
        self.cemRPC = cemeteryRPC()

    def run(self):
        # Start reactor on a second process
        reactor.listenTCP( c.PORT_XMLRPC, server.Site( self.cemRPC ) )
        p = multiprocessing.Process( target=reactor.run )
        p.start()

        while not self.exit.is_set():
            self.cemRPC.cycle()
        #p.join()

if __name__ == "__main__":
    import errno
    test = cemeteryM()
    test.start()

    # trying new method
    notintr = False
    while not notintr:
        try:
            test.join()
            notintr = True
        except OSError, ose:
            if ose.errno != errno.EINTR:
                raise ose
        except KeyboardInterrupt:
            notintr = True
How should I go about joining these two processes so that their respective joins don't block?
(I am pretty confused by "join". Why would it block? I have googled but can't find much helpful explanation of how join is used. Can someone explain this to me?)
Regards
Do you really need to run Twisted in a separate process? That looks pretty unusual to me.
Try to think of Twisted's Reactor as your main loop - and hang everything you need off that - rather than trying to run Twisted as a background task.
The more normal way of performing this sort of operation would be to use Twisted's .callLater or to add a LoopingCall object to the Reactor.
e.g.
from twisted.web import xmlrpc, server
from twisted.internet import task
from twisted.internet import reactor

class Example(xmlrpc.XMLRPC):
    def xmlrpc_add(self, a, b):
        return a + b

    def timer_event(self):
        print "one second"

r = Example()
m = task.LoopingCall(r.timer_event)
m.start(1.0)

reactor.listenTCP(7080, server.Site(r))
reactor.run()
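For completeness, the .callLater route mentioned above would look roughly like this (a sketch, not part of the original answer); the callback simply reschedules itself instead of using LoopingCall:
from twisted.internet import reactor

def timer_event():
    print("one second")
    reactor.callLater(1.0, timer_event)  # reschedule ourselves

reactor.callLater(1.0, timer_event)
reactor.run()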
Hey asdvawev - .join() in multiprocessing works just like .join() in threading - it's a blocking call the main thread runs to wait for the worker to shut down. If the worker never shuts down, then .join() will never return. For example:
class myproc(Process):
    def run(self):
        while True:
            time.sleep(1)
Calling run on this means that join() will never, ever return. Typically to prevent this I'll use an Event() object passed into the child process to allow me to signal the child when to exit:
class myproc(Process):
    def __init__(self, event):
        self.event = event
        Process.__init__(self)

    def run(self):
        while not self.event.is_set():
            time.sleep(1)
Alternatively, if your work is encapsulated in a queue - you can simply have the child process work off of the queue until it encounters a sentinel (typically a None entry in the queue) and then shut down.
Both of these suggestions mean that prior to calling .join() you can set the event, or insert the sentinel, and when join() is called, the process will finish its current task and then exit properly.
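A small sketch of the sentinel variant (a None entry in the queue, as described above):
import multiprocessing

def worker(queue):
    while True:
        item = queue.get()
        if item is None:      # sentinel: time to shut down
            break
        print(item)

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(queue,))
    p.start()
    for i in range(3):
        queue.put(i)
    queue.put(None)           # insert the sentinel
    p.join()                  # returns once the child sees the sentinel and exits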
