Essentially I'm using the socketserver Python library to handle communications from a central server to multiple Raspberry Pi 4 and ESP32 peripherals. Currently I have the socketserver running serve_forever, and the request handler calls a method from a ProcessManager class which starts a process that should handle the actual communication with the client.
It works fine if I use .join() on the process so that the ProcessManager method doesn't exit, but that's not how I would like it to run. Without .join() I get a broken pipe error as soon as the client communication process tries to send a message back to the client.
This is the process manager class; it gets defined in the main file, and buildprocess is called through the request handler of the socketserver class:
import multiprocessing as mp
mp.allow_connection_pickling()
import queuemanager as qm
import hostmain as hmain
import camproc
import keyproc
import controlproc

# method that gets called into a process so that class and socket share memory
def callprocess(periclass, peritype, clientsocket, inqueue, genqueue):
    periclass.startup(clientsocket)

class ProcessManager(qm.QueueManager):
    def wipeproc(self, target):
        # TODO make wipeproc integrate with the queue manager rather than directly to the class
        for macid in list(self.procdict.keys()):
            if target == macid:
                # calls proc kill for the class
                try:
                    self.procdict[macid]["class"].prockill()
                except Exception as e:
                    print("exception:", e, "in wipeproc")
                # waits for process to exit naturally (class threads to close)
                self.procdict[macid]["process"].join()
                # remove dict entry for this macid
                self.procdict.pop(macid)

    # called externally to create the new process and append to procdict
    def buildprocess(self, peritype, macid, clientsocket):
        # TODO put some logic here to handle the differences of the controller process
        # generates queue object
        inqueue = mp.Queue()
        # creates periclass instance based on type
        if peritype == hmain.cam:
            periclass = camproc.CamMain(self, inqueue, self.genqueue)
        elif peritype == hmain.keypad:
            print("to be added to")
        elif peritype == hmain.motion:
            print("to be added to")
        elif peritype == hmain.controller:
            print("to be added to")
        # init and start call for the new process
        self.procdict[macid] = {"type": peritype, "inqueue": inqueue, "class": periclass, "process": None}
        self.procdict[macid]["process"] = mp.Process(
            target=callprocess,
            args=(self.procdict[macid]["class"], self.procdict[macid]["type"], clientsocket,
                  self.procdict[macid]["inqueue"], self.genqueue))
        self.procdict[macid]["process"].start()
        # updating the process dictionary before class obj gets appended
        # if macid in list(self.procdict.keys()):
        #     self.wipeproc(macid)
        print(self.procdict)
        print("client added")
To my eye, all the pertinent objects should be stored in the procdict dictionary, but as I mentioned it just gets a broken pipe error unless I join the process with self.procdict[macid]["process"].join() before the end of the buildprocess method.
I would like buildprocess to return but leave the communication process running as is. I've tried a few different things with restructuring what gets defined within the process and without, but to no avail. So far I haven't been able to find any pertinent solutions online, but of course I may have missed something too.
Thank you for reading this far if you did! I've been stuck on this for a couple of days, so any help would be appreciated; this is my first project with multiprocessing and sockets on any sort of scale.
#################
Edit to include pastebin with all the code:
https://pastebin.com/u/kadytoast/1/PPWfyCFT
Without .join() i get a broken pipe error as soon as the client communication process tries to send a message back to the client.
That's because by the time the request handler's handle() returns, socketserver shuts down the connection. That socketserver simplifies the task of writing network servers means it does certain things automatically which are usually done in the course of network request handling. Your code is not quite making the intended use of socketserver. In particular, the Asynchronous Mixins are intended for handling requests asynchronously: with ForkingMixIn the server will spawn a new process for each request, in contrast to your current code, which does this by itself with mp.Process. So I think you have basically two options:
code less of the request handling yourself and use the provided socketserver methods
stay with your own handling and don't use socketserver at all, so it won't get in the way.
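For the first option, a minimal sketch of what that could look like (the handler name and port are mine, not from your code): socketserver.ForkingTCPServer is just TCPServer with ForkingMixIn mixed in, it forks a child per connection (POSIX only), and the connection stays open for as long as handle() runs.

import socketserver

class PeripheralHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # runs in its own forked child; the connection is only shut down
        # when this method returns
        for line in self.rfile:
            self.wfile.write(b"ack: " + line)

if __name__ == "__main__":
    # ForkingTCPServer = ForkingMixIn + TCPServer
    with socketserver.ForkingTCPServer(("0.0.0.0", 9000), PeripheralHandler) as server:
        server.serve_forever()

The per-peripheral logic that currently lives in callprocess/startup would then move into handle(), and queues created in the server before serve_forever could be reached from the handler via the server instance.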
Related
I have the code below that I would like to use to receive POST requests with data that I will then use to print labels on a label printer. In order to print the labels I will need to write a file with print commands and then run an lp command via the command line to copy the file to the label printer.
The problem I have is that multiple people could be printing labels at the same time. So my question is: do I have to change the code below to use ThreadingMixIn in order to handle concurrent POST requests, or can I leave the code as is, with only a slight delay for the secondary request in a concurrent scenario (that is, any further requests will be queued and not lost)?
If I have to go the threaded way how does that impact the writing of the file and subsequent command line call to lp if there are now multiple threads trying to write to the same file?
Note that there are multiple label printers that are being accessed through print queues (CUPS).
import json
from http.server import HTTPServer, BaseHTTPRequestHandler
from io import BytesIO

class SimpleHTTPRequestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b'Hello, world!')

    def do_POST(self):
        content_length = int(self.headers['Content-Length'])
        body = self.rfile.read(content_length)
        try:
            result = json.loads(body, encoding='utf-8')
            self.send_response(200)
            self.end_headers()
            response = BytesIO()
            response.write(b'This is POST request. ')
            response.write(b'Received: ')
            response.write(body)
            self.wfile.write(response.getvalue())
        except Exception as exc:
            self.wfile.write('Request has failed to process. Error: %s', exc.message)

httpd = HTTPServer(('localhost', 8000), SimpleHTTPRequestHandler)
httpd.serve_forever()
Why not try using a unique file name for each job?
That way you are sure that there will be no name clashes.
Have a look at https://docs.python.org/2/library/tempfile.html and consider the NamedTemporaryFile function. You should use delete=False, otherwise the file is deleted immediately after close().
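A minimal sketch of that idea (the printer queue name and label text are placeholders): each job gets its own NamedTemporaryFile created with delete=False so the path survives close(), and lp then copies it to the CUPS queue.

import subprocess
import tempfile

def print_label(label_text, printer="label-printer-1"):
    # delete=False keeps the file on disk after close(), so lp can still read it
    with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) as f:
        f.write(label_text)
        path = f.name
    # -d selects the CUPS destination; lp copies the file, so there are no name clashes
    subprocess.run(["lp", "-d", printer, path], check=True)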
According to your question, what I understand is that you have only one label printer and multiple producers trying to print labels on it.
So even if you switch to a multithreading option, you will have to synchronize threads in order to avoid deadlocks and infinite waiting.
So my best take is to go with the built-in Python queue module.
according to the doc
The queue module implements multi-producer, multi-consumer queues. It is especially useful in threaded programming when information must be exchanged safely between multiple threads. The Queue class in this module implements all the required locking semantics. It depends on the availability of thread support in Python,
Even though it's a multi-consumer, multi-producer queue, I suppose it will still work for you like a charm.
So here is what you need to do:
Your server receives a request to print the label.
It does the necessary processing/cleanup and puts the job in the queue.
A worker thread pops items from the queue and executes the task.
Or, if you expect the system to be big enough, here are some links; the steps would be the same as above.
RabbitMq - A scalable message broker (simply put a queue)
Celery - A python package for popping items from a message broker such as rabbitmq and executes it
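A rough sketch of the in-process queue version of those steps, assuming a print_label() helper like the one sketched above; the request handler only enqueues, and a single daemon worker thread drains the queue so the lp calls never overlap:

import queue
import threading

label_jobs = queue.Queue()

def label_worker():
    while True:
        job = label_jobs.get()      # blocks until a job arrives
        try:
            print_label(job)        # hypothetical helper from the earlier sketch
        finally:
            label_jobs.task_done()

threading.Thread(target=label_worker, daemon=True).start()

# inside do_POST(), after parsing the body:
#     label_jobs.put(parsed_label_text)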
The problem I've got right now concerns a chat client I've been trying to get working for some days now. It's supposed to be an upgrade of my original chat client, which could only reply to people if it received a message first.
So after asking around and researching, I decided to use select.select to handle my client.
Problem is, it has the same problem as always:
*The loop gets stuck on receiving and won't complete until it receives something*
Here's what I wrote so far:
import select
import sys  # because why not?
import threading
import queue
from socket import socket, AF_INET, SOCK_STREAM

print("New Chat Client Using Select Module")
HOST = input("Host: ")
PORT = int(input("Port: "))
s = socket(AF_INET, SOCK_STREAM)
print("Trying to connect....")
s.connect((HOST, PORT))
s.setblocking(0)
# Not including setblocking(0) because select handles that.
print("You just connected to", HOST)
# Lets now try to handle the client a different way!
while True:
    # Attempting to create a few threads
    Reading_Thread = threading.Thread(None, s)
    Reading_Thread.start()
    Writing_Thread = threading.Thread()
    Writing_Thread.start()
    Incoming_data = [s]
    Exportable_data = []
    Exceptions = []
    User_input = input("Your message: ")
    rlist, wlist, xlist = select.select(Incoming_data, Exportable_data, Exceptions)
    if User_input == True:
        Exportable_data += [User_input]
You're probably wondering why I've got threading and queues in there.
That's because people told me I could solve the problem by using threading and queues, but after reading documentation and looking for video tutorials or examples that matched my case, I still don't know how I can use them to make my client work.
Could someone please help me out here? I just need to find a way to have the client enter messages as much as they'd like without waiting for a reply. This is just one of the ways I am trying to do it.
Normally you'd create a function in which your while True loop runs and can receive the data, which it can write to some buffer or queue to which your main thread has access.
You'd need to synchronize access to this queue so as to avoid data races.
I'm not too familiar with Python's threading API, however creating a function which runs in a thread can't be that hard. Lemme find an example.
Turns out you can create a class with a run function, where the class derives from threading.Thread. Then you create an instance of your class and start the thread that way.
import threading
import time

class WorkerThread(threading.Thread):
    def run(self):
        while True:
            print('Working hard')
            time.sleep(0.5)

def runstuff():
    worker = WorkerThread()
    worker.start()  # start thread here, which will call run()
You can also use a simpler API and create a function and call thread.start_new_thread(fun, args) on it, which will run that function in a thread.
def fun():
    while True:
        # do stuff
        pass

# 'thread' is the Python 2 module name; in Python 3 it is '_thread'
thread.start_new_thread(fun, ())  # run in thread; the args tuple is required
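Applied to the chat client, a rough, untested sketch of that idea (it assumes the connected socket s from your code): a background thread blocks on recv() and pushes incoming text onto a queue.Queue, so the main loop is free to read user input and drain the queue. Note that because input() itself blocks, messages that arrive while you are typing only show up after you press Enter.

import queue
import threading

incoming = queue.Queue()

def reader(sock, q):
    # runs in the background, so a blocking recv() no longer stalls the main loop
    while True:
        data = sock.recv(4096)
        if not data:
            break
        q.put(data.decode())

threading.Thread(target=reader, args=(s, incoming), daemon=True).start()

while True:
    msg = input("Your message: ")
    if msg:
        s.sendall(msg.encode())
    # show anything that arrived while we were typing
    while not incoming.empty():
        print(incoming.get())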
I am trying to create a class that can run a separate process to go do some work that takes a long time, launch a bunch of these from a main module and then wait for them all to finish. I want to launch the processes once and then keep feeding them things to do rather than creating and destroying processes. For example, maybe I have 10 servers running the dd command, then I want them all to scp a file, etc.
My ultimate goal is to create a class for each system that keeps track of the information for the system in which it is tied to like IP address, logs, runtime, etc. But that class must be able to launch a system command and then return execution back to the caller while that system command runs, to followup with the result of the system command later.
My attempt is failing because I cannot send an instance method of a class over the pipe to the subprocess via pickle. Those are not pickleable. I therefore tried to fix it various ways but I can't figure it out. How can my code be patched to do this? What good is multiprocessing if you can't send over anything useful?
Is there any good documentation of multiprocessing being used with class instances? The only way I can get the multiprocessing module to work is on simple functions. Every attempt to use it within a class instance has failed. Maybe I should pass events instead? I don't understand how to do that yet.
import multiprocessing
import sys
import re


class ProcessWorker(multiprocessing.Process):
    """
    This class runs as a separate process to execute worker's commands in parallel
    Once launched, it remains running, monitoring the task queue, until "None" is sent
    """

    def __init__(self, task_q, result_q):
        multiprocessing.Process.__init__(self)
        self.task_q = task_q
        self.result_q = result_q
        return

    def run(self):
        """
        Overloaded function provided by multiprocessing.Process. Called upon start() signal
        """
        proc_name = self.name
        print '%s: Launched' % (proc_name)
        while True:
            next_task_list = self.task_q.get()
            if next_task is None:
                # Poison pill means shutdown
                print '%s: Exiting' % (proc_name)
                self.task_q.task_done()
                break
            next_task = next_task_list[0]
            print '%s: %s' % (proc_name, next_task)
            args = next_task_list[1]
            kwargs = next_task_list[2]
            answer = next_task(*args, **kwargs)
            self.task_q.task_done()
            self.result_q.put(answer)
        return
# End of ProcessWorker class


class Worker(object):
    """
    Launches a child process to run commands from derived classes in separate processes,
    which sit and listen for something to do
    This base class is called by each derived worker
    """

    def __init__(self, config, index=None):
        self.config = config
        self.index = index

        # Launch the ProcessWorker for anything that has an index value
        if self.index is not None:
            self.task_q = multiprocessing.JoinableQueue()
            self.result_q = multiprocessing.Queue()

            self.process_worker = ProcessWorker(self.task_q, self.result_q)
            self.process_worker.start()
            print "Got here"
            # Process should be running and listening for functions to execute
        return

    def enqueue_process(target):  # No self, since it is a decorator
        """
        Used to place a command target from this class object into the task_q
        NOTE: Any function decorated with this must use fetch_results() to get the
        target task's result value
        """
        def wrapper(self, *args, **kwargs):
            self.task_q.put([target, args, kwargs])  # FAIL: target is a class instance method and can't be pickled!
        return wrapper

    def fetch_results(self):
        """
        After all processes have been spawned by multiple modules, this command
        is called on each one to retrieve the results of the call.
        This blocks until the execution of the item in the queue is complete
        """
        self.task_q.join()           # Wait for it to finish
        return self.result_q.get()   # Return the result

    @enqueue_process
    def run_long_command(self, command):
        print "I am running number % as process "%number, self.name

        # In here, I will launch a subprocess to run a long-running system command
        # p = Popen(command), etc
        # p.wait(), etc
        return

    def close(self):
        self.task_q.put(None)
        self.task_q.join()


if __name__ == '__main__':
    config = ["some value", "something else"]
    index = 7
    workers = []
    for i in range(5):
        worker = Worker(config, index)
        worker.run_long_command("ls /")
        workers.append(worker)
    for worker in workers:
        worker.fetch_results()

    # Do more work... (this would actually be done in a distributor in another class)

    for worker in workers:
        worker.close()
Edit: I tried to move the ProcessWorker class and the creation of the multiprocessing queues outside of the Worker class and then tried to manually pickle the worker instance. Even that doesn't work and I get an error:
RuntimeError: Queue objects should only be shared between processes through inheritance
But I am only passing references of those queues into the worker instance?? I am missing something fundamental. Here is the modified code from the main section:
if __name__ == '__main__':
    config = ["some value", "something else"]
    index = 7
    workers = []
    for i in range(1):
        task_q = multiprocessing.JoinableQueue()
        result_q = multiprocessing.Queue()

        process_worker = ProcessWorker(task_q, result_q)
        worker = Worker(config, index, process_worker, task_q, result_q)
        something_to_look_at = pickle.dumps(worker)  # FAIL: Doesn't like queues??

        process_worker.start()
        worker.run_long_command("ls /")
So, the problem was that I was assuming that Python was doing some sort of magic that is somehow different from the way that C++/fork() works. I somehow thought that Python only copied the class, not the whole program into a separate process. I seriously wasted days trying to get this to work because all of the talk about pickle serialization made me think that it actually sent everything over the pipe. I knew that certain things could not be sent over the pipe, but I thought my problem was that I was not packaging things up properly.
This all could have been avoided if the Python docs gave me a 10,000 ft view of what happens when this module is used. Sure, it tells me what the methods of multiprocess module does and gives me some basic examples, but what I want to know is what is the "Theory of Operation" behind the scenes! Here is the kind of information I could have used. Please chime in if my answer is off. It will help me learn.
When you start a process using this module, the whole program is copied into another process. But since it is not the "__main__" process and my code was checking for that, it doesn't fire off yet another process infinitely. It just stops and sits out there waiting for something to do, like a zombie. Everything that was initialized in the parent at the time of calling multiprocess.Process() is all set up and ready to go. Once you put something in the multiprocess.Queue or shared memory, or pipe, etc. (however you are communicating), then the separate process receives it and gets to work. It can draw upon all imported modules and setup just as if it was the parent. However, once some internal state variables change in the parent or separate process, those changes are isolated. Once the process is spawned, it now becomes your job to keep them in sync if necessary, either through a queue, pipe, shared memory, etc.
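A tiny sketch of that isolation (the names are mine, not from the question): state that exists before start() is visible in the child, but changes made afterwards stay on whichever side made them.

import multiprocessing as mp

class Holder:
    value = "set before start()"

def child(holder):
    print("child sees:", holder.value)    # "set before start()"
    holder.value = "changed in child"     # invisible to the parent

if __name__ == "__main__":
    h = Holder()
    p = mp.Process(target=child, args=(h,))
    p.start()
    p.join()
    print("parent sees:", h.value)        # still "set before start()"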
I threw out the code and started over, but now I am only putting one extra function out in the ProcessWorker, an "execute" method that runs a command line. Pretty simple. I don't have to worry about launching and then closing a bunch of processes this way, which has caused me all kinds of instability and performance issues in the past in C++. When I switched to launching processes at the beginning and then passing messages to those waiting processes, my performance improved and it was very stable.
BTW, I looked at this link to get help, which threw me off because the example made me think that methods were being transported across the queues: http://www.doughellmann.com/PyMOTW/multiprocessing/communication.html
The second example of the first section used "next_task()" that appeared (to me) to be executing a task received via the queue.
Instead of attempting to send a method itself (which is impractical), try sending a name of a method to execute.
Provided that each worker runs the same code, it's a matter of a simple getattr(self, task_name).
I'd pass tuples (task_name, task_args), where task_args is a dict to be fed directly to the task method:
next_task_name, next_task_args = self.task_q.get()
if next_task_name:
    task = getattr(self, next_task_name)
    answer = task(**next_task_args)
    ...
else:
    # poison pill, shut down
    break
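On the producing side, the decorator from the question would then enqueue a method name and keyword arguments instead of a bound method; a sketch under that assumption (callers would pass keyword arguments, e.g. worker.run_long_command(command="ls /"), and close() would put something like (None, None) so the unpacking above still works):

def enqueue_process(target):  # No self, since it is a decorator
    def wrapper(self, *args, **kwargs):
        # the method *name* is picklable, the bound method is not
        self.task_q.put((target.__name__, kwargs))
    return wrapper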
REF: https://stackoverflow.com/a/14179779
Answer on Jan 6 at 6:03 by David Lynch is not factually correct when he says that he was misled by
http://www.doughellmann.com/PyMOTW/multiprocessing/communication.html.
The code and examples provided are correct and work as advertised. next_task() is executing a task received via the queue -- try and understand what the Task.__call__() method is doing.
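For reference, the pattern that example relies on is roughly this (a condensed sketch, not the PyMOTW code verbatim): the queue carries small picklable Task objects, and next_task() works because Task defines __call__.

import multiprocessing

class Task:
    def __init__(self, a, b):
        self.a, self.b = a, b

    def __call__(self):
        # this is what runs in the consumer process when it calls next_task()
        return self.a * self.b

def consumer(task_q, result_q):
    while True:
        next_task = task_q.get()
        if next_task is None:          # poison pill
            task_q.task_done()
            break
        result_q.put(next_task())      # invokes Task.__call__
        task_q.task_done()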
In my case, what tripped me up was syntax errors in my implementation of run(). It seems that the sub-process will not report this and just fails silently -- leaving things stuck in weird loops! Make sure you have some kind of syntax checker running, e.g. Flymake/Pyflakes in Emacs.
Debugging via multiprocessing.log_to_stderr() helped me narrow down the problem.
I'm trying to send a signal from a non-main thread in PyQt but I don't know what I'm doing wrong! When I execute the program it fails with this error:
QObject::connect: Cannot queue arguments of type 'QTextCursor'
(Make sure 'QTextCursor' is registered using qRegisterMetaType().)
here is my code:
class Sender(QtCore.QThread):
    def __init__(self, q):
        super(Sender, self).__init__()
        self.q = q

    def run(self):
        while True:
            pass
            try: line = q.get_nowait()
            # or q.get(timeout=.1)
            except Empty:
                pass
            else:
                self.emit(QtCore.SIGNAL('tri()'))

class Workspace(QMainWindow, Ui_MainWindow):
    """ This class is for managing the whole GUI `Workspace'.
        Currently a Workspace is similar to a MainWindow
    """
    def __init__(self):
        try:
            from Queue import Queue, Empty
        except ImportError:
            while True:
                # from queue import Queue, Empty  # python 3.x
                print "error"

        ON_POSIX = 'posix' in sys.builtin_module_names

        def enqueue_output(out, queue):
            for line in iter(out.readline, b''):
                queue.put(line)
            out.close()

        p = Popen(["java -Xmx256m -jar bin/HelloWorld.jar"],
                  cwd=r'/home/karen/sphinx4-1.0beta5-src/sphinx4-1.0beta5/',
                  stdout=PIPE, shell=True, bufsize=4024)
        q = Queue()
        t = threading.Thread(target=enqueue_output, args=(p.stdout, q))
        t.daemon = True  # thread dies with the program
        t.start()

        self.sender = Sender(q)
        self.connect(self.sender, QtCore.SIGNAL('tri()'), self.__action_About)
        self.sender.start()
I think that my way of sending a parameter to the thread is wrong...
I need to know how to send parameters to a thread; in my case I need to send q to the worker thread.
Quite new to PyQt5, but this appears to happen when you try to do a GUI operation from a thread which is not the "application thread". I put this in quotes because it appears to be a mistake to think that, even in a fairly simple PyQt5 app, QApplication.instance().thread() will always return the same object.
The thing to do is to use the signal/slot mechanism to send any kind of data from a worker thread (a thread created in my case by extending QtCore.QRunnable, one other pattern apparently being QtCore.QThread and QtCore.QObject.moveToThread, see here).
Then also include a check in all your slot methods which are likely to receive data from a non-"application thread". Example which logs messages visually during execution:
def append_message(self, message):
    # this "instance" method is very useful!
    app_thread = QtWidgets.QApplication.instance().thread()
    curr_thread = QtCore.QThread.currentThread()
    if app_thread != curr_thread:
        raise Exception('attempt to call MainWindow.append_message from non-app thread')
    ms_now = datetime.datetime.now().isoformat(sep=' ', timespec='milliseconds')
    self.messages_text_box.insertPlainText(f'{ms_now}: {message}\n')
    # scroll to bottom
    self.messages_text_box.moveCursor(QtGui.QTextCursor.End)
It's all too easy to just call this inadvertently and directly from a non-"application thread".
Making such a mistake and then getting an exception raised is good, because it gives you a stack trace showing the culprit call. Then change the call so that it instead sends a signal to the GUI class, the slot for which could be the method in the GUI class (here append_message), or alternatively one which then in turn calls append_message.
In my example I've included the "scroll to bottom" line above because it was only when I added that line that these "cannot queue" errors started happening. In other words, it is perfectly possible to get away with a certain amount of non-compliant handling (in this case adding some more text with each call) without any error being raised... and only later do you then run into difficulties. To prevent this, I suggest that EVERY method in a GUI class with GUI functionality should include such a check!
Make sure 'QTextCursor' is registered using qRegisterMetaType().
Did you try to use the qRegisterMetaType() function?
The official manual says:
The class is used as a helper to marshall types in QVariant and in
queued signals and slots connections. It associates a type name to a
type so that it can be created and destructed dynamically at run-time.
Declare new types with Q_DECLARE_METATYPE() to make them available to
QVariant and other template-based functions. Call qRegisterMetaType()
to make type available to non-template based functions, such as the
queued signal and slot connections.
I would like to add the following notes to @mike rodent's post, which solved my problem (I'm using PyQt5):
Custom signals and slots can be used to avoid directly modifying the GUI from a thread other than the "application thread" (I'm using the Python threading module, and the equivalent there is probably the "main thread"). I find this website very useful for basic custom signal and slot setup. Pay attention to using a class (and not an instance) attribute.
To avoid the QObject::connect: Cannot queue arguments of type 'QTextCursor' message I needed to find the following locations and add some code:
Before the function __init__ of the class MainWindow: definition of class attribute; I needed to use something like class_attribute = pyqtSignal(str).
In the function __init__: self.class_attribute.connect(self.slot_name)
Inside of a thread (I mean the thread which is not the main thread): self.class_attribute.emit(str)
In the slot inside the main thread: the "safety mechanism" proposed by @mike rodent.
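Putting those four notes together, a minimal PyQt5 sketch (the widget, signal and slot names are mine):

import threading
import time
from PyQt5 import QtWidgets
from PyQt5.QtCore import pyqtSignal

class MainWindow(QtWidgets.QMainWindow):
    # 1. class attribute (not an instance attribute)
    line_received = pyqtSignal(str)

    def __init__(self):
        super().__init__()
        self.text_box = QtWidgets.QPlainTextEdit(self)
        self.setCentralWidget(self.text_box)
        # 2. connect the signal to a slot that lives in the application thread
        self.line_received.connect(self.append_line)
        # 3. the worker thread only ever emits; it never touches widgets
        threading.Thread(target=self.worker, daemon=True).start()

    def worker(self):
        for i in range(5):
            self.line_received.emit(f"line {i}")
            time.sleep(0.5)

    # 4. the slot runs in the application thread, so GUI calls are safe here
    def append_line(self, text):
        self.text_box.appendPlainText(text)

if __name__ == "__main__":
    app = QtWidgets.QApplication([])
    win = MainWindow()
    win.show()
    app.exec_()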
I have a Twisted project which seeks to essentially rebroadcast collected data over TCP in JSON. I have a USB library which I need to subscribe to and synchronously read in a while loop indefinitely, like so:
while True:
    for line in usbDevice.streamData():
        data = MyBrandSpankingNewUSBDeviceData(line)
        # parse the data, convert to JSON
        output = convertDataToJSON(data)
        # broadcast the data
        ...
The problem, of course, is the .... Essentially, I need to start this process as soon as the server starts and end it when the server ends (Protocol.doStart and Protocol.doStop), and have it constantly running and broadcasting output to every connected transport.
How can I do this in Twisted? Obviously, I'd need to have the while loop run in its own thread, but how can I "subscribe" clients to listen to output? It's also important that the USB data collection only be running once, as it could seriously mess things up to have it running more than once.
In a nutshell, here's my architecture:
Server has a USB hub which is streaming data all the time. Server is constantly subscribed to this USB hub and is constantly reading data.
Clients will come and go, connecting and disconnecting at will.
We want to send data to all connected clients whenever it is available. How can I do this in Twisted?
One thing you probably want to do is try to extend the common protocol/transport independence. Even though you need a thread with a long-running loop, you can hide this from the protocol. The benefit is the same as usual: the protocol becomes easier to test, and if you ever manage to have a non-threaded implementation of reading the USB events, you can just change the transport without changing the protocol.
from threading import Thread

class USBThingy(Thread):
    def __init__(self, reactor, device, protocol):
        Thread.__init__(self)  # the base class must be initialized, or start() will complain
        self._reactor = reactor
        self._device = device
        self._protocol = protocol

    def run(self):
        while True:
            for line in self._device.streamData():
                self._reactor.callFromThread(self._protocol.usbStreamLineReceived, line)
The use of callFromThread is part of what makes this solution usable. It makes sure the usbStreamLineReceived method gets called in the reactor thread rather than in the thread that's reading from the USB device. So from the perspective of that protocol object, there's nothing special going on with respect to threading: it just has its method called once in a while when there's some data to process.
Your protocol then just needs to implement usbStreamLineReceived somehow, and implement your other application-specific logic, like keeping a list of observers:
class SomeUSBProtocol(object):
    def __init__(self):
        self.observers = []

    def usbStreamLineReceived(self, line):
        data = MyBrandSpankingNewUSBDeviceData(line)
        # broadcast the data to every registered observer
        for obs in self.observers[:]:
            obs(data)
And then observers can register themselves with an instance of this class and do whatever they want with the data:
from twisted.internet.protocol import Protocol

class USBObserverThing(Protocol):
    def connectionMade(self):
        self.factory.usbProto.observers.append(self.emit)

    def connectionLost(self, reason):
        self.factory.usbProto.observers.remove(self.emit)

    def emit(self, data):
        # parse the data, convert to JSON
        output = convertDataToJSON(data)
        self.transport.write(output)
Hook it all together:
from twisted.internet import reactor
from twisted.internet.protocol import ServerFactory

usbDevice = ...
usbProto = SomeUSBProtocol()
thingy = USBThingy(reactor, usbDevice, usbProto)
thingy.start()

factory = ServerFactory()
factory.protocol = USBObserverThing
factory.usbProto = usbProto
reactor.listenTCP(12345, factory)
reactor.run()
You can imagine a better observer register/unregister API (like one using actual methods instead of direct access to that list). You could also imagine giving the USBThingy a method for shutting down so SomeUSBProtocol could control when it stops running (so your process will actually be able to exit).
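For that last point, one possible shape for the shutdown hook, as a sketch (it assumes streamData() returns between reads often enough for the flag to be checked):

from threading import Event, Thread

class StoppableUSBThingy(Thread):
    def __init__(self, reactor, device, protocol):
        Thread.__init__(self)
        self._reactor = reactor
        self._device = device
        self._protocol = protocol
        self._stopping = Event()

    def run(self):
        while not self._stopping.is_set():
            for line in self._device.streamData():
                if self._stopping.is_set():
                    break
                self._reactor.callFromThread(
                    self._protocol.usbStreamLineReceived, line)

    def stop(self):
        # safe to call from the reactor thread; run() exits after the current read
        self._stopping.set()

Registering thingy.stop with reactor.addSystemEventTrigger('before', 'shutdown', thingy.stop) would then let the process exit cleanly when the reactor stops.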