Python - Handling concurrent POST requests with HTTPServer and saving to files

I have the code below, which I would like to use to receive POST requests containing data that I will then use to print labels on a label printer. To print the labels, I need to write a file with print commands and then run an lp command via the command line to copy the file to the label printer.
The problem is that multiple people could be printing labels at the same time. So my question is: do I have to change the code below to use ThreadingMixIn in order to handle concurrent POST requests, or can I leave the code as is, with only a slight delay for subsequent requests in a concurrent scenario (that is, any further requests will be queued and not lost)?
If I have to go the threaded way, how does that affect writing the file and the subsequent command-line call to lp when there are now multiple threads trying to write to the same file?
Note that there are multiple label printers that are being accessed through print queues (CUPS).
import json
from http.server import HTTPServer, BaseHTTPRequestHandler
from io import BytesIO


class SimpleHTTPRequestHandler(BaseHTTPRequestHandler):

    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b'Hello, world!')

    def do_POST(self):
        content_length = int(self.headers['Content-Length'])
        body = self.rfile.read(content_length)
        try:
            # json.loads() accepts bytes directly; its encoding argument was removed in Python 3.9
            result = json.loads(body)
            self.send_response(200)
            self.end_headers()
            response = BytesIO()
            response.write(b'This is POST request. ')
            response.write(b'Received: ')
            response.write(body)
            self.wfile.write(response.getvalue())
        except Exception as exc:
            # Python 3 exceptions have no .message attribute, and wfile.write() needs bytes
            self.send_response(400)
            self.end_headers()
            self.wfile.write('Request has failed to process. Error: {}'.format(exc).encode('utf-8'))


httpd = HTTPServer(('localhost', 8000), SimpleHTTPRequestHandler)
httpd.serve_forever()

Why not use a unique file name?
That way you can be sure there will be no name clashes.
Have a look at https://docs.python.org/2/library/tempfile.html and consider the NamedTemporaryFile function. You should pass delete=False, otherwise the file is deleted immediately after close().
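A minimal sketch of this idea combined with the lp call from the question. It assumes the label commands are already rendered to bytes and that you pass the name of one of your CUPS queues; print_label, label_printer_1 and the dry_run flag are illustrative names of mine, and dry_run only exists so the function can be tried without a printer.

```python
import os
import subprocess
import tempfile


def print_label(commands: bytes, printer: str, dry_run: bool = False) -> list:
    """Write label commands to a uniquely named file and submit it with lp.

    `printer` is the CUPS queue name (hypothetical here); `dry_run` skips the
    actual lp call so the sketch can be exercised without a printer.
    """
    # delete=False keeps the file on disk after close(), so lp can still read it
    with tempfile.NamedTemporaryFile(prefix='label_', suffix='.txt', delete=False) as f:
        f.write(commands)
        path = f.name
    cmd = ['lp', '-d', printer, path]
    try:
        if not dry_run:
            # check=True raises CalledProcessError if lp exits with an error
            subprocess.run(cmd, check=True)
    finally:
        # CUPS has received the job once lp returns, so the file can be removed
        os.remove(path)
    return cmd
```

Because every call gets its own file name, several threads can call print_label at the same time without overwriting each other's print commands.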

According to your question, what I understand is that you have only one label printer and multiple producers who try to print labels on it.
So even if you switch to a multithreaded option, you will have to synchronize the threads in order to avoid deadlocks and infinite waiting.
So my best suggestion is to go with a built-in Python data structure: the queue module.
According to the docs:
The queue module implements multi-producer, multi-consumer queues. It is especially useful in threaded programming when information must be exchanged safely between multiple threads. The Queue class in this module implements all the required locking semantics. It depends on the availability of thread support in Python.
Even though it's a multi-producer, multi-consumer queue, it should still work nicely for you with a single consumer.
So here is what you need to do:
1. Your server receives a request to print a label.
2. It does the necessary processing/cleanup and puts the job in the queue.
3. A worker thread pops items from the queue and executes each task.
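The three steps could look roughly like this. It is a minimal sketch: print_jobs, worker, start_worker and the send_to_printer callable are illustrative names of mine, and using None as a stop sentinel is just one common convention.

```python
import queue
import threading

# One queue shared by all request handlers; a single worker thread consumes it.
print_jobs = queue.Queue()


def worker(send_to_printer):
    """Pop jobs one at a time so only this thread ever talks to the printer."""
    while True:
        job = print_jobs.get()          # blocks until a job is available
        if job is None:                 # sentinel chosen here to stop the worker
            print_jobs.task_done()
            break
        try:
            send_to_printer(job)
        except Exception as exc:
            # keep the worker alive if a single job fails
            print('print job failed:', exc)
        finally:
            print_jobs.task_done()      # lets print_jobs.join() track completion


def start_worker(send_to_printer):
    t = threading.Thread(target=worker, args=(send_to_printer,), daemon=True)
    t.start()
    return t
```

In do_POST you would call print_jobs.put(result) and respond right away. Since only the worker thread calls send_to_printer, jobs reach the printer one at a time even if the server handles requests in several threads (for example with http.server.ThreadingHTTPServer, available since Python 3.7).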
Or, if you expect the system to become large, here are some tools; the steps would be the same as above:
RabbitMQ - A scalable message broker (simply put, a queue)
Celery - A Python package that pops items from a message broker such as RabbitMQ and executes them

Related

Broken pipe error with Python multiprocessing and socketserver

Essentially, I'm using the socketserver Python library to try to handle communications from a central server to multiple Raspberry Pi 4 and ESP32 peripherals. Currently I have the socketserver running serve_forever; the request handler then calls a method of a process manager class, which starts a process that should handle the actual communication with the client.
It works fine if I use .join() on the process so that the process manager method doesn't exit, but that's not how I would like it to run. Without .join(), I get a broken pipe error as soon as the client communication process tries to send a message back to the client.
This is the process manager class; it gets defined in the main file, and buildprocess is called through the request handler of the socketserver class:
import multiprocessing as mp
mp.allow_connection_pickling()
import queuemanager as qm
import hostmain as hmain
import camproc
import keyproc
import controlproc


# method that gets called into a process so that class and socket share memory
def callprocess(periclass, peritype, clientsocket, inqueue, genqueue):
    periclass.startup(clientsocket)


class ProcessManager(qm.QueueManager):

    def wipeproc(self, target):
        # TODO make wipeproc integrate with the queue manager rather than directly to the class
        for macid in list(self.procdict.keys()):
            if target == macid:
                # calls proc kill for the class
                try:
                    self.procdict[macid]["class"].prockill()
                except Exception as e:
                    print("exception:", e, "in wipeproc")
                # waits for process to exit naturally (class threads to close)
                self.procdict[macid]["process"].join()
                # remove dict entry for this macid
                self.procdict.pop(macid)

    # called externally to create the new process and append to procdict
    def buildprocess(self, peritype, macid, clientsocket):
        # TODO put some logic here to handle the differences of the controller process
        # generates queue object
        inqueue = mp.Queue()
        # creates periclass instance based on type
        # (note: periclass is only assigned for hmain.cam so far)
        if peritype == hmain.cam:
            periclass = camproc.CamMain(self, inqueue, self.genqueue)
        elif peritype == hmain.keypad:
            print("to be added to")
        elif peritype == hmain.motion:
            print("to be added to")
        elif peritype == hmain.controller:
            print("to be added to")
        # init and start call for the new process
        self.procdict[macid] = {"type": peritype, "inqueue": inqueue, "class": periclass, "process": None}
        self.procdict[macid]["process"] = mp.Process(target=callprocess,
            args=(self.procdict[macid]["class"], self.procdict[macid]["type"], clientsocket, self.procdict[macid]["inqueue"], self.genqueue))
        self.procdict[macid]["process"].start()
        # updating the process dictionary before class obj gets appended
        # if macid in list(self.procdict.keys()):
        #     self.wipeproc(macid)
        print(self.procdict)
        print("client added")
To my eye, all the pertinent objects should be stored in the procdict dictionary, but as I mentioned, it just gets a broken pipe error unless I join the process with self.procdict[macid]["process"].join() before the end of the buildprocess method.
I would like it to exit the method but leave the communication process running as is. I've tried a few different things, restructuring what gets defined inside and outside the process, but to no avail. So far I haven't been able to find any pertinent solutions online, but of course I may have missed something.
Thank you for reading this far if you did! I've been stuck on this for a couple of days, so any help would be appreciated; this is my first project with multiprocessing and sockets on any sort of scale.
#################
Edit to include pastebin with all the code:
https://pastebin.com/u/kadytoast/1/PPWfyCFT
Without .join() i get a broken pipe error as soon as the client communication process tries to send a message back to the client.
That's because when the request handler's handle() returns, socketserver shuts down the connection. Because socketserver simplifies the task of writing network servers, it does certain things automatically that are usually done in the course of handling a network request. Your code is not quite making the intended use of socketserver. In particular, the Asynchronous Mixins are intended for handling requests asynchronously. With ForkingMixIn the server spawns a new process for each request, whereas your current code does this itself with mp.Process. So I think you have basically two options:
1. Code less of the request handling yourself and use the mechanisms socketserver provides.
2. Stay with your own handling and don't use socketserver at all, so it won't get in the way.
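Option 1 could look roughly like the sketch below: the whole conversation with one peripheral happens inside handle(), so socketserver only closes the connection once that conversation is over. The echo logic is a placeholder for your real protocol, and PeripheralHandler, make_server and port 9999 are illustrative names of mine. ForkingTCPServer exists only on platforms with os.fork() (not Windows); passing socketserver.ThreadingTCPServer instead gives one thread per connection.

```python
import socketserver


class PeripheralHandler(socketserver.BaseRequestHandler):
    """Talks to one client for as long as handle() keeps running."""

    def handle(self):
        while True:
            data = self.request.recv(1024)
            if not data:                    # client closed its end
                break
            # Placeholder for the real peripheral protocol: echo back upper-cased
            self.request.sendall(data.upper())


def make_server(host='0.0.0.0', port=9999, server_class=None):
    if server_class is None:
        # One child process per connection; the parent keeps accepting clients
        server_class = socketserver.ForkingTCPServer
    return server_class((host, port), PeripheralHandler)
```

Calling make_server().serve_forever() in the main process replaces the manual mp.Process bookkeeping, because the mixin already creates and reaps the per-client processes.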

How do you understand the IOLoop in Tornado?

I am looking for a way to understand the IOLoop in Tornado. I have read the official docs several times but still can't understand it; specifically, why does it exist?
from tornado.concurrent import Future
from tornado.httpclient import AsyncHTTPClient
from tornado.ioloop import IOLoop


def async_fetch_future():
    http_client = AsyncHTTPClient()
    future = Future()
    fetch_future = http_client.fetch(
        "http://mock.kite.com/text")
    fetch_future.add_done_callback(
        lambda f: future.set_result(f.result()))
    return future


response = IOLoop.current().run_sync(async_fetch_future)
# why get current IO of this thread? display IO, hard drive IO, or network IO?
print(response.body)
I know what IO is: input and output, e.g. reading a hard drive, displaying graphics on the screen, getting keyboard input.
By definition, IOLoop.current() returns the current IOLoop of this thread.
There are many IO devices on the laptop running this Python code. Which IO does IOLoop.current() return? I never heard of an IO loop in JavaScript/Node.js.
Furthermore, why should I care about this low-level thing if I just want to do a database query or read a file?
I never heard of IO loop in javascript nodejs.
In node.js, the equivalent concept is the event loop. The node event loop is mostly invisible because all programs use it - it's what's running in between your callbacks.
In Python, most programs don't use an event loop, so when you want one, you have to run it yourself. This can be a Tornado IOLoop, a Twisted Reactor, or an asyncio event loop (all of these are specific types of event loops).
Tornado's IOLoop is perhaps confusingly named - it doesn't do any IO directly. Instead, it coordinates all the different IO (mainly network IO) that may be happening in the program. It may help you to think of it as an "event loop" or "callback runner".
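Since Tornado 5, on Python 3 the IOLoop is a wrapper around an asyncio event loop, so the "callback runner" idea can be seen with plain asyncio. This is a minimal sketch, not Tornado code: nothing in it touches a device; it only asks the loop to run callbacks now or after a delay, the same service a network library relies on to be told when a socket has data.

```python
import asyncio

events = []


async def main():
    loop = asyncio.get_running_loop()
    done = loop.create_future()
    # Register callbacks with the loop; the loop decides when they run.
    loop.call_later(0.02, lambda: events.append('timer fired'))
    loop.call_soon(lambda: events.append('callback ran'))
    loop.call_later(0.05, done.set_result, 'finished')
    events.append('main is waiting')
    # Awaiting hands control back to the loop until the future gets a result
    events.append(await done)


asyncio.run(main())
print(events)  # ['main is waiting', 'callback ran', 'timer fired', 'finished']
```

The callbacks only run while main is suspended at await: that "running things in between" is all an event loop does.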
Rather than IOLoop, maybe calling it EventLoop makes it clearer to understand.
IOLoop.current() doesn't really return an IO device but just a pure Python event loop, which is basically the same as asyncio.get_event_loop() or the underlying event loop in Node.js.
The reason you need an event loop just to do a database query is that you are using an event-driven structure to do the query (in your example, you are making an HTTP request).
Most of the time you don't need to care about this low-level structure. Instead, you just use the async and await keywords.
Let's say there is a library that supports asynchronous database access:
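async def get_user(user_id):
    # async_cursor stands for a cursor from some async database library (hypothetical);
    # passing user_id as a parameter instead of %-formatting it into the SQL avoids injection
    user = await async_cursor.execute("select * from user where user_id = %s", (user_id,))
    return user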
Then you just need to use this function in your handler:
import tornado.web


class YourHandler(tornado.web.RequestHandler):
    async def get(self):
        user = await get_user(self.get_cookie("user_id"))
        if user is None:
            return self.finish("No such user")
        return self.finish("You are %s" % user.user_name)

Python Requests: Don't wait for request to finish

In Bash, it is possible to execute a command in the background by appending &. How can I do it in Python?
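import requests

url = 'http://127.0.0.1:8000/test/'  # example URL, not part of the original question

while True:
    data = input('Enter something: ')  # raw_input() in Python 2
    requests.post(url, data=data)  # Don't wait for it to finish.
    print('Sending POST request...')  # This should appear immediately.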
Here's a hacky way to do it:
import requests

try:
    # The tiny timeout makes requests give up instead of waiting for the response
    requests.get("http://127.0.0.1:8000/test/", timeout=0.0000000001)
except requests.exceptions.ReadTimeout:
    pass
Edit: for those of you who observed that this will not await a response: that is my understanding of the question ("fire and forget... do not wait for it to finish"). There are much more thorough and complete ways to do it with threads or async if you need response context, error handling, etc.
I use multiprocessing.dummy.Pool. I create a singleton thread pool at the module level, and then use pool.apply_async(requests.get, [params]) to launch the task.
This call gives me an AsyncResult (a future-like object), which I can add to a list with other results indefinitely until I'd like to collect all or some of them.
multiprocessing.dummy.Pool is, against all logic and reason, a THREAD pool and not a process pool.
Example (works in both Python 2 and 3, as long as requests is installed):
from multiprocessing.dummy import Pool

import requests

pool = Pool(10)  # Creates a pool with ten threads; more threads = more concurrency.
                 # "pool" is a module attribute; you can be sure there will only
                 # be one of them in your application
                 # as modules are cached after initialization.

if __name__ == '__main__':
    futures = []
    for x in range(10):
        futures.append(pool.apply_async(requests.get, ['http://example.com/']))
    # futures is now a list of 10 futures.
    for future in futures:
        print(future.get())  # For each future, wait until the request is
                             # finished and then print the response object.
The requests will be executed concurrently, so running all ten of these requests should take no longer than the longest one. This strategy will only use one CPU core, but that shouldn't be an issue because almost all of the time will be spent waiting for I/O.
Elegant solution from Andrew Gorcester. In addition, without using futures, it is possible to use the callback and error_callback arguments (see the doc) to perform asynchronous processing:
from multiprocessing.dummy import Pool
import requests
from requests import Response

pool = Pool(10)

def on_success(r: Response):
    if r.status_code == 200:
        print(f'Post succeed: {r}')
    else:
        print(f'Post failed: {r}')

def on_error(ex: Exception):
    print(f'Post requests failed: {ex}')

pool.apply_async(requests.post, args=['http://server.host'], kwargs={'json': {'key': 'value'}},
                 callback=on_success, error_callback=on_error)
According to the docs, you should move to another library:
Blocking Or Non-Blocking?
With the default Transport Adapter in place, Requests does not provide
any kind of non-blocking IO. The Response.content property will block
until the entire response has been downloaded. If you require more
granularity, the streaming features of the library (see Streaming
Requests) allow you to retrieve smaller quantities of the response at
a time. However, these calls will still block.
If you are concerned about the use of blocking IO, there are lots of
projects out there that combine Requests with one of Python’s
asynchronicity frameworks.
Two excellent examples are grequests and requests-futures.
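requests-futures is built on the standard library's concurrent.futures thread pool. If you would rather not add a dependency, a similar fire-and-forget helper can be sketched directly with ThreadPoolExecutor; fire_and_forget and report are my own illustrative names, and max_workers=10 is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor

# One shared pool for the whole module; max_workers=10 is arbitrary.
executor = ThreadPoolExecutor(max_workers=10)


def fire_and_forget(func, *args, **kwargs):
    """Submit func(*args, **kwargs) to the pool and return without waiting.

    func would typically be requests.post, but any callable works.
    """
    future = executor.submit(func, *args, **kwargs)

    def report(fut):
        # Runs once func has finished; surfaces errors that would otherwise be silent
        exc = fut.exception()
        if exc is not None:
            print(f'background request failed: {exc!r}')

    future.add_done_callback(report)
    return future
```

A call such as fire_and_forget(requests.post, url, data=data) returns immediately; the returned Future can be ignored or kept to check the result later.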
Simplest and Most Pythonic Solution using threading
A simple way to send a POST/GET request, or to execute any other function, without waiting for it to finish is to use Python's built-in threading module.
import threading

import requests


def send_req():
    requests.get("http://127.0.0.1:8000/test/")


for x in range(100):
    threading.Thread(target=send_req).start()  # starts a new thread and continues
Other Important Features of threading
You can turn these threads into daemons with thread_obj.daemon = True (set it before calling start())
You can wait for a thread to finish before continuing by calling thread_obj.join()
You can check whether a thread is alive with thread_obj.is_alive(), which returns True or False
You can check the number of active threads with threading.active_count()
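The four features above, put together in one small sketch. send_req here only sleeps, standing in for requests.get so the example runs without a server.

```python
import threading
import time


def send_req():
    # Stand-in for requests.get(...): sleeps instead of doing network I/O
    time.sleep(0.2)


t = threading.Thread(target=send_req)
t.daemon = True                    # must be set before start(); won't keep the program alive
t.start()
alive_before = t.is_alive()        # True while send_req is still sleeping
count = threading.active_count()   # main thread plus any running workers
t.join()                           # block until send_req returns
print(alive_before, count, t.is_alive())
```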
Official Documentation
If you can put the code to be executed in a separate Python program, here is a possible solution based on the subprocess module.
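The linked solution isn't reproduced here, but the general idea can be sketched with subprocess.Popen, which starts the child process and returns without waiting for it. launch_in_background is an illustrative name, and the script name in the usage note is hypothetical.

```python
import subprocess
import sys


def launch_in_background(script_args):
    """Start a separate Python program and return immediately.

    script_args is e.g. ['send_request.py', url] (hypothetical script);
    the parent does not wait for the child to finish.
    """
    return subprocess.Popen([sys.executable, *script_args],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
```

For example, launch_in_background(['send_request.py', 'http://127.0.0.1:8000/test/']). On POSIX, call poll() or wait() on the returned Popen object at some point so finished children don't linger as zombie processes.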
Otherwise, you may find this question and its answer useful: the trick is to use the threading library to start a separate thread that executes the separate task.
A caveat with both approaches could be the number of items (that is, the number of threads or processes) you have to manage. If there are too many items in the parent, you may consider pausing after every batch of items until at least some threads have finished, but I think this kind of management is non-trivial.
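Rather than pausing after whole batches, a simpler variant of the same idea (my substitution, not what the answer describes) is to cap how many threads are in flight with a threading.BoundedSemaphore. MAX_IN_FLIGHT = 5 and the sleep standing in for the real task are arbitrary placeholders.

```python
import threading
import time

MAX_IN_FLIGHT = 5                          # arbitrary cap on concurrently running threads
slots = threading.BoundedSemaphore(MAX_IN_FLIGHT)


def worker(item, results):
    try:
        time.sleep(0.01)                   # stand-in for the real task, e.g. a request
        results.append(item)
    finally:
        slots.release()                    # free a slot for the next item


def process_all(items):
    results, threads = [], []
    for item in items:
        slots.acquire()                    # blocks while MAX_IN_FLIGHT threads are running
        t = threading.Thread(target=worker, args=(item, results))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return results
```

Each new thread starts as soon as any earlier one finishes, instead of waiting for an entire batch to drain.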
For a more sophisticated approach you can use an actor-based design; I have not used this library myself, but I think it could help in that case.
from multiprocessing.dummy import Pool

import requests

pool = Pool()


def on_success(r):
    print('Post succeed')


def on_error(ex):
    print('Post requests failed')


def call_api(url, data, headers):
    requests.post(url=url, data=data, headers=headers)


def pool_processing_create(url, data, headers):
    pool.apply_async(call_api, args=[url, data, headers],
                     callback=on_success, error_callback=on_error)

Python: write to single file from multiple processes (ZMQ)

I want to write to a single file from multiple processes. To be precise, I would rather not use the multiprocessing Queue solution, as there are several submodules written by other developers. However, each write to the file by such submodules is associated with a write to a zmq queue. Is there a way I can redirect the zmq messages to a file? Specifically, I am looking for something along the lines of http://www.huyng.com/posts/python-logging-from-multiple-processes/ without using the logging module.
It's fairly straightforward. In one process, bind a PULL socket and open a file.
Every time the PULL socket receives a message, it writes directly to the file.
import zmq

# End-of-file marker; zmq messages are bytes in Python 3, so compare against bytes
EOF = b'\x04'


def file_sink(filename, url):
    """forward messages on zmq to a file"""
    socket = zmq.Context.instance().socket(zmq.PULL)
    socket.bind(url)
    written = 0
    with open(filename, 'wb') as f:
        while True:
            chunk = socket.recv()
            if chunk == EOF:
                break
            f.write(chunk)
            written += len(chunk)
    socket.close()
    return written
In the remote processes, create a proxy object whose write method just sends a message over zmq:
class FileProxy(object):
    """Proxy to a remote file over zmq"""

    def __init__(self, url):
        self.socket = zmq.Context.instance().socket(zmq.PUSH)
        self.socket.connect(url)

    def write(self, chunk):
        """write a chunk of bytes to the remote file"""
        self.socket.send(chunk)
And, just for fun, if you call write(EOF) on a FileProxy, the sink process will close the file and exit.
If you want to write multiple files, you can do this fairly easily, either by starting multiple sinks and having one URL per file, or by making the sink slightly more sophisticated and using multipart messages to indicate which file is to be written.

Producing content indefinitely in a separate thread for all connections?

I have a Twisted project which seeks to essentially rebroadcast collected data over TCP in JSON. I essentially have a USB library which I need to subscribe to and synchronously read in a while loop indefinitely like so:
while True:
    for line in usbDevice.streamData():
        data = MyBrandSpankingNewUSBDeviceData(line)
        # parse the data, convert to JSON
        output = convertDataToJSON(data)
        # broadcast the data
        ...
The problem, of course, is the .... Essentially, I need to start this process as soon as the server starts and end it when the server ends (Protocol.doStart and Protocol.doStop), and have it constantly running and broadcasting its output to every connected transport.
How can I do this in Twisted? Obviously, I'd need to have the while loop run in its own thread, but how can I "subscribe" clients to listen to output? It's also important that the USB data collection only be running once, as it could seriously mess things up to have it running more than once.
In a nutshell, here's my architecture:
Server has a USB hub which is streaming data all the time. Server is constantly subscribed to this USB hub and is constantly reading data.
Clients will come and go, connecting and disconnecting at will.
We want to send data to all connected clients whenever it is available. How can I do this in Twisted?
One thing you probably want to do is try to extend the common protocol/transport independence. Even though you need a thread with a long-running loop, you can hide this from the protocol. The benefit is the same as usual: the protocol becomes easier to test, and if you ever manage to have a non-threaded implementation of reading the USB events, you can just change the transport without changing the protocol.
from threading import Thread


class USBThingy(Thread):
    def __init__(self, reactor, device, protocol):
        # Thread.__init__ must run, otherwise start() raises RuntimeError
        Thread.__init__(self)
        self._reactor = reactor
        self._device = device
        self._protocol = protocol

    def run(self):
        while True:
            for line in self._device.streamData():
                self._reactor.callFromThread(self._protocol.usbStreamLineReceived, line)
The use of callFromThread is part of what makes this solution usable. It makes sure the usbStreamLineReceived method gets called in the reactor thread rather than in the thread that's reading from the USB device. So from the perspective of that protocol object, there's nothing special going on with respect to threading: it just has its method called once in a while when there's some data to process.
Your protocol then just needs to implement usbStreamLineReceived somehow, and implement your other application-specific logic, like keeping a list of observers:
class SomeUSBProtocol(object):
    def __init__(self):
        self.observers = []

    def usbStreamLineReceived(self, line):
        data = MyBrandSpankingNewUSBDeviceData(line)
        # broadcast the data (iterate over a copy in case observers unregister meanwhile)
        for obs in self.observers[:]:
            obs(data)
And then observers can register themselves with an instance of this class and do whatever they want with the data:
from twisted.internet.protocol import Protocol


class USBObserverThing(Protocol):
    def connectionMade(self):
        self.factory.usbProto.observers.append(self.emit)

    def connectionLost(self, reason):
        self.factory.usbProto.observers.remove(self.emit)

    def emit(self, data):
        # parse the data, convert to JSON
        output = convertDataToJSON(data)
        # transport.write() expects bytes, so convertDataToJSON should return bytes
        self.transport.write(output)
Hook it all together:
from twisted.internet import reactor
from twisted.internet.protocol import ServerFactory

usbDevice = ...
usbProto = SomeUSBProtocol()
thingy = USBThingy(reactor, usbDevice, usbProto)
thingy.start()
factory = ServerFactory()
factory.protocol = USBObserverThing
factory.usbProto = usbProto
reactor.listenTCP(12345, factory)
reactor.run()
You can imagine a better observer register/unregister API (like one using actual methods instead of direct access to that list). You could also imagine giving the USBThingy a method for shutting down so SomeUSBProtocol could control when it stops running (so your process will actually be able to exit).
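One way the shutdown method could look, as a minimal sketch using threading.Event. StoppableReader and deliver are illustrative names; deliver stands in for something like functools.partial(reactor.callFromThread, protocol.usbStreamLineReceived), and device.streamData() is the question's API. If streamData() blocks waiting for USB data, the stop only takes effect when the next line arrives.

```python
import threading


class StoppableReader(threading.Thread):
    """Hypothetical variant of USBThingy that can be asked to stop."""

    def __init__(self, device, deliver):
        super().__init__(daemon=True)
        self._device = device
        self._deliver = deliver
        # Not named _stop: threading.Thread already uses that attribute internally
        self._stop_requested = threading.Event()

    def run(self):
        while not self._stop_requested.is_set():
            for line in self._device.streamData():
                if self._stop_requested.is_set():
                    break
                self._deliver(line)

    def stop(self, timeout=None):
        """Ask the loop to finish, then wait for the thread to exit."""
        self._stop_requested.set()
        self.join(timeout)
```

SomeUSBProtocol (or a reactor shutdown trigger) could then call stop() so the process can exit cleanly.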
