I am running a basic logger using a SocketHandler; essentially a minor variant of this code: https://docs.python.org/2.4/lib/network-logging.html.
My question is: is the logging from the client asynchronous? If it is not, is there a way to enforce a timeout? I.e. the client should wait for the logging for at most 't' seconds and then move on. I have multiple processes logging through the same server.
It's asynchronous in the sense that the server can handle input from multiple processes interleaved with each other, but not in the sense of non-blocking I/O: the client's socket calls are blocking. Since each client connection is handled in a new thread on the server side, this doesn't matter too much as long as there aren't too many client connections.
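If you need a hard upper bound on how long a client blocks, one option (a sketch of my own on a reasonably modern Python, not from the linked docs; the class name and the 2-second value are made up) is to subclass SocketHandler and put a timeout on the socket it creates, so a slow or unreachable log server costs each emit at most that long before the record is dropped:

import logging
import logging.handlers
import socket

class TimeoutSocketHandler(logging.handlers.SocketHandler):
    """SocketHandler whose connects and sends give up after 'timeout' seconds."""
    def __init__(self, host, port, timeout=2.0):
        logging.handlers.SocketHandler.__init__(self, host, port)
        self.timeout = timeout

    def makeSocket(self):
        # create_connection applies the timeout to the connect and keeps it
        # on the socket, so later sendall() calls are bounded too
        return socket.create_connection((self.host, self.port), timeout=self.timeout)

logger = logging.getLogger('my_app')
logger.addHandler(TimeoutSocketHandler('localhost',
                                       logging.handlers.DEFAULT_TCP_LOGGING_PORT))

# A send that exceeds the timeout raises socket.timeout (an OSError), which
# SocketHandler treats like any other socket error: the record is dropped
# and the handler reconnects on a later emit.
logger.warning('this will not block for more than roughly 2 seconds')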
I'm currently working on a Benchmark project, where I'm trying to stress the server out with zmq requests.
I was wondering what the best way to approach this would be. I was thinking of having a context create a socket and pushing it into a thread, in which I would send requests and wait for responses, but I'm not sure this is possible given Python's limitations.
Moreover, would it be the same socket for all threads? That is, if I'm waiting for a response on one thread (with its own socket), would it be possible for another thread to catch that response?
Thanks.
EDIT:
Test flow logic would be like this:
Client socket would use zmq.REQ.
Client sends message.
Client waits for a response.
If no response, client reconnects and tries again until limit.
I'd like to scale this operation up to any number of clients, preferably without resorting to processes unless the performance difference is significant.
How would you do this?
Q : "...can I have one context and use several sockets?"
Oh sure you can.
Moreover, you can have several Context()-instances, each one managing ... almost ... any number of Socket()-instances, as long as each Socket()-instance's methods get called from one and only one python-thread ( a Zen-of-Zero rule: zero-sharing ).
Due to the known GIL-lock re-[SERIAL]-isation of all thread-based code-execution flow, each thread still has to (and will) wait to acquire GIL-lock ownership, which in turn permits the GIL-lock owner ( and nobody else ) to execute a limited amount of python instructions, before it re-releases the GIL-lock to some other thread...
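As a rough sketch of the flow from the EDIT above (one shared Context, a fresh REQ socket per attempt, each socket used by exactly one thread; the endpoint, timeout and retry limit are placeholder values):

import threading
import zmq

ENDPOINT = "tcp://localhost:5555"   # placeholder
TIMEOUT_MS = 1000
RETRIES = 3

ctx = zmq.Context()                 # one context shared by all threads

def client_worker(worker_id):
    for attempt in range(RETRIES):
        sock = ctx.socket(zmq.REQ)  # created and used only in this thread
        sock.setsockopt(zmq.LINGER, 0)
        sock.connect(ENDPOINT)
        sock.send_string("request from %d" % worker_id)
        if sock.poll(TIMEOUT_MS):   # wait up to TIMEOUT_MS for the reply
            print(worker_id, sock.recv_string())
            sock.close()
            return
        sock.close()                # no reply: drop the socket and retry
    print(worker_id, "gave up")

threads = [threading.Thread(target=client_worker, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

Each REQ socket here is its own connection, so a reply is only ever delivered to the socket that sent the matching request; a socket in another thread cannot catch it.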
I have a Flask-SocketIO application. Can I safely call socketio.emit() from different threads? Is socketio.emit() atomic like the normal socket.send()?
The socketio.emit() function is thread safe, or I should say that it is intended to be thread-safe, as there is currently one open issue related to this. Note that 'thread' in this context means a supported threading model. Most people use Flask-SocketIO in conjunction with eventlet or gevent in production, so in those contexts thread means "green" thread.
The open issue is related to using a message queue, which is necessary when you have multiple servers. In that set up, the accesses to the queue are not thread safe at this time. This is a bug that needs to be fixed, but as a workaround, you can create a different socketio object per thread.
On the second question, regarding whether socketio.emit() is atomic, the answer is no. This is not a simple socket write operation. The payload needs to be formatted in a certain way to comply with the Socket.IO protocol, and then, depending on the selected transport (long-polling or websocket), the write happens in a completely different way.
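For illustration, a minimal sketch of emitting from outside a request handler using the library's own background-task helper (the event name and payload are made up), which stays within whichever threading model you run under:

from flask import Flask
from flask_socketio import SocketIO

app = Flask(__name__)
socketio = SocketIO(app)
thread_started = False

def background_sender():
    while True:
        socketio.sleep(5)                          # eventlet/gevent-aware sleep
        socketio.emit('status', {'alive': True})   # broadcast to all clients

@socketio.on('connect')
def on_connect():
    global thread_started
    if not thread_started:
        thread_started = True
        # start_background_task picks the right kind of "thread"
        # (native thread, eventlet or gevent greenlet) for you
        socketio.start_background_task(background_sender)

if __name__ == '__main__':
    socketio.run(app)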
I am working on an application in which I may potentially need to log the entire traffic reaching the server. This feature may be turned on or off, or may be used when exceptions are caught.
In any case, I am concerned about the blocking nature of disk I/O operations and their impact on the performance of the server. The business logic applied when a request is handled (mostly POST HTTP requests) is asynchronous, in that every network or DB call is executed asynchronously.
On the other hand, I am concerned about the delay to the thread while it is waiting for the disk I/O operation to complete. The logged messages can be a few bytes to a few KB, but in some cases a few MB. There is no real need for the thread to pause while data is written to disk: the HTTP request can complete at that point, and there is no reason for the IOLoop thread not to work on another task while data is written to disk.
So my questions are:
Am I over-worried about this issue? Is logging to standard output and later redirecting it to a file "good enough"?
What is the common approach, or the one you found most practical, for logging in Tornado-based applications? Even for simple logging and not the (extreme) case I outlined above?
Is this basically an ideal case for queuing the logging messages and consuming them from a dedicated thread?
Say I do offload the logging to a different thread (like Homer Simpson's "Can't Someone Else Do It?"): if the thread that performs the disk logging is waiting for the disk I/O operation to complete, does the Linux kernel take that as an opportunity for a context switch?
Any comments or suggestions are much appreciated,
Erez
For "normal" logging (a few lines per request), I've always found logging directly to a file to be good enough. That may not be true if you're logging all the traffic to the server. The one time I've needed to do something like that I just captured the traffic externally with tcpdump instead of modifying my server.
If you want to capture it in the process, start by just writing to a file from the main thread. As always, measure things in your own environment before taking drastic action (IOLoop.set_blocking_log_threshold is useful for determining if your logging is a problem).
If writing from the main thread blocks for too long, you can either write to a queue that is processed by another thread, or write asynchronously to a pipe or socket to another process (syslog?).
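For reference, on Tornado 4 the blocking-log threshold mentioned above is a one-liner to enable (the 0.5-second value is just an example):

from tornado.ioloop import IOLoop

# Log a WARNING with a stack trace whenever a callback blocks
# the IO loop for more than half a second.
IOLoop.current().set_blocking_log_threshold(0.5)
IOLoop.current().start()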
" write asynchronously to a pipe or socket to another process
(syslog?"
How can it be? log_requestis a normal function - not a coroutine and all default python handlers are not driven by asyncio event loop so they are not truly asynchronous. This is imho one of the factors that make Tornado less performant than ie. aiohttp. Writing to the memory or using udp is fast but it is not async anyway.
Historic reference
I have ops experience from the time of the question, circa 2016, with a Python 3.4 Tornado 4 application running on a decent bare-metal machine. The application interacted with a few 3rd-party HTTP APIs, and logged some of the interactions for potential troubleshooting in the future (which is similar to the OP's requirements). The machine had a RAID of HDDs. As far as I can recall, the application wasn't high-traffic.
Tornado 4 had its own IO loop implementation (Tornado 5+ uses asyncio's now), and there was an interesting piece of code instrumentation, controlled by IOLoop.set_blocking_log_threshold. Basically it logged a WARNING record with the stack trace whenever the loop was blocked longer than the threshold seconds. I could still find a couple of screenshots from that time in the Sentry timeline for that very warning, with the threshold set to 1 second.
Most of the warnings had stack traces ending in the logging file handler's flush. It was a rotating and gzipping file handler. The latter may explain what could take longer than a second, but in any case it was desirable for the application to keep full responsibility over logging. The solution was the stdlib pair of logging.handlers.QueueHandler and logging.handlers.QueueListener.
Logging queue
The Python logging cookbook has a dedicated section on Dealing with handlers that block. Here's the example from it (where listener.start starts a thread that reads off the queue and delegates the records to the handler):
import logging
import queue
from logging.handlers import QueueHandler, QueueListener

que = queue.Queue(-1)  # no limit on size
queue_handler = QueueHandler(que)
handler = logging.StreamHandler()
listener = QueueListener(que, handler)
root = logging.getLogger()
root.addHandler(queue_handler)
formatter = logging.Formatter('%(threadName)s: %(message)s')
handler.setFormatter(formatter)
listener.start()
# The log output will display the thread which generated
# the event (the main thread) rather than the internal
# thread which monitors the internal queue. This is what
# you want to happen.
root.warning('Look out!')
listener.stop()
For a real-world reference of a QueueHandler implementation that covers the edge cases, chronologer.client.QueueProxyHandler can be used.
asyncio instrumentation
asyncio has a debug mode.
By default asyncio runs in production mode. In order to ease the development asyncio has a debug mode. [...] When the debug mode is enabled:
asyncio checks for coroutines that were not awaited and logs them; this mitigates the "forgotten await" pitfall.
Many non-threadsafe asyncio APIs (such as the loop.call_soon() and loop.call_at() methods) raise an exception if they are called from a wrong thread.
The execution time of the I/O selector is logged if it takes too long to perform an I/O operation.
Callbacks taking longer than 100 ms are logged. The loop.slow_callback_duration attribute can be used to set the minimum execution duration in seconds that is considered "slow".
It may look richer than what Tornado 4 had, but in fact it's not. First, it's not intended for production (and a pretty important metric is missing). Moreover, it's an after-the-fact warning without a stack trace, whereas Tornado's implementation was based on signal.SIGALRM and provided the stack trace at the moment the threshold was hit.
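For completeness, a minimal way to turn the debug mode on and tune the slow-callback threshold (the values are illustrative):

import asyncio

async def main():
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = 0.05   # warn about callbacks slower than 50 ms
    # ... application code ...

# debug=True enables the checks quoted above (un-awaited coroutines,
# wrong-thread calls, slow I/O selector, slow callbacks)
asyncio.run(main(), debug=True)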
For the curious
Have you noticed that the warnings didn't go away completely? But I can assure you that the logging issue was fixed. What caused these rare issues, to my surprise, was uuid.uuid4, which can block on a machine with an empty entropy pool, but that's another story.
Further reading
Python-tulip group discussion between asyncio maintainers about asynchronous file IO, logging and the aiofiles library
A section in the Trio documentation that explains the theory and tradeoffs behind async file I/O
When using time.sleep(1) before sendMessage, the whole process stops (even the other connections).
def handleConnected(self):
print self.address, 'connected'
for client in clients:
time.sleep(1)
client.sendMessage(self.address[0] + u' - connected')
Server: https://github.com/dpallot/simple-websocket-server
How to solve it?
The server that you are using is a synchronous, "select" type server. These servers use a single process and a single thread; they achieve concurrency through the select() function, which efficiently waits for I/O on multiple socket connections.
The advantage of select servers is that they can easily scale to a very large number of clients. The disadvantage is that when the server invokes an application handler (the handleConnected(), handleMessage() and handleClose() methods for this server), the server blocks on it: while a handler is running the server is suspended, because the handlers and the server run on the same thread. The only way for the server to be responsive in this type of architecture is to code the handlers so that they do what they need to do quickly and return control back to the server.
Your handleConnected handler function is not a good match for this type of server, because it is a long running function. This function will run for several seconds (as many seconds as there are clients), so during all that time the server is going to be blocked.
You can maybe work around the limitations of this server by creating a background thread for your long-running task, as sketched below. That way your handler can return to the server right after launching the thread. The server then regains control and goes back to work, while the background thread does that loop with the one-second sleeps inside. The only problem you have to consider is that you now have a sort of home-grown multithreaded server, so you will not be able to scale as easily.
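A rough sketch of that workaround, reusing the clients list and sendMessage from the question (SimpleWebSocketServer runs all handlers on its single select thread, so the slow loop is pushed to a daemon thread; calling sendMessage from a second thread is exactly the home-grown multithreading caveat mentioned above):

import threading
import time

from SimpleWebSocketServer import WebSocket

clients = []   # as in the question's server example

class SimpleChat(WebSocket):
    def handleConnected(self):
        clients.append(self)

        def notify_all():
            # the slow per-client loop from the question,
            # now running off the server thread
            for client in clients:
                time.sleep(1)
                client.sendMessage(self.address[0] + u' - connected')

        t = threading.Thread(target=notify_all)
        t.daemon = True
        t.start()   # handleConnected returns immediately; the server keeps running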
Another option for you to consider is to use a different server architecture. A coroutine based server will support your handler function as you coded it, for example. The two servers that I recommend in this category are eventlet and gevent. The eventlet server comes with native WebSocket support. For gevent you have to install an extension called gevent-websocket.
Good luck!
You are suspending the thread with sleep, and the server you are using appears to use select to handle requests, not threads. So no other request can be handled in the meantime.
So you can't use time.sleep.
Why do you need to sleep? Can you solve it some other way?
Maybe you can use something like threading.Timer()
from threading import Timer

def sendHello(client):
    client.sendMessage(u"hello, world")

for client in clients:
    # pass client as an argument; a bare lambda would capture the loop
    # variable and every timer would fire with the last client
    t = Timer(1.0, sendHello, args=(client,))
    t.start()  # after 1 second, "hello, world" is sent to that client
This is off the top of my head. You would also need a way to cancel each timer, so I guess you would need to save each t in a list and cancel it when done.
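A sketch of that bookkeeping (clients is the list from the question): keep every timer around and cancel whatever hasn't fired yet when you are done.

from threading import Timer

timers = []

def sendHello(client):
    client.sendMessage(u"hello, world")

for client in clients:
    t = Timer(1.0, sendHello, args=(client,))
    t.start()
    timers.append(t)

# later, e.g. on shutdown or in handleClose:
for t in timers:
    t.cancel()   # cancelling a timer that has already fired is a no-op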
I'm trying to write a scalable custom web server.
Here's what I have so far:
The main loop and request interpreter are in Cython. The main loop accepts connections and assigns the sockets to one of the processes in the pool (has to be processes, threads won't get any benefit from multi-core hardware because of the GIL).
Each process has a thread pool. The process assigns the socket to a thread.
The thread calls recv (blocking) on the socket and waits for data. When some shows up, it gets piped into the request interpreter, and then sent via WSGI to the application running in that thread.
Now I've heard about epoll and am a little confused. Is there any benefit to using epoll to get socket data and then pass that directly to the processes? Or should I just go the usual route of having each thread wait on recv?
PS: What is epoll actually used for? It seems like multithreading and blocking fd calls would accomplish the same thing.
If you're already using multiple threads, epoll doesn't offer you much additional benefit.
The point of epoll is that a single thread can listen for activity on many file descriptors simultaneously (and respond to events on each as they occur), and thus provide event-driven multitasking without spawning additional threads. Threads are relatively cheap (compared to spawning processes), but each one does require some overhead (after all, each has to maintain a call stack).
If you wanted to, you could rewrite your pool processes to be single-threaded using epoll, which would reduce your overall thread usage count, but of course you'd have to consider whether that's something you care about or not - in general, for low numbers of simultaneous requests on each worker, the overhead of spawning threads wouldn't matter, but if you want each worker to be able to handle 1000s of open connections, that overhead can become significant (and that's where epoll shines).
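To make that concrete, here is a minimal single-threaded event loop using Python's selectors module (DefaultSelector uses epoll on Linux); it is only an echo sketch, but it shows one thread servicing many sockets without blocking on any of them:

import selectors
import socket

sel = selectors.DefaultSelector()        # epoll on Linux, kqueue on BSD/macOS

def accept(server_sock):
    conn, _ = server_sock.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, handle)

def handle(conn):
    data = conn.recv(4096)
    if data:
        conn.sendall(data)               # echo; a real server would parse the request here
    else:
        sel.unregister(conn)
        conn.close()

server = socket.socket()
server.bind(('0.0.0.0', 8000))
server.listen(128)
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)

while True:
    for key, _ in sel.select():          # one thread, as many sockets as you register
        key.data(key.fileobj)            # dispatch to the callback stored at register()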
But...
What you're describing sounds suspiciously like you're basically reinventing the wheel - your:
main loop and request interpreter
pool of processes
sounds almost exactly like:
nginx (or any other load balancer/reverse proxy)
A pre-forking tornado app
Tornado is a single-threaded web server python module using epoll, and it has the capability built-in for pre-forking (meaning that it spawns multiple copies of itself as separate processes, effectively creating a process pool). Tornado is based on the tech created to power Friendfeed - they needed a way to handle huge numbers of open connections for long-polling clients looking for new real-time updates.
If you're doing this as a learning process, then by all means, reinvent away! It's a great way to learn. But if you're actually trying to build an application on top of these kinds of things, I'd highly recommend considering using the existing, stable, communally-developed projects - it'll save you a lot of time, false starts, and potential gotchas.
(P.S. I approve of your avatar. <3)
The epoll function (and the other functions in the same family, poll and select) allows you to write single-threaded networking code that manages multiple network connections. Since there is no threading, there is no need for synchronisation, as would be required in a multi-threaded program (which can be difficult to get right).
On the other hand, you'll need to have an explicit state machine for each connection. In a threaded program, this state machine is implicit.
These functions just offer another way to multiplex multiple connections in a process. Sometimes it is easier not to use threads; other times you're already using threads, and it is then easier to just use blocking sockets (which release the GIL in Python).
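To illustrate the "explicit state machine per connection" point above, a sketch (assuming non-blocking sockets driven by select/epoll, a 4-byte length-prefix framing, and a hypothetical handle_message callback):

# One entry per connection: which phase we are in and what we have read so far.
conn_state = {}   # sock -> {"phase": "header" or "body", "need": int, "buf": bytes}

def on_readable(sock):
    state = conn_state.setdefault(sock, {"phase": "header", "need": 4, "buf": b""})
    chunk = sock.recv(state["need"] - len(state["buf"]))
    if not chunk:                        # peer closed the connection
        del conn_state[sock]
        sock.close()
        return
    state["buf"] += chunk
    if len(state["buf"]) < state["need"]:
        return                           # not complete yet; the event loop calls us again
    if state["phase"] == "header":
        length = int.from_bytes(state["buf"], "big")
        conn_state[sock] = {"phase": "body", "need": length, "buf": b""}
    else:
        handle_message(sock, state["buf"])    # hypothetical application callback
        conn_state[sock] = {"phase": "header", "need": 4, "buf": b""}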