ZeroMQ poll thread safety - python

I have a thread that is polling on a ZMQ Poller:
poller.poll(timeout)
This thread is also the one which receives and sends back messages over the sockets registered in the poller.
Then I have another thread that may, eventually, create a new socket and register it for polling on input events:
socket = context.socket(...)
socket.bind/connect(...)
poller.register(socket, zmq.POLLIN)
Once the socket is registered, the latter thread will not touch it again.
Is this safe?
Update
The answers/comments I got were about why I should not be doing this, or about The Guide's recommendations (which I already knew). But that does not really answer my question.
To be more specific, I would say that I am working with pyzmq Python bindings for ZeroMQ.
Now, although ZeroMQ sockets are not thread safe, it is indeed possible to transfer them from one thread to another as long as there is a full memory barrier during the transfer.
So the first question would be: do I need to set an explicit memory barrier there? Note that there is one thread that creates, binds/connects and registers the socket, but it will not use that socket again. Is there an actual conflict? Could there be a moment in which I should be explicitly preventing access to the socket from both threads?
Then the second question would be: is registering a socket in a poller thread-safe? Most of the time the thread that performs the polling is busy doing other stuff, but it could happen that it is polling, waiting for a timeout. In that case, do I need to use a lock to prevent concurrent access to the poller? Or is it safe to register the new socket in the poller while the other thread is polling on it?
Update II
I am using Pyro4 to handle and configure remote processes (i.e.: their ZeroMQ connections and their behavior). The initial configuration can be done with the Pyro Proxy very easily. However, when I start the process, I am in fact running the main loop in a dedicated thread (a Pyro oneway call) that keeps running; if I then access the object through the Pyro Proxy again, that access comes from another thread.
So the idea is to avoid modifying the remote object's class but still allow the use of Pyro for configuring the remote objects even when they are running. As long as the creation + binding/connecting + registering of new sockets is safe from another thread, I am good.

Once the socket is registered, the latter thread will not touch it again.
Is this safe?
No.
Industries that not only require safe solutions, but also place the responsibility of actually proving stable and warranted system behaviour on the vendor side (be it due to wise grandfathers, a deep belief in QA/TQM, or due to regulations imposed on MIL/GOV/aerospace/healthcare/pharma/automotive et al. vendor management) would simply reject this outright.
Why?
" ... will not touch it again." is just a promise.
Safety cross-validated system design does not settle for anything less than a proof of collision avoidance.
Let me cite from a lovely book by Pieter HINTJENS, "Code Connected, Vol. 1" - a must-read piece on ZeroMQ:
Some widely used models, despite being the basis for entire industries, are fundamentally broken, and shared state concurrency is one of them. Code that wants to scale without limit does it like the Internet does, by sending messages and sharing nothing except a common contempt for broken programming models.
You should follow some rules to write happy multithreaded code with ØMQ:
• Isolate data privately within its thread and never share data in multiple threads. The only exception to this are ØMQ contexts, which are threadsafe.
• Stay away from the classic concurrency mechanisms such as mutexes, critical sections, semaphores, etc. These are an anti-pattern in ØMQ applications.
• Create one ØMQ context at the start of your process, and pass that to all threads that you want to connect via inproc sockets.
• Use attached threads to create structure within your application, and connect these to their parent threads using PAIR sockets over inproc. The pattern is: bind parent socket, then create child thread which connects its socket.
• Use detached threads to simulate independent tasks, with their own contexts. Connect these over tcp. Later you can move these to stand-alone processes without changing the code significantly.
• All interaction between threads happens as ØMQ messages, which you can define more or less formally.
• Don’t share ØMQ sockets between threads. ØMQ sockets are not threadsafe. Technically it’s possible to migrate a socket from one thread to another but it demands skill. The only place where it’s remotely sane to share sockets between threads are in language bindings that need to do magic like garbage collection on sockets.
If you need to start more than one proxy in an application, for example, you will want to run each in their own thread. It is easy to make the error of creating the proxy frontend and backend sockets in one thread, and then passing the sockets to the proxy in another thread. This may appear to work at first but will fail randomly in real use. Remember: Do not use or close sockets except in the thread that created them.
If you follow these rules, you can quite easily build elegant multithreaded applications, and later split off threads into separate processes as you need to. Application logic can sit in threads, processes, or nodes: whatever your scale needs.
ØMQ uses native OS threads rather than virtual “green” threads. The advantage is that you don’t need to learn any new threading API, and that ØMQ threads map cleanly to your operating system. You can use standard tools like Intel’s ThreadChecker to see what your application is doing. The disadvantages are that native threading APIs are not always portable, and that if you have a huge number of threads (in the thousands), some operating systems will get stressed.
If you’re sharing sockets across threads, don’t. It will lead to random weirdness, and crashes.
We could assume "light" conditions: system not stressed, high-watermark never reached, no big congestions. There is just a single thread running the application (polling and executing tasks on input). So most of the time (99.99%) there is no concurrency. Now, concurrency only occurs when a second thread appears just to add a socket to the pool. There will never be more than 2 threads being executed. And the second thread will be always restricted to adding new sockets to the pool (once added the socket is transferred to the main thread). Is this enough for boundary conditions? – Peque
Given the schematic use-case details added in Update II, a professional solution should not lose time and should avoid any hidden risks by using a thread-clean design:
#T1 a poller-maintainer:
 - has the Context() instance under its control
 - has graceful .close() + .term() responsibility
 - has the POLLER instance under its own control
 - has PAIR .bind( "inproc://worker2poller" )
 - has PAIR .recv() <add_socket>-request processing responsibility

#T2 a worker-thread:
 - has PAIR .connect( "inproc://worker2poller" )
 - has PAIR .send() privilege to ask T1 to add a socket & include it into POLLER
While the GIL in any case prevents Python threads from ever running in parallel, the pure OOP design is the motivation to keep the architecture with both clean and separated responsibilities and to keep the Formal Communication Patterns fully scalable (see the sketch below).
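
A minimal pyzmq sketch of this two-thread design might look as follows. The "inproc://worker2poller" endpoint is taken from the schematic above; the single-string message format, the PULL socket type and the endpoint requested by T2 are illustrative assumptions, not the author's exact code:

    import threading
    import zmq

    context = zmq.Context()              # the one thread-safe shared object

    def poller_maintainer():             # T1: owns POLLER + all data sockets
        ctrl = context.socket(zmq.PAIR)
        ctrl.bind("inproc://worker2poller")
        poller = zmq.Poller()
        poller.register(ctrl, zmq.POLLIN)
        owned = [ctrl]                   # T1 owns every socket it polls
        while True:
            for sock, _event in poller.poll(timeout=100):
                if sock is ctrl:
                    # T2 asks us to add a socket; creation happens here,
                    # so only T1 ever touches the new socket
                    endpoint = ctrl.recv_string()
                    if endpoint == "quit":
                        for s in owned:  # graceful close is T1's job
                            s.close(linger=0)
                        return
                    new_sock = context.socket(zmq.PULL)
                    new_sock.connect(endpoint)
                    poller.register(new_sock, zmq.POLLIN)
                    owned.append(new_sock)
                else:
                    sock.recv()          # normal message processing here

    def worker():                        # T2: may only *ask* T1
        req = context.socket(zmq.PAIR)
        req.connect("inproc://worker2poller")    # libzmq >= 4.0 allows
        req.send_string("tcp://127.0.0.1:5555")  # connect-before-bind on inproc
        req.send_string("quit")
        req.close()

    t1 = threading.Thread(target=poller_maintainer)
    t1.start()
    worker()
    t1.join()
    context.term()

Note that T2 never touches the new socket at all: it only sends the endpoint, while T1 creates, registers and eventually closes the socket, so the whole question of socket migration disappears.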

Related

pyzmq - zmq_req can I have one context and use several sockets?

I'm currently working on a Benchmark project, where I'm trying to stress the server out with zmq requests.
I was wondering what would be the best way to approach this. I was thinking of having a context create a socket and push it into a thread, in which I would send requests and wait for responses in each thread respectively, but I'm not too sure this is possible with Python's limitations.
Moreover, would it be the same socket for all threads? That is, if I'm waiting for a response on one thread (with its own socket), would it be possible for another thread to catch that response?
Thanks.
EDIT:
Test flow logic would be like this:
Client socket would use zmq.REQ.
Client sends message.
Client waits for a response.
If no response, client reconnects and tries again until limit.
I'd like to scale this operation up to any number of clients, preferring not to deal with processes unless the performance difference is significant.
How would you do this?
Q : "...can I have one context and use several sockets?"
Oh sure you can.
Moreover, you can have several Context()-instances, each one managing almost any number of Socket()-instances, where each Socket()-instance's methods may get called from one and only one Python thread (a Zen-of-Zero rule: zero-sharing).
Due to the known GIL-lock re-[SERIAL]-isation of all thread-based code-execution flow, every thread still has to, and will, wait to acquire GIL-lock ownership, which in turn permits the GIL-lock owner (and nobody else) to execute a fixed amount of Python instructions before it re-releases the GIL-lock to some other thread...
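
As a minimal sketch of that rule, one shared Context() with one private Socket() per thread; the endpoint, socket type and thread count are illustrative assumptions:

    import threading
    import zmq

    context = zmq.Context()                    # Context() is thread-safe

    def client(worker_id):
        sock = context.socket(zmq.REQ)         # this thread's private socket
        sock.connect("tcp://127.0.0.1:5555")   # hypothetical server endpoint
        sock.send_string(f"request from thread {worker_id}")
        if sock.poll(timeout=1000):            # avoid blocking on a dead server
            print(worker_id, sock.recv_string())
        sock.close(linger=0)

    threads = [threading.Thread(target=client, args=(i,)) for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    context.term()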

Is Flask-SocketIO's emit function thread safe?

I have a Flask-SocketIO application. Can I safely call socketio.emit() from different threads? Is socketio.emit() atomic like the normal socket.send()?
The socketio.emit() function is thread safe, or I should say that it is intended to be thread-safe, as there is currently one open issue related to this. Note that 'thread' in this context means a supported threading model. Most people use Flask-SocketIO in conjunction with eventlet or gevent in production, so in those contexts thread means "green" thread.
The open issue is related to using a message queue, which is necessary when you have multiple servers. In that setup, the accesses to the queue are not thread-safe at this time. This is a bug that needs to be fixed, but as a workaround, you can create a different socketio object per thread.
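
A minimal sketch of that per-thread workaround, assuming a Redis-backed message queue (the URL, event name and payload are illustrative assumptions):

    import threading
    from flask_socketio import SocketIO

    def background_emitter():
        # Each thread builds its own SocketIO handle bound to the same
        # message queue instead of sharing one object across threads.
        sio = SocketIO(message_queue="redis://localhost:6379/0")
        sio.emit("status", {"progress": 42}, namespace="/updates")

    threading.Thread(target=background_emitter).start()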
On the second question, regarding whether socketio.emit() is atomic, the answer is no. This is not a simple socket write operation. The payload needs to be formatted in a certain way to comply with the Socket.IO protocol, and then, depending on the selected transport (long-polling or WebSocket), the write happens in a completely different way.

ZMQ in Python - Can the PULL-side process know if the PUSH-side process was closed?

I use ZMQ in python to distribute calculations between a master application and worker sub-processes, via a PUSH-PULL.
At times, the master might crash and the sub-processes remain hanging, listening to their respective ports.
I tried to use atexit to close the workers in the event that the master crashes, as suggested in this SO question. However atexit does not capture the case when I forcefully close the master.
Is there a way for the PULL-side worker sub-processes to notice that the PUSH-side master is closed via the zmq sock (as possibly hinted here)?
Practical Solution (edit)
A practical solution I implemented is to have the master PUSH a message to close all pending workers when it re-starts:
Before spawning its own helpers, the new instance of the master broadcasts an exit message to all sockets.
Upon receiving the exit command, the hanging sub-processes (launched by the previous instance of the master) do a sys.exit().
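
A minimal sketch of this restart-broadcast approach; the ports and the "exit" sentinel string are illustrative assumptions:

    import sys
    import zmq

    WORKER_PORTS = [5551, 5552]          # hypothetical worker endpoints

    def broadcast_exit(context):
        """Run by the new master instance before spawning its helpers."""
        for port in WORKER_PORTS:
            push = context.socket(zmq.PUSH)
            push.connect(f"tcp://127.0.0.1:{port}")
            push.send_string("exit")
            push.close(linger=1000)      # give the message 1 s to flush

    def worker_loop(context, port):
        """PULL-side sub-process: exits when a restarted master says so."""
        pull = context.socket(zmq.PULL)
        pull.bind(f"tcp://127.0.0.1:{port}")
        while True:
            msg = pull.recv_string()
            if msg == "exit":            # command from the new master
                sys.exit()
            # ... normal calculation work on msg ...

Using a fresh PUSH socket per worker endpoint makes the exit message a broadcast; a single PUSH connected to all endpoints would round-robin it to just one worker.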
A: No, but workarounds exist
In case the sole PUSH-PULL Scalable Formal Communication Pattern remains on the scene, then the answer has no option other than: no, it cannot.
But ZeroMQ is a powerful mental shift into the distributed-processing concept
However, with some slight architecture shifts, the needed functionality comes to hand from other Formal Communication Patterns, deployed side-by-side with the initial PUSH-PULL solo.
The co-existent TransportPLANE(s) + SIG_PLANE(s) behavioural orchestration is limited only by one's imagination.
While not directly providing code for your [dead-man button] signalling scenario, this answer illustrates the possible approaches in this direction by focusing on co-existing BEHAVIOUR(s) rather than on code.
Understanding advanced ZeroMQ socket types + The Book - a must read for ZeroMQ
Understanding the concept "under" ZeroMQ

reactor design pattern in a single thread vs multiple threads

I've been reading about the reactor design pattern, specifically in the context of the Python Twisted networking framework. My simple understanding of the reactor design is that there is a single thread that will sit and wait until one or more I/O sources (or file descriptors) become available, and then it will synchronously loop through each of those sources, running whatever callbacks were specified for each of those sources. This does mean that the program as a whole will block if any of the callbacks are themselves blocking. And regardless, once all callbacks have executed, the reactor goes back to waiting for more I/O sources to become ready.
What are the pros and cons of this, compared to asynchronously looping through each source as they appear, i.e. launching a separate thread for each source? I imagine this may be less efficient if all your callbacks are very fast, as the OS now has to deal with managing multiple threads and swapping between them. But it seems that it's now impossible to block the main program, and as an added benefit, the main reactor can keep listening for sources. In short, why does something like Twisted not do this, instead of keeping to a single-threaded model?
What are the pros and cons of this, compared to asynchronously looping through each source as they appear, i.e. launching a separate thread for each source.
What you're describing is basically what happens in a multithreaded program that uses blocking I/O APIs. In this case, the "reactor" moves into the kernel and the "asynchronous looping" is the kernel completing some outstanding blocking operation, freeing up a user-space thread to proceed.
The cons of this approach are the greatly increased complexity with respect to thread-safety (ie, correctness) that it incurs compared to a strictly single-threaded approach.
The pros are better utilization of multiple CPUs (but running multiple single-threaded event-driven processes often offers this benefit as well) and the greater number of programmers who are familiar and comfortable (though often mistakenly so) with the multithreading approach to concurrency.
Also related, though, are the PyPy team's efforts towards providing a better abstraction over the conventional multithreading model. PyPy's work towards Software Transactional Memory (STM) could offer a system in which work is dispatched asynchronously to multiple worker threads without violating the assumptions that are valid in a strictly single-threaded system. If this works out, it could offer the best of both worlds.
But it seems that it's now impossible to block the main program,
I'm not a Python guy but have done this in the context of Boost.Asio. You're correct: your callbacks need to execute quickly and return control to the main reactor. The idea is to use only asynchronous calls in your callbacks. For example, you wouldn't use an API for sending an IP datagram that blocks and returns a status code. Instead, you'd use a non-blocking API where you register success and failure callbacks. This lets the send call return immediately. The reactor will then call the success/failure callback once the OS has dealt with the packet.
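
A minimal Twisted sketch of that callback style, assuming an HTTP request via twisted.web.client.Agent (the URL is an illustrative assumption): the request call returns a Deferred immediately, and the reactor fires the registered success or failure callback later.

    from twisted.internet import reactor
    from twisted.web.client import Agent

    agent = Agent(reactor)
    d = agent.request(b"GET", b"http://example.com/")  # returns a Deferred at once

    def on_success(response):
        print("got response:", response.code)
        reactor.stop()

    def on_failure(failure):
        print("request failed:", failure.getErrorMessage())
        reactor.stop()

    d.addCallbacks(on_success, on_failure)  # register callbacks, don't block
    reactor.run()  # single thread waits on I/O sources and dispatches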

Should I use epoll or just blocking recv in threads?

I'm trying to write a scalable custom web server.
Here's what I have so far:
The main loop and request interpreter are in Cython. The main loop accepts connections and assigns the sockets to one of the processes in the pool (it has to be processes; threads won't get any benefit from multi-core hardware because of the GIL).
Each process has a thread pool. The process assigns the socket to a thread.
The thread calls recv (blocking) on the socket and waits for data. When some shows up, it gets piped into the request interpreter, and then sent via WSGI to the application running in that thread.
Now I've heard about epoll and am a little confused. Is there any benefit to using epoll to get socket data and then pass that directly to the processes? Or should I just go the usual route of having each thread wait on recv?
PS: What is epoll actually used for? It seems like multithreading and blocking fd calls would accomplish the same thing.
If you're already using multiple threads, epoll doesn't offer you much additional benefit.
The point of epoll is that a single thread can listen for activity on many file descriptors simultaneously (and respond to events on each as they occur), and thus provide event-driven multitasking without requiring the spawning of additional threads. Threads are relatively cheap (compared to spawning processes), but each one does require some overhead (after all, they each have to maintain a call stack).
If you wanted to, you could rewrite your pool processes to be single-threaded using epoll, which would reduce your overall thread usage count, but of course you'd have to consider whether that's something you care about or not - in general, for low numbers of simultaneous requests on each worker, the overhead of spawning threads wouldn't matter, but if you want each worker to be able to handle 1000s of open connections, that overhead can become significant (and that's where epoll shines).
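
A minimal sketch of that single-threaded style using Python's selectors module, which picks epoll on Linux (the port and echo behaviour are illustrative assumptions):

    import selectors
    import socket

    sel = selectors.DefaultSelector()

    def accept(server):
        conn, _addr = server.accept()
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ, read)

    def read(conn):
        data = conn.recv(4096)
        if data:
            conn.sendall(data)           # echo the request back
        else:
            sel.unregister(conn)         # peer closed the connection
            conn.close()

    server = socket.socket()
    server.bind(("127.0.0.1", 8080))
    server.listen()
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ, accept)

    while True:                          # one thread, many open connections
        for key, _events in sel.select():
            key.data(key.fileobj)        # dispatch to the stored callback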
But...
What you're describing sounds suspiciously like you're basically reinventing the wheel - your:
main loop and request interpreter
pool of processes
sounds almost exactly like:
nginx (or any other load balancer/reverse proxy)
A pre-forking tornado app
Tornado is a single-threaded Python web-server module using epoll, and it has built-in pre-forking capability (meaning that it spawns multiple copies of itself as separate processes, effectively creating a process pool). Tornado is based on the tech created to power FriendFeed - they needed a way to handle huge numbers of open connections for long-polling clients looking for new real-time updates.
If you're doing this as a learning process, then by all means, reinvent away! It's a great way to learn. But if you're actually trying to build an application on top of these kinds of things, I'd highly recommend considering using the existing, stable, communally-developed projects - it'll save you a lot of time, false starts, and potential gotchas.
(P.S. I approve of your avatar. <3)
The epoll function (and the other functions in the same family, poll and select) allows you to write single-threaded networking code that manages multiple network connections. Since there is no threading, there is no need for the synchronisation that would be required in a multi-threaded program (which can be difficult to get right).
On the other hand, you'll need to have an explicit state machine for each connection. In a threaded program, this state machine is implicit.
These functions just offer another way to multiplex multiple connections in a process. Sometimes it is easier not to use threads; other times you're already using threads, and thus it is easier just to use blocking sockets (which release the GIL in Python).
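
A tiny sketch of that "explicit state machine per connection" point; the states and the framing rule are illustrative assumptions:

    import enum

    class ConnState(enum.Enum):
        READING_HEADER = 1
        READING_BODY = 2
        WRITING_REPLY = 3

    class Connection:
        def __init__(self):
            # With blocking sockets in threads, this state lives implicitly
            # in each thread's call stack; with epoll we keep it explicitly.
            self.state = ConnState.READING_HEADER
            self.buffer = b""

        def on_readable(self, data):
            self.buffer += data
            if self.state is ConnState.READING_HEADER and b"\n" in self.buffer:
                self.state = ConnState.READING_BODY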
