Is Flask-SocketIO's emit function thread safe? - python

I have a Flask-SocketIO application. Can I safely call socketio.emit() from different threads? Is socketio.emit() atomic like the normal socket.send()?

The socketio.emit() function is thread safe, or rather it is intended to be thread safe, as there is currently one open issue related to this. Note that "thread" in this context means whatever the selected threading model provides. Most people use Flask-SocketIO in conjunction with eventlet or gevent in production, so in those contexts a thread is a "green" thread.
The open issue concerns using a message queue, which is necessary when you have multiple servers. In that setup, the accesses to the queue are not thread safe at this time. This is a bug that needs to be fixed, but as a workaround, you can create a separate socketio object per thread.
On the second question, whether socketio.emit() is atomic: the answer is no. This is not a simple socket write operation. The payload needs to be formatted in a certain way to comply with the Socket.IO protocol, and then, depending on the selected transport (long-polling or WebSocket), the write happens in a completely different way.

Related

Pika threaded execution gets error - 505, 'UNEXPECTED_FRAME

I'm aware that Pika is not thread safe. I was trying to work around that by using a lock to access the channel, but I still get this error:
pika.exceptions.ConnectionClosed: (505, 'UNEXPECTED_FRAME - expected content header for class 60, got non content header frame instead')
P.S. I cannot use a different channel.
What could I do? Thanks in advance for any help.
You need to redesign your application or choose a RabbitMQ library other than Pika. Locks do not make Pika thread safe. Each thread needs to have its own connection.
You have a couple of options, but none of them will be as simple as using a lock.
One would be to replace Pika with Kombu. Kombu is thread safe but the interface is rather different from Pika (simpler in my opinion but this is subjective).
If you want to keep using Pika, then you need to redesign your Rabbit interface. I do not know why you "cannot" use a different channel, but one possible way of doing this would be to have a single thread interfacing with Rabbit, communicating via queues with worker threads that process the received data. The Rabbit thread would read data, hand it to a worker through one queue, receive answers from workers via another queue, and then submit them to Rabbit as responses.
You might also be able to untangle something in your communication protocol so that you actually can use a different channel, and each thread can interface with Rabbit independently with its own connection and channel. This is the method I generally use.
Yet another candidate would be to get rid of threads and start using async methods instead. Your application may or may not be suitable for this.
But there is no simple workaround, and you will eventually encounter weird behaviour or exceptions if you try to share Pika objects between threads.
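The single-Rabbit-thread design described above can be sketched with just the standard library, with the actual Pika calls stubbed out (handle() and the simulated "publishing" are placeholders, not Pika API):

```python
# Sketch of the single-Rabbit-thread design: only the rabbit loop would
# own the connection/channel; workers talk to it exclusively via queues.
# handle() stands in for real work; publishing is simulated by collecting
# results instead of calling basic_publish.
import queue
import threading

task_q = queue.Queue()      # Rabbit thread -> workers
result_q = queue.Queue()    # workers -> Rabbit thread

def handle(body):           # stand-in for real message processing
    return body.upper()

def worker():
    while True:
        body = task_q.get()
        if body is None:    # shutdown sentinel
            break
        result_q.put(handle(body))

def rabbit_loop(incoming):
    # In a real app this would be Pika's consume loop; this thread alone
    # would publish the workers' answers back to RabbitMQ.
    for body in incoming:
        task_q.put(body)
    return [result_q.get() for _ in incoming]

workers = [threading.Thread(target=worker) for _ in range(2)]
for w in workers:
    w.start()
published = rabbit_loop(["hello", "world"])
for w in workers:
    task_q.put(None)        # one sentinel per worker
for w in workers:
    w.join()
print(sorted(published))    # ['HELLO', 'WORLD']
```

The point of the design is that no lock is needed: Pika objects are only ever touched by the one thread that created them, and all cross-thread traffic goes through thread-safe queue.Queue objects.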

ZeroMQ poll thread safety

I have a thread that is polling on a ZMQ Poller:
poller.poll(timeout)
This thread is also the one which receives and sends back messages over the sockets registered in the poller.
Then I have another thread that may, eventually, create a new socket and register it for polling on input events:
socket = context.socket(...)
socket.bind/connect(...)
poller.register(socket, zmq.POLLIN)
Once the socket is registered, the latter thread will not touch it again.
Is this safe?
Update
The answers/comments I got were about how I should not be doing this, or about The Guide's recommendations (which I already knew). But that does not really answer my question.
To be more specific: I am working with pyzmq, the Python bindings for ZeroMQ.
Now, although ZeroMQ sockets are not thread safe, it is indeed possible to transfer them from one thread to another as long as there is a full memory barrier during the transfer.
So the first question would be: do I need to set an explicit memory barrier in there? Note that one thread creates, binds/connects, and registers the socket, but that thread will not use the socket again. Is there an actual conflict? Could there be a moment in which I should be explicitly preventing access to the socket from both threads?
Then the second question would be: is registering a socket in a poller thread safe? Most of the time the thread that performs the polling is busy doing other stuff, but it could happen that it is polling, waiting for a timeout. In that case, do I need a lock to prevent concurrent access to the poller? Or is it safe to register the new socket in the poller while the other thread is polling it?
Update II
I am using Pyro4 to handle and configure remote processes (i.e. their ZeroMQ connections and their behavior). The initial configuration can be done with the Pyro proxy very easily. However, when I start the process, I am in fact running the main loop on a dedicated thread (a Pyro oneway call) that keeps running, and if I access the object via the Pyro proxy again, that access comes from another thread.
So the idea is to avoid modifying the remote object's class but still allow the use of Pyro for configuring the remote objects even when they are running. As long as the creation + binding/connecting + registering of new sockets is safe from another thread, I am good.
Once the socket is registered, the latter thread will not touch it again.
Is this safe?
No.
Industries that not only require safe solutions, but also push the responsibility of actually proving stable and warranted system behaviour onto the vendor side (whether due to wise grandfathers, a deep belief in QA/TQM, or regulations imposed on vendor management in the MIL/GOV/aerospace/healthcare/pharma/automotive et al. segments) would simply reject this outright.
Why?
" ... will not touch it again." is just a promise.
Safety cross-validated system design does not settle with less than a proof of a collision avoidance.
Let me cite from a lovely book from Pieter HINTJENS "Code Connected, Vol.1" - a must read piece for ZeroMQ:
Some widely used models, despite being the basis for entire industries, are fundamentally broken, and shared state concurrency is one of them. Code that wants to scale without limit does it like the Internet does, by sending messages and sharing nothing except a common contempt for broken programming models.
You should follow some rules to write happy multithreaded code with ØMQ:
• Isolate data privately within its thread and never share data in multiple threads. The only exception to this are ØMQ contexts, which are threadsafe.
• Stay away from the classic concurrency mechanisms such as mutexes, critical sections, semaphores, etc. These are an anti-pattern in ØMQ applications.
• Create one ØMQ context at the start of your process, and pass that to all threads that you want to connect via inproc sockets.
• Use attached threads to create structure within your application, and connect these to their parent threads using PAIR sockets over inproc. The pattern is: bind parent socket, then create child thread which connects its socket.
• Use detached threads to simulate independent tasks, with their own contexts. Connect these over tcp. Later you can move these to stand-alone processes without changing the code significantly.
• All interaction between threads happens as ØMQ messages, which you can define more or less formally.
• Don’t share ØMQ sockets between threads. ØMQ sockets are not threadsafe. Technically it’s possible to migrate a socket from one thread to another but it demands skill. The only place where it’s remotely sane to share sockets between threads are in language bindings that need to do magic like garbage collection on sockets.
If you need to start more than one proxy in an application, for example, you will want to run each in their own thread. It is easy to make the error of creating the proxy frontend and backend sockets in one thread, and then passing the sockets to the proxy in another thread. This may appear to work at first but will fail randomly in real use. Remember: Do not use or close sockets except in the thread that created them.
If you follow these rules, you can quite easily build elegant multithreaded applications, and later split off threads into separate processes as you need to. Application logic can sit in threads, processes, or nodes: whatever your scale needs.
ØMQ uses native OS threads rather than virtual “green” threads. The advantage is that you don’t need to learn any new threading API, and that ØMQ threads map cleanly to your operating system. You can use standard tools like Intel’s ThreadChecker to see what your application is doing. The disadvantages are that native threading APIs are not always portable, and that if you have a huge number of threads (in the thousands), some operating systems will get stressed.
If you’re sharing sockets across threads, don’t. It will lead to random weirdness, and crashes.
We could assume "light" conditions: system not stressed, high-watermark never reached, no big congestions. There is just a single thread running the application (polling and executing tasks on input). So most of the time (99.99%) there is no concurrency. Now, concurrency only occurs when a second thread appears just to add a socket to the pool. There will never be more than 2 threads being executed. And the second thread will be always restricted to adding new sockets to the pool (once added the socket is transferred to the main thread). Is this enough for boundary conditions? – Peque
Given the schematic use-case details added in Update II, a professional solution should not lose time and should avoid any hidden risks by using a thread-clean design:
#T1 (poller-maintainer thread):
 - owns the Context() instance
 - carries the graceful .close() + .term() responsibility
 - owns the POLLER instance under its own control
 - has a PAIR socket .bind( "inproc://worker2poller" )
 - processes the <add_socket>-requests received via PAIR .recv()
#T2 (worker thread):
 - has a PAIR socket .connect( "inproc://worker2poller" )
 - has the PAIR .send() privilege to ask #T1 to add a socket and include it in the POLLER
While the GIL prevents Python threads from ever running in parallel anyway, clean OOP design is the motivation to keep the architecture with cleanly separated responsibilities, keeping the Formal Communication Patterns fully scalable.
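A minimal pyzmq sketch of the #T1/#T2 schematic above (the endpoint names and the JSON message format are illustrative assumptions, not part of the original answer):

```python
# The poller thread (#T1) owns every socket; the worker (#T2) only sends an
# <add_socket>-request over an inproc PAIR and never touches the new socket.
import threading
import zmq

ctx = zmq.Context.instance()

def poller_maintainer(ready):
    cmd = ctx.socket(zmq.PAIR)
    cmd.bind("inproc://worker2poller")
    poller = zmq.Poller()
    poller.register(cmd, zmq.POLLIN)
    ready.set()                                # bind done, worker may connect
    owned = [cmd]
    while True:
        for sock, _event in poller.poll():
            if sock is cmd:
                msg = cmd.recv_json()
                if msg["op"] == "add_socket":
                    s = ctx.socket(zmq.PULL)   # created in THIS thread only
                    s.bind(msg["endpoint"])
                    poller.register(s, zmq.POLLIN)
                    owned.append(s)
                    cmd.send_json({"ok": True})
                elif msg["op"] == "stop":
                    cmd.send_json({"ok": True})
                    for s in owned:
                        s.close(linger=0)
                    return

ready = threading.Event()
t = threading.Thread(target=poller_maintainer, args=(ready,))
t.start()
ready.wait()

# the worker's side of the PAIR: request a new socket, never touch it
req = ctx.socket(zmq.PAIR)
req.connect("inproc://worker2poller")
req.send_json({"op": "add_socket", "endpoint": "inproc://data1"})
reply = req.recv_json()
req.send_json({"op": "stop"})
req.recv_json()
t.join()
req.close(linger=0)
ctx.term()
print(reply)                                   # {'ok': True}
```

Note that no lock or memory barrier is needed here: every socket is created, polled, and closed by the one thread that owns it, and all cross-thread coordination is itself ØMQ messaging.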

Tornado ioloop + threading

I have been working with the Tornado web framework for some time, but I still don't clearly understand the ioloop functionality, especially how to use it with multithreading.
Is it possible to create a separate IOLoop instance for each of multiple servers?
The vast majority of Tornado apps should have only one IOLoop, running in the main thread. You can run multiple HTTPServers (or other servers) on the same IOLoop.
It is possible to create multiple IOLoops and give each one its own thread, but this is rarely useful, because the GIL ensures that only one thread is running at a time. If you do use multiple IOLoops, you must be careful to ensure that the different threads only communicate with each other through thread-safe methods (e.g. IOLoop.add_callback).
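Since modern Tornado's IOLoop is a thin wrapper over the asyncio event loop, the thread-safe hand-off that IOLoop.add_callback provides can be sketched with asyncio alone (call_soon_threadsafe is the underlying analogue; the value 42 is just a placeholder):

```python
# Sketch: a plain OS thread hands a result back to the event-loop thread
# via the loop's one thread-safe entry point.
import asyncio
import threading

async def main():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()

    def worker():
        # the ONLY safe way to touch the loop from another thread;
        # this is what Tornado's IOLoop.add_callback does underneath
        loop.call_soon_threadsafe(fut.set_result, 42)

    threading.Thread(target=worker).start()
    return await fut       # resumes once the worker's callback has run

result = asyncio.run(main())
print(result)              # 42
```

Calling fut.set_result directly from the worker thread would be the unsafe version of this; call_soon_threadsafe is what makes the cross-thread wakeup correct.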

What are the Tornado and Mongodb blocking and asynchronous considerations?

I am running the Tornado web server in conjunction with Mongodb (using the pymongo driver). I am trying to make architectural decisions to maximize performance.
I have several subquestions regarding the blocking/non-blocking and asynchronous aspects of the resulting application when using Tornado and pymongo together:
Question 1: Connection Pools
It appears that the pymongo.mongo_client.MongoClient object automatically implements a pool of connections. Is the intended purpose of a "connection pool" so that I can access mongodb simultaneously from different threads? Is it true that if run with a single MongoClient instance from a single thread that there is really no "pool" since there would only be one connection open at any time?
Question 2: Multi-threaded Mongo Calls
The following FAQ:
http://api.mongodb.org/python/current/faq.html#does-pymongo-support-asynchronous-frameworks-like-gevent-tornado-or-twisted
states:
Currently there is no great way to use PyMongo in conjunction with
Tornado or Twisted. PyMongo provides built-in connection pooling, so
some of the benefits of those frameworks can be achieved just by
writing multi-threaded code that shares a MongoClient.
So I assume that I just pass a single MongoClient reference to each thread? Or is there more to it than that? What is the best way to trigger a callback when each thread produces a result? Should I have one thread whose job is to watch a queue (Python's Queue.Queue), handle each result, and call finish() on the still-open RequestHandler object in Tornado? (Of course, the tornado.web.asynchronous decorator would be needed.)
Question 3: Multiple Instances
Finally, is it possible that I am just creating work? Should I just shortcut things by running a single threaded instance of Tornado and then start 3-4 instances per core? (The above FAQ reference seems to suggest this)
After all doesn't the GIL in python result in effectively different processes anyway? Or are there additional performance considerations (plus or minus) by the "non-blocking" aspects of Tornado? (I know that this is non-blocking in terms of I/O as pointed out here: Is Tornado really non-blocking?)
(Additional Note: I am aware of asyncmongo at: https://github.com/bitly/asyncmongo but want to use pymongo directly and not introduce this additional dependency.)
As I understand it, there are two models of web servers:
Thread based (Apache)
Event driven (Tornado)
And you have the GIL with Python; the GIL does not play well with threads, and event driven is a model that uses only one thread, so go with event driven.
PyMongo will block Tornado, so here are some suggestions:
Using PyMongo: use it, and make your database calls faster by creating indexes, but be aware: indexes don't help with operations that scan many values, for example range queries such as $gte.
Using AsyncMongo: it seems to have been updated, but it still does not cover all MongoDB features.
Using Mongotor: this one is like an update of AsyncMongo; it has an ODM (Object Document Mapper) and everything you need from MongoDB (aggregation, replica sets, ...), and the only feature you really miss is GridFS.
Using Motor: this one is the complete solution for use with Tornado. It has GridFS support and is the official MongoDB asynchronous driver for Tornado. It uses a Greenlet-based hack, so the only downside is that it cannot be used with PyPy.
And now, if you decide on a solution other than Tornado: if you use Gevent, then you can use PyMongo, because it is said:
The only async framework that PyMongo fully supports is Gevent.
N.B.: sorry if this is going off topic, but the sentence:
Currently there is no great way to use PyMongo in conjunction with Tornado
should be dropped from the documentation; Mongotor and Motor work perfectly (Motor in particular).
While the question is old, I felt the answers given don't completely address all the queries asked by the user.
Is it true that if run with a single MongoClient instance from a single thread that there is really no "pool" since there would only be one connection open at any time?
This is correct if your script does not use threading. However, if your script is multi-threaded, then there would be multiple connections open at a given time.
Finally, is it possible that I am just creating work? Should I just shortcut things by running a single threaded instance of Tornado and then start 3-4 instances per core?
No, you are not! Creating multiple threads is less resource intensive than forking multiple processes.
After all doesn't the GIL in python result in effectively different processes anyway?
The GIL only prevents multiple threads from accessing the interpreter at the same time. It does not prevent multiple threads from carrying out I/O simultaneously. In fact, that is exactly how Motor with asyncio achieves asynchronicity.
It uses a thread pool executor to spawn a new thread for each query and returns the result when the thread completes.
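The thread-pool-executor idea can be sketched with the standard library alone; here query_db is a stand-in for a blocking PyMongo call such as collection.find_one() (the document contents are illustrative):

```python
# Sketch: run a blocking driver call on a worker thread and await the
# result on the event loop, the same shape Motor uses internally.
import asyncio
from concurrent.futures import ThreadPoolExecutor

def query_db(doc_id):
    # stand-in for a blocking PyMongo call, e.g. collection.find_one()
    return {"_id": doc_id, "name": "example"}

async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # the blocking call runs on a pool thread; the loop stays free
        doc = await loop.run_in_executor(pool, query_db, 1)
    return doc

doc = asyncio.run(main())
print(doc)   # {'_id': 1, 'name': 'example'}
```

This is why the GIL is not the bottleneck here: the worker thread spends its time blocked in I/O, during which the interpreter lock is released and the event loop keeps serving other requests.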
Are you also aware of Motor? http://emptysquare.net/blog/introducing-motor-an-asynchronous-mongodb-driver-for-python-and-tornado/
It is written by Jesse Davis, who is a co-author of PyMongo.

Multi-threading under Apache resulting in duplicate threads

I am running a python thread pool under Apache2 to handle incoming special HTTP requests.
The idea is that I have a single "handler" thread per request source - so if I have three devices (A,B,C) sending me these special requests, each would have its own "handler" thread on the server (1-A, 2-B, 3-C)
I have a thread pool class defined like this:
import threading

class threadController(threading.Thread):
    threadPool = []
And when I get a new request, I look through all my running threads, to match a particular one, and pass the request to it.
This seemed to work well enough under Windows.
However, on Linux, it seems that sometimes my threadPool variable comes back empty, and I get an extra thread. So I have a single device (A) sending requests, but end up with two threads (1-A and 2-A).
Here's the strange thing: It is always one extra thread, never more. Regardless whether my device (A) sends 5 requests, or 30.
I am using mod_wsgi (3.3) for django integration.
Note: I realize that this is a somewhat unorthodox way of handling sessions. I am not looking for a way to handle sessions better - I already know there are better ways :)
On Windows there is only one Apache child process handling requests. On non-Windows systems, if using embedded mode, there can be multiple processes.
Use mod_wsgi daemon mode with its default of a single process. See:
http://code.google.com/p/modwsgi/wiki/QuickConfigurationGuide#Delegation_To_Daemon_Process
and:
http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading
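A minimal sketch of the daemon-mode configuration the answer recommends (the process group name and paths are illustrative; the directives themselves are standard mod_wsgi):

```apache
# One daemon process shared by all request threads, so in-memory state
# such as the threadPool list is not duplicated across processes.
WSGIDaemonProcess myapp processes=1 threads=15
WSGIProcessGroup myapp
WSGIScriptAlias / /path/to/myapp/wsgi.py
```

With embedded mode on Linux, Apache's MPM may spawn several child processes, each with its own copy of threadPool, which matches the "always one extra thread" symptom described above.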
