I am running the Tornado web server in conjunction with MongoDB (using the pymongo driver). I am trying to make architectural decisions to maximize performance.
I have several subquestions regarding the blocking/non-blocking and asynchronous aspects of the resulting application when using Tornado and pymongo together:
Question 1: Connection Pools
It appears that the pymongo.mongo_client.MongoClient object automatically implements a pool of connections. Is the intended purpose of a "connection pool" that I can access mongodb simultaneously from different threads? Is it true that if I run a single MongoClient instance from a single thread, there is really no "pool", since there would only be one connection open at any time?
Question 2: Multi-threaded Mongo Calls
The following FAQ:
http://api.mongodb.org/python/current/faq.html#does-pymongo-support-asynchronous-frameworks-like-gevent-tornado-or-twisted
states:
Currently there is no great way to use PyMongo in conjunction with
Tornado or Twisted. PyMongo provides built-in connection pooling, so
some of the benefits of those frameworks can be achieved just by
writing multi-threaded code that shares a MongoClient.
So I assume that I just pass a single MongoClient reference to each thread? Or is there more to it than that? What is the best way to trigger a callback when each thread produces a result? Should I have one thread whose job it is to watch a queue (Python's Queue.Queue), handle each result, and call finish() on the still-open RequestHandler object in Tornado? (Of course the tornado.web.asynchronous decorator would be needed.)
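For illustration, here is a rough sketch of the pattern I have in mind (all names are illustrative; I am assuming IOLoop.add_callback is the thread-safe way to hand results back to the loop):

import functools
import threading
import Queue  # "queue" on Python 3
import pymongo
import tornado.ioloop
import tornado.web

client = pymongo.MongoClient()  # one shared client; its pool is shared too
tasks = Queue.Queue()

def respond(handler, result):
    handler.write(str(result))
    handler.finish()

def worker():
    while True:
        handler, query = tasks.get()
        result = client.mydb.mycoll.find_one(query)  # blocking, but off the IOLoop thread
        # add_callback is documented as safe to call from other threads
        tornado.ioloop.IOLoop.instance().add_callback(
            functools.partial(respond, handler, result))

class Handler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        tasks.put((self, {"i": 42}))  # a worker thread will finish() this request

for _ in range(4):
    t = threading.Thread(target=worker)
    t.daemon = True
    t.start()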
Question 3: Multiple Instances
Finally, is it possible that I am just creating work? Should I just shortcut things by running single-threaded Tornado instances, 3-4 per core? (The FAQ quoted above seems to suggest this.)
After all, doesn't the GIL in Python effectively force you into separate processes anyway? Or are there additional performance considerations (plus or minus) from the "non-blocking" aspects of Tornado? (I know this is non-blocking in terms of I/O, as pointed out here: Is Tornado really non-blocking?)
(Additional Note: I am aware of asyncmongo at: https://github.com/bitly/asyncmongo but want to use pymongo directly and not introduce this additional dependency.)
As I understand it, there are two models of web servers:
Thread based (Apache)
Event driven (Tornado)
Python has the GIL, and the GIL does not play well with threads, while the event-driven model uses only one thread, so go with event driven.
PyMongo will block Tornado, so here are some suggestions:
Using PyMongo: use it, and make your database calls faster by adding indexes, but be aware that indexes help less with operations that scan many values, for example $gte range queries.
Using AsyncMongo: it seems it has been updated, but it still does not cover all MongoDB features.
Using MongoTor: this one is something like an updated AsyncMongo; it has an ODM (Object Document Mapper) and almost everything you need from MongoDB (aggregation, replica sets, ...); the only feature you really miss is GridFS.
Using Motor: this one is the complete solution for use with Tornado; it has GridFS support, and it is the official MongoDB asynchronous driver for Tornado. It uses a greenlet-based hack, so the only downside is that you cannot use it with PyPy. A minimal handler sketch follows below.
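For example, a minimal Motor handler looks something like this sketch (assuming a recent Motor; the database and collection names are illustrative):

import motor
from tornado import gen, ioloop, web

db = motor.MotorClient().test_database

class Handler(web.RequestHandler):
    @gen.coroutine
    def get(self):
        # the driver returns a future; yielding it frees the IOLoop meanwhile
        doc = yield db.items.find_one({"i": 42})
        self.write(str(doc))

if __name__ == "__main__":
    web.Application([(r"/", Handler)]).listen(8888)
    ioloop.IOLoop.current().start()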
And now, if you decide on a solution other than Tornado: if you use Gevent, then you can use PyMongo, because it is said:
The only async framework that PyMongo fully supports is Gevent.
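For completeness, the Gevent route is essentially just patching before the driver is imported; a minimal sketch (the database and collection names are illustrative):

from gevent import monkey
monkey.patch_all()  # must happen before pymongo opens any sockets

import gevent
import pymongo

client = pymongo.MongoClient()

def fetch(i):
    # every greenlet can share the client; the pool hands out sockets
    return client.test.items.find_one({"i": i})

jobs = [gevent.spawn(fetch, i) for i in range(10)]
gevent.joinall(jobs)
print([job.value for job in jobs])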
NB: sorry if this is going off topic, but the sentence:
Currently there is no great way to use PyMongo in conjunction with Tornado
should be dropped from the documentation; MongoTor and Motor work perfectly well (Motor in particular).
While the question is old, I felt the answers given don't completely address all the queries asked by the user.
Is it true that if I run a single MongoClient instance from a single thread, there is really no "pool", since there would only be one connection open at any time?
This is correct if your script does not use threading. However, if your script is multi-threaded, then there will be multiple connections open at a given time.
Finally, is it possible that I am just creating work? Should I just shortcut things by running single-threaded Tornado instances, 3-4 per core?
No, you are not! Creating multiple threads is less resource intensive than forking multiple processes.
After all, doesn't the GIL in Python effectively force you into separate processes anyway?
The GIL only prevents multiple threads from accessing the interpreter at the same time. It does not prevent multiple threads from carrying out I/O simultaneously. In fact, that's exactly how Motor with asyncio achieves asynchronicity: it uses a thread pool executor to spawn a new thread for each query and returns the result when the thread completes.
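A rough sketch of that idea, with a blocking PyMongo call pushed onto an executor (names are illustrative; this shows the general pattern, not Motor's actual internals):

import asyncio
from concurrent.futures import ThreadPoolExecutor

import pymongo

client = pymongo.MongoClient()
executor = ThreadPoolExecutor(4)

async def find_one(query):
    loop = asyncio.get_event_loop()
    # the blocking driver call runs on a worker thread; the event loop stays free
    return await loop.run_in_executor(executor, client.test.items.find_one, query)

print(asyncio.get_event_loop().run_until_complete(find_one({"i": 42})))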
Are you also aware of Motor? http://emptysquare.net/blog/introducing-motor-an-asynchronous-mongodb-driver-for-python-and-tornado/
It is written by Jesse Davis, who is a co-author of pymongo.
Related
So which would be the better method for making database calls in Tornado, for a high-performance application with a highly scalable infrastructure that makes a large number of database queries?
Method 1 : Use async database drivers/clients like TorMySQL, Tornado-Mysql, asynctorndb, etc., which can perform async database calls.
Method 2 : Use the general and common MySQLdb or mysqlclient (by PyMySQL) drivers, move the database calls into separate backend services, and load-balance the calls with nginx, similar to what the original FriendFeed guys did, as mentioned in one of the groups.
Bret Taylor, one of the original authors, writes:
groups.google.com/group/python-tornado/browse_thread/thread/9a08f5f08cdab108
We experimented with different async DB approaches, but settled on
synchronous at FriendFeed because generally if our DB queries were
backlogging our requests, our backends couldn't scale to the load
anyway. Things that were slow enough were abstracted to separate
backend services which we fetched asynchronously via the async HTTP
module.
Other links supporting Method 2, to better explain what I am trying to say:
StackOverFlow Answer
Method 3 : Make use of threads with the IOLoop, using the general sync libraries.
Supporting example links to explain what I mean:
What the Tornado wiki says, https://github.com/tornadoweb/tornado/wiki/Threading-and-concurrency
Do it synchronously and block the IOLoop. This is most appropriate for
things like memcache and database queries that are under your control
and should always be fast. If it's not fast, make it fast by adding
the appropriate indexes to the database, etc.
Another sample method:
For a sync database (mysqldb), we can:
executor = ThreadPoolExecutor(4)
result = yield executor.submit(mysqldb_operation)
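In context, with imports, that looks roughly like this inside a handler (a sketch; mysqldb_operation stands in for your own blocking query function):

from concurrent.futures import ThreadPoolExecutor
from tornado import gen, web

executor = ThreadPoolExecutor(4)

def mysqldb_operation():
    return "rows"  # stand-in for a real blocking MySQLdb query

class QueryHandler(web.RequestHandler):
    @gen.coroutine
    def get(self):
        # the blocking call runs on a pool thread; the IOLoop keeps serving
        result = yield executor.submit(mysqldb_operation)
        self.write(str(result))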
Method 4 : Just use sync drivers like MySQLdb, start enough Tornado instances, and load-balance with nginx, so that the application remains asynchronous at a broader level: some calls block, but other requests still benefit from the asynchronous nature of the large number of Tornado instances.
Explanation: for details, follow www.jjinux.com/2009/12/python-asynchronous-networking-apis-and.html, which says:
They allow MySQL queries to block the entire process. However, they
compensate in two ways. They lean heavily on their asynchronous web
client wherever possible. They also make use of multiple Python
processes. Hence, if you're handling 500 simultaneous request, you
might use nginx to split them among 10 different Tornado Web
processes. Each process is handling 50 simultaneous requests. If one
of those requests needs to make a call to the database, only 50
(instead of 500) requests are blocked.
FriendFeed used what you call "method 4": there was no separate backend service; the process was just blocked during the database call.
Using a completely asynchronous driver ("method 1") is generally best. However, this means that you can't use synchronous libraries that wrap the database operations like SQLAlchemy. In this case you may have to use threads ("method 3"), which is almost as good (and easier than "method 2").
The advantage of "method 4" is that it is easy. In the past this was enough to recommend it, because introducing callbacks everywhere was tedious; with the advent of coroutines, method 3 is almost as easy, so it is usually better to use threads than to block the process.
Is it OK to run certain pieces of code asynchronously in a Django web app? If so, how?
For example:
I have a search algorithm that returns hundreds or thousands of results. I want to record in the database that these items were the result of the search, so I can see what users are searching for most. I don't want the client to have to wait for hundreds or thousands of extra database inserts. Is there a way I can do this asynchronously? Is there any danger in doing so? Is there a better way to achieve this?
As far as Django is concerned, yes.
The bigger concern is your web server and whether it plays nicely with threading. For instance, the sync workers of gunicorn are single-threaded, but there are other worker types, such as greenlet-based ones; I'm not sure how well those play with threads.
Combining threading and multiprocessing can be an issue if you're forking from threads:
Status of mixing multiprocessing and threading in Python
http://bugs.python.org/issue6721
That being said, I know of popular performance-analytics utilities that have been using threads to report metrics, so it seems to be an accepted practice.
In sum, it seems safest to use the threading.Thread object from the standard library, so long as whatever you do in it doesn't fork (via Python's multiprocessing library):
https://docs.python.org/2/library/threading.html
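A minimal sketch of that approach in a Django view (run_search and the analytics insert are hypothetical placeholders for your own code):

import threading

from django.http import JsonResponse

def log_results(query, result_ids):
    # your analytics inserts go here, e.g. SearchHit.objects.create(...);
    # this runs on its own thread, so the client never waits on it
    pass

def search(request):
    query = request.GET.get("q", "")
    results = run_search(query)  # hypothetical: your existing search call
    ids = [r.pk for r in results]
    threading.Thread(target=log_results, args=(query, ids)).start()
    return JsonResponse({"results": ids})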
Offloading requests from the main thread is a common practice, as the end goal is to return a result to the client (browser) as quickly as possible.
As I am sure you are aware, HTTP is blocking - so until you return a response, the client cannot do anything (it is blocked, in a waiting state).
The de facto way of offloading requests is through Celery, which is a task-queueing system.
I highly recommend you read the introduction to celery topic, but in summary here is what happens:
You mark certain pieces of code as "tasks". These are usually functions that you want to run asynchronously.
Celery manages workers - you can think of them as threads - that will run these tasks.
To communicate with the workers, a message queue is required; RabbitMQ is the one most often recommended.
Once you have all the components running (it takes but a few minutes), your workflow goes like this:
In your view, when you want to offload some work, you call the function that does that work with .delay(). This triggers a worker to start executing the method in the background.
Your view then returns a response immediately.
You can then check for the result of the task, and take appropriate actions based on what needs to be done. There are ways to track progress as well.
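A minimal sketch of that workflow (the broker URL and all names are illustrative):

# tasks.py
from celery import Celery

app = Celery("tasks", broker="amqp://guest@localhost//")

@app.task
def record_search(query, result_ids):
    # the expensive inserts happen here, in the worker, not in the request cycle
    print("recording %d results for %r" % (len(result_ids), query))

# in a view:
#   record_search.delay(query, [r.pk for r in results])
#   ...then return a response immediately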
It is also good practice to include caching, so that you are not executing expensive tasks unnecessarily. For example, you might choose to offload a request to do some analytics on search keywords that will be placed in a report.
Once the report is generated, I would cache the results (if applicable) so that the same report can be displayed if requested later - rather than be generated again.
After doing Gevent/Eventlet monkey patching, can I assume that whenever a DB driver (e.g. redis-py, pymongo) does I/O through the standard library (e.g. socket), it will be asynchronous?
So is eventlet's monkey patching enough to make, e.g., redis-py non-blocking in an eventlet application?
From what I know, it should be enough as long as I take care with connection usage (e.g. using a different connection for each greenlet). But I want to be sure.
If you know what else is required, or how to use DB drivers correctly with Gevent/Eventlet, please describe that as well.
You can assume it will be magically patched if all of the following are true.
You're sure the I/O is built on top of standard Python sockets or other things that eventlet/gevent monkeypatch. No files, no native (C) socket objects, etc.
You pass aggressive=True to patch_all (or patch_select), or you're sure the library doesn't use select or anything similar.
The driver doesn't use any (implicit) internal threads. (If the driver does use threads internally, patch_thread may work, but it may not.)
If you're not sure, it's pretty easy to test, probably easier than reading through the code and trying to work it out. Have one greenlet that just does something like this:
while True:
    print("running")
    gevent.sleep(0.1)
Then have another that runs a slow query against the database. If it's monkeypatched, the looping greenlet will keep printing "running" 10 times/second; if not, the looping greenlet will not get to run while the program is blocked on the query.
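Putting that together, a self-contained harness might look like this sketch (slow_query is a placeholder; swap in a real call through the driver you are testing):

from gevent import monkey
monkey.patch_all()

import gevent

def heartbeat():
    for _ in range(30):
        print("running")
        gevent.sleep(0.1)

def slow_query():
    # placeholder so the sketch runs on its own; replace with a
    # deliberately slow call through your driver
    gevent.sleep(2)

gevent.joinall([gevent.spawn(heartbeat), gevent.spawn(slow_query)])
# patched driver: "running" keeps printing during the query
# unpatched driver: the output stalls until the query returns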
So, what do you do if your driver blocks?
The easiest solution is to use a truly concurrent thread pool for DB queries. The idea is that you fire off each query (or batch) as a thread-pool job and greenlet-block on the completion of that job. (For really simple cases, where you don't need many concurrent queries, you can just spawn a threading.Thread for each one instead, but usually you can't get away with that.)
If the driver does significant CPU work (e.g., you're using something that runs an in-process cache, or even an entire in-process DBMS like sqlite), you want this threadpool to actually be implemented on top of processes, because otherwise the GIL may prevent your greenlets from running. Otherwise (especially if you care about Windows), you probably want to use OS threads. (However, this means you can't patch_threads(); if you need to do that, use processes.)
If you're using eventlet, and you want to use threads, there's a built-in simple solution called tpool that may be sufficient. If you're using gevent, or you need to use processes, this won't work. Unfortunately, blocking a greenlet (without blocking the whole event loop) on a real threading object is a bit different between eventlet and gevent, and not documented very well, but the tpool source should give you the idea. Beyond that part, the rest is just using concurrent.futures (see futures on pypi if you need this in 2.x or 3.1) to execute the tasks on a ThreadPoolExecutor or ProcessPoolExecutor. (Or, if you prefer, you can go right to threading or multiprocessing instead of using futures.)
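For the eventlet case, the tpool route looks roughly like this (a sketch; blocking_query stands in for a real driver call):

import time

import eventlet
from eventlet import tpool

def blocking_query(n):
    time.sleep(1)  # stand-in for an ordinary blocking driver call
    return n * 2

def handler():
    # tpool.execute runs the call on a real OS thread and greenlet-blocks
    # only this green thread, not the whole event loop
    return tpool.execute(blocking_query, 21)

print(eventlet.spawn(handler).wait())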
Can you explain why I should use OS threads on Windows?
The quick summary is: If you stick to threads, you can pretty much just write cross-platform code, but if you go to processes, you're effectively writing code for two different platforms.
First, read the Programming guidelines for the multiprocessing module (both the "All platforms" section and the "Windows" section). Fortunately, a DB wrapper shouldn't run into most of this. You only need to deal with processes via the ProcessPoolExecutor. And, whether you wrap things up at the cursor-op level or the query level, all your arguments and return values are going to be simple types that can be pickled. Still, it's something you have to be careful about, which otherwise wouldn't be an issue.
Meanwhile, Windows has very low overhead for its intra-process synchronization objects, but very high overhead for its inter-process ones. (It also has very fast thread creation and very slow process creation, but that's not important if you're using a pool.) So, how do you deal with that? I had a lot of fun creating OS threads to wait on the cross-process sync objects and signal the greenlets, but your definition of fun may vary.
Finally, tpool can be adapted trivially to a ppool for Unix, but it takes more work on Windows (and you'll have to understand Windows to do that work).
abarnert's answer is correct and very comprehensive. I just want to add that there is no "aggressive" patching in eventlet; that is probably a gevent feature. Also, if a library uses select, that is not a problem, because eventlet can monkey-patch that too.
Indeed, in most cases eventlet.monkey_patch() is all you need. Of course, it must be done before creating any sockets.
If you still have any issues, feel free to open an issue or write to the eventlet mailing list or G+ community. All relevant links can be found at http://eventlet.net/
I am working on a web backend that frequently grabs realtime market data from the web, and puts the data in a MySQL database.
Currently I have my main thread push tasks onto a Queue object. I then have about 20 threads that read from that queue and, when a task is available, execute it.
Unfortunately, I am running into performance issues, and after doing a lot of research, I can't make up my mind.
As I see it, I have 3 options:
Should I take a distributed task approach with something like Celery?
Should I switch to Jython or IronPython to avoid the GIL issues?
Or should I simply spawn different processes instead of threads, using the multiprocessing module (formerly known as processing)?
If I go for the latter, how many processes is a good amount? What is a good multi-process producer/consumer design?
Thanks!
Maybe you should use an event-driven approach, with an event-driven framework like Twisted (Python) or node.js (JavaScript). These frameworks make use of UNIX domain sockets, for example, so your consumer listens on some port and your event-generator object pushes all the info to the consumer; the consumer then doesn't have to poll the queue to see whether there's something in it.
First, profile your code to determine what is bottlenecking your performance.
If each of your threads is frequently writing to your MySQL database, the problem may be disk I/O, in which case you should consider using an in-memory database and periodically writing it to disk.
If you discover that CPU performance is the limiting factor, then consider using the multiprocessing module instead of the threading module. Use a multiprocessing.Queue object to push your tasks. Also make sure that your tasks are big enough to keep each core busy for a while, so that the granularity of communication doesn't kill performance. If you are currently using threading, then switching to multiprocessing would be the easiest way forward for now.
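A minimal producer/consumer sketch along those lines (the task body and counts are placeholders for your own logic):

from multiprocessing import Process, Queue, cpu_count

def handle(task):
    print("handled", task)  # stand-in for your real fetch-and-insert logic

def worker(tasks):
    while True:
        task = tasks.get()
        if task is None:  # sentinel: no more work
            break
        handle(task)

if __name__ == "__main__":
    tasks = Queue()
    procs = [Process(target=worker, args=(tasks,)) for _ in range(cpu_count())]
    for p in procs:
        p.start()
    for item in range(100):  # stand-in for your real producer
        tasks.put(item)
    for _ in procs:
        tasks.put(None)
    for p in procs:
        p.join()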
I'm using pymongo to access mongodb in an application that also uses Celery to perform many asynchronous tasks. I know pymongo's connection pooling does not support asynchronous workers (based on the docs).
To access collections I've got a Collection class wrapping certain logic that fits my application. I'm trying to make sense of some code that I inherited with this wrapper:
Each collection at the moment creates its own Connection instance. Based on what I'm reading, this is wrong: I should really have a single Connection instance (in settings.py or similar) and import it into my Collection instances. That bit is clear. Is there a guideline on the maximum number of connections recommended? The current code surely creates a LOT of connections/sockets, as it's not really making use of the pooling facilities.
However, as some code is called both from asynchronous Celery tasks and synchronously, I'm not sure how to handle this. My thought is to instantiate new Connection instances for the tasks and use the single shared one for the synchronous calls (calling end_request after each activity is done, of course). Is this the right direction?
Thanks!
Harel
From pymongo's docs : "PyMongo is thread-safe and even provides built-in connection pooling for threaded applications."
The word "asynchronous" in your situation can be translated into how "inconsistent" your application's requirements are.
Statements like "x += 1" will never be consistent in your app. If you can afford that, there is no problem. If you have "critical" operations, you must implement some kind of locking for synchronization.
As for the maximum connections, I don't know exact numbers, so test and proceed.
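For reference, the single shared connection the question describes is just a module-level object that everything imports; a sketch (the pool size is an assumption to tune, and the keyword name varies across pymongo versions):

# db.py (or settings.py)
import pymongo

# one client per process; PyMongo manages the socket pool behind it
client = pymongo.MongoClient("localhost", 27017, max_pool_size=50)

# elsewhere:
#   from db import client
#   doc = client.mydb.mycollection.find_one({...})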
Also take a look at Redis and this example, if speed and memory efficiency are required. From some benchmarks I made, the Redis Python driver is at least 2x faster than pymongo for reads/writes.