I want to run an aiohttp app on a number of processes.
Tornado has
server = HTTPServer(app)
server.bind(8888)
server.start(0)
to start workers (0 means one per CPU core). I know about gunicorn, but I want the server embedded in the application.
I found https://docs.aiohttp.org/en/stable/web_reference.html#server. How do I use it?
According to the documentation for aiohttp, running on multiple cores in a single application isn't supported.
Standalone
Just call aiohttp.web.run_app() function passing
aiohttp.web.Application instance.
The method is very simple and could be the best solution in some
trivial cases. But it does not utilize all CPU cores.
For running multiple aiohttp server instances use reverse proxies.
This means that supporting multiple processes is outside the responsibility of aiohttp. That part is left to a process manager such as supervisord (with nginx connecting to the instances in a round-robin fashion) or to gunicorn with nginx.
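If you still want the workers started from inside the application, one possible sketch is to start several processes yourself and let each one call web.run_app() on the same port. This assumes a platform where SO_REUSEPORT lets several processes bind one port (Linux does this and spreads connections across them). It is not a documented aiohttp deployment mode, and the names resolve_worker_count, serve and start_workers are made up for illustration.

```python
import multiprocessing
import os


def resolve_worker_count(requested: int) -> int:
    """Mirror Tornado's convention: 0 means one worker per CPU core."""
    if requested > 0:
        return requested
    return os.cpu_count() or 1


def serve(port: int) -> None:
    """Run one aiohttp server in this process (blocks until stopped)."""
    # Imported here so every worker process builds its own app and loop.
    from aiohttp import web

    async def hello(request):  # hypothetical handler for illustration
        return web.Response(text=f"served by pid {os.getpid()}")

    app = web.Application()
    app.router.add_get("/", hello)
    # reuse_port=True requests SO_REUSEPORT so that several processes
    # can bind the same port (assumption: the OS supports it).
    web.run_app(app, port=port, reuse_port=True)


def start_workers(port: int = 8888, requested: int = 0) -> list:
    """Start the worker processes and return them without waiting."""
    workers = [
        multiprocessing.Process(target=serve, args=(port,))
        for _ in range(resolve_worker_count(requested))
    ]
    for worker in workers:
        worker.start()
    return workers
```

Calling start_workers() from the main script and then joining the returned processes would give roughly the behaviour of Tornado's server.start(0), without a supervisor restarting crashed workers.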
I am using the Python Flask framework for an object detection task. I have set threaded=True and multiple requests are handled well. As the detection process uses a lot of processing power and time, I need to cap the number of background threads. As far as I know, the OS manages the number of threads. But I need to limit the thread count to 4 or 5 and return a "server busy" result when more requests arrive than that. How can I achieve this?
With threaded=True, the number of threads depends on the system configuration. You need a production server such as gunicorn for this, because Flask's built-in server is not officially supported for production use. Limiting the thread count from within Flask itself is also very hard to do.
What exactly does passing threaded = True to app.run() do?
My application processes input from the user, and takes a bit of time to do so. During this time, the application is unable to handle other requests. I have tested my application with threaded=True and it allows me to handle multiple requests concurrently.
As of Flask 1.0, the WSGI server included with Flask is run in threaded mode by default.
Prior to 1.0, or if you disable threading, the server is run in single-threaded mode, and can only handle one request at a time. Any parallel requests will have to wait until they can be handled, which can lead to issues if you tried to contact your own server from a request.
With threaded=True, each request is handled in a new thread. How many threads your server can handle concurrently depends entirely on your OS and what limits it sets on the number of threads per process. The implementation uses the SocketServer.ThreadingMixIn class (socketserver.ThreadingMixIn in Python 3), which sets no limit on the number of threads it can spin up.
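To see what thread-per-request means in practice, here is a small standard-library sketch built on the same mix-in. It is not Flask's or Werkzeug's actual code; ThreadedHTTPServer, Handler, fetch and demo are names chosen for this example. Each response body is the name of the thread that handled the request.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer
from socketserver import ThreadingMixIn


class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    daemon_threads = True  # do not block interpreter exit on open requests


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Reply with the name of the thread serving this request.
        body = threading.current_thread().name.encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the example quiet
        pass


def fetch(port: int) -> str:
    conn = HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("GET", "/")
    data = conn.getresponse().read().decode()
    conn.close()
    return data


def demo() -> list:
    server = ThreadedHTTPServer(("127.0.0.1", 0), Handler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    names = [fetch(server.server_address[1]) for _ in range(3)]
    server.shutdown()
    server.server_close()
    return names
```

Every name returned by demo() belongs to a worker thread created by the mix-in, never to the main thread, and nothing in the mix-in caps how many such threads exist at once.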
Note that the Flask server is designed for development only. It is not a production-ready server. Don't rely on it to run your site on the wider web. Use a proper WSGI server (like gunicorn or uWSGI) instead.
How many requests will my application be able to handle concurrently with this statement?
This depends drastically on your application. Each new request gets its own thread, so the answer depends on how many threads your machine can handle. I don't see an option to limit the number of threads (like uwsgi offers in a production deployment).
What are the downsides to using this? If I'm not expecting more than a few concurrent requests, can I just continue to use this?
Switching from a single thread to multi-threaded can lead to concurrency bugs... if you use this be careful about how you handle global objects (see the g object in the documentation!) and state.
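As an illustration of the kind of bug meant here, consider a module-level counter updated by every request. The increment is a read-modify-write and is not atomic across threads, so a lock is one common way to protect it. A minimal sketch using plain threads in place of real requests; handle_request and simulate are hypothetical names:

```python
import threading

counter = 0  # shared global state, like a module-level variable in an app
counter_lock = threading.Lock()


def handle_request() -> None:
    """Hypothetical request handler that updates shared state."""
    global counter
    # Without the lock, two threads could read the same old value
    # and one of the increments would be lost.
    with counter_lock:
        counter += 1


def simulate(requests: int = 1000) -> int:
    """Run one thread per simulated request and return the final count."""
    threads = [threading.Thread(target=handle_request) for _ in range(requests)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter
```

Per-request data is usually better kept out of globals altogether, which is what Flask's g object is for.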
I have inherited a rather large code base that utilizes tornado to compute and serve big and complex data-types (imagine a 1 MB XML file). Currently there are 8 instances of tornado running to compute and serve this data.
That was the wrong design decision from the start, and I am facing a large number of timeouts from applications that access the servers.
I'd like to change as few lines of code as possible in the legacy code base because I do not want to break anything that has already been tested in the field. What can I do to transform this system into a threaded one that can execute more xml-computation in parallel?
transform this system into a threaded one that can execute more xml-computation in parallel
If there are enough Tornado instances to saturate the computational resources, moving to a threaded model will probably not gain much performance. Getting rid of blocking code however helps with connection timeouts.
Another option is getting rid of all asynchronous code and using tornado.wsgi.WSGIApplication. That way, you can run the application on a threaded WSGI server. Features that are not available in WSGI mode are listed here.
Use Tornado to just receive non-blocking requests. To do the actual XML processing you can then spawn another process or use an async task processor like celery. Using celery would facilitate easy scaling of your system in future. In fact with this model you'll just need one Tornado instance.
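The idea of keeping the event loop free while the heavy work runs elsewhere can be sketched with the standard library alone. This uses asyncio and a thread pool instead of Tornado and celery, purely to keep the example self-contained; build_xml, handle and main are hypothetical names, and for genuinely CPU-bound work a ProcessPoolExecutor (or a task queue, as suggested above) would be the closer match.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)


def build_xml(n: int) -> str:
    """Hypothetical stand-in for the slow XML computation."""
    time.sleep(0.1)  # simulates blocking work
    return f"<result id='{n}'/>"


async def handle(n: int) -> str:
    loop = asyncio.get_running_loop()
    # The blocking call runs in the pool, so the event loop stays
    # free to accept and answer other requests in the meantime.
    return await loop.run_in_executor(executor, build_xml, n)


async def main() -> list:
    # Four "requests" are in flight at the same time.
    return await asyncio.gather(*(handle(n) for n in range(4)))
```

Running asyncio.run(main()) finishes in roughly the time of one computation rather than four, because none of the handlers blocks the loop.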
@Eren - I don't think the computational resources are being saturated. More likely, no more than 8 requests can be processed simultaneously because Tornado is currently serving requests in blocking mode.
I'm building an app with Flask, but I don't know much about WSGI and its HTTP base, Werkzeug. When I start serving a Flask application with gunicorn and 4 worker processes, does this mean that I can handle 4 concurrent requests?
I do mean concurrent requests, and not requests per second or anything else.
When running the development server, which is what you get by running app.run(), you get a single synchronous process (before Flask 1.0, or with threading disabled), which means at most 1 request is being processed at a time.
By sticking Gunicorn in front of it in its default configuration and simply increasing the number of --workers, what you get is essentially a number of processes (managed by Gunicorn) that each behave like the app.run() development server. 4 workers == 4 concurrent requests. This is because Gunicorn uses its included sync worker type by default.
It is important to note that Gunicorn also includes asynchronous workers, namely eventlet and gevent (and also tornado, but that's best used with the Tornado framework, it seems). By specifying one of these async workers with the --worker-class flag, what you get is Gunicorn managing a number of async processes, each of which manages its own concurrency. These processes don't use threads, but coroutines instead. Basically, within each process, still only 1 thing can be happening at a time (1 thread), but handlers can be 'paused' while they are waiting on external processes to finish (think database queries or waiting on network I/O).
This means, if you're using one of Gunicorn's async workers, each worker can handle many more than a single request at a time. Just how many workers is best depends on the nature of your app, its environment, the hardware it runs on, etc. More details can be found on Gunicorn's design page and notes on how gevent works on its intro page.
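The "paused while waiting" behaviour can be shown with a few lines of asyncio. Note that gevent and eventlet use greenlets rather than asyncio coroutines, so this is an analogy for the scheduling model, not Gunicorn's worker code; handle_request and serve_batch are names made up for the sketch.

```python
import asyncio
import threading


async def handle_request(n: int, log: list) -> None:
    log.append(("start", n, threading.current_thread().name))
    # Stands in for a database query or network I/O; while this
    # handler waits, the loop switches to another one.
    await asyncio.sleep(0.1)
    log.append(("done", n, threading.current_thread().name))


async def serve_batch(count: int = 5) -> list:
    """Run count handlers concurrently on a single thread."""
    log = []
    await asyncio.gather(*(handle_request(n, log) for n in range(count)))
    return log
```

The log from asyncio.run(serve_batch(5)) shows all five handlers starting before any of them finishes, and every entry carries the same thread name: many requests in flight, one thread doing the work.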
Currently there is a far simpler solution than the ones already provided. When running your application you just have to pass along the threaded=True parameter to the app.run() call, like:
app.run(host="your.host", port=4321, threaded=True)
Another option as per what we can see in the werkzeug docs, is to use the processes parameter, which receives a number > 1 indicating the maximum number of concurrent processes to handle:
threaded – should the process handle each request in a separate thread?
processes – if greater than 1 then handle each request in a new process up to this maximum number of concurrent processes.
Something like:
app.run(host="your.host", port=4321, processes=3) #up to 3 processes
More info on the run() method here, and the blog post that led me to find the solution and api references.
Note: on the Flask docs on the run() methods it's indicated that using it in a Production Environment is discouraged because (quote): "While lightweight and easy to use, Flask’s built-in server is not suitable for production as it doesn’t scale well."
However, they do point to their Deployment Options page for the recommended ways to do this when going for production.
Flask will process one request per thread at the same time. If you have 2 processes with 4 threads each, that's 8 concurrent requests.
Flask doesn't spawn or manage threads or processes. That's the responsibility of the WSGI gateway (e.g. gunicorn).
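With gunicorn, the 2 processes with 4 threads each could be expressed in a configuration file, which is itself plain Python. A sketch, assuming a file passed with gunicorn -c; the file name is hypothetical, and max_concurrent_requests is not a Gunicorn setting, it only spells out the arithmetic:

```python
# gunicorn.conf.py (hypothetical file name, passed via `gunicorn -c`)
workers = 2  # number of worker processes
threads = 4  # threads per worker; a value above 1 selects the threaded worker

# Not a Gunicorn setting: just the arithmetic from the answer above.
max_concurrent_requests = workers * threads
```

The same values can be given on the command line with --workers 2 --threads 4.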
No, you can definitely handle more than that.
It's important to remember that deep down, assuming you are running a single-core machine, the CPU really only runs one instruction* at a time.
Namely, the CPU can only execute a very limited set of instructions, and it can't execute more than one instruction per clock tick (many instructions even take more than 1 tick).
Therefore, most concurrency we talk about in computer science is software concurrency.
In other words, there are layers of software implementation that abstract the bottom level CPU from us and make us think we are running code concurrently.
These "things" can be processes, which are units of code that run concurrently in the sense that each process thinks it's running in its own world with its own, non-shared memory.
Another example is threads, which are units of code inside processes that allow concurrency as well.
The reason your 4 worker processes will be able to handle more than 4 requests is that they will fire off threads to handle more and more requests.
The actual request limit depends on HTTP server chosen, I/O, OS, hardware, network connection etc.
Good luck!
*instructions are the very basic commands the CPU can run. examples - add two numbers, jump from one instruction to another