I have to solve this issue: i have a 4G/4Core Ubuntu server 18.04LTS based machine with my django application, postgres, redis, and some other services that runs off my web application.
The nature of the application is quite simple: every 5 minutes i have to collect data and save it to my db from remote devices via SNMP protocol, the total number of devices is around 300.
I want to start this task without slowing down my application, so i've offloaded this task, i've tried both Threads, Multiprocessing and Django-RQ/python-rq.
Now, here are the results:
Threads: 1800 seconds
Multiprocessing: 180 seconds (using fork method as default context, but i'm having random deadlocks, so is totally unusable after all)
Python-RQ and Django-RQ with 12 workers: 180/200 seconds
Now, it seems that my operations are all I/O bound, so, excluding the multiprocessing approach, the background worker with Python-rq seems effective, but i'm experiencing that it eats up all my memory.
You guys, have a solution? i mean: the multiprocessing way using fork as default context is totally unusable due to the random deadlock..but the python-rq/django-rq solution is functional and heavy!
How i can use the multiprocessing module with spawn as default context, load Django and then feed the created process with the tasks to execute?
You have some alternatives?
Related
I want to use ThreadPoolExecutor on a webapp (django),
All examples that I saw are using the thread pool like that:
with ThreadPoolExecutor(max_workers=1) as executor:
code
I tried to store the thread pool as a class member of a class and to use map fucntion
but I got memory leak, the only way I could use it is by the with notation
so I have 2 questions:
Each time I run with ThreadPoolExecutor does it creates threads again and then release them, in other word is this operation is expensive?
If I avoid using with how can I release the memory of the threads
thanks
Normally, web applications are stateless. That means every object you create should live in a request and die at the end of the request. That includes your ThreadPoolExecutor. Having an executor at the application level may work, but it will be embedded into your web application instead of running as a separate group of processes.
So if you want to take the workers down or restart them, your web app will have to restart as well.
And there will be stability concerns, since there is no main process watching over child processes detecting which one has gotten stale, so requires a lot of code to get multiprocessing right.
Alternatively, If you want a persistent group of processes to listen to a job queue and run your tasks, there are several projects that do that for you. All you need to do is to set up a server that takes care of queueing and locking such as redis or rabbitmq, then point your project at that server and start the workers. Some projects even let you use the database as a job queue backend.
I'm doing some metric analysis on on my web app, which makes extensive use of celery. I have one metric which measures the full trip from a post_save signal through a celery task (which itself calls a number of different celery tasks) to the end of that task. I've been hitting the server with up to 100 requests in 5 seconds.
What I find interesting is that when I hit the server with hundreds of requests (which entails thousands of celery worker processes being queued), the time it takes for the trip from post save to the end of the main celery task increases significantly, even though I never do any additional database calls, and none of the celery tasks should be blocking the main task.
Could the fact that there are so many celery tasks in the queue when I make a bunch of requests really quickly be slowing down the logic in my post_save function and main celery task? That is, could the processing associated with getting the sub-tasks that the main celery task creates onto a crowded queue be having a significant impact on the time it takes to reach the end of the main celery task?
It's impossible to really answer your question without an in-depth analysis of your actual code AND benchmark protocol, and while having some working experience with Python, Django and Celery I wouldn't be able to do such an in-depth analysis. Now there are a couple very obvious points :
if your workers are running on the same computer as your Django instance, they will compete with Django process(es) for CPU, RAM and IO.
if the benchmark "client" is also running on the same computer then you have a "heisenbench" case - bombing a server with 100s of HTTP request per second also uses a serious amount of resources...
To make a long story short: concurrent / parallel programming won't give you more processing power, it will only allow you to (more or less) easily scale horizontally.
I'm not sure about slowing down, but it can cause your application to hang. I've had this problem where one application would backup several other queues with no workers. My application could then no longer queue messages.
If you open up a django shell and try to queue a task. Then hit ctrl+c. I can't quite remember what the stack trace should be, but if you post it here I could confirm it.
I have some spider that download pages and store data in database. I have created flask application with admin panel (by Flask-Admin extension) that show database.
Now I want append function to my flask app for control spider state: switch on/off.
I thing it posible by threads or multiprocessing. Celery is not good decision because total program must use minimum memory.
Which method to choose for implementation this function?
Discounting Celery based on memory usage would probably be a mistake, as Celery has low overhead in both time and space. In fact, using Celery+Flask does not use much more memory than using Flask alone.
In addition Celery comes with several choices you can make that can have an impact
on the amount of memory used. For example, there are 5 different pool implementations that all have different strengths and trade-offs, the pool choices are:
multiprocessing
By default Celery uses multiprocessing, which means that it will spawn child processes
to offload work to. This is the most memory expensive option - simply because
every child process will duplicate the amount of base memory needed.
But Celery also comes with an autoscale feature that will kill off worker
processes when there's little work to do, and spawn new processes when there's more work:
$ celeryd --autoscale=0,10
where 0 is the mininum number of processes, and 10 is the maximum. Here celeryd will
start off with no child processes, and grow based on load up to a maximum of 10 processes. When load decreases, so will the number of worker processes.
eventlet/gevent
When using the eventlet/gevent pools only a single process will be used, and thus it will
use a lot less memory, but with the downside that tasks calling blocking code will
block other tasks from executing. If your tasks are mostly I/O bound you should be ok,
and you can also combine different pools and send problem tasks to a multiprocessing pool instead.
threads
Celery also comes with a pool using threads.
The development version that will become version 2.6 includes a lot of optimizations,
and there is no longer any need for the Flask-Celery extension module. If you are not going
into production in the next days then I would encourage you to try the development version
which must be installed like this:
$ pip install https://github.com/ask/kombu/zipball/master
$ pip install https://github.com/ask/celery/zipball/master
The new API is now also Flask inspired, so you should read the new getting started guide:
http://ask.github.com/celery/getting-started/first-steps-with-celery.html
With all this said, most optimization work has been focused on execution speed so far,
and there is probably many more memory optimizations that can be made. It has not been a request so far, but in the unlikely event that Celery does not match your memory constraints, you can open up an issue at our bug tracker and I'm sure it will get focus, or you can even help us to do so.
You could hypervize the process using multiprocess or subprocess, then just hand the handle round the session.
I am working on a web backend that frequently grabs realtime market data from the web, and puts the data in a MySQL database.
Currently I have my main thread push tasks into a Queue object. I then have about 20 threads that read from that queue, and if a task is available, they execute it.
Unfortunately, I am running into performance issues, and after doing a lot of research, I can't make up my mind.
As I see it, I have 3 options:
Should I take a distributed task approach with something like Celery?
Should I switch to JPython or IronPython to avoid the GIL issues?
Or should I simply spawn different processes instead of threads using processing?
If I go for the latter, how many processes is a good amount? What is a good multi process producer / consumer design?
Thanks!
Maybe you should use an event-driven approach, and use an event-driven oriented frameworks like twisted(python) or node.js(javascript), for example this frameworks make use of the UNIX domain sockets, so your consumer listens at some port, and your event generator object pushes all the info to the consumer, so your consumer don't have to check every time to see if there's something in the queue.
First, profile your code to determine what is bottlenecking your performance.
If each of your threads are frequently writing to your MySQL database, the problem may be disk I/O, in which case you should consider using an in-memory database and periodically write it to disk.
If you discover that CPU performance is the limiting factor, then consider using the multiprocessing module instead of the threading module. Use a multiprocessing.Queue object to push your tasks. Also make sure that your tasks are big enough to keep each core busy for a while, so that the granularity of communication doesn't kill performance. If you are currently using threading, then switching to multiprocessing would be the easiest way forward for now.
I have a python (well, it's php now but we're rewriting) function that takes some parameters (A and B) and compute some results (finds best path from A to B in a graph, graph is read-only), in typical scenario one call takes 0.1s to 0.9s to complete. This function is accessed by users as a simple REST web-service (GET bestpath.php?from=A&to=B). Current implementation is quite stupid - it's a simple php script+apache+mod_php+APC, every requests needs to load all the data (over 12MB in php arrays), create all structures, compute a path and exit. I want to change it.
I want a setup with N independent workers (X per server with Y servers), each worker is a python app running in a loop (getting request -> processing -> sending reply -> getting req...), each worker can process one request at a time. I need something that will act as a frontend: get requests from users, manage queue of requests (with configurable timeout) and feed my workers with one request at a time.
how to approach this? can you propose some setup? nginx + fcgi or wsgi or something else? haproxy? as you can see i'am a newbie in python, reverse-proxy, etc. i just need a starting point about architecture (and data flow)
btw. workers are using read-only data so there is no need to maintain locking and communication between them
The typical way to handle this sort of arrangement using threads in Python is to use the standard library module Queue. An example of using the Queue module for managing workers can be found here: Queue Example
Looks like you need the "workers" to be separate processes (at least some of them, and therefore might as well make them all separate processes rather than bunches of threads divided into several processes). The multiprocessing module in Python 2.6 and later's standard library offers good facilities to spawn a pool of processes and communicate with them via FIFO "queues"; if for some reason you're stuck with Python 2.5 or even earlier there are versions of multiprocessing on the PyPi repository that you can download and use with those older versions of Python.
The "frontend" can and should be pretty easily made to run with WSGI (with either Apache or Nginx), and it can deal with all communications to/from worker processes via multiprocessing, without the need to use HTTP, proxying, etc, for that part of the system; only the frontend would be a web app per se, the workers just receive, process and respond to units of work as requested by the frontend. This seems the soundest, simplest architecture to me.
There are other distributed processing approaches available in third party packages for Python, but multiprocessing is quite decent and has the advantage of being part of the standard library, so, absent other peculiar restrictions or constraints, multiprocessing is what I'd suggest you go for.
There are many FastCGI modules with preforked mode and WSGI interface for python around, the most known is flup. My personal preference for such task is superfcgi with nginx. Both will launch several processes and will dispatch requests to them. 12Mb is not as much to load them separately in each process, but if you'd like to share data among workers you need threads, not processes. Note, that heavy math in python with single process and many threads won't use several CPU/cores efficiently due to GIL. Probably the best approach is to use several processes (as much as cores you have) each running several threads (default mode in superfcgi).
The most simple solution in this case is to use the webserver to do all the heavy lifting. Why should you handle threads and/or processes when the webserver will do all that for you?
The standard arrangement in deployments of Python is:
The webserver start a number of processes each running a complete python interpreter and loading all your data into memory.
HTTP request comes in and gets dispatched off to some process
Process does your calculation and returns the result directly to the webserver and user
When you need to change your code or the graph data, you restart the webserver and go back to step 1.
This is the architecture used Django and other popular web frameworks.
I think you can configure modwsgi/Apache so it will have several "hot" Python interpreters
in separate processes ready to go at all times and also reuse them for new accesses
(and spawn a new one if they are all busy).
In this case you could load all the preprocessed data as module globals and they would
only get loaded once per process and get reused for each new access. In fact I'm not sure this isn't the default configuration
for modwsgi/Apache.
The main problem here is that you might end up consuming
a lot of "core" memory (but that may not be a problem either).
I think you can also configure modwsgi for single process/multiple
thread -- but in that case you may only be using one CPU because
of the Python Global Interpreter Lock (the infamous GIL), I think.
Don't be afraid to ask at the modwsgi mailing list -- they are very
responsive and friendly.
You could use nginx load balancer to proxy to PythonPaste paster (which serves WSGI, for example Pylons), that launches each request as separate thread anyway.
Another option is a queue table in the database.
The worker processes run in a loop or off cron and poll the queue table for new jobs.