I am aware of the memory limitations of the Heroku platform, and I know that it is far more scalable to separate an app into web and worker dynos. However, I still would like to run asynchronous tasks alongside the web process for testing purposes. Dynos are costly and I would like to prototype on the free instance that Heroku provides.
Are there any issues with spawning a new job as a process or subprocess in the same dyno as a web process?
On the newer Cedar stack, there are no issues with spawning multiple processes. Each dyno is an isolated Linux container rather than a full virtual machine, and it has no particular limitations except in memory and CPU usage (about 512 MB of memory, I think, and 1 CPU core). Following the newer installation instructions for some stacks, such as Python, will give you a configuration with multiple (web server) processes out of the box.
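As a minimal sketch of what "spawning a job as a subprocess" from a web process can look like, the standard `subprocess` module can launch work in a separate OS process and return immediately. The helper name `start_background_job` is hypothetical, not a Heroku or framework API, and the job here is just an inline Python snippet standing in for real work.

```python
import subprocess
import sys


def start_background_job(script: str) -> subprocess.Popen:
    """Launch a Python snippet in a separate OS process and return at once.

    Hypothetical helper: a web request handler could call this and respond
    to the client without waiting for the job to finish.
    """
    return subprocess.Popen(
        [sys.executable, "-c", script],  # same interpreter as the web process
        stdout=subprocess.DEVNULL,       # discard output in this sketch
        stderr=subprocess.DEVNULL,
    )


if __name__ == "__main__":
    job = start_background_job("import time; time.sleep(1)")
    print("request handled; background job pid:", job.pid)
    job.wait()  # demo only; a real handler would not block here
```

The parent should eventually call `wait()` or `poll()` on the returned `Popen` object so a finished child does not linger as a zombie process, and any such child dies along with the dyno when it is restarted or idled.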
Software installed on web dynos may vary depending on what buildpack you are using; if your subprocesses need special software then you may have to either bundle it with your application or (better) roll your own buildpack.
At this point I would normally remind you that running asynchronous tasks on worker dynos instead of web dynos, with a proper task queue system, is strongly encouraged, but it sounds like you know that already. Do keep in mind that accounts with only one web dyno (typically "free" accounts) will have that dyno spun down after an hour or so without web requests, and any background processes running on the dyno at that point will necessarily be killed. Accounts with multiple web dynos are not subject to this restriction.
I have a Python web application (using WSGI) deployed on OpenShift. The application is quite memory-hungry. I have noticed that several instances of the Apache httpd service are running at all times. That means the memory usage of my gear is multiplied by the number of these processes, and the application crashes pretty often.
I don't have much traffic yet, so there is no need for multiple httpd processes.
Is there any way to configure Python cartridge to limit it to a single httpd process?
If you are using the OpenShift Python cartridge and its default setup, only two of those processes should actually have copies of your application running in them. The other httpd processes are the parent monitor process and the Apache child worker processes, which proxy requests to the processes that are actually running your web application.
If you need to reduce it to a single process, you would need to follow:
http://blog.dscpl.com.au/2015/01/using-alternative-wsgi-servers-with.html
to override the standard setup and use mod_wsgi-express instead. This defaults to a single process for your application and lets you control both the number of processes and the number of threads for the application processes.
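To check how many processes are actually serving your application, a minimal sketch is a tiny WSGI entry point that reports the process ID and thread handling each request. Assuming it is saved as `wsgi.py` (a hypothetical file name), it could be served with something like `mod_wsgi-express start-server wsgi.py --processes 1 --threads 5`; run `mod_wsgi-express start-server --help` to confirm the options your version supports. Repeated requests should then all report the same `pid`.

```python
import os
import threading


def application(environ, start_response):
    """Minimal WSGI app reporting which process and thread served the request."""
    body = (
        f"pid={os.getpid()} thread={threading.current_thread().name}\n"
    ).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```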
If you are seeing a lot of memory use, it could simply be your application code, or there is an outside chance you are hitting memory issues caused by an older mod_wsgi version, since some odd corner cases can cause extra memory usage because of how Apache works. mod_wsgi-express uses the latest version and avoids those problems.
So try mod_wsgi-express, and if you still have memory issues, I suggest getting on the mod_wsgi mailing list for help debugging it.
I am new to Celery. I know how to install and run one server, but I need to distribute tasks across multiple machines.
My project uses Celery to take user requests coming in through a web framework, hand them off to different machines, and then return the results.
I read the documentation, but it doesn't mention how to set up multiple machines.
What am I missing?
My understanding is that your app will push requests into a queueing system (e.g. RabbitMQ), and then you can start any number of workers on different machines (with access to the same code as the app that submitted the task). They will pull tasks from the message queue and work on them. Once they're done, they will update the tombstone (result) database.
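The producer/worker pattern described above can be illustrated with a small in-process analogy. This is not Celery: threads stand in for worker machines, a standard-library `queue.Queue` stands in for the RabbitMQ broker, and squaring a number is a hypothetical placeholder for a real task.

```python
import queue
import threading


def worker(name, tasks, results):
    """Stand-in for one worker machine: pull tasks until a stop sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:  # sentinel: no more work for this worker
            tasks.task_done()
            break
        results.put((name, item, item * item))  # placeholder "task"
        tasks.task_done()


def run(jobs, n_workers=3):
    tasks = queue.Queue()    # plays the role of the message broker
    results = queue.Queue()  # plays the role of the result store
    threads = [
        threading.Thread(target=worker, args=(f"worker-{i}", tasks, results))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for job in jobs:         # the "web app" submits jobs
        tasks.put(job)
    for _ in threads:        # one stop sentinel per worker, queued after all jobs
        tasks.put(None)
    for t in threads:
        t.join()
    collected = []
    while not results.empty():
        collected.append(results.get())
    return collected


if __name__ == "__main__":
    for name, job, value in run(range(6)):
        print(name, job, value)
```

Whichever worker is free takes the next job; nothing in the producer needs to know how many workers exist, which is the same reason extra Celery workers on other machines need no special setup.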
The upshot of this is that you don't have to do anything special to start multiple workers. Just start them on separate identical (same source tree) machines.
The server which has the message queue need not be the same as the one with the workers and needn't be the same as the machines which submit jobs. You just need to put the location of the message queue in your celeryconfig.py and all the workers on all the machines can pick up jobs from the queue to perform tasks.
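As a sketch of what that shared configuration might contain, here is a `celeryconfig.py` that every machine (the web app and each worker) would use unchanged. The host, credentials, virtual host, and the `tasks` module name are hypothetical placeholders. The lowercase setting names are those used by Celery 4 and later; older releases used uppercase names such as `BROKER_URL`.

```python
# celeryconfig.py: shared by the web app and every worker machine.
# All host names, credentials and module names below are hypothetical.

# Location of the message queue (RabbitMQ over AMQP). Every machine points
# at the same broker, which is what lets workers anywhere pick up jobs.
broker_url = "amqp://myuser:mypassword@queue.example.com:5672/myvhost"

# Where task results are stored; "rpc://" sends them back over AMQP.
result_backend = "rpc://"

# Modules that workers import so they can find the task definitions.
imports = ("tasks",)
```

The application would load it with `app.config_from_object("celeryconfig")`, and each worker machine would run something like `celery -A tasks worker --loglevel=info` against the same source tree.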
The way I deployed it is like this:
Clone your Django project onto a Heroku instance (this will run the frontend).
Add RabbitMQ as an add-on and configure it.
Clone your Django project onto another Heroku instance (call it something like worker) where you will run the Celery tasks.
I've been able to deploy a test application using Pyramid with pserve and running pceleryd (I just send an email without blocking while it is being sent).
But there's one point I don't understand: I want to run my application with mod_wsgi, and I don't know whether I can avoid running pceleryd from a shell by configuring something in the virtual host instead.
Is it possible? How?
There are technically ways you could use Apache/mod_wsgi to manage a process distinct from that handling web requests, but the pain point is that Celery will want to fork off further worker processes. Forking further processes from a process managed by Apache can cause problems at times and so is not recommended.
You are thus better off starting the Celery process separately. One option is to use supervisord to start and manage it.
I have written a Django app that makes use of Python threading to create a web spider, the spider operates as a series of threads to check links.
When I run this app using Django's built-in development server, it runs fine and the threads seem to start and stop on time.
However, when running the app under Apache, the threads don't seem to kick off and run (after about 80 seconds there should be a queued database update, and these changes aren't occurring).
Does anyone have an idea what I'm missing here?
-- Edit: My question is, how does Apache handle threaded applications, i.e. is there a limit on how many threads can be run from a single app?
Any help would be appreciated!
Most likely, you are missing the creation of new processes. Apache does not run in a single process; it forks new processes to handle requests every now and then (depending on a dozen or so configuration parameters). If you run Django in each process, they will share no memory, and the results produced in one worker won't be visible to any of the others. In addition, the Apache process might terminate (when idle, or after a certain time), discarding your in-memory results.
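The "no shared memory" point can be shown with a short sketch: a forked child appends to a module-level list, but the parent's copy of that list is unchanged. It explicitly requests the `fork` start method (POSIX only, unavailable on Windows) to mimic how Apache's multi-process setups create workers. `RESULTS`, `record` and `run_in_child` are hypothetical names for this illustration.

```python
import multiprocessing as mp

RESULTS = []  # module-level "in-memory" results, like a spider's link list


def record(url):
    """Runs in the child: modifies the child's private copy of RESULTS."""
    RESULTS.append(url)


def run_in_child(url):
    ctx = mp.get_context("fork")  # POSIX only; copies the parent's memory
    p = ctx.Process(target=record, args=(url,))
    p.start()
    p.join()
    return p.exitcode


if __name__ == "__main__":
    run_in_child("http://example.com/")
    print(RESULTS)  # the child's append never reaches the parent
```

Results that must outlive a request or be visible to other processes therefore need to be written somewhere shared, such as the database, rather than kept in process memory.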