Using Celery with Pyramid and mod_wsgi - Python

I've been able to deploy a test application using Pyramid with pserve and running pceleryd (I just send an email without blocking while it is sent).
But there's one point I don't understand: I want to run my application with mod_wsgi, and I'm not sure whether I can do that without having to start pceleryd from a shell, or whether I can do something in the virtual host configuration instead.
Is it possible? How?

There are technically ways you could use Apache/mod_wsgi to manage a process distinct from those handling web requests, but the pain point is that Celery will want to fork off further worker processes. Forking further processes from a process managed by Apache can cause problems at times and so is not recommended.
You are thus better off starting the Celery process separately. One option is to use supervisord to start it up and manage it, as in the sketch below.
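For example (this is only a sketch, not from the original answer), a minimal supervisord program entry for the worker might look like the following; the paths, user, and ini file name are placeholders for your own deployment:

    ; /etc/supervisor/conf.d/celery.conf -- hypothetical paths and names
    [program:celery]
    ; pceleryd comes from pyramid_celery and takes your Pyramid .ini file
    command=/path/to/venv/bin/pceleryd /path/to/production.ini
    directory=/path/to/app
    user=www-data
    autostart=true
    autorestart=true
    redirect_stderr=true
    stdout_logfile=/var/log/celery/worker.log

supervisord then starts the worker at boot and restarts it if it dies, completely independently of Apache/mod_wsgi.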

Related

Running APScheduler in Gunicorn Without Duplicating Per Worker

The title basically says it all. I have gunicorn running my app with 5 workers. I have a data structure that all the workers need access to, and it is updated on a schedule by APScheduler. Currently APScheduler runs once per worker, but I want it to run just once, period. Is there a way to do this? I've tried the --preload option, which lets me load the shared data structure just once, but it doesn't seem to let the workers see it when it updates. I'm open to switching to uWSGI if that helps.
I'm not aware of any way to do this with either server, at least not without some sort of RPC. That is, run APScheduler in a separate process and then connect to it from each worker. You may want to look at projects like RPyC and Execnet for that.
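For what it's worth, a rough sketch of that separate-process approach using RPyC might look like the following. Everything here (the service name, the port, the refresh job) is made up for illustration; the point is just that one process owns the scheduler and the data, and each gunicorn worker talks to it over RPC.

    # scheduler_service.py -- run as its own process, outside gunicorn (illustrative)
    import time
    import rpyc
    from rpyc.utils.server import ThreadedServer
    from apscheduler.schedulers.background import BackgroundScheduler

    shared_data = {}  # the single copy of the data structure

    def refresh():
        # updated on a schedule by APScheduler, in this one process only
        shared_data["updated_at"] = time.time()

    class SharedDataService(rpyc.Service):
        def exposed_get_data(self):
            return dict(shared_data)  # returned to the caller over the RPC connection

    if __name__ == "__main__":
        scheduler = BackgroundScheduler()
        scheduler.add_job(refresh, "interval", minutes=5)
        scheduler.start()
        ThreadedServer(SharedDataService, port=18861).start()

Each gunicorn worker would then do something like conn = rpyc.connect("localhost", 18861) and read conn.root.get_data() whenever it needs the current structure.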

OpenShift Python multiple httpd instances

I have a Python web application (using WSGI) deployed on Openshift. The application is quite memory greedy. What I have noticed is that there are several instance of Apache httpd service deployed at all times. That means the memory usage of my gear is multiplied by the number of these processes and the application crashes pretty often.
I don't have lots of traffic yet, so there is no need to have multiple httpd running.
Is there any way to configure Python cartridge to limit it to a single httpd process?
If you are using the OpenShift Python cartridge and its default setup, only two of those processes should actually have copies of your application running in them. The other httpd processes are the parent monitor process and the Apache child worker processes, which proxy requests to the processes that are actually running your web application.
If you need to reduce that down to one application process, then you would need to follow:
http://blog.dscpl.com.au/2015/01/using-alternative-wsgi-servers-with.html
to override the standard setup and use mod_wsgi-express instead. This will default to using one process for your application and allow you to control both number of processes and threads for the application processes.
If you are seeing lots of memory use, it could just be your application code, or there is an outside chance you are hitting memory issues due to use of an older mod_wsgi, as there are some odd corner cases which can cause extra memory usage because of how Apache works. If you use mod_wsgi-express it will use the latest version and avoid those problems.
So try mod_wsgi-express, and if you still have memory issues, I suggest you get on the mod_wsgi mailing list for help debugging it.
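For reference, and purely as an assumption about what a minimal setup might look like rather than OpenShift-specific instructions (the linked post covers the cartridge wiring), mod_wsgi-express is started with a command along these lines, with the script name and counts as placeholders:

    mod_wsgi-express start-server wsgi.py --processes 1 --threads 5

That runs a single application process with five request threads.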

Python start and manage external processes from Django

I'm in need of a way to execute external long-running processes from a web app written in Django and Python.
Right now I'm using Supervisord and its API. My problem with this solution is that it's very static: I need to build the commands from my app instead of preconfiguring Supervisord with every possible command, because both the command and its arguments are dynamic.
I need to execute the external process, save a pid/identifier, and later be able to check whether it's still alive and running, and to stop it.
I've found https://github.com/mnaberez/supervisor_twiddler to add processes on the fly to supervisord. Maybe that's the best way to go?
Any other ideas how to best solve this problem?
I suggest you have a look at this post:
Processing long-running Django tasks using Celery + RabbitMQ + Supervisord + Monit
As the title says, there are a few additional components involved (mainly Celery and RabbitMQ), but these are good and proven technologies for this kind of requirement; a rough sketch of how the pieces fit together follows.
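To make that concrete, a hedged sketch of what such a Celery task might look like is below. The app name, broker URL, task body, and the idea of using the task id as your identifier are all illustrative assumptions rather than anything prescribed by the linked post.

    # tasks.py -- illustrative only; app name and broker URL are placeholders
    import subprocess
    from celery import Celery

    app = Celery("jobs", broker="amqp://guest@localhost//")

    @app.task(bind=True)
    def run_command(self, command_args):
        # command_args is built dynamically by the Django app,
        # e.g. ["/usr/bin/long-job", "--input", "file.csv"]
        proc = subprocess.Popen(command_args)
        return proc.wait()  # exit code becomes the task result

From Django you would call run_command.delay([...]) and keep the returned task id; app.AsyncResult(task_id) later tells you whether the task is still running, and revoking it with terminate=True is one way to stop it.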

Running Celery in Django not as an external process?

I want to give Celery a try. I'm interested in a simple way to schedule crontab-like tasks, similar to Quartz in Spring.
I see from Celery's documentation that it requires running celeryd as a daemon process. Is there a way to avoid running another external process and simply run this embedded in my Django instance? Since I'm not interested in distributing the work at the moment, I'd rather keep it simple.
Add the CELERY_ALWAYS_EAGER = True option to your Django settings file and all your tasks will be executed locally, in the same process. It seems that for periodic tasks you still have to run celery beat as well; an illustrative settings sketch is below.
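For illustration, the relevant settings might look something like this (these are the old uppercase setting names from the celeryd era; the task path and interval are placeholders, and CELERYBEAT_SCHEDULE only matters if you do run celery beat):

    # settings.py (illustrative)
    from datetime import timedelta

    CELERY_ALWAYS_EAGER = True                  # run tasks synchronously, in-process
    CELERY_EAGER_PROPAGATES_EXCEPTIONS = True   # surface task errors immediately

    # Only used by "celery beat"; eager mode alone does not schedule anything.
    CELERYBEAT_SCHEDULE = {
        "send-report": {
            "task": "myapp.tasks.send_report",  # hypothetical task path
            "schedule": timedelta(hours=1),
        },
    }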

Threaded code on mod_python

I have written a Django app that uses Python threading to create a web spider; the spider operates as a series of threads that check links.
When I run this app using the built-in Django test server, the app runs fine and the threads seem to start and stop on time.
However, when running the app on Apache, it seems the threads aren't kicking off and running (after about 80 seconds there should be a queued database update, and these changes aren't occurring).
Does anyone have an idea what I'm missing here?
-- Edit: My question is, how does Apache handle threaded applications, i.e. is there a limit on how many threads can be run from a single app?
Any help would be appreciated!
Most likely, you are missing the creation of new processes. Apache does not run in a single process, but forks new processes for requests every now and then (depending on a dozen or so configuration parameters). If you run Django in each of those processes, they share no memory, so the results produced in one worker won't be visible to any of the others. In addition, an Apache process might terminate (on idle, or after a certain time), discarding your in-memory results. The toy example below illustrates the point.
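As a toy illustration (not part of the original answer): module-level state like the list below exists separately in every Apache child process, so a background thread started in one process only ever updates that process's copy, and the copy disappears when Apache recycles the process.

    # toy example: in-memory results are per-process under a forking server
    import os
    import threading

    results = []  # each Apache child process gets its own copy of this list

    def crawl_in_background(url):
        def worker():
            # visible only inside the process identified by os.getpid()
            results.append((os.getpid(), url))
        t = threading.Thread(target=worker)
        t.daemon = True
        t.start()

Writing the spider's results to the database from inside the threads, rather than accumulating them in memory, is the usual way around this.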
