For quite a long time I've wanted to start a pet project that will, in time, aim to
become a web hosting control panel, mainly focused on Python hosting --
meaning I would like to give users a way to generate/start Django and
other framework projects right from the panel. I seem to have
found the perfect tool to build my app with: CherryPy.
It would let me do it the way I want, building the app with its own HTTP/
HTTPS server, and all in my favorite programming language.
But now a new question arises: since CherryPy is a threaded server, will
it be the right choice for this kind of task?
There will be lots of time-consuming tasks, so if one of the
tasks blocks, the rest of the users trying to access other pages will
be left waiting and eventually time out.
I imagine that this kind of problem wouldn't happen on a fork based server.
What would you advise?
"Threaded" and "Fork based" servers are equivalent. A "threaded" server has multiple threads of execution, and if one blocks then the others will continue. A "Fork based" server has multiple processes executing, and if one blocks then the others will continue. The only difference is that threaded servers by default will share memory between the threads, "fork based" ones by default will not share memory.
One other point - the "subprocess" module is not thread-safe, so if you try to use it from CherryPy you will get weird errors. (This is Python Bug 1731717)
Related
I've developed a set of audio streaming servers, all of them using Twisted, and they are in Python, of course. They work, but a problem keeps troubling me: when I find a bug in the running server, or I want to add something to it, I need to stop it and start it again. Unlike HTTP servers, which are okay to restart at any time, that is not okay with audio streaming servers. Every time I restart my streaming server, my users get disconnected.
I did try to set up a manhole (an SSH service for Twisted servers; you can log in and type Python code in the console), connect to the console, and reload Python modules on the fly. It works sometimes, but it is hard to control. You never know how many instances of the old class are still alive in the server, some of them may be hard to reach, and the relationships between classes can be very complex. Also, it may work in some situations, but sometimes you really do need to restart the server; for example, if you are running it with the select reactor and want to switch to the epoll reactor, you have to restart. Another example: when memory usage grows too high, you have to restart, too.
To build such a system, an idea came into my head: is it possible to hand over the connections and data from one process to another? For example:
We have a server named Broadcasting; the running instance is at rev.123, and we want to replace it with rev.124.
Broadcasting rev.123 is running....
Start up Broadcasting rev.124....
Broadcasting rev.124 is standing by.
Hand over connections from the rev.123 instance to the rev.124 instance.
Stop the Broadcasting rev.123 instance.
Is this possible? I have no idea whether the lifetime of socket handles is bound to the process or not; I thought sockets created by a process are closed when the creating process is killed, but I'm not sure. If it is possible, are there any guidelines or articles on designing this kind of hot code swapping mechanism? And is there anything for Twisted that already does what I want?
Thanks.
I gave a talk about this at PyCon 2004. There's also some effort underway to add more functionality to Twisted itself to help with this.
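On the narrower question of whether sockets die with the process that created them: a live file descriptor can be handed to another running process over a Unix domain socket (SCM_RIGHTS), and the kernel keeps the connection open as long as some process holds it. A rough, hypothetical sketch -- the path and helper names are invented, and socket.send_fds/recv_fds need Python 3.9+ (older versions would use sendmsg with SCM_RIGHTS directly):

    import socket

    HANDOVER_PATH = "/tmp/broadcasting-handover.sock"   # assumed rendezvous point

    def send_client(client_sock):
        """Old process (rev.123): push one connected client fd to the new process."""
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as uds:
            uds.connect(HANDOVER_PATH)
            socket.send_fds(uds, [b"client"], [client_sock.fileno()])

    def receive_client(listener):
        """New process (rev.124): listener is a bound, listening AF_UNIX socket."""
        conn, _ = listener.accept()
        msg, fds, flags, addr = socket.recv_fds(conn, 1024, 1)
        return socket.socket(fileno=fds[0])   # same kernel socket, new owner process

You would still need to serialize the per-connection state (buffers, stream position, and so on) and send it alongside the descriptor.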
This question is related to this and this;
the difference is that I'd prefer something with possibly more precision and lower load (a per-minute cron job isn't preferable for this) and with minimal overhead (i.e. installing Celery with RabbitMQ seems like overkill).
An example task would be a personal reminders server (with reminders that can be edited over the web and sent out through e-mail or XMPP).
I'm probably looking for something more like node.js's setTimeout but for Django (and though I might prefer to implement reminders in node.js anyway, it's still a possibly interesting question).
For example, it's possible to start new threads in a Django app (with functions consisting of sleep() and send()); in what ways can this be bad?
The problems with using threads for this are the typical problems with Python threads that always drive people towards multi-process solutions instead. The issue is compounded here by the fact that your thread isn't driven by the normal request-response cycle. This is summarized nicely by Malcolm Tredinnick here:
Have to disagree. Threads are not a good solution to this problem. The
issue is process management. As written, your threads will never be
rejoined. Webserver processes have a lifecycle uncontrollable by you
(the MaxRequestsPerChild Apache parameter and similar things in other
servers) and you are messing with that by using threads.
If you need a process with a lifecycle that is not matched by the
request-response path — something long running and independent of the
response — a completely separate process is definitely the right model
to use. Using a thread is tying it to the response lifecycle, which
will have unintended side-effects.
A possible solution for you might be to have a long running process performing your tasks which gets a wake-up signal from a light cron process.
Another possibility would be to build something using 0mq, which is much lighter than AMQP-style queues (at the cost of some features, of course). Tarek Ziade is working on a Mozilla project called powerhose that uses 0mq; it looks super simple, and has a heartbeat capability with resolution to the second.
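To make the 0mq option concrete, here is a rough sketch of the worker end using pyzmq; the address, message format, and handle_reminder helper are assumptions for illustration, not anything from powerhose:

    import zmq

    def handle_reminder(job):
        # hypothetical stand-in: schedule or send the reminder described by `job`
        print("would send reminder:", job)

    ctx = zmq.Context()
    work = ctx.socket(zmq.PULL)
    work.bind("tcp://127.0.0.1:5557")   # the Django process would PUSH jobs here

    while True:
        job = work.recv_json()          # e.g. {"user": 1, "send_at": "...", "text": "..."}
        handle_reminder(job)

The Django side only needs a PUSH socket connected to the same address and a send_json() call per reminder; there is no broker process in between.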
I'm working on a turn-based web game that will perform all world updates (player orders, physics, scripted events, etc.) on the server. For now, I could simply update the world in a web request callback. Unfortunately, that naive approach is not at all scalable. I don't want to bog down my web server when I start running many concurrent games.
So what is the best way to separate the load from the web server, ideally in a way that could even be run on a separate machine?
A simple python module with infinite loop?
A distributed task in something like Celery?
Some sort of cross-platform Cron scheduler?
Some other fancy Django feature or third-party library that I don't know about?
I also want to minimize code duplication by using the same model layer. That probably means my service would need access to the Django model code, so that definitely determines how I architect the service.
I think Celery, which you mention in your question, is the way to go here. It will interface nicely with the rest of your setup, support your eventual aim of separating out the systems, and is compatible with Django.
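As a rough illustration of how little code that involves (the module, broker URL, and model access are placeholders, not from your project):

    # tasks.py -- sketch of a Celery task that advances one game world
    from celery import Celery

    app = Celery("game", broker="redis://localhost:6379/0")   # broker choice is an assumption

    @app.task
    def advance_world(game_id):
        # load the game via the Django ORM, apply orders/physics/events, save
        ...

A view (or a periodic celery beat schedule) would then call advance_world.delay(game_id) and return immediately, leaving the heavy work to the worker processes -- which can live on another machine.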
I'd write the backend to just use the Django database interface (look at the setup code in your manage.py), spawn it as its own process, and interface with it using Protocol Buffers. That route should move to a separate machine with little work. MPI may be an option, too.
Pipes, FIFOs, and most other IPC require both processes to be on the same box.
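A sketch of that standalone-process approach, assuming a modern Django where django.setup() does the configuration manage.py normally does for you (the project and model names are placeholders):

    # worker.py -- runs outside the web server but reuses the same model layer
    import os
    import time

    import django

    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")
    django.setup()   # configure settings and the ORM for a non-web process

    from game.models import World   # hypothetical model

    def run_forever():
        while True:
            for world in World.objects.filter(active=True):
                world.tick()   # hypothetical per-turn update
                world.save()
            time.sleep(1)

    if __name__ == "__main__":
        run_forever()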
Though I have to point out a flaw in your premise:
Unfortunately, that naive approach is not at all scalable. I don't want to bog down my web server when I start running many concurrent games.
If you run concurrent games, so long as you keep all the parts for a given game on the same server, this is a non-issue, unless there's a common resource needed by all games. Then the real issue becomes load balancing across the servers.
I need to run a server-side script (in Python, say) "forever" (or as long as possible without losing state), so it can keep sockets open and react asynchronously to events like data being received. For example, if I use Twisted for socket communication.
How would I manage something like this?
Am I confused? Or are there better ways to implement asynchronous socket communication?
After starting the script once via Apache server, how do I stop it running?
If you are using Twisted then it has a whole infrastructure for starting and stopping daemons.
http://twistedmatrix.com/projects/core/documentation/howto/application.html
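For illustration, a minimal .tac file that twistd can daemonize and stop for you; the echo protocol is only a stand-in for your own service:

    # echo.tac -- start with "twistd -y echo.tac", stop with "kill $(cat twistd.pid)"
    from twisted.application import service, internet
    from twisted.internet import protocol

    class Echo(protocol.Protocol):
        def dataReceived(self, data):
            self.transport.write(data)

    factory = protocol.ServerFactory()
    factory.protocol = Echo

    application = service.Application("echo")   # twistd looks for this variable name
    internet.TCPServer(8007, factory).setServiceParent(application)

twistd writes a twistd.pid file by default, which is what a small CGI wrapper would read in order to stop or restart the daemon.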
How would I manage something like this?
Twisted works well for this; read the link above.
Am I confused? Or are there better ways to implement asynchronous socket communication?
Twisted is very good at asynchronous socket communications. It is hard on the brain until you get the hang of it though!
After starting the script once via Apache server, how do I stop it running?
The Twisted tools assume command-line access, so you'd have to write a CGI wrapper for starting/stopping them, if I understand what you want to do.
You can just write a script that sits in a while loop, waiting for connections to happen and for a signal telling it to close.
http://docs.python.org/library/signal.html
Then to stop it, you just need to run another script that sends that signal to it.
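For example, a bare-bones sketch of that loop (entirely illustrative):

    import signal
    import time

    running = True

    def handle_term(signum, frame):
        global running
        running = False   # let the loop finish its current pass and exit

    signal.signal(signal.SIGTERM, handle_term)

    while running:
        # accept connections / do the actual work here
        time.sleep(1)

The stopping script then only needs os.kill(pid, signal.SIGTERM) with the pid you recorded when the loop started.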
You can use a ‘double fork’ to run your code in a new background process unbound to the old one. See e.g. this recipe, which has more explanatory comments than you could possibly want.
I wouldn't recommend this as a primary way of running background tasks for a web site. If your Python is embedded in an Apache process, for example, you'll be forking more than you want. Better to invoke the daemon separately (just under a similar low-privilege user).
After starting the script once via Apache server, how do I stop it running?
You have your second fork write the process id (pid) of the daemon process to a file, then read the pid from that file and send that process a terminate signal (os.kill(pid, signal.SIGTERM)).
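A condensed sketch of that double fork plus pid file, with error handling and stdio redirection left to the linked recipe (the pid file path is an assumption):

    import os
    import sys

    PIDFILE = "/tmp/mydaemon.pid"

    def daemonize():
        if os.fork():          # first fork: the parent returns to Apache/the shell
            sys.exit(0)
        os.setsid()            # become session leader, detach from the controlling tty
        if os.fork():          # second fork: the daemon can never reacquire a tty
            sys.exit(0)
        with open(PIDFILE, "w") as f:
            f.write(str(os.getpid()))   # record the pid so it can be killed later

    # stopping it later:
    #   import signal
    #   with open(PIDFILE) as f:
    #       os.kill(int(f.read()), signal.SIGTERM)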
Am I confused?
That's the question! I'm assuming you are trying to have a background process that responds on a different port from the web interface, for some sort of unusual net service. If you're merely talking about responding to normal web requests, you shouldn't be doing this; you should rely on Apache to handle your sockets and service one request at a time.
I think Comet is what you're looking for. Make sure to take a look at Tornado too.
You may want to look at FastCGI; it sounds exactly like what you are looking for, but I'm not sure if it's under current development. It uses a CGI daemon and a special Apache module to communicate with it. Since the daemon is long-running, you don't pay the fork/exec cost on every request -- but at the cost of managing your own resources (no automagic cleanup on every request).
One reason this style of FastCGI isn't used much anymore is that there are ways to embed interpreters into the Apache binary and have them run in the server. I'm not familiar with mod_python, but I know mod_perl has configuration to allow long-running processes. Be careful here, since a long-running process in the server can cause resource leaks.
A general question is: what do you want to do? Why do you need this second process, yet somehow controlled by Apache? Why can't you just build a daemon that talks to Apache? Why does it have to be controlled by Apache?