Profile python program that forks itself as a daemon

Profile python program that forks itself as a daemon - python

Is it possible to run cprofile on a mult-threaded python program that forks itself into a daemon process? I know you can make it work on multi thread, but I haven't seen anything on profiling a daemon.

Well you can always profile it for a single process or single thread & optimize. After which make it multi-thread. Am I missing something here?

Related

Best way to stop a Python script even if there are Threads running in the script

I have a python program run_tests.py that executes test scripts (also written in python) one by one. Each test script may use threading.
The problem is that when a test script unexpectedly crashes, it may not have a chance to tidy up all open threads (if any), hence the test script cannot actually complete due to the threads that are left hanging open. When this occurs, run_tests.py gets stuck because it is waiting for the test script to finish, but it never does.
Of course, we can do our best to catch all exceptions and ensure that all threads are tidied up within each test script so that this scenario never occurs, and we can also set all threads to daemon threads, etc, but what I am looking for is a "catch-all" mechanism at the run_tests.py level which ensures that we do not get stuck indefinitely due to unfinished threads within a test script. We can implement guidelines for how threading is to be used in each test script, but at the end of the day, we don't have full control over how each test script is written.
In short, what I need to do is to stop a test script in run_tests.py even when there are rogue threads open within the test script. One way is to execute the shell command killall -9 <test_script_name> or something similar, but this seems to be too forceful/abrupt.
Is there a better way?
Thanks for reading.

To me, this looks like a pristine application for the subprocess module.
I.e. do not run the test-scripts from within the same python interpreter, rather spawn a new process for each test-script. Do you have any particular reason why you would not want to spawn a new process and run them in the same interpreter instead? Having a sub-process isolates the scripts from each other, like imports, and other global variables.
If you use the subprocess.Popen to start the sub-processes, then you have a .terminate() method to kill the process if need be.

What I actually needed to do was tidy up all threads at the end of each test script rather than at the run_tests.py level. I don't have control over the main functions of each test script, but I do have control over the tidy up functions.
So this is my final solution:
for key, thread in threading._active.iteritems():
if thread.name != 'MainThread':
thread._Thread__stop()
I don't actually need to stop the threads. I simply need to mark them as stopped with _Thread__stop() so that the test script can exit. I hope others find this useful.

User Input Python Script Executing Daemon

I am working on a web service that requires user input python code to be executed on my server (we have checks for code injection). I have to import a rather large module so I would like to make sure that I am not starting up python and importing the module from scratch each time something runs (it takes about 4-6s).
To do this I was planning to create a python (3.2) deamon that imports the user input code as a module, executes it and then delete/garbage collect that module. I need to make sure that that module is completely gone from RAM since this process will continue until the server is restarted. I have read a bunch of things that say this is a very difficult thing to do in python.
What is the best way to do this? Would it be better to use exec to define a function with the user input code (for variable scoping) and then execute that function and somehow remove the function? Or is there a better way to do this process that I have missed?

You could perhaps consider to create a pool of python daemon processes?
Their purpose would be to serve one request and to die afterwards.
You would have to write a pool-manager that ensures that there are always X daemon processes waiting for an incoming request. (X being the number of waiting daemon processes: depending on the required workload). The pool-manager would have to observe the pool of daemon processes and start new instances every time a process was finished.

How do I run long term (infinite) Python processes?

I've recently started experimenting with using Python for web development. So far I've had some success using Apache with mod_wsgi and the Django web framework for Python 2.7. However I have run into some issues with having processes constantly running, updating information and such.
I have written a script I call "daemonManager.py" that can start and stop all or individual python update loops (Should I call them Daemons?). It does that by forking, then loading the module for the specific functions it should run and starting an infinite loop. It saves a PID file in /var/run to keep track of the process. So far so good. The problems I've encountered are:
Now and then one of the processes will just quit. I check ps in the morning and the process is just gone. No errors were logged (I'm using the logging module), and I'm covering every exception I can think of and logging them. Also I don't think these quitting processes has anything to do with my code, because all my processes run completely different code and exit at pretty similar intervals. I could be wrong of course. Is it normal for Python processes to just die after they've run for days/weeks? How should I tackle this problem? Should I write another daemon that periodically checks if the other daemons are still running? What if that daemon stops? I'm at a loss on how to handle this.
How can I programmatically know if a process is still running or not? I'm saving the PID files in /var/run and checking if the PID file is there to determine whether or not the process is running. But if the process just dies of unexpected causes, the PID file will remain. I therefore have to delete these files every time a process crashes (a couple of times per week), which sort of defeats the purpose. I guess I could check if a process is running at the PID in the file, but what if another process has started and was assigned the PID of the dead process? My daemon would think that the process is running fine even if it's long dead. Again I'm at a loss just how to deal with this.
Any useful answer on how to best run infinite Python processes, hopefully also shedding some light on the above problems, I will accept
I'm using Apache 2.2.14 on an Ubuntu machine.
My Python version is 2.7.2

I'll open by stating that this is one way to manage a long running process (LRP) -- not de facto by any stretch.
In my experience, the best possible product comes from concentrating on the specific problem you're dealing with, while delegating supporting tech to other libraries. In this case, I'm referring to the act of backgrounding processes (the art of the double fork), monitoring, and log redirection.
My favorite solution is http://supervisord.org/
Using a system like supervisord, you basically write a conventional python script that performs a task while stuck in an "infinite" loop.
#!/usr/bin/python
import sys
import time
def main_loop():
while 1:
# do your stuff...
time.sleep(0.1)
if __name__ == '__main__':
try:
main_loop()
except KeyboardInterrupt:
print >> sys.stderr, '\nExiting by user request.\n'
sys.exit(0)
Writing your script this way makes it simple and convenient to develop and debug (you can easily start/stop it in a terminal, watching the log output as events unfold). When it comes time to throw into production, you simply define a supervisor config that calls your script (here's the full example for defining a "program", much of which is optional: http://supervisord.org/configuration.html#program-x-section-example).
Supervisor has a bunch of configuration options so I won't enumerate them, but I will say that it specifically solves the problems you describe:
Backgrounding/Daemonizing
PID tracking (can be configured to restart a process should it terminate unexpectedly)
Log normally in your script (stream handler if using logging module rather than printing) but let supervisor redirect to a file for you.

You should consider Python processes as able to run "forever" assuming you don't have any memory leaks in your program, the Python interpreter, or any of the Python libraries / modules that you are using. (Even in the face of memory leaks, you might be able to run forever if you have sufficient swap space on a 64-bit machine. Decades, if not centuries, should be doable. I've had Python processes survive just fine for nearly two years on limited hardware -- before the hardware needed to be moved.)
Ensuring programs restart when they die used to be very simple back when Linux distributions used SysV-style init -- you just add a new line to the /etc/inittab and init(8) would spawn your program at boot and re-spawn it if it dies. (I know of no mechanism to replicate this functionality with the new upstart init-replacement that many distributions are using these days. I'm not saying it is impossible, I just don't know how to do it.)
But even the init(8) mechanism of years gone by wasn't as flexible as some would have liked. The daemontools package by DJB is one example of process control-and-monitoring tools intended to keep daemons living forever. The Linux-HA suite provides another similar tool, though it might provide too much "extra" functionality to be justified for this task. monit is another option.

I assume you are running Unix/Linux but you don't really say. I have no direct advice on your issue. So I don't expect to be the "right" answer to this question. But there is something to explore here.
First, if your daemons are crashing, you should fix that. Only programs with bugs should crash. Perhaps you should launch them under a debugger and see what happens when they crash (if that's possible). Do you have any trace logging in these processes? If not, add them. That might help diagnose your crash.
Second, are your daemons providing services (opening pipes and waiting for requests) or are they performing periodic cleanup? If they are periodic cleanup processes you should use cron to launch them periodically rather then have them run in an infinite loop. Cron processes should be preferred over daemon processes. Similarly, if they are services that open ports and service requests, have you considered making them work with INETD? Again, a single daemon (inetd) should be preferred to a bunch of daemon processes.
Third, saving a PID in a file is not very effective, as you've discovered. Perhaps a shared IPC, like a semaphore, would work better. I don't have any details here though.
Fourth, sometimes I need stuff to run in the context of the website. I use a cron process that calls wget with a maintenance URL. You set a special cookie and include the cookie info in with wget command line. If the special cookie doesn't exist, return 403 rather than performing the maintenance process. The other benefit here is login to the database and other environmental concerns of avoided since the code that serves normal web pages are serving the maintenance process.
Hope that gives you ideas. I think avoiding daemons if you can is the best place to start. If you can run your python within mod_wsgi that saves you having to support multiple "environments". Debugging a process that fails after running for days at a time is just brutal.

python Global Interpreter Lock GIL problem

I want to provide a service on the web that people can test out the performance of an algo, which is written in python and running on the linux machine
basically what I want to do is that, there is a very trivial PHP handler, let's say start_algo.php, which accepts the request coming from browser, and in the php code through system() or popen() (something like exec( "python algo.py" ) ) to issue a new process running the python script, I think it is doable in this part
problem is that since it is a web service, surely it has to serve multiple users at the same time, but I am quite confused by the Global Interpreter Lock GIL http://wiki.python.org/moin/GlobalInterpreterLock
that the 'standard' CPython has implemented,
does it mean, if I have 3 users running the algo now (which means 3 separated processes, correct me if I am wrong plz), at a particular moment, there is only one user is being served by the Python interpreter and the other 2 are waiting for their turns?
Many thanks in advance
Ted

If you are opening each script by invoking a new process; you will not run afoul of the GIL. Each process gets its own interpreter and therefore its own interpreter lock.

The GIL is per-process. If you start multiple python processes, each will have its own GIL that prevents the interpreter(s) in this specific process from running more than one thread at a time. But independent processes can run at the same time.
Also, multiple threads inside one Python process do take turns on running (rather frequently, IIRC once per hundred opcode instructions or a few dozen milliseconds depending on the version), so it's not like the GIL prevents concurrency at all - it just prevents multi-threading.

Kill sub-threads when Django restarts?

I'm running Django, and I'm creating threads that run in parallel while Django runs. Those threads sometimes run external processes that block while waiting for external input.
When I restart Django, those threads that are blocking while awaiting external input sometimes persist through the restart, and further they have and keep open Port 8080 so Django can't restart.
If I knew when Django was restarting, I could kill those threads. How can I tell when Django is restarting so that I can kill those threads (and their spawn).
It wasn't obvious from django.utils.autoreload where any hooks may be to tell when a restart is occurring.
Is there an alternative way to kill these threads when Django starts up?
Thanks for reading.
Brian

It's not easy for a Python process to kill its own threads -- even harder (nearly impossible) to kill the threads of another process, and I suspect the latter is the case you have... the "restart" is presumably happening on a different process, so those threads are more or less out of bounds for you!
What I suggest instead is "a stitch in time saves nine": when you create those threads, make sure you set their daemon property to True (see the docs -- it's the setDaemon method in Python <= 2.5). This way, when the main thread finishes, e.g. to restart in another process, so will the entire process (which should take all the daemon threads down, too, automatically!-)

What are you using to restart django? I'd put something in that script to look for process id's in the socket file(s) and kill those before starting django.
Alternatively, you could be very heavy handed and just run something like 'pkill -9 *django*' before your django startup sequence.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.