Goal: Create a long running process from a python script.
I started with a simple Unix/Linux daemon in Python. But then I also created an init script that just sends the Python script (with a while loop) into the background, like this: python test.py & I'm wondering: what is the difference, in effect, between these two methods?
note: I understand that one creates a child process, and the other doesn't. My question revolves more around the effect.
In effect they behave very similarly; the real difference is the process lineage. The backgrounded script remains a child of the shell that started it, sharing the shell's session, so it can die along with it (e.g. via SIGHUP when the terminal closes). A proper daemon forks and detaches itself from the controlling terminal (via setsid), so it outlives its parent.
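For reference, a minimal sketch of the classic double-fork daemonization, i.e. the detail that plain backgrounding skips (the function name and steps are illustrative, not a complete production daemon):

```python
import os
import sys

def daemonize():
    """Detach the current process from the terminal (classic double fork)."""
    if os.fork() > 0:
        sys.exit(0)          # first parent returns control to the shell
    os.setsid()              # new session: no controlling terminal
    if os.fork() > 0:
        sys.exit(0)          # first child exits; the grandchild is the daemon
    os.chdir("/")            # don't keep any mount point busy
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):     # detach stdin/stdout/stderr from the old tty
        os.dup2(devnull, fd)

# call daemonize() at the top of the script, then enter the while loop
```

With `python test.py &` none of these steps happen, which is exactly why the backgrounded variant stays tied to the shell's session.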
When using multiprocessing in Python on Windows, you are expected to protect the entry point of the program.
The documentation says: "Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process)". Can anyone explain what exactly this means?
Expanding a bit on the good answer you already got, it helps if you understand what Linux-y systems do. They spawn new processes using fork(), which has two good consequences:
All data structures existing in the main program are visible to the child processes. They actually work on copies of the data.
The child processes start executing at the instruction immediately following the fork() in the main program - so any module-level code already executed in the module will not be executed again.
fork() isn't available on Windows, so there each child process imports the module anew. So:
On Windows, no data structures existing in the main program are visible to the child processes; and,
All module-level code is executed in each child process.
So you need to think a bit about which code you want executed only in the main program. The most obvious example is that you want the code that creates child processes to run only in the main program - so that should be protected by __name__ == '__main__'. For a subtler example, consider code that builds a gigantic list, which you intend to pass out to worker processes to crawl over. You probably want to protect that too, because there's no point, in this case, in making each worker process waste RAM and time building its own useless copy of the gigantic list.
Note that it's a Good Idea to use __name__ == "__main__" appropriately even on Linux-y systems, because it makes the intended division of work clearer. Parallel programs can be confusing - every little bit helps ;-)
The multiprocessing module works by creating new Python processes that will import your module. If you did not add __name__ == '__main__' protection, you would enter a never-ending loop of new process creation. It goes like this:
Your module is imported, and the code executed during the import causes multiprocessing to spawn 4 new processes.
Those 4 new processes in turn import the module, and the code executed during their imports causes multiprocessing to spawn 16 new processes.
Those 16 new processes in turn import the module, and the code executed during their imports causes multiprocessing to spawn 64 new processes.
Well, hopefully you get the picture.
So the idea is that you make sure that the process spawning only happens once. And that is achieved most easily with the idiom of the __name__== '__main__' protection.
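A minimal sketch of the idiom (the worker function and pool size are arbitrary choices for illustration):

```python
import multiprocessing

def square(n):
    # trivial stand-in for real per-item work
    return n * n

if __name__ == "__main__":
    # Only the original process runs this block; child processes that
    # re-import the module skip it, so no processes are spawned recursively.
    with multiprocessing.Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3, 4]))  # [1, 4, 9, 16]
```

Everything at module level (here, the definition of square) still runs in every child, which is why the guarded block, not the function definitions, is what must be protected.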
It seems like subprocess.Popen() and os.fork() both are able to create a child process. I would however like to know what the difference is between both. When would you use which one? I tried looking at their source code but I couldn't find fork()'s source code on my machine and it wasn't totally clear how Popen works on Unix machines.
Could somebody please elaborate?
Thanks
subprocess.Popen lets you execute an arbitrary program/command/executable/whatever in its own process.
os.fork only allows you to create a child process that will continue executing the same script from the exact line at which you called it. As its name suggests, it "simply" forks the current process in two.
os.fork is only available on Unix, whereas subprocess.Popen is cross-platform.
So I read the documentation for you. Results:
os.fork only exists on Unix. It creates a child process (by cloning the existing process), but that's all it does. When it returns, you have two (mostly) identical processes, both running the same code, both returning from os.fork (but the new process gets 0 from os.fork while the parent process gets the PID of the child process).
subprocess.Popen is more portable (in particular, it works on Windows). It creates a child process, but you must specify another program that the child process should execute. On Unix, it is implemented by calling os.fork (to clone the parent process), then os.execvp (to load the program into the new child process). Because Popen is all about executing a program, it lets you customize the initial environment of the program. You can redirect its standard handles, specify command line arguments, override environment variables, set its working directory, etc. None of this applies to os.fork.
In general, subprocess.Popen is more convenient to use. If you use os.fork, there's a lot you need to handle manually, and it'll only work on Unix systems. On the other hand, if you actually want to clone a process and not execute a new program, os.fork is the way to go.
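A small sketch contrasting the two, assuming a Unix system (os.fork will raise on Windows); the inline child program is just a placeholder:

```python
import os
import subprocess
import sys

# subprocess.Popen: portable; runs a *different* program in the child,
# with redirected standard handles.
proc = subprocess.Popen(
    [sys.executable, "-c", "print('hello from child')"],
    stdout=subprocess.PIPE, text=True,
)
out, _ = proc.communicate()
print(out.strip())       # hello from child

# os.fork: Unix only; clones *this* process, and both copies return here.
pid = os.fork()
if pid == 0:
    os._exit(0)          # child: fork returned 0, exit immediately
os.waitpid(pid, 0)       # parent: fork returned the child's PID
```

Note how the Popen call names a program to run, while the fork call names nothing: the child is simply a copy of the caller.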
subprocess.Popen() spawns a new OS-level process.
os.fork() creates another process which will resume at exactly the same place as this one. So within the first loop run you get a fork, after which you have two processes: the "original one" (where pid holds the PID of the child process) and the forked one (where pid is 0).
I have a python program run_tests.py that executes test scripts (also written in python) one by one. Each test script may use threading.
The problem is that when a test script unexpectedly crashes, it may not have a chance to tidy up all open threads (if any), hence the test script cannot actually complete due to the threads that are left hanging open. When this occurs, run_tests.py gets stuck because it is waiting for the test script to finish, but it never does.
Of course, we can do our best to catch all exceptions and ensure that all threads are tidied up within each test script so that this scenario never occurs, and we can also set all threads to daemon threads, etc, but what I am looking for is a "catch-all" mechanism at the run_tests.py level which ensures that we do not get stuck indefinitely due to unfinished threads within a test script. We can implement guidelines for how threading is to be used in each test script, but at the end of the day, we don't have full control over how each test script is written.
In short, what I need to do is to stop a test script in run_tests.py even when there are rogue threads open within the test script. One way is to execute the shell command killall -9 <test_script_name> or something similar, but this seems to be too forceful/abrupt.
Is there a better way?
Thanks for reading.
To me, this looks like a perfect use case for the subprocess module.
That is, do not run the test scripts within the same Python interpreter; rather, spawn a new process for each test script. Is there any particular reason why you would want to run them in the same interpreter instead of spawning a new process? Running each script in a sub-process isolates the scripts from one another: imports, global variables, and so on.
If you use subprocess.Popen to start the sub-processes, you also get a .terminate() method to kill the process if need be.
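A sketch of what the run_tests.py side could look like; the function name and the timeout value are illustrative choices, not something from the question:

```python
import subprocess
import sys

def run_test(cmd, timeout=60):
    """Run one test script in its own process; kill it if it hangs.

    Returns the exit code, or None if the process had to be terminated.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.terminate()     # escalate to proc.kill() if SIGTERM is ignored
        proc.wait()
        return None

# A well-behaved "test script" exits normally...
print(run_test([sys.executable, "-c", "pass"]))  # 0
# ...while a hung one is cut off after the timeout.
print(run_test([sys.executable, "-c", "import time; time.sleep(60)"],
               timeout=1))  # None
```

Because the rogue threads live inside the child process, terminating the child cleans them up no matter how the test script was written.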
What I actually needed to do was tidy up all threads at the end of each test script rather than at the run_tests.py level. I don't have control over the main functions of each test script, but I do have control over the tidy up functions.
So this is my final solution:
# Python 2 only: threading._active and _Thread__stop are private APIs
for key, thread in threading._active.iteritems():
    if thread.name != 'MainThread':
        thread._Thread__stop()
I don't actually need to stop the threads. I simply need to mark them as stopped with _Thread__stop() so that the test script can exit. I hope others find this useful.
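Note that _Thread__stop is a Python 2 private API that no longer exists in Python 3. On Python 3, a sketch of the daemon-thread alternative mentioned earlier in the question (the sleeping worker is a stand-in for a rogue thread):

```python
import threading
import time

# Daemon threads do not keep the interpreter alive: when the main thread
# exits, the process exits, rogue worker or not.
worker = threading.Thread(target=lambda: time.sleep(3600), daemon=True)
worker.start()
print(worker.daemon)    # True
```

This achieves the same effect as marking the threads "stopped": the script can exit without waiting for them.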
Hi. Let's assume I have a simple program in Python. This program is run every five minutes through cron, but I don't know how to write it so that it allows multiple processes of itself to run simultaneously. I want to speed things up...
I'd handle the forking and process control inside your main Python program. Let cron spawn only a single process, and have that process act as a master for (possibly multiple) worker processes.
As for how you create multiple workers, there's the threading module for multi-threading and the multiprocessing module for multi-processing. You can also keep your actual worker code in separate files and use the subprocess module.
Now that I think about it, maybe you should use supervisord to do the actual process control and simply write the actual work code.
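A sketch of the master/worker shape under cron, using multiprocessing (the work function, the item list, and the pool size are placeholders):

```python
import multiprocessing

def work(item):
    # placeholder for the real per-item job
    return item * 2

def main():
    # cron starts this single master process; the pool fans the work
    # out to 4 worker processes.
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(work, range(10))
    print(results)

if __name__ == "__main__":
    main()
```

cron then only ever has to launch (and, if needed, you only have to monitor) this one master process.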
Is it possible to run cProfile on a multi-threaded Python program that forks itself into a daemon process? I know you can make it work with multiple threads, but I haven't seen anything on profiling a daemon.
Well, you can always profile it as a single process or a single thread and optimize, after which you can make it multi-threaded. Am I missing something here?
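A sketch of that approach: profile the single-process core with cProfile before adding threads or daemonization (busy() is a stand-in for the real workload):

```python
import cProfile
import io
import pstats

def busy():
    # stand-in for the real single-threaded workload
    return sum(i * i for i in range(100_000))

pr = cProfile.Profile()
pr.enable()
busy()
pr.disable()

buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(3)
print("function calls" in buf.getvalue())   # True
```

Since an explicit Profile object is enabled and disabled in code rather than wrapping the whole script, the same pattern can later be embedded inside the daemonized process around the section you care about.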