Multiprocessing launching too many instances of Python VM - python

I am writing some multiprocessing code (Python 2.6.4, WinXP) that spawns processes to run background tasks. In playing around with some trivial examples, I am running into an issue where my code just continuously spawns new processes, even though I only tell it to spawn a fixed number.
The program itself runs fine, but if I look in Windows TaskManager, I keep seeing new 'python.exe' processes appear. They just keep spawning more and more as the program runs (eventually starving my machine).
For example,
I would expect the code below to launch 2 python.exe processes. The first being the program itself, and the second being the child process it spawns. Any idea what I am doing wrong?
import time
import multiprocessing
class Agent(multiprocessing.Process):
def __init__(self, i):
multiprocessing.Process.__init__(self)
self.i = i
def run(self):
while True:
print 'hello from %i' % self.i
time.sleep(1)
agent = Agent(1)
agent.start()

It looks like you didn't carefully follow the guidelines in the documentation, specifically this section where it talks about "Safe importing of main module".
You need to protect your launch code with an if __name__ == '__main__': block or you'll get what you're getting, I believe.
I believe it comes down to the multiprocessing module not being able to use os.fork() as it does on Linux, where an already-running process is basically cloned in memory. On Windows (which has no such fork()) it must run a new Python interpreter and tell it to import your main module and then execute the start/run method once that's done. If you have code at "module level", unprotected by the name check, then during the import it starts the whole sequence over again, ad infinitum

When I run this in Linux with python2.6, I see a maximum of 4 python2.6 processes and I can't guarantee that they're all from this process. They're definitely not filling up the machine.
Need new python version? Linux/Windows difference?

I don't see anything wrong with that. Works fine on Ubuntu 9.10 (Python 2.6.4).
Are you sure you don't have cron or something starting multiple copies of your script? Or that the spawned script is not calling anything that would start a new instance, for example as a side effect of import if your code runs directly on import?

Related

Process getting stuck after being launched from another process

I was working on a specific type of application in Dash which required the action executed by pressing the button to be performed in a separate process. This process, in turn, was parallelizable, and in some cases spawned child processes for the efficient computation. The configuration given in this cases makes the child processes to get stuck. The code below reproduces the situation described as follows:
import multiprocessing
import time
import dash
from dash import html
from dash.dependencies import Input, Output
app = dash.Dash(__name__)
app.layout = html.Div([
html.Button(id='refresh-button', children='Button'),
html.Div(id='dynamic-container1')
])
def run_function(i):
print('hello')
time.sleep(15)
print(f'hello world {i}')
def run_process():
num = 1
print('hello world 00000')
process = multiprocessing.Process(target=run_function, args=(num,))
process.start()
process.join()
print('hello world')
#app.callback(Output('dynamic-container1', 'children'), Input('refresh-button', 'n_clicks'))
def refresh_state(click):
if click == 0 or click is None:
return None
p = multiprocessing.Process(target=run_process)
p.start()
p.join()
return None
if __name__ == '__main__':
app.run_server(debug=True)
The output of this application when pressing the button is always the following:
Connected to pydev debugger (build 172.3968.37)
Dash is running on http://127.0.0.1:8050/
* Serving Flask app 'main' (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: on
pydev debugger: process 6884 is connecting
hello world 00000
which means that the first process utilizing the function run_process() was launched, however, the child process run_function(i) was not even started. I was trying to find an explanation in popular books on multiprocessing in Python and any guidance to these "chaining" processes, but to no avail. From my understanding, the new child process run_function(i) should occupy a separate core (if there is any free core) and not to depend on the resources consumed by the parent process run_process(). Could you, please, explain to me the mechanics of this? I have a doubt that in this code run_function(i) might be coerced to consume the same resources as run_process() does, so the system basically is just restricting any new process from starting from the same resources, but I would like to confirm it from more expert users of Python.
I used Python 3.7 and Pycharm Community 2017.2.3 on Win7 to reproduce this example
I can not reproduce your error and i think this is because you are using windows ( i am on Linux with python 3.9), so i can not find the error for you, but maybe i can give you some hints:
First: To find the error, try to reduce the code to the core of the problem (you can remove the whole dash stuff to check if this is the error). In my tests the results were the same, with and without dash
Second: Windows and Linux handle the multiprocessing stuff a bit different:
windows spawns the process:
The parent process starts a fresh python interpreter process. The child process will only inherit those resources necessary to run the process object’s run() method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver.
Unix Systems fork the process
The parent process uses os.fork() to fork the Python interpreter. The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic.
With Unix System you can spawn (multiprocessing.set_start_method("spawn")), with this i can't reproduce you error, but with this fork/spawn example i wanted to make clear, that there sometimes things are a bit different between windows and linux, even if the same packages are used. I think your understanding of multiprocessing is correct. (Maybe this site helps too.)
Third: In the docs are some programming guidelines you should be aware of. Maybe they will help, too. And in general, the multiprocessing package does not work well with a lot of interactive python shells (like IDLE or pycharm), this can may be better on newer versions. Maybe you should try it from terminal to check if this changes something.
I hope this helps a litle bit.

Puzzling behaviour of __name__ variable using multiprocessing (Python 2.7)

Today i was messing with the multiprocessing library, and i noticed something wierd. I was trying to figure out if it was possible to have nested scripts using multiprocessing (a script that uses multiprocessing to run a part of a script which uses multiprocessing to run more parts of the script). To figure this out i started looking at what the __name__ variable is for the child-scripts, because, if you are familiar with multiprocessing, you know this is going to be a problem
When i wrote a test script, the result surprised me. I wrote this simple script:
from multiprocessing import Pool
def Child(Inf):
print "Child" + __name__
if __name__ == "__main__":
Process = Pool(4)
Process.map(Child, [0,0,0,0])
print "Parent" + __name__
(don't mind the list of four zero's)
The console printed out this:
Child__main__
Child__main__
Child__main__
Child__main__
Parent__main__
This means that the __name__ of the child processes is also __main__
if __name__ == "__main__":
This is the part that puzzles me. After testing, it appeared that the child process gets run 4 times, while the if statement only gets run once. This makes sense when reading the code, but the testing shows that all processes are called the same, and the computer shouldn't be able to tell the difference between them, because the child looks at a variable that is no different from the parent.
I am puzzled by this, because i thought i understood how putting in the if statement prevents the child-processes from running the main program as well, but this appears to be untrue.
Am a missing an important clue or is this just something weird i shouldn't be looking at? :p
Regards,
Harm
What happens is that each process is not fed with the same input.
The parent process will receive and execute the full .py file you input while child processes will fork from the parent with certain functions loaded into memory and will be requested to run one specific function instead of running the entire program (which would lead to infinite recursion...).
The fact __name__ variable is the same is because each child process is a copy of the parent. They are just at different executing points.
On Windows OS:
I didn't notice until now but somehow Python runs the code again when creating multiple processes on Windows.
On Python 3.5 (maybe other versions of Python 3 too, but I didn't check), it will set the __name__ variable to __mp_main__ and avoid the problem.
On Python 2.7, if the value is really __main__, the only explanation I have is that the variable is being set after the input code runs. Otherwise the if block would be executed as well.
Edit: I just tested on Python 2.7.11 and the __name__ variable is set to __parents_main__ before being set to __main__. I would not rely on this because it was changed on Python 3 as you can see.
There is no reason the __name__ should be different at all. It will just give you the name of the module (or __main__ in case of the program itself) where the code using it resides. And you are always calling it either from the original program or from a forked version of it.

Python Multiprocessing RuntimeError on Windows

I have a class function (let's call it "alpha.py") that uses multiprocessing (processes=2) to fork a process and is part of a Python package that I wrote. In a separate Python script (let's call it "beta.py"), I instantiated an object from this class and called the corresponding function that uses multiprocessing. Finally, all of this is wrapped inside a wrapper Python script (let's call this "gamma.py") that handles many different class objects and functions.
Essentially:
Run ./gamma.py from the command line
gamma.py uses subprocess and executes beta.py
beta.py instantiates an object from the alpha.py class and calls the function which uses multiprocessing (processes=2)
This has no problems being run on a Mac or Linux. However, it becomes a problem on a Windows machine and the error (and documentation) suggests that I should write this somewhere:
if __name__ == '__main__':
freeze_support()
This other post also mentions doing the same thing.
However, I don't know exactly where these two lines should reside. Currently, neither alpha.py, beta.py, or gamma.py contains an if __name__ == '__main__': section. It would be great if somebody can tell me where these two lines should go and also the rationale behind it.
Actually, freeze_support() is not needed here. You get a RuntimeError because you create and start your new processes at the top level of your beta module.
When a new process is created using multiprocessing on Windows, a new Python interpreter will be started in this process and it will try to import the module with the target function that should be executed. This is your beta module. Now, when you import it, all your top level statements should be executed which will cause a new process to be created and started again. And then, recursively, another process from that process, and so on and so forth.
This is most likely not what you want, thus new processes should be initialized and started only once, when you run beta.py directly with a subprocess.
if __name__ == '__main__': should be placed in beta.py, then move initialization and start code for your new processes in this section. After that, when beta.py will be imported and not run directly, no new process will be started and you will not see any side effects.

Python multiprocessing/threading blocking main thread

I’m trying to write a program in Python. What I want to write is a script which immediately returns a friendly message to the user, but spawns a long subprocess in the background that takes with several different files and writes them to a granddaddy file. I’ve done several tutorials on threading and processing, but what I’m running into is that no matter what I try, the program waits and waits until the subprocess is done before it displays the aforementioned friendly message to the user. Here’s what I’ve tried:
Threading example:
#!/usr/local/bin/python
import cgi, cgitb
import time
import threading
class TestThread(threading.Thread):
def __init__(self):
super(TestThread, self).__init__()
def run(self):
time.sleep(5)
fileHand = open('../Documents/writable/output.txt', 'w')
fileHand.write('Big String Goes Here.')
fileHand.close()
print 'Starting Program'
thread1 = TestThread()
#thread1.daemon = True
thread1.start()
I’ve read these SO posts on multithreading
How to use threading in Python?
running multiple threads in python, simultaneously - is it possible?
How do threads work in Python, and what are common Python-threading specific pitfalls?
The last of these says that running threads concurrently in Python is actually not possible. Fair enough. Most of those posts also mention the multiprocessing module, so I’ve read up on that, and it seems fairly straightforward. Here’s the some of the resources I’ve found:
How to run two functions simultaneously
Python Multiprocessing Documentation Example
https://docs.python.org/2/library/multiprocessing.html
So here’s the same example translated to multiprocessing:
#!/usr/local/bin/python
import time
from multiprocessing import Process, Pipe
def f():
time.sleep(5)
fileHand = open('../Documents/writable/output.txt', 'w')
fileHand.write('Big String Goes Here.')
fileHand.close()
if __name__ == '__main__':
print 'Starting Program'
p = Process(target=f)
p.start()
What I want is for these programs to immediately print ‘Starting Program’ (in the web-browser) and then a few seconds later a text file shows up in a directory to which I’ve given write privileges. However, what actually happens is that they’re both unresponsive for 5 seconds and then they print ‘Starting Program’ and create the text file at the same time. I know that my goal is possible because I’ve done it in PHP, using this trick:
//PHP
exec("php child_script.php > /dev/null &");
And I figured it would be possible in Python. Please let me know if I’m missing something obvious or if I’m thinking about this in the completely wrong way. Thanks for your time!
(System information: Python 2.7.6, Mac OSX Mavericks. Python installed with homebrew. My Python scripts are running as CGI executables in Apache 2.2.26)
Ok- I think I found the answer. Part of it was my own misunderstanding. A python script can't simply return message to a client-side (ajax) program but still be executing a big process. The very act of responding to the client means that the program has finished, threads and all. The solution, then, is to use the python version of this PHP trick:
//PHP
exec("php child_script.php > /dev/null &");
And in Python:
#Python
subprocess.call(" python worker.py > /dev/null &", shell=True)
It starts an entirely new process outside the current one, and it will continue after the current one has ended. I'm going to stick with Python because at least we're using a civilized api function to start the worker script instead of the exec function, which always made me uncomfortable.

Constantly monitor a program/process using Python

I am trying to constantly monitor a process which is basically a Python program. If the program stops, then I have to start the program again. I am using another Python program to do so.
For example, say I have to constantly run a process called run_constantly.py. I initially run this program manually, which writes its process ID to the file "PID" (in the location out/PROCESSID/PID).
Now I run another program which has the following code to monitor the program run_constantly.py from a Linux environment:
def Monitor_Periodic_Process():
TIMER_RUNIN = 1800
foo = imp.load_source("Run_Module","run_constantly.py")
PROGRAM_TO_MONITOR = ['run_constantly.py','out/PROCESSID/PID']
while(1):
# call the function checkPID to see if the program is running or not
res = checkPID(PROGRAM_TO_MONITOR)
# if res is 0 then program is not running so schedule it
if (res == 0):
date_time = datetime.now()
scheduler.add_cron_job(foo.Run_Module, year=date_time.year, day=date_time.day, month=date_time.month, hour=date_time.hour, minute=date_time.minute+2)
scheduler.start()
scheduler.get_jobs()
time.sleep(TIMER_NOT_RUNIN)
continue
else:
#the process is running sleep and then monitor again
time.sleep(TIMER_RUNIN)
continue
I have not included the checkPID() function here. checkPID() basically checks if the process ID still exists (i.e. if the program is still running) and if it does not exist, it returns 0. In the above program, I check if res == 0, and if so, then I use Python's scheduler to schedule the program. However, the major problem that I am currently facing is that the process ID of this program and the run_constantly.py program turns to be same once I schedule the run_constantly.py using the scheduler.add_cron_job() function. So if the program run_constantly.py crashes, the following program still thinks that the run_constantly.py is running (since both process IDs are same), and therefore continues to go into the else loop to sleep and monitor again.
Can someone tell me how to solve this issue? Is there a simple way to constantly monitor a program and reschedule it when it has crashed?
There are many programs that can do this.
On Ubuntu there is upstart (installed by default)
Lots of people like http://supervisord.org/
monit as mentioned by #nathan
If you are looking for a python alternative there is a library that has just been released called circus which looks interesting.
And pretty much every linux distro probably has one of these built in.
The choice is really just down to which one you like better, but you would be far better off using one of these than writing it yourself.
Hope that helps
If you are willing to control the monitored program directly from python instead of using cron, have a look at the subprocess module :
The subprocess module allows you to spawn new processes,
connect to their input/output/error pipes, and obtain their return codes.
Check examples like track process status with python on SO for examples and references.
You could just use monit
http://mmonit.com/monit/
It monitors processes and restarts them (and other things.)
I thought I'd add a more versatile solution, which is one that I personally use all the time as well.
It's name is Immortal (source is at https://github.com/immortal/immortal)
To have it monitor and instantly restart a program if it stops, simply run the following command:
immortal <command>
So in your case I would run run_constantly.py like so:
immortal python run_constantly.py
The command ps aux | grep run_constantly.py should return 2 process IDs, one for the Immortal command, and one for the separate command Immortal started (just the regular command. As long as the Immortal process is running, run_constantly.py will stay running.

Categories