Why does multiprocessing not work in this Python code? [duplicate]

A basic example of the multiprocessing Process class runs when executed from a file, but not from IDLE. Why is that, and can it be done?
from multiprocessing import Process

def f(name):
    print('hello', name)

p = Process(target=f, args=('bob',))
p.start()
p.join()

Yes. The following works in that function f is run in a separate (third) process.
from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
However, to see the print output, at least on Windows, one must start IDLE from a console like so.
C:\Users\Terry>python -m idlelib
hello bob
(Use idlelib.idle on 2.x.) The reason is that IDLE runs user code in a separate process. Currently the connection between the IDLE process and the user code process is via a socket. The fork done by multiprocessing does not duplicate or inherit the socket connection. When IDLE is started via an icon or Explorer (in Windows), there is nowhere for the print output to go. When started from a console with python (rather than pythonw), output goes to the console, as above.
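If starting IDLE from a console is not an option, one workaround (a minimal sketch, not part of the original answer; child_output.txt is an arbitrary file name) is to have the child write its output to a file, which is visible no matter how IDLE was launched:

from multiprocessing import Process

def f(name):
    # Write to a file, since the child's stdout may not be connected
    # to any visible console when IDLE is started from an icon.
    with open('child_output.txt', 'a') as out:
        print('hello', name, file=out)

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()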

Related

How to keep sub-process running after main process has exited?

I have a requirement to use python to start a totally independent process. That means that even after the main process has exited, the sub-process can keep running.
Just like the shell in Linux:
#./a.out &
Then even if the ssh connection is lost, a.out keeps running.
I need a similar but unified way across Linux and Windows.
I have tried the multiprocessing module:
import multiprocessing
import time

def fun():
    while True:
        print("Hello")
        time.sleep(3)

if __name__ == '__main__':
    p = multiprocessing.Process(name="Fun", target=fun)
    p.daemon = True
    p.start()
    time.sleep(6)
If I set p.daemon = True, the print("Hello") stops after 6 seconds, as soon as the main process exits.
But if I set p.daemon = False, the main process won't exit on time, and if I press CTRL+C to force quit it, the print("Hello") also stops.
So, is there any way to keep printing "Hello" even after the main process has exited?
The multiprocessing module is generally used to split a huge task into multiple subtasks and run them in parallel to improve performance.
In this case, you would want to use the subprocess module instead.
You can put your fun function in a separate file (sub.py):
import time

while True:
    print("Hello")
    time.sleep(3)
Then you can call it from the main file (main.py):
from subprocess import Popen
import time

if __name__ == '__main__':
    Popen(["python", "./sub.py"])
    time.sleep(6)
    print('Parent Exiting')
The subprocess module can do it. If you have a .py file like this:
from subprocess import Popen
p = Popen([r'C:\Program Files\VideoLAN\VLC\vlc.exe'])
The file will end its run pretty quickly and exit, but vlc.exe will stay open.
In your case, because you want to use another function, you could in principle separate that into another .py file, as in the sketch below.
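For the original requirement of a child that survives its parent on both platforms, here is a hedged sketch (the DETACHED_PROCESS flag requires Python 3.7+; sub.py is the file from the previous answer):

import sys
import subprocess

# Launch sub.py so it keeps running after this script exits.
# On POSIX, start_new_session=True detaches the child from our session;
# on Windows, DETACHED_PROCESS starts it without a console tied to ours.
if sys.platform == 'win32':
    flags = subprocess.DETACHED_PROCESS | subprocess.CREATE_NEW_PROCESS_GROUP
    subprocess.Popen([sys.executable, 'sub.py'],
                     creationflags=flags,
                     stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
else:
    subprocess.Popen([sys.executable, 'sub.py'],
                     start_new_session=True,
                     stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

The output is discarded here because a detached Windows child has no console; point stdout at a file instead if you need to see the Hello lines.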

Python Multiprocessing within Jupyter Notebook

I am new to the multiprocessing module in Python and work with Jupyter notebooks. I have tried the following code snippet from PMOTW:
import multiprocessing

def worker():
    """worker function"""
    print('Worker')
    return

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()
When I run this as is, there is no output.
I have also tried creating a module called worker.py and then importing that to run the code:
import multiprocessing
from worker import worker

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()
There is still no output in that case. In the console, I see the following error (repeated multiple times):
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Program Files\Anaconda3\lib\multiprocessing\spawn.py", line 106, in spawn_main
    exitcode = _main(fd)
  File "C:\Program Files\Anaconda3\lib\multiprocessing\spawn.py", line 116, in _main
    self = pickle.load(from_parent)
AttributeError: Can't get attribute 'worker' on <module '__main__' (built-in)>
However, I get the expected output when the code is saved as a Python script and executed.
What can I do to run this code directly from the notebook without creating a separate script?
I'm relatively new to parallel computing so I may be wrong with some technicalities. My understanding is this:
Jupyter notebooks don't work with multiprocessing because the module pickles (serialises) data to send to processes.
multiprocess is a fork of multiprocessing that uses dill instead of pickle to serialise data which allows it to work from within Jupyter notebooks. The API is identical so the only thing you need to do is to change
import multiprocessing
to...
import multiprocess
You can install multiprocess very easily with a simple
pip install multiprocess
You will however find that your processes will still not print to the output (although in JupyterLab they will print to the terminal the server is running in). I stumbled upon this post trying to work around this and will edit this post when I find out how.
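For illustration, a minimal sketch of the drop-in replacement (this assumes multiprocess is installed; because it serialises with dill, even a lambda defined in a notebook cell can be used as a target):

# Same API as multiprocessing, but serialisation is done with dill,
# so functions defined in a notebook cell can be sent to the workers.
from multiprocess import Pool

if __name__ == '__main__':
    with Pool(4) as pool:
        print(pool.map(lambda x: x * x, range(5)))  # [0, 1, 4, 9, 16]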
I'm not an expert in either multiprocessing or ipykernel (which is used by Jupyter notebook), but since nobody else seems to have given an answer, I will tell you what I guessed. I hope somebody complements this later on.
I guess your Jupyter notebook server is running on a Windows host. In multiprocessing there are three different start methods. Let's focus on spawn, which is the default on Windows, and fork, the default on Unix.
Here is a quick overview.
spawn
  (CPython) interactive shell - always raises an error
  run as a script - okay only if you nest the multiprocessing code inside if __name__ == '__main__':
fork
  always okay
For example,
import multiprocessing

def worker():
    """worker function"""
    print('Worker')
    return

if __name__ == '__main__':
    multiprocessing.set_start_method('spawn')
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()
This code works when it's saved and run as a script, but raises an error when entered in a Python interactive shell. Here is the implementation of the IPython kernel, and my guess is that it uses some kind of interactive shell and so doesn't go well with spawn (but please don't trust me).
As a side note, here is a general idea of how spawn and fork differ. In multiprocessing, each subprocess runs a separate Python interpreter. With spawn in particular, a child process starts a new interpreter and imports the necessary modules from scratch. It's hard to import code defined in an interactive shell, so it may raise an error.
fork is different. With fork, a child process copies the main process, including most of the running state of the Python interpreter, and then continues execution. This code will help you understand the concept.
import os

main_pid = os.getpid()
os.fork()
print("Hello world (%d)" % os.getpid())  # prints twice: Hello world (id1), Hello world (id2)

if os.getpid() == main_pid:
    print("Hello world (main process)")  # prints once
Much like you, I encountered the attribute error. The problem seems to be related to how Jupyter handles multithreading. The fastest result I got was to follow the multiprocessing example, and the ThreadPool took care of my issue.
from multiprocessing.pool import ThreadPool as Pool

def worker(num):
    """worker function (map passes each item of the iterable as an argument)"""
    print('Worker %d\n' % num)
    return

pool = Pool(4)
for result in pool.map(worker, range(5)):
    pass  # or print diagnostics
This works for me on Mac (I cannot make it work on Windows):
import multiprocessing as mp

mp_start_count = 0

if __name__ == '__main__':
    if mp_start_count == 0:
        mp.set_start_method('fork')
        mp_start_count += 1
Save the function to a separate Python file then import the function back in. It should work fine that way.
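A minimal sketch of that suggestion (assuming the function is saved as worker.py next to the notebook; as noted above, on Windows the child processes may print to the terminal running the notebook server rather than to the cell output):

# worker.py, saved next to the notebook
def worker():
    """worker function"""
    print('Worker')

# notebook cell
import multiprocessing
from worker import worker

if __name__ == '__main__':
    jobs = [multiprocessing.Process(target=worker) for _ in range(5)]
    for p in jobs:
        p.start()
    for p in jobs:
        p.join()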

launch a python console and control its output

I need to launch a python console and control its output. I am using python subprocess.Popen() to create a new instance.
I saved the following code in a python script and ran it from the Windows command prompt. When I run the script, it launches the Python instance in the current command prompt window rather than in a separate console.
import subprocess

p = subprocess.Popen([r"C:\Python31\python.exe"], shell=False,
                     # stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE)
out, _ = p.communicate()
print(out.decode())
In Windows you can spawn subprocesses in new console sessions by using the CREATE_NEW_CONSOLE creation flag:
from subprocess import Popen, CREATE_NEW_CONSOLE

p = Popen([r"C:\Python31\python.exe"], creationflags=CREATE_NEW_CONSOLE)
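If the goal is to run a particular script in the new console rather than a bare interpreter, a sketch along the same lines (run.py is a hypothetical script path, not something from the question):

from subprocess import Popen, CREATE_NEW_CONSOLE

# Start the interpreter in its own console window, run the (hypothetical)
# script run.py, and keep an interactive prompt open afterwards (-i).
p = Popen([r"C:\Python31\python.exe", "-i", "run.py"],
          creationflags=CREATE_NEW_CONSOLE)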
If you are on Windows, you can use the win32console module to open a second console for your thread or subprocess output. This is the simplest and easiest approach that works on Windows.
Here is a sample code:
import win32console
import multiprocessing

def subprocess(queue):
    win32console.FreeConsole()   # frees the subprocess from using the main console
    win32console.AllocConsole()  # creates a new console; all subprocess I/O goes there
    while True:
        print(queue.get())       # prints whatever the main script passes in via the queue

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    multiprocessing.Process(target=subprocess, args=[queue]).start()
    while True:
        print("Hello World in main console")
        queue.put("Hello work in sub process console")
        # the subprocess receives the string above and prints it in its own console
        # ...and whatever else you want to do in your main process
You can also do this with threading. You have to use the queue module if you want the queue functionality, since the threading module doesn't provide its own queue.
Here is the win32console module documentation
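A rough threading equivalent of the queue plumbing above (a sketch only; threads share the process's console, so win32console cannot give a thread its own window):

import threading
import queue   # the threading module has no queue of its own; this one is thread-safe
import time

def consumer(q):
    while True:
        print(q.get())

if __name__ == "__main__":
    q = queue.Queue()
    threading.Thread(target=consumer, args=(q,), daemon=True).start()
    for i in range(3):
        q.put("Hello from the main thread (%d)" % i)
    time.sleep(1)  # give the daemon thread time to drain the queue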

When I use os.system() to open a .py file in Python it automatically closes it immediately. How do I fix this?

I'm writing a script that needs to open another script, but continue running the main script such that both scripts are running simultaneously.
I've tried execfile() but the file doesn't open. When I use os.system(somefile.py) it successfully opens the .py file via the console but immediately closes it. Are there alternatives so that I can run a python script within a main python script, and have both processes running simultaneously without conflicting with one another?
Here is sample code I've tested:
import os

file_path = 'C:\\Users\\Tyler\\Documents\\Multitask Bot\\somefile.py'

def main():
    os.system(file_path)

if __name__ == '__main__':
    main()
execfile() and os.system() will block the parent process until the child exits. Use subprocess.Popen(), e.g.
import subprocess, time

file_path = 'C:\\Users\\Tyler\\Documents\\Multitask Bot\\somefile.py'

def main():
    child = subprocess.Popen(['python', file_path])
    while child.poll() is None:
        print "parent: child (pid = %d) is still running" % child.pid
        # do parent stuff
        time.sleep(1)
    print "parent: child has terminated, returncode = %d" % child.returncode

if __name__ == '__main__':
    main()
This is just one way to handle it. You may want to collect stdout and/or stderr from the child and possibly send data to the child's stdin. Read up on the subprocess module.
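For instance, a sketch of collecting the child's output instead of sharing the console (kept in the same Python 2 style as the answer above; communicate() blocks until the child exits):

import subprocess

file_path = 'C:\\Users\\Tyler\\Documents\\Multitask Bot\\somefile.py'

child = subprocess.Popen(['python', file_path],
                         stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)
out, err = child.communicate()  # waits for the child to finish
print "parent: child stdout:", out
print "parent: child stderr:", err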
If you want to run another script simultaneously, consider the subprocess module.
Your problem may be that the file is not executed in C:\\Users\\Tyler\\Documents\\Multitask Bot\\ but somewhere else, so a local import may fail.
Could you try executing os.chdir('C:\\Users\\Tyler\\Documents\\Multitask Bot\\') before os.system?
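Alternatively, a sketch using subprocess instead, which can set the child's working directory directly through the cwd argument:

import subprocess

# Run the script with its own folder as the working directory,
# so its local imports resolve.
subprocess.Popen(['python', 'somefile.py'],
                 cwd='C:\\Users\\Tyler\\Documents\\Multitask Bot')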

Python multiprocessing continuously spawns pythonw.exe processes without doing any actual work

I don't understand why this simple code
# file: mp.py
from multiprocessing import Process
import sys

def func(x):
    print 'works ', x + 2
    sys.stdout.flush()

p = Process(target=func, args=(2,))
p.start()
p.join()
p.terminate()

print 'done'
sys.stdout.flush()
creates "pythonw.exe" processes continuously and it doesn't print anything, even though I run it from the command line:
python mp.py
I am running the latest Python 2.6 on Windows 7, both 32 and 64 bit.
You need to protect the entry point of the program by using if __name__ == '__main__':.
This is a Windows-specific problem. On Windows your module has to be imported into a new Python interpreter in order for it to access your target code. If you don't stop this new interpreter from running the start-up code, it will spawn another child, which will then spawn another child, until there are pythonw.exe processes as far as the eye can see.
Other platforms use os.fork() to launch the subprocesses so don't have the problem of reimporting the module.
So your code will need to look like this:
from multiprocessing import Process
import sys

def func(x):
    print 'works ', x + 2
    sys.stdout.flush()

if __name__ == '__main__':
    p = Process(target=func, args=(2,))
    p.start()
    p.join()
    p.terminate()
    print 'done'
    sys.stdout.flush()
According to the programming guidelines for multiprocessing, on Windows you need to use an if __name__ == '__main__': guard.
Funny, works on my Linux machine:
$ python mp.py
works 4
done
$
Is the multiprocessing thing supposed to work on Windows? A lot of programs that originated in the Unix world don't handle Windows so well, because Unix uses fork(2) to clone processes quite cheaply, while (it is my understanding) Windows does not support fork(2) gracefully, if at all.
