I am learning Python and its multiprocessing.
I created a project with a main() in main.py and an a_simulation() function inside the module simulation.py under the package simulator/.
The symptom is that a test statement print("hello\n") placed in main.py before the definition of main() is executed multiple times when the program is run with python main.py, which indicates that everything before that print, including the creation of the lists, is executed multiple times as well.
I do not think I understand the related Python issues very well. May I know what the reason for this symptom is, and what the best practice is in Python when structuring projects like this? I have included the code and the terminal output. Thank you!
Edit: Forgot to mention that I am running it with Anaconda Python on macOS, although I would like my project to work just fine on any platform.
main.py:
from multiprocessing import Pool
from simulator.simulation import a_simulation
import random
num_trials = 10
iter_trials = list(range(num_trials))
arg_list = [random.random() for _ in range(num_trials)]
input = list(zip(iter_trials, arg_list))
print("hello\n")
def main():
    with Pool(processes=4) as pool:
        result = pool.starmap(a_simulation, input)
        print(result)

if __name__ == "__main__":
    main()
simulator/simulation.py:
import os
from time import sleep
def a_simulation(x, seed_):
    print(f"Process {os.getpid()}: trial {x} received {seed_}\n")
    sleep(1)
    return seed_
Results from the terminal:
hello
hello
hello
hello
hello
Process 71539: trial 0 received 0.4512600158461971
Process 71538: trial 1 received 0.8772526554425158
Process 71541: trial 2 received 0.6893833978242683
Process 71540: trial 3 received 0.29249994820563296
Process 71538: trial 4 received 0.5759647958461107
Process 71541: trial 5 received 0.08799525261308505
Process 71539: trial 6 received 0.3057644321667139
Process 71540: trial 7 received 0.5402091856171599
Process 71538: trial 8 received 0.1373456223147438
Process 71541: trial 9 received 0.24000943476017
[0.4512600158461971, 0.8772526554425158, 0.6893833978242683, 0.29249994820563296, 0.5759647958461107, 0.08799525261308505, 0.3057644321667139, 0.5402091856171599, 0.1373456223147438, 0.24000943476017]
The reason this happens is that, by default, multiprocessing uses the spawn start method on Windows and macOS to start new processes. This means that whenever you start a new process, the child process is initially created without sharing any of the parent's memory. However, this makes things messy when you want to run a function from the parent in a child process: not only would the child not know the definition of the function itself, it could also hit unexpected obstacles (what if the function depends on a variable defined in the parent process's module?). To avoid these problems, multiprocessing automatically imports the parent process's module in the child process, which re-executes its module-level code and thereby re-creates almost the entire module-level state the parent had when the child process was started.
This is where if __name__ == "__main__": comes in. The statement basically translates to "if the current file is being run directly, then ...": the code under this block will not run when the module is merely imported. Therefore, the child processes will not run anything under this block when they are spawned. You can hence use this block to create, for example, variables that use up a lot of memory and are not required by the child processes but are used by the parent. Basically, anything that the child processes won't need can go under here.
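For example, here is a minimal sketch of how the main.py from the question could be restructured so that all of the setup runs only in the parent (this is just one reasonable layout; the variable is renamed from input to args to avoid shadowing the built-in):

from multiprocessing import Pool
from simulator.simulation import a_simulation
import random

def main():
    num_trials = 10
    iter_trials = list(range(num_trials))
    arg_list = [random.random() for _ in range(num_trials)]
    args = list(zip(iter_trials, arg_list))  # renamed from "input" to avoid shadowing the built-in

    print("hello\n")  # now executed exactly once, by the parent process
    with Pool(processes=4) as pool:
        result = pool.starmap(a_simulation, args)
    print(result)

if __name__ == "__main__":
    main()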
Now coming to your comment about imports:
This must be a silly question, but should I leave the import statements as they are, or move them inside if __name__ == "__main__":, or somewhere else? Thanks
Like I said, anything that the child doesn't need can be put under this if block. The reason you don't often see imports under this block is probably convention ("imports should be done at the top") and the fact that the imported modules usually don't affect performance much, even when they are needlessly imported multiple times. Keep in mind, however, that if a child process requires a particular module to do its work, that module will always be imported again within the child process, even if you imported it under the if __name__ ... block. This is because when you spawn child processes to run a function in parallel, multiprocessing serializes and sends the names of the function and of the module that defines it (the actual code is not serialized, only the names) to the child processes, where the module is imported once more (relevant question).
This is specific to the spawn start method; you can read more about the differences between the start methods here.
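To illustrate the point about imports (matplotlib here is just a hypothetical stand-in for any heavy dependency used only by the parent), a sketch of that pattern might look like this:

from multiprocessing import Pool
from simulator.simulation import a_simulation  # needed by the children; leave at the top

if __name__ == "__main__":
    # A heavy module used only by the parent (e.g. to plot the results)
    # can be imported under the guard so spawned children never import it.
    import matplotlib.pyplot as plt  # hypothetical parent-only dependency

    with Pool(processes=4) as pool:
        result = pool.starmap(a_simulation, [(i, i / 10) for i in range(10)])

    plt.plot(result)
    plt.show()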
Related
I'm struggling to find answers on what objects and variables are copied to child processes when creating a multiprocessing pool in Python 3.
In other words, say I have a huge list (~230000000 elements) stored in a class that implements a function that uses a pool of four child processes. Will this list then be copied across to all four child processes if...
the child processes do not read from the list?
the child processes read from the list (however, the list is not modified)?
To concretely answer the original question, specifically regarding the use of "spawn" (as the OP mentioned they are familiar with "fork"):
When a Process object is created, it is constructed in the main process; then a new Python process is executed with command-line arguments that hand it a pair of file handles for communication as well as a stub of code to start from.
That "bootstrap" code will try to import the main file, which is both why you need to protect against unintended side effects on import (if __name__ == "__main__":) and why anything outside that protection is "available" to the child. This is primarily meant to make sure functions from the main file are defined, but any variables defined at module level are also re-defined. That is fine for constants, as long as it doesn't matter that you're effectively re-computing the values and making one copy per process; for large datasets it is very inefficient.
The bootstrap code will also read one of the file handles and attempt to unpickle the Process object that the parent sent to it. The target of the process is generally a function you have defined, but care must be taken that it is accessible in the "main" namespace on import (no lambdas, no instance methods, etc.). Python does not serialize code objects with pickle; rather, it records how to properly import the function, which gets dicey with objects that don't have a concrete namespace on import (sidebar: the third-party multiprocess library attempts to solve this by using dill instead of pickle, to generally good success). This also comes into play when you subclass the Process class and attach other data to a process instance; it all must be picklable.
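Here is a small sketch of what "pickled by reference" means in practice; the comments describe roughly what you should see (details vary by Python version):

import pickle
import pickletools

def work(x):
    return x * 2

# A module-level function pickles as "import this name from that module":
# the dump contains the module and qualified name, not the bytecode.
pickletools.dis(pickle.dumps(work))      # shows '__main__' and 'work'

# A lambda has no importable name, so pickling it fails.
try:
    pickle.dumps(lambda x: x * 2)
except Exception as exc:                 # typically pickle.PicklingError
    print(type(exc).__name__, exc)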
Once the Process object has been successfully unpickled by the child process, its run method is called. This is generally the entry point of your code. With a Pool, there is a larger class that lives in the main process and launches "worker" processes with a pre-defined function that takes in "jobs" and returns the results until told to exit. Data (task items consisting of a function to execute and the args for that function) is sent to and from the workers via Queues, which work pretty much the same way as sending the original Process object: the thing you put into the queue is pickled, sent via a file handle, and unpickled in the child.
Note: this answer is partial in the sense that I too couldn't (yet) find written evidence and documentation about this, but the following gives some kind of empirical data, if you will.
The following code demonstrates how data is passed/copied to child processes when using a Pool (the actual list l is deliberately not used in the map call, to keep the printouts clean):
from multiprocessing import Pool
import os
def process(x):
    print(os.getpid(), __name__, 'l' in globals())

# A - l = list(range(100000))

if __name__ == "__main__":
    # B - l = list(range(100000))
    with Pool() as pool:
        pool.map(process, [1, 2, 3, 4])
    print(os.getpid(), __name__, 'l' in globals())
On Windows
When uncommenting comment A, a printout similar to:
19604 __mp_main__ True
6392 __mp_main__ True
19604 __mp_main__ True
7048 __mp_main__ True
6568 __main__ True
will be given. This is because the list is defined outside the __name__ guard, and since the processes on Windows essentially import the .py file, they each define their own version of l.
When uncommenting comment B, a printout similar to:
7248 __mp_main__ False
22644 __mp_main__ False
22676 __mp_main__ False
16520 __mp_main__ False
19736 __main__ True
will be given, i.e. since the list is defined inside the __name__ guard, only the __main__ process has it defined, and it passes the arguments through map to the different processes.
On Linux
Uncommenting any of the comments will give a printout similar to:
25261 __main__ True
25262 __main__ True
25263 __main__ True
25264 __main__ True
25260 __main__ True
I am guessing that this is because Linux uses fork to create the child processes, so the processes are "cloned" and the list is defined either way.
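If you want the experiment to behave the same way on every platform, you can pick the start method explicitly before creating the Pool (a sketch; not part of the original experiment):

import multiprocessing as mp

def process(x):
    print(mp.current_process().pid, __name__, 'l' in globals())

if __name__ == "__main__":
    # Force "spawn" even on Linux so the behaviour matches Windows/macOS.
    mp.set_start_method("spawn")
    l = list(range(100000))  # corresponds to the "B" placement above
    with mp.Pool() as pool:
        pool.map(process, [1, 2, 3, 4])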
I'm trying to understand multiprocessing. My actual application is to display log messages in real time in a PyQt5 GUI, but I ran into some problems using queues, so I made a simple program to test it out.
The issue I'm seeing is that I am unable to add elements to a Queue across python modules and across processes. Here is my code and my output, along with the expected output.
Config file for globals:
# cfg.py
# Using a config file to import my globals across modules
#import queue
import multiprocessing
# q = queue.Queue()
q = multiprocessing.Queue()
Main module:
# mod1.py
import cfg
import mod2
import multiprocessing
def testq():
    global q
    print("q has {} elements".format(cfg.q.qsize()))

if __name__ == '__main__':
    testq()
    p = multiprocessing.Process(target=mod2.add_to_q)
    p.start()
    p.join()
    testq()
    mod2.pullfromq()
    testq()
Secondary module:
# mod2.py
import cfg
def add_to_q():
    cfg.q.put("Hello")
    cfg.q.put("World!")
    print("qsize in add_to_q is {}".format(cfg.q.qsize()))

def pullfromq():
    if not cfg.q.empty():
        msg = cfg.q.get()
        print(msg)
Here is the output that I actually get from this:
q has 0 elements
qsize in add_to_q is 2
q has 0 elements
q has 0 elements
vs the output that I would expect to get:
q has 0 elements
qsize in add_to_q is 2
q has 2 elements
Hello
q has 1 elements
So far I have tried using both multiprocessing.Queue and queue.Queue. I have also tested this with and without Process.join().
If I run the same program without using multiprocessing, I get the expected output shown above.
What am I doing wrong here?
EDIT:
Process.run() gives me the expected output, but it also blocks the main process while it is running, which is not what I want to do.
My understanding is that Process.run() runs the created process in the context of the calling process (in my case the main process), meaning that it is no different from the main process calling the same function.
I still don't understand why my queue behavior isn't working as expected.
I've discovered the root of the issue and I'll document it here for future searches, but I'd still like to know if there's a standard solution for creating a global queue between modules, so I'll accept any other answers/comments.
I found the problem when I added the following to my cfg.py file.
print("cfg.py is running in process {}".format(multiprocessing.current_process()))
This gave me the following output:
cfg.py is running in process <_MainProcess(MainProcess, started)>
cfg.py is running in process <_MainProcess(Process-1, started)>
cfg.py is running in process <_MainProcess(Process-2, started)>
It would appear that I'm creating separate Queue objects for each process that I create, which would certainly explain why they aren't interacting as expected.
This question has a comment stating that
a shared queue needs to originate from the master process, which is then passed to all of its subprocesses.
All this being said, I'd still like to know if there is an effective way to share a global queue between modules without having to pass it between methods.
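For reference, the standard solution (a sketch using the same module names as above, not code from the original post) is to create the queue once in the parent, under the __main__ guard, and pass it explicitly to the child process and to any functions that need it:

# mod2.py (sketch)
def add_to_q(q):
    q.put("Hello")
    q.put("World!")

# mod1.py (sketch)
import multiprocessing
import mod2

if __name__ == '__main__':
    q = multiprocessing.Queue()   # created exactly once, in the parent
    p = multiprocessing.Process(target=mod2.add_to_q, args=(q,))
    p.start()
    p.join()
    print(q.get())                # "Hello"
    print(q.get())                # "World!"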
This code executes fine on Linux but throws AttributeError: type object 'T' has no attribute 'val' on Windows. Why?
from multiprocessing import Process
import sys
class T():
    @classmethod
    def init(cls, val):
        cls.val = val

def f():
    print(T.val)

if __name__ == '__main__':
    T.init(5)
    f()
    p = Process(target=f, args=())
    p.start()
Windows lacks a fork() system call, which would duplicate the current process. This has many implications, including those listed on the Windows multiprocessing documentation page. More specifically:
Bear in mind that if code run in a child process tries to access a global variable, then the value it sees (if any) may not be the same as the value in the parent process at the time that Process.start was called.
Internally, Python creates a new process on Windows by starting a fresh process from scratch and telling it to load all modules again. So any change you have made in the current process will not be seen by the child.
In your example, this means that in the child process, your module will be loaded, but the if __name__ == '__main__' section will not be run. So T.init will not be called, and T.val won't exist, thus the error you see.
On the other hand, on POSIX systems (that includes Linux), process creation uses fork, and all global state is left untouched. The child runs with a copy of everything, so it does not have to reload anything and will see its copy of T with its copy of val.
This also means that Process creation is much faster and much lighter on resources on POSIX systems, especially as the “duplication” uses copy-on-write to avoid the overhead of actually copying the data.
There are other quirks when using multiprocessing, all of which are detailed in the python multiprocessing guidelines.
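One portable fix (a sketch, not taken from the answer above) is to stop relying on state mutated only in the parent and to (re-)initialize it explicitly in the child:

from multiprocessing import Process

class T():
    @classmethod
    def init(cls, val):
        cls.val = val

def f(val):
    T.init(val)    # runs in the child, so it works under both fork and spawn
    print(T.val)

if __name__ == '__main__':
    T.init(5)
    p = Process(target=f, args=(5,))
    p.start()
    p.join()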
I have written a class in Python 2.7 (under Linux) that uses multiple processes to manipulate a database asynchronously. I encountered a very strange blocking behaviour when using multiprocessing.Queue.put() and multiprocessing.Queue.get() which I can't explain.
Here is a simplified version of what I do:
from multiprocessing import Process, Queue
import time

class MyDB(object):
    def __init__(self):
        self.inqueue = Queue()
        p1 = Process(target=self._worker_process, kwargs={"inqueue": self.inqueue})
        p1.daemon = True
        started = False
        while not started:
            try:
                p1.start()
                started = True
            except:
                time.sleep(1)
        # Sometimes I start a second process the same way, but it makes no difference to my problem
        p2 = Process(target=self._worker_process, kwargs={"inqueue": self.inqueue})
        # blahblah... (same as above)

    @staticmethod
    def _worker_process(inqueue):
        while True:
            # --------------this blocks despite data having arrived------------
            op = inqueue.get(block=True)
            # do something with the specified operation
            # ---------------problem area end--------------------
            print "if this text gets printed, the problem was solved"

    def delete_parallel(self, key, rawkey=False):
        someid = ...blahblah
        # --------------this section blocked when I was posting the question, but for unknown reasons it's fine now
        self.inqueue.put({"optype": "delete", "kwargs": {"key": key, "rawkey": rawkey}, "callid": someid}, block=True)
        # --------------problem area end----------------
        print "if you see this text, there was no blocking or the block was released"
If I run the code above inside a test (in which I call delete_parallel on the MyDB object), then everything works, but if I run it in the context of my entire application (importing other stuff, including PyGTK), strange things happen:
For some reason self.inqueue.get blocks and never releases, despite self.inqueue having the data in its buffer. When I instead call self.inqueue.get(block=False, timeout=1), the call finishes by raising Queue.Empty, despite the queue containing data. qsize() returns 1 (suggesting the data is there) while empty() returns True (suggesting there is no data).
Now clearly there must be something somewhere else in my application that renders self.inqueue unusable by causing the acquisition of some internal semaphore. However, I don't know what to look for. Eclipse debugging becomes useless once a blocking semaphore is reached.
Edit 8 (cleaning up and summarizing my previous edits): Last time I had a similar problem, it turned out that PyGTK was hijacking the global interpreter lock, but I solved it by calling gobject.threads_init() before I called anything else. Could this issue be related?
When I introduce a print "successful reception" after the get() call and run my application in a terminal, the same behaviour happens at first. When I then terminate by pressing CTRL+D, I suddenly get the string "successful reception" in between messages. This looks to me like some other process/thread is terminated and releases the lock that blocks the process that is stuck at get().
Since the process that was stuck terminates later, I still see the message. What kind of process could externally mess with a Queue like that? self.inqueue is only accessed inside my class.
Right now it seems to come down to this queue which won't return anything despite the data being there:
the get() method seems to get stuck when it attempts to receive the actual data from some internal pipe. The last line before my debugger hangs is:
res = self._recv()
which is inside multiprocessing.queues.Queue.get().
Tracking this internal Python machinery further, I find the assignments
self._recv = self._reader.recv and self._reader, self._writer = Pipe(duplex=False).
Edit 9
I'm currently trying to hunt down the import that causes it. My application is quite complex, with hundreds of classes, each importing a lot of other classes, so it's a pretty painful process. I have found a first candidate class which uses 3 different MyDB instances when I track all its imports (but doesn't access MyDB.inqueue at any time, as far as I can tell). The strange thing is, it's basically just a wrapper, and the wrapped class works just fine when imported on its own; this also means it uses MyDB without freezing. As soon as I import the wrapper (which imports that class), I have the blocking issue.
I started rewriting the wrapper, gradually reusing the old code and testing each time I introduce a couple of new lines, until I hopefully see which line makes the problem return.
multiprocessing.Queue uses an internal feeder thread to move items into the underlying pipe. If you are using GTK without initializing threading, it can break these threads, so you will need to call gobject.threads_init().
It should be noted that qsize() only returns an approximate size of the queue; since other processes and the feeder thread may be adding or removing items concurrently, the real size can differ from the value it returns.
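A pattern that avoids relying on empty() or qsize() altogether (a sketch written for Python 3; on Python 2.7 the exception lives in the Queue module instead) is to call get() with a timeout and handle the Empty exception:

from multiprocessing import Process, Queue
from queue import Empty   # Python 3; "from Queue import Empty" on Python 2.7

def worker(inqueue):
    while True:
        try:
            op = inqueue.get(block=True, timeout=1)
        except Empty:
            continue           # nothing arrived within the timeout; try again
        if op is None:         # sentinel value: shut the worker down
            break
        print("received:", op)

if __name__ == "__main__":
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    q.put({"optype": "delete"})
    q.put(None)
    p.join()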
I'm trying to write a program that executes a piece of code in such a way that the user can stop its execution at any time without stopping the main program. I thought I could do this using threading.Thread, but then I ran the following code in IDLE (Python 3.3):
from threading import *
import math
def f():
    eval("math.factorial(1000000000)")
t = Thread(target = f)
t.start()
The last line doesn't return: I eventually restarted the shell. Is this a consequence of the Global Interpreter Lock, or am I doing something wrong? I didn't see anything specific to this problem in the threading documentation (http://docs.python.org/3/library/threading.html)
I tried to do the same thing using a process:
from multiprocessing import *
import math
def f():
    eval("math.factorial(1000000000)")
p = Process(target = f)
p.start()
p.is_alive()
The last line returns False, even though I ran it only a few seconds after I started the process! Based on my processor usage, I am forced to conclude that the process never started in the first place. Can somebody please explain what I am doing wrong here?
Thread.start() never returns! Could this have something to do with the C implementation of the math library?
As @eryksun pointed out in the comments: math.factorial() is implemented as a C function that doesn't release the GIL, so no other Python code may run until it returns.
Note: the multiprocessing version should work as is: each Python process has its own GIL.
factorial(1000000000) has billions of digits. Try import time; time.sleep(10) as a dummy calculation instead.
If you have issues with multithreaded code in IDLE, then try the same code from the command line to make sure the error persists.
If p.is_alive() returns False after p.start() has been called, it might mean there is an error in the f() function, e.g., a MemoryError.
On my machine, p.is_alive() returns True and one of the CPUs is at 100% if I paste your code from the question into a Python shell.
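Since the original goal is to let the user stop the computation at any time without stopping the main program, here is a minimal sketch of that pattern with multiprocessing (assuming the work is safe to kill abruptly; the factorial argument is just a stand-in for a long computation):

from multiprocessing import Process
import math
import time

def f():
    math.factorial(2000000)   # stand-in for a long, GIL-holding C-level computation

if __name__ == "__main__":
    p = Process(target=f)
    p.start()
    time.sleep(1)             # pretend the user asked to cancel after one second
    if p.is_alive():
        p.terminate()         # stops the child; the main program keeps running
    p.join()
    print("main program continues, child exit code:", p.exitcode)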
Unrelated: remove wildcard imports such as from multiprocessing import *. They may shadow other names in your code, so you can't be sure what a given name refers to; e.g., threading could define an eval function (it doesn't, but it could) with similar yet different semantics that might break your code silently.
I want my program to be able to handle ridiculous inputs from the user gracefully
If you pass user input directly to eval() then the user can do anything.
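A safer sketch (a common alternative with hypothetical names, not code from the original answer) is to map user input onto an explicit whitelist of allowed operations instead of calling eval():

import math

# Only these operations can be requested by the user; anything else is rejected.
ALLOWED = {
    "factorial": math.factorial,
    "sqrt": math.sqrt,
}

def run_user_command(name, arg):
    try:
        func = ALLOWED[name]
    except KeyError:
        raise ValueError("unknown command: {!r}".format(name))
    return func(arg)

print(run_user_command("sqrt", 2))   # 1.4142135623730951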
Is there any way to get a process to print, say, an error message without constructing a pipe or other similar structure?
It is ordinary Python code:
print(message) # works
The difference is that if several processes run print() then the output might be garbled. You could use a lock to synchronize print() calls.
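Here is a minimal sketch of that idea; the lock is created in the parent and passed to every child so they all share the same one:

from multiprocessing import Process, Lock

def worker(lock, i):
    with lock:                         # only one process prints at a time
        print("message from worker", i)

if __name__ == "__main__":
    lock = Lock()
    procs = [Process(target=worker, args=(lock, i)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()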