I'm trying to use multiprocessing in my script, but it doesn't work. What am I doing wrong? I have searched a lot but haven't found a solution. Can you help me?
It seems HistogramMerger does run under multiprocessing: I see some print output when I run the script, but I never get the result file that I normally get with the plain for loop.
I'm getting this error message:
AttributeError: 'module' object has no attribute 'histogramAdd'
PS: this histogram merger script merges multiple files into one single file, and I'm trying to make it run faster than the plain loop. If you know a better solution, please let me know.
without multiprocessing (working)
from histogram_merger import HistogramMerger

var1 = 697
var2 = 722

with HistogramMerger("results/resMergedHistograms_"+str(var1)+"_"+str(var2)+".root") as hm:
    for i in xrange(var1, var2+1):
        print "Run Number : " + str(i)
        hm.addHistogramFile("../results/run"+str(i)+"/run"+str(i)+"_histo.root")
with multiprocessing
from histogram_merger import HistogramMerger
from multiprocessing import Pool

var1 = 697
var2 = 722

##################################################
arrayOfNumbers = [xx for xx in range(var1, var2+1)]
print(arrayOfNumbers)

pool = Pool(8)

def histogramAdd(run):
    print("Run Number : " + str(run))
    hm.addHistogramFile("../results/run"+str(run)+"/run"+str(run)+"_histo.root")

if __name__ == '__main__':
    with HistogramMerger("results/resMergedHistograms_"+str(var1)+"_"+str(var2)+".root") as hm:
        pool.map(histogramAdd, arrayOfNumbers)
        pool.join()
The error message is odd. hm is not in scope within the function histogramAdd. I would expect something like NameError: name 'hm' is not defined. Perhaps there is some hm import you are not showing.
Regardless, you need to pass the object to the function. You can use functools.partial for this. E.g.
from functools import partial

# ...

def histogramAdd(hm, run):  # <- extra parameter, placed first so partial() can bind it!
    print("Run Number : " + str(run))
    hm.addHistogramFile("../results/run"+str(run)+"/run"+str(run)+"_histo.root")

if __name__ == '__main__':
    with HistogramMerger("results/resMergedHistograms_"+str(var1)+"_"+str(var2)+".root") as hm:
        pool.map(partial(histogramAdd, hm), arrayOfNumbers)
        pool.close()
        pool.join()
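As a quick aside on how functools.partial behaves here: it returns a new callable with some arguments already bound, so pool.map only has to supply the remaining one. A minimal, self-contained sketch (the add_file name and the path format are made up for illustration; nothing here is specific to HistogramMerger):

from functools import partial

def add_file(prefix, run):
    # prefix is pre-bound by partial(); run comes from the iterable passed to map()
    return "%s/run%d_histo.root" % (prefix, run)

bound = partial(add_file, "../results")    # binds the first positional parameter
print(bound(697))                          # -> ../results/run697_histo.root
print(list(map(bound, [697, 698, 699])))

Note that the bound argument still has to be picklable, because pool.map ships the partial object, including everything bound to it, to the worker processes.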
Related
I have a multiprocessing pool that runs with 1 thread, and it keeps repeating the code before my function. I have tried different thread counts, and I write things like this quite a bit, so I think I know what is causing the problem, but I don't understand why. Usually I use argparse to get file paths from the user, but this time I wanted to use input instead. No errors are thrown, so I honestly have no clue.
from colorama import Fore
import colorama
import os
import ctypes
import multiprocessing
from multiprocessing import Pool
import random

colorama.init(autoreset=False)
print("headerhere")

# as you can see I used input instead of argparse
g = open(input(Fore.RED + " File Path?: " + Fore.RESET))
gg = open(input(Fore.RED + "File Path?: " + Fore.RESET))

# I messed around with this to see if it was the problem, ultimately disabling it until I fixed it; I just use 1 thread
threads = int(input(Fore.RED + "Amount of Threads?: " + Fore.RESET))

arrange = [lines.replace("\n", "") for lines in g]
good = [items.replace("\n", "") for items in gg]

# this is all of the code before the function that Pool calls
def che(line):
    print("f")
    # I would show my code but as I said this isn't the problem, since I've made programs like this before;
    # the only thing I changed is how I take file inputs from the user

def main():
    pool = Pool(1)
    pool.daemon = True
    result = pool.map(che, arrange)

if __name__ == "__main__":
    main()
Here's a minimal, reproducible example of your issue:
from multiprocessing import Pool

print('header')

def func(n):
    print(f'func {n}')

def main():
    pool = Pool(3)
    pool.map(func, [1, 2, 3])

if __name__ == '__main__':
    main()
On operating systems where "spawn" (Windows and macOS) or "forkserver" (some Unixes) is the default start method, each sub-process imports your script. Since print('header') is at global scope, it runs the first time the script is imported into a process, so the output is:
header
header
header
header
func 1
func 2
func 3
A multiprocessing script should have everything meant to run once inside functions, and those functions should be called once by the main script via if __name__ == '__main__':, so the solution is to move the print into your def main():
from multiprocessing import Pool

def func(n):
    print(f'func {n}')

def main():
    print('header')
    pool = Pool(3)
    pool.map(func, [1, 2, 3])

if __name__ == '__main__':
    main()
Output:
header
func 1
func 2
func 3
If you want the top level code before the definition of che to only be executed in the master process, then place it in a function and call that function in main.
In multiprocessing, top-level statements are interpreted/executed by both the master process and every child process. So, if some code should be executed only by the master and not by the children, it should not be placed at the top level. Instead, it should be placed in functions, and those functions should be invoked from the main scope, i.e. inside the if block controlled by __main__ (or from the main function in your code snippet).
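For example, applying that to the che example from the question, a minimal sketch could look like this (load_lines and the prompt text are just illustrative, and the real body of che is omitted as in the question):

from multiprocessing import Pool

def che(line):
    print("f")

def load_lines(path):
    # runs only in the master process, because it is only called from main()
    with open(path) as f:
        return [line.rstrip("\n") for line in f]

def main():
    # the prompt happens exactly once, in the master process
    arrange = load_lines(input("File Path?: "))
    pool = Pool(1)
    pool.map(che, arrange)
    pool.close()
    pool.join()

if __name__ == "__main__":
    main()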
I am trying to use multiprocessing on a different problem but I can't get it to work. To make sure I'm using the Pool class correctly, I made the following simpler problem but even that won't work. What am I doing wrong here?
from multiprocessing import Pool

def square(x):
    sq = x**2
    return sq

def main():
    x1 = [1, 2, 3, 4]
    pool = Pool()
    result = pool.map(square, x1)
    print(result)

if __name__ == '__main__': main()
The computer just seems to run forever and I need to close and restart the IPython shell before I can do anything.
I figured out what was wrong. I named the script "multiprocessing.py" which is the name of the module that was being imported. This resulted in the script attempting to import itself instead of the actual module.
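A quick way to spot this kind of shadowing (a small diagnostic sketch, not from the original post) is to print where the module is actually being loaded from:

import multiprocessing

# If this prints the path of your own script rather than a path inside the
# standard library, the script's file name is shadowing the real module.
print(multiprocessing.__file__)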
I am trying to use multiprocessing.Pool to implement a multi-process application. To share some variables I am using a Queue, as hinted here:
from multiprocessing import Pool, Queue

def get_prediction(data):
    # here the real calculation will be performed
    ....

def mainFunction():
    def get_prediction_init(q):
        print("a")
        get_prediction.q = q

    queue = Queue()
    pool = Pool(processes=16, initializer=get_prediction_init, initargs=[queue,])

if __name__ == '__main__':
    mainFunction()
This code is running perfectly on a Debian machine, but is not working at all on another Windows 10 device. It fails with the error
AttributeError: Can't pickle local object 'mainFunction.<locals>.get_prediction_init'
I do not really know what exactly is causing the error. How can I solve the problem so that I can run the code on the Windows device as well?
EDIT: The problem is solved if I create the get_prediction_init function at the same level as mainFunction. It only failed when I defined it as an inner function. Sorry for the confusion in my post.
The problem is in something you haven't shown us. For example, it's a mystery where "mainFunction" came from in the AttributeError message you showed.
Here's a complete, executable program based on the fragment you posted. Worked fine for me under Windows 10 just now, under Python 3.6.1 (I'm guessing you're using Python 3 from your print syntax), printing "a" 16 times:
import multiprocessing as mp

def get_prediction(data):
    # here the real calculation will be performed
    pass

def get_prediction_init(q):
    print("a")
    get_prediction.q = q

if __name__ == "__main__":
    queue = mp.Queue()
    pool = mp.Pool(processes=16, initializer=get_prediction_init, initargs=[queue,])
    pool.close()
    pool.join()
Edit
And, based on your edit, this program also works fine for me:
import multiprocessing as mp

def get_prediction(data):
    # here the real calculation will be performed
    pass

def get_prediction_init(q):
    print("a")
    get_prediction.q = q

def mainFunction():
    queue = mp.Queue()
    pool = mp.Pool(processes=16, initializer=get_prediction_init, initargs=[queue,])
    pool.close()
    pool.join()

if __name__ == "__main__":
    mainFunction()
Edit 2
And now you've moved the definition of get_prediction_init() into the body of mainFunction. Now I can see your error :-)
As shown, define the function at module level instead. Trying to pickle local function objects can be a nightmare. Perhaps someone wants to fight with that, but not me ;-)
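For what it's worth, a small sketch of how the initializer-set attribute can then be used inside the worker function; the put/get calls and the doubling are purely illustrative and not part of the original answer:

import multiprocessing as mp

def get_prediction(data):
    # the queue attached by the initializer is visible here in every worker
    get_prediction.q.put(("done", data))
    return data * 2

def get_prediction_init(q):
    get_prediction.q = q

if __name__ == "__main__":
    queue = mp.Queue()
    pool = mp.Pool(processes=4, initializer=get_prediction_init, initargs=[queue])
    results = pool.map(get_prediction, [1, 2, 3])
    pool.close()
    pool.join()
    for _ in range(3):          # one queue entry per input item
        print(queue.get())
    print(results)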
I am writing a custom script to run multiple instances of the same functions using multiprocessing with django models.
The code which concerns this post consists of:
if __name__ == '__main__':
    for count, script in enumerate(scripts):
        for counter in range(0, len(counters)):
            p = Process(target=script, args=(counters[counter][count],))
            p.start()
            p.join()
The loops execute correctly, but I am having a problem with the __name__ == '__main__' statement. I could hack it together by setting __name__ = '__main__' before that line, but then p.start() throws this error:
PicklingError: Can't pickle <function nordstrom_script at 0x0000000003B2A208>: it's not found as __main__.nordstrom_script
I am relatively new to python/django and have never experimented with multiprocessing before, so please excuse my lack of knowledge if something is dreadfully wrong with my logic.
Any help resolving this would be greatly appreciated. I know that django does not work well with multiprocessing, and the problem comes from me using:
>>>python manage.py shell
>>>execscript('scripts/script.py')
and not
>>>python scripts/script.py
This version is directly runnable, and works for me. Could you modify this code to produce the same error? Note that it only processes the 1st arg of 'counters', I assume this is by design.
source
import multiprocessing

def produce(arg):
    print 'arg:', arg

scripts = [produce]
counters = [[3350000, 7000000]]

if __name__ == '__main__':
    for count, script in enumerate(scripts):
        for counter in range(0, len(counters)):
            p = multiprocessing.Process(
                target=script, args=(counters[counter][count],)
            )
            p.start()
            p.join()
output
arg: 3350000
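If the intent was actually to run the script for every value in a counters entry rather than only the count-th one, the loop could be adjusted as below; whether that pairing of scripts and counters is what was intended is an assumption on my part:

import multiprocessing

def produce(arg):
    print('arg: %s' % arg)

scripts = [produce]
counters = [[3350000, 7000000]]

if __name__ == '__main__':
    for script in scripts:
        for values in counters:
            for value in values:   # iterate every value, not just the count-th one
                p = multiprocessing.Process(target=script, args=(value,))
                p.start()
                p.join()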
I want to learn multiprocessing in Python. I started reading http://www.doughellmann.com/PyMOTW/multiprocessing/basics.html and I am not able to understand the section on importing target functions.
In particular, what does the following sentence mean?
"Wrapping the main part of the application in a check for __main__ ensures that it is not run recursively in each child as the module is imported."
Can someone explain this in more detail with an example?
http://effbot.org/pyfaq/tutor-what-is-if-name-main-for.htm
http://docs.python.org/tutorial/modules.html#executing-modules-as-scripts
What does if __name__ == "__main__": do?
http://en.wikipedia.org/wiki/Main_function#Python
On Windows, the multiprocessing module imports the __main__ module when spawning a new process. If the code that spawns the new process is not wrapped in an if __name__ == '__main__' block, then importing the main module will again spawn a new process. And so on, ad infinitum.
This issue is also mentioned in the multiprocessing docs in the section entitled "Safe importing of main module". There, you'll find the following simple example:
Running this on Windows:
from multiprocessing import Process

def foo():
    print 'hello'

p = Process(target=foo)
p.start()
results in a RuntimeError.
And the fix is to use:
if __name__ == '__main__':
    p = Process(target=foo)
    p.start()
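Putting the fragments together, a complete version of that example that is safe to run on Windows could look like this (print is written as a function here so it also runs under Python 3):

from multiprocessing import Process

def foo():
    print('hello')

if __name__ == '__main__':
    # only the directly executed script creates the child; when the child
    # re-imports this module, __name__ is not '__main__', so nothing is spawned again
    p = Process(target=foo)
    p.start()
    p.join()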
"""This is my module (mymodule.py)"""
def sum(a,b):
""">>> sum(1,1)
2
>>> sum(1,-1)
0
"""
return a+b
# if you run this module using 'python mymodule.py', run a self test
# if you just import this module, you get foo() and other definitions,
# but the self-test isn't run
if __name__=='__main__':
import doctest
doctest.testmod()
This ensures that the script being run is the 'top-level environment', which matters for anything interactive. For example, if you want to interact with the user from the launching process, you would want to make sure that code only runs in the main script:
if __name__ == '__main__':
do_something()
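As a small illustration of that point (a sketch, not from the original answer): keeping the prompt inside the guard means that imported copies of the module, including multiprocessing children, never re-ask the user.

def do_something(name):
    print("hello,", name)

if __name__ == '__main__':
    # input() runs only when this file is executed directly,
    # not when it is imported by a child process or another module
    do_something(input("your name?: "))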