Using multiprocessing in a Python script with Django models

I am writing a custom script that runs multiple instances of the same functions using multiprocessing with Django models.
The code relevant to this post is:
if __name__ == '__main__':
    for count, script in enumerate(scripts):
        for counter in range(0, len(counters)):
            p = Process(target=script, args=(counters[counter][count],))
            p.start()
            p.join()
The loops execute correctly, but I am having a problem with the __name__ == '__main__' check. I could hack it together by setting __name__ = '__main__' before that line, but then p.start() throws an error:
PicklingError: Can't pickle <function nordstrom_script at 0x0000000003B2A208>: it's not found as __main__.nordstrom_script
I am relatively new to python/django and have never experimented with multiprocessing before, so please excuse my lack of knowledge if something is dreadfully wrong with my logic.
Any help resolving this would be greatly appreciated. I know that django does not work well with multiprocessing, and the problem comes from me using:
>>>python manage.py shell
>>>execscript('scripts/script.py')
and not
>>>python scripts/script.py

This version is directly runnable, and works for me. Could you modify this code to produce the same error? Note that it only processes the 1st arg of 'counters', I assume this is by design.
source
import multiprocessing

def produce(arg):
    print 'arg:', arg

scripts = [produce]
counters = [[3350000, 7000000]]

if __name__ == '__main__':
    for count, script in enumerate(scripts):
        for counter in range(0, len(counters)):
            p = multiprocessing.Process(
                target=script, args=(counters[counter][count],)
            )
            p.start()
            p.join()
output
arg: 3350000

Related

Python Multiprocessing - Object has no attribute

I'm trying to use multiprocessing in my script, but it doesn't work. What am I doing wrong? I have searched a lot but haven't found the solution. Can you help me?
It seems HistogramMerger works with multiprocessing: I see some print-out when I run the script, but I don't get the result file which I normally get with the for loop.
I'm getting this error message:
AttributeError: 'module' object has no attribute 'histogramAdd'
PS: This histogram merger script merges multiple files into one single file, and I'm trying to make it run faster than normal. If you know a better solution, please let me know.
without multiprocessing (working)
from histogram_merger import HistogramMerger

var1 = 697
var2 = 722

with HistogramMerger("results/resMergedHistograms_"+str(var1)+"_"+str(var2)+".root") as hm:
    for i in xrange(var1, var2+1):
        print "Run Number : " + str(i)
        hm.addHistogramFile("../results/run"+str(i)+"/run"+str(i)+"_histo.root")
with MultiProcessing
from histogram_merger import HistogramMerger
from multiprocessing import Pool

var1 = 697
var2 = 722
##################################################
arrayOfNumbers = [xx for xx in range(var1, var2+1)]
print(arrayOfNumbers)

pool = Pool(8)

def histogramAdd(run):
    print("Run Number : "+str(run))
    hm.addHistogramFile("../results/run"+str(run)+"/run"+str(run)+"_histo.root")

if __name__ == '__main__':
    with HistogramMerger("results/resMergedHistograms_"+str(var1)+"_"+str(var2)+".root") as hm:
        pool.map(histogramAdd, arrayOfNumbers)
        pool.join()
The error message is odd. hm is not in scope within the function histogramAdd. I would expect something like NameError: name 'hm' is not defined. Perhaps there is some hm import you are not showing.
Regardless, you need to pass the object to the function. You can use functools.partial for this. E.g.
from functools import partial
# ...
def histogramAdd(run, hm):  # <- extra parameter!
    print("Run Number : "+str(run))
    hm.addHistogramFile("../results/run"+str(run)+"/run"+str(run)+"_histo.root")

if __name__ == '__main__':
    with HistogramMerger("results/resMergedHistograms_"+str(var1)+"_"+str(var2)+".root") as hm:
        # bind hm by keyword so `run` stays the argument that map() supplies
        pool.map(partial(histogramAdd, hm=hm), arrayOfNumbers)
        pool.close()  # close() must come before join(), or join() raises
        pool.join()
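To see why the binding matters, here is a tiny sketch of how functools.partial pre-fills one argument so that a loop (or map) only needs to supply the remaining one. No ROOT files are involved; a plain list called merged stands in for the merger object:

```python
from functools import partial

def add_file(merger, run):
    # stand-in for hm.addHistogramFile(...): record which file would be merged
    merger.append("run%d_histo.root" % run)

merged = []
# partial fixes `merger` as the first positional argument;
# each later call only needs to pass `run`
add_run = partial(add_file, merged)
for run in [697, 698, 699]:
    add_run(run)
print(merged)
```

With the answer's signature histogramAdd(run, hm), binding positionally via partial(histogramAdd, hm) would shove hm into the run slot, which is why binding by keyword is the safer form there.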

python multiprocessing on eclipse

I write Python code in Eclipse using PyDev.
The code is the following:
from multiprocessing import Process, Queue
import time

g_workercount = 1

def calc_step():
    print('calc_step started')
    q = Queue()
    p_worker = []
    for i in range(0, g_workercount):
        ww = Process(target=worker_calc_step, args=(q, i,))
        ww.start()
        p_worker.append(ww)
    for ww in p_worker:
        ww.join()
    print('calc_step ended')

def worker_calc_step(q, n):
    print('worker_calc_step started')
    print('worker_calc_step ended')

if __name__ == '__main__':
    calc_step()
    print('finished')
It is a very simple piece of code, and I expected the output would be:
calc_step started
worker_calc_step started
worker_calc_step ended
calc_step ended
finished
It is OK when executed from the console,
but running it in Eclipse is not:
calc_step started
calc_step ended
finished
I guess the main process finishes before the worker process starts.
So I added a sleep in the main process function, but the result in Eclipse is the same.
Do you have any idea about it?
Making multiprocess code work in Python and Eclipse is a little difficult for me.
Thanks.
I'm not really sure what may be wrong there... I've tried it here inside and outside Eclipse/PyDev in Python 2 and 3 and got the same results (where I got the expected output).
Some questions to help diagnose the issue:
Which OS are you using?
What's the Python version?
Have you tried running it under the debugger to see where it might fail?
Have you tried printing to a file instead of stdout? (maybe it's starting the process but it's not properly printing to stdout?)

Multiprocessing Pool initializer fails pickling

I am trying to use multiprocessing.Pool to implement a multiprocess application. To share some variables I am using a Queue, as hinted here:
from multiprocessing import Pool, Queue

def get_prediction(data):
    # here the real calculation will be performed
    ...

def mainFunction():
    def get_prediction_init(q):
        print("a")
        get_prediction.q = q

    queue = Queue()
    pool = Pool(processes=16, initializer=get_prediction_init, initargs=[queue,])

if __name__ == '__main__':
    mainFunction()
This code is running perfectly on a Debian machine, but is not working at all on another Windows 10 device. It fails with the error
AttributeError: Can't pickle local object 'mainFunction.<locals>.get_prediction_init'
I do not really know what exactly is causing the error. How can I solve the problem so that I can run the code on the Windows device as well?
EDIT: The problem is solved if I create the get_predediction_init function on the same level as the mainFunction. It has only failed when I defined it as an inner function. Sorry for the confusion in my post.
The problem is in something you haven't shown us. For example, it's a mystery where "mainFunction" came from in the AttributeError message you showed.
Here's a complete, executable program based on the fragment you posted. Worked fine for me under Windows 10 just now, under Python 3.6.1 (I'm guessing you're using Python 3 from your print syntax), printing "a" 16 times:
import multiprocessing as mp

def get_prediction(data):
    # here the real calculation will be performed
    pass

def get_prediction_init(q):
    print("a")
    get_prediction.q = q

if __name__ == "__main__":
    queue = mp.Queue()
    pool = mp.Pool(processes=16, initializer=get_prediction_init, initargs=[queue,])
    pool.close()
    pool.join()
Edit
And, based on your edit, this program also works fine for me:
import multiprocessing as mp

def get_prediction(data):
    # here the real calculation will be performed
    pass

def get_prediction_init(q):
    print("a")
    get_prediction.q = q

def mainFunction():
    queue = mp.Queue()
    pool = mp.Pool(processes=16, initializer=get_prediction_init, initargs=[queue,])
    pool.close()
    pool.join()

if __name__ == "__main__":
    mainFunction()
Edit 2
And now you've moved the definition of get_prediction_init() into the body of mainFunction. Now I can see your error :-)
As shown, define the function at module level instead. Trying to pickle local function objects can be a nightmare. Perhaps someone wants to fight with that, but not me ;-)
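The point about local function objects can be demonstrated directly with pickle, no Pool needed. A sketch: a module-level function pickles fine because only its qualified name is stored, while a function defined inside another function cannot be found by name at unpickle time, so pickling it fails (CPython raises AttributeError or PicklingError depending on version).

```python
import pickle

def top_level():
    return 42

def make_local():
    def local():
        return 42
    return local

# module-level function: pickled by qualified name, succeeds
top_ok = pickle.dumps(top_level) is not None

# local function: its qualified name 'make_local.<locals>.local'
# cannot be looked up on import, so pickling fails
try:
    pickle.dumps(make_local())
    local_failed = False
except Exception:
    local_failed = True

print(top_ok, local_failed)
```

This is exactly the distinction behind the Windows spawn behavior: the child process re-imports the main module and must be able to find the initializer by name.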

How does Python multiprocessing work in the backend?

When I tried to run this code:
import multiprocessing

def worker():
    """worker function"""
    print 'Worker'
    return

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()
The output is blank and simply executing without printing "Worker". How to print the required output in multiprocessing?
What actually is happening while using multiprocessing?
What is the maximum number of cores we can use for multiprocessing?
I've tried your code on Windows 7, Cygwin, and Ubuntu. For me, all the processes finish before the loop comes to an end, so I get all the prints to show, but using join() will guarantee all the processes have finished.
import multiprocessing

def worker():
    """worker function"""
    print 'Worker'
    return

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()
    for i in range(len(jobs)):
        jobs.pop().join()
As far as how multiprocessing works in the backend, I'm going to let someone more experienced than myself answer that one :) I'll probably just make a fool of myself.
For my part, I get "Worker" printed 5 times. Are you on Python 3? If so, you must use print("Worker"). From my experience, multithreading doesn't necessarily mean using multiple cores; it can just run the different threads alternately to provide concurrency. Try reading the multiprocessing library documentation for more info.

importing target functions | multiprocessing

I want to learn multiprocessing in python. I started reading http://www.doughellmann.com/PyMOTW/multiprocessing/basics.html and I am not able to understand the section on importing target functions.
In particular what does the following sentence mean..
"Wrapping the main part of the application in a check for __main__ ensures that it is not run recursively in each child as the module is imported."
Can someone explain this in more detail with an example ?
http://effbot.org/pyfaq/tutor-what-is-if-name-main-for.htm
http://docs.python.org/tutorial/modules.html#executing-modules-as-scripts
What does if __name__ == "__main__": do?
http://en.wikipedia.org/wiki/Main_function#Python
On Windows, the multiprocessing module imports the __main__ module when spawning a new process. If the code that spawns the new process is not wrapped in a if __name__ == '__main__' block, then importing the main module will again spawn a new process. And so on, ad infinitum.
This issue is also mentioned in the multiprocessing docs in the section entitled "Safe importing of main module". There, you'll find the following simple example:
Running this on Windows:
from multiprocessing import Process

def foo():
    print 'hello'

p = Process(target=foo)
p.start()
results in a RuntimeError.
And the fix is to use:
if __name__ == '__main__':
    p = Process(target=foo)
    p.start()
"""This is my module (mymodule.py)"""
def sum(a,b):
""">>> sum(1,1)
2
>>> sum(1,-1)
0
"""
return a+b
# if you run this module using 'python mymodule.py', run a self test
# if you just import this module, you get foo() and other definitions,
# but the self-test isn't run
if __name__=='__main__':
import doctest
doctest.testmod()
It ensures that the script being run is in the 'top-level environment' for interactivity.
For example, if you wanted to interact with the user (the launching process), you would want to ensure that it is main:
if __name__ == '__main__':
    do_something()
