Multiprocessing in a class and using its instance in another class - python

I am stuck using the multiprocessing module of Python. I want to create multiprocessing workers in a class and use an object of this class as a component of another class. Below is dummy code whose structure matches my original code.
a.py
import concurrent.futures
class A:
    def __init__(self, basic_input):
        self.initial = basic_input
    def _worker(self, work):
        print(f"Doing some {work} using {self.initial}")
    def print_my_work(self, set_of_works):
        with concurrent.futures.ProcessPoolExecutor(max_workers=5) as executor:
            executor.map(self._worker, set_of_works)
b.py
from a import A
class B:
    def __init__(self, basic_input):
        # A is a composite object
        self.a = A(basic_input)
    def get_all_works(self):
        return {'A', 'B', 'C'}
    def processingB(self):
        works = self.get_all_works()
        self.a.print_my_work(works)
Here I'm trying to use class B in another module, as below:
check.py
from b import B
obj = B('Test')
obj.processingB()
I get the error below:
Python multiprocessing RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.......
Can somebody help me? Thank you for reading this question.

Googling your problem turns up a similar post. It seems that you are using Windows, so you should add if __name__ == '__main__': at the start of your main script (to avoid creating subprocesses recursively):
from b import B
if __name__ == '__main__':
    obj = B('Test')
    obj.processingB()
Using your code, I finally get:
Doing some A using Test
Doing some B using Test
Doing some C using Test
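For reference, here is the same structure collapsed into a single file, as a minimal sketch. It uses only the classes from the question, with two illustrative changes that are not in the original code: _worker returns its message instead of printing it, so the parent process can display the results, and the works are sorted so the order is predictable.

```python
import concurrent.futures


class A:
    def __init__(self, basic_input):
        self.initial = basic_input

    def _worker(self, work):
        # Returning the text (instead of printing in the child process)
        # is an assumption made for this sketch.
        return f"Doing some {work} using {self.initial}"

    def print_my_work(self, set_of_works):
        with concurrent.futures.ProcessPoolExecutor(max_workers=5) as executor:
            # map() yields the results in the order of its input
            for line in executor.map(self._worker, sorted(set_of_works)):
                print(line)


class B:
    def __init__(self, basic_input):
        self.a = A(basic_input)  # A is a composite object

    def get_all_works(self):
        return {'A', 'B', 'C'}

    def processingB(self):
        self.a.print_my_work(self.get_all_works())


if __name__ == '__main__':
    # Only the process that runs this file as a script gets here; worker
    # processes that re-import the module skip this block.
    B('Test').processingB()
```

On platforms that start workers by re-importing the main module (Windows in particular), the guard is what keeps each worker from trying to create its own pool.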

Related

Why do unit tests fail whereas the program runs?

I've been asked to develop unit tests for a program that is so badly developed that the tests don't run... but the program does. I need to explain why, and I actually don't know!
Here is a piece of code that intends to represent the code I need to test:
from services import myModule1
from services.spec1 import importedFunc
from services.spec2 import getTool
from services.spec3 import getDict
class myClass(object):
    def __init__(self, param1, param2):
        self.param1 = param1
        self.param2 = param2
        self.param3 = 0
        self.param4 = 0
    def myMethod(self):
        try:
            myVar1 = globalDict['key1']
            myVar2 = globalDict['key2']
            newVar = importedFunc(par1=myVar1, par2=myVar2, par3=extVar3)
            calcParam = myModule1.methodMod1(self.param1)
            self.param3 = calcParam["keyParam3"]
            self.param4 = newVar.meth1(self.param2)
            globTools.send_message(self.param3, self.param4)
        except:
            globTools.error_message(self.param3, self.param4)
        return
class myClass2(object):
    def __init__(self, *myclass2_params):
        # some piece of code to initialize dedicated attributes
        self.add_objects()
    def add_objects(self):
        # Some piece of code
        my_class = myClass(**necessary_params)
        # Some piece of code
        return
if __name__ == '__main__':
    globTools = getTool("my_program")
    globalDict = getDict(some_params)
    # Some piece of code
    my_class2 = myClass2(**any_params)
    # Some piece of code
As you can see, the problem is that the class and its methods use global variables defined in the main scope. This is just a quick summary, because it's actually a bit more complicated, but I hope it's enough to give you an overview of the context and help me understand why the unit test fails.
I tried to mock the imported modules, but I did not manage to get a successful result, so I first tried to keep it simple and just initialize all the parameters.
I ended up with this test file:
import unittest
from my_module import myClass
from services import myModule1
from services.spec1 import importedFunc
from services.spec2 import getTool
from services.spec3 import getDict
class test_myClass(unittest.TestCase):
    def setUp(self):
        globTools = getTool("my_program")
        globalDict = getDict(some_params)
    def test_myMethod(self):
        test_class = myClass(*necessary_parameters)
        test_res = test_class.myMethod()
        self.assertIsNotNone(test_res)
if __name__ == '__main__':
    unittest.main()
But the test fails, telling me 'globTools is not defined' when trying to instantiate myClass.
I also tried to initialize the variables directly in the test method, but the result is the same.
To be complete about the technical environment: I cannot run Python programs directly and need to launch a Docker environment via a Jenkins pipeline. I'm not very familiar with this, but I imagine it should not have an impact on the result.
I guess the problem comes from the variables' scopes, but I'm not able to explain it in this case: why does the test fail whereas the method itself works (yes, it actually works, or at least the program as a whole runs without errors)?
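It's not as bad as you think. myMethod looks up globTools and globalDict in the global namespace of my_module, and those names only exist there when my_module is run as a script, because they are assigned under its if __name__ == '__main__': guard. Your setUp method just needs to define the appropriate top-level globals in your module, rather than local variables.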
import unittest
import my_module
from my_module import myClass
from services import myModule1
from services.spec1 import importedFunc
from services.spec2 import getTool
from services.spec3 import getDict
class test_myClass(unittest.TestCase):
    def setUp(self):
        my_module.globTools = getTool("my_program")
        my_module.globalDict = getDict(some_params)
    def test_myMethod(self):
        test_class = myClass(*necessary_parameters)
        test_res = test_class.myMethod()
        self.assertIsNotNone(test_res)
if __name__ == '__main__':
    unittest.main()
Depending on how the code uses the two globals, setUpClass might be a better place to initialize them, but it's probably not worth worrying about. Once you have tests for the code, you are in a better position to remove the dependency on these globals from the code.
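Why this works can be shown in isolation: a function resolves global names in the namespace of the module where it was defined, not in the namespace of whoever calls it. The sketch below builds a throwaway stand-in for my_module with types.ModuleType. The module source, the name glob_tools and the string values are all hypothetical, chosen only for illustration. It also shows unittest.mock.patch.object with create=True as one possible way to undo the assignment once a test is over.

```python
import types
from unittest import mock

# Hypothetical stand-in for my_module: my_method reads a global that the
# module itself never assigns (like globTools in the question).
SOURCE = """
def my_method():
    return glob_tools.upper()
"""
my_module = types.ModuleType("my_module")
exec(SOURCE, my_module.__dict__)

# A global in the *calling* namespace is invisible to my_method.
glob_tools = "defined in the test file"
try:
    my_module.my_method()
    failed_before = False
except NameError:
    failed_before = True

# Assigning the attribute on the module object makes the name resolvable.
my_module.glob_tools = "ready"
result_after = my_module.my_method()
del my_module.glob_tools

# patch.object(..., create=True) sets the missing global only inside the block.
with mock.patch.object(my_module, "glob_tools", "patched", create=True):
    result_patched = my_module.my_method()
removed_again = not hasattr(my_module, "glob_tools")

print(failed_before, result_after, result_patched, removed_again)
```

Patching has the advantage that one test cannot leak its globals into the next, which matters once there are several test cases sharing the module.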

Module has no attribute... unsure

In an effort to create a simple script for another question to highlight an issue I am having, I ran into this confusing problem. My code won't run. I read several other Stack Overflow answers and ensured that I am not using a pre-defined class. I am also not doing a cyclical import. I have no idea. I am new to Python.
TestClass.py:
class TestClass:
    test_number = 10000 # Default score limit
    def __init__(self):
        pass
    def check_test_number(self):
        # this needs to be an instance method
        print(TestClass.test_number)
TestScript.py:
import TestClass
def main():
    t1 = TestClass.TestClass()
    print(TestClass.test_number)
    print(t1.check_test_number())
    TestClass.test_number = 500
    print(TestClass.test_number)
    print(t1.check_test_number())
if __name__ == "__main__":
    main()
I receive this error:
AttributeError: module 'TestClass' has no attribute 'test_number'
Thanks in advance, guys!
You need to refer to the class attribute test_number on the lines print(TestClass.test_number) and TestClass.test_number = 500 like this: TestClass.TestClass.test_number, or use the form from your_file import ClassName. In your code, TestClass refers to the module TestClass.py, not to the class defined inside it, so TestClass.test_number looks for a variable or function at the top level of the file. I advise you to use snake_case for .py file names to avoid this confusion. I think your code can be rewritten like this (after renaming TestClass.py to test_class.py):
test_script.py
from test_class import TestClass
def main():
    t1 = TestClass()
    print(TestClass.test_number)
    print(t1.check_test_number())
    TestClass.test_number = 500
    print(TestClass.test_number)
    print(t1.check_test_number())
if __name__ == "__main__":
    main()
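The distinction between the module and the class inside it can be reproduced without any files. In this sketch the module object is built by hand with types.ModuleType, which is an assumption made purely to keep the example self-contained; the name lookup behaves the same way with a real TestClass.py on disk.

```python
import types

# Build a module named TestClass that contains a class named TestClass.
SOURCE = """
class TestClass:
    test_number = 10000  # class attribute
"""
TestClass = types.ModuleType("TestClass")  # this is what `import TestClass` binds
exec(SOURCE, TestClass.__dict__)

# The module has no top-level name test_number ...
module_has_attr = hasattr(TestClass, "test_number")

# ... but the class inside the module does: module.Class.attribute
class_value = TestClass.TestClass.test_number

# Rebinding the class attribute is visible through instances as well.
TestClass.TestClass.test_number = 500
instance_value = TestClass.TestClass().test_number

print(module_has_attr, class_value, instance_value)
```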

Passing arguments from one class to another class using threading python

I'm new to threading and python. I would like to understand how to pass multiple arguments from one class to another class in python using threading.
I'm using the main thread to start a class, Process. Inside its run method I do some business logic and start another class, build, as a thread, passing it multiple arguments.
The run method of the build class is executed, but inside the build class I'm unable to access those arguments and hence can't proceed further.
Not sure if my approach is right? Any suggestions will be appreciated.
Below is my main class :
from threading import Thread
import logging as log
from process import Process
if __name__ == '__main__':
    try:
        proc = Process()
        proc.start()
    except Exception as e:
        #log some error
        log.exception(e)
Inside Process:
#all the dependencies are imported
class Process(Thread):
    '''
    classdocs
    '''
    def __init__(self):
        '''
        Constructor
        '''
        Thread.__init__(self)
        #other initializations
    def run(self):
        #some other logic
        self.notification(pass_some_data)
    #inside notification I'm calling another thread
    def notification(self,passed_data):
        #passed data is converted dict1
        #tup1 is being formed from another function.
        #build is a class, and if i don't pass None, i get groupname error.
        th = build(None,(tup1,),(dict1,))
        th.start()
#inside build
class build(Thread):
    def _init_(self,tup1,dict1):
        super(build,self).__init__(self)
        self.tup1 = tup1
        self.dict1 = dict1
    def run(self):
        #some business logic
        #I'm unable to get the arguments being passed here.
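Judging only from the snippet above, one likely cause is that the constructor is spelled _init_ (single underscores). Python then never calls it, and build(None, (tup1,), (dict1,)) goes straight to Thread.__init__, whose first positional parameters are group, target and name. A minimal sketch of a thread subclass that does receive its arguments follows; the class name Build, the sample tuple and dict, and the result attribute are hypothetical.

```python
from threading import Thread


class Build(Thread):
    def __init__(self, tup1, dict1):
        # Double underscores, and no extra `self` passed to the parent.
        super().__init__()
        self.tup1 = tup1
        self.dict1 = dict1
        self.result = None

    def run(self):
        # The arguments stored in __init__ are available here.
        self.result = (len(self.tup1), sorted(self.dict1))


th = Build((1, 2, 3), {"b": 2, "a": 1})
th.start()
th.join()  # wait for run() to finish before reading the result
print(th.result)
```

With the constructor defined this way there is no need to pass None as a first argument, since nothing is forwarded to the group parameter of Thread.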

Wrote script in OSX, with multiprocessing. Now windows won't play ball

The program/script I've made works on OSX and Linux. It uses selenium to scrape data from some pages, manipulates the data and saves it. In order to be more efficient, I included the multiprocessing Pool and Manager. I create a pool; for each item in a list, it calls the scrape class, starts a PhantomJS instance and scrapes. Since I'm using multiprocessing.Pool and I want a way to pass data between the processes, I read that multiprocessing.Manager was the way forward. If I wrote
manager = Manager()
info = manager.dict([])
it would create a dict that could be accessed by all processes. It all worked perfectly.
My issue is that the client wants to run this on a Windows machine (I wrote the entire thing on OSX). I assumed it would be as simple as installing Python and selenium and launching it. I had errors which later led me to writing if __name__ == '__main__': at the top of my main.py file and indenting everything to be inside it. The issue is that when I have class scrape(): outside of the if statement, it cannot see the global info, since it is declared outside of that scope. If I put class scrape(): inside the if __name__ == '__main__': block, then I get an attribute error saying
AttributeError: 'module' object has no attribute 'scrape'
And if I go back to declaring manager = Manager() and info = manager.dict([]) outside of the if __name__ == '__main__': block, then I get the error on Windows about making sure I use if __name__ == '__main__':. It doesn't seem like I can win with this project at the moment.
Code Layout...
Imports...
from multiprocessing import Pool
from multiprocessing import Manager
manager = Manager()
info = manager.dict([])
date = str(datetime.date.today())
class do_scrape():
    def __init__():
    def...
def scrape_items():#This contains code which creates a pool and then pool.map(do_scrape, s) s = a list of items
def save_scrape():
def update_price():
def main():
main()
Basically, scrape_items is called by main, then scrape_items uses pool.map(do_scrape, s), so it calls the do_scrape class and passes the list of items to it one by one. do_scrape then scrapes a web page based on the item URL in "s" and saves that info in the global info, which is the multiprocessing Manager dict. The above code does not show any if __name__ == '__main__': statements; it is an outline of how it works on my OSX setup. It runs and completes the task as is. If someone could offer a few pointers, I would appreciate it. Thanks
It would be helpful to see your code, but it sounds like you just need to explicitly pass your shared dict to scrape, like this:
import multiprocessing
from functools import partial
def scrape(info, item):
    # Use info in here
    pass
if __name__ == "__main__":
    manager = multiprocessing.Manager()
    info = manager.dict()
    pool = multiprocessing.Pool()
    func = partial(scrape, info) # use a partial to make it easy to pass the dict to pool.map
    items = [1,2,3,4,5] # This would be your actual data
    results = pool.map(func, items)
    #pool.apply_async(scrape, [info, "abc"]) # In case you're not using map...
Note that you shouldn't put all your code inside the if __name__ == "__main__": guard, just the code that's actually creating processes via multiprocessing. This includes creating the Manager and the Pool.
Any method you want to run in a child process must be declared at the top level of the module, because it has to be importable from __main__ in the child process. When you declared scrape inside the if __name__ ... guard, it could no longer be imported from the __main__ module, so you saw the AttributeError: 'module' object has no attribute 'scrape' error.
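The import requirement comes from pickling: the pool sends the callable to the child as a reference (module name plus qualified name), and the child has to look that name up again. A small sketch of the difference, using only pickle and no child processes; the function names here are made up for illustration.

```python
import pickle


def top_level_scrape(item):
    """Defined at module level, so it can be found again by name."""
    return item * 2


def make_nested():
    def nested_scrape(item):  # only exists inside make_nested()
        return item * 2
    return nested_scrape


# A module-level function pickles as a reference to its qualified name.
payload = pickle.dumps(top_level_scrape)
restored = pickle.loads(payload)

# A function that cannot be reached by name from its module cannot be pickled.
# Depending on the Python version this raises AttributeError or PicklingError.
try:
    pickle.dumps(make_nested())
    nested_ok = True
except (pickle.PicklingError, AttributeError):
    nested_ok = False

print(restored(21), nested_ok)
```

A function defined only under the guard fails in a similar way in the child: the name is looked up in a copy of the main module where the guarded block never ran.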
Edit:
Taking your example:
import datetime
import multiprocessing
from functools import partial
date = str(datetime.date.today())
#class do_scrape():
#    def __init__():
#    def...
def do_scrape(info, s):
    # do stuff
    # Also note that do_scrape should probably be a function, not a class
    pass
def scrape_items():
    # scrape_items is called by main(), which is protected by a `if __name__ ...` guard
    # so this is ok.
    manager = multiprocessing.Manager()
    info = manager.dict([])
    pool = multiprocessing.Pool()
    func = partial(do_scrape, info)
    s = [1,2,3,4,5] # Substitute with the real s
    results = pool.map(func, s)
def save_scrape():
    pass
def update_price():
    pass
def main():
    scrape_items()
if __name__ == "__main__":
    # Note that you can declare manager and info here, instead of in scrape_items, if you wanted
    #manager = multiprocessing.Manager()
    #info = manager.dict([])
    main()
One other important note here is that the first argument to map is meant to be a function rather than a class: the docs describe Pool.map as a parallel equivalent of the built-in map, which applies a function to each item.
Find the starting point of your program, and make sure you wrap only that with your if statement. For example:
Imports...
from multiprocessing import Pool
from multiprocessing import Manager
manager = Manager()
info = manager.dict([])
date = str(datetime.date.today())
class do_scrape():
    def __init__():
    def...
def scrape_items():#This contains code which creates a pool and then pool.map(do_scrape, s) s = a list of items
def save_scrape():
def update_price():
def main():
if __name__ == "__main__":
    main()
Essentially the contents of the if are only executed if you called this file directly when running your python code. If this file/module is included as an import from another file, all attributes will be defined, so you can access various attributes without actually beginning execution of the module.
Read more here:
What does if __name__ == "__main__": do?
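The effect of the guard can be observed directly by loading the same file twice, once as an import and once as a script. The sketch below writes a small throwaway module to a temporary directory (the file name guarded_demo.py and its contents are invented for the example) and uses importlib and runpy from the standard library to do both.

```python
import importlib.util
import runpy
import tempfile
from pathlib import Path

SOURCE = '''
calls = []

def main():
    calls.append("main ran")

if __name__ == "__main__":
    main()
'''

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "guarded_demo.py"
    path.write_text(SOURCE)

    # Imported: __name__ is "guarded_demo", so the guarded block is skipped.
    spec = importlib.util.spec_from_file_location("guarded_demo", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    imported_calls = module.calls

    # Run as a script: __name__ is "__main__", so main() is executed.
    script_globals = runpy.run_path(str(path), run_name="__main__")
    script_calls = script_globals["calls"]

print(imported_calls)
print(script_calls)
```

In both cases main and calls are defined; only the call under the guard differs, which is why a child process can import the module without restarting the whole program.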

python multiprocessing manager & composite pattern sharing

I'm trying to share a composite structure through a multiprocessing manager, but I ran into trouble with a "RuntimeError: maximum recursion depth exceeded" when trying to use just one of the Composite class methods.
The class is taken from code.activestate and was tested by me before inclusion into the manager.
When retrieving the class in a process and invoking its addChild() method I keep getting the RuntimeError, while outside the process it works.
The composite class inherits from a SpecialDict class, which implements a __getattr__() method.
Could it be that, while calling addChild(), the Python interpreter looks for a different __getattr__() because the right one is not proxied by the manager?
If so, the right way to make a proxy for that class/method is not clear to me.
The following code reproduces exactly this condition:
1) this is the manager.py:
from multiprocessing.managers import BaseManager
from CompositeDict import *
class PlantPurchaser():
    def __init__(self):
        self.comp = CompositeDict('Comp')
    def get_cp(self):
        return self.comp
class Manager():
    def __init__(self):
        self.comp = PlantPurchaser().get_cp()
        BaseManager.register('get_comp', callable=lambda:self.comp)
        self.m = BaseManager(address=('127.0.0.1', 50000), authkey='abracadabra')
        self.s = self.m.get_server()
        self.s.serve_forever()
2) I want to use the composite in this consumer.py:
from multiprocessing.managers import BaseManager
class Consumer():
    def __init__(self):
        BaseManager.register('get_comp')
        self.m = BaseManager(address=('127.0.0.1', 50000), authkey='abracadabra')
        self.m.connect()
        self.comp = self.m.get_comp()
        ret = self.comp.addChild('consumer')
3) everything is launched by a controller.py:
from multiprocessing import Process
class Controller():
    def __init__(self):
        for child in _run_children():
            child.join()
def _run_children():
    from manager import Manager
    from consumer import Consumer as Consumer
    procs = (
        Process(target=Manager, name='Manager' ),
        Process(target=Consumer, name='Consumer'),
    )
    for proc in procs:
        proc.daemon = 1
        proc.start()
    return procs
c = Controller()
Take a look at this related question on how to write a proxy for the CompositeDict() class, as suggested by AlberT.
The solution given by tgray works but cannot avoid race conditions.
Is it possible there is a circular reference between the classes? For example, the outer class has a reference to the composite class, and the composite class has a reference back to the outer class.
The multiprocessing manager works well, but with large, complicated class structures you are likely to run into an error where a type or reference cannot be serialized correctly. The other problem is that errors from the multiprocessing manager are very cryptic, which makes debugging failure conditions even more difficult.
I think the problem is that you have to instruct the Manager on how to manage your object, which is not a standard Python type.
In other words, you have to create a proxy for your CompositeDict.
You could look at this doc for an example: http://ruffus.googlecode.com/svn/trunk/doc/html/sharing_data_across_jobs_example.html
Python (CPython) has a default recursion limit of 1000. But you can change the default behavior like this:
import sys
sys.setrecursionlimit(n)
Where n is the number of recursions you wish to allow.
Edit:
The above answer does nothing to solve the root cause of this problem (as pointed out in the comments). It only needs to be used if you are intentionally recursing more than 1000 times. If you are in an infinite loop (like in this problem), you will eventually hit whatever limit you set.
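That point is easy to check: for unbounded recursion, raising the limit only postpones the failure. In this sketch the function recurse_forever and the extra 500 frames are arbitrary choices made for illustration.

```python
import sys


def recurse_forever(depth=0):
    # No base case: this can only end with RecursionError.
    return recurse_forever(depth + 1)


original_limit = sys.getrecursionlimit()
hit_limit = False
try:
    sys.setrecursionlimit(original_limit + 500)  # allow 500 more frames
    recurse_forever()
except RecursionError:
    hit_limit = True
finally:
    sys.setrecursionlimit(original_limit)  # restore the previous limit

print(original_limit, hit_limit)
```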
To address your actual problem, I re-wrote your code from scratch starting as simply as I could make it and built it up to what I believe is what you want:
import sys
from multiprocessing import Process
from multiprocessing.managers import BaseManager
from CompositeDict import *
class Shared():
    def __init__(self):
        self.comp = CompositeDict('Comp')
    def get_comp(self):
        return self.comp
    def set_comp(self, c):
        self.comp = c
class Manager():
    def __init__(self):
        shared = Shared()
        BaseManager.register('get_shared', callable=lambda:shared)
        mgr = BaseManager(address=('127.0.0.1', 50000), authkey='abracadabra')
        srv = mgr.get_server()
        srv.serve_forever()
class Consumer():
    def __init__(self, child_name):
        BaseManager.register('get_shared')
        mgr = BaseManager(address=('127.0.0.1', 50000), authkey='abracadabra')
        mgr.connect()
        shared = mgr.get_shared()
        comp = shared.get_comp()
        child = comp.addChild(child_name)
        shared.set_comp(comp)
        print comp
class Controller():
    def __init__(self):
        pass
    def main(self):
        m = Process(target=Manager, name='Manager')
        m.daemon = True
        m.start()
        consumers = []
        for i in xrange(3):
            p = Process(target=Consumer, name='Consumer', args=('Consumer_' + str(i),))
            p.daemon = True
            consumers.append(p)
        for c in consumers:
            c.start()
        for c in consumers:
            c.join()
        return 0
if __name__ == '__main__':
    con = Controller()
    sys.exit(con.main())
I did this all in one file, but you shouldn't have any trouble breaking it up.
I added a child_name argument to your consumer so that I could check that the CompositeDict was getting updated.
Note that there is both a getter and a setter for your CompositeDict object. When I only had a getter, each Consumer was overwriting the CompositeDict when it added a child.
This is why I also changed your registered method to get_shared instead of get_comp, as you will want access to the setter as well as the getter within your Consumer class.
Also, I don't think you want to try joining your manager process, as it will "serve forever". If you look at the source for the BaseManager (./Lib/multiprocessing/managers.py:Line 144) you'll notice that the serve_forever() function puts you into an infinite loop that is only broken by KeyboardInterrupt or SystemExit.
Bottom line is that this code works without any recursive looping (as far as I can tell), but let me know if you still experience your error.
