In what thread will calculations be performed? - python

I have the following classes, for example:

import threading

class CalculatorA:
    def functionA(self):
        # some calculations
        ...

class CalculatorB:
    def functionB(self):
        # some calculations
        ...

class CalculatorC:
    def functionC(self):
        # some calculations
        ...

class Aggregator:
    def __init__(self, objectA, objectB, objectC):
        self.objectA = objectA
        self.objectB = objectB
        self.objectC = objectC

    def aggregator_function(self):
        self.objectA.functionA()
        self.objectB.functionB()
        self.objectC.functionC()

class Worker(threading.Thread):
    def __init__(self, objectA, objectB, objectC):
        threading.Thread.__init__(self)
        self.objectA = objectA
        self.objectB = objectB
        self.objectC = objectC

    def run(self):
        aggregator = Aggregator(self.objectA, self.objectB, self.objectC)
        aggregator.aggregator_function()
My main() function:
def main():
    objectA = CalculatorA()
    objectB = CalculatorB()
    objectC = CalculatorC()
    worker = Worker(objectA, objectB, objectC)
    worker.start()
I create objects of the CalculatorA, CalculatorB, and CalculatorC classes in the main() function and pass them as parameters to the constructor of Worker. The Worker object saves them and later passes them on to the constructor of an Aggregator object inside run(), so the Aggregator is created in the separate worker thread. The Aggregator then calls functionA(), functionB(), and functionC().
My question is: in which thread will the calculations in functionA(), functionB(), and functionC() be performed? Will they run in the Worker thread, or in the main thread? Should I use threading.local storage if they run in the main thread?

Functions are called in whatever thread the calling function is run in. The fact that they're object methods doesn't change this.
Since self.objectA.functionA() is called from aggregator.aggregator_function(), and this is called from the Worker.run() method, it gets called in the Worker thread.

They'll all be run from the worker thread.
You should be able to verify this with something like print(threading.get_ident()).
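For example (a quick sketch reusing the classes from the question), printing the thread identity from both places makes the difference visible:

import threading

class CalculatorA:
    def functionA(self):
        # prints the identity of whichever thread executes this method
        print('functionA runs in thread', threading.get_ident())

def main():
    print('main runs in thread', threading.get_ident())
    worker = Worker(CalculatorA(), CalculatorB(), CalculatorC())
    worker.start()
    worker.join()  # the two printed identities will differ

main()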


class variable become empty in multiprocessing pool.map

I have a class variable in a Utils class.
from collections import defaultdict

class Utils:
    _raw_data = defaultdict(list)

    @classmethod
    def raw_data(cls):
        return cls._raw_data.copy()

    @classmethod
    def set_raw_data(cls, key, data):
        cls._raw_data[key] = data
The _raw_data dictionary is filled with key/value pairs before it is read:
...
data = [ipaddress.IPv4Network(address) for address in ip_addresses]
Utils.set_raw_data(device_name, data)
But when I try to execute a function in a multiprocessing Pool.map that reads raw_data from the Utils class, it returns an empty list.
This is the method from the parent class:

class Parent:
    ...
    def evaluate_without_prefix(self, devices):
        results = []
        print(Utils.raw_data())  # <------ this print shows that Utils.raw_data() is empty
        for network1, network2 in itertools.product(Utils.raw_data()[devices[0]],
                                                    Utils.raw_data()[devices[1]]):
            if network1.subnet_of(network2):
                results.append((devices[0], network1, devices[1], network2))
            if network2.subnet_of(network1):
                results.append((devices[1], network2, devices[0], network1))
        return results
and in the child class, I execute the method from the parent class with a multiprocessing pool:

class Child(Parent):
    ...
    def execute(self):
        pool = Pool(os.cpu_count() - 1)
        devices = list(itertools.combinations(list(Utils.raw_data().keys()), 2))
        results = pool.map(super().evaluate_without_prefix, devices)
        return results
The print() in the Parent class shows that raw_data() is empty, but the variable actually has data. The devices variable in the Child class does get data from raw_data(), but once execution enters the multiprocessing pool, raw_data() becomes empty. Any reason for this?
The problem seems to be as follows:
The data created in your main process must be serialized/deserialized using pickle so that it can be passed from the main process's address space to the address spaces of the pool processes that need to work with it. The object actually being pickled here is an instance of class Parent, since you are calling one of its methods, i.e. evaluate_without_prefix. But nowhere in that instance is there a reference to class Utils, or anything else that would cause the pool to serialize the Utils class along with the Parent instance. Consequently, when the method references class Utils in any of the pool processes, a fresh copy of Utils is created and, of course, its dictionary is not initialized.
I think the simplest change is to:
1. Make _raw_data an instance attribute rather than a class attribute (by the way, given your current usage there is no need for it to be a defaultdict).
2. Create an instance of class Utils named utils and initialize the dictionary through that reference.
3. Use the initializer and initargs arguments of the multiprocessing.Pool constructor so that each process in the pool gets a global variable named utils holding a copy of the utils instance created by the main process.
So I would organize the code along the following lines:
from multiprocessing import Pool
import itertools
import ipaddress
import os

class Utils:
    def __init__(self):
        self._raw_data = {}

    def raw_data(self):
        # No need to make a copy ???
        return self._raw_data.copy()

    def set_raw_data(self, key, data):
        self._raw_data[key] = data

def init_processes(utils_instance):
    """
    Initialize each process in the process pool with global variable utils.
    """
    global utils
    utils = utils_instance

class Parent:
    ...
    def evaluate_without_prefix(self, devices):
        results = []
        print(utils.raw_data())
        for network1, network2 in itertools.product(utils.raw_data()[devices[0]],
                                                    utils.raw_data()[devices[1]]):
            results.append([network1, network2])
        return results

class Child(Parent):
    ...
    def execute(self, utils):
        pool = Pool(os.cpu_count() - 1, initializer=init_processes, initargs=(utils,))
        # No need to make an explicit list (map will do that for you) ???
        devices = list(itertools.combinations(list(utils.raw_data().keys()), 2))
        results = pool.map(super().evaluate_without_prefix, devices)
        return results

def main():
    utils = Utils()
    # Initialize utils:
    ...
    data = [ipaddress.IPv4Network(address) for address in ip_addresses]
    utils.set_raw_data(device_name, data)
    child = Child()
    results = child.execute(utils)

if __name__ == '__main__':
    main()
Further Explanation
The following program's main process calls class method Foo.set_x to update class attribute x to 10 before creating a multiprocessing pool and invoking worker function worker, which prints out the value of Foo.x.
On Windows, which uses OS spawn to create new processes, each pool process is initialized prior to calling the worker function essentially by launching a new Python interpreter and re-executing the source program, running every statement at global scope. Hence the class definition of Foo is created by the interpreter compiling it; there is no pickling involved. But the call to Foo.set_x(10) sits inside the if __name__ == '__main__': guard and is therefore not re-executed in the child, so Foo.x will be 0.
The same program run on Linux, which uses OS fork to create new processes, gives each pool process a copy-on-write view of the main process's address space. It therefore sees the Foo class as it existed at the time the pool was created, and Foo.x will be 10.
My solution above, which uses a pool initializer to set a global variable in each pool process's address space to the Utils instance, is what is required on Windows and works on Linux as well. An alternative, of course, is to pass the Utils instance as an additional argument to your worker function instead of using a pool initializer, but this is generally less efficient: the number of processes in the pool is usually smaller than the number of times the worker function is invoked, so the pool-initializer method requires less pickling.
from multiprocessing import Pool

class Foo:
    x = 0

    @classmethod
    def set_x(cls, x):
        cls.x = x

def worker():
    print(Foo.x)

if __name__ == '__main__':
    Foo.set_x(10)
    pool = Pool(1)
    pool.apply(worker)
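For comparison, here is a minimal sketch of that alternative, shipping the instance with every task via starmap instead of using a pool initializer (the evaluate function and device_pairs variable are illustrative; Utils is the instance-based class from above):

from itertools import combinations
from multiprocessing import Pool

def evaluate(utils, devices):
    # utils is pickled and sent along with every individual task
    return [devices, utils.raw_data()[devices[0]], utils.raw_data()[devices[1]]]

if __name__ == '__main__':
    utils = Utils()
    ...  # fill utils via utils.set_raw_data(...)
    device_pairs = combinations(utils.raw_data().keys(), 2)
    with Pool() as pool:
        results = pool.starmap(evaluate, [(utils, pair) for pair in device_pairs])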

Passing arguments from one class to another class using threading python

I'm new to threading and Python. I would like to understand how to pass multiple arguments from one class to another class in Python using threading.
I'm using the main thread to start a Process class; inside its run() I do some business logic and then start another class, build, as a thread, passing multiple arguments.
The run() of the build class gets executed, but inside the build class I'm unable to access those arguments, and hence cannot proceed further.
I'm not sure whether my approach is right; any suggestions will be appreciated.
Below is my main module:

from threading import Thread
import logging as log
from process import Process

if __name__ == '__main__':
    try:
        proc = Process()
        proc.start()
    except Exception as e:
        # log some error
        ...
Inside Process:
# all the dependencies are imported

class Process(Thread):
    '''
    classdocs
    '''
    def __init__(self):
        '''
        Constructor
        '''
        Thread.__init__(self)
        # other initializations

    def run(self):
        # some other logic
        self.notification(pass_some_data)

    # inside notification I'm calling another thread
    def notification(self, passed_data):
        # passed_data is converted to dict1
        # tup1 is formed by another function
        # build is a class; if I don't pass None, I get a group-name error
        th = build(None, (tup1,), (dict1,))
        th.start()
# inside build

class build(Thread):
    def _init_(self, tup1, dict1):
        super(build, self).__init__(self)
        self.tup1 = tup1
        self.dict1 = dict1

    def run(self):
        # some business logic
        # I'm unable to get the arguments being passed here
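A likely culprit is visible in the code itself: the constructor is spelled _init_ with single underscores, so it never overrides Thread.__init__; the call build(None, (tup1,), (dict1,)) therefore lands in Thread.__init__'s (group, target, args) parameters, which would also explain the group-name error mentioned above. A corrected sketch, under that assumption:

class build(Thread):
    def __init__(self, tup1, dict1):   # double underscores, so it actually overrides
        super(build, self).__init__()  # no extra self argument here
        self.tup1 = tup1
        self.dict1 = dict1

    def run(self):
        # self.tup1 and self.dict1 are accessible now
        print(self.tup1, self.dict1)

th = build((tup1,), (dict1,))  # the leading None is no longer needed
th.start()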

Django background executor

I am trying to run multiple tasks in a queue. The tasks come from user input. What I tried was creating a singleton class with a ThreadPoolExecutor property and adding tasks to it. The tasks are added fine, but it looks like only the first addition of a set of tasks works. The following ones are added but not executed.
from concurrent.futures import ThreadPoolExecutor

class WebsiteTagScrapper:
    class __WebsiteTagScrapper:
        def __init__(self):
            self.executor = ThreadPoolExecutor(max_workers=5)

    instance = None

    def __new__(cls):  # __new__ is always a classmethod
        if not WebsiteTagScrapper.instance:
            WebsiteTagScrapper.instance = WebsiteTagScrapper.__WebsiteTagScrapper()
        return WebsiteTagScrapper.instance
I used multiprocessing in one of my projects without Celery, because I think Celery would have been overkill for my use case.
Maybe you could do something like this:
from multiprocessing import Process

class MyQueuProcess(Process):
    def __init__(self):
        super(MyQueuProcess, self).__init__()
        self.tasks = []

    def add_task(self, task):
        self.tasks.append(task)

    def run(self):
        for task in self.tasks:
            # Do your task
            ...
You just have to create an instance in your view, add your tasks, and then call start() (which invokes run() in the child process). Also, if you need to access your database, you will need to import django in the child and then call django.setup().
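For illustration, a sketch of how this could be wired into a view, assuming run() calls each queued task and that tasks are module-level functions so they can be pickled on spawn platforms (the view and task names here are hypothetical):

from django.http import HttpResponse

def scrape_example():  # hypothetical task; module-level so it is picklable
    ...

def my_view(request):  # hypothetical Django view
    proc = MyQueuProcess()
    proc.add_task(scrape_example)
    proc.start()       # the queued tasks execute in the child process
    return HttpResponse('tasks queued')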

access a method's attributes after calling a celery task

If I have a class with attributes...
class Test(object):
    def __init__(self):
        self.variable = 'test'
        self.variable2 = ''

    def testmethod(self):
        print(self.variable2)

t = Test()

@celery.task(name="tasks.application")
def application():
    t.testmethod()
    t.variable2 = '1234'

job = application.apply_async()
and I want to access the attributes of my class...
In my testing I am not able to access t.variable2 once inside my celery task. How can I get access to those attributes?
Thanks!
Tasks are executed by a separate worker process which, being a different process, does not see the values you assigned in the calling process. You need to send the data the class requires as arguments to the task, and create the instance inside the task as well:
@celery.task(name="tasks.application")
def application(variable, variable2):
    t = Test()
    t.variable = variable
    t.variable2 = variable2
    t.testmethod()

job = application.apply_async(['test', '1234'])
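If the caller also needs data produced inside the task, the usual pattern (assuming a result backend is configured) is to return it from the task and read it back through the AsyncResult; a sketch extending the task above:

@celery.task(name="tasks.application")
def application(variable, variable2):
    t = Test()
    t.variable = variable
    t.variable2 = variable2
    t.testmethod()
    return t.variable2  # returned values travel back via the result backend

job = application.apply_async(['test', '1234'])
print(job.get(timeout=10))  # blocks until the worker finishes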

python mock get calling object

I have a UUT class which instantiates Worker objects and calls their do_stuff() method.
The Worker objects use a Provider object for two things:
1. They call methods on the provider object to do some work.
2. They get notifications from the provider by subscribing a method to the provider's events.
When a worker gets a notification, it processes it and notifies the UUT object, which in response can create more Worker objects.
I've already tested each class on its own, and now I want to test UUT and Worker together. For that, I intend to mock out Provider.
import mock
import unittest
import provider

class Worker():
    def __init__(self, *args):
        resource.default_resource.subscribe('on_spam', self._on_spam)  # I'm going to patch 'resource.default_resource'

    def do_stuff(self):
        self.resource.do_stuff()

    def _on_spam(self, message):
        self._tell_uut_to_create_more_workers(message['num_of_new_workers_to_create'])

class UUT():
    def __init__(self, *args):
        self._workers = []

    def gen_worker_and_do_stuff(self, *args):
        worker = Worker(*args)
        self._workers.append(worker)
        worker.do_stuff()

class TestCase1(unittest.TestCase):
    @mock.patch('resource.default_resource', spec_set=resource.Resource)
    def test_1(self, mock_resource):
        uut = UUT()
        uut.gen_worker_and_do_stuff('Egg')  # <-- say I automagically grabbed the resulting Worker into self.workers
        self.workers[0]._on_spam({'num_of_new_workers_to_create': 5})  # <-- I also want to get hold of the newly-created workers
Is there a way to grab the worker objects generated by uut without directly accessing the _workers list in uut (which is an implementation detail)?
I guess I could do it in Worker.__init__, where the worker subscribes to provider events, so the question reduces to:
How do I extract the self of the callee when resource.default_resource.subscribe('on_spam', self._on_spam) is called?
As an application of the Dependency Inversion principle, I'd pass the Worker class as a dependency to UUT:
class UUT():
    def __init__(self, make_worker=Worker):
        self._workers = []
        self._make_worker = make_worker

    def gen_worker_and_connect(self, *args):
        worker = self._make_worker(*args)
        self._workers.append(worker)
        worker.connect()
Then, from the test, provide anything you want instead of Worker. That factory function can share the created objects with the test scope. Besides solving this particular problem, it makes the dependency explicit and independent of the UUT implementation. And you would not need to mock the resource at all, which keeps the test from depending on things unrelated to the class under test.
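For instance, a recording factory along these lines (a sketch; the test name is illustrative) lets the test capture every worker the UUT creates, including workers created later in response to notifications:

class TestCase2(unittest.TestCase):
    def test_can_observe_created_workers(self):
        created = []

        def make_worker(*args):  # test-double factory
            worker = mock.Mock()  # stands in for a real Worker
            created.append(worker)
            return worker

        uut = UUT(make_worker=make_worker)
        uut.gen_worker_and_connect('Egg')

        self.assertEqual(len(created), 1)             # the test now holds the worker
        created[0].connect.assert_called_once_with()  # and can assert on it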
