Related
I have a class variable in a Utils class.
class Utils:
_raw_data = defaultdict(list)
#classmethod
def raw_data(cls):
return cls._raw_data.copy()
#classmethod
def set_raw_data(cls, key, data):
cls._raw_data[key] = data
The _raw_data was filled with key and value pairs before it was being read.
...
data = [ipaddress.IPv4Network(address) for address in ip_addresses]
Utils.set_raw_data(device_name, data)
But when I try to execute a function in multiprocessing Pool.map that reads the raw_data from Utils class, it returns empty list.
This is the method from the parent class
class Parent:
...
def evaluate_without_prefix(self, devices):
results = []
print(Utils.raw_data()) <------ this print shows that the Utils.raw_data() is empty
for network1, network2 in itertools.product(Utils.raw_data()[devices[0]], Utils.raw_data()[devices[1]]):
if network1.subnet_of(network2):
results.append((devices[0], network1, devices[1], network2))
if network2.subnet_of(network1):
results.append((devices[1], network2, devices[0], network1))
return results
and in the child class, I execute the method from the parent class, with multiprocessing pool.
class Child(Parent):
...
def execute(self):
pool = Pool(os.cpu_count() - 1)
devices = list(itertools.combinations(list(Utils.raw_data().keys()), 2))
results = pool.map(super().evaluate_without_prefix, devices)
return results
The print() in the Parent class shows that the raw_data() is empty, but the variable actually has data, devices variable in Child class actually get data from the raw_data() but when it enters the multiprocessing pool, the raw_data() becomes empty. Any reason for this?
The problem seems to be as follows:
The class data created in your main process must be serialized/de-serialized using pickle so that it can be passed from the main process's address space to the address spaces of the processes in the multiprocessing pool that needs to work with these objects. But the class data in question is an instance of class Parent since you are calling one of its methods, i.e. valuate_without_prefix. But nowhere in that instance is there a reference to class Util or anything that would cause the multiprocessing pool to be serializing the Util class along with the Parent instance. Consequently, when that method references class Util in any of the processes, a new Util will be created and, of course, it will not have its dictionary initialized.
I think the simplest change is to:
Make attribute _raw_data an instance attribute rather than a class attribute (by the way, according to your current usage, there is no need for this to be a defaultdict).
Create an instance of class Util named util and initialize the dictionary via this reference.
Use the initializer and initargs arguments of the multiprocessing.Pool constructor to initialize each process in the multiprocessing pool to have a global variable named util that will be a copy of the util instance created by the main process.
So I would organize the code along the following lines:
class Utils:
def __init__(self):
self._raw_data = {}
def raw_data(self):
# No need to make a copy ???
return self._raw_data.copy()
def set_raw_data(self, key, data):
self._raw_data[key] = data
def init_processes(utils_instance):
"""
Initialize each process in the process pool with global variable utils.
"""
global utils
utils = utils_instance
class Parent:
...
def evaluate_without_prefix(self, devices):
results = []
print(utils.raw_data())
for network1, network2 in itertools.product(utils.raw_data()[devices[0]], utils.raw_data()[devices[1]]):
results.append([network1, network2])
return results
class Child(Parent):
...
def execute(self, utils):
pool = Pool(os.cpu_count() - 1, initializer=init_processes, initargs=(utils,))
# No need to make an explicit list (map will do that for you) ???
devices = list(itertools.combinations(list(utils.raw_data().keys()), 2))
results = pool.map(super().evaluate_without_prefix, devices)
return results
def main():
utils = Utils()
# Initialize utils:
...
data = [ipaddress.IPv4Network(address) for address in ip_addresses]
utils.set_raw_data(device_name, data)
child = Child()
results = child.execute(utils)
if __name__ == '__main__':
main()
Further Explanation
The following program's main process calls class method Foo.set_x to update class attribute x to the value of 10 before creating a multiprocessing pool and invoking worker function worker, which prints out the value of Foo.x.
On Windows, which uses OS spawn to create new processes, the process in the pool is initialized prior to calling the worker function essentially by launching a new Python interpreter and re-executing the source program executing every statement at global scope. Hence the class definition of Foo is created by the Python interpreter compiling it; there is no pickling involved. But Foo.x will be 0.
The same program run on Linux, which uses OS fork to create new processes, inherits a copy-on-write address space from the main process. Therefore, it will have a copy of the Foo class as it existed at the time the multiprocessing pool was created and Foo.x will be 10.
My solution above, which uses a pool initializer to set a global variable in each pool's process's address space to the value of the Util instance, is what is required for Windows platforms and will work also for Linux. An alternative, of course, is to pass the Util instance as an additional argument to your worker function instead of using a pool initializer, but this is generally not as efficient because generally the number of processes in the pool is less than the number of times the worker function is being invoked so less pickling will be required with the pool initializer method.
from multiprocessing import Pool
class Foo:
x = 0
#classmethod
def set_x(cls, x):
cls.x = x
def worker():
print(Foo.x)
if __name__ == '__main__':
Foo.set_x(10)
pool = Pool(1)
pool.apply(worker)
I do not know why, but I get this strange error whenever I try to pass to the method of a shared object shared custom class object. Python version: 3.6.3
Code:
from multiprocessing.managers import SyncManager
class MyManager(SyncManager): pass
class MyClass: pass
class Wrapper:
def set(self, ent):
self.ent = ent
MyManager.register('MyClass', MyClass)
MyManager.register('Wrapper', Wrapper)
if __name__ == '__main__':
manager = MyManager()
manager.start()
try:
obj = manager.MyClass()
lst = manager.list([1,2,3])
collection = manager.Wrapper()
collection.set(lst) # executed fine
collection.set(obj) # raises error
except Exception as e:
raise
Error:
---------------------------------------------------------------------------
Traceback (most recent call last):
File "D:\Program Files\Python363\lib\multiprocessing\managers.py", line 228, in serve_client
request = recv()
File "D:\Program Files\Python363\lib\multiprocessing\connection.py", line 251, in recv
return _ForkingPickler.loads(buf.getbuffer())
File "D:\Program Files\Python363\lib\multiprocessing\managers.py", line 881, in RebuildProxy
return func(token, serializer, incref=incref, **kwds)
TypeError: AutoProxy() got an unexpected keyword argument 'manager_owned'
---------------------------------------------------------------------------
What's the problem here?
I ran into this too, as noted, this is a bug in Python multiprocessing (see issue #30256) and the pull request that corrects this has not yet been merged. The pull request has since been superseded by another PR that makes the same change but adds a test as well.
Apart from manually patching your local installation, you have three other options:
you could use the MakeProxyType() callable to specify your proxytype, without relying on the AutoProxy proxy generator,
you could define a custom proxy class,
you can patch the bug with a monkeypatch
I'll describe those options below, after explaining what AutoProxy does:
What's the point of the AutoProxy class
The multiprocessing Manager pattern gives access to shared values by putting the values all in the same, dedicated 'canonical values server' process. All other processes (clients) talk to the server through proxies that then pass messages back and forth with the server.
The server does need to know what methods are acceptable for the type of object, however, so clients can produce a proxy object with the same methods. This is what the AutoProxy object is for. Whenever a client needs a new instance of your registered class, the default proxy the client creates is an AutoProxy, which then asks the server to tell it what methods it can use.
Once it has the method names, it calls MakeProxyType to construct a new class and then creates an instance for that class to return.
All this is deferred until you actually need an instance of the proxied type, so in principle AutoProxy saves a little bit of memory if you are not using certain classes you have registered. It's very little memory, however, and the downside is that this process has to take place in each client process.
These proxy objects use reference counting to track when the server can remove the canonical value. It is that part that is broken in the AutoProxy callable; a new argument is passed to the proxy type to disable reference counting when the proxy object is being created in the server process rather than in a client but the AutoProxy type wasn't updated to support this.
So, how can you fix this? Here are those 3 options:
Use the MakeProxyType() callable
As mentioned, AutoProxy is really just a call (via the server) to get the public methods of the type, and a call to MakeProxyType(). You can just make these calls yourself, when registering.
So, instead of
from multiprocessing.managers import SyncManager
SyncManager.register("YourType", YourType)
use
from multiprocessing.managers import SyncManager, MakeProxyType, public_methods
# arguments: classname, sequence of method names
YourTypeProxy = MakeProxyType("YourType", public_methods(YourType))
SyncManager.register("YourType", YourType, YourTypeProxy)
Feel free to inline the MakeProxyType() call there.
If you were using the exposed argument to SyncManager.register(), you should pass those names to MakeProxyType instead:
# SyncManager.register("YourType", YourType, exposed=("foo", "bar"))
# becomes
YourTypeProxy = MakeProxyType("YourType", ("foo", "bar"))
SyncManager.register("YourType", YourType, YourTypeProxy)
You'd have to do this for all the pre-registered types, too:
from multiprocessing.managers import SyncManager, AutoProxy, MakeProxyType, public_methods
registry = SyncManager._registry
for typeid, (callable, exposed, method_to_typeid, proxytype) in registry.items():
if proxytype is not AutoProxy:
continue
create_method = hasattr(managers.SyncManager, typeid)
if exposed is None:
exposed = public_methods(callable)
SyncManager.register(
typeid,
callable=callable,
exposed=exposed,
method_to_typeid=method_to_typeid,
proxytype=MakeProxyType(f"{typeid}Proxy", exposed),
create_method=create_method,
)
Create custom proxies
You could not rely on multiprocessing creating a proxy for you. You could just write your own. The proxy is used in all processes except for the special 'managed values' server process, and the proxy should pass messages back and forth. This is not an option for the already-registered types, of course, but I'm mentioning it here because for your own types this offers opportunities for optimisations.
Note that you should have methods for all interactions that need to go back to the 'canonical' value instance, so you'd need to use properties to handle normal attributes or add __getattr__, __setattr__ and __delattr__ methods as needed.
The advantage is that you can have very fine-grained control over what methods actually need to exchange data with the server process; in my specific example, my proxy class caches information that is immutable (the values would never change once the object was created), but were used often. That includes a flag value that controls if other methods would do something, so the proxy could just check the flag value and not talk to the server process if not set. Something like this:
class FooProxy(BaseProxy):
# what methods the proxy is allowed to access through calls
_exposed_ = ("__getattribute__", "expensive_method", "spam")
#property
def flag(self):
try:
v = self._flag
except AttributeError:
# ask for the value from the server, "realvalue.flag"
# use __getattribute__ because it's an attribute, not a property
v = self._flag = self._callmethod("__getattribute__", ("flag",))
return flag
def expensive_method(self, *args, **kwargs):
if self.flag: # cached locally!
return self._callmethod("expensive_method", args, kwargs)
def spam(self, *args, **kwargs):
return self._callmethod("spam", args, kwargs
SyncManager.register("Foo", Foo, FooProxy)
Because MakeProxyType() returns a BaseProxy subclass, you can combine that class with a custom subclass, saving yourself having to write any methods that just consist of return self._callmethod(...):
# a base class with the methods generated for us. The second argument
# doubles as the 'permitted' names, stored as _exposed_
FooProxyBase = MakeProxyType(
"FooProxyBase",
("__getattribute__", "expensive_method", "spam"),
)
class FooProxy(FooProxyBase):
#property
def flag(self):
try:
v = self._flag
except AttributeError:
# ask for the value from the server, "realvalue.flag"
# use __getattribute__ because it's an attribute, not a property
v = self._flag = self._callmethod("__getattribute__", ("flag",))
return flag
def expensive_method(self, *args, **kwargs):
if self.flag: # cached locally!
return self._callmethod("expensive_method", args, kwargs)
def spam(self, *args, **kwargs):
return self._callmethod("spam", args, kwargs
SyncManager.register("Foo", Foo, FooProxy)
Again, this won't solve the issue with standard types nested inside other proxied values.
Apply a monkeypatch
I use this to fix the AutoProxy callable, this should automatically avoid patching when you are running a Python version where the fix has already been applied to the source code:
# Backport of https://github.com/python/cpython/pull/4819
# Improvements to the Manager / proxied shared values code
# broke handling of proxied objects without a custom proxy type,
# as the AutoProxy function was not updated.
#
# This code adds a wrapper to AutoProxy if it is missing the
# new argument.
import logging
from inspect import signature
from functools import wraps
from multiprocessing import managers
logger = logging.getLogger(__name__)
orig_AutoProxy = managers.AutoProxy
#wraps(managers.AutoProxy)
def AutoProxy(*args, incref=True, manager_owned=False, **kwargs):
# Create the autoproxy without the manager_owned flag, then
# update the flag on the generated instance. If the manager_owned flag
# is set, `incref` is disabled, so set it to False here for the same
# result.
autoproxy_incref = False if manager_owned else incref
proxy = orig_AutoProxy(*args, incref=autoproxy_incref, **kwargs)
proxy._owned_by_manager = manager_owned
return proxy
def apply():
if "manager_owned" in signature(managers.AutoProxy).parameters:
return
logger.debug("Patching multiprocessing.managers.AutoProxy to add manager_owned")
managers.AutoProxy = AutoProxy
# re-register any types already registered to SyncManager without a custom
# proxy type, as otherwise these would all be using the old unpatched AutoProxy
SyncManager = managers.SyncManager
registry = managers.SyncManager._registry
for typeid, (callable, exposed, method_to_typeid, proxytype) in registry.items():
if proxytype is not orig_AutoProxy:
continue
create_method = hasattr(managers.SyncManager, typeid)
SyncManager.register(
typeid,
callable=callable,
exposed=exposed,
method_to_typeid=method_to_typeid,
create_method=create_method,
)
Import the above and call the apply() function to fix multiprocessing. Do so before you start the manager server!
Solution editing multiprocessing source code
The original answer by Sergey requires you to edit multiprocessing source code as follows:
Find your multiprocessing package (mine, installed via Anaconda, was in /anaconda3/lib/python3.6/multiprocessing).
Open managers.py
Add the key argument manager_owned=True to the AutoProxy function.
Original AutoProxy:
def AutoProxy(token, serializer, manager=None, authkey=None,
exposed=None, incref=True):
...
Edited AutoProxy:
def AutoProxy(token, serializer, manager=None, authkey=None,
exposed=None, incref=True, manager_owned=True):
...
Solution via code, at run time
I have managed to solve the unexpected keyword argument TypeError exception without editing directly the source code of multiprocessing by instead adding these few lines of code where I use multiprocessing's Managers:
import multiprocessing
# Backup original AutoProxy function
backup_autoproxy = multiprocessing.managers.AutoProxy
# Defining a new AutoProxy that handles unwanted key argument 'manager_owned'
def redefined_autoproxy(token, serializer, manager=None, authkey=None,
exposed=None, incref=True, manager_owned=True):
# Calling original AutoProxy without the unwanted key argument
return backup_autoproxy(token, serializer, manager, authkey,
exposed, incref)
# Updating AutoProxy definition in multiprocessing.managers package
multiprocessing.managers.AutoProxy = redefined_autoproxy
Found temporary solution here.
I've managed to fix it by adding needed keyword to initializer of AutoProxy in multiprocessing\managers.py Though, I don't know if this kwarg is responsible for anything.
I am new to Python, so I apologize if this is a duplicate or overly simple question. I have written a coordinator class that calls two other classes that use the kafka-python library to send/read data from Kafka. I want to write a unit test for my coordinator class but I'm having trouble figuring out how to best to go about this. I was hoping that I could make an alternate constructor that I could pass my mocked objects into, but this doesn't seem to be working as I get an error that test_mycoordinator cannot be resolved. Am I going about testing this class the wrong way? Is there a pythonic way I should be testing it?
Here is what my test class looks like so far:
import unittest
from mock import Mock
from mypackage import mycoordinator
class MyTest(unittest.TestCase):
def setUpModule(self):
# Create a mock producer
producer_attributes = ['__init__', 'run', 'stop']
mock_producer = Mock(name='Producer', spec=producer_attributes)
# Create a mock consumer
consumer_attributes = ['__init__', 'run', 'stop']
data_out = [{u'dataObjectID': u'test1'},
{u'dataObjectID': u'test2'},
{u'dataObjectID': u'test3'}]
mock_consumer = Mock(
name='Consumer', spec=consumer_attributes, return_value=data_out)
self.coor = mycoordinator.test_mycoordinator(mock_producer, mock_consumer)
def test_send_data(self):
# Create some data and send it to the producer
count = 0
while count < 3:
count += 1
testName = 'test' + str(count)
self.coor.sendData(testName , None)
And here is the class I am trying to test:
class MyCoordinator():
def __init__(self):
# Process Command Line Arguments using argparse
...
# Initialize the producer and the consumer
self.myproducer = producer.Producer(self.servers,
self.producer_topic_name)
self.myconsumer = consumer.Consumer(self.servers,
self.consumer_topic_name)
# Constructor used for testing -- DOES NOT WORK
#classmethod
def test_mycoordinator(cls, mock_producer, mock_consumer):
cls.myproducer = mock_producer
cls.myconsumer = mock_consumer
# Send the data to the producer
def sendData(self, data, key):
self.myproducer.run(data, key)
# Receive data from the consumer
def getData(self):
data = self.myconsumer.run()
return data
There is no need to provide a separate constructor. Mocking patches your code to replace objects with mocks. Just use the mock.patch() decorator on your test methods; it'll pass in references to the generated mock objects.
Both producer.Producer() and consumer.Consumer() are then mocked out before you create the instance:
import mock
class MyTest(unittest.TestCase):
#mock.patch('producer.Producer', autospec=True)
#mock.patch('consumer.Consumer', autospec=True)
def test_send_data(self, mock_consumer, mock_producer):
# configure the consumer instance run method
consumer_instance = mock_consumer.return_value
consumer_instance.run.return_value = [
{u'dataObjectID': u'test1'},
{u'dataObjectID': u'test2'},
{u'dataObjectID': u'test3'}]
coor = MyCoordinator()
# Create some data and send it to the producer
for count in range(3):
coor.sendData('test{}'.format(count) , None)
# Now verify that the mocks have been called correctly
mock_producer.assert_has_calls([
mock.Call('test1', None),
mock.Call('test2', None),
mock.Call('test3', None)])
So the moment test_send_data is called, the mock.patch() code replaces the producer.Producer reference with a mock object. Your MyCoordinator class then uses those mock objects rather than the real code. calling producer.Producer() returns a new mock object (the same object that mock_producer.return_value references), etc.
I've made the assumption that producer and consumer are top-level module names. If they are not, provide the full import path. From the mock.patch() documentation:
target should be a string in the form 'package.module.ClassName'. The target is imported and the specified object replaced with the new object, so the target must be importable from the environment you are calling patch() from. The target is imported when the decorated function is executed, not at decoration time.
I am trying to return values from subprocesses but these values are unfortunately unpicklable. So I used global variables in threads module with success but have not been able to retrieve updates done in subprocesses when using multiprocessing module. I hope I'm missing something.
The results printed at the end are always the same as initial values given the vars dataDV03 and dataDV04. The subprocesses are updating these global variables but these global variables remain unchanged in the parent.
import multiprocessing
# NOT ABLE to get python to return values in passed variables.
ants = ['DV03', 'DV04']
dataDV03 = ['', '']
dataDV04 = {'driver': '', 'status': ''}
def getDV03CclDrivers(lib): # call global variable
global dataDV03
dataDV03[1] = 1
dataDV03[0] = 0
# eval( 'CCL.' + lib + '.' + lib + '( "DV03" )' ) these are unpicklable instantiations
def getDV04CclDrivers(lib, dataDV04): # pass global variable
dataDV04['driver'] = 0 # eval( 'CCL.' + lib + '.' + lib + '( "DV04" )' )
if __name__ == "__main__":
jobs = []
if 'DV03' in ants:
j = multiprocessing.Process(target=getDV03CclDrivers, args=('LORR',))
jobs.append(j)
if 'DV04' in ants:
j = multiprocessing.Process(target=getDV04CclDrivers, args=('LORR', dataDV04))
jobs.append(j)
for j in jobs:
j.start()
for j in jobs:
j.join()
print 'Results:\n'
print 'DV03', dataDV03
print 'DV04', dataDV04
I cannot post to my question so will try to edit the original.
Here is the object that is not picklable:
In [1]: from CCL import LORR
In [2]: lorr=LORR.LORR('DV20', None)
In [3]: lorr
Out[3]: <CCL.LORR.LORR instance at 0x94b188c>
This is the error returned when I use a multiprocessing.Pool to return the instance back to the parent:
Thread getCcl (('DV20', 'LORR'),)
Process PoolWorker-1:
Traceback (most recent call last):
File "/alma/ACS-10.1/casa/lib/python2.6/multiprocessing/process.py", line 232, in _bootstrap
self.run()
File "/alma/ACS-10.1/casa/lib/python2.6/multiprocessing/process.py", line 88, in run
self._target(*self._args, **self._kwargs)
File "/alma/ACS-10.1/casa/lib/python2.6/multiprocessing/pool.py", line 71, in worker
put((job, i, result))
File "/alma/ACS-10.1/casa/lib/python2.6/multiprocessing/queues.py", line 366, in put
return send(obj)
UnpickleableError: Cannot pickle <type 'thread.lock'> objects
In [5]: dir(lorr)
Out[5]:
['GET_AMBIENT_TEMPERATURE',
'GET_CAN_ERROR',
'GET_CAN_ERROR_COUNT',
'GET_CHANNEL_NUMBER',
'GET_COUNT_PER_C_OP',
'GET_COUNT_REMAINING_OP',
'GET_DCM_LOCKED',
'GET_EFC_125_MHZ',
'GET_EFC_COMB_LINE_PLL',
'GET_ERROR_CODE_LAST_CAN_ERROR',
'GET_INTERNAL_SLAVE_ERROR_CODE',
'GET_MAGNITUDE_CELSIUS_OP',
'GET_MAJOR_REV_LEVEL',
'GET_MINOR_REV_LEVEL',
'GET_MODULE_CODES_CDAY',
'GET_MODULE_CODES_CMONTH',
'GET_MODULE_CODES_DIG1',
'GET_MODULE_CODES_DIG2',
'GET_MODULE_CODES_DIG4',
'GET_MODULE_CODES_DIG6',
'GET_MODULE_CODES_SERIAL',
'GET_MODULE_CODES_VERSION_MAJOR',
'GET_MODULE_CODES_VERSION_MINOR',
'GET_MODULE_CODES_YEAR',
'GET_NODE_ADDRESS',
'GET_OPTICAL_POWER_OFF',
'GET_OUTPUT_125MHZ_LOCKED',
'GET_OUTPUT_2GHZ_LOCKED',
'GET_PATCH_LEVEL',
'GET_POWER_SUPPLY_12V_NOT_OK',
'GET_POWER_SUPPLY_15V_NOT_OK',
'GET_PROTOCOL_MAJOR_REV_LEVEL',
'GET_PROTOCOL_MINOR_REV_LEVEL',
'GET_PROTOCOL_PATCH_LEVEL',
'GET_PROTOCOL_REV_LEVEL',
'GET_PWR_125_MHZ',
'GET_PWR_25_MHZ',
'GET_PWR_2_GHZ',
'GET_READ_MODULE_CODES',
'GET_RX_OPT_PWR',
'GET_SERIAL_NUMBER',
'GET_SIGN_OP',
'GET_STATUS',
'GET_SW_REV_LEVEL',
'GET_TE_LENGTH',
'GET_TE_LONG_FLAG_SET',
'GET_TE_OFFSET_COUNTER',
'GET_TE_SHORT_FLAG_SET',
'GET_TRANS_NUM',
'GET_VDC_12',
'GET_VDC_15',
'GET_VDC_7',
'GET_VDC_MINUS_7',
'SET_CLEAR_FLAGS',
'SET_FPGA_LOGIC_RESET',
'SET_RESET_AMBSI',
'SET_RESET_DEVICE',
'SET_RESYNC_TE',
'STATUS',
'_HardwareDevice__componentName',
'_HardwareDevice__hw',
'_HardwareDevice__stickyFlag',
'_LORRBase__logger',
'__del__',
'__doc__',
'__init__',
'__module__',
'_devices',
'clearDeviceCommunicationErrorAlarm',
'getControlList',
'getDeviceCommunicationErrorCounter',
'getErrorMessage',
'getHwState',
'getInternalSlaveCanErrorMsg',
'getLastCanErrorMsg',
'getMonitorList',
'hwConfigure',
'hwDiagnostic',
'hwInitialize',
'hwOperational',
'hwSimulation',
'hwStart',
'hwStop',
'inErrorState',
'isMonitoring',
'isSimulated']
In [6]:
When you use multiprocessing to open a second process, an entirely new instance of Python, with its own global state, is created. That global state is not shared, so changes made by child processes to global variables will be invisible to the parent process.
Additionally, most of the abstractions that multiprocessing provides use pickle to transfer data. All data transferred using proxies must be pickleable; that includes all the objects that a Manager provides. Relevant quotations (my emphasis):
Ensure that the arguments to the methods of proxies are picklable.
And (in the Manager section):
Other processes can access the shared objects by using proxies.
Queues also require pickleable data; the docs don't say so, but a quick test confirms it:
import multiprocessing
import pickle
class Thing(object):
def __getstate__(self):
print 'got pickled'
return self.__dict__
def __setstate__(self, state):
print 'got unpickled'
self.__dict__.update(state)
q = multiprocessing.Queue()
p = multiprocessing.Process(target=q.put, args=(Thing(),))
p.start()
print q.get()
p.join()
Output:
$ python mp.py
got pickled
got unpickled
<__main__.Thing object at 0x10056b350>
The one approach that might work for you, if you really can't pickle the data, is to find a way to store it as a ctype object; a reference to the memory can then be passed to a child process. This seems pretty dodgy to me; I've never done it. But it might be a possible solution for you.
Given your update, it seems like you need to know a lot more about the internals of a LORR. Is LORR a class? Can you subclass from it? Is it a subclass of something else? What's its MRO? (Try LORR.__mro__ and post the output if it works.) If it's a pure python object, it might be possible to subclass it, creating a __setstate__ and a __getstate__ to enable pickling.
Another approach might be to figure out how to get the relevant data out of a LORR instance and pass it via a simple string. Since you say that you really just want to call the methods of the object, why not just do so using Queues to send messages back and forth? In other words, something like this (schematically):
Main Process Child 1 Child 2
LORR 1 LORR 2
child1_in_queue -> get message 'foo'
call 'foo' method
child1_out_queue <- return foo data string
child2_in_queue -> get message 'bar'
call 'bar' method
child2_out_queue <- return bar data string
#DBlas gives you a quick url and reference to the Manager class in an answer, but I think its still a bit vague so I thought it might be helpful for you to just see it applied...
import multiprocessing
from multiprocessing import Manager
ants = ['DV03', 'DV04']
def getDV03CclDrivers(lib, data_dict):
data_dict[1] = 1
data_dict[0] = 0
def getDV04CclDrivers(lib, data_list):
data_list['driver'] = 0
if __name__ == "__main__":
manager = Manager()
dataDV03 = manager.list(['', ''])
dataDV04 = manager.dict({'driver': '', 'status': ''})
jobs = []
if 'DV03' in ants:
j = multiprocessing.Process(
target=getDV03CclDrivers,
args=('LORR', dataDV03))
jobs.append(j)
if 'DV04' in ants:
j = multiprocessing.Process(
target=getDV04CclDrivers,
args=('LORR', dataDV04))
jobs.append(j)
for j in jobs:
j.start()
for j in jobs:
j.join()
print 'Results:\n'
print 'DV03', dataDV03
print 'DV04', dataDV04
Because multiprocessing actually uses separate processes, you cannot simply share global variables because they will be in completely different "spaces" in memory. What you do to a global under one process will not reflect in another. Though I admit that it seems confusing since the way you see it, its all living right there in the same piece of code, so "why shouldn't those methods have access to the global"? Its harder to wrap your head around the idea that they will be running in different processes.
The Manager class is given to act as a proxy for data structures that can shuttle info back and forth for you between processes. What you will do is create a special dict and list from a manager, pass them into your methods, and operate on them locally.
Un-pickle-able data
For your specialize LORR object, you might need to create something like a proxy that can represent the pickable state of the instance.
Not super robust or tested much, but gives you the idea.
class LORRProxy(object):
def __init__(self, lorrObject=None):
self.instance = lorrObject
def __getstate__(self):
# how to get the state data out of a lorr instance
inst = self.instance
state = dict(
foo = inst.a,
bar = inst.b,
)
return state
def __setstate__(self, state):
# rebuilt a lorr instance from state
lorr = LORR.LORR()
lorr.a = state['foo']
lorr.b = state['bar']
self.instance = lorr
When using multiprocess, the only way to pass objects between processes is to use Queue or Pipe; globals are not shared. Objects must be pickleable, so multiprocess won't help you here.
You could also use a multiprocessing Array. This allows you to have a shared state between processes and is probably the closest thing to a global variable.
At the top of main, declare an Array. The first argument 'i' says it will be integers. The second argument gives the initial values:
shared_dataDV03 = multiprocessing.Array ('i', (0, 0)) #a shared array
Then pass this array to the process as an argument:
j = multiprocessing.Process(target=getDV03CclDrivers, args=('LORR',shared_dataDV03))
You have to receive the array argument in the function being called, and then you can modify it within the function:
def getDV03CclDrivers(lib,arr): # call global variable
arr[1]=1
arr[0]=0
The array is shared with the parent, so you can print out the values at the end in the parent:
print 'DV03', shared_dataDV03[:]
And it will show the changes:
DV03 [0, 1]
I use p.map() to spin off a number of processes to remote servers and print the results when they come back at unpredictable times:
Servers=[...]
from multiprocessing import Pool
p=Pool(len(Servers))
p.map(DoIndividualSummary, Servers)
This worked fine if DoIndividualSummary used print for the results, but the overall result was in unpredictable order, which made interpretation difficult. I tried a number of approaches to use global variables but ran into problems. Finally, I succeeded with sqlite3.
Before p.map(), open a sqlite connection and create a table:
import sqlite3
conn=sqlite3.connect('servers.db') # need conn for commit and close
db=conn.cursor()
try: db.execute('''drop table servers''')
except: pass
db.execute('''CREATE TABLE servers (server text, serverdetail text, readings text)''')
conn.commit()
Then, when returning from DoIndividualSummary(), save the results into the table:
db.execute('''INSERT INTO servers VALUES (?,?,?)''', (server,serverdetail,readings))
conn.commit()
return
After the map() statement, print the results:
db.execute('''select * from servers order by server''')
rows=db.fetchall()
for server,serverdetail,readings in rows: print serverdetail,readings
May seem like overkill but it was simpler for me than the recommended solutions.
I have multiple scripts that are exporting a same interface and they're executed using execfile() in insulated scope.
The thing is, I want them to share some resources so that each new script doesn't have to load them again from the start, thus loosing starting speed and using unnecessary amount of RAM.
The scripts are in reality much better encapsulated and guarded from malicious plug-ins than presented in example below, that's where problems for me begins.
The thing is, I want the script that creates a resource to be able to fill it with data, remove data or remove a resource, and of course access it's data.
But other scripts shouldn't be able to change another's scripts resource, just read it. I want to be sure that newly installed plug-ins cannot interfere with already loaded and running ones via abuse of shared resources.
Example:
class SharedResources:
# Here should be a shared resource manager that I tried to write
# but got stuck. That's why I ask this long and convoluted question!
# Some beginning:
def __init__ (self, owner):
self.owner = owner
def __call__ (self):
# Here we should return some object that will do
# required stuff. Read more for details.
pass
class plugin (dict):
def __init__ (self, filename):
dict.__init__(self)
# Here some checks and filling with secure versions of __builtins__ etc.
# ...
self["__name__"] = "__main__"
self["__file__"] = filename
# Add a shared resources manager to this plugin
self["SharedResources"] = SharedResources(filename)
# And then:
execfile(filename, self, self)
# Expose the plug-in interface to outside world:
def __getattr__ (self, a):
return self[a]
def __setattr__ (self, a, v):
self[a] = v
def __delattr__ (self, a):
del self[a]
# Note: I didn't use self.__dict__ because this makes encapsulation easier.
# In future I won't use object itself at all but separate dict to do it. For now let it be
----------------------------------------
# An example of two scripts that would use shared resource and be run with plugins["name"] = plugin("<filename>"):
# Presented code is same in both scripts, what comes after will be different.
def loadSomeResource ():
# Do it here...
return loadedresource
# Then Load this resource if it's not already loaded in shared resources, if it isn't then add loaded resource to shared resources:
shr = SharedResources() # This would be an instance allowing access to shared resources
if not shr.has_key("Default Resources"):
shr.create("Default Resources")
if not shr["Default Resources"].has_key("SomeResource"):
shr["Default Resources"].add("SomeResource", loadSomeResource())
resource = shr["Default Resources"]["SomeResource"]
# And then we use normally resource variable that can be any object.
# Here I Used category "Default Resources" to add and/or retrieve a resource named "SomeResource".
# I want more categories so that plugins that deal with audio aren't mixed with plug-ins that deal with video for instance. But this is not strictly needed.
# Here comes code specific for each plug-in that will use shared resource named "SomeResource" from category "Default Resources".
...
# And end of plugin script!
----------------------------------------
# And then, in main program we load plug-ins:
import os
plugins = {} # Here we store all loaded plugins
for x in os.listdir("plugins"):
plugins[x] = plugin(x)
Let say that our two scripts are stored in plugins directory and are both using some WAVE files loaded into memory.
Plugin that loads first will load the WAVE and put it into RAM.
The other plugin will be able to access already loaded WAVE but not to replace or delete it, thus messing with other plugin.
Now, I want each resource to have an owner, some id or filename of the plugin script, and that this resource is writable only by it's owner.
No tweaking or workarounds should enable the other plugin to access the first one.
I almost did it and then got stuck, and my head is spining with concepts that when implemented do the thing, but only partially.
This eats me, so I cannot concentrate any more. Any suggestion is more than welcome!
Adding:
This is what I use now without any safety included:
# Dict that will hold a category of resources (should implement some security):
class ResourceCategory (dict):
def __getattr__ (self, i): return self[i]
def __setattr__ (self, i, v): self[i] = v
def __delattr__ (self, i): del self[i]
SharedResources = {} # Resource pool
class ResourceManager:
def __init__ (self, owner):
self.owner = owner
def add (self, category, name, value):
if not SharedResources.has_key(category):
SharedResources[category] = ResourceCategory()
SharedResources[category][name] = value
def get (self, category, name):
return SharedResources[category][name]
def rem (self, category, name=None):
if name==None: del SharedResources[category]
else: del SharedResources[category][name]
def __call__ (self, category):
if not SharedResources.has_key(category):
SharedResources[category] = ResourceCategory()
return SharedResources[category]
__getattr__ = __getitem__ = __call__
# When securing, this must not be left as this, it is unsecure, can provide a way back to SharedResources pool:
has_category = has_key = SharedResources.has_key
Now a plugin capsule:
class plugin(dict):
def __init__ (self, path, owner):
dict.__init__()
self["__name__"] = "__main__"
# etc. etc.
# And when adding resource manager to the plugin, register it with this plugin as an owner
self["SharedResources"] = ResourceManager(owner)
# ...
execfile(path, self, self)
# ...
Example of a plugin script:
#-----------------------------------
# Get a category we want. (Using __call__() ) Note: If a category doesn't exist, it is created automatically.
AudioResource = SharedResources("Audio")
# Use an MP3 resource (let say a bytestring):
if not AudioResource.has_key("Beep"):
f = open("./sounds/beep.mp3", "rb")
Audio.Beep = f.read()
f.close()
# Take a reference out for fast access and nicer look:
beep = Audio.Beep # BTW, immutables doesn't propagate as references by themselves, doesn't they? A copy will be returned, so the RAM space usage will increase instead. Immutables shall be wrapped in a composed data type.
This works perfectly but, as I said, messing resources is too much easy here.
I would like an instance of ResourceManager() to be in charge to whom return what version of stored data.
So, my general approach would be this.
Have a central shared resource pool. Access through this pool would be read-only for everybody. Wrap all data in the shared pool so that no one "playing by the rules" can edit anything in it.
Each agent (plugin) maintains knowledge of what it "owns" at the time it loads it. It keeps a read/write reference for itself, and registers a reference to the resource to the centralized read-only pool.
When an plugin is loaded, it gets a reference to the central, read-only pool that it can register new resources with.
So, only addressing the issue of python native data structures (and not instances of custom classes), a fairly locked down system of read-only implementations is as follows. Note that the tricks that are used to lock them down are the same tricks that someone could use to get around the locks, so the sandboxing is very weak if someone with a little python knowledge is actively trying to break it.
import collections as _col
import sys
if sys.version_info >= (3, 0):
immutable_scalar_types = (bytes, complex, float, int, str)
else:
immutable_scalar_types = (basestring, complex, float, int, long)
# calling this will circumvent any control an object has on its own attribute lookup
getattribute = object.__getattribute__
# types that will be safe to return without wrapping them in a proxy
immutable_safe = immutable_scalar_types
def add_immutable_safe(cls):
# decorator for adding a new class to the immutable_safe collection
# Note: only ImmutableProxyContainer uses it in this initial
# implementation
global immutable_safe
immutable_safe += (cls,)
return cls
def get_proxied(proxy):
# circumvent normal object attribute lookup
return getattribute(proxy, "_proxied")
def set_proxied(proxy, proxied):
# circumvent normal object attribute setting
object.__setattr__(proxy, "_proxied", proxied)
def immutable_proxy_for(value):
# Proxy for known container types, reject all others
if isinstance(value, _col.Sequence):
return ImmutableProxySequence(value)
elif isinstance(value, _col.Mapping):
return ImmutableProxyMapping(value)
elif isinstance(value, _col.Set):
return ImmutableProxySet(value)
else:
raise NotImplementedError(
"Return type {} from an ImmutableProxyContainer not supported".format(
type(value)))
#add_immutable_safe
class ImmutableProxyContainer(object):
# the only names that are allowed to be looked up on an instance through
# normal attribute lookup
_allowed_getattr_fields = ()
def __init__(self, proxied):
set_proxied(self, proxied)
def __setattr__(self, name, value):
# never allow attribute setting through normal mechanism
raise AttributeError(
"Cannot set attributes on an ImmutableProxyContainer")
def __getattribute__(self, name):
# enforce attribute lookup policy
allowed_fields = getattribute(self, "_allowed_getattr_fields")
if name in allowed_fields:
return getattribute(self, name)
raise AttributeError(
"Cannot get attribute {} on an ImmutableProxyContainer".format(name))
def __repr__(self):
proxied = get_proxied(self)
return "{}({})".format(type(self).__name__, repr(proxied))
def __len__(self):
# works for all currently supported subclasses
return len(get_proxied(self))
def __hash__(self):
# will error out if proxied object is unhashable
proxied = getattribute(self, "_proxied")
return hash(proxied)
def __eq__(self, other):
proxied = get_proxied(self)
if isinstance(other, ImmutableProxyContainer):
other = get_proxied(other)
return proxied == other
class ImmutableProxySequence(ImmutableProxyContainer, _col.Sequence):
_allowed_getattr_fields = ("count", "index")
def __getitem__(self, index):
proxied = get_proxied(self)
value = proxied[index]
if isinstance(value, immutable_safe):
return value
return immutable_proxy_for(value)
class ImmutableProxyMapping(ImmutableProxyContainer, _col.Mapping):
_allowed_getattr_fields = ("get", "keys", "values", "items")
def __getitem__(self, key):
proxied = get_proxied(self)
value = proxied[key]
if isinstance(value, immutable_safe):
return value
return immutable_proxy_for(value)
def __iter__(self):
proxied = get_proxied(self)
for key in proxied:
if not isinstance(key, immutable_scalar_types):
# If mutable keys are used, returning them could be dangerous.
# If owner never puts a mutable key in, then integrity should
# be okay. tuples and frozensets should be okay as keys, but
# are not supported in this implementation for simplicity.
raise NotImplementedError(
"keys of type {} not supported in "
"ImmutableProxyMapping".format(type(key)))
yield key
class ImmutableProxySet(ImmutableProxyContainer, _col.Set):
_allowed_getattr_fields = ("isdisjoint", "_from_iterable")
def __contains__(self, value):
return value in get_proxied(self)
def __iter__(self):
proxied = get_proxied(self)
for value in proxied:
if isinstance(value, immutable_safe):
yield value
yield immutable_proxy_for(value)
#classmethod
def _from_iterable(cls, it):
return set(it)
NOTE: this is only tested on Python 3.4, but I tried to write it to be compatible with both Python 2 and 3.
Make the root of the shared resources a dictionary. Give a ImmutableProxyMapping of that dictionary to the plugins.
private_shared_root = {}
public_shared_root = ImmutableProxyMapping(private_shared_root)
Create an API where the plugins can register new resources to the public_shared_root, probably on a first-come-first-served basis (if it's already there, you can't register it). Pre-populate private_shared_root with any containers you know you're going to need, or any data you want to share with all plugins but you know you want to be read-only.
It might be convenient if the convention for the keys in the shared root mapping were all strings, like file-system paths (/home/dalen/local/python) or dotted paths like python library objects (os.path.expanduser). That way collision detection is immediate and trivial/obvious if plugins try to add the same resource to the pool.