If you create a new Process in Python, it will serialize and copy the entire available scope, as far as I understand it. If you use multiprocessing.Pipe(), it also allows sending various picklable objects, not just raw bytes.
However, instead of sending, I simply want to update a variable that contains a simple POD object like this:
class MyStats:
def __init__(self):
self.bytes_read = 0
self.bytes_written = 0
So say that in a process, when I update these stats, I want to tell Python to serialize the object and send it to the parent process's side somehow. I don't want to have to create a multiprocessing.Value for each and every one of these fields; that sounds super tedious.
Is there a way to tell Python to pass and overwrite a specific object property somehow?
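(For reference, the per-field approach I'd like to avoid would look something like this:)
from multiprocessing import Value

# One shared Value per field -- exactly the boilerplate I'd like to avoid
bytes_read = Value('q', 0)     # 'q' = signed 64-bit integer
bytes_written = Value('q', 0)

def worker(bytes_read, bytes_written):
    bytes_read.value += 100    # visible to the parent via shared memory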
A manager is what you need here: it will be slower, but all data stored inside will be automatically synced with the other processes. Here is a simple example:
from multiprocessing.managers import BaseManager, public_methods, NamespaceProxy
from multiprocessing import Process
def make_proxy(name, cls, base=None):
"""
Args:
name : A string that should match the variable name the proxy will be assigned to
cls : The class for which you want to create a proxy for
base : If you are subclassing NamespaceProxy (or any other implementation) and want to use that subclass as the
base for this new proxy, then pass the subclass as the base using this argument
"""
exposed = public_methods(cls) + ['__getattribute__', '__setattr__', '__delattr__']
return _MakeProxyType(name, exposed, base)
def _MakeProxyType(name, exposed, base=None):
"""
Attempts to replicate multiprocessing.managers.MakeProxyType properly
"""
if base is None:
base = NamespaceProxy
exposed = tuple(exposed)
dic = {}
for meth in exposed:
if hasattr(base, meth):
continue
exec('''def %s(self, *args, **kwds):
return self._callmethod(%r, args, kwds)''' % (meth, meth), dic)
ProxyType = type(name, (base,), dic)
ProxyType._exposed_ = exposed
return ProxyType
class MyStats:
def __init__(self):
self.bytes_read = 0
self.bytes_written = 0
def worker(my_stats):
my_stats.bytes_read = 100
print("Worker process read 100 bytes!")
# Remember to set the name of the variable and the "name" argument to the same value, otherwise you will have trouble
# pickling this. If for some reason you cannot do this then you must change the variable's __qualname__ property to
# reflect where the object actually resides so pickle can find it.
MyStatsProxy = make_proxy('MyStatsProxy', MyStats)
if __name__ == "__main__":
# Register our proxy and start the manager process
BaseManager.register("MyStats", MyStats, MyStatsProxy)
manager = BaseManager()
manager.start()
# Create our shared instance and modify it from another process
my_stats = manager.MyStats()
p = Process(target=worker, args=(my_stats,))
p.start()
p.join()
# Check value from main process
print(f"In main process, bytes read are {my_stats.bytes_read}!")
Output
Worker process read 100 bytes!
In main process, bytes read are 100!
Check this question and its answers for more useful information about managers, registering classes, and alternate methods to achieve the same result.
Note: Keep in mind that managers return pickled copies of all objects you access through them. So any modification of a mutable attribute should be done from within an instance method, rather than by requesting the mutable object through the proxy and modifying it from outside. For example, the line below will not modify the attribute some_list in the manager at all; only the local (per-process) copy of this attribute will be modified:
my_stats.some_list[0] = "some value"
Instead, you should create an instance method for modifications and call that instead:
my_stats.modify_list(0, "some value")
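A minimal sketch of such a method (modify_list and the some_list attribute are hypothetical additions to MyStats):
class MyStats:
    def __init__(self):
        self.some_list = ['', '']

    def modify_list(self, index, value):
        # Runs inside the manager process, so the mutation is visible
        # to every process holding a proxy
        self.some_list[index] = value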
Alternatively, you can also force the manager to update the mutable object by re-assigning the new value for the object:
local_copy = my_stats.some_list
local_copy[0] = "some value"
my_stats.some_list = local_copy
I have a class variable in a Utils class.
class Utils:
_raw_data = defaultdict(list)
@classmethod
def raw_data(cls):
return cls._raw_data.copy()
@classmethod
def set_raw_data(cls, key, data):
cls._raw_data[key] = data
The _raw_data dict is filled with key/value pairs before it is read:
...
data = [ipaddress.IPv4Network(address) for address in ip_addresses]
Utils.set_raw_data(device_name, data)
But when I try to execute a function through multiprocessing Pool.map that reads raw_data from the Utils class, it returns an empty list.
This is the method from the parent class
class Parent:
...
def evaluate_without_prefix(self, devices):
results = []
print(Utils.raw_data())  # <-- this print shows that Utils.raw_data() is empty
for network1, network2 in itertools.product(Utils.raw_data()[devices[0]], Utils.raw_data()[devices[1]]):
if network1.subnet_of(network2):
results.append((devices[0], network1, devices[1], network2))
if network2.subnet_of(network1):
results.append((devices[1], network2, devices[0], network1))
return results
In the child class, I execute the method from the parent class with a multiprocessing Pool:
class Child(Parent):
...
def execute(self):
pool = Pool(os.cpu_count() - 1)
devices = list(itertools.combinations(list(Utils.raw_data().keys()), 2))
results = pool.map(super().evaluate_without_prefix, devices)
return results
The print() in the Parent class shows that raw_data() is empty, but the variable actually has data: the devices variable in the Child class does get data from raw_data(). Yet as soon as execution enters the multiprocessing pool, raw_data() comes back empty. Any reason for this?
The problem seems to be as follows:
The class data created in your main process must be serialized/deserialized using pickle so that it can be passed from the main process's address space to the address spaces of the pool processes that need to work with it. But the data in question here is an instance of class Parent, since you are calling one of its methods, i.e. evaluate_without_prefix. Nowhere in that instance is there a reference to class Utils, so nothing causes the multiprocessing pool to serialize the Utils class (and its _raw_data) along with the Parent instance. Consequently, when that method references class Utils in any of the pool processes, a fresh Utils is created and, of course, its dictionary is not initialized.
I think the simplest change is to:
Make attribute _raw_data an instance attribute rather than a class attribute (by the way, according to your current usage, there is no need for this to be a defaultdict).
Create an instance of class Utils named utils and initialize the dictionary via this reference.
Use the initializer and initargs arguments of the multiprocessing.Pool constructor to give each process in the pool a global variable named utils that is a copy of the utils instance created by the main process.
So I would organize the code along the following lines:
class Utils:
def __init__(self):
self._raw_data = {}
def raw_data(self):
# No need to make a copy ???
return self._raw_data.copy()
def set_raw_data(self, key, data):
self._raw_data[key] = data
def init_processes(utils_instance):
"""
Initialize each process in the process pool with global variable utils.
"""
global utils
utils = utils_instance
class Parent:
...
def evaluate_without_prefix(self, devices):
results = []
print(utils.raw_data())
for network1, network2 in itertools.product(utils.raw_data()[devices[0]], utils.raw_data()[devices[1]]):
results.append([network1, network2])
return results
class Child(Parent):
...
def execute(self, utils):
pool = Pool(os.cpu_count() - 1, initializer=init_processes, initargs=(utils,))
# No need to make an explicit list (map will do that for you) ???
devices = list(itertools.combinations(list(utils.raw_data().keys()), 2))
results = pool.map(super().evaluate_without_prefix, devices)
return results
def main():
utils = Utils()
# Initialize utils:
...
data = [ipaddress.IPv4Network(address) for address in ip_addresses]
utils.set_raw_data(device_name, data)
child = Child()
results = child.execute(utils)
if __name__ == '__main__':
main()
Further Explanation
The following program's main process calls class method Foo.set_x to update class attribute x to the value of 10 before creating a multiprocessing pool and invoking worker function worker, which prints out the value of Foo.x.
On Windows, which uses the spawn method to create new processes, each pool process is initialized prior to calling the worker function essentially by launching a new Python interpreter and re-executing the source program, running every statement at global scope. Hence the class definition of Foo is recreated by the interpreter compiling it; no pickling is involved. But since the call to Foo.set_x(10) sits inside the if __name__ == '__main__': block, it is not re-executed in the child, and Foo.x will be 0.
The same program run on Linux, which uses fork to create new processes, gives each pool process a copy-on-write view of the main process's address space. It therefore has a copy of the Foo class as it existed at the time the pool was created, and Foo.x will be 10.
My solution above, which uses a pool initializer to set a global variable in each pool process's address space to a copy of the Utils instance, is what is required on Windows and works on Linux as well. An alternative, of course, is to pass the Utils instance as an additional argument to your worker function instead of using a pool initializer, but this is generally less efficient: the number of processes in the pool is typically smaller than the number of times the worker function is invoked, so the initializer approach requires less pickling. (A sketch of this alternative follows the program below.)
from multiprocessing import Pool
class Foo:
x = 0
@classmethod
def set_x(cls, x):
cls.x = x
def worker():
print(Foo.x)
if __name__ == '__main__':
Foo.set_x(10)
pool = Pool(1)
pool.apply(worker)
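For completeness, the alternative mentioned above (shipping the instance as an extra argument instead of using a pool initializer) might look like this; functools.partial binds utils as the first argument of every call, so it is pickled with each task rather than once per process (a self-contained sketch reusing the Utils idea from above):
import itertools
from functools import partial
from multiprocessing import Pool

class Utils:
    def __init__(self):
        self._raw_data = {}
    def raw_data(self):
        return self._raw_data.copy()
    def set_raw_data(self, key, data):
        self._raw_data[key] = data

def evaluate_without_prefix(utils, devices):
    # utils arrives pickled along with every task
    return [[n1, n2] for n1, n2 in
            itertools.product(utils.raw_data()[devices[0]],
                              utils.raw_data()[devices[1]])]

if __name__ == '__main__':
    utils = Utils()
    utils.set_raw_data('devA', [1, 2])
    utils.set_raw_data('devB', [3])
    pairs = list(itertools.combinations(utils.raw_data().keys(), 2))
    with Pool(2) as pool:
        results = pool.map(partial(evaluate_without_prefix, utils), pairs)
    print(results)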
I am trying to return values from subprocesses but these values are unfortunately unpicklable. So I used global variables in threads module with success but have not been able to retrieve updates done in subprocesses when using multiprocessing module. I hope I'm missing something.
The results printed at the end are always the same as the initial values given to the variables dataDV03 and dataDV04. The subprocesses update these global variables, but the globals remain unchanged in the parent.
import multiprocessing
# NOT ABLE to get python to return values in passed variables.
ants = ['DV03', 'DV04']
dataDV03 = ['', '']
dataDV04 = {'driver': '', 'status': ''}
def getDV03CclDrivers(lib): # call global variable
global dataDV03
dataDV03[1] = 1
dataDV03[0] = 0
# eval( 'CCL.' + lib + '.' + lib + '( "DV03" )' ) these are unpicklable instantiations
def getDV04CclDrivers(lib, dataDV04): # pass global variable
dataDV04['driver'] = 0 # eval( 'CCL.' + lib + '.' + lib + '( "DV04" )' )
if __name__ == "__main__":
jobs = []
if 'DV03' in ants:
j = multiprocessing.Process(target=getDV03CclDrivers, args=('LORR',))
jobs.append(j)
if 'DV04' in ants:
j = multiprocessing.Process(target=getDV04CclDrivers, args=('LORR', dataDV04))
jobs.append(j)
for j in jobs:
j.start()
for j in jobs:
j.join()
print 'Results:\n'
print 'DV03', dataDV03
print 'DV04', dataDV04
Edit: Here is the object that is not picklable:
In [1]: from CCL import LORR
In [2]: lorr=LORR.LORR('DV20', None)
In [3]: lorr
Out[3]: <CCL.LORR.LORR instance at 0x94b188c>
This is the error returned when I use a multiprocessing.Pool to return the instance back to the parent:
Thread getCcl (('DV20', 'LORR'),)
Process PoolWorker-1:
Traceback (most recent call last):
File "/alma/ACS-10.1/casa/lib/python2.6/multiprocessing/process.py", line 232, in _bootstrap
self.run()
File "/alma/ACS-10.1/casa/lib/python2.6/multiprocessing/process.py", line 88, in run
self._target(*self._args, **self._kwargs)
File "/alma/ACS-10.1/casa/lib/python2.6/multiprocessing/pool.py", line 71, in worker
put((job, i, result))
File "/alma/ACS-10.1/casa/lib/python2.6/multiprocessing/queues.py", line 366, in put
return send(obj)
UnpickleableError: Cannot pickle <type 'thread.lock'> objects
In [5]: dir(lorr)
Out[5]:
['GET_AMBIENT_TEMPERATURE',
'GET_CAN_ERROR',
'GET_CAN_ERROR_COUNT',
'GET_CHANNEL_NUMBER',
'GET_COUNT_PER_C_OP',
'GET_COUNT_REMAINING_OP',
'GET_DCM_LOCKED',
'GET_EFC_125_MHZ',
'GET_EFC_COMB_LINE_PLL',
'GET_ERROR_CODE_LAST_CAN_ERROR',
'GET_INTERNAL_SLAVE_ERROR_CODE',
'GET_MAGNITUDE_CELSIUS_OP',
'GET_MAJOR_REV_LEVEL',
'GET_MINOR_REV_LEVEL',
'GET_MODULE_CODES_CDAY',
'GET_MODULE_CODES_CMONTH',
'GET_MODULE_CODES_DIG1',
'GET_MODULE_CODES_DIG2',
'GET_MODULE_CODES_DIG4',
'GET_MODULE_CODES_DIG6',
'GET_MODULE_CODES_SERIAL',
'GET_MODULE_CODES_VERSION_MAJOR',
'GET_MODULE_CODES_VERSION_MINOR',
'GET_MODULE_CODES_YEAR',
'GET_NODE_ADDRESS',
'GET_OPTICAL_POWER_OFF',
'GET_OUTPUT_125MHZ_LOCKED',
'GET_OUTPUT_2GHZ_LOCKED',
'GET_PATCH_LEVEL',
'GET_POWER_SUPPLY_12V_NOT_OK',
'GET_POWER_SUPPLY_15V_NOT_OK',
'GET_PROTOCOL_MAJOR_REV_LEVEL',
'GET_PROTOCOL_MINOR_REV_LEVEL',
'GET_PROTOCOL_PATCH_LEVEL',
'GET_PROTOCOL_REV_LEVEL',
'GET_PWR_125_MHZ',
'GET_PWR_25_MHZ',
'GET_PWR_2_GHZ',
'GET_READ_MODULE_CODES',
'GET_RX_OPT_PWR',
'GET_SERIAL_NUMBER',
'GET_SIGN_OP',
'GET_STATUS',
'GET_SW_REV_LEVEL',
'GET_TE_LENGTH',
'GET_TE_LONG_FLAG_SET',
'GET_TE_OFFSET_COUNTER',
'GET_TE_SHORT_FLAG_SET',
'GET_TRANS_NUM',
'GET_VDC_12',
'GET_VDC_15',
'GET_VDC_7',
'GET_VDC_MINUS_7',
'SET_CLEAR_FLAGS',
'SET_FPGA_LOGIC_RESET',
'SET_RESET_AMBSI',
'SET_RESET_DEVICE',
'SET_RESYNC_TE',
'STATUS',
'_HardwareDevice__componentName',
'_HardwareDevice__hw',
'_HardwareDevice__stickyFlag',
'_LORRBase__logger',
'__del__',
'__doc__',
'__init__',
'__module__',
'_devices',
'clearDeviceCommunicationErrorAlarm',
'getControlList',
'getDeviceCommunicationErrorCounter',
'getErrorMessage',
'getHwState',
'getInternalSlaveCanErrorMsg',
'getLastCanErrorMsg',
'getMonitorList',
'hwConfigure',
'hwDiagnostic',
'hwInitialize',
'hwOperational',
'hwSimulation',
'hwStart',
'hwStop',
'inErrorState',
'isMonitoring',
'isSimulated']
In [6]:
When you use multiprocessing to open a second process, an entirely new instance of Python, with its own global state, is created. That global state is not shared, so changes made by child processes to global variables will be invisible to the parent process.
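A tiny demonstration of that isolation (a sketch; on Windows the child re-imports the module, on Linux it gets a copy-on-write snapshot, and either way the parent's global is untouched):
import multiprocessing

counter = 0  # global in the parent

def bump():
    global counter
    counter += 1
    print('child sees', counter)   # 1

if __name__ == '__main__':
    p = multiprocessing.Process(target=bump)
    p.start()
    p.join()
    print('parent sees', counter)  # still 0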
Additionally, most of the abstractions that multiprocessing provides use pickle to transfer data. All data transferred using proxies must be pickleable; that includes all the objects that a Manager provides. Relevant quotations (my emphasis):
Ensure that the arguments to the methods of proxies are picklable.
And (in the Manager section):
Other processes can access the shared objects by using proxies.
Queues also require pickleable data; the docs don't say so, but a quick test confirms it:
import multiprocessing
import pickle
class Thing(object):
def __getstate__(self):
print 'got pickled'
return self.__dict__
def __setstate__(self, state):
print 'got unpickled'
self.__dict__.update(state)
q = multiprocessing.Queue()
p = multiprocessing.Process(target=q.put, args=(Thing(),))
p.start()
print q.get()
p.join()
Output:
$ python mp.py
got pickled
got unpickled
<__main__.Thing object at 0x10056b350>
The one approach that might work for you, if you really can't pickle the data, is to find a way to store it as a ctypes object; a reference to that shared memory can then be passed to a child process. This seems pretty dodgy to me; I've never done it. But it might be a possible solution for you.
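A minimal sketch of that idea, assuming the state can be flattened into plain C types (the DriverState layout here is hypothetical):
import ctypes
from multiprocessing import Process, Value

class DriverState(ctypes.Structure):
    # only works if the state reduces to plain C fields
    _fields_ = [('driver', ctypes.c_int), ('status', ctypes.c_int)]

def worker(state):
    with state.get_lock():  # make the two-field update atomic
        state.driver = 1
        state.status = 0

if __name__ == '__main__':
    state = Value(DriverState, lock=True)  # zero-initialized shared struct
    p = Process(target=worker, args=(state,))
    p.start()
    p.join()
    print(state.driver, state.status)  # changes are visible in the parent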
Given your update, it seems like you need to know a lot more about the internals of a LORR. Is LORR a class? Can you subclass from it? Is it a subclass of something else? What's its MRO? (Try LORR.__mro__ and post the output if it works.) If it's a pure python object, it might be possible to subclass it, creating a __setstate__ and a __getstate__ to enable pickling.
Another approach might be to figure out how to get the relevant data out of a LORR instance and pass it via a simple string. Since you say that you really just want to call the methods of the object, why not just do so using Queues to send messages back and forth? In other words, something like this (schematically):
Main Process Child 1 Child 2
LORR 1 LORR 2
child1_in_queue -> get message 'foo'
call 'foo' method
child1_out_queue <- return foo data string
child2_in_queue -> get message 'bar'
call 'bar' method
child2_out_queue <- return bar data string
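A minimal sketch of that scheme, assuming the methods you need take no arguments and their results can be rendered as strings:
from multiprocessing import Process, Queue
from CCL import LORR  # the unpicklable hardware class stays in the child

def lorr_server(antenna, in_q, out_q):
    # Each child owns its own LORR; the instance never crosses processes
    lorr = LORR.LORR(antenna, None)
    for method_name in iter(in_q.get, None):   # None = shutdown sentinel
        result = getattr(lorr, method_name)()  # assumes no-arg methods
        out_q.put(str(result))                 # ship back a plain string

if __name__ == '__main__':
    in_q, out_q = Queue(), Queue()
    p = Process(target=lorr_server, args=('DV20', in_q, out_q))
    p.start()
    in_q.put('GET_STATUS')
    print(out_q.get())
    in_q.put(None)  # shut the child down
    p.join()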
@DBlas gives you a quick URL and reference to the Manager class in an answer, but I think it's still a bit vague, so I thought it might be helpful for you to just see it applied...
import multiprocessing
from multiprocessing import Manager
ants = ['DV03', 'DV04']
def getDV03CclDrivers(lib, data_list):
    data_list[1] = 1
    data_list[0] = 0
def getDV04CclDrivers(lib, data_dict):
    data_dict['driver'] = 0
if __name__ == "__main__":
manager = Manager()
dataDV03 = manager.list(['', ''])
dataDV04 = manager.dict({'driver': '', 'status': ''})
jobs = []
if 'DV03' in ants:
j = multiprocessing.Process(
target=getDV03CclDrivers,
args=('LORR', dataDV03))
jobs.append(j)
if 'DV04' in ants:
j = multiprocessing.Process(
target=getDV04CclDrivers,
args=('LORR', dataDV04))
jobs.append(j)
for j in jobs:
j.start()
for j in jobs:
j.join()
print 'Results:\n'
print 'DV03', dataDV03
print 'DV04', dataDV04
Because multiprocessing actually uses separate processes, you cannot simply share global variables: they live in completely different "spaces" in memory. What you do to a global in one process will not be reflected in another. I admit it seems confusing, since the way you see it, it's all living right there in the same piece of code, so "why shouldn't those methods have access to the global?" It's harder to wrap your head around the idea that they will be running in different processes.
The Manager class is given to act as a proxy for data structures that can shuttle info back and forth for you between processes. What you will do is create a special dict and list from a manager, pass them into your methods, and operate on them locally.
Un-pickle-able data
For your specialized LORR object, you might need to create something like a proxy that can represent the picklable state of the instance.
Not super robust or well tested, but it gives you the idea.
class LORRProxy(object):
def __init__(self, lorrObject=None):
self.instance = lorrObject
def __getstate__(self):
# how to get the state data out of a lorr instance
inst = self.instance
state = dict(
foo = inst.a,
bar = inst.b,
)
return state
def __setstate__(self, state):
# rebuild a LORR instance from the saved state
lorr = LORR.LORR()  # constructor args elided in this sketch
lorr.a = state['foo']
lorr.b = state['bar']
self.instance = lorr
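In use, the proxy is what crosses the process boundary instead of the raw instance, e.g. (a fragment, assuming a LORR instance named lorr):
q = multiprocessing.Queue()
q.put(LORRProxy(lorr))  # __getstate__ runs during pickling
proxy = q.get()         # __setstate__ rebuilds a LORR on the other side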
When using multiprocessing, the only way to pass objects between processes is to use a Queue or a Pipe; globals are not shared. Objects must be pickleable, so multiprocessing won't help you here.
You could also use a multiprocessing Array. This allows you to have a shared state between processes and is probably the closest thing to a global variable.
At the top of main, declare an Array. The first argument 'i' says it will be integers. The second argument gives the initial values:
shared_dataDV03 = multiprocessing.Array('i', (0, 0))  # a shared array
Then pass this array to the process as an argument:
j = multiprocessing.Process(target=getDV03CclDrivers, args=('LORR', shared_dataDV03))
You have to receive the array argument in the function being called, and then you can modify it within the function:
def getDV03CclDrivers(lib, arr):  # receive the shared array
    arr[1] = 1
    arr[0] = 0
The array is shared with the parent, so you can print out the values at the end in the parent:
print 'DV03', shared_dataDV03[:]
And it will show the changes:
DV03 [0, 1]
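Putting those pieces together, a minimal runnable version might look like this (a sketch; the worker body stands in for the real driver lookup):
import multiprocessing

def getDV03CclDrivers(lib, arr):
    arr[1] = 1
    arr[0] = 0

if __name__ == '__main__':
    shared_dataDV03 = multiprocessing.Array('i', (0, 0))
    j = multiprocessing.Process(target=getDV03CclDrivers,
                                args=('LORR', shared_dataDV03))
    j.start()
    j.join()
    print('DV03', shared_dataDV03[:])  # -> DV03 [0, 1]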
I use p.map() to spin off a number of processes to remote servers and print the results when they come back at unpredictable times:
Servers = [...]
from multiprocessing import Pool
p = Pool(len(Servers))
p.map(DoIndividualSummary, Servers)
This worked fine if DoIndividualSummary used print for the results, but the overall result was in unpredictable order, which made interpretation difficult. I tried a number of approaches to use global variables but ran into problems. Finally, I succeeded with sqlite3.
Before p.map(), open a sqlite connection and create a table:
import sqlite3
conn = sqlite3.connect('servers.db')  # need conn for commit and close
db = conn.cursor()
db.execute('''DROP TABLE IF EXISTS servers''')
db.execute('''CREATE TABLE servers (server text, serverdetail text, readings text)''')
conn.commit()
Then, when returning from DoIndividualSummary(), save the results into the table:
db.execute('''INSERT INTO servers VALUES (?,?,?)''', (server,serverdetail,readings))
conn.commit()
return
After the map() statement, print the results:
db.execute('''select * from servers order by server''')
rows = db.fetchall()
for server, serverdetail, readings in rows:
    print serverdetail, readings
May seem like overkill but it was simpler for me than the recommended solutions.
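A slightly safer variant opens a fresh connection inside each pool worker instead of inheriting the parent's cursor, since sqlite connections are not meant to be shared across processes. A sketch of DoIndividualSummary along those lines (the detail/readings values are placeholders for the real probing logic; the table is created by the parent beforehand as shown above):
import sqlite3

def DoIndividualSummary(server):
    serverdetail, readings = 'detail for ' + server, '...'  # placeholders
    conn = sqlite3.connect('servers.db')  # per-process connection
    conn.execute('''INSERT INTO servers VALUES (?,?,?)''',
                 (server, serverdetail, readings))
    conn.commit()
    conn.close()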
I am trying to pass some data between requests in tornado. I have a variable called tcp_con.
class Application( tornado.web.Application ):
def __init__( self, **overrides ):
handlers = [
( r"/", hd.MainHandler ),
]
settings = { "cookie_secret": "thisismysecret",
"login_url": "/auth/login",
"template_path": os.path.join( os.path.dirname( __file__ ), "templates" ),
"static_path": os.path.join( os.path.dirname( __file__ ), "static" ),
"xsrf_cookies": True
}
# Initializing variables
self.debug = overrides['debug']
self.__is_running_checks = False
self.tcp_con = {}
self.queue = Queue.Queue()
I fill it with some values when the user submits a form, but when I refresh the page the tcp_con variable is empty; if I refresh again, the dictionary contains the data once more. So sometimes the values are in the dictionary and sometimes they are not. What can be the problem?
This is the relevant part of the request handler:
@tornado.web.authenticated
def get( self ):
"""
"""
print self.application.tcp_con
Most likely you have more than one Tornado process on the server. Each process has its own Application instance. On each request you may get a response from a different process, so you can't use this class to save the state of your application.
From the Tornado documentation (class tornado.tcpserver.TCPServer):
start(num_processes=1)
Starts this server in the IOLoop.
By default, we run the server in this process and do not fork any
additional child process.
If num_processes is None or <= 0, we detect the number of cores
available on this machine and fork that number of child processes. If
num_processes is given and > 1, we fork that specific number of
sub-processes.
Since we use processes and not threads, there is no shared memory
between any server code.
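If a single process is acceptable, forcing num_processes to 1 keeps self.application.tcp_con the same dict on every request. A minimal sketch of that setup (the port is an assumption; Application is the class shown in the question):
import tornado.httpserver
import tornado.ioloop

if __name__ == "__main__":
    app = Application(debug=True)
    server = tornado.httpserver.HTTPServer(app)
    server.bind(8888)  # hypothetical port
    server.start(1)    # exactly one process: no forking, state stays shared
    tornado.ioloop.IOLoop.instance().start()
Otherwise, per-request state has to live outside the processes, e.g. in a database or cache shared by all of them.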