Python multiprocessing shared variable list can't append - python

I'm trying to do multiprocessing with Python and there is a variable that needs to be shared across all instances.
The variable to be shared is a list to which I append instances of user-defined classes, with types such as
<class 'output_handlers.email_output_handler.email_output_handler'>
<class 'output_handlers.dw_output_handler.dw_output_handler'>
as seen from print(type(var)).
I attempted to use a Manager and append the same variables to its generated list before passing it to the worker processes:
from multiprocessing import Manager
...
man = Manager()
output_handlers = man.list()
output_handlers.append(variable)
The above yields error TypeError: cannot serialize '_io.TextIOWrapper' object even though it can append simple types like integers and chars.
Attempting to do
tmp = []
tmp.append(variables)
output_handlers = man.list(tmp)
also yields the same error.
I also wanted to use multiprocessing.Value() to explicitly make the list sharable, but I never found the ctype code for a list.
Can anyone help out with this problem?
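The error message points at the likely cause: a Manager list pickles every item it receives in order to send it to the manager process, and an object that holds an open file (an _io.TextIOWrapper) cannot be pickled. Below is a minimal sketch of one possible workaround, assuming the handler can simply reopen its file on the receiving side. The class names and the log attribute are hypothetical and not taken from the output_handlers package.

```python
import pickle


class NaiveHandler:
    """Hypothetical handler that keeps an open file object as an attribute."""

    def __init__(self, path):
        self.path = path
        self.log = open(path, "a")  # open file handles cannot be pickled

    def close(self):
        self.log.close()


class PicklableHandler(NaiveHandler):
    """Same hypothetical handler, but it drops the file handle when pickled."""

    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop("log", None)  # leave the file object out of the pickle
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.log = open(self.path, "a")  # reopen in the receiving process


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False if it raises TypeError."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False
```

With something like this in place, appending a PicklableHandler to man.list() should no longer trip over the file object. Note that each process then works on its own copy of the handler, with its own file handle.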

Related

Is it possible to access builtins or any other useful functions through an Ellipsis object?

I have a challenge where I'm given a function where I can pass only a single argument which must be a builtin (no modules of any kind), for example chr or IndexError and use its attributes and call its functions to get access to other builtin types.
For example, if I choose the getattr function, I can access the builtins like this:
def main(a):
    builtins = a(a, '__self__')
main(getattr)
Most other functions aren't of much help for my challenge. I know that the attributes are deep and a lot of information can be extracted.
This is a good reference: https://book.hacktricks.xyz/misc/basic-python/bypass-python-sandboxes
What can I get access to using an Ellipsis object, in Python written as ... ?
Subclasses can be accessed using ....__class__.__base__.__subclasses__(), which returns a list. A for loop over that list can find the class whose __name__ attribute is catch_warnings, and that class's _module attribute gives access to all the builtins. I cannot rely on a fixed index for this because the index at which the class appears is always different.
The Python version I am targeting is 3.9.
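Looking the class up by name rather than by position avoids depending on the index. Here is a minimal sketch, assuming only the Ellipsis object is available as a starting point. The helper name find_subclass is hypothetical, and whether a class such as catch_warnings is present depends on the warnings module having already been imported by the interpreter.

```python
def find_subclass(name):
    """Return the first direct subclass of object whose __name__ matches, else None."""
    # (...).__class__ is the ellipsis type, and its __base__ is object.
    for cls in (...).__class__.__base__.__subclasses__():
        if cls.__name__ == name:
            return cls
    return None
```

For instance, find_subclass('catch_warnings') returns the class itself when it has been loaded, and None otherwise, regardless of where it sits in the list.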

Pass Python object as argument to function in "parfeval"

I am trying to pass one Python object as an argument to a function that I am evaluating in the background with parfeval. The Python object is an instance of a Python class, and I detail it below. However, to reproduce the error, I will exemplify with a Python dictionary... However, simply using struct(pydict) would not work because I would lose all the attributes and methods in the Python class.
Assume the Python dictionary is
o = py.dict(pyargs('soup',3.57,'bread',2.29,'bacon',3.91,'salad',5.00));
and the function is
function t = testFunc(x)
t = x{'soup'};
end
If I evaluate the function, I get the correct answer:
>> testFunc(o)
ans =
3.5700
However, if I use parfeval, I get the following error:
>> f = parfeval(@testFunc,1,o);
>> fetchOutputs(f)
Error using parallel.Future/fetchOutputs
One or more futures resulted in an error.
Caused by:
Error using testFunc (line 2)
Invalid or deleted object.
Is there a workaround to this error that doesn't mean I have to recode my whole Python class?
Here is the preview of the object I want to pass as an argument to parfeval:
clt =
Python Client with properties:
enforce_enums: 1
api_key: [1×45 py.str]
request_number: [1×1 py.int]
logger: [1×1 py.logging.Logger]
session: [1×1 py.authlib.integrations.httpx_client.oauth2_client.OAuth2Client]
token_metadata: [1×1 py.tda.auth.TokenMetadata]
<tda.client.synchronous.Client object at 0x000001ECA08EAE50>
I didn't find any restrictions in the documentation that says that parfeval function inputs cannot be anything...
https://www.mathworks.com/help/matlab/ref/parfeval.html
"X1,...,Xm — Input arguments
comma-separated list of variables or expressions... Input arguments, specified as a comma-separated list of variables or expressions"
One of the limitations of the MATLAB->Python support is that Python objects cannot be serialized. parfeval (and other parallel constructs) require serialization to transfer data from one MATLAB process to another.
You might be able to work around this by having each worker build the data structure directly and storing it / accessing it via parallel.pool.Constant, like this:
oC = parallel.pool.Constant(@() py.dict(pyargs('soup',3.57,'bread',2.29,'bacon',3.91,'salad',5.00)));
fetchOutputs(parfeval(@(c) c.Value{'salad'}, 1, oC))

Python: how to get size of all objects in current namespace?

I have some code that I am running from my own package and the program is using a lot more memory (60GB) than it should be. How can I print the size of all objects (in bytes) in the current namespace in order to attempt to work out where this memory is being used?
I attempted something like
from pympler import asizeof
for objname in dir():
    print(asizeof.asizeof(objname)/1024) # print size in kb
But it doesn't work as it just prints the size of the string containing the name of the object in the namespace. Is there a way to get an object reference to everything in the namespace in order to use this method or is there a better method for working out what is using the memory?
dir() returns only the names present in the local scope. Use the locals() function to get the local scope as a dictionary:
for obj in list(locals().values()):  # copy first: the loop variable is itself added to the namespace
    print(asizeof.asizeof(obj) / 1024)
Note that outside of functions, locals() is the same mapping as globals().
If asizeof() is in the dictionary, you want to filter it out:
for name, obj in list(locals().items()):  # copy first: the loop adds names to the namespace
    if name != 'asizeof':
        print(asizeof.asizeof(obj) / 1024)
dir() without arguments is functionally equivalent to sorted(locals()) (a sorted list of the keys of the local namespace).
If you prefer to use a standard library and also want them sorted by size:
import sys
objects = []
for name, obj in list(locals().items()):  # copy first: the loop adds names to the namespace
    objects.append([name, sys.getsizeof(obj)])
print(sorted(objects, key=lambda x: x[1], reverse=True))
You can use gc.get_objects() to just fetch all objects tracked by the collector, not just those in a specific namespace. I'd start by using it to count the number of instances of each type as that might give you some clues in itself.
from collections import Counter
import gc
c = Counter(type(o) for o in gc.get_objects())
print(c.most_common(20))
Then you might drill down to find the size of any likely suspects.
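One way to drill down is to total the sizes per type instead of only counting instances. This is a rough sketch; the function name size_by_type is hypothetical. It relies on sys.getsizeof, which reports shallow sizes only, so the contents of containers are not included in the container's own figure.

```python
import gc
import sys
from collections import Counter


def size_by_type(limit=10):
    """Return the `limit` types with the largest total shallow size, in bytes."""
    totals = Counter()
    for obj in gc.get_objects():
        # The second argument is a fallback for objects that cannot report a size.
        totals[type(obj).__name__] += sys.getsizeof(obj, 0)
    return totals.most_common(limit)
```

Calling size_by_type(20) gives a list of (type name, total bytes) pairs that can be compared with the instance counts above.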

how to access the data of a GStreamer buffer in Python?

In the old (pre-GObject-introspection) GStreamer bindings, it was possible to access gst.Buffer data via the .data attribute or by casting to str. This is no longer possible:
>>> p buf.data
*** AttributeError: 'Buffer' object has no attribute 'data'
>>> str(buf)
'<GstBuffer at 0x7fca2c7c2950>'
To access the contents of a Gst.Buffer in recent versions, you must first map() the buffer to get a Gst.MapInfo, which has a data attribute of type bytes (str in Python 2).
(result, mapinfo) = buf.map(Gst.MapFlags.READ)
assert result
try:
    # use mapinfo.data here
    pass
finally:
    buf.unmap(mapinfo)
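The map/unmap pairing can also be wrapped in a context manager so that the unmap is never forgotten. This is only a sketch: the helper name mapped is hypothetical and not part of the GStreamer API. It uses nothing beyond the buf.map() and buf.unmap() calls shown here, and the caller passes in the flags.

```python
from contextlib import contextmanager


@contextmanager
def mapped(buf, flags):
    """Map buf with the given flags, yield the map info, and always unmap."""
    ok, mapinfo = buf.map(flags)
    if not ok:
        raise RuntimeError("could not map buffer")
    try:
        yield mapinfo
    finally:
        buf.unmap(mapinfo)
```

Usage would then look like `with mapped(buf, Gst.MapFlags.READ) as mapinfo:` followed by code that reads mapinfo.data.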
You can also access the buffer's constituent Gst.Memory elements with get_memory(), and map them individually. (AFAICT, calling Buffer.map() is equivalent to calling .get_all_memory() and mapping the resulting Memory.)
Unfortunately, writing to these buffers is not possible since Python represents them with immutable types even when the Gst.MapFlags.WRITE flag is set. Instead, you'd have to do something like create a new Gst.Memory with the modified data, and use Gst.Buffer.replace_all_memory().

Gracefully-degrading pickling in Python

(You may read this question for some background)
I would like to have a gracefully-degrading way to pickle objects in Python.
When pickling an object, let's call it the main object, sometimes the Pickler raises an exception because it can't pickle a certain sub-object of the main object. For example, an error I've been getting a lot is "can’t pickle module objects." That is because I am referencing a module from the main object.
I know I can write up a little something to replace that module with a facade that would contain the module's attributes, but that would have its own issues(1).
So what I would like is a pickling function that automatically replaces modules (and any other hard-to-pickle objects) with facades that contain their attributes. That may not produce a perfect pickling, but in many cases it would be sufficient.
Is there anything like this? Does anyone have an idea how to approach this?
(1) One issue would be that the module may be referencing other modules from within it.
You can decide and implement how any previously-unpicklable type gets pickled and unpickled: see standard library module copy_reg (renamed to copyreg in Python 3.*).
Essentially, you need to provide a function which, given an instance of the type, reduces it to a tuple -- with the same protocol as the __reduce__ special method (except that the __reduce__ special method takes no arguments, since when provided it's called directly on the object, while the function you provide will take the object as the only argument).
Typically, the tuple you return has 2 items: a callable, and a tuple of arguments to pass to it. The callable must be registered as a "safe constructor" or equivalently have an attribute __safe_for_unpickling__ with a true value. Those items will be pickled, and at unpickling time the callable will be called with the given arguments and must return the unpickled object.
For example, suppose that you want to just pickle modules by name, so that unpickling them just means re-importing them (i.e. suppose for simplicity that you don't care about dynamically modified modules, nested packages, etc, just plain top-level modules). Then:
>>> import sys, pickle, copy_reg
>>> def savemodule(module):
... return __import__, (module.__name__,)
...
>>> copy_reg.pickle(type(sys), savemodule)
>>> s = pickle.dumps(sys)
>>> s
"c__builtin__\n__import__\np0\n(S'sys'\np1\ntp2\nRp3\n."
>>> z = pickle.loads(s)
>>> z
<module 'sys' (built-in)>
I'm using the old-fashioned ASCII form of pickle so that s, the string containing the pickle, is easy to examine: it instructs unpickling to call the built-in __import__ function, with the string sys as its sole argument. And z shows that this does indeed give us back the built-in sys module as the result of the unpickling, as desired.
Now, you'll have to make things a bit more complex than just __import__ (you'll have to deal with saving and restoring dynamic changes, navigate a nested namespace, etc), and thus you'll have to also call copy_reg.constructor (passing as argument your own function that performs this work) before you copy_reg the module-saving function that returns your other function (and, if in a separate run, also before you unpickle those pickles you made using said function). But I hope this simple case helps to show that there's really nothing much to it that's at all "intrinsically" complicated!-)
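For Python 3, the same idea can be sketched with copyreg. This is an adaptation rather than the session shown above: it swaps in importlib.import_module as the constructor (so dotted names resolve to the submodule) and uses the hypothetical name save_module for the reducer.

```python
import copyreg
import importlib
import pickle
import sys
import types


def save_module(module):
    """Reduce a module to (callable, args) so that unpickling re-imports it by name."""
    return importlib.import_module, (module.__name__,)


# Register the reducer for the module type in the global dispatch table.
copyreg.pickle(types.ModuleType, save_module)

payload = pickle.dumps(sys)
restored = pickle.loads(payload)  # re-imports the module named in the pickle
```

Because the registration is global, every later pickle.dumps call in the process will treat modules this way.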
How about the following, which is a wrapper you can use to wrap some modules (maybe any module) in something that's pickle-able. You could then subclass the Pickler object to check if the target object is a module, and if so, wrap it. Does this accomplish what you desire?
import pickle
class PickleableModuleWrapper(object):
    def __init__(self, module):
        # make a copy of the module's namespace in this instance
        self.__dict__ = dict(module.__dict__)
        # remove anything that's going to give us trouble during pickling
        self.remove_unpickleable_attributes()
    def remove_unpickleable_attributes(self):
        # iterate over a copy, since entries are deleted along the way
        for name, value in list(self.__dict__.items()):
            try:
                pickle.dumps(value)
            except Exception:
                del self.__dict__[name]
p = pickle.dumps(PickleableModuleWrapper(pickle))
wrapped_mod = pickle.loads(p)
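The Pickler subclass mentioned here could look roughly like the following in Python 3.8 or later, where Pickler.reducer_override is available. This is a sketch under assumptions: ModuleFacade, DegradingPickler and degrading_dumps are hypothetical names, and the facade keeps only the module's name rather than a copy of its attributes.

```python
import io
import pickle
import types


class ModuleFacade:
    """Hypothetical stand-in that records only the name of the module it replaces."""

    def __init__(self, name):
        self.name = name


class DegradingPickler(pickle.Pickler):
    """Pickler that swaps any module it meets for a ModuleFacade."""

    def reducer_override(self, obj):
        if isinstance(obj, types.ModuleType):
            # Unpickling will call ModuleFacade(<module name>).
            return ModuleFacade, (obj.__name__,)
        return NotImplemented  # fall back to normal pickling for everything else


def degrading_dumps(obj):
    """Pickle obj to bytes, degrading modules to facades along the way."""
    buffer = io.BytesIO()
    DegradingPickler(buffer).dump(obj)
    return buffer.getvalue()
```

The result can be loaded with plain pickle.loads, as long as ModuleFacade is importable where the data is unpickled.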
Hmmm, something like this?
import types
for name in dir(someobject):
    attrib = getattr(someobject, name)  # dir() yields names, so fetch the value
    if isinstance(attrib, types.ModuleType):  # is a module
        # put in a facade, either recursively list the module and do the same thing, or just put in something like str('modulename_module')
        pass
    else:
        # proceed with normal pickle
        pass
Obviously, this would go into an extension of the pickle class with a reimplemented dump method...
