I am trying to wrap pyspark's Pipeline.__init__ constructor and monkey patch in the newly wrapped constructor. However, I am running into an error that seems to have something to do with the way Pipeline.__init__ uses decorators.
Here is the code that actually does the monkey patch:
def monkeyPatchPipeline():
    oldInit = Pipeline.__init__

    def newInit(self, **keywordArgs):
        oldInit(self, stages=keywordArgs["stages"])

    Pipeline.__init__ = newInit
However, when I run a simple program:
import PythonSparkCombinatorLibrary
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import HashingTF, Tokenizer
PythonSparkCombinatorLibrary.TransformWrapper.monkeyPatchPipeline()
tokenizer = Tokenizer(inputCol="text", outputCol="words")
hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
lr = LogisticRegression(maxIter=10, regParam=0.001)
pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])
I get this error:
Traceback (most recent call last):
  File "C:\<my path>\PythonApplication1\main.py", line 26, in <module>
    pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])
  File "C:\<my path>\PythonApplication1\PythonSparkCombinatorLibrary.py", line 36, in newInit
    oldInit(self, stages=keywordArgs["stages"])
  File "C:\<pyspark_path>\pyspark\__init__.py", line 98, in wrapper
    return func(*args, **kwargs)
  File "C:\<pyspark_path>\pyspark\ml\pipeline.py", line 63, in __init__
    kwargs = self.__init__._input_kwargs
AttributeError: 'function' object has no attribute '_input_kwargs'
Looking into the pyspark interface, I see that Pipeline.__init__ looks like this:
@keyword_only
def __init__(self, stages=None):
    """
    __init__(self, stages=None)
    """
    if stages is None:
        stages = []
    super(Pipeline, self).__init__()
    kwargs = self.__init__._input_kwargs
    self.setParams(**kwargs)
And noting the @keyword_only decorator, I inspected that code as well:
def keyword_only(func):
    """
    A decorator that forces keyword arguments in the wrapped method
    and saves actual input keyword arguments in `_input_kwargs`.
    """
    @wraps(func)
    def wrapper(*args, **kwargs):
        if len(args) > 1:
            raise TypeError("Method %s forces keyword arguments." % func.__name__)
        wrapper._input_kwargs = kwargs
        return func(*args, **kwargs)
    return wrapper
I'm totally confused both about how this code works in the first place, and also why it seems to cause problems with my own wrapper. I see that wrapper is adding a _input_kwargs field to itself, but how is Pipeline.__init__ able to read that field with self.__init__._input_kwargs? And why doesn't the same thing happen when I wrap Pipeline.__init__ again?
Decorator 101. A decorator is a higher-order function which takes a function as its first (and typically only) argument and returns a function. The @ annotation is just syntactic sugar for a simple function call, so the following
@decorator
def decorated(x):
    ...

can be rewritten for example as:

def decorated_(x):
    ...

decorated = decorator(decorated_)
So Pipeline.__init__ is actually a functools.wraps-decorated wrapper which captures the defined __init__ (the func argument of keyword_only) as part of its closure. When it is called, it stores the received kwargs as a function attribute on itself. Basically what happens here can be simplified to:
>>> def f(**kwargs):
...     f._input_kwargs = kwargs  # f is in the current scope
...
>>> hasattr(f, "_input_kwargs")
False
>>> f(foo=1, bar="x")
>>> hasattr(f, "_input_kwargs")
True
When you further wrap (decorate) __init__, the external function won't have _input_kwargs attached, hence the error. If you want to make it work you have to apply the same process as used by the original __init__ to your own version, for example with the same decorator:
@keyword_only
def newInit(self, **keywordArgs):
    oldInit(self, stages=keywordArgs["stages"])
but as I mentioned in the comments, you should rather consider subclassing.
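A minimal sketch of the subclassing alternative, using a stand-in Base class instead of the real pyspark Pipeline (all names here are illustrative, not pyspark API):

```python
# Stand-in for pyspark's Pipeline; illustrative only
class Base:
    def __init__(self, stages=None):
        self.stages = stages if stages is not None else []

class MyPipeline(Base):
    def __init__(self, **kwargs):
        # pre-process the keyword arguments here, then delegate to the
        # original __init__ instead of monkey patching it
        super().__init__(stages=kwargs.get("stages"))

p = MyPipeline(stages=["tokenizer", "lr"])
print(p.stages)  # ['tokenizer', 'lr']
```

Because the subclass calls the original (still decorated) __init__ through super(), the _input_kwargs machinery is left untouched.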
Related
I am trying to understand the examples for replacing the use of try-finally and flag variables in Python's documentation
According to the documentation instead of:
cleanup_needed = True
try:
    result = perform_operation()
    if result:
        cleanup_needed = False
finally:
    if cleanup_needed:
        cleanup_resources()
we could use a small ExitStack-based helper class Callback like this (I added the perform_operation and cleanup_resources functions):
from contextlib import ExitStack

class Callback(ExitStack):
    def __init__(self, callback, /, *args, **kwds):
        super(Callback, self).__init__()
        self.callback(callback, *args, **kwds)

    def cancel(self):
        self.pop_all()

def perform_operation():
    return False

def cleanup_resources():
    print("Cleaning up resources")

with Callback(cleanup_resources) as cb:
    result = perform_operation()
    if result:
        cb.cancel()
I think the code simulates the exceptional case, where perform_operation() did not run smoothly and a cleanup is needed (perform_operation() returned False). The Callback class magically takes care of running the cleanup_resources() function (I can't quite understand why, by the way).
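The "magic" is the stdlib ExitStack.callback() method: it registers a function to be called when the with block exits, and pop_all() transfers the registered callbacks to a fresh stack so the original stack exits without running them. A minimal sketch of just that mechanism:

```python
from contextlib import ExitStack

calls = []

# a registered callback runs when the with block exits
with ExitStack() as stack:
    stack.callback(calls.append, "cleanup")
print(calls)  # ['cleanup']

# pop_all() moves the callbacks onto a new ExitStack; discarding the
# returned stack means those callbacks are never invoked
with ExitStack() as stack:
    stack.callback(calls.append, "cancelled")
    stack.pop_all()
print(calls)  # still ['cleanup']
```

Note that pop_all() constructs a new instance via type(self)() (visible in the traceback below), which is also why a subclass with a required constructor argument runs into trouble there.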
Then I simulated the normal case, where everything runs smoothly and no cleanup is needed, I changed the code to make perform_operation() return True instead. In this case, however, the cleanup_resources function also runs and the code errors out:
$ python minimal.py
Cleaning up resources
Traceback (most recent call last):
  File "minimal.py", line 26, in <module>
    cb.cancel()
  File "minimal.py", line 12, in cancel
    self.pop_all()
  File "C:\ProgramData\Anaconda3\envs\claw\lib\contextlib.py", line 390, in pop_all
    new_stack = type(self)()
TypeError: __init__() missing 1 required positional argument: 'callback'
Can you explain what exactly is going on here and how this whole ExitStack and callback stuff works?
I'm trying to code a method in a class that uses a decorator from another class. The problem is that I need information stored in the class that contains the decorator (ClassWithDecorator.decorator_param). To achieve that I'm using partial, injecting self as the first argument, but when I do that, the self from the class that uses the decorator "gets lost" somehow and I end up getting an error. Note that this does not happen if I remove partial() from my_decorator(), in which case self is correctly stored inside *args.
See the code sample:
from functools import partial

class ClassWithDecorator:
    def __init__(self):
        self.decorator_param = "PARAM"

    def my_decorator(self, decorated_func):
        def my_callable(ClassWithDecorator_instance, *args, **kwargs):
            # Do something with decorator_param
            print(ClassWithDecorator_instance.decorator_param)
            return decorated_func(*args, **kwargs)

        return partial(my_callable, self)
decorator_instance = ClassWithDecorator()

class WillCallDecorator:
    def __init__(self):
        self.other_param = "WillCallDecorator variable"

    @decorator_instance.my_decorator
    def decorated_method(self):
        pass

WillCallDecorator().decorated_method()
I get
PARAM
Traceback (most recent call last):
  File "****/decorator.py", line 32, in <module>
    WillCallDecorator().decorated_method()
  File "****/decorator.py", line 12, in my_callable
    return decorated_func(*args, **kwargs)
TypeError: decorated_method() missing 1 required positional argument: 'self'
How can I pass the self corresponding to WillCallDecorator() into decorated_method() but at the same time pass information from its own class to my_callable() ?
It seems that you may want to use partialmethod instead of partial:
From the docs:
class functools.partialmethod(func, /, *args, **keywords)
When func is a non-descriptor callable, an appropriate bound method is created dynamically. This behaves like a normal Python function when used as a method: the self argument will be inserted as the first positional argument, even before the args and keywords supplied to the partialmethod constructor.
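For reference, this is essentially the partialmethod example from the functools documentation, showing the binding behavior quoted above:

```python
from functools import partialmethod

class Cell:
    def __init__(self):
        self._alive = False

    def set_state(self, state):
        self._alive = state

    # attribute access binds self first, then the frozen argument follows
    set_alive = partialmethod(set_state, True)
    set_dead = partialmethod(set_state, False)

c = Cell()
c.set_alive()
print(c._alive)  # True
```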
So much simpler just to use the self variable you already have. There is absolutely no reason to be using partial or partialmethod here at all:
class ClassWithDecorator:
    def __init__(self):
        self.decorator_param = "PARAM"

    def my_decorator(self, decorated_func):
        def my_callable(*args, **kwargs):
            # Do something with decorator_param
            print(self.decorator_param)
            return decorated_func(*args, **kwargs)

        return my_callable

decorator_instance = ClassWithDecorator()

class WillCallDecorator:
    def __init__(self):
        self.other_param = "WillCallDecorator variable"

    @decorator_instance.my_decorator
    def decorated_method(self):
        pass

WillCallDecorator().decorated_method()
Also, to answer your question about why your code didn't work, when you access something.decorated_method() the code checks whether decorated_method is a function and if so turns it internally into a call WillCallDecorator.decorated_method(something). But the value returned from partial is a functools.partial object, not a function. So the class lookup binding won't happen here.
In more detail, something.method(arg) is roughly equivalent to SomethingClass.method.__get__(something, SomethingClass)(arg) when something doesn't have an instance attribute method, its type SomethingClass does have the attribute, and the attribute has a __get__ method; but the full set of steps for attribute lookup is quite complicated.
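A quick way to see the binding machinery at work: plain functions implement __get__, so attribute access on an instance produces a bound method (all names below are illustrative):

```python
def method(self):
    return type(self).__name__

class A:
    plain = method

a = A()
# a.plain triggers method.__get__(a, A), returning a bound method
bound = method.__get__(a, A)
print(a.plain())  # 'A' -- self was inserted automatically
print(bound())    # 'A' -- same result via the explicit descriptor call
print(hasattr(method, "__get__"))  # True: functions are descriptors
```

A functools.partial object historically lacks this __get__ hook, which is why no self was inserted in the question's code.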
What I am trying to do is write a wrapper around another module so that I can transform the parameters that are being passed to the methods of the other module. That was fairly confusing, so here is an example:
import somemodule

class Wrapper:
    def __init__(self):
        self.transforms = {}
        self.transforms["t"] = "test"

    # This next function is the one I want to exist
    # Please understand the lines below will not compile and are not real code
    def __intercept__(self, item, *args, **kwargs):
        if "t" in args:
            args[args.index("t")] = self.transforms["t"]
        return somemodule.item(*args, **kwargs)
The goal is to allow users of the wrapper class to make simplified calls to the underlying module without having to rewrite all of the functions in the module. So in this case if somemodule had a function called print_uppercase then the user could do
w = Wrapper()
w.print_uppercase("t")
and get the output
TEST
I believe the answer lies in __getattr__ but I'm not totally sure how to use it for this application.
__getattr__ combined with defining a function on the fly should work:
# somemodule
def print_uppercase(x):
    print(x.upper())
Now:
from functools import wraps
import somemodule

class Wrapper:
    def __init__(self):
        self.transforms = {}
        self.transforms["t"] = "test"

    def __getattr__(self, attr):
        func = getattr(somemodule, attr)

        @wraps(func)
        def _wrapped(*args, **kwargs):
            if "t" in args:
                args = list(args)
                args[args.index("t")] = self.transforms["t"]
            return func(*args, **kwargs)

        return _wrapped

w = Wrapper()
w.print_uppercase('Hello')
w.print_uppercase('t')
Output:
HELLO
TEST
I would approach this by calling an intercept method and passing the name of the desired method to execute as a parameter. The intercept method can then look up a method with that name and execute it.
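That idea can be sketched with a generic intercept method; the module stand-in and transform table below are illustrative, not part of the question's somemodule:

```python
import types

# stand-in for somemodule
somemodule = types.SimpleNamespace(print_uppercase=lambda x: x.upper())

class Intercepter:
    def __init__(self, module):
        self._module = module
        self.transforms = {"t": "test"}

    def intercept(self, name, *args, **kwargs):
        # look up the target function by name, rewrite the arguments,
        # then delegate the call
        func = getattr(self._module, name)
        args = [self.transforms.get(a, a) for a in args]
        return func(*args, **kwargs)

w = Intercepter(somemodule)
print(w.intercept("print_uppercase", "t"))  # TEST
```

The __getattr__ answer above is more convenient for callers, since they don't need to spell out intercept() explicitly.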
Since your Wrapper object doesn't have any mutable state, it'd be easier to implement without a class. Example wrapper.py:
def func1(*args, **kwargs):
    # do your transformations
    return somemodule.func1(*args, **kwargs)
Then call it like:
import wrapper as w
print(w.func1('somearg'))
How do I pass a decorator's function into a job?
I have a decorator that would run a job using the function.
@job
def queueFunction(passedFunction, *args, **kwargs):
    # Do some stuff
    passedFunction(*args, **kwargs)

def myDecorator(async=True):
    def wrapper(function):
        def wrappedFunc(*args, **kwargs):
            data = DEFAULT_DATA
            if async:
                queueFunction.delay(function, *args, **kwargs)
            else:
                data = queueFunction(function, *args, **kwargs)
            return data
        return wrappedFunc
    return wrapper
I get an error when trying to use it.
Can't pickle <function Model.passedFunction at 0x7f410ad4a048>: it's not the same object as modelInstance.models.Model.passedFunction
Using Python 3.4
What happens is that you are passing the original function (or method) to the queueFunction.delay() function, but that's not the same function its qualified name says it is.
In order to run functions in a worker, Python RQ uses the pickle module to serialise both the function and its arguments. But functions (and classes) are serialised as importable names, and when deserialising the pickle module simply imports the recorded name. But it does first check that that will result in the right object. So when pickling, the qualified name is tested to double-check it'll produce the exact same object.
If we use pickle.loads as a sample function, then what roughly happens is this:
>>> import pickle
>>> import sys
>>> sample_function = pickle.loads
>>> module_name = sample_function.__module__
>>> function_name = sample_function.__qualname__
>>> recorded_name = f"{module_name}.{function_name}"
>>> recorded_name
'_pickle.loads'
>>> parent, obj = sys.modules[module_name], None
>>> for name in function_name.split("."): # traverse a dotted path of names
... obj = getattr(parent, name)
...
>>> obj is sample_function
True
Note that pickle.loads is really _pickle.loads; that doesn't matter all that much, but what does matter is that _pickle can be accessed and it has an object that can be found by using the qualified name, and it is the same object still. This will work even for methods on classes (modulename.ClassName.method_name).
But when you decorate a function, you are potentially replacing that function object:
>>> def decorator(f):
...     def wrapper(*args, **kwargs):
...         return f, f(*args, **kwargs)
...     return wrapper
...
>>> @decorator
... def foo(): pass
...
>>> foo.__qualname__
'decorator.<locals>.wrapper'
>>> foo()[0].__qualname__  # original function
'foo'
Note that the decorator result has a very different qualified name from the original! Pickle won't be able to map that back to either the decorator result or to the original function.
You are passing in the original, undecorated function to queueFunction.delay(), and its qualified name will not match that of the wrappedFunc() function you replaced it with; when pickle tries to import the fully qualified name found on that function object, it'll find the wrappedFunc object, and that's not the same object.
There are several ways around this, but the easiest is to store the original function as an attribute on the wrapper, and rename its qualified name to match. This makes the original function importable under that new name.
You'll have to use the @functools.wraps() utility decorator here to copy various attributes from the original, decorated function over to your wrapper function. This includes the original name.
Here is a version that alters the original function qualified name:
from functools import wraps

def myDecorator(async_=True):
    def wrapper(function):
        @wraps(function)
        def wrappedFunc(*args, **kwargs):
            data = DEFAULT_DATA
            if async_:
                queueFunction.delay(function, *args, **kwargs)
            else:
                data = queueFunction(function, *args, **kwargs)
            return data
        # make the original available to the pickle module as "<name>.original"
        wrappedFunc.original = function
        wrappedFunc.original.__qualname__ += ".original"
        return wrappedFunc
    return wrapper
The @wraps(function) decorator makes sure that wrappedFunc.__qualname__ is set to that of function, so if function was named foo, so now is the wrappedFunc function object. The wrappedFunc.original.__qualname__ += ".original" statement then sets the qualified name of wrappedFunc.original to foo.original, and that's exactly where pickle can find it again!
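The renaming trick can be seen in isolation, without Python RQ (deco and foo here are just demo names):

```python
from functools import wraps

def deco(function):
    @wraps(function)
    def wrapper(*args, **kwargs):
        return function(*args, **kwargs)
    # expose the original under "<name>.original" so pickle can import it
    wrapper.original = function
    wrapper.original.__qualname__ += ".original"
    return wrapper

@deco
def foo():
    pass

print(foo.__qualname__)           # the name @wraps copied from foo
print(foo.original.__qualname__)  # same name with ".original" appended
```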
Note: I renamed async to async_ to make the above code work on Python 3.7 and above; as of Python 3.7 async is a reserved keyword.
I also see that you are deciding whether to run something synchronously or asynchronously at decoration time. In that case I'd rewrite it not to check the async_ boolean flag each time you call the function. Just return different wrappers:
from functools import wraps

def myDecorator(async_=True):
    def decorator(function):
        if async_:
            @wraps(function)
            def wrapper(*args, **kwargs):
                queueFunction.delay(wrapper.original, *args, **kwargs)
                return DEFAULT_DATA
            # make the original available to the pickle module as "<name>.original"
            wrapper.original = function
            wrapper.original.__qualname__ += ".original"
        else:
            @wraps(function)
            def wrapper(*args, **kwargs):
                return queueFunction(function, *args, **kwargs)
        return wrapper
    return decorator
I also renamed the various inner functions; myDecorator is a decorator factory that returns the actual decorator, and the decorator returns the wrapper.
Either way, the result is that now the .original object can be pickled:
>>> import pickle
>>> @myDecorator(True)
... def foo(): pass
...
>>> foo.original
<function foo.original at 0x10195dd90>
>>> pickle.dumps(foo.original, pickle.HIGHEST_PROTOCOL)
b'\x80\x04\x95\x1d\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x0cfoo.original\x94\x93\x94.'
I'm using ctypes to work with a library written in C. This C library allows me to register a callback function, which I'm implementing in Python.
Here is the callback function type, according to the ctypes API:
_command_callback = CFUNCTYPE(
    UNCHECKED(c_int),
    POINTER(vedis_context),
    c_int,
    POINTER(POINTER(vedis_value)))
Here is a decorator I've written to mark a function as a callback:
def wrap_callback(fn):
    return _command_callback(fn)
To use this, I am able to simply write:
@wrap_callback
def my_callback(*args):
    print args
    return 1  # Needed by C library to indicate OK response.

c_library_func.register_callback(my_callback)
I can now invoke my callback (my_callback) from C and this works perfectly well.
The problem I'm encountering is that there will be some boilerplate behavior I would like to perform as part of these callbacks (such as returning a success flag, etc). To minimize boilerplate, I tried to write a decorator:
def wrap_callback(fn):
    def inner(*args, **kwargs):
        return fn(*args, **kwargs)
    return _command_callback(inner)
Note that this is functionally equivalent to the previous example.
@wrap_callback
def my_callback(*args):
    print args
    return 1
When I attempt to invoke the callback using this approach, however, I receive the following exception, originating from _ctypes/callbacks.c:
Traceback (most recent call last):
  File "_ctypes/callbacks.c", line 314, in 'calling callback function'
  File "/home/charles/tmp/scrap/z1/src/vedis/vedis/core.py", line 28, in inner
    return fn(*args, **kwargs)
SystemError: Objects/cellobject.c:24: bad argument to internal function
I am not sure what is going on here that would cause the first example to work but the second example to fail. Can anyone shed some light on this? Bonus points if you can help me find a way to decorate these callbacks so I can reduce boilerplate code!
Thanks to eryksyn, I was able to fix this issue. The fix keeps a reference to the inner closure alongside the ctypes callback object, so it can't be garbage collected while the C library still holds the function pointer. It looks like:
def wrap_callback(fn):
    def inner(*args, **kwargs):
        return fn(*args, **kwargs)
    return _command_callback(inner), inner

def my_callback(*args):
    print args
    return 1

ctypes_cb, my_callback = wrap_callback(my_callback)
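The same keep-alive pattern can be demonstrated without the vedis library; the ADDER callback type below is illustrative, and we exploit the fact that ctypes function pointers are also callable from Python:

```python
import ctypes

# illustrative callback type: int fn(int, int)
ADDER = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, ctypes.c_int)

def wrap_callback(fn):
    def inner(*args):
        return fn(*args)
    # return the closure alongside the ctypes object so the caller keeps
    # a live reference for as long as C code might invoke the pointer
    return ADDER(inner), inner

ctypes_cb, py_cb = wrap_callback(lambda a, b: a + b)
print(ctypes_cb(2, 3))  # 5, dispatched through the C thunk
```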