Get function handle that was created using exec() - python

I'm creating a function dynamically, and trying to pass the handle to a class for pickling:
def my_func():
    exec("""def my_collate_fn():
        pass""")
    loader = DataLoader(collate_fn=my_collate_fn)
The code above throws an error saying that my_collate_fn is not defined. The weird thing is that during debugging the handle did actually exist and I could see it in the local scope, but it throws an error at runtime. Is there something I missed?
For context, I'm strongly avoiding lambdas since PyTorch's DataLoader class can't pickle them if the number of workers is greater than 0.
EDIT:

When you call exec you may pass two additional parameters with dictionaries representing the global and local namespaces where the code is run.
When one creates a function with the def statement, its name is bound in the local namespace. If only globals is given, locals actually defaults to be the same dictionary.
If you do not pass a globals parameter to exec, it will use the global namespace of the module where it is called from - the function will be set in the running context, just as if it were typed inline, and you can just use the name you used inside the exec string. Every linter on earth and some other tools will yell at you.
If you simply pass an ordinary dictionary as the globals parameter, you can retrieve your function from there:
from textwrap import dedent as D
# use of dedent will allow you to keep indentation inside the string
# conforming to the indentation outside
def my_func():
    namespace = {}
    exec(D("""\
        def my_collate_fn():
            pass
        """), namespace)
    return namespace["my_collate_fn"]
The bad news: this is even less picklable than a lambda (if that is possible).
If you have to pass functions around that will be handed to sub-processes as arguments
(for which the internal mechanism is pickling the function), just declare a plain, named function, at global scope, with def. Pickle will do its best to find the function and pass it around by using its __qualname__, and it should work in most cases - just keep it simple.
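For what it's worth, a minimal sketch of that advice, assuming PyTorch's DataLoader and some pre-existing dataset object (my_dataset is a placeholder, not from the original question): a collate function defined with def at module scope can be looked up by name, so it pickles fine even with num_workers > 0.

from torch.utils.data import DataLoader

def my_collate_fn(batch):
    # plain module-level function: picklable by qualified name
    return batch

# my_dataset is a hypothetical torch Dataset instance
loader = DataLoader(my_dataset, num_workers=2, collate_fn=my_collate_fn)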

Related

__closure__ attribute of function object always be 'None' when defining func inside exec()

EDIT2:
A minimal demonstration is:
code = """\
a=1
def f1():
print(a)
print(f1.__closure__)
f1()
"""
def foo():
exec(code)
foo()
Which gives:
None
Traceback (most recent call last):
File "D:/workfiles/test_eval_rec.py", line 221, in <module>
foo()
File "D:/workfiles//test_eval_rec.py", line 219, in foo
exec(code)
File "<string>", line 5, in <module>
File "<string>", line 3, in f1
NameError: name 'a' is not defined
It can be seen that the __closure__ attribute of a function defined inside the code string passed to exec() is None, which makes calling the function fail.
Why does this happen and how can I define a function successfully?
I find several questions that may be related.
Closure lost during callback defined in exec()
Using exec() with recursive functions
Why exec() works differently when invoked inside of function and how to avoid it
Why are closures broken within exec?
NameError: name 'self' is not defined IN EXEC/EVAL
These questions are all related to "defining a function inside exec()". I think the fourth question here is closest to the essence of these problems. The common cause of these problems is that when defining a function in exec(), the __closure__ attribute of the function object cannot be set correctly and will always be None. However, many existing answers to this question didn't realize this point.
Why these questions are caused by wrong __closure__:
When defining a function, the __closure__ attribute is set to a tuple of cells holding all local symbols (at the place where the keyword def is used) that are used inside the newly defined function. When calling the function, those local symbols are retrieved from the __closure__ attribute. Since the __closure__ is set to None, the local symbols cannot be retrieved as expected, making the function call fail.
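For comparison, a minimal sketch of what a correctly populated __closure__ looks like for an ordinary nested function defined outside exec():

def outer():
    a = 1
    def inner():
        return a
    return inner

f = outer()
print(f.__closure__)                   # (<cell at ...: int object at ...>,)
print(f.__closure__[0].cell_contents)  # 1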
These answers work by making None the correct __closure__ value:
Existing solutions to the questions listed above solve these problems by removing the use of local symbols from the function definition, i.e., they make the local symbols that are used (variables, function definitions) global, either by passing globals() as the locals of exec or by using the keyword global explicitly in the code string.
Why the existing solutions are unsatisfying:
These solutions, I think, just sidestep the core problem of setting __closure__ correctly when defining a function inside exec(). And since the symbols used in the function definition are made global, these solutions produce redundant global symbols which I don't want.
Original Questions:
(You may ignore this section; I have figured something out, and what I currently want to ask is described in the section EDIT2. The original question can be viewed as a special case of the question described in EDIT2.)
The original title of this question was: Wrapping class function to new function with exec() raises NameError that ‘self’ is not defined
I want to wrap an existing member function into a new class function. However, exec() fails with a NameError that ‘self’ is not defined.
I did some experiments with the following code. I called globals() and locals() in the exec'd string; it seems that locals() is different in the function definition scope when exec() is executed. "self" is in locals() when in exec(); however, in the function definition scope inside the exec(), "self" is not in locals().
class test_wrapper_function():
    def __init__(self):
        # first wrapper
        def temp_func():
            print("locals() inside the function definition without exec:")
            print(locals())
            return self.func()
        print("locals() outside the function definition without exec:")
        print(locals())
        self.wrappered_func1 = temp_func
        # third wrapper using exec
        define_function_str = '''def temp_func():
    print("locals() inside the function definition:")
    print(locals())
    print("globals() inside the function definition:")
    print(globals())
    return self.func()
print("locals() outside the function definition:")
print(locals())
print("globals() outside the function definition:")
print(globals())
self.wrappered_func2 = temp_func'''
        exec(define_function_str)
        # call locals() here, it will contain temp_func
    def func(self):
        print("hi!")

t = test_wrapper_function()
print("**********************************************")
t.wrappered_func1()
t.wrappered_func2()
I have read this link. In the exec(), member functions and attributes of "self" can be accessed without problem, while in the function definition inside the exec(), "self" is not available any more. Why does this happen?
Why I want to do this:
I am building a PyQt program. I want to create several similar slots. These slots can be generated by calling one member function with different arguments, so I decided to generate them using Python's exec() function. I also searched with the keyword "nested name scope in python exec"; I found this question may be related, but there is no useful answer.
To be more specific. I want to define a family of slots like func_X (X can be 'a', 'b', 'c'...), each do something like self.do_something_on(X). Here, do_something is a member function of my QWidget. So I use a for loop to create these slots function. I used codes like this:
class MyWidget():
    def __init__(self):
        self.create_slots_family()
    def do_something(self, character):
        # in fact, this function is much more complex. Do some simplification.
        print(character)
    def create_slots_i(self, character):
        # want to define a function like this:
        # if character is 'C', define self.func_C such that self.func_C() works like self.do_something(C)
        create_slot_command_str = "self.func_" + character + " = lambda:self.do_something('" + character + "')"
        print(create_slot_command_str)
        exec(create_slot_command_str)
    def create_slots_family(self):
        for c in ["A", "B", "C", "D"]:
            self.create_slots_i(c)

my_widget = MyWidget()
my_widget.func_A()
Note that, as far as I know, Qt slots should not accept any parameters, so I have to wrap self.do_something(character) into a series of functions self.func_A, self.func_B and so on for all the possible characters.
So the above is what I originally wanted to do.
EDIT1:
(You may ignore this section; I have figured something out, and what I currently want to ask is described in the section EDIT2. This simplified version of the original question can also be viewed as a special case of the question described in EDIT2.)
As @Mad Physicist suggested, I provide a simplified version here, deleting some code used for experiments.
class test_wrapper_function():
    def __init__(self):
        define_function_str = '''\
def temp_func():
    return self.func()
self.wrappered_func2 = temp_func'''
        exec(define_function_str)
    def func(self):
        print("hi!")

t = test_wrapper_function()
t.wrappered_func2()
I expected this to print a "hi". However, I got the following exception:
Traceback (most recent call last):
File "D:/workfiles/test_eval_class4.py", line 12, in <module>
t.wrappered_func2()
File "<string>", line 2, in temp_func
NameError: name 'self' is not defined
Using Exec
You've already covered most of the problems and workarounds with exec, but I feel that there is still value in adding a summary.
The key issue is that exec only knows about globals and locals, but not about free variables and the non-local namespace. That is why the docs say
If exec gets two separate objects as globals and locals, the code will be executed as if it were embedded in a class definition.
There is no way to make it run as though it were in a method body. However, as you've already noted, you can make exec create a closure and use that instead of the internal namespace by adding a method body to your snippet. However, there are still a couple of subtle restrictions there.
Your example of what you are trying to do showcases the issues perfectly, so I will use a modified version of that. The goal is to make a method that binds to self and has a variable argument in the exec string.
class Test:
    def create_slots_i(self, c):
        create_slot_command_str = f"self.func_{c} = lambda: self.do_something('{c}')"
        exec(create_slot_command_str)
    def do_something(self, c):
        print(f'I did {c}!')
There are different ways of getting exec to "see" variables: literals, globals, and internal closures.
Literals. This works robustly, but only for simple types that can be easily instantiated from a string. The usage of c above is a perfect example. This will not help you with a complex object like self:
>>> t = Test()
>>> t.create_slots_i('a')
>>> t.func_a()
...
NameError: name 'self' is not defined
This happens exactly because exec has no concept of free variables. Since self is passed to it via the default locals(), it does not bind the reference to a closure.
globals. You can pass in a name self to exec via globals. There are a couple of ways of doing this, each with its own issues. Remember that globals are accessed by a function through its __globals__ attribute (look at the table under "Callable types"). Normally __globals__ refers to the __dict__ of the module in which a function is defined. In exec, this is the case by default as well, since that's what globals() returns.
Add to globals: You can create a global variable named self, which will make your problem go away, sort of:
>>> self = t
>>> t.func_a()
I did a!
But of course this is a house of cards that falls apart as soon as you delete self, modify it, or try to run this on multiple instances:
>>> del self
>>> t.func_a()
...
NameError: name 'self' is not defined
Copy globals. A much more versatile solution, on the surface of it, is to copy globals() when you run exec in create_slots_i:
def create_slots_i(self, c):
    create_slot_command_str = f"self.func_{c} = lambda: self.do_something('{c}')"
    g = globals().copy()
    g['self'] = self
    exec(create_slot_command_str, g)
This appears to work normally, and for a very limited set of cases, it actually does:
>>> t = Test()
>>> t.create_slots_i('a')
>>> t.func_a()
I did a!
But now, your function's __globals__ attribute is no longer bound to the module you created it in. If it uses any other global values, especially ones that might change, you will not be able to see the changes. For limited functionality, this is OK, but in the general case, it can be a severe handicap.
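A minimal sketch of that staleness problem: the function defined via exec keeps the copied dict as its __globals__, so later changes to the real module-level name are invisible to it.

counter = 0

g = globals().copy()
exec("def read_counter():\n    return counter", g)
read_counter = g['read_counter']

counter = 100            # rebinding the real module global...
print(read_counter())    # ...still prints 0: the function reads the copied dict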
Internal Closures. This is the solution you already hit upon, where you create a closure within the exec string to let it know that you have a free variable by artificial means. For example:
class Test:
    def create_slots_i(self, c):
        create_slot_command_str = f"""def make_func(self):
    def func_{c}():
        self.do_something('{c}')
    return func_{c}
self.func_{c} = make_func(self)"""
        g = globals().copy()
        g['self'] = self
        exec(create_slot_command_str, g)
    def do_something(self, c):
        print(f'I did {c}!')
This approach works completely:
>>> t = Test()
>>> t.create_slots_i('a')
>>> t.func_a()
I did a!
The only real drawbacks here are security, which is always a problem with exec, and the sheer awkwardness of this monstrosity.
A Better Way
Since you are already creating closures, there is really no need to use exec at all. In fact, the only thing you are really doing is creating methods so that self.func_... will bind the method for you, since you need a function with the signature of your slot and access to self. You can write a simple method that will generate functions that you can assign to your slots directly. The advantage of doing it this way is that (a) you avoid calling exec entirely, and (b) you don't need to have a bunch of similarly named auto-generated methods polluting your class namespace. The slot generator would look something like this:
def create_slots_i(self, c):
    def slot_func():
        self.do_something(c)  # This is a real closure now
    slot_func.__name__ = f'func_{c}'
    return slot_func
Since you will not be referring to these function objects anywhere except your slots, __name__ is the only way to get the "name" under which they were stored. That is the same thing that def does for you under the hood.
You can now assign slots directly:
some_widget.some_signal.connect(self.create_slots_i('a'))
Note
I originally had a more complex approach in mind for you, since I thought you cared about generating bound methods, instead of just setting __name__. In case you have a sufficiently complex scenario where it still applies, here is my original blurb:
A quick recap of the descriptor protocol: when you bind a function with the dot operator, e.g., t.func_a, python looks at the class for descriptors with that name. If your class has a data descriptor (like property, but not functions), then that descriptor will shadow anything you may have placed in the instance __dict__. However, if you have a non-data descriptor (one with a __get__ method but without a __set__ method, like a function object), then it will only be bound if an instance attribute does not shadow it. Once this decision has been made, actually invoking the descriptor protocol involves calling type(t).func_a.__get__(t). That's how a bound method knows about self.
Now you can return a bound method from within your generator:
def create_slots_i(self, c):
    def slot_func(self):
        self.do_something(c)  # This is a closure on `c`, but not on `self` until you bind it
    slot_func.__name__ = f'func_{c}'
    return slot_func.__get__(self)
Why this phenomenon happens:
Actually, the answer to question 4 listed above can answer this question.
When exec() is called on a code string, the code string is first compiled. I suppose that during compilation the provided globals and locals are not considered. The symbols in the exec'd code string are compiled as globals, so the function defined in the code string is considered to use global variables, and thus __closure__ is set to None.
Refer to this answer for more information about what the func exec does.
How to deal with this phenomena:
Imitating the solutions provided in the previous questions, the minimal demonstration in the question can also be modified this way to work:
a=1 # moved out of the variable 'code'
code = """\
def f1():
    print(a)
print(f1.__closure__)
f1()
"""
def foo():
    exec(code)
foo()
Although the __closure__ is still None, the exception can be avoided because now only the global symbol is needed, and __closure__ should indeed be None if set correctly. You can read the part "The reason why the solutions work" in the question body for more information.
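A minimal sketch of the same idea without leaking anything into the caller's module: pass one explicit dictionary as exec's globals, so the a=1 assignment and f1's lookup of a hit the same namespace (the __closure__ stays None, but nothing needs it):

code = """\
a=1
def f1():
    print(a)
print(f1.__closure__)
f1()
"""

namespace = {}
exec(code, namespace)   # prints None, then 1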
This was originally added in Revision 4 of the question.
TL;DR
To set the correct __closure__ attribute of a function defined in a code string passed to exec(), just wrap the whole code string in a function definition.
I provide an example here to demonstrate all possible situations. Suppose you want to define a function named foo inside a code string used by exec(). foo uses functions and variables that are defined both inside and outside the code string:
def f1():
    outside_local_variable = "this is local variable defined outside code str"
    def outside_local_function():
        print("this is function defined outside code str")
    code = """\
local_variable = "this is local variable defined inside code str"
def local_function():
    print("this is function defined inside code str")
def foo():
    print(local_variable)
    local_function()
    print(outside_local_variable)
    outside_local_function()
foo()
"""
    exec(code)
f1()
It can be wrapped like this:
def f1():
    outside_local_variable = "this is local variable defined outside code str"
    def outside_local_function():
        print("this is function defined outside code str")
    code = """\
def closure_helper_func(outside_local_variable, outside_local_function):
    local_variable = "this is local variable defined inside code str"
    def local_function():
        print("this is function defined inside code str")
    def foo():
        print(local_variable)
        local_function()
        print(outside_local_variable)
        outside_local_function()
    foo()
closure_helper_func(outside_local_variable, outside_local_function)
"""
    exec(code)
f1()
Detailed explanation:
Why the __closure__ attribute is not correctly set:
Please refer to the community wiki answer.
How to set the __closure__ attribute to what's expected:
Just wrap the whole code string in a helper function definition and call the helper function once; then during compilation the variables are considered to be local, and will be stored in the __closure__ attribute.
For the minimal demonstration in the question, it can be modified to the following:
code = """\
def closure_helper_func():
a=1
def f1():
print(a)
print(f1.__closure__)
f1()
closure_helper_func()
"""
def foo():
exec(code)
foo()
This outputs, as expected:
(<cell at 0x0000019CE6239A98: int object at 0x00007FFF42BFA1A0>,)
1
The example above provides a way to add symbols that are defined inside the code string to the __closure__. For example, in the minimal demo, a=1 is defined inside the code string. But what if one wants to add local symbols defined outside the code string? For example, in the code snippet in the EDIT1 section, the self symbol needs to be added to the __closure__, and that symbol is available in locals() when exec() is called. Just add the names of these symbols to the arguments of the helper function and you can handle this situation.
The following shows how to fix the problem in the EDIT1 section.
class test_wrapper_function():
    def __init__(self):
        define_function_str = '''\
def closure_helper_func(self):
    def temp_func():
        return self.func()
    self.wrappered_func2 = temp_func
closure_helper_func(self)
'''
        exec(define_function_str)
    def func(self):
        print("hi!")

t = test_wrapper_function()
t.wrappered_func2()
The following shows how to fix the code in the section "Why I want to do this":
class MyWidget():
    def __init__(self):
        self.create_slots_family()
    def do_something(self, character):
        # in fact, this function is much more complex. Do some simplification.
        print(character)
    def create_slots_i(self, character):
        # want to define a function like this:
        # if character is 'C', define self.func_C such that self.func_C() works like self.do_something(C)
        # create_slot_command_str = "self.func_" + character + " = lambda:self.do_something('" + character + "')"
        create_slot_command_str = """
def closure_helper_func(self):
    self.func_""" + character + " = lambda:self.do_something('" + character + """')
closure_helper_func(self)
"""
        # print(create_slot_command_str)
        exec(create_slot_command_str)
    def create_slots_family(self):
        for c in ["A", "B", "C", "D"]:
            self.create_slots_i(c)

my_widget = MyWidget()
my_widget.func_A()
This solution seems too tricky. However, I cannot find a more elegant way to declare that some variables should be local symbols during compilation.

Passing default argument after initialization

I have a Python file with many functions.
Every function gets a client as a default argument,
but at the time the functions are defined, the client does not exist yet.
It will be created in a main function, inside a with context manager.
How can I solve this (make the client available in every function) and preserve the context manager as suggested in the docs?
I tried to use the global statement, a generator, global variables, and functools.partial, but all these attempts failed, mostly because of a closed connection outside of the context manager.
... # Many functions
def get_statistic_with(client: TelegramClient = client1):  # client1 is undefined at this stage
    client.send_message(entity=config.BOT_NAME, message='/get_statistic_with')

def main():
    with TelegramClient('test_client', config.API_ID, config.API_HASH) as client1, \
         TelegramClient('test_client1', config.API_ID, config.API_HASH) as client2:
        ...
I know that I can use a class to fix it, but I don't plan to create many instances; I only need to import and run the main function from another file.
I could also place all the functions inside the with block, but that would be too ugly.
Passing a connection to every function is pretty inconvenient.
You mentioned trying both global variables and functools.partial (which works a lot like a lambda function), and neither worked.
I believe the same methods you tried would work if you used both of those together.
Here is an example where I've done that (I used lambda but if you prefer partial it should be an easy substitution):
class TelegramClient:
    def __init__(self, name):
        self.name = name
    def showProofOfWorking(self):
        print("--- SUCCESS! This is a TelegramClient named " + self.name)

# In order to avoid namespace errors, there must be a reference to the clients in global scope.
# However, the value of client1 doesn't need to be set until 'main', as will be demonstrated in the
# first of the three methods tested in this example code.
globalVars = {'client1': None, 'client2': None}

# For comparison purposes. Compare how this globally scoped client reference acts relative to the function-scoped 'localClient' inside 'main'
otherGlobalClient = TelegramClient("OTHER CLIENT IN GLOBAL SCOPE")

def refersToGlobalClient(client = globalVars['client1']):
    client.showProofOfWorking()

def refersToGlobalWrappedClient(clientWrapper = lambda: globalVars['client1']):
    unwrappedClient = clientWrapper()
    unwrappedClient.showProofOfWorking()

def refersToLocalWrappedClient(clientWrapper = lambda: client1):
    unwrappedClient = clientWrapper()
    unwrappedClient.showProofOfWorking()

def main():
    client1 = TelegramClient('CLIENT 1')
    client2 = TelegramClient('CLIENT 2')
    # No references to this from the global scope: it only exists inside 'main'
    otherLocalClient = TelegramClient('CLIENT ONLY IN LOCAL SCOPE')
    # Only 'client1' will be put into global scope. 'client2' is not defined outside the scope of 'main'.
    globalVars['client1'] = client1
    # ----- This is the only one of the three different methods tried here that works. -----
    # It combines a global variable with a lambda function, which changes the reference of the wrapped global variable from a compile-time value to a runtime value,
    # circumventing the fact that at compile time, the value of the global variable will be None as it will not be set to its final value until inside 'main'.
    print("FIRST METHOD: Global Scope plus Lambda Wrapper: Works in All Cases")
    print("Inside main(), we will now call 'refersToGlobalWrappedClient' with its default argument, which is a function that returns the current value of 'globalVars['client1']'")
    refersToGlobalWrappedClient()
    print("\nNow we'll call the same function with a different argument from global scope.")
    refersToGlobalWrappedClient(lambda: otherGlobalClient)
    print("\nNow we'll call the same function with a different argument from local scope.")
    refersToGlobalWrappedClient(lambda: otherLocalClient)
    print("\n__________________________\n")
    print("SECOND METHOD: Global Scope with No Wrapper: Works With Arguments, Does Not Work With Default Parameter")
    print("Inside main(), we will now attempt to call 'refersToGlobalClient' with its default argument, which is 'globalVars['client1']'")
    try:
        refersToGlobalClient()
    except Exception as error:
        print("\nAN EXCEPTION OCCURRED! Exception:")
        print(error)
    print("\nNow we'll call the same function with a different argument from global scope.")
    refersToGlobalClient(otherGlobalClient)
    print("\nNow we'll call the same function with a different argument from local scope.")
    refersToGlobalClient(otherLocalClient)
    print("\n__________________________\n")
    print("THIRD METHOD: Local Scope with Lambda Wrapper: ")
    print("Inside main(), we will now attempt to call 'refersToLocalClient' with its default argument, which is a function that returns 'client1'")
    try:
        refersToLocalWrappedClient()
    except Exception as error:
        print("\nAN EXCEPTION OCCURRED! Exception:")
        print(error)
    print("\nNow we'll call the same function with a different argument from global scope.")
    refersToLocalWrappedClient(lambda: otherGlobalClient)
    print("\nNow we'll call the same function with a different argument from local scope.")
    refersToLocalWrappedClient(lambda: otherLocalClient)
    print("\n__________________________\n")

main()
Here, I tried three different methods. Like you said, only adding a reference to client1 in global scope, or only wrapping the reference to the not-yet-defined client1 inside a function, do not work and cause errors when you try to use client1 as a default argument.
But, if you both set up a reference to your client1 in global scope (here, it's inside a dict called globalVars) and wrap the default client1 parameter inside a function, you can call your functions successfully from inside main() with the default parameter (client1) or with any other TelegramClient as an argument.
The globally scoped reference is necessary to avoid the namespace error name 'client1' is not defined.
The lambda wrapper is necessary to avoid the error you would get from the value of the global reference not being set until main() runs (at definition time, when the default parameters are evaluated, it will be None and you'd get the error 'NoneType' object has no attribute 'showProofOfWorking').
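A minimal sketch of that timing difference (hypothetical names): a plain default is evaluated once, when def runs, while a lambda default defers the lookup until call time.

client1 = None

def direct(client=client1):               # default captured now: None
    return client

def wrapped(get_client=lambda: client1):  # lookup deferred to call time
    return get_client()

client1 = "the real client"
print(direct())    # None
print(wrapped())   # the real client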
Functions are objects and can have attributes. I don't see this feature used much here on SO and have seen criticisms of its use - if I recall, it obfuscates things and interferes with introspection, but my memory is sometimes faulty.
If you want to be able to call a function without passing an argument but the value of the argument is not defined/calculated till after the function's definition you could make use of a function attribute like this.
def f():
    return 3 * f.x

def main():
    f.x = 6

main()
print(f())
You will have to decide whether this is more advantageous than just defining the function with parameters - without default arguments - and just passing the argument when the function is called. At least for me there is not enough information in your question and minimal reproducible example to really understand whether this approach will satisfy your need.
Maybe you should have a module-level variable, before the function definition:
client1 = None
..
def get_statistic_with(client: TelegramClient = client1):
    ...
then reassign in the with statement.
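A minimal sketch of that suggestion, assuming the asker's TelegramClient and config names. Note that a plain default like client=client1 is bound when the def runs, so for the reassignment inside the with block to be visible the function has to fall back to the module-level name at call time (here via a None sentinel):

client1 = None  # module-level placeholder

def get_statistic_with(client: TelegramClient = None):
    client = client if client is not None else client1   # resolved at call time
    client.send_message(entity=config.BOT_NAME, message='/get_statistic_with')

def main():
    global client1
    with TelegramClient('test_client', config.API_ID, config.API_HASH) as c1:
        client1 = c1              # reassign inside the with block
        get_statistic_with()      # now sees the live client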

Python warn me or prevent me from using global variables

I've gotten myself in trouble a few times now by accidentally (unintentionally) referencing global variables in a function or method definition.
My question is: is there any way to disallow python from letting me reference a global variable? Or at least warn me that I am referencing a global variable?
x = 123
def myfunc() :
    print x # throw a warning or something!!!
Let me add that the typical situation where this arises for me is using IPython as an interactive shell. I use 'execfile' to execute a script that defines a class. In the interpreter, I access the class variable directly to do something useful, then decide I want to add that as a method in my class. When I was in the interpreter, I was referencing the class variable. However, when it becomes a method, it needs to reference 'self'. Here's an example.
class MyClass :
    a = 1
    b = 2
    def add(self) :
        return a+b

m = MyClass()
Now in my interpreter I run the script with execfile('script.py'), inspect my class, type m.a * m.b, and decide that would be a useful method to have. So I modify my code, with the unintentional copy/paste error, to be:
class MyClass :
    a = 1
    b = 2
    def add(self) :
        return a+b
    def mult(self) :
        return m.a * m.b # I really meant this to be self.a * self.b
This of course still executes in IPython, but it can really confuse me since it is now referencing the previously defined global variable!
Maybe someone has a suggestion given my typical IPython workflow.
First, you probably don't want to do this. As Martijn Pieters points out, many things, like top-level functions and classes, are globals.
You could filter this for only non-callable globals. Functions, classes, builtin-function-or-methods that you import from a C extension module, etc. are callable. You might also want to filter out modules (anything you import is a global). That still won't catch cases where you, say, assign a function to another name after the def. You could add some kind of whitelisting for that (which would also allow you to create global "constants" that you can use without warnings). Really, anything you come up with will be a very rough guide at best, not something you want to treat as an absolute warning.
Also, no matter how you do it, trying to detect implicit global access, but not explicit access (with a global statement) is going to be very hard, so hopefully that isn't important.
There is no obvious way to detect all implicit uses of global variables at the source level.
However, it's pretty easy to do with reflection from inside the interpreter.
The documentation for the inspect module has a nice chart that shows you the standard members of various types. Note that some of them have different names in Python 2.x and Python 3.x.
This function will get you a list of all the global names accessed by a bound method, unbound method, function, or code object in both versions:
def get_globals(thing):
    thing = getattr(thing, 'im_func', thing)
    thing = getattr(thing, '__func__', thing)
    thing = getattr(thing, 'func_code', thing)
    thing = getattr(thing, '__code__', thing)
    return thing.co_names
If you want to only handle non-callables, you can filter it:
def get_callable_globals(thing):
    thing = getattr(thing, 'im_func', thing)
    func_globals = getattr(thing, 'func_globals', {})
    thing = getattr(thing, 'func_code', thing)
    return [name for name in thing.co_names
            if callable(func_globals.get(name))]
This isn't perfect (e.g., if a function's globals have a custom builtins replacement, we won't look it up properly), but it's probably good enough.
A simple example of using it:
>>> def foo(myparam):
...     myglobal
...     mylocal = 1
>>> print get_globals(foo)
('myglobal',)
And you can pretty easily import a module and recursively walk its callables and call get_globals() on each one, which will work for the major cases (top-level functions, and methods of top-level and nested classes), although it won't work for anything defined dynamically (e.g., functions or classes defined inside functions).
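For example, a rough sketch of that walk (Python 3 syntax, reusing the get_globals helper above; it only covers top-level functions and methods of top-level classes):

import inspect

def report_module_globals(module):
    # Print the names referenced by each top-level function
    # and by each method of each top-level class.
    for name, obj in vars(module).items():
        if inspect.isfunction(obj):
            print(name, get_globals(obj))
        elif inspect.isclass(obj):
            for mname, member in vars(obj).items():
                if inspect.isfunction(member):
                    print('{}.{}'.format(name, mname), get_globals(member))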
If you only care about CPython, another option is to use the dis module to scan all the bytecode in a module, or .pyc file (or class, or whatever), and log each LOAD_GLOBAL op.
One major advantage of this over the inspect method is that it will find functions that have been compiled, even if they haven't been created yet.
The disadvantage is that there is no way to look up the names (how could there be, if some of them haven't even been created yet?), so you can't easily filter out callables. You can try to do something fancy, like connecting up LOAD_GLOBAL ops to corresponding CALL_FUNCTION (and related) ops, but… that's starting to get pretty complicated.
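On Python 3 (3.4+), a minimal sketch of that bytecode scan, without trying to pair the loads with call ops:

import dis

def global_loads(func):
    # Names accessed via LOAD_GLOBAL in the function's compiled bytecode.
    return sorted({ins.argval for ins in dis.get_instructions(func)
                   if ins.opname == 'LOAD_GLOBAL'})

def foo(myparam):
    myglobal
    mylocal = 1

print(global_loads(foo))   # ['myglobal']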
Finally, if you want to hook things dynamically, you can always replace globals with a wrapper that warns every time you access it. For example:
import sys
import collections

class GlobalsWrapper(collections.MutableMapping):
    def __init__(self, globaldict):
        self.globaldict = globaldict
    # ... implement at least __setitem__, __delitem__, __iter__, __len__
    # in the obvious way, by delegating to self.globaldict
    def __getitem__(self, key):
        print >>sys.stderr, 'Warning: accessing global "{}"'.format(key)
        return self.globaldict[key]

globals_wrapper = GlobalsWrapper(globals())
Again, you can filter on non-callables pretty easily:
def __getitem__(self, key):
    value = self.globaldict[key]
    if not callable(value):
        print >>sys.stderr, 'Warning: accessing global "{}"'.format(key)
    return value
Obviously for Python 3 you'd need to change the print statement to a print function call.
You can also raise an exception instead of warning pretty easily. Or you might want to consider using the warnings module.
You can hook this into your code in various different ways. The most obvious one is an import hook that gives each new module a GlobalsWrapper around its normally-built globals. I'm not sure how that will interact with C extension modules, but my guess is that it will either work or be harmlessly ignored, either of which is probably fine. The only problem is that this won't affect your top-level script. If that's important, you can write a wrapper script that execfiles the main script with a GlobalsWrapper, or something like that.
I've been struggling with a similar challenge (especially in Jupyter notebooks) and created a small package to limit the scope of functions.
>>> from localscope import localscope
>>> a = 'hello world'
>>> @localscope
... def print_a():
...     print(a)
Traceback (most recent call last):
...
ValueError: `a` is not a permitted global
The @localscope decorator uses python's disassembler to find all instructions in the decorated function that use a LOAD_GLOBAL (global variable access) or LOAD_DEREF (closure access) op. If the variable to be loaded is a builtin function, is explicitly listed as an exception, or satisfies a predicate, the variable is permitted. Otherwise, an exception is raised.
Note that the decorator analyses the code statically. Consequently, it does not have access to the values of variables accessed by closure.

How to create cross-module, on-the-fly variable name in python?

What I am trying to do is create a module with a class; a function which is an interface to that class; and a variable name created on-the-fly in this function, which points to an instance of that class. This function and the class itself should be in a separate module, and their usage should be in a different Python file.
I think, it's much easier to understand what I am trying to do, when you are looking at my code:
This is the first.py:
class FirstClass:
    def setID(self, _id):
        self.id = _id
    def func(self):
        pass

# An 'interface' for FirstClass
def fst(ID):
    globals()['%s' % ID] = FirstClass(ID)
    return globals()['%s' % ID]
Now, if I'm calling fst('some_text') right in first.py, the result is pretty much what I dreamed of, because later on, any time I write some_text.func(), it will call the func(), because some_text is pointing to an instance of FirstClass.
But, when the second.py is something like this:
from first import fst
fst('sample_name')
sample_name.func()
Then the answer from python is going to be like this:
NameError: name 'sample_name' is not defined.
Which is somewhat reasonable. So my question is: is there a "prettier" method or a completely different one to do this? Or do I have to change something small in my code to get this done?
Thank you!
Don't set it as a global in the function. Instead, just return the new instance from the function and set the global to that return value:
def fst(ID):
    return FirstClass(ID)
then in second.py:
sample_name = fst('sample_name')
where, if inside a function, you declare sample_name a global.
The globals() method only ever returns the globals of the module in which you call it. It'll never return the globals of whatever is calling the function. If you feel you need to have access to those globals, rethink your code, you rarely, if ever, need to alter the globals of whatever is calling your function.
If you are absolutely certain you need access to the caller globals, you need to start hacking with stack frames:
# retrieve caller globals
import sys
caller_globals = sys._getframe(1).f_globals
But, as the documentation of sys._getframe() states:
CPython implementation detail: This function should be used for internal and specialized purposes only. It is not guaranteed to exist in all implementations of Python.
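For completeness, a rough sketch of what that hack would look like in first.py (fragile and CPython-specific, per the caveat above):

import sys

def fst(ID):
    instance = FirstClass(ID)
    # write the name directly into the *caller's* module globals
    sys._getframe(1).f_globals[ID] = instance
    return instance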

"self" inside plain function?

I've got a bunch of functions (outside of any class) where I've set attributes on them, like funcname.fields = 'xxx'. I was hoping I could then access these variables from inside the function with self.fields, but of course it tells me:
global name 'self' is not defined
So... what can I do? Is there some magic variable I can access? Like __this__.fields?
A few people have asked "why?". You will probably disagree with my reasoning, but I have a set of functions that all must share the same signature (accept only one argument). For the most part, this one argument is enough to do the required computation. However, in a few limited cases, some additional information is needed. Rather than forcing every function to accept a long list of mostly unused variables, I've decided to just set them on the function so that they can easily be ignored.
Although, it occurs to me now that you could just use **kwargs as the last argument if you don't care about the additional args. Oh well...
Edit: Actually, some of the functions I didn't write, and would rather not modify to accept the extra args. By "passing in" the additional args as attributes, my code can work both with my custom functions that take advantage of the extra args, and with third party code that don't require the extra args.
Thanks for the speedy answers :)
self isn't a keyword in python, it's just a normal variable name. When creating instance methods, you can name the first parameter whatever you want; self is just a convention.
You should almost always prefer passing arguments to functions over setting properties for input, but if you must, you can do so using the actual function's name to access variables within it:
def a():
    if a.foo:
        # blah
        pass

a.foo = False
a()
see python function attributes - uses and abuses for when this comes in handy. :D
def foo():
    print(foo.fields)

foo.fields=[1,2,3]
foo()
# [1, 2, 3]
There is nothing wrong with adding attributes to functions. Many memoizers use this to cache results in the function itself.
For example, notice the use of func.cache:
from decorator import decorator

@decorator
def memoize(func, *args, **kw):
    # Author: Michele Simoniato
    # Source: http://pypi.python.org/pypi/decorator
    if not hasattr(func, 'cache'):
        func.cache = {}
    if kw:  # frozenset is used to ensure hashability
        key = args, frozenset(kw.iteritems())
    else:
        key = args
    cache = func.cache  # attribute added by memoize
    if key in cache:
        return cache[key]
    else:
        cache[key] = result = func(*args, **kw)
        return result
You can't do that "function accessing its own attributes" correctly for all situations - see here for details: how can python function access its own attributes? - but here is a quick demonstration:
>>> def f(): return f.x
...
>>> f.x = 7
>>> f()
7
>>> g = f
>>> g()
7
>>> del f
>>> g()
Traceback (most recent call last):
File "<interactive input>", line 1, in <module>
File "<interactive input>", line 1, in f
NameError: global name 'f' is not defined
Basically most methods directly or indirectly rely on accessing the function object through a lookup by name in globals; if the original function name is deleted, this stops working. There are other kludgey ways of accomplishing this, like defining a class or a factory - but thanks to your explanation it is clear you don't really need that.
Just do the mentioned keyword catch-all argument, like so:
def fn1(oneArg):
    # do the due
    pass

def fn2(oneArg, **kw):
    if 'option1' in kw:
        print 'called with option1=', kw['option1']
    # do the rest

fn2(42)
fn2(42, option1='something')
Not sure what you mean in your comment of handling TypeError - that won't arise when using **kw. This approach works very well for some python system functions - check min(), max(), sort(). Recently sorted(dct,key=dct.get,reverse=True) came very handy to me in CodeGolf challenge :)
Example:
>>> def x(): pass
>>> x
<function x at 0x100451050>
>>> x.hello = "World"
>>> x.hello
"World"
You can set attributes on functions, as these are just plain objects, but I actually never saw something like this in real code.
Plus, self is not a keyword, just another variable name, which happens to be the particular instance of the class. self is passed implicitly, but received explicitly.
If you want globally set parameters for a callable 'thing', you could always create a class and implement the __call__ method.
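For instance, a minimal sketch of that idea (hypothetical names): the instance carries the extra data, and the object itself is called with the single shared argument.

class Worker:
    def __init__(self):
        self.fields = None          # set later, before calling

    def __call__(self, arg):
        # same one-argument signature as the plain functions,
        # but with access to self.fields
        return (arg, self.fields)

worker = Worker()
worker.fields = 'xxx'
print(worker('data'))   # ('data', 'xxx')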
There is no special way, within a function's body, to refer to the function object whose code is executing. Simplest is just to use funcname.field (with funcname being the function's name within the namespace it's in, which you indicate is the case -- it would be harder otherwise).
This isn't something you should do. I can't think of any way to do what you're asking except some walking around on the call stack and some weird introspection -- which isn't something that should happen in production code.
That said, I think this actually does what you asked:
import inspect

_code_to_func = dict()

def enable_function_self(f):
    _code_to_func[f.func_code] = f
    return f

def get_function_self():
    f = inspect.currentframe()
    code_obj = f.f_back.f_code
    return _code_to_func[code_obj]

@enable_function_self
def foo():
    me = get_function_self()
    print me

foo()
While I agree with the the rest that this is probably not good design, the question did intrigue me. Here's my first solution, which I may update once I get decorators working. As it stands, it relies pretty heavily on being able to read the stack, which may not be possible in all implementations (something about sys._getframe() not necessarily being present...)
import sys, inspect

def cute():
    this = sys.modules[__name__].__dict__.get(inspect.stack()[0][3])
    print "My face is..." + this.face

cute.face = "very cute"
cute()
What do you think? :3
You could use the following (hideously ugly) code:
class Generic_Object(object):
    pass

def foo(a1, a2, self=Generic_Object()):
    self.args=(a1,a2)
    print "len(self.args):", len(self.args)
    return None
... as you can see it would allow you to use "self" as you described. You can't use an "object()" directly because you can't "monkey patch(*)" values into an object() instance. However, normal subclasses of object (such as the Generic_Object() I've shown here) can be "monkey patched"
If you wanted to always call your function with a reference to some object as the first argument that would be possible. You could put the defaulted argument first, followed by a *args and optional **kwargs parameters (through which any other arguments or dictionaries of options could be passed during calls to this function).
This is, as I said hideously ugly. Please don't ever publish any code like this or share it with anyone in the Python community. I'm only showing it here as a sort of strange educational exercise.
An instance method is like a function in Python. However, it exists within the namespace of a class (thus it must be accessed via an instance ... myobject.foo() for example) and it is called with a reference to "self" (analogous to the "this" pointer in C++) as the first argument. Also there's a method resolution process which causes the interpreter to search the namespace of the instance, then its class, and then each of the parent classes and so on ... up through the inheritance tree.
An unbound function is called with whatever arguments you pass to it. There can't be any sort of automatically pre-pended object/instance reference in the argument list. Thus, writing a function with an initial argument named "self" is meaningless. (It's legal because Python doesn't place any special meaning on the name "self." But meaningless because callers to your function would have to manually supply some sort of object reference to the argument list and it's not at all clear what that should be. Just some bizarre "Generic_Object" which then floats around in the global variable space?)
I hope that clarifies things a bit. It sounds like you're suffering from some very fundamental misconceptions about how Python and other object-oriented systems work.
("Monkey patching" is a term used to describe the direct manipulation of an objects attributes -- or "instance variables" by code that is not part of the class hierarchy of which the object is an instance).
As another alternative, you can make the functions into bound class methods like so:
class _FooImpl(object):
    a = "Hello "
    @classmethod
    def foo(cls, param):
        return cls.a + param

foo = _FooImpl.foo

# later...
print foo("World") # yes, Hello World

# and if you have to change an attribute:
foo.im_self.a = "Goodbye "
If you want functions to share an attribute namespace, you just make them part of the same class. If not, give each its own class.
What exactly are you hoping "self" would point to, if the function is defined outside of any class? If your function needs some global information to execute properly, you need to send this information to the function in the form of an argument.
If you want your function to be context aware, you need to declare it within the scope of an object.
