I've written a remote Python debugger and one of the features I need is to execute arbitrary code while stopped at a breakpoint. My debugger uses the following to execute code received from the remote debugger:
exec (compile(code, '<string>', 'single') , frame.f_globals, frame.f_locals)
This works fine for the most part, but I've noticed a couple issues.
Assignment statements aren't actually applied to the original locals dictionary. This is probably due to the fact that f_locals is supposed to be read-only.
If stopped within a class method, accessing protected attributes (names beginning with double underscore) does not work. I'm assuming this is due to the name mangling that Python performs on protected attributes.
So my question is, is there a way around these limitations? Can I trick Python into thinking that the code is being executed in the actual local scope of that frame?
I'm using CPython 2.7, and I'm willing to accept a solution/hack specific to this version.
Assignment statements aren't actually
applied to the original locals
dictionary. This is probably due to
the fact that f_locals is supposed to
be read-only.
Not exactly, but the bytecode for the function will not look at locals, using rather a simple but crucial optimization whereby local variables are in a simple array, avoiding runtime lookups. The only way to avoid this (and make the function much, much slower) is compiling different code, e.g. code starting with an exec '' to force the compiler to avoid the optimization (in Python 2; no way, in Python 3). If you need to work with existing bytecode, you're out of luck: there is no way to accomplish what you desire.
If stopped within a class method,
accessing protected attributes (names
beginning with double underscore) does
not work. I'm assuming this is due to
the name mangling that Python performs
on protected attributes.
Yep, so this issue does allow a workaround: prepend _Classname to the name to mimic what the compiler does. Note that double-underscore prefixes means private: protected would be a single underscore (and would give you no trouble). Private names are specifically meant to avoid accidental classes with names bound in subclasses (and work decently for that one purpose, though not perfectly, and not for anything else;-).
I'm not sure I've understood you correctly, but exec does populate the locals parameter with assignments inside the code:
>>> loc = {}
>>> exec(compile('a=3', '<string>', 'single'), {}, loc)
>>> loc
{'a': 3}
Perhaps f_locals doesn't allow writes.
to execute arbitrary code while stopped at a breakpoint ... Can I trick Python into thinking that the code is being executed in the actual local scope of that frame?
The Python debugger, pdb, allows this. For example, let's say you are debugging the file tests/scopeTest.py, and you have the following line in your program, where the variable hasn't been declared in the program itself :
print (NOT_DEFINED_IN_PROGRAM)
so that running the code python tests/scopeTest.py would result in :
NameError: name 'NOT_DEFINED_IN_PROGRAM' is not defined
Now you would like to define that variable when stopped at that line in the debugger, and have the program continue executing, using that variable as if it had been defined in the program all along. In other words, you would like to effect the change within that scope, so that you can continue execution with that change permanent. It is actually possible :
$ python -m pdb tests/scopeTest.py
> /home/user/tests/scopeTest.py(1)<module>()
-> print (NOT_DEFINED_IN_PROGRAM)
(Pdb) 'NOT_DEFINED_IN_PROGRAM' in locals()
False
(Pdb) NOT_DEFINED_IN_PROGRAM = 5
(Pdb) 'NOT_DEFINED_IN_PROGRAM' in locals()
True
(Pdb) step
5
Pdb does this through a compile and exec in its default function, which does the equivalent of :
code = compile(line + '\n', <stdin>, 'single')
exec(code, self.curframe.f_globals, self.curframe_locals)
where self.curframe is a specific frame. Now, self.curframe_locals is not self.curframe.f_locals, because, as the setup function says :
# The f_locals dictionary is updated from the actual frame
# locals whenever the .f_locals accessor is called, so we
# cache it here to ensure that modifications are not overwritten.
self.curframe_locals = self.curframe.f_locals
Hope that helps, and is what you meant!
Take note that, even then, should you want to, for example, replace a function in the context of the program being debugged with a monkey-patched version, such as:
newGlobals['abs'] = myCustomAbsFunction
exec(code, newGlobals, locals)
the scope of the myCustomAbsFunction is not going to be the user program, but is going to be the context of where that function was defined, which is the debugger! There is a way around that too, but as it wasn't specifically asked, it is left as an exercise for the reader, for now. ^__^
Related
I am reading only firstline from python using :
with open(file_path, 'r') as f:
my_count = f.readline()
print(my_count)
I am bit confused over scope of variable my_count. Although prints work fine, would it be better to do something like my_count = 0 outside with statement first (for eg in C in used to do int my_count = 0)
A with statement does not create a scope (like if, for and while do not create a scope either).
As a result, Python will analyze the code and see that you made an assignment in the with statement, and thus that will make the variable local (to the real scope).
In Python variables do not need initialization in all code paths: as a programmer, you are responsible to make sure that a variable is assigned before it is used. This can result in shorter code: say for instance you know for sure that a list contains at least one element, then you can assign in a for loop. In Java assignment in a for loop is not considered safe (since it is possible that the body of the loop is never executed).
Initialization before the with scope can be safer in the sense that after the with statement we can safely assume that the variable exists. If on the other hand the variable should be assigned in the with statement, not initializing it before the with statement actually results in an additional check: Python will error if somehow the assignment was skipped in the with statement.
A with statement is only used for context management purposes. It forces (by syntax) that the context you open in the with is closed at the end of the indentation.
You should also go through PEP-343 and Python Documentation. It will clear that its not about creating scope its about using Context Manager. I am quoting python documentation on context manager
A context manager is an object that defines the runtime context to be established when executing a with statement. The context manager handles the entry into, and the exit from, the desired runtime context for the execution of the block of code. Context managers are normally invoked using the with statement (described in section The with statement), but can also be used by directly invoking their methods.
Typical uses of context managers include saving and restoring various kinds of global state, locking and unlocking resources, closing opened files, etc.
I would like to write a function which receives a local namespace dictionary and update it. Something like this:
def UpdateLocals(local_dict):
d = {'a':10, 'b':20, 'c':30}
local_dict.update(d)
When I call this function from the interactive python shell it works all right, like this:
a = 1
UpdateLocals(locals())
# prints 20
print a
However, when I call UpdateLocals from inside a function, it doesn't do what I expect:
def TestUpdateLocals():
a = 1
UpdateLocals(locals())
print a
# prints 1
TestUpdateLocals()
How can I make the second case work like the first?
UPDATE:
Aswin's explanation makes sense and is very helpful to me. However I still want a mechanism to update the local variables. Before I figure out a less ugly approach, I'm going to do the following:
def LoadDictionary():
return {'a': 10, 'b': 20, 'c': 30}
def TestUpdateLocals():
a = 1
for name, value in LoadDictionary().iteritems():
exec('%s = value' % name)
Of course the construction of the string statements can be automated, and the details can be hidden from the user.
You have asked a very good question. In fact, the ability to update local variables is very important and crucial in saving and loading datasets for machine learning or in games. However, most developers of Python language have not come to a realization of its importance. They focus too much on conformity and optimization which is nevertheless important too.
Imagine you are developing a game or running a deep neural network (DNN), if all local variables are serializable, saving the entire game or DNN can be simply put into one line as print(locals()), and loading the entire game or DNN can be simply put into one line as locals().update(eval(sys.stdin.read())).
Currently, globals().update(...) takes immediate effect but locals().update(...) does not work because Python documentation says:
The default locals act as described for function locals() below:
modifications to the default locals dictionary should not be
attempted. Pass an explicit locals dictionary if you need to see
effects of the code on locals after function exec() returns.
Why they design Python in such way is because of optimization and conforming the exec statement into a function:
To modify the locals of a function on the fly is not possible without
several consequences: normally, function locals are not stored in a
dictionary, but an array, whose indices are determined at compile time
from the known locales. This collides at least with new locals added
by exec. The old exec statement circumvented this, because the
compiler knew that if an exec without globals/locals args occurred in
a function, that namespace would be "unoptimized", i.e. not using the
locals array. Since exec() is now a normal function, the compiler does
not know what "exec" may be bound to, and therefore can not treat is
specially.
Since global().update(...) works, the following piece of code will work in root namespace (i.e., outside any function) because locals() is the same as globals() in root namespace:
locals().update({'a':3, 'b':4})
print(a, b)
But this will not work inside a function.
However, as hacker-level Python programmers, we can use sys._getframe(1).f_locals instead of locals(). From what I have tested so far, on Python 3, the following piece of code always works:
def f1():
sys._getframe(1).f_locals.update({'a':3, 'b':4})
print(a, b)
f1()
However, sys._getframe(1).f_locals does not work in root namespace.
The locals are not updated here because, in the first case, the variable declared has a global scope. But when declared inside a function, the variable loses scope outside it.
Thus, the original value of the locals() is not changed in the UpdateLocals function.
PS: This might not be related to your question, but using camel case is not a good practice in Python. Try using the other method.
update_locals() instead of UpdateLocals()
Edit To answer the question in your comment:
There is something called a System Stack. The main job of this system stack during the execution of a code is to manage local variables, make sure the control returns to the correct statement after the completion of execution of the called function etc.,
So, everytime a function call is made, a new entry is created in that stack,
which contains the line number (or instruction number) to which the control has to return after the return statement, and a set of fresh local variables.
The local variables when the control is inside the function, will be taken from the stack entry. Thus, the set of locals in both the functions are not the same. The entry in the stack is popped when the control exits from the function. Thus, the changes you made inside the function are erased, unless and until those variables have a global scope.
I have read the following posts but I am still unsure of something.
Python Compilation/Interpretation Process
Why python compile the source to bytecode before interpreting?
If I have a single Python file myfunctions.py containing the following code.
x = 3
def f():
print x
x = 2
Then, saying $ python myfunctions.py runs perfectly fine.
But now make one small change to the above file. The new file looks as shown below.
x = 3
def f():
print x
x = 2
f() # there is a function call now
This time, the code gives out an error. Now, I am trying to understand this behavior. And so far, these are my conclusions.
Python creates bytecode for x=3
It creates a function object f, quickly scans and has bytecode which talks about the local variables within f's scope but note that the bytecode for all statements in Python are unlikely to have been constructed.
Now, Python encounters a function call, it knows this function call is legitimate because the bare minimum bytecode talking about the function object f and its local variables is present.
Now the interpreter takes the charge of executing the bytecode but from the initial footprint it knows x is a local variable here and says - "Why are you printing before you assign?"
Can someone please comment on this? Thanks in advance. And sorry if this has been addressed before.
When the interpreter reads a function, for each "name" (variable) it encounters, the interpreter decides if that name is local or non-local. The criteria that is uses is pretty simple ... Is there an assignment statement anywhere in the body to that name (barring global statements)? e.g.:
def foo():
x = 3 # interpreter will tag `x` as a local variable since we assign to it here.
If there is an assignment statement to that name, then the name is tagged as "local", otherwise, it gets tagged as non-local.
Now, in your case, you try to print a variable which was tagged as local, but you do so before you've actually reached the critical assignment statement. Python looks for a local name, but doesn't find it so it raises the UnboundLocalError.
Python is very dynamic and allows you to do lots of crazy things which is part of what makes it so powerful. The downside of this is that it becomes very difficult to check for these errors unless you actually run the function -- In fact, python has made the decision to not check anything other than syntax until the function is run. This explains why you never see the exception until you actually call your function.
If you want python to tag the variable as global, you can do so with an explicit global1 statement:
x = 3
def foo():
global x
print x
x = 2
foo() # prints 3
print x # prints 2
1python3.x takes this concept even further an introduces the nonlocal keyword
mgilson got half of the answer.
The other half is that Python doesn't go looking for errors beyond syntax errors in functions (or function objects) it is not about to execute. So in the first case, since f() doesn't get called, the order-of-operations error isn't checked for.
In this respect, it is not like C and C++, which require everything to be fully declared up front. It's kind of like C++ templates, where errors in template code might not be found until the code is actually instantiated.
Consider the following:
def test(s):
globals()['a'] = s
sandbox = {'test': test}
py_str = 'test("Setting A")\nglobals()["b"] = "Setting B"'
eval(compile(py_str, '<string>', 'exec'), sandbox)
'a' in sandbox # returns False, !What I dont want!
'b' in sandbox # returns True, What I want
'a' in globals() # returns True, !What I dont want!
'b' in globals() # returns False, What I want
I'm not even sure how to ask, but I want the global scope for a function to be the environment I intend to run it in without having to compile the function during the eval. Is this possible?
Thanks for any input
Solution
def test(s):
globals()['a'] = s
sandbox = {}
# create a new version of test() that uses the sandbox for its globals
newtest = type(test)(test.func_code, sandbox, test.func_name, test.func_defaults,
test.func_closure)
# add the sandboxed version of test() to the sandbox
sandbox["test"] = newtest
py_str = 'test("Setting A")\nglobals()["b"] = "Setting B"'
eval(compile(py_str, '<string>', 'exec'), sandbox)
'a' in sandbox # returns True
'b' in sandbox # returns True
'a' in globals() # returns False
'b' in globals() # returns False
When you call a function in Python, the global variables it sees are always the globals of the module it was defined in. (If this wasn't true, the function might not work -- it might actually need some global values, and you don't necessarily know which those are.) Specifying a dictionary of globals with exec or eval() only affects the globals that the code being exec'd or eval()'d sees.
If you want a function to see other globals, then, you do indeed have to include the function definition in the string you pass to exec or eval(). When you do, the function's "module" is the string it was compiled from, with its own globals (i.e., those you supplied).
You could get around this by creating a new function with the same code object as the one you're calling but a different func_globals attribute that points to your globals dict, but this is fairly advanced hackery and probably not worth it. Still, here's how you'd do it:
# create a sandbox globals dict
sandbox = {}
# create a new version of test() that uses the sandbox for its globals
newtest = type(test)(test.func_code, sandbox, test.func_name, test.func_defaults,
test.func_closure)
# add the sandboxed version of test() to the sandbox
sandbox["test"] = newtest
Sandboxing code for exec by providing alternative globals/locals has lots of caveats:
The alternative globals/locals only apply for the code in the sandbox. They do not affect anything outside of it, they can't affect anything outside of it, and it wouldn't make sense if they could.
To put it another way, your so-called "sandbox" passes the object test to the code ran by exec. To change the globals that test sees it would also have to modify the object, not pass it as it is. That's not really possible in any way that would keep it working, much less in a way that the object would continue to do something meaningful.
By using the alternative globals, anything in the sandbox would still see the builtins. If you want to hide some or all builtins from the code inside the sandbox you need to add a "__builtins__" key to your dictionary that points to either None (disables all the builtins) or to your version of them. This also restricts certain attributes of the objects, for example accessing func_globals attribute of a function will be disabled.
Even if you remove the builtins, the sandbox will still not be safe. Sandbox only code that you trust in the first place.
Here's a simple proof of concept:
import subprocess
code = """[x for x in ().__class__.__bases__[0].__subclasses__()
if x.__name__ == 'Popen'][0](['ls', '-la']).wait()"""
# The following runs "ls"...
exec code in dict(__builtins__=None)
# ...even though the following raises
exec "(lambda:None).func_globals" in dict(__builtins__=None)
External execution contexts are defined statically in Python (f.func_globals is read-only), so I would say that what you want is not possible. The reason is because the function could become invalid Python it its definition context is changed at runtime. If the language allowed it, it would be an extremely easy route for injection of malicious code into library calls.
def mycheck(s):
return True
exec priviledged_code in {'check_password':mycheck}
Does python have an equivalent to Tcl's uplevel command? For those who don't know, the "uplevel" command lets you run code in the context of the caller. Here's how it might look in python:
def foo():
answer = 0
print "answer is", answer # should print 0
bar()
print "answer is", answer # should print 42
def bar():
uplevel("answer = 42")
It's more than just setting variables, however, so I'm not looking for a solution that merely alters a dictionary. I want to be able to execute any code.
In general, what you ask is not possible (with the results you no doubt expect). E.g., imagine the "any code" is x = 23. Will this add a new variable x to your caller's set of local variables, assuming you do find a black-magical way to execute this code "in the caller"? No it won't -- the crucial optimization performed by the Python compiler is to define once and for all, when def executes, the exact set of local variables (all the barenames that get assigned, or otherwise bound, in the function's body), and turn every access and setting to those barenames into very fast indexing into the stackframe. (You could systematically defeat that crucial optimization e.g. by having an exec '' at the start of every possible caller -- and see your system's performance crash through the floor in consequence).
Except for assigning to the caller's local barenames, exec thecode in thelocals, theglobals may do roughly what you want, and the inspect module lets you get the locals and globals of the caller in a semi-reasonable way (in as far as deep black magic -- which would make me go postal on any coworker suggesting it be perpetrated in production code -- can ever be honored with the undeserved praise of calling it "semi-reasonable", that is;-).
But you do specify "I want to be able to execute any code." and the only solution to that unambiguous specification (and thanks for being so precise, as it makes answering easier!) is: then, use a different programming language.
Is the third party library written in Python? If yes, you could rewrite and rebind the function "foo" at runtime with your own implementation. Like so:
import third_party
original_foo = third_party.foo
def my_foo(*args, **kwds):
# do your magic...
original_foo(*args, **kwds)
third_party.foo = my_foo
I guess monkey-patching is slighly better than rewriting frame locals. ;)