Python C API - How to assign a value to eval expression?

Is it possible to assign a value to an "eval expression" without manipulating the evaluation string? Example: the user writes the expression
"globalPythonArray[10]"
which would evaluate to the current value of item 10 of globalPythonArray. But the goal is to set item 10 to a new value instead of getting the old one. A dirty workaround would be to define a temporary variable "newValue" and extend the evaluation string to
"globalPythonArray[10] = newValue"
and then compile and evaluate that modified string. Are there some low-level Python C API functions I can use so that I don't have to manipulate the evaluation string?
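For reference, a minimal sketch of that workaround in pure Python (the temporary name newValue and the array contents are just illustrations, not part of the question's code):

# A minimal sketch of the string-manipulation workaround described above.
# "newValue" is an arbitrary temporary name chosen for this illustration.
globalPythonArray = [0] * 20
expression = 'globalPythonArray[10]'

scope = {'globalPythonArray': globalPythonArray, 'newValue': 42}
exec(compile(expression + ' = newValue', '<string>', 'exec'), scope)
print(globalPythonArray[10])  # 42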

I'd say probably not, since accessing and storing subscripts are different opcodes:
>>> dis.dis(compile('globalPythonArray[10]', 'a', 'exec'))
1 0 LOAD_NAME 0 (globalPythonArray)
2 LOAD_CONST 0 (10)
4 BINARY_SUBSCR
6 POP_TOP
8 LOAD_CONST 1 (None)
10 RETURN_VALUE
>>> dis.dis(compile('globalPythonArray[10] = myValue', 'a', 'exec'))
1 0 LOAD_NAME 0 (myValue)
2 LOAD_NAME 1 (globalPythonArray)
4 LOAD_CONST 0 (10)
6 STORE_SUBSCR
8 LOAD_CONST 1 (None)
10 RETURN_VALUE
Also, insert the usual warning about user input and eval() here:
globalPythonArray[__import__('os').system('rm -rf /')]
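If user expressions must be evaluated at all, the usual (and only partial) mitigation is to pass an explicit environment with an empty __builtins__. This is a sketch, not a real sandbox; hostile input can still escape it:

# Blocks the casual __import__ trick above, but eval() in CPython
# cannot be made genuinely safe against untrusted input.
globalPythonArray = list(range(20))
print(eval('globalPythonArray[10]',
           {'__builtins__': {}},
           {'globalPythonArray': globalPythonArray}))  # 10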

It's possible to "assign" a value to an eval expression by manipulating its abstract syntax tree (AST). It's not necessary to modify the evaluation string directly, and if the type of the new value is not too complicated (e.g. numeric or string), you can hard-code it into the AST:
1. Compile the eval expression to an AST.
2. Replace the Load context of the expression at the root node with Store.
3. Create a new AST with an Assign statement at the root node.
4. Set targets to the expression node of the modified eval AST.
5. Set value to the value.
6. Compile the new AST to bytecode and execute it.
Example:
import ast
import numpy as np

def eval_assign_num(expression, value, global_dict, local_dict):
    expr_ast = ast.parse(expression, 'eval', 'eval')
    expr_node = expr_ast.body
    expr_node.ctx = ast.Store()
    assign_ast = ast.Module(body=[
        ast.Assign(
            targets=[expr_node],
            value=ast.Num(n=value)  # ast.Constant(value=value) on Python 3.8+
        )
    ], type_ignores=[])  # type_ignores is required by compile() on Python 3.8+
    ast.fix_missing_locations(assign_ast)
    c = compile(assign_ast, 'assign', 'exec')
    exec(c, global_dict, local_dict)

class TestClass:
    arr = np.array([1, 2])
    x = 6

testClass = TestClass()
arr = np.array([1, 2])

eval_assign_num('arr[0]', 10, globals(), locals())
eval_assign_num('testClass.arr[1]', 20, globals(), locals())
eval_assign_num('testClass.x', 30, globals(), locals())
eval_assign_num('newVarName', 40, globals(), locals())
print('arr', arr)
print('testClass.arr', testClass.arr)
print('testClass.x', testClass.x)
print('newVarName', newVarName)
Output:
arr [10 2]
testClass.arr [ 1 20]
testClass.x 30
newVarName 40
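The ast.Num hard-coding above only covers simple literal values. A hedged variation for arbitrary objects, not part of the original answer: reference the value through a reserved name (here __assign_value__, a name invented for this sketch) injected temporarily into the local scope:

import ast

def eval_assign_obj(expression, value, global_dict, local_dict):
    # Same AST surgery as eval_assign_num, but the new value is referenced
    # by a reserved name instead of being hard-coded, so any object works.
    expr_ast = ast.parse(expression, 'eval', 'eval')
    expr_node = expr_ast.body
    expr_node.ctx = ast.Store()
    assign_ast = ast.Module(body=[
        ast.Assign(
            targets=[expr_node],
            value=ast.Name(id='__assign_value__', ctx=ast.Load())
        )
    ], type_ignores=[])
    ast.fix_missing_locations(assign_ast)
    local_dict['__assign_value__'] = value
    try:
        exec(compile(assign_ast, 'assign', 'exec'), global_dict, local_dict)
    finally:
        del local_dict['__assign_value__']

eval_assign_obj('arr', [[1], {2: 3}], globals(), locals())
print('arr', arr)  # arr [[1], {2: 3}]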

Related

Detect all global variables within a python function?

I am trying to analyze some messy code that happens to use global variables quite heavily within functions (I am trying to refactor it so that functions only use local variables). Is there any way to detect global variables within a function?
For example:
def f(x):
    x = x + 1
    z = x + y
    return z
Here the global variable is y since it isn't given as an argument, and neither is it created within the function.
I tried to detect global variables within the function using string parsing, but it was getting a bit messy; I was wondering if there was a better way to do this?
Edit: If anyone is interested this is the code I am using to detect global variables (based on kindall's answer and Paolo's answer to this question: Capture stdout from a script in Python):
from dis import dis

def capture(f):
    """
    Decorator to capture standard output
    """
    def captured(*args, **kwargs):
        import sys
        from cStringIO import StringIO
        # setup the environment
        backup = sys.stdout
        try:
            sys.stdout = StringIO()     # capture output
            f(*args, **kwargs)
            out = sys.stdout.getvalue() # release output
        finally:
            sys.stdout.close()  # close the stream
            sys.stdout = backup # restore original stdout
        return out              # captured output wrapped in a string
    return captured

def return_globals(f):
    """
    Prints all of the global variables in function f
    """
    x = dis_(f)
    for i in x.splitlines():
        if "LOAD_GLOBAL" in i:
            print i

dis_ = capture(dis)
dis_(f)
dis by default does not return output, so if you want to manipulate the output of dis as a string, you have to use the capture decorator written by Paolo and posted here: Capture stdout from a script in Python
Inspect the bytecode.
from dis import dis
dis(f)
Result:
2 0 LOAD_FAST 0 (x)
3 LOAD_CONST 1 (1)
6 BINARY_ADD
7 STORE_FAST 0 (x)
3 10 LOAD_FAST 0 (x)
13 LOAD_GLOBAL 0 (y)
16 BINARY_ADD
17 STORE_FAST 1 (z)
4 20 LOAD_FAST 1 (z)
23 RETURN_VALUE
The global variables will have a LOAD_GLOBAL opcode instead of LOAD_FAST. (If the function changes any global variables, there will be STORE_GLOBAL opcodes as well.)
With a little work, you could even write a function that scans the bytecode of a function and returns a list of the global variables it uses. In fact:
from dis import HAVE_ARGUMENT, opmap

def getglobals(func):
    GLOBAL_OPS = opmap["LOAD_GLOBAL"], opmap["STORE_GLOBAL"]
    EXTENDED_ARG = opmap["EXTENDED_ARG"]

    func = getattr(func, "im_func", func)
    code = func.func_code
    names = code.co_names
    op = (ord(c) for c in code.co_code)

    globs = set()
    extarg = 0
    for c in op:
        if c in GLOBAL_OPS:
            globs.add(names[next(op) + next(op) * 256 + extarg])
        elif c == EXTENDED_ARG:
            extarg = (next(op) + next(op) * 256) * 65536
            continue
        elif c >= HAVE_ARGUMENT:
            next(op)
            next(op)
        extarg = 0
    return sorted(globs)

print getglobals(f)  # ['y']
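The code above reads Python 2 bytecode by hand (ord() over co_code, func_code, im_func). On Python 3.4+ the same idea is much shorter with dis.get_instructions, which decodes EXTENDED_ARG and the wordcode format for us; a sketch:

import dis

def getglobals(func):
    # argval holds the resolved name for LOAD_GLOBAL/STORE_GLOBAL.
    return sorted({ins.argval
                   for ins in dis.get_instructions(func)
                   if ins.opname in ("LOAD_GLOBAL", "STORE_GLOBAL")})

print(getglobals(f))  # ['y']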
As mentioned in the LOAD_GLOBAL documentation:
LOAD_GLOBAL(namei)
Loads the global named co_names[namei] onto the stack.
This means you can inspect the code object for your function to find globals:
>>> f.__code__.co_names
('y',)
Note that this isn't sufficient for nested functions (nor is the dis.dis method in kindall's answer). In that case, you will need to look at constants too:
# Define a function containing a nested function
>>> def foo():
...     def bar():
...         return some_global
# It doesn't contain LOAD_GLOBAL, so .co_names is empty.
>>> dis.dis(foo)
2 0 LOAD_CONST 1 (<code object bar at 0x2b70440c84b0, file "<ipython-input-106-77ead3dc3fb7>", line 2>)
3 MAKE_FUNCTION 0
6 STORE_FAST 0 (bar)
9 LOAD_CONST 0 (None)
12 RETURN_VALUE
# Instead, we need to walk the constants to find nested functions:
# (if bar contained a nested function too, we'd need to recurse)
>>> from types import CodeType
>>> for constant in foo.__code__.co_consts:
...     if isinstance(constant, CodeType):
...         print constant.co_names
('some_global',)
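A hedged sketch of that recursion in Python 3 syntax (note that co_names also includes attribute and imported names, so treat the result as a superset of the true globals):

from types import CodeType

def names_with_nested(code):
    # co_names of this code object, plus those of any nested code objects.
    names = set(code.co_names)
    for const in code.co_consts:
        if isinstance(const, CodeType):
            names |= names_with_nested(const)
    return names

def foo():
    def bar():
        return some_global

print(names_with_nested(foo.__code__))  # {'some_global'}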

Hard coded variables in python function

Sometimes, some values/strings are hard-coded in functions. For example, in the following function I define a "constant" comparison string and check against it.
def foo(s):
    c_string = "hello"
    if s == c_string:
        return True
    return False
Without discussing too much about why it's bad to do this, and how it should be defined in the outer scope, I'm wondering what happens behind the scenes when it is defined this way.
Does the string get created each call?
If instead of the string "hello" it was the list: [1,2,3] (or a list with mutable content if it matters) would the same happen?
Because the string is immutable (as a tuple would be), it is stored with the bytecode object for the function. It is loaded by a very simple and fast index lookup. This is actually faster than a global lookup.
You can see this in a disassembly of the bytecode, using the dis.dis() function:
>>> import dis
>>> def foo(s):
...     c_string = "hello"
...     if s == c_string:
...         return True
...     return False
...
>>> dis.dis(foo)
2 0 LOAD_CONST 1 ('hello')
3 STORE_FAST 1 (c_string)
3 6 LOAD_FAST 0 (s)
9 LOAD_FAST 1 (c_string)
12 COMPARE_OP 2 (==)
15 POP_JUMP_IF_FALSE 22
4 18 LOAD_GLOBAL 0 (True)
21 RETURN_VALUE
5 >> 22 LOAD_GLOBAL 1 (False)
25 RETURN_VALUE
>>> foo.__code__.co_consts
(None, 'hello')
The LOAD_CONST opcode loads the string object from the co_consts array that is part of the code object for the function; the reference is pushed to the top of the stack. The STORE_FAST opcode takes the reference from the top of the stack and stores it in the locals array, again a very simple and fast operation.
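A rough way to check the "faster than a global lookup" claim is to time a bare LOAD_CONST against a bare LOAD_GLOBAL; a sketch, and the numbers are machine- and version-dependent (the gap is real but small):

import timeit

H = "hello"

def const_lookup():
    return "hello"   # a single LOAD_CONST

def global_lookup():
    return H         # a single LOAD_GLOBAL (a dict lookup in globals)

print(timeit.timeit(const_lookup))
print(timeit.timeit(global_lookup))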
For mutable literals ({..}, [..]) special opcodes build the object, with the contents still treated as constants as much as possible (more complex structures just follow the same building blocks):
>>> def bar(): return ['spam', 'eggs']
...
>>> dis.dis(bar)
1 0 LOAD_CONST 1 ('spam')
3 LOAD_CONST 2 ('eggs')
6 BUILD_LIST 2
9 RETURN_VALUE
The BUILD_LIST call creates the new list object, using two constant string objects.
Interesting fact: if you use a list object for a membership test (something in ['option1', 'option2', 'option3']), Python knows the list object will never be mutated and converts it to a tuple for you at compile time (a so-called peephole optimisation). The same applies to a set literal, which is converted to a frozenset() object, but only in Python 3.2 and newer. See Tuple or list when using 'in' in an 'if' clause?
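You can watch the peephole optimisation happen; in this sketch the list literal ends up as a single tuple constant (exact opcodes and offsets vary by CPython version):

import dis

dis.dis(compile("s in ['option1', 'option2']", '<string>', 'eval'))
# Typical output - the whole list literal is one constant:
#   LOAD_NAME    0 (s)
#   LOAD_CONST   0 (('option1', 'option2'))
#   COMPARE_OP (in)   (CONTAINS_OP on Python 3.9+)
#   RETURN_VALUE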
Note that your sample function is using booleans rather verbosely; you could just have used:
def foo(s):
    c_string = "hello"
    return s == c_string
for the exact same result, avoiding the LOAD_GLOBAL calls in Python 2 (Python 3 made True and False keywords so the values can also be stored as constants).

Why can I use the same name for iterator and sequence in a Python for loop?

This is more of a conceptual question. I recently saw a piece of code in Python (it worked in 2.7, and it might also have run in 2.5) in which a for loop used the same name for both the list that was being iterated over and the item in the list, which strikes me as both bad practice and something that should not work at all.
For example:
x = [1,2,3,4,5]
for x in x:
    print x
print x
Yields:
1
2
3
4
5
5
Now, it makes sense to me that the last value printed would be the last value assigned to x from the loop, but I fail to understand why you'd be able to use the same variable name for both your parts of the for loop and have it function as intended. Are they in different scopes? What's going on under the hood that allows something like this to work?
What does dis tell us:
Python 3.4.1 (default, May 19 2014, 13:10:29)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from dis import dis
>>> dis("""x = [1,2,3,4,5]
... for x in x:
...     print(x)
... print(x)""")
1 0 LOAD_CONST 0 (1)
3 LOAD_CONST 1 (2)
6 LOAD_CONST 2 (3)
9 LOAD_CONST 3 (4)
12 LOAD_CONST 4 (5)
15 BUILD_LIST 5
18 STORE_NAME 0 (x)
2 21 SETUP_LOOP 24 (to 48)
24 LOAD_NAME 0 (x)
27 GET_ITER
>> 28 FOR_ITER 16 (to 47)
31 STORE_NAME 0 (x)
3 34 LOAD_NAME 1 (print)
37 LOAD_NAME 0 (x)
40 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
43 POP_TOP
44 JUMP_ABSOLUTE 28
>> 47 POP_BLOCK
4 >> 48 LOAD_NAME 1 (print)
51 LOAD_NAME 0 (x)
54 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
57 POP_TOP
58 LOAD_CONST 5 (None)
61 RETURN_VALUE
The key bits are sections 2 and 3 - we load the value out of x (24 LOAD_NAME 0 (x)) and then we get its iterator (27 GET_ITER) and start iterating over it (28 FOR_ITER). Python never goes back to load the iterator again.
Aside: it wouldn't make any sense to do so, since it already has the iterator (and, as Abhijit points out in his answer, Section 7.3 of Python's specification actually requires this behavior).
When the name x gets overwritten to point at each value inside of the list formerly known as x, Python doesn't have any problem finding the iterator, because it never needs to look at the name x again to finish the iteration protocol.
Using your example code as the core reference
x = [1,2,3,4,5]
for x in x:
    print x
print x
I would like you to refer to section 7.3, The for statement, in the manual.
Excerpt 1
The expression list is evaluated once; it should yield an iterable
object. An iterator is created for the result of the expression_list.
What this means is that your variable x, a symbolic name for the list object [1,2,3,4,5], is evaluated once to produce an iterable object. Even if the variable (the symbolic reference) later changes its allegiance, the expression-list is not evaluated again, so there is no impact on the iterable object that has already been evaluated, or on the iterator generated from it.
Note:
Everything in Python is an object, with an identity, attributes and methods.
A variable is a symbolic name, a reference to one and only one object at any given instant.
A variable can change its allegiance at run time, i.e. it can be rebound to some other object.
Excerpt 2
The suite is then executed once for each item provided by the
iterator, in the order of ascending indices.
Here the items are provided by the iterator, not by the expression-list. So, for each iteration, the iterator yields the next item; the original expression-list is never consulted again.
It is necessary for it to work this way, if you think about it. The expression for the sequence of a for loop could be anything:
binaryfile = open("file", "rb")
for byte in binaryfile.read(5):
    ...
We can't query the sequence on each pass through the loop, or here we'd end up reading from the next batch of 5 bytes the second time. Naturally Python must in some way store the result of the expression privately before the loop begins.
Are they in different scopes?
No. To confirm this you could keep a reference to the original scope dictionary (locals()) and notice that you are in fact using the same variables inside the loop:
x = [1,2,3,4,5]
loc = locals()
for x in x:
    print locals() is loc # True
    print loc["x"] # 1
    break
What's going on under the hood that allows something like this to work?
Sean Vieira showed exactly what is going on under the hood, but to describe it in more readable python code, your for loop is essentially equivalent to this while loop:
it = iter(x)
while True:
    try:
        x = it.next()
    except StopIteration:
        break
    print x
This is different from the traditional indexing approach to iteration you would see in older versions of Java, for example:
for (int index = 0; index < x.length; index++) {
    x = x[index];
    ...
}
This approach would fail when the item variable and the sequence variable are the same, because the sequence x would no longer be available to look up the next index after the first time x was reassigned to the first item.
With the former approach, however, the first line (it = iter(x)) requests an iterator object which is what is actually responsible for providing the next item from then on. The sequence that x originally pointed to no longer needs to be accessed directly.
It's the difference between a variable (x) and the object it points to (the list). When the for loop starts, Python grabs an internal reference to the object pointed to by x. It uses the object and not what x happens to reference at any given time.
If you reassign x, the for loop doesn't change. If x points to a mutable object (e.g., a list) and you change that object (e.g., delete an element) results can be unpredictable.
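A quick sketch of that unpredictability: deleting an element mid-loop shifts the remaining items under the live iterator, so one element is silently skipped.

x = [1, 2, 3, 4, 5]
for item in x:
    if item == 2:
        x.remove(item)  # shifts 3, 4, 5 left; the iterator then skips 3
print(x)  # [1, 3, 4, 5] - and the loop body never saw 3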
Basically, the for loop takes in the list x and, storing it as a temporary variable, reassigns x to each value in that temporary variable. Thus, x is now the last value in the list.
>>> x = [1, 2, 3]
>>> [x for x in x]
[1, 2, 3]
>>> x
3
>>>
Just like in this:
>>> def foo(bar):
...     return bar
...
>>> x = [1, 2, 3]
>>> for x in foo(x):
...     print x
...
1
2
3
>>>
In this example, x is stored in foo() as bar, so although x is being reassigned, it still exist(ed) in foo() so that we could use it to trigger our for loop.
x no longer refers to the original x list, and so there's no confusion. Basically, Python remembers it's iterating over the original x list, but as soon as you start assigning the iteration value (0, 1, 2, etc.) to the name x, it no longer refers to the original x list. The name gets reassigned to the iteration value.
In [1]: x = range(5)
In [2]: x
Out[2]: [0, 1, 2, 3, 4]
In [3]: id(x)
Out[3]: 4371091680
In [4]: for x in x:
   ...:     print id(x), x
   ...:
140470424504688 0
140470424504664 1
140470424504640 2
140470424504616 3
140470424504592 4
In [5]: id(x)
Out[5]: 140470424504592

Writing (and not) to global variable in Python

Coming from much less dynamic C++, I have some trouble understanding the behaviour of this Python (2.7) code.
Note: I am aware that this is bad programming style / evil, but I would like to understand it nonetheless.
vals = [1,2,3]

def f():
    vals[0] = 5
    print 'inside', vals

print 'outside', vals
f()
print 'outside', vals
This code runs without error, and f manipulates the (seemingly) global list. This is contrary to my prior understanding that global variables that are to be manipulated (and not only read) in a function must be declared as global ....
On the other hand, if I replace vals[0] = 5 with vals += [5,6], execution fails with an UnboundLocalError unless I add a global vals to f. This is what I would have expected to happen in the first case as well.
Could you explain this behaviour?
Why can I manipulate vals in the first case? Why does the second type of manipulation fail while the first does not?
Update:
It was remarked in a comment that vals.extend(...) works without global. This adds to my confusion - why is += treated differently from a call to extend?
global is only needed when you want to change which object a variable references, i.e. to rebind the name. Because vals[0] = 5 mutates the object itself rather than rebinding vals, no error is raised. However, vals += [5, 6] assigns to vals, so the interpreter looks for a local variable; without global it can't rebind the global name.
The confusing thing is that using the += operator with a list modifies the original list in place, like vals[0] = 5. And whereas vals += [5, 6] fails, vals.extend([5, 6]) works. We can enlist the help of dis.dis to lend us some clues.
>>> def a(): v[0] = 1
>>> def b(): v += [1]
>>> def c(): v.extend([1])
>>> import dis
>>> dis.dis(a)
1 0 LOAD_CONST 1 (1)
3 LOAD_GLOBAL 0 (v)
6 LOAD_CONST 2 (0)
9 STORE_SUBSCR
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
>>> dis.dis(b)
1 0 LOAD_FAST 0 (v)
3 LOAD_CONST 1 (1)
6 BUILD_LIST 1
9 INPLACE_ADD
10 STORE_FAST 0 (v)
13 LOAD_CONST 0 (None)
16 RETURN_VALUE
>>> dis.dis(c)
1 0 LOAD_GLOBAL 0 (v)
3 LOAD_ATTR 1 (extend)
6 LOAD_CONST 1 (1)
9 BUILD_LIST 1
12 CALL_FUNCTION 1
15 POP_TOP
16 LOAD_CONST 0 (None)
19 RETURN_VALUE
We can see that functions a and c use LOAD_GLOBAL, whereas b tries to use LOAD_FAST. We can now see why using += won't work: the interpreter tries to load v as a local variable because of its default behaviour with in-place assignment. Because it can't know whether v is a list or not, it essentially assumes that the line means the same as v = v + [1].
global is needed when you want to assign to a variable in the outer scope. If you don't use global, Python will consider vals as a local variable when doing assignments.
+= is an assignment (an augmented assignment), and vals += [5, 6] is equivalent to reading vals, appending [5, 6] to that value, and assigning the resulting list back to the name vals. Because vals += [5, 6] contains no global statement, Python sees the assignment and treats vals as local. You never created a local variable called vals, yet the statement must read it before it can append; hence the UnboundLocalError.
But for reading, it is not necessary to use global. The variable is looked up locally first; if it's not found in the local scope, it's looked up in the outer scope, and so on. And since you are dealing with a reference type, you get back a reference when you do the read. You can change the contents of the object through that reference.
That's why .extend() works (because it's called on the reference and acts on the object itself) while vals += [5, 6] fails (because vals is neither local nor marked global).
Here is a modified example to try out (using a local vals avoids the UnboundLocalError):
vals = [1, 2, 3]

def f():
    vals = []
    vals += [5,6]
    print 'inside', vals

print 'outside', vals
f()
print 'outside', vals
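For completeness, the other fix mentioned above: declaring vals global makes the augmented assignment rebind the module-level name, so this version runs without error (same Python 2 syntax as the question):

vals = [1, 2, 3]

def f():
    global vals
    vals += [5, 6]  # rebinding the global name is now allowed
    print 'inside', vals

f()
print 'outside', vals  # [1, 2, 3, 5, 6]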
As long as you do not rebind the reference, Python will keep using the global object. Compare:
In [114]: vals = [1,2,3]
In [116]: id(vals)
Out[116]: 144255596
In [118]: def func():
     ...:     vals[0] = 5
     ...:     return id(vals)
     ...:
In [119]: func()
Out[119]: 144255596
In [120]: def func_update():
     ...:     vals = vals
     ...:     return id(vals)
     ...:
In [121]: func_update()
---------------------------------------------------------------------------
UnboundLocalError Traceback (most recent call last)
/homes/markg/<ipython-input-121-f1149c600a85> in <module>()
----> 1 func_update()
/homes/markg/<ipython-input-120-257ba6ff792a> in func_update()
1 def func_update():
----> 2 vals = vals
3 return id(vals)
UnboundLocalError: local variable 'vals' referenced before assignment
The moment you try assignment, Python regards vals as a local variable - and (oops) it's not there!

exec() bytecode with arbitrary locals?

Suppose I want to execute code, for example
value += 5
inside a namespace of my own (so the result is essentially mydict['value'] += 5). There's a function exec(), but I have to pass a string there:
exec('value += 5', mydict)
and passing statements as strings seems strange (e.g. it's not colorized that way).
Can it be done like:
def block():
    value += 5
???(block, mydict)
? The obvious candidate for the last line was exec(block.__code__, mydict), but no luck: it raises UnboundLocalError about value. I believe it basically executes block(), not the code inside block, so assignments aren't easy – is that correct?
Of course, another possible solution would be to disassemble block.__code__...
FYI, I got the question because of this thread. Also, this is why some (me undecided) call for new syntax
using mydict:
    value += 5
Note how this doesn't throw an error but doesn't change mydict either:
def block(value = 0):
    value += 5

block(**mydict)
You can pass bytecode instead of a string to exec, you just need to make the right bytecode for the purpose:
>>> bytecode = compile('value += 5', '<string>', 'exec')
>>> mydict = {'value': 23}
>>> exec(bytecode, mydict)
>>> mydict['value']
28
Specifically, ...:
>>> import dis
>>> dis.dis(bytecode)
1 0 LOAD_NAME 0 (value)
3 LOAD_CONST 0 (5)
6 INPLACE_ADD
7 STORE_NAME 0 (value)
10 LOAD_CONST 1 (None)
13 RETURN_VALUE
the load and store instructions must be of the _NAME persuasion, and this compile makes them so, while...:
>>> def f(): value += 5
...
>>> dis.dis(f.func_code)
1 0 LOAD_FAST 0 (value)
3 LOAD_CONST 1 (5)
6 INPLACE_ADD
7 STORE_FAST 0 (value)
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
...code in a function is optimized to use the _FAST versions, and those don't work on a dict passed to exec. If you started somehow with a bytecode using the _FAST instructions, you could patch it to use the _NAME kind instead, e.g. with bytecodehacks or some similar approach.
Use the global keyword to force name-based (global) lookups on any variables you want to modify from within the block:
def block():
    global value
    value += 5

mydict = {"value": 42}
exec(block.__code__, mydict)
print(mydict["value"])
Here is a crazy decorator to create such a block that uses "custom locals". In reality it is a quick hack that turns all variable access inside the function into global access, and evaluates the result with the custom locals dictionary as the environment.
import dis
import functools
import types
import string

def withlocals(func):
    """Decorator for executing a block with custom "local" variables.
    The decorated function takes one argument: its scope dictionary.

    >>> @withlocals
    ... def block():
    ...     counter += 1
    ...     luckynumber = 88
    >>> d = {"counter": 1}
    >>> block(d)
    >>> d["counter"]
    2
    >>> d["luckynumber"]
    88
    """
    def opstr(*opnames):
        return "".join([chr(dis.opmap[N]) for N in opnames])

    translation_table = string.maketrans(
        opstr("LOAD_FAST", "STORE_FAST"),
        opstr("LOAD_GLOBAL", "STORE_GLOBAL"))

    c = func.func_code
    newcode = types.CodeType(c.co_argcount,
                             0,  # co_nlocals
                             c.co_stacksize,
                             c.co_flags,
                             c.co_code.translate(translation_table),
                             c.co_consts,
                             c.co_varnames,  # co_names, names of global vars
                             (),             # co_varnames
                             c.co_filename,
                             c.co_name,
                             c.co_firstlineno,
                             c.co_lnotab)

    @functools.wraps(func)
    def wrapper(mylocals):
        return eval(newcode, mylocals)
    return wrapper

if __name__ == '__main__':
    import doctest
    doctest.testmod()
This is just a monkey-patching adaptation of someone's brilliant recipe for a goto decorator.
From S.Lott's comment above I think I get the idea for an answer that creates a new class:
class _(__metaclass__ = change(mydict)):
    value += 1
    ...
where change is a metaclass whose __prepare__ reads the dictionary and whose __new__ updates the dictionary.
For reuse, the snippet below would work, but it's kind of ugly:
def increase_value(d):
    class _(__metaclass__ = change(d)):
        value += 1
        ...

increase_value(mydict)
increase_value(mydict)
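For the curious, here is a hedged Python 3 sketch of what such a change metaclass could look like. Both change and its write-back rules are invented for this illustration, and Python 3 spells the metaclass as a keyword argument instead of __metaclass__:

def change(d):
    class ChangeMeta(type):
        @classmethod
        def __prepare__(mcs, name, bases):
            return dict(d)  # seed the class body's namespace from d
        def __new__(mcs, name, bases, namespace):
            for key, val in namespace.items():
                if not key.startswith('__'):
                    d[key] = val  # copy results back into d
            return None  # the class object itself is never needed
    return ChangeMeta

mydict = {'value': 1}

class _(metaclass=change(mydict)):
    value += 1

print(mydict['value'])  # 2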
