Dynamic creation of variables and modification of globals()

If I do this:
newvar = raw_input()
globals()[newvar] = 4
It is clear that the resulting variable is created at runtime, simply because it's the only possibility. However, if I do this:
globals()['y']=3
It seems that y is also created at runtime. Why is it the case? Where does the dynamic behavior come from?
PS: I am aware that this is a bad practice, I just want to understand.

Your module's (or exec context's, etc.) globals are a dict, and globals() just returns that dict. After that, the ['y'] = 3 part is just like any other dictionary assignment.
If you're asking why Python doesn't optimize this into a static assignment… well, think about what it would have to do.
First, detecting that 'y' is a literal is pretty easy; that information is right there in the AST.
But detecting that the dict is your module's global dictionary is a lot harder. globals isn't a keyword or anything else magical, it's just a regular function in builtins. You could hide it with a global, nonlocal, or local name, or monkeypatch builtins, or even replace the builtins for your globals so it's not accessible. So, it would have to do sufficient analysis to determine that there's no way the name lookup on globals could possibly return anything but the appropriate global dict.
And, not only that, in order to make this useful, the language would have to require every implementation to make the same optimization. Otherwise, you could get different semantics from some programs based on whether the optimization took place.
It's also worth keeping in mind that CPython doesn't do anything beyond basic peephole optimization, so you'd have to build the infrastructure for a more complicated optimizer from scratch just to add this one minor change.
On top of that, references to the same global dictionary are stored all over the place. So, even with this optimization, you could still trick Python just as easily:
import inspect

g = globals()
g['y'] = 3
globals().__setitem__('y', 3)
def f(): pass
f.__globals__['y'] = 3
inspect.currentframe().f_globals['y'] = 3

Related

Python - Bad practice to store instance vars in local vars to avoid "self"?

I've been mostly programming in Java and I find Python's explicit self references to class members ugly. I really don't like how all the "self."s clutter my methods, so I find myself wanting to store instance variables in local variables just to get rid of them. For example, I would replace this:
def insert(self, data, priority):
    self.list.append(self.Node(data, priority))
    index = len(self)-1
    while self.list[index].priority < self.list[int(index/2)].priority:
        self.list[index], self.list[int(index/2)] = self.list[int(index/2)], self.list[index]
        index = int(index/2)
with this:
def insert(self, data, priority):
    l = self.list
    l.append(self.Node(data, priority))
    index = len(self)-1
    while l[index].priority < l[int(index/2)].priority:
        l[index], l[int(index/2)] = l[int(index/2)], l[index]
        index = int(index/2)
Normally I would name the local variable the same as the instance variable, but "list" is reserved so I went with "l". My question is: is this considered bad practice in the Python community?

Easier answer first. In Python, underscore is used to avoid clashes with keywords and builtins:
list_ = self.list
This will be understood by Python programmers as the right way.
As for making local variables for properties, it depends. Grepping the Plone codebase (and even the standard library) shows that x = self.x is used, especially:
context = self.context
As pointed out in the comments, it's potentially error-prone, because binding another value to the local variable will not affect the property.
On the other hand, if an attribute is only read in the method, it makes the code much more readable. So it's OK if the variable's use is local enough, say, like let-clauses in functional programming languages.
Sometimes properties are actually functions, so self.property will be calculated each time. (How "pythonic" it is to do extensive calculations in property getters is another question.) Thanks to "Python @property versus getters and setters" for a ready example:
class MyClass(object):
    ...
    @property
    def my_attr(self):
        ...
    @my_attr.setter
    def my_attr(self, value):
        ...
In summary, use sparingly, with care, do not make it a rule.
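Both caveats can be seen in a small sketch. The Sensor class and its attribute names are hypothetical, made up only for illustration: a property getter runs on every access, so caching it in a local saves calls, while rebinding a local name leaves the attribute untouched.

```python
class Sensor:
    """Hypothetical class; counts how often the property getter runs."""

    def __init__(self):
        self.calls = 0
        self.items = [1, 2, 3]

    @property
    def reading(self):
        self.calls += 1
        return 42

s = Sensor()

total = s.reading + s.reading     # getter runs twice
reading = s.reading               # getter runs once
total_cached = reading + reading  # no further getter calls
print(s.calls)                    # 3

items = s.items
items.append(4)                   # mutates the shared list, so s.items changes too
items = []                        # rebinds only the local name
print(s.items)                    # [1, 2, 3, 4]
```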

I agree that explicitly adding "self" (or "this" in other languages) isn't very appealing to the eye. But as people have said, Python follows the philosophy "explicit is better than implicit". Therefore it really wants you to express the scope of the variable you want to access.
Java won't let you use variables you didn't declare, so there is no chance of confusion. But in Python, if "self" were optional, it would not be clear whether the assignment a = 5 should create a member or a local variable. So the explicit self is required in some places. Accessing would work the same, though. Note that Java also requires an explicit this when names clash.
I just counted the selfs in some spaghetti code of mine. In 1000 lines of code there are more than 500 appearances of self. The code indeed isn't that readable, but the problem isn't the repeated use of self. For your code example above: the second version has shorter lines, which makes it easier and/or faster to comprehend. I would say your example is an acceptable case.

Limitations of variables in python

I realize this may be a bit broad, but I thought it was an interesting question that I haven't really seen an answer to. It may be hidden in the Python documentation somewhere, but as I'm new to Python I haven't gone through all of it yet.
So, are there any general rules about what we cannot assign to a variable? Everything in Python is an object, and variables cover the usual cases: storing strings, integers, and lists, aliasing other variables, holding references to classes, and so on. With a little cleverness you can even do something like the following, which I can think of off the top of my head, wherever it may be useful:
var = lambda: some_function()
or storing the result of a comparison to clean code up, such as:
var = some_value < some_value ...
So, that being said I've never come across anything that I couldn't store as a variable if I really wanted to, and was wondering if there really are any limitations?

You can't store syntactical constructs in a variable. For example, you can't do
command = break
while condition:
    if other_condition:
        command
or
operator = +
three = 1 operator 2

You can't really store expressions and statements as objects in Python.
Sure, you can wrap an expression in a lambda, and you can wrap a series of statements in a code object or callable, but you can't easily manipulate them. For instance, changing all instances of addition to multiplication is not readily possible.
To some extent, this can be worked around with the ast module, which provides for parsing Python code into abstract syntax trees. You can then manipulate the trees, instead of the code itself, and pass it to compile() to turn it back into a code object.
However, this is a form of indirection, compensating for a feature Python itself lacks. ast can't really compare to the anything-goes flexibility of (say) Lisp macros.
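As a rough sketch of that workaround, the snippet below uses ast.NodeTransformer to turn every addition in a small expression into a multiplication. The class name AddToMult and the sample expression are illustrative assumptions, not part of the original answer.

```python
import ast

class AddToMult(ast.NodeTransformer):
    """Rewrite every binary + into a binary *."""

    def visit_BinOp(self, node):
        self.generic_visit(node)          # transform nested operands first
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node

source = "2 + 3 + 4"
tree = ast.parse(source, mode="eval")     # code -> abstract syntax tree
tree = ast.fix_missing_locations(AddToMult().visit(tree))
code = compile(tree, "<ast>", "eval")     # tree -> code object

print(eval(source))   # 9
print(eval(code))     # 24, i.e. 2 * 3 * 4
```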

According to the Language Reference, the right hand side of an assignment statement can be an 'expression list' or a 'yield expression'. An expression list is a comma-separated list of one or more expressions. You need to follow this through several more tokens to come up with anything concrete, but ultimately you can find that an 'expression' is any number of objects (literals or variable names, or the result of applying a unary operator such as not, ~ or - to a nested expression_list) chained together by any binary operator (such as the arithmetic, comparison or bitwise operators, or logical and and or) or the ternary a if condition else b.
You can also note in other parts of the language reference that an 'expression' is exactly something you can use as an argument to a function, or as the first part (before the for) of a list comprehension or generator expression.
This is a fairly broad definition - in fact, it amounts to "anything Python resolves to an object". But it does leave out a few things - for example, you can't directly store the less-than operator < in a variable, since it isn't a valid expression by itself (it has to be between two other expressions) and you have to put it in a function that uses it instead. Similarly, most of the Python keywords aren't expressions (the exceptions are True, False and None, which are all canonical names for certain objects).
Note especially that functions are also objects, and hence the name of a function (without calling it) is a valid expression. This means that your example:
var = lambda: some_function()
can be written as:
var = some_function
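A short sketch of both points; some_function here is a hypothetical stand-in for the function in the question, and less_than is an illustrative name:

```python
def some_function():
    """Hypothetical stand-in for the function in the question."""
    return "called"

wrapped = lambda: some_function()   # an extra layer that only forwards the call
direct = some_function              # the function object itself

print(direct is some_function)      # True
print(wrapped() == direct())        # True

# The bare operator < is not an expression, but a function that uses it is.
less_than = lambda a, b: a < b
print(less_than(1, 2))              # True
```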

By definition, a variable is something which can vary, or change. In its broadest sense, a variable is no more than a way of referring to a location in memory in your given program. Another way to think of a variable is as a container to place your information in.
Unlike many statically typed languages, Python does not require variable declarations. You can place pretty much anything in a variable so long as you can come up with a name for it. Furthermore, not only can the value of a Python variable change, its type often can as well.
To address your question, I would say the limitations on a variable in Python relate only to a few basic necessary attributes:
A name
A scope
A value
(Usually) a type
As a result, things like operators (+ or * for instance) cannot be stored in a variable, as they do not meet these basic requirements, and in general you cannot store expressions themselves in variables (unless you wrap them in a lambda expression).
As mentioned by Kevin, it is possible to sort of store an operator in a variable using the operator module. Even so, you cannot perform the kinds of manipulations that a variable is otherwise subject to, as really you are just assigning a value. An example using the operator module:
import operator

# Map operator symbols to the functions that implement them.
operations = {"+": operator.add,
              "-": operator.sub}
operator_variable_string = input('Give me an operator: ')
operator_function = operations[operator_variable_string]
result = operator_function(8, 4)

Is there a python naming convention for avoiding conflicts with standard module names?

PEP 8 recommends using a single trailing underscore to avoid conflicts with python keywords, but what about conflicts with module names for standard python modules? Should that also be a single trailing underscore?
I'm imagining something like this:
import time
time_ = time.time()

PEP 8 doesn't seem to address it directly.
The trailing underscore is obviously necessary when you're colliding with a keyword, because your code would otherwise raise a SyntaxError (or, if you're really unlucky, compile to mean something completely different than you intended).
So, even in contexts where you have a class attribute, instance attribute, function parameter, or local variable that you want to name class, you have to go with class_ instead.
But the same isn't true for time. And I think in those cases, you shouldn't postfix an underscore for time.
There's precedent for that—multiple classes in the stdlib itself have methods or data attributes named time (and none of them have time_).
Of course there's the case where you're creating a name at the same scope as the module (usually meaning a global variable or function). Then you've got much more potential for confusion, and you lose access to everything on the time module for the rest of the current scope.
I think 90% of the time, the answer is going to be "That shouldn't be a global".
But that still leaves the other 10%.
And there's also the case where your name is in a restricted namespace, but that namespace is a local scope inside a function where you need to access the time module.
Or, maybe, in a long, complicated function (which you shouldn't have any of, but… sometimes you do). If it wouldn't be obvious to a human reader that time is a local rather than the module, that's just as bad as confusing the interpreter.
Here, I think that 99% of the remaining time, the answer is "Just pick a different name".
For example, look at this code:
def dostuff(iterable):
    time = time.time()
    for thing in iterable:
        dothing(thing)
    return time.time() - time # oops!
The obvious answer here is to rename the variable start or t0 or something else. Besides solving the problem, it's also a more meaningful name.
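A short sketch of both versions, assuming a caller-supplied dothing callback (the parameter is added here only to keep the example self-contained). The shadowing version fails as soon as it runs, because assigning to time anywhere in the function makes time a local name for the whole body:

```python
import time

def dostuff_shadowed(iterable, dothing):
    time = time.time()        # raises UnboundLocalError: time is local here
    for thing in iterable:
        dothing(thing)
    return time.time() - time

def dostuff(iterable, dothing):
    start = time.time()       # no clash with the module name
    for thing in iterable:
        dothing(thing)
    return time.time() - start

try:
    dostuff_shadowed([1, 2, 3], print)
except UnboundLocalError as exc:
    print("shadowed version failed:", exc)

elapsed = dostuff([1, 2, 3], lambda thing: None)
print(type(elapsed))          # <class 'float'>
```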
But that still leaves the 1%.
For example, there are libraries that generate Python code out of, say, a protocol specification, or a .NET or ObjC interface, where the names aren't under your control; all you can do is apply some kind of programmatic and unambiguous rule to the translated names. In that case, I think a rule that appends _ to stdlib module names as well as keywords might be a good idea.
You can probably come up with other examples where the variable can't just be arbitrarily renamed, and has to (at least potentially) live in the same scope as the time module, and so on. In any such cases, I'd go for the _ suffix.

Is it a good idea to use a class as a namespace in Python?

I am putting a bunch of related stuff into a class. The main purpose is to organize them into a namespace.
class Direction:
    north = 0
    east = 1
    south = 2
    west = 3

    @staticmethod
    def turn_right(d):
        # one step clockwise; stands in for the original placeholder
        return (d + 1) % 4

    @staticmethod
    def turn_left(d):
        # one step counterclockwise; stands in for the original placeholder
        return (d - 1) % 4

# defined a short alias because direction will be used a lot
D = Direction
d0 = D.north
d1 = D.turn_right(d0)
There is not much of an object concept involved. In C++, I would use the actual language keyword namespace. There is no such thing in Python, so I am trying to use a class for this purpose.
Is this a good idea? Any pitfall with this approach?
I just answered a related question yesterday. This question is asked in a different way; it is an actual decision I need to make for myself.
Static method vs module function in python

Yes, indeed. You can use Python classes strictly for namespacing, as that is one of the special things they can do, and do differently than modules. It's a lot easier to define a class as a namespace inline in a file than to create more files.
You should not do it without a comment in your code saying what it's for.
Python classes come in many different forms and serve many purposes, which makes it difficult to understand code you have not seen before.
A Python class used as a namespace is no less a Python class than one that matches the perception of what a class is in other languages. Python does not require a class to be instantiated to be useful. It does not require ivars and does not require methods. It is fairly flexible.
Classes can contain other classes too.
Lots of people have their own ideas about what is or isn't Pythonic.
But if they were all worried about something like consistency, they'd push to have things like len(), dir(), and help() be methods of objects rather than global functions.
Do what works, and comment or document it if it isn't a usual or obvious usage.

No. Stick it in a module instead.
Python doesn't have namespaces in the same way that C++ does, but modules serve a somewhat similar purpose (that is, grouping "like" classes and functions together, and giving them unique names to avoid clashes).
Edit
I saw the comment you posted to your question. To answer more explicitly, no, in Pythonic code it's not really correct to use a class to emulate a namespace. Modules are there to group related classes, functions, and variables -- use a module instead. A class represents a "thing" that has a behavior (methods) and data (instance variables) -- it's not just a collection of standalone functions and variables.

Yes, it's fine. You can even use property to make methods look like attributes.
If you have a big class, it might be neater to use a module.

It depends on the situation; if you can stick a constant in the module and have it make sense, by all means do so, but putting them in the class can make their meaning more obvious, and allow similar constants to have more "abstraction": placing them in the ServerError class makes more sense than having them all prepended with SERVER_ERROR residing freely in the module.
Do what is most intuitive, but try to avoid namespace pollution.

I mostly agree with @uchuga's answer, but I want to emphasize a caveat:
a = "global"
class C:
    a = "class"
    def f():
        print(a)
    f()
... will print "global", not "class".

In my opinion, a class is a class, and a Namespace is a namespace. You can use argparse.Namespace like so to create a namespace:
from argparse import Namespace

directions = Namespace(
    north=0,
    east=1,
    south=2,
    west=3,
)
print(directions.north)  # 0
print(directions.east)   # 1
print(directions.south)  # 2
print(directions.west)   # 3

When are function local python variables created?

When are function-local variables created? For example, in the following code, is dictionary d1 created each time the function f1 is called, or only once when it is compiled?
def f1():
    d1 = {1: 2, 3: 4}
    return id(d1)

d2 = {1: 2, 3: 4}
def f2():
    return id(d2)
Is it faster in general to define a dictionary within function scope or to define it globally (assuming the dictionary is used only in that function)? I know it is slower to look up global symbols than local ones, but what if the dictionary is large?
Much python code I've seen seems to define these dictionaries globally, which would seem not to be optimal. But also in the case where you have a class with multiple 'encoding' methods, each with a unique (large-ish) lookup dictionary, it's awkward to have the code and data spread throughout the file.

Local variables are created when assigned to, i.e., during the execution of the function.
If every execution of the function needs (and does not modify!-) the same dict, creating it once, before the function is ever called, is faster. As an alternative to a global variable, a fake argument with a default value is even (marginally) faster, since it's accessed as fast as a local variable but also created only once (at def time):
def f(x, y, _d={1:2, 3:4}):
I'm using the name _d, with a leading underscore, to point out that it's meant as a private implementation detail of the function. Nevertheless it's a bit fragile, as a bumbling caller might accidentally and erroneously pass three arguments (the third one would be bound as _d within the function, likely causing bugs), or the function's body might mistakenly alter _d, so this is only recommended as an optimization to use when profiling reveals it's really needed. A global dict is also subject to erroneous alterations, so, even though it's faster than building a local dict afresh on every call, you might still pick the latter possibility to achieve higher robustness (although the global dict solution, plus good unit tests to catch any "oops"es in the function, is the recommended alternative;-).
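The three options can be compared with an identity check. This is a minimal sketch; the function names and _TABLE are hypothetical, chosen only for illustration:

```python
_TABLE = {1: 2, 3: 4}                    # built once, when the module is loaded

def use_local():
    table = {1: 2, 3: 4}                 # built again on every call
    return table

def use_global():
    return _TABLE                        # global lookup, nothing rebuilt

def use_default(_table={1: 2, 3: 4}):    # built once, at def time
    return _table

print(use_local() is use_local())        # False: a fresh dict per call
print(use_global() is use_global())      # True
print(use_default() is use_default())    # True
```

If the actual speed difference matters, timeit on the real workload is the way to find out; the identity checks only show when each dict gets built.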

If you look at the disassembly with the dis module, you'll see that the creation and filling of d1 is done on every execution. Given that dictionaries are mutable, this is unlikely to change anytime soon, at least until good escape analysis comes to Python virtual machines. On the other hand, lookup of global constants will get speculatively optimized with the next generation of Python VMs such as unladen-swallow (the speculation part is that they are constant).
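To check this yourself, a sketch along these lines prints the bytecode for f1 from the question. The exact opcode names differ between CPython versions, so no output is shown here; on CPython, look for an instruction whose name starts with BUILD inside the function body, which is the per-call dictionary construction.

```python
import dis

def f1():
    d1 = {1: 2, 3: 4}
    return id(d1)

dis.dis(f1)   # human-readable listing of the function's bytecode

# The same information, programmatically.
opnames = [instr.opname for instr in dis.get_instructions(f1)]
print(opnames)
```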

Speed is relative to what you're doing. If you're writing a database-intensive application, I doubt your application is going to suffer one way or another from your choice of global versus local variables. Use a profiler to be sure. ;-)
As Alex noted, the locals are initialized when the function is called. An easy way to demonstrate this for yourself:
import random

def f():
    d = [random.randint(1, 100), random.randint(100, 1000)]
    print(d)

f()
f()
f()
