Is it bad practice to reference an attribute with a new variable name in a method within a class? For example:
class Stuff:
    def __init__(self, a):
        self.a = a

    def some_method(self):
        a = self.a
        # Do some stuff with a
I've seen this in other people's code and I've gotten into the habit of it myself, especially with long variable names. It seems like a copy of a is created when I do this, which could be a problem if a is very large. Should I just stick to using self.a inside some_method? Does Python garbage-collect the a created in some_method after it is called?
This isn't necessarily bad practice. There are two reasons you might make this assignment (see the comments by @ShadowRanger for a rather obscure third one):
Making the code more readable (as you mentioned, attribute names can get long).
Eliminating the dot: if you have a tight loop that uses self.a, you can save a little time by not performing the attribute lookup on every iteration (not much time, though). Additionally, if this were a function defined on the class rather than a plain attribute, assigning it to a local variable would avoid re-creating the bound method from the function on every access, which also saves some execution time.
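As a rough illustration of the second point, here is a minimal sketch. The method double and the two loop methods are hypothetical names added for the example, and nothing is timed here; it only shows that each self.double access builds a new bound-method object, which the local name avoids.

```python
class Stuff:
    def __init__(self, a):
        self.a = a

    def double(self, value):
        # Hypothetical helper, only here to have a method to look up.
        return 2 * value

    def loop_with_attribute(self):
        # self.double is looked up, and a bound method built, on every pass.
        return [self.double(item) for item in self.a]

    def loop_with_local(self):
        # Look the method up once and reuse the resulting bound method.
        double = self.double
        return [double(item) for item in self.a]


stuff = Stuff([1, 2, 3])
# Each attribute access produces a fresh bound-method object ...
print(stuff.double is stuff.double)   # False
# ... although both wrap the same function and the same instance.
print(stuff.double == stuff.double)   # True
print(stuff.loop_with_attribute() == stuff.loop_with_local())  # True
```

Whether the saving matters in practice is something only profiling can tell you.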
Also, copy isn't the best term: you just make a different name refer to the same object. After some_method completes, the name a simply no longer exists, because it was only created in the method's local scope.
No, the object isn't garbage collected at that point, because a (which was bound to the value of self.a) isn't the only reference; self.a still refers to the value and keeps it alive.
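A small sketch of this, assuming a is a list so that the shared object can be observed through mutation (the list contents are made up for the example):

```python
class Stuff:
    def __init__(self, a):
        self.a = a

    def some_method(self):
        a = self.a           # binds a second name; nothing is copied
        a.append("added")    # mutates the one shared list
        return a is self.a


big_list = ["x"] * 3
stuff = Stuff(big_list)
same_object = stuff.some_method()
print(same_object)           # True
print(stuff.a is big_list)   # True
print(len(stuff.a))          # 4
```

The local name a disappears when some_method returns, but the list itself lives on because self.a (and big_list) still refer to it.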
I came across weird behavior in Python 3.6.
I was able to call a function and access a variable defined only in the child class from a base class method.
I find this useful in my code but I come from C++ and this code looks very weird.
Can someone please explain this behavior?
class a:
    def __init__(self):
        print(self.var)
        self.checker()

class b(a):
    def __init__(self):
        self.var=5
        super().__init__()
    def checker(self):
        print('inside B checker')

myB = b()
Output:
5
inside B checker
All methods in Python are looked up dynamically. You're calling a method on self, and self is a b instance, and b instances have a checker method, so that method gets called.
Consider this code at the module top level, or in a top-level function:
myB = b()
myB.checker()
Obviously the global module code isn't part of the b class definition, and yet, this is obviously legal. Why should it be any different if you put the code inside the class a definition, and rename myB to self? Python doesn't care. You're just asking the value (whether you've called it myB or self) "do you have something named checker?", and the answer is yes, so you can call it.
And var is even simpler; self.var = 5 just adds var to self.__dict__, so it's there; the fact that it's a b instance isn't even relevant here (except indirectly: being a b instance means it had b.__init__ called on it, and that's where var was created).
If you're wondering how this "asking the value" works, a slightly oversimplified version is:
Every object has a __dict__. When you do self.var=5, that actually does self.__dict__['var'] = 5. And when you print(self.var), that does print(self.__dict__['var']).
When that raises a KeyError, as it will for self.checker, Python tries type(self).__dict__['checker'], and, if that doesn't work, it loops over type(self).mro() and tries all of those dicts.
When all of those raise a KeyError, as they would with self.spam, Python calls self.__getattr__('spam').
If even that fails, you get an AttributeError.
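The steps above can be sketched as a plain function. This is only an illustration of the oversimplified model, not how the interpreter is actually implemented: simple_lookup, Base and Child are hypothetical names, and descriptors and __getattribute__ are ignored, so methods come back as raw functions rather than bound methods.

```python
class Base:
    def greet(self):
        return "hello from Base"


class Child(Base):
    def __init__(self):
        self.var = 5

    def checker(self):
        return "inside Child checker"


def simple_lookup(obj, name):
    """Oversimplified attribute lookup (no descriptors, no __getattribute__)."""
    # Step 1: the instance's own __dict__.
    if name in obj.__dict__:
        return obj.__dict__[name]
    # Step 2: the class and then its bases, in method resolution order.
    for klass in type(obj).mro():
        if name in klass.__dict__:
            return klass.__dict__[name]
    # Step 3: the __getattr__ hook, if the class defines one.
    hook = getattr(type(obj), "__getattr__", None)
    if hook is not None:
        return hook(obj, name)
    # Step 4: give up.
    raise AttributeError(name)


child = Child()
print(simple_lookup(child, "var"))              # 5
print(simple_lookup(child, "checker")(child))   # inside Child checker
print(simple_lookup(child, "greet")(child))     # hello from Base
```

Because the sketch returns the raw function, the instance has to be passed explicitly when calling it; the real lookup binds it for you.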
Notice that if you try to construct an a instance, this will fail with an AttributeError. That's because now self is an a, not a b. It doesn't have a checker method, and it hasn't gone through the __init__ code that adds a var attribute.
The reason you can't do this in C++ is that C++ methods are looked up statically. It's not a matter of what type the value is at runtime, but what type the variable is at compile time. If the statically looked-up method says it's virtual, then the compiler inserts some dynamic-lookup code, but otherwise, it doesn't. [1]
One way it's often explained is that in Python (and other languages with Smalltalk semantics, like ObjC and Ruby), all methods are automatically virtual. But that's a bit misleading, because in C++, even with virtual methods, the method name and signature still have to be findable on the base class; in Python (and Smalltalk, etc.), that isn't necessary.
If you're thinking this must be horribly slow, that Python must have to do something like search some stack of namespaces for the method by name every time you call a method: well, it does that, but it's not as slow as you'd expect. For one thing, a namespace is a dict, so it's a constant-time search. And the strings are interned and have their hash values cached. And the interpreter can even cache the lookup results if it wants to. The result is still slower than dereferencing a pointer through a vtable, but not by a huge margin (and besides, there are plenty of other things in Python that can be 20x slower than C++, like for loops; you don't use pure Python when you need every detail to work as fast as possible).
[1] C++ also has another problem: even if you defined a var attribute and a checker virtual method in a, you don't get to choose the order the initializers get called; the compiler automatically calls a's constructor first, then b's. In Python, it calls b.__init__, and you choose when you want to call super().__init__(), whether it's at the start of the method, at the end, in the middle, or even never.
First, you create an instance of class b, which calls b's constructor, __init__.
Inside that constructor you set the attribute self.var to 5. Then super().__init__() calls the constructor of the parent class a.
Inside the constructor of class a, self.var is printed and self.checker() is called.
Note that calling super().__init__() automatically passes the child class instance, self, as the first argument.
I have seen the following many times, where instance variables (e.g. obj_foo and obj_bar) are re-assigned to local variables inside a method (e.g. within call):
class Example:
    def __init__(self, obj_foo, obj_bar):
        self.obj_foo = obj_foo
        self.obj_bar = obj_bar

    def call(self):
        obj_foo, obj_bar = self.obj_foo, self.obj_bar
        obj_foo.do_something()
        obj_bar.do_something_else()
I am not sure if this is convention (easy to read) or if there is a more significant purpose?
Is this bad practice?
Does this affect performance?
Usually, there is no reason to do that, but in some circumstances it may be:
faster (because access to local variables is fast)
more easily readable (because it is shorter)
The speed is probably the more important factor here. Accessing member variables involves various mechanisms (see __getattr__, __getattribute__, __dict__, descriptors) which take some time to resolve. Additionally, the getter for the variable may do something even more expensive.
On the other hand, in CPython local variables are resolved at compile time, so there is no lookup for a variable named 'obj_foo' in any __dict__; the interpreter simply fetches the local by its slot index, because it already knows which slot obj_foo occupies without searching for the name.
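One way to see this in CPython is to inspect the compiled code object. The sketch below uses a hypothetical Example class with a single attribute x; co_varnames lists the names that are accessed by slot index, and co_names lists the names that still need a lookup by name (such as attributes).

```python
class Example:
    def __init__(self, x):
        self.x = x

    def via_attribute(self):
        return self.x          # attribute lookup by name

    def via_local(self):
        x = self.x             # one attribute lookup ...
        return x               # ... then a plain local-slot read


attr_code = Example.via_attribute.__code__
local_code = Example.via_local.__code__

print(attr_code.co_varnames)    # ('self',)
print(attr_code.co_names)       # ('x',)
print(local_code.co_varnames)   # ('self', 'x')
```

These are CPython implementation details, so other interpreters may organise locals differently.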
So, if a member variable is used many times in the same function and profiling shows that it takes significant time to access that member variable, it may be useful to use a local variable instead.
Usually, that does not make a big difference, but here is an example to show the idea:
class A:
    def __init__(self,x):
        self.x=x
    def f(self):
        for i in range(100):
            self.x()

class B:
    def __init__(self,x):
        self.x=x
    def f(self):
        x=self.x
        for i in range(100):
            x()
The timings are close, but the local-variable version is measurably faster (by default timeit runs the statement one million times):
>>> import timeit
>>> timeit.timeit('a.f()', setup='a=A(lambda:None)', globals=locals())
13.119033042000638
>>>
>>> timeit.timeit('b.f()', setup='b=B(lambda:None)', globals=locals())
10.219889547632562
IMHO, in this case, the difference is barely enough to justify adding that one line of code.
You could do this just to avoid writing out self every time.
However, it could also be that there is a more important reason to do this: it could completely change the semantics. Example:
def __init__(self, x):
    self.x = 42
def theMethod(self):
    x = self.x
    self.x = 58
    print(x)
    print(self.x)
In this example, x and self.x are not interchangeable, even though you have assigned x = self.x in the first line of theMethod. The first print will output 42, the second print will output 58. This can happen every time that some member variable is assigned to a local variable and then overridden.
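A self-contained version of that snippet, as a sketch: the class name Holder is made up here because the original omits the class header, and the method returns both values instead of printing them so they are easy to compare.

```python
class Holder:
    def __init__(self):
        self.x = 42

    def the_method(self):
        x = self.x      # x now refers to the int 42
        self.x = 58     # rebinds the attribute only; the local x is untouched
        return x, self.x


print(Holder().the_method())   # (42, 58)
```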
How this impacts performance is not entirely obvious, because both lookups self.x and x will have to look for the symbol in a dictionary: in the first case, the dictionary of the member variables of self, in the second case, in the current scope. It could impact the performance both positively and negatively, depending on how many and what other variables are defined in each scope. In most non-contrived cases, it could have a tiny positive effect on performance.
EDIT: As @zvone has pointed out, the last paragraph does not necessarily hold for all implementations of the Python interpreter.
I used to be a C programmer, so I am used to passing every variable as an argument or pointer, and to being discouraged from defining global variables.
I am going to use some variables in several functions in Python.
Generally, which is better: passing the variable as an argument, or storing it on self once we have its value? Does Python have any general rules about this?
Like this:
class A:
    def func2(self, var):
        print(var)
    def func1(self):
        var = 1
        self.func2(var)

class B:
    def func2(self):
        print(self.var)
    def func1(self):
        self.var = 1
        self.func2()
Which is better? A or B?
In Python, you have a lot of freedom to do what "makes sense". In this case, I would say that it depends on how you plan on using func2 and who will be accessing it. If func2 is only ever supposed to act upon self.var, then you should code it as such. If other objects are going to need to pass in different arguments to func2, then you should allow for it to be an argument. Of course, this all depends on the larger scope of what you're trying to do, but given your simple example, this makes sense.
Also, I'm confused about how your question relates to global variables. Member variables are not the same thing as global variables.
Edited to reflect updated post:
The difference between A and B in your example is that B persists the information about self.var, while A does not. If var needs to be persisted as part of the object's state, then you need to store it as part of self. I get the sense that your question might relate more to objects as a general concept than anything Python-specific.
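To make the persistence difference concrete, here is a sketch based on the question's two classes. As an assumption for the example, func2 returns the value instead of printing it, so the results can be inspected.

```python
class A:
    def func2(self, var):
        return var

    def func1(self):
        var = 1                 # local: gone once func1 returns
        return self.func2(var)


class B:
    def func2(self):
        return self.var

    def func1(self):
        self.var = 1            # stored on the instance: part of its state
        return self.func2()


a, b = A(), B()
print(a.func1(), b.func1())     # 1 1
print(hasattr(a, "var"))        # False
print(hasattr(b, "var"))        # True
```

Both versions compute the same result; only B leaves something behind on the object afterwards.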
Of course it's better to design your program to use scope intelligently. The most obvious problem is that a mutation of a global variable can affect distant parts of code in ways that are difficult to trace, but in addition, garbage collection (reference counting, whatever) becomes effectively moot when your references live in long-lived scopes.
That said, Python has a global keyword, but it doesn't have globals in the same way C does. Python globals are module level, so they're namespaced with the module by default. The downstream programmer can bypass or alias this namespacing, but that's their problem. There are certainly cases where defining a module-level configuration value, pseudo-enum or pseudo-const makes sense.
Next, consider whether you need to maintain state: if the behavior of an object depends on it being aware of a certain value, make it a property. You can do that by attaching the value to self. Otherwise, just pass the value as an argument. (But then, if you have a lot of methods and no state, ask yourself if you really need a class, or should they just be module functions?)
This question has implications for object-oriented design. Python is an object-oriented language; C is not. You would be dramatically undermining (and in some cases thwarting) the advantages of object orientation by using in-out programming or entirely global variables in Python, except where there's a particular reason to do so.
Consider the following reasons, which are not exhaustive:
The garbage collector cannot reclaim objects that global variables still refer to, so nothing gets freed if all the variables are global.
You no longer have fields (which is what "self" helps you reference). Say your object is a Cat; there isn't some global name for a cat which you reassign whenever a new Cat appears in your neighborhood. Rather, each cat has its own name, age, size, etc. Someone who wants to find out how big the cat is shouldn't have to go to some global repository of cat sizes and look it up; they should just look at the cat.
You can run into problems with primitives because Python, unlike C, does not let you track (easily) the reference of an object. If I pass in an integer variable, I can't change the value of the variable in its original location, only within the scope of the function. This can be solved with global variables, but only by being very messy. Consider the following code:
def foo(x):
    x = 3

myVar = 5
foo(myVar)
print(myVar)
This will, of course, output 5, not 3. There is no "*x" like there is in C, so solving this would be rather tricky in Python if we wanted foo to reassign 3 to the input variable. Rather, we could write
class Foo:
    x = 5

def foo(fooObj):
    fooObj.x = 3

myFoo = Foo()
foo(myFoo)
print(myFoo.x)
Problem solved - it now outputs 3, not 5!
As a general rule, it is better to use self whenever possible, to encapsulate internal information and bind it to the object (or class). It may be helpful to explain how self and classes work.
In Python, the object (or class) is passed to methods explicitly, just as you would do in C if you wanted to do OOP. This is different from other object-oriented languages, like Java or C++, where the this argument is passed implicitly (but it always is passed!).
Thus, if you define a class like:
class B(object):
    def __init__(self, var=None): # this is the constructor
        self.var = var
    def func2(self):
        print(self.var)
When you call a method on an object with the . operator, that object is passed as the first argument, which maps to self in the method signature:
b = B(1) # object b is created and B.__init__(b, 1) is called
b.func2() # B.func2(b) is called, outputs 1
I hope this clears up things for you a bit
I recommend focusing on proximity. If the variable only relates to the current method or is created to be passed to another method, then it probably isn't expressing the persistent state of the class instance. Create the variable and throw it away when you're done.
If the variable describes an important facet of the instance, use self. This is not encroaching on your aversion to global variables as the variable is encapsulated within the instance. Class and module variables are also fine for the same reason.
In short, both A and B are proper implementations depending on context. I'm sorry that I haven't given you a clear answer but it has more to do with how important an object is to the objects around it than maintaining any sort of community standard. That you asked the question makes me think you'll make a reasonable judgement.
I've spent some time looking for a guide on how to decide how to store data and functions in a Python class. I should point out that I am new to OOP, so answers such as:
data attributes correspond to “instance variables” in Smalltalk, and to “data members” in C++ (as seen in http://docs.python.org/tutorial/classes.html)
leave me scratching my head. I suppose what I'm after is a primer on OOP targeted at Python programmers. I would hope that the guide/primer would also include some sort of glossary, or definitions, so that after reading it I would be able to speak intelligently about the different types of variables available. I want to understand the thought processes behind deciding when to use the forms of a, b, c, and d in the following code.
class MyClass(object):
    a = 0

    def __init__(self):
        b = 0
        self.c = 0
        self.__d = 0

    def __getd(self):
        return self.__d

    d = property(__getd, None, None, None)
a, b, and c show different scopes of variables, meaning that these variables differ in their visibility and in the environment in which they are valid.
First, you need to understand the difference between a class and an object. A class is a vehicle to describe some generic behavior. An object is then created based on that class. The object "inherits" all the methods of the class and can define variables which are bound to the object. The idea is that objects encapsulate some data and the behavior required to work on that data. This is the main difference from procedural programming, where modules just define the behavior, but not the data.
c is such an instance variable, meaning a variable which lives in the scope of an instance of MyClass. self is always a reference to the object instance the current code is running under. Technically, __d works the same as c and has the same scope. The difference is that, by convention in Python, variables and methods starting with two underscores are to be considered private and are not to be used by code outside of the class. This is required because Python doesn't have a way to define truly private or protected methods and variables as many other languages do.
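Beyond the convention, Python also applies name mangling to attributes that start with two underscores inside a class body. A short sketch using the question's MyClass (trimmed to the two instance attributes) shows where __d actually ends up:

```python
class MyClass(object):
    def __init__(self):
        self.c = 0
        self.__d = 0   # stored under the mangled name _MyClass__d


obj = MyClass()
print(sorted(vars(obj)))       # ['_MyClass__d', 'c']
print(hasattr(obj, "__d"))     # False
print(obj._MyClass__d)         # 0
```

So the attribute is still reachable from outside under its mangled name; mangling mainly guards against accidental clashes and does not enforce privacy.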
b is a simple variable which is only valid inside the __init__ method. When execution leaves the __init__ method, the b variable is going to be garbage collected and is no longer accessible, while c and __d are still valid. Note that b is not prefixed with self.
Now a is defined directly on the class. That makes it a so called class variable. Typically, it is used to store static data. This variable is the same on all instances of the MyClass class.
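A brief sketch of how a class variable behaves (the values 10 and 99 are arbitrary choices for the example). Note the second half: assigning through an instance creates a new instance attribute that shadows the class variable instead of changing it.

```python
class MyClass(object):
    a = 0   # class variable, stored on the class itself


first, second = MyClass(), MyClass()

MyClass.a = 10                        # rebinding on the class is seen by every instance
print(first.a, second.a)              # 10 10

first.a = 99                          # creates an instance attribute that shadows the class one
print(first.a, second.a, MyClass.a)   # 99 10 10
```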
Note that this description is a bit simplified and omits things like metaclasses and the difference between functions and bound methods, but you get the idea...
a. is for variables shared by all instances of MyClass
b. is for a variable that will exist only within the __init__ method.
c. is for an attribute of the specific MyClass instance, and is part of the external interface of MyClass (i.e. don't be surprised if some other programmer mucks around with this variable). The disadvantage of using "c", is that it reduces your flexibility to make changes to MyClass (at some point, someone is probably going to rely on the fact that "c" exists and does certain things, so if you decide to reorganize your class, you will need to be prepared to keep "c" around forever).
__d. is for an attribute of the specific MyClass instance, and is part of the internal implementation of MyClass; it can be assumed that only the code of MyClass will read/write this attribute.
d. Makes __d look in many ways like c. However, the advantage of using d with __d is that if, for example, d can be computed from some other attribute, it would be possible to eliminate the additional storage of __d. Also, you ensure that the value can only be read externally, not written externally.
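As a sketch of that last point, here is a variant of MyClass in which d is derived from c, so no __d is stored at all. The rule "d is twice c" is a made-up example, and the decorator form of property is used instead of the property(...) call from the question.

```python
class MyClass(object):
    def __init__(self):
        self.c = 21

    @property
    def d(self):
        # Hypothetical rule: d is derived from c, so no __d needs to be stored.
        return self.c * 2


obj = MyClass()
print(obj.d)            # 42

try:
    obj.d = 0           # no setter was defined, so this is rejected
    write_rejected = False
except AttributeError:
    write_rejected = True
print(write_rejected)   # True
```

Callers keep writing obj.d either way, which is what lets the class switch between stored and computed values without breaking them.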
A newbie-friendly online book which is widely-recommended is Dive into Python
See Chapter 5 especially.
As to your questions:
a is a class variable (identical for all objects of that class.)
b is a local (temporary) variable not related to the class. Assigning to it inside the __init__() method might make you think it persists after the __init__() call, but it doesn't. You might think that b could refer to a global (as it would in other languages), but in Python when scope is not explicitly specified and there is no global b statement in effect, a variable refers to the innermost scope.
c,d are instance variables (each object can have different values); and their different semantics mean:
c is an ordinary instance variable which can be read or written as object.c (the class doesn't define a getter or setter for it)
__d is a private variable, read-only from outside, and intended to be accessed through the getter __getd() (exposed as the property d); the property line shows that no setter is defined, hence it cannot be changed through d. The double-underscore prefix __ signifies it is internal and not intended to be accessed by anything outside the class.
d is a property which allows (read-only) access to __d, but, as Michael points out, it would not need storage for an extra variable if the value can be computed dynamically when the getter is called.
Both of these blocks of code work. Is there a "right" way to do this?
class Stuff:
    def __init__(self, x = 0):
        global globx
        globx = x
    def inc(self):
        return globx + 1

myStuff = Stuff(3)
print(myStuff.inc())
Prints "4"
class Stuff:
    def __init__(self, x = 0):
        self.x = x
    def inc(self):
        return self.x + 1

myStuff = Stuff(3)
print(myStuff.inc())
Also prints "4"
I'm a noob, and I'm working with a lot of variables in a class. Started wondering why I was putting "self." in front of everything in sight.
Thanks for your help!
You should use the second way; then every instance has a separate x.
If you use a global variable, you may get surprising results when you have more than one instance of Stuff, as changing the value for one will affect all the others.
It's normal to have explicit self references all over your Python code. If you try tricks to avoid that, you will make your code difficult to read for other Python programmers (and potentially introduce extra bugs).
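The surprise with more than one instance can be sketched like this. The two class names are made up so that both variants from the question can sit side by side.

```python
class GlobalStuff:
    def __init__(self, x=0):
        global globx
        globx = x          # one module-level name shared by every instance

    def inc(self):
        return globx + 1


class InstanceStuff:
    def __init__(self, x=0):
        self.x = x         # each instance keeps its own value

    def inc(self):
        return self.x + 1


g1, g2 = GlobalStuff(3), GlobalStuff(10)
print(g1.inc(), g2.inc())    # 11 11

i1, i2 = InstanceStuff(3), InstanceStuff(10)
print(i1.inc(), i2.inc())    # 4 11
```

Creating the second GlobalStuff silently overwrote the value the first one was relying on.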
There are two ways to get "class scope variables". One is to use self; this is called an instance variable, and each instance of a class has its own copy of its instance variables. The other is to define the variable in the class definition, which could be done like this:
class Stuff:
    globx = 0
    def __init__(self, x = 0):
        Stuff.globx = x
    ...
This is called a class attribute. It can be accessed directly as Stuff.globx and is owned by the class, not by the instances of the class, just like static variables in Java.
You should never use the global statement for a "class scope variable", because it isn't one. A variable declared as global lives in the global scope, i.e. the namespace of the module in which the class is defined.
Namespaces and related concepts are introduced in the Python tutorial.
Those are very different semantically. self. means it's an instance variable, i.e. each instance has its own. This is probably the most common kind, but not the only one. Then there are class variables, defined at class level (and therefore created when the class definition is executed) and accessible in class methods. They are the equivalent of most uses of static members in other languages, and most probably what you want when you need to share stuff between instances (this is perfectly valid, although not automatically the one and only way for a given problem). You probably want one of those, depending on what you're doing. Really, we can't read your mind and tell you which one fits your problem.
Global variables are a different story. They're, well, global: everyone has the same one. This is almost never a good idea (for reasons explained on many occasions), but if you're just writing a quick and dirty script and need to share something between several places, they can be acceptable.