I've only just dipped my toes into the Python pool. I understand how classes and namespaces interact and del just removes stuff from scope, and all that goodness.
My question is simple. I made a simple "hello world" (Pygame) app, but made a critical mistake (simple case below):
class Cat:
    __name = ""

    def __init__(self, newName):
        __name = newName

    def meow(self):
        print "Hi from {0}".format(self.__name)

c = Cat("Mittens")
c.meow()  # prints: "Hi from "
In hindsight, my mistake is obvious: in my method (the constructor, in this specific case) I assigned a value to __name instead of to self.__name (which, by the way, should be mangled and private).
I even understand how and why it works -- it's kind of like static scope in Java/C#.
What I don't understand is how to avoid these kinds of tricksy errors. This was obvious, easy, and the result was, in my case, nothing -- things should have worked but nothing happened.
How do Python developers avoid this problem? It seems like it would be very common.
In this case, I would definitely avoid adding __name to the class namespace. Since __name is (supposed to be) set by the initializer (__init__) unconditionally, there is no point in having a class-level __name.
e.g.:
class Cat:
    def __init__(self, newName):
        __name = newName

    def meow(self):
        print "Hi from {0}".format(self.__name)

c = Cat("Mittens")
c.meow()  # raises AttributeError instead of silently printing the wrong thing
Now run the code and you'll get an AttributeError which helps to catch these sorts of bugs.
In other words, the purpose of __init__ in your code is to set __name on the new instance being initialized (self) via self.__name = newName. In your code, you're providing a "safety net" by setting __name on the class. In python, if an attribute isn't found on the instance, it is then looked up on the class. However, this "safety net" is actually a bad thing in this case -- you get different behavior than you expected rather than an Exception which could be used to help you track down the bug.
As a side note, the way attribute access on an instance works (barring interesting things like __getattribute__) is that Python first looks on the instance for the attribute, then it searches the class, and then it works its way up the class's method resolution order (which is different for old- and new-style classes).
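For example (made-up names, and deliberately ignoring descriptors and other complications):

class Cls(object):
    attr = "from the class"

obj = Cls()
print(obj.attr)        # "from the class" -- nothing in the instance dict, so the class is searched

obj.attr = "from the instance"
print(obj.attr)        # "from the instance" -- the instance dict now shadows the class attribute
print(Cls.attr)        # "from the class" -- the class attribute itself is untouched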
Static analysis tools such as pep8, pylint, pyflakes and flake8 can help to a certain extent. For example, pyflakes would complain about your code (placed in a file named stack_overflow-2013-03-08.py):
$ pyflakes stack_overflow-2013-03-08.py
stack_overflow-2013-03-08.py:8: local variable '__name' is assigned to but never used
But nothing beats a good set of tests. Needless to say, if you had a test for Cat.meow that reasonably covered its range of functionality, you would have caught this problem right meow.
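For instance, a minimal sketch of such a test -- it assumes meow() is refactored to return its greeting rather than print it, so there is something to assert on:

import unittest
# from cat_module import Cat   # placeholder: import the class from wherever it lives

class TestCat(unittest.TestCase):
    def test_meow_mentions_the_cats_name(self):
        c = Cat("Mittens")
        self.assertIn("Mittens", c.meow())   # fails loudly with the buggy __init__

if __name__ == '__main__':
    unittest.main()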
Related
I came across weird behavior in Python 3.6.
I was able to call function and access variable defined only in child class from base class method.
I find this useful in my code but I come from C++ and this code looks very weird.
Can someone please explain this behavior?
class a:
    def __init__(self):
        print(self.var)
        self.checker()

class b(a):
    def __init__(self):
        self.var = 5
        super().__init__()

    def checker(self):
        print('inside B checker')

myB = b()
Output:
5
inside B checker
All methods in Python are looked up dynamically. You're calling a method on self, and self is a b instance, and b instances have a checker method, so that method gets called.
Consider this code at the module top level, or in a top-level function:
myB = b()
myB.checker()
Obviously the global module code isn't part of the b class definition, and yet, this is obviously legal. Why should it be any different if you put the code inside the class a definition, and rename myB to welf? Python doesn't care. You're just asking the value—whether you've called it myB or self—"do you have something named checker?", and the answer is yes, so you can call it.
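Written out by hand, that question is roughly this (welf is just another spelling of self here):

welf = b()
print(hasattr(welf, 'checker'))   # True -- the b instance has something named checker
welf.checker()                    # so the call works, wherever this code happens to live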
And var is even simpler; self.var just adds var to self.__dict__, so it's there; the fact that it's a b instance isn't even relevant here (except indirectly—being a b instance means it had b.__init__ called on it, and that's where var was created).
If you're wondering how this "asking the value" works, a slightly oversimplified version is:
Every object has a __dict__. When you do self.var=5, that actually does self.__dict__['var'] = 5. And when you print(self.var), that does print(self.__dict__['var']).
When that raises a KeyError, as it will for self.checker, Python tries type(self).__dict__['checker'], and, if that doesn't work, it loops over type(self).mro() and tries all of those dicts.
When all of those raise a KeyError, as they would with self.spam, Python calls self.__getattr__('spam').
If even that fails, you get an AttributeError.
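A tiny sketch of that chain, with made-up names:

class Base:
    class_attr = "found on the class"

    def __getattr__(self, name):
        # Reached only after the instance dict, the class dict and the MRO all fail.
        return "conjured up by __getattr__ for " + name

obj = Base()
obj.inst_attr = "found on the instance"

print(obj.inst_attr)    # instance __dict__ wins
print(obj.class_attr)   # falls back to the class
print(obj.spam)         # falls back to __getattr__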
Notice that if you try to construct an a instance, this will fail with an AttributeError. That's because now self is an a, not a b. It doesn't have a checker method, and it hasn't gone through the __init__ code that adds a var attribute.
The reason you can't do this in C++ is that C++ methods are looked up statically. It's not a matter of what type the value is at runtime, but what type the variable is at compile time. If the statically looked-up method says it's virtual, then the compiler inserts some dynamic-lookup code, but otherwise, it doesn't.1
One way it's often explained is that in Python (and other languages with SmallTalk semantics, like ObjC and Ruby), all methods are automatically virtual. But that's a bit misleading, because in C++, even with virtual methods, the method name and signature still has to be findable on the base class; in Python (and SmallTalk, etc.), that isn't necessary.
If you're thinking this must be horribly slow, that Python must have to do something like search some stack of namespaces for the method by name every time you call a method—well, it does that, but it's not as slow as you'd expect. For one thing, a namespace is a dict, so it's a constant-time search. And the strings are interned and have their hash values cached. And the interpreter can even cache the lookup results if it wants to. The result is still slower than dereferencing a pointer through a vtable, but not by a huge margin (and besides, there are plenty of other things in Python that can be 20x slower than C++, like for loops; you don't use pure Python when you need every detail to work as fast as possible).
1. C++ also has another problem: even if you defined a var attribute and a checker virtual method in a, you don't get to choose the order the initializers get called; the compiler automatically calls a's constructor first, then b's. In Python, it calls b.__init__, and you choose when you want to call super().__init__(), whether it's at the start of the method, at the end, in the middle, or even never.
First you are creating an instance of class b, which calls b's constructor __init__.
Inside that constructor you are setting the attribute self.var to 5. Later, super().__init__() calls the constructor of the parent class a.
Inside a's constructor, self.var is printed and self.checker() is called.
Note that super().__init__() passes the child class instance, self, as the first argument by default.
As you know, when a project's code base is very large there can be many attributes and functions defined in a class, some of which are never called through any instance of the class, and some of which may have been abandoned. Here is an example:
class Foo(object):
    """"""
    def __init__(self):
        self.a = 1
        self.b = 2
        self.c = 3
        ...
        self.y = 25
        self.z = 26

    def func1(self):
        pass

    def func2(self):
        pass

    def func3(self):
        pass

    ...
    ...

    def func100(self):
        pass

if __name__ == '__main__':
    f = Foo()
    f.func1()
    f.func2()
    print f.a, f.b, f.z
In the above code, the instance f of class Foo only calls func1() and func2(). How can I find all the attributes and functions of the class that are never used through any instance of the class?
I have tried the compiler module but it could not solve my problem. And dir(my_instance) just prints all the functions and attributes defined in the class.
Thanks in advance.
You can try coverage.py. It's not static analysis, but actually runs your code and records which statements are executed, outputting annotated html or txt as you wish (quite nicely formatted as well). You can then look for functions and methods whose bodies are not executed at all.
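For example, something along these lines (assuming your entry point is a script named myscript.py):

$ pip install coverage
$ coverage run myscript.py
$ coverage report -m    # per-file summary, including which line numbers never ran
$ coverage html         # annotated HTML report written to htmlcov/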
This still doesn't take care of unused attributes. And I don't know the answer to that. Maybe comment them out one at a time and see if tests still pass...
It's pretty hard to prove something is or is not used in the general case. Python is a dynamic language; if even one bit of code calls into code the static analyzer doesn't fully analyze, it could be accessing the variables mentioned.
The pylint and flake8 tools will tell you about local and global names that aren't defined prior to use (unless you break them by using from x import * style imports), and about imports that are never used (an import that is never used is usually wrong, but even then, it could be an intentional part of the interface, where linters would have to be silenced), but I don't believe they can tell you that a given attribute is never accessed; after all, someone else could import your module and access said attributes.
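For what it's worth, running them is as simple as (the module name is just a placeholder):

$ pip install pylint flake8
$ flake8 yourmodule.py    # undefined names, unused imports, unused locals, style issues
$ pylint yourmodule.py    # broader checks, including unused variables and arguments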
Use the cProfile module in the standard library.
python -m cProfile -o output_file myscript.py
Then load the stats file and use print_callees() to get all the functions that were called--during that run of the program.
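Loading the stats afterwards might look something like this (the file name matches the -o argument above):

import pstats

stats = pstats.Stats('output_file')
stats.strip_dirs().sort_stats('cumulative')
stats.print_callees()    # for each function that ran, the functions it went on to call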
I don't know of any easy way to find out which attributes are used.
Suppose we have the following structure:
import threading    # assumed: the snippet below references A.lock and A.instance

class A():
    lock = threading.Lock()    # assumed to exist for the singleton bookkeeping below
    instance = None

    class __A():
        def __to_be_mocked(self):
            # something here
            pass

    def __init__(self):
        with A.lock:
            if not A.instance:
                A.instance = A.__A()

    def __getattr__(self, name):
        return getattr(self.instance, name)
Now we want to mock the function __to_be_mocked. How can we mock it, given that the target accepted by mock.patch.object is package.module.ClassName? I have tried all sorts of targets like
target = A.__A
target = A.___A
and many more.
EDIT:
I solved it using
target=A._A__A and attribute as '_A__to_be_mocked'
Now the question is: __to_be_mocked is inside __A, so shouldn't it be ___A__to_be_mocked?
Is it because of setattribute in A or __init__ in A?
I have mocked a lot of things in Python, and after doing it many times I can say:
NEVER mock/patch __something attributes (AKA private attributes)
AVOID to mock/patch _something attributes (AKA protected attributes)
Private
If you mock private things you'll tangle production and test code together. When you do this kind of mock there is always a way to obtain the same behavior by patching or mocking public or protected stuff.
To explain better what I mean by tangling production and test code, I can use your example: to patch A.__B.__to_be_mocked() (I replaced the __A inner class with __B to make it clearer) you need to write something like
patch('amodule.A._A__B._B__to_be_mocked')
Now by patching __to_be_mocked you are spreading the A, B and to_be_mocked names through your test: that is exactly what I mean by tangled code. So if you need to change a name you have to go through all your tests and change your patches, and no refactoring tool will offer to change the _A__B._B string for you.
Now if you are disciplined and keep your tests clean, there may be just a few places where these names show up, but if it is a singleton I can bet they will sprout up like mushrooms.
I would like to point out that private and protected have nothing to do with security; they are just a way to make your code clearer. That point is crystal clear in Python, where you don't need to be a hacker to change private or protected attributes: these conventions are there just to help you read code and say "Oh great! I don't need to understand what that is ... it's just the dirty work." IMHO private attributes in Python fail this goal (__ is too long and seeing it really bothers me) and protected ones are just enough.
Side note: little example to understand python's private naming:
>>> class A():
...     class __B():
...         def __c(self):
...             pass
...
>>> a = A()
>>> dir(a)
['_A__B', '__doc__', '__module__']
>>> dir(a._A__B)
['_B__c', '__doc__', '__module__']
To come back to your case: how does your code use the __to_be_mocked() method? Is it possible to get the same effect by patching/mocking something else in the A class (and not in A.__A)?
Finally, if you are mocking a private method in order to sense something to test, you are in the wrong place: never test the dirty work; it should/may/can change without changing your tests. What you need to test is the code's behavior, not how it is written.
Protected
If you need to test, patch or mock protected stuff, maybe your class is hiding some collaborators: test it, use your tests to refactor your code, and then clean up your tests.
Disclaimer
Indeed, I have spread this kind of crap in my own tests, and then fought to remove it once I understood that I could do better.
Class & instance members starting with double underscores have their names rewritten to prevent collisions with same-name members in parent classes, making them behave as if "private". So __B here is actually accessible as A._A__B. (Underscore, class name, double underscored member name). Note that if you use the single-underscore convention (_B), no rewriting happens.
That being said, you'll rarely see anyone actually use this form of access and especially not in prod code as things are made "private" for a reason. For mocking, maybe, if there's no better way.
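For completeness, a sketch of what such a mock could look like using the mangled names (it assumes the class from the question lives in a module called amodule):

from unittest import mock

from amodule import A    # hypothetical module holding the class from the question

# Both the inner class and its method get mangled, each against its own class name.
with mock.patch.object(A._A__A, '_A__to_be_mocked', return_value='stubbed'):
    pass    # exercise the code that eventually calls __to_be_mocked here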
Both of these blocks of code work. Is there a "right" way to do this?
class Stuff:
    def __init__(self, x = 0):
        global globx
        globx = x

    def inc(self):
        return globx + 1

myStuff = Stuff(3)
print myStuff.inc()
Prints "4"
class Stuff:
    def __init__(self, x = 0):
        self.x = x

    def inc(self):
        return self.x + 1

myStuff = Stuff(3)
print myStuff.inc()
Also prints "4"
I'm a noob, and I'm working with a lot of variables in a class. Started wondering why I was putting "self." in front of everything in sight.
Thanks for your help!
You should use the second way; then every instance has its own x.
If you use a global variable then you may find you get surprising results when you have more than one instance of Stuff as changing the value of one will affect all the others.
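For instance, continuing the global-variable version from the question:

a = Stuff(3)
b = Stuff(10)      # creating a second instance rebinds the shared global
print(a.inc())     # 11, not 4 -- a's result changed because globx is shared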
It's normal to have explicit self's all over your Python code. If you try tricks to avoid that you will be making your code difficult to read for other Python programmers (and potentially introducing extra bugs)
There are two ways to get "class scope variables". One is to use self: these are called instance variables, and each instance of a class has its own copy of them. The other is to define variables in the class definition itself, which can be done like this:
class Stuff:
    globx = 0

    def __init__(self, x = 0):
        Stuff.globx = x
    ...
This is called a class attribute. It can be accessed directly as Stuff.globx and is owned by the class, not by the instances of the class, much like a static variable in Java.
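For example, with the class-attribute version above:

s1 = Stuff(3)
print(Stuff.globx)    # 3 -- stored on the class, not on s1
s2 = Stuff(10)        # rebinding through another instance's __init__...
print(Stuff.globx)    # 10 -- ...is visible everywhere, much like a static field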
You should never use the global statement for a "class scope variable", because it isn't one. A variable declared global lives in the global scope, i.e. the namespace of the module in which the class is defined.
Namespaces and related concepts are introduced in the Python tutorial here.
Those are very different semantically. self. means it's an instance variable, i.e. each instance has its own. This is probably the most common kind, but not the only one. Then there are class variables, defined at class level (and therefore at the time the class definition is executed) and accessible in class methods. They are the equivalent of most uses of static variables, and most probably what you want when you need to share stuff between instances (this is perfectly valid, although not automatically the one and only way to solve a given problem). You probably want one of those, depending on what you're doing. Really, we can't read your mind and tell you which one fits your problem.
Global variables are a different story. They're, well, global - everyone has the same one. This is almost never a good idea (for reasons explained on many occasions), but if you're just writing a quick and dirty script and need to share something between several places, they can be acceptable.
I'm new to python, and I've been reading that using global to pass variables to other functions is considered noobie, as well as a bad practice. I would like to move away from using global variables, but I'm not sure what to do instead.
Right now I have a UI I've created in wxPython as its own separate class, and I have another class that loads settings from a .ini file. Since the settings in the UI should match those in the .ini, how do I pass those values around? I could use something like Settings = Settings() and then define the variables as something like self.settings1, but then I would have to make Settings a global variable to pass it to my UI class (which it wouldn't be if I assigned it in main()).
So what is the correct and pythonic way to pass around these variables?
Edit: Here is the code that I'm working with, and I'm trying to get it to work like Alex Martelli's example. The following code is saved in Settings.py:
import ConfigParser

class _Settings():
    @property
    def enableautodownload(self): return self._enableautodownload

    def __init__(self):
        self.config = ConfigParser.ConfigParser()
        self.config.readfp(open('settings.ini'))
        self._enableautodownload = self.config.getboolean('DLSettings', 'enableautodownload')

settings = _Settings()
Whenever I try to refer to Settings.settings.enableautodownload from another file I get: AttributeError: 'module' object has no attribute 'settings'. What am I doing wrong?
Edit 2: Never mind about the issue, I retyped the code and it works now, so it must have been a simple spelling or syntax error.
The alternatives to global variables are many -- mostly:
explicit arguments to functions, classes called to create one of their instance, etc (this is usually the clearest, since it makes the dependency most explicit, when feasible and not too repetitious);
instance variables of an object, when the functions that need access to those values are methods on that same object (that's OK too, and a reasonable way to use OOP);
"accessor functions" that provide the values (or an object which has attributes or properties for the values).
Each of these (esp. the first and third ones) is particularly useful for values whose names must not be re-bound by all and sundry, but only accessed. The really big problem with global is that it provides a "covert communication channel" (not in the cryptographic sense, but in the literal one: apparently separate functions can actually be depending on each other, influencing each other, via global values that are not "obvious" from the functions' signatures -- this makes the code hard to test, debug, maintain, and understand).
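As a concrete sketch of the first alternative applied to your setup (the class and function names here are made up): build the settings object once in main() and hand it to the UI explicitly:

class MainFrame(object):
    def __init__(self, settings):
        self.settings = settings      # the dependency is visible right in the signature

def main():
    settings = _Settings()            # e.g. the .ini reader from your Settings.py
    frame = MainFrame(settings)       # passed explicitly; no global needed
    return frame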
For your specific problem, if you never use the global statement, but rather access the settings in a "read-only" way from everywhere (and you can ensure that more fully by making said object's attributes be read-only properties!), then having the "read-only" accesses be performed on a single, made-once-then-not-changed, module-level instance, is not too bad. I.e., in some module foo.py:
class _Settings(object):
    @property
    def one(self): return self._one
    @property
    def two(self): return self._two
    def __init__(self, one, two):
        self._one, self._two = one, two

settings = _Settings(23, 45)
and from everywhere else, import foo then just access foo.settings.one and foo.settings.two as needed. Note that I've named the class with a single leading underscore (just like the two instance attributes that underlie the read-only properties) to suggest that it's not meant to be used from "outside" the module -- only the settings object is supposed to be (there's no enforcement -- but any user violating such requested privacy is most obviously the only party responsible for whatever mayhem may ensue;-).
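Assumed usage from another module, given the foo.py sketch above:

import foo

print(foo.settings.one)    # 23
print(foo.settings.two)    # 45
foo.settings.one = 99      # AttributeError -- the read-only property rejects rebinding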