What is behind the R0902 limit? Will too many instance attributes slow down the python interpreter or is it simply because too many instance attributes will make the class harder to understand?
From the (DOCS):
too-many-instance-attributes (R0902):
Too many instance attributes (%s/%s) Used when class has too many instance attributes, try to reduce this to get a simpler (and so easier to use) class.
And then from the (Tutorial):
... but I have run into error messages that left me with no clue about what went wrong, simply because I was unfamiliar with the underlying mechanism of code theory. One error that puzzled my newbie mind was:
:too-many-instance-attributes (R0902): *Too many instance attributes (%s/%s)*
I get it now thanks to Pylint pointing it out to me. If you don’t get that one, pour a fresh cup of coffee and look into it - let your programmer mind grow!
So, yes that is not so clear.
But, as stated in the description above, with methods, classes and modules, smaller is generally better for understand-ability, re-usability and therefore manageability. When things get too large, it is often a sign that things could be refactored, to make them smaller. There is really no way to place a hard limit on this, and as disccussed in this question about how to turn this message off, pylint should not have the last word.
So, just take the message as a hint to study that section of code a bit more.
Related
I'm learning python and came into a situation where I need to change the behvaviour of a function. I'm initially a java programmer so in the Java world a change in a function would let Eclipse shows that a lot of source files in Java has errors. That way I can know which files need to get modified. But how would one do such a thing in python considering there are no types?! I'm using TextMate2 for python coding.
Currently I'm doing the brute-force way. Opening every python script file and check where I'm using that function and then modify. But I'm sure this is not the way to deal with large projects!!!
Edit: as an example I define a class called Graph in a python script file. Graph has two objects variables. I created many objects (each with different name!!!) of this class in many script files and then decided that I want to change the name of the object variables! Now I'm going through each file and reading my code again in order to change the names again :(. PLEASE help!
Example: File A has objects x,y,z of class C. File B has objects xx,yy,zz of class C. Class C has two instance variables names that should be changed Foo to Poo and Foo1 to Poo1. Also consider many files like A and B. What would you do to solve this? Are you serisouly going to open each file and search for x,y,z,xx,yy,zz and then change the names individually?!!!
Sounds like you can only code inside an IDE!
Two steps to free yourself from your IDE and become a better programmer.
Write unit tests for your code.
Learn how to use grep
Unit tests will exercise your code and provide reassurance that it is always doing what you wanted it to do. They make refactoring MUCH easier.
grep, what a wonderful tool grep -R 'my_function_name' src will find every reference to your function in files under the directory src.
Also, see this rather wonderful blog post: Unix as an IDE.
Whoa, slow down. The coding process you described is not scalable.
How exactly did you change the behavior of the function? Give specifics, please.
UPDATE: This all sounds like you're trying to implement a class and its methods by cobbling together a motley patchwork of functions and local variables - like I wrongly did when I first learned OO coding in Python. The code smell is that when the type/class of some class internal changes, it should generally not affect the class methods. If you're refactoring all your code every 10 mins, you're doing something seriously wrong. Step back and think about clean decomposition into objects, methods and data members.
(Please give more specifics if you want a more useful answer.)
If you were only changing input types, there might be no need to change the calling code.
(Unless the new fn does something very different to the old one, in which case what was the argument against calling it a different name?)
If you changed the return type, and you can't find a common ancestor type or container (tuple, sequence etc.) to put the return values in, then yes you need to change its caller code. However...
...however if the function should really be a method of a class, declare that class and the method already. The previous paragraph was a code smell that your function really should have been a method, specifically a polymorphic method.
Read about code smells, anti-patterns and When do you know you're dealing with an anti-pattern?. There e.g. you will find a recommendation for the video "Recovery from Addiction - A taste of the Python programming language's concision and elegance from someone who once suffered an addiction to the Java programming language." - Sean Kelly
Also, sounds like you want to use Test-Driven Design and add some unittests.
If you give us the specifics we can critique it better.
You won't get this functionality in a text editor. I use sublime text 3, and I love it, but it doesn't have this functionality. It does however jump to files and functions via its 'Goto Anything' (Ctrl+P) functionality, and its Multiple Selections / Multi Edit is great for small refactoring tasks.
However, when it comes to IDEs, JetBrains pycharm has some of the amazing re-factoring tools that you might be looking for.
The also free Python Tools for Visual Studio (see free install options here which can use the free VS shell) has some excellent Refactoring capabilities and a superb REPL to boot.
I use all three. I spend most of my time in sublime text, I like pycharm for refactoring, and I find PT4VS excellent for very involved prototyping.
Despite python being a dynamically typed language, IDEs can still introspect to a reasonable degree. But, of course, it won't approach the level of Java or C# IDEs. Incidentally, if you are coming over from Java, you may have come across JetBrains IntelliJ, which PyCharm will feel almost identical to.
One's programming style is certainly different between a statically typed language like C# and a dynamic language like python. I find myself doing things in smaller, testable modules. The iteration speed is faster. And in a dynamic language one relies less on IDE tools and more on unit tests that cover the key functionality. If you don't have these you will break things when you refactor.
One answer only specific to your edit:
if your old code was working and does not need to be modified, you could just keep old names as alias of the new ones, resulting in your old code not to be broken. Example:
class MyClass(object):
def __init__(self):
self.t = time.time()
# creating new names
def new_foo(self, arg):
return 'new_foo', arg
def new_bar(self, arg):
return 'new_bar', arg
# now creating functions aliases
foo = new_foo
bar = new_bar
if your code need rework, rewrite your common code, execute everything, and correct any failure. You could also look for any import/instantiation of your class.
One of the tradeoffs between statically and dynamically typed languages is that the latter require less scaffolding in the form of type declarations, but also provide less help with refactoring tools and compile-time error detection. Some Python IDEs do offer a certain level of type inference and help with refactoring, but even the best of them will not be able to match the tools developed for statically typed languages.
Dynamic language programmers typically ensure correctness while refactoring in one or more of the following ways:
Use grep to look for function invocation sites, and fix them. (You would have to do that in languages like Java as well if you wanted to handle reflection.)
Start the application and see what goes wrong.
Write unit tests, if you don't already have them, use a coverage tool to make sure that they cover your whole program, and run the test suite after each change to check that everything still works.
If I use a single leading underscore for an attribute in a class, would it be wrong of me to access it from a different object? Is a single underscore saying "I will use this as I please but you the user shouldn't touch it" or should even the developer treat it as though it were private?
The single underscore indicates that it's not for public consumption; code within the same package is welcome to poke and prod it.
I think you should refrain from doing so as much as reasonably possible because it breaks encapsulation which one of the benefits of the object-oriented paradigm. While some feel it's fine as certain higher levels of scoping, say module or package, I've found avoiding doing so a useful rule-of-thumb even within the various methods of a single class -- at least if it's fairly complicated, subtle, or just an aspect I think I might want to change later.
The reason is every time you do something with one, you're creating a dependency between one object's internals and where you're coding, be it another part of the same object or module or whatever. The more of this there is, the harder it will be later to maintain or enhance things.
Another aspect to consider is the fact that whenever you feel a need to do so, it may be indicative of a need for a better design -- in which cause you can treat it a warning sign and react accordingly.
Creating an interface for doing operations that deal with internal details often means having to design and implement more code than not doing so would. So in each case, the benefits must be weighed against the costs to determine if it's worth it. Eventually experience (and education if it's ongoing) makes such decisions easier maybe even instinctual.
I would like to know if there is a way of writing the below module code without having to add another indentation level the whole module code.
# module code
if not condition:
# rest of the module code (big)
I am looking for something like this:
# module code
if condition:
# here I need something like a `return`
# rest of the module code (big)
Note, I do not want to throw an Exception, the import should pass normally.
I don't know of any solution to that, but I guess you could put all your code in an internal module and import that if the condition is not met.
I know of no way to do this. The only thing I could imagine that would work would be return but that needs to be inside a function.
It's super hard to say without knowing what your higher-level goal is. (For instance, what is the condition? Why does it matter? Are you DEAD SURE you're not having an X-Y problem here? Can't you just tell us what your overall goal is?) It's also really hard to say without knowing how the module is going to be called. (As a script from the command line? By being imported by another module?) And it would help a lot to know (a) why you're trying to avoid indentation (WWII is over, and we don't need to ration spaces any more; or, to put it more kindly, Python is a language that uses indentation as a SYNTACTIC FEATURE, so saying "I can't use this syntactic feature" strikes many people as a weird constraint. It's like giving up if-then tests: you might theoretically be able to work around that constraint, possibly, sometimes, but why are you going into the boxing ring with your hands tied behind your back?), and (b) why you can't throw an exception (no, really: are you TOTALLY SURE you ABSOLUTELY CANNOT THROW ANY EXCEPTIONS AT ALL?).
As it is, all you've really done is ask a "how do I do X, given conditions A, B, and C?" question, without indicating why you want to do X, or why conditions A, B, and C exist, or even whether you're 100% sure they exist and cannot be worked around.
If what you're really saying is "I don't want to hit {TAB} 40 times while writing a function," then the real problem is that you need a better text editor. If what you're really saying is "I happen to find indentation to be aesthetically unpleasant," then you should think about (a) what the other side of the argument is; that is, why people Python's use of indentation as syntax to be useful; (b) whether your own aesthetic preferences in this regard are more important than the reasons you've come up with in (a); and (c) whether, given these things, Python is the right tool for you personally to be using to accomplish whatever your own larger-scale goal is. (It's OK to not like indentation as a syntactic feature; but this is so basic to Python that being philosophically opposed to it to an extent that rules it out is a strong indication that maybe Python is not the ideal language for you to accomplish your programming goals in.) If what you're really saying is that you would benefit from factoring code that needs to be run under two different sets of circumstances into two modules, then it would benefit you to refactor. If what you're saying is that you've got spaghetti code that winds up being totally impossible to refactor, then that's really the first problem to be addressed, before you try to abort module imports.
A friend was "burned" when starting to learn Python, and now sees the language as perhaps fatally flawed.
He was using a library and changed the value of an object's attribute (the class being in the library), but he used the wrong abbreviation for the attribute name. It took him "forever" to figure out what was wrong. His objection to Python is thus that it allows one to accidentally add attributes to an object.
Unit tests don't provide a solution to this. One doesn't write unit tests against an API being used. One may have a mock for the class, but the mock could have the same typo or incorrect assumption about the attribute name.
It's possible to use __setattr__() to guard against this, but (as far as I know), no one does.
The only thing I've been able to tell my friend is that after several years of writing Python code full-time, I don't recall ever being burned by this. What else can I tell him?
"changed the value of an object's attribute" Can lead to problems. This is pretty well known. You know it, now, also. That doesn't indict the language. It simply says that you've learned an important lesson in dynamic language programming.
Unit testing absolutely discovers this. You are not forced to mock all library classes. Some folks say it's only a unit test when it's tested in complete isolation. This is silly. You have to trust the library modules -- it's a feature of your architecture. Rather than mock them, just use them. (It is important to write mocks for your own newly-developed libraries. It's also important to mock libraries that make expensive API calls.)
In most cases, you can (and should) test your classes with the real library modules. This will find the misspelled attribute name.
Also, now that you know that attributes are dynamic, it's really easy to verify that the attribute exists. How?
Use interactive Python to explore the classes before writing too much code.
Remember, Python is not Java and it's not C. You can execute Python interactively and determine immediately if you've spelled something wrong. Writing a lot of code without doing any interactive confirmation is -- simply -- the wrong way to use Python.
A little interactive exploration will find misspelled attribute names.
Finally -- for your own classes -- you can wrap updatable attributes as properties. This makes it easier to debug any misspelled attribute names. Again, you know to check for this. You can use interactive development to confirm the attribute names.
Fussing around with __setattr__ creates problems. In some cases, we actually need to add attributes to an object. Why? It's simpler than creating a whole subclass for one special case where we have to maintain more state information.
Other things you can say:
I was burned by a C program that absolutely could not be made to work because of ______. [Insert any known C-language problem you want here. No array bounds checking, for example] Does that make C fatally flawed?
I was burned by a DBA who changed a column name and all the SQL broke. It's painful to unit test all of it. Does that make the relational database fatally flawed?
I was burned by a sys admin who changed a directory's permissions and my application broke. It was nearly impossible to find. Does that make the OS fatally flawed?
I was burned by a COBOL program where someone changed the copybook, forgot to recompile the program, and we couldn't debug it because the source looked perfect. COBOL, however, actually is fatally flawed, so this isn't a good example.
There are code analyzers like pylint that will warn you if you add a attribute outside of __init__. PyDev has nice support for it. Such errors are very easy to find with a debugger too.
If the possibility to make mistakes is enough for him to consider a language "fatally flawed", I don't think you can convince him otherwise. The more you can do with a language, the more you can do wrong with the language. It's a caveat of flexibility—but that's true for any language.
You can use the __slots__ class attribute to limit the attributes that instances have. Attempting to set an attribute that's not expliticly listed will raise an AttributeError. There are some complications that arise with subclassing. See the Python data model reference for details.
A tool like pylint or pychecker may be able to detect this.
He's effectively ruling out an entire class of programming languages -- dynamically-typed languages -- because of one hard lesson learned. He can use only statically-typed languages if he wishes and still have a very productive career as a programmer, but he is certainly going to have deep frustrations with them as well. Will he then conclude that they are fatally-flawed?
I think your friend has misplaced his frustration in the language. His real problem is lack of debugging techniques. teach him how to break down a program into small pieces to examine the output. like a manual unit test, this way any inconsistency is found and any assumptions are proven or discarded.
I had a similar bad experience with Python when I first started ... took me 3 months to get over it. Having a tool which warns would be nice back then ...
I am sorry all - I am not here to blame Python. This is just a reflection on whether what I believe is right. Being a Python devotee for two years, I have been writing only small apps and singing Python's praises wherever I go. I recently had the chance to read Django's code, and have started wondering if Python really follows its "readability counts" philosophy. For example,
class A:
a = 10
b = "Madhu"
def somemethod(self, arg1):
self.c = 20.22
d = "some local variable"
# do something
....
...
def somemethod2 (self, arg2):
self.c = "Changed the variable"
# do something 2
...
It's difficult to track the flow of code in situations where the instance variables are created upon use (i.e. self.c in the above snippet). It's not possible to see which instance variables are defined when reading a substantial amount of code written in this manner. It becomes very frustrating even when reading a class with just 6-8 methods and not more than 100-150 lines of code.
I am interested in knowing if my reading of this code is skewed by C++/Java style, since most other languages follow the same approach as them. Is there a Pythonic way of reading this code more fluently? What made Python developers adopt this strategy keeping "readability counts" in mind?
The code fragment you present is fairly atypical (which might also because you probably made it up):
you wouldn't normally have an instance variable (self.c) that is a floating point number at some point, and a string at a different point. It should be either a number or a string all the time.
you normally don't bring instance variables into life in an arbitrary method. Instead, you typically have a constructor (__init__) that initializes all variables.
you typically don't have instance variables named a, b, c. Instead, they have some speaking names.
With these fixed, your example would be much more readable.
A sufficiently talented miscreant can write unreadable code in any language. Python attempts to impose some rules on structure and naming to nudge coders in the right direction, but there's no way to force such a thing.
For what it's worth, I try to limit the scope of local variables to the area where they're used in every language that i use - for me, not having to maintain a huge mental dictionary makes re-familiarizing myself with a bit of code much, much easier.
I agree that what you have seen can be confusing and ought to be accompanied by documentation. But confusing things can happen in any language.
In your own code, you should apply whatever conventions make things easiest for you to maintain the code. With respect to this particular issue, there are a number of possible things that can help.
Using something like Epydoc, you can specify all the instance variables a class will have. Be scrupulous about documenting your code, and be equally scrupulous about ensuring that your code and your documentation remain in sync.
Adopt coding conventions that encourage the kind of code you find easiest to maintain. There's nothing better than setting a good example.
Keep your classes and functions small and well-defined. If they get too big, break them up. It's easier to figure out what's going on that way.
If you really want to insist that instance variables be declared before referenced, there are some metaclass tricks you can use. e.g., You can create a common base class that, using metaclass logic, enforces the convention that only variables that are declared when the subclass is declared can later be set.
This problem is easily solved by specifying coding standards such as declaring all instance variables in the init method of your object. This isn't really a problem with python as much as the programmer.
If what the code is doing becomes mysterious for some reason .. there should either be comments or the function names should make it obvious.
This is just my opinion though.
I personally think not having to declare variables is one of the dangerous things in Python, especially when doing classes. It is all too easy to accidentally create a variable by simple mistyping and then boggle at the code at length, unable to find the mistake.
Adding a property just before you need it will prevent you from using it before it's got a value. Personally, I always find classes hard to follow just from reading source - I read the documentation and find out what it's supposed to do, and then it usually makes sense when I read the source again.
The fact that such stuff is allowed is only useful in rare times for prototyping; while Javascript tends to allow anything and maybe such an example could be considered normal (I don't really know), in Python this is mostly a negative byproduct of omission of type declaration, which can help speeding up development - if you at some point change your mind on the type of a variable, fixing type declarations can take more time than the fixes to actual code, in some cases, including the renaming of a type, but also cases where you use a different type with some similar methods and no superclass/subclass relationship.