My colleague just pointed out that my use of inner-classes seemed to be "unpythonic". I guess it violates the "flat is better than nested" heuristic.
What do people here think? Are inner-classes something which are more appropriate to Java etc than Python?
NB : I don't think this is a "subjective" question. Surely style and aesthetics are objective within a programming community.
Related Question: Is there a benefit to defining a class inside another class in Python?
This may not deserve a [subjective] tag on StackOverflow, but it's subjective on the larger stage: some language communities encourage nesting and others discourage it. So why would the Python community discourage nesting? Because Tim Peters put it in The Zen of Python? Does it apply to every scenario, always, without exception? Rules should be taken as guidelines, meaning you should not switch off your brain when applying them. You must understand what the rule means and why it's important enough that someone bothered to make a rule.
The biggest reason I know to keep things flat is because of another philosophy: do one thing and do it well. Lots of little special purpose classes living inside other classes is a sign that you're not abstracting enough. I.e., you should be removing the need and desire to have inner classes, not just moving them outside for the sake of following rules.
But sometimes you really do have some behavior that should be abstracted into a class, and it's a special case that only obtains within another single class. In that case you should use an inner class because it makes sense, and it tells anyone else reading the code that there's something special going on there.
Don't slavishly follow rules.
Do understand the reason for a rule and respect that.
"Flat is better than nested" is focused on avoiding excessive nesting -- i.e., seriously deep hierarchies. One level of nesting, per se, is no big deal: as long as your nesting still respects (a weakish form of) the Law of Demeter, as typically confirmed by the fact that you don't find yourself writing stuff like onething.another.andyet.anotherone (too many dots in an expression are a "code smell" suggesting you've gone too deep in nesting and need to refactor to flatten things out), I wouldn't worry too much.
Actually, I'm not sure if I agree with the whole premise that "Flat is better than nested". Sometimes, quite often actually, the best way to represent something is hierarchically... Even nature itself, often uses hierarchy.
Related
When I started learning Python, I created a few applications just using functions and procedural code. However, now I know classes and realized that the code can be much readable (and subjectively easier to understand) if I rewrite it with classes.
How much slower the equivalent classes may get compared to the functions in general? Will the initializer, methods in the classes make any considerable difference in speed?
To answer the question: yes, it is likely to be a little slower, all else being equal. Some things that used to be variables (including functions) are now going to be object attributes, and self.foo is always going to be slightly slower than foo regardless of whether foo was a global or local originally. (Local variables are accessed by index, and globals by name, but an attribute lookup on an object is either a local or a global lookup, plus an additional lookup by name for the attribute, possibly in multiple places.) Calling a method is also slightly slower than calling a function -- not only is it slower to get the attribute, it is also slower to make the call, because a method is a wrapper object that calls the function you wrote, adding an extra function call overhead.
Will this be noticeable? Usually not. In rare cases it might be, say if you are accessing an object attribute a lot (thousands or millions of times) in a particular method. But in that case you can just assign self.foo to a local variable foo at the top of the method, and reference it by the local name throughout, to regain 99.44% of the local variable's performance advantage.
Beyond that there will be some overhead for allocating memory for instances that you probably didn't have before, but unless you are constantly creating and destroying instances, this is likely a one-time cost.
In short: there will be a likely-minor performance hit, and where the performance hit is more than minor, it is easy to mitigate. On the other hand, you could save hours in writing and maintaining the code, assuming your problem lends itself to an object-oriented solution. And saving time is likely why you're using a language like Python to begin with.
No.
In general you will not notice any difference in performance based on using classes or not. The different code structures implied may mean that one is faster than the other, but it's impossible to say which.
Always write code to be read, then if, and only if, it's not fast enough make it faster. Remember: Premature optimization is the root of all evil.
Donald Knuth, one of the grand old minds of computing, is credited with the observation that "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." Deciding to use procedural techniques rather than object-oriented ones on the basis of speed gains that may well not be realized anyway is not a sensible strategy.
If your code works and doesn't need to be modified then feel free to leave it alone. If it needs to be modified then you should consider a judicious refactoring to include classes, since program readability is far more important than speed during development. You will also see benefits in improved maintainability. An old saw from Kernighan and Plauger's "Elements of Programming Style" still applies:
First, make it work. Then (if it doesn't work fast enough) make it work faster.
But, first and foremost, go for readability. Seriously.
You probably don't care as much as you think you do.
Really.
Sure, code with classes might be a little slower through indirection. Maybe. That is what JIT compilation is for, right? I can never remember which versions of python do this and which don't, because:
Performance doesn't matter.
At least constant performance differences like this. Unless you are doing a hell of a lot of computations (you aren't!), you will spend more time developing/debugging/maintaining your code. Optimize for that.
Really. Because you will never ever be able to measure the difference, unless you are in a tight loop. And you don't want to be doing that in python anyway, unless you don't really care about time. It's not like you're trying to balance your segway in python, right? You just want to compute some numbers, right? Your computer is really good at this. Trust it.
That said, this doesn't mean classes are the way to go. Just that speed isn't the question you should be asking. Instead, try to figure out what representation will be the best for your code. It seems, now you know classes, you will write clean code in OO fashion. Go ahead. Learn. Iterate.
I have taught myself python in quite a haphazard way. So my question perhaps won't be very pythonic.
When I write functions in classes, I often lose overview of what each function does. I do try to name them properly. But still, they are sometimes smaller parts of code where it seems arbitrary in which functions to put them. So whenever I want to make changes, I still need to go through the entire code in order to figure out how my functions actually flow.
From what I understand, we have objects and we have functions, and these are our units for structuring our code. But this only gives you a flat structure. It doesn't allow you to do any kind of nesting, like in a tree diagram, over multiple levels. Especially, the code in my file doesn't automatically water itself so that the functions that are called first and more often automatically further up on the top, whereas helper functions would be automatically further down in the document, or even nested.
In fact, even being able to visually nest lower-order subroutines "inside" a higher-order function that calls it would seem helpful. But it's not something that would be supported by Python's syntax. (Plus, it wouldn't quite suffice, because a sub-routine might be used by several higher-order functions.)
I imagine it would be useful to see all functions in my code visualized as a tree, or as a concept map:
Where each function is a dot. And calling order visualized by arrows. This way, I would also easily see which functions are more central, and which are more outliers, or even orphaned.
Yet perhaps this isn't even a case for another tool. Perhaps this is more a case of me not understanding proper coding. Perhaps there is something I can do differently in order to get the kind of intuitive overview over how my program works, without needing another tool.
Firstly, I am not quite sure why this isn't asked more often. Reading code is not intuitive, at all! We should be able to visualize the evolution of a process or function so well that close to every one will be able to understand its behavior. In the 60s and so, people had to be reasonably sure their programs would run, because getting access to the computer would take time; today we execute or compile our program, run tests if we have them, and get to know immediately whether it works. What happened is there is less mental effort now, we execute a bit less code in our heads, and the computer a bit more. But we must still think of how the program behaves midst execution in order to debug. For the future, it would be nice if the computer could just tell us how the program behaves.
You propose looking at a sort of tree of the program as a resolute, and after all, the abstract syntax tree is literally a tree, but I don't think this is what we ought to spend our efforts on when it comes to visualizing systems. What would be preferable is if we could look at an interactive view of how the problem changes its intermediate data-structures as a function of time.
Currently, we have debuggers--but that's akin to looking at the issue by asking what a function is at many values, when we would much rather look at its graph. A lot of programming is done by doing something you feel is right, observing if the behavior correct, if it's not correct then we make modifications by reacting and correcting said behavior.
Bret Victor in his essay, Learnable Programming, discusses this topic. I highly recommend it, even though it won't help you right now, maybe you can help others in the future by making these ideas more prevalent.
Onwards, then to where I tell you what you can do right now. In his book Clean Code, Robert C. Martin proposes structuring code much like how a newspaper is laid out.
Think of a well-written newspaper article. You read it vertically. At the top you expect a headline that will tell you what the story is about and allows you to decide whether it is something you want to read. The first paragraph gives you a synopsis of the whole story, hiding all the details while giving you the broad-brush concepts. As you continue downward, the details increase until you have all the dates, names, quotes, claims, and other minutia.
What is proposed, is to organize your program top-down, with higher level procedures that call mid-level procedures, which in turn call the lower level procedures. At any place, it should be obvious that (1) you are at the appropriate level of abstraction, and (2) you are looking at the part of the program implementing the behavior you seek to modify.
This means storing state at the level where it belongs, and exposing it anywhere else. Procedures should take only the parameters they need, because more parameters means you must also think about more parameters when reasoning about the code.
This is the primary reason for decomposing procedures into smaller procedures. For example, here some code I've written previously. It's not perfect, but you can see very clearly which part of the program you need to go to if you want to change anything.
Of course, higher order procedures are listed before any other. I'm telling you what I'm going to do, before I show you how I do it.
function formatLinks(matches, stripped) {
let formatted_links = []
for (match of matches) {
let diff = difference(match, stripped)
if (isSimpleLink(diff)) {
formatted_links.push(formatAsSimpleLink(diff))
} else if (hasPrefix(diff)) {
formatted_links.push(formatAsPrefixedLink(diff))
} else if (hasSuffix(diff)) {
formatted_links.push(formatAsSuffixedLink(diff))
}
}
// There might be multiple links within a word
// in which case we join them and strip any duplicated parts
// (which are otherwise interpreted as suffixes)
if (formatted_links.length > 1) {
return combineLinks(formatted_links)
}
// Default case
return formatted_links[0]
}
If JavaScript was a typed language, and if we could see an image of the decisions made in the code, as a factor of input and time, this could be even better.
I think Quokka.js and VS Code Debug Visualizer are both doing interesting work in this sector.
I'm looking for more information about dealing with trees of class instances, and how best to go about calling methods on the leaves from the trunk. I have a trunk instance with many branch instances (in a dictionary), and each has many leaf instances (and dicts in the branches). The leaves are where the action really happens, and as such there are methods in the leaves for querying values, restoring values, and many other things.
This leads to what feels like duplication of code, as I might want to do something to all leaves of a branch, so there are methods in the branches for doing something to a leaf, a specified set of leaves, or all leaves known to that branch, though these do reduce the code duplication by simply looping over the leafs and asking them to do said things to themselves (thus the actual code doing the work is in one place in the leaf class).
Then the trunk comes in, where I might want to do something to the entire tree (i.e. all leaves) in one fell swoop, so I have methods there that ask all known objects to run their all-leaf functions. I start to feel pretty removed from the real action in the leaves this way, though it works fine, and the code seems fairly tight - extremely brief, readable, and functioning fine.
Another issue comes in logical groupings. There are bits of data I might want to associate with some, most, or all leaves, to indicate that they're part of some logical group, so currently the leaves themselves are all storing that kind of data. When I want to get a logical group, I have to scan all leaves and gather them back up, rather than having some sort of list at the trunk level. This actually all works fine, and is even pretty logical, yet it feels insane. Is this simply the nature of working with tree-like structures, because of their complexity, or are there other ways of doing these kinds of things? I prefer not to build secondary structures to connect to things from the opposite direction - e.g. making a structure with references to the leaves in a logical group, approaching them then from that more list-like direction. One bonus of keeping things all in a large tree like this is that it can be dumped and loaded in one shot with pickle.
I'd love to hear thoughts - any and all - from anyone else's experience with such things.
What I'm taking away from your question is that "everything works", but that the code is starting to feel unmanagable and difficult to reason about, and: is there a better way to do this?
The one thing your question is missing is a solid context. What sort of problem is your tree structure actually solving? What do these object actually do? Are they all the same type of object, or is there a mix of objects? With some of these specifics you might get more practical responses.
As it stands, I would suggest checking out some resources on design patterns. Specifically the composite and visitor patterns.
On the book end of things you could have a look at Design Patterns and/or Refactoring to Patterns. Neither of these have any Python code in them, but if you don't mind Java, the latter is an excellent introduction to taking hard to reason code structures and using a pattern to better organize things.
You might also have a look at Alex Martelli's talk on Python Design Patterns.
This question has some further resource links regarding patterns and python in general.
Hope that helps.
Okay so i am currently working on an inhouse statistics package for python, its mainly geared towards a combination of working with arcgis geoprocessor, for modeling comparasion and tools.
Anyways, so i have a single class, that calculates statistics. Lets just call it Stats. Now my Stats class, is getting to the point of being very large. It uses statistics calculated by other statistics, to calculate other statistics sets, etc etc. This leads to alot of private variables, that are kept simply to prevent recalculation. however there is certain ones, while used quite frequintly they are often only used by one or two key subsections of functionality. (e.g. summation of matrix diagonals, and probabilities). However its starting to become a major eyeesore, and i feel as if i am doing this terribly wrong.
So is this bad?
I was recommended by a coworker, to simply start putting core and common functionality togther, in the main class, then simply having capsules, that take a reference to the main class, and simply do what ever functionality they need to within themselves. E.g. for calculating accuracy of model predictions, i would create a capsule, who simply takes a reference to the parent, and it will offload all of the calculations needed, for model predictions.
Is something like this really a good idea? Is there a better way? Right now i have over a dozen different sub statistics that are dumped to a text file to make a smallish report. The code base is growing, and i would just love it if i could start splitting up more and more of my python classes. I am just not sure really what the best way about doing stuff like this is.
Why not create a class for each statistic you need to compute and when of the statistics requires other, just pass an instance of the latter to the computing method? However, there is little known about your code and required functionalities. Maybe you could describe in a broader fashion, what kind of statistics you need calculate and how they depend on each other?
Anyway, if I had to count certain statistics, I would instantly turn to creating separate class for each of them. I did once, when I was writing code statistics library for python. Every statistic, like how many times class is inherited or how often function was called, was a separate class. This way each of them was simple, however I didn't need to use any of them in the other.
I can think of a couple of solutions. One would be to simply store values in an array with an enum like so:
StatisticType = enum('AveragePerDay','MedianPerDay'...)
Another would be to use a inheritance like so:
class StatisticBase
....
class AveragePerDay ( StatisticBase )
...
class MedianPerDay ( StatisticBase )
...
There is no hard and fast rule on "too many", however a guideline is that if the list of fields, properties, and methods when collapsed, is longer than a single screen full, it's probably too big.
It's a common anti-pattern for a class to become "too fat" (have too much functionality and related state), and while this is commonly observed about "base classes" (whence the "fat base class" monicker for the anti-pattern), it can really happen without any inheritance involved.
Many design patterns (DPs for short_ can help you re-factor your code to whittle down the large, untestable, unmaintainable "fat class" to a nice package of cooperating classes (which can be used through "Facade" DPs for simplicity): consider, for example, State, Strategy, Memento, Proxy.
You could attack this problem directly, but I think, especially since you mention in a comment that you're looking at it as a general class design topic, it may offer you a good opportunity to dig into the very useful field of design patterns, and especially "refactoring to patterns" (Fowler's book by that title is excellent, though it doesn't touch on Python-specific issues).
Specifically, I believe you'll be focusing mostly on a few Structural and Behavioral patterns (since I don't think you have much need for Creational ones for this use case, except maybe "lazy initialization" of some of your expensive-to-compute state that's only needed in certain cases -- see this wikipedia entry for a pretty exhaustive listing of DPs, with classification and links for further explanations of each).
Since you are asking about best practices you might want to check out pylint (http://www.logilab.org/857). It has many good suggestions about code style including ones relating to how many private variables in a class.
I have read a few articles around alternatives to the switch statement in Python. Mainly using dicts instead of lots of if's and elif's. However none really answer the question: is there one with better performance or efficiency? I have read a few arguments that if's and elifs would have to check each statement and becomes inefficient with many ifs and elif's. However using dicts gets around that, but you end up having to create new modules to call which cancels the performance gain anyways. The only difference in the end being readability.
Can anyone comment on this, is there really any difference in the long run? Does anyone regularly use the alternative? Only reason I ask is because I am going to end up having 30-40 elif/if's and possibly more in the future. Any input is appreciated. Thanks.
dict's perfomance is typically going to be unbeatable, because a lookup into a dict is going to be O(1) except in rare and practically never-observed cases (where they key involves user-coded types with lousy hashing;-). You don't have to "create new modules" as you say, just arbitrary callables, and that creation, which is performed just once to prep the dict, is not particularly costly anyway -- during operation, it's just one lookup and one call, greased lightning time.
As others have suggested, try timeit to experiment with a few micro-benchmarks of the alternatives. My prediction: with a few dozen possibilities in play, as you mention you have, you'll be slapping your forehead about ever considering anything but a dict of callables!-)
If you find it too hard to run your own benchmarks and can supply some specs, I guess we can benchmark the alternatives for you, but it would be really more instructive if you tried to do it yourself before you ask SO for help!-)
Your concern should be about the readability and maintainability of the code, rather than its efficiency. This applies in most scenarios, and particularly in the one you describe now. The efficiency difference is likely to be negligible (you can easily check it with a small amount of benchmarking code), but 30-40 elif's are a warning sign - perhaps something can be abstracted away and make the code more readable. Describe your case, and perhaps someone can come up with a better design.
With all performance/profiling questions, the right answer is "test each case yourself for your specific needs."
One great tool for this is timeit which you can learn about in the python docs.
In general I have seen no performance issues related to using a dictionary in place of other languages switch statement. My guess would be that the comparison in performance would depend on the number of alternatives. Who knows, there may be a tipping point where one becomes better than the other.
If you (or anyone else) tests it, feel free to post your results.
Times when you'd use a switch in many languages you would use a dict in Python. A switch statement, if added to Python (it's been considered), would not be able to give any real performance gain anyways.
dicts are used ubiquitously in Python. CPython dicts are an insanely-efficient, robust hashtable implementation. Lookup is O(1), as opposed to traversing an elif chain, which is O(n). (30-40 probably doesn't qualify as big enough for this to matter tons anyways). I am not sure what you mean about creating new modules to call, but using dicts is very scalable and easy.
As for actual performance gain, that is impossible to really tackle effectively abstractly. Write your code in the most straightforward and maintainable way (you're using Python forgoshsakes!) and then see if it's too slow. If it is, profile it and find out what places it needs to be sped up to make a real difference.
I think a dict will gain advantage over the alternative sequence of if statements as the number of cases goes up, since the key lookup only requires one hash operation. Otherwise if you only have a few cases, a few if statements are better. A dict is probably a more elegant solution for what you are doing. Either way, the performance difference wont really be noticeable in your case.
I ran a few benchmarks (see here). Using lists of function pointers is fastest,if your keys are sequential integers. For the general case: Alex Martelli is right, dictionary are fastest.