Python: Separate class methods from the class in an own package - python

I wrote a class with many different parameters, depending on the parameter the class uses different actions.
I can easily differentiate my methods into different cases, each set of methods belonging to certain parameters.
This resulted in a huge .py file, implementing all methods in the one class. For better readability, is it possible to write groups of methods in their own file and load it (similar to a package) into the class so that they are treated as class methods?
To give more details, my class is a decision tree. One parameter, for example, is the pruning method used to shrink the tree. As I use different pruning methods, this takes a lot of lines in my class. I need a set of methods for each pruning parameter. It would be nice to simply load the methods for pruning from another file into the class and thereby shrink the size of my decision tree .py file.

For better readability, I have a few suggestions.
Collapse all function definitions, which takes one click in most popular text editors.
Place related methods next to each other.
Give methods proper names that categorise and differentiate them, for example:
def search_breadth_first(): pass
def search_deep_first(): pass
Regarding splitting a huge class along object-oriented lines, my rule of thumb is to consider reusability. If functions can be reused by other classes, they should be put in separate files and made independent (static) of other classes.
If the methods are tied to the class and used nowhere else, it is better to keep them within the class itself. Think of it this way: to review the code, you need to refer to the class properties anyway. Logically, it doesn't make much sense to split files just for the sake of splitting.
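If the pruning routines do turn out to be reusable, one possible way to apply the rule of thumb above is to keep them as plain functions in their own module and let the tree select one by parameter. This is only a minimal sketch in a single file: the module names pruning.py and decision_tree.py, the functions prune_by_depth and prune_nothing, the PRUNERS table and the nested-dict tree are all hypothetical and not taken from the question.

```python
# Hypothetical contents of pruning.py: plain functions that do not depend on the tree class.
def prune_by_depth(node, max_depth, depth=0):
    """Return a copy of the nested-dict tree cut off below max_depth."""
    if depth >= max_depth or not node.get("children"):
        return {"label": node["label"], "children": []}
    return {
        "label": node["label"],
        "children": [prune_by_depth(c, max_depth, depth + 1) for c in node["children"]],
    }


def prune_nothing(node, **_ignored):
    """Leave the tree unchanged."""
    return node


# Map the value of the pruning parameter to the function that implements it.
PRUNERS = {"depth": prune_by_depth, "none": prune_nothing}


# Hypothetical contents of decision_tree.py (it would do: from pruning import PRUNERS).
class DecisionTree:
    def __init__(self, root, pruning="none", **pruning_options):
        self.root = root
        self.pruning = pruning
        self.pruning_options = pruning_options

    def prune(self):
        # The class only dispatches; the pruning logic stays in the other module.
        self.root = PRUNERS[self.pruning](self.root, **self.pruning_options)
        return self.root


tree = DecisionTree(
    {"label": "a", "children": [{"label": "b", "children": [{"label": "c", "children": []}]}]},
    pruning="depth",
    max_depth=1,
)
pruned = tree.prune()
```

The class stays short because it only looks up the function; whether this is worth it still depends on whether the functions are really reusable elsewhere.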

Related

Both Inheritance and composition in Python, bad practice?

I'm working on a project where the natural approach is to implement a main object with sub-components based on different classes, e.g. a PC consisting of CPU, GPU, ...
I've started with a composition structure, where the components have attributes and functions inherent to their sub-system and whenever higher level attributes are needed, they are given as arguments.
Now, as I'm adding more functionality, it would make sense to have different types of the main object, e.g. a notebook, which would extend the PC class, but still have a CPU, etc. At the moment, I'm using a separate script, which contains all the functions related to the type.
Would it be considered bad practice to combine inheritance and composition, by using child classes for different types of the main object?
In short
Preferring composition over inheritance does not exclude inheritance, and does not automatically make the combination of both a bad practice. It's about making an informed decision.
More details
The recommendation to prefer composition over inheritance is a rule of thumb. It was coined by the GoF (the "Gang of Four", authors of the book Design Patterns). If you read their full argument, you'll see that it's not about composition being good and inheritance bad; it's that composition is more flexible and therefore more suitable in many cases.
But you'll need to decide case by case. And indeed, if you consider some variant of the composite pattern, specialization of the leaf and composite classes can be perfectly justified in some situations:
polymorphism could avoid a lot of if and cases,
composition could in some circumstances require additional call-forwarding overhead that might not be necessary when it's really about type specialization.
combination of composition and inheritance could be used to get the best of both worlds (caution: if applied carelessly, it could also give the worst of both worlds)
Note: If you provide a short overview of the context with a UML diagram, more arguments could be given for your particular case. Meanwhile, this question on SE could also be of interest.
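To make the combination concrete, here is a small sketch of the PC example from the question, assuming made-up class members (cores, model, power_source, summary) that the question does not specify. The PC is composed of a CPU and a GPU, while Notebook specializes PC through inheritance and overrides only what differs.

```python
class CPU:
    def __init__(self, cores):
        self.cores = cores

    def describe(self):
        return f"{self.cores}-core CPU"


class GPU:
    def __init__(self, model):
        self.model = model

    def describe(self):
        return f"GPU {self.model}"


class PC:
    """Composition: a PC *has* a CPU and a GPU."""

    def __init__(self, cpu, gpu):
        self.cpu = cpu
        self.gpu = gpu

    def power_source(self):
        return "mains"

    def summary(self):
        parts = f"{self.cpu.describe()}, {self.gpu.describe()}"
        return f"{type(self).__name__}: {parts}, power: {self.power_source()}"


class Notebook(PC):
    """Inheritance: a Notebook *is a* PC that overrides only what differs."""

    def power_source(self):
        return "battery"


machines = [PC(CPU(8), GPU("X")), Notebook(CPU(4), GPU("Y"))]
# Polymorphism: no if/else on the machine type is needed here.
summaries = [m.summary() for m in machines]
```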

How to deal with global parameters in a scientific Python script

I am writing a piece of scientific software in Python which comprises both a Poisson equation solver (using the Newton method) on a rectangular mesh, and a particle-in-cell code. I've written the Newton Solver and the particle-in-cell code as separate functions, which are called by my main script.
I had originally written the code as one large script, but decided to break up the script so that it was more modular, and so that the individual functions could be called on their own. My problem is that I have a large number of "global" variables which I consider parameters for the problem. This includes mostly problem constants and parameters which define the problem geometry and mesh (such as dimensions, locations of certain boundary conditions, boundary conditions etc.).
These parameters are required by both the main script and the individual functions. My question is: What is the best way (and most proper) to store these variables such that they can be accessed by both the main script and the functions.
My current solution is to define a class in a separate module (parameters.py) as so:
class Parameters:
    length = 0.008
    width = 0.0014
    nz = 160
    nr = 28
    dz = length/nz
    dr = width/nr
    ...
In my main script I then have:
from parameters import Parameters
par = Parameters()
coeff_a = -2 * (1/par.dr**2 + 1/par.dz**2)
...
This method allows me to use par as a container for my parameters which can be passed to any function I want. It also provides an easy way to set up the problem space to run just one of the functions on its own. My only concern is that each function does not require everything stored in par, and hence it seems inefficient to pass it along all the time. I could probably remove many of the parameters from par, but then I would need to recalculate them every time a function is called, which seems even more inefficient.
Is there a standard solution which people use in these scenarios? I should mention that my functions are not changing the attributes of par, just reading them. I am also interested in achieving high performance, if possible.
Generally, when your program requires many parameters in different places, it makes sense to come up with a neat configuration system, usually a class that provides a certain interface to your own code.
Upon instantiation of that class, you have a configuration object at hand which you can pass around. In some places you might want to populate it, in other places you just might want to use it. In any case, this configuration object will be globally accessible. If your program is a Python package, then this configuration mechanism might be written in its own module which you can import from all other modules in your package.
The configuration class might provide useful features such as parameter registration (a certain code section says that it needs a certain parameter to be set), definition of defaults and parameter validation.
The actual population of parameters is then based on defaults, user-given commandline arguments or user-given input files.
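As a rough illustration of such a configuration class, here is a minimal sketch with parameter registration, defaults and validation. The Config class and its register/set/get methods are hypothetical names chosen for this example, not an existing library; the parameter values are taken from the question.

```python
class Config:
    """Hypothetical configuration object with registration, defaults and validation."""

    def __init__(self):
        self._validators = {}
        self._values = {}

    def register(self, name, default, validator=lambda value: True):
        # A code section declares that it needs `name` and supplies a default.
        self._validators[name] = validator
        self.set(name, default)

    def set(self, name, value):
        if name not in self._validators:
            raise KeyError(f"unknown parameter: {name}")
        if not self._validators[name](value):
            raise ValueError(f"invalid value for {name}: {value!r}")
        self._values[name] = value

    def get(self, name):
        return self._values[name]


config = Config()
config.register("length", 0.008, validator=lambda v: v > 0)
config.register("nz", 160, validator=lambda v: isinstance(v, int) and v > 0)

# Later, e.g. after parsing command-line arguments or an input file:
config.set("nz", 320)
dz = config.get("length") / config.get("nz")
```

Passing such an object to a function passes only a reference, so handing it around is cheap even if a function reads just a few of its values.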
To make Jan-Philip Gehrcke's answer more figurative, check out A global class pattern for python (btw: it's just a normal class, nothing special about "global" - but you can pass it around "globally").
Before actually implementing this in my own program, I had the same idea but wanted to find out how others would do it (like questioner nicholls). I was a bit skeptical to implement this in the first place, in particular it looked quite strange to instantiate a class in the module itself. But it works fine.
However, there are some things to keep in mind:
It is not super clean. For instance, someone who doesn't know the functions in your module wouldn't expect that a parameter in a configuration class needs to be set.
If you have to reload your module/functions but want to maintain the values set in your configuration class, you should not instantiate the configuration class again: if "mem" not in locals(): mem = Mem()
It's not advisable to use a parameter from your configuration class as a default argument for a function, for example function(a, b=mem.defaultB). The default is evaluated once, when the function is defined, so you cannot change it afterwards. Instead, do function(a, b=None): if b is None: b=mem.defaultB. Then you can still adjust your configuration class after you have loaded your module/functions.
Certainly there are more issues...
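The point about default arguments can be checked with a few lines. In this sketch Mem and defaultB follow the names used above, while frozen_default and late_default are illustrative function names.

```python
class Mem:
    """Stand-in for the configuration class mentioned above."""
    defaultB = 1


mem = Mem()


def frozen_default(a, b=mem.defaultB):
    # The default is evaluated once, when the function is defined.
    return a + b


def late_default(a, b=None):
    # The configuration is read on every call instead.
    if b is None:
        b = mem.defaultB
    return a + b


mem.defaultB = 10  # change the configuration after the functions exist

frozen_result = frozen_default(1)  # still uses the old value 1
late_result = late_default(1)      # picks up the new value 10
```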

Python 3.2: Information on class, class attributes, and values of objects

I'm new to Python, and I'm learning about the class function. Does anyone know of any good sources/examples for this? Here's an example I wrote up. I'm trying to read more about self and init
class Example:
    def __init__(a, b, c, d):
        self.a = a
        self.b = b
        self.c = c
        self.d = d

test = Example(1, 1, 1, 1)
I've been to python.org, as well as this site. I've also been reading beginners Python books, but I'd like more information.
Thanks.
A couple of generic clarifications here:
Python's "class" keyword is not a function, it's a statement which signals to the language that the following code describes a class (a user-defined data type and its associated behavior). "class" takes a name (and a possibly empty list of "parent" classes) ... and introduces a "suite" (an indented block of code).
The "def" keyword is a similar statement which defines a function. In your example (which should have read def __init__(self, a, b, c, d)) you're defining a special type of function which is "part of" (associated with, bound to) the Example class. It's also possible (and fairly common) to create unbound functions. Commonly, in Python, unbound functions are simply called "functions" while those which are part of a class are called "methods" ... or "instance functions."
classes are templates for instantiating objects. The terms "instance" and "object" are synonymous in this context. Your example "test" is an instance ... and the Python interpreter "instantiates" and initializes that object according to the class description.
A class is also a "type", that is to say that it's a user definition of a type of data and its associated methods. "class" and "type" are somewhat synonymous in Python though they are conventionally used in different ways. The core "types" of Python data (integers, real numbers, imaginary/complex numbers, strings, lists, tuples, and dictionaries) are all referred to as "types" while the more complex data/operational structures are called classes. Early versions of Python were implemented with constraints that made the distinction between "type" and "class" more than merely a matter of terminological difference. However, the last several versions of Python have eliminated those underlying technical distinctions. Those distinctions related to "subclassing" (inheritance).
classes can be described as a set of additions and modifications to another class. This is called "inheritance" and the class which is derived from another in this manner is referred to as a "subclass." It's common for programmers to create hierarchies of classes ... with specific variations all deriving from more common bases. It's also common to define related functionality within the same files or sets of files. These are "class libraries" and sometimes they are built as "packages."
__init__() is a method; in particular it's the initializer for Python objects (instances of a class).
Python generally uses __...__ (prefixing and suffixing pairs of underscore characters around selected keywords) for "special" method or attribute names ... which is intended to reduce the likelihood that its naming choices will conflict with the meaningful names that you might wish to give to your own methods. While you can name your other methods and attributes with this __XXXX__ form --- Python will not inherently treat that as an error --- it's an extremely bad idea to do so. Even if you don't pick any of the currently defined special names there's no guarantee that some future version of Python won't conflict with your usage later.
"methods" are functions ... but they are a type of function which is bound (associated with) a particular instance of a particular class. There are also "class methods" which are associated with the class rather than with a specific instance of the class.
In your example self.b, self.c and so on are "attributes" or "members" of the object (instance). (The terms are synonymous).
In general the purpose of object-oriented programming is to provide ways of describing types of data and operations on those types of data in a way that's amenable to both human comprehension and computerized interpretation and execution. (All programming languages are intended to strike some balance between human readability/comprehension and machine parsing/execution; but object-oriented languages do so specifically with a focus on the description and definition of "types," the instantiation of those types into "objects" and finally the various interactions among those objects.)
"self" is a Python-specific convention for naming the special (and generally required) first argument to any bound method. It's a "self" reference (a way for the code in a method to refer to the object/instance's own attributes and other methods without any ambiguity). While you can call the first argument to your bound methods "a" (as you've unwittingly done in your Example), it's an extremely bad idea to do so. Not only will it likely confuse you later ... it will make no sense to anyone else trying to read your Python code.
The term "object-oriented" is confusing unless one is aware of the comparisons to other forms of programming language. It's an evolution from "procedural" programming. The simplest gist of that comparison is to consider the sorts of functions one would define in a procedural language, where one might have to define and separately name different functions to perform analogous operations on different types of data: print_student_record(this_student) vs. print_teacher_report(some_report) --- a programming model which necessitates a fair amount of overhead on the part of the programmer, to keep track of which functions work on which types. This sort of problem is eliminated in OO (object-oriented) programming where one can, conceivably, call this.print_() ... and, assuming one has created compatible classes, this will "print" regardless of whether "this" is a student (record/object) or a teacher (report/record/object). That's greatly oversimplified but useful for understanding the pressures which led to the development and adoption of OO-based programming.
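A tiny sketch of that this.print_() idea, assuming two made-up classes (Student and TeacherReport). To keep the example easy to check, print_ returns the text instead of writing it to the screen.

```python
class Student:
    def __init__(self, name):
        self.name = name

    def print_(self):
        return f"Student record: {self.name}"


class TeacherReport:
    def __init__(self, title):
        self.title = title

    def print_(self):
        return f"Teacher report: {self.title}"


items = [Student("Ada"), TeacherReport("Term 1")]
# The caller does not need to know which kind of object "this" is.
lines = [this.print_() for this in items]
```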
In Python it's possible to create classes with little or no functionality. Your example does nothing, yet, but transfer a set of arguments into "attributes" (members) during initialization (instantiation). After that you could use these attributes in programming statements like: test.a += 1 and print(test.a). This is possible because Python is a "multi-paradigm" language. It supports procedural as well as object-oriented programming styles. Objects used this way are very similar to "structs" from the C programming language (predecessor to C++) and to the "records" in Pascal, etc. That style of programming is largely considered to be obsolete (particularly when using a modern, "very high level" language such as Python).
The gist of what I'm getting at is this ... you'll want to learn how to think of your data as the combination of its "parts" (attributes) and the functionality that changes, manipulates, validates, and handles input, output, and possibly storage, of those attributes.
For example if you were writing a "code breaker" program to solve simple ciphers you might implement a "Histogram" object which counts the letter frequencies of a given coded message. That would have attributes (one integer for every letter) and behavior (feeding parts of the coded message(s) into the instance, splitting the strings into individual characters, filtering out all the non-letter characters, converting all the letters to upper or lower case, and counting them --- that is incrementing the integer corresponding to each letter). Additionally you'd need to have some way of querying the histogram ... for example getting a list of the letters sorted by their frequency in the cipher text.
Once you had such a "histogram" class then you could think of ways to use that for your solver. For example to solve a cryptogram puzzle you might compute the histogram then try substituting "etaon" for the five most common ciphered letters ... then check how many of the "partial" strings (t.e for "the") match words, trying permutations, and so on. Each of these might be its own class. A key point of programming is that your histogram class might be useful for counting all sorts of other things (even in a simple voting system or popularity contest). A particular subclass or instantiation might make it a histogram of letters while others could be re-used for other types of "things" you want counted. Similarly the code that iterates over permutations of some list might be used in any number of simulation, optimization, and related programs. (In fact Python's standard library already includes a "Counter" class and a "permutations" function in the "collections" and "itertools" modules, respectively.)
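A minimal sketch of such a Histogram class, built on collections.Counter from the standard library. The method names feed and by_frequency are just illustrative choices for the "feeding" and "querying" behavior described above.

```python
from collections import Counter


class Histogram:
    """Counts letter frequencies in coded messages."""

    def __init__(self):
        self.counts = Counter()

    def feed(self, message):
        # Keep letters only and fold everything to lower case before counting.
        self.counts.update(ch.lower() for ch in message if ch.isalpha())

    def by_frequency(self):
        # Most frequent first; ties are broken alphabetically so the order is stable.
        ordered = sorted(self.counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return [letter for letter, _ in ordered]


hist = Histogram()
hist.feed("Uif dbu tbu.")
top = hist.by_frequency()[:3]
```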
Of course you're going to hear of all of these concepts repeatedly as you study programming. This has been a rather rambling attempt to kickstart that process. I know I've been a bit repetitious in a few points here --- part of that is because I'm typing this at 4am after having started work at 7am yesterday; but part of it serves a pedagogical purpose as well.
There's an error in your class definition. Always include the variable self in your __init__ method. It represents the instance of the object itself and should be included as the first parameter to all of your methods.
What do you want to accomplish with this class? Up until now, it just stores a few variables. Try adding a few methods to spice things up a little bit! There is a plethora of available resources on classes in Python. To start, you may wanna give this one a try:
Python Programming - Classes
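For reference, this is how the class from the question looks once self is added as the first parameter. The total method is a hypothetical addition, included only to show self being used in a method other than __init__.

```python
class Example:
    def __init__(self, a, b, c, d):
        # self is the instance being initialized; Python passes it automatically.
        self.a = a
        self.b = b
        self.c = c
        self.d = d

    def total(self):
        # Hypothetical extra method that reads the attributes set in __init__.
        return self.a + self.b + self.c + self.d


test = Example(1, 1, 1, 1)
test.a += 1
```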
I'm learning python now also, and there is an intro class that's pretty good on codecademy.com
http://www.codecademy.com/tracks/python
It has a section that goes through an exercise on classes. Hope this helps

A long definition of an object inside a Python unit test

I'm unit testing my application. What most of the tests do is call a function with specific arguments and assert the equality of the return value with an expected value.
In some tests the expected return value is a relatively big object. One of them, for example, is a dictionary which maps 5 strings to lists of tuples. It takes 40-50 repetitive lines of code to define that object, but that object is an expected value of one of the functions I'm testing. I don't want to have 40-50 lines of code defining an expected return value inside a test function because most of my test functions consist of 3-6 lines of code. I'm looking for a best practice for such situations. What is the right way of putting lengthy definitions inside a test?
Here are the ideas I was thinking of to address the issue, ranked from the best to the worst as I see it:
Testing samples of the object: Making a few equality assertions based on a subset of the keys. This will sacrifice the thoroughness of the test for the sake of code elegance.
Defining the object in a separate module: Writing the lengthy 40-50 lines of code in a separate .py file, importing the module in the test and then make the equality assertion. This will make the test short and concise but I don't like having a separate file as a supplement to a test; the object definition is part of the test after all.
Defining the object inside the test function: This is the trivial solution which I wish to avoid. My tests are pretty simple and straightforward and the lengthy definition of that object doesn't fit.
Maybe I'm too obsessed with clean code, but I like none of the above solutions. Is there another common practice I haven't thought of?
I'd suggest using a separation of testing code and testing data. For this reason I usually create an abstract base class which contains the methods I'd like to test and create several specific test case classes to tie the methods to the data. (I use the Django framework, so all abstract test classes I put into testbase.py):
testbase.py:
import unittest

class TestSomeFeature(unittest.TestCase):
    test_data_A = ...

    def test_A(self):
        ...  # perform test
and now the implementations in test.py
import testbase

class TestSomeFeatureWithDataXY(testbase.TestSomeFeature):
    test_data_A = XY
The test data can also be externalized, e.g. to a JSON file:
import json

class TestSomeFeatureWithDataXYZ(testbase.TestSomeFeature):
    @property
    def test_data_A(self):
        # json.load expects an open file object, not a path
        with open("data/XYZ.json") as f:
            return json.load(f)
I hope I made my points clear enough. In your case I'd strongly opt for using data files. Django supports this out of the box with test fixtures, which are loaded into the database prior to executing any tests.
It really depends on what you want to test.
If you want to test that a dictionary contains certain keys with certain values, then I would suggest separate assertions to check each key. This way your test will still be valid if the dictionary is extended, and test failures should clearly identify the problem (an error message telling you that one 50-line long dictionary is not equal to a second 50 line long dictionary is not exactly clear).
If you really do want to verify that the dictionary contains only the given keys, then a single assertion might be appropriate. Define the object you are comparing against where it is most clear. If defining it in a separate file (as Constantinius's answer suggests) makes things more readable then consider doing that.
In both cases, the guiding principle is to only test the behaviour you care about. If you test behaviour you don't care about, you may find your test suite more obstructive than helpful when refactoring.
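A small sketch of the "separate assertions per key" approach. The function build_index and its return value are hypothetical stand-ins for the real function under test.

```python
import unittest


def build_index():
    # Hypothetical function under test; stands in for the real one.
    return {"a": [(1, 2)], "b": [(3, 4), (5, 6)], "extra": []}


class TestBuildIndex(unittest.TestCase):
    def test_key_a(self):
        self.assertEqual(build_index()["a"], [(1, 2)])

    def test_key_b(self):
        self.assertEqual(build_index()["b"], [(3, 4), (5, 6)])


# Run the tests programmatically so the outcome can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestBuildIndex)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Both tests keep passing if the dictionary later gains further keys such as "extra", and a failure names the key that is wrong.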

How many private variables are too many? Capsulizing classes? Class Practices?

Okay, so I am currently working on an in-house statistics package for Python. It's mainly geared towards working with the ArcGIS geoprocessor, for model comparison and tools.
Anyway, I have a single class that calculates statistics. Let's just call it Stats. My Stats class is getting very large. It uses statistics calculated by other statistics to calculate further sets of statistics, and so on. This leads to a lot of private variables that are kept simply to prevent recalculation. However, there are certain ones that, while used quite frequently, are only used by one or two key subsections of functionality (e.g. summation of matrix diagonals, and probabilities). It's starting to become a major eyesore, and I feel as if I am doing this terribly wrong.
So is this bad?
A coworker recommended that I put core and common functionality together in the main class, and then have capsules that take a reference to the main class and do whatever they need to within themselves. E.g. for calculating the accuracy of model predictions, I would create a capsule that simply takes a reference to the parent and takes over all of the calculations needed for model predictions.
Is something like this really a good idea? Is there a better way? Right now I have over a dozen different sub-statistics that are dumped to a text file to make a smallish report. The code base is growing, and I would love to start splitting up more and more of my Python classes. I am just not sure what the best way of doing this is.
Why not create a class for each statistic you need to compute, and when one of the statistics requires another, just pass an instance of the latter to the computing method? However, little is known about your code and the required functionality. Maybe you could describe more broadly what kind of statistics you need to calculate and how they depend on each other?
Anyway, if I had to compute certain statistics, I would immediately create a separate class for each of them. I did this once, when I was writing a code statistics library for Python. Every statistic, like how many times a class is inherited or how often a function was called, was a separate class. This way each of them was simple; however, I didn't need to use any of them in another.
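A minimal sketch of the first suggestion, with one class per statistic and the dependency passed in as an instance. Mean and Variance are illustrative choices, since the question does not say which statistics are involved; the variance here is the population variance.

```python
class Mean:
    def compute(self, values):
        return sum(values) / len(values)


class Variance:
    def compute(self, values, mean_stat):
        # The statistic this one depends on is passed in rather than reimplemented here.
        mean = mean_stat.compute(values)
        return sum((v - mean) ** 2 for v in values) / len(values)


data = [2.0, 4.0, 6.0]
mean_stat = Mean()
variance = Variance().compute(data, mean_stat)
```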
I can think of a couple of solutions. One would be to simply store values in an array with an enum like so:
StatisticType = enum('AveragePerDay','MedianPerDay'...)
Another would be to use inheritance like so:
class StatisticBase:
    ...
class AveragePerDay(StatisticBase):
    ...
class MedianPerDay(StatisticBase):
    ...
There is no hard and fast rule on "too many", however a guideline is that if the list of fields, properties, and methods when collapsed, is longer than a single screen full, it's probably too big.
It's a common anti-pattern for a class to become "too fat" (have too much functionality and related state), and while this is commonly observed about "base classes" (whence the "fat base class" monicker for the anti-pattern), it can really happen without any inheritance involved.
Many design patterns (DPs for short) can help you re-factor your code to whittle down the large, untestable, unmaintainable "fat class" to a nice package of cooperating classes (which can be used through "Facade" DPs for simplicity): consider, for example, State, Strategy, Memento, Proxy.
You could attack this problem directly, but I think, especially since you mention in a comment that you're looking at it as a general class design topic, it may offer you a good opportunity to dig into the very useful field of design patterns, and especially "refactoring to patterns" (Kerievsky's book by that title, published in Martin Fowler's Signature Series, is excellent, though it doesn't touch on Python-specific issues).
Specifically, I believe you'll be focusing mostly on a few Structural and Behavioral patterns (since I don't think you have much need for Creational ones for this use case, except maybe "lazy initialization" of some of your expensive-to-compute state that's only needed in certain cases -- see this wikipedia entry for a pretty exhaustive listing of DPs, with classification and links for further explanations of each).
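The "lazy initialization" idea mentioned above can be sketched with functools.cached_property (available from Python 3.8). The Stats class, its matrix attribute and the diagonal_sum name are assumptions based on the question's description, and the counter exists only to show that the value is computed once.

```python
from functools import cached_property


class Stats:
    def __init__(self, matrix):
        self.matrix = matrix
        self.diagonal_computations = 0  # only here to show the caching

    @cached_property
    def diagonal_sum(self):
        # Computed on first access, then stored on the instance.
        self.diagonal_computations += 1
        return sum(self.matrix[i][i] for i in range(len(self.matrix)))


stats = Stats([[1, 2], [3, 4]])
first = stats.diagonal_sum
second = stats.diagonal_sum
```

This removes the need for a hand-written private variable per cached statistic, though the values are not recomputed if the underlying matrix changes afterwards.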
Since you are asking about best practices you might want to check out pylint (http://www.logilab.org/857). It has many good suggestions about code style including ones relating to how many private variables in a class.
