Refactoring a huge Python class using Inheritance to do Composition


I built a pygame game a few years back. It worked, but wasn't the best coding style and had a lot of classic code smells. I've recently picked it back up and am trying to refactor it with more discipline this time.
One big code smell was that I had a huge class, GameObject, that inherited from pygame.sprite.DirtySprite which had a lot of code related to:
various ways of moving a sprite
various ways of animating a sprite
various ways of exploding a sprite
etc.
The crazier the behaviors I thought up for sprites, the more the code duplication added up and the harder changes became. So, I started breaking out functionality into lots of smaller classes and then passing them in at object creation:
class GameObject(DirtySprite):
    def __init__(self, initial_position, mover_cls, imager_cls, exploder_cls):
        # Each helper class is instantiated with a reference back to the sprite
        self.mover = mover_cls(self, initial_position)
        self.imager = imager_cls(self)
        self.exploder = exploder_cls(self)
        ...
spaceship = GameObject(pos, CrazyMover, SpaceshipImager, BasicExploder)
As I factored out more and more code into these helper classes, the code was definitely better, more flexible and had less duplication. However, as I added more types of helper classes, the parameter list got longer and longer. Creating sprites became a chore and the code was ugly. So, during another refactor I created a bunch of really small classes to do the composition:
class GameObjectAbstract(MoverAbstract, ImagerAbstract,
                         ExploderAbstract, DirtySprite):
    def __init__(self, initial_position):
        ...
    ...
class CrazySpaceship(CrazyMover, SpaceshipImager, BasicExploder, GameObjectAbstract):
    pass  # Many times, all the behavior comes from super classes
...
spaceship = CrazySpaceship(pos)
I like this approach better. Is this a common approach? It seems to have the same benefits of having all the logic broken out in small classes, but creating the objects is much cleaner.
However, this approach isn't as dynamic. I cannot, for example, decide on a new mashup at run-time. However, this wasn't something I was really doing. While I do a lot of mashups, it seems OK that they are statically defined using class statements.
Am I missing anything when it comes to future maintainability and reuse? I hear that composition is better than inheritance, but this feels like I'm using inheritance to do composition - so I feel like this is OK.
Is there a different pattern that I should be using?

That is OK, if you can separate the behaviors well enough.
It is just not "composition" at all: it is multiple inheritance, using what we call "mixin classes". A mixin class is roughly a class that provides a specific behavior that can be combined with other classes.
If you are using Python's super correctly, that could be the best approach. (If you are managing to create your game objects by basically just defining the class name and the mixin classes it uses, that is actually a very good approach.)
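As a rough illustration of what "using super correctly" can look like with mixins, here is a minimal sketch. The class names (Base, MoverMixin, ImagerMixin, Spaceship) and the log attribute are made up for the example and are not from the question's code; the point is only that each mixin does its part and then hands over to the next class in the method resolution order.

```python
class Base:
    """Hypothetical stand-in for the real sprite base class."""

    def __init__(self, position):
        self.position = position
        self.log = []

    def update(self):
        self.log.append("base")


class MoverMixin:
    def update(self):
        self.log.append("move")
        super().update()  # hand over to the next class in the MRO


class ImagerMixin:
    def update(self):
        self.log.append("animate")
        super().update()


class Spaceship(MoverMixin, ImagerMixin, Base):
    pass  # all behavior comes from the mixins and the base


ship = Spaceship((0, 0))
ship.update()
print(ship.log)  # ['move', 'animate', 'base']
```

Because every update calls super().update(), the order of the bases in the class statement decides the order in which the behaviors run.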
By the way, if you ever want to create new classes at runtime with this method, it is also possible - just use a call to type to create a new class, instead of a class statement:
class CrazySpaceship(CrazyMover, SpaceshipImager, BasicExploder, GameObjectAbstract):
    pass  # Many times, all the behavior comes from super classes
is equivalent in Python to:
CrazySpaceship = type('CrazySpaceship', (CrazyMover, SpaceshipImager, BasicExploder, GameObjectAbstract), {})
And the tuple used as the second parameter can be built from any sequence at runtime (type requires the bases as a tuple, so convert with tuple() if needed).
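A small sketch of building such a class at runtime, assuming the mixins to combine are only known while the program runs. Walker, Flyer, Entity and make_class are hypothetical names invented for this example.

```python
class Walker:
    def move(self):
        return "walk"


class Flyer:
    def move(self):
        return "fly"


class Entity:
    def __init__(self, name):
        self.name = name


def make_class(name, mixins):
    # type() needs the bases as a tuple, so convert whatever sequence we got.
    bases = tuple(mixins) + (Entity,)
    return type(name, bases, {})


chosen = [Flyer]  # could come from a config file or user input
Bird = make_class("Bird", chosen)
bird = Bird("tweety")
print(bird.move())  # fly
```

The resulting class behaves like one written with a class statement, so isinstance checks and super calls work as usual.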

Related

What are the advantages of declaring methods in classes instead of functions?

This is the normal way to do it in a Django project:
in models.py
class Reservation():
    def cancel_reservation(self):
        ...
    @classmethod
    def get_client_reservations(cls):
        ...
The alternative way that I found in a company codebase:
in models.py
class Reservation():
    # There is no method here except __unicode__
    ...
and in manage_reservations.py
def cancel_reservation(reservation):
    ...
def get_client_reservations():
    ...
I'd like to have an exhaustive list of the consequences of choosing the first way instead of the second one.

It's a coding style. "Object" in OOP is data and methods, together. The object has everything you need to hold the data and manipulate it. There is no "right" answer, more opinion and style.
So you can write:
r = Reservation.objects.get(pk=1)
r.get_client_reservation()
Rather than:
from . import get_client_reservation
get_client_reservation(r)
But the truth is that Python modules are a very good solution to keep things together, and it's easier to debug than a complex inheritance chain.
In Django, OOP is essential because the framework lets you easily subclass components and customise only what you need; this is hard to do without objects.
If you need a specific form, with specific fields, then you can write it as a simple module with functions. But if you need a generic "Form" that everybody can customise (or a model, authentication backend etc), you need OOP.
So bottom line (IMHO): if Reservation is at the bottom of the pyramid, the end of the line for data and code, there is no big difference and it is more a matter of personal preference. If it's at the top and you are going to need ReservationThis and ReservationThat, OOP is better.
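To make the "ReservationThis and ReservationThat" point concrete, here is a minimal sketch in plain Python (no Django involved). The classes, the fee values and the cancellation_fee method are invented for illustration only.

```python
class Reservation:
    def __init__(self, client):
        self.client = client
        self.cancelled = False

    def cancellation_fee(self):
        return 10

    def cancel_reservation(self):
        self.cancelled = True
        return self.cancellation_fee()


class PremiumReservation(Reservation):
    # Only the part that differs is overridden.
    def cancellation_fee(self):
        return 0


# Module-level function style: the special case has to be handled by hand.
def cancel_reservation(reservation):
    reservation.cancelled = True
    return 0 if isinstance(reservation, PremiumReservation) else 10


basic = Reservation("alice")
premium = PremiumReservation("bob")
print(basic.cancel_reservation(), premium.cancel_reservation())  # 10 0
print(cancel_reservation(premium))  # 0
```

With a single Reservation type both styles are about equally simple; the method version starts to pay off once variants appear.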

This isn't a technical answer, but try doing a git blame on that code, and seeing who wrote the methods, and ask them why they chose to do it like that. In general it's better to keep the methods on the class (for multiple reasons) - for example being able to do dir(r) (where r is a reservation) and seeing all the methods on r. There may be a reason though (that we can't know unless we saw the code)

You should put a method inside a class if it's related to the class, for example if it needs some class variable or if it logically belongs with the class.

When should I be using classes in Python?

I have been programming in python for about two years; mostly data stuff (pandas, mpl, numpy), but also automation scripts and small web apps. I'm trying to become a better programmer and increase my python knowledge and one of the things that bothers me is that I have never used a class (outside of copying random flask code for small web apps). I generally understand what they are, but I can't seem to wrap my head around why I would need them over a simple function.
To add specificity to my question: I write tons of automated reports which always involve pulling data from multiple data sources (mongo, sql, postgres, apis), performing a lot or a little data munging and formatting, writing the data to csv/excel/html, and sending it out in an email. The scripts range from ~250 lines to ~600 lines. Would there be any reason for me to use classes to do this and why?

Classes are the pillar of Object Oriented Programming. OOP is highly concerned with code organization, reusability, and encapsulation.
First, a disclaimer: OOP is partially in contrast to Functional Programming, which is a different paradigm used a lot in Python. Not everyone who programs in Python (or surely most languages) uses OOP. You can do a lot in Java 8 that isn't very Object Oriented. If you don't want to use OOP, then don't. If you're just writing one-off scripts to process data that you'll never use again, then keep writing the way you are.
However, there are a lot of reasons to use OOP.
Some reasons:
Organization: OOP defines well known and standard ways of describing and defining both data and procedure in code. Both data and procedure can be stored at varying levels of definition (in different classes), and there are standard ways of talking about these definitions. That is, if you use OOP in a standard way, it will help your later self and others understand, edit, and use your code. Also, instead of using a complex, arbitrary data storage mechanism (dicts of dicts or lists or dicts or lists of dicts of sets, or whatever), you can name pieces of data structures and conveniently refer to them.
State: OOP helps you define and keep track of state. For instance, in a classic example, if you're creating a program that processes students (for instance, a grade program), you can keep all the info you need about them in one spot (name, age, gender, grade level, courses, grades, teachers, peers, diet, special needs, etc.), and this data is persisted as long as the object is alive, and is easily accessible. In contrast, in pure functional programming, state is never mutated in place.
Encapsulation: With encapsulation, procedure and data are stored together. Methods (an OOP term for functions) are defined right alongside the data that they operate on and produce. In a language like Java that allows for access control, or in Python, depending upon how you describe your public API, this means that methods and data can be hidden from the user. What this means is that if you need or want to change code, you can do whatever you want to the implementation of the code, but keep the public APIs the same.
Inheritance: Inheritance allows you to define data and procedure in one place (in one class), and then override or extend that functionality later. For instance, in Python, I often see people creating subclasses of the dict class in order to add additional functionality. A common change is overriding the method that is called when a requested key doesn't exist in the dictionary (__missing__), so that it returns a default value based on the unknown key instead of raising an exception. This allows you to extend your own code now or later, allow others to extend your code, and allows you to extend other people's code.
Reusability: All of these reasons and others allow for greater reusability of code. Object oriented code allows you to write solid (tested) code once, and then reuse over and over. If you need to tweak something for your specific use case, you can inherit from an existing class and overwrite the existing behavior. If you need to change something, you can change it all while maintaining the existing public method signatures, and no one is the wiser (hopefully).
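The dict-subclass case mentioned under Inheritance can be sketched roughly like this. DefaultingDict is a made-up name for the example; the standard library's collections.defaultdict covers a similar need (it also stores the default under the missing key, which this sketch does not).

```python
class DefaultingDict(dict):
    """A dict that returns a fixed default for unknown keys."""

    def __init__(self, default, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default = default

    def __missing__(self, key):
        # Called by dict.__getitem__ when the key is not present.
        return self.default


grades = DefaultingDict(0.0, math=3.3)
print(grades["math"])   # 3.3
print(grades["art"])    # 0.0
print("art" in grades)  # False, the default is not stored
```

Everything else (iteration, len, update and so on) is inherited unchanged from dict.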
Again, there are several reasons not to use OOP, and you don't need to. But luckily with a language like Python, you can use just a little bit or a lot, it's up to you.
An example of the student use case (no guarantee on code quality, just an example):
Object Oriented
class Student(object):
    def __init__(self, name, age, gender, level, grades=None):
        self.name = name
        self.age = age
        self.gender = gender
        self.level = level
        self.grades = grades or {}
    def setGrade(self, course, grade):
        self.grades[course] = grade
    def getGrade(self, course):
        return self.grades[course]
    def getGPA(self):
        return sum(self.grades.values())/len(self.grades)
# Define some students
john = Student("John", 12, "male", 6, {"math":3.3})
jane = Student("Jane", 12, "female", 6, {"math":3.5})
# Now we can get to the grades easily
print(john.getGPA())
print(jane.getGPA())
Standard Dict
def calculateGPA(gradeDict):
    return sum(gradeDict.values())/len(gradeDict)
students = {}
# We can set the keys to variables so we might minimize typos
name, age, gender, level, grades = "name", "age", "gender", "level", "grades"
john, jane = "john", "jane"
math = "math"
students[john] = {}
students[john][age] = 12
students[john][gender] = "male"
students[john][level] = 6
students[john][grades] = {math:3.3}
students[jane] = {}
students[jane][age] = 12
students[jane][gender] = "female"
students[jane][level] = 6
students[jane][grades] = {math:3.5}
# At this point, we need to remember who the students are and where the grades are stored. Not a huge deal, but avoided by OOP.
print(calculateGPA(students[john][grades]))
print(calculateGPA(students[jane][grades]))

Use a class whenever you need to maintain state across function calls and it cannot be accomplished with generators (functions which yield rather than return). Generators maintain their own state.
If you want to override any of the standard operators, you need a class.
Whenever you have a use for a Visitor pattern, you'll need classes. Every other design pattern can be accomplished more effectively and cleanly with generators, context managers (which are also better implemented as generators than as classes) and POD types (dictionaries, lists and tuples, etc.).
If you want to write "pythonic" code, you should prefer context managers and generators over classes. It will be cleaner.
If you want to extend functionality, you will almost always be able to accomplish it with containment rather than inheritance.
Like every rule, this has an exception. If you want to encapsulate functionality quickly (i.e., write test code rather than library-level reusable code), you can encapsulate the state in a class. It will be simple and won't need to be reusable.
If you need a C++ style destructor (RAII), you definitely do NOT want to use classes. You want context managers.
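A minimal sketch of the two tools this answer leans on: a generator that keeps its own state between calls, and a context manager written as a generator for cleanup that must always run. The names running_total and managed_resource are invented for the example.

```python
from contextlib import contextmanager


def running_total():
    """Generator that remembers the sum of everything sent to it."""
    total = 0
    while True:
        value = yield total
        total += value


@contextmanager
def managed_resource(log):
    log.append("acquire")
    try:
        yield "resource"
    finally:
        # Runs even if the body of the with block raises.
        log.append("release")


acc = running_total()
next(acc)            # prime the generator; it yields the initial 0
print(acc.send(5))   # 5
print(acc.send(7))   # 12

events = []
with managed_resource(events) as res:
    events.append("use " + res)
print(events)  # ['acquire', 'use resource', 'release']
```

Neither piece needs a class, yet both hold state: the generator in its local variable total, the context manager in the position where it is suspended.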

I think you are doing it right. Classes are reasonable when you need to simulate some business logic or difficult real-life processes with difficult relations.
For example:
Several functions that share state
More than one copy of the same state variables
To extend the behavior of an existing functionality
I also suggest watching this classic video

dantiston gives a great answer on why OOP can be useful. However, it is worth noting that OOP is not necessarily the better choice in most cases where it is used. OOP has the advantage of combining data and methods together. In terms of application, I would say use OOP only if all the functions/methods deal with, and only with, a particular set of data and nothing else.
Consider a functional programming refactoring of dantiston's example:
def dictMean( nums ):
    return sum(nums.values())/len(nums)
# It's good to include automatic tests for production code, to ensure that updates don't break old codes
assert( dictMean({'math':3.3,'science':3.5})==3.4 )
john = {'name':'John', 'age':12, 'gender':'male', 'level':6, 'grades':{'math':3.3}}
# setGrade
john['grades']['science']=3.5
# getGrade
print(john['grades']['math'])
# getGPA
print(dictMean(john['grades']))
At a first look, it seems like all the 3 methods exclusively deal with GPA, until you realize that Student.getGPA() can be generalized as a function to compute mean of a dict, and re-used on other problems, and the other 2 methods reinvent what dict can already do.
The functional implementation gains:
Simplicity. No boilerplate class or selfs.
Easily add automatic test code right after each function for easy maintenance.
Easily split into several programs as your code scales.
Reusability for purposes other than computing GPA.
The functional implementation loses:
Typing in 'name', 'age', 'gender' as dict keys each time is not very DRY (don't repeat yourself). It's possible to avoid that by changing the dict to a list. Sure, a list is less clear than a dict, but this is a non-issue if you include automatic test code below anyway.
Issues this example doesn't cover:
OOP inheritance can be supplanted by function callback.
Calling an OOP class has to create an instance of it first. This can be boring when you don't have data in __init__(self).

A class defines a real world entity. If you are working on something that exists individually and has its own logic that is separate from others, you should create a class for it. For example, a class that encapsulates database connectivity.
If this is not the case, there is no need to create a class.

It depends on your idea and design. If you are a good designer, then OOP will come out naturally in the form of various design patterns.
For simple script-level processing, OOP can be overhead.
Simply consider the basic benefits of OOP, like reusability and extendability, and check whether they are needed or not.
OOP makes complex things simpler and simpler things complex.
Simply keep things simple either way, using OOP or not. Whichever is simpler, use that.

Python classes, how to use them style-wise, and the Single Responsibility Principle [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I've been programming in Python for some time and have picked up some knowledge of Python style, but I still have trouble knowing how to use classes properly.
When reading about object-oriented design I often find rules like the Single Responsibility Principle, which states
"The Single Responsibility Principle says that a class should have one, and only one, reason to change"
Reading this, I might think of breaking one class into two, like:
class ComplicatedOperations(object):
    def __init__(self, item):
        pass
    def do(self):
        ...
    ## lots of other functions
class CreateOption(object):
    def __init__(self, simple_list):
        self.simple_list = simple_list
    def to_options(self):
        operated_data = self.transform_data(self.simple_list)
        return self.default_option() + operated_data
    def default_option(self):
        return [('', '')]
    def transform_data(self, simple_list):
        return [self.make_complicated_operations_that_requires_losts_of_manipulation(item)
                for item in simple_list]
    def make_complicated_operations_that_requires_losts_of_manipulation(self, item):
        return ComplicatedOperations(item).do()
This, for me, raises lots of different questions; like:
When should I use class variables or pass arguments in class functions?
Should the ComplicatedOperations class be a class or just a bunch of functions?
Should the __init__ method be used to calculate the final result? Does that make the class hard to test?
What are the rules for the pythonists?
Edited after answers:
So, reading Augusto's theory, I would end up with something like this:
class ComplicatedOperations(object):
    def __init__(self):
        pass
    def do(self, item):
        ...
    ## lots of other functions
def default_option():
    return [('', '')]
def complicate_data(item):
    return ComplicatedOperations().do(item)
def transform_data_to_options(simple_list):
    # complicate_data is a module-level function, so no self here
    return default_option() + [complicate_data(item)
                               for item in simple_list]
(Also corrected a small bug with default_option.)

When should I use class variables or pass arguments in class functions
In your example I would pass item into the do method. This applies to programming in any language: give a class only the information it needs (Least Authority), and pass everything that is not internal to your algorithm via parameters (Dependency Injection). So, if ComplicatedOperations does not need item to initialize itself, do not give it as an init parameter, and if it needs item to do its job, give it as a parameter.
Should the ComplicatedOperations class be a class or just a bunch of functions
I'd say it depends. If you're using various kinds of operations, and they share some sort of interface or contract, absolutely. If the operation reflects some concept and all the methods are related to the class, sure. But if they are loose and unrelated, you might just use functions or think again about the Single Responsibility Principle and split the methods up into other classes.
Should the init method be used to calculate the final result. Does that makes that class hard to test.
No, the init method is for initialization; you should do the actual work in a separate method.
As a side note, because of the lack of context, I did not understand what CreateOption's role is. If it is only used as shown above, you might as well just remove it ...
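A minimal sketch of the advice above, assuming a made-up operation (pairing each item's string form with a multiple of it). The multiplier parameter and the to_options function are hypothetical; only the shape matters: item goes to do(), and the operations object is passed in rather than created inside.

```python
class ComplicatedOperations:
    def __init__(self, multiplier=2):
        # Only configuration the object itself needs (hypothetical).
        self.multiplier = multiplier

    def do(self, item):
        # The item to work on arrives as a parameter.
        return (str(item), item * self.multiplier)


def to_options(simple_list, operations):
    # The collaborator is injected, so a test can pass in a stub.
    return [("", "")] + [operations.do(item) for item in simple_list]


options = to_options([1, 2], ComplicatedOperations())
print(options)  # [('', ''), ('1', 2), ('2', 4)]
```

Since __init__ does no real work, constructing the object in a test is cheap, and do() can be checked one item at a time.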

I personally think of classes as concepts. I'd define an Operation class which behaves like an operation, so it contains a do() method and every other method/property that may make it unique.
As mgilson correctly says, if you cannot define and isolate any concept, maybe a simple functional approach would be better.
To answer your questions:
you should use class attributes when a certain property is shared among the instances (in Python, class attributes are created once, when the class body is executed, so all instances see the same value. Usually class attributes should be constants). Use instance attributes to have object-specific properties to use in its methods without passing them. This doesn't mean you should put everything in self, but just what you consider characteristic of your object. Use passed variables for values that do not concern your object and may depend on the state of external objects (or on the execution of the program).
As said above, I'd keep one single class Operation and use a list of Operation objects to do your computations.
the init method would just instantiate the object and do all the processing needed for the proper behaviour of the object (in other words, make it ready to use).
Just think about the ideas you're trying to model.
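The three kinds of values described above can be sketched as follows. The Operation class here is only a guess at what such a class might look like; unit, factor and value are invented names.

```python
class Operation:
    unit = "points"  # class attribute: shared by every instance

    def __init__(self, factor):
        self.factor = factor  # instance attribute: specific to this object

    def do(self, value):
        # value is passed in because it does not belong to the operation
        return value * self.factor


double = Operation(2)
triple = Operation(3)
results = [op.do(5) for op in (double, triple)]
print(results)  # [10, 15]

# Rebinding the class attribute is visible through all instances.
Operation.unit = "credits"
print(double.unit, triple.unit)  # credits credits
```

A list of Operation objects, as suggested above, then becomes a simple pipeline that can be looped over.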

A class generally represents a type of object. Class instances are specific objects of that type. A classic example is an Animal class. A cat would be an instance of Animal. Class variables (I assume you mean those that belong to the instance rather than the class object itself) should be used for attributes of the instance. In this case, for example, colour could be a class attribute, which would be set as cat.colour = "white" or bear.colour = "brown". Arguments should be used where the value could come from some source outside the class. If the Animal class has a sleep method, it might need to know the duration of the sleep and the posture that the animal sleeps in. duration would be an argument of the method, since it has no relation to the animal, but posture would be a class variable since it is determined by the animal.
In python, a class is typically used to group together a set of functions and variables which share a state. Continuing with the above example, a specific animal has a state which is shared across its methods and is defined by its attributes. If your class is just a group of functions which don't in any way depend on the state of the class, then they could just as easily be separate functions.
If __init__ is used to calculate the final result (which would have to be stored in an attribute of the class since __init__ cannot return a result), then you might as well use a function. A common pattern, however, is to do a lot of processing in __init__ via several other, sometimes private, methods of the class. The reason for this is that large complicated functions are often easier to test if they are broken down into smaller, distinct tasks, each of which can then be tested individually. However, this is usually only done when a class is needed anyway.
One approach to the whole business is to start out by deciding what functionality you need. When you have a group of functions or variables which all act on or apply to the same object, then it is time to move them into a class. Remember that Object Oriented Programming (OOP) is a design method suited to some tasks, but is not inherently superior to functional programming (in fact, some programmers would argue the opposite!), so there's no need to use classes unless there is actually a need.

Classes are an organizational structure. So, if you are not using them to organize, you are doing it wrong. :)
There are several different things you can use them for organizing:
Bundle data with methods that use said data, defines one spot that the code will interact with this data
Bundle like functions together, provides understandable api since 'everyone knows' that all math functions are in the math object
Provide defined communications between methods, sets up a 'conveyor belt' of operations with a defined interface. Each operation is a black box, and can change arbitrarily, so long as it keeps to the standard
Abstract a concept. This can include sub classes, data, methods, so on and so forth all around some central idea like database access. This class then becomes a component you can use in other projects with a minimal amount of retooling
If you don't need to do some organizational thing like the above, then you should go for simplicity and program in a procedural/functional style. Python is about having a toolbox, not a hammer.

Anything wrong with a really large init?

I'm writing a Python program with a GUI built with the Tkinter module. I'm using a class to define the GUI because it makes it easier to pass commands to buttons and makes the whole thing a bit easier to understand.
The actual initialization of my GUI takes about 150 lines of code. To make this easier to understand, I've written the __init__ function like so:
def __init__(self, root):
    self.root = root
    self._init_menu()
    self._init_connectbar()
    self._init_usertree()
    self._init_remotetree()
    self._init_bottom()
where _init_menu(), _init_connectbar(), and so on do all the initialization work. This makes my code easier to follow and prevents __init__ from getting too big.
However, this creates scope issues. Since an Entry widget that I defined in _init_connectbar() is in the function scope and is not a class attribute, I can't refer to it in other methods in the class.
I can make these problems go away by doing most of the initialization in __init__, but I'll lose the abstraction I got with my first method.
Should I expand __init__, or find another way to bring the widgets into class scope?

Either store some of those widget references in instance variables or return them (a minimal set mind you; you want to Reduce Coupling) and store them in local variables in __init__ before passing the relevant ones as arguments to your subsequent construction helpers. The latter is cleaner, but requires that things be decoupled enough that you can create an ordering that makes it possible.

Why don't you make the widgets that you need to refer to instance variables? This is what I usually do, and it seems to be quite a common approach.
e.g.
self.some_widget
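A rough sketch of that approach. To keep it runnable without a display, FakeEntry is a stand-in invented for this example; in a real Tkinter program it would be an actual Entry widget. The attribute name host_entry and the connect method are also hypothetical.

```python
class FakeEntry:
    """Stand-in for a real entry widget so the sketch runs anywhere."""

    def __init__(self, parent):
        self.parent = parent
        self.text = ""

    def get(self):
        return self.text


class Gui:
    def __init__(self, root):
        self.root = root
        self._init_connectbar()
        self._init_bottom()

    def _init_connectbar(self):
        # Stored on self, so other methods can reach the widget later.
        self.host_entry = FakeEntry(self.root)

    def _init_bottom(self):
        self.status = "ready"

    def connect(self):
        return "connecting to " + self.host_entry.get()


gui = Gui(root="root-window")
gui.host_entry.text = "example.org"
print(gui.connect())  # connecting to example.org
```

The __init__ method stays short, and only the widgets that other methods really need end up as attributes.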

In my opinion, you should store the widgets as instance variables so that you can refer to them from any method. As in most programming languages, readability decreases when functions get too large, so your approach of splitting up the initialization code is a good idea.
When the class itself grows too large for one source file, you can also split up the class using mix-in classes (similar to having partial classes in C#).
For example:
class MainGuiClass(GuiMixin_FunctionalityA, GuiMixin_FunctionalityB):
    def __init__(self):
        GuiMixin_FunctionalityA.__init__(self)
        GuiMixin_FunctionalityB.__init__(self)
This comes in handy when the GUI consists of different functionalities (for instance a configuration tab, an execution tab or whatsoever).

You should look into the builder-pattern for this kind of stuff. If your GUI is complex, then there will be some complexity in describing it. Whether that is a complex function or a complex description in some file comes down to the same. You can just try to make it as readable and maintainable as possible, and in my experience the builder pattern really helps here.

Python class design - Splitting up big classes into multiple ones to group functionality

OK I've got 2 really big classes > 1k lines each that I currently have split up into multiple ones. They then get recombined using multiple inheritance. Now I'm wondering, if there is any cleaner/better more pythonic way of doing this. Completely factoring them out would result in endless amounts of self.otherself.do_something calls, which I don't think is the way it should be done.
To make things clear here's what it currently looks like:
from gui_events import GUIEvents  # event handlers
from gui_helpers import GUIHelpers  # helper methods that don't directly modify the GUI
# GUI.py
class GUI(gtk.Window, GUIEvents, GUIHelpers):
    # general stuff here
    ...
One problem that results from this is Pylint giving me trillions of "init not called" / "undefined attribute" / "attribute accessed before definition" warnings.
EDIT:
You may want to take a look at the code, to get a picture of what the whole thing actually is.
http://github.com/BonsaiDen/Atarashii/tree/next/atarashii/usr/share/pyshared/atarashii/
Please note, I'm really trying anything to keep this thing as DRY as possible, I'm using pylint to detect code duplication, the only thing it complains about are the imports.

If you want to use multiple inheritance to combine everything into one big class (it might make sense to do this), then you can refactor each of the parent classes so that every method and property is either private (starts with '__') or has a short 2-3 character prefix unique to that class. For example, all the methods and properties in your GUIEvents class could start with ge_, and everything in GUIHelpers could start with gh_. By doing this, you'll achieve some of the clarity of using separate sub-class instances (self.ge.doSomething() vs self.ge_doSomething()) and you'll avoid conflicting member names, which is the main risk when combining such large classes into one.

Start by finding classes that model real world concepts that your application needs to work with. Those are natural candidates for classes.
Try to avoid multiple inheritance as much as possible; it's rarely useful and always somewhat confusing. Instead, look to use functional composition ("HAS-A" relationships) to give rich attributes to your objects made of other objects.
Remember to make each method do one small, specific thing; this necessarily entails breaking up methods that do too many things into smaller pieces.
Refactor cases where you find many such methods are duplicating each other's functionality; this is another way to find natural collections of functionality that deserve to be in a distinct class.
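As an illustration of the "HAS-A" suggestion, here is a minimal sketch in which a window owns its helper objects instead of inheriting from them. Window, EventHandlers, Helpers and their methods are all invented for the example and do not correspond to the asker's actual code.

```python
class Helpers:
    def format_title(self, name):
        return name.strip().title()


class EventHandlers:
    def __init__(self, window):
        # Keeps a reference back to the object it acts on.
        self.window = window

    def on_close(self):
        self.window.title = "closed"


class Window:
    def __init__(self, name):
        # The window HAS helpers and event handlers.
        self.helpers = Helpers()
        self.events = EventHandlers(self)
        self.title = self.helpers.format_title(name)


window = Window("  main window ")
print(window.title)  # Main Window
window.events.on_close()
print(window.title)  # closed
```

The price is the extra hop (window.events.on_close() rather than window.on_close()), which is the verbosity the question worries about; in exchange, each piece can be tested and replaced separately.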

I think this is more of a general OO-design problem than Python problem. Python pretty much gives you all the classic OOP tools, conveniently packaged. You'd have to describe the problem in more detail (e.g. what do the GUIEvents and GUIHelpers classes contain?)
One Python-specific aspect to consider is the following: Python supports multiple programming paradigms, and often the best solution is not OOP. This may be the case here. But again, you'll have to throw in more details to get a meaningful answer.

Your code may be substantially improved by implementing a Model-View-Controller design. Depending on how your GUI and tool are setup, you may also benefit from "widgetizing" portions of your GUI, so that rather than having one giant Model-View-Controller, you have a main Model-View-Controller that manages a bunch of smaller Model-View-Controllers, each for distinct portions of your GUI. This would allow you to break up your tool and GUI into many classes, and you may be able to reuse portions of it, reducing the total amount of code you need to maintain.
While python does support multiple programming paradigms, for GUI tools, the best solution will nearly always be an Object-Oriented design.

One possibility is to assign imported functions to class attributes:
In file a_part_1.py:
def add(self, n):
    self.n += n
def __init__(self, n):
    self.n = n
And in main class file:
import a_part_1
class A:
    __init__ = a_part_1.__init__
    add = a_part_1.add
Or if you don't want to update main file when new methods are added:
import a_part_1
class A: pass
# Copy every callable defined in the module onto the class
for k, v in a_part_1.__dict__.items():
    if callable(v):
        setattr(A, k, v)
