Python unit testing for methods working on complex objects - python

I have inherited a relatively large (~30,000 lines) Python-based project (running on a CAD system for architects) with some messy methods that I have to bugfix first before going on with development. These methods place, say, bricks (or similar objects) into a wall, so most of the code does 3D calculations on coordinates, vectors, etc.
There are no unit tests for the project currently (and I'm a complete noob at unit testing; I'm a brick-and-mortar architect). Because of the complexity of the functions, I have decided to support my work with unit testing; the question is how I can do this most effectively. The examples I have seen before are mostly web-based, working largely on text documents.
The incoming parameters are very complex and large objects, and I use only a little of the stored data. Obviously I have to make templates out of them.
There are two possible ways:
Save real-world data as a Python pickle to disk, and later use it as a template.
Set up the objects dynamically. Note that the used objects' __init__() methods are mostly like this:
class FirstClass:
    def __init__(self):
        self.x = 0
        self.y = 0
        self.fc = SecondClass()  # presumably a nested helper object

class SecondClass:
    def __init__(self):
        self.blabla = 0
and so on; there are no complicated calculations. Obviously I can insert my custom data by overwriting the initialized instance variables, like this:
objects.in_the_test_used_data = some_numbers
My question is which of these is the better method for templates, or whether there is a better approach for this altogether.
Thanks

Both approaches are valid, but with some small changes.
For the first approach, you can pickle everything, but it might be easier to maintain a JSON/XML/etc. file instead; that way you can edit the test data by hand in the future, which is much easier than re-pickling.
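For instance, a minimal sketch (file name and fields hypothetical) of driving a test from a JSON template with the standard unittest module:

import json
import unittest

class FirstClass:
    def __init__(self):
        self.x = 0
        self.y = 0

class WallTest(unittest.TestCase):
    def setUp(self):
        # In practice this would come from a hand-editable file:
        # with open('fixtures/wall_template.json') as f:
        #     self.template = json.load(f)
        self.template = json.loads('{"x": 1.5, "y": 2.0}')

    def test_template_is_applied(self):
        obj = FirstClass()
        # Overwrite only the fields the method under test actually uses
        obj.x = self.template['x']
        obj.y = self.template['y']
        self.assertEqual(obj.x, 1.5)

if __name__ == '__main__':
    unittest.main()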
For the second approach, you can define your own test classes/test instances, but the real solution would be to use a factory library; I personally love factory_boy. It lets you define factories for your classes and helps you generate instances easily.
For instance:
import datetime

import factory

class AccountFactory(factory.Factory):
    class Meta:
        model = objects.Account

    username = factory.Sequence(lambda n: 'john%s' % n)
    email = factory.LazyAttribute(lambda o: '%s@example.org' % o.username)
    date_joined = factory.LazyFunction(datetime.datetime.now)
This will let you call AccountFactory() and get an Account object.
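Usage is then straightforward, and keyword arguments override the declared defaults:

account = AccountFactory()                # e.g. username 'john0'
other = AccountFactory(username='alice')  # override a declared attribute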
Personally, I prefer the second method: it lets you be flexible about your data and make changes, it is easy to read, and it has a great API. You will also avoid large files that have to be committed to your version control and won't really be readable.

Related

From an OOP perspective, how bad is it to have objects that have fixed states?

So I am a little confused about what I have been reading on Object Oriented Programming. I realized that while I was focusing on the rule of each object doing only one thing, I created a class that does not have a changing state.
Basically I am writing a program that does a lot of reading and writing on text files. I thought that none of the objects I have should be dealing with these operations and I should have a fileIO class that does these operations for them. However, I am a little worried that this might be the same thing as having a utility class.
Is having a class whose fields never change (or don't even need to be initialized) the same as a utility class? Is it a bad practice from an OOP perspective? Does it make sense to have a fileIO object? If not, should objects be allowed to read from and write to files themselves?
class fileIO:
    __processFilePath = None
    __trackFilePath = None

    def __init__(self, iProcessFilePath, iTrackFilePath):
        self.__processFilePath = iProcessFilePath
        self.__trackFilePath = iTrackFilePath

    def getProcesses(self):
        ...  # reads the process file

    def isAppRunning(self, appString):
        ...  # checks if appString is in the file

    def getAllTrackedLines(self):
        ...  # reads all tracked lines

    def addNewOnTracked(self, toBeAdded):
        ...  # appends a line to the tracked file

    def overWriteTrackedLines(self, fullData):
        ...  # overwrites the tracked file
In this particular instance I am actually initializing the fields in the __init__ method, but for the purposes of my program I don't actually need to, because there are only two files that I read from and write to.
Reading and writing data from a file can be wrapped in a class that handles the state of the data to ensure that the transaction completes. What I mean by this is that the resource needs to be de-allocated properly, preferably in the same transaction, no matter the outcome of the operation. If you consider the allocation and de-allocation of resources as state, then your class is not exactly stateless; in functional programming, a function handling resources is considered impure because it is not stateless. The state is merely external.
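As a minimal sketch of that idea (class and method names hypothetical, not the asker's fileIO), a with block scopes the allocation and de-allocation to the operation itself:

class TrackedFile:
    def __init__(self, path):
        self.path = path

    def add_line(self, line):
        # The handle is released even if the write raises
        with open(self.path, 'a') as f:
            f.write(line + '\n')

    def read_lines(self):
        with open(self.path) as f:
            return f.readlines()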
Having a class with no state does not make it a bad class. It is true that utility classes are an anti-pattern, but if your class does one small thing and does it well, then it is not a problem. The problem comes in when you have a ton of loosely related methods bunched into the class and the code begins to rot; that is what you want to avoid. A class that has a well-defined purpose, and only does that thing, will resist rot.
Make sure you write lots of tests around your class as well, as this is key to long-term maintainability.
Please let me know if there is anything that I can clarify.

Refactoring a huge Python class using Inheritance to do Composition

I built a pygame game a few years back. It worked, but wasn't the best coding style and had a lot of classic code smells. I've recently picked it back up and am trying to refactor it with more discipline this time.
One big code smell was that I had a huge class, GameObject, that inherited from pygame.sprite.DirtySprite, which had a lot of code related to:
various ways of moving a sprite
various ways of animating a sprite
various ways of exploding a sprite
etc.
The more ways I thought of for sprites to behave, the more the code duplication added up and the more difficult changes became. So I started breaking the functionality out into lots of smaller classes and passing them in at object creation:
class GameObject(DirtySprite):
    def __init__(self, initial_position, mover_cls, imager_cls, exploder_cls):
        self.mover = mover_cls(self, initial_position)
        self.imager = imager_cls(self)
        self.exploder = exploder_cls(self)
        ...

spaceship = GameObject(pos, CrazyMover, SpaceshipImager, BasicExploder)
As I factored more and more code out into these helper classes, the code definitely got better, more flexible, and less duplicated. However, for each type of helper class the parameter list got longer and longer; creating sprites became a chore and the code was ugly. So, during another refactor, I created a bunch of really small classes to do the composition:
class GameObjectAbstract(MoverAbstract, ImagerAbstract,
                         ExploderAbstract, DirtySprite):
    def __init__(self, initial_position):
        ...

...

class CrazySpaceship(CrazyMover, SpaceshipImager, BasicExploder, GameObjectAbstract):
    pass  # Many times, all the behavior comes from the superclasses

...

spaceship = CrazySpaceship(pos)
I like this approach better. Is this a common approach? It seems to have the same benefits of having all the logic broken out into small classes, but creating the objects is much cleaner.
This approach isn't as dynamic, though; I cannot, for example, decide on a new mashup at run-time. But that wasn't something I was really doing anyway. While I do a lot of mashups, it seems OK that they are statically defined using class statements.
Am I missing anything when it comes to future maintainability and reuse? I hear that composition is better than inheritance, but this feels like I'm using inheritance to do composition - so I feel like this is OK.
Is there a different pattern that I should be using?
That is OK, if you can separate the behaviors well enough.
Just note that it is not "composition" at all - it is multiple inheritance, using what we call "mixin classes": a mixin class is roughly a class that provides a specific behavior that can be combined with other classes.
If you are using Python's super correctly, this could be the best approach. (If you are managing to create your game objects basically by just declaring the class name and the mixin classes it uses, that is actually a very good approach.)
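As a hedged sketch of what "using super correctly" means here (class names borrowed from the question, bodies invented): each mixin does its own setup and then delegates along the MRO, so any combination initializes completely:

class MoverAbstract:
    def __init__(self, initial_position, **kwargs):
        self.position = initial_position
        super().__init__(**kwargs)  # keep the chain going

class ImagerAbstract:
    def __init__(self, **kwargs):
        self.image = None
        super().__init__(**kwargs)

class CrazySpaceship(MoverAbstract, ImagerAbstract):
    pass

ship = CrazySpaceship(initial_position=(0, 0))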
By the way, if you ever want to create new classes at runtime with this method, that is also possible - just use a call to type to create the new class instead of a class statement:
class CrazySpaceship(CrazyMover, SpaceshipImager, BasicExploder, GameObjectAbstract):
    pass

is just equivalent in Python to:
CrazySpaceship = type('CrazySpaceship', (CrazyMover, SpaceshipImager, BasicExploder, GameObjectAbstract), {})
And the tuple you pass as the second parameter can be any sequence built at runtime.

SQLAlchemy Declarative: How to merge models and existing business logic classes

I would like to know what the best practices are for using SQLAlchemy declarative models within business logic code. Perhaps stackexchange.codereview might have been a better place to ask this, but I'm not sure.
Here's some background.
Let's say I have a bunch of classes doing various things. Most of them have little or nothing to do with each other. Each such class has between a hundred and a thousand lines of code doing things that have precious little to do with the database. In fact, most of the classes aren't even database-aware so far. I've gotten away with storing the actual information in flat files (CSV, YAML, and so on), and only maintaining a serial number table and a document path to serial number mapping in the database. Each object retrieves the files it needs by getting the correct paths from the database (by serial number) and reconstructs itself from there. This has been exceedingly convenient so far, since my 'models' have been (and admittedly continue to be) more than fluid.
As I expand the involvement of the database in the codebase I currently have, I seem to have settled on the following model, separating the database bits and the business logic into two completely separate parts, and joining them using specific function calls instead of inheritance or even composition. Here is a basic example of the kind of code I have now (pseudocode-quality):
module/db/models.py:
class Example(Base):
    id = Column(...)
    some_var = Column(...)
module/db/controller.py:
from .models import Example

def get_example_by_id(id, session):
    return session.query(Example).filter_by(id=id).one()

def upsert_example(session, id=None, some_var=None):
    # session moved first so the optional arguments can keep their defaults
    if id is not None:
        try:
            example_obj = get_example_by_id(id, session)
            example_obj.some_var = some_var
            return
        except:
            pass
    example_obj = Example(some_var=some_var)
    session.add(example_obj)
    session.flush()
module/example.py:
from db import controller

class Example(object):
    def __init__(self, id, session):
        self._id = id
        self._some_var = None
        try:
            self._load_from_db(session)
            self._defined = True
        except:
            self._defined = False

    def _load_from_db(self, session):
        db_obj = controller.get_example_by_id(self._id, session)
        self._some_var = db_obj.some_var

    def create(self, some_var, session):
        if self._defined is True:
            raise Exception
        self._some_var = some_var
        self._sync_to_db(session)

    def _sync_to_db(self, session):
        controller.upsert_example(session, id=self._id, some_var=self._some_var)

    @property
    def some_var(self):
        return self._some_var
    ...
I'm not convinced this is the way to go.
I have a few models following this pattern, and many more that I should implement in time. The database is currently used only for persistence and archiving. Once something is in the database, it's more or less read-only from then on. However, querying on it is becoming important.
The reason I'm inclined to migrate from the flat files to the database is largely to improve scalability.
Thus far, if I wanted to find all instances (rows) of Example with some_var = 3, I'd have to construct all of the instances from the flat files and iterate through them. This seems like a waste of both processor time and memory. In many cases, some_var is actually a calculated property, reached by a fairly expensive process using source data contained in the flat file.
With the structure above, what I would do is query on Example, obtain a list of ids which satisfy my criterion, and then reconstruct just those module instances.
The ORM approach, however, as I understand it, would use thick models, where the objects returned by the query are themselves the objects I would need. I'm wondering whether it makes sense to try to move to that kind of structure.
To that end, I have the following 'questions' / thoughts:
My instinct is that the code snippets above are more anti-pattern than useful pattern. I can't put my finger on why exactly, but I'm not very happy with them. Is there a real, tangible disadvantage to the structure as listed above? Would moving to a more ORM-ish design provide advantages in functionality/performance/maintainability over this approach?
I'm paranoid about tying myself down to a database schema. I'm also paranoid about regular DB migrations. The approach listed above gives me a certain peace of mind in knowing that if I do need to do a migration, it'll be limited to the _load_from_db and _sync_to_db functions, and will let me mess around willy-nilly with all the rest of the code.
Am I wrong about the cost of migrations in the thick-model approach being high?
Is my sense of security in restricting my code's db involvement more of a false sense of security than a useful separation?
If I wanted to integrate Example from module/db/models.py with Example from module/example.py in the example above, what would be the cleanest way to go about it? Alternatively, what is an accepted pattern for handling business-logic-heavy models with SQLAlchemy?
In the code above, note that the business logic class keeps all of its information in 'private' instance variables, while the Model class keeps all of its information in class variables. How would integrating these two approaches actually work? Theoretically, they should still 'just work' even if put together in a single class definition. In practice, does it?
(The actual codebase is on github, though it's not likely to be very readable)
I think it's natural (at least for me) to be critical of our own designs even as we are working on them. The structures you have here seem fine to me. The answer to whether they are a good fit depends on what you plan to do.
If you consolidate your code into thick models, then all of it will be in one place and your architecture will be simpler; however, it will also probably mean that your business logic is tightly bound to the schema created in the database. Rethinking the database means rethinking large portions of other areas in the app.
Following the code sample provided here means separating the concerns, which has negative side effects such as more lines of code in more places and increased complexity, but it also means that the coupling is looser. If you stay true to the separation, you should have significantly less trouble if you decide to change your database schema or move to an entirely different form of storage. Since your business logic class is a plain old object, it serves as a nice detached state container. If you move to something else, you would still have to redesign the model layer and possibly parts of the controllers, but your business logic and UI layers could remain largely unchanged.
I think the real test lies in asking how big is this application going to be and how long do you plan to have it in service? If we're looking at a small application with a short life span then the added complexity of loose couplings is a waste unless you are doing it for educational purposes. If the application is expected to grow to be quite large or be in service for a number of years then the early investment in complexity should pay off in a lower cost of ownership over the long term since making changes to the various components should be easier.
If it makes you feel any better, it's not uncommon to see POCOs and POJOs when working with ORMs such as Entity Framework and Hibernate, for the same reason.
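For concreteness, a hedged sketch (assuming SQLAlchemy 1.4+ declarative; table and columns invented) of the thick-model style being weighed here, where querying returns objects that already carry their behavior:

from sqlalchemy import Column, Integer
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Example(Base):
    __tablename__ = 'example'

    id = Column(Integer, primary_key=True)
    some_var = Column(Integer)

    # Business logic lives directly on the mapped class
    def expensive_calculation(self):
        return self.some_var * 2

# The query result is the business object itself:
# hits = session.query(Example).filter_by(some_var=3).all()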

When should I be using classes in Python?

I have been programming in Python for about two years, mostly data stuff (pandas, mpl, numpy), but also automation scripts and small web apps. I'm trying to become a better programmer and increase my Python knowledge, and one of the things that bothers me is that I have never used a class (outside of copying random Flask code for small web apps). I generally understand what they are, but I can't seem to wrap my head around why I would need them over a simple function.
To add specificity to my question: I write tons of automated reports, which always involve pulling data from multiple data sources (Mongo, SQL, Postgres, APIs), performing a lot or a little data munging and formatting, writing the data to CSV/Excel/HTML, and sending it out in an email. The scripts range from ~250 to ~600 lines. Would there be any reason for me to use classes to do this, and why?
Classes are the pillar of Object Oriented Programming. OOP is highly concerned with code organization, reusability, and encapsulation.
First, a disclaimer: OOP is partially in contrast to functional programming, which is a different paradigm used a lot in Python. Not everyone who programs in Python (or surely in most languages) uses OOP. You can do a lot in Java 8 that isn't very object-oriented. If you don't want to use OOP, then don't. If you're just writing one-off scripts to process data that you'll never use again, then keep writing the way you are.
However, there are a lot of reasons to use OOP.
Some reasons:
Organization:
OOP defines well-known and standard ways of describing and defining both data and procedure in code. Both data and procedure can be stored at varying levels of definition (in different classes), and there are standard ways of talking about these definitions. That is, if you use OOP in a standard way, it will help your later self and others understand, edit, and use your code. Also, instead of using a complex, arbitrary data storage mechanism (dicts of dicts or lists, or lists of dicts of sets, or whatever), you can name pieces of data structures and conveniently refer to them.
State: OOP helps you define and keep track of state. For instance, in a classic example, if you're creating a program that processes students (for instance, a grade program), you can keep all the info you need about them in one spot (name, age, gender, grade level, courses, grades, teachers, peers, diet, special needs, etc.), and this data is persisted as long as the object is alive, and is easily accessible. In contrast, in pure functional programming, state is never mutated in place.
Encapsulation:
With encapsulation, procedure and data are stored together. Methods (an OOP term for functions) are defined right alongside the data that they operate on and produce. In a language like Java that allows for access control, or in Python, depending upon how you describe your public API, this means that methods and data can be hidden from the user. What this means is that if you need or want to change code, you can do whatever you want to the implementation of the code, but keep the public APIs the same.
Inheritance:
Inheritance allows you to define data and procedure in one place (in one class), and then override or extend that functionality later. For instance, in Python I often see people creating subclasses of the dict class in order to add additional functionality. A common change is overriding the method that throws an exception when a key is requested from a dictionary that doesn't exist, returning a default value based on the unknown key instead. This allows you to extend your own code now or later, allow others to extend your code, and extend other people's code.
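For instance, a minimal sketch of that dict subclass using the __missing__ hook (which dict.__getitem__ calls for absent keys):

class DefaultingDict(dict):
    def __init__(self, default, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default = default

    def __missing__(self, key):
        # Called instead of raising KeyError
        return self.default

grades = DefaultingDict(0.0, math=3.3)
print(grades['math'])     # 3.3
print(grades['science'])  # 0.0 rather than a KeyError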
Reusability: All of these reasons and others allow for greater reusability of code. Object-oriented code allows you to write solid (tested) code once and then reuse it over and over. If you need to tweak something for your specific use case, you can inherit from an existing class and overwrite the existing behavior. If you need to change something, you can change it all while maintaining the existing public method signatures, and no one is the wiser (hopefully).
Again, there are several reasons not to use OOP, and you don't need to. But luckily, with a language like Python you can use just a little bit or a lot; it's up to you.
An example of the student use case (no guarantee on code quality, just an example):
Object Oriented
class Student(object):
    def __init__(self, name, age, gender, level, grades=None):
        self.name = name
        self.age = age
        self.gender = gender
        self.level = level
        self.grades = grades or {}

    def setGrade(self, course, grade):
        self.grades[course] = grade

    def getGrade(self, course):
        return self.grades[course]

    def getGPA(self):
        return sum(self.grades.values()) / len(self.grades)

# Define some students
john = Student("John", 12, "male", 6, {"math": 3.3})
jane = Student("Jane", 12, "female", 6, {"math": 3.5})

# Now we can get to the grades easily
print(john.getGPA())
print(jane.getGPA())
Standard Dict
def calculateGPA(gradeDict):
    return sum(gradeDict.values()) / len(gradeDict)

students = {}

# We can set the keys to variables so we might minimize typos
name, age, gender, level, grades = "name", "age", "gender", "level", "grades"
john, jane = "john", "jane"
math = "math"

students[john] = {}
students[john][age] = 12
students[john][gender] = "male"
students[john][level] = 6
students[john][grades] = {math: 3.3}

students[jane] = {}
students[jane][age] = 12
students[jane][gender] = "female"
students[jane][level] = 6
students[jane][grades] = {math: 3.5}

# At this point, we need to remember who the students are and where the grades
# are stored. Not a huge deal, but avoided by OOP.
print(calculateGPA(students[john][grades]))
print(calculateGPA(students[jane][grades]))
Whenever you need to maintain state for your functions and it cannot be accomplished with generators (functions which yield rather than return); generators maintain their own state.
If you want to override any of the standard operators, you need a class.
Whenever you have a use for a Visitor pattern, you'll need classes. Every other design pattern can be accomplished more effectively and cleanly with generators, context managers (which are also better implemented as generators than as classes) and POD types (dictionaries, lists, tuples, etc.).
If you want to write "pythonic" code, you should prefer context managers and generators over classes; it will be cleaner (see the sketch after this list).
If you want to extend functionality, you will almost always be able to accomplish it with containment rather than inheritance.
As with every rule, this has an exception. If you want to encapsulate functionality quickly (i.e., write test code rather than library-level reusable code), you can encapsulate the state in a class. It will be simple and won't need to be reusable.
If you need a C++-style destructor (RAII), you definitely do NOT want to use classes. You want context managers.
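A minimal sketch of that preference (file name hypothetical), using contextlib to build the context manager from a generator:

from contextlib import contextmanager

@contextmanager
def opened_resource(path):
    f = open(path)   # acquire, as a C++ constructor would
    try:
        yield f      # hand the resource to the with block
    finally:
        f.close()    # release runs no matter what, like a destructor

with opened_resource('data.txt') as resource:
    print(resource.read())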
I think you are doing it right. Classes are reasonable when you need to model some business logic or difficult real-life processes with complicated relations.
For example:
Several functions that share state
More than one copy of the same state variables
Extending the behavior of existing functionality
I also suggest you watch this classic video
dantiston gives a great answer on why OOP can be useful. However, it is worth noting that OOP is not necessarily the better choice in most cases where it is used. OOP has the advantage of combining data and methods together. In terms of application, I would say: use OOP only if all the functions/methods are dealing, and only dealing, with a particular set of data and nothing else.
Consider a functional-programming refactoring of dantiston's example:
def dictMean(nums):
    return sum(nums.values()) / len(nums)

# It's good to include automatic tests for production code,
# to ensure that updates don't break old code
assert dictMean({'math': 3.3, 'science': 3.5}) == 3.4

john = {'name': 'John', 'age': 12, 'gender': 'male', 'level': 6,
        'grades': {'math': 3.3}}

# setGrade
john['grades']['science'] = 3.5

# getGrade
print(john['grades']['math'])

# getGPA
print(dictMean(john['grades']))
At first look, it seems like all three methods exclusively deal with GPA, until you realize that Student.getGPA() can be generalized into a function that computes the mean of a dict and can be re-used on other problems, while the other two methods reinvent what dict can already do.
The functional implementation gains:
Simplicity. No boilerplate class or selfs.
Easily add automatic test code right after each function, for easy maintenance.
Easily split into several programs as your code scales.
Reusability for purposes other than computing GPA.
The functional implementation loses:
Typing 'name', 'age', 'gender' as dict keys each time is not very DRY (don't repeat yourself). It's possible to avoid that by changing the dict to a list. Sure, a list is less clear than a dict, but this is a non-issue if you include automatic test code anyway.
Issues this example doesn't cover:
OOP inheritance can be supplanted by function callbacks.
Calling an OOP class requires creating an instance of it first. This can be tedious when you don't have data for __init__(self).
A class defines a real-world entity. If you are working on something that exists individually and has its own logic, separate from that of others, you should create a class for it; for example, a class that encapsulates database connectivity.
If this is not the case, there is no need to create a class.
It depends on your idea and design. If you are a good designer, then OOP will come out naturally in the form of various design patterns.
For simple script-level processing, OOP can be overhead.
Simply consider the basic benefits of OOP, like reusability and extensibility, and make sure they are actually needed.
OOP makes complex things simpler and simple things complex.
Simply keep things simple, whether you use OOP or not. Whichever is simpler, use that.

Python class design - Splitting up big classes into multiple ones to group functionality

OK, I've got two really big classes (over 1k lines each) that I currently have split up into multiple smaller ones, which then get recombined using multiple inheritance. Now I'm wondering whether there is any cleaner/better, more Pythonic way of doing this. Completely factoring them out would result in endless amounts of self.otherself.do_something calls, which I don't think is the way it should be done.
To make things clear, here's what it currently looks like:
# GUI.py
from gui_events import GUIEvents    # event handlers
from gui_helpers import GUIHelpers  # helper methods that don't directly modify the GUI

class GUI(gtk.Window, GUIEvents, GUIHelpers):
    # general stuff here
    ...
One problem resulting from this is that Pylint complains, giving me trillions of "init not called" / "undefined attribute" / "attribute accessed before definition" warnings.
EDIT:
You may want to take a look at the code to get a picture of what the whole thing actually is.
http://github.com/BonsaiDen/Atarashii/tree/next/atarashii/usr/share/pyshared/atarashii/
Please note, I'm really trying everything to keep this thing as DRY as possible; I'm using pylint to detect code duplication, and the only thing it complains about are the imports.
If you want to use multiple inheritance to combine everything into one big class (it might make sense to do this), then you can refactor each of the parent classes so that every method and property is either private (starts with '__') or has a short 2-3 character prefix unique to that class. For example, all the methods and properties in your GUIEvents class could start with ge_, and everything in GUIHelpers could start with gh_. By doing this, you'll achieve some of the clarity of using separate sub-class instances (self.ge.doSomething() vs. self.ge_doSomething()) and you'll avoid conflicting member names, which is the main risk when combining such large classes into one.
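A minimal sketch of the renaming idea (method names invented for illustration):

class GUIEvents(object):
    def ge_on_delete(self, widget, event):
        ...  # every event handler carries the ge_ prefix

class GUIHelpers(object):
    def gh_update_status(self, text):
        ...  # every helper carries the gh_ prefix

class GUI(GUIEvents, GUIHelpers):
    def startup(self):
        self.gh_update_status('ready')  # the origin is obvious at the call site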
Start by finding classes that model real world concepts that your application needs to work with. Those are natural candidates for classes.
Try to avoid multiple inheritance as much as possible; it's rarely useful and always somewhat confusing. Instead, look to use functional composition ("HAS-A" relationships) to give rich attributes to your objects made of other objects.
Remember to make each method do one small, specific thing; this necessarily entails breaking up methods that do too many things into smaller pieces.
Refactor cases where you find many such methods are duplicating each other's functionality; this is another way to find natural collections of functionality that deserve to be in a distinct class.
I think this is more of a general OO design problem than a Python problem. Python pretty much gives you all the classic OOP tools, conveniently packaged. You'd have to describe the problem in more detail (e.g. what do the GUIEvents and GUIHelpers classes contain?).
One Python-specific aspect to consider is the following: Python supports multiple programming paradigms, and often the best solution is not OOP. This may be the case here. But again, you'll have to throw in more details to get a meaningful answer.
Your code may be substantially improved by implementing a Model-View-Controller design. Depending on how your GUI and tool are setup, you may also benefit from "widgetizing" portions of your GUI, so that rather than having one giant Model-View-Controller, you have a main Model-View-Controller that manages a bunch of smaller Model-View-Controllers, each for distinct portions of your GUI. This would allow you to break up your tool and GUI into many classes, and you may be able to reuse portions of it, reducing the total amount of code you need to maintain.
While Python does support multiple programming paradigms, for GUI tools the best solution will nearly always be an object-oriented design.
One possibility is to assign imported functions to class attributes:
In file a_part_1.py:

def add(self, n):
    self.n += n

def __init__(self, n):
    self.n = n
And in the main class file:

import a_part_1

class A:
    __init__ = a_part_1.__init__
    add = a_part_1.add
Or, if you don't want to update the main file when new methods are added:

import a_part_1

class A:
    pass

for k, v in a_part_1.__dict__.items():
    if callable(v):
        setattr(A, k, v)
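Either way, the assembled class behaves as if the methods were defined inline:

a = A(5)    # runs a_part_1.__init__
a.add(3)    # runs a_part_1.add
print(a.n)  # 8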
