SQLAlchemy Declarative: How to merge models and existing business logic classes

I would like to know what the best practices are for using SQLAlchemy declarative models within business logic code. Perhaps Code Review Stack Exchange might have been a better place to ask this, but I'm not sure.
Here's some background.
Let's say I have a bunch of classes doing various things. Most of them have little or nothing to do with each other. Each such class has between a hundred and a thousand lines of code doing things that have precious little to do with the database. In fact, most of the classes aren't even database-aware so far. I've gotten away with storing the actual information in flat files (csv, yaml, and so on), and only maintaining a serial number table and a document path to serial number mapping in the database. Each object retrieves the files it needs by getting the correct paths from the database (by serial number) and reconstructs itself from there. This has been exceedingly convenient so far, since my 'models' have been (and admittedly, continue to be) more than fluid.
As I expand the involvement of the database in the codebase I currently have, I seem to have settled on the following model, separating the database bits and the business logic into two completely separate parts, and joining them using specific function calls instead of inheritance or even composition. Here is a basic example of the kind of code I have now (pseudocode-quality):
module/db/models.py:
class Example(Base):
    id = Column(...)
    some_var = Column(...)
module/db/controller.py:
from sqlalchemy.orm.exc import NoResultFound

from .models import Example

def get_example_by_id(id, session):
    return session.query(Example).filter_by(id=id).one()

def upsert_example(session, id=None, some_var=None):
    # session comes first: required arguments can't follow defaulted ones
    if id is not None:
        try:
            example_obj = get_example_by_id(id, session)
            example_obj.some_var = some_var
            return
        except NoResultFound:
            pass
    example_obj = Example(some_var=some_var)
    session.add(example_obj)
    session.flush()
module/example.py:
from sqlalchemy.orm.exc import NoResultFound

from .db import controller

class Example(object):
    def __init__(self, id, session):
        self._id = id
        self._some_var = None
        try:
            self._load_from_db(session)
            self._defined = True
        except NoResultFound:
            self._defined = False

    def _load_from_db(self, session):
        db_obj = controller.get_example_by_id(self._id, session)
        self._some_var = db_obj.some_var

    def create(self, some_var, session):
        if self._defined is True:
            raise Exception("object already exists in the database")
        self._some_var = some_var
        self._sync_to_db(session)

    def _sync_to_db(self, session):
        controller.upsert_example(session, id=self._id, some_var=self._some_var)

    @property
    def some_var(self):
        return self._some_var
    ...
I'm not convinced this is the way to go.
I have a few models following this pattern, and many more that I should implement in time. The database is currently only used for persistence and archiving. Once something is in the database, it's more or less read only from there on in. However, querying on it is becoming important.
The reason I'm inclined to migrate from the flatfiles to the database is largely to improve scalability.
Thus far, if I wanted to find all instances (rows) of Example with some_var = 3, I'd have to construct all of the instances from the flat files and iterate through them. This seems like a waste of both processor time and memory. In many cases, some_var is actually a calculated property, computed by a fairly expensive process using source data contained in the flat file.
With the structure above, what I would do is query on Example, obtain a list of 'id's which satisfy my criterion, and then reconstruct just those module instances.
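In rough (pseudo)code, I imagine this looking something like the following, using the classes above (names assumed from my snippets):

matching_ids = [row.id for row in session.query(models.Example.id).filter_by(some_var=3)]
instances = [example.Example(id, session) for id in matching_ids]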
The ORM approach, however, as I understand it, would use thick models, where the objects returned by the query are themselves the objects I would need. I'm wondering whether it makes sense to try to move to that kind of a structure.
To that end, I have the following 'questions' / thoughts:
My instinct is that the code snippets above are anti-patterns more than they are useful patterns. I can't put my finger on why, exactly, but I'm not very happy with it. Is there a real, tangible disadvantage to the structure as listed above? Would moving to a more ORM-ish design provide advantages in functionality / performance / maintainability over this approach?
I'm paranoid about tying myself down to a database schema. I'm also paranoid about regular DB migrations. The approach listed above gives me a certain peace of mind in knowing that if I do need to do a migration, it'll be limited to the _load_from_db and _sync_to_db functions, letting me mess around willy-nilly with all the rest of the code.
Am I wrong about the cost of migrations in the thick-model approach being high?
Is my sense of security in restricting my code's db involvement more a false sense of security than a useful separation?
If I wanted to integrate Example from module/db/models.py with Example from module/example.py in the example above, what would be the cleanest way to go about it? Alternatively, what is an accepted pattern for handling business-logic-heavy models with SQLAlchemy?
In the code above, note that the business logic class keeps all of its information in 'private' instance variables, while the Model class keeps all of its information in class variables. How would integrating these two approaches actually work? Theoretically, they should still 'just work' even if put together in a single class definition. In practice, does it?
(The actual codebase is on github, though it's not likely to be very readable)

I think it's natural (at least for me) to be critical of our own designs even as we are working on them. The structures you have here seem fine to me. The answer to whether they are a good fit depends on what you plan to do.
If you consolidate your code into thick models, then all of it will be in one place and your architecture will be simpler; however, it will also probably mean that your business logic is tightly bound to the schema created in the database. Rethinking the database means rethinking large portions of other areas in the app.
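To make that concrete, here is a minimal sketch of what merging your two Example classes into one thick model could look like (the column types are assumptions, since your snippet elides them, and the method is hypothetical):

from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Example(Base):
    __tablename__ = 'example'

    id = Column(Integer, primary_key=True)
    some_var = Column(String)

    # Business logic lives directly on the mapped class. The Column
    # attributes behave like plain instance attributes on instances,
    # and like query criteria on the class itself, e.g.
    # session.query(Example).filter(Example.some_var == 'x')
    def derived_value(self):
        # hypothetical method standing in for your expensive calculation
        return (self.some_var or '').upper()

This also speaks to your class-variable vs. instance-variable question: SQLAlchemy's instrumentation makes the class-level Column attributes act as per-instance value holders, so mixing them with ordinary methods in one class definition does, in practice, 'just work'.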
Following the code sample provided here means separating the concerns, which has negative side effects such as more lines of code in more places and increased complexity, but it also means that the coupling is looser. If you stay true to this separation, you should have significantly less trouble if you decide to change your database schema or move to an entirely different form of storage. Since your business logic class is a plain old object, it serves as a nice detached state container. If you move to something else you would still have to redesign the model layer and possibly parts of the controllers, but your business logic and UI layers could remain largely unchanged.
I think the real test lies in asking how big this application is going to be and how long you plan to have it in service. If we're looking at a small application with a short life span, then the added complexity of loose coupling is a waste unless you are doing it for educational purposes. If the application is expected to grow quite large or be in service for a number of years, then the early investment in complexity should pay off in a lower cost of ownership over the long term, since making changes to the various components should be easier.
If it makes you feel any better, it's not uncommon to see POCOs and POJOs when working with ORMs such as Entity Framework and Hibernate, for the same reason.

Related

Is it worth using `select_related()` when fetching just one instance of a model instead of a queryset?

I'll keep it short. Say we have this database structure:
class Bird(models.Model):
    name = models.CharField(max_length=100)
    specie = models.CharField(max_length=100)

class Feather(models.Model):
    bird = models.ForeignKey(Bird, on_delete=models.CASCADE)
And then we have some simple lines from an APIView:
feather_id = request.query_params['feather']
feather = Feather.objects.get(pk=feather_id)
bird_name = feather.bird.name
# a lot of lines between
bird_specie = feather.bird.specie
Does it make any difference using:
feather = Feather.objects.select_related('bird').get(pk=1)
instead of:
feather = Feather.objects.get(pk=1)
in this scenario? I saw some people using select_related() in this way and I wonder if it makes any difference; if not, which one should you use? Normally I agree that select_related() is useful for optimization when using querysets, to avoid querying each instance individually, but in this case, is it worth using it when you have just one instance in the whole APIView?
In my opinion the only difference is that another query is just made later on, but when talking about performance, it's the same.
Thanks in advance.
Yes, it will make a difference, but it will probably be a very small difference. Selecting the related tables at the same time eliminates the time required for additional round trips to the database, likely a few milliseconds at most.
This may only matter to you if you have high latency connecting to the database, if there are many related tables that will be fetched in turn, and/or if the APIView has very high load (and every millisecond counts).
I generally use select_related() on single object queries, but as a stylistic choice rather than a performance choice: to indicate which other models are going to be fetched and used (explicit is better than implicit).
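If you want to see the difference for yourself, you can count the queries; this sketch assumes DEBUG=True, since Django only records queries in debug mode:

from django.db import connection, reset_queries

reset_queries()
feather = Feather.objects.get(pk=1)
name = feather.bird.name            # lazy load: this fires a second query
print(len(connection.queries))      # 2

reset_queries()
feather = Feather.objects.select_related('bird').get(pk=1)
name = feather.bird.name            # already fetched by the JOIN: no extra query
print(len(connection.queries))      # 1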

Python unit testing for methods working on complex objects

I have inherited a relatively large (~30,000 lines) Python-based project (running on a CAD system for architects) with some messy methods that I have to bugfix first, before going on with development. These methods place, say, bricks (or stuff like that) into a wall, so most of the code does 3D calculations on coordinates, vectors, etc.
There are no unit tests for the project currently (and I'm a complete noob at unit testing; I'm a brick-and-mortar architect). Because of the complexity of the functions, I have decided to support my work with unit testing; the question is how I can do it most effectively. The examples I have seen before are mostly web-based, working largely on text docs.
The incoming parameters are very complex and large objects, and I use only a little of the stored data. Obviously I have to make templates out of them.
There are two possible ways:
To save real-world data to disk as a Python pickle and later use it as a template.
To set up the objects dynamically. Note that the objects' __init__() methods are mostly like this:
class FirstClass:
    def __init__(self):
        self.x = 0
        self.y = 0
        self.fc = SecondClass()  # nested objects are built the same way

class SecondClass:
    def __init__(self):
        self.blabla = 0
and so on; there are no complicated calculations. Obviously I can put in my custom data by overwriting the initialized instance variables, like this:
objects.in_the_test_used_data = some_numbers
My question is which is the better method for templates, or whether there is a better approach for this altogether.
Thx
Both approaches are valid, but with some small changes.
For the first approach, you can pickle everything, but it might be easier to maintain a json/xml/etc. file; if you can do that, changing the data in the future will be an easy edit rather than re-pickling.
For the second approach, you can define your own test classes/test instances, but the real solution would be to use a factory library; I personally love factory_boy. It lets you define factories for your classes and helps you generate instances easily.
For instance:
import datetime

import factory

class AccountFactory(factory.Factory):
    class Meta:
        model = objects.Account

    username = factory.Sequence(lambda n: 'john%s' % n)
    email = factory.LazyAttribute(lambda o: '%s@example.org' % o.username)
    date_joined = factory.LazyFunction(datetime.datetime.now)
This will let you call AccountFactory() and get an Account object.
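For example (assuming the AccountFactory above):

account = AccountFactory()                 # an Account with username 'john0', etc.
accounts = AccountFactory.build_batch(3)   # three Account instances in one call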
Personally, I prefer the second method: it lets you be flexible about your data and make changes, it's easy to read, and it has a great API. You will also avoid large files that have to be committed to your version control and that won't really be readable.

When should I be using classes in Python?

I have been programming in python for about two years; mostly data stuff (pandas, mpl, numpy), but also automation scripts and small web apps. I'm trying to become a better programmer and increase my python knowledge and one of the things that bothers me is that I have never used a class (outside of copying random flask code for small web apps). I generally understand what they are, but I can't seem to wrap my head around why I would need them over a simple function.
To add specificity to my question: I write tons of automated reports which always involve pulling data from multiple data sources (mongo, sql, postgres, apis), performing a lot or a little data munging and formatting, writing the data to csv/excel/html, and sending it out in an email. The scripts range from ~250 lines to ~600 lines. Would there be any reason for me to use classes to do this, and why?
Classes are the pillar of Object Oriented Programming. OOP is highly concerned with code organization, reusability, and encapsulation.
First, a disclaimer: OOP is partially in contrast to Functional Programming, which is a different paradigm used a lot in Python. Not everyone who programs in Python (or surely most languages) uses OOP. You can do a lot in Java 8 that isn't very Object Oriented. If you don't want to use OOP, then don't. If you're just writing one-off scripts to process data that you'll never use again, then keep writing the way you are.
However, there are a lot of reasons to use OOP.
Some reasons:
Organization:
OOP defines well-known and standard ways of describing and defining both data and procedure in code. Both data and procedure can be stored at varying levels of definition (in different classes), and there are standard ways of talking about these definitions. That is, if you use OOP in a standard way, it will help your later self and others understand, edit, and use your code. Also, instead of using a complex, arbitrary data storage mechanism (dicts of dicts or lists, or dicts of lists of dicts of sets, or whatever), you can name pieces of data structures and conveniently refer to them.
State: OOP helps you define and keep track of state. For instance, in a classic example, if you're creating a program that processes students (for instance, a grade program), you can keep all the info you need about them in one spot (name, age, gender, grade level, courses, grades, teachers, peers, diet, special needs, etc.), and this data is persisted as long as the object is alive, and is easily accessible. In contrast, in pure functional programming, state is never mutated in place.
Encapsulation:
With encapsulation, procedure and data are stored together. Methods (an OOP term for functions) are defined right alongside the data that they operate on and produce. In a language like Java that allows for access control, or in Python, depending upon how you describe your public API, this means that methods and data can be hidden from the user. What this means is that if you need or want to change code, you can do whatever you want to the implementation of the code, but keep the public APIs the same.
Inheritance:
Inheritance allows you to define data and procedure in one place (in one class), and then override or extend that functionality later. For instance, in Python, I often see people creating subclasses of the dict class in order to add additional functionality. A common change is to override the lookup method that raises an exception when a requested key doesn't exist, so that it returns a default value computed from the unknown key instead. This allows you to extend your own code now or later, allow others to extend your code, and allows you to extend other people's code.
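For instance, a minimal sketch of that dict subclass (the default-value rule here is made up):

class DefaultingDict(dict):
    # dict.__getitem__ calls __missing__ on subclasses instead of
    # raising KeyError when a key is absent
    def __missing__(self, key):
        return 'default for %s' % key

d = DefaultingDict()
print(d['anything'])  # 'default for anything' rather than a KeyError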
Reusability: All of these reasons and others allow for greater reusability of code. Object oriented code allows you to write solid (tested) code once, and then reuse over and over. If you need to tweak something for your specific use case, you can inherit from an existing class and overwrite the existing behavior. If you need to change something, you can change it all while maintaining the existing public method signatures, and no one is the wiser (hopefully).
Again, there are several reasons not to use OOP, and you don't need to. But luckily with a language like Python, you can use just a little bit or a lot, it's up to you.
An example of the student use case (no guarantee on code quality, just an example):
Object Oriented
class Student(object):
    def __init__(self, name, age, gender, level, grades=None):
        self.name = name
        self.age = age
        self.gender = gender
        self.level = level
        self.grades = grades or {}

    def setGrade(self, course, grade):
        self.grades[course] = grade

    def getGrade(self, course):
        return self.grades[course]

    def getGPA(self):
        return sum(self.grades.values()) / len(self.grades)
# Define some students
john = Student("John", 12, "male", 6, {"math":3.3})
jane = Student("Jane", 12, "female", 6, {"math":3.5})
# Now we can get to the grades easily
print(john.getGPA())
print(jane.getGPA())
Standard Dict
def calculateGPA(gradeDict):
    return sum(gradeDict.values()) / len(gradeDict)
students = {}
# We can set the keys to variables so we might minimize typos
name, age, gender, level, grades = "name", "age", "gender", "level", "grades"
john, jane = "john", "jane"
math = "math"
students[john] = {}
students[john][age] = 12
students[john][gender] = "male"
students[john][level] = 6
students[john][grades] = {math:3.3}
students[jane] = {}
students[jane][age] = 12
students[jane][gender] = "female"
students[jane][level] = 6
students[jane][grades] = {math:3.5}
# At this point, we need to remember who the students are and where the grades are stored. Not a huge deal, but avoided by OOP.
print(calculateGPA(students[john][grades]))
print(calculateGPA(students[jane][grades]))
Whenever you need to maintain state for your functions and it cannot be accomplished with generators (functions which yield rather than return), you need a class. Generators maintain their own state.
If you want to override any of the standard operators, you need a class.
Whenever you have a use for a Visitor pattern, you'll need classes. Every other design pattern can be accomplished more effectively and cleanly with generators, context managers (which are also better implemented as generators than as classes) and POD types (dictionaries, lists and tuples, etc.).
If you want to write "pythonic" code, you should prefer context managers and generators over classes. It will be cleaner.
If you want to extend functionality, you will almost always be able to accomplish it with containment rather than inheritance.
As with every rule, this has an exception. If you want to encapsulate functionality quickly (i.e., write test code rather than library-level reusable code), you can encapsulate the state in a class. It will be simple and won't need to be reusable.
If you need a C++-style destructor (RAII), you definitely do NOT want to use classes. You want context managers.
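To illustrate the context-manager-as-generator point above, here is the standard contextlib pattern (a sketch; the managed resource is arbitrary):

from contextlib import contextmanager

@contextmanager
def managed_file(path):
    f = open(path)    # setup runs before the with-block body
    try:
        yield f       # the body of the with-statement runs here
    finally:
        f.close()     # teardown runs even if the body raises

# usage:
# with managed_file('notes.txt') as f:
#     print(f.read())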
I think you're doing it right. Classes are reasonable when you need to model some business logic or complex real-life processes with complicated relations.
For example:
Several functions that share state
More than one copy of the same state variables
Extending the behavior of existing functionality
I also suggest you watch this classic video.
dantiston gives a great answer on why OOP can be useful. However, it is worth noting that OOP is not necessarily the better choice in most cases where it is used. OOP has the advantage of combining data and methods together. In terms of application, I would say to use OOP only if all the functions/methods are dealing, and only dealing, with a particular set of data and nothing else.
Consider a functional-programming refactoring of dantiston's example:
def dictMean(nums):
    return sum(nums.values()) / len(nums)

# It's good to include automatic tests for production code, to ensure that updates don't break old code
assert dictMean({'math': 3.3, 'science': 3.5}) == 3.4
john = {'name':'John', 'age':12, 'gender':'male', 'level':6, 'grades':{'math':3.3}}
# setGrade
john['grades']['science']=3.5
# getGrade
print(john['grades']['math'])
# getGPA
print(dictMean(john['grades']))
At first look, it seems like all three methods exclusively deal with GPA, until you realize that Student.getGPA() can be generalized into a function that computes the mean of a dict and is reusable on other problems, and that the other two methods reinvent what dict can already do.
The functional implementation gains:
Simplicity. No boilerplate class or selfs.
Easily add automatic test code right after each function for easy maintenance.
Easily split into several programs as your code scales.
Reusability for purposes other than computing GPA.
The functional implementation loses:
Typing 'name', 'age', 'gender' as dict keys each time is not very DRY (don't repeat yourself). It's possible to avoid that by changing the dict to a list. Sure, a list is less clear than a dict, but this is a non-issue if you include automatic test code below anyway.
Issues this example doesn't cover:
OOP inheritance can be supplanted by function callbacks.
Calling an OOP class requires creating an instance of it first. This can be tedious when you don't have data for __init__(self).
A class defines a real-world entity. If you are working on something that exists individually and has its own logic, separate from the logic of others, you should create a class for it. For example, a class that encapsulates database connectivity.
If this is not the case, there's no need to create a class.
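For example, a minimal sketch of such a class (sqlite3 is used purely for illustration):

import sqlite3

class DatabaseConnection(object):
    # exists individually and owns its own state (the connection) and logic
    def __init__(self, path):
        self._conn = sqlite3.connect(path)

    def fetch_one(self, query, params=()):
        return self._conn.execute(query, params).fetchone()

    def close(self):
        self._conn.close()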
It depends on your idea and design. If you are a good designer, then OOP will come out naturally in the form of various design patterns.
For simple script-level processing, OOP can be overhead.
Simply consider the basic benefits of OOP, like reusability and extensibility, and decide whether they are needed or not.
OOP makes complex things simpler and simple things complex.
Simply keep things simple either way, with OOP or without it. Whichever is simpler, use that.

How are the objects in the classic single responsibility principle example supposed to communicate?

I'm potentially facing a refactoring project at work and am having a little trouble grasping the single responsibility principle example that shows up on most websites. It's the one about separating the connection and send/receive methods of a modem into two different objects. The project is in Python, by the way, but I don't think this is a language-specific issue.
Currently I'm working to break up a 1300 line web service driver class that somebody created (arbitrarily split into two classes but they are essentially one). On the level of responsibility I understand I need to break the connectivity, configuration, and XML manipulation responsibilities into separate classes. Right now all is handled by the class using string manipulations and the httplib.HTTPConnection object to handle the request.
So according to this example I would have a class to handle only the http connection, and a class to transfer that data across that connection, but how would these communicate? If I require a connection to be passed in when constructing the data transfer class, does that re-couple the classes? I'm just having trouble grasping how this transfer class actually accesses the connection that has been made.
With a class that huge (> 1000 lines of code) you have more to worry about than only the SRP or the DIP. I have (or "I fight") classes of similar size, and from my experience you have to write unit tests where possible. Refactor carefully (very carefully!). Automated testing is your friend, be it unit testing as mentioned, regression testing, integration testing, acceptance testing, or whatever you are able to execute automatically. Then refactor. And then run the tests. Refactor again. Test. Refactor. Test.
There is a very good book that describes this process: Michael Feathers' "Working Effectively with Legacy Code". Read it.
For example, draw a picture that shows the dependencies of all methods and members of this class. That might help you to identify different "areas" of responsibility.
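For example, a characterization test in the spirit of Feathers' book pins down what the code does now, before you change anything; the function here is a runnable stand-in for your real driver method:

import unittest

def build_request(payload):  # stand-in for the legacy method under test
    return '<request>%s</request>' % payload

class CharacterizationTest(unittest.TestCase):
    def test_build_request_matches_captured_output(self):
        # the expected value is captured from a real run, not from a spec
        self.assertEqual(build_request('abc'), '<request>abc</request>')

if __name__ == '__main__':
    unittest.main()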

Pythonic Django object reuse

I've been racking my brain on this for the last few weeks and I just can't seem to understand it. I'm hoping you folks here can give me some clarity.
A LITTLE BACKGROUND
I've built an API to help serve a large website and, like all of us, I am trying to keep the API as efficient as possible. Part of this efficiency is to NOT create an object that contains custom business logic over and over again (example: a service class) as requests are made. To give some personal background, I come from the Java world, so I'm used to using IoC or DI to handle object creation and injection into my classes, to ensure classes are NOT created over and over on a per-request basis.
WHAT I'VE READ
While looking at many Python IoC and DI posts I've become rather confused about how best to approach creating a given class without having to worry about the server getting overloaded with too many objects based on the number of requests it may be handling.
Some people say an IoC or DI really isn't needed. But as I run my Django app, I find that unless I construct the object I want globally (at the top of the file) for views.py to use later, rather than within each view class or def within views.py, I run the risk of creating multiple instances of the same type, which from what I understand would cause memory bloat on the server.
So what's the right, pythonic way to keep objects from being built over and over? Should I invest in using an IoC / DI or not? Can I safely rely on setting up my service.py files to just contain defs instead of classes that contain defs? Is the garbage collector just THAT efficient, so that I don't even have to worry about it?
I've purposely not placed any code in this post since this seems like a general questions, but I can provide a few code examples if that helps.
Thanks
From a confused engineer that wants to be as pythonic as possible
You come from a background where everything needs to be a class, I've programmed web apps in Java too, and sometimes it's harder to unlearn old things than to learn new things, I understand.
In Python / Django you wouldn't make anything a class unless you need many instances and need to keep state.
For a service that's hardly the case, and you'll sometimes notice that in Java-like web apps some services are made singletons, which is just a workaround and a rather big anti-pattern in Python.
Pythonic
Python is flexible enough that a "service class" isn't required; you'd just have a Python module (e.g. services.py) with a number of functions, the emphasis being on each function taking in something and returning something, in a completely stateless fashion.
# services.py
# This is a module; it doesn't keep any state within.
# It may read and write to the DB, do some processing, etc., but it doesn't remember things.

def get_scores(student_id):
    return Score.objects.filter(student=student_id)

# views.py
# receives HTTP requests

def view_scores(request, student_id):
    scores = services.get_scores(student_id)
    # e.g. use the scores queryset in a template and return an HTML page
Notice how, if you need to swap out the service, you'll just be swapping out a single Python module (just a file, really), which is why Pythonistas hardly bother with explicit interfaces and other abstractions.
Memory
Now, per each Django worker process, you'd have that one services module, used over and over for all requests that come in; when the Score queryset is used and no longer referenced in memory, it'll be cleaned up.
I saw your other post, and, well, instantiating a ScoreService object for each request, or keeping an instance of it in global scope, is just unnecessary; the above example does the job with one module in memory and doesn't need us to be smart about it.
And if you did need to keep state in between several requests, keeping it in live instances of ScoreService would be a bad idea anyway, because now every user might need one instance; that's not viable (too many live objects keeping context). Not to mention that an instance is only accessible from the same process unless you have some sharing mechanism in place.
Keep state in a datastore
In case you want to keep state in between requests, you'd keep the state in a datastore; when a request comes in, you hit the services module again to get the context back from the datastore, pick up where you left off, do your business, and return your HTTP response, after which unused things will get garbage collected.
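A minimal sketch of that flow, using Django's cache framework as the datastore (the key names and the context shape are illustrative):

from django.core.cache import cache

def load_context(key):
    # pick up where we left off, or start fresh
    return cache.get(key) or {'step': 0}

def save_context(key, context):
    cache.set(key, context, timeout=3600)  # visible to every worker process

def view_step(request):
    key = 'ctx:%s' % request.session.session_key
    context = load_context(key)
    context['step'] += 1
    save_context(key, context)
    # ...build and return the HTTP response; locals are then garbage collected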
The emphasis being on keeping things stateless, where any given HTTP request can be processed on any given django process, and all state objects are garbage collected after the response is returned and objects go out of scope.
This may not be the fastest request/response cycle we can pull off, but it's scalable as hell.
Look at some major web apps written in Django
I suggest you look at some open-source Django projects and see how they're organized; you'll find that a lot of the things you're busting your brain over, Djangonauts just don't bother with.
