Parent/Child(ren) Hierarchy / "Nested Sets", in Python/Django

I'm using Django/Python, but pseudo-code is definitely acceptable here.
Working with some models that already exist, I have Employees that each have a Supervisor, which is essentially a Foreign Key type relationship to another Employee.
Where the Employee/Supervisor hierarchy is something like this:
Any given Employee has ONE Supervisor. That Supervisor may have one or more Employees "beneath", and has his/her own Supervisor as well. Retrieving my "upline" should return my supervisor, his supervisor, her supervisor, etc., until reaching an employee that has no supervisor.
Without going hog-wild and installing new apps to manage these relationships, as this is an existing codebase and project, I'm wondering the "pythonic" or correct way to implement the following functions:
    def get_upline(employee):
        # Get a flat list of Employee objects that are
        # 'supervisors' to each other, starting with
        # the given Employee.
        pass

    def get_downline(employee):
        # Starting with the given Employee, find and
        # return a flat list of all other Employees
        # that are "below".
        pass
I feel like there may be a somewhat simple way to do this with the Django ORM, but if not, I'll take any suggestions.
I haven't thoroughly checked out Django-MPTT, but if I can leave the models intact and simply gain more functionality, it would be worth it.

You don't have to touch your models to be able to use django-mptt; you just have to create a parent field on your model. django-mptt creates all the other MPTT attributes automatically when you register your model: mptt.register(MyModel).
If you just need the 'upline' hierarchy, though, you don't need nested sets. The bigger performance problem is going in the opposite direction and collecting, e.g., children or leaves; that is what makes a nested set model necessary.

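Relational databases are not good at this kind of graph query, so your only option is to run a bunch of queries. Here is a recursive implementation (both lists include the given employee):

    def get_upline(employee):
        # Walk up the supervisor chain; one query per level.
        if employee.supervisor:
            return [employee] + get_upline(employee.supervisor)
        else:
            return [employee]

    def get_downline(employee):
        # minion_set stands for the reverse accessor of the supervisor
        # foreign key; the real name depends on your model definition.
        downline = [employee]
        for minion in employee.minion_set.all():
            downline.extend(get_downline(minion))
        return downline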
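
The same traversal can also be written without recursion. The following is only a sketch under assumptions: Employee here is a plain Python stand-in for the Django model, and its reports list is a hypothetical replacement for the reverse foreign-key accessor. Unlike the functions above, it leaves the starting employee out of the result, which matches the "upline" described in the question, and it stops if the supervisor chain ever loops back on itself.

```python
from collections import deque


class Employee:
    """Plain stand-in for the Django model (hypothetical, for illustration)."""

    def __init__(self, name, supervisor=None):
        self.name = name
        self.supervisor = supervisor
        self.reports = []  # stand-in for the reverse foreign-key accessor
        if supervisor is not None:
            supervisor.reports.append(self)


def get_upline(employee):
    """Return the supervisors above `employee`, nearest first."""
    upline = []
    seen = {id(employee)}  # guards against a cyclic supervisor chain
    current = employee.supervisor
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        upline.append(current)
        current = current.supervisor
    return upline


def get_downline(employee):
    """Return everyone below `employee`, in breadth-first order."""
    downline = []
    queue = deque(employee.reports)
    while queue:
        current = queue.popleft()
        downline.append(current)
        queue.extend(current.reports)
    return downline
```

With a real model, each step up or down is still one database query, so the cost grows with the depth and size of the hierarchy.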

Related

How to use active record as DTO to my business objects

I have struggled with this problem for a long time. I searched the whole internet for a solution, but nothing was acceptable to me.
In short, in response to Daniel's comment:
I would like to use Django's Active Record ORM objects, but only as DTOs with the ability to save the data back to the database, including complex relationships.
This way my business objects would be independent of their data sources and would only contain behavior.
Long version:
We started a project which looked very simple in terms of the business logic required, so I picked Django as a framework to make things easier. The problem got complex, and the implementation may be used in a much larger project, so I want to decouple my business objects from Active Record and use Django only as a DB backend and as something that uses my objects. The UI has been decoupled from the start; it only calls a REST API provided by the Django backend.
My problems with an example:
I have a Request model which is connected to many other models. This request contains some requirement specifications related to networks. These networks are associated with a Cloud model. The networks under a cloud will be connected to the reservation which is generated from the request (currently they are connected to the request's network descriptors to determine the configurations during queries).

    class Request(Model):
        ...  # bunch_of_stuff_here

    class NetworkDescriptor(Model):
        request = ForeignKey(Request)
        configA = ...
        configB = ...

    class Cloud(Model):
        ...

    class Network(Model):
        cloud = ForeignKey(Cloud)
        used_by = ForeignKey(NetworkDescriptor)

Solutions I considered:
1) Embed the models in the BOs and delegate to them.
This leads to the following problem: when I try to access the networks of a cloud and simply delegate, for example with

    def get_networks(self):
        return orm_dto.networks

I get back AR objects, which is bad. I don't want to leak the AR details here. I would have to write a mapper from ORMNetworks to BONetworks, which tracks the changes of networks (network.used_by = request). This is by definition an object mapper... which I already have (namely the ORM), and I don't want this. :)
2) Embed the models but allow only high-level interaction. This sounds much more object-oriented, but I still don't know how to do it:

    class Cloud:
        def serve_request(self, bo_request):
            ???

The result should be a bunch of networks where the used_by field is set to the ORMRequest object which is behind the BORequest parameter. How should I get this information? If I don't want to leak the ORM details, again... I have to write something which tracks the BO objects and can map them to AR objects, for example:

    class Cloud:
        def serve_request(self, bo_request):
            for net in self._find_matching_networks(bo_request):
                net.used_by = repo.get_ar_from_bo(bo_request)

which is again... not the best solution because I have to write the mapper, but in this case it seems much easier because I don't have to take care of the related fields.
3) Use the Template Method pattern and make the AR objects only a data source.

    class Resource(object):
        def __le__(self, other: 'Resource'):
            return self.get_cpu() <= other.get_cpu()

    class Request(Model, Resource):
        def get_cpu(self):
            return self._cpu

This is again a solution which I don't like, because in this case the AR is still the BO. I can't unit test it effectively without reaching the DB, but at least on the code level the business part is separated from the data access part.
4) Switch completely to SQLAlchemy. The problem is that the administration of AR objects through Django is much easier than it would be with SQLAlchemy, but maybe this is the ultimate solution.
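
Option 2 can be made concrete with a small sketch. Everything in it is an assumption for illustration: NetworkRecord is a plain stand-in for the ORM row, and NetworkRepository, free_in_cloud and assign are hypothetical names rather than anything from Django. The point is only that the business objects exchange identities (ids), so no ORM object crosses the boundary.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class NetworkRecord:
    """Stand-in for the ORM row (hypothetical; a Django model in practice)."""
    id: int
    cloud_id: int
    used_by_id: Optional[int] = None


@dataclass(frozen=True)
class BORequest:
    """Business object: an identity plus the data its behaviour needs."""
    id: int
    networks_needed: int


class NetworkRepository:
    """Hypothetical mapper between business identities and stored records."""

    def __init__(self, records: List[NetworkRecord]):
        self._records: Dict[int, NetworkRecord] = {r.id: r for r in records}

    def free_in_cloud(self, cloud_id: int) -> List[int]:
        # Only ids leave the repository, never the records themselves.
        return [r.id for r in self._records.values()
                if r.cloud_id == cloud_id and r.used_by_id is None]

    def assign(self, network_id: int, request_id: int) -> None:
        self._records[network_id].used_by_id = request_id


class Cloud:
    """Business object that exposes only high-level behaviour."""

    def __init__(self, cloud_id: int, repo: NetworkRepository):
        self.id = cloud_id
        self._repo = repo

    def serve_request(self, request: BORequest) -> List[int]:
        """Reserve networks for the request and return the reserved ids."""
        free = self._repo.free_in_cloud(self.id)
        if len(free) < request.networks_needed:
            raise ValueError("not enough free networks in this cloud")
        chosen = free[:request.networks_needed]
        for network_id in chosen:
            self._repo.assign(network_id, request.id)
        return chosen
```

In unit tests the repository can be built from in-memory records like this; in production it would wrap the ORM queries, which keeps the database out of the business-logic tests.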

How to implement composition/aggregation with NDB on GAE

How do we implement aggregation or composition with NDB on Google App Engine? What is the best way to proceed depending on the use case?
Thanks!
I've tried to use a repeated property. In this very simple example, a Project has a list of Tag keys (I have chosen to code it this way instead of using StructuredProperty because many Project objects can share Tag objects).

    class Project(ndb.Model):
        name = ndb.StringProperty()
        tags = ndb.KeyProperty(kind=Tag, repeated=True)
        budget = ndb.FloatProperty()
        date_begin = ndb.DateProperty(auto_now_add=True)
        date_end = ndb.DateProperty(auto_now_add=True)

        @classmethod
        def all(cls):
            return cls.query()

        @classmethod
        def addTags(cls, from_str):
            tagname_list = from_str.split(',')
            tag_list = []
            for tag in tagname_list:
                tag_list.append(Tag.addTag(tag))
            cls.tags = tag_list

--
Edit (2):
Thanks. In the end, I chose to create a new model class, 'Relation', representing a relation between two entities. It is more of an association; I admit that my first design was unsuitable.

An alternative would be to use BigQuery. At first we used NDB, with a RawModel which stores individual, non-aggregated records, and an AggregateModel, which stores the aggregate values.
The AggregateModel was updated every time a RawModel was created, which caused some inconsistency issues. In hindsight, properly using parent/ancestor keys as Tim suggested would've worked, but in the end we found BigQuery much more pleasant and intuitive to work with.
We just have cron jobs that run every day: one to push RawModel records to BigQuery and another to create the AggregateModel records with data fetched from BigQuery.
(Of course, this is only effective if you have lots of data to aggregate)

It really does depend on the use case. For small numbers of items StructuredProperty and repeated properties may well be the best fit.
For large numbers of entities you will then look at setting the parent/ancestor in the Key for composition, and at having a KeyProperty pointing to the primary entity in a many-to-one aggregation.
However, the choice will also depend heavily on the actual usage pattern, and then considerations of efficiency kick in.
The best I can suggest is to consider carefully how you plan to use these relationships: how active they are (i.e. are they constantly changing, with members being added and deleted), and whether you need to see all members of the relation most of the time or just subsets. These considerations may well require adjustments to the approach.

Is querying NDB JsonProperty in Google App Engine possible? If not, any alternatives?

Is there any way of using JsonProperties in queries in NDB/GAE? I can't seem to find any information about this.
Person.query(Person.custom.eye_color == "blue").fetch()
With a model looking something like this:

    class Person(ndb.Model):
        height = ndb.IntegerProperty(default=-1)
        #...
        #...
        custom = ndb.JsonProperty(indexed=False, compressed=False)

The use case is this: I'm storing data about customers, where at first we only needed to query specific data. Now we want to be able to query for any type of registered data about the persons, for example eye color, which some may have put into the system, or any other custom key/value pair in our JsonProperty.
I know about the Expando class, but to me it seems a lot easier to be able to query the JsonProperty and to keep all the custom properties under the same "name", custom. That means that the front end can just loop over the properties in custom. If an Expando class were used, it would be harder to tell the custom properties apart.

Rather than using a JsonProperty, have you considered using a StructuredProperty? You maintain the same structure, just stored differently, and you can filter by subcomponents of the StructuredProperty. There are some restrictions, but that may be sufficient.
See https://developers.google.com/appengine/docs/python/ndb/queries#filtering_structured_properties for querying StructuredProperties.

How to setup a 3-tier web application project

EDIT:
I have added the [MVC] and [design-patterns] tags to expand the audience for this question, as it is more of a generic programming question than something that has directly to do with Python or SQLAlchemy. It applies to all applications with business logic and an ORM.
The basic question is whether it is better to keep business logic in separate modules, or to add it to the classes that our ORM provides:
We have a Flask/SQLAlchemy project for which we have to set up a structure to work in. There are two valid opinions on how to set things up, and before the project really starts taking off we would like to make up our minds on one of them.
If any of you could give us some insights on which of the two would make more sense and why, and what the advantages/disadvantages would be, it would be greatly appreciated.
My example is an HTML letter that needs to be sent in bulk and/or displayed to a single user. The letter can have sections that display an invoice and/or a list of articles for the user it is addressed to.
Method 1:
Split the code into 3 tiers - 1st tier: web interface, 2nd tier: processing of the letter, 3rd tier: the models from the ORM (sqlalchemy).
The website will call a server side method in a class in the 2nd tier, the 2nd tier will loop through the users that need to get this letter and it will have internal methods that generate the HTML and replace some generic fields in the letter, with information for the current user. It also has internal methods to generate an invoice or a list of articles to be placed in the letter.
In this method, the 3rd tier is only used for fetching data from the database and perhaps some database related logic like generating a full name from a users' first name and last name. The 2nd tier performs most of the work.
Method 2:
Split the code into the same three tiers, but only perform the loop through the collection of users in the 2nd tier.
The methods for generating HTML, invoices and lists of articles are all added as methods to the model definitions in tier 3 that the ORM provides. The 2nd tier performs the loop, but the actual functionality is enclosed in the model classes in the 3rd tier.
We concluded that both methods could work, and both have pros and cons:
Method 1:
- separates business logic completely from database access
- avoids pulling in a lot of methods/functionality that we might not need whenever an ORM model is imported, and keeps the code for the model classes more compact
- might be easier to use when mocking out ORM models for testing
Method 2:
- seems to be in line with the way Django does things in Python
- allows simple access to methods: when a model instance is present, any function it performs can be called immediately (in my example: when I have a letter instance available, I can directly call a method on it that generates the HTML for that letter)
- lets you pass instances around, having all appropriate methods at hand
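
To make the comparison concrete, here is a minimal sketch of both placements side by side. The names are hypothetical: User is a plain stand-in for a SQLAlchemy model, LetterService stands for a tier-2 class, and the "{name}" placeholder is an invented template convention.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class User:
    """Stand-in for an ORM model (tier 3)."""
    first_name: str
    last_name: str

    @property
    def full_name(self) -> str:
        # Database-related convenience that both methods keep on the model.
        return f"{self.first_name} {self.last_name}"

    def render_letter(self, template: str) -> str:
        # Method 2: the rendering logic lives on the model itself.
        return template.replace("{name}", self.full_name)


class LetterService:
    """Method 1: tier 2 owns the rendering and only reads from the model."""

    def __init__(self, template: str):
        self.template = template

    def render(self, user: User) -> str:
        return self.template.replace("{name}", user.full_name)

    def render_bulk(self, users: List[User]) -> List[str]:
        # The loop over users sits in tier 2 under both methods.
        return [self.render(user) for user in users]
```

Under Method 1 the service can be tested with any object that has a full_name attribute; under Method 2 the call site is shorter, but the test needs a model instance.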

Normally, you use the MVC pattern for this kind of stuff, but most web frameworks in Python have dropped the "Controller" part since they believe that it is an unnecessary component. In my development I have realized that this is somewhat true: I can live without it. That would leave you with two layers: the view and the model.
The question is where to put business logic now. In a practical sense, there are two ways of doing this, or at least two ways that I am confronted with when deciding where to put logic:
1. Create special internal view methods that handle logic that might be needed in more than one view, e.g. _process_list_data.
2. Create functions that are related to a model, but not directly tied to a single instance, inside a corresponding model module, e.g. check_login.
To elaborate: I use the first one for strictly display-related methods, i.e. they are somehow concerned with processing data for display purposes. My above example, _process_list_data, lives inside a view class (which groups methods by purpose), but could also be a normal function in a module. It receives some parameters, e.g. the data list, and somehow formats it (for example, it may add additional view parameters so the template can have less logic). It then returns the data set to the original view function, which can either pass it along or process it further.
The second one is used for most other logic, which I like to keep out of my direct view code for easier testing. My example of check_login does this: it is a function that is not directly tied to display output, as its purpose is to check the user's login credentials and decide to either return a user or report a login failure (by throwing an exception, returning False or returning None). However, this functionality is not directly tied to a model either, so it cannot live inside an ORM class (well, it could be a staticmethod on the User object). Instead it is just a function inside a module (remember, this is Python: you should use the simplest approach available, and functions are there for a reason).
To sum this up: display logic in the view, all the other stuff in the model, since most logic is somehow tied to specific models. And if it is not, create a new module or package just for logic of this kind. For example, I often create a util module/package for helper functions that are not directly tied to any view, model or anything else, such as a function to format dates that is called from the template but contains so much Python code that it would be ugly to define it inside a template.
Now we bring this logic to your task: processing/creation of letters. Since I don't know exactly what processing needs to be done, I can only give general recommendations based on my assumptions.
Let's say you have some data and want to bring it into a letter. So for example you have a list of articles and a customer who bought these articles. In that case, you already have the data. The only thing that may need to be done before passing it to the template is reformatting it in such a way that the template can easily use it. For example, it may be desired to order the purchased articles by the amount, the price or the article number. This is something that is independent of the model; the order is now only display-related (you could have specified the order already in your database query, but let's assume you didn't). In this case, this is an operation your view would do, so your template has the data ready-formatted to be displayed.
Now let's say you want to get the data to create a specific letter, for example a list of articles the user bought over time, together with the date when they were bought and other details. This would be the model's job, e.g. create a query, fetch the data and make sure it has all the properties required for this specific task.
Let's say in both cases you wish to retrieve a price for the product, and that price is determined by a base value and some percentages based on other properties. This would make sense as a model method, as it operates on a single product or order instance. You would then pass the model to the template and call the price method inside it. But you might as well reformat it in such a way that the call is already made in the view and the template only gets tuples or dictionaries. This would make it easier to pass the same data out as an API (see below), but it might not necessarily be the easiest/best way.
A good rule for this decision is to ask yourself: "If I were to provide a JSON API in addition to my standard view, how would I need to modify my code to be as DRY as possible?" If the theoretical exercise is not enough at the start, build some APIs for the templates and see where you need to change things so that the API makes sense next to the views themselves. You may never use this API, and so it does not need to be perfect, but it can help you figure out how to structure your code. However, as you saw above, this doesn't necessarily mean that you should preprocess the data in such a way that you only return things that can be turned into JSON; instead you might want to do some JSON-specific formatting for the API view.
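
As a rough illustration of that rule, the sketch below keeps the price calculation on the model and the display-oriented preprocessing in one helper that an HTML view and a JSON view can share. All names and fields are hypothetical: Product is a plain stand-in for an ORM model, and letter_context and letter_json stand for two views.

```python
import json
from dataclasses import dataclass


@dataclass
class Product:
    """Stand-in for an ORM model (hypothetical fields)."""
    number: int
    name: str
    base_price: float
    surcharge_percent: float = 0.0

    def price(self) -> float:
        # Business rule tied to a single instance, so it lives on the model.
        return round(self.base_price * (1 + self.surcharge_percent / 100), 2)


def _process_list_data(products):
    """Display-oriented helper: order by article number and flatten to dicts."""
    ordered = sorted(products, key=lambda p: p.number)
    return [{"number": p.number, "name": p.name, "price": p.price()}
            for p in ordered]


def letter_context(products):
    """What an HTML view would hand to its template."""
    return {"articles": _process_list_data(products)}


def letter_json(products):
    """What a JSON API view would return, reusing the same helper."""
    return json.dumps(letter_context(products))
```

Because the template only receives dictionaries here, adding the JSON view costs one line; passing model instances to the template instead would work too, but the API view would then need its own serialisation step.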
So I went on a little longer than I intended, but I wanted to provide some examples to you because that is what I missed when I started and found out those things via trial and error.

Python AppEngine Sort By Referenced Property

I have a model Entry

    class Entry(db.Model):
        year = db.StringProperty()
        .
        .
        .

and for whatever reason the last name field is stored in a different model LastName:

    class LastName(db.Model):
        entry = db.ReferenceProperty(Entry, collection_name='last_names')
        last_name = db.StringProperty()

If I query Entry and sort it by year (or any other property) using .order(), how would I then sort that by the last name? I'm new to Python, but coming from Java I would guess there's some kind of comparator equivalent; or I'm completely wrong and there's another way to do it. I definitely cannot change my model at this point in time, though that may be the solution later. Any suggestions?
EDIT: I'm currently paginating through the results using offsets (moving to cursors soon, but I think it would be the same issue). So if I try to sort outside of the datastore I would only be sorting the current set; it's possible that the first page will be all 'B's and the second page will have 'A's, so it will only be sorted by page not by overall set. Am I screwed the way my models are currently set up?

A few issues here.
There's no way to do this sorting directly in the datastore API, either in Python or Java - as you no doubt know, the datastore is non-relational, and indirect lookups like this aren't supported.
If this was just a straight one-to-one relationship, which gave you an accessor from the Entry entity to the LastName one, you could use the standard Python sort function to sort the list:
entries.sort(key=lambda e: e.last_name.last_name)
(note that this sorts the list in place but returns None, so don't try assigning from it).
However, this won't work, because what you've actually got here is a one-to-many relationship: there are potentially many LastNames for each Entry. The definition actually recognises this: the collection_name attribute, which defines the accessor from Entry to LastName, is called last_names, ie plural.
So what you're asking doesn't really make sense: which of the potentially many LastNames do you want to sort on? You can certainly do it the other way round - given a query of LastNames, sort by entry year - but given your current structure there's not really any way of doing it.
I must say though, although I don't know the rest of your models, I suspect you have actually got that relationship the wrong way round: the ReferenceProperty should probably live on Entry pointing to LastName rather than the other way round as it is now. Then it would simply be the sort call I gave above.
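
A small sketch of that in-memory sort, under the assumption that the relationship has been flipped so that each entry exposes a single last_name accessor. The namedtuples are plain stand-ins for the datastore models, and sort_entries and page are hypothetical helper names. Sorting on a tuple key orders by year first and by last name within each year; paging has to happen after the sort, which means the whole result set must be fetched first.

```python
from collections import namedtuple

# Plain stand-ins for the datastore entities (hypothetical shapes).
LastName = namedtuple("LastName", "last_name")
Entry = namedtuple("Entry", "year last_name")


def sort_entries(entries):
    """Return a new list ordered by year, then by last name."""
    return sorted(entries, key=lambda e: (e.year, e.last_name.last_name))


def page(items, offset, limit):
    """Slice one page out of an already sorted list."""
    return items[offset:offset + limit]


entries = [
    Entry("2011", LastName("Baker")),
    Entry("2010", LastName("Young")),
    Entry("2011", LastName("Adams")),
]
first_page = page(sort_entries(entries), 0, 2)
```

For large result sets this gets expensive, since every request fetches and sorts everything; storing the last name on Entry itself would let the datastore do the ordering.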
