Django design patterns - models with ForeignKey references to multiple classes - python

I'm working through the design of a Django inventory tracking application, and have hit a snag in the model layout. I have a list of inventoried objects (Assets), which can either exist in a Warehouse or in a Shipment. I want to store different lists of attributes for the two types of locations, e.g.:
For Warehouses, I want to store the address, manager, etc.
For Shipments, I want to store the carrier, tracking number, etc.
Since each Warehouse and Shipment can contain multiple Assets, but each Asset can only be in one place at a time, adding a ForeignKey relationship to the Asset model seems like the way to go. However, since Warehouse and Shipment objects have different data models, I'm not certain how to best do this.
One obvious (and somewhat ugly) solution is to create a Location model which includes all of the Shipment and Warehouse attributes and an is_warehouse Boolean attribute, but this strikes me as a bit of a kludge. Are there any cleaner approaches to solving this sort of problem (Or are there any non-Django Python libraries which might be better suited to the problem?)

what about having a generic foreign key on Assets?

I think its perfectly reasonable to create a "through" table such as location, which associates an asset, a content (foreign key) and a content_type (warehouse or shipment) . And you could set a unique constraint on the asset_fk so thatt it can only exist in one location at a time

Related

Generic data handling in Python

The situation
While reading the Bible (as context) I'd like to point out certain dependencies e.g. of people and locations. Due to swift expandability I'm choosing Python to handle this versatile data. Currently I'm creating many feature vectors independent from each other, containing various information as the database.
In the end I'd like to type in a keyword to search in this whole database, which shall return everything that is in touch with it. Something simple as
results = database(key)
What I'm looking for
Unfortunately I'm not a Pro about different database handling possibilities and I hope you can help me finding an appropriate option.
Are there possibilities that can be used out of the box or do I need to create all logic by myself?
This is a little vague so I'll try to handle the People and Location bit of it to help you get started.
One possibility is to build a SQLite database. (The sqlite3 library + documentation is relatively friendly). Also here's a nice tutorial on getting started with SQLite.
To start, you can create two entity tables:
People: contains details about every person in bible.
Locations: contains details about every location in bible.
You can then create two relationship tables that reference people and locations (as Foreign Keys). For example, one of these relationship tables might be
People_Visited_Locations: contains information about where each person visited in their lifetime. The schema might looks something like this:
| person (Foreign Key)| location (Foreign Key) | year |
Remember that Foreign Key refers to an entry in another table. In our case, person is an existing unique ID from your entity table People, location is an existing unique ID from your entity table Locations, and year could be the year that person went to that location.
Then to fetch every place that some person, say Adam in the bible visited, you can create a Select statement that returns all entries in People_Visited_Locations with Adam as person.
I think key (pun intended) takeaway is how Relationship tables can help you map relationships between entities.
Hope this helps get you started :)

Association objects with history for relationships using ORM

A common type of relationship in schemas is this: a joiner table has a datetime element and is meant to store history about relationships between the rows of two other tables over time. These relationships are one-to-one or one-to-many even though we're using an association table which usually implies many-to-many. At any given point in time only one mapping, the latest at that point in time, is valid. For example:
Tables:
Computer: [id, name, description]
Locations: [id, name, address]
ComputerLocations: [id, computers_id, locations_id, timestamp]
A Computers object can only belong to one Locations object at a time (and Locations can have many Computers), but we store the history in the table. Rows in ComputerLocations aren't deleted, only superseded by new rows at query-time. Perhaps in the future some prune-type event will remove older rows as their usefulness is reduced.
What I'm looking do do is model this in SQLAlchemy, specifically in the ORM, so that a Computers class has the following properties:
A new Computer can be created without (independently of) a location (this makes sense because the location table is separate)
A new Location can be created without (independently of) a computer
If a Computer has a location it must be a member of Locations (foreign key constraint)
When updating an existing Computers object's location, a new row will be added to ComputerLocations with a datetime of NOW()
When creating a new Computers object with a location, a new row will be added to ComputerLocations with a datetime of NOW()
Everything should be atomic (i.e. fail if a new Computer is created but the row associating it to a location can't be created)
Is there a specific design pattern or a concrete method in SQLAlchemy ORM to accomplish this? The documentation has a section on Non-traditional mappings that includes mapping a class against multiple tables and to arbitrary selects so this looks promising. Further there was another question of stackoverflow that mentioned vertical tables. Due to my relative inexperience with SQLAlchemy I cannot synthesize this information into a robust and elegant solution yet. Any help would be greatly appreciated.
I'm using MySQL but a solution should be general enough for any database through the SQLAlchemy dialects system.

MongoEngine: EmbeddedDocument v/s. ReferenceField

EmbeddedDocument will allow to store a document inside another document, while RefereneField just stores it's reference. But, they're achieving a similar goal. Do they have specific use cases?
PS:
There's already a question on SO, but no good answers.
The answer to this really depends on what intend to do with the data you are storing in mongodb. It is important to remember that a ReferenceField will point to a document in another collection in mongodb, whereas an EmbeddedDocument is stored in the same document in the same collection.
Consider this schema:
Person
-> name
-> address
Address
-> street
-> city
-> country
If you expect every person to have only one address and each address to only be associated with one person (a one-to-one relationship) and you are generally going to query the database for one or more Person documents then the Person.address field should be EmbeddedDocumentField.
If you expect every person to have more than one address but each address will only be associated to one person (a one-to-many relationship) and you will still mainly query for a Person then you can use an EmbeddedDocumentListField.
If you expect every person to have more than one address and each address will be associated with many people (a many-to-many relationship) you probably should use ReferenceField.
However, even if you are one-to-one or one-to-many, if the Address is part of your data model that is of interest then it may be advantageous to have it stored in it's own collection because it makes aggregation and indexing easier.
One other point to consider is that unless you turn it off mongoengine will de-reference every ReferenceFieldwhen you retrieve a document - this might introduce performance penalties with lots of ReferenceField or references to very large documents.
It's really about the schema design of your collections in MongoDB. Generally it depends on different factors like cardinality of the relationship, way of accessing the data or size of the documents. It's explained well in official MongoDB's blog with some examples and I recommend you take a look at it.

How to model a unique constraint in GAE ndb

I want to have several "bundles" (Mjbundle), which essentially are bundles of questions (Mjquestion). The Mjquestion has an integer "index" property which needs to be unique, but it should only be unique within the bundle containing it. I'm not sure how to model something like this properly, I try to do it using a structured (repeating) property below, but there is yet nothing actually constraining the uniqueness of the Mjquestion indexes. What is a better/normal/correct way of doing this?
class Mjquestion(ndb.Model):
"""This is a Mjquestion."""
index = ndb.IntegerProperty(indexed=True, required=True)
genre1 = ndb.IntegerProperty(indexed=False, required=True, choices=[1,2,3,4,5,6,7])
genre2 = ndb.IntegerProperty(indexed=False, required=True, choices=[1,2,3])
#(will add a bunch of more data properties later)
class Mjbundle(ndb.Model):
"""This is a Mjbundle."""
mjquestions = ndb.StructuredProperty(Mjquestion, repeated=True)
time = ndb.DateTimeProperty(auto_now_add=True)
(With the above model and having fetched a certain Mjbundle entity, I am not sure how to quickly fetch a Mjquestion from mjquestions based on the index. The explanation on filtering on structured properties looks like it works on the Mjbundle type level, whereas I already have a Mjbundle entity and was not sure how to quickly query only on the questions contained by that entity, without looping through them all "manually" in code.)
So I'm open to any suggestion on how to do this better.
I read this informational answer: https://stackoverflow.com/a/3855751/129202 It gives some thoughts about scalability and on a related note I will be expecting just a couple of bundles but each bundle will have questions in the thousands.
Maybe I should not use the mjquestions property of Mjbundle at all, but rather focus on parenting: each Mjquestion created should have a certain Mjbundle entity as parent. And then "manually" enforce uniqueness at "insert time" by doing an ancestor query.
When you use a StructuredProperty, all of the entities that type are stored as part of the containing entity - so when you fetch your bundle, you have already fetched all of the questions. If you stick with this way of storing things, iterating to check in code is the solution.

Modeling Hierarchical Data - GAE

I'm new in google-app-engine and google datastore (bigtable) and I've some doubts in order of which could be the best approach to design the required data model.
I need to create a hierarchy model, something like a product catalog, each domain has some subdomains in deep. For the moment the structure for the products changes less than the read requirements. Wine example:
Origin (Toscana, Priorat, Alsacian)
Winery (Belongs only to one Origin)
Wine (Belongs only to one Winery)
All the relations are disjoint and incomplete. Additionally in order of the requirements probably we need to store counters of use for every wine (could require transactions)
In order of the documentation seems there're different potential solutions:
Ancestors management. Using parent relations and transactions
Pseudo-ancestor management. Simulating ancestors with a db.ListProperty(db.Key)
ReferenceProperty. Specifying explicitelly the relation between the classes
But in order of the expected requests to get wines... sometimes by variety, sometimes by origin, sometimes by winery... i'm worried about the behaviour of the queries using these structures (like the multiple joins in a relational model. If you ask for the products of a family... you need to join for the final deep qualifier in the tree of products and join since the family)
Maybe is better to create some duplicated information (in order of the google team recommendations: operations are expensive, but storage is not, so duplicate content should not be seen the main problem)
Some responses of other similar questions suggest:
Store all the parent ids as a hierarchy in a string... like a path property
Duplicate the relations between the Drink entity an all the parents in the tree ...
Any suggestions?
Hi Will,
Our case is more an strict hierarchical approach as you represent in the second example. And the queries is for retrieving list of products, retrieve only one is not usual.
We need to retrieve all the wines from an Origin, from a Winery or from a Variety (If we supose that the variety is another node of the strict hierarchical tree, is only an example)
One way could be include a path property, as you mentioned:
/origin/{id}/winery/{id}/variety/{id}
To allow me to retrieve a list of wines from a variety applying a query like this:
wines_query = Wine.all()
wines_query.filter('key_name >','/origin/toscana/winery/latoscana/variety/merlot/')
wines_query.filter('key_name <','/origin/toscana/winery/latoscana/variety/merlot/zzzzzzzz')
Or like this from an Origin:
wines_query = Wine.all()
wines_query.filter('key_name >','/origin/toscana/')
wines_query.filter('key_name <','/origin/toscana/zzzzzz')
Thank you!
I'm not sure what kinds of queries you'll need to do in addition to those mentioned in the question, but storing the data in an explicit ancestor hierarchy would make the ones you asked about fall out pretty easily.
For example, to get all wines from a particular origin:
origin_key = db.Key.from_path('Origin', 123)
wines_query = db.Query(Wine).ancestor(origin_key)
or to get all wines from a particular winery:
origin_key = db.Key.from_path('Origin', 123)
winery_key = db.Key.from_path('Winery', 456, parent=origin_key)
wines_query = db.Query(Wine).ancestor(winery_key)
and, assuming you're storing the variety as a property on the Wine model, all wines of a particular variety is as simple as
wines_query = Wine.all().filter('variety =', 'merlot')
One possible downside of this strict hierarchical approach is the kind of URL scheme it can impose on you. With a hierarchy that looks like
Origin -> Winery -> Wine
you must know the key name or ID of a wine's origin and winery in order to build a key to retrieve that wine. Unless you've already got the string representation of a wine's key. This basically forces you to have URLs for wines in one of the following forms:
/origin/{id}/winery/{id}/wine/{id}
/wine/{opaque and unfriendly datastore key as a string}
(The first URL could of course be replaced with querystring parameters; the important part is that you need three different pieces of information to identify a given wine.)
Maybe there are other alternatives to these URL schemes that have not occurred to me, though.

Categories