Archiving Django models


I'm creating an online order system for selling items on a regular basis (home delivery of vegetable boxes). I have an 'order' model (simplified) as follows:
class BoxOrder(models.Model):
    customer = models.ForeignKey(Customer)
    frequency = models.IntegerField(choices=((1, "Weekly"), (2, "Fortnightly")))
    item = models.ForeignKey(Item)
    payment_method = models.IntegerField(choices=((1, "Online"), (2, "Free")))
Now my 'customer' has the ability to change the frequency of the order, or the 'item' (say 'carrots') being sold, or even delete the order altogether.
What I'd like to do is create weekly 'backups' of all orders processed that week, so that I can see a historical graph of all the orders ever sold every week. The problem with just archiving the order into another table/database is that if an item is deleted for some reason (say I no longer sell carrots), then that archived BoxOrder would become invalid because of the ForeignKeys.
What would be the best solution for creating an archiving system using Django - so that orders for every week in history are viewable in Django admin, and they are 'static' (i.e. independent of whether any other objects are deleted)?
I've thought about creating a new 'flat' BoxOrderArchive model, then using a cron job to move orders for a given week over, e.g.:
class BoxOrderArchive(models.Model):
    customer_name = models.CharField(max_length=20)
    frequency = models.IntegerField()
    item_name = models.CharField()  # refers to BoxOrder.item.name
    item_price = models.DecimalField(max_digits=10, decimal_places=2)  # refers to BoxOrder.item.price
    payment_method = models.IntegerField()
But I feel like that might be a lot of extra work. Before I go down that route, it would be great to know if anybody has any other solutions?
Thanks

This is a rather broad topic, and I won't specifically answer your question; however, my advice to you is: don't delete or move anything. You can add a boolean field to your Item named is_deleted or is_active or something similar, and set that flag instead of deleting the item. This way you can
keep your ForeignKeys,
have a different representation for non-active items,
restore an Item that was previously deleted (for instance, you may want to sell carrots again after some months; this way your statistics will be consistent across the year).
The same advice applies to the BoxOrder model. Do not move rows to different tables; just add an is_archived field and set it to True.
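As a rough illustration of the flag approach at the schema level, here is a minimal sketch using Python's built-in sqlite3 module rather than the Django ORM. The table and column names are assumptions chosen to mirror the question, not the asker's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE item (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        is_active INTEGER NOT NULL DEFAULT 1
    );
    CREATE TABLE box_order (
        id INTEGER PRIMARY KEY,
        item_id INTEGER NOT NULL REFERENCES item(id)
    );
    INSERT INTO item (id, name) VALUES (1, 'Carrots'), (2, 'Leeks');
    INSERT INTO box_order (id, item_id) VALUES (10, 1), (11, 2);
""")

# "Delete" carrots by flipping the flag instead of removing the row.
conn.execute("UPDATE item SET is_active = 0 WHERE id = 1")

# The shop front only lists active items...
on_sale = [row[0] for row in conn.execute(
    "SELECT name FROM item WHERE is_active = 1 ORDER BY id")]

# ...while historical orders can still resolve the item they point at.
history = conn.execute(
    "SELECT o.id, i.name FROM box_order o JOIN item i ON i.id = o.item_id "
    "ORDER BY o.id").fetchall()
```

Because the item row is never removed, the foreign key on the order stays valid and the item can be put back on sale by setting the flag to 1 again.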

So, after looking into this long and hard, I think the best solution for me is to create a 'flat' version of the object, dereferencing any existing objects, and save that in the database.
The reason for this is that my 'BoxOrder' object can change every week (as the customer edits their address, item, cost etc.). Keeping track of all these changes is just plain difficult.
Plus, I don't need to do anything with the data other than display it to the site's users.
Basically, what I want to do is create a snapshot, and none of the existing tools quite do what I want. Having said that, others may have different priorities, so here's a list of useful links:
[1] SO question regarding storing a snapshot/pickling model instances
[2] Django Simple History Docs - stores model state on every create/update/delete
[3] Django Reversion Docs - allows reverting a model instance
For discussion on [2] and [3], see the comments on Serafim's answer
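The flattening step described in this answer could be sketched, independent of the ORM, roughly as follows. `snapshot_order` and the `week` argument are hypothetical names; the attribute names follow the BoxOrder and BoxOrderArchive models from the question, and `customer.name` is an assumption, since the Customer model is not shown.

```python
from types import SimpleNamespace


def snapshot_order(order, week):
    """Copy the values an order currently points at into a plain dict.

    The result holds no references to Customer or Item, so it stays
    valid even if those rows are later edited or deleted.
    """
    return {
        "week": week,
        "customer_name": order.customer.name,  # assumes Customer has a name
        "frequency": order.frequency,
        "item_name": order.item.name,
        "item_price": order.item.price,
        "payment_method": order.payment_method,
    }


# Stand-in objects; in Django these would be model instances.
item = SimpleNamespace(name="Carrots", price="2.50")
order = SimpleNamespace(
    customer=SimpleNamespace(name="Alice"),
    frequency=1,
    item=item,
    payment_method=1,
)

archived = snapshot_order(order, week="2024-W01")
item.name = "Parsnips"  # a later edit does not touch the snapshot
```

In a Django project the resulting values could be saved to a flat archive model with matching fields, for example from a weekly cron job as suggested in the question.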

Related

Storing multiple values into a single field in mysql database that preserve order in Django

I've been trying to build a Tutorial system that we usually see on websites. Like the ones we click next -> next -> previous etc to read.
All Posts are stored in a table(model) called Post. Basically like a pool of post objects.
Post.objects.all() will return all the posts.
Now there's another table (model) called Tutorial that stores the following:
class Tutorial(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    tutorial_heading = models.CharField(max_length=100)
    tutorial_summary = models.CharField(max_length=300)
    series = models.CharField(max_length=40)  # <---- Here [10,11,12]
    ...
Here entries in this series field are post_ids stored as a string representation of a list.
example: series will have [10,11,12] where 10, 11 and 12 are post_id that correspond to their respective entries in the Post table.
So my table entry for Tutorial model looks like this.
id     heading                summary                       series
"5"    "Series 3 Tutorial"    "lorem on ullt consequat."    "[12, 13, 14]"
So I just read the series field and get all the Posts with the ids in this list then display them using pagination in Django.
Now, I've read from several stackoverflow posts that having multiple entries in a single field is a bad idea. And having this relationship to span over multiple tables as a mapping is a better option.
What I want to have is the ability to insert new posts into this series anywhere I want. Maybe in the front or middle. This can be easily accomplished by treating this series as a list and inserting as I please. Altering "[14,12,13]" will reorder the posts that are being displayed.
My question is: is this way of storing multiple values in a single field okay for my use case, or will it take a performance hit, or is it generally a bad idea? If it is a bad idea, is there a way to preserve or alter the order by spanning the relationship over another table, or is there an entirely better way to accomplish this in Django or MySQL?

Here entries in this series field are post_ids stored as a string representation of a list.
(...)
So I just read the series field and get all the Posts with the ids in this list then display them using pagination in Django.
DON'T DO THIS !!!
You are working with a relational database. There is one proper way to model relationships between entities in a relational database, which is to use foreign keys. In your case, depending on whether a post can belong to only a single tutorial ("one to many" relationship) or to many tutorials at the same time ("many to many" relationship), you'll want either to add a foreign key on tutorial to post, or to use an intermediate "post_tutorials" table with foreign keys on both post and tutorial.
Your solution doesn't allow the database to do its job properly. It cannot enforce integrity constraints (what if you delete a post that's referenced by a tutorial?), it cannot optimize read access (with a proper schema the database can retrieve a tutorial and all its posts in a single query), it cannot follow reverse relationships (given a post, access the tutorial(s) it belongs to), etc. And it requires an external program (Python code) to interact with your data, while with proper modeling you just need standard SQL.
Finally (but this is Django-specific), using a proper schema works better with the admin features, and with Django REST framework if you intend to build a REST API.
wrt/ the ordering problem, it's a long-known (and solved) issue: you just need to add an "order" field (a small int should be enough). There are a couple of 3rd-party Django apps that add support for this to both your models and the admin, so it's almost plug and play.
IOW, there is absolutely no good reason to denormalize your schema this way and only good reasons to use proper relational modeling. FWIW I once had to work on a project based on some obscure (and hopefully long dead) PHP CMS that had the brilliant idea to use your "serialized lists" anti-pattern, and I can tell you it was both a disaster wrt/ performance and a complete nightmare to maintain. So do yourself and the world a favour: don't try to be creative, follow well-known and established best practices instead, and your life will be much happier. My 2 cents...
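To make the intermediate-table-plus-"order"-field idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `tutorial_post` table, its `position` column and the two helper functions are hypothetical names chosen for illustration, not something the answer prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tutorial_post (
        tutorial_id INTEGER NOT NULL,
        post_id INTEGER NOT NULL,
        position INTEGER NOT NULL,
        PRIMARY KEY (tutorial_id, post_id)
    );
    INSERT INTO tutorial_post VALUES (5, 12, 0), (5, 13, 1), (5, 14, 2);
""")


def insert_post(conn, tutorial_id, post_id, position):
    """Place post_id at `position`, shifting later posts down by one."""
    with conn:  # one transaction: shift, then insert
        conn.execute(
            "UPDATE tutorial_post SET position = position + 1 "
            "WHERE tutorial_id = ? AND position >= ?",
            (tutorial_id, position),
        )
        conn.execute(
            "INSERT INTO tutorial_post VALUES (?, ?, ?)",
            (tutorial_id, post_id, position),
        )


def ordered_posts(conn, tutorial_id):
    """Return the post ids of a tutorial in display order."""
    rows = conn.execute(
        "SELECT post_id FROM tutorial_post "
        "WHERE tutorial_id = ? ORDER BY position",
        (tutorial_id,),
    )
    return [row[0] for row in rows]


# Insert post 20 between posts 12 and 13 of tutorial 5.
insert_post(conn, tutorial_id=5, post_id=20, position=1)
```

Inserting at the front or in the middle is then an UPDATE plus an INSERT, and the display order comes straight from ORDER BY, so no list needs to be parsed in application code.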

I can think of two approaches:
Approach One: Linked List
One way is using linked list like this:
class Tutorial(models.Model):
    ...
    previous = models.OneToOneField('self', null=True, blank=True, on_delete=models.SET_NULL, related_name="next")
In this approach, you can walk through each series from its first tutorial like this:
for tutorial in Tutorial.objects.filter(previous__isnull=True):
    print(tutorial)
    # hasattr() is False when no tutorial points back at this one
    while hasattr(tutorial, 'next'):
        tutorial = tutorial.next
        print(tutorial)
This is a rather complicated approach. For example, whenever you want to add a new tutorial in the middle of a linked list, you need to make changes in two places, like:
post = Tutorial.objects.first()
next_post = getattr(post, 'next', None)  # None if post is the last one
new = Tutorial.objects.create(...)
if next_post is not None:
    next_post.previous = new  # relink the old successor first
    next_post.save()
new.previous = post
new.save()
But there is a big benefit to this approach: you don't have to create a new table for the series. Also, the order of tutorials will probably not be modified frequently, which means the extra hassle should be rare.
Approach Two: Create a new Model
You can simply create a new model and FK to Tutorial, like this:
class Series(models.Model):
    name = models.CharField(max_length=255)

class Tutorial(models.Model):
    ...
    series = models.ForeignKey(Series, null=True, blank=True, on_delete=models.SET_NULL, related_name='tutorials')
    order = models.IntegerField(default=0)

    class Meta:
        unique_together = ('series', 'order')  # prevents duplicate order values within the same series
Then you can access the tutorials in a series by:
series = Series.objects.first()
series.tutorials.order_by('order')
The advantage of this approach is that it's much more flexible to access Tutorials through a series, but an extra table is created for this, plus one extra field to maintain the order.

Should we save results in database models which we can anytime calculate using data?

I am trying to create a Report Card model. I have with me:
question ids, the answer selected for each question by the candidate, the correct answer id of each question, and the weight of each question.
Is it a good idea to create fields like "total marks, average, number of correct answers, number of questions" etc. in my ReportCard model, or should I calculate everything every time a viewer visits the detail view of this report card?
My Model so far:
class ReportCard(models.Model):
    exam = models.OneToOneField(Exam)

class ExamChoiceMade(models.Model):
    report_card = models.OneToOneField(ReportCard)
    question_no = models.PositiveIntegerField(default=0)
    answer_chosen = models.PositiveIntegerField(default=0)
    is_correct = models.BooleanField(default=False)

The first thing to remember is that whatever decision you make, there will be trade-offs, and among all the choices you have, you need to pick the one that fits best.
On the web, scalability is the main concern when weighing performance trade-offs.
It is good practice to keep cheaply computed (i.e. not resource-hungry) values as model properties, so that they behave like fields of the table but are never stored and are calculated on demand.
If the on-demand calculation is resource-hungry, however, your response is going to be very slow. We should be very careful to keep response time under 100 ms for any normal action (including those that merely appear normal to the end user).
So the answer to your question is that whether to store or calculate on demand depends on your requirements.
However, the fields you have mentioned above don't seem to be resource-hungry, so they can simply be model properties.
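As a small illustration of the "model property" idea, here is a plain-Python sketch using dataclasses instead of Django models. The class and attribute names are assumptions, and so is the scoring rule (total marks as the sum of the weights of correctly answered questions), since the question does not spell it out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Choice:
    question_no: int
    weight: float
    is_correct: bool


@dataclass
class ReportCard:
    choices: List[Choice] = field(default_factory=list)

    @property
    def number_of_questions(self):
        return len(self.choices)

    @property
    def correct_answers(self):
        return sum(1 for c in self.choices if c.is_correct)

    @property
    def total_marks(self):
        # Assumed rule: sum of the weights of correctly answered questions.
        return sum(c.weight for c in self.choices if c.is_correct)


card = ReportCard([
    Choice(1, 2.0, True),
    Choice(2, 1.0, False),
    Choice(3, 3.0, True),
])
```

Nothing derived is stored, so the numbers can never drift out of sync with the underlying answers; on a Django model the same `@property` methods would read from the related choice rows instead of a list.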

Mutually exclusive many-to-many relationship in Django models

I'm trying to create a simple model to keep track of discount coupons in Django 1.10 (with Postgres 9.5 as underlying database) and I was wondering if there's a way to make sure that a coupon instance (id, perhaps is a more accurate term?) doesn't appear in two M2M relationships at the same time.
I'm sure everyone is familiar with how discount coupons work but, just in case, let me explain my use case:
Some coupons would be always applied. For instance: "Free delivery in your first purchase", or "10% off Pepsi for the rest of your life"... things like that.
Some other coupons would be applied through a code (a simple string, really) that the user would have to input somewhere (like "Get a 5% off with the code "5-OFF" "... yeah, I'll probably have to work on the obfuscation of the codes :-D )
The user could say "No, I don't want to apply this coupon to this order, I'll use it later". For instance: if the user can use a one-time 5% off coupon, but wants to keep it for a large purchase. Let's say the customer knows he's going to make a large purchase in the upcoming future, and right now he's doing a small one. He might wanna keep the 5% off for the later (bigger) purchase.
To keep track of those things, I have a model like this:
class CouponTracker(models.Model):
    # Coupons that require a code to be activated:
    extra_applied_coupons = ManyToManyField('Coupon', related_name='+')
    # Coupons that the user could have applied and specifically
    # said no (such as 5% off if the purchase is small, and wants
    # to use it later):
    vetoed_coupons = ManyToManyField('Coupon', related_name='+')
So, the question is:
How can I enforce (at a database level, through a constraint) that a coupon does not appear at the same time in extra_applied_coupons and vetoed_coupons?
Thank you in advance!

Why don't you combine extra_applied_coupons and vetoed_coupons into one relationship and add one more field (for example, type) to record which group a coupon belongs to? Then the problem becomes simpler: just ensure uniqueness within a single ManyToMany relationship.
class CouponTracker(models.Model):
    coupons = ManyToManyField('Coupon', related_name='+')
    type = models.IntegerField(default=0)
type can be 0 for extra_applied_coupons and 1 for vetoed_coupons.
If you want to add more attributes to the relationship, see https://docs.djangoproject.com/en/1.11/topics/db/models/#extra-fields-on-many-to-many-relationships
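At the database level, the single-relationship idea boils down to one link table with a uniqueness constraint on the (tracker, coupon) pair. The following is only a sketch using Python's built-in sqlite3 module; the table name `tracker_coupon` and the column `kind` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tracker_coupon (
        tracker_id INTEGER NOT NULL,
        coupon_id INTEGER NOT NULL,
        kind INTEGER NOT NULL CHECK (kind IN (0, 1)),  -- 0 applied, 1 vetoed
        UNIQUE (tracker_id, coupon_id)
    )
""")

conn.execute("INSERT INTO tracker_coupon VALUES (1, 42, 0)")  # applied

try:
    # Same coupon, same tracker, now as "vetoed": rejected by the database.
    conn.execute("INSERT INTO tracker_coupon VALUES (1, 42, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM tracker_coupon").fetchone()[0]
```

In Django, a comparable layout would presumably be an intermediate model passed via `through=` on the ManyToManyField, holding the type field and a `unique_together` on the tracker and coupon foreign keys, so that a coupon can carry only one type per tracker.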

Since a ManyToMany relation creates a separate table, AFAIK you cannot create a UNIQUE constraint across tables, so there is no direct way to add the constraint at the database level. Check this. Either you have to do it in the application layer, or in some hackish way like this.

Best way to generate deadlines for milestones from template for different users in django

I want to generate the list of various milestones to accomplish something, and the deadline for each of them is calculated dynamically from a final date given by the user.
I'm not sure about the best way to handle this. The first idea that came to my mind is to write some template file (not a Django template here) on the server containing the information needed to generate all the steps, which will be fetched once for every new user and used to create a list of milestone objects from a milestone class (some generic model in Django). Maybe something written in JSON:
{
    "some_step": {
        "start_date": "final_date-10",
        "end_date": "final_date-7"
    }
}
and the corresponding model
class Milestone(models.Model):
    name = models.CharField(max_length=100)  # max_length is required; 100 is a placeholder
    start_date = models.DateField()
    end_date = models.DateField()

    def time_to_final(self, time):
        return self.final_date - time
Strings like "final_date-10" would be converted by some routine and passed at registration time to the time_to_final method, when initializing the data for the new user in the database.
However, I'm not sure it's the best approach. Though it won't be used by millions of people, I'm worried about possible negative impacts on server performance. Is there a better, maybe more Pythonic way?
EDIT for more clarification :
A user wants to complete something at date D0.
My app generates the steps like this :
do step 1 from date D1i to date D1f
do step 2 from date D2i to date D2f
-...
until date D0 is reached and all tasks are completed
All the dates are calculated when D0 is provided.
All the steps are generated for every user.

What have templates got to do with this? Design your models first - maybe you need a Steps model with a foreign key to User and a foreign Key to Milestone (or maybe not - I'm not clear from your description).
Only when you've got the data clear in your mind start thinking about templates etc.
The great thing about django is that once you've made your models you can use the admin interface to enter some data. It will quickly become clear whether you've modeled your problem correctly.
Don't worry about performance, get your data structures clear, make it work and if you find it isn't running fast enough (unlikely) optimize it.
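For what it's worth, the date arithmetic described in the question is cheap. A minimal sketch with the standard datetime module follows; `MILESTONE_TEMPLATE`, `build_milestones` and the second step are hypothetical, and the offsets mirror the "final_date-10" / "final_date-7" strings from the question.

```python
from datetime import date, timedelta

# Hypothetical template: step name -> (start offset, end offset) in days
# before the final date, mirroring "final_date-10" / "final_date-7".
MILESTONE_TEMPLATE = {
    "some_step": (10, 7),
    "another_step": (7, 2),
}


def build_milestones(final_date, template=MILESTONE_TEMPLATE):
    """Return a list of (name, start_date, end_date), earliest first."""
    steps = [
        (name, final_date - timedelta(days=start), final_date - timedelta(days=end))
        for name, (start, end) in template.items()
    ]
    return sorted(steps, key=lambda step: step[1])


milestones = build_milestones(date(2024, 6, 30))
```

Each tuple could then be saved as one milestone row per user, in line with the advice above to settle the models first.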

Best way to merge and unmerge objects without losing data

Say I have two tables (I am using Django, but this question is mostly language agnostic):
class Organization(models.Model):
    name = models.CharField(max_length=100)

class Event(models.Model):
    organization = models.ForeignKey(Organization)
    name = models.CharField(max_length=100)
Users are allowed to create both events and organizations. There is the chance that two separate users create organization objects that are supposed to resemble the same real world organization. When someone notices this problem, they should be able to merge the two objects so there is only one organization.
The question I have is this: How do I merge these two organizations in order to ensure I can "unmerge" them if the user incorrectly merged them? Thus, the simple solution of deleting one of the Organization objects and pointing all Events to the other one is not an option. I am looking for very high level guidelines on best practices here.
A few possible solutions:
Add another table that joins together organizations that have been "merged" and keep track of merges that way
Add a foreign key field on Organization to point to an organization it was merged with
Keep copies of all of the original objects as they existed before a merge, using something like django-reversion

Personally, I would go with a solution which uses something like django-reversion. However, if you want to create something more robust and less dependent on 3rd party logic, add a merged_into field to Organization and merged_from field to Event:
class Organization(models.Model):
    name = models.CharField(max_length=100)
    merged_into = models.ForeignKey('self', null=True, blank=True)

class Event(models.Model):
    organization = models.ForeignKey(Organization)
    name = models.CharField(max_length=100)
    # related_name avoids a reverse-accessor clash with `organization`
    merged_from = models.ForeignKey(Organization, null=True, blank=True, related_name='merged_events')
On merge, you can choose to update the events as well. From then on, be sure to redirect all references to organizations that have "merged_into" set to the organization they were merged into.
If you want to allow multiple merges (for example: A + B into C, A+C into D, E+F into G and D+G into H), you can create a new organization instance each time and merge both "parents" into it, copying the events instead of updating them. This keeps the original events intact waiting for a rollback. This also allows merging more than 2 organizations into a new one in one step.
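The merged_into idea can be sketched in plain Python to show why an unmerge stays possible. The `resolve`, `merge` and `unmerge` helpers are hypothetical names, and the dataclass stands in for the Django model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Organization:
    name: str
    merged_into: Optional["Organization"] = None


def resolve(org):
    """Follow merged_into links to the organization currently in effect."""
    while org.merged_into is not None:
        org = org.merged_into
    return org


def merge(source, target):
    source.merged_into = target  # source is kept, only redirected


def unmerge(source):
    source.merged_into = None  # nothing was deleted, so this undoes the merge


acme = Organization("ACME")
acme_inc = Organization("Acme Inc.")

merge(acme_inc, acme)
merged_name = resolve(acme_inc).name

unmerge(acme_inc)
restored_name = resolve(acme_inc).name
```

Because a merge only sets a pointer, chained merges (A into C, C into D) resolve by following the links, and each step can be rolled back independently.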

My suggestion would be a diff-like interface. For each field, you provide all the possible values from the objects being merged. The person merging them chooses the appropriate value for each field. You'd probably only want to show fields on which a conflict was detected in this view.
After all conflicting fields have had a "good" value chosen for them, you create a new object, assign relationships from the old versions to that one, and delete the old versions.
If you're looking for some sort of automatic approach, I think you'd be hard pressed to find one, and even if you did, it would not really be a good idea. Any time you're merging anything you need a human in the middle. Even apps that sync contacts and such don't attempt to handle conflicts on their own.

I think there is a key hack.
Organization will have the usual id field and another 'aliases' field. The 'aliases' field would hold comma-separated ids. In that field you track the organizations that may refer to the same real-world organization. Let's say there are two organizations named organization_1 and organization_2, with ids 1 and 2.
organization_1      organization_2
_id = 1             _id = 2
aliases = '1, 2'    aliases = '2, 1'
If you want to query events that belong only to organization_1, you can do it. If you want to query all events of organization_1 and organization_2, you check whether the aliases field contains the key. Maybe the separator should not just sit between the ids but also surround the aliases field as a whole, something like ',1,2,'. That way we can reliably check whether it contains ',id,'.
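A tiny sketch of why the surrounding separators matter; `pack_aliases` and `has_alias` are hypothetical helper names.

```python
def pack_aliases(ids):
    """Encode ids as ',11,2,' so every id is wrapped in separators."""
    return "," + ",".join(str(i) for i in ids) + ","


def has_alias(aliases, org_id):
    # Searching for ',1,' cannot accidentally match inside ',11,' or ',21,'.
    return f",{org_id}," in aliases


aliases = pack_aliases([11, 2])
```

With this encoding, `has_alias(aliases, 1)` is false even though the digit 1 appears inside 11, which a plain substring search on '11,2' would get wrong.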
