Django Models: Keep track of activity through related models? - python

I have something of a master table of Persons. Everything in my Django app relates in some way to one or more People, either directly or through long fk chains. Also, all my models have the standard bookkeeping fields 'created_at' and 'updated_at'. I want to add a field on my Person table called 'last_active_at', mostly for raw SQL ordering purposes.
Creating or editing certain related models produces new timestamps for those objects. I need to somehow update Person.last_active_at with those values. Functionally, this isn't too hard to accomplish, but I'm concerned about putting undue stress on the app.
My two greatest causes for concern are that I'm restricted to a real db field--I can't expose this on the Person model as a @property--and that one of these 'activity' models receives and processes new instances from a foreign data source I have no control over, sporadically receiving a lot of data at once.
My first thought was to add a post_save hook to the 'activity' models. That still seems like my best option, but I know nothing about signals: how hard they hit the db, and so on.
My second thought was to write some sort of script that goes through the day's activity and updates those Persons overnight. My employers want a 'live'-er stream, though.
My third thought was to have the post_save handler check whether the new 'updated_at' is less than half an hour after the Person's 'last_active_at', and skip the update if so.
Are my thoughts tending in a scalable direction? Are there other approaches I should pursue?

It is said that premature optimization is the root of all evil. You should start with the dumbest implementation (update it every time), then measure and - if needed - replace it with something more efficient.
First of all, let's put a method on Person that updates the last_active_at field. That way, all the updating logic is concentrated in one place, and we can easily modify it later.
Signals are quite easy to use: you declare a function and register it as a receiver, and it will be run each time the signal is emitted. See the documentation for the full explanation, but here is what it might look like:
from django.db.models.signals import post_save
from django.dispatch import receiver

@receiver(post_save, sender=RelatedModel)
def my_handler(sender, instance, **kwargs):
    # instance is the object that was just saved
    person = ...  # look up the Person this instance relates to
    person.update_activity()
As for the updating itself, start with the dumbest way to do it.
# on the Person model (assumes: from django.utils.timezone import now)
def update_activity(self):
    self.last_active_at = now()
    self.save(update_fields=['last_active_at'])
Then measure and decide whether it's a problem or not. If it is a problem, some of the things you can do are:
Check if the previous update is recent before updating again (see the sketch after this list). This might be useless if a read from your database is not faster than a write; it's not a problem if you use a cache.
Write it down somewhere for a deferred process to apply later. It doesn't need to be daily: if the problem is that you have 100 updates per second, you can have a script update the database every 10 seconds, or every minute. You can probably find a good performance/up-to-dateness trade-off using this technique.
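A minimal sketch of the first option, reusing the update_activity method above and the half-hour threshold mentioned in the question (the field names are taken from the question):

from datetime import timedelta
from django.utils.timezone import now

ACTIVITY_THRESHOLD = timedelta(minutes=30)

def update_activity(self):
    # Skip the write entirely if the stored value is already recent enough.
    if self.last_active_at and now() - self.last_active_at < ACTIVITY_THRESHOLD:
        return
    self.last_active_at = now()
    self.save(update_fields=['last_active_at'])

The check here only reads the instance already in memory; if you keep the last value in a cache instead, it costs no extra query at all.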
These are just some thoughts based on what you proposed; the right choice depends on the kind of figures you have. Determine what kind of load you'll have and what kind of reaction time is needed for that field, and experiment.

Related

Django - Best way to create snapshots of objects

I am currently working on a Django 2+ project involving a blockchain, and I want to make copies of some of my objects' states into that blockchain.
Basically, I have a model (say "contract") that has a list of several "signature" objects.
I want to make a snapshot of that contract, with the signatures. What I am basically doing is taking the contract at some point in time (when it's created for example) and building a JSON from it.
My problem is: I want to update that snapshot anytime a signature is added/updated/deleted, and each time the contract is modified.
The intuitive solution would be to override each "delete", "create" and "update" of each of the models involved in that snapshot, and pray that all of them are implemented right and that I didn't forget any. But I think this is not scalable at all, and hard to debug and to maintain.
I have thought of a solution that might be more centralized: using a periodical job to get the last update date of my object, compare it to the date of my snapshot, and update the snapshot if necessary.
However, with that solution I can identify changes when objects are modified or created, but not when they are deleted.
So, this is my big question mark: how, with Django, can you identify deletions in relationships, without any prior context, just by looking at the current database state? Is there a Django module to record deleted objects? What are your thoughts on my issue?
As I understand your problem, what you need is Django's signals framework, which listens for model events (saves, deletes, and so on) and, when they occur (and if all the desired conditions are met), executes whatever code you need in your application (it can even run further database operations).
This is the most recent documentation:
https://docs.djangoproject.com/en/3.1/topics/signals/
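As a minimal sketch of that idea (the Contract and Signature model names, the 'contract' foreign key and the rebuild_snapshot() method are hypothetical stand-ins for whatever your actual models look like):

from django.db.models.signals import post_save, post_delete
from django.dispatch import receiver

from .models import Signature  # hypothetical model with a ForeignKey named 'contract'

@receiver([post_save, post_delete], sender=Signature)
def refresh_contract_snapshot(sender, instance, **kwargs):
    # Fires on create, update and delete alike, so deletions are caught
    # as they happen instead of being inferred from database state later.
    instance.contract.rebuild_snapshot()

A second pair of receivers registered for the contract model itself would cover direct modifications of the contract.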

SQLAlchemy Declarative: How to merge models and existing business logic classes

I would like to know what the best practices are for using SQLAlchemy declarative models within business logic code. Perhaps stackexchange.codereview may have been a better place to ask this, but I'm not sure.
Here's some background.
Let's say I have a bunch of classes doing various things. Most of them have little or nothing to do with each other. Each such class has between a hundred and a thousand lines of code doing things that have precious little to do with the database. In fact, most of the classes aren't even database-aware so far. I've gotten away with storing the actual information in flat files (csv, yaml, and so on), and only maintaining a serial number table and a document path to serial number mapping in the database. Each object retrieves the files it needs by getting the correct paths from the database (by serial number) and reconstructs itself from there. This has been exceedingly convenient so far, since my 'models' have been (and admittedly continue to be) rather fluid.
As I expand the involvement of the database in the codebase I currently have, I seem to have settled on the following model, separating the database bits and the business logic into two completely separate parts and joining them with specific function calls instead of inheritance or even composition. Here is a basic example of the kind of code I have now (pseudocode quality):
module/db/models.py:
class Example(Base):
    id = Column(...)
    some_var = Column(...)
module/db/controller.py:
from sqlalchemy.orm.exc import NoResultFound

from .models import Example

def get_example_by_id(id, session):
    return session.query(Example).filter_by(id=id).one()

def upsert_example(session, id=None, some_var=None):
    if id is not None:
        try:
            example_obj = get_example_by_id(id, session)
            example_obj.some_var = some_var
            return
        except NoResultFound:
            pass
    example_obj = Example(some_var=some_var)
    session.add(example_obj)
    session.flush()
module/example.py:
from db import controller

class Example(object):
    def __init__(self, id, session):
        self._id = id
        self._some_var = None
        try:
            self._load_from_db(session)
            self._defined = True
        except Exception:
            self._defined = False

    def _load_from_db(self, session):
        db_obj = controller.get_example_by_id(self._id, session)
        self._some_var = db_obj.some_var

    def create(self, some_var, session):
        if self._defined is True:
            raise Exception
        self._some_var = some_var
        self._sync_to_db(session)

    def _sync_to_db(self, session):
        controller.upsert_example(session, id=self._id, some_var=self._some_var)

    @property
    def some_var(self):
        return self._some_var
    ...
I'm not convinced this is the way to go.
I have a few models following this pattern, and many more that I should implement in time. The database is currently only used for persistence and archiving. Once something is in the database, it's more or less read only from there on in. However, querying on it is becoming important.
The reason I'm inclined to migrate from the flatfiles to the database is largely to improve scalability.
Thus far, if I wanted to find all instances (rows) of Example with some_var = 3, I'd have to construct all of the instances from the flat files and iterate through them. This seems like a waste of both processor time and memory. In many cases, some_var is actually a calculated property, and reached by a fairly expensive process using source data contained in the flat file.
With the structure above, what I would do is query on Example, obtain a list of 'id's which satisfy my criterion, and then reconstruct just those module instances.
The ORM approach, however, as I understand it, would use thick models, where the objects returned by the query are themselves the objects I would need. I'm wondering whether it makes sense to try to move to that kind of a structure.
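Purely as an illustration of that flow, using the names from the snippets above, the 'thin' lookup would be something like:

# Fetch only the ids of matching rows, then rebuild just those business objects
# from their flat files as before.
matching_ids = [row.id for row in session.query(Example.id).filter(Example.some_var == 3)]

whereas a thicker, ORM-centric design would have that same query hand back the business objects themselves.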
To that end, I have the following 'questions' / thoughts:
My instinct is that the code snippets above are anti-patterns more than they are useful patterns. I can't put my finger on why, exactly, but I'm not very happy with it. Is there a real, tangible disadvantage to the structure as listed above? Would moving to a more ORM-ish design provide advantages in functionality / performance / maintainability over this approach?
I'm paranoid about tying myself down to a database schema. I'm also paranoid about regular DB migrations. The approach listed above gives me a certain peace of mind in knowing that if I do need to do some migration, it'll be limited to the _load_from_db and _sync_to_db functions, and let me mess around willy nilly with all the rest of the code.
Am I wrong about the cost of migrations in the thick-model approach being high?
Is my sense of security in restricting my code's db involvement more of a false sense of security rather than a useful separation?
If I wanted to integrate Example from module/db/models.py with Example from module/example.py in the example above, what would be the cleanest way to go about it? Alternatively, what is an accepted pattern for handling business-logic-heavy models with SQLAlchemy?
In the code above, note that the business logic class keeps all of its information in 'private' instance variables, while the Model class keeps all of its information in class-level Column attributes. How would integrating these two approaches actually work? Theoretically, they should still 'just work' even if put together in a single class definition. In practice, does it?
(The actual codebase is on github, though it's not likely to be very readable)
I think it's natural (at least for me) to be critical of our own designs even as we are working on them. The structures you have here seem fine to me. The answer to whether they are a good fit depends on what you plan to do.
If you consolidate your code into thick models, then all of it will be in one place and your architecture will be simpler; however, it will also probably mean that your business logic is tightly bound to the schema created in the database. Rethinking the database then means rethinking large portions of other areas in the app.
Following the code sample provided here means separating the concerns, which has negative side effects such as more lines of code in more places and increased complexity, but it also means that the coupling is looser. If you stay true to it, you should have significantly less trouble if you decide to change your database schema or move to an entirely different form of storage. Since your business logic class is a plain old object, it serves as a nice detached state container. If you move to something else, you would still have to redesign the model layer and possibly parts of the controllers, but your business logic and UI layers could remain largely unchanged.
I think the real test lies in asking how big is this application going to be and how long do you plan to have it in service? If we're looking at a small application with a short life span then the added complexity of loose couplings is a waste unless you are doing it for educational purposes. If the application is expected to grow to be quite large or be in service for a number of years then the early investment in complexity should pay off in a lower cost of ownership over the long term since making changes to the various components should be easier.
If it makes you feel any better, it's not uncommon to see POCOs and POJOs when working with ORMs such as Entity Framework and Hibernate, for the same reason.
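For comparison, here is a minimal sketch of the 'thick model' alternative discussed in the question, with the business logic living directly on the mapped class (the method name and calculation are invented for illustration):

from sqlalchemy import Column, Integer
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Example(Base):
    __tablename__ = 'example'

    id = Column(Integer, primary_key=True)
    some_var = Column(Integer)

    # Business logic sits on the mapped class itself.
    def derived_value(self):
        # hypothetical domain calculation over mapped attributes
        return self.some_var * 2

# A query then hands back ready-to-use business objects:
# session.query(Example).filter(Example.some_var == 3).all()

With this layout a schema change touches the mapped columns and any logic that reads them, which is the tighter coupling described above.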

How is post_delete firing before delete in Django?

I am seeing post_delete fire on a model before the instance is actually deleted from the database, which is contrary to https://docs.djangoproject.com/en/1.6/ref/signals/#post-delete
Note that the object will no longer be in the database, so be very careful what you do with this instance.
If I look in the database, the record remains; if I re-query using the ORM, the record is returned and compares equal to the instance:
>>> instance.__class__.objects.get(pk=instance.pk) == instance
True
I don't have much relevant code to show, my signal looks like this:
from django.db.models.signals import post_delete, post_save
from django.dispatch import receiver

@receiver(post_delete, sender=UserInvite)
def invite_delete_action(sender, instance, **kwargs):
    raise Exception(instance.__class__.objects.get(pk=instance.pk) == instance)
I am deleting this instance directly, it's not a relation of something else that is being deleted
My model is pretty normal looking
My view is a generic DeleteView
I haven't found any transactional decorators anywhere - which was my first thought as to how it might be happening
Any thoughts on where I would start debugging how on earth this is happening? Is anyone aware of this as a known bug? I can't find any tickets describing behaviour like this – and I am sure this works as expected in various other places in my application that are seemingly unaffected.
If I allow the execution to continue the instance does end up deleted... so it's not like it's present because it's failing to delete it (pretty sure post_delete shouldn't fire in that case anyway).
I believe what I am seeing is caused by Django's default transactional behaviour, where the changes are not committed until the request is complete.
I don't really have a solution – I can't see a way to interrogate the state an instance or record will be in once the transaction is completed (or even a way to get any visibility into the transaction), nor any easy way to prevent this behaviour without significantly altering the way the application runs.
I am opting to ignore the problem for now and not worry about the repercussions in my use case, which, in fact, aren't that severe – but I do welcome any and all suggestions on how to handle this properly.
I fire a more generic signal for activity logging in my post_delete, and in the listener for that I need to be able to check whether the instance is being deleted – otherwise it binds a bad GenericRelation referencing a pk that does not exist. What I intended to do is nullify it when I see the relation is being deleted, but as described, I can't tell at this point – unless I pass an extra argument whenever I fire the signal inside the post_delete.
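A minimal sketch of that extra-argument workaround (the activity_logged signal name and the is_deletion kwarg are invented for illustration):

import django.dispatch
from django.db.models.signals import post_delete
from django.dispatch import receiver

# Hypothetical application-level signal used for activity logging.
activity_logged = django.dispatch.Signal()

@receiver(post_delete, sender=UserInvite)
def invite_delete_action(sender, instance, **kwargs):
    # Tell downstream listeners explicitly that this instance is being deleted,
    # since they cannot rely on the row being gone from the database yet.
    activity_logged.send(sender=sender, instance=instance, is_deletion=True)

@receiver(activity_logged)
def log_activity(sender, instance, is_deletion=False, **kwargs):
    if is_deletion:
        # Skip binding a GenericRelation to a pk that is about to disappear.
        return
    # ... normal activity logging ...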

Django/Python - Updating the database every second

I'm working on creating a browser-based game in Django and Python, and I'm trying to come up with a solution to one of the problems I'm having.
Essentially, every second, multiple user variables need to be updated. For example, there's a currency variable that should increase by some amount every second, progressively getting larger as you level-up and all of that jazz.
I feel like it's a bad idea to do this with cron jobs (and from my research, other people think so too), so right now I'm thinking I should just create a thread that loops through all of the users in the database and performs these updates.
Am I on the right track here, or is there a better solution? In Django, how can I start a thread the second the server starts?
I appreciate the insight.
One possible solution would be to use a separate, daemonized, lightweight Python script to perform all the in-game business logic, and let Django be just the frontend to your game. To bind them together you might pick any high-performance asynchronous messaging library, like ZeroMQ (for instance, to pass players' actions to that script). This stack would also have the benefit of keeping the frontend separated and completely agnostic of the backend implementation.
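A rough sketch of that split, assuming pyzmq and a PUSH/PULL socket pair on an arbitrary local port (the message format is invented for illustration):

import zmq

def send_action(user_id, action):
    # Django side: push a player action to the game daemon.
    socket = zmq.Context.instance().socket(zmq.PUSH)
    socket.connect("tcp://127.0.0.1:5555")
    socket.send_json({"user_id": user_id, "action": action})

def run_game_daemon():
    # Daemon side: pull actions and apply the in-game business logic.
    socket = zmq.Context.instance().socket(zmq.PULL)
    socket.bind("tcp://127.0.0.1:5555")
    while True:
        message = socket.recv_json()
        # apply game logic for message["user_id"] here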
Generally a better solution would be not to update everyone's currency every second. Instead, store the timestamp of the user's most recent transaction, their income rate, and their balance at that moment. With these three pieces of data, you can calculate their current balance whenever needed.
When the user makes a purchase, for example, do the math to calculate their new balance. Some pseudocode:
def current_balance(user):
    # income is per second; the time since the last transaction is a
    # timedelta, so convert it to seconds before multiplying.
    elapsed = (now() - user.last_timestamp).total_seconds()
    return user.latest_balance + user.income * elapsed

def purchase(user, price):
    if price > current_balance(user):
        print("You don't have enough cash.")
        return
    user.latest_balance = current_balance(user) - price
    user.last_timestamp = now()
    user.save()
    print("You just bought a thing!")
With a system like this, your database will only get updated upon user interaction, which will make your system scale oodles better.

Repeating "events" in a calendar: CPU vs Database

I'm building a calendar system from the ground up (a requirement, as I'm working with a special type of calendar alongside the Gregorian one), and I need some help with the logic. I'm writing the application in Django and Python.
Essentially, the logical issue I'm running into is how to persist as few objects as possible, as smartly as possible, without running up the tab on CPU cycles. I feel that polymorphism would be a solution to this, but I'm not exactly sure how to express it here.
I have two basic subsets of events, repeating events and one-shot events.
Repeating events will have subscribers, people who are notified about changes to them. If, for example, a class is cancelled or moved to a different address or time, people who have subscribed need to know about this. Some events simply happen every day until the end of time, won't be edited, and "just happen." The problem is that if I have one object that stores the event info and its repeating policy, then cancelling or modifying one event in the series really screws things up, and I'll have to account for that somehow, keeping subscribers aware of the change and keeping the series together as a logical group.
Paradox: generating unique event objects for each normal event in a series until the end of time (if it repeats indefinitely) doesn't make sense if they're all going to store the same information; however, if any change happens to a single event in the series, I'll almost certainly have to create a separate object in the database to represent a cancellation.
Can someone help me with the logic here? It's really twisting my mind and I can't really think straight anymore. I'd really like some input on how to solve this issue, as repeating events isn't exactly the easiest logical thing either (repeat every other day, or every M/W/F, or on the 1st M of each month, or every 3 months, or once a year on this date, or once a week on this date, or once a month on this date, or at 9:00 am on Tuesdays and 11:00am on Thursdays, etc.) and I'd like help understanding the best route of logic for repeating events as well.
Here's a thought on how to do it:
class EventSeries(models.Model):
    series_name = models.TextField()
    series_description = models.TextField()
    series_repeat_policy = models.IHaveNoIdeaInTheWorldOnHowToRepresentThisField()
    series_default_time = models.TimeField()
    series_start_date = models.DateField()
    series_end_date = models.DateField()
    location = models.ForeignKey('Location')

class EventSeriesAnomaly(models.Model):
    event_series = models.ForeignKey('EventSeries', related_name="exceptions")
    override_name = models.TextField()
    override_description = models.TextField()
    override_time = models.TimeField()
    override_location = models.ForeignKey('Location')
    event_date = models.DateField()

class EventSeriesCancellation(models.Model):
    event_series = models.ForeignKey('EventSeries', related_name="cancellations")
    event_date = models.TimeField()
    cancellation_explanation = models.TextField()
This seems to make a bit of sense, but as stated above, this is ruining my brain right now so anything seems like it would work. (Another problem and question, if someone wants to modify all remaining events in the series, what in the heck do I do!?!? I suppose that I could change 'series_default_time' and then generate anomaly instances for all past instances to set them to the original time, but AHHHHHH!!!)
Boiling it down to three simple, concrete questions, we have:
How can I have a series of repeating events, yet allow for cancellations and modifications on individual events and modifications on the rest of the series as a whole, while storing as few objects in the database as absolutely necessary, never generating objects for individual events in advance?
How can I repeat events in a highly customizable way, without losing my mind, in that I can allow events to repeat in a number of ways, but again making things easy and storing as few objects as possible?
How can I do all of the above, while allowing a switch on each event series to make an occurrence not happen if it falls on a holiday?
This could become a heated discussion, as date logic usually is much harder than it first looks and everyone will have her own idea how to make things happen.
I would probably sacrifice some db space and keep the models as dumb as possible (e.g. by not having to define anomalies against a series). The repeat condition could either be some simple terms which would have to be parsed (depending on your requirements) or - KISS - just the interval at which the next event occurs.
From this you can generate the "next" event, which will copy the repeat condition, and you generate as many events into the future as practically necessary (define some maximum time window into the future for which to generate events, but only generate them when somebody actually looks at the time interval in question). The events could have a pointer back to their parent event, so a whole series is identifiable (just like a linked list).
The model should have an indicator of whether a single event is cancelled. (The event remains in the db, to be able to copy the event into the future.) Cancelling a whole series deletes the list of events.
EDIT: other answers have mentioned the dateutil package for interval building and parsing, which really looks very nice.
I want to address only question 3, about holidays.
In several reporting databases, I have found it handy to define a table, let's call it "Almanac", that has one row for each date, within a certain range. If the range spans ten years, the table will contain about 3,652 rows. That's small by today's standards. The primary key is the date.
Some other columns are things like whether the date is a holiday, a normal working day, or a weekend day. I know, I know, you could compute the weekend part using a built-in function. But it turns out to be convenient to include this as data. It makes your joins simpler and more similar to each other.
Then you have one application program that populates the Almanac. It has all the calendar quirks built into it, including the enterprise rules for figuring out which days are holidays. You can even include columns for which "fiscal month" a given date belongs to, if that's relevant to your case. The rest of the application, both entry programs and extraction programs, all treat the Almanac like plain old data.
This may seem suboptimal because it's not minimal. But trust me, this design pattern is useful in a wide variety of situations. It's up to you to figure how it applies to your case.
The Almanac is really a subset of the principles of data warehousing and star schema design.
If you want to do the same thing inside the CPU, you could have an "Almanac" object with public features such as Almanac.holiday(date).
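A minimal Django sketch of such an Almanac table (the column names are invented, since the answer only describes them informally):

from django.db import models

class Almanac(models.Model):
    # One row per calendar date within the supported range.
    date = models.DateField(primary_key=True)
    is_holiday = models.BooleanField(default=False)
    is_weekend = models.BooleanField(default=False)
    is_working_day = models.BooleanField(default=True)
    fiscal_month = models.CharField(max_length=7, blank=True)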
For an event series model I created, my solution to IHaveNoIdeaInTheWorldOnHowToRepresentThisField was to use a pickled object field to store a recurrence rule (rrule) from dateutil on my event series model.
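For illustration, a dateutil recurrence rule of the kind mentioned here can be built and expanded like this (the schedule and dates are arbitrary):

from datetime import datetime
from dateutil.rrule import rrule, WEEKLY, MO, WE, FR

# "Every Monday, Wednesday and Friday" between two dates.
occurrences = rrule(
    WEEKLY,
    byweekday=(MO, WE, FR),
    dtstart=datetime(2014, 1, 1),
    until=datetime(2014, 3, 1),
)
for occurrence in occurrences:
    print(occurrence)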
I faced the exact same problems as you a while ago. However, the initial solution did not contain any anomalies or cancellations, just repeating events. The way we modeled a set of repeating events was to have a field indicating the interval type (like monthly/weekly/daily) and then the distance (like every 2nd day, every 2nd week, etc.), starting from a given start day. This simple form of repetition does not cover too many scenarios, but it was very easy to calculate the repeating dates. Other ways of expressing repetition are also possible, for instance something like the way cron jobs are defined.
To generate the repetitions, we created a table function that, given some user id, generates all the event repetitions on the fly, up to about 5 years into the future, using recursive SQL (so, as in your approach, only one event has to be stored for a set of repetitions). This has worked very well so far, and the table function can be queried as if the individual repetitions were actually stored in the database. It could also easily be extended to exclude any cancelled events and to replace changed events based on dates, also on the fly. I do not know if this is possible with your database and your ORM.
