I am currently working on a Django 2+ project involving a blockchain, and I want to make copies of some of my object's states into that blockchain.
Basically, I have a model (say "contract") that has a list of several "signature" objects.
I want to make a snapshot of that contract, with the signatures. What I am basically doing is taking the contract at some point in time (when it's created for example) and building a JSON from it.
My problem is: I want to update that snapshot anytime a signature is added/updated/deleted, and each time the contract is modified.
The intuitive solution would be to override every "delete", "create" and "update" method of each model involved in that snapshot, and pray that all of them are implemented correctly and that I didn't forget any. But I think this is not scalable at all, and hard to debug and maintain.
I have thought of a solution that might be more centralized: using a periodic job to get the last update date of my object, compare it to the date of my snapshot, and update the snapshot if necessary.
However with that solution, I can identify changes when objects are modified or created, but not when they are deleted.
So, this is my big question mark: how can you identify deletions in relationships with Django, without any prior context, just by looking at the current state of the database? Is there a Django module to record deleted objects? What are your thoughts on my issue?
As I understand your problem, what you need is Django's signals framework, which lets you listen for changes to your models and, when a change is detected (and any conditions you define are met), run code in your application (which can in turn write back to the database).
This is the most recent documentation:
https://docs.djangoproject.com/en/3.1/topics/signals/
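For example, a minimal sketch, assuming a Signature model with a ForeignKey to Contract and a rebuild_snapshot() helper that builds the JSON and pushes it to the blockchain (both names are made up):

# signals.py -- a minimal sketch; Contract, Signature and rebuild_snapshot()
# are placeholders for your own models/helper.
from django.db.models.signals import post_save, post_delete
from django.dispatch import receiver

from .models import Contract, Signature


def rebuild_snapshot(contract):
    """Serialize the contract and its signatures to JSON and push it."""
    # ... your existing snapshot code ...


@receiver(post_save, sender=Signature)
@receiver(post_delete, sender=Signature)
def signature_changed(sender, instance, **kwargs):
    # Fires when a Signature is created, updated or deleted.
    rebuild_snapshot(instance.contract)


@receiver(post_save, sender=Contract)
def contract_changed(sender, instance, **kwargs):
    rebuild_snapshot(instance)

One caveat: signals are not sent for bulk QuerySet.update() calls, so changes made that way would still slip through. Connecting the receivers in your AppConfig.ready() keeps the registration in one place.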
I have to increment three different counters in a single transaction. Besides that, I have to manipulate three other entities as well. I get
too many entity groups in a single transaction
I've used the recipe from https://developers.google.com/appengine/articles/sharding_counters to implement my counters. I increment the counters inside some model (class) methods, depending on business logic.
As a workaround I implemented a deferred increment method that uses tasks to update the counters. But that doesn't scale well if the number of counters increases further, as there is a limit on the number of tasks in a single transaction as well (I think it's 5), and I guess it's not the most efficient way.
I also found https://github.com/DocSavage/sharded_counter/blob/master/counter.py which seems to ensure the counter is updated even in case of a db error, by going through memcache. But I don't want to increment my counters if the transaction fails.
Another idea is to remember the counters I have to increment during a web request and to increment them all in a single deferred task. I don't know how to implement this in a clean and thread-safe way without passing objects created in the request into the model methods. I think this code would be ugly, and it would not run in the same transaction:
def my_request_handler():
    counter_session = model.counter_session()
    model.mylogic(counter_session, other_params)
    counter_session.write()
Any experiences or ideas?
BTW: I'm using python, ndb and flask
It would be ok if the counter is not 100% accurate.
As said in Transactions and entity groups:
the simplest approach is to determine which entities you need to be
able to process in the same transaction. Then, when you create those
entities, place them in the same entity group by declaring them with a
common ancestor. They will then all be in the same entity group and
you will always be able to update and read them transactionally.
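For what it's worth, here is a rough ndb sketch of that advice; the Account/Counter names are invented, the point is only that every counter shares the same ancestor key so one transaction can touch them all:

from google.appengine.ext import ndb


class Account(ndb.Model):
    """Synthetic parent; exists only to define the entity group."""


class Counter(ndb.Model):
    count = ndb.IntegerProperty(default=0)


@ndb.transactional
def increment_counters(account_id, names):
    # Every Counter key has the same Account ancestor, so all of them
    # live in one entity group and fit in a single transaction.
    keys = [ndb.Key(Account, account_id, Counter, n) for n in names]
    counters = ndb.get_multi(keys)
    updated = []
    for key, counter in zip(keys, counters):
        counter = counter or Counter(key=key)
        counter.count += 1
        updated.append(counter)
    ndb.put_multi(updated)

The usual limit of roughly one write per second per entity group still applies, so this only works if these counters are not hot.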
I have several tens of thousands of related small entities (NDB on top of Master-Slave; I will have to move to HRD one day...), which I'd like to put in the same entity group to enable transactions.
Small subsets of those entities will be updated by transactions.
What are the performance implications of this setup?
Does it mean the whole group gets locked during the update? I.e. one transaction at a time.
Thanks!
There's an approximate performance limit of 1 write transaction per second to an entity group.
The whole group does get locked for the update. A subsequent transaction will fail and retry.
10k entities in an entity group sounds like a lot, but it really depends on your write patterns. For example, if only a few entities in the group are ever updated, it may not be a big issue. However, if random users are constantly updating random entities in the group, you'll want to split it up into more entity groups.
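If you do need to split, the grouping is decided purely by the parent key, so something as simple as hashing the id onto a handful of synthetic parents spreads the write load. A sketch with made-up names:

from google.appengine.ext import ndb

NUM_GROUPS = 20  # arbitrary; size it to your expected write rate


class ItemGroup(ndb.Model):
    """Synthetic parent used only to form smaller entity groups."""


class Item(ndb.Model):
    value = ndb.IntegerProperty(default=0)


def item_key(item_id):
    # Unrelated items end up under different parents, so they no longer
    # contend for the same ~1-write/sec entity group.
    group = hash(item_id) % NUM_GROUPS
    return ndb.Key(ItemGroup, 'group-%d' % group, Item, item_id)

The catch, of course, is that a single transaction can then only touch items that happen to share a parent.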
I am trying to implement a calendar with repeatable events.
Simple example (in human language) is: 'Something happens every working day between 10:00 and 12:00'
What is the best way to store this data in the database, and how do I search through it?
The search may be something like "Give me all events on Tuesday, the 21st of Feb 2012".
I am planning to use a relational database to store them.
P.S. I am planning to use Python and Django so existing libs can be used.
You have to think about how you want to implement this when determining the best way to store the data:
should users be able to reschedule or remove one of the recurring events
similarly, should changes to recurring events change all events or only future events?
do you care about creating a lot of records in the database?
If the answer is yes to the first two and no to the last, the easiest way to implement this is to allow events to have a parent event, and then to create a separate record called Recurring which relates how a base event recurs. Then each time the recurring event changes, a script is triggered that creates/recreates the events.
Searching for the events becomes simplicity itself: since they are actual events, you just search for them.
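A rough Django sketch of that layout (field names are illustrative, not prescriptive):

from django.db import models


class Event(models.Model):
    name = models.CharField(max_length=200)
    start = models.DateTimeField()
    end = models.DateTimeField()
    # Occurrences generated from a recurring base event point back to it,
    # so rescheduling or removing a single occurrence is just an edit here.
    parent = models.ForeignKey('self', null=True, blank=True,
                               related_name='occurrences',
                               on_delete=models.CASCADE)


class Recurring(models.Model):
    event = models.OneToOneField(Event, on_delete=models.CASCADE)
    frequency = models.CharField(max_length=10)  # e.g. 'daily', 'weekly'
    interval = models.PositiveIntegerField(default=1)
    until = models.DateField(null=True, blank=True)

Whenever a Recurring row changes, a small job regenerates the future child Events, and "all events on Tuesday, 21 Feb 2012" becomes a plain date filter on Event.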
I'm building a calendar system from the ground up (a requirement, as I'm working with a special type of calendar alongside the Gregorian one), and I need some help with logic. I'm writing the application in Django and Python.
Essentially, the logical issues I'm running into is how to persist as few objects as possible as smartly as possible without running up the tab on CPU cycles. I'm feeling that polymorphism would be a solution to this, but I'm not exactly sure how to express it here.
I have two basic subsets of events, repeating events and one-shot events.
Repeating events will have subscribers, people who are notified about their changes. If, for example, a class is canceled or moved to a different address or time, people who have subscribed need to know about this. Some events simply happen every day until the end of time, won't be edited, and "just happen." The problem is that if I have one object that stores the event info and its repeating policy, then canceling or modifying one event in the series really screws things up, and I'll have to account for that somehow, keeping subscribers aware of the change and keeping the series together as a logical group.
Paradox: generating unique event objects for each normal event in a series until the end of time (if it repeats indefinitely) doesn't make sense if they're all going to store the same information; however, if any change happens to a single event in the series, I'll almost have to create a different object in the database to represent a cancellation.
Can someone help me with the logic here? It's really twisting my mind and I can't really think straight anymore. I'd really like some input on how to solve this issue, as repeating events isn't exactly the easiest logical thing either (repeat every other day, or every M/W/F, or on the 1st M of each month, or every 3 months, or once a year on this date, or once a week on this date, or once a month on this date, or at 9:00 am on Tuesdays and 11:00am on Thursdays, etc.) and I'd like help understanding the best route of logic for repeating events as well.
Here's a thought on how to do it:
class EventSeries(models.Model):
    series_name = models.TextField()
    series_description = models.TextField()
    series_repeat_policy = models.IHaveNoIdeaInTheWorldOnHowToRepresentThisField()
    series_default_time = models.TimeField()
    series_start_date = models.DateField()
    series_end_date = models.DateField()
    location = models.ForeignKey('Location')


class EventSeriesAnomaly(models.Model):
    event_series = models.ForeignKey('EventSeries', related_name="exceptions")
    override_name = models.TextField()
    override_description = models.TextField()
    override_time = models.TimeField()
    override_location = models.ForeignKey('Location')
    event_date = models.DateField()


class EventSeriesCancellation(models.Model):
    event_series = models.ForeignKey('EventSeries', related_name="cancellations")
    event_date = models.DateField()
    cancellation_explanation = models.TextField()
This seems to make a bit of sense, but as stated above, this is ruining my brain right now so anything seems like it would work. (Another problem and question, if someone wants to modify all remaining events in the series, what in the heck do I do!?!? I suppose that I could change 'series_default_time' and then generate anomaly instances for all past instances to set them to the original time, but AHHHHHH!!!)
Boiling it down to three simple, concrete questions, we have:
How can I have a series of repeating events, yet allow for cancellations and modifications on individual events and modifications on the rest of the series as a whole, while storing as few objects in the database as absolutely necessary, never generating objects for individual events in advance?
How can I repeat events in a highly customizable way, without losing my mind, in that I can allow events to repeat in a number of ways, but again making things easy and storing as few objects as possible?
How can I do all of the above, allowing for a switch on each event series to make it not happen if it falls out on a holiday?
This could become a heated discussion, as date logic usually is much harder than it first looks and everyone will have her own idea how to make things happen.
I would probably sacrifice some db space and keep the models as dumb as possible (e.g. by not having to define anomalies for a series). The repeat condition could either be some simple terms which would have to be parsed (depending on your requirements) or - KISS - just the interval at which the next event occurs.
From this you can generate the "next" event, which copies the repeat condition, and you generate as many events into the future as practically necessary (define some maximum time window into the future for which to generate events, but only generate them when somebody actually looks at the time interval in question). The events could have a pointer back to their parent event, so a whole series is identifiable (just like a linked list).
The model should have an indicator of whether a single event is cancelled. (The event remains in the db, to be able to copy the event into the future.) Cancelling a whole series deletes the list of events.
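A rough sketch of that "generate into a bounded window" idea, assuming an Event model with start, repeat_days, parent and cancelled fields (all made-up names):

from datetime import timedelta

from myapp.models import Event  # hypothetical: start, repeat_days, parent, cancelled


def extend_series(parent, window_end):
    """Materialize occurrences of `parent` up to window_end (called lazily,
    e.g. when someone views a calendar range that is not yet generated)."""
    last = (Event.objects.filter(parent=parent)
                         .order_by('-start')
                         .first()) or parent
    next_start = last.start + timedelta(days=parent.repeat_days)
    while next_start <= window_end:
        # Each generated event copies the repeat condition and points back
        # to the series head, like a linked list.
        Event.objects.create(parent=parent, start=next_start,
                             repeat_days=parent.repeat_days, cancelled=False)
        next_start += timedelta(days=parent.repeat_days)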
EDIT: other answers have mentioned the dateutil package for interval building and parsing, which really looks very nice.
I want to address only question 3, about holidays.
In several reporting databases, I have found it handy to define a table, let's call it "Almanac", that has one row for each date, within a certain range. If the range spans ten years, the table will contain about 3,652 rows. That's small by today's standards. The primary key is the date.
Some other columns are things like whether the date is a holiday, a normal working day, or a weekend day. I know, I know, you could compute the weekend stuff with a built-in function. But it turns out to be convenient to include this stuff as data. It makes your joins simpler and more similar to each other.
Then you have one application program that populates the Almanac. It has all the calendar quirks built into it, including the enterprise rules for figuring out which days are holidays. You can even include columns for which "fiscal month" a given date belongs to, if that's relevant to your case. The rest of the application, both entry programs and extraction programs, all treat the Almanac like plain old data.
This may seem suboptimal because it's not minimal. But trust me, this design pattern is useful in a wide variety of situations. It's up to you to figure how it applies to your case.
The Almanac is really a subset of the principles of data warehousing and star schema design.
If you want to do the same thing inside the CPU, you could have an "Almanac" object with public features such as Almanac.holiday(date).
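In Django terms the Almanac could be as small as this (column names are just an illustration):

from datetime import timedelta

from django.db import models


class Almanac(models.Model):
    day = models.DateField(primary_key=True)
    is_weekend = models.BooleanField(default=False)
    is_holiday = models.BooleanField(default=False)
    fiscal_month = models.CharField(max_length=7, blank=True)


def populate_almanac(start, days=3653, holidays=frozenset()):
    # One row per date; the enterprise holiday rules live only in this job.
    rows = [Almanac(day=start + timedelta(n),
                    is_weekend=(start + timedelta(n)).weekday() >= 5,
                    is_holiday=(start + timedelta(n)) in holidays)
            for n in range(days)]
    Almanac.objects.bulk_create(rows)

Then "give me all working days in March" is a plain filter or join against Almanac rather than date arithmetic scattered through the code.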
For an event series model I created, my solution to IHaveNoIdeaInTheWorldOnHowToRepresentThisField was to use a pickled object field to save a recurrence rule (rrule) from dateutil in my event series model.
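Something like this, assuming the django-picklefield package (the answer doesn't say which pickled field implementation was used) together with dateutil:

from dateutil.rrule import rrule, WEEKLY, MO, WE, FR
from django.db import models
from picklefield.fields import PickledObjectField


class EventSeries(models.Model):
    series_start = models.DateTimeField()
    recurrence = PickledObjectField()  # holds a dateutil rrule


# Storing "every Mon/Wed/Fri" and expanding it later:
# series.recurrence = rrule(WEEKLY, byweekday=(MO, WE, FR),
#                           dtstart=series.series_start)
# series.save()
# occurrences = series.recurrence.between(window_start, window_end)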
I faced exactly the same problems as you a while ago. However, the initial solution did not contain any anomalies or cancellations, just repeating events. The way we modeled a set of repeating events was to have a field indicating the interval type (like monthly/weekly/daily) and then the distance (like every 2nd day, every 2nd week, etc.), starting from a given starting day. This simple way of handling repetition does not cover too many scenarios, but it was very easy to calculate the repeating dates. Other ways of repetition are also possible, for instance something in the way cronjobs are defined.
To generate the repetitions, we created a table function that, given some user id, generates all the event repetitions on the fly, up to about 5 years into the future, using recursive SQL (so, as in your approach, only one event has to be stored for a set of repetitions). This works very well so far, and the table function can be queried as if the individual repetitions were actually stored in the database. It could also easily be extended to exclude any cancelled events and to replace changed events based on dates, also on the fly. I do not know if this is possible with your database and your ORM.
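The same expansion can be done on the Python side if your database lacks recursive table functions; here is a sketch against the models from the question, with an invented interval_days field standing in for series_repeat_policy:

from datetime import timedelta


def occurrences(series, window_start, window_end):
    """Expand one EventSeries on the fly, skipping cancellations and
    substituting anomalies (uses the `exceptions`/`cancellations`
    related names from the question's models)."""
    cancelled = {c.event_date for c in series.cancellations.all()}
    overridden = {a.event_date: a for a in series.exceptions.all()}
    day = series.series_start_date
    last_day = min(window_end, series.series_end_date or window_end)
    while day <= last_day:
        if day >= window_start and day not in cancelled:
            yield day, overridden.get(day, series)
        day += timedelta(days=series.interval_days)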