Repeating "events" in a calendar: CPU vs Database - python

I'm building a calendar system from the ground up (a requirement, as I'm working with a special type of calendar alongside the Gregorian one), and I need some help with the logic. I'm writing the application in Django and Python.
Essentially, the logical issue I'm running into is how to persist as few objects as possible, as smartly as possible, without running up the tab on CPU cycles. I feel that polymorphism would be a solution to this, but I'm not exactly sure how to express it here.
I have two basic subsets of events, repeating events and one-shot events.
Repeating events will have subscribers, people who are notified about changes to them. If, for example, a class is canceled or moved to a different address or time, people who have subscribed need to know about this. Some events simply happen every day until the end of time, won't be edited, and "just happen." The problem is that if I have one object that stores the event info and its repetition policy, then canceling or modifying one event in the series really screws things up, and I'll have to account for that somehow, keeping subscribers aware of the change and keeping the series together as a logical group.
Paradox: generating unique event objects for each normal event in a series until the end of time (if it repeats indefinitely) doesn't make sense if they're all going to store the same information; however, if any change happens to a single event in the series, I'll almost certainly have to create a different object in the database to represent the cancellation or modification.
Can someone help me with the logic here? It's really twisting my mind and I can't think straight anymore. I'd really like some input on how to solve this issue, as repeating events aren't exactly the easiest thing to model either (repeat every other day, or every M/W/F, or on the 1st Monday of each month, or every 3 months, or once a year on this date, or once a week on this date, or once a month on this date, or at 9:00 am on Tuesdays and 11:00 am on Thursdays, etc.), and I'd like help finding the best logic for repeating events as well.
Here's a thought on how to do it:
class EventSeries(models.Model):
    series_name = models.TextField()
    series_description = models.TextField()
    series_repeat_policy = models.IHaveNoIdeaInTheWorldOnHowToRepresentThisField()
    series_default_time = models.TimeField()
    series_start_date = models.DateField()
    series_end_date = models.DateField()
    location = models.ForeignKey('Location')

class EventSeriesAnomaly(models.Model):
    event_series = models.ForeignKey('EventSeries', related_name="exceptions")
    override_name = models.TextField()
    override_description = models.TextField()
    override_time = models.TimeField()
    override_location = models.ForeignKey('Location')
    event_date = models.DateField()

class EventSeriesCancellation(models.Model):
    event_series = models.ForeignKey('EventSeries', related_name="cancellations")
    event_date = models.DateField()
    cancellation_explanation = models.TextField()
This seems to make a bit of sense, but as stated above, this is ruining my brain right now, so anything seems like it would work. (Another problem and question: if someone wants to modify all remaining events in the series, what in the heck do I do!?!? I suppose I could change 'series_default_time' and then generate anomaly instances for all past instances to set them back to the original time, but AHHHHHH!!!)
Boiling it down to three simple, concrete questions, we have:
How can I have a series of repeating events, yet allow for cancellations and modifications on individual events and modifications on the rest of the series as a whole, while storing as few objects in the database as absolutely necessary, never generating objects for individual events in advance?
How can I repeat events in a highly customizable way, without losing my mind, in that I can allow events to repeat in a number of ways, but again making things easy and storing as few objects as possible?
How can I do all of the above while allowing a switch on each event series so that an occurrence doesn't happen if it falls on a holiday?

This could become a heated discussion, as date logic is usually much harder than it first looks, and everyone will have their own idea of how to make things happen.
I would probably sacrifice some DB space and keep the models as dumb as possible (e.g. by not having to define anomalies against a series). The repeat condition could either be some simple terms that would have to be parsed (depending on your requirements) or - KISS - just the interval at which the next event occurs.
From this you can generate the "next" event, which will copy the repeat condition, and you generate as many events into the future as practically necessary (define some maximum time window into the future for which to generate events, but generate them only when somebody actually looks at the time interval in question). The events could have a pointer back to their parent event, so a whole series is identifiable (just like a linked list).
The model should have an indicator of whether a single event is cancelled. (The event remains in the DB, so it can still be copied into the future.) Cancelling a whole series deletes the list of events.
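A minimal sketch of that generate-on-demand approach, under the KISS fixed-interval assumption (all model and field names here are illustrative, not taken from the question's code):

from django.db import models

class Event(models.Model):
    name = models.TextField()
    start = models.DateTimeField()
    # KISS repeat condition: a fixed interval, or NULL for one-shot events
    repeat_interval = models.DurationField(null=True, blank=True)
    # pointer back to the parent, so a whole series is identifiable
    parent = models.ForeignKey('self', null=True, blank=True,
                               related_name='children',
                               on_delete=models.CASCADE)
    cancelled = models.BooleanField(default=False)

    def materialize_until(self, horizon):
        """Lazily create concrete successors up to `horizon`.

        Called when somebody actually looks at a time window, not in advance.
        """
        last = self.children.order_by('-start').first() or self
        while self.repeat_interval and last.start + self.repeat_interval <= horizon:
            last = Event.objects.create(
                name=self.name,
                start=last.start + self.repeat_interval,
                parent=self,
            )

Cancelled events keep their rows (cancelled=True) so the chain stays intact for copying forward, while deleting the parent cascades away the whole series.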
EDIT: other answers have mentioned the dateutil package for interval building and parsing, which really looks very nice.

I want to address only question 3, about holidays.
In several reporting databases, I have found it handy to define a table, let's call it "Almanac", that has one row for each date, within a certain range. If the range spans ten years, the table will contain about 3,652 rows. That's small by today's standards. The primary key is the date.
Some other columns are things like whether the date is a holiday, a normal working day, or a weekend day. I know, I know, you could compute the weekend stuff by using a built-in function. But it turns out to be convenient to include this stuff as data. It makes your joins simpler and more similar to each other.
Then you have one application program that populates the Almanac. It has all the calendar quirks built into it, including the enterprise rules for figuring out which days are holidays. You can even include columns for which "fiscal month" a given date belongs to, if that's relevant to your case. The rest of the application, both entry programs and extraction programs, all treat the Almanac like plain old data.
This may seem suboptimal because it's not minimal. But trust me, this design pattern is useful in a wide variety of situations. It's up to you to figure how it applies to your case.
The Almanac is really a subset of the principles of data warehousing and star schema design.
If you want to do the same thing inside the CPU, you could have an "Almanac" object with public features such as Almanac.holiday(date).
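Here is a hedged sketch of what that could look like in the asker's Django setting (the model, the holidays argument, and all field names are assumptions for illustration):

import datetime

from django.db import models

class Almanac(models.Model):
    date = models.DateField(primary_key=True)  # one row per date in the range
    is_holiday = models.BooleanField(default=False)
    is_weekend = models.BooleanField(default=False)
    is_working_day = models.BooleanField(default=True)

def populate_almanac(start, end, holidays=frozenset()):
    """The one population program; enterprise holiday rules plug in here."""
    day = start
    while day <= end:
        weekend = day.weekday() >= 5  # Saturday is 5, Sunday is 6
        holiday = day in holidays
        Almanac.objects.update_or_create(
            date=day,
            defaults={
                'is_holiday': holiday,
                'is_weekend': weekend,
                'is_working_day': not (weekend or holiday),
            },
        )
        day += datetime.timedelta(days=1)

The rest of the application then just filters on or joins against this table, e.g. skipping a generated occurrence when its date's is_holiday is true.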

For an event series model I created, my solution to IHaveNoIdeaInTheWorldOnHowToRepresentThisField was to use a pickled object field to save a recurrence rule (rrule) from dateutil in my event series model.
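For example, a rule like the following could be stored in such a field (the rule itself is illustrative; rrule and its constants are real dateutil APIs):

import datetime

from dateutil.rrule import rrule, WEEKLY, MO, WE, FR

# "every Monday, Wednesday and Friday at 9:00 am", one of the asker's cases
rule = rrule(WEEKLY, byweekday=(MO, WE, FR),
             dtstart=datetime.datetime(2012, 1, 2, 9, 0))

# expand only the window somebody is actually looking at
occurrences = rule.between(datetime.datetime(2012, 2, 1),
                           datetime.datetime(2012, 3, 1))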

I faced the exact same problems as you a while ago. However, the initial solution did not contain any anomalies or cancellations, just repeating events. The way we modeled a set of repeating events is to have a field indicating the interval type (like monthly/weekly/daily) and then the distances (like every 2nd day, every 2nd week, etc.) starting from a given starting day. This simple form of repetition does not cover too many scenarios, but it was very easy to calculate the repeating dates. Other ways of repetition are also possible, for instance something in the way cronjobs are defined.
To generate the repetitions, we created a table function that, given some user id, generates all the event repetitions on the fly, up to about 5 years into the future, using recursive SQL (so, as in your approach, only one event has to be stored for a set of repetitions). This has worked very well so far, and the table function can be queried as if the individual repetitions were actually stored in the database. It could also easily be extended to exclude any cancelled events and to replace changed events based on dates, also on the fly. I do not know if this is possible with your database and your ORM.
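In Python, the same interval-type/distance scheme can be expanded on the fly without recursive SQL; a sketch (the interval names and the generator are illustrative):

import datetime

from dateutil.relativedelta import relativedelta

STEP = {
    'daily': lambda n: relativedelta(days=n),
    'weekly': lambda n: relativedelta(weeks=n),
    'monthly': lambda n: relativedelta(months=n),
}

def occurrences(start, interval_type, every, until):
    """Yield repetition dates, e.g. every 2nd week from `start` to `until`."""
    step = STEP[interval_type](every)
    current = start
    while current <= until:
        yield current
        current = current + step

# e.g. every 2nd week for roughly a year:
dates = list(occurrences(datetime.date(2012, 1, 2), 'weekly', 2,
                         datetime.date(2012, 12, 31)))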

Related

Can a function in an sqlalchemy class perform a query?

I may be guilty of thinking in too object-oriented a manner here, but I keep coming back to a certain pattern which doesn't look SQL-friendly at all.
Simplified description:-
I have an Attendance table, with various attributes including clockin, lunchout, lunchin, and clockout (all of type Time), an employee ID, and a date.
Different employees can have different shift hours, which depends on the day of the week and day of the month. The employee's categories are summarized in an Employees table, the shift hours etc. are summarized in an Hours table. A Leave table specifies half/full day leave taken.
I have raw check-in times from another source (third party; basically, consider this a CSV) which I sort and then need to insert into the Attendance table properly, by which I imagine I should be doing:-
a) Take the earliest time as clockin
b) Take the latest time as clockout
c) Compare remaining times with assigned lunch hours according to an employee's shift for the day
c) is the one giving me problems, as some employees check in at odd times (perhaps they need to run out for an hour to meet a client), and some simply don't check in for meals (eating in, etc.). So I need to fuzzy-match the remaining times with the assigned lunch hours, meaning I have to access the Hours table as well.
I'm trying to do all of this in a function within the Attendance class, so that it looks something like this:-
class Attendance(Base):
    # ... my attributes ...
    def calc_attendance(self, listOfTimes):
        # Do the necessary
        ...
However, it looks like I'd be required to do more queries within the calc_attendance function. I'm not even sure that's possible, and it certainly doesn't look like something I should be doing.
I could do multiple queries in the calling function, but that seems messier. I've also considered multiple joins so the requisite information (for each employee per day) is available. Neither of them feels right; please do correct me, though, if necessary.
In short, yes, you seem to be trying to jam everything into one class, but that is not "wrong"; it's just a question of which class you're jamming things into.
To answer the actual question: yes, you can perform a query inside another class, and/or make that class turn itself into a dict or other iterable format, which makes comparing things easier in some ways.
However, I have found in practice that it is usually more time-efficient to build a class per table, apply modifications to that table's specific data using internal functions, and add an overriding umbrella "job" class that performs the specific tasks (comparison, data input, etc.) for the group, but which encapsulates this process so that it can scale up and be used by other higher-level classes in the future. This keeps table-specific work in the respective classes, while the group can be compared and fuzzy-matched without cluttering up those classes. It makes the code easier to maintain and makes it possible to split the work amongst a group. It also allows for a much sharper focus on each task. In theory it would also allow for asyncio and other optimizations in the future.
so:
Main Job (class)
(various comparison functions, data input verification, etc...)
|-> table 1 class instance
|-> table 2 class instance
Happy to help further if you need actual examples; a rough sketch of the shape follows.
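A minimal sketch of that layout, assuming SQLAlchemy's declarative mapping (all table and column names here are invented for illustration, not taken from the question):

from sqlalchemy import Column, Date, Integer, Time
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Attendance(Base):
    __tablename__ = 'attendance'
    id = Column(Integer, primary_key=True)
    employee_id = Column(Integer)
    date = Column(Date)
    clockin = Column(Time)
    clockout = Column(Time)

class Hours(Base):
    __tablename__ = 'hours'
    id = Column(Integer, primary_key=True)
    employee_id = Column(Integer)
    lunch_start = Column(Time)
    lunch_end = Column(Time)

class AttendanceJob:
    """Umbrella "job" class: owns the session, does the cross-table work."""

    def __init__(self, session):
        self.session = session

    def shift_hours_for(self, employee_id):
        return (self.session.query(Hours)
                .filter(Hours.employee_id == employee_id)
                .all())

    def calc_attendance(self, employee_id, date, list_of_times):
        times = sorted(list_of_times)
        record = Attendance(employee_id=employee_id, date=date,
                            clockin=times[0], clockout=times[-1])
        # fuzzy-match times[1:-1] against shift_hours_for(employee_id) here
        self.session.add(record)

The table classes stay dumb; the job class is the only place that queries across them, which keeps the comparison logic out of the mapped classes.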

Multiple queries vs. manually sorting one large query (AppEngine NDB)

For a model like:
class Thing(ndb.Model):
    visible = ndb.BooleanProperty()
    made_by = ndb.KeyProperty(kind=User)
    belongs_to = ndb.KeyProperty(kind=AnotherThing)
Essentially I'm performing an 'or' query, but comparing different properties, so I can't use a built-in OR... I want to get all Things (belonging to a particular AnotherThing) which either have visible set to True, or have visible set to False and were made_by the current user.
Which would be less demanding on the datastore (i.e. financially cost less):
Query to get everything, i.e. Thing.query(Thing.belongs_to == some_thing.key), and iterate through the results, keeping the visible ones and the ones that aren't visible but are made_by the current user?
Query to get the visible ones, i.e. Thing.query(Thing.belongs_to == some_thing.key, Thing.visible == True), and query separately to get the non-visible ones by the current user, i.e. Thing.query(Thing.belongs_to == some_thing.key, Thing.visible == False, Thing.made_by == current_user)?
Number 1 would fetch many unneeded results, like non-visible Things by other users, which I think means many reads of the datastore? Number 2 is two whole queries, though, which is also possibly unnecessarily heavy, right? I'm still trying to work out what kinds of interaction with the database cause what kinds of costs.
I'm using ndb, tasklets and memcache where necessary, in case that's relevant.
Number two is going to cost less, for two reasons. First, you pay for each read of the datastore and for each entity returned by a query, so you would pay more for the first option, where you have to read and query all the data. With the second way you only pay for what you need.
Secondly, you also pay for backend or frontend time, and you will be spending time iterating through all the results in the first method, whereas you spend no time on that in the second method.
I can't see a way where the first option is better (maybe if you only have a few entities?).
To understand how reads and queries are costed, scroll down a little on:
https://developers.google.com/appengine/docs/billing
You will see how Reads, Writes and Smalls are added up for reads, writes and queries.
I would also just query for the ones that are owned by the current user, instead of visible=False and owner=current; this way you don't need a composite index, which will save some time. You can also make visible a partial index, thus saving some space as well (only index it when True, assuming you never need to query for False ones). You will need to do a little work to remove duplicates, but that is probably not too bad.
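A sketch of that two-query approach with de-duplication (the query calls are standard ndb; the helper itself is illustrative):

def fetch_things(some_thing, current_user):
    visible = Thing.query(Thing.belongs_to == some_thing.key,
                          Thing.visible == True).fetch()
    mine = Thing.query(Thing.belongs_to == some_thing.key,
                       Thing.made_by == current_user).fetch()
    # de-duplicate by key: a visible Thing made by the user appears in both
    seen, results = set(), []
    for thing in visible + mine:
        if thing.key not in seen:
            seen.add(thing.key)
            results.append(thing)
    return results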
You are probably best benchmarking both cases using real-world data. It's hard to determine things like this in the abstract, as there are many subtleties that may affect overall performance.
I would expect option 2 to be better though. Loading tons of objects that you don't care about is simply going to put a heavy burden on the data store that I don't think an extra query would be comparable to. Of course, it depends on how many extra things, etc.

Django Models: Keep track of activity through related models?

I have something of a master table of Persons. Everything in my Django app somehow relates to one or more People, either directly or through long fk chains. Also, all my models have the standard bookkeeping fields 'created_at' and 'updated_at'. I want to add a field to my Person table called 'last_active_at', mostly for raw SQL ordering purposes.
Creating or editing certain related models produces new timestamps for those objects. I need to somehow update Person.last_active_at with those values. Functionally, this isn't too hard to accomplish, but I'm concerned about undue stress on the app.
My two greatest causes of concern are that I'm restricted to a real db field -- I can't assign a function to the Person table as a @property -- and that one of these 'activity' models receives and processes new instances from a foreign data source I have no control over, sporadically receiving a lot of data at once.
My first thought was to add a post_save hook to the 'activity' models. That still seems like my best option, but I know nothing about them, how hard they hit the db, etc.
My second thought was to write some sort of script that goes through the day's activity and updates those models overnight. My employers want a more 'live' stream, though.
My third thought was to modify the post_save logic to check whether 'updated_at' is less than half an hour after the Person's 'last_active_at', and to skip the update if so.
Are my thoughts tending in a scalable direction? Are there other approaches I should pursue?
It is said that premature optimization is the mother of all problems. You should start with the dumbest implementation (update it every time), and then measure and - if needed - replace it with something more efficient.
First of all, let's put a method on Person to update the last_active_at field. That way, all the updating logic is concentrated in one place, and we can easily modify it later.
Signals are quite easy to use: you just declare a function and register it as a receiver, and it will be run each time the signal is emitted. See the documentation for the full explanation, but here is what it might look like:
from django.db.models.signals import post_save
from django.dispatch import receiver

@receiver(post_save, sender=RelatedModel)
def my_handler(sender, instance, **kwargs):
    # instance is the object that was just saved (sender is its model class)
    person = instance.person  # however the related Person is reached; illustrative
    person.update_activity()
As for the updating itself, start with the dumbest way to do it:
def update_activity(self):
    # assumes: from django.utils.timezone import now
    self.last_active_at = now()
    self.save(update_fields=['last_active_at'])
Then measure and decide if it's a problem or not. If it's a problem, some of the things you can do are :
Check if the previous update is recent before updating again. This might be useless if a read from your database is not faster than a write; it's not a problem if you use a cache.
Write it down somewhere for a deferred process to update later. It doesn't need to be daily: if the problem is that you have 100 updates per second, you can just have a script update the database every 10 seconds, or every minute (a sketch of this follows). You can probably find a good performance/up-to-dateness trade-off using this technique.
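A hedged sketch of that deferred flush (Person is from the question; the module-level set is only for illustration: in a real multi-process deployment you would collect the ids in the cache or a task queue instead):

from django.utils.timezone import now

pending_person_ids = set()  # the post_save handler adds ids here instead of saving

def flush_activity():
    """Run every 10-60 seconds from cron or a worker, not on every save."""
    ids = list(pending_person_ids)
    pending_person_ids.clear()
    if ids:
        Person.objects.filter(pk__in=ids).update(last_active_at=now())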
These are just some thoughts based on what you proposed; the right choice depends on the kind of figures you have. Determine what kind of load you'll have and what kind of reaction time is needed for that field, and experiment.

How to implement calendar with repeatable events?

I am trying to implement a calendar with repeatable events.
Simple example (in human language) is: 'Something happens every working day between 10:00 and 12:00'
What is the most correct way to store this data in the database and to search between them.
The search may be something like "Give me all events on Tuesday the 21st of Feb 2012".
I am planning to use relational database to store them.
P.S. I am planning to use Python and Django so existing libs can be used.
You have to think about how you want to implement this when determining the best way to store the data:
should users be able to reschedule or remove one of the recurring events?
similarly, should changes to recurring events change all events or only future events?
do you care about creating a lot of records in the database?
If the answer is yes to the first two and no to the last, the easiest way to implement this is to allow events to have a parent event, and then to create a separate record called Recurring which describes how a base event recurs. Then each time the recurring event changes, a script is triggered that creates/recreates the events.
Searching for the events becomes simplicity itself: since they are actual events, you just search for them.
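One possible shape for those models (a sketch; the names and the serialized-rule field are assumptions):

from django.db import models

class Event(models.Model):
    title = models.TextField()
    start = models.DateTimeField()
    end = models.DateTimeField()
    # occurrences generated from a recurring base point back at it
    parent = models.ForeignKey('self', null=True, blank=True,
                               related_name='occurrences',
                               on_delete=models.CASCADE)

class Recurring(models.Model):
    base_event = models.OneToOneField(Event, on_delete=models.CASCADE)
    rule = models.TextField()  # e.g. a serialized dateutil rrule
    until = models.DateField(null=True, blank=True)

# the search is then an ordinary query over real rows, e.g.:
# Event.objects.filter(start__date=datetime.date(2012, 2, 21))

Since the regeneration script materializes real Event rows, "all events on Tuesday the 21st of Feb 2012" is a plain filter, as the answer says.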

Object oriented design for an investment/stock and options portfolio in Python

I'm a beginner/intermediate Python programmer but I haven't written an application, just scripts. I don't currently use a lot of object oriented design, so I would like this project to help build my OOD skills. The problem is, I don't know where to start from a design perspective (I know how to create the objects and all that stuff). For what it's worth, I'm also self taught, no formal CS education.
I'd like to try writing a program to keep track of a portfolio of stock/option positions.
I have a rough idea about what would make good object candidates (Portfolio, Stock, Option, etc.) and methods (Buy, Sell, UpdateData, etc.).
A long position would buy-to-open, and sell-to-close while a short position has a sell-to-open and buy-to-close.
portfolio.PlaceOrder(type="BUY", symbol="ABC", date="01/02/2009", price=50.00, qty=100)
portfolio.PlaceOrder(type="SELL", symbol="ABC", date="12/31/2009", price=100.00, qty=25)
portfolio.PlaceOrder(type="SELLSHORT", symbol="XYZ", date="1/2/2009", price=30.00, qty=50)
portfolio.PlaceOrder(type="BUY", symbol="XYZ", date="2/1/2009", price=10.00, qty=50)
Then, once this method is called, how do I store the information? At first I thought I would have a Position object with attributes like Symbol, OpenDate, OpenPrice, etc., but thinking about updating the position to account for sales becomes tricky, because buys and sells happen at different times and in different amounts.
Buy 100 shares to open, 1 time, 1 price. Sell 4 different times, 4 different prices.
Buy 100 shares. Sell 1 share per day, for 100 days.
Buy 4 different times, 4 different prices. Sell entire position at 1 time, 1 price.
A possible solution would be to create an object for each share of stock; this way each share would have its own dates and prices. Would this be too much overhead? The portfolio could have thousands or millions of little Share objects. If you wanted to find out the total market value of a position, you'd need something like:
sum([trade.last_price for trade in portfolio.positions if trade.symbol == "ABC"])
If you had a position object the calculation would be simple:
position.last * position.qty
Thanks in advance for the help. Looking at other posts, it's apparent SO is for "help", not "write your program for you". I feel that I just need some direction, a pointer down the right path.
ADDITIONAL INFO UPON REFLECTION
The Purpose
The program would keep track of all positions, both open and closed, with the ability to see a detailed profit and loss.
When I think about detailed P&L I want to see...
- all the open dates (and closed dates)
- time held
- open price (and close price)
- P&L since open
- P&L per day
@Senderle
I think perhaps you're taking the "object" metaphor too literally, and so are trying to make a share, which seems very object-like in some ways, into an object in the programming sense of the word. If so, that's a mistake, which is what I take to be juxtapose's point.
This is my mistake. Thinking about "objects", a share object seems like a natural candidate. It's only when there may be millions of them that the idea seems crazy. I'll have some free coding time this weekend and will try creating an object with a quantity.
There are two basic precepts you should keep in mind when designing such a system:
Eliminate redundancy from your data. No redundancy ensures integrity.
Keep all the data you need to answer any inquiry, at the lowest level of detail.
Based on these precepts, my suggestion is to maintain a transaction log file. Each transaction represents a change of state of some kind, along with all the pertinent facts about it: when, what, buy/sell, how many, how much, etc. Each transaction would be represented by a record (a namedtuple is useful here) in a flat file. A year's worth (or even 5 or 10 years) of transactions should easily fit in a memory-resident list. You can then create functions to select, sort and summarize whatever information you need from this list, and being memory-resident, it will be amazingly fast, much faster than a SQL database.
When and if the transaction log becomes too large or too slow, you can compute the state of all your positions as of a particular date (like year-end), use that as the initial state for the following period, and archive your old log file to disk.
You may want some auxiliary information about your holdings, such as value/price on any particular date, so you can plot value vs. time for any or all holdings (there are online sources for this type of information, Yahoo Finance for one). A master database containing static information about each of your holdings would also be useful.
I know this doesn't sound very "object oriented", but OO design could be useful here: hide the detailed workings of the system in a TransLog object with methods to save/restore the data to/from disk (save/open methods) and to enter/change/delete a transaction, plus additional methods to process the data into meaningful information displays.
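A minimal sketch of such a TransLog (the record fields and the position summary are illustrative, not a full design):

import csv
from collections import namedtuple

Transaction = namedtuple('Transaction', 'date action symbol qty price')

class TransLog:
    """Memory-resident transaction log with save and simple summaries."""

    def __init__(self, transactions=None):
        self.transactions = list(transactions or [])

    def enter(self, txn):
        self.transactions.append(txn)

    def position(self, symbol):
        """Net quantity held: buys add, sells and shorts subtract."""
        sign = {'BUY': 1, 'SELL': -1, 'SELLSHORT': -1}
        return sum(sign[t.action] * t.qty
                   for t in self.transactions if t.symbol == symbol)

    def save(self, path):
        with open(path, 'w', newline='') as f:
            csv.writer(f).writerows(self.transactions)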
First write the API with a command line interface. When this is working to your satisfaction, then you can go on to creating a GUI front end if you wish.
Good luck and have fun!
Avoid objects. Object oriented design is flawed. Think about your program as a collection of behaviors that operate on data (lists and dictionaries). Then group your related behaviors as functions in a module.
Each function should have clear input and outputs. Store your data globally in each module.
Why do it without objects? Because it maps more closely to the problem space. Object-oriented programming creates too much indirection to solve a problem. Unnecessary indirection causes software bloat and bugs.
A possible solution would be to create an object for each share of stock; this way each share would have its own dates and prices. Would this be too much overhead? The portfolio could have thousands or millions of little Share objects. If you wanted to find out the total market value of a position, you'd need something like:
Yes, it would be too much overhead. The solution here is to store the data in a database. Finding the total market value of a position would be done in SQL, unless you use a NoSQL scheme.
Don't try to design for all possible future outcomes. Just make your program work the way it needs to work now.
I think I'd separate it into
holdings (what you currently own or owe of each symbol)
orders (simple demands to buy or sell a single symbol at a single time)
trades (collections of orders)
This makes it really easy to get a current value, queue orders, and build more complex orders, and it maps easily onto data objects with a database behind them.
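A hedged outline of that separation (plain dataclasses for illustration; a database-backed version would map each onto a table):

from dataclasses import dataclass, field
from datetime import date

@dataclass
class Order:
    """A simple demand to buy or sell a single symbol at a single time."""
    symbol: str
    action: str  # 'BUY', 'SELL', 'SELLSHORT', ...
    qty: int
    price: float
    placed: date

@dataclass
class Trade:
    """A collection of orders."""
    orders: list = field(default_factory=list)

@dataclass
class Holding:
    """What you currently own or owe of one symbol."""
    symbol: str
    qty: int = 0       # net position, updated as orders execute
    last: float = 0.0  # most recent price

    @property
    def market_value(self):
        return self.qty * self.last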
To answer your question: You appear to have a fairly clear idea of your data model already. But it looks to me like you need to think more about what you want this program to do. Will it keep track of changes in stock prices? Place orders, or suggest orders to be placed? Or will it simply keep track of the orders you've placed? Each of these uses may call for different strategies.
That said, I don't see why you would ever need to have an object for every share; I don't understand the reasoning behind that strategy. Even if you want to be able to track your order history in great detail, you could just store aggregate data, as in "x shares at y dollars per share, on date z".
It would make more sense to have a position object (or holding object, in Hugh's terminology) -- one per stock, perhaps with an .order_history attribute, if you really need a detailed history of your holdings in that stock. And yes, a database would definitely be useful for this kind of thing.
To wax philosophical for a moment: I think perhaps you're taking the "object" metaphor too literally, and so are trying to make a share, which seems very object-like in some ways, into an object in the programming sense of the word. If so, that's a mistake, which is what I take to be juxtapose's point.
I disagree with him that object oriented design is flawed -- that's a pretty bold pronouncement! -- but his answer is right insofar as an "object" (a.k.a. a class instance) is almost identical to a module**. It's a collection of related functions linked to some shared state. In a class instance, the state is shared via self or this, while in a module, it's shared through the global namespace.
For most purposes, the only major difference between a class instance and a module is that there can be many class instances, each one with its own independent state, while there can be only one module instance. (There are other differences, of course, but most of the time they involve technical matters that aren't very important for learning OOD.) That means that you can think about objects in a way similar to the way you think about modules, and that's a useful approach here.
**In many compiled languages, the file that results when you compile a module is called an "object" file. I think that's where the "object" metaphor actually comes from. (I don't have any real evidence of that! So anyone who knows better, feel free to correct me.) The ubiquitous toy examples of OOD that one sees -- car.drive(mph=50); car.stop(); car.refuel(unleaded, regular) -- I believe are back-formations that can confuse the concept a bit.
I would love to hear what you came up with. I am ~4 months (part time) into creating an Order Handler, and although it's mostly complete, I still have the same questions as you, as I'd like it to be made properly.
Currently, I save two files:
A "Strategy Hit Log", where each buy/sell signal that comes from any strategy script is saved. For example: when the buy_at_yesterdays_close_price.py strategy is triggered, it saves that buy request in this file and passes the request to the Order Handler.
An "Order Log", which is a single DataFrame - this file serves the purpose you were focusing on.
Each request from a strategy pertains to a single underlying security (for example, AAPL stock) and creates an Order, which is saved as a row in the DataFrame, with columns for the Ticker and the Strategy name that spawned the Order, as well as Suborder and Broker Suborder columns (explained below).
Each Order has a list of Suborders (dicts) stored in the Suborder column. For example: if you are bullish on AAPL, the suborders could be:
[
    {'security': 'equity', 'quantity': 10},
    {'security': 'option call', 'quantity': 10},
]
Each Order also has a list of Broker Suborders (dicts) stored in the Broker Suborder column. Each Broker Suborder is a request to the Broker to buy or sell a security, and is indexed with the "order ID" that the Broker provides for that request. Every new request to the Broker is a new Broker Suborder, and canceling a Broker Suborder is recorded in its dict. To record modifications to Broker Suborders, you cancel the old Broker Suborder and send and record a new one (it's the same commission using IBKR).
Improvements
List of class instances instead of a DataFrame: I think it'd be much more pythonic to save each Order as an instance of an Order_Class (instead of as a row of a DataFrame), which has Suborder and Broker_Suborder attributes, both of which are also instances of a Suborder_Class and a Broker_Suborder_Class.
My current question is whether saving a list of class instances as my record of all open and closed Orders is pythonic or silly.
Visualization considerations: It seems like Orders should be saved in table form for easier viewing, but maybe it is better to save them in this "list of class instances" form and use a function to tabularize them at viewing time (see the sketch after this list)? Any input would be greatly appreciated. I'd like to move on and start playing with ML, but I don't want to leave the Order Handler unfinished.
Is it all crap?: Should each Broker Suborder (a buy/sell request to a Broker) be attached to an Order (which is just a specific request from a strategy script), or should all Broker Suborders be recorded in chronological order and simply hold a reference to the strategy Order that spawned them? Idk... but I would love everyone's input.
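On the visualization question, one way to keep the list-of-instances record and still get a table at viewing time (a sketch; the class and field names mirror the description above but are assumptions):

from dataclasses import asdict, dataclass, field

import pandas as pd

@dataclass
class BrokerSuborder:
    broker_order_id: str  # the Broker-provided order ID used as the index
    security: str
    quantity: int
    cancelled: bool = False

@dataclass
class Order:
    ticker: str
    strategy: str
    suborders: list = field(default_factory=list)
    broker_suborders: list = field(default_factory=list)

def tabularize(orders):
    """Flatten Orders into a DataFrame only when you want to look at them."""
    return pd.DataFrame([asdict(o) for o in orders])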
