I'm building a directory of hospitals and clinics, and as a specialty field I'd like to store the specialty or type of each clinic or hospital (like Dermatologist, etc.). However, some places, especially big ones, house many different specialties, and since the choices= argument of a CharField doesn't let me select more than one option, I had to think of an alternative.
At first I didn't think it was necessary to create a separate table and add a relation, which is why I tried the django-multiselectfield package; it works just fine, but I was wondering whether it would be better to create a separate table with a relation to the Hospitals model. Once built, that 'type' or 'specialty' table will likely never change in its contents. Is a separate table better performance-wise?
Also, I'm trying to store the model's choices in a separate choices.py file as TextChoices classes, since I'll be using the same choices in various fields of different models across different apps. I know it's generally better to store the choices inside the model class itself, but does that make sense in my case?
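For reference, Django's TextChoices is essentially a str-valued enum, so the shared choices.py idea can be sketched without any Django machinery at all (class and member names here are hypothetical):

```python
# choices.py -- one module, imported by models in any app.
# Django's TextChoices is essentially a str-valued Enum whose members
# carry a (value, label) pair; this stdlib sketch mirrors that shape.
from enum import Enum

class Specialty(str, Enum):
    DERMATOLOGY = "DERM"
    CARDIOLOGY = "CARD"
    PEDIATRICS = "PED"

    @classmethod
    def choices(cls):
        # The (value, label) list a CharField's choices= argument expects.
        return [(m.value, m.name.title()) for m in cls]
```

With a real TextChoices class the usage is the same: import it from choices.py and pass `choices=Specialty.choices` to every field that needs it, in any app.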
Performance is probably not the primary concern here; I think the difference between the two approaches would be negligible. Whether one or more than one model would use the same set of choices doesn't lean one way or another; either a fixed list or many-to-many relation could accommodate that.
Although you say that the selections aren't expected to change (an argument in favor of a hard-coded list of choices), medical specialties are a kind of data that do change in the long run. Contrast this with, say, months of the year or days of the week, which are a lot less likely to change.
That said, if you already have a multi-select field working, I'd be inclined to leave it alone until there's a compelling reason to change it.
For the second part, I see no issue with storing the choices list in a separate .py file.
I've done that strictly to keep my models.py looking somewhat tidy: I don't want to scroll past 150 choices to double-check a model method.
The first part is all about taste. I'd personally go the relation + many-to-many route.
I always plan for edge cases, so "likely won't change" reads to me as "so there's a possibility it will."
I also like that the relation + many-to-many route has no third-party dependency; it's a core Django feature, pretty rock solid and future-proof.
An added benefit of making it another table is that a non-technical person could potentially add new options, so in theory you're not spending your time constantly changing it.
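If you do go the separate-table route, a minimal sketch of the models (all names here are hypothetical, not from the original post):

```python
# models.py -- sketch only; assumes a small lookup table for specialties.
from django.db import models

class Specialty(models.Model):
    name = models.CharField(max_length=100, unique=True)

    def __str__(self):
        return self.name

class Hospital(models.Model):
    name = models.CharField(max_length=200)
    # A hospital can offer many specialties, and each specialty
    # can be offered by many hospitals.
    specialties = models.ManyToManyField(Specialty, related_name="hospitals")
```

The fixed list can be loaded once via a data migration or fixture; after that, new rows can be added without touching code.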
I know there are similar questions (like this, this, this and this), but I have specific requirements and am looking for a less expensive way to do the following (on Django 1.10.2):
I'm looking to avoid sequential/guessable integer ids in the URLs, and ideally to meet the following requirements:
Avoid UUIDs since that makes the URL really long.
Avoid a custom primary key. It doesn’t seem to work well if the models have ManyToManyFields. Got affected by at least three bugs while trying that (#25012, #24030 and #22997), including messing up the migrations and having to delete the entire db and recreate the migrations (well, lots of good learning too)
Avoid checking for collisions if possible (hence avoid a db lookup for every insert)
Don’t just want to look up by the slug since it’s less performant than just looking up an integer id.
Don’t care too much about encrypting the id - just don’t want it to be a visibly sequential integer.
Note: The app would likely have 5 million records or so in the long term.
After researching a lot of options on SO, blogs etc., I ended up doing the following:
Encoding the id to base32 only for the URLs and decoding it back in urls.py (using an edited version of Django’s base36 util functions, since I needed uppercase letters instead of lowercase).
Not storing the encoded id anywhere; just encoding and decoding on the fly, every time.
Keeping the default id intact and using it as primary key.
(good hints, posts and especially this comment helped a lot)
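A minimal sketch of such on-the-fly encode/decode helpers (uppercase base32, in the spirit of Django's base36 utils; function names are hypothetical):

```python
# Encode/decode an integer pk for use in URLs -- nothing extra is stored.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUV"  # 32 chars, uppercase

def encode_id(n: int) -> str:
    """Integer id -> short uppercase base32 string for the URL."""
    if n < 0:
        raise ValueError("id must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 32)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def decode_id(s: str) -> int:
    """URL string back to the integer id the view can look up."""
    n = 0
    for ch in s:
        n = n * 32 + ALPHABET.index(ch)
    return n
```

The URL pattern decodes the string back to the pk before the view runs, so the actual lookup still happens on the integer primary key.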
What this solution helps achieve:
Absolutely no edits to models or post_save signals.
No collision checks needed. Avoiding one extra request to the db.
Lookup still happens on the default id, which is fast. Also, no double save() calls on the model for every insert.
Short and sweet encoded ID (the number of characters goes up as the number of records increases, but it still isn't very long)
What it doesn’t help achieve/any drawbacks:
Encryption - the ID is encoded but not encrypted, so the user may still be able to figure out the pattern to get to the id (but I don't care about that much, as mentioned above).
A tiny overhead of encoding and decoding on each URL construction/request but perhaps that’s better than collision checks and/or multiple save() calls on the model object for insertions.
For reference, looks like there are multiple ways to generate random IDs that I discovered along the way (like Django’s get_random_string, Python’s random, Django’s UUIDField etc.) and many ways to encode the current ID (base 36, base 62, XORing, and what not).
The encoded ID can also be stored as another (indexed) field and looked up every time (like here), but that depends on the performance parameters of the web app (since looking up a varchar id is less performant than looking up an integer id). This identifier field can be saved either from an overridden model save() function or by using a post_save() signal (see here); both approaches need save() to be called twice for every insert.
All ears for optimizations to the above approach. I love SO and the community; every time there's so much to learn here.
Update: More than a year after this post, I found this great library called hashids, which does pretty much the same thing quite well! It's available in many languages, including Python.
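A minimal hashids sketch (the salt value here is a made-up example; requires `pip install hashids`):

```python
from hashids import Hashids

# The same salt must be used for encoding and decoding.
hashids = Hashids(salt="some secret salt", min_length=6)

encoded = hashids.encode(12345)    # a short, non-sequential-looking string
decoded = hashids.decode(encoded)  # a tuple containing the original id
```

As with the base32 approach, nothing extra is stored; the id is translated on the fly in both directions.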
I may be guilty of thinking in too object-oriented a manner here, but I keep coming back to a certain pattern which doesn't look SQL-friendly at all.
Simplified description:-
I have an Attendance table, with various attributes including clockin, lunchout, lunchin, and clockout (all of type Time), an employee ID, and a date.
Different employees can have different shift hours, which depends on the day of the week and day of the month. The employee's categories are summarized in an Employees table, the shift hours etc. are summarized in an Hours table. A Leave table specifies half/full day leave taken.
I have raw check-in times from another source (third party; basically consider this a CSV) which I sort and then need to insert into the Attendance table properly, by which I imagine I should be doing:-
a) Take the earliest time as clockin
b) Take the latest time as clockout
c) Compare remaining times with assigned lunch hours according to an employee's shift for the day
c) is the one giving me problems, as some employees check in at odd times (perhaps they need to run out for an hour to meet a client), and some simply don't check in for meals (eating in, etc.). So I need to fuzzy-match the remaining times with the assigned lunch hours, meaning I have to access the Hours table as well.
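The fuzzy match in (c) can be sketched with plain datetimes; the tolerance value here is a made-up example:

```python
from datetime import timedelta

def match_lunch(times, lunch_out, lunch_in, tolerance=timedelta(minutes=45)):
    """Pick the punches closest to the scheduled lunch window.

    Returns (lunchout, lunchin); either is None when no punch falls
    within the tolerance. A fuller version should also ensure the two
    matches are distinct punches in chronological order.
    """
    def closest(target):
        best = min(times, key=lambda t: abs(t - target), default=None)
        if best is not None and abs(best - target) <= tolerance:
            return best
        return None

    return closest(lunch_out), closest(lunch_in)
```

The scheduled window would come from the Hours table for that employee's shift; anything left unmatched is treated as an ordinary mid-day exit/entry.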
I'm trying to do all of this in a function within the Attendance class, so that it looks something like this:-
class Attendance(Base):
    # ... my attributes ...

    def calc_attendance(self, listOfTimes):
        # Do the necessary
        ...
However, it looks like I'd be required to do more queries within the calc_attendance function. I'm not even sure that's possible, and it certainly doesn't look like something I should be doing.
I could do multiple queries in the calling function, but that seems messier. I've also considered multiple joins so that the requisite information (for each employee per day) is available. Neither feels right; please do correct me, though, if necessary.
In short, yes, you seem to be trying to jam everything into one class, but that is not "wrong"; it's just a question of which class you're jamming things into.
To answer the actual question: yes, you can perform a query inside another class, and/or make that class turn itself into a dict or other iterable format, which makes comparing things easier in some ways.
However, I have found in practice that it is usually more efficient, time-wise, to build a class per table and apply modifications to that table's data using its internal functions, plus an overriding umbrella "job" class that performs the group's specific tasks (comparison, data input, etc.) but encapsulates the process so it can scale up and be used by higher-level classes in the future. This keeps table-specific work in the respective classes, while the group can be compared and fuzzy-matched without cluttering those classes up. It makes the code easier to maintain and makes splitting the work among a team possible. It also allows for a much sharper focus on each task, and in theory allows for asyncio and other optimizations later.
so:
Main Job (class)
(various comparison functions,data input verification, etc...)
|-> table 1 class instance
|-> table 2 class instance
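A bare-bones sketch of that layout (all names hypothetical):

```python
# One thin class per table; table-specific logic lives inside each.
class AttendanceTable:
    def __init__(self, rows):
        self.rows = rows  # e.g. a list of dicts loaded from the DB

class HoursTable:
    def __init__(self, rows):
        self.rows = rows

class AttendanceJob:
    """Umbrella 'job': coordinates the table classes and runs the
    cross-table work (comparison, fuzzy matching, data input)."""

    def __init__(self, attendance, hours):
        self.attendance = attendance
        self.hours = hours

    def run(self):
        # Cross-table comparison goes here; neither table class
        # needs to know the other exists.
        return len(self.attendance.rows), len(self.hours.rows)
```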
Happy to help further if you need actual examples.
I'm a beginner/intermediate Python programmer but I haven't written an application, just scripts. I don't currently use a lot of object oriented design, so I would like this project to help build my OOD skills. The problem is, I don't know where to start from a design perspective (I know how to create the objects and all that stuff). For what it's worth, I'm also self taught, no formal CS education.
I'd like to try writing a program to keep track of a portfolio stock/options positions.
I have a rough idea about what would make good object candidates (Portfolio, Stock, Option, etc.) and methods (Buy, Sell, UpdateData, etc.).
A long position would buy-to-open, and sell-to-close while a short position has a sell-to-open and buy-to-close.
portfolio.PlaceOrder(type="BUY", symbol="ABC", date="01/02/2009", price=50.00, qty=100)
portfolio.PlaceOrder(type="SELL", symbol="ABC", date="12/31/2009", price=100.00, qty=25)
portfolio.PlaceOrder(type="SELLSHORT", symbol="XYZ", date="1/2/2009", price=30.00, qty=50)
portfolio.PlaceOrder(type="BUY", symbol="XYZ", date="2/1/2009", price=10.00, qty=50)
Then, once this method is called, how do I store the information? At first I thought I would have a Position object with attributes like Symbol, OpenDate, OpenPrice, etc., but updating the position to account for sales becomes tricky because buys and sells happen at different times and in different amounts.
Buy 100 shares to open, 1 time, 1 price. Sell 4 different times, 4 different prices.
Buy 100 shares. Sell 1 share per day, for 100 days.
Buy 4 different times, 4 different prices. Sell entire position at 1 time, 1 price.
A possible solution would be to create an object for each share of stock; this way each share would have its own dates and prices. Would this be too much overhead? The portfolio could have thousands or millions of little Share objects. If you wanted to find out the total market value of a position, you'd need something like:
sum([trade.last_price for trade in portfolio.positions if trade.symbol == "ABC"])
If you had a position object the calculation would be simple:
position.last * position.qty
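A middle ground between per-share objects and a single mutable position is a position that aggregates trade lots (all names here are hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class Trade:
    """One aggregate lot: 'qty shares at price on date'."""
    date: str
    qty: int       # positive = buy, negative = sell
    price: float

@dataclass
class Position:
    symbol: str
    trades: list = field(default_factory=list)

    @property
    def qty(self):
        # Net share count across every buy and sell lot.
        return sum(t.qty for t in self.trades)

    def market_value(self, last_price):
        return self.qty * last_price
```

Each of the scenarios above is just a different list of lots, and market value stays a single multiplication.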
Thanks in advance for the help. Looking at other posts it's apparent SO is for "help" not to "write your program for you". I feel that I just need some direction, pointing down the right path.
ADDITIONAL INFO UPON REFLECTION
The Purpose
The program would keep track of all positions, both open and closed; with the ability to see a detailed profit and loss.
When I think about detailed P&L I want to see...
- all the open dates (and closed dates)
- time held
- open price (and close price)
- P&L since open
- P&L per day
@Senderle
I think perhaps you're taking the "object" metaphor too literally, and so are trying to make a share, which seems very object-like in some ways, into an object in the programming sense of the word. If so, that's a mistake, which is what I take to be juxtapose's point.
This was my mistake. Thinking about "objects", a share object seems like a natural candidate. It's only when there may be millions of them that the idea seems crazy. I'll have some free coding time this weekend and will try creating an object with a quantity.
There are two basic precepts you should keep in mind when designing such a system:
Eliminate redundancy from your data. No redundancy ensures integrity.
Keep all the data you need to answer any inquiry, at the lowest level of detail.
Based on these precepts, my suggestion is to maintain a transaction log file. Each transaction represents a change of state of some kind, along with all the pertinent facts about it: when, what, buy/sell, how many, how much, etc. Each transaction would be represented by a record (a namedtuple is useful here) in a flat file. A year's worth (or even 5 or 10 years) of transactions should easily fit in a memory-resident list. You can then create functions to select, sort and summarize whatever information you need from this list; being memory-resident, it will be amazingly fast, much faster than a SQL database.
When and if the transaction log becomes too large or too slow, you can compute the state of all your positions as of a particular date (like year-end), use that as the initial state for the following period, and archive your old log file to disk.
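A minimal sketch of that transaction log (field names are hypothetical):

```python
from collections import namedtuple

# One record per state change, at the lowest level of detail.
Transaction = namedtuple("Transaction", "date symbol action qty price")

log = [
    Transaction("2009-01-02", "ABC", "BUY", 100, 50.00),
    Transaction("2009-12-31", "ABC", "SELL", 25, 100.00),
]

def position(log, symbol):
    """Net share count for one symbol, derived from the log."""
    return sum(t.qty if t.action == "BUY" else -t.qty
               for t in log if t.symbol == symbol)
```

Any other summary (P&L, holdings on a given date, time held) is a similar pass over the same list.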
You may want some auxiliary information about your holdings, such as value/price on any particular date, so you can plot value vs. time for any or all holdings (there are online sources for this type of information, Yahoo Finance for one). A master database containing static information about each of your holdings would also be useful.
I know this doesn't sound very "object oriented", but OO design could be useful for hiding the detailed workings of the system in a TransLog object with methods to save/restore the data to/from disk (save/open methods), enter/change/delete a transaction, and additional methods to process the data into meaningful information displays.
First write the API with a command line interface. When this is working to your satisfaction, then you can go on to creating a GUI front end if you wish.
Good luck and have fun!
Avoid objects. Object oriented design is flawed. Think about your program as a collection of behaviors that operate on data (lists and dictionaries). Then group your related behaviors as functions in a module.
Each function should have clear input and outputs. Store your data globally in each module.
Why do it without objects? Because it maps closer to the problem space. Object oriented programming creates too much indirection to solve a problem. Unnecessary indirection causes software bloat and bugs.
A possible solution would be to create an object for each share of stock, this way each share would have a different dates and prices. Would this be too much overhead? The portfolio could have thousands or millions of little Share objects. If you wanted to find out the total market value of a position you'd need something like:
Yes, it would be too much overhead. The solution here is to store the data in a database. Finding the total market value of a position would be done in SQL, unless you use a NoSQL scheme.
Don't try to design for all possible future outcomes. Just make your program work the way it needs to work now.
I think I'd separate it into
holdings (what you currently own or owe of each symbol)
orders (simple demands to buy or sell a single symbol at a single time)
trades (collections of orders)
This makes it really easy to get a current value, queue orders, and build more complex orders, and maps easily into data objects with a database behind them.
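That separation can be sketched in a few lines (all names hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class Order:
    """A simple demand: one symbol, one time."""
    symbol: str
    qty: int       # signed: positive = buy, negative = sell
    price: float

@dataclass
class Trade:
    """A collection of orders executed together."""
    orders: list = field(default_factory=list)

def holdings(trades):
    """Current net quantity per symbol, derived from all trades."""
    totals = {}
    for trade in trades:
        for o in trade.orders:
            totals[o.symbol] = totals.get(o.symbol, 0) + o.qty
    return totals
```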
To answer your question: You appear to have a fairly clear idea of your data model already. But it looks to me like you need to think more about what you want this program to do. Will it keep track of changes in stock prices? Place orders, or suggest orders to be placed? Or will it simply keep track of the orders you've placed? Each of these uses may call for different strategies.
That said, I don't see why you would ever need to have an object for every share; I don't understand the reasoning behind that strategy. Even if you want to be able to track your order history in great detail, you could just store aggregate data, as in "x shares at y dollars per share, on date z".
It would make more sense to have a position object (or holding object, in Hugh's terminology) -- one per stock, perhaps with an .order_history attribute, if you really need a detailed history of your holdings in that stock. And yes, a database would definitely be useful for this kind of thing.
To wax philosophical for a moment: I think perhaps you're taking the "object" metaphor too literally, and so are trying to make a share, which seems very object-like in some ways, into an object in the programming sense of the word. If so, that's a mistake, which is what I take to be juxtapose's point.
I disagree with him that object oriented design is flawed -- that's a pretty bold pronouncement! -- but his answer is right insofar as an "object" (a.k.a. a class instance) is almost identical to a module**. It's a collection of related functions linked to some shared state. In a class instance, the state is shared via self or this, while in a module, it's shared through the global namespace.
For most purposes, the only major difference between a class instance and a module is that there can be many class instances, each one with its own independent state, while there can be only one module instance. (There are other differences, of course, but most of the time they involve technical matters that aren't very important for learning OOD.) That means that you can think about objects in a way similar to the way you think about modules, and that's a useful approach here.
**In many compiled languages, the file that results when you compile a module is called an "object" file. I think that's where the "object" metaphor actually comes from. (I don't have any real evidence of that! So anyone who knows better, feel free to correct me.) The ubiquitous toy examples of OOD that one sees -- car.drive(mph=50); car.stop(); car.refuel(unleaded, regular) -- I believe are back-formations that can confuse the concept a bit.
I would love to hear what you came up with. I am about 4 months (part time) into creating an Order Handler, and although it's mostly complete, I still have the same questions as you, as I'd like it to be made properly.
Currently, I save two files
A "Strategy Hit Log", where each buy/sell signal that comes from any strategy script is saved. For example: when the buy_at_yesterdays_close_price.py strategy is triggered, it saves that buy request in this file and passes the request to the Order Handler.
An "Order Log" which is a single DataFrame - this file fits the purpose you were focusing on.
Each request from a strategy pertains to a single underlying security (for example, AAPL stock) and creates an Order which is saved as a row in the DataFrame, containing columns for the Ticker, the Strategy name that spawned this Order as well as Suborder and Broker Suborder columns (explained below).
Each Order has a list of Suborders (dicts) stored in the Suborder column. For example: if you are bullish on AAPL, the suborders could be:
[
{'security': 'equity', 'quantity':10},
{'security': 'option call', 'quantity':10}
]
Each Order also has a list of Broker Suborders (dicts) stored in the Broker Suborder column. Each Broker Suborder is a request to the Broker to buy or sell a security, and is indexed by the "order ID" that the Broker provides for that request. Every new request to the Broker is a new Broker Suborder, and canceling a Broker Suborder is recorded in its dict. To record modifications, you cancel the old Broker Suborder and send and record a new one (it's the same commission using IBKR).
Improvements
List of Classes instead of DataFrame: I think it'd be much more pythonic to save each Order as an instance of an Order_Class (instead of a row of a DataFrame), which has Suborder and Broker_Suborder attributes, both of which are also instances of Suborder_Class and Broker_Suborder_Class.
My current question is whether saving a list of classes as my record of all open and closed Orders is pythonic or silly.
Visualization Considerations: It seems like Orders should be saved in table form for easier viewing, but maybe it is better to save them in this "list of class instances" form and use a function to tabularize them at the time of viewing? Any input by anyone would be greatly appreciated. I'd like to move away from this and start playing with ML, but I don't want to leave the Order Handler unfinished.
Is it all crap?: should each Broker Suborder (buy/sell request to a Broker) be attached to an Order (which is just a specific strategy request from a strategy script) or should all Broker Suborders be recorded in chronological order and simply have a reference to the strategy-order (Order) that spawned the Broker Suborder? Idk... but I would love everyone's input.
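For what it's worth, the "list of class instances" version can be sketched with dataclasses (names modeled on the description above; the details are hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class Suborder:
    """One leg of the strategy's request."""
    security: str
    quantity: int

@dataclass
class BrokerSuborder:
    """One request to the broker, keyed by the broker's order ID."""
    broker_id: str
    security: str
    quantity: int
    cancelled: bool = False

@dataclass
class Order:
    """One strategy-level request, with its legs and broker requests."""
    ticker: str
    strategy: str
    suborders: list = field(default_factory=list)
    broker_suborders: list = field(default_factory=list)

orders = [
    Order("AAPL", "buy_at_yesterdays_close_price",
          suborders=[Suborder("equity", 10), Suborder("option call", 10)]),
]
```

Keeping the canonical record as a list of such instances, and building a DataFrame from their fields only at viewing time, is a perfectly reasonable split.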
I'm really new to database object design, so please forgive any weirdness in my question. Basically, I am using Google App Engine (Python) and constructing an object to track user info. One of these pieces of data is 40 achievement scores. Do I make a list of ints in the User object for this? Or do I make a separate entity with my user id, the achievement index (0-39) and the score, and then do a query to grab these 40 items every time I want the user data in total?
The latter approach seems more object oriented to me, and certainly better if I extend it to have more than just scores for these 40 achievements. However, considering that I might not extend it, should I even consider just doing a simple list of 40 ints in my user data? I would then forgo doing a query, getting the sorted list of achievements, reading the score from each one just to process a response etc.
Is the latter approach just such common practice that it's hand-waved through as not even worth batting an eyelash at, in terms of thinking it might be more costly or complex processing-wise?
I like the simple idea of keeping the list of 40 ints, but you can't force-feed it into App Engine's existing User class, whose layout is determined by the GAE API (and doesn't include those 40 ints). So that list will inevitably need to live in a separate entity (i.e., an instance of a separate model).
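Ignoring GAE specifics (there, the embedded list would typically be a `db.ListProperty(int)` on that separate entity), the two layouts compare like this in plain Python (all names hypothetical):

```python
# Layout 1: scores embedded in the user record -- one read gets everything.
user_info = {"user_id": "u1", "scores": [0] * 40}
user_info["scores"][7] = 120

# Layout 2: one row per achievement -- a query gathers the 40 rows.
score_rows = [{"user_id": "u1", "achievement": i, "score": 0}
              for i in range(40)]
score_rows[7]["score"] = 120

def scores_for(rows, user_id):
    """Rebuild the ordered score list from the per-achievement rows."""
    return [r["score"]
            for r in sorted(rows, key=lambda r: r["achievement"])
            if r["user_id"] == user_id]
```

Layout 2 costs a query plus a re-assembly step on every read, which is exactly the overhead the question weighs against its extensibility.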