Classification of a user population by access rights - Python

What I am trying to do is to classify the employees by the roles they have
in an organization. This is computed by grabbing all the permissions, or
access lists, they have for the target enterprise software.
There are potentially tens of thousands of users, with dozens of permissions per user.
Edit: when there are large numbers of users, the vast majority will have a limited set of permissions. For example, they might all have Employee only. The more complicated cases are power users, and there will be far fewer of those.
Also, don't be misled by the permission names I have given, like Acct1/Acct2; they're just meant to give a feel for the domain. The solution I am looking for should conceptually work even with randomly assigned primary-key integers like you see in many ORM stores - there is no implied relationship between permissions.
import pprint

pp = pprint.PrettyPrinter(indent=4)

def classify(employees):
    """Employees assigned the same set of permissions are grouped together."""
    roles = dict()
    for user, permissions in employees.items():
        # a sorted tuple of permissions is hashable, so it can key the dict
        key = tuple(sorted(permissions))
        members = roles.setdefault(key, set())
        members.add(user)
    return roles
everyone = {
    "Jim": set(["Employee", "Acct1", "Manager"]),
    "Marion": set(["Employee", "Acct1", "Acct2"]),
    "Omar": set(["Employee", "Acct1"]),
    "Kim": set(["Employee", "Acct1"]),
    "Tyler": set(["Employee", "Acct1"]),
    "Susan": set(["Employee", "Marketing", "Manager"]),
}
result = classify(everyone)
print("pass1")
pp.pprint(result)
At this point, the classification system returns the following:
{ ('Acct1', 'Acct2', 'Employee'): set(['Marion']),
('Acct1', 'Employee'): set(['Kim', 'Omar', 'Tyler']),
('Acct1', 'Employee', 'Manager'): set(['Jim']),
('Employee', 'Manager', 'Marketing'): set(['Susan'])}
From this, we can eyeball the data and manually assign some meaningful names to those roles.
Senior Accountants - Marion
Accounting Managers - Jim
Accountants - Kim, Omar, Tyler
Marketing Manager - Susan
The assignment is manual, but the intent is that it remains as "sticky" as possible even when people get hired or leave and when permissions change.
Let's do a second pass.
Someone's decided to rename Acct2 to SrAcct. People get hired, and Kim leaves.
This is represented by the following employee permissions:
everyone2 = {
    "Jim": set(["Employee", "Acct1", "Manager"]),
    "Marion": set(["Employee", "Acct1", "SrAcct"]),
    "Omar": set(["Employee", "Acct1"]),
    "Tyler": set(["Employee", "Acct1"]),
    "Milton": set(["Employee", "JuniorAcct"]),
    "Susan": set(["Employee", "Marketing", "Manager"]),
    "Tim": set(["Employee", "Marketing"]),
}
The output this time is:
{ ('Acct1', 'Employee'): set(['Omar', 'Tyler']),
('Acct1', 'Employee', 'Manager'): set(['Jim']),
('Acct1', 'Employee', 'SrAcct'): set(['Marion']),
('Employee', 'JuniorAcct'): set(['Milton']),
('Employee', 'Manager', 'Marketing'): set(['Susan']),
('Employee', 'Marketing'): set(['Tim'])}
Ideally, we'd recognize that
Senior Accountants - Marion
Accounting Managers - Jim
Accountants - Omar, Tyler
Marketing Manager - Susan
new role - Tim
new role - Milton
Tim's role would then be named Marketer, and Milton's Junior Accountant.
What's important is that the role name assignment is stable enough to allow reasoning about an employee population even as people get hired and leave (most frequent) and as permissions are added or renamed (much less frequent). It's OK to ask the end user from time to time to assign new role names or to decide between ties. But most of the time, it should run along smoothly. What it shouldn't do is guess wrong and erroneously label a set of users with the wrong role name.
The problem I have is that it is easy to eyeball, but both the set of permissions and the set of users that define a role can change. Classification time is important, but the value of this classification mechanism goes up as the number of users and permissions increase.
I've tried extracting "the subset of permissions that define a role". For example, Employee is assigned to everyone, so it can be ignored, while (Manager, Acct1) and (Manager, Marketing) uniquely belong to Jim and Susan. The trouble is that this runs into a combinatorial explosion once you get the easy 20-30% of the cases out of the way, and it never finishes.
What I'm thinking now is to compute the new employee-permission role classification for each generation and then backtrack, with a fuzzy "best fit" match against the previous generation. Pick the matches that are reasonably unambiguous, and ask the user to decide ties and to assign new role names as needed.
For example, an exact match on permissions and a reasonable match on employees means that 'Omar', 'Tyler' are still Accountants at pass 2. On the other hand, if Marion had left and I had "Jane": set(["Employee","Acct1","SrAcct"]), I'd have to ask the end user to arbitrate and identify her as a Senior Accountant.
I've worked with Jaccard Similarity (https://en.wikipedia.org/wiki/Jaccard_index) in the past, but I am unsure how it applies to cases where both sides can change (Acct2 => SrAcct as well as employee changes).
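For reference, a minimal sketch of the Jaccard index on plain Python sets (the helper name is mine):

def jaccard(a, b):
    # |A intersect B| / |A union B|, ranges from 0.0 to 1.0
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return float(len(a & b)) / len(a | b)

# pass-1 Accountants vs pass-2 Accountants share 2 of 3 users:
print(jaccard({"Kim", "Omar", "Tyler"}, {"Omar", "Tyler"}))  # 0.666...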
I am pretty sure this kind of logic has been needed before, so I'm hoping for recommendations for algorithms to look at and strategies to follow.
Oh, and I am looking for reasonably stand-alone approaches that I can implement, and reason about, within the context of a larger Python app. Not for machine-learning recommendations about how to configure the likes of TensorFlow to do this for me. Though, if push came to shove, I could call a batch to do the matching.

This will be a so-so answer, so apologies, but your problem is very wide and requires some logic rather than some specific code.
Perhaps this problem would be better addressed with "tags"? I mean, a person could be an employee, a guy in marketing, and a manager, all at the same time (and I presume would have the permissions of all 3).
So I suggest a different approach - instead of grouping accounts by their respective permissions, and only then naming them manually, first classify and name the permissions (at least the more popular and stable among them) and then assign each employee to the correct category (or several) by giving each employee tags that encapsulate multiple permissions each.
Then, you will have quite a few users or permissions unclassified, but hopefully then you can ask users to do a bit of classification for you (for example, describe their position/permissions) and work with your approach on a much smaller problem set.
That way you can be sure that when a new employee arrives, he is given the proper tags by looking at his permissions and deciding where he fits in. And when an employee leaves, it makes no difference, because he doesn't individually affect the permissions and tags.
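As a minimal sketch of the tag idea, assuming a hand-maintained rule table (the rules below are invented for illustration):

TAG_RULES = {
    "accounting": {"Acct1"},
    "management": {"Manager"},
    "marketing": {"Marketing"},
}

def tags_for(permissions):
    # a tag applies when the user holds all of its required permissions
    return {tag for tag, required in TAG_RULES.items()
            if required <= set(permissions)}

print(tags_for({"Employee", "Acct1", "Manager"}))
# {'accounting', 'management'}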

What you're really creating here is a single tree of organizational hierarchy. Your grouping algorithm is already capable of that. You're not showing them within a single hierarchy, but they could easily be displayed that way.
The "subjective" part of your organization is deciding when it is appropriate to combine branches into a single organizational role, and deciding in which order to sort the permissions when creating the branches (i.e. do you want to have a single manager branch, with divisions below that, or do you want to have department branches, each containing a manager branch).
Unfortunately, there's no way for a machine to know those preferences. You're going to have to make all those decisions, especially if you're going to require a 0% false positive rate.
The easiest way I can think of to provide this preference information to the algorithm would be to give it an ordered list of permission "weights" it will use when building the hierarchy. For a first pass, you could just order them by how many people have each permission. It's possible that you might need more complex weighting than a single ordered list of permissions; for that, you would specify rules that check membership (or non-membership) in multiple permission sets.
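A sketch of that first pass, reusing the question's everyone dict (the function name is mine):

from collections import Counter

def permission_weights(employees):
    # weight = how many employees hold the permission
    counts = Counter()
    for permissions in employees.values():
        counts.update(permissions)
    return counts

print(permission_weights(everyone).most_common())
# [('Employee', 6), ('Acct1', 5), ('Manager', 2), ...]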
The second bit of information would likely be provided interactively. Given a display of the entire organizational chart, you would choose which permission sets should be combined into a single organizational set. This is where you would also assign display names for your roles to each permission set group(s).
As far as being able to respond to hires/fires, it shouldn't be a problem so long as the permissions are the same. As far as adding and removing permissions from users, you would have to store previous permissions and groupings and match them against current permissions for each user to prompt someone to either okay the change to the role permission set, or to form a new branch with the new permission.

This is what I ended up doing:
before calculating the classification for a new set of user/access data, save the old classifications, along with their assigned names.
after the new classifications are calculated, find the closest match between the new and the old and transfer the names if the confidence is high enough.
full user match? Then it's a match. I transform the user set into a sorted tuple of users to match via a dictionary.
full permissions match? Again, it's a match. Again, checked via a set-to-sorted-tuple transform lookup against a dictionary.
For each current classification left unmatched, I calculate a Jaccard similarity against each unmatched previous one, separately on its users and on its permissions. That can go O(N²) on the number of unmatched. Append each candidate to that classification's list of matches, sort the list in order of score (from the calc function below) and, as a last step, only pick one automatically if there is a large enough difference with the next closest match.
class Match(object):
    # weighing coefficients - roles/permissions are considered more
    # important than users because of the expected user churn
    con_roles = .7
    con_users = .3
    con_other = .07
    threshold = .7

    def __init__(self, simroles, simusers):
        # Jaccard similarities computed on the permission and user sides
        self.simroles = simroles
        self.simusers = simusers

    def calc(self):
        # could have anything you want here, really
        self.similarity = (self.con_roles * self.simroles
                           + self.con_users * self.simusers)
OK, I am leaving a lot out, but basically you can apply a simple Jaccard similarity algorithm to both the user and role sides and put those numbers into a suitable equation to see what's a close match. If not satisfied, ask the user to assign the names again, as a last resort.
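To make the shape of the whole pass concrete, here is a compressed sketch. It assumes the jaccard helper sketched under the question, collapses the two exact-match passes into the permissions one, and every name in it is mine, not production code:

def match_generations(previous, current, threshold=.7, margin=.1):
    # previous: {role_name: (users, permissions)} from the last generation
    # current: {key: (users, permissions)} freshly classified
    names = {}
    unmatched = dict(current)
    # pass 1: exact permission-set match via a sorted-tuple lookup
    by_perms = {tuple(sorted(p)): name for name, (u, p) in previous.items()}
    for key, (users, perms) in list(unmatched.items()):
        name = by_perms.get(tuple(sorted(perms)))
        if name is not None:
            names[key] = name
            del unmatched[key]
    # pass 2: weighted Jaccard on both sides; accept only clear winners
    for key, (users, perms) in unmatched.items():
        scores = sorted(
            ((.7 * jaccard(perms, p) + .3 * jaccard(users, u), name)
             for name, (u, p) in previous.items()), reverse=True)
        if scores and scores[0][0] >= threshold and (
                len(scores) == 1 or scores[0][0] - scores[1][0] >= margin):
            names[key] = scores[0][1]
        # otherwise leave it unnamed and ask the end user to arbitrate
    return names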
Hopefully that'll help someone if they end up looking for something similar.

Related

MultiSelectField vs separate model

I'm building a directory of hospitals and clinics, and as a speciality field I'd like to store the speciality or type of clinic or hospital (like Dermatologist, etc.). However, some places, especially big ones, have many different specialities under one roof, and since the choices= option of a CharField doesn't allow me to select more than one option, I had to think of an alternative.
At first I didn't think it was necessary to create a different table and add a relation, which is why I tried the django-multiselectfield package, and it works just fine. But I was wondering if it would be better to create a different table and give it a relation to the Hospitals model. That 'type' or 'speciality' table, once built, likely won't ever change in its contents. Is it better to build a different table performance-wise?
Also, I'm trying to store the choices of the model in a separate choices.py file with TextChoices classes, as I will be using the same choices in various fields of different models across different apps. I know it is generally better to store the choices inside the model class itself, but does that make sense in my case?
Performance is probably not the primary concern here; I think the difference between the two approaches would be negligible. Whether one or more than one model would use the same set of choices doesn't lean one way or another; either a fixed list or many-to-many relation could accommodate that.
Although you say that the selections aren't expected to change (an argument in favor of a hard-coded list of choices), medical specialties are a kind of data that do change in the long run. Contrast this with, say, months of the year or days of the week, which are a lot less likely to change.
That said, if you already have a multi-select field working, I'd be inclined to leave it alone until there's a compelling reason to change it.
For that 2nd part, I see no issue with storing the choice list in another .py file.
I've done that strictly to keep my models.py looking somewhat pretty - I don't want to scroll past 150 choices to double-check a model method.
The 1st part is all about taste. I'd personally go the Relation + Many-To-Many route.
I always plan for edge cases, so "likely won't change" = "so there's a possibility".
Also, I like that the Relation + Many-To-Many route doesn't add a dependency; it's a core Django feature, pretty rock solid and future proof.
An added benefit of making it another table is that a non-technical person could potentially add new options, so in theory you're not spending your time constantly changing it.
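For concreteness, a sketch of the Many-To-Many route (model and field names are made up for illustration):

from django.db import models

class Speciality(models.Model):
    name = models.CharField(max_length=100, unique=True)

    def __str__(self):
        return self.name

class Hospital(models.Model):
    name = models.CharField(max_length=200)
    # several specialities per hospital, and non-technical staff can add
    # new Speciality rows without a code change
    specialities = models.ManyToManyField(Speciality, related_name="hospitals")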

Django - short non-linear non-predictable ID in the URL

I know there are similar questions (like this, this, this and this) but I have specific requirements and looking for a less-expensive way to do the following (on Django 1.10.2):
Looking to not have sequential/guessable integer ids in the URLs and ideally meet the following requirements:
Avoid UUIDs since that makes the URL really long.
Avoid a custom primary key. It doesn’t seem to work well if the models have ManyToManyFields. Got affected by at least three bugs while trying that (#25012, #24030 and #22997), including messing up the migrations and having to delete the entire db and recreate the migrations (well, lots of good learning too)
Avoid checking for collisions if possible (hence avoid a db lookup for every insert)
Don’t just want to look up by the slug since it’s less performant than just looking up an integer id.
Don’t care too much about encrypting the id - just don’t want it to be a visibly sequential integer.
Note: The app would likely have 5 million records or so in the long term.
After researching a lot of options on SO, blogs etc., I ended up doing the following:
Encoding the id to base 32 only for the URLs and decoding it back in urls.py (using an edited version of Django’s base 36 util functions, since I needed uppercase letters instead of lowercase).
Not storing the encoded id anywhere. Just encoding and decoding every time, on the fly.
Keeping the default id intact and using it as primary key.
(good hints, posts and especially this comment helped a lot)
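A rough sketch of the encode/decode pair (the alphabet is illustrative; Django's int_to_base36/base36_to_int in django.utils.http are the functions being adapted):

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUV"  # 32 symbols, uppercase

def encode_id(n):
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, 32)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

def decode_id(s):
    n = 0
    for ch in s:
        n = n * 32 + ALPHABET.index(ch)
    return n

assert decode_id(encode_id(5000000)) == 5000000  # round-trips cleanly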
What this solution helps achieve:
Absolutely no edits to models or post_save signals.
No collision checks needed. Avoiding one extra request to the db.
Lookup still happens on the default id, which is fast. Also, no double save() requests on the model for every insert.
Short and sweet encoded ID (the number of characters goes up as the number of records increases, but it still stays short)
What it doesn’t help achieve/any drawbacks:
Encryption - the ID is encoded but not encrypted, so the user may still be able to figure out the pattern to get to the id (but I don't care about that much, as mentioned above).
A tiny overhead of encoding and decoding on each URL construction/request but perhaps that’s better than collision checks and/or multiple save() calls on the model object for insertions.
For reference, looks like there are multiple ways to generate random IDs that I discovered along the way (like Django’s get_random_string, Python’s random, Django’s UUIDField etc.) and many ways to encode the current ID (base 36, base 62, XORing, and what not).
The encoded ID can also be stored as another (indexed) field and looked up every time (like here), but that depends on the performance parameters of the web app (since looking up a varchar id is less performant than looking up an integer id). This identifier field can either be saved from an overridden model save() method or by using a post_save() signal (see here) (both approaches need save() to be called twice for every insert).
All ears for optimizations to the above approach. I love SO and the community. Every time, there’s so much to learn here.
Update: More than a year after this post, I found this great library called hashids, which does pretty much the same thing quite well! It's available in many languages, including Python.

Can a function in an sqlalchemy class perform a query?

I may be guilty of thinking in too object-oriented a manner here, but I keep coming back to a certain pattern which doesn't look SQL-friendly at all.
Simplified description:-
I have an Attendance table, with various attributes including clockin, lunchout, lunchin, and clockout (all of type Time), an employee ID, and a date.
Different employees can have different shift hours, which depends on the day of the week and day of the month. The employee's categories are summarized in an Employees table, the shift hours etc. are summarized in an Hours table. A Leave table specifies half/full day leave taken.
I have raw check-in times from another source (third party; basically, consider this a CSV) which I sort and then need to insert into the Attendance table properly, by which I imagine I should be doing:
a) Take the earliest time as clockin
b) Take the latest time as clockout
c) Compare remaining times with assigned lunch hours according to an employee's shift for the day
c) is the one giving me problems, as some employees check in at odd times (perhaps they need to run out for an hour to meet a client), and some simply don't check in for meals (eating in, etc.). So I need to fuzzy-match the remaining times against the assigned lunch hours, meaning I have to access the Hours table as well.
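As a sketch of what c) could look like in isolation, with hypothetical lunch-window parameters (the Hours lookup that produces them is left out):

from datetime import time

def pick_lunch_times(times, lunch_start, lunch_end, tolerance=45):
    # earliest punch is clock-in, latest is clock-out; among the rest, take
    # the punch closest to each end of the assigned lunch window, but only
    # if it falls within the tolerance (in minutes)
    times = sorted(times)
    clockin, clockout = times[0], times[-1]
    middle = times[1:-1]

    def minutes(t):
        return t.hour * 60 + t.minute

    def closest(target):
        near = [t for t in middle
                if abs(minutes(t) - minutes(target)) <= tolerance]
        return min(near, key=lambda t: abs(minutes(t) - minutes(target)),
                   default=None)

    return clockin, closest(lunch_start), closest(lunch_end), clockout

print(pick_lunch_times(
    [time(8, 55), time(12, 2), time(13, 8), time(18, 10)],
    lunch_start=time(12, 0), lunch_end=time(13, 0)))
# clock-in 8:55, lunch-out 12:02, lunch-in 13:08, clock-out 18:10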
I'm trying to do all of this in a function within the Attendance class, so that it looks something like this:-
class Attendance(Base):
    # ... my attributes ...

    def calc_attendance(self, listOfTimes):
        pass  # Do the necessary
However, it looks like I'd be required to do more queries within the calc_attendance function. I'm not even sure that's possible, and it certainly doesn't look like something I should be doing.
I could do multiple queries in the calling function, but that seems more messy. I've also considered multiple joins so the requisite information (for each employee per day) is available. Neither of them feel right, please do correct me though, if necessary.
In short: yes, you seem to be trying to jam everything into one class, but that is not "wrong"; it's just a question of which class you're jamming things into.
To answer the actual question: yes, you can perform a query inside another class, and/or make that class turn itself into a dict or another iterable format, which makes comparing things easier in some ways.
However, I have found in practice that it is usually more time-efficient to build a class per table, applying modifications to that table's data through its own methods, plus an overarching umbrella "job" class that performs the group-level tasks (comparison, data input verification, etc.) while encapsulating the process so it can scale up and be reused by higher-level classes in the future. Table-specific work is done in the respective classes, but the group can be compared and fuzzy-matched without cluttering up the table classes. This makes the code easier to maintain and makes splitting the work among a team possible. It also allows for a much sharper focus on each task, and in theory allows for asyncio and other optimizations later.
so:
Main Job (class)
    (various comparison functions, data input verification, etc...)
    |-> table 1 class instance
    |-> table 2 class instance
Happy to help further if you need actual examples.
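For example, a minimal sketch of a query inside the mapped class itself. It leans on sqlalchemy.orm.object_session to reuse the session the row was loaded from; Base, the Hours model, and the remaining columns are assumed from the question:

from sqlalchemy import Column, Date, Integer
from sqlalchemy.orm import object_session

class Attendance(Base):
    __tablename__ = "attendance"
    id = Column(Integer, primary_key=True)
    employee_id = Column(Integer)
    date = Column(Date)
    # ... clockin, lunchout, lunchin, clockout columns ...

    def calc_attendance(self, list_of_times):
        # query through the session this instance is already attached to
        session = object_session(self)
        hours = (session.query(Hours)
                 .filter_by(employee_id=self.employee_id, date=self.date)
                 .first())
        # ... fuzzy-match list_of_times against hours.lunch_start, etc. ...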

Object oriented design for an investment/stock and options portfolio in Python

I'm a beginner/intermediate Python programmer but I haven't written an application, just scripts. I don't currently use a lot of object oriented design, so I would like this project to help build my OOD skills. The problem is, I don't know where to start from a design perspective (I know how to create the objects and all that stuff). For what it's worth, I'm also self taught, no formal CS education.
I'd like to try writing a program to keep track of a portfolio of stock/options positions.
I have a rough idea about what would make good object candidates (Portfolio, Stock, Option, etc.) and methods (Buy, Sell, UpdateData, etc.).
A long position would buy-to-open, and sell-to-close while a short position has a sell-to-open and buy-to-close.
portfolio.PlaceOrder(type="BUY", symbol="ABC", date="01/02/2009", price=50.00, qty=100)
portfolio.PlaceOrder(type="SELL", symbol="ABC", date="12/31/2009", price=100.00, qty=25)
portfolio.PlaceOrder(type="SELLSHORT", symbol="XYZ", date="1/2/2009", price=30.00, qty=50)
portfolio.PlaceOrder(type="BUY", symbol="XYZ", date="2/1/2009", price=10.00, qty=50)
Then, once this method is called, how do I store the information? At first I thought I would have a Position object with attributes like Symbol, OpenDate, OpenPrice, etc., but updating the position to account for sales becomes tricky because buys and sells happen at different times and in different amounts.
Buy 100 shares to open, 1 time, 1 price. Sell 4 different times, 4 different prices.
Buy 100 shares. Sell 1 share per day, for 100 days.
Buy 4 different times, 4 different prices. Sell entire position at 1 time, 1 price.
A possible solution would be to create an object for each share of stock; that way each share would carry its own date and price. Would this be too much overhead? The portfolio could have thousands or millions of little Share objects. If you wanted to find out the total market value of a position, you'd need something like:
sum([trade.last_price for trade in portfolio.positions if trade.symbol == "ABC"])
If you had a position object the calculation would be simple:
position.last * position.qty
Thanks in advance for the help. Looking at other posts, it's apparent SO is for "help", not "write your program for you". I feel that I just need some direction, a pointer down the right path.
ADDITIONAL INFO UPON REFLECTION
The Purpose
The program would keep track of all positions, both open and closed; with the ability to see a detailed profit and loss.
When I think about detailed P&L I want to see...
- all the open dates (and closed dates)
- time held
- open price (and close price)
- P&L since open
- P&L per day
@Senderle
I think perhaps you're taking the "object" metaphor too literally, and so are trying to make a share, which seems very object-like in some ways, into an object in the programming sense of the word. If so, that's a mistake, which is what I take to be juxtapose's point.
This is my mistake. Thinking about "objects", a share object seems a natural candidate. It's only when there may be millions of them that the idea seems crazy. I'll have some free coding time this weekend and will try creating an object with a quantity.
There are two basic precepts you should keep in mind when designing such a system:
Eliminate redundancy from your data. No redundancy ensures integrity.
Keep all the data you need to answer any inquiry, at the lowest level of detail.
Based on these precepts, my suggestion is to maintain a Transaction Log file. Each transaction represents a change of state of some kind, along with all the pertinent facts about it: when, what, buy/sell, how many, how much, etc. Each transaction would be represented by a record (a namedtuple is useful here) in a flat file. A year's worth (or even 5 or 10 years) of transactions should easily fit in a memory-resident list. You can then create functions to select, sort and summarize whatever information you need from this list, and being memory-resident, it will be amazingly fast, much faster than a SQL database.
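A bare-bones sketch of that log (field names are illustrative):

from collections import namedtuple

Transaction = namedtuple("Transaction", "date symbol action qty price")

log = [
    Transaction("2009-01-02", "ABC", "BUY", 100, 50.00),
    Transaction("2009-12-31", "ABC", "SELL", 25, 100.00),
]

def net_position(log, symbol):
    # derived on demand from the log, never stored redundantly
    return sum(t.qty if t.action == "BUY" else -t.qty
               for t in log if t.symbol == symbol)

print(net_position(log, "ABC"))  # 75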
When and if the Transaction Log becomes too large or too slow, you can compute the state of all your positions as of a particular date (like year-end), use that for the initial state for the following period, and archive your old log file to disc.
You may want some auxiliary information about your holdings such as value/price on any particular date, so you can plot value vs. time for any or all holdings (There are on-line sources for this type of information, yahoo finance for one.) A master database containing static information about each of your holdings would also be useful.
I know this doesn't sound very "object oriented", but OO design could be useful to hide the detailed workings of the system in a TransLog object with methods to save/restore the data to/from disc (save/open methods), enter/change/delete a transaction; and additional methods to process the data into meaningful information displays.
First write the API with a command line interface. When this is working to your satisfaction, then you can go on to creating a GUI front end if you wish.
Good luck and have fun!
Avoid objects. Object oriented design is flawed. Think about your program as a collection of behaviors that operate on data (lists and dictionaries). Then group your related behaviors as functions in a module.
Each function should have clear input and outputs. Store your data globally in each module.
Why do it without objects? Because it maps closer to the problem space. Object oriented programming creates too much indirection to solve a problem. Unnecessary indirection causes software bloat and bugs.
A possible solution would be to create an object for each share of stock, this way each share would have a different dates and prices. Would this be too much overhead? The portfolio could have thousands or millions of little Share objects. If you wanted to find out the total market value of a position you'd need something like:
Yes, it would be too much overhead. The solution here is to store the data in a database. Finding the total market value of a position would be done in SQL, unless you use a NoSQL scheme.
Don't try to design for all possible future outcomes. Just make your program work that way it needs to work now.
I think I'd separate it into
holdings (what you currently own or owe of each symbol)
orders (simple demands to buy or sell a single symbol at a single time)
trades (collections of orders)
This makes it really easy to get a current value, queue orders, and build more complex orders, and maps easily into data objects with a database behind them.
To answer your question: You appear to have a fairly clear idea of your data model already. But it looks to me like you need to think more about what you want this program to do. Will it keep track of changes in stock prices? Place orders, or suggest orders to be placed? Or will it simply keep track of the orders you've placed? Each of these uses may call for different strategies.
That said, I don't see why you would ever need to have an object for every share; I don't understand the reasoning behind that strategy. Even if you want to be able to track your order history in great detail, you could just store aggregate data, as in "x shares at y dollars per share, on date z".
It would make more sense to have a position object (or holding object, in Hugh's terminology) -- one per stock, perhaps with an .order_history attribute, if you really need a detailed history of your holdings in that stock. And yes, a database would definitely be useful for this kind of thing.
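A minimal sketch of that aggregate idea (class and attribute names are invented):

class Position(object):
    def __init__(self, symbol):
        self.symbol = symbol
        self.order_history = []  # (date, price, qty) rows; negative qty = sell
        self.last = 0.0          # last known price per share

    def record(self, date, price, qty):
        self.order_history.append((date, price, qty))

    @property
    def qty(self):
        # net shares held, derived from the order history
        return sum(q for _, _, q in self.order_history)

    def market_value(self):
        return self.last * self.qty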
To wax philosophical for a moment: I think perhaps you're taking the "object" metaphor too literally, and so are trying to make a share, which seems very object-like in some ways, into an object in the programming sense of the word. If so, that's a mistake, which is what I take to be juxtapose's point.
I disagree with him that object oriented design is flawed -- that's a pretty bold pronouncement! -- but his answer is right insofar as an "object" (a.k.a. a class instance) is almost identical to a module**. It's a collection of related functions linked to some shared state. In a class instance, the state is shared via self or this, while in a module, it's shared through the global namespace.
For most purposes, the only major difference between a class instance and a module is that there can be many class instances, each one with its own independent state, while there can be only one module instance. (There are other differences, of course, but most of the time they involve technical matters that aren't very important for learning OOD.) That means that you can think about objects in a way similar to the way you think about modules, and that's a useful approach here.
**In many compiled languages, the file that results when you compile a module is called an "object" file. I think that's where the "object" metaphor actually comes from. (I don't have any real evidence of that! So anyone who knows better, feel free to correct me.) The ubiquitous toy examples of OOD that one sees -- car.drive(mph=50); car.stop(); car.refuel(unleaded, regular) -- I believe are back-formations that can confuse the concept a bit.
I would love to hear what you came up with. I am ~4 months (part-time) into creating an Order Handler, and although it's mostly complete, I still have the same questions as you, as I'd like it to be built properly.
Currently, I save two files
A "Strategy Hit Log" where each buy/sell signal that comes from any strategy script is saved. For example: when the buy_at_yesterdays_close_price.py strategy is triggered, it saved that buy request in this file, and passes the request to the Order Handler
An "Order Log" which is a single DataFrame - this file fits the purpose you were focusing on.
Each request from a strategy pertains to a single underlying security (for example, AAPL stock) and creates an Order which is saved as a row in the DataFrame, containing columns for the Ticker, the Strategy name that spawned this Order as well as Suborder and Broker Suborder columns (explained below).
Each Order has a list of Suborders (dicts) stored in the Suborder column. For example: if you are bullish on AAPL, the suborders could be:
[
    {'security': 'equity', 'quantity': 10},
    {'security': 'option call', 'quantity': 10},
]
Each Order also has a list of Broker Suborders (dicts) stored in the Broker Suborder column. Each Broker Suborder is a request to the Broker to buy or sell a security, and is indexed with the "order ID" that the Broker provides for that request. Every new request to the Broker is a new Broker Suborder, and canceling that Broker Suborder is recorded in that dict. To record modifications to Broker Suborders, you cancel the old Broker Suborder and send and record a new one (it's the same commission using IBKR).
Improvements
List of Classes instead of DataFrame: I think it'd be much more pythonic to save each Order as an instance of an Order_Class (instead of a row of a DataFrame), which has Suborder and Broker_Suborder attributes, both of which are also instances of Suborder_Class and Broker_Suborder_Class.
My current question is whether saving a list of classes as my record of all open and closed Orders is pythonic or silly.
Visualization Considerations: It seems like Orders should be saved in table form for easier viewing, but maybe it is better to keep them in this "list of class instances" form and use a function to tabularize them at viewing time? Any input would be greatly appreciated. I'd like to move on and start playing with ML, but I don't want to leave the Order Handler unfinished.
Is it all crap?: Should each Broker Suborder (a buy/sell request to a Broker) be attached to an Order (which is just a specific request from a strategy script), or should all Broker Suborders be recorded in chronological order and simply hold a reference to the strategy Order that spawned them? Idk... but I would love everyone's input.

GAE python database object design for simple list of values

I'm really new to database object design, so please forgive any weirdness in my question. Basically, I am using Google App Engine (Python) and constructing an object to track user info. One of those pieces of data is 40 achievement scores. Do I make a list of ints in the User object for this? Or do I make a separate entity with my user id, the achievement index (0-39) and the score, and then do a query to grab these 40 items every time I want the user data in total?
The latter approach seems more object-oriented to me, and certainly better if I extend it to hold more than just scores for these 40 achievements. However, considering that I might not extend it, should I even consider just doing a simple list of 40 ints in my user data? I would then forgo doing a query, getting the sorted list of achievements, and reading the score from each one just to process a response, etc.
Is the latter approach just such common practice that it's hand-waved as not even worth batting an eyelash at, in terms of whether it might be more costly or complex processing-wise?
I like the simple idea of keeping the list of 40 ints, but you can't force-feed it into App Engine's existing User class, whose layout is determined by the GAE API (and doesn't include those 40 ints). So that list will inevitably need to live in a separate entity (i.e., an instance of a separate model).
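A sketch of that separate entity with ndb (a repeated IntegerProperty keeps all 40 scores in one entity, so reading them back is a single get rather than 40 queries; the model name is mine):

from google.appengine.ext import ndb

class UserAchievements(ndb.Model):
    # key the entity by the user's id, e.g.
    # ndb.Key(UserAchievements, users.get_current_user().user_id())
    scores = ndb.IntegerProperty(repeated=True)  # the 40 achievement scores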
