When naming a container, which is the better coding style:
source = {}
#...
source[record] = some_file
or
sources = {}
#...
sources[record] = some_file
The plural reads more naturally at creation; the singular at assignment.
And this is not an idle question: I once caught myself getting confused in some old code when I wasn't sure whether a variable was a container or a single value.
UPDATE
It seems there's general agreement that when the dictionary is used as a mapping, it's better to use a more descriptive name (e.g., recordToSourceFilename); and if I absolutely want a short name, to make it plural (e.g., sources).
I think there are two very specific use cases for dictionaries that should be identified separately. Before addressing them, though, it should be noted that variable names for dictionaries should almost always be singular, while names for lists should almost always be plural.
Dictionaries as object-like entities: There are times when you have a dictionary that represents some kind of object-like data structure. In these instances, the dictionary almost always refers to a single object-like data structure, and should therefore be singular. For example:
# assume that users is a list of users parsed from some JSON source
# assume that each user is a dictionary, containing information about that user
for user in users:
    print user['name']
Dictionaries as mapping entities: Other times, your dictionary might be behaving more like a typical hash-map. In such a case, it is best to use a more direct name, though still singular. For example:
# assume that idToUser is a dictionary mapping IDs to user objects
user = idToUser['0001a']
print user.name
Lists: Finally, you have lists, which are an entirely separate idea. These should almost always be plural, because they are simply a collection of other entities. For example:
users = [userA, userB, userC]  # makes sense
for user in users:
    print user.name  # especially later, in iteration
I'm sure that there are some obscure or otherwise unlikely situations that might call for some exceptions to be made here, but I feel that this is a pretty strong guideline to follow when naming dictionaries and lists, not just in Python but in all languages.
It should be plural because then the code reads just like you would say it aloud. Let me show you why it should not be singular (totally contrived example):
c = Customer(name = "Tony")
c.persist()
[...]
#
# 500 LOC later, you retrieve the customer list as a mapping from
# customer ID to Customer instance.
#
# Singular
customer = fetchCustomerList()
nameOfFirstCustomer = customer[0].name
for c in customer: # obviously it's totally confusing once you iterate
    ...
# Plural
customers = fetchCustomerList()
nameOfFirstCustomer = customers[0].name
for customer in customers: # yeah, that makes sense!!
    ...
Furthermore, sometimes it's a good idea to have even more explicit names from which you can infer the mapping (for dictionaries) and probably the type. I usually add a simple comment when I introduce a dictionary variable. An example:
# Customer ID => Customer
idToCustomer = {}
[...]
idToCustomer[1] = Customer(name = "Tony")
I prefer plurals for containers. There's just a certain understandable logic in using:
entries = []
for entry in entries:
    # Code...
tl;dr: I want to express something like [child.child_field_value, child.parent_field_value] on a Django child model and get an iterable like ['Alsatian', 'Dog'] or similar.
Context: I'm trying to prepare a dict for a JSON API in Django, such that I have two models, Evaluation and its parent Charity.
In the view I filter for all Evaluations meeting certain parameters, and then use a dict comp nested in a list comp on evaluation.__dict__.items() to drop Django's '_state' field (this isn't the focus of this question, but please tell me if you know a better practice!):
response = {'evaluations': [
    {key: value for key, value in evaluation.__dict__.items()
     if key not in ['_state']}
    for evaluation in evaluations]}
But I want a good way to combine the fields charity_name and charity_abbreviation of each Evaluation's parent charity with the rest of that evaluation's fields. So far the best way I can find/think of is during the dict comp to conditionally check whether the field we're iterating through is charity_id and if so to look up that charity and return an array of the two fields.
But I haven't figured out how to do that, and it seems likely to end up very messy and not functionally ideal, since I'd rather have that array as two key:value pairs in line with the rest of the dictionary.
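For what it's worth, the desired merge can be sketched as a small helper that builds the evaluation dict and then adds the parent's two fields as ordinary key:value pairs. This is a plain-Python sketch: the helper name is made up, and it assumes the charity object is passed in alongside the evaluation.

```python
def evaluation_to_dict(evaluation, charity):
    """Combine an evaluation's fields with its parent charity's fields,
    dropping Django's internal '_state' entry."""
    merged = {key: value for key, value in evaluation.__dict__.items()
              if key not in ['_state']}
    # The parent's fields become ordinary key:value pairs in the same dict,
    # rather than a nested two-element array
    merged['charity_name'] = charity.charity_name
    merged['charity_abbreviation'] = charity.charity_abbreviation
    return merged
```

In the view this would replace the inner dict comp, e.g. `[evaluation_to_dict(e, e.charity) for e in evaluations]`, assuming the ForeignKey from Evaluation to Charity is named `charity`.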
I'm making an application in which a user can create categories and put items in them. The items share some basic properties, but the rest are defined by the category they belong to. The problem is that both the category and its special properties are created by the user.
For instance, the user may create two categories: books and buttons. In the 'book' category he may create two properties: number of pages and author. In the buttons category he may create different properties: number of holes and color.
Initially, I placed these properties in a JsonProperty inside the Item. While this works, it means that I query the Datastore just by specifying the category that I am looking for and then I have to filter the results of the query in the code. For example, if I'm looking for all the books whose author is Carl Sagan, I would query the Item class with category == books and the loop through the results to keep only those that match the author.
While I don't really expect to have that many items per category (probably in the hundreds, unlikely to get to one thousand), this looks inefficient. So I tried to use ndb.Expando to make those special properties real properties that are indexed. I did this, adding the corresponding special properties to the item when putting it to the Datastore. So if the user creates an Item in the 'books' category and previously created in that category the special property 'author', an Item is saved with the special property expando_author = author in it. It worked as I expected until this point (dev server).
The real problem though became visible when I did some queries. While they worked in the dev server, they created composite indexes for each special/expando property, even if the query filters were equality only. And while each category can have at most five properties, it is evident that it can easily get out of control.
Example query:
items = Item.query()
for p in properties:
    items = items.filter(ndb.GenericProperty(p) == properties[p])
items.fetch()
Now, since I don't know in advance what the properties will be (though I will limit it to 5), I can't build the indexes before uploading the application, and even if I knew it would probably mean having more indexes that I'm comfortable with. Is Expando the wrong tool for what I'm trying to do? Should I just keep filtering the results in the code using the JsonProperty? I would greatly appreciate any advice I can get.
PD. To make this post shorter I omitted a few details about what I did, if you need to know something I may have left out just ask in the comments.
Consider storing the category's properties in a single list property, with each value prefixed by the property name.
Something like (forgive me, I forgot the exact Python syntax; I switched to Go):
class Item(ndb.Model):
    category = ndb.StringProperty()
    props = ndb.StringProperty(repeated=True)

book = Item(category='book', props=['author:Carl Sagan'])
button = Item(category='button', props=['holes:5'])
Then you can have a single composite index on category+props and run queries like this:
def filter_items(category, prop_name, prop_value):
    return Item.query(Item.category == category,
                      Item.props == prop_name + ':' + prop_value)
And you would need a function on Item to get property values cleaned up from prop names.
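A minimal sketch of that cleanup function, assuming the 'name:value' encoding shown above (the helper name is made up):

```python
def get_prop_values(props, prop_name):
    """Return the values stored under prop_name in a list of
    'name:value' strings, stripping the name prefix."""
    prefix = prop_name + ':'
    return [p[len(prefix):] for p in props if p.startswith(prefix)]
```

For example, `get_prop_values(['author:Carl Sagan', 'pages:222'], 'author')` yields `['Carl Sagan']`. Note this simple split assumes property names themselves contain no ':'.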
I'm very new to programming (taking my first class in it now), so bear with me on formatting issues, misunderstandings, or missed easy fixes.
I have a dict with tweet data: 'user' as keys and then 'text' as their values. My goal here is to find the tweets where they are replying to another user, signified by starting with the # symbol, and then make a new dict that contains the author's user and the users of everyone he replied to. That's the fairly simple if statement I have below. I was also able to use the split function to isolate the username of the person they are replying to (the function takes all the text between the # symbol and the next space after it).
st = '#'
en = ' '
task1dict = {}
for t in a, b, c, d, e, f, g, h, i, j, k, l, m, n:
    if t['text'][0] == '#':
        user = t['user']
        repliedto = t['text'].split(st)[-1].split(en)[0]
        task1dict[user] = [repliedto]
Username1 replied to username2. Username2 replied to both username3 and username5.
I am trying to create a dict (called tweets1) that reads something like:
'user':'repliedto'
username1:[username2]
username2:[username3, username5]
etc.
Is there a better way to isolate the usernames, and then put them into a new dict? Here's a 2 entry sample of the tweet data:
{"user":"datageek88","text":"#sundevil1992 good question! #joeclarknet Is this on the exam?"},
{"user":"joeclarkphd","text":"Exam questions will be answered in due time #sundevil1992"}
I am now able to add them to a dict, but it only saves one 'repliedto' for each 'user', so instead of showing that username2 replied to both 3 and 5, it just shows the latest one, 5:
{'username1': ['username2'],
'username2': ['username5']}
Again, if I'm making a serious no-no anywhere in here, I apologize, and please show me what I'm doing wrong!
Modify the last line to
task1dict.setdefault(user, [])
task1dict[user].append(repliedto)
You were overwriting the replied-to list each time you edited it. The setdefault method gives the dict an empty list for that key if one doesn't already exist; then you just append to the list.
EDIT: same code using a set for uniqueness.
task1dict.setdefault(user, set())
task1dict[user].add(repliedto)
With a set you add an element, whereas with a list you append.
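Note that setdefault also returns the stored value, so the two lines can be combined into one. A small sketch with made-up usernames:

```python
task1dict = {}
# setdefault returns the set (creating it first if needed), so we can
# chain .add() directly onto it
task1dict.setdefault('username2', set()).add('username3')
task1dict.setdefault('username2', set()).add('username5')
task1dict.setdefault('username2', set()).add('username5')  # duplicate; the set ignores it
# task1dict is now {'username2': set(['username3', 'username5'])}
```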
I might do it like this. Use the following regular expression to identify all usernames.
r"#([^\s]*)"
It means: look for the # symbol, then capture all the following characters that aren't whitespace. A defaultdict is simply a dictionary that returns a default value when a key isn't found. In this case I pass set as the default factory, so a new empty set is created whenever we add a new key.
import re
from collections import defaultdict
tweets = [{"user":"datageek88","text":"#sundevil1992 good question! #joeclarknet Is this on the exam?"},
          {"user":"joeclarkphd","text":"Exam questions will be answered in due time #sundevil1992"}]
from_to = defaultdict(set)
for tweet in tweets:
    if "#" in tweet['text']:
        user = tweet['user']
        for replied_to in re.findall(r"#([^\s]*)", tweet['text']):
            from_to[user].add(replied_to)
print from_to
Output
defaultdict(<type 'set'>, {'joeclarkphd': set(['sundevil1992']),
'datageek88': set(['sundevil1992', 'joeclarknet'])})
I am building an application for Facebook using Google App Engine. I was trying to compare friends in my user's Facebook account to those already in my application, so I could add them to the database if they are friends in Facebook but not in my application, or not if they are already friends in both. I was trying something like this:
request = graph.request("/me/friends")
user = User.get_by_key_name(self.session.id)
list = []
for x in user.friends:
    list.append(x.user)
for friend in request["data"]:
    if User.get_by_key_name(friend["id"]):
        friendt = User.get_by_key_name(friend["id"])
        if friendt.key not in user.friends:
            newfriend = Friend(friend = user,
                               user = friendt,
                               id = friendt.id)
            newfriend.put()
graph.request returns an object with the user's friends. How do I compare the contents of the two lists of retrieved objects? It doesn't necessarily need to be Facebook-related.
(I know this question may be quite silly, but it is really being a pain for me.)
If you upgrade to NDB, the "in" operator will actually work; NDB implements a proper __eq__ operator on Model instances. Note that the key is also compared, so entities that have the same property values but different keys are considered unequal. If you want to ignore the key, consider comparing e1._to_dict() == e2._to_dict().
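Independent of the datastore API, the membership test in the question boils down to a set difference on IDs, which can be sketched like this (the function and variable names are mine, not from any library):

```python
def missing_friend_ids(facebook_ids, app_friend_ids):
    """Return the IDs that are friends on Facebook but not yet
    stored as friends in the application."""
    return set(facebook_ids) - set(app_friend_ids)
```

You can then fetch and store entities only for those IDs, instead of re-checking every friend inside the loop.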
You should write a custom function to compare your objects, treating it as a comparison of nested dictionaries. Since you will be comparing only attributes and not methods, it comes down to a nested dict comparison.
Reason: the attributes you care about are not callable and, hopefully, don't start with _, so you just compare the remaining entries of obj.__dict__, working bottom-up, i.e., finishing the nested-level objects first (the main object could hold other objects, each with its own __dict__).
Lastly, you can consider the accepted answer code here: How to compare two lists of dicts in Python?
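A rough sketch of such a bottom-up comparison over __dict__, skipping callables and underscore-prefixed attributes (a hypothetical helper, not from any library):

```python
def objects_equal(a, b):
    """Compare two plain objects attribute by attribute, recursing into
    attribute values that are themselves objects with a __dict__."""
    da = {k: v for k, v in a.__dict__.items()
          if not k.startswith('_') and not callable(v)}
    db = {k: v for k, v in b.__dict__.items()
          if not k.startswith('_') and not callable(v)}
    if set(da) != set(db):          # different attribute names
        return False
    for k in da:
        va, vb = da[k], db[k]
        if hasattr(va, '__dict__') and hasattr(vb, '__dict__'):
            if not objects_equal(va, vb):   # recurse into nested objects
                return False
        elif va != vb:
            return False
    return True
```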
Even with all I do know about the AppEngine datastore, I don't know the answer to this. I'm trying to avoid having to write and run all the code it would take to figure it out, hoping someone already knows the answer.
I have code like:
class AddlInfo(db.Model):
    user = db.ReferenceProperty(User)
    otherstuff = db.ListProperty(db.Key, indexed=False)
And create the record with:
info = AddlInfo(user=user)
info.put()
To get this object I can do something like:
# This seems excessively wordy (even though that doesn't directly translate into slower)
info = AddlInfo.all().filter('user =', user).fetch(1)
or I could do something like:
class AddlInfo(db.Model):
    # str(user.key()) is the key name of this record
    otherstuff = db.ListProperty(db.Key, indexed=False)
Creation looks like:
info = AddlInfo(key_name=str(user.key()))
info.put()
And then get the info with:
info = AddlInfo.get_by_key_name(str(user.key()))
I don't need the reference property in AddlInfo (I got there via the user object in the first place). Which is faster/less resource-intensive?
==================
Part of why I was doing it this way is that otherstuff could be a list of 100+ keys, and I only need them sometimes (probably less than 50% of the time). I was trying to make it more efficient by not having to load those 100+ keys on every request.
Between those 2 options, the second is marginally cheaper, because you're determining the key by inference rather than looking it up in a remote index.
As Wooble said, it's cheaper still to just keep everything on one entity. Consider an Expando if you just need a way to store a bunch of optional, ad-hoc properties.
The second approach is the better one, with one modification: There's no need to use the whole key of the user as the key name of this entity - just use the same key name as the User record.