'if' element is not in list on Google App Engine - python

I am building an application for Facebook using Google App Engine. I want to compare the friends in my user's Facebook account with those already in my application, so I can add them to the database if they are friends on Facebook but not yet in my application, and skip them if they are already friends in both. I was trying something like this:
request = graph.request("/me/friends")
user = User.get_by_key_name(self.session.id)
list = []
for x in user.friends:
    list.append(x.user)
for friend in request["data"]:
    if User.get_by_key_name(friend["id"]):
        friendt = User.get_by_key_name(friend["id"])
        if friendt.key not in user.friends:
            newfriend = Friend(friend = user,
                               user = friendt,
                               id = friendt.id)
            newfriend.put()
graph.request returns an object containing the user's friends. How do I compare the contents of the two lists of retrieved objects? It doesn't necessarily need to be Facebook related.
(I know this question may be quite silly, but it is really being a pain for me.)

If you upgrade to NDB, the "in" operator will actually work; NDB implements a proper eq operator on Model instances. Note that the key is also compared, so entities that have the same property values but different keys are considered unequal. If you want to ignore the key, consider comparing e1._to_dict() == e2._to_dict().
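For example (a minimal sketch on the classic Python 2 runtime; the User model and values below are illustrative only, not the asker's actual models):
from google.appengine.ext import ndb

class User(ndb.Model):
    name = ndb.StringProperty()

a = User(id='1', name='Alice')
b = User(id='1', name='Alice')
c = User(id='2', name='Alice')

print a == b                         # True: same key, same property values
print a == c                         # False: the keys differ
print a._to_dict() == c._to_dict()   # True: the key is ignored here
print b in [a, c]                    # True: 'in' relies on that eq operator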

You could write a custom function to compare your objects, treating it as a comparison of nested dictionaries. Since you are comparing only the attributes and not the methods, a nested dict comparison is enough.
Reason: the data attributes are not callable and (hopefully) do not start with _, so you only have to compare the remaining entries of obj.__dict__. The approach should be bottom-up, i.e. resolve the nested objects first (the main object could hold other objects, which will have their own __dict__).
Lastly, you can consider the accepted answer code here: How to compare two lists of dicts in Python?
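A rough sketch of that nested comparison, in plain Python and not tied to any particular model class:
def to_comparable(obj):
    """Recursively reduce an object to plain dicts/lists of its data attributes."""
    if hasattr(obj, '__dict__'):
        return {k: to_comparable(v) for k, v in vars(obj).items()
                if not k.startswith('_') and not callable(v)}
    if isinstance(obj, (list, tuple)):
        return [to_comparable(v) for v in obj]
    return obj

def same_objects(list_a, list_b):
    """True if both lists hold objects with equal (nested) attribute values."""
    return [to_comparable(o) for o in list_a] == [to_comparable(o) for o in list_b]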

Related

Can you access a parent field from a child object in Django?

tl;dr: I want to express something like [child.child_field_value, child.parent_field_value] on a Django child model and get an iterable like ['Alsatian', 'Dog'] or similar.
Context: I'm preparing a dict for a JSON API in Django. I have two models, Evaluation and its parent Charity.
In the view I filter for all Evaluations meeting certain parameters, and then use a dict comp nested in a list comp over evaluation.__dict__.items() to drop Django's '_state' field (this isn't the focus of this question, but please tell me if you know a better practice!):
response = {'evaluations': [{
    key: value for key, value in evaluation.__dict__.items()
    if key not in ['_state']} for evaluation in evaluations]}
But I want a good way to combine the fields charity_name and charity_abbreviation of each Evaluation's parent charity with the rest of that evaluation's fields. So far the best way I can find/think of is during the dict comp to conditionally check whether the field we're iterating through is charity_id and if so to look up that charity and return an array of the two fields.
But I haven't figured out how to do that, and it seems likely to end up very messy and not functionally ideal, since I'd rather have that array as two key:value pairs in line with the rest of the dictionary.
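For what it's worth, one possible sketch (assuming Evaluation has a ForeignKey named charity to Charity and that params stands in for whatever filters the view applies; both names are assumptions, not taken from the question):
def serialize(e):
    # copy the evaluation's own fields, minus Django's bookkeeping
    data = {k: v for k, v in e.__dict__.items() if k != '_state'}
    # merge the parent's two fields in as ordinary key:value pairs
    data['charity_name'] = e.charity.charity_name
    data['charity_abbreviation'] = e.charity.charity_abbreviation
    return data

evaluations = Evaluation.objects.filter(**params).select_related('charity')
response = {'evaluations': [serialize(e) for e in evaluations]}
select_related('charity') fetches each parent in the same query, so merging its fields does not cost one extra query per evaluation.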

Filtering data in python

I am working on a web crawler for python that gathers information on posts by users on a site and compares their scores for posts all provided users participate in. It is currently structured so that I receive the following data:
results is a dictionary indexed by username that contains a dictionary of each user's history, in a post → points key-value structure.
common is a list that starts with all the posts in the dictionary of the first user in results. This list should be filtered down to only the posts all users have in common.
points is a dictionary indexed by username that keeps a running total of points on shared posts.
My filtering code is below:
common = list(results.values()[0].keys())
for user in results:
    for post_hash in common:
        if post_hash not in results[user]:
            common.remove(post_hash)
        else:
            points[user] += results[user][post_hash]
The issue I'm encountering is that this doesn't actually filter out posts that aren't shared, and thus, doesn't provide accurate point values.
What am I doing wrong with my structure, and is there any easier way to find only the common posts?
I think you may have two issues:
Using a list for common and calling common.remove on it while you're iterating over it means the loop will skip elements (and remove only deletes the first match it finds)
You're not just adding points for posts shared by all users - you're adding points for users as you encounter them - before you know if that post is shared by everyone or not
Without some actual data to play with, it's a little difficult to write working code, but try this:
# this should give us the set of posts shared by all users
common = set.intersection(*[set(k.keys()) for k in results.values()])
# there's probably a more efficient (functional) way of summing the points
# by user instead of looping, but simple is good.
for user in results:
    for post_hash in common:
        points[user] += results[user][post_hash]
from collections import Counter
from functools import reduce

posts = []
# Collect every post hash across all users
for p in results.values():
    posts.extend(p.keys())
# Use Counter to create a dictionary-like object where the key is the post
# hash and the value is the number of users it occurs under
posts = Counter(posts)
for user in results:
    # Sum this user's scores, but only over posts that every user shares
    points[user] = reduce(lambda x, y: x + y,
                          (results[user][post] for post in results[user]
                           if posts[post] == len(results)), 0)
import functools
iterable = (v.keys() for v in results.values())
common = functools.reduce(lambda x, y: x & y, iterable)
points = {user: sum(posts[post] for post in common) for user, posts in results.items()}
See if this works.
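As a quick sanity check, here is that last snippet run against some made-up data (the usernames, post hashes and scores are invented purely for illustration; Python 3 is assumed, where dict key views support set operations):
import functools

results = {
    'alice': {'p1': 3, 'p2': 1, 'p3': 5},
    'bob':   {'p1': 2, 'p3': 4},
}

iterable = (v.keys() for v in results.values())
common = functools.reduce(lambda x, y: x & y, iterable)   # {'p1', 'p3'}
points = {user: sum(posts[post] for post in common)
          for user, posts in results.items()}
print(points)                                             # {'alice': 8, 'bob': 6}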

Django, SQLite - Accurate ordering of strings with accented letters

Main problem:
I have a Python (3.4) Django (1.6) web app using an SQLite (3) database containing a table of authors. When I get the ordered list of authors, some names with accented characters like ’Čapek’ and ’Örkény’ end up at the end of the list instead of at (or directly after) the ’c’ and ’o’ sections of the list.
My 1st try:
SQLite can accept collation definitions. I searched for one made to order UTF-8 strings correctly, for example Localized and Unicode collation in Android (Accented Search in sqlite (android)), but found none.
My 2nd try: I found an old closed Django ticket about my problem: https://code.djangoproject.com/ticket/8384 It suggests sorting with Python as a workaround. I found it quite unsatisfying. Firstly, if I sort with a Python method (like below) instead of ordering at model level, I cannot use generic views. Secondly, ordering with a Python method returns the very same result as the SQLite order_by does: ’Čapek’ and ’Örkény’ are placed after section 'z'.
author_list = sorted(Author.objects.all(), key=lambda x: (x.lastname, x.firstname))
How could I get the queryset ordered correctly?
Thanks to the link CL wrote in his comment, I managed to overcome the difficulties I mentioned in my reply. I'm answering my own question to share the piece of code that worked, because using pyuca to sort querysets seems to be a rare and undocumented case.
# import section
from pyuca import Collator
# Calling Collator() takes a few seconds, so create it once as a reusable variable.
c = Collator()
# ...
# main part:
author_list = sorted(Author.objects.all(), key=lambda x: (c.sort_key(x.lastname), c.sort_key(x.firstname)))
The point is to use the sort_key method with the attribute you want to sort by as its argument. You can sort by multiple attributes, as you can see in the example.
Last words: In my language (Hungarian) we use four different accented versions of the Latin letter ‘o’: ‘o’, ’ó’, ’ö’, ’ő’. ‘o’ and ‘ó’ are equal in sorting, ‘ö’ and ‘ő’ are equal too, and ‘ö’/’ő’ come after ‘o’/’ó’. In the default collation table all four letters are equal. Now I am trying to find a way to define or find a localized collation table.
You could create a new field in the table, fill it with the result of unidecode, then sort according to it.
Using a property to provide get/set methods could help in keeping the fields in sync.
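A rough sketch of that idea, using a save() override rather than a property to keep the fields in sync (the unidecode package and the extra *_sort fields are assumptions, not part of the original models):
from unidecode import unidecode
from django.db import models

class Author(models.Model):
    firstname = models.CharField(max_length=100)
    lastname = models.CharField(max_length=100)
    # ASCII-folded copies used only for ordering
    firstname_sort = models.CharField(max_length=100, editable=False, db_index=True)
    lastname_sort = models.CharField(max_length=100, editable=False, db_index=True)

    def save(self, *args, **kwargs):
        # keep the sort columns in sync with the accented originals
        self.firstname_sort = unidecode(self.firstname)
        self.lastname_sort = unidecode(self.lastname)
        super(Author, self).save(*args, **kwargs)

author_list = Author.objects.order_by('lastname_sort', 'firstname_sort')
Note that unidecode folds all four Hungarian ‘o’ variants to plain ‘o’, so this gives section-level grouping rather than the exact Hungarian collation described above.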

singular or plural identifier for a dictionary?

When naming a container, which is the better coding style:
source = {}
#...
source[record] = some_file
or
sources = {}
#...
sources[record] = some_file
The plural reads more naturally at creation; the singular at assignment.
And it is not an idle question; I did catch myself getting confused in old code when I wasn't sure whether a variable was a container or a single value.
UPDATE
It seems there's a general agreement that when the dictionary is used as a mapping, it's better to use a more detailed name (e.g., recordToSourceFilename); and if I absolutely want to use a short name, then make it plural (e.g., sources).
I think that there are two very specific use cases with dictionaries that should be identified separately. However, before addressing them, it should be noted that the variable names for dictionaries should almost always be singular, while lists should almost always be plural.
Dictionaries as object-like entities: There are times when you have a dictionary that represents some kind of object-like data structure. In these instances, the dictionary almost always refers to a single object-like data structure, and should therefore be singular. For example:
# assume that users is a list of users parsed from some JSON source
# assume that each user is a dictionary, containing information about that user
for user in users:
    print user['name']
Dictionaries as mapping entities: Other times, your dictionary might be behaving more like a typical hash-map. In such a case, it is best to use a more direct name, though still singular. For example:
# assume that idToUser is a dictionary mapping IDs to user objects
user = idToUser['0001a']
print user.name
Lists: Finally, you have lists, which are an entirely separate idea. These should almost always be plural, because they are simply a collection of other entities. For example:
users = [userA, userB, userC]  # makes sense
for user in users:
    print user.name  # especially later, in iteration
I'm sure that there are some obscure or otherwise unlikely situations that might call for some exceptions to be made here, but I feel that this is a pretty strong guideline to follow when naming dictionaries and lists, not just in Python but in all languages.
It should be plural, because then the program reads just the way you would say it aloud. Let me show you why it should not be singular (totally contrived example):
c = Customer(name = "Tony")
c.persist()
[...]
#
# 500 LOC later, you retrieve the customer list as a mapping from
# customer ID to Customer instance.
#
# Singular
customer = fetchCustomerList()
nameOfFirstCustomer = customer[0].name
for c in customer:  # obviously it's totally confusing once you iterate
    ...
# Plural
customers = fetchCustomerList()
nameOfFirstCustomer = customers[0].name
for customer in customers:  # yeah, that makes sense!!
    ...
Furthermore, sometimes it's a good idea to have even more explicit names from which you can infer the mapping (for dictionaries) and probably the type. I usually add a simple comment when I introduce a dictionary variable. An example:
# Customer ID => Customer
idToCustomer = {}
[...]
idToCustomer[1] = Customer(name = "Tony")
I prefer plurals for containers. There's just a certain understandable logic in using:
entries = []
for entry in entries:
    #Code...

What is the most efficient way to do a ONE:ONE relation on google app engine datastore

Even with all I do know about the AppEngine datastore, I don't know the answer to this. I'm trying to avoid having to write and run all the code it would take to figure it out, hoping someone already knows the answer.
I have code like:
class AddlInfo(db.Model):
    user = db.ReferenceProperty(User)
    otherstuff = db.ListProperty(db.Key, indexed=False)
And create the record with:
info = AddlInfo(user=user)
info.put()
To get this object I can do something like:
# This seems excessively wordy (even though that doesn't directly translate into slower)
info = AddlInfo.all().filter('user =', user).fetch(1)
or I could do something like:
class AddlInfo(db.Model):
    # str(user.key()) is the key_name of this record
    otherstuff = db.ListProperty(db.Key, indexed=False)
Creation looks like:
info = AddlInfo(key_name=str(user.key()))
info.put()
And then get the info with:
info = AddlInfo.get(str(user.key()))
I don't need the ReferenceProperty on AddlInfo (I got there using the user object in the first place). Which is faster/less resource-intensive?
==================
Part of why I was doing it this way is that otherstuff could be a list of 100+ keys, and I only need them sometimes (probably less than 50% of the time). I was trying to make it more efficient by not having to load those 100+ keys on every request.
Between those 2 options, the second is marginally cheaper, because you're determining the key by inference rather than looking it up in a remote index.
As Wooble said, it's cheaper still to just keep everything on one entity. Consider an Expando if you just need a way to store a bunch of optional, ad-hoc properties.
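A minimal illustration of the Expando idea (classic db API; the property names below are placeholders, and user is assumed to be an existing entity created with a key name):
from google.appengine.ext import db

class AddlInfo(db.Expando):
    pass

info = AddlInfo(key_name=user.key().name())
info.favourite_numbers = [7, 42]   # ad-hoc properties, stored only if you set them
info.nickname = 'tony'
info.put()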
The second approach is the better one, with one modification: There's no need to use the whole key of the user as the key name of this entity - just use the same key name as the User record.
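In code, that modification might look like this (reusing the User's key name rather than its whole encoded key; user is assumed to have been created with a key name):
# store it keyed by the same name as the User record
info = AddlInfo(key_name=user.key().name())
info.put()

# ...and later, fetch it directly with no query
info = AddlInfo.get_by_key_name(user.key().name())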
