I have a QuerySet which returns a list of user-defined tags. In some cases, I'd like to exclude any of the tags that start with the word "Local", but this seems to be causing me problems.
The following examples work when I'm testing for other values (like HVAC below):
queryset = queryset.exclude(tags__tag__tag_name__icontains = 'HVAC')
queryset = queryset.exclude(tags__tag__tag_name__istartswith = 'HVAC')
But when I try the same with "Local", it excludes everything, not just the values that contain or start with "Local". Both of the examples below exclude everything:
queryset = queryset.exclude(tags__tag__tag_name__icontains = 'Local')
queryset = queryset.exclude(tags__tag__tag_name__istartswith = 'Local')
As an additional note, the following does work, but it only excludes that exact value and I can't anticipate / list all of the values that start with "Local":
queryset = queryset.exclude(tags__tag__tag_name = 'Local 123')
My best guess is that "Local" is a reserved word in Python? Any ideas on ways around this, or is there something else I'm missing?
I don't know if this is exactly the right way to deal with this issue, but as @WillemVanOnsem pointed out, I was excluding every model object that has at least one tag containing 'Local' (with both exclude and filter, as far as I can tell). Instead, I ended up building a new list of all the values that don't contain "Local" and returning that list instead of the original queryset.
newQueryset = list()
for item in list(queryset):
    if 'local' not in str(item['tags__tag__tag_name']).lower():
        newQueryset.append(item)
return newQueryset
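To see why exclude() removed everything, here is a plain-Python model of the two behaviours. Across a multi-valued relation, Django's exclude() drops a whole parent object if at least one of its related tags matches, whereas the list-based workaround drops only the matching rows. The item names and tags below are hypothetical, and the sketch uses "starts with" (the question's goal) rather than the workaround's "contains".

```python
# Hypothetical data: each item carries several tag names (a multi-valued relation).
items = {
    "unit-1": ["HVAC", "Local 123"],
    "unit-2": ["Plumbing"],
    "unit-3": ["Local 7"],
}

def exclude_like_django(items, prefix):
    """Model of .exclude(tags__tag__tag_name__istartswith=prefix):
    a whole item is dropped if ANY of its tags matches."""
    p = prefix.lower()
    return {
        name: tags
        for name, tags in items.items()
        if not any(tag.lower().startswith(p) for tag in tags)
    }

def drop_matching_rows(items, prefix):
    """Per-row filtering, like the list workaround: keep every
    (item, tag) row whose own tag does not match."""
    p = prefix.lower()
    return [
        (name, tag)
        for name, tags in items.items()
        for tag in tags
        if not tag.lower().startswith(p)
    ]

print(exclude_like_django(items, "Local"))  # {'unit-2': ['Plumbing']}
print(drop_matching_rows(items, "Local"))   # [('unit-1', 'HVAC'), ('unit-2', 'Plumbing')]
```

If every object in the real data carries at least one "Local ..." tag, the first behaviour empties the result, which matches what the question describes.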
I'm making an application in which a user can create categories to put items in. The items share some basic properties, but the rest of them are defined by the category they belong to. The problem is that both the category and its special properties are created by the user.
For instance, the user may create two categories: books and buttons. In the 'book' category he may create two properties: number of pages and author. In the buttons category he may create different properties: number of holes and color.
Initially, I placed these properties in a JsonProperty inside the Item. While this works, it means that I can only query the Datastore by the category I am looking for and then have to filter the results of the query in code. For example, if I'm looking for all the books whose author is Carl Sagan, I query the Item class with category == books and then loop through the results to keep only those that match the author.
While I don't really expect that many items per category (probably in the hundreds, unlikely to reach one thousand), this looks inefficient. So I tried to use ndb.Expando to turn those special properties into real, indexed properties. I did this by adding the corresponding special properties to the item when putting it into the Datastore. So if the user creates an Item in the 'books' category and has previously created the special property 'author' in that category, the Item is saved with the special property expando_author = author. It worked as I expected up to this point (on the dev server).
The real problem became visible when I ran some queries. While they worked on the dev server, they created composite indexes for each special/expando property, even when the query filters were equality-only. And since each category can have up to five properties, it is evident that this can easily get out of control.
Example query:
items = Item.query()
for p in properties:
    items = items.filter(ndb.GenericProperty(p) == properties[p])
results = items.fetch()
Now, since I don't know in advance what the properties will be (though I will limit them to 5), I can't build the indexes before uploading the application, and even if I knew them, it would probably mean having more indexes than I'm comfortable with. Is Expando the wrong tool for what I'm trying to do? Should I just keep filtering the results in code using the JsonProperty? I would greatly appreciate any advice I can get.
P.S. To keep this post short I omitted a few details about what I did; if you need to know something I may have left out, just ask in the comments.
Consider storing a category's properties in a single list property, with each value prefixed by its property name.
Something like this (forgive me if the Python syntax is off; I've switched to Go):
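from google.appengine.ext import ndb

class Item(ndb.Model):
    category = ndb.StringProperty()
    props = ndb.StringProperty(repeated=True)  # "name:value" strings

book = Item(category='book', props=['author:Carl Sagan'])
button = Item(category='button', props=['holes:5'])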
Then you can have a single composite index on category + props and do queries like this:
def filter_items(category, propName, propValue):
    return Item.query(Item.category == category,
                      Item.props == propName + ':' + propValue)
You would also need a helper on Item that strips the property-name prefixes to recover the values.
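The prefix scheme and the clean-up helper mentioned above could look roughly like this in plain Python. The function names and the ":" separator are assumptions of this sketch; it also assumes property names never contain ":", and values come back as strings.

```python
SEP = ":"  # separator between property name and value (assumed, as in the answer)

def encode_props(props):
    """Turn {'author': 'Carl Sagan'} into ['author:Carl Sagan'] for the list property."""
    return [f"{name}{SEP}{value}" for name, value in sorted(props.items())]

def decode_props(encoded):
    """Inverse of encode_props. Splits on the FIRST separator only,
    so values may themselves contain ':'. Values come back as strings."""
    return dict(item.split(SEP, 1) for item in encoded)

def filter_value(name, value):
    """The single string an equality filter on the list property would compare against."""
    return f"{name}{SEP}{value}"

encoded = encode_props({"author": "Carl Sagan", "pages": 300})
print(encoded)                # ['author:Carl Sagan', 'pages:300']
print(decode_props(encoded))  # {'author': 'Carl Sagan', 'pages': '300'}
```

Because every category shares the same two indexed fields (category and the list), a single composite index covers any user-defined property, at the cost of doing type conversion yourself.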
Let's say I have a list of people that can be "followed".
I'd like to iterate through all the people that a certain user is following and grab posts from all of those users in the form of a queryset.
I understand that I can combine querysets by using chain or |, but I'm a bit confused when it comes to combining querysets that I might grab from looping through everyone being followed.
following = UserFollows.objects.filter(user_id = user.id)
for follow in following.iterator():
    UserPost.objects.filter(user=follow.user)  # what do I do with this?
How would I combine those if I can't explicitly name them to chain or '|'?
You can do something like this:
following = UserFollows.objects.filter(user__id = user.id).select_related('user')
users_ids = [follow.user.id for follow in following]
posts = UserPost.objects.filter(user__id__in=users_ids)
Note that this can be quite an expensive operation, so it's worth adding select_related() to fetch the users in a single query. You might also consider caching the users_ids list rather than fetching it from the database every time.
Have you tried something like this?
following = UserFollows.objects.filter(user_id = user.id)
q = UserPost.objects.filter(user=following[0].user)
for follow in following[1:]:
    q = q | UserPost.objects.filter(user=follow.user)
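If you do want to OR together a number of querysets you can't name in advance, functools.reduce with operator.or_ folds a list of them with `|`. The sketch below uses plain Python sets as stand-ins, since they support `|` the same way; the user names and post IDs are made up. With Django querysets the empty starting value would be `UserPost.objects.none()`, though the single `user__id__in` query above is usually simpler.

```python
from functools import reduce
import operator

# Stand-ins for per-user post querysets: sets of hypothetical post IDs.
posts_by_user = {
    "alice": {1, 2},
    "bob": {3},
    "carol": {2, 4},
}
following = ["alice", "carol"]

# One "queryset" per followed user, then fold them together with |.
per_user = [posts_by_user[name] for name in following]
# The initial empty value keeps reduce() from raising when the list is empty.
combined = reduce(operator.or_, per_user, set())
print(sorted(combined))  # [1, 2, 4]
```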
Consider an array of Tags, T.
Each PhotoSet has a many-to-many relationship to Tags.
We also have a filter, F (consisting of a set of Tags), and we want to return all PhotoSets who have ALL the tags contained in F.
i.e., if F = ['green', 'dogs', 'cats'], we want every PhotoSet instance that has all the tags in F.
Naturally
PhotoSet.objects.filter(tags__in=F)
Does not do the trick, since it returns every PhotoSet containing any member of F.
I see it's possible to do similar things with "Q" expressions, but that only seemed to work for a fixed number of conjunctive parameters. Can this be done with a list comprehension?
Thanks in advance!
EDIT -- SOLUTION:
I found the solution in an obvious place: simply chaining filters.
results = PhotoSet.objects
for f in F:
    results = results.filter(tags__in=[f])
results = results.all()
Was staring me in the face the whole time!
A little quick and dirty, but it'll do the trick:
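from django.db.models import Q

query = None
for tag in F:
    if query is None:
        query = Q(tags=tag)
    else:
        query &= Q(tags=tag)
# Caution: conditions in a single filter() call on a multi-valued relation
# must all match the same related row, so with more than one tag this can
# return nothing; chaining one .filter() per tag avoids that.
PhotoSet.objects.filter(query)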
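The difference between `tags__in=F` (any tag matches) and one chained filter() per tag (every tag must match) can be modelled in plain Python. The photo sets and tags below are hypothetical; each loop pass narrows the candidate set, just as each chained filter() adds another requirement.

```python
# Hypothetical photo sets and their tags.
photo_sets = {
    "park": {"green", "dogs", "cats"},
    "kennel": {"dogs"},
    "garden": {"green", "cats", "dogs", "birds"},
}
F = ["green", "dogs", "cats"]

def any_tag(sets, wanted):
    """Like filter(tags__in=F): keep a set if it has at least one wanted tag."""
    return {name for name, tags in sets.items() if tags & set(wanted)}

def all_tags(sets, wanted):
    """Like chaining one .filter() per tag: each pass narrows the candidates."""
    result = set(sets)
    for tag in wanted:
        result = {name for name in result if tag in sets[name]}
    return result

print(sorted(any_tag(photo_sets, F)))   # ['garden', 'kennel', 'park']
print(sorted(all_tags(photo_sets, F)))  # ['garden', 'park']
```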
When naming a container, which is the better coding style:
source = {}
#...
source[record] = some_file
or
sources = {}
#...
sources[record] = some_file
The plural reads more naturally at creation; the singular at assignment.
And it is not an idle question; I did catch myself getting confused in old code when I wasn't sure whether a variable was a container or a single value.
UPDATE
It seems there's a general agreement that when the dictionary is used as a mapping, it's better to use a more detailed name (e.g., recordToSourceFilename); and if I absolutely want to use a short name, then make it plural (e.g., sources).
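Put together, the consensus looks something like the sketch below. The records and file names are hypothetical, and the mapping name uses PEP 8 snake_case as the Python spelling of the recordToSourceFilename idea.

```python
# Hypothetical records and file names, just to illustrate the naming scheme.
records = ["r1", "r2"]              # a list: plural name
record_to_source_filename = {}      # a mapping: "<key>_to_<value>" name

for record in records:              # one element: singular name
    record_to_source_filename[record] = record + ".csv"

source_filename = record_to_source_filename["r1"]  # a single value: singular
```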
I think that there are two very specific use cases with dictionaries that should be identified separately. However, before addressing them, it should be noted that the variable names for dictionaries should almost always be singular, while lists should almost always be plural.
Dictionaries as object-like entities: There are times when you have a dictionary that represents some kind of object-like data structure. In these instances, the dictionary almost always refers to a single object-like data structure, and should therefore be singular. For example:
# assume that users is a list of users parsed from some JSON source
# assume that each user is a dictionary, containing information about that user
for user in users:
    print(user['name'])
Dictionaries as mapping entities: Other times, your dictionary might be behaving more like a typical hash-map. In such a case, it is best to use a more direct name, though still singular. For example:
# assume that idToUser is a dictionary mapping IDs to user objects
user = idToUser['0001a']
print(user.name)
Lists: Finally, you have lists, which are an entirely separate idea. These should almost always be plural, because they are simply a collection of other entities. For example:
users = [userA, userB, userC] # makes sense
for user in users:
    print(user.name)  # especially later, in iteration
I'm sure that there are some obscure or otherwise unlikely situations that might call for some exceptions to be made here, but I feel that this is a pretty strong guideline to follow when naming dictionaries and lists, not just in Python but in all languages.
It should be plural because then the program behaves just like you read it aloud. Let me show you why it should not be singular (totally contrived example):
c = Customer(name = "Tony")
c.persist()
[...]
#
# 500 LOC later, you retrieve the customer list as a mapping from
# customer ID to Customer instance.
#
# Singular
customer = fetchCustomerList()
nameOfFirstCustomer = customer[0].name
for c in customer:  # obviously it's totally confusing once you iterate
    ...
# Plural
customers = fetchCustomerList()
nameOfFirstCustomer = customers[0].name
for customer in customers:  # yeah, that makes sense!!
    ...
Furthermore, sometimes it's a good idea to have even more explicit names from which you can infer the mapping (for dictionaries) and probably the type. I usually add a simple comment when I introduce a dictionary variable. An example:
# Customer ID => Customer
idToCustomer = {}
[...]
idToCustomer[1] = Customer(name = "Tony")
I prefer plurals for containers. There's just a certain understandable logic in using:
entries = []
for entry in entries:
    ...  # code
I'm developing a simple blogging/bookmarking platform and I'm trying to add a tags-explorer/drill-down feature à la Delicious to allow users to filter the posts by specifying a list of tags.
Posts are represented in the datastore with this simplified model:
from google.appengine.ext import db

class Post(db.Model):
    title = db.StringProperty(required=True)
    link = db.LinkProperty(required=True)
    description = db.StringProperty(required=True)
    tags = db.ListProperty(str)
    created = db.DateTimeProperty(required=True, auto_now_add=True)
Post's tags are stored in a ListProperty and, in order to retrieve the list of posts tagged with a specific list of tags, the Post model exposes the following static method:
@staticmethod
def get_posts(limit, offset, tags_filter=()):
    posts = Post.all()
    for tag in tags_filter:
        if tag:
            posts.filter('tags', tag)
    return posts.fetch(limit=limit, offset=offset)
This works well, although I've not stressed it too much.
The problem arises when I try to add a sort order to the get_posts method to keep the results ordered by "-created" date:
@staticmethod
def get_posts(limit, offset, tags_filter=()):
    posts = Post.all()
    for tag in tags_filter:
        if tag:
            posts.filter('tags', tag)
    posts.order("-created")
    return posts.fetch(limit=limit, offset=offset)
The sort order requires a new composite index for every additional tag in the filter, leading to the dreaded exploding-indexes problem.
One more complication is that the get_posts method should also provide some pagination mechanism.
Do you know any Strategy/Idea/Workaround/Hack to solve this problem?
Queries involving keys use indexes just like queries involving properties. Queries on keys require custom indexes in the same cases as with properties, with a couple of exceptions: inequality filters or an ascending sort order on key do not require a custom index, but a descending sort order on Entity.KEY_RESERVED_PROPERTY (__key__) does.
So use a sortable date string for the primary key of the entity:
import datetime
import time

from google.appengine.ext import db

class Post(db.Model):
    title = db.StringProperty(required=True)
    link = db.LinkProperty(required=True)
    description = db.StringProperty(required=True)
    tags = db.ListProperty(str)
    created = db.DateTimeProperty(required=True, auto_now_add=True)

    @classmethod
    def create(cls, *args, **kw):
        # disambig_chars() is a helper you would supply (see the notes below).
        kw['key_name'] = inverse_microsecond_str() + disambig_chars()
        return cls(*args, **kw)

...

def inverse_microsecond_str():  # 8 characters from chr(23) to 'z'; sorts in reverse temporal order
    t = datetime.datetime.now()
    # Integer arithmetic keeps full microsecond precision; no y2k for >100 yrs.
    inv_us = 10**16 - (int(time.mktime(t.timetuple())) * 10**6 + t.microsecond)
    base_100_chars = []
    while inv_us:
        inv_us, digit = divmod(inv_us, 100)
        base_100_chars = [chr(23 + digit)] + base_100_chars
    return "".join(base_100_chars)
Now, you don't even have to include a sort order in your queries, although it won't hurt to explicitly sort by key.
Things to remember:
This won't work unless you use create() for all your Posts.
You'll have to migrate old data.
No ancestors allowed.
The key is stored once per index, so it is worthwhile to keep it short; that's why I'm doing the base-100 encoding above.
This is not 100% reliable because of the possibility of key collisions. Without disambig_chars, the code above nominally gives reliability on the order of the number of microseconds between transactions, so if you had 10 posts per second at peak times, it would fail about 1 in 100,000 times. However, I'd shave off a couple of orders of magnitude for possible App Engine clock-tick issues, so I'd actually only trust it to about 1 in 1,000. If that's not good enough, add disambig_chars; and if you need 100% reliability, you probably shouldn't be on App Engine, though I guess you could include logic to handle key collisions on save().
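The failure rates quoted above are a back-of-the-envelope product of the post rate r and the effective timestamp resolution δ, i.e. roughly the chance that the next post lands in the same tick as the previous one (assuming posts arrive independently):

```latex
p \approx r\,\delta = 10\ \mathrm{s}^{-1} \times 10^{-6}\ \mathrm{s} = 10^{-5} = \tfrac{1}{100\,000}

\delta_{\mathrm{eff}} \approx 10^{-4}\ \mathrm{s}
\;\Rightarrow\;
p \approx 10\ \mathrm{s}^{-1} \times 10^{-4}\ \mathrm{s} = 10^{-3} = \tfrac{1}{1000}
```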
What if you inverted the relationship? Instead of a post with a list of tags you would have a tag entity with a list of posts.
class Tag(db.Model):
    tag = db.StringProperty()
    posts = db.ListProperty(db.Key, indexed=False)
To search for tags you would do tags = Tag.all().filter('tag IN', ['python','blog','async'])
This would give you hopefully 3 or more Tag entities, each with a list of posts that are using that tag. You could then do post_union = set(tags[0].posts).intersection(tags[1].posts, tags[2].posts) to find the set of posts that have all tags.
Then you could fetch those posts and order them by created (I think): Post.all().filter('__key__ IN', list(post_union)).order("-created")
Note: This code is off the top of my head, I can't remember if you can manipulate sets like that.
Edit: @Yasser pointed out that you can only do IN queries for < 30 items.
Instead, you could have the key name for each post start with the creation time. Then you could sort the keys you retrieved via the first query and just do Post.get(sorted_posts).
Don't know how this would scale to a system with millions of posts and/or tags.
Edit2: I meant set intersection, not union.
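For an arbitrary number of tags, the intersection step generalizes with `set.intersection(*...)`, and if key names start with a sortable creation time (as suggested in the edit above), sorting the surviving names orders them by time. The tag-to-key-name lists below are hypothetical in-memory stand-ins for what the Tag entities would hold.

```python
# Hypothetical Tag -> post key name lists, as the Tag entities above would store them.
tag_posts = {
    "python": ["k3", "k1", "k2"],
    "blog": ["k1", "k2"],
    "async": ["k2", "k1", "k4"],
}

def posts_with_all_tags(tag_posts, wanted):
    """Intersect the post lists of every wanted tag (any number of tags)."""
    if not wanted:
        return []
    common = set.intersection(*(set(tag_posts.get(t, ())) for t in wanted))
    # If key names begin with a sortable timestamp, this also sorts by time.
    return sorted(common)

print(posts_with_all_tags(tag_posts, ["python", "blog", "async"]))  # ['k1', 'k2']
```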
This question sounds similar to:
Data Modelling Advice for Blog Tagging system on Google App Engine
Mapping Data for a Google App Engine Blog Application:
parent->child relationships in appengine python (bigtable)
As pointed out by Robert Kluin in the last one, you could also consider using a pattern similar to the "Relation Index" described in this Google I/O presentation.
from google.appengine.ext import db

# Model definitions
class Article(db.Model):
    title = db.StringProperty()
    content = db.StringProperty()

class TagIndex(db.Model):
    tags = db.StringListProperty()

# Tags are child entities of Articles
article1 = Article(title="foo", content="foo content")
article1.put()
TagIndex(parent=article1, tags=["hop"]).put()

# Get all articles for a given tag
tags = db.GqlQuery("SELECT __key__ FROM TagIndex WHERE tags = :1", "hop")
keys = [t.parent() for t in tags]
articles = db.get(keys)
Depending on how many articles you expect back from the tags query, sorting could be done either in memory or by making the date string part of the Article key_name.
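The in-memory option can be sketched with plain dictionaries standing in for the datastore: a keys-only lookup on the index yields parent keys, the parents are fetched, and the results are sorted newest-first. The articles, dates and tags below are hypothetical.

```python
from datetime import datetime

# Hypothetical in-memory stand-ins for Article entities and their TagIndex children.
articles = {
    "a1": {"title": "foo", "created": datetime(2011, 1, 5)},
    "a2": {"title": "bar", "created": datetime(2011, 3, 1)},
    "a3": {"title": "baz", "created": datetime(2011, 2, 9)},
}
tag_indexes = [  # (parent article key, tags) pairs, like TagIndex(parent=..., tags=[...])
    ("a1", ["hop"]),
    ("a2", ["hop", "jazz"]),
    ("a3", ["jazz"]),
]

def articles_for_tag(tag):
    """Keys-only lookup on the index, fetch the parents, sort newest-first in memory."""
    parent_keys = [parent for parent, tags in tag_indexes if tag in tags]
    fetched = [articles[k] for k in parent_keys]
    return sorted(fetched, key=lambda a: a["created"], reverse=True)

print([a["title"] for a in articles_for_tag("hop")])  # ['bar', 'foo']
```

This stays cheap only while a single tag matches a modest number of articles; beyond that, the key_name option avoids loading everything before sorting.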
Updated with StringListProperty and sorting notes after comments from Robert Kluin and Wooble on the #appengine IRC channel.
One workaround could be this:
Sort and concatenate a post's tags with a delimiter like | and store them as a StringProperty when storing the post. When you receive the tags_filter, you can sort and concatenate it the same way to build a single StringProperty filter for the posts. Obviously this would be an AND query, not an OR query, but that's what your current code seems to be doing as well.
EDIT: As rightly pointed out, this would only match the exact tag list, not a partial one, which is obviously not very useful.
EDIT: What if you model your Post with boolean placeholders for tags, e.g. b1, b2, b3, etc.? When a new tag is defined, you map it to the next available placeholder (e.g. blog=b1, python=b2, async=b3) and keep the mapping in a separate entity. When a tag is assigned to a post, you just set its placeholder value to True.
This way, when you receive a tags_filter set, you can construct your query from the map, e.g.
Post.all().filter("b1",True).filter("b2",True).order('-created')
can give you all the posts that have both the python and blog tags.
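The tag-to-placeholder bookkeeping might look like this in plain Python. Both helper names and the "b" prefix are assumptions of this sketch; each returned pair is what you would pass to one Query.filter() call, as in the example above.

```python
def build_tag_map(all_tags, prefix="b"):
    """Assign each tag the next placeholder property name: b1, b2, b3, ..."""
    return {tag: f"{prefix}{i}" for i, tag in enumerate(all_tags, start=1)}

def placeholder_filters(tag_map, tags_filter):
    """(property name, value) pairs, one per requested tag, for Query.filter()."""
    return [(tag_map[tag], True) for tag in tags_filter]

tag_map = build_tag_map(["blog", "python", "async"])
print(tag_map)                                           # {'blog': 'b1', 'python': 'b2', 'async': 'b3'}
print(placeholder_filters(tag_map, ["python", "blog"]))  # [('b2', True), ('b1', True)]
```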