I'm relatively new to Django and am facing a problem that I haven't been able to solve yet:
I have two models, which look like this:
class Item(models.Model):
    char1 = models.CharField(max_length=200)
    char2 = models.CharField(max_length=200)

class Entry(models.Model):
    item = models.ForeignKey(Item, on_delete=models.CASCADE)  # on_delete is required in Django 2.0+
    choice = models.IntegerField()
I have many Items stored in my database. In a single view, I basically want to iterate through all stored Items in random order and, for each Item, display char1 and char2 together with an IntegerField and a 'next' button that stores a new Entry (with the current Item and the typed integer) in my database and takes me to the next (random) Item.
During my research I found, for example, the form wizard and formsets, but they are not what I want: the wizard needs multiple form models that it can display in succession, whereas I want to display (in random order) every instance of a single model (Item) and store one Entry for each.
I hope someone can give me a hint on where to look, because I haven't found documentation or a tutorial for this use case anywhere, and since I'm not very experienced with Django, I can't figure it out at the moment...
Best regards and thanks in advance!
Judging by your title, the problem is iterating over multiple items from a single (possibly random/unsorted) model.
If I'm not mistaken, what you are looking for is Pagination. From that documentation, a small example is:
>>> from django.core.paginator import Paginator
>>> objects = ['john', 'paul', 'george', 'ringo']
>>> p = Paginator(objects, 2)
>>> p.count
4
>>> p.num_pages
2
>>> p.page_range
[1, 2]
Although a list is shown above, Paginator can also be used on Django QuerySets, and its functionality is incorporated into Django Templates.
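For instance, a minimal sketch of using it in a view (the template name and app layout are assumptions, and get_page needs Django 2.0+):

from django.core.paginator import Paginator
from django.shortcuts import render

from .models import Item  # assumed app layout

def item_page(request):
    # one Item per page; note that '?' ordering re-shuffles on every request
    paginator = Paginator(Item.objects.order_by('?'), 1)
    page = paginator.get_page(request.GET.get('page', 1))
    return render(request, 'item_page.html', {'page': page})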
Let me know if this isn't what you're after.
Cheers.
Paulo Bu's answer provides the way to get the random ordering in the first place from Django's API. The tricky part of what you're doing is that saving a particular random ordering of the items between page loads is not really RESTful, because it isn't stateless. By default, your randomly ordered queryset will fall out of existence as soon as you serve the request, and there is no guarantee that you will cycle through all the items rather than getting repeats and misses. So you'll want to save that ordering. There are several options for how you might approach this:
Serve the entire randomized list of item IDs with every request, and have the backend serve up the data for the current item by index
Serve all of the data -- full items and entries -- then you can either render everything client-side
Store the randomized list as a session variable (see the sketch after this list)
Store a permanent random ordering of the items by adding a float between 0 and 1 to every Item, ordering on that float, and starting at a random index (if you don't care whether each user has the same overall permutation)
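A minimal sketch of the session-variable option, with hypothetical view and template names:

import random

from django.shortcuts import render

from .models import Item, Entry  # assumed app layout

def next_item(request):
    order = request.session.get('item_order')
    if not order:
        # shuffle the item IDs once and keep the ordering in the session
        order = list(Item.objects.values_list('id', flat=True))
        random.shuffle(order)

    if request.method == 'POST':
        # record an Entry for the item just shown, then advance
        Entry.objects.create(item_id=order.pop(),
                             choice=int(request.POST['choice']))

    request.session['item_order'] = order
    if not order:
        return render(request, 'done.html')
    item = Item.objects.get(pk=order[-1])
    return render(request, 'item_form.html', {'item': item})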
tl;dr: I want to express something like [child.child_field_value, child.parent_field_value] on a Django child model and get an iterable like ['Alsatian', 'Dog'] or similar.
Context: I'm preparing a dict for a JSON API in Django. I have two models: Evaluation and its parent Charity.
In the view I filter for all Evaluations meeting certain parameters, and then use a dict comp nested in a list comp on evaluation.__dict__.items() to drop Django's '_state' field (this isn't the focus of this question, but please tell me if you know a better practice!):
response = {'evaluations': [
    {key: value for key, value in evaluation.__dict__.items()
     if key not in ['_state']}
    for evaluation in evaluations]}
But I want a good way to combine the charity_name and charity_abbreviation fields of each Evaluation's parent Charity with the rest of that Evaluation's fields. So far the best way I can find or think of is, during the dict comp, to conditionally check whether the field we're iterating through is charity_id and, if so, to look up that Charity and return an array of the two fields.
But I haven't figured out how to do that, and it seems likely to end up very messy and not functionally ideal, since I'd rather have that array as two key:value pairs in line with the rest of the dictionary.
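For what it's worth, one route I'm eyeing is values() with double-underscore lookups, which follows the foreign key and returns plain dicts (a sketch; the field names here are hypothetical):

response = {
    'evaluations': list(
        Evaluation.objects
        .filter(year=2023)  # stand-in for the real filter parameters
        .values('id', 'score',  # hypothetical Evaluation fields
                'charity__charity_name', 'charity__charity_abbreviation')
    )
}

Since values() returns plain dicts rather than model instances, '_state' never appears.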
This is a specific question arising from an attempt to improve performance when querying data.
I was trying to figure out, within a Django template itself, how to break out of a loop once a condition is first met. It seems that this is not possible, and the suggestions were all to put the logic in views.py instead. This led me to try filtering based on a condition being met.
In my example, I have two scenarios that I'm comparing.
(1) In the first one, I have one query inside my views.py to get all items. Naturally, Item is a model in my models.py. Inside the template I want to render, I have the context being passed, with 11 separate for-loops all iterating over the same all_items queryset. Then, based on a condition (i.e. item.category), the appropriate HTML is rendered.
Again, what I wanted was one loop that renders to the appropriate places based on the condition, going through all_items only once. Unfortunately, I'm not able to break the loop in the template once the condition is met, so I over-render HTML I don't want on each successive iteration.
This led me to the next scenario:
(2) In my views I created 11 separate queries (e.g. Item.objects.filter(category='1'), Item.objects.filter(category='2'), etc.). I in turn assign each of those to 11 separate variables passed through my views as context to render in the template. This also lets me remove the conditional in the template where I check for the category.
What I'm wondering is: in the latter example, is each query filter effectively the same as running the same for-loop over all_items in the template, just done a little differently? Am I saving any time with the latter? It is hard to tell just from the user experience, so any insights would be great. Thank you.
If you have an Item model containing category and name fields, you can use values() and a single query with filter():
categories = ['1', '2', ... '11']
items = Item.objects.filter(category__in=categories).values('category', 'name')
The output is:
<QuerySet [{'category': '1', 'name': 'name1'}, {'category': '2', 'name': 'name2'}, ..., {'category': '11', 'name': 'name11'}]>
You can loop over this items result without all those separate database queries.
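If the template needs one list per category, a sketch of regrouping the single queryset once in the view:

from collections import defaultdict

items_by_category = defaultdict(list)
for row in items:  # 'items' is the single queryset from above
    items_by_category[row['category']].append(row)

# pass dict(items_by_category) into the template context,
# one key per category instead of 11 separate queries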
Please comment in case of any questions.
P.S. I used the name field as an example; it could be any field present in your model.
I'm making an application in which a user can create categories to put items in. The items share some basic properties, but the rest of them are defined by the category they belong to. The problem is that both the category and its special properties are created by the user.
For instance, the user may create two categories: books and buttons. In the books category he may create two properties: number of pages and author. In the buttons category he may create different properties: number of holes and color.
Initially, I placed these properties in a JsonProperty inside the Item. While this works, it means that I can only query the Datastore by category and must then filter the results in code. For example, if I'm looking for all the books whose author is Carl Sagan, I would query the Item class with category == 'books' and then loop through the results to keep only those that match the author.
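Roughly, that in-code filtering looks like this (a sketch; the JsonProperty name 'data' is an assumption):

books = Item.query(Item.category == 'books').fetch()
sagan_books = [b for b in books
               if b.data.get('author') == 'Carl Sagan']  # filtering in code, not in the query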
While I don't really expect to have that many items per category (probably in the hundreds, unlikely to reach one thousand), this looks inefficient. So I tried to use ndb.Expando to make those special properties real, indexed properties, adding the corresponding special properties to the item when putting it to the Datastore. So if the user creates an Item in the books category, having previously created the special property 'author' in that category, the Item is saved with the special property expando_author = author. It worked as I expected up to this point (dev server).
The real problem became visible when I ran some queries. While they worked in the dev server, they created composite indexes for each special/expando property, even though the query filters were equality-only. And while each category can have at most five properties, it is evident that this can easily get out of control.
Example query:
items = Item.query()
for p in properties:
    items = items.filter(ndb.GenericProperty(p) == properties[p])
results = items.fetch()
Now, since I don't know in advance what the properties will be (though I will limit them to 5), I can't build the indexes before deploying the application, and even if I knew them, it would probably mean having more indexes than I'm comfortable with. Is Expando the wrong tool for what I'm trying to do? Should I just keep filtering the results in code using the JsonProperty? I would greatly appreciate any advice.
PS. To keep this post short I omitted a few details about what I did; if you need to know something I may have left out, just ask in the comments.
Consider storing the category's properties in a single list property, with each value prefixed by the property name.
Like this (forgive me, I forgot the exact Python syntax after switching to Go):
class Item(ndb.Model):
    category = ndb.StringProperty()
    props = ndb.StringProperty(repeated=True)  # each entry is 'name:value'

book = Item(category='book', props=['author:Carl Sagan'])
button = Item(category='button', props=['holes:5'])
Then you can have a single composite index on category+props and run queries like this:
def filter_items(category, prop_name, prop_value):
    return Item.query(Item.category == category,
                      Item.props == prop_name + ':' + prop_value)
And you would need a function on Item to get property values cleaned up from prop names.
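For example, a sketch of such a helper (the name is hypothetical):

def get_prop_values(item, prop_name):
    # strip the 'name:' prefix to recover the raw values for one property
    prefix = prop_name + ':'
    return [p[len(prefix):] for p in item.props if p.startswith(prefix)]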
I am working on a web crawler in Python that gathers information on posts by users on a site and compares their scores for posts that all provided users participate in. It is currently structured so that I receive the following data:
results is a dictionary indexed by username; each value is a dictionary mapping each post in that user's history to the points it earned.
common is a list that starts with all the posts in the dictionary of the first user in results. This list should be filtered down to only the posts all users have in common.
points is a dictionary indexed by username that keeps a running total of points on shared posts.
My filtering code is below:
common = list(results.values()[0].keys())
for user in results:
    for post_hash in common:
        if post_hash not in results[user]:
            common.remove(post_hash)
        else:
            points[user] += results[user][post_hash]
The issue I'm encountering is that this doesn't actually filter out posts that aren't shared, and thus, doesn't provide accurate point values.
What am I doing wrong with my structure, and is there any easier way to find only the common posts?
I think you may have two issues:
Using a list for common means that when you call common.remove while iterating over that same list, the iteration skips elements, so some non-shared posts survive the filter.
You're not only adding points for posts shared by all users; you're adding points as you encounter them, before you know whether the post is shared by everyone or not.
Without some actual data to play with, it's a little difficult to write working code, but try this:
# this should give us a list of posts shared by all users
common = set.intersection(*[set(k.keys()) for k in results.values()])

# there's probably a more efficient (functional) way of summing the points
# by user instead of looping, but simple is good.
for user in results:
    for post_hash in common:
        points[user] += results[user][post_hash]
from collections import Counter

posts = []
# Create a list of all the post hashes across every user
for p in results.values():
    posts.extend(p.keys())

# use Counter to build a dict-like object where the key is the post hash
# and the value is the number of users whose history contains it
posts = Counter(posts)

for user in results:
    # Sum points only for the posts every user has (count equals the number of users).
    points[user] = sum(results[user][post]
                       for post in results[user]
                       if posts[post] == len(results))
import functools
iterable = (v.keys() for v in results.values())
common = functools.reduce(lambda x, y: x & y, iterable)
points = {user: sum(posts[post] for post in common) for user,posts in results.items()}
See if this works.
I'm developing a simple blogging/bookmarking platform and I'm trying to add a tags-explorer/drill-down feature à la delicious to allow users to filter the posts by specifying a list of tags.
Posts are represented in the datastore with this simplified model:
class Post(db.Model):
    title = db.StringProperty(required=True)
    link = db.LinkProperty(required=True)
    description = db.StringProperty(required=True)
    tags = db.ListProperty(str)
    created = db.DateTimeProperty(required=True, auto_now_add=True)
Post's tags are stored in a ListProperty and, in order to retrieve the list of posts tagged with a specific list of tags, the Post model exposes the following static method:
@staticmethod
def get_posts(limit, offset, tags_filter=[]):
    posts = Post.all()
    for tag in tags_filter:
        if tag:
            posts.filter('tags =', tag)
    return posts.fetch(limit=limit, offset=offset)
This works well, although I've not stressed it too much.
The problem arises when I try to add a sort order to the get_posts method to keep the results ordered by "-created" date:
@staticmethod
def get_posts(limit, offset, tags_filter=[]):
    posts = Post.all()
    for tag in tags_filter:
        if tag:
            posts.filter('tags =', tag)
    posts.order("-created")
    return posts.fetch(limit=limit, offset=offset)
Adding the sort order requires a composite index for each combination of tag filters, leading to the dreaded exploding-indexes problem.
One last complication is that the get_posts method should provide some pagination mechanism.
Do you know any Strategy/Idea/Workaround/Hack to solve this problem?
As the datastore documentation puts it: "Queries involving keys use indexes just like queries involving properties. Queries on keys require custom indexes in the same cases as with properties, with a couple of exceptions: inequality filters or an ascending sort order on __key__ do not require a custom index, but a descending sort order on __key__ does."
So use a sortable date string for the primary key of the entity:
class Post(db.Model):
    title = db.StringProperty(required=True)
    link = db.LinkProperty(required=True)
    description = db.StringProperty(required=True)
    tags = db.ListProperty(str)
    created = db.DateTimeProperty(required=True, auto_now_add=True)

    @classmethod
    def create(cls, *args, **kw):
        kw.update(dict(key_name=inverse_microsecond_str() + disambig_chars()))
        return cls(*args, **kw)
...
def inverse_microsecond_str():
    # gives a string of 8 characters from ascii 23 to 'z' which sorts in reverse temporal order
    t = datetime.datetime.now()
    inv_us = int(1e16 - (time.mktime(t.timetuple()) * 1e6 + t.microsecond))  # no y2k for >100 yrs
    base_100_chars = []
    while inv_us:
        digit, inv_us = inv_us % 100, inv_us // 100
        base_100_chars = [chr(23 + digit)] + base_100_chars
    return "".join(base_100_chars)
Now, you don't even have to include a sort order in your queries, although it won't hurt to explicitly sort by key.
Things to remember:
This won't work unless you use the "create" here for all your Posts.
You'll have to migrate old data
No ancestors allowed.
The key is stored once per index, so it is worthwhile to keep it short; that's why I'm doing the base-100 encoding above.
This is not 100% reliable because of the possibility of key collisions. The above code, without disambig_chars, nominally gives reliability of the number of microseconds between transactions, so if you had 10 posts per second at peak times, it would fail 1/100,000. However, I'd shave off a couple orders of magnitude for possible app engine clock tick issues, so I'd actually only trust it for 1/1000. If that's not good enough, add disambig_chars; and if you need 100% reliability, then you probably shouldn't be on app engine, but I guess you could include logic to handle key collisions on save().
What if you inverted the relationship? Instead of a post with a list of tags you would have a tag entity with a list of posts.
class Tag(db.Model):
    tag = db.StringProperty()
    posts = db.ListProperty(db.Key, indexed=False)
To search for tags you would do tags = Tag.all().filter('tag IN', ['python','blog','async'])
This would hopefully give you 3 or more Tag entities, each with a list of posts that use that tag. You could then do post_union = set(tags[0].posts).intersection(tags[1].posts, tags[2].posts) to find the set of posts that have all the tags.
Then you could fetch those posts and order them by created (I think): Post.all().filter('__key__ IN', post_union).order("-created")
Note: This code is off the top of my head, I can't remember if you can manipulate sets like that.
Edit: #Yasser pointed out that you can only do IN queries for < 30 items.
Instead, you could have the key name for each post start with the creation time. Then you could sort the keys retrieved by the first query and just do Post.get(sorted_posts).
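A sketch of that idea, assuming each post's key_name starts with a sortable creation timestamp:

# key_name prefixes sort chronologically, so sorting the keys by name
# orders the posts without touching the 'created' property
sorted_keys = sorted(post_union, key=lambda k: k.name(), reverse=True)
posts = Post.get(sorted_keys)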
I don't know how this would scale to a system with millions of posts and/or tags.
Edit2: I meant set intersection, not union.
This question sounds similar to:
Data Modelling Advice for Blog Tagging system on Google App Engine
Mapping Data for a Google App Engine Blog Application:
parent->child relationships in appengine python (bigtable)
As pointed out by Robert Kluin in the last one, you could also consider using a pattern similar to the "Relation Index" described in this Google I/O presentation.
# Model definitions
class Article(db.Model):
    title = db.StringProperty()
    content = db.StringProperty()

class TagIndex(db.Model):
    tags = db.StringListProperty()

# Tag indexes are child entities of Articles
article1 = Article(title="foo", content="foo content")
article1.put()
TagIndex(parent=article1, tags=["hop"]).put()

# Get all articles for a given tag
tags = db.GqlQuery("SELECT __key__ FROM TagIndex WHERE tags = :1", "hop")
keys = [t.parent() for t in tags]
articles = db.get(keys)
Depending on how many posts you expect back from the tags query, sorting could either be done in memory or by making the date's string representation part of the Article key_name.
Updated with StringListProperty and sorting notes after Robert Kluin's and Wooble's comments on the #appengine IRC channel.
One workaround could be this:
Sort and concatenate a post's tags with a delimiter like | and store them as a StringProperty when storing a post. When you receive the tags_filter, you can sort and concatenate them to create a single StringProperty filter for the posts. Obviously this would be an AND query and not an OR query, but that's what your current code seems to be doing as well.
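A sketch of that workaround (tags_concat is a hypothetical StringProperty added to Post):

# at write time: canonicalize the tag list into a single string
post.tags_concat = '|'.join(sorted(post.tags))
post.put()

# at query time: the same canonicalization gives one equality filter
query = Post.all().filter('tags_concat =', '|'.join(sorted(tags_filter)))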
EDIT: as rightly pointed out, this would only match the exact tag list, not a partial tag list, which is obviously not very useful.
EDIT: what if you model your Post with boolean placeholders for tags, e.g. b1, b2, b3, etc.? When a new tag is defined, you can map it to the next available placeholder, e.g. blog=b1, python=b2, async=b3, and keep the mapping in a separate entity. When a tag is assigned to a post, you just switch its equivalent placeholder value to True.
This way, when you receive a tag_filter set, you can construct your query from the map, e.g.
Post.all().filter("b1 =", True).filter("b2 =", True).order('-created')
can give you all the posts which have the tags python and blog.
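A sketch of that mapping entity and the query construction (the names are hypothetical):

class TagMap(db.Model):
    tag = db.StringProperty()
    placeholder = db.StringProperty()  # 'b1', 'b2', ...

def posts_for_tags(tag_names):
    query = Post.all()
    # look up each tag's placeholder and AND the equality filters together
    for mapping in TagMap.all().filter('tag IN', tag_names):
        query.filter(mapping.placeholder + ' =', True)
    return query.order('-created')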