App Engine Django Form Uniqueness Validation?

Is there a simpler way to do uniqueness validation with Django Forms on App Engine?
I understand that enforcing a uniqueness constraint can hurt performance, but since the amount of data being added is very small, performance is not a big concern; development time is.
Any help is appreciated.

You can use keys for uniqueness:
The complete key of an entity, including the path, the kind and the name or numeric ID, is unique and specific to that entity. The complete key is assigned when the entity is created in the datastore, and none of its parts can change...
Every entity has an identifier. An application can assign its own identifier for use in the key by giving the instance constructor a key_name argument (a str value):
s = Story(key_name="xzy123")
...Once the entity has been created, its ID or name cannot be changed.
EDIT
As jbochi noted, this could be dangerous and you could lose data. Another way to achieve the same result is to use a hash function with sharded counters. A good example is shown in the "Paging through large datasets" article.
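To make the data-loss risk concrete, here is a minimal sketch that uses a plain in-memory dictionary as a stand-in for the datastore (InMemoryStore and its methods are hypothetical names, not App Engine APIs). It shows why writing an entity under an existing key name replaces the old one, and how an "insert only if absent" check avoids that. On the real datastore the check and the write would have to happen atomically, for example in a transaction or through db.Model.get_or_insert.

```python
class InMemoryStore:
    """Hypothetical stand-in for the datastore: maps key names to entities."""

    def __init__(self):
        self._entities = {}

    def put(self, key_name, entity):
        # Like a plain put(): silently replaces any entity with the same key.
        self._entities[key_name] = entity

    def insert_unique(self, key_name, entity):
        # Refuse to overwrite: return False when the key name is already taken.
        if key_name in self._entities:
            return False
        self._entities[key_name] = entity
        return True

    def get(self, key_name):
        return self._entities.get(key_name)


store = InMemoryStore()
first_ok = store.insert_unique("alice@example.com", {"name": "Alice"})
second_ok = store.insert_unique("alice@example.com", {"name": "Impostor"})
```

A form's validation step could call something like insert_unique and report a validation error when it returns False.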

Related

How to model a unique constraint in GAE ndb

I want to have several "bundles" (Mjbundle), which essentially are bundles of questions (Mjquestion). The Mjquestion has an integer "index" property which needs to be unique, but it should only be unique within the bundle containing it. I'm not sure how to model something like this properly, I try to do it using a structured (repeating) property below, but there is yet nothing actually constraining the uniqueness of the Mjquestion indexes. What is a better/normal/correct way of doing this?
class Mjquestion(ndb.Model):
    """This is a Mjquestion."""
    index = ndb.IntegerProperty(indexed=True, required=True)
    genre1 = ndb.IntegerProperty(indexed=False, required=True, choices=[1,2,3,4,5,6,7])
    genre2 = ndb.IntegerProperty(indexed=False, required=True, choices=[1,2,3])
    #(will add a bunch of more data properties later)
class Mjbundle(ndb.Model):
    """This is a Mjbundle."""
    mjquestions = ndb.StructuredProperty(Mjquestion, repeated=True)
    time = ndb.DateTimeProperty(auto_now_add=True)
(With the above model and having fetched a certain Mjbundle entity, I am not sure how to quickly fetch a Mjquestion from mjquestions based on the index. The explanation on filtering on structured properties looks like it works on the Mjbundle type level, whereas I already have a Mjbundle entity and was not sure how to quickly query only on the questions contained by that entity, without looping through them all "manually" in code.)
So I'm open to any suggestion on how to do this better.
I read this informational answer: https://stackoverflow.com/a/3855751/129202 It gives some thoughts about scalability and on a related note I will be expecting just a couple of bundles but each bundle will have questions in the thousands.
Maybe I should not use the mjquestions property of Mjbundle at all, but rather focus on parenting: each Mjquestion created should have a certain Mjbundle entity as parent. And then "manually" enforce uniqueness at "insert time" by doing an ancestor query.

When you use a StructuredProperty, all of the entities of that type are stored as part of the containing entity, so when you fetch your bundle, you have already fetched all of its questions. If you stick with this way of storing things, iterating over them in code is the way to check.
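As a rough illustration of "iterating to check in code", the sketch below uses plain dataclasses as stand-ins for the ndb models (Question, Bundle, find_question and add_question are hypothetical names chosen for this example). The same loops would apply to the list held in a repeated StructuredProperty.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    index: int
    genre1: int


@dataclass
class Bundle:
    questions: list = field(default_factory=list)


def find_question(bundle, index):
    """Return the question with the given index, or None if there is none."""
    for question in bundle.questions:
        if question.index == index:
            return question
    return None


def add_question(bundle, question):
    """Append a question, rejecting an index already used in this bundle."""
    if find_question(bundle, question.index) is not None:
        raise ValueError("index %d is already used in this bundle" % question.index)
    bundle.questions.append(question)


bundle = Bundle()
add_question(bundle, Question(index=1, genre1=3))
add_question(bundle, Question(index=2, genre1=5))
```

With thousands of questions per bundle, building a dict keyed by index once after fetching the bundle would avoid repeating the linear scan on every lookup.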

How to implement composition/aggregation with NDB on GAE

How do we implement aggregation or composition with NDB on Google App Engine? What is the best way to proceed depending on the use case?
Thanks!
I've tried to use a repeated property. In this very simple example, a Project has a list of Tag keys (I chose to code it this way instead of using StructuredProperty because many Project objects can share Tag objects).
class Project(ndb.Model):
    name = ndb.StringProperty()
    tags = ndb.KeyProperty(kind=Tag, repeated=True)
    budget = ndb.FloatProperty()
    date_begin = ndb.DateProperty(auto_now_add=True)
    date_end = ndb.DateProperty(auto_now_add=True)
    @classmethod
    def all(cls):
        return cls.query()
    @classmethod
    def addTags(cls, from_str):
        tagname_list = from_str.split(',')
        tag_list = []
        for tag in tagname_list:
            tag_list.append(Tag.addTag(tag))
        cls.tags = tag_list
--
Edited (2) :
Thanks. In the end, I chose to create a new Model class, 'Relation', representing a relation between two entities. It is really more of an association; I admit that my first design was ill-suited.

An alternative would be to use BigQuery. At first we used NDB, with a RawModel which stores individual, non-aggregated records, and an AggregateModel, which stores the aggregate values.
The AggregateModel was updated every time a RawModel was created, which caused some inconsistency issues. In hindsight, properly using parent/ancestor keys as Tim suggested would've worked, but in the end we found BigQuery much more pleasant and intuitive to work with.
We just have cronjobs that run everyday to push RawModel to BigQuery and another to create the AggregateModel records with data fetched from BigQuery.
(Of course, this is only effective if you have lots of data to aggregate)

It really does depend on the use case. For small numbers of items StructuredProperty and repeated properties may well be the best fit.
For large numbers of entities you will then look at setting the parent/ancestor in the Key for composition, and have a KeyProperty pointing to the primary entity in a many to one aggregation.
However the choice will also depend heavily on the actual use pattern as well. Then considerations of efficiency kick in.
The best I can suggest is to consider carefully how you plan to use these relationships: how active they are (i.e. are they constantly changing, with members being added and deleted), and whether you need to see all members of the relation most of the time or just subsets. These considerations may well require adjustments to the approach.

Python AppEngine Sort By Referenced Property

I have a model Entry
class Entry(db.Model):
    year = db.StringProperty()
    ...
and for whatever reason the last name field is stored in a different model LastName:
class LastName(db.Model):
    entry = db.ReferenceProperty(Entry, collection_name='last_names')
    last_name = db.StringProperty()
If I query Entry and sort it by year (or any other property) using .order() how would I then sort that by the last name? I'm new to python but coming from Java I would guess there's some kind of comparator equivalent; or I'm completely wrong and there's another way to do it. I for sure cannot change my model at this point in time, though that may be the solution later. Any suggestions?
EDIT: I'm currently paginating through the results using offsets (moving to cursors soon, but I think it would be the same issue). So if I try to sort outside of the datastore I would only be sorting the current set; it's possible that the first page will be all 'B's and the second page will have 'A's, so it will only be sorted by page not by overall set. Am I screwed the way my models are currently set up?

A few issues here.
There's no way to do this sorting directly in the datastore API, either in Python or Java - as you no doubt know, the datastore is non-relational, and indirect lookups like this aren't supported.
If this was just a straight one-to-one relationship, which gave you an accessor from the Entry entity to the LastName one, you could use the standard Python sort function to sort the list:
entries.sort(key=lambda e: e.last_name.last_name)
(note that this sorts the list in place but returns None, so don't try assigning from it).
However, this won't work, because what you've actually got here is a one-to-many relationship: there are potentially many LastNames for each Entry. The definition actually recognises this: the collection_name attribute, which defines the accessor from Entry to LastName, is called last_names, ie plural.
So what you're asking doesn't really make sense: which of the potentially many LastNames do you want to sort on? You can certainly do it the other way round - given a query of LastNames, sort by entry year - but given your current structure there's not really any way of doing it.
I must say though, although I don't know the rest of your models, I suspect you have actually got that relationship the wrong way round: the ReferenceProperty should probably live on Entry pointing to LastName rather than the other way round as it is now. Then it would simply be the sort call I gave above.
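The in-place sort mentioned above can be tried out with plain Python objects. This is only a sketch under the assumption that each entry exposes a single last name (the restructured one-to-one case); the Entry and LastName namedtuples here are stand-ins for the datastore models, and the sample data is made up.

```python
from collections import namedtuple

# Stand-ins for the models, assuming one LastName per Entry.
LastName = namedtuple("LastName", "last_name")
Entry = namedtuple("Entry", "year last_name")

entries = [
    Entry("2010", LastName("Baker")),
    Entry("2009", LastName("Adams")),
    Entry("2010", LastName("Adams")),
]

# list.sort() reorders the list in place and returns None.
result = entries.sort(key=lambda e: e.last_name.last_name)

# sorted() returns a new list; a tuple key sorts by year, then by last name.
by_year_then_name = sorted(entries, key=lambda e: (e.year, e.last_name.last_name))
```

Note that this only orders the entities already fetched into memory, so with pagination it sorts within a page, not across the whole result set.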

Is it safe to pass Google App Engine Entity Keys into web pages to maintain context?

I have a simple GAE system that contains models for Account, Project and Transaction.
I am using Django to generate a web page that has a list of Projects in a table that belong to a given Account and I want to create a link to each project's details page. I am generating a link that converts the Project's key to string and includes that in the link to make it easy to lookup the Project object. This gives a link that looks like this:
My Project Name
Is it secure to create links like this? Is there a better way? It feels like a bad way to keep context.
The key string shows up in the linked page and is ugly. Is there a way to avoid showing it?
Thanks.

There are a few examples in the GAE docs that use the same approach, and keys only use characters that are safe to include in URLs. So there is probably no problem.
BTW, I prefer to use the numeric ID (obj_key.id()) when my model uses a number as its identifier, simply because it looks less ugly.

Whether or not this is 'secure' depends on what you mean by that, and how you implement your app. Let's back off a bit and see exactly what's stored in a Key object. Take your key, go to shell.appspot.com, and enter the following:
db.Key(your_key)
this returns something like the following:
datastore_types.Key.from_path(u'TestKind', 1234, _app=u'shell')
As you can see, the key contains the App ID, the kind name, and the ID or name (along with the kind/id pairs of any parent entities - in this case, none). Nothing here you should be particularly concerned about concealing, so there shouldn't be any significant risk of information leakage here.
You mention as a concern that users could guess other URLs - that's certainly possible, since they could decode the key, modify the ID or name, and re-encode the key. If your security model relies on them not guessing other URLs, though, you might want to do one of a couple of things:
Reconsider your app's security model. You shouldn't rely on 'secret URLs' for any degree of real security if you can avoid it.
Use a key name, and set it to a long, random string that users will not be able to guess.
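One possible way to produce such a hard-to-guess key name is Python 3's standard secrets module. This is a sketch, not something taken from the App Engine docs: make_key_name is a hypothetical helper, and the 32-byte length is an assumption you may want to tune.

```python
import secrets


def make_key_name(num_bytes=32):
    """Return a random, URL-safe string suitable for use as a key name."""
    # token_urlsafe draws from the OS's cryptographic random source and
    # only emits letters, digits, '-' and '_'.
    return secrets.token_urlsafe(num_bytes)


key_name = make_key_name()
other_key_name = make_key_name()
```

The resulting string could then be passed as the key_name argument when the entity is created.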
A final concern is what else users could modify. If you handle keys by passing them to db.get, the user could change the kind name, and cause you to fetch a different entity kind to that which you intended. If that entity kind happens to have similarly named fields, you might do things to the entity (such as revealing data from it) that you did not intend. You can avoid this by passing the key to YourModel.get instead, which will check the key is of the correct kind before fetching it.
All this said, though, a better approach is to pass the key ID or name around. You can extract this by calling .id() on the key object (for an ID - .name() if you're using key names), and you can reconstruct the original key with db.Key.from_path('kind_name', id) - or just fetch the entity directly with YourModel.get_by_id.

After doing some more research, I think I can now answer my own question. I wanted to know if using GAE keys or ids was inherently unsafe.
It is, in fact, unsafe without some additional code, since a user could modify URLs in the returned web page or visit URLs that they build manually. This would potentially let an authenticated user edit another user's data just by changing a key ID in a URL.
So for every resource that you allow access to, you need to ensure that the currently authenticated user has the right to be accessing it in the way they are attempting.
This involves writing extra queries for each operation, since it seems there is no built-in way to just say "Users only have access to objects that are owned by them".
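A minimal sketch of such a per-request ownership check follows. It uses an in-memory dict in place of the datastore, and the names (PROJECTS, Forbidden, get_project_for_user) and the "owner" field are assumptions made for illustration only.

```python
class Forbidden(Exception):
    """Raised when the current user may not access the requested resource."""


# Stand-in for stored Project entities, keyed by numeric ID.
PROJECTS = {
    1001: {"name": "Garden", "owner": "alice"},
    7001: {"name": "Garage", "owner": "bob"},
}


def get_project_for_user(project_id, current_user):
    """Fetch a project only if it exists and belongs to current_user."""
    project = PROJECTS.get(project_id)
    # Treat "missing" and "not yours" the same way, so the response does
    # not reveal which IDs exist.
    if project is None or project["owner"] != current_user:
        raise Forbidden("project %r is not accessible" % project_id)
    return project


own_project = get_project_for_user(1001, "alice")
```

Every handler that takes an ID or key from the URL would go through a check like this before reading or modifying the entity.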

I know this is an old post, but I want to clarify one thing. Sometimes you NEED to work with keys.
When you have an entity with a @Parent relationship, you can't get it by its ID alone; you need the whole key to get it back from the Datastore. In these cases you need to work with the key all the time if you want to retrieve your entity.

They aren't simply increasing; I only have 10 entries in my Datastore and I've already reached 7001.
As long as there is some form of protection so users can't simply guess them, there is no reason not to do it.

Django : load a restricted set of fields of objects loaded using a foreign key

I have the following code, using Django ORM
routes = Routes.objects.filter(scheduleid=schedule.id).only('externalid')
t_list = [(route.externalid, route.vehicle.name) for route in routes]
and it is very slow, because the vehicle objects are huge (dozens of fields, and I cannot change that, it is coming from a legacy database). A lot of time is devoted to create the Vehicle objects, while I only need the name field of this object.
Is there a more efficient way to obtain t_list ? I am looking for something like only() for accessing objects through a foreign key.
EDIT :
the solution is the following :
routes = Routes.objects.filter(scheduleid=schedule.id).select_related("vehicle")
routes = routes.only('externalid', 'vehicle__name')
Does there exist something similar ?

You should be able to do this, I think. Tested using local models; the generated query looked good.
routes = Routes.objects.select_related('vehicle').filter(**conditions).only(
    'externalid', 'vehicle__name')
For this to work, there should be a vehicle foreign key field declared in the Routes model, because select_related() only follows forward relationships.

You can try the following:
Routes.objects.filter(scheduleid__id=schedule.id).values('externalid', 'vehicle__name')
