I apologize if my question turns out to be silly, but I'm rather new to Django, and I could not find an answer anywhere.
I have the following model:
class BlackListEntry(models.Model):
user_banned = models.ForeignKey(auth.models.User,related_name="user_banned")
user_banning = models.ForeignKey(auth.models.User,related_name="user_banning")
Now, when i try to create an object like this:
BlackListEntry.objects.create(user_banned=int(user_id),user_banning=int(banning_id))
I get a following error:
Cannot assign "1": "BlackListEntry.user_banned" must be a "User" instance.
Of course, if i replace it with something like this:
user_banned = User.objects.get(pk=user_id)
user_banning = User.objects.get(pk=banning_id)
BlackListEntry.objects.create(user_banned=user_banned,user_banning=user_banning)
everything works fine. The question is:
Does my solution hit the database to retrieve both users, and if yes, is it possible to avoid it, just passing ids?
The answer to your question is: YES.
Django will hit the database (at least) 3 times, 2 to retrieve the two User objects and a third one to commit your desired information. This will cause an absolutelly unnecessary overhead.
Just try:
BlackListEntry.objects.create(user_banned_id=int(user_id),user_banning_id=int(banning_id))
These is the default name pattern for the FK fields generated by Django ORM. This way you can set the information directly and avoid the queries.
If you wanted to query for the already saved BlackListEntry objects, you can navigate the attributes with a double underscore, like this:
BlackListEntry.objects.filter(user_banned__id=int(user_id),user_banning__id=int(banning_id))
This is how you access properties in Django querysets. with a double underscore. Then you can compare to the value of the attribute.
Though very similar, they work completely different. The first one sets an atribute directly while the second one is parsed by django, that splits it at the '__', and query the database the right way, being the second part the name of an attribute.
You can always compare user_banned and user_banning with the actual User objects, instead of their ids. But there is no use for this if you don't already have those objects with you.
Hope it helps.
I do believe that when you fetch the users, it is going to hit the db...
To avoid it, you would have to write the raw sql to do the update using method described here:
https://docs.djangoproject.com/en/dev/topics/db/sql/
If you decide to go that route keep in mind you are responsible for protecting yourself from sql injection attacks.
Another alternative would be to cache the user_banned and user_banning objects.
But in all likelihood, simply grabbing the users and creating the BlackListEntry won't cause you any noticeable performance problems. Caching or executing raw sql will only provide a small benefit. You're probably going to run into other issues before this becomes a problem.
Related
I have many models with relational links to each other which I have to use. My code is very complicated so I cannot keep session alive after a query. Instead, I try to preload all the objects:
def db_get_structure():
with Session(my_engine) as session:
deps = {x.id: x for x in session.query(Department).all()}
...
return (deps, ...)
def some_logic(id):
struct = db_get_structure()
return some_other_logic(struct.deps[id].owner)
However, I get the following error anyway regardless of the fact that all the objects are already loaded:
sqlalchemy.orm.exc.DetachedInstanceError: Parent instance <Department at 0x10476e780> is not bound to a Session; lazy load operation of attribute 'owner' cannot proceed
Is it possible to link preloaded objects with each other so that the relations will work after session get closed?
I know about joined queries (.options(joinedload(), but this approach leads to more code lines and bigger DB request, and I think this should be solved simpler, because all the objects are already loaded into Python objects.
It's even possible now to request the related objects like struct.deps[struct.deps[id].owner_id], but I think the ORM should do this and provide shorter notation struct.deps[id].owner using some "cached load".
Whenever you access an attribute on a DB entity that has not yet been loaded from the DB, SQLAlchemy will issue an implicit SQL statement to the DB to fetch that data. My guess is that this is what happens when you issue struct.deps[struct.deps[id].owner_id].
If the object in question has been removed from the session it is in a "detached" state and SQLAlchemy protects you from accidentally running into inconsistent data. In order to work with that object again it needs to be "re-attached".
I've done this already fairly often with session.merge:
attached_object = new_session.merge(detached_object)
But this will reconile the object instance with the DB and potentially issue updates to the DB if necessary. The detached_object is taken as "truth".
I believe you can do the reverse (attaching it by reading from the DB instead of writing to it) by using session.refresh(detached_object), but I need to verify this. I'll update the post if I found something.
Both ways have to talk to the DB with at least a select to ensure the data is consistent.
In order to avoid loading, issue session.merge(..., load=False). But this has some very important cavetas. Have a look at the docs of session.merge() for details.
I will need to read up on your link you added concerning your "complicated code". I would like to understand why you need to throw away your session the way you do it. Maybe there is an easier way?
I cant find "best" solution for very simple problem(or not very)
Have classical set of data: posts that attached to users, comments that attached to post and to user.
Now i can't decide how to build scheme/classes
On way is to store user_id inside comments and inside.
But what happens when i have 200 comments on page?
Or when i have N posts on page?
I mean it should be 200 additional requests to database to display user info(such as name,avatar)
Another solution is to embed user data into each comment and each post.
But first -> it is huge overhead, second -> model system is getting corrupted(using mongoalchemy), third-> user can change his info(like avatar). And what then? As i understand update operation on huge collections of comments or posts is not simple operation...
What would you suggest? Is 200 requests per page to mongodb is OK(must aim for performance)?
Or may be I am just missing something...
You can avoid the N+1-problem of hundreds of requests using $in-queries. Consider this:
Post {
PosterId: ObjectId
Text: string
Comments: [ObjectId, ObjectId, ...] // option 1
}
Comment {
PostId: ObjectId // option 2 (better)
Created: dateTime,
AuthorName: string,
AuthorId: ObjectId,
Text: string
}
Now you can find the posts comments with an $in query, and you can also easily find all comments made by a specific author.
Of course, you could also store the comments as an embedded array in post, and perform an $in query on the user information when you fetch the comments. That way, you don't need to de-normalize user names and still don't need hundreds of queries.
If you choose to denormalize the user names, you will have to update all comments ever made by that user when a user changes e.g. his name. On the other hand, if such operations don't occur very often, it shouldn't be a big deal. Or maybe it's even better to store the name the user had when he made the comment, depending your requirements.
A general problem with embedding is that different writers will write to the same object, so you will have to use the atomic modifiers (such as $push). This is sometimes harder to use with mappers (I don't know mongoalchemy though), and generally less flexible.
What I would do with mongodb would be to embed the user id into the comments (which are part of the structure of the "post" document).
Three simple hints for better performances:
1) Make sure to ensure an index on the user_id
2) Use comment pagination method to avoid querying 200 times the database
3) Caching is your friend
You could cache your user objects so you don't have to query the database each time.
I like the idea of embedding user data into each post but then you have to think about what happens when a user's profile is updated? have to make sure that no post is missed.
I would recommend starting out just by skimming how mongo recommends you handle schemas.
Generally, for "contains" relationships between entities,
embedding should be be chosen. Use linking when not using linking would result in
duplication of data.
http://www.mongodb.org/display/DOCS/Schema+Design
There's a pretty good use case from the MongoDB docs: http://docs.mongodb.org/manual/use-cases/storing-comments/
Conveniently it's also written in Python :-)
I have a model Entry
class Entry(db.Model):
year = db.StringProperty()
.
.
.
and for whatever reason the last name field is stored in a different model LastName:
class LastName(db.Model):
entry = db.ReferenceProperty(Entry, collection_name='last_names')
last_name = db.StringProperty()
If I query Entry and sort it by year (or any other property) using .order() how would I then sort that by the last name? I'm new to python but coming from Java I would guess there's some kind of comparator equivalent; or I'm completely wrong and there's another way to do it. I for sure cannot change my model at this point in time, though that may be the solution later. Any suggestions?
EDIT: I'm currently paginating through the results using offsets (moving to cursors soon, but I think it would be the same issue). So if I try to sort outside of the datastore I would only be sorting the current set; it's possible that the first page will be all 'B's and the second page will have 'A's, so it will only be sorted by page not by overall set. Am I screwed the way my models are currently set up?
A few issues here.
There's no way to do this sorting directly in the datastore API, either in Python or Java - as you no doubt know, the datastore is non-relational, and indirect lookups like this aren't supported.
If this was just a straight one-to-one relationship, which gave you an accessor from the Entry entity to the LastName one, you could use the standard Python sort function to sort the list:
entries.sort(key=lambda e: e.last_name.last_name)
(note that this sorts the list in place but returns None, so don't try assigning from it).
However, this won't work, because what you've actually got here is a one-to-many relationship: there are potentially many LastNames for each Entry. The definition actually recognises this: the collection_name attribute, which defines the accessor from Entry to LastName, is called last_names, ie plural.
So what you're asking doesn't really make sense: which of the potentially many LastNames do you want to sort on? You can certainly do it the other way round - given a query of LastNames, sort by entry year - but given your current structure there's not really any way of doing it.
I must say though, although I don't know the rest of your models, I suspect you have actually got that relationship the wrong way round: the ReferenceProperty should probably live on Entry pointing to LastName rather than the other way round as it is now. Then it would simply be the sort call I gave above.
I have a simple GAE system that contains models for Account, Project and Transaction.
I am using Django to generate a web page that has a list of Projects in a table that belong to a given Account and I want to create a link to each project's details page. I am generating a link that converts the Project's key to string and includes that in the link to make it easy to lookup the Project object. This gives a link that looks like this:
My Project Name
Is it secure to create links like this? Is there a better way? It feels like a bad way to keep context.
The key string shows up in the linked page and is ugly. Is there a way to avoid showing it?
Thanks.
There is few examples, in GAE docs, that uses same approach, and also Key are using characters safe for including in URLs. So, probably, there is no problem.
BTW, I prefer to use numeric ID (obj_key.id()), when my model uses number as identifier, just because it's looks not so ugly.
Whether or not this is 'secure' depends on what you mean by that, and how you implement your app. Let's back off a bit and see exactly what's stored in a Key object. Take your key, go to shell.appspot.com, and enter the following:
db.Key(your_key)
this returns something like the following:
datastore_types.Key.from_path(u'TestKind', 1234, _app=u'shell')
As you can see, the key contains the App ID, the kind name, and the ID or name (along with the kind/id pairs of any parent entities - in this case, none). Nothing here you should be particularly concerned about concealing, so there shouldn't be any significant risk of information leakage here.
You mention as a concern that users could guess other URLs - that's certainly possible, since they could decode the key, modify the ID or name, and re-encode the key. If your security model relies on them not guessing other URLs, though, you might want to do one of a couple of things:
Reconsider your app's security model. You shouldn't rely on 'secret URLs' for any degree of real security if you can avoid it.
Use a key name, and set it to a long, random string that users will not be able to guess.
A final concern is what else users could modify. If you handle keys by passing them to db.get, the user could change the kind name, and cause you to fetch a different entity kind to that which you intended. If that entity kind happens to have similarly named fields, you might do things to the entity (such as revealing data from it) that you did not intend. You can avoid this by passing the key to YourModel.get instead, which will check the key is of the correct kind before fetching it.
All this said, though, a better approach is to pass the key ID or name around. You can extract this by calling .id() on the key object (for an ID - .name() if you're using key names), and you can reconstruct the original key with db.Key.from_path('kind_name', id) - or just fetch the entity directly with YourModel.get_by_id.
After doing some more research, I think I can now answer my own question. I wanted to know if using GAE keys or ids was inherently unsafe.
It is, in fact, unsafe without some additional code, since a user could modify URLs in the returned webpage or visit URL that they build manually. This would potentially let an authenticated user edit another user's data just by changing a key Id in a URL.
So for every resource that you allow access to, you need to ensure that the currently authenticated user has the right to be accessing it in the way they are attempting.
This involves writing extra queries for each operation, since it seems there is no built-in way to just say "Users only have access to objects that are owned by them".
I know this is an old post, but i want to clarify one thing. Sometimes you NEED to work with KEYs.
When you have an entity with a #Parent relationship, you cant get it by its ID, you need to use the whole KEY to get it back form the Datastore. In these cases you need to work with the KEY all the time if you want to retrieve your entity.
They aren't simply increasing; I only have 10 entries in my Datastore and I've already reached 7001.
As long as there is some form of protection so users can't simply guess them, there is no reason not to do it.
I have a super simple django model here:
class Notification(models.Model):
message = models.TextField()
user = models.ForeignKey(User)
timestamp = models.DateTimeField(default=datetime.datetime.now)
Using ajax, I check for new messages every minute. I only show the five most recent notifications to the user at any time. What I'm trying to avoid, is the following scenario.
User logs in and has no notifications. While the user's window is up, he receives 10 new messages. Since I'm only showing him five, no big deal. The problem happens when the user starts to delete his notifications. If he deletes the five that are displayed, the five older ones will be displayed on the next ajax call or refresh.
I'd like to have my model's save method delete everything but the 5 most recent objects whenever a new one is saved. Unfortunately, you can't use [5:] to do this. Help?
EDIT
I tried this which didn't work as expected (in the model's save method):
notes = Notification.objects.filter(user=self.user)[:4]
Notification.objects.exclude(pk__in=notes).delete()
i couldn't find a pattern in strange behavior, but after a while of testing, it would only delete the most recent one when a new one was created. i have NO idea why this would be. the ordering is taken care of in the model's Meta class (by timestamp descending). thanks for the help, but my way seems to be the only one that works consistently.
This is a bit old, but I believe you can do the following:
notes = Notification.objects.filter(user=self.user)[:4]
Notification.objects.exclude(pk__in=list(notes)).delete() # list() forces a database hit.
It costs two hits, but avoids using the for loop with transactions middleware.
The reason for using list(notes) is that Django creates a single query without it and, in Mysql 5.1, this raises the error
(1235, "This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'")
By using list(notes), we force a query of notes, avoiding this.
This can be further optimized to:
notes = Notification.objects.filter(user=self.user)[:4].values_list("id", flat=True) # only retrieve ids.
Notification.objects.exclude(pk__in=list(notes)).delete()
Use an inner query to get the set of items you want to keep and then filter on them.
objects_to_keep = Notification.objects.filter(user=user).order_by('-created_at')[:5]
Notification.objects.exclude(pk__in=objects_to_keep).delete()
Double check this before you use it. I have found that simpler inner queries do not always behave as expected. The strange behavior I have experienced has been limited to querysets that are just an order_by and a slice. Since you will have to filter on user, you should be fine.
this is how i ended up doing this.
notes = Notification.objects.filter(user=self.user)
for note in notes[4:]:
note.delete()
because i'm doing this in the save method, the only way the loop would ever have to run more than once would be if the user got multiple notifications at once. i'm not worried about that happening (while it may happen it's not likely to be enough to cause a problem).