Store unsaved model instance in session - python

I'm trying to store several unsaved model entries in the Django session. I would like it to work something like this:
KEY = "FOOBAR"

def save_entry_to_session(new_entry, session):
    items = deserialize(session.get(KEY))
    items.append(new_entry)
    session[KEY] = serialize(items)
I have looked into https://docs.djangoproject.com/en/dev/topics/serialization/, but the DeserializedObject didn't really play along and seemed like unnecessary overhead. Is there a better way to handle this? Should I use pickle, or is that unsafe?
Standard use case: save several items one at a time in the session, then save none, one, or more of them based on user actions.

I ended up using Django's model serializers; the DeserializedObject was no obstacle. Using pickle would, in this instance, have been easier and safe (since it would never parse user-submitted data), but I went with the safer choice to avoid risking exposure later.
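Roughly, what I ended up with looks like this (a sketch reusing the question's names; the .object attribute on each DeserializedObject gives back the plain, unsaved instance):

from django.core import serializers

KEY = "FOOBAR"

def save_entry_to_session(new_entry, session):
    raw = session.get(KEY)
    # Unwrap each DeserializedObject to recover the model instances.
    items = [d.object for d in serializers.deserialize("json", raw)] if raw else []
    items.append(new_entry)
    session[KEY] = serializers.serialize("json", items)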

I don't know if this is a new feature since you asked this question, but in Django 1.10 right now I am able to store an unsaved model in the session just by direct assignment, and it comes back out just fine without any extra work. Just request.session['thing'] = instance and then later instance = request.session['thing'].

Related

SQLAlchemy: use related object when session is closed

I have many models with relationship links to each other which I have to use. My code is very complicated, so I cannot keep the session alive after a query. Instead, I try to preload all the objects:
def db_get_structure():
    with Session(my_engine) as session:
        deps = {x.id: x for x in session.query(Department).all()}
        ...
    return (deps, ...)
def some_logic(id):
    struct = db_get_structure()
    return some_other_logic(struct.deps[id].owner)
However, I get the following error anyway regardless of the fact that all the objects are already loaded:
sqlalchemy.orm.exc.DetachedInstanceError: Parent instance <Department at 0x10476e780> is not bound to a Session; lazy load operation of attribute 'owner' cannot proceed
Is it possible to link the preloaded objects with each other so that the relationships still work after the session is closed?
I know about eager loading (.options(joinedload(...))), but this approach leads to more code and bigger DB requests, and I think it should be solvable more simply, because all the objects are already loaded into Python.
It's even possible now to reach the related object as struct.deps[struct.deps[id].owner_id], but I think the ORM should do this for me and provide the shorter notation struct.deps[id].owner using some kind of "cached load".
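For illustration, the joinedload version I mentioned would look roughly like this (a sketch; the owner relationship name on Department is an assumption):

from sqlalchemy.orm import Session, joinedload

def db_get_structure_eager():
    with Session(my_engine) as session:
        # Fetch each Department together with its owner in one query,
        # so the relationship stays usable after the session closes.
        query = session.query(Department).options(joinedload(Department.owner))
        deps = {x.id: x for x in query.all()}
    return deps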
Whenever you access an attribute on a DB entity that has not yet been loaded from the DB, SQLAlchemy will issue an implicit SQL statement to the DB to fetch that data. My guess is that this is what happens when you issue struct.deps[struct.deps[id].owner_id].
If the object in question has been removed from the session it is in a "detached" state and SQLAlchemy protects you from accidentally running into inconsistent data. In order to work with that object again it needs to be "re-attached".
I've done this already fairly often with session.merge:
attached_object = new_session.merge(detached_object)
But this will reconcile the object instance with the DB and potentially issue updates to the DB if necessary. The detached_object is taken as the "truth".
I believe you can do the reverse (attaching it by reading from the DB instead of writing to it) by using session.refresh(detached_object), but I need to verify this. I'll update the post if I find something.
Both ways have to talk to the DB with at least a select to ensure the data is consistent.
In order to avoid loading, issue session.merge(..., load=False). But this has some very important caveats. Have a look at the docs of session.merge() for details.
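A sketch of that no-load re-attachment (names here are mine, not from the question):

from sqlalchemy.orm import Session

def reattach_all(engine, detached_objects):
    # Merge each detached object into a fresh session without emitting
    # SELECTs; this trusts that the detached copies are complete and
    # current (see the session.merge() caveats).
    session = Session(engine)
    return session, [session.merge(obj, load=False) for obj in detached_objects]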
I will need to read up on the link you added concerning your "complicated code". I would like to understand why you need to throw away your session the way you do. Maybe there is an easier way?

Save form data on every step using Django FormWizard

Background
I'm building a very large form to process customer submissions, so the end goal is to allow the user to resume the form where they left off at a later date. The form is fully functional using a FormWizard (NamedUrlSessionWizardView, actually). The Django docs mention a final save is accomplished in the done method, and leave this as an exercise to the reader. This works OK if the user completes this in one sitting, but not if you want to restore this later.
In my case, an email address is used to lookup past progress, and send a unique link to the user. This sets up the form and returns the user to where they left off. This works fine as long as your session is still valid, but not if it isn't (different computer, etc). What I would like to do is save the form data (these are ModelForms) after each step. I'll restore the user's state when they return.
Research
This answer is about the only solution I can find, but the solution is the same thing that the standard FormWizard.post() method does:
if form.is_valid():
    # if the form is valid, store the cleaned data and files.
    self.storage.set_step_data(self.steps.current, self.process_step(form))
    self.storage.set_step_files(self.steps.current, self.process_step_files(form))
My Question
What is the proper way/place in a FormWizard to take action on, and save the form data out after each step?
You should be able to save each step's ModelForm data as you go along by simply writing it into the post method.
if self.steps.current == "form1":
    data = self.request.POST["form1-response"]
    user = CustomerModel.objects.get(id=self.request.user.id)
    user.response = data
    user.form_step = "form1"
    user.save()
form_step, in this case, is simply a bookmark that you can use to direct the user back to the right step on their return. You should remove any already-saved fields from the done method, so they don't get overwritten.
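For context, the snippet above would sit inside a post() override, roughly like this (a sketch; CustomerModel, the "form1" step name, and the response field are carried over from the snippet above, and the import path depends on your Django/formtools version):

from formtools.wizard.views import NamedUrlSessionWizardView

class CustomerWizard(NamedUrlSessionWizardView):
    def post(self, *args, **kwargs):
        form = self.get_form(data=self.request.POST, files=self.request.FILES)
        if form.is_valid() and self.steps.current == "form1":
            # Persist this step immediately so progress survives the session.
            user = CustomerModel.objects.get(id=self.request.user.id)
            user.response = self.request.POST["form1-response"]
            user.form_step = "form1"  # bookmark for resuming later
            user.save()
        return super().post(*args, **kwargs)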
If you do it this way, you may need to construct a dispatch method that rebuilds the management form when the user logs back in.
Alternatively, you might be able to get away with saving the user's session (or the relevant parts) into a session field on the model, then write a dispatch method for the SessionWizardView that injects the relevant information back in. I've never attempted it, but if you can get it to work, it might be preferable from an aesthetic standpoint depending on how many steps you have to cover.
Finally, if you can rely on your users not to clear their cookies and to use the same browser when they return, you can maybe cheat and use persistent cookies.
Hopefully that will get you started. I'd be interested to see how you end up getting it to work. Good luck!

How can I quickly set a field of all instances of a Django model at once?

To clarify, I've got several thousand Property items, each with a 'present' field (among others). To reset the system for use again, I need to set every item's 'present' field to False. Now, of course there's the easy way to do it, which is just:
for obj in Property.objects.all():
    obj.present = False
    obj.save()
But this takes nearly 30 seconds on my development server. I feel there must be a better way, so I tried limiting the loaded fields with Django's only() queryset method:
for obj in Property.objects.only('present'):
    obj.present = False
    obj.save()
For whatever reason, this actually takes longer than just getting the entire object.
Because I need to indiscriminately set all of these values to False, is there a faster way? This function takes no user input other than the 'go do it' command, so I feel a native SQL command would be a safe option, but I don't know SQL well enough to draft such a command.
Thanks everyone.
Use the update query:
Property.objects.all().update(present=False)
Note that the update() query runs at the SQL level, so if your model has a custom save() method it will not be called here (and pre_save/post_save signals won't fire either). In that case, the normal for-loop version that you're using is the way to go.
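If the model does have a custom save() that must run, a middle ground keeps the loop but writes back only the changed column (update_fields is a standard argument to Model.save()):

for obj in Property.objects.only('pk', 'present'):
    obj.present = False
    # Still one UPDATE per row, but save() logic runs and only
    # the present column is written back.
    obj.save(update_fields=['present'])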

Django models - assign id instead of object

I apologize if my question turns out to be silly, but I'm rather new to Django, and I could not find an answer anywhere.
I have the following model:
class BlackListEntry(models.Model):
    user_banned = models.ForeignKey(auth.models.User, related_name="user_banned")
    user_banning = models.ForeignKey(auth.models.User, related_name="user_banning")
Now, when I try to create an object like this:
BlackListEntry.objects.create(user_banned=int(user_id),user_banning=int(banning_id))
I get the following error:
Cannot assign "1": "BlackListEntry.user_banned" must be a "User" instance.
Of course, if I replace it with something like this:
user_banned = User.objects.get(pk=user_id)
user_banning = User.objects.get(pk=banning_id)
BlackListEntry.objects.create(user_banned=user_banned,user_banning=user_banning)
everything works fine. The question is:
Does my solution hit the database to retrieve both users, and if so, is it possible to avoid that by just passing ids?
The answer to your question is: YES.
Django will hit the database (at least) three times: twice to retrieve the two User objects and a third time to commit your desired information. This causes absolutely unnecessary overhead.
Just try:
BlackListEntry.objects.create(user_banned_id=int(user_id),user_banning_id=int(banning_id))
This is the default naming pattern for the FK columns generated by the Django ORM. This way you can set the information directly and avoid the extra queries.
If you wanted to query for the already saved BlackListEntry objects, you can navigate the attributes with a double underscore, like this:
BlackListEntry.objects.filter(user_banned__id=int(user_id),user_banning__id=int(banning_id))
This is how you access attributes in Django querysets: with a double underscore, which then lets you compare against the value of the attribute.
Though they look very similar, the two work completely differently. The first sets an attribute directly, while the second is parsed by Django, which splits it at the '__' and builds the query accordingly, treating the part after the underscores as the name of an attribute on the related model.
You can always compare user_banned and user_banning with the actual User objects, instead of their ids. But there is no use for this if you don't already have those objects with you.
Hope it helps.
I do believe that when you fetch the users, it is going to hit the db...
To avoid it, you would have to write raw SQL to do the update, using the method described here:
https://docs.djangoproject.com/en/dev/topics/db/sql/
If you decide to go that route, keep in mind you are responsible for protecting yourself from SQL injection attacks.
Another alternative would be to cache the user_banned and user_banning objects.
But in all likelihood, simply grabbing the users and creating the BlackListEntry won't cause you any noticeable performance problems. Caching or executing raw SQL will only provide a small benefit. You're probably going to run into other issues before this becomes a problem.

Attribute Cache in Django - What's the point?

I was just looking over EveryBlock's source code and I noticed this code in the alerts/models.py code:
def _get_user(self):
    if not hasattr(self, '_user_cache'):
        from ebpub.accounts.models import User
        try:
            self._user_cache = User.objects.get(id=self.user_id)
        except User.DoesNotExist:
            self._user_cache = None
    return self._user_cache
user = property(_get_user)
I've noticed this pattern around a bunch, but I don't quite understand the use. Is the whole idea to make sure that when accessing the FK on self (self = alert object), you only grab the user object once from the db? Why wouldn't you just rely upon the db caching and Django's ForeignKey() field? I noticed that the model definition only holds the user id and not a foreign key field:
class EmailAlert(models.Model):
    user_id = models.IntegerField()
    ...
Any insights would be appreciated.
I don't know why this is an IntegerField; it looks like it definitely should be a ForeignKey(User) field. Because of that, you also lose things like select_related().
As to the caching, many databases don't cache results--they (or rather, the OS) will cache the data on disk needed to get the result, so looking it up a second time should be faster than the first, but it'll still take work.
It also still takes a database round-trip to look it up. In my experience with Django, a single item lookup takes around 0.5 to 1 ms: the SQL command to a local PostgreSQL server plus the sometimes nontrivial overhead of the QuerySet machinery. 1 ms is a lot if you don't need it; do that a few times and you turn a 30 ms request into a 35 ms request.
If your SQL server isn't local and you actually have network round-trips to deal with, the numbers get bigger.
Finally, people generally expect accessing a property to be fast; when they're complex enough to cause SQL queries, caching the result is generally a good idea.
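As an aside, the same memoization pattern can be written more compactly with Django's cached_property (a sketch, not EveryBlock's actual code):

from django.db import models
from django.utils.functional import cached_property

class EmailAlert(models.Model):
    user_id = models.IntegerField()

    @cached_property
    def user(self):
        # Evaluated once per instance, then stored on the instance;
        # later accesses never touch the database.
        from ebpub.accounts.models import User
        try:
            return User.objects.get(id=self.user_id)
        except User.DoesNotExist:
            return None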
Although databases do cache things internally, there's still an overhead in going back to the db every time you want to check the value of a related field - setting up the query within Django, the network latency in connecting to the db and returning the data over the network, instantiating the object in Django, etc. If you know the data hasn't changed in the meantime - and within the context of a single web request you probably don't care if it has - it makes much more sense to get the data once and cache it, rather than querying it every single time.
One of the applications I work on has an extremely complex home page containing a huge amount of data. Previously it was carrying out over 400 db queries to render. I've refactored it now so it 'only' uses 80, using very similar techniques to the one you've posted, and you'd better believe that it gives a massive performance boost.
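To make the first answer's point about select_related() concrete: if user were a real ForeignKey, batching the lookups would be a one-line change (hypothetical model, not EveryBlock's; notify() is a placeholder):

# Hypothetical: user = models.ForeignKey(User, on_delete=models.CASCADE)
# One JOINed query fetches every alert together with its user:
for alert in EmailAlert.objects.select_related('user'):
    notify(alert.user)  # no per-row query happens here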
