Django get_query_set override is being cached

I'm overriding Django's get_query_set function on one of my models dynamically. I'm doing this to forcibly filter the original query set returned by Model.objects.all/filter/get by a "scenario" value, using a decorator. Here's the decorator's function:
# Get the base QuerySet for this model before we modify its
# QuerySet manager. This prevents infinite recursion, since the
# get_query_set function doesn't rely on itself to get this base QuerySet.
all_expense_objects = Expense.objects.all()
# Figure out which scenario the user is using.
current_scenario = Scenario.objects.get(user=request.user, selected=True)
# Modify the imported Expense class to filter based on the current scenario.
Expense.objects.get_query_set = lambda: all_expense_objects.filter(scenario=current_scenario)
# Call the view that was initially supposed to
# be executed before we were so rudely interrupted.
return view(request, **arguments)
I'm doing this to DRY up the code, so that all of my queries aren't littered with an additional filter. However, if the scenario changes, no objects are being returned. If I kill all of the Python processes on my server, the objects for the newly selected scenario appear. I'm thinking that it's caching the modified class, and then when the scenario changes, it's applying another filter that will never make sense, since objects can only have one scenario at a time.
This hasn't been an issue with user-based filters because the user never changes for my session. Is Passenger doing something stupid to hold onto class objects between requests? Should I be bailing on this weird design pattern and just implement these filters on a per-view basis? There must be a best practice for DRYing up filters that apply across many views based on something dynamic, like the current user.

What about creating a Manager for the model that takes the user as an argument and does this filtering? My understanding of staying DRY with Django querysets is to use a model Manager:
#### view code:
def some_view(request):
    expenses = Expense.objects.filter_by_cur_scenario(request.user)
    # add additional filters here, or add to the manager via more params
    expenses = expenses.filter(something_else=True)
#### models code:
class ExpenseManager(models.Manager):
    def filter_by_cur_scenario(self, user):
        current_scenario = Scenario.objects.get(user=user, selected=True)
        return self.filter(scenario=current_scenario)

class Expense(models.Model):
    objects = ExpenseManager()
Also, one quick caveat on the manager (which may apply to overriding get_query_set as well): foreign relationships will not take this filtering into account. For example, if you override the MyObject.objects.filter() method to always filter out deleted rows, a model with a ForeignKey to MyObject won't use that filtering when the relation is traversed (at least from what I understand -- someone please correct me if I'm wrong).

I was hoping to have this implementation happen without having to write anything in other views. Essentially, after the class is imported, I want to modify it so that no matter where it's referenced via Expense.objects.get/filter/all, it has already been filtered. As a result, there is no implementation required for any of the other views; it's completely transparent. And even where it's used as a ForeignKey, objects retrieved through the aforementioned Expense.objects.get/filter/all should be filtered as well.
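For what it's worth, the usual way to get that kind of transparent, per-request filtering without assigning lambdas to the manager is a custom default manager that reads the current scenario from a thread-local set by middleware. This is only a sketch of that pattern, not part of the original post; the middleware class, the _scenario_state name, and the surrounding imports are assumptions:

import threading
from django.db import models

_scenario_state = threading.local()

class CurrentScenarioMiddleware(object):
    # Old-style middleware (MIDDLEWARE_CLASSES); runs once per request, so the
    # scenario is re-read even if Passenger keeps the process alive.
    # (Scenario import and authentication/error handling omitted for brevity.)
    def process_request(self, request):
        _scenario_state.scenario = Scenario.objects.get(user=request.user, selected=True)

class ScenarioManager(models.Manager):
    def get_query_set(self):  # get_queryset() on Django >= 1.6
        qs = super(ScenarioManager, self).get_query_set()
        scenario = getattr(_scenario_state, 'scenario', None)
        return qs.filter(scenario=scenario) if scenario is not None else qs

class Expense(models.Model):
    # scenario = models.ForeignKey(Scenario)  # as in the original models
    objects = ScenarioManager()

Every view then sees Expense.objects already filtered, and because the filter value is looked up per request rather than captured in a lambda at decoration time, a changed scenario is picked up immediately.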

How to create a Django REST Framework View instance from a ViewSet class?

I'm trying to unit test Django REST Framework view set permissions for two reasons: speed and simplicity. In keeping with these goals I would also like to avoid using any mocking frameworks. Basically I want to do something like this:
request = APIRequestFactory().post(…)
view = MyViewSet.as_view(actions={"post": "create"})
self.assertTrue(MyPermission().has_permission(request, view))
The problem with this approach is that view is not actually a View instance but rather a function which does something with a View instance, and it does not have certain properties which I use in has_permission, such as action. How do I construct the kind of View instance which can be passed to has_permission?
The permission is already tested at both the integration and acceptance level, but I would like to avoid creating several complex and time-consuming tests to simply check that each of the relevant actions are protected.
I've been able to work around this by monkeypatching a view set instance and manually dispatching it:
view_set = MyViewSet()
view_set.action_map = {"post": "create"}
view_set.dispatch(request)
You can do something like this:
request = APIRequestFactory().post(…)
view_obj = MyViewSet()
self.assertTrue(MyPermission().has_permission(request, view_obj))
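One caveat with instantiating the viewset directly: nothing sets view_obj.action, which is exactly the attribute the question says the permission needs. A rough sketch (not part of the original answer; the path, payload, and action name are placeholders) is to assign whatever attributes the permission reads before asserting:

from rest_framework.test import APIRequestFactory

request = APIRequestFactory().post("/", {})   # placeholder path and payload
view_obj = MyViewSet()
view_obj.action = "create"   # whatever attributes has_permission() actually inspects
self.assertTrue(MyPermission().has_permission(request, view_obj))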

How can I define a metric for uniqueness of strings on Django model?

Suppose I have a model that represents scientific articles. Doing some research, I may find the same article more than once, with approximately equal titles:
Some Article Title
Some Article Title
Notice that the second title string is slightly different: it has an extra space before "Title".
If the problem was because there could be more or less spacing, it would be easy since I could just trim it before saving.
But say there could be more small differences that consist of characters other than spaces:
Comparison of machine learning techniques to predict all-cause mortality using fitness data: the Henry ford exercIse testing (FIT) project.
Comparison of machine learning techniques to predict all-cause mortality using fitness data: the Henry ford exercIse testing (FIT).
(This is just some random article I'm using here as an example.)
Those titles clearly refer to the same unique work, but the second one for some reason is missing some letters.
What is the best way of defining uniqueness in this situation?
In my mind, I was thinking of some function that calculates the Levenshtein distance and decides whether the strings are the same title based on some threshold. But is it possible to do this on a Django model, or to define this behavior at the database level?
My first thought was the Levenshtein distance too, so it's probably the way to go here ;) You could implement it yourself or find code that already knows how to compute it (there are plenty of implementations) and then...
...use it in the model validation:
https://docs.djangoproject.com/en/2.0/ref/models/instances/#validating-objects
You can basically raise an exception in the custom validate_unique if you decide the new object violates this special type of uniqueness. The flipside is you'll probably need to load all other objects there.
If you create these objects on your own, you'll have to call full_clean() explicitly before saving. If the articles come from some kind of form, calling is_valid() on that form is enough.
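A rough sketch of that idea, assuming the python-Levenshtein package and an arbitrary threshold of 5 edits (both are my assumptions, not prescribed by the answer):

import Levenshtein  # pip install python-Levenshtein, or any edit-distance helper
from django.core.exceptions import ValidationError
from django.db import models

class Article(models.Model):
    title = models.CharField(max_length=500)

    def validate_unique(self, exclude=None):
        super().validate_unique(exclude=exclude)
        # Load the other titles and compare; exclude self so edits don't trip the check.
        others = Article.objects.exclude(pk=self.pk).values_list('title', flat=True)
        if any(Levenshtein.distance(self.title, other) <= 5 for other in others):
            raise ValidationError({'title': 'A very similar title already exists.'})

As noted above, full_clean() (and therefore validate_unique()) runs for model forms via is_valid(), but you have to call it yourself before save() everywhere else.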
You have two options here, neither of which is perfect.
Option 1
This assumes you have a function titles_are_similar(title_1: str, title_2: str) -> bool implemented, which decides whether two titles are similar. Use any sort of fuzzy string comparison of your choice to implement it.
We will need to use an enhanced validator.
I said "enhanced" because it will optionally accept the object you are currently trying to save, when a typical django validator for obvious reasons does not do so.
The current object's id is required. When you change and save an already existing instance/row x, validation should not fail because the table already contains a "similar" value that belongs to this exact instance/row x.
The validator itself will use values_list to reduce the performance impact.
def title_unique_enough_validator(value, exclude_obj=None):
    query_set = Article.objects.all()
    if exclude_obj:
        query_set = query_set.exclude(pk=exclude_obj.pk)  # pk -> id
    old_titles = query_set.values_list("title", flat=True)
    if any(titles_are_similar(old_title, value) for old_title in old_titles):
        raise ValidationError("Similar title already exists")  # also wrap in _() for i18n
If you just use title = models.CharField(validators=[title_unique_enough_validator], ...), you will get a ValidationError every time you try to modify and save an existing object, because the object is not passed into the validator and therefore is not excluded from the check (as mentioned above). Instead, we will override the Article.clean() method (docs):
class Article(Model):
    ...
    def clean(self):
        super().clean()
        title_unique_enough_validator(value=self.title, exclude_obj=self)
This works nicely with forms, but there are two other major problems left.
Problem 1
Quoting the docs:
Note, however, that like Model.full_clean(), a model’s clean() method is not invoked when you call your model’s save() method.
To solve this, override the .save() method:
class Article(...):
    ...
    def save(self, *args, **kwargs):
        title_unique_enough_validator(value=self.title, exclude_obj=self)  # can raise ValidationError
        return super().save(*args, **kwargs)
However, Django does not expect a ValidationError to be raised from save(). So every time you call article.save() manually from your Python code (without Django forms), you need to wrap it in a try ... except block; otherwise your application will 500 on ValidationError.
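For example (just illustrating the wrapping mentioned above):

from django.core.exceptions import ValidationError

try:
    article.save()
except ValidationError:
    # react to the "similar title already exists" case instead of letting it become a 500
    pass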
Problem 2
Do you ever explicitly call Article.objects.update()? If so, bad news (docs):
update() does an update at the SQL level and, thus, does not call any save() methods on your models
As a workaround, you might want to create a custom model manager for the Article model and override update(): simply make it unusable (raise NotImplementedError), or implement an additional check there. Just something that will prevent it from violating your constraint.
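A minimal sketch of that workaround, taking the "make it unusable" route (the manager name is mine):

class ArticleManager(models.Manager):
    def update(self, **kwargs):
        # Bulk update() bypasses save()/clean(), so the similarity check would
        # silently be skipped; fail loudly instead (or re-run the check here).
        raise NotImplementedError("Use save() so the title similarity check runs.")

class Article(models.Model):
    objects = ArticleManager()
    ...

Note that Article.objects.filter(...).update(...) still goes through the QuerySet, so for full coverage you would override update() on a custom QuerySet as well.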
Option 2
Use database constraints.
Why didn't I list this option first? Well, you will encounter tons and tons of problems with it. Django is not aware of what database constraints might do. It just dies with an OperationalError (docs) every time a constraint prevents it from doing what it wants.
As someone who has to work with many unmanaged models in Django, I can confirm that it takes a crapload of effort to enhance the Django classes so that they can deal with the occasional OperationalError without exploding every bloody time. It is especially painful if you're using django.contrib.admin, as it's just an endless pile of spaghetti.
So, seriously, avoid database constraints, unless you already must use unmanaged models or you're a masochist in search of adventures.

Should I prefer one general signal instead of multiple specific ones?

When the user creates a product, multiple actions have to be performed in the save() method before calling super(Product, self).save(*args, **kwargs).
I'm not sure whether I should use just one pre_save signal to do all these actions, or whether it is better to create a separate signal receiver for each action.
Simple example (I'm going to replace the save() overrides with signals):
class Product(...):
    def save(self, *args, **kwargs):
        if not self.pk:
            if not self.category:
                self.category = Category.get_default()
            if not self.brand:
                self.brand = 'NA'
        super(Product, self).save(*args, **kwargs)
    ...
SO
@receiver(pre_save, sender=Product)
def set_attrs(sender, instance, **kwargs):
    # pre_save doesn't provide 'created', so check for a missing pk instead
    if not instance.pk:
        instance.category = Category.get_default()
        instance.brand = 'NA'
OR
@receiver(pre_save, sender=Product)
def set_category(sender, instance, **kwargs):
    if not instance.pk:
        instance.category = Category.get_default()

@receiver(pre_save, sender=Product)
def set_brand(sender, instance, **kwargs):
    if not instance.pk:
        instance.brand = 'NA'
This is just a simple example. In this case the general set_attrs would probably be enough, but there are more complex situations with different actions, like creating a user profile for a new user and then a user plan, and so on.
Is there some best practice advice for this? Your opinions?
To put it simply, it comes down to a single piece of advice:
If an action on one model's instance affects another model, signals are the cleanest way to go. This is an example where you can go with a signal, because you might want to avoid calling some_model.save() from within the save() method of another_model, if you know what I mean.
To elaborate with an example: when overriding save() methods, a common task is to create slugs from some fields on the model. If you need to implement this on multiple models, then using a pre_save signal is a benefit, rather than hard-coding it in each model's save() method, as sketched below.
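For instance, one pre_save receiver can handle slugs for several models at once (a sketch only; the slug and name fields are assumed to exist on both models):

from django.db.models.signals import pre_save
from django.dispatch import receiver
from django.utils.text import slugify

@receiver(pre_save, sender=Product)
@receiver(pre_save, sender=Category)   # the same receiver registered for both models
def set_slug(sender, instance, **kwargs):
    if not instance.slug:
        instance.slug = slugify(instance.name)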
Also, on bulk operations, these signals and methods are not necessarily called.
From the docs,
Overridden model methods are not called on bulk operations
Note that the delete() method for an object is not necessarily called when deleting objects in bulk using a QuerySet or as a result of a cascading delete. To ensure customized delete logic gets executed, you can use pre_delete and/or post_delete signals.
Unfortunately, there isn’t a workaround when creating or updating objects in bulk, since none of save(), pre_save, and post_save are called.
For more reference,
Django override save() or signals?
Overriding predefined model methods
Django: signal or model method?

In Django, ok to set Model.objects to another manager outside of model definition?

Suppose, in Django 1.6, you have the following model code:
class FooManager(models.Manager):
    def get_queryset(self):
        return ...  # i.e. return a custom queryset

class Foo(models.Model):
    foo_manager = FooManager()
If, outside the Foo model definition (e.g. in view code or in the shell), you do:
Foo.objects = FooManager()
Foo.objects.all()
you'll get an exception in the Django internal code on Foo.objects.all(), due to a variable named lookup_model being None.
However, if you instead do:
Foo.objects = Foo.foo_manager
Foo.objects.all()
The Foo.objects.all() will work as expected, i.e. as if objects had been defined to be FooManager() in the model definition in the first place.
I believe this behavior is due to Django working its "magic" in creating managers during model definition (just as it works magic in creating model fields).
My question: is there any reason NOT to assign objects to an alternate manager in this way outside of the model definition? It seems to work fine, but I don't fully understand the internals so want to make sure.
In case you are wondering, the context is that I have a large code base with many typical references to objects. I want to have this code base work on different databases dynamically, i.e. based on a request URL parameter. My plan is to use middleware that sets objects for all relevant models to managers that point to the appropriate database. The rest of the app code would then go on its merry way, using objects without ever having to know anything has changed.
The trouble is that this is not at all thread safe. Doing this will change the definition for all requests being served by that process, until something else changes it again. That is very likely to have all sorts of unexpected effects.
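If the per-request database really is the goal, a thread-safe alternative (my sketch, not part of the answer; the class names are placeholders) is to leave objects alone and pick the database in a router that reads a thread-local set by middleware:

import threading

_state = threading.local()

class PickDatabaseMiddleware(object):          # added to MIDDLEWARE_CLASSES
    def process_request(self, request):
        # However the alias is actually derived from the request URL in your setup.
        _state.db_alias = request.GET.get('db', 'default')

class ThreadLocalRouter(object):               # referenced from DATABASE_ROUTERS
    def db_for_read(self, model, **hints):
        return getattr(_state, 'db_alias', None)

    def db_for_write(self, model, **hints):
        return getattr(_state, 'db_alias', None)

The thread-local is set at the start of every request, so nothing global on the model classes is mutated between requests.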

Is __new__ a good way to retrieve a SQLAlchemy object?

I am using SQLAlchemy and I just read about the __new__ function. I have also read the other posts here about __new__, so I am aware of the difference from __init__, the order in which they get called, and their purpose; the main message for me was: use __new__ to control the creation of a new instance.
So with that in mind, when I work with SQLAlchemy and want to retrieve an instance (and create one if it does not already exist), e.g. retrieve a User object, I normally do this:
user = DBSession.query(User).filter(User.id == user_id).first()
if not user:
    user = User()
This would either return the current user or give me a new one. Now with my new knowledge about magic, I thought something like this could be a good idea:
user = User(id=user_id)
And in my database class, I would call:
def __new__(cls, id=0):
    if id:
        user = DBSession.query(User).filter(User.id == id).first()
    if not id or not user:
        user = super(User, cls).__new__(cls)
    return user
Now this code is only a quick draft, but it should clearly convey the idea.
Now my question: Is this a good practice or should I avoid this? If it should be avoided: Why?
Based on your question and your comments, I would suggest you not do this, because it doesn't appear you have any reason to do so, and you don't seem to understand what you're doing.
You say that you will put certain code in __new__. But in the __new__ of what? If you have this:
class User(Base):
    def __new__(cls, id=0):
        if id:
            user = DBSession.query(User).filter(User.id == id).first()
        if not user:
            user = User()
        return user
...then when you try to create a User instance, its __new__ will try to create another instance, and so on, leading to infinite recursion.
Using user = User.__init__() solves nothing. __init__ always returns None, so you will just be trying to create a None object.
The appropriate use case for __new__ is when you want to change what kind of object is returned when you instantiate a class by doing SomeClass(). It is rare to need to do this. The most common case is when you want to create a user-defined class that mimics a builtin type such as dict, but even then you might not need to do this.
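To make that concrete, here is the sort of thing __new__ is actually for; a toy example mimicking an immutable builtin, unrelated to the SQLAlchemy code above:

class FoldedStr(str):
    # str is immutable, so the value has to be fixed at creation time in
    # __new__; by the time __init__ runs, the string contents cannot change.
    def __new__(cls, value):
        return super(FoldedStr, cls).__new__(cls, value.casefold())

print(FoldedStr("HeLLo"))   # prints "hello"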
If your code works without overriding __new__, don't override __new__. Only override it if you have a specific problem or task that can't be solved in another way.
From what I see and understand, there is no reason not to put your code into __init__ instead of __new__. There are only a few very limited - but valid - use cases for __new__, and you should really know what you are doing. So unless you have a very good reason, stick with __init__.
There is a very distinct difference between the first example (checking the return value) and the second (using the constructor immediately); and that difference is the free variable: DBSession.
In some cases, this difference is not interesting: if you are only using your sqlalchemy-mapped objects for database persistence, and then only in contexts where sqlalchemy.orm.scoped_session is permissible (exactly one session per thread), then the difference is not very interesting.
I have found it unusual for both of these conditions to hold, and often neither holds.
By doing this you are preventing the objects from being useful outside the context of database persistence. By disconnecting your models from the database, your application can answer questions like "what if this object had this attribute?" in addition to questions like "does this object have this attribute?" This gets to the crux of why we map database values as python objects, so that they can have interesting behaviors, instead of just as dicts, which are merely bags of attributes.
For instance, in addition to using a regular database persistent login; you might allow users to log into your site with something like OAuth. Although you don't need to persist the users' name and password to your database, you still need to create the User object for the rest of your application to work (so that the user's gravatar shows up in the template).
The other issue, implicitly accessing a particular database context by default, is usually a bad idea. As applications grow, the need to manage how the database is used gets more complicated. Objects may be partitioned across several database hosts; you may be managing several concurrent transactions in the same thread; you might want to reuse a particular session for caching performance reasons. The sqlalchemy Session class exists to address all of these peculiarities; managing them explicitly, even when you are just using the most common pattern, makes dealing with the occasional variation much easier.
A really common example of that in web apps is start-up code: sometimes it's necessary to pull some key bits of data out of the database before an application is ready to serve any requests; but since there is no request to serve, where does the database connection come from? How do you get rid of it once you've finished starting up? These questions are usually non-issues with explicitly managed sessions.
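As a tiny illustration of that last point, start-up code can just open and close a session explicitly; in this sketch, Session is an ordinary sessionmaker() factory and Setting is a placeholder model:

session = Session()                           # no request, no scoped session needed
try:
    defaults = session.query(Setting).all()   # whatever the app must load before serving
finally:
    session.close()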
