Is __new__ a good way to retrieve a SQLAlchemy object


I am using SQLAlchemy and I just read about the __new__ function. I also read the other posts here about __new__, so I am aware of how it differs from __init__, the order in which they are called, and their purposes. The main message for me was: use __new__ to control the creation of a new instance.
So with that in mind: when I work with SQLAlchemy and want to retrieve an instance (and create one if it does not already exist, e.g. retrieve a User object), I normally do this:
user = DBSession.query(User).filter(User.id == user_id).first()
if not user:
    user = User()
This either returns the existing user or gives me a new one. Now, with my new knowledge of magic methods, I thought something like this could be a good idea:
user = User(id=user_id)
And in my User model class, I would define:
def __new__(cls, id=0):
    if id:
        user = DBSession.query(User).filter(User.id == id).first()
    if not id or not user:
        # object.__new__ accepts only the class, not the constructor arguments
        user = super(User, cls).__new__(cls)
    return user
This code is only a quick draft (e.g. a call to super is missing), but it should illustrate the idea.
Now my question: Is this a good practice or should I avoid this? If it should be avoided: Why?

Based on your question and your comments, I would suggest you not do this, because it doesn't appear you have any reason to do so, and you don't seem to understand what you're doing.
You say that you will put certain code in __new__. But in the __new__ of what? If you have this:
class User(Base):
    def __new__(cls, id=0):
        user = None
        if id:
            user = DBSession.query(User).filter(User.id == id).first()
        if not user:
            user = User()  # calls User.__new__ again
        return user
. . . then when you try to create a User instance, its __new__ will try to create another instance, and so on, leading to infinite recursion.
Using user = User.__init__() solves nothing: __init__ always returns None, so you would just end up with None instead of a User.
The appropriate use case for __new__ is when you want to change what kind of object is returned when you instantiate a class by doing SomeClass(). It is rare to need to do this. The most common case is a user-defined subclass of an immutable built-in type such as int, str or tuple, whose value has to be fixed before __init__ runs; even then you might not need to do this.
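A minimal sketch of that case (UpperStr is a hypothetical name): a str is immutable, so its content has to be chosen in __new__; by the time __init__ runs it can no longer be changed.

```python
class UpperStr(str):
    """A str subclass that always stores its value in upper case."""

    def __new__(cls, value):
        # The string's content is fixed at creation time, so it must be
        # transformed here; __init__ could not change it afterwards.
        return super().__new__(cls, str(value).upper())


s = UpperStr("hello")
print(s, isinstance(s, str))  # HELLO True
```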
If your code works without overriding __new__, don't override __new__. Only override it if you have a specific problem or task that can't be solved in another way.
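Even a variant of the question's idea that avoids the recursion has another trap: whenever __new__ returns an instance of the class, Python still calls __init__ on it, so an object fetched "from the database" would be re-initialised on every lookup. A minimal sketch, with a class-level dict standing in (hypothetically) for the users table:

```python
class User:
    _table = {}  # hypothetical stand-in for the database table

    def __new__(cls, id=0):
        existing = cls._table.get(id)
        if existing is not None:
            return existing  # an instance of cls, so __init__ still runs on it
        return super().__new__(cls)

    def __init__(self, id=0):
        self.id = id
        self.name = ""  # silently wipes the stored name on every "lookup"


alice = User(1)
alice.name = "alice"
User._table[1] = alice

again = User(1)
print(again is alice, repr(alice.name))  # True ''
```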

From what I see and understand, there is no reason not to put your code into __init__ instead of __new__. There are only a few, very limited, but valid use cases for __new__, and you should really know what you are doing. So unless you have a very good reason, stick with __init__.

There is a very distinct difference between the first example (checking the return value) and the second (using the constructor immediately), and that difference is the free variable DBSession.
In some cases this difference is not interesting: if you are only using your SQLAlchemy-mapped objects for database persistence, and only in contexts where sqlalchemy.orm.scoped_session is permissible (exactly one session per thread).
I have found it unusual for both of these conditions to hold, and often neither holds.
By doing this you are preventing the objects from being useful outside the context of database persistence. By disconnecting your models from the database, your application can answer questions like "what if this object had this attribute?" in addition to questions like "does this object have this attribute?" This gets to the crux of why we map database values as python objects, so that they can have interesting behaviors, instead of just as dicts, which are merely bags of attributes.
For instance, in addition to a regular database-backed login, you might allow users to log into your site with something like OAuth. Although you don't need to persist those users' names and passwords in your database, you still need to create a User object for the rest of your application to work (so that, say, the user's gravatar shows up in the template).
The other issue is implicitly accessing a particular database context by default, which is usually a bad idea. As applications grow, managing the database gets more complicated. Objects may be partitioned across several database hosts; you may be managing several concurrent transactions in the same thread; you might want to reuse a particular session for caching performance reasons. The SQLAlchemy Session class exists to address all of these peculiarities, and managing sessions explicitly, even when you are just using the most common pattern, makes dealing with the occasional variation much easier.
A really common example of that in web apps is start-up code. Sometimes it's necessary to pull some key bits of data out of the database before an application is ready to serve any requests, but since there is no request to serve, where does the database connection come from? How do you get rid of it once you've finished starting up? These questions are usually non-issues with explicitly managed sessions.
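A sketch of the explicit alternative to the __new__ trick, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database. get_or_new is a hypothetical classmethod, not part of SQLAlchemy: it takes the session as an argument, so the caller decides which session (and database) is used and whether the new object is ever added to it.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class User(Base):
    __tablename__ = "users"

    id = Column(Integer, primary_key=True)
    name = Column(String(50))

    @classmethod
    def get_or_new(cls, session, user_id):
        """Return the stored User with this id, or a new transient one.

        Hypothetical helper: the session is passed in explicitly.
        """
        user = session.get(cls, user_id)  # identity map first, then SELECT
        if user is None:
            user = cls(id=user_id)  # not added to the session; caller decides
        return user


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(User(id=1, name="alice"))
    session.commit()

    existing = User.get_or_new(session, 1)
    fresh = User.get_or_new(session, 2)
    print(existing.name, fresh.id, fresh in session)  # alice 2 False
```

Start-up code can open its own Session(engine) the same way and close it when the with block ends.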

Related

Django: Using ".first()" on a related queryset triggers queries despite "prefetch_related". Is there a workaround?

I'm trying to optimize some querysets in my Django code by using prefetch_related.
However, I've realized that if a queryset method such as first, last or latest is called elsewhere in the code on a related queryset, a query is triggered regardless, effectively nullifying the optimization.
e.g.
class Client(models.Model):
    # [...]
    @property
    def latest_order(self):
        return self.orders.latest('ordered_at')  # Order is a many-to-one related object
Using Client.objects.prefetch_related('orders') won't help: latest_order will still trigger a query for each Client.
Is there a way around this without altering the Client model's property? Otherwise, if that's the best solution, what's the best way to retain the functionality of latest_order?
The only way I can think of is to replicate the latest functionality in memory (via the sorted function); however, this means I lose the ability to do this operation database-side when that's advantageous (e.g. when I'm looking at just one client with A LOT of orders).
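A sketch of the in-memory variant, assuming the orders were loaded with prefetch_related('orders'): iterating self.orders.all() is then served from the prefetch cache, and max() finds the newest order in a single pass without sorting. The helper latest_in_memory is hypothetical, and unlike QuerySet.latest() it returns None instead of raising DoesNotExist when there are no orders. Plain dataclass objects stand in for Order instances:

```python
from dataclasses import dataclass
from datetime import datetime
from operator import attrgetter


def latest_in_memory(items, field):
    """Return the item with the greatest `field` value, or None if empty."""
    return max(items, key=attrgetter(field), default=None)


# Inside the model this could be used as, e.g.:
#     @property
#     def latest_order(self):
#         return latest_in_memory(self.orders.all(), "ordered_at")


@dataclass
class FakeOrder:  # hypothetical stand-in for the Order model
    number: int
    ordered_at: datetime


orders = [
    FakeOrder(1, datetime(2020, 1, 5)),
    FakeOrder(2, datetime(2020, 3, 1)),
    FakeOrder(3, datetime(2020, 2, 10)),
]
print(latest_in_memory(orders, "ordered_at").number)  # 2
```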

How can I define a metric for uniqueness of strings on Django model?

Suppose I have a model that represents scientific articles. Doing some research, I may find the same article more than once, with approximately equal titles:
Some Article Title
Some Article  Title
Notice that the second title string is slightly different: it has an extra space before "Title".
If the problem was because there could be more or less spacing, it would be easy since I could just trim it before saving.
But say there could be more small differences that consist of characters other than spaces:
Comparison of machine learning techniques to predict all-cause mortality using fitness data: the Henry ford exercIse testing (FIT) project.
Comparison of machine learning techniques to predict all-cause mortality using fitness data: the Henry ford exercIse testing (FIT).
This is some random article I used here as an example
Those titles clearly refer to the same unique work, but the second one for some reason is missing some letters.
What is the best way of defining uniqueness in this situation?
In my mind, I was thinking of some function that calculates the Levenshtein distance and decides whether the strings are the same title based on some threshold. But is it possible to do this on a Django model, or to define this behavior at the database level?

My first thought was the Levenshtein distance too, so it's probably the way to go here ;) You could implement it yourself or find existing code that computes it (there's plenty of it) and then...
...use it in the model validation:
https://docs.djangoproject.com/en/2.0/ref/models/instances/#validating-objects
Basically, you can raise an exception in a custom validate_unique if you decide the new object violates this special kind of uniqueness. The flip side is that you'll probably need to load all the other objects there.
If you create these objects on your own, you'll have to call full_clean() explicitly before saving. If the articles come from some kind of form, calling is_valid() on that form is enough.
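If you implement it yourself, the classic dynamic-programming version fits in a few lines. This is a generic sketch, independent of Django; turning the distance into a "same title" decision (for example, a maximum distance relative to the title length) is left as an assumption to tune on your own data.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the row short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```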

You have 2 options here, 0 of which are perfect.
Option 1
This assumes you have implemented a function titles_are_similar(title_1: str, title_2: str) -> bool that decides whether two titles are similar. Use any fuzzy string comparison of your choice to implement it.
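One possible implementation of that function, swapping in the standard library's difflib.SequenceMatcher similarity ratio rather than a Levenshtein distance. It normalises case and whitespace first; the 0.9 threshold is an assumption that would need tuning on real titles.

```python
import re
from difflib import SequenceMatcher


def _normalise(title: str) -> str:
    # Collapse runs of whitespace and ignore case ("exercIse" == "exercise").
    return re.sub(r"\s+", " ", title).strip().casefold()


def titles_are_similar(title_1: str, title_2: str, threshold: float = 0.9) -> bool:
    """Return True if the two titles probably refer to the same work."""
    a, b = _normalise(title_1), _normalise(title_2)
    return SequenceMatcher(None, a, b).ratio() >= threshold


print(titles_are_similar("Some Article Title", "Some Article  Title"))  # True
```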
We will need to use an enhanced validator.
I say "enhanced" because it optionally accepts the object you are currently trying to save, whereas a typical Django validator, for obvious reasons, receives only the value.
The current object's id is required: when you change and save an already existing instance/row x, validation should not fail just because the table already contains a "similar" value that belongs to that very instance/row x.
The validator itself will use values_list to reduce the performance impact.
from django.core.exceptions import ValidationError

def title_unique_enough_validator(value, exclude_obj=None):
    query_set = Article.objects.all()
    if exclude_obj is not None and exclude_obj.pk is not None:
        # skip the row being edited so it is not compared with itself
        query_set = query_set.exclude(pk=exclude_obj.pk)
    old_titles = query_set.values_list("title", flat=True)
    if any(titles_are_similar(old_title, value) for old_title in old_titles):
        raise ValidationError("Similar title already exists")  # also wrap in _()
If you use title = models.CharField(validators=[title_unique_enough_validator], ...), you will get a ValidationError every time you try to modify and save an existing object, because that object is not passed into the validator and is therefore not excluded from the check (as mentioned above). Instead, we will override the Article.clean() method:
class Article(Model):
    ...
    def clean(self):
        super().clean()
        title_unique_enough_validator(value=self.title, exclude_obj=self)
This works nicely with forms, but two other major problems remain.
Problem 1
Quoting the docs:
Note, however, that like Model.full_clean(), a model’s clean() method is not invoked when you call your model’s save() method.
To solve this, override the .save() method:
class Article(...):
    ...
    def save(self, *args, **kwargs):
        title_unique_enough_validator(value=self.title, exclude_obj=self)  # can raise ValidationError
        return super().save(*args, **kwargs)
However, Django does not expect a ValidationError from save(). So every time you call article.save() manually from your Python code (without Django forms), you need to wrap it in a try ... except block; otherwise your site will respond with an HTTP 500 error on ValidationError.
Problem 2
Do you ever explicitly call Article.objects.update()? If so, bad news; from the docs:
update() does an update at the SQL level and, thus, does not call any save() methods on your models
As a workaround, you might want to create a custom model manager for the Article model and override update(): simply make it unusable (raise NotImplementedError), or implement an additional check there. Anything that prevents it from violating your constraint will do.
Option 2
Use database constraints.
Why didn't I list this option first? Well, you will run into tons and tons of problems with it. Django is not aware of what database constraints might do; it just dies with a database exception (OperationalError, or IntegrityError for ordinary constraint violations) every time a constraint prevents it from doing what it wants.
As I have to work with many unmanaged models in Django, I can confirm that you will need a crap load of effort to enhance Django classes so that they can deal with these errors every now and then without exploding every bloody time. It is especially painful if you're using django.contrib.admin, as it's just an endless pile of spaghetti.
So, seriously, avoid database constraints, unless you already must use unmanaged models or you're a masochist in search of adventures.

How can a class hold an array of classes in django

I have been having trouble using Django. Right now, I have a messagebox class that is supposed to hold messages, and a message class that extends it. How do I make messagebox hold messages?
Something else I cannot figure out is how classes are supposed to interact. For example, I have a user that can send messages. Should the user call a method in messagebox to send a message, or can I have a method in user that creates a message directly?
My teacher tries to emphasize cohesion and coupling, but he never even talks about how to implement this in Django, or how to use Django at all. Any help would be appreciated.

You're confusing two different things here. A class can easily have an attribute that is a list which contains instances of another class, there is nothing difficult about that.
(But note that there is no way in which a Message should extend MessageBox; this should be composition, not inheritance.)
However then you go on to talk about Django models. But Django models, although they are Python classes, also represent tables in the database. And the way you represent one table containing a list of entries in another table is via a foreign key field. So in this case your Message model would have a ForeignKey to MessageBox.
Where you put the send method depends entirely on your logic. A message should probably know how to send itself, so it sounds like the method would go there.
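In plain Python, ignoring Django for a moment, the composition described above might look like this (all names are hypothetical): a MessageBox holds a list of Message objects, and each Message knows how to send itself into a box. In Django models, the list would instead become a ForeignKey from Message to MessageBox.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    sender: str
    text: str

    def send(self, box: "MessageBox") -> None:
        # The message knows how to deliver itself; the box decides how to store it.
        box.receive(self)


@dataclass
class MessageBox:
    owner: str
    messages: List[Message] = field(default_factory=list)  # composition, not inheritance

    def receive(self, message: Message) -> None:
        self.messages.append(message)


inbox = MessageBox(owner="bob")
Message(sender="alice", text="Hi Bob").send(inbox)
print(len(inbox.messages), inbox.messages[0].text)  # 1 Hi Bob
```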

In Django, can I specify database when creating an object?

Look at this Django ORM code:
my_instance = MyModel()
my_instance.some_related_object = OtherModel.objects.using('other_db').get(pk=id)
At this point, in the second line, Django will throw an error:
ValueError: Cannot assign "<OtherModel: ID>": instance is on database "default", value is on database "other_db"
To me, it doesn't make much sense. How can Django tell which database my_instance is on, if I haven't even called:
my_instance.save(using='some_database')
yet?
I guess, that during the construction of an object Django automatically assigns it to the default database. Can I change it? Can I specify database when creating an object, by passing an argument to its constructor? According to the documentation, the only arguments I can pass, when creating an object are the values of its fields. So how can I solve my problem?
Django 1.8 has a new method, Model.from_db (https://docs.djangoproject.com/en/1.8/ref/models/instances/), but I'm using an earlier version of Django and can't switch to the newer one now. Looking at the implementation, all it does is set two of the model's attributes:
instance._state.adding = False
instance._state.db = db
So would it be enough to change my code to:
my_instance = MyModel()
my_instance._state.adding = False
my_instance._state.db = 'other_db'
my_instance.some_related_object = OtherModel.objects.using('other_db').get(pk=id)
Or is it too late to do it here, because those flags are used in the constructor and have to be set there?

You might want to look into database routing, which has been supported since Django 1.2. It lets you set up multiple databases and decide which one each model uses.
You can create a custom database router (a class inheriting from the built-in object type), with db_for_read and db_for_write methods that return the name of the database (as defined in the DATABASES setting) that should be used for the model passed into that method. Return None to let Django figure it out.
It's usually used for handling master-slave replication, so you can have a separate read-only database from your writeable one, but the same logic would apply to let you specify that certain models live in certain databases.
You would probably also want to define an allow_syncdb method (renamed allow_migrate in Django 1.7) so that only the models you want in database B will appear there, and everything else will appear in database A.
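A minimal router sketch under assumed names: models whose app label is "legacy" live in the database aliased "other_db", and everything else falls through (by returning None) to Django's default behaviour. It would be enabled with something like DATABASE_ROUTERS = ["myproject.routers.OtherDbRouter"] (hypothetical dotted path). The allow_migrate signature shown is the one used by recent Django versions; the SimpleNamespace objects at the end are plain stand-ins for model classes, used only to exercise the router.

```python
from types import SimpleNamespace


class OtherDbRouter:
    """Route models of the hypothetical "legacy" app to the "other_db" alias."""

    route_app_labels = {"legacy"}
    db_alias = "other_db"

    def db_for_read(self, model, **hints):
        if model._meta.app_label in self.route_app_labels:
            return self.db_alias
        return None  # let Django (or the next router) decide

    def db_for_write(self, model, **hints):
        return self.db_for_read(model, **hints)

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # "legacy" tables only in other_db, and nothing else in other_db.
        if app_label in self.route_app_labels:
            return db == self.db_alias
        if db == self.db_alias:
            return False
        return None


LegacyModel = SimpleNamespace(_meta=SimpleNamespace(app_label="legacy"))
ShopModel = SimpleNamespace(_meta=SimpleNamespace(app_label="shop"))

router = OtherDbRouter()
print(router.db_for_read(LegacyModel), router.db_for_write(ShopModel))  # other_db None
```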

Django knows which database each object comes from because it records this in the object's internal state (instance._state.db). A QuerySet stores this information too.
Actually, database routing isn't really needed to achieve what you want here.
Consider the following code fragment:
my_instance = MyModel()
my_instance.some_related_object_id = OtherModel.objects.using('other_db').get(pk=id).id
Note how I assign just the ID, not the object itself.
You will lose the actual object here, but gain the ability to store referential data.
AFAIK there's no API to change an object's associated database.

Django get_query_set override is being cached

I'm overriding Django's get_query_set function on one of my models dynamically. I'm doing this to forcibly filter the original query set returned by Model.objects.all/filter/get by a "scenario" value, using a decorator. Here's the decorator's function:
# Get the base QuerySet for these models before we modify their
# QuerySet managers. This prevents infinite recursion since the
# get_query_set function doesn't rely on itself to get this base QuerySet.
all_expense_objects = Expense.objects.all()
# Figure out what scenario the user is using.
current_scenario = Scenario.objects.get(user=request.user, selected=True)
# Modify the imported Expense class's manager to filter based on the current scenario.
Expense.objects.get_query_set = lambda: all_expense_objects.filter(scenario=current_scenario)
# Call the method that was initially supposed to
# be executed before we were so rudely interrupted.
return view(request, **arguments)
I'm doing this to DRY up the code, so that all of my queries aren't littered with an additional filter. However, if the scenario changes, no objects are returned. If I kill all of my Python processes on my server, the objects for the newly selected scenario appear. I think it's caching the modified class, and then when the scenario changes, it applies another filter that can never match, since objects can only have one scenario at a time.
This hasn't been an issue with user-based filters because the user never changes during my session. Is Passenger doing something stupid to hold onto class objects between requests? Should I bail on this weird design pattern and just implement these filters on a per-view basis? There must be a best practice for DRYing up filters that apply across many views based on something dynamic, like the current user.
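The diagnosis above can be reproduced without Django. In this sketch, FakeManager and FakeQuerySet are hypothetical stand-ins; the manager is a module-level object that lives as long as the worker process, so the lambda installed by one request is still in place for the next one, whose "base" query set is therefore already filtered by the old scenario.

```python
class FakeQuerySet:
    """Hypothetical stand-in for a Django QuerySet over a list of rows."""

    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, scenario):
        return FakeQuerySet(r for r in self.rows if r["scenario"] == scenario)


class FakeManager:
    """Hypothetical stand-in for Model.objects."""

    def __init__(self, rows):
        self._rows = rows

    def get_query_set(self):
        return FakeQuerySet(self._rows)

    def all(self):
        # Like the old Manager.all(), this goes through get_query_set().
        return self.get_query_set()


# Module-level, like Expense.objects: it lives as long as the worker process.
objects = FakeManager([{"scenario": "A"}, {"scenario": "B"}])


def handle_request(current_scenario):
    # Same steps as the decorator in the question.
    base = objects.all()  # already filtered if a previous request patched it
    objects.get_query_set = lambda: base.filter(current_scenario)
    return objects.all().rows


first = handle_request("A")   # [{'scenario': 'A'}]
second = handle_request("B")  # [] because both "A" and "B" filters apply
print(first, second)
```

Restarting the processes discards the patched manager, which is why the objects reappear after a restart.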

What about creating a Manager for the model, with a method that takes the user as an argument and does this filtering? My understanding of being DRY with Django querysets is to use a model Manager.
#### view code:
def some_view(request):
    expenses = Expense.objects.filter_by_cur_scenario(request.user)
    # add additional filters here, or add to manager via more params
    expenses = expenses.filter(something_else=True)
#### models code:
class ExpenseManager(models.Manager):
    def filter_by_cur_scenario(self, user):
        current_scenario = Scenario.objects.get(user=user, selected=True)
        return self.filter(scenario=current_scenario)

class Expense(models.Model):
    objects = ExpenseManager()
Also, one quick caveat about the manager (which may also apply to overriding get_query_set): foreign-key relationships will not take into account any filtering done at this level. For example, if you override the MyObject.objects.filter() method to always filter out deleted rows, a model with a foreign key to MyObject won't use that filter function (at least as far as I understand; someone please correct me if I'm wrong).

I was hoping to have this implementation happen without having to code anything in other views. Essentially, after the class is imported, I want to modify it so that no matter where it's referenced using Expense.objects.get/filter/all it's already been filtered. As a result, there is no implementation required for any of the other views; it's completely transparent. And, even in cases where I'm using it as a ForeignKey, when an object is retrieved using the aforementioned Expense.objects.get/filter/all, they'll be filtered as well.
