Suppose I have three django models:
class Section(models.Model):
    name = models.CharField()

class Size(models.Model):
    section = models.ForeignKey(Section)
    size = models.IntegerField()

class Obj(models.Model):
    name = models.CharField()
    sizes = models.ManyToManyField(Size)
I would like to import a large amount of Obj data where many of the sizes fields will be identical. However, since Obj has a ManyToMany field, I can't just test for existence like I normally would. I would like to be able to do something like this:
x = Obj(name='foo')
x.sizes.add(sizemodel1)  # these can be looked up with get_or_create
...
x.sizes.add(sizemodelN)  # these can be looked up with get_or_create

# Now test whether x already exists, so I don't add a duplicate
try:
    Obj.objects.get(x)
except Obj.DoesNotExist:
    x.save()
However, I'm not aware of a way to get an object this way, you have to just pass in keyword parameters, which don't work for ManyToManyFields.
Is there any good way I can do this? The only idea I've had is to build up a set of Q objects to pass to get:
myq = myq & Q(sizes__id=sizemodelN.id)
But I am not sure this will even work...
Use a through model and then .get() against that.
http://docs.djangoproject.com/en/dev/topics/db/models/#extra-fields-on-many-to-many-relationships
Once you have a through model, you can .get() or .filter() or .exists() to determine the existence of an object that you might otherwise want to create. Note that .get() is really intended for columns where unique is enforced by the DB - you might have better performance with .exists() for your purposes.
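For illustration, a minimal sketch of that approach (the explicit through model ObjSize and the variable some_size are hypothetical names, not from the question):

class ObjSize(models.Model):
    # explicit through table for Obj.sizes
    obj = models.ForeignKey('Obj')
    size = models.ForeignKey('Size')

class Obj(models.Model):
    name = models.CharField()
    sizes = models.ManyToManyField(Size, through='ObjSize')

# does any Obj named 'foo' already have this particular size attached?
already_linked = ObjSize.objects.filter(obj__name='foo', size=some_size).exists()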
If this is too radical or inconvenient a solution, you can also just grab the ManyRelatedManager and iterate through to determine if the object exists:
object_sizes = obj.sizes.all()
exists = object_sizes.filter(id__in=some_bunch_of_size_object_ids_you_are_curious_about).exists()

if not exists:
    # (your creation code here)
Your example doesn't make much sense because you can't add m2m relationships before an x is saved, but it illustrated what you are trying to do pretty well. You have a list of Size objects created via get_or_create(), and want to create an Obj if no duplicate obj-size relationship exists?
Unfortunately, this is not possible very easily. Chaining Q objects such as Q(sizes__id=1) & Q(sizes__id=2) doesn't work for m2m.
You could certainly use Obj.objects.filter(size__in=Sizes) but that means you'd get a match for an Obj with 1 size in a huge list of sizes.
Check out this post about exact matching with __in; it was answered by Malcolm, so I trust it quite a bit.
I wrote some python for fun that could take care of this.
This is a one time import right?
from django.db.models.query import QuerySet

def has_exact_m2m_match(match_list):
    """
    Return True if an Obj already exists whose sizes are exactly match_list.
    """
    if isinstance(match_list, QuerySet):
        match_list = [x.id for x in match_list]
    # accept either Size instances or raw size ids
    match = set(getattr(s, 'id', s) for s in match_list)

    # note: we are accessing the auto-generated through model for the sizes m2m
    through = Obj.sizes.through

    # candidate objects: anything that shares at least one size with the match
    candidates = through.objects.filter(size__in=match).values_list('obj', flat=True)

    # collect *all* size ids of each candidate, so supersets of the match
    # are not mistaken for exact matches
    results = {}
    for obj, size in through.objects.filter(obj__in=candidates).values_list('obj', 'size'):
        results.setdefault(obj, []).append(size)

    # if there is a match, it means an Obj exists with exactly
    # the sizes you provided to the function, no more.
    return any(set(sizes) == match for sizes in results.values())

sizes = [size1, size2, size3, sizeN...]

if not has_exact_m2m_match(sizes):
    x = Obj.objects.create(name='foo')  # saves, so you can use x.sizes.add
    x.sizes.add(*sizes)
Related
I have Order objects and OrderOperation objects that represent an action on a Order (creation, modification, cancellation).
Conceptually, an order has 1 to many order operations. Each time there is an operation on the order, the total is computed in this operation. This means that when I need an attribute of an order, I just get the attribute of the last order operation instead, using a Subquery.
The simplified code
class OrderOperation(models.Model):
    order = models.ForeignKey('Order')
    total = models.DecimalField(max_digits=9, decimal_places=2)

class Order(models.Model):
    # ...
    pass

class OrderQuerySet(models.QuerySet):
    @staticmethod
    def _last_oo(field):
        return Subquery(OrderOperation.objects
                        .filter(order_id=OuterRef("pk"))
                        .order_by('-id')
                        .values(field)[:1])

    def annotated_total(self):
        return self.annotate(oo_total=self._last_oo('total'))
This way, I can run my_order_total = Order.objects.annotated_total()[0].oo_total. It works great.
The issue
Computing total is easy as it's a simple value. However, when there is a M2M or OneToMany field, this method does not work. For example, using the example above, let's add this field:
class OrderOperation(models.Model):
    order = models.ForeignKey('Order')
    total = models.DecimalField(max_digits=9, decimal_places=2)
    ordered_articles = models.ManyToManyField(Article, through='orders.OrderedArticle')
Writing something like the following does NOT work as it returns only 1 foreign key (not a list of all the FKs):
def annotated_ordered_articles(self):
    return self.annotate(oo_ordered_articles=self._last_oo('ordered_articles'))
The purpose
The whole purpose is to allow a user to search among all orders, providing a list of articles as input. For example: "Please find all orders containing at least article 42 or article 43", or "Please find all orders containing exactly articles 42 and 43", etc.
If I could get something like:
>>> Order.objects.annotated_ordered_articles()[0].oo_ordered_articles
<ArticleQuerySet [<Article: Article42>, <Article: Article43>]>
or even:
>>> Order.objects.annotated_ordered_articles()[0].oo_ordered_articles
[42,43]
That would solve my issue.
My current idea
Maybe something like ArrayAgg (I'm using pgSQL) could do the trick, but I'm not sure to understand how to use it in my case.
Maybe this has to do with the values() method, which does not seem to be intended to handle M2M and one-to-many relations, as stated in the docs:
values() and values_list() are both intended as optimizations for a
specific use case: retrieving a subset of data without the overhead of
creating a model instance. This metaphor falls apart when dealing with
many-to-many and other multivalued relations (such as the one-to-many
relation of a reverse foreign key) because the “one row, one object”
assumption doesn’t hold.
ArrayAgg is great if you want to fetch only one field (i.e. name) from all articles. If you need more, there is a better option for that:
prefetch_related
Instead, you can prefetch, for each Order, the latest OrderOperation as a whole object. This gives you easy access to any field of the OrderOperation without extra magic.
The only caveat is that you will always get a list with one operation, or an empty list when there are no operations for the selected order.
To do that, you should use prefetch_related together with a Prefetch object and a custom queryset for OrderOperation. Example:
from django.db.models import Max, F, Prefetch

last_order_operation_qs = OrderOperation.objects.annotate(
    lop_pk=Max('order__orderoperation__pk')
).filter(pk=F('lop_pk'))

orders = Order.objects.prefetch_related(
    Prefetch('orderoperation_set', queryset=last_order_operation_qs, to_attr='last_operation')
)
Then you can just use order.last_operation[0].ordered_articles to get all ordered articles for a particular order. You can add prefetch_related('ordered_articles') to the first queryset for better performance and fewer database queries, as sketched below.
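For illustration, a minimal sketch of that extra prefetch (untested, and assuming the simplified models from the question):

from django.db.models import Max, F, Prefetch

last_order_operation_qs = (OrderOperation.objects
                           .annotate(lop_pk=Max('order__orderoperation__pk'))
                           .filter(pk=F('lop_pk'))
                           .prefetch_related('ordered_articles'))  # pull the articles in the same pass

orders = Order.objects.prefetch_related(
    Prefetch('orderoperation_set', queryset=last_order_operation_qs, to_attr='last_operation')
)

for order in orders:
    if order.last_operation:  # empty list when the order has no operations
        articles = order.last_operation[0].ordered_articles.all()  # no extra query here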
To my surprise, your idea with ArrayAgg is right on the money. I didn't know there was a way to annotate with an array (and I believe there still isn't for backends other than Postgres).
from django.contrib.postgres.aggregates.general import ArrayAgg
qs = Order.objects.annotate(oo_articles=ArrayAgg(
    'order_operation__ordered_articles__id',
    distinct=True,
))
You can then filter the resulting queryset using the ArrayField lookups:
# Articles that contain the specified array
qs.filter(oo_articles__contains=[42,43])
# Articles that are identical to the specified array
qs.filter(oo_articles=[42,43,44])
# Articles that are contained in the specified array
qs.filter(oo_articles__contained_by=[41,42,43,44,45])
# Articles that have at least one element in common
# with the specified array
qs.filter(oo_articles__overlap=[41,42])
distinct=True is needed only if the operation may contain duplicate articles.
You may need to tweak the exact name of the field passed to the ArrayAgg function. For subsequent filtering to work, you may also need to cast the id fields in the ArrayAgg to int, as otherwise Django casts the id array to ::serial[], and my Postgres complained that type "serial[]" does not exist:
from django.db.models import IntegerField
from django.contrib.postgres.fields.array import ArrayField
from django.db.models.functions import Cast
ArrayAgg(Cast('order_operation__ordered_articles__id', IntegerField()))
# OR
Cast(ArrayAgg('order_operation__ordered_articles__id'), ArrayField(IntegerField()))
Looking at your posted code more closely, you'll also have to filter on the one OrderOperation you are interested in; the query above looks at all operations for the relevant order.
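One possible way to restrict the aggregation to the latest operation (a sketch only, not tested against the real models; it assumes Django 2.0+ for the filter argument on aggregates and the default orderoperation reverse name):

from django.contrib.postgres.aggregates.general import ArrayAgg
from django.db.models import OuterRef, Q, Subquery

# pk of the latest operation for each outer Order
last_op_pk = (OrderOperation.objects
              .filter(order_id=OuterRef('pk'))
              .order_by('-id')
              .values('pk')[:1])

# aggregate article ids from the latest operation only
qs = Order.objects.annotate(
    oo_articles=ArrayAgg(
        'orderoperation__ordered_articles__id',
        filter=Q(orderoperation__pk=Subquery(last_op_pk)),
        distinct=True,
    )
)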
I commonly find myself writing the same criteria in my Django application(s) more than once. I'll usually encapsulate it in a function that returns a Django Q() object, so that I can maintain the criteria in just one place.
I will do something like this in my code:
def CurrentAgentAgreementCriteria(useraccountid):
    '''Returns Q that finds agent agreements that give the useraccountid account current delegated permissions.'''
    AgentAccountMatch = Q(agent__account__id=useraccountid)
    StartBeforeNow = Q(start__lte=timezone.now())
    EndAfterNow = Q(end__gte=timezone.now())
    NoEnd = Q(end=None)
    # Now put the criteria together
    AgentAgreementCriteria = AgentAccountMatch & StartBeforeNow & (NoEnd | EndAfterNow)
    return AgentAgreementCriteria
This makes it so that I don't have to think through the DB model more than once, and I can combine the return values from these functions to build more complex criteria. That has worked well so far, and has already saved me time when the DB model changed.
Something I have realized as I start to combine the criteria from these functions is that a Q() object is inherently tied to the type of object .filter() is being called on. That is what I would expect.
I occasionally find myself wanting to use a Q() object from one of my functions to construct another Q object that is designed to filter a different, but related, model's instances.
Let's use a simple/contrived example to show what I mean. (It's simple enough that normally this would not be worth the overhead, but remember that I'm using a simple example here to illustrate what is more complicated in my app.)
Say I have a function that returns a Q() object that finds all Django users, whose username starts with an 'a':
def UsernameStartsWithAaccount():
    return Q(username__startswith='a')
Say that I have a related model that is a user profile with settings including whether they want emails from us:
class UserProfile(models.Model):
    account = models.OneToOneField(User, unique=True, related_name='azendalesappprofile')
    emailMe = models.BooleanField(default=False)
Say I want to find all UserProfiles which have a username starting with 'a' AND want us to send them an email newsletter. I can easily write a Q() object for the latter:
wantsEmails = Q(emailMe=True)
but find myself wanting to do something like this for the former:
startsWithA = Q(account=UsernameStartsWithAaccount())
# And then
UserProfile.objects.filter(startsWithA & wantsEmails)
Unfortunately, that doesn't work (it generates invalid PSQL syntax when I tried it).
To put it another way, I'm looking for a syntax along the lines of Q(account=Q(id=9)) that would return the same results as Q(account__id=9).
So, a few questions arise from this:
Is there a syntax with Django Q() objects that allows you to add "context" to them to allow them to cross relational boundaries from the model you are running .filter() on?
If not, is this logically possible? (Since I can write Q(account__id=9) when I want to do something like Q(account=Q(id=9)) it seems like it would).
Maybe someone will suggest something better, but I ended up passing the context manually to such functions. I don't think there is an easy solution, as you might need to traverse a whole chain of related tables to get to your field, like table1__table2__table3__profile__user__username; how would you guess that? The User table could be linked to table2 as well, but you don't need it in this case, so I think you can't avoid setting the path manually.
Also, you can pass a dictionary to Q() and a list or a dictionary to filter(), which is much easier to work with than keyword parameters and applying &.
def UsernameStartsWithAaccount(context=''):
    field = 'username__startswith'
    if context:
        field = context + '__' + field
    return Q(**{field: 'a'})
Then if you simply need to AND your conditions you can combine them into a list and pass to filter:
UserProfile.objects.filter(*[startsWithA, wantsEmails])
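For example (a sketch; 'account' here is the name of the UserProfile field that points at User in the models above):

startsWithA = UsernameStartsWithAaccount(context='account')
wantsEmails = Q(emailMe=True)

# both conditions ANDed together, now filtering UserProfile instead of User
UserProfile.objects.filter(*[startsWithA, wantsEmails])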
I have noticed, thanks to django debug toolbar, that every django admin list page always adds an "ORDER BY id DESC" to my queries, EVEN if I manually override the get_queryset method of admin.ModelAdmin (which I usually do because I want custom sorting on some of my admin pages).
I guess this is not really something to worry about, but it is an additional sorting operation the database will need to do, even if it doesn't make sense at all.
Is there any way to prevent this? It seems that on some models (though not all of them), if I add the ordering Meta option, Django won't automatically add an ORDER BY on id, but it will order by that field instead, which I also don't want, because doing so would add that ordering to all my other queries across the code.
EDIT: Seems like the culprit is at django.contrib.admin.views.main at ChangeList, on the function get_ordering at line 316 (django 1.7.10)
pk_name = self.lookup_opts.pk.name
if not (set(ordering) & set(['pk', '-pk', pk_name, '-' + pk_name])):
    # The two sets do not intersect, meaning the pk isn't present. So
    # we add it.
    ordering.append('-pk')
I wonder what's the reason behind this...
EDIT:
To improve performance, and since MySQL (and InnoDB) returns data in the clustered index order when no order by is given, I can safely remove that id appending.
To do so is quite easy: I just extended Django's ChangeList and modified the get_ordering method. After that, I made a custom admin model that extends ModelAdmin and overrides the get_changelist method to return the class above.
I hope it helps anyone :)
I was having the exact same issue as this question, where an admin queryset was 4 times slower due to the ID sort when I already have unique sorts. Thanks to user1777914 and his work, I don't have timeouts every other load! I am just adding this answer here for clarity in case others suffer the same. As user1777914 mentions, extend the ChangeList:
from django.contrib.admin.views.main import ChangeList, ORDER_VAR

class NoPkChangeList(ChangeList):
    def get_ordering(self, request, queryset):
        """
        Returns the list of ordering fields for the change list.
        First we check the get_ordering() method in model admin, then we check
        the object's default ordering. Then, any manually-specified ordering
        from the query string overrides anything. Finally, WE REMOVE the primary
        key ordering field.
        """
        params = self.params
        ordering = list(self.model_admin.get_ordering(request) or self._get_default_ordering())
        if ORDER_VAR in params:
            # Clear ordering and used params
            ordering = []
            order_params = params[ORDER_VAR].split('.')
            for p in order_params:
                try:
                    none, pfx, idx = p.rpartition('-')
                    field_name = self.list_display[int(idx)]
                    order_field = self.get_ordering_field(field_name)
                    if not order_field:
                        continue  # No 'admin_order_field', skip it
                    # reverse order if order_field has already "-" as prefix
                    if order_field.startswith('-') and pfx == "-":
                        ordering.append(order_field[1:])
                    else:
                        ordering.append(pfx + order_field)
                except (IndexError, ValueError):
                    continue  # Invalid ordering specified, skip it.
        # Add the given query's ordering fields, if any.
        ordering.extend(queryset.query.order_by)
        # Unlike the stock implementation, we deliberately do NOT append '-pk'
        # here, so no extra primary-key sort is added:
        # pk_name = self.lookup_opts.pk.name
        # if not (set(ordering) & {'pk', '-pk', pk_name, '-' + pk_name}):
        #     # The two sets do not intersect, meaning the pk isn't present. So
        #     # we add it.
        #     ordering.append('-pk')
        return ordering
Then in your ModelAdmin just override get_changelist:
class MyAdmin(ModelAdmin):
    def get_changelist(self, request, **kwargs):
        return NoPkChangeList
The answer of 7Wonders can be reduced to the following statements because only ChangeList._get_deterministic_ordering() needs to change:
# admin.py
class MyAdmin(ModelAdmin):
    def get_changelist(self, request, **kwargs):
        """Improve changelist query speed by disabling deterministic ordering.

        Please be aware that this might disturb pagination.
        """
        from django.contrib.admin.views.main import ChangeList

        class NoDeterministicOrderChangeList(ChangeList):
            def _get_deterministic_ordering(self, ordering):
                return ordering

        return NoDeterministicOrderChangeList
I am using a piece of code in two separate places in order to dynamically generate some form fields. In both cases, dynamic_fields is a dictionary where the keys are objects and the values are lists of objects (in the event of an empty list, the value is False instead):
class ExampleForm(forms.ModelForm):
    def __init__(self, *args, **kwargs):
        dynamic_fields = kwargs.pop('dynamic_fields')
        super(ExampleForm, self).__init__(*args, **kwargs)
        for key in dynamic_fields:
            if dynamic_fields[key]:
                self.fields[key.description] = forms.ModelMultipleChoiceField(
                    widget=forms.CheckboxSelectMultiple,
                    queryset=dynamic_fields[key],
                    required=False,
                )

    class Meta:
        model = Foo
        fields = ()
In one view, for any key the value is a list of objects returned with a single DB query - a single, normal queryset. This view works just fine.
In the other view, it takes multiple queries to get everything I need to construct a given value. I first instantiate the dictionary with the values set to empty lists, then add the querysets I get from these multiple queries to the appropriate lists one at a time with basic list concatenation (dict[key] += queryset). I then flatten each value and remove duplicates by doing:
for key in dict:
    dict[key] = list(set(dict[key]))
I have tried this several different ways - directly appending the queries in each queryset to the values/lists, leaving it as a list of lists, using append instead of += - but I get the same error every time: 'list' object has no attribute 'none'.
Looking through the traceback, the error is coming up in the form's clean method. This is the relevant section from the code in django.forms.models:
def clean(self, value):
    if self.required and not value:
        raise ValidationError(self.error_messages['required'], code='required')
    elif not self.required and not value:
        return self.queryset.none()  # the offending line
My thought process so far: in my first view, I'm generating the list that serves as the value for each key via a single query, but I'm combining multiple queries into a list in my second view. That list doesn't have a none method like I would normally have with a single queryset.
How do I combine multiple querysets without losing access to this method?
I found this post, but I'm still running into the same issue using itertools.chain as suggested there. The only thing I've been able to accomplish with that is changing the error to say 'chain' or 'set' object has no attribute 'none'.
Edit: here's some additional information about how the querysets are generated. I have the following models (only relevant fields are shown):
class Profile(models.Model):
    user = models.OneToOneField(User)
    preferred_genres = models.ManyToManyField(Genre, blank=True)

class Genre(models.Model):
    description = models.CharField(max_length=200, unique=True)
    parent = models.ForeignKey("Genre", null=True, blank=True)

class Trope(models.Model):
    description = models.CharField(max_length=200, unique=True)
    genre_relation = models.ManyToManyField(Genre)
In (the working) view #1, the dictionary I use to generate my fields has keys equal to a certain Genre, and values equal to a list of Genres for whom the key is a parent. In other words, for every key, the queryset is Genre.objects.filter(parent=key, **kwargs).
In the non-functional view #2, we need to start with the profile's preferred_genres field. For every preferred_genre I need to pull the associated Tropes and combine them into a single queryset. Right now, I am looping through preferred_genres and doing something like this:
for g in preferred_genres:
    tropeset = g.trope_set.all()
This gets me a bunch of individual querysets containing the information I need, but I can't find a way to combine the multiple tropesets into one big queryset (as opposed to a list without the none attribute). (As an aside, this also hammers my database with a bunch of queries. I am also trying to wrap my head around how I can maybe use prefetch_related to reduce the number of queries, but one thing at a time.)
If I can't combine these querysets into one but CAN somehow accomplish these lookups with a single query, I am all ears! I am now reading the documentation regarding complex queries with the Q object. It is tantalizing - I can conceptualize how this would accomplish what I'm looking for, but only if I can call all of the queries at one time. Since I have to call them iteratively one at a time, I am not sure how to use the Q object to | or & them together.
You can combine querysets by using the | and & operators.
from functools import reduce
from operator import and_, or_
querysets = [q1, q2, q3, ...] # List of querysets you want to combine.
# Objects that are present in *at least one* of the queries
combined_or_querysets = reduce(or_, querysets[1:], querysets[0])
# Objects that are present in *all* of the queries
combined_and_querysets = reduce(and_, querysets[1:], querysets[0])
From Django 1.11+ you can also use the union and intersection methods.
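For reference, a short sketch of those methods applied to the same list (note that only a limited set of further operations is allowed on the combined queryset):

# objects present in at least one queryset (SQL UNION)
combined_union = querysets[0].union(*querysets[1:])

# objects present in all querysets (SQL INTERSECT)
combined_intersection = querysets[0].intersection(*querysets[1:])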
I've found a "solution" to this problem. If I structure the query like so, I can get everything I need in one swoop without having to combine querysets after the fact:
desired_value = Trope.objects.filter(genre_relation__in=preferred_genres).distinct()
I still do not know how to combine multiple querysets into one without losing the inherent "queryset-ness" that seems to be necessary for the form to render properly. However, for my specific use case, restructuring the query as noted renders the issue moot.
I have a model with a unique integer that needs to increment with regards to a foreign key, and the following code is how I currently handle it:
from django.core.exceptions import ObjectDoesNotExist

class MyModel(models.Model):
    business = models.ForeignKey(Business)
    number = models.PositiveIntegerField()
    spam = models.CharField(max_length=255)

    class Meta:
        unique_together = (('number', 'business'),)

    def save(self, *args, **kwargs):
        if self.pk is None:  # new instances only
            try:
                highest_number = (MyModel.objects
                                  .filter(business=self.business)
                                  .latest('number')
                                  .number)
                self.number = highest_number + 1
            except ObjectDoesNotExist:  # first MyModel instance for this business
                self.number = 1
        super(MyModel, self).save(*args, **kwargs)
I have the following questions regarding this:
Multiple people can create MyModel instances for the same business, all over the internet. Is it possible for 2 people to create MyModel instances at the same time, have the highest-number lookup return 500 for both, and then have both try to set self.number = 501 at the same time (raising an IntegrityError)? The answer seems like an obvious "yes, it could happen", but I had to ask.
Is there a shortcut, or "Best way" to do this, which I can use (or perhaps a SuperAutoField that handles this)?
I can't just slap a while model_not_saved: try:, except IntegrityError: in, because other constraints in the model could lead to an endless loop, and a disaster worse than Chernobyl (maybe not quite that bad).
You want that constraint at the database level. Otherwise you're going to eventually run into the concurrency problem you discussed. The solution is to wrap the entire operation (read, increment, write) in a transaction.
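For illustration, a minimal sketch of that idea (an assumption-laden sketch, not the answerer's code: it locks the parent Business row with select_for_update as a mutex, and requires a Django/database combination that supports select_for_update):

from django.db import transaction

# inside MyModel
def save(self, *args, **kwargs):
    if self.pk is None:  # new instances only
        with transaction.atomic():
            # lock the parent row so concurrent saves for the same business
            # serialize the read-increment-write sequence
            Business.objects.select_for_update().get(pk=self.business_id)
            highest = (MyModel.objects
                       .filter(business=self.business)
                       .order_by('-number')
                       .values_list('number', flat=True)
                       .first())
            self.number = (highest or 0) + 1
            super(MyModel, self).save(*args, **kwargs)
    else:
        super(MyModel, self).save(*args, **kwargs)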
Why can't you use an AutoField instead of a PositiveIntegerField?
number = models.AutoField()
However, in this case number is almost certainly going to equal yourmodel.id, so why not just use that?
Edit:
Oh, I see what you want. You want a number field that increments per business, i.e. one that only goes up when there is more than one MyModel instance for the same business.
I would still recommend just using the id field if you can, since it's certain to be unique. If you absolutely don't want to do that (maybe you're showing this number to users), then you will need to wrap your save method in a transaction.
You can read more about transactions in the docs:
http://docs.djangoproject.com/en/dev/topics/db/transactions/
If you're just using this to count how many instances of MyModel have a FK to Business, you should do that as a query rather than trying to store a count.
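For example, a quick sketch of counting on the fly instead of storing the number (some_business is a placeholder, and 'mymodel' is the default reverse name Django generates for the FK):

from django.db.models import Count

# how many MyModel rows point at a particular business
MyModel.objects.filter(business=some_business).count()

# or annotate every Business with its MyModel count in one query
Business.objects.annotate(mymodel_count=Count('mymodel'))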