Iterating over a Django QuerySet while deleting objects in the same QuerySet - python

I am wondering what is the best way to iterate over a Django QuerySet while deleting objects within the Queryset? For example, say you have a log table with entries at specific times, and you wanted to archive them so that there is no more than 1 entry every 5 minutes. I know this may be wrong, but this is kind of what I am going for:
import datetime

toarchive = Log.objects.all().order_by("-date")
start = toarchive[0].date
interval = start - datetime.timedelta(minutes=5)
for entry in toarchive[1:]:
    if entry.date > interval:
        entry.delete()
    else:
        interval = entry.date - datetime.timedelta(minutes=5)

So I guess I have answered my own question by asking it, if anyone else is curious. I thought there would have been a problem when deleting objects while iterating over them, but there isn't. The code snippet in the question is the right way to do it.

QuerySets have a delete() method that deletes every row the QuerySet matches in bulk. For the example you gave,
toarchive.filter(date__gt=interval).delete()
will work. If you're doing a test that can't be done in a filter, however, the method you described is probably best.
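One way to combine both answers is to decide which rows to drop in plain Python first and delete them in bulk afterwards. The following is only a sketch of the selection step from the question; the function name entries_to_delete and the (id, date) pair format are assumptions made for illustration, and the Django call is left as a comment because it needs a configured project.

```python
import datetime


def entries_to_delete(entries, minutes=5):
    """Return ids of entries to drop so that kept entries are at least `minutes` apart.

    `entries` is an iterable of (id, date) pairs sorted newest first,
    mirroring Log.objects.order_by("-date") in the question.
    """
    gap = datetime.timedelta(minutes=minutes)
    doomed = []
    cutoff = None  # entries newer than this are too close to the last kept one
    for entry_id, date in entries:
        if cutoff is not None and date > cutoff:
            doomed.append(entry_id)
        else:
            # Keep this entry and start a new window below it.
            cutoff = date - gap
    return doomed


base = datetime.datetime(2020, 1, 1, 12, 0)
sample = [
    (i, base - datetime.timedelta(minutes=m))
    for i, m in enumerate([0, 1, 4, 5, 7, 11])
]
doomed_ids = entries_to_delete(sample)

# With Django available, the deletion could then be a single query:
# Log.objects.filter(pk__in=doomed_ids).delete()
```

Separating the decision from the deletion also makes the rule easy to test without a database.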

Related

Efficient way of dividing a querySet with a filter, while keeping all data?

I have a 'Parts' model, and these parts are either linked to a 'Device' model or not yet. The actual "link" is done via more than just one ForeignKey, i.e. I have to go through 3 or 4 Models all linked between each other with ForeignKeys to finally get the data I want.
My question is: What is the most efficient way of getting both the linked and non-linked parts ?
Right now, I am getting all parts and simply outputting that, but I would like a little separation:
allParts = Parts.objects.all()
I know I could do something similar to this:
allParts = Parts.objects.all()
linkedParts = allParts.filter(...device_id=id)
nonLinkedParts = allParts.exclude(...device_id__in=[o.id for o in linkedParts])
But is that really the most efficient solution ? I feel like there would be a better way, but I have not yet found anything in the docs about it.
Just to clarify, there are only linked, and non-linked parts. These are mutually exclusive and exhaustive.
Thank you very much
If you are only interested in obtaining the elements, for example to iterate over them, you can work with two lists:
allParts = Parts.objects.all()
linkedParts = []
nonLinkedParts = []
for part in allParts:
    if part.device_id == id:
        linkedParts.append(part)
    else:
        nonLinkedParts.append(part)
Since these are lists, you can no longer (efficiently) filter further or order by a specific condition. If you want to order by a certain field, do that in the allParts database query.
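The single-pass split above can be wrapped in a small reusable helper. This is a sketch only: partition is a hypothetical name, and SimpleNamespace objects stand in for model instances so the example runs without Django.

```python
from types import SimpleNamespace


def partition(items, predicate):
    """Split items into (matching, rest) in one pass, preserving order."""
    matching, rest = [], []
    for item in items:
        (matching if predicate(item) else rest).append(item)
    return matching, rest


# Stand-ins for Parts rows; device_id is None when the part is not linked.
parts = [
    SimpleNamespace(name="a", device_id=1),
    SimpleNamespace(name="b", device_id=None),
    SimpleNamespace(name="c", device_id=1),
]
linked, non_linked = partition(parts, lambda p: p.device_id == 1)
```

Because the iterable is walked once, a QuerySet passed in would be evaluated with a single query.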

How to delete first N items from queryset in django

I'm looking to delete only the first N results returned from a query in django. Following the django examples here which I found while reading this SO answer, I was able to limit the resulting set using the following code
m = Model.objects.all()[:N]
but attempting to delete it generates the following error
m.delete()
AssertionError: Cannot use 'limit' or 'offset' with delete.
Is there a way to accomplish this in django?
You cannot delete through a limit. Most databases do not support this.
You can however accomplish this in two steps, like:
Model.objects.filter(id__in=list(Model.objects.values_list('pk', flat=True)[:N])).delete()
We thus first retrieve the primary keys of the first N elements, and then use this in a .filter(..) part to delete those items in bulk.
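The same two-step idea can be tried at the SQL level with the standard-library sqlite3 module. This is an illustrative sketch, not what Django generates: the table name item and its columns are made up, and an explicit ORDER BY is added so that "first N" is well defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO item (name) VALUES (?)",
    [(f"row{i}",) for i in range(1, 11)],
)

N = 3

# Step 1: fetch the primary keys of the first N rows.
pks = [row[0] for row in conn.execute(
    "SELECT id FROM item ORDER BY id LIMIT ?", (N,)
)]

# Step 2: delete exactly those rows by primary key.
placeholders = ",".join("?" * len(pks))
conn.execute(f"DELETE FROM item WHERE id IN ({placeholders})", pks)

remaining = [row[0] for row in conn.execute("SELECT id FROM item ORDER BY id")]
```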
There is no direct option, so you have to work around it. For example:
not_ideal = Model.objects.all()[N:].values_list("id", flat=True)
Model.objects.exclude(pk__in=list(not_ideal)).delete()
This finds the objects you want to keep and deletes everything except them.
You can use any field besides id, but id is unique and indexed, which keeps the query efficient.
Note that the first line selects the items from N to the end, not the first N.
Try looping through the sliced queryset and deleting each object. Note that this issues one DELETE per object:
deletable_objects = Model.objects.all()[:N]
for m in deletable_objects:
    m.delete()
You can loop through the queryset and call the delete method on each object.
for obj in m:
    obj.delete()

Length of _QueryIterator

I'm trying to get the length of the result of the following query:
matchingTitles = db.GqlQuery("SELECT * FROM Post WHERE title=:1",title).run()
I tried doing this:
if(len(matchingTitles)>0):
But I get the following error:
TypeError: object of type '_QueryIterator' has no len()
I've been searching all over for the _QueryIterator object docs, but can't seem to find any. Instead, I just iterated over it and incremented a counter for each item in the set. Wondering if there is a better way...
Thanks!
EDIT
There's a better way to do this. Instead of running and then counting, you can simply do:
matchingTitles = db.GqlQuery("SELECT * FROM Post WHERE title=:1",title).count()
and it returns the number of entities.
This can take a lot of memory, but you could use itertools.tee:
https://docs.python.org/2/library/itertools.html#itertools.tee
For anyone that comes across this question actually looking for the length of a _QueryIterator, you can try:
len(list(matchingTitles)) # This will load all the results into memory before counting.
# OR
sum([1 for _ in matchingTitles])
As mentioned though - it's usually better / faster / cheaper to use the database's count functionality than loading all the records and iterating over them. There may be a reason you can't use that - in which case those two options are available.
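The two fallbacks behave the same for any Python iterator, so they can be tried on a plain generator. This sketch uses a made-up list of titles in place of query results; count_items is a hypothetical helper name. It also shows why itertools.tee was suggested: counting consumes the iterator.

```python
import itertools


def count_items(iterator):
    """Count items without building a list; this consumes the iterator."""
    return sum(1 for _ in iterator)


results = (title for title in ["a", "b", "c"])
total = count_items(results)
leftover = list(results)  # nothing is left after counting

# tee gives two independent iterators over the same results, at the cost
# of buffering the items that one copy has seen and the other has not.
first, second = itertools.tee(iter(["a", "b", "c"]))
tee_total = count_items(first)
rows = list(second)
```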

Search not exists in Django

I'm really new to django programming, and I'm facing a problem I don't really know how to solve:
I want to get a list of users who have many string attributes, but only the users for whom none of their attributes is equal to a given one.
I have this piece of code
all_users = list(UserProfile.objects.attribute.filter(type=given).exists())
but this code returns the users who have that attribute, so here's the question: how can I modify this line (or what lines do I need to add) in order to get the list of users without this attribute?
PS: Maybe I didn't explain myself clearly, as I don't really know how to describe my problem in English, but if you don't know what I'm asking, I can try again.
Thanks all
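You can use exclude. Note that exists() returns a boolean, so it has to be dropped to get the users themselves. Assuming the attributes are reachable from UserProfile through a relation named attribute (a guess based on the question's code):
all_users = list(UserProfile.objects.exclude(attribute__type=given))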
To quote the docs:
To create such a subset, you refine the initial QuerySet, adding filter conditions. The two most common ways to refine a QuerySet are:
filter(**kwargs)
Returns a new QuerySet containing objects that match the given lookup parameters.
exclude(**kwargs)
Returns a new QuerySet containing objects that do not match the given lookup parameters.
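The difference between the two selections can be seen on plain data. This is a sketch of the intended result only, not of how Django builds the query; the profiles dictionary and its values are invented for illustration.

```python
# Each user maps to the list of attribute types they have.
profiles = {
    "alice": ["red", "blue"],
    "bob": ["green"],
    "carol": [],
}
given = "red"

# filter-like selection: users with at least one matching attribute.
with_attr = [name for name, attrs in profiles.items() if given in attrs]

# exclude-like selection: users with no matching attribute at all.
without_attr = [name for name, attrs in profiles.items() if given not in attrs]
```

A user with no attributes at all ends up in the second list, which matches the "none of their attributes" wording in the question.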

Make a query to get the latest objects, excluding duplicates of obj.x [duplicate]

Possible Duplicate:
Django, What's the best ,fastest way to get only first and last element from something, Customer.objects.xxxx
Hmm this is hard to explain, but this is what's happening..
I have a model A that has x and date
There are multiple A's with the same x and different dates.
From:
A.x = 1 (newest)
A.x = 1
A.x = 1 (oldest)
A.x = 2
I want to get only the newest of each x, in this case the first one, and the last one, excluding the older x=1 duplicates.
I've thought of doing some nasty loops or using itertools, but I'm not sure what's the best way to achieve this.
Any help is appreciated.
Starting with Django 1.4, the .distinct() queryset method accepts field names (on PostgreSQL only). The order_by() call must start with the same fields as distinct(),
so I guess you can do something like this:
A.objects.order_by('x', '-date').distinct('x')
You can read more about it here.
You can use the latest() queryset method. It's basically the same as using .order_by('-field')[0], so it returns a single object rather than one per x.
This is nice, since I always forget whether I need the "-" when using order_by() with dates.
latest_object = Model.objects.latest("date/datetime fieldname")
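Where distinct() with field names is not available, the newest row per x can also be picked in Python after fetching the rows. The sketch below is an assumption-laden illustration: newest_per_x is a hypothetical helper, and SimpleNamespace objects stand in for instances of model A with fields x and date.

```python
import datetime
from types import SimpleNamespace


def newest_per_x(rows):
    """Keep only the row with the latest date for each distinct x."""
    newest = {}
    for row in rows:
        current = newest.get(row.x)
        if current is None or row.date > current.date:
            newest[row.x] = row
    return list(newest.values())


rows = [
    SimpleNamespace(x=1, date=datetime.date(2020, 1, 2)),
    SimpleNamespace(x=1, date=datetime.date(2020, 1, 3)),  # newest x=1
    SimpleNamespace(x=1, date=datetime.date(2020, 1, 1)),
    SimpleNamespace(x=2, date=datetime.date(2020, 1, 2)),
]
latest_rows = newest_per_x(rows)
```

This loads every row into memory, so it only makes sense for small tables.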
