Django 1.10 Delete large cascade querySet - python

I'm trying to delete all objects from a large queryset. Here is my models.py:
from __future__ import unicode_literals

from django.contrib.auth.models import User
from django.db import models


class Fund(models.Model):
    name = models.CharField(max_length=255, blank=False, null=False)
    start_date = models.DateField(default=None, blank=False, null=False)

    def __unicode__(self):
        return self.name


class FundData(models.Model):
    fund = models.ForeignKey(Fund, on_delete=models.CASCADE)
    date = models.DateField(default=None, blank=False, null=False)
    value = models.FloatField(default=None, blank=True, null=True)

    def __unicode__(self):
        return "{} --- Data: {} --- Value: {}".format(self.fund, self.date, self.value)
But when I try to delete all records, the query takes too long and MySQL times out.
Fund.objects.all().delete()
What is the best way to manage this operation inside a view?
Is there a way to do it by calling a Django command from the terminal?

First of all, you can change the MySQL timeout by editing settings.py:
DATABASES = {
    'default': {
        ...
        'OPTIONS': {
            'connect_timeout': 5,  # your timeout, in seconds
        },
        ...
    }
}
The reasons why .delete() may be slow are:
Django has to ensure cascade deleting functions properly, so it has to look for foreign keys pointing at your models
Django has to handle the pre_delete and post_delete signals for your models
If you are sure that your models don't have cascade deleting or any signals to be handled, you can try the private _raw_delete method as follows:
queryset._raw_delete(queryset.db)
You can find more details on it here.
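For the models in the question, a minimal sketch (assuming no signal receivers are attached to either model) raw-deletes the child rows first, so the foreign key constraint is never violated:
fund_data = FundData.objects.all()
fund_data._raw_delete(fund_data.db)   # single DELETE, no cascade collection

funds = Fund.objects.all()
funds._raw_delete(funds.db)           # safe now that no FundData points here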

In the end I found a simple solution: instead of one big cascade delete, I break the operation down and delete the Fund objects one by one. Because it is a really long operation (approximately 1 second per object, for thousands of objects) I assigned it to a Celery worker. Slow but safe, I think; if someone has a better solution, please let me know!
from celery import shared_task


@shared_task
def reset_funds():
    for fund in Fund.objects.all():
        print "Delete Fund: {}".format(fund.name)
        fund.delete()
    return "All Funds Deleted!"

Related

How do I use django GenericForeignKeys?

I have the following two models:
class URLResource(models.Model):
    class ResourceType(models.TextChoices):
        VID = 'VID', _('Video')
        DOC = 'DOC', _('Document')
        UNK = 'UNK', _('Unknown')  # assumed label; referenced as the default below

    resource_type = models.CharField(
        choices=ResourceType.choices,
        default=ResourceType.UNK,
        max_length=3)
    title = models.CharField(max_length=280, null=True)
    url = models.URLField()
    created = models.DateTimeField(auto_now_add=True)

    def __repr__(self):
        return f'URLResource {self.resource_type}: {self.title} URL: {self.url}'


class Record(models.Model):
    user = models.ForeignKey(
        settings.AUTH_USER_MODEL,
        ...)
    date = models.DateTimeField(default=now, blank=True)
    # <resource>: I want to point to a resource somehow

    def __repr__(self):
        return f'Record {{ User: {self.user}, Resource: }}'
My use case is such that a lot of users can create Records and list them, but since the underlying URLResource is going to be the same, I thought I could use a many-to-one relationship from Record to URLResource so the underlying resources would remain the same. However, even though I have only one type of resource now, i.e. URLResource, I also want to roll out other resources like VideoResource, PhysicalResource, etc. Clearly, I can't simply use the many-to-one relationship. So I figured I could use Django's contenttypes framework, but I still can't get things to work.
I want to be able to do at least these two things:
When creating a Record, I can easily assign it a URLResource (or any other Resource), but only one resource.
I can easily access the URLResource fields, for example Record.objects.get(user=x).resource.resource_type.
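This is roughly the contenttypes setup I had in mind (a minimal sketch; the content_type/object_id field names follow the framework's convention, and the on_delete choices are my guesses):
from django.conf import settings
from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType
from django.db import models
from django.utils.timezone import now


class Record(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    date = models.DateTimeField(default=now, blank=True)
    # Exactly one resource of any registered type per Record.
    content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    object_id = models.PositiveIntegerField()
    resource = GenericForeignKey('content_type', 'object_id')
With this, record.resource returns the concrete URLResource instance, so record.resource.resource_type works; note that you cannot filter a queryset across a GenericForeignKey directly, only via content_type and object_id.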

What is the optimal way to store 100k daily database inserts in Django?

My problem is that Django inserts entries way too slowly (I didn't even time it, but it was more than 5 minutes) for 100k entries from a Pandas CSV file. What I am doing is parsing the CSV file and then saving those objects to PostgreSQL in Django. It is going to be a daily cronjob, with CSV files that differ for most of the entries (some can be duplicates from the previous days, or the owner could be the same).
I haven't tried raw queries, but I don't think that would help much.
I am really stuck at this point, honestly. Apart from some iteration tweaks and using a generator rather than an iterator, I cannot improve the insertion time.
class TrendType(models.Model):
    """ Describes the report type (posts, videos, subjects). """
    TREND_TYPE = Choices('video', 'posts', 'owners')  # << mnemonic

    title = models.CharField(max_length=50)
    mnemonic = models.CharField(choices=TREND_TYPE, max_length=30)


class TrendSource(models.Model):
    """ Source of the report (file). """
    trend_type = models.ForeignKey(TrendType, on_delete=models.CASCADE)
    load_date = models.DateTimeField()
    filename = models.CharField(max_length=100)


class TrendOwner(models.Model):
    """ Data owner type (group, user, etc.). """
    TREND_OWNERS = Choices('group', 'user', '_')

    title = models.CharField(max_length=50)
    mnemonic = models.CharField(choices=TREND_OWNERS, max_length=30)


class Owner(models.Model):
    """ Details about the owner. """
    link = models.CharField(max_length=255)
    name = models.CharField(max_length=255)
    trend_type = models.ForeignKey(TrendType, on_delete=models.CASCADE)
    trend_owner = models.ForeignKey(TrendOwner, on_delete=models.CASCADE)


class TrendData(models.Model):
    """ Model wrapping all the data. """
    owner = models.ForeignKey(Owner, on_delete=models.CASCADE)
    views = models.IntegerField()
    views_u = models.IntegerField()
    likes = models.IntegerField()
    shares = models.IntegerField()
    interaction_rate = models.FloatField()
    mean_age = models.IntegerField()
    source = models.ForeignKey(TrendSource, on_delete=models.CASCADE)
    date_trend = models.DateTimeField()  # << take it as a date
Basically, I would love a good solution for 'fast' insertion into the database. Is that even possible given these models?
Maybe you don't need an ORM here? You can try to implement a simple wrapper around the typical SQL requests.
Use bulk reads/writes, e.g. bulk_create() in the Django ORM, or in your wrapper; see the sketch below.
Check https://docs.djangoproject.com/en/2.2/topics/db/optimization/
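A minimal sketch of the bulk approach (the import function, CSV column names, and app path are assumptions; owners are prefetched so the loop does no per-row queries):
import pandas as pd

from myapp.models import Owner, TrendData  # hypothetical app path


def import_trends(csv_path, source):
    df = pd.read_csv(csv_path)
    # One query for all owners instead of one per row (keyed by link here).
    owners = {owner.link: owner for owner in Owner.objects.all()}
    rows = [
        TrendData(
            owner=owners[rec['owner_link']],
            views=rec['views'],
            views_u=rec['views_u'],
            likes=rec['likes'],
            shares=rec['shares'],
            interaction_rate=rec['interaction_rate'],
            mean_age=rec['mean_age'],
            source=source,
            date_trend=rec['date_trend'],
        )
        for rec in df.to_dict('records')
    ]
    # One multi-row INSERT per batch instead of 100k single INSERTs.
    TrendData.objects.bulk_create(rows, batch_size=5000)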
The problem is not with Django but rather with PostgreSQL itself. My suggestion would be to change your backend. PostgreSQL is good at UPDATE-heavy workloads, but there are better DBs for INSERT-heavy ones (see PostgreSQL vs TimescaleDB). However, I don't think there is a Django ORM for TimescaleDB.
My suggestion would be to use Redis. Its primary use is as an in-memory cache, but you can make it persist your data too. There is also an ORM for Python with Redis, called rom.

Multiple datetime fields for a model in Django

I want to create a log system to register some faults I need to handle at my work. I use Django and my models look like this:
class Chan(models.Model):
    channelname = models.CharField(max_length=30)
    freq = models.FloatField(default=0.0)

    def __unicode__(self):
        return u'%s' % (self.channelname)


# timestamp object
class EventTime(models.Model):
    since = models.DateTimeField()
    till = models.DateTimeField()

    def __unicode__(self):
        return u'%s' % self.since.strftime('%Y-%m-%d %H:%M')


class Fault(models.Model):
    channel = models.ManyToManyField(Chan)
    description = models.CharField(max_length=200, default="-")
    message = models.TextField()
    timeevent = models.ManyToManyField(EventTime, null=True)
    visible = models.BooleanField()
At first I used just one EventTime object, but I soon realized I needed to be able to choose several time periods, because the same event could happen several times a day, and it would be too tedious to create a new Fault record each time. So I basically needed something like this:
The problem is that ManyToManyField is too unwieldy to use here, because I don't need to keep these values for other faults. So I don't know what solution to use, and I don't know how many time periods I will need. Maybe I could add an extra text field to my table where I would keep comma-separated datetime objects converted into a string like '2017-11-06 18:36,2017-11-06 18:37'. But I don't know where to put this extra conversion, because I want to use a standard DateTimeField in the Django admin site to set the value before the conversion happens. Or maybe I could change the interface itself and add some JavaScript. Maybe someone could give me advice or share some useful links. Thank you.
I would recommend using a many-to-one relation together with InlineModelAdmin for the Django admin.
models.py
class Fault(models.Model):
    channel = models.ManyToManyField(Chan)
    description = models.CharField(max_length=200, default="-")
    message = models.TextField()
    visible = models.BooleanField()


class EventTime(models.Model):
    since = models.DateTimeField()
    till = models.DateTimeField()
    fault = models.ForeignKey(Fault, on_delete=models.CASCADE, related_name='timeevents')

    def __unicode__(self):
        return u'%s' % self.since.strftime('%Y-%m-%d %H:%M')
admin.py
from django.contrib import admin

from .models import EventTime, Fault


class EventTimeInline(admin.TabularInline):
    model = EventTime


@admin.register(Fault)
class FaultAdmin(admin.ModelAdmin):
    # ...
    inlines = [EventTimeInline]

Django `bulk_create` with related objects

I have a Django system that runs billing for thousands of customers on a regular basis. Here are my models:
class Invoice(models.Model):
    balance = models.DecimalField(
        max_digits=6,
        decimal_places=2,
    )


class Transaction(models.Model):
    amount = models.DecimalField(
        max_digits=6,
        decimal_places=2,
    )
    invoice = models.ForeignKey(
        Invoice,
        on_delete=models.CASCADE,
        related_name='invoices',
        null=False,
    )
When billing is run, thousands of invoices with tens of transactions each are created using several nested for loops, which triggers an insert for each created record. I could run bulk_create() on the transactions for each individual invoice, but this still results in thousands of calls to bulk_create().
How would one bulk-create thousands of related models so that the relationship is maintained and the database is used in the most efficient way possible?
Notes:
I'm looking for a native Django solution that would work on all databases (with the possible exception of SQLite).
My system runs billing in a celery task to decouple long-running code from active requests, but I am still concerned with how long it takes to complete a billing cycle.
The solution should assume that other requests or running tasks are also reading from and writing to the tables in question.
You could bulk_create all the Invoice objects, refresh them from the db so that they all have ids, create the Transaction objects for all the invoices, and then also save them with bulk_create. All of this can be done inside a single transaction.atomic context.
Also, specifically for Django 1.10 and Postgres, look at this answer.
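On PostgreSQL, bulk_create() sets the primary keys on the objects it creates, so the "refresh from db" step collapses away. A minimal sketch (assuming pending holds pairs of an unsaved Invoice and the amounts of its transactions):
from django.db import transaction

# `pending` is assumed: a list of (unsaved Invoice, [transaction amounts]) pairs
with transaction.atomic():
    Invoice.objects.bulk_create([inv for inv, _ in pending])  # PKs are set on PostgreSQL
    txns = [
        Transaction(amount=amt, invoice=inv)
        for inv, amounts in pending
        for amt in amounts
    ]
    Transaction.objects.bulk_create(txns)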
You can do it with two bulk_create queries, with the following method:
new_invoices = []
new_transactions = []

for invoice_params in ...:  # whatever drives your billing loop
    invoice = Invoice(**invoice_params)
    new_invoices.append(invoice)
    for transaction_params in ...:
        transaction = Transaction(**transaction_params)
        transaction.invoice = invoice
        new_transactions.append(transaction)

Invoice.objects.bulk_create(new_invoices)
for each in new_transactions:
    each.invoice_id = each.invoice.id
Transaction.objects.bulk_create(new_transactions)
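One caveat worth adding: bulk_create() only fills in the primary keys of the created objects on backends that can return them (PostgreSQL in particular), so the each.invoice.id assignment above relies on that; on MySQL the ids would still be None and you would need to re-fetch the invoices instead.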
Another way to do it is shown in the snippet below:
from django.db import transaction

new_invoices = []
new_transactions = []

for sth in sth_else:
    ...
    invoice = Invoice(params)
    new_invoices.append(invoice)

with transaction.atomic():
    other_invoice_ids = Invoice.objects.values_list('id', flat=True)
    Invoice.objects.bulk_create(new_invoices)
    created_ids = Invoice.objects.exclude(id__in=other_invoice_ids).values_list('id', flat=True)
    for invoice_id in created_ids:
        # note: named txn so it doesn't shadow the imported `transaction` module
        txn = Transaction(params, invoice_id=invoice_id)
        new_transactions.append(txn)
    Transaction.objects.bulk_create(new_transactions)
I based this answer on this post from another question in the community.

Django Admin filter by function / filter only by first object in reverse foreign key lookup

I am trying to build a filter-by-function in Django. From what I've learned by googling, this is quite hard to achieve.
So here is my code:
class TrackingEventType(models.Model):
    name = models.CharField(blank=False, null=False, max_length=255)


class TrackingEvent(models.Model):
    datetime = models.DateTimeField(blank=False, null=False, default=datetime.now, verbose_name="Point in time")
    event_type = models.ForeignKey(TrackingEventType, help_text="Type of event")
    tracking = models.ForeignKey('Tracking')

    class Meta:
        ordering = ['-datetime']


class Tracking(models.Model):
    tracking_no = models.CharField(blank=False, null=False, max_length=10, unique=True, verbose_name="Tracking number")

    def get_last_event(self):
        """
        Todo: return the latest event.
        """
        return TrackingEvent.objects.filter(tracking=self.id).first()
    get_last_event.short_description = 'Last event'

    class Meta:
        ordering = ['-tracking_no']
My goal is to make it possible to filter Tracking objects by the type name of their last event. Displaying the result of the function in the Django admin is easy, but adding a corresponding filter isn't.
My idea was also to try to build a filter lookup, something like:
trackingevent__set__first__event_type__name
But yeah, that would be too easy :)
Any inputs are welcome.
As you've discovered, it isn't trivial to filter in that manner. If you are accessing that information regularly, it is probably not very efficient either.
I would suggest that you store a reference to the latest tracking event in the Tracking model itself:
class Tracking(models.Model):
    # ...
    last_event = models.ForeignKey(TrackingEvent, null=True)
You would then use signals to update this reference whenever a new tracking event is created. Something along the lines of:
from django.db.models.signals import post_save
from django.dispatch import receiver


@receiver(post_save, sender=TrackingEvent)
def update_latest_tracking_event(sender, instance, created, **kwargs):
    # Is this a new event?
    if created:
        # If yes, then update the Tracking reference
        tracking = instance.tracking
        tracking.last_event = instance
        tracking.save()
(Please read the documentation on where to put this code).
Once all this is in place it becomes easy to filter based on the last tracking event type:
# I'm just guessing what event types you have...
cancellation = TrackingEventType.objects.get(name='cancel')
Tracking.objects.filter(last_event__event_type=cancellation)
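And in the admin, the denormalized field also makes the original goal a one-liner, since list_filter can span relations with the __ lookup (a sketch, reusing the admin module style from above):
@admin.register(Tracking)
class TrackingAdmin(admin.ModelAdmin):
    list_filter = ('last_event__event_type',)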
