I have a huge amount of data in my DB.
I cannot use the .delete() method because the performance of the Django ORM is insufficient in my case.
The _raw_delete() method suits me because I can do it in Python instead of using raw SQL.
But I have a problem: I have no idea how I can delete the relation tables using _raw_delete(). They need to be deleted before the models because I have a RESTRICT constraint in the DB. Any ideas how I can achieve this?
I have found a solution.
You can operate on the link (through) model like this:
# Resolve the auto-created through model behind the M2M field
link_model = MyModel._meta.get_field('my_m2m_field').remote_field.through
qs = link_model.objects.filter(mymodel_id__in=mymodel_ids)
qs._raw_delete(qs.db)
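With the link rows gone, the parent rows can be removed the same way. A minimal sketch, assuming mymodel_ids holds the primary keys being purged:

parent_qs = MyModel.objects.filter(pk__in=mymodel_ids)
parent_qs._raw_delete(parent_qs.db)  # bypasses the ORM delete collector

Keep in mind that _raw_delete() is a private API, so signal handlers and cascade behaviour are skipped entirely.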
I have this Excel sheet which can have as many as 3000-5000 (or maybe more) rows of data (email IDs), and I need to insert each of these into my DB one by one. I have been asked not to make this many write operations in one go. How should I go about solving this so that I don't make so many entries to the database? Any hint will be highly appreciated.
The solution I could think of is this:
https://www.caktusgroup.com/blog/2019/01/09/django-bulk-inserts/
I'd vote for doing bulk operations with the Django model itself.
For example, you can do the following with your model class:
Entry.objects.bulk_create([
    Entry(headline='This is a test'),
    Entry(headline='This is only a test'),
])
(code from django website)
You can also use the transaction decorator, with which you can ensure that a database transaction is atomic (all or nothing). This is handy when you want to do long processing on your entities or transform them:
from django.db import transaction
Then decorate the method that inserts records into the database with @transaction.atomic.
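Putting the two together, a minimal sketch (the Entry model and the emails list are placeholders for your own data):

from django.db import transaction
from myapp.models import Entry  # hypothetical app/model names

@transaction.atomic
def import_emails(emails, batch_size=500):
    # One transaction, and bulk_create issues one INSERT per batch instead of one per row
    Entry.objects.bulk_create(
        [Entry(headline=email) for email in emails],
        batch_size=batch_size,
    )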
I think that the link you included would work fairly well!
Just wanted to throw one other option out there. Many databases have batch/bulk upload capabilities. Depending on what DB you're using, you could potentially connect directly to it, and perform your writes without Django.
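For instance, with PostgreSQL you could use COPY through psycopg2 and skip the ORM entirely. A rough sketch, where the connection string, table, and column names are made up for illustration:

import io
import psycopg2

conn = psycopg2.connect("dbname=mydb user=myuser")  # hypothetical credentials
buf = io.StringIO("\n".join(emails))                # emails: the list of address strings, one per line
with conn, conn.cursor() as cur:
    # COPY loads all rows in a single server-side operation
    cur.copy_from(buf, "myapp_entry", columns=("email",))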
I'm trying to provide an interface for the user to write custom queries over the database. I need to make sure they can only query the records they are allowed to. In order to do that, I decided to apply row based access control using django-guardian.
Here is what my schema looks like:
class BaseClass(models.Model):
    somefield = models.TextField()

    class Meta:
        permissions = (
            ('view_record', 'View record'),
        )

class ClassA(BaseClass):
    # some other fields here
    classb = models.ForeignKey('ClassB', on_delete=models.CASCADE)

class ClassB(BaseClass):
    # some fields here
    classc = models.ForeignKey('ClassC', on_delete=models.CASCADE)

class ClassC(BaseClass):
    # some fields here
    pass
I would like to be able to use get_objects_for_group as follows:
>>> group = Group.objects.create(name='some group')
>>> class_c = ClassC.objects.create(somefield='ClassC')
>>> class_b = ClassB.objects.create(somefield='ClassB', classc=class_c)
>>> class_a = ClassA.objects.create(somefield='ClassA', classb=class_b)
>>> assign_perm('view_record', group, class_c)
>>> assign_perm('view_record', group, class_b)
>>> assign_perm('view_record', group, class_a)
>>> get_objects_for_group(group, 'view_record')
This gives me a QuerySet. Can I use the BaseClass that I defined above and write a raw query over other related classes?
>>> qs.intersection(get_objects_for_group(group, 'view_record'),
...                 BaseClass.objects.raw('select * from table_a a '
...                                       'join table_b b on a.id = b.table_a_id '
...                                       'join table_c c on b.id = c.table_b_id '
...                                       'where some conditions here'))
Does this approach make sense? Is there a better way to tackle this problem?
Thanks!
Edit:
Another way to tackle the problem might be creating a separate table for each user. I understand the complexity this might add to my application but:
The number of users will not be more than a few hundred for a long time; this is not a consumer application.
Per our use case, it's quite unlikely that I'll need to query across these tables. I won't write a query that needs to aggregate anything from table1, table2, table3 that belongs to the same model.
Maintaining a separate table per customer could have an advantage.
Do you think this is a viable approach?
After researching many options I found out that I can solve this problem at the database level using Row Level Security on PostgreSQL. It ends up being the easiest and most elegant solution.
This article helped me a lot in bridging application-level users with PostgreSQL policies.
What I learned by doing my research is:
Separate tables could still be an option in the future, since customers are allowed to run arbitrary queries and can potentially affect each other's query performance.
Trying to solve it at the ORM level is almost impossible if you are planning to use raw or ad-hoc queries.
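Purely as an illustration of the direction (this is not from the original post), wiring a row-level-security policy into a Django migration could look roughly like the sketch below; the table, policy, and session-variable names are all made up:

from django.db import migrations

class Migration(migrations.Migration):
    dependencies = [('myapp', '0001_initial')]
    operations = [
        migrations.RunSQL(
            """
            ALTER TABLE myapp_record ENABLE ROW LEVEL SECURITY;
            CREATE POLICY per_group_records ON myapp_record
                USING (owner_group_id = current_setting('myapp.current_group_id')::int);
            """,
            reverse_sql="""
            DROP POLICY per_group_records ON myapp_record;
            ALTER TABLE myapp_record DISABLE ROW LEVEL SECURITY;
            """,
        ),
    ]

The application then has to SET myapp.current_group_id on each connection (e.g. in middleware) before running the user's query.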
I think you already know what you need to do. The word you are looking for is multitenancy, although it is not one table per customer; the best fit for you will be one schema per customer. Unfortunately, the best article I had on multitenancy is no longer available. See if you can find a cached version: https://msdn.microsoft.com/en-us/library/aa479086.aspx; otherwise there are numerous articles available on the internet.
Another viable approach is to take a look at custom managers. You could write one custom manager per model/customer pair and query it accordingly, but all this will add application complexity and will soon get out of hand. Any bug in the application security layer would be a nightmare for you.
Weighing both, I'd be inclined to say the multitenancy solution, as you said in your edit, is by far the best approach.
First, you should provide us with more details about how your architecture is set up and built with Django so that we can help you. Have you implemented an API? Using Django templates is not really a good idea if you are building a large-scale application that consumes a lot of data, because this can affect the query load massively. I would suggest separating your front end from the back end.
Does anyone know if you can do both the .select_for_update() and .select_related() statements in a single query? Such as:
employee = get_object_or_404(
    Employee.objects.select_for_update().select_related('company'),
    pk=3,
)
It seemed to work fine in one place in my code, but a second usage threw an "InternalError: current transaction is aborted" for a series of unit tests. Removing the .select_related and leaving just .select_for_update made the error go away, but I don't know why. I'd like to use them both to optimize my code, but if forced to choose, I'll pick select_for_update. Wondering if there's a way I can use both. Using PostgreSQL and Django 1.9. Thanks!
Since Django 2.0, you can use select_for_update together with select_related even on nullable relations, by using the new parameter of=...
Using the Person example from the docs, you could do
Person.objects.select_related('hometown').select_for_update(of=('self',))
which would lock only the Person object
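Note that select_for_update only takes effect inside a transaction, so in practice the lookup would sit in an atomic block. A minimal sketch (the Person/hometown names come from the docs example mentioned above):

from django.db import transaction

with transaction.atomic():
    person = (
        Person.objects
        .select_related('hometown')
        .select_for_update(of=('self',))  # lock only the Person row, not the hometown row
        .get(pk=1)
    )
    # ... modify and save person while the row lock is held ...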
You can't use select_related with foreign keys that are nullable when you are using select_for_update on the same queryset.
This will work in all cases:
Book.objects.select_related().select_for_update().get(name='Doors of perception')
I am in a situation where I have to insert multiple records into a Postgres DB through an AJAX call, based on a foreign key.
Currently I am using db1.db2_set.create(...) for each record, looping over a list of dictionaries.
Is this the best way to do it? It seems like I'm hitting the database for every insert.
I'm pretty sure that Django will query the database every time you call the save() method, and create() calls save() for you. So if you do something like:
for data in objects:
    db1.db2_set.create(**data)
it will hit the database once per record. Still, this might be useful:
http://djangosnippets.org/snippets/766/
It's a middleware that you can add to see how many queries your Django application runs on every page you access.
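If you just want a quick check in code rather than a middleware, something like this sketch works too (db1/db2_set/objects are placeholders matching the loop above):

from django.db import connection
from django.test.utils import CaptureQueriesContext

with CaptureQueriesContext(connection) as ctx:
    for data in objects:
        db1.db2_set.create(**data)
print(len(ctx.captured_queries))  # one INSERT per create() call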
I am trying to design a tagging system with a model like this:
class Tag(models.Model):
    content = models.CharField(max_length=255)
    creator = models.ForeignKey(User, on_delete=models.CASCADE)  # the user who created the tag
    used = models.IntegerField(default=0)
It is a many-to-many relationship between tags and what's been tagged.
Every time I insert a record into the association table,
Tag.used is incremented by one, and decremented by one in case of deletion.
Tag.used is maintained because I want to speed up answering the question "How many times has this tag been used?".
However, this obviously slows insertion down.
Please tell me how to improve this design.
Thanks in advance.
http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html
If your database supports materialized/indexed views then you might want to create one for this. You can get a large performance boost for frequently run queries that aggregate data, which I think you have here.
your view would be on a query like:
SELECT TagID, COUNT(*)
FROM YourTable
GROUP BY TagID
The aggregations can be precomputed and stored in the index to minimize expensive computations during query execution.
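If you are on PostgreSQL, which offers materialized views rather than indexed views, a rough sketch of setting one up from a Django migration could look like this; the app, table, and view names are made up, and the view must be refreshed periodically (REFRESH MATERIALIZED VIEW) to pick up new rows:

from django.db import migrations

CREATE_VIEW = """
CREATE MATERIALIZED VIEW tag_usage AS
    SELECT tag_id, COUNT(*) AS used
    FROM myapp_post_tags          -- hypothetical M2M association table
    GROUP BY tag_id;
CREATE UNIQUE INDEX tag_usage_tag_id ON tag_usage (tag_id);
"""

class Migration(migrations.Migration):
    dependencies = [('myapp', '0001_initial')]
    operations = [
        migrations.RunSQL(CREATE_VIEW, reverse_sql='DROP MATERIALIZED VIEW tag_usage;'),
    ]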
I don't think it's a good idea to denormalize your data like that.
I think a more elegant solution is to use Django aggregation to track how many times the tag has been used: http://docs.djangoproject.com/en/dev/topics/db/aggregation/
You could attach the used count to your tag object by calling something like this:
my_tag = Tag.objects.annotate(used=Count('post'))[0]
and then accessing it like this:
my_tag.used
assuming that you have a Post model class that has a ManyToMany field to your Tag class
You can order the Tags by the named annotated field if needed:
Tag.objects.annotate(used=Count('post')).order_by('-used')
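For reference, a minimal sketch of the Post/Tag pair this answer assumes (field names are illustrative, and the denormalized used column is dropped so the annotation name does not clash with a model field):

from django.db import models

class Tag(models.Model):
    content = models.CharField(max_length=255)

class Post(models.Model):
    tags = models.ManyToManyField(Tag)  # reverse query name defaults to 'post'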