Let's say I have this data model:
from django.db import models

class Workflow(models.Model):
    ...

class Command(models.Model):
    workflow = models.ForeignKey(Workflow, on_delete=models.CASCADE)
    ...

class Job(models.Model):
    command = models.ForeignKey(Command, on_delete=models.CASCADE)
    ...
Suppose somewhere I want to loop through all the Workflow objects, and for each workflow I want to loop through its Commands, and for each Command I want to loop through each Job. Is there a way to structure this with a single query?
That is, I'd like Workflow.objects.all() to join in its dependent models, so I get a collection that has dependent objects already cached, so workflows[0].command_set.get() doesn't produce an additional query.
Is this possible?
The other way around it's easy since you can do
all_jobs = Job.objects.select_related().all()
And job.command or job.command.workflow won't produce an additional query.
Not sure if it's possible with a Workflow query.
I think the only way you could do that would be to use django.db.connection and write your own SQL query.
Since this would iterate over every Job anyway (your ForeignKeys aren't nullable), you could select all Jobs and then group them outside the ORM.
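A minimal sketch of that grouping step, assuming the jobs have already been fetched as flat rows (for example with `Job.objects.select_related('command__workflow')`). Here plain tuples of hypothetical `(workflow_id, command_id, job_id)` values stand in for the model instances, so the sketch runs without Django:

```python
from collections import defaultdict

def group_jobs(rows):
    """Nest flat (workflow_id, command_id, job_id) rows as
    {workflow_id: {command_id: [job_id, ...]}}, preserving input order."""
    tree = defaultdict(lambda: defaultdict(list))
    for workflow_id, command_id, job_id in rows:
        tree[workflow_id][command_id].append(job_id)
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {wf: dict(commands) for wf, commands in tree.items()}

# Hypothetical rows, e.g. built from job.command.workflow_id, job.command_id, job.id.
rows = [(1, 10, 100), (1, 10, 101), (1, 11, 102), (2, 20, 200)]
for workflow_id, commands in group_jobs(rows).items():
    for command_id, jobs in commands.items():
        print(workflow_id, command_id, jobs)
```

Because the grouping starts from Jobs, a Command with no Jobs (or a Workflow with no Commands) will not appear in the result; you would need a separate query for those if they matter.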
I need to write a function that will be launched in Celery, take records from a model one at a time, check something, and write data to another model that has a one-to-one relationship with it. There are a lot of entries, so using model_name.objects.all() is not appropriate (it would take a lot of memory and time). How do I do this correctly?
You can use an iterator over the queryset (https://docs.djangoproject.com/en/dev/ref/models/querysets/#iterator) so your records are fetched incrementally instead of all being loaded and cached in memory at once:
model_iterator = your_model.objects.all().iterator()
for record in model_iterator:
    do_something(record)
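If you also want to commit or release resources between chunks in a long-running Celery task, a common alternative to iterator() is keyset pagination: read rows in primary-key order, a bounded batch per query, resuming after the last key seen. In Django the same idea is a loop over `Model.objects.filter(pk__gt=last_pk).order_by('pk')[:batch_size]`. The sketch below uses the standard-library sqlite3 module and a hypothetical `record` table so it runs on its own; it illustrates the technique, not Django's internals:

```python
import sqlite3

def iter_in_batches(conn, batch_size=1000):
    """Yield rows from the hypothetical `record` table in primary-key order,
    fetching at most `batch_size` rows per query (keyset pagination)."""
    last_id = 0  # assumes positive integer primary keys
    while True:
        batch = conn.execute(
            "SELECT id, payload FROM record WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not batch:
            return
        yield from batch
        # Resume after the last key seen; no OFFSET scan is needed.
        last_id = batch[-1][0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO record (payload) VALUES (?)",
                 [(f"row {i}",) for i in range(10)])
for record_id, payload in iter_in_batches(conn, batch_size=3):
    print(record_id, payload)
```

Filtering on the key rather than using OFFSET keeps each batch query cheap even deep into a large table.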
I'm trying to provide an interface for the user to write custom queries over the database. I need to make sure they can only query the records they are allowed to. In order to do that, I decided to apply row based access control using django-guardian.
Here is what my schemas look like:
class BaseClass(models.Model):
    somefield = models.TextField()

    class Meta:
        permissions = (
            ('view_record', 'View record'),
        )

class ClassA(BaseClass):
    # some other fields here
    classb = models.ForeignKey('ClassB', on_delete=models.CASCADE)

class ClassB(BaseClass):
    # some fields here
    classc = models.ForeignKey('ClassC', on_delete=models.CASCADE)

class ClassC(BaseClass):
    # some fields here
    pass
I would like to be able to use get_objects_for_group as follows:
>>> group = Group.objects.create(name='some group')
>>> class_c = ClassC.objects.create(somefield='ClassC')
>>> class_b = ClassB.objects.create(somefield='ClassB', classc=class_c)
>>> class_a = ClassA.objects.create(somefield='ClassA', classb=class_b)
>>> assign_perm('view_record', group, class_c)
>>> assign_perm('view_record', group, class_b)
>>> assign_perm('view_record', group, class_a)
>>> get_objects_for_group(group, 'view_record')
This gives me a QuerySet. Can I use the BaseClass that I defined above and write a raw query over other related classes?
>>> qs.intersection(get_objects_for_group(group, 'view_record'),
...                 BaseClass.objects.raw('select * from table_a a '
...                                       'join table_b b on a.id=b.table_a_id '
...                                       'join table_c c on b.id=c.table_b_id '
...                                       'where some conditions here'))
Does this approach make sense? Is there a better way to tackle this problem?
Thanks!
Edit:
Another way to tackle the problem might be creating a separate table for each user. I understand the complexity this might add to my application but:
The number of users will not exceed a few hundred for a long time; this is not a consumer application.
Per our use case, it's quite unlikely that I'll need to query across these tables. I won't write a query that needs to aggregate anything from table1, table2, table3 that belongs to the same model.
Maintaining a separate table per customer could have an advantage.
Do you think this is a viable approach?
After researching many options I found out that I can solve this problem at the database level using Row Level Security on PostgreSQL. It ends up being the easiest and the most elegant.
This article helped me a lot to bridge the application level users with PostgreSQL policies.
What I learned by doing my research is:
Separate tables could still be an option in the future, when customers can potentially affect each other's query performance since they are allowed to run arbitrary queries.
Trying to solve it at the ORM level is almost impossible if you are planning to use raw or ad-hoc queries.
I think you already know what you need to do. The word you are looking for is multitenancy, although not one table per customer: the best fit for you would be one schema per customer. Unfortunately, the best article I had on multitenancy is no longer available. See if you can find a cached version of https://msdn.microsoft.com/en-us/library/aa479086.aspx; otherwise there are numerous articles available on the internet.
Another viable approach is to look at custom managers. You could write one custom manager per Model-Customer pair and query through it accordingly. But all of this adds application complexity that will soon get out of hand, and any bug in the application security layer would be a nightmare for you.
Weighing both, I'd be inclined to say that the multitenancy solution you describe in your edit is by far the best approach.
First, you should give us more details about how your architecture is set up and built with Django, so that we can help you. Have you implemented an API? Using Django templates is not really a good idea if you are building a large-scale application that consumes a lot of data, because this can increase the query load massively. I suggest separating your front end from the back end.
Is there any way of using JsonProperties in queries in NDB/GAE? I can't seem to find any information about this.
Person.query(Person.custom.eye_color == "blue").fetch()
With a model looking something like this:
class Person(ndb.Model):
    height = ndb.IntegerProperty(default=-1)
    #...
    #...
    custom = ndb.JsonProperty(indexed=False, compressed=False)
The use case is this: I'm storing data about customers, where at first we only needed to query specific data. Now we want to be able to query any type of registered data about the persons, for example eye color, which some may have put into the system, or any other custom key/value pair in our JsonProperty.
I know about the Expando class, but to me it seems a lot easier to be able to query the JsonProperty and keep all the custom properties under the same name, custom. That means the front end can just loop over the properties in custom. If an Expando class were used, it would be harder to differentiate them.
Rather than using a JsonProperty, have you considered using a StructuredProperty? You maintain the same structure, just stored differently, and you can filter by subcomponents of the StructuredProperty with some restrictions, which may be sufficient.
See https://developers.google.com/appengine/docs/python/ndb/queries#filtering_structured_properties
for querying StructuredProperties.
I'm using Django/Python, but pseudo-code is definitely acceptable here.
Working with some models that already exist, I have Employees that each have a Supervisor, which is essentially a Foreign Key type relationship to another Employee.
Where the Employee/Supervisor hierarchy is something like this:
Any given Employee has ONE Supervisor. That Supervisor may have one or more Employees "beneath", and has his/her own Supervisor as well. Retrieving my "upline" should return my supervisor, his supervisor, her supervisor, etc., until reaching an employee that has no supervisor.
Without going hog-wild and installing new apps to manage these relationships (this is an existing codebase and project), I'm wondering what the "pythonic" or correct way is to implement the following functions:
def get_upline(employee):
    # Get a flat list of Employee objects that are
    # 'supervisors' to each other, starting with
    # the given Employee.
    pass

def get_downline(employee):
    # Starting with the given Employee, find and
    # return a flat list of all other Employees
    # that are "below".
    pass
I feel like there may be a somewhat simple way to do this with the Django ORM, but if not, I'll take any suggestions.
I haven't thoroughly checked out Django-MPTT, but if I can leave the models intact and simply gain more functionality, it would be worth it.
You don't have to touch your models to be able to use django-mptt; you just have to create a parent field on your model, and django-mptt creates all the other MPTT attributes automatically when you register your model: mptt.register(MyModel).
Though if you only need the 'upline' hierarchy, you wouldn't need nested sets. The bigger performance problem is going in the opposite direction and collecting e.g. children or leaves, which is what makes a nested set model necessary!
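To make the nested-set idea concrete: each node stores a `lft`/`rgt` pair from a depth-first numbering, so "all descendants of X" becomes a single range condition (`lft > X.lft AND lft < X.rgt` in SQL) instead of one query per level, and "all ancestors of X" is the set of intervals that enclose X's. The plain-Python sketch below uses a hypothetical org chart; django-mptt maintains similar bookkeeping columns for you, but this is not its code:

```python
def number_nested_sets(children, root):
    """Assign (lft, rgt) to every node in a depth-first walk from `root`.
    `children` maps a node to the nodes directly beneath it."""
    bounds = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        left = counter
        for child in children.get(node, []):
            visit(child)
        counter += 1
        bounds[node] = (left, counter)

    visit(root)
    return bounds

def descendants(bounds, node):
    """Everything below `node`: nodes whose lft falls inside its interval."""
    lft, rgt = bounds[node]
    found = [n for n, (lo, _) in bounds.items() if lft < lo < rgt]
    return sorted(found, key=lambda n: bounds[n][0])

def ancestors(bounds, node):
    """The upline: nodes whose interval strictly encloses `node`'s."""
    lft, rgt = bounds[node]
    found = [n for n, (lo, hi) in bounds.items() if lo < lft and hi > rgt]
    return sorted(found, key=lambda n: bounds[n][0])

# Hypothetical org chart: supervisor -> direct reports.
org = {"ceo": ["cto", "cfo"], "cto": ["dev1", "dev2"]}
bounds = number_nested_sets(org, "ceo")
print(descendants(bounds, "cto"))  # ['dev1', 'dev2']
print(ancestors(bounds, "dev2"))   # ['ceo', 'cto']
```

The trade-off is on writes: inserting or moving a node shifts the numbers of every node to its right, which is why a library that maintains them is attractive.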
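Relational databases are not good at this kind of graph query, so your only option is to run a series of queries. Here is a recursive implementation (it assumes the reverse relation from a supervisor to their employees is named minion_set, e.g. via related_name='minion_set'):
def get_upline(employee):
    # One query per level as we follow the supervisor chain upward.
    if employee.supervisor:
        return [employee] + get_upline(employee.supervisor)
    return [employee]

def get_downline(employee):
    # The employee plus everyone beneath them, depth-first.
    result = [employee]
    for minion in employee.minion_set.all():
        result.extend(get_downline(minion))
    return result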
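The same walks can also be written iteratively, which avoids Python's recursion limit on very deep hierarchies. This sketch works on an in-memory mapping of employee id to supervisor id; in Django you might build that mapping with `dict(Employee.objects.values_list('id', 'supervisor_id'))`, assuming the foreign key field is named `supervisor`. The function names and data are hypothetical, and the hierarchy is assumed to contain no cycles:

```python
from collections import deque

def upline_ids(supervisor_of, employee_id):
    """[employee_id, their supervisor, ...] up to someone with no supervisor."""
    chain = [employee_id]
    while supervisor_of.get(chain[-1]) is not None:
        chain.append(supervisor_of[chain[-1]])
    return chain

def downline_ids(supervisor_of, employee_id):
    """employee_id followed by everyone beneath them, breadth-first."""
    # Invert the mapping once: supervisor id -> list of direct reports.
    reports = {}
    for emp, boss in supervisor_of.items():
        if boss is not None:
            reports.setdefault(boss, []).append(emp)
    result, queue = [], deque([employee_id])
    while queue:
        current = queue.popleft()
        result.append(current)
        queue.extend(reports.get(current, []))
    return result

# Hypothetical data: employee id -> supervisor id (None at the top).
supervisor_of = {1: None, 2: 1, 3: 1, 4: 2, 5: 4}
print(upline_ids(supervisor_of, 5))    # [5, 4, 2, 1]
print(downline_ids(supervisor_of, 2))  # [2, 4, 5]
```

Loading the whole mapping is only sensible when the employee table is small enough to hold in memory.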
I am trying to design a tagging system with a model like this:
Tag:
    content = CharField
    creator = ForeignKey
    used = IntegerField
It is a many-to-many relationship between tags and what's been tagged.
Every time I insert a record into the association table, Tag.used is incremented by one, and it is decremented by one on deletion.
Tag.used is maintained because I want to speed up answering the question 'How many times is this tag used?'.
However, this seems to slow down insertion noticeably.
Please tell me how to improve this design.
Thanks in advance.
http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html
If your database supports materialized (indexed) views, you might want to create one for this. You can get a large performance boost for frequently run queries that aggregate data, which I think is what you have here.
Your view would be on a query like:
SELECT TagID, COUNT(*)
FROM YourTable
GROUP BY TagID
The aggregations can be precomputed and stored in the index to minimize expensive computations during query execution.
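To see what such a view would store, here is the aggregate run directly with the standard-library sqlite3 module. SQLite has no materialized views, so this only shows the query the view would precompute; the `tag` and `tagging` table names are hypothetical, and a LEFT JOIN is used (a small change from the query above) so that unused tags report 0:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tag (id INTEGER PRIMARY KEY, content TEXT NOT NULL);
    CREATE TABLE tagging (tag_id INTEGER NOT NULL REFERENCES tag(id),
                          item_id INTEGER NOT NULL);
    -- Index on tag_id so each tag's usages can be found without a full scan.
    CREATE INDEX tagging_tag_id ON tagging (tag_id);
""")
conn.executemany("INSERT INTO tag (id, content) VALUES (?, ?)",
                 [(1, "python"), (2, "django"), (3, "sql")])
conn.executemany("INSERT INTO tagging (tag_id, item_id) VALUES (?, ?)",
                 [(1, 10), (1, 11), (2, 10), (1, 12)])

def tag_usage(conn):
    """Return {tag content: number of uses}, including unused tags as 0."""
    rows = conn.execute("""
        SELECT t.content, COUNT(g.tag_id)
        FROM tag AS t LEFT JOIN tagging AS g ON g.tag_id = t.id
        GROUP BY t.id, t.content
    """).fetchall()
    return dict(rows)

print(tag_usage(conn))
```

Computing the count on read keeps inserts into the association table as cheap as plain inserts; a materialized view moves that cost back to write time, but inside the database rather than in application code.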
I don't think it's a good idea to denormalize your data like that.
I think a more elegant solution is to use django aggregation to track how many times the tag has been used http://docs.djangoproject.com/en/dev/topics/db/aggregation/
You could attach the used count to your tag object by calling something like this:
from django.db.models import Count
my_tag = Tag.objects.annotate(used=Count('post'))[0]
and then accessing it like this:
my_tag.used
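assuming that you have a Post model class with a ManyToManyField to your Tag class, and that the denormalized used field has been removed (Django rejects an annotation whose name clashes with a field on the model).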
You can order the Tags by the named annotated field if needed:
Tag.objects.annotate(used=Count('post')).order_by('-used')