How can I retrieve data from multiple tables - python

I need to retrieve data from multiple tables, with a dynamically built filter that might or might not use data from any of the tables.
So say I have this:
class Solution(models.Model):
    name = models.CharField(max_length=MAX, unique=True)
    # Other data

class ExportTrackingRecord(models.Model):
    tracked_id = models.IntegerField()
    solution = models.ForeignKey(Solution)
    # Other data
Then elsewhere I need to do:
def get_data(user_provided_criteria):
    etr = ExportTrackingRecord.objects.filter(make_Q_object(user_provided_criteria)).select_related()
    for data in etr:
        s = data.solution
        # do things with data from both tables
As far as I can tell, if I happen to filter on a field in Solution, Django will do the join, and select_related() will fetch both objects. If I only filter on fields in ExportTrackingRecord then there will be no join, and Django will issue a new query for each ExportTrackingRecord in the QuerySet (which could be thousands...).
I am fairly new to Django; is there a reasonable way to force the join?

select_related() is the key to your problem. If you don't use it and don't filter on fields of the related model, Django will not do a join, which causes an extra query for every row in the result whenever you access data of the related model.
If you do something like ExportTrackingRecord.objects.filter(...).select_related('solution') you force Django to always do a join with the Solution table.
If you need to do the same in the other direction, through the reverse foreign key relationship, you need prefetch_related(); the same goes for many-to-many relations.

select_related controls what gets loaded into the results when the QuerySet is evaluated; it will force the join regardless of filtering.
If you don't specify select_related, then even if your filter produces a SQL query with a join, the related model's fields won't be loaded into the results, and accessing them will still require additional queries.
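Putting it together, a minimal sketch using the models and the make_Q_object helper from the question (the 'solution' argument is the ForeignKey defined above):

def get_data(user_provided_criteria):
    # select_related('solution') forces the join, so every row already
    # carries its Solution and the loop below issues no extra queries
    etr = (ExportTrackingRecord.objects
           .filter(make_Q_object(user_provided_criteria))
           .select_related('solution'))
    for data in etr:
        s = data.solution  # loaded by the join, no additional query
        # do things with data from both tables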

Related

Storing multiple values in a single field in a MySQL database while preserving order, in Django

I've been trying to build a tutorial system like the ones we usually see on websites, where we click next -> next -> previous etc. to read.
All posts are stored in a table (model) called Post, basically a pool of post objects.
Post.objects.all() will return all the posts.
Now there's another table (model) called Tutorial that will store the following:
class Tutorial(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    tutorial_heading = models.CharField(max_length=100)
    tutorial_summary = models.CharField(max_length=300)
    series = models.CharField(max_length=40)  # <---- Here [10,11,12]
    ...
Here the entries in this series field are post_ids stored as a string representation of a list.
For example, series will contain [10,11,12], where 10, 11 and 12 are post_ids that correspond to their respective entries in the Post table.
So a table entry for the Tutorial model looks like this:
id   heading               summary                       series
"5"  "Series 3 Tutorial"   "lorem on ullt consequat."    "[12, 13, 14]"
So I just read the series field, fetch all the Posts with the ids in this list, and display them using pagination in Django.
Now, I've read in several Stack Overflow posts that having multiple entries in a single field is a bad idea, and that spanning this relationship over multiple tables as a mapping is a better option.
What I want is the ability to insert new posts anywhere in this series, maybe at the front or in the middle. This can easily be accomplished by treating the series as a list and inserting as I please; altering it to "[14,12,13]" will reorder the posts that are displayed.
My question is: is this way of storing multiple values in a field okay for my use case, or will it take a performance hit, or is it generally a bad idea? If it is, is there a way I can preserve and alter the order by spanning the relationship over another table, or is there an entirely better way to accomplish this in Django or MySQL?
Here entries in this series field are post_ids stored as a string representation of a list.
(...)
So I just read the series field and get all the Posts with the ids in this list then display them using pagination in Django.
DON'T DO THIS!!!
You are working with a relational database. There is one proper way to model relationships between entities in a relational database, which is to use foreign keys. In your case, depending on whether a post can belong only to a single tutorial (a "one to many" relationship) or to many tutorials at the same time (a "many to many" relationship), you'll want either to add a foreign key to Tutorial on Post, or to use an intermediate "post_tutorials" table with foreign keys to both Post and Tutorial.
Your solution doesn't allow the database to do its job properly. It cannot enforce integrity constraints (what if you delete a post that's referenced by a tutorial?), it cannot optimize read access (with a proper schema the database can retrieve a tutorial and all its posts in a single query), it cannot follow reverse relationships (given a post, access the tutorial(s) it belongs to), etc. And it requires an external program (Python code) to interact with your data, while with proper modeling you just need standard SQL.
Finally - but this is Django-specific - a proper schema works better with the admin features, and with Django REST framework if you intend to build a REST API.
As for the ordering problem, it's a long-known (and solved) issue: you just need to add an "order" field (a small int should be enough). There are a couple of 3rd-party Django apps that add support for this to both your models and the admin, so it's almost plug and play; a minimal hand-rolled sketch is shown below.
IOW, there is absolutely no good reason to denormalize your schema this way and only good reasons to use proper relational modeling. FWIW I once had to work on a project based on some obscure (and hopefully long dead) PHP CMS that had the brilliant idea to use your "serialized lists" anti-pattern, and I can tell you it was both a disaster performance-wise and a complete nightmare to maintain. So do yourself and the world a favour: don't try to be creative, follow well-known and established best practices instead, and your life will be much happier. My 2 cents...
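To make the intermediate-table idea concrete, here is a minimal sketch of a "many to many with ordering" schema. The model and field names (TutorialPost, order) are illustrative, not from the original post:

class Post(models.Model):
    title = models.CharField(max_length=100)

class Tutorial(models.Model):
    tutorial_heading = models.CharField(max_length=100)
    posts = models.ManyToManyField(Post, through='TutorialPost', related_name='tutorials')

class TutorialPost(models.Model):
    tutorial = models.ForeignKey(Tutorial, on_delete=models.CASCADE)
    post = models.ForeignKey(Post, on_delete=models.CASCADE)
    order = models.PositiveSmallIntegerField()

    class Meta:
        ordering = ('order',)
        unique_together = ('tutorial', 'post')

Reordering or inserting a post in the middle then means updating a few integer order values instead of rewriting a serialized string.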
I can think of two approaches:
Approach One: Linked List
One way is using a linked list, like this:
class Tutorial(models.Model):
    ...
    previous = models.OneToOneField('self', null=True, blank=True, related_name="next")
In this approach, you can walk through the tutorials of a series like this:
for tutorial in Tutorial.objects.filter(previous__isnull=True):
    print(tutorial)
    while hasattr(tutorial, 'next'):  # the reverse OneToOne raises if there is no next tutorial
        tutorial = tutorial.next
        print(tutorial)
This is a somewhat complicated approach; for example, whenever you want to add a new tutorial in the middle of the linked list, you need to change things in two places, like:
post = Tutorial.objects.first()
next_post = getattr(post, 'next', None)   # the current successor, if any
new = Tutorial.objects.create(...)
if next_post is not None:
    next_post.previous = new   # re-point the old successor first, so the
    next_post.save()           # unique constraint on `previous` is not violated
new.previous = post
new.save()
But there is a big benefit to this approach: you don't have to create a new table for the series. Also, the order of the tutorials will probably not be modified frequently, so this doesn't need to be too much hassle.
Approach Two: Create a new Model
You can simply create a new model and a ForeignKey to it from Tutorial, like this:
class Series(models.Model):
    name = models.CharField(max_length=255)

class Tutorial(models.Model):
    ...
    series = models.ForeignKey(Series, null=True, blank=True, related_name='tutorials')
    order = models.IntegerField(default=0)

    class Meta:
        unique_together = ('series', 'order')  # makes sure duplicate orders within the same series cannot happen
Then you can access the tutorials in a series by:
series = Series.objects.first()
series.tutorials.all().order_by('order')
The advantage of this approach is that it's much more flexible to access Tutorials through a Series, but an extra table is created for this, plus one extra field to maintain the order.

How can I test that Django QuerySets are ordered by PK ascending

class Foo(models.Model):
    name = models.CharField(max_length=10)

    class Meta(object):
        ordering = ('pk', )
I want to test that this ordering is working as I expect.
def test_respect_ordering(self):
    Foo.objects.create(name="bar", pk=2)
    Foo.objects.create(name="baz", pk=1)
    results = Foo.objects.all()
    self.assertEqual("baz", results[0].name)
    self.assertEqual("bar", results[1].name)
Although this works as I expect, my test passes regardless of whether the Meta class or the ordering property defined in it is present. Is there some way I can test that this code matters?
Why do I want to test this? My tests run in SQLite, but production is on MySQL. Hopefully someday we'll use a better RDBMS, and maybe results won't be returned by PK across all of these RDBMSs.
The Django docs indicate that sorting doesn't happen automatically.
If you omit the ordering attribute in your Meta class, the resulting generated SQL query will not have an ORDER BY clause (well, in 1.4 anyway). This means you can't rely on the order of the rows.
Unordered SQL queries will generally have an order that looks like it makes some sense. That's because the query plan will most likely use indexes to decrease query time, and indexes can play a big part in the row order for unordered queries. A table generated by a Django model will only have an index on the primary key unless otherwise specified, so in general the 'unordered' order will be quite similar to the order when sorted on primary key.
However, there is absolutely no guarantee here, and this order cannot be relied on. The query plan depends largely on the database engine, and it can even change drastically for very similar queries on the same engine.
If you want a particular order, you should explicitly specify the order you want. That's the only reliable way to guarantee a particular order.
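If you want the test to fail when the Meta ordering is removed, one option (a sketch, not from the original answer) is to assert on the SQL Django generates rather than on the row order the backend happens to return:

def test_meta_ordering_adds_order_by(self):
    # str(queryset.query) exposes the generated SQL; the exact formatting is
    # backend-specific, so we only check that an ORDER BY clause is present.
    sql = str(Foo.objects.all().query)
    self.assertIn('ORDER BY', sql)

This stays true on SQLite and MySQL alike, because it checks the query itself instead of relying on whatever order an unordered query happens to produce.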

SQLAlchemy: add a child in many-to-many relationship by IDs

I am looking for a way to add a "Category" child to an "Object" entity without wasting performance on loading the child objects first.
The "Object" and "Category" tables are linked with a many-to-many relationship, stored in an "ObjectCategory" table. The "Object" model is supplied with the relationship:
categories = relationship('Category', secondary='ObjectCategory', backref='objects')
Now this code works just fine:
obj = models.Object.query.get(9)
cat1 = models.Category.query.get(22)
cat2 = models.Category.query.get(28)
obj.categories.extend([cat1, cat2])
But in the debug output I see that instantiating the obj and each category costs me a separate SELECT sent to the database server, in addition to the single bulk INSERT. That is totally unneeded in this case, because I was not interested in manipulating the given category objects. Basically all I need is to nicely insert the appropriate category IDs.
The most obvious solution would be to go ahead and insert the entries in the association table directly:
db.session.add(models.ObjectCategory(oct_objID=9, oct_catID=22))
db.session.add(models.ObjectCategory(oct_objID=9, oct_catID=28))
But this approach is kind of ugly; it doesn't seem to use the power of the abstracted SQLAlchemy relationships. What's more, it produces a separate INSERT for every add(), versus the nice bulk INSERT in the obj.categories.extend([list]) case. I imagine there could be some lazy object mode that would let an object live with only its ID (unverified) and load the other fields only if they are requested. That would allow adding children in one-to-many or many-to-many relationships without issuing any SELECT to the database, yet still let me use the powerful ORM abstraction (i.e., treating the list of children as a Python list).
How should I adjust my code to carry out this task using the power of SQLAlchemy but being conservative on the database use?
Do you have an ORM mapping for the ObjectCategory table? If so, you could create and add ObjectCategory objects directly:
session.add(ObjectCategory(obj_id=9, category_id=22))
session.add(ObjectCategory(obj_id=9, category_id=28))
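If you also want a single bulk INSERT instead of one INSERT per add(), a possible sketch is to go through the Core insert of the mapped association table. This assumes the column names from the question (oct_objID, oct_catID) and a Flask-SQLAlchemy style db.session:

# Bulk-insert association rows without loading any ORM objects;
# a single executemany INSERT with multiple parameter sets is emitted.
db.session.execute(
    models.ObjectCategory.__table__.insert(),
    [
        {'oct_objID': 9, 'oct_catID': 22},
        {'oct_objID': 9, 'oct_catID': 28},
    ],
)
db.session.commit()

This skips the SELECTs entirely, at the cost of bypassing the relationship abstraction.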

How to query multiple columns in Django

I have a problem displaying specific columns of my model in Django.
I have read the documentation about Django's QuerySet feature.
My question is: is it also possible in Django to run a query just like this one?
select name, age, address from person;
Can anyone give me an idea? I also tried it like this:
Mymodel.objects.get(name, age, address)
but there is an error in the parameters name, age and address...
Thanks...
If you want only some columns, use only():
Mymodel.objects.only('name', 'age', 'address')
If you don't want some columns, use defer():
Mymodel.objects.defer('some_big_field')
You can still access a field you haven't queried, but it will cost you one more DB hit.
You can also use the values() and values_list() methods, but instead of model instances they return a list of dicts and a list of tuples respectively.
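For example (a short sketch using the field names from the question):

# values() -> dict-like rows; values_list() -> tuple rows
Mymodel.objects.values('name', 'age', 'address')        # [{'name': ..., 'age': ..., 'address': ...}, ...]
Mymodel.objects.values_list('name', 'age', 'address')   # [(name, age, address), ...]
Mymodel.objects.values_list('name', flat=True)          # flat=True works only with a single field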
There are a few different ways. Django normally wraps the data in model instances, which is part of the point of the ORM: you deal with objects and Django deals with the database. So:
for person in MyModel.objects.all():
    do_something_with(person.name)
Having said that, if you only want certain attributes, e.g. for performance, you can use values():
MyModel.objects.values('name', 'age', 'address')
which returns a list of dicts with those values.

Designing a Tag table that tells how many times it's used

I am trying to design a tagging system with a model like this:
Tag:
    content = CharField
    creator = ForeignKey
    used = IntegerField
It is a many-to-many relationship between tags and whatever is tagged.
Every time I insert a record into the association table, Tag.used is incremented by one, and decremented by one in case of a deletion.
Tag.used is maintained because I want to speed up answering the question 'How many times is this tag used?'.
However, this obviously slows insertion down.
Please tell me how to improve this design.
Thanks in advance.
http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html
If your database supports materialized indexed views then you might want to create one for this. You can get a large performance boost for frequently run queries that aggregate data, which I think you have here.
Your view would be based on a query like:
SELECT TagID, COUNT(*)
FROM YourTable
GROUP BY TagID
The aggregations can be precomputed and stored in the index to minimize expensive computations during query execution.
I don't think it's a good idea to denormalize your data like that.
I think a more elegant solution is to use django aggregation to track how many times the tag has been used http://docs.djangoproject.com/en/dev/topics/db/aggregation/
You could attach the used count to your tag object by calling something like this:
my_tag = Tag.objects.annotate(used=Count('post'))[0]
and then accessing it like this:
my_tag.used
assuming that you have a Post model class that has a ManyToMany field to your Tag class
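For reference, a minimal sketch of the assumed models (names are illustrative; the default reverse query name 'post' is what Count('post') relies on):

class Tag(models.Model):
    content = models.CharField(max_length=100)

class Post(models.Model):
    tags = models.ManyToManyField(Tag)  # reverse query name defaults to 'post'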
You can order the Tags by the named annotated field if needed:
Tag.objects.annotate(used=Count('post')).order_by('-used')
