django - inner join queryset not working - python

The SQL I want to accomplish is this -
SELECT jobmst_id, jobmst_name, jobdtl_cmd, jobdtl_params FROM jobmst
INNER JOIN jobdtl ON jobmst.jobdtl_id = jobdtl.jobdtl_id
WHERE jobmst_id = 3296
I've only had success once with an inner join in Django, using annotate() and order_by(), but I can't get it to work with either prefetch_related() or select_related().
My models are as so -
class Jobdtl(models.Model):
    jobdtl_id = models.IntegerField(primary_key=True)
    jobdtl_cmd = models.TextField(blank=True)
    jobdtl_fromdt = models.DateTimeField(blank=True, null=True)
    jobdtl_untildt = models.DateTimeField(blank=True, null=True)
    jobdtl_fromtm = models.DateTimeField(blank=True, null=True)
    jobdtl_untiltm = models.DateTimeField(blank=True, null=True)
    jobdtl_priority = models.SmallIntegerField(blank=True, null=True)
    jobdtl_params = models.TextField(blank=True)  # This field type is a guess.

    class Meta:
        managed = False
        db_table = 'jobdtl'

class Jobmst(MPTTModel):
    jobmst_id = models.IntegerField(primary_key=True)
    jobmst_type = models.SmallIntegerField()
    jobmst_prntid = TreeForeignKey('self', null=True, blank=True, related_name='children', db_column='jobmst_prntid')
    jobmst_name = models.TextField(db_column='jobmst_name', blank=True)
    # jobmst_owner = models.IntegerField(blank=True, null=True)
    jobmst_owner = models.ForeignKey('Owner', db_column='jobmst_owner', related_name='Jobmst_Jobmst_owner', blank=True, null=True)
    jobmst_crttm = models.DateTimeField()
    jobdtl_id = models.ForeignKey('Jobdtl', db_column='jobdtl_id', blank=True, null=True)
    jobmst_prntname = models.TextField(blank=True)

    class MPTTMeta:
        order_insertion_by = ['jobmst_id']

    class Meta:
        managed = True
        db_table = 'jobmst'
I have a really simple view like so -
# Test Query with Join
def test_queryjoin(request):
    queryset = Jobmst.objects.filter(jobmst_id=3296).order_by('jobdtl_id')
    queryresults = serializers.serialize("python", queryset, fields=('jobmst_prntid', 'jobmst_id', 'jobmst_prntname', 'jobmst_name', 'jobmst_owner', 'jobdtl_cmd', 'jobdtl_params'))
    t = get_template('test_queryjoin.html')
    html = t.render(Context({'query_output': queryresults}))
    return HttpResponse(html)
I've tried doing a bunch of things -
queryset = Jobmst.objects.all().prefetch_related()
queryset = Jobmst.objects.all().select_related()
queryset = jobmst.objects.filter(jobmst_id=3296).order_by('jobdtl_id')
a few others as well I forget.
Each time the json I'm getting is only from the jobmst table with no mention of the jobdtl results which I want. If I go the other way and do Jobdtl.objects.xxxxxxxxx same thing it's not giving me the results from the other model.
To recap I want to display fields from both tables where a certain clause is met.
What gives?

Seems that I was constantly looking in the wrong place. Coming from SQL I kept thinking in terms of inner joining tables which is not how this works. I'm joining the results from models.
Hence, rethinking my search I came across itertools and the chain function.
I now have 2 queries under a def in my views.py
from itertools import chain
jobmstquery = Jobmst.objects.filter(jobmst_id=3296)
jobdtlquery = Jobdtl.objects.filter(jobdtl_id=3296)
queryset = chain(jobmstquery, jobdtlquery)
queryresults = serializers.serialize("python", queryset)
That shows me the results from each table "joined" like I would want in SQL. Now I can focus on filtering down the results to give me what I want.
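One caveat worth spelling out: itertools.chain only concatenates the two result sets one after the other. It does not pair each Jobmst row with its matching Jobdtl row the way an INNER JOIN does. A minimal sketch of the difference, with plain dictionaries standing in for model rows (the sample values are invented for illustration):

```python
from itertools import chain

# Hypothetical rows standing in for the two querysets; the values are made up.
jobmst_rows = [{"jobmst_id": 3296, "jobmst_name": "nightly_load", "jobdtl_id": 3296}]
jobdtl_rows = [{"jobdtl_id": 3296, "jobdtl_cmd": "load.sh", "jobdtl_params": "--full"}]

# chain() yields every row of the first iterable, then every row of the second.
chained = list(chain(jobmst_rows, jobdtl_rows))

# An inner join instead merges each master row with its matching detail row.
details_by_id = {row["jobdtl_id"]: row for row in jobdtl_rows}
joined = [
    {**mst, **details_by_id[mst["jobdtl_id"]]}
    for mst in jobmst_rows
    if mst["jobdtl_id"] in details_by_id
]

print(len(chained), len(joined))
```

The chained result has two separate records, while the joined result has a single record carrying fields from both tables, which is what the SQL in the question returns.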
Remember folks, the information you need is almost always there, it's just a matter of knowing how to look for it :)

What you are looking for might be this
queryset = Jobmst.objects.filter(jobmst_id=3296).values_list(
    'jobmst_id', 'jobmst_name', 'jobdtl_id__jobdtl_cmd', 'jobdtl_id__jobdtl_params')
You would get your results with only one query and you should be able to use sort with this.
P.S. Coming from SQL you might find some great insights playing with queryset.query (the SQL generated by django) in a django shell.
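For comparison, here is the join from the question run directly against an in-memory SQLite database. This is only a sketch of the target result shape, not what Django emits: the table layout is reduced to the four columns involved, the sample row is invented, and the SQL Django actually generates (visible via queryset.query) may differ, for example by using a LEFT OUTER JOIN for a nullable foreign key.

```python
import sqlite3

# Reduced, hypothetical copies of the two tables with one invented row each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobdtl (
        jobdtl_id INTEGER PRIMARY KEY,
        jobdtl_cmd TEXT,
        jobdtl_params TEXT
    );
    CREATE TABLE jobmst (
        jobmst_id INTEGER PRIMARY KEY,
        jobmst_name TEXT,
        jobdtl_id INTEGER REFERENCES jobdtl(jobdtl_id)
    );
    INSERT INTO jobdtl VALUES (3296, 'load.sh', '--full');
    INSERT INTO jobmst VALUES (3296, 'nightly_load', 3296);
""")

# The query from the question: one flat row with columns from both tables.
row = conn.execute("""
    SELECT jobmst_id, jobmst_name, jobdtl_cmd, jobdtl_params
    FROM jobmst
    INNER JOIN jobdtl ON jobmst.jobdtl_id = jobdtl.jobdtl_id
    WHERE jobmst_id = ?
""", (3296,)).fetchone()

print(row)
```

A values_list() call that follows the foreign key with double-underscore lookups yields tuples of this same flat shape.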

Related

Django order_by query runs incredibly slow in Python, but fast in DB

I have the following models:
class Shelf(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    name = models.CharField(max_length=200, db_index=True)
    slug = models.SlugField(max_length=200, editable=False)
    games = models.ManyToManyField(Game, blank=True, through='SortedShelfGames')
    objects = ShelfManager()
    description = models.TextField(blank=True, null=True)

class SortedShelfGames(models.Model):
    game = models.ForeignKey(Game, on_delete=models.CASCADE)
    shelf = models.ForeignKey(Shelf, on_delete=models.CASCADE)
    date_added = models.DateTimeField()
    order = models.IntegerField(blank=True, null=True)
    releases = models.ManyToManyField(Release)
    objects = SortedShelfGamesManager.as_manager()

class Game(models.Model):
    name = models.CharField(max_length=300, db_index=True)
    sort_name = models.CharField(max_length=300, db_index=True)
    ...
I have a view where I want to get all of a user's SortedShelfGames, distinct on the Game relationship. I then want to be able to sort that list of SortedShelfGames on a few different fields. So right now, I'm doing the following inside of the SortedShelfGamesManager (which inherits from models.QuerySet) to get the list:
games = self.filter(
    pk__in=Subquery(
        # The order_by here is to get the earliest date_added field for display
        self.filter(shelf__user=user).distinct('game').order_by('game', 'date_added').values('pk')
    )
)
That works the way it's supposed to. However, whenever I try and do an order_by('game__sort_name'), the query takes forever in my python. When I'm actually trying to use it on my site, it just times out. If I take the generated SQL and just run it on my database, it returns all of my results in a fraction of a second. I can't figure out what I'm doing wrong here. The SortedShelfGames table has millions of records in it if that matters.
This is the generated SQL:
SELECT
"collection_sortedshelfgames"."id", "collection_sortedshelfgames"."game_id", "collection_sortedshelfgames"."shelf_id", "collection_sortedshelfgames"."date_added", "collection_sortedshelfgames"."order",
(SELECT U0."rating" FROM "reviews_review" U0 WHERE (U0."game_id" = "collection_sortedshelfgames"."game_id" AND U0."user_id" = 1 AND U0."main") LIMIT 1) AS "score",
"games_game"."id", "games_game"."created", "games_game"."last_updated", "games_game"."exact", "games_game"."date", "games_game"."year", "games_game"."quarter", "games_game"."month", "games_game"."name", "games_game"."sort_name", "games_game"."rating_id", "games_game"."box_art", "games_game"."description", "games_game"."slug", "games_game"."giantbomb_id", "games_game"."ignore_giantbomb", "games_game"."ignore_front_page", "games_game"."approved", "games_game"."user_id", "games_game"."last_edited_by_id", "games_game"."dlc", "games_game"."parent_game_id"
FROM
"collection_sortedshelfgames"
INNER JOIN
"games_game"
ON
("collection_sortedshelfgames"."game_id" = "games_game"."id")
WHERE
"collection_sortedshelfgames"."id"
IN (
SELECT
DISTINCT ON (U0."game_id") U0."id"
FROM
"collection_sortedshelfgames" U0
INNER JOIN
"collection_shelf" U1 ON (U0."shelf_id" = U1."id")
WHERE
U1."user_id" = 1
ORDER
BY U0."game_id" ASC, U0."date_added" ASC
)
ORDER BY
"games_game"."sort_name" ASC
I think you don't need a Subquery for this.
Here's what I ended up doing to solve this. Instead of using a Subquery, I evaluated the inner query first to build a list of primary keys, then fed that list into the outer query. It looks like this:
pks = list(self.filter(shelf__user=user).distinct('game').values_list('pk', flat=True))
games = self.filter(pk__in=pks)
games = games.order_by('game__sort_name')
This ended up being pretty fast. This is essentially the same thing as the Subquery method, but whatever was going on underneath the hood in python/Django was slowing this way down.
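The two-step idea can be sketched outside Django with SQLite. Everything here is an assumption made for illustration: the table and column names are simplified stand-ins, the rows are invented, and the "earliest row per game" step is done in Python because SQLite has no DISTINCT ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE game (id INTEGER PRIMARY KEY, sort_name TEXT);
    CREATE TABLE shelf_game (
        id INTEGER PRIMARY KEY,
        game_id INTEGER REFERENCES game(id),
        user_id INTEGER,
        date_added TEXT
    );
    INSERT INTO game VALUES (1, 'zelda'), (2, 'asteroids');
    INSERT INTO shelf_game VALUES
        (10, 1, 1, '2020-01-02'),
        (11, 1, 1, '2020-01-01'),
        (12, 2, 1, '2020-03-01'),
        (13, 2, 2, '2019-01-01');
""")

# Step 1: evaluate the inner query and keep the earliest row per game.
rows = conn.execute(
    "SELECT id, game_id FROM shelf_game WHERE user_id = ? "
    "ORDER BY game_id, date_added",
    (1,),
).fetchall()
first_pk_per_game = {}
for pk, game_id in rows:
    first_pk_per_game.setdefault(game_id, pk)  # first seen is the earliest
pks = list(first_pk_per_game.values())

# Step 2: feed the materialised key list into the outer, sorted query.
placeholders = ", ".join("?" for _ in pks)
sorted_rows = conn.execute(
    f"SELECT sg.id, g.sort_name FROM shelf_game sg "
    f"JOIN game g ON g.id = sg.game_id "
    f"WHERE sg.id IN ({placeholders}) ORDER BY g.sort_name",
    pks,
).fetchall()

print(pks, sorted_rows)
```

The trade-off is one extra round trip and a key list held in memory, which is only reasonable while the number of distinct games per user stays modest.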

Django - improve the query consisting many-to-many and foreignKey fields

I want to export a report from the available data into a CSV file. I wrote the following code and it works fine. What do you suggest to improve the query?
Models:
class shareholder(models.Model):
    title = models.CharField(max_length=100)
    code = models.IntegerField(null=False)

class Company(models.Model):
    isin = models.CharField(max_length=20, null=False)
    cisin = models.CharField(max_length=20)
    name_fa = models.CharField(max_length=100)
    name_en = models.CharField(max_length=100)

class company_shareholder(models.Model):
    company = models.ManyToManyField(Company)
    shareholder = models.ForeignKey(shareholder, on_delete=models.SET_NULL, null=True)
    share = models.IntegerField(null=True)  # TODO: *1000000
    percentage = models.DecimalField(max_digits=8, decimal_places=2, null=True)
    difference = models.DecimalField(max_digits=11, decimal_places=2, null=True)
    update_datetime = models.DateTimeField(null=True)
View:
def ExportAllShare(request):
    response = HttpResponse(content_type='text/csv')
    response['Content-Disposition'] = 'attachment; filename="shares.csv"'
    response.write(u'\ufeff'.encode('utf8'))
    writer = csv.writer(response)
    writer.writerow(['date', 'company', 'shareholder title', 'shareholder code', 'difference', 'share'])
    results = company_shareholder.objects.all()
    for result in results:
        row = (
            result.update_datetime,
            result.company.first().name_fa,
            result.shareholder.title,
            result.shareholder.code,
            result.difference,
            result.share,
        )
        writer.writerow(row)
    return response
First of all, if it's working fine for you, then it's working fine; don't optimize prematurely.
But in a query like this you are running into the N+1 problem: one query fetches the rows, and then each row triggers further queries for its related objects. In Django you avoid it using select_related and prefetch_related, like this:
results = company_shareholder.objects.select_related('shareholder').prefetch_related('company').all()
This should reduce the number of queries you are generating. If you need a little bit more performance and since you are not using percentage I would defer it.
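To make the N+1 pattern concrete, here is a small sketch that counts queries against an in-memory SQLite database. It is not Django code: the two tables are simplified stand-ins for the models above, the rows are invented, and run() is a hypothetical helper that records each statement it executes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shareholder (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE holding (
        id INTEGER PRIMARY KEY,
        shareholder_id INTEGER REFERENCES shareholder(id),
        share INTEGER
    );
    INSERT INTO shareholder VALUES (1, 'Fund A'), (2, 'Fund B');
    INSERT INTO holding VALUES (1, 1, 100), (2, 2, 250), (3, 1, 75);
""")

executed = []  # every SQL statement issued through run()

def run(sql, params=()):
    """Execute a query and record it so the two strategies can be compared."""
    executed.append(sql)
    return conn.execute(sql, params).fetchall()

# N+1: one query for the holdings, then one more query per holding.
naive = []
for _id, shareholder_id, share in run(
    "SELECT id, shareholder_id, share FROM holding ORDER BY id"
):
    title = run("SELECT title FROM shareholder WHERE id = ?", (shareholder_id,))[0][0]
    naive.append((title, share))
naive_queries = len(executed)

# Joined: the related rows arrive in the same single query.
executed.clear()
joined = run(
    "SELECT s.title, h.share FROM holding h "
    "JOIN shareholder s ON s.id = h.shareholder_id ORDER BY h.id"
)
joined_queries = len(executed)

print(naive_queries, joined_queries)
```

Both strategies produce the same rows; the difference is that the naive loop's query count grows with the number of rows, which is the effect select_related is meant to remove for foreign keys.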
Also, I would highly suggest you follow PEP8 styling guide and name your classes in CapWords convention like Shareholder and CompanyShareholder.

Reference multiple foreign keys in Django Model

I'm making a program that helps log missions in a game. In each of these missions I would like to be able to select a number of astronauts that will go along with it out of the astronauts table. This is fine when I only need one, but how could I approach multiple foreign keys in a field?
I currently use a 'binary' string that specifies which astronauts are associated with the mission (1000 means only Jeb, not Bill, Bob, or Val, and 0001 means only Val), with the first digit corresponding to the astronaut with id 1 and so forth. This works, but it feels quite clunky.
Here's the model.py for the two tables in question.
class astronauts(models.Model):
    name = models.CharField(max_length=200)
    adddate = models.IntegerField(default=0)
    experience = models.IntegerField(default=0)
    career = models.CharField(max_length=9, blank=True, null=True)
    alive = models.BooleanField(default=True)

    def __str__(self):
        return self.name

    class Meta:
        verbose_name_plural = "Kerbals"

class missions(models.Model):
    # mission details
    programid = models.ForeignKey(programs, on_delete=models.SET("Unknown"))
    missionid = models.IntegerField(default=0)
    status = models.ForeignKey(
        missionstatuses, on_delete=models.SET("Unknown"))
    plan = models.CharField(max_length=1000)
    # launch
    launchdate = models.IntegerField(default=0)
    crewmembers = models.IntegerField(default=0)
    # recovery
    summary = models.CharField(max_length=1000, blank=True)
    recdate = models.IntegerField(default=0)

    def __str__(self):
        return str(self.programid) + '-' + str(self.missionid)

    class Meta:
        verbose_name_plural = "Missions"
I saw a post about an 'intermediate linking table' to store the crew list but that also isn't ideal.
Thanks!
This is the use case for Django's ManyToManyField. Change the appropriate field on the missions:
class missions(models.Model):
    crewmembers = models.ManyToManyField('astronauts')
You can access this from the astronauts model side like so:
jeb = astronauts.objects.get(name='Jebediah Kerman')
crewed_missions = jeb.missions_set.all()
Or from the mission side like so:
mission = missions.objects.order_by('?')[0]
crew = mission.crewmembers.all()
This creates another table in the database, in case that is somehow a problem for you.
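The extra table is a plain link table with one row per (mission, astronaut) pair. A rough sketch of that layout in SQLite follows; the table and column names are assumptions chosen for readability (Django derives its own names), and the sample data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE astronauts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE missions (id INTEGER PRIMARY KEY, plan TEXT);
    -- Hypothetical link table: one row per crew assignment.
    CREATE TABLE missions_crewmembers (
        missions_id INTEGER REFERENCES missions(id),
        astronauts_id INTEGER REFERENCES astronauts(id),
        UNIQUE (missions_id, astronauts_id)
    );
    INSERT INTO astronauts VALUES (1, 'Jeb'), (2, 'Bill'), (3, 'Bob'), (4, 'Val');
    INSERT INTO missions VALUES (1, 'Mun flyby'), (2, 'Minmus landing');
    INSERT INTO missions_crewmembers VALUES (1, 1), (1, 4), (2, 1);
""")

def crew_of(mission_id):
    """Names of the astronauts assigned to one mission."""
    rows = conn.execute(
        "SELECT a.name FROM astronauts a "
        "JOIN missions_crewmembers mc ON mc.astronauts_id = a.id "
        "WHERE mc.missions_id = ? ORDER BY a.id",
        (mission_id,),
    )
    return [name for (name,) in rows]

def missions_of(astronaut_id):
    """Plans of the missions one astronaut has flown on."""
    rows = conn.execute(
        "SELECT m.plan FROM missions m "
        "JOIN missions_crewmembers mc ON mc.missions_id = m.id "
        "WHERE mc.astronauts_id = ? ORDER BY m.id",
        (astronaut_id,),
    )
    return [plan for (plan,) in rows]

print(crew_of(1), missions_of(1))
```

Compared with the binary string, adding a fifth astronaut needs no change to existing mission rows, and both directions of the lookup are ordinary joins.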

Django - How to link tables

Hello to the stackoverflow team,
I have the following two django tables:
class StraightredFixture(models.Model):
    fixtureid = models.IntegerField(primary_key=True)
    soccerseason = models.IntegerField(db_column='soccerSeason')  # Field name made lowercase.
    hometeamid = models.IntegerField()
    awayteamid = models.IntegerField()
    fixturedate = models.DateTimeField()
    fixturestatus = models.CharField(max_length=24)
    fixturematchday = models.IntegerField()
    hometeamscore = models.IntegerField()
    awayteamscore = models.IntegerField()

    class Meta:
        managed = False
        db_table = 'straightred_fixture'

class StraightredTeam(models.Model):
    teamid = models.IntegerField(primary_key=True)
    teamname = models.CharField(max_length=36)
    teamcode = models.CharField(max_length=5)
    teamshortname = models.CharField(max_length=24)

    class Meta:
        managed = False
        db_table = 'straightred_team'
In the views.py I know I can put the following and it works perfectly:
def test(request):
    fixture = StraightredFixture.objects.get(fixtureid=136697)
    return render(request, 'straightred/test.html', {'name': fixture.hometeamid})
As I mentioned above, this all works well but I am looking to return the teamname of the hometeamid which can be found in the StraightredTeam model.
After some looking around I have been nudged in the direction of "select_related" but I am unclear on how to implement it in my existing tables and also if it is the most efficient way for this type of query. It feels right.
Please note this code was created using "python manage.py inspectdb".
Any advice at this stage would be greatly appreciated. Many thanks, Alan.
See model relationships.
Django provides special model fields to manage table relationships.
The one suiting your needs is ForeignKey.
Instead of declaring:
hometeamid = models.IntegerField()
awayteamid = models.IntegerField()
which I guess is the result of python manage.py inspectdb, you would declare:
home_team = models.ForeignKey('<app_name>.StraightredTeam', db_column='hometeamid', related_name='home_fixtures')
away_team = models.ForeignKey('<app_name>.StraightredTeam', db_column='awayteamid', related_name='away_fixtures')
By doing this, you tell the Django ORM to handle the relationship under the hood, which will allow you to do such things as:
fixture = StraightredFixture.objects.get(fixtureid=some_fixture_id)
fixture.home_team # Returns the associated StraightredTeam instance.
team = StraightredTeam.objects.get(teamid=some_team_id)
team.home_fixtures.all() # Return all at home fixtures for that team.
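At the database level, two foreign keys to the same table correspond to joining that table twice under different aliases. A minimal SQLite sketch of that idea, using the table names from the question with reduced columns and invented team names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE straightred_team (teamid INTEGER PRIMARY KEY, teamname TEXT);
    CREATE TABLE straightred_fixture (
        fixtureid INTEGER PRIMARY KEY,
        hometeamid INTEGER REFERENCES straightred_team(teamid),
        awayteamid INTEGER REFERENCES straightred_team(teamid)
    );
    INSERT INTO straightred_team VALUES (1, 'Home FC'), (2, 'Away United');
    INSERT INTO straightred_fixture VALUES (136697, 1, 2);
""")

# The team table is joined twice, once per foreign key column.
row = conn.execute("""
    SELECT home.teamname, away.teamname
    FROM straightred_fixture f
    JOIN straightred_team home ON home.teamid = f.hometeamid
    JOIN straightred_team away ON away.teamid = f.awayteamid
    WHERE f.fixtureid = ?
""", (136697,)).fetchone()

print(row)
```

This is also why each foreign key needs its own related_name: the reverse side has to distinguish home fixtures from away fixtures.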
I am not sure if this makes sense for managed = False, but I suppose the sane way of doing it in Django would be with
home_team = models.ForeignKey('StraightredTeam', db_column='hometeamid')
and then just using fixture.home_team instead of doing queries by hand.

django queryset include more columns in select statement

I've been trying to create a backward relation using a queryset. The join itself works fine, except that it doesn't include the other joined table's columns in the SELECT. Below are my models, my queryset, and the output of printing query.__str__():
class Main(models.Model):
    slug = models.SlugField()
    is_active = models.BooleanField(default=True)
    site = models.ForeignKey(Site)
    parent = models.ForeignKey('self', blank=True, null=True, limit_choices_to={'parent': None})

    class Meta:
        unique_together = (("slug", "parent"))

    def __unicode__(self):
        return self.slug

class MainI18n(models.Model):
    main = models.ForeignKey(Main)
    language = models.CharField(max_length=2, choices=settings.LANGUAGES)
    title = models.CharField(max_length=100)
    label = models.CharField(max_length=200, blank=True, null=True)
    description = models.TextField(blank=True, null=True)
    disclaimer = models.TextField(blank=True, null=True)

    class Meta:
        unique_together = (("language", "main"))

    def __unicode__(self):
        return self.title

class List(models.Model):
    main = models.ForeignKey(Main)
    slug = models.SlugField(unique=True)
    is_active = models.BooleanField(default=True)
    parent = models.ForeignKey('self', blank=True, null=True)

    def __unicode__(self):
        return self.slug

class ListI18n(models.Model):
    list = models.ForeignKey(List)
    language = models.CharField(max_length=2, choices=settings.LANGUAGES)
    title = models.CharField(max_length=50)
    description = models.TextField()

    class Meta:
        unique_together = (("language", "list"))

    def __unicode__(self):
        return self.title
and my queryset is
Main.objects.select_related('main', 'parent').filter(list__is_active=True, maini18n__language='en', list__listi18n__language='en')
and this is what my query is printing
'SELECT `category_main`.`id`, `category_main`.`slug`, `category_main`.`is_active`, `category_main`.`site_id`, `category_main`.`parent_id`, T5.`id`, T5.`slug`, T5.`is_active`, T5.`site_id`, T5.`parent_id` FROM `category_main` INNER JOIN `category_maini18n` ON (`category_main`.`id` = `category_maini18n`.`main_id`) INNER JOIN `category_list` ON (`category_main`.`id` = `category_list`.`main_id`) INNER JOIN `category_listi18n` ON (`category_list`.`id` = `category_listi18n`.`list_id`) LEFT OUTER JOIN `category_main` T5 ON (`category_main`.`parent_id` = T5.`id`) WHERE (`category_maini18n`.`language` = en AND `category_list`.`is_active` = True AND `category_listi18n`.`language` = en )'
Can anyone help me show the columns from list and listi18n? I tried extra(), but it doesn't allow me to pass things like category_list.*
Thanks
UPDATE
Thanks to Daniel's approach, I managed to get it to work, but I had to start from ListI18n instead:
ListI18n.objects.select_related('list', 'list__main', 'list__main__parent', 'list__main__i18nmain').filter(list__is_active=True, list__main__maini18n__language='en', language='en').query.__str__()
It's working perfectly now, but I couldn't include list__main__maini18n. Below is the output query:
'SELECT `category_listi18n`.`id`, `category_listi18n`.`list_id`, `category_listi18n`.`language`, `category_listi18n`.`title`, `category_listi18n`.`description`, `category_list`.`id`, `category_list`.`main_id`, `category_list`.`slug`, `category_list`.`is_active`, `category_list`.`parent_id`, `category_main`.`id`, `category_main`.`slug`, `category_main`.`is_active`, `category_main`.`site_id`, `category_main`.`parent_id`, T5.`id`, T5.`slug`, T5.`is_active`, T5.`site_id`, T5.`parent_id` FROM `category_listi18n` INNER JOIN `category_list` ON (`category_listi18n`.`list_id` = `category_list`.`id`) INNER JOIN `category_main` ON (`category_list`.`main_id` = `category_main`.`id`) INNER JOIN `category_maini18n` ON (`category_main`.`id` = `category_maini18n`.`main_id`) LEFT OUTER JOIN `category_main` T5 ON (`category_main`.`parent_id` = T5.`id`) WHERE (`category_list`.`is_active` = True AND `category_listi18n`.`language` = en AND `category_maini18n`.`language` = en )'
Any idea how I can include MainI18n in the query result? Should I use extra(), include the tables, and do the relation in the WHERE clause, or is there a better approach?
The relationship from Main to List is a backwards ForeignKey (ie the FK is on List pointing at Main), and select_related doesn't work that way. When you think about it, this is correct: there are many Lists for each Main, so it doesn't make sense to say "give me the one List for this Main", which is what select_related is all about.
If you started from List, it would work:
List.objects.select_related('main__parent').filter(is_active=True, main__maini18n__language='en', listi18n__language='en')
because that way you're only following forwards relationships. You may find you're able to reorder your views/templates to use the query this way round.
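The difference between the two directions shows up clearly in plain SQL. In this SQLite sketch (table names borrowed from the query output above, columns reduced, rows invented), joining from the List side gives each row exactly one Main, while joining from the Main side repeats the single Main row once per List:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category_main (id INTEGER PRIMARY KEY, slug TEXT);
    CREATE TABLE category_list (
        id INTEGER PRIMARY KEY,
        main_id INTEGER REFERENCES category_main(id),
        slug TEXT
    );
    INSERT INTO category_main VALUES (1, 'cars');
    INSERT INTO category_list VALUES (1, 1, 'sedans'), (2, 1, 'trucks');
""")

# Forward direction: every list row has exactly one main row to attach.
from_list = conn.execute("""
    SELECT l.slug, m.slug
    FROM category_list l
    JOIN category_main m ON m.id = l.main_id
    ORDER BY l.id
""").fetchall()

# Backward direction: the one main row is duplicated for each of its lists.
from_main = conn.execute("""
    SELECT m.slug, l.slug
    FROM category_main m
    JOIN category_list l ON l.main_id = m.id
    ORDER BY l.id
""").fetchall()

main_row_count = conn.execute("SELECT COUNT(*) FROM category_main").fetchone()[0]
print(from_list, from_main, main_row_count)
```

With one Main in the table, the backward join still returns two rows, so there is no single List to attach to the Main instance.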
Looks to me it is actually working (T5 in your select statement). You can always access fields from related instances in django via something like my_obj.parent.is_active. If you used select_related before they are included in the first query. If you didn't specify it in select_related a call to my_obj.parent.is_active for example would perform an extra db query.
