Django model set lookup very slow - python

I'm getting a very slow lookup in my Django models.
I have two tables:
class Scan(models.Model):
    scan_name = models.CharField(max_length=32, unique=True, validators=[alphanumeric_plus_validator])
class ScanProcessingInfo(models.Model):
    scan_name = models.CharField(max_length=32)
    processing_name = models.CharField(max_length=64)
    in_progress = models.BooleanField(default=False)
When I perform the following operation to get a list of all Scan objects which have a ScanProcessingInfo for a specific processing_name:
scans = models.Scan.objects.all()
scan_set = []
for scan in scans:
    if self.set_type_definition.test_scan(scan, self.arg1, self.arg2):
        scan_set.append(scan)
(test_scan routes to)
def get_proc_info_been_done(scan, spd_name):
    try:
        proc_info = models.ScanProcessingInfo.objects.get(scan_name=scan.scan_name)
    except models.ScanProcessingInfo.DoesNotExist:
        proc_info = None
    if proc_info is None:
        return False
    return not proc_info.in_progress
the request takes about 10 seconds. There are 300 Scans in total and 10 ScanProcessingInfos. The db backend is an RDS MySQL db. I also expect someone will tell me off for using strings for the cross-table identifiers, but I doubt that's the cause here.
I'm sure I'm doing something obvious wrong, but would appreciate a pointer, thank you.

I think what you're asking is how to get all Scans for which a matching ScanProcessingInfo exists.
The first thing to do is to declare the actual relationship. You don't need to change your database (you should, but you don't have to); you can use your existing underlying field, but just tell Django to treat it as a foreign key.
class ScanProcessingInfo(models.Model):
    # db_column (not db_field) is the option that names the underlying column
    scan = models.ForeignKey('Scan', to_field='scan_name', db_column='scan_name', on_delete=models.DO_NOTHING)
Now you can use this relationship to get all the scans in one go:
scan_set = Scan.objects.exclude(scanprocessinginfo=None)
Edit
To get all matching objects with a specific attribute, use the double-underscore syntax:
scan_set = Scan.objects.filter(scanprocessinginfo__processing_name=spd_name)
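To see why the original loop is slow: it issues one query for the list of scans plus one more per scan, so about 301 round trips for 300 scans, and 10 seconds spread over that many queries is roughly 33 ms each. The sketch below reproduces that access pattern with the standard-library sqlite3 module instead of Django. The table layout and the run_query helper are assumptions made for illustration, and the SQL is not what the ORM generates verbatim.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scan (scan_name TEXT PRIMARY KEY);
    CREATE TABLE scan_processing_info (
        scan_name TEXT, processing_name TEXT, in_progress INTEGER
    );
""")
# Same sizes as in the question: 300 scans, 10 processing-info rows.
conn.executemany("INSERT INTO scan VALUES (?)",
                 [("scan%d" % i,) for i in range(300)])
conn.executemany("INSERT INTO scan_processing_info VALUES (?, 'spd', 0)",
                 [("scan%d" % i,) for i in range(10)])

query_count = 0


def run_query(sql, params=()):
    """Hypothetical helper: run one SELECT and count the round trip."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def per_scan_lookup():
    # Mirrors the loop in the question: one extra query per scan.
    done = []
    for (name,) in run_query("SELECT scan_name FROM scan"):
        rows = run_query(
            "SELECT in_progress FROM scan_processing_info WHERE scan_name = ?",
            (name,))
        if rows and not rows[0][0]:
            done.append(name)
    return done


def joined_lookup():
    # Mirrors a filter across the relationship: a single JOIN.
    rows = run_query(
        "SELECT s.scan_name FROM scan s "
        "JOIN scan_processing_info p ON p.scan_name = s.scan_name "
        "WHERE p.in_progress = 0")
    return [name for (name,) in rows]


slow = per_scan_lookup()
slow_queries = query_count
query_count = 0
fast = joined_lookup()
fast_queries = query_count
print(slow_queries, fast_queries)  # 301 1
```

Both functions return the same ten scans; only the number of round trips differs.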

Use a many-to-one relationship:
scan_name = models.ForeignKey(Scan, related_name='processing_infos', on_delete=models.CASCADE)

Related

Django fake model instantiation - No testunit [duplicate]

This question already has an answer here:
How can I create a django model instance with deferred fields without hitting the database?
(1 answer)
Closed 8 months ago.
I want to know whether I can instantiate an empty fake model with just the id of a database record.
I found ways to create a mock model, but I want a production-friendly solution.
Explanation of my issue:
I want to list the settings of users who chose to be displayed in public mode:
user_displayed_list = UserPublicProfile.objects.filter(
    displayed=True,
).only(
    'user_id',
    'is_premium',
)
user_settings_list = []
for user_displayed in user_displayed_list:
    # I have to send a User instance to the next method:
    user_settings = self.get_user_settings(user_displayed.user)
    user_settings_list.append(user_settings)
    # But 'user_displayed.user' runs a new SQL query
I know I can improve my queryset like this:
user_displayed_list = UserPublicProfile.objects.filter(
    displayed=True,
).select_related(
    'user'
).only(
    'user',
    'is_premium',
)
But it makes a useless join, because I only need the user id field in get_user_settings().
The get_user_settings() method (it may help to understand the context):
def get_user_settings(self, user):
    user_settings = UserSettings.objects.get(user=user)
    return user_settings
In the real project, this method runs more business logic.
Is there a way to instantiate a User model instance with only the id field filled?
I don't want to use a custom empty class written for this purpose. I really want a User object.
I didn't find anything for that. If it's possible, I could use it this way:
for user_displayed in user_displayed_list:
    FakeUser = User.objects.create_fake(id=user_displayed.user_id)
    # I have to send a User instance to the next method:
    user_settings = self.get_user_settings(FakeUser)
Without seeing the complete models, I'm assuming a bit: that UserSettings has a ForeignKey to User, and the same for UserPublicProfile. It also works if User has a ForeignKey to UserSettings.
Assuming that, I see two solutions.
Solution #1: use the ORM to its full potential
Just saw your comment about the 'legacy method, used many times'.
Django relations are very smart. In a lookup, they accept either the related object or its ID.
You'd imagine this only works with a User. But if you pass the id, the Django ORM will help you out.
def get_user_settings(self, user):
    user_settings = UserSettings.objects.get(user=user)
    return user_settings
So in reality, these work the same:
UserSettings.objects.get(user=1)
UserSettings.objects.get(user_id=1)
Which means this should work without an extra query:
user_displayed_list = UserPublicProfile.objects.filter(
    displayed=True,
).only(
    'user_id',
    'is_premium',
)
user_settings_list = []
for user_displayed in user_displayed_list:
    # Pass the user_id instead of the User instance.
    user_settings = self.get_user_settings(user_displayed.user_id)
    user_settings_list.append(user_settings)
Solution #2: chain relations
Another solution, again, still assuming quite a bit ;)
I would think you can chain the models together.
Assuming these FKs exist: UserPublicProfile -> User -> UserSettings.
You could do this:
user_displayed_list = UserPublicProfile.objects.filter(
    displayed=True,
).select_related(
    'user', 'user__usersettings',  # depends on naming of relations
).only(
    'user',
    'is_premium',
)
for user_displayed in user_displayed_list:
    # Joined, so this should cause no extra queries. Depends on naming of relations.
    user_settings = user_displayed.user.usersettings
    user_settings_list.append(user_settings)

Django Models .get fails but .filter and .all works - object exists in database

Racking my brain on this one. The model seems correct, and theoretically all of the commented permutations should work, but the only things that can successfully retrieve the user are .filter and .all; .get doesn't work. I can deal with using either .filter or .all, but why isn't .get working?
I'll reiterate that a direct SQL query works 100% in this case. All imports are in place and things are functioning great at a low level. Again, filter works, all works, but get fails for some reason.
class UserModelTest(TestCase):
    def test_getUserByUsername(self):
        sanity = True
        try:
            #u = User.objects.filter(username='wadewilliams')
            u = User.objects.get(username='wadewilliams')
            #u = User.objects.get(pk=15773)
            #u = User.objects.all()
            print(u)
        except User.DoesNotExist:
            sanity = False
        self.assertEqual(sanity, True)
... That test fails unless I uncomment either the filter or the all line; with either of the gets, no go.
And the model...
class User(models.Model):
    userid = models.IntegerField(primary_key=True, db_column='userID')
    username = models.CharField(max_length=135)
    realname = models.CharField(max_length=150, db_column='name')
    email = models.CharField(max_length=765, blank=True)
    class Meta:
        db_table = u'users'
    def __unicode__(self):
        return self.username + ' (' + self.email + ')'
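The test suite creates a test database that is blank, so no users can be found even though they exist in the production/development database. .filter() and .all() only appear to work because they return an empty queryset instead of raising, whereas .get() raises User.DoesNotExist when nothing matches.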
From the docs:
Finding data from your production database when running tests?
If your code attempts to access the database when its modules are compiled, this will occur before the test database is set up, with potentially unexpected results. For example, if you have a database query in module-level code and a real database exists, production data could pollute your tests. It is a bad idea to have such import-time database queries in your code anyway - rewrite your code so that it doesn't do this.
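The behaviour described in this answer can be imitated without Django. The sketch below uses the standard-library unittest and sqlite3 modules: each test gets a fresh, empty in-memory database, loosely analogous to the blank test database, so a lookup only succeeds if the test creates the row itself. The table layout, the get_user helper and the LookupError it raises are assumptions made for illustration.

```python
import sqlite3
import unittest


def get_user(conn, username):
    """Hypothetical stand-in for .get(): raise when no row matches."""
    row = conn.execute(
        "SELECT userID, username FROM users WHERE username = ?",
        (username,)).fetchone()
    if row is None:
        raise LookupError("no user named %r" % username)
    return row


class UserLookupTest(unittest.TestCase):
    def setUp(self):
        # Fresh, empty database for every test.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE users (userID INTEGER PRIMARY KEY, username TEXT)")

    def tearDown(self):
        self.conn.close()

    def test_empty_database_has_no_users(self):
        with self.assertRaises(LookupError):
            get_user(self.conn, "wadewilliams")

    def test_user_created_in_test_is_found(self):
        # The test has to create the data it expects to read back.
        self.conn.execute("INSERT INTO users VALUES (15773, 'wadewilliams')")
        self.assertEqual(get_user(self.conn, "wadewilliams")[1], "wadewilliams")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(UserLookupTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a Django test the equivalent step would be creating the user in setUp (or loading a fixture) before calling User.objects.get.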

Django: Proper Way to Update Models

Suppose I have the following functions, which retrieve data from a server in the form of user-defined objects. For example, let's define the objects as PersonEntry.
def retrieve_people():
    # returns a list of all latest people objects, with fields equal to those in PersonEntry
    ...
def retrieve_books():
    # returns a list of all latest book objects regardless of author, with fields equal to those in BookEntry, contains attribute,
    ...
Both user-defined classes have an .as_dict() method which returns all of their attributes in a dictionary.
I would like to update the model whenever this function is called (i.e. update the fields if an instance of that model already exists; otherwise, create a new instance of the model). This is my current setup.
class PersonEntry(models.Model):
    name = models.CharField(max_length=50)
    age = models.IntegerField()
    biography = models.CharField(max_length=200)
def update_persons():
    for person in retrieve_people():
        # filter().update() never raises DoesNotExist; it returns the number of rows matched
        match = PersonEntry.objects.filter(name=person.name(), age=person.age())
        if not match.update(**person.as_dict()):
            PersonEntry.objects.create(**person.as_dict())
class BookEntry(models.Model):
    author = models.ForeignKey(PersonEntry, on_delete=models.CASCADE)
    author_name = models.CharField(max_length=50)  # books return redundant info
    author_age = models.IntegerField()  # books return redundant info
    title = models.CharField(max_length=50)
    summary = models.CharField(max_length=200)
def update_books():
    for book in retrieve_books():
        author = associate_person(book.author_age(), book.author_name())
        match = BookEntry.objects.filter(title=book.title())
        if not match.update(author=author, **book.as_dict()):
            BookEntry.objects.create(author=author, **book.as_dict())
def associate_person(age, name):
    return PersonEntry.objects.get(name=name, age=age)
I suppose a more general question is: how do I update models with relationships if I have a function which returns data? Do I put a method in the model itself, or one level up (i.e. move update_books to the Person model)? I'm new to Django, so I'm not really sure how the code should be organized.
I confess I haven't completely grokked your question, but I'll take a punt that you should look into Managers.
Generally, in Django, everything is done as lazily as possible, meaning nothing gets updated until you actually try to use it. So you don't update models/relationships as you go; rather, you just declare what they are (perhaps with a manager), and Django works out the current value only when asked.
Methods returning a collection (or queryset) of some kind of model should be part of the managers. So in your case update_books should be in a custom manager for BookEntry, and update_persons should be in a custom manager for PersonEntry.
Also, do not call the retrieve_* functions from inside the model or manager. Call them in your application logic and then pass the result to the manager method, unless that function itself is part of the manager/model.
You do not need separate filter and update calls. Depending on how you have retrieved the data, if it has a pk you can directly call save(). See How Does Django Know when to create or update.
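The update-or-create step the question is after can also be sketched outside Django. The example below uses the standard-library sqlite3 module: it tries an UPDATE first and inserts only when no row was matched. The table layout and the upsert_person name are assumptions made for illustration; in Django itself, QuerySet.update_or_create() covers the same need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person_entry (name TEXT, age INTEGER, biography TEXT)")


def upsert_person(conn, name, age, biography):
    """Update the row matching (name, age), or insert it if none exists."""
    cur = conn.execute(
        "UPDATE person_entry SET biography = ? WHERE name = ? AND age = ?",
        (biography, name, age))
    if cur.rowcount == 0:  # nothing matched, so this is a new person
        conn.execute(
            "INSERT INTO person_entry (name, age, biography) VALUES (?, ?, ?)",
            (name, age, biography))
        return "created"
    return "updated"


first = upsert_person(conn, "Ada", 36, "first draft")
second = upsert_person(conn, "Ada", 36, "revised")
rows = conn.execute("SELECT name, age, biography FROM person_entry").fetchall()
```

Calling it twice with the same name and age leaves a single row holding the latest biography, which is the behaviour the try/except in the question was aiming for.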

Google App Engine Python Datastore

Basically what I'm trying to make is a data structure that has the user's name, id, and date joined. Then I want a "sub-structure" that has the user's "text" and the date it was modified. Each user will have multiple instances of this text.
class User(db.Model):
    ID = db.IntegerProperty()
    name = db.StringProperty()
    datejoined = db.DateTimeProperty(auto_now_add=True)
class Content(db.Model):
    text = db.StringProperty()
    datemod = db.DateTimeProperty(auto_now_add=True)
Is the code set up correctly?
One problem you will have is that making User.ID unique will be non-trivial. The problem is that two writes to the database could occur on different shards; both check at about the same time for existing entries that match the uniqueness constraint and find none, then both create identical entries (with regard to the unique property), and you end up with an invalid database state. To solve this, App Engine provides a means of ensuring that certain datastore entities are always placed on the same physical machine.
To do this, you make use of the entity keys to tell Google how to organize the entities. Let's assume you want the username to be unique. Change User to look like this:
class User(db.Model):
    datejoined = db.DateTimeProperty(auto_now_add=True)
Yes, that's really it. There's no username property, since the name is going to be used in the key, so it doesn't need to appear separately. If you like, you can do this...
class User(db.Model):
    datejoined = db.DateTimeProperty(auto_now_add=True)
    @property
    def name(self):
        return self.key().name()
To create an instance of a User, you now need to do something a little different: you need to specify a key_name when constructing it.
someuser = User(key_name='john_doe')
...
someuser.save()
Well, really you want to make sure that users don't overwrite each other, so you need to wrap the user creation in a transaction. First define a function that does the necessary check:
def create_user(username):
    checkeduser = User.get_by_key_name(username)
    if checkeduser is not None:
        raise db.Rollback('User already exists!')
    newuser = User(key_name=username)
    # more code
    newuser.put()
Then invoke it in this way:
db.run_in_transaction(create_user, 'john_doe')
To find a user, you just do this:
someuser = User.get_by_key_name('john_doe')
Next, you need some way to associate the content with its user, and vice versa. One solution is to put the content into the same entity group as the user by declaring the user as the parent of the content. To do this, you don't need to change the Content model at all, but you create instances a little differently (much like you did with User):
somecontent = Content(parent=User.get_by_key_name('john_doe'))
So, given a content item, you can look up the user by examining its key:
someuser = User.get(somecontent.key().parent())
Going in reverse, looking up all of the content for a particular user is only a little trickier.
allcontent = Content.gql('where ancestor is :user', user=someuser).fetch(10)
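The two ideas in this answer, a key name that doubles as the uniqueness constraint and content stored under a parent user, can be sketched in plain Python. This is only an in-memory model made for illustration, not the App Engine API: TinyStore, create_user, add_content and content_for are hypothetical names, and a threading.Lock stands in for the transaction that makes check-then-create atomic.

```python
import threading


class TinyStore:
    """Illustrative in-memory store; not the App Engine datastore."""

    def __init__(self):
        self._lock = threading.Lock()
        self._users = {}     # key_name -> user properties
        self._content = []   # (parent key_name, text) pairs

    def create_user(self, key_name, **props):
        # The check and the write happen under one lock, so two callers
        # cannot both see "no such user" and both create it.
        with self._lock:
            if key_name in self._users:
                raise ValueError("User already exists!")
            self._users[key_name] = dict(props)

    def add_content(self, parent, text):
        with self._lock:
            if parent not in self._users:
                raise KeyError(parent)
            self._content.append((parent, text))

    def content_for(self, parent):
        # Analogue of the ancestor query: everything stored under one user.
        with self._lock:
            return [text for p, text in self._content if p == parent]


store = TinyStore()
store.create_user("john_doe")
store.add_content("john_doe", "hello")
store.add_content("john_doe", "world")
try:
    store.create_user("john_doe")
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

The second create_user call is rejected because the key name is already taken, and content_for("john_doe") returns both pieces of text stored under that user.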
Yes, and if you need more documentation, you can check here for database types and here for more info about your model classes.
An alternative solution you may see is using ReferenceProperty.
class User(db.Model):
    name = db.StringProperty()
    datejoined = db.DateTimeProperty(auto_now_add=True)
class Content(db.Model):
    user = db.ReferenceProperty(User, collection_name='matched_content')
    text = db.StringProperty()
    datemod = db.DateTimeProperty(auto_now_add=True)
content = db.get(content_key)
user_name = content.user.name
# looking up all of the content for a particular user
user_content = content.user.matched_content
# create new content for a user (the property is named user, not reference)
new_content = Content(user=content.user)

How can I query for records based on an attribute of a ReferenceProperty? (Django on App Engine)

If I have the following models in a Python (+ Django) App Engine app:
class Album(db.Model):
    private = db.BooleanProperty()
    ...
class Photo(db.Model):
    album = db.ReferenceProperty(Album)
    title = db.StringProperty()
...how can I retrieve all Photos that belong to a public Album (that is, an Album with private == False)?
To further explain my intention, I thought it would be:
public_photos = Photo.all().filter('album.private = ', False)
and then I could do something like:
photos_for_homepage = public_photos.fetch(30)
but the query does not match anything, which tells me I'm going down the wrong path.
You can't. App Engine doesn't support joins.
One approach is to implement the join manually. For example, you could fetch all photos, then filter out the private ones in code. Or fetch all public albums, and then fetch each of their photos. Whether this performs okay depends on your data.
The alternative approach is to denormalize your data. Put another field in the Photo model, e.g.:
class Photo(db.Model):
    album = db.ReferenceProperty(Album)
    album_private = db.BooleanProperty()
    title = db.StringProperty()
Then you can filter for public photos with:
public_photos = Photo.all().filter('album_private = ', False)
This improves query performance, but at the expense of write performance. You will need to keep the album_private field of the photos updated whenever you change the private flag of the album. It depends on your data and read/write patterns as to whether this will be better or worse.
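A minimal sketch of that bookkeeping, in plain Python rather than the datastore API: the dataclasses and the add_photo and set_album_private helpers are assumptions made for illustration. The point is that every change to the album's flag also has to be written to each of its photos, which is the write cost mentioned above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Album:
    private: bool
    # Excluded from repr/eq to avoid following the album <-> photo cycle.
    photos: List["Photo"] = field(default_factory=list, repr=False, compare=False)


@dataclass
class Photo:
    title: str
    album: Album
    album_private: bool  # denormalized copy of album.private


def add_photo(album, title):
    # Copy the flag at creation time so photos can be filtered on their own.
    photo = Photo(title=title, album=album, album_private=album.private)
    album.photos.append(photo)
    return photo


def set_album_private(album, private):
    # One logical change, but one write per photo as well.
    album.private = private
    for photo in album.photos:
        photo.album_private = private


holiday = Album(private=False)
diary = Album(private=True)
all_photos = [add_photo(holiday, "beach"), add_photo(diary, "notes")]

public_before = [p.title for p in all_photos if not p.album_private]
set_album_private(diary, False)
public_after = [p.title for p in all_photos if not p.album_private]
```

Before the flag change only "beach" is public; afterwards both photos are, without ever reading the album while filtering.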
