I have a model with an integer field that must be unique and auto-increment per foreign key, and the following code is how I currently handle it:
class MyModel(models.Model):
    business = models.ForeignKey(Business)
    number = models.PositiveIntegerField()
    spam = models.CharField(max_length=255)

    class Meta:
        unique_together = (('number', 'business'),)

    def save(self, *args, **kwargs):
        if self.pk is None:  # New instances only
            try:
                highest_number = MyModel.objects.filter(business=self.business).order_by('-number')[0].number
                self.number = highest_number + 1
            except IndexError:  # First MyModel instance for this business
                self.number = 1
        super(MyModel, self).save(*args, **kwargs)
I have the following questions regarding this:
Multiple people can create MyModel instances for the same business, all over the internet. Is it possible for two people to create MyModel instances at the same time, with the highest-number query returning 500 for both, so that both essentially try to set self.number = 501 at the same time (raising an IntegrityError)? The answer seems like an obvious "yes, it could happen", but I had to ask.
Is there a shortcut, or "Best way" to do this, which I can use (or perhaps a SuperAutoField that handles this)?
I can't just slap a while model_not_saved: try: ... except IntegrityError: loop in, because other constraints in the model could lead to an endless loop, and a disaster worse than Chernobyl (maybe not quite that bad).
You want that constraint at the database level. Otherwise you're going to eventually run into the concurrency problem you discussed. The solution is to wrap the entire operation (read, increment, write) in a transaction.
Why can't you use an AutoField instead of a PositiveIntegerField?
number = models.AutoField()
However, in this case number is almost certainly going to equal yourmodel.id, so why not just use that?
Edit:
Oh, I see what you want: a number field that starts over for each business, incrementing only among instances that share the same MyModel.business.
I would still recommend just using the id field if you can, since it's certain to be unique. If you absolutely don't want to do that (maybe you're showing this number to users), then you will need to wrap your save method in a transaction.
You can read more about transactions in the docs:
http://docs.djangoproject.com/en/dev/topics/db/transactions/
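For illustration, here is a minimal sketch of that transactional approach using select_for_update() to lock the business's existing rows. transaction.atomic and select_for_update are standard Django APIs, but the structure below is a sketch, not the poster's code, and it needs a database with row-locking support:

from django.db import transaction

def save(self, *args, **kwargs):
    if self.pk is None:
        with transaction.atomic():
            # Lock this business's rows so concurrent saves serialize
            # their read-increment-write sequence.
            current = (MyModel.objects
                       .select_for_update()
                       .filter(business=self.business)
                       .order_by('-number')
                       .first())
            self.number = current.number + 1 if current else 1
            return super(MyModel, self).save(*args, **kwargs)
    super(MyModel, self).save(*args, **kwargs)

Note that locking existing rows cannot serialize two concurrent first inserts for a brand-new business, so the unique_together constraint is still worth keeping as a backstop.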
If you're just using this to count how many instances of MyModel have a FK to Business, you should do that as a query rather than trying to store a count.
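For example, the on-demand count is a single query (some_business is a placeholder):

# Count related rows when needed instead of persisting a counter.
MyModel.objects.filter(business=some_business).count()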
As part of using TimescaleDB, which requires a timestamp as the primary key (time in SensorReading below), I need to handle the case where the same timestamp is used for different sensor values. One elegant solution might be to smear colliding timestamps (add a microsecond on collision).
How can this problem be solved in a robust and performant manner for the following models?
from datetime import datetime

from django.db import models


class Sensor(models.Model):
    name = models.CharField(max_length=50)


class SensorReading(models.Model):
    time = models.DateTimeField(primary_key=True, default=datetime.now)
    sensor = models.ForeignKey(Sensor, on_delete=models.CASCADE)
    value = models.FloatField()
P.S. This is a workaround as Django does not support composite primary keys. Otherwise it would be possible to set the sensor and timestamp as a composite primary key.
To work with TimescaleDB, I made a virtual primary key field that tricks Django and represents the compound key value as a single JSON tuple.
https://viewflow.medium.com/the-django-compositeforeignkey-field-get-access-to-a-legacy-database-without-altering-db-tables-74abc9868026
You could check the code sample here - https://github.com/viewflow/cookbook/tree/v2/timescale_db
One solution I found was to use a try/except block around the model save call. Try to add a sensor reading; this will succeed most of the time, but if a collision occurs, handle the error: check that the exception was raised exactly because of that collision, then increment the timestamp by one microsecond (the smallest resolution) and retry the save. This will almost always succeed, given the low collision probability in the first place. Below is the code, tested to recursively increment the timestamp until the save succeeds.
from datetime import datetime, timedelta

from django.db import IntegrityError, models
from django.utils.dateparse import parse_datetime


class SensorReading(models.Model):
    time = models.DateTimeField(primary_key=True, default=datetime.now)
    sensor = models.ForeignKey(Sensor, on_delete=models.CASCADE)
    value = models.FloatField()

    def save(self, *args, **kwargs):
        self.save_increment_time_on_duplicate(*args, **kwargs)

    def save_increment_time_on_duplicate(self, *args, **kwargs):
        try:
            super().save(*args, **kwargs)
        except IntegrityError as exception:
            # Retry only when the error is the primary key collision on "time".
            if all(k in exception.args[0] for k in ("Key", "time", "already exists")):
                if isinstance(self.time, str):
                    self.time = parse_datetime(self.time)
                self.time += timedelta(microseconds=1)
                self.save_increment_time_on_duplicate(*args, **kwargs)
            else:
                raise
A more robust implementation might also add a max number of tries before aborting.
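As a hedged sketch (not part of the original answer), a bounded, iterative variant might look like this, reusing the imports above and assuming self.time already holds a datetime:

MAX_TRIES = 10  # assumed cap; tune to your collision rate

def save(self, *args, **kwargs):
    for _ in range(MAX_TRIES):
        try:
            return super().save(*args, **kwargs)
        except IntegrityError as exception:
            if all(k in exception.args[0] for k in ("Key", "time", "already exists")):
                self.time += timedelta(microseconds=1)  # smear and retry
            else:
                raise
    raise IntegrityError("no free timestamp found after %d attempts" % MAX_TRIES)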
In general, I'd recommend taking the management of that sensor table at least somewhat out of Django: do your insertion with psycopg2 or another driver, create a proper composite primary key on (sensor, time), then expose the table (or a view on top of it) as a Django model it can read from, perhaps with a join if you need one. These readings should be write-once, so you shouldn't have to deal with updates, which is usually why the non-compound primary key is needed. At some point Django should really start supporting composite primary keys, though; it's too bad it doesn't.
The other approaches may work, but they likely won't scale very well at all. So if you care about ingest rate, I might try something different.
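For illustration, a minimal sketch of that out-of-ORM ingest path (the connection string, table, and column names are assumptions):

import psycopg2

conn = psycopg2.connect("dbname=telemetry")  # assumed DSN
with conn, conn.cursor() as cur:
    # Table created outside Django with: PRIMARY KEY (sensor_id, time)
    cur.execute(
        "INSERT INTO sensor_reading (sensor_id, time, value) VALUES (%s, %s, %s)",
        (sensor_id, reading_time, value),
    )

On the Django side, a model with managed = False (and db_table pointing at the same table) lets the ORM read the data without trying to create or migrate the table itself.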
I have a model Student with the manager StudentManager given below. The college_end_date property computes the last date by adding college_duration to join_date. The property works fine when I access it on an instance, but using it from StudentManager raises an error. How do I write a manager that computes a field on the fly from model fields and uses it to filter records?
The computed field is not among the model fields; I still want to use it as filter criteria.
class StudentManager(models.Manager):
    def passed_students(self):
        return self.filter(college_end_date__lt=timezone.now())


class Student(models.Model):
    join_date = models.DateTimeField(auto_now_add=True)
    college_duration = models.IntegerField(default=4)

    objects = StudentManager()

    @property
    def college_end_date(self):
        last_date = self.join_date + timezone.timedelta(days=self.college_duration)
        return last_date
The error Django gives when I try to access Student.objects.passed_students():
django.core.exceptions.FieldError: Cannot resolve keyword 'college_end_date' into field. Choices are: join_date, college_duration
Q 1. How are alias queries done in the Django ORM?
By using annotate(...) or, if you're using the value only as a filter, alias(...) (new in Django 3.2).
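For instance, a minimal alias() sketch (this assumes the DurationField change recommended below, so the date arithmetic can happen in the database):

from django.db import models
from django.utils import timezone

passed = (
    Student.objects
    .alias(college_end=models.ExpressionWrapper(
        models.F('join_date') + models.F('college_duration'),
        output_field=models.DateTimeField(),
    ))
    .filter(college_end__lt=timezone.now())
)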
Q 2. Why can't a property be accessed in Django managers?
Because the model managers (more accurately, the QuerySets) wrap operations that are carried out in the database; you can think of a model manager as a high-level database wrapper.
But the property college_end_date is only defined on your model class. The database knows nothing about it, hence the error.
Q 3. How do I write a manager that filters records on a field that is not a model field but can be calculated from fields present in the model?
Using the annotate(...) method is the proper Django way of doing so. As a side note, complex property logic may not be reproducible with annotate(...).
In your case, I would change the college_duration field from IntegerField(...) to DurationField(...), since that makes more sense (to me).
Then update your manager and the property as follows:
from django.db import models
from django.utils import timezone


class StudentManager(models.Manager):
    def passed_students(self):
        default_qs = self.get_queryset()
        college_end = models.ExpressionWrapper(
            models.F('join_date') + models.F('college_duration'),
            output_field=models.DateField()
        )
        return default_qs \
            .annotate(college_end=college_end) \
            .filter(college_end__lt=timezone.now().date())


class Student(models.Model):
    join_date = models.DateTimeField()
    college_duration = models.DurationField()

    objects = StudentManager()

    @property
    def college_end_date(self):
        # return a date by summing the datetime and timedelta objects
        return (self.join_date + self.college_duration).date()
Note:
DurationField(...) works as expected in PostgreSQL, so this implementation will work as-is there. With other databases you may need a database function that operates on that database's datetime and duration types.
Personally, I like this solution,
To quote @Willem Van Onsem's comment:
You don't. The database does not know anything about properties, etc., so it cannot filter on them. You can make use of .annotate(..) to move the logic to the database side.
You can either do what he suggests, or make that a model field that is auto-calculated.
class StudentManager(models.Manager):
    def passed_students(self):
        return self.filter(college_end_date__lt=timezone.now())


class Student(models.Model):
    join_date = models.DateTimeField(auto_now_add=True)
    college_duration = models.IntegerField(default=4)
    college_end_date = models.DateTimeField()

    objects = StudentManager()

    def save(self, *args, **kwargs):
        # Compute the stored end date once, on first save.
        # (auto_now_add fills join_date during save, so fall back to now().)
        if not self.college_end_date:
            join_date = self.join_date or timezone.now()
            self.college_end_date = join_date + timezone.timedelta(days=self.college_duration)
        return super().save(*args, **kwargs)
Now you can search it in the database.
NOTE: This sort of thing is best done from the start on data you KNOW you're going to want to filter. If you have pre-existing data, you'll need to re-save all existing instances.
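A hedged sketch of that backfill (assuming the new column was first added as nullable or with a placeholder default so the migration can run):

# Re-save every existing row so save() fills in college_end_date.
for student in Student.objects.all():
    student.save()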
Problem
You're attempting to query on a column that doesn't exist in the database, and the Django ORM doesn't recognize a property as a field to register.
Solution
The direct answer to your question would be to create annotations, which you could subsequently query against. However, I would reconsider your table design for Student, as it introduces unnecessary complexity and maintenance overhead.
There's much more framework/db support for the start date plus end date idiom than for start date plus timedelta.
Instead of storing a duration, store end_date and calculate the duration in a model method. This not only makes more sense, since students are generally given a start date and an estimated graduation date rather than a duration, but it also makes queries like these much easier.
Example
Querying which students are graduating in 2020.
Student.objects.filter(end_date__year=2020)
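A hedged sketch of the redesigned model described above (field names are assumptions):

class Student(models.Model):
    start_date = models.DateField()
    end_date = models.DateField()

    @property
    def duration(self):
        # Derived on demand instead of stored.
        return self.end_date - self.start_date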
I want to model pair-wise relations between all members of a set.
class Match(models.Model):
    foo_a = models.ForeignKey(Foo, related_name='foo_a')
    foo_b = models.ForeignKey(Foo, related_name='foo_b')
    relation_value = models.IntegerField(default=0)

    class Meta:
        unique_together = (('foo_a', 'foo_b'),)
When I add a pair A-B, it successfully prevents me from adding A-B again, but does not prevent me from adding B-A.
I tried the following, but to no avail.
unique_together = (('foo_a', 'foo_b'), ('foo_b', 'foo_a'))
Edit:
I need the relation_value to be unique for every pair of items.
If you define a model like the one you defined, it's not just a ForeignKey; it's effectively a many-to-many relation.
The Django docs explicitly state that a unique together constraint cannot include a ManyToMany relation.
From the docs,
A ManyToManyField cannot be included in unique_together. (It’s not clear what that would even mean!) If you need to validate uniqueness related to a ManyToManyField, try using a signal or an explicit through model.
EDIT
After a lot of searching and some trial and error, I think I have found a solution for your scenario. Yes, as you said, the present schema is not as trivial as we all think. In this context, many-to-many relations are not the direction we need to pursue. The solution is (or what I think the solution is) the model clean method:
from django.core.exceptions import ValidationError


class Match(models.Model):
    foo_a = models.ForeignKey(Foo, related_name='foo_a')
    foo_b = models.ForeignKey(Foo, related_name='foo_b')

    def clean(self):
        # Query Match (not Foo) for the pair in either order.
        a_to_b = Match.objects.filter(foo_a=self.foo_a, foo_b=self.foo_b)
        b_to_a = Match.objects.filter(foo_a=self.foo_b, foo_b=self.foo_a)
        if a_to_b.exists() or b_to_a.exists():
            raise ValidationError('This pair already exists.')
For more details about the model clean method, refer to the docs. Note that clean() only runs as part of full_clean(), which ModelForm validation calls for you; a plain save() will not invoke it.
I've overridden the save method of the model to save two pairs every time: if the user wants to add a pair A-B, a record B-A with the same parameters is automatically added.
Note: This solution affects the querying speed. For my project, it is not an issue, but it needs to be considered.
def save(self, *args, **kwargs):
    if not Match.objects.filter(foo_a=self.foo_a, foo_b=self.foo_b).exists():
        super(Match, self).save(*args, **kwargs)
    if not Match.objects.filter(foo_a=self.foo_b, foo_b=self.foo_a).exists():
        Match.objects.create(foo_a=self.foo_b, foo_b=self.foo_a,
                             relation_value=self.relation_value)
EDIT: The update and delete methods need to be overridden too, of course.
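A hedged sketch of the matching delete override (the mirror-record handling follows the description above; it is not the original poster's code):

def delete(self, *args, **kwargs):
    # Remove the mirrored B-A record along with this A-B record.
    Match.objects.filter(foo_a=self.foo_b, foo_b=self.foo_a).delete()
    super(Match, self).delete(*args, **kwargs)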
We are about to introduce a social aspect into our app, where users can like each others events.
Getting this wrong would mean a lot of headache later on, hence I would love input from experienced GAE developers on how they would model it.
It seems there is a similar question here however the OP didn't provide any code to begin with.
Here are two models:
class Event(ndb.Model):
    user = ndb.KeyProperty(kind=User, required=True)
    time_of_day = ndb.DateTimeProperty(required=True)
    notes = ndb.TextProperty()
    timestamp = ndb.FloatProperty(required=True)


class User(UserMixin, ndb.Model):
    firstname = ndb.StringProperty()
    lastname = ndb.StringProperty()
We need to know who has liked an event, in case that the user may want to unlike it again. Hence we need to keep a reference. But how?
One way would be introducing a repeated KeyProperty on the Event class:
class Event(ndb.Model):
    ....
    liked_by = ndb.KeyProperty(kind=User, repeated=True)  # field name assumed
That way, any user who likes this Event would be stored in the list, and the number of users in it gives the number of likes for the event.
Theoretically that should work. However this post from the creator of Python worries me:
Do not use repeated properties if you have more than 100-1000 values.
(1000 is probably already pushing it.) They weren't designed for such
use.
And back to square one. How am I supposed to design this?
A repeated property is limited in the number of values it can hold (< 1000).
One recommended way to break that limit is sharding:
class Event(ndb.Model):
    # use an integer to store the total number of likes
    likes = ndb.IntegerProperty()


class EventLikeShard(ndb.Model):
    # each shard only stores 500 users
    event = ndb.KeyProperty(kind=Event)
    users = ndb.KeyProperty(kind=User, repeated=True)
If the limit you need is more than 1000 but less than 100k, a simpler way:
class Event(ndb.Model):
    likers = ndb.PickleProperty(compressed=True)
Use another model, "Like", where you keep a reference to the user and the event.
This is the old way of representing many-to-many in a relational manner. It keeps all entities separated, and you can easily add/remove/count.
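A minimal sketch of that join model (the name Like and its fields are assumptions):

class Like(ndb.Model):
    event = ndb.KeyProperty(kind=Event, required=True)
    user = ndb.KeyProperty(kind=User, required=True)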
I would recommend the usual many-to-many relationship using an EventUser model, given that the design seems to require an unlimited number of users liking an event. The only tricky part is that you must ensure each event/user combination is unique, which can be done using _pre_put_hook. Keeping a likes counter, as proposed by @lucemia, is indeed a good idea.
You could then capture the liked action with a boolean, or make it a bit more flexible with an actions string array; that way you could also capture actions such as signed-up or attended.
Here is a sample code:
class EventUser(ndb.Model):
    event = ndb.KeyProperty(kind=Event, required=True)
    user = ndb.KeyProperty(kind=User, required=True)
    actions = ndb.StringProperty(repeated=True)

    # make sure the event/user combination is unique
    def _pre_put_hook(self):
        cur_key = self.key
        query = EventUser.query(EventUser.user == self.user,
                                EventUser.event == self.event)
        for entry in query:
            # If cur_key has an id, the user is performing an update.
            if cur_key and cur_key.id():
                if cur_key == entry.key:
                    continue
                raise ValueError("User '%s' is a duplicated entry." % self.user)
            # If adding, any existing match is a duplicate.
            raise ValueError("User Add '%s' is a duplicated entry." % self.user)
Suppose I have three Django models:
class Section(models.Model):
    name = models.CharField(max_length=100)


class Size(models.Model):
    section = models.ForeignKey(Section)
    size = models.IntegerField()


class Obj(models.Model):
    name = models.CharField(max_length=100)
    sizes = models.ManyToManyField(Size)
I would like to import a large amount of Obj data where many of the sizes fields will be identical. However, since Obj has a ManyToManyField, I can't just test for existence the way I normally would. I would like to be able to do something like this:
x = Obj(name='foo')
x.sizes.add(sizemodel1)  # these can be looked up with get_or_create
...
x.sizes.add(sizemodelN)  # these can be looked up with get_or_create

# Now test whether x already exists, so I don't add a duplicate
try:
    Obj.objects.get(x)
except Obj.DoesNotExist:
    x.save()
However, I'm not aware of a way to get an object this way; you have to pass in keyword arguments, which don't work for ManyToManyFields.
Is there any good way I can do this? The only idea I've had is to build up a set of Q objects to pass to get():
myq = myq & Q(sizes__id=sizemodelN.id)
But I am not sure this will even work...
Use a through model and then .get() against that.
http://docs.djangoproject.com/en/dev/topics/db/models/#extra-fields-on-many-to-many-relationships
Once you have a through model, you can .get() or .filter() or .exists() to determine the existence of an object that you might otherwise want to create. Note that .get() is really intended for columns where unique is enforced by the DB - you might have better performance with .exists() for your purposes.
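A hedged sketch of that through-model setup (the ObjSize name and fields are assumptions):

class Obj(models.Model):
    name = models.CharField(max_length=100)
    sizes = models.ManyToManyField(Size, through='ObjSize')


class ObjSize(models.Model):
    obj = models.ForeignKey(Obj)
    size = models.ForeignKey(Size)

    class Meta:
        # The DB enforces uniqueness, so .get()/.exists() are reliable here.
        unique_together = (('obj', 'size'),)

# Existence check against the through table:
ObjSize.objects.filter(obj=some_obj, size=some_size).exists()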
If this is too radical or inconvenient a solution, you can also just grab the related manager and iterate through it to determine whether the object exists:
object_sizes = obj.sizes.all()
exists = object_sizes.filter(id__in=some_bunch_of_size_object_ids_you_are_curious_about).exists()
if not exists:
    # (your creation code here)
Your example doesn't quite work because you can't add m2m relationships before x is saved, but it illustrates what you are trying to do well enough. You have a list of Size objects created via get_or_create(), and want to create an Obj only if no object with exactly those size relationships exists?
Unfortunately, this is not possible very easily: chaining Q objects such as Q(sizes=size1) & Q(sizes=size2) doesn't do what you'd want for m2m fields.
You could certainly use Obj.objects.filter(sizes__in=sizes), but that matches any Obj that has just one size from a huge list of sizes.
Check out this post for a similar __in question, answered by Malcolm, so I trust it quite a bit.
I wrote some Python for fun that should take care of this. This is a one-time import, right?
from django.db.models.query import QuerySet


def has_exact_m2m_match(match_list):
    """Return True if some Obj's sizes exactly equal the given sizes."""
    if isinstance(match_list, QuerySet):
        match_list = [x.id for x in match_list]
    match = set(match_list)
    # Note: we are accessing the auto-generated through model for the sizes m2m.
    through = Obj.sizes.through.objects
    candidates = through.filter(size__in=match).values_list('obj', flat=True)
    results = {}
    # Collect *all* sizes of each candidate so an Obj with extra sizes
    # is not reported as an exact match.
    for obj, size in through.filter(obj__in=set(candidates)).values_list('obj', 'size'):
        results.setdefault(obj, []).append(size)
    # True only if some Obj has exactly the size IDs provided, no more, no fewer.
    return any(set(sizes) == match for sizes in results.values())
sizes = [size1, size2, size3]  # ...etc., from get_or_create()
if not has_exact_m2m_match(sizes):
    x = Obj.objects.create(name='foo')  # saved first, so x.sizes.add() works
    x.sizes.add(*sizes)