Should I handle order number using the model's ID? - python

I have a personal e-commerce site.
I'm using the model's ID as the order number, simply because it seemed logical and I expected the ID to increment by exactly 1 every time.
However, I've noticed that the IDs of my orders (my Order model) have jumped twice:
a) From 54 to 86 (a difference of 32).
b) From 99 to 132 (a difference of 33).
I don't know why this happens, and I don't know whether I should use a custom field instead of the model's ID.
I'm using Django 3.0 and hosting my project on Heroku.
models.py:
class Order(models.Model):
    ORDER_STATUS = (
        ('recibido_pagado', 'Recibido y pagado'),
        ('recibido_no_pagado', 'Recibido pero no pagado'),
        ('en_proceso', 'En proceso'),
        ('en_camino', 'En camino'),
        ('entregado', 'Entregado'),
        ('cancelado', 'Cancelado por no pagar'),
    )
    token = models.CharField(max_length=100, blank=True, null=True)
    first_name = models.CharField(max_length=50, blank=True, null=True)
    last_name = models.CharField(max_length=50, blank=True, null=True)
    phone_number = models.CharField(max_length=30, blank=True)
    total = models.DecimalField(max_digits=10, decimal_places=2)
    stickers_price = models.DecimalField(max_digits=10, decimal_places=2)
    discount = models.DecimalField(max_digits=10, decimal_places=2, default=Decimal('0.00'))
    shipping_cost = models.DecimalField(max_digits=10, decimal_places=2)
    email = models.EmailField(max_length=250, blank=True, verbose_name='Correo electrónico')
    last_four = models.CharField(max_length=100, blank=True, null=True)
    created = models.DateTimeField(auto_now_add=True)
    shipping_address = models.CharField(max_length=100, blank=True, null=True)
    shipping_address1 = models.CharField(max_length=100, blank=True, null=True)
    reference = models.CharField(max_length=100, blank=True, null=True)
    shipping_department = models.CharField(max_length=100, blank=True, null=True)
    shipping_province = models.CharField(max_length=100, blank=True, null=True)
    shipping_district = models.CharField(max_length=100, blank=True, null=True)
    reason = models.CharField(max_length=400, blank=True, null=True, default='')
    status = models.CharField(max_length=20, choices=ORDER_STATUS, default='recibido_pagado')
    comments = models.CharField(max_length=400, blank=True, null=True, default='')
    cupon = models.ForeignKey('marketing.Cupons', blank=True, null=True, default=None, on_delete=models.SET_NULL)
    class Meta:
        db_table = 'Order'
        ordering = ['-created']
    def __str__(self):
        return str(self.id)
    def igv(self):
        igv = int(self.total) * 18/100
        return igv
    def shipping_date(self):
        shipping_date = self.created + datetime.timedelta(days=10)
        return shipping_date
    def deposit_payment_date(self):
        deposit_payment_date = self.created + datetime.timedelta(days=2)
        return deposit_payment_date
View that creates the order:
#csrf_exempt
def cart_charge_deposit_payment(request):
    amount = request.POST.get('amount')
    email = request.user.email
    shipping_address = request.POST.get('shipping_address')
    shipping_cost = request.POST.get('shipping_cost')
    discount = request.POST.get('discount')
    stickers_price = request.POST.get('stickers_price')
    comments = request.POST.get('comments')
    last_four = 1111
    transaction_amount = amount
    first_name = request.user.first_name
    last_name = request.user.last_name
    phone_number = request.user.profile.phone_number
    current_time = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    shipping_address1 = request.user.profile.shipping_address1
    reference = request.user.profile.reference
    shipping_department = request.user.profile.shipping_department
    shipping_province = request.user.profile.shipping_province
    shipping_district = request.user.profile.shipping_district
    order_details = Order.objects.create(
        token='Random',
        first_name=first_name,
        last_name=last_name,
        phone_number=phone_number,
        email=email, # Using email entered in Culqi module, NOT user.email. Could be diff.
        total=transaction_amount,
        stickers_price=stickers_price,
        discount=discount,
        shipping_cost=shipping_cost,
        last_four=last_four,
        created=current_time,
        shipping_address=shipping_address,
        shipping_address1=shipping_address1,
        reference=reference,
        shipping_department=shipping_department,
        shipping_province=shipping_province,
        shipping_district=shipping_district,
        status='recibido_no_pagado',
        cupon=cupon,
        comments=comments
    )
    ...

If you need consecutive numbering without holes you should not use Django's autogenerated id field as your order number.
To guarantee uniqueness even under concurrency, Django relies on a database sequence, which is an object in the database that produces a new value each time it is consulted. Note that the sequence consumes the value it produces even if that value is never saved anywhere in the database.
What happens then is that whenever you try to create an instance and this operation fails at the database level, a number from the sequence is consumed anyway. So let's say you create your first Order successfully, it will have the ID number 1. Then let's say that you try to create a second Order, but the INSERT in the database fails (for example for some integrity check, or whatever). Afterwards you successfully create a third Order, you would expect that this order has the ID number 2, but it will actually have ID number 3, because the number 2 was consumed from the sequence even if it was not saved.
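To make the gap mechanism concrete, here is a minimal sketch in plain Python. The Sequence class and the insert_order function are hypothetical stand-ins for a database sequence and an INSERT, not Django or PostgreSQL APIs; the only point is that the number is fetched before the insert can fail.

```python
import itertools


class Sequence:
    """Toy stand-in for a database sequence: every call hands out a new value."""

    def __init__(self):
        self._counter = itertools.count(1)

    def nextval(self):
        # The value is consumed as soon as it is fetched; there is no "undo".
        return next(self._counter)


def insert_order(seq, table, fail=False):
    """Mimic an INSERT: fetch the id first, then possibly fail before storing."""
    new_id = seq.nextval()
    if fail:
        raise ValueError("simulated integrity error")
    table.append(new_id)
    return new_id


seq, orders = Sequence(), []
insert_order(seq, orders)                 # stored with id 1
try:
    insert_order(seq, orders, fail=True)  # id 2 is consumed, nothing is stored
except ValueError:
    pass
insert_order(seq, orders)                 # stored with id 3
print(orders)  # [1, 3]
```

The failed insert leaves a hole at 2, which is the same effect as the jumps described in the question, just on a smaller scale.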
So no, you cannot use the id if you need to ensure there are no holes in your order numbers.
Now, to get consecutive numbering you could simply add a column:
order_number = models.PositiveIntegerField(unique=True, null=True)
The question is how to set its value properly. In an ideal world with no concurrency (no two processes running queries against the same database), you could simply get the maximum order number so far, add 1, and save this value into order_number. The problem is that if you do this naively under concurrency, two processes can compute the same number (in practice you get integrity errors rather than duplicates, because unique=True prevents duplicates).
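The race can be illustrated without a database. The following is only a sketch: order_numbers is a plain list standing in for the table, and the interleaving of the two requests is written out by hand so the outcome is deterministic.

```python
def next_number(rows):
    """Naive approach: read the current maximum and add one."""
    return max(rows, default=0) + 1


order_numbers = [1, 2, 3]

# Two requests interleave: both read the maximum before either one writes.
candidate_a = next_number(order_numbers)
candidate_b = next_number(order_numbers)

order_numbers.append(candidate_a)
# Request B now tries to save the same number; unique=True would reject it.
collision = candidate_b in order_numbers
print(candidate_a, candidate_b, collision)  # 4 4 True
```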
One way to solve this would be to lock your table (see this SO question) while you compute and update your order number.
I assume you don't care whether the order number faithfully reflects the order in which orders were created, only that it is sequential and without holes. In that case you can run a query like the following inside a transaction (assuming your Order model lives inside an orders Django app):
UPDATE orders_order SET order_number = (SELECT COALESCE(MAX(order_number), 0) FROM orders_order) + 1 WHERE id = [yourid] AND order_number IS NULL
Even with this query you could have concurrency issues, since Django uses PostgreSQL's default isolation level (READ COMMITTED). To make this query safe you need to change the isolation level to SERIALIZABLE. Refer to this SO question for a way of having two separate connections with two different isolation levels.
Assuming you have solved the isolation level issue, this is how you could run the query:
from django.db import connections, transaction
with transaction.atomic(using='your_isolated_db_alias'):
    with connections['your_isolated_db_alias'].cursor() as cursor:
        # Query parameters must be passed as a sequence, hence [order.id].
        cursor.execute('UPDATE orders_order SET order_number = (SELECT COALESCE(MAX(order_number), 0) FROM orders_order) + 1 WHERE id = %s AND order_number IS NULL', [order.id])
The snippet above assumes you have the order for which you want to set the order number in a variable called order. If your isolation is right then you should be safe.
Now there is a third alternative which is to use select_for_update() as a table locking mechanism (although it is not intended for that but for row level locking). So the idea is simple, in the same way as before you first create your order and then update it to set the order number. So in order to guarantee that you won't end up with duplicate (aka IntegrityError) order numbers what you do is issue a query that selects all the Orders in your DB and then use select_for_update() in the following way:
from django.db import transaction
from django.db.models import Max
with transaction.atomic():
    # This locks every row in orders_order until the end of the transaction.
    # QuerySets are lazy, so list() is needed to actually run the locking query.
    list(Order.objects.select_for_update().values_list('id', flat=True))
    max_on = Order.objects.aggregate(max_on=Max('order_number'))['max_on']
    # max_on is None while no order has a number yet.
    Order.objects.filter(id=order.id).update(order_number=(max_on or 0) + 1)
As long as you are sure that you have at least 1 order before entering the code block above AND that you always do the full select_for_update() first, then you should also be safe.
And these are the ways I can think of how to solve the consecutive numbering. I'd love to see an out of the box solution for this, but unfortunately I do not know any.

This will not answer your question directly, but still might be useful for you or somebody with a similar problem.
From the data integrity point of view, deleting potentially useful data such as customer order in production can be a really bad idea. Even if you don't need this data at the moment, you may come to a point in future when you want to analyze all of your orders, even failed / cancelled ones.
What I would suggest here is to ensure that deleting less important related models doesn't cascade into deleting orders. You can easily achieve this by passing models.PROTECT as the on_delete argument of your ForeignKey field. This raises ProtectedError when someone tries to delete the related object. Other useful options are SET_NULL and SET_DEFAULT, whose names speak for themselves.
By following this approach, you will never need to worry about the broken id counter.

Let's set Django and Python aside for a moment.
This is a database topic. Say you start a transaction that inserts a new row into a particular table. That means a new ID. If you commit that work, the new ID becomes visible. If a rollback happens, the ID is lost. From the database's perspective there is no way to reuse that number.
Be aware that select max(id) + 1 is bad practice: what if two transactions do that at the same time?
Another option is locking. I can see three solutions here:
1. Lock all rows in the table. That means your insert time depends on the table size :)
   As a side note: if you lock rows one by one, be sure to sort all rows in the table to avoid deadlocks. Say you use Postgres: editing a row can move it to the end, so the order depends on what is going on with the data. Two transactions can then lock rows in different orders, and a deadlock is only a matter of time. During tests, under low load, everything goes just fine...
2. Lock the whole table. Better, since it does not depend on the number of rows, but you block edits as well.
3. Use a separate table for generators. Each generator has a row in that table; you lock that row and take the next value, and the row is released at the end of the transaction.
All of these points imply that you need short transactions. In web apps that is a general rule anyway. Just be sure that creating an order is light, and that the heavy work is performed in a separate transaction. Why? Because the lock is only released at the end of the transaction.
Hope that explains the case.
In Django, let's create a model:
class Custom_seq(models.Model):
    name = models.CharField(max_length=100, blank=False, null=False)
    last_number = models.IntegerField(default=0)
Query for next id:
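from django.db import transaction
with transaction.atomic():  # select_for_update() must run inside a transaction
    # The keyword is nowait (not no_wait); False means wait for the lock.
    seq = Custom_seq.objects.select_for_update(nowait=False).filter(name='order sequence').first()
    new_order_id = seq.last_number + 1
    seq.last_number = new_order_id
    seq.save()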
Why does it work? Note that you create only one order at a time. The transaction can be committed, in which case the number is used, or rolled back, in which case the increment is undone as well. Both cases are handled.
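The generator-table idea can be tried out without Django. The sketch below uses the standard-library sqlite3 module instead of PostgreSQL, so it is an illustration under assumptions: the table names are made up, and SQLite's BEGIN IMMEDIATE locks the whole database for writing rather than a single row. The important property is the same, though: the counter update and the order insert live in one transaction, so a rollback undoes both.

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode, so the transaction
# boundaries below are fully explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE custom_seq (name TEXT PRIMARY KEY, last_number INTEGER NOT NULL)")
conn.execute("CREATE TABLE orders (order_number INTEGER UNIQUE NOT NULL)")
conn.execute("INSERT INTO custom_seq VALUES ('order sequence', 0)")


def create_order(conn, fail=False):
    """Take the next number and insert the order in a single transaction."""
    conn.execute("BEGIN IMMEDIATE")  # acquire the write lock up front
    try:
        conn.execute(
            "UPDATE custom_seq SET last_number = last_number + 1 WHERE name = ?",
            ("order sequence",),
        )
        (number,) = conn.execute(
            "SELECT last_number FROM custom_seq WHERE name = ?",
            ("order sequence",),
        ).fetchone()
        if fail:
            raise RuntimeError("simulated failure while creating the order")
        conn.execute("INSERT INTO orders VALUES (?)", (number,))
        conn.execute("COMMIT")
        return number
    except Exception:
        conn.execute("ROLLBACK")  # the counter increment is undone as well
        raise


create_order(conn)
try:
    create_order(conn, fail=True)
except RuntimeError:
    pass
create_order(conn)
numbers = [row[0] for row in conn.execute("SELECT order_number FROM orders ORDER BY order_number")]
print(numbers)  # [1, 2]
```

Unlike a database sequence, the failed attempt does not burn a number, which is why the result has no hole.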

It is database internal behavior: https://www.postgresql.org/docs/current/functions-sequence.html
Important
To avoid blocking concurrent transactions that obtain numbers from the
same sequence, a nextval operation is never rolled back; that is, once
a value has been fetched it is considered used and will not be
returned again. This is true even if the surrounding transaction later
aborts, or if the calling query ends up not using the value. For
example an INSERT with an ON CONFLICT clause will compute the
to-be-inserted tuple, including doing any required nextval calls,
before detecting any conflict that would cause it to follow the ON
CONFLICT rule instead. Such cases will leave unused “holes” in the
sequence of assigned values. Thus, PostgreSQL sequence objects cannot
be used to obtain “gapless” sequences.

Related

How to fetch related entries in Django through reverse foreign key

Django newbie here!
I am coming from a .NET background, and I am frustrated that I can't work out how to do the following simple thing:
My simplified models are as follows:
class Circle(BaseClass):
    name = models.CharField("Name", max_length=2048, blank=False, null=False)
    active = models.BooleanField(default=False)
    ...
class CircleParticipant(BaseClass):
    circle = models.ForeignKey(Circle, on_delete=models.CASCADE, null=True, blank=True)
    user = models.ForeignKey(User, on_delete=models.SET_NULL, null=True, blank=True)
    status = models.CharField("Status", max_length=256, blank=False, null=False)
    ...
class User(AbstractBaseUser, PermissionsMixin):
    email = models.EmailField(verbose_name="Email", unique=True, max_length=255, validators=[email_validator])
    first_name = models.CharField(verbose_name="First name", max_length=30, default="first")
    last_name = models.CharField(verbose_name="Last name", max_length=30, default="last")
    ...
My goal is to get a single circle with participants that include the users as well. With the extra requirement to do all that in a single DB trip.
In SQL terms I want to accomplish this:
SELECT circle.name, circle.active, circle_participant.status, user.email, user.first_name, user.last_name
FROM circle
JOIN circle_participant ON circle.id = circle_participant.circle_id
JOIN user ON user.id = circle_participant.user_id
WHERE circle.id = 43
I've tried the following:
Circle.objects.filter(id=43) \
.prefetch_related(Prefetch('circleparticipant_set', queryset=CircleParticipant.objects.prefetch_related('user')))
This is supposed to be working but when I check the query property on that statement it returns
SELECT "circle"."id", "circle"."created", "circle"."updated", "circle"."name", "circle"."active", FROM "circle" WHERE "circle"."id" = 43
(additional fields omitted for brevity.)
Am I missing something or is the query property incorrect?
More importantly how can I achieve fetching all that data with a single DB trip.
For reference here's how to do it in .NET Entity Framework
dbContext.Circle
.Filter(x => x.id == 43)
.Include(x => x.CircleParticipants) // This will exist in the entity/model
.ThenInclude(x => x.User)
.prefetch_related will use a second query to reduce the bandwidth, otherwise it will repeat data for the same Circle and CircleParticipants multiple times. Your CircleParticipant however acts as a junction table, so you can use:
Circle.objects.filter(id=43).prefetch_related(
    Prefetch('circleparticipant_set', queryset=CircleParticipant.objects.select_related('user'))
)
Am I missing something or is the query property incorrect?
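There are two ways that Django gives you to solve the SELECT N+1 problem. The first is prefetch_related(), which creates two queries and joins the results in memory. The query attribute only shows the main query; the prefetch query runs separately when the queryset is evaluated, which is why you don't see it there. The second is select_related(), which creates a join, but only works for single-valued relations such as foreign keys and one-to-one fields. (You also haven't set related_name on any of your foreign keys. That is fine: it is optional and only changes the name of the reverse accessor, which is circleparticipant_set by default.)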
More importantly how can I achieve fetching all that data with a single DB trip.
I would suggest that you not worry too much about doing it all in one query. One of the downsides of doing this in one query as you suggest is that lots of the data that comes back will be redundant. For example, the circle.name column will be the same for every row in the table which is returned.
You should absolutely care about how many queries you do - but only to the extent that you avoid a SELECT N+1 problem. If you're doing one query for each model class involved, that's pretty good.
If you care strongly about SQL performance, I also recommend the tool Django Debug Toolbar, which can show you the number of queries, the exact SQL, and the time taken by each.
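To show what "two queries, joined in memory" amounts to, here is a rough sketch using the standard-library sqlite3 module rather than the Django ORM. The table names and sample rows are invented for the example (app_user is used instead of user to stay clear of reserved words), and the SQL Django actually generates will differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE circle (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE circle_participant (
        id INTEGER PRIMARY KEY, circle_id INTEGER, user_id INTEGER, status TEXT
    );
    INSERT INTO circle VALUES (43, 'Readers');
    INSERT INTO app_user VALUES (1, 'a@example.com'), (2, 'b@example.com');
    INSERT INTO circle_participant VALUES (10, 43, 1, 'active'), (11, 43, 2, 'invited');
""")

# Query 1: the circles themselves.
circles = {
    row[0]: {"name": row[1], "participants": []}
    for row in conn.execute("SELECT id, name FROM circle WHERE id = ?", (43,))
}

# Query 2: participants joined to their users, restricted to the circles above.
placeholders = ",".join("?" for _ in circles)
rows = conn.execute(
    "SELECT cp.circle_id, cp.status, u.email "
    "FROM circle_participant cp JOIN app_user u ON u.id = cp.user_id "
    f"WHERE cp.circle_id IN ({placeholders}) ORDER BY cp.id",
    list(circles),
)

# Stitch the two result sets together in memory.
for circle_id, status, email in rows:
    circles[circle_id]["participants"].append((email, status))

print(circles[43])
```

The circle's name is transferred once, not once per participant, which is the bandwidth saving mentioned above.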
in SQL terms I want to accomplish this:
There are a few ways you could accomplish that.
Use many-to-many
Django has a field which can be used to create a many-to-many relationship. It's called ManyToManyField. It will implicitly create a many-to-many table to represent the relationship, and some helper methods to allow you to easily query for all circles a user is in, or all users that a circle has.
You're also attaching some metadata to each user/circle relationship. That means you'll need to define an explicit table using ManyToManyField.through.
There are examples in the docs here.
Use a related model query
If I specifically wanted a join, and not a subquery, I would query the users like this:
User.objects.filter(circleparticipant__circle_id=43)
Use a subquery
This also creates only one query, but it uses a subquery instead.
User.objects.filter(circleparticipant__in=CircleParticipant.objects.filter(circle_id=43))

how to setup an implicit condition for unique=True in django orm

I am using Django along with DRF, with many tables and complex relationships.
Using soft delete (that is, only setting deleted_at and not actually deleting anything) on my models from the beginning didn't hurt much, but now we are making some new changes and the number of deleted instances is growing.
As a result, unique DB constraints have started raising errors. I know of, and have implemented, two possible solutions. One is unique_together in the Meta class, combining deleted_at with every other unique value.
The other is:
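class Meta:
    # deleted_at is assumed to be a nullable timestamp; the constraint name is illustrative.
    constraints = [UniqueConstraint(fields=["name"], condition=Q(deleted_at__isnull=True), name="unique_undeleted_name")]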
Yet what I want is quite different: I want to avoid repeating all this and create a uniqueness flag like this:
something_id = models.CharField(null=False, blank=False,
max_length=256, unique_undeleted=True)
Notice the unique_undeleted parameter.
I know that doing this directly requires changes in the library, which is obviously bad,
but I have a parent model that I use in all the others.
I thought of writing a customized constructor that checks every unique value and adds the conditional unique constraint, or I could also use this:
class MySpecialModel(parentModel):
    somethingboolean = models.BooleanField(default=True, null=False)
    something_id = models.CharField(null=False, blank=False,
                                    max_length=256)
    something_unique_too = models.CharField(null=False, blank=False,
                                            max_length=256)
    unique_undeleted_fields = ["something_id", "something_unique_too"]
and iterate over the fields in the parent model, creating a UniqueConstraint for each one.
Yet that does not feel right!
Any guidance will be appreciated.
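For context, a conditional unique constraint of the kind discussed in this question is typically enforced by the database as a partial unique index. The following sketch shows that behaviour with the standard-library sqlite3 module; the thing table and the index name are invented for the example, and it assumes soft deletion is recorded as a non-NULL deleted_at.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thing (id INTEGER PRIMARY KEY, name TEXT NOT NULL, deleted_at TEXT)")
# Uniqueness is only enforced among rows that are not soft-deleted.
conn.execute("CREATE UNIQUE INDEX unique_undeleted_name ON thing (name) WHERE deleted_at IS NULL")

conn.execute("INSERT INTO thing (name, deleted_at) VALUES ('a', '2020-01-01')")  # soft-deleted
conn.execute("INSERT INTO thing (name) VALUES ('a')")  # allowed: the only live 'a'
try:
    conn.execute("INSERT INTO thing (name) VALUES ('a')")  # a second live 'a'
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```

Whatever mechanism ends up generating the constraints on the Django side, this is the behaviour each generated constraint should produce: soft-deleted rows may repeat a value, live rows may not.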

Can I create a Django object using a subquery for a field value?

TLDR
When creating a new object using Django ORM, can I, in a transactionally safe / race-condition-free manner, set a field's value based on an already existing object's value, say F('sequence_number') + 1 where F('sequence_number') refers not to the current object (which does not exist yet) but to the most recent object with that prefix in the table?
Longer version
I have a model Issue with properties sequence_number and sequence_prefix. There is a unique constraint on (sequence_prefix, sequence_number) (e.g. DATA-1).
class Issue(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    sequence_prefix = models.CharField(blank=True, default="", max_length=32)
    sequence_number = models.IntegerField(null=False)
    class Meta:
        constraints = [
            models.UniqueConstraint(
                fields=["sequence_prefix", "sequence_number"], name="unique_sequence"
            )
        ]
The idea is that issues —for auditing purposes— have unique sequence numbers for each variable (user-determined) prefix: when creating an issue the user selects a prefix, e.g. REVIEW or DATA, and the sequence number is the incremented value of the previous issue with that same sequence. So it's like an AutoField but dependent on the value of another field for its value. There can not be two issues DATA-1, but REVIEW-1 and DATA-1 and OTHER-1 all may exist at the same time.
How can I tell Django when creating an Issue, that it must find the most recent object for that given sequence_prefix, take the sequence_number + 1 and use that for the new object's sequence_number value, in a way that is safe of any race-condition?
A good way to achieve this is to override the save() method of the Issue model.
For example:
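from django.db import models
from django.db.models import Max
class Issue(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    sequence_prefix = models.CharField(blank=True, default="", max_length=32)
    sequence_number = models.IntegerField(null=False)
    def save(self, *args, **kwargs):
        if self.pk is None:  # only assign a number when the issue is first created
            # Highest sequence_number used so far for this prefix (None if there is none).
            # Two concurrent saves can still compute the same number; the unique
            # constraint below will then reject one of them.
            max_number = Issue.objects.filter(sequence_prefix=self.sequence_prefix).aggregate(max_number=Max("sequence_number"))["max_number"]
            self.sequence_number = (max_number or 0) + 1
        super().save(*args, **kwargs)
    class Meta:
        constraints = [
            models.UniqueConstraint(
                fields=["sequence_prefix", "sequence_number"], name="unique_sequence"
            )
        ]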
In this way, before saving the object, you can take the max sequence_number of the sequence_prefix that you are saving.
Unless you want to use database sequences (AutoField), I believe you will need to implement something on your own. There are two options:
1. Prevent concurrent inserts per specific sequence_prefix with some locking mechanism (I would use Redis for a distributed lock, to support a multi-process setup).
2. Implement your own sequencing (again, Redis is a perfect choice), which will provide you with an auto-incrementing sequence_number per prefix. For example:
sequence_number = redis_client.incr('sequence:REVIEW')
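As for the literal question of using a subquery for the field value, the idea can be sketched at the SQL level with the standard-library sqlite3 module. This is not Django ORM code and not something the answers above propose; create_issue is a hypothetical helper. The number is computed inside the INSERT statement itself, and the unique constraint plus a retry loop covers the case where another writer takes the same number first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE issue ("
    " id INTEGER PRIMARY KEY,"
    " sequence_prefix TEXT NOT NULL,"
    " sequence_number INTEGER NOT NULL,"
    " UNIQUE (sequence_prefix, sequence_number))"
)


def create_issue(conn, prefix, retries=3):
    """Insert an issue whose number is computed by a subquery in the same statement."""
    for _ in range(retries):
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO issue (sequence_prefix, sequence_number) "
                    "SELECT ?, COALESCE(MAX(sequence_number), 0) + 1 "
                    "FROM issue WHERE sequence_prefix = ?",
                    (prefix, prefix),
                )
            row = conn.execute(
                "SELECT sequence_number FROM issue WHERE id = last_insert_rowid()"
            ).fetchone()
            return f"{prefix}-{row[0]}"
        except sqlite3.IntegrityError:
            continue  # another writer took the number first; recompute and retry
    raise RuntimeError("could not allocate a sequence number")


labels = [create_issue(conn, p) for p in ("DATA", "REVIEW", "DATA")]
print(labels)  # ['DATA-1', 'REVIEW-1', 'DATA-2']
```

On a database with true concurrent writers the retry path matters: the unique constraint is what guarantees correctness, and the subquery only makes collisions rare.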

How do I remove duplicate records in an existing database? (no unique constraint defined)

I'm working on a project where improvements to database schema, query speed, and template inheritance are required. As an example, there is a Menu model that lacks a unique constraint (See below).
To improve the model's data integrity, I'm planning to add a migration by adding a unique=True constraint to the season field.
Before applying the migration, I checked the database for all Menu instances to see if an integrity error could potentially occur. As a result of checking, there are 3 model instances with the same value assigned to season.
I want to remove all but 1 of the Menu instances from the existing database in this case, and it doesn't matter which one is kept. What would be some approaches to accomplishing this?
class Menu(models.Model):
    season = models.CharField(max_length=20)
    items = models.ManyToManyField('Item', related_name='items')
    created_date = models.DateTimeField(
        default=timezone.now)
    expiration_date = models.DateTimeField(
        blank=True, null=True)
    def __str__(self):
        return self.season
You can use the code below if you are using PostgreSQL:
all_unique_season = Menu.objects.distinct('season').values_list('id', flat=True)
Menu.objects.exclude(id__in=all_unique_season).delete()
If you are using another database, you can use the code below:
used_ids = list()
for i in Menu.objects.values('id', 'season'):
    used_ids.append(i['id'])
    Menu.objects.filter(season=i['season']).exclude(id__in=used_ids).delete()
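The same "keep one row per season" rule can also be expressed as a single SQL statement. The sketch below uses the standard-library sqlite3 module with a made-up menu table, so treat it as an illustration rather than a drop-in migration; some databases (MySQL, for instance) do not accept a subquery on the table being deleted from in this form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE menu (id INTEGER PRIMARY KEY, season TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO menu (season) VALUES (?)",
    [("summer",), ("winter",), ("summer",), ("summer",)],
)

# Keep the lowest id per season and delete every other row.
conn.execute("DELETE FROM menu WHERE id NOT IN (SELECT MIN(id) FROM menu GROUP BY season)")

remaining = conn.execute("SELECT id, season FROM menu ORDER BY id").fetchall()
print(remaining)  # [(1, 'summer'), (2, 'winter')]
```

Once the duplicates are gone, the migration that adds unique=True to season should apply without an integrity error.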

How to get Django admin.TabularInline to NOT require some items

class LineItemInline(admin.TabularInline):
    model = LineItem
    extra = 10
class InvoiceAdmin(admin.ModelAdmin):
    model = Invoice
    inlines = (LineItemInline,)
and
class LineItem(models.Model):
    invoice = models.ForeignKey(Invoice)
    item_product_code = models.CharField(max_length=32)
    item_description = models.CharField(max_length=64)
    item_commodity_code = models.ForeignKey(CommodityCode)
    item_unit_cost = models.IntegerField()
    item_unit_of_measure = models.ForeignKey(UnitOfMeasure, default=0)
    item_quantity = models.IntegerField()
    item_total_cost = models.IntegerField()
    item_vat_amount = models.IntegerField(default=0)
    item_vat_rate = models.IntegerField(default=0)
When I have it setup like this, the admin interface is requiring me to add data to all ten LineItems. The LineItems have required fields, but I expected it to not require whole line items if there was no data entered.
That's strange, it's supposed not to do that - it shouldn't require any data in a row if you haven't entered anything.
I wonder if the default options are causing it to get confused. Again, Django should cope with this, but try removing those and see what happens.
Also note that this:
item_unit_of_measure = models.ForeignKey(UnitOfMeasure, default=0)
is not valid, since 0 can not be the ID of a UnitOfMeasure object. If you want FKs to not be required, use null=True, blank=True in the field declaration.
Turns out the problem is default values. The one pointed out above about UnitOfMeasure isn't the actual problem though, any field with a default= causes it to require the rest of the data to be present. This to me seems like a bug since a default value should be subtracted out when determining if there is anything in the record that needs saving, but when I remove all the default values, it works.
In this code,
item_unit_of_measure = models.ForeignKey(UnitOfMeasure, default=0)
it was a sneaky way of letting the 0th entry in the database be the default value. That doesn't work unfortunately as he pointed out though.
