Django get_or_create() duplicates in db - python

Using Django 1.11.6 and MySQL.
I'm importing only unique data rows from a CSV file (~530 rows).
After the first import, all 530 records are written to the DB.
If I import the same file a second time, the last ~30 records are written to the DB again.
Get data:
obj.account = int(ready_item[0].replace("\"","").replace("*",""))
pai_obj.reporting_mask = str(ready_item[0].replace("\"","").replace("*",""))
pai_obj.group = ready_item[1].replace("\"","")
pai_obj.location = ready_item[2].replace("\"","")
pai_obj.terminal = ready_item[4].replace("\"","")
pai_obj.settlement_type = ready_item[5].replace("\"","")
pai_obj.settlement_date = datetime_or_none(report_data)
pai_obj.amount = float_or_none(ready_item[6].replace("\"","").replace("$","").replace(",",""))
data.append(pai_obj)
Import via get_or_create():
for record in data:
    Accountmode.objects.get_or_create(
        account=record.account,
        reporting_mask=record.reporting_mask,
        group=record.group,
        location=record.location,
        terminal=record.terminal,
        settlement_type=record.settlement_type,
        amount=record.amount,
        defaults={'settlement_date': record.settlement_date})
The Model:
class Accountmode(models.Model):
    account = models.IntegerField(blank=True, default=0)
    reporting_mask = models.IntegerField(blank=False, default=0)
    group = models.CharField(max_length=1024, blank=True, null=True)
    location = models.CharField(max_length=1024, blank=True, null=True)
    settlement_date = models.DateField(null=True)
    terminal = models.CharField(max_length=1024, blank=False, null=True)
    settlement_type = models.CharField(max_length=1024, blank=False, null=True)
    amount = models.DecimalField(max_digits=25, decimal_places=2)
    created_date = models.DateTimeField(default=datetime.now, blank=True)
As far as I know, get_or_create() should first check whether the data already exists and create a new record only if it does not. Why does get_or_create() let some records through as duplicates?

The cause was float values with three or more digits after the decimal separator (e.g. 12,012).
Those values were duplicated each time the user imported the same file.
The following solution was found:
1. Save the amount and the other values as str while parsing the file rows.
obj.account = int(ready_item[0].replace("\"","").replace("*",""))
pai_obj.reporting_mask = str(ready_item[0].replace("\"","").replace("*",""))
pai_obj.group = ready_item[1].replace("\"","")
pai_obj.location = ready_item[2].replace("\"","")
pai_obj.terminal = ready_item[4].replace("\"","")
pai_obj.settlement_type = ready_item[5].replace("\"","")
pai_obj.settlement_date = datetime_or_none(report_data)
pai_obj.amount = str(ready_item[6].replace("\"","").replace("$","").replace(",",""))
data.append(pai_obj)
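One possible reason the float lookup missed, offered as an assumption rather than a confirmed diagnosis: amount is a DecimalField with decimal_places=2, so a parsed value such as 12.012 may not compare equal to what was stored. The sketch below normalises the raw CSV cell to a two-place Decimal before it would be passed to get_or_create(). parse_amount is a hypothetical helper, not part of the original import code.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP

TWO_PLACES = Decimal("0.01")  # matches decimal_places=2 on the model field


def parse_amount(raw):
    """Turn a raw CSV cell such as '"$1,234.567"' into a 2-place Decimal.

    Returns None when the cell is empty or not a number (hypothetical helper).
    """
    cleaned = raw.replace('"', "").replace("$", "").replace(",", "").strip()
    if not cleaned:
        return None
    try:
        # Round once, up front, so the lookup value has the same scale as the column.
        return Decimal(cleaned).quantize(TWO_PLACES, rounding=ROUND_HALF_UP)
    except InvalidOperation:
        return None
```

With this, the value used for the lookup and the value written on creation are the same two-place Decimal, so a second import of the same row should compare equal.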

Related

Hey everyone, I have a problem sorting a queryset in Django

So, I am learning Django and trying to make a site similar to AirBNB.
I have a model called Listing that stores latitude and longitude in CharFields. My model is as follows:
class Listing(models.Model):
    class BathRoomType(models.TextChoices):
        ATTACHED = 'Attached Bathroom'
        COMMON = 'Shared Bathroom'

    class RoomVentType(models.TextChoices):
        AC = 'Air Conditioner'
        NO_AC = 'No Air Conditioner'

    class LisitngType(models.TextChoices):
        ROOM = 'Room'
        APARTEMENT = 'Apartement'
        HOUSE = 'Full House'

    user = models.ForeignKey(User, on_delete=models.CASCADE)
    title = models.CharField(max_length=255)
    city = models.ForeignKey(RoomLocation, on_delete=models.CASCADE)
    exact_address = models.CharField(max_length=255)
    lat = models.CharField(max_length=300, blank=False, null=False, default="0")
    lng = models.CharField(max_length=300, blank=False, null=False, default="0")
    description = models.TextField()
    price = models.IntegerField()
    listing_type = models.CharField(max_length=20, choices=LisitngType.choices, default=LisitngType.ROOM)
    kitchen_available = models.BooleanField(default=False)
    kitchen_description = models.TextField(null=True, blank=True)
    bedrooms = models.IntegerField()
    max_acomodation = models.IntegerField()
    bathroom_type = models.CharField(max_length=20, choices=BathRoomType.choices, default=BathRoomType.ATTACHED)
    no_bathrooms = models.IntegerField()
    room_type = models.CharField(max_length=30, choices=RoomVentType.choices, default=RoomVentType.AC)
    main_photo = models.ImageField(upload_to='room_images', default='default_room.jpg')
    photo_1 = models.ImageField(upload_to='room_images', default='default_room.jpg')
    photo_2 = models.ImageField(upload_to='room_images', default='default_room.jpg')
    photo_3 = models.ImageField(upload_to='room_images', default='default_room.jpg')
    is_published = models.BooleanField(default=False)
    date_created = models.DateTimeField(default=timezone.now, editable=False)
    slug = AutoSlugField(populate_from=['title', 'listing_type', 'bathroom_type', 'room_type'])
    rating = models.IntegerField(default=5)
    approved = models.BooleanField(default=False)
    total_bookings = models.IntegerField(default=0)

    def __str__(self):
        return self.title
On my homepage, I want to show the listings that are near me.
For that I have a function named near_places. It takes the latitude and longitude of a listing queried from the Listing model and returns the distance between that listing and the user currently accessing the homepage:
import geocoder
from haversine import haversine

def near_places(dest_lat, dest_lng):
    g = geocoder.ip('me')
    origin = tuple(g.latlng)
    destination = (dest_lat, dest_lng)
    distance = haversine(origin, destination)
    return distance
My homepage function in views.py is as follows:
def home(request):
    objects = Listing.objects.filter(is_published=True, approved=True)
    for object in objects:
        lat, lng = float(object.lat), float(object.lng)
        object.distance = near_places(lat, lng)
    return render(request, 'listings/home.html')
As you can see, I loop through the queryset, calculate the distance for each item, and attach it to the object as distance. Now I would like to get only the 10 items with the lowest distance. How can I do that?
I have tried object = objects.order_by('-distance')[:10], but it gives me this error:
FieldError at /
Cannot resolve keyword 'distance' into field. Choices are: The_room_booked, approved, bathroom_type, bedrooms, city, city_id, date_created, description, exact_address, id, is_published, kitchen_available, kitchen_description, lat, listing_type, lng, main_photo, max_acomodation, no_bathrooms, photo_1, photo_2, photo_3, price, rating, reviewsandrating, room_type, slug, title, total_bookings, user, user_id
Any way that I can solve it?
Also it takes quite a time to calculate current distance using near_places() function as above. Any suggestions will be helpful.
Thank You
You can't do that, because your model doesn't have a distance field, so there is no such DB column either.
What you can do is either:
1. Add such a field to your model. I don't recommend this with your current logic, because you would iterate over every instance and send an SQL request to update every row.
2. Get your queryset, convert it to a list of dicts, and then iterate over that list with your function, adding a distance key to each dict. Then you can sort the list with Python's sorted function and pass it to the template.
Like:
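objects = list(Listing.objects.filter(is_published=True, approved=True).values('lat', 'lng'))  # add fields that you need
for obj in objects:
    # lat and lng are stored as CharFields, so convert them before computing the distance
    obj['distance'] = near_places(float(obj['lat']), float(obj['lng']))
my_sorted_list = sorted(objects, key=lambda k: k['distance'])[:10]  # the 10 nearest listings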
Pass my_sorted_list to your template. You can pass reverse=True to sorted if you want to sort in the opposite direction.
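On the speed concern raised in the question: near_places() calls geocoder.ip('me') for every listing, which repeats the same network lookup each time. Below is a minimal sketch of the alternative, assuming the origin is looked up once and passed in. haversine_km and nearest are hypothetical names, and a pure-Python haversine formula stands in for the haversine package so the example is self-contained.

```python
import heapq
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(origin, destination):
    """Great-circle distance in kilometres between two (lat, lng) pairs in degrees."""
    lat1, lng1 = map(radians, origin)
    lat2, lng2 = map(radians, destination)
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest(listings, origin, limit=10):
    """Return the `limit` listings closest to `origin`, each with a 'distance' key.

    `listings` is a list of dicts with 'lat' and 'lng' stored as strings,
    like the rows produced by .values('lat', 'lng').
    """
    for row in listings:
        row['distance'] = haversine_km(origin, (float(row['lat']), float(row['lng'])))
    return heapq.nsmallest(limit, listings, key=lambda row: row['distance'])


# The origin is computed once by the caller, e.g. tuple(geocoder.ip('me').latlng).
sample = [
    {'title': 'far', 'lat': '48.8566', 'lng': '2.3522'},
    {'title': 'near', 'lat': '51.5100', 'lng': '-0.1300'},
]
closest = nearest(sample, origin=(51.5074, -0.1278), limit=1)
```

heapq.nsmallest avoids sorting the whole list when only the first few entries are needed; for a few hundred listings a plain sorted(...)[:10] would do just as well.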

Add quantity for every the same Client and Product in Django

I have three models: CustomerPurchaseOrderDetail, Customer and Product. If Customer1 buys a product, for example Meat, it is saved in CustomerPurchaseOrderDetail. If Customer1 then adds another Meat product, it should simply increase the quantity instead of adding another record to the database.
this is my views.py
def batchaddtocart(request):
    userID = request.POST.get("userID")
    client = Customer(id=userID)
    vegetables_id = request.POST.get("id")
    v = Product(id=vegetables_id)
    price = request.POST.get("price")
    discount = request.POST.get("discount_price")
    insert, create = CustomerPurchaseOrderDetail.objects.get_or_create(
        profile=client,
        products=v,
        unitprice=price,
        quantity=1,
        discounted_amount=discount,
        discounted_unitprice=discount,
    )
    order_qs = CustomerPurchaseOrderDetail.objects.filter(
        profile=client,
        products=v,
        unitprice=price,
        quantity=1,
        discounted_amount=discount,
        discounted_unitprice=discount
    )
    for order in order_qs:
        if order.profile == client and order.products == v:
            insert.quantity += 1
            print(insert.quantity)
            insert.save()
    insert.save()
this is my models.py
class CustomerPurchaseOrderDetail(models.Model):
    profile = models.ForeignKey(Customer,
                                on_delete=models.SET_NULL, null=True, blank=True,
                                verbose_name="Client Account")
    products = models.ForeignKey(Product,
                                 on_delete=models.SET_NULL, null=True, blank=True,
                                 verbose_name="Product")
    quantity = models.IntegerField(max_length=500, null=True, blank=True, default=1)

class Product(models.Model):
    product = models.CharField(max_length=500)

class Customer(models.Model):
    user = models.OneToOneField(User, related_name="profile", on_delete=models.CASCADE)
    firstname = models.CharField(max_length=500, blank=True)
    lastname = models.CharField(max_length=500, blank=True)
    contactNumber = models.CharField(max_length=500, blank=True)
    email = models.CharField(max_length=500, blank=True)
I did not get an error, but the functionality I wanted does not work: the quantity does not increase even when the same product is added to the list purchased by Customer1.
The problem is in the following lines: client = Customer(id=userID) and v = Product(id=vegetables_id). Every time your function is called, you are instantiating new Customer and Product objects instead of using the existing objects in your database. Replace them with client, created = Customer.objects.get_or_create(id=userID), and likewise for the product: v, v_created = Product.objects.get_or_create(id=vegetables_id).
When you use the get_or_create method, it creates a new entry whenever at least one parameter fails to match an existing row. So if you always pass quantity=1, it will create a new entry whenever the stored quantity is 2 or more, for instance.
You should filter first with only the "fixed" parameters and create a new entry if you get nothing. Otherwise, just increment the quantity.
Something like this:
order = None
order_qs = CustomerPurchaseOrderDetail.objects.filter(
    profile=client,
    products=v,
    unitprice=price,
    discounted_amount=discount,
    discounted_unitprice=discount
)
if not order_qs:
    order = CustomerPurchaseOrderDetail.objects.create(
        profile=client,
        products=v,
        unitprice=price,
        quantity=1,
        discounted_amount=discount,
        discounted_unitprice=discount,
    )
else:
    for order in order_qs:
        if order.profile == client and order.products == v:
            # NOTE: check your other parameters here too if needed (unitprice, discounted_amount, ...)
            order.quantity += 1
            print(order.quantity)
            order.save()
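The same idea in isolation, as a plain-Python sketch rather than Django code: the lookup key holds only the fixed fields, and quantity is a default applied on creation. In Django this would roughly correspond to passing quantity through the defaults argument of get_or_create() and incrementing when the row already existed. The cart dict and add_to_cart function are hypothetical.

```python
def add_to_cart(cart, customer_id, product_id, unit_price):
    """Create a line with quantity 1, or increment the existing line.

    `cart` maps (customer_id, product_id, unit_price) -> quantity. Quantity is
    deliberately not part of the key, so it never affects the lookup.
    """
    key = (customer_id, product_id, unit_price)
    if key in cart:
        cart[key] += 1   # existing line: only the quantity changes
        created = False
    else:
        cart[key] = 1    # new line: quantity starts at its default
        created = True
    return cart[key], created


cart = {}
first = add_to_cart(cart, customer_id=1, product_id=7, unit_price="9.50")
second = add_to_cart(cart, customer_id=1, product_id=7, unit_price="9.50")
other = add_to_cart(cart, customer_id=2, product_id=7, unit_price="9.50")
```

The second call for the same customer and product reuses the existing line, while a different customer gets a line of their own.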

Django - improve the query consisting many-to-many and foreignKey fields

I want to export a report from the available data into a CSV file. I wrote the following code and it works fine. What do you suggest to improve the query?
Models:
class shareholder(models.Model):
    title = models.CharField(max_length=100)
    code = models.IntegerField(null=False)

class Company(models.Model):
    isin = models.CharField(max_length=20, null=False)
    cisin = models.CharField(max_length=20)
    name_fa = models.CharField(max_length=100)
    name_en = models.CharField(max_length=100)

class company_shareholder(models.Model):
    company = models.ManyToManyField(Company)
    shareholder = models.ForeignKey(shareholder, on_delete=models.SET_NULL, null=True)
    share = models.IntegerField(null = True) # TODO: *1000000
    percentage = models.DecimalField(max_digits=8, decimal_places=2, null=True)
    difference = models.DecimalField(max_digits=11, decimal_places=2, null=True)
    update_datetime = models.DateTimeField(null=True)
View:
def ExportAllShare(request):
    response = HttpResponse(content_type='text/csv')
    response['Content-Disposition'] = 'attachment; filename="shares.csv"'
    response.write(u'\ufeff'.encode('utf8'))
    writer = csv.writer(response)
    writer.writerow(['date','company','shareholder title','shareholder code','difference','share'])
    results = company_shareholder.objects.all()
    for result in results:
        row = (
            result.update_datetime,
            result.company.first().name_fa,
            result.shareholder.title,
            result.shareholder.code,
            result.difference,
            result.share,
        )
        writer.writerow(row)
    return (response)
First of all, if it's working fine for you, then it's working fine; don't optimize prematurely.
But in a query like this you are running into the N+1 problem: each row triggers extra queries for its related objects. In Django you avoid it using select_related and prefetch_related, like this:
results = company_shareholder.objects.select_related('shareholder').prefetch_related('company').all()
This should reduce the number of queries you are generating. If you need a little more performance, and since you are not using percentage, I would defer that field with defer('percentage').
Also, I would strongly suggest following the PEP 8 style guide and naming your classes in the CapWords convention, like Shareholder and CompanyShareholder.
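To make the N+1 pattern concrete, here is a small sketch using the standard-library sqlite3 module instead of the Django ORM. The two tables are simplified, hypothetical stand-ins for shareholder and company_shareholder, and the queries list only exists to count statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shareholder (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE company_shareholder (
    id INTEGER PRIMARY KEY, shareholder_id INTEGER, share INTEGER);
INSERT INTO shareholder VALUES (1, 'Alpha'), (2, 'Beta');
INSERT INTO company_shareholder VALUES (1, 1, 10), (2, 2, 20), (3, 1, 30);
""")

queries = []  # every statement executed through run() is recorded here


def run(sql, params=()):
    """Execute one statement and record it, so the two approaches can be compared."""
    queries.append(sql)
    return conn.execute(sql, params).fetchall()


def export_n_plus_one():
    # One query for the rows, then one more per row for its shareholder.
    rows = run("SELECT shareholder_id, share FROM company_shareholder ORDER BY id")
    return [
        (run("SELECT title FROM shareholder WHERE id = ?", (sid,))[0][0], share)
        for sid, share in rows
    ]


def export_joined():
    # A single query with a JOIN, which is what select_related asks the ORM for.
    return run(
        "SELECT s.title, cs.share FROM company_shareholder cs "
        "JOIN shareholder s ON s.id = cs.shareholder_id ORDER BY cs.id"
    )


slow_rows = export_n_plus_one()
slow_queries = len(queries)
queries.clear()
fast_rows = export_joined()
fast_queries = len(queries)
```

Both functions return the same rows; the first issues one statement per row plus one, the second issues a single statement regardless of the row count.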

Django DateTimeField received a naive datetime

The problem I have is that I don't seem to be able to filter my data based on timestamp, i.e. both date and hour.
My model looks as follows:
# Create your models here.
class HourlyTick(models.Model):
    id = models.IntegerField(primary_key=True)
    timestamp = models.DateTimeField(blank=True, null=True)
    symbol = models.TextField(blank=True, null=True)
    open = models.IntegerField(blank=True, null=True)
    high = models.IntegerField(blank=True, null=True)
    low = models.IntegerField(blank=True, null=True)
    close = models.IntegerField(blank=True, null=True)
    trades = models.IntegerField(blank=True, null=True)
    volume = models.IntegerField(blank=True, null=True)
    vwap = models.FloatField(blank=True, null=True)

    class Meta:
        managed = False
        db_table = 'xbtusd_hourly'
My view:
class HourlyTickList(ListAPIView):
    serializer_class = HourlyTickSerializer

    def get(self, request):
        start = request.GET.get('start', None)
        end = request.GET.get('end', None)
        tz = pytz.timezone("Europe/Paris")
        start_dt = datetime.datetime.fromtimestamp(int(start) / 1000, tz)
        end_dt = datetime.datetime.fromtimestamp(int(end) / 1000, tz)
        qs = HourlyTick.objects.filter(timestamp__range = (start_dt, end_dt))
        rawData = serializers.serialize('python', qs)
        fields = [d['fields'] for d in rawData]
        fieldsJson = json.dumps(fields, indent=4, sort_keys=True, default=str)
        return HttpResponse(fieldsJson, content_type='application/json')
The message I receive is:
RuntimeWarning: DateTimeField HourlyTick.timestamp received a naive
datetime (2017-01-15 06:00:00) while time zone support is active.
RuntimeWarning)
However, when I use make_aware to fix this error, I get the error:
ValueError: Not naive datetime (tzinfo is already set)
My database contains data that looks like this:
2017-01-06T12:00:00.000Z
For some reason, the first option returns results, but it totally ignores the time.
How do I fix this?
The problem was that Python couldn't interpret the format in which the timestamps were stored in the database. Two solutions were possible:
1. Writing a raw query with string transformation in Django
2. Storing the datetime fields in a different format
I went with option 2, since the data was retrieved by an already automated script, and now it works fine.
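For reference, the two messages quoted in the question come from opposite cases: the warning is about a datetime with no tzinfo, while make_aware() rejects one that already has it. Below is a minimal standard-library sketch of building an aware datetime from a millisecond epoch value, like the start and end parameters in the view. from_epoch_ms and is_aware are hypothetical helper names, and UTC is an assumed choice of zone.

```python
from datetime import datetime, timezone


def is_aware(dt):
    """True when dt carries usable time zone information."""
    return dt.tzinfo is not None and dt.utcoffset() is not None


def from_epoch_ms(ms):
    """Convert milliseconds since the Unix epoch to an aware UTC datetime."""
    return datetime.fromtimestamp(int(ms) / 1000, tz=timezone.utc)


start_dt = from_epoch_ms("1484460000000")  # query-string values arrive as str
naive_dt = datetime(2017, 1, 15, 6, 0)     # no tzinfo: the kind of value the warning refers to
```

A value built this way is already aware, so it can be passed to a filter directly and should not be run through make_aware() again.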

Django filter only on aggregate/annotate

I'm trying to construct a fairly complicated Django query and I'm not making much progress. I was hoping some wizard here could help me out?
I have the following models:
class Person(models.Model):
    MALE = "M"
    FEMALE = "F"
    OTHER = "O"
    UNKNOWN = "U"
    GENDER_CHOICES = (
        (MALE, "Male"),
        (FEMALE, "Female"),
        (UNKNOWN, "Other"),
    )
    firstName = models.CharField(max_length=200, null=True, db_column="firstname")
    lastName = models.CharField(max_length=200, null=True, db_column="lastname")
    gender = models.CharField(max_length=1, choices=GENDER_CHOICES, default=UNKNOWN, null=True)
    dateOfBirth = models.DateField(null=True, db_column="dateofbirth")
    dateInService = models.DateField(null=True, db_column="dateinservice")
    photo = models.ImageField(upload_to='person_photos', null=True)

class SuccessionTerm(models.Model):
    originalName = models.CharField(max_length=200, null=True, db_column="originalname")
    description = models.CharField(max_length=200, blank=True, null=True)
    score = models.IntegerField()

class Succession(models.Model):
    position = models.ForeignKey(Position, to_field='positionId', db_column="position_id")
    employee = models.ForeignKey(Employee, to_field='employeeId', db_column="employee_id")
    term = models.ForeignKey(SuccessionTerm)

class Position(models.Model):
    positionId = models.CharField(max_length=200, unique=True, db_column="positionid")
    title = models.CharField(max_length=200, null=True)
    # There cannot be a DB constraint, as that would make it impossible to add the first position.
    dottedLine = models.ForeignKey("Position", to_field='positionId', related_name="Dotted Line",
                                   null=True, db_constraint=False, db_column="dottedline_id")
    solidLine = models.ForeignKey("Position", to_field='positionId', related_name="SolidLine",
                                  null=True, db_constraint=False, db_column="solidline_id")
    grade = models.ForeignKey(Grade)
    businessUnit = models.ForeignKey(BusinessUnit, null=True, db_column="businessunit_id")
    functionalArea = models.ForeignKey(FunctionalArea, db_column="functionalarea_id")
    location = models.ForeignKey(Location, db_column="location_id")

class Employee(models.Model):
    person = models.OneToOneField(Person, db_column="person_id")
    fte = models.IntegerField(default=100)
    dataSource = models.ForeignKey(DataSource, db_column="datasource_id")
    talentStatus = models.ForeignKey(TalentStatus, db_column="talentstatus_id")
    retentionRisk = models.ForeignKey(RetentionRisk, db_column="retentionrisk_id")
    retentionRiskReason = models.ForeignKey(RetentionRiskReason, db_column="retentionriskreason_id")
    performanceStatus = models.ForeignKey(PerformanceStatus, db_column="performancestatus_id")
    potential = models.ForeignKey(Potential, db_column="potential_id")
    mobility = models.ForeignKey(Mobility, db_column="mobility_id")
    currency = models.ForeignKey(Currency, null=True, db_column="currency_id")
    grade = models.ForeignKey(Grade, db_column="grade_id")
    position = models.OneToOneField(Position, to_field='positionId', null=True,
                                    blank=True, db_column="position_id")
    employeeId = models.CharField(max_length=200, unique=True, db_column="employeeid")
    dateInPosition = models.DateField(null=True, db_column="dateinposition")
Now, what I want for each employee is the position title, the person's name, and, for each succession term (of which there are three), how many times the position of that employee appears in the succession table, plus the number of times each of these employees occurs in the successors table. Above all, I want to do all of this in a single query (or more specifically, a single Django ORM statement), because I'm paginating the results but want to be able to order them by any of these columns!
So far, I have this:
emps = (Employee.objects.all()
        .annotate(ls_st=Count('succession__term'))
        .filter(succession__term__description="ShortTerm")
        .order_by('ls_st')
        .prefetch_related('person', 'position')[lower_limit:upper_limit])
This is only one of the succession terms, and I would like to extend it to all terms by adding more annotate calls.
My problem is that the filter call works on the entire query. I would like to only filter on the Count call.
I've tried doing something like Count(succession__term__description="ShortTerm"), but that doesn't work. Is there any other way to do this?
Thank you very much in advance,
Regards,
Linus
So what you want is a count of each different type of succession__term? That is pretty complex, and I don't think you can do this with the built-in Django ORM right now (unless you use an .extra() query).
In Django 1.8, I believe you will be able to do it with the new Query Expressions (https://docs.djangoproject.com/en/dev/releases/1.8/#query-expressions). But of course 1.8 isn't released yet, so that doesn't help you.
In the meantime, you can use the very handy django-aggregate-if package (https://github.com/henriquebastos/django-aggregate-if/, https://pypi.python.org/pypi/django-aggregate-if).
With django-aggregate-if, your query might look like this:
emps = (Employee.objects.annotate(
            ls_st=Count('succession__term', only=Q(succession__term__description="ShortTerm")),
            ls_lt=Count('succession__term', only=Q(succession__term__description="LongTerm")),  # whatever your other term descriptions are.
            ls_ot=Count('succession__term', only=Q(succession__term__description="OtherTerm"))
        )
        .order_by('ls_st')
        .prefetch_related('person', 'position')[lower_limit:upper_limit])
Disclaimer: I have never used django-aggregate-if, so I'm not entirely sure this will work, but according to the README, it seems like it should.
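What the question asks for is usually called conditional aggregation. The sketch below shows the underlying SQL shape with the standard-library sqlite3 module rather than the Django ORM. The two tables and their columns are simplified, hypothetical stand-ins for Employee and Succession. In Django 2.0 and later, aggregates such as Count accept a filter argument taking a Q object, which expresses the same idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE succession (id INTEGER PRIMARY KEY, employee_id INTEGER, term TEXT);
INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO succession (employee_id, term) VALUES
    (1, 'ShortTerm'), (1, 'ShortTerm'), (1, 'LongTerm'), (2, 'LongTerm');
""")

# Each CASE restricts one counter to one term, so the WHERE clause stays empty
# and no employee is filtered out of the result.
rows = conn.execute("""
SELECT e.name,
       SUM(CASE WHEN s.term = 'ShortTerm' THEN 1 ELSE 0 END) AS ls_st,
       SUM(CASE WHEN s.term = 'LongTerm' THEN 1 ELSE 0 END) AS ls_lt
FROM employee e
LEFT JOIN succession s ON s.employee_id = e.id
GROUP BY e.id, e.name
ORDER BY ls_st DESC
""").fetchall()
```

Because the conditions live inside the aggregates, every per-term count is an ordinary column of one query and can be used for ordering and pagination.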
