Insert into Django JsonField without pulling the content into memory


I have a Django model:
class Employee(models.Model):
    data = models.JSONField(default=dict, blank=True)
This JSONField contains two years of data, which is a lot.
class DataQUery:
    def __init__(self, **kwargs):
        super(DataQUery, self).__init__()
        self.data = kwargs['data']  # this pulls the data into memory,
                                    # which is what I want to avoid
        # Then we have a dictionary called today to hold the daily data
        today = dict()
        # ... putting stuff in today
        # Then insert it into data with today's date as the key for that day
        self.data[f"{datetime.now().date()}"] = today
        # Then update the database
        Employee.objects.filter(id=1).update(data=self.data)
I want to insert into the data column without pulling it into memory.
Yes, I could change default=dict to default=list and directly do
Employee.objects.filter(id=1).update(data=today)
BUT I need today's DATE to identify the different days.
So if I don't pull the data column, I don't need the kwargs dict. Say I don't initialise anything (so nothing is pulled into memory): how can I update the data column with a dictionary keyed by today's date, so that after the update the data column (JSONField) looks like {2021-08-10:{...}, 2021-08-11:{..}}?

In a relational database, you store multiple items that belong to the same entity by creating a new model with a ForeignKey to the other model. Here that means:
class Employee(models.Model):
    # …
    pass
class EmployeePresence(models.Model):
    employee = models.ForeignKey(Employee, on_delete=models.CASCADE)
    date = models.DateField(auto_now_add=True)
    data = models.JSONField(default=dict, blank=True)
    class Meta:
        ordering = ['employee', 'date']
To add a day of data for an Employee object e, we create a new EmployeePresence object that relates to it:
EmployeePresence.objects.create(
    employee=e,
    # date is filled in with the current date because of auto_now_add=True
    data={'some': 'data', 'other': 'data'}
)
We can access all EmployeePresences of a given Employee object e with:
e.employeepresence_set.all()
Creating, updating, or removing a single EmployeePresence record is thus simpler, and can be done efficiently with a query, without loading the other days.
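To see why one row per day avoids loading the existing data, here is a minimal sketch of the same table layout using Python's built-in sqlite3 module instead of the Django ORM. The table names, the add_day and load_day helpers, and the sample payloads are made up for illustration; the point is only that adding a day is a single INSERT that never reads earlier rows.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY);
    CREATE TABLE employee_presence (
        id INTEGER PRIMARY KEY,
        employee_id INTEGER NOT NULL REFERENCES employee(id),
        date TEXT NOT NULL,
        data TEXT NOT NULL,
        UNIQUE (employee_id, date)
    );
    INSERT INTO employee (id) VALUES (1);
""")


def add_day(employee_id, day, payload):
    # A single INSERT: no earlier rows are read or rewritten.
    conn.execute(
        "INSERT INTO employee_presence (employee_id, date, data) VALUES (?, ?, ?)",
        (employee_id, day, json.dumps(payload)),
    )


def load_day(employee_id, day):
    # Fetch only the requested day, not the whole history.
    row = conn.execute(
        "SELECT data FROM employee_presence WHERE employee_id = ? AND date = ?",
        (employee_id, day),
    ).fetchone()
    return json.loads(row[0]) if row else None


add_day(1, "2021-08-10", {"hours": 8})
add_day(1, "2021-08-11", {"hours": 6})
```

In the Django version, the create() call plays the role of add_day, and a filter on employee and date plays the role of load_day.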

Related

LEFT JOIN with other param in ON Django ORM

I have the following models:
class Customer(models.Model):
    name = models.CharField(max_length=255)
    email = models.EmailField(max_length=255, default='example@example.com')
    authorized_credit = models.IntegerField(default=0)
    balance = models.IntegerField(default=0)
class Transaction(models.Model):
    customer = models.ForeignKey(Customer, on_delete=models.CASCADE)
    payment_amount = models.IntegerField(default=0)  # can be 0 or have a value
    exit_amount = models.IntegerField(default=0)  # can be 0 or have a value
    transaction_date = models.DateField()
I want to query all customer information and the date of the last payment.
I have this query in Postgres, which returns exactly what I need:
select e.*, max(l.transaction_date) as last_date_payment
from app_customer as e
left join app_transaction as l
on e.id = l.customer_id and l.payment_amount != 0
group by e.id
order by e.id
But I need this query in Django, for a serializer. I tried the following, but it produces a different query:
In Python:
print(Customer.objects.filter(transaction__isnull=True).order_by('id').query)
>>> SELECT app_customer.id, app_customer.name, app_customer.email, app_customer.balance FROM app_customer
LEFT OUTER JOIN app_transaction
ON (app_customer.id = app_transaction.customer_id)
WHERE app_transaction.id IS NULL
ORDER BY app_customer.id ASC
But what I need are the rows returned by the SQL query above.

Whether or not you are working with a serializer, you can reuse the same view or function for both tasks.
To get the transactions of a given customer object, you need to be aware of related_name. It has a default value (transaction_set here), but you can set a name of your own that is easier to remember.
Change your model:
class Transaction(models.Model):
    customer = models.ForeignKey(Customer, related_name="transac_set", on_delete=models.CASCADE)
related_name is how Django names the reverse relationship from Customer to Transaction. With it, for a Customer object cus you can call cus.transac_set.all() to fetch all transactions of that customer.
Since you may need the transactions of several customers, you can use select_related() when querying. It follows the foreign key with a SQL join, so each transaction's customer is loaded in the same query.
Define a function that returns all transactions of the given customers:
def get_cus_transac(cus_id_list):
    # cus_id_list is the list of customer ids to fetch
    cus_transac_list = Transaction.objects.select_related('customer').filter(customer_id__in=cus_id_list)
    return cus_transac_list
For your purpose you need the reverse direction, which is why you needed related_name: use prefetch_related().
Define a function that returns the customers along with their transactions. Warning: this does not yet select the latest transaction. It prefetches all of them, so you still have to pick the most recent one yourself along similar lines.
def get_latest_transac(cus_id_list):
    # cus_id_list is the list of customer ids to fetch
    latest_transac_list = Customer.objects.filter(id__in=cus_id_list).prefetch_related('transac_set')
    return latest_transac_list
As for the serializers, two are strictly required, one for Transaction and one for Customer. A third one can combine the Customer data with the latest transaction that you need.
I have not tested this code, so it may contain mistakes or miss some details. I am assuming you know how to write the serializers and views for it.

One approach is to use subqueries:
from django.db.models import OuterRef, Subquery
transaction_subquery = Transaction.objects.filter(
    customer=OuterRef('pk'), payment_amount__gt=0,
).order_by('-transaction_date')
Customer.objects.annotate(
    last_date_payment=Subquery(
        transaction_subquery.values('transaction_date')[:1]
    )
)
This will get all customer data, annotated with the date of each customer's last transaction that has a positive payment_amount, in one query. Customers without such a transaction get None.
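A Subquery annotation of this kind corresponds to a correlated subquery in SQL. As a rough check that this shape returns the same rows as the LEFT JOIN from the question, here is a sketch using Python's built-in sqlite3 module. The sample rows are invented, SQLite stands in for PostgreSQL, the condition is written as != 0 to match the question, and the SQL that Django actually generates may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE app_transaction (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES app_customer(id),
        payment_amount INTEGER DEFAULT 0,
        transaction_date TEXT
    );
    INSERT INTO app_customer (id, name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO app_transaction (customer_id, payment_amount, transaction_date) VALUES
        (1, 50, '2021-08-01'),
        (1, 0,  '2021-08-09'),
        (1, 20, '2021-08-05'),
        (2, 0,  '2021-08-07');
""")

# The query from the question: the extra condition lives in the ON clause,
# so customers without a non-zero payment are kept with a NULL date.
JOIN_SQL = """
    SELECT e.id, MAX(l.transaction_date) AS last_date_payment
    FROM app_customer AS e
    LEFT JOIN app_transaction AS l
        ON e.id = l.customer_id AND l.payment_amount != 0
    GROUP BY e.id
    ORDER BY e.id
"""

# The correlated-subquery shape: one scalar subquery per customer row.
SUBQUERY_SQL = """
    SELECT e.id, (
        SELECT l.transaction_date
        FROM app_transaction AS l
        WHERE l.customer_id = e.id AND l.payment_amount != 0
        ORDER BY l.transaction_date DESC
        LIMIT 1
    ) AS last_date_payment
    FROM app_customer AS e
    ORDER BY e.id
"""

join_rows = conn.execute(JOIN_SQL).fetchall()
subquery_rows = conn.execute(SUBQUERY_SQL).fetchall()
```

Both queries keep customer 2 (only zero-amount transactions) and customer 3 (no transactions) in the result, with an empty date.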

To solve your problem:
I want to query all customer information and the date of the last payment.
You can try combining order_by with distinct:
Customer.objects.values(
    'id', 'name', 'email', 'authorized_credit', 'balance',
    'transaction__transaction_date',
).order_by('id', '-transaction__transaction_date').distinct('id')
Note:
distinct() with field arguments only works on PostgreSQL, and those fields must match the leading fields of order_by(). This keeps one row per customer, the one with the latest transaction date. It does not filter out transactions whose payment_amount is 0.
Usage of distinct: https://docs.djangoproject.com/en/3.2/ref/models/querysets/#distinct

Creating records by model

Suppose I have these models:
class Recipe(models.Model):
    par_recipe = models.CharField(max_length=200)
class Line(models.Model):
    par_machine = models.CharField(max_length=200)
class Measurements(models.Model):
    par_value = models.IntegerField(default=0)
    id_line = models.ForeignKey(Line)
    id_recipe = models.ForeignKey(Recipe)
Do I understand correctly that this gives me a 1:1 relationship, and that id_line and id_recipe will be filled in automatically when I add entries?
I will add entries like this, for example:
for row in ws.iter_rows(row_offset=1):
    recipe = Recipe()
    line = Line()
    measurements = Measurements()
    recipe.par_recipe = row[1].value
    line.par_machine = row[2].value
    measurements.par_value = row[8].value
A small follow-up question: Measurements was designed to hold all the foreign keys. Is that implemented correctly now?

Not quite; you have to tie them together yourself:
for row in ws.iter_rows(row_offset=1):
    recipe = Recipe.objects.create(par_recipe=row[1].value)
    line = Line.objects.create(par_machine=row[2].value)
    measurements = Measurements.objects.create(
        par_value=row[8].value,
        id_line=line,
        id_recipe=recipe
    )
None of this is optimized for the database. If there are a lot of rows, you can make it faster by wrapping the writes in a transaction:
from django.db import transaction
with transaction.atomic():
    for row in ws.iter_rows(row_offset=1):
        recipe = Recipe.objects.create(par_recipe=row[1].value)
        line = Line.objects.create(par_machine=row[2].value)
        measurements = Measurements.objects.create(
            par_value=row[8].value,
            id_line=line,
            id_recipe=recipe
        )
This creates one transaction and commits once at the end instead of after every write. But an error in any row will also roll back the whole transaction.
See Django Database Transactions.
You could get more creative and commit every 1000 records. Note that transaction.commit() cannot be called inside an atomic() block (Django raises TransactionManagementError), so use one atomic() block per batch instead:
from itertools import islice
from django.db import transaction
rows = iter(ws.iter_rows(row_offset=1))
while True:
    # take the next batch of up to 1000 rows
    batch = list(islice(rows, 1000))
    if not batch:
        break
    # each batch is committed when its atomic() block exits
    with transaction.atomic():
        for row in batch:
            recipe = Recipe.objects.create(par_recipe=row[1].value)
            line = Line.objects.create(par_machine=row[2].value)
            measurements = Measurements.objects.create(
                par_value=row[8].value,
                id_line=line,
                id_recipe=recipe
            )
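Writing every 1000 records needs the rows grouped into batches first. That grouping step can be tried on its own, without a database; the sketch below uses a hypothetical helper named batched and plain integers in place of spreadsheet rows.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` consecutive items from `iterable`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# 2500 stand-in rows split into batches of 1000.
batches = list(batched(range(2500), 1000))
sizes = [len(batch) for batch in batches]
```

Each batch would then be written inside its own transaction, so a failure only rolls back the batch in which it occurred.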

Do I understand correctly that this gives me a 1:1 relationship, and that id_line and id_recipe will be filled in automatically when I add entries?
No. The relations will not link to the previously constructed objects automatically. That would also be quite unsafe, since a small change to the code fragment could result in a totally different way of linking elements together.
Furthermore, a ForeignKey is a many-to-one relation: multiple Measurements objects can refer to the same Recipe object.
You need to do this manually, for example:
for row in ws.iter_rows(row_offset=1):
    recipe = Recipe.objects.create(par_recipe=row[1].value)
    line = Line.objects.create(par_machine=row[2].value)
    measurements = Measurements.objects.create(
        par_value=row[8].value,
        id_line=line,
        id_recipe=recipe
    )
Note that a ForeignKey refers to the object, not to the primary key value, so you probably want to rename your ForeignKeys (line instead of id_line). A model also typically has a singular name, so Measurement instead of Measurements:
class Measurement(models.Model):
    par_value = models.IntegerField(default=0)
    line = models.ForeignKey(Line, on_delete=models.CASCADE)
    recipe = models.ForeignKey(Recipe, on_delete=models.CASCADE)

How to query database items from models.py in Django?

I have different models. The choices of a MultiSelectField in one model depend on another model, so the database has to be queried inside models.py. Doing so causes a problem during migration (a "table doesn't exist" error).
class Invigilator(models.Model):
    ---
    # this method queries Shift objects and ExamRoom
    def get_invigilator_assignment_list():
        assignment = []
        shifts = Shift.objects.all()
        for shift in shifts:
            rooms = ExamRoom.objects.all()
            for room in rooms:
                assign = str(shift.shiftName) + " " + str(room.name)
                assignment.append(assign)
        return assignment
    assignment_choice = []
    assign = get_invigilator_assignment_list()
    i = 0
    for assignm in assign:
        datatuple = (i, assignm)
        assignment_choice.append(datatuple)
        i = i + 1
    ASSIGNMENT_CHOICE = tuple(assignment_choice)
    assignment = MultiSelectField(choices=ASSIGNMENT_CHOICE, blank=True, verbose_name="Assignments")

You cannot add dynamic choices, because they are all stored in the migration files and table info. If Django let you do that, every time someone added a record to those two models a new migration would have to be created and the database changed. You must approach this problem differently.
As far as I know, django-smart-selects has a ChainedManyToManyField which can do the trick.
Here is an example from the repo:
from smart_selects.db_fields import ChainedManyToManyField
class Publication(models.Model):
    name = models.CharField(max_length=255)
class Writer(models.Model):
    name = models.CharField(max_length=255)
    publications = models.ManyToManyField('Publication', blank=True, null=True)
class Book(models.Model):
    publication = models.ForeignKey(Publication)
    writer = ChainedManyToManyField(
        Writer,
        chained_field="publication",
        chained_model_field="publications")
    name = models.CharField(max_length=255)

This cannot be done in the model, and it doesn't make sense either. It's like you're trying to create a column in a table with a certain fixed set of choices (what is MultiSelectField anyway?), but when someone later adds a new row to the Shift or ExamRoom table, the initial column choices have to change again.
You can
either make your assignment column a simple CharField and create the choices dynamically when creating the form,
or model your relationships differently. For example, since it looks like an assignment is a combination of a Shift and an ExamRoom, I would create a through relationship:
class Invigilator(models.Model):
    # the through model is referenced by name because it is defined below
    shifts = models.ManyToManyField(Shift, through='Assignment')
class Assignment(models.Model):
    # on_delete is required since Django 2.0; CASCADE is an assumption here
    room = models.ForeignKey(ExamRoom, on_delete=models.CASCADE)
    shift = models.ForeignKey(Shift, on_delete=models.CASCADE)
    invigilator = models.ForeignKey(Invigilator, on_delete=models.CASCADE)
When creating the relationship, you'd have to pick a Shift and a Room, which would create the Assignment object. Then you can query things like invigilator.shifts.all() or invigilator.assignment_set.first().room.
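For the first option, creating the choices when the form is built, the core step is computing the (value, label) pairs at runtime rather than at import time. A minimal sketch in plain Python follows; build_assignment_choices is a hypothetical helper, and the shift and room names are stand-ins for values that would be queried from Shift and ExamRoom inside the form's __init__.

```python
from itertools import product


def build_assignment_choices(shift_names, room_names):
    """Return (index, "shift room") pairs for every shift/room combination."""
    labels = (f"{shift} {room}" for shift, room in product(shift_names, room_names))
    return list(enumerate(labels))


choices = build_assignment_choices(["Morning", "Evening"], ["R1", "R2"])
```

Because the pairs are computed on each call, new shifts or rooms show up without a migration. Positional indexes as values are fragile if rows are later removed, so a stable key such as the primary keys may be a better value in practice.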

How to generate feed from different models in Django?

So, I have two models called Apartment and Job. It's easy to display the contents of both models separately, but what I can't figure out is how to display a mixed feed of both models ordered by date.
jobs = Job.objects.all().order_by('-posted_on')
apartments = Apartment.objects.all().order_by('-date')
The posted date of a job is stored in 'posted_on' and the posted date of an apartment is stored in 'date'. How can I combine both of these and sort them by the date posted? I tried combining both models in a simple way, like:
new_feed = list(jobs) + list(apartments)
This just creates a list containing the objects of both models, but they are not arranged by date.

I suggest two ways to achieve that.
With union() (new in Django 1.11), which uses SQL's UNION operator to combine the results of two or more QuerySets.
You need to make sure the field you order by has the same name in both models, for example a date field on Job as well as on Apartment. Per the Django documentation, combining different models only works if the SELECT list is the same in all QuerySets, and the results are returned as instances of the first model.
# assumes both models name their date field "date"
jobs = Job.objects.all()
apartments = Apartment.objects.all()
new_feed = jobs.union(apartments).order_by('-date')
Or
With chain(), which treats consecutive sequences as a single sequence, together with sorted() and a lambda to sort them:
from itertools import chain
# remove the order_by() in each queryset, use it once with sorted
jobs = Job.objects.all()
apartments = Apartment.objects.all()
result_list = sorted(chain(jobs, apartments),
                     key=lambda instance: instance.date,
                     reverse=True)  # newest first
With this option you don't need to rename any field; just add a property to one model, here Job:
class Job(models.Model):
    ''' fields '''
    posted_on = models.DateField(......)
    @property
    def date(self):
        return self.posted_on
Now both models have a date attribute, so the sorted(chain(...)) call above works on objects of either model.

A good way to do that is to use the adapter design pattern. The idea is to introduce an auxiliary data structure that is used only for sorting these model objects. This has several benefits over forcing both models to have an identically named attribute for sorting. The most important is that the change won't affect any other code in your code base.
First, fetch your objects as you do now, but you don't have to fetch them sorted; you can fetch all of them in arbitrary order. You may also fetch just the top 100 of each in sorted order. Fetch whatever fits your requirements here:
jobs = Job.objects.all()
apartments = Apartment.objects.all()
Then, we build an auxiliary list of tuples (attribute used for sorting, object):
auxiliary_list = ([(job.posted_on, job) for job in jobs]
                  + [(apartment.date, apartment) for apartment in apartments])
Now it's time to sort this auxiliary list. By default, Python's sort() method compares tuples lexicographically, so it uses the first element (the posted_on and date attributes) and falls back to the second element on ties. Model objects cannot be ordered with <, so two entries with the same date would raise a TypeError; passing a key that looks only at the first element avoids that. The reverse parameter is set to True to sort in decreasing order, i.e. newest first, as you want in your feed.
auxiliary_list.sort(key=lambda pair: pair[0], reverse=True)
Finally, return only the second elements of the sorted tuples:
sorted_feed = [obj for _, obj in auxiliary_list]
Just keep in mind that if you expect your feed to be huge, sorting these elements in memory is not the best way to do this, but I guess that is not your concern here.
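The same idea can be tried without a database. In the sketch below, two small dataclasses stand in for the Job and Apartment models, and a hypothetical feed_date function picks the right attribute for each type; none of these names come from Django itself.

```python
import datetime as dt
from dataclasses import dataclass


@dataclass
class Job:
    title: str
    posted_on: dt.date


@dataclass
class Apartment:
    address: str
    date: dt.date


def feed_date(item):
    # Adapter: map each type to the attribute that holds its posting date.
    if isinstance(item, Job):
        return item.posted_on
    return item.date


def build_feed(jobs, apartments):
    # Sorting by key only compares dates, so ties between objects are safe.
    return sorted([*jobs, *apartments], key=feed_date, reverse=True)


jobs = [Job("Baker", dt.date(2021, 8, 10)), Job("Driver", dt.date(2021, 8, 12))]
apartments = [
    Apartment("1 Main St", dt.date(2021, 8, 11)),
    Apartment("2 Oak Ave", dt.date(2021, 8, 12)),
]
feed = build_feed(jobs, apartments)
```

Neither stand-in class had to be changed to make the sort work, which is the main attraction of the adapter approach.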

I implemented this in the following way.
I had a Video model and an Article model that had to be curated as a feed. I made another model called Post, and gave both Video and Article a OneToOneField to it.
# apps.feeds.models.py
from django.utils.functional import cached_property
from model_utils.models import TimeStampedModel
class Post(TimeStampedModel):
    ...
    @cached_property
    def target(self):
        # return whichever related object exists for this post
        if getattr(self, "video", None) is not None:
            return self.video
        if getattr(self, "article", None) is not None:
            return self.article
        return None
# apps/videos/models.py
class Video(models.Model):
    post = models.OneToOneField(
        "feeds.Post",
        on_delete=models.CASCADE,
    )
    ...
# apps.articles.models.py
class Article(models.Model):
    post = models.OneToOneField(
        "feeds.Post",
        on_delete=models.CASCADE,
    )
    ...
Then, for the feed API, I used django-rest-framework and sorted the Post queryset by its created timestamp. I customized the serializer's methods and added queryset annotations where needed. This way I was able to get either the Article's or the Video's data as a nested dictionary from the related Post instance.
The advantage of this implementation is that you can optimize the queries easily with annotate, select_related and prefetch_related, which all work well on the Post queryset.
# apps.feeds.serializers.py
from collections import OrderedDict
class FeedSerializer(serializers.ModelSerializer):
    type = serializers.SerializerMethodField()
    class Meta:
        model = Post
        fields = ("type",)
    def to_representation(self, instance) -> OrderedDict:
        ret = super().to_representation(instance)
        # nest the data of the related Video or Article under "data"
        if isinstance(instance.target, Video):
            ret["data"] = VideoSerializer(
                instance.target, context={"request": self.context["request"]}
            ).data
        else:
            ret["data"] = ArticleSerializer(
                instance.target, context={"request": self.context["request"]}
            ).data
        return ret
    def get_type(self, obj):
        return obj.target._meta.model_name
    @staticmethod
    def setup_eager_loading(qs):
        """
        Inspired by:
        http://ses4j.github.io/2015/11/23/optimizing-slow-django-rest-framework-performance/
        """
        qs = qs.select_related("video", "article")
        # other db optimizations...
        ...
        return qs
# apps.feeds.viewsets.py
class FeedViewSet(viewsets.ModelViewSet):
    queryset = Post.objects.all()
    serializer_class = FeedSerializer
    permission_classes = (IsAuthenticatedOrReadOnly,)
    def get_queryset(self):
        qs = super().get_queryset()
        qs = self.serializer_class().setup_eager_loading(qs)
        return qs
    ...

How to fetch last 24 hours records from database

I have an Orders model that stores users' orders. I'd like to filter only the orders that were issued (order_started field) in the last 24 hours for a user. I am trying to update the following view:
def userorders(request):
    Orders = Orders.objects.using('db1').filter(order_owner=request.user).extra(select={'order_ended_is_null': 'order_ended IS NULL',},)
The Orders model has the following fields:
order_uid = models.TextField(primary_key=True)
order_owner = models.TextField()
order_started = models.DateTimeField()
order_ended = models.DateTimeField(blank=True, null=True)
How can I add the extra filter?

You can do it as below, by adding another argument to the filter call (assuming the rest of your function was working):
import datetime
def userorders(request):
    time_24_hours_ago = datetime.datetime.now() - datetime.timedelta(days=1)
    orders = Orders.objects.using('db1').filter(
        order_owner=request.user,
        order_started__gte=time_24_hours_ago
    ).extra(select={'order_ended_is_null': 'order_ended IS NULL',},)
If your project has USE_TZ enabled, use django.utils.timezone.now() instead of datetime.datetime.now(), so that the cutoff is timezone-aware.
Note that Orders is not a good choice for a variable name: it is the name of the model class and begins with a capital letter (generally reserved for classes), and assigning to it inside the function makes it a local name there, so Orders.objects raises UnboundLocalError. I've used orders instead (different case).
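The cutoff logic behind order_started__gte can be checked in plain Python. This sketch filters a list of dictionaries rather than a queryset; started_within and the sample orders are made up for illustration, and the timestamps are timezone-aware UTC values.

```python
from datetime import datetime, timedelta, timezone


def started_within(orders, window, now=None):
    """Return the orders whose order_started is at or after now - window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - window
    # >= mirrors the __gte lookup: an order exactly at the cutoff is included
    return [order for order in orders if order["order_started"] >= cutoff]


now = datetime(2021, 8, 11, 12, 0, tzinfo=timezone.utc)
orders = [
    {"order_uid": "a", "order_started": now - timedelta(hours=1)},
    {"order_uid": "b", "order_started": now - timedelta(hours=24)},
    {"order_uid": "c", "order_started": now - timedelta(hours=25)},
]
recent = started_within(orders, timedelta(days=1), now=now)
```

Passing now explicitly keeps the function easy to test; in a view it would be left out so the current time is used.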
