How to optimize a "for" loop in Django - Python

I want to optimize these functions because they take too long; each of them pulls specific attributes. I think there may be a way to fetch the attributes inside the query itself, if you can help me.
The functions are written in Python with Django.
This is what I've done so far.
Definition of the querysets:
cand_seleccionados = ListaFinal.objects.filter(interesado__id_oferta=efectiva.oferta.id)
seleccionados_ids = cand_seleccionados.values_list("interesado_id", flat=True)

cand_postulados = Postulados.objects.filter(
    interesado__id_oferta=efectiva.oferta.id
).exclude(interesado_id__in=seleccionados_ids)
postulados_ids = cand_postulados.values_list("interesado_id", flat=True)

cand_entrevistados = Entrevistados.objects.filter(
    interesado__id_oferta=efectiva.oferta.id
).exclude(interesado_id__in=postulados_ids)
This is the loop for cand_postulados; the others are the same, so I thought it wouldn't be necessary to include more:
for p in cand_postulados:
    postulado = dict()
    telefono = Perfil.objects.values_list("telefono", flat=True).get(
        user_id=p.interesado.candidato.id
    )
    postulado["id"] = p.interesado.candidato.id
    postulado["nombre"] = p.interesado.candidato.first_name
    postulado["email"] = p.interesado.candidato.email
    postulado["teléfono"] = telefono
    if p.interesado.id_oferta.pais is None:
        postulado["pais"] = "Sin pais registrado"
    else:
        postulado["pais"] = p.interesado.id_oferta.pais.nombre
    postulado["nombre_reclutador"] = p.interesado.id_reclutador.first_name
    postulado["id_reclutador"] = p.interesado.id_reclutador.id
    postulados.append(postulado)

If I'm reading your loop correctly, this should do everything in a single query. (You may need to adjust some of the __ spanning lookups if I read things incorrectly. In particular, I don't necessarily know the reverse name of your Perfil to user relationship.)
from django.db.models import F

cand_postulados = (
    Postulados.objects
    .filter(interesado__id_oferta=efectiva.oferta.id)
    .exclude(interesado_id__in=seleccionados_ids)
)
postulados = list(cand_postulados.values(
    # values(**expressions) expects query expressions, hence the F() wrappers
    teléfono=F("interesado__candidato__telefono"),
    nombre=F("interesado__candidato__first_name"),
    email=F("interesado__candidato__email"),
    pais=F("interesado__id_oferta__pais__nombre"),
    nombre_reclutador=F("interesado__id_reclutador__first_name"),
    id_reclutador=F("interesado__id_reclutador__id"),
))
for datum in postulados:
    if not datum.get("pais"):
        datum["pais"] = "Sin pais registrado"

Related

Need help in reducing the complexity or duplication in the function

Hi, can someone please help me reduce the complexity of the code below? I am new to this and need to reduce the amount of code, improve its simplicity, and remove duplication. Any help in this regard would be great; thanks in advance for your time and consideration.
def update(self, instance, validated_data):
    instance.email_id = validated_data.get('email_id', instance.email_id)
    instance.email_ml_recommendation = validated_data.get('email_ml_recommendation',
                                                          instance.email_ml_recommendation)
    instance.ef_insured_name = validated_data.get('ef_insured_name', instance.ef_insured_name)
    instance.ef_broker_name = validated_data.get('ef_broker_name', instance.ef_broker_name)
    instance.ef_obligor_name = validated_data.get('ef_obligor_name', instance.ef_obligor_name)
    instance.ef_guarantor_third_party = validated_data.get('ef_guarantor_third_party',
                                                           instance.ef_guarantor_third_party)
    instance.ef_coverage = validated_data.get('ef_coverage', instance.ef_coverage)
    instance.ef_financials = validated_data.get('ef_financials', instance.ef_financials)
    instance.ef_commercial_brokerage = validated_data.get('ef_commercial_brokerage',
                                                          instance.ef_commercial_brokerage)
    # fixing bug of pipeline
    instance.ef_underwriter_decision = validated_data.get('ef_underwriter_decision',
                                                          instance.ef_underwriter_decision)
    instance.ef_decision_nty_fields = validated_data.get('ef_decision_nty_fields',
                                                         instance.ef_decision_nty_fields)
    instance.ef_feedback = validated_data.get('ef_feedback', instance.ef_feedback)
    instance.relation_id = validated_data.get('relation_id', instance.relation_id)
    instance.email_outlook_date = validated_data.get('email_outlook_date',
                                                     instance.email_outlook_date)
    instance.ef_decision_nty_fields = validated_data.get('ef_decision_nty_fields',
                                                         instance.ef_decision_nty_fields)
    instance.ef_pl_est_premium_income = validated_data.get('ef_pl_est_premium_income',
                                                           instance.ef_pl_est_premium_income)
    instance.ef_pl_prob_closing = validated_data.get('ef_pl_prob_closing',
                                                     instance.ef_pl_prob_closing)
    instance.ef_pl_time_line = validated_data.get('ef_pl_time_line', instance.ef_pl_time_line)
    instance.ef_pipeline = validated_data.get('ef_pipeline', instance.ef_pipeline)
    instance.el_insured_margin = validated_data.get('el_insured_margin', instance.el_insured_margin)
    instance.ef_last_updated = validated_data.get('ef_last_updated', instance.ef_last_updated)
    instance.relation_id = validated_data.get('relation_id', instance.relation_id)
    # CR3.2 Primium and basis point addition
    instance.broker_email_id = validated_data.get('broker_email_id', instance.broker_email_id)
    instance.premium = validated_data.get('premium', instance.premium)
    instance.basis_points = validated_data.get('basis_points', instance.basis_points)
    instance.basis_points_decision = validated_data.get('basis_points_decision',
                                                        instance.basis_points_decision)
    embedded_json = validated_data.get('email_embedd', instance.email_embedd)
    instance.email_embedd = json.dumps(embedded_json)
If all items in your dictionary validated_data that have a corresponding attribute in instance have to be copied to that instance, then iterate those items and use setattr to set the corresponding attributes of your instance object.
You seem to have one special case where a value needs to be stringified as JSON. So you'll need specific code to deal with that scenario:
def update(self, instance, validated_data):
    for key, value in validated_data.items():
        if hasattr(instance, key):
            if key == "email_embedd":  # special case
                instance.email_embedd = json.dumps(value)
            else:
                setattr(instance, key, value)
A Logical Error...
There is a problem in your code for the special case:
embedded_json = validated_data.get('email_embedd', instance.email_embedd)
instance.email_embedd = json.dumps(embedded_json)
If this gets executed when validated_data does not have the key email_embedd, then embedded_json will default to instance.email_embedd. But that value is already JSON encoded! So if you now proceed with json.dumps(embedded_json) you'll end up with a JSON string that itself has been stringified again!
This problem will not occur with the code proposed above.
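To illustrate the problem, here is what double encoding looks like with plain json:
import json

value = {"a": 1}
once = json.dumps(value)   # '{"a": 1}'               -- a JSON object
twice = json.dumps(once)   # '"{\\"a\\": 1}"'         -- a JSON string containing escaped JSON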

How to speed up writing to a database?

I have a function that searches for JSON files in a directory, parses each file, and writes the data to the database. My problem is writing to the database, which takes around 30 minutes. Any idea how I can speed up the writing? I have a few quite big files to parse, but parsing is not a problem; it takes around 3 minutes. Currently I am using SQLite, but in the future I will change to PostgreSQL.
Here is my function:
def create_database():
    with transaction.atomic():
        directory = os.fsencode('data/web_files/unzip')
        for file in os.listdir(directory):
            filename = os.fsdecode(file)
            with open('data/web_files/unzip/{}'.format(filename.strip()), encoding="utf8") as f:
                data = json.load(f)
            cve_items = data['CVE_Items']
            for i in range(len(cve_items)):
                database_object = DataNist()
                try:
                    impact = cve_items[i]['impact']['baseMetricV2']
                    database_object.severity = impact['severity']
                    database_object.exp_score = impact['exploitabilityScore']
                    database_object.impact_score = impact['impactScore']
                    database_object.cvss_score = impact['cvssV2']['baseScore']
                except KeyError:
                    database_object.severity = ''
                    database_object.exp_score = ''
                    database_object.impact_score = ''
                    database_object.cvss_score = ''
                for vendor_data in cve_items[i]['cve']['affects']['vendor']['vendor_data']:
                    database_object.vendor_name = vendor_data['vendor_name']
                    for description_data in cve_items[i]['cve']['description']['description_data']:
                        database_object.description = description_data['value']
                    for product_data in vendor_data['product']['product_data']:
                        database_object.product_name = product_data['product_name']
                        database_object.save()
                        for version_data in product_data['version']['version_data']:
                            if version_data['version_value'] != '-':
                                database_object.versions_set.create(version=version_data['version_value'])
My models.py:
class DataNist(models.Model):
    vendor_name = models.CharField(max_length=100)
    product_name = models.CharField(max_length=100)
    description = models.TextField()
    date = models.DateTimeField(default=timezone.now)
    severity = models.CharField(max_length=10)
    exp_score = models.IntegerField()
    impact_score = models.IntegerField()
    cvss_score = models.IntegerField()

    def __str__(self):
        return self.vendor_name + "-" + self.product_name


class Versions(models.Model):
    data = models.ForeignKey(DataNist, on_delete=models.CASCADE)
    version = models.CharField(max_length=50)

    def __str__(self):
        return self.version
I would appreciate any advice on how I can improve my code.
Okay, given the structure of the data, something like this might work for you.
This is standalone code aside from that .objects.bulk_create() call; as commented in the code, the two classes defined would actually be models within your Django app.
(By the way, you probably want to save the CVE ID as a unique field too.)
Your original code assumed that every "leaf entry" in the affected-version data would have the same vendor, which may not be true. That's why the model structure here has a separate product-version model with vendor, product, and version fields. (If you wanted to optimize things a little, you could deduplicate AffectedProductVersion rows even across DataNist objects, which, as an aside, is not a perfect name for a model.)
And of course, as you had already done in your original code, the importing should be run within a transaction (transaction.atomic()).
Hope this helps.
import json
import os
import types


class DataNist(types.SimpleNamespace):  # this would actually be a model
    severity = ""
    exp_score = ""
    impact_score = ""
    cvss_score = ""

    def save(self):
        pass


class AffectedProductVersion(types.SimpleNamespace):  # this too
    # (foreign key to DataNist here)
    vendor_name = ""
    product_name = ""
    version_value = ""


def import_item(item):
    database_object = DataNist()
    try:
        impact = item["impact"]["baseMetricV2"]
    except KeyError:  # no impact object available
        pass
    else:
        database_object.severity = impact.get("severity", "")
        database_object.exp_score = impact.get("exploitabilityScore", "")
        database_object.impact_score = impact.get("impactScore", "")
        if "cvssV2" in impact:
            database_object.cvss_score = impact["cvssV2"]["baseScore"]
    for description_data in item["cve"]["description"]["description_data"]:
        database_object.description = description_data["value"]
        break  # only grab the first description
    database_object.save()  # save the base object
    affected_versions = []
    for vendor_data in item["cve"]["affects"]["vendor"]["vendor_data"]:
        for product_data in vendor_data["product"]["product_data"]:
            for version_data in product_data["version"]["version_data"]:
                affected_versions.append(
                    AffectedProductVersion(
                        data_nist=database_object,
                        vendor_name=vendor_data["vendor_name"],
                        product_name=product_data["product_name"],
                        version_value=version_data["version_value"],
                    )
                )
    AffectedProductVersion.objects.bulk_create(
        affected_versions
    )  # save all the version information
    return database_object  # in case the caller needs it


with open("nvdcve-1.0-2019.json") as infp:
    data = json.load(infp)

for item in data["CVE_Items"]:
    import_item(item)
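For reference, the two models sketched above might look roughly like this in Django; the field names and types here are guesses based on the original models.py and the NVD data, so treat this as a starting point rather than a finished schema.
from django.db import models

class DataNist(models.Model):
    cve_id = models.CharField(max_length=32, unique=True)  # the unique CVE ID suggested above
    description = models.TextField(blank=True)
    severity = models.CharField(max_length=10, blank=True)
    exp_score = models.FloatField(null=True, blank=True)   # NVD scores are decimals
    impact_score = models.FloatField(null=True, blank=True)
    cvss_score = models.FloatField(null=True, blank=True)

class AffectedProductVersion(models.Model):
    data_nist = models.ForeignKey(DataNist, on_delete=models.CASCADE)
    vendor_name = models.CharField(max_length=100)
    product_name = models.CharField(max_length=100)
    version_value = models.CharField(max_length=50)
With real models in place, the import loop at the bottom can be wrapped in a transaction, as mentioned above:
from django.db import transaction

with transaction.atomic():
    for item in data["CVE_Items"]:
        import_item(item)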

How to test if one element from a list is in another list in a Django filter

I'm trying to make this query work using a Django filter. Any help?
tmp = TemporaryLesson.objects.filter(Q(expiration_date__gte=now()) | Q(expiration_date__isnull=True))
temporary_lessons = []

for t in tmp:  # How to make this manual query work in the filter above?
    for c in t.related_courses:
        if c in student.my_courses:
            temporary_lessons.append(t)
            break
EDIT:
Model variables
From Student
my_courses = models.ManyToManyField(
    'course.BaseCourse',
    related_name='students',
    through=CourseXStudent
)
From TemporaryLesson
related_courses = models.ManyToManyField(
    'course.BaseCourse',
    related_name='temporary_lessons'
)
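With the ManyToMany fields shown in the edit, one way to express the loop as a single queryset is to span related_courses with __in and deduplicate with distinct(). This is an untested sketch based on the model definitions above:
temporary_lessons = (
    TemporaryLesson.objects
    .filter(Q(expiration_date__gte=now()) | Q(expiration_date__isnull=True))
    .filter(related_courses__in=student.my_courses.all())
    .distinct()
)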

How to get 10 random GAE ndb entities?

I have the following class to keep my records:
class List(ndb.Model):
    '''
    Index
    Key: sender
    '''
    sender = ndb.StringProperty()
    ...
    counter = ndb.IntegerProperty(default=0)
    ignore = ndb.BooleanProperty(default=False)
    added = ndb.DateTimeProperty(auto_now_add=True, indexed=False)
    updated = ndb.DateTimeProperty(auto_now=True, indexed=False)
The following code is used to return all entities I need:
entries = List.query()
entries = entries.filter(List.counter > 5)
entries = entries.filter(List.ignore == False)
entries = entries.fetch()
How should I modify the code to get 10 random records from entries? I am planning to have a daily cron task to extract random records, so they should be really random. What is the best way to get these records (to minimize the number of read operations)?
I don't think that the following code is the best:
entries = random.sample(entries, 10)
After reading the comments, the only improvement you can make, as far as I can see, is to fetch the keys only and limit the query if possible.
I haven't tested this, but something like so:
list_query = List.query()
list_query = list_query.filter(List.counter > 5)
list_query = list_query.filter(List.ignore == False)
list_keys = list_query.fetch(keys_only=True) # maybe put a limit here.
list_keys = random.sample(list_keys, 10)
lists = [list_key.get() for list_key in list_keys]  # ndb.get_multi(list_keys) would batch these into one call

SQLAlchemy session query with INSERT IGNORE

I'm trying to do a bulk insert/update with SQLAlchemy. Here's a snippet:
for od in clist:
    where = and_(Offer.network_id==od['network_id'],
                 Offer.external_id==od['external_id'])
    o = session.query(Offer).filter(where).first()
    if not o:
        o = Offer()
    o.network_id = od['network_id']
    o.external_id = od['external_id']
    o.title = od['title']
    o.updated = datetime.datetime.now()
    payout = od['payout']
    countrylist = od['countries']
    session.add(o)
    session.flush()
    for country in countrylist:
        c = session.query(Country).filter(Country.name==country).first()
        where = and_(OfferPayout.offer_id==o.id,
                     OfferPayout.country_name==country)
        opayout = session.query(OfferPayout).filter(where).first()
        if not opayout:
            opayout = OfferPayout()
            opayout.offer_id = o.id
            opayout.payout = od['payout']
            if c:
                opayout.country_id = c.id
                opayout.country_name = country
            else:
                opayout.country_id = 0
                opayout.country_name = country
            session.add(opayout)
            session.flush()
It looks like my issue was touched on here, http://www.mail-archive.com/sqlalchemy#googlegroups.com/msg05983.html, but I don't know how to use "textual clauses" with session query objects and couldn't find much (though admittedly I haven't had as much time as I'd like to search).
I'm new to SQLAlchemy, and I imagine there are some issues in the code besides the fact that it throws an exception on a duplicate key; for example, doing a flush after every iteration of clist (but I don't know how else to get the o.id value that is used in the subsequent OfferPayout inserts).
Guidance on any of these issues is very appreciated.
The way you should be doing these things is with session.merge().
You should also be using your objects' relationship properties. The o above should have an o.offerpayout attribute, which is a list of objects, and your OfferPayout should have an offerpayout.country property, which is the related Country object.
So the above would look something like this:
for od in clist:
    o = Offer()
    o.network_id = od['network_id']
    o.external_id = od['external_id']
    o.title = od['title']
    o.updated = datetime.datetime.now()
    payout = od['payout']
    countrylist = od['countries']
    for country in countrylist:
        opayout = OfferPayout()
        opayout.payout = od['payout']
        country_obj = Country()
        country_obj.name = country
        opayout.country = country_obj
        o.offerpayout.append(opayout)
    session.merge(o)
    session.flush()
This should work as long as all the primary keys are correct (i.e. the country table has a primary key of name). merge() essentially checks the primary keys and, if they are present, merges your object with the one in the database (it will also cascade down the joins).
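For o.offerpayout and opayout.country to exist, Offer and OfferPayout need relationship() properties along these lines. This is only a sketch; the table names, column types, and keys are assumptions based on the snippet above.
from sqlalchemy import Column, DateTime, ForeignKey, Integer, Numeric, String
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class Offer(Base):
    __tablename__ = "offer"
    id = Column(Integer, primary_key=True)
    network_id = Column(Integer)
    external_id = Column(String(64))
    title = Column(String(255))
    updated = Column(DateTime)
    # one-to-many collection used as o.offerpayout.append(...)
    offerpayout = relationship("OfferPayout", backref="offer")

class Country(Base):
    __tablename__ = "country"
    name = Column(String(64), primary_key=True)  # merge() matches on this key

class OfferPayout(Base):
    __tablename__ = "offer_payout"
    id = Column(Integer, primary_key=True)
    offer_id = Column(Integer, ForeignKey("offer.id"))
    payout = Column(Numeric(10, 2))
    country_name = Column(String(64), ForeignKey("country.name"))
    # scalar relationship used as opayout.country = country_obj
    country = relationship("Country")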
