Django Querying database - python

I'm trying to implement search in Django.
My view is as follows:
search_term = request.GET['search_term']
customers = Customer.objects.filter(
    Q(chassis__icontains=search_term) | Q(registration__icontains=search_term) |
    Q(email__icontains=search_term) | Q(firstname__icontains=search_term) |
    Q(lastname__icontains=search_term))
calculations_data = []
if customers:
    for customer in customers:
        try:
            calculation = Calculations.objects.get(customer=customer, user=request.user)
            calculations_data.append({
                'calculation': calculation,
                'price': price_incl_vat(calculation.purchase_price),
                'customer_fullname': '{} {}'.format(customer.firstname, customer.lastname),
                'car_chassis': customer.chassis,
                'car_registration': customer.registration,
            })
        except Calculations.DoesNotExist:
            pass
context = {'search_term': search_term, 'total_result': len(calculations_data), 'calculation_data': calculations_data}
return render(request, 'master/search.html', context)
I have two models, Calculations and Customer. Calculations has customer as a nullable ForeignKey, so not every calculation has a customer.
With a search term the results are correct, but without a search term I only get the calculations that have a customer.
What I need is: if there is no search_term, return all calculations.
Is there maybe a better way to write the query?
Thanks.

Since the results depend on the availability of search_term, why not use an if-else on search_term?
search_term = request.GET.get('search_term', None)
if search_term:
    # when search_term is not None:
    # get relevant calculations
else:
    calculations = Calculations.objects.all()
# rest of code
You can further simplify the search_term branch by putting the Q objects directly into a Calculations.objects.filter() call, instead of fetching the relevant customers and then finding their calculations. In Django, Q objects can query attributes of a foreign key via the double-underscore lookup syntax. Fetching Customers first and then using those results to find Calculations increases the number of queries to the database.
You can do something like following:
calculations = Calculations.objects.filter(
    Q(customer__email__icontains=search_term) |
    Q(customer__chassis__icontains=search_term) |
    Q(....)).select_related('customer')
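Putting the two suggestions together, a minimal sketch of the whole view (model, field, and helper names are taken from the question):
from django.db.models import Q

def search(request):
    search_term = request.GET.get('search_term', '')
    # select_related('customer') fetches each calculation's customer in the same query
    calculations = Calculations.objects.filter(user=request.user).select_related('customer')
    if search_term:
        calculations = calculations.filter(
            Q(customer__chassis__icontains=search_term) |
            Q(customer__registration__icontains=search_term) |
            Q(customer__email__icontains=search_term) |
            Q(customer__firstname__icontains=search_term) |
            Q(customer__lastname__icontains=search_term))
    # build calculations_data from the queryset as in the original view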
Related links:
1. Lookups that span relationships
2. select_related

try this:
calculations = Calculations.objects.filter(user=request.user)
if customers:
    calculations = calculations.filter(customer__in=customers)
# filter() returns an empty queryset rather than raising DoesNotExist,
# so no try/except is needed here
for calculation in calculations:
    calculations_data.append({
        'calculation': calculation,
        'price': price_incl_vat(calculation.purchase_price),
        'customer_fullname': '{} {}'.format(calculation.customer.firstname, calculation.customer.lastname),
        'car_chassis': calculation.customer.chassis,
        'car_registration': calculation.customer.registration,
    })

Related

Elasticsearch behaviour

I have a question about the expected behaviour of Elasticsearch (version 7.9.1) that I'm having a hard time finding the answer to.
I query Elasticsearch with the help of the elasticsearch-dsl (version 7.3.0) library. My code is as follows:
item_search = ItemSearch(search, query_facets)
item_search = item_search[0:9999]
res = item_search.execute()
Here search is a search term for full-text search, and query_facets is a dictionary mapping fields to the terms in the fields.
The ItemSearch class looks like this:
class ItemSearch(FacetedSearch):
    doc_types = [ItemsIndex, ]
    size = 20
    facets = {'language': TermsFacet(field='language.raw', size=size)}

    def __init__(self, search, query_facets):
        super().__init__(search, query_facets)

    def search(self):
        s = super(ItemSearch, self).search()
        return s
The language field has many thousands of values, but I limited the return size to 20 since we never want to display more results than around that number anyway.
Now onto my actual question: I would expect that if I pass, for example, {'language': ["Dutch"]} to ItemSearch as the query_facets parameter, Elasticsearch returns the count for "Dutch" whether or not it belongs to the top 20 facet results. However, this is not the case. Is this the expected behaviour, or am I missing something? If it is expected, how can I achieve the result I'm after?

What is causing inefficiency when parsing this QuerySet into tuples?

In a Django app, I am attempting to parse a QuerySet, representing individual time-series values x from n sensors, into tuples (t, x1, x2 ... xn), and thence into a JSON object in the format specified by Google Charts here: https://developers.google.com/chart/interactive/docs/gallery/linechart
None values are used as placeholders if no value was logged for a given timestamp from a particular sensor.
The page load time is significant for a QuerySet with ~6500 rows (~3 seconds, run locally), and significantly longer on the server:
http://54.162.202.222/pulogger/simpleview/?device=test
Profiling indicates that 99.9% of the time is spent on _winapi.WaitForSingleObject (which I can't interpret), and manual profiling with a timer indicates that the server-side culprit is the while loop that iterates over the QuerySet and groups the values into tuples (see the view code below).
Results are as follows:
basic gets (took 5ms)
queried data (took 0ms)
split data by sensor (took 981ms)
prepared column labels/types (took 0ms)
prepared json (took 27ms)
created context (took 0ms)
For the sake of completeness, the timing function is as follows:
def print_elapsed_time(ref_datetime, description):
    # use total_seconds() so durations over one second are reported correctly;
    # .microseconds alone only holds the sub-second component
    elapsed_ms = floor((datetime.now() - ref_datetime).total_seconds() * 1000)
    print('{} (took {}ms)'.format(description, elapsed_ms))
    return datetime.now()
The code performing the processing and generating the view is as follows:
def simpleview(request):
    time_marker = datetime.now()
    device_name = request.GET['device']
    device = Datalogger.objects.get(device_name=device_name)
    sensors = Sensor.objects.filter(datalogger=device).order_by('pk')
    sensor_count = len(sensors)  # should be no worse than count() since already-evaluated and cached. todo: confirm

    # assign each sensor an index for the tuples (zero is used for time/x-axis)
    sensor_indices = {}
    for idx, sensor in enumerate(sensors, start=1):
        sensor_indices.update({sensor.sensor_name: idx})

    time_marker = print_elapsed_time(time_marker, 'basic gets')

    # process data into timestamp-grouped tuples accessible by sensor-index ([0] is timestamp)
    raw_data = SensorDatum.objects.filter(sensor__datalogger__device_name=device_name).order_by('timestamp', 'sensor')
    data = []
    data_idx = 0

    time_marker = print_elapsed_time(time_marker, 'queried data')

    while data_idx < len(raw_data):
        row_list = [raw_data[data_idx].timestamp]
        row_list.extend([None] * sensor_count)
        while data_idx < len(raw_data) and raw_data[data_idx].timestamp == row_list[0]:
            row_idx = sensor_indices.get(raw_data[data_idx].sensor.sensor_name)
            row_list[row_idx] = raw_data[data_idx].value
            data_idx += 1
        data.append(tuple(row_list))

    time_marker = print_elapsed_time(time_marker, 'split data by sensor')

    column_labels = ['Time']
    column_types = ["datetime"]
    for sensor in sensors:
        column_labels.append(sensor.sensor_name)
        column_types.append("number")

    time_marker = print_elapsed_time(time_marker, 'prepared column labels/types')

    gchart_json = prepare_data_for_gchart(column_labels, column_types, data)

    time_marker = print_elapsed_time(time_marker, 'prepared json')

    context = {
        'device': device_name,
        'sensor_count': sensor_count,
        'sensor_indices': sensor_indices,
        'gchart_json': gchart_json,
    }

    time_marker = print_elapsed_time(time_marker, 'created context')

    return render(request, 'pulogger/simpleTimeSeriesView.html', context)
I'm new to Python, so I expect there's a poor choice of operation or collection I've used somewhere. Unless I'm blind, it should run in O(n).
Obviously this isn't the whole problem since it only accounts for a part of the apparent load-time, but I figure this is a good place to start.
The "queried data" section is taking 0ms because that section is constructing the query, not executing your query against the database.
The query is being executed when it gets to this line: while data_idx < len(raw_data):, because to calculate the length of the iterable it must evaluate it.
So it may not be the loop that's taking most of the time, it's probably the query execution and evaluation. You can evaluate the query before the main loop by wrapping the queryset in a list(), this will allow your time_marker to display how long the query is actually taking to execute.
Do you need the queryset evaluated to model instances? Alternatively, you could use .values() or .values_list() to return plain values, which skips deserializing the query results into Model objects. It also means you no longer fetch every column from the database; you return only the ones you need.
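For example, a sketch of the .values_list() approach (field names taken from the view above); the list() wrapper also forces the query to execute at this point rather than inside the loop:
raw_data = list(
    SensorDatum.objects
    .filter(sensor__datalogger__device_name=device_name)
    .order_by('timestamp', 'sensor')
    .values_list('timestamp', 'sensor__sensor_name', 'value')
)
# each row is now a plain (timestamp, sensor_name, value) tuple,
# so the grouping loop can read values without touching the ORM again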
You could potentially remove the table join in the query SensorDatum.objects.filter(sensor__datalogger__device_name=device_name).order_by('timestamp', 'sensor') by denormalizing your schema (if possible) to put the device_name field on the Sensor model.
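That denormalization might look like the following sketch (field sizes and on_delete are assumptions, since the actual model definitions aren't shown):
class Sensor(models.Model):
    datalogger = models.ForeignKey(Datalogger, on_delete=models.CASCADE)
    sensor_name = models.CharField(max_length=100)
    # denormalized copy of datalogger.device_name, kept in sync on save,
    # so SensorDatum queries can filter on sensor__device_name without
    # joining through Datalogger
    device_name = models.CharField(max_length=100, db_index=True)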
You have queries running inside a loop. You can use select_related to fetch and cache the related objects up front.
Example:
raw_data = SensorDatum.objects.filter(
    sensor__datalogger__device_name=device_name
).order_by(
    'timestamp',
    'sensor'
).select_related('sensor')  # fetches and caches the sensor objects, preventing further db queries in the loop
Ref: select_related Django 2.1 Docs

Google's radarsearch API results

I'm trying to geolocate all the businesses related to a keyword in my city using, first, the radarsearch API to retrieve the Place IDs, and later the Places API to get more information about each Place ID (such as the name or the formatted address).
In my first approach I split my city into 9 circles, each with a radius of 22 km, avoiding rural zones where there are not supposed to be any businesses. This way I obtained (after removing duplicate results due to the overlapping circles) approximately 150 businesses. This result is not reliable, because the company's official webpage asserts there are 245.
To retrieve ALL the businesses, I then split my city into circles of radius 10 km. With approximately 50 pairs of coordinates I covered the city, now including all zones, both rural and non-rural. Surprisingly, I now obtain only 81 businesses! How is this possible?
I'm storing all the information in separate dictionaries, and I noticed that the amount of data in each dictionary grows as the radius increases, and is always the same for a fixed radius.
Apart from the previous question: is there any way to limit the number of results each request yields?
The code I'm using is the following:
dict1 = {}
radius = 20000
keyword = 'keyword'
key = YOUR_API_KEY
url_base = "https://maps.googleapis.com/maps/api/place/radarsearch/json?"
list_dicts = []
for i, (lo, la) in enumerate(zip(lon_txt, lat_txt)):
    url = url_base + 'location=' + str(lo) + ',' + str(la) + '&radius=' + str(radius) + '&keyword=' + keyword + '&key=' + key
    response = urllib2.urlopen(url)
    table = json.load(response)
    if table['status'] == 'OK':
        for j, line in enumerate(table['results']):
            temp = {j: line['place_id']}
            dict1.update(temp)
        list_dicts.append(dict1)
    else:
        pass
Finally I managed to solve this problem.
The issue was that the dict must be re-initialized in each loop iteration; otherwise the same dict object is updated and appended every time. Now it stores all the information, and I can retrieve what I wanted from the beginning.
dict1 = {}
radius = 20000
keyword = 'keyword'
key = YOUR_API_KEY
url_base = "https://maps.googleapis.com/maps/api/place/radarsearch/json?"
list_dicts = []
for i, (lo, la) in enumerate(zip(lon_txt, lat_txt)):
    url = url_base + 'location=' + str(lo) + ',' + str(la) + '&radius=' + str(radius) + '&keyword=' + keyword + '&key=' + key
    response = urllib2.urlopen(url)
    table = json.load(response)
    if table['status'] == 'OK':
        for j, line in enumerate(table['results']):
            temp = {j: line['place_id']}
            dict1.update(temp)
        list_dicts.append(dict1)
        dict1 = {}  # re-initialize so the next iteration starts a fresh dict
    else:
        pass
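The underlying cause is Python's reference semantics: list_dicts.append(dict1) stores a reference to the dict, not a copy, so without re-initialization every list entry ends up pointing at the same object. A minimal demonstration:
d = {}
collected = []
for i in range(2):
    d.update({i: i})
    collected.append(d)  # appends a reference, not a copy
print(collected)         # [{0: 0, 1: 1}, {0: 0, 1: 1}]
# rebinding d = {} inside the loop (as in the fixed code) makes each entry independent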

How to search in ManyToManyField

I am new to Django, and I am trying to query a many-to-many field.
In my models I have:
class Line(models.Model):
    name = models.CharField("Name of line", max_length=50, blank=True)

class Cross(models.Model):
    lines = models.ManyToManyField(Line, verbose_name="Lines crossed")
    date = models.DateField('Cross Date', null=True, blank=False)
I am implementing a search that queries all the crosses that have certain lines.
The query in the search box will look like: line_1, line_2, line_3
and the result should be all the crosses that have all of those lines (line_1, line_2, line_3).
I don't know what the filter condition should be:
all_crosses = Cross.objects.all().filter(???)
The view code:
def inventory(request):
    if request.method == "POST":
        if 'btn_search' in request.POST:
            if 'search_by_lines' in request.POST:
                lines_query = request.POST['search_by_lines']
                queried_lines = split_query(lines_query, ',')
                query = [Q(lines__name=l) for l in queried_lines]
                print(query)
                result = Cross.objects.filter(reduce(operator.and_, query))
Thank you very much
You should be able to do:
crosses = Cross.objects.filter(lines__name__in=['line_1', 'line_2', 'line_3'])
for any of the three values. If you're looking for all of the values that match, you'll need to use a Q object:
from django.db.models import Q
crosses = Cross.objects.filter(
    Q(lines__name='line_1') &
    Q(lines__name='line_2') &
    Q(lines__name='line_3')
)
There is at least one other approach you can use: chaining filters:
crosses = (Cross.objects.filter(lines__name='line_1')
                        .filter(lines__name='line_2')
                        .filter(lines__name='line_3'))
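Note that with multi-valued relations like a ManyToManyField, conditions combined inside a single filter() call must be satisfied by the same related row, while each chained filter() call may match a different related row. That is why the chained form can express "has all of these lines", and likely why the &-combined Q version does not behave as expected (see the update below).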
If you need to dynamically construct the Q objects, and assuming the "name" values are what you're posting:
import operator
from functools import reduce  # reduce is not a builtin in Python 3

lines = [Q(lines__name='{}'.format(line)) for line in request.POST.getlist('lines')]
crosses = Cross.objects.filter(reduce(operator.and_, lines))
[Update]
Turns out, I was dead wrong. I tried a couple of different ways of querying Cross objects where the value of lines matched all of the items searched. Q objects, annotations of counts on the number of objects contained... nothing worked as expected.
In the end, I ended up matching cross.lines as a list to the list of values posted.
In short, the search view I created matched in this fashion:
results = []
posted_lines = []
search_by_lines = 'search_by_lines' in request.POST.keys()
crosses = Cross.objects.all().prefetch_related('lines')

if request.method == 'POST' and search_by_lines:
    posted_lines = request.POST.getlist('line')
    for cross in crosses:
        if list(cross.lines.values_list('name', flat=True)) == posted_lines:
            results.append(cross)

return render(request, 'search.html', {'lines': lines, 'results': results,
                                       'posted_lines': posted_lines})
What I would probably do in this case is add a column on the Cross model that keeps a comma-separated list of the primary keys of the related lines, kept in sync via a signal.
With the additional field, you could query directly against the line values without joins.
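A sketch of that idea follows; line_keys is a hypothetical field name, and note that many-to-many updates fire the m2m_changed signal rather than post_save:
from django.db.models.signals import m2m_changed
from django.dispatch import receiver

class Cross(models.Model):
    lines = models.ManyToManyField(Line, verbose_name="Lines crossed")
    date = models.DateField('Cross Date', null=True, blank=False)
    # hypothetical denormalized field: comma-separated pks of related lines
    line_keys = models.CharField(max_length=255, blank=True, default='')

@receiver(m2m_changed, sender=Cross.lines.through)
def sync_line_keys(sender, instance, action, **kwargs):
    # rebuild the comma-separated pk list whenever the relation changes
    if action in ('post_add', 'post_remove', 'post_clear'):
        pks = instance.lines.order_by('pk').values_list('pk', flat=True)
        instance.line_keys = ','.join(str(pk) for pk in pks)
        instance.save(update_fields=['line_keys'])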

Dynamic queries in Elasticsearch and issue of keywords

I'm currently running into a problem trying to build dynamic queries for Elasticsearch in Python. To build a query I use the Q shortcut from elasticsearch_dsl. This is what I'm trying to implement:
...
s = Search(using=db, index="reestr")
condition = {"attr_1_": "value 1", "attr_2_": "value 2"}  # try to build query from this
must = []
for key in condition:
    must.append(Q('match', key=condition[key]))
But that in fact results in this condition:
[Q('match',key="value 1"),Q('match',key="value 2")]
However, what I want is:
[Q('match',attr_1_="value 1"),Q('match',attr_2_="value 2")]
IMHO, the way this library builds queries is not very flexible. I think this syntax:
Q("match", "attribute_name"="attribute_value")
would be much more powerful and make a lot more things possible than this one:
Q("match", attribute_name="attribute_value")
It seems as if it is impossible to dynamically build attribute names. Or, of course, it may be possible and I just don't know the right way to do it.
Suppose,
filters = {'condition1':['value1'],'condition2':['value3','value4']}
Code:
from elasticsearch_dsl import Q
from elasticsearch_dsl.query import Bool

filters = data['filters_data']
must_and = list()   # conditions that have only one value
should_or = list()  # conditions that have more than one value

for key in filters:
    if len(filters[key]) > 1:
        for item in filters[key]:
            should_or.append(Q("match", **{key: item}))
    else:
        must_and.append(Q("match", **{key: filters[key][0]}))

q1 = Bool(must=must_and)
q2 = Bool(should=should_or)
s = s.query(q1).query(q2)
result = s.execute()
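The key trick here is the **{key: item} unpacking: Python keyword-argument names can't be variables directly, but Q("match", **{"attr_1_": "value 1"}) is exactly equivalent to Q("match", attr_1_="value 1"), which makes dynamically built attribute names possible after all.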
One can also use terms, which directly accepts a list of values and removes the need for the inner for loop:
Code:
for key in filters:
    must_and.append(Q("terms", **{key: filters[key]}))

q1 = Bool(must=must_and)
