Django + Haystack + ElasticSearch advanced lookup

I'm having trouble resolving Haystack queries built with SQ objects. If I perform the same query with the Django ORM and its Q objects, everything works fine.
I can't figure out what I'm doing wrong, since the Haystack documentation states that SQ objects are similar to Q objects. Any help is much appreciated. Thanks!
Here's the code I have:
class PublicationSearch(object):
    def __init__(self, search_data):
        self.__dict__.update(search_data)

    def search_all_words(self, sq):
        if self.all_words:
            words = self.all_words.split()
            title_sq = SQ()
            full_text_sq = SQ()
            for word in words:
                title_sq = title_sq | SQ(title__icontains=word)
                full_text_sq = full_text_sq | SQ(full_text__icontains=word)
            keyword_sq = title_sq | full_text_sq
            sq = sq & keyword_sq
        return sq

class AdvancedPublicationForm(AdvancedPublicationBaseForm):
    def search(self):
        cleaned_data = super(AdvancedPublicationForm, self).clean()
        # if no query word was submitted, return an empty sqs
        if not any(cleaned_data.itervalues()):
            return self.no_query_found()
        results = self.build_results(cleaned_data)
        return results

    def build_results(self, search_data):
        sq = SQ()
        results = None
        searcher = PublicationSearch(search_data)
        for key in search_data.iterkeys():
            dispatch = getattr(searcher, 'search_%s' % key)
            sq = dispatch(sq)
        if sq and len(sq):
            results = SearchQuerySet().models(Publication).add(sq)
        else:
            results = []
        return results
For a sample of two words, the query looks like this:
(AND: (OR: (AND: ), ('title__icontains', u'casamento'), ('title__icontains', u'civil'), (AND: ), ('full_text__icontains', u'casamento'), ('full_text__icontains', u'civil')))
And the error returned:
Failed to query Elasticsearch using '( OR title:(casamento) OR title:(civil) OR OR full_text:(casamento) OR full_text:(civil))'

I managed to find a way: refactor the two lines inside the loop to call add() with an explicit OR connector, as below.
title_sq.add(SQ(title__icontains=word), SQ.OR)
full_text_sq.add(SQ(full_text__icontains=word), SQ.OR)
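The query dump in the question shows where the stray `OR` comes from: each chain starts from an empty `SQ()`, which survives as an empty `(AND: )` child. Another way to avoid that empty seed is to fold the real terms together directly. The following is a minimal, library-independent sketch; `combine_or` is a hypothetical helper, and it is only assumed to behave the same with `SQ` objects because they also support `|`.

```python
from functools import reduce
import operator


def combine_or(parts):
    """OR together every item in `parts`; return None if there are none."""
    parts = list(parts)
    if not parts:
        return None
    # reduce() starts from the first real item, so no empty seed is needed.
    return reduce(operator.or_, parts)


# Demonstrated with sets, which also implement `|`.
print(combine_or([{"casamento"}, {"civil"}]))
print(combine_or([]))
```

With Haystack this would presumably be called as `combine_or(SQ(title__icontains=w) for w in words)`, checking for `None` before combining the result with the outer query.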

Related

Django JSONField filtering Queryset where filter value is annotated sum value

How do I properly write the filter code so that it returns only the Animals that are not sold out?
I'm using a PostgreSQL database, Python 3.6 and Django 2.1.7 (v2.2a1 and v2.2b1 are currently available as pre-release versions).
My question is an extension of "Django JSONField filtering",
which filters on a hard-coded value in the filter.
My case requires an annotated value in the filter.
models.py (I know the models could be optimized, but I already have a huge number of records accumulated over more than three years):
from django.db import models
from django.contrib.postgres.fields import JSONField
from django.utils.translation import gettext_lazy as _  # provides _()

class Animal(models.Model):
    # On Django 2.1, JSONField comes from django.contrib.postgres, not models
    data = JSONField(verbose_name=_('data'), blank=True)

class Sell(models.Model):
    count = models.IntegerField(verbose_name=_('data'), blank=True)
    animal = models.ForeignKey('Animal',
                               on_delete=models.CASCADE,
                               related_name="sales_set",
                               related_query_name="sold"
                               )
In my API I want to return only the animals that still have something left to sell.
animal = Animal(data={'type': 'dog', 'breed': 'Husky', 'count': 20})
What I want to filter on should be similar to animal.data['count'] > sum(animal.sales_set__count):
(Animal.objects.annotate(animals_sold=Sum('sales_set__count'))
    .filter(data__contains=[{'count__gt': F('animals_sold')}]))
With the code above I get a builtins.TypeError:
TypeError: Object of type 'F' is not JSON serializable
If I remove the F, it doesn't filter on the value of animals_sold but on the literal text 'animals_sold', which doesn't help:
(Animal.objects.annotate(animals_sold=Sum('sales_set__count'))
    .filter(data__contains=[{'count__gt': 'animals_sold'}]))
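The TypeError above is consistent with the `contains` lookup JSON-encoding its right-hand side before it reaches PostgreSQL, so an expression object cannot appear inside it. Note also that `count__gt` inside the dict is just a literal JSON key, not a lookup. The sketch below reproduces the encoding failure using only the standard library; `FakeF` is a hypothetical stand-in, not Django's `F`.

```python
import json


class FakeF:
    """Hypothetical stand-in for django.db.models.F (not the real class)."""

    def __init__(self, name):
        self.name = name


def try_serialize(value):
    """Return the JSON text, or the TypeError message if encoding fails."""
    try:
        return json.dumps(value)
    except TypeError as exc:
        return "TypeError: %s" % exc


# A plain value encodes fine; an expression object does not.
print(try_serialize([{"count__gt": 5}]))
print(try_serialize([{"count__gt": FakeF("animals_sold")}]))
```

This suggests the comparison has to happen outside `contains`, by extracting the key as a column-like expression and comparing that with the annotation.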
Edit 1:
There is one more topic here that can be linked:
Postgres: values query on json key with django
Edit 2:
Here is some additional code with custom transform classes, as suggested in the related Django ticket:
from django.db.models.constants import LOOKUP_SEP
from django.db.models import F, Q, Prefetch, Sum
from django.db.models import IntegerField, FloatField, ExpressionWrapper
from django.db.models.functions import Cast
from django.contrib.postgres.fields import JSONField
from django.contrib.postgres.fields.jsonb import KeyTransform, KeyTextTransform

class KeyIntegerTransform(KeyTransform):  # similar to KeyTextTransform
    """ transform the data.count to integer """
    operator = '->>'
    nested_operator = '#>>'
    output_field = IntegerField()

class KeyIntTransformFactory:
    """ helper class for the JSONF() """
    def __init__(self, key_name):
        self.key_name = key_name

    def __call__(self, *args, **kwargs):
        return KeyIntegerTransform(self.key_name, *args, **kwargs)

class JSONF(F):
    """ for filtering on JSON Fields """
    def resolve_expression(self, query=None, allow_joins=True, reuse=None, summarize=False, for_save=False):
        rhs = super().resolve_expression(query, allow_joins, reuse, summarize, for_save)
        field_list = self.name.split(LOOKUP_SEP)
        for name in field_list[1:]:
            rhs = KeyIntegerTransform(name)(rhs)
        return rhs
Queryset filters I have tried so far:
q = q.filter(data__contains={'count__gt':JSONF('sold_count_sum')})
# err: Object of type 'JSONF' is not JSON serializable
q = q.filter(sold_count_sum__lt=Cast(JSONF('data_count'), IntegerField()))
# err: operator does not exist: text ->> unknown
q = q.filter(sold_count_sum__lt=Cast(JSONF('data__count'), IntegerField()))
# err: 'KeyIntegerTransform' takes exactly 1 argument (0 given)
q = q.filter(sold_count_sum__lt=KeyIntegerTransform('count', 'data'))
# err: operator does not exist: text ->> unknown
q = q.filter(sold_count_sum__lt=F('data__count'))
# err: operator does not exist: text ->> unknown
q = q.filter(sold_count_sum__lt=F('data_count'))
# err: operator does not exist: text ->> unknown
q = q.filter(sold_count_sum__lt=JSONF('data_count'))
# err: operator does not exist: text ->> unknown
q = q.filter(sold_count_sum__lt=JSONF('data__count'))
# err: 'KeyIntegerTransform' takes exactly 1 argument (0 given)
q = q.filter(sold_count_sum__lt=JSONF('data', 'count'))
# err: JSONF.__init__() takes 2 params

from django.db.models import Count  # in addition to the imports in the question

queryset = Animal.objects.annotate(
    json=Cast(F('data'), JSONField()),
    sold_count_sum=Sum('sold__count'),
    sold_times=Count('sold'),
).filter(
    Q(sold_times=0) | Q(sold_count_sum__lt=Cast(
        KeyTextTransform('count', 'json'), IntegerField())
    ),
    # keyword filtering here ...
    # client = client
)
This is what works for me, but it can probably be optimized with a good JSONF field.
We can also remove the json annotation and use a cast version of data directly (which may improve performance somewhat):
queryset = Animal.objects.annotate(
    sold_count_sum=Sum('sold__count'),
    sold_times=Count('sold'),
).filter(
    Q(sold_times=0) | Q(sold_count_sum__lt=Cast(
        KeyTextTransform('count', Cast(
            F('data'), JSONField())), IntegerField()
        )
    ),
    # keyword filtering here ...
    # client = client
)

How about something like this:
from django.db.models import Sum, F
from django.contrib.postgres.fields.jsonb import KeyTransform
Animal.objects.annotate(
    animals_sold=Sum('sales_set__count'),
    data_count=KeyTransform('count', 'data'),
).filter(data_count__gt=F('animals_sold'))

The F class doesn't support a JSONField at this time, but you might try making your own custom expression as described in the related ticket.

Python odata API

I'm building a Python class that will work with the DevExtreme OData data store.
Here is my ODataAPI class:
from sqlalchemy.orm import load_only
from casservices import db
import models

class ODataAPI:
    def __init__(self, model):
        self.model = model

    def get(self, cfg):
        """
        config options
        sort = string of "param_name desc" - if desc is missing, sort asc
        take = int limit amount of results
        skip = int number of items to skip ahead
        filter = e.g. (substringof('needle',name)) or (role eq 'needle') or (substringof('needle',email)) or (job eq 'needle') or (office eq 'needle')
        select = csv of entities to return
        """
        q = db.session.query(self.model)
        if cfg.get('$select') is not None:
            splt = cfg.get('$select').split(",")
            # options() returns a new query, so the result must be reassigned
            q = q.options(load_only(*splt))
        if cfg.get('$filter') is not None:
            pass  # NEED CODE HERE TO PARSE $filter
        if cfg.get('$orderby') is not None:
            splt = cfg.get('$orderby').split(" ")
            order_direction = "ASC"
            if len(splt) == 2 and splt[1] == 'desc':
                order_direction = "DESC"
            order_string = "%s.%s %s" % (self.model.__tablename__, splt[0], order_direction)
            q = q.order_by(order_string)
        if cfg.get('$top') is not None:
            q = q.limit(cfg.get('$top'))
        if cfg.get('$skip') is not None:
            q = q.offset(cfg.get('$skip'))
        items = q.all()
        total_items = db.session.query(self.model).count()
        data = {
            "d": {
                "__count": total_items,
                "results": [i.as_dict() for i in items]
            }
        }
        return data
How do I parse the following string into something I can use for filtering my set?
The GET parameter comes in like this:
$filter=(substringof('needle',name)) or (role eq 'needle') or (substringof('needle',email)) or (job eq 'needle') or (office eq 'needle')

I have come across a helpful OData filter parser, and it works with SQLAlchemy 🥳🎉
Example function I made with Flask-SQLAlchemy:
from odata_query.sqlalchemy import apply_odata_query

def get_countries(filter, page_number, per_page):
    # OData filter
    query = apply_odata_query(Country.query, filter)
    return paginate(query, page_number, per_page)
To call the function, you now just need to pass the filter string:
countries = get_countries("code eq 'ZWE'", 1, 10)
You can find the library here: https://github.com/gorilla-co/odata-query. The library is also extendable. 😁
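If adding a dependency is not an option, the first step of a hand-written parser is splitting the `$filter` string into tokens. The following is a minimal sketch of that step only, using the standard library; it covers just the subset seen in the sample filter (quoted strings, parentheses, commas, numbers, and bare words such as `eq`, `or`, `substringof`) and is not a full OData grammar. `TOKEN_RE` and `tokenize` are hypothetical names.

```python
import re

# Token patterns for the small subset of OData used in the sample filter.
TOKEN_RE = re.compile(
    r"""
    \s*(?:
        (?P<string>'(?:[^']|'')*')   # quoted literal; '' escapes a quote
      | (?P<paren>[()])              # grouping and call parentheses
      | (?P<comma>,)                 # argument separator
      | (?P<number>\d+(?:\.\d+)?)    # integer or decimal literal
      | (?P<word>[A-Za-z_][\w/]*)    # identifiers and operators (eq, or, ...)
    )
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Split a $filter string into (kind, value) pairs."""
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError("unexpected input at position %d" % pos)
        kind = match.lastgroup
        tokens.append((kind, match.group(kind)))
        pos = match.end()
    return tokens


sample = "(substringof('needle',name)) or (role eq 'needle')"
for kind, value in tokenize(sample):
    print(kind, value)
```

A recursive-descent pass over these tokens could then build SQLAlchemy conditions, for example mapping `eq` to `==` and `substringof` to a `contains` call, which is the part the library above handles for you.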

How to use whoosh for searching keywords

Article Schema:
Below is the article schema that I have created.
class ArticleSchema(SchemaClass):
    title = TEXT(
        phrase=True, sortable=True, stored=True,
        field_boost=2.0, spelling=True, analyzer=StemmingAnalyzer())
    keywords = KEYWORD(
        commas=True, field_boost=1.5, lowercase=True)
    authors = KEYWORD(stored=True, commas=True, lowercase=True)
    content = TEXT(spelling=True, analyzer=StemmingAnalyzer())
    summary = TEXT(spelling=True, analyzer=StemmingAnalyzer())
    published_time = DATETIME(stored=True, sortable=True)
    permalink = STORED
    thumbnail = STORED
    article_id = ID(unique=True, stored=True)
    topic = TEXT(spelling=True, stored=True)
    series_id = STORED
    tags = KEYWORD(commas=True, lowercase=True)
Search Query:
FIELD_TIME = 'published_time'
FIELD_TITLE = 'title'
FIELD_PUBLISHER = 'authors'
FIELD_KEYWORDS = 'keywords'
FIELD_CONTENT = 'content'
FIELD_TOPIC = 'topic'

def search_query(search_term=None, page=1, result_len=10):
    '''Search the provided query.'''
    if not search_term or search_term == '':
        return None, 0
    if not index.exists_in(INDEX_DIR, indexname=INDEX_NAME):
        return None, 0
    ix = get_index()
    parser = qparser.MultifieldParser(
        [FIELD_TITLE, FIELD_PUBLISHER, FIELD_KEYWORDS, FIELD_TOPIC],
        ix.schema)
    query = parser.parse(search_term)
    query.normalize()
    search_results = []
    with ix.searcher() as searcher:
        results = searcher.search_page(
            query,
            pagenum=page,
            pagelen=result_len,
            sortedby=[sorting_timestamp, scores],
            reverse=True,
            terms=True
        )
        if results.scored_length() > 0:
            for hit in results:
                search_results.append(append_to(hit))
            return (search_results, results.pagecount)
    parser = qparser.MultifieldParser(
        [FIELD_TITLE, FIELD_PUBLISHER, FIELD_TOPIC],
        ix.schema, termclass=FuzzyTerm)
    parser.add_plugin(qparser.FuzzyTermPlugin())
    query = parser.parse(search_term)
    query.normalize()
    search_results = []
    with ix.searcher() as searcher:
        results = searcher.search_page(
            query,
            pagenum=page,
            pagelen=result_len,
            sortedby=[sorting_timestamp, scores],
            reverse=True,
            terms=True
        )
        if results.scored_length() > 0:
            for hit in results:
                search_results.append(append_to(hit))
            return (search_results, results.pagecount)
    return None, 0
When I try it, searching by title works, but searching by author or keyword does not. I can't understand what I am doing wrong here. I get the data from an API and then run the indexing, and that all works fine. But when I search on the KEYWORD fields (authors and keywords), I get no results.

Both authors and keywords are of type KEYWORD, which does not support phrase search. This means that you should search with the exact keyword or one of its derivatives, since you are using a stemmer.
For authors, I think you should use TEXT.
From the Whoosh documentation:
whoosh.fields.KEYWORD
This type is designed for space- or comma-separated keywords. This
type is indexed and searchable (and optionally stored). To save space,
it does not support phrase searching.
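To make the consequence concrete, here is a small standard-library imitation of how a comma-separated, lowercased keyword field is assumed to split its input: one indexed term per comma-separated entry, with inner spaces kept. It does not call Whoosh; `comma_keyword_terms` and `matches` are hypothetical helpers and the author names are made up.

```python
def comma_keyword_terms(value):
    """Imitate KEYWORD(commas=True, lowercase=True): one term per entry."""
    return [part.strip().lower() for part in value.split(",") if part.strip()]


def matches(indexed_value, query_term):
    """A term query only matches a whole indexed term, never a fragment."""
    return query_term.lower() in comma_keyword_terms(indexed_value)


authors = "Jane Doe, John Smith"
print(comma_keyword_terms(authors))    # ['jane doe', 'john smith']
print(matches(authors, "john smith"))  # True
print(matches(authors, "john"))        # False
```

Under that assumption, a query parser that splits "John Smith" into the two words "john" and "smith" would find neither in the authors field, which matches the behaviour described in the question.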

Filtering objects in Django based on optional arguments

Many times I find myself writing code similar to:
query = MyModel.objects.all()
if request.GET.get('filter_by_field1'):
    query = query.filter(field1=True)
if request.GET.get('filter_by_field2'):
    query = query.filter(field2=False)
field3_filter = request.GET.get('field3')
if field3_filter is not None:
    query = query.filter(field3=field3_filter)
if field4_filter:
    query = query.filter(field4=field4_filter)
# etc...
return query
Is there a better, more generic way of building queries such as the one above?

If the only things that are ever going to be in request.GET are potential query arguments, you could do this:
query = MyModel.objects.filter(**request.GET)
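Passing the request parameters straight through lets a client filter on any field, and the values may arrive as lists rather than single strings. A more defensive variant is to map a fixed set of allowed parameters to lookups first. The following is a minimal sketch with a plain dict standing in for request.GET; `build_filters`, `ALLOWED`, and the parameter names are hypothetical.

```python
def build_filters(params, allowed):
    """Keep only whitelisted, non-empty parameters as filter kwargs.

    `allowed` maps a request parameter name to the lookup it may set.
    """
    filters = {}
    for param, lookup in allowed.items():
        value = params.get(param)
        if value not in (None, ""):
            filters[lookup] = value
    return filters


# Hypothetical parameter names; a plain dict stands in for request.GET.
ALLOWED = {"field3": "field3", "q": "field4__icontains"}
params = {"field3": "abc", "q": "", "is_staff": "1"}
print(build_filters(params, ALLOWED))  # {'field3': 'abc'}
```

In a view this would presumably be used as `MyModel.objects.filter(**build_filters(request.GET, ALLOWED))`, so adding a new optional filter means adding one entry to the mapping.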

Union and Intersect in Django

class Tag(models.Model):
    name = models.CharField(max_length=100)

class Blog(models.Model):
    name = models.CharField(max_length=100)
    tags = models.ManyToManyField(Tag)
Simple models, just to ask my question.
I wonder how I can query blogs using tags in these different ways:
Blog entries that are tagged with "tag1" or "tag2":
Blog.objects.filter(tags__in=[1, 2]).distinct()
Blog objects that are tagged with "tag1" and "tag2": ?
Blog objects that are tagged with exactly "tag1" and "tag2" and nothing else: ??
Tag and Blog are just used as an example.

You could use Q objects for #1:
# Blogs that have either the hockey or the django tag.
from django.db.models import Q
Blog.objects.filter(
    Q(tags__name__iexact='hockey') | Q(tags__name__iexact='django')
)
Unions and intersections are, I believe, a bit outside the scope of the Django ORM, but it's possible to do these. The following examples are from a Django application called django-tagging that provides this functionality.
For part two, you're looking for a union of two queries, basically. See line 346 of models.py:
def get_union_by_model(self, queryset_or_model, tags):
    """
    Create a ``QuerySet`` containing instances of the specified
    model associated with *any* of the given list of tags.
    """
    tags = get_tag_list(tags)
    tag_count = len(tags)
    queryset, model = get_queryset_and_model(queryset_or_model)
    if not tag_count:
        return model._default_manager.none()
    model_table = qn(model._meta.db_table)
    # This query selects the ids of all objects which have any of
    # the given tags.
    query = """
    SELECT %(model_pk)s
    FROM %(model)s, %(tagged_item)s
    WHERE %(tagged_item)s.content_type_id = %(content_type_id)s
      AND %(tagged_item)s.tag_id IN (%(tag_id_placeholders)s)
      AND %(model_pk)s = %(tagged_item)s.object_id
    GROUP BY %(model_pk)s""" % {
        'model_pk': '%s.%s' % (model_table, qn(model._meta.pk.column)),
        'model': model_table,
        'tagged_item': qn(self.model._meta.db_table),
        'content_type_id': ContentType.objects.get_for_model(model).pk,
        'tag_id_placeholders': ','.join(['%s'] * tag_count),
    }
    cursor = connection.cursor()
    cursor.execute(query, [tag.pk for tag in tags])
    object_ids = [row[0] for row in cursor.fetchall()]
    if len(object_ids) > 0:
        return queryset.filter(pk__in=object_ids)
    else:
        return model._default_manager.none()
For part #3 I believe you're looking for an intersection. See line 307 of models.py:
def get_intersection_by_model(self, queryset_or_model, tags):
    """
    Create a ``QuerySet`` containing instances of the specified
    model associated with *all* of the given list of tags.
    """
    tags = get_tag_list(tags)
    tag_count = len(tags)
    queryset, model = get_queryset_and_model(queryset_or_model)
    if not tag_count:
        return model._default_manager.none()
    model_table = qn(model._meta.db_table)
    # This query selects the ids of all objects which have all the
    # given tags.
    query = """
    SELECT %(model_pk)s
    FROM %(model)s, %(tagged_item)s
    WHERE %(tagged_item)s.content_type_id = %(content_type_id)s
      AND %(tagged_item)s.tag_id IN (%(tag_id_placeholders)s)
      AND %(model_pk)s = %(tagged_item)s.object_id
    GROUP BY %(model_pk)s
    HAVING COUNT(%(model_pk)s) = %(tag_count)s""" % {
        'model_pk': '%s.%s' % (model_table, qn(model._meta.pk.column)),
        'model': model_table,
        'tagged_item': qn(self.model._meta.db_table),
        'content_type_id': ContentType.objects.get_for_model(model).pk,
        'tag_id_placeholders': ','.join(['%s'] * tag_count),
        'tag_count': tag_count,
    }
    cursor = connection.cursor()
    cursor.execute(query, [tag.pk for tag in tags])
    object_ids = [row[0] for row in cursor.fetchall()]
    if len(object_ids) > 0:
        return queryset.filter(pk__in=object_ids)
    else:
        return model._default_manager.none()

I've tested these out with Django 1.0:
The "or" queries:
Blog.objects.filter(tags__name__in=['tag1', 'tag2']).distinct()
or you could use the Q class:
Blog.objects.filter(Q(tags__name='tag1') | Q(tags__name='tag2')).distinct()
The "and" query:
Blog.objects.filter(tags__name='tag1').filter(tags__name='tag2')
I'm not sure about the third one, you'll probably need to drop to SQL to do it.
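For reference, the three cases in the question correspond to three different set comparisons between a blog's tags and the wanted tags. The sketch below spells them out in plain Python as a specification to test any ORM or SQL version against; the blog names and tags are hypothetical, and this is not itself a database query.

```python
def any_of(tags, wanted):
    """True if the blog has at least one wanted tag (union-style match)."""
    return bool(set(tags) & set(wanted))


def all_of(tags, wanted):
    """True if the blog has every wanted tag, possibly among others."""
    return set(wanted) <= set(tags)


def exactly(tags, wanted):
    """True if the blog has the wanted tags and nothing else."""
    return set(tags) == set(wanted)


# Hypothetical blogs and their tag names.
blogs = {"a": ["tag1"], "b": ["tag1", "tag2"], "c": ["tag1", "tag2", "tag3"]}
wanted = ["tag1", "tag2"]
for check in (any_of, all_of, exactly):
    print(check.__name__, [name for name, tags in blogs.items() if check(tags, wanted)])
```

The third case differs from the second only in also requiring that the total number of tags on the blog equals the number of wanted tags.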

Please don't reinvent the wheel; use the django-tagging application, which was made exactly for your use case. It can do all the queries you describe, and much more.
If you need to add custom fields to your Tag model, you can also take a look at my branch of django-tagging.

This will do the trick for you:
Blog.objects.filter(tags__name__in=['tag1', 'tag2']).annotate(tag_matches=models.Count('tags')).filter(tag_matches=2)
