Django PostgreSQL - Fuzzy Searching on single words

I'm using Django's built-in trigram_similar lookup and TrigramSimilarity class to implement fuzzy searching on a PostgreSQL database, and I use it to filter articles by title. The problem is that the filter compares the search term against the whole title. I also want it to match on parts of the title.
Example:
Title of Article: "We are the world | This is the best article ever".
My search: "we"
In this example, my function returns nothing, but I want it to return this article. How can I do that?
This is the code I use:
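from django.contrib.postgres.search import TrigramSimilarity
from django.db.models.functions import Greatest
def search(qs, term, fields=("title",), treshold=.3):
    # Several fields: rank each row by its best trigram similarity
    if len(fields) >= 2:
        return qs.annotate(
            similarity=Greatest(
                *[TrigramSimilarity(field, term) for field in fields]
            )
        ).filter(similarity__gte=treshold).order_by("-similarity")
    # One field: use the trigram_similar lookup (pg_trgm's default threshold)
    return qs.filter(
        **{"{}__trigram_similar".format(fields[0]): term}
    )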

I think the problem is that you are using trigram similarity with a two-letter word. If you try it with a three-letter word, it may work.
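A rough sketch of how trigram similarity is computed may help explain the empty result. It assumes pg_trgm's documented rules (lowercase the text, split it into alphanumeric words, pad each word with two leading spaces and one trailing space, then divide the shared trigrams by all distinct trigrams). The helper names trigrams and similarity are made up for illustration, and this is an approximation, not the extension's actual code.

```python
import re

def trigrams(text):
    """Approximate pg_trgm's trigram set for a string (assumed rules, see above)."""
    result = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two leading spaces, one trailing space
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result

def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

title = "We are the world | This is the best article ever"
score = similarity("we", title)
print(round(score, 3))  # well below the 0.3 threshold used in the question
```

Because the denominator grows with every extra word in the title, a short term scores low against a long title even when it matches one word exactly. If your Django version is 4.0 or later, TrigramWordSimilarity and the trigram_word_similar lookup compare the term against the best-matching part of the string instead, which may be closer to what the question asks for.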

Related

Difference between title and title_icontains in a Django model filter

What is the difference between title and title_icontains in Django?
from .model import product
product.objects.filter(title='blah')
product.objects.filter(title__icontains='blah')

Probably it is title__icontains=… so with two consecutive underscores (__). In that case you make use of the __icontains lookup [Django-doc]. As the documentation says, this is a:
Case-insensitive containment test.
It thus looks for Products where the title contains blah, for example fooblah, blahfoo, or fooblahbar. It does this in a case-insensitive manner, so products with FooBlah, BLAHfoo and FooBlAHBAR as title will also be retained.
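As a minimal sketch, the two lookups behave roughly like the following plain-Python checks. The real comparison is done by the database, so collation and Unicode details can differ; the helper names exact and icontains are only illustrative.

```python
def exact(value, needle):
    """Rough equivalent of filter(title='...'): the whole value must match."""
    return value == needle

def icontains(value, needle):
    """Rough equivalent of filter(title__icontains='...'): substring, ignoring case."""
    return needle.lower() in value.lower()

titles = ["blah", "FooBlah", "BLAHfoo", "FooBlAHBAR", "foobar"]
exact_hits = [t for t in titles if exact(t, "blah")]
icontains_hits = [t for t in titles if icontains(t, "blah")]
print(exact_hits)
print(icontains_hits)
```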

The first form, ...filter(title='value'), returns all objects whose title matches the value exactly.
The second form, correctly written as ...filter(title__icontains='value'), returns all objects whose title contains the value, regardless of upper or lower case.
The i here means "ignore case".

title__icontains looks for the string anywhere in the title, ignoring case.

How do I write a Django query that finds words in a Postgres column?

I'm using Django and Python 3.7. How do I scan for words in a Django query? A word is a string surrounded by whitespace (or the beginning or end of a line). I have this ...
import operator
from functools import reduce
from django.db.models import Q
def get_articles_with_words_in_titles(self, long_words):
    qset = Article.objects.filter(reduce(operator.or_, (Q(title__icontains=x) for x in long_words)))
    result = set(list(qset))
but if "long_words" contains things like ["about", "still"], it will match Articles whose titles have things like "whereabouts" or "stillborn". Any idea how to modify my query to incorporate word boundaries?

If your database is PostgreSQL, I suggest trying its full text search. Django has a built-in module for it:
from django.contrib.postgres.search import SearchVector, SearchQuery
search_vector = SearchVector('title')
# Combine with | to match either word; use & to require both
search_query = SearchQuery('about') | SearchQuery('still')
Article.objects.annotate(
    search=search_vector
).filter(
    search=search_query
)

Try iregex or regex:
# Article.objects.filter(title__iregex=r"\y(still|about)\y")
words = "|".join(long_words)
Article.objects.filter(title__iregex=fr"\y({words})\y")
This should work on PostgreSQL, where \y matches only at the beginning or end of a word.
Django documentation:
https://docs.djangoproject.com/en/2.2/ref/models/querysets/#iregex
Python's regular expression documentation for word boundaries:
https://docs.python.org/3.7/library/re.html#index-26
PostgreSQL's documentation on word boundaries:
https://www.postgresql.org/docs/9.1/functions-matching.html#POSIX-CONSTRAINT-ESCAPES-TABLE
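Note that \y is PostgreSQL's word-boundary escape, while Python's re module spells the same idea \b. The sketch below shows the intended matching behaviour in plain Python, without a database. The function name titles_with_words and the sample titles are invented for illustration, and the words are escaped with re.escape in case they contain regex metacharacters.

```python
import re

def titles_with_words(titles, words):
    """Keep titles that contain any of the words as a whole word, ignoring case."""
    alternation = "|".join(re.escape(w) for w in words)
    pattern = re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)
    return [t for t in titles if pattern.search(t)]

sample = [
    "All about Django",
    "His whereabouts are unknown",
    "Still waters",
    "A stillborn idea",
]
matches = titles_with_words(sample, ["about", "still"])
print(matches)
```

Here "whereabouts" and "stillborn" are skipped because the search words are not delimited by a word boundary on both sides.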

Django SearchVector using icontains

I am trying to search for a list of values in multiple columns in postgres (via django). I was able to use SearchQuery and SearchVector and this works great if one of the search values matches a full word. I was hoping to use icontains so that partial strings could also be used in the search. Is this possible and if so could someone point me in the right direction. Here is an example of my approach below.
Example Data:
Superhero.objects.create(
    superhero='Batman',
    publisher='DC Comics',
    alter_ego='Bruce Wayne',
)
Superhero.objects.create(
    superhero='Hulk',
    publisher='Marvel Comics',
    alter_ego='Bruce Banner',
)
Django filter:
from django.contrib.postgres.search import SearchQuery, SearchVector
query = SearchQuery('man') | SearchQuery('Bruce')
vector = SearchVector('superhero', 'alter_ego', 'publisher')
queryset = queryset.annotate(search=vector).filter(search=query)
This would return the Hulk record, but I am hoping I can somehow use something like icontains so that searching for 'man' also returns the Batman record. Any help is appreciated!

You can apply icontains to the filter like:
queryset = queryset.annotate(search=vector).filter(search__icontains=query)

SearchQuery and SearchVector are part of Django's full text search functionality, and it doesn't look like you can achieve what I wanted with them. I have taken a different approach, thanks to Julien Phalip's post here: https://www.julienphalip.com/blog/adding-search-to-a-django-site-in-a-snap/

searching textfield for each keyword

Right now my views.py function allows me to search each Django textfield for the ENTIRE search phrase. However if I would like to search for a title and author, such as, "biology John", my queryset will end up empty, and I am not sure how to break up the phrase and search for individual words.
def search(request):
    query = request.GET.get('search')
    if query:
        results = Protocol.objects.filter(
            Q(title__contains=query) | Q(author__contains=query)
            | Q(description__contains=query) | Q(reagents__contains=query)
            | Q(protocol_steps__contains=query)
        )
    else:
        results = ''
    return render(request, 'protocat_app/search_protocols.html', {'results': results})

You can definitely set up Haystack with Solr or Elasticsearch. But if you still need a plain database query, you can use the following:
import operator
from functools import reduce
from django.db.models import Q
def search(request):
    terms = request.GET.get('search', '').split(' ')
    q_list = []
    for term in terms:
        if term:
            # Match each individual term (not the whole phrase) against every field
            q_list.append(Q(title__contains=term))
            q_list.append(Q(author__contains=term))
            q_list.append(Q(description__contains=term))
            q_list.append(Q(reagents__contains=term))
            q_list.append(Q(protocol_steps__contains=term))
    if q_list:
        # A row matches if any term appears in any field
        results = Protocol.objects.filter(reduce(operator.or_, q_list))
    else:
        results = ''
    return render(request, 'protocat_app/search_protocols.html', {'results': results})
Hope this helps :)

I would suggest setting up a search server like Solr or Elasticsearch.
Here you have just one case where you need to split the query across two fields. Later you may run into a situation where you need to search across multiple indexed fields, and Solr would be of great help there.
You can read about Solr here.
Also, you can use django-haystack to make Django interact with Solr and get results filtered according to what the user searched for.

Efficient Django full-text search without Haystack

What's the next best option for database-agnostic full-text search for Django without Haystack?
I have a model like:
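from django.db import models
class Paper(models.Model):
    title = models.CharField(max_length=1000)
class Person(models.Model):
    name = models.CharField(max_length=100)
class PaperReview(models.Model):
    # on_delete is required from Django 2.0 onward
    paper = models.ForeignKey(Paper, on_delete=models.CASCADE)
    person = models.ForeignKey(Person, on_delete=models.CASCADE)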
I need to search for papers by title and reviewer name, but I also want to search from the perspective of a person and find which papers they have and haven't reviewed. With Haystack, it's trivial to implement a full-text index to search by title and name fields, but as far as I can tell, there's no way to do the "left outer join" necessary to find papers without a review by a specific person.

Haystack is just a wrapper that exposes a few different search engine backends:
Solr
ElasticSearch
Whoosh
Xapian
There might be other backends as well available as plugins.
So the real question here is, is there a search backend that gives me the desired functionality, and does haystack expose that functionality?
The answer is that you can probably use Elasticsearch*, but note the asterisk.
Generally, when creating a search index, it's a good idea to think about the documents the way you would when designing a non-relational database: you want those documents to be as flat as possible.
So one possibility might be to have an array of char fields on a paperreview index. The array would contain all of the related foreign key references.
Another might be to use "nested documents" in elasticsearch.
And lastly, to use "parent/child documents" in elasticsearch.
You can still use haystack for indexing, with some hacking, but you will probably want to use one of the raw backends directly, such as pyelasticsearch or pyes.
http://www.elasticsearch.org/guide/reference/mapping/nested-type/
http://www.elasticsearch.org/guide/reference/mapping/parent-field/
http://pyelasticsearch.readthedocs.org/en/latest/
http://pyes.readthedocs.org/en/latest/

I know this question is old, but I spent some time investigating this recently (and answered it here as well). It is actually not too hard to implement yourself, so I wanted to share.
I found that the SearchVector/SearchQuery approach does not catch all cases, for example partial words (see https://www.fusionbox.com/blog/detail/partial-word-search-with-postgres-full-text-search-in-django/632/ for reference). You can implement your own without much trouble, depending on your constraints.
Example, within a viewset's get_queryset method:
import re
from functools import reduce
from django.db.models import Q
...other params...
search_terms = self.request.GET.get('q')
# Start from the unfiltered queryset so qs is defined when there is no search
qs = YourModel.objects.all()
if search_terms:
    # Replace possible delimiters and other characters that could interfere
    cleaned_terms = re.sub(r'[!\'()|&;,]', ' ', search_terms).strip()
    if cleaned_terms:
        # Each word may match any property (OR across fields);
        # every word has to match somewhere (AND across words)
        q = reduce(
            lambda p, n: p & n,
            map(
                lambda word:
                    Q(your_property__icontains=word)
                    | Q(second_property__icontains=word)
                    | Q(third_property__icontains=word),
                cleaned_terms.split()
            )
        )
        qs = qs.filter(q)
return qs
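The matching rule in the snippet above (every word must appear in at least one field) can be tried out without a database. The following is only a plain-Python sketch of that rule; clean_terms, matches_all_words and the sample records are hypothetical names and data, and the real filtering would still be done by the ORM.

```python
import re

def clean_terms(raw):
    """Replace delimiter characters with spaces and split into words."""
    return re.sub(r"[!'()|&;,]", " ", raw).split()

def matches_all_words(record, words, fields):
    """True if every word occurs, ignoring case, in at least one of the fields."""
    return all(
        any(word.lower() in record[field].lower() for field in fields)
        for word in words
    )

records = [
    {"title": "Intro to biology", "author": "John Smith"},
    {"title": "Biology basics", "author": "Jane Doe"},
]
words = clean_terms("biology, John")
hits = [r for r in records if matches_all_words(r, words, ("title", "author"))]
print(hits)
```

Only the first record is kept, because "John" does not appear in any field of the second one. Combining the per-word conditions with OR instead of AND would return both.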

I use Haystack with Elasticsearch and so far it's working pretty well. I don't think it's trivial. You can easily implement your requirement if there's an association between paper and person.

I ended up using djorm-ext-pgfulltext, which provides a simple Django interface for PostgreSQL's built-in full text search features.
