Django Full Text Search Not Matching Partial Words - python

I'm using Django Full Text search to search across multiple fields but have an issue when searching using partial strings.
Let's say we have a report object with the name 'Sample Report'.
vector = SearchVector('name') + SearchVector('author__username')
search = SearchQuery('Sa')
Report.objects.exclude(visible=False).annotate(search=vector).filter(search=search)
The resulting QuerySet is empty, but if I search for the full word 'Sample' then the report appears in the QuerySet.
Is there any way to use icontains or prefix matching with Django full text search?

This is working on Django 1.11:
tools = Tool.objects.annotate(
    search=SearchVector('name', 'description', 'expert__user__username'),
).filter(search__icontains=form.cleaned_data['query_string'])
Note the icontains in the filter.
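Why the full word matches but 'Sa' does not can be illustrated with a plain-Python analogy. This is only a sketch: the helper names are made up for illustration, and real PostgreSQL full text search also stems words and drops stop words, which is ignored here.

```python
def lexeme_match(document, term):
    """Whole-token match, roughly how a plain full text query behaves."""
    return term.lower() in document.lower().split()


def prefix_match(document, term):
    """Match tokens that start with the term (prefix search)."""
    return any(tok.startswith(term.lower()) for tok in document.lower().split())


def substring_match(document, term):
    """Match the term anywhere in the text, like icontains."""
    return term.lower() in document.lower()


name = 'Sample Report'
print(lexeme_match(name, 'Sa'))      # False
print(lexeme_match(name, 'Sample'))  # True
print(prefix_match(name, 'Sa'))      # True
print(substring_match(name, 'mpl'))  # True
```

As far as I know, on Django 2.2 and later SearchQuery also accepts a search_type argument, so SearchQuery('Sa:*', search_type='raw') should give a real prefix search; that option is not available on 1.11.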

@santiagopim's solution is correct, but to address Matt's comment, you may get the following error:
ERROR: function replace(tsquery, unknown, unknown) does not exist at character 1603
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
In that case, remove the call to SearchQuery and just use a plain string.
I know this doesn't address the underlying issue if you really need SearchQuery, but if, like me, you just need a quick fix, you can try the following.
vector = SearchVector('name') + SearchVector('author__username')
# NOTE: I commented out the line below
# search = SearchQuery('Sa')
search = 'Sa'
Report.objects.exclude(visible=False).annotate(search=vector)\
    .filter(search__icontains=search)
This other answer might be helpful.

Related

Django/Postgres - No function matches the given name and argument types

I'm trying to create a search system in my Django and Postgresql project but I keep running into an error when I try to make a query.
Whenever I try these commands in the shell:
vector = SearchVector('title','tags')
query = SearchQuery('book') | SearchQuery('harry')
My_Library.objects.annotate(similarity=TrigramSimilarity(vector, query),).filter(similarity__gt=0.3).order_by('-similarity')
I get the error:
"No function matches the given name and argument types. You might need to add explicit type casts."
I've been testing other options for a while, and the only way I can successfully pass a search query without an error is by using two strings in the place of query and vector.
My_Library.objects.annotate(similarity=TrigramSimilarity('title','my search query'),).filter(similarity__gt=0.3).order_by('-similarity')
This will successfully pass my search with no error.
Why am I getting this error, and how can I fix it?
I've been basing my code off of this Full Text Search documentation
TrigramSimilarity compares text: it takes a field name (or another text expression) and a search string as its arguments.
You're trying to pass it a SearchVector and a SearchQuery, and that won't work.
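To see why the function wants plain text on both sides, here is a rough pure-Python approximation of trigram similarity. It is a sketch only: the padding and word-splitting rules are my reading of how pg_trgm behaves, and the function names are invented for this example.

```python
import re


def trigrams(text):
    """Return the set of trigrams of a string, pg_trgm style (approximate)."""
    result = set()
    for word in re.findall(r'\w+', text.lower()):
        padded = '  ' + word + ' '  # two leading spaces, one trailing
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(similarity('book', 'book'))   # 1.0
print(similarity('word', 'words'))  # 4 shared of 7 distinct trigrams
```

Both inputs are reduced to sets of three-character pieces, which is why a SearchVector or SearchQuery (already parsed into lexemes) is not a meaningful argument.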
If you want to search by multiple terms, you probably need to annotate one similarity per term, combine the conditions with a | and then sort on similarity, something like:
from django.contrib.postgres.search import TrigramSimilarity
from django.db.models import Q
My_Library.objects.annotate(
    book_similarity=TrigramSimilarity('title', 'book'),
    harry_similarity=TrigramSimilarity('title', 'harry'),
).filter(
    Q(book_similarity__gt=0.3) | Q(harry_similarity__gt=0.3)
).order_by('-book_similarity', '-harry_similarity')
More details on Q
https://docs.djangoproject.com/en/1.11/ref/models/querysets/#q-objects

Django domain + regex parameter not working on production machine

I currently have a django view with a fairly simple search function (takes user input, returns a list of objects). For usability, I'd like the option of passing search parameters via URL like so:
www.example.com/search/mysearchstring
Where mysearchstring is the input to the search function. I'm using regex to validate any alphanumeric or underscore characters.
The problem I'm having is that while this works perfectly in my development environment, it breaks on the live machine.
Currently, I am using this exact same method (with different regex patterns) in other django views without any issues. This leads me to believe that either:
1) My regex is truly bad (more likely)
2) There is a difference in regex validators between environments (less likely)
The machine running this is using django 1.6 and python 2.7, which are slightly behind my development machine, but not significantly.
urls.py
SEARCH_REGEX = '(?P<pdom>\w*)?'
urlpatterns = patterns('',
    ....
    url(r'^polls/search/' + SEARCH_REGEX, 'polls.views.search'),
    ...)
which is passed to the view like this:
views.py
def search(request, pdom):
    ...
When loading up the page, I get the following error:
ImproperlyConfigured: "^polls/search/(?P<pdom>\w*)?" is not a valid regular expression: nothing to repeat
I've been scratching my head over this one for a while. I've attempted to use a few different methods of encapsulation around the expression with no change in results. Would appreciate any insight!
I would change it to this:
SEARCH_REGEX = r'(?P<pdom>.+)$'
It's usually a good idea to use raw strings (r'') for regular expressions in Python.
The group will match the entire content of the search part of your URL. I would handle validation of the search string in the view instead of in the URL regex. If someone requests polls/search/two+words, you should not return a 404 but a 400 status with an error message explaining that the search string was malformed.
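A minimal sketch of that split between routing and validation, using only the standard re module so it can be run outside Django. The helper name, the messages, and the status-code tuple are assumptions for illustration; in a real view you would return the matching HttpResponse instead.

```python
import re

# Catch-all pattern as suggested above: capture everything after the prefix.
URL_PATTERN = re.compile(r'^polls/search/(?P<pdom>.+)$')
# Validation rule from the question: letters, digits and underscores only.
VALID_TERM = re.compile(r'^\w+$')


def handle_search(path):
    """Return (status_code, payload) for a request path."""
    match = URL_PATTERN.match(path)
    if match is None:
        return 404, 'not found'
    term = match.group('pdom')
    if not VALID_TERM.match(term):
        return 400, 'malformed search string: %r' % term
    return 200, term


print(handle_search('polls/search/mysearchstring'))  # (200, 'mysearchstring')
print(handle_search('polls/search/two+words'))       # (400, "malformed search string: 'two+words'")
print(handle_search('polls/other'))                  # (404, 'not found')
```

Keeping the URL pattern permissive and validating afterwards means a bad search term gets an explanatory 400 rather than an unexplained 404.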
Finally, you might want to follow the common convention for search URLs, which is to use a query parameter called q. Your URL pattern would then be ^polls/search/$, and you just handle q in the view using something like this:
def search_page_view(request):
    query_string = request.GET.get('q', '')

Django queryset exclude regex

I have a command that filters through requests, and I need to extract some of them following two rules.
It should
include '^https?:\/\/[^.]*\.?site\.co([^?]*[?]).*utm_.*$'
or
exclude '^https?:\/\/[^.]*\.?site\.([^\/]+\/)*'
So, working out a possible SQL representation, I came up with:
exclude (
matching '^https?:\/\/[^.]*\.?site\.([^\/]+\/)*'
and
not matching '^https?:\/\/[^.]*\.?site\.co([^?]*[?]).*utm_.*$'
)
Which translates in Django to:
.exclude(
    Q(referer__iregex=r'^https?:\/\/[^.]*\.?site\.co([^?]*[?]).*utm_.*$') &
    Q(referer__not_iregex=r'^https?://[^.]*\.?site\.[^/]+/?[\?]*$'))
But unfortunately, the __not_iregex lookup doesn't exist. What could be a workaround for this?
You could in fact use filter for the part which you don't want to exclude:
queryset
    .filter(referer__iregex=r'^https?://[^.]*\.?site\.[^/]+/?[\?]*$')
    .exclude(referer__iregex=r'^https?:\/\/[^.]*\.?site\.([^\/]+\/)*')
So here your "matching" pattern goes into exclude and your "not matching" pattern goes into filter.
Or you could use ~Q if you really want to imitate what you have in the SQL representation:
.exclude(
    Q(referer__iregex=r'^https?:\/\/[^.]*\.?site\.([^\/]+\/)*') &
    ~Q(referer__iregex=r'^https?://[^.]*\.?site\.[^/]+/?[\?]*$'))
# notice use of ~ here
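The rule from the question (drop a referer that matches the exclude pattern unless it also matches the include pattern) can be checked in plain Python before wiring it into a queryset. This is a sketch only: it uses Python's re module, whose dialect may differ from your database's regex engine, the sample URLs are invented, and is_kept is a made-up helper.

```python
import re

# The two patterns from the question, compiled case-insensitively like iregex.
INCLUDE = re.compile(r'^https?:\/\/[^.]*\.?site\.co([^?]*[?]).*utm_.*$', re.I)
EXCLUDE = re.compile(r'^https?:\/\/[^.]*\.?site\.([^\/]+\/)*', re.I)


def is_kept(referer):
    """Mirror .exclude(Q(EXCLUDE) & ~Q(INCLUDE)) for a single value."""
    excluded = bool(EXCLUDE.search(referer)) and not INCLUDE.search(referer)
    return not excluded


print(is_kept('https://www.site.co/page?utm_source=x'))  # True
print(is_kept('https://www.site.co/page'))               # False
print(is_kept('https://example.org/'))                   # True
```

Chaining .filter(A).exclude(B), by contrast, keeps only rows that match A and do not match B, so it is worth checking which of the two behaviours you need.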

Django-Haystack with Solr contains search

I am using haystack within a project using solr as the backend. I want to be able to perform a contains search, similar to the Django .filter(something__contains="...")
The __startswith option does not suit our needs as it, as the name suggests, looks for words that start with the string.
I tried to use something like *keyword* but Solr does not allow the * to be used as the first character
Thanks.
To get "contains" functionality you can use:
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100" side="back"/>
<filter class="solr.LowerCaseFilterFactory" />
as index analyzer.
This will create ngrams for every whitespace separated word in your field. For example:
"Index this!" => x, ex, dex, ndex, index, !, s!, is!, his!, this!
As you see this will expand your index greatly but if you now enter a query like:
"nde*"
it will match "ndex" giving you a hit.
Use this approach carefully to make sure that your index doesn't get too large. If you increase minGramSize or decrease maxGramSize, the index will not expand as much, but the "contains" functionality is reduced. For instance, setting minGramSize="3" will require at least 3 characters in your contains query.
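To make the expansion concrete, here is a small Python sketch that mimics what the analyzer chain above produces. It is an approximation written for illustration (the function name is made up), not Solr's actual implementation.

```python
def back_edge_ngrams(text, min_size=1, max_size=100):
    """Whitespace-tokenize, lowercase, and emit n-grams anchored at each word's end."""
    grams = []
    for word in text.lower().split():
        longest = min(max_size, len(word))
        for size in range(min_size, longest + 1):
            grams.append(word[-size:])
    return grams


grams = back_edge_ngrams('Index this!')
print(grams)
# ['x', 'ex', 'dex', 'ndex', 'index', '!', 's!', 'is!', 'his!', 'this!']

# A prefix query such as "nde*" hits because some stored gram starts with "nde".
print(any(g.startswith('nde') for g in grams))  # True

# With min_size=3, one- and two-character grams are no longer indexed.
print(back_edge_ngrams('Index', min_size=3))  # ['dex', 'ndex', 'index']
```

Each word of length n contributes up to n entries, which is where the index growth comes from.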
You can achieve the same behavior without having to touch the solr schema. In your index, make your text field an EdgeNgramField instead of a CharField. Under the hood this will generate a similar schema to what lindstromhenrik suggested.
I am using an expression like:
.filter(something__startswith='...')
.filter_or(name=''+s'...')
as it seems Solr does not like expressions like '...*' on their own, but combined with an or they work.
None of the answers here do a real substring search *keyword*.
They don't find a keyword that is part of a bigger string (neither a prefix nor a suffix).
Using EdgeNGramFilterFactory or an EdgeNgramField in the indexes can only do a "startswith" or an "endswith" type of filtering.
The solution is to use a NgramField like this:
class MyIndex(indexes.SearchIndex, indexes.Indexable):
    ...
    field_to_index = indexes.NgramField(model_attr='field_name')
    ...
This is very elegant, because you don't need to manually add anything to the schema.xml
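The difference between edge n-grams and full n-grams can be sketched in a few lines of Python. The gram sizes below are arbitrary values chosen for the example, not Haystack's or Solr's defaults, and the function names are made up.

```python
def front_edge_ngrams(word, min_size=3, max_size=5):
    """N-grams anchored at the start of the word (prefix matching only)."""
    return [word[:n] for n in range(min_size, min(max_size, len(word)) + 1)]


def ngrams(word, min_size=3, max_size=5):
    """Every substring of the word whose length is within the size bounds."""
    return [
        word[i:i + n]
        for n in range(min_size, min(max_size, len(word)) + 1)
        for i in range(len(word) - n + 1)
    ]


print(front_edge_ngrams('keyword'))           # ['key', 'keyw', 'keywo']
print('ywo' in front_edge_ngrams('keyword'))  # False: not a prefix
print('ywo' in ngrams('keyword'))             # True: an inner substring
```

Full n-grams index every inner substring, so they cost more space than edge n-grams in exchange for true "contains" matching.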

Search range of int values using djapian

I'm using djapian as my search backend, and I'm looking to search for a range of values. For example:
query = 'comments:(0..10)'
Post.indexer.search(query)
would search for Posts with between 0 and 10 comments. I cannot find a way to do this in djapian, though I have found this issue and a patch that implements some kind of date range searching. I also found this page from the official xapian docs describing some kind of range query. However, I lack the knowledge to formulate my own raw xapian query and/or feed a raw xapian query into djapian. So help me, SO: how can I query a djapian index for a range of int values?
Thanks,
Laurie
Ok, I worked it out. I'll leave the answer here for posterity.
The first thing to do is to attach a NumberValueRangeProcessor to the QueryParser. You can do this by extending the djapian Indexer._get_query_parser. Note the leading underscore. Below is a code snippet showing how I did it.
from djapian import Indexer
from xapian import NumberValueRangeProcessor
class RangeIndexer(Indexer):
    def _get_query_parser(self, *args, **kwargs):
        query_parser = Indexer._get_query_parser(self, *args, **kwargs)
        valno = self.free_values_start_number + 0
        nvrp = NumberValueRangeProcessor(valno, 'value_range:', True)
        query_parser.add_valuerangeprocessor(nvrp)
        return query_parser
Lines to note:
valno = self.free_values_start_number + 0
self.free_values_start_number is an int used as the value number: it is the index of the first column where fields start being defined. I added 0 to it to show where you should add the index of the field you want the range search to apply to.
nvrp = NumberValueRangeProcessor(valno, 'value_range:', True)
We send valno to tell the processor what field to deal with. The 'value_range:' indicates the prefix for the processor, so we can search by saying 'value_range:(0..100)'. The True simply indicates that the 'value_range:' should be treated as a prefix not a suffix.
query_parser.add_valuerangeprocessor(nvrp)
This simply adds the NumberValueRangeProcessor to the QueryParser.
Hope that helps anyone who has any problems with this matter. Note that you will need to add a new NumberValueRangeProcessor for each field you want to be able to range search.
