Django-Haystack with Solr contains search - python

I am using Haystack within a project using Solr as the backend. I want to be able to perform a "contains" search, similar to Django's .filter(something__contains="...").
The __startswith option does not suit our needs as it, as the name suggests, looks for words that start with the string.
I tried to use something like *keyword*, but Solr does not allow the * to be used as the first character of a query.
Thanks.

To get "contains" functionality you can use:
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100" side="back"/>
<filter class="solr.LowerCaseFilterFactory" />
as index analyzer.
This will create n-grams for every whitespace-separated word in your field. For example:
"Index this!" => x, ex, dex, ndex, index, !, s!, is!, his!, this!
As you can see, this will expand your index greatly, but if you now enter a query like:
"nde*"
it will match "ndex", giving you a hit.
Use this approach carefully to make sure that your index doesn't get too large. If you increase minGramSize or decrease maxGramSize, it will not expand the index as much, but it will reduce the "contains" functionality. For instance, setting minGramSize="3" will require at least 3 characters in your contains query.

You can achieve the same behavior without having to touch the Solr schema. In your index, make your text field an EdgeNgramField instead of a CharField. Under the hood this will generate a similar schema to what lindstromhenrik suggested.
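For example, a minimal search_indexes.py sketch (the model and field names are illustrative, not from the question):
from haystack import indexes
from myapp.models import Report

class ReportIndex(indexes.SearchIndex, indexes.Indexable):
    # EdgeNgramField generates the n-gram analyzer in the Solr schema for you,
    # so prefix-style "contains" queries match without editing schema.xml by hand.
    text = indexes.EdgeNgramField(document=True, use_template=True)
    name = indexes.EdgeNgramField(model_attr='name')

    def get_model(self):
        return Report

After changing the field type, rebuild the generated schema and reindex (./manage.py build_solr_schema and ./manage.py rebuild_index).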

I am using an expression like:
.filter(something__startswith='...')
.filter_or(name=''+s'...')
as it seems Solr does not like expressions like '...*', but combined with OR this works.

None of the answers here do a real substring search *keyword*.
They don't find a keyword that is part of a bigger string (neither a prefix nor a suffix).
Using EdgeNGramFilterFactory or the EdgeNgramField in the indexes can only do a "startswith" or an "endswith" type of filtering.
The solution is to use an NgramField, like this:
from haystack import indexes

class MyIndex(indexes.SearchIndex, indexes.Indexable):
    ...
    field_to_index = indexes.NgramField(model_attr='field_name')
    ...
This is very elegant, because you don't need to manually add anything to the schema.xml.
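A minimal usage sketch (the model name and query value are illustrative): once the field is an NgramField, a plain filter matches substrings anywhere in the text.
from haystack.query import SearchQuerySet

# 'eywor' sits in the middle of 'keyword', so this only matches with n-gram indexing.
results = SearchQuerySet().models(MyModel).filter(field_to_index='eywor')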

Related

Python parsing update statements using regex

I'm trying to find a regex in Python that will be able to handle most of the UPDATE queries that I throw at it from my DB. I can't use sqlparse or any other libraries that may be useful for this; I can only use Python's built-in modules, or cx_Oracle in case it has a method I'm not aware of that could do something like this.
Most update queries look like this:
UPDATE TABLE_NAME SET COLUMN_NAME=2, OTHER_COLUMN=to_date('31-DEC-2020 23:59:59','DD-MON-YYYY HH24:MI:SS'), COLUMN_STRING='Hello, thanks for your help', UPDATED_BY=-100 WHERE CODE=9999;
Most update queries I use have a version of these types of updates. The output has to be a list including each separate SQL keyword (UPDATE, SET, WHERE), each separate update statement (e.g. COLUMN_NAME=2) and the final identifier (CODE=9999).
Ideally, the result would look something like this:
list = ['UPDATE', 'TABLE_NAME', 'SET', 'COLUMN_NAME=2', 'OTHER_COLUMN=("31-DEC-2020 23:59:59","DD-MON-YYYY HH24:MI:SS")', "COLUMN_STRING='Hello, thanks for your help'", 'UPDATED_BY=-100', 'WHERE', 'CODE=9999']
Initially I tried doing this using a string.split() splitting on the spaces, but when dealing with one of my slightly more complex queries like the one above, the split method doesn't deal well with string updates such as the one I'm trying to make in COLUMN_STRING or those in OTHER_COLUMN due to the blank spaces in those updates.
Let's use the shlex module:
import shlex

test = "UPDATE TABLE_NAME SET COLUMN_NAME=2, OTHER_COLUMN=to_date('31-DEC-2020 23:59:59','DD-MON-YYYY HH24:MI:SS'), COLUMN_STRING='Hello, thanks for your help', UPDATED_BY=-100 WHERE CODE=9999;"
t = shlex.split(test)
At this point the comma delimiters and the trailing semicolon are still attached to some tokens, so we can strip them:
# Rebinding the loop variable would not change the list, so build a new one instead.
t = [token[:-1] if token[-1] in (',', ';') else token for token in t]
If we print every element of that list we'll get:
UPDATE
TABLE_NAME
SET
COLUMN_NAME=2
OTHER_COLUMN=to_date(31-DEC-2020 23:59:59,DD-MON-YYYY HH24:MI:SS)
COLUMN_STRING=Hello, thanks for your help
UPDATED_BY=-100
WHERE
CODE=9999
Not a fully generic answer, but I hope it serves the purpose.
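Since the question asks for a regex, here is a minimal re-based sketch as an alternative (it assumes, as in the example, that values use single quotes with no escaped quotes inside them):
import re

query = "UPDATE TABLE_NAME SET COLUMN_NAME=2, OTHER_COLUMN=to_date('31-DEC-2020 23:59:59','DD-MON-YYYY HH24:MI:SS'), COLUMN_STRING='Hello, thanks for your help', UPDATED_BY=-100 WHERE CODE=9999;"

# A token is a contiguous run of non-space, non-quote characters and/or complete
# single-quoted strings, so spaces inside quotes do not split a token.
tokens = re.findall(r"(?:[^\s']+|'[^']*')+", query)
# Strip trailing comma/semicolon delimiters.
tokens = [tok.rstrip(',;') for tok in tokens]
Unlike the shlex version, this keeps the original quote characters inside each token, which is closer to the output the question asks for.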

Django Full Text Search Not Matching Partial Words

I'm using Django Full Text search to search across multiple fields but have an issue when searching using partial strings.
Let's say we have a report object with the name 'Sample Report'.
vector = SearchVector('name') + SearchVector('author__username')
search = SearchQuery('Sa')
Report.objects.exclude(visible=False).annotate(search=vector).filter(search=search)
The resulting QuerySet is empty, but if I include the full word 'Sample' then the report will appear in the QuerySet.
Is there any way to use icontains or prefixing with Django full text search?
This is working on Django 1.11:
tools = Tool.objects.annotate(
    search=SearchVector('name', 'description', 'expert__user__username'),
).filter(search__icontains=form.cleaned_data['query_string'])
Note the icontains in the filter.
@santiagopim's solution is correct, but to address Matt's comment: if you get the following error:
ERROR: function replace(tsquery, unknown, unknown) does not exist at character 1603
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
You have to remove the call to SearchQuery and just use a plain string.
I know this doesn't address the underlying issue if you need to use SearchQuery, but if you are like me and just need a quick fix, you can try the following.
vector = SearchVector('name') + SearchVector('author__username')
# NOTE: I commented out the line below
# search = SearchQuery('Sa')
search = 'Sa'
Report.objects.exclude(visible=False).annotate(search=vector)\
    .filter(search__icontains=search)
This other answer might be helpful.

mysql 'LIKE' in Django python

I am a little new to Django.
My question is: how do I do MySQL's %LIKE% in a Django filter?
I want something like this:
myModel.objects.filter(myField__like="xyz")
as we can do
myModel.objects.filter(myField__startswith="xyz")
for strings that start with 'xyz', but I want to match anywhere in the myField content.
What I know:
it can be done with regex and .extra(), but I want something very straightforward.
Thanks in advance.
You can do it like this:
myModel.objects.filter(myField__contains="xyz")
Note: __contains is case sensitive. You can use __icontains if you don't care about the case of the text.
Use the contains lookup, my_model.objects.filter(my_field__contains='xyz'), or icontains if you want case insensitivity.
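If you want to confirm that this maps to a LIKE clause, you can print the generated SQL (a rough sketch; str(queryset.query) only approximates the final statement):
qs = myModel.objects.filter(myField__icontains='xyz')
print(qs.query)
# On the MySQL backend this ends in something like: ... WHERE myField LIKE '%xyz%'
# __contains uses LIKE BINARY '%xyz%' on MySQL, which is why it is case sensitive there.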

Django, SQLite - Accurate ordering of strings with accented letters

Main problem:
I have a Python (3.4) Django (1.6) web app using an SQLite (3) database containing a table of authors. When I get the ordered list of authors, some names with accented characters like ’Čapek’ and ’Örkény’ are at the end of the list instead of at (or directly after) section ’c’ and ’o’ of the list.
My 1st try:
SQLite can accept collation definitions. I searched for one that was made to order UTF-8 strings correctly, for example Localized and Unicode collation in Android (Accented Search in sqlite (android)), but found none.
My 2nd try: I found an old closed Django ticket about my problem: https://code.djangoproject.com/ticket/8384 It suggests sorting with Python as a workaround. I found it quite unsatisfying. Firstly, if I sort with a Python method (like below) instead of ordering at the model level, I cannot use generic views. Secondly, ordering with a Python method returns the very same result as the SQLite order_by does: ’Čapek’ and ’Örkény’ are placed after section 'z'.
author_list = sorted(Author.objects.all(), key=lambda x: (x.lastname, x.firstname))
How could I get the queryset ordered correctly?
Thanks to the link CL wrote in his comment, I managed to overcome the difficulties that I replied about. I'm answering my own question to share the piece of code that worked, because using pyuca to sort querysets seems to be a rare and undocumented case.
# import section
from pyuca import Collator
# Calling Collator() takes some seconds so you should create it as reusable variable.
c = Collator()
# ...
# main part:
author_list = sorted(Author.objects.all(), key=lambda x: (c.sort_key(x.lastname), c.sort_key(x.firstname)))
The point is to use the sort_key method with the attribute you want to sort by as its argument. You can sort by multiple attributes, as you see in the example.
Last words: In my language (Hungarian) we use four different accented version of the Latin letter ‘o’: ‘o’, ’ó’, ’ö’, ’ő’. ‘o’ and ‘ó’ are equal in sorting, and ‘ö’ and ‘ő’ are equal too, and ‘ö’/’ő’ are after ‘o’/’ó’. In the default collation table the four letters are equal. Now I try to find a way to define or find a localized collation table.
You could create a new field in the table, fill it with the result of unidecode, then sort according to it.
Using a property to provide get/set methods could help in keeping the fields in sync.
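A minimal sketch of that idea (field names are illustrative; it uses the third-party unidecode package and a save() override rather than a property, but the intent is the same):
from unidecode import unidecode
from django.db import models

class Author(models.Model):
    firstname = models.CharField(max_length=100)
    lastname = models.CharField(max_length=100)
    # ASCII-folded copy used only for ordering.
    lastname_ascii = models.CharField(max_length=100, editable=False, db_index=True)

    def save(self, *args, **kwargs):
        # Keep the sort column in sync whenever the name changes.
        self.lastname_ascii = unidecode(self.lastname)
        super().save(*args, **kwargs)

Ordering then stays at the database level (Author.objects.order_by('lastname_ascii')), so generic views keep working, though it folds 'ó' and 'o' together rather than applying true Hungarian collation.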

Python: Need to replace a series of different substrings in HTML template with additional HTML or database results

Situation:
I am writing a basic templating system in Python/mod_python that reads in a main HTML template and replaces instances of ":value:" throughout the document with additional HTML or db results and then returns it as a view to the user.
I am not trying to replace all instances of 1 substring. Values can vary. There is a finite list of what's acceptable. It is not unlimited. The syntax for the values is [colon]value[colon]. Examples might be ":gallery: , :related: , :comments:". The replacement may be additional static HTML or a call to a function. The functions may vary as well.
Question:
What's the most efficient way to read in the main HTML file and replace the unknown combination of values with their appropriate replacement?
Thanks in advance for any thoughts/solutions,
c
There are dozens of templating options that already exist. Consider genshi, mako, jinja2, django templates, or more.
You'll find that you're reinventing the wheel with little/no benefit.
If you can't use an existing templating system for whatever reason, your problem seems best tackled with regular expressions:
import re

valre = re.compile(r':\w+:')

def dosub(correspvals, correspfuns, lastditch):
    def f(value):
        v = value.group()[1:-1]
        if v in correspvals:
            return correspvals[v]
        if v in correspfuns:
            return correspfuns[v]()  # or whatever args you need
        # what if a value has neither a corresponding value to
        # substitute, NOR a function to call? Whatever...:
        return lastditch(v)
    return f

replacer = dosub(adict, another, somefun)
thehtml = valre.sub(replacer, thehtml)
Basically you'll need two dictionaries (one mapping values to corresponding values, another mapping values to corresponding functions to be called) and a function to be called as a last-ditch attempt for values that can't be found in either dictionary; the code above shows you how to put these things together (I'm using a closure, a class would of course do just as well) and how to apply them for the required replacement task.
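For instance, a self-contained usage sketch (the dictionaries, fallback function and template string are illustrative, not from the answer):
correspvals = {'related': '<div class="related">related items</div>'}
correspfuns = {'gallery': lambda: '<div class="gallery">gallery markup</div>'}

def lastditch(name):
    # Unknown :value: markers are simply dropped in this sketch.
    return ''

replacer = dosub(correspvals, correspfuns, lastditch)
print(valre.sub(replacer, 'Intro :gallery: then :related: and :unknown: end'))
# Intro <div class="gallery">gallery markup</div> then <div class="related">related items</div> and  end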
This is probably a job for a templating engine, and for Python there are a number of choices. In this Stack Overflow question people have listed their favourites and some helpfully explain why: What is your single favorite Python templating engine?
