Django queryset exclude regex

I have a command that filters through requests, and I need to extract some of them following two rules.
It should
include '^https?:\/\/[^.]*\.?site\.co([^?]*[?]).*utm_.*$'
or
exclude '^https?:\/\/[^.]*\.?site\.([^\/]+\/)*'
So, working out a possible SQL representation, I came up with:
exclude (
matching '^https?:\/\/[^.]*\.?site\.([^\/]+\/)*'
and
not matching '^https?:\/\/[^.]*\.?site\.co([^?]*[?]).*utm_.*$'
)
Which translates to Django as:
.exclude(
Q(referer__iregex=r'^https?:\/\/[^.]*\.?site\.co([^?]*[?]).*utm_.*$') &
Q(referer__not_iregex=r'^https?://[^.]*\.?site\.[^/]+/?[\?]*$'))
But unfortunately, the __not_iregex lookup doesn't exist. What could be a workaround for this?

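You could in fact use filter for the part which you don't want to exclude:
(queryset
 .filter(referer__iregex=r'^https?://[^.]*\.?site\.[^/]+/?[\?]*$')
 .exclude(referer__iregex=r'^https?:\/\/[^.]*\.?site\.([^\/]+\/)*'))
So here your matching pattern goes into exclude and your not-matching pattern goes into filter. Note that chaining filter and exclude combines the two conditions with AND, so this keeps only rows that match the first pattern and do not match the second, which is stricter than the "exclude (matching and not matching)" form in the question.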
Or you could use ~Q if you really want to mirror what you have in the SQL representation:
.exclude(
Q(referer__iregex=r'^https?:\/\/[^.]*\.?site\.([^\/]+\/)*') &
~Q(referer__iregex=r'^https?://[^.]*\.?site\.[^/]+/?[\?]*$'))
# notice use of ~ here
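
To check the logic of the ~Q form without a database, the sketch below replays it with Python's re module. It is only an approximation: the database regex engine may differ from Python's in details, NULL handling is ignored, the sample URLs are made up, and the second pattern is the "utm_" one from the question's rules rather than the one used in the snippet above.

```python
import re

# Patterns copied from the question; re.IGNORECASE stands in for __iregex.
SITE = re.compile(r'^https?://[^.]*\.?site\.([^/]+/)*', re.IGNORECASE)
UTM = re.compile(r'^https?://[^.]*\.?site\.co([^?]*[?]).*utm_.*$', re.IGNORECASE)


def kept(referer):
    """Mirror .exclude(Q(site) & ~Q(utm)): drop rows matching SITE but not UTM."""
    excluded = bool(SITE.search(referer)) and not UTM.search(referer)
    return not excluded


# Hypothetical referers, one per case.
samples = [
    "http://www.site.co/page?utm_source=mail",  # on the site, has utm_: kept
    "http://www.site.co/page",                  # on the site, no utm_: dropped
    "http://example.org/",                      # not on the site: kept
]
result = [url for url in samples if kept(url)]
```

In other words, a row survives when it does not match the site pattern or when it matches the utm_ pattern, which is the "include ... or exclude ..." rule from the question.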

Related

Django SearchVector using icontains

I am trying to search for a list of values in multiple columns in Postgres (via Django). I was able to use SearchQuery and SearchVector, and this works great if one of the search values matches a full word. I was hoping to use icontains so that partial strings could also be used in the search. Is this possible, and if so, could someone point me in the right direction? An example of my approach is below.
Example Data:
Superhero.objects.create(
superhero='Batman',
publisher='DC Comics',
alter_ego='Bruce Wayne',
)
Superhero.objects.create(
superhero='Hulk',
publisher='Marvel Comics',
alter_ego='Bruce Banner',
)
Django filter:
from django.contrib.postgres.search import SearchQuery, SearchVector
query = SearchQuery('man') | SearchQuery('Bruce')
vector = SearchVector('superhero', 'alter_ego', 'publisher')
queryset = queryset.annotate(search=vector).filter(search=query)
This would return the Hulk record, but I am hoping I can somehow use something like icontains so that when searching for 'man' the Batman record would also be returned. Any help is appreciated!

You can apply icontains to the filter, passing a plain string rather than a SearchQuery object:
queryset = queryset.annotate(search=vector).filter(search__icontains='man')

SearchQuery and SearchVector are part of Django's full-text search functionality, and it doesn't look like you can achieve what I wanted with them. I have taken a different approach, thanks to Julien Phalip's write-up here: https://www.julienphalip.com/blog/adding-search-to-a-django-site-in-a-snap/
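
A rough, database-free sketch of the difference discussed in this thread: full-text search compares whole word tokens, while icontains looks for the term anywhere inside a field. The two helper functions are hypothetical stand-ins, and real PostgreSQL full-text search also applies stemming and stop words, which this ignores.

```python
import re

# The two example records from the question, as plain dicts.
records = [
    {"superhero": "Batman", "publisher": "DC Comics", "alter_ego": "Bruce Wayne"},
    {"superhero": "Hulk", "publisher": "Marvel Comics", "alter_ego": "Bruce Banner"},
]
FIELDS = ("superhero", "alter_ego", "publisher")


def whole_word_match(record, term):
    """Stand-in for full-text search: the term must equal a whole token."""
    tokens = set()
    for field in FIELDS:
        tokens.update(re.findall(r"\w+", record[field].lower()))
    return term.lower() in tokens


def substring_match(record, term):
    """Stand-in for icontains: the term may appear anywhere in a field."""
    return any(term.lower() in record[field].lower() for field in FIELDS)


word_hits = [r["superhero"] for r in records if whole_word_match(r, "man")]
substring_hits = [r["superhero"] for r in records if substring_match(r, "man")]
```

Here word_hits is empty because no field contains the standalone word "man", while substring_hits contains Batman.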

Django Full Text Search Not Matching Partial Words

I'm using Django Full Text search to search across multiple fields but have an issue when searching using partial strings.
Let's say we have a report object with the name 'Sample Report'.
vector = SearchVector('name') + SearchVector('author__username')
search = SearchQuery('Sa')
Report.objects.exclude(visible=False).annotate(search=vector).filter(search=search)
The resulting QuerySet is empty, but if I search for the full word 'Sample' the report appears in the QuerySet.
Is there any way to use icontains or prefix matching with Django full-text search?

This is working on Django 1.11:
tools = Tool.objects.annotate(
search=SearchVector('name', 'description', 'expert__user__username'),
).filter(search__icontains=form.cleaned_data['query_string'])
Note the icontains in the filter.

@santiagopim's solution is correct, but to address Matt's comment: if you get the following error:
ERROR: function replace(tsquery, unknown, unknown) does not exist
at character 1603 HINT: No function matches the given name
and argument types. You might need to add explicit type casts.
You have to remove the call to SearchQuery and just use a plain string.
I know this doesn't address the underlying issue if you need to use SearchQuery, but if you are like me and just need a quick fix, you can try the following.
vector = SearchVector('name') + SearchVector('author__username')
# NOTE: I commented out the line below
# search = SearchQuery('Sa')
search = 'Sa'
Report.objects.exclude(visible=False).annotate(search=vector)\
    .filter(search__icontains=search)
This other answer might be helpful.

mysql 'LIKE' in Django python

I am a little new to Django.
My question is: how do I do MySQL's LIKE '%...%' in a Django filter?
I want something like this:
myModel.objects.filter(myField__like="xyz")
as we can do
myModel.objects.filter(myField__startswith="xyz")
for strings that start with 'xyz', but I want to match anywhere in the myField content.
What I know:
it can be done with a regex and .extra(), but I want something very straightforward.
Thanks in advance.

You can do it like this:
myModel.objects.filter(myField__contains="xyz")
Note: __contains is case sensitive. You can use __icontains if you don't care about the case of the text.

Use the contains lookup, my_model.objects.filter(my_field__contains='xyz'), or icontains if you want case insensitivity.
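
For reference, a small sketch of the SQL pattern these lookups correspond to, run with the standard-library sqlite3 module instead of Django. The table and column names are made up for illustration. SQLite's LIKE is case-insensitive for ASCII by default, so this particular run behaves like __icontains; other backends may differ.

```python
import sqlite3

# In-memory database with a hypothetical table standing in for the model.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_model (my_field TEXT)")
conn.executemany(
    "INSERT INTO my_model (my_field) VALUES (?)",
    [("abcxyzdef",), ("XYZ",), ("nothing",)],
)


def like_anywhere(term):
    """Return the values containing term, using LIKE '%term%'."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT my_field FROM my_model WHERE my_field LIKE ?", (pattern,)
    )
    return sorted(row[0] for row in rows)


matches = like_anywhere("xyz")
```

Both "abcxyzdef" and "XYZ" are returned here because of SQLite's case-insensitive LIKE. The sketch does not escape % or _ inside the search term.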

How to use FILTER to select data that does not match

I am using rdflib in Python and running SPARQL SELECT queries to get relevant data.
It is very easy to filter data on some criteria using the FILTER command, like FILTER regex(?pname,'"""+samplepersnalisedexpertise+"""',"i") shown below, but if I have to select data that does not match, how should I use FILTER? I have tried using FILTER (?personuri != '"""+imURI+"""') below, but that does not work.
exprtppl= GraphR.query("""
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX bibo: <http://purl.org/ontology/bibo/>
SELECT ?nname
{
?puburi dc:title ?pname.
FILTER regex(?pname,'"""+samplepersnalisedexpertise+"""',"i")
?personuri foaf:publications ?puburi.
?personuri foaf:nick ?nname
FILTER (?personuri != '"""+imURI+"""')
}""")
Can anyone please help with a solution? Thanks in advance.

You are trying to compare with a URI value, which should not be surrounded by quotes, but by angle brackets:
FILTER(?personuri != <"""+imURI+""">)
By the way, the suggestion @morphyn gives above (using the str() function) will also work, but is less efficient.
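
A minimal sketch of building that query text in Python, with the URI written between angle brackets. The function name and the sample values are hypothetical, the query is trimmed to the prefixes it actually uses, and plain string interpolation like this assumes the inputs are trusted.

```python
def build_expert_query(expertise_pattern, excluded_uri):
    """Return SPARQL text that compares ?personuri against a URI, not a string."""
    return f"""
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?nname
{{
  ?puburi dc:title ?pname .
  FILTER regex(?pname, '{expertise_pattern}', "i")
  ?personuri foaf:publications ?puburi .
  ?personuri foaf:nick ?nname .
  FILTER (?personuri != <{excluded_uri}>)
}}"""


# Hypothetical values standing in for samplepersnalisedexpertise and imURI.
query_text = build_expert_query("semantic web", "http://example.org/people/me")
```

The resulting string can then be passed to GraphR.query(...) as in the question.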

Django-Haystack with Solr contains search

I am using haystack within a project using solr as the backend. I want to be able to perform a contains search, similar to the Django .filter(something__contains="...")
The __startswith option does not suit our needs as it, as the name suggests, looks for words that start with the string.
I tried to use something like *keyword*, but Solr does not allow * to be used as the first character.
Thanks.

To get "contains" functionality you can use:
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100" side="back"/>
<filter class="solr.LowerCaseFilterFactory" />
as index analyzer.
This will create ngrams for every whitespace separated word in your field. For example:
"Index this!" => x, ex, dex, ndex, index, !, s!, is!, his!, this!
As you can see, this will expand your index greatly, but if you now enter a query like:
"nde*"
it will match "ndex", giving you a hit.
Use this approach carefully to make sure that your index doesn't get too large. If you increase minGramSize or decrease maxGramSize, it will not expand the index as much, but it will reduce the "contains" functionality. For instance, setting minGramSize="3" will require that you have at least 3 characters in your contains query.
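
The token expansion described above can be imitated in a few lines of Python. This is only a sketch of the idea (whitespace split, lowercase, n-grams anchored at the end of each word), not Solr's implementation, and the function names are made up.

```python
def back_edge_ngrams(token, min_gram=1, max_gram=100):
    """Return the suffixes of token whose length is within [min_gram, max_gram]."""
    token = token.lower()
    longest = min(max_gram, len(token))
    return [token[-size:] for size in range(min_gram, longest + 1)]


def index_terms(text, min_gram=1, max_gram=100):
    """Split on whitespace and expand every word into its back edge n-grams."""
    terms = []
    for token in text.split():
        terms.extend(back_edge_ngrams(token, min_gram, max_gram))
    return terms


terms = index_terms("Index this!")
# A prefix query such as "nde*" now hits the indexed term "ndex".
prefix_hit = any(term.startswith("nde") for term in terms)
# With a larger minimum size, fewer terms are indexed.
fewer_terms = index_terms("Index", min_gram=3)
```

With min_gram=3 only "dex", "ndex" and "index" are produced for "Index", which is why shorter queries stop matching.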

You can achieve the same behavior without having to touch the solr schema. In your index, make your text field an EdgeNgramField instead of a CharField. Under the hood this will generate a similar schema to what lindstromhenrik suggested.

I am using an expression like:
.filter(something__startswith='...')
.filter_or(name=''+s'...')
as it seems Solr does not like expressions like '...*', but combining them with an or works.

None of the answers here do a real substring search, *keyword*.
They don't find a keyword that is part of a bigger string (not a prefix or suffix).
Using EdgeNGramFilterFactory or the EdgeNgramField in the indexes can only do a "startswith" or an "endswith" type of filtering.
The solution is to use an NgramField like this:
class MyIndex(indexes.SearchIndex, indexes.Indexable):
    ...
    field_to_index = indexes.NgramField(model_attr='field_name')
    ...
This is very elegant, because you don't need to manually add anything to the schema.xml.
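
To illustrate why plain n-grams allow a real substring match while edge n-grams do not, here is a small pure-Python sketch. It is not Haystack or Solr code; the function names, the sample word and the gram sizes (3 to 15) are assumptions chosen for the example.

```python
def ngrams(token, min_gram=3, max_gram=15):
    """Return every substring of token with a length in [min_gram, max_gram]."""
    token = token.lower()
    grams = []
    for size in range(min_gram, min(max_gram, len(token)) + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


def edge_ngrams(token, min_gram=3, max_gram=15):
    """Return only the prefixes of token with a length in [min_gram, max_gram]."""
    token = token.lower()
    return [token[:size] for size in range(min_gram, min(max_gram, len(token)) + 1)]


word = "mykeywordhere"  # made-up value with the keyword in the middle
found_with_ngrams = "keyword" in ngrams(word)
found_with_edge_ngrams = "keyword" in edge_ngrams(word)
```

The keyword is found among the plain n-grams but not among the edge n-grams, at the cost of many more indexed terms per word.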
