I am trying to search for a list of values in multiple columns in postgres (via django). I was able to use SearchQuery and SearchVector and this works great if one of the search values matches a full word. I was hoping to use icontains so that partial strings could also be used in the search. Is this possible and if so could someone point me in the right direction. Here is an example of my approach below.
Example Data:
Superhero.objects.create(
superhero='Batman',
publisher='DC Comics',
alter_ego='Bruce Wayne',
)
Superhero.objects.create(
superhero='Hulk',
publisher='Marvel Comics',
alter_ego='Bruce Banner',
)
Django filter:
from django.contrib.postgres.search import SearchQuery, SearchVector
query = SearchQuery('man') | SearchQuery('Bruce')
vector = SearchVector('superhero', 'alter_ego', 'publisher')
queryset = queryset.annotate(search=vector).filter(search=query)
This would return the Hulk record but I am hoping I can somehow use like 'icontains' so that when searching for 'man' the Batman record would also be returned. Any help is appreciated!
You can apply icontains to the filter like:
queryset = queryset.annotate(search=vector).filter(search__icontains=query)
So SearchQuery and SearchVector are a part of Django's Full Text searching functionality and it doesnt look like you can achieve what I was wanting to do with these functions. I have taken a different approach thanks to Julian Phalip's approach here.. https://www.julienphalip.com/blog/adding-search-to-a-django-site-in-a-snap/
Related
I'm using Django's builtin trigram_similar lookup and TrigramSimilarity class to create a fuzzy searching using a PostgreSQL database. This way, I filter titles of articles. The problem is, this filters based on the whole title. I want to filter it also on parts on title.
Example:
Title of Article: "We are the world | This is the best article ever".
My search: "we"
In this example, my function returns nothing, but I want it to return this article. How can I do that?
This is the code I use:
def search(qs, term, fields=["title"], treshold=.3):
if len(fields) >= 2:
return qs.annotate(
similarity=Greatest(
*[TrigramSimilarity(field, term) for field in fields]
)
).filter(similarity__gte=treshold).order_by("-similarity")
return qs.filter(
**{"{}__trigram_similar".format(fields[0]): term}
)
I think the problem is that you are using trigram similarity with a two letter word. If you try to do this with a word that is three letters, maybe it will work?
What's the next best option for database-agnostic full-text search for Django without Haystack?
I have a model like:
class Paper(models.Model):
title = models.CharField(max_length=1000)
class Person(models.Model):
name = models.CharField(max_length=100)
class PaperReview(models.Model):
paper = models.ForeignKey(Paper)
person = models.ForeignKey(Person)
I need to search for papers by title and reviewer name, but I also want to search from the perspective of a person and find which papers they have and haven't reviewed. With Haystack, it's trivial to implement a full-text index to search by title and name fields, but as far as I can tell, there's no way to do the "left outer join" necessary to find papers without a review by a specific person.
Haystack is just a wrapper that exposes a few different search engine backends:
Solr
ElasticSearch
Whoosh
Xapian
There might be other backends as well available as plugins.
So the real question here is, is there a search backend that gives me the desired functionality, and does haystack expose that functionality?
The answer to that is, you can probably use elasticsearch*, but note the asterix.
Generally, when creating a search index, it's a good idea to think about the documents in the same way you might if you were creating a no-rel database and you want those documents to be as flat as possible.
So one possibility might be to have an array of char fields on a paperreview index. The array would contain all of the related foreign key references.
Another might be to use "nested documents" in elasticsearch.
And lastly, to use "parent/child documents" in elasticsearch.
You can still use haystack for indexing, with some hacking, but you will probably want to use one of the raw backends directly, such as pyelasticsearch or pyes.
http://www.elasticsearch.org/guide/reference/mapping/nested-type/
http://www.elasticsearch.org/guide/reference/mapping/parent-field/
http://pyelasticsearch.readthedocs.org/en/latest/
http://pyes.readthedocs.org/en/latest/
I know this question is older, but I spent some time investigation this recently and answered this as well here but it is actually not too hard to implement this yourself, and wanted to share.
I found the SearchVector/SearchQuery approach actually does not catch all cases, for example partial words (see https://www.fusionbox.com/blog/detail/partial-word-search-with-postgres-full-text-search-in-django/632/ for reference). You can implement your own without much trouble, depending on your constraints.
example, within a viewsets' get_queryset method:
...other params...
search_terms = self.request.GET.get('q')
if search_terms:
# remove possible other delimiters and other chars
# that could interfere
cleaned_terms = re.sub(r'[!\'()|&;,]', ' ', search_terms).strip()
if cleaned_terms:
# Check against all the params we want
# apply to previous terms' filtered results
q = reduce(
lambda p, n: p & n,
map(
lambda word:
Q(your_property__icontains=word) | Q(
second_property__icontains=word) | Q(
third_property__icontains=word)
cleaned_terms.split()
)
)
qs = YourModel.objects.filter(q)
return qs
I use Haystack + elastic search and so far its working pretty well. Dont think its trivial . You can easily implement your requirement, if theres a association between paper and person.
I ended up using djorm-ext-pgfulltext, which provides a simple Django interface for PostgreSQL's built-in full text search features.
I am using haystack within a project using solr as the backend. I want to be able to perform a contains search, similar to the Django .filter(something__contains="...")
The __startswith option does not suit our needs as it, as the name suggests, looks for words that start with the string.
I tried to use something like *keyword* but Solr does not allow the * to be used as the first character
Thanks.
To get "contains" functionallity you can use:
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100" side="back"/>
<filter class="solr.LowerCaseFilterFactory" />
as index analyzer.
This will create ngrams for every whitespace separated word in your field. For example:
"Index this!" => x, ex, dex, ndex, index, !, s!, is!, his!, this!
As you see this will expand your index greatly but if you now enter a query like:
"nde*"
it will match "ndex" giving you a hit.
Use this approach carefully to make sure that your index doesn't get too large. If you increase minGramSize, or decrease maxGramSize it will not expand the index as mutch but reduce the "contains" functionallity. For instance setting minGramSize="3" will require that you have at least 3 characters in your contains query.
You can achieve the same behavior without having to touch the solr schema. In your index, make your text field an EdgeNgramField instead of a CharField. Under the hood this will generate a similar schema to what lindstromhenrik suggested.
I am using an expression like:
.filter(something__startswith='...')
.filter_or(name=''+s'...')
as is seems solr does not like expression like '...*', but combined with or will do
None of the answers here do a real substring search *keyword*.
They don't find the keyword that is part of a bigger string, (not a prefix or suffix).
Using EdgeNGramFilterFactory or the EdgeNgramField in the indexes can only do a "startswith" or a "endswith" type of filtering.
The solution is to use a NgramField like this:
class MyIndex(indexes.SearchIndex, indexes.Indexable):
...
field_to_index= indexes.NgramField(model_attr='field_name')
...
This is very elegant, because you don't need to manually add anything to the schema.xml
I'm not sure the best way to describe what it is that I'm trying to do so forgive my title.
I have two models, User and Group. Group contains field, members, which is a ManyToManyField referring to User.
Given a User, I want to find all of the Groups to which that user belongs.
My idea would be to do something like this:
groups = Group.objects.filter(user in members)
Something like that. Even though I realize that this isn't right
I tried reading through this link but couldn't figure out how to apply:
http://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships
Thanks
EDIT:
Figured it out
groups = Group.objects.filter(members__username=user.username)
If you have the user and you want to have his groups then start querying from it, not the way around ;)
Here's an example:
james = User.objects.get(pk= 123)
james_groups = james.group_set.all()
The most concise way is probably
groups = user1.group_set.all()
which gives you a queryset that is iterable.
I might have missed somthing while searching through the documentation - I can't seem to find a way to use data from one query to form another query.
My query is:
sites_list = Site.objects.filter(worker=worker)
I'm trying to do something like this:
for site in sites_list:
[Insert Query Here]
Edit: I saw the awnser and im not sure how i didnt get that, maybe thats the sign im up too late coding :S
You could easily do something like this:
sites_list = Site.objects.filter(worker=worker)
for site in sites_list:
new_sites_list = Site.objects.filter(name=site.name).filter(something else)
You can also use the __in lookup type. For example, if you had an Entry model with a relation to Site, you could write:
Entry.objects.filter(site__in=Site.objects.filter(...some conditions...))
This will end up doing one query in the DB (the filter condition on sites would be turned into a subquery in the WHERE clause).