How to use "suggest" in elasticsearch pyes? - python

How to use the "suggest" feature in pyes? Cannot seem to figure it out due to poor documentation. Could someone provide a working example? None of what I tried appears to work. In the docs its listed under query, but using:
query = Suggest(fields="fieldname")
connectionobject.search(query=query)

Since version 5:
_suggest endpoint has been deprecated in favour of using suggest via _search endpoint. In 5.0, the _search endpoint has been optimized for suggest only search requests.
(from https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-suggesters.html)
The better way to do this is to use the search API with the suggest option:
from elasticsearch import Elasticsearch

es = Elasticsearch()

text = 'ra'
suggest_dictionary = {
    "my-entity-suggest": {
        'text': text,
        "completion": {
            "field": "suggest"
        }
    }
}
query_dictionary = {'suggest': suggest_dictionary}
res = es.search(
    index='auto_sugg',
    doc_type='entity',
    body=query_dictionary)
print(res)
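If it helps, here is a minimal sketch of pulling the completions out of that response; it assumes the ES 5.x response layout, where suggestions come back keyed by the name used in the request:
# Hedged sketch: iterate the suggestions under the request's "my-entity-suggest" name
for entry in res.get('suggest', {}).get('my-entity-suggest', []):
    for option in entry['options']:
        print(option['text'], option['_score'])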
Make sure you have indexed each document with a suggest field:
sample_entity = {
    'id': 'test123',
    'name': 'Ramtin Seraj',
    'title': 'XYZ',
    "suggest": {
        "input": ['Ramtin', 'Seraj', 'XYZ'],
        "output": "Ramtin Seraj",
        "weight": 34  # a prior weight
    }
}
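For completeness, here is a hedged sketch of the mapping and indexing step this assumes: the suggest field has to be mapped with type completion before any documents go in. (Note that ES 5.x dropped the output key shown above, so on 5.x leave it out.)
# Assumed index/type names from the answer above; adjust to your setup.
mapping = {
    "mappings": {
        "entity": {
            "properties": {
                "name": {"type": "text"},
                "suggest": {"type": "completion"}
            }
        }
    }
}
es.indices.create(index='auto_sugg', body=mapping)
es.index(index='auto_sugg', doc_type='entity',
         id=sample_entity['id'], body=sample_entity)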

Here is my code, which runs perfectly (note that it uses the _suggest endpoint, deprecated since 5.0 as noted above).
from elasticsearch import Elasticsearch
es = Elasticsearch()
text = 'ra'
suggDoc = {
    "entity-suggest": {
        'text': text,
        "completion": {
            "field": "suggest"
        }
    }
}
res = es.suggest(body=suggDoc, index="auto_sugg", params=None)
print(res)
I used the same client mentioned on the elasticsearch site here.
I indexed the data in the elasticsearch index using the completion suggester from here.

Related

Elastic search not giving data with big number for page size

Size of data to get: approx. 20,000 documents.
Issue: searching Elasticsearch indexed data using the command below in Python, but not getting any results back.
from pyelasticsearch import ElasticSearch

es_repo = ElasticSearch(settings.ES_INDEX_URL)
search_results = es_repo.search(
    query, index=advertiser_name, es_from=_from, size=_size)
If I give a size less than or equal to 10,000 it works fine, but not with 20,000.
Please help me find an optimal solution to this.
PS: On digging deeper into ES, I found this error message:
Result window is too large, from + size must be less than or equal to: [10000] but was [19999]. See the scrolling API for a more efficient way to request large data sets.
For real-time use the best solution is the search_after query. You only need a date field plus another field that uniquely identifies a doc - the _id or _uid field is enough.
Try something like this; in my example I extract all the documents that belong to a single user - here the user field has a keyword datatype:
from elasticsearch import Elasticsearch

es = Elasticsearch()
es_index = "your_index_name"
documento = "your_doc_type"
user = "Francesco Totti"

# Count the matching docs so we know when we have paged through them all
body2 = {
    "query": {
        "term": {"user": user}
    }
}
res = es.count(index=es_index, doc_type=documento, body=body2)
size = res['count']

# First page: sort on a date field plus a tie-breaker that is unique per doc
body = {
    "size": 10,
    "query": {
        "term": {"user": user}
    },
    "sort": [
        {"date": "asc"},
        {"_uid": "desc"}
    ]
}
result = es.search(index=es_index, doc_type=documento, body=body)
bookmark = [result['hits']['hits'][-1]['sort'][0],
            str(result['hits']['hits'][-1]['sort'][1])]

# Subsequent pages: pass the last hit's sort values via search_after
body1 = {
    "size": 10,
    "query": {
        "term": {"user": user}
    },
    "search_after": bookmark,
    "sort": [
        {"date": "asc"},
        {"_uid": "desc"}
    ]
}
while len(result['hits']['hits']) < size:
    res = es.search(index=es_index, doc_type=documento, body=body1)
    for el in res['hits']['hits']:
        result['hits']['hits'].append(el)
    bookmark = [res['hits']['hits'][-1]['sort'][0],
                str(res['hits']['hits'][-1]['sort'][1])]
    body1 = {
        "size": 10,
        "query": {
            "term": {"user": user}
        },
        "search_after": bookmark,
        "sort": [
            {"date": "asc"},
            {"_uid": "desc"}
        ]
    }
Then you will find all the docs appended to the result variable.
If you would like to use a scroll query instead - docs here:
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()
es_index = "your_index_name"
documento = "your_doc_type"
user = "Francesco Totti"

body = {
    "query": {
        "term": {"user": user}
    }
}
res = helpers.scan(
    client=es,
    scroll='2m',
    query=body,
    index=es_index)
for i in res:
    print(i)
Probably it's an Elasticsearch constraint: the index.max_result_window index setting, which defaults to 10,000.
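If you really do need from + size beyond 10,000, the setting can be raised per index; here is a sketch using the official client as in the answers above (index name assumed). Deep paging stays expensive, so search_after or scroll is usually the better fix:
from elasticsearch import Elasticsearch

es = Elasticsearch()
# Raise the window on an existing index; this trades memory for paging depth.
es.indices.put_settings(
    index="your_index_name",
    body={"index": {"max_result_window": 20000}})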

db.collection.update $set is not working

I am using MongoDB v3.6.2 and am using $set to update a field; it just doesn't work and I am clueless as to why. Any pointers are appreciated.
from pymongo import MongoClient
from bson import ObjectId
import os

dbuser = os.environ.get('user', '')
dbpass = os.environ.get('pwd', '')
uri = 'mongodb://{dbuser}:{dbpass}@machineip/data'.format(**locals())
client = MongoClient(uri)
db = client.data
collection = db['test']
print db.version
db.collection.update(
    {"_id": ObjectId("5a95a1c32a2e2e0025e6d6e2")},
    {"$set":
        {
            "status": "submission"
        }
    }
)
Document:
{
    "_id" : ObjectId("5a95a1c32a2e2e0025e6d6e2"),
    "status" : "Submitting",
    "endRev" : "9531c3448d3f7713dc74c4b05d177ecf0c6e4df6",
    "chip" : "4364"
}
Your update isn't working because of the match portion of your query:
{ "_id": "5a95a1c32a2e2e0025e6d6e2" }
That is searching for a document with a string _id. You must cast to an ObjectId in order for it to find the matching document and perform the update.
{ "_id" : ObjectId("5a95a1c32a2e2e0025e6d6e2") }
Also be sure to include from bson import ObjectId.
Use update_many to update more than one document. If you want to update one document, use update_one. Actually, update is deprecated. Note that the update document still needs the $set operator:
from bson import ObjectId
db.collection.update_many(
    {"_id": ObjectId("5a95a1c32a2e2e0025e6d6e2")},
    {"$set": {"status": "submission"}})
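A small usage sketch of the update_one variant, using the collection handle defined in the question and checking the outcome; the counts come from pymongo's UpdateResult:
result = collection.update_one(
    {"_id": ObjectId("5a95a1c32a2e2e0025e6d6e2")},
    {"$set": {"status": "submission"}})
# matched_count says the filter found the doc; modified_count that it changed
print(result.matched_count, result.modified_count)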
I hope it helps.

How to use find() nested documents for two levels or more?

Here is my sample mongodb database:
[database image for one object]
The above is a database with an array of articles. I fetched only one object for simplicity.
[database image for multiple objects (max 20 as it's the size limit)]
I have about 18k such entries.
I have to extract the description and title tags present inside articles[0].
The find() method is the question here. I have tried this:
for i in db.ncollec.find({'status': "ok"}, {'articles.0.title': 1, 'articles.0.description': 1}):
    for j in i:
        save.write(j)
After executing the code, the file save has this:
_id
articles
_id
articles
and it goes on and on.
Any help on how to print what I stated above?
My entire code for reference :
import json
from newsapi import NewsApiClient
from pymongo import MongoClient

client = MongoClient()
db = client.dbasenews
ncollec = db.ncollec

newsapi = NewsApiClient(api_key='**********')

source = open('TextsExtractedTemp.txt', 'r')
destination = open('NewsExtracteddict.txt', "w")

for word in source:
    if word == '\n':
        continue
    all_articles = newsapi.get_everything(q=word, language='en', page_size=1)
    print(all_articles)
    json.dump(all_articles, destination)
    destination.write("\n")
    try:
        ncollec.insert(all_articles)
    except:
        pass
Okay, so I checked a little to update my rusty memory of pymongo, and here is what I found.
The correct query should be :
db.ncollec.find({'status': "ok",
                 'articles.title': {'$exists': 'True'},
                 'articles.description': {'$exists': 'True'}})
Now, if you do this :
query = {'status': "ok",
         'articles.title': {'$exists': 'True'},
         'articles.description': {'$exists': 'True'}}
for item in db.ncollec.find(query):
    print(item)
And if it doesn't show anything, the query is correct but you don't have the right database, or the right tree, or whatever.
But I assure you that, with the database you showed me, if you do...
query = {'status': "ok",
         'articles.title': {'$exists': 'True'},
         'articles.description': {'$exists': 'True'}}
for item in db.ncollec.find(query):
    save.write(item['articles'][0]['title'])
    save.write(item['articles'][0]['description'])
...it'll do what you wished to do in the first place.
Now, the item['articles'][0] key might not be exactly right, but I can't really be of more help here, since it depends on what you are showing on the screen. :)
Okay, now I have found something for you that is a bit more complicated, but cool :)
I'm not sure it'll work for you, though. I suspect you're giving us the wrong tree, since when you do .find({'status': 'ok'}) it doesn't return anything, and it should return all the documents with 'status': 'ok', and you have lots...
Anyway, here is the query; use it with the .aggregate() method instead of .find():
elem = {'$match': {'status': 'ok',
                   'articles.title': {'$exists': 'True'},
                   'articles.description': {'$exists': 'True'}}}
pipeline = [elem, {'$unwind': '$articles'}, elem]
If you want an explanation as to how this works, I invite you to read this page.
This query will return ONLY the elements in your array that have a title, and a description, with a status OK. If an element doesn't have a title, or a description, it will be ignored.
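A hedged sketch of running that pipeline end to end; after $unwind each result carries a single article sub-document, so the fields are reached without any positional index (save is the file handle from the question):
elem = {'$match': {'status': 'ok',
                   'articles.title': {'$exists': True},
                   'articles.description': {'$exists': True}}}
for doc in db.ncollec.aggregate([elem, {'$unwind': '$articles'}, elem]):
    # Each doc now holds one article, not the whole array
    save.write(doc['articles']['title'])
    save.write(doc['articles']['description'])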

Elasticsearch with python: query specific field

I'm using Python's elasticsearch module to connect to and search through my Elasticsearch cluster.
In the cluster, one of the fields in my index is 'message' - I want to query Elasticsearch from Python for a specific value in this 'message' field.
Here is my basic search, which simply returns all logs of a specific index:
es = elasticsearch.Elasticsearch(source_cluster)
doc = {
    'size': 10000,
    'query': {
        'match_all': {}
    }
}
res = es.search(index='test-index', body=doc, scroll='1m')
How should I change this query in order to find all results with the word 'moved' in their 'message' field?
The equivalent query that does it from Kibana is:
_index:test-index && message: moved
Thanks,
Noam
You need to use the match query. Try this:
doc = {
    'size': 10000,
    'query': {
        'match': {
            'message': 'moved'
        }
    }
}
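And a short usage sketch, reusing the es.search call from the question to pull the matching messages out of the hits:
res = es.search(index='test-index', body=doc, scroll='1m')
for hit in res['hits']['hits']:
    print(hit['_source']['message'])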

Querying ElasticSearch with Python Requests not working fine

I'm trying to do a full-text search on a mongodb db with the Elasticsearch engine, but I ran into a problem: no matter what search term I provide (and whether I use query1 or query2), the engine always returns the same results. I think the problem is in the way I make the requests, but I don't know how to solve it.
Here is the code:
import json
from pprint import pprint

import requests

def search(search_term):
    query1 = {
        "fuzzy": {
            "art_text": {
                "value": search_term,
                "boost": 1.0,
                "min_similarity": 0.5,
                "prefix_length": 0
            }
        },
        "filter": {
            "range": {
                "published": {
                    "from": "20130409T000000",
                    "to": "20130410T235959"
                }
            }
        }
    }
    query2 = {
        "match_phrase": {"art_text": search_term}
    }
    es_query = json.dumps(query1)
    uri = 'http://localhost:9200/newsidx/_search'
    r = requests.get(uri, params=es_query)
    results = json.loads(r.text)
    data = [res['_source']['api_id'] for res in results['hits']['hits']]
    print "results: %d" % len(data)
    pprint(data)
The params parameter is not for the request body. If you're trying to send data to the server, you should specifically be using the data parameter. If you're trying to send query parameters, then you shouldn't be JSON-encoding them and should just give params a dict.
I suspect your first request should be the following:
r = requests.get(uri, data=es_query)
And before someone downvotes me, yes the HTTP/1.1 spec allows data to be sent with GET requests and yes requests does support it.
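A sketch of the fixed call; note the clause is wrapped under a top-level query key, which _search expects, and the Content-Type header is an assumption that matters on newer Elasticsearch versions, which reject JSON bodies without it:
import json
import requests

es_query = json.dumps({"query": query2})  # query1 would be wrapped the same way
r = requests.get('http://localhost:9200/newsidx/_search',
                 data=es_query,
                 headers={'Content-Type': 'application/json'})
results = r.json()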
search = {'query': {'match': {'test_id': 13}},
          'sort': {'date_utc': {'order': 'desc'}}}
data = requests.get('http://localhost:9200/newsidx/test/_search?&pretty', params=search)
print(data.json())
http://docs.python-requests.org/en/latest/user/quickstart/
