I am trying to work with Python Elasticsearch version 1.1.0, on the master branch. It seems it will create an index, but there are issues with retrieving autocomplete results when using a suggestion field.
Below are the basic Python functions to create an index and add a song to it; finally, we query it via the curl request at the very bottom.
Unfortunately it fails with the error:
"reason" : "BroadcastShardOperationFailedException[[music][2] ]; nested: ElasticsearchException[failed to execute suggest]; nested: ElasticsearchException[Field [suggest] is not a completion suggest field]; "
} ]'
The functions I am using to create the index and add a song are below:
from elasticsearch import Elasticsearch

conn = Elasticsearch()

def mapping():
    return """{
        "song" : {
            "properties" : {
                "name" : { "type" : "string" },
                "suggest" : {
                    "type" : "completion",
                    "index_analyzer" : "simple",
                    "search_analyzer" : "simple",
                    "payloads" : true
                }
            }
        }
    }"""

def createMapping():
    settings = mapping()
    conn.indices.create(index="music", body=settings)

def addSong():
    body = """{
        "name" : "Nevermind",
        "suggest" : {
            "input": [ "Nevermind", "Nirvana" ],
            "output": "Nirvana - Nevermind",
            "payload" : { "artistId" : 2321 },
            "weight" : 34
        }
    }"""
    res = conn.index(body=body, index="music", doc_type="song", id=1)
Curl request:
curl -X POST 'localhost:9200/music/_suggest?pretty' -d '{
    "song-suggest" : {
        "text" : "n",
        "completion" : {
            "field" : "suggest"
        }
    }
}'
When you use the create index API, you have to wrap your mappings in a top-level "mappings" key:
def createMapping():
    settings = """{"mappings": %s}""" % mapping()
    conn.indices.create(index="music", body=settings)
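With that wrapper in place, the body sent to indices.create looks roughly like this (a sketch assembled from the mapping above):

{
    "mappings": {
        "song": {
            "properties": {
                "name": { "type": "string" },
                "suggest": {
                    "type": "completion",
                    "index_analyzer": "simple",
                    "search_analyzer": "simple",
                    "payloads": true
                }
            }
        }
    }
}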
After updating the Python package elasticsearch from 7.6.0 to 8.1.0, I started to receive an error at this line of code:
count = es.count(index=my_index, body={'query': query['query']})["count"]
and I receive the following error message:
DeprecationWarning: The 'body' parameter is deprecated and will be
removed in a future version. Instead use individual parameters.
count = es.count(index=ums_index, body={'query': query['query']}
)["count"]
I don't understand how to use the above-mentioned "individual parameters".
Here is my query:
query = {
    "bool": {
        "must": [
            {"exists": {"field": "device"}},
            {"exists": {"field": "app_version"}},
            {"exists": {"field": "updatecheck"}},
            {"exists": {"field": "updatecheck_status"}},
            {"term": {"updatecheck_status": "ok"}},
            {"term": {"updatecheck": 1}},
            {
                "range": {
                    "@timestamp": {
                        "gte": from_date,
                        "lte": to_date,
                        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd"
                    }
                }
            }
        ],
        "must_not": [
            {"term": {"device": ""}},
            {"term": {"updatecheck": ""}},
            {"term": {"updatecheck_status": ""}},
            {
                "terms": {
                    "app_version": ["2.2.1.1", "2.2.1.2", "2.2.1.3", "2.2.1.4", "2.2.1.5",
                                    "2.2.1.6", "2.2.1.7", "2.1.2.9", "2.1.3.2", "0.0.0.0", ""]
                }
            }
        ]
    }
}
In the official documentation, I can't find any examples of how to pass my query in the new versions of the Elasticsearch client.
Does anyone have a solution for this case, other than reverting to a previous version of Elasticsearch?
According to the documentation, this is now to be done as follows:
# ✅ New usage:
es.search(query={...})
# ❌ Deprecated usage:
es.search(body={"query": {...}})
So the query is passed directly as a parameter, without "body"; the same applies to whichever API you need to use, in your case "count" instead of "search".
You can try the following:
# ✅ New usage:
es.count(query={...})
# ❌ Deprecated usage:
es.count(body={"query": {...}})
You can find out more by clicking on the following link:
https://github.com/elastic/elasticsearch-py/issues/1698
For example, if the query is:
GET index-00001/_count
{
    "query" : {
        "match_all": {}
    }
}
the equivalent Python client call would be:
my_index = "index-00001"
query = {
    "match_all": {}
}
hits = es.count(index=my_index, query=query)
or
hits = es.count(index=my_index, query={"match_all": {}})
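Applied to the count call from the question, the fix is then (a sketch, assuming the query dict shown above):

count = es.count(index=my_index, query=query)["count"]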
Using Elasticsearch 8.4.1, I got the same warning when creating indices via the Python client.
I had to do it this way instead:
settings = {
    "number_of_shards": 2,
    "number_of_replicas": 1
}
mappings = {
    "dynamic": "true",
    "numeric_detection": "true",
    "_source": {
        "enabled": "true"
    },
    "properties": {
        "p_text": {
            "type": "text"
        },
        "p_vector": {
            "type": "dense_vector",
            "dims": 768
        }
    }
}
es.indices.create(index=index_name, settings=settings, mappings=mappings)
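As a quick sanity check (a sketch; indices.get_mapping is a standard client method, and index_name is the same variable as above), you can confirm the mapping was applied:

# Fetch the mapping back from the cluster and compare it with what was sent
print(es.indices.get_mapping(index=index_name))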
Hope this helps.
I'm currently exploring Celery for my work, and I'm trying to set up an Elasticsearch backend. Is there any way to send the resulting value as a dictionary/JSON, not as text, so that results are shown correctly in Elasticsearch and the nested type can be used?
Automatic mapping created by Celery:
{
    "celery" : {
        "mappings" : {
            "backend" : {
                "properties" : {
                    "@timestamp" : {
                        "type" : "date"
                    },
                    "result" : {
                        "type" : "text",
                        "fields" : {
                            "keyword" : {
                                "type" : "keyword",
                                "ignore_above" : 256
                            }
                        }
                    }
                }
            }
        }
    }
}
I've tried to create my own mapping with a nested field, but it resulted in: elasticsearch.exceptions.RequestError: RequestError(400, 'mapper_parsing_exception', 'object mapping for [result] tried to parse field [result] as object, but found a concrete value')
UPDATE
The result is already encoded in JSON, and inside the Elasticsearch backend wrapper the JSON string is saved inside a dictionary. Adding json.loads(result) as a quick fix actually helps.
After the quick fix, a new mapping has appeared:
{
    "celery" : {
        "mappings" : {
            "backend" : {
                "properties" : {
                    "@timestamp" : {
                        "type" : "date"
                    },
                    "result" : {
                        "properties" : {
                            "date_done" : {
                                "type" : "date"
                            },
                            "result" : {
                                "type" : "long"
                            },
                            "status" : {
                                "type" : "text",
                                "fields" : {
                                    "keyword" : {
                                        "type" : "keyword",
                                        "ignore_above" : 256
                                    }
                                }
                            },
                            "task_id" : {
                                "type" : "text",
                                "fields" : {
                                    "keyword" : {
                                        "type" : "keyword",
                                        "ignore_above" : 256
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
Updated Kibana view: (screenshot omitted)
Is there any way to disable serialization of results in Celery?
I could submit a pull request that unpacks the JSON just for Elasticsearch, but that looks like a hack.
Since v4.0 the default result_serializer is json, so you should have results in JSON format anyway. Maybe your configuration uses something else? In that case I suggest you remove it (if you use Celery >= 4.0) and you should enjoy results in JSON format. I prefer msgpack, but on the other hand I do not use Elasticsearch for Celery results...
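For completeness, here is a minimal sketch of the relevant settings, assuming a Celery >= 4.0 configuration module; the backend URL is only an example value:

# celeryconfig.py -- a sketch; result_serializer and result_backend are standard Celery settings
result_serializer = 'json'  # already the default since Celery 4.0
result_backend = 'elasticsearch://localhost:9200/celery/backend'  # example host/index/doc_type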
I am working on AWS Elasticsearch using Python. I have a JSON file with 3 fields ("cat1", "Cat2", "cat3"); each row is separated by \n.
For example: cat1: food, cat2: wine, cat3: lunch, etc.
from requests_aws4auth import AWS4Auth
import boto3
import requests

payload = {
    "settings": {
        "number_of_shards": 10,
        "number_of_replicas": 5
    },
    "mappings": {
        "Categoryall": {
            "properties": {
                "cat1": {
                    "type": "string"
                },
                "Cat2": {
                    "type": "string"
                },
                "cat3": {
                    "type": "string"
                }
            }
        }
    }
}

r = requests.put(url, auth=awsauth, json=payload)
I created the schema/mapping for the index as shown above, but I don't know how to populate the index.
I am thinking of looping over the JSON file and calling a POST request for each record, but I don't have a clear idea how to proceed.
I want to create the index and bulk upload this file into it. Any suggestion would be appreciated.
Take a look at the Elasticsearch Bulk API.
Basically, you need to create a bulk request body and POST it to your "https://{elastic-endpoint}/_bulk" URL.
The following example shows a bulk request to insert 3 JSON records into your index called "my_index":
{ "index" : { "_index" : "my_index", "_type" : "_doc", "_id" : "1" } }
{ "cat1" : "food 1", "cat2": "wine 1", "cat3": "lunch 1" }
{ "index" : { "_index" : "my_index", "_type" : "_doc", "_id" : "2" } }
{ "cat1" : "food 2", "cat2": "wine 2", "cat3": "lunch 2" }
{ "index" : { "_index" : "my_index", "_type" : "_doc", "_id" : "3" } }
{ "cat1" : "food 3", "cat2": "wine 3", "cat3": "lunch 3" }
where each JSON record is represented by 2 JSON objects: an action line followed by the document itself.
So if you write your bulk request body into a file called post-data.txt, then you can post it using Python something like this:
with open('post-data.txt', 'rb') as payload:
    r = requests.post('https://your-elastic-endpoint/_bulk', auth=awsauth,
                      data=payload, ... add more params)
Alternatively, you can try Python elasticsearch bulk helpers.
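For example, here is a minimal sketch using the elasticsearch-py bulk helper; the file name data.json and the index name my_index are placeholders, and the client setup (including your AWS auth) is assumed to be done elsewhere:

import json
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch(hosts=["https://your-elastic-endpoint"])  # add your AWS auth setup here

def gen_actions(path):
    # Yield one bulk action per line of the newline-delimited JSON file
    with open(path) as f:
        for line in f:
            yield {"_index": "my_index", "_source": json.loads(line)}

bulk(es, gen_actions("data.json"))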
I'm trying to use the bulk API from Elasticsearch, and I see that this can be done using the following request, which is special because what is given as "data" is not proper JSON, but JSON objects delimited by \n.
curl -XPOST 'localhost:9200/_bulk?pretty' -H 'Content-Type: application/json' -d '
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
'
My question is: how can I perform such a request within Python? The authors of Elasticsearch suggest not pretty-printing the JSON, but I'm not sure what that means (see https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html).
I know that this is a valid Python request:
import requests
import json

data = json.dumps({"field": "value"})
r = requests.post("http://localhost:9200/_bulk?pretty", data=data)
But what do I do if the JSON is \n-delimited?
What this really is is a set of individual JSON documents, joined together with newlines. So you could do something like this:
data = [
    { "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } },
    { "field1" : "value1" },
    { "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } },
    { "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } },
    { "field1" : "value3" },
    { "update" : { "_id" : "1", "_type" : "type1", "_index" : "test" } },
    { "doc" : { "field2" : "value2" } }
]
data_to_post = '\n'.join(json.dumps(d) for d in data)
r = requests.post("http://localhost:9200/_bulk?pretty", data=data_to_post)
However, as pointed out in the comments, the Elasticsearch Python client is likely to be more useful.
As a follow-up to Daniel's answer above, I had to add an additional '\n' to the end of data_to_post, and add a {"Content-Type": "application/x-ndjson"} header, to get it to work in Elasticsearch 6.3.
data_to_post = '\n'.join(json.dumps(d) for d in data) + "\n"
headers = {"Content-Type": "application/x-ndjson"}
r = requests.post("http://localhost:9200/_bulk?pretty", data=data_to_post, headers=headers)
Otherwise, I receive the error:
"The bulk request must be terminated by a newline [\\n]"
You can use the Python ndjson library to do it.
https://pypi.org/project/ndjson/
It contains JSONEncoder and JSONDecoder classes for easy use with other libraries, such as requests:
import ndjson
import requests
response = requests.get('https://example.com/api/data')
items = response.json(cls=ndjson.Decoder)
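For the encoding direction, here is a sketch using the same data list as in Daniel's answer; ndjson.dumps joins the objects with newlines for you:

import ndjson
import requests

payload = ndjson.dumps(data) + "\n"  # the bulk API requires a trailing newline
headers = {"Content-Type": "application/x-ndjson"}
r = requests.post("http://localhost:9200/_bulk?pretty", data=payload, headers=headers)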
I have a simple MongoDB collection that I am accessing using PyMongo in my Python script.
I am filtering the query in Python using the dictionary:
{ "$and" : [
{ "bettinginterests" : { "$elemMatch" : { "runner.name" : "Jailhouse King" } } },
{ "bettinginterests" : { "$elemMatch" : { "runner.name" : "Tyrone Haji" } } }
]
}
And this returns correct results. However, I would like to expand the filter to be:
{ "$and" : [
{ "bettinginterests" : { "$elemMatch" : { "runner.name" : "Jailhouse King" } } },
{ "bettinginterests" : { "$elemMatch" : { "runner.name" : "Tyrone Haji" } } },
{ "summary.dist" : "1" }
]
}
And this is returning an empty result set. Now when I do this same query in my MongoDB client using:
db.race_results.find({ "$and" : [
    { "bettinginterests" : { "$elemMatch" : { "runner.name" : "Jailhouse King" } } },
    { "bettinginterests" : { "$elemMatch" : { "runner.name" : "Tyrone Haji" } } },
    { "summary.dist" : "1" }
]})
The results are returned correctly as expected.
I don't see any difference between the Python dictionary being passed as the query filter and the JS code being executed in my MongoDB client.
Does anyone see where there might be a difference? I'm at a loss here.
UPDATE:
Here is a sample record in my DB:
https://gist.github.com/brspurri/8cefcd20a7f995145a81
UPDATE 2:
Python code to perform the query:
runner = "Jailhouse King"
opponent = "Tyrone Haji"
query_filter = {"$and": [
    {"bettinginterests": {"$elemMatch": {"runner.name": runner}}},
    {"bettinginterests": {"$elemMatch": {"runner.name": opponent}}},
    {"summary.dist": "1"}
]}
try:
    collection = db.databases['race_results']
    entities = None
    if not query_filter:
        entities = collection.find().sort([("date", -1)])
    else:
        entities = collection.find(query_filter).sort([("date", -1)])
except BaseException as e:
    print('An error occurred in query: %s\n' % e)
This line is probably the culprit.
collection = db.databases['race_results']
If db is your database, you are doing it wrong. It should be:
collection = db['race_results']
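Putting it together, a minimal sketch of the corrected access pattern with PyMongo; the connection details are assumptions:

from pymongo import MongoClient

client = MongoClient("localhost", 27017)  # example connection
db = client["mydb"]                       # the database (name assumed)
collection = db["race_results"]           # the collection, not db.databases[...]
entities = collection.find(query_filter).sort([("date", -1)])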