Elasticsearch has the capability of taking custom scripts written in Python, but I can't find an example of such a script anywhere. Does anybody have an example of a working script? One with something as simple as an if-statement would be amazing.
A simple custom scoring query using Python (assuming you have the Python language plugin installed):
{
"sort": [
{
"_score": {
"order": "desc"
}
}
],
"query": {
"function_score": {
"query": {
"match_all": {}
},
"script_score": {
"lang": "python",
"script": [
"if _score:",
" _score"
]
},
"boost_mode": "replace"
}
},
"track_scores": true
}
Quoted from the elasticsearch mailing list:
Luca pointed out that ES calls python with an 'eval'
PyObject ret = interp.eval((PyCode) compiledScript);
Just make sure your code passes through the eval.
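If it helps, here is a minimal sketch of sending that query with the elasticsearch-py client; the host, index name and hit handling are placeholders, and it assumes the Python language plugin is installed on the cluster.
from elasticsearch import Elasticsearch

es = Elasticsearch()  # defaults to localhost:9200; adjust for your cluster

body = {
    "sort": [{"_score": {"order": "desc"}}],
    "query": {
        "function_score": {
            "query": {"match_all": {}},
            "script_score": {
                "lang": "python",
                # Plain Python lines; the value of the last expression
                # becomes the document's score.
                "script": ["if _score:", " _score"],
            },
            "boost_mode": "replace",
        }
    },
    "track_scores": True,
}

response = es.search(index="my-index", body=body)
print(response["hits"]["hits"])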
Is there an example of using the stepfunctions.steps.Parallel class from the Python AWS Step Functions Data Science SDK?
Parallel execution requires branches, but I can't seem to find the methods for defining them, or documentation describing them.
Generating a sequential chain of steps works fine, but I can't find how to define the parallel step. Does anyone know?
Are there any other libraries that can do this? As far as I have looked, boto3 doesn't have this functionality, and the CDK is not suitable, as this will be a service.
I'd like to be able to generate something like this using just code:
{
"Comment": "Parallel Example.",
"StartAt": "LookupCustomerInfo",
"States": {
"LookupCustomerInfo": {
"Type": "Parallel",
"End": true,
"Branches": [
{
"StartAt": "LookupAddress",
"States": {
"LookupAddress": {
"Type": "Task",
"Resource":
"arn:aws:lambda:us-east-1:123456789012:function:AddressFinder",
"End": true
}
}
},
{
"StartAt": "LookupPhone",
"States": {
"LookupPhone": {
"Type": "Task",
"Resource":
"arn:aws:lambda:us-east-1:123456789012:function:PhoneFinder",
"End": true
}
}
}
]
}
}
}
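Not an authoritative answer, but here is a minimal sketch of how I read the SDK: Task, Parallel and add_branch come from the stepfunctions package documentation, while importing Graph and calling to_json(pretty=True) to render the definition are assumptions of mine, so verify them against the version you have installed.
from stepfunctions.steps import Parallel, Task
from stepfunctions.steps.graph import Graph  # assumed import path

lookup_address = Task(
    "LookupAddress",
    resource="arn:aws:lambda:us-east-1:123456789012:function:AddressFinder",
)
lookup_phone = Task(
    "LookupPhone",
    resource="arn:aws:lambda:us-east-1:123456789012:function:PhoneFinder",
)

# Each add_branch() call contributes one entry to the "Branches" array
# of the Parallel state.
parallel = Parallel("LookupCustomerInfo")
parallel.add_branch(lookup_address)
parallel.add_branch(lookup_phone)

# Render the Amazon States Language JSON without deploying anything.
print(Graph(parallel).to_json(pretty=True))
The same Parallel state can also be chained with other steps and passed as the definition of a stepfunctions.workflow.Workflow if you do want to deploy it.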
Attached is an example Avro schema:
{
"type": "record",
"name": "DummySampleAvroValue",
"namespace": "de.company.dummydomain",
"fields": [
{
"name": "ID",
"type": "int"
},
{
"name": "NAME",
"type": [
"null",
"string"
]
},
{
"name": "STATE",
"type": "int"
},
{
"name": "TIMESTAMP",
"type": [
"null",
"string"
]
}
]
}
According to the "JSON Encoding" section of the official Avro specification (see https://avro.apache.org/docs/current/spec.html#json_encoding), a JSON message that validates against the above Avro schema should look like the following, because of the union types used:
{
"ID":1,
"NAME":{
"string":"Kafka"
},
"STATE":-1,
"TIMESTAMP":{
"string":"2022-04-28T10:57:03.048413"
}
}
When producing this message via the Confluent REST Proxy (Avro), everything works fine: the data is accepted, validated and present in Kafka.
When using the SerializingProducer from the confluent_kafka Python package, however, the example message is not accepted and only "regular" JSON works, e.g.:
{
"ID":1,
"NAME":"Kafka",
"STATE":-1,
"TIMESTAMP":"2022-04-28T10:57:03.048413"
}
Is this intended behaviour or am I doing something wrong? Can I tell the SerializingProducer to accept this encoding?
I need to keep both ways of producing messages open, but the sending system can (or wants to) provide only one of the above payloads. Is there a way to use both with the same payload?
Thanks in advance.
Best regards
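Not a definitive answer, but as far as I can tell the AvroSerializer used with the SerializingProducer expects a plain Python dict and performs the Avro union encoding itself, which is why only the "regular" JSON form is accepted. One way to keep a single payload format is to strip the REST-proxy-style union wrappers before serialising; the helper below is purely illustrative, and the broker/registry URLs, schema file and topic name are placeholders.
from confluent_kafka import SerializingProducer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer

AVRO_PRIMITIVES = {"string", "int", "long", "float", "double", "boolean", "bytes"}

def unwrap_union_encoding(value):
    """Recursively turn {"NAME": {"string": "Kafka"}} into {"NAME": "Kafka"}."""
    if isinstance(value, dict):
        if len(value) == 1 and next(iter(value)) in AVRO_PRIMITIVES:
            return unwrap_union_encoding(next(iter(value.values())))
        return {key: unwrap_union_encoding(val) for key, val in value.items()}
    if isinstance(value, list):
        return [unwrap_union_encoding(item) for item in value]
    return value

schema_str = open("DummySampleAvroValue.avsc").read()  # the schema shown above
schema_registry = SchemaRegistryClient({"url": "http://localhost:8081"})
producer = SerializingProducer({
    "bootstrap.servers": "localhost:9092",
    "value.serializer": AvroSerializer(schema_registry, schema_str),
})

rest_proxy_style = {
    "ID": 1,
    "NAME": {"string": "Kafka"},
    "STATE": -1,
    "TIMESTAMP": {"string": "2022-04-28T10:57:03.048413"},
}

# unwrap_union_encoding() leaves the "regular" JSON form untouched, so both
# payload shapes can go through the same producer.
producer.produce(topic="dummy-topic", value=unwrap_union_encoding(rest_proxy_style))
producer.flush()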
I am trying to create a custom analyzer with the Elasticsearch Python client. I'm referring to this article in the Elasticsearch documentation:
elastic docs article
When I send a PUT request with the following JSON settings, it returns 200 Success.
PUT my-index-000001
{
"settings": {
"analysis": {
"analyzer": {
"my_custom_analyzer": {
"char_filter": [
"emoticons"
],
"tokenizer": "punctuation",
"filter": [
"lowercase",
"english_stop"
]
}
},
"tokenizer": {
"punctuation": {
"type": "pattern",
"pattern": "[ .,!?]"
}
},
"char_filter": {
"emoticons": {
"type": "mapping",
"mappings": [
":) => _happy_",
":( => _sad_"
]
}
},
"filter": {
"english_stop": {
"type": "stop",
"stopwords": "_english_"
}
}
}
}
}
The issue comes when I try to do the same with the Python client. Here's how I am using it.
settings.py to define settings
settings = {
"settings": {
"analysis": {
"analyzer": {
"my_custom_analyzer": {
"char_filter": [
"emoticons"
],
"tokenizer": "punctuation",
"filter": [
"lowercase",
"english_stop"
]
}
},
"tokenizer": {
"punctuation": {
"type": "pattern",
"pattern": "[ .,!?]"
}
},
"char_filter": {
"emoticons": {
"type": "mapping",
"mappings": [
":) => _happy_",
":( => _sad_"
]
}
},
"filter": {
"english_stop": {
"type": "stop",
"stopwords": "_english_"
}
}
}
}
}
create-index helper method
es_connection.create_index(index_name="test", mapping=mapping, settings=settings)
es-client call
def create_index(self, index_name: str, mapping: Dict, settings) -> None:
"""
Create an ES index.
:param index_name: Name of the index.
:param mapping: Mapping of the index.
:param settings: Settings of the index (analysis configuration).
"""
logging.info(f"Creating index {index_name} with the following schema: {json.dumps(mapping, indent=2)}")
self.es_client.indices.create(index=index_name, ignore=400, mappings=mapping, settings=settings)
I get the following error in the logs:
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"unknown setting [index.settings.analysis.analyzer.my_custom_analyzer.char_filter] please check that any required plugins are installed, or check the breaking changes documentation for removed settings"}],"type":"illegal_argument_exception","reason":"unknown setting [index.settings.analysis.analyzer.my_custom_analyzer.char_filter] please check that any required plugins are installed, or check the breaking changes documentation for removed settings","suppressed":[{"type":"illegal_argument_exception","reason":"unknown setting [index.settings.analysis.analyzer.my_custom_analyzer.filter] please check that any required plugins are installed, or check the breaking changes documentation for removed settings"},{"type":"illegal_argument_exception","reason":"unknown setting [index.settings.analysis.analyzer.my_custom_analyzer.tokenizer] please check that any required plugins are installed, or check the breaking changes documentation for removed settings"},{"type":"illegal_argument_exception","reason":"unknown setting [index.settings.analysis.char_filter.emoticons.mappings] please check that any required plugins are installed, or check the breaking changes documentation for removed settings"},{"type":"illegal_argument_exception","reason":"unknown setting [index.settings.analysis.char_filter.emoticons.type] please check that any required plugins are installed, or check the breaking changes documentation for removed settings"},{"type":"illegal_argument_exception","reason":"unknown setting [index.settings.analysis.filter.english_stop.stopwords] please check that any required plugins are installed, or check the breaking changes documentation for removed settings"},{"type":"illegal_argument_exception","reason":"unknown setting [index.settings.analysis.filter.english_stop.type] please check that any required plugins are installed, or check the breaking changes documentation for removed settings"},{"type":"illegal_argument_exception","reason":"unknown setting [index.settings.analysis.tokenizer.punctuation.pattern] please check that any required plugins are installed, or check the breaking changes documentation for removed settings"},{"type":"illegal_argument_exception","reason":"unknown setting [index.settings.analysis.tokenizer.punctuation.type] please check that any required plugins are installed, or check the breaking changes documentation for removed settings"}]},"status":400}
Any idea what causes this issue? Is it related to ignore=400? Thanks in advance.
PS - I'm using docker.elastic.co/elasticsearch/elasticsearch:7.15.1 and python elasticsearch client 7.15.1
You simply need to remove the settings section at the top because it's added automatically by the client code:
settings = {
"settings": { <--- remove this line
"analysis": {
"analyzer": {
I am using the Elasticsearch Python client to connect to Elasticsearch.
While trying to add a mapping to an index, I am getting the following warnings:
es.indices.put_mapping(index=index, body=mappings)
/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/base.py:209: ElasticsearchWarning: }}], attempted to validate it with the following match_mapping_type: [string], caused by [unknown parameter [search_analyzer] on mapper [__dynamic__attributes] of type [keyword]]
/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/base.py:209: ElasticsearchWarning: }}], attempted to validate it with the following match_mapping_type: [string], caused by [unknown parameter [search_analyzer] on mapper [__dynamic__metadata] of type [keyword]]
warnings.warn(message, category=ElasticsearchWarning)
And while indexing a record, I got these warnings:
/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/base.py:209: ElasticsearchWarning: Parameter [search_analyzer] is used in a dynamic template mapping and has no effect on type [keyword]. Usage will result in an error in future major versions and should be removed.
warnings.warn(message, category=ElasticsearchWarning)
/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/base.py:209: ElasticsearchWarning: Parameter [analyzer] is used in a dynamic template mapping and has no effect on type [keyword]. Usage will result in an error in future major versions and should be removed.
warnings.warn(message, category=ElasticsearchWarning)
I am using Elasticsearch 7.15.1.
pip packages:
elasticsearch==7.15.1
elasticsearch-dsl==7.4.0
My settings and mappings are:
settings = {
    "analysis": {
        "analyzer": {
            "my_analyzer": {
                "type": "custom",
                "tokenizer": "keyword",
                "filter": ["trim"]
            }
        }
    }
}
mappings = {"dynamic_templates": [
{"attributes": {
"match_mapping_type": "string",
"path_match": "attributes.*",
"mapping": {
"type": "keyword",
"analyzer": "my_analyzer",
"search_analyzer": "my_analyzer"
}
}
},
{"metadata": {
"match_mapping_type": "string",
"path_match": "metadata.*",
"mapping": {
"type": "keyword",
"analyzer": "my_analyzer",
"search_analyzer": "my_analyzer"
}
}
}
]
}
I need help adjusting the mapping. It was working fine on Elasticsearch 6.0.1; after upgrading to 7.15.1 I started getting these warnings.
You are trying to set an analyzer on a keyword field. The Elasticsearch analyzer documentation states at the top of the page:
Only text fields support the analyzer mapping parameter.
You have to change the type of your field to text, or specify no analyzer at all for the keyword fields. You can also use normalizers to apply token filters to your keyword fields, as mentioned in the answer to this question on the Elastic discuss forum.
The trim token filter that you want to use is not explicitly mentioned in the list of compatible filters, but I tried it with the Kibana dev tools, and it seems to work:
PUT normalizer_trim
{
"settings": {
"analysis": {
"normalizer": {
"my_normalizer": {
"type": "custom",
"filter": ["lowercase", "trim"]
}
}
}
},
"mappings": {
"properties": {
"foo": {
"type": "keyword",
"normalizer": "my_normalizer"
}
}
}
}
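If it helps, a minimal sketch of how the question's settings and dynamic templates might be adjusted, replacing analyzer/search_analyzer on the keyword fields with a normalizer (es and index are the client and index name from the question):
settings = {
    "analysis": {
        "normalizer": {
            "my_normalizer": {
                "type": "custom",
                "filter": ["trim"]
            }
        }
    }
}

mappings = {
    "dynamic_templates": [
        {"attributes": {
            "match_mapping_type": "string",
            "path_match": "attributes.*",
            "mapping": {
                "type": "keyword",
                "normalizer": "my_normalizer"
            }
        }},
        {"metadata": {
            "match_mapping_type": "string",
            "path_match": "metadata.*",
            "mapping": {
                "type": "keyword",
                "normalizer": "my_normalizer"
            }
        }}
    ]
}

# The normalizer is part of the analysis settings, so it has to be defined
# on the index before the dynamic templates reference it.
es.indices.create(index=index, body={"settings": settings})
es.indices.put_mapping(index=index, body=mappings)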
I am new to MongoDB, and I have been learning some of the methods using pymongo version 3.8.0 and a Jupyter notebook. It had been going fine until I tried "$lookup"; now it has started throwing the error
OperationFailure: not authorized on aggregations to execute command. Any help or suggestions on solving this will be highly appreciated.
I have tried reinstalling the packages and enabling Windows administrator privileges, but so far that has not solved the problem.
OperationFailure: not authorized on aggregations to execute command
{ aggregate: "air_routes", pipeline: [ { $match: { airplane: { $regex: "747|380" } } }, { $lookup: { from: "air_alliance", localField: "airline.name", foreignField: "airlines", as: "data_src" } },
{ $unwind: "$data_src" }, { $group: { _id: { name: "$name", airlines: "$airlines" }, numberofflights: { $sum: 1 } } }, { $sort: { numberofflights: -1 } },
{ allowDiskUse: true } ], cursor: {}, lsid: { id: UUID("af942a3d-309b-4cd2-a99b-3ebcd60406f4") }, $clusterTime: { clusterTime: Timestamp(1557101096, 1),
signature: { hash: BinData(0, AD50B7BE136F58D794C75C6AD031E92168EF61D1), keyId: 6627672121604571137 } }, $db: "aggregations", $readPreference: { mode: "primary" } }
Please help resolve this issue. Thanks,
Okay, I have found the answer. Apparently it is a permissions-related issue: the second call to the database (the databases are stored on an Atlas cluster) was passing some parameters which were either coming back empty or were not fetched properly, for reasons that are still not clear. As a result, the second collection, "air_alliance", was reproducing the error.
A helpful thread is here: https://jira.mongodb.org/browse/CSHARP-1722
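For reference, a sketch of the same aggregation issued through pymongo; note that allowDiskUse is passed as an option to aggregate() rather than as a pipeline stage (in the command shown in the error it appears inside the pipeline array). The connection string is a placeholder.
from pymongo import MongoClient

client = MongoClient("mongodb+srv://<user>:<password>@<cluster>/")
db = client["aggregations"]

pipeline = [
    {"$match": {"airplane": {"$regex": "747|380"}}},
    {"$lookup": {
        "from": "air_alliance",
        "localField": "airline.name",
        "foreignField": "airlines",
        "as": "data_src",
    }},
    {"$unwind": "$data_src"},
    {"$group": {
        "_id": {"name": "$name", "airlines": "$airlines"},
        "numberofflights": {"$sum": 1},
    }},
    {"$sort": {"numberofflights": -1}},
]

# allowDiskUse is an option of the aggregate command, not a stage.
for doc in db.air_routes.aggregate(pipeline, allowDiskUse=True):
    print(doc)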