Elasticsearch-Python 2.7-Configure an index for analyzer

I am trying to build an index using the Python API with the following code. In particular, I am trying to configure an analyzer:
doc = {
"settings": {
"analysis": {
"analyzer": {
"folding": {
"tokenizer": "standard",
"filter": [ "lowercase", "asciifolding" ]
}
}
}
}
}
res = es.indices.create(index='index_db',body=doc)
But when I feed the index some example data, 'My œsophagus caused a débâcle' (the same example used on the website), I don't obtain 'my, oesophagus, caused, a, debacle' but instead 'my, œsophagus, caused, a, débâcle'. I think the problem is in the creation of the index. Am I using the correct syntax?

After several attempts I found the solution. It was a syntax problem.
The correct syntax is:
doc = {
"index" : {
"analysis" : {
"analyzer" : {
"default" : {
"tokenizer" : "standard",
"filter" : ["standard", "asciifolding"]
}
}
}
}
}
es.indices.create(index='forensic_db',body=doc)
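
A minimal sketch of how the result could be checked, offered as an illustration rather than as the author's method. A custom analyzer with a name other than "default" is only applied to fields whose mapping references it, which is one plausible reason the first attempt appeared to do nothing. The helper names folding_settings and analyzed_tokens are hypothetical, and es is assumed to be an elasticsearch.Elasticsearch client that still accepts the body parameter.

```python
def folding_settings(analyzer_name="default"):
    """Build index settings with an ASCII-folding analyzer.

    Naming the analyzer "default" makes it apply to every text field
    that does not declare its own analyzer.
    """
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    analyzer_name: {
                        "tokenizer": "standard",
                        "filter": ["lowercase", "asciifolding"],
                    }
                }
            }
        }
    }


def analyzed_tokens(es, index, text, analyzer=None):
    """Ask the index how it would tokenize `text` via the _analyze API.

    With no analyzer named, the index's default analyzer is used.
    """
    body = {"text": text}
    if analyzer is not None:
        body["analyzer"] = analyzer
    response = es.indices.analyze(index=index, body=body)
    return [token["token"] for token in response["tokens"]]


# Usage against a running cluster (not executed here):
#   es.indices.create(index="index_db", body=folding_settings())
#   analyzed_tokens(es, "index_db", "My œsophagus caused a débâcle")
```

Inspecting the tokens this way separates analyzer problems from indexing or query problems before any documents are loaded.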

Related

Mongodb find nested dict element

{
"_id" : ObjectId("63920f965d15e98e3d7c450c"),
"first_name" : "mymy",
"last_activity" : 1669278303.4341061,
"username" : null,
"dates" : {
"29.11.2022" : {
},
"30.11.2022" : {
}
},
"user_id" : "1085116517"
}
How can I find all documents whose dates field contains the key 29.11.2022? I tried many things, but in all of them the dot is interpreted as a path separator rather than as part of the key.

Use $getField in $expr.
db.collection.find({
$expr: {
$eq: [
{},
{
"$getField": {
"field": "29.11.2022",
"input": "$dates"
}
}
]
}
})
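
For Python callers, the same filter can be passed to PyMongo as a plain dictionary. The sketch below is a hedged illustration: the helper name dotted_key_filter is hypothetical, $getField requires MongoDB 5.0 or later, and the any_value variant (matching whatever the key holds instead of only an empty sub-document) is an assumption about what the question intends.

```python
def dotted_key_filter(parent, key, any_value=False):
    """Build a find() filter for a key that contains dots.

    $getField reads the key literally, so "29.11.2022" is not split
    into the path 29 -> 11 -> 2022.
    """
    get_field = {"$getField": {"field": key, "input": "$" + parent}}
    if any_value:
        # $type yields "missing" when the key is absent from the sub-document.
        return {"$expr": {"$ne": [{"$type": get_field}, "missing"]}}
    # Same shape as the shell query above: the value must equal {}.
    return {"$expr": {"$eq": [{}, get_field]}}


# Usage with a PyMongo collection (not executed here):
#   collection.find(dotted_key_filter("dates", "29.11.2022", any_value=True))
```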

How to change the syntax in Elasticsearch 8 where the 'body' parameter is deprecated?

After updating the Python package elasticsearch from 7.6.0 to 8.1.0, I started to receive an error at this line of code:
count = es.count(index=my_index, body={'query': query['query']} )["count"]
I receive the following error message:
DeprecationWarning: The 'body' parameter is deprecated and will be
removed in a future version. Instead use individual parameters.
count = es.count(index=ums_index, body={'query': query['query']}
)["count"]
I don't understand how to use the above-mentioned "individual parameters".
Here is my query:
query = {
"bool": {
"must":
[
{"exists" : { "field" : 'device'}},
{"exists" : { "field" : 'app_version'}},
{"exists" : { "field" : 'updatecheck'}},
{"exists" : { "field" : 'updatecheck_status'}},
{"term" : { "updatecheck_status" : 'ok'}},
{"term" : { "updatecheck" : 1}},
{
"range": {
"@timestamp": {
"gte": from_date,
"lte": to_date,
"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd"
}
}
}
],
"must_not":
[
{"term" : { "device" : ""}},
{"term" : { "updatecheck" : ""}},
{"term" : { "updatecheck_status" : ""}},
{
"terms" : {
"app_version" : ['2.2.1.1', '2.2.1.2', '2.2.1.3', '2.2.1.4', '2.2.1.5',
'2.2.1.6', '2.2.1.7', '2.1.2.9', '2.1.3.2', '0.0.0.0', '']
}
}
]
}
}
In the official documentation, I can't find any examples of how to pass my query in new versions of Elasticsearch.
Does anyone have a solution for this case other than reverting to a previous version of Elasticsearch?

According to the documentation, this is now to be done as follows:
# ✅ New usage:
es.search(query={...})
# ❌ Deprecated usage:
es.search(body={"query": {...}})
So the query is passed directly as a keyword argument, without "body". Substitute the API you need, in your case "count" instead of "search".
You can try the following:
# ✅ New usage:
es.count(query={...})
# ❌ Deprecated usage:
es.count(body={"query": {...}})
You can find out more by clicking on the following link:
https://github.com/elastic/elasticsearch-py/issues/1698
For example, if the request is:
GET index-00001/_count
{
"query" : {
"match_all": {
}
}
}
the equivalent Python client call would be:
my_index = "index-00001"
query = {
"match_all": {
}
}
hits = es.count(index=my_index, query=query)
or
hits = es.count(index=my_index, query={"match_all": {}})
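
One way to migrate existing calls mechanically is to unpack the old body dictionary into keyword arguments. This is a sketch, not an official migration tool: body_to_kwargs is a hypothetical helper, and the only renaming it handles is "from", which the 8.x client exposes as from_ because from is a Python keyword.

```python
def body_to_kwargs(body):
    """Turn a legacy request body into keyword arguments for the 8.x client."""
    renamed = {"from": "from_"}  # body keys that clash with Python keywords
    return {renamed.get(key, key): value for key, value in body.items()}


count_kwargs = body_to_kwargs({"query": {"match_all": {}}})
search_kwargs = body_to_kwargs({"query": {"match_all": {}}, "from": 10, "size": 5})

# Usage against a running cluster (not executed here):
#   es.count(index=my_index, **count_kwargs)
#   es.search(index=my_index, **search_kwargs)
```

Unpacking keeps the query dictionaries themselves unchanged, so only the call sites need to be touched.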

Using Elasticsearch 8.4.1, I got the same warning when creating indices via Python client.
I had to do it this way instead:
settings = {
"number_of_shards": 2,
"number_of_replicas": 1
}
mappings = {
"dynamic": "true",
"numeric_detection": "true",
"_source": {
"enabled": "true"
},
"properties": {
"p_text": {
"type": "text"
},
"p_vector": {
"type": "dense_vector",
"dims": 768
},
}
}
es.indices.create(index=index_name, settings=settings, mappings=mappings)
Hope this helps.

How to get all documents with the max date?

I'm trying to get, from my MongoDB collection, all documents with the latest date.
My data looks like this:
_id:"1"
date:"21-12-20"
report:"some stuff"
_id:"2"
date:"11-11-11"
report:"qualcosa"
_id:5fe08735b5a28812866cbc8a
date:"21-12-20"
report:Object
_id:5fe0b35e2f465c2a2bbfc0fd
date:"20-12-20"
report:"ciao"
and I would like to get a result like this:
_id:"1"
date:"21-12-20"
report:"some stuff"
_id:5fe08735b5a28812866cbc8a
date:"21-12-20"
report:Object
I tried to run this script :
db.collection.find({}).sort([("date", -1)]).limit(1)
but it gives me only one document.
How can I get all the documents with the greatest date automatically?

Try to remove limit(1) and it's gonna work

If you add .limit(1) it's only ever going to give you one document.
Either use the result as the query for another .find(), or write an aggregate query. If your data set is of modest size, I prefer the former for clarity.
max_date = list(db.collection.find({}).sort([("date", -1)]).limit(1))
if len(max_date) > 0:
    db.collection.find({'date': max_date[0]['date']})
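
Note that sorting on the stored field compares strings, and strings such as "21-12-20" only sort chronologically when the year comes first. A hedged sketch of a client-side alternative that parses the dates before comparing them follows. The format "%d-%m-%y" is an assumption (the question does not state the layout), and latest_documents is a hypothetical helper that works on any iterable of documents, for example the result of db.collection.find({}).

```python
from datetime import datetime


def latest_documents(documents, date_format="%d-%m-%y"):
    """Return every document whose parsed date equals the newest date."""
    docs = list(documents)
    if not docs:
        return []
    parsed = [datetime.strptime(doc["date"], date_format) for doc in docs]
    newest = max(parsed)
    return [doc for doc, when in zip(docs, parsed) if when == newest]


# Sample data copied from the question.
sample = [
    {"_id": "1", "date": "21-12-20", "report": "some stuff"},
    {"_id": "2", "date": "11-11-11", "report": "qualcosa"},
    {"_id": "5fe08735b5a28812866cbc8a", "date": "21-12-20", "report": {}},
    {"_id": "5fe0b35e2f465c2a2bbfc0fd", "date": "20-12-20", "report": "ciao"},
]
print([doc["_id"] for doc in latest_documents(sample)])
```

This loads the whole result set into memory, so it only suits modest collections; storing real date values instead of strings would let the server do the comparison.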

Use an aggregation pipeline like this:
db.collection.aggregate([
{ $group: { _id: null, data: { $push: "$$ROOT" } } },
{
$set: {
data: {
$filter: {
input: "$data",
cond: { $eq: [{ $max: "$data.date" }, "$$this.date"] }
}
}
}
},
{ $unwind: "$data" },
{ $replaceRoot: { newRoot: "$data" } }
])
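
In PyMongo the same pipeline is written as a list of dictionaries with quoted keys. The sketch below uses a hypothetical helper, max_date_pipeline. Because the $group stage collects every document into a single array, the approach is limited by the 16 MB BSON document size and is best suited to small collections.

```python
def max_date_pipeline(field="date"):
    """Build the aggregation pipeline above for use with PyMongo."""
    return [
        # Collect all documents into one array.
        {"$group": {"_id": None, "data": {"$push": "$$ROOT"}}},
        # Keep only the entries whose field equals the array-wide maximum.
        {
            "$set": {
                "data": {
                    "$filter": {
                        "input": "$data",
                        "cond": {"$eq": [{"$max": "$data." + field}, "$$this." + field]},
                    }
                }
            }
        },
        # Turn the filtered array back into individual documents.
        {"$unwind": "$data"},
        {"$replaceRoot": {"newRoot": "$data"}},
    ]


# Usage with a PyMongo collection (not executed here):
#   list(db.collection.aggregate(max_date_pipeline()))
```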

How to search with synonyms in Elasticsearch using Python

I don't understand how to implement an Elasticsearch query together with a synonym table. I have no problems with a general query, but incorporating synonyms has become an issue for me.
es.search(index='data_inex', body={
"query": {
"match": {"inex": "tren"}
},
"settings": {
"filter": {
"synonym": {
"type": "synonym",
"lenient": true,
"synonyms": [ "foo, baz", "tren, hut" ]
}
}
}
}
)
Also, is it possible to use a file instead of this array?

Check the Elasticsearch documentation for the synonym token filter.
You can also configure a synonyms file:
PUT /test_index
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"synonym": {
"tokenizer": "whitespace",
"filter": [ "synonym" ]
}
},
"filter": {
"synonym": {
"type": "synonym",
"synonyms_path": "analysis/synonym.txt" // <======== location of synonym file
}
}
}
}
}
}
Please note:
Changes to the synonyms file are not reflected in documents indexed before the change; those documents must be re-indexed.
You cannot change the mapping (including the analyzer) of an existing field. To change the mapping for existing documents, reindex them into another index with the updated mapping.
A search query does not support "settings"; analysis settings belong in the index definition.
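
From Python, the notes above translate into two separate steps: define the synonym filter when the index is created, then search with a plain query. The sketch below is an illustration under assumptions: synonym_index_body is a hypothetical helper, the typeless mappings block assumes Elasticsearch 7 or later, the field name inex is taken from the question, and the index name data_inex_v2 in the usage comment is made up.

```python
def synonym_index_body(field, synonyms=None, synonyms_path=None):
    """Build an index definition whose `field` is analyzed with synonyms.

    Pass either an inline list of rules or a path to a synonyms file.
    """
    if (synonyms is None) == (synonyms_path is None):
        raise ValueError("pass exactly one of synonyms or synonyms_path")
    synonym_filter = {"type": "synonym"}
    if synonyms is not None:
        synonym_filter["synonyms"] = list(synonyms)
    else:
        synonym_filter["synonyms_path"] = synonyms_path
    return {
        "settings": {
            "index": {
                "analysis": {
                    "analyzer": {
                        "synonym": {"tokenizer": "whitespace", "filter": ["synonym"]}
                    },
                    "filter": {"synonym": synonym_filter},
                }
            }
        },
        # The analyzer only takes effect on fields that reference it.
        "mappings": {"properties": {field: {"type": "text", "analyzer": "synonym"}}},
    }


index_body = synonym_index_body("inex", synonyms=["foo, baz", "tren, hut"])
search_body = {"query": {"match": {"inex": "tren"}}}

# Usage against a running cluster (not executed here):
#   es.indices.create(index="data_inex_v2", body=index_body)
#   es.search(index="data_inex_v2", body=search_body)
```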

How can I search for multiple values in Elasticsearch with Python?

body = {
"query":{
"bool":{
"must":{
'terms':{
"reason":["A","B"]
}
}
}
}
}
The 'reason' field is in _source.
I want to find documents with reason A or reason B in index='test_index' using Python.
But this code finds nothing; the result is empty.
When I use "/_search?q=reason:A|B&size=50&from=5000", the result is correct.
I want to get the same result in Python.
How can I do that?

Try this.
test_index/_search
{
"query": {
"bool" :{
"should" : [
{ "term" : { "reason" : "A" } },
{ "term" : { "reason" : "B" } }
]
}
}
}
must is a strict match, analogous to AND in an SQL query; should is analogous to OR.

This is not specific to Python, since the body is just your query passed as a dict.
Terms Query:
https://www.elastic.co/guide/en/elasticsearch/reference/7.4/query-dsl-terms-query.html
ES Client for Python:
https://elasticsearch-py.readthedocs.io/en/master/
This should work:
body = {"query":{"bool":{"must":{'terms':{"reason":["A","B"]}}}}}
res = es.search(index="test_index", body=body)
Check:
res['hits']['hits']
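
If the query still returns nothing, one common cause (an assumption here, since the thread does not show the mapping) is that reason is an analyzed text field. The standard analyzer lowercases "A" to "a" at index time, while term and terms queries do not analyze their input; the q=reason:A|B query string is analyzed, which would explain why it matches. The sketch below builds both variants. terms_search_body is a hypothetical helper, and the reason.keyword sub-field exists only if the index uses the default dynamic mapping for strings.

```python
def terms_search_body(field, values):
    """Match documents whose `field` equals any of `values` exactly."""
    return {"query": {"bool": {"must": {"terms": {field: list(values)}}}}}


# Variant 1: query the analyzed field with the lowercased tokens.
analyzed_body = terms_search_body("reason", ["a", "b"])

# Variant 2: query the un-analyzed keyword sub-field with the original values.
keyword_body = terms_search_body("reason.keyword", ["A", "B"])

# Usage against a running cluster (not executed here):
#   res = es.search(index="test_index", body=keyword_body)
#   res["hits"]["hits"]
```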
