Indexing synonyms in Elasticsearch with Python

Problem description
I want to run a query string such as this one:
{"query": {
"query_string" : {
"fields" : ["description"],
"query" : "illegal~"
}
}
}
I have a separate synonyms.txt file that contains synonyms:
illegal, banned, criminal, illegitimate, illicit, irregular, outlawed, prohibited
otherWord, synonym1, synonym2...
I want to find all elements having any one of these synonyms.
What I tried
First I want to index those synonyms in my ES database.
I tried to run this query with curl :
curl -X PUT "https://instanceAdress.europe-west1.gcp.cloud.es.io:9243/app/kibana#/dev_tools/console/sources" -H 'Content-Type: application/json' -d' {
"settings": {
"index" : {
"analysis" : {
"analyzer" : {
"synonym" : {
"tokenizer" : "whitespace",
"filter" : ["synonym"]
}
},
"filter" : {
"synonym" : {
"type" : "synonym",
"synonyms_path" : "synonyms.txt"
}
}
}
}
}
}
'
but it fails with {"statusCode":404,"error":"Not Found"}.
I then need to change my query so that it takes the synonyms into account, but I have no idea how.
So my questions are:
How can I index my synonyms?
How can I change my query so that it searches for all synonyms?
Is there any way to index them in Python?
Example of a get query using the Python Elasticsearch client:
from elasticsearch import Elasticsearch
es = Elasticsearch(
    ['fullAdress.europe-west1.gcp.cloud.es.io'],
    http_auth=('login', 'password'),
    scheme="https",
    port=9243,
)
es.get(index="sources", doc_type='rcp', id="301495")

You can define the synonyms from Python. The helpers below appear to come from the elasticsearch-dsl package.
First, create a token filter:
from elasticsearch_dsl import analyzer, token_filter  # assumed source of these helpers
synonyms_token_filter = token_filter(
    'synonyms_token_filter',  # Any name for the filter
    'synonym',  # Synonym filter type
    synonyms=your_synonyms  # Synonyms mapping will be inlined
)
And then create an analyzer:
custom_analyzer = analyzer(
    'custom_analyzer',
    tokenizer='standard',
    filter=[
        'lowercase',
        synonyms_token_filter
    ])
There's also a package for this: https://github.com/agora-team/elasticsearch-synonyms
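The same analysis settings can also be written as a plain index body. The sketch below is only an illustration under assumptions: the helper names build_synonym_settings and build_synonym_query, the filter name synonym_filter, and the analyzer name synonym_analyzer are hypothetical, and the rules are inlined with the synonyms parameter because synonyms_path is resolved on the Elasticsearch node itself. The 404 in the question appears to come from the URL, which points at the Kibana console rather than at the Elasticsearch endpoint.

```python
def build_synonym_settings(synonym_lines):
    """Return an index-creation body with an inline synonym filter."""
    rules = []
    for line in synonym_lines:
        rule = line.strip()
        # Skip blank lines and comment lines from the synonyms file.
        if rule and not rule.startswith("#"):
            rules.append(rule)
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "synonym_filter": {"type": "synonym", "synonyms": rules}
                },
                "analyzer": {
                    "synonym_analyzer": {
                        "tokenizer": "standard",
                        "filter": ["lowercase", "synonym_filter"],
                    }
                },
            }
        }
    }


def build_synonym_query(text, field="description",
                        analyzer_name="synonym_analyzer"):
    """Return a match query that analyzes the search text with the synonym analyzer."""
    return {
        "query": {
            "match": {field: {"query": text, "analyzer": analyzer_name}}
        }
    }


# Lines as they would be read from synonyms.txt (shortened here).
lines = ["illegal, banned, criminal, illicit", ""]
settings = build_synonym_settings(lines)
query = build_synonym_query("illegal")
```

With the elasticsearch client, a body like settings would typically be passed when the index is created (for example es.indices.create(index="sources", body=settings)), and query would be passed to es.search. Both calls are assumptions about your setup and client version, not something the answer above confirms.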


MongoDB Python MongoEngine - Returning Document by filter of Embedded Documents Sum of Filtered property

I am using Python and MongoEngine to try and query the below Document in MongoDB.
I need a query to efficiently get the Documents only when they contain Embedded Documents 'Keywords' that match the following criteria:
Keywords Filtered where the Property 'SFR' is LTE '100000'
SUM the filtered keywords
Return the parent documents where SUM of the keywords matching the criteria is Greater than '9'
Example structure:
{
"_id" : ObjectId("5eae60e4055ef0e717f06a50"),
"registered_data" : ISODate("2020-05-03T16:12:51.999+0000"),
"UniqueName" : "SomeUniqueNameHere",
"keywords" : [
{
"keyword" : "carport",
"search_volume" : NumberInt(10532),
"sfr" : NumberInt(20127),
"percent_contribution" : 6.47,
"competing_product_count" : NumberInt(997),
"avg_review_count" : NumberInt(143),
"avg_review_score" : 4.05,
"avg_price" : 331.77,
"exact_ppc_bid" : 3.44,
"broad_ppc_bid" : 2.98,
"exact_hsa_bid" : 8.33,
"broad_hsa_bid" : 9.29
},
{
"keyword" : "party tent",
"search_volume" : NumberInt(6944),
"sfr" : NumberInt(35970),
"percent_contribution" : 4.27,
"competing_product_count" : NumberInt(2000),
"avg_review_count" : NumberInt(216),
"avg_review_score" : 3.72,
"avg_price" : 210.16,
"exact_ppc_bid" : 1.13,
"broad_ppc_bid" : 0.55,
"exact_hsa_bid" : 9.66,
"broad_hsa_bid" : 8.29
}
]
}
From my research, I believe an aggregate query might do what I am attempting.
Unfortunately, being new to MongoDB / MongoEngine, I am struggling to structure the query and have not found an example similar to what I am trying to do (a red flag, right?).
I did find an aggregate example, but I am unsure how to express my criteria in it. Something like this may be getting close, but it does not work:
pipeline = [
{
"$lte": {
"$sum" : {
"keywords" : {
"$lte": {
"keyword": 100000
}
}
}: 9
}
}
]
data = product.objects().aggregate(pipeline)
Any guidance would be greatly appreciated.
Thanks,
Ben
You can try something like this:
db.collection.aggregate([
  {
    $project: { // the first project to filter the keywords array
      registered_data: 1,
      UniqueName: 1,
      keywords: {
        $filter: {
          input: "$keywords",
          as: "item",
          cond: {
            $lte: [
              "$$item.sfr",
              100000
            ]
          }
        }
      }
    }
  },
  {
    $project: { // the second project to get the length of the keywords array
      registered_data: 1,
      UniqueName: 1,
      keywords: 1,
      keywordsLength: {
        $size: "$keywords"
      }
    }
  },
  {
    $match: { // then do the match
      keywordsLength: {
        $gte: 9
      }
    }
  }
])
You can test it on Mongo Playground.
Hope it helps.
Note: for simplicity, I used only the sfr property from the keywords array.
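Since the question uses MongoEngine, the same stages can be written as a Python list of dicts. This is a minimal sketch, not a tested query: the helper name build_keyword_pipeline and its parameters are hypothetical, and the last stage uses $gt because the question asks for a count greater than 9, whereas the shell version uses $gte.

```python
def build_keyword_pipeline(max_sfr=100000, min_matches=9):
    """Stages: filter keywords by sfr, count the survivors, keep large counts."""
    return [
        # Keep only the keywords whose sfr is at most max_sfr.
        {"$project": {
            "registered_data": 1,
            "UniqueName": 1,
            "keywords": {"$filter": {
                "input": "$keywords",
                "as": "item",
                "cond": {"$lte": ["$$item.sfr", max_sfr]},
            }},
        }},
        # Count the keywords that survived the filter.
        {"$project": {
            "registered_data": 1,
            "UniqueName": 1,
            "keywords": 1,
            "keywordsLength": {"$size": "$keywords"},
        }},
        # Strictly greater than min_matches, as the question asks.
        {"$match": {"keywordsLength": {"$gt": min_matches}}},
    ]


pipeline = build_keyword_pipeline()
# With MongoEngine this would be passed to the queryset, for example:
#   data = product.objects().aggregate(pipeline)
# (older MongoEngine releases expect the stages unpacked: aggregate(*pipeline))
```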

Extract values from oddly-nested Python

I must be really slow because I spent a whole day googling and trying to write Python code to simply list the "code" values only so my output will be Service1, Service2, Service2. I have extracted json values before from complex json or dict structure. But now I must have hit a mental block.
This is my json structure.
import json
myjson='''
{
"formatVersion" : "ABC",
"publicationDate" : "2017-10-06",
"offers" : {
"Service1" : {
"code" : "Service1",
"version" : "1a1a1a1a",
"index" : "1c1c1c1c1c1c1"
},
"Service2" : {
"code" : "Service2",
"version" : "2a2a2a2a2",
"index" : "2c2c2c2c2c2"
},
"Service3" : {
"code" : "Service4",
"version" : "3a3a3a3a3a",
"index" : "3c3c3c3c3c3"
}
}
}
'''
# Parse the JSON string above into a Python dict
somejson = json.loads(myjson)
print(somejson["offers"]) # I tried so many variations to no avail.
If you want the "code" values:
>>> [s['code'] for s in somejson['offers'].values()]
['Service1', 'Service2', 'Service4']
somejson["offers"] is a dictionary. It seems you want to print its keys.
In Python 2:
print(somejson["offers"].keys())
In Python 3:
print(list(somejson["offers"].keys()))
In Python 3, keys() returns a view rather than a list, so wrap it in list() when you need an actual list.
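A short sketch of the difference between the dictionary keys and the "code" values, using a trimmed-down copy of the question's data (the variable names are illustrative only). Note that the key Service3 holds the code Service4, so the two lists are not the same.

```python
import json

doc = json.loads(
    '{"offers": {"Service1": {"code": "Service1"},'
    ' "Service3": {"code": "Service4"}}}'
)
offers = doc["offers"]

keys_view = offers.keys()    # a dynamic view in Python 3, not a list
key_list = list(keys_view)   # materialize it when a real list is needed
codes = [entry["code"] for entry in offers.values()]

print(key_list)  # ['Service1', 'Service3']
print(codes)     # ['Service1', 'Service4']
```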
This should do the trick if you do not know in advance how many services the JSON contains.
import json
myjson='''
{
"formatVersion" : "ABC",
"publicationDate" : "2017-10-06",
"offers" : {
"Service1" : {
"code" : "Service1",
"version" : "1a1a1a1a",
"index" : "1c1c1c1c1c1c1"
},
"Service2" : {
"code" : "Service2",
"version" : "2a2a2a2a2",
"index" : "2c2c2c2c2c2"
},
"Service3" : {
"code" : "Service4",
"version" : "3a3a3a3a3a",
"index" : "3c3c3c3c3c3"
}
}
}
'''
# Parse the JSON string above into a Python dict
somejson = json.loads(myjson)
# Without knowing the service names in advance:
offers = somejson["offers"]
for service in offers:
    print(offers[service]["code"])

Perform nested search using elasticsearch dsl

Hi, I want to perform a nested search using elasticsearch-dsl. A document field holds nested JSON data, and I want only specific nested key values from it.
Below is the document:
{
"_index" : "data",
"_type" : "users",
"_id" : "15",
"_version" : 1,
"found" : true,
"_source" : {
"data" : {
"Gender" : "M",
"Marks" : "80",
"name" : "Mayank",
"Address" : "India"
},
"last_updated" : "2017-04-09T01:54:33.764573"
}
}
I only want the values of the fields listed in an array:
fields_want = ['name', 'Marks']
The output should look like this: {"name":"Mayank", "Marks":"80"}
The elasticsearch-dsl documentation is pretty hard for me to understand.
https://elasticsearch-dsl.readthedocs.io/en/latest/search_dsl.html#
DSL code:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
client = Elasticsearch()
s = Search(using=client, index="data") \
    .query("match", _type="users") \
    .query("match", _id=15)
response = s.execute()
for hit in s:
    print(hit.data)
From this code I can get the whole JSON object under the data field.
Can somebody guide me here?
It was solved.
I used source filtering to get only the nested fields:
client = Elasticsearch()
s = Search(using=client, index="data") \
    .query("match", _type="users") \
    .query("match", _id=15) \
    .source(['data.Name', 'data.Marks'])  # field names are case-sensitive and must match the stored keys
response = s.execute()
print(response)
Output:
{u'Name': u'Mayank', u'Marks': u'80'}
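To tie this back to the fields_want list from the question, here is a small sketch with two hypothetical helpers: source_paths builds the dotted paths that could be handed to .source(...), and pick_fields does the same selection locally on a _source dict. The sample data is copied from the document in the question.

```python
def source_paths(prefix, fields):
    """Build dotted field paths such as 'data.name' for source filtering."""
    return ["{}.{}".format(prefix, name) for name in fields]


def pick_fields(source, prefix, fields):
    """Select the wanted keys from the nested object stored under prefix."""
    nested = source.get(prefix, {})
    return {name: nested[name] for name in fields if name in nested}


fields_want = ["name", "Marks"]
sample_source = {
    "data": {"Gender": "M", "Marks": "80", "name": "Mayank", "Address": "India"},
    "last_updated": "2017-04-09T01:54:33.764573",
}

print(source_paths("data", fields_want))                # ['data.name', 'data.Marks']
print(pick_fields(sample_source, "data", fields_want))  # {'name': 'Mayank', 'Marks': '80'}
```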

How do I delete values from this document in MongoDB using Python

I have a document structured like this:
{
"_id" : ObjectId("564c0cb748f9fa2c8cdeb20f"),
"username" : "blah",
"useremail" : "blah@blahblah.com",
"groupTypeCustomer" : true,
"addedpartners" : [
"562f1a629410d3271ba74f74",
"562f1a6f9410d3271ba74f83"
],
"groupName" : "Mojito",
"groupTypeSupplier" : false,
"groupDescription" : "A group for fashion designers"
}
Now I want to delete one of the values from the 'addedpartners' array and update the document.
I just want to delete 562f1a6f9410d3271ba74f83 from the addedpartners array.
This is what I tried earlier:
db.myCollection.update({'_id':'564c0cb748f9fa2c8cdeb20f'},{'$pull':{'addedpartners':'562f1a6f9410d3271ba74f83'}})
db.myCollection.update(
    { _id: ObjectId(id) },
    { $pull: { 'addedpartners': '562f1a629410d3271ba74f74' } }
);
Try this:
db.myCollection.update({}, {$unset : {"addedpartners.1" : 1 }})
db.myCollection.update({}, {$pull : {"addedpartners" : null}})
There is no way to delete an array element by index directly. I think this will work, but I haven't tried it yet.
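In Python, the $pull update could look like the sketch below. It rests on assumptions: remove_partner is a hypothetical helper, and collection is expected to be a PyMongo collection (only its update_one method is used). The document shown in the question stores _id as an ObjectId, so passing the id as a plain string, as in the first attempt, would probably match nothing.

```python
def remove_partner(collection, doc_id, partner_id):
    """Pull one value out of the addedpartners array of a single document."""
    return collection.update_one(
        {"_id": doc_id},                           # doc_id should be an ObjectId
        {"$pull": {"addedpartners": partner_id}},  # removes every matching value
    )


# Intended use with PyMongo (not executed here):
#   from bson import ObjectId
#   remove_partner(db.myCollection,
#                  ObjectId("564c0cb748f9fa2c8cdeb20f"),
#                  "562f1a6f9410d3271ba74f83")
```

Keeping the collection as a parameter makes the helper easy to try against a stub before pointing it at real data.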

How to use "suggest" in elasticsearch pyes?

How do I use the "suggest" feature in pyes? I cannot figure it out because the documentation is sparse. Could someone provide a working example? Nothing I have tried appears to work. In the docs it is listed under query, but this does not work:
query = Suggest(fields="fieldname")
connectionobject.search(query=query)
Since version 5:
_suggest endpoint has been deprecated in favour of using suggest via _search endpoint. In 5.0, the _search endpoint has been optimized for suggest only search requests.
(from https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-suggesters.html)
A better way is to use the search API with the suggest option:
from elasticsearch import Elasticsearch
es = Elasticsearch()
text = 'ra'
suggest_dictionary = {
    "my-entity-suggest": {
        'text': text,
        "completion": {
            "field": "suggest"
        }
    }
}
query_dictionary = {'suggest': suggest_dictionary}
res = es.search(
    index='auto_sugg',
    doc_type='entity',
    body=query_dictionary)
print(res)
Make sure you have indexed each document with a suggest field:
sample_entity = {
    'id': 'test123',
    'name': 'Ramtin Seraj',
    'title': 'XYZ',
    "suggest": {
        "input": ['Ramtin', 'Seraj', 'XYZ'],
        "output": "Ramtin Seraj",
        "weight": 34  # a prior weight
    }
}
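Once the search returns, the suggestions sit under the suggest key of the response. The sketch below shows one way to pull out the suggested texts. The helper suggestion_texts is hypothetical, and sample_response is a hand-written illustration of the usual response shape, not real output from a cluster.

```python
def suggestion_texts(response, suggest_name):
    """Collect the text of every option returned for one named suggester."""
    texts = []
    for entry in response.get("suggest", {}).get(suggest_name, []):
        for option in entry.get("options", []):
            texts.append(option["text"])
    return texts


# Hand-written stand-in for the dict returned by es.search(...).
sample_response = {
    "suggest": {
        "my-entity-suggest": [
            {
                "text": "ra",
                "offset": 0,
                "length": 2,
                "options": [{"text": "Ramtin Seraj", "_score": 34.0}],
            }
        ]
    }
}

print(suggestion_texts(sample_response, "my-entity-suggest"))  # ['Ramtin Seraj']
```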
Here is my code which runs perfectly.
from elasticsearch import Elasticsearch
es = Elasticsearch()
text = 'ra'
suggDoc = {
    "entity-suggest": {
        'text': text,
        "completion": {
            "field": "suggest"
        }
    }
}
res = es.suggest(body=suggDoc, index="auto_sugg", params=None)
print(res)
I used the same client mentioned on the elasticsearch site here
I indexed the data in the elasticsearch index by using completion suggester from here
