I have created an Elasticsearch index, and one of its nested fields has the following mapping.
"groups": {
"type": "nested",
"properties": {
"name": {
"type": "text"
},
"value": {
"type": "text"
}
}
}
As for the ES version, it is 5.0, and I am using the official Python client elasticsearch-py on the client side. I want to query this nested field based on its value.
Let's say there is another field called name, which is a text field. I want to find all names starting with A that fall under the specified group.
Some sample data,
Groups - HR(name=HR, value=hr), Marketing(name=Marketing, value=marketing)
Names - Andrew, Alpha, Barry, John
Andrew and Alpha belong to group HR.
Based on this I tried a query
{
'query': {
'bool': {
'must': [{
'match_phrase_prefix': {
'title': 'A'
}
}]
},
'nested': {
'path': 'groups',
'query': {
'bool': {
'must': [{
'match': {
'groups.value': 'hr'
}
}]
}
}
}
}
}
I referred to the ES docs for this query, but it does not return anything. It would be great if someone could point out what is wrong with the query or the mapping itself.
You're almost there. The nested clause currently sits next to bool as a second top-level query, so you simply need to move it inside the bool/must array:
{
'query': {
'bool': {
'must': [
{
'match_phrase_prefix': {
'title': 'A'
}
},
{
'nested': {
'path': 'groups',
'query': {
'bool': {
'must': [{
'match': {
'groups.value': 'hr'
}
}]
}
}
}
}
]
}
}
}
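For use from elasticsearch-py, the corrected structure can be sketched as a small helper that builds the request body. This is only an illustration: the function name and the index name in the comment are made up, and the inner bool/must wrapper is dropped because it holds a single match clause.

```python
def build_group_prefix_query(prefix, group_value):
    """Return a search body: title prefix AND a nested match on groups.value."""
    return {
        "query": {
            "bool": {
                "must": [
                    # Clause 1: prefix match on the top-level title field.
                    {"match_phrase_prefix": {"title": prefix}},
                    # Clause 2: the nested query lives INSIDE bool/must,
                    # not next to bool at the top level.
                    {
                        "nested": {
                            "path": "groups",
                            "query": {"match": {"groups.value": group_value}},
                        }
                    },
                ]
            }
        }
    }


body = build_group_prefix_query("A", "hr")
# With a live cluster this could be sent as, for example:
# es.search(index="my_index", body=body)  # "my_index" is a placeholder name
```

The only top-level key under query is bool, which is the point of the fix.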
Say I have this:
search_object = {
'query': {
'bool' : {
'must' : {
'simple_query_string' : {
'query': search_text,
'fields': [ 'french_no_accents', 'def_no_accents', ],
},
},
'filter' : [
{ 'term' : { 'def_no_accents' : 'court', }, },
{ 'term' : { 'def_no_accents' : 'bridge', }, },
],
},
},
'highlight': {
'encoder': 'html',
'fields': {
'french_no_accents': {},
'def_no_accents': {},
},
'number_of_fragments' : 0,
},
}
... whatever search string I enter as search_text, its constituent terms are highlighted, but so are "court" and "bridge". I don't want "court" or "bridge" to be highlighted.
I've tried putting the "highlight" key-value in a different spot in the structure... nothing seems to work (i.e. a syntax exception is thrown).
More generally, is there a formal grammar anywhere specifying what you can and can't do with ES (v7) queries?
You could add a highlight query to limit what should and shouldn't get highlighted:
{
"query": {
"bool": {
"must": {
"simple_query_string": {
"query": "abc",
"fields": [
"french_no_accents",
"def_no_accents"
]
}
},
"filter": [
{ "term": { "def_no_accents": "court" } },
{ "term": { "def_no_accents": "bridge" } }
]
}
},
"highlight": {
"encoder": "html",
"fields": {
"*_no_accents": {
"highlight_query": {
"simple_query_string": {
"query": "abc",
"fields": [ "french_no_accents", "def_no_accents" ]
}
}
}
},
"number_of_fragments": 0
}
}
I've used a wildcard for the two fields (*_no_accents). If that matches unwanted fields too, you'll need to duplicate the highlight query on two separate, non-wildcard highlight fields like you originally had. Though I can't think of a scenario where that'd happen, since your simple_query_string query targets two concrete fields.
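Since the question builds the request as a Python dict, the same idea can be sketched as a helper that reuses one simple_query_string clause for both the main query and each field's highlight_query. The function name and its parameters are hypothetical, and it lists the highlight fields explicitly instead of using the wildcard.

```python
def build_search(search_text, fields, filter_terms):
    """Build a search body whose highlighting ignores the filter terms.

    fields       -- field names to search and highlight
    filter_terms -- (field, value) pairs applied as term filters
    """
    text_query = {
        "simple_query_string": {"query": search_text, "fields": list(fields)}
    }
    return {
        "query": {
            "bool": {
                "must": text_query,
                "filter": [{"term": {f: v}} for f, v in filter_terms],
            }
        },
        "highlight": {
            "encoder": "html",
            # Each field highlights only what the text query matched,
            # so the filter terms are not marked up.
            "fields": {name: {"highlight_query": text_query} for name in fields},
            "number_of_fragments": 0,
        },
    }


search_object = build_search(
    "abc",
    ["french_no_accents", "def_no_accents"],
    [("def_no_accents", "court"), ("def_no_accents", "bridge")],
)
```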
As to:
More generally, is there a formal grammar anywhere specifying what you can and can't do with ES (v7) queries?
what exactly are you looking for?
I want to insert a new item in the table only if a particular item already exists. Is it possible to achieve this using transact_write_items? I want to avoid querying the table and then inserting the new item.
response = dynamo_client.transact_write_items(
TransactItems=[
{
'ConditionCheck': {
'Key': {
'indicator_id': {
'S': 'indicator_1'
}
},
'ConditionExpression': 'attribute_exists(#indicator_id)',
'ExpressionAttributeNames': {
'#indicator_id': 'indicator_id'
},
'TableName': 'CAS'
},
'Put': {
'Key': {
'indicator_id': {
'S': 'update_indicator_1'
}
},
'TableName': 'CAS'
}
}
]
)
This throws the following error :
botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the TransactWriteItems operation: TransactItems can only contain one of Check, Put, Update or Delete
Two modifications are required in your TransactItems argument:
Each operation should go in its own dict instead of sharing one.
In the Put operation, replace Key with Item.
response = dynamo_client.transact_write_items(
TransactItems=[
{
'ConditionCheck': {
'Key': {
'indicator_id': {
'S': 'indicator_1'
}
},
'ConditionExpression': 'attribute_exists(#indicator_id)',
'ExpressionAttributeNames': {
'#indicator_id': 'indicator_id'
},
'TableName': 'CAS'
}
},
{
'Put': {
'Item': {
'indicator_id': {
'S': 'insert_indicator_2'
}
},
'TableName': 'CAS'
}
}
]
)
In the documentation (https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html#DynamoDB.Client.transact_write_items) all operations are shown in the same dict, but that is only for reference: each dict should contain exactly one of ConditionCheck, Put, Update or Delete.
The operations should be passed as an array (list) of such dicts.
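To make the "one operation per dict" rule harder to get wrong, the list could be assembled by a small helper. This is a sketch only: the function name and parameters are made up, and it assumes a string partition key as in the question.

```python
def build_insert_if_exists(table, key_attr, existing_id, new_id):
    """Return TransactItems that put new_id only if existing_id is present."""
    return [
        {
            # Operation 1: fail the transaction unless the existing item is there.
            "ConditionCheck": {
                "TableName": table,
                "Key": {key_attr: {"S": existing_id}},
                "ConditionExpression": "attribute_exists(#k)",
                "ExpressionAttributeNames": {"#k": key_attr},
            }
        },
        {
            # Operation 2: Put takes the full Item, not a Key.
            "Put": {
                "TableName": table,
                "Item": {key_attr: {"S": new_id}},
            }
        },
    ]


items = build_insert_if_exists(
    "CAS", "indicator_id", "indicator_1", "insert_indicator_2"
)
# With a real client this would be passed as:
# response = dynamo_client.transact_write_items(TransactItems=items)
```

If the condition check fails, DynamoDB cancels the whole transaction and the client raises TransactionCanceledException, so the Put is not applied.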
The problem was with the syntax.
The correct syntax is :
response = dynamo_client.transact_write_items(
TransactItems=[
{
'ConditionCheck': {
'Key': {
'indicator_id': {
'S': 'indicator_1'
}
},
'ConditionExpression': 'attribute_exists(#indicator_id)',
'ExpressionAttributeNames': {
'#indicator_id': 'indicator_id'
},
'TableName': 'CAS'
}
},
{
'Put': {
'Item': {
'indicator_id': {
'S': 'update_indicator_1'
}
},
'TableName': 'CAS'
}
}
]
)
Suppose my Elasticsearch index has a field 'ListNames' that holds a list of dictionaries. One of the keys within each dictionary is 'People'. My goal is to query/filter from ES all relevant profiles where 'ListNames.People' contains 'Adam' and also contains a name that is NOT 'Adam'. Without a verbose list of all possible names (since there are many), how could I achieve this? Thank you for any help in advance.
The code below shows examples of the posts I have tried.
#Note: this returns profiles with ONLY Adam contained in the ListNames.
post_data = {
"size": 30,
"query": {
'match':{
'ListNames.People':'Adam'
}
}
}
#################
post_data = {
"size": 30,
"query": {
'bool': {
'should': [{
'match': {
'ListNames.People': 'Adam'
}
}],
'must_not':[
{'match':{'ListNames.People':'Adam'}}
]
}
}
}
###################
post_data = {
"size": 30,
"query": {
'bool': {
'must': [{
'match': {
'ListNames.People': 'Adam'
}
}],
'must_not':[
{'match':{'ListNames.People':'Adam'}}
]
}
}
}
The first post returns results only containing Adam, which is not desired, and the other two return empty.
Update after discussion in comments
You have to use a Painless script to check such a condition. Please note that using a script can degrade performance.
The query will be:
{
"query": {
"bool": {
"filter": [
{
"term": {
"ListNames.People": "Adam"
}
},
{
"script": {
"script": {
"source": "for(int i = 0; i < doc['ListNames.People'].length; i++) { if(doc['ListNames.People'][i] != params.person) { return true; }} return false;",
"lang": "painless",
"params": {
"person": "Adam"
}
}
}
}
]
}
}
}
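To make the intent of the two filter clauses explicit, here is the same condition written in plain Python. It is only a client-side mirror of the logic for reasoning about which profiles should match, not something Elasticsearch runs, and the function name is made up.

```python
def has_person_and_others(people, person):
    """True if `person` is in the list AND at least one other name is too."""
    # Mirrors the term filter: the list must contain the person.
    contains_person = person in people
    # Mirrors the script filter: some entry must differ from the person.
    contains_other = any(name != person for name in people)
    return contains_person and contains_other


profiles = [["Adam"], ["Adam", "Eve"], ["Eve", "Bob"]]
matches = [p for p in profiles if has_person_and_others(p, "Adam")]
```

Here only the second profile qualifies, which is the behavior the query is meant to reproduce on the server side.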
I'm using Elasticsearch with Python. I can't find a way to make searches accent-insensitive.
For example:
I have two words, "Camión" and "Camion".
When a user searches for "camion" I'd like both results to show up.
Creating index:
es = Elasticsearch([{u'host': u'127.0.0.1', u'port': b'9200'}])
es.indices.create(index='name', ignore=400)
es.index(
index="name",
doc_type="producto",
id=p.pk,
body={
'title': p.titulo,
'slug': p.slug,
'summary': p.summary,
'description': p.description,
'image': foto,
'price': p.price,
'wholesale_price': p.wholesale_price,
'reference': p.reference,
'ean13': p.ean13,
'rating': p.rating,
'quantity': p.quantity,
'discount': p.discount,
'sales': p.sales,
'active': p.active,
'encilleria': p.encilleria,
'brand': marca,
'brand_title': marca_titulo,
'sellos': sellos_str,
'certificados': certificados_str,
'attr_naturales': attr_naturales_str,
'soluciones': soluciones_str,
'categories': categories_str,
'delivery': p.delivery,
'stock': p.stock,
'consejos': p.consejos,
'ingredientes': p.ingredientes,
'es_pack': p.es_pack,
'temp': p.temp,
'relevancia': p.relevancia,
'descontinuado': p.descontinuado,
}
)
Search:
from elasticsearch import Elasticsearch
es = Elasticsearch([{'host': '127.0.0.1', 'port': '9200'}])
resul = es.search(
index="name",
body={
"query": {
"query_string": {
"query": "(title:" + search + " OR description:" + search + " OR summary:" + search + ") AND (active:true)",
"analyze_wildcard": False
}
},
"size": "9999",
}
)
print resul
I've searched on Google, Stack Overflow and elastic.co, but I didn't find anything that works.
You need to change the mapping of the fields you have in the query. Changing the mapping requires re-indexing, so that the fields are analyzed differently and the query works.
Basically, you need something like the following. The field called text is just an example; you need to apply the same settings to your other fields as well. Note that I used fields (multi-fields) so that the root field keeps the original text analyzed by default, while text.folded strips the accented characters and makes your query work. I have also changed the query a bit so that it searches the folded sub-fields (camion will match, and so will camión).
PUT /my_index
{
"settings": {
"analysis": {
"analyzer": {
"folding": {
"tokenizer": "standard",
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
},
"mappings": {
"test": {
"properties": {
"text": {
"type": "string",
"fields": {
"folded": {
"type": "string",
"analyzer": "folding"
}
}
}
}
}
}
}
And the query:
"query": {
"query_string": {
"query": "\\*.folded:camion"
}
}
Also, I strongly suggest reading this section of the documentation: https://www.elastic.co/guide/en/elasticsearch/guide/current/asciifolding-token-filter.html
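For intuition about what the lowercase plus asciifolding filters do to a token, here is a rough Python approximation. It is not the Elasticsearch implementation (asciifolding covers more characters than this), just a sketch using the standard library, and the function name is made up.

```python
import unicodedata


def fold(text):
    """Lowercase the text and strip combining accents (approximate folding)."""
    # NFKD splits an accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


folded_a = fold("Camión")
folded_b = fold("Camion")
```

Both inputs end up as the same token, which is why a search for camion can hit either document once the field is indexed with the folding analyzer.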
I am using Elasticsearch, where the query has to be posted as JSON with its keys in a specific order, or else the result will be wrong. The problem is that Python is changing the ordering of my JSON. My original JSON query is:
x= {
"query": {
"filtered": {
"query": {
"query_string": {
"query": "*a*"
}
},
"filter": {
"and": {
"filters": [
{
"term": {
"city": "london"
}
},
{
"term": {
"industry.industry_not_analyed": "oil"
}
}
]
}
}
}
},
"facets": {
"industry": {
"terms": {
"field": "industry.industry_not_analyed"
}
},
"city": {
"terms": {
"field": "city.city_not_analyzed"
}
}
}
}
But the resulting Python object is as follows:
{
'query': {
'filtered': {
'filter': {
'and': {
'filters': [
{
'term': {
'city': 'london'
}
},
{
'term': {
'industry.industry_not_analyed': 'oil'
}
}
]
}
},
'query': {
'query_string': {
'query': '*a*'
}
}
}
},
'facets': {
'city': {
'terms': {
'field': 'city.city_not_analyzed'
}
},
'industry': {
'terms': {
'field': 'industry.industry_not_analyed'
}
}
}
}
The result is different from what I need. How do I solve this?
Use OrderedDict() instead of {}. Note that you can't simply use OrderedDict(query=...) because that would create an unordered dict in the background. Use this code instead:
from collections import OrderedDict

x = OrderedDict()
x['query'] = OrderedDict()
...
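I suggest implementing a builder for this (Query is a hypothetical class you would write yourself; and is a reserved word in Python, so the method is named and_ here):
x = Query().filtered().query_string("*a*").and_()....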
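If the query already exists as JSON text, another option is to let the json module build the OrderedDicts while parsing, via its object_pairs_hook parameter. A minimal sketch with a shortened version of the query from the question (the filter is reduced to a single term for brevity):

```python
import json
from collections import OrderedDict

raw = """
{
  "query": {
    "filtered": {
      "query": {"query_string": {"query": "*a*"}},
      "filter": {"term": {"city": "london"}}
    }
  },
  "facets": {
    "industry": {"terms": {"field": "industry.industry_not_analyed"}},
    "city": {"terms": {"field": "city.city_not_analyzed"}}
  }
}
"""

# object_pairs_hook receives each JSON object's (key, value) pairs in
# document order, so every nested object becomes an OrderedDict.
x = json.loads(raw, object_pairs_hook=OrderedDict)

# Serializing an OrderedDict keeps the keys in insertion order.
serialized = json.dumps(x)
```

This keeps "query" ahead of "filter" and "industry" ahead of "city" without building every level by hand.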