ElasticSearch: Finding documents with field value that is in an array - python

I have some customer documents that I want to be retrieved using ElasticSearch based on where the customers come from (country field is IN an array of countries).
[
{
"name": "A1",
"address": {
"street": "1 Downing Street",
"country": {
"code": "GB",
"name": "United Kingdom"
}
}
},
{
"name": "A2",
"address": {
"street": "25 Gormut Street",
"country": {
"code": "FR",
"name": "France"
}
}
},
{
"name": "A3",
"address": {
"street": "Bonjour Street",
"country": {
"code": "FR",
"name": "France"
}
}
}
]
Now, I have another array in my Python code:
["DE", "FR", "IT"]
I'd like to obtain the two documents, A2 and A3.
How would I write this in PyES/Query DSL? Am I supposed to be using an ExistsFilter or a TermQuery for this? ExistsFilter seems to only check whether the field exists, and doesn't care about the value.

In NoSQL-type document stores, all you get back is the document, not parts of the document.
Your requirement: "I'd like to obtain the two documents, A2 and A3." implies that you need to index each of those documents separately, not as an array inside another "parent" document.
If you need to match values of the parent document alongside country then you need to denormalize your data and store those values from the parent doc inside each sub-doc as well.
Once you've done the above, then the query is easy. I'm assuming that the country field is mapped as:
country: {
type: "string",
index: "not_analyzed"
}
To find docs with DE, you can do:
curl -XGET 'http://127.0.0.1:9200/_all/_search?pretty=1' -d '
{
"query" : {
"constant_score" : {
"filter" : {
"term" : {
"country" : "DE"
}
}
}
}
}
'
To find docs with either DE or FR:
curl -XGET 'http://127.0.0.1:9200/_all/_search?pretty=1' -d '
{
"query" : {
"constant_score" : {
"filter" : {
"terms" : {
"country" : [
"DE",
"FR"
]
}
}
}
}
}
'
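Since the country codes in the question live in a Python list, the body of the terms filter above can be built directly from that list. This is only a minimal sketch: the helper name `build_country_filter` is hypothetical, the default field name `country` is taken from the mapping assumed above, and sending the body with a client such as PyES is left out.

```python
import json


def build_country_filter(codes, field="country"):
    """Build a constant_score query matching docs whose `field` is any of `codes`.

    `field` defaults to the `country` mapping assumed in the answer;
    adjust it to wherever the country code is actually indexed.
    """
    return {
        "query": {
            "constant_score": {
                "filter": {
                    "terms": {field: list(codes)}
                }
            }
        }
    }


countries = ["DE", "FR", "IT"]
body = build_country_filter(countries)

# The printed JSON has the same shape as the curl request body above.
print(json.dumps(body, indent=2))
```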
To combine the above with some other query terms:
curl -XGET 'http://127.0.0.1:9200/_all/_search?pretty=1' -d '
{
"query" : {
"filtered" : {
"filter" : {
"terms" : {
"country" : [
"DE",
"FR"
]
}
},
"query" : {
"text" : {
"address.street" : "bonjour"
}
}
}
}
}
'
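Note that the `filtered` query was removed in Elasticsearch 5.0 and the `text` query was renamed to `match`. On newer versions the same combination is usually written as a `bool` query with a `filter` clause. A minimal sketch that builds such a body in Python, assuming the same `country` and `address.street` fields as above (the helper name is hypothetical):

```python
def build_filtered_search(codes, street_text):
    """Combine a full-text match on the street with a non-scoring country filter."""
    return {
        "query": {
            "bool": {
                # Scored part of the query: full-text match on the street.
                "must": {"match": {"address.street": street_text}},
                # Non-scored part: the country must be one of the given codes.
                "filter": {"terms": {"country": list(codes)}},
            }
        }
    }


search_body = build_filtered_search(["DE", "FR"], "bonjour")
```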
Also see this answer for an explanation of how arrays of objects can be tricky, because of the way they are flattened:
Is it possible to sort nested documents in ElasticSearch?

Related

How to query all values of a nested field with Elasticsearch

I would like to query a value in all data packages I have in Elasticsearch.
For example, I have this document:
{
"website" : "google",
"color" : [
{
"color1" : "red",
"color2" : "blue"
}
]
}
I have documents like this for an unknown number of websites. I want to extract every "color1" value across all of them. How can I do that? I tried match_all with "size" : 0, but it didn't work.
Thanks a lot!
To query nested objects, you first need to map them as a nested field. You can then aggregate on the nested field like this:
GET /my-index-000001/_search
{
"aggs": {
"test": {
"nested": {
"path": "color"
},
"aggs": {
"test2": {
"terms": {
"field": "color.color1"
}
}
}
}
}
}
Your result should look like this for the query:
"aggregations": {
"test": {
"doc_count": 5,
"test2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "red",
"doc_count": 4
},
{
"key": "gray",
"doc_count": 1
}
]
}
}
}
The aggregation result lists each color1 value together with the number of times it appears in your documents.
For more information you can check Elasticsearch official documentation about Nested Field here and Nested aggregation here.
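On the Python side, the buckets in that response can be turned into a simple mapping of value to count. This is a rough sketch that works on the response as a plain dict; the function name is hypothetical, and the aggregation names `test` and `test2` are the ones used in the query above.

```python
def color_counts(response, outer="test", inner="test2"):
    """Map each bucket key of the nested terms aggregation to its doc_count."""
    buckets = response["aggregations"][outer][inner]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Sample response copied from the result shown above.
sample_response = {
    "aggregations": {
        "test": {
            "doc_count": 5,
            "test2": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                    {"key": "red", "doc_count": 4},
                    {"key": "gray", "doc_count": 1},
                ],
            },
        }
    }
}

counts = color_counts(sample_response)
print(counts)
```

Keep in mind that a terms aggregation returns only the top 10 buckets by default, so a larger `size` may be needed to see every distinct value.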

Python JSON array left join update

I have a nested JSON array, and a separate second array.
I would like to perform the equivalent of a SQL UPDATE using a left join.
In other words, keep all items from the main json, and where the same item (key='order') appears in the secondary one, update/append values in the main.
I can obviously achieve this by looping, but I'm really looking for a more elegant and efficient solution.
Most examples of 'merging' JSON I've seen involve appending new items; I've found very little about 'updating'.
Any pointers appreciated :)
Main JSON object with nested array 'steps'
{
"manifest_header": {
"name": "test"
},
"steps": [
{
"order": "100",
"value": "some value"
},
{
"order": "200",
"value": "some other value"
}
]
}
JSON Array with values to add
{
"steps": [
{
"order": "200",
"etag": "aaaaabbbbbccccddddeeeeefffffgggg"
}
]
}
Desired Result:
{
"manifest_header": {
"name": "test"
},
"steps": [
{
"order": "100",
"value": "some value"
},
{
"order": "200",
"value": "some other value",
"etag": "aaaaabbbbbccccddddeeeeefffffgggg"
}
]
}
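One possible approach, sketched under the assumption that both objects are already parsed into Python dicts and that 'order' is unique within the secondary array: index the secondary steps by 'order' once, then update each main step with a single dictionary lookup. It still iterates over the main steps, but avoids a nested loop. The function name `left_join_update` is made up for illustration.

```python
from copy import deepcopy


def left_join_update(main, secondary, list_key="steps", join_key="order"):
    """Return a copy of `main` whose items are updated from matching `secondary` items.

    Every item of main[list_key] is kept (left join); items in `secondary`
    without a match in `main` are ignored.
    """
    # Build a lookup table: join-key value -> secondary item.
    lookup = {item[join_key]: item for item in secondary.get(list_key, [])}

    result = deepcopy(main)  # leave the input untouched
    for item in result.get(list_key, []):
        match = lookup.get(item[join_key])
        if match is not None:
            item.update(match)  # add new keys, overwrite existing ones
    return result


main = {
    "manifest_header": {"name": "test"},
    "steps": [
        {"order": "100", "value": "some value"},
        {"order": "200", "value": "some other value"},
    ],
}
secondary = {
    "steps": [{"order": "200", "etag": "aaaaabbbbbccccddddeeeeefffffgggg"}]
}

merged = left_join_update(main, secondary)
```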

replace nested document array mongodb with python

I have this document in MongoDB:
{
"_id": {
"$oid": "62644af0368cb0a46d7c2a95"
},
"insertionData": "23/04/2022 19:50:50",
"ipfsMetadata": {
"Name": "data.json",
"Hash": "Qmb3FWgyJHzJA7WCBX1phgkV93GiEQ9UDWUYffDqUCbe7E",
"Size": "431"
},
"metadata": {
"sessionDate": "20220415 17:42:55",
"dataSender": "user345",
"data": {
"height": "180",
"weight": "80"
},
"addtionalInformation": [
{
"name": "poolsize",
"value": "30m"
},
{
"name": "swimStyle",
"value": "mariposa"
},
{
"name": "modality",
"value": "swim"
},
{
"name": "gender-title",
"value": "schoolA"
}
]
},
"fileId": {
"$numberLong": "4"
}
}
I want to update an element of the nested array, for instance the one whose name is gender-title. It currently has the value schoolA, and I want to change it to the value sent in the request body. I pass the fileId as a parameter in the request URL and send the new name/value pair in the body.
Request: localhost/sessionUpdate/4
Body:
{
"name": "gender-title",
"value": "adultos"
}
Flask:
@app.route('/sessionUpdate/<string:a>', methods=['PUT'])
def sessionUpdate(a):
    datas=request.json
    r=str(datas['name'])
    r2=str(datas['value'])
    print(r,r2)
    r3=collection.update_one({'fileId':a, 'metadata.addtionalInformation':r}, {'$set':{'metadata.addtionalInformation.$.value':r2}})
    return str(r3),200
I'm getting a 200 response, but the document isn't updated with the new value.
Because you are using the positional operator $ on the array, make sure your filter targets a specific array element. The query below matches the element of the metadata.addtionalInformation array whose name is "gender-title":
db.collection.update({
"fileId": 4,
"metadata.addtionalInformation.name": "gender-title"
},
{
"$set": {
"metadata.addtionalInformation.$.value": "junior"
}
})
Here is the Mongo playground for your reference.
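Translated to the Flask handler from the question, the filter and update documents could be built as in the sketch below. Two things here are assumptions based on the document shown: fileId is stored as a number (`$numberLong`), so the string from the URL is converted with `int()`, and the array field name keeps the document's spelling `addtionalInformation`. The helper name is hypothetical.

```python
def build_array_update(file_id, name, value):
    """Build the filter and update documents for a positional ($) array update."""
    query = {
        # The route delivers a string, but the stored fileId is numeric.
        "fileId": int(file_id),
        # Match the array element by its `name` sub-field so that `$`
        # resolves to that element's position.
        "metadata.addtionalInformation.name": name,
    }
    update = {"$set": {"metadata.addtionalInformation.$.value": value}}
    return query, update


query, update = build_array_update("4", "gender-title", "adultos")

# With a pymongo collection this would be used as:
#   result = collection.update_one(query, update)
# result.matched_count then shows whether the filter matched a document.
```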

ElasticSearch: Retrieve field and its normalization

I want to retrieve a field as well as its normalized version from Elasticsearch.
Here's my index definition and data
PUT normalizersample
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1,
"refresh_interval": "60s",
"analysis": {
"normalizer": {
"my_normalizer": {
"filter": [
"lowercase",
"german_normalization",
"asciifolding"
],
"type": "custom"
}
}
}
},
"mappings": {
"_source": {
"enabled": true
},
"properties": {
"myField": {
"type": "text",
"store": true,
"fields": {
"keyword": {
"type": "keyword",
"store": true
},
"normalized": {
"type": "keyword",
"store": true,
"normalizer": "my_normalizer"
}
}
}
}
}
}
POST normalizersample/_doc/1
{
"myField": ["Andreas", "Ämdreas", "Anders"]
}
My first approach was to use scripted fields like
GET /myIndex/_search
{
"size": 100,
"query": {
"match_all": {}
},
"script_fields": {
"keyword": {
"script": "doc['myField.keyword']"
},
"normalized": {
"script": "doc['myField.normalized']"
}
}
}
However, since myField is an array, this returns two lists of strings per ES document, and each list is sorted alphabetically. Because normalization can change the sort order, entries at the same position in the two lists do not necessarily correspond to each other.
"hits" : [
{
"_index" : "normalizersample",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"fields" : {
"de" : [
"amdreas",
"anders",
"andreas"
],
"keyword" : [
"Anders",
"Andreas",
"Ämdreas"
]
}
}
]
What I would like to retrieve instead is [(Andreas, andreas), (Ämdreas, amdreas), (Anders, anders)] or a similar format where I can match every entry to its normalization.
The only way I found was to call Term Vectors on both fields since they contain a position field, but this seems like a huge overhead to me. (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-termvectors.html)
Is there a simpler way to retrieve tuples with the keyword and the normalized field?
Thanks a lot!

Mongodb: updating fields in embedded document

If I have a document that looks like this:
{
"_id" : 1,
"name": "Homer J. Simpson",
"income" : 45000,
"address": {
"street": "742 Evergreen Terrace",
"city": "Springfield",
"state": "???",
"email": "homer@springfield.com",
"zipcode": "12345",
"country": "USA"
}
}
And want to do an update on some of the fields in the address document (leaving the other ones unchanged), and insert new fields if they do not already exist, such as this:
{
"address": {
"email": "homer@gmail.com",
"zipcode": "77788",
"latitude" : 23.43545,
"longitude" : 123.45553
}
}
Is there a way to do an atomic update all at once, or do you need to loop over the key/values in the new data and do a .update() for each one?
Use dot notation with a $set to target multiple embedded fields in a single update:
{ "$set": {
"address.email": "homer@gmail.com",
"address.zipcode": "77788",
"address.latitude" : 23.43545,
"address.longitude" : 123.45553
} }
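If the new data arrives as a nested dict, the dot-notation keys for `$set` do not have to be written by hand. The sketch below flattens a nested dict into dotted paths; the function name `to_dot_notation` is made up, and treating every non-empty dict as something to descend into is an assumption that may not suit sub-documents you want to replace wholesale.

```python
def to_dot_notation(doc, prefix=""):
    """Flatten a nested dict into {"a.b.c": value} pairs for use with $set."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Descend into sub-documents so sibling fields stay untouched.
            flat.update(to_dot_notation(value, path))
        else:
            flat[path] = value
    return flat


new_data = {
    "address": {
        "email": "homer@gmail.com",
        "zipcode": "77788",
        "latitude": 23.43545,
        "longitude": 123.45553,
    }
}

update = {"$set": to_dot_notation(new_data)}
# All four fields are then set by one update call on the document.
```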
As Sergio mentioned, use a $set.
{ "$set": { "address.zipcode": "77788" } }
