Elasticsearch / Python dynamic number of filters

I'm pretty new to programming, so my question might be stupid or easy, but:
I need to create multiple filters in Elasticsearch based on user input.
My query body:
body = {
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "must": [
                        {"term": {name1: value1}},
                        {"term": {name2: value2}},
                        {"term": {name3: value3}},
                    ]
                }
            }
        }
    },
}
And it works fine, but I need to have a dynamic number of these filters.
I tried to build the query as a string and then add the filters inside it, but ES doesn't allow it, e.g.:
l = []
for i_type, name in convert.items():
    string = '{"term": {"' + i_type + '":"' + name + '"}},'
    l.append(string)
i_query = ''.join(l)
When I use a list/string in the query structure, I get 404 errors from the server.
Is it even possible to add a dynamic number of filters?

It is possible. The body is just a Python dictionary, so you can dynamically add fields, terms, new filters and so on.
body = {
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "must": []
                }
            }
        }
    }
}
d = {"name_1": value_1, "name_2": value_2}
Python 2.x:
for key, value in d.iteritems():
    body["query"]["filtered"]["filter"]["bool"]["must"].append({"term": {key: value}})
Or shorter (Python 2.x):
body["query"]["filtered"]["filter"]["bool"]["must"].extend([{"term": {key: value}} for key, value in d.iteritems()])
Python 3.x:
for key, value in d.items():
    body["query"]["filtered"]["filter"]["bool"]["must"].append({"term": {key: value}})
The shorter version for Python 3.x:
body["query"]["filtered"]["filter"]["bool"]["must"].extend([{"term": {key: value}} for key, value in d.items()])
Basically, you can create whatever query you want. For example, you can easily add a should clause:
body["query"]["filtered"]["filter"]["bool"]["should"] = [{"term": {"name_42": value_42}}]
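The same idea can be wrapped in a small helper. This is only a minimal sketch and not part of the original answer: `build_filter_body` and the sample field names are hypothetical, and it keeps the `filtered` query shape used in this thread. Note that the `filtered` query was deprecated in Elasticsearch 2.0 and removed in 5.0; on newer clusters the same list of term clauses would go under `bool` / `filter` instead.

```python
def build_filter_body(filters):
    """Return a query body with one term filter per (field, value) pair.

    `filters` is a plain dict, for example one built from user input.
    """
    must = [{"term": {field: value}} for field, value in filters.items()]
    return {
        "query": {
            "filtered": {
                "filter": {
                    "bool": {"must": must}
                }
            }
        }
    }


# Hypothetical user input; any number of entries works, including zero.
user_filters = {"color": "red", "size": "xl"}
body = build_filter_body(user_filters)
print(len(body["query"]["filtered"]["filter"]["bool"]["must"]))
```

The resulting dict can be passed directly as the `body` argument of the search call. Building the JSON by string concatenation is unnecessary, because the client serializes the dict itself.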

Related

How to ignore key in nested JSON (because it has variable name)

I have the following JSON, where the element "Key_with_variable_name" has a variable name, unlike the other keys:
{
"Key1": {
"Key11": {
"Key111": "Value111",
"Key112": "Value112",
"Key113": "Value113",
"Key114": {
"Key1141": {
"Key_with_variable_name": {
"Key114111": {
"Key1141111": "Value1141111",
"Key1141112": "Value1141112",
"Key1141113": "Value1141113"
}
}
}
}
}
}
}
How can I access Key114111 (get its name), which is "under" Key_with_variable_name, and then access the nested Key1141111 and its value Value1141111? And can I get the value of Key_with_variable_name?
I used json_message = json.loads(jsonObject) and worked on the JSON as a nested dictionary, but struggled to get down past "Key_with_variable_name".
Or maybe it is easier to work on the JSON directly rather than on the dictionary?
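One possible approach, sketched under the assumption that the parent object holds exactly one key whose name is unknown: walk down the fixed part of the path, then take whatever key is there by iterating over the dictionary instead of naming it. The trimmed JSON string below is a shortened stand-in for the structure in the question.

```python
import json

raw = '''
{"Key1": {"Key11": {"Key114": {"Key1141": {
    "Key_with_variable_name": {
        "Key114111": {"Key1141111": "Value1141111"}
    }
}}}}}
'''
message = json.loads(raw)

# Walk down the fixed part of the path first.
parent = message["Key1"]["Key11"]["Key114"]["Key1141"]

# The variable key is whatever key the parent holds (assumed: exactly one).
variable_key, variable_value = next(iter(parent.items()))

# Same trick one level deeper to learn the name of the child key.
child_key, child_value = next(iter(variable_value.items()))

print(variable_key)               # name of the variable key
print(child_key)                  # name of the key under it
print(child_value["Key1141111"])  # the nested value
```

If the parent can hold several unknown keys, a plain `for key, value in parent.items():` loop does the same job for each of them.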

How to update/change both keys and values separately (not dedicated key-value pair) in a deeply nested JSON in python 3.x

I have a JSON file where I need to replace the UUID and update it with another one. I'm having trouble replacing the deeply nested keys and values.
Below is my JSON file that I need to read in python, replace the keys and values and update the file.
JSON file - myfile.json
{
"name": "Shipping box",
"company":"Detla shipping",
"description":"---",
"details" : {
"boxes":[
{
"box_name":"alpha",
"id":"a3954710-5075-4f52-8eb4-1137be51bf14"
},
{
"box_name":"beta",
"id":"31be3763-3d63-4e70-a9b6-d197b5cb6929"
}
]
},
"container": [
"a3954710-5075-4f52-8eb4-1137be51bf14":[],
"31be3763-3d63-4e70-a9b6-d197b5cb6929":[]
],
"data":[
{
"data_series":[],
"other":50
},
{
"data_series":[],
"other":40
},
{
"data_series":
{
"a3954710-5075-4f52-8eb4-1137be51bf14":
{
{
"dimentions":[2,10,12]
}
},
"31be3763-3d63-4e70-a9b6-d197b5cb6929":
{
{
"dimentions":[3,9,12]
}
}
},
"other":50
}
]
}
I want to achieve something like the following:
"details" : {
"boxes":[
{
"box_name":"alpha"
"id":"replace_uuid"
},
}
.
.
.
"data":[ {
"data_series":
{
"replace_uuid":
{
{
"dimentions":[2,10,12]
}
}
]
In such a deeply nested dictionary, how can we replace every occurrence of a given key or value with another string, here replace_uuid?
I tried pop() and dotty_dict, but I wasn't able to replace the entries inside the nested list.
I was able to achieve it in the following way:
import json
import uuid

def uuid_change():  # generate a random UUID string
    return str(uuid.uuid4())

with open('myfile.json') as f:
    data = json.load(f)  # named data rather than dict to avoid shadowing the built-in

for box in data['details']['boxes']:
    old_id = box['id']
    replace_id = uuid_change()
    box['id'] = replace_id
    # rename the key in every container entry that holds it
    for entry in data['container']:
        if old_id in entry:
            entry[replace_id] = entry.pop(old_id)
    # rename the key in the data_series of the third data element
    series = data['data'][2]['data_series']
    if old_id in series:
        series[replace_id] = series.pop(old_id)
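A more general alternative is a recursive walk that rebuilds the structure and swaps the old string wherever it appears, whether as a key or as a value. This is a rough sketch, not the asker's method: `replace_everywhere` is a hypothetical helper, and the small `doc` below only imitates the shape of the file (with `container` assumed to be a list of dicts).

```python
def replace_everywhere(node, old, new):
    """Return a copy of `node` with every key or value equal to `old` replaced by `new`."""
    if isinstance(node, dict):
        return {
            (new if key == old else key): replace_everywhere(value, old, new)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [replace_everywhere(item, old, new) for item in node]
    # Leaf value: swap it only if it is exactly the old string.
    return new if node == old else node


# Small made-up document shaped like the one in the question.
doc = {
    "details": {"boxes": [{"box_name": "alpha", "id": "old-uuid"}]},
    "container": [{"old-uuid": []}],
    "data": [
        {"data_series": [], "other": 50},
        {"data_series": {"old-uuid": {"dimentions": [2, 10, 12]}}, "other": 50},
    ],
}
updated = replace_everywhere(doc, "old-uuid", "replace_uuid")
print(updated["details"]["boxes"][0]["id"])
```

Because the function returns a new structure, the original `doc` is left unchanged; the result can be written back with `json.dump` once all UUIDs have been swapped.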

Elegant way of iterating list of dict python

I have a list of dictionaries as below. I need to iterate over the list and replace the content of each parameters entry in the sections with an empty dictionary.
input = [
{
"category":"Configuration",
"sections":[
{
"section_name":"Global",
"parameters":{
"name":"first",
"age":"second"
}
},
{
"section_name":"Operator",
"parameters":{
"adrress":"first",
"city":"first"
}
}
]
},
{
"category":"Module",
"sections":[
{
"section_name":"Global",
"parameters":{
"name":"first",
"age":"second"
}
}
]
}
]
Expected Output:
[
{
"category":"Configuration",
"sections":[
{
"section_name":"Global",
"parameters":{}
},
{
"section_name":"Operator",
"parameters":{}
}
]
},
{
"category":"Module",
"sections":[
{
"section_name":"Global",
"parameters":{}
}
]
}
]
My current code looks like below:
category_list = []
for categories in input:
    sections_list = []
    category_name_dict = {"category": categories["category"]}
    for sections_dict in categories["sections"]:
        section = {}
        section["section_name"] = sections_dict['section_name']
        section["parameters"] = {}
        sections_list.append(section)
    category_name_dict["sections"] = sections_list
    category_list.append(category_name_dict)
Is there a more elegant and more performant way to compute this? Keys such as category, sections, section_name, and parameters are constants.
The easier way is not to rebuild the dictionaries without the parameters, but to clear them in place in every section:
for value in input:
    for section in value['sections']:
        section['parameters'] = {}
Elegance is in the eye of the beholder, but rather than creating empty lists and dictionaries and then filling them, why not do it in one go with a list comprehension:
category_list = [
    {
        **category,
        "sections": [
            {
                **section,
                "parameters": {},
            }
            for section in category["sections"]
        ],
    }
    for category in input
]
This is more efficient and (in my opinion) makes it clearer that the intention is to change a single key.
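The two answers differ in one practical respect: the loop mutates the original list, while the comprehension builds a new one and leaves the input alone. The sketch below illustrates that with a shortened, made-up version of the data; the variable names are chosen for the example only.

```python
import copy

categories = [
    {
        "category": "Configuration",
        "sections": [
            {"section_name": "Global", "parameters": {"name": "first"}},
            {"section_name": "Operator", "parameters": {"city": "first"}},
        ],
    }
]

# Variant 1: build a new list; the original stays untouched.
cleared = [
    {**category, "sections": [{**section, "parameters": {}} for section in category["sections"]]}
    for category in categories
]
print(categories[0]["sections"][0]["parameters"])  # still populated
print(cleared[0]["sections"][0]["parameters"])     # emptied in the copy

# Variant 2: mutate in place (on a deep copy here, so both results can be compared).
in_place = copy.deepcopy(categories)
for category in in_place:
    for section in category["sections"]:
        section["parameters"] = {}
print(in_place == cleared)
```

Which one fits depends on whether the caller still needs the original parameters afterwards.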

How to find equal values in different fields in Elasticsearch via Python Query?

I've got values in Elasticsearch (+Kibana) and want to build a graph in which certain nodes are connected.
My fields are "prev" and "curr"; they indicate the previous and the current page that the user visited.
E.g.:
prev: Main_Page, curr: Donald_Trump
prev: other-internal, curr: El_Bienamado
...
So what I'm trying to do is search for values where curr is equal to prev, so that I can connect them and visualize the result as a NetworkX graph in Kibana.
My problem is that I only started with the query syntax yesterday and don't know whether this is even possible.
All in all, my goal is to build a graph in which nodes are connected into a chain, e.g.:
Main_Page -> Donald_Trump -> Problems_in_Afrika -> etc.
Meaning that somebody visited those pages in a certain order.
What I've tried for now is:
def getPrevList():
    previous = []
    previousQuery = {
        "size": 0,
        "aggs": {
            "topTerms": {
                "terms": {
                    "field": "prev",
                    "size": 50000
                }
            }
        }
    }
    results = es.search(index="wiki", body=previousQuery)["aggregations"]["topTerms"]["buckets"]
    for bucket in results:
        previous.append({
            "prev": bucket["key"],
            "numDocs": bucket["doc_count"]
        })
    return previous

prevs = getPrevList()
rowNum = 0
totalNumDocs = 0
for prevDetails in prevs:
    rowNum += 1
    totalNumDocs += prevDetails["numDocs"]
    prevId = prevDetails["prev"]
    q = {
        "query": {
            "bool": {
                "must": [
                    {
                        "term": {"prev": prevId}
                    }
                ]
            }
        },
        "controls": {
            "sample_size": 10000,
            "use_significance": True
        },
        "vertices": [
            {
                "field": "curr",
                "size": VERTEX_SIZE,
                "min_doc_count": 1
            },
            {
                "field": "prev",
                "size": VERTEX_SIZE,
                "min_doc_count": 1
            }
        ],
        "connections": {
            "query": {
                "match_all": {}
            }
        }
    }
At the end, I'm doing the following:
results = es.transport.perform_request('POST', "/wiki/_xpack/_graph/_explore", body=q)
# Use NetworkX to create a graph of prevs and currs we can analyze
G = nx.Graph()
for node in results["vertices"]:
    G.add_node(nodeId(node), type=node["field"])
for edge in results["connections"]:
    n1 = results["vertices"][int(edge["source"])]
    n2 = results["vertices"][int(edge["target"])]
    G.add_edge(nodeId(n1), nodeId(n2))
I copied it from another example, which worked well, but I can see that the "connections" are important for connecting the vertices.
As far as I understand, I need the query to find the correct "prev" field.
The controls are not important for now.
And here comes the complex part for me: what do I write in the vertices and connections parts? Is it correct that I defined the vertices as the prev and curr fields?
And in the connections query: for now I defined "match_all", but this is obviously not correct. I need a query that matches the documents where prev equals curr and connects them. But how?
Any hint is appreciated!
Thanks in advance.
EDIT:
As @Lupanoide suggested, I altered the code and now have two visualizations.
The first one is the first suggested solution, and it gives me this graph (part of it; matplotlib, not Kibana yet):
The second solution looks crazier and is more likely to be the correct one, but I need to visualize it in Kibana first:
So the new end of my script is now:
gq = json.dumps(q)
workspaceID ="/f44c95c0-223d-11e9-b49e-bb0f8e1e7bae" # my v6.4.0 workspace
workspaceUrl = "graph#/workspace/"+workspaceID+"?query=" + urllib.quote_plus(gq)
doc = {
"url": workspaceUrl
}
res = es.index(index=connectionsIndexName, doc_type='task', id=0, body=doc)
My only problem now is that when I'm using Kibana to open the URL, I do not see the graph. Instead I get the "new Graph" page.
EDIT2
Okay, I send the query, but of course the query alone is not enough. I need to pass the graph and its connections, right? Is it possible?
Thank you very much!
EDIT:
For your use case you need to find all the values of the field curr that share the same prev value. So you need to group all the pages that are clicked after a certain page. You can do that with a terms aggregation.
You need to build a query that first returns, with a terms aggregation, all the values of the prev field and then, with a sub-aggregation, all the curr values seen for each of them:
def getOccurrencyDict():
    body = {
        "size": 0,
        "aggs": {
            "getAllThePrevs": {
                "terms": {
                    "field": "prev",
                    "size": 40000
                },
                "aggs": {
                    "getAllTheCurr": {
                        "terms": {
                            "field": "curr",
                            "size": 40000
                        }
                    }
                }
            }
        }
    }
    return es.search(index="my_index", doc_type="mydoctype", body=body)

result = getOccurrencyDict()
Then you have to build a data structure that the Graph class of the networkx library accepts. So you should build a dict of lists and then pass it to the from_dict_of_lists function:
dict2Graph = dict()
for res in result["aggregations"]["getAllThePrevs"]["buckets"]:
    # one entry per prev value, holding the list of curr values seen after it
    dict2Graph[res["key"]] = [curr["key"] for curr in res["getAllTheCurr"]["buckets"]]
Now you pass it to the networkx ingestion function:
G = nx.from_dict_of_lists(dict2Graph)
I have not tested the networkx ingestion. from_dict_of_lists expects each value to be a plain list of neighbouring nodes, so only the key of each curr bucket is kept here; the doc_count of each bucket is dropped and would have to be added separately as an edge attribute.
If the aggregation-of-aggregation query is too slow, you should use partitioning. The Elasticsearch documentation on the terms aggregation describes how to split the terms into partitions.
EDIT:
After reading the networkx documentation, you could also do it this way, without creating the intermediate data structure:
import networkx as nx
from elasticsearch import Elasticsearch
from elasticsearch.client.graph import GraphClient

es = Elasticsearch()
graph_client = GraphClient(es)

def createGraphInKibana(prev):
    q = {
        "query": {
            "bool": {
                "must": [
                    {
                        "term": {"prev": prev}
                    }
                ]
            }
        },
        "controls": {
            "sample_size": 10000,
            "use_significance": True
        },
        "vertices": [
            {
                "field": "curr",
                "size": VERTEX_SIZE,
                "min_doc_count": 1
            },
            {
                "field": "prev",
                "size": VERTEX_SIZE,
                "min_doc_count": 1
            }
        ],
        "connections": {
            "query": {
                "match_all": {}
            }
        }
    }
    graph_client.explore(index="your_index", doc_type="your_doc_type", body=q)

G = nx.Graph()
for prev in result["aggregations"]["getAllThePrevs"]["buckets"]:
    createGraphInKibana(prev['key'])
    for curr in prev["getAllTheCurr"]["buckets"]:
        G.add_edge(prev["key"], curr["key"], weight=curr["doc_count"])
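To make the bucket-to-edge step concrete without a running cluster, here is a small sketch in plain Python. The `sample` response is made up to mirror the shape returned by the prev/curr terms aggregation, and `edges_from_buckets` and `most_visited_chain` are hypothetical helper names, not part of Elasticsearch or networkx. The chain simply follows the most frequent next page, which is only one possible reading of "connected to a chain".

```python
def edges_from_buckets(response):
    """Flatten prev -> curr buckets into (prev, curr, weight) tuples."""
    edges = []
    for prev in response["aggregations"]["getAllThePrevs"]["buckets"]:
        for curr in prev["getAllTheCurr"]["buckets"]:
            edges.append((prev["key"], curr["key"], curr["doc_count"]))
    return edges


def most_visited_chain(edges, start, max_len=10):
    """Follow the heaviest outgoing edge from `start`; stop on a dead end or a repeat."""
    best = {}
    for prev, curr, weight in edges:
        if prev not in best or weight > best[prev][1]:
            best[prev] = (curr, weight)
    chain = [start]
    while chain[-1] in best and len(chain) < max_len:
        nxt = best[chain[-1]][0]
        if nxt in chain:
            break
        chain.append(nxt)
    return chain


# Made-up response with the same structure as the aggregation result.
sample = {"aggregations": {"getAllThePrevs": {"buckets": [
    {"key": "Main_Page", "getAllTheCurr": {"buckets": [
        {"key": "Donald_Trump", "doc_count": 7},
        {"key": "El_Bienamado", "doc_count": 2}]}},
    {"key": "Donald_Trump", "getAllTheCurr": {"buckets": [
        {"key": "Problems_in_Afrika", "doc_count": 3}]}},
]}}}

edges = edges_from_buckets(sample)
print(most_visited_chain(edges, "Main_Page"))
```

Each tuple in `edges` can also be fed to `G.add_edge(prev, curr, weight=weight)` to build the networkx graph.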

How to get values of keys for changing Json

I am using Python 2.7.
I have a JSON document that changes every time I request it.
I need to pull out Animal_Target_DisplayName under Term7 under Relation6 in my dict.
The problem is that the object Relation6 is sometimes in another part of the JSON; it could be nested deeper or appear in another order.
I am trying to write code that simply extracts the values of the key Animal_Target_DisplayName, but nothing is working. It won't even loop down the nested dict.
This works if I pull it out using something like ['view']['Term0'][0]['Relation6'], but remember that the JSON is never returned with the same structure.
This is the code I am using to get the values of the key Animal_Target_DisplayName; it doesn't seem to loop through my dict and find all the values with that key.
array = []
for d in dict.values():
    row = d['Animal_Target_DisplayName']
    array.append(row)
JSON Below:
dict = {
"view":{
"Term0":[
{
"Id":"b0987b91-af12-4fe3-a56f-152ac7a4d84d",
"DisplayName":"Dog",
"FullName":"Dog",
"AssetType1":[
{
"AssetType_Id":"00000000-0000-0000-0000-000000031131",
}
]
},
{
"Id":"ee74a59d-fb74-4052-97ba-9752154f015d",
"DisplayName":"Dog2",
"FullName":"Dog",
"AssetType1":[
{
"AssetType_Id":"00000000-0000-0000-0000-000000031131",
}
]
},
{
"Id":"eb548eae-da6f-41e8-80ea-7e9984f56af6",
"DisplayName":"Dog3",
"FullName":"Dog3",
"AssetType1":[
{
"AssetType_Id":"00000000-0000-0000-0000-000000031131",
}
]
},
{
"Id":"cfac6dd4-0efa-4417-a2bf-0333204f8a42",
"DisplayName":"Animal Set",
"FullName":"Animal Set",
"AssetType1":[
{
"AssetType_Id":"00000000-0000-0000-0001-000400000001",
}
],
"StringAttribute2":[
{
"StringAttribute_00000000-0000-0000-0000-000000003114_Id":"00a701a8-be4c-4b76-a6e5-3b0a4085bcc8",
"StringAttribute_00000000-0000-0000-0000-000000003114_Value":"Desc"
}
],
"StringAttribute3":[
{
"StringAttribute_00000000-0000-0000-0000-000000000262_Id":"a81adfb4-7528-4673-8c95-953888f3b43a",
"StringAttribute_00000000-0000-0000-0000-000000000262_Value":"meow"
}
],
"BooleanAttribute4":[
{
"BooleanAttribute_00000000-0000-0000-0001-000500000001_Id":"932c5f97-c03f-4a1a-a0c5-a518f5edef5e",
"BooleanAttribute_00000000-0000-0000-0001-000500000001_Value":"true"
}
],
"SingleValueListAttribute5":[
{
"SingleValueListAttribute_00000000-0000-0000-0001-000500000031_Id":"ef51dedd-6f25-4408-99a6-5a6cfa13e198",
"SingleValueListAttribute_00000000-0000-0000-0001-000500000031_Value":"Blah"
}
],
"Relation6":[
{
"Animal_Id":"2715ca09-3ced-4b74-a418-cef4a95dddf1",
"Term7":[
{
"Animal_Target_Id":"88fd0090-4ea8-4ae6-b7f0-1b13e5cf3d74",
"Animal_Target_DisplayName":"Animaltheater",
"Animal_Target_FullName":"Animaltheater"
}
]
},
{
"Animal_Id":"6068fe78-fc8e-4542-9aee-7b4b68760dcd",
"Term7":[
{
"Animal_Target_Id":"4e87a614-2a8b-46c0-90f3-8a0cf9bda66c",
"Animal_Target_DisplayName":"Animaltitle",
"Animal_Target_FullName":"Animaltitle"
}
]
},
{
"Animal_Id":"754ec0e6-19b6-4b6b-8ba1-573393268257",
"Term7":[
{
"Animal_Target_Id":"a8986ed5-3ec8-44f3-954c-71cacb280ace",
"Animal_Target_DisplayName":"Animalcustomer",
"Animal_Target_FullName":"Animalcustomer"
}
]
},
{
"Animal_Id":"86b3ffd1-4d54-4a98-b25b-369060651bd6",
"Term7":[
{
"Animal_Target_Id":"89d02067-ebe8-4b87-9a1f-a6a0bdd40ec4",
"Animal_Target_DisplayName":"Animalfact_transaction",
"Animal_Target_FullName":"Animalfact_transaction"
}
]
},
{
"Animal_Id":"ea2e1b76-f8bc-46d9-8ebc-44ffdd60f213",
"Term7":[
{
"Animal_Target_Id":"e398cd32-1e73-46bd-8b8f-d039986d6de0",
"Animal_Target_DisplayName":"Animalfact_transaction",
"Animal_Target_FullName":"Animalfact_transaction"
}
]
}
],
"Relation10":[
{
"TargetRelation_b8b178ff-e957-47db-a4e7-6e5b789d6f03_Id":"aff80bd0-a282-4cf5-bdcc-2bad35ddec1d",
"Term11":[
{
"AnimalId":"3ac22167-eb91-469a-9d94-315aa301f55a",
"AnimalDisplayName":"Animal",
"AnimalFullName":"Animal"
}
]
}
],
"Tag12":[
{
"Tag_Id":"75968ea6-4c9f-43c9-80f7-dfc41b24ec8f",
"Tag_Name":"AnimalAnimaltitle"
},
{
"Tag_Id":"b1adbc00-aeef-415b-82b6-a3159145c60d",
"Tag_Name":"Animal2"
},
{
"Tag_Id":"5f78e4dc-2b37-41e0-a0d3-cec773af2397",
"Tag_Name":"AnimalDisplayName"
}
]
}
]
}
}
The output I am trying to get is a list of all the values of the key Animal_Target_DisplayName, like this: ['Animaltheater','Animaltitle', 'Animalcustomer', 'Animalfact_transaction', 'Animalfact_transaction']. Keep in mind that the nested structure of this JSON always changes, but its keys are always the same.
I guess your only option is to run through the entire dict and collect the values of the Animal_Target_DisplayName key. I propose the following recursive solution:
def run_json(dict_):
    animal_target_sons = []
    if type(dict_) is list:
        for element in dict_:
            animal_target_sons.append(run_json(element))
    elif type(dict_) is dict:
        for key in dict_:
            if key == "Animal_Target_DisplayName":
                animal_target_sons.append([dict_[key]])
            else:
                animal_target_sons.append(run_json(dict_[key]))
    return [x for sublist in animal_target_sons for x in sublist]

run_json(dict_)
Calling run_json then returns a list with what you want. By the way, I recommend renaming your variable from dict to, for example, dict_, because dict is the name of Python's built-in dictionary type and assigning to it shadows the built-in.
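The same recursive walk can also be written as a generator, which avoids building and flattening intermediate lists. This is a sketch along the lines of the answer above rather than a replacement for it: `find_values` is a hypothetical name, the sample data is a shortened stand-in for the question's structure, and the nested `for ... yield` form is used so that it also runs on Python 2.7.

```python
def find_values(node, wanted_key):
    """Yield every value stored under `wanted_key`, at any depth of dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == wanted_key:
                yield value
            else:
                for found in find_values(value, wanted_key):
                    yield found
    elif isinstance(node, list):
        for item in node:
            for found in find_values(item, wanted_key):
                yield found


# Shortened, made-up sample with the same nesting style as the question's data.
sample = {"view": {"Term0": [
    {"DisplayName": "Dog"},
    {"Relation6": [
        {"Term7": [{"Animal_Target_DisplayName": "Animaltheater"}]},
        {"Term7": [{"Animal_Target_DisplayName": "Animaltitle"}]},
    ]},
]}}

names = list(find_values(sample, "Animal_Target_DisplayName"))
print(names)
```

Because the search never names Relation6 or Term7, it keeps working when those objects move to a different depth or position.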
Since you're getting JSON, why not make use of the json module? That will do the parsing for you and allow you to use dictionary functions+features to get the information you need.
#!/usr/bin/python2.7
from __future__ import print_function
import json

# _somehow_ get your JSON in as a string. I'm calling it "jstr" for this
# example.
# Use the module to parse it
jdict = json.loads(jstr)

# our dict has keys...
# view -> Term0 -> keys-we're-interested-in
templist = jdict["view"]["Term0"]
results = {}
for _el in range(len(templist)):
    if templist[_el]["FullName"] == "Animal Set":
        # this is the one we're interested in - and it's another list
        moretemp = templist[_el]["Relation6"]
        for _k in range(len(moretemp)):
            term7 = moretemp[_k]["Term7"][0]
            displayName = term7["Animal_Target_DisplayName"]
            fullName = term7["Animal_Target_FullName"]
            results[fullName] = displayName
print("{0}".format(results))
Then you can dump the results dict plain, or with pretty-printing:
>>> print(json.dumps(results, indent=4))
{
    "Animaltitle": "Animaltitle",
    "Animalcustomer": "Animalcustomer",
    "Animalfact_transaction": "Animalfact_transaction",
    "Animaltheater": "Animaltheater"
}
