Create new JSON data from values in a dictionary and nested dictionary - Python

I have two dictionaries and I want to create new JSON data with values from both of them, as follows:
dic_a = [{'name': 'puskas',
'description': 'puskas is the command center for football',
'size': '251-1K',
'revenue': '$50M-$100M',
'industryTags': ['football federation']}]
dic_b = {'page': 1,
'total': 14,
'results': [{'id': 'i01',
'name': {'fullName': 'luka modric',
'givenName': 'luka',
'familyName': 'modric'},
'role': 'leadership',
'subRole': 'ceo',
'title': 'CEO',
'company': {'name': 'puskas'},
'email': 'luka#puskas.com',
'verified': True},
{'id': 'i02',
'name': {'fullName': 'gucci mane',
'givenName': 'gucci',
'familyName': 'mane'},
'role': 'leadership',
'subRole': 'founder',
'title': 'Co-founder, CTO',
'company': {'name': 'puskas'},
'email': 'gucchi.mane#puskas.com',
'verified': True},
{'id': 'i03',
'name': {'fullName': 'tom ford',
'givenName': 'tom',
'familyName': 'ford'},
'role': 'leadership',
'subRole': 'founder',
'title': 'founder',
'company': {'name': 'puskas'},
'email': 'tomford#puskas.com',
'verified': True}]}
I want to take selected values from dic_b, append them to dic_a, then convert the result to JSON and return it as c.
I have tried a few snippets based on syntax I researched here, but they don't work. I expect the JSON result to look like this:
json_c = [{'name': 'puskas',
'description': 'puskas is the command center for football',
'size': '251-1K',
'revenue': '$50M-$100M',
'industryTags': ['football federation'],
'leads': [{'id': 'i01',
'name': 'luka modric',
'title': 'CEO',
'company': {'name': 'puskas'},
'email': 'luka#puskas.co',
'verified': True},
{'id': 'i02',
'name': 'gucci mane',
'title': 'Co-founder, CTO',
'company': {'name': 'gucci'},
'email': 'gucchi.mane#gucci.com',
'verified': True},
{'id': 'i03',
'name': 'tom ford',
'title': 'founder',
'company': {'name': 'xyz'},
'email': 'tomford#xyz.co',
'verified': True}]}]

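Problems like this can be solved easily with jmespath:
import copy
import json
import jmespath
# Copy dic_a so the original list is not modified in place
c = copy.deepcopy(dic_a)
# Pick the wanted fields from every result and flatten name to name.fullName
c[0]['leads'] = jmespath.search('results[].{id: id, name: name.fullName, title: title, company: company, email: email, verified: verified}', dic_b)
json_string = json.dumps(c, indent=4, ensure_ascii=False)
print(json_string)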
# [
# {
# "name": "puskas",
# "description": "puskas is the command center for football",
# "size": "251-1K",
# "revenue": "$50M-$100M",
# "industryTags": [
# "football federation"
# ],
# "leads": [
# {
# "id": "i01",
# "name": "luka modric",
# "title": "CEO",
# "company": {
# "name": "puskas"
# },
# "email": "luka#puskas.com",
# "verified": true
# },
# {
# "id": "i02",
# "name": "gucci mane",
# "title": "Co-founder, CTO",
# "company": {
# "name": "puskas"
# },
# "email": "gucchi.mane#puskas.com",
# "verified": true
# },
# {
# "id": "i03",
# "name": "tom ford",
# "title": "founder",
# "company": {
# "name": "puskas"
# },
# "email": "tomford#puskas.com",
# "verified": true
# }
# ]
# }
# ]
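If you would rather not add the jmespath dependency, the same result can be built with a list comprehension in plain Python. This is a minimal sketch: build_leads is a hypothetical helper name, and the data is trimmed from the question.

```python
import json

# Trimmed versions of dic_a and dic_b from the question
dic_a = [{'name': 'puskas',
          'industryTags': ['football federation']}]
dic_b = {'page': 1,
         'total': 2,
         'results': [
             {'id': 'i01',
              'name': {'fullName': 'luka modric', 'givenName': 'luka', 'familyName': 'modric'},
              'role': 'leadership', 'subRole': 'ceo', 'title': 'CEO',
              'company': {'name': 'puskas'}, 'email': 'luka#puskas.com', 'verified': True},
             {'id': 'i02',
              'name': {'fullName': 'gucci mane', 'givenName': 'gucci', 'familyName': 'mane'},
              'role': 'leadership', 'subRole': 'founder', 'title': 'Co-founder, CTO',
              'company': {'name': 'puskas'}, 'email': 'gucchi.mane#puskas.com', 'verified': True},
         ]}


def build_leads(results):
    """Keep only the selected fields and flatten 'name' to its fullName."""
    return [
        {
            'id': r['id'],
            'name': r['name']['fullName'],
            'title': r['title'],
            'company': r['company'],
            'email': r['email'],
            'verified': r['verified'],
        }
        for r in results
    ]


# dict(org, leads=...) makes a new dict, so dic_a itself is left untouched
c = [dict(org, leads=build_leads(dic_b['results'])) for org in dic_a]
json_c = json.dumps(c, indent=4, ensure_ascii=False)
print(json_c)
```

Unlike the jmespath version, this raises a KeyError if a result lacks one of the fields; swap r['title'] for r.get('title') if some results may be incomplete.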

Related

Return specific values from nested Elastic Object

I should preface this by saying that I'm working with the Elasticsearch module, which returns elastic_transport.ObjectApiResponse. My problem is that I need to select specific keys from this JSON/dictionary-like log. The indices come from different sources and thus contain different key/value pairs. The values I need to select are ip, port, username, rule_name, severity, and risk_score. The problem is that they live under different key names and each dictionary is vastly different from the others, but they all contain those values. After that, I'll put them into a Pandas DataFrame and create a table with those values. Should a value be missing, I'll fill it with a '-'.
So my question is: how can I iterate over these nested objects that are neither ordered nor standardized? Any help is appreciated. Below is a sample of the data.
{
'took': 11,
'timed_out': False,
'_shards': {
'total': 17,
'successful': 17,
'skipped': 0, 'failed': 0
},
'hits': {
'total': {'value': 58, 'relation': 'eq'},
'max_score': 0.0,
'hits': [
{
'_index': '.siem-signals-default-000017',
'_type': '_doc',
'_id': 'abcd1234',
'_score': 0.0,
'_source': {
'@timestamp': '2023-02-09T15:24:09.368Z',
'process': {'pid': 668, 'executable': 'C:\\Windows\\System32\\lsass.exe', 'name': 'lsass.exe'},
'ecs': {'version': '1.10.0'},
'winlog': {
'computer_name': 'SRVDC1',
'User': 'John.Smith',
'api': 'wineventlog',
'keywords': ['Audit Failure']
},
'source': {'domain': 'SRVDC1', 'ip': '10.17.13.118', 'port': 42548},
'rule': {'id': 'aaabbb', 'actions': [], 'interval': '2m', 'name': 'More Than 3 Failed Login Attempts Within 1 Hour '}}
},
{
'_index': '.siem-signals-default-000017',
'_type': '_doc',
'_id': 'abc123',
'_score': 0.0,
'_source': {
'@timestamp': '2023-02-09T15:24:09.369Z',
'log': {'level': 'information'},
'user': {
'id': 'S-1-0-0',
'name': 'John.Smith',
'domain': 'ACME'
},
'related': {
'port': '42554',
'ip': '10.17.13.118'
},
'logon': {'id': '0x3e7', 'type': 'Network', 'failure': {'sub_status': 'User logon with misspelled or bad password'}},
'meta': {'risk_score': 46, 'severity': 'medium'}}},
{
'_index': '.siem-signals-default-000017',
'_type': '_doc',
'_id': 'zzzzz',
'_score': 0.0,
'_source': {
'source': {
'port': '56489',
'ip': '10.18.13.101'
},
'observer': {
'type': 'firewall',
'name': 'pfSense',
'serial_number': 'xoxo',
'product': 'Supermicro',
'ip': '10.7.3.253'
},
'process': {'name': 'filterlog', 'pid': '45005'},
'tags': ['firewall', 'IP_Private_Source', 'IP_Private_Destination'],
'destination': {'service': 'microsoft-ds', 'port': '445', 'ip': '10.250.0.64'},
'log': {'risk_score': 73, 'severity': 'high'},
'rule':{'name': 'Logstash Firewall (NetBIOS and SMB Vulnerability)'}}}]}}
Expected Output
The sample below is possible only when the logs have the same standard structure.
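One possible approach, offered as a sketch rather than the expected output mentioned above: recursively flatten each hit's _source into dotted paths ('source.ip', 'meta.severity', ...) and, for every wanted column, take the first candidate path that exists, falling back to '-'. FIELD_PATHS, flatten and extract_row are hypothetical names, and the candidate paths are read off the three sample hits only; other sources will need more entries.

```python
# Hypothetical mapping: output column -> dotted paths to try, in order.
# Paths are taken from the sample hits; extend them for other sources.
FIELD_PATHS = {
    'ip': ['source.ip', 'related.ip'],
    'port': ['source.port', 'related.port'],
    'username': ['user.name', 'winlog.User'],
    'rule_name': ['rule.name'],
    'severity': ['meta.severity', 'log.severity'],
    'risk_score': ['meta.risk_score', 'log.risk_score'],
}


def flatten(obj, prefix=''):
    """Recursively flatten nested dicts into {'a.b.c': value} pairs."""
    flat = {}
    for key, value in obj.items():
        path = f'{prefix}.{key}' if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value  # lists and scalars are kept as leaves
    return flat


def extract_row(hit, missing='-'):
    """Pick the wanted fields from one hit, using the first path that exists."""
    flat = flatten(hit.get('_source', {}))
    return {column: next((flat[p] for p in paths if p in flat), missing)
            for column, paths in FIELD_PATHS.items()}


# Trimmed copies of the three sample hits
hits = [
    {'_id': 'abcd1234', '_source': {
        'winlog': {'computer_name': 'SRVDC1', 'User': 'John.Smith'},
        'source': {'domain': 'SRVDC1', 'ip': '10.17.13.118', 'port': 42548},
        'rule': {'id': 'aaabbb', 'name': 'More Than 3 Failed Login Attempts Within 1 Hour '}}},
    {'_id': 'abc123', '_source': {
        'user': {'id': 'S-1-0-0', 'name': 'John.Smith', 'domain': 'ACME'},
        'related': {'port': '42554', 'ip': '10.17.13.118'},
        'meta': {'risk_score': 46, 'severity': 'medium'}}},
    {'_id': 'zzzzz', '_source': {
        'source': {'port': '56489', 'ip': '10.18.13.101'},
        'observer': {'name': 'pfSense', 'ip': '10.7.3.253'},
        'log': {'risk_score': 73, 'severity': 'high'},
        'rule': {'name': 'Logstash Firewall (NetBIOS and SMB Vulnerability)'}}},
]

rows = [extract_row(h) for h in hits]
for row in rows:
    print(row)
```

Assuming the response object can be indexed like a dict (resp['hits']['hits']), the rows can be built the same way and passed straight to pandas.DataFrame(rows). Listing candidate paths explicitly, rather than matching a bare key such as 'ip' anywhere, avoids picking up observer.ip or destination.ip by accident.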

Sort and return all of nested dictionaries based on specified key value

I am trying to re-arrange the contents of a nested dictionary by checking the value of a specified key.
dict_entries = {
'entries': {
'AzP746r3Nl': {
'uniqueID': 'AzP746r3Nl',
'index': 2,
'data': {'comment': 'First Plastique Mat.',
'created': '17/01/19 10:18',
'project': 'EMZ',
'name': 'plastique_varA',
'version': '1'},
'name': 'plastique_varA',
'text': 'plastique test',
'thumbnail': '/Desktop/mat/plastique_varA/plastique_varA.jpg',
'type': 'matEntry'
},
'Q2tch2xm6h': {
'uniqueID': 'Q2tch2xm6h',
'index': 0,
'data': {'comment': 'Camino from John Inds.',
'created': '03/01/19 12:08',
'project': 'EMZ',
'name': 'camino_H10a',
'version': '1'},
'name': 'camino_H10a',
'text': 'John Inds : Camino',
'thumbnail': '/Desktop/chips/camino_H10a/camino_H10a.jpg',
'type': 'ChipEntry'
},
'ZeqCFCmHqp': {
'uniqueID': 'ZeqCFCmHqp',
'index': 1,
'data': {'comment': 'Prototype Bleu.',
'created': '03/01/19 14:07',
'project': 'EMZ',
'name': 'bleu_P23y',
'version': '1'},
'name': 'bleu_P23y',
'text': 'Bleu : Prototype',
'thumbnail': '/Desktop/chips/bleu_P23y/bleu_P23y.jpg',
'type': 'ChipEntry'
}
}
}
In the nested dictionary above, I want to sort the entries by the name key or by the created key (one function for each), and once they are sorted, update each entry's index value accordingly.
So far, I am able to query the values of those keys:
for item in dict_entries.get('entries').values():
    # The key that I am targeting
    tar_key = item['name']
but this only returns the value of the name key, and I am unsure of my next step: I want to sort by the value of the name key while capturing and re-arranging the full contents of each nested dictionary.
This is my desired output (if checking by name):
{'entries': {
'ZeqCFCmHqp': {
'uniqueID': 'ZeqCFCmHqp',
'index': 1,
'data': {'comment': 'Prototype Bleu.',
'created': '03/01/19 14:07',
'project': 'EMZ',
'name': 'bleu_P23y',
'version': '1'},
'name': 'bleu_P23y',
'text': 'Bleu : Prototype',
'thumbnail': '/Desktop/chips/bleu_P23y/bleu_P23y.jpg',
'type': 'ChipEntry'
},
'Q2tch2xm6h': {
'uniqueID': 'Q2tch2xm6h',
'index': 0,
'data': {'comment': 'Camino from John Inds.',
'created': '03/01/19 12:08',
'project': 'EMZ',
'name': 'camino_H10a',
'version': '1'},
'name': 'camino_H10a',
'text': 'John Inds : Camino',
'thumbnail': '/Desktop/chips/camino_H10a/camino_H10a.jpg',
'type': 'ChipEntry'
},
'AzP746r3Nl': {
'uniqueID': 'AzP746r3Nl',
'index': 2,
'data': {'comment': 'First Plastique Mat.',
'created': '17/01/19 10:18',
'project': 'EMZ',
'name': 'plastique_varA',
'version': '1'},
'name': 'plastique_varA',
'text': 'plastique test',
'thumbnail': '/Desktop/mat/plastique_varA/plastique_varA.jpg',
'type': 'matEntry'
}
}
}
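A minimal sketch of the two sort functions, relying on dicts keeping insertion order (Python 3.7+): sort the (uniqueID, entry) pairs with sorted(), rebuild the dict in that order and renumber index. sort_entries, by_name and by_created are hypothetical names, and by_created assumes created is day/month/year (17/01/19 only parses that way). The desired output above keeps the old index values; drop the index=position renumbering if that is what you want.

```python
from datetime import datetime


def sort_entries(dict_entries, key_func):
    """Return a new {'entries': ...} dict ordered by key_func(entry),
    with each entry's 'index' renumbered to match its new position."""
    ordered = sorted(dict_entries['entries'].items(), key=lambda kv: key_func(kv[1]))
    new_entries = {}
    for position, (uid, entry) in enumerate(ordered):
        new_entries[uid] = dict(entry, index=position)  # copy; original is untouched
    return {'entries': new_entries}


def by_name(entry):
    return entry['name']


def by_created(entry):
    # Assumes 'created' looks like '17/01/19 10:18' (day/month/year)
    return datetime.strptime(entry['data']['created'], '%d/%m/%y %H:%M')


# Trimmed copy of the question's data
dict_entries = {'entries': {
    'AzP746r3Nl': {'uniqueID': 'AzP746r3Nl', 'index': 2,
                   'data': {'created': '17/01/19 10:18'}, 'name': 'plastique_varA'},
    'Q2tch2xm6h': {'uniqueID': 'Q2tch2xm6h', 'index': 0,
                   'data': {'created': '03/01/19 12:08'}, 'name': 'camino_H10a'},
    'ZeqCFCmHqp': {'uniqueID': 'ZeqCFCmHqp', 'index': 1,
                   'data': {'created': '03/01/19 14:07'}, 'name': 'bleu_P23y'},
}}

sorted_by_name = sort_entries(dict_entries, by_name)
sorted_by_created = sort_entries(dict_entries, by_created)
print(list(sorted_by_name['entries']))
print(list(sorted_by_created['entries']))
```

Parsing created into a datetime matters: comparing the raw strings would order '17/01/19' before '20/12/18', because the day comes first.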

Loop through multidimensional JSON (Python)

I have a JSON with following structure:
{
'count': 93,
'apps' : [
{
'last_modified_at': '2016-10-21T12:20:26Z',
'frequency_caps': [],
'ios': {
'enabled': True,
'push_enabled': False,
'app_store_id': 'bbb',
'connection_type': 'certificate',
'sdk_api_secret': '--'
},
'organization_id': '--',
'name': '---',
'app_id': 27,
'control_group_percentage': 0,
'created_by': {
'user_id': 'abc',
'user_name': 'def'
},
'created_at': '2016-09-28T11:41:24Z',
'web': {}
}, {
'last_modified_at': '2016-10-12T08:58:57Z',
'frequency_caps': [],
'ios': {
'enabled': True,
'push_enabled': True,
'app_store_id': '386304604',
'connection_type': 'certificate',
'sdk_api_secret': '---',
'push_expiry': '2018-01-14T08:24:09Z'
},
'organization_id': '---',
'name': '---',
'app_id': 87,
'control_group_percentage': 0,
'created_by': {
'user_id': '----',
'user_name': '---'
},
'created_at': '2016-10-12T08:58:57Z',
'web': {}
}
]
}
It's a JSON object with two key-value pairs. The second pair's value is a list of further JSON objects.
That is too much information for me, and I want a JSON like this:
{
'apps' : [
{
'name': 'Appname',
'app_id' : 1234,
'organization_id' : 'Blablabla'
},
{
'name': 'Appname2',
'app_id' : 5678,
'organization_id' : 'Some other Organization'
}
]
}
I want a JSON that contains only one key ("apps") and its value, which would be a list of JSON objects that each have only three key-value pairs.
I am thankful for any advice.
Thank you for your help!
@bishakh-ghosh I don't think you need to treat the input JSON as a string. It can be used directly as a dictionary (thus avoiding ast).
A more concise way:
# your original json
input_ = { 'count': 93, ... }
And here are the steps:
Define which keys you want to keep:
slice_keys = ['name', 'app_id', 'organization_id']
Build the new dictionary by slicing each app on slice_keys:
dict(apps=[{key:value for key,value in d.items() if key in slice_keys} for d in input_['apps']])
And that's it.
That should yield the JSON in the format you want, e.g.:
{
'apps':
[
{'app_id': 27, 'name': '---', 'organization_id': '--'},
{'app_id': 87, 'name': '---', 'organization_id': '---'}
]
}
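A self-contained version of the concise approach, including the final json.dumps step the question asks for. This variant iterates over slice_keys instead of d.items(), so each object's keys come out in slice_keys order and only the wanted keys are visited; the input is trimmed from the question.

```python
import json

# Trimmed copy of the question's input (only some of the fields)
input_ = {
    'count': 93,
    'apps': [
        {'last_modified_at': '2016-10-21T12:20:26Z', 'organization_id': '--',
         'name': '---', 'app_id': 27, 'web': {}},
        {'last_modified_at': '2016-10-12T08:58:57Z', 'organization_id': '---',
         'name': '---', 'app_id': 87, 'web': {}},
    ],
}

slice_keys = ['name', 'app_id', 'organization_id']

# Keep only slice_keys from each app; a key missing from an app is skipped
result = {'apps': [{k: d[k] for k in slice_keys if k in d} for d in input_['apps']]}

print(json.dumps(result, indent=2))
```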
This might be what you are looking for:
import ast
import json
json_str = """{
'count': 93,
'apps' : [
{
'last_modified_at': '2016-10-21T12:20:26Z',
'frequency_caps': [],
'ios': {
'enabled': True,
'push_enabled': False,
'app_store_id': 'bbb',
'connection_type': 'certificate',
'sdk_api_secret': '--'
},
'organization_id': '--',
'name': '---',
'app_id': 27,
'control_group_percentage': 0,
'created_by': {
'user_id': 'abc',
'user_name': 'def'
},
'created_at': '2016-09-28T11:41:24Z',
'web': {}
}, {
'last_modified_at': '2016-10-12T08:58:57Z',
'frequency_caps': [],
'ios': {
'enabled': True,
'push_enabled': True,
'app_store_id': '386304604',
'connection_type': 'certificate',
'sdk_api_secret': '---',
'push_expiry': '2018-01-14T08:24:09Z'
},
'organization_id': '---',
'name': '---',
'app_id': 87,
'control_group_percentage': 0,
'created_by': {
'user_id': '----',
'user_name': '---'
},
'created_at': '2016-10-12T08:58:57Z',
'web': {}
}
]
}"""
json_dict = ast.literal_eval(json_str)
new_dict = {}
app_list = []
for appdata in json_dict['apps']:
    appdata_dict = {}
    appdata_dict['name'] = appdata['name']
    appdata_dict['app_id'] = appdata['app_id']
    appdata_dict['organization_id'] = appdata['organization_id']
    app_list.append(appdata_dict)
new_dict['apps'] = app_list
new_json_str = json.dumps(new_dict)
print(new_json_str)  # This is your resulting json string

Mongo Distinct Query with full row object

First of all, I'm new to MongoDB, so I don't know much, and I cannot simply remove the duplicate rows because of some dependencies.
I have the following data stored in MongoDB:
{'id': 1, 'key': 'qscderftgbvqscderftgbvqscderftgbvqscderftgbvqscderftgbv', 'name': 'some name', 'country': 'US'},
{'id': 2, 'key': 'qscderftgbvqscderftgbvqscderftgbvqscderftgbvqscderftgbv', 'name': 'some name', 'country': 'US'},
{'id': 3, 'key': 'pehnvosjijipehnvosjijipehnvosjijipehnvosjijipehnvosjiji', 'name': 'some name', 'country': 'IN'},
{'id': 4, 'key': 'pfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnew', 'name': 'some name', 'country': 'IN'},
{'id': 5, 'key': 'pfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnew', 'name': 'some name', 'country': 'IN'}
As you can see, some of the rows are duplicates that differ only in id.
Since fixing this at the input would take a long time, I must tackle it on the output side.
I need the data in the following form:
{'id': 1, 'key': 'qscderftgbvqscderftgbvqscderftgbvqscderftgbvqscderftgbv', 'name': 'some name', 'country': 'US'},
{'id': 3, 'key': 'pehnvosjijipehnvosjijipehnvosjijipehnvosjijipehnvosjiji', 'name': 'some name', 'country': 'IN'},
{'id': 4, 'key': 'pfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnew', 'name': 'some name', 'country': 'IN'}
My query:
keys = db.collection.distinct('key', {})
all_data = db.collection.find({'key': {$in: keys}})
As you can see, it takes two queries to get this result set. Please help me combine them into one, as the database is very large.
I could also create a unique index on key, but the value is so long (152 characters) that I don't think it would help.
Or would it?
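You need to use the aggregation framework for this. There are multiple ways to do it; the solution below uses the $$ROOT variable to keep the first document of each group:
db.data.aggregate([
    // Sort first so that $first picks the document with the lowest id in each group
    {"$sort": {"id": 1}},
    // One group per distinct key, keeping the whole first document via $$ROOT
    {"$group": {
        "_id": "$key",
        "first": {"$first": "$$ROOT"}
    }},
    // Reshape back to the original fields
    {"$project": {
        "_id": 0,
        "id": "$first.id",
        "key": "$first.key",
        "name": "$first.name",
        "country": "$first.country"
    }},
    // $group does not preserve order, so sort the output again
    {"$sort": {"id": 1}}
])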
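To see what the pipeline computes, here is the same "first document per key" logic in plain Python on the sample rows. It is only an illustration: for a very large collection, the point of the pipeline is that the deduplication runs inside MongoDB. With pymongo, the pipeline itself can be passed as a list of Python dicts to collection.aggregate(...). first_per_key is a hypothetical name.

```python
# Sample documents from the question
rows = [
    {'id': 1, 'key': 'qscderftgbvqscderftgbvqscderftgbvqscderftgbvqscderftgbv', 'name': 'some name', 'country': 'US'},
    {'id': 2, 'key': 'qscderftgbvqscderftgbvqscderftgbvqscderftgbvqscderftgbv', 'name': 'some name', 'country': 'US'},
    {'id': 3, 'key': 'pehnvosjijipehnvosjijipehnvosjijipehnvosjijipehnvosjiji', 'name': 'some name', 'country': 'IN'},
    {'id': 4, 'key': 'pfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnew', 'name': 'some name', 'country': 'IN'},
    {'id': 5, 'key': 'pfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnew', 'name': 'some name', 'country': 'IN'},
]


def first_per_key(docs):
    """Plain-Python analogue of: $sort by id, $group on key with $first, $sort by id."""
    chosen = {}
    for doc in sorted(docs, key=lambda d: d['id']):
        chosen.setdefault(doc['key'], doc)  # only the first (lowest-id) doc per key is kept
    return sorted(chosen.values(), key=lambda d: d['id'])


for doc in first_per_key(rows):
    print(doc)
```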

Python - Recursive function to serialise family tree information into JSON

I am having difficulty creating a function that will produce a family tree in JSON format.
An example of a two parent, two offspring tree can be seen here:
{
"children": [
{
"id": 409,
"name": "Joe Bloggs",
"no_parent": "true"
},
{
"children": [
{
"children": [],
"id": 411,
"name": "Alice Bloggs"
},
{
"children": [],
"id": 412,
"name": "John Bloggs"
}
],
"hidden": "true",
"id": "empty_node_id_9",
"name": "",
"no_parent": "true"
},
{
"children": [],
"id": 410,
"name": "Sarah Smith",
"no_parent": "true"
}
],
"hidden": "true",
"id": "year0",
"name": ""
}
Joe Bloggs is married to Sarah Smith, with children Alice Bloggs and John Bloggs. The empty nodes exist purely to handle vertices in the tree-map diagram (see jsfiddle below).
The above example should help explain the syntax. A more complex tree can be found on this jsfiddle: http://jsfiddle.net/cyril123/0vbtvoon/22/
The JSON associated with the jsfiddle is on lines 34 to 101.
I am having difficulty writing a function that recursively produces the JSON for a family tree. I begin with a person class that represents the oldest member of the family. The function then checks for marriages, children, etc., and continues until the tree is complete, returning the JSON.
My code involves a person class as well as an associated marriage class. I have appropriate methods such as ids for each person, a get_marriage() function, get_children() methods, etc. I am wondering what the best way to go about this is.
My attempt at a recursive function can be found below. The methods/functions involved etc are not detailed but their purpose should be self-explanatory. Many thanks.
def root_nodes(people, first_node=False):  # begin by passing in oldest family member and first_node=True
    global obj, current_obj, people_used
    if obj is not None: print len(str(obj))
    if type(people) != list:
        people = [people]
    for x in people:
        if x in rootPeople and first_node:  # handles the beginning of the JSON with an empty 'root' starting node.
            first_node = False
            obj = {'name': "", 'id': 'year0', 'hidden': 'true', 'children': root_nodes(people)}
            return obj
        else:
            marriage_info = get_marriage(x)
            if marriage_info is None:  # if person is not married
                current_obj = {'name': x.get_name(), 'id': x.get_id(), 'children': []}
                people_used.append(x)
            else:
                partners = marriage_info.get_members()
                husband, wife = partners[0].get_name(), partners[1].get_name()
                husband_id, wife_id = marriage_info.husband.get_id(), marriage_info.wife.get_id()
                marriage_year = marriage_info.year
                children = marriage_info.get_children()
                people_used.append(partners[0])
                people_used.append(partners[1])
                if partners[0].get_parents() == ['None', 'None'] or partners[1].get_parents() == ['None', 'None']:
                    if partners[0].get_parents() == ['None', 'None'] and partners[1].get_parents() == ['None', 'None']:
                        current_obj = {'name': str(husband), 'id': husband_id, 'no_parent': 'true'}, {'name': '', 'id': 'empty_node_id_' + empty_node(), 'no_parent': 'true', 'hidden': 'true', 'children': root_nodes(children)}, {'name': str(wife), 'id': wife_id, 'no_parent': 'true', 'children': []}
                    if partners[0].get_parents() == ['None', 'None'] and partners[1].get_parents() != ['None', 'None']:
                        current_obj = {'name': str(husband), 'id': husband_id, 'no_parent': 'true'}, {'name': '', 'id': 'empty_node_id_' + empty_node(), 'no_parent': 'true', 'hidden': 'true', 'children': root_nodes(children)}, {'name': str(wife), 'id': wife_id, 'children': []}
                    if partners[0].get_parents() != ['None', 'None'] and partners[1].get_parents() == ['None', 'None']:
                        current_obj = {'name': str(husband), 'id': husband_id}, {'name': '', 'id': 'empty_node_id_' + empty_node(), 'no_parent': 'true', 'hidden': 'true', 'children': root_nodes(children)}, {'name': str(wife), 'id': wife_id, 'no_parent': 'true', 'children': []}
                else:
                    if not any((True for x in partners[0].get_parents() if x in people_used)):
                        current_obj = {'name': str(husband), 'id': husband_id, 'no_parent': 'true'}, {'name': '', 'id': 'empty_node_id_' + empty_node(), 'no_parent': 'true', 'hidden': 'true', 'children': root_nodes(children)}, {'name': str(wife), 'id': wife_id, 'children': []}
                    elif not any((True for x in partners[1].get_parents() if x in people_used)):
                        current_obj = {'name': str(husband), 'id': husband_id}, {'name': '', 'id': 'empty_node_id_' + empty_node(), 'no_parent': 'true', 'hidden': 'true', 'children': root_nodes(children)}, {'name': str(wife), 'id': wife_id, 'no_parent': 'true', 'children': []}
                    else:
                        current_obj = {'name': str(husband), 'id': husband_id}, {'name': '', 'id': 'empty_node_id_' + empty_node(), 'no_parent': 'true', 'hidden': 'true', 'children': root_nodes(children)}, {'name': str(wife), 'id': wife_id, 'children': []}
                return current_obj
        if obj is None:
            obj = current_obj
        else:
            obj = obj, current_obj
        if people.index(x) == len(people)-1:
            return obj
Even though the function above is badly written, it almost works. The only case where it fails is when one child is married: the other children are then left out of the JSON. This is because the function returns without going on to the next iteration of the for loop. Any suggestions on how to fix this would be appreciated.
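One way around the early return is to have each call collect the nodes for every person in the list and return only after the loop, instead of returning as soon as a married person is handled. Below is a sketch of that structure with hypothetical Person and Marriage dataclasses standing in for the asker's classes; get_marriage() is replaced by a marriages dict keyed by person id, and get_parents() by a has_parents flag. It assumes the root list holds one spouse per couple, otherwise the couple would be emitted twice (the asker's people_used list is one way to guard against that).

```python
import itertools
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Person:  # hypothetical stand-in for the asker's person class
    id: int
    name: str
    has_parents: bool = True  # False for people with no recorded parents


@dataclass
class Marriage:  # hypothetical stand-in for the asker's marriage class
    husband: Person
    wife: Person
    children: List[Person] = field(default_factory=list)


_empty_ids = itertools.count(1)  # numbering for the hidden connector nodes


def person_nodes(person: Person, marriages: Dict[int, Marriage]) -> list:
    """Nodes for one person: a single leaf, or [husband, hidden node, wife] for a couple."""
    marriage = marriages.get(person.id)
    if marriage is None:
        node = {'name': person.name, 'id': person.id, 'children': []}
        if not person.has_parents:
            node['no_parent'] = 'true'
        return [node]
    h, w = marriage.husband, marriage.wife
    husband = {'name': h.name, 'id': h.id}
    wife = {'name': w.name, 'id': w.id, 'children': []}
    for p, node in ((h, husband), (w, wife)):
        if not p.has_parents:
            node['no_parent'] = 'true'
    hidden = {'name': '', 'id': f'empty_node_id_{next(_empty_ids)}', 'no_parent': 'true',
              'hidden': 'true', 'children': build_nodes(marriage.children, marriages)}
    return [husband, hidden, wife]


def build_nodes(people: List[Person], marriages: Dict[int, Marriage]) -> list:
    """Collect the nodes of *every* person before returning (no return inside the loop)."""
    nodes = []
    for person in people:
        nodes.extend(person_nodes(person, marriages))
    return nodes


def family_tree(root_people, marriages):
    """Wrap the top generation in the hidden 'year0' root node."""
    return {'name': '', 'id': 'year0', 'hidden': 'true',
            'children': build_nodes(root_people, marriages)}


# Example: Alice is married, the case that made the original function drop John
joe = Person(409, 'Joe Bloggs', has_parents=False)
sarah = Person(410, 'Sarah Smith', has_parents=False)
alice = Person(411, 'Alice Bloggs')
john = Person(412, 'John Bloggs')
bob = Person(413, 'Bob Brown', has_parents=False)
first = Marriage(joe, sarah, [alice, john])
second = Marriage(bob, alice)
# Each spouse's id points to their marriage, so either spouse can be looked up
marriages = {409: first, 410: first, 413: second, 411: second}

tree = family_tree([joe], marriages)
print(json.dumps(tree, indent=2))
```

Returning plain lists and extending them also avoids the nested tuples that obj = obj, current_obj builds up in the original.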
