Structure JSON format to a specified data structure - Python

Basically I have a structure like this (annotated with where I think each value maps):
data_list = {
    '__att_names': [
        ['id', 'name'],                    # "__t_idx": 0
        ['location', 'address'],           # "__t_idx": 1
        ['random_key1', 'random_key2'],    # "__t_idx": 2
        ['random_key3', 'random_key4'],    # "__t_idx": 3
    ],
    '__root': {
        'comparables': [
            {
                '__g_id': '153564396',
                '__atts': [
                    1,                   # this would technically be __att_names[0][0]
                    'somerandomname',    # this would technically be __att_names[0][1]
                    {
                        '__atts': [
                            'location_value',    # this would technically be __att_names[1][0]
                            'address_value',     # this would technically be __att_names[1][1]
                            {
                                '__atts': [],
                                '__t_idx': 1     # it can keep getting nested further and further
                            }
                        ],
                        '__t_idx': 1
                    },
                    {
                        '__atts': [
                            'random_key3value',
                            'random_key4value'
                        ],
                        '__t_idx': 3
                    },
                    {
                        '__atts': [
                            'random_key1value',
                            'random_key2value'
                        ],
                        '__t_idx': 2
                    }
                ],
                '__t_idx': 0    # this maps to the first item in __att_names
            }
        ]
    }
}
My desired output in this case would be:
[
    {
        'id': 1,
        'name': 'somerandomname',
        'location': 'location_value',
        'address': 'address_value',
        'random_key1': 'random_key1value',
        'random_key2': 'random_key2value',
        'random_key3': 'random_key3value',
        'random_key4': 'random_key4value',
    }
]
I was able to get this working for the first few nested fields in __att_names, but my code got really long and repetitive once I started handling the nesting.
I feel like there is a neater, recursive way to solve this.
This is my current approach:
As of now the following code only takes care of the very first nested object:
def recursive_function(index, attributes, payload_names, output):
    category_location = payload_names[index]
    for index, categories in enumerate(category_location):
        output[categories] = attributes[index]
        if type(attributes[index]) == dict:
            has_nested_index = attributes[index].get('__t_idx')
            has_nested_attributes = attributes[index].get('__atts')
            if has_nested_attributes and has_nested_index:
                recursive_function(has_nested_index, has_nested_attributes, payload_names, output)
        else:
            continue

payload_names = data_list['__att_names']
comparable_data = data_list['__root']['comparables']
output_arr = []
for items in comparable_data[:1]:
    output = {}
    index_number = items.get('__t_idx')
    attributes = items.get('__atts')
    if attributes:
        recursive_function(index_number, attributes, payload_names, output)
    output_arr.append(output)
To further explain the desired output above: take 'address': 'address_value'. That value comes from the comparables array, which holds dictionaries with __g_id, __atts and __t_idx keys. Some of them might not have __g_id, but whenever there is an __atts key there is also a __t_idx, which maps that __atts array to an entry in __att_names.
Overall:
__att_names holds all the different key names,
and the items inside comparables -> __atts are the values for those keys.
__t_idx maps each __atts array to an entry in __att_names, which is how the key-value pairs in the output are formed.

If you want to restructure a complex JSON object, my recommendation is to use jq.
Python package
Official website
The data you present is really confusing and obfuscated, so I'm not sure what exact filtering your case would require. But from what I understand, your problem involves indefinitely nested data. So instead of a recursive function, you could make a loop that unnests the data into the plain structure you want. There's already a question on that topic.
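For illustration, here is a minimal sketch of that loop-based idea, using an explicit stack instead of recursion. The function name flatten_comparable is my own, and it assumes every __atts group carries a __t_idx, as in the question:
def flatten_comparable(comparable, att_names):
    """Unnest one 'comparables' entry into a flat dict using an explicit stack."""
    result = {}
    # each stack item is an (__atts list, __t_idx) pair still to be processed
    stack = [(comparable.get('__atts', []), comparable.get('__t_idx'))]
    while stack:
        atts, t_idx = stack.pop()
        names = att_names[t_idx]
        plain_values = []
        for item in atts:
            if isinstance(item, dict) and '__atts' in item:
                # nested group: handle it later with its own __t_idx
                stack.append((item['__atts'], item['__t_idx']))
            else:
                plain_values.append(item)
        # pair the plain values in this group with its key names
        result.update(zip(names, plain_values))
    return result

flat = [flatten_comparable(c, data_list['__att_names'])
        for c in data_list['__root']['comparables']]
print(flat)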

You can traverse the structure while tracking the __t_idx key values that correspond to list elements that are not dictionaries:
data_list = {'__att_names': [['id', 'name'], ['location', 'address'], ['random_key1', 'random_key2'], ['random_key3', 'random_key4']], '__root': {'comparables': [{'__g_id': '153564396', '__atts': [1, 'somerandomname', {'__atts': ['location_value', 'address_value', {'__atts': [], '__t_idx': 1}], '__t_idx': 1}, {'__atts': ['random_key3value', 'random_key4value'], '__t_idx': 3}, {'__atts': ['random_key1value', 'random_key2value'], '__t_idx': 2}], '__t_idx': 0}]}}
def get_vals(d, f=False, t_idx=None):
    if isinstance(d, dict) and '__atts' in d:
        yield from [i for a, b in d.items() for i in get_vals(b, t_idx=d.get('__t_idx'))]
    elif isinstance(d, list):
        yield from [i for b in d for i in get_vals(b, f=True, t_idx=t_idx)]
    elif f and t_idx is not None:
        yield (d, t_idx)

result = []
for i in data_list['__root']['comparables']:
    new_d = {}
    for a, b in get_vals(i):
        new_d[b] = iter([*new_d.get(b, []), a])
    result.append({j: next(new_d[i]) for i, a in enumerate(data_list['__att_names']) for j in a})
print(result)
Output:
[
    {
        'id': 1,
        'name': 'somerandomname',
        'location': 'location_value',
        'address': 'address_value',
        'random_key1': 'random_key1value',
        'random_key2': 'random_key2value',
        'random_key3': 'random_key3value',
        'random_key4': 'random_key4value'
    }
]

Related

Python - handle empty list when iterating through dict

I have a list of dicts and need to retrieve the events key, which is a list. However, that list is not always filled with data, depending on the case.
How do I iterate through them without getting a "list index out of range" error?
[-1] does work, but when events is an empty list I get that error.
Sample input:
jobs = [
    {
        "JobName": "xyz",
        "JobRunState": "SUCCEEDED",
        "LogGroupName": "xyz",
        "Id": "xyz",
        "events": []
    },
    {
        "JobName": "xyz2",
        "JobRunState": "SUCCEEDED",
        "LogGroupName": "xyz",
        "Id": "xyz",
        "events": [
            {
                "timestamp": 1673596884835,
                "message": "....",
                "ingestionTime": 1673598934350
            },
            {
                "timestamp": 1673599235711,
                "message": "....",
                "ingestionTime": 1673599236353
            }
        ]
    }
]
Code:
success = [
    {
        "name": x["JobName"],
        "state": x["JobRunState"],
        "event": self.logs_client.get_log_events(
            logGroupName=x["LogGroupName"] + "/output",
            logStreamName=x["Id"],
        )["events"][-1]["message"],
    }
    for x in jobs
    if x["JobRunState"] in self.SUCCESS
]
Expected behavior: when ["events"] is empty, return "event" as an empty list.
[
{'name': 'xyz', 'state': 'SUCCEEDED', 'event': []},
{'name': 'xyz2', 'state': 'SUCCEEDED', 'event': "...."}
]
Error code:
"event": self.logs_client.get_log_events(
IndexError: list index out of range
If you actually wanted to get all the events and not just the last one, you could do:
success = [
    {"event": event["message"]}
    for x in jobs
    for event in self.logs_client.get_log_events(
        logGroupName=x["LogGroupName"] + "/output",
        logStreamName=x["Id"],
    )["events"]
]
which will simply handle empty lists by not producing a dictionary for those jobs.
If you really just want the last one, but still want to skip jobs with no events, modify the above code to iterate over a slice containing either the last event or nothing:
success = [
    {"event": last_event["message"]}
    for x in jobs
    for last_event in self.logs_client.get_log_events(
        logGroupName=x["LogGroupName"] + "/output",
        logStreamName=x["Id"],
    )["events"][-1:]
]
The useful difference with the slice operation is that it gives you a list no matter what, rather than an IndexError on an empty list:
>>> [1, 2, 3][-1:]
[3]
>>> [][-1:]
[]
The simple answer is to not try to do everything inside a list comprehension. Just make it a regular loop where you can add more complex logic and build your resulting list with append().
successes = list()
for job in jobs:
    if job["JobRunState"] in self.SUCCESS:
        success = dict()
        # do stuff to populate the success object
        successes.append(success)
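For example, here is a minimal sketch of such a loop that returns an empty list for jobs with no events, matching the expected output above (it assumes the self.logs_client and self.SUCCESS attributes from the question):
successes = []
for job in jobs:
    if job["JobRunState"] not in self.SUCCESS:
        continue
    events = self.logs_client.get_log_events(
        logGroupName=job["LogGroupName"] + "/output",
        logStreamName=job["Id"],
    )["events"]
    successes.append({
        "name": job["JobName"],
        "state": job["JobRunState"],
        # last message if there are any events, otherwise an empty list
        "event": events[-1]["message"] if events else [],
    })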

How to search an object list for a specific attribute value that exists in another list of objects

I have two lists
list1 = [obj1, obj2, ... objn] # len(list1) == N
list2 = [obj1, obj2, ... objm] # len(list2) == M
Here's a JSON representation of obj:
obj = {
"a1": 0,
"a2": 1,
"a3": 2
}
How would I determine the objects from list2 that have the same value for obj["a1"] as those in list1? Note that it's possible to have multiple occurrences of this. The objects in both lists are formatted the same.
I am only interested in seeing whether the value of a certain attribute from one list can be found in the other.
For example
list1 = [
    {
        "a1": 0,
        "a2": 5,
        "a3": 4
    },
    {
        "a1": 2,
        "a2": 3,
        "a3": 1
    }
    ...
]
list2 = [
    # first object
    {
        "a1": 0,
        "a2": 3,
        "a3": 1
    },
    # second object
    {
        "a1": 3,
        "a2": 1,
        "a3": 0
    }
    ...
]
In this case, the first object in list2 has the same value for obj["a1"] as the first object in list1.
Using pandas, you can try this:
import pandas as pd

list1 = [
    {
        "a1": 0,
        "a2": 5,
        "a3": 4
    },
    {
        "a1": 2,
        "a2": 3,
        "a3": 1
    }
]
list2 = [
    # first object
    {
        "a1": 0,
        "a2": 3,
        "a3": 1
    },
    # second object
    {
        "a1": 3,
        "a2": 1,
        "a3": 0
    }
]
df1 = pd.DataFrame(list1)
df2 = pd.DataFrame(list2)
a1 = df2[df2['a1'].isin(df1['a1'])]
a1.to_json(orient='records', lines=True)
Check out pandas: you can easily transform the lists into DataFrames, and from there doing what you need is pretty straightforward.
Index the two DataFrames with "a1", and then check this link to get the intersection.
Try this (I have not run the code, but it should work):
import pandas as pd
df1 = pd.DataFrame(list1)
df2 = pd.DataFrame(list2)
df1.set_index("a1",inplace=True)
df2.set_index("a1",inplace=True)
df1.index.intersection(df2.index)
This should give you the intersection of the "a1" index values.
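If you also want the matching rows from list2 back as dictionaries, a small extension of the snippet above (my own addition, not part of the original answer) could look like this:
common = df1.index.intersection(df2.index)
# select the rows of df2 whose "a1" value also occurs in df1,
# then turn them back into a list of dicts
matches = df2.loc[common].reset_index().to_dict(orient='records')
# with the example data: [{'a1': 0, 'a2': 3, 'a3': 1}]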
The easiest solution would be to simply go through a double loop:
for obj1 in list1:
    for key in obj1:
        for obj2 in list2:
            if obj1[key] == obj2[key]:
                pass  # Do what you want with the match
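If you only care about the "a1" attribute, a set-based membership check is a simpler alternative (my own sketch, not part of the answer above):
# collect the "a1" values seen in list1, then filter list2 against them
a1_values = {obj["a1"] for obj in list1}
matches = [obj for obj in list2 if obj["a1"] in a1_values]
# with the example data: [{'a1': 0, 'a2': 3, 'a3': 1}]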
You could look into different libraries and you may find an answer there, but you can also play around with plain dictionaries to achieve the same result. Let me know if you have any issues with this method.
def get_groups(list1, list2):
    # assuming the obj are dictionaries in lists 1 & 2
    # we store entries into known_data as
    # (category, value) : [obj_references]
    # e.g. known_data = {('a1', 0): [obj1, obj23, obj3]}
    known_data = {}
    for obj in list1:
        for category, value in obj.items():
            key = (category, value)
            entry = known_data.get(key) or []
            entry.append(obj)
            known_data[key] = entry
    # now we can iterate over list2 and check whether it shares any keys
    # for groups we store our common key (category, value)
    # and map it to a 2D array [[list1 objs], [list2 objs]]
    groups = {}
    for obj in list2:
        for category, value in obj.items():
            key = (category, value)
            if key not in known_data:
                continue
            entry = groups.get(key) or [known_data[key], []]
            entry[1].append(obj)
            groups[key] = entry
    return groups
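A quick usage sketch with the example lists from the question (the described result is my own reading of what the function returns):
groups = get_groups(list1, list2)
# keys are the shared (attribute, value) pairs, values are [list1 objs, list2 objs]
for (category, value), (from_list1, from_list2) in groups.items():
    print(category, value, from_list1, from_list2)
# with the example data the shared pairs are ('a1', 0), ('a2', 3) and ('a3', 1)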

Flattening python object and list of child objects to a single dict

I'm trying to convert a list of asset objects, each of which has a list of attribute objects, into an array of dictionaries. I'm trying to denormalise the parent/child relationship into a single dictionary.
For the context of my code below, I have an Asset object with a short_name, and each asset has a list of attributes, each with an attribute_name and attribute_value.
My intended result is something like this:
[{'name': 'Test', 'attr': 0.9}, {'name': 'Test2', 'attr': 0.5}]
So far I've written it like this:
a_list = []
for a in self.assets:
    asset_dict = {'name': a.short_name}
    for x in a.attributes:
        asset_dict = asset_dict | {x.attribute_name: x.attribute_value}
    a_list.append(asset_dict)
This works fine, but I'm looking for a neater solution.
I experimented with:
result = [{'name': a.short_name} | {x.attribute_name: x.attribute_value} for x in a.attribute for a in self.assets]
However, I just can't seem to get the syntax correct, and I'm not sure if it is possible to do something like this.
EDIT: Inputs on request (excluding the class definitions):
self.assets = [Asset(short_name='Test'),Asset(short_name='Test2')]
self.assets[0].attributes = [Attribute(attribute_name='attr',attribute_value=0.9)]
self.assets[1].attributes = [Attribute(attribute_name='attr',attribute_value=0.5)]
This should work:
a_list = [
    {'name': a.short_name} |
    {x.attribute_name: x.attribute_value for x in a.attributes}
    for a in self.assets
]
or
a_list = [
    {'name': a.short_name, **{x.attribute_name: x.attribute_value
                              for x in a.attributes}}
    for a in self.assets
]
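As a self-contained check, here is a minimal runnable sketch; the dataclasses stand in for the real Asset/Attribute classes, which the question does not show, and the dict-union | operator needs Python 3.9+:
from dataclasses import dataclass, field

@dataclass
class Attribute:
    attribute_name: str
    attribute_value: float

@dataclass
class Asset:
    short_name: str
    attributes: list = field(default_factory=list)

assets = [Asset('Test', [Attribute('attr', 0.9)]),
          Asset('Test2', [Attribute('attr', 0.5)])]

a_list = [
    {'name': a.short_name} |
    {x.attribute_name: x.attribute_value for x in a.attributes}
    for a in assets
]
print(a_list)  # [{'name': 'Test', 'attr': 0.9}, {'name': 'Test2', 'attr': 0.5}]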

Python list json conversion to list

I have a list in the format below.
['111: {"id":"de80ca97","data":"test"}', '222: {"id":"8916a167","data":"touch"}', '333: {"id":"12966e98","data":"tap"}']
I need to drop the data field from each JSON string and use the numeric prefix of each list entry as a score key.
I need to transform it to the structure below.
Desired output:
[
    {
        "score": 111,
        "id": "de80ca97"
    },
    {
        "score": 222,
        "id": "8916a167"
    },
    {
        "score": 333,
        "id": "12966e98"
    }
]
Any suggestions or ideas most welcome.
You can use a for loop, or a list comprehension, as follows:
>>> import json
>>> l = ['111: {"id":"de80ca97","data":"test"}', '222: {"id":"8916a167","data":"touch"}', '333: {"id":"12966e98","data":"tap"}']
>>> [{'score': int(e.split()[0][:-1]), 'id': json.loads(e.split()[1])['id']} for e in l]
If you prefer to use a for loop:
new_l = []
for e in l:
    key, json_str = e.split()
    new_l.append({'score': int(key[:-1]), 'id': json.loads(json_str)['id']})
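One caveat (my own note): e.split() splits on any whitespace, so it would break if the embedded JSON itself contained spaces. Splitting only on the first ': ' is slightly more robust:
new_l = []
for e in l:
    key, json_str = e.split(': ', 1)  # split only on the first ': '
    new_l.append({'score': int(key), 'id': json.loads(json_str)['id']})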

How to get the full path of a key in a complex list of dictionaries

So I have a complex list with dictionaries and lists as values.
This is it:
list = [
    {"folder1": [
        {"file1": 5},
        {"folder3": [{"file2": 7},
                     {"file3": 10}]},
        {"file4": 9}
    ]},
    {"folder2": [
        {"folder4": []},
        {"folder5": [
            {"folder6": [{"file5": 17}]},
            {"file6": 6},
            {"file7": 5}
        ]},
        {"file8": 10}
    ]}
]
I need to extract the path for each file, like a directory tree as stored on an HDD:
Output sample:
output:
folder1/file1
folder1/file4
folder1/folder3/file2
folder1/folder3/file3
folder2/file8
folder2/folder4
folder2/folder5/file6
folder2/folder5/file7
folder2/folder5/folder6/file5
Please help, I have been struggling and could not find a way.
Thank you
You can use recursion with yield:
def get_paths(d, seen):
    for a, b in d.items():
        if not isinstance(b, list) or not b:
            yield '{}/{}'.format("/".join(seen), a)
        else:
            for c in b:
                for t in get_paths(c, seen + [a]):
                    yield t

# `data` here is the nested structure from the question
# (the variable called `list` there, renamed so it does not shadow the built-in)
print('\n'.join([i for b in data for i in get_paths(b, [])]))
Output:
folder1/file1
folder1/folder3/file2
folder1/folder3/file3
folder1/file4
folder2/folder4
folder2/folder5/folder6/file5
folder2/folder5/file6
folder2/folder5/file7
folder2/file8
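If you want the sorted ordering shown in the question's output sample rather than traversal order, you can sort the collected paths (my addition):
paths = sorted(i for b in data for i in get_paths(b, []))
print('\n'.join(paths))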
