Python list json conversion to list - python

I have a list in the format below.
['111: {"id":"de80ca97","data":"test"}', '222: {"id":"8916a167","data":"touch"}', '333: {"id":"12966e98","data":"tap"}']
I need to drop the "data" field from each JSON string and add the numeric prefix of each entry (for example 111) as a "score" field.
I need to transform it into the structure below.
Desired output:
[
{
"score":111,
"id":"de80ca97"
},
{
"score":222,
"id":"8916a167"
},
{
"score":333,
"id":"12966e98"
}
]
Any suggestions or ideas are most welcome.

You can use a for loop or you can also use a list comprehension as follows:
>>> import json
>>> l = ['111: {"id":"de80ca97","data":"test"}', '222: {"id":"8916a167","data":"touch"}', '333: {"id":"12966e98","data":"tap"}']
>>> [{'score': int(e.split()[0][:-1]), 'id': json.loads(e.split()[1])['id']} for e in l]
If you prefer to use a for loop:
new_l = []
for e in l:
    key, json_str = e.split()
    new_l.append({'score': int(key[:-1]), 'id': json.loads(json_str)['id']})
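Note that e.split() splits on every whitespace run, so the snippets above only work while the JSON part contains no spaces. A minimal sketch that splits at the first colon instead; the function name to_score_records and the second sample entry (with spaces inside the JSON) are made up for illustration:

```python
import json

def to_score_records(entries):
    """Convert 'score: {json}' strings into dicts with 'score' and 'id' keys."""
    records = []
    for entry in entries:
        # Split only at the first colon; the JSON part keeps its own colons and spaces.
        score_text, _, json_text = entry.partition(":")
        payload = json.loads(json_text)
        records.append({"score": int(score_text), "id": payload["id"]})
    return records

# Hypothetical input: the second entry has spaces inside its JSON part.
raw = [
    '111: {"id":"de80ca97","data":"test"}',
    '222: {"id": "8916a167", "data": "touch me"}',
]
print(json.dumps(to_score_records(raw), indent=2))
```

The "data" field is simply never copied, so nothing has to be deleted from the parsed dict.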

Related

how can I extract key value pairs from nested JSON with for loops

I have nested JSON in multilevel lists of dicts.
I am trying to extract key-value pairs. I can build separate lists of keys ('code_details') and of values ('url_details') with for loops, but I want to store the result as key:value pairs.
code_details = []
url_details = []
for item in all_content:
    code_details.append(item['code'])
    media_details = item['media']
    for i in media_details:
        resources_details = i['resources']
        for j in resources_details:
            url_details.append(j['url'])
How can I adjust the for loops to store key:value pairs in a dict such as {'code': 'url'}?
JSON example:
all_content[{"code": "0100410ZWA",
},
{"media": [
{
"containsExplicitContent": true,
"imageType": "Packshot",
"resources": [
{
"expirationDate": "2021-05-20T11:07:00Z",
"format": "ORIGINAL",
"url": "https://media.lingeriestyling.com/marie_jo_l'aventure-lingerie-padded_bra-tom-0120826-pink-0_L_35590.jpg"
}
]}]
code_details example:
['0502570SRE',
'0102649ALF',
'0602640ALF',
'0502572SRE',
'0102646ALF',
'0102570SRE',
'0502571SRE',
'0602570SRE',
'0502640ALF',
'0102640ALF',
'0102574SRE',
'0502642ALF',
'0102576SRE',
'0502641ALF',
'0663321AME',
'0163244AUT',
'0563240AUT',
'0663320AME',
url_details example:
['https://media.lingeriestyling.com/eservices/marie_jo-lingerie-briefs-danae-0502570-red-0_3558237.jpg',
'https://media.lingeriestyling.com/eservices/marie_jo-lingerie-briefs-danae-0502570-red-0_3560011.jpg',
'https://media.lingeriestyling.com/eservices/marie_jo-lingerie-briefs-danae-0502570-red-2_3560012.jpg',
'https://media.lingeriestyling.com/eservices/marie_jo-lingerie-briefs-danae-0502570-red-3_3560013.jpg',
'https://media.lingeriestyling.com/eservices/marie_jo-lingerie-briefs-danae-0502570-red-0_3558965.jpg',
'https://media.lingeriestyling.com/eservices/marie_jo-lingerie-briefs-danae-0502570-red-2_3558970.jpg',
'https://media.lingeriestyling.com/eservices/marie_jo-lingerie-briefs-danae-0502570-red-3_3558976.jpg',
'https://media.lingeriestyling.com/eservices/marie_jo-lingerie-balcony_bra-raia-0102649-multicolour-0_3558308.jpg',
You can create a dict and update it in each iteration. Note that the dict values will be lists.
code_details = {}
for item in all_content:
    media_details = item['media']
    # start a fresh URL list for every item
    url_details = []
    for i in media_details:
        resources_details = i['resources']
        for j in resources_details:
            url_details.append(j['url'])
    # map this item's code to the URLs collected for it
    code = item['code']
    code_details[code] = url_details
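The same mapping can also be written as a dict comprehension. This is only a sketch: the sample all_content below is invented (placeholder codes and example.com URLs), and it assumes every item carries both a 'code' and a 'media' key, as the loops above do.

```python
# Hypothetical sample data shaped like the structure the loops above expect.
all_content = [
    {
        "code": "A1",
        "media": [
            {"resources": [
                {"url": "https://example.com/a1-0.jpg"},
                {"url": "https://example.com/a1-1.jpg"},
            ]},
        ],
    },
    {
        "code": "B2",
        "media": [
            {"resources": [{"url": "https://example.com/b2-0.jpg"}]},
            {"resources": []},
        ],
    },
]

# One entry per item: code -> flat list of every URL under media -> resources.
code_details = {
    item["code"]: [
        resource["url"]
        for media in item["media"]
        for resource in media["resources"]
    ]
    for item in all_content
}
print(code_details)
```

If two items share the same code, the later one overwrites the earlier one, exactly as in the loop version.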

How to search an object list for a specific attribute value that exists in another list of objects

I have two lists
list1 = [obj1, obj2, ... objn] # len(list1) == N
list2 = [obj1, obj2, ... objm] # len(list2) == M
here's a json representation of obj:
obj = {
"a1": 0,
"a2": 1,
"a3": 2
}
How would I determine the objects from list2 with the same value for obj["a1"] as those in list1? Note it's possible to have multiple occurrences of this. The objects in both lists are formatted the same.
I am only interested in seeing if the value for a certain object attribute from one list can be found in another
For example
list1 = [
{
"a1":0,
"a2":5,
"a3":4
},
{
"a1":2,
"a2":3,
"a3":1
}
...
]
list2 = [
# first object
{
"a1":0,
"a2":3,
"a3":1
},
# second object
{
"a1":3,
"a2":1,
"a3":0
}
...
]
In this case, the first object in list2 contains the same attribute value for obj["a1"] as list1
Using pandas, you can try this:
list1 = [
{
"a1":0,
"a2":5,
"a3":4
},
{
"a1":2,
"a2":3,
"a3":1
}
]
list2 = [
# first object
{
"a1":0,
"a2":3,
"a3":1
},
# second object
{
"a1":3,
"a2":1,
"a3":0
}
]
import pandas as pd
df1 = pd.DataFrame(list1)
df2 = pd.DataFrame(list2)
# keep the rows of list2 whose "a1" value also occurs in list1
a1 = df2[df2['a1'].isin(df1['a1'])]
print(a1.to_json(orient='records', lines=True))
Take a look at pandas: you can easily convert the lists to DataFrames, and from there what you need is pretty straightforward.
Index both DataFrames by "a1", then take the intersection of the two indexes.
Try this (I have not run the code, but it should work):
import pandas as pd
df1 = pd.DataFrame(list1)
df2 = pd.DataFrame(list2)
df1.set_index("a1",inplace=True)
df2.set_index("a1",inplace=True)
df1.index.intersection(df2.index)
This should give you the "a1" values that appear in both lists.
The easiest solution would be to simply go through a double loop:
for obj1 in list1:
    for key in obj1:
        for obj2 in list2:
            if obj1[key] == obj2[key]:
                pass  # do what you want with the match
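If only one attribute such as "a1" matters, a set lookup avoids the nested loops. A rough sketch, assuming the attribute values are hashable; the function name matching_on is made up for illustration:

```python
def matching_on(list1, list2, key="a1"):
    """Return the objects of list2 whose value for `key` also occurs in list1."""
    # Collect the values of `key` seen in list1 (assumes they are hashable).
    seen = {obj[key] for obj in list1 if key in obj}
    # Keep the list2 objects that have `key` and whose value is in that set.
    return [obj for obj in list2 if key in obj and obj[key] in seen]

list1 = [{"a1": 0, "a2": 5, "a3": 4}, {"a1": 2, "a2": 3, "a3": 1}]
list2 = [{"a1": 0, "a2": 3, "a3": 1}, {"a1": 3, "a2": 1, "a3": 0}]
print(matching_on(list1, list2))
```

Building the set costs one pass over list1, after which each list2 object is checked in constant time on average.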
You could look into different libraries and may find an answer there. However, you can achieve the same result with plain dictionaries. Let me know if you have any issues with this method.
def get_groups(list1, list2):
    # assuming the objs in lists 1 & 2 are dictionaries
    # we store entries in known_data as
    # (category, value) : [obj_references]
    # e.g. known_data = {('a1', 0) : [obj1, obj23, obj3]}
    known_data = {}
    for obj in list1:
        for category, value in obj.items():
            key = (category, value)
            known_data.setdefault(key, []).append(obj)
    # now we can iterate over list2 and check whether it shares any keys
    # in groups we store each common key (category, value)
    # and map it to a 2D array [[list1 objs], [list2 objs]]
    groups = {}
    for obj in list2:
        for category, value in obj.items():
            key = (category, value)
            if key not in known_data:
                continue
            # slot 0 holds the list1 matches, slot 1 collects the list2 matches
            entry = groups.setdefault(key, [known_data[key], []])
            entry[1].append(obj)
    return groups

nested dicts and nested lists

I have nested lists and dictionaries inside a list.
I am confused about how to access 'Product_Name' inside the nested dict.
list_1 = [{"group_details":[{"data":[{"product_details":[{"Product":"xyz","Invoice_No":"852","Product_Name":"abc"}]}]}]}]
To retrieve the value, index into each layer in turn: give the key for every dict and the position (here always [0]) for every list:
list_1 = [
    {
        "group_details": [
            {
                "data": [
                    {
                        "product_details": [
                            {
                                "Product": "xyz",
                                "Invoice_No": "852",
                                "Product_Name": "abc"
                            }
                        ]
                    }
                ]
            }
        ]
    }
]
Product_Name = list_1[0]["group_details"][0]["data"][0]["product_details"][0]["Product_Name"]
print(Product_Name)
Result:
abc
Additional request to find via looping:
for containers in list_1:
    for group_details in containers["group_details"]:
        for data in group_details["data"]:
            for product_details in data["product_details"]:
                print(product_details["Product_Name"])
Result:
abc
To parse the structure, indent it:
list_1 = [
    {"group_details": [
        {"data": [
            {"product_details": [
                {"Product": "xyz", "Invoice_No": "852", "Product_Name": "abc"}]}]}]}]
print(list_1[0]["group_details"][0]["data"][0]["product_details"][0]["Product_Name"])
# abc
list_1 = [{"group_details":[{"data":[{"product_details":[{"Product":"xyz","Invoice_No":"852","Product_Name":"abc"}]}]}]}]
print(list_1[0]["group_details"][0]["data"][0]["product_details"][0]["Product_Name"])
RESULT:
abc
To do this iteratively:
for i in list_1:
    for j in i["group_details"]:
        for k in j["data"]:
            for l in k["product_details"]:
                for kk, vv in l.items():
                    if kk == "Product_Name":
                        print(vv)
You can use the following nested for loop:
list_1 = [{"group_details":[{"data":[{"product_details":[{"Product":"xyz","Invoice_No":"852","Product_Name":"abc"}] }]}]}]
for item in list_1:
    for group_details in item.get('group_details'):
        for data in group_details.get('data'):
            for product_details in data.get('product_details'):
                print(product_details.get('Product_Name'))
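When the nesting depth is not known in advance, a small recursive generator can search every layer instead of hard-coding one loop per level. This is a sketch only; find_values is a made-up helper name, and it assumes the structure consists solely of dicts and lists:

```python
def find_values(node, target_key):
    """Yield every value stored under target_key, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == target_key:
                yield value
            # Keep descending in case the key also occurs deeper down.
            yield from find_values(value, target_key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, target_key)
    # Strings, numbers, etc. contain nothing to search, so they are skipped.

list_1 = [{"group_details": [{"data": [{"product_details": [
    {"Product": "xyz", "Invoice_No": "852", "Product_Name": "abc"}]}]}]}]
print(list(find_values(list_1, "Product_Name")))
```

Because it is a generator, next(find_values(list_1, "Product_Name"), None) returns just the first match without walking the rest of the structure.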

Structure JSON format to a specified data structure

Basically I have a list
data_list = [
'__att_names' : [
['id', 'name'], --> "__t_idx": 0
['location', 'address'] --> "__t_idx": 1
['random_key1', 'random_key2'] "__t_idx": 2
['random_key3', 'random_key4'] "__t_idx": 3
]
"__root": {
"comparables": [
"__g_id": "153564396",
"__atts": [
1, --> This would be technically __att_names[0][1]
'somerandomname',--> This would be technically __att_names[0][2]
{
"__atts": [
'location_value', --> This would be technically __att_names[1][1]
'address_value',--> This would be technically __att_names[1][2]
"__atts": [
]
"__t_idx": 1 --> It can keep getting nested.. further and further.
]
"__t_idx": 1
}
{
"__atts": [
'random_key3value'
'random_key3value'
]
"__t_idx": 3
}
{
"__atts": [
'random_key1value'
'random_key2value'
]
"__t_idx": 2
}
],
"__t_idx": 0 ---> This maps to the first item in __att_names
]
}
]
My desired output in this case would be
[
{
'id': 1,
'name': 'somerandomname',
'location': 'address_value',
'random_key1': 'random_key1value',
'random_key2': 'random_key2value',
'random_key3': 'random_key3value',
'random_key4': 'random_key4value',
}
]
I was able to get it working for the first few nested fields of __att_names, but my code became long and repetitive once I handled deeper nesting.
I feel there is a neater, recursive way to solve this.
This is my current approach. As of now, the following code only takes care of the very first nested object:
def recursive_function(index, attributes, payload_names, output):
    category_location = payload_names[index]
    for index, categories in enumerate(category_location):
        output[categories] = attributes[index]
        if isinstance(attributes[index], dict):
            has_nested_index = attributes[index].get('__t_idx')
            has_nested_attributes = attributes[index].get('__atts')
            if has_nested_attributes and has_nested_index:
                recursive_function(has_nested_index, has_nested_attributes, payload_names, output)
        else:
            continue

payload_names = data_list['__att_names']
comparable_data = data_list['__root']['comparables']
output_arr = []
for items in comparable_data[:1]:
    output = {}
    index_number = items.get('__t_idx')
    attributes = items.get('__atts')
    if attributes:
        recursive_function(index_number, attributes, payload_names, output)
    output_arr.append(output)
To further explain given example:
[ {
'id': 1,
'name': 'somerandomname',
'location': 'address_value',
'random_key1': 'random_key1value',
'random_key2': 'random_key2value',
'random_key3': 'random_key3value',
'random_key4': 'random_key4value',
}
]
Specifically, for 'location': 'address_value': the value 'address_value' comes from the comparables array, which holds dictionaries with the keys __g_id, __atts and __t_idx. Some of them might not have __g_id, but whenever there is an __atts key there is also a __t_idx, which maps to the index of the matching array in __att_names.
Overall:
__att_names holds all the different key names,
and the items within comparables -> __atts are the values for those key names.
__t_idx maps the __atts array items to __att_names so that a key-value dictionary can be built as the outcome.
If you want to restructure a complex JSON object, my recommendation is to use jq.
Python package
Official website
The data you present is really confusing and obfuscated, so I'm not sure what exact filtering your case would require. But as far as I understand, your problem involves indefinitely nested data. So instead of a recursive function, you could write a loop that unnests the data into the flat structure you want. There's already a question on that topic.
You can traverse the structure while tracking the __t_idx key values that correspond to list elements that are not dictionaries:
data_list = {'__att_names': [['id', 'name'], ['location', 'address'], ['random_key1', 'random_key2'], ['random_key3', 'random_key4']], '__root': {'comparables': [{'__g_id': '153564396', '__atts': [1, 'somerandomname', {'__atts': ['location_value', 'address_value', {'__atts': [], '__t_idx': 1}], '__t_idx': 1}, {'__atts': ['random_key3value', 'random_key4value'], '__t_idx': 3}, {'__atts': ['random_key1value', 'random_key2value'], '__t_idx': 2}], '__t_idx': 0}]}}
def get_vals(d, f = False, t_idx = None):
    if isinstance(d, dict) and '__atts' in d:
        yield from [i for a, b in d.items() for i in get_vals(b, t_idx = d.get('__t_idx'))]
    elif isinstance(d, list):
        yield from [i for b in d for i in get_vals(b, f = True, t_idx = t_idx)]
    elif f and t_idx is not None:
        yield (d, t_idx)
result = []
for i in data_list['__root']['comparables']:
    new_d = {}
    for a, b in get_vals(i):
        new_d[b] = iter([*new_d.get(b, []), a])
    result.append({j:next(new_d[i]) for i, a in enumerate(data_list['__att_names']) for j in a})
print(result)
Output:
[
{'id': 1,
'name': 'somerandomname',
'location': 'location_value',
'address': 'address_value',
'random_key1': 'random_key1value',
'random_key2': 'random_key2value',
'random_key3': 'random_key3value',
'random_key4': 'random_key4value'
}
]
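For comparison, here is a shorter recursive sketch of the same idea. It rests on assumptions that go beyond what the question states: every node with __atts also has __t_idx, and the non-dict values of a node line up positionally with the names in __att_names[__t_idx]. The helper name flatten_node and the trimmed sample data are made up for illustration.

```python
def flatten_node(node, att_names, out=None):
    """Merge the plain values of a node and all nested nodes into one flat dict."""
    if out is None:
        out = {}
    names = att_names[node["__t_idx"]]
    # Assumption: the non-dict values pair up, in order, with the names for this __t_idx.
    plain_values = [v for v in node["__atts"] if not isinstance(v, dict)]
    out.update(zip(names, plain_values))
    # Nested dicts carry their own __t_idx, so recurse into each of them.
    for value in node["__atts"]:
        if isinstance(value, dict) and "__atts" in value:
            flatten_node(value, att_names, out)
    return out

# Trimmed, hypothetical sample in the shape used by the answer above.
data = {
    "__att_names": [["id", "name"], ["location", "address"]],
    "__root": {"comparables": [{
        "__g_id": "153564396",
        "__t_idx": 0,
        "__atts": [1, "somerandomname",
                   {"__t_idx": 1, "__atts": ["location_value", "address_value"]}],
    }]},
}
result = [flatten_node(c, data["__att_names"]) for c in data["__root"]["comparables"]]
print(result)
```

If two nodes share the same __t_idx, the later one overwrites the keys written by the earlier one, so this only fits data where each index appears once per comparable.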

How to combine every nth dict element in python list?

Input:
list1 = [
{
"dict_a":"dict_a_values"
},
{
"dict_b":"dict_b_values"
},
{
"dict_c":"dict_c_values"
},
{
"dict_d":"dict_d_values"
}
]
Assuming n=2, every two elements have to be combined together.
Output:
list1 = [
{
"dict_a":"dict_a_values",
"dict_c":"dict_c_values"
},
{
"dict_b":"dict_b_values",
"dict_d":"dict_d_values"
}
]
Ideally, it'd be nicer if the output could look like something as follows with an extra layer of nesting:
[
{"dict_combined_ac": {
"dict_a":"dict_a_values",
"dict_c":"dict_c_values"
}},
{"dict_combined_bd": {
"dict_b":"dict_b_values",
"dict_d":"dict_d_values"
}}
]
But since this is really difficult to implement, I'd be more than satisfied with an output looking something similar to the first example. Thanks in advance!
What I've tried so far:
[ ''.join(x) for x in zip(list1[0::2], list1[1::2]) ]
However, I know this doesn't work because I'm working with dict elements and not str elements, and when I wrap the lists with str(), every two letters are combined instead. I'm also unsure how to adjust this to work for every n elements instead of just 2.
Given the original list, as in the question, the following should generate the required output:
result_list = list()
n = 2  # step between the elements that get combined (every n-th element goes into the same dict)
seen_idx = set()
for i in range(len(list1)):  # iterate over all indices
    if i not in seen_idx:
        curr_idx_list = list()  # current partition
        for j in range(i, len(list1), n):  # generate indices for a combination partition
            seen_idx.add(j)  # keep record of seen indices
            curr_idx_list.append(j)  # store indices for current partition
        # At this point we have indices of a partition, now combine
        temp_dict = dict()  # temporary dictionary where we store combined values
        for j in curr_idx_list:  # iterate over indices of current partition
            temp_dict.update(list1[j])
        result_list.append(temp_dict)  # add to result list
print(result_list, '\n')
# Bonus: change result list into list of nested dictionaries
new_res_list = list()
for elem in result_list:  # for each (combined) dictionary in the list, we make new keys
    key_names = list(elem.keys())
    key_names = [e.split('_')[1] for e in key_names]
    new_key = 'dict_combined_' + ''.join(key_names)
    temp_dict = {new_key: elem}
    new_res_list.append(temp_dict)
print(new_res_list, '\n')
The output is as follows:
[{'dict_a': 'dict_a_values', 'dict_c': 'dict_c_values'}, {'dict_b': 'dict_b_values', 'dict_d': 'dict_d_values'}]
[{'dict_combined_ac': {'dict_a': 'dict_a_values', 'dict_c': 'dict_c_values'}}, {'dict_combined_bd': {'dict_b': 'dict_b_values', 'dict_d': 'dict_d_values'}}]
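The same grouping can be sketched more compactly with extended slicing, since list1[start::n] already picks every n-th element beginning at start. The function names combine_every_nth and wrap_with_names are made up for illustration, and the wrapper assumes every key looks like dict_<suffix>:

```python
def combine_every_nth(dicts, n):
    """Merge dicts[0::n], dicts[1::n], ... into one dict per starting offset."""
    combined = []
    for start in range(min(n, len(dicts))):
        merged = {}
        for d in dicts[start::n]:  # every n-th element, beginning at `start`
            merged.update(d)
        combined.append(merged)
    return combined

def wrap_with_names(combined):
    """Nest each merged dict under a key built from its key suffixes."""
    wrapped = []
    for merged in combined:
        # Assumption: keys look like 'dict_<suffix>'; keep only the suffix part.
        suffix = "".join(key.split("_", 1)[1] for key in merged)
        wrapped.append({"dict_combined_" + suffix: merged})
    return wrapped

list1 = [
    {"dict_a": "dict_a_values"},
    {"dict_b": "dict_b_values"},
    {"dict_c": "dict_c_values"},
    {"dict_d": "dict_d_values"},
]
result_list = combine_every_nth(list1, 2)
print(result_list)
print(wrap_with_names(result_list))
```

With n=2 this yields two merged dicts; in general there are min(n, len(list1)) of them, one per starting offset.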
