I have JSON data like this:
input_list = [["Richard",[],{"children":"yes","divorced":"no","occupation":"analyst"}],
["Mary",["testing"],{"children":"no","divorced":"yes","occupation":"QA analyst","location":"Seattle"}]]
I have another list that holds the corresponding keys:
list_keys = ['name', 'current_project', 'details']
I am trying to combine the two into dicts so that the data is usable for metrics.
I have simplified both lists for this question, but the real data goes on and on: input_list is a nested list with 500k+ elements, and each element has 70+ items of its own (except the details one).
list_keys also has 70+ elements in it.
I tried creating a dict with zip, but that is not helping given the size of the data, and with zip I am not able to exclude the "details" element from the result.
I am expecting output like this:
[
{
"name": "Richard",
"current_project": "",
"children": "yes",
"divorced": "no",
"occupation": "analyst"
},
{
"name": "Mary",
"current_project" :"testing",
"children": "no",
"divorced": "yes",
"occupation": "QA analyst",
"location": "Seattle"
}
]
I have tried this so far:
>>> for line in input_list:
...     zipbObj = zip(list_keys, line)
...     dictOfWords = dict(zipbObj)
...
>>> print(dictOfWords)
{'current_project': ['testing'], 'name': 'Mary', 'details': {'location': 'Seattle', 'children': 'no', 'divorced': 'yes', 'occupation': 'QA analyst'}}
but with this I am unable to get rid of the nested dict key "details", so I am looking for help with that.
It seems like what you want is a list of dictionaries. Here is something I coded up in the terminal and copied in here; I hope it helps.
>>> list_of_dicts = []
>>> for item in input_list:
...     record = {}  # avoid naming this "dict", which shadows the built-in
...     for i in range(0, len(item)-2, 3):
...         record[list_keys[0]] = item[i]
...         record[list_keys[1]] = item[i+1]
...         record.update(item[i+2])
...     list_of_dicts.append(record)
...
>>> list_of_dicts
[{'name': 'Richard', 'current_project': [], 'children': 'yes', 'divorced': 'no', 'occupation': 'analyst'}, {'name': 'Mary', 'current_project': ['testing'], 'children': 'no', 'divorced': 'yes', 'occupation': 'QA analyst', 'location': 'Seattle'}]
I should mention that this is not the ideal method, since it relies on the items in input_list being perfectly ordered.
people = input_list = [["Richard",[],{"children":"yes","divorced":"no","occupation":"analyst"}],
["Mary",["testing"],{"children":"no","divorced":"yes","occupation":"QA analyst","location":"Seattle"}]]
list_keys = ['name', 'current_project', 'details']
listout = []
for person in people:
    dict_p = {}
    for key in list_keys:
        if not key == 'details':
            dict_p[key] = person[list_keys.index(key)]
        else:
            # merge the nested details dict into the top level
            subdict = person[list_keys.index(key)]
            for subkey in subdict.keys():
                dict_p[subkey] = subdict[subkey]
    listout.append(dict_p)
listout
The issue with plain zip is the extra nested dictionary in each entry of people. This code produces the following output and should also work for a larger list of individuals:
[{'name': 'Richard',
'current_project': [],
'children': 'yes',
'divorced': 'no',
'occupation': 'analyst'},
{'name': 'Mary',
'current_project': ['testing'],
'children': 'no',
'divorced': 'yes',
'occupation': 'QA analyst',
'location': 'Seattle'}]
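Building on the zip idea, here is a minimal sketch (the helper name `flatten_rows` is hypothetical, and it assumes the nested dict always sits under the "details" key in list_keys) that zips each row, pops the nested dict and merges its keys into the top level. Because it is a generator, you can consume the 500k+ rows one at a time instead of building every dict up front. Note that current_project is left as a list here, unlike the expected output in the question.

```python
from typing import Any, Dict, Iterable, Iterator, List


def flatten_rows(rows: Iterable[List[Any]], keys: List[str],
                 nested_key: str = "details") -> Iterator[Dict[str, Any]]:
    """Yield one flat dict per row, merging the nested dict into the top level."""
    for row in rows:
        record = dict(zip(keys, row))          # pair each key with its positional value
        nested = record.pop(nested_key, None)  # remove the nested dict, if present
        if isinstance(nested, dict):
            record.update(nested)              # lift its keys to the top level
        yield record


input_list = [["Richard", [], {"children": "yes", "divorced": "no", "occupation": "analyst"}],
              ["Mary", ["testing"], {"children": "no", "divorced": "yes",
                                     "occupation": "QA analyst", "location": "Seattle"}]]
list_keys = ['name', 'current_project', 'details']

records = list(flatten_rows(input_list, list_keys))
print(records[1]["location"])  # Seattle
```

Wrap the generator in list(...) only if you really need every record in memory at once; otherwise iterate over it directly.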
This script goes through every item of input_list and creates a new list of flat dicts, with no nested lists or dictionaries (note that only the first element of a non-empty list is kept):
input_list = [
["Richard",[],{"children":"yes","divorced":"no","occupation":"analyst"}],
["Mary",["testing"],{"children":"no","divorced":"yes","occupation":"QA analyst","location":"Seattle"}]
]
list_keys = ['name', 'current_project', 'details']
out = []
for item in input_list:
    d = {}
    out.append(d)
    for value, keyname in zip(item, list_keys):
        if isinstance(value, dict):
            d.update(value)  # merge the nested dict into the top level
        elif isinstance(value, list):
            # keep the first element, or '' for an empty list
            if value:
                d[keyname] = value[0]
            else:
                d[keyname] = ''
        else:
            d[keyname] = value
from pprint import pprint
pprint(out)
Prints:
[{'children': 'yes',
'current_project': '',
'divorced': 'no',
'name': 'Richard',
'occupation': 'analyst'},
{'children': 'no',
'current_project': 'testing',
'divorced': 'yes',
'location': 'Seattle',
'name': 'Mary',
'occupation': 'QA analyst'}]
Related
I have a dictionary in which some values are lists. I need to convert each list into a list of dictionaries and insert it in place of the original list.
Basically, I have this dictionary
Dic = {
'name': 'P1',
'srcintf': 'IntA',
'dstintf': 'IntB',
'srcaddr': 'IP1',
'dstaddr': ['IP2', 'IP3', 'IP4'],
'service': ['P_9100', 'SNMP'],
'schedule' : 'always',
}
I need to replace the values that are lists.
Expected output:
Dic = {
'name': 'P1',
'srcintf': 'IntA',
'dstintf': 'IntB',
'srcaddr': 'IP1',
'dstaddr': [
{'name': 'IP2'},
{'name': 'IP3'},
{'name': 'IP4'}
],
'service': [
{'name': 'P_9100'},
{'name': 'SNMP'}
],
'schedule' : 'always',
}
So far I have come up with this code:
for k,v in Dic.items():
    if not isinstance(v, list):
        NewDic = [k,v]
        print(NewDic)
    else:
        values = v
        keys = ["name"]*len(values)
        for item in range(len(values)):
            key = keys[item]
            value = values[item]
            SmallDic = {key : value}
            liste.append(SmallDic)
        NewDic = [k,liste]
which prints this:
['name', 'P1']
['srcintf', 'IntA']
['dstintf', 'IntB']
['srcaddr', 'IP1']
['schedule', 'always']
['schedule', 'always']
I think the problem is with the for loop, but so far I haven't been able to figure it out.
You need to re-create the dictionary. With some modifications to your existing code so that it generates a new dictionary & fixing the else clause:
NewDic = {}
for k, v in Dic.items():
    if not isinstance(v, list):
        NewDic[k] = v
    else:
        NewDic[k] = [
            {"name": e} for e in v  # loop through the list values & generate a dict for each
        ]
print(NewDic)
Result:
{'name': 'P1', 'srcintf': 'IntA', 'dstintf': 'IntB', 'srcaddr': 'IP1', 'dstaddr': [{'name': 'IP2'}, {'name': 'IP3'}, {'name': 'IP4'}], 'service': [{'name': 'P_9100'}, {'name': 'SNMP'}], 'schedule': 'always'}
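The same transformation can also be written as a single dict comprehension. This is a sketch equivalent to the loop above; the name `new_dic` is just illustrative.

```python
Dic = {
    'name': 'P1',
    'srcintf': 'IntA',
    'dstintf': 'IntB',
    'srcaddr': 'IP1',
    'dstaddr': ['IP2', 'IP3', 'IP4'],
    'service': ['P_9100', 'SNMP'],
    'schedule': 'always',
}

# Wrap each list element in {'name': element}; leave scalar values untouched.
new_dic = {
    k: [{'name': e} for e in v] if isinstance(v, list) else v
    for k, v in Dic.items()
}
print(new_dic['service'])  # [{'name': 'P_9100'}, {'name': 'SNMP'}]
```

Key order is preserved because the comprehension iterates over Dic.items() in insertion order.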
I have dictionary like that:
dic={'61': {'NAME': 'John', 'LASTNAME': 'X', 'EMAIL': 'X#example.com', 'GRADE': '99'}, '52': {'NAME': 'Jennifer', 'LASTNAME': 'Y', 'EMAIL': 'Y#example.com', 'GRADE': '98'}}
import json
obj = json.dumps(dic, indent=3)
print(obj)
I want to create JSON containing only some of the values, like this:
{
"NAME": "John",
"LASTNAME": "X",
,
"NAME": "Jennifer",
"LASTNAME": "Y"
}
Any idea for help?
If I understand correctly you want to keep the values of your original data without the indices and also filter out some of them (keep only "NAME" and "LASTNAME"). You can do so by using a combination of dictionary and list comprehensions:
array = [{k: v for k, v in d.items() if k in ("NAME", "LASTNAME")} for d in dic.values()]
This creates the following output:
>>> array
[{'NAME': 'John', 'LASTNAME': 'X'}, {'NAME': 'Jennifer', 'LASTNAME': 'Y'}]
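Since the question asks for JSON, the filtered list can then be passed to json.dumps. A sketch under the assumption that a JSON array of objects is acceptable: the desired output in the question is not valid JSON (note the stray comma), and a single object cannot usefully hold two "NAME" keys.

```python
import json

dic = {'61': {'NAME': 'John', 'LASTNAME': 'X', 'EMAIL': 'X#example.com', 'GRADE': '99'},
       '52': {'NAME': 'Jennifer', 'LASTNAME': 'Y', 'EMAIL': 'Y#example.com', 'GRADE': '98'}}

wanted = ("NAME", "LASTNAME")  # keys to keep in each record
array = [{k: v for k, v in d.items() if k in wanted} for d in dic.values()]

obj = json.dumps(array, indent=3)  # serialize as a JSON array of objects
print(obj)
```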
a =[{
"id":"1",
"Name":'BK',
"Age":'56'
},
{
"id":"1",
"Sex":'Male'
},
{
"id":"2",
"Name":"AK",
"Age":"32"
}]
I have a list of dictionaries in which one person's information is split across multiple dictionaries; for example, the information for id 1 is contained in the first two dictionaries above. How can I get the following output?
{1: {'Name':'BK','Age':'56','Sex':'Male'}, 2: { 'Name': 'AK','Age':'32'}}
You can use a defaultdict to collect the results.
from collections import defaultdict
a =[{ "id":"1", "Name":'BK', "Age":'56' }, { "id":"1", "Sex":'Male' }, { "id":"2", "Name":"AK", "Age":"32" }]
results = defaultdict(dict)
for a_dict in a:
    results[a_dict.pop('id')].update(a_dict)
This gives you:
>>> results
defaultdict(<class 'dict'>, {'1': {'Name': 'BK', 'Age': '56', 'Sex': 'Male'}, '2': {'Name': 'AK', 'Age': '32'}})
The defaultdict type behaves like a normal dict, except that when you access a missing key, it calls the default factory (here dict), stores the result under that key and returns it. This means that as the dicts in a are iterated over, their values (except for id) are merged into either an existing dict or a newly created empty one.
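One caveat: pop('id') removes the key from the dicts in a itself. A sketch that leaves a untouched by copying every field except id, and that also converts the ids with int() to match the 1 and 2 keys in the question's expected output (this assumes the ids are always numeric strings):

```python
from collections import defaultdict

a = [{"id": "1", "Name": 'BK', "Age": '56'},
     {"id": "1", "Sex": 'Male'},
     {"id": "2", "Name": "AK", "Age": "32"}]

merged = defaultdict(dict)
for row in a:
    # Copy everything except 'id' so the dicts in `a` are left unchanged.
    fields = {k: v for k, v in row.items() if k != "id"}
    merged[int(row["id"])].update(fields)  # int() to match the question's 1, 2 keys

result = dict(merged)  # convert back to a plain dict
print(result)  # {1: {'Name': 'BK', 'Age': '56', 'Sex': 'Male'}, 2: {'Name': 'AK', 'Age': '32'}}
```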
How does collections.defaultdict work?
Using defaultdict
from collections import defaultdict
a = [{
"id": "1",
"Name": 'BK',
"Age": '56'
},
{
"id": "1",
"Sex": 'Male'
},
{
"id": "2",
"Name": "AK",
"Age": "32"
}
]
final_ = defaultdict(dict)
for row in a:
    final_[row.pop('id')].update(row)
print(final_)
defaultdict(<class 'dict'>, {'1': {'Name': 'BK', 'Age': '56', 'Sex': 'Male'}, '2': {'Name': 'AK', 'Age': '32'}})
You can combine two dictionaries with the .update() method:
dict_a = { "id":"1", "Name":'BK', "Age":'56' }
dict_b = { "id":"1", "Sex":'Male' }
dict_a.update(dict_b) # dict_a is now {'id': '1', 'Name': 'BK', 'Age': '56', 'Sex': 'Male'}
Since the output you want is in dictionary form:
combined_dict = {}
for item in a:
    item_id = item.pop("id")  # pop() removes the id key from item and returns its value
    if item_id in combined_dict:
        combined_dict[item_id].update(item)
    else:
        combined_dict[item_id] = item
print(combined_dict) # {'1': {'Name': 'BK', 'Age': '56', 'Sex': 'Male'}, '2': {'Name': 'AK', 'Age': '32'}}
from collections import defaultdict
result = defaultdict(dict)
a =[{ "id":"1", "Name":'BK', "Age":'56' }, { "id":"1", "Sex":'Male' }, { "id":"2", "Name":"AK", "Age":"32" }]
for b in a:
    result[b['id']].update(b)  # note: 'id' is also kept inside each merged dict
print(result)
d = {}
for p in a:
    id = p["id"]
    if id not in d:
        d[id] = p
    else:
        d[id] = {**d[id], **p}
d is the result dictionary you want.
In the for loop, if you encounter an id for the first time, you just store the incomplete value.
If the id is already among the keys, you update it.
The combination happens in {**d[id], **p}, where ** unpacks a dict.
It unpacks the existing incomplete dict associated with the id and the current dict, then combines them into a new dict.
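A small sketch of how {**a, **b} resolves overlapping keys, using hypothetical values: keys from the right-hand dict win, and neither input is modified.

```python
old = {'Name': 'BK', 'Age': '56'}
new = {'Age': '57', 'Sex': 'Male'}

merged = {**old, **new}  # keys from `new` win when both dicts have them
print(merged)  # {'Name': 'BK', 'Age': '57', 'Sex': 'Male'}
```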
I have a list of dictionaries, themselves with nested lists of dictionaries. All of the nest levels have a similar structure, thankfully. I desire to sort these nested lists of dictionaries. I grasp the technique to sort a list of dictionaries by value. I'm struggling with the recursion that will sort the inner lists.
def reorder(l, sort_by):
    # I have been trying to add a recursion here
    # so that the function calls itself for each
    # nested group of "children". So far, fail
    return sorted(l, key=lambda k: k[sort_by])
l = [
{ 'name': 'steve',
'children': [
{ 'name': 'sam',
'children': [
{'name': 'sally'},
{'name': 'sabrina'}
]
},
{'name': 'sydney'},
{'name': 'sal'}
]
},
{ 'name': 'fred',
'children': [
{'name': 'fritz'},
{'name': 'frank'}
]
}
]
print(reorder(l, 'name'))
def reorder(l, sort_by):
    l = sorted(l, key=lambda x: x[sort_by])
    for item in l:
        if "children" in item:
            # recurse into the nested list and replace it with its sorted version
            item["children"] = reorder(item["children"], sort_by)
    return l
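Note that the answer above modifies the caller's dicts: sorted() returns a new list, but the dicts in it are the originals, so item["children"] is reassigned on them. A sketch that leaves l untouched by shallow-copying each dict (the child_key parameter is an added assumption, defaulting to "children"):

```python
def reorder(items, sort_by, child_key="children"):
    """Return a sorted copy of `items`, recursively sorting each child list."""
    result = []
    for item in sorted(items, key=lambda x: x[sort_by]):
        item = dict(item)  # shallow copy so the caller's dicts stay untouched
        if child_key in item:
            item[child_key] = reorder(item[child_key], sort_by, child_key)
        result.append(item)
    return result


l = [
    {'name': 'steve',
     'children': [
         {'name': 'sam',
          'children': [{'name': 'sally'}, {'name': 'sabrina'}]},
         {'name': 'sydney'},
         {'name': 'sal'},
     ]},
    {'name': 'fred',
     'children': [{'name': 'fritz'}, {'name': 'frank'}]},
]

r = reorder(l, 'name')
print([p['name'] for p in r])  # ['fred', 'steve']
```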
Since you state "I grasp the technique to sort a list of dictionaries by value" I will post some code for recursively gathering data from another SO post I made, and leave it to you to implement your sorting technique. The code:
myjson = {
'transportation': 'car',
'address': {
'driveway': 'yes',
'home_address': {
'state': 'TX',
'city': 'Houston'}
},
'work_address': {
'state': 'TX',
'city': 'Sugarland',
'location': 'office-tower',
'salary': 30000}
}
def get_keys(some_dictionary, parent=None):
    for key, value in some_dictionary.items():
        if '{}.{}'.format(parent, key) not in my_list:
            my_list.append('{}.{}'.format(parent, key))
        if isinstance(value, dict):
            get_keys(value, parent='{}.{}'.format(parent, key))
        else:
            pass
my_list = []
get_keys(myjson, parent='myjson')
print(my_list)
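It is intended to retrieve all keys recursively from the JSON data. It outputs the following (this ordering comes from an older Python; on Python 3.7+, where dicts keep insertion order, the keys appear in the order they are defined):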
['myjson.address',
'myjson.address.home_address',
'myjson.address.home_address.state',
'myjson.address.home_address.city',
'myjson.address.driveway',
'myjson.transportation',
'myjson.work_address',
'myjson.work_address.state',
'myjson.work_address.salary',
'myjson.work_address.location',
'myjson.work_address.city']
The main thing to note is that if isinstance(value, dict): results in get_keys() being called again, hence the recursive capabilities of it (but only for nested dictionaries in this case).
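A variant of the same recursion, sketched as a generator so that it needs no global my_list (the name `iter_key_paths` is hypothetical). Since every dotted path in a dict tree is unique, the duplicate check can also be dropped:

```python
def iter_key_paths(mapping, parent):
    """Yield dotted key paths for every key in a nested dict, depth-first."""
    for key, value in mapping.items():
        path = f"{parent}.{key}"
        yield path
        if isinstance(value, dict):  # recurse only into nested dicts
            yield from iter_key_paths(value, path)


myjson = {
    'transportation': 'car',
    'address': {
        'driveway': 'yes',
        'home_address': {'state': 'TX', 'city': 'Houston'},
    },
    'work_address': {
        'state': 'TX',
        'city': 'Sugarland',
        'location': 'office-tower',
        'salary': 30000,
    },
}

keys = list(iter_key_paths(myjson, 'myjson'))
print(keys)
```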
Simple Python question, but I'm scratching my head over the answer!
I have an array of strings of arbitrary length called path, like this:
path = ['country', 'city', 'items']
I also have a dictionary, data, and a string, unwanted_property. I know that the dictionary is of arbitrary depth and is dictionaries all the way down, with the exception of the items property, which is always an array.
[CLARIFICATION: The point of this question is that I don't know what the contents of path will be. They could be anything. I also don't know what the dictionary will look like. I need to walk down the dictionary as far as the path indicates, and then delete the unwanted properties from there, without knowing in advance what the path looks like, or how long it will be.]
I want to retrieve the parts of the data object (if any) that match the path, and then delete the unwanted_property from each.
So in the example above, I would like to retrieve:
data['country']['city']['items']
and then delete unwanted_property from each of the items in the array. I want to amend the original data, not a copy. (CLARIFICATION: By this I mean, I'd like to end up with the original dict, just minus the unwanted properties.)
How can I do this in code?
I've got this far:
path = ['country', 'city', 'items']
data = {
'country': {
'city': {
'items': [
{
'name': '114th Street',
'unwanted_property': 'foo',
},
{
'name': '8th Avenue',
'unwanted_property': 'foo',
},
]
}
}
}
for p in path:
    if p == 'items':
        data = [i for i in data[p]]
    else:
        data = data[p]
if isinstance(data, list):
    for d in data:
        del d['unwanted_property']
else:
    del data['unwanted_property']
The problem is that this doesn't amend the original data. It also relies on items always being the last string in the path, which may not always be the case.
CLARIFICATION: I mean that I'd like to end up with:
{
'country': {
'city': {
'items': [
{
'name': '114th Street'
},
{
'name': '8th Avenue'
},
]
}
}
}
Whereas what I have available in data is only [{'name': '114th Street'}, {'name': '8th Avenue'}].
I feel like I need something like XPath for the dictionary.
The problem is that you are overwriting the original data reference. Change your processing code to:
temp = data
for p in path:
    temp = temp[p]
if isinstance(temp, list):
    for d in temp:
        del d['unwanted_property']
else:
    del temp['unwanted_property']
In this version, you set temp to point to the same object that data was referring to. temp is not a copy, so any changes you make to it will be visible in the original object. Then you step temp along itself, while data remains a reference to the root dictionary. When you find the path you are looking for, any changes made via temp will be visible in data.
I also removed the line data = [i for i in data[p]]. It creates an unnecessary copy of the list that you never need, since you are not modifying the references stored in the list, just the contents of the references.
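A tiny sketch of the difference between rebinding a name and mutating the object it points to, on a cut-down version of the data:

```python
data = {'country': {'city': {'items': [{'name': '8th Avenue', 'unwanted_property': 'foo'}]}}}

temp = data                # temp and data name the same dict object
temp = temp['country']     # rebinding temp does not affect data
temp['city']['items'][0].pop('unwanted_property')  # mutating through temp does

print(temp is data['country'])  # True
print(data)  # {'country': {'city': {'items': [{'name': '8th Avenue'}]}}}
```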
The fact that path is not predetermined (besides the fact that items is going to be a list) means that you may get a KeyError in the first loop if the path does not exist in your dictionary. You can handle that gracefully by doing something more like:
try:
    temp = data
    for p in path:
        temp = temp[p]
except KeyError:
    print('Path {} not in data'.format(path))
else:
    if isinstance(temp, list):
        for d in temp:
            del d['unwanted_property']
    else:
        del temp['unwanted_property']
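The question also points out that items may not always be the last entry in the path. A sketch of one way to handle that (the function name `remove_property` is hypothetical): walk the path recursively and fan out over any list met along the way, so a list can appear at any depth. As an assumption, missing keys are skipped silently here; raise instead if a bad path should be an error.

```python
def remove_property(node, path, prop):
    """Delete `prop` from every dict reached by following `path` from `node`.

    Lists met along the way are walked element by element, so a list may
    appear at any depth. Missing keys are skipped silently (an assumption).
    """
    if isinstance(node, list):
        for element in node:
            remove_property(element, path, prop)
        return
    if not isinstance(node, dict):
        return
    if not path:
        node.pop(prop, None)  # end of the path: drop the property if present
        return
    head, rest = path[0], path[1:]
    if head in node:
        remove_property(node[head], rest, prop)


data = {'country': {'city': {'items': [
    {'name': '114th Street', 'unwanted_property': 'foo'},
    {'name': '8th Avenue', 'unwanted_property': 'foo'},
]}}}
remove_property(data, ['country', 'city', 'items'], 'unwanted_property')
print(data)  # {'country': {'city': {'items': [{'name': '114th Street'}, {'name': '8th Avenue'}]}}}

# A list in the middle of the path works the same way (hypothetical data).
data2 = {'countries': [{'city': {'name': 'x', 'unwanted_property': 1}},
                       {'city': {'name': 'y', 'unwanted_property': 2}}]}
remove_property(data2, ['countries', 'city'], 'unwanted_property')
```

Because the dicts are modified in place, data keeps its full structure and only loses the unwanted keys.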
The problem you are facing is that you are reassigning the data variable to an undesired value. In the body of your for loop you set data to the next level down the tree; for instance, given your example, data takes the following values (in order) until it leaves the for loop:
data == {'country': {'city': {'items': [{'name': '114th Street', 'unwanted_property': 'foo',}, {'name': '8th Avenue', 'unwanted_property': 'foo',},]}}}
data == {'city': {'items': [{'name': '114th Street', 'unwanted_property': 'foo',}, {'name': '8th Avenue', 'unwanted_property': 'foo',},]}}
data == {'items': [{'name': '114th Street', 'unwanted_property': 'foo',}, {'name': '8th Avenue', 'unwanted_property': 'foo',},]}
data == [{'name': '114th Street', 'unwanted_property': 'foo',}, {'name': '8th Avenue', 'unwanted_property': 'foo',},]
Then when you delete the items from your dictionaries at the end you are left with data being a list of those dictionaries as you have lost the higher parts of the structure. Thus if you make a backup reference for your data you can get the correct output, for example:
path = ['country', 'city', 'items']
data = {
'country': {
'city': {
'items': [
{
'name': '114th Street',
'unwanted_property': 'foo',
},
{
'name': '8th Avenue',
'unwanted_property': 'foo',
},
]
}
}
}
data_ref = data
for p in path:
    if p == 'items':
        data = [i for i in data[p]]
    else:
        data = data[p]
if isinstance(data, list):
    for d in data:
        del d['unwanted_property']
else:
    del data['unwanted_property']
data = data_ref
def delKey(your_dict, path):
    if len(path) == 1:
        # last path element is the property to delete from each item
        for item in your_dict:
            del item[path[0]]
        return
    delKey(your_dict[path[0]], path[1:])
Usage (note that here path ends with the name of the property to delete):
>>> data
{'country': {'city': {'items': [{'name': '114th Street', 'unwanted_property': 'foo'}, {'name': '8th Avenue', 'unwanted_property': 'foo'}]}}}
>>> path
['country', 'city', 'items', 'unwanted_property']
>>> delKey(data, path)
>>> data
{'country': {'city': {'items': [{'name': '114th Street'}, {'name': '8th Avenue'}]}}}
You need to remove the key unwanted_property.
names_list = []
def remove_key_from_items(data):
    for d in data:
        if d != 'items':
            remove_key_from_items(data[d])
        else:
            for item in data[d]:
                unwanted_prop = item.pop('unwanted_property', None)
                names_list.append(item)
This will remove the key. The second argument, None, is returned if the key unwanted_property does not exist.
EDIT:
You can also use pop without the second argument, but then it raises KeyError if the key does not exist.
EDIT 2: Updated to recurse into the data dict until it finds the items key, where it pops unwanted_property as desired and appends each item to names_list to get the desired output.
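To see the difference between the two forms of pop, here is a small sketch on a single hypothetical item:

```python
item = {'name': '114th Street', 'unwanted_property': 'foo'}

removed = item.pop('unwanted_property', None)  # returns 'foo' and deletes the key
again = item.pop('unwanted_property', None)    # key is gone: returns the default, no error

print(removed, again, item)  # foo None {'name': '114th Street'}

try:
    item.pop('unwanted_property')  # no default given: raises KeyError
except KeyError as exc:
    print('KeyError:', exc)
```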
Using operator.itemgetter you can compose a function to return the final key's value.
import operator, functools
def compose(*functions):
    '''returns a callable composed of the functions
    compose(f, g, h, k)(x) -> f(g(h(k(x))))
    '''
    def compose2(f, g):
        return lambda x: f(g(x))
    return functools.reduce(compose2, functions, lambda x: x)
get_items = compose(*[operator.itemgetter(key) for key in path[::-1]])
Then use it like this:
path = ['country', 'city', 'items']
unwanted_property = 'unwanted_property'
for thing in get_items(data):
    del thing[unwanted_property]
Of course if the path contains non-existent keys it will throw a KeyError - you probably should account for that:
path = ['country', 'foo', 'items']
get_items = compose(*[operator.itemgetter(key) for key in path[::-1]])
try:
    for thing in get_items(data):
        del thing[unwanted_property]
except KeyError as e:
    print('missing key:', e)
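If composing itemgetters feels heavy, a shorter sketch of the same idea (a swapped-in technique, not the answer's own) folds the path with functools.reduce and operator.getitem; a missing key in the path raises KeyError in the same way:

```python
import functools
import operator

data = {'country': {'city': {'items': [
    {'name': '114th Street', 'unwanted_property': 'foo'},
    {'name': '8th Avenue', 'unwanted_property': 'foo'},
]}}}
path = ['country', 'city', 'items']
unwanted_property = 'unwanted_property'

# reduce(getitem, path, data) computes data['country']['city']['items']
items = functools.reduce(operator.getitem, path, data)
for thing in items:
    thing.pop(unwanted_property, None)  # tolerate items that lack the key

print(data)  # {'country': {'city': {'items': [{'name': '114th Street'}, {'name': '8th Avenue'}]}}}
```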
You can try this:
path = ['country', 'city', 'items']
previous_data = data[path[0]]
for i in path[1:]:  # path[0] has already been applied above
    previous_data = previous_data[i]
if isinstance(previous_data, list):
    for b in previous_data:
        if "unwanted_property" in b:
            del b["unwanted_property"]
# previous_data is a reference into data, so data itself has been modified
print(data)
Output:
{'country': {'city': {'items': [{'name': '114th Street'}, {'name': '8th Avenue'}]}}}