Sum of a specific key's values in a Python dictionary

I have a school dictionary as follows:
{
    ID('6a15ce'): {
        'count': 5,
        'amount': 0,
        'r_amount': None,
        'sub': <subobj>
    },
    ID('464ba1'): {
        'count': 2,
        'amount': 120,
        'r_amount': None,
        'sub': <subobj2>
    }
}
I want to find the sum of amount, doing it as follows:
{k: sum(v['amount']) for k, v in school.items()}
but here I am getting the error TypeError: 'int' object is not iterable. What would be an efficient way to achieve this?

You can do:
result = sum(v["amount"] for v in school.values())
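As a quick self-contained check, a minimal sketch using plain string keys in place of the ID(...) objects (which aren't shown in the question):

```python
# Stand-in for the school dict; string keys replace the ID(...) objects
school = {
    '6a15ce': {'count': 5, 'amount': 0, 'r_amount': None},
    '464ba1': {'count': 2, 'amount': 120, 'r_amount': None},
}

# v['amount'] is an int; sum() iterates the generator, not the int,
# which is what triggered the original TypeError
result = sum(v["amount"] for v in school.values())
print(result)  # 120
```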

You can also do it using the map function:
result = sum(map(lambda i: i['amount'], school.values()))
print(result)
Output:
120

This is a functional solution:
from operator import itemgetter
res = sum(map(itemgetter('amount'), school.values()))

sum(map(lambda v: v['amount'], school.values()))
(Note: iterating over school directly yields the keys, and the values are dicts, not objects, so you need school.values() and ['amount'] rather than .amount.)

Related

Structure JSON format to a specified data structure

Basically I have a dictionary:
data_list = {
    '__att_names': [
        ['id', 'name'],                  # "__t_idx": 0
        ['location', 'address'],         # "__t_idx": 1
        ['random_key1', 'random_key2'],  # "__t_idx": 2
        ['random_key3', 'random_key4']   # "__t_idx": 3
    ],
    "__root": {
        "comparables": [
            {
                "__g_id": "153564396",
                "__atts": [
                    1,                 # --> this is __att_names[0][0]
                    'somerandomname',  # --> this is __att_names[0][1]
                    {
                        "__atts": [
                            'location_value',  # --> this is __att_names[1][0]
                            'address_value',   # --> this is __att_names[1][1]
                            {
                                "__atts": [],
                                "__t_idx": 1   # --> it can keep getting nested further and further
                            }
                        ],
                        "__t_idx": 1
                    },
                    {
                        "__atts": [
                            'random_key3value',
                            'random_key4value'
                        ],
                        "__t_idx": 3
                    },
                    {
                        "__atts": [
                            'random_key1value',
                            'random_key2value'
                        ],
                        "__t_idx": 2
                    }
                ],
                "__t_idx": 0  # --> this maps to the first item in __att_names
            }
        ]
    }
}
My desired output in this case would be
[
{
'id': 1,
'name': 'somerandomname',
'location': 'address_value',
'random_key1': 'random_key1value',
'random_key2': 'random_key2value',
'random_key3': 'random_key3value',
'random_key4': 'random_key4value',
}
]
I was able to get it working for the first few nested fields of __att_names, but my code was getting really long and repetitive the deeper the nesting went.
I feel like there is a neater and recursive way to solve this.
This is my current approach:
As of now, the following code takes care of the very first nested object:

def recursive_function(index, attributes, payload_names, output):
    category_location = payload_names[index]
    for index, categories in enumerate(category_location):
        output[categories] = attributes[index]
        if type(attributes[index]) == dict:
            has_nested_index = attributes[index].get('__t_idx')
            has_nested_attributes = attributes[index].get('__atts')
            if has_nested_attributes and has_nested_index:
                recursive_function(has_nested_index, has_nested_attributes, payload_names, output)
        else:
            continue

payload_names = data_list['__att_names']
comparable_data = data_list['__root']['comparables']
output_arr = []
for items in comparable_data[:1]:
    output = {}
    index_number = items.get('__t_idx')
    attributes = items.get('__atts')
    if attributes:
        recursive_function(index_number, attributes, payload_names, output)
    output_arr.append(output)
To further explain the given example:
[ {
'id': 1,
'name': 'somerandomname',
'location': 'address_value',
'random_key1': 'random_key1value',
'random_key2': 'random_key2value',
'random_key3': 'random_key3value',
'random_key4': 'random_key4value',
}
]
Specifically for 'location': 'address_value': the value 'address_value' was derived from the comparables array, which holds dictionaries with the keys __g_id, __atts, and __t_idx. Note that some of them might not have __g_id, but wherever there is an __atts key there is also a __t_idx, which maps that array to the corresponding array in __att_names.
Overall:
__att_names are basically all the different keys,
all the items within comparables -> __atts are basically the values for the key names in __att_names,
and __t_idx maps the __atts array items to __att_names, producing the key-value pairs in the outcome.
If you want to restructure a complex JSON object, my recommendation is to use jq.
Python package
Official website
The data you present is really confusing and obfuscated, so I'm not sure what exact filtering your case would require. But from what I understand, your problem involves indefinitely nested data. So instead of a recursive function, you could write a loop that unnests the data into the plain structure you desire. There's already a question on that topic.
You can traverse the structure while tracking the __t_idx key values that correspond to list elements that are not dictionaries:
data_list = {'__att_names': [['id', 'name'], ['location', 'address'], ['random_key1', 'random_key2'], ['random_key3', 'random_key4']], '__root': {'comparables': [{'__g_id': '153564396', '__atts': [1, 'somerandomname', {'__atts': ['location_value', 'address_value', {'__atts': [], '__t_idx': 1}], '__t_idx': 1}, {'__atts': ['random_key3value', 'random_key4value'], '__t_idx': 3}, {'__atts': ['random_key1value', 'random_key2value'], '__t_idx': 2}], '__t_idx': 0}]}}
def get_vals(d, f=False, t_idx=None):
    if isinstance(d, dict) and '__atts' in d:
        yield from [i for a, b in d.items() for i in get_vals(b, t_idx=d.get('__t_idx'))]
    elif isinstance(d, list):
        yield from [i for b in d for i in get_vals(b, f=True, t_idx=t_idx)]
    elif f and t_idx is not None:
        yield (d, t_idx)

result = []
for i in data_list['__root']['comparables']:
    new_d = {}
    for a, b in get_vals(i):
        new_d[b] = iter([*new_d.get(b, []), a])
    result.append({j: next(new_d[i]) for i, a in enumerate(data_list['__att_names']) for j in a})
print(result)
Output:
[
{'id': 1,
'name': 'somerandomname',
'location': 'location_value',
'address': 'address_value',
'random_key1': 'random_key1value',
'random_key2': 'random_key2value',
'random_key3': 'random_key3value',
'random_key4': 'random_key4value'
}
]
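For comparison, a compact recursive sketch of the same idea. The flatten helper below is hypothetical; it assumes every non-dict entry in __atts lines up, in order, with the names at that node's __t_idx, and that every nested dict carries its own __atts/__t_idx pair:

```python
data_list = {'__att_names': [['id', 'name'], ['location', 'address'], ['random_key1', 'random_key2'], ['random_key3', 'random_key4']], '__root': {'comparables': [{'__g_id': '153564396', '__atts': [1, 'somerandomname', {'__atts': ['location_value', 'address_value', {'__atts': [], '__t_idx': 1}], '__t_idx': 1}, {'__atts': ['random_key3value', 'random_key4value'], '__t_idx': 3}, {'__atts': ['random_key1value', 'random_key2value'], '__t_idx': 2}], '__t_idx': 0}]}}

def flatten(node, names, out=None):
    # Pair each plain value in __atts with the next name at this node's
    # __t_idx; recurse into nested {'__atts': ..., '__t_idx': ...} dicts.
    if out is None:
        out = {}
    keys = iter(names[node['__t_idx']])
    for value in node['__atts']:
        if isinstance(value, dict) and '__atts' in value:
            flatten(value, names, out)
        else:
            out[next(keys)] = value
    return out

rows = [flatten(c, data_list['__att_names'])
        for c in data_list['__root']['comparables']]
print(rows)
```

On this data it yields the same flat dictionaries as the generator version above.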

Python list json conversion to list

I have a list in the below format.
['111: {"id":"de80ca97","data":"test"}', '222: {"id":"8916a167","data":"touch"}', '333: {"id":"12966e98","data":"tap"}']
I need to remove the data field from the above list/JSON and add each item's numeric key as a score value.
I need to transform it to the below structure.
Desired output:
[
{
"score":111,
"id":"de80ca97"
},
{
"score":222,
"id":"8916a167"
},
{
"score":333,
"id":"12966e98"
}
]
Any suggestions or ideas most welcome.
You can use a for loop or you can also use a list comprehension as follows:
>>> import json
>>> l = ['111: {"id":"de80ca97","data":"test"}', '222: {"id":"8916a167","data":"touch"}', '333: {"id":"12966e98","data":"tap"}']
>>> [{'score': int(e.split()[0][:-1]), 'id': json.loads(e.split()[1])['id']} for e in l]
If you prefer to use a for loop:
new_l = []
for e in l:
    key, json_str = e.split()
    new_l.append({'score': int(key[:-1]), 'id': json.loads(json_str)['id']})
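One caveat: e.split() with no arguments splits on every run of whitespace, so it would break if the JSON part ever contained a space. Splitting once on the ': ' separator is more robust; a sketch:

```python
import json

l = ['111: {"id":"de80ca97","data":"test"}',
     '222: {"id":"8916a167","data":"touch"}',
     '333: {"id":"12966e98","data":"tap"}']

new_l = []
for e in l:
    key, json_str = e.split(': ', 1)  # split only at the first ': '
    new_l.append({'score': int(key), 'id': json.loads(json_str)['id']})
print(new_l)
```

Since the colon is part of the separator here, the [:-1] slice is no longer needed.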

How to get the full path of a key in a complex list of dictionary

So I have a complex list with dictionaries and lists as values.
This is the one:
data = [
{"folder1": [
{"file1": 5},
{"folder3": [{"file2": 7},
{"file3": 10}]},
{"file4": 9}
]
},
{"folder2": [
{"folder4": []},
{"folder5": [
{"folder6": [{"file5": 17}]},
{"file6": 6},
{"file7": 5}
]},
{"file8": 10}
]
}
]
I need to extract the path for each file, like a directory tree as stored on an HDD:
Output sample:
output:
folder1/file1
folder1/file4
folder1/folder3/file2
folder1/folder3/file3
folder2/file8
folder2/folder4
folder2/folder5/file6
folder2/folder5/file7
folder2/folder5/folder6/file5
Please help, I have been struggling and could not find a way.
Thank you
You can use recursion with yield:
def get_paths(d, seen):
    for a, b in d.items():
        if not isinstance(b, list) or not b:
            yield '{}/{}'.format("/".join(seen), a)
        else:
            for c in b:
                for t in get_paths(c, seen + [a]):
                    yield t

print('\n'.join([i for b in data for i in get_paths(b, [])]))
Output:
folder1/file1
folder1/folder3/file2
folder1/folder3/file3
folder1/file4
folder2/folder4
folder2/folder5/folder6/file5
folder2/folder5/file6
folder2/folder5/file7
folder2/file8
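The same traversal can also collect the paths into a list and sort them, which reproduces the ordering in the question's output sample:

```python
def get_paths(d, seen):
    for name, children in d.items():
        if not isinstance(children, list) or not children:
            # a file (or an empty folder): emit the accumulated path
            yield "/".join(seen + [name])
        else:
            for child in children:
                yield from get_paths(child, seen + [name])

data = [
    {"folder1": [
        {"file1": 5},
        {"folder3": [{"file2": 7}, {"file3": 10}]},
        {"file4": 9}
    ]},
    {"folder2": [
        {"folder4": []},
        {"folder5": [
            {"folder6": [{"file5": 17}]},
            {"file6": 6},
            {"file7": 5}
        ]},
        {"file8": 10}
    ]}
]

paths = sorted(p for d in data for p in get_paths(d, []))
print("\n".join(paths))
```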

Get average value from list of dictionary

I have a list of dictionaries. Let's say it is:
total = [{"date": "2014-03-01", "value": 200}, {"date": "2014-03-02", "value": 100}, {"date": "2014-03-03", "value": 400}]
I need to get the maximum, minimum, and average value from it. I can get the max and min values with the code below:
print min(d['value'] for d in total)
print max(d['value'] for d in total)
But now I need to get the average value from it. How to do it?
Just divide the sum of values by the length of the list:
print sum(d['value'] for d in total) / len(total)
Note that in Python 2 division of integers returns an integer. This means that the average of [5, 5, 0, 0] will be 2 instead of 2.5. If you need a more precise result then you can use float():
print float(sum(d['value'] for d in total)) / len(total)
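In Python 3, / performs true division, so the float() call is unnecessary there; the standard library's statistics module also computes the mean directly:

```python
from statistics import mean

total = [{"date": "2014-03-01", "value": 200},
         {"date": "2014-03-02", "value": 100},
         {"date": "2014-03-03", "value": 400}]

avg = mean(d['value'] for d in total)
print(avg)
```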
I needed a more general implementation of the same thing to work on the whole dictionary. So here is one simple option:
def dict_mean(dict_list):
    mean_dict = {}
    for key in dict_list[0].keys():
        mean_dict[key] = sum(d[key] for d in dict_list) / len(dict_list)
    return mean_dict
Testing:
dicts = [{"X": 5, "value": 200}, {"X": -2, "value": 100}, {"X": 3, "value": 400}]
dict_mean(dicts)
{'X': 2.0, 'value': 233.33333333333334}
reduce(lambda x, y: x + y, [d['value'] for d in total]) / len(total)
catavaran's answer is simpler; you don't need a lambda.
An improvement on dsalaj's answer if the values are numeric lists instead:
import numpy as np

def dict_mean(dict_list):
    mean_dict = {}
    for key in dict_list[0].keys():
        mean_dict[key] = np.mean([d[key] for d in dict_list], axis=0)
    return mean_dict

iterative long-to-wide python one-liner (or two) using groupby

I'm looking to turn a long dataset into a wide one using functional and iterative tools, and my understanding is that this is a task for groupby. I've asked a couple of questions about this before, and thought I had it, but not quite in this case, which ought to be simpler:
Python functional transformation of JSON list of dictionaries from long to wide
Correct use of a fold or reduce function to long-to-wide data in python or javascript?
Here's the data I have:
from itertools import groupby
from operator import itemgetter
from pprint import pprint
>>> longdat=[
{"id":"cat", "name" : "best meower", "value": 10},
{"id":"cat", "name" : "cleanest paws", "value": 8},
{"id":"cat", "name" : "fanciest", "value": 9},
{"id":"dog", "name" : "smelly", "value": 9},
{"id":"dog", "name" : "dumb", "value": 9},
]
Here's the format I want it in:
>>> widedat=[
{"id":"cat", "best meower": 10, "cleanest paws": 8, "fanciest": 9},
{"id":"dog", "smelly": 9, "dumb": 9},
]
Here are my failed attempts:
# WRONG
>>> gh = groupby(sorted(longdat,key=id),itemgetter('id'))
>>> list(gh)
[('cat', <itertools._grouper object at 0x5d0b550>), ('dog', <itertools._grouper object at 0x5d0b210>)]
OK, need to get the second item out of the iterator, fair enough.
#WRONG
>>> gh = groupby(sorted(longdat,key=id),itemgetter('id'))
>>> for g,v in gh:
... {"id":i["id"], i["name"]:i["value"] for i in v}
^
SyntaxError: invalid syntax
Weird, it looked valid. Let's unwind those loops to make sure.
#WRONG
gb = groupby(sorted(longdat,key=id),itemgetter('id'))
data = {}
for g,v in gb:
    data[g] = {}
    for i in v:
        data[g] = i
#WRONG
gb = groupby(sorted(longdat,key=id),itemgetter('id'))
data = []
for g,v in gb:
    for i in v:
        data[g] = i
Ah! OK, let's go back to the one-line form
#WRONG
>>> gb = groupby(sorted(longdat,key=id),itemgetter('id'))
>>> [{"id":g, i["name"]:i["value"]} for i in k for g,k in gb]
[]
What? Why empty?! Let's unwind basically exactly this again:
#WRONG
gb = groupby(sorted(longdat,key=id),itemgetter('id'))
for g,k in gb:
    for i in k:
        print(g, i["name"],i["value"])
cat best meower 10
cat fanciest 9
cat cleanest paws 8
dog smelly 9
dog dumb 9
Now, this last one is obviously the worst: it's clear my data is basically right back where it started, as if I didn't even groupby.
Why is this not working and how can I get this in the format I'm seeking?
Also, is it possible to phrase this entirely iteratively such that I could do
>>> result[0]
{"id":"cat", "best meower": 10, "cleanest paws": 8, "fanciest": 9}
and only get the first result without processing the entire list (beyond having to look at all entries where id == 'cat')?
The key function passed to sorted is the built-in id. It returns a different value for every list item, so nothing groups together.
It should be itemgetter('id') or lambda x: x['id'].
>>> id(longdat[0])
41859624L
>>> id(longdat[1])
41860488L
>>> id(longdat[2])
41860200L
>>> itemgetter('id')(longdat[1])
'cat'
>>> itemgetter('id')(longdat[2])
'cat'
>>> itemgetter('id')(longdat[3])
'cat'
from itertools import groupby
from operator import itemgetter
longdat = [
{"id":"cat", "name" : "best meower", "value": 10},
{"id":"cat", "name" : "cleanest paws", "value": 8},
{"id":"cat", "name" : "fanciest", "value": 9},
{"id":"dog", "name" : "smelly", "value": 9},
{"id":"dog", "name" : "dumb", "value": 9},
]
getid = itemgetter('id')
result = [
    dict([['id', key]] + [[d['name'], d['value']] for d in grp])
    for key, grp in groupby(sorted(longdat, key=getid), key=getid)
]
print(result)
output:
[{'best meower': 10, 'fanciest': 9, 'id': 'cat', 'cleanest paws': 8},
{'dumb': 9, 'smelly': 9, 'id': 'dog'}]
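For completeness, a sketch that avoids groupby (and the pre-sort it requires) by accumulating into a plain dict; dicts preserve insertion order in Python 3.7+:

```python
longdat = [
    {"id": "cat", "name": "best meower", "value": 10},
    {"id": "cat", "name": "cleanest paws", "value": 8},
    {"id": "cat", "name": "fanciest", "value": 9},
    {"id": "dog", "name": "smelly", "value": 9},
    {"id": "dog", "name": "dumb", "value": 9},
]

wide = {}
for row in longdat:
    # create the row's dict on first sight of this id, then add the name/value pair
    wide.setdefault(row['id'], {'id': row['id']})[row['name']] = row['value']

result = list(wide.values())
print(result)
```

This is a single pass over the input, so unlike groupby it does not require the rows for each id to be adjacent.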
