Aggregate certain values in array of dictionary based on key/value criteria

I have the below JSON of forum posts.
What would be the pythonic way of creating a resulting JSON of aggregated Positive/Negative ratings per forum?
Input Json:
{"Posting_Stats":{
"Posts":[
{
"Date":"2020-03-29 12:41:00",
"Forum":"panorama",
"Positive":2,
"Negative":0
},
{
"Date":"2020-03-29 12:37:00",
"Forum":"web",
"Positive":6,
"Negative":0
},
{
"Date":"2020-03-29 12:37:00",
"Forum":"web",
"Positive":2,
"Negative":2
},...]}}
Output should be:
{"Forum_Stats" : [{"Forum" : "panorama",
"Positive":2,
"Negative":0},
{"Forum" : "web",
"Positive":8,
"Negative":2},...]
}

I cannot think of a way other than this:
posts = inputData['Posting_Stats']['Posts']
postAggregator = {}
for post in posts:
    try:
        postAggregator[post['Forum']]['Positive'] += post.get('Positive',0)
        postAggregator[post['Forum']]['Negative'] += post.get('Negative',0)
    except KeyError:
        postAggregator.update({post['Forum']:{"Positive":post.get('Positive',0), "Negative":post.get('Negative',0)}})
outputData = {"Forum_Stats": []}
for key, value in postAggregator.items():
    outputData['Forum_Stats'].append({"Forum":key , "Positive":value['Positive'],"Negative":value['Negative']})
print(outputData)
Output:
{'Forum_Stats': [{'Forum': 'panorama', 'Positive': 2, 'Negative': 0}, {'Forum': 'web', 'Positive': 8, 'Negative': 2}]}

This may be one way of solving:
#taking the input in a dictionary
d = {"Posting_Stats":{
"Posts":[
{
"Date":"2020-03-29 12:41:00",
"Forum":"panorama",
"Positive":2,
"Negative":0
},
{
"Date":"2020-03-29 12:37:00",
"Forum":"web",
"Positive":6,
"Negative":0
},
{
"Date":"2020-03-29 12:37:00",
"Forum":"web",
"Positive":2,
"Negative":2
}]}}
# iterate over the posts and sum their values, using the forum as the key
temp = {}
for i in d.get('Posting_Stats').get('Posts'):
    if temp.get(i.get('Forum')) is None:
        temp[i.get('Forum')] = {}
        temp[i.get('Forum')]['Positive'] = 0
        temp[i.get('Forum')]['Negative'] = 0
    temp[i.get('Forum')]['Positive']+=i.get('Positive')
    temp[i.get('Forum')]['Negative']+=i.get('Negative')
# finally, convert the output into the required format
output = [{'Forum': i , **temp[i] } for i in temp]
print(output)
#[{'Forum': 'panorama', 'Positive': 2, 'Negative': 0},
#{'Forum': 'web', 'Positive': 8, 'Negative': 2}]
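The same aggregation can also be sketched with collections.defaultdict, which removes the explicit "is this forum new?" check. This is only a sketch: the names input_data, FIELDS and aggregate_by_forum are made up for illustration, and it assumes every post has a "Forum" key and that insertion-ordered dicts (Python 3.7+) are available.

```python
from collections import defaultdict

# Sample input shaped like the question's data (variable name is an assumption).
input_data = {"Posting_Stats": {"Posts": [
    {"Date": "2020-03-29 12:41:00", "Forum": "panorama", "Positive": 2, "Negative": 0},
    {"Date": "2020-03-29 12:37:00", "Forum": "web", "Positive": 6, "Negative": 0},
    {"Date": "2020-03-29 12:37:00", "Forum": "web", "Positive": 2, "Negative": 2},
]}}

FIELDS = ("Positive", "Negative")  # the counters to sum per forum


def aggregate_by_forum(posts):
    # Each unseen forum starts with a fresh {"Positive": 0, "Negative": 0} dict.
    totals = defaultdict(lambda: dict.fromkeys(FIELDS, 0))
    for post in posts:
        for field in FIELDS:
            totals[post["Forum"]][field] += post.get(field, 0)
    # Reshape {forum: counts} into the requested list of dicts.
    return {"Forum_Stats": [{"Forum": forum, **counts}
                            for forum, counts in totals.items()]}


forum_stats = aggregate_by_forum(input_data["Posting_Stats"]["Posts"])
print(forum_stats)
```

Adding another counter (for example a hypothetical "Neutral" field) would then only mean extending FIELDS.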

Related

Merge dictionaries with same key from two lists of dicts in python

I have two dictionaries, as below. Both dictionaries have a list of dictionaries as the value associated with their properties key; each dictionary within these lists has an id key. I wish to merge my two dictionaries into one such that the properties list in the resulting dictionary only has one dictionary for each id.
{
"name":"harry",
"properties":[
{
"id":"N3",
"status":"OPEN",
"type":"energetic"
},
{
"id":"N5",
"status":"OPEN",
"type":"hot"
}
]
}
and the other list:
{
"name":"harry",
"properties":[
{
"id":"N3",
"type":"energetic",
"language": "english"
},
{
"id":"N6",
"status":"OPEN",
"type":"cool"
}
]
}
The output I am trying to achieve is:
"name":"harry",
"properties":[
{
"id":"N3",
"status":"OPEN",
"type":"energetic",
"language": "english"
},
{
"id":"N5",
"status":"OPEN",
"type":"hot"
},
{
"id":"N6",
"status":"OPEN",
"type":"cool"
}
]
}
As id: N3 is common in both the lists, those 2 dicts should be merged with all the fields. So far I have tried using itertools and
ds = [d1, d2]
d = {}
for k in d1.keys():
    d[k] = tuple(d[k] for d in ds)
Could someone please help in figuring this out?

Here is one approach:
a = {
"name":"harry",
"properties":[
{
"id":"N3",
"status":"OPEN",
"type":"energetic"
},
{
"id":"N5",
"status":"OPEN",
"type":"hot"
}
]
}
b = {
"name":"harry",
"properties":[
{
"id":"N3",
"type":"energetic",
"language": "english"
},
{
"id":"N6",
"status":"OPEN",
"type":"cool"
}
]
}
# Create a dict mapping each id to its index in the respective properties list
a_ids = {item['id']: index for index,item in enumerate(a['properties'])} #{'N3': 0, 'N5': 1}
b_ids = {item['id']: index for index,item in enumerate(b['properties'])} #{'N3': 0, 'N6': 1}
# Loop through the ids of one of the dicts
for id in a_ids.keys():
    # If the same id exists in the other dict, update that entry with a's key/value pairs
    if id in b_ids:
        b['properties'][b_ids[id]].update(a['properties'][a_ids[id]])
    # If it does not exist, then just append the new dict
    else:
        b['properties'].append(a['properties'][a_ids[id]])
print (b)
Output:
{'name': 'harry', 'properties': [{'id': 'N3', 'type': 'energetic', 'language': 'english', 'status': 'OPEN'}, {'id': 'N6', 'status': 'OPEN', 'type': 'cool'}, {'id': 'N5', 'status': 'OPEN', 'type': 'hot'}]}
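Note that the approach above modifies b in place. If both inputs should stay untouched, one possible variant is to build the merged entries in a new dict keyed by id. This is a sketch under assumptions: merge_properties is a hypothetical helper name, every property dict is assumed to carry the key, and when both sides define the same field the later list wins.

```python
def merge_properties(first, second, key="id"):
    """Merge two lists of dicts on `key`; fields from `second` win on conflict."""
    merged = {}
    for prop in first + second:
        # setdefault creates a new dict per id, so the input dicts are not mutated.
        merged.setdefault(prop[key], {}).update(prop)
    return list(merged.values())


a = {"name": "harry", "properties": [
    {"id": "N3", "status": "OPEN", "type": "energetic"},
    {"id": "N5", "status": "OPEN", "type": "hot"},
]}
b = {"name": "harry", "properties": [
    {"id": "N3", "type": "energetic", "language": "english"},
    {"id": "N6", "status": "OPEN", "type": "cool"},
]}

result = {"name": a["name"],
          "properties": merge_properties(a["properties"], b["properties"])}
print(result)
```

The properties come out in first-seen order (N3, N5, N6), which matches the order asked for in the question.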

It might help to treat the two objects as elements each in their own lists. Maybe you have other objects with different name values, such as might come out of a JSON-formatted REST request.
Then you could do a left outer join on both name and id keys:
#!/usr/bin/env python
a = [
{
"name": "harry",
"properties": [
{
"id":"N3",
"status":"OPEN",
"type":"energetic"
},
{
"id":"N5",
"status":"OPEN",
"type":"hot"
}
]
}
]
b = [
{
"name": "harry",
"properties": [
{
"id":"N3",
"type":"energetic",
"language": "english"
},
{
"id":"N6",
"status":"OPEN",
"type":"cool"
}
]
}
]
a_names = set()
a_prop_ids_by_name = {}
a_by_name = {}
for ao in a:
    an = ao['name']
    a_names.add(an)
    if an not in a_prop_ids_by_name:
        a_prop_ids_by_name[an] = set()
    for ap in ao['properties']:
        api = ap['id']
        a_prop_ids_by_name[an].add(api)
    a_by_name[an] = ao
res = []
for bo in b:
    bn = bo['name']
    if bn not in a_names:
        res.append(bo)
    else:
        ao = a_by_name[bn]
        bp = bo['properties']
        for bpo in bp:
            if bpo['id'] not in a_prop_ids_by_name[bn]:
                ao['properties'].append(bpo)
        res.append(ao)
print(res)
The idea above is to process list a for names and ids. The names and the ids-by-name are stored in Python sets, so their members are always unique.
Once you have these sets, you can do the left outer join on the contents of list b.
There are two cases. If an object in b has a name that does not appear in a, you add that object to the result as-is. If an object in b shares its name with an object in a, you iterate over that object's properties and look for ids not already in the ids-by-name set for a. You append those missing properties to the object from a, and then add that processed object to the result. Properties whose id exists on both sides keep a's version, so fields that only b has (such as language on N3) are not merged in.
Output:
[{'name': 'harry', 'properties': [{'id': 'N3', 'status': 'OPEN', 'type': 'energetic'}, {'id': 'N5', 'status': 'OPEN', 'type': 'hot'}, {'id': 'N6', 'status': 'OPEN', 'type': 'cool'}]}]
This doesn't do any error checking on input. This relies on name values being unique per object. So if you have duplicate keys in objects in both lists, you may get garbage (incorrect or unexpected output).
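If the fields of properties that share an id should be combined as well, and objects from both lists should be kept, the join can be sketched as a full outer merge keyed first by name and then by id. This is only an illustration: merge_by_name is a hypothetical helper, the "ron" object is invented sample data, and later values overwrite earlier ones when the same field appears twice.

```python
def merge_by_name(a_list, b_list):
    by_name = {}  # name -> {property id -> merged property dict}
    for obj in a_list + b_list:
        props = by_name.setdefault(obj["name"], {})
        for prop in obj["properties"]:
            # Combine fields of properties that share an id.
            props.setdefault(prop["id"], {}).update(prop)
    return [{"name": name, "properties": list(props.values())}
            for name, props in by_name.items()]


a = [{"name": "harry", "properties": [
    {"id": "N3", "status": "OPEN", "type": "energetic"},
    {"id": "N5", "status": "OPEN", "type": "hot"},
]}]
b = [
    {"name": "harry", "properties": [
        {"id": "N3", "type": "energetic", "language": "english"},
        {"id": "N6", "status": "OPEN", "type": "cool"},
    ]},
    # Invented second object, to show that unmatched names are carried over.
    {"name": "ron", "properties": [{"id": "N1", "status": "CLOSED"}]},
]

res = merge_by_name(a, b)
print(res)
```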

How to create a dictionary with duplicate keys and form a list of dictionary

I am writing a program in which I have a list of dictionaries like the following:
[
{
'unique':1,
'duplicate':2,
},
{
'unique':1,
'duplicate':2,
},
{
'unique':1,
'duplicate':2,
},
{
'unique':1,
'duplicate':2,
}
]
Can this be converted so that each distinct value of the 'unique' key appears only once,
with its corresponding 'duplicate' values collected into a list?
Example:
[
{
'unique':1,
'duplicate':2,
},
{
'unique':1,
'duplicate':8,
},
{
'unique':2,
'duplicate':2,
},
{
'unique':1,
'duplicate':4,
}
]
The above list should be converted into the following
---- Expected Outcome ---
[
{
'unique':1,
'duplicates':[2,8,4]
},
{
'unique':2,
'duplicates':[2]
}
]
PS: I am doing this in python
Thanks for the code in advance

you can also use itertools.groupby:
from itertools import groupby
from operator import itemgetter
l = [
{
'unique':1,
'duplicate':2,
},
{
'unique':1,
'duplicate':8,
},
{
'unique':2,
'duplicate':2,
},
{
'unique':1,
'duplicate':4,
}
]
key = itemgetter('unique')
result = [{'unique':k, 'duplicate': list(map(itemgetter('duplicate'), g))}
          for k, g in groupby(sorted(l, key=key ), key = key)]
print(result)
output:
[{'unique': 1, 'duplicate': [2, 8, 4]}, {'unique': 2, 'duplicate': [2]}]
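The sorted() call in the answer above matters: itertools.groupby only groups consecutive items with equal keys. A small sketch of the difference, using the question's data (the variable names rows, unsorted_groups and sorted_groups are chosen here for illustration):

```python
from itertools import groupby
from operator import itemgetter

rows = [
    {"unique": 1, "duplicate": 2},
    {"unique": 1, "duplicate": 8},
    {"unique": 2, "duplicate": 2},
    {"unique": 1, "duplicate": 4},
]
key = itemgetter("unique")

# Without sorting, the trailing unique=1 row starts a second, separate group.
unsorted_groups = [(k, [r["duplicate"] for r in g])
                   for k, g in groupby(rows, key=key)]
print(unsorted_groups)  # [(1, [2, 8]), (2, [2]), (1, [4])]

# sorted() is stable, so values keep their original relative order per key.
sorted_groups = [(k, [r["duplicate"] for r in g])
                 for k, g in groupby(sorted(rows, key=key), key=key)]
print(sorted_groups)  # [(1, [2, 8, 4]), (2, [2])]
```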

I think this list comprehension can solve your problem:
result = [{'unique': id, 'duplicates': [d['duplicate'] for d in l if d['unique'] == id]} for id in set(map(lambda d: d['unique'], l))]

This might help you:
l = [
{
'unique':1,
'duplicate':2,
},
{
'unique':1,
'duplicate':8,
},
{
'unique':2,
'duplicate':2,
},
{
'unique':1,
'duplicate':4,
}
]
a = set()
for i in l:
    a.add(i['unique'])
d = {i:[] for i in a }
for i in l:
    d[i['unique']].append(i['duplicate'])
output = [{'unique': i, 'duplicate': j}for i, j in d.items()]
The output will be:
[{'unique': 1, 'duplicate': [2, 8, 4]}, {'unique': 2, 'duplicate': [2]}]

defaultdict(list) may help you here:
from collections import defaultdict
# data = [ {'unique': 1, 'duplicate': 2}, ... ] # your data
dups = defaultdict(list) # {unique: [duplicate]}
for dd in data:
    dups[dd['unique']].append(dd['duplicate'])
answer = [dict(unique = k, duplicates = v) for k, v in dups.items()]
If you don't know the name of unique key, then replace 'unique' with something like
unique_key = list(data[0].keys())[0]

unique=[]
duplicate ={}
for items in data:
    if items['unique'] not in unique:
        unique.append(items['unique'])
        duplicate[items['unique']]=[items['duplicate']]
    else:
        duplicate[items['unique']].append(items['duplicate'])
new_data=[]
for key in unique:
    new_data.append({'unique':key,'duplicate':duplicate[key]})
Explanation: In the first for loop, I collect the unique keys in 'unique'. If a key doesn't exist in 'unique' yet, I append it to 'unique' and add a key to 'duplicate' whose value is a single-element list. If the same key is found again, I simply append the value to the list in 'duplicate' for that key. In the second loop, I build 'new_data' by adding each unique key together with its list of duplicate values.

How to add new key into dictionary like this [{ {]. This looks more like a dictionary inside a list

I would like to add new key into the dictionary list. Example:
"label" : [] (with empty list)
[
{
"Next" : {
"seed" : [
{
"Argument" : [
{
"id" : 4,
"label" : "org"
},
{
"id" : "I"
},
{
"word" : "He",
"seed" : 2,
"id" : 3,
"label" : "object"
},
{
"word" : "Gets",
"seed" : 9,
"id" : 2,
"label" : "verb"
}
]
}
],
"Next" : "he,get",
"time" : ""
}
}
]
I tried to loop into "seed" and then into "Argument" and call .update("label":[]) inside the loop, but it doesn't work. Can anyone please give me an example of a for loop that starts from the top level and adds these new "label" entries?
My preferred goal (to have extra "label" entries within the dictionary according to my input):
Example:
[
{
"Next" : {
"seed" : [
{
"Argument" : [
{
"id" : 4,
"label" : "org"
},
{
"id" : "I"
},
{
"word" : "He",
"seed" : 2,
"id" : 3,
"label" : "object"
},
{
"word" : "Gets",
"seed" : 9,
"id" : 2,
"label" : "verb"
},
{
"id" : 5,
"label" : "EXTRA"
},
{
"id" : 6,
"label" : "EXTRA"
},
{
"id" : 7,
"label" : "EXTRA"
}
]
}
],
"Next" : "he,get",
"time" : ""
}
}
]
I am new to dictionary so really need help with this

If I understand your problem correctly, you want to add 'label' to each dict in Argument that has no label. You could do it like so -
for i in x[0]['Next']['seed'][0]['Argument']:
    if 'label' not in i:
        i['label'] = []
Where x is your list. But what is x[0]['Next']['seed'][0]['Argument']?
Let's simplify your dict -
x = [{'Next': {'Next': 'he,get',
'seed': [{'Argument': [{these}, {are}, {your}, {dicts}]}],
'time': ''}}]
How did we reach here?
Let's see-
x = [{'Next'(parent dict): {'Next'(child of previous 'Next'):{},
'seed(child of previous 'Next')':[{these}, {are}, {your}, {dicts}](a list of dicts)}]
I hope that makes sense. And to add more dictionaries in Argument
# create a function that returns a dict
import random # I don't know how you want the ids, so this picks them at random
def create_dicts():
    return {"id": random.randint(1, 10), "label": ""}
for i in range(3): # 3 is how many dicts you want to push in Argument
    x[0]['Next']['seed'][0]['Argument'].append(create_dicts())
Now your dict will become -
[{'Next': {'Next': 'he,get',
'seed': [{'Argument': [{'id': 4, 'label': 'org'},
{'id': 'I'},
{'id': 3, 'label': 'object', 'seed': 2, 'word': 'He'},
{'id': 2, 'label': 'verb', 'seed': 9, 'word': 'Gets'},
{'id': 1, 'label': ''},
{'id': 4, 'label': ''},
{'id': 4, 'label': ''}]}],
'time': ''}}]
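Random ids can collide with existing ones, as the repeated id 4 in the output above shows. If the new entries should instead continue after the largest existing integer id (5, 6, 7 with the label "EXTRA", as in the question's desired result), one possible sketch follows. The helper name append_extras and its parameters are assumptions, and non-integer ids such as "I" are simply ignored when looking for the maximum.

```python
def append_extras(arguments, count, label="EXTRA"):
    """Append `count` dicts whose ids continue after the largest integer id."""
    int_ids = [d["id"] for d in arguments if isinstance(d.get("id"), int)]
    next_id = max(int_ids, default=0) + 1
    for offset in range(count):
        arguments.append({"id": next_id + offset, "label": label})
    return arguments


x = [{"Next": {
    "seed": [{"Argument": [
        {"id": 4, "label": "org"},
        {"id": "I"},
        {"word": "He", "seed": 2, "id": 3, "label": "object"},
        {"word": "Gets", "seed": 9, "id": 2, "label": "verb"},
    ]}],
    "Next": "he,get",
    "time": "",
}}]

append_extras(x[0]["Next"]["seed"][0]["Argument"], 3)
print(x[0]["Next"]["seed"][0]["Argument"][-3:])
```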

First things first: access the list of dicts that need to be updated.
According to your given structure, that's l[0]["Next"]["seed"][0]["Argument"].
Then iterate over that list and check whether label already exists; if it does not, add it as an empty list.
This can be done by explicit checking:
if "label" not in i:
    i["label"] = []
or by re-assigning:
i["label"] = i.get("label", [])
Full Code:
import pprint
l = [ {
"Next" : {
"seed" : [ {
"Argument" : [ {
"id" : 4,
"label" : "org"
}, {
"id" : "I"
}, {
"word" : "He",
"seed" : 2,
"id" : 3,
"label" : "object"
}, {
"word" : "Gets",
"seed" : 9,
"id" : 2,
"label" : "verb"
} ]
} ],
"Next" : "he,get",
"time" : ""
} }]
# access the list of dict that needs to be updated
l2 = l[0]["Next"]["seed"][0]["Argument"]
for i in l2:
    i["label"] = i.get("label", []) # use the existing label or add an empty list
pprint.pprint(l)
Output:
[{'Next': {'Next': 'he,get',
'seed': [{'Argument': [{'id': 4, 'label': 'org'},
{'id': 'I', 'label': []},
{'id': 3,
'label': 'object',
'seed': 2,
'word': 'He'},
{'id': 2,
'label': 'verb',
'seed': 9,
'word': 'Gets'}]}],
'time': ''}}]
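A third spelling of the same idea is dict.setdefault, which inserts the key only when it is missing and leaves existing labels alone. A minimal sketch on a simplified Argument list (the entry with id 9 is invented sample data, added to show that each dict gets its own list):

```python
arguments = [
    {"id": 4, "label": "org"},
    {"id": "I"},
    {"id": 9},  # invented second entry without a label
]

for item in arguments:
    # The [] literal is evaluated on every iteration, so no list is shared.
    item.setdefault("label", [])

arguments[1]["label"].append("pronoun")  # only touches the second dict
print(arguments)
```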

You have a list with one nested dictionary. Get the list of the inner dicts and iterate over it. Assuming your initial data structure is named data:
dict_list = data[0]['Next']['seed'][0]['Argument']
for item in dict_list:
    item['label'] = input()

python change value in nested dictionary if condition is met

I know similar questions have already been asked before, but I am really having problems implementing them for my special case:
Let's say I have a dictionary with varying depths, for example:
dicti = {'files':
{'a':{'offset':100, 'start': 0},
'b':{
'c':{'offset':50, 'start':0},
'd':{'offset':70, 'start':0}
},
'e':{
'f':{'offset':80, 'start':0},
'g':{'offset':30, 'start':0},
'h':{'offset':20, 'start':0}
}
}
}
etc... (with a lot more different levels and entries)
So now I want a copy of that dictionary with the same structure and keys, but wherever 'offset' (at any level) is greater than, let's say, 50, 'offset' should be changed to 0.
I guess some kind of recursive function would be best, but I cannot get my head around it...

You might use the standard machinery for the copy and then modify the copied dictionary (solution #1 in my example), or you might do copying and modification in the same function (solution #2).
In either case, you're looking for a recursive function.
import copy
from pprint import pprint
dicti = {'files':
{'a':{'offset':100, 'start': 0},
'b':{
'c':{'offset':50, 'start':0},
'd':{'offset':70, 'start':0},
},
'e':{
'f':{'offset':80, 'start':0},
'g':{'offset':30, 'start':0},
'h':{'offset':20, 'start':0},
}
}
}
# Solution 1, two passes
def modify(d):
if isinstance(d, dict):
if d.get('offset', 0) > 50:
d['offset'] = 0
for k,v in d.items():
modify(v)
dictj = copy.deepcopy(dicti)
modify(dictj)
pprint(dictj)
# Solution 2, copy and modify in one pass
def copy_and_modify(d):
if isinstance(d, dict):
d2 = {k:copy_and_modify(v) for k,v in d.items()}
if d2.get('offset') > 50:
d2['offset'] = 0
return d2
return d
dictj = copy_and_modify(dicti)
pprint(dictj)

A recursive solution is going to be more intuitive. You want something like the following:
def copy_dict(d):
    new_dict = {}
    for key, value in d.items():
        if isinstance(value, dict):
            # nested dictionary: copy it recursively
            new_dict[key] = copy_dict(value)
        elif key == 'offset' and value > 50:
            new_dict[key] = 0
        else:
            new_dict[key] = value
    return new_dict
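The recursive copy above assumes every nested value is either a dict or a plain value. If the real data also contains lists of dicts, the same idea can be extended as in the sketch below. The function name reset_large_offsets and the limit parameter are illustrative choices, and offsets are assumed to be numeric.

```python
def reset_large_offsets(node, limit=50):
    """Return a copy of `node` in which every 'offset' above `limit` becomes 0."""
    if isinstance(node, dict):
        return {
            key: 0 if key == "offset" and isinstance(value, (int, float)) and value > limit
            else reset_large_offsets(value, limit)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [reset_large_offsets(item, limit) for item in node]
    return node  # plain values are returned unchanged


dicti = {"files": {
    "a": {"offset": 100, "start": 0},
    "b": {"c": {"offset": 50, "start": 0}, "d": {"offset": 70, "start": 0}},
    "e": {"f": {"offset": 80, "start": 0},
          "g": {"offset": 30, "start": 0},
          "h": {"offset": 20, "start": 0}},
}}

dictj = reset_large_offsets(dicti)
print(dictj)
```

Because new dicts and lists are built at every level, dicti itself is left unchanged.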

d = {'files':
{'a':{'offset':100, 'start': 0},
'b':{
'c':{'offset':50, 'start':0},
'd':{'offset':70, 'start':0}
},
'e':{
'f':{'offset':80, 'start':0},
'g':{'offset':30, 'start':0},
'h':{'offset':20, 'start':0}
}
}
}
def transform(item):
    new_item = item.copy() # consider usage of deepcopy if needed
    if new_item['offset'] == 80:
        new_item['offset'] = 'CHANGED'
    return new_item
def visit(item):
    if 'offset' in item: # membership test, so an offset of 0 is still treated as a leaf
        return transform(item)
    else:
        return {k: visit(v) for k, v in item.items()}
result = visit(d)
print(result)
Output:
{
'files': {
'b': {
'd': {
'offset': 70,
'start': 0
},
'c': {
'offset': 50,
'start': 0
}
},
'e': {
'g': {
'offset': 30,
'start': 0
},
'h': {
'offset': 20,
'start': 0
},
'f': {
'offset': 'CHANGED',
'start': 0
}
},
'a': {
'offset': 100,
'start': 0
}
}
}
You can review some background on the concepts used in this answer:
Recursion
Visitor pattern

You could call a recursive function to change its value once condition is met:
dicti = {'files':
{'a':{'offset':100, 'start': 0},
'b':{
'c':{'offset':50, 'start':0},
'd':{'offset':70, 'start':0}
},
'e':{
'f':{'offset':80, 'start':0},
'g':{'offset':30, 'start':0},
'h':{'offset':20, 'start':0}
}
}
}
def dictLoop(dt):
    for k, v in dt.items():
        if isinstance(v, int):
            if k == 'offset' and v > 50:
                dt[k] = 0
        else: dictLoop(v)
    return dt
print(dictLoop(dicti))

replace information in Json string based on a condition

I have a very large JSON file with several nested keys. From what I've read so far, if you do:
x = json.loads(data)
Python will interpret it as a dictionary (correct me if I'm wrong). The fourth level of nesting in the json file contains several elements named by an ID number and all of them contain an element called children, something like this:
{"level1":
{"level2":
{"level3":
{"ID1":
{"children": [1,2,3,4,5]}
}
{"ID2":
{"children": []}
}
{"ID3":
{"children": [6,7,8,9,10]}
}
}
}
}
What I need to do is to replace all items in all the "children" elements with nothing, meaning "children": [] if the ID number is in a list called new_ids and then convert it back to json. I've been reading on the subject for a few hours now but I haven't found anything similar to this to try to help myself.
I'm running Python 3.3.3. Any ideas are greatly appreciated!!
Thanks!!
EDIT
List:
new_ids=["ID1","ID3"]
Expected result:
{"level1":
{"level2":
{"level3":
{"ID1":
{"children": []}
}
{"ID2":
{"children": []}
}
{"ID3":
{"children": []}
}
}
}
}

First of all, your JSON is invalid. I assume you want this:
{"level1":
{"level2":
{"level3":
{
"ID1":{"children": [1,2,3,4,5]},
"ID2":{"children": []},
"ID3":{"children": [6,7,8,9,10]}
}
}
}
}
Now, load your data as a dictionary:
>>> with open('file', 'r') as f:
...     x = json.load(f)
...
>>> x
{u'level1': {u'level2': {u'level3': {u'ID2': {u'children': []}, u'ID3': {u'children': [6, 7, 8, 9, 10]}, u'ID1': {u'children': [1, 2, 3, 4, 5]}}}}}
Now you can loop over the keys in x['level1']['level2']['level3'] and check whether they are in your new_ids.
>>> new_ids=["ID1","ID3"]
>>> for key in x['level1']['level2']['level3']:
...     if key in new_ids:
...         x['level1']['level2']['level3'][key]['children'] = []
...
>>> x
{u'level1': {u'level2': {u'level3': {u'ID2': {u'children': []}, u'ID3': {u'children': []}, u'ID1': {u'children': []}}}}}
You can now write x back to a file like this:
with open('myfile', 'w') as f:
    f.write(json.dumps(x))
If your new_ids list is large, consider making it a set.
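The set suggestion can be sketched as follows. Dictionary key views support set operations, so intersecting them with the set of ids visits only the ids that are actually present. The JSON string stands in for the file contents, and "ID9" is an invented id added to show that missing ids are skipped.

```python
import json

# Stand-in for the file contents (corrected JSON from this answer).
raw = ('{"level1": {"level2": {"level3": {'
       '"ID1": {"children": [1, 2, 3, 4, 5]}, '
       '"ID2": {"children": []}, '
       '"ID3": {"children": [6, 7, 8, 9, 10]}}}}}')

new_ids = {"ID1", "ID3", "ID9"}  # a set; "ID9" is not in the data

data = json.loads(raw)
level3 = data["level1"]["level2"]["level3"]

# keys() & set yields only the ids present on both sides.
for key in level3.keys() & new_ids:
    level3[key]["children"] = []

result_json = json.dumps(data)  # back to a JSON string
print(result_json)
```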

If you have a simple dictionary like this
data_dict = {
"level1": {
"level2":{
"level3":{
"ID1":{"children": [1,2,3,4,5]},
"ID2":{"children": [] },
"ID3":{"children": [6,7,8,9,10]},
}
}
}
}
then you need only this:
data_dict = {
"level1": {
"level2":{
"level3":{
"ID1":{"children": [1,2,3,4,5]},
"ID2":{"children": [] },
"ID3":{"children": [6,7,8,9,10]},
}
}
}
}
new_ids=["ID1","ID3"]
for idx in new_ids:
    if idx in data_dict['level1']["level2"]["level3"]:
        data_dict['level1']["level2"]["level3"][idx]['children'] = []
print(data_dict)
'''
{
'level1': {
'level2': {
'level3': {
'ID2': {'children': []},
'ID3': {'children': []},
'ID1': {'children': []}
}
}
}
}
'''
but if you have more complicated dictionary
data_dict = {
"level1a": {
"level2a":{
"level3a":{
"ID2":{"children": [] },
"ID3":{"children": [6,7,8,9,10]},
}
}
},
"level1b": {
"level2b":{
"level3b":{
"ID1":{"children": [1,2,3,4,5]},
}
}
}
}
new_ids =["ID1","ID3"]
for level1 in data_dict.values():
    for level2 in level1.values():
        for level3 in level2.values():
            for idx in new_ids:
                if idx in level3:
                    level3[idx]['children'] = []
print(data_dict)
'''
{
'level1a': {
'level2a': {
'level3a': {
'ID2': {'children': []},
'ID3': {'children': []}
}
}
},
'level1b': {
'level2b': {
'level3b': {
'ID1': {'children': []}
}
}
}
}
'''
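The nested loops above assume the ids always sit exactly three levels down. If the depth can vary, a recursive walk is one possible alternative. This is a sketch: clear_children is a hypothetical helper, and it treats any key found in ids whose value is a dict containing "children" as a match, wherever it appears.

```python
def clear_children(node, ids):
    """Empty the 'children' list of every entry whose key is in `ids` (in place)."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in ids and isinstance(value, dict) and "children" in value:
                value["children"] = []
            else:
                clear_children(value, ids)
    elif isinstance(node, list):
        for item in node:
            clear_children(item, ids)


data_dict = {
    "level1a": {"level2a": {"level3a": {
        "ID2": {"children": []},
        "ID3": {"children": [6, 7, 8, 9, 10]},
    }}},
    "level1b": {"level2b": {"level3b": {
        "ID1": {"children": [1, 2, 3, 4, 5]},
    }}},
}

clear_children(data_dict, {"ID1", "ID3"})
print(data_dict)
```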
