Group/Count list of dictionaries based on value - python

I've got a list of Tokens which looks something like:
[{
Value: "Blah",
StartOffset: 0,
EndOffset: 4
}, ... ]
What I want to do is get a count of how many times each value occurs in the list of tokens.
In VB.Net I'd do something like...
Tokens = Tokens.
GroupBy(Function(x) x.Value).
Select(Function(g) New With {
.Value = g.Key,
.Count = g.Count})
What's the equivalent in Python?

IIUC, you can use collections.Counter:
>>> from collections import Counter
>>> tokens = [{"Value": "Blah", "SO": 0}, {"Value": "zoom", "SO": 5}, {"Value": "Blah", "SO": 2}, {"Value": "Blah", "SO": 3}]
>>> Counter(tok['Value'] for tok in tokens)
Counter({'Blah': 3, 'zoom': 1})
if you only need a count. If you want them grouped by the value, you could use itertools.groupby and something like:
>>> from itertools import groupby
>>> def keyfn(x):
...     return x['Value']
...
>>> [(k, list(g)) for k,g in groupby(sorted(tokens, key=keyfn), keyfn)]
[('Blah', [{'SO': 0, 'Value': 'Blah'}, {'SO': 2, 'Value': 'Blah'}, {'SO': 3, 'Value': 'Blah'}]), ('zoom', [{'SO': 5, 'Value': 'zoom'}])]
although it's a little trickier because groupby requires the grouped terms to be contiguous, and so you have to sort by the key first.
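The contiguity requirement is easy to trip over. As a minimal sketch reusing the tokens list from this answer (the helper name group_by_value is hypothetical), this shows what groupby does on unsorted input and how a collections.defaultdict can group without sorting first:

```python
from collections import defaultdict
from itertools import groupby

tokens = [
    {"Value": "Blah", "SO": 0},
    {"Value": "zoom", "SO": 5},
    {"Value": "Blah", "SO": 2},
    {"Value": "Blah", "SO": 3},
]

# Without sorting, groupby starts a new group every time the key changes,
# so "Blah" appears as two separate groups.
unsorted_keys = [k for k, _ in groupby(tokens, key=lambda t: t["Value"])]


def group_by_value(items):
    """Group dicts by their 'Value' key without sorting (hypothetical helper)."""
    groups = defaultdict(list)
    for item in items:
        groups[item["Value"]].append(item)
    return dict(groups)


grouped = group_by_value(tokens)
counts = {value: len(members) for value, members in grouped.items()}
```

Here unsorted_keys is ['Blah', 'zoom', 'Blah'], while counts maps 'Blah' to 3 and 'zoom' to 1. The defaultdict version makes a single pass and keeps the members of each group in their original order.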

Let's assume this is your Python list, containing dictionaries:
my_list = [{'Value': 'Blah',
'StartOffset': 0,
'EndOffset': 4},
{'Value': 'oqwij',
'StartOffset': 13,
'EndOffset': 98},
{'Value': 'Blah',
'StartOffset': 6,
'EndOffset': 18}]
A one-liner:
len([i for i in my_list if i['Value'] == 'Blah'])  # returns 2

import collections
# example token list
tokens = [{'Value':'Blah', 'Start':0}, {'Value':'BlahBlah'}]
count = collections.Counter(d['Value'] for d in tokens)
print(count)
shows
Counter({'Blah': 1, 'BlahBlah': 1})
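To get something closer to the Value/Count records that the VB.Net query in the question produces, the Counter can be turned into a list of dictionaries. This is only a sketch: the third token and the "Count" key name are assumptions added for illustration.

```python
from collections import Counter

# Example token list; the third entry is added so that one value repeats.
tokens = [
    {"Value": "Blah", "Start": 0},
    {"Value": "BlahBlah"},
    {"Value": "Blah", "Start": 9},
]

counts = Counter(d["Value"] for d in tokens)

# most_common() yields (value, count) pairs, highest count first.
summary = [{"Value": value, "Count": n} for value, n in counts.most_common()]
```

summary is then [{'Value': 'Blah', 'Count': 2}, {'Value': 'BlahBlah', 'Count': 1}].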

token = [{
'Value': "Blah",
'StartOffset': 0,
'EndOffset': 4
}, ... ]
value_counter = {}
for t in token:
    v = t['Value']
    if v not in value_counter:
        value_counter[v] = 0
    value_counter[v] += 1
print(value_counter)

Another option is to convert the data to a pandas DataFrame and then aggregate it, like this:
import pandas as pd
df = pd.DataFrame(data)
df.groupby('key')['value'].count()
df.groupby('key')['value'].sum()

Related

Convert pandas.DataFrame to list of dictionaries in Python

I have a dictionary which is converted from a dataframe as below :
a = d.to_json(orient='index')
Dictionary :
{"0":{"yr":2017,"PKID":"58306, 57011","Subject":"ABC","ID":"T001"},"1":{"yr":2018,"PKID":"1234,54321","Subject":"XYZ","ID":"T002"}}
What I need is for it to be in a list, essentially a list of dictionaries.
So I just wrap it in [] because that is the format used in the rest of the code.
input_dict = [a]
input_dict :
['{"0":{"yr":2017,"PKID":"58306, 57011","Subject":"ABC","ID":"T001"},"1":{"yr":2018,"PKID":"1234,54321","Subject":"XYZ","ID":"T002"}}']
I need to remove the single quotes just after the [ and just before the ], and to have the PKID values in the form of a list.
How can this be achieved?
Expected output:
[ {"yr":2017,"PKID":[58306, 57011],"Subject":"ABC","ID":"T001"}, {"yr":2018,"PKID":[1234,54321],"Subject":"XYZ","ID":"T002"} ]
NOTE: The PKID column holds multiple integer values, which have to come out as a list of integers; a string is not acceptable.
So we need "PKID":[58306, 57011] and not "PKID":"[58306, 57011]".
pandas.DataFrame.to_json returns a string (JSON string), not a dictionary. Try to_dict instead:
>>> df
col1 col2
0 1 3
1 2 4
>>> [df.to_dict(orient='index')]
[{0: {'col1': 1, 'col2': 3}, 1: {'col1': 2, 'col2': 4}}]
>>> df.to_dict(orient='records')
[{'col1': 1, 'col2': 3}, {'col1': 2, 'col2': 4}]
Here is one way:
from collections import OrderedDict
d = {"0":{"yr":2017,"PKID":"58306, 57011","Subject":"ABC","ID":"T001"},"1":{"yr":2018,"PKID":"1234,54321","Subject":"XYZ","ID":"T002"}}
list(OrderedDict(sorted(d.items())).values())
# [{'ID': 'T001', 'PKID': '58306, 57011', 'Subject': 'ABC', 'yr': 2017},
# {'ID': 'T002', 'PKID': '1234,54321', 'Subject': 'XYZ', 'yr': 2018}]
Note the ordered dictionary is ordered by text string keys, as supplied. You may wish to convert these to integers first before any processing via d = {int(k): v for k, v in d.items()}.
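The key-ordering caveat only shows up once there are ten or more rows. A small sketch with made-up data (the dictionary below is hypothetical, not the asker's) illustrates the difference:

```python
# Hypothetical rows keyed by string indices, as they come out of a JSON string.
d = {"10": {"ID": "T010"}, "2": {"ID": "T002"}, "1": {"ID": "T001"}}

# Sorting the string keys is lexicographic, so "10" sorts before "2".
by_text = [d[k]["ID"] for k in sorted(d)]

# Converting the keys to int first gives numeric order.
d_int = {int(k): v for k, v in d.items()}
by_number = [d_int[k]["ID"] for k in sorted(d_int)]
```

by_text is ['T001', 'T010', 'T002'], whereas by_number is ['T001', 'T002', 'T010'].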
You are converting your dictionary to JSON, which is a string. Then you wrap the resulting string in a list. So, naturally, the result is a string inside a list.
Try instead: [d], where d is your raw dictionary (not converted to JSON).
You can use a list comprehension
Ex:
d = {"0":{"yr":2017,"PKID":"58306, 57011","Subject":"ABC","ID":"T001"},"1":{"yr":2018,"PKID":"1234,54321","Subject":"XYZ","ID":"T002"}}
print [{k: v} for k, v in d.items()]
Output:
[{'1': {'PKID': '1234,54321', 'yr': 2018, 'ID': 'T002', 'Subject': 'XYZ'}}, {'0': {'PKID': '58306, 57011', 'yr': 2017, 'ID': 'T001', 'Subject': 'ABC'}}]
What about something like this:
from operator import itemgetter
d = {"0":{"yr":2017,"PKID":"58306, 57011","Subject":"ABC","ID":"T001"},"1":
{"yr":2018,"PKID":"1234,54321","Subject":"XYZ","ID":"T002"}}
sorted_d = sorted(d.items(), key=lambda x: int(x[0]))
print(list(map(itemgetter(1), sorted_d)))
Which Outputs:
[{'yr': 2017, 'PKID': '58306, 57011', 'Subject': 'ABC', 'ID': 'T001'},
{'yr': 2018, 'PKID': '1234,54321', 'Subject': 'XYZ', 'ID': 'T002'}]
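None of the snippets above turns PKID into a list of integers, which the question also asks for. One possible sketch, assuming you start from the JSON string produced by to_json(orient='index') and that every PKID is a comma-separated string of integers (the helper name to_records is hypothetical):

```python
import json

# The JSON string from the question, i.e. the result of d.to_json(orient='index').
a = ('{"0":{"yr":2017,"PKID":"58306, 57011","Subject":"ABC","ID":"T001"},'
     '"1":{"yr":2018,"PKID":"1234,54321","Subject":"XYZ","ID":"T002"}}')


def to_records(json_text):
    """Turn index-oriented JSON into a list of row dicts (hypothetical helper)."""
    rows_by_index = json.loads(json_text)
    records = []
    # Sort the string indices numerically so row order is preserved.
    for index in sorted(rows_by_index, key=int):
        row = dict(rows_by_index[index])
        # "58306, 57011" -> [58306, 57011]; int() ignores surrounding spaces.
        row["PKID"] = [int(part) for part in row["PKID"].split(",")]
        records.append(row)
    return records


input_dict = to_records(a)
```

If the original DataFrame is still available, calling to_dict(orient='records') on it and then splitting PKID the same way avoids the round trip through JSON.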

How to remove a json string from list in python without losing data

My question is similar to that of another question in SO How to remove a json string from list.
The solution to that question did solve part of my problem, but mine is a little different.
My lists are:
list1 = [{"ID": 1, "data": "12"},{"ID": 2, "data": "13"}]
list2 = [{"ID": 1, "col": "5"},{"ID": 1, "col": "8"},{"ID": 2,"col": "2"}]
I did the following to modify the final list:
from itertools import chain
per_id = {}
for info in chain(list1, list2):
    per_id.setdefault(info['ID'], {}).update(info)
output = list(per_id.values())
The expected output was:
output = [{"ID": 1,"data": "12", "col": "5"},{"ID": 1,"data": "12", "col": "8"},{"ID": 2,"data": "13","col": "2"}]
But the output I got is:
output = [{"ID": 1,"data": "12", "col": "5"},{"ID": 2,"data": "13","col": "2"}]
Is there a way to rectify this problem?
You get the second version because there is more than one "ID" with the value 1, so each update overwrites the previous one. If you use a defaultdict with a list, you can append to it instead.
This example is taken directly from the collections.defaultdict documentation:
>>> from collections import defaultdict
>>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
>>> d = defaultdict(list)
>>> for k, v in s:
...     d[k].append(v)
...
>>> sorted(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
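Applied to the lists from the question, the same defaultdict pattern might look like the sketch below. The variable names rows_by_id, base and extra are illustrative, and the merge assumes list1 holds one row per ID.

```python
from collections import defaultdict

list1 = [{"ID": 1, "data": "12"}, {"ID": 2, "data": "13"}]
list2 = [{"ID": 1, "col": "5"}, {"ID": 1, "col": "8"}, {"ID": 2, "col": "2"}]

# Collect every list2 row under its ID instead of overwriting the previous one.
rows_by_id = defaultdict(list)
for row in list2:
    rows_by_id[row["ID"]].append(row)

# Merge each list1 row with every list2 row that shares its ID.
output = []
for base in list1:
    for extra in rows_by_id[base["ID"]]:
        merged = dict(base)   # copy, so the original lists stay unchanged
        merged.update(extra)
        output.append(merged)
```

This yields three merged rows, two for ID 1 and one for ID 2, and looks up matches by key rather than comparing every pair of rows.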
Try itertools.combinations:
from itertools import chain
from itertools import combinations
list1 = [{"ID": 1, "data": "12"},{"ID": 2, "data": "13"}]
list2 = [{"ID": 1, "col": "5"},{"ID": 1, "col": "8"},{"ID": 2,"col": "2"}]
data = []
for i, j in combinations(chain(list1, list2), 2):
    if i['ID'] == j['ID']:
        d = dict(i)
        d.update(j)  # merge the two dicts (works on Python 2 and 3)
        # Keep only merges with three keys, i.e. skip pairs drawn from
        # list1 alone or from list2 alone.
        if len(d) == 3:
            data.append(d)
print(data)
Output (key order as printed by Python 3.7+):
[{'ID': 1, 'data': '12', 'col': '5'}, {'ID': 1, 'data': '12', 'col': '8'}, {'ID': 2, 'data': '13', 'col': '2'}]
The answer from Serge Ballesta works if you include a simple check to avoid repeating values (I would have left this as a comment, but I don't have enough reputation).
result = []  # start with an empty list
for elt1 in list1:
    for elt2 in list2:
        if elt1['ID'] == elt2['ID']:
            for k in elt2.keys():
                if k != "ID":
                    eltr = elt1.copy()  # take a copy to avoid changing original lists
                    eltr.update(elt2)
                    result.append(eltr)
result
Output:
[{'data': '12', 'ID': 1, 'col': '5'}, {'data': '12', 'ID': 1, 'col': '8'}, {'data': '13', 'ID': 2, 'col': '2'}]
If you have one list that contains some attributes for an ID and another one that contains other attributes, chaining is probably not the best idea.
Here you could simply iterate over both lists separately and update a copy of each dict from the first list with the matching dict from the second. Example in the Python console:
>>> result = []  # start with an empty list
>>> for elt1 in list1:
...     for elt2 in list2:
...         if elt1['ID'] == elt2['ID']:
...             eltr = elt1.copy()  # take a copy to avoid changing original lists
...             eltr.update(elt2)
...             result.append(eltr)
...
>>> result
[{'data': '12', 'ID': 1, 'col': '5'}, {'data': '12', 'ID': 1, 'col': '8'}, {'data': '13', 'ID': 2, 'col': '2'}]
as expected...

generate list from values of certain field in list of objects

How would I generate a list of values of a certain field of objects in a list?
Given the list of objects:
[ {name: "Joe", group: 1}, {name: "Kirk", group: 2}, {name: "Bob", group: 1}]
I want to generate list of the name field values:
["Joe", "Kirk", "Bob"]
The built-in filter() function seems to come close, but it will return the entire objects themselves.
I'd like a clean, one line solution such as:
filterLikeFunc(function(obj){return obj.name}, mylist)
Sorry, I know that's c syntax.
Just replace the built-in filter function with the built-in map function.
Use the dict get method to read the name key: unlike indexing, it does not raise a KeyError when the key is missing.
data = [{'name': "Joe", 'group': 1}, {'name': "Kirk", 'group': 2}, {'name': "Bob", 'group': 1}]
print map(lambda x: x.get('name'), data)
In Python 3.x
print(list(map(lambda x: x.get('name'), data)))
Results:
['Joe', 'Kirk', 'Bob']
Using List Comprehension:
print [each.get('name') for each in data]
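The difference between get and plain indexing only matters when a record lacks the key. A short sketch, where the third record deliberately has no name and the "<unknown>" default is an arbitrary choice:

```python
from operator import itemgetter

# The last record has no 'name' key, to show how each approach reacts.
data = [{"name": "Joe", "group": 1}, {"name": "Kirk", "group": 2}, {"group": 1}]

# dict.get returns None, or a default of your choice, when the key is absent.
names_or_none = [d.get("name") for d in data]
names_or_default = [d.get("name", "<unknown>") for d in data]

# Indexing (and operator.itemgetter, which indexes for you) raises KeyError.
try:
    list(map(itemgetter("name"), data))
    raised = False
except KeyError:
    raised = True
```

names_or_none ends with None and names_or_default ends with '<unknown>', while the itemgetter version fails on the third record, so raised is True. Which behaviour is preferable depends on whether a missing name should be tolerated or treated as a bug.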
Using a list comprehension approach you get:
objects = [{'group': 1, 'name': 'Joe'}, {'group': 2, 'name': 'Kirk'}, {'group': 1, 'name': 'Bob'}]
names = [i["name"] for i in objects]
For a good intro to list comprehensions, see https://docs.python.org/2/tutorial/datastructures.html
Just iterate over your list of dicts and pick out the name value and put them in a list.
x = [ {'name': "Joe", 'group': 1}, {'name': "Kirk", 'group': 2}, {'name': "Bob", 'group': 1}]
y = [y['name'] for y in x]
print(y)

Get all values from nested dictionaries in python

I have some dictionaries of dictionaries, like this:
a['b']['c']['d']['answer'] = answer1
a['b']['c']['e']['answer'] = answer2
a['b']['c']['f']['answer'] = answer3
....
a['b']['c']['d']['conf'] = conf1
a['b']['c']['e']['conf'] = conf2
a['b']['c']['f']['conf'] = conf3
Is there a fast way to get a list of values of all answers for all elements at the third level (d,e,f)?
Specifically I'd like to know if there's any mechanism implementing a wildcard (e.g., a['b']['c']['*']['answer'].values())
Update:
The fastest way I've found so far is:
[x['answer'] for x in a['b']['c'].values()]
In Python 3 we can build a simple generator for this:
def NestedDictValues(d):
    for v in d.values():
        if isinstance(v, dict):
            yield from NestedDictValues(v)
        else:
            yield v
a={4:1,6:2,7:{8:3,9:4,5:{10:5},2:6,6:{2:7,1:8}}}
list(NestedDictValues(a))
The output (on Python 3.7+, where dicts preserve insertion order) is:
[1, 2, 3, 4, 5, 6, 7, 8]
which is all of the values.
You could use a simple list comprehension:
[a['b']['c'][key]['answer'] for key in a['b']['c'].keys()]
Out[11]: ['answer1', 'answer2', 'answer3']
If you want to get all the answers, conf, etc., you could do:
[[a['b']['c'][key][type] for key in a['b']['c'].keys()] for type in a['b']['c']['d'].keys()]
Out[15]: [['conf1', 'conf2', 'conf3'], ['answer1', 'answer2', 'answer3']]
I would do that using recursive generator function:
def d_values(d, depth):
    if depth == 1:
        for i in d.values():
            yield i
    else:
        for v in d.values():
            if isinstance(v, dict):
                for i in d_values(v, depth-1):
                    yield i
Example:
>>> list(d_values({1: {2: 3, 4: 5}}, 2))
[3, 5]
In your case, this would give a dictionary like {'answer': answer1, 'conf': conf1} as each item, so you can use:
list(d['answer'] for d in d_values(a, 3))
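The same recursive idea can be bent toward the wildcard syntax the question asks about. This is only a sketch: find_values and its wildcard parameter are hypothetical names, not part of any library, and a dictionary that really contains the key '*' could not be addressed literally.

```python
def find_values(d, path, wildcard="*"):
    """Yield every value reached by following path; wildcard matches any key."""
    if not path:
        yield d
        return
    if not isinstance(d, dict):
        return  # path continues but there is nothing left to descend into
    head, rest = path[0], path[1:]
    if head == wildcard:
        for child in d.values():
            for found in find_values(child, rest, wildcard):
                yield found
    elif head in d:
        for found in find_values(d[head], rest, wildcard):
            yield found


a = {"b": {"c": {
    "d": {"answer": "answer1", "conf": "conf1"},
    "e": {"answer": "answer2", "conf": "conf2"},
    "f": {"answer": "answer3", "conf": "conf3"},
}}}

answers = list(find_values(a, ["b", "c", "*", "answer"]))
```

answers then holds 'answer1', 'answer2' and 'answer3'. Paths that do not exist simply produce no values rather than raising KeyError.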
Just to give an answer to this topic, copying my solution from the "updating status" of my question:
[x['answer'] for x in a['b']['c'].values()]
Hope this can help.
list(map(lambda key: a['b']['c'][key]['answer'], a['b']['c'].keys()))
You can use a NestedDict. First, let me recreate your dictionary
>>> from ndicts.ndicts import NestedDict
>>> nd = NestedDict.from_product("b", "c", "def", ["answer", "conf"])
NestedDict({
'b': {
'c': {
'd': {'answer': None, 'conf': None},
'e': {'answer': None, 'conf': None},
'f': {'answer': None, 'conf': None}
}
}
})
Then use an empty string as a wildcard
>>> nd_extract = nd.extract["b", "c", "", "answer"]
>>> nd_extract
NestedDict({
'b': {
'c': {
'd': {'answer': None},
'e': {'answer': None},
'f': {'answer': None}
}
}
})
Finally get the values
>>> list(nd_extract.values())
[None, None, None]
To install ndicts
pip install ndicts

Python list to dictionary

I have two lists, one with the labels and the other with the data. For example:
label = ["first","second"]
list = [[1,2],[11,22]]
I need the result to be a list of dictionaries:
[ {
"first" : 1,
"second" : 2,
},
{
"first" : 11,
"second" : 22,
}
]
Is there a simple way to do that? Note that the labels and the data might vary, but the number of entries stays the same.
>>> label = ["first","second"]
>>> lists = [[1,2],[11,22]]
>>> [dict(zip(label, l)) for l in lists]
[{'second': 2, 'first': 1}, {'second': 22, 'first': 11}]
Try this:
>>> [dict(zip(label, e)) for e in list]
[{'second': 2, 'first': 1}, {'second': 22, 'first': 11}]
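One thing to keep in mind is that zip stops at the shorter input, so a row with too few values silently loses keys. If that matters, a length check can be added. This is a sketch; rows_to_dicts is a hypothetical helper name, and the data list is called rows here to avoid shadowing the built-in list.

```python
label = ["first", "second"]
rows = [[1, 2], [11, 22]]

# zip truncates: a short row produces a dict with missing keys, without an error.
truncated = dict(zip(label, [1]))


def rows_to_dicts(labels, rows):
    """Pair each row with the labels, rejecting rows of the wrong length."""
    result = []
    for row in rows:
        if len(row) != len(labels):
            raise ValueError(
                "expected %d values, got %d" % (len(labels), len(row))
            )
        result.append(dict(zip(labels, row)))
    return result


records = rows_to_dicts(label, rows)
```

truncated is {'first': 1}, while records contains one complete dictionary per row.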
