Selecting distinct keys and their counts from a dictionary series in Python

I have a pandas Series of dictionaries, with values like:
0 {AA:25,BB:31}
1 {CC:45,AA:3}
2 {BB:3,CD:4,AA:5}
I want to build a dictionary from it that maps each key to the number of dictionaries in the series it occurs in, like:
{AA:3,BB:2,CC:1,CD:1}

I doubt there is a "built-in" solution for this, so you'd have to iterate manually and count each key in every dictionary.
import pandas as pd
from collections import defaultdict
ser = pd.Series([{'AA':25,'BB':31},
{'CC':45,'AA':3},
{'BB':3,'CD':4,'AA':5}])
count = defaultdict(int)
for d in ser:
    for key in d:
        count[key] += 1
print(count)
# defaultdict(<class 'int'>, {'CC': 1, 'BB': 2, 'AA': 3, 'CD': 1})
You could also use Counter, however this looks rather "forced" in this situation:
import pandas as pd
from collections import Counter
total = Counter()
ser = pd.Series([{'AA':25,'BB':31},
{'CC':45,'AA':3},
{'BB':3,'CD':4,'AA':5}])
for d in ser:
    total.update(d.keys())
print(total)
# Counter({'AA': 3, 'BB': 2, 'CD': 1, 'CC': 1})

Turn your series into a series of lists of keys, sum those to create a single list of keys, and use a Counter:
In [23]: pd.Series([{'AA':25,'BB':31},{'CC':45,'AA':3},{'BB':3,'CD':4,'AA':5}])
Out[23]:
0 {'AA': 25, 'BB': 31}
1 {'AA': 3, 'CC': 45}
2 {'CD': 4, 'AA': 5, 'BB': 3}
dtype: object
In [24]: series = _
In [34]: from collections import Counter
In [35]: Counter(series.apply(lambda x: list(x.keys())).sum())
Out[35]: Counter({'AA': 3, 'BB': 2, 'CC': 1, 'CD': 1})
Or using generator expressions and flattening:
In [37]: Counter(k for d in series for k in d.keys())
Out[37]: Counter({'AA': 3, 'BB': 2, 'CC': 1, 'CD': 1})
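Both snippets above only iterate over the series' values, so the same counting works on any iterable of dicts. Here is a minimal pandas-free sketch, assuming a plain list stands in for the series; the helper name count_keys is hypothetical and not part of any library:

```python
from collections import Counter
from itertools import chain


def count_keys(dicts):
    """Count in how many dictionaries each key appears."""
    # Iterating a dict yields its keys, so chaining the dicts flattens all keys.
    return Counter(chain.from_iterable(dicts))


# A plain list stands in for the pandas Series here (assumption for illustration).
rows = [{'AA': 25, 'BB': 31}, {'CC': 45, 'AA': 3}, {'BB': 3, 'CD': 4, 'AA': 5}]
key_counts = count_keys(rows)
print(dict(key_counts))  # {'AA': 3, 'BB': 2, 'CC': 1, 'CD': 1}
```

Passing the Series itself should behave the same way, since iterating a Series yields its values.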

counter = dict()
for item in series:
    for key in item:
        # dict.get supplies 0 for a key that has not been seen yet
        counter[key] = counter.get(key, 0) + 1

Maybe it's a bit late but this is another way of doing it by using pandas built-in functions.
s = pd.Series([{'AA':25,'BB':31},
{'CC':45,'AA':3},
{'BB':3,'CD':4,'AA':5}])
# Expand each dict into a DataFrame row, count the non-NaN entries per column, and finally convert the result to a dict.
s.apply(pd.Series).count().to_dict()
Out[651]: {'AA': 3, 'BB': 2, 'CC': 1, 'CD': 1}
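To see why counting non-NaN cells gives the key counts, the same idea can be sketched without pandas. This is only an illustration of the mechanism, assuming None stands in for NaN and that no real value is None; the names columns, table and non_missing are made up for the example:

```python
rows = [{'AA': 25, 'BB': 31}, {'CC': 45, 'AA': 3}, {'BB': 3, 'CD': 4, 'AA': 5}]

# The union of all keys plays the role of the DataFrame's columns.
columns = sorted(set().union(*rows))

# One table row per dict; a missing key becomes None (standing in for NaN).
table = [[row.get(col) for col in columns] for row in rows]

# Count the non-missing cells in each column.
non_missing = {
    col: sum(cell is not None for cell in cells)
    for col, cells in zip(columns, zip(*table))
}
print(non_missing)  # {'AA': 3, 'BB': 2, 'CC': 1, 'CD': 1}
```

A key is present in a dictionary exactly when its cell is not missing, so the per-column count equals the number of dictionaries containing that key.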

Related

Python How to add all values in a 2d Dictionary and return a single dictionary with summed values?

Part of the program I am developing has a 2D dict of length n.
Dictionary Example:
test_dict = {
0: {'A': 2, 'B': 1, 'C': 5},
1: {'A': 3, 'B': 1, 'C': 2},
2: {'A': 1, 'B': 1, 'C': 1},
3: {'A': 4, 'B': 2, 'C': 5}
}
All of the dictionaries have the same keys but different values. I need to sum the values for each key across all of them, as in the desired outcome below.
I have tried to merge the dictionaries using the following:
new_dict = {}
for k, v in test_dict.items():
    new_dict.setdefault(k, []).append(v)
I also tried using:
new_dict = {**test_dict[0], **test_dict[1], **test_dict[2], **test_dict[3]}
Unfortunately, neither attempt produced the desired outcome.
Desired Outcome: outcome = {'A': 10, 'B': 5, 'C': 13}
How can I add all the values into a single dictionary?
Solution using pandas
Convert your dict to pandas.DataFrame and then do summation on columns and convert it back to dict.
import pandas as pd
df = pd.DataFrame.from_dict(test_dict, orient='index')
print(df.sum().to_dict())
Output:
{'A': 10, 'B': 5, 'C': 13}
Alternate solution
Use collections.Counter, whose update method adds the values of matching keys instead of overwriting them.
from collections import Counter
d = Counter()
for _, v in test_dict.items():
    d.update(v)  # adds each inner dict's values to the running totals
print(d)
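One detail worth knowing when summing with Counter: update() keeps every total, while Counter addition with + discards keys whose total is zero or negative. The sketch below reuses test_dict from the question; the with_negative example uses made-up data purely to show the difference:

```python
from collections import Counter

test_dict = {
    0: {'A': 2, 'B': 1, 'C': 5},
    1: {'A': 3, 'B': 1, 'C': 2},
    2: {'A': 1, 'B': 1, 'C': 1},
    3: {'A': 4, 'B': 2, 'C': 5},
}

# update() adds counts in place and keeps zero or negative totals.
totals = Counter()
for inner in test_dict.values():
    totals.update(inner)
print(dict(totals))  # {'A': 10, 'B': 5, 'C': 13}

# Counter addition gives the same totals for this data.
summed = sum(map(Counter, test_dict.values()), Counter())

# With + a key whose total is not positive is dropped from the result.
with_negative = Counter({'A': 1}) + Counter({'A': -1, 'B': 2})
print(dict(with_negative))  # {'B': 2}
```

If the values can be negative or cancel out, the update() loop is the safer of the two.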

How to sum all the values that belong to the same key?

I'm pulling data from the database. Assume I have something like this:
Product Name Quantity
a 3
a 5
b 2
c 7
I want to sum the Quantity for each Product Name, so this is what I want:
product = {'a':8, 'b':2, 'c':7 }
Here's what I'm trying to do after fetching the data from the database:
for row in result:
    product[row['product_name']] += row['quantity']
But this gives me 'a': 5 only, not 8.
Option 1: pandas
This is one way, assuming you begin with a pandas dataframe df. This solution has O(n log n) complexity.
product = df.groupby('Product Name')['Quantity'].sum().to_dict()
# {'a': 8, 'b': 2, 'c': 7}
The idea is you can perform a groupby operation, which produces a series indexed by "Product Name". Then use the to_dict() method to convert to a dictionary.
Option 2: collections.Counter
If you begin with a list or iterator of results, and wish to use a for loop, you can use collections.Counter for O(n) complexity.
from collections import Counter
result = [['a', 3],
['a', 5],
['b', 2],
['c', 7]]
product = Counter()
for row in result:
    product[row[0]] += row[1]
print(product)
# Counter({'a': 8, 'c': 7, 'b': 2})
Option 3: itertools.groupby
You can also use a dictionary comprehension with itertools.groupby. This requires sorting beforehand.
from itertools import groupby
res = {i: sum(list(zip(*j))[1]) for i, j in groupby(sorted(result), key=lambda x: x[0])}
# {'a': 8, 'b': 2, 'c': 7}
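The sorting step matters because itertools.groupby only merges adjacent items with equal keys. A small sketch with the rows deliberately out of order (the data and variable names are made up for illustration):

```python
from itertools import groupby
from operator import itemgetter

rows = [('a', 3), ('b', 2), ('a', 5), ('c', 7)]  # the two 'a' rows are not adjacent
by_name = itemgetter(0)

# Without sorting, 'a' forms two separate groups and the later one
# overwrites the earlier one in the dict comprehension.
unsorted_totals = {
    name: sum(qty for _, qty in group)
    for name, group in groupby(rows, key=by_name)
}
print(unsorted_totals)  # {'a': 5, 'b': 2, 'c': 7}

# Sorting by the grouping key first makes equal keys adjacent.
sorted_totals = {
    name: sum(qty for _, qty in group)
    for name, group in groupby(sorted(rows, key=by_name), key=by_name)
}
print(sorted_totals)  # {'a': 8, 'b': 2, 'c': 7}
```

Sorting makes this approach O(n log n), whereas the Counter loop in Option 2 needs a single pass.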
If you insist on using loops, you can do this:
# fake data to make the script runnable
result = [
{'product_name': 'a', 'quantity': 3},
{'product_name': 'a', 'quantity': 5},
{'product_name': 'b', 'quantity': 2},
{'product_name': 'c', 'quantity': 7}
]
# solution with defaultdict and loops
from collections import defaultdict
d = defaultdict(int)
for row in result:
    d[row['product_name']] += row['quantity']
print(dict(d))
The output:
{'a': 8, 'b': 2, 'c': 7}
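The reason defaultdict(int) helps is that a plain dict has no starting value for a key it has not seen, so += on such a key raises KeyError. A minimal sketch contrasting the two, using the same fake rows as above (the names plain, totals and missing_key are only for illustration):

```python
from collections import defaultdict

rows = [
    {'product_name': 'a', 'quantity': 3},
    {'product_name': 'a', 'quantity': 5},
    {'product_name': 'b', 'quantity': 2},
    {'product_name': 'c', 'quantity': 7},
]

# Plain dict: the very first += fails because 'a' is not in the dict yet.
plain = {}
missing_key = None
try:
    for row in rows:
        plain[row['product_name']] += row['quantity']
except KeyError as exc:
    missing_key = exc.args[0]
    print('KeyError for', repr(missing_key))  # KeyError for 'a'

# defaultdict(int): a missing key starts at int() == 0, so += just works.
totals = defaultdict(int)
for row in rows:
    totals[row['product_name']] += row['quantity']
print(dict(totals))  # {'a': 8, 'b': 2, 'c': 7}
```

With a plain dict, the equivalent is product[name] = product.get(name, 0) + quantity.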
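Since you mention pandas:
df.set_index('ProductName').Quantity.groupby(level=0).sum().to_dict()
Out[20]: {'a': 8, 'b': 2, 'c': 7}
Older pandas versions also accepted .Quantity.sum(level=0); the level argument of sum was deprecated in pandas 1.3 and removed in pandas 2.0.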
Store the rows as tuples and accumulate the sums in a dict.
Edit:
It is not clear whether the data mentioned is really a dataframe.
If it is, build the list of tuples with li = [tuple(x) for x in df.to_records(index=False)]
li = [('a', 3), ('a', 5), ('b', 2), ('c', 7)]
d = dict()
for key, val in li:
    val_old = 0
    if key in d:
        val_old = d[key]
    d[key] = val + val_old
print(d)
Output
{'a': 8, 'b': 2, 'c': 7}

Find summary statistics for a python dictionary with multiple values

Suppose I have a dictionary:
a_dic = {'file1':["a","b","c"],
'file2':["b","c","d"],
'file3':["c","d","e"]}
I want to write a function that returns a dictionary/dataframe counting how often each item occurs across the lists, like:
occurrence = {'a':1, 'b':2, 'c':3, 'd':2,'e':1}
With collections.Counter object and itertools.chain.from_iterable function:
import collections, itertools
a_dic = {'file1':["a","b","c"], 'file2':["b","c","d"], 'file3':["c","d","e"]}
result = dict(collections.Counter(itertools.chain.from_iterable(a_dic.values())))
print(result)
The output:
{'c': 3, 'e': 1, 'b': 2, 'd': 2, 'a': 1}
from collections import Counter
flat_list = [item for sublist in (a_dic.values()) for item in sublist]
print(Counter(flat_list))
Output
Counter({'c': 3, 'b': 2, 'd': 2, 'a': 1, 'e': 1})
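Both answers count every occurrence of an item. If a list can contain the same item more than once and the goal is the number of files that contain it, each list can be deduplicated first. A sketch wrapped as functions, as the question asks; the function names and the sample data are hypothetical:

```python
from collections import Counter


def count_items(files):
    """Total occurrences of each item across all lists."""
    return Counter(item for items in files.values() for item in items)


def count_files_containing(files):
    """Number of lists in which each item appears at least once."""
    return Counter(item for items in files.values() for item in set(items))


# Made-up data in which 'b' is repeated within file1.
sample = {'file1': ['a', 'b', 'b'], 'file2': ['b', 'c']}
print(dict(count_items(sample)))                       # {'a': 1, 'b': 3, 'c': 1}
print(sorted(count_files_containing(sample).items()))  # [('a', 1), ('b', 2), ('c', 1)]
```

For the a_dic in the question, where no list repeats an item, both functions return the same counts.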

Difference in syntax of a normal dictionary and OrderedDict

A normal dictionary prints as
{'a': 1, 'c': 3, 'b': 2}
whereas an OrderedDict prints as
OrderedDict([('a', 1), ('b', 2), ('c', 3)])
Is there any way I can print/return the OrderedDict in the normal dictionary format?
Use a custom __str__ method in your own OrderedDict subclass; in the custom method, you can build the string you want:
from collections import OrderedDict
class MyOrderedDict(OrderedDict):
    def __str__(self):
        # items() works on both Python 2 and Python 3 (iteritems() is Python 2 only)
        return "{%s}" % ", ".join([repr(k)+": "+str(v) for k, v in self.items()])
d = MyOrderedDict([('a', 1), ('b', 2), ('c', 3)])
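A short usage sketch of this approach on Python 3, under the assumption that only the output of print() and str() needs to change. The class name DictStyleOrderedDict is made up, and repr() is used for the values too so that string values keep their quotes:

```python
from collections import OrderedDict


class DictStyleOrderedDict(OrderedDict):
    """OrderedDict whose str() looks like a plain dict (hypothetical name)."""

    def __str__(self):
        # Format each pair the way a dict literal would show it.
        pairs = ("{!r}: {!r}".format(k, v) for k, v in self.items())
        return "{%s}" % ", ".join(pairs)


d = DictStyleOrderedDict([('a', 1), ('b', 2), ('c', 3)])
d.move_to_end('a')  # still a real OrderedDict, so ordering methods keep working
text = str(d)
print(text)  # {'b': 2, 'c': 3, 'a': 1}
```

Because only __str__ is overridden, repr(d) (for example in an interactive session, or when d sits inside a list) still shows the OrderedDict-style form.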
The easy way:
from collections import OrderedDict
a = {'a': 1, 'c': 3, 'b': 2}
new_dict = dict(OrderedDict(a))
print(new_dict)
print(type(new_dict))
Output:
{'a': 1, 'c': 3, 'b': 2}
<type 'dict'>
The hard way:
You can also convert the OrderedDict back to a plain dict using groupby from the itertools module, like this:
from collections import OrderedDict
from itertools import groupby
a = {'a': 1, 'c': 3, 'b': 2}
new_dict = {key:list(val)[0][1] for key, val in groupby(OrderedDict(a).items(), lambda x : x[0])}
print(new_dict)
print(type(new_dict))
Output:
{'c': 3, 'a': 1, 'b': 2}
<type 'dict'>
Edit:
I see that there are some downvotes because readers expect the output to be an ordered string. So, this is how to deal with it:
from collections import OrderedDict
a = {'a': 1, 'c': 3, 'b': 2}
new_dict = OrderedDict(a)
b = "{%s}" % ", ".join([str(key)+":"+str(val) for key, val in sorted(new_dict.items())])
print(b)
print(type(b))
Output:
{'a':1, 'b':2, 'c':3}
<type 'str'>
To print an OrderedDict the way a dict prints, you can convert it with dict(). Before Python 3.7 a plain dict does not preserve order, so there this only suits cases where the order of the printed items does not matter:
from collections import OrderedDict
def some_function():
    d = {'banana': 3, 'apple': 4, 'pear': 1, 'orange': 2}
    od = OrderedDict(sorted(d.items(), key=lambda t: t[0]))
    return od
my_ordered_dict_variable = some_function()
print('My Ordered Dict:', dict(my_ordered_dict_variable))
On Python 3.7+, where dict preserves insertion order, this code will print:
My Ordered Dict: {'apple': 4, 'banana': 3, 'orange': 2, 'pear': 1}

python sorting dictionary by length of values

I have found many threads for sorting by values like here but it doesn't seem to be working for me...
I have a dictionary of lists of tuples. Each list has a different number of tuples. I want to sort the dictionary by how many tuples each list contains.
>>> to_format
{"one":[(1,3),(1,4)],"two":[(1,2),(1,2),(1,3)],"three":[(1,1)]}
>>> for key in some_sort(to_format):
...     print key,
two one three
Is this possible?
>>> d = {"one": [(1,3),(1,4)], "two": [(1,2),(1,2),(1,3)], "three": [(1,1)]}
>>> for k in sorted(d, key=lambda k: len(d[k]), reverse=True):
...     print k,
two one three
Here is a universal solution that works on Python 2 & Python 3:
>>> print(' '.join(sorted(d, key=lambda k: len(d[k]), reverse=True)))
two one three
import operator
# named data rather than dict, so the built-in dict type is not shadowed
data = {'a': [9,2,3,4,5], 'b': [1,2,3,4, 5, 6], 'c': [], 'd': [1,2,3,4], 'e': [1,2]}
dict_temp = {'a': 'hello', 'b': 'bye', 'c': '', 'd': 'aa', 'e': 'zz'}  # string values work too, since len() applies to them
def sort_by_values_len(d):
    # map each key to the length of its value
    dict_len = {key: len(value) for key, value in d.items()}
    # sort the (key, length) pairs by length, longest first
    sorted_key_list = sorted(dict_len.items(), key=operator.itemgetter(1), reverse=True)
    # build one single-item dict per key, in sorted order
    sorted_dict = [{item[0]: d[item[0]]} for item in sorted_key_list]
    return sorted_dict
print(sort_by_values_len(data))
Output:
[{'b': [1, 2, 3, 4, 5, 6]}, {'a': [9, 2, 3, 4, 5]}, {'d': [1, 2, 3, 4]}, {'e': [1, 2]}, {'c': []}]
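If a single dictionary is preferred over a list of one-item dictionaries, the sorted items can be passed straight to dict(). This is a sketch that assumes Python 3.7 or newer, where a plain dict keeps insertion order; on older versions collections.OrderedDict would be needed instead. The name by_length is made up for the example:

```python
to_format = {"one": [(1, 3), (1, 4)], "two": [(1, 2), (1, 2), (1, 3)], "three": [(1, 1)]}

# Sort the (key, value) pairs by the length of the value, longest first,
# and rebuild a dict from them; insertion order is kept on Python 3.7+.
by_length = dict(sorted(to_format.items(), key=lambda kv: len(kv[1]), reverse=True))

print(list(by_length))  # ['two', 'one', 'three']
```

sorted() is stable, so keys whose values have the same length keep their original relative order.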
