Python: Count elements on Counter output

I have a nested list like this:
List1 = [['A','B','A','A'],['C','C','B','B'],['A','C','B','B']]  # ... and so on
I used collections.Counter to count the elements in each nested list:
from collections import Counter
for i, j in enumerate(List1):
    print(Counter(j))
I got the following output:
Counter({'A': 3, 'B': 1})
Counter({'C': 2, 'B': 2})
Counter({'B': 2, 'A': 1, 'C': 1})
....
I want to calculate the percentage of A in each Counter output:
A = number of A's / total number of elements
For example:
Counter({'A': 3, 'B': 1})
Would yield:
A = 3/4 = 0.75
I am not able to calculate A. Can anyone help me with this?

The following gives you a list of dictionaries holding both the count and the percentage for each entry:
from collections import Counter
List1 = [['A','B','A','A'],['C','C','B','B'],['A','C','B','B']]
counts = [Counter(x) for x in List1]
percentages = [{k: (v, v / float(len(l1))) for k, v in cc.items()} for l1, cc in zip(List1, counts)]
print(percentages)
Giving the following output:
[{'A': (3, 0.75), 'B': (1, 0.25)}, {'C': (2, 0.5), 'B': (2, 0.5)}, {'A': (1, 0.25), 'C': (1, 0.25), 'B': (2, 0.5)}]
For just the percentages:
from collections import Counter
List1 = [['A','B','A','A'],['C','C','B','B'],['A','C','B','B']]
counts = [Counter(x) for x in List1]
percentages = [{k: v / float(len(l1)) for k, v in cc.items()} for l1, cc in zip(List1, counts)]
print(percentages)
Giving:
[{'A': 0.75, 'B': 0.25}, {'C': 0.5, 'B': 0.5}, {'A': 0.25, 'C': 0.25, 'B': 0.5}]

This:
In [1]: l = [['A','B','A','A'],['C','C','B','B'],['A','C','B','B']]
In [2]: [{i: x.count(i)/float(len(x)) for i in x} for x in l]
Out[2]:
[{'A': 0.75, 'B': 0.25},
{'B': 0.5, 'C': 0.5},
{'A': 0.25, 'B': 0.5, 'C': 0.25}]

>>> for sublist in List1:
...     c = Counter(sublist)
...     print(c['A'] / sum(c.values()))
0.75
0.0
0.25
All values at once:
>>> for sublist in List1:
...     c = Counter(sublist)
...     s = sum(c.values())
...     print(c['A'] / s, c['B'] / s, c['C'] / s)
0.75 0.25 0.0
0.0 0.5 0.5
0.25 0.5 0.25
If you want to get a list of all items in a sublist with their respective percentages, you need to iterate the counter:
>>> for sublist in List1:
...     c = Counter(sublist)
...     s = sum(c.values())
...     for elem, count in c.items():
...         print(elem, count / s)
...     print()
A 0.75
B 0.25
B 0.5
C 0.5
A 0.25
B 0.5
C 0.25
Or use a dictionary comprehension:
>>> for sublist in List1:
...     c = Counter(sublist)
...     s = sum(c.values())
...     print({elem: count / s for elem, count in c.items()})
{'A': 0.75, 'B': 0.25}
{'B': 0.5, 'C': 0.5}
{'A': 0.25, 'B': 0.5, 'C': 0.25}
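If you only need the share of one element per sublist, the Counter approach above can be wrapped in a small helper. This is just a sketch: the name fraction_of is made up for illustration, and returning 0.0 for an empty sublist is an assumption about what you would want there.

```python
from collections import Counter


def fraction_of(items, target):
    """Return the share of `items` equal to `target` (hypothetical helper)."""
    counts = Counter(items)
    total = sum(counts.values())
    # Counter returns 0 for missing keys, so an absent target yields 0.0.
    # Assumption: an empty sublist should give 0.0 instead of raising.
    return counts[target] / total if total else 0.0


List1 = [['A', 'B', 'A', 'A'], ['C', 'C', 'B', 'B'], ['A', 'C', 'B', 'B']]
fractions = [fraction_of(sublist, 'A') for sublist in List1]
print(fractions)  # [0.75, 0.0, 0.25]
```

On Python 2 the division would need float(total), as in the other answers.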

You can use a list comprehension and the join method to turn your list of lists of characters into a list of strings.
>>> List1 = [['A', 'B', 'A', 'A'],['C', 'C', 'B', 'B'],['A', 'C', 'B', 'B']]
>>> [''.join(x) for x in List1]
['ABAA', 'CCBB', 'ACBB']
Then join that list again into a single string.
>>> ''.join(['ABAA', 'CCBB', 'ACBB'])
'ABAACCBBACBB'
And count the 'A' symbol, or any other.
>>> 'ABAACCBBACBB'.count('A')
4
This could be a one-liner solution:
>>> ''.join(''.join(x) for x in List1).count('A')
4
A string of symbols is an iterable type, just like a list. A list of strings is often more convenient than a list of lists of characters.
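The join approach assumes every element is a single-character string. A possible alternative, sketched here with itertools.chain instead of join (so it is a different technique from the one above), flattens the nested list and works for any hashable elements. It also turns the overall count into a share of the total; the variable names are only illustrative.

```python
from collections import Counter
from itertools import chain

List1 = [['A', 'B', 'A', 'A'], ['C', 'C', 'B', 'B'], ['A', 'C', 'B', 'B']]

# Flatten the nested list without converting anything to strings.
flat = list(chain.from_iterable(List1))

overall = Counter(flat)
share_of_a = overall['A'] / len(flat)  # 4 occurrences out of 12 elements

print(overall['A'], share_of_a)
```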

Related

Python Index Nested Dictionary with List

If I have a nested dictionary and varying lists:
d = {'a': {'b': {'c': {'d': 0}}}}
list1 = ['a', 'b']
list2 = ['a', 'b', 'c']
list3 = ['a', 'b', 'c', 'd']
How can I access dictionary values like so:
>>> d[list1]
{'c': {'d': 0}}
>>> d[list3]
0
You can use functools.reduce. Real Python has a nice post on reduce.
from functools import reduce
reduce(dict.get, list3, d)
# 0
EDIT: mix of lists and dictionaries
If the structure mixes list and dictionary values, the following works:
d = {'a': [{'b0': {'c': 1}}, {'b1': {'c': 1}}]}
list1 = ['a', 1, 'b1', 'c']
fun = lambda element, indexer: element[indexer]
reduce(fun, list1, d)
# 1
Use a short function:
def nested_get(d, lst):
    out = d
    for x in lst:
        out = out[x]
    return out
nested_get(d, list1)
# {'c': {'d': 0}}
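Both reduce and the short function raise an exception when the path does not exist. If you would rather get a fallback value, a variant along these lines might help. The name nested_get_or and its default parameter are hypothetical, not something from the answers above.

```python
def nested_get_or(d, keys, default=None):
    """Walk `keys` through nested dicts/lists; return `default` if the path breaks."""
    out = d
    for key in keys:
        try:
            out = out[key]
        except (KeyError, IndexError, TypeError):
            # KeyError: missing dict key; IndexError: list index out of range;
            # TypeError: tried to index into a non-container such as an int.
            return default
    return out


d = {'a': {'b': {'c': {'d': 0}}}}
print(nested_get_or(d, ['a', 'b', 'c', 'd']))           # 0
print(nested_get_or(d, ['a', 'x'], default='missing'))  # missing
```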

Is it possible to use two (non-nested) for loops inside a dictionary?

I have these two lists:
a = ['A', 'B', 'C']
b = [ 1 , 2 , 3 ]
And I want to merge them into a dictionary like this:
{'A': 1, 'B': 2, 'C': 3}
I already tried doing stuff like:
{i: j for i in a for j in b}
dict(*a: *b)
Which outputs
{'A': 3, 'B': 3, 'C': 3}
SyntaxError: invalid syntax
a = ['A', 'B', 'C']
b = [ 1 , 2 , 3 ]
print (dict(zip(a,b)))
Output:
{'A': 1, 'B': 2, 'C': 3}
It is better to use zip for this:
a = ['A', 'B', 'C']
b = [ 1 , 2 , 3 ]
{i:k for i,k in zip(a,b)}
#{'A': 1, 'B': 2, 'C': 3}
You can also use enumerate
d = {elem: b[i] for i, elem in enumerate(a)}
d
{'A': 1, 'B': 2, 'C': 3}
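To see why the first attempt in the question produced {'A': 3, 'B': 3, 'C': 3}: two for clauses in one comprehension are nested loops, so every key is assigned each value of b in turn and the last one wins. A small sketch that spells this out next to the zip version (the names nested, paired and truncated are only for illustration):

```python
a = ['A', 'B', 'C']
b = [1, 2, 3]

# Equivalent of {i: j for i in a for j in b}: the inner loop runs fully
# for every key, so each key ends up with the last element of b.
nested = {}
for i in a:
    for j in b:
        nested[i] = j

# zip walks both lists in lockstep, pairing items by position.
paired = dict(zip(a, b))

# Note: zip silently stops at the shorter input.
truncated = dict(zip(a, [1, 2]))

print(nested)     # {'A': 3, 'B': 3, 'C': 3}
print(paired)     # {'A': 1, 'B': 2, 'C': 3}
print(truncated)  # {'A': 1, 'B': 2}
```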

How to sum all the values that belong to the same key?

I'm pulling data from the database, and assume I have something like this:
Product Name    Quantity
a               3
a               5
b               2
c               7
I want to sum the Quantity based on Product Name, so this is what I want:
product = {'a':8, 'b':2, 'c':7 }
Here's what I'm trying to do after fetching the data from the database:
for row in result:
    product[row['product_name']] += row['quantity']
but this gives me 'a' = 5 only, not 8.
Option 1: pandas
This is one way, assuming you begin with a pandas dataframe df. This solution has O(n log n) complexity.
product = df.groupby('Product Name')['Quantity'].sum().to_dict()
# {'a': 8, 'b': 2, 'c': 7}
The idea is you can perform a groupby operation, which produces a series indexed by "Product Name". Then use the to_dict() method to convert to a dictionary.
Option 2: collections.Counter
If you begin with a list or iterator of results, and wish to use a for loop, you can use collections.Counter for O(n) complexity.
from collections import Counter
result = [['a', 3],
          ['a', 5],
          ['b', 2],
          ['c', 7]]
product = Counter()
for row in result:
    product[row[0]] += row[1]
print(product)
# Counter({'a': 8, 'c': 7, 'b': 2})
Option 3: itertools.groupby
You can also use a dictionary comprehension with itertools.groupby. This requires sorting beforehand.
from itertools import groupby
res = {i: sum(list(zip(*j))[1]) for i, j in groupby(sorted(result), key=lambda x: x[0])}
# {'a': 8, 'b': 2, 'c': 7}
If you insist on using loops, you can do this:
# fake data to make the script runnable
result = [
    {'product_name': 'a', 'quantity': 3},
    {'product_name': 'a', 'quantity': 5},
    {'product_name': 'b', 'quantity': 2},
    {'product_name': 'c', 'quantity': 7}
]
# solution with defaultdict and loops
from collections import defaultdict
d = defaultdict(int)
for row in result:
    d[row['product_name']] += row['quantity']
print(dict(d))
The output:
{'a': 8, 'b': 2, 'c': 7}
Since you mention pandas:
# Series.sum(level=0) was removed in pandas 2.0; group by the index level instead
df.set_index('ProductName').Quantity.groupby(level=0).sum().to_dict()
Out[20]: {'a': 8, 'b': 2, 'c': 7}
Use tuples to store the rows.
Edit:
It is not clear whether the data mentioned is really a dataframe.
If it is, then li = [tuple(x) for x in df.to_records(index=False)]
li = [('a', 3), ('a', 5), ('b', 2), ('c', 7)]
d = dict()
for key, val in li:
    val_old = 0
    if key in d:
        val_old = d[key]
    d[key] = val + val_old
print(d)
Output
{'a': 8, 'b': 2, 'c': 7}
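The loop in the question fails on a plain dict because += needs the key to exist already. If you want to stay with a plain dict and no imports, dict.get with a default of 0 is one way to write the same accumulation. This is a sketch assuming rows shaped like the dictionaries in the question; the list below is made-up sample data.

```python
# Sample rows standing in for the database result (assumed shape).
rows = [
    {'product_name': 'a', 'quantity': 3},
    {'product_name': 'a', 'quantity': 5},
    {'product_name': 'b', 'quantity': 2},
    {'product_name': 'c', 'quantity': 7},
]

product = {}
for row in rows:
    name = row['product_name']
    # get() returns 0 the first time a product name is seen,
    # so the running total starts from zero instead of raising KeyError.
    product[name] = product.get(name, 0) + row['quantity']

print(product)  # {'a': 8, 'b': 2, 'c': 7}
```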

Convert redundant array to dict (or JSON)?

Suppose I have an array:
[['a', 10, 1, 0.1],
['a', 10, 2, 0.2],
['a', 20, 2, 0.3],
['b', 10, 1, 0.4],
['b', 20, 2, 0.5]]
And I want a dict (or JSON):
{
    'a': {
        10: {1: 0.1, 2: 0.2},
        20: {2: 0.3}
    },
    'b': {
        10: {1: 0.4},
        20: {2: 0.5}
    }
}
Is there any good way or some library for this task?
In this example the array has just 4 columns, but my original array is more complicated (7 columns).
Currently I implement this naively:
import pandas as pd
df = pd.DataFrame(array)
grouped1 = df.groupby('column1')
for column1 in grouped1.groups:
    group1 = grouped1.get_group(column1)
    grouped2 = group1.groupby('column2')
    for column2 in grouped2.groups:
        group2 = grouped2.get_group(column2)
        ...
And the defaultdict way:
d = defaultdict(lambda x: defaultdict(lambda y: defaultdict ... ))
for row in array:
    d[row[0]][row[1]][row[2]... = row[-1]
But I think neither is smart.
I would suggest this rather simple solution:
from functools import reduce
data = [['a', 10, 1, 0.1],
        ['a', 10, 2, 0.2],
        ['a', 20, 2, 0.3],
        ['b', 10, 1, 0.4],
        ['b', 20, 2, 0.5]]
result = dict()
for row in data:
    reduce(lambda v, k: v.setdefault(k, {}), row[:-2], result)[row[-2]] = row[-1]
print(result)
{'a': {10: {1: 0.1, 2: 0.2}, 20: {2: 0.3}}, 'b': {10: {1: 0.4}, 20: {2: 0.5}}}
An actual recursive solution would be something like this:
def add_to_group(keys: list, group: dict):
    if len(keys) == 2:
        group[keys[0]] = keys[1]
    else:
        add_to_group(keys[1:], group.setdefault(keys[0], dict()))
result = dict()
for row in data:
    add_to_group(row, result)
print(result)
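The defaultdict idea from the question can also be made to work for any number of columns with a self-referencing factory. The following is only a sketch of that idea: tree is a made-up helper name, and the JSON step is included because the question mentions JSON as a possible target. Note that json.dumps turns the integer keys into strings.

```python
import json
from collections import defaultdict


def tree():
    """A defaultdict whose missing keys are themselves new trees (hypothetical helper)."""
    return defaultdict(tree)


data = [['a', 10, 1, 0.1],
        ['a', 10, 2, 0.2],
        ['a', 20, 2, 0.3],
        ['b', 10, 1, 0.4],
        ['b', 20, 2, 0.5]]

root = tree()
for row in data:
    # All columns but the last two form the path; the last two are key and value.
    *path, last_key, value = row
    node = root
    for key in path:
        node = node[key]  # creates intermediate levels on demand
    node[last_key] = value

# defaultdict is a dict subclass, so json can serialize it directly.
as_json = json.dumps(root, sort_keys=True)
print(as_json)
```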
Introduction
Here is a recursive solution. The base case is when you have a list of 2-element lists (or tuples), in which case dict() does what we want:
>>> dict([(1, 0.1), (2, 0.2)])
{1: 0.1, 2: 0.2}
For other cases, we will remove the first column and recurse down until we get to the base case.
The code:
from itertools import groupby
def rows2dict(rows):
    if len(rows[0]) == 2:
        # e.g. [(1, 0.1), (2, 0.2)] ==> {1: 0.1, 2: 0.2}
        return dict(rows)
    else:
        dict_object = dict()
        for column1, grouped_rows in groupby(rows, lambda x: x[0]):
            rows_without_first_column = [x[1:] for x in grouped_rows]
            dict_object[column1] = rows2dict(rows_without_first_column)
        return dict_object
if __name__ == '__main__':
    rows = [['a', 10, 1, 0.1],
            ['a', 10, 2, 0.2],
            ['a', 20, 2, 0.3],
            ['b', 10, 1, 0.4],
            ['b', 20, 2, 0.5]]
    dict_object = rows2dict(rows)
    print(dict_object)
Output
{'a': {10: {1: 0.1, 2: 0.2}, 20: {2: 0.3}}, 'b': {10: {1: 0.4}, 20: {2: 0.5}}}
Notes
We use the itertools.groupby generator to simplify grouping of similar rows based on the first column
For each group of rows, we remove the first column and recurse down
This solution assumes that the rows variable has 2 or more columns. The result is unpredictable for rows that have 0 or 1 columns.
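One more thing worth keeping in mind with itertools.groupby: it only groups consecutive rows with the same key, so the input should be sorted (or already grouped) by the leading columns. A small sketch showing the difference, using made-up unsorted rows:

```python
from itertools import groupby

# Hypothetical input where the 'a' rows are not adjacent.
unsorted_rows = [['a', 10, 1, 0.1],
                 ['b', 10, 1, 0.4],
                 ['a', 20, 2, 0.3]]


def first_column(row):
    return row[0]


# Without sorting, 'a' shows up as two separate groups.
keys_unsorted = [key for key, _ in groupby(unsorted_rows, first_column)]

# Sorting first brings equal keys together, giving one group per key.
keys_sorted = [key for key, _ in groupby(sorted(unsorted_rows), first_column)]

print(keys_unsorted)  # ['a', 'b', 'a']
print(keys_sorted)    # ['a', 'b']
```

With unsorted input, a function like rows2dict would overwrite the first 'a' group with the second, so calling it on sorted(rows) is the safer choice.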

Extracting key from the nested dictionary [duplicate]

So I have this block of code
dictionary = {
    'key1': {'a': 1, 'b': 2, 'c': 10},
    'key2': {'d': 1, 'e': 1, 'c': 11},
    'key3': {'d': 2, 'b': 1, 'g': 12}}
and
list1 = ('a', 'b', 'c')
What I want to do is run a loop that finds the maximums of all the items in the list and returns the key. So for example, the maximum of 'c' would return 'key2', the maximum of 'b' would return 'key1', etc.
So far I have
for value in list1:
    m = max(dictionary, key=lambda v: dictionary[v][value])
    print(m + "\n")
But this only works if the same subkey exists in all keys in the dictionary. Any ideas on what to do?
Use float('-inf') when the key is missing:
m = max(dictionary, key=lambda v: dictionary[v].get(value, float('-inf')))
Negative infinity is guaranteed to be smaller than any existing value in the dictionaries, ensuring that nested dictionaries with the specific key missing are ignored.
Demo:
>>> dictionary = {
...     'key1': {'a': 1, 'b': 2, 'c': 10},
...     'key2': {'d': 1, 'e': 1, 'c': 11},
...     'key3': {'d': 2, 'b': 1, 'g': 12}}
>>> list1 = ('a', 'b', 'c')
>>> for value in list1:
...     print(value, max(dictionary, key=lambda v: dictionary[v].get(value, float('-inf'))))
...
a key1
b key1
c key2
However, it'll be more efficient if you looped over all your dictionary values just once instead:
maximi = dict.fromkeys(list1, (None, float('-inf')))
for key, nested in dictionary.items():
    for k in nested.keys() & maximi:  # intersection of keys
        if maximi[k][0] is None or dictionary[maximi[k][0]][k] < nested[k]:
            maximi[k] = (key, nested[k])
for value in list1:
    print(value, maximi[value][0])
That's presuming you are using Python 3; in Python 2, replace .items() with .iteritems() and .keys() with .viewkeys().
Demo:
>>> maximi = dict.fromkeys(list1, (None, float('-inf')))
>>> for key, nested in dictionary.items():
...     for k in nested.keys() & maximi:  # intersection of keys
...         if maximi[k][0] is None or dictionary[maximi[k][0]][k] < nested[k]:
...             maximi[k] = (key, nested[k])
...
>>> maximi
{'a': ('key1', 1), 'b': ('key1', 2), 'c': ('key2', 11)}
>>> for value in list1:
...     print(value, maximi[value][0])
...
a key1
b key1
c key2
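One caveat with the float('-inf') default: if a subkey is missing from every nested dictionary, all outer keys tie at negative infinity and max() returns the first one instead of signalling that nothing was found. A minimal sketch of one way to handle that, using a hypothetical helper key_of_max (not part of the answer above) that returns None in that case:

```python
def key_of_max(dictionary, subkey):
    """Return the outer key whose nested dict has the largest `subkey` value.

    Returns None when no nested dict contains `subkey` (an assumed convention).
    """
    # Only consider outer keys whose nested dict actually has the subkey.
    candidates = [k for k, nested in dictionary.items() if subkey in nested]
    if not candidates:
        return None
    return max(candidates, key=lambda k: dictionary[k][subkey])


dictionary = {
    'key1': {'a': 1, 'b': 2, 'c': 10},
    'key2': {'d': 1, 'e': 1, 'c': 11},
    'key3': {'d': 2, 'b': 1, 'g': 12}}

for value in ('a', 'b', 'c', 'z'):
    print(value, key_of_max(dictionary, value))
```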
