slicing dictionary with values - python

I have a dictionary like:
d = {1: 'a', 2:'b', 3:'c', 4:'c', 5:'c', 6:'c'}
I want to slice this dictionary so that if the values at the end are the same, only the first of them is kept. So the result is:
d = {1: 'a', 2:'b', 3:'c'}
I'm using collections.defaultdict(OrderedDict) to maintain sorting by the keys.
Currently, I'm using a loop. Is there a pythonic way of doing this?
UPDATE
the dictionary values can also be dictionaries:
d = {1: {'a': 'a1', 'b': 'b1'}, 2:{'a': 'a1', 'b': 'b2'}, 3:{'a': 'a1', 'b': 'c1'}, 4:{'a': 'a1', 'b': 'c1'}, 5:{'a': 'a1', 'b': 'c1'}, 6:{'a': 'a1', 'b': 'c1'}}
output:
d = {1: {'a': 'a1', 'b': 'b1'}, 2:{'a': 'a1', 'b': 'b2'}, 3:{'a': 'a1', 'b': 'c1'}}

You can use itertools.groupby with a list comprehension to achieve your result
>>> from itertools import groupby
>>> d = {1: 'a', 2:'b', 3:'c', 4:'c', 5:'c', 6:'c'}
>>> n = [(min([k[0] for k in list(g)]),k) for k,g in groupby(d.items(),key=lambda x: x[1])]
>>> n
[(1, 'a'), (2, 'b'), (3, 'c')]
The above expression can also be written as
>>> from operator import itemgetter
>>> n = [(min(map(itemgetter(0), g)), k) for k, g in groupby(d.items(), key=itemgetter(1))]
You can cast this to dict by simply using
>>> dict(n)
{1: 'a', 2: 'b', 3: 'c'}
This obviously doesn't maintain the order of the keys, so you can use OrderedDict
>>> from collections import OrderedDict
>>> OrderedDict(sorted(n))
OrderedDict([(1, 'a'), (2, 'b'), (3, 'c')])
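Since the update says the values can themselves be dictionaries, here is a minimal sketch of the same groupby idea for that case. It assumes the keys are sortable, and it relies on groupby comparing values with ==, so the values do not need to be hashable. The names items and result are illustrative only.

```python
from itertools import groupby
from operator import itemgetter

d = {
    1: {'a': 'a1', 'b': 'b1'},
    2: {'a': 'a1', 'b': 'b2'},
    3: {'a': 'a1', 'b': 'c1'},
    4: {'a': 'a1', 'b': 'c1'},
    5: {'a': 'a1', 'b': 'c1'},
    6: {'a': 'a1', 'b': 'c1'},
}

# Sort by key only, so the dict values are never compared for ordering.
items = sorted(d.items(), key=itemgetter(0))

# groupby yields one group per run of equal consecutive values;
# next(group) takes the first (key, value) pair of each run.
result = dict(next(group) for _, group in groupby(items, key=itemgetter(1)))
print(result)
```

Like the expression above, this collapses every run of equal consecutive values, not only the run at the end.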

If you want to get rid of the for loop, you can do it this way:
{a:b for b,a in {y:x for x,y in sorted(d.items(), reverse=True)}.items()}
But it is neither very pythonic nor very efficient, and it only works when the values are hashable.

Instead of using an ordered dictionary whose keys represent indexes, the more pythonic way is to use a list. In this case, you use indexes instead of keys and can slice the list more effectively.
>>> d = {1: 'a', 2:'b', 3:'c', 4:'c', 5:'c', 6:'c'}
>>> a = list(d.values())
>>> a[:a.index(a[-1])+1]
['a', 'b', 'c']
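Note that a.index(a[-1]) finds the first occurrence of the last value anywhere in the list, so it cuts too much if that value also appears earlier. A minimal sketch that trims only the trailing run is shown here; trim_trailing_duplicates is a hypothetical helper name, not something from the answer above.

```python
def trim_trailing_duplicates(values):
    """Return values with the run of equal items at the end cut down to one item."""
    end = len(values)
    # Walk backwards while the last two kept items are equal.
    while end > 1 and values[end - 1] == values[end - 2]:
        end -= 1
    return values[:end]

print(trim_trailing_duplicates(['a', 'b', 'c', 'c', 'c', 'c']))  # ['a', 'b', 'c']
print(trim_trailing_duplicates(['c', 'a', 'c', 'c']))            # ['c', 'a', 'c']
```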

Just in case, a solution with pandas
import pandas as pd
df = pd.DataFrame(dict(key=list(d.keys()),val=list(d.values())))
print(df)
   key val
0    1   a
1    2   b
2    3   c
3    4   c
4    5   c
5    6   c
df = df.drop_duplicates(subset=['val'])
df.index=df.key
df.val.to_dict()
{1: 'a', 2: 'b', 3: 'c'}
I don't know how this performs on larger datasets, or whether it is more pythonic.
Nevertheless, no loops.

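You can check whether the last two values are the same:
from collections import OrderedDict
d = OrderedDict({1: 'a', 2:'b', 3:'c', 4:'c', 5:'c', 6:'c'})
# drop the last item while the final two values are equal
while len(d) > 1 and list(d.values())[-1] == list(d.values())[-2]:
    d.popitem()
print(d)
# OrderedDict([(1, 'a'), (2, 'b'), (3, 'c')])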

Related

Python Index Nested Dictionary with List

If I have a nested dictionary and varying lists:
d = {'a': {'b': {'c': {'d': 0}}}}
list1 = ['a', 'b']
list2 = ['a', 'b', 'c']
list3 = ['a', 'b', 'c', 'd']
How can I access dictionary values like so:
>>> d[list1]
{'c': {'d': 0}}
>>> d[list3]
0
You can use functools.reduce. Real Python has a nice post on reduce.
from functools import reduce
reduce(dict.get, list3, d)
>>> 0
EDIT: mix of lists and dictionaries
If the nested values are a mix of lists and dictionaries, the following is possible:
d = {'a': [{'b0': {'c': 1}}, {'b1': {'c': 1}}]}
list1 = ['a', 1, 'b1', 'c']
fun = lambda element, indexer: element[indexer]
reduce(fun, list1, d)
>>> 1
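As a small variation, the lambda can be swapped for the standard library's operator.getitem, which performs the same element[indexer] lookup. This is only a sketch; the names path and value are illustrative.

```python
from functools import reduce
from operator import getitem

d = {'a': [{'b0': {'c': 1}}, {'b1': {'c': 1}}]}
path = ['a', 1, 'b1', 'c']

# reduce applies getitem step by step: d['a'] -> [...][1] -> {...}['b1'] -> {...}['c']
value = reduce(getitem, path, d)
print(value)  # 1
```

Unlike dict.get, getitem raises KeyError or IndexError when a step of the path is missing.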
Use a short function:
def nested_get(d, lst):
    out = d
    for x in lst:
        out = out[x]
    return out
nested_get(d, list1)
# {'c': {'d': 0}}

Average multiple values in same key based on 2nd category in python [duplicate]

I have a list which looks something like this
[('A1', 'A', 342.5), ('A2', 'A', 509.70), ('A2', 'B', 119.34),
('A1', 'B', 618.42), ('A1', 'A', 173.54), ('A1', 'B', 235.21)]
I'm looking to find the average of the third elements for each type of second element for each first column values. The output would look something like this
A1 (A 258.02) (B 426.815)
A2 (A 509.70) (B 119.34)
I've been able to do something like this for a list of tuples with two elements but am struggling with three.
If this question has already been answered then please point me there as I couldn't find it myself
Here is a solution using itertools.groupby():
data = [('A1', 'A', 342.5), ('A2', 'A', 509.70), ('A2', 'B', 119.34),
('A1', 'B', 618.42), ('A1', 'A', 173.54), ('A1', 'B', 235.21)]
import itertools as it
for g1 in it.groupby(sorted(data), key=lambda x: x[0]):
    print(g1[0], end=' ')
    for g2 in it.groupby(g1[1], key=lambda x: x[1]):
        nums = [i[2] for i in g2[1]]
        print('(%s %.2f)' % (g2[0], sum(nums) / len(nums)), end=' ')
    print()
Results:
A1 (A 258.02) (B 426.81)
A2 (A 509.70) (B 119.34)
Using nested defaultdict with float
from collections import defaultdict
l = [('A1', 'A', 342.5), ('A2', 'A', 509.70), ('A2', 'B', 119.34),
('A1', 'B', 618.42), ('A1', 'A', 173.54), ('A1', 'B', 235.21)]
d = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
for a,b,c in l:
    d[a][b]['sum'] += c
    d[a][b]['count'] += 1
    d[a][b]['average'] += (c - d[a][b]['average'])/d[a][b]['count']
We use the fact that the average can be calculated incrementally: new_average = old_average + (value - old_average) / count (see: https://math.stackexchange.com/posts/957376/)
Returns the following structure:
{
"A1": {
"A": {
"sum": 516.04,
"count": 2.0,
"average": 258.02
},
"B": {
"sum": 853.63,
"count": 2.0,
"average": 426.815
}
},
"A2": {
"A": {
"sum": 509.7,
"count": 1.0,
"average": 509.7
},
"B": {
"sum": 119.34,
"count": 1.0,
"average": 119.34
}
}
}
With this you can easily add more data, e.g. running this again:
l = [('A1', 'A', 100)]
for a,b,c in l:
    d[a][b]['sum'] += c
    d[a][b]['count'] += 1
    d[a][b]['average'] += (c - d[a][b]['average'])/d[a][b]['count']
for k,v in d.items():
    print(k)
    print('-------------')
    for k2, v2 in v.items():
        print(k2)
        for k3, v3 in v2.items():
            print('{}: {}'.format(k3,v3))
    print()
Returns:
A1
-------------
A
sum: 616.04
count: 3.0
average: 205.34666666666666
B
sum: 853.63
count: 2.0
average: 426.815
A2
-------------
A
sum: 509.7
count: 1.0
average: 509.7
B
sum: 119.34
count: 1.0
average: 119.34
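The incremental-average update used in the defaultdict answer above can be checked in isolation. The sketch below assumes nothing beyond the standard library; running_mean is a hypothetical helper name used only for illustration.

```python
def running_mean(values):
    """Compute the mean one value at a time, without keeping a running sum."""
    mean = 0.0
    for count, value in enumerate(values, start=1):
        # Move the current mean towards the new value by 1/count of the gap.
        mean += (value - mean) / count
    return mean

sample = [342.5, 173.54]
# Both numbers should agree up to floating-point rounding.
print(running_mean(sample), sum(sample) / len(sample))
```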
It's fairly easy to construct a suitable data structure to hold the data:
d = [('A1', 'A', 342.5), ('A2', 'A', 509.70), ('A2', 'B', 119.34),
('A1', 'B', 618.42), ('A1', 'A', 173.54), ('A1', 'B', 235.21)]
In []:
r = {}
for a, b, c in d:
    r.setdefault(a, {}).setdefault(b, []).append(c)
r
Out[]:
{'A1': {'A': [342.5, 173.54], 'B': [618.42, 235.21]}, 'A2': {'A': [509.7], 'B': [119.34]}}
Then you can just iterate through this doing the sums:
In []:
{k1: {k2: sum(v2)/len(v2) for k2, v2 in v1.items()} for k1, v1 in r.items()}
Out[]:
{'A1': {'A': 258.02, 'B': 426.815}, 'A2': {'A': 509.7, 'B': 119.34}}
Generate a dictionary of all the values first, and then average them.
So, if we name your list l, you can do:
d = {}
for a, b, c in l:
    d.setdefault(a, {}).setdefault(b, []).append(c)
d = {p: {r: sum(s) / len(s) for r, s in q.items()} for p, q in d.items()}
which gives d as:
{
'A1': {
'A': 258.02,
'B': 426.815
},
'A2': {
'A': 509.7,
'B': 119.34
}
}
You can do this really easily in pandas.
import pandas as pd
l = [('A1', 'A', 342.5), ('A2', 'A', 509.70), ('A2', 'B', 119.34),
('A1', 'B', 618.42), ('A1', 'A', 173.54), ('A1', 'B', 235.21)]
df = pd.DataFrame(l)
print(df.groupby([0, 1]).mean())
Hope it helps.

How to sum all the values that belong to the same key?

I'm pulling data from a database. Assume I have something like this:
Product Name Quantity
a 3
a 5
b 2
c 7
I want to sum the Quantity by Product Name, so this is what I want:
product = {'a':8, 'b':2, 'c':7 }
Here's what I'm trying to do after fetching the data from the database:
for row in result:
    product[row['product_name']] += row['quantity']
but this will give me: 'a'=5 only, not 8.
Option 1: pandas
This is one way, assuming you begin with a pandas dataframe df. This solution has O(n log n) complexity.
product = df.groupby('Product Name')['Quantity'].sum().to_dict()
# {'a': 8, 'b': 2, 'c': 7}
The idea is you can perform a groupby operation, which produces a series indexed by "Product Name". Then use the to_dict() method to convert to a dictionary.
Option 2: collections.Counter
If you begin with a list or iterator of results, and wish to use a for loop, you can use collections.Counter for O(n) complexity.
from collections import Counter
result = [['a', 3],
['a', 5],
['b', 2],
['c', 7]]
product = Counter()
for row in result:
    product[row[0]] += row[1]
print(product)
# Counter({'a': 8, 'c': 7, 'b': 2})
Option 3: itertools.groupby
You can also use a dictionary comprehension with itertools.groupby. This requires sorting beforehand.
from itertools import groupby
res = {i: sum(list(zip(*j))[1]) for i, j in groupby(sorted(result), key=lambda x: x[0])}
# {'a': 8, 'b': 2, 'c': 7}
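To see why the sorting step matters, here is a small sketch with rows deliberately out of order. groupby only merges adjacent rows with equal keys, so without sorting the second 'a' group overwrites the first in the dictionary. The sample rows and variable names are made up for illustration.

```python
from itertools import groupby

rows = [['a', 3], ['b', 2], ['a', 5]]

def first(row):
    return row[0]

# Without sorting, 'a' appears as two separate groups and the later one wins.
unsorted_totals = {k: sum(q for _, q in g) for k, g in groupby(rows, key=first)}

# After sorting, equal keys are adjacent and are summed together.
sorted_totals = {k: sum(q for _, q in g) for k, g in groupby(sorted(rows), key=first)}

print(unsorted_totals)  # {'a': 5, 'b': 2}
print(sorted_totals)    # {'a': 8, 'b': 2}
```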
If you insist on using loops, you can do this:
# fake data to make the script runnable
result = [
{'product_name': 'a', 'quantity': 3},
{'product_name': 'a', 'quantity': 5},
{'product_name': 'b', 'quantity': 2},
{'product_name': 'c', 'quantity': 7}
]
# solution with defaultdict and loops
from collections import defaultdict
d = defaultdict(int)
for row in result:
    d[row['product_name']] += row['quantity']
print(dict(d))
The output:
{'a': 8, 'b': 2, 'c': 7}
Since you mention pandas:
df.set_index('ProductName').Quantity.groupby(level=0).sum().to_dict()  # sum(level=0) no longer works in pandas 2.0+
Out[20]: {'a': 8, 'b': 2, 'c': 7}
Use tuple to store the result.
Edit:
Not clear if the data mentioned is really a dataframe.
If yes then li = [tuple(x) for x in df.to_records(index=False)]
li = [('a', 3), ('a', 5), ('b', 2), ('c', 7)]
d = dict()
for key, val in li:
    val_old = 0
    if key in d:
        val_old = d[key]
    d[key] = val + val_old
print(d)
Output
{'a': 8, 'b': 2, 'c': 7}
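The explicit "if key in d" check above can be folded into dict.get with a default of 0. A minimal sketch with the same sample data follows; totals is just an illustrative name.

```python
li = [('a', 3), ('a', 5), ('b', 2), ('c', 7)]

totals = {}
for key, val in li:
    # get() returns 0 the first time a key is seen, so no membership test is needed.
    totals[key] = totals.get(key, 0) + val

print(totals)  # {'a': 8, 'b': 2, 'c': 7}
```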

How to "sort" a dictionary by number of occurrences of a key?

I have a dictionary of values that gives the number of occurrences of a value in a list. How can I return a new dictionary that divides the former dictionary into separate dictionaries based on the value?
In other words, I want to sort this dictionary:
>>> a = {'A':2, 'B':3, 'C':4, 'D':2, 'E':3}
to this one.
b = {2: {'A', 'D'}, 3: {'B', 'E'}, 4: {'C'}}
How do I approach the problem?
from collections import defaultdict
a = {'A': 2, 'B': 3, 'C': 4, 'D': 2, 'E': 3}
b = defaultdict(set)
for k, v in a.items():
    b[v].add(k)
This is what you'll get:
defaultdict(<class 'set'>, {2: {'D', 'A'}, 3: {'B', 'E'}, 4: {'C'}})
You can convert b to a normal dict afterwards with b = dict(b).
If you are a Python beginner like me, you may want to try this:
a = {'A': 2 , 'B': 3 , 'C' : 4 , 'D' : 2, 'E' : 3}
b = {}
for key in a:
    lst = []
    new_key = a[key]
    if new_key not in b:
        lst.append(key)
        b[new_key] = lst
    else:
        b[new_key].append(key)
print(b)
It uses the mutable property of python dictionary to achieve the result you want.
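The loop above collects the keys in lists, while the question shows sets. A short sketch of the same idea with a plain dict and dict.setdefault, producing sets, might look like this (no defaultdict needed):

```python
a = {'A': 2, 'B': 3, 'C': 4, 'D': 2, 'E': 3}

b = {}
for key, count in a.items():
    # setdefault returns the existing set for this count, or stores and returns a new one.
    b.setdefault(count, set()).add(key)

print(b)
```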

Extracting dictionary with a subset of values from another dictionary

I have a dictionary and want to remove certain values in bad_list from its value list, and return the remainder. Here is the code:
d = {1: ['a', 'c', 'd'], 2: ['b'], 5: ['e']}
bad_list = ['d','e']
ad = {k:d[k].remove(i) for k in d.keys() for sublist in d[k] for i in sublist if i in bad_list}
print 'd =', d
print 'ad =', ad
Unfortunately what that does is it changes the values in d permanently, and returns None for values in ad.
d = {1: ['a', 'c'], 2: ['b'], 5: []}
ad = {1: None, 5: None}
How can I get a dictionary that looks like this:
new_dict = {1: ['a','c'], 2:['b']}
without looping through? I have a much larger dictionary to deal with, and I'd like to do it in the most efficient way.
There is no way to do it without a loop:
d = dict((key, [x for x in value if x not in bad_list]) for key, value in d.items())
or with filter:
d = dict((key, list(filter(lambda x: x not in bad_list, d[key]))) for key in d)
UPDATE
To exclude empty values:
d = dict((key, list(x)) for key in d for x in [set(d[key]).difference(bad_list)] if x)
Well, you could just use a 'list comprehension'. This one-liner works, though I find it ugly.
ad = {k:v for k,v in {k:[i for i in v if i not in bad_list] for k,v in d.items()}.items() if v}
I'd rather use a for loop.
ad2 = dict()
for k,v in d.items():
    _data_ = [item for item in v if item not in bad_list]
    if _data_:
        ad2[k]=_data_
Output:
print 'd =', d
print 'ad =', ad
print 'ad2=', ad2
>d = {1: ['a', 'c', 'd'], 2: ['b'], 5: ['e']}
>ad = {1: ['a', 'c'], 2: ['b']}
>ad2= {1: ['a', 'c'], 2: ['b']}
The following code written in Python 3.5 appears to do as requested in your question. Minimal change should be required for it to work with Python 2.x instead. Just use print statements instead of functions.
d = {1: ['a', 'c', 'd'], 2: ['b'], 5: ['e']}
bad_list = ['d', 'e']
ad = {a: b for a, b in ((a, [c for c in b if c not in bad_list]) for a, b in d.items()) if b}
print('d =', d)
print('ad =', ad)
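Since the question mentions a much larger dictionary, one likely improvement is to turn bad_list into a set once, so each membership test is a hash lookup instead of a scan of the list. The sketch below assumes the items are hashable; the names bad and kept are illustrative.

```python
d = {1: ['a', 'c', 'd'], 2: ['b'], 5: ['e']}
bad_list = ['d', 'e']

# Build the set once; "x not in bad" is then a constant-time check on average.
bad = set(bad_list)

ad = {}
for key, values in d.items():
    kept = [x for x in values if x not in bad]
    if kept:  # skip keys whose list became empty
        ad[key] = kept

print('d =', d)    # d is left unchanged
print('ad =', ad)
```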
