Sum Values by Key First X Characters (Python)

I have a dictionary like so (but much longer):
codes = {
'113110': 7, '113310': 1, '213111': 1,
'213112': 3, '236115': 2, '236220': 1,
'238190': 1, '238330': 1, '238990': 2,
'311612': 1, '321214': 1, }
I want to know the sum of the values for all keys, grouped by the first two digits. So, '11' should be 8. But if I check like the following, an occurrence of '11' anywhere in the key will count.
group_11 = sum([ v for k,v in codes.items() if '11' in k])
# Returns 15 instead of 8
I've tried using startswith, but I'm not sure how to use it in this context. This doesn't work:
group_11 = sum([ v for k,v in codes.items() if any(k.startswith('11')])
I have 20 groups to check against, but I want to be able to total any set of keys grouped by their first x characters, since the groupings could change in the future.
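For reference, a minimal sketch of how startswith fits here: it already returns a bool, so the any() call isn't needed, and it also accepts a tuple of prefixes. The groups list below is a hypothetical example, not your real 20 groups.

```python
codes = {
    '113110': 7, '113310': 1, '213111': 1,
    '213112': 3, '236115': 2, '236220': 1,
    '238190': 1, '238330': 1, '238990': 2,
    '311612': 1, '321214': 1,
}

# startswith checks only the beginning of the key, unlike `in`
group_11 = sum(v for k, v in codes.items() if k.startswith('11'))
print(group_11)  # 8

# Hypothetical list of group prefixes; they need not all have the same length
groups = ['11', '21', '236', '238']
group_totals = {p: sum(v for k, v in codes.items() if k.startswith(p)) for p in groups}
print(group_totals)  # {'11': 8, '21': 4, '236': 3, '238': 4}

# startswith also accepts a tuple: total for all keys beginning with '31' or '32'
print(sum(v for k, v in codes.items() if k.startswith(('31', '32'))))  # 2
```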

You can sort the dict's items and then use itertools.groupby to group them by the first two characters of the key, summing the values for each group. The sorting is important: groupby only groups consecutive items with the same key.
from itertools import groupby
d = {
k: sum(item[1] for item in g)
for k, g in groupby(sorted(codes.items()), key=lambda item: item[0][:2])
}
print(d)
# {'11': 8, '21': 4, '23': 7, '31': 1, '32': 1}
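Because the question wants the prefix length to be adjustable, the same groupby idea can be wrapped in a function that takes the number of characters. This is a sketch; sum_by_prefix is a hypothetical name, and sorting by the prefix alone (rather than the whole item) is enough to make each group contiguous.

```python
from itertools import groupby


def sum_by_prefix(codes, n):
    """Sum the values of `codes`, grouped by the first n characters of each key."""
    def prefix(item):
        return item[0][:n]

    # groupby only merges *consecutive* items, so sort by the same key first
    items = sorted(codes.items(), key=prefix)
    return {p: sum(v for _, v in g) for p, g in groupby(items, key=prefix)}


codes = {
    '113110': 7, '113310': 1, '213111': 1,
    '213112': 3, '236115': 2, '236220': 1,
    '238190': 1, '238330': 1, '238990': 2,
    '311612': 1, '321214': 1,
}
print(sum_by_prefix(codes, 2))  # {'11': 8, '21': 4, '23': 7, '31': 1, '32': 1}
print(sum_by_prefix(codes, 3))  # {'113': 8, '213': 4, '236': 3, '238': 4, '311': 1, '321': 1}
```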

You could convert each item in codes to a Counter keyed by the first two characters of its key, and sum them together:
from collections import Counter
codes = {
'113110': 7, '113310': 1, '213111': 1,
'213112': 3, '236115': 2, '236220': 1,
'238190': 1, '238330': 1, '238990': 2,
'311612': 1, '321214': 1
}
result = sum((Counter({k[:2]: v}) for k, v in codes.items()), Counter())
print(result)  # Counter({'11': 8, '23': 7, '21': 4, '31': 1, '32': 1})
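A variant of the same idea, sketched under the assumption that you only need the totals: accumulate into a single Counter in a loop instead of creating one Counter per item and summing them. Missing keys in a Counter read as 0, so += works without initializing each prefix.

```python
from collections import Counter

codes = {
    '113110': 7, '113310': 1, '213111': 1,
    '213112': 3, '236115': 2, '236220': 1,
    '238190': 1, '238330': 1, '238990': 2,
    '311612': 1, '321214': 1,
}

totals = Counter()
for k, v in codes.items():
    # One Counter for all items; the first two characters form the group key
    totals[k[:2]] += v
print(totals)
```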

Related

How to get the index of a given string in a list in Python?

My list looks like this; in this example the strings are 'a' and 'b'.
I want to return the indices of the string 'a' (and likewise for 'b'), and then calculate how many times 'a' is repeated in list1:
list1=['a','a','b','a','a','b','a','a','b','a','b','a','a']
I want to return the (1-based) position of every 'a' in list1.
The result should look like this:
a_position=[1,2,4,5,7,8,10,12,13]
And I want to calculate how many times 'a' is repeated in list1:
a_rep=9
You could do the following:
a_positions = [idx + 1 for idx, el in enumerate(list1) if el == 'a']  # 1-based positions
a_repetition = len(a_positions)
print(a_positions)
# [1, 2, 4, 5, 7, 8, 10, 12, 13]
print(a_repetition)
# 9
If you need the repetition count of each element, you can also use collections.Counter:
from collections import Counter
counter = Counter(list1)
print(counter['a'])
# 9
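Since the title asks about indices, here is a sketch using list.index, which accepts an optional start argument and raises ValueError when the item isn't found. The helper name positions is hypothetical. Each call resumes just after the previous hit, so overall this is still a single pass over the list.

```python
list1 = ['a', 'a', 'b', 'a', 'a', 'b', 'a', 'a', 'b', 'a', 'b', 'a', 'a']


def positions(seq, target):
    """Return the 1-based positions of every occurrence of target in seq."""
    found = []
    start = 0
    while True:
        try:
            # list.index(x, start) searches from `start` onwards
            idx = seq.index(target, start)
        except ValueError:  # no further occurrences
            return found
        found.append(idx + 1)
        start = idx + 1


a_positions = positions(list1, 'a')
print(a_positions)       # [1, 2, 4, 5, 7, 8, 10, 12, 13]
print(len(a_positions))  # 9
```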
If you want to get the indices and counts of all letters:
list1=['a','a','b','a','a','b','a','a','b','a','b','a','a']
pos = {}
for i,c in enumerate(list1, start=1): # 1-based indexing
pos.setdefault(c, []).append(i)
pos
# {'a': [1, 2, 4, 5, 7, 8, 10, 12, 13],
# 'b': [3, 6, 9, 11]}
counts = {k: len(v) for k,v in pos.items()}
# {'a': 9, 'b': 4}

Combinations with max length and per element max repetition values

My goal is to find a more efficient way to get all combinations of 1 to r mixed elements, where each family of elements potentially has a different count and r is a parameter. The elements can be any (hashable) type. The result is a list of Counter-like dictionaries.
Here is an example data:
example = {1e-8: 3, "k": 2}
r = 5 # sum(example.values()) == 5 therefore all possible combinations for this example
The expected result is the following:
[{1e-08: 1},
{'k': 1},
{1e-08: 2},
{1e-08: 1, 'k': 1},
{'k': 2},
{1e-08: 3},
{1e-08: 2, 'k': 1},
{1e-08: 1, 'k': 2},
{1e-08: 3, 'k': 1},
{1e-08: 2, 'k': 2},
{1e-08: 3, 'k': 2}]
... corresponding to every possible combination of 1, 2, 3, 4 and 5 elements.
The order preservation of the list is preferable (since Python 3.7+ preserves the order of keys inside dictionaries) but not mandatory.
Here is the solution I currently use:
from collections import Counter
from more_itertools import distinct_combinations

def all_combis(elements, r=None):
    if r is None:
        r = sum(elements.values())
    # "Flattening" by repeating the elements according to their count
    flatt = []
    for k, v in elements.items():
        flatt.extend([k] * v)
    for size in range(1, r + 1):
        for comb in distinct_combinations(flatt, size):
            yield dict(Counter(comb))

list(all_combis(example))
# > The expected result
A real-life example has 300 elements distributed among 15 families. It is processed in ~13 seconds with a value of r=10 for about 2 million combinations, and ~31 seconds with r=11 for 4.5 million combinations.
I'm guessing there are better ways which avoid "flattening" the elements and/or counting the combinations, but I struggle to find any when each element has a different count.
Can you design a more time-efficient solution?
The keys are a bit of a distraction. They can be added in later. Mathematically, what you have is a vector of bounds, together with a global bound, and want to generate all tuples where each element is bounded by its respective bound, and the total is bounded by the global bound. This leads to a simple recursive approach based on the idea that if
(a_1, a_2, ..., a_n) <= (b_1, b_2, ..., b_n) with a_1 + ... + a_n <= k
then
(a_2, ..., a_n) <= (b_2, ..., b_n) with a_2 + ... + a_n <= k - a_1
This leads to something like:
def bounded_tuples(r, bounds):
    n = len(bounds)
    if r == 0:
        return [(0,) * n]
    elif n == 0:
        return [()]
    else:
        tuples = []
        for i in range(1 + min(r, bounds[0])):
            tuples.extend((i,) + t for t in bounded_tuples(r - i, bounds[1:]))
        return tuples
Note that this includes the all-zeros solution, which you exclude, but that can be filtered out and the keys reintroduced:
def all_combis(elements, r=None):
    if r is None:
        r = sum(elements.values())
    for t in bounded_tuples(r, list(elements.values())):
        if max(t) > 0:
            yield dict(zip(elements.keys(), t))
For example:
example = {1e-8: 3, "k": 2}
for d in all_combis(example):
    print(d)
Output:
{1e-08: 0, 'k': 1}
{1e-08: 0, 'k': 2}
{1e-08: 1, 'k': 0}
{1e-08: 1, 'k': 1}
{1e-08: 1, 'k': 2}
{1e-08: 2, 'k': 0}
{1e-08: 2, 'k': 1}
{1e-08: 2, 'k': 2}
{1e-08: 3, 'k': 0}
{1e-08: 3, 'k': 1}
{1e-08: 3, 'k': 2}
Which is essentially what you have. The code could obviously be tweaked to eliminate dictionary entries with the value 0.
Timing with larger examples seems to suggest that my approach isn't any quicker than yours, though it still might give you some ideas.
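A sketch of the zero-dropping tweak mentioned above, which also yields results in the size order of your expected output: instead of bounding the total, generate tuples whose sum is exactly each size from 1 to r. exact_tuples and combis_by_size are hypothetical names, and no claim is made about how its speed compares with the versions above.

```python
def exact_tuples(total, bounds):
    """Yield tuples t with 0 <= t[i] <= bounds[i] and sum(t) == total.

    The first component is tried from its largest feasible value downwards.
    """
    if not bounds:
        if total == 0:
            yield ()
        return
    first, rest = bounds[0], bounds[1:]
    # The remaining positions can hold at most sum(rest), which sets a floor
    # on the first component; the ceiling is min(first, total).
    lo = max(0, total - sum(rest))
    hi = min(first, total)
    for i in range(hi, lo - 1, -1):
        for t in exact_tuples(total - i, rest):
            yield (i,) + t


def combis_by_size(elements, r=None):
    """Yield Counter-like dicts of 1..r elements, smallest total size first."""
    keys = list(elements)
    bounds = tuple(elements.values())
    if r is None:
        r = sum(bounds)
    for size in range(1, r + 1):
        for t in exact_tuples(size, bounds):
            # Drop zero counts so the dicts match your expected output
            yield {k: c for k, c in zip(keys, t) if c}


example = {1e-8: 3, "k": 2}
for d in combis_by_size(example):
    print(d)
```

For the example this prints the same eleven dicts, in the same order, as the expected result in the question.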
As John Coleman said, without the keys you may be able to speed things up.
This recursive approach takes the first count in the list, loops over every value from 0 up to either that element's maximum or the remaining sum (whichever is smaller), and recurses on the rest of the list.
It yields lists, but as John Coleman also showed, it is easy to add the keys back later.
From my tests it appears to run in about half the time of your current implementation.
def all_combis(elements, r=None):
    if r is None:
        r = sum(elements)
    if r == 0:
        yield [0] * len(elements)
        return
    if not elements:
        yield []
        return
    elements = list(elements)
    element = elements.pop(0)
    # Try every count for the first element, up to its own max or the remaining sum
    for i in range(min(element + 1, r + 1)):
        for combi in all_combis(elements, r - i):
            yield [i] + combi

example = {1e-8: 3, "k": 2}
list(all_combis(list(example.values())))
Output:
[[0, 0], [0, 1], [0, 2], [1, 0], [1, 1], [1, 2], [2, 0], [2, 1], [2, 2], [3, 0], [3, 1], [3, 2]]

Nesting dictionary algorithm

Suppose I have the following dictionary:
{'a': 0, 'b': 1, 'c': 2, 'c.1': 3, 'd': 4, 'd.1': 5, 'd.1.2': 6}
I wish to write an algorithm which outputs the following:
{
"a": 0,
"b": 1,
"c": {
"c": 2,
"c.1": 3
},
"d":{
"d": 4,
"d.1": {
"d.1": 5,
"d.1.2": 6
}
}
}
Note how the names are repeated inside the nested dictionaries, and some keys have a variable level of nesting (e.g. "d").
I was wondering how you would go about doing this, or whether there is a Python library for this. I know you'd have to use recursion for something like this, but my recursion skills are quite poor. Any thoughts would be highly appreciated.
You can use a recursive function for this or just a loop. The tricky part is wrapping existing values into dictionaries if further child nodes have to be added below them.
def nested(d):
    res = {}
    for key, val in d.items():
        t = res
        # descend deeper into the nested dict
        for x in [key[:i] for i, c in enumerate(key) if c == "."]:
            if x in t and not isinstance(t[x], dict):
                # wrap leaf value into another dict
                t[x] = {x: t[x]}
            t = t.setdefault(x, {})
        # add actual key to nested dict
        if key in t:
            # already exists, go one level deeper
            t[key][key] = val
        else:
            t[key] = val
    return res
Your example:
d = {'a': 0, 'b': 1, 'c': 2, 'c.1': 3, 'd': 4, 'd.1': 5, 'd.1.2': 6}
print(nested(d))
# {'a': 0,
# 'b': 1,
# 'c': {'c': 2, 'c.1': 3},
# 'd': {'d': 4, 'd.1': {'d.1': 5, 'd.1.2': 6}}}
As for how you would go about doing this:
sort the dictionary items
group the sorted items by the first part of each key (the key is index 0 of each item tuple)
iterate over the groups
if there is more than one item in a group, make a key for the group and add the group items as its values.
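The steps above can be sketched as follows. This is one possible reading of them, not the answerer's own code: nest is a hypothetical name, and it assumes the items are sorted by their dot-split keys so that each group is contiguous for itertools.groupby (a plain string sort can separate, say, "a", "a-x" and "a.b", because "-" sorts before ".").

```python
from itertools import groupby


def nest(items, depth=0):
    """Nest sorted (key, value) pairs by their dot-separated key components.

    All keys in `items` are assumed to share their first `depth` components.
    """
    result = {}
    # groupby only merges consecutive items, so `items` must already be sorted
    for _, group in groupby(items, key=lambda kv: kv[0].split(".")[depth]):
        group = list(group)
        if len(group) == 1:
            # A lone item stays a plain leaf
            key, value = group[0]
            result[key] = value
            continue
        # Name the group after the shared prefix, e.g. "d" or "d.1"
        name = ".".join(group[0][0].split(".")[: depth + 1])
        # The item whose key *is* the prefix stays at this level of the group...
        sub = {k: v for k, v in group if k == name}
        # ...and every longer key is nested one level deeper
        deeper = [(k, v) for k, v in group if k != name]
        sub.update(nest(deeper, depth + 1))
        result[name] = sub
    return result


data = {'a': 0, 'b': 1, 'c': 2, 'c.1': 3, 'd': 4, 'd.1': 5, 'd.1.2': 6}
print(nest(sorted(data.items(), key=lambda kv: kv[0].split("."))))
# {'a': 0, 'b': 1, 'c': {'c': 2, 'c.1': 3}, 'd': {'d': 4, 'd.1': {'d.1': 5, 'd.1.2': 6}}}
```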
Slightly shorter recursion approach with collections.defaultdict:
from collections import defaultdict
data = {'a': 0, 'b': 1, 'c': 2, 'c.1': 3, 'd': 4, 'd.1': 5, 'd.1.2': 6}
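def group(d, p=[]):
    # _d buckets the items by their first remaining key component; r is the result
    _d, r = defaultdict(list), {}
    for n, [a, *b], c in d:
        _d[a].append((n, b, c))
    for a, b in _d.items():
        # k: items in this bucket that still have deeper components (walrus needs Python 3.8+)
        if (k := [i for i in b if i[1]]):
            r['.'.join(p + [a])] = {**{i[0]: i[-1] for i in b if not i[1]}, **group(k, p + [a])}
        else:
            r[b[0][0]] = b[0][-1]
    return r
print(group([(a, a.split('.'), b) for a, b in data.items()]))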
Output:
{'a': 0, 'b': 1, 'c': {'c': 2, 'c.1': 3}, 'd': {'d': 4, 'd.1': {'d.1': 5, 'd.1.2': 6}}}

Binary digits string manipulation in Python

How do I calculate the unique lengths of the groups (runs) of consecutive 1's and 0's in a string in Python 3? For example:
'11110000110110001111011'
Output should be
{0:{1, 3, 4} , 1:{2, 4}}
You can use itertools.groupby and add the length of each group to a dict of sets:
from itertools import groupby
s = '11110000110110001111011'
d = {}
for k, g in groupby(s):
    d.setdefault(int(k), set()).add(sum(1 for _ in g))
d becomes:
{1: {2, 4}, 0: {1, 3, 4}}
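Another option, sketched with the re module instead of groupby: the pattern 0+|1+ matches each maximal run of identical digits, so the length of each match is a run length.

```python
import re

s = '11110000110110001111011'
runs = {}
# re.finditer yields one match per maximal run of 0s or 1s, left to right
for m in re.finditer(r'0+|1+', s):
    run = m.group()
    runs.setdefault(int(run[0]), set()).add(len(run))
print(runs)
```

This should give the same sets as the groupby version.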
You could also do this with a while loop: at each index, find the index of the next character opposite to the current one, and append the difference between that index and the current one (the run length) to the list for the current character in counted_x.
x = '11110000110110001111011'
counted_x = {'0': [], '1': []}
idx = 0
while idx < len(x):
    opposite_key = str(abs(int(x[idx]) - 1))
    if opposite_key in x[idx:]:
        # run length = distance to the next opposite character
        counted_x[x[idx]].append(x.index(opposite_key, idx) - idx)
    else:
        # last run: it extends to the end of the string
        counted_x[x[idx]].append(len(x[idx:]))
    idx += counted_x[x[idx]][-1]
counted_x = [{k: sorted(set(v))} for k, v in counted_x.items()]
print(counted_x)
Output:
[{'0': [1, 3, 4]}, {'1': [2, 4]}]

Remap data according to ranking

I have the following:
d = {"a":3,"b":2,"c":3,"d":2,"e":2,"f":3,"g":4, "h":6}
m = {v: i+1 for i,v in enumerate(sorted(set(d.values()),reverse=True))}
r = {k:m[d[k]] for k in d}
where r is:
{'a': 3, 'b': 4, 'c': 3, 'd': 4, 'e': 4, 'f': 3, 'g': 2, 'h': 1}
So "h" has the highest value in d (6), so it is remapped to 1 in r. Then 'g' is ranked 2 since it has the next highest value in d (4).
My solution works fine but I was wondering if there is a more elegant solution.
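Before Python 3.7, dicts didn't keep insertion order; if you need ordering on those versions, use an OrderedDict.
Use Counter to get the ranks (note that this gives tied values distinct consecutive ranks, unlike your m mapping). Then turn that into a list of tuples or into an OrderedDict.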
from collections import Counter, OrderedDict
d = {"a":3,"b":2,"c":3,"d":2,"e":2,"f":3,"g":4, "h":6}
c = Counter(d)
# if you want a list of tuples
ranked_list = [(pair[0], rank + 1) for rank, pair in enumerate(c.most_common())]
# [('h', 1), ('g', 2), ('a', 3), ('c', 4), ('f', 5), ('b', 6), ('d', 7), ('e', 8)]
# if you want a dict:
ranked_dict = OrderedDict(ranked_list)
# OrderedDict([('h', 1), ('g', 2), ('a', 3), ('c', 4), ('f', 5), ('b', 6), ('d', 7), ('e', 8)])
You can use this:
d = {"a":3,"b":2,"c":3,"d":2,"e":2,"f":3,"g":4, "h":6}
# sort the dictionary items by -value, throw away old value and use the
# enumerate position starting at 1 instead - no backreferencing in the old
# dict needed here
ranked = {k: idx for idx, (k, _) in enumerate(sorted(d.items(), key=lambda x: -x[1]), 1)}
print(ranked)
Output:
{'h': 1, 'g': 2, 'a': 3, 'c': 4, 'f': 5, 'b': 6, 'd': 7, 'e': 8}
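Alternatively, wrap your own approach in a function, which keeps tied values at the same rank:
def ranker(d):
    ranks = sorted(set(d.values()), reverse=True)
    ranks = {r: i + 1 for i, r in enumerate(ranks)}
    return {k: ranks[v] for k, v in d.items()}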
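What the question's m mapping and ranker compute is usually called dense ranking: tied values share a rank, and the next rank follows without gaps. As a sketch of another way to get it, itertools.groupby over the value-sorted items can number the distinct values directly. dense_rank is a hypothetical name.

```python
from itertools import groupby
from operator import itemgetter


def dense_rank(d):
    """Rank keys by descending value; tied values share a rank (dense ranking)."""
    ranked = {}
    # sorted() is stable even with reverse=True, so ties keep their original order
    items = sorted(d.items(), key=itemgetter(1), reverse=True)
    # Each group holds the keys sharing one value, highest value first
    for rank, (_, group) in enumerate(groupby(items, key=itemgetter(1)), start=1):
        for key, _ in group:
            ranked[key] = rank
    return ranked


d = {"a": 3, "b": 2, "c": 3, "d": 2, "e": 2, "f": 3, "g": 4, "h": 6}
print(dense_rank(d))
# {'h': 1, 'g': 2, 'a': 3, 'c': 3, 'f': 3, 'b': 4, 'd': 4, 'e': 4}
```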
