Combinations with max length and per element max repetition values - python

My goal is to find a more efficient way to get all combinations of 1 to r mixed elements, where each family of elements potentially has a different count and r is a parameter. The elements can be of any (hashable) type. The result is a list of Counter-like dictionaries.
Here is some example data:
example = {1e-8: 3, "k": 2}
r = 5 # sum(example.values()) == 5 therefore all possible combinations for this example
The expected result is the following:
[{1e-08: 1},
{'k': 1},
{1e-08: 2},
{1e-08: 1, 'k': 1},
{'k': 2},
{1e-08: 3},
{1e-08: 2, 'k': 1},
{1e-08: 1, 'k': 2},
{1e-08: 3, 'k': 1},
{1e-08: 2, 'k': 2},
{1e-08: 3, 'k': 2}]
... corresponding to every possible combination of 1, 2, 3, 4 and 5 elements.
Preserving the order of the list is preferable (since Python 3.7+ preserves the order of keys inside dictionaries) but not mandatory.
Here is the solution I currently use:
from more_itertools import distinct_combinations
from collections import Counter

def all_combis(elements, r=None):
    if r is None:
        r = sum(elements.values())
    # "Flattening" by repeating the elements according to their count
    flatt = []
    for k, v in elements.items():
        flatt.extend([k] * v)
    for r in range(1, r+1):
        for comb in distinct_combinations(flatt, r):
            yield dict(Counter(comb))

list(all_combis(example))
# > The expected result
A real-life example has 300 elements distributed among 15 families. It is processed in ~13 seconds with a value of r=10 for about 2 million combinations, and ~31 seconds with r=11 for 4.5 million combinations.
I'm guessing there are better ways which avoid "flattening" the elements and/or counting the combinations, but I struggle to find any when each element has a different count.
Can you design a more time-efficient solution ?

The keys are a bit of a distraction. They can be added in later. Mathematically, what you have is a vector of bounds, together with a global bound, and you want to generate all tuples where each element is bounded by its respective bound and the total is bounded by the global bound. This suggests a simple recursive approach based on the idea that if
(a_1, a_2, ..., a_n) <= (b_1, b_2, ..., b_n) with a_1 + ... + a_n <= k
then
(a_2, ..., a_n) <= (b_2, ..., b_n) with a_2 + ... + a_n <= k - a_1
This leads to something like:
def bounded_tuples(r,bounds):
    n = len(bounds)
    if r == 0:
        return [(0,)*n]
    elif n == 0:
        return [()]
    else:
        tuples = []
        for i in range(1+min(r,bounds[0])):
            tuples.extend((i,)+t for t in bounded_tuples(r-i,bounds[1:]))
        return tuples
Note that this includes the solution with all 0's -- which you exclude, but that can be filtered out and the keys reintroduced:
def all_combis(elements, r=None):
    if r is None:
        r = sum(elements.values())
    for t in bounded_tuples(r,list(elements.values())):
        if max(t) > 0:
            yield dict(zip(elements.keys(),t))
For example:
example = {1e-8: 3, "k": 2}
for d in all_combis(example):
    print(d)
Output:
{1e-08: 0, 'k': 1}
{1e-08: 0, 'k': 2}
{1e-08: 1, 'k': 0}
{1e-08: 1, 'k': 1}
{1e-08: 1, 'k': 2}
{1e-08: 2, 'k': 0}
{1e-08: 2, 'k': 1}
{1e-08: 2, 'k': 2}
{1e-08: 3, 'k': 0}
{1e-08: 3, 'k': 1}
{1e-08: 3, 'k': 2}
Which is essentially what you have. The code could obviously be tweaked to eliminate dictionary entries with the value 0.
Timing with larger examples seems to suggest that my approach isn't any quicker than yours, though it still might give you some ideas.
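As a rough sketch of the tweak mentioned above (dropping entries whose count is 0), the recursion can be written as a generator and the zero counts filtered out while the keys are reattached. The name all_combis_nonzero is made up for this illustration, and no claim is made about its speed relative to the original:

```python
def bounded_tuples(r, bounds):
    """Yield every tuple t with 0 <= t[i] <= bounds[i] and sum(t) <= r."""
    if not bounds:
        yield ()
        return
    for i in range(min(r, bounds[0]) + 1):
        for rest in bounded_tuples(r - i, bounds[1:]):
            yield (i,) + rest


def all_combis_nonzero(elements, r=None):
    """Hypothetical variant: like all_combis, but without zero-count keys."""
    if r is None:
        r = sum(elements.values())
    keys = list(elements)
    for t in bounded_tuples(r, list(elements.values())):
        # Keep only the keys that actually occur in this combination.
        combo = {k: c for k, c in zip(keys, t) if c > 0}
        if combo:  # skips the all-zero tuple
            yield combo


example = {1e-8: 3, "k": 2}
for d in all_combis_nonzero(example):
    print(d)
```

The combinations come out in the recursion's own order rather than grouped by size, so they would need sorting by sum(d.values()) if the order from the question matters.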

As @John Coleman said, you may be able to speed things up by leaving out the keys.
This recursive approach takes the first element of the list and iterates until either the max sum or the max value of that element is reached, recursing on the rest of the list.
It yields lists of counts, but as @John Coleman also showed, it is easy to add the keys later.
From my tests it appears to run in about half the time of your current implementation.
def all_combis(elements, r=None):
    if r is None:
        r = sum(elements)
    if r == 0:
        yield [0] * len(elements)
        return
    if not elements:
        yield []
        return
    elements = list(elements)
    element = elements.pop(0)
    for i in range(min(element + 1, r + 1)):
        for combi in all_combis(elements, r - i):
            yield [i] + combi

example = {1e-8: 3, "k": 2}
list(all_combis([val for val in example.values()]))
Output:
[[0, 0], [0, 1], [0, 2], [1, 0], [1, 1], [1, 2], [2, 0], [2, 1], [2, 2], [3, 0], [3, 1], [3, 2]]
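To get back to the dictionaries asked for in the question, the count lists can be zipped with the original keys. The following is a minimal sketch: with_keys is a hypothetical helper name, and the count lists are simply copied from the output above so that the snippet runs on its own.

```python
def with_keys(elements, count_lists):
    """Hypothetical helper: turn lists of counts back into {key: count} dicts."""
    keys = list(elements)
    for counts in count_lists:
        # Drop keys with a count of 0 and skip the all-zero combination.
        combo = {k: c for k, c in zip(keys, counts) if c}
        if combo:
            yield combo


example = {1e-8: 3, "k": 2}
# Count lists as produced above for the example.
count_lists = [[0, 0], [0, 1], [0, 2], [1, 0], [1, 1], [1, 2],
               [2, 0], [2, 1], [2, 2], [3, 0], [3, 1], [3, 2]]
for d in with_keys(example, count_lists):
    print(d)
```

Since with_keys accepts any iterable, the generator returned by all_combis can be passed in directly instead of a materialised list.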

Related

Python Find permutable list in a dict list

Given
listOfDict = [{'ref': 1, 'a': 1, 'b': 2, 'c': 3},
{'ref': 2, 'a': 4, 'b': 5, 'c': 6},
{'ref': 3, 'a': 7, 'b': 8, 'c': 9}]
Let's consider a list of integers whose order does not matter, so that all of its permutations are equivalent:
[7,8,9]=[7,9,8]=[8,7,9]=[8,9,7]=[9,7,8]=[9,8,7] # (3!)
Each such list maps to a unique ref, so given (8,7,9), how can I get ref=3?
Also, in the real case I might have up to 10 keys (a,b,c,d,e,f,g,h,i,j)...
You can generate a dictionary that maps the values as frozenset to the value of ref:
listOfDict = [{'ref': 1, 'a': 1, 'b': 2, 'c': 3},
{'ref': 2, 'a': 4, 'b': 5, 'c': 6},
{'ref': 3, 'a': 7, 'b': 8, 'c': 9}]
keys = ['a', 'b', 'c']
out = {frozenset(d[k] for k in keys): d['ref'] for d in listOfDict}
# {frozenset({1, 2, 3}): 1,
# frozenset({4, 5, 6}): 2,
# frozenset({7, 8, 9}): 3}
example:
check = frozenset((8,7,9))
out[check]
# 3
but I don't know in advance the name of the other keys!
Then use this approach:
out = {}
for d in listOfDict:
    d2 = d.copy() # this is to avoid modifying the original object
    out[frozenset(d2.values())] = d2.pop('ref')
out
or as a comprehension:
out = dict(((d2:=d.copy()).pop('ref'), frozenset(d2.values()))[::-1]
           for d in listOfDict)
Here is a commented solution to your problem. The idea is to compare the sorted list of the values in a, b, c etc with the sorted values in list_of_ints. The sorted values will be the same for all permutations of a given set of numbers.
def get_ref(list_of_ints):
    # Loop through dictionaries in listOfDict.
    for dictionary in listOfDict:
        # Get list of values in each dictionary.
        vals = [dictionary[key] for key in dictionary if key != "ref"]
        if sorted(vals) == sorted(list_of_ints):
            # If sorted values are equal to sorted list of ints, return ref.
            return dictionary["ref"]
By the way, I believe it would be cleaner to structure this data as a dict of dicts in the following way:
dicts = {
    1: {'a': 1, 'b': 2, 'c': 3},
    2: {'a': 4, 'b': 5, 'c': 6},
    3: {'a': 7, 'b': 8, 'c': 9}
}
The code would then be:
def get_ref(list_of_ints):
    for ref, dictionary in dicts.items():
        if sorted(dictionary.values()) == sorted(list_of_ints):
            return ref
Assuming that all integers in the permutations are unique, the code can be simplified further using sets instead of sorted lists.
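A minimal sketch of that set-based variant, under the same assumption that the integers within each entry are unique (otherwise duplicates would collapse and two different lists could compare equal). Returning None when nothing matches is a choice made for this illustration:

```python
dicts = {
    1: {'a': 1, 'b': 2, 'c': 3},
    2: {'a': 4, 'b': 5, 'c': 6},
    3: {'a': 7, 'b': 8, 'c': 9}
}


def get_ref(list_of_ints):
    wanted = set(list_of_ints)  # build the set once, outside the loop
    for ref, dictionary in dicts.items():
        # Sets ignore order, so every permutation compares equal.
        if set(dictionary.values()) == wanted:
            return ref
    return None  # no entry matches


print(get_ref([8, 7, 9]))
```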
Since listOfDict is a list of dicts, you can loop over it and record the ref number of each dict:
for i in listOfDict:
    ref_num=i["ref"]
To turn the dictionary's values into a list, simply use:
    z=list(i.values())
The last step is to check whether the values after the first one (this assumes "ref" is the first key of each dict) match the input list; if so, return the ref number. Comparing sorted copies makes the check independent of the order of the input:
    if sorted(z[1:])==sorted(InputList):
        return ref_num
The complete code looks like this:
listOfDict = [
    {"ref": 1,
     "a": 1,
     "b": 2,
     "c": 3},
    {"ref": 2,
     "a": 4,
     "b": 5,
     "c": 6},
    {"ref": 3,
     "a": 7,
     "b": 8,
     "c": 9},]
def find_ref_Num(InputList):
    for i in listOfDict:
        ref_num=i["ref"]
        z=list(i.values())
        if sorted(z[1:])==sorted(InputList):
            return ref_num
print ("your ref number is: "+str(find_ref_Num([7,8,9])))

Create leaf dictionaries from a parent dictionary

There is a dictionary, say d, and an integer n > 0:
d = {
'leaf1': 1,
'leaf2': 2,
'leaf3': 3,
'leaf4': 4,
'leaf5': 5,
'leaf6': 6
}
I want to create a list of dictionaries (say b), taking n consecutive items from d at a time and wrapping around at the end.
When n = 1, b is
b = [{'leaf1': 1},
{'leaf2': 2},
{'leaf3': 3},
{'leaf4': 4},
{'leaf5': 5},
{'leaf6': 6}]
when n = 2, b is
b = [{'leaf1': 1, 'leaf2': 2 },
{'leaf2': 2, 'leaf3': 3},
{'leaf3': 3,'leaf4': 4},
{'leaf4': 4,'leaf5': 5},
{'leaf5': 5,'leaf6': 6},
{'leaf6': 6,'leaf1': 1}]
when n = 3, b will be
b = [{'leaf1': 1, 'leaf2': 2 , 'leaf3': 3},
{'leaf2': 2, 'leaf3': 3,'leaf4': 4},
{'leaf3': 3,'leaf4': 4,'leaf5': 5},
{'leaf4': 4,'leaf5': 5,'leaf6': 6},
{'leaf5': 5,'leaf6': 6,'leaf1': 1},
{'leaf6': 6,'leaf1': 1,'leaf2': 2}]
To create a list of dictionaries with n elements at a time, first loop over range(len(d.keys())), i.e. the total number of elements in the dictionary, then loop through the indices that have to be added to the current list element. Because an index may become greater than or equal to the length of d.keys(), I have added a check in the inner loop that subtracts len(d.keys()) from it, so that it wraps around and starts from 0 again.
def createLeafDictionary(d, n):
    array = []
    for i in range(len(d.keys())):
        leaf = {}
        keyList = list(d.keys())
        for j in range(i, i+n):
            if j >= len(d.keys()):
                j = j - len(d.keys())
            leaf[keyList[j]] = d[keyList[j]]
        array.append(leaf)
    return array
The keyList variable holds a list of the dictionary's keys, which lets us access the dictionary's values by index.
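The wrap-around check can also be expressed with the modulo operator. The following is a sketch of the same idea rather than a drop-in replacement; the snake_case name create_leaf_dictionary is introduced here only to tell it apart from the function above:

```python
def create_leaf_dictionary(d, n):
    """Return one dict per starting position, each holding n consecutive items."""
    items = list(d.items())
    size = len(items)
    # (i + j) % size wraps indices past the end back to the start.
    return [dict(items[(i + j) % size] for j in range(n)) for i in range(size)]


d = {'leaf1': 1, 'leaf2': 2, 'leaf3': 3, 'leaf4': 4, 'leaf5': 5, 'leaf6': 6}
for leaf in create_leaf_dictionary(d, 2):
    print(leaf)
```

Unlike the explicit subtraction, the modulo form also keeps indices in range when n is larger than the number of items; the repeated keys then simply collapse inside each dict.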

Dictionary values and list values within a function

I have a dictionary with product names and prices:
products = {'a': 2, 'b': 3, 'c': 4, 'd': 5, 'e': 6, 'f': 7, 'g': 8}
And a list with amounts of each product:
amounts = [3, 0, 5, 1, 3, 2, 0]
I want to get an output showing the total price of that order.
Not using functions I seem to get it right:
products = {'a': 2, 'b': 3, 'c': 4, 'd': 5, 'e': 6, 'f': 7, 'g': 8}
amounts = [3, 0, 5, 1, 3, 2, 0]
res_list = []
order = []
for value in products.values():
    res_list.append(value)
for i in range(0, len(res_list)):
    order.append(amounts[i] * res_list[i])
    total = sum(order)
print(res_list)
print(order) #this line and the one above are not really necessary
print(total)
Output : 63
But when I try using this code within a function I run into problems. This is what I have tried:
products = {'a': 2, 'b': 3, 'c': 4, 'd': 5, 'e': 6, 'f': 7, 'g': 8}
amounts = [3, 0, 5, 1, 3, 2, 0]
#order = []
def order(prod):
    res_list = []
    for value in prod.values():
        res_list.append(value)
    return res_list
prices = order(products)
print(prices)
def order1(prices):
    order =[]
    for i in range(0, len(prices)):
        order.append(amounts[i] * prices[i])
        total = sum(order)
        return total
print(order1(prices))
It is not working as intended.
Thanks for all the help I am learning.
The immediate problem is that your lines:
        total = sum(order)
        return total
are indented too much, so that they are inside the for loop. Outside of a function, the bug does not matter too much, because all that happens is that the total is recalculated on every iteration but the final value is the one that is used. But inside the function, what will happen is that it will return on the first iteration.
Reducing the indentation so that it is outside the for loop will fix this.
def order1(prices):
    order =[]
    for i in range(0, len(prices)):
        order.append(amounts[i] * prices[i])
    total = sum(order)
    return total
However, separate from that, you are relying on the order within the dictionary, which is only guaranteed for Python 3.7 and more recent. If you want to allow the code to be run reliably on earlier versions of Python, you can use an OrderedDict.
from collections import OrderedDict
products = OrderedDict([('a', 2), ('b', 3), ('c', 4), ('d', 5),
('e', 6), ('f', 7), ('g', 8)])
Incidentally, your order function is unnecessary. If you want to convert products.values() (a dictionary values iterator) to a list, just use:
prices = list(products.values())
Also, in order1 it is unnecessary to build up an order list and sum it - you could use:
total = 0
for i in range(0, len(prices)):
    total += amounts[i] * prices[i]
That is probably enough to be getting on with for now, but if you wish to make a further refinement, look up how zip is used and think about how it could replace the index-based loop over amounts and prices.
Just zip products.values() and amounts, find the product of each pair, and then finally sum the result
>>> products = {'a': 2, 'b': 3, 'c': 4, 'd': 5, 'e': 6, 'f': 7, 'g': 8}
>>> amounts = [3, 0, 5, 1, 3, 2, 0]
>>>
>>> sum(i*j for i,j in zip(products.values(), amounts))
63
You can do this.
products = {'a': 2, 'b': 3, 'c': 4, 'd': 5, 'e': 6, 'f': 7, 'g': 8}
amounts = [3, 0, 5, 1, 3, 2, 0]
def order(products, amounts):
    res_list = []
    order = []
    for value in products.values():
        res_list.append(value)
    for i in range(0, len(res_list)):
        order.append(amounts[i] * res_list[i])
    total = sum(order)
    print(res_list)
    print(order) #this line and the one above are not really necessary
    print(total)
    return(total)
order(products, amounts)
You don't really need to iterate twice assuming that the amount of items in products and in amounts is the same.
products = {'a': 2, 'b': 3, 'c': 4, 'd': 5, 'e': 6, 'f': 7, 'g': 8}
amounts = [3, 0, 5, 1, 3, 2, 0]
def order(products: dict, amounts: list):
    total = 0
    for idx, (_key, val) in enumerate(products.items()):
        total = total + amounts[idx] * val
    return total
print(order(products, amounts))
Note: The order of the items in a dictionary is not guaranteed before Python 3.7, so you might want to look into a data structure that links products and amounts together in a more robust way, for example:
products = {'a': 2, 'b': 3, 'c': 4, 'd': 5, 'e': 6, 'f': 7, 'g': 8}
amounts = {'a': 3, 'b': 0, 'c': 5, 'd': 1, 'e': 3, 'f': 2, 'g': 0}
In this way you could do this:
def order(products: dict, amounts: dict):
    total = 0
    for key, val in products.items():
        total = total + val * amounts[key]
    return total
print(order(products, amounts))
While we're at it, let's get fancy with numpy, since in the end you just want the dot product of prices and amounts:
import numpy as np
total = np.dot(list(products.values()), amounts)
# 63
But seriously, I'd strictly use either lists or dicts for both datasets and not mix them, since mixing can cause serious problems with keeping their order synchronised, even on Python 3.7 with the ordering guarantee mentioned above.

Nesting dictionary algorithm

Suppose I have the following dictionary:
{'a': 0, 'b': 1, 'c': 2, 'c.1': 3, 'd': 4, 'd.1': 5, 'd.1.2': 6}
I wish to write an algorithm which outputs the following:
{
"a": 0,
"b": 1,
"c": {
"c": 2,
"c.1": 3
},
"d":{
"d": 4,
"d.1": {
"d.1": 5,
"d.1.2": 6
}
}
}
Note how the names are repeated inside the dictionary. And some have variable level of nesting (eg. "d").
I was wondering how you would go about doing this, or if there is a python library for this? I know you'd have to use recursion for something like this, but my recursion skills are quite poor. Any thoughts would be highly appreciated.
You can use a recursive function for this or just a loop. The tricky part is wrapping existing values into dictionaries if further child nodes have to be added below them.
def nested(d):
    res = {}
    for key, val in d.items():
        t = res
        # descend deeper into the nested dict
        for x in [key[:i] for i, c in enumerate(key) if c == "."]:
            if x in t and not isinstance(t[x], dict):
                # wrap leaf value into another dict
                t[x] = {x: t[x]}
            t = t.setdefault(x, {})
        # add actual key to nested dict
        if key in t:
            # already exists, go one level deeper
            t[key][key] = val
        else:
            t[key] = val
    return res
Your example:
d = {'a': 0, 'b': 1, 'c': 2, 'c.1': 3, 'd': 4, 'd.1': 5, 'd.1.2': 6}
print(nested(d))
# {'a': 0,
# 'b': 1,
# 'c': {'c': 2, 'c.1': 3},
# 'd': {'d': 4, 'd.1': {'d.1': 5, 'd.1.2': 6}}}
As for how you would go about doing this:
1. Sort the dictionary items.
2. Group the result by index 0 of the keys (the first item in the tuples).
3. Iterate over the groups.
4. If there is more than one item in a group, make a key for the group and add the group's items as its value.
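The steps above could be sketched with itertools.groupby as follows. This is only one possible reading of them: the names nest and nest_dict, the depth parameter, and the choice to group on the dotted prefix of each key (rather than on its first character) are all assumptions made for this illustration.

```python
from itertools import groupby


def nest(items, depth=1):
    """Nest (key, value) pairs that share a dotted prefix of `depth` parts.

    `items` must be sorted by the split key so that pairs sharing a
    prefix are adjacent (groupby only merges neighbouring items).
    """
    def prefix(item):
        return ".".join(item[0].split(".")[:depth])

    result = {}
    for name, group in groupby(items, key=prefix):
        group = list(group)
        if len(group) == 1:
            # A lone item stays a plain leaf under its full key.
            key, value = group[0]
            result[key] = value
        else:
            # Several items share the prefix: the group becomes a sub-dict
            # holding the prefix's own value plus the nested remainder.
            sub = {key: value for key, value in group if key == name}
            rest = [(key, value) for key, value in group if key != name]
            sub.update(nest(rest, depth + 1))
            result[name] = sub
    return result


def nest_dict(d):
    # Sorting by the split key keeps 'c' directly before 'c.1', and so on.
    return nest(sorted(d.items(), key=lambda item: item[0].split(".")))


data = {'a': 0, 'b': 1, 'c': 2, 'c.1': 3, 'd': 4, 'd.1': 5, 'd.1.2': 6}
print(nest_dict(data))
```

Recursing with depth + 1 handles the variable nesting level of keys such as "d.1.2" without any special casing.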
Slightly shorter recursion approach with collections.defaultdict:
from collections import defaultdict
data = {'a': 0, 'b': 1, 'c': 2, 'c.1': 3, 'd': 4, 'd.1': 5, 'd.1.2': 6}
def group(d, p = []):
    _d, r = defaultdict(list), {}
    for n, [a, *b], c in d:
        _d[a].append((n, b, c))
    for a, b in _d.items():
        if (k:=[i for i in b if i[1]]):
            r['.'.join(p+[a])] = {**{i[0]:i[-1] for i in b if not i[1]}, **group(k, p+[a])}
        else:
            r[b[0][0]] = b[0][-1]
    return r
print(group([(a, a.split('.'), b) for a, b in data.items()]))
Output:
{'a': 0, 'b': 1, 'c': {'c': 2, 'c.1': 3}, 'd': {'d': 4, 'd.1': {'d.1': 5, 'd.1.2': 6}}}

Sum Values by Key First X Characters (Python)

I have a dictionary like so (but much longer):
codes = {
'113110': 7, '113310': 1, '213111': 1,
'213112': 3, '236115': 2, '236220': 1,
'238190': 1, '238330': 1, '238990': 2,
'311612': 1, '321214': 1, }
I want to know the sum value of all keys grouped by the first two digits. So, '11' should be 8. But if I check like the following, an occurrence of '11' anywhere in the key will count.
group_11 = sum([ v for k,v in codes.items() if '11' in k])
# Returns 15 instead of 8
I've tried using startswith, but I'm not sure how it works in this context. Not like this:
group_11 = sum([ v for k,v in codes.items() if any(k.startswith('11')])
I have 20 groups to check against, but I want to be able to total any set of keys grouping by first x characters as the groupings could change in the future.
You can use itertools.groupby to sort (the sorting is important for groupby to work properly) and group your dict's items by the first two key chars and sum the values for each group:
from itertools import groupby
d = {
    k: sum(item[1] for item in g)
    for k, g in groupby(sorted(codes.items()), key=lambda item: item[0][:2])
}
d
{'11': 8, '32': 1, '31': 1, '21': 4, '23': 7}
You could convert all the items in codes to Counter and sum them together:
from collections import Counter
codes = {
    '113110': 7, '113310': 1, '213111': 1,
    '213112': 3, '236115': 2, '236220': 1,
    '238190': 1, '238330': 1, '238990': 2,
    '311612': 1, '321214': 1
}
# On Python 3 use codes.items(); iteritems() only exists on Python 2.
sum((Counter({k[:2]: v}) for k, v in codes.items()), Counter()) # Counter({'11': 8, '23': 7, '21': 4, '32': 1, '31': 1})
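Because the question asks for grouping by the first x characters, where x may change later, here is a minimal sketch that makes the prefix length a parameter. The function name sum_by_prefix and its length argument are made up for this example; it also shows how startswith can total a single group.

```python
from collections import defaultdict


def sum_by_prefix(codes, length=2):
    """Total the values of `codes`, grouped by the first `length` characters of each key."""
    totals = defaultdict(int)
    for key, value in codes.items():
        totals[key[:length]] += value
    return dict(totals)


codes = {
    '113110': 7, '113310': 1, '213111': 1,
    '213112': 3, '236115': 2, '236220': 1,
    '238190': 1, '238330': 1, '238990': 2,
    '311612': 1, '321214': 1
}
print(sum_by_prefix(codes))     # groups by the first two characters
print(sum_by_prefix(codes, 3))  # groups by the first three characters

# A single group with startswith (no any() needed):
group_11 = sum(v for k, v in codes.items() if k.startswith('11'))
print(group_11)
```

This makes a single pass over the dictionary and needs no sorting, which itertools.groupby would require.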
