Getting all possible combinations from a list with duplicate elements?

I have a list with repeated elements, for example array = [2,2,2,7].
If I use the solution suggested in this answer (using itertools.combinations()), I get:
()
(7,)
(2,)
(2,)
(2,)
(7, 2)
(7, 2)
(7, 2)
(2, 2)
(2, 2)
(2, 2)
(7, 2, 2)
(7, 2, 2)
(7, 2, 2)
(2, 2, 2)
(7, 2, 2, 2)
As you can see, some of the 'combinations' are repeated, e.g. (7, 2, 2) appears 3 times.
The output I would like is:
()
(7,)
(2,)
(7, 2)
(2, 2)
(7, 2, 2)
(2, 2, 2)
(7, 2, 2, 2)
I could check the output for repeated combinations, but I don't feel like that is the best solution to this problem.

You can take the set of the combinations of each length and then chain them together:
from itertools import chain, combinations
arr = [2, 2, 2, 7]
list(chain.from_iterable(set(combinations(arr, i)) for i in range(len(arr) + 1)))
# [(), (7,), (2,), (2, 7), (2, 2), (2, 2, 2), (2, 2, 7), (2, 2, 2, 7)]
# (the order within each length may differ, because sets are unordered)
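The set-based answer relies on equal sub-multisets producing identical tuples. That holds for arr above because it is already sorted and itertools.combinations() preserves input order, but for unsorted input such as [2, 7, 2] it would yield both (2, 7) and (7, 2). A minimal sketch that sorts first (the function name unique_combinations is hypothetical, and the per-length sorted() call is only there to make the output order deterministic):

```python
from itertools import chain, combinations

def unique_combinations(items):
    """Return every distinct sub-multiset of `items` as a tuple.

    Sorting first means equal multisets always come out as identical tuples,
    because itertools.combinations() keeps elements in input order.
    """
    items = sorted(items)
    return list(chain.from_iterable(
        # set() drops repeats; sorted() just fixes the order within each length
        sorted(set(combinations(items, r))) for r in range(len(items) + 1)
    ))

print(unique_combinations([2, 7, 2, 2]))
# [(), (2,), (7,), (2, 2), (2, 7), (2, 2, 2), (2, 2, 7), (2, 2, 2, 7)]
```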

You would need to maintain a set of tuples that are all sorted in the same fashion (here in descending order, to match your desired output):
import itertools as it
array = [2, 2, 2, 7]
desired = {(), (7,), (2,), (7, 2), (2, 2), (7, 2, 2), (2, 2, 2), (7, 2, 2, 2)}
result = set()
for i in range(len(array) + 1):
    for combo in it.combinations(array, i):
        # Sort each combination so equal multisets map to the same tuple
        result.add(tuple(sorted(combo, reverse=True)))
print(result == desired)  # True
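If you need the results lazily and in first-seen order, rather than as one unordered set, the same canonical-sort idea can drive a generator that remembers what it has already yielded. This is a sketch; iter_unique_combinations is a hypothetical name, and it still walks all 2**len(items) combinations, it just skips the repeats:

```python
from itertools import combinations

def iter_unique_combinations(items):
    """Yield each distinct combination once, sorted in descending order."""
    seen = set()
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            key = tuple(sorted(combo, reverse=True))  # canonical form
            if key not in seen:
                seen.add(key)
                yield key

print(list(iter_unique_combinations([2, 2, 2, 7])))
# [(), (2,), (7,), (2, 2), (7, 2), (2, 2, 2), (7, 2, 2), (7, 2, 2, 2)]
```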

Without using itertools.combinations() and sets:
from collections import Counter
import itertools

def powerset(bag):
    # Choose a count from 0..m for each distinct value with multiplicity m
    for v in itertools.product(*(range(r + 1) for r in bag.values())):
        yield Counter(dict(zip(bag.keys(), v)))

array = [2, 2, 2, 7]
for s in powerset(Counter(array)):
    # Convert the Counter object back to a list
    s = list(s.elements())
    print(s)
I believe your problem could alternatively be stated as finding the power set of a multiset, at least according to this definition.
However, it's worth noting that the method shown above will be slower than the solutions in other answers, such as this one, which simply collect the results of itertools.combinations() into a set to remove duplicates. Despite seeming less efficient, that approach is still faster in practice, because iterating in Python is much slower than in C (see itertoolsmodule.c for the implementation of itertools.combinations()).
In my limited testing, the method shown in this answer starts to outperform the set-based method at approximately 14 distinct elements in your array, each with an average multiplicity of 2; from that point on, the set-based method falls behind and runs many times slower. However, the running time of either method under those circumstances is over 30 seconds, so if performance is a concern, you might want to consider implementing this part of your application in C.
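The crossover is easier to see from the sizes involved: itertools.combinations() over all lengths visits 2**n index subsets, while the number of distinct sub-multisets is the product of (multiplicity + 1) over the distinct values. A small sketch comparing the two counts (both helper names are hypothetical; math.prod needs Python 3.8+):

```python
from collections import Counter
from math import prod

def count_index_subsets(items):
    # Every position is either in or out: 2**n tuples from combinations()
    return 2 ** len(items)

def count_sub_multisets(items):
    # Each distinct value with multiplicity m can appear 0..m times
    return prod(m + 1 for m in Counter(items).values())

print(count_index_subsets([2, 2, 2, 7]), count_sub_multisets([2, 2, 2, 7]))  # 16 8

# 14 distinct values, each appearing twice
bag = list(range(14)) * 2
print(count_index_subsets(bag), count_sub_multisets(bag))  # 268435456 4782969
```

The set-based approach generates all of the first number (in C) before deduplicating, whereas the Counter-based generator produces only the second (in Python), which is consistent with the crossover described above.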

Related

How to append to an itertools generator

Is there a simple way to append an integer to each item in an itertools iterator? If I use itertools.product, I do not receive the expected output. For example:
>>> for i in itertools.product(itertools.combinations(np.arange(4),2),(4,)):
...     print(i)
...
((0, 1), 4)
((0, 2), 4)
((0, 3), 4)
((1, 2), 4)
((1, 3), 4)
((2, 3), 4)
But what I would expect (and want) is:
>>> for i in itertools.product(itertools.combinations(np.arange(4),2),(4,)):
...     print(i)
...
(0, 1, 4)
(0, 2, 4)
(0, 3, 4)
(1, 2, 4)
(1, 3, 4)
(2, 3, 4)
I know that I can "flatten" the output, but I would rather construct the iterator to produce tuples, not tuples of tuples.
I have many different iterators floating around, and I want to keep the code the same for products of itertools iterators and plain itertools iterators.

These two alternatives each produce an iterator. In the first case, the iterator is created by a generator expression. In the second, the iterator is created by the use of a generator function.
In [9]: for i in (tup + (4,) for tup in itertools.combinations(np.arange(4),2)):
   ...:     print(i)
   ...:
(0, 1, 4)
(0, 2, 4)
(0, 3, 4)
(1, 2, 4)
(1, 3, 4)
(2, 3, 4)
A generator function might be more readable at the call site, especially if the function name describes its behavior.
import itertools
import numpy as np

def adder(it, addend):
    # Yield each item from `it` with `addend` concatenated onto it
    for x in it:
        yield x + addend

for i in adder(itertools.combinations(np.arange(4),2), (4,)):
    print(i)
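A third option, not taken from the answers above, avoids writing a generator at all: map() accepts several iterables, and operator.add on two tuples concatenates them. A sketch (append_suffix is a hypothetical helper name; range(4) stands in for np.arange(4) so the example needs only the standard library):

```python
import itertools
import operator

def append_suffix(iterable, suffix):
    # Lazily yields tup + suffix for each tuple; itertools.repeat supplies the
    # same suffix to every call, and map() stops when `iterable` runs out.
    return map(operator.add, iterable, itertools.repeat(suffix))

for i in append_suffix(itertools.combinations(range(4), 2), (4,)):
    print(i)
```

Because it returns an ordinary iterator, the result can be passed anywhere a plain itertools iterator is expected, which fits the goal of keeping the calling code the same.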

Zipping a set and a list in python

I'm trying to find duplicates in a list. I want to preserve the values and insert them into a tuple with their number of occurrences.
For example:
list_of_n = [2, 3, 5, 5, 5, 6, 2]
occurance_of_n = list(zip(set(list_of_n), [list_of_n.count(n) for n in set(list_of_n)]))
# [(2, 2), (3, 1), (5, 3), (6, 1)]
This works fine with small sets. My question is: as list_of_n gets larger, will I have to worry about arg1 and arg2 in zip(arg1, arg2) not lining up correctly if they're the same set?
That is, is there a conceivable future where I call zip() and it accidentally aligns index [0] of list_of_n in arg1 with some other index of list_of_n in arg2?
(In case it's not clear, I'm converting the list to a set in arg2 for speed, and in arg1 on the assumption that zip will behave better if both arguments are the same set.)

Since your sample output preserves the order of appearance, you might want to go with a collections.OrderedDict to gather the counts:
from collections import OrderedDict

list_of_n = [2, 3, 5, 5, 5, 6, 2]
d = OrderedDict()
for x in list_of_n:
    d[x] = d.get(x, 0) + 1
occurance_of_n = list(d.items())
# [(2, 2), (3, 1), (5, 3), (6, 1)]
If order does not matter, the appropriate approach is using a collections.Counter:
from collections import Counter
occurance_of_n = list(Counter(list_of_n).items())
Note that both approaches require only one pass over the list. Your version could be amended to something like:
occurance_of_n = list(set((n, list_of_n.count(n)) for n in set(list_of_n)))
# [(6, 1), (3, 1), (5, 3), (2, 2)]
but the repeated calls to list.count make a full pass over the original list for each unique element.
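Since Python 3.7, plain dicts (and therefore Counter, which is a dict subclass) remember insertion order, so on current Python a Counter alone gives the counts in order of first appearance. A sketch that also filters down to the values that actually repeat, since finding duplicates was the original goal (the variable names are illustrative):

```python
from collections import Counter

list_of_n = [2, 3, 5, 5, 5, 6, 2]

# One pass over the list; keys keep first-seen order on Python 3.7+
counts = Counter(list_of_n)
occurrences = list(counts.items())
# Keep only values that appear more than once
duplicates = [(n, c) for n, c in counts.items() if c > 1]

print(occurrences)  # [(2, 2), (3, 1), (5, 3), (6, 1)]
print(duplicates)   # [(2, 2), (5, 3)]
```

Each tuple is built from a single key and its own count, so the alignment question about zip() never arises.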

Build combinations without pair repetition

Given a dataset, I want to build combinations (still using itertools.combinations).
To reduce the number of combinations, n!/(r!(n-r)!), I want to omit combinations whose pairs are already included in other combinations.
As a small example, all of combinations(range(9), 3) looks like:
(0,1,2), (0,1,3), (0,1,4),... (0,1,8),... (0,7,8), (1,2,3),...
So the pair (0, 1) is part of 7 tuples, and the same holds for every other pair.
The full desired output for range(9), 3:
(0, 1, 2)
(0, 3, 4)
(0, 5, 6)
(0, 7, 8)
(1, 3, 6)
(1, 4, 7)
(1, 5, 8)
(2, 3, 8)
(2, 4, 5)
(2, 6, 7)
(3, 5, 7)
(4, 6, 8)
Building tuples of length r from a range of n elements uses n*(n-1)/2 pairs, and since each tuple covers r*(r-1)/2 of them, the output should contain n*(n-1)/(r*(r-1)) tuples.
How can I build this output?
What is the scientific name for this "combinations with pair omitting" structure?

For tuples of length 3:
from pprint import pprint

def a(s):
    if len(s) < 3:
        return []
    s = list(s)
    z = s.pop(0)
    s1, s2 = s[0::2], s[1::2]
    l = []
    for x, y in zip(s1, s2):
        l += [(z, x, y)]
    z1, z2 = s1.pop(0), s2.pop(0)
    if len(s1) > 1 and len(s2) > 1:
        # s2[1:] + s2[:1] rotates left by one, like numpy.roll(s2, -1)
        for x, y in sorted(map(sorted, zip(s1, s2[1:] + s2[:1]))):
            l += [(z1, x, y)]
    if len(s1) > 2 and len(s2) > 2:
        # s2[-1:] + s2[:-1] rotates right by one, like numpy.roll(s2, 1)
        for x, y in sorted(map(sorted, zip(s1, s2[-1:] + s2[:-1]))):
            l += [(z2, x, y)]
    # Recurse on each half; += extends l instead of nesting a list in it
    if len(s1) >= 3:
        l += a(s1)
    if len(s2) >= 3:
        l += a(s2)
    return l

s = range(9)
pprint(a(s))
Using recursion, this could be extended to build tuples of other lengths.
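A candidate output can be checked mechanically: every pair drawn from range(n) must occur in exactly one tuple. (In combinatorics such a collection is called a Steiner system S(2, r, n); for r = 3 it is a Steiner triple system.) A sketch of that check, with hypothetical helper names, run against the desired output from the question:

```python
from collections import Counter
from itertools import combinations

def pair_counts(blocks):
    """Count how often each unordered pair occurs across all blocks."""
    counts = Counter()
    for block in blocks:
        counts.update(combinations(sorted(block), 2))
    return counts

def covers_each_pair_once(blocks, n):
    """True if every pair from range(n) appears in exactly one block."""
    counts = pair_counts(blocks)
    every_pair = set(combinations(range(n), 2))
    return set(counts) == every_pair and all(c == 1 for c in counts.values())

wanted = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (0, 7, 8), (1, 3, 6), (1, 4, 7),
          (1, 5, 8), (2, 3, 8), (2, 4, 5), (2, 6, 7), (3, 5, 7), (4, 6, 8)]
print(covers_each_pair_once(wanted, 9))  # True
print(len(wanted) == 9 * 8 // (3 * 2))   # True
```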

Python generating all nondecreasing sequences

I am having trouble finding a way to do this in a Pythonic way. I assume I can use itertools somehow because I've done something similar before but can't remember what I did.
I am trying to generate all non-decreasing lists of length L where each element can take on a value between 1 and N. For example, if L=3 and N=3, then [1,1,1],[1,1,2],[1,1,3],[1,2,2],[1,2,3], etc.

You can do this using itertools.combinations_with_replacement:
>>> from itertools import combinations_with_replacement
>>> L, N = 3, 3
>>> cc = combinations_with_replacement(range(1, N+1), L)
>>> for c in cc: print(c)
(1, 1, 1)
(1, 1, 2)
(1, 1, 3)
(1, 2, 2)
(1, 2, 3)
(1, 3, 3)
(2, 2, 2)
(2, 2, 3)
(2, 3, 3)
(3, 3, 3)
This works because combinations_with_replacement() emits elements in the order they appear in the input, and since we're passing in a nondecreasing sequence, we only get nondecreasing tuples out.
(It's easy to convert to lists if you really need those as opposed to tuples.)
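For example, a small sketch that returns lists and checks the count against the number of multisets of size L drawn from N values, C(N + L - 1, L) (nondecreasing_lists is a hypothetical name; math.comb needs Python 3.8+):

```python
from itertools import combinations_with_replacement
from math import comb

def nondecreasing_lists(length, n):
    """All nondecreasing lists of `length` values drawn from 1..n."""
    return [list(c) for c in combinations_with_replacement(range(1, n + 1), length)]

seqs = nondecreasing_lists(3, 3)
print(seqs[:3])                       # [[1, 1, 1], [1, 1, 2], [1, 1, 3]]
print(len(seqs), comb(3 + 3 - 1, 3))  # 10 10
```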

Interact with Each Iteration of itertools.product

I'm using the product method from the itertools python library to calculate all permutations of items in a list of lists. As an example:
>>> import itertools
>>> elems = [[1,2],[4,5],[7,8]]
>>> permutations = list(itertools.product(*elems))
>>> print(permutations)
# this prints [(1, 4, 7), (1, 4, 8), (1, 5, 7), (1, 5, 8), (2, 4, 7), (2, 4, 8), (2, 5, 7), (2, 5, 8)]
How can I check each permutation as it is calculated, rather than returning the entire set of permutations at once? The problem I am currently facing is that I run into a Python MemoryError while running my script because too many permutations are being generated. I only care about a single one of the permutations. If I can check each permutation as it is generated, I can store just a single value rather than storing every possible permutation. Is this possible, and if so, how would I go about implementing it?

You can just do it in a for loop, one at a time:
for a_perm in itertools.product(*elems):
    print(a_perm)
itertools.product() gives you an iterator, which you can iterate over one item at a time; it is the list() call in your code that stores every tuple at once.
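Since you only care about one result, you can also let a consuming function do the loop for you. This sketch assumes, purely for illustration, that "the one you care about" is either the tuple with the largest sum or the first tuple summing to 14; substitute your own criterion:

```python
import itertools

elems = [[1, 2], [4, 5], [7, 8]]

# max() consumes the iterator one tuple at a time and keeps only the current
# best, so the full product is never stored. sum is a stand-in criterion.
best = max(itertools.product(*elems), key=sum)
print(best)  # (2, 5, 8)

# next() over a generator expression stops at the first match and returns
# the default (None) if nothing matches. The condition is illustrative.
first_match = next((p for p in itertools.product(*elems) if sum(p) == 14), None)
print(first_match)  # (1, 5, 8)
```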
