I'm looking for combinations with a limited number of repetitions. I know Python's itertools already has functions for no repetitions (combinations) and for unlimited repetitions (combinations_with_replacement), but I can't find anything for this case.
To recall: in a combination we pick n elements (here with a maximum number of repetitions), and the order of the elements doesn't matter: (A, B, C) is the same as (B, C, A).
Here is an example:
A, B, C
picking 2, repeated 0:
A, B
A, C
B, C
picking 2, repeated 1:
A, A
A, B
A, C
B, B
B, C
C, C
The functions combinations and combinations_with_replacement don't give this behavior; what I'm looking for is something like a mix of both.
To be clear, the number of repetitions is a maximum, so with ABCD, picking 3 with up to 2 repetitions, AAB would be valid.
Is there a library or module with something like this?
Considerations:
I apply this to big lists, so even though I can filter the results of combinations_with_replacement, doing it one by one is not very efficient, and I need a generator so as not to overload the RAM.
I would like to avoid this method, or find a more efficient one:
import itertools

def Combs2(data, pick, rep):
    for i in itertools.combinations_with_replacement(data, pick):
        s = set(i)
        show = True
        for j in s:
            # each value may appear at most rep + 1 times
            if i.count(j) > (rep+1):
                show = False
                break
        if show:
            yield i
I tested this code, and it is so slow that it kills all the multiprocessing I'm using: instead of using all the cores, it ends up using 1.
Edit:
To show the difference from combinations_with_replacement:
We have ABCDE; let's pick 3 elements with 1 repetition.
Examples: AAB, BBC, BCA
AAA would not be valid.
We can't get this with combinations_with_replacement or combinations.
There is a one-to-one correspondence between the combinations that you seek and k-tuples of bounded non-negative integers with a given target sum. For example,
AAB, when drawn from ABC, consists of 2 A's, 1 B and 0 C's, so the correspondence is AAB <=> (2,1,0).
One strategy is to write a generator for such tuples and then decode it as it generates to get the output that you want:
# generates multisets of a given size but with bounded multiplicity

# The following generator generates all tuples of
# non-negative integers of length k
# bounded by d and summing to n
# shouldn't be called if n > k*d
def f(n,k,d,path = ()):
    if n == 0:
        yield path + (0,)*k
    elif k == 1:
        yield path + (n,)
    else:
        lower = max(0,n - (k-1)*d)
        upper = min(n,d)
        for i in range(lower,upper+1):
            yield from f(n-i,k-1,d,path + (i,))

def bounded_combos(items,count,bound):
    for t in f(count,len(items),bound):
        combo = []
        # t[i] is how many copies of items[i] go into this combination
        for item,multiplicity in zip(items,t):
            combo.extend([item]*multiplicity)
        yield combo
For example,
>>> for c in bounded_combos('ABC',3,2): print(c)
['B', 'C', 'C']
['B', 'B', 'C']
['A', 'C', 'C']
['A', 'B', 'C']
['A', 'B', 'B']
['A', 'A', 'C']
['A', 'A', 'B']
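Note that `bound` here caps the multiplicity of each item (at most `bound` copies), whereas the question counts extra repetitions, so its `rep` corresponds to `bound = rep + 1`. A minimal adapter sketch, assuming that reading of the question; `combos_with_max_repeats` is a hypothetical name, and its guard skips the case the comment on `f` warns about (`n > k*d`):

```python
import itertools

def f(n, k, d, path=()):
    # All k-tuples of non-negative ints, each <= d, summing to n (as above).
    if n == 0:
        yield path + (0,) * k
    elif k == 1:
        yield path + (n,)
    else:
        lower = max(0, n - (k - 1) * d)
        upper = min(n, d)
        for i in range(lower, upper + 1):
            yield from f(n - i, k - 1, d, path + (i,))

def bounded_combos(items, count, bound):
    # Decode each multiplicity tuple into the actual items (as above).
    for t in f(count, len(items), bound):
        combo = []
        for item, multiplicity in zip(items, t):
            combo.extend([item] * multiplicity)
        yield combo

def combos_with_max_repeats(items, pick, rep):
    """Hypothetical adapter: each item may appear at most rep + 1 times."""
    items = list(items)
    bound = rep + 1
    if pick > len(items) * bound:
        return  # no valid combination exists; f must not be called here
    for combo in bounded_combos(items, pick, bound):
        yield tuple(combo)

# The question's example: ABCDE, pick 3, at most 1 repetition.
for c in combos_with_max_repeats('ABCDE', 3, 1):
    print(c)
```

Because the `lower`/`upper` bounds in `f` only ever lead to valid tuples, nothing is generated and then thrown away, unlike filtering the output of combinations_with_replacement.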
In terms of the number tuples:
>>> for t in f(3,3,2): print(t)
(0, 1, 2)
(0, 2, 1)
(1, 0, 2)
(1, 1, 1)
(1, 2, 0)
(2, 0, 1)
(2, 1, 0)
As for how many there are, you can work out a recursive formula for the number of combinations, with the key idea that if
a_1 + a_2 + ... + a_k = n
then
a_2 + a_3 + ... + a_k = n - a_1
hence the count for a given k and n can be reduced to counts for k-1 and smaller n. A naive implementation of this recursion would involve repeatedly evaluating the same expression, but memoization makes it feasible for large k,n:
def g(n,k,d):
    memo = {} #memoization dictionary
    def h(n,k):
        if (n,k) in memo:
            return memo[(n,k)]
        else:
            if n == 0:
                count = 1
            elif k == 1:
                count = 1 if n <= d else 0  # a single item supplies at most d copies
            else:
                lower = max(0,n - (k-1)*d)
                upper = min(n,d)
                count = sum(h(n-i,k-1) for i in range(lower,upper+1))
            memo[(n,k)] = count
            return count
    return h(n,k)
Examples:
>>> g(3,3,2)
7
>>> g(10,5,5)
651
>>> g(100,20,10)
18832730699014127291
I don't know of any closed-form formula for these. As an experiment, I evaluated
','.join(str(g(n,n,2)) for n in range(1,11))
and pasted it into the search bar of the On-Line Encyclopedia of Integer Sequences and got a very interesting hit: A002426, the central trinomial coefficients, which are discussed here. If you rerun this experiment with different choices of n,k,d, you might stumble upon a nice formula for the overall function.
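A standard observation explains that hit: choosing a_i copies of item i, with 0 <= a_i <= d, corresponds to picking the term x^(a_i) from the factor (1 + x + ... + x^d), so g(n,k,d) is the coefficient of x^n in (1 + x + ... + x^d)^k. With k = n and d = 2 that is the coefficient of x^n in (1 + x + x^2)^n, which is how the central trinomial coefficients are defined. A small sketch that expands the polynomial directly as a cross-check (`poly_coeff` is a hypothetical helper, not part of the answer above):

```python
def poly_coeff(n, k, d):
    """Coefficient of x**n in (1 + x + ... + x**d)**k, by repeated multiplication."""
    coeffs = [1]  # coefficients of the constant polynomial 1
    for _ in range(k):
        new = [0] * (len(coeffs) + d)
        for i, c in enumerate(coeffs):
            for j in range(d + 1):
                new[i + j] += c  # multiply by (1 + x + ... + x**d)
        coeffs = new
    return coeffs[n] if n < len(coeffs) else 0

print(poly_coeff(3, 3, 2))    # 7, same as g(3,3,2)
print(poly_coeff(10, 5, 5))   # 651, same as g(10,5,5)
print(','.join(str(poly_coeff(n, n, 2)) for n in range(1, 11)))
# 1,3,7,19,51,141,393,1107,3139,8953
```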
I have a list of unique items, such as this one:
myList = ['a','b','c','d','e','f','g','h','i','j']
I want to find every possible way to split this list in half. For example, this is one way:
A = ['g','b','j','d','e']
B = ['f','a','h','i','c']
The first thing I thought of was to find all the combinations of 5 items from the list, and make this be sub-list A, and then everything else would be sub-list B:
import itertools

for combination in itertools.combinations(myList, 5):
    A = combination
    B = [x for x in myList if x not in A]  # everything else
This however does not work, as I will get every result twice. For example, if one of the combinations is ['a','b','c','d','e'] then, from this loop, I will get:
A = ['a','b','c','d','e']
B = ['f','g','h','i','j']
But then later on, when the combination ['f','g','h','i','j'] comes up, I will also get:
A = ['f','g','h','i','j']
B = ['a','b','c','d','e']
For my purposes, these two splits are the same, so I should only get this result once. How can I achieve this?
EDIT: And to clarify, I want every single possible way to split the list (without any element appearing in both A and B at the same time, of course).
Liberal application of sets can solve this quite easily:
import itertools

def split(items):
    items = frozenset(items)
    combinations = (frozenset(combination) for combination in itertools.combinations(items, len(items) // 2))
    return {frozenset((combination, items - combination)) for combination in combinations}
Which works as expected:
>>> split([1, 2, 3, 4])
{
frozenset({frozenset({2, 4}), frozenset({1, 3})}),
frozenset({frozenset({1, 4}), frozenset({2, 3})}),
frozenset({frozenset({3, 4}), frozenset({1, 2})})
}
This follows your basic idea—we use the combinations of five from the original large set of items, and then get the other elements (which is easy enough with a set difference). We can then simplify down the duplicates by making the pairs sets as well, so the order doesn't matter and the two in any order are treated as equivalent. We then make the outer structure a set, which means the duplicates are removed.
The use of frozenset over set here is because mutable sets can't be members of other sets. We don't need any mutation here though, so that isn't a problem.
Obviously this isn't the most efficient possible solution, as we are still generating the duplicates, but it is probably the easiest and most foolproof way of implementing it.
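If generating the duplicates matters, a common trick avoids them entirely: every split has exactly one half containing the first item, so it is enough to pin that item into A and choose the other len(items) // 2 - 1 members of A from the remaining items. A sketch, assuming distinct items and an even length (`split_halves` is a hypothetical name):

```python
import itertools

def split_halves(items):
    """Yield each unordered split of `items` into two equal halves exactly once.

    Assumes the items are distinct and their number is even.
    """
    items = list(items)
    if len(items) % 2:
        raise ValueError("need an even number of items")
    if not items:
        yield (), ()
        return
    first, rest = items[0], items[1:]
    half = len(items) // 2
    # Pinning `first` into A means a split and its mirror image can't both appear.
    for others in itertools.combinations(rest, half - 1):
        chosen = set(others)
        A = (first,) + others
        B = tuple(x for x in rest if x not in chosen)
        yield A, B

for A, B in split_halves(['a', 'b', 'c', 'd']):
    print(A, B)
```

For the ten-item list in the question this yields C(9, 4) = 126 splits, half of the C(10, 5) = 252 combinations that the plain combinations loop visits.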
This also leads pretty clearly into a simple upgrade for the later extension to the problem you give in the comments:
def split(items, first_length=None):
    items = frozenset(items)
    if first_length is None:
        first_length = len(items) // 2
    combinations = (frozenset(combination) for combination in itertools.combinations(items, first_length))
    return {frozenset((combination, items - combination)) for combination in combinations}
Your basic idea was sound, but as you noted you were getting duplicate splits. The obvious and simplest correction is to record every split you compute and check each new split computed against those already generated. Of course, the most efficient way to record and test splits is to keep them in a set:
import itertools

def split(myList):
    assert len(myList) % 2 == 0
    s = set(myList)
    seen = set()
    for combination in itertools.combinations(myList, len(myList) // 2):
        A = list(combination)
        A.sort()
        A = tuple(A)
        if A in seen: # saw this split already
            continue
        B = list(s - set(A))
        B.sort()
        B = tuple(B)
        if B in seen: # saw this split
            continue
        seen.add(A) # record that we have seen this split
        seen.add(B) # record that we have seen this split
        yield (A, B) # yield next split

for s in split(['a', 'b', 'c', 'd']):
    print(s)
Prints:
(('a', 'b'), ('c', 'd'))
(('a', 'c'), ('b', 'd'))
(('a', 'd'), ('b', 'c'))
I want to implement an algorithm that gets the index of letter changes.
I have the list below. I want to find the index where each new letter run begins and put it in a result list, except for the first run: for that one, I want the last index at which it occurs. Let me give you an example:
letters=['A','A','A','A','A','A','A','A','A','A','A','A','B','C','C','X','D','X','B','B','A','A','A','A']
Transitions:
'A','A','A','A','A','A','A','A','A','A','A','A'-->'B'-->'C','C'-->'X'-->'D'-->'X'-->'B','B'-->'A','A','A','A'
Here, after the A's finish, B starts, so we should record the index of the last A and the index of the first B, and so on, but we should not include the X letters in the result list.
Desired result:
[(11, 'A'), (12, 'B'), (13, 'C'), (16, 'D'), (18, 'B'), (20, 'A')]
So far I have written the code below, which finds all the items except (11, 'A'). How can I modify my code to get the desired result?
result = []
for i in range(len(letters)):
    if letters[i]!='X' and letters[i]!=letters[i-1]:
        result.append((i,(letters[i])))
My result:
[(12, 'B'), (13, 'C'), (16, 'D'), (18, 'B'), (20, 'A')] ---> missing (11, 'A').
Now that you've explained you want the first index of every letter after the first, here's a one-liner:
letters=['A','A','A','A','A','A','A','A','A','A','A','A','B','C','C','X','D','X','B','B','A','A','A','A']
[(n+1, b) for (n, (a,b)) in enumerate(zip(letters,letters[1:])) if a!=b and b!='X']
#=> [(12, 'B'), (13, 'C'), (16, 'D'), (18, 'B'), (20, 'A')]
Now, your first entry is different. For this, you need to use a recipe which finds the last index of each item:
import itertools
grouped = [(len(list(g))-1,k) for k,g in (itertools.groupby(letters))]
weird_transitions = [grouped[0]] + [(n+1, b) for (n, (a,b)) in enumerate(zip(letters,letters[1:])) if a!=b and b!='X']
#=> [(11, 'A'), (12, 'B'), (13, 'C'), (16, 'D'), (18, 'B'), (20, 'A')]
Of course, you could avoid creating the whole list of grouped, because you only ever use the first item from groupby. I leave that as an exercise for the reader.
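One way to do that exercise, sketched under the same assumptions as the code above: `groupby` is lazy, so calling `next()` on it only reads the first run (plus one look-ahead item), and counting that run gives its last index.

```python
import itertools

letters = ['A','A','A','A','A','A','A','A','A','A','A','A','B','C','C','X','D','X','B','B','A','A','A','A']

# Only the first (key, run) pair is taken from groupby.
first_key, first_run = next(itertools.groupby(letters))
first_item = (sum(1 for _ in first_run) - 1, first_key)

weird_transitions = [first_item] + [(n+1, b) for (n, (a,b)) in enumerate(zip(letters,letters[1:])) if a!=b and b!='X']
print(weird_transitions)
# [(11, 'A'), (12, 'B'), (13, 'C'), (16, 'D'), (18, 'B'), (20, 'A')]
```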
If X is the first item (or run of items), this will also give you an X as the first item. Since you say nothing about what you're doing, or why the Xs are there but omitted, I can't tell whether that's the right behaviour. If it's not, then probably use my entire other recipe (in my other answer), and then take the first item from that.
Your question is a bit confusing, but this code should do what you want.
result = []
firstChangeFound = False
for i in range(1, len(letters)):  # start at 1 so letters[i-1] never wraps around to letters[-1]
    if letters[i]!='X' and letters[i]!=letters[i-1]:
        if not firstChangeFound:
            result.append((i-1, letters[i-1])) #Grab the last occurrence of the first character
            result.append((i, letters[i]))
            firstChangeFound = True
        else:
            result.append((i, letters[i]))
You want (Or, you don't, as you finally explained - see my other answer):
import itertools
import functional # get it from pypi
letters=['A','A','A','A','A','A','A','A','A','A','A','A','B','C','C','X','D','X','B','B','A','A','A','A']
grouped = [(len(list(g)),k) for k,g in (itertools.groupby(letters))]
#=> [(12, 'A'), (1, 'B'), (2, 'C'), (1, 'X'), (1, 'D'), (1, 'X'), (2, 'B'), (4, 'A')]
#-1 to take this from counts to indices
filter(lambda (a,b): b!='X',functional.scanl(lambda (a,b),(c,d): (a+c,d), (-1,'X'), grouped))
#=> [(11, 'A'), (12, 'B'), (14, 'C'), (16, 'D'), (19, 'B'), (23, 'A')]
This gives you the last index of each letter run, other than Xs. If you want the first index after the relevant letter, then switch the -1 to 0.
scanl is a reduce which returns intermediate results.
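The `lambda (a,b): ...` parameter unpacking above is Python 2 syntax, and `scanl` comes from a third-party package. In Python 3.8+ the same scan can be sketched with the standard library's `itertools.accumulate`, whose `initial` argument plays the role of scanl's starting value:

```python
import itertools

letters = ['A','A','A','A','A','A','A','A','A','A','A','A','B','C','C','X','D','X','B','B','A','A','A','A']

grouped = [(len(list(g)), k) for k, g in itertools.groupby(letters)]

# Running sum of run lengths; starting at -1 turns counts into last indices.
scanned = itertools.accumulate(
    grouped,
    lambda acc, cur: (acc[0] + cur[0], cur[1]),
    initial=(-1, 'X'),
)
last_indices = [(i, k) for i, k in scanned if k != 'X']
print(last_indices)
# [(11, 'A'), (12, 'B'), (14, 'C'), (16, 'D'), (19, 'B'), (23, 'A')]
```

The initial value is emitted first, which is why it is tagged 'X' and filtered out along with the real X runs.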
As a general rule, it makes sense to either filter first or last, unless that is for some reason expensive, or the filtering can easily be accomplished without increasing complexity.
Also, your code is relatively hard to read and understand, because you iterate by index. That's unusual in Python unless you are manipulating the index numerically; if you're visiting every item, it's usual to iterate directly.
Also, why do you want this particular format? It's usual to have the format as (unique item,data) because that can easily be placed in a dict.
With minimal change to your code, and following Josh Caswell's suggestion:
result = []
for i, letter in enumerate(letters[1:], 1):
    if letter != 'X' and letters[i] != letters[i-1]:
        result.append((i, letter))

first_change = result[0][0]
first_stretch = ''.join(letters[:first_change]).rstrip('X')
if first_stretch:
    result.insert(0, (len(first_stretch) - 1, first_stretch[-1]))
Here's a solution which uses groupby to generate a single sequence from which both first and last indices can be extracted.
import itertools
import functools
import operator

letters = ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'C', 'C', 'X', 'D', 'X', 'B', 'B', 'A', 'A', 'A', 'A']

groupbysecond = functools.partial(itertools.groupby,key=operator.itemgetter(1))

def transitions(letters):
    #segregate transition and non-transition indices
    grouped = groupbysecond(enumerate(zip(letters,letters[1:])))
    # extract first such entry from each group
    firsts = (next(l) for k,l in grouped)
    # group those entries together - where multiple, there are first and last
    # indices of the run of letters
    regrouped = groupbysecond((n,a) for n,(a,b) in firsts)
    # special case for first entry, which wants last index of first letter
    kfirst,lfirst = next(regrouped)
    firstitem = (tuple(lfirst)[-1],) if kfirst != 'X' else ()
    #return first item, and first index for all other letters
    return itertools.chain(firstitem,(next(l) for k,l in regrouped if k != 'X'))
letters=['A','A','A','A','A','A','A','A','A','A','A','A','B','C','C','X','D','X','B','B','A','A','A','A']
# 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
prev = letters[0]
result = []
for i in range(len(letters)):
    if prev!=letters[i]:
        result.append((i-1,prev))
        if letters[i]!='X':
            prev = letters[i]
        else:
            prev = letters[i+1]
result.append((len(letters)-1,letters[-1]))
print(result)
Results in (not the OP's desired result; sorry, I must have misunderstood, see JSutton's answer):
[(11,'A'), (12,'B'), (14,'C'), (16,'D'), (19,'B'), (23,'A')]
which is actually the index of the last instance of each letter before it changes or the list ends.
With the aid of a dictionary to keep the running time linear in the size of the input, here is a solution:
letters=['A','A','A','A','A','A','A','A','A','A','A','A','B','C','C','X','D','X','B','B','A','A','A','A']
def f(letters):
    result = []
    added = {}
    for i in range(len(letters)):
        if (i+1 == len(letters)):
            break
        if letters[i+1]!='X' and letters[i+1]!=letters[i]:
            # note: for the sample input this also records (19, 'B'),
            # the last index of a later run that is followed by a new letter
            if(i not in added and letters[i]!='X'):
                result.append((i, letters[i]))
                added[i] = letters[i]
            if(i+1 not in added):
                result.append((i+1, letters[i+1]))
                added[i+1] = letters[i+1]
    return result
Basically, my solution always tries to add both indices where a change occurred, but the dictionary (which has constant-time lookup) tells us whether we already added an element, so duplicates are excluded. This takes care of adding the first element. Alternatively, you can use an if statement to flag the first round, which will only run once; I argue that this has the same running time. What you should avoid is checking whether you already added an element by looking it up in the result list itself: that lookup is linear time at worst, so the whole loop becomes O(n^2), which is bad!
Here's my suggestion. It has three steps.
First, find all the starting indexes for each run of letters.
Replace the index in the first non-X run with the index of the end of its run, which will be one less than the start of the following run.
Filter out all X runs.
The code:
def letter_runs(letters):
    prev = None
    results = []
    for index, letter in enumerate(letters):
        if letter != prev:
            prev = letter
            results.append((index, letter))
    if results[0][1] != "X":
        results[0] = (results[1][0]-1, results[0][1])
    else: # if first run is "X" second must be something else!
        results[1] = (results[2][0]-1, results[1][1])
    return [(index, letter) for index, letter in results if letter != "X"]
I have a list of strings:
l = ['a', 'b', 'c']
I want to create all possible combinations of the list elements in groups of different sizes. I would prefer this to be a list of tuples of tuples, but it could also be a list of lists of lists, etc. The order of the tuples, and of the tuples within the tuples, does not matter. No list element can be repeated in either the tuples or the tuples of tuples. For the above list, I would expect something like:
[(('a',),('b',),('c',)),
 (('a', 'b'), ('c',)),
 (('a', 'c'), ('b',)),
 (('b', 'c'), ('a',)),
 (('a', 'b', 'c'),)]
Any help is greatly appreciated.
EDIT:
I do require that each of the tuples in the list contains all of the elements of l.
senderle and Antimony, you are both correct regarding the omissions.
Here's one way to do things. I don't know if there are any more elegant methods. The itertools module has functions for combinations and permutations, but unfortunately, nothing for partitions.
Edit: My first version isn't correct, but fortunately, I already have this lying around from an old project I did.
You can also get a unique integer key that represents an edge bitset associated with each partition by returning d instead of d.values(). This is useful for efficiently testing whether one partition is a refinement of another.
def connectivityDictSub(num, d, setl, key, i):
    if i >= num:
        assert(key not in d)
        d[key] = setl
    else:
        for ni in range(len(setl)):
            nsetl, nkey = setl[:], key
            for other in nsetl[ni]:
                assert(other != i)
                x,y = sorted((i, other))
                ki = ((2*num-3-x)*x)//2 + y-1  # bit index of the edge (x, y)
                nkey |= 1<<ki
            nsetl[ni] = nsetl[ni] + [i] #not the same as += since it makes a copy
            connectivityDictSub(num, d, nsetl, nkey, i+1)
        nsetl = setl + [[i]]
        connectivityDictSub(num, d, nsetl, key, i+1)

def connectivityDict(groundSet):
    gset = sorted(set(groundSet))
    d = {}
    connectivityDictSub(len(gset), d, [], 0, 0)
    for setl in d.values():
        setl[:] = [tuple(gset[i] for i in x) for x in setl]
    return list(map(tuple, d.values()))

for x in connectivityDict('ABCD'):
    print(x)
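For readers who only need the partitions themselves, without the bitset keys, a shorter recursive sketch (a standard construction, not the answer's code; `partitions` is a hypothetical name): each partition of the remaining items is extended by putting the first item into each existing block in turn, or into a new block of its own.

```python
def partitions(items):
    """Yield every partition of `items` as a list of blocks (lists)."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # put `first` into each existing block in turn...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ...or give it a block of its own
        yield [[first]] + smaller

l = ['a', 'b', 'c']
result = [tuple(tuple(block) for block in p) for p in partitions(l)]
for p in result:
    print(p)
```

The number of partitions grows as the Bell numbers (5 for three items, 15 for four, 52 for five), so this is only practical for small lists.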
itertools should do most of the job you want.
Example:
import itertools

stuff = [1, 2, 3]
for L in range(0, len(stuff)+1):
    for subset in itertools.combinations(stuff, L):
        print(subset)
The example is just to show itertools. You will have to figure it out to get the exact output you want.
I was put in a position today in which I needed to enumerate all possible combinations of a jagged list. For instance, a naive approach would be:
for a in [1,2,3]:
    for b in [4,5,6,7,8,9]:
        for c in [1,2]:
            yield (a,b,c)
This is functional, but not general in terms of the number of lists that can be used. Here is a more generalized approach:
from numpy import zeros, array, nonzero, max

make_subset = lambda x,y: [x[i][j] for i,j in enumerate(y)]

def combinations(items):
    num_items = [len(i) - 1 for i in items]
    state = zeros(len(items), dtype=int)
    finished = array(num_items, dtype=int)
    yield make_subset(items, state)
    while True:
        if state[-1] != num_items[-1]:
            state[-1] += 1
            yield make_subset(items, state)
        else:
            incrementable = nonzero(state != finished)[0]
            if not len(incrementable):
                return  # raising StopIteration inside a generator is an error since Python 3.7
            rightmost = max(incrementable)
            state[rightmost] += 1
            state[rightmost+1:] = 0
            yield make_subset(items, state)
Any recommendations on a better approach or reasons against the above approach?
The naive approach can be written more compactly as a generator expression:
((a,b,c) for a in [1,2,3] for b in [4,5,6,7,8,9] for c in [1,2])
The general approach can be written much more simply using a recursive function:
def combinations(*seqs):
    if not seqs: return (item for item in ())
    first, rest = seqs[0], seqs[1:]
    if not rest: return ((item,) for item in first)
    return ((item,) + items for item in first for items in combinations(*rest))
Sample usage:
>>> for pair in combinations('abc', [1,2,3]):
...     print(pair)
...
('a', 1)
('a', 2)
('a', 3)
('b', 1)
('b', 2)
('b', 3)
('c', 1)
('c', 2)
('c', 3)
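For comparison, the standard library's `itertools.product` already computes this Cartesian product lazily, for any number of input sequences. A quick check that it agrees with the recursive version above on the jagged example from the question:

```python
import itertools

def combinations(*seqs):
    # The recursive generator from the answer above.
    if not seqs: return (item for item in ())
    first, rest = seqs[0], seqs[1:]
    if not rest: return ((item,) for item in first)
    return ((item,) + items for item in first for items in combinations(*rest))

jagged = [[1, 2, 3], [4, 5, 6, 7, 8, 9], [1, 2]]
assert list(itertools.product(*jagged)) == list(combinations(*jagged))
print(len(list(itertools.product(*jagged))))  # 3 * 6 * 2 = 36
```

One difference: called with no sequences, `itertools.product()` yields a single empty tuple, while the recursive version yields nothing.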