As stated in the title, I'm trying to generate all partitions of a set of size n where every subset has size 2 and, if n is odd, there is one singleton set. I very slightly modified some SO code for generating all partitions to get this:
def partitionIntoPairs(collection):
    if len(collection) == 1:
        yield [collection]
        return
    first = collection[0]
    for smaller in partitionIntoPairs(collection[1:]):
        for n, subset in enumerate(smaller):
            if len(subset):
                yield smaller[:n] + [[first] + subset] + smaller[n+1:]
        yield [[first]] + smaller
This works, but is sadly far too slow. My second idea was to generate all pairs for a given set using itertools.combinations, and then recursively call the function for every set with a given pair removed, but I'm guessing that's even slower. Also, the implementation is incorrect: it only returns one possible partition, and I am unsure how to get it to return all of them:
from itertools import combinations

def partitionIntoPairs2(collection):
    if not collection:
        return []
    elif len(collection) == 1:
        return [(next(iter(collection)))]
    else:
        pairs = set(combinations(collection, 2))
        for pair in pairs:
            collection.remove(pair[0])
            collection.remove(pair[1])
            return partitionIntoPairs2(collection) + [pair]
I stumbled upon some algorithms for partitions with a given number of subsets, and various implementations of algorithms generating all possible partitions, but neither of those efficiently solves my problem as far as I can see.
So, to formulate a more concrete question: If the second algorithm is a viable option, what would be the correct implementation? And of course, is there a faster way to do this? If yes, how?
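For reference, the combinations-based idea from the question can be made to yield every pairing exactly once by exploring every pair for the first element instead of returning after the first one. A sketch (the function name and structure are mine, not from the question):

```python
def partition_into_pairs(collection):
    """Yield every partition of `collection` into pairs
    (plus one singleton when the size is odd), each exactly once."""
    items = list(collection)
    if not items:
        yield []
        return
    if len(items) % 2 == 1:
        # odd size: choose which element is the singleton, then pair the rest
        for i, x in enumerate(items):
            rest = items[:i] + items[i+1:]
            for partial in partition_into_pairs(rest):
                yield [(x,)] + partial
        return
    # even size: the first element must be paired with someone;
    # fixing it as one member of the pair avoids duplicate partitions
    first = items[0]
    for i in range(1, len(items)):
        rest = items[1:i] + items[i+1:]
        for partial in partition_into_pairs(rest):
            yield [(first, items[i])] + partial

print(list(partition_into_pairs([1, 2, 3, 4])))
# [[(1, 2), (3, 4)], [(1, 3), (2, 4)], [(1, 4), (2, 3)]]
```

Because each recursion step only ever pairs the current first element, no partition is generated twice, so no post-filtering is needed.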
A partition should be viewed as a set: two partitions differing only in order count as the same one. So there are only 3 pair partitions of the set (1, 2, 3, 4).
For even N, the number of partitions is N!/((N/2)!·2^(N/2)). Using Stirling's formula, this is approximately sqrt(2)·(N/e)^(N/2), where e = 2.71828..., which grows very quickly.
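That count can be checked in a few lines (the helper name is mine):

```python
from math import factorial

def pair_partition_count(n):
    # N! / ((N/2)! * 2^(N/2)) for even n; equals the double factorial (n-1)!!
    return factorial(n) // (factorial(n // 2) * 2 ** (n // 2))

print(pair_partition_count(4), pair_partition_count(6))  # 3 15
```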
I leveraged @VirtualScooter's code and provide the recursive version of Partition, which runs faster than their itertools version (note this is not an apples-to-apples comparison, because my Partition has no repeats).
import itertools
import timeit

t3 = (1, 2, 3)
t4 = (1, 2, 3, 4)
t6 = (1, 2, 3, 4, 5, 6)

def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks or blocks.

    Code from the Python itertools page.
    """
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)

def partitionIntoPairs(collection):
    perms = itertools.permutations(collection)
    for p in perms:
        group = list(grouper(p, 2))
        if group[-1][-1] is None:
            group[-1] = (group[-1][0],)
        yield group

def Partition(Indexable):
    if len(Indexable) <= 2:
        yield [Indexable]
    elif len(Indexable) % 2 == 1:
        for i, x in enumerate(Indexable):
            for s in Partition(Indexable[:i] + Indexable[i+1:]):
                yield [[x]] + s
    else:
        for i, x in enumerate(Indexable):
            if i == 0:
                x0 = x
            else:
                for s in Partition(Indexable[1:i] + Indexable[i+1:]):
                    yield [[x0, x]] + s

def comp_timeit(collection, repeats=1_000):
    s1 = f"l1 = list(Partition({collection}))"
    s2 = f"l1 = list(partitionIntoPairs({collection}))"
    t1 = timeit.timeit(s1, globals=globals(), number=repeats)
    t2 = timeit.timeit(s2, globals=globals(), number=repeats)
    print(f"partition, {repeats:_} runs: {t1:,.4f}")
    print(f"itertools, {repeats:_} runs: {t2:,.4f}")

for p in Partition(t4):
    print(p)
comp_timeit(t3)
comp_timeit(t4)
comp_timeit(t6)
This recursive generator function yields the partition in progress once its flattened length equals that of the original input, and it only makes recursive calls when it can either extend a subpartition in progress or retain a single singleton subpartition (when the input length is odd):
data = {1, 2, 3}

def partition(d, m, c=[]):
    if len(l := [j for k in c for j in k]) == len(d):
        yield c
    for i in filter(lambda x: x not in l, d):
        if not c or len(c[-1]) == m:
            yield from partition(d, m, c=c+[[i]])
        else:
            # allow at most one singleton, and only for odd-size input
            if sum(len(i) == 1 for i in c) == 1 and len(d) % 2:
                yield from partition(d, m, c=c+[[i]])
            yield from partition(d, m, c=[*c[:-1], c[-1]+[i]])

print(list(partition(list(data), 2)))
Output:
[[[1], [2, 3]], [[1, 2], [3]], [[1], [3, 2]], [[1, 3], [2]], [[2], [1, 3]], [[2, 1], [3]], [[2], [3, 1]], [[2, 3], [1]], [[3], [1, 2]], [[3, 1], [2]], [[3], [2, 1]], [[3, 2], [1]]]
When len(data)%2 == 0:
data = {1, 2, 3, 4}
print(list(partition(list(data), 2)))
Output:
[[[1, 2], [3, 4]], [[1, 2], [4, 3]], [[1, 3], [2, 4]], [[1, 3], [4, 2]], [[1, 4], [2, 3]], [[1, 4], [3, 2]], [[2, 1], [3, 4]], [[2, 1], [4, 3]], [[2, 3], [1, 4]], [[2, 3], [4, 1]], [[2, 4], [1, 3]], [[2, 4], [3, 1]], [[3, 1], [2, 4]], [[3, 1], [4, 2]], [[3, 2], [1, 4]], [[3, 2], [4, 1]], [[3, 4], [1, 2]], [[3, 4], [2, 1]], [[4, 1], [2, 3]], [[4, 1], [3, 2]], [[4, 2], [1, 3]], [[4, 2], [3, 1]], [[4, 3], [1, 2]], [[4, 3], [2, 1]]]
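Note that these outputs treat orderings as distinct, e.g. [[1, 2], [3, 4]] and [[1, 2], [4, 3]]. If order-insensitive partitions are wanted, one option (my addition, not part of the answer) is to canonicalize each partition as a frozenset of frozensets and keep only the first representative:

```python
def dedupe_partitions(partitions):
    # map each partition to a hashable, order-insensitive key and
    # yield only the first partition seen for each key
    seen = set()
    for p in partitions:
        key = frozenset(frozenset(part) for part in p)
        if key not in seen:
            seen.add(key)
            yield p

# example with a few pre-computed ordered pairings of {1, 2, 3}
raw = [[[1], [2, 3]], [[1], [3, 2]], [[2], [1, 3]], [[3], [1, 2]]]
print(list(dedupe_partitions(raw)))
# [[[1], [2, 3]], [[2], [1, 3]], [[3], [1, 2]]]
```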
This can be done with itertools, probably faster than a recursive algorithm such as partition in another answer (https://stackoverflow.com/a/66972507/5660315). I measured about 4.5 s of runtime for t6 in my timeit sequence below, versus under 0.2 s for mi_partition.
My first idea was to list all permutations of the set, split each into subsets using the grouper algorithm from the itertools documentation page, and then cull the filler from the final odd-size subset where applicable. As @Bing Wang points out, duplicates occur in this type of sequence. So, instead, I called the more_itertools.set_partitions function, which cuts down on the repetition. It also generates subsets with length greater than 2, so these are filtered out with itertools.filterfalse.
import itertools
import timeit
import more_itertools

t3 = (1, 2, 3)
t4 = (1, 2, 3, 4)
t6 = (1, 2, 3, 4, 5, 6)

def mi_partition(collection):
    # partition into k subsets, then keep only partitions whose subsets
    # all have size <= 2
    k = len(collection) // 2 + len(collection) % 2
    s1 = more_itertools.set_partitions(collection, k)
    return itertools.filterfalse(lambda x: any(len(y) > 2 for y in x), s1)

print(list(mi_partition(t3)))
print(list(mi_partition(t4)))
Output:
[[[1], [2, 3]], [[1, 2], [3]], [[2], [1, 3]]]
[[[1, 2], [3, 4]], [[2, 3], [1, 4]], [[1, 3], [2, 4]]]
A small timing comparison with the Partition algorithm from @Bing Wang's answer suggests that their solution is faster:
def comp_timeit(collection, repeats=1_000):
    s3 = f"l1 = list(mi_partition({collection}))"
    s4 = f"l1 = list(Partition({collection}))"
    t3 = timeit.timeit(s3, globals=globals(), number=repeats)
    print(f"more_itertools, {repeats:_} runs: {t3:,.4f}")
    t4 = timeit.timeit(s4, globals=globals(), number=repeats)
    print(f"Partition, {repeats:_} runs: {t4:,.4f}")

comp_timeit(t3)
comp_timeit(t4)
comp_timeit(t6)
Output below. Note that for t3 and t4 the result list has length 3 in both cases, while for t6 it has length 15. The Partition solution seems slightly faster, probably because it does not need to filter any solutions: for t6, set_partitions(t6, 3) generates 90 partitions, of which only 15 make it into the final answer.
more_itertools, 1_000 runs: 0.0051
Partition, 1_000 runs: 0.0024
more_itertools, 1_000 runs: 0.0111
Partition, 1_000 runs: 0.0026
more_itertools, 1_000 runs: 0.1333
Partition, 1_000 runs: 0.0160
Your question doesn't show what "all the subsets" means. If you need to get all possible pairs of values in the given set, just try set() and frozenset():
my_set = {1, 2, 3}
res = set()
for value in my_set:
    current_set = set()
    current_set.add(value)
    for other in my_set:  # renamed to avoid shadowing the outer loop variable
        new_set = current_set.copy()
        new_set.add(other)
        res.add(frozenset(new_set))
if not len(my_set) % 2:
    res = [list(new_set) for new_set in res if len(new_set) > 1]
else:
    res = list(map(list, res))
print(res)
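If only the pairs themselves are needed (rather than partitions of the whole set), itertools.combinations produces them directly; this is a side note of mine, not part of the answer above:

```python
from itertools import combinations

my_set = {1, 2, 3}
# sort first so the pair order is deterministic
pairs = [list(p) for p in combinations(sorted(my_set), 2)]
print(pairs)  # [[1, 2], [1, 3], [2, 3]]
```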
I want to get all possible groupings; every element has to be assigned. If I use itertools.permutations, I miss some groups:
from itertools import permutations, chain

def all_group_assignments(list):
    groups = []
    for i in range(0, len(list)+1):
        for splits in permutations(range(1, len(list)), i):
            prev = None
            result = []
            for split in chain(splits, [None]):
                result.append(list[prev:split])
                prev = split
            groups.append(result)
    return groups

testlist = [1, 2, 3]
print(all_group_assignments(testlist))
I receive this as result:
[[[1, 2, 3]], [[1], [2, 3]], [[1, 2], [3]], [[1], [2], [3]], [[1, 2], [], [2, 3]]]
which misses the group [[1, 3], [2]] and instead includes [[1, 2], [], [2, 3]] which I don't need.
Is there a way to solve this elegantly? Thanks in advance!
You can use itertools.combinations for the task.
For example:
from itertools import combinations

lst = [1, 2, 3]
out = [[[i] for i in lst]]
for i in range(2, len(lst)+1):
    for p in combinations(out[0], i):
        out.append([])
        group, seen = sum(p, []), set()
        for v in out[0]:
            if v[0] not in group and v[0] not in seen:
                out[-1].append(v)
            elif v[0] in group and v[0] not in seen:
                out[-1].append(group)
                seen = set(group)
                group = []
print(out)
Prints:
[[[1], [2], [3]], [[1, 2], [3]], [[1, 3], [2]], [[1], [2, 3]], [[1, 2, 3]]]
How can I "pack" consecutive duplicated elements in a list into sublists of the repeated element?
What I mean is:
l = [1, 1, 1, 2, 2, 3, 4, 4, 1]
pack(l) -> [[1,1,1], [2,2], [3], [4, 4], [1]]
I want to solve this problem in a very basic way, as I have just started, i.e. using loops and list methods. I have looked at other methods, but they were difficult for me to understand.
For removing the duplicates instead of packing them, see Removing elements that have consecutive duplicates
You can use groupby:
from itertools import groupby

def pack(List):
    result = []
    for key, group in groupby(List):
        result.append(list(group))
    return result

l = [1, 1, 1, 2, 2, 3, 4, 4, 1]
print(pack(l))
Or as a one-liner:
l = [1, 1, 1, 2, 2, 3, 4, 4, 1]
result = [list(group) for key,group in groupby(l)]
# [[1, 1, 1], [2, 2], [3], [4, 4], [1]]
You can use:
lst = [1, 1, 1, 2, 2, 3, 4, 4, 1]

# bootstrap: initialize a sublist with the first element of lst
out = [[lst[0]]]

for it1, it2 in zip(lst, lst[1:]):
    # if the previous item and the current one are equal,
    # append to the last sublist
    if it1 == it2:
        out[-1].append(it2)
    # otherwise start a new sublist
    else:
        out.append([it2])
Output:
>>> out
[[1, 1, 1], [2, 2], [3], [4, 4], [1]]
This code will do:
data = [0,0,1,2,3,4,4,5,6,6,6,7,8,9,4,4,9,9,9,9,9,3,3,2,45,2,11,11,11]
newdata = []
for i, l in enumerate(data):
    if i == 0 or l != data[i-1]:
        newdata.append([l])
    else:
        newdata[-1].append(l)

# Output:
# [[0, 0], [1], [2], [3], [4, 4], [5], [6, 6, 6], [7], [8], [9], [4, 4], [9, 9, 9, 9, 9], [3, 3], [2], [45], [2], [11, 11, 11]]
I have a list of lists sorted in an ascending order, similar to this one:
input = [[1,1],[1,2],[1,3],[1,4],[2,1],[2,2],[2,3],[3,1],[6,1],[6,2]]
I want to filter this list so that the new list contains only the first two (or the only) elements with matching integers in position 0, like so:
output = [[1,1],[1,2],[2,1],[2,2],[3,1],[6,1],[6,2]]
It would be ideal if the remaining elements (the ones which did not meet the criteria) remained in the input list, while the matching elements were stored separately.
How do I go about doing this?
Thank you in advance!
Edit: The elements on the index 1 could be virtually any integers, e.g. [[1,6],[1,7],[1,8],[2,1],[2,2]]
Pandas
Although this is a bit overkill, we can use pandas for this:
import pandas as pd
pd.DataFrame(d).groupby(0).head(2).values.tolist()
With d the original list. This then yields:
>>> pd.DataFrame(d).groupby(0).head(2).values.tolist()
[[1, 1], [1, 2], [2, 1], [2, 2], [3, 1], [6, 1], [6, 2]]
Note that this will return copies of the lists, not the original lists. Furthermore all the rows should have the same number of items.
Itertools groupby and islice
If the list is ordered lexicographically, then we can use itertools.groupby:
from operator import itemgetter
from itertools import groupby, islice
[e for _, g in groupby(d, itemgetter(0)) for e in islice(g, 2)]
this again yields:
>>> [e for _, g in groupby(d, itemgetter(0)) for e in islice(g, 2)]
[[1, 1], [1, 2], [2, 1], [2, 2], [3, 1], [6, 1], [6, 2]]
It is also more flexible since we copy the reference to the list, and all lists can have a different number of elements (at least one here).
EDIT
The rest of the values can be obtained by letting islice work the opposite way: retain everything but the first two:
[e for _, g in groupby(d, itemgetter(0)) for e in islice(g, 2, None)]
we then obtain:
>>> [e for _, g in groupby(d, itemgetter(0)) for e in islice(g, 2, None)]
[[1, 3], [1, 4], [2, 3]]
You could also use a collections.defaultdict to group the sublists by the first index:
from collections import defaultdict
from pprint import pprint

input_lst = [[1,1],[1,2],[1,3],[1,4],[2,1],[2,2],[2,3],[3,1],[6,1],[6,2]]

groups = defaultdict(list)
for lst in input_lst:
    key = lst[0]
    groups[key].append(lst)

pprint(groups)
Which gives this grouped dictionary:
defaultdict(<class 'list'>,
            {1: [[1, 1], [1, 2], [1, 3], [1, 4]],
             2: [[2, 1], [2, 2], [2, 3]],
             3: [[3, 1]],
             6: [[6, 1], [6, 2]]})
Then you could just take the first two [:2] values from each key, and make sure the result is flattened and sorted in the end:
from itertools import chain
result = sorted(chain.from_iterable(x[:2] for x in groups.values()))
print(result)
Which outputs:
[[1, 1], [1, 2], [2, 1], [2, 2], [3, 1], [6, 1], [6, 2]]
Does Python offer a way to iterate over all "consecutive sublists" of a given list L - i.e. sublists of L where any two consecutive elements are also consecutive in L - or should I write my own?
(Example: if L = [1, 2, 3], then the set over which I want to iterate is {[1], [2], [3], [1, 2], [2,3], [1, 2, 3]}. [1, 3] is skipped since 1 and 3 are not consecutive in L.)
I don't think there's a built-in for exactly that, but it probably wouldn't be too difficult to code by hand: you're basically just looping through all of the possible lengths from 1 to len(L), and then taking all contiguous sublists of each length.
You could probably use itertools.chain() to combine the sequences for each length of substring together into a generator for all of them.
Example:
>>> a = [1,2,3,4]
>>> list(
...     itertools.chain(
...         *[[a[i:i+q] for q in range(1, len(a)-i+1)] for i in range(len(a))]
...     )
... )
[[1], [1, 2], [1, 2, 3], [1, 2, 3, 4], [2], [2, 3], [2, 3, 4], [3], [3, 4], [4]]
If you prefer them in the increasing-length-then-lexicographic order that you described, you'd want this instead:
itertools.chain(*[[a[q:i+q] for q in range(len(a)-i+1)] for i in range(1, len(a)+1)])
Try something like this:
def iter_sublists(l):
    n = len(l) + 1
    for i in range(n):
        for j in range(i+1, n):
            yield l[i:j]

>>> print(list(iter_sublists([1,2,3])))
[[1], [1, 2], [1, 2, 3], [2], [2, 3], [3]]
This should work:
def sublists(lst):
    for sublen in range(1, len(lst)+1):
        for idx in range(0, len(lst)-sublen+1):
            yield lst[idx:idx+sublen]
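A quick usage check (the generator is repeated here so the snippet runs standalone); note this one yields sublists in increasing-length-then-lexicographic order:

```python
def sublists(lst):
    # yield all contiguous sublists, shortest first
    for sublen in range(1, len(lst) + 1):
        for idx in range(len(lst) - sublen + 1):
            yield lst[idx:idx + sublen]

print(list(sublists([1, 2, 3])))
# [[1], [2], [3], [1, 2], [2, 3], [1, 2, 3]]
```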