I'm trying to use itertools.combinations to return unique combinations. I've searched through several similar questions but have not been able to find an answer.
An example:
>>> import itertools
>>> e = ['r','g','b','g']
>>> list(itertools.combinations(e,3))
[('r', 'g', 'b'), ('r', 'g', 'g'), ('r', 'b', 'g'), ('g', 'b', 'g')]
For my purposes, (r,g,b) is identical to (r,b,g) and so I would want to return only (rgb),(rgg) and (gbg).
This is just an illustrative example and I would want to ignore all such 'duplicates'. The list e could contain up to 5 elements. Each individual element would be either r, g or b. Always looking for combinations of 3 elements from e.
To be concrete, the following are the only combinations I wish to call 'valid': (rrr), (ggg), (bbb), (rgb).
So perhaps the question boils down to how to treat any variation of (rgb) as equal to (rgb) and therefore ignore it.
Can I use itertools to achieve this, or do I need to write my own code to drop the 'duplicates' here? If there is no itertools solution, I can just check whether each result is a variation of (rgb), but this feels a bit 'un-pythonic'.
You can use a set to discard duplicates.
In your case, the count of each character is what identifies a duplicate, so you could use collections.Counter. To store the counts in a set you need to convert them to frozensets, though (because Counter isn't hashable):
>>> import itertools
>>> from collections import Counter
>>> e = ['r','g','b','g']
>>> result = []
>>> seen = set()
>>> for comb in itertools.combinations(e,3):
...     cnts = frozenset(Counter(comb).items())
...     if cnts in seen:
...         pass
...     else:
...         seen.add(cnts)
...         result.append(comb)
...
>>> result
[('r', 'g', 'b'), ('r', 'g', 'g'), ('g', 'b', 'g')]
If you want to convert them to strings use:
result.append(''.join(comb)) # instead of result.append(comb)
and it will give:
['rgb', 'rgg', 'gbg']
The approach is a variation of the unique_everseen recipe (itertools module documentation), so it's probably "quite pythonic".
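A minimal sketch of the same seen-set idea, assuming the elements are sortable: here the order-independent key is a sorted tuple rather than a frozenset of Counter items (both identify the same multiset). The generator name unique_combinations is hypothetical.

```python
import itertools

def unique_combinations(items, r):
    """Yield r-combinations of items, skipping any that is a rearrangement
    of one already yielded (i.e. the same multiset of elements)."""
    seen = set()
    for comb in itertools.combinations(items, r):
        key = tuple(sorted(comb))  # order-independent, hashable signature
        if key not in seen:
            seen.add(key)
            yield comb

print(list(unique_combinations(['r', 'g', 'b', 'g'], 3)))
# [('r', 'g', 'b'), ('r', 'g', 'g'), ('g', 'b', 'g')]
```

Because it is a generator, results come out lazily and the first occurrence of each multiset is the one kept, as in the Counter version.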
According to your definition of "valid outputs", you can directly build them like this:
from collections import Counter
# Your distinct values
values = ['r', 'g', 'b']
e = ['r','g','b','g', 'g']
count = Counter(e)
# Counter({'g': 3, 'r': 1, 'b': 1})
# If x appears at least 3 times, 'xxx' is a valid combination
combinations = [x*3 for x in values if count[x] >=3]
# If all values appear at least once, 'rgb' is a valid combination
if all(count[x] >= 1 for x in values):
    combinations.append('rgb')
print(combinations)
#['ggg', 'rgb']
This will be more efficient than creating all possible combinations and filtering the valid ones afterwards.
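For comparison, here is a sketch of the "generate everything, then filter" approach that the direct construction avoids. It assumes, as in the question, that the only values are 'r', 'g' and 'b'; the helper name valid_by_filtering is hypothetical. It can be handy as a brute-force cross-check of the Counter-based version on small inputs.

```python
import itertools

def valid_by_filtering(e):
    """Brute-force counterpart (hypothetical helper): enumerate every
    3-combination of e and keep only the 'valid' ones."""
    found = set()
    for comb in itertools.combinations(e, 3):
        distinct = set(comb)
        if len(distinct) == 1:      # all three the same colour -> 'xxx'
            found.add(comb[0] * 3)
        elif len(distinct) == 3:    # one of each colour -> treat as 'rgb'
            found.add('rgb')
    return sorted(found)

print(valid_by_filtering(['r', 'g', 'b', 'g', 'g']))  # ['ggg', 'rgb']
```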
It is not completely clear what you want to return. It depends on what comes first when iterating. For example, if gbr is found first, then rgb will be discarded as a duplicate:
import itertools
e = ['r','g','b','g']
s = set(e)
v = [s] * len(s)
solns = []
for c in itertools.product(*v):
    in_order = sorted(c)
    if in_order not in solns:
        solns.append(in_order)
print(solns)
This would give you (the exact order depends on the set's iteration order):
[['r', 'r', 'r'], ['b', 'r', 'r'], ['g', 'r', 'r'], ['b', 'b', 'r'], ['b', 'g', 'r'], ['g', 'g', 'r'], ['b', 'b', 'b'], ['b', 'b', 'g'], ['b', 'g', 'g'], ['g', 'g', 'g']]
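The sort-then-deduplicate step over product can be skipped entirely: itertools.combinations_with_replacement over the sorted distinct values yields each sorted multiset exactly once. A sketch, which like the answer above ignores how many times each value actually occurs in e:

```python
import itertools

e = ['r', 'g', 'b', 'g']
values = sorted(set(e))  # ['b', 'g', 'r']

# combinations_with_replacement yields each sorted multiset exactly once,
# so no "already seen?" check is needed.
direct = [list(c) for c in itertools.combinations_with_replacement(values, 3)]

# Same multisets as sorting every product tuple and dropping repeats.
via_product = sorted({tuple(sorted(c)) for c in itertools.product(values, repeat=3)})
assert [tuple(c) for c in direct] == via_product
print(len(direct))  # 10
```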
I have a list
a = ['a', 'b', 'c']
of given length and I want to insert a certain element 'x' after every item to get
ax = ['a', 'x', 'b', 'x', 'c', 'x']
Since the elements are of large size, I don't want to do a lot of pops or sublists.
Any ideas?
Since the list is large, the best way is to go with a generator, like this:
def interleave(my_list, filler):
    for item in my_list:
        yield item
        yield filler
print(list(interleave(['a', 'b', 'c'], 'x')))
# ['a', 'x', 'b', 'x', 'c', 'x']
Or you can return a chained iterator like this (in Python 2, use itertools.izip in place of the built-in zip):
from itertools import chain, repeat
def interleave(my_list, filler):
    return chain.from_iterable(zip(my_list, repeat(filler)))
repeat(filler) returns an iterator that yields filler an infinite number of times.
zip(my_list, repeat(filler)) returns an iterator that picks one value at a time from both my_list and repeat(filler), stopping when my_list is exhausted. So the output of list(zip(my_list, repeat(filler))) would look like this:
[('a', 'x'), ('b', 'x'), ('c', 'x')]
Now all we have to do is flatten the data, so we pass the result of zip to chain.from_iterable, which yields one value at a time from each of the tuples.
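The two stages described above can be checked step by step in Python 3; this sketch simply materialises each intermediate iterator with list() so it can be printed.

```python
from itertools import chain, repeat

my_list = ['a', 'b', 'c']
filler = 'x'

# Stage 1: pair each item with the filler; zip stops when my_list runs out,
# so the infinite repeat() is safe here.
pairs = list(zip(my_list, repeat(filler)))
print(pairs)  # [('a', 'x'), ('b', 'x'), ('c', 'x')]

# Stage 2: flatten one level, yielding the tuple elements in order.
flat = list(chain.from_iterable(pairs))
print(flat)   # ['a', 'x', 'b', 'x', 'c', 'x']
```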
Have you considered itertools.izip (the built-in zip in Python 3)?
izip('ABCD', 'xy') --> Ax By
izip_longest (zip_longest in Python 3) can be used with a zero-length sequence and a fillvalue, combined via chain.from_iterable as follows:
>>> import itertools
>>> list(itertools.chain.from_iterable(itertools.zip_longest('ABCD', '', fillvalue='x')))
['A', 'x', 'B', 'x', 'C', 'x', 'D', 'x']
I tend to use a list comprehension for such things.
a = ['a', 'b', 'c']
ax = [a[i // 2] if i % 2 == 0 else 'x' for i in range(2 * len(a))]
print(ax)
['a', 'x', 'b', 'x', 'c', 'x']
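A related index-based alternative (a different technique, extended-slice assignment, not part of the answer above) avoids the per-element conditional: preallocate the result filled with the filler, then drop the items into the even positions. The helper name interleave_slices is hypothetical.

```python
def interleave_slices(items, filler):
    """Hypothetical helper: preallocate the output, then fill the even
    positions with items via extended-slice assignment."""
    out = [filler] * (2 * len(items))  # every slot starts as the filler
    out[::2] = items                   # even indices 0, 2, 4, ... get the items
    return out

print(interleave_slices(['a', 'b', 'c'], 'x'))  # ['a', 'x', 'b', 'x', 'c', 'x']
```

Note that every filler slot refers to the same filler object, which is harmless for strings but matters if the filler is mutable.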
You can generate your list with a nested list comprehension:
a = ['a', 'b', 'c']
ax = [c for y in a for c in (y, 'x')]
If you don't really need ax to be a list, you can make a generator like this:
ax = (c for y in a for c in (y, 'x'))
for item in ax:
    pass  # do something with item ...
I am generating all possible three-letter keywords, e.g. aaa, aab, aac, ..., zzy, zzz. Below is my code:
alphabets = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
keywords = []
for alpha1 in alphabets:
    for alpha2 in alphabets:
        for alpha3 in alphabets:
            keywords.append(alpha1+alpha2+alpha3)
Can this functionality be achieved in a more sleek and efficient way?
import itertools
keywords = itertools.product(alphabets, repeat=3)
See the documentation for itertools.product. It returns an iterator of tuples such as ('a', 'a', 'b'); if you need a list of strings, just use
keywords = [''.join(i) for i in itertools.product(alphabets, repeat=3)]
alphabets also doesn't need to be a list; it can just be a string, for example:
from itertools import product
from string import ascii_lowercase
keywords = [''.join(i) for i in product(ascii_lowercase, repeat = 3)]
will work if you just want the lowercase ascii letters.
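To confirm that product(..., repeat=3) is a drop-in replacement for the question's three nested loops (same strings, same order, with the rightmost letter varying fastest), a quick check might look like this; nested_loops is just the question's code wrapped in a function.

```python
from itertools import product
from string import ascii_lowercase

def nested_loops(alphabet):
    """The question's approach: three explicit loops."""
    keywords = []
    for a1 in alphabet:
        for a2 in alphabet:
            for a3 in alphabet:
                keywords.append(a1 + a2 + a3)
    return keywords

with_product = [''.join(t) for t in product(ascii_lowercase, repeat=3)]
assert nested_loops(ascii_lowercase) == with_product  # same strings, same order
print(len(with_product))  # 17576 == 26 ** 3
```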
You could also use map instead of the list comprehension (this is one of the cases where map is still faster than the list comprehension). In Python 3, map returns a lazy iterator, so wrap it in list() if you need a list:
>>> from itertools import product
>>> from string import ascii_lowercase
>>> keywords = list(map(''.join, product(ascii_lowercase, repeat=3)))
This variation of the list comprehension is also faster than using ''.join
>>> keywords = [a+b+c for a,b,c in product(ascii_lowercase, repeat=3)]
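Relative speed varies by Python version and machine, so it is worth measuring rather than assuming. A small timeit sketch for the three variants mentioned above (no timings are shown because they depend on your setup):

```python
import timeit
from itertools import product
from string import ascii_lowercase

def with_join():
    return [''.join(t) for t in product(ascii_lowercase, repeat=3)]

def with_map():
    return list(map(''.join, product(ascii_lowercase, repeat=3)))

def with_concat():
    return [a + b + c for a, b, c in product(ascii_lowercase, repeat=3)]

# Timing is only meaningful if all three build the same list.
assert with_join() == with_map() == with_concat()

for fn in (with_join, with_map, with_concat):
    seconds = timeit.timeit(fn, number=20)
    print(f"{fn.__name__}: {seconds:.3f} s for 20 runs")
```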
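from itertools import combinations_with_replacement
alphabets = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
for (a, b, c) in combinations_with_replacement(alphabets, 3):
    print(a+b+c)
Note that this only yields strings whose letters are in non-decreasing order (e.g. 'aab' but not 'aba'), i.e. 3,276 strings rather than all 26**3 = 17,576.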
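The difference between combinations_with_replacement and product is easy to see by comparing the two result sets directly; this sketch counts both and spot-checks a string that only product produces.

```python
from itertools import combinations_with_replacement, product
from string import ascii_lowercase

cwr = {''.join(t) for t in combinations_with_replacement(ascii_lowercase, 3)}
prod = {''.join(t) for t in product(ascii_lowercase, repeat=3)}

print(len(cwr), len(prod))         # 3276 17576
print('aab' in cwr, 'aba' in cwr)  # True False

# Every string from combinations_with_replacement has its letters in
# non-decreasing order, which is why 'aba' is missing.
assert all(list(s) == sorted(s) for s in cwr)
```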
You can also do this without itertools by doing simple arithmetic on an index.
The PermutationIterator is what you are searching for.
def permutation_atindex(_int, _set, length):
    """
    Return the permutation at index '_int' for the sequence '_set'
    with length 'length'.
    """
    items = []
    strLength = len(_set)
    index = _int % strLength
    items.append(_set[index])
    for n in range(1, length):
        _int //= strLength
        index = _int % strLength
        items.append(_set[index])
    return items
class PermutationIterator:
    """
    A class that can iterate over possible permutations
    of the given 'iterable' and 'length' argument.
    """
    def __init__(self, iterable, length):
        self.length = length
        self.current = 0
        self.max = len(iterable) ** length
        self.iterable = iterable
    def __iter__(self):
        return self
    def __next__(self):
        if self.current >= self.max:
            raise StopIteration
        try:
            return permutation_atindex(self.current, self.iterable, self.length)
        finally:
            self.current += 1
Give it a sequence (it must support len() and indexing) and an integer as the output length.
from string import ascii_lowercase
for e in PermutationIterator(ascii_lowercase, 3):
    print("".join(e))
This will start from 'aaa' and end with 'zzz'. Note that the first letter changes fastest: 'aaa', 'baa', 'caa', ...
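The arithmetic in permutation_atindex is base conversion: the index is written in base len(_set) and each digit picks a letter, least significant digit first. A compact sketch of the same idea using divmod (the name nth_word is hypothetical) also shows that reading the digits most significant first reproduces itertools.product's order.

```python
from itertools import product
from string import ascii_lowercase

def nth_word(n, alphabet, length):
    """Decode n as a base-len(alphabet) number, least significant digit first,
    which is the same arithmetic permutation_atindex performs."""
    base = len(alphabet)
    chars = []
    for _ in range(length):
        n, digit = divmod(n, base)
        chars.append(alphabet[digit])
    return ''.join(chars)

print([nth_word(i, 'ab', 2) for i in range(4)])  # ['aa', 'ba', 'ab', 'bb']

# Reversing each word (most significant digit first) gives product's order.
words = [nth_word(i, ascii_lowercase, 3)[::-1] for i in range(26 ** 3)]
assert words == [''.join(t) for t in product(ascii_lowercase, repeat=3)]
```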
chars = range(ord('a'), ord('z')+1)
print([chr(a) + chr(b) + chr(c) for a in chars for b in chars for c in chars])
We could solve this without itertools by using two function definitions:
def combos(alphas, k):
    l = len(alphas)
    kRecur(alphas, "", l, k)
def kRecur(alphas, prfx, l, k):
    if k == 0:
        print(prfx)
    else:
        for i in range(l):
            newPrfx = prfx + alphas[i]
            kRecur(alphas, newPrfx, l, k - 1)
It uses two functions so that the length of alphas is computed only once. The second function calls itself recursively, with k reduced by one each time, until k reaches 0; at that point it prints the k-mer built up for that branch of the i loop.
Adapted from a solution by Abhinav Ramana on GeeksforGeeks.
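If you want to collect the k-mers rather than print them, the same recursion can be written as a generator with yield from. A sketch (the name kmers is hypothetical), in which the default argument replaces the separate wrapper function:

```python
def kmers(alphabet, k, prefix=''):
    """Yield every string of length k over alphabet, in the same order as
    the recursive printer: extend the prefix one letter per level."""
    if k == 0:
        yield prefix
        return
    for ch in alphabet:
        yield from kmers(alphabet, k - 1, prefix + ch)

print(list(kmers('ab', 2)))  # ['aa', 'ab', 'ba', 'bb']
```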
Well, I came up with this solution while thinking about how to cover the topic:
import random
s = "aei"
b = []
length = len(s)
for _ in range(10):
    for _ in range(length):
        password = "".join(random.sample(s, length))
        if password not in b:
            b.append(password)
print(b)
print(len(b))
Here is what is going on inside:
Import random.
Create a string with the letters we want to use.
Create an empty list that will hold our combinations.
Loop using range (I put 10, but for 3 letters it can be less).
Inside the loop, random.sample picks length letters from s, and they are joined into a single string.
Next we check whether that combination is already in list b; if it is, it is not added again, and if it is not, we append it (we compare the final joined string).
The last step is to print list b with all the combinations and the number of combinations found.
Maybe it is not the clearest or most efficient code, but I think it works. Note that because the letters are drawn at random, a fixed number of tries is not guaranteed to find every ordering, and since random.sample never repeats a letter, it only produces orderings of 'aei', not strings such as 'aae'.
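Since random.sample(s, len(s)) produces orderings of s, itertools.permutations can enumerate all of them deterministically in one pass, with no retries and no membership check. A sketch:

```python
from itertools import permutations

s = "aei"
# permutations(s) yields each ordering of the letters exactly once.
b = [''.join(p) for p in permutations(s)]
print(b)       # ['aei', 'aie', 'eai', 'eia', 'iae', 'iea']
print(len(b))  # 6
```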
print([a+b+c for a in alphabets for b in alphabets for c in alphabets if a !=b and b!=c and c!= a])
This excludes any string in which a character repeats.
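Filtering out repeated letters leaves exactly the 3-permutations of the alphabet, so itertools.permutations(alphabets, 3) should produce the same list in the same order without generating and discarding the 26**3 - 26*25*24 rejected strings. A quick check:

```python
from itertools import permutations
from string import ascii_lowercase

alphabets = list(ascii_lowercase)
filtered = [a + b + c for a in alphabets for b in alphabets for c in alphabets
            if a != b and b != c and c != a]
direct = [''.join(p) for p in permutations(alphabets, 3)]

assert filtered == direct  # same strings, same order
print(len(direct))         # 15600 == 26 * 25 * 24
```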
I have a list that I'm attempting to remove duplicate items from. I'm using Python 2.7.1, so I can simply use the set() function. However, this reorders my list, which for my particular case is unacceptable.
Below is a function I wrote that does this. However, I'm wondering if there's a better/faster way. Any comments on it would also be appreciated.
def ordered_set(list_):
    newlist = []
    lastitem = None
    for item in list_:
        if item != lastitem:
            newlist.append(item)
            lastitem = item
    return newlist
The above function assumes that none of the items will be None, and that the items are in order, i.e. equal items are adjacent (as in ['a', 'a', 'a', 'b', 'b', 'c', 'd']).
The above function returns ['a', 'a', 'a', 'b', 'b', 'c', 'd'] as ['a', 'b', 'c', 'd'].
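For the sorted/adjacent case, itertools.groupby already collapses runs of equal items, and it needs no None sentinel. A sketch (the name collapse_runs is hypothetical); like the function above, it only removes consecutive duplicates:

```python
from itertools import groupby

def collapse_runs(items):
    """Drop consecutive duplicates, keeping the first item of each run.
    Unlike the sentinel-based loop, this also handles None correctly."""
    return [key for key, _group in groupby(items)]

print(collapse_runs(['a', 'a', 'a', 'b', 'b', 'c', 'd']))  # ['a', 'b', 'c', 'd']
print(collapse_runs([None, None, 1]))                       # [None, 1]
```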
Another very fast method with set:
def remove_duplicates(lst):
    dset = set()
    # relies on the fact that dset.add() always returns None.
    return [item for item in lst
            if item not in dset and not dset.add(item)]
Use an OrderedDict:
from collections import OrderedDict
l = ['a', 'a', 'a', 'b', 'b', 'c', 'd']
d = OrderedDict()
for x in l:
    d[x] = True
# prints a b c d
for x in d:
    print(x, end=' ')
print()
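On Python 3.7 and later (not the asker's 2.7), plain dicts are guaranteed to preserve insertion order, so the OrderedDict idea shrinks to a one-liner with dict.fromkeys. A sketch:

```python
l = ['a', 'a', 'a', 'b', 'b', 'c', 'd']
# dict keys are unique and (since Python 3.7) kept in insertion order,
# so this removes duplicates while preserving first-appearance order.
print(list(dict.fromkeys(l)))  # ['a', 'b', 'c', 'd']
```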
Assuming the input sequence is unordered, here's an O(N) solution (in both space and time).
It produces a sequence with duplicates removed, while leaving unique items in the same relative order as they appeared in the input sequence.
>>> def remove_dups_stable(s):
...     seen = set()
...     for i in s:
...         if i not in seen:
...             yield i
...             seen.add(i)
...
>>> list(remove_dups_stable(['q', 'w', 'e', 'r', 'q', 'w', 'y', 'u', 'i', 't', 'e', 'p', 't', 'y', 'e']))
['q', 'w', 'e', 'r', 'y', 'u', 'i', 't', 'p']
I know this has already been answered, but here's a one-liner (plus import):
from collections import OrderedDict
def dedupe(_list):
    return list(OrderedDict((item, None) for item in _list).keys())
>>> dedupe(['q', 'w', 'e', 'r', 'q', 'w', 'y', 'u', 'i', 't', 'e', 'p', 't', 'y', 'e'])
['q', 'w', 'e', 'r', 'y', 'u', 'i', 't', 'p']
I think this is perfectly OK. You get O(n) performance which is the best you could hope for.
If the list were unordered, then you'd need a helper set to contain the items you've already visited, but in your case that's not necessary.
If your list isn't sorted, then your question doesn't make sense,
e.g. [1,2,1] could become [1,2] or [2,1].
If your list is large, you may want to write your result back into the same list using slice assignment, which replaces the contents of the existing list object:
>>> x=['a', 'a', 'a', 'b', 'b', 'c', 'd']
>>> x[:]=[x[i] for i in range(len(x)) if i==0 or x[i]!=x[i-1]]
>>> x
['a', 'b', 'c', 'd']
For in-place deletion, see the questions "Remove items from a list while iterating" and "Remove items from a list while iterating without using extra memory in Python".
One trick you can use: if you know x is sorted, and you know x[i] == x[i+j], then you don't need to check anything between x[i] and x[i+j] (and if you don't need to delete those j values, you can just copy the values you want into a new list).
So while you can't beat n operations if every item is unique, i.e. len(set(x)) == len(x), there is probably an algorithm that has n comparisons as its worst case but n/2 comparisons as its best case (or fewer than n/2 if you somehow know in advance that len(x)/len(set(x)) > 2 because of the data you've generated).
The optimal algorithm would probably use binary search to find the maximum j for each minimum i in a divide-and-conquer approach. Initial divisions would probably be of length len(x)/approximated(len(set(x))). Hopefully it could be carried out such that even if len(x) == len(set(x)) it still uses only n operations.
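A simple version of the binary-search idea, using the standard bisect module rather than the divide-and-conquer scheme described above, jumps over each run with bisect_right. It assumes x is sorted; the name dedupe_sorted is hypothetical. It uses roughly u·log n comparisons for u distinct values, so it only beats a linear scan when runs are long (with all-unique input it is worse, about n·log n).

```python
from bisect import bisect_right

def dedupe_sorted(x):
    """Return the distinct values of a sorted list, jumping over each run
    with a binary search instead of comparing every neighbouring pair."""
    result = []
    i = 0
    while i < len(x):
        result.append(x[i])
        # First index after the run of values equal to x[i], searching from i.
        i = bisect_right(x, x[i], i)
    return result

print(dedupe_sorted(['a', 'a', 'a', 'b', 'b', 'c', 'd']))  # ['a', 'b', 'c', 'd']
```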
There is a unique_everseen recipe described in
http://docs.python.org/2/library/itertools.html
from itertools import filterfalse  # ifilterfalse in Python 2
def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
Looks ok to me. If you really want to use sets do something like this:
def ordered_set(_list):
    result = set()
    lastitem = None
    for item in _list:
        if item != lastitem:
            result.add(item)
            lastitem = item
    return sorted(result)
I don't know what performance you will get; you should test it. Probably about the same, because of the method-call overhead!
If you really are paranoid, just like me, read here:
http://wiki.python.org/moin/HowTo/Sorting/
http://wiki.python.org/moin/PythonSpeed/PerformanceTips
Just remembered this (it contains the answer):
http://www.peterbe.com/plog/uniqifiers-benchmark
I have read the answers to the "Slicing a list into n nearly-equal-length partitions" question.
This is the accepted answer:
def partition(lst, n):
    division = len(lst) / float(n)
    return [lst[int(round(division * i)): int(round(division * (i + 1)))] for i in range(n)]
I am wondering, how does one modify these solutions in order to randomly assign items to a partition as opposed to incremental assignment.
Call random.shuffle() on the list before partitioning it.
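A sketch of that suggestion in Python 3: shuffle a copy, then cut it with the accepted answer's rounding scheme. Copying first is a design choice so the caller's list keeps its order; the rng parameter is a hypothetical addition that lets you pass a seeded random.Random for reproducible results.

```python
import random

def random_partition(lst, n, rng=random):
    """Shuffle a copy of lst, then cut it into n nearly equal slices using
    the same rounding scheme as the accepted answer."""
    items = list(lst)          # copy, so the caller's list keeps its order
    rng.shuffle(items)
    division = len(items) / n
    return [items[round(division * i):round(division * (i + 1))] for i in range(n)]

parts = random_partition(range(10), 3, random.Random(0))
print([len(p) for p in parts])  # [3, 4, 3]
```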
Complete 2018 solution (Python 3.6):
import random
def partition(list_in, n):
    random.shuffle(list_in)
    return [list_in[i::n] for i in range(n)]
Beware! random.shuffle works in place, so this mutates your original list.
Shuffle the input list.
First you randomize the list, and then you split it into n nearly equal parts.
Shuffling the list doesn't preserve order. You could do something like this instead (pretty easy to adapt to more than two parts). Completely untested.
from __future__ import annotations
from typing import TypeVar
import random
T = TypeVar("T")
def partition_list(s: list[T]) -> tuple[list[T], list[T]]:
    """
    Randomly partition a list into two lists, preserving order. The number to
    take is drawn from a uniform distribution.
    """
    len_a = random.randint(0, len(s))
    len_b = len(s) - len_a
    put_in_a = [True] * len_a + [False] * len_b
    random.shuffle(put_in_a)
    a: list[T] = []
    b: list[T] = []
    for val, in_a in zip(s, put_in_a):
        if in_a:
            a.append(val)
        else:
            b.append(val)
    return a, b
The random partition that also preserves the order:
from random import shuffle
def partition_preserve_order(list_in, n):
    indices = list(range(len(list_in)))
    shuffle(indices)
    index_partitions = [sorted(indices[i::n]) for i in range(n)]
    return [[list_in[i] for i in index_partition]
            for index_partition in index_partitions]
(that is, we shuffle the indices, then sort them within each partition)
Example (the output is random):
partition_preserve_order(list('abcdefghijklmnopqrstuvxyz'), 3)
# [
# ['c', 'd', 'g', 'm', 'p', 'r', 'v', 'x', 'y'],
# ['b', 'e', 'h', 'k', 'o', 'q', 't', 'u'],
# ['a', 'f', 'i', 'j', 'l', 'n', 's', 'z']
# ]