The normal kind of permutation is:
'ABC'
↓
'ACB'
'BAC'
'BCA'
'CAB'
'CBA'
But, what if I want to do this:
'ABC'
↓
'AA'
'AB'
'AC'
'BA'
'BB'
'BC'
'CA'
'CB'
'CC'
What is this called, and how efficient would this be with arrays with hundreds of elements?
Your terminology is a bit confusing: what you have are not permutations of your characters, but rather the pairing of every possible character with every possible character: a Cartesian product.
You can use itertools.product to generate these combinations, but note that this returns an iterator rather than a container. So if you need all the combinations in a list, you need to construct a list explicitly:
from itertools import product
mystr = 'ABC'
prodlen = 2
products = list(product(mystr,repeat=prodlen))
Or, if you're only looping over these values:
for char1, char2 in product(mystr, repeat=prodlen):
    # do something with your characters
    ...
Or, if you want to generate the 2-length strings, you can do this in a list comprehension:
allpairs = [''.join(pairs) for pairs in products]
# ['AA', 'AB', 'AC', 'BA', 'BB', 'BC', 'CA', 'CB', 'CC']
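On the efficiency question: picking r items from an n-element input this way gives n ** r results, so a fully built list grows quickly, while iterating lazily only builds the tuples you actually consume. A minimal sketch, assuming a hypothetical 300-element input (the names `items` and `first_five` are illustrative, not from the question):

```python
from itertools import islice, product

# Hypothetical input with a few hundred elements.
items = list(range(300))
r = 2

# The number of results is len(items) ** r; no need to generate them to count.
total = len(items) ** r
print(total)  # 90000

# product() yields results lazily, so we can stop after the first few.
first_five = list(islice(product(items, repeat=r), 5))
print(first_five)  # [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)]
```

If you only loop over the pairs once, skipping the list() call keeps memory use roughly constant.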
Nothing against itertools, but here is a little insight into how to generate these permutations of strings manually, by applying modulo arithmetic to an incrementing sequence number. It should work with a string of any length and any value of n where n <= len(s).
The number of permutations generated is len(s) ** n
For example, just call printPermutations("abc", 2)
def printPermutations(s, n):
    if (not s) or (n < 1):
        return
    maxpermutations = len(s) ** n
    for p in range(maxpermutations):
        perm = getSpecificPermutation(s, n, p)
        print(perm)

def getSpecificPermutation(s, n, p):
    # s is the source string
    # n is the number of characters to extract
    # p is the permutation sequence number
    result = ''
    for j in range(n):
        result = s[p % len(s)] + result
        p = p // len(s)
    return result
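As a sanity check, the modulo approach should yield the same strings, in the same order, as itertools.product. A small sketch, assuming a renamed copy of the function above (`get_specific_permutation` is just an illustrative name so the snippet runs on its own):

```python
from itertools import product

def get_specific_permutation(s, n, p):
    # Read p as an n-digit number in base len(s), filling the result
    # from the least significant digit towards the front.
    result = ''
    for _ in range(n):
        result = s[p % len(s)] + result
        p //= len(s)
    return result

s, n = "abc", 2
by_counting = [get_specific_permutation(s, n, p) for p in range(len(s) ** n)]
by_product = [''.join(t) for t in product(s, repeat=n)]
print(by_counting == by_product)  # True
```

One thing the manual version gives you is random access: you can compute the p-th string directly without generating the ones before it.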
You'll want to use the itertools solution. But I know what it's called...
Most people call it counting. You're being sneaky about it, but I think it's just counting in base len(set), where set is your input set (I'm assuming it is truly a set, with no repeated elements). Imagine, in your example, A -> 0, B -> 1, C -> 2. You're also asking for results with a fixed number of digits. Let me show you:
from functools import reduce  # reduce is not a builtin in Python 3

def numberToBase(n, b):
    # Return the digits of n in base b, most significant digit first
    if n == 0:
        return [0]
    digits = []
    while n:
        digits.append(int(n % b))
        n //= b  # integer division; plain / yields floats in Python 3
    return digits[::-1]

def count_me(set, max_digits=2):
    # Just count! From 0 to len(set) ** max_digits - 1 to be precise
    numbers = [i for i in range(len(set) ** max_digits)]
    # Convert to base len(set)
    lists_of_digits_in_base_b = [numberToBase(i, len(set)) for i in numbers]
    # Add 0s to the front (making each list of digits max_digits in length)
    prepended_with_zeros = []
    for li in lists_of_digits_in_base_b:
        prepended_with_zeros.append([0]*(max_digits - len(li)) + li)
    # Map each digit to an item in our set
    m = {index: item for index, item in enumerate(set)}
    temp = map(lambda x: [m[digit] for digit in x], prepended_with_zeros)
    # Convert to strings
    temp2 = map(lambda x: [str(i) for i in x], temp)
    # Concatenate each item
    concat_strings = map(lambda a: reduce(lambda x, y: x + y, a, ""), temp2)
    # list() so Python 3 returns a list rather than a lazy map object
    return list(concat_strings)
Here are some outputs:
print(count_me("ABC", 2))
outputs:
['AA', 'AB', 'AC', 'BA', 'BB', 'BC', 'CA', 'CB', 'CC']
and
print(count_me("ABCD", 2))
outputs:
['AA', 'AB', 'AC', 'AD', 'BA', 'BB', 'BC', 'BD', 'CA', 'CB', 'CC', 'CD', 'DA', 'DB', 'DC', 'DD']
and
print(count_me("ABCD", 3))
outputs (a big one):
['AAA', 'AAB', 'AAC', 'AAD', 'ABA', 'ABB', 'ABC', 'ABD', 'ACA', 'ACB', 'ACC', 'ACD', 'ADA', 'ADB', 'ADC', 'ADD', 'BAA', 'BAB', 'BAC', 'BAD', 'BBA', 'BBB', 'BBC', 'BBD', 'BCA', 'BCB', 'BCC', 'BCD', 'BDA', 'BDB', 'BDC', 'BDD', 'CAA', 'CAB', 'CAC', 'CAD', 'CBA', 'CBB', 'CBC', 'CBD', 'CCA', 'CCB', 'CCC', 'CCD', 'CDA', 'CDB', 'CDC', 'CDD', 'DAA', 'DAB', 'DAC', 'DAD', 'DBA', 'DBB', 'DBC', 'DBD', 'DCA', 'DCB', 'DCC', 'DCD', 'DDA', 'DDB', 'DDC', 'DDD']
P.S. numberToBase courtesy of this post
As Andras Deak says, use itertools.product:
import itertools
for i, j in itertools.product('ABC', repeat=2):
    print(i + j)
I am trying to create all subsets of a given string recursively.
Given string = 'aab', we generate all subsets, treating the characters as distinct.
The answer is: ["", "b", "a", "ab", "ba", "a", "ab", "ba", "aa", "aa", "aab", "aab", "aba", "aba", "baa", "baa"].
I have been looking at several solutions such as this one, but I am trying to make the function accept a single variable (only the string) and work with that, and I can't figure out how.
I have been also looking at this solution of a similar problem, but as it deals with lists and not strings I seem to have some trouble transforming that to accept and generate strings.
Here is my code, in this example I can't connect the str to the list. Hence my question.
I edited the input and the output.
def gen_all_strings(word):
    if len(word) == 0:
        return ''
    rest = gen_all_strings(word[1:])
    return rest + [[ + word[0]] + dummy for dummy in rest]
from itertools import product
def recursive_product(s, r=None, i=0):
    if r is None:
        r = []
    if i > len(s):
        return r
    for c in product(s, repeat=i):
        r.append("".join(c))
    return recursive_product(s, r, i+1)
print(recursive_product('ab'))
print(recursive_product('abc'))
Output:
['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
['', 'a', 'b', 'c', 'aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc', 'aaa', 'aab', 'aac', 'aba', 'abb', 'abc', 'aca', 'acb', 'acc', 'baa', 'bab', 'bac', 'bba', 'bbb', 'bbc', 'bca', 'bcb', 'bcc', 'caa', 'cab', 'cac', 'cba', 'cbb', 'cbc', 'cca', 'ccb', 'ccc']
To be honest it feels really forced to use recursion in this case, a much simpler version that has the same results:
nonrecursive_product = lambda s: [''.join(c)for i in range(len(s)+1) for c in product(s,repeat=i)]
This is the powerset of the set of characters in the string.
from itertools import chain, combinations
s = set('ab') #split string into a set of characters
# combinations gives the elements of the powerset of a given length r
# from_iterable puts all these into an 'iterable'
# which is converted here to a list
list(chain.from_iterable(combinations(s, r) for r in range(len(s)+1)))
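Since combinations returns tuples, you still need a join to get strings back. A minimal sketch of that step, assuming a hypothetical helper name `powerset_strings`; note that going through set() collapses the repeated 'a' in 'aab', so this does not reproduce the duplicated entries shown in the question:

```python
from itertools import chain, combinations

def powerset_strings(word):
    # Deduplicate and sort the characters so the output order is stable.
    chars = sorted(set(word))
    subsets = chain.from_iterable(
        combinations(chars, r) for r in range(len(chars) + 1)
    )
    # Each subset is a tuple of characters; join it into a string.
    return [''.join(subset) for subset in subsets]

print(powerset_strings('ab'))   # ['', 'a', 'b', 'ab']
print(powerset_strings('aab'))  # ['', 'a', 'b', 'ab']
```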
import itertools as it
s='aab'
subsets = sorted(list(map("".join, it.chain.from_iterable(it.permutations(s,r) for r in range(len(s) + 1)))))
print(subsets)
# ['', 'a', 'a', 'aa', 'aa', 'aab', 'aab', 'ab', 'ab', 'aba', 'aba', 'b', 'ba', 'ba', 'baa', 'baa']
For example, given the alphabet = 'abcd', how can I get this output in Python:
a
aa
b
bb
ab
ba
(...)
iteration by iteration.
I already tried the powerset() function that is found here on Stack Overflow, but that doesn't repeat letters in the same string.
Also, if I want to set a minimum and maximum length for the strings, how can I do that?
For example min=3 and max=4, abc, aaa, aba, ..., aaaa, abca, abcb, ...
You can use combinations_with_replacement from itertools (docs). The function combinations_with_replacement takes an iterable object as its first argument (e.g. your alphabet) and the desired length of the combinations to generate. Since you want strings of different lengths, you can loop over each desired length.
For example:
from itertools import combinations_with_replacement
def get_all_poss_strings(alphabet, min_length, max_length):
    poss_strings = []
    for r in range(min_length, max_length + 1):
        poss_strings += combinations_with_replacement(alphabet, r)
    # combinations_with_replacement returns tuples, so join them into individual strings
    return ["".join(s) for s in poss_strings]
Sample:
alphabet = "abcd"
min_length = 3
max_length = 4
get_all_poss_strings(alphabet, min_length, max_length)
Output:
['aaa', 'aab', 'aac', 'aad', 'abb', 'abc', 'abd', 'acc', 'acd', 'add', 'bbb', 'bbc', 'bbd', 'bcc', 'bcd', 'bdd', 'ccc', 'ccd', 'cdd', 'ddd', 'aaaa', 'aaab', 'aaac', 'aaad', 'aabb', 'aabc', 'aabd', 'aacc', 'aacd', 'aadd', 'abbb', 'abbc', 'abbd', 'abcc', 'abcd', 'abdd', 'accc', 'accd', 'acdd', 'addd', 'bbbb', 'bbbc', 'bbbd', 'bbcc', 'bbcd', 'bbdd', 'bccc', 'bccd', 'bcdd', 'bddd', 'cccc', 'cccd', 'ccdd', 'cddd', 'dddd']
Edit:
If order also matters for your strings (as indicated by having "ab" and "ba"), you can use the following function to get all permutations of all lengths in a given range:
from itertools import combinations_with_replacement, permutations
def get_all_poss_strings(alphabet, min_length, max_length):
    poss_strings = []
    for r in range(min_length, max_length + 1):
        combos = combinations_with_replacement(alphabet, r)
        perms_of_combos = []
        for combo in combos:
            perms_of_combos += permutations(combo)
        poss_strings += perms_of_combos
    # set() drops the duplicates; sorted() gives a stable order (a bare set has none)
    return sorted(set("".join(s) for s in poss_strings))
Sample:
alphabet = "abcd"
min_length = 1
max_length = 2
get_all_poss_strings(alphabet, min_length, max_length)
Output:
['a', 'aa', 'ab', 'ac', 'ad', 'b', 'ba', 'bb', 'bc', 'bd', 'c', 'ca', 'cb', 'cc', 'cd', 'd', 'da', 'db', 'dc', 'dd']
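For a fixed length, taking every permutation of every combination with replacement and then removing duplicates should give the same set of strings as a plain Cartesian product, without the duplicate work. A quick check, as a sketch (the names `via_perms` and `via_product` are illustrative):

```python
from itertools import combinations_with_replacement, permutations, product

alphabet, r = "abcd", 2

# All orderings of every multiset of r letters, deduplicated by the set.
via_perms = {
    "".join(p)
    for combo in combinations_with_replacement(alphabet, r)
    for p in permutations(combo)
}

# Every length-r string over the alphabet, generated once each.
via_product = {"".join(p) for p in product(alphabet, repeat=r)}

print(via_perms == via_product)  # True
print(len(via_product))          # 16
```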
You can use the product function of itertools with varying lengths. The result differs in order from the example you give, but this may be what you want. This results in a generator that you can use to get all your desired strings. This code lets you set a minimum and a maximum length of the returned strings. If you do not specify a value for parameter maxlen then the generator is infinite. Be sure you have a way to stop it or you will get an infinite loop.
import itertools
def allcombinations(alphabet, minlen=1, maxlen=None):
    thislen = minlen
    while maxlen is None or thislen <= maxlen:
        for prod in itertools.product(alphabet, repeat=thislen):
            yield ''.join(prod)
        thislen += 1
for c in allcombinations('abcd', minlen=1, maxlen=2):
    print(c)
This example gives the printout which is similar to your first example, though in a different order.
a
b
c
d
aa
ab
ac
ad
ba
bb
bc
bd
ca
cb
cc
cd
da
db
dc
dd
If you really want a full list, just use
list(allcombinations('abcd', minlen=1, maxlen=2))
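If you leave maxlen unset, one way to stop the otherwise infinite generator is itertools.islice, which takes a fixed number of items and then stops. A minimal sketch, with the generator repeated so the snippet runs on its own (`first_six` is just an illustrative name):

```python
import itertools

def allcombinations(alphabet, minlen=1, maxlen=None):
    # Same generator as above, repeated so this sketch is self-contained.
    thislen = minlen
    while maxlen is None or thislen <= maxlen:
        for prod in itertools.product(alphabet, repeat=thislen):
            yield ''.join(prod)
        thislen += 1

# Take only the first 6 strings from the unbounded generator.
first_six = list(itertools.islice(allcombinations('abcd'), 6))
print(first_six)  # ['a', 'b', 'c', 'd', 'aa', 'ab']
```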
I was given a homework assignment to find all possible sequences of a given word, e.g. if word = 'abc', the code below would return ['a', 'ab', 'abc', 'ac', 'acb', 'b', 'ba', 'bac', 'bc', 'bca', 'c', 'ca', 'cab', 'cb', 'cba'].
However, this feels inefficient. I'm just starting to learn recursion, so I'm not sure if there is a better or more efficient way to produce these sequences?
edit:
I think it's necessary to add a couple things as I kept working and reading the material
Duplicates are fine, those are sorted out in a separate function
Each value is unique, so sequence 'aab' should produce two 'aa' sequences
def gen_all_strings(word):
    if len(word) == 1:
        return list(word)
    else:
        main_list = list()
        for idx in range(len(word)):
            cur_val = word[idx]
            rest = gen_all_strings(word[:idx] + word[idx+1:])
            main_list.append(cur_val)
            for seq in rest:
                main_list.append(cur_val + seq)
        return main_list
Itertools and list comprehensions are good for breaking stuff down like this.
import itertools
["".join(x) for y in range(1, len(word) + 1) for x in itertools.permutations(word, y)]
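To see that this matches the question, here is the same comprehension as a runnable sketch, assuming word = 'abc' from the question. The order differs from the recursive version (shorter sequences come first), so the result is sorted before comparing:

```python
import itertools

word = 'abc'  # example input taken from the question
sequences = [
    "".join(x)
    for y in range(1, len(word) + 1)
    for x in itertools.permutations(word, y)
]
print(sorted(sequences))
# ['a', 'ab', 'abc', 'ac', 'acb', 'b', 'ba', 'bac', 'bc', 'bca', 'c', 'ca', 'cab', 'cb', 'cba']
print(len(sequences))  # 15
```

permutations treats elements as distinct by position, so an input like 'aab' would produce 'aa' twice, which is what the edit in the question asks for.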
I've seen many questions on getting all the possible substrings (i.e., adjacent sets of characters), but none on generating all possible strings including the combinations of its substrings.
For example, let:
x = 'abc'
I would like the output to be something like:
['abc', 'ab', 'ac', 'bc', 'a', 'b', 'c']
The main point is that we can remove multiple characters that are not adjacent in the original string (as well as the adjacent ones).
Here is what I have tried so far:
def return_substrings(input_string):
    length = len(input_string)
    return [input_string[i:j + 1] for i in range(length) for j in range(i, length)]
print(return_substrings('abc'))
However, this only removes sets of adjacent strings from the original string, and will not return the element 'ac' from the example above.
Another example is if we use the string 'abcde', the output list should contain the elements 'ace', 'bd' etc.
You can do this easily using itertools.combinations
>>> from itertools import combinations
>>> x = 'abc'
>>> [''.join(l) for i in range(len(x)) for l in combinations(x, i+1)]
['a', 'b', 'c', 'ab', 'ac', 'bc', 'abc']
If you want it in the reversed order, you can make the range function return its sequence in reversed order
>>> [''.join(l) for i in range(len(x),0,-1) for l in combinations(x, i)]
['abc', 'ab', 'ac', 'bc', 'a', 'b', 'c']
This is a fun exercise. I think other answers may use itertools.product or itertools.combinations. But just for fun, you can also do this recursively with something like
def subs(string, ret=['']):
    if len(string) == 0:
        return ret
    head, tail = string[0], string[1:]
    ret = ret + list(map(lambda x: x+head, ret))
    return subs(tail, ret)
subs('abc')
# returns ['', 'a', 'b', 'ab', 'c', 'ac', 'bc', 'abc']
@Sunitha's answer provided the right tool to use. I will just suggest an improved way of using it inside your return_substrings method. Basically, my solution takes care of duplicates.
I will use "ABCA" in order to prove validity of my solution. Note that it would include a duplicate 'A' in the returned list of the accepted answer.
Python 3.7+ solution,
from itertools import combinations

x = "ABCA"
def return_substrings(x):
    all_combnations = [''.join(l) for i in range(len(x)) for l in combinations(x, i+1)]
    return list(reversed(list(dict.fromkeys(all_combnations))))
    # return list(dict.fromkeys(all_combnations)) for non-reversed ordering
print(return_substrings(x))
# ['ABCA', 'BCA', 'ACA', 'ABA', 'ABC', 'CA', 'BA', 'BC', 'AA', 'AC', 'AB', 'C', 'B', 'A']
Python 2.7 solution,
You'll have to use OrderedDict instead of a normal dict. Therefore,
return list(reversed(list(dict.fromkeys(all_combnations))))
becomes
return list(reversed(list(OrderedDict.fromkeys(all_combnations))))
Is order irrelevant for you?
You can reduce code complexity if order is not relevant,
from itertools import combinations

x = "ABCA"
def return_substrings(x):
    all_combnations = [''.join(l) for i in range(len(x)) for l in combinations(x, i+1)]
    return list(set(all_combnations))
def return_substrings(s):
    all_sub = set()
    recent = {s}
    while recent:
        tmp = set()
        for word in recent:
            for i in range(len(word)):
                tmp.add(word[:i] + word[i + 1:])
        all_sub.update(recent)
        recent = tmp
    return all_sub
For an overkill / different version of the accepted answer (expressing combinations using https://docs.python.org/3/library/itertools.html#itertools.product ):
["".join(["abc"[y[0]] for y in x if y[1]]) for x in map(enumerate, itertools.product((False, True), repeat=3))]
For a more visual interpretation, consider all substrings as a mapping of all bitstrings of length n.
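The bitstring view can be sketched directly with integer masks: each number from 1 to 2**n - 1 marks which positions of the string to keep. This is only an illustration of the idea, with a hypothetical function name, not the code from the answer above:

```python
def subsequences_by_bitmask(s):
    n = len(s)
    result = []
    # Each mask from 1 to 2**n - 1 is one non-empty choice of positions.
    for mask in range(1, 2 ** n):
        # Keep s[i] when bit i of the mask is set.
        result.append(''.join(s[i] for i in range(n) if mask >> i & 1))
    return result

print(subsequences_by_bitmask('abc'))
# ['a', 'b', 'ab', 'c', 'ac', 'bc', 'abc']
```

Starting the range at 0 instead of 1 would also include the empty string.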
Here is my problem: I have a sequence representing a cyclic peptide and I'm trying to write a function that generates all possible subpeptides. A subpeptide is created when bonds between 2 amino acids are broken. For example, for the peptide 'ABCD', the subpeptides would be 'A', 'B', 'C', 'D', 'AB', 'BC', 'CD', 'DA', 'ABC', 'BCD', 'CDA', 'DAB'. Thus, the number of possible subpeptides of a peptide of length n will always be n*(n-1). Note that not all of them are substrings of the peptide ('DA', 'CDA'...).
I've written code that generates combinations. However, it produces some extra elements, such as amino acids that are not linked ('AC', 'BD'...). Does anyone have a hint on how I could eliminate those, given that the peptide may have a different length each time the function is called? Here's what I have so far:
def Subpeptides(peptide):
    subpeptides = []
    from itertools import combinations
    for n in range(1, len(peptide)):
        subpeptides.extend(
            [''.join(comb) for comb in combinations(peptide, n)]
        )
    return subpeptides
Here are the results for peptide 'ABCD':
['A', 'B', 'C', 'D', 'AB', 'AC', 'AD', 'BC', 'BD', 'CD', 'ABC', 'ABD', 'ACD', 'BCD']
The order of amino acids is not important, as long as they represent a real sequence of the peptide. For example, 'ABD' is a valid form of 'DAB', since D and A have a bond in the cyclic peptide.
I'm using Python.
It's probably easier to just generate them all:
def subpeptides(peptide):
    l = len(peptide)
    looped = peptide + peptide
    for start in range(0, l):
        for length in range(1, l):
            print(looped[start:start+length])
which gives:
>>> subpeptides("ABCD")
A
AB
ABC
B
BC
BCD
C
CD
CDA
D
DA
DAB
(If you want a list instead of printing, just change print(...) to yield ... and you have a generator.)
All the above does is enumerate the different places the first bond could be broken, and then the different products you would get if the next bond broke after one, two, or three (in this case) acids. looped is just an easy way to avoid writing the logic for going "round the loop".
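The generator variant mentioned above might look like this. It is a sketch under the same doubling trick, with the hypothetical name `iter_subpeptides`, and it also checks the n*(n-1) count from the question:

```python
def iter_subpeptides(peptide):
    n = len(peptide)
    looped = peptide + peptide  # doubling handles the wrap-around
    for start in range(n):
        # Lengths 1 .. n-1: the full peptide itself is not yielded.
        for length in range(1, n):
            yield looped[start:start + length]

result = list(iter_subpeptides("ABCD"))
print(result)
# ['A', 'AB', 'ABC', 'B', 'BC', 'BCD', 'C', 'CD', 'CDA', 'D', 'DA', 'DAB']
print(len(result) == 4 * (4 - 1))  # True
```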
The last term (the full peptide) is missing.
You can use the code below:
def subpeptides(peptide):
    l = len(peptide)
    ls = []
    looped = peptide + peptide
    for start in range(0, l):
        for length in range(1, l):
            ls.append(looped[start:start+length])
    ls.append(peptide)
    return ls
You can use this one:
>>> aa = 'ABCD'
>>> F = []
>>> B = []
>>> for j in range(1, len(aa) + 1, 1):
...     for i in range(0, len(aa), 1):
...         A = str.split(((aa * j)[i:i + j]))
...         B = B + A
...
>>> C = (B[0:len(aa) * len(aa) - len(aa) + 1])
It gives you:
C=['A', 'B', 'C', 'D', 'AB', 'BC', 'CD', 'DA', 'ABC', 'BCD', 'CDA', 'DAB', 'ABCD']
I hope this helps. By the way, I'm doing the Coursera course too; if you're interested in joining forces, let me know.