Using Recursion to make sequences of a word - python

I was given a homework assignment to find all possible sequences of a given word. E.g., if word = 'abc', the code below would return ['a', 'ab', 'abc', 'ac', 'acb', 'b', 'ba', 'bac', 'bc', 'bca', 'c', 'ca', 'cab', 'cb', 'cba'].
However, this feels inefficient. I'm just starting to learn recursion, so I'm not sure whether there is a better or more efficient way to produce these sequences.
Edit:
I think it's necessary to add a couple of things, as I kept working and reading the material:
Duplicates are fine; those are sorted out in a separate function.
Each character is treated as unique, so the word 'aab' should produce two 'aa' sequences.
def gen_all_strings(word):
    if len(word) == 1:
        return list(word)
    else:
        main_list = list()
        for idx in range(len(word)):
            cur_val = word[idx]
            rest = gen_all_strings(word[:idx] + word[idx+1:])
            main_list.append(cur_val)
            for seq in rest:
                main_list.append(cur_val + seq)
        return main_list

Itertools and list comprehensions are good for breaking stuff down like this.
import itertools
["".join(x) for y in range(1, len(word) + 1) for x in itertools.permutations(word, y)]
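If the goal is to keep the recursion but avoid building every intermediate list, one option is a generator. The sketch below assumes that "more efficient" here means lazy output rather than fewer steps; iter_sequences is a hypothetical name, not part of the original code. The itertools version is used only as a cross-check.

```python
from itertools import permutations


def iter_sequences(word):
    """Yield every sequence of the question's recursion, one at a time."""
    for idx, ch in enumerate(word):
        yield ch  # the single character on its own
        rest = word[:idx] + word[idx + 1:]
        for seq in iter_sequences(rest):  # empty rest yields nothing
            yield ch + seq


recursive = sorted(iter_sequences('abc'))
with_itertools = sorted(
    ''.join(p) for n in range(1, 4) for p in permutations('abc', n)
)
print(recursive == with_itertools)  # True
```

Both approaches treat repeated letters as distinct positions, so 'aab' still produces 'aa' twice.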

Related

Get all possible ordered sublists of a list

Let's say I have a list with the following letters:
lst=['A','B','C','D']
And I need to get all the possible sublists of that list that maintain the order. Thus, the result must be:
res=['A'
'AB'
'ABC'
'ABCD'
'B'
'BC'
'BCD'
'C'
'CD'
'D']
I had implemented the following for loop, but an error occurs: "TypeError: can only concatenate str (not "list") to str"
res=[]
for x in range(len(lst)):
    for y in range(len(lst)):
        if x==y:
            res.append(x)
        if y>x:
            res.append(lst[x]+lst[y:len(lst)-1])
Is there a better and more efficient way to do this?
lst=['A','B','C','D']
out = []
for i in range(len(lst)):
    for j in range(i, len(lst)):
        out.append( ''.join(lst[i:j+1]) )
print(out)
Prints:
['A', 'AB', 'ABC', 'ABCD', 'B', 'BC', 'BCD', 'C', 'CD', 'D']
Rather than nested loops with redefined inner loop bounds on each go, you can use itertools to generate the bounds for you:
from itertools import combinations
lst = ['A','B','C','D']
out = []
for s, e in combinations(range(len(lst) + 1), 2):
    out.append(''.join(lst[s:e]))
combinations conveniently produces all possible start and end indices from a single range, producing each set one at a time in your desired order. It also simplifies the code enough that the equivalent listcomp isn't too unreadable, allowing you to condense three lines of code down to one:
out = [''.join(lst[s:e]) for s, e in combinations(range(len(lst) + 1), 2)]
Either way, out ends up with the value:
['A', 'AB', 'ABC', 'ABCD', 'B', 'BC', 'BCD', 'C', 'CD', 'D']
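To see why this works, it can help to look at the index pairs themselves. This is just an illustrative sketch; the variable name bounds is my own.

```python
from itertools import combinations

lst = ['A', 'B', 'C', 'D']

# Every (start, end) pair with start < end, in lexicographic order.
bounds = list(combinations(range(len(lst) + 1), 2))
print(bounds)
# [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]

# Each pair is used directly as a slice: lst[start:end]
out = [''.join(lst[s:e]) for s, e in bounds]
```

The range goes up to len(lst) inclusive because the end index of a slice is exclusive.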
This is probably the closest to what you have got and will produce the desired result:
res=[]
for x in range(len(lst)):
    for y in range(len(lst)):
        if x==y:
            res.append(lst[x])
        if y>x:
            res.append(''.join(lst[x:y+1]))
The error you are describing means that you are trying to concatenate a string and a list:
lst[x]+lst[y:len(lst)-1]
lst[x] is a single character (a str) and lst[y:len(lst)-1] is a list of characters, and Python does not know how to add the two together. It can concatenate two strings, though, so convert the slice to a string with ''.join first.
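A small sketch reproducing the error and the fix in isolation (the indices are picked arbitrarily for illustration):

```python
lst = ['A', 'B', 'C', 'D']

try:
    lst[0] + lst[1:3]  # str + list is not defined
except TypeError as exc:
    message = str(exc)
    print(message)

# Joining the slice first gives a str, so the concatenation works.
joined = lst[0] + ''.join(lst[1:3])
print(joined)  # ABC
```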

Python code to solve classic P(n, r): Print all permutations of n objects taken r at a time without repetition

Python code to solve classic P(n, r)
Problem: Print all permutations of n objects taken r at a time without repetition.
I'm a Python learner looking for an elegant solution vs. trying to solve a coding problem at work.
I'm interested in seeing code to solve the classic P(n, r) permutation problem: how to print all permutations of a string taken r characters at a time, without repeated characters.
Because learning is my focus, I'm not interested in using the Python itertools "permutations" library function. I looked at it, but couldn't understand what it was doing. I'm looking for actual code to solve this problem, so I can learn the implementation.
Example: if input string s == 'abcdef', and r == 4, then n == 6.
Output would be something like: abcd abce abcf abde abdf abef ...
There are a lot of closely similar questions, but I didn't find a duplicate. Most specify "r". I want to leave r as an input parameter to keep the solution general.
This approach uses recursive generator functions which I find very readable. It is the easiest to start with combinations:
def combs(s, r):
    if not r:
        yield ''
    elif s:
        first, rest = s[0], s[1:]
        for comb in combs(rest, r-1):
            yield first + comb  # use first char ...
        yield from combs(rest, r)  # ... or don't
>>> list(combs('abcd', 2))
['ab', 'ac', 'ad', 'bc', 'bd', 'cd']
>>> list(combs('abcd', 3))
['abc', 'abd', 'acd', 'bcd']
And build permutations on top of them:
def perms(s, r):
    if not r:
        yield ''
    else:
        for comb in combs(s, r):
            for i, char in enumerate(comb):
                rest = comb[:i] + comb[i+1:]
                for perm in perms(rest, r-1):
                    yield char + perm
>>> list(perms('abc', 2))
['ab', 'ba', 'ac', 'ca', 'bc', 'cb']
>>> list(perms('abcd', 2))
['ab', 'ba', 'ac', 'ca', 'ad', 'da', 'bc', 'cb', 'bd', 'db', 'cd', 'dc']
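For comparison, here is a sketch of a more direct recursion that skips the combinations step: pick each character in turn as the first one, then permute what is left with r-1. perms_direct is a hypothetical name, and itertools.permutations appears only as a cross-check, not as part of the solution.

```python
from itertools import permutations  # used only to verify the result
from math import factorial


def perms_direct(s, r):
    """Yield all r-length permutations of the characters in s."""
    if r == 0:
        yield ''
        return
    for i, ch in enumerate(s):
        remaining = s[:i] + s[i + 1:]  # everything except position i
        for tail in perms_direct(remaining, r - 1):
            yield ch + tail


result = list(perms_direct('abcdef', 4))
expected_count = factorial(6) // factorial(6 - 4)  # P(6, 4) = 360
print(len(result) == expected_count)  # True
print(sorted(result) == sorted(''.join(p) for p in permutations('abcdef', 4)))  # True
```

Note that this emits results in a different order from perms above (grouped by first character rather than by combination).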

Getting all combinations of a string and its substrings

I've seen many questions on getting all the possible substrings (i.e., adjacent sets of characters), but none on generating all possible strings including the combinations of its substrings.
For example, let:
x = 'abc'
I would like the output to be something like:
['abc', 'ab', 'ac', 'bc', 'a', 'b', 'c']
The main point is that we can remove multiple characters that are not adjacent in the original string (as well as the adjacent ones).
Here is what I have tried so far:
def return_substrings(input_string):
    length = len(input_string)
    return [input_string[i:j + 1] for i in range(length) for j in range(i, length)]
print(return_substrings('abc'))
However, this only removes sets of adjacent strings from the original string, and will not return the element 'ac' from the example above.
Another example is if we use the string 'abcde', the output list should contain the elements 'ace', 'bd' etc.
You can do this easily using itertools.combinations
>>> from itertools import combinations
>>> x = 'abc'
>>> [''.join(l) for i in range(len(x)) for l in combinations(x, i+1)]
['a', 'b', 'c', 'ab', 'ac', 'bc', 'abc']
If you want it in the reversed order, you can make the range function return its sequence in reversed order
>>> [''.join(l) for i in range(len(x),0,-1) for l in combinations(x, i)]
['abc', 'ab', 'ac', 'bc', 'a', 'b', 'c']
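The same idea can be wrapped in a function with itertools.chain, in the style of the "powerset" recipe from the itertools documentation. This is only a sketch; nonempty_subsequences is a name I made up.

```python
from itertools import chain, combinations


def nonempty_subsequences(s):
    """Return every non-empty selection of characters, keeping their order."""
    by_length = (combinations(s, r) for r in range(1, len(s) + 1))
    return [''.join(c) for c in chain.from_iterable(by_length)]


print(nonempty_subsequences('abc'))
# ['a', 'b', 'c', 'ab', 'ac', 'bc', 'abc']
```

For 'abcde' the result includes non-adjacent selections such as 'ace' and 'bd', which is what the question asks for.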
This is a fun exercise. I think other answers may use itertools.product or itertools.combinations. But just for fun, you can also do this recursively with something like
def subs(string, ret=['']):
    if len(string) == 0:
        return ret
    head, tail = string[0], string[1:]
    ret = ret + list(map(lambda x: x+head, ret))
    return subs(tail, ret)

subs('abc')
# returns ['', 'a', 'b', 'ab', 'c', 'ac', 'bc', 'abc']
@Sunitha's answer provided the right tool to use. I will just suggest an improved way while using your return_substrings method. Basically, my solution will take care of duplicates.
I will use "ABCA" to demonstrate my solution. Note that the accepted answer would include a duplicate 'A' in the returned list.
Python 3.7+ solution:
from itertools import combinations

x = "ABCA"
def return_substrings(x):
    all_combnations = [''.join(l) for i in range(len(x)) for l in combinations(x, i+1)]
    return list(reversed(list(dict.fromkeys(all_combnations))))
    # return list(dict.fromkeys(all_combnations)) for non-reversed ordering
print(return_substrings(x))
Prints:
['ABCA', 'BCA', 'ACA', 'ABA', 'ABC', 'CA', 'BA', 'BC', 'AA', 'AC', 'AB', 'C', 'B', 'A']
Python 2.7 solution,
You'll have to use OrderedDict instead of a normal dict. Therefore,
return list(reversed(list(dict.fromkeys(all_combnations))))
becomes
return list(reversed(list(OrderedDict.fromkeys(all_combnations))))
Is order irrelevant for you?
You can reduce code complexity if order is not relevant:
x = "ABCA"
def return_substrings(x):
    all_combnations = [''.join(l) for i in range(len(x)) for l in combinations(x, i+1)]
    return list(set(all_combnations))
def return_substrings(s):
    all_sub = set()
    recent = {s}
    while recent:
        tmp = set()
        for word in recent:
            for i in range(len(word)):
                tmp.add(word[:i] + word[i + 1:])
        all_sub.update(recent)
        recent = tmp
    return all_sub
For an overkill / different version of the accepted answer (expressing combinations using https://docs.python.org/3/library/itertools.html#itertools.product ):
["".join(["abc"[y[0]] for y in x if y[1]]) for x in map(enumerate, itertools.product((False, True), repeat=3))]
For a more visual interpretation, consider all substrings as a mapping of all bitstrings of length n.
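The bitstring view can also be written with plain integers: every number from 1 to 2**n - 1 is a mask saying which positions to keep. A minimal sketch, with subsequences_by_bitmask as a hypothetical name:

```python
def subsequences_by_bitmask(s):
    """Bit i of each mask decides whether s[i] is kept."""
    n = len(s)
    out = []
    for mask in range(1, 1 << n):  # skip 0, which is the empty string
        out.append(''.join(s[i] for i in range(n) if mask & (1 << i)))
    return out


print(subsequences_by_bitmask('abc'))
# ['a', 'b', 'ab', 'c', 'ac', 'bc', 'abc']
```

The order follows the binary counting order of the masks, not the length of the results.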

How can I use different kinds of permutations in Python3?

The normal kind of permutation is:
'ABC'
↓
'ACB'
'BAC'
'BCA'
'CAB'
'CBA'
But, what if I want to do this:
'ABC'
↓
'AA'
'AB'
'AC'
'BA'
'BB'
'BC'
'CA'
'CB'
'CC'
What is this called, and how efficient would this be with arrays with hundreds of elements?
Your terminology is a bit confusing: what you have are not permutations of your characters, but rather the pairing of every possible character with every possible character: a Cartesian product.
You can use itertools.product to generate these combinations, but note that this returns an iterator rather than a container. So if you need all the combinations in a list, you need to construct a list explicitly:
from itertools import product
mystr = 'ABC'
prodlen = 2
products = list(product(mystr,repeat=prodlen))
Or, if you're only looping over these values:
for char1,char2 in product(mystr,repeat=prodlen):
    # do something with your characters
    ...
Or, if you want to generate the 2-length strings, you can do this in a list comprehension:
allpairs = [''.join(pairs) for pairs in products]
# ['AA', 'AB', 'AC', 'BA', 'BB', 'BC', 'CA', 'CB', 'CC']
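On the efficiency question: the number of results is len(items) ** repeat, so with hundreds of elements the full list gets large quickly, and iterating lazily is usually the safer choice. A rough sketch, where items is a made-up stand-in for an array of 300 elements:

```python
from itertools import islice, product

items = [str(i) for i in range(300)]  # hypothetical input, 300 elements
r = 2

total = len(items) ** r  # how many tuples the full product contains
print(total)  # 90000

# Take just the first few results without building all 90000 tuples.
first_three = list(islice(product(items, repeat=r), 3))
print(first_three)  # [('0', '0'), ('0', '1'), ('0', '2')]
```

With repeat=3 the same input would already give 27,000,000 tuples, which is where materializing a list starts to hurt.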
Nothing against itertools, but here is a little insight into how to generate these strings manually by applying modulo arithmetic to an incrementing sequence number. It should work with a string of any length and any value of n where n <= len(s).
The number of permutations generated is len(s) ** n.
For example, just call printPermutations("abc", 2).
def printPermutations(s, n):
    if (not s) or (n < 1):
        return
    maxpermutations = len(s) ** n
    for p in range(maxpermutations):
        perm = getSpecificPermutation(s, n, p)
        print(perm)

def getSpecificPermutation(s, n, p):
    # s is the source string
    # n is the number of characters to extract
    # p is the permutation sequence number
    result = ''
    for j in range(n):
        result = s[p % len(s)] + result
        p = p // len(s)
    return result
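One way to sanity-check the modulo idea is to compare it against itertools.product. The sketch below re-implements the decoding step under a hypothetical name, nth_word, so that it runs on its own; it is not the answer's exact code.

```python
from itertools import product


def nth_word(s, n, p):
    """Decode p as an n-digit number in base len(s), digits drawn from s."""
    chars = []
    for _ in range(n):
        p, digit = divmod(p, len(s))  # peel off the least significant digit
        chars.append(s[digit])
    return ''.join(reversed(chars))  # most significant digit first


by_counting = [nth_word("abc", 2, p) for p in range(len("abc") ** 2)]
by_product = [''.join(t) for t in product("abc", repeat=2)]
print(by_counting == by_product)  # True
```

Because each string is computed from its sequence number alone, you can jump straight to the p-th result without generating the ones before it.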
You'll want to use the itertools solution. But I know what it's called...
Most people call it counting. You're being sneaky about it, but I think it's just counting in base len(set), where set is your input set (I'm assuming it is truly a set, no repeated elements). Imagine, in your example A -> 0, B->1, C->2. You're also asking for elements that have a certain amount of max digits. Let me show you:
from functools import reduce

def numberToBase(n, b):
    if n == 0:
        return [0]
    digits = []
    while n:
        digits.append(int(n % b))
        n //= b  # integer division; plain / would produce floats in Python 3
    return digits[::-1]

def count_me(set, max_digits=2):
    # Just count! From 0 to len(set) ** max_digits to be precise
    numbers = [i for i in range(len(set) ** max_digits)]
    # Convert to base len(set)
    lists_of_digits_in_base_b = [numberToBase(i, len(set)) for i in numbers]
    # Add 0s to the front (making each list of digits max_digits in length)
    prepended_with_zeros = []
    for li in lists_of_digits_in_base_b:
        prepended_with_zeros.append([0]*(max_digits - len(li)) + li)
    # Map each digit to an item in our set
    m = {index: item for index, item in enumerate(set)}
    temp = map(lambda x: [m[digit] for digit in x], prepended_with_zeros)
    # Convert to strings
    temp2 = map(lambda x: [str(i) for i in x], temp)
    # Concatenate each item
    concat_strings = map(lambda a: reduce(lambda x, y: x + y, a, ""), temp2)
    return list(concat_strings)
Here's some outputs:
print(count_me("ABC", 2))
outputs:
['AA', 'AB', 'AC', 'BA', 'BB', 'BC', 'CA', 'CB', 'CC']
and
print(count_me("ABCD", 2))
outputs:
['AA', 'AB', 'AC', 'AD', 'BA', 'BB', 'BC', 'BD', 'CA', 'CB', 'CC', 'CD', 'DA', 'DB', 'DC', 'DD']
and
print(count_me("ABCD", 3))
outputs (a big one):
['AAA', 'AAB', 'AAC', 'AAD', 'ABA', 'ABB', 'ABC', 'ABD', 'ACA', 'ACB', 'ACC', 'ACD', 'ADA', 'ADB', 'ADC', 'ADD', 'BAA', 'BAB', 'BAC', 'BAD', 'BBA', 'BBB', 'BBC', 'BBD', 'BCA', 'BCB', 'BCC', 'BCD', 'BDA', 'BDB', 'BDC', 'BDD', 'CAA', 'CAB', 'CAC', 'CAD', 'CBA', 'CBB', 'CBC', 'CBD', 'CCA', 'CCB', 'CCC', 'CCD', 'CDA', 'CDB', 'CDC', 'CDD', 'DAA', 'DAB', 'DAC', 'DAD', 'DBA', 'DBB', 'DBC', 'DBD', 'DCA', 'DCB', 'DCC', 'DCD', 'DDA', 'DDB', 'DDC', 'DDD']
P.S. numberToBase courtesy of this post
As Andras Deak says, use itertools.product:
import itertools
for i, j in itertools.product('ABC', repeat=2):
    print(i + j)

How to generate subpeptides (special combinations) from a string representing a cyclic peptide?

Here is my problem: I have a sequence representing a cyclic peptide and I'm trying to create a function that generates all possible subpeptides. A subpeptide is created when bonds between 2 amino acids are broken. For example: for the peptide 'ABCD', its subpeptides would be 'A', 'B', 'C', 'D', 'AB', 'BC', 'CD', 'DA', 'ABC', 'BCD', 'CDA', 'DAB'. Thus, the number of possible subpeptides from a peptide of length n will always be n*(n-1). Note that not all of them are substrings of the peptide ('DA', 'CDA'...).
I've written code that generates combinations. However, there are some excess elements, such as amino acids that are not linked ('AC', 'BD'...). Does anyone have a hint as to how I could eliminate those, given that the peptide may have a different length each time the function is called? Here's what I have so far:
def Subpeptides(peptide):
    subpeptides = []
    from itertools import combinations
    for n in range(1, len(peptide)):
        subpeptides.extend(
            [''.join(comb) for comb in combinations(peptide, n)]
        )
    return subpeptides
Here are the results for peptide 'ABCD':
['A', 'B', 'C', 'D', 'AB', 'AC', 'AD', 'BC', 'BD', 'CD', 'ABC', 'ABD', 'ACD', 'BCD']
The order of aminoacids is not important, if they represent a real sequence of the peptide. For example, 'ABD' is a valid form of 'DAB', since D and A have a bond in the cyclic peptide.
I'm using Python.
It's probably easier to just generate them all:
def subpeptides(peptide):
    l = len(peptide)
    looped = peptide + peptide
    for start in range(0, l):
        for length in range(1, l):
            print(looped[start:start+length])
which gives:
>>> subpeptides("ABCD")
A
AB
ABC
B
BC
BCD
C
CD
CDA
D
DA
DAB
(If you want a list instead of printing, just change print(...) to yield ... and you have a generator, which you can pass to list().)
All the above does is enumerate the different places the first bond could be broken, and then the different products you would get if the next bond broke after one, two, or three (in this case) acids. looped is just an easy way to avoid having the logic of going "round the loop".
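A sketch of that generator variant, under the hypothetical name iter_subpeptides so it does not clash with the printing version:

```python
def iter_subpeptides(peptide):
    """Yield each linear fragment of a cyclic peptide, excluding the full ring."""
    n = len(peptide)
    looped = peptide + peptide  # lets slices wrap past the end
    for start in range(n):
        for length in range(1, n):
            yield looped[start:start + length]


fragments = list(iter_subpeptides("ABCD"))
print(fragments)
# ['A', 'AB', 'ABC', 'B', 'BC', 'BCD', 'C', 'CD', 'CDA', 'D', 'DA', 'DAB']
print(len(fragments))  # 12, i.e. n*(n-1) for n = 4
```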
The last term (the full peptide) is missing.
You can use the code below:
def subpeptides(peptide):
    l = len(peptide)
    ls=[]
    looped = peptide + peptide
    for start in range(0, l):
        for length in range(1, l):
            ls.append( (looped[start:start+length]))
    ls.append(peptide)
    return ls
You can use this one:
>>> aa='ABCD'
>>> F=[]
>>> B=[]
>>> for j in range(1,len(aa)+1,1):
...     for i in range(0,len(aa),1):
...         A=str.split(((aa*j)[i:i+j]))
...         B=B+A
...
>>> C=(B[0:len(aa)*len(aa)-len(aa)+1])
it gives you:
C=['A', 'B', 'C', 'D', 'AB', 'BC', 'CD', 'DA', 'ABC', 'BCD', 'CDA', 'DAB', 'ABCD']
I hope this helps. By the way, I'm doing the Coursera course too; if you'd be interested in joining forces, let me know.
