Getting all combinations of a string and its substrings - python

I've seen many questions on getting all the possible substrings (i.e., adjacent sets of characters), but none on generating all possible strings including the combinations of its substrings.
For example, let:
x = 'abc'
I would like the output to be something like:
['abc', 'ab', 'ac', 'bc', 'a', 'b', 'c']
The main point is that we can remove multiple characters that are not adjacent in the original string (as well as the adjacent ones).
Here is what I have tried so far:
def return_substrings(input_string):
    length = len(input_string)
    return [input_string[i:j + 1] for i in range(length) for j in range(i, length)]

print(return_substrings('abc'))
However, this only removes runs of adjacent characters from the original string, so it will not return the element 'ac' from the example above.
Another example is if we use the string 'abcde', the output list should contain the elements 'ace', 'bd' etc.

You can do this easily using itertools.combinations:
>>> from itertools import combinations
>>> x = 'abc'
>>> [''.join(l) for i in range(len(x)) for l in combinations(x, i+1)]
['a', 'b', 'c', 'ab', 'ac', 'bc', 'abc']
If you want the longest strings first, make the range function count down instead:
>>> [''.join(l) for i in range(len(x),0,-1) for l in combinations(x, i)]
['abc', 'ab', 'ac', 'bc', 'a', 'b', 'c']

This is a fun exercise. I think other answers may use itertools.product or itertools.combinations. But just for fun, you can also do this recursively with something like
def subs(string, ret=['']):
    if len(string) == 0:
        return ret
    head, tail = string[0], string[1:]
    ret = ret + list(map(lambda x: x+head, ret))
    return subs(tail, ret)

subs('abc')
# returns ['', 'a', 'b', 'ab', 'c', 'ac', 'bc', 'abc']

@Sunitha's answer provided the right tool to use. I will just suggest an improved version of your return_substrings method. Basically, my solution takes care of duplicates.
I will use "ABCA" to demonstrate my solution. Note that with this input the accepted answer would return a duplicate 'A' in its list.
Python 3.7+ solution:
from itertools import combinations

x = "ABCA"

def return_substrings(x):
    all_combinations = [''.join(l) for i in range(len(x)) for l in combinations(x, i+1)]
    # use list(dict.fromkeys(all_combinations)) for non-reversed ordering
    return list(reversed(list(dict.fromkeys(all_combinations))))

print(return_substrings(x))
# prints ['ABCA', 'BCA', 'ACA', 'ABA', 'ABC', 'CA', 'BA', 'BC', 'AA', 'AC', 'AB', 'C', 'B', 'A']
Python 2.7 solution:
You'll have to use OrderedDict (from the collections module) instead of a normal dict. Therefore,
return list(reversed(list(dict.fromkeys(all_combinations))))
becomes
return list(reversed(list(OrderedDict.fromkeys(all_combinations))))
Is order irrelevant to you?
You can simplify the code if order does not matter:
from itertools import combinations

x = "ABCA"

def return_substrings(x):
    all_combinations = [''.join(l) for i in range(len(x)) for l in combinations(x, i+1)]
    return list(set(all_combinations))

def return_substrings(s):
    all_sub = set()
    recent = {s}
    while recent:
        tmp = set()
        for word in recent:
            for i in range(len(word)):
                tmp.add(word[:i] + word[i + 1:])
        all_sub.update(recent)
        recent = tmp
    return all_sub

For an overkill / different version of the accepted answer (expressing combinations using itertools.product, https://docs.python.org/3/library/itertools.html#itertools.product ):
import itertools
["".join(["abc"[y[0]] for y in x if y[1]]) for x in map(enumerate, itertools.product((False, True), repeat=3))]
For a more visual interpretation, think of each result as one bitstring of length n, where bit k says whether character k is kept. Note that the all-False bitstring also produces the empty string.
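The bitstring view can also be sketched with plain integers used as masks. This is only an illustration and not taken from any of the answers; the function name subsequences_by_mask is hypothetical. Starting the masks at 1 skips the empty string.

```python
def subsequences_by_mask(s):
    """Return every non-empty subsequence of s, one per mask from 1 to 2**n - 1."""
    n = len(s)
    result = []
    for mask in range(1, 2 ** n):
        # Keep character k exactly when bit k of the mask is set.
        result.append(''.join(s[k] for k in range(n) if (mask >> k) & 1))
    return result

print(subsequences_by_mask('abc'))
# ['a', 'b', 'ab', 'c', 'ac', 'bc', 'abc']
```

A string of length n has 2**n - 1 non-empty subsequences by position, so this grows quickly with the input length.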

Related

Get all possible ordered sublists of a list

Let's say I have a list with the following letters:
lst=['A','B','C','D']
And I need to get all the possible sublists of that list that maintain the order. Thus, the result must be:
res = ['A', 'AB', 'ABC', 'ABCD', 'B', 'BC', 'BCD', 'C', 'CD', 'D']
I implemented the following for loop, but an error occurs: TypeError: can only concatenate str (not "list") to str
res = []
for x in range(len(lst)):
    for y in range(len(lst)):
        if x == y:
            res.append(x)
        if y > x:
            res.append(lst[x] + lst[y:len(lst)-1])
Is there a better and more efficient way to do this?
lst = ['A','B','C','D']

out = []
for i in range(len(lst)):
    for j in range(i, len(lst)):
        out.append(''.join(lst[i:j+1]))

print(out)
Prints:
['A', 'AB', 'ABC', 'ABCD', 'B', 'BC', 'BCD', 'C', 'CD', 'D']
Rather than nested loops with redefined inner loop bounds on each go, you can use itertools to generate the bounds for you:
from itertools import combinations

lst = ['A','B','C','D']

out = []
for s, e in combinations(range(len(lst) + 1), 2):
    out.append(''.join(lst[s:e]))
combinations conveniently produces all possible start and end indices from a single range, producing each set one at a time in your desired order. It also simplifies the code enough that the equivalent listcomp isn't too unreadable, allowing you to condense three lines of code down to one:
out = [''.join(lst[s:e]) for s, e in combinations(range(len(lst) + 1), 2)]
Either way, out ends up with the value:
['A', 'AB', 'ABC', 'ABCD', 'B', 'BC', 'BCD', 'C', 'CD', 'D']
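To see why this works, it can help to print the index pairs themselves. The following is a small illustrative sketch using a shorter list chosen only for this example; each pair (s, e) is a slice start and an exclusive slice end.

```python
from itertools import combinations

lst = ['A', 'B', 'C']  # shorter example list, chosen only for illustration

# Every pair (s, e) with s < e drawn from 0..len(lst) is one slice lst[s:e].
pairs = list(combinations(range(len(lst) + 1), 2))
print(pairs)
# [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

slices = [''.join(lst[s:e]) for s, e in pairs]
print(slices)
# ['A', 'AB', 'ABC', 'B', 'BC', 'C']
```

Because combinations never yields a pair with s >= e, empty slices cannot occur.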
This is probably the closest to what you have got and will produce the desired result:
res = []
for x in range(len(lst)):
    for y in range(len(lst)):
        if x == y:
            res.append(lst[x])
        if y > x:
            res.append(''.join(lst[x:y+1]))
The error you are describing means that you are trying to concatenate a string and a list:
lst[x]+lst[y:len(lst)-1]
lst[x] is a single-character string and lst[y:len(lst)-1] is a list of characters, and Python does not know how to add the two together. It can concatenate two strings, though, so turn the list into a string with join first.
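A minimal sketch of the failure and the fix, using made-up slice bounds rather than the ones from the question:

```python
lst = ['A', 'B', 'C', 'D']

try:
    lst[0] + lst[1:3]  # str + list is not defined
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Joining the slice first gives a str, and str + str works.
joined = lst[0] + ''.join(lst[1:3])
print(joined)
# ABC
```

The exact wording of the TypeError message depends on the Python version, so it is not reproduced here.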

Python code to solve classic P(n, r): Print all permutations of n objects taken r at a time without repetition

Python code to solve classic P(n, r)
Problem: Print all permutations of n objects taken r at a time without repetition.
I'm a Python learner looking for an elegant solution vs. trying to solve a coding problem at work.
I'm interested in seeing code to solve the classic P(n, r) permutation problem: how to print all permutations of a string taken r characters at a time, without repeated characters.
Because learning is my focus, not interested in using the Python itertools "permutations" library function. Looked at it, but couldn't understand what it was doing. Looking for actual code to solve this problem, so I can learn the implementation.
Example: if input string s == 'abcdef', and r == 4, then n == 6.
Output would be something like: abcd abce abcf abde abdf abef ...
There are a lot of closely similar questions, but I didn't find a duplicate. Most specify "r". I want to leave r as an input parameter to keep the solution general.
This approach uses recursive generator functions which I find very readable. It is the easiest to start with combinations:
def combs(s, r):
    if not r:
        yield ''
    elif s:
        first, rest = s[0], s[1:]
        for comb in combs(rest, r-1):
            yield first + comb  # use first char ...
        yield from combs(rest, r)  # ... or don't
>>> list(combs('abcd', 2))
['ab', 'ac', 'ad', 'bc', 'bd', 'cd']
>>> list(combs('abcd', 3))
['abc', 'abd', 'acd', 'bcd']
And build permutations on top of them:
def perms(s, r):
    if not r:
        yield ''
    else:
        for comb in combs(s, r):
            for i, char in enumerate(comb):
                rest = comb[:i] + comb[i+1:]
                for perm in perms(rest, r-1):
                    yield char + perm
>>> list(perms('abc', 2))
['ab', 'ba', 'ac', 'ca', 'bc', 'cb']
>>> list(perms('abcd', 2))
['ab', 'ba', 'ac', 'ca', 'ad', 'da', 'bc', 'cb', 'bd', 'db', 'cd', 'dc']
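For comparison, permutations can also be generated directly, without going through combinations first. This is an alternative sketch rather than part of the answer above, and the name perms_direct is made up for illustration. It picks each character in turn as the first one and recurses on the remaining characters, so the results come out in a different order.

```python
def perms_direct(s, r):
    """Yield every arrangement of r characters of s, using each position at most once."""
    if r == 0:
        yield ''
        return
    for i, char in enumerate(s):
        rest = s[:i] + s[i + 1:]  # everything except the chosen character
        for tail in perms_direct(rest, r - 1):
            yield char + tail

print(list(perms_direct('abc', 2)))
# ['ab', 'ac', 'ba', 'bc', 'ca', 'cb']
```

For the question's example, 'abcdef' with r == 4 gives 6 * 5 * 4 * 3 = 360 results.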

Using Recursion to make sequences of a word

I was given a homework assignment to find all possible sequences of a given word. eg. if word = 'abc', the below code would return ['a', 'ab', 'abc', 'ac', 'acb', 'b', 'ba', 'bac', 'bc', 'bca', 'c', 'ca', 'cab', 'cb', 'cba'].
However, this feels inefficient. I'm just starting to learn recursion, so I'm not sure if there is a better or more efficient way to produce these sequences?
edit:
I think it's necessary to add a couple things as I kept working and reading the material
Duplicates are fine, those are sorted out in a separate function
Each value is unique, so sequence 'aab' should produce two 'aa' sequences
def gen_all_strings(word):
    if len(word) == 1:
        return list(word)
    else:
        main_list = list()
        for idx in range(len(word)):
            cur_val = word[idx]
            rest = gen_all_strings(word[:idx] + word[idx+1:])
            main_list.append(cur_val)
            for seq in rest:
                main_list.append(cur_val + seq)
        return main_list
Itertools and list comprehensions are good for breaking stuff down like this.
import itertools
["".join(x) for y in range(1, len(word) + 1) for x in itertools.permutations(word, y)]

How to generate subpeptides (special combinations) from a string representing a cyclic peptide?

Here is my problem: I have a sequence representing a cyclic peptide and I'm trying to create a function that generates all possible subpeptides. A subpeptide is created when the bonds between 2 amino acids are broken. For example, for the peptide 'ABCD', its subpeptides would be 'A', 'B', 'C', 'D', 'AB', 'BC', 'CD', 'DA', 'ABC', 'BCD', 'CDA', 'DAB'. Thus, the number of possible subpeptides from a peptide of length n will always be n*(n-1). Note that not all of them are substrings of the peptide ('DA', 'CDA'...).
I've written code that generates combinations. However, it produces some excess elements, such as amino acids that are not linked ('AC', 'BD'...). Does anyone have a hint on how I could eliminate those, given that the peptide may have a different length each time the function is called? Here's what I have so far:
def Subpeptides(peptide):
    subpeptides = []
    from itertools import combinations
    for n in range(1, len(peptide)):
        subpeptides.extend(
            [''.join(comb) for comb in combinations(peptide, n)]
        )
    return subpeptides
Here are the results for peptide 'ABCD':
['A', 'B', 'C', 'D', 'AB', 'AC', 'AD', 'BC', 'BD', 'CD', 'ABC', 'ABD', 'ACD', 'BCD']
The order of the amino acids is not important, as long as they represent a real sequence of the peptide. For example, 'ABD' is a valid form of 'DAB', since D and A have a bond in the cyclic peptide.
I'm using Python.
It's probably easier to just generate them all:
def subpeptides(peptide):
    l = len(peptide)
    looped = peptide + peptide
    for start in range(0, l):
        for length in range(1, l):
            print(looped[start:start+length])
which gives:
>>> subpeptides("ABCD")
A
AB
ABC
B
BC
BCD
C
CD
CDA
D
DA
DAB
(If you want a list instead of printing, just change print(...) to yield ... and you have a generator.)
All the above does is enumerate the different places where the first bond could be broken, and then the different products you would get if the next bond broke after one, two, or three (in this case) acids. looped is just an easy way to avoid writing the logic for going "round the loop".
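As a sketch of the generator variant mentioned above (swapping print for yield), assuming the same doubling trick; the name iter_subpeptides is hypothetical:

```python
def iter_subpeptides(peptide):
    """Yield each subpeptide of a cyclic peptide, excluding the full peptide."""
    n = len(peptide)
    looped = peptide + peptide  # doubling lets slices wrap around the cycle
    for start in range(n):
        for length in range(1, n):
            yield looped[start:start + length]

print(list(iter_subpeptides("ABCD")))
# ['A', 'AB', 'ABC', 'B', 'BC', 'BCD', 'C', 'CD', 'CDA', 'D', 'DA', 'DAB']
```

This yields n*(n-1) items, which matches the count stated in the question.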
The last term (the full peptide itself) is missing from the answer above. You can use the code below:
def subpeptides(peptide):
    l = len(peptide)
    ls = []
    looped = peptide + peptide
    for start in range(0, l):
        for length in range(1, l):
            ls.append(looped[start:start+length])
    ls.append(peptide)
    return ls
You can use this one:
>>> aa = 'ABCD'
>>> F = []
>>> B = []
>>> for j in range(1, len(aa)+1, 1):
...     for i in range(0, len(aa), 1):
...         A = str.split(((aa*j)[i:i+j]))
...         B = B + A
...
>>> C = (B[0:len(aa)*len(aa)-len(aa)+1])
It gives you:
C=['A', 'B', 'C', 'D', 'AB', 'BC', 'CD', 'DA', 'ABC', 'BCD', 'CDA', 'DAB', 'ABCD']
I hope this helps. By the way, I'm doing the Coursera course too; if you're interested in joining forces, let me know.

Filtering a list of strings based on contents

Given the list ['a','ab','abc','bac'], I want to compute a list with strings that have 'ab' in them. I.e. the result is ['ab','abc']. How can this be done in Python?
This simple filtering can be achieved in many ways with Python. The best approach is to use "list comprehensions" as follows:
>>> lst = ['a', 'ab', 'abc', 'bac']
>>> [k for k in lst if 'ab' in k]
['ab', 'abc']
Another way is to use the filter function. In Python 2:
>>> filter(lambda k: 'ab' in k, lst)
['ab', 'abc']
In Python 3, it returns an iterator instead of a list, but you can cast it:
>>> list(filter(lambda k: 'ab' in k, lst))
['ab', 'abc']
Though it's better practice to use a comprehension.
[x for x in L if 'ab' in x]
# To support matches from the beginning, not any matches:
items = ['a', 'ab', 'abc', 'bac']
prefix = 'ab'
filter(lambda x: x.startswith(prefix), items)
Tried this out quickly in the interactive shell:
>>> l = ['a', 'ab', 'abc', 'bac']
>>> [x for x in l if 'ab' in x]
['ab', 'abc']
>>>
Why does this work? Because the in operator is defined for strings to mean: "is substring of".
Also, you might want to consider writing out the loop as opposed to using the list comprehension syntax used above:
l = ['a', 'ab', 'abc', 'bac']
result = []
for s in l:
    if 'ab' in s:
        result.append(s)
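One thing worth keeping in mind is that in means something different for a string than for a list. The following small sketch, with a made-up variable name items, shows the difference: on a string it tests for a substring, on a list it tests whether some element is equal to the value.

```python
items = ['a', 'ab', 'abc', 'bac']

print('bc' in 'abc')   # True: substring test on a string
print('bc' in items)   # False: no list element equals 'bc'
print('ab' in items)   # True: 'ab' is itself an element

# Filtering therefore has to apply the substring test to each element.
matches = [s for s in items if 'bc' in s]
print(matches)
# ['abc']
```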
mylist = ['a', 'ab', 'abc']
assert 'ab' in mylist
