I am trying to create all subsets of a given string recursively.
Given the string 'aab', we generate all subsets, treating the characters as distinct.
The answer is: ["", "b", "a", "ab", "ba", "a", "ab", "ba", "aa", "aa", "aab", "aab", "aba", "aba", "baa", "baa"].
I have been looking at several solutions such as this one
but I am trying to make the function accept a single argument (only the string) and work with that, and I can't figure out how.
I have also been looking at this solution to a similar problem, but since it deals with lists rather than strings, I have trouble adapting it to accept and generate strings.
Here is my code. In this example I can't connect the str to the list, hence my question.
I edited the input and the output.
def gen_all_strings(word):
    if len(word) == 0:
        return ''  # a str, while the line below builds a list
    rest = gen_all_strings(word[1:])
    return rest + [word[0] + dummy for dummy in rest]
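For comparison, here is a minimal sketch (not taken from the answers below) of a recursive function that accepts only the string and returns the list described above. The name all_arrangements is hypothetical. Every position is treated as distinct, so repeated characters give repeated results, just like the expected answer.

```python
def all_arrangements(word):
    """Return every ordered selection of characters from word.

    Positions are treated as distinct, so repeated characters
    produce repeated results (e.g. 'a' appears twice for 'aab').
    """
    if not word:
        return ['']                       # only the empty string
    result = ['']                         # selecting nothing is always allowed
    for i, ch in enumerate(word):
        rest = word[:i] + word[i + 1:]    # remove the chosen position
        for tail in all_arrangements(rest):
            result.append(ch + tail)      # ch first, then any arrangement of the rest
    return result

print(sorted(all_arrangements('aab')))
```

The recursion only ever receives a string and only ever returns a list of strings, which avoids mixing str and list in the return value.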
from itertools import product
def recursive_product(s, r=None, i=0):
    if r is None:
        r = []
    if i > len(s):
        return r
    for c in product(s, repeat=i):
        r.append("".join(c))
    return recursive_product(s, r, i + 1)
print(recursive_product('ab'))
print(recursive_product('abc'))
Output:
['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
['', 'a', 'b', 'c', 'aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc', 'aaa', 'aab', 'aac', 'aba', 'abb', 'abc', 'aca', 'acb', 'acc', 'baa', 'bab', 'bac', 'bba', 'bbb', 'bbc', 'bca', 'bcb', 'bcc', 'caa', 'cab', 'cac', 'cba', 'cbb', 'cbc', 'cca', 'ccb', 'ccc']
To be honest, using recursion feels really forced in this case. A much simpler version with the same results:
def nonrecursive_product(s):
    return [''.join(c) for i in range(len(s) + 1) for c in product(s, repeat=i)]
This is the powerset of the set of characters in the string.
from itertools import chain, combinations
s = set('ab')  # split the string into a set of characters (repeated characters are dropped)
# combinations gives the elements of the powerset of a given length r
# from_iterable chains all of these into a single iterable
# which is converted here to a list
print(list(chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))))
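Because combinations yields tuples and a set has no guaranteed iteration order, a small variation that returns strings in a predictable order might look like the following sketch (powerset_strings is a hypothetical name). As in the snippet above, set() collapses repeated characters, so 'aab' is treated like 'ab'.

```python
from itertools import chain, combinations

def powerset_strings(s):
    # set() drops duplicate characters; sorting fixes the otherwise arbitrary set order
    chars = sorted(set(s))
    subsets = chain.from_iterable(combinations(chars, r) for r in range(len(chars) + 1))
    return [''.join(t) for t in subsets]   # join each tuple back into a string

print(powerset_strings('ab'))   # ['', 'a', 'b', 'ab']
```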
import itertools as it
s='aab'
subsets = sorted(map("".join, it.chain.from_iterable(it.permutations(s, r) for r in range(len(s) + 1))))
print(subsets)
# ['', 'a', 'a', 'aa', 'aa', 'aab', 'aab', 'ab', 'ab', 'aba', 'aba', 'b', 'ba', 'ba', 'baa', 'baa']
I have a list of strings like the following:
mylist = ['a', 'b', 'c', 'aa', 'bb', 'cc', 'aaa', 'bbb', 'ccc', 'aaaa', 'bbbb', 'cccc']
And I need to extract only the strings with k=4 characters, so the output would be:
minlist = ['aaaa', 'bbbb', 'cccc']
How can this be implemented efficiently?
This is exactly the type of situation the filter function is intended for:
>>> mylist = ['a', 'b', 'c', 'aa', 'bb', 'cc', 'aaa', 'bbb', 'ccc', 'aaaa', 'bbbb', 'cccc']
>>> minlist = list(filter(lambda i: len(i) == 4, mylist))
>>> minlist
['aaaa', 'bbbb', 'cccc']
filter takes two arguments: the first is a function, and the second is an iterable. The function is applied to each element of the iterable; if it returns True, the element is kept, and if it returns False, the element is excluded. filter returns the elements that pass this test.
As a side note, filter returns a filter object, which is an iterator rather than a list (which is why the explicit list call is included). So if you're simply iterating over the values, you don't need to convert it to a list, and skipping the conversion is more efficient.
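To illustrate the point about iterators, here is a sketch (the helper name words_of_length is hypothetical) that loops over the filter object directly without converting it to a list:

```python
mylist = ['a', 'b', 'c', 'aa', 'bb', 'cc', 'aaa', 'bbb', 'ccc', 'aaaa', 'bbbb', 'cccc']

def words_of_length(words, k):
    # filter() returns a lazy iterator; nothing is checked until it is consumed
    return filter(lambda w: len(w) == k, words)

# iterate directly: no intermediate list is built
for word in words_of_length(mylist, 4):
    print(word)
```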
Try this:
def get_minlist(my_list, k):
    return [item for item in my_list if len(item) == k]
You can use this as:
print(get_minlist(["abc", "ab", "a"], 2))
Result:
['ab']
The code is Pythonic, fast, and very easy to understand: it goes through the items in the list, checks whether each one has length k, and keeps it if so.
You can check the length of a string using len().
mylist = ['a', 'b', 'c', 'aa', 'bb', 'cc', 'aaa', 'bbb', 'ccc', 'aaaa', 'bbbb', 'cccc']
minlist = [x for x in mylist if len(x) == 4]
Result:
['aaaa', 'bbbb', 'cccc']
Try this:
mylist = ['a', 'b', 'c', 'aa', 'bb', 'cc', 'aaa', 'bbb', 'ccc', 'aaaa', 'bbbb', 'cccc']
minilist = []
for i in range(len(mylist)):
    if len(mylist[i]) == 4:
        minilist.append(mylist[i])
print(minilist)
Like I said in the comment, you could try something like this:
mylist = ['a', 'b', 'c', 'aa', 'bb', 'cc', 'aaa', 'bbb', 'ccc', 'aaaa', 'bbbb', 'cccc']
newlst = []
for item in mylist:
    if len(item) == 4:
        newlst.append(item)
print(newlst)
mylist = ['a', 'b', 'c', 'aa', 'bb', 'cc', 'aaa', 'bbb', 'ccc', 'aaaa', 'bbbb', 'cccc']
Here we use a list comprehension, which is a concise way to create a list based on an iterable.
Note: an iterable is anything that can be looped over.
While the list comprehension is being built, elements from the iterable (e.g. mylist) can be conditionally included in the new list and transformed as needed.
Syntax of a list comprehension:
Note: the '|' symbol below just separates the syntax into three parts; the first two parts are mandatory and the last part is optional.
[give me this | from the collection | with this condition]
[mandatory    | mandatory           | optional          ]
[var          | for var in iterable | if condition      ]
filtered_list = [item for item in mylist if len(item) == 4]
print(filtered_list)
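As a small illustration of both the transform part and the optional condition (the uppercase transform is just an arbitrary example):

```python
mylist = ['a', 'b', 'c', 'aa', 'bb', 'cc', 'aaa', 'bbb', 'ccc', 'aaaa', 'bbbb', 'cccc']

# "give me this" transforms each item; the "if" part filters
upper_four = [item.upper() for item in mylist if len(item) == 4]
print(upper_four)   # ['AAAA', 'BBBB', 'CCCC']

# the condition is optional: without it every item is kept (and transformed)
lengths = [len(item) for item in mylist]
print(lengths)
```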
How can I use recursion to generate all DNA sequences of length n in a function?
For instance if the function is given 2, it returns ['AA', 'AC', 'AT', 'AG', 'CA', 'CC', 'CT', 'CG', 'TA', 'TC', 'TT', 'TG', 'GA', 'GC', 'GT', 'GG']
etc...
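itertools.permutations gives all orderings of a given iterable; the second argument, r, is the length of the tuples returned. Note, however, that permutations never reuses an element, so it yields 'AC' but not 'AA'; to include repeated bases as in the example above, use itertools.product('ACGT', repeat=length) instead.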
itertools.permutations('ACGT', length)
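The difference between the two functions is easy to check. This short sketch compares them on the DNA alphabet for length 2:

```python
from itertools import permutations, product

perms = [''.join(p) for p in permutations('ACGT', 2)]
prods = [''.join(p) for p in product('ACGT', repeat=2)]

print(len(perms))                     # 12: no element is reused, so 'AA' is missing
print(len(prods))                     # 16: every position may be any base
print('AA' in perms, 'AA' in prods)   # False True
```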
Here is one way:
def all_seq(n, curr, e, ways):
    """All possible sequences of size n given elements e.
    ARGS
        n: size of sequence
        curr: a list used for constructing sequences
        e: the list of possible elements (could have been a global list instead)
        ways: the final list of sequences
    """
    if len(curr) == n:
        ways.append(''.join(curr))
        return
    for element in e:
        all_seq(n, list(curr) + [element], e, ways)
perms = []
all_seq(2, [], ['A', 'C', 'T', 'G'], perms)
print(perms)
The output:
['AA', 'AC', 'AT', 'AG', 'CA', 'CC', 'CT', 'CG', 'TA', 'TC', 'TT', 'TG', 'GA', 'GC', 'GT', 'GG']
You actually want itertools.product('ACGT', repeat=n). Note that this grows enormously fast (4^n strings of length n).
If your assignment is to do it recursively, consider how you would get all (n+1)-length options that start with an n-length prefix. The naive recursive option might be rather slow compared to itertools if you need to use it in anger.
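Following that hint, one possible recursive sketch builds each length-n sequence from the length-(n-1) ones. The function name is hypothetical, and the default alphabet order 'ACTG' is an assumption chosen to reproduce the order shown in the question.

```python
def dna_sequences(n, alphabet='ACTG'):
    """All strings of length n over alphabet, built recursively.

    Each length-n string is a length-(n-1) prefix plus one more base.
    """
    if n == 0:
        return ['']                       # one sequence of length 0: the empty string
    shorter = dna_sequences(n - 1, alphabet)
    return [prefix + base for prefix in shorter for base in alphabet]

print(dna_sequences(2))
```

Because the prefixes form the outer loop, the last character changes fastest, which is the same order itertools.product uses.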
For example, given alphabet = 'abcd', how can I get this output in Python:
a
aa
b
bb
ab
ba
(...)
iteration by iteration.
I already tried the powerset() function found here on Stack Overflow,
but it doesn't repeat letters within the same string.
Also, if I want to set a minimum and maximum length for the strings, how can I do that?
For example min=3 and max=4, abc, aaa, aba, ..., aaaa, abca, abcb, ...
You can use combinations_with_replacement from itertools (docs). The function combinations_with_replacement takes an iterable object as its first argument (e.g. your alphabet) and the desired length of the combinations to generate. Since you want strings of different lengths, you can loop over each desired length.
For example:
from itertools import combinations_with_replacement
def get_all_poss_strings(alphabet, min_length, max_length):
    poss_strings = []
    for r in range(min_length, max_length + 1):
        poss_strings += combinations_with_replacement(alphabet, r)
    return ["".join(s) for s in poss_strings]  # combinations_with_replacement returns tuples, so join them into individual strings
Sample:
alphabet = "abcd"
min_length = 3
max_length = 4
get_all_poss_strings(alphabet, min_length, max_length)
Output:
['aaa', 'aab', 'aac', 'aad', 'abb', 'abc', 'abd', 'acc', 'acd', 'add', 'bbb', 'bbc', 'bbd', 'bcc', 'bcd', 'bdd', 'ccc', 'ccd', 'cdd', 'ddd', 'aaaa', 'aaab', 'aaac', 'aaad', 'aabb', 'aabc', 'aabd', 'aacc', 'aacd', 'aadd', 'abbb', 'abbc', 'abbd', 'abcc', 'abcd', 'abdd', 'accc', 'accd', 'acdd', 'addd', 'bbbb', 'bbbc', 'bbbd', 'bbcc', 'bbcd', 'bbdd', 'bccc', 'bccd', 'bcdd', 'bddd', 'cccc', 'cccd', 'ccdd', 'cddd', 'dddd']
Edit:
If order also matters for your strings (as indicated by having "ab" and "ba"), you can use the following function to get all permutations of all lengths in a given range:
from itertools import combinations_with_replacement, permutations
def get_all_poss_strings(alphabet, min_length, max_length):
    poss_strings = []
    for r in range(min_length, max_length + 1):
        combos = combinations_with_replacement(alphabet, r)
        perms_of_combos = []
        for combo in combos:
            perms_of_combos += permutations(combo)
        poss_strings += perms_of_combos
    # combos with repeated letters produce duplicate permutations, so dedupe with set();
    # sorted() gives a stable order (a bare set has no guaranteed order)
    return sorted(set("".join(s) for s in poss_strings))
Sample:
alphabet = "abcd"
min_length = 1
max_length = 2
get_all_poss_strings(alphabet, min_length, max_length)
Output:
['a', 'aa', 'ab', 'ac', 'ad', 'b', 'ba', 'bb', 'bc', 'bd', 'c', 'ca', 'cb', 'cc', 'cd', 'd', 'da', 'db', 'dc', 'dd']
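As an alternative, here is a sketch that uses itertools.product instead of the permutations-of-combinations approach (the function name is hypothetical). product already yields every ordered tuple with repeats allowed, so no deduplication step is needed:

```python
from itertools import product

def get_all_ordered_strings(alphabet, min_length, max_length):
    # product(..., repeat=r) yields every ordered r-length tuple, repeats allowed,
    # so no permutation or deduplication step is needed
    return [''.join(p)
            for r in range(min_length, max_length + 1)
            for p in product(alphabet, repeat=r)]

print(get_all_ordered_strings("abcd", 1, 2))
```

The strings come out grouped by length rather than alphabetically; wrap the result in sorted() if you need the order shown above.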
You can use the product function of itertools with varying lengths. The result differs in order from the example you give, but this may be what you want. This results in a generator that you can use to get all your desired strings. This code lets you set a minimum and a maximum length of the returned strings. If you do not specify a value for parameter maxlen then the generator is infinite. Be sure you have a way to stop it or you will get an infinite loop.
import itertools
def allcombinations(alphabet, minlen=1, maxlen=None):
    thislen = minlen
    while maxlen is None or thislen <= maxlen:
        for prod in itertools.product(alphabet, repeat=thislen):
            yield ''.join(prod)
        thislen += 1
for c in allcombinations('abcd', minlen=1, maxlen=2):
    print(c)
This example prints output similar to your first example, though in a different order:
a
b
c
d
aa
ab
ac
ad
ba
bb
bc
bd
ca
cb
cc
cd
da
db
dc
dd
If you really want a full list, just use
list(allcombinations('abcd', minlen=1, maxlen=2))
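When maxlen is omitted, the generator above never stops, and itertools.islice is one standard way to take a bounded number of items from it. A compact sketch of that idea (all_strings is a hypothetical name; it uses itertools.count in place of the manual while loop):

```python
from itertools import count, islice, product

def all_strings(alphabet, minlen=1):
    """Endless stream of strings over alphabet, shortest first (hypothetical helper)."""
    for length in count(minlen):                  # minlen, minlen + 1, ... forever
        for combo in product(alphabet, repeat=length):
            yield ''.join(combo)

# islice stops the infinite generator after a fixed number of items
print(list(islice(all_strings('abcd'), 10)))
```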
The normal kind of permutation is:
'ABC'
↓
'ACB'
'BAC'
'BCA'
'CAB'
'CBA'
But, what if I want to do this:
'ABC'
↓
'AA'
'AB'
'AC'
'BA'
'BB'
'BC'
'CA'
'CB'
'CC'
What is this called, and how efficient would this be with arrays with hundreds of elements?
Your terminology is a bit confusing: what you have are not permutations of your characters, but rather the pairing of every possible character with every possible character: a Cartesian product.
You can use itertools.product to generate these combinations, but note that this returns an iterator rather than a container. So if you need all the combinations in a list, you need to construct a list explicitly:
from itertools import product
mystr = 'ABC'
prodlen = 2
products = list(product(mystr,repeat=prodlen))
Or, if you're only looping over these values:
for char1, char2 in product(mystr, repeat=prodlen):
    # do something with your characters
    ...
Or, if you want to generate the 2-length strings, you can do this in a list comprehension:
allpairs = [''.join(pairs) for pairs in products]
# ['AA', 'AB', 'AC', 'BA', 'BB', 'BC', 'CA', 'CB', 'CC']
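On the efficiency part of the question: the output size is len(input) ** r, so it grows quickly, but because product is lazy, memory stays small as long as you iterate instead of building a list. A sketch illustrating both points (product_size is a hypothetical helper and the 300-element input is made up):

```python
from itertools import islice, product

def product_size(n_items, r):
    # product() over n_items elements with repeat=r yields n_items ** r tuples
    return n_items ** r

items = list(range(300))             # hypothetical input with hundreds of elements
print(product_size(len(items), 2))   # 90000
print(product_size(len(items), 3))   # 27000000

# product() is lazy: only the tuples actually consumed are ever created
first_three = list(islice(product(items, repeat=2), 3))
print(first_three)                   # [(0, 0), (0, 1), (0, 2)]
```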
Nothing against itertools, but if you want a little insight into how to generate these permutations manually, you can apply modulo arithmetic to an incrementing sequence number. This should work with a string of any length and any n >= 1.
The number of permutations generated is len(s) ** n.
For example, just call printPermutations("abc", 2).
def printPermutations(s, n):
    if (not s) or (n < 1):
        return
    maxpermutations = len(s) ** n
    for p in range(maxpermutations):
        perm = getSpecificPermutation(s, n, p)
        print(perm)
def getSpecificPermutation(s, n, p):
    # s is the source string
    # n is the number of characters to extract
    # p is the permutation sequence number
    result = ''
    for j in range(n):
        result = s[p % len(s)] + result
        p = p // len(s)
    return result
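The sequence number p is simply written in base len(s), with each digit picking a character. A quick sketch (specific_permutation is a hypothetical copy of getSpecificPermutation above) checks that counting p upward reproduces the order of itertools.product:

```python
from itertools import product

def specific_permutation(s, n, p):
    # same digit-extraction idea as getSpecificPermutation: least significant digit last
    result = ''
    for _ in range(n):
        result = s[p % len(s)] + result
        p //= len(s)
    return result

s, n = "abc", 2
by_counting = [specific_permutation(s, n, p) for p in range(len(s) ** n)]
by_product = [''.join(t) for t in product(s, repeat=n)]
print(by_counting == by_product)   # True
```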
You'll want to use the itertools solution. But I know what it's called...
Most people call it counting. You're being sneaky about it, but I think it's just counting in base len(set), where set is your input set (I'm assuming it is truly a set, with no repeated elements). Imagine that in your example A -> 0, B -> 1, C -> 2. You're also asking for elements with a certain maximum number of digits. Let me show you:
from functools import reduce
def numberToBase(n, b):
    if n == 0:
        return [0]
    digits = []
    while n:
        digits.append(int(n % b))
        n //= b  # integer division; plain / would produce floats in Python 3
    return digits[::-1]
def count_me(set, max_digits=2):
    # Just count! From 0 to len(set) ** max_digits to be precise
    numbers = [i for i in range(len(set) ** max_digits)]
    # Convert to base len(set)
    lists_of_digits_in_base_b = [numberToBase(i, len(set)) for i in numbers]
    # Add 0s to the front (making each list of digits max_digits in length)
    prepended_with_zeros = []
    for li in lists_of_digits_in_base_b:
        prepended_with_zeros.append([0] * (max_digits - len(li)) + li)
    # Map each digit to an item in our set
    m = {index: item for index, item in enumerate(set)}
    temp = map(lambda x: [m[digit] for digit in x], prepended_with_zeros)
    # Concatenate each item (a list, not a lazy map object, so it prints as a list)
    concat_strings = [reduce(lambda x, y: x + y, a, "") for a in temp]
    return concat_strings
Here are some outputs:
print(count_me("ABC", 2))
outputs:
['AA', 'AB', 'AC', 'BA', 'BB', 'BC', 'CA', 'CB', 'CC']
and
print(count_me("ABCD", 2))
outputs:
['AA', 'AB', 'AC', 'AD', 'BA', 'BB', 'BC', 'BD', 'CA', 'CB', 'CC', 'CD', 'DA', 'DB', 'DC', 'DD']
and
print(count_me("ABCD", 3))
outputs (a big one):
['AAA', 'AAB', 'AAC', 'AAD', 'ABA', 'ABB', 'ABC', 'ABD', 'ACA', 'ACB', 'ACC', 'ACD', 'ADA', 'ADB', 'ADC', 'ADD', 'BAA', 'BAB', 'BAC', 'BAD', 'BBA', 'BBB', 'BBC', 'BBD', 'BCA', 'BCB', 'BCC', 'BCD', 'BDA', 'BDB', 'BDC', 'BDD', 'CAA', 'CAB', 'CAC', 'CAD', 'CBA', 'CBB', 'CBC', 'CBD', 'CCA', 'CCB', 'CCC', 'CCD', 'CDA', 'CDB', 'CDC', 'CDD', 'DAA', 'DAB', 'DAC', 'DAD', 'DBA', 'DBB', 'DBC', 'DBD', 'DCA', 'DCB', 'DCC', 'DCD', 'DDA', 'DDB', 'DDC', 'DDD']
P.S. numberToBase courtesy of this post
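The zero-padding step can be folded into the base conversion itself. A minimal sketch of the same counting idea using divmod (to_fixed_base and count_strings are hypothetical names):

```python
def to_fixed_base(n, base, width):
    """Digits of n in the given base, most significant first, zero-padded to width."""
    digits = [0] * width
    for i in range(width - 1, -1, -1):   # fill from the least significant end
        n, digits[i] = divmod(n, base)
    return digits

def count_strings(symbols, width):
    # count from 0 to len(symbols) ** width - 1 and map each digit to a symbol
    b = len(symbols)
    return [''.join(symbols[d] for d in to_fixed_base(i, b, width))
            for i in range(b ** width)]

print(count_strings("ABC", 2))
# ['AA', 'AB', 'AC', 'BA', 'BB', 'BC', 'CA', 'CB', 'CC']
```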
As Andras Deak says, use itertools.product:
import itertools
for i, j in itertools.product('ABC', repeat=2):
    print(i + j)
My goal is to generate all possible strings (letters and numbers) of length x and run a block of code for each one (like an iterator). The only problem is that the functions in itertools don't repeat a letter within the same string. For example:
I get "ABC" "BAC" "CAB" etc. instead of "AAA".
Any suggestions?
Use itertools.product():
>>> import itertools
>>> list(map(''.join, itertools.product('ABC', repeat=3)))
['AAA', 'AAB', 'AAC', 'ABA', 'ABB', 'ABC', 'ACA', 'ACB', 'ACC', 'BAA', 'BAB', 'BAC', 'BBA', 'BBB', 'BBC', 'BCA', 'BCB', 'BCC', 'CAA', 'CAB', 'CAC', 'CBA', 'CBB', 'CBC', 'CCA', 'CCB', 'CCC']
Note that creating a list containing all combinations is very inefficient for longer strings; iterate over them instead:
for s in map(''.join, itertools.product('ABC', repeat=3)):
    print(s)
To get all letters and digits, use string.ascii_uppercase + string.ascii_lowercase + string.digits.
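Since the question asks to run a block of code for each string, one way to package that is to pass the block in as a function. This is a sketch; for_each_string and its parameters are hypothetical:

```python
import itertools
import string

def for_each_string(length, action, alphabet=string.ascii_letters + string.digits):
    """Call action(s) once for every string of the given length (hypothetical helper).

    Strings are produced lazily, so nothing is stored between calls.
    """
    for combo in itertools.product(alphabet, repeat=length):
        action(''.join(combo))

# usage: collect the results for a tiny alphabet
seen = []
for_each_string(2, seen.append, alphabet='AB')
print(seen)   # ['AA', 'AB', 'BA', 'BB']
```

With the default 62-character alphabet there are 62 ** length strings, so even modest lengths take a long time to walk through.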
Use itertools.product() if you want letters to repeat:
>>> from itertools import product
>>> from string import ascii_uppercase
>>> for combo in product(ascii_uppercase, repeat=3):
...     print(''.join(combo))
...
AAA
AAB
...
ZZY
ZZZ
itertools.combinations() and itertools.permutations() are not the correct tools for your job.