I need limit for particular characters in Python itertools

How can I get all combinations with limits on particular characters? For now I only have one limit that applies to all characters, but I want the character "Q" to appear 4 times in every combination. Is that possible with my code?
I use itertools.combinations_with_replacement:
from itertools import combinations_with_replacement
import collections
def combine(arr, s):
    return [x for x in combinations_with_replacement(symbols, s) if max(collections.Counter(x).values()) <= 3]
symbols = "LNhkPepm3684th"
max_length = 10
set = 10
print(combine(symbols, set))

I notice that your symbols collection contains the letter "h" twice. I'm not sure whether your "must appear 0 or 1 or 2 times, but no more" restriction applies individually to each h, or whether it applies to all "h"es collectively. In other words, is "LLLLLNNNNhh3684hh" a legal result? The "first h" appears twice, and the "second h" appears twice, and so there are four instances of "h" total.
Here's an approach that works if all symbols are individually restricted and "LLLLLNNNNhh3684hh" is a legal result. It works on the principle that any combination of a sequence can be uniquely represented as a list of numbers indicating how many times the element at that index appears in the combination.
def restricted_sum(n, s, restrictions):
    """
    Restricted sum problem. Find each tuple that sums up to a certain number, and obeys restrictions regarding its size and contents.
    input:
        n -- an integer. Indicates the length of the result.
        s -- an integer. Indicates the sum of the result.
        restrictions -- a sequence of tuples. Indicates the minimum and maximum of each corresponding element in the result.
    yields:
        result -- A tuple of non-negative integers, satisfying the requirements:
            len(result) == n
            sum(result) == s
            for i in range(len(result)):
                restrictions[i][0] <= result[i] <= restrictions[i][1]
    """
    if n == 0:
        if s == 0:
            yield ()
            return
        else:
            return
    else:
        if sum(t[0] for t in restrictions) > s: return
        if sum(t[1] for t in restrictions) < s: return
        l,r = restrictions[0]
        for amt in range(l, r+1):
            for rest in restricted_sum(n-1, s-amt, restrictions[1:]):
                yield (amt,) + rest

def combine(characters, size, character_restrictions):
    # a character that appears twice in `characters` is restricted once per position
    n = len(characters)
    s = size
    restrictions = tuple(character_restrictions[c] for c in characters)
    for seq in restricted_sum(n, s, restrictions):
        yield "".join(c*i for i,c in zip(seq, characters))

symbols = "LNhkPepm3684th"
character_restrictions = {}
#most symbols can appear 0-2 times
for c in symbols:
    character_restrictions[c] = (0,2)
#these characters must appear an exact number of times
limits = {"L":5, "N": 4}
for c, amt in limits.items():
    character_restrictions[c] = (amt, amt)

for result in combine(symbols, 17, character_restrictions):
    print(result)
Result:
LLLLLNNNN8844tthh
LLLLLNNNN6844tthh
LLLLLNNNN6884tthh
LLLLLNNNN68844thh
LLLLLNNNN68844tth
... 23,462 more values go here...
LLLLLNNNNhh3684hh
... 4,847 more values go here...
LLLLLNNNNhhkkPPe6
LLLLLNNNNhhkkPPe3
LLLLLNNNNhhkkPPem
LLLLLNNNNhhkkPPep
LLLLLNNNNhhkkPPee

Add a dictionary that specifies the limit for each character, and use that instead of 3 in your condition. You can use .get() with a default value so you don't have to specify all the limits. Note that the limit has to be looked up per character, not per combination.
limits = {'Q': 4, 'A': 2}
def combine(arr, s):
    return [x for x in combinations_with_replacement(arr, s) if all(n <= limits.get(c, 3) for c, n in collections.Counter(x).items())]
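The per-character lookup described above can be sketched as a small generator. This is only an illustration: the name combine_with_limits and its default_limit parameter are made up here, and dropping duplicate symbols (such as the second "h") is an assumption about what the asker wants.

```python
from collections import Counter
from itertools import combinations_with_replacement

def combine_with_limits(symbols, size, limits, default_limit=3):
    """Yield combinations in which no character exceeds its own limit."""
    # Assumption: duplicate symbols are collapsed; dict.fromkeys keeps first-seen order.
    unique_symbols = "".join(dict.fromkeys(symbols))
    for combo in combinations_with_replacement(unique_symbols, size):
        counts = Counter(combo)
        # Every character is compared against its own limit, not a shared one.
        if all(n <= limits.get(ch, default_limit) for ch, n in counts.items()):
            yield "".join(combo)

# Small alphabet so the output stays readable.
for combo in combine_with_limits("QAB", 5, {"Q": 4, "A": 2}, default_limit=1):
    print(combo)
```

To require that "Q" appears exactly 4 times rather than at most 4 times, one option is to add a second condition such as counts["Q"] == 4 to the filter.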

Related

How can I count sequences that meet these constraints?

I am trying to count permutations of a sequence of I and O symbols, representing e.g. people entering (I for "in") and leaving (O for "out") a room. For a given n many I symbols, there should be exactly as many O symbols, giving a total length of 2*n for the sequence. Also, at any point in a valid permutation, the number of O symbols must be less than or equal to the number of I symbols (since it is not possible for someone to leave the room when it is empty).
Additionally, I have some initial prefix of I and O symbols, representing people who previously entered or left the room. The output should only count sequences starting with that prefix.
For example, for n=1 and an initial state of '', the result should be 1 since the only valid sequence is IO; for n=3 and an initial state of II, the possible permutations are
IIIOOO
IIOIOO
IIOOIO
for a result of 3. (There are five ways for three people to enter and leave the room, but the other two involve the first person leaving immediately.)
I'm guessing the simplest way to solve this is using itertools.permutations. This is my code so far:
import sys
from itertools import permutations

n=int(input()) ##actual length will be 2*n
string=input()
I_COUNT=string.count("I")
O_COUNT=string.count("O")
if string[0]!="I":
    sys.exit()
if O_COUNT>I_COUNT:
    sys.exit()
perms = [''.join(p) for p in permutations(string)]
print(perms)
The goal is to generate the permutations of whatever remains of the sequence and append them to the user's input. How can I append the remaining symbols to the user's input and count the valid permutations?

from functools import cache

@cache
def count_permutations(ins: int, outs: int):
    # ins and outs are the remaining number of ins and outs to process
    assert outs >= ins
    if ins == 0:
        # Can do nothing but output "outs"
        return 1
    elif outs == ins:
        # Your next output needs to be an I else you become unbalanced
        return count_permutations(ins - 1, outs)
    else:
        # Your next output can either be an I or an O
        return count_permutations(ins - 1, outs) + count_permutations(ins, outs - 1)
If, say you have a total of 5 Is and 5 Os, and you've already output one I, then you want: count_permutations(4, 5).

I'm guessing the simplest way to solve this is using itertools.permutations
Sadly, this will not be very helpful. The problem is that itertools.permutations does not care about the value of the elements it's permuting; it treats them as all distinct regardless. So if you have 6 input elements, and ask for length-6 permutations, you will get 720 results, even if all the inputs are the same.
itertools.combinations has the opposite issue; it doesn't distinguish any elements. When it selects some elements, it only puts those elements in the order they initially appeared. So if you have 6 input elements and ask for length-6 combinations, you will get 1 result - the original sequence.
Presumably what you wanted to do is generate all the distinct ways of arranging the Is and Os, then take out the invalid ones, then count what remains. This is possible, and the itertools library can help with the first step, but it is not straightforward.
It will be simpler to use a recursive algorithm directly. The general approach is as follows:
At any given time, we care about how many people are in the room and how many people must still enter. To handle the prefix, we simply count how many people are in the room right now, and subtract that from the total number of people in order to determine how many must still enter. I leave the input handling as an exercise.
To determine that count, we count up the ways that involve the next action being I (someone comes in), plus the ways that involve the next action being O (someone leaves).
If everyone has entered, there is only one way forward: everyone must leave, one at a time. This is a base case.
Otherwise, it is definitely possible for someone to come in. We recursively count the ways for everyone else to enter after that; in the recursive call, there is one more person in the room, and one fewer person who must still enter.
If there are still people who have to enter, and there is also someone in the room right now, then it is also possible for someone to leave first. We recursively count the ways for others to enter after that; in the recursive call, there is one fewer person in the room, and the same number who must still enter.
This translates into code fairly directly:
def ways_to_enter(currently_in, waiting):
    if waiting == 0:
        return 1
    result = ways_to_enter(currently_in + 1, waiting - 1)
    if currently_in > 0:
        result += ways_to_enter(currently_in - 1, waiting)
    return result
Some testing:
>>> ways_to_enter(0, 1) # n = 1, prefix = ''
1
>>> ways_to_enter(2, 1) # n = 3, prefix = 'II'; OR e.g. n = 4, prefix = 'IIOI'
3
>>> ways_to_enter(0, 3) # n = 3, prefix = ''
5
>>> ways_to_enter(0, 14) # takes less than a second on my machine
2674440
We can improve the performance for larger values by decorating the function with functools.cache (lru_cache prior to 3.9), which will memoize results of the previous recursive calls. The more purpose-built approach is to use dynamic programming techniques: in this case, we would initialize 2-dimensional storage for the results of ways_to_enter(x, y), and compute those values one at a time, in such a way that the values needed for the "recursive calls" have already been done earlier in the process.
That direct approach would look something like:
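def ways_to_enter(currently_in, waiting):
    # At most currently_in + waiting people can ever be in the room.
    width = currently_in + waiting + 1
    # results[w][c] holds the count for c people inside and w still waiting.
    results = [[0] * width for _ in range(waiting + 1)]
    # We will iterate with `waiting` as the major axis.
    for w in range(waiting + 1):
        # Row w only needs the columns that row w + 1 will look up.
        for c in range(width - w):
            if w == 0:
                value = 1
            else:
                value = results[w - 1][c + 1]
                if c > 0:
                    value += results[w][c - 1]
            results[w][c] = value
    return results[waiting][currently_in]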
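The input handling left as an exercise above could look roughly like the following. This is a sketch under assumptions: count_with_prefix is a hypothetical helper name, any symbol other than "I" is treated as "O", and functools.cache needs Python 3.9 or later.

```python
from functools import cache

@cache
def ways_to_enter(currently_in, waiting):
    # Same recurrence as above, memoized.
    if waiting == 0:
        return 1
    result = ways_to_enter(currently_in + 1, waiting - 1)
    if currently_in > 0:
        result += ways_to_enter(currently_in - 1, waiting)
    return result

def count_with_prefix(n, prefix):
    """Count valid length-2n sequences that start with `prefix` (hypothetical helper)."""
    currently_in = 0
    entered = 0
    for symbol in prefix:
        if symbol == "I":
            currently_in += 1
            entered += 1
        else:
            currently_in -= 1
        # An invalid prefix (someone leaves an empty room, or more than n people enter)
        # cannot be completed at all.
        if currently_in < 0 or entered > n:
            return 0
    return ways_to_enter(currently_in, n - entered)

print(count_with_prefix(1, ""))    # 1
print(count_with_prefix(3, "II"))  # 3
print(count_with_prefix(3, "IO"))  # 2
```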

The product() function from itertools will allow you to generate all the possible sequences of 'I' and 'O' for a given length.
From that list, you can filter by the sequences that start with the user-supplied start_seq.
From that list, you can filter by the sequences that are valid, given your rules of the number and order of the 'I's and 'O's:
from itertools import product
def is_valid(seq):
    '''Evaluates a sequence of I's and O's following the rules that:
    - there cannot be more outs than ins
    - the ins and outs must be balanced
    '''
    _in, _out = 0, 0
    for x in seq:
        if x == 'I':
            _in += 1
        else:
            _out += 1
        if (_out > _in) or (_in > len(seq)/2):
            return False
    return True
# User inputs...
start_seq = 'II'
assert start_seq[0] != 'O', 'Starting sequence cannot start with an OUT.'
n = 3
total_len = n*2
assert len(start_seq) < total_len, 'Starting sequence is at least as big as total number, nothing to iterate.'
# Calculate all possible sequences that are total_len long, as tuples of 'I' and 'O'
seq_tuples = product('IO', repeat=total_len)
# Convert tuples to strings, e.g., `('I', 'O', 'I')` to `'IOI'`
sequences = [''.join(seq_tpl) for seq_tpl in seq_tuples]
# Filter for sequences that start correctly
sequences = [seq for seq in sequences if seq.startswith(start_seq)]
# Filter for valid sequences
sequences = [seq for seq in sequences if is_valid(seq)]
print(sequences)
and I get:
['IIIOOO', 'IIOIOO', 'IIOOIO']

Not very elegant perhaps but this certainly seems to fulfil the brief:
from itertools import permutations
def isvalid(start, p):
    for c1, c2 in zip(start, p):
        if c1 != c2:
            return 0
    n = 0
    for c in p:
        if c == 'O':
            if (n := n - 1) < 0:
                return 0
        else:
            n += 1
    return 1
def calc(n, i):
    s = i + 'I' * (n - i.count('I'))
    s += 'O' * (n * 2 - len(s))
    return sum(isvalid(i, p) for p in set(permutations(s)))
print(calc(3, 'II'))
print(calc(3, 'IO'))
print(calc(3, 'I'))
print(calc(3, ''))
Output:
3
2
5
5

def solve(string,n):
    countI =string.count('I')
    if countI==n:
        return 1
    countO=string.count('O')
    if countO > countI:
        return 0
    k= solve(string + 'O',n)
    h= solve(string + 'I',n)
    return k+h
n= int(input())
string=input()
print(solve(string,n))

This is a dynamic programming problem.
Given the number of in and out operations remaining, we do one of the following:
If we're out of either ins or outs, we can only use operations of the other type. There is only one possible assignment.
If we have an equal number of ins and outs remaining, we must use an in operation according to the constraints of the problem.
Finally, if we have more outs than ins remaining, we can perform either operation. The answer, then, is the sum of the number of sequences if we choose to use an in operation plus the number of sequences if we choose to use an out operation.
This runs in O(n^2) time, although in practice the following code snippet can be made faster using a 2D-list rather than the cache annotation (I've used @cache in this case to make the recurrence easier to understand).
from functools import cache
@cache
def find_permutation_count(in_remaining, out_remaining):
    if in_remaining == 0 or out_remaining == 0:
        return 1
    elif in_remaining == out_remaining:
        return find_permutation_count(in_remaining - 1, out_remaining)
    else:
        return find_permutation_count(in_remaining - 1, out_remaining) + find_permutation_count(in_remaining, out_remaining - 1)
print(find_permutation_count(3, 3)) # prints 5

The number of such permutations of length 2n is given by the n'th Catalan number. Wikipedia gives a formula for Catalan numbers in terms of central binomial coefficients:
from math import comb
def count_permutations(n):
    return comb(2*n,n) // (n+1)
for i in range(1,10):
    print(i, count_permutations(i))
# 1 1
# 2 2
# 3 5
# 4 14
# 5 42
# 6 132
# 7 429
# 8 1430
# 9 4862
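The Catalan formula covers the case of an empty prefix. For a non-empty prefix, a closed form based on ballot numbers appears to apply; the sketch below assumes that formula and uses a hypothetical function name. Its arguments are the number of people currently in the room after the prefix and the number still waiting to enter, and it is worth cross-checking against one of the recursive answers before relying on it.

```python
from math import comb

def count_with_prefix_closed_form(currently_in, waiting):
    """Ballot-number count of the ways to finish, given the state after the prefix."""
    # Remaining symbols: `waiting` more I's and `currently_in + waiting` more O's.
    steps = 2 * waiting + currently_in
    return (currently_in + 1) * comb(steps + 1, waiting) // (steps + 1)

print(count_with_prefix_closed_form(0, 3))  # n = 3, prefix = '' -> 5
print(count_with_prefix_closed_form(2, 1))  # n = 3, prefix = 'II' -> 3
```

With currently_in equal to 0 this reduces to the n'th Catalan number.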

I'm unable to figure out what test cases am I failing here

I need to find the most frequently occurring character in a string. The characters are a-z, i.e. there are 26 different types.
Even though the output is correct, I'm still failing. What am I doing wrong?
These are the conditions:
Note: If there are more than one type of equal maximum then the type with lesser ASCII value will be considered.
Input Format
The first line of input consists of number of test cases, T.
The second line of each test case consists of a string representing the type of each individual characters.
Constraints
1<= T <=10
1<= |string| <=100000
Output Format
For each test case, print the required output in a separate line.
Sample TestCase 1
Input
2
gqtrawq
fnaxtyyzz
Output
q
y
Explanation
Test Case 1: There are 2 q occurring the max while the rest all are present alone.
Test Case 2: There are 2 y and 2 z types. Since the maximum value is the same, the type with the lesser ASCII value is considered as output. Therefore, y is the correct type.
def testcase(str1):
    ASCII_SIZE = 256
    ctr = [0] * ASCII_SIZE
    max = -1
    ch = ''
    for i in str1:
        ctr[ord(i)]+=1
    for i in str1:
        if max < ctr[ord(i)]:
            max = ctr[ord(i)]
            ch = i
    return ch
print(testcase("gqtrawq"))
print(testcase("fnaxtyyzz"))
I'm passing the output i.e. I'm getting the correct output but failing the test cases.

Note the note:
Note: If there are more than one type of equal maximum then the type with lesser ASCII value will be considered.
But with your code, you return the character with highest count that appears first in the string. In case of ties, take the character itself into account in the comparison:
for i in str1:
    if max < ctr[ord(i)] or max == ctr[ord(i)] and i < ch:
        max = ctr[ord(i)]
        ch = i
Or shorter (but not necessarily clearer) comparing tuples of (count, char):
if (max, i) < (ctr[ord(i)], ch):
(Note that this is comparing (old_cnt, new_char) < (new_cnt, old_chr)!)
Alternatively, you could also iterate the characters in the string in sorted order:
for i in sorted(str1):
    if max < ctr[ord(i)]:
        ...
Having said that, you could simplify/improve your code by counting the characters directly instead of their ord (using a dict instead of list), and using the max function with an appropriate key function to get the most common character.
def testcase(str1):
    ctr = {c: 0 for c in str1}
    for c in str1:
        ctr[c] += 1
    return max(sorted(set(str1)), key=ctr.get)
You could also use collections.Counter, and most_common, but where's the fun in that?
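For completeness, a Counter-based variant might look like the sketch below. The function name most_common_char is made up for illustration. It does not rely on most_common alone, because Counter orders equal counts by first appearance rather than by ASCII value, so the tie-break is done explicitly.

```python
from collections import Counter

def most_common_char(text):
    counts = Counter(text)
    # Highest count first; among equal counts, the smaller character wins.
    return min(counts, key=lambda ch: (-counts[ch], ch))

print(most_common_char("gqtrawq"))      # q
print(most_common_char("fnaxtyyzz"))    # y
print(most_common_char("fanaxtyfzyz"))  # a
```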

What should be the output for this - print(testcase("fanaxtyfzyz"))?
IMO the output should be 'a' but your program writes 'f'.
The reason is you are iterating through the characters of the input string,
for i in str1: #Iterating through the values 'f','a','n','a','x','t',...
    #first count of 'f' is considered.
    #count of 'f' occurs first, count of 'a' not considered.
    if max < ctr[ord(i)]:
        max = ctr[ord(i)]
        ch = i
Instead, you should iterate through the values of ctr. Or sort the input string and do the same.

Finding regular expression with at least one repetition of each letter

From any *.fasta DNA sequence (only 'ACTG' characters) I must find all subsequences that contain at least one occurrence of each letter.
For example, from the sequence 'AAGTCCTAG' I should be able to find: 'AAGTC', 'AGTC', 'GTCCTA', 'TCCTAG', 'CCTAG' and 'CTAG' (iterating over each letter).
I have no clue how to do that in Python 2.7. I tried regular expressions, but they did not find every variant.
How can I achieve that?

You could find all substrings of length 4+, and then down select from those to find only the shortest possible combinations that contain one of each letter:
s = 'AAGTCCTAG'
def get_shortest(s):
    l, b = len(s), set('ATCG')
    options = [s[i:j+1] for i in range(l) for j in range(i,l) if (j+1)-i > 3]
    return [i for i in options if len(set(i) & b) == 4 and (set(i) != set(i[:-1]))]
print(get_shortest(s))
Output:
['AAGTC', 'AGTC', 'GTCCTA', 'TCCTAG', 'CCTAG', 'CTAG']

This is another way you can do it. Maybe not as fast and nice as chrisz's answer, but maybe a little simpler to read and understand for beginners.
DNA='AAGTCCTAG'
toSave=[]
for i in range(len(DNA)):
    letters=['A','G','T','C']
    j=i
    seq=[]
    while len(letters)>0 and j<(len(DNA)):
        seq.append(DNA[j])
        try:
            letters.remove(DNA[j])
        except ValueError:
            # this letter was already seen in the current window
            pass
        j+=1
    if len(letters)==0:
        toSave.append(seq)
print(toSave)

Since the substring you are looking for may be of almost any length, a FIFO queue (a sliding window) seems to work. Append one letter at a time and check whether there is at least one of each letter. If so, yield the window. Then remove letters at the front and keep checking until the window is no longer valid.
def find_agtc_seq(seq_in):
    chars = 'AGTC'
    cur_str = []
    for ch in seq_in:
        cur_str.append(ch)
        while all(map(cur_str.count,chars)):
            yield("".join(cur_str))
            cur_str.pop(0)
seq = 'AAGTCCTAG'
for substr in find_agtc_seq(seq):
print(substr)
That seems to result in the substrings you are looking for:
AAGTC
AGTC
GTCCTA
TCCTAG
CCTAG
CTAG
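The same sliding-window idea can be sketched with collections.deque and a running Counter, which avoids recounting the whole window on every step. The function name and the required parameter are made up for illustration, and this is only one possible variant.

```python
from collections import Counter, deque

def find_covering_windows(sequence, required="ACGT"):
    """Yield every window that covers all required letters, shrinking from the left."""
    window = deque()
    counts = Counter()
    for ch in sequence:
        window.append(ch)
        counts[ch] += 1
        # Keep yielding and dropping the leftmost letter while coverage holds.
        while all(counts[r] > 0 for r in required):
            yield "".join(window)
            counts[window.popleft()] -= 1

print(list(find_covering_windows("AAGTCCTAG")))
# ['AAGTC', 'AGTC', 'GTCCTA', 'TCCTAG', 'CCTAG', 'CTAG']
```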

I really wanted to create a short answer for this, so this is what I came up with!
s = 'AAGTCCTAG'
d = 'ACGT'
c = len(d)
while c <= len(s):
    x,c = s[:c],c+1
    if all(l in x for l in d):
        print(x)
        s,c = s[1:],len(d)
It works as follows:
c is set to the length of the string of characters we are ensuring exist in the string (d = ACGT)
The while loop iterates over each possible prefix of s for as long as c is no larger than the length of s.
This works by increasing c by 1 upon each iteration of the while loop.
If every character in our string d (ACGT) exists in the substring, we print the result, reset c to its default value and slice the string by 1 character from the start.
The loop continues until the string s is shorter than d
Result:
AAGTC
AGTC
GTCCTA
TCCTAG
CCTAG
CTAG
To get the output in a list instead:
s = 'AAGTCCTAG'
d = 'ACGT'
c,r = len(d),[]
while c <= len(s):
    x,c = s[:c],c+1
    if all(l in x for l in d):
        r.append(x)
        s,c = s[1:],len(d)
print(r)
Result:
['AAGTC', 'AGTC', 'GTCCTA', 'TCCTAG', 'CCTAG', 'CTAG']

If you can break the sequence into a list, e.g. of 5-letter sequences, you could then use this function to find repeated sequences.
from itertools import groupby
import numpy as np
def find_repeats(input_list, n_repeats):
    flagged_items = []
    for item in input_list:
        # Create itertools.groupby object
        groups = groupby(str(item))
        # Create list of tuples: (digit, number of repeats)
        result = [(label, sum(1 for _ in group)) for label, group in groups]
        # Extract just number of repeats
        char_lens = np.array([x[1] for x in result])
        # Append to flagged items
        if any(char_lens >= n_repeats):
            flagged_items.append(item)
    # Return flagged items
    return flagged_items
#--------------------------------------
test_list = ['aatcg', 'ctagg', 'catcg']
find_repeats(test_list, n_repeats=2) # Returns ['aatcg', 'ctagg']

How to convert an input string to a list, change it to a palindrome (if it isn't one already), and convert it back to a string

A string is a palindrome if it reads the same forward and backward. Given a string that contains only lower case English letters, you are required to create a new palindrome string from the given string following the rules given below:
1. You can reduce (but not increase) any character in a string by one; for example you can reduce the character h to g but not from g to h
2. In order to achieve your goal, if you have to then you can reduce a character of a string repeatedly until it becomes the letter a; but once it becomes a, you cannot reduce it any further.
Each reduction operation is counted as one. So you need to count as well how many reductions you make. Write a Python program that reads a string from a user input (using raw_input statement), creates a palindrome string from the given string with the minimum possible number of operations and then prints the palindrome string created and the number of operations needed to create the new palindrome string.
I tried to convert the string to a list first, then modify the list so that, whatever string is given, if it's not a palindrome it is automatically edited into one, and then print the result. After modifying the list, I convert it back to a string.
c=raw_input("enter a string ")
x=list(c)
y = ""
i = 0
j = len(x)-1
a = 0
while i < j:
    if x[i] < x[j]:
        a += ord(x[j]) - ord(x[i])
        x[j] = x[i]
        print x
    else:
        a += ord(x[i]) - ord(x[j])
        x [i] = x[j]
        print x
    i = i + 1
    j = (len(x)-1)-1
print "The number of operations is ",a
print "The palindrome created is",( ''.join(x) )
Am I approaching it the right way, or is there something I'm missing?

Since only reduction is allowed, it is clear that the number of reductions for each pair will be the difference between them. For example, consider the string 'abcd'.
Here the pairs to check are (a,d) and (b,c).
Now difference between 'a' and 'd' is 3, which is obtained by (ord('d')-ord('a')).
I am using absolute value to avoid checking which alphabet has higher ASCII value.
I hope this approach will help.
s=input()
l=len(s)
count=0
m=0
n=l-1
while m<n:
    count+=abs(ord(s[m])-ord(s[n]))
    m+=1
    n-=1
print(count)
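The snippet above only counts the operations. A Python 3 sketch that also builds the palindrome, using the same pairwise idea, could look like this; the function name make_palindrome is made up for illustration.

```python
def make_palindrome(text):
    """Return (palindrome, operations), reducing the larger letter of each mirrored pair."""
    chars = list(text)
    operations = 0
    left, right = 0, len(chars) - 1
    while left < right:
        operations += abs(ord(chars[left]) - ord(chars[right]))
        # Both positions end up holding the smaller of the two letters.
        chars[left] = chars[right] = min(chars[left], chars[right])
        left += 1
        right -= 1
    return "".join(chars), operations

print(make_palindrome("abcd"))   # ('abba', 4)
print(make_palindrome("ideal"))  # ('iaeai', 6)
```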

This is a common "homework" or competition question. The basic concept here is that you have to find a way to get to minimum values with as few reduction operations as possible. The trick here is to utilize string manipulation to keep that number low. For this particular problem, there are two very simple things to remember: 1) you have to split the string, and 2) you have to apply a bit of symmetry.
First, split the string in half. The following function should do it.
def split_string_to_halves(string):
    half, rem = divmod(len(string), 2)
    a, b, c = '', '', ''
    a, b = string[:half], string[half:]
    if rem > 0:
        # Odd length: keep the middle character separately.
        b, c = string[half + 1:], string[half]
    return (a, b, c)
The above should recreate the string if you do a + c + b. Next, convert a and b to lists and map the ord function over each half. Leave the remainder alone, if any.
def convert_to_ord_list(string):
    return map(ord, list(string))
Since you just have to do a one-way operation (only reduction, no need for addition), you can assume that for each pair of elements in the two converted lists, the higher value less the lower value is the number of operations needed. Easier shown than said:
def convert_to_palindrome(string):
    halfone, halftwo, rem = split_string_to_halves(string)
    if halfone == halftwo[::-1]:
        # Already a palindrome: put the pieces back in their original order.
        return halfone + rem + halftwo, 0
    halftwo = halftwo[::-1]
    zipped = list(zip(convert_to_ord_list(halfone), convert_to_ord_list(halftwo)))
    counter = sum([max(x) - min(x) for x in zipped])
    floors = [min(x) for x in zipped]
    res = "".join(map(chr, floors))
    res += rem + res[::-1]
    return res, counter
Finally, some tests:
target = 'ideal'
print convert_to_palindrome(target) # ('iaeai', 6)
target = 'euler'
print convert_to_palindrome(target) # ('eelee', 29)
target = 'ohmygodthisisinsane'
print convert_to_palindrome(target) # ('ehasgidihihidigsahe', 84)
I'm not sure if this is optimized nor if I covered all bases. But I think this pretty much covers the general concept of the approach needed. Compared to your code, this is clearer and actually works (yours does not). Good luck and let us know how this works for you.

how to generate a set of similar strings in python

I am wondering how to generate a set of similar strings based on Levenshtein distance (string edit distance). Ideally, I would like to pass in a source string (i.e. the string used to generate other strings that are similar to it), the number of strings to generate, and a threshold as parameters, i.e. the similarity among the strings in the generated set should be greater than the threshold. What Python package(s) should I use to achieve that? Or is there any idea how to implement this?

I think you can think of the problem in another way (reversed).
Given a string, say it is sittin.
Given a threshold (edit distance), say it is k.
Then you apply combinations of different "edits" in k-steps.
For example, let's say k = 2. And assume the allowed edit modes you have are:
delete one character
add one character
substitute one character with another one.
Then the logic is something like below:
input = 'sittin'
for num in 1 ... n: # suppose you want to have n strings generated
    my_input_ = input
    # suppose the edit distance should be smaller or equal to k;
    # but greater or equal to one
    for i in 1 ... randint(1, k):
        pick a random edit mode from (delete, add, substitute)
        do it! and update my_input_
If you need to stick with a pre-defined dictionary, that adds some complexity but it is still doable. In this case, the edit must be valid.
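The pseudocode above might translate into Python roughly as follows. This is a sketch under assumptions: the names random_edit and similar_strings, the lowercase alphabet, and the max_attempts cap are all made up for illustration. Because each step is a single edit, every result is within edit distance k of the source, though it can be closer than the number of steps taken.

```python
import random
import string

def random_edit(text, alphabet=string.ascii_lowercase):
    """Apply one random deletion, insertion, or substitution."""
    # An empty string can only grow.
    modes = ["add"] if not text else ["delete", "add", "substitute"]
    mode = random.choice(modes)
    if mode == "add":
        pos = random.randint(0, len(text))
        return text[:pos] + random.choice(alphabet) + text[pos:]
    pos = random.randrange(len(text))
    if mode == "delete":
        return text[:pos] + text[pos + 1:]
    return text[:pos] + random.choice(alphabet) + text[pos + 1:]

def similar_strings(source, count, k, max_attempts=10_000):
    """Collect up to `count` distinct strings reached by at most k random edits."""
    results = set()
    for _attempt in range(max_attempts):
        if len(results) >= count:
            break
        candidate = source
        for _step in range(random.randint(1, k)):
            candidate = random_edit(candidate)
        # Edits can cancel out, so skip candidates equal to the source.
        if candidate != source:
            results.add(candidate)
    return sorted(results)

print(similar_strings("sittin", 5, 2))
```

The output differs from run to run; calling random.seed beforehand makes it reproducible.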

Borrowing heavily from the pseudocode in @greeness's answer, I thought I would include the code I used to do this for DNA sequences.
This may not be your exact use case but I think it should be easily adaptable.
import random
dna = set(["A", "C", "G", "T"])
class Sequence(str):
    def mutate(self, d, n):
        mutants = set([self])
        while len(mutants) < n:
            k = random.randint(1, d)
            for _ in range(k):
                mutant_type = random.choice(["d", "s", "i"])
                if mutant_type == "i":
                    mutants.add(self.insertion(k))
                elif mutant_type == "d":
                    mutants.add(self.deletion(k))
                elif mutant_type == "s":
                    mutants.add(self.substitute(k))
        return list(mutants)
    def deletion(self, n):
        if n >= len(self):
            return ""
        chars = list(self)
        i = 0
        while i < n:
            idx = random.choice(range(len(chars)))
            del chars[idx]
            i += 1
        return "".join(chars)
    def insertion(self, n):
        chars = list(self)
        i = 0
        while i < n:
            idx = random.choice(range(len(chars)))
            new_base = random.choice(list(dna))
            chars.insert(idx, new_base)
            i += 1
        return "".join(chars)
    def substitute(self, n):
        idxs = random.sample(range(len(self)), n)
        chars = list(self)
        for i in idxs:
            new_base = random.choice(list(dna.difference(chars[i])))
            chars[i] = new_base
        return "".join(chars)
To use this you can do the following
s = Sequence("AAAAA")
d = 2 # max edit distance
n = 5 # number of strings in result
s.mutate(d, n)
>>> ['AAA', 'GACAAAA', 'AAAAA', 'CAGAA', 'AACAAAA']
