Create DNA Sequences of length n


How can we use recursion to calculate all DNA sequences of length n in a function?
For instance, if the function is given 2, it returns ['AA', 'AC', 'AT', 'AG', 'CA', 'CC', 'CT', 'CG', 'TA', 'TC', 'TT', 'TG', 'GA', 'GC', 'GT', 'GG'], and so on for larger n.

itertools.permutations gives all orderings of a given iterable; the second argument r is the length of each returned tuple:
itertools.permutations('ACGT', length)
Note that permutations never reuses an element, so it will not produce sequences such as 'AA'.
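For comparison, here is a minimal sketch (the variable names are illustrative) of how permutations and product differ on the DNA alphabet: permutations picks each base at most once, while product with repeat=n lets every position take any base.

```python
from itertools import permutations, product

bases = 'ACGT'
n = 2

# permutations: ordered selections without reuse -> 4 * 3 = 12 strings, no 'AA'
perms = [''.join(p) for p in permutations(bases, n)]

# product: every position independently takes any base -> 4 ** 2 = 16 strings
prods = [''.join(p) for p in product(bases, repeat=n)]

print(len(perms), 'AA' in perms)  # 12 False
print(len(prods), 'AA' in prods)  # 16 True
```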

Here is one way:
def all_seq(n, curr, e, ways):
    """All possible sequences of size n given elements e.
    ARGS
    n: size of sequence
    curr: a list used for constructing sequences
    e: the list of possible elements (could have been a global list instead)
    ways: the final list of sequences
    """
    # Base case: the current sequence has reached full length
    if len(curr) == n:
        ways.append(''.join(curr))
        return
    # Extend the current sequence by each element and recurse
    for element in e:
        all_seq(n, list(curr) + [element], e, ways)

perms = []
all_seq(2, [], ['A', 'C', 'T', 'G'], perms)
print(perms)
The output:
['AA', 'AC', 'AT', 'AG', 'CA', 'CC', 'CT', 'CG', 'TA', 'TC', 'TT', 'TG', 'GA', 'GC', 'GT', 'GG']

You actually want itertools.product('ACGT', repeat=n). Note that the result grows very fast: there are 4^n sequences, each of length n.
If your assignment is to do it recursively, consider how you would get all (n+1)-length options that start with an n-length prefix. The naive recursive option may be rather slow compared to itertools if you need to use it in anger.
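The prefix idea can be sketched like this (a minimal sketch; the function name dna_sequences and the alphabet order 'ACTG', chosen to match the order in the question, are assumptions): the only length-0 sequence is the empty string, and every length-n sequence is a length-(n-1) prefix extended by one base.

```python
from itertools import product


def dna_sequences(n, alphabet='ACTG'):
    """Return all strings of length n over alphabet, built recursively.

    Base case: the only length-0 sequence is the empty string.
    Recursive step: each (n-1)-length prefix is extended by every base.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ['']
    shorter = dna_sequences(n - 1, alphabet)
    return [prefix + base for prefix in shorter for base in alphabet]


print(dna_sequences(2))

# Sanity check: same order as itertools.product when the alphabet order matches
assert dna_sequences(3) == [''.join(p) for p in product('ACTG', repeat=3)]
```

Each level builds a full list, so memory use is proportional to 4^n; for large n the itertools version, which yields results one at a time, is the better choice.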

Related

Generating pairs of letters from a given sequence

I have a problem I would appreciate help with. I want to generate all possible two-letter strings from a given sequence. For example, from the string 'ACCG' I want to generate the list [AA, CC, GG, AC, CA, AG, GA, CG, GC].
Does anyone have an idea how I can do that?

An efficient solution can be coded using the itertools module (the output goes through a set, so its order may differ between runs):
CODE
import itertools
string = 'ACCG'
num = 2
combinations = list(itertools.product(string, repeat=num))
result = [*set([''.join(tup) for tup in combinations])]
print(result)
OUTPUT
['CG', 'GG', 'GC', 'GA', 'AG', 'AA', 'CC', 'AC', 'CA']

If you want a one-liner (using product from itertools) then try this:
from itertools import product
out = [''.join(p) for p in set(product('ACCG', repeat=2))]
print(out)
Output:
['AA', 'GG', 'CC', 'GA', 'AC', 'CG', 'GC', 'CA', 'AG']
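A variant worth considering (a sketch, not taken from the answers above): deduplicate the input letters before taking the product. Because the iteration order of a set of strings is arbitrary, both answers above can print the pairs in a different order each run; sorting the unique letters first gives a stable order and avoids generating duplicate pairs at all.

```python
from itertools import product

s = 'ACCG'
letters = sorted(set(s))  # unique letters in a fixed order: ['A', 'C', 'G']

# With duplicates removed up front, product yields no repeated pairs,
# so no set() is needed afterwards and the order is deterministic.
pairs = [''.join(p) for p in product(letters, repeat=2)]
print(pairs)  # ['AA', 'AC', 'AG', 'CA', 'CC', 'CG', 'GA', 'GC', 'GG']
```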

Generate all possible unique samples with n-elements from

Is there a simple way to generate all possible unique samples from a given sample frame? For example, I have a list with 5 elements, members = ['P', 'V', 'S', 'T', 'A'], and I would like to draw all possible 2-element combinations, disregarding order, i.e. 'PV' is equivalent to 'VP'. So from the list ['P', 'V', 'S', 'T', 'A'] I should get ten 2-element samples.
I wrote something that does the trick, but I wonder whether there is an existing method or function that does it, one that would let me simply provide the sample frame and the sample size and create all possible combinations.
members = list('PVSTA')
ms = []
for i in members:
    for j in members:
        if i != j and i+j not in ms and j+i not in ms:
            ms.append(i+j)
        else:
            continue
print(ms)
['PV', 'PS', 'PT', 'PA', 'VS', 'VT', 'VA', 'ST', 'SA', 'TA']

You can use itertools.combinations(iterable, r), which returns r-length subsequences of elements from the input iterable. So in your case, with the iterable ['P', 'V', 'S', 'T', 'A'] and r=2, it returns 5C2 = 10 combinations.
Use:
from itertools import combinations
ms = ["".join(c) for c in combinations(list("PVSTA"), r=2)]
print(ms)
Output:
['PV', 'PS', 'PT', 'PA', 'VS', 'VT', 'VA', 'ST', 'SA', 'TA']

What you want is called combinations; you can generate them with the itertools library in Python.
from itertools import combinations
members = list('PVSTA')
comb_2 = combinations(members, 2)
result = ["".join(c) for c in comb_2]
print(result)

Others have already posted the itertools.combinations route (the best approach), but here is the manual way to do it for anyone interested:
members = list('PVSTA')
ms = []
# Pair each element only with the elements after it, so 'VP' never appears
for i in range(len(members)-1):
    for j in range(i+1, len(members)):
        ms.append(members[i] + members[j])
print(ms)  # ['PV', 'PS', 'PT', 'PA', 'VS', 'VT', 'VA', 'ST', 'SA', 'TA']
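The nested i < j loops only handle samples of size 2. A minimal recursive sketch that generalizes them to any sample size r (the name combos is hypothetical) chooses the first element at index i, then recursively chooses the remaining r - 1 elements from those after it:

```python
from itertools import combinations


def combos(items, r):
    """Return all r-element combinations of items, preserving input order."""
    if r == 0:
        return [[]]  # exactly one way to choose nothing
    result = []
    for i in range(len(items)):
        # Fix items[i] as the first element, then pick r - 1 from the rest
        for rest in combos(items[i + 1:], r - 1):
            result.append([items[i]] + rest)
    return result


members = list('PVSTA')
ms = [''.join(c) for c in combos(members, 2)]
print(ms)  # ['PV', 'PS', 'PT', 'PA', 'VS', 'VT', 'VA', 'ST', 'SA', 'TA']

# Same result and order as the library function
assert ms == [''.join(c) for c in combinations(members, 2)]
```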

Finding unique elements in nested list of strings

Similar to the query posted at this URL:
https://stackoverflow.com/questions/54477996/finding-unique-elements-in-nested-list/,
I have another query.
I have a list imported from pandas, and I need a single list as output containing all the unique elements:
[Ac, Ad, An, Bi, Co, Cr, Dr, Fa, Mu, My, Sc]
Once I have all the unique elements, I want to count how often each of them occurs in the whole list.
Can someone advise how I can accomplish that?
mylist = df.Abv.str.split().tolist()
mylist
[['Ac,Cr,Dr'],
['Ac,Ad,Sc'],
['Ac,Bi,Dr'],
['Ad,Dr,Sc'],
['An,Dr,Fa'],
['Bi,Co,Dr'],
['Dr,Mu'],
['Ac,Co,My'],
['Co,Dr'],
['Ac,Ad,Sc'],
['An,Ac,Ad'],
]
I have tried different things but can't seem to make it work.
I tried converting it into a string and applying the split function to that string, but to no avail.

You can do it this way in Python 3:
mylist = [['Ac,Cr,Dr'],
['Ac,Ad,Sc'],
['Ac,Bi,Dr'],
['Ad,Dr,Sc'],
['An,Dr,Fa'],
['Bi,Co,Dr'],
['Dr,Mu'],
['Ac,Co,My'],
['Co,Dr'],
['Ac,Ad,Sc'],
['An,Ac,Ad'],
]
uniquedict = {}
for sublist in mylist:
    for item in sublist[0].split(','):
        if item in uniquedict.keys():
            uniquedict[item] += 1
        else:
            uniquedict[item] = 1
print(uniquedict)
print(list(uniquedict.keys()))
{'Ac': 6, 'Cr': 1, 'Dr': 7, 'Ad': 4, 'Sc': 3, 'Bi': 2, 'An': 2, 'Fa': 1, 'Co': 3, 'Mu': 1, 'My': 1}
['Ac', 'Cr', 'Dr', 'Ad', 'Sc', 'Bi', 'An', 'Fa', 'Co', 'Mu', 'My']

You can create a dictionary whose keys are the list values and whose values are their counts.
Your code may look like this:
mylists = [['Ac,Cr,Dr'],
['Ac,Ad,Sc'],
['Ac,Bi,Dr'],
['Ad,Dr,Sc'],
['An,Dr,Fa'],
['Bi,Co,Dr'],
['Dr,Mu'],
['Ac,Co,My'],
['Co,Dr'],
['Ac,Ad,Sc'],
['An,Ac,Ad'],
]
unique = {}
for mylist in mylists:
    for entry in mylist:
        # Each entry is one comma-separated string, so split it first
        for elem in entry.split(','):
            unique[elem] = unique[elem]+1 if elem in unique else 1
unique.keys() gives the unique elements, and if you want the count of any value you can get it from the dictionary, e.g. unique['Ad'].

You can use collections.Counter to make a dictionary of the counts of the elements. This also gives you easy access to a list of all unique elements. It looks like you have a list of lists where each sublist contains a single string. You will need to split these before you add them to the counter.
from collections import Counter
count = Counter()
mylist = [['Ac,Cr,Dr'],
['Ac,Ad,Sc'],
['Ac,Bi,Dr'],
['Ad,Dr,Sc'],
['An,Dr,Fa'],
['Bi,Co,Dr'],
['Dr,Mu'],
['Ac,Co,My'],
['Co,Dr'],
['Ac,Ad,Sc'],
['An,Ac,Ad'],
]
for arr in mylist:
    count.update(arr[0].split(','))
print(count) # dictionary of symbols: counts
print(list(count.keys())) # list of all unique elements
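If you want the unique elements in the alphabetical order shown in the question, a small extension of the Counter approach (a sketch; the nested generator is one choice among several) is to sort the counter's keys. Splitting every string in every sublist also covers sublists that hold more than one string.

```python
from collections import Counter

mylist = [['Ac,Cr,Dr'], ['Ac,Ad,Sc'], ['Ac,Bi,Dr'], ['Ad,Dr,Sc'],
          ['An,Dr,Fa'], ['Bi,Co,Dr'], ['Dr,Mu'], ['Ac,Co,My'],
          ['Co,Dr'], ['Ac,Ad,Sc'], ['An,Ac,Ad']]

# Split every comma-separated string in every sublist and count the pieces
count = Counter(item for sub in mylist for s in sub for item in s.split(','))

unique_sorted = sorted(count)  # iterating a Counter yields its keys
print(unique_sorted)
# ['Ac', 'Ad', 'An', 'Bi', 'Co', 'Cr', 'Dr', 'Fa', 'Mu', 'My', 'Sc']
print(count.most_common(3))  # [('Dr', 7), ('Ac', 6), ('Ad', 4)]
```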

You can take advantage of the powerful tools offered by collections, itertools and functools to get a one-line solution.
If each sublist contains only one string:
from collections import Counter
from itertools import chain

if __name__ == '__main__':
    mylist = [
        ['Ac,Cr,Dr'],
        ['Ac,Ad,Sc'],
        ['Ac,Bi,Dr'],
        ['Ad,Dr,Sc'],
        ['An,Dr,Fa'],
        ['Bi,Co,Dr'],
        ['Dr,Mu'],
        ['Ac,Co,My'],
        ['Co,Dr'],
        ['Ac,Ad,Sc'],
        ['An,Ac,Ad'],
    ]
    # if lists contain only one element
    occurrence_count = Counter(chain(*map(lambda x: x[0].split(','), mylist)))
    items = list(occurrence_count.keys())  # items, with no repetitions
    all_items = list(occurrence_count.elements())  # all items
    ac_occurrences = occurrence_count['Ac']  # occurrences of 'Ac'
    print(f"Unique items: {items}")
    print(f"All list elements: {all_items}")
    print(f"Occurrences of 'Ac': {ac_occurrences}")
And this is what you get:
Unique items: ['Ac', 'Cr', 'Dr', 'Ad', 'Sc', 'Bi', 'An', 'Fa', 'Co', 'Mu', 'My']
All list elements: ['Ac', 'Ac', 'Ac', 'Ac', 'Ac', 'Ac', 'Cr', 'Dr', 'Dr', 'Dr', 'Dr', 'Dr', 'Dr', 'Dr', 'Ad', 'Ad', 'Ad', 'Ad', 'Sc', 'Sc', 'Sc', 'Bi', 'Bi', 'An', 'An', 'Fa', 'Co', 'Co', 'Co', 'Mu', 'My']
Occurrences of 'Ac': 6
Otherwise, if your sublists contain more than one string:
from collections import Counter
from itertools import chain
from functools import partial

if __name__ == '__main__':
    mylist_complex = [
        ['Ac,Cr,Dr', 'Ac,Ad,Sc'],
        ['Ac,Ad,Sc', 'Ac,Bi,Dr'],
        ['Ac,Bi,Dr', 'Ad,Dr,Sc'],
        ['Ad,Dr,Sc', 'An,Dr,Fa'],
        ['An,Dr,Fa', 'Bi,Co,Dr'],
        ['Bi,Co,Dr', 'Dr,Mu'],
        ['Dr,Mu', 'Ac,Co,My'],
        ['Ac,Co,My', 'Co,Dr'],
        ['Co,Dr', 'Ac,Ad,Sc'],
        ['Ac,Ad,Sc', 'An,Ac,Ad'],
        ['An,Ac,Ad', 'Ac,Cr,Dr'],
    ]
    # if lists contain more than one element: split each string, then flatten twice
    occurrence_count_complex = Counter(chain(*map(lambda x: chain(*map(partial(str.split, sep=','), x)), mylist_complex)))
    items = list(occurrence_count_complex.keys())  # items, with no repetitions
    all_items = list(occurrence_count_complex.elements())  # all items
    ac_occurrences = occurrence_count_complex['Ac']  # occurrences of 'Ac'
    print(f"Unique items: {items}")
    print(f"All list elements: {all_items}")
    print(f"Occurrences of 'Ac': {ac_occurrences}")
And this is what you get in this case:
Unique items: ['Ac', 'Cr', 'Dr', 'Ad', 'Sc', 'Bi', 'An', 'Fa', 'Co', 'Mu', 'My']
All list elements: ['Ac', 'Ac', 'Ac', 'Ac', 'Ac', 'Ac', 'Ac', 'Ac', 'Ac', 'Ac', 'Ac', 'Ac', 'Cr', 'Cr', 'Dr', 'Dr', 'Dr', 'Dr', 'Dr', 'Dr', 'Dr', 'Dr', 'Dr', 'Dr', 'Dr', 'Dr', 'Dr', 'Dr', 'Ad', 'Ad', 'Ad', 'Ad', 'Ad', 'Ad', 'Ad', 'Ad', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'Bi', 'Bi', 'Bi', 'Bi', 'An', 'An', 'An', 'An', 'Fa', 'Fa', 'Co', 'Co', 'Co', 'Co', 'Co', 'Co', 'Mu', 'Mu', 'My', 'My']
Occurrences of 'Ac': 12

Try the following (it gives the unique elements as an unordered set, without counts):
from itertools import chain
mylist = [['Ac,Cr,Dr'],
['Ac,Ad,Sc'],
['Ac,Bi,Dr'],
['Ad,Dr,Sc'],
['An,Dr,Fa'],
['Bi,Co,Dr'],
['Dr,Mu'],
['Ac,Co,My'],
['Co,Dr'],
['Ac,Ad,Sc'],
['An,Ac,Ad']
]
flat_list = list(chain.from_iterable(mylist))
unique_list = set(','.join(flat_list).split(','))

Permutations Python

How do I take, for example, the tuple ("A", "E", "L") and generate all possible words without repeating letters? The result would be 3 one-letter words, 6 two-letter words and 6 three-letter words.
I tried this:
import itertools

def generate(tuplo_letras):
    return [i for i in itertools.permutations(tuplo_letras)]

def final(arg):
    return generate(list(map(''.join, itertools.permutations(arg))))

You can use itertools.permutations and iterate over all the permutation lengths you want to cover. Note that permutations takes two arguments: the iterable and the desired length of the permutations:
from itertools import permutations, chain
tpl = ("A", "E", "L")
[''.join(p) for p in chain(*(permutations(tpl, l+1) for l in range(len(tpl))))]
# ['A', 'E', 'L', 'AE', 'AL', 'EA', 'EL', 'LA', 'LE', 'AEL', 'ALE', 'EAL', 'ELA', 'LAE', 'LEA']
If you need them grouped you can nest the comprehensions accordingly:
[[''.join(p) for p in (permutations(tpl, l+1))] for l in range(len(tpl))]
# [['A', 'E', 'L'], ['AE', 'AL', 'EA', 'EL', 'LA', 'LE'], ['AEL', 'ALE', 'EAL', 'ELA', 'LAE', 'LEA']]
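A slightly tidier variant of the flat version (a sketch; it assumes Python 3.8+ for math.perm) uses chain.from_iterable instead of unpacking a generator with *, and checks the total against the formula n!/(n-k)! summed over k = 1..n:

```python
from itertools import chain, permutations
from math import perm  # Python 3.8+

tpl = ("A", "E", "L")

# chain.from_iterable flattens the per-length permutation iterators lazily
words = [''.join(p)
         for p in chain.from_iterable(permutations(tpl, k)
                                      for k in range(1, len(tpl) + 1))]

# Expected total: sum of n! / (n - k)! for k = 1..n -> 3 + 6 + 6 = 15
expected = sum(perm(len(tpl), k) for k in range(1, len(tpl) + 1))
print(len(words), expected)  # 15 15
```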

How can I use different kinds of permutations in Python3?

The normal kind of permutation is:
'ABC'
↓
'ACB'
'BAC'
'BCA'
'CAB'
'CBA'
But, what if I want to do this:
'ABC'
↓
'AA'
'AB'
'AC'
'BA'
'BB'
'BC'
'CA'
'CB'
'CC'
What is this called, and how efficient would this be with arrays with hundreds of elements?

Your terminology is a bit confusing: what you have are not permutations of your characters, but rather the pairing of every possible character with every possible character: a Cartesian product.
You can use itertools.product to generate these combinations, but note that this returns an iterator rather than a container. So if you need all the combinations in a list, you need to construct a list explicitly:
from itertools import product
mystr = 'ABC'
prodlen = 2
products = list(product(mystr,repeat=prodlen))
Or, if you're only looping over these values:
for char1, char2 in product(mystr, repeat=prodlen):
    # do something with your characters
    ...
Or, if you want to generate the 2-length strings, you can do this in a list comprehension:
allpairs = [''.join(pairs) for pairs in products]
# ['AA', 'AB', 'AC', 'BA', 'BB', 'BC', 'CA', 'CB', 'CC']
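On the efficiency question: for an input of m elements and length r there are m**r results, so pairs from a few hundred elements are fine as long as you iterate instead of materializing huge lists. A sketch with a hypothetical 300-element input: product stores the input pool, but yields output tuples one at a time.

```python
from itertools import islice, product

symbols = [f'x{i}' for i in range(300)]  # hypothetical 300-element input
r = 2

# Output tuples are produced on demand; here only the first three are built
pairs = product(symbols, repeat=r)
first_three = list(islice(pairs, 3))
print(first_three)  # [('x0', 'x0'), ('x0', 'x1'), ('x0', 'x2')]

# Total number of results: polynomial in len(symbols), exponential in r
print(len(symbols) ** r)  # 90000
```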

Nothing against itertools, but if you want a little insight into how to generate these "permutations" of a string manually, you can apply modulo arithmetic to an incrementing sequence number. This works with a string of any length and any n >= 1.
The number of permutations generated is len(s) ** n.
For example, just call printPermutations("abc", 2).
def printPermutations(s, n):
    if (not s) or (n < 1):
        return
    maxpermutations = len(s) ** n
    for p in range(maxpermutations):
        perm = getSpecificPermutation(s, n, p)
        print(perm)

def getSpecificPermutation(s, n, p):
    # s is the source string
    # n is the number of characters to extract
    # p is the permutation sequence number
    result = ''
    for j in range(n):
        # The lowest base-len(s) digit of p picks the next character from the right
        result = s[p % len(s)] + result
        p = p // len(s)
    return result
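The same modulo idea can be written as a generator that yields strings instead of printing them, which makes it easy to compare with itertools.product (a sketch; the name iter_products is hypothetical). Each sequence number p is written in base len(s), and each digit selects a character.

```python
from itertools import product


def iter_products(s, n):
    """Yield all len(s) ** n strings of length n over s, in itertools.product order."""
    if not s or n < 1:
        return
    base = len(s)
    for p in range(base ** n):
        chars = []
        for _ in range(n):
            p, digit = divmod(p, base)  # peel off the lowest base-len(s) digit
            chars.append(s[digit])
        yield ''.join(reversed(chars))  # lowest digit is the rightmost character


result = list(iter_products('abc', 2))
print(result)  # ['aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']
assert result == [''.join(t) for t in product('abc', repeat=2)]
```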

You'll want to use the itertools solution. But I know what it's called...
Most people call it counting. You're being sneaky about it, but I think it's just counting in base len(set), where set is your input set (I'm assuming it truly is a set, with no repeated elements). Imagine, in your example, A -> 0, B -> 1, C -> 2. You're also asking for results with a fixed number of digits. Let me show you:
from functools import reduce

def numberToBase(n, b):
    if n == 0:
        return [0]
    digits = []
    while n:
        digits.append(int(n % b))
        n //= b  # integer division; n /= b would produce floats in Python 3
    return digits[::-1]

def count_me(set, max_digits=2):
    # Just count! From 0 to len(set) ** max_digits to be precise
    numbers = [i for i in range(len(set) ** max_digits)]
    # Convert to base len(set)
    lists_of_digits_in_base_b = [numberToBase(i, len(set)) for i in numbers]
    # Add 0s to the front (making each list of digits max_digits in length)
    prepended_with_zeros = []
    for li in lists_of_digits_in_base_b:
        prepended_with_zeros.append([0]*(max_digits - len(li)) + li)
    # Map each digit to an item in our set
    m = {index: item for index, item in enumerate(set)}
    temp = map(lambda x: [m[digit] for digit in x], prepended_with_zeros)
    # Concatenate each item
    concat_strings = map(lambda a: reduce(lambda x, y: x + y, a, ""), temp)
    return list(concat_strings)
Here are some outputs:
print(count_me("ABC", 2))
outputs:
['AA', 'AB', 'AC', 'BA', 'BB', 'BC', 'CA', 'CB', 'CC']
and
print(count_me("ABCD", 2))
outputs:
['AA', 'AB', 'AC', 'AD', 'BA', 'BB', 'BC', 'BD', 'CA', 'CB', 'CC', 'CD', 'DA', 'DB', 'DC', 'DD']
and
print(count_me("ABCD", 3))
outputs (a big one):
['AAA', 'AAB', 'AAC', 'AAD', 'ABA', 'ABB', 'ABC', 'ABD', 'ACA', 'ACB', 'ACC', 'ACD', 'ADA', 'ADB', 'ADC', 'ADD', 'BAA', 'BAB', 'BAC', 'BAD', 'BBA', 'BBB', 'BBC', 'BBD', 'BCA', 'BCB', 'BCC', 'BCD', 'BDA', 'BDB', 'BDC', 'BDD', 'CAA', 'CAB', 'CAC', 'CAD', 'CBA', 'CBB', 'CBC', 'CBD', 'CCA', 'CCB', 'CCC', 'CCD', 'CDA', 'CDB', 'CDC', 'CDD', 'DAA', 'DAB', 'DAC', 'DAD', 'DBA', 'DBB', 'DBC', 'DBD', 'DCA', 'DCB', 'DCC', 'DCD', 'DDA', 'DDB', 'DDC', 'DDD']
P.S. numberToBase courtesy of this post

As Andras Deak says, use itertools.product:
import itertools
for i, j in itertools.product('ABC', repeat=2):
    print(i + j)
