Generating pairs of letters from a given sequence - python

I have a problem to solve and would appreciate any help. I want to generate all possible two-letter strings from a given sequence. For example, from the string 'ACCG', I want to generate the list [AA, CC, GG, AC, CA, AG, GA, CG, GC].
Does anyone have an idea how I can do that?

An efficient solution can be written with the itertools module. Because the result goes through a set, the order of the output is arbitrary:
CODE
import itertools
string = 'ACCG'
num = 2
combinations = list(itertools.product(string, repeat=num))
result = [*set([''.join(tup) for tup in combinations])]
print(result)
OUTPUT
['CG', 'GG', 'GC', 'GA', 'AG', 'AA', 'CC', 'AC', 'CA']

If you want a one-liner (using product from itertools) then try this:
from itertools import product
out = [''.join(p) for p in set(product('ACCG', repeat=2))]
print(out)
Output:
['AA', 'GG', 'CC', 'GA', 'AC', 'CG', 'GC', 'CA', 'AG']
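Both answers deduplicate through a set, so the order of the result is arbitrary and can differ between runs. If a stable order matters, one option is to deduplicate with dict.fromkeys instead, which keeps the first occurrence of each pair. A minimal sketch (the name unique_pairs is only illustrative, not from the answers above):

```python
from itertools import product

def unique_pairs(sequence, length=2):
    """Return the distinct strings of `length` letters drawn from `sequence`, in first-seen order."""
    joined = (''.join(tup) for tup in product(sequence, repeat=length))
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(joined))

print(unique_pairs('ACCG'))
# ['AA', 'AC', 'AG', 'CA', 'CC', 'CG', 'GA', 'GC', 'GG']
```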

Related

Insert element in Python list after every other element list comprehension

The goal is to insert a string after every other element in the list apart from the last one:
arr0 = ['aa','bb','cc','dd']
Goal
['aa','XX','bb', 'XX','cc','XX','dd']
This topic has been addressed in posts like this, but the lists of strings used are only one character in length which affects the list comprehension. I do not have enough reputation points to comment and ask for clarification.
I have implemented it with a for loop, but I was trying to practice with list-comprehension and would appreciate insight as to where I am going wrong with it. Currently getting a SyntaxError: invalid syntax.
Example and Current Implementation
arr0 = ['aa','bb','cc','dd'] # Goal ['aa','XX','bb', 'XX','cc','XX','dd']
# Stop for range
total = len(arr0)*2-1
for i in range(1, total, 2):
    arr0.insert(i, "XX")
# Returns ['aa', 'XX', 'bb', 'XX', 'cc', 'XX', 'dd']
List Comprehension Attempt:
[el for y in [[el, 'XX'] if idx != len(arr0)-1 else el for idx, el in enumerate(arr0)] for el in y if isinstance(y, list) else el]
Breakdown
[[el, 'XX'] if idx != len(arr0)-1 else el for idx, el in enumerate(arr0)]
# Returns
# [['aa', 'XX'], ['bb', 'XX'], ['cc', 'XX'], 'dd']
In the outer comprehension, I am trying to return it as a single list of the strings. I am trying to use isinstance to check if the element is a list or not (the last item being a string) and if not return simply the string.
Edit
I really appreciate the responses. I should have included this alternative case that I do encounter where I do not want elements inserted after a 'Note' element at the end, in which case I could not perform the slice. Is negative indexing with a step possible?
# Alternative scenario
arr1 = ['aa','bb','cc','dd', 'Note']
# ['aa','XX','bb','XX','cc','XX','dd','Note']
You can simply use a nested list comprehension and strip the last element:
>>> [e for i in arr0 for e in [i, "XX"]][:-1]
['aa', 'XX', 'bb', 'XX', 'cc', 'XX', 'dd']
You can also use a .join()/.split() trick (which is probably less performant, and which breaks if any element contains a comma):
>>> ",XX,".join(arr0).split(",")
['aa', 'XX', 'bb', 'XX', 'cc', 'XX', 'dd']
A fancier way is to use itertools.chain:
>>> from itertools import chain
>>> list(chain.from_iterable(zip(arr0, ["XX"]*len(arr0))))[:-1]
['aa', 'XX', 'bb', 'XX', 'cc', 'XX', 'dd']
Edit: For the alternative case added later to the question, it is possible to slice the input and manually append the last element to the output:
>>> arr1 = ['aa','bb','cc','dd', 'Note']
>>> [e for i in arr1[:-1] for e in [i, "XX"]][:-1] + [arr1[-1]]
['aa', 'XX', 'bb', 'XX', 'cc', 'XX', 'dd', 'Note']
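The slicing approach can be wrapped in a small helper that takes the number of trailing elements to leave alone. A minimal sketch; the name intersperse and its tail parameter are hypothetical, not part of the answer above:

```python
def intersperse(items, filler, tail=0):
    """Insert `filler` between elements of `items`, leaving the last `tail` elements untouched."""
    split_at = len(items) - tail
    head, rest = items[:split_at], items[split_at:]
    # Pair every head element with a filler, then drop the trailing filler
    spaced = [e for item in head for e in (item, filler)][:-1]
    return spaced + rest

print(intersperse(['aa', 'bb', 'cc', 'dd'], 'XX'))
# ['aa', 'XX', 'bb', 'XX', 'cc', 'XX', 'dd']
print(intersperse(['aa', 'bb', 'cc', 'dd', 'Note'], 'XX', tail=1))
# ['aa', 'XX', 'bb', 'XX', 'cc', 'XX', 'dd', 'Note']
```

The function returns a new list rather than modifying its input, unlike the insert-based loop in the question.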
Just for fun, one more option to consider if you want all the logic within the list comprehension itself:
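from itertools import zip_longest
initial_list = ['aa','bb','cc','dd'] # Goal ['aa','XX','bb', 'XX','cc','XX','dd']
padded_list = [
    value
    for item in zip_longest(initial_list, ["XX"] * (len(initial_list) - 1))
    for value in item
    # zip_longest pads the shorter list with None; test for None explicitly
    # so that falsy elements such as '' are not dropped by accident
    if value is not None
]
print(padded_list)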
Output:
['aa', 'XX', 'bb', 'XX', 'cc', 'XX', 'dd']
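As for the SyntaxError in the question's attempt: a trailing if in a comprehension is a filter clause and cannot take an else, so "for el in y if isinstance(y, list) else el" does not parse. One way to keep the two-step structure from the question is to wrap the last element in a list as well, so that every item can be flattened the same way. A sketch under that assumption (the names nested and flat are illustrative):

```python
arr0 = ['aa', 'bb', 'cc', 'dd']

# Every item is a list now, including the last one, so no isinstance check is needed
nested = [[el, 'XX'] if idx != len(arr0) - 1 else [el]
          for idx, el in enumerate(arr0)]

# Flatten one level
flat = [el for pair in nested for el in pair]
print(flat)
# ['aa', 'XX', 'bb', 'XX', 'cc', 'XX', 'dd']
```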

Python code to solve classic P(n, r): Print all permutations of n objects taken r at a time without repetition

Python code to solve classic P(n, r)
Problem: Print all permutations of n objects taken r at a time without repetition.
I'm a Python learner looking for an elegant solution vs. trying to solve a coding problem at work.
Interested in seeing code to solve the classic P(n, r) permutation problem -- how to print all permutations of a string taken r characters at a time, without repeated characters.
Because learning is my focus, not interested in using the Python itertools "permutations" library function. Looked at it, but couldn't understand what it was doing. Looking for actual code to solve this problem, so I can learn the implementation.
Example: if input string s == 'abcdef', and r == 4, then n == 6.
Output would be something like: abcd abce abcf abde abdf abef ...
There are a lot of closely similar questions, but I didn't find a duplicate. Most specify "r". I want to leave r as an input parameter to keep the solution general.
This approach uses recursive generator functions, which I find very readable. It is easiest to start with combinations:
def combs(s, r):
    if not r:
        yield ''
    elif s:
        first, rest = s[0], s[1:]
        for comb in combs(rest, r-1):
            yield first + comb  # use first char ...
        yield from combs(rest, r)  # ... or don't
>>> list(combs('abcd', 2))
['ab', 'ac', 'ad', 'bc', 'bd', 'cd']
>>> list(combs('abcd', 3))
['abc', 'abd', 'acd', 'bcd']
And build permutations on top of them:
def perms(s, r):
    if not r:
        yield ''
    else:
        for comb in combs(s, r):
            for i, char in enumerate(comb):
                rest = comb[:i] + comb[i+1:]
                for perm in perms(rest, r-1):
                    yield char + perm
>>> list(perms('abc', 2))
['ab', 'ba', 'ac', 'ca', 'bc', 'cb']
>>> list(perms('abcd', 2))
['ab', 'ba', 'ac', 'ca', 'ad', 'da', 'bc', 'cb', 'bd', 'db', 'cd', 'dc']
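The permutations can also be generated directly, without going through combinations first: pick each character in turn as the first one, then recurse on the remaining characters with r-1. This is a sketch of that alternative, not the answer's method; the name perms_direct is made up for illustration, and it assumes the characters of s are distinct:

```python
def perms_direct(s, r):
    """Yield all r-length permutations of the characters of s (assumed distinct)."""
    if r == 0:
        yield ''
        return
    for i, char in enumerate(s):
        rest = s[:i] + s[i + 1:]  # every character except the chosen one
        for tail in perms_direct(rest, r - 1):
            yield char + tail

print(list(perms_direct('abc', 2)))
# ['ab', 'ac', 'ba', 'bc', 'ca', 'cb']
print(len(list(perms_direct('abcdef', 4))))
# 360, i.e. P(6, 4) = 6 * 5 * 4 * 3
```

The results come out in a different order from perms above, but the set of strings is the same.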

Create DNA Sequences of length n

How can we use recursion to calculate all DNA sequences of length n in a function?
For instance if the function is given 2, it returns ['AA', 'AC', 'AT', 'AG', 'CA', 'CC', 'CT', 'CG', 'TA', 'TC', 'TT', 'TG', 'GA', 'GC', 'GT', 'GG']
etc...
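itertools.permutations will give all orderings of the elements of a given iterable; the second argument r is the length of the tuples returned. Note that it never reuses an element, so it will not produce sequences with a repeated letter such as 'AA':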
itertools.permutations('ACGT', length)
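A quick check of what itertools.permutations returns for this input, compared with itertools.product (the variable names are illustrative):

```python
from itertools import permutations, product

no_repeats = [''.join(p) for p in permutations('ACGT', 2)]
with_repeats = [''.join(p) for p in product('ACGT', repeat=2)]

print(len(no_repeats))    # 12: each letter is used at most once per sequence
print(len(with_repeats))  # 16: letters may repeat, so 4 ** 2 sequences
print('AA' in no_repeats, 'AA' in with_repeats)  # False True
```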
Here is one way:
def all_seq(n, curr, e, ways):
    """All possible sequences of size n given elements e.
    ARGS
    n: size of sequence
    curr: a list used for constructing sequences
    e: the list of possible elements (could have been a global list instead)
    ways: the final list of sequences
    """
    if len(curr) == n:
        ways.append(''.join(curr))
        return
    for element in e:
        all_seq(n, list(curr) + [element], e, ways)
perms = []
all_seq(2, [], ['A', 'C', 'T', 'G'], perms)
print(perms)
The output:
['AA', 'AC', 'AT', 'AG', 'CA', 'CC', 'CT', 'CG', 'TA', 'TC', 'TT', 'TG', 'GA', 'GC', 'GT', 'GG']
You actually want itertools.product('ACGT', repeat=n). Note that this will grow enormously fast (4^n elements of n length).
If your assignment is to do it recursively, consider how you would get all (n+1)-length options that start with an n-length prefix. The naive recursive option might be rather slow compared to itertools, if you need to use it in anger.
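The prefix idea can be sketched as follows: every sequence of length n is a sequence of length n-1 with one more base appended. This is only one possible reading of the hint; the function name dna_sequences and the default alphabet order 'ACTG' (chosen to match the output in the question) are assumptions:

```python
from itertools import product

def dna_sequences(n, alphabet='ACTG'):
    """Return all sequences of length n over `alphabet`, built recursively from shorter prefixes."""
    if n == 0:
        return ['']  # exactly one sequence of length 0: the empty string
    return [prefix + base
            for prefix in dna_sequences(n - 1, alphabet)
            for base in alphabet]

print(dna_sequences(2))
# ['AA', 'AC', 'AT', 'AG', 'CA', 'CC', 'CT', 'CG', 'TA', 'TC', 'TT', 'TG', 'GA', 'GC', 'GT', 'GG']

# Same result as itertools.product for the same alphabet order
print(dna_sequences(3) == [''.join(p) for p in product('ACTG', repeat=3)])  # True
```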

how to get all possible strings for the alphabet letters in python?

For example, given the alphabet = 'abcd', how can I get this output in Python:
a
aa
b
bb
ab
ba
(...)
iteration by iteration.
I already tried the powerset() function that is found here on stackoverflow,
but that doesn't repeat letters in the same string.
Also, if I want to set a minimum and maximum length for the strings, how can I do that?
For example min=3 and max=4, abc, aaa, aba, ..., aaaa, abca, abcb, ...
You can use combinations_with_replacement from itertools (docs). The function combinations_with_replacement takes an iterable object as its first argument (e.g. your alphabet) and the desired length of the combinations to generate. Since you want strings of different lengths, you can loop over each desired length.
For example:
from itertools import combinations_with_replacement
def get_all_poss_strings(alphabet, min_length, max_length):
    poss_strings = []
    for r in range(min_length, max_length + 1):
        poss_strings += combinations_with_replacement(alphabet, r)
    # combinations_with_replacement returns tuples, so join them into individual strings
    return ["".join(s) for s in poss_strings]
Sample:
alphabet = "abcd"
min_length = 3
max_length = 4
get_all_poss_strings(alphabet, min_length, max_length)
Output:
['aaa', 'aab', 'aac', 'aad', 'abb', 'abc', 'abd', 'acc', 'acd', 'add', 'bbb', 'bbc', 'bbd', 'bcc', 'bcd', 'bdd', 'ccc', 'ccd', 'cdd', 'ddd', 'aaaa', 'aaab', 'aaac', 'aaad', 'aabb', 'aabc', 'aabd', 'aacc', 'aacd', 'aadd', 'abbb', 'abbc', 'abbd', 'abcc', 'abcd', 'abdd', 'accc', 'accd', 'acdd', 'addd', 'bbbb', 'bbbc', 'bbbd', 'bbcc', 'bbcd', 'bbdd', 'bccc', 'bccd', 'bcdd', 'bddd', 'cccc', 'cccd', 'ccdd', 'cddd', 'dddd']
Edit:
If order also matters for your strings (as indicated by having "ab" and "ba"), you can use the following function to get all permutations of all lengths in a given range:
from itertools import combinations_with_replacement, permutations
def get_all_poss_strings(alphabet, min_length, max_length):
    poss_strings = []
    for r in range(min_length, max_length + 1):
        combos = combinations_with_replacement(alphabet, r)
        perms_of_combos = []
        for combo in combos:
            perms_of_combos += permutations(combo)
        poss_strings += perms_of_combos
    return list(set(["".join(s) for s in poss_strings]))
Sample:
alphabet = "abcd"
min_length = 1
max_length = 2
get_all_poss_strings(alphabet, min_length, max_length)
Output:
['a', 'aa', 'ab', 'ac', 'ad', 'b', 'ba', 'bb', 'bc', 'bd', 'c', 'ca', 'cb', 'cc', 'cd', 'd', 'da', 'db', 'dc', 'dd']
You can use the product function of itertools with varying lengths. The result differs in order from the example you give, but this may be what you want. This results in a generator that you can use to get all your desired strings. This code lets you set a minimum and a maximum length of the returned strings. If you do not specify a value for parameter maxlen then the generator is infinite. Be sure you have a way to stop it or you will get an infinite loop.
import itertools
def allcombinations(alphabet, minlen=1, maxlen=None):
    thislen = minlen
    while maxlen is None or thislen <= maxlen:
        for prod in itertools.product(alphabet, repeat=thislen):
            yield ''.join(prod)
        thislen += 1

for c in allcombinations('abcd', minlen=1, maxlen=2):
    print(c)
This example gives the printout which is similar to your first example, though in a different order.
a
b
c
d
aa
ab
ac
ad
ba
bb
bc
bd
ca
cb
cc
cd
da
db
dc
dd
If you really want a full list, just use
list(allcombinations('abcd', minlen=1, maxlen=2))
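When maxlen is left out, the generator never ends, so it needs to be cut off from the outside. One way to do that is itertools.islice, which takes only the first few items. A small sketch that repeats the generator from this answer so it runs on its own (the variable name first_six is illustrative):

```python
import itertools

def allcombinations(alphabet, minlen=1, maxlen=None):
    thislen = minlen
    while maxlen is None or thislen <= maxlen:
        for prod in itertools.product(alphabet, repeat=thislen):
            yield ''.join(prod)
        thislen += 1

# No maxlen: the generator is infinite, so take only the first six strings
first_six = list(itertools.islice(allcombinations('ab'), 6))
print(first_six)
# ['a', 'b', 'aa', 'ab', 'ba', 'bb']
```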

Finding Subsequences of a large String

I am trying to get all the Subsequences of a String. Example:-
firstString = "ABCD"
O/P should be;
'ABCD', 'BCD', 'ACD', 'ABD', 'ABC', 'CD', 'BD', 'BC', 'AD', 'AC', 'AB', 'D', 'C', 'B', 'A'
For that I am using following part of code:-
#!/usr/bin/python
from __future__ import print_function
from operator import itemgetter
from subprocess import call
import math
import itertools
import operator
call(["date"])
firstArray = []
firstString = "ABCD"
firstList = list(firstString)
for L in range(0, len(firstList)+1):
    for subset in itertools.combinations(firstList, L):
        firstArray.append(''.join(subset))
firstArray.reverse()
print(firstArray)
call(["date"])
But this code is not scalable.
If I provide :-
firstString = "ABCDABCDABCDABCDABCDABCDABCD"
The program takes almost 6 mins time to complete.
---------------- Capture while running the script --------------------
python sample-0012.py
Wed Feb 8 21:30:30 PST 2017
Wed Feb 8 21:30:30 PST 2017
Can someone please help?
What you are looking for is called a "Power set" (or Powerset).
The Wikipedia definition:
a power set (or powerset) of any set S is the set of all subsets of S,
including the empty set and S itself.
A good solution might be recursive, here you can find one:
link
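For illustration, a recursive power set over a string could look like the sketch below. It is not necessarily what the linked solution does; the name powerset is an assumption. Each subsequence either includes the first character or it does not:

```python
def powerset(s):
    """Return all subsequences of the string s, including the empty string."""
    if not s:
        return ['']
    rest = powerset(s[1:])
    # Subsequences that use s[0], followed by those that skip it
    return [s[0] + sub for sub in rest] + rest

subsets = powerset('ABCD')
print(len(subsets))  # 16, i.e. 2 ** 4
print(subsets[:4])   # ['ABCD', 'ABC', 'ABD', 'AB']
```

The result doubles with every extra character, so this does not remove the exponential cost for a long input; it only shows the structure of the recursion.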
For a better grasp of the powerset concept, see
How to get all possible combinations of a list’s elements?
Otherwise, you can do it like this:
from itertools import combinations

firstString = "ABCD"
wordlist = []
for length in range(1, len(firstString) + 1):
    # Collect each distinct combination of this length once, in order
    same_length_words = []
    for word in combinations(firstString, length):
        if word not in same_length_words:
            same_length_words.append(word)
    for each_word in same_length_words:
        wordlist.append(''.join(each_word))
Try this:
from itertools import chain, combinations
firstString = 'ABCD'
data = list(firstString)
lists = chain.from_iterable(combinations(data, r) for r in range(len(data)+1))
print([''.join(i) for i in lists if i])
# ['A', 'B', 'C', 'D', 'AB', 'AC', 'AD', 'BC', 'BD', 'CD', 'ABC', 'ABD', 'ACD', 'BCD', 'ABCD']
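On the scaling problem itself: a 28-character string has 2**28 - 1 non-empty subsequences when counted by position, so any approach that builds them all will be slow, whichever function generates them. What can be avoided is holding them all in one list. A sketch of a lazy version, assuming the results can be consumed one at a time (the name iter_subsequences is illustrative):

```python
from itertools import chain, combinations

def iter_subsequences(s):
    """Lazily yield every non-empty subsequence of s, shortest first."""
    all_lengths = (combinations(s, r) for r in range(1, len(s) + 1))
    for combo in chain.from_iterable(all_lengths):
        yield ''.join(combo)

print(list(iter_subsequences('ABCD')))
# ['A', 'B', 'C', 'D', 'AB', 'AC', 'AD', 'BC', 'BD', 'CD', 'ABC', 'ABD', 'ACD', 'BCD', 'ABCD']

# Number of non-empty subsequences for the 28-character input in the question
print(2 ** 28 - 1)  # 268435455
```

Because the input in the question repeats the letters ABCD, many of those subsequences are identical strings; this sketch does not deduplicate them.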
