I'm having a small formatting issue that I can't seem to solve. I have some long strings in the form of DNA sequences. I added each to a separate list, with each letter as an individual item in its list. They are of unequal length, so I appended "N"s to the shorter of the two.
Ex:
seq1 = ['A', 'T', 'G', 'G', 'A', 'C', 'G', 'C', 'A']
seq2 = ['A', 'T', 'G', 'G', 'C', 'T', 'G']
seq2 became: ['A', 'T', 'G', 'G', 'C', 'T', 'G', 'N', 'N']
Currently, after comparing the letter in each list I get:
ATGG--G--
where '-' is a mismatch in the letters (including the "N"s).
Ideally what I would like to print is:
seq1 ATGGACGCA
     |||||||||
seq2 ATGG--G--
I've been playing around with newline characters and commas at the end of print statements, but I can't get it to work. I would like to print an identifier for each one on the same line as its sequence.
Here's the function used to compare the two seqs:
def align_seqs(orf, query):
    orf_base = list(orf)
    query_base = list(query)
    # Pad the shorter sequence with "N" so both lists have the same length
    if len(query_base) > len(orf_base):
        N = (len(query_base) - len(orf_base))
        for i in range(N):
            orf_base.append("N")
    elif len(query_base) < len(orf_base):
        N = (len(orf_base) - len(query_base))
        for i in range(N):
            query_base.append("N")
    align = []
    # Keep matching letters, mark mismatches with "-"
    for i in range(0, len(orf_base)):
        if orf_base[i] == query_base[i]:
            align.append(orf_base[i])
        else:
            align.append("-")
    print(''.join(align))
At the present time, I'm just printing the "bottom" portion of what I want to print.
All help is appreciated.
So, here's a solution for you that works with long strings:
s1 = 'ATAAGGATAAGGATAAGGATAAGGATAAGGATAAGGATAAGGATAAGGATAAGGATAAGG'
s2 = 'A-AAGGA-AAGGA-AAGGA-AAGGA-AAGGA-AAGGA-AAGGA-AAGGA-AAGGA-AAGG'
#assumes both sequences are of same length (post-alignment)
def print_align(seq1, seq2, length):
    # the "seq1: " label takes 6 characters, so each row shows length-6 bases
    while len(seq1) > 0:
        print("seq1: " + seq1[:length-6])
        print("      " + '|'*len(seq1[:length-6]))
        print("seq2: " + seq2[:length-6] + "\n")
        seq1 = seq1[length-6:]
        seq2 = seq2[length-6:]

print_align(s1, s2, 30)
The output is:
seq1: ATAAGGATAAGGATAAGGATAAGG
      ||||||||||||||||||||||||
seq2: A-AAGGA-AAGGA-AAGGA-AAGG

seq1: ATAAGGATAAGGATAAGGATAAGG
      ||||||||||||||||||||||||
seq2: A-AAGGA-AAGGA-AAGGA-AAGG

seq1: ATAAGGATAAGG
      ||||||||||||
seq2: A-AAGGA-AAGG
Which I believe is what you want. You can play around with the length parameter in order to get the lines to display properly (each line is cut off after reaching the length specified by that parameter). For example, if I call print_align(s1, s2, 39) I get:
seq1: ATAAGGATAAGGATAAGGATAAGGATAAGGATA
      |||||||||||||||||||||||||||||||||
seq2: A-AAGGA-AAGGA-AAGGA-AAGGA-AAGGA-A

seq1: AGGATAAGGATAAGGATAAGGATAAGG
      |||||||||||||||||||||||||||
seq2: AGGA-AAGGA-AAGGA-AAGGA-AAGG
This will have a much more reasonable result when you try it with huge (>1000bp) sequences.
Note that the function takes two sequences of the same length as input, so this is just to print it nicely after you've done all the hard aligning work.
P.S. Generally in sequence alignment one only displays the bar | for matching nucleotides. The solution is pretty easy and you should be able to figure it out (if you have trouble, though, let me know).
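For reference, here is a minimal sketch of one way the P.S. could be done. It is not the answer's own code: match_line and format_alignment are hypothetical names, and the width parameter counts bases per row instead of using the length-6 convention above. It assumes both sequences are already aligned to the same length.

```python
def match_line(seq1, seq2):
    """Return '|' where the two aligned sequences agree and ' ' elsewhere."""
    return ''.join('|' if a == b else ' ' for a, b in zip(seq1, seq2))


def format_alignment(seq1, seq2, width=24):
    """Build the wrapped three-row display as one string (width = bases per row)."""
    blocks = []
    for start in range(0, len(seq1), width):
        a = seq1[start:start + width]
        b = seq2[start:start + width]
        blocks.append("seq1: " + a + "\n      " + match_line(a, b) + "\nseq2: " + b)
    return "\n\n".join(blocks)


print(format_alignment('ATGGACGCA', 'ATGG--G--'))
```

Returning a string instead of printing inside the loop makes the result easy to test or write to a file.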
If I understand correctly, this is a formatting question. I recommend looking at str.format(). Assuming you can get your sequences into strings (as you did for seq2 with align), try:
seq1 = 'ATGGACGCA'
seq2 = 'ATGG--G--'
print(' seq1: {}\n       {}\n seq2: {}'.format(seq1, len(seq1)*'|', seq2))
A little hacky, but gets the job done. The arguments of format() replace the {}'s in order in the given string. I get:
 seq1: ATGGACGCA
       |||||||||
 seq2: ATGG--G--
You could always try something simple like the following, which does not assume the sequences are the same size; you can adjust it as you see fit.
def printSequences(seq1, seq2):
    print('seq1', seq1)
    print('    ', '|'*max(len(seq1), len(seq2)))
    print('seq2', seq2)
I know the most popular permutation algorithms (thanks to wonderful question/answer on SO, and other related sites, such as Wikipedia, etc), but I recently wanted to see if I could get the Nth permutation without exhausting the whole permutation space.
Factorial comes to mind, so I ended up looking at posts such as this one that implements the unrank and rank algorithms, as well as many, many others. (Here I count pages on other sites as "posts" too.)
I stumbled upon this ActiveState recipe, which seemed to fit what I wanted to do, but it doesn't support doing the reverse (using the result of the function and reusing the index to get back the original sequence/order).
I also found a similar and related answer on SO: https://stackoverflow.com/a/38166666/12349101
But it has the same problem as above.
I tried and made different versions of the unrank/rank implementations above, but they require that the sorted sequence be passed as well as the index given by the rank function. If a random index (even one within the range of the total permutation count) is given, it won't work (most of the times I tried, at least).
I don't know how to implement this and I don't think I saw anyone on SO doing this yet. Is there any existing algorithm or way to do this/approach this?
To make things clearer:
Here is the Activestate recipe I posted above (at least the one posted in the comment):
from functools import reduce

def NPerms (seq):
    "computes the factorial of the length of <seq>"
    return reduce(lambda x, y: x * y, range (1, len (seq) + 1), 1)

def PermN (seq, index):
    "Returns the <index>th permutation of <seq> (in proper order)"
    seqc = list (seq [:])
    result = []
    fact = NPerms (seq)
    index %= fact
    while seqc:
        fact = fact // len (seqc)
        choice, index = index // fact, index % fact
        result += [seqc.pop (choice)]
    return result
As mentioned, this handles doing part of what I mentioned in the title, but I don't know how to get back the original sequence/order using both the result of that function + the same index used.
Say I use the above on a string such as hello world inside a list:
print(PermN(list("hello world"), 20))
This outputs: ['h', 'e', 'l', 'l', 'o', ' ', 'w', 'd', 'r', 'o', 'l']
Now to see if this can go back to the original using the same index + result of the above:
print(PermN(['h', 'e', 'l', 'l', 'o', ' ', 'w', 'd', 'r', 'o', 'l'], 20))
Output: ['h', 'e', 'l', 'l', 'o', ' ', 'w', 'l', 'r', 'd', 'o']
I think this does what you want, and has the benefit that it doesn't matter what the algorithm behind PermN is:
def NmreP(seq,index):
    # Copied from PermN
    seqc = list (seq [:])
    result = []
    fact = NPerms (seq)
    index %= fact
    seq2 = list(range(len(seq))) # Make a list of seq indices
    fwd = PermN(seq2,index) # Arrange them as PermN would
    result = [0]*len(seqc) # Make an array to fill with results
    for i,j in enumerate(fwd): # For each position, find the element in seqc in the position this started from
        result[j] = seqc[i]
    return result
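To check the round trip, here is a self-contained sketch of the same idea. The names nth_permutation and undo_permutation are hypothetical stand-ins for PermN and NmreP, and math.factorial replaces the reduce-based NPerms; otherwise the logic is meant to mirror the code above.

```python
from math import factorial


def nth_permutation(seq, index):
    """Same idea as PermN: pick elements using the factorial number system."""
    pool = list(seq)
    index %= factorial(len(pool))
    result = []
    for remaining in range(len(pool), 0, -1):
        block = factorial(remaining - 1)      # permutations per leading choice
        choice, index = divmod(index, block)
        result.append(pool.pop(choice))
    return result


def undo_permutation(permuted, index):
    """Same idea as NmreP: permute the positions, then put each item back."""
    positions = nth_permutation(range(len(permuted)), index)
    original = [None] * len(permuted)
    for i, j in enumerate(positions):
        original[j] = permuted[i]             # the item at i came from position j
    return original


word = list("hello world")
shuffled = nth_permutation(word, 20)
print(shuffled)
print(undo_permutation(shuffled, 20) == word)  # True
```

Because the inverse only looks at positions, it also works when the sequence contains repeated items (such as the two "o"s and three "l"s here).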
I'd like to count the elements in y that exist in the same order as in x. So for:
x = [a,b,c,d,e,f,g,h]
y = [c,a,b,z,k,f,g,d,s,t]
I'd want a function that returns 4, as 'a', 'b', 'c', 'd' are in y but 'e' is not. y is random but it never has any duplicates. x is constant and len(x) = 8.
x and y are both lists of strings.
That means for:
x = [a,b,c,d,e,f,g,h]
y = [c,a,k,z,k,f,g,d,s,t]
I'd like the function to return 1.
I've tried something with a nested loop:
i = 0
h = 0
for s in x:
    for t in y:
        if s == t:
            i = i + 1 #i is what I'm looking for in the end.
            h = 0
        elif h == 9:
            break
        else:
            h = h + 1
My idea was to count the delta from one 't' to the next 't' but I can't get it to work properly as I just can't wrap my head around the required math.
Thanks a lot for your suggestions already and please enjoy your day!
In my previous answer, the code would throw an error when all elements of x were in y - so, here is my revised code:
print(([value in y for value in x] + [False]).index(False))
It does the job, but it's really hard to read. Let's split it up (the comments explain what each line does):
# This is our new list. In the previous code, this was a tuple - I'll get into
# that later. Basically, for each element in x, it checks whether that value is in
# y, resulting in a new list of boolean values. (In the last code, I used the map
# function with a lambda, but this is definitely more readable).
# For example, in OP's example, this list would look like
# [True, True, True, True, False, True, True, False]
new_list = [value in y for value in x]
# This is the step lacking with the old code and why I changed to a list.
# This adds a last False value, which prevents the index function from throwing an
# error if it doesn't find a value in the list (it instead returns the index this
# last False value is at). I had to convert from a tuple because
# you cannot add to a tuple, but you can add to a list. I was using a tuple in the
# last code because it is supposed to be faster than a list.
new_list_with_overflow = (new_list + [False])
# This is the final result. The index function gets the first element that is equal
# to False - meaning, it gets the index of the first element where x is not in y.
result = new_list_with_overflow.index(False)
# Finally, print out the result.
print(result)
Hopefully this explains what that one line is doing!
Some more links for reading:
What's the difference between lists and tuples?
How do I concatenate two lists in Python?
Python Docs on List Comprehensions
Here is another (arguably less readable) code snippet:
print((*(value in y for value in x), False).index(False))
A possible benefit of this code is that it uses a tuple, which may make it slightly faster than the previous code, with the drawback of being a bit harder to understand. It is also not supported by older versions of Python (unpacking with * inside a tuple requires Python 3.5 or later). However, I can leave this as an exercise for you to figure out! You might want to check out what the * does.
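As a hint for that exercise, here is a minimal sketch of what the * does inside the tuple, split into steps. The sample lists are the ones from the question; the variable names are only for illustration.

```python
x = ["a", "b", "c", "d", "e", "f", "g", "h"]
y = ["c", "a", "b", "z", "k", "f", "g", "d", "s", "t"]

# A generator expression is lazy: nothing is evaluated yet.
flags = (value in y for value in x)

# The * unpacks every item the generator yields into the new tuple,
# and the trailing False is the sentinel that index() can always find.
flags_with_sentinel = (*flags, False)
print(flags_with_sentinel)
# (True, True, True, True, False, True, True, False, False)

print(flags_with_sentinel.index(False))  # 4
```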
EDIT: See the new answer. The code below only works when at least one element of x is not in y; otherwise, it throws an error. Also, the new solutions are just more readable.
A "pythonic" one-liner:
print(tuple(map(lambda value: value in y, x)).index(False))
Here's the function you need:
def counter(x, y):
    print("_" * 50)
    print("x: " + str(x))
    print("y: " + str(y))
    for i, letter in enumerate(x):
        if letter not in y:
            break
    else:
        # no break: every element of x is in y, so count them all
        i = len(x)
    print("i: " + str(i))
    return i

counter(
    ["a","b","c","d","e","f","g","h"],
    ["c","a","b","z","k","f","g","d","s","t"]
)
counter(
    ["a","b","c","d","e","f","g","h"],
    ["a","b","z","k","f","g","d","s","t"]
)
counter(
    ["a","b","c","d","e","f","g","h"],
    ["c","a","b","z","k","f","g","d","s","t", "e"]
)
Output:
__________________________________________________
x: ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
y: ['c', 'a', 'b', 'z', 'k', 'f', 'g', 'd', 's', 't']
i: 4
__________________________________________________
x: ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
y: ['a', 'b', 'z', 'k', 'f', 'g', 'd', 's', 't']
i: 2
__________________________________________________
x: ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
y: ['c', 'a', 'b', 'z', 'k', 'f', 'g', 'd', 's', 't', 'e']
i: 7
Using itertools.takewhile:
from itertools import takewhile
result = len(list(takewhile(lambda item : item in y, x)))
It takes every item in x starting from the first item in x until the condition lambda item : item in y is no longer satisfied.
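Here is a small self-contained sketch of the takewhile approach, wrapped in a function so it can be tried on the examples from the question. count_leading_matches is a hypothetical name, and converting y to a set is an extra assumption (the items must be hashable, which strings are) that only speeds up the membership test.

```python
from itertools import takewhile


def count_leading_matches(x, y):
    """Count how many items at the start of x are also present in y."""
    lookup = set(y)  # assumes the items are hashable, e.g. strings
    return sum(1 for _ in takewhile(lambda item: item in lookup, x))


x = ["a", "b", "c", "d", "e", "f", "g", "h"]
print(count_leading_matches(x, ["c", "a", "b", "z", "k", "f", "g", "d", "s", "t"]))  # 4
print(count_leading_matches(x, ["c", "a", "k", "z", "k", "f", "g", "d", "s", "t"]))  # 1
print(count_leading_matches(x, x))  # 8
```

Unlike the index(False) variants, this needs no sentinel when every element of x is in y.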
I am looking to get the number of similar characters between two lists.
The first list is:
list1=['e', 'n', 'z', 'o', 'a']
The second list is going to be a word user inputted turned into a list:
word=input("Enter word")
word=list(word)
I'll run this function below to get the number of similitudes in the two lists:
def getSimilarItems(word,list1):
    counter = 0
    for i in word:
        for j in list1:
            if i in j:
                counter = counter + 1
    return counter
What I don't know how to do is get the number of similitudes for each item of the list (which is going to be either 0 or 1, as the word is going to be split into a list where each item is a character).
Help would be VERY appreciated :)
For example:
If the word inputted by the user is afez:
I'd like to run the function:
wordcount= getSimilarItems(word,list1)
And get this as an output:
>>>1 (because a from afez is in list ['e', 'n', 'z', 'o', 'a'])
>>>0 (because f from afez isn't in list ['e', 'n', 'z', 'o', 'a'])
>>>1 (because e from afez is in list ['e', 'n', 'z', 'o', 'a'])
>>>1 (because z from afez is in list ['e', 'n', 'z', 'o', 'a'])
Sounds like you simply want:
def getSimilarItems(word,list1):
    return [int(letter in list1) for letter in word]
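A quick usage sketch with the example from the question (the function is renamed get_similar_items here only to keep the snippet self-contained):

```python
def get_similar_items(word, letters):
    # One entry per character of word: 1 if it appears in letters, else 0
    return [int(letter in letters) for letter in word]


list1 = ['e', 'n', 'z', 'o', 'a']
flags = get_similar_items("afez", list1)
print(flags)       # [1, 0, 1, 1]
print(sum(flags))  # 3
```

Summing the list gives back the single total that the original function was counting.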
"What I don't know how to do is how to get the number of similitudes for each item of the list(which is going to be either 0 or 1 as the word is going to be split into a list where an item is a character)."
I assume that instead of counting the number of items in the list, you want to get the individual match result for each element.
For that you can use a dictionary or a list, and return that from your function.
Going off the assumption that the input is going to be the same length as the list,
def getSimilarItems(list1,list2):
    result = []  # renamed from "list" so the built-in is not shadowed
    for i in list2:
        for j in list1:
            # one entry is appended per (i, j) pair
            if i in j:
                result.append(1)
            else:
                result.append(0)
    return result
Based off your edit,
def getSimilarItems(list1,list2):
    for i in list2:
        if i in list1:
            print('1 (because ' + i + ' from temp_word is in list ' + str(list1) + ')')
        else:
            print("0 (because " + i + " from temp_word isn't in list " + str(list1) + ")")
Look at Julien's answer if you want a more condensed version (I'm not very good with list comprehension)
I wrote a function with two parameters. One is an empty string and the other is a string word. My assignment is to use recursion to reverse the word and place it in the empty string. Just as I thought I'd got it, I received an "out of memory" error. I wrote the code so that it takes the word, turns it into a list, flips it backwards, places the first letter in the empty string, then deletes that letter from the list so the recursion can handle each letter in turn. Then it compares the length of the original word to the length of the empty string (I made lists so they can be compared) so that when they're equal the recursion will end, but I don't know what's going wrong.
def reverseString(prefix, aStr):
    num = 1
    if num > 0:
        #prefix = ""
        aStrlist = list(aStr)
        revaStrlist = list(reversed(aStrlist))
        revaStrlist2 = list(reversed(aStrlist))
        prefixlist = list(prefix)
        prefixlist.append(revaStrlist[0])
        del revaStrlist[0]
        if len(revaStrlist2)!= len(prefixlist):
            aStr = str(revaStrlist)
            return reverseString(prefix,aStr)
When writing something recursive I try to think about two things:
1. The condition to stop the recursion.
2. What I want one iteration to do, and how I can pass that progress to the next iteration.
Also, I'd recommend getting one iteration working first, then worrying about the function calling itself again. Otherwise it can be harder to debug.
Anyway, applying this to your logic:
1. Stop when the length of the output string matches the length of the input string.
2. Add one letter to the new list in reverse; to maintain progress, pass the list accumulated so far to the next call.
I wanted to just modify your code slightly, as I thought that would help you learn the most, but I was having a hard time with that, so I tried to write what I would do with your logic.
Hopefully you can still learn something from this example.
def reverse_string(input_string, output_list=None):
    # use None as the default: a mutable default list would be shared between calls
    if output_list is None:
        output_list = []
    # condition to keep going: if the lengths don't match we still have work to do, otherwise output the result
    if len(output_list) < len(input_string):
        # let's see how much we have done so far.
        # use the length of the current new list as a way to get the character we are on
        # as we are reversing it, we count back from the end of the string
        # indexes are zero-based, so the last character is at len(input_string) - 1
        character_index = len(input_string)-1 - len(output_list)
        # then add it to our output list
        output_list.append(input_string[character_index])
        # output_list is our progress so far, pass it to the next iteration
        return reverse_string(input_string, output_list)
    else:
        # combine the output list back into a string when we are all done
        return ''.join(output_list)

if __name__ == '__main__':
    print(reverse_string('hello'))
This is what the recursion will look like for this code
1.
character_index = 5-1 - 0
character_index is set to 4
output_list so far = ['o']
reverse_string('hello', ['o'])
2.
character_index = 5-1 - 1
character_index is set to 3
output_list so far = ['o', 'l']
reverse_string('hello', ['o', 'l'])
3.
character_index = 5-1 - 2
character_index is set to 2
output_list so far = ['o', 'l', 'l']
reverse_string('hello', ['o', 'l', 'l'])
4.
character_index = 5-1 - 3
character_index is set to 1
output_list so far = ['o', 'l', 'l', 'e']
reverse_string('hello', ['o', 'l', 'l', 'e'])
5.
character_index = 5-1 - 4
character_index is set to 0
output_list so far = ['o', 'l', 'l', 'e', 'h']
reverse_string('hello', ['o', 'l', 'l', 'e', 'h'])
6. lengths match just print what we have!
olleh
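For comparison, the same two ideas (a stop condition, and passing progress to the next call) can be sketched with the two-parameter shape from the question. This is only an illustration under that assumption, not the assignment's required solution, and reverse_into_prefix is a hypothetical name.

```python
def reverse_into_prefix(prefix, a_str):
    # Stop condition: nothing left to move, so prefix holds the reversed word
    if not a_str:
        return prefix
    # One step: move the first remaining letter to the front of prefix,
    # then recurse on the rest of the word
    return reverse_into_prefix(a_str[0] + prefix, a_str[1:])


print(reverse_into_prefix("", "hello"))  # olleh
```

Each call shortens a_str by one character, so the recursion is guaranteed to reach the stop condition.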
I have a string, for example:
"ab(abcds)kadf(sd)k(afsd)(lbne)"
I want to split it to a list such that the list is stored like this:
a
b
abcds
k
a
d
f
sd
k
afsd
lbne
I need each character outside the parentheses as a separate element, and each group inside parentheses as a single element.
I am not able to think of any solution to this problem.
You can use iter to make an iterator and use itertools.takewhile to extract the strings between the parens:
from itertools import takewhile

s = "ab(abcds)kadf(sd)k(afsd)(lbne)"
it = iter(s)
print([ch if ch != "(" else "".join(takewhile(lambda x: x != ")", it)) for ch in it])
['a', 'b', 'abcds', 'k', 'a', 'd', 'f', 'sd', 'k', 'afsd', 'lbne']
If ch is not equal to ( we just take the char else if ch is a ( we use takewhile which will keep taking chars until we hit a ) .
Or, using re.findall, get all strings enclosed in () with \((.+?)\) and all other characters with (\w):
import re
print([''.join(tup) for tup in re.findall(r'\((.+?)\)|(\w)', s)])
['a', 'b', 'abcds', 'k', 'a', 'd', 'f', 'sd', 'k', 'afsd', 'lbne']
You just need to use the magic of 're.split' and some logic.
import re
string = "ab(abcds)kadf(sd)k(afsd)(lbne)"
temp = []
x = re.split(r'[(]',string)
#x = ['ab', 'abcds)kadf', 'sd)k', 'afsd)', 'lbne)']
for i in x:
    if ')' not in i:
        temp.extend(list(i))
    else:
        t = re.split(r'[)]',i)
        temp.append(t[0])
        temp.extend(list(t[1]))
print(temp)
#temp = ['a', 'b', 'abcds', 'k', 'a', 'd', 'f', 'sd', 'k', 'afsd', 'lbne']
Have a look at the difference between append and extend.
I hope this helps.
You have two options. The really easy one is to just iterate over the string. For example:
my_string = "ab(abcds)kadf(sd)k(afsd)(lbne)"
my_list = []
in_parens=False
buffer=''
for char in my_string:
    if char =='(':
        in_parens=True
    elif char==')':
        in_parens = False
        my_list.append(buffer)
        buffer=''
    elif in_parens:
        buffer+=char
    else:
        my_list.append(char)
The other option is regex.
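To compare the two options side by side, here is a minimal sketch that wraps the loop in a function and checks it against a regex version. split_groups and split_groups_re are hypothetical names, and both assume the parentheses are balanced and not nested.

```python
import re


def split_groups(text):
    """Character-by-character scan; assumes balanced, non-nested parentheses."""
    parts, buffer, in_parens = [], '', False
    for char in text:
        if char == '(':
            in_parens = True
        elif char == ')':
            in_parens = False
            parts.append(buffer)
            buffer = ''
        elif in_parens:
            buffer += char
        else:
            parts.append(char)
    return parts


def split_groups_re(text):
    """Regex version: a parenthesized run of word characters, or one word character."""
    return [group or single for group, single in re.findall(r'\((\w*)\)|(\w)', text)]


sample = "ab(abcds)kadf(sd)k(afsd)(lbne)"
print(split_groups(sample))
print(split_groups(sample) == split_groups_re(sample))  # True
```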
I would suggest regex. It is worth practicing.
Try: Python re. If you are new to re it may take a bit of time, but you can do all kinds of string manipulation once you get it.
import re
search_string = 'ab(abcds)kadf(sd)k(afsd)(lbne)'
re_pattern = re.compile(r'(\w)|\((\w*)\)') # Match a single character, or the characters inside parentheses
print([x if x else y for x,y in re_pattern.findall(search_string)])