Let the input sequences be X[0..m-1] and Y[0..n-1] of lengths m and n respectively. And let L(X[0..m-1], Y[0..n-1]) be the length of LCS of the two sequences X and Y. Following is the recursive definition of L(X[0..m-1], Y[0..n-1]).
If last characters of both sequences match (or X[m-1] == Y[n-1]) then
L(X[0..m-1], Y[0..n-1]) = 1 + L(X[0..m-2], Y[0..n-2])
If last characters of both sequences do not match (or X[m-1] != Y[n-1]) then
L(X[0..m-1], Y[0..n-1]) = MAX ( L(X[0..m-2], Y[0..n-1]), L(X[0..m-1], Y[0..n-2]) )
How do we solve the problem if the lengths are different? And how do we print the respective sequences?
It doesn't matter whether the input strings have the same length or not; this is taken care of by the base case of the recursion.
if (m == 0 || n == 0)
    return 0;
If we reach the end of either string, the recursion stops and unwinds from there.
Also the example you mentioned in comment:
ABCEFG and ABXDE
First we compare the last characters of both strings. In this case, they are not the same.
So we try two cases:
Remove the last character from the first string and compare the result with the second string.
Remove the last character from the second string and compare the result with the first string.
And return the maximum of the two cases.
(As a side note, if the last characters had matched, we would add 1 to our answer and remove the last character from both strings.)
This process continues until either string reaches its end, at which point the base case of the recursion is satisfied and the recursion returns.
So it doesn't matter whether the original strings have the same length or not.
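The recursion described above can be sketched in Python as follows. This is a minimal, unoptimized illustration rather than the original poster's code; the function name lcs_length is chosen here for the example, and m and n are passed explicitly to mirror the definition.

```python
def lcs_length(x, y, m=None, n=None):
    """Length of the longest common subsequence of x[0..m-1] and y[0..n-1]."""
    if m is None:
        m = len(x)
    if n is None:
        n = len(y)
    # Base case: one of the prefixes is empty, so nothing is left to match.
    if m == 0 or n == 0:
        return 0
    # Last characters match: count the match and drop it from both prefixes.
    if x[m - 1] == y[n - 1]:
        return 1 + lcs_length(x, y, m - 1, n - 1)
    # Otherwise drop the last character from one side at a time and keep the best.
    return max(lcs_length(x, y, m - 1, n), lcs_length(x, y, m, n - 1))


# The two strings have different lengths; the base case handles that.
print(lcs_length("ABCEFG", "ABXDE"))  # prints 3 (for example "ABE")
```

Plain recursion like this takes exponential time in the worst case, so it is only suitable for short strings; caching results per (m, n) pair is the usual remedy.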
I hope this is a simple question! I am trying to reverse a number and get the digits in the 'even' positions. When I try to do this within one string slice, I get just a single digit, even when I am expecting more. When I do it as two slices, I get the correct answer, but I am unsure why.
For example, if I have the number 512341234, I would expect it to give me 3131, as I have first reversed the string (432143215) and then taken the even position numbers (4[3]2[1]4[3]2[1]5).
Below is the code which I have tried to use to make it work, but doing it as one slice only returns the single digit, whereas doing it as two means it returns the expected value. Why is this?
num = 512341234
str(num)[1::-2] #returns 1
str(num)[::-1][1::2] #returns 3131
Thanks!
Noah
1::-2 means to start at position 1 (the second character) and go backwards two characters at a time. You want to start somewhere near the end of the string, e.g.
>>> num = 512341234
>>> str(num)[-1::-2]
'42425'
>>> num = 512341234
>>> str(num)[-2::-2]
'3131'
But you’ll have to pick -1 or -2 based on which one of those characters is in an even position (i.e. based on the length of the string) to do this.
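As a small sketch of the slicing behaviour discussed here: if "even position" is counted in the reversed string, as in the question, then starting at -2 gives the same result as the two-slice version regardless of the string's length. The choice between -1 and -2 only arises if positions are counted from the start of the original string. The helper name even_positions_reversed is made up for this example.

```python
s = str(512341234)

# Start at index 1 and step backwards by 2: the next index would fall
# before the start of the string, so only one character is returned.
print(s[1::-2])  # '1'


def even_positions_reversed(s):
    """Characters at positions 2, 4, 6, ... (1-based) of the reversed string."""
    return s[-2::-2]


# Check against the two-slice version for an odd-length and an even-length input.
for text in ["512341234", "12345678"]:
    assert even_positions_reversed(text) == text[::-1][1::2]
    print(text, even_positions_reversed(text))
```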
How is this working? It checks if a string contains each character from a-z at least once?
import string
def ispangram(str1, alphabet=string.ascii_lowercase):
    alphaset = set(alphabet)
    return alphaset <= set(str1.lower())
This returns True for example:
ispangram("The quick brown fox jumps over the lazy dog")
I can only assume it has something to do with lexicographical ordering as stated here, but I am still a bit confused.
Comparing two lists using the greater than or less than operator
When I read the link in this SO question:
https://docs.python.org/3/tutorial/datastructures.html#comparing-sequences-and-other-types
It says:
Sequence objects may be compared to other objects with the same
sequence type. The comparison uses lexicographical ordering: first the
first two items are compared, and if they differ this determines the
outcome of the comparison; if they are equal, the next two items are
compared, and so on, until either sequence is exhausted. If two items
to be compared are themselves sequences of the same type, the
lexicographical comparison is carried out recursively. If all items of
two sequences compare equal, the sequences are considered equal. If
one sequence is an initial sub-sequence of the other, the shorter
sequence is the smaller (lesser) one. Lexicographical ordering for
strings uses the Unicode code point number to order individual
characters. Some examples of comparisons between sequences of the same
type.
But this isn't clear to me.
It is a set operation, not a list operation. It is equivalent to:
alphaset.issubset(set(str1.lower()))
s <= t
s.issubset(t)
Test whether every element in s is in t.
See here:
https://docs.python.org/2/library/sets.html
Edit: See here for the current documentation of the set type. The old version gives an easier explanation of the comparisons, though.
No. It's comparing two sets. So it's converting the input string to lower case and then using Python's set type to compare it with the set of lowercase letters.
This is a very useful (and fast) technique for comparing two lists to see which members they have in common and which differ.
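To make the set comparison concrete, here is a short sketch. The subset test is the same one used in ispangram; the helper missing_letters is a made-up name for this example and uses set difference to show which letters are absent.

```python
import string


def missing_letters(text, alphabet=string.ascii_lowercase):
    """Letters of alphabet that never occur in text (case-insensitive)."""
    return sorted(set(alphabet) - set(text.lower()))


sentence = "The quick brown fox jumps over the lazy dog"

# s <= t on sets asks: is every element of s also in t?
print(set(string.ascii_lowercase) <= set(sentence.lower()))  # True
print(missing_letters(sentence))                             # []
print(missing_letters("abc", alphabet="abcd"))               # ['d']
```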
def pangram(s):
    alphabet = set('abcdefghijklmnopqrstuvwxyz')
    s = s.replace(' ', '').lower()
    s = sorted(s)
    count = {}
    # alphabet could be one string within '' later sorted, but I just went straight to the point.
    # After initializing my dictionary as empty, we start the count.
    for letter in s:
        if letter in count:
            count[letter] += 1
        else:
            count[letter] = 1
    # Every alphabet letter that never appeared gets a count of 0.
    for letter in alphabet:
        if letter not in count:
            count[letter] = 0
    # Report each missing letter, then print whether s is a pangram.
    is_pangram = True
    for letter in sorted(alphabet):
        if count[letter] == 0:
            print(letter + ' missing!')
            is_pangram = False
    print(is_pangram)
Given a string, let's say "TATA__", I need to find the total number of differences between adjacent characters in that string, i.e. there is a difference between T and A, but not between A and A, or _ and _.
My code more or less tells me this. But when a string such as "TTAA__" is given, it doesn't work as planned.
I need to take a character in that string, and check if the character next to it is not equal to the first character. If it is indeed not equal, I need to add 1 to a running count. If it is equal, nothing is added to the count.
This is what I have so far:
def num_diffs(state):
    count = 0
    for char in state:
        if char != state[char2]:
            count += 1
        char2 += 1
    return count
When I run it using num_diffs("TATA__") I get 4 as the response. When I run it with num_diffs("TTAA__") I also get 4. Whereas the answer should be 2.
If any of that makes sense at all, could anyone help in fixing it or pointing out where my error lies? I have a feeling it has to do with state[char2]. Sorry if this seems like a trivial problem, it's just that I'm totally new to the Python language.
import operator
def num_diffs(state):
    return sum(map(operator.ne, state, state[1:]))
To open this up a bit: it maps != (operator.ne) over state and state beginning at the 2nd character. The map function accepts multiple iterables as arguments and passes elements from those one by one as positional arguments to the given function, until one of the iterables is exhausted (state[1:] in this case will stop first).
The map results in an iterable of boolean values, but since bool in python inherits from int you can treat it as such in some contexts. Here we are interested in the True values, because they represent the points where the adjacent characters differed. Calling sum over that mapping is an obvious next step.
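As a quick illustration of the intermediate values, the sketch below materializes the mapping as a list (something the one-liner above does not need to do) so the booleans can be inspected before they are summed.

```python
import operator

state = "TTAA__"

# Pairs compared: (T,T), (T,A), (A,A), (A,_), (_,_)
flags = list(map(operator.ne, state, state[1:]))
print(flags)       # [False, True, False, True, False]

# True counts as 1 and False as 0 when summed.
print(sum(flags))  # 2
```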
Apart from the string slicing the whole thing runs using iterators in python3. It is possible to use iterators over the string state too, if one wants to avoid slicing huge strings:
import operator
from itertools import islice
def num_diffs(state):
    return sum(map(operator.ne,
                   state,
                   islice(state, 1, len(state))))
There are a couple of ways you might do this.
First, you could iterate through the string using an index, and compare each character with the character at the previous index.
Second, you could keep track of the previous character in a separate variable. The second seems closer to your attempt.
def num_diffs(s):
    count = 0
    prev = None
    for ch in s:
        if prev is not None and prev != ch:
            count += 1
        prev = ch
    return count
prev is the character from the previous loop iteration. You assign it to ch (the current character) at the end of each iteration so it will be available in the next.
You might want to investigate Python's groupby function which helps with this kind of analysis.
from itertools import groupby
def num_diffs(seq):
    return len(list(groupby(seq))) - 1
for test in ["TATA__", "TTAA__"]:
    print(test, num_diffs(test))
This would display:
TATA__ 4
TTAA__ 2
The groupby() function works by grouping consecutive identical entries together. For each run it yields a key and a group: the key is the repeated entry, and the group is an iterator over the matching entries. So each new group after the first marks a point where adjacent entries differ, which is why the number of groups minus one is the answer.
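To see the groups themselves, the following sketch lists each key together with the length of its run. The helper name runs is made up for this example.

```python
from itertools import groupby


def runs(seq):
    """Return (key, run_length) for each run of consecutive equal items."""
    return [(key, len(list(group))) for key, group in groupby(seq)]


print(runs("TTAA__"))  # [('T', 2), ('A', 2), ('_', 2)]
print(runs("TATA__"))  # [('T', 1), ('A', 1), ('T', 1), ('A', 1), ('_', 2)]
```

Three runs mean two changes between adjacent characters, and five runs mean four.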
Trying to make as few modifications to your original code as possible:
def num_diffs(state):
    count = 0
    for char2 in range(1, len(state)):
        if state[char2 - 1] != state[char2]:
            count += 1
    return count
One of the problems with your original code was that the char2 variable was not initialized within the body of the function, so it was impossible to predict the function's behaviour.
However, working with indices is not the most Pythonic way and it is error prone (see comments for a mistake that I made). You may want to rewrite the function so that it makes one pass over a pair of strings, one pair of characters at a time:
def num_diffs(state):
    count = 0
    for char1, char2 in zip(state[:-1], state[1:]):
        if char1 != char2:
            count += 1
    return count
Finally, that very logic can be written much more succinctly; see @Ilja's answer.
I was solving some programming problems on the internet and this one caught my interest. The problem is defined as follows:
This code prints all the permutations of the string lexicographically. Something is wrong with it. Find and fix it by modifying or adding one line!
Input:
The input consists of a single line containing a string of lowercase characters with no spaces in between. Its length is at most 7 characters, and its characters are sorted lexicographically.
Output:
All permutations of the string printed one in each line, listed lexicographically.
def permutations():
    global running
    global characters
    global bitmask
    if len(running) == len(characters):
        print(''.join(running))
    else:
        for i in xrange(len(characters)):
            if ((bitmask>>i)&1) == 0:
                bitmask |= 1<<i
                running.append(characters[i])
                permutations()
                running.pop()
raw = raw_input()
characters = list(raw)
running = []
bitmask = 0
permutations()
Can somebody answer it for me and explain how it works? I am not really familiar with the applications of bitmasking. Thank you.
You should make the bitmask bit 0 again by adding the line:
bitmask ^= 1<<i
Code:
def permutations():
    global running
    global characters
    global bitmask
    if len(running) == len(characters):
        print(''.join(running))
    else:
        for i in xrange(len(characters)):
            if ((bitmask>>i)&1) == 0:
                bitmask |= 1<<i
                running.append(characters[i])
                permutations()
                bitmask ^= 1<<i  # make the bit zero again.
                running.pop()
raw = raw_input()
characters = list(raw)
running = []
bitmask = 0
permutations()
Explanation:
Bitmask is an integer that is treated as a string of bits. In your case the length of this string is equal to the length of the input string.
Each position in this string signifies whether the corresponding character has already been added to the partially built string or not.
The code works by building a new string starting from an empty string. Whenever any character is added, the bitmask records it. Then the string is sent deeper into recursion for further addition of characters. When the code returns from recursion, then the added character is to be removed and the bitmask value has to be made to its original value.
More information about masking can be found here: http://en.wikipedia.org/wiki/Mask_%28computing%29
EDIT:
Say the input string is "abcde" and the bitmask at any point in the execution of the code is "00100". This means that only the character 'c' has been added so far to the partially built string.
Hence we should not add the character 'c' again.
The "if" condition ((bitmask >> i) & 1) == 0 checks whether the i'th bit in bitmask has been set, i.e., whether the i'th character has already been added to the string. Only if it has not been added yet does the character get appended.
If bit operations are new to you, I suggest you look this topic up on the internet.
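For readers on Python 3, here is a hedged sketch of the same idea in a different shape. It is not the one-line fix the exercise asks for: instead of a global bitmask that must be cleared after each recursive call, the mask is passed as an argument, so each call sees its own value and nothing needs to be undone. The generator form and the parameter names are choices made for this example.

```python
def permutations(characters, running=None, bitmask=0):
    """Yield all permutations of characters in index order, using a bitmask."""
    if running is None:
        running = []
    if len(running) == len(characters):
        yield "".join(running)
        return
    for i, ch in enumerate(characters):
        # Bit i of bitmask is 1 if characters[i] is already in running.
        if ((bitmask >> i) & 1) == 0:
            running.append(ch)
            # Pass a copy of the mask with bit i set; the caller's mask is unchanged.
            yield from permutations(characters, running, bitmask | (1 << i))
            running.pop()


for p in permutations("abc"):
    print(p)
```

With sorted input, as the problem statement guarantees, the permutations come out in lexicographic order: abc, acb, bac, bca, cab, cba.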
I'm trying to make a program for my biology research.
I need to take this sequence:
NNNNNNNNNNCCNNAGTGNGNACAGACGACGGGCCCTGGCCCCTCGCACACCCTGGACCA
AGTCAATCGCACCCACTTCCCTTTCTTCTCGGATGTCAAGGGCGACCACCGGTTGGTGTT
GAGCGTCGTGGAGACCACCGTTCTGGGGCTCATCTTTGTCGTCTCACTGCTGGGCAACGT
GTGTGCTCTAGTGCTGGTGGCGCGCCGTCGGCGCCGTGGGGCGACAGCCAGCCTGGTGCT
CAACCTCTTCTGCGCGGATTTGCTCTTCACCAGCGCCATCCCTCTAGTGCTCGTCGTGCG
CTGGACTGAGGCCTGGCTGTTGGGGCCCGTCGTCTGCCACCTGCTCTTCTACGTGATGAC
AATGAGCGGCAGCGTCACGATCCTCACACTGGCCGCGGTCAGCCTGGAGCGCATGGTGTG
CATCGTGCGCCTCCGGCGCGGCTTGAGCGGCCCGGGGCGGCGGACTCAGGCGGCACTGCT
GGCTTTCATATGGGGTTACTCGGCGCTCGCCGCGCTGCCCCTCTGCATCTTGTTCCGCGT
GGTCCCGCAGCGCCTTCCCGGCGGGGACCAGGAAATTCCGATTTGCACATTGGATTGGCC
CAACCGCATAGGAGAAATCTCATGGGATGTGTTTTTTGTGACTTTGAACTTCCTGGTGCC
GGGACTGGTCATTGTGATCAGTTACTCCAAAATTTTACAGATCACGAAAGCATCGCGGAA
GAGGCTTACGCTGAGCTTGGCATACTCTGAGAGCCACCAGATCCGAGTGTCCCAACAAGA
CTACCGACTCTTCCGCACGCTCTTCCTGCTCATGGTTTCCTTCTTCATCATGTGGAGTCC
CATCATCATCACCATCCTCNCATCTTGATCCAAAACTTCCGGCAGGACCTGGNCATCTGG
NCATCCCTTTTCTTCTGGGNNGTNNNNNCACGTTGCNACTCTNCCTAAANCCCATACTGT
ANNANATGNCGCTNNNAGGAANGAATGGAGGAANANTTTTTGNNNNNNNNN
...and remove everything past the last N in the beginning and the first N at the end. In other words, to make it look something like this:
ACAGACGACGGGCCCTGGCCCCTCGCACACCCTGGACCA
AGTCAATCGCACCCACTTCCCTTTCTTCTCGGATGTCAAGGGCGACCACCGGTTGGTGTT
GAGCGTCGTGGAGACCACCGTTCTGGGGCTCATCTTTGTCGTCTCACTGCTGGGCAACGT
GTGTGCTCTAGTGCTGGTGGCGCGCCGTCGGCGCCGTGGGGCGACAGCCAGCCTGGTGCT
CAACCTCTTCTGCGCGGATTTGCTCTTCACCAGCGCCATCCCTCTAGTGCTCGTCGTGCG
CTGGACTGAGGCCTGGCTGTTGGGGCCCGTCGTCTGCCACCTGCTCTTCTACGTGATGAC
AATGAGCGGCAGCGTCACGATCCTCACACTGGCCGCGGTCAGCCTGGAGCGCATGGTGTG
CATCGTGCGCCTCCGGCGCGGCTTGAGCGGCCCGGGGCGGCGGACTCAGGCGGCACTGCT
GGCTTTCATATGGGGTTACTCGGCGCTCGCCGCGCTGCCCCTCTGCATCTTGTTCCGCGT
GGTCCCGCAGCGCCTTCCCGGCGGGGACCAGGAAATTCCGATTTGCACATTGGATTGGCC
CAACCGCATAGGAGAAATCTCATGGGATGTGTTTTTTGTGACTTTGAACTTCCTGGTGCC
GGGACTGGTCATTGTGATCAGTTACTCCAAAATTTTACAGATCACGAAAGCATCGCGGAA
GAGGCTTACGCTGAGCTTGGCATACTCTGAGAGCCACCAGATCCGAGTGTCCCAACAAGA
CTACCGACTCTTCCGCACGCTCTTCCTGCTCATGGTTTCCTTCTTCATCATGTGGAGTCC
CATCATCATCACCATCCTC
How would I do this?
I think you may be looking for the longest sequence of non-N characters in the input.
Otherwise, you have no rule to distinguish the last N in the prefix from the first N in the suffix. There is nothing at all different about the N you want to start after (before the ACAGAC…) and the next N (before the CATCCC), or, for that matter, the previous one (before the GN) except that it picks out the longest sequence. In fact, other than the 10 N's at the very start and the 9 at the very end, there doesn't seem to be anything special about any of the N's.
The easiest way to do that is to just grab all the sequences and keep the longest:
max(s.split('N'), key=len)
If you have some additional rule on top of this—e.g., the longest sequence whose length is divisible by three (which in this case is the same thing)—you can do the same basic thing:
max((seq for seq in s.split('N') if len(seq) % 3 == 0), key=len)
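A small, self-contained sketch of the approach on a toy sequence follows. The whitespace removal step is an assumption added here, since the sequence in the question is wrapped across several lines and a line break would otherwise end up inside a run; the function name longest_n_free_run is made up for the example.

```python
def longest_n_free_run(s):
    """Return the longest stretch of s that contains no 'N'."""
    # Drop line breaks and other whitespace so runs can span input lines.
    joined = "".join(s.split())
    return max(joined.split("N"), key=len)


sample = "NNACGNTTG\nACNNA"
print(longest_n_free_run(sample))  # TTGAC
```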
@abarnert's answer is correct, but str.split() returns a list of substrings, so it needs O(N) additional memory. This isn't a problem when your input is short, but DNA sequences are typically very long. To avoid the memory overhead, you need to use an iterator. I recommend re.finditer.
import re
_find_n_free_substrings = re.compile(r'[^N]+', re.MULTILINE).finditer
def longest_n_free_substring(string):
    substrings = (match.group(0) for match in _find_n_free_substrings(string))
    return max(substrings, key=len)