So, I have this problem where I receive 2 strings of the letters ACGT: one with only letters, the other containing letters and dashes "-". Both are the same length. The string with the dashes is compared to the string without them, cell for cell, and each pairing is scored.
For example:
dna1: -ACA
dna2: TACG
The score is -1, because a dash compared to a letter (T) gives -2, a letter compared to the same letter gives +1 (A to A) and +1 (C to C), and non-matching letters (A to G) give -1, so the sum is -1.
I wrote this code for the scoring system:
def get_score(dna1, dna2, match=1, mismatch=-1, gap=-2):
    """Score two equal-length strings position by position."""
    score = 0
    for index in range(len(dna1)):
        if dna1[index] == dna2[index]:
            score += match
        elif "-" not in (dna1[index], dna2[index]):
            score += mismatch
        else:
            score += gap
    return score
This is working fine.
Now I have to use recursion to give the best possible score for 2 strings.
I receive 2 strings, and this time they can be of different sizes (I can't change the order of the letters).
So I wrote this code that adds "-" to the shorter string as many times as needed to make the 2 strings the same length, putting the dashes at the start of the list. Now I want to start moving the dashes, record the score for every dash position, and finally get the highest possible score. To move the dashes around I wrote a little bubble sort, but it doesn't seem to do what I want. I realize it's a long question, but I'd love some help. Let me know if anything I wrote is unclear.
def best_score(dna1, dna2, match=1, mismatch=-1, gap=-2,
               score=[], count=0):
    """Return the best (score, string1, string2) found by moving the dashes."""
    diff = abs(len(dna1) - len(dna2))
    if len(dna1) == len(dna2):
        short = []
    elif len(dna1) < len(dna2):
        short = list(dna1)
    else:
        short = list(dna2)
    for i in range(diff):
        short.insert(count, "-")
    for i in range(diff + count, len(short) - 1):
        if len(dna1) < len(dna2):
            score.append((get_score(short, dna2),
                          ''.join(short), dna2))
        else:
            score.append((get_score(dna1, short),
                          dna1, ''.join(short)))
        short[i + 1], short[i] = short[i], short[i + 1]
    if count == min(len(dna1), len(dna2)):
        return score[score.index(max(score))]
    return best_score(dna1, dna2, 1, -1, -2, score, count + 1)
First, if I correctly deciphered your cost function, your best score does not depend on gap, as the number of dashes is fixed.
Second, it depends linearly on the number of mismatches, so it doesn't depend on the exact values of match and mismatch, as long as they are positive and negative respectively.
So your task reduces to finding the longest subsequence of the longer string's letters that exactly matches a subsequence of the shorter string's letters.
Third, define M(string, substr) as the function returning the length of the best match from above. If the first letter of your shorter string is S, that is substr == 'S<letters>', then
M(string, 'S<letters>') = \
    max(1 + M(string[string.index(S) + 1:], '<letters>'),  # found S
        M(string[1:], '<letters>'))  # letter S not found, placed at 1st place
The latter is a recursive expression that is easy to implement.
For a pair (string, substr), denoting m = M(string, substr), the best score is equal to
m * match + (len(substr) - m) * mismatch + (len(string)-len(substr)) * gap
By storing which branch gave the maximum in the recursive expression, it is straightforward to recover exactly what the best match is.
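For the recursion the question asks for, here is a minimal sketch that computes the best score directly instead of moving dashes around. It is my own memoized formulation over positions in the two strings, not the M(string, substr) function above, and it assumes (as in the question) that dashes may only be inserted into the shorter string and that letter order is fixed; `best_alignment_score` is a hypothetical name.

```python
from functools import lru_cache


def best_alignment_score(dna1, dna2, match=1, mismatch=-1, gap=-2):
    """Best score when dashes may only go into the shorter string
    (hypothetical helper; letter order in both strings is preserved)."""
    short, long_ = sorted((dna1, dna2), key=len)

    def pair_score(a, b):
        return match if a == b else mismatch

    @lru_cache(maxsize=None)
    def best(i, j):
        # Best score for aligning short[i:] against long_[j:].
        # Invariant: len(long_) - j >= len(short) - i.
        if i == len(short):
            # Every remaining letter of the long string faces a dash.
            return gap * (len(long_) - j)
        # Option 1: put short[i] opposite long_[j].
        result = pair_score(short[i], long_[j]) + best(i + 1, j + 1)
        # Option 2: put a dash opposite long_[j], if there is room left.
        if len(long_) - j > len(short) - i:
            result = max(result, gap + best(i, j + 1))
        return result

    return best(0, 0)
```

With the default scores, best_alignment_score('ACA', 'TACG') returns -1, matching the -ACA / TACG example from the question. Memoization keeps the number of distinct (i, j) calls to O(len(short) * len(long)).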
I have a few words (strings) like 'hefg', 'dhck', 'dkhc', 'lmno' which are to be converted to new words by swapping some or all of the characters, such that the new word is lexicographically greater than the original word and is also the least of all the words greater than the original.
For example, 'dhck' should output 'dhkc' and not 'kdhc', 'dchk' or any other.
I have these inputs:
hefg
dhck
dkhc
fedcbabcd
which should output
hegf
dhkc
hcdk
fedcbabdc
I have tried this code in Python; it worked for all except 'dkhc' and 'fedcbabcd'.
I have figured out that in the case of 'fedcbabcd' the first character is the max, so it is not getting swapped, and
I'm getting "ValueError: min() arg is an empty sequence".
How can I modify the algorithm to fix these cases?
list1 = ['d', 'k', 'h', 'c']
list2 = []
maxVal = list1.index(max(list1))
for i in range(maxVal):
    temp = list1[maxVal]
    list1[maxVal] = list1[i-1]
    list1[i-1] = temp
    list2.append(''.join(list1))
print(min(list2))
You can try something like this:
iterate the characters in the string in reverse order
keep track of the characters you've already seen, and where you saw them
if you've seen a character larger than the current character, swap it with the smallest larger character
sort all the characters after that position to get the minimum string
Example code:
def next_word(word):
    word = list(word)
    seen = {}
    for i in range(len(word)-1, -1, -1):
        if any(x > word[i] for x in seen):
            x = min(x for x in seen if x > word[i])
            word[i], word[seen[x]] = word[seen[x]], word[i]
            return ''.join(word[:i+1] + sorted(word[i+1:]))
        if word[i] not in seen:
            seen[word[i]] = i

for word in ["hefg", "dhck", "dkhc", "fedcbabcd"]:
    print(word, next_word(word))
Result:
hefg hegf
dhck dhkc
dkhc hcdk
fedcbabcd fedcbabdc
The max character and its position don't influence the algorithm in the general case. For example, for 'fedcbabcd', you could prepend an a or a z at the beginning of the string and it wouldn't change the fact that you need to swap the final two letters.
Consider the input 'dgfecba'. Here, the output is 'eabcdfg'. Why? Notice that the final six letters are sorted in decreasing order, so by changing anything there, you get a smaller string lexicographically, which is no good. It follows that you need to replace the initial 'd'. What should we put in its place? We want something greater than 'd', but as small as possible, so 'e'. What about the remaining six letters? Again, we want a string that's as small as possible, so we sort the letters lexicographically: 'eabcdfg'.
So the algorithm is:
start at the back of the string (right end);
go left while the symbols keep increasing;
let i be the rightmost position where s[i] < s[i + 1]; in our case, that's i = 0;
leave the symbols on position 0, 1, ..., i - 1 untouched;
find the position among i+1 ... n-1 containing the least symbol that's greater than s[i]; call this position j; in our case, j = 3;
swap s[i] and s[j]; in our case, we obtain 'egfdcba';
reverse the string s[i+1] ... s[n-1]; in our case, we obtain 'eabcdfg'.
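The seven steps above map almost directly onto Python. A sketch (the name `next_word_steps` is hypothetical) that returns None, rather than raising, when the string is already the largest arrangement:

```python
def next_word_steps(s):
    """Next lexicographic rearrangement of s, or None if none exists (sketch)."""
    s = list(s)
    n = len(s)
    # Start at the right end and go left while the symbols keep increasing.
    i = n - 2
    while i >= 0 and s[i] >= s[i + 1]:
        i -= 1
    if i < 0:
        return None  # the whole string is non-increasing: nothing larger exists
    # The suffix s[i+1:] is non-increasing, so scanning from the right finds
    # the least symbol that is greater than s[i].
    j = n - 1
    while s[j] <= s[i]:
        j -= 1
    s[i], s[j] = s[j], s[i]
    # The suffix is still non-increasing; reversing it sorts it ascending.
    s[i + 1:] = reversed(s[i + 1:])
    return ''.join(s)
```

For 'dgfecba' this finds i = 0 and j = 3, swaps to 'egfdcba', and reverses the tail to give 'eabcdfg', as in the walkthrough.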
Your problem can be reworded as finding the next lexicographical permutation of a string.
The algorithm in the above link is described as follows:
1) Find the longest non-increasing suffix
2) The element immediately left of the suffix is our pivot
3) Find the right-most successor of the pivot in the suffix
4) Swap the successor and the pivot
5) Reverse the suffix
The above algorithm is especially interesting because it is O(n).
Code
def next_lexicographical(word):
    word = list(word)
    # Find the pivot and the successor
    pivot = next(i for i in range(len(word) - 2, -1, -1) if word[i] < word[i+1])
    successor = next(i for i in range(len(word) - 1, pivot, -1) if word[i] > word[pivot])
    # Swap the pivot and the successor
    word[pivot], word[successor] = word[successor], word[pivot]
    # Reverse the suffix
    word[pivot+1:] = word[-1:pivot:-1]
    # Reform the word and return it
    return ''.join(word)
The above algorithm will raise a StopIteration exception if the word is already the last lexicographical permutation.
Example
words = [
    'hefg',
    'dhck',
    'dkhc',
    'fedcbabcd'
]
for word in words:
    print(next_lexicographical(word))
Output
hegf
dhkc
hcdk
fedcbabdc
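Fast next-permutation code is easy to get subtly wrong (the question's first attempt fails on 'dkhc'), so a brute-force reference can be handy for testing any of the versions above on short words. This sketch uses itertools.permutations; `next_word_bruteforce` is a hypothetical helper, and since it tries all n! orderings it is only meant as a test oracle.

```python
from itertools import permutations


def next_word_bruteforce(word):
    """Smallest rearrangement of word that is greater than word, or None.
    Tries every ordering (n! of them), so keep inputs short."""
    candidates = (''.join(p) for p in permutations(word))
    larger = [c for c in candidates if c > word]
    return min(larger) if larger else None
```

Comparing its result with next_lexicographical(word) for a batch of short random words is a quick way to catch off-by-one mistakes in the pivot or successor search.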
def solution(S):
    total = 0
    i = 1
    while i <= len(S):
        for j in range(0, len(S) - i + 1):
            if is_p(S[j: j + i]):
                total += 1
        i += 1
    return total

def is_p(S):
    if len(S) == 1:
        return False
    elif S == S[::-1]:
        return True
    else:
        return False
I am writing a function to count the number of palindromic slices (with length greater than 1) in a string. The above code has poor time complexity. Can someone help me improve it and make it O(N)?
Edit: It is not a duplicate, since the other question is about finding the longest palindromic slice.
Apply Manacher's Algorithm, also described by the multiple answers to this question.
That gives you the length of the longest palindrome centered at every location (centered at a character for odd lengths, or centered between characters for even lengths). You can use this to easily calculate the number of palindromes. Note that every palindrome must be centered somewhere, so it must be a substring of (or equal to) the longest palindrome centered there.
So consider the string ababcdcbaa. By Manacher's Algorithm, you know that the maximal length palindrome centered at the d has length 7: abcdcba. By the properties of palindromes, you immediately know that bcdcb and cdc and d are also palindromes centered at d. In fact there are floor((k+1)/2) palindromes centered at a location, if you know that the longest palindrome centered there has length k.
So you sum the results of Manacher's Algorithm to get your count of all palindromes. If you want to only count palindromes of length > 1, you just need to subtract the number of length-1 palindromes, which is just n, the length of your string.
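A sketch of this counting approach with a standard Manacher implementation, assuming the input never contains the helper characters '^', '#' and '$' (they are used as separators and sentinels); `count_palindromic_slices` is a hypothetical name that plays the role of solution(S) from the question:

```python
def count_palindromic_slices(s):
    """Count palindromic substrings of length > 1 in O(n) (Manacher sketch).
    Assumes s contains none of the characters '^', '#', '$'."""
    if not s:
        return 0
    # Separators make every palindrome odd-length in t; the distinct
    # sentinels '^' and '$' stop the expansion without bounds checks.
    t = '^#' + '#'.join(s) + '#$'
    p = [0] * len(t)  # p[i]: radius of the longest palindrome in t centred at i
    center = right = 0
    for i in range(1, len(t) - 1):
        if i < right:
            # Reuse the mirror position's radius, clipped to the known boundary.
            p[i] = min(right - i, p[2 * center - i])
        while t[i + p[i] + 1] == t[i - p[i] - 1]:
            p[i] += 1
        if i + p[i] > right:
            center, right = i, i + p[i]
    # p[i] is the length k of the longest palindrome in s at that centre,
    # and that centre holds (k + 1) // 2 palindromes in total.
    total = sum((k + 1) // 2 for k in p)
    return total - len(s)  # drop the len(s) single-character palindromes
```

Each iteration either pushes the right boundary further or is settled from its mirror, which is what keeps the whole scan linear.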
This can be done in linear time using suffix trees:
1) For constant sized alphabet we can build suffix trees using Ukkonen's Algorithm in O(n).
2) For given string S, build a generalized suffix tree of S#S' where S' is reverse of string S and # is delimiting character.
3) Now, in this suffix tree, for every suffix i in S, look for the lowest common ancestor of it and the (2n-i+1) suffix in S'.
4) Sum the counts for all such suffixes in the tree to get the total count of all palindromes.
I am completing the Introduction to Computer Science and Programming Using Python Course and am stuck on Week 1: Python Basics - Problem Set 1 - Problem 3.
The problem asks:
Assume s is a string of lower case characters.
Write a program that prints the longest substring of s in which the
letters occur in alphabetical order. For example, if s = 'azcbobobegghakl', then your program should print
Longest substring in alphabetical order is: beggh
In the case of ties, print the first substring. For example, if s = 'abcbcd', then your program should print
Longest substring in alphabetical order is: abc
There are many posts on Stack Overflow where people are just chasing or giving the code as the answer. I am looking to understand the concept behind the code, as I am new to programming and want to gain a better understanding of the basics.
I found the following code that seems to answer the question. I understand the basic concept of the for loop, but I am having trouble understanding how to use for loops to find alphabetical sequences in a string.
Can someone please help me understand the concept of using the for loops in this way.
s = 'cyqfjhcclkbxpbojgkar'
lstring = s[0]
slen = 1
for i in range(len(s)):
    for j in range(i, len(s)-1):
        if s[j+1] >= s[j]:
            if (j+1)-i+1 > slen:
                lstring = s[i:(j+1)+1]
                slen = (j+1)-i+1
        else:
            break
print("Longest substring in alphabetical order is: " + lstring)
Let's go through your code step by step.
First we assume that the first character forms the longest sequence. What we will do is try improving this guess.
s = 'cyqfjhcclkbxpbojgkar'
lstring = s[0]
slen = 1
The first loop then picks some index i, which will be the start of a sequence. From there, we check all sequences starting at i by looping over the possible end of a sequence with the nested loop.
for i in range(len(s)):  # This loops over the whole string indices
    for j in range(i, len(s)-1):  # This loops over indices following i
These nested loops allow us to check every substring by picking every combination of i and j.
The first if statement checks whether that sequence is still an increasing one. If it is not, we break out of the inner loop, as we are not interested in that sequence (or any longer one starting at i).
if s[j+1] >= s[j]:
    ...
else:
    break
We finally need to check if the current sequence we are looking at is better than our current guess by comparing its length to slen, which is our best guess.
if (j+1)-i+1 > slen:
    lstring = s[i:(j+1)+1]
    slen = (j+1)-i+1
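To see which substrings the nested loops actually look at, it can help to collect them instead of only keeping the longest. A small sketch (`trace_candidates` is a hypothetical helper) using the same loop structure as the question's code:

```python
def trace_candidates(s):
    """Return every in-order substring the question's nested loops examine."""
    examined = []
    for i in range(len(s)):              # i: start of a candidate sequence
        for j in range(i, len(s) - 1):   # j + 1: candidate end
            if s[j + 1] >= s[j]:
                examined.append(s[i:j + 2])  # s[i..j+1] is still in order
            else:
                break  # order broken: no longer sequence can start at i
    return examined
```

For example, trace_candidates('abcb') gives ['ab', 'abc', 'bc']: from i = 0 the inner loop grows 'ab' and 'abc', then hits the drop from 'c' to 'b' and breaks.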
Improvements
Note that this code is not optimal, as it needlessly traverses your string multiple times. You could implement a more efficient approach that traverses the string only once to recover all increasing substrings and then uses max to pick the longest one.
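s = 'cyqfjhcclkbxpbojgkar'
substrings = []
start = 0
for end in range(1, len(s)):
    if s[end - 1] > s[end]:  # order breaks: close the current run
        substrings.append(s[start:end])
        start = end  # the next run starts at the breaking character
substrings.append(s[start:])  # don't forget the final run
lstring = max(substrings, key=len)
print("Longest substring in alphabetical order is: " + lstring)
The list substrings looks like this after the loop: ['cy', 'q', 'fj', 'h', 'ccl', 'k', 'bx', 'p', 'bo', 'j', 'gk', 'ar']
From these, max(..., key=len) picks the longest one; when several are equally long, max returns the first, which matches the tie rule in the problem.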
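The single pass can also keep only the best run seen so far instead of building a list, and it is easy to check against the two examples from the problem statement. A sketch; `longest_alpha_substring` is a hypothetical name:

```python
def longest_alpha_substring(s):
    """Longest run of s in alphabetical order; ties go to the first run."""
    if not s:
        return ''
    best_start, best_len = 0, 1
    start = 0
    for end in range(1, len(s) + 1):
        # A run ends at the end of the string or where the order breaks.
        if end == len(s) or s[end - 1] > s[end]:
            if end - start > best_len:  # strictly longer, so ties keep the first
                best_start, best_len = start, end - start
            start = end
    return s[best_start:best_start + best_len]
```

This uses constant extra space, since only the start and length of the best run are stored.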
A string is a palindrome if it reads the same forward and backward. Given a string that contains only lowercase English letters, you are required to create a new palindrome string from the given string following the rules given below:
1. You can reduce (but not increase) any character in a string by one; for example you can reduce the character h to g but not from g to h
2. In order to achieve your goal, if you have to then you can reduce a character of a string repeatedly until it becomes the letter a; but once it becomes a, you cannot reduce it any further.
Each reduction operation is counted as one. So you need to count as well how many reductions you make. Write a Python program that reads a string from a user input (using raw_input statement), creates a palindrome string from the given string with the minimum possible number of operations and then prints the palindrome string created and the number of operations needed to create the new palindrome string.
I tried to convert the string to a list first, then modify the list so that, whatever string is given, if it's not a palindrome it is automatically edited into one; after modifying the list, I convert it back to a string and print the result.
c = raw_input("enter a string ")
x = list(c)
y = ""
i = 0
j = len(x)-1
a = 0
while i < j:
    if x[i] < x[j]:
        a += ord(x[j]) - ord(x[i])
        x[j] = x[i]
        print x
    else:
        a += ord(x[i]) - ord(x[j])
        x[i] = x[j]
        print x
    i = i + 1
    j = (len(x)-1)-1
print "The number of operations is ", a
print "The palindrome created is", (''.join(x))
Am I approaching it the right way, or is there something I'm not adding up?
Since only reduction is allowed, it is clear that the number of reductions for each pair will be the difference between them. For example, consider the string 'abcd'.
Here the pairs to check are (a,d) and (b,c).
Now difference between 'a' and 'd' is 3, which is obtained by (ord('d')-ord('a')).
I am using the absolute value to avoid checking which letter has the higher ASCII value.
I hope this approach will help.
s = input()
l = len(s)
count = 0
m = 0
n = l-1
while m < n:
    count += abs(ord(s[m]) - ord(s[n]))
    m += 1
    n -= 1
print(count)
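The problem also asks for the palindrome itself, not just the count. A minimal extension of the loop above (assuming Python 3; `make_palindrome` is a hypothetical name, and reading the input with input() or raw_input() is left out) lowers the larger letter of each mirrored pair to the smaller one, which costs exactly the difference counted above:

```python
def make_palindrome(s):
    """Return (palindrome, number_of_reductions) using only reductions.

    For a mirrored pair, lowering both letters to a common target t costs
    (a - t) + (b - t), which is smallest when t is the smaller letter,
    giving abs(a - b) operations.
    """
    chars = list(s)
    count = 0
    m, n = 0, len(chars) - 1
    while m < n:
        low = min(chars[m], chars[n])
        count += abs(ord(chars[m]) - ord(chars[n]))
        chars[m] = chars[n] = low  # reduce the larger letter to the smaller one
        m += 1
        n -= 1
    return ''.join(chars), count
```

For 'abcd' the pairs (a, d) and (b, c) cost 3 and 1, so make_palindrome('abcd') gives ('abba', 4).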
This is a common "homework" or competition question. The basic concept here is that you have to find a way to get to minimum values with as few reduction operations as possible. The trick here is to utilize string manipulation to keep that number low. For this particular problem, there are two very simple things to remember: 1) you have to split the string, and 2) you have to apply a bit of symmetry.
First, split the string in half. The following function should do it.
def split_string_to_halves(string):
    half, rem = divmod(len(string), 2)
    a, b, c = string[:half], string[half:], ''
    if rem > 0:
        # odd length: the middle character pairs with itself, so set it aside
        b, c = string[half + 1:], string[half]
    return (a, b, c)
The above should recreate the string if you do a + c + b. Next, convert a and b to lists of character codes by mapping the ord function over each half. Leave the middle character alone, if any.
def convert_to_ord_list(string):
    return list(map(ord, string))
Since you just have to do a one-way operation (only reduction, no need for addition), you can assume that for each pair of elements in the two converted lists, the higher value less the lower value is the number of operations needed. Easier shown than said:
def convert_to_palindrome(string):
    halfone, halftwo, rem = split_string_to_halves(string)
    if halfone == halftwo[::-1]:
        return string, 0  # already a palindrome, nothing to reduce
    halftwo = halftwo[::-1]
    # list() so the pairs can be walked twice (zip is lazy in Python 3)
    zipped = list(zip(convert_to_ord_list(halfone), convert_to_ord_list(halftwo)))
    counter = sum(max(x) - min(x) for x in zipped)
    floors = [min(x) for x in zipped]
    res = "".join(map(chr, floors))
    res += rem + res[::-1]
    return res, counter
Finally, some tests:
target = 'ideal'
print(convert_to_palindrome(target))  # ('iaeai', 6)
target = 'euler'
print(convert_to_palindrome(target))  # ('eelee', 29)
target = 'ohmygodthisisinsane'
print(convert_to_palindrome(target))  # ('ehasgidihihidigsahe', 84)
I'm not sure whether this is optimized or whether I covered all bases, but I think this pretty much covers the general concept of the approach needed. Compared to your code, this is clearer and actually works (yours does not). Good luck, and let us know how this works for you.
Given a string, find the longest substring whose characters are contiguous (i.e. they are consecutive letters) but possibly jumbled (i.e. out of order). For example:
Input : "owadcbjkl"
Output: "adcb"
We consider adcb as contiguous as it forms abcd.
(This is an interview question.)
I have an idea of running a while loop with 2 conditions, one that checks for continuous characters using Python's ord and another condition to find the minimum and maximum and check if all the following characters fall in this range.
Is there any way this problem could be solved with low running time complexity? The best I can achieve is O(N^2) where N is the length of the input string and ord() seems to be a slow operation.
If a valid substring is one whose sorted letters, ''.join(sorted(substr)), form a run of consecutive letters of the alphabet, then:
1) there are no duplicates in the substring, and therefore the length of the longest such substring is at most the size of the alphabet;
2) (ord(max(substr)) - ord(min(substr)) + 1) == len(substr), where ord() returns the position in the alphabet (up to a constant offset); the builtin ord() can be used for lowercase ASCII letters.
Here's O(n*m*m)-time, O(m)-space solution, where n is len(input_string) and m is len(alphabet):
from itertools import count

def longest_substr(input_string):
    maxsubstr = input_string[0:0]  # empty slice (to accept subclasses of str)
    for start in range(len(input_string)):  # O(n)
        for end in count(start + len(maxsubstr) + 1):  # O(m)
            substr = input_string[start:end]  # O(m)
            if len(set(substr)) != (end - start):  # found duplicates or EOS
                break
            if (ord(max(substr)) - ord(min(substr)) + 1) == len(substr):
                maxsubstr = substr
    return maxsubstr
Example:
print(longest_substr("owadcbjkl"))
# -> adcb
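A possible refinement, sketched under the same definition: instead of slicing and rebuilding set(substr) for every end position, grow each window one character at a time while keeping a running set, minimum and maximum, so each extension costs O(1) and the total is O(n*m). `longest_contiguous_jumbled` is a hypothetical name, not part of the answer above.

```python
def longest_contiguous_jumbled(s):
    """First longest substring whose letters, once sorted, are consecutive
    (e.g. 'adcb' -> 'abcd'). Incremental variant of longest_substr."""
    best = s[:0]
    for start in range(len(s)):
        seen = set()
        lo = hi = s[start]
        for end in range(start, len(s)):
            ch = s[end]
            if ch in seen:  # a duplicate can never be part of a valid window
                break
            seen.add(ch)
            lo, hi = min(lo, ch), max(hi, ch)
            length = end - start + 1
            # No duplicates and span == length means the letters are consecutive.
            if ord(hi) - ord(lo) + 1 == length and length > len(best):
                best = s[start:end + 1]
    return best
```

Because the inner loop stops at the first repeated letter, it runs at most m + 1 times per start position, where m is the alphabet size.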