Finding the edit distance of two strings with recursion

I need to use recursion to find the edit distance of two strings, i.e. I give the function two arguments (each a different string), and the function finds the minimum number of changes required to turn s1 into s2. This is what I have so far:
def edit_distance(s1,s2):
    split1 = list(s1)
    split2 = list(s2)
    count = 0
    pos = 0
    if split1[pos] == split2[pos]:
        pos += 1
    else:
        pos +=1
        count += 1
        edit_distance(s1, s2)
        return count #This should be the minimum amount required to match the two strings

I annotated your code to show you the code flow. I hope you understand now why you get the error:
def edit_distance(s1,s2):
    split1 = list(s1) # Split strings into characters
    split2 = list(s2)
    count = 0 # This variable is local, it is not shared through calls to the function!
    pos = 0 # Same
    if split1[pos] == split2[pos]: # pos is always 0 here!
        pos += 1 # pos is incremented anyway, in if but also in else !
    else:
        pos +=1 # See above
        count += 1 # count is incremented, now it is 1
        edit_distance(s1, s2) # recursive call, but with the identical arguments as before! The next function call will do exactly the same as this one, resulting in infinite recursion!
        return count # Wrong indentation here
Your function does not do what you want. In case you are talking about Hamming distance, which is still not entirely clear to me, here is a sample implementation assuming both strings have the same length:
# Notice that pos is passed between calls and initially set to zero
def hamming(s1, s2, pos=0):
    # Are there characters left to compare?
    if pos < len(s1):
        # Return one if the current position differs and add the result for the following positions (starting at pos+1) to that
        return (s1[pos] != s2[pos]) + hamming(s1, s2, pos+1)
    else:
        # If the end is already reached, the remaining distance is 0
        return 0
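Hamming distance only counts substitutions between strings of equal length. If the question really is about edit distance in the Levenshtein sense (assumption: inserting, deleting or substituting one character each costs 1), a recursive version has to try all three operations at every mismatch. The sketch below does that; functools.lru_cache memoizes the (i, j) subproblems so the recursion does O(len(s1) * len(s2)) work instead of exponential work. The helper name dist is illustrative only.

```python
from functools import lru_cache


def edit_distance(s1: str, s2: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn s1 into s2."""

    @lru_cache(maxsize=None)
    def dist(i: int, j: int) -> int:
        # dist(i, j) is the edit distance between the suffixes s1[i:] and s2[j:]
        if i == len(s1):
            return len(s2) - j  # s1 is used up: insert the rest of s2
        if j == len(s2):
            return len(s1) - i  # s2 is used up: delete the rest of s1
        if s1[i] == s2[j]:
            return dist(i + 1, j + 1)  # characters match, no cost
        return 1 + min(
            dist(i + 1, j),      # delete s1[i]
            dist(i, j + 1),      # insert s2[j]
            dist(i + 1, j + 1),  # substitute s1[i] with s2[j]
        )

    return dist(0, 0)


print(edit_distance("kitten", "sitting"))  # 3
```

Unlike the original attempt, each recursive call works on a strictly smaller subproblem, so the recursion always terminates. Very long strings can still exceed Python's default recursion limit.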

Related

Need some help on a function

Write a function named one_frame that takes one argument seq and performs the tasks specified below. The argument seq is to be a string that contains information for the bases of a DNA sequence.
a → The function searches given DNA string from left to right in multiples of three nucleotides (in a single reading frame).
b → When it hits a start codon ATG it calls get_orf on the slice of the string beginning at that start codon.
c → The ORF returned by get_orf is added to a list of ORFs.
d → The function skips ahead in the DNA string to the point right after the ORF that we just found and starts looking for the next ORF.
e → Steps a through d are repeated until we have traversed the entire DNA string.
The function should return a list of all ORFs it has found.
def one_frame(seq):
    start_codon = 'ATG'
    list_of_codons = []
    y = 0
    while y < len(seq):
        subORF = seq[y:y + 3]
        if start_codon in subORF:
            list_of_codons.append(get_orf(seq))
            return list_of_codons
        else:
            y += 3
one_frame('ATGAGATGAACCATGGGGTAA')
The one_frame at the very bottom is a test case. It is supposed to be equal to ['ATGAGA', 'ATGGGG'], however my code only returns the first item in the list.
How could I fix my function to also return the other part of that list?

You have several problems:
You have return list_of_codons inside the loop. So you return as soon as you find the first match and only return that one. Put that at the end of the function, not inside the loop.
You have y += 3 only in the else: block, so y isn't incremented when you find a matching codon; once the early return is removed, the loop would never end.
You need to call get_orf() on the slice of the string starting at y, not the whole string (task b).
Task d says you have to skip to the point after the ORF that was returned in task b, not just continue at the next codon.
def one_frame(seq):
    start_codon = 'ATG'
    list_of_orfs = []
    y = 0
    while y < len(seq):
        subORF = seq[y:y + 3]
        if subORF == start_codon:
            # Task b: pass the slice that starts at the start codon
            orf = get_orf(seq[y:])
            list_of_orfs.append(orf)
            # Task d: skip to the point right after the ORF
            y += len(orf)
        else:
            y += 3
    return list_of_orfs
one_frame('ATGAGATGAACCATGGGGTAA')

You have a number of problems in this code, as identified in the comments. I think this does what you are actually supposed to do:
def one_frame(seq):
    start_codon = 'ATG'
    list_of_codons = []
    y = 0
    while y < len(seq):
        if seq[y:y+3] == start_codon:
            orf = get_orf(seq[y:])
            list_of_codons.append(orf)
            y += len(orf)
        else:
            y += 3
    return list_of_codons
one_frame('ATGAGATGAACCATGGGGTAA')
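get_orf is never shown in the question, so the loop above can't be run as-is. To try it, here is a hypothetical stand-in. The assumption, inferred only from the expected result ['ATGAGA', 'ATGGGG'], is that get_orf returns the sequence from its start up to but not including the first in-frame stop codon (TAA, TAG or TGA), or the whole string if there is none. The real assignment's get_orf may differ.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def get_orf(seq):
    """Hypothetical stand-in: return seq up to (not including) the first
    in-frame stop codon, or all of seq if no stop codon is found."""
    for i in range(0, len(seq), 3):
        if seq[i:i + 3] in STOP_CODONS:
            return seq[:i]
    return seq


def one_frame(seq):
    start_codon = "ATG"
    orfs = []
    y = 0
    while y < len(seq):
        if seq[y:y + 3] == start_codon:
            orf = get_orf(seq[y:])
            orfs.append(orf)
            y += len(orf)  # skip to just after the ORF (task d)
        else:
            y += 3  # next codon in the same reading frame
    return orfs


print(one_frame("ATGAGATGAACCATGGGGTAA"))  # ['ATGAGA', 'ATGGGG']
```

Because the slice passed to get_orf starts with ATG, which is not a stop codon, the returned ORF is at least three characters long, so y always moves forward and the loop terminates.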

Try splitting seq into codons instead:
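def one_frame(seq):
    shift = 3
    # Split the sequence into consecutive codons of three bases
    codons = [seq[i:i+shift] for i in range(0, len(seq), shift)]
    start_codon = "ATG"
    orf_list = []
    for i, codon in enumerate(codons):
        if codon == start_codon:
            # get_orf needs the rest of the sequence from this codon on, not just the codon itself
            orf_list.append(get_orf(seq[i * shift:]))
            # Note: this does not skip past the ORF found (task d), so an in-frame ATG inside an ORF is reported again
    return orf_list
seq = 'ATGAGATGAACCATGGGGTAA'
one_frame(seq)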

Slightly different approach but as I know nothing about DNA sequencing this may not make sense. Here goes anyway:
def one_frame(seq):
    start_codon = 'ATG'
    list_of_codons = []
    offset = 0
    while (i := seq[offset:].find(start_codon)) >= 0:
        offset += i
        list_of_codons.append(get_orf(seq[offset:]))
        offset += len(list_of_codons[-1])
    return list_of_codons
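In this way, find() initially searches from the beginning of the sequence, but afterwards only from the end of the previously found ORF. Note that find() matches ATG at any offset, not only at multiples of three, so this does not stay in a single reading frame. (The := operator requires Python 3.8 or later.)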

traversing through a list using recursion

So I am new to recursion and I am trying to make a program where you can enter a list and Python checks each integer (let's say 9, for example) to see whether the integer following it is double it. So if I entered the list 2 4 8 16 32, it would return 4, and -5 -10 0 6 12 9 36 would return 2, because -5 followed by -10 is the first and 6 followed by 12 is the second. This is the code I have so far. I feel like I am very close, but just a few things stand in my way. Any help would be great!
L = []
def countDouble(L):
    x = input(f'Enter a list of numbers separated by a space: ')
    y = (x.split(' '))
    print(y[1])
    print(y[0])
    count = 0
    y[0] += y[0]
    # unsure of how to multiply y[0] by 2
    if y[0]*2 == y[1]:
        count += 1
    else:
        count += 0
    #how would I traverse through the rest of the entered list using recursion?
    print(count)
countDouble(L)

If you want/need to solve it using recursion, the following will do the trick:
def count_sequential_doubles(li, count=0):
    return count_sequential_doubles(li[1:], count + int(li[0] * 2 == li[1])) if len(li) > 1 else count

I would suggest this recursive way:
def countDouble(L):
    count = 0
    if len(L) <= 1:  # nothing left to compare (also handles an empty list)
        return count
    else:
        if int(L[0])*2 == int(L[1]):
            count += 1
        return count + countDouble(L[1:])
x = input(f'Enter a list of numbers separated by a space: ')
y = (x.split(' '))
count = countDouble(y)
print(count)

I urge you to read the entire answer, but in case you are not interested in tips, notes and the process of finding the solution, here are two solutions:
solution using recursion (not recommended):
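x = input()
y = x.split(' ')
count = 0
def countDouble(i):
    global count  # count is defined outside the function, so it must be declared global before it is modified
    if(i+1 == len(y)):
        return  # recursion ends here, when i is the last index
    if(int(y[i])*2==int(y[i+1])):
        count += 1
    countDouble(i+1)
countDouble(0)
print(count)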
this solution just imitates a while loop:
solution using a while loop (recommended):
x = input()
y = x.split(' ')
count = 0
i = 0
while(i < len(y) - 1):
    if(int(y[i]) * 2 == int(y[i+1])):
        count += 1
    i += 1
print(count)
Before I continue, here are a few tips and notes (some of them will only make sense later):
I assume the 14 in your example is a typo.
I didn't put the code in a function because it's not needed, but you can change it easily.
In your code, you are passing L as a parameter to the countDouble() function, but you don't use it. If you don't need a parameter, don't pass it.
When splitting the input, the values of the list are still strings, so you have to convert them to integers (for instance with the int() function) before comparing their values; otherwise multiplying by 2 will just repeat the string. For example, '13'*2 is the string '1313'.
I don't know why you added y[0] to itself in the line y[0] += y[0], but given the code that follows, this would yield incorrect results; you don't need to change the elements in order to get their value multiplied by 2.
Notice that in the else block nothing changes: adding 0 to count doesn't change it, so you can remove the else block entirely.
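A quick way to see the string-versus-integer issue, plus one related pitfall of split(' ') that is an addition here, not something the question hit: with two spaces in a row, split(' ') produces an empty string that int() cannot convert, while split() with no argument skips runs of whitespace. The input line is hard-coded so the sketch runs without typing anything.

```python
line = "2  4 8"  # stands in for input(); note the double space

print(line.split(' '))  # ['2', '', '4', '8']: int('') would raise ValueError
print(line.split())     # ['2', '4', '8']: no argument means split on any whitespace

print('13' * 2)         # 1313 (the string is repeated)
print(int('13') * 2)    # 26 (the number is doubled)

numbers = [int(s) for s in line.split()]
print(numbers)          # [2, 4, 8]
```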
While it's possible to solve the problem with recursion, there's something else designed for this kind of problem: loops.
The problem is essentially repeating a simple check for every element of a list.
This is how I would arrive at a solution:
so we want to run the following 'code':
if(y[0]*2 == y[1]):
    count += 1
if(y[1]*2 == y[2]):
    count += 1
if(y[2]*2 == y[3]):
    count += 1
...
of course the computer doesn't understand what "..." means, but it gives us an idea of the pattern in the code. now we can do the following:
divide the extended 'code' into similar sections.
identify the variables in the pattern - the values that change between sections
find the starting values of all variables
find a pattern in the changes of each variable
find a breaking point, a condition on one of the variables that tells us we have reached the last repeating section.
here are the steps in this specific problem:
the sections are the if statements
the variables are the indexes of the elements in y we compare
the first index starts at 0 and the second at 1
both indexes increase by one after each if-statement
when the second index is greater than the last index of y, we have already checked all the elements and we can stop
so all that's left is to set the needed variables, write a while loop with the breaking condition we found, and inside the while loop put the general case of the repeating sections followed by the changes to the variables.
so:
x = input(f'Enter a list of numbers separated by a space: ')
y = (x.split(' '))
count = 0
# setting the starting values of the variables
index1 = 0
index2 = 1
# creating a loop with the breaking condition
while(index2 < len(y)):
    # the general case of the repeated code:
    if(int(y[index1]) * 2 == int(y[index2])):
        count += 1
    # changing the variables for the next loop
    index1 += 1
    index2 += 1
print(count)
We see that index2 is always just index1 + 1, so we can replace it:
x = input(f'Enter a list of numbers separated by a space: ')
y = (x.split(' '))
count = 0
index1 = 0
while(index1 + 1 < len(y)):
    if(int(y[index1]) * 2 == int(y[index1 + 1])):
        count += 1
    index1 += 1
print(count)
Note: You can use a for loop similarly to the while loop
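For comparison, here is one way the for-loop variant might look. It is a sketch: count_doubles is a hypothetical name, and it uses zip to pair each number with its successor instead of tracking index1 by hand. The input line is hard-coded in place of input().

```python
def count_doubles(numbers):
    """Count adjacent pairs where the second number is exactly twice the first."""
    count = 0
    # zip pairs each element with its successor: (n0, n1), (n1, n2), ...
    # and stops at the shorter sequence, so there is no index to overflow
    for current, following in zip(numbers, numbers[1:]):
        if current * 2 == following:
            count += 1
    return count


line = "-5 -10 0 6 12 9 36"  # stands in for input()
print(count_doubles([int(s) for s in line.split()]))  # 2
```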
So in summary, you can use recursion to solve the problem, but the recursion would just be imitating the process of a loop:
in each call, the breaking condition will be checked, the repeated code would run and the variables/parameters would change.
Hope you find this answer useful :)

Final edit: OP edited his example so my other code didn't apply
Some good questions people are asking, but in the spirit of helping, here's a recursive function that returns the count of all doubles.
def get_doubles_count_with_recursion(a_list, count, previous=None):
    while a_list:
        try:
            # "is not None" so that a previous value of 0 is not treated as missing
            first = previous if previous is not None else a_list.pop(0)
            next_item = a_list.pop(0)
        except IndexError:
            return count
        if next_item / 2 == first:
            count += 1
        return get_doubles_count_with_recursion(a_list, count, next_item)
    return count
a_list = [1, 3, 5, 10, 11, 14, 28, 56, 88, 116, 232, 464, 500]
doubles = get_doubles_count_with_recursion(a_list, 0)
print(doubles == 5)
Probably could clean it up a bit, but it's a lot easier to read than the other guy's ;)

If I'm reading your question right, you want a count of all pairs where the 2nd item is double the first. (and the 14 in the first list is a typo). In which case a simple function like this should do the job:
#a = [2,4,8,16,32]
a = [-5, -10, 0, 16, 32]
count = 0
for i, x in enumerate(a):
    # Stop before the list overflows
    if i < len(a) - 1:
        # If the next element is double the current one, increment the counter
        if a[i+1] == x * 2:
            count = count + 1
    else:
        break
print(count)

Finding longest sequence of consecutive repeats of a substring within a string

My code for the function is really messy and I cannot figure out why it returns a list of 1's. A solution would obviously be great, but I'd also be happy with advice on making the code better in general.
def cont_cons_repeats(ADN, STR, pos):
    slong = 0
    # Find start of sequence
    for i in range(len(ADN[pos:])):
        if ADN[pos + i:i + len(STR)] == STR:
            slong = 1
            pos = i + pos
            break
    if slong == 0:
        return 0
    # First run
    for i in range(len(ADN[pos:])):
        i += len(STR) - 1
        if ADN[pos + i + 1:pos + i + len(STR)] == STR:
            slong += 1
        else:
            pos = i + pos
            break
    # Every other run
    while True:
        pslong = cont_cons_repets(ADN, STR, pos)
        if pslong > slong:
            slong = pslong
        if pslong == 0:
            break
    return slong
(slong stands for size of longest sequence, pslong for potential slong, and pos for position)

Assuming you pass in pos because you want to ignore the start of the string you're searching up to pos:
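def longest_run(text, part, pos):
    m = 0  # longest run found so far
    n = 0  # length of the current run
    while pos < len(text):
        if text[pos:pos+len(part)] == part:
            n += 1
            pos += len(part)
        else:
            m = max(n, m)
            n = 0
            pos += 1
    # a run may still be in progress when the end of text is reached
    return max(n, m)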
You say your function returns a list of 1s, but that doesn't seem to match what your code does. Your provided code has some errors, including a misspelled recursive call (cont_cons_repets instead of cont_cons_repeats, which would raise a NameError), so it's impossible to say why you're getting that result.
You mentioned in the comments that you thought a recursive solution was required. You could definitely make it work as a recursive function, but in many cases where a recursive function works, you should consider a non-recursive function to save on resources. Recursive functions can be very elegant and easy to read, but remember that any recursive function can also be written as a non-recursive function. It's never required, often more resource-intensive, but sometimes just a very clean and easy to maintain solution.
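For illustration, here is what a recursive version of the same scan might look like. It is a sketch, not the asker's intended design: longest_run_recursive is a hypothetical name, it assumes part is non-empty, and it carries the current run length and the best run so far as parameters instead of local variables. It also illustrates the resource point above: each step adds a stack frame, so texts longer than roughly Python's default recursion limit (1000) will raise RecursionError, whereas the loop version has no such limit.

```python
def longest_run_recursive(text, part, pos=0, current=0, best=0):
    """Length of the longest run of back-to-back copies of part in text[pos:].
    Assumes part is non-empty. current is the run in progress, best the
    longest run seen so far."""
    if pos >= len(text):
        return max(best, current)  # end of text: a run may still be in progress
    if text.startswith(part, pos):
        # another copy of part: extend the current run and jump past it
        return longest_run_recursive(text, part, pos + len(part), current + 1, best)
    # no copy here: close the current run and advance one character
    return longest_run_recursive(text, part, pos + 1, 0, max(best, current))


print(longest_run_recursive("AGATCAGATCAGATCTTAGATC", "AGATC"))  # 3
```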

How to make this function work more efficiently to count dominators?

I have to make a function that counts the number of dominators in a given list (a number counts as a dominator if every number to its right is smaller). For example, given the list [53,7,14,11,4,7] the function would return 4, since 53, 14, 11 and 7 are dominators (the last item in the list is always a dominator). The issue is that it takes a very long time to run for larger lists. Is there a less brute-force, more efficient way to achieve the same result?
Here's what I have:
def count_dominators(items): # works but takes a very long time to execute.
    k = 0
    for idx,item in enumerate(items):
        dominator = True
        for ritem in items[idx+1:]:
            if item<=ritem:
                dominator = False
                break
        if dominator:
            k = k+1
    return k

Going backwards, you get a linear-time algorithm:
def count_dominators(items):
    rev = list(reversed(items))
    if rev:
        largest = rev[0]  # avoid shadowing the built-in max()
        count = 1
        for i in range(1,len(rev)):
            if rev[i] > largest:
                largest = rev[i]
                count += 1
    else:
        count = 0
    return count

This gives a linear time algorithm and avoids copying the input list (helpful if the list is large):
import math
def count_dominators(numbers):
    count = 0
    largest = -math.inf  # avoid shadowing the built-in max()
    for number in reversed(numbers):
        if number > largest:
            largest = number
            count += 1
    return count
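One way to gain confidence that the linear right-to-left scan counts exactly the same thing as the original quadratic definition is to compare the two on many random lists. The sketch below does that; both function names are hypothetical, and the brute-force version uses the same strict comparison as the question (an item equal to something on its right is not a dominator).

```python
import math
import random


def count_dominators_bruteforce(items):
    # O(n^2): an item is a dominator if every item to its right is strictly smaller
    return sum(all(item > right for right in items[i + 1:]) for i, item in enumerate(items))


def count_dominators_linear(items):
    # O(n): scan right to left, tracking the largest value seen so far
    count = 0
    largest = -math.inf
    for item in reversed(items):
        if item > largest:
            largest = item
            count += 1
    return count


for _ in range(1000):
    items = [random.randint(-20, 20) for _ in range(random.randint(0, 15))]
    assert count_dominators_bruteforce(items) == count_dominators_linear(items), items

print(count_dominators_linear([53, 7, 14, 11, 4, 7]))  # 4
```

The small value range (-20 to 20) makes ties frequent, which exercises the strict-comparison edge case.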

Return the number of words over min length and replace ones that aren't with word + spaces

For any name whose length is less than min_length, replace that item of the list with a new string containing the original name with space(s) added to the right-hand side to reach the minimum length.
Example: with min_length = 5, 'dog' becomes 'dog  ' (two trailing spaces) after the change.
Also, return the number of names in the list that were originally longer than min_length.
def pad_names(list_of_names, min_length):
    '''(list of str, int) -> int
    Return the number of names that are longer than the minimum length.
    >>> pad_names(list_of_names, 5)
    2
    '''
    new_list = []
    counter = 0
    for word in list_of_names:
        if len(word) > min_length:
            counter = counter + 1
    for word in list_of_names:
        if len(word) < min_length:
            new_list.append(word.ljust(min_length))
        else:
            new_list.append(word)
    return(counter)
I am currently getting 5 errors on this function:
all errors are #incorrect return value
test_02: pad_names_one_name_much_padding_changes_list
Cause: The list is not changed correctly for the case of 1 name needing padding
test_04: pad_names_one_name_needs_one_pad_changes_list
Cause: one name needs one pad changes list
test_10: pad_names_all_padded_by_3_changes_list
Cause: all padded by 3 changes list
test_12: pad_names_all_padded_different_amounts_changes_list
Cause: all padded different amounts changes list
test_20: pad_names_one_empty_name_list_changed
Cause: one empty name list changed
This function does not need to be efficient just needs to pass those tests without creating more problems

Keeping in the spirit of how you've written this, I'm guessing you want to modify the list of names so the result can be checked (as well as returning the count)...
def test(names, minlen):
    counter = 0
    for idx, name in enumerate(names):
        newname = name.ljust(minlen)
        if newname != name:
            names[idx] = newname
        else:
            counter += 1
    return counter
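The question's version builds new_list but never puts it back into list_of_names, so the caller's list is unchanged; that is likely (an assumption, since the test code isn't shown) why every "changes_list" test fails. The sketch below contrasts rebinding a parameter with mutating the caller's list in place; both function names are hypothetical.

```python
def pad_rebind(names, min_length):
    # Rebinds the local name `names` to a new list; the caller's list is untouched
    names = [s.ljust(min_length) for s in names]


def pad_in_place(names, min_length):
    # Slice assignment replaces the contents of the caller's list object
    names[:] = [s.ljust(min_length) for s in names]


a = ['dog', 'zebra']
pad_rebind(a, 5)
print(a)  # ['dog', 'zebra']

b = ['dog', 'zebra']
pad_in_place(b, 5)
print(b)  # ['dog  ', 'zebra']
```

Assigning to names[idx] inside a loop, as in the function above, has the same effect as the slice assignment: it changes the list object the caller passed in.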

Not sure I completely understand your question, but here is a fairly pythonic way to do what I think you want:
data = ['cat', 'snake', 'zebra']
# pad all strings with spaces to 5 chars minimum
# uses str.ljust for padding and a list comprehension
padded_data = [d.ljust(5, ' ') for d in data]
# get the count of original elements whose length was 5 or greater
ct = sum(len(d) >= 5 for d in data)

You mean this?
def pad_names(list_of_names, min_length):
    counter = 0
    for i, val in enumerate(list_of_names):
        if len(val) >= min_length:
            counter += 1
        elif len(val) < min_length:
            list_of_names[i] = val.ljust(min_length)
    return counter
OR:
def pad_names(words, min_length):
    i, counter = 0, 0
    for val in words:
        if len(val) >= min_length:
            counter += 1
        elif len(val) < min_length:
            words[i] = val.ljust(min_length)
        i += 1
    return counter

str.ljust() is a quick and simple way to pad strings to a minimum length.
Here's how I would write the function:
def pad_names(list_of_names, min_length):
    # count strings that exceed the min_length requirement
    count = sum(1 for s in list_of_names if len(s) > min_length)
    # replace list with updated content, using ljust() to pad strings
    list_of_names[:] = [s.ljust(min_length) for s in list_of_names]
    return count # return num of strings that exceeds min_length
While succinct, that may not be the most efficient approach for large datasets, since we're essentially creating a new list and copying it over the original one.
If performance is an issue, I'd go with a loop and only update the relevant entries.
def pad_names(list_of_names, min_length):
    count = 0
    for i, s in enumerate(list_of_names):
        if len(s) > min_length: # should it be >= ??
            count += 1 # increment counter
        else:
            list_of_names[i] = s.ljust(min_length) # update original list
    return count
