Displaying multiple substring indices within a string by using recursion

def subStringMatchExact(target, key):
    if (target.find(key) == -1):
        return []
    else:
        foundStringAt = [target.find(key)]
        target = target[foundStringAt[0] + len(key):]
        return foundStringAt + subStringMatchExact(target, key)
string = subStringMatchExact("your code works with wrongly correlated coefficients which incorporates more costs", "co")
print(string)
Current incorrect output:
[5, 22, 9, 19, 14]
I am having trouble adding the length of the part of the string consumed in the previous recursion step. For example, the second element of the list should be 29 instead of 22, as in len(previousSubstring) + len(key) - 1 + len(currentSubstring).
Any ideas to improve my code and/or fix my error too?

The fast way
You don't have to implement your own solution, it's already done! Use the finditer function from the re module:
>>> import re
>>> s = 'your code works with wrongly correlated coefficients which incorporates more costs'
>>> matches = re.finditer('co', s)
>>> positions = [ match.start() for match in matches ]
>>> positions
[5, 29, 40, 61, 77]
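One caveat, offered as a sketch rather than as part of the answer: re.finditer treats its first argument as a regular expression, so a key containing characters such as . or + would presumably need re.escape to be matched literally. The helper name find_positions is made up for illustration.

```python
import re

def find_positions(key, text):
    # find_positions is a hypothetical helper name, not from the answer above.
    # re.escape makes the key match literally even if it contains regex metacharacters.
    return [match.start() for match in re.finditer(re.escape(key), text)]

s = 'your code works with wrongly correlated coefficients which incorporates more costs'
print(find_positions('co', s))               # [5, 29, 40, 61, 77]
print(find_positions('a.b', 'a.b axb a.b'))  # [0, 8]
```

Without re.escape, the pattern 'a.b' would also match 'axb', because . matches any character.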
Your own way
If you want to make your own implementation (using recursion) you could take advantage of the extra arguments of the str.find function. Let's see what help(str.find) says about it:
S.find(sub [,start [,end]]) -> int
Return the lowest index in S where substring sub is found,
such that sub is contained within s[start:end]. Optional
arguments start and end are interpreted as in slice notation.
Return -1 on failure.
There is an extra argument called start that tells str.find where to start searching the substring. That's just what we need!
So, modifying your implementation, we can get a simple, fast and beautiful solution:
def substring_match_exact(pattern, string, where_should_I_start=0):
    # Save the result in a variable to avoid doing the same thing twice
    pos = string.find(pattern, where_should_I_start)
    if pos == -1:
        # Not found!
        return []
    # No need for an else statement
    return [pos] + substring_match_exact(pattern, string, pos + len(pattern))
What is the recursion doing here?
You first search for the substring in the string, starting at position 0.
If the substring wasn't found, an empty list [] is returned.
If the substring was found, the function returns [pos] plus all the positions where the substring appears in the string from position pos + len(pattern) onwards.
Using our brand new function
>>> s = 'your code works with wrongly correlated coefficients which incorporates more costs'
>>> substring_match_exact('co', s)
[5, 29, 40, 61, 77]

Currently, your code is attempting to find the index of co in the shortened string, rather than the original string. Therefore, while [5, 22, 9, 19, 14] may seem incorrect, the script is doing exactly what you told it to do. By including an offset, like the script below, this code could work.
def subStringMatchExact(target, key, offset=0): # note the addition of offset
    if (target.find(key) == -1):
        return []
    else:
        foundStringAt = target.find(key)
        target = target[foundStringAt + len(key):]
        foundStringAt += offset # added
        return [foundStringAt] + subStringMatchExact(target, key, foundStringAt + len(key))
        # added foundStringAt + len(key) part
string = subStringMatchExact("your code works with wrongly correlated coefficients which incorporates more costs", "co")
# no need to call w/ 0 since offset defaults to 0 if no offset is given
print(string)
I should add that making foundStringAt a list from the beginning isn't great practice when dealing with only one value, as you add some overhead with every [0] index lookup. Instead, since you want a list return type, you should just enclose it in [] in the return statement (as shown in my code).

You are always adding the position within the respective substring. In
return foundStringAt + subStringMatchExact(target, key)
the result of the function call refers to the "new" string target, which differs from the "old" one because it was redefined with target = target[foundStringAt[0] + len(key):].
So you should add exactly this value to the function call results:
foundStringAt = target.find(key)
offset = foundStringAt + len(key)
target = target[offset:]
return [foundStringAt] + [i + offset for i in subStringMatchExact(target, key)]
should do the trick (untested).
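Since the snippet above is marked as untested, here is one way it might be completed into a full function and checked against the example from the question. The function name and the early return are assumptions added for illustration; the shifting of the recursive results by offset is the idea from the snippet.

```python
def sub_string_match_exact(target, key):
    # Hypothetical full version of the snippet above; the name is illustrative.
    found_at = target.find(key)
    if found_at == -1:
        return []
    offset = found_at + len(key)
    # Indices from the recursive call are relative to target[offset:],
    # so shift each one back into the coordinates of the current string.
    return [found_at] + [i + offset for i in sub_string_match_exact(target[offset:], key)]

text = "your code works with wrongly correlated coefficients which incorporates more costs"
print(sub_string_match_exact(text, "co"))  # [5, 29, 40, 61, 77]
```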

I wouldn't bother using recursion for this, except as an exercise.
To fix the problem:
I am having trouble summing the length of the substring on the previous recursion step.
What you really want to "sum" is the amount of the string that has already been searched. Pass this to the function as a parameter (use 0 for the first call), and for the recursive call add the amount of string removed (currently foundStringAt[0] + len(key)) to the input value.
As a matter of formatting (and to make things correspond better to their names), you'll probably find it neater to let foundStringAt store the result directly (instead of that 1-element list), and do the list wrapping as part of the expression with the recursive call.
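For comparison, a non-recursive version could look roughly like the sketch below. It is not taken from any of the answers: the name find_all is hypothetical, and it assumes non-overlapping matches, like the recursive versions.

```python
def find_all(key, text):
    # find_all is a hypothetical name used only for this sketch.
    if not key:
        # An empty key would never advance the search position.
        raise ValueError("key must be a non-empty string")
    positions = []
    start = 0
    while True:
        pos = text.find(key, start)
        if pos == -1:
            return positions
        positions.append(pos)
        # Continue after this match (non-overlapping, like the recursive versions).
        start = pos + len(key)

s = 'your code works with wrongly correlated coefficients which incorporates more costs'
print(find_all('co', s))  # [5, 29, 40, 61, 77]
```

The variable start plays the role of "the amount of the string that has already been searched", so no slicing and no offset arithmetic is needed.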

Related

The longest prefix that is also suffix of two lists

So I have two lists:
def function(w,w2): # => this is how I want to define my function (no more inputs than this 2 lists)
I want to know the biggest prefix of w which is also suffix of w2.
How can I do this only with logic (without importing anything)

I can try and help get you started on this problem, but it sort of sounds like a homework question so I won't give you a complete answer (per these guidelines).
If I were you I'd start with a small case and build up from there. Let's start with:
w = "ab"
w2 = "ba"
The function for this might look like:
def function(w,w2):
    prefix = ""
    # Does the first letter of w equal the last letter of w2?
    if w[0] == w2[-1]:
        prefix += w[0]
        # What about the second letter?
        if w[1] == w2[-2]:
            prefix += w[1]
    return prefix
Then when you run print(function(w,w2)) you get ab.
This code should work for 2 letter words, but what if the words are longer? This is when we would introduce a loop.
def function(w,w2):
    prefix = ""
    for i in range(0, len(w)):
        if w[i] == w2[(i+1)*-1]:
            prefix+= w[i]
        else:
            return prefix
    return prefix
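Hopefully this code will offer a good starting place for you! One issue with what I have written is what happens if w2 is shorter than w: then you will get an index error! There are a few ways to solve this, but one way is to make sure that w is always the shorter word. Another thing to check: this compares w from the front with w2 from the back, one character at a time, so it matches the end of w2 read backwards; if the suffix should read in the same order as the prefix, compare the slices w[:i] and w2[-i:] instead. Best of luck, and feel free to DM me if you have other questions.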

A simple iterative approach could be:
Start from the longest possible prefix (i.e. all of w), and test it against a w2 suffix of the same length.
If they match, you can return it immediately, since it must be the longest possible match.
If they don't match, shorten it by one, and repeat.
If you never find a match, the answer is an empty string.
In code, this looks like:
>>> def function(w, w2):
...     for i in range(len(w), 0, -1):
...         if w[:i] == w2[-i:]:
...             return w[:i]
...     return ''
...
>>> function("asdfasdf", "qwertyasdf")
'asdf'
The slice operator (w[:i] for a prefix of length i, w2[-i:] for a suffix of length i) gracefully handles mismatched lengths by just giving you a shorter string if i is out of the range of the given string (which means they won't match, so the iteration is forced to continue until the lengths do match).
>>> function("aaaaaba", "ba")
'a'
>>> function("a", "abbbaababaa")
'a'
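The question mentions lists, and the same slicing idea appears to carry over unchanged. The sketch below is an assumption-laden variant, not the answer's code: the name longest_prefix_suffix is hypothetical, the loop starts at the shorter length to skip comparisons that cannot match, and w[:0] is used so that the "no match" result has the same type as the input.

```python
def longest_prefix_suffix(w, w2):
    # Hypothetical name; works for strings and lists alike because both support slicing.
    for i in range(min(len(w), len(w2)), 0, -1):
        if w[:i] == w2[-i:]:
            return w[:i]
    # Empty slice: '' for strings, [] for lists.
    return w[:0]

print(longest_prefix_suffix([1, 2, 3], [0, 1, 2]))        # [1, 2]
print(longest_prefix_suffix([1, 2], [3, 4]))              # []
print(longest_prefix_suffix("asdfasdf", "qwertyasdf"))    # asdf
```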

Recursively searching for a string in a list of characters

I have a problem to solve: recursively search for a string in a list (the string and the list each have length at least 2) and return its position. For example, if we had ab with the list ['a','b','c'], the function should return '(0,2)', as ab starts at index 0 and ends at index 1 (we add one more).
If we had bc with the same list, the function should return '(1,3)'.
If we had ac with the same list, the function should return not found.
Note that I'm solving a bigger problem which is to recursively search for a string in a matrix of characters (that appears from up to down, or left to right only), but I am nowhere near the solution, so I'm starting by searching for a word in a row of a matrix on a given index (as for searching for a word in a normal list), so my code might have char_mat[idx], treat it as a normal list like ['c','d','e'] for example.
Note that my code is full of bugs and it doesn't work, so I explained what I tried to do under it.
def search_at_idx(search_word, char_mat, idx, start, end):
    if len(char_mat[idx]) == 2:
        if ''.join(char_mat[idx]) == search_word:
            return 0,2
        else:
            return 'not found', 'not found'
    start, end = search_at_idx(search_word, char_mat[idx][1:], idx, start+1, end)
    return start, end
The idea of what I tried to do here is to find the base case of the recursion (when the length of the list reaches 2). For that small problem I just check whether my word equals the characters joined together as a string, and return the position of the string if it does, or not found otherwise.
Then for the recursion step, I send the list without the first character, and my start index +1, so if this function does all the work for me (the recursion hypothesis), I need to check the last element in the list so that my recursion works. (But I don't really know if this is the way to do it, since the last index might not be part of the word, so I got stuck.) I know that I made a lot of mistakes and I'm nowhere near the correct answer. I would really appreciate any explanation or help in order to understand how to do this problem and move on to my bigger problem, which is finding the string in a matrix of chars.

I've thrown together a little example that should get you a few steps ahead:
char_mat = [['c', 'e', 'l', 'k', 'v'],]
search_word = 'lk'
def search_at_idx(search_word, char_mat, idx, start=0):
    if len(char_mat[idx]) < len(search_word):
        return 'not', 'found'
    if ''.join(char_mat[idx][:len(search_word)]) == search_word:
        return start, start+len(search_word)
    char_mat[idx] = char_mat[idx][1:]
    start, end = search_at_idx(search_word, char_mat, idx, start+1)
    return start, end
print(search_at_idx(search_word, char_mat, 0))
To point out a few errors of yours:
In your recursion, you use char_mat[idx][1:]. This passes a slice of the row and not the modified matrix, so in the next call char_mat[idx] is a single character at that index rather than a row. I'd recommend using the debugger and stepping through the program to check the contents of your variables.
Instead of using start and end, you can always assume that the found word has the same length as the word you are searching for. So the end position is always start + len(search_word).
If you have any additional questions about my code, please comment.
Here's an example using a list comprehension, if that counts as a loophole:
row = char_mat[0]
windows = ["".join(row[i:i + len(search_word)]) for i in range(len(row) - len(search_word) + 1)]
foundword = windows.index(search_word) if search_word in windows else None
print((foundword, foundword + len(search_word)) if foundword is not None else 'Not found')
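For the larger problem mentioned in the question (words running left to right or top to bottom in a matrix), one possible direction is sketched below. Everything here is an assumption for illustration: the helper names search_row and search_matrix, the return format, and the requirement that the matrix is rectangular. The row search is recursive and leaves the matrix unmodified; the outer scan over rows and columns uses plain loops.

```python
def search_row(search_word, row, start=0):
    # Hypothetical helper: recursive search in one list of characters, without modifying it.
    end = start + len(search_word)
    if end > len(row):
        return None
    if ''.join(row[start:end]) == search_word:
        return start, end
    return search_row(search_word, row, start + 1)

def search_matrix(search_word, char_mat):
    # Rows are read left to right; zip(*char_mat) yields the columns, read top to bottom.
    # Assumes a rectangular matrix (all rows the same length).
    for r, row in enumerate(char_mat):
        hit = search_row(search_word, row)
        if hit is not None:
            return ('row', r) + hit
    for c, col in enumerate(zip(*char_mat)):
        hit = search_row(search_word, list(col))
        if hit is not None:
            return ('col', c) + hit
    return None

grid = [['c', 'a', 't'],
        ['x', 'b', 'o'],
        ['y', 'c', 'p']]
print(search_matrix('at', grid))   # ('row', 0, 1, 3)
print(search_matrix('top', grid))  # ('col', 2, 0, 3)
print(search_matrix('zz', grid))   # None
```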

l = ["a","b","c"]
def my_indexes(pattern, look_list, indx_val):
    if pattern == "".join(look_list)[:2]:
        return indx_val, indx_val+1
    else:
        if len(look_list) == 2:
            return None
        return my_indexes(pattern, look_list[1:],indx_val+1)
print(my_indexes("bc",l,0))
Two options:
1. We find the case we are looking for, i.e. the first two elements of our list are "ab", or
2. "a" and "b" are not the first two elements of our list. We call the same function without the first element of the list and increase indx_val so that our result will be correct. We stop doing this when len(list) == 2 and we still haven't found a match (assuming we're looking for a pattern of 2 characters).
edit: for all lengths
l = ["a","b","c","d"]
def my_indexes(pattern, look_list, indx_val):
    if pattern == "".join(look_list)[:len(pattern)]:
        return indx_val, indx_val+len(pattern) # exclusive end; subtract 1 for the index of the last matched character
    else:
        if len(look_list) <= len(pattern): # <= also stops when the list is shorter than the pattern
            return None
        return my_indexes(pattern, look_list[1:],indx_val+1)
print(my_indexes("cd",l,0))

Algorithm to print all valid combinations of n pairs of parentheses [duplicate]

This is a very popular interview question and there are tons of pages on the internet about the solution to this problem.
eg. Calculating the complexity of algorithm to print all valid (i.e., properly opened and closed) combinations of n-pairs of parentheses
So before marking this as a duplicate question please read the full details.
I implemented my own solution to this problem but I'm missing some edge cases that I'm having a hard time to figure out.
def get_all_parens(num):
    if num == 0:
        return []
    if num == 1:
        return ['()']
    else:
        sub_parens = get_all_parens(num - 1)
        temp = []
        for parens in sub_parens:
            temp.append('(' + parens + ')')
            temp.append('()' + parens)
            temp.append(parens + '()')
        return set(temp)
There is basically a recursive call on the subproblem, and parentheses are then placed around or next to each combination it returns.
For num = 4, it returns 13 possible combinations; however, the correct answer is 14, and the missing one is (())(())
I'm not sure what I'm doing wrong here. Am I moving in the right direction, or is this a completely wrong approach?
For the first-time reader, here is the question:
Implement an algorithm to print all valid (e.g., properly opened and closed) combinations of n pairs of parentheses.
E.g. Input: 3, Output: ()()(), ()(()), (())(), (()()), ((()))

It looks like a wrong approach.
As you can see in your failure case (())(()), your algorithm could only obtain such a string by placing parentheses around ())(() (the other two rules would need ())(()) or (())((), which are not valid either). Unfortunately none of these is a valid combination, so they cannot be generated: the prior recursive call only builds valid ones.
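A commonly used alternative, shown here only as a sketch and not as the approach of any answer on this page, builds each string one character at a time and keeps counts of opened and closed parentheses, so that only valid prefixes are ever extended. The names valid_parens and build are hypothetical.

```python
def valid_parens(n):
    # Hypothetical helper illustrating a backtracking approach.
    results = []

    def build(prefix, opened, closed):
        if len(prefix) == 2 * n:
            results.append(prefix)
            return
        if opened < n:
            # We may still open another pair.
            build(prefix + '(', opened + 1, closed)
        if closed < opened:
            # We may only close a pair that is currently open.
            build(prefix + ')', opened, closed + 1)

    build('', 0, 0)
    return results

print(len(valid_parens(3)))            # 5
print(len(valid_parens(4)))            # 14
print('(())(())' in valid_parens(4))   # True
```

Note that with this sketch valid_parens(0) yields a list containing the empty string, which differs from the [] returned by the code in the question.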

There are several things to correct in your approach:
recursion: it is not the fastest solution
returning a set built from a list with duplicates (did you consider using a set from the start instead of a list?)
generating only 3 types of new combinations:
a) surrounding parentheses
b) parentheses on the left
c) parentheses on the right,
which also generates many duplicates and omits the symmetrical results
You can add one additional loop; it will not fix the problems mentioned above, but it will add the missing results to the returned set.
I modified your function by adding only one loop (my proposal is to walk through the positions of ( and insert a new pair () inside the string at those points):
def get_all_parens(num):
    if num == 0:
        return []
    if num == 1:
        return ['()']
    else:
        sub_parens = get_all_parens(num - 1)
        temp = []
        for parens in sub_parens:
            temp.append('()' + parens)
            temp.append('(' + parens + ')')
            temp.append(parens + '()')
            # added loop
            last_index = 0
            for _ in range(parens.count('(')):
                temp.append(parens[:last_index] + '()' + parens[last_index:])
                last_index = parens.index('(', last_index) + 1
            # end of added loop
        return set(temp)
EDIT:
I propose an iterative (non-recursive) version of that algorithm:
def get_all_combinations(n):
    results = set()
    for i in range(n):
        new_results = set()
        if i == 0:
            results = {"()"}
            continue
        for it in results:
            output = set()
            last_index = 0
            for _ in range(it.count("(")):
                output.add(it[:last_index] + "()" + it[last_index:])
                last_index = it.index("(", last_index) + 1
            output.add(it[:last_index] + "()" + it[last_index:])
            new_results.update(output)
        results = new_results
    return list(results), len(results)

Extra space added on reverse string function

I'm trying to figure out how to reverse a string using Python without using the [::-1] solution.
My code seems to work fine for several test cases, but it adds an extra space for one instance and I can't figure out why.
def reverse(s):
    r = list(s)
    start, end = 0, len(s) - 1
    x = end//2
    for i in range(x):
        r[start], r[end] = r[end], r[start]
        start += 1
        end -= 1
    print(''.join(r))
reverse('A man, a plan, a canal: Panama')
# returns 'amanaP :lanac  a,nalp a ,nam A'
# note the double space after 'lanac' - don't know why
reverse('a monkey named fred, had a banana')
# returns 'ananab a dah ,derf deman yeknom a'
reverse('Able was I ere I saw Elba')
# returns 'ablE was I ere I saw elbA'

Change
x = end//2
to
x = len(s)//2

The bug appears to be related to the handling of even-length strings. A much easier way to build a string reverse function would be:
def reverse(s):
    result = ""
    # reversed() returns an iterator that yields the items of a string, list, or other sequence in reverse order.
    for character in reversed(s):
        result += character  # Add that character to the result.
    return result
This function works regardless of string length. I hope this helps.

Using the technique you're using, it's probably clearer to test start against end directly rather than trying to manage lengths and indexes. You can do that with while start < end:. For example:
def reverse(s):
    r = list(s)
    start, end = 0, len(s) - 1
    while start < end:
        r[start], r[end] = r[end], r[start]
        start += 1
        end -= 1
    print(''.join(r))
reverse('A man, a plan, a canal: Panama')
prints
amanaP :lanac a ,nalp a ,nam A

Your bug is in the boundary condition for an even-length string:
start, end = 0, len(s) - 1
x = end//2
for i in range(x):
For instance, with 8 characters, your for iterator is (range(3)), which gets you only the first three positions.
You fail to swap the middle pair.
The "clean" way to fix this is to change your x calculation:
x = (end+1) // 2
or, as others have said
x = len(s) // 2
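A quick way to convince yourself that the corrected count covers both even and odd lengths is to compare against slicing on a few inputs. This is only a checking sketch: it returns the result instead of printing it, and it uses [::-1] purely as a reference for the comparison.

```python
def reverse(s):
    r = list(s)
    start, end = 0, len(s) - 1
    # len(s) // 2 is the number of pairs to swap; for odd lengths the middle item stays put.
    for _ in range(len(s) // 2):
        r[start], r[end] = r[end], r[start]
        start += 1
        end -= 1
    return ''.join(r)

for sample in ['', 'a', 'ab', 'abc', 'abcd', 'A man, a plan, a canal: Panama']:
    # The slice is used only as a reference answer for the check.
    assert reverse(sample) == sample[::-1], sample
print(reverse('A man, a plan, a canal: Panama'))  # amanaP :lanac a ,nalp a ,nam A
```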

Taking long time to execute Python code for the definition

This is the problem definition:
Given a string of lowercase letters, determine the index of the
character whose removal will make it a palindrome. If the string is already a
palindrome or no such character exists, then print -1. There will always
be a valid solution, and any correct answer is acceptable. For
example, if the string is "bcbc", we can either remove 'b' at index 0 or 'c' at index 3.
I tried this code:
#!/bin/python
import sys
def palindromeIndex(s):
    # Complete this function
    length = len(s)
    index = 0
    while index != length:
        string = list(s)
        del string[index]
        if string == list(reversed(string)):
            return index
        index += 1
    return -1
q = int(raw_input().strip())
for a0 in xrange(q):
    s = raw_input().strip()
    result = palindromeIndex(s)
    print(result)
This code works for small inputs, but it takes a very long time for larger ones.
Here is the sample: Link to sample
The sample above is the larger input that has to be processed. The solution must also work for the following input:
Input (stdin)
3
aaab
baa
aaa
Expected Output
3
0
-1
How to optimize the solution?

Here is code that is optimized for this task:
def palindrome_index(s):
    # Complete this function
    rev = s[::-1]
    if rev == s:
        return -1
    for i, (a, b) in enumerate(zip(s, rev)):
        if a != b:
            candidate = s[:i] + s[i + 1:]
            if candidate == candidate[::-1]:
                return i
            else:
                return len(s) - i - 1
First we calculate the reverse of the string. If rev equals the original, it was a palindrome to begin with. Then we iterate over the characters from both ends, keeping track of the index as well:
for i, (a, b) in enumerate(zip(s, rev)):
a will hold the current character from the beginning of the string and b from the end. i will hold the index from the beginning of the string. If at any point a != b then it means that either a or b must be removed. Since there is always a solution, and it is always one character, we test if the removal of a results in a palindrome. If it does, we return the index of a, which is i. If it doesn't, then by necessity, the removal of b must result in a palindrome, therefore we return its index, counting from the end.

There is no need to convert the string to a list, as you can compare strings directly. This removes a computation that is performed many times, thus speeding up the process. To reverse a string, all you need to do is use slicing:
>>> s = "abcdef"
>>> s[::-1]
'fedcba'
So using this, you can re-write your function to:
def palindromeIndex(s):
    if s == s[::-1]:
        return -1
    for i in range(len(s)):
        c = s[:i] + s[i+1:]
        if c == c[::-1]:
            return i
    return -1
and the tests from your question:
>>> palindromeIndex("aaab")
3
>>> palindromeIndex("baa")
0
>>> palindromeIndex("aaa")
-1
and for the first one in the link that you gave, the result was:
16722
which computed in about 900ms compared to your original function which took 17000ms but still gave the same result. So it is clear that this function is a drastic improvement. :)
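If even this is too slow for very long inputs, a two-pointer variant might help, since it avoids building a new string for every candidate index. The sketch below is an illustration under the problem statement quoted in the question, not code from either answer; the names palindrome_index_two_pointer and is_palindrome are hypothetical.

```python
def palindrome_index_two_pointer(s):
    def is_palindrome(lo, hi):
        # Checks s[lo..hi] (inclusive) in place, without copying the string.
        while lo < hi:
            if s[lo] != s[hi]:
                return False
            lo += 1
            hi -= 1
        return True

    lo, hi = 0, len(s) - 1
    while lo < hi:
        if s[lo] != s[hi]:
            # The character to remove must be one of the two that disagree.
            if is_palindrome(lo + 1, hi):
                return lo
            if is_palindrome(lo, hi - 1):
                return hi
            return -1
        lo += 1
        hi -= 1
    return -1  # already a palindrome

print(palindrome_index_two_pointer("aaab"))  # 3
print(palindrome_index_two_pointer("baa"))   # 0
print(palindrome_index_two_pointer("aaa"))   # -1
```

Each character is visited a constant number of times, so the running time grows linearly with the length of the string.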
