Emulate Python str.find(substring) using iteration but not built-in functions - python

How can I find the position of a substring in a string without using str.find() in Python? How should I loop it?
def find_substring(string, substring):
    for i in xrange(len(string)):
        if string[i] == substring[0]:
            print i
        else: print false
For example, when string = "ATACGTG" and substring = "ACGT", it should return 2. I want to understand how str.find() works

You can use Boyer-Moore or Knuth-Morris-Pratt. Both create tables to precalculate faster moves on each miss. The B-M page has a python implementation. And both pages refer to other string-searching algorithms.
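As a rough illustration of the "precalculated table" idea, here is a minimal Knuth-Morris-Pratt sketch. The function name kmp_find and the choice to return 0 for an empty pattern (mirroring str.find) are assumptions made for this example, not part of the answer above.

```python
def kmp_find(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    if pattern == "":
        return 0  # assumption: mirror str.find, which returns 0 here

    # failure[i] = length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k

    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        # on a mismatch, fall back using the table instead of rescanning text
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


print(kmp_find("ATACGTG", "ACGT"))  # 2
```

The table lets the search keep moving forward through the text after a miss, which is where the speed-up over the naive loop comes from.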

I can't think of a way to do it without any built-in functions at all.
I can:
def find_substring(string, substring):
    def starts_with(string, substring):
        while True:
            if substring == '':
                return True
            if string == '' or string[0] != substring[0]:
                return False
            string, substring = string[1:], substring[1:]
    n = 0
    while string != '' and substring != '':
        if starts_with(string, substring):
            return n
        string = string[1:]
        n += 1
    return -1
print(find_substring('ATACGTG', 'ACGT'))
I.e. avoiding built-ins len(), range(), etc. By not using built-in len() we lose some efficiency in that we could have finished sooner. The OP specified iteration, which the above uses, but the recursive variant is a bit more compact:
def find_substring(string, substring, n=0):
    def starts_with(string, substring):
        if substring == '':
            return True
        if string == '' or string[0] != substring[0]:
            return False
        return starts_with(string[1:], substring[1:])
    if string == '' or substring == '':
        return -1
    if starts_with(string, substring):
        return n
    return find_substring(string[1:], substring, n + 1)
print(find_substring('ATACGTG', 'ACGT'))

Under the constraint of not using find, you can use str.index instead, which raises a ValueError if the substring is not found:
def find_substring(a_string, substring):
    try:
        print(a_string.index(substring))
    except ValueError:
        print('Not Found')
and usage:
>>> find_substring('foo bar baz', 'bar')
4
>>> find_substring('foo bar baz', 'quux')
Not Found
If you must loop, you can slide along the string and, whenever the current character matches the first character of the substring, check whether the rest of the string starts with the substring; if it does, that is a match:
def find_substring(a_string, substring):
    for i, c in enumerate(a_string):
        if c == substring[0] and a_string[i:].startswith(substring):
            print(i)
            return
    else:
        print(False)
To do it with no string methods:
def find_substring(a_string, substring):
    for i in range(len(a_string)):
        if a_string[i] == substring[0] and a_string[i:i+len(substring)] == substring:
            print(i)
            return
    else:
        print(False)
I can't think of a way to do it without any built-in functions at all.

Related

Balanced Parentheses Program Python: Match Function Returning Incorrect Value

so I'm trying to do the "are the string of parentheses balanced?" program in Python and while my balanced function is working properly, the function that I created to check if the parentheses are a match is returning incorrect values. I'm going to attach the whole code, comments and all so that you can see. The first way I tried to do it was with conditional if/else statements. For that approach I kept getting False even if the parentheses were a match. For the second approach I kept getting TypeError: . This is my code.
from collections import deque
stack = deque()
#dir(stack)
#use a stack to see if an input string has a balanced set of parentheses
#function that tells which parentheses should match. will be used later
def is_match(paren1, paren2):
    #dictionary for more efficiency rather than a bunch of conditionals
    #match_dict = {
    #    ')': '(',
    #    ']': '[',
    #    '}': '{'
    #}
    if paren1 == '(' and paren2 == ')':
        return True
    if paren1 == '[' and paren2 == ']':
        return True
    if paren1 == '{' and paren2 == '}':
        return True
    else:
        return False
    #print(match_dict[paren1] == paren2)
    #return match_dict[paren1] == paren2
def is_balanced(string):
    #start with an iterative for loop to index through the string
    for i in string:
        #check to see if the index of the string is an open parentheses, if so, append to stack
        if i in '([{':
            stack.append([i])
            print(i)
        #if index is not in substring, check to see if string is empty
        else:
            if len(stack) == 0:
                return 'not balanced'
            else:
                match = stack.pop()
                if is_match(match, i) == True:
                    return 'balanced'
                else:
                    return 'not balanced'
string = ('([{}])')
is_balanced(string)
Use stack.append(i) instead of stack.append([i]) to add the element i to the deque:
def is_balanced(string):
    # start with an iterative for loop to index through the string
    for i in string:
        # check to see if the index of the string is an open parentheses, if so, append to stack
        if i in "([{":
            stack.append(i)  # <- HERE!
            print(i)
        # ...
If you want to extend the deque by appending elements from an iterable argument ([i]), use extend:
stack.extend([i])
See Python documentation for more information.
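Beyond the append fix, the question's loop also returns as soon as it sees the first closing bracket, so it never checks the rest of the string. A minimal sketch of a full check follows. It assumes a local list as the stack and a boolean return value instead of the 'balanced' / 'not balanced' strings; both are choices made for this example, not the asker's design.

```python
def is_balanced(string):
    """Return True if every bracket in string is closed in the right order."""
    pairs = {')': '(', ']': '[', '}': '{'}  # closing bracket -> expected opener
    stack = []  # local, so repeated calls do not share state
    for ch in string:
        if ch in '([{':
            stack.append(ch)
        elif ch in pairs:
            # a closer needs a matching opener on top of the stack
            if not stack or stack.pop() != pairs[ch]:
                return False
    # leftover openers mean the string is unbalanced
    return not stack


print(is_balanced('([{}])'))  # True
print(is_balanced('([)]'))    # False
```

Only returning False inside the loop, and deciding the final answer after it, is what lets the whole string be examined.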

Pattern match with character/numeric pattern in Python [duplicate]

I'm trying to write a Python function that follows the pattern below. Essentially a pattern-matching algorithm is required.
isSubstring(pattern, word) -> bool
A) isSubstring("b1tecat", "bytecat") -> True
B) isSubstring("b2ecat", "bytecat") -> True
C) isSubstring("b5cat", "bytecat") -> False
D) isSubstring("b2tecat", "bytecat") -> False
E) isSubstring("bytecat", "bytecat") -> True
F) isSubstring("2", "be") -> True
G) isSubstring("2bbbb", "b") -> False
The code below is the basic solution that works for the (E) case from above, but obviously it does nothing to account for numbers in the pattern. Have searched leetcode, hackerrank, geeksforgeeks, etc, but can't find a decent solution.
def isSubstring(substring, string):
    len_substring = len(substring)
    len_string = len(string)
    for i in range(len_string - len_substring + 1):
        j = 0
        while j < len_substring:
            if string[i+j] != substring[j]:
                break
            j += 1
        if j == len_substring:
            return True
    return False
How can I account for the numbers in the pattern?
You can keep an index into the second string, and when the character(s) from the first string are numeric, evaluate that number as an integer and advance the index by that number. I assume that if the left string looks like "a21b", it means the second string should have 21 characters between "a" and "b". To easily identify consecutive digits, I would suggest the regular expression \d+|\D to split the first string into its individual parts:
import re
def isSubstring(substring, string):
    tokens = re.findall(r"\d+|\D", substring)
    for i in range(len(string)):
        for ch in tokens:
            if i >= len(string):
                return False
            elif ch.isdigit():
                i += int(ch)  # skip that many arbitrary characters
            elif ch != string[i]:
                break
            else:
                i += 1
        else:
            # all tokens consumed; a trailing number must not run past the end
            return i <= len(string)
    return False
It will however be easier to rely on regular expressions themselves, as follows:
Convert the first string to a regular expression itself, and then see if the second string matches it:
import re
def isSubstring(substring, string):
    regex = re.sub(r"(\d+)", r".{\1}", re.escape(substring))
    return re.search(regex, string) is not None
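To see what the conversion produces, here is a small sketch that prints the generated regex and checks the regex-based approach against cases A to G from the question. The names is_substring, to_regex and cases are introduced only for this example.

```python
import re


def to_regex(pattern):
    """Turn e.g. 'b2ecat' into the regex 'b.{2}ecat'."""
    return re.sub(r"(\d+)", r".{\1}", re.escape(pattern))


def is_substring(pattern, word):
    return re.search(to_regex(pattern), word) is not None


# (pattern, word, expected result) taken from the question
cases = [
    ("b1tecat", "bytecat", True),
    ("b2ecat", "bytecat", True),
    ("b5cat", "bytecat", False),
    ("b2tecat", "bytecat", False),
    ("bytecat", "bytecat", True),
    ("2", "be", True),
    ("2bbbb", "b", False),
]

for pattern, word, expected in cases:
    result = is_substring(pattern, word)
    print(pattern, "->", to_regex(pattern), result)
    assert result == expected
```

Note that `.` does not match a newline by default, so this sketch assumes single-line input.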
Building on @trincot's answer, you can build a regex that matches what you are searching for.
For example: "b2tecat" is really just r"b..tecat"; feed that to re.search (or re.findall, if you want all of the occurrences) to look for it in your string.
import re
def isSubstring(substring, string):
    regex = r""
    for ch in re.findall(r"\d+|\D", substring):
        if ch.isdigit():
            regex += "." * int(ch)
        else:
            # escape the literal pattern characters, not the searched string
            regex += re.escape(ch)
    return re.search(regex, string) is not None
Side note: there are more "classical" ways to solve this problem, known as pattern matching with "don't cares", for example using the FFT.

How to write this iterative function to be recursive?

I need to write this iterative function to do the same thing but it must be recursive.
def task1(string: str):
    for i in range(len(string)):
        if string[i] != string[len(string) - i - 1]:
            return False
    return True
This is what I tried, but it does not work.
def task1_recursion(string: str):
    print(string)
    if len(string) > 1:
        if string[0] == task1_recursion(string[1::1]):
            return True
        else:
            return False
    else:
        return string
On the last recursion my code seems to return the string "", which makes it return False.
Just check the tip and the tail, continue with the string without them:
def task1_recursion(string: str):
    # recursion base condition (exit condition)
    if len(string) <= 1:
        return True
    # unpack values
    first, *_, last = string
    # check if they are different
    if first != last:
        return False
    # if not continue checking the remaining string
    return task1_recursion(string[1:-1])
If I understand correctly you want to check if a string is symmetric with the code in task1. My solution is below:
def fct(s: str, i: int):
    if len(s) <= 1 or i == len(s):
        return True
    return s[i] == s[len(s) - 1 - i] and fct(s, i + 1)
I tested and fct produces the same result as task1. It needs an additional parameter for the index though. But you can wrap it inside another function if you want the parameter to include only the input string. i is always set to 0 when you call the function, e.g. fct("ABCCBA", 0).
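The wrapping mentioned above could look like the following sketch. The wrapper name is_symmetric is made up for this example; fct is repeated so that the block runs on its own.

```python
def fct(s: str, i: int) -> bool:
    if len(s) <= 1 or i == len(s):
        return True
    return s[i] == s[len(s) - 1 - i] and fct(s, i + 1)


def is_symmetric(s: str) -> bool:
    """Hide the index parameter: always start the recursion at index 0."""
    return fct(s, 0)


print(is_symmetric("ABCCBA"))  # True
print(is_symmetric("ABC"))     # False
```

Because each call only passes an index instead of slicing, no new strings are created, though very long inputs can still hit Python's recursion limit.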

How to see if a string only contains certain characters, and if they do, return True, else return False: Python

For python 3.4.1, how would you go about finding if certain characters are in your string? I tried doing it this way:
def isItBinary(myString):
    for ele in myString:
        if ele == '1' or ele == '0':
            return True
        else:
            return False
The problem with this code is that if I type isItBinary('102'), it will return True. I just want it to return True if and only if it contains '1' or '0'.
I would just use the all function.
def isItBinary(myString):
    return all(x in ('0', '1') for x in myString)
The x in ('0', '1') checks that the character in x is either '0' or '1'.
You want to apply the check to every character; the way you wrote isItBinary, it returns as soon as the first character is checked.
A simple way to do what you want would be:
def binaryChar(myCharacter):
    return myCharacter == '1' or myCharacter == '0'
and then apply it to all of the chars in a string, like this:
def isItBinary(myString):
    return all(binaryChar(c) for c in myString)
Of course, these can be simplified in a more readable way:
def isItBinary(myString):
    return all(c in '01' for c in myString)
or via a lambda function:
isItBinary = lambda myString: all(c in '01' for c in myString)
Your function returned as soon as it had looked at the first character, so it never iterated over the whole string. The version below returns False at the first invalid character; otherwise it iterates over the whole string and returns True.
def isItBinary(myString):
    for ele in myString:
        if ele not in ("0", "1"):
            return False
    return True
print(isItBinary("102"))  # False
print(isItBinary("101"))  # True
Use sets:
def is_it_binary(s):
    return not (set(s) - set("01"))
If the set constructed from the string contains any characters not in the second set the subtraction gives a non-empty set. Otherwise if it contains only the characters in the second set you get an empty set which the not flips to True.
Alternatively just:
def is_it_binary(s):
    allowed = set("01")
    return all(c in allowed for c in s)
which has the advantage of short-circuiting (i.e. it bombs out as soon as an invalid character is found).
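To make the set subtraction concrete, this sketch exposes the intermediate result. The helper name invalid_chars and its allowed parameter are assumptions added for this example.

```python
def invalid_chars(s, allowed="01"):
    """Return the sorted characters of s that are not in allowed."""
    return sorted(set(s) - set(allowed))


def is_it_binary(s):
    # an empty difference means every character was allowed
    return not invalid_chars(s)


print(invalid_chars("102"))   # ['2']
print(invalid_chars("1010"))  # []
print(is_it_binary("102"))    # False
print(is_it_binary("1010"))   # True
```

Both the set version and the all() version return True for an empty string, so add an explicit check if that input should be rejected.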

How to replace the Nth appearance of a needle in a haystack? (Python)

I am trying to replace the Nth appearance of a needle in a haystack. I want to do this simply via re.sub(), but cannot seem to come up with an appropriate regex to solve this. I am trying to adapt: http://docstore.mik.ua/orelly/perl/cookbook/ch06_06.htm but am failing at spanning multilines, I suppose.
My current method is an iterative approach that finds the position of each occurrence from the beginning after each mutation. This is pretty inefficient and I would like to get some input. Thanks!
I think you mean re.sub. You could pass a function and keep track of how often it was called so far:
def replaceNthWith(n, replacement):
    def replace(match, c=[0]):
        c[0] += 1
        return replacement if c[0] == n else match.group(0)
    return replace
Usage:
re.sub(pattern, replaceNthWith(n, replacement), str)
But this approach feels a bit hacky, maybe there are more elegant ways.
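A runnable sketch of the same counting idea is shown below. It swaps the mutable default argument for a nonlocal counter, which is a stylistic substitution made here rather than the answer's own code, and the snake_case name is likewise an assumption.

```python
import re


def replace_nth_with(n, replacement):
    count = 0  # how many matches re.sub has passed to us so far

    def replace(match):
        nonlocal count
        count += 1
        # only the nth match is swapped; all others are returned unchanged
        return replacement if count == n else match.group(0)

    return replace


result = re.sub(r"cat", replace_nth_with(2, "dog"), "cat cat cat")
print(result)  # cat dog cat
```

Each call to replace_nth_with creates a fresh counter, so the returned function should be used for a single re.sub call. Since the replacement comes from a function, it is inserted literally, without backreference expansion.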
Something like this regex should help you. Though I'm not sure how efficient it is:
# N = 3, so the group skips the first N - 1 = 2 occurrences
re.sub(
    r'^((?:.*?mytexttoreplace){2}.*?)mytexttoreplace',
    r'\1yourreplacementtext.',
    'mystring',
    flags=re.DOTALL
)
The DOTALL flag is important.
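The sketch below wraps that regex in a function for a literal needle and shows why DOTALL matters for multi-line input. The function name replace_nth and its parameters are assumptions for this example, and the needle is escaped so that it is treated as plain text.

```python
import re


def replace_nth(haystack, needle, replacement, n, flags=re.DOTALL):
    """Replace the nth occurrence of the literal needle in haystack."""
    escaped = re.escape(needle)
    # group 1 captures everything up to, but not including, the nth needle
    pattern = r'^((?:.*?%s){%d}.*?)%s' % (escaped, n - 1, escaped)
    return re.sub(pattern, lambda m: m.group(1) + replacement,
                  haystack, count=1, flags=flags)


text = "a\nb a\nc a"
print(repr(replace_nth(text, "a", "X", 3)))           # 'a\nb a\nc X'
print(repr(replace_nth(text, "a", "X", 3, flags=0)))  # 'a\nb a\nc a'
```

Without DOTALL, `.` cannot cross the newlines between occurrences, so the pattern never matches and the text comes back unchanged.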
I've been struggling for a while with this, but I found a solution that I think is pretty pythonic:
>>> def nth_matcher(n, replacement):
...     def alternate(n):
...         i = 0
...         while True:
...             i += 1
...             yield i % n == 0
...     gen = alternate(n)
...     def match(m):
...         replace = next(gen)
...         if replace:
...             return replacement
...         else:
...             return m.group(0)
...     return match
...
>>> re.sub("([0-9])", nth_matcher(3, "X"), "1234567890")
'12X45X78X0'
Note that this replaces every nth match, not only the nth one.
EDIT: the matcher consists of two parts:
The alternate(n) function. This returns a generator that yields an infinite sequence of True/False values, where every nth value is True. Think of alternate(3) as producing False, False, True, False, False, True, False, ...
The match(m) function. This is the function that gets passed to re.sub: it gets the next value from alternate(n) (next(gen)) and if it's True it replaces the matched value; otherwise, it keeps it unchanged (replaces it with itself).
I hope this is clear enough. If my explanation is hazy, please say so and I'll improve it.
Could you do it using re.finditer with MatchObject.start() and MatchObject.end()?
Find all occurrences of the pattern in the string with re.finditer (re.findall returns plain strings rather than match objects), get the indices of the Nth occurrence with .start()/.end(), and build a new string with the replacement value using those indices?
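That suggestion could be sketched roughly as follows. The function name replace_nth_match is hypothetical, and the replacement is inserted as literal text here (no backreference expansion), which is a simplifying assumption.

```python
import re


def replace_nth_match(pattern, replacement, string, n, flags=0):
    """Replace only the nth match of pattern; return string unchanged if absent."""
    for i, match in enumerate(re.finditer(pattern, string, flags), start=1):
        if i == n:
            # splice the replacement in using the match's start/end indices
            return string[:match.start()] + replacement + string[match.end():]
    return string


print(replace_nth_match(r"\d", "X", "1234567890", 3))  # 12X4567890
```

The loop stops at the nth match, so the rest of the string is not scanned further.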
If the pattern ("needle") or replacement is a complex regular expression, you can't assume anything. The function "nth_occurrence_sub" is what I came up with as a more general solution:
import re
def nth_match_end(pattern, string, n, flags):
    # end index of the nth match (1-based); None if there is no such match
    for i, match_object in enumerate(re.finditer(pattern, string, flags)):
        if i + 1 == n:
            return match_object.end()
def nth_occurrence_sub(pattern, repl, string, n=0, flags=0):
    max_n = len(re.findall(pattern, string, flags))
    if abs(n) > max_n or n == 0:
        return string
    if n < 0:
        # negative n counts from the last occurrence
        n = max_n + n + 1
    sub_n_times = re.sub(pattern, repl, string, count=n, flags=flags)
    if n == 1:
        return sub_n_times
    nm1_end = nth_match_end(pattern, string, n - 1, flags)
    sub_nm1_times = re.sub(pattern, repl, string, count=n - 1, flags=flags)
    sub_nm1_change = sub_nm1_times[:-1 * len(string[nm1_end:])]
    components = [
        string[:nm1_end],
        sub_n_times[len(sub_nm1_change):]
    ]
    return ''.join(components)
I have a similar function I wrote to do this. I was trying to replicate SQL REGEXP_REPLACE() functionality. I ended up with:
def sql_regexp_replace(txt, pattern, replacement='', position=1, occurrence=0, regexp_modifier='c'):
    class ReplWrapper(object):
        def __init__(self, replacement, occurrence):
            self.count = 0
            self.replacement = replacement
            self.occurrence = occurrence
        def repl(self, match):
            self.count += 1
            if self.occurrence == 0 or self.occurrence == self.count:
                return match.expand(self.replacement)
            else:
                try:
                    return match.group(0)
                except IndexError:
                    return match.group(0)
    occurrence = 0 if occurrence < 0 else occurrence
    flags = regexp_flags(regexp_modifier)
    rx = re.compile(pattern, flags)
    replw = ReplWrapper(replacement, occurrence)
    return txt[0:position-1] + rx.sub(replw.repl, txt[position-1:])
One important note that I haven't seen mentioned is that you need to return match.expand() otherwise it won't expand the \1 templates properly and will treat them as literals.
If you want this to work you'll need to handle the flags differently (or take it from my github, it's simple to implement and you can dummy it for a test by setting it to 0 and ignoring my call to regexp_flags()).
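The point about match.expand() can be seen in isolation with a small sketch. The pattern, template and sample text below are invented for this illustration.

```python
import re

pattern = r"(\w+)@(\w+)"
template = r"\2 at \1"
text = "user@example"

# A replacement function's return value is inserted as-is,
# so the backreferences stay literal.
literal = re.sub(pattern, lambda m: template, text)

# match.expand() substitutes the groups into the template.
expanded = re.sub(pattern, lambda m: m.expand(template), text)

print(literal)   # \2 at \1
print(expanded)  # example at user
```

This difference only arises when the replacement is a function; a plain string passed to re.sub has its backreferences expanded automatically.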
