python basic regex function - python

I am trying to write a function that implements a simple regex matching algorithm. The special characters "*" and "?" should stand for exactly one character and for any number of characters (n>=0), respectively. For example the strings
y="abc" and x="a*c",
y="abc" and x="a?c",
y="abddddzfjc" and x="a?" or x="a?c"
should return True, whereas the strings
y="abcd" and x="a*d",
y="abcdef" and x="a?d*"
should return False.
My method is to run in a loop and shorten the strings as each subsequent match is identified, which works fine for identical matches or a single * with alphabet character matches, but I am stumped about how to do it for edge cases like the last example. To handle the case where "?" has n degrees of freedom, I loop forward in the right string to find the next alphabet character, then try to find that character in the left string, looking from right to left. I am sure there is a more elegant way (maybe with a generator?!).
def match_func(x, y):
    x, y = list(x), list(y)
    if len(x) == len(y) == 1:
        if x[0] == y[0] or bool((set(x) | set(y)) & {"?", "*"}):
            return True
    elif len(x) > 0 and len(y) == 0:
        return False
    else:
        for ix, char in enumerate(x):
            if char == y[ix] or char == "*":
                return match_func(x[ix+1:], y[ix+1:])
            else:
                if char == "?":
                    if ix == len(x) - 1: return True
                    ## check if the next letter in x has an eventual match in y
                    peek = ix + 1
                    next_char = x[peek]
                    while peek < len(x) - 1:
                        next_char = x[peek]
                        if next_char.isalpha():
                            break
                        else: peek += 1
                    if peek == len(x) - 1:
                        return True
                    ys = ''.join(y)
                    next_char_ix = ys[ix].rfind(next_char)
                    ## search y for next possible match?
                    if next_char_ix != -1:
                        return match_func(x[peek:], y[next_char_ix:])
                    else:
                        return False
                else:
                    return False
    return True

First decide whether to make your match algorithm a minimal or maximal search. Meaning, if your pattern is a, and your subject string is aa, does the match occur at the first or second position? As you state the problem, either choice seems to be acceptable.
Having made that choice, it will become clear how you should traverse the string - either as far to the right as possible and then working backward until you either match or fail; or starting at the left and backtracking after each attempt.
I recommend a recursive implementation either way. At each position, evaluate whether you have a possible match. If so, make your recursive call advancing the appropriate amount down both the pattern and subject string. If not, give up. If there is no match for the first character of the pattern, advance only the subject string (according to your minimal/maximal choice) and try again.
The tricky part is, you have to consider variable-length tokens in your pattern as possible matches even if the same character also matches a literal character following that wildcard. That puts you in the realm of depth-first search. Evaluating patterns like a?a?a?a on subject strings like aaaabaaaa will be lots of fun, and if you push it too far, may take years to complete.
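As a rough illustration of the recursive, backtracking idea described above, here is a minimal sketch. It assumes the semantics from the question ("*" consumes exactly one character, "?" consumes zero or more, and the whole subject string must be matched). The names wildcard_match and match_from are made up for this example, and the memoization via lru_cache is an addition of mine to keep patterns like a?a?a?a from blowing up; it is not something the question or this answer prescribes.

```python
from functools import lru_cache


def wildcard_match(pattern: str, text: str) -> bool:
    """Return True if `text` matches `pattern` in full.

    Assumed semantics (taken from the question):
      "*" matches exactly one character,
      "?" matches any run of zero or more characters.
    """

    @lru_cache(maxsize=None)
    def match_from(p: int, t: int) -> bool:
        # Pattern exhausted: succeed only if the text is exhausted too.
        if p == len(pattern):
            return t == len(text)

        token = pattern[p]
        if token == "?":
            # Either "?" matches nothing more (move on in the pattern),
            # or it swallows one more character and we try again.
            return match_from(p + 1, t) or (
                t < len(text) and match_from(p, t + 1)
            )

        # "*" or a literal: exactly one character must be consumed.
        if t < len(text) and (token == "*" or token == text[t]):
            return match_from(p + 1, t + 1)
        return False

    return match_from(0, 0)


print(wildcard_match("a?c", "abddddzfjc"))
print(wildcard_match("a?d*", "abcdef"))
```

Because each (pattern position, text position) pair is evaluated at most once, the work is bounded by the product of the two lengths. Very long inputs could still hit Python's recursion limit, so treat this as a sketch rather than a finished solution.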
Your professor chose well the regex operators to give you to make the assignment of meaningful depth, without the tedium of writing a full-on parser and lexer just to make the thing work.
Good luck!

Related

python function returns true when it should return false

I am trying to write a function to check whether a string is a palindrome or not, but every string is showing as a palindrome
def is_palindrome(input_string):
    x = 0
    reverse_string = ""
    while x < len(input_string):
        reverse_string += input_string[x]
        x = x + 1
    if input_string == reverse_string:
        return True
    else:
        return False
print(is_palindrome("abc"))  # Should be False but it returns True
In your code, the variable "reverse_string" will always be equal to "input_string" since you are just appending the characters in the same order with the += operator.
A simple way to reverse a string in Python is to use slicing like this:
def is_palindrome(input_string):
    if input_string == input_string[::-1]:
        return True
    return False
input_string[::-1] means "walk through the whole string with a step of -1", i.e. from the last character back to the first.
Your problem is in the reversal of the string: your x goes from 0 to len(input_string)-1, but it should go the other way.
That's why it's important to break code into functions that do one and only one thing (at least in the beginning).
In this case it is overkill, but it will help you when your code grows more complex.
Your function can then be simplified as:
def is_palindrome(input_string):
    return input_string == reverse_string(input_string)
If you look at it, it is self-explanatory: is the input string equal to its reverse?
Now we need to implement the function reverse_string.
The advantage of having a function that just reverses a string is that we can do a lot of tests on it to check just this particular function
In your case, you can use negative indexes, or you can start with the index set to len(input_string)-1 and go towards 0.
But it's also a good moment to learn about string slicing and how to do things in a pythonic way, so the reverse function can be written as:
def reverse_string(input_string):
    return input_string[::-1]
Feel free to put in your own implementation of reverse_string if you are not yet confident with string slicing, but with this function you have separated two different things: reversing a string and checking whether a string is a palindrome. You can even reuse that reverse_string function later on.
Now we can test it with many cases until we are confident that it works as expected.
I'd recommend taking a look at unit tests. It might seem too much for such an easy problem, but it will help you a lot in the future.
Just test what happens if you pass a palindrome, a non-palindrome, an empty string, a number, a None...
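To make the testing suggestion concrete, here is a small sketch using the standard unittest module. The test class name and the sample inputs are my own choices for illustration; the two functions are the ones from this answer. Note that with the slicing implementation, a number or None raises TypeError, which may or may not be the behaviour you want.

```python
import unittest


def reverse_string(input_string):
    return input_string[::-1]


def is_palindrome(input_string):
    return input_string == reverse_string(input_string)


class PalindromeTests(unittest.TestCase):
    def test_palindrome(self):
        self.assertTrue(is_palindrome("level"))

    def test_non_palindrome(self):
        self.assertFalse(is_palindrome("abc"))

    def test_empty_string(self):
        # An empty string equals its own reverse.
        self.assertTrue(is_palindrome(""))

    def test_non_string_input(self):
        # Integers and None do not support slicing, so a TypeError is raised.
        for bad_input in (121, None):
            with self.assertRaises(TypeError):
                is_palindrome(bad_input)
```

Saved in a file, these tests can be run with python -m unittest.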

Palindrome vs Symmetry and how to deal with 2 word character

Can we say that a word with 2 characters is a palindrome? Like "oo" is a palindrome and "go" is not?
I am going through a program which is detecting a palindrome from GeeksForGeeks, but it detects go as palindrome as well, though it is not:
# Function to check whether the
# string is palindrome or not
def palindrome(a):
    # finding the mid, start
    # and last index of the string
    mid = (len(a)-1)//2
    start = 0
    last = len(a)-1
    flag = 0
    # A loop till the mid of the
    # string
    while(start<mid):
        # comparing letters from right
        # from the letters from left
        if (a[start]== a[last]):
            start += 1
            last -= 1
        else:
            flag = 1
            break;
    # Checking the flag variable to
    # check if the string is palindrome
    # or not
    if flag == 0:
        print("The entered string is palindrome")
    else:
        print("The entered string is not palindrome")
# ... other code ...
# Driver code
string = 'amaama'
palindrome(string)
Is there any particular length or condition defined for a word to be a palindrome? I read the Wikipedia article, but did not find any particular condition on the length of a palindrome.
The above program detects "go" as palindrome because the midpoint is 0, which is "g" and the starting point is 0, which is also "g", and so it determines it is a palindrome. But I am still confused about the number of characters. Can a two-character word be a palindrome? If yes, then do we need to just add a specific condition for it: if word[0] == word[1]?
Let's take a look at the definition of palindrome, according to Merriam-Webster:
a word, verse, or sentence (such as "Able was I ere I saw Elba") or a number (such as 1881) that reads the same backward or forward
Therefore, two-character words (or any even-numbered character words) can also be palindromes. The example code is simply poorly written and does not work correctly in the case of two-character strings. As you have correctly deduced, it sets the mid variable to 0 if the length of the string is 2. The loop, while (start < mid), is then instantly skipped, as start is also initialised as 0. Therefore, the flag variable (initialised as 0, corresponding to 'is a palindrome') is never changed, so the function incorrectly prints that go is a palindrome.
There are a number of ways in which you can adapt the algorithm; the simplest would be to check up to and including the middle character index, by changing the while condition to start <= mid. Note that this is only the simplest way to adapt the given code. The simplest piece of Python code to check whether a string is palindromic is significantly simpler (as you can easily reverse a string using a[::-1], and compare this to the original string).
(Edit to add: the other answer by trincot actually shows that the provided algorithm is incorrect for all even-numbered character words. The fix suggested in this answer still works.)
Your question is justified. The code from GeeksForGeeks you have referenced is not giving the correct result. In fact it also produces wrong results for longer words, like "gang".
The above program detects "go" as palindrome because the midpoint is 0, which is "g" and the starting point is 0, which is also "g", and so it determines it is a palindrome.
This is indeed where the algorithm goes wrong.
...then do we need to just add a specific condition for it: if word[0] == word[1]?
Given the while condition is start<mid, the midpoint should be the first index after the first half of the string that must be verified, and so in the case of a 2-letter word, the midpoint should be 1, not 0.
It is easy to correct the error in the program. Change:
mid = (len(a)-1)//2
To:
mid = len(a)//2
That fixes the issue. No extra line of code is needed to treat this as a separate case.
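For reference, here is a sketch of the same two-pointer loop with the corrected midpoint. It is my own condensed variant rather than the GeeksForGeeks code: it returns a boolean instead of printing, and the name is_palindrome is chosen just for this example.

```python
def is_palindrome(a: str) -> bool:
    # With mid = len(a) // 2 the loop covers the whole first half,
    # for both even and odd lengths.
    mid = len(a) // 2
    start, last = 0, len(a) - 1
    while start < mid:
        if a[start] != a[last]:
            return False
        start += 1
        last -= 1
    return True


for word in ("go", "oo", "gang", "amaama"):
    print(word, is_palindrome(word))
```

For a 2-letter word, mid is 1, so the single pair of characters is compared; for an odd length, the middle character is skipped, which is fine because it always equals itself.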
I did not find any particular condition on the length of a palindrome.
And right you are: there is no such condition. The GeeksForGeeks code made you doubt, but you were right from the start, and the code was wrong.

hackerrank password cracker timeout due to recursion

This problem, simply restated, is this: given a bunch of strings and a target string, which combinations of the given strings can be joined together to form the target string, with and without repetition?
e.g.
strings: we do what we must because we can
target: wedowhatwemustbecausewecan
output: we do what we must because we can
The approach I took is to remove every longer word from the target until the target becomes empty. If the target becomes empty, then just return the output. If the longer words don't lead to a solution, then try with shorter words and so on. I am also using memoization to make sure that if the target has already been tried then it just returns, i.e. backtracking with memoization.
This approach passed all the test cases except 2, where I am getting a timeout.
def recursions(word_map, paswd, output, remember):
    flag = 0
    if len(paswd) == 0:
        return 1
    if paswd in remember:
        return flag
    for char in paswd:
        for word in (word_map[char] if char in word_map else []):
            if paswd.startswith(word):
                output.append(word + " ")
                if recursions(word_map, paswd[len(word):], output, remember):
                    return 1
                output.pop()
    remember[paswd] = 1
    return flag
Please help in providing a hint. Complete solution is here.
You could try a dynamic programming approach where you mark the ending locations of each password. Start by trying every password at the beginning of the longer string. If it fits there, mark down the ending position in the longer string. You could then repeat the same process for every location in the longer string where the previous location is marked as an ending position.
Hope this helps you get started. I've intentionally left out some of the information required for a full solution, so let me know in the comments if you're still stuck.
EDIT Here's a short example on what I'm talking about. It doesn't allow you to reconstruct the solution but it shows how to do the matching without recursion:
passwords = ['foo', 'bar']
login = 'foobar'
ends_here = [False] * len(login)
for i in range(len(ends_here)):
    # Match password at the beginning or if password match
    # ended to previous index
    if i == 0 or ends_here[i - 1]:
        for pw in passwords:
            if login.find(pw, i, i + len(pw)) != -1:
                ends_here[i + len(pw) - 1] = True
print(ends_here)
print('We can match whole login attempt:', ends_here[-1])
Output:
[False, False, True, False, False, True]
We can match whole login attempt: True
EDIT Took a closer look at the code provided in the question. The issue is on the line where matched strings are filtered by the characters contained in the target: for char in paswd:. Instead of filtering for every character in the target string, the filtering should be done for every unique character: for char in set(paswd):. Fix that and the solution runs much faster, though it would probably be even faster without that kind of filtering at all, since the maximum number of shorter strings is 10.
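The short example above only reports whether a match exists. One possible way to extend the same idea so that the words can be recovered is sketched below. The function name crack and the helper lists reachable and prev are hypothetical, and this is only one of several ways to do the bookkeeping: for each reachable position it remembers which password ended there, then walks backwards from the end.

```python
def crack(passwords, login):
    """Return a list of passwords that concatenate to `login`, or None."""
    n = len(login)
    reachable = [False] * (n + 1)  # reachable[i]: login[:i] can be split
    prev = [None] * (n + 1)        # prev[i]: a password ending right before i
    reachable[0] = True

    for i in range(n):
        if not reachable[i]:
            continue
        for pw in passwords:
            end = i + len(pw)
            if end <= n and not reachable[end] and login.startswith(pw, i):
                reachable[end] = True
                prev[end] = pw

    if not reachable[n]:
        return None

    # Walk back from the end, collecting one password per step.
    words = []
    pos = n
    while pos > 0:
        words.append(prev[pos])
        pos -= len(prev[pos])
    return words[::-1]


print(crack(["we", "do", "what", "must", "because", "can"],
            "wedowhatwemustbecausewecan"))
```

Each position is processed once and each password is tried at most once per position, so no recursion or memo dictionary is needed.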

How efficient is list.index(value, start, end)?

Today I realized that python's list.index can also take an optional start (and even end) parameter.
I was wondering whether or not this is efficiently implemented and which of these two is better:
pattern = "qwertyuytresdftyuioknn"
words_list = ['queen', 'quoin']
for word in words_list:
    i = 1
    for character in word:
        try:
            i += pattern[i:].index(character)
        except ValueError:
            break
    else:
        yield word
or
pattern = "qwertyuytresdftyuioknn"
words_list = ['queen', 'quoin']
for word in words_list:
    i = 1
    for character in word:
        try:
            i = pattern.index(character, i)
        except ValueError:
            break
    else:
        yield word
So basically i += pattern[i:].index(character) vs i = pattern.index(character, i).
Searching for this on generic_search_machine returns nothing helpful, except a lot of beginner tutorials trying to teach me what a list is.
Background:
This code tries to find all words from words_list which match pattern. pattern is a list of characters a user entered by swiping over the keyboard, like on most modern mobile device's keyboards.
In the actual implementation there is the additional requirement that the returned word should be longer than 5 characters and the first and last character have to exactly match. These lines are omitted here for brevity, since they are trivial to implement.
This calls a built-in function implemented in C:
i = pattern.index(character, i)
Even without looking at the source code, you can reasonably assume that the underlying implementation is smart enough to do that efficiently, i.e. that it does not look at the first i values in the list.
As a rule of thumb, using built-in functionality is faster than (or at least as fast as) the best thing you can implement yourself.
The attempt to make it better:
i += pattern[i:].index(character)
This is definitely worse. It makes a copy of pattern[i:] and then looks for character in it.
So, in the worst case, if you have a pattern of 1 GB and i=1, this copies 1 GB of data in memory in an attempt to skip the first element (which would have been skipped anyway).
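If you want to check this on your own machine, a rough measurement along the following lines should show the difference. The helper names find_with_slice and find_with_start and the list size are arbitrary choices for this sketch, and the actual timings will vary by machine and Python version, so no numbers are claimed here.

```python
import timeit


def find_with_slice(seq, value, start):
    # Copies seq[start:] first, then searches the copy.
    return start + seq[start:].index(value)


def find_with_start(seq, value, start):
    # Searches in place, beginning at index `start`; no copy is made.
    return seq.index(value, start)


data = list(range(200_000))
target = data[-1]

# Both approaches must agree on the resulting index.
assert find_with_slice(data, target, 1) == find_with_start(data, target, 1)

slice_time = timeit.timeit(lambda: find_with_slice(data, target, 1), number=3)
start_time = timeit.timeit(lambda: find_with_start(data, target, 1), number=3)
print(f"slice then index: {slice_time:.4f}s")
print(f"index with start: {start_time:.4f}s")
```

Both variants still scan up to the matching element; the slice version additionally pays for building the copy on every call.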

Count occurrences of a given character in a string using recursion

I have to make a function called countLetterString(char, str) where
I need to use recursion to find the amount of times the given character appears in the string.
My code so far looks like this.
def countLetterString(char, str):
    if not str:
        return 0
    else:
        return 1 + countLetterString(char, str[1:])
All this does is count how many characters are in the string, but I can't seem to figure out how to split the string and then check whether the character that was split off is the given character.
The first step is to break this problem into pieces:
1. How do I determine if a character is in a string?
If you are doing this recursively you need to check the first character of the string.
2. How do I compare two characters?
Python has a == operator that determines whether or not two things are equivalent.
3. What do I do after I know whether or not the first character of the string matches or not?
You need to move on to the remainder of the string, yet somehow maintain a count of the characters you have seen so far. This is normally very easy with a for-loop because you can just declare a variable outside of it, but recursively you have to pass the state of the program to each new function call.
Here is an example where I compute the length of a string recursively:
def length(s):
    if not s:  # test if there are no more characters in the string
        return 0
    else:  # maintain a count by adding 1 each time you return
        # get all but the first character using a slice
        return 1 + length(s[1:])
From this example, see if you can complete your problem. Yours will have a single additional step.
4. When do I stop recursing?
This is always a question when dealing with recursion: when do I need to stop calling myself? See if you can figure this one out.
EDIT:
not s will test if s is empty, because in Python the empty string "" evaluates to False; and not False == True
First of all, you shouldn't use str as a variable name as it will mask the built-in str type. Use something like s or text instead.
The if str == 0: line will not do what you expect, the correct way to check if a string is empty is with if not str: or if len(str) == 0: (the first method is preferred). See this answer for more info.
So now you have the base case of the recursion figured out, so what is the "step". You will either want to return 1 + countLetterString(...) or 0 + countLetterString(...) where you are calling countLetterString() with one less character. You will use the 1 if the character you remove matches char, or 0 otherwise. For example you could check to see if the first character from s matches char using s[0] == char.
To remove a single character in the string you can use slicing, so for the string s you can get all characters but the first using s[1:], or all characters but the last using s[:-1]. Hope that is enough to get you started!
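Putting those pieces together, one possible shape of the function is sketched below. The name count_letter and the parameter name s (used instead of str, as suggested above) are my own choices for illustration.

```python
def count_letter(char, s):
    # Base case: an empty string contains no occurrences.
    if not s:
        return 0
    # Step: add 1 if the first character matches, 0 otherwise,
    # then recurse on the string without its first character.
    first_matches = 1 if s[0] == char else 0
    return first_matches + count_letter(char, s[1:])


print(count_letter("i", "This is a string!"))
```

Each call handles exactly one character, so the recursion depth equals the length of the string; very long strings would exceed Python's default recursion limit.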
Reasoning about recursion requires breaking the problem into "regular" and "special" cases. What are the special cases here? Well, if the string is empty, then char certainly isn't in the string. Return 0 in that case.
Are there other special cases? Not really! If the string isn't empty, you can break it into its first character (the_string[0]) and all the rest (the_string[1:]). Then you can recursively count the number of character occurrences in the rest, and add 1 if the first character equals the char you're looking for.
I assume this is an assignment, so I won't write the code for you. It's not hard. Note that your if str == 0: won't work: that's testing whether str is the integer 0. if len(str) == 0: is a way that will work, and if str == "": is another. There are shorter ways, but at this point those are probably clearest.
First of all, I would suggest not using char or str as names. str is a built-in type, and while I don't believe char would give you any problems, it's a reserved word in many other languages. Second, you can achieve the same functionality using count, as in:
letterstring="This is a string!"
letterstring.count("i")
which would give you the number of occurrences of i in the given string, in this case 3.
If you need to do it recursively purely as an exercise, the thing to remember with recursion is carrying some condition or counter over with each call and placing some kind of conditional within the code that will change it. For example:
def countToZero(count):
    print(str(count))
    if count > 0:
        countToZero(count-1)
Keep in mind this is a very quick example, but as you can see, on each call I print the current value and then the function calls itself again while decrementing the count. Once the count is no longer greater than 0 the function will end.
Knowing this, you will want to keep track of your count, the index you are comparing in the string, the character you are searching for, and the string itself, given your example. Without doing the code for you, I think that should at least give you a start.
You have to decide a base case first. The point where the recursion unwinds and returns.
In this case the base case is the point where there are no (further) instances of a particular character, say X, in the string (if string.find(X) == -1: return count). There the function makes no further calls to itself and returns the number of instances found so far, trusting the information passed in by its caller.
Recursion means a function calling itself from within, which creates a stack of calls (at least in Python). Every call is independent and has a specific purpose, with no knowledge of what happened before it was called unless that knowledge is provided; it adds its own result to what it was given and returns (not strictly speaking). That information has to be supplied by its invoker, its parent, or it can be kept in global variables, which is not advisable.
So in this case that information is how many instances of that particular character were found by the parent function in the first fraction of the string. The initial function call, made by us, also needs to be supplied with that information. Since we are the root of all function calls and have not traversed the string, we have no idea how many Xs there are, so we can safely tell the initial call: I have found zero X so far, here is the entire string, please traverse it and find out how many X are in there. As a convenience, this 0 can be the default argument of the function; otherwise you have to supply the 0 every time you make the call.
When will the function call another function?
Recursion is breaking down the task into the most granular level (strictly speaking, maybe) and leaving the rest to the (grand)child(ren). The most granular breakdown of this task is finding a single instance of X, passing the rest of the string after the point where it occurred (exclusive, i.e. from point + 1) to the next call, and adding 1 to the count supplied by the parent function.
if not string.find(X) == -1:
    string = string[string.find(X) + 1:]
    return countLetterString(X, string, count = count + 1)
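Combining the base case and the recursive step described above gives something like the following sketch. The function name count_with_find is made up for this example, and it assumes x is a single, non-empty character (with an empty x, find always succeeds at position 0 and the recursion would never end).

```python
def count_with_find(text, x, count=0):
    # Base case: no further instance of x, so report what was found so far.
    pos = text.find(x)
    if pos == -1:
        return count
    # Recursive step: skip past the instance just found and add 1 to the
    # count handed down by the caller.
    return count_with_find(text[pos + 1:], x, count + 1)


print(count_with_find("foobar", "o"))
```

Here the number of recursive calls equals the number of occurrences rather than the length of the string, since find jumps straight to the next instance.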
Counting X in file through iteration/loop.
It would involve opening the file (TextFile) and reading it into text, which is a string. Then loop over each character (for char in text:), remember granularity, and each time char == X, increment count with count += 1. Before you run the loop, state that you have not gone through the string yet and therefore your count of X in text is 0 (count = 0). (Sounds familiar?)
return count.
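The loop version described above might look like this. To keep the sketch self-contained it takes the text as a string argument instead of reading a file, and the name count_in_text is an assumption of mine.

```python
def count_in_text(text, x):
    count = 0  # nothing has been inspected yet, so the count starts at 0
    for char in text:
        if char == x:
            count += 1
    return count


print(count_in_text("foobar", "o"))
```

The starting value count = 0 plays the same role as the default count argument in the recursive version.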
#This function will print the count using recursion.
def countrec(s, c, cnt = 0):
    if len(s) == 0:
        print(cnt)
        return 0
    if s[-1] == c:
        countrec(s[0:-1], c, cnt+1)
    else:
        countrec(s[0:-1], c, cnt)
#Function call
countrec('foobar', 'o')
With an extra parameter, the same function can be implemented.
Working function code:
def countLetterString(char, str, count = 0):
    if len(str) == 0:
        return count
    if str[-1] == char:
        return countLetterString(char, str[0:-1], count+1)
    else:
        return countLetterString(char, str[0:-1], count)
The function signature below accepts one more parameter, count.
(P.S.: I was presented this question where the function signature was pre-defined; I just had to complete the logic.)
Here is the code:
def count_occurrences(s, substr, count=0):
    ''' s - indicates the string,
    output : Returns the count of occurrences of substr found in s
    '''
    len_s = len(s)
    len_substr = len(substr)
    if len_s == 0:
        return count
    if len_s < len_substr:
        return count
    if substr == s[0:len_substr]:
        count += 1
    count = count_occurrences(s[1:], substr, count)  ## RECURSIVE CALL
    return count
Output behavior:
count_occurrences("hishiihisha", "hi", 0) => 3
count_occurrences("xxAbx", "xx") => 1 (passing count is not mandatory, since it has a default value.)
