Count the number of comparisons in a naive string-matching algorithm (Python)

In my code I try to find the index of the pattern and count every comparison. I also want the algorithm to stop when the pattern is too long to fit in the rest of the text. Something is wrong with the counter, and I couldn't get the algorithm to stop. I am a beginner and have no idea what to do.
def search(W, T):
    count = 0
    for i in range(len(T) - len(W) + 1):
        if i > len(T)-len(W):
            break
        j = 0
        while(j < len(W)):
            if (T[i + j] != W[j]):
                count += 1
                break
            j += 1
        if (j == len(W)):
            count += len(W)
            print("Pattern found at index ", i, count)
if __name__ == '__main__':
    T = "AABAACAADAABAAABAA"
    W = "AABA"
    search(W, T)
Thanks for your help.
I hope someone can correct my code or tell me what to do.

First of all, since you are a beginner in Python, here are some tips:
You do not need parentheses around an if condition: if j == len(W): is fine, and the extra parentheses go against PEP 8 style.
You don't need this condition:
    if i > len(T)-len(W):
        break
range() already takes care of it: i never exceeds len(T) - len(W).
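A quick way to convince yourself of the range() bound is to look at the start indices it produces. This is only a small sketch using the strings from the question; the names starts and last_start are made up for the illustration.

```python
T = "AABAACAADAABAAABAA"
W = "AABA"

# range(n) yields 0 .. n - 1, so the largest start index is len(T) - len(W)
starts = list(range(len(T) - len(W) + 1))
last_start = starts[-1]
print(last_start == len(T) - len(W))  # True

# if the pattern is longer than the text, the range is empty and the loop never runs
print(list(range(len("AB") - len(W) + 1)))  # []
```

The second print shows that the loop also stops by itself when the pattern is too long, so no extra break is needed.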
Now to your question. Here is the solution (I hope I understood what you want):
def search(W, T):
    count = 0
    for i in range(len(T) - len(W) + 1):
        j = 0
        while j < len(W):
            if T[i + j] != W[j]:
                count += 1
                break
            j += 1
            count += 1
        if j == len(W):
            print("Pattern found at index ", i, count)
if __name__ == '__main__':
    T = "AABAACAADAABAAABAA"
    W = "AABA"
    search(W, T)
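The problem was that you incremented the counter only when a mismatch ended the inner loop (by 1) or when the whole pattern matched (by len(W)), so the successful comparisons that come before a mismatch were never counted. Here's an example:
T = AAABAAA
W = AABA
At index 0, your solution increments the counter only at the third letter "A" of T, where the comparison with "B" fails, while it should be incremented after each of the three comparisons.
I hope everything is clear. Good luck with your studies!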
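If you would rather get the numbers back than print them, the same idea can be sketched as a function that returns the total. This is a minimal sketch, not the only way to do it; the name count_comparisons and the (total, matches) return shape are assumptions made for the example.

```python
def count_comparisons(pattern, text):
    """Return (total character comparisons, list of match start indices)."""
    comparisons = 0
    matches = []
    # the range is empty when the pattern is longer than the text
    for i in range(len(text) - len(pattern) + 1):
        j = 0
        while j < len(pattern):
            comparisons += 1  # count every comparison, match or mismatch
            if text[i + j] != pattern[j]:
                break
            j += 1
        if j == len(pattern):
            matches.append(i)
    return comparisons, matches


print(count_comparisons("AABA", "AAABAAA"))  # (10, [1])
```

For T = AAABAAA and W = AABA the four alignments cost 3, 4, 2 and 1 comparisons, which gives the total of 10.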

Related

Counting the number of occurrences in a string

I'm trying to count the number of times "bob" occurs in a given string. This is what I tried:
s = input("give me a string:")
count = 0
for i in s:
    if i=="b":
        for j in s:
            x=0
            if j!="b":
                x+=1
            else:
                break
        if s[x+1]=="o" and s[x+2]=="b":
            count+=1
print(count)
If I give it the string "bob", it returns 2, and if I give it something like "jbhxbobalih", it returns 0. I don't know why this happens. Any idea?
The easiest manual count would probably use indices and slices. The main difference between this and the much simpler s.count("bob") is that it also counts overlapping occurrences:
# s = "aboboba" -> 2
count = 0
for i, c in enumerate(s):
    if s[i:i+3] == "bob":
        count += 1
You can check each group of three consecutive characters: if they are 'bob', increment the counter, and do nothing otherwise.
Your code should look like this:
s = input("give me a string:")
count = 0
# the last possible start index is len(s) - 3, so the range has to stop at len(s) - 2
for i in range(len(s) - 2):
    if s[i] == 'b' and s[i + 1] == 'o' and s[i + 2] == 'b':
        count += 1
print(count)
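This works for any string as long as overlapping occurrences do not need to be counted, because re.findall returns non-overlapping matches only.
import re
def check(string, sub_str):
    # re.escape makes sub_str match literally even if it contains regex metacharacters
    val = re.findall(re.escape(sub_str), string)
    count = len(val)
    print(count)
# driver code
string = "baadbobaaaabobsasddswqbobdwqdwqsbob"
sub_str = "bob"
check(string, sub_str)
This gives the correct output for the sample string.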
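If overlapping occurrences should be counted with a regular expression as well, one common idiom is a zero-width lookahead. The sketch below assumes that is what you want; count_overlapping is a made-up name for the example.

```python
import re


def count_overlapping(string, sub_str):
    # (?=...) matches at a position without consuming characters,
    # so a match is reported at every index where sub_str starts
    pattern = "(?=" + re.escape(sub_str) + ")"
    return len(re.findall(pattern, string))


print(count_overlapping("aboboba", "bob"))      # 2
print(len(re.findall("bob", "aboboba")))        # 1, non-overlapping only
```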

Count the number of comparisons when comparing two lists of strings in Python

I have homework that I cannot get right, and I am not sure what's wrong with the code.
The exercise is:
Using simple (brute-force) search, find how many comparisons are made.
We have two lists that contain letters (strings) and we compare them. Print out how many comparisons are made.
Example:
pattern= ABABC
text= ABBABACABCBAC
How I tried:
def search(text,pattern):
    text=list(text)
    pattern=list(pattern)
    n=len(text)
    m=len(pattern)
    co=1
    l=0
    k=0
    while k<=m:
        if text[l] == pattern[k]:
            co+=1
            l+=1
            k+=1
        else:
            co+=1
            l+=1
            k=0
    c=co
    return "Simple matching made " + str(c) + " comparisons"
It should be 16, because we compare letter by letter, which gives 3+1+1+4+1+2+1+3.
We get the 3 like this: A = A counts +1, B = B counts +1, and B is not A, so we add +1 and shift by one position in the text.
I wrote something that does what I think you are looking for, but I think your sum is missing a term at the end, unless I made a mistake.
pattern = 'ABABC'
text = 'ABBABACABCBAC'
def search(text, pattern):
    slices = len(text) - len(pattern)
    for i in range(0, slices + 1):
        count = 0
        text_to_compare = text[i:i + len(pattern)]
        for j in range(len(pattern)):
            count += 1
            if pattern[j] == text_to_compare[j]:
                continue
            else:
                break
        print("{} -> {}".format(text_to_compare, count))
search(text, pattern)
This outputs
ABBAB -> 3
BBABA -> 1
BABAC -> 1
ABACA -> 4
BACAB -> 1
ACABC -> 2
CABCB -> 1
ABCBA -> 3
BCBAC -> 1
It can be adapted for total count like:
def search(text, pattern):
    total_count = 0
    slices = len(text) - len(pattern)
    for i in range(0, slices + 1):
        count = 0
        text_to_compare = text[i:i + len(pattern)]
        for j in range(len(pattern)):
            count += 1
            total_count += 1
            if pattern[j] == text_to_compare[j]:
                continue
            else:
                break
        print("{} -> {}".format(text_to_compare, count))
    print("Total count: {}".format(total_count))
Which outputs the same as before but also with:
Total count: 17
Is this what you are looking for? I can explain any parts you don't understand :)
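To see where 16 and 17 differ, it can help to collect the per-alignment counts in a list instead of printing them. This is a small sketch along the same lines as the code above; comparisons_per_alignment is a made-up name.

```python
def comparisons_per_alignment(text, pattern):
    """Return one comparison count per start position of the pattern in the text."""
    counts = []
    for i in range(len(text) - len(pattern) + 1):
        count = 0
        for j in range(len(pattern)):
            count += 1  # every character comparison is counted
            if text[i + j] != pattern[j]:
                break
        counts.append(count)
    return counts


counts = comparisons_per_alignment("ABBABACABCBAC", "ABABC")
print(counts)       # [3, 1, 1, 4, 1, 2, 1, 3, 1]
print(sum(counts))  # 17
```

The sum in the question, 3+1+1+4+1+2+1+3, stops one alignment early: the last window BCBAC costs one more comparison, which brings the total to 17.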

String comparison time complexity: asking for advice

I'm working on a problem: find the shortest substring that, repeated, makes up the whole of a given string, and return its length; if there is no such substring, return the length of the string.
My main idea comes from Juliana's answer here (Check if string is repetition of an unknown substring); I rewrote the algorithm in Python 2.7.
I think it should be O(n^2), but I'm not confident that is correct. Here is my reasoning: the outer loop tries each possible start offset, which is O(n) iterations, and the inner loop compares characters one by one, which is O(n) comparisons. So is the overall time complexity O(n^2)? If I am wrong, please correct me; if I am right, please confirm. :)
Input and output examples
catcatcat => 3
aaaaaa => 1
aaaaaba => 7
My code,
def rorate_solution(word):
    for i in range(1, len(word)//2 + 1):
        j = i
        k = 0
        while k < len(word):
            if word[j] != word[k]:
                break
            j+=1
            if j == len(word):
                j = 0
            k+=1
        else:
            return i
    return len(word)
if __name__ == "__main__":
    print rorate_solution('catcatcat')
    print rorate_solution('catcatcatdog')
    print rorate_solution('abaaba')
    print rorate_solution('aaaaab')
    print rorate_solution('aaaaaa')
Your assessment of the runtime of your rewrite is correct.
But you can use just the preprocessing step of KMP to find the shortest period of a string.
The rewrite could also be simpler, since yours compares word[0:i] to word[i:2*i] twice in a row:
def shortestPeriod(word):
    """the length of the shortest prefix p of word
    such that word is a repetition of p
    """
    # try prefixes of increasing length
    for i in xrange(1, len(word)//2 + 1):
        # a prefix can only tile the whole word if its length divides len(word)
        if len(word) % i:
            continue
        j = i
        while word[j] == word[j-i]:
            j += 1
            if len(word) <= j:
                return i
    return len(word)
if __name__ == "__main__":
    for word in ('catcatcat', 'catcatcatdog',
                 'abaaba', 'ababbbababbbababbbababbb',
                 'aaaaab', 'aaaaaa'):
        print shortestPeriod(word)
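The KMP suggestion can be sketched as follows. This is an illustrative Python 3 version under the assumption that "shortest period" means the shortest prefix whose repetition gives the whole word; the function name shortest_repeating_unit is made up. It builds the KMP prefix (failure) table in O(n) and derives the answer from its last entry.

```python
def shortest_repeating_unit(word):
    """Length of the shortest prefix that, repeated, gives word exactly;
    len(word) if there is no such proper prefix."""
    n = len(word)
    if n == 0:
        return 0
    # prefix[i] = length of the longest proper prefix of word[:i + 1]
    # that is also a suffix of it (the KMP failure table)
    prefix = [0] * n
    k = 0
    for i in range(1, n):
        while k > 0 and word[i] != word[k]:
            k = prefix[k - 1]
        if word[i] == word[k]:
            k += 1
        prefix[i] = k
    # candidate period; it only counts if it tiles the word exactly
    period = n - prefix[-1]
    return period if n % period == 0 else n


for w in ("catcatcat", "aaaaaa", "aaaaaba"):
    print(w, shortest_repeating_unit(w))  # 3, 1 and 7 respectively
```

The three printed values agree with the input and output examples given in the question.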

How to count the number of vowels in a string without a function wrapper

I'm currently solving an MIT undergrad problem in Python 3.5.
The goal is to write a Python script that counts and prints the number of vowels in a string containing only lower-case letters, without using a function wrapper or even a function definition (as stated in the assignment; weird?).
def vowels_count(s):
    i=0
    counter = 0
    while(s[i] != " "):
        if s[i] == "a" or s[i] == "e" or s[i] == "i" or s[i] == "o" or s[i] == "u":
            counter += 1
        i = i + 1
    return(counter)
I have two problems:
1/ First, my own code, which uses a while loop, has a problem with the index that moves from the first character to the last one. The debugger says: index out of range.
2/ Second, if I have to comply with the MIT instructions, I don't see how to do this in a single line of code without defining a function.
Thanks for your support.
Why does this version go wrong on the string index i?
def vowels_count_1(s):
    i = 0
    counter = 0
    while(s[i] != ""):
        if s[i] == "a" or s[i] == "e" or s[i] == "i" or s[i] == "o" or s[i] == "u":
            counter += 1
        i += 1
    print("Number of vowels: " + str(counter))
You can use the condition of i being less than the length of your string to break out of the while loop. I also recommend the easier approach of just checking if the letter at s[i] is in a string composed of vowels:
def vowels_count(s):
    i = 0
    counter = 0
    while i < len(s):
        if s[i] in 'aeiou':
            counter += 1
        i += 1
    return counter
If you wanted to do this in one line, you could use the length of a list comprehension:
counter = len([c for c in s if c in 'aeiou'])
As you learn more, you'll be able to count the vowels in one line using sum and a generator expression.
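For reference, the sum and generator expression version might look like this. It is a sketch without any function definition, as the assignment asks; the sample value of s is an assumption made for the example.

```python
s = "azcbobobegghakl"  # sample input, assumed for this sketch

# the generator yields 1 for every vowel; sum adds them up without building a list
counter = sum(1 for ch in s if ch in "aeiou")
print("Number of vowels: " + str(counter))  # Number of vowels: 5
```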
You could fix your loop with while i < len(s), i.e. run up to the length of the string, but it is much better to iterate directly over the sequence of characters we call a "string":
for ch in s:
    if ch == 'a' or ...
No indices needed. No i.
If you have learned the in operator already, you could simplify the test.
Without a function probably means this:
s = "the string"
# your code here
print("vowel count:", counter)
But I'm not sure ...
Here is a one-line solution (in Python 3, reduce first has to be imported from functools):
from functools import reduce
reduce(lambda t, c: (t + 1) if c in 'aeiou' else t, s.lower(), 0)

Finding the longest alphabetical substring in a longer string

This code finds the longest alphabetical substring in a string (s).
letter = s[0]
best = ''
for n in range(1, len(s)):
    if len(letter) > len(best):
        best = letter
    if s[n] >= s[n-1]:
        letter += s[n]
    else:
        letter = s[n]
It works most of the time, but occasionally it gives wrong answers, and I am confused about why it only works sometimes. For example, when:
s='maezsibmhzxhpprvx'
It said the answer was "hpprv" while it should have been "hpprvx".
In another case, when
s='ysdxvkctcpxidnvaepz'
It gave the answer "cpx", when it should have been "aepz".
Can anyone make sense of why it does this?
You should move this check
    if len(letter) > len(best):
        best = letter
to after the rest of the loop body.
Your routine was almost OK. Here's a little comparison between yours, the fixed one, and another possible solution to your problem:
def buggy(s):
    letter = s[0]
    best = ''
    for n in range(1, len(s)):
        # checked too early: the last extension of letter is never compared
        if len(letter) > len(best):
            best = letter
        if s[n] >= s[n - 1]:
            letter += s[n]
        else:
            letter = s[n]
    return best
def fixed(s):
    letter = s[0]
    best = ''
    for n in range(1, len(s)):
        if s[n] >= s[n - 1]:
            letter += s[n]
        else:
            letter = s[n]
        # compare after letter has been updated for this iteration
        if len(letter) > len(best):
            best = letter
    return best
def alternative(s):
    # split s into runs of non-decreasing characters, then take the longest
    result = ['' for i in range(len(s))]
    index = 0
    for i in range(len(s)):
        result[index] += s[i]
        # start a new run when the next character breaks the ordering;
        # the bounds check avoids reading past the end of s
        if i + 1 < len(s) and s[i] > s[i + 1]:
            index += 1
    return max(result, key=len)
for sample in ['maezsibmhzxhpprvx', 'ysdxvkctcpxidnvaepz']:
    o1, o2, o3 = buggy(sample), fixed(sample), alternative(sample)
    print("buggy={0:10} fixed={1:10} alternative={2:10}".format(o1, o2, o3))
As you can see, the order of the conditionals inside the loop in your version is the problem: the first conditional should move to the end of the loop body.
The logic is almost okay, except that when letter grows on the last loop iteration (when n == len(s) - 1), best is not updated that last time. You can either add another best = letter check after the loop, or rethink the program structure carefully so that you don't repeat yourself.
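One way to restructure the loop so the length check appears only once is to track where each run starts and to treat the end of the string as the end of a run. This is a sketch under that assumption, not the original poster's code; longest_alphabetical is a made-up name, and on ties it keeps the first longest run.

```python
def longest_alphabetical(s):
    """Return the first longest run of non-decreasing characters in s."""
    best_start, best_len = 0, 0
    start = 0
    for n in range(1, len(s) + 1):
        # a run ends at the end of the string or where the order breaks
        if n == len(s) or s[n] < s[n - 1]:
            if n - start > best_len:
                best_start, best_len = start, n - start
            start = n
    return s[best_start:best_start + best_len]


print(longest_alphabetical("maezsibmhzxhpprvx"))    # hpprvx
print(longest_alphabetical("ysdxvkctcpxidnvaepz"))  # aepz
```

Because the final run is closed by the n == len(s) case, no extra check is needed after the loop.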
