For example I have GolDeNSanDyWateRyBeaChSand and I need to find how many times the word sand appears.
text = input()
text = text.lower()
count = 0
if "sand" in text:
    count += 1
print(count)
But the problem is that there are 2 occurrences of sand in this string, and the check stops after it finds the first one. I'm a beginner in programming.
You can simply use the str.count() method to count how many times a string appears in another string.
text = input()
text = text.lower()
count = text.count("sand")
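As a minimal sketch, the same idea can be wrapped in a small helper. The name count_word is hypothetical and not part of the original answer; note that str.count only counts non-overlapping occurrences.

```python
def count_word(text, word):
    """Count non-overlapping, case-insensitive occurrences of word in text."""
    # Lowercase both sides so that 'SanD' and 'Sand' both match 'sand'.
    return text.lower().count(word.lower())


print(count_word("GolDeNSanDyWateRyBeaChSand", "sand"))  # 2
```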
To find every occurrence of a string pattern inside another string s, including overlapping occurrences, do the following:
s = "sandsandssands" # your string here
pattern = "sands" # your pattern here
pos = -1
end_of_string = False
while not end_of_string:
    pos = s.find(pattern, pos+1)
    print(pos)
    end_of_string = (pos == -1)
Output
0
4
9
-1
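If the positions are needed as a list rather than printed, the same str.find loop could be sketched as a function. The name find_all is an assumption made for illustration; unlike the loop above, it does not include the final -1.

```python
def find_all(s, pattern):
    """Return the start index of every occurrence of pattern in s, overlaps included."""
    positions = []
    pos = s.find(pattern)
    while pos != -1:
        positions.append(pos)
        # Restart one character after the last hit so overlapping matches are found.
        pos = s.find(pattern, pos + 1)
    return positions


print(find_all("sandsandssands", "sands"))  # [0, 4, 9]
```

The number of occurrences is then simply len(find_all(s, pattern)).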
Extending the solution offered by the OP.
The idea is to use find and move toward the end of the string.
Clearly count can be used here; the solution below is for educational purposes.
text = 'GolDeNSanDyWateRyBeaChSand'
word = 'sand'
ltext = text.lower()
offset = 0
counter = 0
while True:
    idx = ltext.find(word, offset)
    if idx == -1:
        break
    else:
        counter += 1
        offset = idx + len(word)
print(f'The word {word} was found {counter} times')
output
The word sand was found 2 times
I'd like to write code to find specific instances of words in a long string of text, where the letters making up the word are not adjacent but appear in order.
The string I use will be thousands of characters long, but as a shorter example: I want to find instances of the word "chair" within the following string, where each letter is no more than 10 characters from the previous one.
djecskjwidhl;asdjakimcoperkldrlkadkj
To avoid the problem of finding many instances in a large string, I'd prefer to limit the distance between every two letters to 10. So the word chair in the string abcCabcabcHabcAabdIabcR would count. But the word chair in the string abcCabcabcabcabcabcabcabcabHjdkeAlcndInadhR would not count.
Can I do this with python code? If so I'd appreciate an example that I could work with.
Maybe paste the string of text or use an input file? Have it search for the word or words I want, and then identify if those words are there?
Thanks.
This code below will do what you want:
will_find = "aaaaaaaaaaaaaaaaaaaaaaaabcCabcabcHabcAabdIabcR"
wont_find = "abcCabcabcabcabcabcabcabcabHjdkeAlcndInadhR"
looking_for = "CHAIR"
max_look = 10
def find_word(characters, word):
    i = characters.find(word[0])
    if i == -1:
        print("I couldnt find the first character ...")
        return False
    for symbol in word:
        print(characters[i:i + max_look+1])
        if symbol in characters[i:i + max_look+1]:
            i += characters[i: i + max_look+1].find(symbol)
            print("{} is in the range of {} [{}]".format(symbol, characters[i:i+ max_look], i))
            continue
        else:
            print("Couldnt find {} in {}".format(symbol, characters[i: i + max_look]))
            return False
    return True
find_word(will_find, looking_for)
print("--------")
find_word(wont_find, looking_for)
An alternative, this may also work for you.
long_string = 'djecskjwidhl;asdjakimcoperkldrlkadkj'
check_word = 'chair'
def substringChecker(longString, substring):
    starting_index = []
    n , derived_word = 0, substring[0]
    for i, char in enumerate(longString[:-11]):
        if char == substring[n] and substring[n + 1] in longString[i : i + 11]:
            n += 1
            derived_word += substring[n]
            starting_index.append(i)
            if len(derived_word) == len(substring):
                return derived_word == substring, starting_index[0]
    return False
print(substringChecker(long_string, check_word))
(True, 3)
To check if the word is there:
string = "abccabcabchabcaabdiabcr"
word = "chair"
while word:
    # look for the next letter within the next 10 characters
    index = string[:10].find(word[0])
    if index == -1:
        break
    string = string[index+1:]
    word = word[1:]
# every letter was consumed, so the whole word was found
if not word:
    print("found")
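The answers above take the first matching letter in each window, which can miss a match when a later occurrence of a letter would have worked. As a rough sketch of an alternative, the function below tracks every position that can still be reached at each step. The name has_spaced_word and the max_gap parameter are assumptions made for illustration, and the comparison is case-sensitive.

```python
def has_spaced_word(text, word, max_gap=10):
    """Return True if the letters of word occur in order in text, each at most
    max_gap positions after the previous one (case-sensitive)."""
    if not word:
        return True
    # Every index where the first letter occurs is a possible starting point.
    reachable = {i for i, ch in enumerate(text) if ch == word[0]}
    for letter in word[1:]:
        next_reachable = set()
        for i in reachable:
            # Only look at the max_gap characters that follow position i.
            window_end = min(i + max_gap, len(text) - 1)
            for j in range(i + 1, window_end + 1):
                if text[j] == letter:
                    next_reachable.add(j)
        reachable = next_reachable
        if not reachable:
            return False
    return bool(reachable)


print(has_spaced_word("abcCabcabcHabcAabdIabcR", "CHAIR"))  # True
print(has_spaced_word("abcCabcabcabcabcabcabcabcabHjdkeAlcndInadhR", "CHAIR"))  # False
```

Keeping a set of reachable positions costs a little more work per letter, but it avoids committing to one occurrence too early.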
EDIT: there's more wrong with this than just an off-by-one error, it seems.
I've got an off-by-one error in the following simple algorithm which is supposed to display the count of letters in a string, along the lines of run-length encoding.
I can see why the last character is not added to the result string, but if I increase the range of i I get index out of range for obvious reasons.
I want to know what the conceptual issue is here from an algorithm design perspective, as well as just getting my code to work.
Do I need some special case code to handle the last item in the original string? Or maybe it makes more sense to be comparing the current character with the previous character, although that poses a problem at the beginning of the algorithm?
Is there a general approach to this kind of algorithm, where current elements are compared to previous/next elements, which avoids index out of range issues?
def encode(text):
    # stores output string
    encoding = ""
    i = 0
    while i < len(text) - 1:
        # count occurrences of character at index i
        count = 1
        while text[i] == text[i + 1]:
            count += 1
            i += 1
        # append current character and its count to the result
        encoding += text[i] + str(count)
        i += 1
    return encoding
text = "Hello World"
print(encode(text))
# Gives H1e1l2o1 1W1o1r1l1
You're right, you should have while i < len(text) for the external loop to process the last character if it is different from the previous one (d in your case).
Your algorithm is then globally fine, but it will crash when looking for occurrences of the last character. At this point, text[i+1] becomes illegal.
To solve this, just add a safety check in the internal loop: while i+1 < len(text)
def encode(text):
    # stores output string
    encoding = ""
    i = 0
    while i < len(text):
        # count occurrences of character at index i
        count = 1
        # FIX: check that we did not reach the end of the string
        # while looking for occurrences
        while i+1 < len(text) and text[i] == text[i + 1]:
            count += 1
            i += 1
        # append current character and its count to the result
        encoding += text[i] + str(count)
        i += 1
    return encoding
text = "Hello World"
print(encode(text))
# Gives H1e1l2o1 1W1o1r1l1d1
If you keep your strategy, you'll have to check i+1 < len(text).
This gives something like:
def encode(text):
    L = len(text)
    start = 0
    encoding = ''
    while start < L:
        c = text[start]
        stop = start + 1
        while stop < L and text[stop] == c:
            stop += 1
        encoding += c + str(stop - start)
        start = stop
    return encoding
Another way to do things is to remember the start of each run:
def encode2(text):
    start = 0
    encoding = ''
    for i,c in enumerate(text):
        if c != text[start]:
            encoding += text[start] + str(i-start)
            start = i
    if text:
        encoding += text[start] + str(len(text)-start)
    return encoding
This allows you to just enumerate the input which feels more pythonic.
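On the general question of comparing neighbouring elements without index errors, one option is to let the standard library group the runs, so no explicit indexing is needed. This is a sketch using itertools.groupby, not the approach from the answers above; the name encode_groupby is hypothetical.

```python
from itertools import groupby


def encode_groupby(text):
    """Run-length encode text as each character followed by its run length."""
    # groupby yields (character, iterator over the consecutive equal characters).
    return "".join(char + str(len(list(run))) for char, run in groupby(text))


print(encode_groupby("Hello World"))  # H1e1l2o1 1W1o1r1l1d1
```

Because groupby handles the boundaries between runs, the first and last characters need no special case.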
I want to count the number of occurrences of the substring "bob" within the string s. I do this exercise for an edX Course.
s = 'azcbobobegghakl'
counter = 0
numofiterations = len(s)
position = 0
#loop that goes through the string char by char
for iteration in range(numofiterations):
    if s[position] == "b": # search pos. for starting point
        if s[position+1:position+2] == "ob": # check if complete
            counter += 1
    position +=1
print("Number of times bob occurs is: " + str(counter))
However, it seems that the s[position+1:position+2] expression is not working properly. How do I address the two characters after a "b"?
The second slice index isn't included. It means that s[position+1:position+2] is a single character at position position + 1, and this substring cannot be equal to ob. See a related answer. You need [:position + 3]:
s = 'azcbobobegghakl'
counter = 0
numofiterations = len(s)
position = 0
#loop that goes through the string char by char
for iteration in range(numofiterations - 2):
    if s[position] == "b": # search pos. for starting point
        if s[position+1:position+3] == "ob": # check if complete
            counter += 1
    position +=1
print("Number of times bob occurs is: " + str(counter))
# 2
You could use .find with an index:
s = 'azcbobobegghakl'
needle = 'bob'
idx = -1; cnt = 0
while True:
    idx = s.find(needle, idx+1)
    if idx >= 0:
        cnt += 1
    else:
        break
print("{} was found {} times.".format(needle, cnt))
# bob was found 2 times.
Eric's answer explains perfectly why your approach didn't work (slicing in Python is end-exclusive), but let me propose another option:
s = 'azcbobobegghakl'
substrings = [s[i:] for i in range(0, len(s))]
# filter takes the function first; list() is needed to call len() in Python 3
filtered_s = list(filter(lambda sub: sub.startswith("bob"), substrings))
result = len(filtered_s)
or simply
s = 'azcbobobegghakl'
result = sum(1 for ss in [s[i:] for i in range(0, len(s))] if ss.startswith("bob"))
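A variation on the same idea, sketched here with a hypothetical helper name count_overlapping, passes a start index to str.startswith so that no list of suffixes has to be built first.

```python
def count_overlapping(s, sub):
    """Count occurrences of sub in s, including overlapping ones."""
    # startswith(sub, i) checks whether sub occurs at index i without slicing s.
    return sum(s.startswith(sub, i) for i in range(len(s) - len(sub) + 1))


print(count_overlapping("azcbobobegghakl", "bob"))  # 2
```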
I am currently learning Python, and I do not understand much of it yet. I am looking for help with this question, which comes before dictionaries are covered, so it has to be completed without any dictionaries. The problem is that I do not know much about the max function.
So Far I have:
AlphaCount = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
Alpha = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
for ch in text:
    ch = ch.upper()
    index=Alpha.find(ch)
    if index >-1:
        AlphaCount[index] = AlphaCount[index]+1
You can use Counter
from collections import Counter
foo = 'wubalubadubdub'
Counter(list(foo))
To get the most frequent letter
Counter(list(foo)).most_common(1)
You can use set, which keeps only the unique characters from the input. Then iterate over them and count how many times each occurs in the input with count. If a character occurs more often than the current max and isalpha is true (so it is not a space), set max to that count.
text='This is a test of tons of tall tales'
un=set(text.upper())
max=0
fav=''
for u in un:
    c=text.upper().count(u)
    if c>max and u.isalpha():
        max=c
        fav=u
print(fav) # T
print(max) # 6
EDIT
To do this from your code: fix capitalization(for, if) and then find and print/return the most common letter. Also AlphaCount has an extra 0, you only need 26.
text='This is a test of tons of tall talez'
AlphaCount=[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
Alpha='ABCDEFGHIJKLMNOPQRSTUVWXYZ'
for ch in text:
    ch= ch.upper()
    index=Alpha.find(ch)
    if index >-1:
        AlphaCount[index]+=1
print(AlphaCount) # the count of characters
print(max(AlphaCount)) # max value in list
print(AlphaCount.index(max(AlphaCount))) # index of max value
print(Alpha[AlphaCount.index(max(AlphaCount))]) # letter that occurs most frequently
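Since the question mentions the max function, here is a minimal sketch of how its key argument can pick the index with the largest count in a single call, still without dictionaries. The name most_frequent_letter is an assumption; on a tie, max returns the first letter in alphabetical order.

```python
import string


def most_frequent_letter(text):
    """Return (letter, count) for the most frequent letter, ignoring case."""
    upper = text.upper()
    # One count per letter A..Z, in alphabetical order.
    counts = [upper.count(letter) for letter in string.ascii_uppercase]
    # key tells max to compare the indexes by their counts, not by the index value.
    best = max(range(26), key=lambda i: counts[i])
    return string.ascii_uppercase[best], counts[best]


print(most_frequent_letter("This is a test of tons of tall tales"))  # ('T', 6)
```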
def main():
    string = input('Enter a sentence: ')
    strings=string.lower()
    counter = 0
    total_counter = 0
    most_frequent_character = ""
    for ch in strings:
        for str in strings:
            if str == ch:
                counter += 1
        if counter > total_counter:
            total_counter = counter
            most_frequent_character = ch
        counter = 0
    print("The most frequent character is", most_frequent_character, "and it appears", total_counter, "times.")
main()
I'm trying to create a function where you can put in a phrase such as "ana" and a word such as "banana", and count how many times the phrase occurs in the word. I can't find the error that makes some of my unit tests fail.
def test(actual, expected):
    """ Compare the actual to the expected value,
        and print a suitable message.
    """
    import sys
    linenum = sys._getframe(1).f_lineno # get the caller's line number.
    if (expected == actual):
        msg = "Test on line {0} passed.".format(linenum)
    else:
        msg = ("Test on line {0} failed. Expected '{1}', but got '{2}'.".format(linenum, expected, actual))
    print(msg)
def count(phrase, word):
    count1 = 0
    num_phrase = len(phrase)
    num_letters = len(word)
    for i in range(num_letters):
        for x in word[i:i+num_phrase]:
            if phrase in word:
                count1 += 1
            else:
                continue
    return count1
def test_suite():
    test(count('is', 'Mississippi'), 2)
    test(count('an', 'banana'), 2)
    test(count('ana', 'banana'), 2)
    test(count('nana', 'banana'), 1)
    test(count('nanan', 'banana'), 0)
    test(count('aaa', 'aaaaaa'), 4)
test_suite()
Changing your count function to the following passes the tests:
def count(phrase, word):
    count1 = 0
    num_phrase = len(phrase)
    num_letters = len(word)
    for i in range(num_letters):
        if word[i:i+num_phrase] == phrase:
            count1 += 1
    return count1
Use str.count(substring). This will return how many times the substring occurs in the full string (str).
Here is an interactive session showing how it works:
>>> 'Mississippi'.count('is')
2
>>> 'banana'.count('an')
2
>>> 'banana'.count('ana')
1
>>> 'banana'.count('nana')
1
>>> 'banana'.count('nanan')
0
>>> 'aaaaaa'.count('aaa')
2
>>>
As you can see, the function is non-overlapping. If you need overlapping behaviour, look here: string count with overlapping occurrences
You're using the iteration wrong, so:
for i in range(num_letters): # i goes 0, 1, ..., len(word) - 1
    for x in word[i:i+num_phrase]:
        # This gives you the letters from word[i] up to word[i+num_phrase-1],
        # but one by one, so: for x in 'dada' will give you 'd' 'a' 'd' 'a'
        if phrase in word: # This condition doesn't make sense in your problem:
                           # if it's true, it stays true through the whole
                           # iteration and count1 ends up close to
                           # len(word) * num_phrase,
                           # and if it's false the function returns 0
            count1 += 1
        else:
            continue
I think str.count(substring) is the wrong solution here, because it doesn't count overlapping substrings, so the test suite fails.
There is also the built-in str.find method, which could be helpful for the task.
Another way :
def count(sequence,item) :
    count = 0
    for x in sequence :
        if x == item :
            count = count+1
    return count
This raises a basic question.
When you see a string like "isisisisisi", how many "isi" do you count?
In the first case, you read the string as "isi s isi s isi" and return 3 as the count.
In the second case, you read the string "isisisisisi" and count each inner "i" twice, like this: "isi isi isi isi isi".
In other words, the second 'i' is the last character of the first 'isi' and the first character of the second 'isi'.
So you have to return 5 as the count.
For the first case you can simply use:
>>> string = "isisisisisi"
>>> string.count("isi")
3
For the second case, you have to recognize the part of the search keyword that repeats at both its start and its end ("phrase" + "anything" + "phrase").
The function below can do it:
def find_iterate(Str):
    i = 1
    cnt = 0
    # integer division (//) keeps the original Python 2 behaviour in Python 3
    while Str[i-1] == Str[-i] and i < len(Str) // 2:
        i += 1
        cnt += 1
    return Str[0:cnt+1]
Now you have many ways to count the search keyword in the string.
For example, I do it like this:
if __name__ == "__main__":
    search_keyword = "isi"
    String = "isisisisisi"
    itterated_part = find_iterate(search_keyword)
    c = 0
    while search_keyword in String:
        c += String.count(search_keyword)
        String = String.replace(search_keyword, itterated_part)
    print(c)
I do not know whether there is a better way in Python. I tried to do this with regular expressions but found no way.
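For what it's worth, regular expressions can handle the overlapping case with a zero-width lookahead: the pattern (?=isi) matches at every position where "isi" starts without consuming any characters. This is a sketch, with count_overlapping_regex as a hypothetical name; re.escape is used so that special characters in the keyword are treated literally.

```python
import re


def count_overlapping_regex(text, keyword):
    """Count occurrences of keyword in text, overlaps included, via a lookahead."""
    # Each lookahead match is an empty string at a position where keyword begins.
    return len(re.findall("(?=" + re.escape(keyword) + ")", text))


print(count_overlapping_regex("isisisisisi", "isi"))  # 5
```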