Python - Remove All Occurrences Of A Substring Within A String - python

There are 2 main rules of note for the function I am trying to make:
No modules may be used.
The substring must be delimited by a 'begin' and an 'end' string.
The aim is to take a base, begin, and end string, then remove all text between those strings. This has to happen for each occurrence, not just the first.
eg:
base is "yes_and_no___yes_and_no",
begin is "yes",
end is "no"
output: "yesno___yesno"
This is my code so far, however it only works for the first occurrence. Would a recursive implementation be ideal?
def extractFromString(baseStr, extStr1, extStr2):
    if extStr1 and extStr2 in baseStr:
        # >1. Get start/end indices
        start = baseStr.find(extStr1) + len(extStr1)
        end = baseStr.find(extStr2)
        # >2. Get first/second halves
        firstHalf = baseStr[:start]
        secondHalf = baseStr[end:]
        # >3. Combine and return
        result = firstHalf + secondHalf
        return result

extStr1 = "yes"
extStr2 = "no"
# Example inputs (taken from the output shown below)
exampleStrs = [
    "yes_and_no___yes_and_no",
    "aha_no_yes_deleteThis_no_no_no_yes",
    "yes_yes_aha_no_no_yes_no_no",
    "yes_yes_no_no",
]
def extractFromString(baseStr, extStr1, extStr2):
    if extStr1 in baseStr and extStr2 in baseStr:
        # >1. Get start/end indices
        start = baseStr.find(extStr1) + len(extStr1)
        end = baseStr.find(extStr2, start)
        if end == -1:
            return baseStr
        # >2. Process up to the end marker; queue the rest for the recursive call
        processStr = baseStr[:end+len(extStr2)]
        queueStr = baseStr[end+len(extStr2):]
        firstHalf = processStr[:start]
        secondHalf = processStr[end:]
        processStr = firstHalf + secondHalf
        return processStr + extractFromString(queueStr, extStr1, extStr2)
    else:
        return baseStr
for exampleStr in exampleStrs:
    print("input:")
    print(exampleStr)
    print("output:")
    print(extractFromString(exampleStr, extStr1, extStr2))
    print("\n")
gives the following output:
input:
yes_and_no___yes_and_no
output:
yesno___yesno
input:
aha_no_yes_deleteThis_no_no_no_yes
output:
aha_no_yesno_no_no_yes
input:
yes_yes_aha_no_no_yes_no_no
output:
yesno_no_yesno_no
input:
yes_yes_no_no
output:
yesno_no
This is done by splitting the string and recursively calling the function on the remainder.
Check the last example to see whether this is the behaviour you want, though.
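Since the question asks whether recursion is ideal, here is a loop-based sketch of the same idea for comparison. It uses only str.find and slicing (no modules). The name remove_between and its parameter names are made up for this illustration, and it assumes the behaviour shown in the examples above: both markers are kept and only the text between them is dropped.

```python
def remove_between(base, begin, end):
    """Drop the text between each begin/end pair, keeping the markers."""
    if not begin or not end:
        return base  # assumption: empty markers mean "nothing to remove"
    pieces = []
    pos = 0
    while True:
        start = base.find(begin, pos)
        if start == -1:
            break
        start += len(begin)            # index just past the begin marker
        stop = base.find(end, start)   # first end marker after it
        if stop == -1:
            break
        pieces.append(base[pos:start])  # text up to and including begin
        pieces.append(end)
        pos = stop + len(end)
    pieces.append(base[pos:])           # whatever is left over
    return "".join(pieces)

print(remove_between("yes_and_no___yes_and_no", "yes", "no"))  # yesno___yesno
```

Unlike the recursive version, this does not grow the call stack with the number of occurrences.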

There's a problem with your if. if extStr1 and extStr2 in baseStr doesn't do what you think it does. You need to check if each substring is in the base string individually like if extStr1 in baseStr and extStr2 in baseStr
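To see why, note that "and" binds more loosely than "in", so the original condition is parsed as extStr1 and (extStr2 in baseStr). A small sketch with made-up values:

```python
base = "only_no_here"
begin, end = "yes", "no"

# Parsed as: begin and (end in base). A non-empty string is truthy,
# so this only tests whether `end` occurs in `base`.
wrong = bool(begin and end in base)

# Each membership test has to be written out separately.
right = begin in base and end in base

print(wrong)  # True, although "yes" is not in the string
print(right)  # False
```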
Instead of using loops or recursion, I'd suggest using regular expressions and re.sub()
First, we build a regex to match yes, then as few of any character as possible, and then no: yes.*?no
Remember to re.escape() the input strings in case they contain special characters.
Next, we replace all occurrences of this regex with yesno.
import re
def extractFromString(baseStr, extStr1, extStr2):
    rexp = re.compile(f"{re.escape(extStr1)}.*?{re.escape(extStr2)}")
    return re.sub(rexp, extStr1 + extStr2, baseStr)
Running this with a bunch of inputs
extractFromString("yes_and_no___yes_and_no", "yes", "no")
# Output: 'yesno___yesno'
extractFromString("aha_no_yes_deleteThis_no_no_no_yes", "yes", "no")
# Output: 'aha_no_yesno_no_no_yes'
extractFromString("yes_yes_aha_no_no_yes_no_no", "yes", "no")
# Output: 'yesno_no_yesno_no'
extractFromString("yes_yes_no_no", "yes", "no")
# Output: 'yesno_no'

You can split the base string at every occurrence of your extStr2 first, and then cut each piece just after the first occurrence of extStr1
def extractFromString(baseStr, extStr1, extStr2):
    if extStr1 not in baseStr or extStr2 not in baseStr:
        return baseStr
    base_subStr = baseStr.split(extStr2)
    final_str = ""
    # every piece except the last was followed by extStr2 in the original string
    for piece in base_subStr[:-1]:
        if extStr1 in piece:
            # keep everything up to and including the first extStr1
            piece = piece[:piece.find(extStr1) + len(extStr1)]
        final_str = final_str + piece + extStr2
    # the last piece has no extStr2 after it, so it is kept unchanged
    return final_str + base_subStr[-1]
I haven't run this, but this might work for your case

Related

How to replace every third word in a string with the # length equivalent

Input:
string = "My dear adventurer, do you understand the nature of the given discussion?"
expected output:
string = 'My dear ##########, do you ########## the nature ## the given ##########?'
How can you replace every third word in a string with the # length equivalent of that word, without counting special characters found in the string such as apostrophes ('), quotation marks ("), full stops (.), commas (,), exclamation marks (!), question marks (?), colons (:) and semicolons (;)?
I took the approach of converting the string to a list of elements but am finding difficulty filtering out the special characters and replacing the words with the # equivalent. Is there a better way to go about it?
I solved it with:
s = "My dear adventurer, do you understand the nature of the given discussion?"
def replace_alphabet_with_char(word: str, replacement: str) -> str:
    new_word = []
    alphabet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    for c in word:
        if c in alphabet:
            new_word.append(replacement)
        else:
            new_word.append(c)
    return "".join(new_word)
every_nth_word = 3
s_split = s.split(' ')
result = " ".join([replace_alphabet_with_char(s_split[i], '#') if i % every_nth_word == every_nth_word - 1 else s_split[i] for i in range(len(s_split))])
print(result)
Output:
My dear ##########, do you ########## the nature ## the given ##########?
There are more efficient ways to solve this question, but I hope this is the simplest!
My approach is:
Split the sentence into a list of the words
Using that, make a list of every third word.
Remove unwanted characters from this
Replace third words in original string with # times the length of the word.
Here's the code (explained in comments) :
# original line
line = "My dear adventurer, do you understand the nature of the given discussion?"
# printing original line
print(f'\n\nOriginal Line:\n"{line}"\n')
# printing something to indicate that the next few prints show what happens after each line
print('\n\nStages of parsing:')
# splitting by spaces, into list
wordList = line.split(' ')
# printing wordlist
print(wordList)
# making list of every third word
thirdWordList = [wordList[i-1] for i in range(1,len(wordList)+1) if i%3==0]
# printing third-word list
print(thirdWordList)
# characters that you don't want hashed
unwantedCharacters = ['.','/','|','?','!','_','"',',','-','#','\n','\\',':',';','(',')','<','>','{','}','[',']','%','*','&','+']
# replacing these characters by empty strings in the list of third-words
for unwantedchar in unwantedCharacters:
    for i in range(0,len(thirdWordList)):
        thirdWordList[i] = thirdWordList[i].replace(unwantedchar,'')
# printing third word list, now without punctuation
print(thirdWordList)
# replacing with #
# (note: str.replace substitutes every occurrence of the word in the line)
for word in thirdWordList:
    line = line.replace(word,len(word)*'#')
# Voila! Printing the result:
print(f'\n\nFinal Output:\n"{line}"\n\n')
Hope this helps!
Following works and does not use regular expressions
special_chars = {'.','/','|','?','!','_','"',',','-','#','\n','\\'}
def format_word(w, fill):
    if w[-1] in special_chars:
        return fill*(len(w) - 1) + w[-1]
    else:
        return fill*len(w)
def obscure(string, every=3, fill='#'):
    return ' '.join(
        (format_word(w, fill) if (i+1) % every == 0 else w)
        for (i, w) in enumerate(string.split())
    )
Here are some example usage
In [15]: obscure(string)
Out[15]: 'My dear ##########, do you ########## the nature ## the given ##########?'
In [16]: obscure(string, 4)
Out[16]: 'My dear adventurer, ## you understand the ###### of the given ##########?'
In [17]: obscure(string, 3, '?')
Out[17]: 'My dear ??????????, do you ?????????? the nature ?? the given ???????????'
With help of some regex. Explanation in the comments.
import re
imp = "My dear adventurer, do you understand the nature of the given discussion?"
every_nth = 3 # in case you want to change this later
out_list = []
# split the input at spaces, enumerate the parts for looping
for idx, word in enumerate(imp.split(' ')):
    # only do the special logic for multiples of n (0-indexed, thus +1)
    if (idx + 1) % every_nth == 0:
        # find how many special chars there are in the current segment
        len_special_chars = len(re.findall(r'[.,!?:;\'"]', word))
        # ^ add more special chars here if needed
        # subtract the number of special chars from the length of segment
        str_len = len(word) - len_special_chars
        # the special chars are assumed to be trailing; computing them separately
        # keeps the conditional from discarding the '#' part when there are none
        trailing = word[-len_special_chars:] if len_special_chars > 0 else ''
        # repeat '#' for every non-special char and add the special chars
        out_list.append('#'*str_len + trailing)
    else:
        # if the index is not a multiple of n, just add the word
        out_list.append(word)
print(' '.join(out_list))
A mix of regex and string manipulation:
import re
string = "My dear adventurer, do you understand the nature of the given discussion?"
new_string = []
for i, s in enumerate(string.split()):
    if (i+1) % 3 == 0:
        s = re.sub(r'[^\.:,;\'"!\?]', '#', s)
    new_string.append(s)
new_string = ' '.join(new_string)
print(new_string)

Given a string containing uppercase alphabets (A-Z), compress the string using Run Length encoding

Given a string containing uppercase alphabets (A-Z), compress the string using Run Length encoding. Repetition of character has to be replaced by storing the length of that run.
I tried the following codes
#Code 1: Tried on my own
def encode(message):
list1=[]
for i in range (0,len(message)):
count = 1
while(i < len(message)-1 and message[i]==message[i+1]):
count+=1
i+=1
list1=str(count)+message[i]
return list1
encoded_message=encode("ABBBBCCCCCCCCAB")
print(encoded_message)
Input:AAAABBBBCCCCCCCC
Expected Output: 4A4B8C
#code 2:I tried this by looking at another code based on run-length encoding
def encode(message):
list1=[]
count=1
for i in range (1,len(message)):
if(message[i]==message[i-1]):
count+=1
else:
list1.append((count,list1[i-1]))
count=1
if i == len(messege) - 1 :
list1.append((count , data[i]))
return list1
encoded_message=encode("ABBBBCCCCCCCCAB")
print(encoded_message)
Input:AAAABBBBCCCCCCCC
Expected Output: 4A4B8C
The first code gives output as 2B
def encode(message):
    pairs = []
    for char in message:
        if len(pairs) > 0:
            if pairs[-1][0] == char:
                pairs[-1] = (char, pairs[-1][1] + 1)
            else:
                pairs.append((char, 1))
        else:
            pairs.append((char, 1))
    strings = []
    for letter, count in pairs:
        strings.append(f"{count}{letter.upper()}")
    return "".join(strings)
print(encode("ABBBBCCCCCCCCAB"))
print(encode("AAAABBBBCCCCCCCC"))
This outputs:
1A4B8C1A1B
4A4B8C
This is a very good use for the groupby function from itertools:
from itertools import groupby
message = 'AAAABBBBCCCCCCCC'
''.join('{}{}'.format(len(list(g)), c) for c, g in groupby(message))
Based on your code #2, I have tweaked it to produce the output you list under Expected Output: 4A4B8C
Basically, you were returning a list of tuples, so you need to build a string and append to it instead. You were also using data without ever defining a data variable, and you were indexing list1 when you wanted the contents of message. So the code would be:
def encode2(message):
    encoded_return_message = ""
    count=1
    for i in range (1,len(message)):
        if(message[i]==message[i-1]):
            count+=1
        else:
            encoded_return_message += (f'{count}{message[i-1]}')
            count=1
        if i == len(message) - 1 :
            encoded_return_message +=(f'{count}{message[i]}')
    return encoded_return_message
encoded_message=encode2("ABBBBCCCCCCCCAB")
print(str(encoded_message))
I also did a demo on Repl.it
https://repl.it/repls/RowdyFloralwhiteBlockchain
Personally, I would do that task using the re module in the following way:
import re
text = 'AAAABBBBCCCCCCCC'
def sub_function(m):
    span = m.span()
    return f"{span[1]-span[0]}"+m.groups()[0]
out = re.sub(r'(\w)(\1*)',sub_function,text)
print(out)
Output:
4A4B8C
Explanation: the pattern in re.sub looks for a letter followed by 0 or more occurrences of the same letter; every such substring is then fed to sub_function, which calculates the overall length of the substring and returns that value concatenated with the first letter of the substring (which is the same as all the others). Note that I used a so-called f-string in my code, which is not available in older versions (I tested my code in Python 3.6.7); if you have to use an older version, you need to use another string formatting method. Note also that my code as is replaces a single letter with the digit 1 plus that letter, so for example the input ABC would result in 1A1B1C. If you wish to retain single letters without adding 1, change the 1st argument of re.sub from r'(\w)(\1*)' to r'(\w)(\1+)'
Though maybe now I am the guy with hammer seeing nails everywhere.
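As a small sketch of the variant mentioned in the explanation, the pattern below uses \1+ so that only runs of two or more identical characters are rewritten and single letters are left alone. The helper name encode_runs is hypothetical; everything else is re.sub and match-object methods from the standard library.

```python
import re

def encode_runs(text):
    # (\w)\1+ matches a character followed by one or more copies of itself
    return re.sub(r'(\w)\1+', lambda m: f"{len(m.group(0))}{m.group(1)}", text)

print(encode_runs("ABBBBCCCCCCCCAB"))   # A4B8CAB
print(encode_runs("AAAABBBBCCCCCCCC"))  # 4A4B8C
```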
def encode(message):
    count=0
    characters=''
    previous_char=message[0]
    result=''
    length=len(message)
    i=0
    while(i!=length):
        character=message[i]
        if previous_char==character:
            count=count+1
        else:
            result=result+str(count)+previous_char
            count=1
        previous_char=character
        i=i+1
    return result+str(count)+str(previous_char)
encoded_message=encode("ABBBBCCCCCCCCAB")
print(encoded_message)
Input is:ABBBBCCCCCCCCAB
output is:1A4B8C1A1B
def encodeString(s):
    encoded = ""
    ctr = 1
    for i in range(len(s)-1):
        if s[i]==s[i+1]:
            ctr += 1
        else:
            encoded = encoded + str(ctr) + s[i]
            ctr = 1
    # append the final run; s[-1] is always the last character
    encoded = encoded + str(ctr) + s[-1]
    return encoded
Input :"AAAAABBCCDDAB"
Output: 5A2B2C2D1A1B
def encode(message):
    list1=[]
    count=1
    for i in range (1,len(message)):
        if(message[i].upper()==message[i-1].upper()):
            count+=1
        else:
            list1.append(f"{count}{message[i-1].upper()}")
            count=1
        if i == len(message) - 1 :
            list1.append(f"{count}{message[i].upper()}")
    return "".join(list1)
encoded_message=encode("ABBBBCCCCCCCCAB")
print(encoded_message)

Pattern search by NOT using Regex algorithm and code in python

Today I had an interview at AMD and was asked a question that I didn't know how to solve without regex. Here is the question:
Find all matches of the word "Hello" in a text, allowing at most ONE extra character between the letters of hello, e.g. search for all instances of "h.ello", "hell o", "he,llo", or "hel!lo".
Since you also tagged this question algorithm, I'm just going to show the general approach that I would take when looking at this question, without including any language tricks from python.
1) I would want to split the string into a list of words
2) Loop through each string in the resulting list, checking if the string matches 'hello' without the character at the current index (or if it simply matches 'hello')
3) If a match is found, return it.
Here is a simple approach in python:
s = "h.ello hello h!ello hell.o none of these"
words = s.split()  # named `words` rather than `all` to avoid shadowing the builtin
def drop_one(s, match):
    if s == match:
        return True # WARNING: Early Return
    for i in range(len(s) - 1):
        if s[:i] + s[i+1:] == match:
            return True
    return False
matches = [x for x in words if drop_one(x, "hello")]
print(matches)
The output of this snippet:
['h.ello', 'hello', 'h!ello', 'hell.o']
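Because the snippet above splits on whitespace first, a variant such as "hell o" (where the extra character is a space) is never seen as a single word. A possible sliding-window sketch that scans the raw text instead follows. The name find_with_one_gap is invented for this example, and it assumes that matching is case-sensitive and that the extra character must sit between two letters of the word.

```python
def find_with_one_gap(text, word):
    """Return substrings equal to word, or word with one extra inner character."""
    n = len(word)
    matches = []
    for i in range(len(text)):
        if text[i:i + n] == word:
            matches.append(text[i:i + n])   # exact match
            continue
        window = text[i:i + n + 1]
        if len(window) < n + 1:
            continue                        # not enough text left
        # try removing each inner character (never the first or the last)
        if any(window[:k] + window[k + 1:] == word for k in range(1, n)):
            matches.append(window)
    return matches

print(find_with_one_gap("h.ello hell o he,llo hel!lo hello none", "hello"))
# ['h.ello', 'hell o', 'he,llo', 'hel!lo', 'hello']
```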
This should work. I've tried to make it generic. You might have to make some minor adjustments. Let me know if you don't understand any part.
def checkValidity(tlist):
    tmpVar = ''
    for i in range(len(tlist)):
        if tlist[i] in set("hello"):
            tmpVar += tlist[i]
    return(tmpVar == 'hello')
mStr = "he.llo hehellbo hellox hell.o hello helloxy abhell.oyz"
mWord = "hello"
mlen = len(mStr)
wordLen = len(mWord)+1
i=0
print ("given str = ", mStr)
while i<mlen:
    tmpList = []
    if mStr[i] == 'h':
        for j in range(wordLen):
            tmpList.append(mStr[i+j])
        validFlag = checkValidity(tmpList)
        if validFlag:
            print("Match starting at index: ",i, ':', mStr[i:i+wordLen])
            i += wordLen
        else:
            i += 1
    else:
        i += 1

Python regular expression to extract optional number at the end of string

I'm trying to write a Python regular expression that can parse strings of the type "<name>(<number>)", where <number> is optional.
For example, if I pass 'sclkout', then there is no number at the end, so it should just match 'sclkout'. If the input is 'line7', then is should match 'line' and '7'. The name can also contain numbers inside it, so if I give it 'dx3f', then the output should be 'dx3f', but for 'dx3b0' it should match 'dx3b' and 0.
This is what I first tried:
import re
def do_match(signal):
    match = re.match(r'(\w+)(\d+)?', signal)
    assert match
    print("Input = " + signal)
    print("group1 = " + match.group(1))
    if match.lastindex == 2:
        print("group2 = " + match.group(2))
    print("")
# should match 'sclkout'
do_match("sclkout")
# should match 'line' and '7'
do_match("line7")
# should match 'dx4f'
do_match("dx4f")
# should match 'dx3b' and '0'
do_match("dx3b0")
This is of course wrong because of greedy matching in the (\w+) group, so I tried setting that to non-greedy:
match = re.match('(\w+?)(\d+)?', signal)
This however only matches the first letter of the string.
You don't need regex for this:
from itertools import takewhile
def do_match(s):
    num = ''.join(takewhile(str.isdigit, reversed(s)))[::-1]
    return s[:s.rindex(num)], num
...
>>> do_match('sclkout')
('sclkout', '')
>>> do_match('line7')
('line', '7')
>>> do_match('dx4f')
('dx4f', '')
>>> do_match('dx3b0')
('dx3b', '0')
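You can use a lazy (non-greedy) quantifier, anchored at both ends with ^ and $, like this (Ruby syntax, as in the demo below; in Python, named groups are written (?P<name>...)):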
^(?<name>\w+?)(?<number>\d+)?$
Or ^(\w+?)(\d+)?$, if you don't want the named capture groups.
See live demo here: http://rubular.com/r/44Ntc4mLDY
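For Python specifically, a minimal sketch of the same anchored pattern might look as follows. The function name split_signal is made up for this example; the only change to the regex is Python's (?P<name>...) spelling for named groups.

```python
import re

# Lazy name part, optional trailing digits, anchored at both ends
PATTERN = re.compile(r'^(?P<name>\w+?)(?P<number>\d+)?$')

def split_signal(signal):
    m = PATTERN.match(signal)
    if m is None:
        return None
    # group('number') is None when there are no trailing digits
    return m.group('name'), m.group('number')

for s in ("sclkout", "line7", "dx4f", "dx3b0"):
    print(s, split_signal(s))
```

The $ anchor is what makes the lazy \w+? work: without it, the match is allowed to stop after the first character.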
([a-zA-Z0-9]*[a-zA-Z]+)([0-9]*) is what you want.
import re
test = ["sclkout", "line7", "dx4f", "dx3b0"]
ans = [("sclkout", ""), ("line", "7"), ("dx4f", ""), ("dx3b", "0")]
for t, a in zip(test, ans):
    m = re.match(r'([a-zA-Z0-9]*[a-zA-Z]+)([0-9]*)', t)
    if m.groups() == a:
        print("OK")
    else:
        print("NG")
output:
OK
OK
OK
OK

How to find all occurrences of a substring?

Python has string.find() and string.rfind() to get the index of a substring in a string.
I'm wondering whether there is something like string.find_all() which can return all found indexes (not only the first from the beginning or the first from the end).
For example:
string = "test test test test"
print string.find('test') # 0
print string.rfind('test') # 15
#this is the goal
print string.find_all('test') # [0,5,10,15]
For counting the occurrences, see Count number of occurrences of a substring in a string.
There is no simple built-in string function that does what you're looking for, but you could use the more powerful regular expressions:
import re
[m.start() for m in re.finditer('test', 'test test test test')]
#[0, 5, 10, 15]
If you want to find overlapping matches, lookahead will do that:
[m.start() for m in re.finditer('(?=tt)', 'ttt')]
#[0, 1]
If you want a reverse find-all without overlaps, you can combine positive and negative lookahead into an expression like this:
search = 'tt'
[m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
#[1]
re.finditer returns an iterator, so you could change the [] in the above to () to get a generator instead of a list, which will be more efficient if you're only iterating through the results once.
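The pieces above can be combined into one lazy helper, sketched here. The name iter_starts and the overlapping flag are assumptions for illustration; re.escape is added so that a substring containing regex metacharacters (such as ".") is matched literally.

```python
import re

def iter_starts(text, sub, overlapping=False):
    """Lazily yield the start index of each occurrence of sub in text."""
    pattern = re.escape(sub)          # treat sub as literal text
    if overlapping:
        pattern = f"(?={pattern})"    # zero-width lookahead, as above
    return (m.start() for m in re.finditer(pattern, text))

print(list(iter_starts("test test test test", "test")))  # [0, 5, 10, 15]
print(list(iter_starts("ttt", "tt", overlapping=True)))  # [0, 1]
print(list(iter_starts("a.b a.b", ".")))                 # [1, 5]
```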
>>> help(str.find)
Help on method_descriptor:
find(...)
S.find(sub [,start [,end]]) -> int
Thus, we can build it ourselves:
def find_all(a_str, sub):
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1: return
        yield start
        start += len(sub) # use start += 1 to find overlapping matches
list(find_all('spam spam spam spam', 'spam')) # [0, 5, 10, 15]
No temporary strings or regexes required.
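The choice between the two step sizes mentioned in the comment can be turned into a parameter. A sketch (the overlapping flag and the guard against an empty pattern are additions for illustration, not part of the answer above):

```python
def find_all(a_str, sub, overlapping=False):
    """Yield each index at which sub occurs in a_str."""
    if not sub:
        return  # an empty pattern would never advance with step len(sub)
    step = 1 if overlapping else len(sub)
    start = a_str.find(sub)
    while start != -1:
        yield start
        start = a_str.find(sub, start + step)

print(list(find_all("ababa", "aba")))                    # [0]
print(list(find_all("ababa", "aba", overlapping=True)))  # [0, 2]
```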
Here's a (very inefficient) way to get all (i.e. even overlapping) matches:
>>> string = "test test test test"
>>> [i for i in range(len(string)) if string.startswith('test', i)]
[0, 5, 10, 15]
Use re.finditer:
import re
sentence = input("Give me a sentence ")
word = input("What word would you like to find ")
# re.escape makes the word match literally even if it contains regex metacharacters
for match in re.finditer(re.escape(word), sentence):
    print((match.start(), match.end()))
For word = "this" and sentence = "this is a sentence this this" this will yield the output:
(0, 4)
(19, 23)
(24, 28)
Again, old thread, but here's my solution using a generator and plain str.find.
def findall(p, s):
    '''Yields all the positions of
    the pattern p in the string s.'''
    i = s.find(p)
    while i != -1:
        yield i
        i = s.find(p, i+1)
Example
x = 'banananassantana'
[(i, x[i:i+2]) for i in findall('na', x)]
returns
[(2, 'na'), (4, 'na'), (6, 'na'), (14, 'na')]
You can use re.finditer() for non-overlapping matches.
>>> import re
>>> aString = 'this is a string where the substring "is" is repeated several times'
>>> print [(a.start(), a.end()) for a in list(re.finditer('is', aString))]
[(2, 4), (5, 7), (38, 40), (42, 44)]
but won't work for:
In [1]: aString="ababa"
In [2]: print [(a.start(), a.end()) for a in list(re.finditer('aba', aString))]
Output: [(0, 3)]
Come, let us recurse together.
def locations_of_substring(string, substring):
    """Return a list of locations of a substring."""
    substring_length = len(substring)
    def recurse(locations_found, start):
        location = string.find(substring, start)
        if location != -1:
            return recurse(locations_found + [location], location+substring_length)
        else:
            return locations_found
    return recurse([], 0)
print(locations_of_substring('this is a test for finding this and this', 'this'))
# prints [0, 27, 36]
No need for regular expressions this way.
If you're just looking for a single character, this would work:
from functools import reduce  # reduce is no longer a builtin in Python 3
string = "dooobiedoobiedoobie"
match = 'o'
reduce(lambda count, char: count + 1 if char == match else count, string, 0)
# produces 7
Also,
string = "test test test test"
match = "test"
len(string.split(match)) - 1
# produces 4
My hunch is that neither of these (especially #2) is terribly performant.
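If only the number of non-overlapping occurrences is needed, the built-in str.count method returns it directly. A brief sketch comparing it with the two expressions above (the variable names are chosen for this example):

```python
from functools import reduce

string = "dooobiedoobiedoobie"
by_reduce = reduce(lambda count, char: count + 1 if char == "o" else count, string, 0)
by_count = string.count("o")
print(by_reduce, by_count)  # 7 7

text = "test test test test"
by_split = len(text.split("test")) - 1
print(by_split, text.count("test"))  # 4 4
```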
This is an old thread, but I got interested and wanted to share my solution.
def find_all(a_string, sub):
    result = []
    k = 0
    while k < len(a_string):
        k = a_string.find(sub, k)
        if k == -1:
            return result
        else:
            result.append(k)
            k += 1 #change to k += len(sub) to not search overlapping results
    return result
It should return a list of positions where the substring was found.
Please comment if you see an error or room for improvement.
This does the trick for me using re.finditer
import re
text = 'This is sample text to test if this pythonic '\
'program can serve as an indexing platform for '\
'finding words in a paragraph. It can give '\
'values as to where the word is located with the '\
'different examples as stated'
# find all occurrences of the word 'as' in the above text
find_the_word = re.finditer('as', text)
for match in find_the_word:
    print('start {}, end {}, search string \'{}\''.
          format(match.start(), match.end(), match.group()))
This thread is a little old but this worked for me:
numberString = "onetwothreefourfivesixseveneightninefiveten"
testString = "five"
marker = 0
while marker < len(numberString):
    try:
        index = numberString.index(testString, marker)
        print(index)
        marker = index + 1
    except ValueError:
        # raised once no further occurrence exists, so this is always printed last
        print("String not found")
        marker = len(numberString)
You can try :
>>> string = "test test test test"
>>> for index,value in enumerate(string):
if string[index:index+(len("test"))] == "test":
print index
0
5
10
15
You can try :
import re
str1 = "This dress looks good; you have good taste in clothes."
substr = "good"
result = [_.start() for _ in re.finditer(substr, str1)]
# result = [17, 32]
When looking for a large amount of key words in a document, use flashtext
from flashtext import KeywordProcessor
words = ['test', 'exam', 'quiz']
txt = 'this is a test'
kwp = KeywordProcessor()
kwp.add_keywords_from_list(words)
result = kwp.extract_keywords(txt, span_info=True)
Flashtext runs faster than regex on large list of search words.
This function does not examine every position in the string, so it does not waste compute resources. My try:
def findAll(string,word):
    all_positions=[]
    next_pos=-1
    while True:
        next_pos=string.find(word,next_pos+1)
        if(next_pos<0):
            break
        all_positions.append(next_pos)
    return all_positions
To use it, call it like this:
result=findAll('this word is a big word man how many words are there?','word')
src = input() # we will find substring in this string
sub = input() # substring
res = []
pos = src.find(sub)
while pos != -1:
    res.append(pos)
    pos = src.find(sub, pos + 1)
All of the solutions provided by others are based on the built-in find() method or other available methods.
What is the core basic algorithm to find all the occurrences of a substring in a string?
def find_all(string,substring):
    """
    Function: Returning all the index of substring in a string
    Arguments: String and the search string
    Return:Returning a list
    """
    length = len(substring)
    c=0
    indexes = []
    while c < len(string):
        if string[c:c+length] == substring:
            indexes.append(c)
        c=c+1
    return indexes
You can also subclass str and use the function below as a method of the new class.
class newstr(str):
    def find_all(string,substring):
        """
        Function: Returning all the index of substring in a string
        Arguments: String and the search string
        Return:Returning a list
        """
        length = len(substring)
        c=0
        indexes = []
        while c < len(string):
            if string[c:c+length] == substring:
                indexes.append(c)
            c=c+1
        return indexes
Calling the method
newstr.find_all('Do you find this answer helpful? then upvote this!','this')
This is the solution to a similar question from HackerRank. I hope it helps.
import re
a = input()
b = input()
if b not in a:
    print((-1,-1))
else:
    # collect the start index of every (possibly overlapping) match;
    # re.escape keeps special characters in b from being read as regex syntax
    start_indc = [m.start() for m in re.finditer('(?=' + re.escape(b) + ')', a)]
    for i in range(len(start_indc)):
        print((start_indc[i], start_indc[i]+len(b)-1))
Output:
aaadaa
aa
(0, 1)
(1, 2)
(4, 5)
Here's a solution that I came up with, using assignment expression (new feature since Python 3.8):
string = "test test test test"
phrase = "test"
start = -1
result = [(start := string.find(phrase, start + 1)) for _ in range(string.count(phrase))]
Output:
[0, 5, 10, 15]
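One caveat: str.count only counts non-overlapping occurrences, while the search above advances by one character at a time, so the two can disagree when matches overlap. A while-loop sketch with the same assignment expression avoids needing the count up front (it also requires Python 3.8+; the function name find_starts is made up here):

```python
def find_starts(string, phrase):
    result = []
    start = -1
    # keep searching one character past the previous hit until find() gives -1
    while (start := string.find(phrase, start + 1)) != -1:
        result.append(start)
    return result

print(find_starts("test test test test", "test"))  # [0, 5, 10, 15]
print(find_starts("ababa", "aba"))                 # [0, 2]
```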
I think the cleanest solution is one without libraries or yields:
def find_all_occurrences(string, sub):
    index_of_occurrences = []
    current_index = 0
    while True:
        current_index = string.find(sub, current_index)
        if current_index == -1:
            return index_of_occurrences
        else:
            index_of_occurrences.append(current_index)
            current_index += len(sub)
find_all_occurrences(string, substr)
Note: the find() method returns -1 when it can't find anything
The pythonic way would be:
mystring = 'Hello World, this should work!'
find_all = lambda c,s: [x for x in range(c.find(s), len(c)) if c[x] == s]
# c is the string to search in
# s is the single character to search for
find_all(mystring,'o') # will return all positions of 'o'
[4, 7, 20, 26]
if you only want to use numpy here is a solution
import numpy as np
S= "test test test test"
S2 = 'test'
inds = np.cumsum([len(k)+len(S2) for k in S.split(S2)[:-1]])- len(S2)
print(inds)
if you want to use without re(regex) then:
find_all = lambda _str,_w : [ i for i in range(len(_str)) if _str.startswith(_w,i) ]
string = "test test test test"
print( find_all(string, 'test') ) # >>> [0, 5, 10, 15]
please look at below code
#!/usr/bin/env python
# coding:utf-8
'''黄哥Python'''
def get_substring_indices(text, s):
    result = [i for i in range(len(text)) if text.startswith(s, i)]
    return result
if __name__ == '__main__':
    text = "How much wood would a wood chuck chuck if a wood chuck could chuck wood?"
    s = 'wood'
    print(get_substring_indices(text, s))
def find_index(string, let):
    enumerated = [place for place, letter in enumerate(string) if letter == let]
    return enumerated
for example :
find_index("hey doode find d", "d")
returns:
[4, 7, 13, 15]
Not exactly what OP asked but you could also use the split function to get a list of where all the substrings don't occur. OP didn't specify the end goal of the code but if your goal is to remove the substrings anyways then this could be a simple one-liner. There are probably more efficient ways to do this with larger strings; regular expressions would be preferable in that case
# Extract all non-substrings
s = "an-example-string"
s_no_dash = s.split('-')
# >>> s_no_dash
# ['an', 'example', 'string']
# Or extract and join them into a sentence
s_no_dash2 = ' '.join(s.split('-'))
# >>> s_no_dash2
# 'an example string'
Did a brief skim of other answers so apologies if this is already up there.
def count_substring(string, sub_string):
    c=0
    # the last possible start index is len(string) - len(sub_string)
    for i in range(0, len(string) - len(sub_string) + 1):
        if string[i:i+len(sub_string)] == sub_string:
            c+=1
    return c
if __name__ == '__main__':
    string = input().strip()
    sub_string = input().strip()
    count = count_substring(string, sub_string)
    print(count)
I ran into the same problem and did this:
hw = 'Hello oh World!'
list_hw = list(hw)
o_in_hw = []
while True:
    o = hw.find('o')
    if o != -1:
        o_in_hw.append(o)
        list_hw[o] = ' '
        hw = ''.join(list_hw)
    else:
        print(o_in_hw)
        break
I'm pretty new at coding, so you can probably simplify it (and if you plan to use it repeatedly, of course make it a function).
All in all, it works as intended for what I was doing.
Edit: Please note that this works for single characters only, and it changes your variable, so you have to copy the string into a new variable first if you want to keep it. I didn't put that in the code because it's easy, and the code is only meant to show how I made it work.
By slicing, we build every possible substring, append each one to a list, and then use the count function to find how many times the target occurs
s=input()
n=len(s)
l=[]
f=input()
print(s[0])
for i in range(0,n):
    for j in range(1,n+1):
        l.append(s[i:j])
if f in l:
    print(l.count(f))
To find all the occurrences of each character in a given string and return them as a dictionary
eg: hello
result :
{'h':1, 'e':1, 'l':2, 'o':1}
def count(string):
    result = {}
    if(string):
        for i in string:
            result[i] = string.count(i)
        return result
    return {}
or else you do like this
from collections import Counter
def count(string):
    return Counter(string)
