Python: Find X to Y in a list of strings

I have a list of maybe a 100 or so elements that is actually an email with each line as an element. The list is slightly variable because lines that have a \n in them are put in a separate element so I can't simply slice using fixed values. I essentially need a variable start and stop phrase (needs to be a partial search as well because one of my start phrases might actually be Total Cost: $13.43 so I would just use Total Cost:.) Same thing with the end phrase. I also do not wish to include the start/stop phrases in the returned list. In summary:
>>> email = ['apples','bananas','cats','dogs','elephants','fish','gee']
>>> start = 'ban'
>>> stop = 'ele'
# the magic here
>>> print new_email
['cats', 'dogs']
NOTES
While not perfect formatting of the email, it is fairly consistent so there is a slim chance a start/stop phrase will occur more than once.
There are also no blank elements.
SOLUTION
Just for funzies and thanks to everybody's help here is my final code:
def get_elements_positions(stringList=list(), startPhrase=None, stopPhrase=None):
    elementPositionStart, elementPositionStop = 0, -1
    if startPhrase:
        elementPositionStart = next((i for i, j in enumerate(stringList) if j.startswith(startPhrase)), 0)
    if stopPhrase:
        elementPositionStop = next((i for i, j in enumerate(stringList) if j.startswith(stopPhrase)), -1)
    if elementPositionStart + 1 == elementPositionStop - 1:
        return elementPositionStart + 1
    else:
        return [elementPositionStart, elementPositionStop]
It returns a list with the starting and ending element position and defaults to 0 and -1 if the respective value cannot be found. (0 being the first element and -1 being the last).
SOLUTION-B
I made a small change: if the start and stop positions leave exactly one element between them, the function now returns that element's position as an integer instead of a list. You still get a list for multi-line results.
Thanks again!

>>> email = ['apples','bananas','cats','dogs','elephants','fish','gee']
>>> start, stop = 'ban', 'ele'
>>> ind_s = next(i for i, j in enumerate(email) if j.startswith(start))
>>> ind_e = next(i for i, j in enumerate(email) if j.startswith(stop) and i > ind_s)
>>> email[ind_s+1:ind_e]
['cats', 'dogs']
To satisfy conditions when element might not be in the list:
>>> def get_ind(prefix, prev=-1):
...     it = (i for i, j in enumerate(email) if i > prev and j.startswith(prefix))
...     return next(it, None)
...
>>> start = get_ind('ban')
>>> start = -1 if start is None else start
>>> stop = get_ind('ele', start)
>>> email[start+1:stop]
['cats', 'dogs']
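The two lookups above can be wrapped into one reusable helper. This is only a sketch: the name lines_between is made up for illustration, and it assumes the phrases may appear anywhere in a line (substring match with `in`, as the question asks for) rather than only at the start. A missing start phrase means "from the beginning" and a missing stop phrase means "to the end".

```python
def lines_between(lines, start, stop):
    """Return the elements strictly between the first line containing
    `start` and the next later line containing `stop`."""
    # Index of the first element containing the start phrase, or -1 if absent.
    first = next((i for i, line in enumerate(lines) if start in line), -1)
    # Index of the first later element containing the stop phrase, or None if absent.
    last = next(
        (i for i, line in enumerate(lines) if i > first and stop in line),
        None,
    )
    # Slicing with None as the end runs to the end of the list.
    return lines[first + 1:last]


email = ['apples', 'bananas', 'cats', 'dogs', 'elephants', 'fish', 'gee']
print(lines_between(email, 'ban', 'ele'))  # ['cats', 'dogs']
```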

An itertools based approach:
import itertools
email = ['apples','bananas','cats','dogs','elephants','fish','gee']
start, stop = 'ban', 'ele'
findstart = itertools.dropwhile(lambda item: not item.startswith(start), email)
findstop = itertools.takewhile(lambda item: not item.startswith(stop), findstart)
print(list(findstop)[1:])
# ['cats', 'dogs']

Here you go:
>>> email = ['apples','bananas','cats','dogs','elephants','fish','gee']
>>> start = 'ban'
>>> stop = 'ele'
>>> out = []
>>> appending = False
>>> for item in email:
...     if appending:
...         if stop in item:
...             out.append(item)
...             break
...         else:
...             out.append(item)
...     elif start in item:
...         out.append(item)
...         appending = True
...
>>> out.pop(0)
'bananas'
>>> out.pop()
'elephants'
>>> print out
['cats', 'dogs']
I think my version is much more readable than the other answers and doesn't require any imports =)

Related

How to get the index of a repeating element in list?

I wanted to make a Japanese transliteration program.
I won't explain the details, but some characters in pairs have different values than if they were separated, so I made a loop that gets two characters (current and next)
b = "きゃきゃ"
b = list(b)
name = ""
for i in b:
    if b.index(i) + 1 <= len(b) - 1:
        if i in "き / キ" and b[b.index(i) + 1] in "ゃ ャ":
            if b[b.index(i) + 1] != " ":
                del b[b.index(i) + 1]
                del b[int(b.index(i))]
            cur = "kya"
            name += cur
print(name)
But b.index(i) always returns index 0 for "き", so I can't check it more than once.
How can I change that?
I tried deleting an element after analyzing it, but that didn't help.

Rather than looking ahead a character, it may be easier to store a reference to the previous character and replace the previous transliteration when you find a combo match.
Example (I'm not sure if I got all of the transliterations correct):
COMBOS = {('き', 'ゃ'): 'kya', ('き', 'ャ'): 'kya', ('キ', 'ゃ'): 'kya', ('キ', 'ャ'): 'kya'}
TRANSLITERATIONS = {'き': 'ki', 'キ': 'ki', 'ャ': 'ya', 'ゃ': 'ya'}
def transliterate(text: str) -> str:
    transliterated = []
    last = None
    for c in text:
        try:
            combo = COMBOS[(last, c)]
        except KeyError:
            transliterated.append(TRANSLITERATIONS.get(c, c))
        else:
            transliterated.pop()  # remove the last value that was added
            transliterated.append(combo)
        last = c
    return ''.join(transliterated)  # combine the transliterations into a single str
That being said, rather than re-inventing the wheel, it may make more sense to use an existing library that already handles transliterating Japanese to romaji, such as Pykakasi.
Example:
>>> import pykakasi
>>> kks = pykakasi.kakasi()
>>> kks.convert('きゃ')
[{'orig': 'きゃ', 'hira': 'きゃ', 'kana': 'キャ', 'hepburn': 'kya', 'kunrei': 'kya', 'passport': 'kya'}]

if you are looking for the indices of 'き':
b = "きゃきゃ"
b = list(b)
indices = [i for i, x in enumerate(b) if x == "き"]
print(indices)
[0, 2]

Python - removing repeated letters in a string

Say I have a string in alphabetical order, based on the amount of times that a letter repeats.
Example: "BBBAADDC".
There are 3 B's, so they go at the start, 2 A's and 2 D's, so the A's go in front of the D's because they are in alphabetical order, and 1 C. Another example would be CCCCAAABBDDAB.
Note that there can be 4 letters in the middle somewhere (i.e. CCCC), as there could be 2 pairs of 2 letters.
However, let's say I can only have n letters in a row. For example, if n = 3 in the second example, then I would have to omit one "C" from the first substring of 4 C's, because there can only be a maximum of 3 of the same letters in a row.
Another example would be the string "CCCDDDAABC"; if n = 2, I would have to remove one C and one D to get the string CCDDAABC
Example input/output:
n=2: Input: AAABBCCCCDE, Output: AABBCCDE
n=4: Input: EEEEEFFFFGGG, Output: EEEEFFFFGGG
n=1: Input: XXYYZZ, Output: XYZ
How can I do this with Python? Thanks in advance!
This is what I have right now, although I'm not sure if it's on the right track. Here, z is the length of the string.
for k in range(z+1):
    if final_string[k] == final_string[k+1] == final_string[k+2] == final_string[k+3]:
        final_string = final_string.translate({ord(final_string[k]): None})
return final_string

Ok, based on your comment, you're either pre-sorting the string or it doesn't need to be sorted by the function you're trying to create. You can do this more easily with itertools.groupby():
import itertools
def max_seq(text, n=1):
    result = []
    for k, g in itertools.groupby(text):
        result.extend(list(g)[:n])
    return ''.join(result)
max_seq('AAABBCCCCDE', 2)
# 'AABBCCDE'
max_seq('EEEEEFFFFGGG', 4)
# 'EEEEFFFFGGG'
max_seq('XXYYZZ')
# 'XYZ'
max_seq('CCCDDDAABC', 2)
# 'CCDDAABC'
In each group g, it's expanded and then sliced until n elements (the [:n] part) so you get each letter at most n times in a row. If the same letter appears elsewhere, it's treated as an independent sequence when counting n in a row.
Edit: Here's a shorter version, which may also perform better for very long strings. And while we're using itertools, this one additionally utilises itertools.chain.from_iterable() to create the flattened list of letters. And since each of these is a generator, it's only evaluated/expanded at the last line:
import itertools
def max_seq(text, n=1):
    sequences = (list(g)[:n] for _, g in itertools.groupby(text))
    letters = itertools.chain.from_iterable(sequences)
    return ''.join(letters)
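A possible further tweak, sketched here under the assumption that runs can be very long: list(g)[:n] copies a whole run before slicing it, whereas itertools.islice stops after n items. The name max_seq_lazy is made up for this example.

```python
from itertools import chain, groupby, islice


def max_seq_lazy(text, n=1):
    # islice takes at most n items from each run, so a long run is never
    # copied into a list; groupby skips the rest of the run on its own
    # when the next group is requested.
    runs = (islice(group, n) for _, group in groupby(text))
    return ''.join(chain.from_iterable(runs))


print(max_seq_lazy('AAABBCCCCDE', 2))   # AABBCCDE
print(max_seq_lazy('EEEEEFFFFGGG', 4))  # EEEEFFFFGGG
print(max_seq_lazy('XXYYZZ'))           # XYZ
```

This relies on the runs being consumed in order, which chain.from_iterable does.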

hello = "hello frrriend"
def replacing(text: str) -> str:
    result = []
    prev = None
    for i in text:
        # keep a character only if it differs from the one just before it
        if i != prev:
            result.append(i)
        prev = i
    return ''.join(result)
replacing(hello)
# 'helo friend'
It looks a bit primitive, but it collapses every run of repeated letters to a single letter (the n = 1 case). That's what I came up with on the go anyway, hope it helps :D

Here's my solution:
def snip_string(string, n):
    list_string = list(string)
    list_string.sort()
    chars = set(string)
    for char in chars:
        while list_string.count(char) > n:
            list_string.remove(char)
    return ''.join(list_string)
Calling the function with various values for n gives the following output:
>>> string = "AAAABBBCCCDDD"
>>> snip_string(string, 1)
'ABCD'
>>> snip_string(string, 2)
'AABBCCDD'
>>> snip_string(string, 3)
'AAABBBCCCDDD'
>>>
Edit
Here is the updated version of my solution, which only removes characters if the group of repeated characters exceeds n.
import itertools
def snip_string(string, n):
    groups = [list(g) for k, g in itertools.groupby(string)]
    string_list = []
    for group in groups:
        while len(group) > n:
            del group[-1]
        string_list.extend(group)
    return ''.join(string_list)
Output:
>>> string = "DDDAABBBBCCABCDE"
>>> snip_string(string, 3)
'DDDAABBBCCABCDE'

from itertools import groupby
n = 2
def rem(string):
    out = "".join(["".join(list(g)[:n]) for _, g in groupby(string)])
    print(out)
So this is the entire code for your question.
s = "AABBCCDDEEE"
s2 = "AAAABBBDDDDDDD"
s3 = "CCCCAAABBDDABBB"
s4 = "AAAAAAAA"
z = "AAABBCCCCDE"
With following test:
AABBCCDDEE
AABBDD
CCAABBDDABB
AA
AABBCCDE

How to speed up combination algorithm?

The code below finds the minimum number of items of list B that together form string A. Let's assume A='hello world how are you doing' and B=['hello world how', 'hello are' ,'hello', 'hello are you doing']. Then, since the items with index 0 and 3 together contain all the words of string A, the answer will be 2.
I converted all the strings to integers to speed up the algorithm, but since there are larger and more complicated test cases, I need a more optimized algorithm. I am wondering how to speed up this algorithm.
import itertools
A='hello world how are you doing'
B=['hello world how', 'hello are' ,'hello', 'hello are you doing']
d = {}
res_A = [d.setdefault(word, len(d)+1) for word in A.lower().split()]
mapping = dict(zip(A.split(), range(1, len(A) + 1)))
# find mappings of words in B
res_B = [[mapping[word] for word in s.split()] for s in B]
set_a = set(res_A)
solved = False
for L in range(0, len(res_B)+1):
    for subset in itertools.combinations(res_B, L):
        s = set(item for sublist in subset for item in sublist)
        if set_a.issubset(s):
            print(f'{L}')
            solved = True
            break
    if solved: break

I had a logic mistake in remove_sub; no idea why it still worked.
Try cleaning the data and removing as many items from b as possible:
import itertools as it
import time
import numpy as np
from collections import Counter, defaultdict as dd
import copy
A='hello world how are you doing'
B=['hello world how', 'hello are' ,'hello', 'hello are you doing']
d = {}
res_A = [d.setdefault(word, len(d)+1) for word in A.lower().split()]
mapping = dict(zip(A.split(), range(1, len(A) + 1)))
# find mappings of words in B
res_B = [[mapping[word] for word in s.split()] for s in B]
set_a = set(res_A)
# my adding works on list of sets
for i in range(len(res_B)):
    res_B[i] = set(res_B[i])
# a is a list of numbers, b is a list of sets of numbers, we are trying to cover a using min items from b
a = np.random.randint(0,50,size = 30)
np_set_a = set(a)
b = []
for i in range(200):
    size = np.random.randint(0,20)
    b.append(set(np.random.choice(a,size)))
# till here, created a,b for larger data test
def f1(set_a, b):
    solved = False
    for L in range(0, len(b)+1):
        for subset in it.combinations(b, L):
            s = set(item for sublist in subset for item in sublist)
            if set_a.issubset(s):
                print(f'{L}','**************f1')
                solved = True
                break
        if solved: break
def rare(b):
    c = Counter()  # maps each number to how many times it appears across all sets in b
    items = dd(list)  # maps each number to the list of indices of the sets in b that contain it
    for i in range(len(b)):
        c.update(b[i])
        for num in b[i]:
            items[num].append(i)
    rare = set()
    common = c.most_common()  # list of (number, count) tuples, most frequent first
    # Walk from the least frequent end and take the numbers that appear only once in b.
    # The sets holding them have to be in the final combination, so you can remove them
    # from b and their numbers from a, because those numbers are covered.
    for i in range(1, len(common)-1):
        if common[-i][1] == 1:
            rare.add(common[-i][0])  # add the number itself, not the (number, count) tuple
            continue
        break
    rare_items = set()  # indices of all the sets that contain a rare number
    for k in rare:
        rare_items.update(items[k])
    values_from_rare_items = set()  # all the numbers in the sets that contain rare numbers
    for i in rare_items:
        values_from_rare_items.update(b[i])
    # Remove those sets from b (highest index first); they are already part of the answer.
    for i in reversed(sorted(rare_items)):
        b.pop(i)
    return values_from_rare_items, b, len(rare_items)
#check sets on b, if 2 are equal remove 1, if 1 is a subset of the other, remove it
def remove_sub(b):
    to_pop = set()
    t = copy.deepcopy(b)
    for i in range(len(b)):
        for j in range(len(t)):
            if i == j:
                continue
            if b[i] == t[j]:
                to_pop.add(max(i, j))  # of two equal sets, drop only the later one
                continue
            if b[i].issubset(t[j]):
                to_pop.add(i)
            if t[j].issubset(b[i]):
                to_pop.add(j)
    for i in reversed(sorted(to_pop)):
        b.pop(i)
    return b
def f2(set_a, b):
    b1 = remove_sub(b)
    values_from_rare_items,b2, num_rare_items = rare(b)
    a_without_rare = set_a-values_from_rare_items #remove from a all the number you added with the rare unique numbers, they are already covered
    solved = False
    for L in range(0, len(b2)+1):
        for subset in it.combinations(b2, L):
            s = set(item for sublist in subset for item in sublist)
            if a_without_rare.issubset(s):
                length = L+num_rare_items
                print(f'{length}', "*********f2")
                solved = True
                break
        if solved: break
s = time.time()
f1(set_a,b)
print(time.time()-s,'********************f1')
s = time.time()
f2(set_a,b)
print(time.time()-s,'******************f2')
s = time.time()
f1(set_a,res_B)
print(time.time()-s,'********************f1')
s = time.time()
f2(set_a,res_B)
print(time.time()-s,'******************f2')
This is the output:
2 **************f1
0.16755199432373047 ********************f1 num_array
2 *********f2
0.09078240394592285 ******************f2 num_array
2 **************f1
0.0009989738464355469 ********************f1 your_data
2 *********f2
0.0009975433349609375 ******************f2 your_data
You can improve it further by taking all the items that appear just a few times and treating them as if they appear once. In rare cases the result will not be the true minimum, but the time improvement is significant.
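If an answer that is occasionally larger than the true minimum is acceptable, as suggested above, a different and much cheaper technique is the classic greedy set-cover heuristic: keep picking the item that covers the most still-uncovered words. This is not the method from the answer, only a sketch of an alternative; the name greedy_cover is made up, and it works on plain sets of words rather than the integer mapping.

```python
def greedy_cover(target, subsets):
    """Greedily choose indices of `subsets` until `target` is covered.

    Returns the list of chosen indices, or None if no cover exists.
    The result is an approximation and may exceed the true minimum.
    """
    uncovered = set(target)
    chosen = []
    while uncovered:
        # Pick the subset that covers the most still-uncovered elements.
        best = max(
            range(len(subsets)),
            key=lambda i: len(uncovered & subsets[i]),
            default=None,
        )
        if best is None or not (uncovered & subsets[best]):
            return None  # the remaining elements cannot be covered
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


A = 'hello world how are you doing'
B = ['hello world how', 'hello are', 'hello', 'hello are you doing']
word_sets = [set(s.split()) for s in B]
print(len(greedy_cover(A.split(), word_sets)))  # 2
```

Each round scans every subset once, so the cost grows roughly with the number of subsets times the number of rounds instead of with the number of combinations.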

python - string match only whole words

I have two lists - query and line. My code finds if a query such as:
["president" ,"publicly"]
Is contained in a line (order matters) such as:
["president" ,"publicly", "told"]
And this is the code I'm currently using:
if ' '.join(query) in ' '.join(line)
Problem is, I want to match whole words only. So the query below won't pass the condition statement:
["president" ,"pub"]
How can I do that?

Here is one way:
re.search(r'\b' + re.escape(' '.join(query)) + r'\b', ' '.join(line)) is not None
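For illustration, the same expression wrapped in a small function and run against the two queries from the question. The function name contains_phrase is made up for this sketch.

```python
import re


def contains_phrase(query, line):
    """True if the words of `query` occur in `line` consecutively, as whole words."""
    # \b on both ends stops the phrase from matching inside a longer word.
    pattern = r'\b' + re.escape(' '.join(query)) + r'\b'
    return re.search(pattern, ' '.join(line)) is not None


line = ["president", "publicly", "told"]
print(contains_phrase(["president", "publicly"], line))  # True
print(contains_phrase(["president", "pub"], line))       # False
```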

Just use the "in" operator:
mylist = ['foo', 'bar', 'baz']
'foo' in mylist -> returns True
'bar' in mylist -> returns True
'fo' in mylist -> returns False
'ba' in mylist -> returns False

You could use regexes and the \b word boundaries:
import re
# join the escaped words with a space; \b on each side keeps every word whole
the_regex = re.compile(r'\b' + r'\b \b'.join(map(re.escape, ['president', 'pub'])) + r'\b')
if the_regex.search(' '.join(line)):
    print('matching')
else:
    print('not matching')
As an alternative you can write a function to check if a given list is a sublist of the line. Something like:
def find_sublist(sub, lst):
    """Return the index at which sub starts inside lst, or -1 if it is not there."""
    if not sub:
        return 0
    cur_index = 0
    while cur_index < len(lst):
        try:
            cur_index = lst.index(sub[0], cur_index)
        except ValueError:
            return -1  # the first word never occurs again
        if lst[cur_index:cur_index + len(sub)] == sub:
            return cur_index
        cur_index += 1  # keep searching after this candidate
    return -1
Which you can use as:
if find_sublist(query, line) >= 0:
    print('matching')
else:
    print('not matching')

Just for fun you can also do:
a = ["president" ,"publicly", "told"]
b = ["president" ,"publicly"]
c = ["president" ,"pub"]
d = ["publicly", "president"]
e = ["publicly", "told"]
# zip is the Python 3 equivalent of Python 2's itertools.izip
not [l for l,n in zip(a, b) if l != n] ## True
not [l for l,n in zip(a, c) if l != n] ## False
not [l for l,n in zip(a, d) if l != n] ## False
## to support query in the middle of the line:
try:
    query_list = a[a.index(e[0]):]
    not [l for l,n in zip(query_list, e) if l != n] ## True
except ValueError:
    pass

You can use the issubset method to achieve this. Simply do:
a = ["president" ,"publicly"]
b = ["president" ,"publicly", "told"]
if set(a).issubset(b):
    pass  # bla bla
The condition is true when every word in a also appears in b. Note that it ignores word order.

You can use the built-in all() function as a quantifier:
if all(word in b for word in a):
    """ all words in list"""
Note that this may not be efficient at run time for long lists. It is better to use a set instead of a list for b (the list of words to search in).
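A minimal sketch of the set-based variant, reusing the sample lists from the thread. The name b_words is made up here. Like the list version, this checks membership only and ignores word order.

```python
a = ["president", "publicly"]
b = ["president", "publicly", "told"]

# Build the set once; each membership test is then O(1) on average
# instead of a scan through the whole list.
b_words = set(b)

print(all(word in b_words for word in a))                     # True
print(all(word in b_words for word in ["president", "pub"]))  # False
```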

Here is a non-regex way of doing it. I'm sure regex would be much faster than this:
>>> query = ['president', 'publicly']
>>> line = ['president', 'publicly', 'told']
>>> any(query == line[i:i+len(query)] for i in range(len(line) - len(query) + 1))
True
>>> query = ["president" ,"pub"]
>>> any(query == line[i:i+len(query)] for i in range(len(line) - len(query) + 1))
False

Explicit is better than implicit. And as ordering matters, I would write it down like this:
query = ['president','publicly']
query_false = ['president','pub']
line = ['president','publicly','told']
query_len = len(query)
blocks = [line[i:i+query_len] for i in xrange(len(line)-query_len+1)]
blocks holds all relevant combinations to check for:
[['president', 'publicly'], ['publicly', 'told']]
Now you can simply check if your query is in that list:
print query in blocks # -> True
print query_false in blocks # -> False
The code works the way you would probably explain the straight forward solution in words, which is usually a good sign to me. If you have long lines and performance becomes a problem, you can replace the generated list by a generator.
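A sketch of the generator variant mentioned above, with the made-up name contains_block: the blocks are produced one at a time and any() stops at the first match, so the full list of blocks is never built.

```python
def contains_block(query, line):
    n = len(query)
    # Generator expression: each slice is created only when any() asks for it.
    return any(line[i:i + n] == query for i in range(len(line) - n + 1))


line = ['president', 'publicly', 'told']
print(contains_block(['president', 'publicly'], line))  # True
print(contains_block(['president', 'pub'], line))       # False
```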

How to find all occurrences of a substring?

Python has string.find() and string.rfind() to get the index of a substring in a string.
I'm wondering whether there is something like string.find_all() which can return all found indexes (not only the first from the beginning or the first from the end).
For example:
string = "test test test test"
print string.find('test') # 0
print string.rfind('test') # 15
#this is the goal
print string.find_all('test') # [0,5,10,15]
For counting the occurrences, see Count number of occurrences of a substring in a string.

There is no simple built-in string function that does what you're looking for, but you could use the more powerful regular expressions:
import re
[m.start() for m in re.finditer('test', 'test test test test')]
#[0, 5, 10, 15]
If you want to find overlapping matches, lookahead will do that:
[m.start() for m in re.finditer('(?=tt)', 'ttt')]
#[0, 1]
If you want a reverse find-all without overlaps, you can combine positive and negative lookahead into an expression like this:
search = 'tt'
[m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
#[1]
re.finditer returns a generator, so you could change the [] in the above to () to get a generator instead of a list which will be more efficient if you're only iterating through the results once.
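For example, with parentheses instead of brackets the matches are produced lazily, one per request:

```python
import re

text = 'test test test test'
# Generator expression: the next match is only searched for when it is requested.
starts = (m.start() for m in re.finditer('test', text))

print(next(starts))  # 0
print(list(starts))  # [5, 10, 15]
```

Note that a generator can only be consumed once; build a list if the positions are needed again.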

>>> help(str.find)
Help on method_descriptor:
find(...)
S.find(sub [,start [,end]]) -> int
Thus, we can build it ourselves:
def find_all(a_str, sub):
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1: return
        yield start
        start += len(sub) # use start += 1 to find overlapping matches
list(find_all('spam spam spam spam', 'spam')) # [0, 5, 10, 15]
No temporary strings or regexes required.
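The choice between overlapping and non-overlapping matches can also be made a parameter. This is a sketch; the overlapping flag is an addition for illustration and not part of the answer above.

```python
def find_all(a_str, sub, overlapping=False):
    # Step by one character to allow overlaps, otherwise skip past the match.
    # max(..., 1) avoids an endless loop if sub is the empty string.
    step = 1 if overlapping else max(len(sub), 1)
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1:
            return
        yield start
        start += step


print(list(find_all('spam spam spam spam', 'spam')))  # [0, 5, 10, 15]
print(list(find_all('ttt', 'tt')))                    # [0]
print(list(find_all('ttt', 'tt', overlapping=True)))  # [0, 1]
```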

Here's a (very inefficient) way to get all (i.e. even overlapping) matches:
>>> string = "test test test test"
>>> [i for i in range(len(string)) if string.startswith('test', i)]
[0, 5, 10, 15]

Use re.finditer:
import re
sentence = input("Give me a sentence ")
word = input("What word would you like to find ")
for match in re.finditer(re.escape(word), sentence):  # escape so the word is matched literally
    print((match.start(), match.end()))
For word = "this" and sentence = "this is a sentence this this" this will yield the output:
(0, 4)
(19, 23)
(24, 28)

Again, old thread, but here's my solution using a generator and plain str.find.
def findall(p, s):
    '''Yields all the positions of
    the pattern p in the string s.'''
    i = s.find(p)
    while i != -1:
        yield i
        i = s.find(p, i+1)
Example
x = 'banananassantana'
[(i, x[i:i+2]) for i in findall('na', x)]
returns
[(2, 'na'), (4, 'na'), (6, 'na'), (14, 'na')]

You can use re.finditer() for non-overlapping matches.
>>> import re
>>> aString = 'this is a string where the substring "is" is repeated several times'
>>> print [(a.start(), a.end()) for a in list(re.finditer('is', aString))]
[(2, 4), (5, 7), (38, 40), (42, 44)]
but won't work for:
In [1]: aString="ababa"
In [2]: print [(a.start(), a.end()) for a in list(re.finditer('aba', aString))]
Output: [(0, 3)]

Come, let us recurse together.
def locations_of_substring(string, substring):
    """Return a list of locations of a substring."""
    substring_length = len(substring)
    def recurse(locations_found, start):
        location = string.find(substring, start)
        if location != -1:
            return recurse(locations_found + [location], location+substring_length)
        else:
            return locations_found
    return recurse([], 0)
print(locations_of_substring('this is a test for finding this and this', 'this'))
# prints [0, 27, 36]
No need for regular expressions this way.

If you're just looking for a single character, this would work:
from functools import reduce  # required on Python 3; reduce is a built-in only on Python 2
string = "dooobiedoobiedoobie"
match = 'o'
reduce(lambda count, char: count + 1 if char == match else count, string, 0)
# produces 7
Also,
string = "test test test test"
match = "test"
len(string.split(match)) - 1
# produces 4
My hunch is that neither of these (especially #2) is terribly performant.
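If only the number of occurrences is needed, the built-in str.count gives the same results without reduce or split. A short sketch with the strings from above:

```python
string = "test test test test"
print(string.count("test"))              # 4
print("dooobiedoobiedoobie".count("o"))  # 7

# str.count counts non-overlapping occurrences only.
print("ttt".count("tt"))                 # 1
```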

this is an old thread but i got interested and wanted to share my solution.
def find_all(a_string, sub):
    result = []
    k = 0
    while k < len(a_string):
        k = a_string.find(sub, k)
        if k == -1:
            return result
        else:
            result.append(k)
            k += 1 #change to k += len(sub) to not search overlapping results
    return result
It should return a list of positions where the substring was found.
Please comment if you see an error or room for improvement.

This does the trick for me using re.finditer
import re
text = 'This is sample text to test if this pythonic '\
'program can serve as an indexing platform for '\
'finding words in a paragraph. It can give '\
'values as to where the word is located with the '\
'different examples as stated'
# find all occurrences of the word 'as' in the above text
find_the_word = re.finditer('as', text)
for match in find_the_word:
    print('start {}, end {}, search string \'{}\''.
          format(match.start(), match.end(), match.group()))

This thread is a little old but this worked for me:
numberString = "onetwothreefourfivesixseveneightninefiveten"
testString = "five"
marker = 0
while marker < len(numberString):
    try:
        # search for testString rather than a hard-coded "five"
        print(numberString.index(testString, marker))
        marker = numberString.index(testString, marker) + 1
    except ValueError:
        print("String not found")
        marker = len(numberString)

You can try :
>>> string = "test test test test"
>>> for index,value in enumerate(string):
...     if string[index:index+(len("test"))] == "test":
...         print(index)
...
0
5
10
15

You can try :
import re
str1 = "This dress looks good; you have good taste in clothes."
substr = "good"
result = [_.start() for _ in re.finditer(substr, str1)]
# result = [17, 32]

When looking for a large amount of key words in a document, use flashtext
from flashtext import KeywordProcessor
words = ['test', 'exam', 'quiz']
txt = 'this is a test'
kwp = KeywordProcessor()
kwp.add_keywords_from_list(words)
result = kwp.extract_keywords(txt, span_info=True)
Flashtext runs faster than regex on large list of search words.

This function does not look at all positions inside the string, it does not waste compute resources. My try:
def findAll(string,word):
    all_positions=[]
    next_pos=-1
    while True:
        next_pos=string.find(word,next_pos+1)
        if(next_pos<0):
            break
        all_positions.append(next_pos)
    return all_positions
to use it call it like this:
result=findAll('this word is a big word man how many words are there?','word')

src = input() # we will find substring in this string
sub = input() # substring
res = []
pos = src.find(sub)
while pos != -1:
    res.append(pos)
    pos = src.find(sub, pos + 1)

The solutions provided by others all rely on the built-in find() or other library methods.
What is the basic underlying algorithm for finding all the occurrences of a substring in a string?
def find_all(string,substring):
    """
    Function: Returning all the index of substring in a string
    Arguments: String and the search string
    Return:Returning a list
    """
    length = len(substring)
    c=0
    indexes = []
    while c < len(string):
        if string[c:c+length] == substring:
            indexes.append(c)
        c=c+1
    return indexes
You can also subclass str and use this function in the new class, as below.
class newstr(str):
    def find_all(string,substring):
        """
        Function: Returning all the index of substring in a string
        Arguments: String and the search string
        Return:Returning a list
        """
        length = len(substring)
        c=0
        indexes = []
        while c < len(string):
            if string[c:c+length] == substring:
                indexes.append(c)
            c=c+1
        return indexes
Calling the method
newstr.find_all('Do you find this answer helpful? then upvote this!','this')

This is solution of a similar question from hackerrank. I hope this could help you.
import re
a = input()
b = input()
if b not in a:
    print((-1,-1))
else:
    #create two list as
    start_indc = [m.start() for m in re.finditer('(?=' + b + ')', a)]
    for i in range(len(start_indc)):
        print((start_indc[i], start_indc[i]+len(b)-1))
Output:
aaadaa
aa
(0, 1)
(1, 2)
(4, 5)

Here's a solution that I came up with, using assignment expression (new feature since Python 3.8):
string = "test test test test"
phrase = "test"
start = -1
result = [(start := string.find(phrase, start + 1)) for _ in range(string.count(phrase))]
Output:
[0, 5, 10, 15]

I think the cleanest solution uses neither libraries nor yield:
def find_all_occurrences(string, sub):
    index_of_occurrences = []
    current_index = 0
    while True:
        current_index = string.find(sub, current_index)
        if current_index == -1:
            return index_of_occurrences
        else:
            index_of_occurrences.append(current_index)
            current_index += len(sub)
find_all_occurrences(string, substr)
Note: the find() method returns -1 when it can't find anything.

The pythonic way would be:
mystring = 'Hello World, this should work!'
find_all = lambda c,s: [x for x in range(c.find(s), len(c)) if c[x] == s]
# s represents the search string
# c represents the character string
find_all(mystring,'o') # will return all positions of 'o'
[4, 7, 20, 26]
>>>

if you only want to use numpy here is a solution
import numpy as np
S= "test test test test"
S2 = 'test'
inds = np.cumsum([len(k)+len(S2) for k in S.split(S2)[:-1]])- len(S2)
print(inds)

if you want to use without re(regex) then:
find_all = lambda _str,_w : [ i for i in range(len(_str)) if _str.startswith(_w,i) ]
string = "test test test test"
print( find_all(string, 'test') ) # >>> [0, 5, 10, 15]

please look at below code
#!/usr/bin/env python
# coding:utf-8
'''黄哥Python'''
def get_substring_indices(text, s):
    result = [i for i in range(len(text)) if text.startswith(s, i)]
    return result
if __name__ == '__main__':
    text = "How much wood would a wood chuck chuck if a wood chuck could chuck wood?"
    s = 'wood'
    print(get_substring_indices(text, s))

def find_index(string, let):
    enumerated = [place for place, letter in enumerate(string) if letter == let]
    return enumerated
for example :
find_index("hey doode find d", "d")
returns:
[4, 7, 13, 15]

Not exactly what OP asked but you could also use the split function to get a list of where all the substrings don't occur. OP didn't specify the end goal of the code but if your goal is to remove the substrings anyways then this could be a simple one-liner. There are probably more efficient ways to do this with larger strings; regular expressions would be preferable in that case
# Extract all non-substrings
s = "an-example-string"
s_no_dash = s.split('-')
# >>> s_no_dash
# ['an', 'example', 'string']
# Or extract and join them into a sentence
s_no_dash2 = ' '.join(s.split('-'))
# >>> s_no_dash2
# 'an example string'
Did a brief skim of other answers so apologies if this is already up there.

def count_substring(string, sub_string):
    c=0
    # stop at the last position where sub_string can still fit
    for i in range(len(string) - len(sub_string) + 1):
        if string[i:i+len(sub_string)] == sub_string:
            c+=1
    return c
if __name__ == '__main__':
    string = input().strip()
    sub_string = input().strip()
    count = count_substring(string, sub_string)
    print(count)

I ran into the same problem and did this:
hw = 'Hello oh World!'
list_hw = list(hw)
o_in_hw = []
while True:
    o = hw.find('o')
    if o != -1:
        o_in_hw.append(o)
        list_hw[o] = ' '
        hw = ''.join(list_hw)
    else:
        print(o_in_hw)
        break
I'm pretty new at coding, so you can probably simplify it (and if you plan to use it repeatedly, of course make it a function).
All in all, it works as intended for what I was doing.
Edit: Please note that this is for single characters only, and it changes your variable, so you have to copy the string into a new variable first to preserve it. I didn't put that in the code because it's easy and the code is only meant to show how I made it work.
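A possible function version, sketched with the made-up name char_positions. It passes a start offset to str.find instead of overwriting characters, so the original string is left untouched.

```python
def char_positions(text, ch):
    """Return every index at which the single character ch occurs in text."""
    positions = []
    pos = text.find(ch)
    while pos != -1:
        positions.append(pos)
        pos = text.find(ch, pos + 1)  # resume the search just past the last hit
    return positions


hw = 'Hello oh World!'
print(char_positions(hw, 'o'))  # [4, 6, 10]
print(hw)                       # Hello oh World! (unchanged)
```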

By slicing we find all the combinations possible and append them in a list and find the number of times it occurs using count function
s=input()
n=len(s)
l=[]
f=input()
print(s[0])
for i in range(0,n):
    for j in range(1,n+1):
        l.append(s[i:j])
if f in l:
    print(l.count(f))

To find all the occurrences of each character in a given string and return them as a dictionary
eg: hello
result :
{'h':1, 'e':1, 'l':2, 'o':1}
def count(string):
    result = {}
    if(string):
        for i in string:
            result[i] = string.count(i)
        return result
    return {}
or else you do like this
from collections import Counter
def count(string):
    return Counter(string)
