Python: how to count overlapping occurrences of a substring [duplicate]

This question already has answers here:
How can I find the number of overlapping sequences in a String with Python? [duplicate]
(4 answers)
Closed 9 years ago.
I wanted to count the number of times that a string like 'aa' appears in 'aaa' (or 'aaaa').
The most obvious code gives the wrong (or at least, not the intuitive) answer:
'aaa'.count('aa')
1 # should be 2
'aaaa'.count('aa')
2 # should be 3
Does anyone have a simple way to fix this?

From str.count() documentation:
Return the number of non-overlapping occurrences of substring sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.
So, no. You are getting the expected result.
If you want to count the number of overlapping matches, use a regex:
>>> import re
>>>
>>> len(re.findall(r'(a)(?=\1)', 'aaa'))
2
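This finds every a that is followed by another a. The second a is not consumed, because the look-ahead is a zero-width assertion, so it is free to start the next match. Note that this particular pattern only handles a needle made of one repeated character, such as 'aa'.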
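To apply the same look-ahead idea to an arbitrary needle, a minimal sketch could look like the following. The function name count_overlapping is made up for illustration, and re.escape is used on the assumption that the needle should be matched literally:

```python
import re


def count_overlapping(needle, haystack):
    """Count occurrences of needle in haystack, allowing overlaps."""
    if not needle:
        raise ValueError("needle must be non-empty")
    # The look-ahead matches at every position where needle starts,
    # without consuming characters, so overlapping hits are all counted.
    pattern = "(?=" + re.escape(needle) + ")"
    return sum(1 for _ in re.finditer(pattern, haystack))


print(count_overlapping("aa", "aaa"))   # 2
print(count_overlapping("aa", "aaaa"))  # 3
```

Escaping matters when the needle contains regex metacharacters such as . or ?, which would otherwise be interpreted as part of the pattern.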

haystack = "aaaa"
needle = "aa"
matches = sum(haystack[i:i+len(needle)] == needle
              for i in xrange(len(haystack)-len(needle)+1))
# for Python 3 use range instead of xrange

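The str.count() approach does not take overlap into consideration.
Try this:
big_string = "aaaa"
substring = "aaa"
count = 0
for start in range(len(big_string)):
    # Compare the slice starting at each index with the substring;
    # True counts as 1 and False as 0.
    count += big_string[start:start + len(substring)] == substring
print(count)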

You have to be careful, because str.count() only counts non-overlapping substrings. To count overlapping ones I'd do:
len([s.start() for s in re.finditer('(?=aa)', 'aaa')])
And if you don't care about the position where the substring starts you can do:
sum(1 for _ in re.finditer('(?=aa)', 'aaa'))
Although someone smarter than myself might be able to show that there are performance differences :)
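For comparison, here is a regex-free sketch that walks the string with str.find, restarting one character after each hit so overlaps are not skipped. The name count_overlapping_find is hypothetical, and no claim is made here about which variant is faster; that would need to be measured:

```python
def count_overlapping_find(needle, haystack):
    """Count overlapping occurrences of needle using str.find."""
    if not needle:
        raise ValueError("needle must be non-empty")
    count = 0
    start = haystack.find(needle)
    while start != -1:
        count += 1
        # Resume the search one character after the previous hit,
        # rather than after its end, so overlapping matches are found.
        start = haystack.find(needle, start + 1)
    return count


print(count_overlapping_find("aa", "aaa"))   # 2
print(count_overlapping_find("aa", "aaaa"))  # 3
```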

Related

Check if any character in one string appears in another [duplicate]

This question already has answers here:
Find common characters between two strings
(5 answers)
Closed 2 months ago.
I have a string of text
hfHrpphHBppfTvmzgMmbLbgf
I have separated this string into two halves
hfHrpphHBppf,TvmzgMmbLbgf
I'd like to check if any of the characters in the first string, also appear in the second string, and would like to class lowercase and uppercase characters as separate (so if string 1 had a and string 2 had A this would not be a match).
and the above would return:
f

split_text = ['hfHrpphHBppf', 'TvmzgMmbLbgf']
for char in split_text[0]:
    if char in split_text[1]:
        print(char)
There is probably a better way to do it, but this is a quick and simple way to do what you want.
Edit: to print each common character only once:
split_text = ['hfHrpphHBppf', 'TvmzgMmbLbgf']
found_chars = []
for char in split_text[0]:
    if char in split_text[1] and char not in found_chars:
        found_chars.append(char)
        print(char)
There is almost certainly a better way of doing this, but it builds on the answer I already gave.

You could use the "in" operator, something like this:
for i in range(len(word1)):
    if word1[i] in word2:
        print(word1[i])
Not optimal, but it should print all the letters the two words have in common.

You can achieve this using set() and intersection
text = "hfHrpphHBppf,TvmzgMmbLbgf"
text = text.split(",")
print(set(text[0]).intersection(set(text[1])))

You can use a list comprehension to check whether letters from string a appear in string b.
a = 'hfHrpphHBppf'
b = 'TvmzgMmbLbgf'
c = [x for x in a if x in b]
print(' '.join(set(c)))
The output will be:
f
You can also use a for loop, like this:
a = 'hfHrpphHBppf'
b = 'TvmzgMmbLbgf'
c = []
for i in a:
    if i in b:
        c.append(i)
print(set(c))
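One thing the set-based versions give up is the order in which the characters appear. If that matters, a sketch along these lines keeps first-seen order while still removing repeats. The function name common_chars is made up for this example, and it relies on dicts preserving insertion order (Python 3.7+):

```python
def common_chars(first, second):
    """Return characters of first that also occur in second, once each, in order."""
    second_chars = set(second)  # set membership tests are fast
    # dict.fromkeys keeps the first occurrence of each key and drops repeats.
    return list(dict.fromkeys(ch for ch in first if ch in second_chars))


print(common_chars('hfHrpphHBppf', 'TvmzgMmbLbgf'))  # ['f']
```

The comparison stays case-sensitive, so 'B' in the first half does not match 'b' in the second.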

Printing a String in Reverse After Extracting [duplicate]

This question already has answers here:
How do I reverse a string in Python?
(19 answers)
Closed 2 years ago.
I am trying to create a program in which the user inputs a statement containing two '!' surrounding a string. (example: hello all! this is a test! bye.) I am to grab the string within the two exclamation points, and print it in reverse letter by letter. I have been able to find the start and endpoints that contain the statement, however I am having difficulties creating an index that would cycle through my variable userstring in reverse and print.
test = input('Enter a string with two "!" surrounding portion of the string:')
expoint = test.find('!')
#print (expoint)
twoexpoint = test.find('!', expoint+1)
#print (twoexpoint)
userstring = test[expoint+1 : twoexpoint]
#print(userstring)
number = 0
while number < len(userstring) :
    letter = [twoexpoint - 1]
    print (letter)
    number += 1

twoexpoint - 1 is the last index of the string you need relative to the input string. So what you need is to start from that index and reduce. In your while loop:
letter = test[twoexpoint- number - 1]
Each iteration you increase number which will reduce the index and reverse the string.
But this way you don't actually use the userstring you already found (except for the length...). Instead of caring for indexes, just reverse the userstring:
for letter in userstring[::-1]:
    print(letter)

Explanation: we use a regex to find the pattern, then we loop over every occurrence and print it reversed. You can reverse a string in Python with mystring[::-1] (this works for lists too).
The Python re documentation is very useful, and you will need it all the time down the coder road :). Happy coding!
import re  # I recommend using regex
def reverse_string(a):
    # Non-greedy match of everything between two exclamation points
    matches = re.findall(r'\!(.*?)\!', a)
    for match in matches:
        print("Match found", match)
        print("Match reversed", match[::-1])
        for i in match[::-1]:
            print(i)
In [3]: reverse_string('test test !test! !123asd!')
Match found test
Match reversed tset
t
s
e
t
Match found 123asd
Match reversed dsa321
d
s
a
3
2
1

You're overcomplicating it. Don't bother with an index, simply use reversed() on userstring to cycle through the characters themselves:
userstring = test[expoint+1:twoexpoint]
for letter in reversed(userstring):
    print(letter)
Or use a reversed slice:
userstring = test[twoexpoint-1:expoint:-1]
for letter in userstring:
    print(letter)
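If you would rather not track the indexes at all, here is a small sketch using str.partition to pull out the text between the first two markers. The helper name reversed_between and the choice to raise ValueError when a marker is missing are assumptions made for this example:

```python
def reversed_between(text, marker="!"):
    """Return the text between the first two markers, reversed."""
    _, found_first, rest = text.partition(marker)
    inner, found_second, _ = rest.partition(marker)
    # partition returns an empty separator when the marker is absent.
    if not (found_first and found_second):
        raise ValueError("expected two occurrences of the marker")
    return inner[::-1]


for letter in reversed_between("hello all! this is a test! bye."):
    print(letter)
```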

Replace sequence of chars in string with its length [duplicate]

This question already has answers here:
Python replace string pattern with output of function
(4 answers)
Closed 5 years ago.
Say I have the following string:
mystr = "6374696f6e20????28??????2c??2c????29"
And I want to replace every run of "??" pairs with half its length (length / 2). So for the example above, I want to get the following result:
mystr = "6374696f6e2022832c12c229"
Meaning:
???? replaced with 2
?????? replaced with 3
?? replaced with 1
???? replaced with 2
I tried the following but I'm not sure it's the good approach, and anyway -- it doesn't work:
regex = re.compile('(\?+)')
matches = regex.findall(mystr)
if matches:
    for match in matches:
        match_length = len(match)/2
        if (match_length > 0):
            mystr = regex.sub(match_length, mystr)

You can use a callback function in Python's re.sub. FYI, lambda expressions are shorthand for creating anonymous functions.
import re
mystr = "6374696f6e20????28??????2c??2c????29"
regex = re.compile(r"\?+")
print(re.sub(regex, lambda m: str(int(len(m.group())/2)), mystr))
There seems to be uncertainty about what should happen in the case of ???. The above code will result in 1, since it converts to int. Without the int conversion, Python 3 would produce a float: 1.5 for ???, and 1.0 even for a plain ??. If you want ??? to become 1? instead, you can use the pattern (?:\?{2})+.
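To make the difference between the two patterns concrete, here is a short sketch that runs both on an odd-length run. The function names replace_runs and replace_pairs are invented for the comparison, and integer division (//) is used in place of the int() conversion:

```python
import re

mystr = "6374696f6e20????28??????2c??2c????29"


def replace_runs(text):
    # Any run of question marks is replaced by half its length, rounded down.
    return re.sub(r"\?+", lambda m: str(len(m.group()) // 2), text)


def replace_pairs(text):
    # Only whole "??" pairs are consumed; a leftover single "?" is kept.
    return re.sub(r"(?:\?{2})+", lambda m: str(len(m.group()) // 2), text)


print(replace_runs(mystr))     # 6374696f6e2022832c12c229
print(replace_runs("a???b"))   # a1b
print(replace_pairs("a???b"))  # a1?b
```

Note that with the first pattern a lone ? becomes 0, while the second pattern leaves it untouched.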

Python: re module to replace digits of telephone with asterisk [duplicate]

This question already has answers here:
re.sub replace with matched content
(4 answers)
Closed 8 years ago.
I want to replace the digits in the middle of a telephone number using a regex, but failed. Here is my code:
temp = re.sub(r'1([0-9]{1}[0-9])[0-9]{4}([0-9]{4})', r'$1****$2', tel_phone)
print(temp)
In the output, it always shows:
$1****$2
But I want it to show like this: 131****1234. How can I accomplish this? Thanks

I think you're trying to replace the four digits in the middle (the four digits just before the last four digits) with ****
>>> import re
>>> s = "13111111234"
>>> temp = re.sub(r'^(1[0-9]{2})[0-9]{4}([0-9]{4})$', r'\1****\2', s)
>>> print(temp)
131****1234
You might have seen $1 in replacement strings in other languages. In Python, however, use \1 instead of $1. For correctness, you also need to include the leading 1 in the first capturing group, so that the output includes it too; otherwise, the leading 1 will be lost.
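Python's re module also accepts the \g<1> form for group references, which avoids ambiguity if a digit ever follows the reference directly. A small sketch, with mask_phone as a hypothetical wrapper name:

```python
import re


def mask_phone(number):
    """Replace digits 4-7 of an 11-digit number starting with 1 by asterisks."""
    # \g<1> and \g<2> refer to the first and second capturing groups.
    return re.sub(r'^(1[0-9]{2})[0-9]{4}([0-9]{4})$', r'\g<1>****\g<2>', number)


print(mask_phone("13111111234"))  # 131****1234
```

A string that does not match the pattern is returned unchanged, because re.sub only rewrites what the pattern matches.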

How to select only certain Substrings

From a string, say dna = 'ATAGGGATAGGGAGAGAGCGATCGAGCTAG',
I get substrings, say dna.format = 'ATAGGGATAG','GGGAGAGAG'.
I only want to print the substrings whose length is divisible by 3.
How do I do that? I'm using modulo but it's not working!
import re
if mydna = 'ATAGGGATAGGGAGAGAGCAGATCGAGCTAG'
print re.findall("ATA"(.*?)"AGA" , mydna)
if len(mydna)%3 == 0
print mydna
Corrected code:
import re
mydna = 'ATAGGGATAGGGAGAGAGCAGATCGAGCTAG'
re.findall("ATA"(.*?)"AGA" , mydna.format)
if len(mydna.format)%3 == 0:
print mydna.format
This still doesn't give me the substrings whose length is divisible by three. Any idea what's wrong?
I'm expecting only substrings whose length is divisible by three to be printed.

To include overlapping substrings, I have the following lengthy version. The idea is to find every starting and ending mark and calculate the distance between them. Each slice runs from the start of an ATA up to, but not including, a later AGA.
import re
mydna = 'ATAGGGATAGGGAGAGAGCAGATCGAGCTAG'
[mydna[start.start():end.start()] for start in re.finditer('(?=ATA)',mydna) for end in re.finditer('(?=AGA)',mydna) if end.start()>start.start() and (end.start()-start.start())%3 == 0]
['ATAGGGATAGGG', 'ATAGGG']
Show all substrings, including overlapping ones:
[mydna[start.start():end.start()] for start in re.finditer('(?=ATA)',mydna) for end in re.finditer('(?=AGA)',mydna) if end.start()>start.start()]
['ATAGGGATAGGG', 'ATAGGGATAGGGAG', 'ATAGGGATAGGGAGAGAGC', 'ATAGGG', 'ATAGGGAG', 'ATAGGGAGAGAGC']

You can also use a regular expression for that:
re.findall('ATA((...)*?)AGA', mydna)
The inner parentheses match three letters at a time, so the text between ATA and AGA always has a length divisible by 3.
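One detail worth knowing: with two capturing groups, re.findall returns a tuple per match rather than a plain string. A sketch of a variant that makes the inner group non-capturing with (?:...), so that only the text between ATA and AGA comes back:

```python
import re

mydna = 'ATAGGGATAGGGAGAGAGCAGATCGAGCTAG'

# Two capturing groups: each result is a tuple (whole middle, last triplet).
with_tuples = re.findall(r'ATA((...)*?)AGA', mydna)
print(with_tuples)  # [('GGGATAGGG', 'GGG')]

# Non-capturing inner group: each result is just the middle string.
middles = re.findall(r'ATA((?:...)*?)AGA', mydna)
print(middles)  # ['GGGATAGGG']
```

Since re.findall reports non-overlapping matches only, the overlapping candidates from the list-comprehension answer are not included here.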

Using modulo is the correct procedure. If it's not working, you're doing it wrong. Please provide an example of your code in order to debug it.

re.findall() returns a list of matching strings. You need to iterate over them and apply the modulo test to the length of each one to get what you want.
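A minimal sketch of that suggestion, assuming the goal is the text between each ATA and the nearest following AGA (the variable names are made up for illustration):

```python
import re

mydna = 'ATAGGGATAGGGAGAGAGCAGATCGAGCTAG'

# Text between each ATA and the nearest following AGA (non-overlapping matches).
candidates = re.findall(r'ATA(.*?)AGA', mydna)

# Keep only the candidates whose length is a multiple of three.
divisible_by_three = [s for s in candidates if len(s) % 3 == 0]
print(divisible_by_three)  # ['GGGATAGGG']
```

The key difference from the code in the question is that the modulo test is applied to each match returned by re.findall, not to the whole input string.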
