Python regex to match string with specified count of given phrase [duplicate] - python

This question already has answers here:
Count the number of occurrences of a character in a string
(26 answers)
Closed 25 days ago.
Let's say I have a string:
my_sentence = "I like not only apples, but also bananas."
and I would like to use a Python regex to match the string if, for example, a given letter occurs at least 3 times in it (no matter where). What would the pattern look like in this case?

You could use
import re
matches = re.findall(r'a', my_sentence) # ['a', 'a', 'a', 'a', 'a']
And then check the number of occurrences:
len(matches) >= 3 # True ("at least 3" includes exactly 3)

Try this pattern:
^.*(a.*){3}$
You can easily change it to any number of a's or any other character.
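As a rough sketch of how that idea could be generalized to any phrase and any count: the helper name has_at_least and its parameters are made up for illustration, and the pattern is a lazy variant of the one above rather than the exact same expression.

```python
import re

def has_at_least(text, phrase, n):
    """Return True if `phrase` occurs at least `n` times anywhere in `text`."""
    # (?:.*?PHRASE){n} walks forward to the next occurrence, n times in a row.
    # re.escape keeps characters such as '.' or '*' in the phrase literal.
    pattern = r'(?:.*?%s){%d}' % (re.escape(phrase), n)
    # DOTALL lets '.' cross newlines, so multi-line text is handled too.
    return re.search(pattern, text, flags=re.DOTALL) is not None

my_sentence = "I like not only apples, but also bananas."
print(has_at_least(my_sentence, "a", 3))   # True
print(has_at_least("apple", "a", 2))       # False
```

Because re.search is used, no ^ or $ anchors are needed; the pattern only has to find n occurrences somewhere in the string.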

Related

python 3 regex: match a character exactly once [duplicate]

This question already has answers here:
Match exact string
(3 answers)
Closed 4 years ago.
A sentence needs to contain 1 or more instances of 'a', exactly 1 instance of 'b', and 0 or more instances of 'c'.
My expression is a+bc*
It works for strings like 'abc', 'ab', and 'aabcc', which is fine, but it also matches when there are multiple b's, as in 'abbc', which it shouldn't. How do I get it to match only when there is exactly 1 'b'?
Here is my full code
import re
qq = re.compile('a+bc*')
if qq.match('abb') is not None:
    print("True")
else:
    print('False')
which should produce False
Use qq = re.compile(r'^a+bc*$'). The ^ anchors the match at the start of the string and $ anchors it at the end.
You want the pattern to match the full string, not just part of it, which is why you need ^ and $ here.
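An alternative worth knowing, sketched here under the assumption that Python 3.4 or later is available: re.fullmatch requires the whole string to match, so the anchors can be left out. The helper name is_valid is hypothetical.

```python
import re

# The pattern is the one from the question, without anchors.
pattern = re.compile(r'a+bc*')

def is_valid(s):
    """Return True only if the entire string matches a+bc*."""
    return pattern.fullmatch(s) is not None

for candidate in ['abc', 'ab', 'aabcc', 'abb', 'abbc']:
    print(candidate, is_valid(candidate))
```

One small difference from the anchored version: $ also matches just before a trailing newline, so 'ab\n' passes with ^a+bc*$ but is rejected by fullmatch.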

How can I remove the duplicated consecutive chars and reserve the first one using regex? [duplicate]

This question already has answers here:
Remove duplicate chars using regex?
(3 answers)
Closed 6 years ago.
I found a code snippet on the web that uses a regex in Python to remove duplicated consecutive characters and keep the first one:
import re
re.sub(r'(?s)(.)(?=.*\1)','','aabbcc') #'abc'
But it has a defect: if the string is 'aabbccaabb', it drops the first 'aa' and 'bb' entirely and returns 'cab'.
re.sub(r'(?s)(.)(?=.*\1)','','aabbccaabb') #'cab'
Is there a way to solve it by regex?
Without regex, check if previous character is the same as current, using a list comprehension with a condition and join the results:
s='aabbccaabb'
print("".join([c for i,c in enumerate(s) if i==0 or s[i-1]!=c]))
Just remove the .* in the positive look ahead.
import re
print(re.sub(r'(?s)(.)(?=\1)','','aabbcc'))
print(re.sub(r'(?s)(.)(?=\1)','','aabbccaabb'))
Output:
abc
abcab
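Another way to express the same idea, as a sketch rather than the answerer's method: match a whole run of repeated characters with a backreference and replace it with the captured character. The function name squeeze_repeats is made up for illustration.

```python
import re

def squeeze_repeats(text):
    """Collapse each run of identical consecutive characters to one character."""
    # (.)\1+ matches a character followed by one or more copies of itself;
    # the whole run is replaced by the single captured character.
    return re.sub(r'(.)\1+', r'\1', text, flags=re.DOTALL)

print(squeeze_repeats('aabbcc'))      # abc
print(squeeze_repeats('aabbccaabb'))  # abcab
```

This version performs one substitution per run instead of one per removed character, which may matter for strings with long runs.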

Count how many time a string appears in a longer string [duplicate]

This question already has answers here:
String count with overlapping occurrences [closed]
(25 answers)
Closed 7 years ago.
So I have a little problem:
I want to count how many times the string "aa" occurs in my longer string "aaatattgg", which looks like a DNA sequence.
Here, for example, I expect 2 (overlap is allowed).
There is the .count method, but it does not count overlapping matches.
PS: excuse my English, I'm French.
Use the re module. Put your regex inside a capturing group within a positive lookahead to get overlapping matches.
>>> import re
>>> s = "aaatattgg"
>>> re.findall(r'(?=(aa))', s)
['aa', 'aa']
>>> len(re.findall(r'(?=(aa))', s))
2
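The lookahead trick above can be wrapped in a small helper, sketched here with a hypothetical name (count_overlapping). It assumes the needle should be treated as literal text, so it is passed through re.escape.

```python
import re

def count_overlapping(haystack, needle):
    """Count occurrences of `needle` in `haystack`, allowing overlaps."""
    if not needle:
        raise ValueError("needle must be a non-empty string")
    # A lookahead is zero-width, so the scan advances one character at a
    # time and every starting position is tested.
    pattern = '(?=%s)' % re.escape(needle)
    return sum(1 for _ in re.finditer(pattern, haystack))

print(count_overlapping("aaatattgg", "aa"))  # 2
```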

Python: re module to replace digits of telephone with asterisk [duplicate]

This question already has answers here:
re.sub replace with matched content
(4 answers)
Closed 8 years ago.
I want to replace the digits in the middle of a telephone number using a regex, but it failed. Here is my code:
temp = re.sub(r'1([0-9]{1}[0-9])[0-9]{4}([0-9]{4})', repl=r'$1****$2', string=tel_phone)
print(temp)
The output always shows:
$1****$2
But I want it to show 131****1234. How can I accomplish this? Thanks
I think you're trying to replace the four digits in the middle (the four digits before the last four) with ****
>>> import re
>>> s = "13111111234"
>>> temp = re.sub(r'^(1[0-9]{2})[0-9]{4}([0-9]{4})$', r'\1****\2', s)
>>> print(temp)
131****1234
You might have seen $1 in replacement strings in other languages. In Python, however, use \1 instead of $1. For correctness, you also need to include the leading 1 in the first capturing group so that the output includes it; otherwise, the leading 1 will be lost.
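A minimal sketch of the same substitution as a reusable function. The name mask_phone is hypothetical, and the 11-digit format starting with 1 is taken from the example above. It uses the \g<1> form of the backreference, which stays unambiguous even when the replacement text continues with a digit.

```python
import re

def mask_phone(number):
    """Mask the 4th to 7th digits of an 11-digit number that starts with 1."""
    # \g<1> and \g<2> are the explicit spellings of \1 and \2.
    return re.sub(r'^(1[0-9]{2})[0-9]{4}([0-9]{4})$', r'\g<1>****\g<2>', number)

print(mask_phone("13111111234"))  # 131****1234
```

Input that does not match the pattern is returned unchanged, since re.sub only rewrites matches.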

Python: how to count overlapping occurrences of a substring [duplicate]

This question already has answers here:
How can I find the number of overlapping sequences in a String with Python? [duplicate]
(4 answers)
Closed 9 years ago.
I wanted to count the number of times that a string like 'aa' appears in 'aaa' (or 'aaaa').
The most obvious code gives the wrong (or at least, not the intuitive) answer:
'aaa'.count('aa')
1 # should be 2
'aaaa'.count('aa')
2 # should be 3
Does anyone have a simple way to fix this?
From str.count() documentation:
Return the number of non-overlapping occurrences of substring sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.
So, no. You are getting the expected result.
If you want to count the number of overlapping matches, use a regex:
>>> import re
>>>
>>> len(re.findall(r'(a)(?=\1)', 'aaa'))
2
This finds every a that is followed by another a. The second a is not consumed, because the lookahead is a zero-width assertion.
haystack = "aaaa"
needle = "aa"
matches = sum(haystack[i:i+len(needle)] == needle
              for i in range(len(haystack)-len(needle)+1))
# on Python 2, use xrange instead of range
The solution is not taking overlap into consideration.
Try this:
big_string = "aaaa"
substring = "aaa"
count = 0
for char in range(len(big_string)):
    count += big_string[char: char + len(substring)] == substring
print(count)
You have to be careful, because you seem to be looking for overlapping substrings, which str.count does not count. To fix this I'd do:
len([s.start() for s in re.finditer('(?=aa)', 'aaa')])
And if you don't care about the position where the substring starts, you can do:
sum(1 for _ in re.finditer('(?=aa)', 'aaa'))
Although someone smarter than me might be able to show that there are performance differences :)
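For comparison, here is a regex-free sketch that uses str.find and restarts the search one character after each hit. The function name count_overlapping_find is made up for illustration, and no claim is made here about which approach is faster.

```python
def count_overlapping_find(haystack, needle):
    """Count occurrences of `needle` in `haystack`, allowing overlaps."""
    if not needle:
        raise ValueError("needle must be a non-empty string")
    count = 0
    start = haystack.find(needle)
    while start != -1:
        count += 1
        # Resume one character past the last hit so overlaps are found.
        start = haystack.find(needle, start + 1)
    return count

print(count_overlapping_find("aaa", "aa"))   # 2
print(count_overlapping_find("aaaa", "aa"))  # 3
```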
