I want to detect if there are three of the same letter next to each other in a string.
For example:
string1 = 'this is oooonly example' # ooo
string2 = 'nooo way that he did this' # ooo
string3 = 'I kneeeeeew it!' # eee
Is there any pythonic way to do this?
I guess that a solution like this is not the best one:
for letters in ['aaa', 'bbb', 'ccc', 'ddd', ..., 'zzz']:
    if letters in string:
        print(True)
You don't have to use regex, but the solution is a little long for something as simple as that:
def repeated(string, amount):
    current = None
    count = 0
    for letter in string:
        if letter == current:
            count += 1
            if count == amount:
                return True
        else:
            count = 1
            current = letter
    return False
print(repeated("helllo", 3) == True)
print(repeated("hello", 3) == False)
You can use groupby to group similar letters and then check the length of each group:
from itertools import groupby
string = "this is ooonly an examplle nooo wway that he did this I kneeeeeew it!"
for letter, group in groupby(string):
    if len(list(group)) >= 3:
        print(letter)
Will output:
o
o
e
If you don't care for the letters themselves and just want to know if there was a repetition, take advantage of short-circuiting with the built-in any function:
print(any(len(list(group)) >= 3 for letter, group in groupby(string)))
One of the best ways to tackle these simple pattern problems is with regex
import re
test_cases = [
    'abc',
    'a bbb a', # expected match for 'bbb'
    'bb a b',
    'aaa c bbb', # expected match for 'aaa' and 'bbb'
]
for string in test_cases:
    # We use re.findall because we want every match, not just the first one.
    # To stop at the first match, use re.search instead.
    match = re.findall(r'(?P<repeated_characters>(.)\2{2})', string)
    if match:
        print([groups[0] for groups in match])
Result:
['bbb']
['aaa', 'bbb']
Use a regular expression:
import re
pattern = r"(\w)\1{2}"
string = "this is ooonly an example"
print(re.search(pattern, string) is not None)
Output:
True
How about using regex? - ([a-z])\1{2}
>>> import re
>>> re.search(r'([a-z])\1{2}', 'I kneeew it!', flags=re.I)
<re.Match object; span=(4, 7), match='eee'>
re.search returns None if it doesn't find a match; otherwise it returns a match object, and you can get the full matched text by indexing that object with [0].
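As a rough sketch of how that could be wrapped up, assuming you want the matched run back rather than just a yes/no answer (the names TRIPLE and first_triple are made up for this example, not part of the answer above):

```python
import re

# Hypothetical names; the pattern itself is the one from the answer above.
TRIPLE = re.compile(r'([a-z])\1{2}', flags=re.IGNORECASE)

def first_triple(text):
    """Return the first run of three identical letters, or None."""
    match = TRIPLE.search(text)
    # match[0] is the whole matched text; match[1] would be the single letter.
    return match[0] if match else None

print(first_triple('I kneeew it!'))
print(first_triple('I knew it!'))
```

Compiling the pattern once is only a convenience here; calling re.search with the pattern string each time should behave the same.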
string1 = 'nooo way that he did this'
for i in range(0,len(string1)-2):
    sub_st = string1[i:i+3]
    if sub_st[0]*3 == sub_st:
        print('true')
The print statement is from your example.
sub_st[0]*3 repeats the first character of sub_st three times and joins the copies into a single string. If the original sub_st equals that string, sub_st contains the same letter 3 times.
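The same window comparison can probably be generalised to any run length. This is only a sketch: the function name has_run and the parameter n are my own additions, and it assumes n is at least 1.

```python
def has_run(text, n=3):
    """Return True if some character appears n times in a row in text.

    Hypothetical generalisation of the loop above; assumes n >= 1.
    """
    # Slide a window of length n over the text and compare it with its
    # first character repeated n times.
    for i in range(len(text) - n + 1):
        window = text[i:i + n]
        if window == window[0] * n:
            return True
    return False

print(has_run('nooo way that he did this'))
print(has_run('hello'))
```

Like the original loop, this treats every character the same way, so three spaces in a row also count as a run.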
If you don't need a general answer for n repetitions, you can just iterate through the string and print True if the previous and the next character are both equal to the current character, skipping the first and last positions.
text = "I kneeeeeew it!"
for i in range(1,len(text)-1):
    if text[i-1] == text[i] and text[i+1] == text[i]:
        print(True)
        break
We can define the following predicate with itertools.groupby and any:
from itertools import groupby
def has_repeated_letter(string):
    return any(len(list(group)) >= 3 for _, group in groupby(string))
and after that use it
>>> has_repeated_letter('this is oooonly example')
True
>>> has_repeated_letter('nooo way that he did this')
True
>>> has_repeated_letter('I kneeeeeew it!')
True
>>> has_repeated_letter('I kneew it!')
False
To be more specific, it's for an "if" condition.
I have a list of strings, each made of 5 spaces followed by one last character.
Is there a wildcard character that can stand in for the last character of every string?
Like:
if string == "     &":
    do something
And the condition would be true if & matched any type of character.
You can access the last character by indexing; index -1 is the last one:
lst = ['&', 'A', 'B', 'C']
s = 'some random string which ends on &'
if s[-1] in lst:
    print('hurray!')
#hurray!
Alternatively, you can also use .endswith() if there are only a few entries:
s = 'some random string which ends on &'
if s.endswith('&') or s.endswith('A'):
    print('hurray!')
#hurray!
Since you also asked how to replace the last character, this can be done like this:
s = s[:-1] + '!'
#Out[72]: 'some random string which ends on !'
As per your comment, here is a wildcard solution:
import re
s = r' &'
pattern = r' .{1}$'
if re.search(pattern, s):
    print('hurray!')
#hurray!
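If you would rather avoid regex for this, a plain comparison might be enough. A minimal sketch, assuming "5 spaces then the last character" means the whole string is exactly six characters long (the function name is hypothetical):

```python
def five_spaces_then_any(s):
    """True if s is exactly five spaces followed by one character.

    Assumption: the whole string is six characters long.
    """
    return len(s) == 6 and s[:5] == ' ' * 5

print(five_spaces_then_any(' ' * 5 + '&'))
print(five_spaces_then_any(' ' * 4 + '&'))
```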
Try this:
if string[-1] == 'A' or string[-1] == '1':
    do something
You may use a regular expression along with re.search, for example:
import re
vals = ["validA", "valid1", "invalid"]
for val in vals:
    if re.search(r'[A1]$', val):
        print(val + ": MATCH")
This prints:
validA: MATCH
valid1: MATCH
Perhaps you're looking for the .endswith() function? For example:
if "waffles".endswith("s"):
    ...
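Worth knowing here: str.endswith also accepts a tuple of suffixes, so several allowed endings can be checked in one call. A small sketch (the helper name ends_with_any and the default suffixes are just for illustration):

```python
def ends_with_any(text, suffixes=('A', '1')):
    # str.endswith accepts a tuple of suffixes and returns True
    # if the string ends with any one of them.
    return text.endswith(tuple(suffixes))

for val in ['validA', 'valid1', 'invalid']:
    print(val, ends_with_any(val))
```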
This is my code:
def cap_space(txt):
    e = txt
    upper = "WLMFSC"
    letters = [each for each in e if each in upper]
    a = ''.join(letters)
    b = a.lower()
    c = txt.replace(a,' '+b)
    return c
I built it to find the uppercase letters in a given string and replace each one with a space followed by the lowercase version of that letter.
Example input:
print(cap_space('helloWorld!'))
print(cap_space('iLoveMyFriend'))
print(cap_space('iLikeSwimming'))
print(cap_space('takeCare'))
What the output should look like:
hello world!
i love my friend
i like swimming
take care
What I get as output instead:
hello world!
iLoveMyFriend
iLikeSwimming
take care
The problem is that the replacement is only applied when there is exactly one uppercase letter in the given string. How can I improve it so that it is applied to every uppercase letter in the string?
Being a regex addict, I can offer the following solution which relies on re.findall with an appropriate regex pattern:
import re
def cap_space(txt):
    parts = re.findall(r'^[a-z]+|[A-Z][a-z]*[^\w\s]?', txt)
    output = ' '.join(parts).lower()
    return output
inp = ['helloWorld!', 'iLoveMyFriend', 'iLikeSwimming', 'akeCare']
output = [cap_space(x) for x in inp]
print(inp)
print(output)
This prints:
['helloWorld!', 'iLoveMyFriend', 'iLikeSwimming', 'akeCare']
['hello world!', 'i love my friend', 'i like swimming', 'ake care']
Here is an explanation of the regex pattern used:
^[a-z]+ match an all lowercase word from the very start of the string
| OR
[A-Z] match a leading uppercase letter
[a-z]* followed by zero or more lowercase letters
[^\w\s]? followed by an optional "symbol" (defined here as any non word,
non whitespace character)
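To see what that pattern actually captures before the pieces are joined, here is a small sketch that only runs re.findall (the names PATTERN and split_camel are made up for the example):

```python
import re

# The pattern explained above; PATTERN and split_camel are hypothetical names.
PATTERN = r'^[a-z]+|[A-Z][a-z]*[^\w\s]?'

def split_camel(txt):
    """Return the pieces matched by the pattern, before joining."""
    return re.findall(PATTERN, txt)

print(split_camel('helloWorld!'))    # ['hello', 'World!']
print(split_camel('iLoveMyFriend'))  # ['i', 'Love', 'My', 'Friend']
```

One thing to keep in mind: because [a-z]* may match nothing, a run of capitals such as the HTML in 'parseHTML' is split letter by letter.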
You can make use of nice python3 methods str.translate and str.maketrans:
In [281]: def cap_space(txt):
     ...:     upper = "WLMFSC"
     ...:     letters = [each for each in txt if each in upper]
     ...:     d = {i: ' ' + i.lower() for i in letters}
     ...:     return txt.translate(str.maketrans(d))
     ...:
     ...:
In [283]: print(cap_space('helloWorld!'))
     ...: print(cap_space('iLoveMyFriend'))
     ...: print(cap_space('iLikeSwimming'))
     ...: print(cap_space('takeCare'))
hello world!
i love my friend
i like swimming
take care
A simple and crude way. It might not be efficient, but it is easier to understand:
def cap_space(sentence):
    characters = []
    for character in sentence:
        if character.isupper():
            # uppercase letter: emit a space plus its lowercase form
            characters.append(f' {character.lower()}')
        else:
            # lowercase letters, digits and punctuation are kept as they are
            characters.append(character)
    return ''.join(characters)
a is all the matching uppercase letters combined into a single string. When you try to replace them with txt.replace(a, ' '+b), it will only match if all the matching uppercase letters are consecutive in txt, or if there's just a single match. str.replace() matches and replaces the whole search string, not the individual characters in it.
Combining all the matches into a single string won't work. Just loop through txt, checking each character to see if it matches.
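To make that concrete, here is a short demonstration using one of the inputs from the question (nothing new is assumed beyond the question's own variable names):

```python
txt = 'iLoveMyFriend'
upper = 'WLMFSC'

# Same steps as in the question: collect the matching capitals into one string.
a = ''.join(ch for ch in txt if ch in upper)
print(a)                                # LMF
print(a in txt)                         # False: 'LMF' is not a contiguous substring
print(txt.replace(a, ' ' + a.lower()))  # iLoveMyFriend (unchanged)

# With a single capital the joined string does occur, so replace succeeds.
print('takeCare'.replace('C', ' c'))    # take care
```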
def cap_space(txt):
    result = ''
    upper = "WLMFSC"
    for c in txt:
        if c in upper:
            result += ' ' + c.lower()
        else:
            result += c
    return result
I have a very long string of text with () and [] in it. I'm trying to remove the characters between the parentheses and brackets but I cannot figure out how.
The string is similar to this:
x = "This is a sentence. (once a day) [twice a day]"
This string isn't the one I'm working with, but it is very similar and a lot shorter.
You can use the re.sub function.
>>> import re
>>> x = "This is a sentence. (once a day) [twice a day]"
>>> re.sub(r"([\(\[]).*?([\)\]])", r"\g<1>\g<2>", x)
'This is a sentence. () []'
If you want to remove the [] and the () as well, you can use this code:
>>> import re
>>> x = "This is a sentence. (once a day) [twice a day]"
>>> re.sub(r"[\(\[].*?[\)\]]", "", x)
'This is a sentence.  '
Important: This code will not work with nested brackets.
Explanation
The first regex captures ( or [ as group 1 (by surrounding it with parentheses) and ) or ] as group 2, and matches these groups together with all the characters in between. The matched portion is then replaced with groups 1 and 2, leaving the final string with nothing inside the brackets. The second regex follows the same idea without the groups: it matches each bracketed span, brackets included, and replaces it with the empty string.
-- modified from comment by Ajay Thomas
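To illustrate the nesting caveat, here is a small sketch using the second regex (the wrapper name strip_brackets is made up). The non-greedy .*? stops at the first closing bracket it sees, so the tail of a nested group is left behind:

```python
import re

def strip_brackets(text):
    # Same pattern as the second regex above, written as a raw string.
    return re.sub(r"[\(\[].*?[\)\]]", "", text)

print(repr(strip_brackets("keep (drop) keep")))  # 'keep  keep'
print(repr(strip_brackets("a (b (c) d) e")))     # 'a  d) e'
```

Since the character classes do not pair up bracket types, a mismatched span such as "(x]" is removed as well.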
Run this script; it works even with nested brackets.
It uses basic logical tests.
def a(test_str):
    ret = ''
    skip1c = 0
    skip2c = 0
    for i in test_str:
        if i == '[':
            skip1c += 1
        elif i == '(':
            skip2c += 1
        elif i == ']' and skip1c > 0:
            skip1c -= 1
        elif i == ')' and skip2c > 0:
            skip2c -= 1
        elif skip1c == 0 and skip2c == 0:
            ret += i
    return ret
x = "ewq[a [(b] ([c))]] This is a sentence. (once a day) [twice a day]"
x = a(x)
print(x)
print(repr(x))
In case you don't run it, here's the output:
ewq This is a sentence.
'ewq This is a sentence.  '
Here's a solution similar to #pradyunsg's answer (it works with arbitrary nested brackets):
def remove_text_inside_brackets(text, brackets="()[]"):
    count = [0] * (len(brackets) // 2) # count open/close brackets
    saved_chars = []
    for character in text:
        for i, b in enumerate(brackets):
            if character == b: # found bracket
                kind, is_close = divmod(i, 2)
                count[kind] += (-1)**is_close # `+1`: open, `-1`: close
                if count[kind] < 0: # unbalanced bracket
                    count[kind] = 0  # keep it
                else:  # found bracket to remove
                    break
        else: # character is not a [balanced] bracket
            if not any(count): # outside brackets
                saved_chars.append(character)
    return ''.join(saved_chars)
print(repr(remove_text_inside_brackets(
    "This is a sentence. (once a day) [twice a day]")))
# -> 'This is a sentence.  '
This should work for parentheses. A regular expression "consumes" the text it has matched, so it won't work for nested parentheses.
import re
mystring = "This is a sentence. (once a day) [twice a day]"  # sample input from the question
regex = re.compile(r".*?\((.*?)\)")
result = re.findall(regex, mystring)
Or this would find one set of parentheses; simply loop to find more:
start = mystring.find("(")
end = mystring.find(")")
if start != -1 and end != -1:
    result = mystring[start+1:end]
You can split, filter, and join the string again. If your brackets are well defined the following code should do.
import re
x = "".join(re.split(r"\(|\)|\[|\]", x)[::2])
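To see why taking every second piece works, it may help to look at the intermediate list. A small sketch using the string from the question (the variable names parts and kept are mine): splitting on every bracket makes the pieces alternate between text outside and text inside the brackets, as long as the brackets are balanced and not nested.

```python
import re

x = "This is a sentence. (once a day) [twice a day]"

# Splitting on every bracket character gives alternating pieces:
# outside, inside, outside, inside, outside.
parts = re.split(r"\(|\)|\[|\]", x)
print(parts)

# The even positions are the pieces outside the brackets.
kept = "".join(parts[::2])
print(repr(kept))
```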
You can try this. It removes the brackets and the content inside them.
import re
x = "This is a sentence. (once a day) [twice a day]"
x = re.sub(r"\(.*?\)|\[.*?\]", "", x)
print(x)
Expected output:
This is a sentence.
For anyone who appreciates the simplicity of the accepted answer by jvallver, and is looking for more readability from their code:
>>> import re
>>> x = 'This is a sentence. (once a day) [twice a day]'
>>> opening_braces = r'\(\['
>>> closing_braces = r'\)\]'
>>> non_greedy_wildcard = '.*?'
>>> re.sub(f'[{opening_braces}]{non_greedy_wildcard}[{closing_braces}]', '', x)
'This is a sentence.  '
Most of the explanation for why this regex works is included in the code. Your future self will thank you for the 3 additional lines.
(Replace the f-string with the equivalent string concatenation for Python2 compatibility)
The regex \(.*?\)|\[.*?\] removes bracketed content by finding pairs: at each position it tries parentheses first and then square brackets. It also works when one kind of bracket is nested inside the other, as in the example below, because the outer pair is matched as a whole. Of course, it breaks on badly formed brackets, and on same-kind nesting such as ((a) b), where the non-greedy match stops at the first closing parenthesis.
import re
_brackets = re.compile(r"\(.*?\)|\[.*?\]")
_spaces = re.compile(r"\s+")
_b = _brackets.sub(" ", "microRNAs (miR) play a role in cancer ([1], [2])")
_s = _spaces.sub(" ", _b.strip())
print(_s)
# OUTPUT: microRNAs play a role in cancer
What is the shortest way to check if a given string has the same characters?
For example if you have name = 'aaaaa' or surname = 'bbbb' or underscores = '___' or p = '++++', how do you check to know the characters are the same?
An option is to check whether the set of its characters has length 1:
>>> len(set("aaaa")) == 1
True
Or use all(), which could be faster if the strings are very long and rarely consist of a single repeated character, because it stops at the first mismatch (but then the regex is good too):
>>> s = "aaaaa"
>>> s0 = s[0]
>>> all(c == s0 for c in s[1:])
True
You can use regex for this:
import re
p = re.compile(r'^(.)\1*$')
re.search(p, "aaaa") # returns a match object
re.search(p, "bbbb") # returns a match object
re.search(p, "aaab") # returns None
Here's an explanation of what this regex pattern means: https://regexper.com/#%5E(.)%5C1*%24
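On Python 3.4 or later, re.fullmatch could be used instead of the ^ and $ anchors. A sketch, with a made-up function name; adding re.DOTALL is my own choice so that a string made only of newlines also counts:

```python
import re

def is_single_char_string(s):
    """True if s is non-empty and every character equals the first."""
    # re.fullmatch requires the whole string to match, so ^ and $ are not needed.
    # re.DOTALL lets '.' match a newline character too.
    return re.fullmatch(r'(.)\1*', s, flags=re.DOTALL) is not None

print(is_single_char_string('aaaa'))
print(is_single_char_string('aaab'))
```

One difference from the anchored version: $ also matches just before a trailing newline, so 'aaa\n' passes with ^(.)\1*$ but not with fullmatch.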
Also possible:
s = "aaaaa"
s.count(s[0]) == len(s)
compare = name == len(name) * name[0]
if compare:
    print("all characters are the same")
else:
    print("not all characters are the same")
Here are a couple of ways.
def all_match0(s):
    head, tail = s[0], s[1:]
    return tail == head * len(tail)
def all_match1(s):
    head, tail = s[0], s[1:]
    return all(c == head for c in tail)
all_match = all_match0
data = [
    'aaaaa',
    'bbbb',
    '___',
    '++++',
    'q',
    'aaaaaz',
    'bbbBb',
    '_---',
]
for s in data:
    print(s, all_match(s))
Output:
aaaaa True
bbbb True
___ True
++++ True
q True
aaaaaz False
bbbBb False
_--- False
all_match0 will be faster unless the string is very long, because its testing loop runs at C speed, but it uses more RAM because it constructs a duplicate string. For very long strings, the time taken to construct the duplicate string becomes significant, and of course it can't do any testing until it creates that duplicate string.
all_match1 should only be slightly slower, even for short strings, and because it stops testing as soon as it finds a mismatch it may even be faster than all_match0, if the mismatch occurs early enough in the string.
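If you want to check this on your own machine, a rough timeit sketch could look like the following. The input sizes and the number of repetitions are arbitrary choices of mine, and the timings will vary from system to system, so no results are shown.

```python
import timeit

def all_match0(s):
    head, tail = s[0], s[1:]
    return tail == head * len(tail)

def all_match1(s):
    head, tail = s[0], s[1:]
    return all(c == head for c in tail)

# Hypothetical test inputs: one uniform string, one with an early mismatch.
uniform = 'a' * 100_000
early_mismatch = 'ab' + 'a' * 100_000

for label, s in [('uniform', uniform), ('early mismatch', early_mismatch)]:
    for func in (all_match0, all_match1):
        # Time 20 calls of func on the current input.
        seconds = timeit.timeit(lambda: func(s), number=20)
        print(f'{label:15} {func.__name__}: {seconds:.4f}s')
```

Note that both functions index s[0], so they raise IndexError on an empty string.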
Try using Counter from the collections module (container datatypes):
>>> from collections import Counter
>>> s = 'aaaaaaaaa'
>>> c = Counter(s)
>>> len(c) == 1
True