I need to replace all occurrences of dots, but only if the dot is inside parentheses, with something else (a semicolon, for example), using Python, like this:
Input: "Hello (This . will be replaced, this one. too)."
Output:"Hello (This ; will be replaced, this one; too)."
Assuming the parentheses are balanced and not nested, here's an idea with re.split.
>>> import re
>>>
>>> s = 'Hello (This . will be replaced, this one. too). This ... not but this (.).'
>>> ''.join(m.replace('.', ';') if m.startswith('(') else m
...         for m in re.split(r'(\([^)]+\))', s))
'Hello (This ; will be replaced, this one; too). This ... not but this (;).'
The main trick here is to wrap the regex \([^)]+\) in another pair of () so that re.split keeps the matched separators in its result.
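As a hedged alternative under the same assumptions (balanced, non-nested parentheses), re.sub also accepts a function as the replacement, which avoids the split-and-join step. The helper name replace_dots_in_parens is made up for this sketch and is not part of the answer above:

```python
import re

def replace_dots_in_parens(text, old='.', new=';'):
    """Replace `old` with `new` only inside (...) groups.

    Assumes parentheses are balanced and not nested.
    """
    # The callback receives each "(...)" match and returns its replacement.
    return re.sub(r'\([^)]*\)', lambda m: m.group(0).replace(old, new), text)

s = 'Hello (This . will be replaced, this one. too). This ... not but this (.).'
result = replace_dots_in_parens(s)
print(result)
```

Because the callback only ever sees the text of one parenthesized group, dots outside the parentheses are never touched.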
Loop over characters in string, track number of opening and closing parentheses, only replace if more opening than closing parentheses encountered.
def replace_inside_parentheses(string, find_string, replace_string):
    bracket_count = 0
    return_string = ""
    for a in string:
        if a == "(":
            bracket_count += 1
        elif a == ")":
            bracket_count -= 1
        if bracket_count > 0:
            return_string += a.replace(find_string, replace_string)
        else:
            return_string += a
    return return_string

my_str = "Hello (This . will be replaced, this one. too, (even this one . inside nested parentheses!))."
print(my_str)
print(replace_inside_parentheses(my_str, ".", ";"))
Not the most elegant way, but this should work.
def sanitize(string):
    # Split once at the first "(" and once at the following ")",
    # so only the first pair of parentheses is handled.
    before, rest = string.split("(", 1)
    middle, after = rest.split(")", 1)
    # Replace "." inside the parentheses; change ";" to whatever
    # character you'd like to substitute.
    middle = middle.replace(".", ";")
    return before + "(" + middle + ")" + after

a = sanitize("Hello (This . will be replaced, this one. too).")
print(a)
To be more specific, it's for an "if" condition.
I have a list of strings, each made of 5 spaces followed by one last character.
Is there a wildcard character that can stand in for the last character of every string?
Like:
if string == "     &":
    do something
The condition should be true when & stands for any character at all.
You can access the last character by indexing; index -1 is the last one:
lst = ['&', 'A', 'B', 'C']
s = 'some random string which ends on &'
if s[-1] in lst:
    print('hurray!')
# hurray!
Alternatively, you can use .endswith() if there are only a few entries:
s = 'some random string which ends on &'
if s.endswith('&') or s.endswith('A'):
    print('hurray!')
# hurray!
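str.endswith also accepts a tuple of suffixes, so the two checks above can be folded into one call. A small sketch (the variable names suffixes and matched are just for illustration):

```python
s = 'some random string which ends on &'
suffixes = ('&', 'A')  # any of these endings counts as a match

# endswith returns True if the string ends with at least one suffix in the tuple
matched = s.endswith(suffixes)
if matched:
    print('hurray!')
```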
Since you also asked how to replace the last character, this can be done like this:
s = s[:-1] + '!'
# s is now 'some random string which ends on !'
As per your comment, here is a wildcard solution:
import re
s = r'     &'
pattern = r' {5}.$'  # five spaces, then any single character at the end
if re.search(pattern, s):
    print('hurray!')
# hurray!
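If the whole string must be exactly five spaces plus one arbitrary character, re.fullmatch may be a stricter option than re.search, since it has to match the entire string. A sketch, assuming that is the intended condition (ends_with_any_char and PATTERN are hypothetical names):

```python
import re

# Five literal spaces followed by exactly one character of any kind
# (other than a newline, which '.' does not match by default).
PATTERN = re.compile(r' {5}.')

def ends_with_any_char(s):
    return PATTERN.fullmatch(s) is not None

print(ends_with_any_char('     &'))   # five spaces + '&'
print(ends_with_any_char('     A'))   # five spaces + 'A'
print(ends_with_any_char('    &'))    # only four spaces
```

The first two calls print True and the last prints False, because fullmatch rejects strings with extra or missing characters.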
Try this:
if string[-1] == 'A' or string[-1] == '1':
    pass  # do something
You may use a regular expression along with re.search, for example:
import re

vals = ["validA", "valid1", "invalid"]
for val in vals:
    if re.search(r'[A1]$', val):
        print(val + ": MATCH")
This prints:
validA: MATCH
valid1: MATCH
Perhaps you're looking for the .endswith() method? For example:
if "waffles".endswith("s"):
    ...
I have a very long string of text with () and [] in it. I'm trying to remove the characters between the parentheses and brackets, but I cannot figure out how.
The string is similar to this:
x = "This is a sentence. (once a day) [twice a day]"
This isn't the string I'm actually working with, but it is very similar and a lot shorter.
You can use the re.sub function.
>>> import re
>>> x = "This is a sentence. (once a day) [twice a day]"
>>> re.sub(r"([\(\[]).*?([\)\]])", r"\g<1>\g<2>", x)
'This is a sentence. () []'
If you want to remove the [] and the () as well, you can use this code:
>>> import re
>>> x = "This is a sentence. (once a day) [twice a day]"
>>> re.sub(r"[\(\[].*?[\)\]]", "", x)
'This is a sentence.  '
Important: This code will not work with nested symbols.
Explanation
The first regex captures ( or [ as group 1 (by surrounding it with parentheses) and ) or ] as group 2, and matches these groups together with all the characters between them. After matching, the matched portion is replaced with groups 1 and 2, leaving the final string with nothing inside the brackets. The second regex follows the same idea without the groups: it matches the brackets and everything between them, and substitutes the empty string.
-- modified from comment by Ajay Thomas
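To make the nesting caveat concrete, here is a small sketch of what the non-greedy pattern does with nested parentheses: each match stops at the first closing symbol it meets, so the outer closer is left behind. The sample string is made up for illustration.

```python
import re

nested = "a (b (c) d) e"
# The match starts at the first "(" and ends at the first ")",
# i.e. it covers "(b (c)" and leaves " d)" in place.
leftover = re.sub(r"[\(\[].*?[\)\]]", "", nested)
print(repr(leftover))  # 'a  d) e'
```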
Run this script; it works even with nested brackets.
It uses basic logical tests.
def a(test_str):
    ret = ''
    skip1c = 0  # depth of open [ brackets
    skip2c = 0  # depth of open ( parentheses
    for i in test_str:
        if i == '[':
            skip1c += 1
        elif i == '(':
            skip2c += 1
        elif i == ']' and skip1c > 0:
            skip1c -= 1
        elif i == ')' and skip2c > 0:
            skip2c -= 1
        elif skip1c == 0 and skip2c == 0:
            ret += i
    return ret

x = "ewq[a [(b] ([c))]] This is a sentence. (once a day) [twice a day]"
x = a(x)
print(x)
print(repr(x))
Just in case you don't run it, here's the output:
ewq This is a sentence.
'ewq This is a sentence.  '
Here's a solution similar to @pradyunsg's answer (it works with arbitrary nested brackets):
def remove_text_inside_brackets(text, brackets="()[]"):
    count = [0] * (len(brackets) // 2)  # count open/close brackets
    saved_chars = []
    for character in text:
        for i, b in enumerate(brackets):
            if character == b:  # found bracket
                kind, is_close = divmod(i, 2)
                count[kind] += (-1)**is_close  # `+1`: open, `-1`: close
                if count[kind] < 0:  # unbalanced bracket
                    count[kind] = 0  # keep it
                else:  # found bracket to remove
                    break
        else:  # character is not a [balanced] bracket
            if not any(count):  # outside brackets
                saved_chars.append(character)
    return ''.join(saved_chars)

print(repr(remove_text_inside_brackets(
    "This is a sentence. (once a day) [twice a day]")))
# -> 'This is a sentence.  '
This should work for parentheses. A regular expression "consumes" the text it has matched, so it won't work for nested parentheses.
import re
regex = re.compile(r".*?\((.*?)\)")
result = re.findall(regex, mystring)
Or, to find one set of parentheses (simply loop to find more):
start = mystring.find("(")
end = mystring.find(")")
if start != -1 and end != -1:
    result = mystring[start+1:end]
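The "simply loop to find more" step could look like the sketch below, which keeps calling str.find from the position just after the previous pair. It assumes non-nested parentheses; find_all_parenthesized is a name invented for this example.

```python
def find_all_parenthesized(text):
    """Return the contents of every (...) pair, assuming no nesting."""
    results = []
    pos = 0
    while True:
        start = text.find("(", pos)
        if start == -1:        # no further "(" in the text
            break
        end = text.find(")", start + 1)
        if end == -1:          # unmatched "(": stop looking
            break
        results.append(text[start + 1:end])
        pos = end + 1          # continue after this pair
    return results

found = find_all_parenthesized("Developer (12) and tester (7)")
print(found)  # ['12', '7']
```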
You can split, filter, and join the string again. If your brackets are well defined the following code should do.
import re
x = "".join(re.split(r"\(|\)|\[|\]", x)[::2])
You can try this. It removes the brackets and the content inside them.
import re
x = "This is a sentence. (once a day) [twice a day]"
x = re.sub(r"\(.*?\)|\[.*?\]", "", x)
print(x)
Expected output:
This is a sentence.
For anyone who appreciates the simplicity of the accepted answer by jvallver, and is looking for more readability from their code:
>>> import re
>>> x = 'This is a sentence. (once a day) [twice a day]'
>>> opening_braces = r'\(\['
>>> closing_braces = r'\)\]'
>>> non_greedy_wildcard = '.*?'
>>> re.sub(f'[{opening_braces}]{non_greedy_wildcard}[{closing_braces}]', '', x)
'This is a sentence.  '
Most of the explanation for why this regex works is included in the code. Your future self will thank you for the 3 additional lines.
(Replace the f-string with the equivalent string concatenation for Python2 compatibility)
The regex \(.*?\)|\[.*?\] removes bracket content by finding pairs: at each position it first tries to match a pair of parentheses and then a pair of square brackets. It also handles one kind of bracket nested inside the other, as in the example below, because a match runs up to the first closer of the same kind. Brackets of the same kind nested inside each other, or unbalanced brackets, will break it.
import re
_brackets = re.compile(r"\(.*?\)|\[.*?\]")
_spaces = re.compile(r"\s+")
_b = _brackets.sub(" ", "microRNAs (miR) play a role in cancer ([1], [2])")
_s = _spaces.sub(" ", _b.strip())
print(_s)
# OUTPUT: microRNAs play a role in cancer
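For brackets of the same kind nested inside each other, one possible workaround (not part of the answer above) is to remove only innermost pairs and repeat until nothing changes. A sketch, assuming balanced brackets; strip_brackets and _innermost are hypothetical names:

```python
import re

# Matches a pair that contains no further brackets of its own kind,
# i.e. an innermost (...) or [...].
_innermost = re.compile(r"\([^()]*\)|\[[^\[\]]*\]")
_spaces = re.compile(r"\s+")

def strip_brackets(text):
    while True:
        # subn returns the new text and the number of replacements made
        text, n = _innermost.subn(" ", text)
        if n == 0:              # nothing left to remove
            break
    return _spaces.sub(" ", text).strip()

print(strip_brackets("microRNAs (miR) play a role (see (a) and [b (c)]) in cancer ([1], [2])"))
```

Each pass peels off one nesting level, so the loop runs roughly as many times as the deepest nesting in the input.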
I would like to detect brackets in a string, and if found, remove the brackets and all data in the brackets
e.g.
Developer (12)
would become
Developer
Edit: Note that the string will be a different length/text each time, and the brackets will not always be present.
I can detect the brackets using something like
if '(' in mystring:
    print('found it')
but how would I remove the (12)?
You can use a regex and replace it:
>>> import re
>>> re.sub(r'\(.*?\)', '', 'Developer (12)')
'Developer '
>>> a = 'DEf (asd () . as ( as ssdd (12334))'
>>> re.sub(r'\(.*?\)', '', a)
'DEf  . as )'
I believe you want something like this
import re
a = "developer (12)"
print(re.sub(r"\(.*\)", "", a))
Since it's always at the end and there are no nested brackets:
s = "Developer (12)"
s[:s.index('(')] # or s.index(' (') if you want to get rid of the previous space too
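Since the question says the brackets will not always be present, keep in mind that s.index raises ValueError when there is no "(". A sketch using str.partition, which simply returns the whole string in that case (strip_trailing_parens is a made-up name):

```python
def strip_trailing_parens(s):
    """Drop everything from the first "(" onward, then trailing whitespace."""
    head, _sep, _tail = s.partition("(")   # head == s when "(" is absent
    return head.rstrip()                   # rstrip runs in both cases

print(strip_trailing_parens("Developer (12)"))  # Developer
print(strip_trailing_parens("Developer"))       # Developer
```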
For nested brackets and multiple pairs in the string, this solution would work:
def replace_parenthesis_with_empty_str(text):
    new_str = ""
    stack = []           # one entry per currently open "("
    in_bracket = False
    for c in text:
        if c == '(':
            stack.append(c)
            in_bracket = True
            continue
        else:
            if in_bracket:
                if c == ')':
                    stack.pop()
                if not len(stack):
                    in_bracket = False
            else:
                new_str += c
    return new_str

a = "fsdf(ds fOsf(fs)sdfs f(sdfsd)sd fsdf)c sdsds (sdsd)"
print(replace_parenthesis_with_empty_str(a))
How can I remove all the lowercase letters before and after "Johnson" in these strings?
str1 = 'aBcdJohnsonzZz'
str2 = 'asdVJohnsonkkk'
Expected results are as below:
str1 = 'BJohnsonZ'
str2 = 'VJohnson'
You can partition the string, check that it contained the separator, then translate out the lowercase letters, e.g.:
from string import ascii_lowercase as alc
str1 = 'aBcdJohnsonzZz'
p1, sep, p2 = str1.partition('Johnson')
if sep:
    # Python 3: build a table that deletes the lowercase ASCII letters
    drop_lower = str.maketrans('', '', alc)
    str1 = p1.translate(drop_lower) + sep + p2.translate(drop_lower)
print(str1)
str.partition() is your friend here:
def munge(text, match):
    prefix, match, suffix = text.partition(match)
    prefix = "".join(c for c in prefix if not c.islower())
    suffix = "".join(c for c in suffix if not c.islower())
    return prefix + match + suffix

Example use:
>>> munge("aBcdJohnsonzZz", "Johnson")
'BJohnsonZ'
>>> munge("asdVJohnsonkkk", "Johnson")
'VJohnson'
import re

def foo(input_st, keep_st):
    parts = input_st.split(keep_st)
    clean_parts = [re.sub(r"[a-z]+", "", part) for part in parts]
    return keep_st.join(clean_parts)

The other answers, which use str.partition, don't seem to take into account the trigger word being repeated. This example also works for input such as 'aBcJohnsonDeFJohnsonHiJkL', in case that is a concern for you.
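To illustrate the repeated-keyword case, the sketch below compares a split-based helper with a partition-based one on that same input. Both function names are invented for this comparison and are not taken from the answers.

```python
import re

def clean_around_all(text, keep):
    # split() cuts at every occurrence of `keep`
    parts = text.split(keep)
    return keep.join(re.sub(r"[a-z]+", "", part) for part in parts)

def clean_around_first(text, keep):
    # partition() cuts only at the first occurrence of `keep`,
    # so later occurrences are treated as ordinary text
    prefix, sep, suffix = text.partition(keep)
    def drop_lower(s):
        return "".join(c for c in s if not c.islower())
    return drop_lower(prefix) + sep + drop_lower(suffix)

sample = "aBcJohnsonDeFJohnsonHiJkL"
print(clean_around_all(sample, "Johnson"))    # BJohnsonDFJohnsonHJL
print(clean_around_first(sample, "Johnson"))  # BJohnsonDFJHJL
```

With partition, the second "Johnson" loses its lowercase letters and shrinks to "J", which is the behavior the split-based version avoids.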
There are a couple of ways you could tackle this. Here's the simplest one I could think of. The idea is to tackle it in three parts. First off, you need to know the middle string, in your case 'Johnson'. Then you can remove the lowercase letters from the part before and the part after.
def removeLowercaseAround(full, middle):
    stop_at = full.index(middle)             # the beginning of the name
    start_again = stop_at + len(middle)      # the end of the name
    new_str = ''                             # the string we'll return at the end
    for i in range(stop_at):                 # for each char until the middle starts
        if not full[i].islower():            # if it is not a lowercase char
            new_str += full[i]               # add it to the end of the new string
    new_str += middle                        # then add the middle string
    for i in range(start_again, len(full)):  # do the same thing with the end
        if not full[i].islower():            # if it is not a lowercase char
            new_str += full[i]               # add it to the string
    return new_str

print(removeLowercaseAround('ABcdJohnsonzZZ', 'Johnson'))
Not exactly simple or streamlined, but you could do this sort of thing (based partially on Zero Piraeus's answer):
(edited to reflect errors)
def remove_lower(string):
    # Keeps only uppercase letters; digits and punctuation are dropped too.
    return ''.join(filter(str.isupper, string))

def strip_johnson(input_str):
    prefix, match, postfix = input_str.partition("Johnson")
    return (
        remove_lower(prefix) +
        match +
        remove_lower(postfix)
    )