I have a word within two opening and closing parenthesis, like this ((word)).
I want to remove the first and the last parenthesis, so they are not duplicate, in order to obtain something like this: (word).
I have tried using strip('()') on the variable that contains ((word)). However, it removes ALL parentheses at the beginning and at the end. Result: word.
Is there a way to specify that I only want the first and last one removed?
For this you could slice the string and only keep from the second character until the second to last character:
word = '((word))'
new_word = word[1:-1]
print(new_word)
Produces:
(word)
For varying quantities of parenthesis, you could count how many exist first and pass this to the slicing as such (this leaves only 1 bracket on each side, if you want to remove only the first and last bracket you can use the first suggestion);
word ='((((word))))'
quan = word.count('(')
new_word = word[quan-1:1-quan]
print(new_word)
Produces;
(word)
You can use regex.
import re
word = '((word))'
re.findall('(\(?\w+\)?)', word)[0]
This only keeps one pair of brackets.
instead use str.replace, so you would do str.replace('(','',1)
basically you would replace all '(' with '', but the third argument will only replace n instances of the specified substring (as argument 1), hence you will only replace the first '('
read the documentation :
replace(...)
S.replace (old, new[, count]) -> string
Return a copy of string S with all occurrences of substring
old replaced by new. If the optional argument count is
given, only the first count occurrences are replaced.
you can replace double opening and double closing parentheses, and set the max parameter to 1 for both operations
print('((word))'.replace('((','(',1).replace('))',')',1) )
But this will not work if there are more occurrences of double closing parentheses
Maybe reversing the string before replacing the closing ones will help
t= '((word))'
t = t.replace('((','(',1)
t = t[::-1] # see string reversion topic [https://stackoverflow.com/questions/931092/reverse-a-string-in-python]
t = t.replace('))',')',1) )
t = t[::-1] # and reverse again
Well , I used regular expression for this purpose and substitute a bunch of brackets with a single one using re.sub function
import re
s="((((((word)))))))))"
t=re.sub(r"\(+","(",s)
g=re.sub(r"\)+",")",t)
print(g)
Output
(word)
Try below:
>>> import re
>>> w = '((word))'
>>> re.sub(r'([()])\1+', r'\1', w)
'(word)'
>>> w = 'Hello My ((word)) into this world'
>>> re.sub(r'([()])\1+', r'\1', w)
'Hello My (word) into this world'
>>>
try this one:
str="((word))"
str[1:len(str)-1]
print (str)
And output is = (word)
Related
I have a regex in python and I want to prevent matching substrings. I want to add '#' at the beginning some words with alphanumeric and _ character and 4 to 15 characters. But it matches substring of larger words. I have this method:
def add_atsign(sents):
for i, sent in enumerate(sents):
sents[i] = re.sub(r'([a-zA-Z0-9_]{4,15})', r'#\1', str(sent))
return sents
And the example is :
mylist = list()
mylist.append("ali_s ali_t ali_u aabs:/t.co/kMMALke2l9")
add_atsign(mylist)
And the answer is :
['#ali_s #ali_t #ali_u #aabs:/t.co/#kMMALke2l9']
As you can see, it puts '#' at the beginning of 'aabs' and 'kMMALke2l9'. That it is wrong.
I tried to edit the code as bellow :
def add_atsign(sents):
for i, sent in enumerate(sents):
sents[i] = re.sub(r'((^|\s)[a-zA-Z0-9_]{4,15}(\s|$))', r'#\1', str(sent))
return sents
But the result will become like this :
['#ali_s ali_t# ali_u aabs:/t.co/kMMALke2l9']
As you can see It has wrong replacements.
The correct result I expect is:
"#ali_s #ali_t #ali_u aabs:/t.co/kMMALke2l9"
Could anyone help?
Thanks
This is a pretty interesting question. If I understand correctly, the issue is that you want to divide the string by spaces, and then do the replacement only if the entire word matches, and not catch a substring.
I think the best way to do this is to first split by spaces, and then add assertions to your regex that catch only an entire string:
def add_atsign(sents):
new_list = []
for string in sents:
new_list.append(' '.join(re.sub(r'^([a-zA-Z0-9_]{4,15})$', r'#\1', w)
for w in string.split()))
return new_list
mylist = ["ali_s ali_t ali_u aabs:/t.co/kMMALke2l9"]
add_atsign(mylist)
>
['#ali_s #ali_t #ali_u aabs:/t.co/kMMALke2l9']
ie, we split, then replace only if the entire word matches, then rejoin.
By the way, your regex can be simplified to r'^(\w{4,15})$':
def add_atsign(sents):
new_list = []
for string in sents:
new_list.append(' '.join(re.sub(r'^(\w{4,15})$', r'#\1', w)
for w in string.split()))
return new_list
You can separate words by spaces by adding (?<=\s) to the start and \s to the end of your first expression.
def add_atsign(sents):
for i, sent in enumerate(sents):
sents[i] = re.sub(r'((^|(?<=\s))[a-zA-Z0-9_]{4,15}\s)', r'#\1', str(sent))
return sents
The result will be like this:
['#ali_s #ali_t #ali_u aabs:/t.co/kMMALke2l9']
I am not sure what you are trying to accomplish, but the reason it puts the # at the wrong places is that as you added /s or ^ to the regex the whitespace becomes part of the match and it therefore puts the # before the whitespace.
you could try to split it to
check at beginning of string and put at first position and
check after every whitespace and put to second position
Im aware its not optimal, but maybe i can help if you clarify what the regex is supposed to match and what it shouldnt in a bit more detail
I am trying to find the easiest way of returning the substring consisting of the characters of a string after the last occurrence of a given character in Python.
Example:
s = 'foo-bar-123-7-foo2'
I am interested in the characters after the last occurrence of '-'.
So the output would be 'foo2'
I could do a str.find(sub,start,end) function to find the position of the first '-' then store it's position then repeat this starting at this position to search for the next one until there are no more then return the characters of the string after this last position but is their a nicer way?
Simply with str.rfind function (returns the highest index in the string where substring is found):
s = 'foo-bar-123-7-foo2'
res = s[s.rfind('-') + 1:]
print(res) # foo2
s = 'foo-bar-123-7-foo2'
print(s.rsplit('-', 1)[1])
use rfind
so, s.rfind('-') will return you 13, which is the last occurrence of '-'.
and further doing
s[s.rfind('-')+1:] will return foo2
Here is a regex option:
s = 'foo-bar-123-7-foo2'
output = re.sub(r'^.*-', '', s)
print(output)
This prints: foo2
The strategy here to match all content up, to and including, the final dash character. Then, replace that with empty string, leaving only the final portion.
I want to replace characters in a string but not all of the characters at once. For example:
s = "abac"
I would like to replace the string with all these options
"Xbac"
"abXc"
"XbXc"
I only know the normal s.replace() function that would replace all occurences of that character. Does anyone have any clever code that would replace all the possible options of characters in the string?
I have managed to replace all the character in the string with all the different combinations. Although the code isnt the most efficient it does what I wanted it to
def replaceall(s, n):
occurence = s.count(n)
alt = []
temp = s
for i in range(occurence):
temp2 = temp
for j in range(i,occurence):
temp2 = temp2.replace(n,"x",1)
alt.append(temp2)
temp = temp.replace(n,"!",1)
for i in range(len(alt)):
alt[i] = alt[i].replace("!",n)
return alt
string.replace(s, old, new[, maxreplace])
Return a copy of string s with all occurrences of substring old replaced by new. If the optional argument maxreplace is given, the first maxreplace occurrences are replaced.
So, if maxreplace=1 only the first character is replaced.
You can still use replace() to only replace the first k occurrences of a character, rather than all of them at once.
string.replace(s, old, new[, maxreplace])
Return a copy of string s with all occurrences of substring old replaced by new. If the optional argument maxreplace is given, the first maxreplace occurrences are replaced.
So your example could be accomplished by:
'abac'.replace('a','X',1).replace('a','X',1)
>>> 'XbXc'
More complicated patterns can be accomplished by using the re module in Python, which allows for pattern matching using regular expressions, notably with re.sub().
Does anyone have any clever code that would replace all the possible
options of characters in the string?
Take a look at regular expressions in Python.
Here's an example:
import re
s = "abac"
re.sub('^a', 'X', s)
# output = 'Xbac'
This would replace all strings that start with a and put an X there.
I'm trying to learn how to use Regular Expressions with Python. I want to retrieve an ID number (in parentheses) in the end from a string that looks like this:
"This is a string of variable length (561401)"
The ID number (561401 in this example) can be of variable length, as can the text.
"This is another string of variable length (99521199)"
My coding fails:
import re
import selenium
# [Code omitted here, I use selenium to navigate a web page]
result = driver.find_element_by_class_name("class_name")
print result.text # [This correctly prints the whole string "This is a text of variable length (561401)"]
id = re.findall("??????", result.text) # [Not sure what to do here]
print id
This should work for your example:
(?<=\()[0-9]*
?<= Matches something preceding the group you are looking for but doesn't consume it. In this case, I used \(. ( is a special character, so it has to be escaped with \. [0-9] matches any number. The * means match any number of the directly preceding rule, so [0-9]* means match as many numbers as there are.
Solved this thanks to Kaz's link, very useful:
http://regex101.com/
id = re.findall("(\d+)", result.text)
print id[0]
You can use this simple solution :
>>> originString = "This is a string of variable length (561401)"
>>> str1=OriginalString.replace("("," ")
'This is a string of variable length 561401)'
>>> str2=str1.replace(")"," ")
'This is a string of variable length 561401 '
>>> [int(s) for s in string.split() if s.isdigit()]
[561401]
First, I replace parantheses with space. and then I searched the new string for integers.
No need to really use regular expressions here, if it is always at the end and always in parenthesis you can split, extract last element and remove the parenthesis by taking the substring ([1:-1]). Regexes are relatively time expensive.
line = "This is another string of variable length (99521199)"
print line.split()[-1][1:-1]
If you did want to use regular expressions I would do this:
import re
line = "This is another string of variable length (99521199)"
id_match = re.match('.*\((\d+)\)',line)
if id_match:
print id_match.group(1)
I need to strip the character "'" from a string in python. How do I do this?
I know there is a simple answer. Really what I am looking for is how to write ' in my code. for example \n = newline.
As for how to represent a single apostrophe as a string in Python, you can simply surround it with double quotes ("'") or you can escape it inside single quotes ('\'').
To remove apostrophes from a string, a simple approach is to just replace the apostrophe character with an empty string:
>>> "didn't".replace("'", "")
'didnt'
Here are a few ways of removing a single ' from a string in python.
str.replace
replace is usually used to return a string with all the instances of the substring replaced.
"A single ' char".replace("'","")
str.translate
In Python 2
To remove characters you can pass the first argument to the funstion with all the substrings to be removed as second.
"A single ' char".translate(None,"'")
In Python 3
You will have to use str.maketrans
"A single ' char".translate(str.maketrans({"'":None}))
re.sub
Regular Expressions using re are even more powerful (but slow) and can be used to replace characters that match a particular regex rather than a substring.
re.sub("'","","A single ' char")
Other Ways
There are a few other ways that can be used but are not at all recommended. (Just to learn new ways). Here we have the given string as a variable string.
Using list comprehension
''.join([c for c in string if c != "'"])
Using generator Expression
''.join(c for c in string if c != "'")
Another final method can be used also (Again not recommended - works only if there is only one occurrence )
Using list call along with remove and join.
x = list(string)
x.remove("'")
''.join(x)
Do you mean like this?
>>> mystring = "This isn't the right place to have \"'\" (single quotes)"
>>> mystring
'This isn\'t the right place to have "\'" (single quotes)'
>>> newstring = mystring.replace("'", "")
>>> newstring
'This isnt the right place to have "" (single quotes)'
You can escape the apostrophe with a \ character as well:
mystring.replace('\'', '')
I met that problem in codewars, so I created temporary solution
pred = "aren't"
pred = pred.replace("'", "99o")
pred = pred.title()
pred = pred.replace("99O", "'")
print(pred)
You can use another char combination, like 123456k and etc., but the last char should be letter