I have a string s with nested brackets: s = "AX(p>q)&E((-p)Ur)"
I want to remove all characters between all pairs of brackets and store in a new string like this: new_string = AX&E
I tried doing this:
p = re.compile("\(.*?\)", re.DOTALL)
new_string = p.sub("", s)
It gives output: AX&EUr)
Is there any way to correct this, rather than iterating each element in the string?
Another simple option is to remove the innermost parentheses at each step, until there are no more parentheses:
p = re.compile(r"\([^()]*\)")
count = 1
while count:
    s, count = p.subn("", s)
Working example: http://ideone.com/WicDK
You can just use string manipulation, without regular expressions:
>>> s = "AX(p>q)&E(qUr)"
>>> [ i.split("(")[0] for i in s.split(")") ]
['AX', '&E', '']
I leave it to you to join the strings up.
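For completeness, the pieces can be joined as sketched below. The helper name is made up for illustration, and the approach assumes the brackets are not nested: with the nested string from the question it leaves "Ur" behind.

```python
def strip_flat_brackets(text):
    # Split at every ')' and keep only what precedes the first '(' in each piece.
    return "".join(piece.split("(")[0] for piece in text.split(")"))

flat = strip_flat_brackets("AX(p>q)&E(qUr)")
nested = strip_flat_brackets("AX(p>q)&E((-p)Ur)")
print(flat)    # AX&E
print(nested)  # AX&EUr  (nested brackets are not handled)
```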
>>> import re
>>> s = "AX(p>q)&E(qUr)"
>>> re.compile("""\([^\)]*\)""").sub('', s)
'AX&E'
Yeah, it should be:
>>> import re
>>> s = "AX(p>q)&E(qUr)"
>>> p = re.compile("\(.*?\)", re.DOTALL)
>>> new_string = p.sub("", s)
>>> new_string
'AX&E'
Nested brackets (or tags, ...) cannot be handled in a general way using regex. See http://www.amazon.de/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124/ref=sr_1_1?ie=UTF8&s=gateway&qid=1304230523&sr=8-1-spell for details on why. You would need a real parser.
It's possible to construct a regex that handles two levels of nesting, but it is already ugly, and one for three levels is already quite long. And you don't want to think about four levels. ;-)
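For this particular task a "real parser" does not have to be heavy. A minimal sketch is to track the nesting depth in a single pass. It assumes only round brackets matter and that a stray ")" should simply be dropped. The function name is hypothetical.

```python
def strip_parenthesized(text):
    """Drop every character that sits inside parentheses, at any depth."""
    kept = []
    depth = 0  # number of currently open '(' brackets
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            # Assumption: an unmatched ')' is ignored rather than raising.
            if depth > 0:
                depth -= 1
        elif depth == 0:
            kept.append(ch)
    return "".join(kept)

print(strip_parenthesized("AX(p>q)&E((-p)Ur)"))  # AX&E
```

Unlike a fixed regex, this works for any nesting depth, at the cost of iterating over the characters.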
You can use PyParsing to parse the string:
from pyparsing import nestedExpr
s = "AX(p>q)&E((-p)Ur)"
expr = nestedExpr('(', ')')
result = expr.parseString('(' + s + ')').asList()[0]
s = ''.join(filter(lambda x: isinstance(x, str), result))
print(s)
Most code is from: How can a recursive regexp be implemented in python?
You could use re.subn():
import re
s = 'AX(p>q)&E((-p)Ur)'
while True:
    s, n = re.subn(r'\([^)(]*\)', '', s)
    if n == 0:
        break
print(s)
Output
AX&E
this is just how you do it:
# strings
# double and single quotes use in Python
"hey there! welcome to CIP"
'hey there! welcome to CIP'
"you'll understand python"
'i said, "python is awesome!"'
'i can\'t live without python'
# use of 'r' before string
print(r"\new code", "\n")
first = "code in"
last = "python"
first + last #concatenation
# slicing of strings
user = "code in python!"
print(user)
print(user[5]) # print an element
print(user[-3]) # print an element from rear end
print(user[2:6]) # slicing the string
print(user[:6])
print(user[2:])
print(len(user)) # length of the string
print(user.upper()) # convert to uppercase
print(user.lstrip())
print(user.rstrip())
print(max(user)) # character with the highest code point in the string
print(min(user)) # character with the lowest code point in the string
print(user.join(["1", "2", "3", "4"])) # join() needs strings, not ints
input()
I have a string in which I want to replace certain characters with "*", but Python's replace() function doesn't seem to replace them. I understand that strings in Python are immutable, so I am creating a new variable to store the replaced string, but I still don't get the string with all the replacements.
This is the following code that I have written. I have tried the process in two ways but still don't get the desired output:
1st way:
a = "AGGCFTFGADFADTRFCAGFADARTRADFACDGFLKLIAP"
rep = ['A','C','P']
for char in rep:
    new = a.replace(char, "*")
print(new)
Output:
AGGCFTFGADFADTRFCAGFADARTRADFACDGFLKLIA*
2nd way:
a = "AGGCFTFGADFADTRFCAGFADARTRADFACDGFLKLIAP"
rep = ['A','C','P']
for i in a:
    if(i in rep):
        new = a.replace(i, "*")
print(new)
Output:
AGGCFTFGADFADTRFCAGFADARTRADFACDGFLKLIA*
Any help would be much appreciated. Thanks
You assign the result of a.replace(char, "*") to new, but then on the next iteration of the for loop, you again replace parts of a, not new. Instead of assigning to new, just assign the result to a, replacing the original string.
a = "AGGCFTFGADFADTRFCAGFADARTRADFACDGFLKLIAP"
rep = ['A','C','P']
for char in rep:
    a = a.replace(char, "*")
print(a)
In addition to the answers offered, I would suggest that regular expressions make this perhaps more straightforward, accomplishing all of the substitutions with a single function call.
>>> import re
>>> a = "AGGCFTFGADFADTRFCAGFADARTRADFACDGFLKLIAP"
>>> rep = ['A','C','P']
>>> r = re.compile('|'.join(rep))
>>> r.sub('*', a)
'*GG*FTFG*DF*DTRF**GF*D*RTR*DF**DGFLKLI**'
Just in case someone decides to be clever and puts something regex significant in rep, you could escape those when compiling your regex.
r = re.compile('|'.join(re.escape(x) for x in rep))
Others have explained the errors in the posted code. An alternative using a generator expression:
new = ''.join("*" if char in ['A','C','P'] else char for char in a)
print(new)
>>> '*GG*FTFG*DF*DTRF**GF*D*RTR*DF**DGFLKLI**'
A simple loop is easy to understand and efficient. The crucial part of the looping approach is to re-assign the string reference to the output of replace().
I've taken the liberty of plagiarising two pieces of code from other contributors in order to demonstrate the performance differences (in case that's important).
import re
from timeit import timeit
a = "AGGCFTFGADFADTRFCAGFADARTRADFACDGFLKLIAP"
rep = 'A', 'C', 'P'
p = re.compile('|'.join(rep))
def v1(s):
    for c in rep:
        s = s.replace(c, '*')
    return s
def v2(s):
    return p.sub('*', s)
def v3(s):
    return ''.join("*" if char in rep else char for char in s)
for func in v1, v2, v3:
    print(func.__name__, timeit(lambda: func(a)))
assert v1(a) == v2(a)
assert v1(a) == v3(a)
Output:
v1 0.3363962830003402
v2 1.8725565750000897
v3 3.3800653280000006
Platform:
macOS 13.0.1
Python 3.11.0
3 GHz 10-Core Intel Xeon W
As already mentioned, you should write a = a.replace(i, "*") because you are looping through rep and you want to do the replacement in the string a. Strings are immutable, and replace gives back a copy of the string.
The variable new only holds the result of the last iteration over rep, which replaces the P character, so you get AGGCFTFGADFADTRFCAGFADARTRADFACDGFLKLIA*: there is only a single P, at the end of the string, and you never actually change the value of a.
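To see this for yourself, here is a small trace of the original loop. The snapshots list is added only for illustration. Every call starts again from the unchanged a, so each result contains replacements for one character only.

```python
a = "AGGCFTFGADFADTRFCAGFADARTRADFACDGFLKLIAP"
rep = ['A', 'C', 'P']

snapshots = []  # illustration only: the value of new after each iteration
for char in rep:
    new = a.replace(char, "*")  # always works on the original a
    snapshots.append(new)
    print(char, new)

# a itself was never modified, and new keeps only the last result.
print(a)
```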
If you have single characters, you can use a character class [ACP] with a single call to re.sub
import re
a = "AGGCFTFGADFADTRFCAGFADARTRADFACDGFLKLIAP"
print(re.sub("[ACP]", "*", a))
Output
*GG*FTFG*DF*DTRF**GF*D*RTR*DF**DGFLKLI**
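Another single-call option for single characters, without regular expressions, is str.translate with a table built by str.maketrans. This is a sketch that reuses the a and rep names from the question and assumes every entry in rep is exactly one character long.

```python
a = "AGGCFTFGADFADTRFCAGFADARTRADFACDGFLKLIAP"
rep = ['A', 'C', 'P']

# Map each single character in rep to "*"; all other characters pass through.
table = str.maketrans({char: "*" for char in rep})
masked = a.translate(table)
print(masked)  # *GG*FTFG*DF*DTRF**GF*D*RTR*DF**DGFLKLI**
```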
If I have a series of python strings that I'm working with that will always take the form of
initialword_content
and I want to strip out the initialword portion, which will always be the same number of characters, and then I want to turn all instances of _ into spaces -- since content may have some underscores in it -- what's the easiest way to do that?
strs = "initialword_content"
strs = strs[12:].replace("_", " ")
print(strs)
Because the initialword always has the same number of characters, you can just slice off the prefix to get the rest of the string, then use str.replace to turn all "_" into spaces.
First, split the string once (with the parameter 1 to split) to get two parts: the throw-away 'initialword' and the rest, where you replace all underscores with spaces.
s = 'initialword_content'
a, b = s.split('_', 1)
b = b.replace('_', ' ')
# b == 'content'
s = 'initialword_content_with_more_words'
a, b = s.split('_', 1)
b = b.replace('_', ' ')
# b == 'content with more words'
This can be done with a single command:
s.split('_', 1)[1].replace('_', ' ')
another way:
' '.join(s.split('_')[1:])
or, if the length of "initialword" is always the same (and you don't have to calculate it each time), use @JunHu's solution.
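If you would rather not assume that an underscore is always present, str.partition is a possible variant. This is a sketch with a hypothetical helper name. Returning the input unchanged when there is no underscore is my own assumption about the desired behaviour.

```python
def strip_prefix(text):
    # partition() splits at the first "_" only and never raises.
    head, sep, rest = text.partition("_")
    if not sep:
        # Assumption: with no underscore there is no prefix to remove.
        return text
    return rest.replace("_", " ")

print(strip_prefix("initialword_content_with_more_words"))  # content with more words
```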
I used slicing and the replace() function. replace() simply... replaces!
string = 'initialword_content'
content = string[12:] # You mentioned that initialword will always be the same length, so I used slicing.
content = content.replace('_', ' ')
For example:
>>> string = 'elephantone_con_ten_t' # elephantone was the first thing I thought of xD
>>> content = string[12:]
>>> content
'con_ten_t'
>>> content = content.replace('_', ' ')
>>> content
'con ten t'
However, if you also want to reference "elephantone" somewhere else, do this:
>>> string = 'elephantone_con_ten_t'
>>> l = string.split('_', 1) # This will only split the string ONCE, from the left.
>>> l[0]
'elephantone'
>>> l[1].replace('_', ' ')
'con ten t'
I'm creating a function to create all 26 combinations of words with a fixed suffix. The script works except for the JOIN in the second-to-last line.
def create_word(suffix):
    e=[]
    letters="abcefghijklmnopqrstuvwxyz"
    t=list(letters)
    for i in t:
        e.append(i)
        e.append(suffix)
    ' '.join(e)
    print(e)
Currently, it is printing ['a', 'suffix', 'b', 'suffix', ...etc]. And I want it to print out as one long string: 'aSuffixbSuffixcSuffix...etc.' Why isn't the join working here? How can I fix this?
In addition, how would I separate the characters once I have the string? For example, to translate "take the last character of the suffix and add a space to it every time ('aSuffixbSuffixcSuffix' --> 'aSuffix bSuffix cSuffix')". Or, more generally, to replace every x-th character, where x is any integer (e.g., to replace the 3rd, 6th, 9th, etc. character with something I choose).
str.join returns a new string; it does not transform the existing list. Here's one way to accomplish it.
result = ' '.join(e)
print(result)
But if you're feeling clever, you can streamline a lot of the setup.
import string
def create_word(suffix):
    return ' '.join(i + suffix for i in string.ascii_lowercase)
join doesn't change its arguments - it just returns a new string:
result = ' '.join(e)
return result
If you really want the output you specified (all of the results concatenated together):
>>> import string
>>> string.ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'
>>> letters = string.ascii_lowercase
>>> suffix = 'Suffix'
>>> ''.join('%s%s' % (l, suffix) for l in letters)
'aSuffixbSuffixcSuffixdSuffixeSuffixfSuffixgSuffixhSuffixiSuffixjSuffixkSuffixlSuffixmSuffixnSuffixoSuffixpSuffixqSuffixrSuffixsSuffixtSuffixuSuffixvSuffixwSuffixxSuffixySuffixzSuffix'
Beside the problem already mentioned by rekursive, you should have a look at list comprehension:
def create_word(suffix):
    return ''.join(
        [i+suffix for i in "abcdefghijklmnopqrstuvwxyz"]
    )
print(create_word('suffix'))
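The more general part of the question, replacing every x-th character, is not covered above. A minimal sketch, assuming positions are counted from 1 and using a hypothetical function name:

```python
def replace_every_nth(text, n, replacement):
    # (i + 1) turns the 0-based index into a 1-based position.
    return "".join(
        replacement if (i + 1) % n == 0 else ch
        for i, ch in enumerate(text)
    )

print(replace_every_nth("abcdefghi", 3, "-"))  # ab-de-gh-
```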
I have a string that contains numbers and other characters.
I need to find every complete number in that string that contains the digits 467033.
e.g. 1.467033777777777
Thanks
Try this:
import re
RE_NUM = re.compile(r'(\d*\.\d+)', re.M)
text = 'eghwodugo83o135.13508yegn1.4670337777777773u87208t'
for num in RE_NUM.findall(text):
    if '467033' in num:
        print(num)
Prints:
1.4670337777777773
Generalized / optimized in response to comment:
def find(text, numbers):
    pattern = '|'.join(r'[\d.]*%s[\d.]*' % n for n in numbers)
    re_num = re.compile(pattern, re.M)
    return [m.group() for m in re_num.finditer(text)]
print(find(text, ['467033', '13']))
Prints:
['135.13508', '1.4670337777777773']
If you're just searching for a substring within another string, you can use in:
>>> sub_num = "467033"
>>> my_num = "1.467033777777777"
>>> sub_num in my_num
True
However, I suspect there's more to your problem than just searching strings, and that doing it this way might not be optimal. Can you be more specific about what you're trying to do?
import re
a = 'e.g. 1.467033777777777\nand also 576575567467033546.90 Thanks '
r = re.compile('[0-9.]*467033[0-9.]*')
r.findall(a)
['1.467033777777777', '576575567467033546.90']
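Note that a character class like [0-9.] also accepts runs with several dots. If that matters, a slightly stricter sketch is to first pull out well-formed numbers and then filter them. The function name is made up, and it assumes a number is digits with at most one decimal part and that the wanted digits do not straddle the decimal point.

```python
import re

def numbers_containing(text, digits):
    # Digits, optionally followed by one "." and more digits.
    candidates = re.findall(r"\d+(?:\.\d+)?", text)
    return [number for number in candidates if digits in number]

a = 'e.g. 1.467033777777777\nand also 576575567467033546.90 Thanks '
print(numbers_containing(a, "467033"))
# ['1.467033777777777', '576575567467033546.90']
```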
I'm trying to keep only the letters in a string. I am trying to do something like this:
s = '1208uds9f8sdf978qh39h9i#H(&#*H(&H97dgh'
s_ = lambda: letter if letter.isalpha(), s
This errors out. How would a working version look?
Alternately:
s_ = ''.join(filter(lambda c: c.isalpha(), s))
how about
re.sub('[^a-zA-Z]','', s)
or
"".join([x for x in s if x.isalpha()])
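These two are not quite equivalent. The regex keeps ASCII letters only, while str.isalpha() also accepts non-ASCII letters. A small sketch with a made-up sample string (the escapes are ï and é):

```python
import re

s = "na\u00efve caf\u00e9 123"  # "naïve café 123"

ascii_only = re.sub(r"[^a-zA-Z]", "", s)
unicode_letters = "".join(ch for ch in s if ch.isalpha())

print(ascii_only)       # navecaf
print(unicode_letters)  # naïvecafé
```

For the string in the question, which only contains ASCII letters, both give the same result.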
One handy way to manipulate strings is using a generator expression and the join method:
result = "".join( letter for letter in s if letter.isalpha() )
You don't need a lambda function:
result = ''.join(c for c in input_str if c.isalpha())
If you really want to use a lambda function you could write it as follows:
result = ''.join(filter(lambda c:str.isalpha(c), input_str))
But this can also be simplified to:
result = ''.join(filter(str.isalpha, input_str))
You probably want a list comprehension here:
s_ = [letter for letter in s if letter.isalpha()]
However, this will give you a list of strings (each one character long). To convert this into a single string, you can use join:
s2 = ''.join(s_)
If you want, you can combine the two into a single statement:
s_ = ''.join(letter for letter in s if letter.isalpha())
If you particularly want or need to use a lambda function, you can use filter instead of the generator:
my_func = lambda letter: letter.isalpha()
s_ = ''.join(filter(my_func, s))
>>> s = '1208uds9f8sdf978qh39h9i#H(&#*H(&H97dgh'
>>> ''.join(e for e in s if e.isalpha())
'udsfsdfqhhiHHHdgh'
This is kind of the long way round, but will let you create a filter for any arbitrary set of characters.
import string
def strfilter(validChars):
    vc = set(validChars)
    def filter(s):
        return ''.join(ch for ch in s if ch in vc)
    return filter
filterAlpha = strfilter(string.ascii_letters)
filterAlpha('1208uds9f8sdf978qh39h9i#H(&#*H(&H97dgh') # -> 'udsfsdfqhhiHHHdgh'