Python RegEx: find specific word followed by roman numeral [duplicate]

The regex seems to work when I test it at http://regex101.com/r/oU6eI5/1, but when I put it in Python it seems to match the whole string.
str = "galley/files/tew/tewt/tweqt/"
re.sub('^.+/+([^/]+/$)', "\1", str)
I want to get "tweqt/".

You need to use a raw string for the replacement:
str = "galley/files/tew/tewt/tweqt/"
re.sub('^.+/+([^/]+/$)', r"\1", str)
#                        ^ the r prefix
Otherwise, "\1" is read as an escape sequence for the character with code 1 rather than as a backreference to group 1. On my console, for instance, it shows up as a little smiley.
If you would rather not use a raw string, you have to escape the backslash:
re.sub('^.+/+([^/]+/$)', "\\1", str)
It is also good practice to write regex patterns as raw strings and to use consistent quotes, so I would advise using:
re.sub(r'^.+/+([^/]+/$)', r'\1', str)
Other notes
It might be simpler to match (using re.search) instead of using re.sub:
re.search(r'[^/]+/$', str).group()
# => tweqt/
You might also want to use a variable name other than str, because assigning to str shadows the built-in str type.
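To make the difference concrete, here is a minimal sketch comparing the normal and the raw replacement string. The variable names path, wrong, right and last are only illustrative and do not come from the question.

```python
import re

path = "galley/files/tew/tewt/tweqt/"
pattern = r'^.+/+([^/]+/$)'

# In a normal string literal, "\1" is one character (code 1);
# in a raw string it is a backslash followed by the digit 1.
print(len("\1"), len(r"\1"))          # 1 2

wrong = re.sub(pattern, "\1", path)   # whole match replaced by chr(1)
right = re.sub(pattern, r"\1", path)  # whole match replaced by group 1

print(repr(wrong))                    # '\x01'
print(repr(right))                    # 'tweqt/'

# The simpler alternative: search for the last path component directly.
last = re.search(r'[^/]+/$', path).group()
print(last)                           # tweqt/
```

Both the re.sub call with a raw replacement and the re.search call give the same result here; the search version avoids the backreference entirely.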

It would be better if you define the pattern or regex as raw string.
>>> import re
>>> s = "galley/files/tew/tewt/tweqt/"
>>> m = re.sub(r'^.+/+([^/]+/$)', r'\1', s)  # both arguments are raw strings
>>> m
'tweqt/'

Related

using OR operator (|) in variable for regular expression in python

I need to match against a list of string values. I'm using '|'.join() to build a string that is passed into re.match:
import re
line = 'GigabitEthernet0/1 is up, line protocol is up'
interfacenames = [
    'Loopback',
    'GigabitEthernet'
]
rex = "r'" + '|'.join(interfacenames) + "'"
print(rex)
interface = re.match(rex, line)
print(interface)
The code result is:
r'Loopback|GigabitEthernet'
None
However, if I copy and paste the string directly into re.match:
interface=re.match(r'Loopback|GigabitEthernet',line)
It works:
r'Loopback|GigabitEthernet'
<_sre.SRE_Match object at 0x7fcdaf2f4718>
I also tried replacing the .join call with the literal "Loopback|GigabitEthernet" in rex, and it didn't work either. It looks like the pipe symbol is not treated as an operator when it comes from a string variable.
Any thoughts on how to fix it?

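You made the r' prefix and the closing ' part of the string's contents, so the regex also tries to match those literal characters. The r prefix belongs to the string literal syntax in the source code, not to the pattern. This is how it could be used:
rex=r'|'.join(interfacenames)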
If the interfacenames may contain special regex metacharacters, escape the values like this:
rex=r'|'.join([re.escape(x) for x in interfacenames])
Also, if you plan to match the strings not only at the start of the string, use re.search rather than re.match. See What is the difference between Python's re.search and re.match?
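As a small sketch of the points above, the following compares the original concatenation with the joined and escaped alternation. The names interface_names, broken and match are chosen here for illustration only.

```python
import re

line = 'GigabitEthernet0/1 is up, line protocol is up'
interface_names = ['Loopback', 'GigabitEthernet']

# The pattern is just the names joined by "|"; re.escape guards against
# names that contain regex metacharacters.
rex = '|'.join(re.escape(name) for name in interface_names)
print(rex)                        # Loopback|GigabitEthernet

match = re.match(rex, line)
print(match.group())              # GigabitEthernet

# The original attempt puts r' and ' inside the pattern, so the two
# alternatives become  r'Loopback  and  GigabitEthernet'  (with quotes).
broken = "r'" + '|'.join(interface_names) + "'"
print(re.match(broken, line))     # None
```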

You don't need to put "r'" at the beginning and "'" at the end. Those are part of the syntax for raw string literals, not part of the string itself.
rex = '|'.join(interfacenames)

Replace based on several regex rules in Python

I want to use, for example, these patterns
rules = {
    '\s': '_',
    '.(?P<word>\w)': '\1',
    'text1': 'text2',
    #etc
}
with re.sub().
There are some examples like this, but they don't work with regex special characters.

I use raw strings when writing regexes in Python. That saves you from having to double every backslash: https://docs.python.org/2/library/re.html
Try:
rules = {
    r"\s": r"_",
    r"text1": r"text2",
    #etc
}

You should use raw strings like so:
rules = {
    r'\s': r'_',
    r'.(?P<word>\w)': r'\1',
    r'text1': r'text2',
    #etc
}
That way you don't need to double the backslashes in your patterns and replacements.
Here is why it happens (direct quote from the docs):
Regular expressions use the backslash character ('\') to indicate
special forms or to allow special characters to be used without
invoking their special meaning. This collides with Python’s usage of
the same character for the same purpose in string literals; for
example, to match a literal backslash, one might have to write '\\\\'
as the pattern string, because the regular expression must be \\, and
each backslash must be expressed as \\ inside a regular Python string
literal.
And how to solve it (another quote from the docs):
The solution is to use Python’s raw string notation for regular
expression patterns; backslashes are not handled in any special way in
a string literal prefixed with 'r'. So r"\n" is a two-character string
containing '\' and 'n', while "\n" is a one-character string
containing a newline. Usually patterns will be expressed in Python
code using this raw string notation.
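The quoted behaviour can be checked with a short sketch. The sample string text is an assumption made up for this illustration.

```python
import re

# "\n" is a single newline character; r"\n" is a backslash plus the letter n.
print(len("\n"), len(r"\n"))     # 1 2

text = "C:\\temp"                # the seven characters C:\temp

# Both patterns are the same two-character regex \\ (one literal backslash).
m_raw = re.search(r"\\", text)
m_plain = re.search("\\\\", text)

print(m_raw.start(), m_plain.start())   # 2 2
```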

You do need raw strings when declaring Python regexes, and there are some issues with your examples, but what you are really asking is how to run the replacements.
I suggest using an OrderedDict so that the replacements are performed in a strict order, the order in which they were added. The code will then look like this:
import re
from collections import OrderedDict   # adding the import
rules = OrderedDict()                 # defining the ordered rule set
rules[r'\s'] = '-'                    # pattern -> replacement pairs
rules[r'.(\w)'] = r'\1'
rules['text1'] = 'text2'
s = "nnoo mmoorree tteexxtt11"        # a test string
for key in rules.keys():              # iterating through keys in insertion order
    s = re.sub(key, rules[key], s)    # perform the search and replace
print(s)                              # demo printing
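A list of (pattern, replacement) tuples is another way to get a fixed order, and is not what the answer above uses. The sketch below assumes a hypothetical helper named apply_rules and a made-up sample string. Plain dicts also keep insertion order from Python 3.7 on, so OrderedDict mainly matters on older versions.

```python
import re

# Hypothetical rule list: applied strictly from top to bottom.
RULES = [
    (r'\s', '_'),
    (r'text1', 'text2'),
]

def apply_rules(text, rules=RULES):
    """Run every (pattern, replacement) pair over text, in order."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text

result = apply_rules("some text1 here")
print(result)   # some_text2_here
```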

Use raw string notation to avoid having to escape your backslashes:
rules = {
    r'\s': '_',
    r'.(?P<word>\w)': r'\1',
    r'text1': 'text2',
    #etc
}
Directly from the regular expression module (re) documentation:
Raw string notation (r"text") keeps regular expressions sane. Without it, every backslash ('\') in a regular expression would have to be prefixed with another one to escape it. For example, the two following lines of code are functionally identical:
>>> re.match(r"\W(.)\1\W", " ff ")
<_sre.SRE_Match object at ...>
>>> re.match("\\W(.)\\1\\W", " ff ")
<_sre.SRE_Match object at ...>
When one wants to match a literal backslash, it must be escaped in the regular expression. With raw string notation, this means r"\\". Without raw string notation, one must use "\\\\", making the following lines of code functionally identical:
>>> re.match(r"\\", r"\\")
<_sre.SRE_Match object at ...>
>>> re.match("\\\\", r"\\")
<_sre.SRE_Match object at ...>

newbie about regular expression in Python

I have the following code to match a pattern in str:
match = re.search(r'word:\w\w\w', str)
I tried to put the pattern in a variable:
pat='word:\w\w\w' # and using in re.search
match = re.search(rpat, str)
but I got a compile error.
How do I create a variable for a pattern in Python?

You can't replace r'word:\w\w\w' by erasing the string and adding a variable name in front of the r. The r is part of the string literal, so it has to go with you when you move it.
pat=r'word:\w\w\w'
match = re.search(pat, str)
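In this particular situation the pattern also works without the r, because \w is not an escape sequence that Python recognizes, so the backslash is left in the string. Newer Python versions warn about such unrecognized escapes, though, so keeping the r prefix is the safer habit.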
pat='word:\w\w\w'
match = re.search(pat, str)
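For illustration, a minimal sketch of keeping the pattern in a variable, with re.compile as an optional extra step that is not part of the answer above. The sample text and the names text and compiled are assumptions made for this example.

```python
import re

pat = r'word:\w\w\w'             # the r prefix stays with the literal
text = 'prefix word:cat suffix'  # made-up input

match = re.search(pat, text)
print(match.group())             # word:cat

# Optionally compile once and reuse the pattern object.
compiled = re.compile(pat)
print(compiled.search(text).group())   # word:cat
```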

How can Python's regular expressions work with patterns that have escaped special characters?

Is there a way to get Python's regular expressions to work with patterns that have escaped special characters? As far as my limited understanding can tell, the following example should work, but the pattern fails to match.
import re
string = r'This a string with ^g\.$s' # A string to search
pattern = r'^g\.$s' # The pattern to use
string = re.escape(string) # Escape special characters
pattern = re.escape(pattern)
print(re.search(pattern, string)) # This prints "None"
Note:
Yes, this question has been asked elsewhere (like here). But as you can see, I'm already implementing the solution described in the answers and it's still not working.

Why on earth are you applying re.escape to the string?! You want to find the "special" characters in that! If you just apply it to the pattern, you'll get a match:
>>> import re
>>> string = r'This a string with ^g\.$s'
>>> pattern = r'^g\.$s'
>>> re.search(re.escape(pattern), re.escape(string)) # nope
>>> re.search(re.escape(pattern), string) # yep
<_sre.SRE_Match object at 0x025089F8>
For bonus points, notice that you just need to re.escape the pattern one more time than the string:
>>> re.search(re.escape(re.escape(pattern)), re.escape(string))
<_sre.SRE_Match object at 0x025D8DE8>
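The same idea as a self-contained sketch: escape only the text you are looking for, never the text you are searching in. The names text and needle are illustrative.

```python
import re

text = r'This a string with ^g\.$s'
needle = r'^g\.$s'

# re.escape turns the needle into a pattern that matches it literally.
m = re.search(re.escape(needle), text)
print(m.group() == needle)       # True

# Escaping the searched text as well changes its contents, so the
# literal needle is no longer in it.
print(re.search(re.escape(needle), re.escape(text)))   # None

# For a plain literal lookup, the in operator works without any regex.
print(needle in text)            # True
```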

Using regex in python

I have the following problem.
I want to escape all special characters in a Python string.
str='eFEx-x?k=;-'
re.sub("([^a-zA-Z0-9])",r'\\1', str)
'eFEx\\1x\\1k\\1\\1\\1'
str='eFEx-x?k=;-'
re.sub("([^a-zA-Z0-9])",r'\1', str)
'eFEx-x?k=;-'
re.sub("([^a-zA-Z0-9])",r'\\\1', str)
I can't seem to win here. '\1' refers to the matched special character, and I want to add a '\' before it. But \1 on its own just reinserts the character unchanged, and \\1 produces a literal \1.

Use r'\\\1'. That's a backslash (escaped, so denoted \\) followed by \1.
To verify that this works, try:
str = 'eFEx-x?k=;-'
print(re.sub("([^a-zA-Z0-9])",r'\\\1', str))
This prints:
eFEx\-x\?k\=\;\-
which I think is what you want. Don't be confused when the interpreter outputs 'eFEx\\-x\\?k\\=\\;\\-'; the double backslashes are there because the interpreter quotes its output, unless you use print.
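The difference between the printed string and its quoted form can be seen side by side in this small sketch. The variable names s and escaped are chosen here for illustration.

```python
import re

s = 'eFEx-x?k=;-'

# r'\\\1' is an escaped backslash (\\) followed by the backreference \1.
escaped = re.sub(r'([^a-zA-Z0-9])', r'\\\1', s)

print(escaped)                   # eFEx\-x\?k\=\;\-
print(repr(escaped))             # 'eFEx\\-x\\?k\\=\\;\\-'

# One backslash was added per special character: 11 + 5 = 16.
print(len(s), len(escaped))      # 11 16
```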

Why don't you use re.escape()?
str = 'eFEx-x?k=;-'
re.escape(str)
'eFEx\\-x\\?k\\=\\;\\-'
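Which characters re.escape puts a backslash in front of depends on the Python version: since Python 3.7 only characters that can have special meaning in a regex are escaped, so the output may differ from the one shown above. A minimal sketch that checks the property that matters (the escaped string matches the original literally) rather than the exact output:

```python
import re

s = 'eFEx-x?k=;-'
escaped = re.escape(s)
print(escaped)                   # exact output varies between Python versions

# Whatever was escaped, the result is a pattern that matches s literally.
ok = re.fullmatch(escaped, s) is not None
print(ok)                        # True
```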

Try adding another backslash:
s = 'eFEx-x?k=;-'
print(re.sub("([^a-zA-Z0-9])",r'\\\1', s))
