Formatting regex strings causes regex pattern characters to be escaped [duplicate] - python

I want to put some variables into a regex, but also maintain a regex pattern.
regex = 'set groups {group} routing-instances (?P<routing_instances>[\w\W]+) interface {logical_interface}'.format(
group=group,
logical_interface=logical_interface
)
However, it escapes the escape characters:
ipdb> regex
'set groups GROUP1 routing-instances (?P<routing_instances>[\\w\\W]+) interface a10.555'

Use a raw string:
regex = r'your \regex \here'
That said, it doesn't really matter here: your string doesn't actually contain doubled backslashes. Only its textual representation (the repr that ipdb echoes back) shows them doubled; print(regex) shows each backslash once.
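As a minimal sketch of the point above, the snippet below builds the pattern from a raw-string template and compares repr() with print(). The values of group and logical_interface are taken from the output shown in the question; the sample line and the name VRF-A are hypothetical. Passing the interpolated values through re.escape is an extra precaution (an assumption, not part of the original answer), so that the dot in a10.555 is matched literally.

```python
import re

# Values as they appear in the question's ipdb output.
group = "GROUP1"
logical_interface = "a10.555"

# Raw-string template: Python leaves \w and \W untouched.
template = (
    r"set groups {group} routing-instances "
    r"(?P<routing_instances>[\w\W]+) interface {logical_interface}"
)

# re.escape is an added precaution so that "." in the interface name
# is matched literally instead of as "any character".
regex = template.format(
    group=re.escape(group),
    logical_interface=re.escape(logical_interface),
)

print(repr(regex))  # repr doubles every backslash for display
print(regex)        # the actual string content: single backslashes

# Hypothetical input line, for illustration only.
line = "set groups GROUP1 routing-instances VRF-A interface a10.555"
match = re.search(regex, line)
print(match.group("routing_instances"))
```

Note that if the pattern itself needed literal braces (for example a quantifier such as {2}), they would have to be doubled ({{2}}) for str.format to leave them alone.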

Isn't the 'r' letter making the regex pattern string literal? [duplicate]

I had thought the 'r' prefix in the pattern makes sure that everything in the pattern is interpreted as a string literal, so that I don't have to escape anything. But in the case below, I still have to use '\.' for a literal match. So what's the purpose of the 'r' at the beginning of the regex?
pattern = r'\.'
text = "this is. test"
text = re.sub(pattern, ' ', text)
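The r prefix stands for "raw". It means that escape sequences inside a raw string are kept as literal characters instead of being processed by Python. Consider:
print('Hello\b World') # \b is emitted as a backspace character
print(r'Hello\b World') # Hello\b World
In the first, non-raw example, \b is interpreted as a control character (a backspace, which is not printed as visible text). In the second example, using a raw string, \b stays as the two characters backslash and b, which the regex engine reads as a word boundary.
Another example is comparing '\1' to r'\1'. The former is a control character (the character with code 1), while the latter is what the regex engine reads as a reference to the first capture group. To refer to the first capture group without a raw string, double the backslash, i.e. use '\\1'.
The raw prefix only changes how Python reads backslashes in the literal. It does not switch off regex metacharacters, so an unescaped . still matches any character and has to be written as \. to match a literal dot.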
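The following sketch illustrates the distinction using the text from the question: the raw prefix affects how Python parses the literal, while the regex engine still decides what . and \1 mean. The variable names and the 'hello world' sample are illustrative only.

```python
import re

# Python processes escapes in ordinary literals before re ever sees them.
print(len('\1'), len(r'\1'))  # one control character vs. backslash + "1"
print('\\1' == r'\1')         # doubling the backslash gives the same string

text = "this is. test"

# An escaped dot matches only a literal "."
dots_only = re.sub(r'\.', ' ', text)

# An unescaped dot is still a metacharacter, raw string or not:
# it matches every character in the text.
everything = re.sub(r'.', ' ', text)

print(repr(dots_only))
print(repr(everything))

# In a replacement string, r'\2 \1' refers to the capture groups.
swapped = re.sub(r'(\w+) (\w+)', r'\2 \1', 'hello world')
print(swapped)
```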

How to write \n without making a newline [duplicate]

I'm trying to write this exact string, but I don't want \n to make a new line; I want to actually print \n on the screen. Any thoughts on how to go about this? (Using Python.)
Languages:\npython\nc\njava
Adding a backslash makes Python interpret the following backslash literally: print("\\n").
Either escape the backslash by preceding it with another backslash:
'Languages:\\npython\\nc\\njava'
Or use a raw string by preceding the literal with an r:
r'Languages:\npython\nc\njava'
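A short check, assuming nothing beyond the strings shown above, that the two spellings produce the same string and that neither contains a real newline:

```python
escaped = 'Languages:\\npython\\nc\\njava'
raw = r'Languages:\npython\nc\njava'

# Both literals describe the same characters.
print(escaped == raw)

# Printed output shows backslash + n, all on one line.
print(raw)

# For contrast, the unescaped literal contains real newlines.
plain = 'Languages:\npython\nc\njava'
print(plain.splitlines())
```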

How to create a raw string when its last character is a `\` in Python [duplicate]

As we all know, we can create a string like this:
str1 = r"\abc\test"
But if I want to put a \ at the end of the string, like:
str2 = r"\abc\test\"
a syntax error occurs!
I have found an answer, but it's for JavaScript:
`String.raw` when last character is `\`
So, how do I deal with this in Python?
You can concatenate another normal string:
>>> r'\abc\test' + '\\'
'\\abc\\test\\'
This is a duplicate of "Why can't Python's raw string literals end with a single backslash?". From the lexical analysis reference (https://docs.python.org/3/reference/lexical_analysis.html):
Specifically, a raw literal cannot end in a single backslash (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the literal, not as a line continuation.
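A minimal sketch of the concatenation workaround, together with the equivalent form that relies on Python joining adjacent string literals. The variable names are illustrative only.

```python
base = r'\abc\test'

# Explicit concatenation with an ordinary (non-raw) literal.
with_plus = base + '\\'

# Adjacent literals are joined by the parser, so no + is needed.
adjacent = r'\abc\test' '\\'

print(with_plus)              # prints the path with one trailing backslash
print(with_plus == adjacent)
print(len(with_plus))
```

Doubling the backslash inside the raw literal itself, as in r'\abc\test\\', is valid syntax but leaves two trailing backslashes in the string, which is usually not what is wanted.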

Using findall method in a tokenized text, and prefix 'r' [duplicate]

I understand that the 'r' prefix indicates a raw string. So why is the 'r' prefix used in the following example, given that the string contains special regex characters that should not be taken literally?
The 'string' being searched is an nltk Text object, and I suppose that has something to do with it. However, I don't understand how it affects the usage of findall.
moby.findall(r"<a> (<.*>) <man>")
In this particular case, r makes no difference, as this string does not contain any sequences which could be misinterpreted. However, it is a good habit to use r when writing regular expressions, to avoid misinterpretation of sequences like \n or \t; with r, they are treated literally, as two characters - backslash followed by a letter; without r, they evaluate to newline and tab, respectively.
The r preceding the string is a string prefix; it marks the literal as a raw string.
For example, '\n' will be treated as a newline character, while r'\n' will be treated as the characters \ followed by n.
But for your regex:
moby.findall(r"<a> (<.*>) <man>")
it doesn't make a difference, but it is always a good idea to write regexes as raw strings so that you don't have to escape backslashes.
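To make the "good habit" concrete, here is a small sketch using the standard re module rather than nltk (an assumption made to keep the example self-contained). The sample sentence is hypothetical. The pattern from the question is identical with or without the prefix, whereas a pattern containing \b is not.

```python
import re

# No backslashes in the pattern, so the prefix changes nothing.
print("<a> (<.*>) <man>" == r"<a> (<.*>) <man>")

sentence = "a cat sat"  # hypothetical sample text

# Without r, '\b' is a backspace character, which the text does not contain.
without_raw = re.findall('\bcat\b', sentence)

# With r, the regex engine receives backslash + b and treats it as a word boundary.
with_raw = re.findall(r'\bcat\b', sentence)

print(without_raw)
print(with_raw)
```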

Replace as raw string in Python [duplicate]

I am replacing string content as:
re.sub(all, val, parsedData['outData'])
where all contains some parentheses and might contain other special characters.
>>> print all
PICDSPVERS="DspFw:1.0008(1.0008),Fpga1:2.0925(2.0925),Fpga2:1.0404(1.0404),Mcu:1.0000(1.0000)"
Because of the parentheses, the match fails. The pattern comes from some interface, so I don't want to put \\ in the data.
I also tried the 'r' prefix and the re.U option, but the match still fails.
re.search('PICDSPVERS="DspFw:1.0008(1.0008)', parsedData['outData'])
How can we direct Python to treat a matching pattern as a string?
I am using Python 2.x.
If you don't want the matching pattern to be treated as a regular expression, then don't use re.sub. For plain strings, use str.replace(), like so:
new_outData = parsedData['outData'].replace(all, val)
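The sketch below shows why the unescaped search fails and two ways around it: str.replace(), as suggested above, and re.escape(), which is the standard-library helper for when the literal text has to be embedded in a larger regular expression. It is written for Python 3 (the question uses Python 2.x); out_data, pattern_text, and the replacement value are hypothetical stand-ins for parsedData['outData'], all, and val.

```python
import re

# Hypothetical stand-ins for the question's variables.
pattern_text = 'PICDSPVERS="DspFw:1.0008(1.0008)'
out_data = 'header PICDSPVERS="DspFw:1.0008(1.0008),Fpga1:2.0925(2.0925)" trailer'
val = 'REPLACED'

# Unescaped, "(1.0008)" is a capture group, so the literal parentheses
# in the data are never matched.
print(re.search(pattern_text, out_data))

# Plain substring replacement: no regex involved at all.
replaced_plain = out_data.replace(pattern_text, val)

# re.escape() backslash-escapes the metacharacters so the text matches literally.
replaced_regex = re.sub(re.escape(pattern_text), val, out_data)

print(replaced_plain)
print(replaced_plain == replaced_regex)
```

Note that re.sub also interprets backslashes in the replacement string, so str.replace() remains the simpler choice when neither side needs regex features.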
