Replace as raw string in Python [duplicate] - python

This question already has answers here:
Escaping regex string
(4 answers)
Closed 6 years ago.
I am replacing string content as:
re.sub(all, val, parsedData['outData'])
where all contains some parentheses and might contain other special characters.
>>> print all
PICDSPVERS="DspFw:1.0008(1.0008),Fpga1:2.0925(2.0925),Fpga2:1.0404(1.0404),Mcu:1.0000(1.0000)"
Because of this, matching fails. The pattern comes from an external interface, so I don't want to insert \\ escapes into the data.
I also tried the 'r' prefix and the re.U option, but the match still fails:
re.search('PICDSPVERS="DspFw:1.0008(1.0008)', parsedData['outData'])
How can we direct Python to treat a matching pattern as a string?
I am using Python 2.x.

If you don't want the matching pattern to be treated as a regular expression, then don't use re.sub. For plain strings, use str.replace(), like so:
new_outData = parsedData['outData'].replace(all, val)
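As a rough sketch of the two options, the snippet below contrasts str.replace() with re.escape(), which is what the linked "Escaping regex string" question is about. The names pattern_text, out_data and val, and the sample values, are hypothetical stand-ins for the question's all, parsedData['outData'] and val.

```python
import re

# Hypothetical sample data standing in for the question's variables.
pattern_text = 'PICDSPVERS="DspFw:1.0008(1.0008),Fpga1:2.0925(2.0925)"'
out_data = 'header ' + pattern_text + ' trailer'
val = 'PICDSPVERS="redacted"'

# Used as a regex, the parentheses form groups instead of matching
# literal "(" and ")", so the search finds nothing.
unescaped_match = re.search(pattern_text, out_data)

# Option 1: plain substring replacement, no regex involved.
replaced_plain = out_data.replace(pattern_text, val)

# Option 2: keep re.sub but escape the metacharacters first.
# A function is used as the replacement so that backslashes in val,
# if any, are not interpreted as replacement escapes.
replaced_regex = re.sub(re.escape(pattern_text), lambda m: val, out_data)
```

Option 1 is usually enough; option 2 is mainly useful when the literal text has to be combined with real regex syntax.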

Related

Raw notation doesn't give the desired outcome [duplicate]

This question already has answers here:
Escaping regex string
(4 answers)
Closed 2 years ago.
I'm trying to use Python's raw notation to find a pattern that includes special characters with no success.
When using the 'r' notation to ignore the special characters nothing is found - see the example below:
Problematic Code
import re
pattern = re.compile(r"testing+101#gmail.com")
sentence = '___dsdtesting+101#gmail.comaaa___'
result = re.search(pattern, sentence).group()
print(result)
The above code does not find the pattern and raises:
AttributeError: 'NoneType' object has no attribute 'group'
Working Code
When escaping the '+' with '\' it works as expected:
import re
pattern = re.compile("testing\+101#gmail.com")
sentence = '___dsdtesting+101#gmail.comaaa___'
result = re.search(pattern, sentence).group()
print(result)
The above code will return the desired outcome of "testing+101#gmail.com".
Am I using the raw notation wrong? What's going on?
TO CLARIFY: I am not interested in escaping with the '\', rather I want to use the raw notation.
There are two levels of special characters here: those that are special to Python’s string syntax, and those that are special in regular expressions. Raw strings take care of the first group, but not the second.
The plus sign is special in regexes, so to match the string a+ you need the regex a\+. Because the backslash is special to Python strings, if you do not use raw strings you need to type this as 'a\\+'. Using raw strings lets you type r'a\+'.
(Because the sequence \+ does not mean anything special to Python, and Python leaves such sequences unchanged, 'a\+' produces the same string. However, recent Python versions flag such unrecognized escape sequences with a warning, a DeprecationWarning since 3.6 and a SyntaxWarning since 3.12, so the raw form is preferable.)
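The two levels can be seen side by side in a short sketch that reuses the question's sentence. The variable names are illustrative only, and the re.escape() call is an addition suggested by the linked duplicate rather than part of the answer above.

```python
import re

sentence = '___dsdtesting+101#gmail.comaaa___'

# Raw string, but "+" is still a regex quantifier: "g+" means
# "one or more g", so the literal "+" in the sentence is never matched.
unescaped = re.search(r"testing+101#gmail.com", sentence)

# The raw string lets the regex escapes be typed with single backslashes.
escaped = re.search(r"testing\+101#gmail\.com", sentence)

# re.escape() adds the regex-level escapes automatically.
auto = re.search(re.escape("testing+101#gmail.com"), sentence)
```

Here unescaped is None, while escaped and auto both match the literal address.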

Formatting regex strings causes regex pattern characters to be escaped [duplicate]

This question already has answers here:
Why do backslashes appear twice?
(2 answers)
How to format raw string with different expressions inside?
(2 answers)
Closed 3 years ago.
I want to put some variables into a regex, but also maintain a regex pattern.
regex = 'set groups {group} routing-instances (?P<routing_instances>[\w\W]+) interface {logical_interface}'.format(
group=group,
logical_interface=logical_interface
)
However, it escapes the escape characters:
ipdb> regex
'set groups GROUP1 routing-instances (?P<routing_instances>[\\w\\W]+) interface a10.555'
Use raw strings:
regex = r'your \regex \here'
That said, it doesn't really matter here: your string doesn't actually contain doubled backslashes. Only its textual representation (the repr that ipdb displays) shows them.
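A small sketch of this, assuming the values GROUP1 and a10.555 shown in the ipdb output. Passing the inserted values through re.escape() is an extra precaution not mentioned in the answer; it keeps the "." in the interface name from acting as a wildcard. The sample line is made up for illustration.

```python
import re

# Values taken from the ipdb output in the question.
group = 'GROUP1'
logical_interface = 'a10.555'

# Raw string for the regex part; re.escape() for the inserted literals.
regex = r'set groups {group} routing-instances (?P<routing_instances>[\w\W]+) interface {logical_interface}'.format(
    group=re.escape(group),
    logical_interface=re.escape(logical_interface),
)

# repr() doubles every backslash for display; the string itself
# still holds single backslashes.
shown = repr(regex)

# Hypothetical input line, for illustration only.
line = 'set groups GROUP1 routing-instances RI-1 interface a10.555'
match = re.search(regex, line)
instances = match.group('routing_instances')
```

Printing regex with print() instead of inspecting its repr shows the single backslashes directly.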

Using findall method in a tokenized text, and prefix 'r' [duplicate]

This question already has answers here:
What does the "r" in pythons re.compile(r' pattern flags') mean?
(3 answers)
Closed 5 years ago.
I understand that the 'r' prefix indicates a raw string. Why, then, is the 'r' prefix used in the following example, given that the string contains special regex characters which should not be taken literally?
The 'string' being searched is an nltk Text object; I suppose that has something to do with it? However, I don't understand how it affects the usage of findall.
moby.findall(r"<a> (<.*>) <man>")
In this particular case, r makes no difference, as this string does not contain any sequences which could be misinterpreted. However, it is a good habit to use r when writing regular expressions, to avoid misinterpretation of sequences like \n or \t; with r, they are treated literally, as two characters - backslash followed by a letter; without r, they evaluate to newline and tab, respectively.
The r preceding the string is a string prefix; it marks the literal as a raw string.
For example, '\n' will be treated as a newline character, while r'\n' will be treated as the characters \ followed by n.
But for your regex:
moby.findall(r"<a> (<.*>) <man>")
it doesn't make a difference, but it is always a good idea to write regexes as raw strings so that you don't have to escape backslashes.
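A case where the prefix does change the result is \b, which Python reads as a backspace character but the regex engine reads as a word boundary. The sketch below uses the standard re module and a made-up sentence rather than the nltk Text object from the question.

```python
import re

# '\n' is one character (newline); r'\n' is two (backslash, n).
newline_length = len('\n')
raw_length = len(r'\n')

text = 'a man overboard'

# Raw string: the regex engine receives \b and treats it as a word boundary.
with_raw = re.findall(r'\bman\b', text)

# Non-raw string: Python turns \b into a backspace character first,
# so the pattern looks for backspace + "man" + backspace.
without_raw = re.findall('\bman\b', text)
```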

match everything after a slash and without the slash [duplicate]

This question already has answers here:
Split a string by backslash in python
(6 answers)
Closed 6 years ago.
I am working with regular expressions using the re module in Python. I am supposed to match everything before a slash and put the match in a variable, then match everything after the slash and put it in another variable.
For example:
for the string
"NlaIII/Csp6I"
I would like to match NlaIII and store it in a variable and match Csp6I and store it in another variable
variable_1 = "NlaIII"
variable_2 = "Csp6I"
Using python module re, I have been able to match everything before the slash with the following regular expression:
first_enzyme = re.compile('.+?(?=\W+)')
But I am completely unable to match everything after the slash without including the slash itself.
Thank you very much for your help!
You don't need a regex for that at all.
s = "NlaIII/Csp6I"
variable_1, variable_2 = s.split('/')
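A few variations on the same idea, as a sketch: limiting the split to the first slash, using str.partition(), and, if a regex is really required, capturing both halves with groups. The names other than s are illustrative.

```python
import re

s = "NlaIII/Csp6I"

# Split only at the first slash, in case the second part contains more.
first, second = s.split('/', 1)

# partition() always returns three parts: before, separator, after.
before, sep, after = s.partition('/')

# Regex alternative: one group before the slash, one group after it.
m = re.match(r'([^/]+)/(.+)', s)
regex_first, regex_second = m.group(1), m.group(2)
```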

regex syntax in python, the r before the opening quote [duplicate]

This question already has answers here:
Python - Raw String Literals [duplicate]
(2 answers)
Closed 7 years ago.
This is a line of regex from a Python thing I'm writing:
m = re.match(r"{(.+)}", self.label)
As far as I can tell, it's working fine.
Anyway, my question is about the r character before the first double quote. I've never really questioned it. But why is it there? What is its purpose?
The r before a string literal tells Python not to do any \ escaping on the string. For instance:
>>> print('a\nb')
a
b
>>> print(r'a\nb')
a\nb
>>>
The reason r-prefixed strings are often used with regular expressions is that regular expressions often use a lot of \'s. To take a simple example, compare the regular expression '\\d+' with r'\d+'. They're actually the same string, just written in different ways. With the r syntax, you don't have to escape the \'s that are used in the regular expression syntax. Now imagine having a lot of \'s in your regular expression; it's much cleaner to use the r syntax.
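The '\\d+' versus r'\d+' comparison can be checked directly. This is a minimal sketch; the sample text passed to findall is made up.

```python
import re

escaped_form = '\\d+'
raw_form = r'\d+'

# Both literals produce the same three-character string: \ d +
same_string = escaped_form == raw_form
length = len(raw_form)

# Either spelling therefore compiles to the same regex.
numbers = re.findall(raw_form, 'Fpga1:2.0925')
```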
"String literals may optionally be prefixed with a letter 'r' or 'R'; such strings are called raw strings and use different rules for interpreting backslash escape sequences."
https://docs.python.org/2/reference/lexical_analysis.html#string-literals
