Invalid regular expression: invalid escape \ sequence, Postgres, Django - python

I'm trying to escape the string яблуко* for a Postgres regex query:
name = re.escape('яблуко*')
Model.objects.filter(name__iregex='^%s' % name)
This gives me:
Invalid regular expression: invalid escape \ sequence
What am I doing wrong?
P.S. I know that I can do it with istartswith, just wondering why regex is not working.

The problem here is that re.escape escapes far too much for PostgreSQL: it puts a backslash in front of every non-ASCII character, while PostgreSQL rejects a backslash in front of a letter it doesn't know as an escape. In this case that is every Cyrillic character:
>>> print re.escape('яблуко*')
\я\б\л\у\к\о\*
In the end it's not really possible to mix the Python regexp engine (for escaping) with the database regexp engine (for evaluation). Unfortunately, Django doesn't provide a way to do this escaping. In Weblate, I solved this by writing a custom function to escape the regexp; see https://github.com/WeblateOrg/weblate/commit/7425a749b44abafe36d8f1c9db018f57684e5983
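A minimal sketch of such a custom escaping function, assuming it is enough to backslash-escape only the usual regex metacharacters and pass every other character (including non-ASCII letters) through untouched. The name escape_postgres_regex is hypothetical, and this is not the Weblate implementation. Note also that on Python 3.7 and later, re.escape itself only escapes characters that can be special in a regex, so it no longer touches the Cyrillic letters.

```python
import re

# Regex metacharacters to escape; everything else is passed through unchanged.
_META = re.compile(r'([\\.^$*+?()\[\]{}|])')


def escape_postgres_regex(text):
    """Backslash-escape regex metacharacters only (hypothetical helper)."""
    return _META.sub(r'\\\1', text)


escaped = escape_postgres_regex('яблуко*')
print(escaped)

# Intended use with the query from the question (needs a Django project):
# Model.objects.filter(name__iregex='^' + escaped)
```

Only the trailing * gets a backslash here, so the pattern sent to the database contains no escapes in front of letters.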

Related

Python lint issue : invalid escape sequence '\/'

This is the line of my Python code that gives me the lint issue invalid escape sequence '\/'.
pattern = 'gs:\/\/([a-z0-9-]+)\/(.+)$' # for regex matching
It reports that error for every backslash I used here.
Any idea how to resolve this?
There are two issues here:
Since this is not a raw string, the backslashes are string escapes, not regexp escapes. Since \/ is not a valid string escape sequence, you get that warning. Use a raw string so that the backslashes will be ignored by the string parser and passed to the regexp engine. See What exactly is a "raw string regex" and how can you use it?
In some languages / is part of the regular expression syntax (it's the delimiter around the regexp), so it needs to be escaped there. But Python doesn't use / this way, so there's no need to escape it in the first place.
Use this:
pattern = r'gs://([a-z0-9-]+)/(.+)$' # for regex matching
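As a quick check, here is a small sketch that applies the raw-string pattern above to a sample path. The bucket and object names are made up for illustration.

```python
import re

pattern = r'gs://([a-z0-9-]+)/(.+)$'  # raw string, no escaped slashes needed

# Hypothetical sample path, used only to exercise the pattern.
match = re.match(pattern, 'gs://my-bucket/path/to/file.txt')
bucket, blob = match.groups()
print(bucket, blob)
```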

How to fix “<string> DeprecationWarning: invalid escape sequence” in Python?

I am using pytest-3/python3
import re

def check_email(email):
    regex = '^\w+([\.-]?\w+)*#\w+([\.-]?\w+)*(\.\w{2,3})+$'
    if re.search(regex, email):
        return True
    else:
        return False
The regex = ... line is what gave the error.
Use a raw string:
regex = r'^\w+([\.-]?\w+)*#\w+([\.-]?\w+)*(\.\w{2,3})+$'
Note the r prefix. It ensures that each \ is kept as a plain backslash instead of being interpreted as the possible start of an escape sequence.
As \ is meaningful in regular expressions, it is a good habit in Python to always use raw strings for regular expressions. (See the re documentation.)
A backslash inside an ordinary string literal starts an escape sequence, just as '\n' is a newline character, so writing something like "\~" is not a reliable way to get a literal \~. You can write "\\w", for example, to get a literal \w.
Or you can prepend an r to the whole string, as in r'your \string'.
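The two spellings suggested above produce the same string. The short sketch below compares them; the pattern fragment is just an illustration.

```python
# The same pattern fragment written as a raw string and with doubled backslashes.
raw = r'\w+\.\w{2,3}'
escaped = '\\w+\\.\\w{2,3}'
same = raw == escaped

# A recognized escape shows the difference: '\n' is one character (newline),
# while r'\n' is two characters (a backslash followed by n).
newline_len = len('\n')
raw_len = len(r'\n')
print(same, newline_len, raw_len)
```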

Python error because of regex inside a Google Big Query

I am writing Google Big Query wrappers in Python. One of the queries has a regex, and the Python code is treating it as a syntax error.
Here is the regex
WHEN tier2 CONTAINS '-' THEN REGEXP_EXTRACT(tier2,'(.*)\s-')
the error is Invalid string literal: '(.*)\s-'>
The error is caused by the \ in the regex.
Any suggestions for overcoming it?
You need to escape the backslash by preceding it with another backslash.
The backslash \ is an escape character, so it must itself be escaped to be treated as a normal character.
Try:
'(.*)\\s-'
Based on your comments, it looks like the above is exactly what you are already using in BigQuery, so in this case you need to escape each of the two backslashes:
'(.*)\\\\s-'
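To see why four backslashes are needed in the Python source, the sketch below builds the query fragment both ways and prints what would actually be sent. It assumes, as the answer does, that the query text reaching BigQuery must contain two backslashes.

```python
# Ordinary Python string: four backslashes in source become two in the string.
plain = "REGEXP_EXTRACT(tier2, '(.*)\\\\s-')"

# Raw Python string: backslashes are kept exactly as typed.
raw = r"REGEXP_EXTRACT(tier2, '(.*)\\s-')"

identical = plain == raw
print(raw)  # shows the text that is sent to BigQuery
```

Writing the SQL as a raw string keeps the Python source identical to the query text, which makes the second layer of escaping easier to follow.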

Python - How to catch specific mysql warnings?

I want to catch any warning that contains the string 'value'.
From this question, I see this example to catch a specific message:
warnings.filterwarnings('error', 'Unknown table .*')
The docs say about the message parameter:
message is a string containing a regular expression that the warning message must match (the match is compiled to always be case-insensitive).
I have the following code but no errors are thrown and instead I'm just getting the warnings which I cannot catch.
warnings.filterwarnings('error', message='\bvalue\b')
What am I missing? As far as I know, that regex should work for matching the 'value' string.
Python's regular expression syntax is documented here, and the first thing it says is:
Regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This collides with Python’s usage of the same character for the same purpose in string literals; for example, to match a literal backslash, one might have to write '\\\\' as the pattern string, because the regular expression must be \\, and each backslash must be expressed as \\ inside a regular Python string literal.
The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Usually patterns will be expressed in Python code using this raw string notation.
In Python, as in many languages, the string '\b' corresponds to ASCII backspace.
You need to escape your backslash characters, or else use Python's special "raw/regex" prefix:
warnings.filterwarnings('error', message='\\bvalue\\b')
warnings.filterwarnings('error', message=r'\bvalue\b')
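One more detail may matter for the goal of catching any warning that contains 'value': the warnings module matches the message pattern against the start of the warning text, so a leading .* is needed when the word appears later in the message. The sketch below compares three patterns; the helper raises_for and the sample warning text are made up for illustration.

```python
import warnings


def raises_for(pattern, text):
    """Return True if `pattern` turns a warning with `text` into an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter('ignore')  # fallback for non-matching warnings
        warnings.filterwarnings('error', message=pattern)
        try:
            warnings.warn(text)
        except UserWarning:
            return True
    return False


text = "Incorrect integer value: 'x' for column 'id'"  # hypothetical message

not_raw = raises_for('\bvalue\b', text)      # backspace characters, no match
raw_only = raises_for(r'\bvalue\b', text)    # anchored at the start, no match
with_prefix = raises_for(r'.*\bvalue\b', text)
print(not_raw, raw_only, with_prefix)
```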

pep8 warning on regex string in Python, Eclipse

Why is pep8 complaining about the following string in the code?
import re
re.compile("\d{3}")
The warning I receive:
ID:W1401 Anomalous backslash in string: '\d'. String constant might be missing an r prefix.
Can you explain what the message means? What do I need to change in the code so that warning W1401 goes away?
The code passes the tests and runs as expected. Moreover, \d{3} is a valid regex.
"\d" is the same as "\\d" because there's no escape sequence for d. But that is not obvious to a reader of the code.
Consider \t, though: "\t" represents a tab character, while r"\t" represents a literal \ followed by the character t.
So use raw string when you mean literal \ and d:
re.compile(r"\d{3}")
or escape backslash explicitly:
re.compile("\\d{3}")
Python cannot parse '\d' as a string escape sequence, which is why the warning is produced.
The backslash is then passed through to the regex parser literally, where \d works fine as a regex escape sequence.
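The interpreter-level version of this warning can be observed directly. The sketch below compiles a one-line source text containing "\d{3}" and records what Python reports. The exact warning category depends on the Python version (DeprecationWarning in older 3.x releases, SyntaxWarning in newer ones), so none is assumed here.

```python
import re
import warnings

# Source text of a one-line module: pattern = "\d{3}"
source = 'pattern = "\\d{3}"'

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    compile(source, '<example>', 'exec')

# Names of the warning categories reported while compiling the source.
categories = [w.category.__name__ for w in caught]
print(categories)

# The raw-string spelling behaves the same as a regex.
raw_ok = re.fullmatch(r'\d{3}', '123') is not None
```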
