Python conditional using regex with multiline string [duplicate] - python

This question already has answers here:
What is the difference between re.search and re.match?
(9 answers)
Regular expression works on regex101.com, but not on prod
(1 answer)
Closed 3 years ago.
I need help with a very simple question: a conditional using a regex with a multiline string. It makes no sense to me why this does not work:
import re
if re.match(r"\w", " \n\n\n aaaaaaaaaaaa\n\n", re.MULTILINE):
    print('ok')
else:
    print('fail')
fail
I expected the result to be ok, but nothing matches. I tried it at https://regex101.com/r/BsdymE/1, where it works, but it does not work in my code.

re.match only returns a match if the pattern matches at the beginning of the string.
https://docs.python.org/3/library/re.html#re.match
re.match(pattern, string, flags=0)
If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding match object. Return None if the string does not match the pattern; note that this is different from a zero-length match.
Try using re.search(pattern, string, flags=0) instead

From pydoc re.match:
Try to apply the pattern at the start of the string, returning a
Match object, or None if no match was found.
(emphasis mine). So, the problem is not with the string being multiline, but rather with it not beginning with a word-class character. If you want to check if a string contains something anywhere, use re.search instead.
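To make the difference concrete, here is a minimal sketch using the string from the question. The variable names are only illustrative.

```python
import re

text = " \n\n\n aaaaaaaaaaaa\n\n"

# re.match only tries position 0, which holds a space, so \w cannot match there.
# re.MULTILINE does not change this; it only affects ^ and $.
at_start = re.match(r"\w", text, re.MULTILINE)

# re.search scans forward until the pattern matches somewhere.
anywhere = re.search(r"\w", text)

print('ok' if at_start else 'fail')
print('ok' if anywhere else 'fail')
```

The first call gives no match object and the second one does, which is the behaviour regex101 shows.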

Related

Returning more than one result with pipe character in Python Regex [duplicate]

This question already has an answer here:
How can I find all matches to a regular expression in Python?
(1 answer)
Closed 2 years ago.
When using the pipe character |, how do I get all the results defined in my regex to return?
Or does the .search() method only return the first result found?
Here is my code:
import re
batRegex = re.compile(r'Bat(man|mobile|copter|bat)')
matchObject = batRegex.search('Batmobile lost a wheel, Batcopter is not a chopper, his name is Batman, not Batbat')
print(matchObject.group())
Only the first result, 'Batmobile', is returned. Is it possible to return all results?
Thanks!
Please refer to official documentation for "re" module
Excerpts from the docs linked:
findall() matches all occurrences of a pattern, not just the first one as search() does.
re.search(pattern, string, flags=0)
Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding match object.
re.findall(pattern, string, flags=0) Return all non-overlapping matches of pattern in string, as a list of strings (Note here: Not a match object which is only one match)
import re
batRegex = re.compile(r'Bat(man|mobile|copter|bat)')
results = batRegex.findall('Batmobile lost a wheel, Batcopter is not a chopper, his name is Batman, not Batbat')
results
output:
['mobile', 'copter', 'man', 'bat']
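Note that the output above contains only the text captured by the group, not the whole word. If the full matches are wanted, one option is a non-capturing group, and another is re.finditer, which yields match objects. A small sketch, with variable names chosen here for illustration:

```python
import re

text = ('Batmobile lost a wheel, Batcopter is not a chopper, '
        'his name is Batman, not Batbat')

# With one capturing group, findall returns the group contents only.
groups_only = re.findall(r'Bat(man|mobile|copter|bat)', text)

# (?:...) is a non-capturing group, so findall returns the whole match.
full_matches = re.findall(r'Bat(?:man|mobile|copter|bat)', text)

# finditer yields one match object per occurrence, like repeated search() calls.
via_finditer = [m.group() for m in re.finditer(r'Bat(man|mobile|copter|bat)', text)]

print(groups_only)
print(full_matches)
print(via_finditer)
```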

Compiling user defined regex object and searching/matching in Python [duplicate]

This question already has answers here:
How to create raw string from string variable in python?
(3 answers)
Closed 4 years ago.
I am writing an online regex checker, which takes input from the user in the form of a pattern, and flags, and uses that to compile a regex object. The regex object is then used to check if the test string matches within the format provided by the regex pattern or not. As of this moment, the compile function looks like this:
class RegexObject:
    ...
    def compile(self):
        # Start from 0 (no flags) so that |= works whichever options are set.
        flags = 0
        if self.multiline_flag:
            flags |= re.M
        if self.dotall_flag:
            flags |= re.S
        if self.verbose_flag:
            flags |= re.X
        if self.ignorecase_flag:
            flags |= re.I
        if self.unicode_flag:
            flags |= re.U
        regex = re.compile(self.pattern, flags)
        return regex
Please note, the self.pattern and all the flags are class attributes defined by the user using a simple form. However, one thing I noticed in the docs is that there is usually an r before the pattern in the compile functions, like this:
re.compile(r'(?<=abc)def')
How do I place that r in my code before my variable name? Also, if I want to tell the user if the test input is valid or not, should I be using the match method, or the search method?
Thanks for any help.
Edit: This question is not a duplicate of this one, because that question has nothing to do with regular expressions.
Don't worry about the r, you don't need it here.
The r stands for "raw", not "regex". In an r string, you can put backslashes without escaping them. R strings are often used in regexes because there are often many backslashes in regexes. Escaping the backslashes can be annoying. See this shell output:
>>> s = r"\a"
>>> s2 = "\a"
>>> s2
'\x07'
>>> s
'\\a'
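In other words, the r prefix only matters for literals written in source code. A string that comes from a form is already an ordinary str whose backslashes are real characters. A short sketch, where user_pattern is a stand-in for the form input:

```python
import re

# The r prefix only changes how a literal in source code is parsed.
# Both literals below produce the same three characters: \ d +
assert r"\d+" == "\\d+"

# Text typed into a form arrives as an ordinary str whose backslashes are
# already literal characters, so it can go straight into re.compile.
user_pattern = "\\d+"   # stand-in for form input
regex = re.compile(user_pattern)
found = regex.search("order 66 shipped")
print(found.group())
```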
And you should use search, as match only looks at the start of the string. Look at the docs.
re.search(pattern, string, flags=0)
Scan through string looking for
the first location where the regular expression pattern produces a
match, and return a corresponding match object. Return None if no
position in the string matches the pattern; note that this is
different from finding a zero-length match at some point in the
string.
re.match(pattern, string, flags=0)
If zero or more characters at the
beginning of string match the regular expression pattern, return a
corresponding match object. Return None if the string does not match
the pattern; note that this is different from a zero-length match.
Note that even in MULTILINE mode, re.match() will only match at the
beginning of the string and not at the beginning of each line.
If you want to locate a match anywhere in string, use search() instead
(see also search() vs. match()).
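The note about MULTILINE mode in the excerpt above can be checked with a short sketch. The sample text is made up for illustration.

```python
import re

text = "first line\nsecond line"

# Even with re.M, match() is only tried at the very start of the string.
m1 = re.match(r"second", text, re.M)

# search() with ^ and re.M can match at the start of any line.
m2 = re.search(r"^second", text, re.M)

print(m1)
print(m2.start())
```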
You need not use r. Instead, you should use re.escape. Whether to use match or search should again be user input.

Python regex should match a dot but it does not [duplicate]

This question already has answers here:
What is the difference between re.search and re.match?
(9 answers)
Closed 6 years ago.
I have the following Python 3 code:
import re
pattern=re.compile(r'\.')
print(pattern.match('abc.de'))
The output is:
None
What am I doing wrong? Why does the regex not match the dot?
As per the documentation of match it checks from the beginning of a string.
If zero or more characters at the beginning of string match this regular expression, return a corresponding MatchObject instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.
Use search instead to find a match at any position.
>>> import re
>>> pattern=re.compile(r'\.')
>>> print(pattern.search('abc.de'))
<_sre.SRE_Match object at 0x7fc7b5823648>
>>> print(pattern.search('abc.de').group())
.
match looks for matches at the beginning of the string, unless you tell it to do otherwise. The dot isn't at the beginning of the string, so it can't be found.
See the documentation here: https://docs.python.org/3/library/re.html#re.match
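One way to read "unless you tell it to do otherwise" is the optional pos argument that the match method of a compiled pattern accepts. That reading is an assumption, not something the answer spells out. A minimal sketch:

```python
import re

pattern = re.compile(r'\.')

# By default match() is tried at index 0, where 'a' is not a dot.
default_start = pattern.match('abc.de')

# With pos=3 the attempt is anchored at index 3, where the dot is.
from_pos = pattern.match('abc.de', 3)

print(default_start)
print(from_pos.span())
```

In most cases search is still the simpler choice, since it does not require knowing the position in advance.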

How to match an underscore using Python's regex? [duplicate]

This question already has an answer here:
Python regular expression re.match, why this code does not work? [duplicate]
(1 answer)
Closed 6 years ago.
I'm having trouble matching the underscore character in Python using regular expressions. Just playing around in the shell, I get:
>>> import re
>>> re.match(r'a', 'abc')
<_sre.SRE_Match object at 0xb746a368>
>>> re.match(r'_', 'ab_c')
>>> re.match(r'[_]', 'ab_c')
>>> re.match(r'\_', 'ab_c')
I would have expected at least one of these to return a match object. Am I doing something wrong?
Use re.search instead of re.match if the pattern you are looking for is not at the start of the search string.
re.match(pattern, string, flags=0)
Try to apply the pattern at the start of the string, returning a match
object, or None if no match was found.
re.search(pattern, string, flags=0)
Scan through string looking for a match to the pattern, returning a
match object, or None if no match was found.
You don't need to escape _ or even use raw string.
>>> re.search('_', 'ab_c')
Out[4]: <_sre.SRE_Match object; span=(2, 3), match='_'>
Try the following:
re.search(r'\_', 'ab_c')
You were indeed right to escape the underscore character!
Mind that you can only use match for the beginning of strings, as is also clear from the documentation (https://docs.python.org/2/library/re.html):
If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.
You should use search in this case:
Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.
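As a quick check, the three spellings from the question can be tried with re.search. This is only a sketch; the loop and variable names are illustrative.

```python
import re

# The three patterns from the question: plain, character class, and escaped.
patterns = [r'_', r'[_]', r'\_']

# search() scans the whole string, so each spelling finds the underscore.
spans = [re.search(p, 'ab_c').span() for p in patterns]
print(spans)

# match() is only tried at index 0, where 'a' is not an underscore.
print(re.match(r'_', 'ab_c'))
```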

Why does this Regex only match at the start of the line in Python? [duplicate]

This question already has answers here:
What is the difference between re.search and re.match?
(9 answers)
Closed 7 years ago.
In Python I can do
import re
re.match("m", "mark")
and I get the expected result:
<_sre.SRE_Match object; span=(0, 1), match='m'>
But it only works if the pattern is at the start of the string:
re.match("m", "amark")
gives None. There is nothing about that pattern which requires it to be at the start of the string - no ^ or similar. Indeed it works as expected on regex101.
Does Python have some special behaviour - and how do I disable it please?
From the docs on re.match:
If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding match object.
Use re.search to search the entire string.
The docs even grant this issue its own chapter, outlining the differences between the two: search() vs. match()
import re
re.match("[^m]*m", "mark")
match matches from the beginning of the string. So you need to give it a way to consume the start of the string if m is not at the start.
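For comparison, here is a small sketch of both approaches on the string "amark" from the question. With the [^m]* prefix the match object also includes the characters before the m, which may or may not be what is wanted.

```python
import re

# match() with a prefix that consumes everything before the first m.
with_prefix = re.match("[^m]*m", "amark")

# search() finds the m directly, without changing the pattern.
with_search = re.search("m", "amark")

print(with_prefix.group())
print(with_search.group(), with_search.span())
```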
