Python regex should match a dot but it does not [duplicate] - python

This question already has answers here:
What is the difference between re.search and re.match?
(9 answers)
Closed 6 years ago.
I have the following Python 3 code:
import re
pattern=re.compile(r'\.')
print(pattern.match('abc.de'))
The output is:
None
What am I doing wrong? Why does the regex not match the dot?

According to the documentation, match only checks at the beginning of the string:
If zero or more characters at the beginning of string match this regular expression, return a corresponding MatchObject instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.
Use search instead to find a match at any position.
>>> import re
>>> pattern=re.compile(r'\.')
>>> print(pattern.search('abc.de'))
<_sre.SRE_Match object at 0x7fc7b5823648>
>>> print(pattern.search('abc.de').group())
.

match looks for matches at the beginning of the string, unless you tell it to do otherwise. The dot isn't at the beginning of the string, so it can't be found.
See the documentation here: https://docs.python.org/3/library/re.html#re.match
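To make the difference concrete, here is a minimal sketch that runs the question's pattern through both methods. The variable names are only for illustration; the optional start position passed to the compiled pattern's match() is what "unless you tell it to do otherwise" refers to.

```python
import re

pattern = re.compile(r'\.')
text = 'abc.de'

# match() only tries the pattern at position 0, where the character is 'a'.
at_start = pattern.match(text)

# search() scans forward and stops at the first position that matches.
anywhere = pattern.search(text)

# A compiled pattern's match() also accepts a start position (pos).
from_pos = pattern.match(text, 3)

print(at_start)          # None
print(anywhere.span())   # (3, 4)
print(from_pos.group())  # .
```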

Related

Returning more than one result with pipe character in Python Regex [duplicate]

This question already has an answer here:
How can I find all matches to a regular expression in Python?
(1 answer)
Closed 2 years ago.
When using the pipe character |, how do I get all the results defined in my regex to return?
Or does the .search() method only return the first result found?
Here is my code:
import re
batRegex = re.compile(r'Bat(man|mobile|copter|bat)')
matchObject = batRegex.search('Batmobile lost a wheel, Batcopter is not a chopper, his name is Batman, not Batbat')
print(matchObject.group())
Only the first result, 'Batmobile', is returned. Is it possible to return all results?
Thanks!
Please refer to official documentation for "re" module
Excerpts from the docs linked:
findall() matches all occurrences of a pattern, not just the first one as search() does.
re.search(pattern, string, flags=0)
Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding match object.
re.findall(pattern, string, flags=0)
Return all non-overlapping matches of pattern in string, as a list of strings. (Note: the result is a list of strings, not a match object, which represents only one match.)
import re
batRegex = re.compile(r'Bat(man|mobile|copter|bat)')
results = batRegex.findall('Batmobile lost a wheel, Batcopter is not a chopper, his name is Batman, not Batbat')
results
output:
['mobile', 'copter', 'man', 'bat']
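The output above contains only the suffixes because, when the pattern has a single capturing group, findall() returns the group's text and not the whole match. If the full words are wanted, one option is a non-capturing group; another is finditer(), which yields match objects. A minimal sketch (the variable names are made up for this example):

```python
import re

text = ('Batmobile lost a wheel, Batcopter is not a chopper, '
        'his name is Batman, not Batbat')

# With one capturing group, findall() returns only the group's text.
captured = re.findall(r'Bat(man|mobile|copter|bat)', text)

# A non-capturing group (?:...) makes findall() return the whole match.
whole = re.findall(r'Bat(?:man|mobile|copter|bat)', text)

# finditer() yields match objects, so both the full match and the
# group are available for every occurrence.
pairs = [(m.group(0), m.group(1))
         for m in re.finditer(r'Bat(man|mobile|copter|bat)', text)]

print(captured)  # ['mobile', 'copter', 'man', 'bat']
print(whole)     # ['Batmobile', 'Batcopter', 'Batman', 'Batbat']
```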

Python conditional using regex with multiline string [duplicate]

This question already has answers here:
What is the difference between re.search and re.match?
(9 answers)
Regular expression works on regex101.com, but not on prod
(1 answer)
Closed 3 years ago.
I need help with a very simple question: a conditional using a regex with a multiline string. It makes no sense to me why this does not work:
if re.match(r"\w", " \n\n\n aaaaaaaaaaaa\n\n", re.MULTILINE):
    print('ok')
else:
    print('fail')
fail
I expected the result to be ok, but nothing matches. I tried it at https://regex101.com/r/BsdymE/1, where it works, but it does not work in my code.
re.match only returns a match if the pattern matches at the beginning of the string.
https://docs.python.org/3/library/re.html#re.match
re.match(pattern, string, flags=0)
If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding match object. Return None if the string does not match the pattern; note that this is different from a zero-length match.
Try using re.search(pattern, string, flags=0) instead
From pydoc re.match:
Try to apply the pattern at the start of the string, returning a
Match object, or None if no match was found.
(emphasis mine). So, the problem is not with the string being multiline, but rather with it not beginning with a word-class character. If you want to check if a string contains something anywhere, use re.search instead.
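A short sketch of that point, using the string from the question. re.MULTILINE only changes what ^ and $ mean, so it has no effect on a pattern like \w; the last call is an extra illustration (not from the question) of where the flag does matter.

```python
import re

text = " \n\n\n aaaaaaaaaaaa\n\n"

# match() is anchored at index 0, which is a space here; re.MULTILINE
# does not move that anchor.
anchored = re.match(r"\w", text, re.MULTILINE)

# search() scans the string and finds the first word character.
found = re.search(r"\w", text)

# With re.MULTILINE, ^ can match at the start of any line, which is
# useful together with search().
line_start = re.search(r"^ \w+", text, re.MULTILINE)

print(anchored)            # None
print(found.start())       # 5
print(line_start.start())  # 4
```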

Why does the Python regex pattern fail to match when a dollar sign ($) is appended? [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 5 years ago.
Here's a short snippet, using Python 3.4:
import re
path_pattern=r'(([^\W]|[.~%$])+)'
re.search(path_pattern+'$','./').string
Running it raises AttributeError: 'NoneType' object has no attribute 'string'.
If I remove the +'$' from the code, it works:
import re
path_pattern=r'(([^\W]|[.~%$])+)'
re.search(path_pattern,'./').string
As far as I know, $ is for matching the end of string, but why isn't it working here?
If you explore your regex path_pattern at https://regex101.com/, you'll find it only matches the leading ".". Once you append $, it matches nothing, and re.search returns None if no position in the string matches the pattern. That's why you get the error.
Check it out here:
>>> import re
>>> path_pattern=r'(([^\W]|[.~%$])+)'
>>> r = re.search(path_pattern + "$",'./')
>>> print(r)
None
Your regex cannot match the / character in your string, and only . is matched.
When you use $ in your regex, it cannot match at all. When you remove it, it matches but only with ".".
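The sketch below shows both cases side by side. Note that the .string attribute used in the question is the whole input passed to search(), not the matched text; group() shows what was actually matched. The with_slash pattern is a hypothetical variant, added here only to show one way the whole path could match. It is not something the question asked for.

```python
import re

path_pattern = r'(([^\W]|[.~%$])+)'

# Without the anchor, search() is satisfied by the leading '.'.
unanchored = re.search(path_pattern, './')

# With '$' appended, the matched run must reach the end of the string,
# but '/' is neither a word character nor in [.~%$], so nothing matches.
anchored = re.search(path_pattern + '$', './')

# Hypothetical variant: adding '/' to the character class lets the
# whole string match.
with_slash = r'(([^\W]|[.~%$/])+)'
fixed = re.search(with_slash + '$', './')

print(unanchored.group())   # .
print(unanchored.string)    # ./
print(anchored)             # None
print(fixed.group())        # ./
```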

How to match an underscore using Python's regex? [duplicate]

This question already has an answer here:
Python regular expression re.match, why this code does not work? [duplicate]
(1 answer)
Closed 6 years ago.
I'm having trouble matching the underscore character in Python using regular expressions. Just playing around in the shell, I get:
>>> import re
>>> re.match(r'a', 'abc')
<_sre.SRE_Match object at 0xb746a368>
>>> re.match(r'_', 'ab_c')
>>> re.match(r'[_]', 'ab_c')
>>> re.match(r'\_', 'ab_c')
I would have expected at least one of these to return a match object. Am I doing something wrong?
Use re.search instead of re.match if the pattern you are looking for is not at the start of the search string.
re.match(pattern, string, flags=0)
Try to apply the pattern at the start of the string, returning a match
object, or None if no match was found.
re.search(pattern, string, flags=0)
Scan through string looking for a match to the pattern, returning a
match object, or None if no match was found.
You don't need to escape _ or even use a raw string.
>>> re.search('_', 'ab_c')
Out[4]: <_sre.SRE_Match object; span=(2, 3), match='_'>
Try the following:
re.search(r'\_', 'ab_c')
Escaping the underscore is harmless, though not required; the missing piece was using search.
Note that match only matches at the beginning of the string, as the documentation (https://docs.python.org/2/library/re.html) makes clear:
If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.
You should use search in this case:
Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.
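As a quick check, the three spellings from the question can be run through both functions. This is only a sketch; the list and variable names are invented for the example.

```python
import re

text = 'ab_c'

# All three spellings describe the same one-character pattern.
patterns = ['_', '[_]', r'\_']

# match() fails for each one because the string starts with 'a', not '_'.
match_results = [re.match(p, text) for p in patterns]

# search() finds the underscore at index 2 in every case.
search_spans = [re.search(p, text).span() for p in patterns]

print(match_results)  # [None, None, None]
print(search_spans)   # [(2, 3), (2, 3), (2, 3)]
```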

Why does this Regex only match at the start of the line in Python? [duplicate]

This question already has answers here:
What is the difference between re.search and re.match?
(9 answers)
Closed 7 years ago.
In Python I can do
import re
re.match("m", "mark")
and I get the expected result:
<_sre.SRE_Match object; span=(0, 1), match='m'>
But it only works if the pattern is at the start of the string:
re.match("m", "amark")
gives None. There is nothing about that pattern which requires it to be at the start of the string: no ^ or similar. Indeed, it works as expected on regex101.
Does Python have some special behaviour - and how do I disable it please?
From the docs on re.match:
If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding match object.
Use re.search to search the entire string.
The docs even grant this issue its own chapter, outlining the differences between the two: search() vs. match()
import re
re.match("[^m]*m", "mark")
match matches from the beginning of the string. So, if m is not at the start, you need to give the pattern a way to consume the characters before it.
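For comparison, here is a small sketch of the prefix workaround next to search(), using the "amark" string from the question. The prefix version does match, but the match then includes the characters before the m.

```python
import re

# match() succeeds only when the pattern can start at index 0.
plain = re.match("m", "amark")

# Letting the pattern consume leading non-'m' characters works, but the
# match then includes that prefix.
prefixed = re.match("[^m]*m", "amark")

# search() finds just the 'm' without changing the pattern.
found = re.search("m", "amark")

print(plain)             # None
print(prefixed.group())  # am
print(found.span())      # (1, 2)
```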
