I have a Python regex whose match() call always returns None. I tested it on the Pythex site and the pattern seems OK.
Pythex example
But when I try it with the re module, the result is always None:
import re
a = re.match(re.compile(r"\.aspx\?.*cp="), 'page.aspx?cpm=549&cp=168')  # a is always None
What am I doing wrong?
re.match() only matches at the start of a string. Use re.search() instead:
re.search(r"\.aspx\?.*cp=", 'page.aspx?cpm=549&cp=168')
Demo:
>>> import re
>>> re.search(r"\.aspx\?.*cp=", 'page.aspx?cpm=549&cp=168')
<_sre.SRE_Match object at 0x105d7e440>
>>> re.search(r"\.aspx\?.*cp=", 'page.aspx?cpm=549&cp=168').group(0)
'.aspx?cpm=549&cp='
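A small sketch contrasting how re.match(), re.search() and re.fullmatch() (the last one needs Python 3.4+) anchor the pattern, using the string from the question. The fullmatch pattern with a leading .* and trailing \d+ is an illustrative variant, not part of the original question:

```python
import re

url = 'page.aspx?cpm=549&cp=168'
pattern = r"\.aspx\?.*cp="

# re.match() only tries position 0; "page" is not ".aspx", so there is no match.
print(re.match(pattern, url))  # None

# re.search() scans forward until the pattern matches somewhere.
m = re.search(pattern, url)
print(m.group(0))  # .aspx?cpm=549&cp=

# re.match() behaves roughly like re.search() with a \A anchor in front.
print(re.search(r"\A" + pattern, url))  # None

# re.fullmatch() requires the entire string to match.
print(re.fullmatch(r".*\.aspx\?.*cp=\d+", url).group(0))  # page.aspx?cpm=549&cp=168
```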
Note that all re module functions that take a pattern also accept it as a plain string and will call re.compile() for you (caching the compiled result). You only need re.compile() if you want to store the compiled expression for reuse, at which point you can call pattern.search() on it:
pattern = re.compile(r"\.aspx\?.*cp=")
pattern.search('page.aspx?cpm=549&cp=168')
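As a sketch of that reuse, one compiled pattern object can be applied to many strings. The list of URLs below is hypothetical sample data:

```python
import re

# Compile once and keep the pattern object for repeated use.
pattern = re.compile(r"\.aspx\?.*cp=")

# Hypothetical inputs; only the first has both ".aspx?" and "cp=".
urls = ['page.aspx?cpm=549&cp=168', 'index.aspx?id=3', 'page.php?cp=1']

# Pattern objects expose the same search()/match() methods as the module.
hits = [u for u in urls if pattern.search(u)]
print(hits)  # ['page.aspx?cpm=549&cp=168']
```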
Related
I'm trying to use Python's re module:
import re
res = re.match(r"\d+", 'editUserProfile!input.jspa?userId=2089')
print(res)
I get None for res, but if I replace match with findall, I can find the 2089.
Do you know where the problem is?
The problem is that you're using match() to search for a substring in a string.
The match() method only looks for a match at the beginning of the string. If you want to find a substring anywhere inside a string, you should use search().
As khelwood pointed out in the comments, you should take a look at Search vs Match.
Code:
import re
res = re.search(r"\d+", 'editUserProfile!input.jspa?userId=2089')
print(res.group(0))
Output:
2089
Alternatively, you can use str.split() to isolate the user ID.
Code:
s = 'editUserProfile!input.jspa?userId=2089'
print(s.split('=')[1])
Output:
2089
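Both approaches above rely on the user ID being the first run of digits or the text after the first "=". A slightly more robust sketch, assuming the key is always spelled userId, ties the digits to that key with a named group (the group name user_id is just an illustrative choice):

```python
import re

s = 'editUserProfile!input.jspa?userId=2089'

# Anchor the digits to "userId=" instead of taking the first digit run.
m = re.search(r"userId=(?P<user_id>\d+)", s)
if m:
    print(m.group('user_id'))  # 2089
```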
Is there a way to get Python's regular expressions to work with patterns that have escaped special characters? As far as my limited understanding can tell, the following example should work, but the pattern fails to match.
import re
string = r'This a string with ^g\.$s' # A string to search
pattern = r'^g\.$s' # The pattern to use
string = re.escape(string) # Escape special characters
pattern = re.escape(pattern)
print(re.search(pattern, string)) # This prints "None"
Note:
Yes, this question has been asked elsewhere (like here). But as you can see, I'm already implementing the solution described in the answers and it's still not working.
Why on earth are you applying re.escape to the string?! You want to find the "special" characters in that! If you just apply it to the pattern, you'll get a match:
>>> import re
>>> string = r'This a string with ^g\.$s'
>>> pattern = r'^g\.$s'
>>> re.search(re.escape(pattern), re.escape(string)) # nope
>>> re.search(re.escape(pattern), string) # yep
<_sre.SRE_Match object at 0x025089F8>
For bonus points, notice that you just need to re.escape the pattern one more time than the string:
>>> re.search(re.escape(re.escape(pattern)), re.escape(string))
<_sre.SRE_Match object at 0x025D8DE8>
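To see why escaping the string breaks the search, here is a short sketch (Python 3.7+, where re.escape only escapes regex metacharacters) that prints what re.escape actually produces:

```python
import re

string = r'This a string with ^g\.$s'
pattern = r'^g\.$s'

# re.escape() turns every metacharacter in the pattern into a literal.
literal = re.escape(pattern)
print(literal)  # \^g\\\.\$s

# Escaping the haystack inserts extra backslashes, so the literal text is gone.
print(re.search(literal, re.escape(string)))  # None

# Searching the original string with the escaped pattern works.
m = re.search(literal, string)
print(m.group(0))  # ^g\.$s
```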
I have this pattern: "(\?(.+?))\b".
In Python, I expect findall to return ("?var", "var") when I run it on the string "some text ?var etc".
It works elsewhere; here's a RegExr demo as proof.
In Python, however, re.findall returns an empty list. Why is that?
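You're not using raw string notation. In a normal string literal, "\b" is the backspace character (\x08), not the regex word boundary, so the pattern searches for a backspace that isn't there. Add the r prefix: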
>>> import re
>>> re.findall(r'(\?(.+?))\b', 'some text ?var etc')
[('?var', 'var')]
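A quick sketch that makes the difference visible. The non-raw pattern writes "\\?" with a doubled backslash only to avoid an invalid-escape warning; its "\b" is left as the ordinary escape, exactly as in the question:

```python
import re

# In a normal string literal "\b" is the backspace character.
print("\b" == "\x08")  # True
# In a raw string it stays two characters: a backslash and "b".
print(len(r"\b"))  # 2

text = 'some text ?var etc'

# Without r"", the regex ends in a literal backspace, which the text lacks.
print(re.findall("(\\?(.+?))\b", text))  # []

# With r"", \b reaches the regex engine as a word boundary.
print(re.findall(r"(\?(.+?))\b", text))  # [('?var', 'var')]
```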
The pattern (?<!(asp|php|jsp))\?.* works in PCRE, but it doesn't work in Python.
So what can I do to get this regex working in Python? (Python 2.7)
It works perfectly fine for me. Are you maybe using it wrong? Make sure to use re.search instead of re.match:
>>> import re
>>> s = 'somestring.asp?1=123'
>>> re.search(r"(?<!(asp|php|jsp))\?.*", s)
>>> s = 'somestring.xml?1=123'
>>> re.search(r"(?<!(asp|php|jsp))\?.*", s)
<_sre.SRE_Match object at 0x0000000002DCB098>
That is exactly how your pattern should behave. As glglgl mentioned, you can get the matched text by assigning the Match object to a variable (say m) and calling m.group(), which yields ?1=123.
By the way, you can leave out the inner parentheses. This pattern is equivalent:
(?<!asp|php|jsp)\?.*
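A sketch comparing the two lookbehinds on a few sample URLs (written for Python 3; the helper name query_part is hypothetical). The alternation inside a lookbehind is accepted because every branch has the same fixed width of three characters:

```python
import re

# The question's pattern, and the same lookbehind without the inner group.
with_group = re.compile(r"(?<!(asp|php|jsp))\?.*")
without_group = re.compile(r"(?<!asp|php|jsp)\?.*")

def query_part(regex, s):
    """Return the matched query string, or None if the lookbehind rejects it."""
    m = regex.search(s)
    return m.group() if m else None

samples = ['somestring.asp?1=123', 'somestring.php?1=123',
           'somestring.jsp?1=123', 'somestring.xml?1=123']

for s in samples:
    print(s, query_part(with_group, s), query_part(without_group, s))

# The only difference: the inner parentheses add a capture group.
print(with_group.groups, without_group.groups)  # 1 0
```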
How do I determine if a string matches a regular expression?
I want to get True if a string matches a regular expression.
Regular expression:
r".*apps\.facebook\.com.*"
I tried:
if string == r".*apps\.facebook\.com.*":
But that doesn't seem to work.
From the Python docs on the re module and regular expressions:
import re
if re.search(r'.*apps\.facebook\.com.*', stringName):
    print('Yay, it matches!')
This works because re.search returns a Match object (which is truthy) if it finds a match, or None if it doesn't.
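If you need an actual True/False value rather than a truthy object, a minimal sketch is to wrap the search in bool(). The function name is_facebook_app_url and the sample URLs are hypothetical; the leading and trailing .* are dropped because search() already scans the whole string:

```python
import re

FACEBOOK_APPS = re.compile(r"apps\.facebook\.com")

def is_facebook_app_url(s):
    """Return True if the text contains apps.facebook.com anywhere."""
    # search() returns a Match object (truthy) or None, so bool() gives True/False.
    return bool(FACEBOOK_APPS.search(s))

print(is_facebook_app_url('https://apps.facebook.com/mygame'))  # True
print(is_facebook_app_url('https://www.example.com/'))  # False
```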
You have to import the re module and test it that way:
import re
if re.match(r'.*apps\.facebook\.com.*', string):
    # it matches!
    pass
You can use re.search instead of re.match if you want to search for the pattern anywhere in the string. re.match will only match if the pattern can be located at the beginning of the string.
import re
match = re.search(r'.*apps\.facebook\.com.*', string)
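With this particular pattern the leading .* usually hides the difference, because it lets match() skip ahead from position 0. A sketch with hypothetical sample strings shows where that stops working: . does not match a newline, so match() cannot get past the first line:

```python
import re

with_dotstar = r'.*apps\.facebook\.com.*'

one_line = 'see https://apps.facebook.com/mygame'
two_lines = 'first line\nsee https://apps.facebook.com/mygame'

# The leading .* lets match() reach the text, so both functions succeed.
print(bool(re.match(with_dotstar, one_line)), bool(re.search(with_dotstar, one_line)))  # True True

# "." does not match "\n", so match() fails while search() finds line two.
print(bool(re.match(with_dotstar, two_lines)), bool(re.search(with_dotstar, two_lines)))  # False True
```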
You're looking for re.match():
import re
if re.match(r'.*apps\.facebook\.com.*', string):
    do_something()
Or, if you want to match the pattern anywhere in the string, use re.search().
Why don't you also read through the Python documentation for the re module?