So, I'm trying to capture this big string in Python but it is failing me. The regex I wrote works fine in regexr: http://regexr.com/3cmdc
But using it in Python to capture the text returns None. This is the code:
pattern = "var initialData = (.*?);\\n"
match = re.search(pattern, source).group(1)
What am I missing?
You need to set the appropriate flags:
re.search(pattern, source, re.MULTILINE | re.DOTALL).group(1)
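A minimal sketch of why the flag matters, assuming the value assigned to initialData spans more than one line (a guess, since the page source is not shown in the question). The sample `source` string is made up for illustration.

```python
import re

# Hypothetical page source: the assigned value contains a line break.
source = 'var initialData = {"a": 1,\n"b": 2};\nvar other = 3;\n'

pattern = "var initialData = (.*?);\\n"

# Without re.DOTALL, "." does not match a newline, so the lazy group
# cannot cross the line break and the search finds nothing.
without_flag = re.search(pattern, source)

# With re.DOTALL, "." matches any character, including newlines.
with_flag = re.search(pattern, source, re.DOTALL)

print(without_flag)
if with_flag:
    print(with_flag.group(1))
```

Note that re.MULTILINE only changes how ^ and $ behave; this pattern uses neither, so re.DOTALL is the flag doing the work here.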
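Use Python's raw string notation (in a raw string the newline escape is written \n, not \\n, which would match a literal backslash followed by n):
pattern = r"var initialData = (.*?);\n"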
match = re.search(pattern, source).group(1)
This regex works in pythex, but not in python 3.6. I am not sure why:
Code in python:
import re
test = '105297 003 002394 o 0000 20891 0.00 1'
pattern = r"(?P<pun1>\d{3})\s+(?P<pun2>\d{6})(\s+(?P<pun3>[01oO])(\s+(?P<pun4>\d{4}))?)?\s.*\s(?P<amt>\d+\.\d\d)\s"
match = re.match(pattern, test, re.IGNORECASE)
match is None
True
I haven't been able to figure out why it works in pythex but not in python interpreter.
You might be looking for re.search() not re.match(). The latter only matches at the start of the string (implies an anchor ^, that is):
match = re.search(pattern, test, re.IGNORECASE)
#          ^^^^^^
if match:
    pass  # change the world here
See a demo on regex101.com.
I suspect your problem comes from calling re.match instead of re.search. The re.search function tries to find the regex anywhere in the given string, while re.match requires the regex to match at the beginning of the string.
Change this:
match = re.match(pattern, test, re.IGNORECASE)
to this:
match = re.search(pattern, test, re.IGNORECASE)
The problem is that match() only matches at the beginning of a string, not anywhere in it.
From the Python docs for match():
"If zero or more characters at the beginning of string match this regular expression, return a corresponding match object."
You should use search() instead:
"If you want to locate a match anywhere in string, use search() instead."
See also the "search() vs. match()" section of the re documentation.
This part:
match = re.match(pattern, test, re.IGNORECASE)
has to be:
match = re.search(pattern, test, re.IGNORECASE)
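The difference can be checked directly with the pattern and test string from the question. This is only a small sketch; the variable names `at_start` and `anywhere` are mine.

```python
import re

test = '105297 003 002394 o 0000 20891 0.00 1'
pattern = r"(?P<pun1>\d{3})\s+(?P<pun2>\d{6})(\s+(?P<pun3>[01oO])(\s+(?P<pun4>\d{4}))?)?\s.*\s(?P<amt>\d+\.\d\d)\s"

# re.match only tries position 0. There the first three digits "105"
# are followed by another digit, not whitespace, so it fails.
at_start = re.match(pattern, test, re.IGNORECASE)

# re.search scans forward and finds the match starting at "003".
anywhere = re.search(pattern, test, re.IGNORECASE)

print(at_start)
if anywhere:
    print(anywhere.group("pun1"), anywhere.group("pun2"), anywhere.group("amt"))
```

Tools such as pythex search the whole text, which is why the pattern appeared to work there.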
I am trying to get all email addresses from a text file using a regular expression in Python, but the search always returns None when it is supposed to return the email. For example:
content = 'My email is lehai#gmail.com'
#Compare with suitable regex
emailRegex = re.compile(r'(^[a-zA-Z0-9_.+-]+#[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)')
mo = emailRegex.search(content)
print(mo.group())
I suspect the problem lies in the regex but could not figure out why.
Because of spaces in content; remove the ^ and $ to match anywhere:
([a-zA-Z0-9_.+-]+#[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)
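A short sketch comparing the anchored and unanchored patterns on the question's sample string (keeping the # separator exactly as it appears in the question). The second sample string with two addresses is made up to show how findall could collect every address in a text.

```python
import re

content = 'My email is lehai#gmail.com'

anchored = re.compile(r'(^[a-zA-Z0-9_.+-]+#[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)')
unanchored = re.compile(r'([a-zA-Z0-9_.+-]+#[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)')

# ^ and $ force the whole string to be an address; the spaces in
# "My email is " make that impossible, so search() returns None.
anchored_result = anchored.search(content)

# Without the anchors the address can be found anywhere in the string.
unanchored_result = unanchored.search(content)

# Guard against None before calling .group() to avoid an AttributeError.
if unanchored_result:
    print(unanchored_result.group())

# Hypothetical text containing two addresses; findall returns them all.
found = unanchored.findall('write to a#b.com or c#d.org today')
print(found)
```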
Try this one as a regex, though I am not at all sure it will work for you:
([^#|\s]+#[^#]+.[^#|\s]+)
Your regular expression doesn't match the content.
I normally call the regex search like this:
mo = re.search(regex, searchstring)
So in your case I would try
content = 'My email is lehai#gmail.com'
#Compare with suitable regex
emailRegex = re.compile(r'gmail')
mo = re.search(emailRegex, content)
print(mo.group())
You can test your regex here: https://regex101.com/
This will work:
([a-zA-Z0-9_.+-]+#[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)
How can I get only the words that match my regex in python? Because everything I tried also prints the full line where the string was found.
The regex is the following:
\b([1-9][0-9]{1,2})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\/([0-9]{1,2})\b
It matches an IP address plus a CIDR suffix (e.g. 12.0.0.0/8).
The text in which I am searching this is as follows:
04/30","172.18.186.0/24","172.18.185.0/24","172.18.177.16/28","dwefwf-1.RI-nc_wefwfwefwefpat_intweb_fe","172.18.176.16/28","edefwfwf
t_pat_infwef_fe","172.18.178.16/28","dwefwefwef-wefwffwefwefwef_dr_efwefeb_fe","172.18.176.80/28","DSwefwfH2.
RI-nc_rat_dr_fweweb_fe","172.18.178.48/28","172.18.177.208/28","wefwef
wefwtfweapp_fe","172.18.176.208/28","wfwfwefwefwefH2.RI-nwefwefdr_app_fe","172.18.177.192/28","de1dfwwf-1.wefewf","172.18.176.1
92/28","
You should modify your regex as follows:
\b(([1-9][0-9]{1,2})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\/([0-9]{1,2}))\b
and then extract the first matched group: \1
Demo: http://repl.it/R0W/1 (It takes a while to run)
I think your regexp works correctly. If you want only the matched string, use the match object's group() method, like this:
import re
regexp = r'\b([1-9][0-9]{1,2})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\/([0-9]{1,2})\b'
text = '''04/30","172.18.186.0/24","172.18.185.0/24","172.18.177.16/28","dwefwf-1.RI-nc_wefwfwefwefpat_intweb_fe","172.18.176.16/28","edefwfwf
t_pat_infwef_fe","172.18.178.16/28","dwefwefwef-wefwffwefwefwef_dr_efwefeb_fe","172.18.176.80/28","DSwefwfH2.
RI-nc_rat_dr_fweweb_fe","172.18.178.48/28","172.18.177.208/28","wefwef
wefwtfweapp_fe","172.18.176.208/28","wfwfwefwefwefH2.RI-nwefwefdr_app_fe","172.18.177.192/28","de1dfwwf-1.wefewf","172.18.176.1
92/28","'''
for i in re.finditer(regexp, text):
    print(i.group(0))  # group(0) is the whole match, not the line
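To make the difference between the whole match and the captured groups concrete, here is a small sketch on a shortened piece of the question's text. The names `whole` and `parts` are mine.

```python
import re

regexp = r'\b([1-9][0-9]{1,2})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\/([0-9]{1,2})\b'
text = '04/30","172.18.186.0/24","172.18.185.0/24","dwefwf-1.RI-nc"'

# group(0) is the full text matched by the pattern, regardless of groups.
whole = [m.group(0) for m in re.finditer(regexp, text)]

# findall returns one tuple of captured groups per match when the
# pattern contains several groups, which is often not what is wanted.
parts = re.findall(regexp, text)

print(whole)
print(parts)
```

Wrapping the whole pattern in an extra outer group, as suggested in the other answer, is another way to get the full string back as group 1.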
I'm trying to convert a perl regex to python equivalent.
Line in perl:
($Cur) = $Line =~ m/\s*\<stat\>(.+)\<\/stat\>\s*$/i;
What I've attempted, but doesn't seem to work:
m = re.search('<stat>(.*?)</stat>/i', line)
cur = m.group(0)
Almost. The /i means case-insensitive; in Python that is a flag argument, not part of the pattern:
m = re.search(r'<stat>(.*?)</stat>',line,re.IGNORECASE)
Also use the r prefix on the string so that backslashes in the pattern don't need doubling (angle brackets never need escaping).
But my guess is that a better solution is to use an HTML/XML parser like BeautifulSoup or a similar package.
Something like the following ...
The r prefix is Python's raw string notation, which avoids doubling backslashes in regex patterns. The pattern comes first, followed by the string to search. re.I enables case-insensitive matching.
See the re documentation explaining this in more detail.
To find your match, you could use the group() method of MatchObject like the following:
cur = re.search(r'<stat>([^<]*)</stat>', line).group(1)
search() finds only the first occurrence; use findall() to get all occurrences.
matches = re.findall(r'<stat>([^<]*)</stat>', line)
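One possible translation of the Perl line, as a sketch only. The sample `line` value is made up, and the pattern keeps the Perl version's greedy (.+) and trailing \s*$ rather than the [^<]* variant above.

```python
import re

# Hypothetical input line with mixed-case tags and surrounding spaces.
line = '  <STAT>42</Stat>  '

# re.IGNORECASE plays the role of Perl's /i modifier.
m = re.search(r'<stat>(.+)</stat>\s*$', line, re.IGNORECASE)

# group(1) is the captured text; group(0) is the entire match,
# which is why the original attempt with group(0) returned too much.
cur = m.group(1) if m else None
entire = m.group(0) if m else None

print(cur)
print(repr(entire))
```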
I'm trying to match either # or the string at, like for name#email and nameatemail. I imagine it's something like
regex = '#|at'
or
regex = '#|(at)'
but I just can't find the right syntax.
I suggest you use Kodos to test your regular expressions (it also provides you with Python code for your regex).
For your issue, both regexes work correctly:
match = re.search("#|at", subject)
if match:
    result = match.group()
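A quick check of the alternation on the two sample strings from the question. The `separators` and `pieces` names are mine, and the re.split call is an extra illustration, not something the question asked for.

```python
import re

regex = re.compile(r'#|at')

subjects = ('name#email', 'nameatemail')

# search() returns the first place where either alternative matches.
separators = {s: regex.search(s).group() for s in subjects}

# The same pattern can split each string around the separator.
pieces = {s: regex.split(s) for s in subjects}

print(separators)
print(pieces)
```

One caveat: a bare "at" also matches inside ordinary words, so real names containing those letters would be split in the wrong place.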