re.search greedy matching all combinations (letters, special characters, numbers) - python

I am trying to find a pattern using re.search. How do I search a URL like
blah&match=Z-300&
and get what comes after match=? In this case, I want to get Z-300.

As tends to be my favorite answer to regex questions: don't use regex. Use urlparse (urllib.parse in Python 3).
from urllib.parse import parse_qs  # Python 2: from urlparse import parse_qs
parse_qs('blah&match=Z-300&')
# returns {'match': ['Z-300']}
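A minimal Python 3 sketch of the parse_qs approach. get_param is a hypothetical helper name; because parse_qs maps every key to a list of values, the helper returns the first value, or None when the key is absent.

```python
from urllib.parse import parse_qs


def get_param(query, name):
    """Hypothetical helper: first value of `name` in a query string, or None."""
    # parse_qs skips fields without '=' (like 'blah') and empty fields (trailing '&')
    values = parse_qs(query).get(name)
    return values[0] if values else None


print(get_param('blah&match=Z-300&', 'match'))    # Z-300
print(get_param('blah&match=Z-300&', 'missing'))  # None
```

Unlike the regex answers, parse_qs also decodes percent-escapes, so a (hypothetical) query such as match=Z%2D300 still yields Z-300.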

import re
s = 'blah&match=Z-300&'
print(re.search(r'&match=(.*)', s).group(1))   # Z-300&
print(re.search(r'&match=(.*)&', s).group(1))  # Z-300

match = re.search(r'&match=(.*?)&', text).group(1)  # text is the URL string; .*? is non-greedy, so it stops at the first '&'
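To make the greedy/non-greedy difference visible, here is a sketch using a hypothetical string with one extra parameter after match=. It also shows a negated character class, [^&]*, which is a different technique from the answers above: it simply never crosses an '&', so it does not need a trailing '&' at all.

```python
import re

# Hypothetical input with an extra parameter after match=
s = 'blah&match=Z-300&other=1&'

greedy = re.search(r'&match=(.*)&', s).group(1)     # backtracks only to the LAST '&'
lazy = re.search(r'&match=(.*?)&', s).group(1)      # stops at the FIRST '&'
negated = re.search(r'&match=([^&]*)', s).group(1)  # cannot consume an '&' at all

print(greedy)   # Z-300&other=1
print(lazy)     # Z-300
print(negated)  # Z-300
```

If the value is the last thing in the string (blah&match=Z-300 with no trailing '&'), the lazy pattern finds no match, while the negated-class pattern still returns Z-300.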

Related

How can I use re.match to find numbers?

I'm trying to use Python's re module:
import re
res = re.match(r"\d+", 'editUserProfile!input.jspa?userId=2089')
print(res)
res is None, but if I replace match with findall, I can find 2089.
Do you know where the problem is?
The problem is that you're using match() to search for a substring in a string.
The method match() only matches at the beginning of the string (re.fullmatch() is the one that requires the whole string to match). If you want to find a substring anywhere inside a string, use search() instead.
As khelwood pointed out in the comments, take a look at "search() vs. match()" in the re module documentation.
Code:
import re
res = re.search(r"\d+", 'editUserProfile!input.jspa?userId=2089')
print(res.group(0))
Output:
2089
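A short sketch comparing the three matching functions on the same input. re.fullmatch is available from Python 3.4; the string '2089abc' is a hypothetical example added to show that match() anchors only at the start.

```python
import re

s = 'editUserProfile!input.jspa?userId=2089'

print(re.match(r'\d+', s))                   # None: match() only tries position 0
print(re.search(r'\d+', s).group(0))         # 2089: search() scans the whole string
print(bool(re.fullmatch(r'\d+', '2089')))    # True: pattern covers the entire string
print(bool(re.fullmatch(r'\d+', s)))         # False
print(re.match(r'\d+', '2089abc').group(0))  # 2089: match() does not require the end
```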
Alternatively, you can use .split() to isolate the user ID.
Code:
s = 'editUserProfile!input.jspa?userId=2089'
print(s.split('=')[1])
Output:
2089
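The split('=') approach depends on there being exactly one '=' in the string. A sketch of a sturdier alternative with urllib.parse (Python 3); the second URL with &page=3 is a hypothetical example.

```python
from urllib.parse import urlsplit, parse_qs

s = 'editUserProfile!input.jspa?userId=2089'

# urlsplit isolates the part after '?', so parse_qs only sees 'userId=2089'
query = urlsplit(s).query
user_id = parse_qs(query)['userId'][0]
print(user_id)  # 2089

# With another parameter, split('=')[1] returns '2089&page' instead of '2089'
longer = 'editUserProfile!input.jspa?userId=2089&page=3'
print(longer.split('=')[1])                            # 2089&page
print(parse_qs(urlsplit(longer).query)['userId'][0])   # 2089
```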

Regex - Python matching between string and first occurrence

I'm having a hard time grasping regex no matter how much documentation I read. I'm trying to match everything between a string and the first occurrence of &. This is what I have:
import re
link = "group.do?sys_id=69adb887157e450051e85118b6ff533c&&"
rex = re.compile(r"group\.do\?sys_id=(.?)&")
sysid = rex.search(link).groups()[0]
I'm using https://regex101.com/#python to help me validate my regex, and I can kind of get rex = re.compile("user_group.do?sys_id=(.*)&") to work, but .* is greedy and matches up to the last &, whereas I want to match up to the first &.
I thought .? matched zero or one time?
You don't necessarily need regular expressions here. Use urlparse instead:
>>> from urlparse import urlparse, parse_qs
>>> parse_qs(urlparse(link).query)['sys_id'][0]
'69adb887157e450051e85118b6ff533c'
In Python 3, change the import to:
from urllib.parse import urlparse, parse_qs
You can simply regex out to the && instead of the final &, like so:
import re
link = "user_group.do?sys_id=69adb887157e450051e85118b6ff533c&&"
rex = re.compile(r"user_group\.do\?sys_id=(.*)&&")
sysid = rex.search(link).groups()[0]
print(sysid)
.* is greedy, but .*? is not (it is lazy). .? only looks for any character zero or one time, while .*? repeats as few times as possible, stopping at the earliest & that lets the rest of the pattern match. I hope that explains it.
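A sketch running all three quantifiers against the question's link, so the difference is visible side by side (same pattern prefix, only the group changes).

```python
import re

link = "group.do?sys_id=69adb887157e450051e85118b6ff533c&&"

# .?  : at most ONE character before '&', so there is no match here
one = re.search(r"group\.do\?sys_id=(.?)&", link)
# .*  : greedy, takes everything, then backs off just enough to match a final '&'
greedy = re.search(r"group\.do\?sys_id=(.*)&", link).group(1)
# .*? : lazy, stops at the first '&' that lets the match succeed
lazy = re.search(r"group\.do\?sys_id=(.*?)&", link).group(1)

print(one)     # None
print(greedy)  # 69adb887157e450051e85118b6ff533c&
print(lazy)    # 69adb887157e450051e85118b6ff533c
```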

Return recurring regex matches with Python

I have a string:
SomeTextSomeTextASomeThingBSomeTextSomeTextASomeThingElseBSomeText
I want the strings SomeThing and SomeThingElse returned, because they are bracketed by A and B (assuming SomeText does not contain any A ... B occurrences).
Any hint would be highly appreciated.
Here's what I tried, but it doesn't work:
import re
string = 'SomeTextSomeTextASomeThingBSomeTextSomeTextASomeThingElseBSomeText'
regex='(A.*B)'
I guess the regex is not correct, and I also don't know how to access the matches. Is it match or finditer or…?
Try using re.findall:
>>> re.findall(r'A(.*?)B', string)
['SomeThing', 'SomeThingElse']
Note the question mark. Without it, the matching is done greedily: it consumes as many characters as possible.
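Since the question also asks about finditer, here is a sketch contrasting the greedy and lazy patterns with findall, and showing finditer, which yields match objects (so each capture's position is available as well).

```python
import re

string = 'SomeTextSomeTextASomeThingBSomeTextSomeTextASomeThingElseBSomeText'

# Greedy: a single match running from the first 'A' to the last 'B'
greedy = re.findall(r'A(.*)B', string)
# Lazy: each capture ends at the nearest following 'B'
lazy = re.findall(r'A(.*?)B', string)

print(greedy)  # ['SomeThingBSomeTextSomeTextASomeThingElse']
print(lazy)    # ['SomeThing', 'SomeThingElse']

# finditer returns match objects; group(1) is the text, start(1) its offset
for m in re.finditer(r'A(.*?)B', string):
    print(m.group(1), m.start(1))
```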

What is the syntax for evaluating string matches on regular expressions?

How do I determine if a string matches a regular expression?
I want to get True if a string matches a regular expression.
Regular expression:
r".*apps\.facebook\.com.*"
I tried:
if string == r".*apps\.facebook\.com.*":
But that doesn't seem to work.
From the Python docs on the re module:
import re
if re.search(r'.*apps\.facebook\.com.*', stringName):
    print('Yay, it matches!')
This works because re.search returns a match object if it finds the pattern, or None if it does not, and None is falsy.
You have to import the re module and test it that way:
import re
if re.match(r'.*apps\.facebook\.com.*', string):
    pass  # it matches!
You can use re.search instead of re.match if you want to search for the pattern anywhere in the string. re.match will only match if the pattern can be located at the beginning of the string.
import re
match = re.search(r'.*apps\.facebook\.com.*', string)
You're looking for re.match():
import re
if re.match(r'.*apps\.facebook\.com.*', string):
    do_something()  # placeholder for your own code
Or, if you want to match the pattern anywhere in the string, use re.search().
Why don't you also read through the Python documentation for the re module?
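A minimal sketch of turning the check into a real True/False value. is_facebook_app_url and the sample URLs are hypothetical. With search() the leading and trailing .* are unnecessary, because search() already scans the whole string (with match(), the leading .* is what lets the pattern start later in the string).

```python
import re

pattern = re.compile(r'apps\.facebook\.com')


def is_facebook_app_url(s):
    """Hypothetical helper: True if s contains apps.facebook.com anywhere."""
    # search() returns a match object or None; bool() converts that to True/False
    return bool(pattern.search(s))


print(is_facebook_app_url('https://apps.facebook.com/mygame/'))  # True
print(is_facebook_app_url('https://www.facebook.com/'))          # False
```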

How do I use Regex to find the ID in a YouTube link?

When I try to extract this video ID (AIiMa2Fe-ZQ) with a regular expression, I can't get the dash and all the letters after it.
>>> id = re.search(r'(?<=\?v\=)\w+', 'http://www.youtube.com/watch?v=AIiMa2Fe-ZQ')
>>> print(id.group(0))
AIiMa2Fe
Instead of \w+, use the following. The word character class (\w) doesn't include a dash; for ASCII text it only covers [a-zA-Z0-9_].
[\w-]+
I don't know the pattern for YouTube hashes, but just include the "-" in the allowed characters, as it is not considered alphanumeric:
import re
id = re.search(r'(?<=\?v\=)[\w-]+', 'http://www.youtube.com/watch?v=AIiMa2Fe-ZQ')
print(id.group(0))
I have edited the above because, as it turns out:
>>> re.search(r"[\w|-]", "|").group(0)
'|'
The "|" inside a character class does not act as a special character; it matches a literal "|" pipe. My apologies.
>>> re.search(r'(?<=v=)[\w-]+', 'http://www.youtube.com/watch?v=AIiMa2Fe-ZQ').group()
'AIiMa2Fe-ZQ'
In Python 2.x, \w is shorthand for [a-zA-Z0-9_]; in Python 3, \w in str patterns also matches Unicode letters and digits, so you have to add the re.A (re.ASCII) flag to get the ASCII-only behaviour. Your video ID clearly contains an additional character, the hyphen. I've also removed the redundant escape backslashes from the lookbehind.
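A sketch of both points from this answer: the hyphen stopping \w+, and re.ASCII narrowing \w in Python 3. The word 'caf\u00e9' (café) is a hypothetical example chosen because it contains one non-ASCII letter.

```python
import re

url = 'http://www.youtube.com/watch?v=AIiMa2Fe-ZQ'

# \w+ stops at the hyphen, so the ID is cut short
short_id = re.search(r'(?<=v=)\w+', url).group()
# Adding '-' to a character class keeps the hyphen and what follows it
full_id = re.search(r'(?<=v=)[\w-]+', url).group()
print(short_id)  # AIiMa2Fe
print(full_id)   # AIiMa2Fe-ZQ

# Python 3: \w is Unicode-aware by default; re.ASCII restricts it to [a-zA-Z0-9_]
word = 'caf\u00e9'
print(re.findall(r'\w+', word))            # ['café']
print(re.findall(r'\w+', word, re.ASCII))  # ['caf']
```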
Use the urlparse module (urllib.parse in Python 3) instead of regex for this kind of thing.
from urllib.parse import urlparse, parse_qs  # Python 2: from urlparse import urlparse, parse_qs
url = 'http://www.youtube.com/watch?v=AIiMa2Fe-ZQ'
parsed_url = urlparse(url)
if 'youtube.com' in parsed_url.netloc and parsed_url.path == '/watch':
    video = parse_qs(parsed_url.query).get('v')
    if video is None:
        # AJAX-style URLs carry the parameters in the fragment: watch#!v=ID
        video = parse_qs(parsed_url.fragment.strip('!')).get('v')
    if video is not None:
        print(video[0])
EDIT: Updated for the upcoming new YouTube URL format.
(?:/v/|/watch\?v=|/watch#!v=)([A-Za-z0-9_-]+)
Explanation of the regex:
There are three alternate YouTube formats: /v/[ID], watch?v=, and the new AJAX watch#!v=. This regex captures all three. There is also a new YouTube URL for user pages of the form /user/[user]?content={complex URI}; that form is not captured here by any regex.
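A sketch of that three-format pattern in Python, compiled as a raw string (Python patterns take no surrounding /.../ delimiters). The URL list is hypothetical: one URL per format from the explanation, plus one with an extra &feature=related parameter and one user-page URL that is expected not to match.

```python
import re

# Three alternatives for the prefix, then capture the ID characters
YOUTUBE_ID = re.compile(r'(?:/v/|/watch\?v=|/watch#!v=)([A-Za-z0-9_-]+)')

urls = [
    'http://www.youtube.com/watch?v=AIiMa2Fe-ZQ',
    'http://www.youtube.com/v/AIiMa2Fe-ZQ',
    'http://www.youtube.com/watch#!v=AIiMa2Fe-ZQ',
    'http://www.youtube.com/watch?v=AIiMa2Fe-ZQ&feature=related',
    'http://www.youtube.com/user/someone',
]

ids = []
for u in urls:
    m = YOUTUBE_ID.search(u)
    ids.append(m.group(1) if m else None)  # None when no format matches

print(ids)
```

The ID character class excludes '&', so trailing query parameters are not swallowed.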
I'd try this:
>>> import re
>>> a = re.compile(r'.*(\-\w+)$')
>>> a.search('http://www.youtube.com/watch?v=AIiMa2Fe-ZQ').group(1)
'-ZQ'
