Python regular expression string groupings - python

I'm trying to match either # or the string at, like for name#email and nameatemail. I imagine it's something like
regex = '#|at'
or
regex = '#|(at)'
but I just can't find the right syntax.

I suggest you use Kodos to test your regular expressions (it also provides you with Python code for your regex). And this for regular expression info.

For your issue, both regexes work correctly:
import re
match = re.search("#|at", subject)
if match:
    result = match.group()
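
As a minimal sketch of how the two candidate patterns behave (the sample strings and the helper name split_on_separator are made up for illustration): both patterns match the same text, but the parenthesised version adds a capture group that is None whenever # is what matched.

```python
import re

def split_on_separator(text):
    """Split text at the first '#' or 'at'; return None if neither occurs."""
    parts = re.split(r'#|at', text, maxsplit=1)
    return tuple(parts) if len(parts) == 2 else None

print(split_on_separator('name#email'))
print(split_on_separator('nameatemail'))

# With '#|(at)' the group is only filled when 'at' was the alternative that matched.
print(re.search(r'#|(at)', 'name#email').group(1))
print(re.search(r'#|(at)', 'nameatemail').group(1))
```

Keep in mind that a bare at also matches inside ordinary words (for example in "nathan"), so the split may happen earlier than intended.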

Related

Regular expression to match one or more patterns matching another regular expression

I am using regular expressions for my django url configurations. I have the following regex:
url(r'^myapp/prices/?([X]{1}[A-Z0-9]{3}:[A-Z0-9]{1}[A-Z09.-]{1,4})/?([0-9]{0,3})/?$', views.prices, name='prices'),
This matches urls such as:
http://127.0.0.1/myapp/prices/XNAS:GOOG/1
http://127.0.0.1/myapp/prices/XNAS:GOOG
http://127.0.0.1/myapp/prices/XNAS:FB/10
I want to modify my regex pattern in my url pattern, so that I can match on strings like the above, as well as strings like the one below:
http://127.0.0.1/myapp/prices/XNAS:GOOG+XNAS:TSLA+XNAS:FB/1
Essentially, I want my original pattern to be matched at least once, and if more than once, then the occurrences of the pattern should be separated by a '+' sign.
How would I express this using regex syntax (Python)?
import re
repeatable = r'[X]{1}[A-Z0-9]{3}:[A-Z0-9]{1}[A-Z09.-]{1,4}'
# Do not pass the sub-pattern through re.escape(): that would make it match as literal text.
url_regex = r'^myapp/prices/?(' + repeatable + r')(\+' + repeatable + r')*/?([0-9]{0,3})/?$'
url(url_regex, views.prices, name='prices')
But I believe it's more complicated than this:
url(r'^myapp/prices/?([X]{1}[A-Z0-9]{3}:[A-Z0-9]{1}[A-Z09.-]{1,4})(\+?[X]{1}[A-Z0-9]{3}:[A-Z0-9]{1}[A-Z09.-]{1,4})*/?([0-9]{0,3})/?$', views.prices, name='prices'),
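
A repeated capturing group only keeps its last repetition, so one possible approach is to capture the whole +-separated run in a single group and split it afterwards. The following is a sketch with plain re, outside Django; the names SYMBOL, PRICES_RE and parse_prices_path are hypothetical, and the sub-pattern is copied unchanged from the question.

```python
import re

# Sub-pattern copied from the question, unchanged.
SYMBOL = r'[X]{1}[A-Z0-9]{3}:[A-Z0-9]{1}[A-Z09.-]{1,4}'

# One capturing group around "SYMBOL(+SYMBOL)*", then the optional number.
PRICES_RE = re.compile(
    r'^myapp/prices/?((?:' + SYMBOL + r')(?:\+' + SYMBOL + r')*)/?([0-9]{0,3})/?$'
)

def parse_prices_path(path):
    """Return (list_of_symbols, number_string) or None if the path does not match."""
    match = PRICES_RE.match(path)
    if match is None:
        return None
    symbols, number = match.groups()
    return symbols.split('+'), number

print(parse_prices_path('myapp/prices/XNAS:GOOG+XNAS:TSLA+XNAS:FB/1'))
print(parse_prices_path('myapp/prices/XNAS:GOOG'))
```

In a view, the first captured argument would then arrive as one string such as XNAS:GOOG+XNAS:TSLA, to be split on + as shown.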

python how to replace string by regex group?

Given a string like '/apps/platform/app/app_name/etc', I can use
p = re.compile('/apps/(?P<p1>.*)/app/(?P<p2>.*)/')
to get two matched groups of platform and app_name, but how can I use the re.sub function (or maybe a better way) to replace those two groups with other strings like windows and facebook? The final string would then look like /apps/windows/app/facebook/etc.
re.sub replaces the whole match rather than individual groups, so I suggest capturing the part you want to keep, like this:
(?<=/apps/)(?P<p1>.*)(/app/)(?P<p2>.*)/
Then replace the matched characters with windows\2facebook/, where \2 refers to the captured /app/. I also suggest defining your regex as a raw string. The lookbehind is used in order to avoid an extra capturing group.
>>> import re
>>> s = '/apps/platform/app/app_name/etc'
>>> re.sub(r'(?<=/apps/)(?P<p1>.*)(/app/)(?P<p2>.*)/', r'windows\2facebook/', s)
'/apps/windows/app/facebook/etc'
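
If you would rather keep the original pattern with its two named groups, another option is to use the group positions reported by the match object and splice the new text in. This is only a sketch; replace_groups is a hypothetical helper, not something provided by the re module.

```python
import re

def replace_groups(pattern, text, replacements):
    """Replace the text of named groups in the first match of pattern."""
    match = re.search(pattern, text)
    if match is None:
        return text
    # Collect (start, end, new_text) for groups that took part in the match.
    spans = [(match.start(name), match.end(name), new)
             for name, new in replacements.items()
             if match.start(name) != -1]
    # Work from right to left so earlier offsets stay valid.
    for start, end, new in sorted(spans, reverse=True):
        text = text[:start] + new + text[end:]
    return text

pattern = r'/apps/(?P<p1>.*)/app/(?P<p2>.*)/'
print(replace_groups(pattern, '/apps/platform/app/app_name/etc',
                     {'p1': 'windows', 'p2': 'facebook'}))
```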

Python regular expression expansion

I am really bad with regular expressions, and I am stuck trying to generate all the possible combinations for a regular expression.
When the regular expression is abc-defghi00[1-24,2]-[1-20,23].walmart.com, it should generate all its possible combinations.
The text before the braces can be anything and the pattern inside the braces is optional.
Need all the python experts to help me with the code.
Sample output
Here is the expected output:
abc-defghi001-1.walmart.com
.........
abc-defghi001-20.walmart.com
abc-defghi001-23.walmart.com
..............
abc-defghi002-1.walmart.com
Repeat this from 1-24 and 2.
Regex tried
([a-z]+)(-)([a-z]+)(\[)(\d)(-)(\d+)(,?)(\d?)(\])(-)(\[)(\d)(-)(\d+)(,?)(\d?)(\])(.*)
Let's say we would like to match against abc-defghi001-1.walmart.com. If we write the following regex, it finds a match:
import re
s = 'abc-defghi001-1.walmart.com'
re.match(r'.*[1-24]-[1-20|23]\.walmart\.com', s)
and the output:
<_sre.SRE_Match object at 0x029462C0>
So it's found, but be careful: square brackets define a character class, not a numeric range. [1-24] matches a single character 1, 2 or 4, and [1-20|23] matches a single 0, 1, 2, 3 or |. To accept the numbers 1 to 24 you have to spell the digits out, for example (?:[1-9]|1[0-9]|2[0-4]), and to also accept 27 you would add it as an alternative: (?:[1-9]|1[0-9]|2[0-4]|27).
Edit1: As far as I understand, you want to generate all instances of a regular expression and store them in a list.
Use the exrex python library to do so. You can find further information about it here. Then, you have to limit the regex you use.
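import re
s = 'abc-defghi001-1.walmart.com'
# 1-24 after the "00" prefix, then 1-20 or 23; numeric ranges are spelled out digit by digit
obj = re.match(r'^\w{3}-\w{6}00([1-9]|1[0-9]|2[0-4])-([1-9]|1[0-9]|20|23)\.walmart\.com$', s)
print(obj.group())
The above regex should match the template you're looking for, I hope!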
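
Since the bracket notation in the question is really a range template rather than a regular expression, the combinations can also be generated with the standard library alone. The sketch below assumes that each bracket holds comma-separated numbers or lo-hi ranges; expand_spec and expand_template are hypothetical helper names.

```python
import itertools
import re

BRACKET = re.compile(r'\[([^\]]*)\]')

def expand_spec(spec):
    """Turn a spec such as '1-3,7' into ['1', '2', '3', '7'], without duplicates."""
    values = []
    for part in spec.split(','):
        if '-' in part:
            low, high = part.split('-')
            values.extend(str(n) for n in range(int(low), int(high) + 1))
        else:
            values.append(part)
    # dict.fromkeys drops duplicates while keeping first-seen order
    return list(dict.fromkeys(values))

def expand_template(template):
    """Return every string obtained by filling each [...] with one of its values."""
    # With one capturing group, split() alternates: literal, spec, literal, ...
    parts = BRACKET.split(template)
    literals = parts[0::2]
    choices = [expand_spec(spec) for spec in parts[1::2]]
    results = []
    for combo in itertools.product(*choices):
        pieces = [literals[0]]
        for value, literal in zip(combo, literals[1:]):
            pieces.extend((value, literal))
        results.append(''.join(pieces))
    return results

names = expand_template('abc-defghi00[1-24,2]-[1-20,23].walmart.com')
print(len(names), names[0], names[-1])
```

A template without any brackets simply comes back as a one-element list, which covers the case where the bracketed part is optional.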

Convert a perl regex to a python regex

I'm trying to convert a perl regex to python equivalent.
Line in perl:
($Cur) = $Line =~ m/\s*\<stat\>(.+)\<\/stat\>\s*$/i;
What I've attempted, but doesn't seem to work:
m = re.search('<stat>(.*?)</stat>/i', line)
cur = m.group(0)
Almost; /i means case insensitive, which in Python is a flag rather than part of the pattern:
m = re.search(r'<stat>(.*?)</stat>',line,re.IGNORECASE)
Also use the r prefix on the string so you don't have to double any backslashes; angle brackets need no escaping in Python.
But my guess is that a better solution is to use an HTML/XML parser like BeautifulSoup or other similar packages.
Something like the following ...
The r prefix is Python's raw string notation, which keeps backslashes in the pattern from being treated as string escapes; the regular expression comes first, followed by your string data. re.I is used for case-insensitive matching.
See the re documentation explaining this in more detail.
To find your match, you could use the group() method of MatchObject like the following:
cur = re.search(r'<stat>([^<]*)</stat>', line).group(1)
search() matches only the first occurrence; use findall() to match all occurrences.
matches = re.findall(r'<stat>([^<]*)</stat>', line)
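
Putting the pieces together, a fairly direct translation of the Perl line could look like the sketch below. STAT_RE and extract_stat are hypothetical names, and the None check is an addition for lines that do not match, where the one-liner above would raise an AttributeError.

```python
import re

# Same pattern as the Perl version; the /i modifier becomes re.IGNORECASE.
STAT_RE = re.compile(r'\s*<stat>(.+)</stat>\s*$', re.IGNORECASE)

def extract_stat(line):
    """Return the text between <stat> and </stat>, or None if the line has no match."""
    match = STAT_RE.search(line)
    # group(1) is the captured text; group(0) would be the whole match including the tags
    return match.group(1) if match else None

print(extract_stat('  <STAT>42</Stat>  '))
print(extract_stat('no tags here'))
```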

Python: RegEx assistance

I have a filename 10.10.10.17_super-micro-100-13.txt from which I need to extract everything between the _ and the . (e.g., in this case it would return super-micro-100-13).
I will need a Python regex to accomplish the task. If I do
re.compile('\_(.*)\.'), I get _super-micro-100-13. which is not what I want. Can anyone shed some light on what the correct regex would be in this case?
Thanks,
Neel
If you decide you don't need to use regex, throwing together a few string methods is more readable.
file_name = "10.10.10.17_super-micro-100-13.txt"
print(file_name.split("_")[1].split(".")[0])
You can use a lookbehind and lookahead so that you are only actually matching the part that you want. Also note that you need to escape the . at the end to match a literal dot.
Here is the regex you could use:
regex = re.compile(r'(?<=_).*(?=\.)')
Alternatively, you can use your current regex and pull out the first capture group from your match:
regex = re.compile(r'_(.*)\.')
print(regex.search('10.10.10.17_super-micro-100-13.txt').group(1))
# super-micro-100-13
Try this:
import re
name = '10.10.10.17_super-micro-100-13.txt'
regex = re.compile(r'.+_(.+)\.txt')
regex.match(name).group(1)
> 'super-micro-100-13'
I do think that regex is a bit overkill. You can use the "find" method as follows:
def extract_info(s):
    underscore = s.find('_')
    dot = s.find('.', underscore)  # you only want a dot after the underscore
    return s[underscore + 1:dot]
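
The approaches above differ in which dot they stop at: the greedy regex runs to the last dot, while split and find stop at the first dot after the underscore. If the intent is "everything after the first underscore, minus the extension", one more option is os.path.splitext, sketched here with a hypothetical helper name extract_label:

```python
import os

def extract_label(file_name):
    """Return the part between the first '_' and the file extension, or None."""
    stem, _extension = os.path.splitext(file_name)   # drops only the final '.txt'
    _prefix, separator, label = stem.partition('_')  # splits at the first '_'
    return label if separator else None

print(extract_label('10.10.10.17_super-micro-100-13.txt'))
```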
