Using regular expressions in Python


If I enter an email address of the form username#companyname.com and just want to search for '#', why isn't this code working?
```python
import re
emailAddress=raw_input()
pat = '#'
match = re.match(pat2,emailAddress)
print match.group()
```

Assuming the pat/pat2 mismatch is just a typo, you want to use re.search instead of re.match. re.search scans the whole string for a match, while re.match only looks for a match starting at the beginning of the string.
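A minimal sketch of the difference, assuming Python 3 and a hard-coded sample address in place of raw_input() (the variable names are illustrative):

```python
import re

# Hypothetical sample input standing in for raw_input()/input()
email_address = "username#companyname.com"
pattern = "#"

# re.match only tries position 0, where the string starts with 'u', so it fails
print(re.match(pattern, email_address))  # None

# re.search scans the whole string and returns the first '#'
match = re.search(pattern, email_address)
if match:
    print(match.group(), match.start())  # prints: # 8
```

Checking for None before calling .group() also avoids the AttributeError you get when re.match finds nothing.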

Related

Extracting a word between two path separators that comes after a specific word

I have the following path stored as a Python string, 'C:\ABC\DEF\GHI\App\Module\feature\src', and I would like to extract the word Module, which is located between \App\ and \feature\ in the path. Note that the file separators '\' around it should not be extracted; only the string Module should be.
I had a few ideas on how to do it:
Write a regex that matches the string between \App\ and \feature\.
Write a regex that matches a string after \App\ (App\\[A-Za-z0-9]*\\), and then split the matched string to find Module.
I think the first solution is better, but unfortunately it is beyond my regex knowledge and I am not sure how to do it.
I would much appreciate any help.
Thank you in advance!

The regex you want is:
(?<=\\App\\).*?(?=\\feature\\)
Explanation of the regex:
(?<=behind)rest matches all instances of rest if there is behind immediately before it. It's called a positive lookbehind.
rest(?=ahead) matches all instances of rest where there is ahead immediately after it. This is a positive lookahead.
\ is a special character in regex patterns, so to match a literal backslash we have to escape it; hence \\.
.* matches any character, zero or more times.
? makes the match non-greedy, so it stops at the first \feature\ that follows \App\.
The pattern also assumes there is exactly one directory between \App\ and \feature\: .*? matches backslashes too, so with several directories in between you would get all of them, separators included.
The full code would be something like:
import re
path = 'C:\\ABC\\DEF\\GHI\\App\\Module\\feature\\src'
start = '\\App\\'
end = '\\feature\\'
pattern = rf"(?<=\{start}\).*?(?=\{end}\)"
print(pattern)  # (?<=\\App\\).*?(?=\\feature\\)
print(re.search(pattern, path)[0])  # Module
A link on regex lookarounds that may be helpful: https://www.regular-expressions.info/lookaround.html
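If you would rather not work out where the extra backslashes go inside the f-string, here is a sketch of the same lookaround pattern built with re.escape(), which escapes the backslashes in the start and end markers for you (the raw-string path literal is just the question's sample data):

```python
import re

path = r'C:\ABC\DEF\GHI\App\Module\feature\src'
start = '\\App\\'
end = '\\feature\\'

# re.escape turns each literal backslash into \\ so the markers match verbatim;
# the lookbehind stays fixed-width because the escaped marker has a fixed length
pattern = f"(?<={re.escape(start)}).*?(?={re.escape(end)})"
match = re.search(pattern, path)
print(match.group() if match else None)  # Module
```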

We can also do this with str.find, something like:
path = 'C:\\ABC\\DEF\\GHI\\App\\Module\\feature\\src'
start = '\\App\\'
end = '\\feature\\'
print(path[path.find(start) + len(start):path.rfind(end)])
Output:
Module
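For comparison, and not taken from either answer: because this is a Windows path, the standard library's pathlib.PureWindowsPath can split it into components, so neither a regex nor index arithmetic on separators is needed. A sketch, assuming the directory you want always sits directly after App:

```python
from pathlib import PureWindowsPath

path = PureWindowsPath(r'C:\ABC\DEF\GHI\App\Module\feature\src')

# .parts splits on the separators: ('C:\\', 'ABC', 'DEF', 'GHI', 'App', 'Module', 'feature', 'src')
parts = path.parts

# Take the component right after 'App' (raises ValueError if 'App' is missing)
module = parts[parts.index('App') + 1]
print(module)  # Module
```

PureWindowsPath does no filesystem access, so this works on any operating system.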

You are looking for groups. With some small modifications you can extract only the part between App and feature:
(?:App\\\\)([A-Za-z0-9]*)(?:\\\\feature)
(Four backslashes are what you need inside a normal Python string literal; in a raw string such as r'(?:App\\)([A-Za-z0-9]*)(?:\\feature)' two are enough.)
The parentheses ( ) define a capturing group, which you can retrieve with match.group(1). Using (?:foo) defines a non-capturing group, i.e. one that is not included in your result. Try the expression here: https://regex101.com/r/24mkLO/1
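A short sketch of reading the captured group in Python. It uses a raw string, so each \\ in the pattern stands for one literal backslash in the path (the path literal is the question's sample data):

```python
import re

path = 'C:\\ABC\\DEF\\GHI\\App\\Module\\feature\\src'

# Non-capturing groups anchor the match; the single capturing group holds the directory name
pattern = r'(?:App\\)([A-Za-z0-9]*)(?:\\feature)'
match = re.search(pattern, path)
if match:
    print(match.group(0))  # App\Module\feature
    print(match.group(1))  # Module
```

group(0) is always the whole match, including the text matched by the non-capturing groups; only group(1) excludes it.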

Regex for escaping path separator in url

I have a URL pattern: "somepath/email/". I don't want to write a regex that matches an email address; instead, I want anything that isn't a path separator to match the email part.
Please suggest a regex for this. I am using Python and the URL is for a Django application, so any library function would also be helpful, but I would prefer a regex.

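The regex [^/\\]+ is a negated character class with a + quantifier; it matches one or more characters that are neither / nor \.
Code sample (note the raw string: without it, "\\]" reaches the regex engine as \], which escapes the closing bracket and leaves the class unterminated):
import re
match = re.search(r"[^/\\]+", subject)
if match:
    result = match.group()
else:
    result = ""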
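Note that re.search() with [^/\\]+ alone returns the first run of non-separator characters, which for "somepath/email/" is "somepath". In a URL pattern you would normally anchor the class after the fixed prefix. A sketch, where the group name email and the sample paths are assumptions for illustration:

```python
import re

# [^/]+ matches exactly one path segment; the named group exposes it as "email"
url_pattern = re.compile(r"^somepath/(?P<email>[^/]+)/$")

match = url_pattern.match("somepath/user@example.com/")
if match:
    print(match.group("email"))  # user@example.com

# A second slash inside the segment makes the whole pattern fail
print(url_pattern.match("somepath/a/b/"))  # None
```

The same regular expression could be passed to Django's re_path(); alternatively, path("somepath/<str:email>/", ...) behaves similarly, because Django's str path converter matches any non-empty string that does not contain a /.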

How to ignore \n in regular expressions in python?

So I have a regex that tells whether a string is an integer:
regex = '^(0|[1-9][0-9]*)$'
import re
bool(re.search(regex, '42\n'))
This returns True, but it is not supposed to.
Where does the problem come from?

From the documentation:
'$'
Matches the end of the string or just before the newline at the end of the string
Try \Z instead.
Also, any time you find yourself writing a regular expression that starts with ^ or \A and ends with $ or \Z, and your intent is to match only the entire string, you should probably use re.fullmatch() instead of re.search() (and omit the boundary markers from the regex). If you're using a version of Python too old to have re.fullmatch() (it was added in 3.4, and you really should upgrade), you can use re.match() and omit the beginning-of-string marker, keeping \Z at the end.
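A quick way to see the three behaviours side by side (a sketch; the sample strings are the ones from the question):

```python
import re

trailing_newline = "42\n"

# $ also matches just before a final newline, so this search succeeds
dollar = bool(re.search(r"^(0|[1-9][0-9]*)$", trailing_newline))

# \Z matches only at the very end of the string, so the newline makes it fail
end_z = bool(re.search(r"^(0|[1-9][0-9]*)\Z", trailing_newline))

# fullmatch must consume the whole string, so no anchors are needed
full = bool(re.fullmatch(r"0|[1-9][0-9]*", trailing_newline))

print(dollar, end_z, full)  # True False False
print(bool(re.fullmatch(r"0|[1-9][0-9]*", "42")))  # True
```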

The regex should be: regex = '\b^(0|[1-9][0-9]*)$\b'

The regex in the question matches start of line, digits and end of line, and the given string matches that; that's why it returns True. If you want it to return False when a number is present, you can use "!" to indicate NOT.
Refer to https://docs.python.org/2/library/re.html
regex = '!(0|[1-9][0-9]*)$'
bool(re.search(regex, '42\n'))  # returns False

Yes, $ matching just before a trailing \n is something of a trap and an inconsistency. Check out my list of regex traps for Python: http://www.cofoh.com/advanced-regex-tutorial-python/traps

How to get only searched word as a result python regex

How can I get only the text that matches my regex in Python? Everything I tried also prints the full line in which the string was found.
The regex is the following:
\b([1-9][0-9]{1,2})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\/([0-9]{1,2})\b
It matches an IP address plus a CIDR suffix (e.g. 12.0.0.0/8).
The text I am searching is as follows:
04/30","172.18.186.0/24","172.18.185.0/24","172.18.177.16/28","dwefwf-1.RI-nc_wefwfwefwefpat_intweb_fe","172.18.176.16/28","edefwfwf
t_pat_infwef_fe","172.18.178.16/28","dwefwefwef-wefwffwefwefwef_dr_efwefeb_fe","172.18.176.80/28","DSwefwfH2.
RI-nc_rat_dr_fweweb_fe","172.18.178.48/28","172.18.177.208/28","wefwef
wefwtfweapp_fe","172.18.176.208/28","wfwfwefwefwefH2.RI-nwefwefdr_app_fe","172.18.177.192/28","de1dfwwf-1.wefewf","172.18.176.1
92/28","

You should modify your regex as follows:
\b(([1-9][0-9]{1,2})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\/([0-9]{1,2}))\b
and then extract the first matched group: \1
Demo: http://repl.it/R0W/1 (It takes a while to run)
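Why the extra outer group helps: re.findall() returns whole matches only when the pattern has no groups; with one group it returns that group, and with several it returns a tuple of all groups per match. A sketch on an abbreviated excerpt of the question's text, showing both the outer-group approach and a variant that uses non-capturing (?:...) groups instead:

```python
import re

# Abbreviated excerpt of the question's text
text = '"172.18.186.0/24","172.18.185.0/24","dwefwf-1.RI-nc_wefwfwefwefpat_intweb_fe","172.18.177.16/28"'

# Outer group first: findall yields 6-tuples, and element 0 is the whole IP/CIDR
grouped = r'\b(([1-9][0-9]{1,2})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\/([0-9]{1,2}))\b'
from_groups = [groups[0] for groups in re.findall(grouped, text)]

# No capturing groups: findall yields the matched strings directly
plain = r'\b[1-9][0-9]{1,2}(?:\.[0-9]{1,3}){3}/[0-9]{1,2}\b'
whole = re.findall(plain, text)

print(from_groups)
print(whole)
```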

I think your regexp works correctly. If you want to get the matched string, use the group() method, like this:
import re
regexp = r'\b([1-9][0-9]{1,2})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\/([0-9]{1,2})\b'
text = '''04/30","172.18.186.0/24","172.18.185.0/24","172.18.177.16/28","dwefwf-1.RI-nc_wefwfwefwefpat_intweb_fe","172.18.176.16/28","edefwfwf
t_pat_infwef_fe","172.18.178.16/28","dwefwefwef-wefwffwefwefwef_dr_efwefeb_fe","172.18.176.80/28","DSwefwfH2.
RI-nc_rat_dr_fweweb_fe","172.18.178.48/28","172.18.177.208/28","wefwef
wefwtfweapp_fe","172.18.176.208/28","wfwfwefwefwefH2.RI-nwefwefdr_app_fe","172.18.177.192/28","de1dfwwf-1.wefewf","172.18.176.1
92/28","'''
for match in re.finditer(regexp, text):
    print(match.group(0))

python regular expression substitution with matched group

I'm trying to substitute the channel name in AndroidManifest.xml to batch-generate a group of channel APK packages for release. The line to change is
<meta-data android:value="CHANNEL_NAME_TO_BE_DETERMINED" android:name="UMENG_CHANNEL"/>
from an XML file.
The channel configs are saved in a config file, something like:
channel_name output_postfix valid
"androidmarket" "androidmarket" true
Here is what I tried:
manifest_original_xml_fh = open("../AndroidManifest_original.xml", "r")
manifest_xml_fh = open("../AndroidManifest.xml", "w")
pattern = re.compile('<meta-data\sandroid:value=\"(.*)\"\sandroid:name=\"UMENG_CHANNEL\".*')
for each_config_line in manifest_original_xml_fh:
each_config_line = re.sub(pattern, channel_name, each_config_line)
print each_config_line
It replaces the whole <meta-data android:value="CHANNEL_NAME_TO_BE_DETERMINED" android:name="UMENG_CHANNEL"/> with androidmarket, which is obviously not what I need. I then figured out that pattern.match(each_config_line) returns a match result, and one of its groups is "CHANNEL_NAME_TO_BE_DETERMINED". I've also tried passing a replacement function, but that still failed.
So, since I've successfully found the pattern, how can I replace only the matched group correctly?

I suggest a different approach: save your xml as a template, with placeholders to be replaced with standard Python string operations.
E.g.
AndroidManifest_template.xml:
<meta-data android:value="%(channel_name)s" android:name="UMENG_CHANNEL"/>
Python:
with open("../AndroidManifest_template.xml", "r") as template_fh, open("../AndroidManifest.xml", "w") as manifest_xml_fh:
    for each_config_line in template_fh:
        # fill the %(channel_name)s placeholder; channel_name comes from your config
        manifest_xml_fh.write(each_config_line % {'channel_name': channel_name})
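To check the substitution without touching any files, here is a sketch that renders a single template line in memory; the second channel name is made up for illustration. Keep in mind that with %-formatting any literal % in the template must be written as %%.

```python
template_line = '<meta-data android:value="%(channel_name)s" android:name="UMENG_CHANNEL"/>'

# "androidmarket" comes from the question's config; "example_store" is hypothetical
channels = ["androidmarket", "example_store"]
rendered = [template_line % {"channel_name": name} for name in channels]

for line in rendered:
    print(line)
```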

I think your misunderstanding is that everything that has been matched will be replaced. If you want to keep parts of the pattern, you have to capture them and reinsert them in the replacement string.
Or match only what you want to replace, by using lookaround assertions.
Try this:
pattern = re.compile(r'(?<=<meta-data\sandroid:value=")[^"]+')
for each_config_line in manifest_original_xml_fh:
    each_config_line = pattern.sub(channel_name, each_config_line)
(?<=<meta-data\sandroid:value=") is a positive lookbehind assertion: it ensures that this text comes immediately before the match but does not include it in the match (so it will not be replaced).
[^"]+ then matches one or more characters that are not a double quote.
See it here on Regexr
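A self-contained sketch of the lookbehind substitution on the question's sample line (channel_name is hard-coded here instead of being read from the config file):

```python
import re

line = '<meta-data android:value="CHANNEL_NAME_TO_BE_DETERMINED" android:name="UMENG_CHANNEL"/>'
channel_name = "androidmarket"

# The lookbehind requires the prefix but does not consume it,
# so only the attribute value itself is replaced
pattern = re.compile(r'(?<=<meta-data\sandroid:value=")[^"]+')
result = pattern.sub(channel_name, line)
print(result)
```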

To capture just the value of the meta-data tag, you need to change the regex:
<meta-data\sandroid:value=\"([^"]*)\"\sandroid:name=\"UMENG_CHANNEL\".*
Specifically I changed this part:
\"(.*)\" - this is a greedy match, so it will go ahead and match as many characters as possible as long as the rest of the expression matches
to
\"([^"]*)\" - which will match anything that's not the double quote. The matching result will still be in the first capturing group
If you want to do the replacement, a better idea might be to capture the parts you want to keep. I'm not a Python expert, but something like this would probably work:
re.sub(r'(<meta-data\sandroid:value=\")[^"]*(\"\sandroid:name=\"UMENG_CHANNEL\".*)'
, r'\1YourNewValue\2', s)
\1 is backreference 1, i.e. it inserts whatever the first capturing group matched (and \2 does the same for the second group).
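One caveat with \1 in the replacement: if the new value starts with a digit, those digits would be read as part of the escape rather than as literal text. The \g<1> form avoids that ambiguity. A sketch with a hypothetical channel name beginning with a digit:

```python
import re

line = '<meta-data android:value="CHANNEL_NAME_TO_BE_DETERMINED" android:name="UMENG_CHANNEL"/>'
channel_name = "360market"  # hypothetical channel name starting with a digit

# Group 1: everything before the value; group 2: everything after it
pattern = r'(<meta-data\sandroid:value=")[^"]*("\sandroid:name="UMENG_CHANNEL".*)'

# \g<1> and \g<2> cannot be confused with the digits that follow them
result = re.sub(pattern, r"\g<1>" + channel_name + r"\g<2>", line)
print(result)
```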

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.
