using OR operator (|) in variable for regular expression in python - python

I need to match against a list of string values. I'm using '|'.join() to build a sting that is passed into re.match:
import re
line='GigabitEthernet0/1 is up, line protocol is up'
interfacenames=[
'Loopback',
'GigabitEthernet'
]
rex="r'" + '|'.join(interfacenames) + "'"
print rex
interface=re.match(rex,line)
print interface
The code result is:
r'Loopback|GigabitEthernet'
None
However if I copy past the string directly into match:
interface=re.match(r'Loopback|GigabitEthernet',line)
It works:
r'Loopback|GigabitEthernet'
<_sre.SRE_Match object at 0x7fcdaf2f4718>
I did try to replace .join with actual "Loopback|GigabitEthernet" in rex and it didn't work either. It looks like the pipe symbol is not treated as operator when passed from string.
Any thoughts how to fix it?

You use the r' prefix as a part of a string literal. This is how it could be used:
rex=r'|'.join(interfacenames)
See the Python demo
If the interfacenames may contain special regex metacharacters, escape the values like this:
rex=r'|'.join([re.escape(x) for x in interfacenames])
Also, if you plan to match the strings not only at the start of the string, use re.search rather than re.match. See What is the difference between Python's re.search and re.match?

You don't need to put "r'" at the beginning and "'". That's part of the syntax for literal raw strings, it's not part of the string itself.
rex = '|'.join(interfacenames)

Related

force re.search to include # and $

I am trying to get a substring between two markers using re in Python, for example:
import re
test_str = "#$ -N model_simulation 2022"
# these two lines work
# the output is: model_simulation
print(re.search("-N(.*)2022",test_str).group(1))
print(re.search(" -N(.*)2022",test_str).group(1))
# these two lines give the error: 'NoneType' object has no attribute 'group'
print(re.search("$ -N(.*)2022",test_str).group(1))
print(re.search("#$ -N(.*)2022",test_str).group(1))
I read the documentation of re here. It says that "#" is intentionally ignored so that the outputs look neater.
But in my case, I do need to include "#" and "$". I need them to identify the part of the string that I want, because the "-N" is not unique in my entire text string for real work.
Is there a way to force re to include those? Or is there a different way without using re?
Thanks.
You can escape both with \, for example,
print(re.search("\#\$ -N(.*)2022",test_str).group(1))
# output model_simulation
You can get rid of the special meaning by using the backslash prefix: $. This way, you can match the dollar symbol in a given string
# add backslash before # and $
# the output is: model_simulation
print(re.search("\$ -N(.*)2022",test_str).group(1))
print(re.search("\#\$ -N(.*)2022",test_str).group(1))
In regular expressions, $ signals the end of the string. So 'foo' would match foo anywhere in the string, but 'foo$' only matches foo if it appears at the end. To solve this, you need to escape it by prefixing it with a backslash. That way it will match a literal $ character
# is only the start of a comment in verbose mode using re.VERBOSE (which also ignores spaces), otherwise it just matches a literal #.
In general, it is also good practice to use raw string literals for regular expressions (r'foo'), which means Python will let backslashes alone so it doesn't conflict with regular expressions (that way you don't have to type \\\\ to match a single backslash \).
Instead of re.search, it looks like you actually want re.fullmatch, which matches only if the whole string matches.
So I would write your code like this:
print(re.search(r"\$ -N(.*)2022", test_str).group(1)) # This one would not work with fullmatch, because it doesn't match at the start
print(re.fullmatch(r"#\$ -N(.*)2022", test_str).group(1))
In a comment you mentioned that the string you need to match changes all the time. In that case, re.escape may prove useful.
Example:
prefix = '#$ - N'
postfix = '2022'
print(re.fullmatch(re.escape(prefix) + '(.*)' + re.escape(postfix), tst_str).group(1))

how can I substitute a matched string in python

I have a string ="/One/Two/Three/Four"
I want to convert it to ="Four"
I can do this in one line in perl
string =~ s/.*+\///g
How Can I do this in python?
str_name="/One/Two/Three/Four"
str_name.split('/')[-1]
In general, split is a safe way to convert a string into a list based on some reg-ex. Then, we can call the last element in that list, which happens to be "Four" in this case.
Hope this helps.
Python's re module can handle regular expressions. For this case, you'd do
import re
my_str = "/One/Two/Three/Four"
new_str = re.sub(".*/", "", my_str)
# 'Four'
re.sub() is the regex replacement method. Like your perl regex, we simply look for any number of characters, followed by a slash, and then replace that with the empty string. What's left is what's after the last slash, which is 4.
The are alot of possibilities to solve this. One way would be by indexing the string. Other string method can be found here
string ="/One/Two/Three/Four"
string[string.index('Four'):]
Additionally you could split the string by the slash with .split('/')
print(string.split('/')[-1])
Another option would be regular expressions: see here

How to find a specific character in a string and put it at the end of the string

I have this string:
'Is?"they'
I want to find the question mark (?) in the string, and put it at the end of the string. The output should look like this:
'Is"they?'
I am using the following regular expression in python 2.7. I don't know why my regex is not working.
import re
regs = re.sub('(\w*)(\?)(\w*)', '\\1\\3\\2', 'Is?"they')
print regs
Is?"they # this is the output of my regex.
Your regex doesn't match because " is not in the \w character class. You would need to change it to something like:
regs = re.sub('(\w*)(\?)([^"\w]*)', '\\1\\3\\2', 'Is?"they')
As shown here, " is not captured by \w. Hence, it would probably be best to just use a .:
>>> import re
>>> re.sub("(.*)(\?)(.*)", r'\1\3\2', 'Is?"they')
'Is"they?'
>>>
. captures anything/everything in Regex (except newlines).
Also, you'll notice that I used a raw-string for the second argument of re.sub. Doing so is cleaner than having all those backslashes.

Python regular expression with string in it

I would like to match a string with something like:
re.match(r'<some_match_symbols><my_match><some_other_match_symbols>', mystring)
where mymatch is a string I would like it to find. The problem is that it may be different from time to time, and it is stored in a variable. Would it be possible to add one variable to a regex?
Nothing prevents you from simply doing this:
re.match('<some_match_symbols>' + '<my_match>' + '<some_other_match_symbols>', mystring)
Regular expressions are nothing else than a string containing some special characters, specific to the regular expression syntax. But they are still strings, so you can do whatever you are used to do with strings.
The r'…' syntax is btw. a raw string syntax which basically just prevents any escape sequences inside the string from being evaluated. So r'\n' will be the same as '\\n', a string containing a backslash and an n; while '\n' contain a line break.
import re
url = "www.dupe.com"
expression = re.compile('<p>%s</p>'%url)
result = expression.match("<p>www.dupe.com</p>BBB")
if result:
print result.start(), result.end()
The r'' notation is for constants. Use the re library to compile from variables.

Regular Expression Not matching the value

I have a file saving IP addresses to names in format
<<%#$192.168.8.40$#% %##Name_of_person##% >>
I read This file and now want to extract the list using pythons regular expressions
list=re.findall("<<%#$(\S+)$#%\s%##(\w+\s*\w*)##%\s>>",ace)
print list
But the list is always an empty list..
can anyone tell me where is the mistake in the regular expression
edit-ace is the variable saving the contents read from the file
$ is a special character in regular expressions, meaning "end of line" (or "end of string", depending on the flavour). Your regex has other characters following the $, and as such only matches strings that have those characters after the end, which is impossible.
You will need to escape the $, like so: \$
I would suggest the following regular expression (formatted as a raw string since you are using Python):
r"<<%#\$([^$]+)\$#%\s%##([^#]+)##%\s>>"
That is, <<%#$, then one or more non-$ characters, $#%, a whitespace character, %##, one or more non-# characters, ##%, whitespace, >>.
Something like:
text = '<<%#$192.168.8.40$#% %##Name_of_person##% >>'
ip, name = [el[1] for el in re.findall(r'%#(.)(.+?)\1#%', text)]
If you can get any with just splitting on '#' and '$' then...
from itertools import itemgetter
ip, name = itemgetter(1, 3)(re.split(r'[#\$]', text))
You could also just use built-in string functions:
tmp = text.split('$')
ip, name = tmp[1], tmp[2].split('#')[1]
u use a invalid regex pattern.
you may use
r"<\%#\$(\S+)\$#\%\s\%##(\w+\s*\w*)##\%\s>>" replace
"<<%#$(\S+)$#%\s%##(\w+\s*\w*)##%\s>>" in fandall method
good luck~!

Categories