How to use regular expressions in "in" python function - python

I'm ussing Python 'in' function to check if a little string is inside another bigger string. The problem is what next:
word1 = 'log'
word2 = 'log_enable'
string = ['parent_log', 'parent_log_enable']
for e in string:
if word1 in e:
print 'here we have the word'
So, obviously, "here we have the word" will be printted twice. I want to know if a regular expression can be used in 'in' function and, in this case, which one should I use to get the correct output?
Thanks, regards.
Mike.

You can't use a regular expression with in. Just use the module directly to replace in:
import re
pattern = re.compile(r'log(?:_enable)?')
for e in string:
if pattern.search(e):
print 'here we have the word'
Here the pattern checks for log optionally followed by _enable.

You can't use in by itself to achieve what you want.
If you wanted to use in because it's more succinct, list comprehensions may be what you want.
I'll build on Martijn's answer
import re
p = re.compile(r'log(?:_enable)?')
# Check if any match
any(p.search(s) for s in ['parent_log', 'parent_log_enable'])
# Result: True
# Get all matches
[s for s in ['parent_log', 'parent_log_enable'] if p.search(s)]
# Result: ['parent_log', 'parent_log_enable']

Related

Getting word from string

How can i get word example from such string:
str = "http://test-example:123/wd/hub"
I write something like that
print(str[10:str.rfind(':')])
but it doesn't work right, if string will be like
"http://tests-example:123/wd/hub"
You can use this regex to capture the value preceded by - and followed by : using lookarounds
(?<=-).+(?=:)
Regex Demo
Python code,
import re
str = "http://test-example:123/wd/hub"
print(re.search(r'(?<=-).+(?=:)', str).group())
Outputs,
example
Non-regex way to get the same is using these two splits,
str = "http://test-example:123/wd/hub"
print(str.split(':')[1].split('-')[1])
Prints,
example
You can use following non-regex because you know example is a 7 letter word:
s.split('-')[1][:7]
For any arbitrary word, that would change to:
s.split('-')[1].split(':')[0]
many ways
using splitting:
example_str = str.split('-')[-1].split(':')[0]
This is fragile, and could break if there are more hyphens or colons in the string.
using regex:
import re
pattern = re.compile(r'-(.*):')
example_str = pattern.search(str).group(1)
This still expects a particular format, but is more easily adaptable (if you know how to write regexes).
I am not sure why do you want to get a particular word from a string. I guess you wanted to see if this word is available in given string.
if that is the case, below code can be used.
import re
str1 = "http://tests-example:123/wd/hub"
matched = re.findall('example',str1)
Split on the -, and then on :
s = "http://test-example:123/wd/hub"
print(s.split('-')[1].split(':')[0])
#example
using re
import re
text = "http://test-example:123/wd/hub"
m = re.search('(?<=-).+(?=:)', text)
if m:
print(m.group())
Python strings has built-in function find:
a="http://test-example:123/wd/hub"
b="http://test-exaaaample:123/wd/hub"
print(a.find('example'))
print(b.find('example'))
will return:
12
-1
It is the index of found substring. If it equals to -1, the substring is not found in string. You can also use in keyword:
'example' in 'http://test-example:123/wd/hub'
True

I need help formulating a specific regex

I do not consider myself a newbie in regex, but I seem to have found a problem that stumped me (it's also Friday evening, so brain not at peak performance).
I am trying to substitute a place-holder inside a string with some other value. I am having great difficulty getting a syntax that behaves the way I want.
My place-holder has this format: {swap}
I want it to capture and replace these:
{swap} # NewValue
x{swap}x # xNewValuex
{swap}x # NewValuex
x{swap} # xNewValue
But I want it to NOT match these:
{{swap}} # NOT {NewValue}
x{{swap}}x # NOT x{NewValue}x
{{swap}}x # NOT {NewValue}x
x{{swap}} # NOT x{NewValue}
In all of the above, x can be any string, of any length, be it "word" or not.
I'm trying to do this using python3's re.sub() but anytime I satisfy one subset of criteria I lose another in the process. I'm starting to think it might not be possible to do in a single command.
Cheers!
If you're able to use the newer regex module, you can use (*SKIP)(*FAIL):
{{.*?}}(*SKIP)(*FAIL)|{.*?}
See a demo on regex101.com.
Broken down, this says:
{{.*?}}(*SKIP)(*FAIL) # match any {{...}} and "throw them away"
| # or ...
{.*?} # match your desired pattern
In Python this would be:
import regex as re
rx = re.compile(r'{{.*?}}(*SKIP)(*FAIL)|{.*?}')
string = """
{swap}
x{swap}x
{swap}x
x{swap}
{{swap}}
x{{swap}}x
{{swap}}x
x{{swap}}"""
string = rx.sub('NewValue', string)
print(string)
This yields:
NewValue
xNewValuex
NewValuex
xNewValue
{{swap}}
x{{swap}}x
{{swap}}x
x{{swap}}
For the sake of completeness, you can also achieve this with Python's own re module but here, you'll need a slightly adjusted pattern as well as a replacement function:
import re
rx = re.compile(r'{{.*?}}|({.*?})')
string = """
{swap}
x{swap}x
{swap}x
x{swap}
{{swap}}
x{{swap}}x
{{swap}}x
x{{swap}}"""
def repl(match):
if match.group(1) is not None:
return "NewValue"
else:
return match.group(0)
string = rx.sub(repl, string)
print(string)
Use negative lookahead and lookbehind:
s1 = "x{swap}x"
s2 = "x{{swap}}x"
pattern = r"(?<!\{)\{[^}]+\}(?!})"
re.sub(pattern, "foo", s1)
#'xfoox'
re.sub(pattern, "foo", s2)
#'x{{swap}}x'

simple regex pattern not matching [duplicate]

>>> import re
>>> s = 'this is a test'
>>> reg1 = re.compile('test$')
>>> match1 = reg1.match(s)
>>> print match1
None
in Kiki that matches the test at the end of the s. What do I miss? (I tried re.compile(r'test$') as well)
Use
match1 = reg1.search(s)
instead. The match function only matches at the start of the string ... see the documentation here:
Python offers two different primitive operations based on regular expressions: re.match() checks for a match only at the beginning of the string, while re.search() checks for a match anywhere in the string (this is what Perl does by default).
Your regex does not match the full string. You can use search instead as Useless mentioned, or you can change your regex to match the full string:
'^this is a test$'
Or somewhat harder to read but somewhat less useless:
'^t[^t]*test$'
It depends on what you're trying to do.
It's because of that match method returns None if it couldn't find expected pattern, if it find the pattern it would return an object with type of _sre.SRE_match .
So, if you want Boolean (True or False) result from match you must check the result is None or not!
You could examine texts are matched or not somehow like this:
string_to_evaluate = "Your text that needs to be examined"
expected_pattern = "pattern"
if re.match(expected_pattern, string_to_evaluate) is not None:
print("The text is as you expected!")
else:
print("The text is not as you expected!")

Regex Expression not matching correctly

I'm tackling a python challenge problem to find a block of text in the format xXXXxXXXx (lower vs upper case, not all X's) in a chunk like this:
jdskvSJNDfbSJneSfnJDKoJIWhsjnfakjn
I have tested the following RegEx and found it correctly matches what I am looking for from this site (http://www.regexr.com/):
'([a-z])([A-Z]){3}([a-z])([A-Z]){3}([a-z])'
However, when I try to match this expression to the block of text, it just returns the entire string:
In [1]: import re
In [2]: example = 'jdskvSJNDfbSJneSfnJDKoJIWhsjnfakjn'
In [3]: expression = re.compile(r'([a-z])([A-Z]){3}([a-z])([A-Z]){3}([a-z])')
In [4]: found = expression.search(example)
In [5]: print found.string
jdskvSJNDfbSJneSfnJDKoJIWhsjnfakjn
Any ideas? Is my expression incorrect? Also, if there is a simpler way to represent that expression, feel free to let me know. I'm fairly new to RegEx.
You need to return the match group instead of the string attribute.
>>> import re
>>> s = 'jdskvSJNDfbSJneSfnJDKoJIWhsjnfakjn'
>>> rgx = re.compile(r'[a-z][A-Z]{3}[a-z][A-Z]{3}[a-z]')
>>> found = rgx.search(s).group()
>>> print found
nJDKoJIWh
The string attribute always returns the string passed as input to the match. This is clearly documented:
string
The string passed to match() or search().
The problem has nothing to do with the matching, you're just grabbing the wrong thing from the match object. Use match.group(0) (or match.group()).
Based on xXXXxXXXx if you want upper letters with len 3 and lower with len 1 between them this is what you want :
([a-z])(([A-Z]){3}([a-z]))+
also you can get your search function with group()
print expression.search(example).group(0)

python regular expression substitute

I need to find the value of "taxid" in a large number of strings similar to one given below. For this particular string, the 'taxid' value is '9606'. I need to discard everything else. The "taxid" may appear anywhere in the text, but will always be followed by a ":" and then number.
score:0.86|taxid:9606(Human)|intact:EBI-999900
How to write regular expression for this in python.
>>> import re
>>> s = 'score:0.86|taxid:9606(Human)|intact:EBI-999900'
>>> re.search(r'taxid:(\d+)', s).group(1)
'9606'
If there are multiple taxids, use re.findall, which returns a list of all matches:
>>> re.findall(r'taxid:(\d+)', s)
['9606']
for line in lines:
match = re.match(".*\|taxid:([^|]+)\|.*",line)
print match.groups()

Categories