Regular expression in Python: combining variables with "or" (|)

I know p = re.compile('aaa|bbb') works, but I want to rewrite it using variables, something like:
A = 'aaa'
B = 'bbb'
p = re.compile(A|B)
but this doesn't work. How can I rewrite this so that variables are used (and it works)?

p=re.compile(A|B)
You are not doing the string concatenation correctly. What you are doing is applying the "bitwise or" (the pipe) operator to strings, which, of course, fails:
>>> 'aaa' | 'bbb'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for |: 'str' and 'str'
Instead, you can use str.join():
p = re.compile(r"|".join([A, B]))
Demo:
>>> A = 'aaa'
>>> B = 'bbb'
>>> r"|".join([A, B])
'aaa|bbb'
Also, make sure you trust the source of A and B (beware of regex injection attacks), and/or escape them properly with re.escape().
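A minimal sketch of the escaping advice, assuming hypothetical inputs A = 'a.c' and B = 'b+b' that happen to contain regex metacharacters. Without escaping, '.' and '+' keep their special meaning; mapping re.escape() over the variables makes each one match literally:

```python
import re

A = 'a.c'  # hypothetical input containing a regex metacharacter
B = 'b+b'  # hypothetical input containing a regex metacharacter

# Unescaped: the pattern is 'a.c|b+b', so '.' matches any character.
raw = re.compile("|".join([A, B]))

# Escaped: each variable is matched literally.
safe = re.compile("|".join(map(re.escape, [A, B])))

print(bool(raw.search('abc')))   # True, because '.' matched 'b'
print(bool(safe.search('abc')))  # False, only the literal 'a.c' would match
print(bool(safe.search('b+b')))  # True
```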

Related

Find a phrase in defined list with a file

I have a list named my = ['cbs is down', 'abnormal']
and I have opened a file in read mode.
Now I want to check whether any of the strings in the list exist in that file and, if so, perform an action.
fopen = open("test.txt","r")
my =['cbs is down', 'abnormal']
for line in fopen:
    if my in line:
        print ("down")
When I execute it, I get the following:
Traceback (most recent call last):
File "E:/python/fileread.py", line 4, in <module>
if my in line:
TypeError: 'in <string>' requires string as left operand, not list
This should work:
if any(i in line for i in my):
...
Basically you are going through my and checking whether any of its elements is present in line.
my = ['cbs is down', 'abnormal']
with open("test.txt", "r") as fopen:
    for line in fopen:
        for x in my:
            if x in line:
                print("down")
                break  # report each line at most once, even if it contains both terms
Sample input
Some text cbs is down
Yes, abnormal
not in my list
cbs is down
Output
down
down
down
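To try this without creating test.txt, here is a small sketch that assumes io.StringIO as a stand-in for the open file (both yield one line at a time). It runs the any() version over the sample input above and prints one down per matching line:

```python
import io

my = ['cbs is down', 'abnormal']

# io.StringIO stands in for open("test.txt", "r") in this sketch.
sample = io.StringIO(
    "Some text cbs is down\n"
    "Yes, abnormal\n"
    "not in my list\n"
    "cbs is down\n"
)

matches = []
for line in sample:
    # any() stops at the first term that is found in the line.
    if any(term in line for term in my):
        matches.append(line.rstrip("\n"))
        print("down")

print(matches)
```

This prints down three times, matching the output shown above, followed by the three matching lines.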
The reason for your error:
The in operator, as used in:
if my in line: ...
(where my is the left-hand operand and line is the right-hand operand) requires a string on the left-hand side whenever the right-hand side is a string (here, line). This check is implemented by the str.__contains__ method, which is called on the string on the right-hand side (see the CPython implementation). So the test above is the same as:
if line.__contains__(my): ...
You are, however, passing a list, my, instead of a string.
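A short sketch of that equivalence, using a hypothetical line from the sample file: a string on the left works with both spellings, while a list on the left raises the same TypeError as in the question:

```python
line = "Some text cbs is down"  # hypothetical line read from the file
my = ['cbs is down', 'abnormal']

# `x in line` is evaluated as line.__contains__(x), so both agree.
print('cbs is down' in line)             # True
print(line.__contains__('cbs is down'))  # True

try:
    my in line  # the left operand is a list, not a str
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```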
An easy way to resolve this is to check whether any of the items in the list is contained in the current line, using the built-in any function:
for line in fopen:
    if any(item in line for item in my):
        ...
Or, since you have just two items, use the or operator (pun unintended), which short-circuits in the same way as any:
for line in fopen:
    if 'cbs is down' in line or 'abnormal' in line:
        ...
You could also join the terms in my to a regular expression like \b(cbs is down|abnormal)\b and use re.findall or re.search to find the terms. This way, you can also enclose the pattern in word-boundaries \b...\b so it does not match parts of longer words, and you also see which term was matched, and where.
>>> import re
>>> my = ['cbs is down', 'abnormal']
>>> line = "notacbs is downright abnormal"
>>> p = re.compile(r"\b(" + "|".join(map(re.escape, my)) + r")\b")
>>> p.findall(line)
['abnormal']
>>> p.search(line).span()
(21, 29)

python regular expression error

I am trying to do a pattern search and, if it matches, set a bit in a bitarray at the counter value.
runOutput = device[router].execute (cmd)
runOutput = output.split('\n')
print(runOutput)
for this_line,counter in enumerate(runOutput):
    print(counter)
    if re.search(r'dev_router', this_line) :
        #want to use the counter to set something
Getting the following error:
if re.search(r'dev_router', this_line) :
2016-07-15T16:27:13: %ERROR: File "/auto/pysw/cel55/python/3.4.1/lib/python3.4/re.py", line 166, in search
2016-07-15T16:27:13: %-ERROR: return _compile(pattern, flags).search(string)
2016-07-15T16:27:13: %-ERROR: TypeError: expected string or buffer
You mixed up the arguments for enumerate() - first goes the index, then the item itself. Replace:
for this_line,counter in enumerate(runOutput):
with:
for counter, this_line in enumerate(runOutput):
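A minimal sketch of the corrected loop, under two assumptions: the command output is a hypothetical hard-coded string (in the question it comes from device[router].execute(cmd)), and a plain list of booleans stands in for the bitarray, since bitarray is a third-party package:

```python
import re

# Hypothetical command output, split into lines as in the question.
run_output = "interface up\ndev_router ready\nidle\ndev_router busy".split("\n")

# A list of booleans stands in for the bitarray in this sketch.
flags = [False] * len(run_output)

# enumerate() yields (index, item) pairs, so the index comes first.
for counter, this_line in enumerate(run_output):
    if re.search(r'dev_router', this_line):
        flags[counter] = True

print(flags)  # [False, True, False, True]
```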
You are getting a TypeError in this case because this_line is an integer and re.search() expects a string as a second argument. To demonstrate:
>>> import re
>>>
>>> this_line = 0
>>> re.search(r'dev_router', this_line)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/user/.virtualenvs/so/lib/python2.7/re.py", line 146, in search
return _compile(pattern, flags).search(string)
TypeError: expected string or buffer
By the way, modern IDEs like PyCharm can detect this kind of problem statically.
[Screenshot: PyCharm flagging the argument type; Python 3.5 was used for the screenshot]

Python doesn't allow me to do match.group() with regex?

I wrote a regex in Python to get just the digits from a string. However, when I run match.group(), it says that a 'list' object has no attribute 'group'. What am I doing wrong? Below is my code as pasted into the terminal, along with the terminal's response. Thanks.
>>> #import regex library
... import re
>>>
>>> #use a regex to just get the numbers -- not the rest of the string
... matcht = re.findall(r'\d', dwtunl)
>>> matchb = re.findall(r'\d', ambaas)
>>> #macht = re.search(r'\d\d', dwtunl)
...
>>> #just a test to see about my regex
... print matcht.group()
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
AttributeError: 'list' object has no attribute 'group'
>>> print matchb.group()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'list' object has no attribute 'group'
>>>
>>> #start defining the final variables
... if dwtunl == "No Delay":
...     dwtunnl = 10
... else:
...     dwtunnl = matcht.group()
...
Traceback (most recent call last):
File "<stdin>", line 5, in <module>
AttributeError: 'list' object has no attribute 'group'
>>> if ambaas == "No Delay":
...     ammbaas = 10
... else:
...     ammbaas = matchb.group()
...
Traceback (most recent call last):
File "<stdin>", line 4, in <module>
AttributeError: 'list' object has no attribute 'group'
re.findall() doesn't return a match object (or a list of them); it always returns a list of strings (or a list of tuples of strings when the pattern has more than one capturing group). A list doesn't have a .group() method.
>>> import re
>>> regex = re.compile(r"(\w)(\W)")
>>> regex.findall("A/1$5&")
[('A', '/'), ('1', '$'), ('5', '&')]
re.finditer() will return an iterator that yields one match object per match.
>>> for match in regex.finditer("A/1$5&"):
...     print match.group(1), match.group(2)
...
A /
1 $
5 &
It fails because re.findall(r'\d', ambaas) returns a list. You can iterate over the list, like:
for digit in matchb:
    print(digit)
Or just use matchb[0].
re.findall() returns a list of strings, not a match object. You might mean re.finditer.
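A sketch of the re.search() route for the question's "No Delay" logic. The helper name parse_delay and the sample string are hypothetical (the real values of dwtunl and ambaas are not shown), and converting the digits with int() is an assumption; the question kept the matched text as a string. Using \d+ instead of \d captures a multi-digit number as one match:

```python
import re

def parse_delay(text):
    """Hypothetical helper mirroring the question's dwtunl/ambaas logic."""
    if text == "No Delay":
        return 10
    # search() returns a match object (or None); match objects have .group().
    match = re.search(r'\d+', text)
    return int(match.group()) if match else None

sample = "Delay: 15 minutes"  # hypothetical value

print(re.findall(r'\d', sample))  # ['1', '5'], a plain list with no .group()
print(parse_delay(sample))        # 15
print(parse_delay("No Delay"))    # 10
```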

Shorthand for defining PyParsing field names

I have a few pyparsing tokens defined as follows:
field = Word(alphas + "_").setName("field")
Is there really no shorthand for this?
Furthermore, this does not seem to work: the dictionary returned by expression.parseString() is always empty.
You are confusing setName and setResultsName. setName assigns a name to the expression so that exception messages are more meaningful. Compare:
>>> integer1 = Word(nums)
>>> integer1.parseString('x')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "c:\python26\lib\site-packages\pyparsing-1.5.6-py2.6.egg\pyparsing.py", line 1032, in parseString
raise exc
pyparsing.ParseException: Expected W:(0123...) (at char 0), (line:1, col:1)
and:
>>> integer2 = Word(nums).setName("integer")
>>> integer2.parseString('x')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "c:\python26\lib\site-packages\pyparsing-1.5.6-py2.6.egg\pyparsing.py", line 1032, in parseString
raise exc
pyparsing.ParseException: Expected integer (at char 0), (line:1, col:1)
setName gives a name to the expression itself.
setResultsName on the other hand gives a name to the parsed data that is returned, like named fields in a regex.
>>> integer = Word(nums).setName("integer")
>>> expr = integer.setResultsName('age') + integer.setResultsName('credits')
>>> data = expr.parseString('20 110')
>>> print data.dump()
['20', '110']
- age: 20
- credits: 110
And as @Kimvais has mentioned, there is a shortcut for setResultsName:
>>> expr = integer('age') + integer('credits')
Note also that setResultsName returns a copy of the expression; that is what makes it possible to use the same expression multiple times with different names.
field = Word(alphas + "_")("field")
seems to work.

In what order does Python resolve functions? (why does string.join(lst.append('a')) fail?)

How does string.join resolve? I tried using it as below:
import string
list_of_str = ['a','b','c']
string.join(list_of_str.append('d'))
But got this error instead (exactly the same error in 2.7.2):
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.6/string.py", line 318, in join
return sep.join(words)
TypeError
The append does happen, as you can see if you try to join list_of_str again:
print string.join(list_of_str)
-->'a b c d'
Here's the code from string.py (I couldn't find the code for the builtin str.join() for sep):
def join(words, sep = ' '):
    """join(list [,sep]) -> string
    Return a string composed of the words in list, with
    intervening occurrences of sep. The default separator is a
    single space.
    (joinfields and join are synonymous)
    """
    return sep.join(words)
What's going on here? Is this a bug? If it's expected behavior, how does it resolve/why does it happen? I feel like I'm either about to learn something interesting about the order in which python executes its functions/methods OR I've just hit a historical quirk of Python.
Sidenote: of course it works to just do the append beforehand:
list_of_str.append('d')
print string.join(list_of_str)
-->'a b c d'
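list_of_str.append('d')
does not return the new list_of_str.
The method append modifies the list in place and has no return value, so it returns None. Python evaluates the argument list_of_str.append('d') before calling string.join, so string.join actually receives None. The append itself still happens, which is why the list contains 'd' afterwards.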
To make it work you can do this:
>>> import string
>>> list_of_str = ['a','b','c']
>>> string.join(list_of_str + ['d'])
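That said, there is no need to import string (its join function was removed in Python 3); calling join on the separator string is more Pythonic. Using ' ' as the separator matches string.join's default:
>>> list_of_str = ['a','b','c']
>>> ' '.join(list_of_str + ['d'])
'a b c d'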
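A small Python 3 sketch of what is going on (assuming Python 3, where string.join no longer exists, so str.join stands in for it): append() returns None, joining None raises a TypeError, and concatenating a new list keeps everything in one expression:

```python
list_of_str = ['a', 'b', 'c']

# append() mutates the list in place and returns None.
result = list_of_str.append('d')
print(result)       # None
print(list_of_str)  # ['a', 'b', 'c', 'd']

# join(list_of_str.append(...)) therefore receives None, which is not iterable.
try:
    ' '.join(result)
except TypeError as exc:
    print(type(exc).__name__)  # TypeError

# Building a new list with + avoids the None entirely.
others = ['a', 'b', 'c']
joined = ' '.join(others + ['d'])
print(joined)  # a b c d
```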
