invalid expression sre_constants.error: nothing to repeat - python

I am trying to match the word that follows the * in the output variable. I tried the following, but I am running into an error. How can I fix it?
import re
output = """test
* Peace
master"""
m = re.search('* (\w+)', output)
print m.group(0)
Error:
Traceback (most recent call last):
File "testinglogic.py", line 7, in <module>
m = re.search('* (\w+)', output)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 146, in search
return _compile(pattern, flags).search(string)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 251, in _compile
raise error, v # invalid expression
sre_constants.error: nothing to repeat

The first fix is to escape the * with a backslash (\*). An unescaped * is a quantifier meaning "repeat the preceding item", and at the start of the pattern there is nothing before it to repeat, hence the error. Escaping it makes the engine treat it as a literal asterisk.
Alternatively, use a lookbehind so that you don't need a capture group at all:
>>> re.search(r'(?<=\*\s)\w+', output).group()
'Peace'
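As a minimal sketch of the first fix (escaping the asterisk), using the output string from the question. The raw-string prefix is an addition here, so that the backslashes reach the regex engine unchanged:

```python
import re

output = """test
* Peace
master"""

# \* matches a literal asterisk; (\w+) captures the word that follows it.
m = re.search(r'\* (\w+)', output)

# group(0) is the whole match ("* Peace"); group(1) is just the captured word.
print(m.group(1))
```

Note that the question prints group(0), which would include the asterisk and the space; group(1) isolates the word.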

Related

Importing Python's Blessed library causes a regex error?

I wanted to learn about Python's Blessed library, maybe to make a text-based game or something useful, but no matter what code I write, whenever I import blessed I get an error report.
I've tried various code, including complete examples that have worked for other people, but it's always the same error.
Here is some simple code, since the content doesn't seem to matter:
from blessed import Terminal
t = Terminal()
print('Code that does nothing.')
This is the error:
Traceback (most recent call last):
File "/usr/lib/python3.7/sre_parse.py", line 1015, in parse_template
this = chr(ESCAPES[this][1])
KeyError: '\\d'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/drew/TBG/test.py", line 3, in <module>
t = blessed.Terminal()
File "/usr/lib/python3/dist-packages/blessed/terminal.py", line 226, in __init__
self.__init__capabilities()
File "/usr/lib/python3/dist-packages/blessed/terminal.py", line 244, in __init__capabilities
name, cap, attribute, **kwds)
File "/usr/lib/python3/dist-packages/blessed/sequences.py", line 134, in build
pattern = re.sub(r'\d+', _numeric_regex, _outp)
File "/usr/lib/python3.7/re.py", line 194, in sub
return _compile(pattern, flags).sub(repl, string, count)
File "/usr/lib/python3.7/re.py", line 311, in _subx
template = _compile_repl(template, pattern)
File "/usr/lib/python3.7/re.py", line 302, in _compile_repl
return sre_parse.parse_template(repl, pattern)
File "/usr/lib/python3.7/sre_parse.py", line 1018, in parse_template
raise s.error('bad escape %s' % this, len(this))
re.error: bad escape \d at position 0
I've tried googling, obviously, but found no one having this issue with blessed, just other escape code errors which I don't have the knowledge to apply to this instance. I tried following the exception through those files, but their contents are beyond my level (and I don't have permission to change them anyway, which is for the best).
I'm running Ubuntu 14.04, with Python 3.7.8 as the default and blessed 1.17.8 installed.
Please help me determine what's causing this error.
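The last frames of the traceback point at the replacement argument of re.sub rather than at the pattern. The following is a minimal sketch, not Blessed's actual code, that reproduces the same message on Python 3.7 or later, assuming the replacement is a string containing \d. From Python 3.7 onward, an unknown escape made of a backslash and an ASCII letter in a replacement string is an error. Passing a function as the replacement avoids template parsing entirely. The input text is hypothetical.

```python
import re

text = "move to column 42"  # hypothetical input, for illustration only

# A string replacement is parsed as a template, so "\d" is rejected on 3.7+.
try:
    re.sub(r'\d+', r'\d+', text)
    error_message = None
except re.error as exc:
    error_message = str(exc)
print(error_message)

# A function's return value is inserted verbatim, with no escape processing.
result = re.sub(r'\d+', lambda match: r'\d+', text)
print(result)
```

The paths in the traceback (/usr/lib/python3/dist-packages/blessed) may indicate that the copy being imported is an older system package rather than blessed 1.17.8. That is only a guess based on the paths shown.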

Why does the same Python regex pattern work on a single line but not across multiple lines?

import re
regex =re.compile('''
((.*\n){2}
Cannot display: file marked as a binary type.\n
(.*\n){1})
''', re.X)
The above code throws an error:
Traceback (most recent call last):
File "/test.py", line 8, in <module>
''', re.X)
File "/usr/lib64/python2.7/re.py", line 190, in compile
return _compile(pattern, flags)
File "/usr/lib64/python2.7/re.py", line 242, in _compile
raise error, v # invalid expression
sre_constants.error: nothing to repeat
whereas writing the regex on a single line works fine, with no error:
regex = re.compile('((.*\n){2}Cannot display: file marked as a binary type.\n(.*\n){1})')
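A likely explanation: with re.X (verbose mode), unescaped whitespace in the pattern is ignored, and because the triple-quoted string is not raw, each \n has already become a real newline before the regex engine sees it. The group (.*\n){2} then appears to collapse to (.*){2}, which Python 2.7 rejects with "nothing to repeat". The spaces in the literal sentence are dropped for the same reason. A minimal sketch of one workaround, assuming a raw string and bracketed spaces (the escaped dot after "type" and the sample text are additions for illustration):

```python
import re

# Raw string: \n stays as a two-character regex escape, not a real newline.
# [ ] keeps each space significant even though re.X ignores bare whitespace.
regex = re.compile(r'''
    ((.*\n){2}
    Cannot[ ]display:[ ]file[ ]marked[ ]as[ ]a[ ]binary[ ]type\.\n
    (.*\n){1})
''', re.X)

# Hypothetical sample input with two lines before and one line after.
sample = (
    "line one\n"
    "line two\n"
    "Cannot display: file marked as a binary type.\n"
    "line four\n"
)

match = regex.search(sample)
print(match.group(1) == sample)
```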

How to search for text string in executable output with python?

I'm trying to create a Python script to auto-update a program for me. When I run program.exe --help, it gives a long output that contains a string of the form "Version: X.X.X". How can I make a script that runs the command and isolates the version number from the executable's output?
I should have mentioned that I tried the following:
import re
import subprocess
regex = r'Version: ([\d\.]+)'
match = re.search(regex, subprocess.run(["program.exe", "--help"]))
print((match.group(0)))
and got the error:
Traceback (most recent call last):
File "run.py", line 6, in <module>
match = re.search(regex, subprocess.run(["program.exe", "--help"]))
File "C:\Python37\lib\re.py", line 183, in search
return _compile(pattern, flags).search(string)
TypeError: expected string or bytes-like object
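subprocess.run returns a CompletedProcess object rather than the program's output, which is why re.search raises the TypeError. Something like this should work, since check_output returns the output as bytes, which are then decoded to a string: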
re.search(r'Version: ([\d\.]+)', subprocess.check_output(['program.exe', '--help']).decode()).group(1)
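A slightly longer sketch of the same idea for Python 3.7+, using subprocess.run with captured output and handling the case where no version string is found. The helper name extract_version is hypothetical, and the demo command uses the Python interpreter as a stand-in for program.exe so that the example runs anywhere:

```python
import re
import subprocess
import sys


def extract_version(command):
    """Run command and return the X.X.X part of 'Version: X.X.X', or None."""
    completed = subprocess.run(command, capture_output=True, text=True)
    # The text lives on the CompletedProcess object, not in the object itself.
    match = re.search(r'Version: ([\d.]+)', completed.stdout)
    return match.group(1) if match else None


# Stand-in for ["program.exe", "--help"]: prints some help-like text.
demo_command = [
    sys.executable,
    "-c",
    "print('Usage: program [options]'); print('Version: 1.2.3')",
]
print(extract_version(demo_command))
```

group(1) returns only the captured number, whereas group(0), as used in the question, would include the "Version: " prefix.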

Python Regex Unmatched Groups error with multiple patterns

So I was trying to answer a question on SO when I ran into this issue. Basically a user had the following string:
Adobe.Flash.Player.14.00.125.ie
and wanted to replace it with
Adobe Flash Player 14.00.125 ie
so I used the following re.sub call to solve this issue:
re.sub("([a-zA-Z])\.([a-zA-Z0-9])",r"\1 \2",str)
I then realized that doesn't remove the dot between 125 and ie so I figured I'd try to match another pattern namely:
re.sub("([a-zA-Z])\.([a-zA-Z0-9])|([0-9])\.([a-zA-Z])",r"\1\3 \2\4",str)
When I try to run this, I get the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.6/re.py", line 151, in sub
return _compile(pattern, 0).sub(repl, string, count)
File "/usr/lib64/python2.6/re.py", line 278, in filter
return sre_parse.expand_template(template, match)
File "/usr/lib64/python2.6/sre_parse.py", line 793, in expand_template
raise error, "unmatched group"
sre_constants.error: unmatched group
Now, I understand that it's complaining because I'm trying to replace the match with an unmatched group, but is there a way around this without having to call re.sub twice?
You can do it without any capturing groups:
>>> import re
>>> s = "Adobe.Flash.Player.14.00.125.ie"
>>> m = re.sub(r'\.(?=[A-Za-z])|(?<!\d)\.', r' ', s)
>>> m
'Adobe Flash Player 14.00.125 ie'
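If you would rather keep the original pattern with its capturing groups, another option is to pass a function as the replacement, so that the unmatched groups (which come back as None) never reach the template expansion. A minimal sketch, where the helper name join_with_space is hypothetical:

```python
import re

s = "Adobe.Flash.Player.14.00.125.ie"
pattern = r"([a-zA-Z])\.([a-zA-Z0-9])|([0-9])\.([a-zA-Z])"


def join_with_space(match):
    # Only one alternative matched, so two of the four groups are None.
    left = match.group(1) or match.group(3)
    right = match.group(2) or match.group(4)
    return left + " " + right


result = re.sub(pattern, join_with_space, s)
print(result)
```

As of Python 3.5, unmatched groups in a replacement string are replaced with an empty string, so the original r"\1\3 \2\4" template should also work there. The function form works on older versions as well.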

What's the maximum number of repetitions allowed in a Python regex?

In Python 2.7 and 3, the following works:
>>> re.search(r"a{1,9999}", 'aaa')
<_sre.SRE_Match object at 0x1f5d100>
but this gives an error:
>>> re.search(r"a{1,99999}", 'aaa')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/re.py", line 142, in search
return _compile(pattern, flags).search(string)
File "/usr/lib/python2.7/re.py", line 240, in _compile
p = sre_compile.compile(pattern, flags)
File "/usr/lib/python2.7/sre_compile.py", line 523, in compile
groupindex, indexgroup
RuntimeError: invalid SRE code
It seems like there is an upper limit on the number of repetitions allowed. Is this part of the regular expression specification, or a Python-specific limitation? If Python-specific, is the actual number documented somewhere, and does it vary between implementations?
A quick manual binary search revealed the answer, specifically 65535:
>>> re.search(r"a{1,65535}", 'aaa')
<_sre.SRE_Match object at 0x2a9a68>
>>>
>>> re.search(r"a{1,65536}", 'aaa')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/re.py", line 142, in search
return _compile(pattern, flags).search(string)
File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/re.py", line 240, in _compile
p = sre_compile.compile(pattern, flags)
File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/sre_compile.py", line 523, in compile
groupindex, indexgroup
OverflowError: regular expression code size limit exceeded
This is discussed here:
The limit is an implementation detail. The pattern is compiled into codes which are then interpreted, and it just happens that the codes are (usually) 16 bits, giving a range of 0..65535, but it uses 65535 to represent no limit and doesn't warn if you actually write 65535.
and
The quantifiers use 65535 to represent no upper limit, so ".{0,65535}" is equivalent to ".*".
Thanks to the authors of the comments below for pointing a few more things out:
CPython implements this limitation in _sre.c. (@LukasGraf)
There is a constant MAXREPEAT in sre_constants.py that holds this max repetition value:
>>> import sre_constants
>>>
>>> sre_constants.MAXREPEAT
65535
(@MarkkuK. and @hcwhsa)
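Because the limit is an implementation detail, it may be safer to probe the running interpreter than to hard-code 65535. A minimal sketch, where the helper name repetition_compiles is hypothetical. The exceptions caught are the ones shown in the tracebacks above plus re.error, and the printed results will depend on the Python version (later Python 3 releases may accept much larger bounds):

```python
import re


def repetition_compiles(upper_bound):
    """Return True if the pattern a{1,upper_bound} compiles on this interpreter."""
    try:
        re.compile(r"a{1,%d}" % upper_bound)
        return True
    except (re.error, OverflowError, RuntimeError):
        # Different versions report an oversized bound in different ways.
        return False


for bound in (9999, 65535, 65536):
    print(bound, repetition_compiles(bound))
```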
