What's the maximum number of repetitions allowed in a Python regex? - python

In Python 2.7 and 3, the following works:
>>> re.search(r"a{1,9999}", 'aaa')
<_sre.SRE_Match object at 0x1f5d100>
but this gives an error:
>>> re.search(r"a{1,99999}", 'aaa')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/re.py", line 142, in search
return _compile(pattern, flags).search(string)
File "/usr/lib/python2.7/re.py", line 240, in _compile
p = sre_compile.compile(pattern, flags)
File "/usr/lib/python2.7/sre_compile.py", line 523, in compile
groupindex, indexgroup
RuntimeError: invalid SRE code
It seems like there is an upper limit on the number of repetitions allowed. Is this part of the regular expression specification, or a Python-specific limitation? If Python-specific, is the actual number documented somewhere, and does it vary between implementations?

A quick manual binary search revealed the answer, specifically 65535:
>>> re.search(r"a{1,65535}", 'aaa')
<_sre.SRE_Match object at 0x2a9a68>
>>>
>>> re.search(r"a{1,65536}", 'aaa')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/re.py", line 142, in search
return _compile(pattern, flags).search(string)
File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/re.py", line 240, in _compile
p = sre_compile.compile(pattern, flags)
File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/sre_compile.py", line 523, in compile
groupindex, indexgroup
OverflowError: regular expression code size limit exceeded
This is discussed here:
The limit is an implementation detail. The pattern is compiled into codes which are then interpreted, and it just happens that the codes are (usually) 16 bits, giving a range of 0..65535, but it uses 65535 to represent no limit and doesn't warn if you actually write 65535.
and
The quantifiers use 65535 to represent no upper limit, so ".{0,65535}" is equivalent to ".*".
Thanks to the authors of the comments below for pointing a few more things out:
CPython implements this limitation in _sre.c. (#LukasGraf)
There is a constant MAXREPEAT in sre_constants.py that holds this max repetition value:
>>> import sre_constants
>>>
>>> sre_constants.MAXREPEAT
65535
(#MarkkuK. and #hcwhsa)
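To check the limit on whichever interpreter you are running, something like the following sketch may help. It assumes a Python 3 CPython build where the private `_sre` module exposes `MAXREPEAT`; the helper name `compiles` is made up for illustration. The value and the exact behaviour at the boundary differ between versions and builds, so treat the printed results as specific to your interpreter.

```python
import re
from _sre import MAXREPEAT  # CPython's private C module; an implementation detail


def compiles(pattern):
    """Return True if `pattern` compiles, False if the regex engine rejects it."""
    try:
        re.compile(pattern)
    except (re.error, OverflowError, RuntimeError):
        return False
    return True


# A modest upper bound, and an upper bound equal to the interpreter's limit.
small_ok = compiles(r"a{1,9999}")
at_limit_ok = compiles(r"a{1,%d}" % MAXREPEAT)

print("MAXREPEAT =", MAXREPEAT)
print("a{1,9999} compiles:", small_ok)
print("a{1,MAXREPEAT} compiles:", at_limit_ok)
```

Probing with a helper like this avoids hard-coding 65535, which only holds for some builds.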

Related

Why same python re pattern regex works in single line but not in multi line

import re
regex =re.compile('''
((.*\n){2}
Cannot display: file marked as a binary type.\n
(.*\n){1})
''', re.X)
The above code throws an error:
Traceback (most recent call last):
File "/test.py", line 8, in <module>
''', re.X)
File "/usr/lib64/python2.7/re.py", line 190, in compile
return _compile(pattern, flags)
File "/usr/lib64/python2.7/re.py", line 242, in _compile
raise error, v # invalid expression
sre_constants.error: nothing to repeat
whereas writing the regex on a single line works fine, with no error:
regex = re.compile('((.*\n){2}Cannot display: file marked as a binary type.\n(.*\n){1})')
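A likely explanation, offered as a sketch rather than a confirmed diagnosis: in a non-raw triple-quoted string, `\n` becomes a real newline before `re` sees it, and `re.X` discards unescaped whitespace, so `(.*\n){2}` is effectively reduced to `(.*){2}`. The version below uses a raw string so that `\n` reaches the regex engine as an escape, and backslash-escapes the spaces that must match literally. The sample `text` is invented for illustration.

```python
import re

# Hypothetical input: two lines, the marker line, then one more line.
text = (
    "line1\n"
    "line2\n"
    "Cannot display: file marked as a binary type.\n"
    "line4\n"
)

# Raw string: \n stays a regex escape instead of turning into whitespace
# that verbose mode would ignore. Literal spaces are escaped for the same reason.
regex = re.compile(r'''
    ((.*\n){2}
    Cannot\ display:\ file\ marked\ as\ a\ binary\ type\.\n
    (.*\n){1})
''', re.X)

match = regex.search(text)
print(repr(match.group(1)) if match else "no match")
```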

invalid expression sre_constants.error: nothing to repeat

I am trying to match the word that follows the * in the output variable. I tried the following, but it raises an error. How can I fix it?
import re
output = """test
* Peace
master"""
m = re.search('* (\w+)', output)
print m.group(0)
Error:
Traceback (most recent call last):
File "testinglogic.py", line 7, in <module>
m = re.search('* (\w+)', output)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 146, in search
return _compile(pattern, flags).search(string)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 251, in _compile
raise error, v # invalid expression
sre_constants.error: nothing to repeat
The first fix is to escape the * with a backslash (\*). An unescaped * is a quantifier, and at the start of the pattern there is nothing before it to repeat; escaping it makes the engine treat it as a literal asterisk.
Another suggestion would be to use a lookbehind, so you don't need to use another capture group:
>>> re.search('(?<=\*\s)\w+', output).group()
'Peace'
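For completeness, here is a small sketch of the escaped-asterisk fix with a capture group, written for Python 3 and using the `output` value from the question. The variable names `word` and `same_word` are only illustrative.

```python
import re

output = """test
* Peace
master"""

# Escape the asterisk so it is matched literally, then capture the next word.
m = re.search(r'\* (\w+)', output)
word = m.group(1) if m else None

# re.escape can build the literal part when it comes from a variable.
same = re.search(re.escape('* ') + r'(\w+)', output)
same_word = same.group(1) if same else None

print(word, same_word)
```

Note that `group(0)` would return the whole match including the asterisk, so `group(1)` is the one that holds just the word.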

Issue with python version 2.7.3 importing maxrepeat module

I am trying to import the maxrepeat module in Python 2.7.3 and couldn't find much information on Google. Can someone please help?
Which module does maxrepeat depend on in order to work?
I am able to import it using “from _sre import maxrepeat”, but it still fails while running automation.
MAXREPEAT is used internally by the re module as an upper limit for the minimum, maximum, or exact number of repetitions that can be specified in a pattern. For example:
>>> import re
>>> re.compile(r'a{100}') # exactly 100 "a"s
<_sre.SRE_Pattern object at 0x7fa68be10780>
>>> re.compile(r'a{100,200}') # between 100 and 200 "a"s (no space after the comma)
Equalling or exceeding MAXREPEAT in a repetition value causes an exception to be raised by the regular expression parser in module sre_parse:
>>> from sre_constants import MAXREPEAT
>>> MAXREPEAT
4294967295L
>>> re.compile(r'a{{{}}}'.format(MAXREPEAT-1))
<_sre.SRE_Pattern object at 0x7f0ec959f660>
>>> re.compile(r'a{{{}}}'.format(MAXREPEAT))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.7/re.py", line 194, in compile
return _compile(pattern, flags)
File "/usr/lib64/python2.7/re.py", line 249, in _compile
p = sre_compile.compile(pattern, flags)
File "/usr/lib64/python2.7/sre_compile.py", line 572, in compile
p = sre_parse.parse(p, flags)
File "/usr/lib64/python2.7/sre_parse.py", line 716, in parse
p = _parse_sub(source, pattern, 0)
File "/usr/lib64/python2.7/sre_parse.py", line 324, in _parse_sub
itemsappend(_parse(source, state))
File "/usr/lib64/python2.7/sre_parse.py", line 518, in _parse
raise OverflowError("the repetition number is too large")
OverflowError: the repetition number is too large
There should not be any reason to care about MAXREPEAT with normal use of the re module. If you need to handle errors then use the exception:
try:
    re.compile(r'a{{{}}}'.format(MAXREPEAT))
except OverflowError as exc:
    print 'Failed to compile pattern: {}'.format(exc.message)
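On Python 3 the same idea might look like the sketch below. It assumes `MAXREPEAT` can be imported from CPython's private `_sre` module, uses `str(exc)` because exceptions no longer carry a `.message` attribute there, and also catches `re.error` so that ordinary syntax errors are reported the same way. The helper name `try_compile` is hypothetical.

```python
import re
from _sre import MAXREPEAT  # private CPython module; treat as an implementation detail


def try_compile(pattern):
    """Return (compiled_pattern, None) on success or (None, message) on failure."""
    try:
        return re.compile(pattern), None
    except (re.error, OverflowError) as exc:
        return None, str(exc)


compiled, problem = try_compile(r"a{%d}" % MAXREPEAT)
if compiled is None:
    print("Failed to compile pattern: {}".format(problem))
```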

Error in calling a class in Python

I am unable to get an example from the documentation to work.
I am testing the sample here:
http://cloud.verizon.com/documentation/AuthenticationRESTCallExample.htm
I am calling it as follows (sample values):
secretKey="xxxxxxxxx"
accessKey="Xxxxxxxxx"
restVerb="GET"
apiResource="'https://api.cloud.verizon.com/api/cloud/vdisk-template/"
VzREST(secretKey,accessKey).request(restVerb,apiResource)
But I am getting an error like this:
Traceback (most recent call last):
File "C:\Python34\admins\a1.py", line 176, in <module>
VzREST(secretKey,accessKey).request(restVerb,apiResource)
File "C:\Python34\admins\a1.py", line 171, in request
restVerb,apiResource=apiResource),data=data)
File "C:\Python34\admins\a1.py", line 103, in _reqREST
apiResource = self._stripAndEncodeApiResource(apiResource)
File "C:\Python34\admins\a1.py", line 45, in _stripAndEncodeApiResource
return re.sub(self._url, '', apiResource.encode('ascii', 'ignore'))
File "C:\Python34\lib\re.py", line 175, in sub
return _compile(pattern, flags).sub(repl, string, count)
TypeError: can't use a string pattern on a bytes-like object
What mistake am I making here? Am I calling it incorrectly? (I am not very experienced with Python.)
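The traceback suggests that, on Python 3, `apiResource.encode('ascii', 'ignore')` produces a `bytes` object while the pattern is a `str`, and `re` refuses to mix the two. The sketch below reproduces that and shows one possible workaround (decoding back to `str`). The value of `url_prefix` is a hypothetical stand-in for `self._url`, whose real definition is not shown in the question.

```python
import re

# Hypothetical stand-in for self._url from the sample class.
url_prefix = r'^https://api\.cloud\.verizon\.com'
api_resource = 'https://api.cloud.verizon.com/api/cloud/vdisk-template/'

# Reproduce the failure: a str pattern applied to bytes.
error_message = None
try:
    re.sub(url_prefix, '', api_resource.encode('ascii', 'ignore'))
except TypeError as exc:
    error_message = str(exc)

# One possible workaround: drop non-ASCII characters, then return to str.
cleaned = api_resource.encode('ascii', 'ignore').decode('ascii')
stripped = re.sub(url_prefix, '', cleaned)

print(error_message)
print(stripped)
```

The alternative is to make both sides bytes (a `b'...'` pattern and replacement); which is appropriate depends on what the rest of the sample expects.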

dateutil.parser.parse() gives error "initial_value must be unicode or None, not str" on Windows platform

I'm sure there's a really simple solution to this, but I'm still fairly new to Python.
I'm trying to use dateutil.parser.parse() to parse a string with a timestamp in it:
>>> import dateutil.parser
>>> a = dateutil.parser.parse("2011-10-01 12:00:00+01:00")
>>> print a
2011-10-01 12:00:00+01:00
This works fine on my Linux server, but on my Windows test box it gives an error:
>>> import dateutil.parser
>>> a = dateutil.parser.parse("2011-10-01 12:00:00+01:00")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\python_dateutil-2.0-py2.7.egg\dateutil\parser.py", line 698, in parse
return DEFAULTPARSER.parse(timestr, **kwargs)
File "C:\Python27\lib\site-packages\python_dateutil-2.0-py2.7.egg\dateutil\parser.py", line 302, in parse
res = self._parse(timestr, **kwargs)
File "C:\Python27\lib\site-packages\python_dateutil-2.0-py2.7.egg\dateutil\parser.py", line 350, in _parse
l = _timelex.split(timestr)
File "C:\Python27\lib\site-packages\python_dateutil-2.0-py2.7.egg\dateutil\parser.py", line 144, in split
return list(cls(s))
File "C:\Python27\lib\site-packages\python_dateutil-2.0-py2.7.egg\dateutil\parser.py", line 44, in __init__
instream = StringIO(instream)
TypeError: initial_value must be unicode or None, not str
If I try giving dateutil.parser.parse() a unicode string, that doesn't work on the Windows box either:
>>> a = dateutil.parser.parse(unicode("2011-10-01 12:00:00+01:00"))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\python_dateutil-2.0-py2.7.egg\dateutil\parser.py", line 698, in parse
return DEFAULTPARSER.parse(timestr, **kwargs)
File "C:\Python27\lib\site-packages\python_dateutil-2.0-py2.7.egg\dateutil\parser.py", line 302, in parse
res = self._parse(timestr, **kwargs)
File "C:\Python27\lib\site-packages\python_dateutil-2.0-py2.7.egg\dateutil\parser.py", line 350, in _parse
l = _timelex.split(timestr)
File "C:\Python27\lib\site-packages\python_dateutil-2.0-py2.7.egg\dateutil\parser.py", line 144, in split
return list(cls(s))
TypeError: iter() returned non-iterator of type '_timelex'
Yet this also works on the Linux box.
It's not a Windows issue; it's a Python version / library version issue.
dateutil 2.0 is written to support only Python 3, not Python 2.X. Both errors here come from running it under Python 2.X.
In the first case:
dateutil.parser.parse("2011-10-01 12:00:00+01:00")
the io.StringIO class allows only unicode arguments, but the code reads:
if isinstance(instream, str):
    instream = StringIO(instream)
In the second case:
dateutil.parser.parse(unicode("2011-10-01 12:00:00+01:00"))
the _timelex class defines a __next__ method, which is how Python 3 marks an object as an iterator. In Python 2.X the method must be named next instead, so iter() does not recognize the object as an iterator.
Check whether you have the same versions of both Python and the library on Linux and Windows. From the project website:
python-dateutil-2.0.tar.gz (Python >= 3.0)
python-dateutil-1.5.tar.gz (Python < 3.0)
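To illustrate the naming difference in the iteration protocol, here is a minimal sketch of an iterator class that works under both conventions by aliasing `next` to `__next__`. The class `Tokens` is invented for this example and is not dateutil's code.

```python
class Tokens(object):
    """Toy iterator over a list of items (illustrative only)."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        # Python 3 looks for __next__ when advancing an iterator.
        if self._pos >= len(self._items):
            raise StopIteration
        item = self._items[self._pos]
        self._pos += 1
        return item

    # Python 2 looks for a method named next; alias it to the same code.
    next = __next__


tokens = list(Tokens("2011-10-01 12:00:00".split()))
print(tokens)
```

A class that defines only `__next__` triggers the "iter() returned non-iterator" error on Python 2, which matches the second traceback in the question.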
