Python Regex Unmatched Groups error with multiple patterns

So I was trying to answer a question on SO when I ran into this issue. Basically a user had the following string:
Adobe.Flash.Player.14.00.125.ie
and wanted to replace it with
Adobe Flash Player 14.00.125 ie
so I used the following re.sub call to solve this issue:
re.sub("([a-zA-Z])\.([a-zA-Z0-9])",r"\1 \2",str)
I then realized that this doesn't remove the dot between 125 and ie, so I figured I'd try to match another pattern, namely:
re.sub("([a-zA-Z])\.([a-zA-Z0-9])|([0-9])\.([a-zA-Z])",r"\1\3 \2\4",str)
When I try to run this, I get the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.6/re.py", line 151, in sub
return _compile(pattern, 0).sub(repl, string, count)
File "/usr/lib64/python2.6/re.py", line 278, in filter
return sre_parse.expand_template(template, match)
File "/usr/lib64/python2.6/sre_parse.py", line 793, in expand_template
raise error, "unmatched group"
sre_constants.error: unmatched group
Now, I understand that it's complaining because I'm trying to replace the match with an unmatched group, but is there a way around this without having to call re.sub twice?

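You can avoid the problem by using lookarounds instead of capturing groups, so the replacement never refers to a group that might not have matched: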
>>> import re
>>> s = "Adobe.Flash.Player.14.00.125.ie"
>>> m = re.sub(r'\.(?=[A-Za-z])|(?<!\d)\.', r' ', s)
>>> m
'Adobe Flash Player 14.00.125 ie'
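If you would rather keep the original alternation, one option is to pass a function as the replacement so that only the groups that actually matched are used. This is a minimal sketch assuming the pattern from the question; the helper name join_with_space is made up for illustration. (On Python 3.5 and later, unmatched groups in a replacement template are substituted with an empty string, so the original call should no longer raise there.)

```python
import re

s = "Adobe.Flash.Player.14.00.125.ie"
pattern = r"([a-zA-Z])\.([a-zA-Z0-9])|([0-9])\.([a-zA-Z])"

def join_with_space(m):
    # Only one alternative matches at a time, so pick whichever
    # pair of groups is not None.
    left = m.group(1) or m.group(3)
    right = m.group(2) or m.group(4)
    return left + " " + right

result = re.sub(pattern, join_with_space, s)
print(result)
```

Because each group captures exactly one character, a matched group is never an empty string, which is why the `or` fallback is safe here.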

Related

How to fix this file `s' not found error in python?

import patoolib
patoolib.create_archive("file.zip", ("to_pdf.pdf"))
When I run it, I get this error:
Traceback (most recent call last):
File "C:\Users\happy\Desktop\Site_Blocker\file_to_archive.py", line 2, in <module>
patoolib.create_archive("file.zip", ("to_pdf.pdf"))
File "C:\Users\happy\AppData\Local\Programs\Python\Python38\lib\site-packages\patoolib\__init__.py", line 712, in create_archive
util.check_archive_filelist(filenames)
File "C:\Users\happy\AppData\Local\Programs\Python\Python38\lib\site-packages\patoolib\util.py", line 422, in check_archive_filelist
check_existing_filename(filename, onlyfiles=False)
File "C:\Users\happy\AppData\Local\Programs\Python\Python38\lib\site-packages\patoolib\util.py", line 398, in check_existing_filename
raise PatoolError("file `%s' was not found" % filename)
patoolib.util.PatoolError: file `t' was not found
Please tell me how to fix this error.

The second argument of create_archive is the filenames. You appear to be trying to give it a tuple, which would work, but the syntax that you have used is not correct for creating a one-element tuple.
("to_pdf.pdf") will evaluate to simply "to_pdf.pdf", and when you iterate over this string you will get the characters in the string, hence the error on the first iteration that there is no file called t.
To create a one-element tuple, you should include the comma:
patoolib.create_archive("file.zip", ("to_pdf.pdf",))
Alternatively, you could use a list:
patoolib.create_archive("file.zip", ["to_pdf.pdf"])
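The difference is easy to check without patoolib at all. The sketch below only uses built-in types; the variable names are illustrative.

```python
as_string = ("to_pdf.pdf")    # parentheses only: still a str
as_tuple = ("to_pdf.pdf",)    # trailing comma: a one-element tuple

print(type(as_string).__name__, type(as_tuple).__name__)

# Iterating over the string yields single characters, which is
# where the "file `t' was not found" message comes from.
first_from_string = next(iter(as_string))
first_from_tuple = next(iter(as_tuple))
print(first_from_string, first_from_tuple)
```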

How to search for text string in executable output with python?

I'm trying to create a Python script to auto-update a program for me. When I run program.exe --help, it prints a long output, and somewhere in that output is a string of the form "Version: X.X.X". How can I make a script that runs the command and isolates the version number from the executable's output?
I should have mentioned that I tried the following:
import re
import subprocess
regex = r'Version: ([\d\.]+)'
match = re.search(regex, subprocess.run(["program.exe", "--help"]))
print((match.group(0)))
and got the error:
Traceback (most recent call last):
File "run.py", line 6, in <module>
match = re.search(regex, subprocess.run(["program.exe", "--help"]))
File "C:\Python37\lib\re.py", line 183, in search
return _compile(pattern, flags).search(string)
TypeError: expected string or bytes-like object

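subprocess.run returns a CompletedProcess object rather than the output text, which is why re.search raises the TypeError. Capture the output as a string instead, and use group(1) to get just the version number. Something like this should work: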
re.search(r'Version: ([\d\.]+)', subprocess.check_output(['program.exe', '--help']).decode()).group(1)
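A slightly more defensive variant is sketched below. Since program.exe is not available here, the command is a stand-in: a child Python process that prints a fake help text. The function name find_version and the fake output are assumptions for illustration only.

```python
import re
import subprocess
import sys

# Stand-in for ["program.exe", "--help"]: a child Python process
# that prints a fake help text containing a version line.
command = [sys.executable, "-c",
           "print('usage: program [options]\\nVersion: 1.2.3')"]

def find_version(cmd):
    # capture_output/text need Python 3.7+; stdout comes back as str.
    completed = subprocess.run(cmd, capture_output=True, text=True)
    match = re.search(r"Version: ([\d.]+)", completed.stdout)
    # re.search returns None when nothing matches, so guard before .group().
    return match.group(1) if match else None

version = find_version(command)
print(version)
```

Some programs write their help text to stderr instead of stdout; in that case you would search completed.stderr instead.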

invalid expression sre_constants.error: nothing to repeat

I am trying to match the word after the * in the output variable. I tried the following, but I'm running into an error. How do I fix it?
import re
output = """test
* Peace
master"""
m = re.search('* (\w+)', output)
print m.group(0)
Error:-
Traceback (most recent call last):
File "testinglogic.py", line 7, in <module>
m = re.search('* (\w+)', output)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 146, in search
return _compile(pattern, flags).search(string)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 251, in _compile
raise error, v # invalid expression
sre_constants.error: nothing to repeat

The first fix would be to escape the *, because you want the engine to treat it literally (as an asterisk) rather than as a quantifier with nothing before it to repeat. You escape it with a backslash: r'\* (\w+)'.
Another suggestion would be to use a lookbehind, so you don't need to use another capture group:
>>> re.search('(?<=\*\s)\w+', output).group()
'Peace'
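For completeness, here is a small sketch of the escaped version with a capture group, plus a variant that lets re.escape build the literal prefix. Note that group(1) returns only the word, while group(0) would include the asterisk and the space.

```python
import re

output = """test
* Peace
master"""

# Escape the asterisk by hand...
m = re.search(r"\* (\w+)", output)
word = m.group(1) if m else None

# ...or let re.escape build the literal prefix.
m2 = re.search(re.escape("* ") + r"(\w+)", output)
word2 = m2.group(1) if m2 else None

print(word, word2)
```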

Use re module in python re.compile('*ab') [duplicate]

I have
os.listdir('/home/dir/')
which contains file and file.ab.
How can I use a regex to list only file.ab in that directory?
When I tried
re.compile('*ab')
it raised:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.6/re.py", line 190, in compile
return _compile(pattern, flags)
File "/usr/lib64/python2.6/re.py", line 245, in _compile
raise error, v # invalid expression
sre_constants.error: nothing to repeat

Better to use glob:
import glob
print glob.glob('/home/dir/*.ab')

No need for a regex:
[i for i in os.listdir('/home/dir/') if i.endswith(".ab")]
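If you do want a regex, the issue is that * is a quantifier in regular expressions, not a shell-style wildcard, so it needs something before it to repeat. A rough sketch of the regex equivalent follows; the list of names is a hypothetical stand-in for the result of os.listdir('/home/dir/').

```python
import re

# Hypothetical directory listing standing in for os.listdir('/home/dir/').
names = ["file", "file.ab", "notes.abc", "tab"]

# ".*" means "any characters"; "\." is a literal dot; "$" anchors the end.
ab_file = re.compile(r".*\.ab$")

matches = [name for name in names if ab_file.match(name)]
print(matches)
```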

What's the maximum number of repetitions allowed in a Python regex?

In Python 2.7 and 3, the following works:
>>> re.search(r"a{1,9999}", 'aaa')
<_sre.SRE_Match object at 0x1f5d100>
but this gives an error:
>>> re.search(r"a{1,99999}", 'aaa')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/re.py", line 142, in search
return _compile(pattern, flags).search(string)
File "/usr/lib/python2.7/re.py", line 240, in _compile
p = sre_compile.compile(pattern, flags)
File "/usr/lib/python2.7/sre_compile.py", line 523, in compile
groupindex, indexgroup
RuntimeError: invalid SRE code
It seems like there is an upper limit on the number of repetitions allowed. Is this part of the regular expression specification, or a Python-specific limitation? If Python-specific, is the actual number documented somewhere, and does it vary between implementations?

A quick manual binary search revealed the answer, specifically 65535:
>>> re.search(r"a{1,65535}", 'aaa')
<_sre.SRE_Match object at 0x2a9a68>
>>>
>>> re.search(r"a{1,65536}", 'aaa')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/re.py", line 142, in search
return _compile(pattern, flags).search(string)
File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/re.py", line 240, in _compile
p = sre_compile.compile(pattern, flags)
File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/sre_compile.py", line 523, in compile
groupindex, indexgroup
OverflowError: regular expression code size limit exceeded
This is discussed here:
The limit is an implementation detail. The pattern is compiled into codes which are then interpreted, and it just happens that the codes are (usually) 16 bits, giving a range of 0..65535, but it uses 65535 to represent no limit and doesn't warn if you actually write 65535.
and
The quantifiers use 65535 to represent no upper limit, so ".{0,65535}" is equivalent to ".*".
Thanks to the authors of the comments below for pointing a few more things out:
CPython implements this limitation in _sre.c. (@LukasGraf)
There is a constant MAXREPEAT in sre_constants.py that holds this max repetition value:
>>> import sre_constants
>>>
>>> sre_constants.MAXREPEAT
65535
(@MarkkuK. and @hcwhsa)
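Since the limit is an implementation detail, the value on your interpreter may differ from the one shown above. One way to check is to probe it directly rather than rely on a fixed number. This is only a sketch; the helper name compiles is made up, and the results depend on the Python version you run it on.

```python
import re

def compiles(upper):
    """Return True if a{1,upper} compiles on this interpreter."""
    try:
        re.compile(r"a{1,%d}" % upper)
        return True
    except (re.error, OverflowError, RuntimeError):
        # Different versions signal the limit with different exceptions.
        return False

for upper in (9999, 65535, 65536, 99999):
    print(upper, compiles(upper))
```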
