I was trying to answer a question on SO when I ran into this issue. A user had the following string:
Adobe.Flash.Player.14.00.125.ie
and wanted to replace it with
Adobe Flash Player 14.00.125 ie
so I used the following re.sub call to solve this issue:
re.sub("([a-zA-Z])\.([a-zA-Z0-9])",r"\1 \2",str)
I then realized that this doesn't remove the dot between 125 and ie, so I figured I'd try to match a second pattern as well, namely:
re.sub("([a-zA-Z])\.([a-zA-Z0-9])|([0-9])\.([a-zA-Z])",r"\1\3 \2\4",str)
When I try to run this, I get the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.6/re.py", line 151, in sub
return _compile(pattern, 0).sub(repl, string, count)
File "/usr/lib64/python2.6/re.py", line 278, in filter
return sre_parse.expand_template(template, match)
File "/usr/lib64/python2.6/sre_parse.py", line 793, in expand_template
raise error, "unmatched group"
sre_constants.error: unmatched group
Now, I understand that it's complaining because the replacement refers to a group that did not take part in the match, but is there a way around this without having to call re.sub twice?
You can do this without any capturing groups, using lookarounds:
>>> import re
>>> s = "Adobe.Flash.Player.14.00.125.ie"
>>> m = re.sub(r'\.(?=[A-Za-z])|(?<!\d)\.', r' ', s)
>>> m
'Adobe Flash Player 14.00.125 ie'
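If you would rather keep the capturing groups from the question, one option is to pass a function as the replacement and treat groups that did not participate as empty strings. This is only a sketch: the names PATTERN, join_with_space and undot are made up for illustration. As far as I know, Python 3.5 and later already substitute an empty string for unmatched groups in a replacement template, so the "unmatched group" error is specific to older versions.

```python
import re

# Pattern from the question: two alternatives, so each match fills
# either groups 1 and 2 or groups 3 and 4, never all four.
PATTERN = re.compile(r"([a-zA-Z])\.([a-zA-Z0-9])|([0-9])\.([a-zA-Z])")


def join_with_space(match):
    # Groups that did not take part in the match are None;
    # turn them into empty strings before building the replacement.
    g1, g2, g3, g4 = (g or "" for g in match.groups())
    return g1 + g3 + " " + g2 + g4


def undot(text):
    # Hypothetical helper name, used only for this example.
    return PATTERN.sub(join_with_space, text)


print(undot("Adobe.Flash.Player.14.00.125.ie"))
```

A replacement function works the same way on Python 2 and Python 3, at the cost of being a little more verbose than a template string.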
import patoolib
patoolib.create_archive("file.zip", ("to_pdf.pdf"))
When I run this, I get the following error:
Traceback (most recent call last):
File "C:\Users\happy\Desktop\Site_Blocker\file_to_archive.py", line 2, in <module>
patoolib.create_archive("file.zip", ("to_pdf.pdf"))
File "C:\Users\happy\AppData\Local\Programs\Python\Python38\lib\site-packages\patoolib\__init__.py", line 712, in create_archive
util.check_archive_filelist(filenames)
File "C:\Users\happy\AppData\Local\Programs\Python\Python38\lib\site-packages\patoolib\util.py", line 422, in check_archive_filelist
check_existing_filename(filename, onlyfiles=False)
File "C:\Users\happy\AppData\Local\Programs\Python\Python38\lib\site-packages\patoolib\util.py", line 398, in check_existing_filename
raise PatoolError("file `%s' was not found" % filename)
patoolib.util.PatoolError: file `t' was not found
Please tell me how to fix this error.
The second argument of create_archive is the filenames. You appear to be trying to give it a tuple, which would work, but the syntax that you have used is not correct for creating a one-element tuple.
("to_pdf.pdf") will evaluate to simply "to_pdf.pdf", and when you iterate over this string you will get the characters in the string, hence the error on the first iteration that there is no file called t.
To create a one-element tuple, you should include the comma:
patoolib.create_archive("file.zip", ("to_pdf.pdf",))
Alternatively, you could use a list:
patoolib.create_archive("file.zip", ["to_pdf.pdf"])
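The difference is easy to check without patoolib at all. A small sketch (the variable names are just for illustration) showing that the parentheses alone do not make a tuple, and what iterating over each value yields:

```python
# Parentheses alone only group an expression; the comma makes the tuple.
not_a_tuple = ("to_pdf.pdf")
one_element_tuple = ("to_pdf.pdf",)

print(type(not_a_tuple).__name__)        # str
print(type(one_element_tuple).__name__)  # tuple

# Iterating over the string yields single characters, which is where
# the "file `t' was not found" message comes from.
first_from_string = next(iter(not_a_tuple))
first_from_tuple = next(iter(one_element_tuple))
print(first_from_string)  # t
print(first_from_tuple)   # to_pdf.pdf
```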
I'm trying to create a Python script to auto-update a program for me. When I run program.exe --help, it prints a long output that contains a string of the form "Version: X.X.X". How can I make a script that runs the command and isolates the version number from the executable's output?
I should have mentioned that I tried the following:
import re
import subprocess
regex = r'Version: ([\d\.]+)'
match = re.search(regex, subprocess.run(["program.exe", "--help"]))
print((match.group(0)))
and got the error:
Traceback (most recent call last):
File "run.py", line 6, in <module>
match = re.search(regex, subprocess.run(["program.exe", "--help"]))
File "C:\Python37\lib\re.py", line 183, in search
return _compile(pattern, flags).search(string)
TypeError: expected string or bytes-like object
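subprocess.run returns a CompletedProcess object, not the program's output, which is why re.search raises the TypeError. Capture the output and decode it to a string first. Something like this should work: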
re.search(r'Version: ([\d\.]+)', subprocess.check_output(['program.exe', '--help']).decode()).group(1)
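The one-liner raises AttributeError if the version string is missing, and some programs print their help text to stderr instead of stdout. A slightly more defensive sketch follows; extract_version and get_version are hypothetical helper names, and the sample text stands in for the real output of program.exe, which I don't have.

```python
import re
import subprocess

# Same idea as the pattern above, but requires at least one digit
# and does not accept a trailing dot.
VERSION_RE = re.compile(r"Version: (\d+(?:\.\d+)*)")


def extract_version(text):
    """Return the first 'Version: X.Y.Z' number found in text, or None."""
    match = VERSION_RE.search(text)
    return match.group(1) if match else None


def get_version(command):
    """Run command and search its combined stdout and stderr for a version."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # fold stderr into stdout
        universal_newlines=True,    # decode the output to str
    )
    return extract_version(result.stdout)


# Made-up sample of what the help output might look like.
sample_help = "usage: program [options]\nVersion: 1.2.3\n"
print(extract_version(sample_help))
```

With the real executable, the call would be get_version(["program.exe", "--help"]).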
I am trying to match the data in the output variable. I want to match the word after the *. I tried the following, but I am running into an error. How can I fix it?
import re
output = """test
* Peace
master"""
m = re.search('* (\w+)', output)
print m.group(0)
Error:
Traceback (most recent call last):
File "testinglogic.py", line 7, in <module>
m = re.search('* (\w+)', output)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 146, in search
return _compile(pattern, flags).search(string)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 251, in _compile
raise error, v # invalid expression
sre_constants.error: nothing to repeat
The first fix is to escape the * with a backslash. An unescaped * is a quantifier, and at the start of the pattern there is nothing for it to repeat; escaping it makes the engine treat it as a literal asterisk.
Another suggestion would be to use a lookbehind, so you don't need to use another capture group:
>>> re.search(r'(?<=\*\s)\w+', output).group()
'Peace'
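For completeness, the first fix (escaping the asterisk and keeping the capture group) might look like the sketch below. Note that group(0) is the whole match including the asterisk, so group(1) is the one holding just the word.

```python
import re

output = """test
* Peace
master"""

# \* matches a literal asterisk; the raw string keeps the backslash intact.
match = re.search(r"\* (\w+)", output)

# group(0) would be the whole match, "* Peace"; group(1) is the captured word.
word = match.group(1) if match else None
print(word)  # Peace
```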
I have
os.listdir('/home/dir/')
which contains the files file and file.ab.
How can I use a regex to list only file.ab in that directory?
When I use a regex with
re.compile('*ab')
it returns
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.6/re.py", line 190, in compile
return _compile(pattern, flags)
File "/usr/lib64/python2.6/re.py", line 245, in _compile
raise error, v # invalid expression
sre_constants.error: nothing to repeat
Better to use glob:
import glob
print glob.glob('/home/dir/*.ab')
No need for a regex:
[i for i in os.listdir('/home/dir/') if i.endswith(".ab")]
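As for the error itself: '*ab' is a shell-style wildcard, not a regular expression. In a regex, * is a quantifier, and at the start of the pattern it has nothing to repeat. If you do want a regex, a rough equivalent is sketched below. The list of names stands in for the result of os.listdir('/home/dir/'), and fnmatch.filter is shown as a way to keep the wildcard syntax.

```python
import fnmatch
import re

# Stand-in for os.listdir('/home/dir/').
names = ["file", "file.ab"]

# ".*" means "any characters"; "\." is a literal dot; "$" anchors the end.
ab_pattern = re.compile(r".*\.ab$")
regex_matches = [name for name in names if ab_pattern.match(name)]
print(regex_matches)  # ['file.ab']

# fnmatch understands shell-style wildcards such as "*.ab" directly.
wildcard_matches = fnmatch.filter(names, "*.ab")
print(wildcard_matches)  # ['file.ab']
```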
In Python 2.7 and 3, the following works:
>>> re.search(r"a{1,9999}", 'aaa')
<_sre.SRE_Match object at 0x1f5d100>
but this gives an error:
>>> re.search(r"a{1,99999}", 'aaa')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/re.py", line 142, in search
return _compile(pattern, flags).search(string)
File "/usr/lib/python2.7/re.py", line 240, in _compile
p = sre_compile.compile(pattern, flags)
File "/usr/lib/python2.7/sre_compile.py", line 523, in compile
groupindex, indexgroup
RuntimeError: invalid SRE code
It seems like there is an upper limit on the number of repetitions allowed. Is this part of the regular expression specification, or a Python-specific limitation? If Python-specific, is the actual number documented somewhere, and does it vary between implementations?
A quick manual binary search revealed the answer, specifically 65535:
>>> re.search(r"a{1,65535}", 'aaa')
<_sre.SRE_Match object at 0x2a9a68>
>>>
>>> re.search(r"a{1,65536}", 'aaa')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/re.py", line 142, in search
return _compile(pattern, flags).search(string)
File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/re.py", line 240, in _compile
p = sre_compile.compile(pattern, flags)
File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/sre_compile.py", line 523, in compile
groupindex, indexgroup
OverflowError: regular expression code size limit exceeded
This is discussed here:
The limit is an implementation detail. The pattern is compiled into codes which are then interpreted, and it just happens that the codes are (usually) 16 bits, giving a range of 0..65535, but it uses 65535 to represent no limit and doesn't warn if you actually write 65535.
and
The quantifiers use 65535 to represent no upper limit, so ".{0,65535}" is equivalent to ".*".
Thanks to the authors of the comments below for pointing a few more things out:
CPython implements this limitation in _sre.c. (@LukasGraf)
There is a constant MAXREPEAT in sre_constants.py that holds this max repetition value:
>>> import sre_constants
>>>
>>> sre_constants.MAXREPEAT
65535
(@MarkkuK. and @hcwhsa)
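The manual binary search can also be automated, which is handy for checking other interpreters or versions. The sketch below makes two assumptions: a bound that is too large is reported by an exception at compile time, and if a bound is accepted then every smaller bound is accepted too. The function names are made up for illustration, and the printed value depends on the Python version and build, so no expected output is given.

```python
import re


def compiles(upper):
    """Return True if the pattern a{1,upper} is accepted by the compiler."""
    try:
        re.compile(r"a{1,%d}" % upper)
        return True
    except (re.error, OverflowError, RuntimeError):
        # Different versions signal the limit with different exceptions.
        return False


def largest_repeat_bound(limit=2 ** 64):
    """Binary-search the largest upper bound in [1, limit] that compiles."""
    low, high = 1, limit  # assumes a{1,1} always compiles
    if compiles(high):
        return high
    # Invariant: low compiles, high does not.
    while high - low > 1:
        mid = (low + high) // 2
        if compiles(mid):
            low = mid
        else:
            high = mid
    return low


print(largest_repeat_bound())
```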