Doctest Involving Escape Characters - python

I have a function fix() that serves as a helper function to an output function which writes strings to a text file.
def fix(line):
    """
    returns the corrected line, with all apostrophes prefixed by an escape character
    >>> fix('DOUG\'S')
    'DOUG\\\'S'
    """
    if '\'' in line:
        return line.replace('\'', '\\\'')
    return line
Turning on doctests, I get the following error:
Failed example:
fix('DOUG'S')
Exception raised:
Traceback (most recent call last):
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/doctest.py", line 1254, in __run
compileflags, 1) in test.globs
File "<doctest convert.fix[0]>", line 1
fix('DOUG'S')
^
No matter what combination of \ and ' I use, the doctest doesn't seem to want to work, even though the function itself works perfectly. I suspect it is a result of the doctest being in a block comment (the docstring), but are there any tips to resolve this?

Is this what you want?:
def fix(line):
    r"""
    returns the corrected line, with all apostrophes prefixed by an escape character
    >>> fix("DOUG\'S")
    "DOUG\\'S"
    >>> fix("DOUG'S") == r"DOUG\'S"
    True
    >>> fix("DOUG'S")
    "DOUG\\'S"
    """
    return line.replace("'", r"\'")

import doctest
doctest.testmod()
Raw strings are your friend...

First, this is what happens if you actually call your function in the interactive interpreter:
>>> fix("Doug's")
"Doug\\'s"
Note that you don't need to escape single quotes in double-quoted strings, and that Python doesn't escape them in the representation of the resulting string either: only the backslash gets escaped.
This means the correct docstring should be (untested!):
"""
returns the corrected line, with all apostrophes prefixed by an escape character
>>> fix("DOUG'S")
"DOUG\\\\'S"
"""
I'd use a raw string literal for this docstring to make it more readable:
r"""
returns the corrected line, with all apostrophes prefixed by an escape character
>>> fix("DOUG'S")
"DOUG\\'S"
"""

Related

Python unit-test assertions are being joined by newlines

While using unittest.TestCase.run(test_class(test)), the correct errors are being reported but they are being joined by \n.
AssertionError: False is not true : Failures: [], Errors: [(<module1 testMethod=method1>, 'Traceback (most recent call last):\n File "<file_name>", line 20, in method1\n \'resource_partitions\')\n File "error_source_file_path", line 23, in error_function\n error_line_statement\nKeyError: \'gen_data\'\n')]
How can these be removed and replaced with actual newlines instead?
Has it got something to do with line endings on my machine (currently set to \n)?
That is intended behaviour.
The string is being displayed as part of another object (the list of errors). Such displays use repr() for the strings they contain, which shows escape sequences rather than converting them to the characters they stand for.
Look at this short example with a string in the interpreter:
>>> "spam\neggs"
'spam\neggs'
>>> print("spam\neggs")
spam
eggs
The first form is shown only because the interactive console echoes the repr() of each expression; in a normal script a bare expression prints nothing. But that is also how strings inside other objects behave.
Printing list containing a string vs printing each element separately:
>>> print(["spam\neggs"])
['spam\neggs']
>>> for element in ["spam\neggs"]: print(element)
...
spam
eggs

pyPEG2 parsing of newlines

I'm trying to use pyPEG2 to translate MoinMoin markup to Markdown, and I need to pay attention to newlines in certain cases. However, I can't even get my newline parsing tests to work. I'm new to pyPEG and my Python is rusty. Please bear with me.
Here's the code:
#!/usr/local/bin/python3
from pypeg2 import *
import re
class Newline(List):
    grammar = re.compile(r'\n')
parse("\n", Newline)
parse("""
""", Newline)
This results in:
Traceback (most recent call last):
File "./pyPegNewlineTest.py", line 7, in <module>
parse("\n", Newline)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pypeg2/__init__.py", line 667, in parse
t, r = parser.parse(text, thing)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pypeg2/__init__.py", line 794, in parse
raise r
File "<string>", line 2
^
SyntaxError: expecting match on \n
It's as if pypeg is inserting an empty line after the \n.
Trying other options such as
grammar = re.compile(r'\n', re.MULTILINE)
grammar = re.compile(r'\r\n|\r|\n', re.MULTILINE)
grammar = contiguous(re.compile(r'\r\n|\r|\n', re.MULTILINE))
and various combinations of those don't change the error message (although I don't think I tried all combinations). Changing Newline to subclass str instead of List doesn't change the error either.
Update
I have figured out that pypeg is stripping the newline before parsing it:
#!/usr/local/bin/python3
from pypeg2 import *
import re
class Newline(str):
    grammar = contiguous(re.compile(r'a'))
parse("\na", Newline)
parse("""
a""", Newline)
print("Success, of a sort.")
Running this results in:
Success, of a sort.
If I override Newline's parse method, I don't even see the newline. The first thing it gets is the "a". This is consistent with what I'm seeing elsewhere: pypeg strips all leading whitespace, even when you specify contiguous.
So, that's what's happening. Not sure what to do about it.
Yes, by default pypeg removes whitespace, including newlines.
This is easily configurable by setting the optional whitespace argument of the parse() function, e.g.:
parse("\na", Newline, whitespace=re.compile(r"[ \t\r]"))
With this setting, spaces, tabs and carriage returns will still be skipped, but newlines (\n) will not.
With this example, the parser now correctly finds the syntax error:
SyntaxError: expecting match on a
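The behaviour described above (skip whitespace first, then try the grammar element) can be imitated with the standard re module alone. This is a hypothetical sketch, not pypeg2's implementation: match_after_whitespace, DEFAULT_WS and NO_NEWLINE_WS are names made up for the illustration.

```python
import re

# Hypothetical stand-ins for a parser's "whitespace" setting
DEFAULT_WS = re.compile(r"\s+")           # also swallows "\n"
NO_NEWLINE_WS = re.compile(r"[ \t\r]+")   # spaces, tabs and CRs only


def match_after_whitespace(text, pattern, whitespace):
    """Skip a leading run of `whitespace`, then try `pattern` at that position.

    Returns the matched text, or None if `pattern` does not match there.
    """
    pos = 0
    skipped = whitespace.match(text, pos)
    if skipped:
        pos = skipped.end()
    found = pattern.match(text, pos)
    return found.group(0) if found else None


if __name__ == "__main__":
    newline = re.compile(r"\n")
    print(match_after_whitespace("\n", newline, DEFAULT_WS))           # None: the newline was already skipped
    print(repr(match_after_whitespace("\n", newline, NO_NEWLINE_WS)))  # '\n': the newline survives
```

This mirrors both observations in the thread: with a newline-eating whitespace setting, r'\n' can never match, and "\na" parses as if it were just "a".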

Substitute parenthesis for their regular expression

I'm trying to copy a file:
>>> originalFile = '/Users/alvinspivey/Documents/workspace/Image_PCA/spectra_text/HIS/jean paul test 1 - Copy (2)/bean-1-aa.txt'
>>> copyFile = os.system('cp '+originalFile+' '+NewTmpFile)
But I must first escape the spaces and parentheses before the open function will work:
/Users/alvinspivey/Documents/workspace/Image_PCA/spectra_text/HIS/jean\ paul\ test\ 1\ -\ Copy\ \(2\)/bean-1-aa.txt
spaces ' ' --> '\ '
parentheses '(' --> '\(' etc.
Replacing the spaces works:
>>> originalFile = re.sub(r'\s',r'\ ', os.path.join(root,file))
but the parentheses raise an error:
>>> originalFile = re.sub(r'(',r'\(', originalFile)
Traceback (most recent call last):
File "", line 1, in
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 151, in sub
return _compile(pattern, flags).sub(repl, string, count)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 244, in _compile
raise error, v # invalid expression
sre_constants.error: unbalanced parenthesis
Am I replacing parentheses correctly?
Also, when I use re.escape() for this, the file path does not come out correctly, so it is not an alternative.
( has a special meaning in regular expressions (grouping), so you have to escape it:
originalFile = re.sub(r'\(',r'\(', originalFile)
or, since you don't use regex features for the replacement:
originalFile = re.sub(r'\(','\(', originalFile)
The regular expression r'(' is interpreted as the start of a capturing group that is never closed, which is why Python complains about an unbalanced parenthesis.
If all you are doing is replacing spaces and parentheses, then maybe plain str.replace will do?
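Both the escaped-regex and the str.replace suggestions can be sketched as follows (the function names are hypothetical; the sample path comes from the question). Inside a character class [...], parentheses lose their grouping meaning, so one re.sub call can handle spaces and both parentheses.

```python
import re


def escape_with_replace(path):
    # Plain string replacement: no regex metacharacters are involved at all
    return path.replace(" ", "\\ ").replace("(", "\\(").replace(")", "\\)")


def escape_with_regex(path):
    # Inside a character class [...], "(" and ")" are literal, so one pass
    # can put a backslash in front of every space and parenthesis.
    # In the replacement, r"\\" is one literal backslash and \1 is the matched char.
    return re.sub(r"([ ()])", r"\\\1", path)
```

For building shell commands specifically, the standard library's shlex.quote() (Python 3.3+) quotes a whole argument for a POSIX shell and is usually sturdier than escaping characters by hand.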
Alternatively, if you avoid calling a shell (os.system) to do the copy, you don't need to worry about escaping spaces and other special characters:
import shutil
originalFile = '/Users/alvinspivey/Documents/workspace/Image_PCA/spectra_text/HIS/jean paul test 1 - Copy (2)/bean-1-aa.txt'
newTmpFile = '/whatever.txt'
shutil.copy(originalFile, newTmpFile)
Use shutil.copy to copy files, rather than calling the system.
Use subprocess rather than os.system: when given a list of arguments (with the default shell=False), it does not go through the shell, so the arguments need no quoting.
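A minimal sketch of the subprocess suggestion, assuming a Unix-like system where the cp command is available; copy_with_cp is a hypothetical helper name. Because the command is a list, a path like the one in the question can be passed unescaped.

```python
import subprocess


def copy_with_cp(src, dst):
    # A list of arguments with the default shell=False runs cp directly:
    # no shell parses the command line, so spaces and parentheses need no escaping.
    # check=True raises CalledProcessError if cp exits with a non-zero status.
    subprocess.run(["cp", src, dst], check=True)
```

Usage would be copy_with_cp(originalFile, newTmpFile). subprocess.run needs Python 3.5+; on Python 2.7, subprocess.check_call(["cp", src, dst]) plays the same role.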

Python re "bogus escape error"

I've been experimenting with the Python re module's .search method. cur is the input from a Tkinter entry widget. Whenever I enter a "\" into the entry widget, it throws this error. I'm not too sure what the error is or how to deal with it. Any insight would be much appreciated.
cur is a string
tup[0] is also a string
Snippet:
se = re.search(cur, tup[0], flags=re.IGNORECASE)
The error:
Exception in Tkinter callback
Traceback (most recent call last):
File "C:\Python26\Lib\Tkinter.py", line 1410, in __call__
return self.func(*args)
File "C:\Python26\Suite\quidgets7.py", line 2874, in quick_links_results
self.quick_links_results_s()
File "C:\Python26\Suite\quidgets7.py", line 2893, in quick_links_results_s
se = re.search(cur, tup[0], flags=re.IGNORECASE)
File "C:\Python26\Lib\re.py", line 142, in search
return _compile(pattern, flags).search(string)
File "C:\Python26\Lib\re.py", line 245, in _compile
raise error, v # invalid expression
error: bogus escape (end of line)
"bogus escape (end of line)" means that your pattern ends with a backslash. This has nothing to do with Tkinter. You can duplicate the error pretty easily in an interactive shell:
>>> import re
>>> pattern="foobar\\"
>>> re.search(pattern, "foobar")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/re.py", line 142, in search
return _compile(pattern, flags).search(string)
File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/re.py", line 241, in _compile
raise error, v # invalid expression
sre_constants.error: bogus escape (end of line)
The solution? Make sure your pattern doesn't end with a single backslash.
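If the goal is to search for exactly what the user typed, rather than to let them enter a regular expression, a common alternative (not mentioned in the answers here) is to run the input through re.escape() first. literal_search is a hypothetical helper name; this is a sketch.

```python
import re


def literal_search(user_text, haystack):
    # re.escape() backslash-escapes regex metacharacters, including a trailing "\",
    # so the text typed by the user is matched literally rather than parsed as a pattern
    return re.search(re.escape(user_text), haystack, flags=re.IGNORECASE)
```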
The solution to this issue is to use a raw string as the replacement text. The following won't work:
re.sub('this', 'This \\', 'this is a text')
It will throw the error: bogus escape (end of line)
But the following will work just fine:
re.sub('this', r'This \\', 'this is a text')
Now, the question is how do you convert a string generated during program runtime into a raw string in Python. You can find a solution for this here. But I prefer using a simpler method to do this:
def raw_string(s):
    # Python 2 only: Python 3 has no 'string-escape' codec and no unicode type
    if isinstance(s, str):
        s = s.encode('string-escape')
    elif isinstance(s, unicode):
        s = s.encode('unicode-escape')
    return s
The above function can convert only ASCII (str) and unicode strings into raw strings. Well, this has been working great for me to date :)
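On Python 3, where the raw_string helper above does not run, one alternative (a different technique from that answer's) is to pass a function as the replacement: re.sub() then inserts the function's return value without processing backslash escapes. sub_literal is a hypothetical helper name.

```python
import re


def sub_literal(pattern, replacement, text):
    # When repl is a function, re.sub() inserts its return value as-is:
    # backslashes in `replacement` are not processed as escapes
    return re.sub(pattern, lambda match: replacement, text)
```

With this, sub_literal('this', 'This \\', 'this is a text') works where the plain re.sub call above fails.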
If you are trying to search for cur in tup[0], you should do it inside a try: ... except: block to catch an invalid pattern:
try:
    se = re.search(cur, tup[0], flags=re.IGNORECASE)
except re.error as e:
    # print to stdout or any status widget in your GUI
    print("Your search pattern is not valid.")
    # Some details for error:
    print(e)
    # Or some other code for default action.
The first parameter to re.search is the pattern to search for, so if cur contains a backslash at the end of the line, it is an invalid escape sequence. You've probably swapped your arguments around (I don't know what tup[0] is, but is it your pattern?), and it should be like this:
se = re.search(tup[0], cur, flags=re.IGNORECASE)
You very rarely use user input as a pattern, unless you're building a regular-expression search feature, in which case you might want to show the error to the user instead.
HTH.
EDIT:
The error it is reporting is that you're using an escape character at the end of the line (which is what "bogus escape (end of line)" means): your pattern ends with a backslash, which is not a valid pattern. An escape character (backslash) must be followed by another character, and it removes or adds special meaning to that character. Conventions differ: POSIX basic regular expressions make groups by escaping the parentheses, while Perl removes the group effect by escaping them, and Python follows the Perl convention. That is, \* matches a literal asterisk, whereas * matches the preceding character 0 or more times.
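A small illustration of escaped versus unescaped metacharacters in Python's re module; the sample text is made up.

```python
import re

text = "2*3 (six)"

# "\*" matches a literal asterisk
literal_star = re.search(r"2\*3", text).group(0)

# "*" is a quantifier: "2*3" means "zero or more 2s, then a 3",
# so the first match is just the "3"
quantified = re.search(r"2*3", text).group(0)

# "\(" and "\)" match literal parentheses; the bare "(...)" inside captures a group
literal_parens = re.search(r"\((\w+)\)", text).group(1)

print(literal_star, quantified, literal_parens)
```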

how to detect an escape sequence in a string

Given a string named line whose raw version has this value:
\rRAWSTRING
How can I detect whether it contains the escape character \r? What I've tried is:
if repr(line).startswith('\r'):
    blah...
but it doesn't catch it. I also tried find, such as:
if repr(line).find('\r') != -1:
    blah
doesn't work either. What am I missing?
thx!
EDIT:
Thanks for all the replies and the corrections re terminology, and sorry for the confusion.
OK, if I do this
print repr(line)
then what it prints is:
'\rSET ENABLE ACK\n'
(including the single quotes). I have tried all the suggestions, including:
line.startswith(r'\r')
line.startswith('\\r')
Both return False. I also tried:
line.find(r'\r')
line.find('\\r')
Both return -1.
If:
print repr(line)
Returns:
'\rSET ENABLE ACK\n'
Then:
line.find('\r')
line.startswith('\r')
'\r' in line
are what you are looking for. Example:
>>> line = '\rSET ENABLE ACK\n'
>>> print repr(line)
'\rSET ENABLE ACK\n'
>>> line.find('\r')
0
>>> line.startswith('\r')
True
>>> '\r' in line
True
repr() returns a display string. It actually contains the quotes and backslashes you see when you print repr(line), so it is longer than the string itself:
>>> print line
SET ENABLE ACK
>>> print repr(line)
'\rSET ENABLE ACK\n'
>>> print len(line)
16
>>> print len(repr(line))
20
Dude, it seems you have tried everything but the simplest thing, line.startswith('\r'):
>>> line = '\rSET ENABLE ACK\n'
>>> line.startswith('\r')
True
>>> '\rSET ENABLE ACK\n'.startswith('\r')
True
For now, hold off on using repr(), \r and r'string', and go with the simplest thing to avoid confusion.
You can try either:
if repr(line)[1:].startswith(r'\r'):  # [1:] skips the opening quote that repr() adds
or
if line.startswith('\r'):
The latter is better: it seems like you are using repr only to get at the escaped character.
It's not entirely clear what you're asking. You speak of a string, but also a "raw version", and your string contains "RAWSTRING", which seems to imply you are talking about raw strings.
None of these are quite the same thing.
If you have an ordinary string with the character represented by '\r' in it, then you can use any ordinary means to match:
>>> "\rastring".find('\r')
0
>>>
If you defined an actual "raw string", that won't work because what you put in was not the '\r' character, but the two characters '\' and 'r':
>>> r"\rastring".find('\r')
-1
>>>
In this case, or in the case of an ordinary string that happens to have the characters '\' and 'r', and you want to find those two characters, you'll need to search using a raw string:
>>> r"\rastring".find(r'\r')
0
>>>
Or you can search for the sequence by escaping the backslash itself:
>>> r"\rastring".find('\\r')
0
>>>
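The two cases above can be told apart with plain membership tests. describe_cr is a hypothetical helper for illustration: it reports whether a string contains the carriage-return character, the two-character sequence backslash + r, or both.

```python
def describe_cr(s):
    # "\r" is the single carriage-return character;
    # r"\r" is two characters: a backslash followed by the letter r
    has_cr_char = "\r" in s
    has_backslash_r = r"\r" in s
    return has_cr_char, has_backslash_r
```

For the line from the question ('\rSET ENABLE ACK\n') only the first test is true; for a raw literal such as r'\rRAWSTRING' only the second is.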
if '\r' in line:
If that isn't what you mean, tell us PRECISELY what is in line. Do this:
print repr(line)
and copy/paste the result into an edit of your question.
Re your subject: backslash is an escape character, "\r" is an escaped character.
Simplest:
>>> s = r'\rRAWSTRING'
>>> s.startswith(r'\r')
True
Unless I badly misunderstand what you're saying, you need to look for r'\r' (or, equivalently but perhaps a tad less readably, '\\r'), the escape sequence, not for '\r', the carriage return character itself.
