Writing double backslash in python Docstring [duplicate] - python

This question already has answers here:
How can I put an actual backslash in a string literal (not use it for an escape sequence)?
(4 answers)
Closed 7 months ago.
I am writing documentation for a Python package with a clear_stop_character function, which lets users provide extra stop characters in a list. In the documentation I have written:
"""
stoplist (list, default empty): Accepts a list of extra stop characters,
should escape special regex characters (e.g., stoplist=['\\*']).
"""
It is crucial for the users to see the double backslash before the stop-char. However, the help() outcome of the built package shows:
"""
stoplist (list, default empty): Accepts a list of extra stop characters,
should escape special regex characters (e.g., stoplist=['\*']).
"""
So, it will be misleading for the users.
BTW, I did not find a solution based on the previous questions.
Any ideas?

In a regular (non-raw) Python string literal, \ starts an escape sequence, and \\ is the escape sequence for one literal backslash. A docstring is an ordinary string literal, so the \\ in your source is stored as a single \, which is why help() shows only one backslash.
The simplest solution to this problem is to use four backslashes: \\\\. Python reads the first pair as one literal \ and the second pair as another literal \, so the docstring ends up containing \\.
Simply rewrite your code as:
"""
stoplist (list, default empty): Accepts a list of extra stop characters,
should escape special regex characters (e.g., stoplist=['\\\\*']).
"""

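As a quick check of the fix above, here is a minimal sketch that compares the four-backslash docstring with a raw docstring (the r prefix), which is another common way to keep backslashes exactly as written. The function bodies and the name clear_stop_character_raw are placeholders for illustration, not the package's real code.

```python
def clear_stop_character(text, stoplist=None):
    """
    stoplist (list, default empty): Accepts a list of extra stop characters,
    should escape special regex characters (e.g., stoplist=['\\\\*']).
    """
    # Placeholder body; only the docstring matters for this example.
    return text


def clear_stop_character_raw(text, stoplist=None):
    r"""
    stoplist (list, default empty): Accepts a list of extra stop characters,
    should escape special regex characters (e.g., stoplist=['\\*']).
    """
    # Placeholder body; only the docstring matters for this example.
    return text


# Both docstrings hold the same text, with two backslashes before the *.
print(clear_stop_character.__doc__ == clear_stop_character_raw.__doc__)  # True
print(r"stoplist=['\\*']" in clear_stop_character.__doc__)  # True
```

With the raw docstring, what you type is what help() displays, which may be easier to maintain when a docstring contains many backslashes.
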
Related

Regex patterns with windows paths in python [duplicate]

This question already has answers here:
Why do backslashes appear twice?
(2 answers)
Closed 7 months ago.
I found a python package on GitHub that doesn't work. It attempts to replace a substring within a url with another string.
import re

string = "filename.txt"
rewrite = "c:\\windows\\system32\\drivers\\hosts"
url = "https://www.example.com/path?parameter=filename.txt"
fullrewrite = re.sub(string, rewrite, url)
The string, rewrite, and url parameters are arbitrary and not hard-coded. I just put them there as an example (this is a path traversal testing library I'm trying to play around with).
When I run this code, I get a KeyError from re, which is expected according to the docs:
If you’re not using a raw string to express the pattern, remember that Python also uses the backslash as an escape sequence in string literals; if the escape sequence isn’t recognized by Python’s parser, the backslash and subsequent character are included in the resulting string. However, if Python would recognize the resulting sequence, the backslash should be repeated twice. This is complicated and hard to understand, so it’s highly recommended that you use raw strings for all but the simplest expressions.
I tried using repr to convert the string into a raw string:
raw = repr(rewrite)[1:-1] # [1:-1] removes extra quotes.
fullrewrite = re.sub(string, raw, url)
But this creates double backslashes in the resulting url: https://www.example.com/path?parameter=c:\\windows\\system32\\drivers\\hosts
My question is how am I supposed to have it replace the key word so that the resulting string is: https://www.example.com/path?parameter=c:\windows\system32\drivers\hosts?
This is my understanding; please correct me if I'm wrong.
You are not getting double backslashes; you are seeing escaped backslashes. In both Python string literals and regular expressions, a backslash is a special character that usually does not stand for itself. To get one literal backslash you normally have to escape it with another, so a double backslash is how a single backslash is written in source code and shown by repr().
If you pass 'c:\\' to print() or write it to a text file, you get c:\.
P.S. Since '\q' is not a recognized escape sequence in Python, '\q' == '\\q' returns True.
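One way to get the single-backslash result, sketched here under the assumption that the replacement text should always be inserted literally: pass a function as the replacement argument of re.sub. The re module does not process backslash escapes in the value a replacement function returns. Wrapping the search text in re.escape is an extra precaution so that the . in filename.txt matches only a literal dot.

```python
import re

string = "filename.txt"
rewrite = "c:\\windows\\system32\\drivers\\hosts"
url = "https://www.example.com/path?parameter=filename.txt"

# A callable replacement is inserted as-is, so backslashes in `rewrite`
# are not interpreted as regex escapes such as \w or \s.
fullrewrite = re.sub(re.escape(string), lambda match: rewrite, url)
print(fullrewrite)
# https://www.example.com/path?parameter=c:\windows\system32\drivers\hosts

# For a plain substring swap, str.replace gives the same result without regex.
print(url.replace(string, rewrite) == fullrewrite)  # True
```
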

Removing backslash in Python at runtime [duplicate]

This question already has answers here:
How can I convert special characters in a string back into escape sequences?
(3 answers)
Closed 7 months ago.
I need a way for my function to take in a string at runtime and remove the backslashes while KEEPING the character it is prepended to. So for \a I must get a. This must also work for nonescaped characters like \e -> e.
I've scoured the internet looking for a general solution to this problem, but there does not appear to be one. The best solution I have found uses a dictionary to build the string from scratch like: How to prevent automatic escaping of special characters in Python
escape_dict={'\a':r'\a',
'\b':r'\b',
'\c':r'\c',
'\f':r'\f',
'\n':r'\n',
'\r':r'\r',
'\t':r'\t',
'\v':r'\v',
'\'':r'\'',
'\"':r'\"',
'\0':r'\0',
'\1':r'\1',
'\2':r'\2',
'\3':r'\3',
'\4':r'\4',
'\5':r'\5',
'\6':r'\6',
'\7':r'\7',
'\8':r'\8',
'\9':r'\9'}
def raw(text):
    """Returns a raw string representation of the string"""
    new_string = ''
    for char in text:
        try:
            new_string += escape_dict[char]
        except KeyError:
            new_string += char
    return new_string
However, this fails in general because of conflicts between the escaped numbers and escaped letters. Using three-digit numbers like \001 instead of \1 also fails, because the output then contains additional digits, which defeats the purpose. I should simply remove the backslash. Other proposed solutions based on encodings, like the one found in "Process escape sequences in a string in Python", also do not work, because they just convert the escape characters into their hex codes: \a gets converted to \x07. Even if I were to somehow remove this, the character a would still be lost.
There is a function you may want to use for this purpose called repr().
repr() computes the “official” string representation of an object (a representation that has all information about the object) and str() is used to compute the “informal” string representation of an object (a representation that is useful for printing the object).
Example:
s = 'This is a \t string tab. And this is a \n newline character'
print(s)  # Prints s with a real tab and a real newline in the output
print(repr(s))  # Prints s as written in the source, with \t and \n shown as backslash plus letter
# repr(s) contains literal backslashes, so they can be replaced:
print(repr(s).replace('\\', '_'))
# s itself contains no backslash characters, so this replaces nothing:
print(s.replace('\\', '_'))
So you can find and replace the backslashes by working on repr(<your string>) rather than on the string itself.
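If the string really does contain backslash characters at runtime (for example, it was read from user input or a file rather than typed as a literal in source code), a regular expression can drop each backslash and keep the character after it. This is a minimal sketch under that assumption; strip_backslashes is a hypothetical helper name. It cannot recover a letter from a string such as '\a' written in source code, because by then Python has already turned it into a single control character.

```python
import re


def strip_backslashes(text):
    """Remove each backslash but keep the character that follows it."""
    # \\ matches one literal backslash; (.) captures the next character.
    return re.sub(r"\\(.)", r"\1", text, flags=re.DOTALL)


# Raw literals stand in for text received at runtime.
print(strip_backslashes(r"\a and \e"))  # a and e
```
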

Using findall method in a tokenized text, and prefix 'r' [duplicate]

This question already has answers here:
What does the "r" in pythons re.compile(r' pattern flags') mean?
(3 answers)
Closed 5 years ago.
I understand that the 'r' prefix indicates a raw string. Why is the 'r' prefix being used in the following example, given that the string contains special regex characters that should not be taken literally?
The 'string' being searched is an nltk Text object, and I suppose that has something to do with it. However, I don't understand how it affects the usage of findall.
moby.findall(r"<a> (<.*>) <man>")
In this particular case, r makes no difference, as this string does not contain any sequences which could be misinterpreted. However, it is a good habit to use r when writing regular expressions, to avoid misinterpretation of sequences like \n or \t; with r, they are treated literally, as two characters - backslash followed by a letter; without r, they evaluate to newline and tab, respectively.
The r preceding the string is a string prefix that marks the literal as a raw string.
For example, '\n' will be treated as a newline character, while r'\n' will be treated as the characters \ followed by n.
But for your regex:
moby.findall(r"<a> (<.*>) <man>")
it doesn't make a difference. Still, it is a good idea to write regex patterns as raw strings so that you do not have to double every backslash.
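A small sketch of where the prefix does matter, using only the standard re module (the sample text is made up for illustration). In a normal string, \b is a backspace character; in a raw string, the backslash reaches the regex engine, where \b means a word boundary.

```python
import re

print(len('\n'))   # 1: a single newline character
print(len(r'\n'))  # 2: a backslash followed by the letter n

text = 'a cat sat'
# Without r, the pattern is backspace + "cat" + backspace, which never matches.
plain_matches = re.findall('\bcat\b', text)
# With r, the regex engine sees \b and treats it as a word boundary.
raw_matches = re.findall(r'\bcat\b', text)
print(plain_matches)  # []
print(raw_matches)    # ['cat']
```
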

Why is Python automatically putting a \ in front of '? [duplicate]

This question already has answers here:
Why do backslashes appear twice?
(2 answers)
Closed 7 years ago.
Pretty much as the question states: I have code that finds sentences in a big string using regex findall(). It then uses a sentence later, but when it does, it puts a backslash in front of any apostrophe; for example, Today's becomes Today\'s. Why is this happening, and how can I stop it?
It's called escaping. When a " or ' appears inside a string delimited by the same quote character, it is written with a preceding \ so that it does not end the string. The backslash you see belongs to the way the string is displayed (its repr()), for example when you print the whole list returned by findall(); it is not part of the string itself.
The backslash denotes a so-called escape sequence, which basically tells Python that this character has to be interpreted differently from a "normal" ' character (which would signal the beginning or end of a string for the interpreter).
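A minimal sketch of the difference between displaying a list and printing one of its items; the sentence is made up and stands in for a findall() result. Python shows the escaped form when the string contains both kinds of quote and is displayed through repr(), which is what printing a list does.

```python
# Stand-in for a list returned by findall().
sentences = ['He said "Today\'s the day"']

# Printing the list shows each item's repr(), including the escape.
print(sentences)     # ['He said "Today\'s the day"']

# Printing the item itself shows the real text, with no backslash.
print(sentences[0])  # He said "Today's the day"

print('\\' in sentences[0])  # False: the string holds no backslash
```
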

Python: Using numbers after "\" [duplicate]

This question already has answers here:
How can I put an actual backslash in a string literal (not use it for an escape sequence)?
(4 answers)
Closed 7 months ago.
How can I prevent Python from interpreting \ followed by digits as an escape sequence?
e.g.
I get DirectoryNameFromAnotherProgram (say it is equal to 'N:\Some Directory')
print(DirectoryNameFromAnotherProgram + '\1234.txt')
# prints:
# N:\Some DirectoryS4.txt
Since the string with "\" comes as output from another script, I do not have a choice to change it.
Put a "\" in front of the "\". The meaning of "\" in a string literal is: the next character doesn't mean what it normally means. If the next character is not normally special (for example, a digit), it means something special now. If the next character does normally mean something special (for example, a backslash), it's not special now. Either way, the initial "\" has done its job and is removed.
Special case: if the next character is not normally special (for example, the "S" in your string), but cannot be made special (the sequence "\S" has no special meaning), then the backslash doesn't do anything and is not removed.
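A short sketch of the behaviour described above. The variable directory stands in for the value received from the other program, and the file name is the one from the question. In '\1234.txt', Python reads \123 as an octal escape for the character S, which is where the S4.txt in the output comes from.

```python
# Stand-in for the directory name received from another program.
directory = 'N:\\Some Directory'

# \123 is an octal escape for "S", so the file name becomes "S4.txt".
with_octal = directory + '\1234.txt'
print(with_octal)  # N:\Some DirectoryS4.txt

# Doubling the backslash keeps it as a literal backslash.
escaped = directory + '\\1234.txt'
print(escaped)     # N:\Some Directory\1234.txt

# A raw string literal gives the same result without doubling.
raw = directory + r'\1234.txt'
print(raw == escaped)  # True
```

Only string literals in your own source code are affected; a value that arrives from another program at runtime is not re-interpreted.
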
