Difference between u"string" and ur"string" in Python [duplicate]

This question already has answers here:
What exactly do "u" and "r" string prefixes do, and what are raw string literals?
(7 answers)
Closed 6 years ago.
From documentation:
The solution is to use Python’s raw string notation for regular
expression patterns; backslashes are not handled in any special way in
a string literal prefixed with 'r'. So r"\n" is a two-character string
containing '\' and 'n', while "\n" is a one-character string
containing a newline. Usually patterns will be expressed in Python
code using this raw string notation.
The types also match: type(u"text") == type(ur"text"), and the same holds when you remove the u. Therefore, I have to ask: what is the difference between these two? If there is no difference, why use r at all?

For example:
>>> len(ur"tex\t")
5
>>> len(u"tex\t")
4
Without r, the \t is one character (the tab) so the string has length 4.
Use r if you want to build a regular expression that involves \. In a non-raw string you would have to escape every backslash, which is no fun.
>>> len(u"\\")
1
>>> len(ur"\\")
2

Related

How to calculate the number of all elements including escape sequences in a string? [duplicate]

This question already has answers here:
How do I get the string representation of a variable in python?
(4 answers)
Closed last month.
I have a string, and I have to count all elements in this string.
str = '\r\n\r\n\r\n \r\n \xa0\xa0\r\nIntroduction\r\n\r\n\r\nHello\r\n\r\nWorld\r\nProblems...\r\nHow to calculate numbers...\r\nConclusion\r\n\r\n\r\n\xa0\r\n\r\nHello world.'
These elements contain numbers, letters, escape sequences, whitespaces, commas, etc.
Is there any way to count all elements in this kind of string in Python?
I know that len() and count() cannot help. And I also tried some regex methods like re.findall(r'.', str), but it cannot find elements like \n and also can only find \r instead of \ and r.
Edit:
To be more clear, I want to count \n as 2, not 1, and also \xa0 as 4, not 1.
\ is a special character in Python string literals, so you have to escape it, as in str = '\\r\\n ', or use a raw string, as in str = r'\r\n '. After that, len() counts \ as an independent character.
Python compiles your string literal into a Python string in which escape sequences such as \n are replaced with the character they stand for (in this case U+000A, the line feed). len counts this two-character sequence as a single character.
By the time your code sees this string, the original escape sequences from the literal are gone. But the repr representation adds escape sequences back, so you could take the length of that. Note that repr also adds the surrounding quotes, which contribute 2 to the count.
>>> s = '\r\n\r\n\r\n \r\n \xa0\xa0\r\nIntroduction\r\n\r\n\r\nHello\r\n\r\nWorld\r\nProblems...\r\nHow to calculate numbers...\r\nConclusion\r\n\r\n\r\n\xa0\r\n\r\nHello world.'
>>> print(len(s))
123
>>> print(len(repr(s)))
170
This isn't going to be 100% accurate because there is more than one way to write a given character in a string literal. For instance, "\n" and "\x0a" both produce the same newline character, and there is no way to know which form it came from.
Alternatively, you could use "raw" strings, which do not process escape sequences. So r"\n" has length 2.
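As a rough sketch of the "count the escaped form" idea without the surrounding quotes that repr adds, the built-in unicode_escape codec can be used instead. The helper name escaped_length is hypothetical. The codec's spelling may differ from repr for some characters (for example, it escapes every non-ASCII character), so treat the result as one possible count rather than the count.

```python
def escaped_length(text: str) -> int:
    """Length of text with escape sequences spelled out (hypothetical helper).

    Relies on the standard unicode_escape codec, which writes a newline
    as the two bytes backslash + n, and U+00A0 as the four bytes \\xa0.
    """
    return len(text.encode("unicode_escape"))


sample = "\r\n \xa0Hello"

print(len(sample))             # counts each escape sequence as one character
print(escaped_length(sample))  # counts the spelled-out escape sequences
```

As with repr, this cannot tell whether a newline was originally typed as \n or \x0a; it always reports the codec's own spelling.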

How to ignore backslashes as escape characters in Python? [duplicate]

This question already has answers here:
How to write string literals in Python without having to escape them?
(6 answers)
Closed 7 months ago.
I know this is similar to many other questions regarding backslashes, but this deals with a specific problem that has yet to be addressed. Is there a mode that can be used to completely eliminate backslashes as escape characters in a print statement? I need this for ASCII art, as it is very difficult to get the positioning right when every backslash must be doubled.
print('''
/\\/\\/\\/\\/\\
\\/\\/\\/\\/\\/
''')
Prefix the string with r (for "raw") and it will be interpreted literally, without escape processing:
>>> # Your original
>>> print('''
... /\\/\\/\\/\\/\\
... \\/\\/\\/\\/\\/
... ''')
/\/\/\/\/\
\/\/\/\/\/
>>> # as a raw string instead
>>> print(r'''
... /\\/\\/\\/\\/\\
... \\/\\/\\/\\/\\/
... ''')
/\\/\\/\\/\\/\\
\\/\\/\\/\\/\\/
These are often used for regular expressions, where it gets tedious to double every backslash. There are a few other prefix letters, including f (for format strings, which behave differently), b (a bytes literal instead of a string), and u, which designated Unicode strings in Python 2 and is accepted in Python 3 only for compatibility, where it has no effect.
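For the ASCII art in the question, the raw form would be written with single backslashes. A small sketch, with the variable names escaped and raw chosen for illustration, checks that both spellings produce the same text:

```python
# The doubled-backslash version from the question.
escaped = '''
/\\/\\/\\/\\/\\
\\/\\/\\/\\/\\/
'''

# The same art as a raw string: each backslash is typed once.
# A backslash at the end of a line is fine here because the next
# character is a newline, not the closing quote.
raw = r'''
/\/\/\/\/\
\/\/\/\/\/
'''

print(raw)
print(escaped == raw)
```

One caveat: the art must not place a backslash directly before the closing quotes, since a raw string literal cannot end in an odd number of backslashes.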

What does an 'r' represent before a string in python? [duplicate]

This question already has answers here:
What exactly do "u" and "r" string prefixes do, and what are raw string literals?
(7 answers)
Closed 7 years ago.
I want to understand why we use an r before a path name in Python, such as
dirname = r'C:\temp\parts'
r means the string will be treated as a raw string.
See the official Python 2 Reference about "String literals":
When an 'r' or 'R' prefix is present, a character following a
backslash is included in the string without change, and all
backslashes are left in the string. For example, the string literal
r"\n" consists of two characters: a backslash and a lowercase 'n'.
String quotes can be escaped with a backslash, but the backslash
remains in the string; for example, r"\"" is a valid string literal
consisting of two characters: a backslash and a double quote; r"\" is
not a valid string literal (even a raw string cannot end in an odd
number of backslashes). Specifically, a raw string cannot end in a
single backslash (since the backslash would escape the following quote
character). Note also that a single backslash followed by a newline is
interpreted as those two characters as part of the string, not as a
line continuation.
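To see why this matters for Windows paths, here is a minimal sketch using a shortened path (C:\temp rather than the full path from the question, so that every escape in the non-raw version is a valid one). The names plain_path and raw_path are illustrative.

```python
# Without the r prefix, \t is read as a tab character.
plain_path = "C:\temp"

# With the r prefix, the backslash and the 't' are both kept.
raw_path = r"C:\temp"

print(len(plain_path), len(raw_path))
print("\t" in plain_path)   # a tab ended up in the path
print("\\" in raw_path)     # the backslash survived

# Doubling the backslash gives the same string as the raw literal.
print(raw_path == "C:\\temp")
```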

Trailing slash in a raw string [duplicate]

This question already has answers here:
Why can't I end a raw string with a backslash? [duplicate]
(4 answers)
Why can't Python's raw string literals end with a single backslash?
(14 answers)
Closed 8 years ago.
Just a quick silly question. How do I write a trailing backslash in a raw string literal?
r = r'abc\' # syntax error
r = r'abc\\' # two slashes: "abc\\"
You can't. A raw string literal can't end with an odd number of backslashes (langref; last paragraph of that section). You can, however, write a raw string literal without the backslash, and write the final backslash as an ordinary string literal:
r = r'abc' '\\'
Adjacent string literals are implicitly concatenated by the parser.
Raw string literals are parsed in exactly the same way as ordinary string literals; it’s just the conversion from string literal to string object that’s different. This means that all string literals must end with an even number of backslashes; otherwise, the unpaired backslash at the end escapes the closing quote character, leaving an unterminated string.
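A few ways to end up with a string that finishes in a single backslash can be compared side by side. This is only a sketch; the variable names are arbitrary, and the slicing variant is one extra workaround not mentioned above.

```python
# Raw literal plus an ordinary literal, joined by implicit concatenation.
concatenated = r"abc" "\\"

# An ordinary literal with the backslash escaped.
ordinary = "abc\\"

# A raw literal with a throwaway trailing space that is sliced off.
sliced = r"abc\ "[:-1]

print(concatenated)
print(concatenated == ordinary == sliced)
print(len(concatenated))
```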

backslash in Yaml string [duplicate]

This question already has answers here:
Why do backslashes appear twice?
(2 answers)
Closed 8 years ago.
So I'm using YAML for some configuration files and PyYAML to parse them.
For one field I have something like:
host: HOSTNAME\SERVER,5858
But when it gets parsed, here's what I get:
{
"host": "HOSTNAME\\SERVER,5858"
}
With 2 backslashes. I tried every combination of single quotes, double quotes, etc.
What's the best way to parse it correctly?
Thanks
len("\\") == 1. What you see is the representation of the string as a Python string literal. A backslash has special meaning in a Python literal, e.g., "\n" is a single character (a newline). To get a literal backslash in a string, it must be escaped as "\\".
You aren't getting two backslashes. Python is displaying the single backslash as \\ so that you don't think you've actually got a \S character (which doesn't exist... but e.g. \n does, and Python is trying to be as unambiguous as possible) in your string. Here's proof:
>>> data = {"host": "HOSTNAME\\SERVER,5858"}
>>> print(data["host"])
HOSTNAME\SERVER,5858
>>>
For more background, check out the documentation for repr().
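The difference between the stored string and its displayed form can be checked directly. The sketch below avoids PyYAML and simply starts from the already-parsed value; using json.dumps for the dictionary display is an assumption about how the output in the question was produced.

```python
import json

# The parsed value: one backslash, written with an escape in the literal.
host = "HOSTNAME\\SERVER,5858"

print(host)                         # the actual contents
print(repr(host))                   # the Python-literal representation
print(json.dumps({"host": host}))   # JSON also escapes the backslash

# Counting backslashes shows which forms are doubled.
print(host.count("\\"), repr(host).count("\\"))
```

Only the representations contain two backslashes; the string itself has one, so it can be passed to whatever needs the host name as-is.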
