Changing file path and need for raw? [duplicate] - python

This question already has answers here:
What exactly do "u" and "r" string prefixes do, and what are raw string literals?
(7 answers)
Closed 1 year ago.
import os
cwd = os.getcwd()
print("Current working directory: {0}".format(cwd))
# Print the type of the returned object
print("os.getcwd() returns an object of type: {0}".format(type(cwd)))
os.chdir(r"C:\Users\ghph0\AppData\Local\Programs\Python\Python39\Bootcamp\PDFs")
# Print the current working directory
print("Current working directory: {0}".format(os.getcwd()))
Hi all, I was changing my working directory so I could access specific files, and (before I added the r prefix to the path) was greeted with this error:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
From there I did some research and was told that making the string a raw string would fix the problem. My questions are: why do I need to make it raw, what does that actually do, and why does it turn the file path a red colour? (Not really important, but I've never seen this before.) Picture below:
https://i.stack.imgur.com/4oHlC.png
Many thanks to anyone that can help.

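Backslashes in ordinary string literals have a special meaning in Python: they start escape sequences, which the interpreter translates when it reads your source code. You have surely already encountered "\n". Despite taking two characters to type, it is actually a one-character string meaning "newline". Python tries to interpret every backslash in an ordinary literal this way (unrecognized sequences such as "\d" are currently left alone, with a warning). In your particular case, the path contains "\U" (from "C:\Users"), which is how Python lets you type a Unicode code point as eight hex digits. "\U0001F600", for example, is the grinning face emoji. Because "sers..." is not eight hex digits, Python reports a truncated \UXXXXXXXX escape.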
Because regular expressions often need to use backslashes for other uses, Python introduced the "raw" string. In a raw string, backslashes are not interpreted. So, r"\n" is a two-character string containing a backslash and an "n". This is NOT a newline.
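A quick way to see the difference is to compare string lengths. This is a small illustrative sketch, not part of the original answer; note that the emoji escape needs all eight hex digits:

```python
# An ordinary literal: "\n" is ONE character, a newline.
print(len("\n"))         # 1

# A raw literal: r"\n" is TWO characters, a backslash followed by "n".
print(len(r"\n"))        # 2
print(r"\n" == "\\n")    # True: same string as the doubled-backslash spelling

# \U must be followed by exactly eight hex digits; the result is one character.
grin = "\U0001F600"
print(len(grin), hex(ord(grin)))   # 1 0x1f600

# Part of the question's path as a raw string: every backslash is kept.
path = r"C:\Users\ghph0\AppData"
print(path.count("\\"))  # 3
```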
Windows paths use backslashes, so raw strings are convenient there. As it turns out, the Windows file APIs also accept forward slashes in most cases, so you can write "C:/Users/..." instead.
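As for the colour, that is most likely just your editor's syntax highlighting treating raw strings differently from ordinary strings (many editors do, since raw strings are common in regular expressions); it does not change how the code runs.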

Related

How to make python 3 understand double backslash? [duplicate]

This question already has answers here:
How to fix "<string> DeprecationWarning: invalid escape sequence" in Python?
(2 answers)
Closed 1 year ago.
Despite what SO keeps suggesting, I do not want to replace double backslashes; I want Python to understand them.
I need to copy files from a remote Windows directory to my local machine.
For example, a rough shell equivalent (with Windows paths):
cp \\directory\subdirectory\file ./mylocalfile
But Python does not even understand double backslashes in strings:
source = "\\dir\subdir\file"
print(source)
$ python __main__.py
__main__.py:1: DeprecationWarning: invalid escape sequence \s
source = "\\dir\subdir\file"
\dir\subdir
ile
Can Python understand Windows paths (with double backslashes) so that I can copy files?
You can try this:
source = r"\\dir\subdir\file"
print(source)
# \\dir\subdir\file
The r prefix makes this a raw string literal, so every backslash is kept exactly as typed, including the two leading ones of the network (UNC) path.
Raw strings treat the backslash (\) as a literal character. For example, if you print an ordinary string containing "\n", you get a line break; if you mark it as a raw string, Python prints the backslash and the "n" as two ordinary characters.
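Since the goal is copying a file from a network share, here is a minimal sketch, assuming Python 3. The server, share and file names are placeholders, and copy_remote_file is a hypothetical helper name; shutil.copy and pathlib.PureWindowsPath are standard library. PureWindowsPath only parses the string, so that part runs on any OS:

```python
import shutil
from pathlib import PureWindowsPath

# Placeholder UNC path: "server", "share" and "file.txt" are hypothetical names.
# The raw string keeps both leading backslashes that mark a network path.
source = r"\\server\share\subdir\file.txt"
print(source)             # \\server\share\subdir\file.txt

# Parse it with Windows rules without touching the filesystem.
p = PureWindowsPath(source)
print(p.drive)            # \\server\share
print(p.name)             # file.txt

def copy_remote_file(src: str, dest: str = "mylocalfile") -> str:
    """Hypothetical helper: copy src to dest and return the destination path.

    On Windows, shutil.copy accepts UNC paths such as `source` directly.
    """
    return shutil.copy(src, dest)
```

On a Windows machine with access to the share, copy_remote_file(source) would then do roughly what the cp command in the question does.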

What is the right way to encode a string with backslashes? [duplicate]

This question already has answers here:
How to fix "<string> DeprecationWarning: invalid escape sequence" in Python?
(2 answers)
Closed 4 years ago.
In the given example: "\info\more info\nName"
How would I turn this into bytes?
I tried using unicode-escape but that didn't seem to work :(
data = "\info\more info\nName"
dataV2 = str.encode(data)
FinalData = dataV2.decode('unicode-escape').encode('utf_8')
print(FinalData)
This is where I expected to get b'\info\more info\nName',
but something unexpected happens and I get DeprecationWarnings in my terminal.
I'm assuming it's because the backslashes form invalid escape sequences, but I need them for this project.
Backslashes before characters indicate an attempt to escape the character that follows to make it into a special character of some sort. You get the DeprecationWarning because Python is (finally) going to make unrecognized escapes an error, rather than silently treating them as a literal backslash followed by the character. (Since Python 3.12, the warning is a SyntaxWarning instead.)
To fix this, either double your backslashes (I'm not sure whether you intended \n to be a newline; if you didn't, double the backslash before the n as well, as done here):
data = "\\info\\more info\\nName"
or, if you want all the backslashes to be literal backslashes (the \n shouldn't be a newline), then you can use a raw string by prefixing with r:
data = r"\info\more info\nName"
which turns off escape processing entirely. The one quirk is that a backslash before a quote character still stops that quote from ending the string, and the backslash stays in the result.
Note that if you just let data echo in the interactive interpreter, it will show the backslashes as doubled (because it implicitly uses the repr of the str, which is what you'd type to reproduce it). To avoid that, print the str to see what it would actually look like:
>>> "\\info\\more info\\nName" # repr produced by simply evaluating it, which shows backslashes doubled, but there's really only one each time
'\\info\\more info\\nName'
>>> print("\\info\\more info\\nName") # print shows the "real" contents
\info\more info\nName
>>> print("\\info\\more info\nName") # With new line left in place
\info\more info
Name
>>> print(r"\info\more info\nName") # Same as first option, but raw string means no doubling backslashes
\info\more info\nName
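Both halves of this answer can be checked with a short script. This is an illustrative sketch, assuming Python 3.6 or later: it uses compile to reproduce the invalid-escape warning without running anything, then shows the raw string actually turned into bytes, which is what the question asked for:

```python
import warnings

# Compile (but do not run) a literal containing the invalid escape "\i" and
# record the warning: DeprecationWarning on 3.6-3.11, SyntaxWarning on 3.12+.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    compile(r'"\info"', "<demo>", "eval")
for w in caught:
    print(w.category.__name__, w.message)

# With a raw string every backslash is literal, so encoding to bytes keeps them.
data = r"\info\more info\nName"
encoded = data.encode("utf-8")
print(encoded)                  # b'\\info\\more info\\nName'
print(len(data), len(encoded))  # 21 21
```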
You can escape a backslash with another backslash.
data = "\\info\\more info\nName"
You could also use a raw string for the parts that don't need escapes.
data = r"\info\more info""\nName"
Note that a raw string can't end in a single (or any odd number of) backslash, because a backslash before the closing quote escapes it: r"C:\temp\" is a SyntaxError.
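To make the trailing-backslash rule concrete, here is a small sketch, assuming Python 3. It uses compile to observe the SyntaxError without crashing, then shows a common workaround and the mixed raw/ordinary concatenation from this answer:

```python
# r"C:\temp\" never closes: the final \" is read as an escaped quote.
try:
    compile(r'r"C:\temp\"', "<demo>", "eval")
except SyntaxError as exc:
    print("SyntaxError:", exc.msg)

# Workaround: end with an ordinary escaped backslash instead.
folder = r"C:\temp" + "\\"
print(folder == "C:\\temp\\")   # True

# Implicit concatenation of a raw and an ordinary literal:
# only the second part turns "\n" into a real newline.
data = r"\info\more info" "\nName"
print(data == "\\info\\more info\nName")   # True
```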

Ignoring escape sequences

I'm using Python 2.6 and I have a variable containing a string (I sent it through sockets and now I want to do something with it).
The problem is that I get the following error:
TypeError: file() argument 1 must be encoded string without NULL bytes, not str
After looking it up, I found that the problem is probably that the string I'm sending contains '\0'. But it isn't a literal string that I can simply edit by doubling the backslash or adding an r prefix, so is there a way to tell Python to ignore the escape sequences and treat the whole thing as a string?
(For example - I don't want python to treat the sequence \0 as a null char, but rather I want it to be treated as a backslash char followed by a zero char)
Considering all the comments, this looks like incorrect use of the PIL/Pillow API: the Image.open function expects a file name (or a file object), but it was given the file's data.
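A sketch of what that diagnosis implies, assuming Python 3 (the question used Python 2.6, where the same mistake raises the TypeError shown above rather than a ValueError). Escape sequences only exist in source-code literals: data received from a socket already contains a real NUL byte, not a backslash and a zero, so there is nothing to "ignore". The payload below is made-up sample data:

```python
import io

# Made-up stand-in for bytes received over a socket: the start of a PNG file,
# which legitimately contains NUL bytes (b"\x00").
payload = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"
print(payload.count(b"\x00"))   # 3

# Passing the data where a *file name* is expected fails, because file
# names may not contain NUL bytes.
try:
    open(payload)
except ValueError as exc:
    print("open() rejected the data:", exc)

# Instead, wrap the bytes in an in-memory file object. APIs that accept
# file objects (Pillow's Image.open is documented to) can read from it.
buffer = io.BytesIO(payload)
print(buffer.read(4))           # b'\x89PNG'
```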

SyntaxError when trying to use backslash for Windows file path [duplicate]

This question already has answers here:
How can I put an actual backslash in a string literal (not use it for an escape sequence)?
(4 answers)
Closed 7 months ago.
I tried to confirm if a file exists using the following line of code:
os.path.isfile()
But I noticed that if I use backslashes, as in a path copied and pasted from Windows:
os.path.isfile("C:\Users\xxx\Desktop\xxx")
I got a syntax error: (unicode error) etc etc etc.
When forward slash is used:
os.path.isfile("C:/Users/xxx/Desktop/xxx")
It worked.
Can someone explain why this happens, even if the answer is as simple as "It is a convention"?
Backslash is the escape symbol. This should work:
os.path.isfile("C:\\Users\\xxx\\Desktop\\xxx")
This works because you escape the escape symbol, so the string Python actually stores is:
"C:\Users\xxx\Desktop\xxx"
But it's better practice, and helps cross-platform compatibility, to collect your path segments (perhaps conditionally, based on the platform) like this and join them with os.path.join:
path_segments = ['/', 'Users', 'xxx', 'Desktop', 'xxx']
os.path.isfile(os.path.join(*path_segments))
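This should return True in your case. (On Windows, a path starting with / is resolved against the current drive, so this assumes the current drive is C:.)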
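The same idea can be sketched with the standard pathlib module (an alternative to os.path.join, not what the answer used). PureWindowsPath applies Windows path rules on any OS, which makes it easy to check that the three spellings are equivalent; the "xxx" segments are the question's placeholders:

```python
from pathlib import PureWindowsPath

escaped = "C:\\Users\\xxx\\Desktop\\xxx"   # doubled backslashes
raw = r"C:\Users\xxx\Desktop\xxx"          # raw string
forward = "C:/Users/xxx/Desktop/xxx"       # forward slashes

# The escaped and raw spellings produce the very same string.
print(escaped == raw)                                    # True

# Under Windows path rules, / and \ are interchangeable separators.
print(PureWindowsPath(forward) == PureWindowsPath(raw))  # True

# Building from segments avoids typing separators altogether.
joined = PureWindowsPath("C:/", "Users", "xxx", "Desktop", "xxx")
print(joined)                                            # C:\Users\xxx\Desktop\xxx
```

On a real machine, os.path.isfile(Path.home() / "Desktop" / "xxx") avoids hard-coding the user folder; os.path.isfile accepts path objects since Python 3.6.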
Because backslashes are escapes in Python. Specifically, you get a Unicode error because the \U escape means "Unicode character here; next 8 characters are a 32-bit hexadecimal codepoint."
If you use a raw string, which treats backslashes as themselves, it should work:
os.path.isfile(r"C:\Users\xxx\Desktop\xxx")
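The error can be reproduced without running anything by handing the literal to compile, which shows that it is raised while the source is parsed, before os.path.isfile is ever called. A small illustrative sketch, assuming Python 3:

```python
# The question's literal, compiled but not executed: "\U" in "\Users"
# starts an 8-hex-digit escape that "sers..." cannot complete.
try:
    compile(r'"C:\Users\xxx\Desktop\xxx"', "<demo>", "eval")
except SyntaxError as exc:
    print("SyntaxError:", exc.msg)

# The raw-string version compiles, and all four backslashes survive.
path = r"C:\Users\xxx\Desktop\xxx"
print(path.count("\\"))   # 4
```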
You get the problem with the two-character sequences \x and \U, which are Python escape codes. They tell Python to interpret the characters that follow in a special way: \x expects two hex digits giving a character code, and \U expects eight hex digits giving a Unicode code point. You can get around it by using a "raw" string:
os.path.isfile(r"C:\Users\xxx\Desktop\xxx")
or by using forward slashes (as, IIRC, Windows will accept either one).

how to convert u'\uf04a' to unicode in python [duplicate]

This question already has answers here:
Python unicode codepoint to unicode character
(4 answers)
Closed 1 year ago.
I am trying to decode u'\uf04a' in Python so that I can print it without errors. In other words, I need to convert stupid Microsoft Windows-1252 characters to actual Unicode.
The HTML containing these unusual characters comes from here: http://members.lovingfromadistance.com/showthread.php?12338-HAVING-SECOND-THOUGHTS
You can read about u'\uf04a' and u'\uf04c' here: http://www.fileformat.info/info/unicode/char/f04a/index.htm
One example looks like this:
"Oh god please some advice ":
Out[408]: u'Oh god please some advice \uf04c'
Given a test string like this one:
thread = u'who are you \uf04a Why you are so harsh to her \uf04c'
thread.decode('utf8')
print u'\uf04a'
print u'\uf04a'.decode('utf8') # error!!!
'charmap' codec can't encode character u'\uf04a' in position 1526: character maps to undefined
With the help of two Python scripts, I successfully converted u'\x92', but I am still stuck with u'\uf04a'. Any suggestions?
References
https://github.com/AnthonyBRoberts/NNS/blob/master/tools/killgremlins.py
Handling non-standard American English Characters and Symbols in a CSV, using Python
Solution:
Following the comments below, I replaced these characters with a question mark ('?'):
thread = u'who are you \uf04a Why you are so harsh to her \uf04c'
thread = thread.replace(u'\uf04a', '?')
thread = thread.replace(u'\uf04c', '?')
I hope this is helpful to other beginners.
The notation u'\uf04a' denotes the Unicode codepoint U+F04A, which is by definition a private use codepoint. This means that the Unicode standard does not assign any character to it, and never will; instead, it can be used by private agreements.
It is thus meaningless to talk about printing it. If there is a private agreement on using it in some context, then you print it using a font that has a glyph allocated to that codepoint. Different agreements and different fonts may allocate completely different characters and glyphs to the same codepoint.
It is possible that U+F04A is a result of erroneous processing (e.g., wrong conversions) of character data at some earlier phase.
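The private-use claim is easy to confirm with the standard unicodedata module, which reports general category "Co" for such code points. The sketch below (Python 3; replace_private_use is a hypothetical helper name) generalizes the question's two .replace calls to every private-use character:

```python
import unicodedata

# U+F04A and U+F04C fall in the Private Use Area (U+E000..U+F8FF).
for ch in "\uf04a\uf04c":
    print(f"U+{ord(ch):04X}", unicodedata.category(ch))   # category "Co"

def replace_private_use(text: str, placeholder: str = "?") -> str:
    """Replace every private-use code point (category "Co") with placeholder."""
    return "".join(placeholder if unicodedata.category(c) == "Co" else c
                   for c in text)

thread = "who are you \uf04a Why you are so harsh to her \uf04c"
print(replace_private_use(thread))   # who are you ? Why you are so harsh to her ?
```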
u'\uf04a'
already is a Unicode object, which means there's nothing to decode. The only thing you can do with it is encode it, if you're targeting a specific file encoding like UTF-8 (which is not the same as Unicode, but is confused with it all the time).
u'\uf04a'.encode("utf-8")
gives you a string (Python 2) or bytes object (Python 3) which you can then write to a file or a UTF-8 terminal etc.
You won't be able to encode it as a plain Windows string because cp1252 doesn't have that character.
What you can do is convert it to an encoding that doesn't have those offending characters by telling the encoder to replace missing characters by ?:
>>> u'who\uf04a why\uf04c'.encode("ascii", errors="replace")
'who? why?'
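In Python 3 text is always str (the u prefix is optional), but the advice carries over unchanged. An illustrative sketch of the same encode calls:

```python
text = "who\uf04a why\uf04c"   # already text (str): nothing to decode

# UTF-8 can encode every code point, private-use ones included.
print(text.encode("utf-8"))    # b'who\xef\x81\x8a why\xef\x81\x8c'

# cp1252 has no mapping for U+F04A, so strict encoding fails...
try:
    text.encode("cp1252")
except UnicodeEncodeError as exc:
    print("cp1252:", exc.reason)

# ...while errors="replace" substitutes "?" for each unmappable character.
print(text.encode("cp1252", errors="replace"))   # b'who? why?'
```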
