Text issues with Abspath - python

I've been using
CURRENT_DIR=os.path.dirname(os.path.abspath(__file__)) and it has been working well.
However, for whatever reason, when I have any path containing \N, such as
tk.PhotoImage(f'{CURRENT_DIR}\Library\Images\NA_notes.png')
or
wb.open_new(f'{CURRENT_DIR}\Library\PD\Vender\No Hold Open\DOOR.pdf')
I get:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 15-16: malformed \N character escape
It gets fixed if I just change the \N to \n
The problem is I have hundreds of these instances within a ton of folders, so is there any way to fix this without having to change every single folder name?

Try escaping the backslashes in your path, since \ introduces escape sequences in Python string literals. You can do this by replacing each \ with \\, something like this:
tk.PhotoImage(f'{CURRENT_DIR}\\Library\\Images\\NA_notes.png')
wb.open_new(f'{CURRENT_DIR}\\Library\\PD\\Vender\\No Hold Open\\DOOR.pdf')

If you don't want to change the paths, simply tell Python it's a raw string by adding the r prefix:
print(fr'{CURRENT_DIR}\Library\Images\NA_notes.png')
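As a rough sketch of how these options compare, the snippet below builds the same path three ways. The value of CURRENT_DIR is a hypothetical stand-in, and the pathlib variant is an extra alternative that the answers above do not mention.

```python
from pathlib import PureWindowsPath

# Hypothetical stand-in for os.path.dirname(os.path.abspath(__file__)).
CURRENT_DIR = r'C:\Project'

# Option 1: double every backslash in a normal f-string.
escaped = f'{CURRENT_DIR}\\Library\\Images\\NA_notes.png'

# Option 2: raw f-string, so \N is kept as a backslash followed by N.
raw = fr'{CURRENT_DIR}\Library\Images\NA_notes.png'

# Option 3 (assumption, not from the answers): let pathlib add the separators.
joined = str(PureWindowsPath(CURRENT_DIR, 'Library', 'Images', 'NA_notes.png'))

print(escaped == raw == joined)
```

All three produce the same string, so the folder names on disk never need to change; only the way the literal is written does.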

Related

easy way to convert windows paths with variables into python string

i tend to copy a lot of paths from a windows file explorer address bar, and they can get pretty long:
C:\Users\avnav\Documents\Desktop\SITES\SERVERS\4. server1\0_SERVER_MAKER
now let's say i want to replace avnav with getpass.getuser().
if i try something like:
project_file = f'C:\Users\{getpass.getuser()}\Documents\Desktop\SITES\SERVERS\4. server1\0_SERVER_MAKER'
i will get the following error:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
i presume it's because of the single back slash - although would be cool to pinpoint exactly what part causes the error, because i'm not sure.
in any case, i can also do something like:
project_file = r'C:\Users\!!\Documents\Desktop\SITES\SERVERS\4. server1\0_SERVER_MAKER'.replace('!!', getpass.getuser())
but this becomes annoying once i have more variables.
so is there a solution to this? or do i have to find and replace slashes or something?
Error
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
Read the message carefully: there's a U right after a \ (the \U in \Users, at positions 2-3 of the string).
In Python, \U is how you insert a Unicode character into a string by its code point, so the interpreter expects eight hex digits after \U. There are none, so an error is raised.
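To pinpoint which part triggers the error, one option is to hand the offending literal to compile() and look at the message. This is only an illustrative sketch; the source text is built with doubled backslashes so that the example itself compiles.

```python
# Source text of the literal 'C:\Users\avnav', written with escaped
# backslashes so this example file is itself valid.
bad_source = "'C:\\Users\\avnav'"

message = None
try:
    compile(bad_source, '<example>', 'eval')
except SyntaxError as exc:
    # The message names the unicodeescape codec and the truncated \U escape.
    message = str(exc)
    print(message)

# With all eight hex digits present, \U is a valid escape for one character.
smiley = '\U0001F600'
print(len(smiley))
```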
Replace
so is there a solution to this? or do i have to find and replace slashes or something?
That is a possible solution: after replacing every \ with / everything should work, but it's not the best solution.
Raw string
Have you ever heard of character escaping? In this case the \U in \Users is read as the start of an escape sequence.
To prevent this, just write the path as a raw string.
rf'C:\Users\{getpass.getuser()}\Documents\Desktop\SITES\SERVERS\4. server1\0_SERVER_MAKER'
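When more variables are involved, a raw f-string still works, and pathlib is another way to avoid typing separators at all. The sketch below uses hypothetical values for the user and server names (in the question the user would come from getpass.getuser()), and PureWindowsPath is an assumption of mine rather than part of the answer.

```python
from pathlib import PureWindowsPath

# Hypothetical values; in the question, user would be getpass.getuser().
user = 'avnav'
server = '4. server1'

# Raw f-string: backslashes stay literal, {…} fields are still filled in.
raw = rf'C:\Users\{user}\Documents\Desktop\SITES\SERVERS\{server}'

# pathlib: pass the parts and let it insert the Windows separators.
joined = PureWindowsPath('C:/Users', user, 'Documents', 'Desktop',
                         'SITES', 'SERVERS', server)

print(raw)
print(str(joined) == raw)
```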

Reading .dat file through pandas.read_csv( ) gives unicode error

I have a set of .dat files and I'm not sure what type of data they carry (mostly non-video, non-audio content; it should be a mix of integers, text and special characters). I learned that .dat files can be read into Python using pandas read_csv or read_table, so I tried the below
DATA = pd.read_csv(r'file_path\Data.dat', header=None)
below is the error I get
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes
in position 2-3: truncated \UXXXXXXXX escape
I've tried the solutions listed from around including Stack overflow and blogs, and tried the below options too, but none of them worked
Used Double quotes for filepath instead of single quote (pd.read_csv(r"filepath"))
Use double backslash instead of single backslash
Use forward slash
Use double backslash only at the beginning of the filepath, something like C:\\User\folder....
Tried a few encodings like utf-8, ascii, latin-1 etc., and the error for all the above is "EmptyDataError: No columns to parse from file"
Tried without r in the read_csv argument. Didn't work
Tried sep='\s+', also set skiprows too. No use
One thing to mention is that one of my folder name has numbers apart from text. Does that create this issue by any chance?
Can someone highlight what I need to do...thanks in advance

Changing file path and need for raw? [duplicate]

import os
cwd = os.getcwd()
print("Current working directory: {0}".format(cwd))
# Print the type of the returned object
print("os.getcwd() returns an object of type: {0}".format(type(cwd)))
os.chdir(r"C:\Users\ghph0\AppData\Local\Programs\Python\Python39\Bootcamp\PDFs")
# Print the current working directory
print("Current working directory: {0}".format(os.getcwd()))
Hi all, I was changing my file directory so I could access specific files and was then greeted with this error:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
From there I did some research and was told that converting the string to raw would fix the problem. My question is: why do I need to convert it to raw, what does that do, and why does it turn the file path red (not really important, but I've never seen this before)? Picture below:
https://i.stack.imgur.com/4oHlC.png
Many thanks to anyone that can help.
Backslashes in strings have a specific meaning in Python and are translated by the interpreter. You have surely already encountered "\n". Despite taking two characters to type, that is actually a one-character string meaning "newline". Any backslash that starts a recognized escape sequence is interpreted that way. In your particular case, you used "\U", which is how Python lets you type long Unicode code points as eight hex digits. "\U0001F600", for example, is the grinning face emoji.
Because regular expressions often need to use backslashes for other uses, Python introduced the "raw" string. In a raw string, backslashes are not interpreted. So, r"\n" is a two-character string containing a backslash and an "n". This is NOT a newline.
Windows paths often use backslashes, so raw strings are convenient there. As it turns out, almost every Windows file API will also accept forward slashes, so you can usually use those as well.
As for the colors, that probably means your editor doesn't know how to interpret raw strings.
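A small sketch of the difference described above. The comparison through PureWindowsPath is my own addition, used only to show that the backslash and forward-slash spellings name the same Windows path.

```python
from pathlib import PureWindowsPath

newline = "\n"   # one character: a line feed
raw = r"\n"      # two characters: a backslash and the letter n

print(len(newline), len(raw))

# A raw string is just another way to write the same text as doubled backslashes.
same_text = r"C:\Users\ghph0" == "C:\\Users\\ghph0"

# Forward slashes describe the same Windows path.
same_path = PureWindowsPath(r"C:\Users\ghph0") == PureWindowsPath("C:/Users/ghph0")

print(same_text, same_path)
```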

Why does "SyntaxError: (unicode error)" occur when raw string is in triple quotes?

Whenever I put triple quotes around a raw string, the following error occurs:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 28-29: malformed \N character escape
I was wondering why this is the case and if there is any way to avoid it.
I have tried moving the triple quotes to align with various parts of the code but nothing has worked so far.
This runs without error:
final_dir = (r'C:\Documents\Newsletters')
'''
path_list = []
for file in os.listdir(final_dir):
    path = os.path.join(final_dir, file)
    path_list.append(path)
'''
But then this creates an error:
'''
final_dir = (r'C:\Documents\Newsletters')
path_list = []
for file in os.listdir(final_dir):
    path = os.path.join(final_dir, file)
    path_list.append(path)
'''
In a string literal like '\N', \N has a special meaning:
\N{name} Character named name in the Unicode database
from String and Bytes literals - Python 3 documentation
For example, '\N{tilde}' becomes '~'.
Since you're quoting code, you probably want to use a raw string literal:
r'\N'
For example:
>>> r"""r'C:\Documents\Newsletters'"""
"r'C:\\Documents\\Newsletters'"
Or you could escape the backslash:
'\\N'
The error doesn't occur for \D because it doesn't have a special meaning.
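The behaviour can be reproduced without touching a real file by compiling the two variants as source text. This is only a sketch: the helper name compiles is hypothetical, and the source strings use doubled backslashes so that the example itself is unaffected.

```python
# Source text of a triple-quoted block that "comments out" code containing
# r'C:\Documents\Newsletters'. The inner r prefix has no effect because the
# whole thing is one ordinary string literal.
plain = "'''\nfinal_dir = (r'C:\\Documents\\Newsletters')\n'''"

# The same block with an r prefix on the outer triple-quoted string.
raw = "r" + plain


def compiles(source):
    """Return True if the source text compiles, False on SyntaxError."""
    try:
        compile(source, '<example>', 'exec')
        return True
    except SyntaxError:
        return False


plain_ok = compiles(plain)
raw_ok = compiles(raw)
print(plain_ok, raw_ok)
```

For disabling code temporarily, prefixing each line with # avoids the problem entirely, since comments are not scanned for escape sequences.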
Thanks to deceze for practically writing this answer in the comments

Reading Unicode Files - Python3.2

I'm trying to read some files using Python 3.2; some of the files may contain Unicode while others do not.
When I try:
file = open(item_path + item, encoding="utf-8")
for line in file:
    print(repr(line))
I get the error:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 13-16: ordinal not in range(128)
I am following the documentation here: http://docs.python.org/release/3.0.1/howto/unicode.html
Why would Python be trying to encode to ascii at any point in this code?
The problem is that repr(line) in Python 3 also returns a Unicode string. It does not convert characters above code point 127 to ASCII escape sequences.
Use ascii(line) instead if you want to see the escape sequences.
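A minimal illustration of the difference, using a made-up sample line rather than the asker's data:

```python
# Hypothetical sample line containing one non-ASCII character (é).
line = 'caf\u00e9\n'

as_repr = repr(line)    # keeps é as-is, so it may not print on an ASCII terminal
as_ascii = ascii(line)  # escapes é, so the result is pure ASCII

# Only the ASCII-safe form is printed here, so this runs under any locale.
print(as_ascii)
```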
Actually, repr(line) is expected to return a string that, if placed in source code, would produce an object with the same value. So the Python 3 behaviour is fine: there is no need for ASCII escape sequences in source files to express a string containing non-ASCII characters. It is quite natural to use UTF-8 or some other Unicode encoding these days. Python 2, by contrast, produced escape sequences for such characters.
What's your output encoding? If you remove the call to print(), does it start working?
I suspect you've got a non-UTF-8 locale, so Python is trying to encode repr(line) as ASCII as part of printing it.
To resolve the issue, you must either encode the string and print the byte array, or set your default encoding to something that can handle your strings (UTF-8 being the obvious choice).
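The first option, encoding the string yourself and printing the bytes, might look like the sketch below. The sample text is made up, and the backslashreplace variant is an extra I am adding for the case where the output really must stay ASCII.

```python
# Hypothetical sample text with one non-ASCII character (é).
line = 'caf\u00e9'

# Encode explicitly; printing a bytes object only emits ASCII characters.
encoded = line.encode('utf-8')
print(encoded)

# Alternative: force ASCII output and escape anything that does not fit.
safe = line.encode('ascii', errors='backslashreplace')
print(safe)
```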
