easy way to convert windows paths with variables into python string - python

i tend to copy a lot of paths from a windows file explorer address bar, and they can get pretty long:
C:\Users\avnav\Documents\Desktop\SITES\SERVERS\4. server1\0_SERVER_MAKER
now let's say i want to replace avnav with getpass.getuser().
if i try something like:
project_file = f'C:\Users\{getpass.getuser()}\Documents\Desktop\SITES\SERVERS\4. server1\0_SERVER_MAKER'
i will get the following error:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
i presume it's because of the single back slash - although would be cool to pinpoint exactly what part causes the error, because i'm not sure.
in any case, i can also do something like:
project_file = r'C:\Users\!!\Documents\Desktop\SITES\SERVERS\4. server1\0_SERVER_MAKER'.replace('!!', getpass.getuser())
but this becomes annoying once i have more variables.
so is there a solution to this? or do i have to find and replace slashes or something?

Error
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
Read the message carefully: the error is exactly what it says. There's a U character right after a \ (in \Users).
In a normal Python string literal, \U starts a Unicode character escape. The interpreter expects exactly eight hex digits after \U, but there are none, so it raises the error.
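One way to pinpoint the offending part is to compile the source text of the literal and look at the message. This is only a minimal sketch; the helper name literal_error is made up for illustration.

```python
def literal_error(source):
    """Return the SyntaxError message for a string-literal source, or None if it compiles."""
    try:
        compile(source, "<demo>", "eval")
    except SyntaxError as exc:
        return str(exc)
    return None

# The raw strings below hold the *source text* of two literals.
bad = r"'C:\Users\avnav'"    # \U is followed by "sers", not eight hex digits
good = r"r'C:\Users\avnav'"  # same text with an r prefix: no escapes are processed

print(literal_error(bad))
print(literal_error(good))
```

The "position 2-3" in the message points at the \U in C:\Users, counting from the start of the string body.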
Replace
so is there a solution to this? or do i have to find and replace slashes or something?
That is a possible solution: after replacing every \ with /, everything should work. But it's not the best solution.
Raw string
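Have you ever heard of character escaping? In a normal string literal, \ starts an escape sequence, so \U in \Users is read as the start of a Unicode escape.
To prevent this, write the path as a raw string. The r prefix can be combined with f: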
rf'C:\Users\{getpass.getuser()}\Documents\Desktop\SITES\SERVERS\4. server1\0_SERVER_MAKER'
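With several variables, the same raw f-string approach might look like the sketch below. The function name project_path and the server parameter are hypothetical, added only for illustration.

```python
def project_path(user, server):
    # r disables backslash escapes; f still fills in the {...} fields.
    return rf'C:\Users\{user}\Documents\Desktop\SITES\SERVERS\{server}\0_SERVER_MAKER'

# In real use, pass getpass.getuser() as the user argument.
print(project_path('avnav', '4. server1'))
```

One caveat: a raw string literal cannot end in a single backslash, so a trailing separator has to be added some other way.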

Related

Text issues with Abspath

I've been using
CURRENT_DIR=os.path.dirname(os.path.abspath(__file__)) and it has been working well.
However, for whatever reason, when I have any path with a \N such as
tk.PhotoImage(f'{CURRENT_DIR}\Library\Images\NA_notes.png')
or
wb.open_new(f'{CURRENT_DIR}\Library\PD\Vender\No Hold Open\DOOR.pdf')
I get:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 15-16: malformed \N character escape
The error goes away if I just change the \N to \n
The problem is I have hundreds of these instances within a ton of folders, so is there any way to fix this without having to change every single folder name?
Try escaping the backslashes in your path, since \ introduces escape sequences (\N starts a named-character escape). You can do this by replacing each \ with \\. Something like this:
tk.PhotoImage(f'{CURRENT_DIR}\\Library\\Images\\NA_notes.png')
wb.open_new(f'{CURRENT_DIR}\\Library\\PD\\Vender\\No Hold Open\\DOOR.pdf')
If you don't want to change the path, simply tell Python it's a raw string:
print(fr'{CURRENT_DIR}\Library\Images\NA_notes.png')
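Another option, not mentioned in the answers above, is to avoid typing backslashes at all and let the standard library join the components. A minimal sketch using pathlib.PureWindowsPath, so the result is the same on any OS; the current_dir value is a made-up placeholder for CURRENT_DIR.

```python
from pathlib import PureWindowsPath

# Placeholder standing in for os.path.dirname(os.path.abspath(__file__)).
current_dir = r'C:\project'

# Each folder name is its own argument, so no \N or \U sequence ever appears.
image = PureWindowsPath(current_dir, 'Library', 'Images', 'NA_notes.png')
pdf = PureWindowsPath(current_dir, 'Library', 'PD', 'Vender', 'No Hold Open', 'DOOR.pdf')

print(image)
print(pdf)
```

On Windows itself, os.path.join(CURRENT_DIR, 'Library', 'Images', 'NA_notes.png') should give the same kind of result without renaming any folders.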

Can't do ASCII with 'u' character in python

I'm trying to do an ascii image in python but gives me this error
File "main.py", line 1
teste = print('''
^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 375-376: truncated \UXXXXXXXX escape
And I think it's because of the U character, why that happened, is any way to solve this?
ASCII image
You've got \U in your string, which is being interpreted as the beginning of a Unicode ordinal escape (it expects it to be followed by 8 hex characters representing a single Unicode ordinal).
You could double the escape, making it \\U, but that would make it harder to see the image in the code itself. The simplest approach is to make it a raw string that ignores all escapes save escapes applied to the quote character, by putting an r immediately before the literal:
teste = print(r'''
Note the r immediately after the (, before the '''.
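A tiny sketch of the raw triple-quoted form; the drawing is made up for illustration and is not the asker's image.

```python
# Without the r prefix, the \U on the first row would be a SyntaxError.
art = r'''
  \U/
  /_\
'''
print(art)
```

Every backslash in the drawing is kept as a literal character.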

How to list Amharic (Unicode) code points in python 3.6

I want a list containing Amharic alphabet from utf-8. The character ranges are from U+1200 to U+1399. I am using windows 8. I encountered SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-5: truncated \UXXXXXXXX escape.
I tried this:
[print(c) for c in u'U1399']
How can i list the characters?
To print the characters from U+1200 to U+1399, I would use a for loop with an int control variable. It's easy enough to convert numbers to characters using chr().
The integer value 0x1200 (i.e. 1200 in hexadecimal) can be converted to the Unicode code point U+1200 like so: chr(0x1200) == '\u1200'.
Similarly for 0x1201, 0x1202, ... 0x1399.
Note that we use .isprintable() to filter out some of the useless entries.
print(' '.join(chr(x) for x in range(0x1200, 0x139A) if chr(x).isprintable()))
or
for x in range(0x1200, 0x139A):
    if chr(x).isprintable():
        print(hex(x), chr(x))
Note that the code samples require Python3.
Your posted code doesn't produce any errors at all:
>>> [print(c) for c in u'U1399']
U
1
3
9
9
[None, None, None, None, None]
It also doesn't have any non-ASCII characters in it.
You probably wanted to use a Unicode backslash escape. And your problem is probably more like this:
>>> u'\U1399'
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-5: truncated \UXXXXXXXX escape
The reason is that—as the error message implies—a \U escape requires 8 hex digits, and you've only provided 4. So:
>>> u'\U00001399'
'᎙'
But there's a different escape sequence, \u (notice the lowercase u), which takes only 4 digits:
>>> u'\u1399'
'᎙'
If you're using Python 2.7, and possibly even with Python 3 on Windows, you may not see that nice output, but instead something with backslash escapes in it. But if you print that string, you will see the right character.
The full details for \U and \u escapes (and other escapes) are documented in String and Bytes literals (make sure to switch to the Python version you're actually using, because the details can be different, especially between 2.x and 3.x), but usually you don't need to know much more than explained above.
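As a quick check of the two escape forms described above, this small sketch compares them with chr(); the variable names are only illustrative.

```python
four_digit = '\u1399'       # \u takes exactly 4 hex digits
eight_digit = '\U00001399'  # \U takes exactly 8 hex digits
from_int = chr(0x1399)      # same code point built from an integer

print(four_digit == eight_digit == from_int)
print(len(four_digit), hex(ord(four_digit)))
```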

How to correctly represent a supplementary unicode char in python3 (3.6.1+) by using \u or \U escape within string

Recently I've been learning Python and have encountered a problem with Unicode escape literals in Python 3.
It seems that like Java, the \u escape is interpreted as UTF-16 code point which Java uses, but here comes the problem:
For example, if I try to put a 3 bytes utf-8 char like "♬" (https://unicode-table.com/en/266C/) or even supplementary unicode char like "𠜎" (https://unicode-table.com/en/2070E/) by the format of \uXXXX or \UXXXXXXXX in a normal string as followed:
print('\u00E2\u99AC') # UTF-8, messy code for sure
print('\U00E299AC') # UTF-8, with 8 bytes \U, (unicode error) for sure
print('\u266C') # UTF-16 BE, music note appears
# from which I suppose \u and \U function the same way they should do in Java
# (may be a little different since they function like macro in Java, and can be used in comments)
# However, while print('\u266C') gives me '♬','\u266C' == '♬' is equal to false
# which is true in Java semantics.
# Further more, print('\UD841DF0E') didn't give me '𠜎' : (unicode error) 'unicodeescape' codec can't decode bytes in position 0-9: illegal Unicode character
# which I suppose it should be, so it appears to me that I may get it wrong
# Here again : print('\uD841\uDF0E') # Error, 'utf-8' codec can't encode characters in position 0-1: surrogates not allowed
print('\xD8\x41\xDF\x0E') # also tried this, messy code
# maybe UTF-16 LE?
print('\u41D8\u0EDF') # messy code
print('\U41D80EDF') # error
So, I could see that python "doesn't support supplementary escape literal", and its behavior is also weird.
Well, I already know that the correct way to decode and encode such characters:
s_decoded = '\\xe2\\x99\\xac'.encode().decode('unicode-escape')\
.encode('latin-1').decode('utf-8')
print(b'\xf0\xa0\x9c\x8e'.decode('utf-8'))
print(b'\xd8\x41\xdf\x0e'.decode('utf-16 be'))
assert s_decoded == '♬'
But still don't get how to do it right using \u & \U escape literal. Hopefully someone could point it out what I'm doing wrong and how it differs from Java's way, thanks!
By the way, my environment is PyCharm win, python 3.6.1, source code is encoded as UTF-8
Python 3.6.3:
>>> print('\u266c') # U+266C
♬
>>> print('\U0002070E') # U+2070E. Python is not Java
𠜎
>>> '\u266c' == '♬'
True
>>> '\U0002070E' == '𠜎'
True
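The session above suggests that \u and \U take a Unicode code point, not UTF-8 bytes or UTF-16 code units. A short sketch that makes the relationship visible by encoding the two characters from the question; the variable names are only illustrative.

```python
note = '\u266c'     # U+266C, fits in 4 hex digits
han = '\U0002070E'  # U+2070E, needs the 8-digit form

# One code point is one character in a Python 3 str, even outside the BMP.
print(len(han), hex(ord(han)))

# The byte sequences from the question appear only after encoding.
print(note.encode('utf-8'))
print(han.encode('utf-8'))
print(han.encode('utf-16-be'))
```

The surrogate pair D841 DF0E is therefore an artifact of the UTF-16 encoding, which is why writing it with \u escapes does not produce the character.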

Unexpected recurrence of Python unicode string ascii codec error

After days and months of desperation I recently found a solution to overcome the infamous UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 18: ordinal not in range(128). It was dealing with multilingual strings pretty well until recently, when I bumped into this error AGAIN!
I tried type(thatstring) and it returned unicode.
So I tried:
thatstring=thatstring.decode('utf-8')
This was handling those multilanguage strings pretty well but it came back now. I also tried
thatstring=thatstring.decode('utf-8','ignore')
No use.
thatstring=thatstring.encode('utf-8','ignore')
bounces with the error
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 48: ordinal not in range(128) faster than its counterpart.
Please help me. Thanks.
You did the right thing by trying type(thatstring), but you didn't draw the right conclusion from the result.
A unicode string has already been decoded, so trying to decode it again will produce an error if it contains non-ascii characters. When you use decode() on a unicode object, you effectively force python to do something like this:
temp = thatstring.encode('ascii') # convert unicode to bytes first
thatstring = temp.decode('utf-8') # now decode bytes back to unicode
Obviously, the first line will blow up as soon as it finds a non-ascii character, which explains why you see a unicode encode error, even though you are trying to decode the string. So the simple answer to your problem is: don't do that!
Instead, whenever your program receives string inputs, and wants to make sure they're converted to unicode, it should do something like this:
if isinstance(thatstring, bytes):
    thatstring = thatstring.decode(encoding)
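Wrapped in a function, that check might look like the sketch below. The name to_text and the utf-8 default are assumptions made for illustration, and the sample is written for Python 3, where text is str and only bytes has decode().

```python
def to_text(value, encoding='utf-8'):
    """Decode bytes to text once, at the input boundary; pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

print(to_text(b'caf\xc3\xa9'))  # bytes are decoded
print(to_text('caf\u00e9'))     # already text, returned as is
```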
