I have a small script in ArcGIS that creates a hyperlink.
My code:
def Befahrung(value1, value2):
    if value1 is '':
        return ''
    else:
        return "G:\\Example\\" + str(value1) + "\\File_" + str(value2) + ".pdf"
The error (only when !Bezeichnun! contains a special character):
ERROR 000539: Error running expression: Befahrung(u" ",u"1155Mönch1")
Traceback (most recent call last):
File "<expression>", line 1 in <module>
File "<string>", line 5 in Befahrung
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 5: ordinal not in range(128)
!Bezeichnun! and !Auftrag! are both strings. It works very well until !Bezeichnun! contains a special character. I can't change the characters, I need to save them.
What do I have to change?
In Befahrung, you convert a Unicode string to a byte string, which in Python 2 uses the ASCII codec by default:
str(value1)
str(value2)
This cannot work if value1 or value2 contains non-ASCII characters. Instead, use
unicode(value1)
or, better, use string formatting with a Unicode template:
return u"G:\\Example\\{}\\File_{}.pdf".format(value1, value2)
(works in Python 2.7 and above)
I recommend reading the Python Unicode HOWTO. The error can be distilled to
>>> str(u"1155Mönch1")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 5: ordinal not in range(128)
If you know what character encoding you need (e.g., UTF-8), you can encode it like
value1.encode('utf-8')
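A minimal sketch of the function with the formatting fix applied, assuming the field values arrive as Unicode strings (as the `u"..."` arguments in the error message suggest). The sample values are made up for illustration, and writing the empty check as `not value1` instead of an identity comparison is my own choice, not something the answer above prescribes.

```python
# -*- coding: utf-8 -*-
def Befahrung(value1, value2):
    # Treat an empty field as "no link".
    if not value1:
        return u""
    # A unicode template keeps non-ASCII characters intact;
    # no implicit conversion to ASCII takes place.
    return u"G:\\Example\\{}\\File_{}.pdf".format(value1, value2)

# Hypothetical sample values, not taken from the real feature class.
link = Befahrung(u"Auftrag7", u"1155M\xf6nch1")

# Encode only at the boundary where bytes are actually required.
link_bytes = link.encode("utf-8")
```

The returned value stays Unicode throughout, so the umlaut survives until something explicitly asks for bytes.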
Related
I have this 😭👏🏻 (loudly crying face and clapping hand emoji character) in a string.txt file (encoded in utf-8).
I am trying to print it out into the default python IDLE, in a sentence.
with open('string.txt','r') as f:
    string = f.read()
The code:
>>> string
'\xf0\x9f\x98\xad\xf0\x9f\x91\x8f\xf0\x9f\x8f\xbb'
>>> print string
ð゚リᆳð゚ムマð゚マᄏ
>>> print string.decode('utf-8')
😭👏🏻 # <-- this is the output I want in a middle of sentence
That's the output I want (rectangles). The tricky part is that I want them in the middle of a sentence. So:
>>> print 'The string is: {}!'.format(string.decode('utf-8')) # will get error
Traceback (most recent call last):
File "<pyshell#81>", line 1, in <module>
print 'The string is: {}!'.format(string.decode('utf-8'))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)
Got an error. But if I don't decode it, it works:
>>> print 'The string is: {}!'.format(string)
The string is: ð゚リᆳð゚ムマð゚マᄏ!
It did not raise any error, but I don't want this output. I want the rectangles.
How should I solve this issue so it will behave like this:
>>> print 'The string is: {}!'.format(magical_string)
The string is: 😭👏🏻!
Preferred to not use any 3rd party library.
EDIT:
My Operating System: Windows 7 (preferred solution for all Windows 7-10)
Python: 2.7
I think it's a setting of your IDE, and not really a python issue.
When I save the first line of your question into a txt file and read it:
Copied from terminal:
>>> open('test.txt').read()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\joost\Desktop\pythontests\venv\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 19: character maps to <undefined>
>>> open('test.txt', encoding='utf-8').read()
'I have this 😭👏🏻 (loudly crying\n'
>>>
Perhaps specify your encoding when opening the file?
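One way to apply that suggestion on Python 2.7 is sketched below, assuming the file really is UTF-8. `io.open` decodes while reading, and a `u'...'` format string keeps the result as text, so no implicit ASCII conversion is triggered by `format`. The file is created in a temporary directory only to keep the sketch self-contained. Whether IDLE can actually render the emoji still depends on its font and Tk build.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Stand-in for string.txt, holding the UTF-8 bytes shown in the question.
path = os.path.join(tempfile.mkdtemp(), "string.txt")
with open(path, "wb") as f:
    f.write(b"\xf0\x9f\x98\xad\xf0\x9f\x91\x8f\xf0\x9f\x8f\xbb")

# io.open decodes while reading, so `text` is Unicode text, not raw bytes.
with io.open(path, "r", encoding="utf-8") as f:
    text = f.read()

# Unicode template plus Unicode argument: nothing is pushed through ASCII.
sentence = u"The string is: {}!".format(text)
```

The error in the question comes from mixing a byte-string template (`'The string is: {}!'`) with a Unicode argument; making both sides Unicode avoids it.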
I need to print some unicode characters in Python 2.7.
Why does the following code give me an error?
print(u'\u2601')
Output
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2601' in position 0: ordinal not in range(128)
I'm getting errors such as
UnicodeEncodeError('ascii', u'\x01\xff \xfeJ a z z', 1, 2, 'ordinal not in range(128)')
I'm also getting sequences such as
u'\x17\x01\xff \xfeA r t B l a k e y'
I recognize \x01\xff\xfe as a BOM, but how do I transform these into the obvious output (Jazz and Art Blakey)?
These are coming from a program that reads music file tags.
I've tried various encodings, such as s.encode('utf8'), and various decodes followed by encodes, without success.
As requested:
from hsaudiotag import auto
inf = 'test.mp3'
song = auto.File(inf)
print song.album, song.artist, song.title, song.genre
Traceback (most recent call last):
  File "audio2.py", line 4, in <module>
    print song.album, song.artist, song.title, song.genre
  File "C:\program files\python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xfe' in position 4 : character maps to <undefined>
If I change the print statement to
with open('x', 'wb') as f:
    f.write(song.genre)
I get
Traceback (most recent call last):
File "audio2.py", line 6, in <module>
f.write(song.genre)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in position 1: ordinal not in range(128)
For your actual question, you need to write bytes, not characters, to files. Call:
f.write(song.genre.encode('utf-8'))
and you won't get the error. Alternatively, you can use io.open in text mode to get a character stream that does the encoding automatically (binary mode does not accept an encoding argument), i.e.:
import io
with io.open('x', 'w', encoding='utf-8') as f:
    f.write(song.genre)
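Both approaches should put the same bytes on disk. A small sketch that checks this, using a made-up genre value in place of `song.genre` and temporary file names that are not from the question:

```python
import io
import os
import tempfile

genre = u"Jazz \u2013 Bebop"  # hypothetical tag value containing an en dash
tmp = tempfile.mkdtemp()
path_bytes = os.path.join(tmp, "bytes.txt")
path_text = os.path.join(tmp, "text.txt")

# Option 1: encode explicitly and write bytes to a binary-mode file.
with open(path_bytes, "wb") as f:
    f.write(genre.encode("utf-8"))

# Option 2: open in text mode ('w') and let io.open do the encoding.
with io.open(path_text, "w", encoding="utf-8") as f:
    f.write(genre)

# Compare the raw bytes of the two files.
with open(path_bytes, "rb") as f1, open(path_text, "rb") as f2:
    same = f1.read() == f2.read()
```

Here `same` ends up `True`. The second form is usually easier to maintain because the encoding is stated once, where the file is opened.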
Getting Unicode to the Console can be a matter of some difficulty (under Windows in particular)—see PrintFails.
However, as discussed in the comments, what you've got doesn't look like a working tag value. It looks more like a mangled ID3v2 frame data block, which might not be possible to recover. I don't know if this is a bug in your tag reading library or you just have a file with rubbish tags.
I have scripts that print messages through the logging system or, sometimes, with print commands. On the Windows console I get error messages like
Traceback (most recent call last):
File "C:\Python32\lib\logging\__init__.py", line 939, in emit
stream.write(msg)
File "C:\Python32\lib\encodings\cp850.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2019' in position 4537:character maps to <undefined>
Is there a general way to make all encodings in the logging system, print commands, etc. fail-safe (ignore errors)?
The problem is that your terminal/shell (cmd, as you are on Windows) cannot print every Unicode character.
You can encode your strings in a fail-safe way with the errors argument of the str.encode method. For example, you can replace unsupported characters with ? by setting errors='replace'.
>>> s = u'\u2019'
>>> print s
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\encodings\cp850.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2019' in position 0: character maps to <undefined>
>>> print s.encode('cp850', errors='replace')
?
See the documentation for other options.
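For reference, a short sketch of the standard error handlers that `encode` accepts, applied to the same character. `cp850` is used only because it is the code page in the traceback above; the variable names are illustrative.

```python
s = u"\u2019"  # RIGHT SINGLE QUOTATION MARK, which cp850 cannot represent

replaced = s.encode("cp850", "replace")            # unsupported character becomes '?'
ignored = s.encode("cp850", "ignore")              # unsupported character is dropped
xml_ref = s.encode("cp850", "xmlcharrefreplace")   # becomes the reference '&#8217;'
escaped = s.encode("cp850", "backslashreplace")    # becomes a literal backslash escape
```

`replace` and `ignore` lose information, while `xmlcharrefreplace` and `backslashreplace` keep the code point recoverable at the cost of noisier output.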
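Edit: If you want a general solution for logging, you can subclass StreamHandler and sanitize the message text before it is emitted:
import logging
class CustomStreamHandler(logging.StreamHandler):
    def emit(self, record):
        # A LogRecord has no encode(); sanitize its message text instead.
        msg = record.getMessage()
        record.msg = msg.encode('cp850', errors='replace').decode('cp850')
        record.args = ()  # the message is already fully formatted
        logging.StreamHandler.emit(self, record)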
I have a scenario where I call an API and, based on its results, query the database for each record the API returns. The API returns strings, and when I make the database call for the returned items, I get the following error for some elements.
Traceback (most recent call last):
File "TopLevelCategories.py", line 267, in <module>
cursor.execute(categoryQuery, {'title': startCategory});
File "/opt/ts/python/2.7/lib/python2.7/site-packages/MySQLdb/cursors.py", line 158, in execute
query = query % db.literal(args)
File "/opt/ts/python/2.7/lib/python2.7/site-packages/MySQLdb/connections.py", line 265, in literal
return self.escape(o, self.encoders)
File "/opt/ts/python/2.7/lib/python2.7/site-packages/MySQLdb/connections.py", line 203, in unicode_literal
return db.literal(u.encode(unicode_literal.charset))
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2013' in position 3: ordinal not in range(256)
The segment of my code the above error is referring is:
...
for startCategory in value[0]:
    categoryResults = []
    try:
        categoryRow = ""
        baseCategoryTree[startCategory] = []
        #print categoryQuery % {'title': startCategory};
        cursor.execute(categoryQuery, {'title': startCategory}) #unicode issue
        done = False
cont...
After some searching on Google, I tried the following on my command line to understand what's going on:
>>> import sys
>>> u'\u2013'.encode('iso-8859-1')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2013' in position 0: ordinal not in range(256)
>>> u'\u2013'.encode('cp1252')
'\x96'
>>> '\u2013'.encode('cp1252')
'\\u2013'
>>> u'\u2013'.encode('cp1252')
'\x96'
But I am not sure what the solution to this issue would be. I also don't understand the theory behind encode('cp1252'); it would be great to get some explanation of what I tried above.
If you need Latin-1 encoding, you have several options to get rid of the en-dash or other code points above 255 (characters not included in Latin-1):
>>> u = u'hello\u2013world'
>>> u.encode('latin-1', 'replace') # replace it with a question mark
'hello?world'
>>> u.encode('latin-1', 'ignore') # ignore it
'helloworld'
Or do your own custom replacements:
>>> u.replace(u'\u2013', '-').encode('latin-1')
'hello-world'
If you aren't required to output Latin-1, then UTF-8 is a common and preferred choice. It is recommended by the W3C and nicely encodes all Unicode code points:
>>> u.encode('utf-8')
'hello\xe2\x80\x93world'
The Unicode character u'\u2013' is the "en dash". It is contained in the Windows-1252 (cp1252) character set (encoded as 0x96), but not in the Latin-1 (iso-8859-1) character set. Windows-1252 defines some additional characters in the range 0x80 to 0x9F, among them the en dash.
The solution would be for you to choose a different target character set than Latin-1, such as Windows-1252 or UTF-8, or to replace the en dash with a simple "-".
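The difference between the three character sets can be checked directly. The helper name `can_encode` below is made up for this sketch:

```python
def can_encode(text, encoding):
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

en_dash = u"\u2013"

# Map each candidate target encoding to whether it can represent the en dash.
results = {enc: can_encode(en_dash, enc) for enc in ("latin-1", "cp1252", "utf-8")}
```

Only Latin-1 rejects the character. For the database case, this suggests configuring the connection to use UTF-8 instead of Latin-1; how that is done depends on the driver and is not shown here.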
u.encode('utf-8') converts it to bytes, which can then be written to stdout in Python 3 using sys.stdout.buffer.write(bytes).
See also the displayhook entry at
https://docs.python.org/3/library/sys.html