Python encoding - error: 'latin-1' codec can't encode character

In the snippet below, I'm processing a string of text: Déclaration.png
I return the description as unicode:
return self.render_json(request, {..."description": u''.join((instance.description)),..})
In another function, I use the description above as follows:
if document.description:
    file_name = document.description.split(".")
    file_name = "{}.{}.{}".format(
        "_".join(file_name[:-1]),
        str(document.id),
        file_name[-1]
    )
file_name is: [u'De\u0301claration', u'png']
When I try .format() on file_name I get the following error:
error: 'latin-1' codec can't encode character u'\u0301' in position 2: ordinal not in range(256)
Any ideas?

"{}.{}.{}" is a byte string (str), but you are trying to fill it with unicode values. Use a unicode format string instead:
...
file_name = u"{}.{}.{}".format(
...
Also have a look at this nice talk: https://www.youtube.com/watch?v=sgHbC6udIqc
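A minimal sketch of the fix above, wrapped in a hypothetical helper (build_file_name and its arguments are illustrative names, not part of the original code). The u prefix keeps the template and the separators in unicode, so the combining accent U+0301 never has to pass through a byte codec:

```python
def build_file_name(description, document_id):
    """Insert the document id before the extension, staying in unicode throughout."""
    parts = description.split(u".")
    # u"..." makes the template a unicode string, so no implicit encode happens
    return u"{}.{}.{}".format(u"_".join(parts[:-1]), document_id, parts[-1])


# Example input taken from the question: "De" + combining acute accent + "claration.png"
name = build_file_name(u"De\u0301claration.png", 42)
```

Any dots inside the base name are turned into underscores, as in the original snippet, so only the final dot separates the extension.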

Related

'charmap' codec can't decode byte 0x9d in position 4836: character maps to <undefined>

I am trying to figure out this error that pops up from this code:
filename = os.path.join(os.path.expanduser("~"), "data", "blogs",
                        "1005545.male.25.Engineering.Sagittarius.xml")
#filename = open('C:/Users/spenc/data/blogs/1005545.male.25.Engineering.Sagittarius.xml',
#encoding='utf-8', errors = 'ignore')
all_posts = []
allPosts = []
with open(filename) as inf:
    postStart = False
    post = []
    for line in inf:
        line = line.strip()
        if line == "<post>":
            postStart = True
        elif line == "</post>":
            postStart = False
            allPosts.append("\n".join(post))
            post = []
        elif postStart:
            post.append(line)
print(allPosts[0])
print(len(allPosts))
filename.close()
and I get this error:
File "D:\Anaconda-Python\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 4836: character maps to <undefined>
I am just trying to resolve the encoding error so that the script can print the first post and the number of posts, but it keeps getting caught up on the allPosts.append line. I'm not sure whether there is a workaround or a newer way of doing this. I was following a textbook, but I can't continue with the chapter until this has been worked out.
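One possible workaround, sketched under the assumption that the blog file is UTF-8 encoded (the question does not say, although the commented-out line hints at it). Byte 0x9d has no mapping in cp1252, the default on many Windows setups, so passing an explicit encoding to open avoids the failing decoder. The function name read_posts is hypothetical:

```python
import io


def read_posts(path, encoding="utf-8"):
    """Collect the text between <post> and </post> lines of a blog XML file."""
    posts = []
    current = []
    in_post = False
    # An explicit encoding avoids the platform default (cp1252 in the traceback);
    # errors="replace" swaps undecodable bytes for U+FFFD instead of raising.
    with io.open(path, encoding=encoding, errors="replace") as inf:
        for line in inf:
            line = line.strip()
            if line == "<post>":
                in_post = True
            elif line == "</post>":
                in_post = False
                posts.append("\n".join(current))
                current = []
            elif in_post:
                current.append(line)
    return posts


# Usage (path is whatever os.path.join(...) produced in the question):
# posts = read_posts(filename)
# print(len(posts))
```

The with block closes the file on exit, so no separate close call is needed.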

How can I replace 'æ' 'ø' and 'å' in a text without error: 'ascii' codec can't decode byte 0xc3 in position 0

I am making a program which is supposed to open a text file and then replace the letters 'æ', 'ø' and 'å' (Danish text) with 'ae', 'oe' and 'aa'.
I need to open the program and run it through the Mac terminal.
I tried using the replace() function, and tried writing:
# -*- coding: utf-8 -*-
#!/usr/bin/env python
in the beginning of the file.
But I keep getting this error:
File "replace.py", line 20, in replace_nonascii
    word = word.replace('å', 'aa')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
Any suggestions? I have tried googling this for days and have no clue how to fix it.
Here is my program:
filepath = input('insert path for text')
with codecs.open(filepath, 'r', encoding = 'utf8') as file_object:
    filename_cont['text1'] = file_object.read()
def replace_nonascii(word):
    word = word.lower()
    word = word.replace('å', 'aa')
    word = word.replace('æ', 'ae')
    word = word.strip('/-.,?!')
    print(word)
for text in filename_cont:
    newtext = filename_cont[text]
    for word in newtext.split():
        replace_nonascii(word)
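A hedged sketch of one way around the error. In Python 2, codecs.open returns unicode text, while a plain literal such as 'å' is a byte string, so replace() first tries to decode the literal as ASCII and fails on byte 0xc3. Making the search strings unicode avoids that implicit decode. The REPLACEMENTS table is an illustrative name, the 'ø' rule comes from the question text rather than the posted code, and the function returns the word instead of printing it:

```python
# Unicode escapes keep this file pure ASCII:
# \xe6 is ae-ligature, \xf8 is o-with-stroke, \xe5 is a-with-ring.
REPLACEMENTS = (
    (u"\xe6", u"ae"),
    (u"\xf8", u"oe"),
    (u"\xe5", u"aa"),
)


def replace_nonascii(word):
    """Lower-case a unicode word and transliterate the Danish letters."""
    word = word.lower()
    for danish, ascii_form in REPLACEMENTS:
        # Both operands are unicode, so no implicit ASCII decode takes place
        word = word.replace(danish, ascii_form)
    return word.strip(u"/-.,?!")
```

Lower-casing first means the capital forms are handled by the same three rules.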

Unicode-encode issues while sending desktop notification using Python

I am fetching the latest football scores from a website and sending a notification to the desktop (OS X). I am using BeautifulSoup to scrape the data. I had issues with the unicode data, which was generating this error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 2: ordinal not in range(128)
So I inserted this at the beginning, which solved the problem when printing to the terminal:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
But the problem persists when I send notifications to the desktop. I use terminal-notifier to send desktop notifications.
def notify(title, subtitle, message):
    t = '-title {!r}'.format(title)
    s = '-subtitle {!r}'.format(subtitle)
    m = '-message {!r}'.format(message)
    os.system('terminal-notifier {}'.format(' '.join((m, t, s))))
The images below show the output on the terminal versus the desktop notification.
[Image: output on terminal]
[Image: desktop notification]
Also, if I try to replace the comma in the string, I get the error,
new_scorer = str(new_scorer[0].text).replace(",","")
File "live_football_bbc01.py", line 41, in get_score
new_scorer = str(new_scorer[0].text).replace(",","")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 2: ordinal not in range(128)
How do I get the output on the desktop notifications like the one on the terminal? Thanks!
Edit : Snapshot of the desktop notification. (Solved)

You are formatting with !r, which gives you the repr output. Forget the terrible reload logic and either use unicode everywhere:
def notify(title, subtitle, message):
    t = u'-title {}'.format(title)
    s = u'-subtitle {}'.format(subtitle)
    m = u'-message {}'.format(message)
    os.system(u'terminal-notifier {}'.format(u' '.join((m, t, s))))
or encode:
def notify(title, subtitle, message):
    t = '-title {}'.format(title.encode("utf-8"))
    s = '-subtitle {}'.format(subtitle.encode("utf-8"))
    m = '-message {}'.format(message.encode("utf-8"))
    os.system('terminal-notifier {}'.format(' '.join((m, t, s))))
When you call str(new_scorer[0].text).replace(",","") you are implicitly trying to encode to ascii; you need to specify the encoding to use:
In [13]: s1=s2=s3= u'\xfc'
In [14]: str(s1) # tries to encode to ascii
---------------------------------------------------------------------------
UnicodeEncodeError Traceback (most recent call last)
<ipython-input-14-589849bdf059> in <module>()
----> 1 str(s1)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 0: ordinal not in range(128)
In [15]: "{}".format(s1) + "{}".format(s2) + "{}".format(s3) # tries to encode to ascii
---------------------------------------------------------------------------
UnicodeEncodeError Traceback (most recent call last)
<ipython-input-15-7ca3746f9fba> in <module>()
----> 1 "{}".format(s1) + "{}".format(s2) + "{}".format(s3)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 0: ordinal not in range(128)
You can encode straight away:
In [16]: "{}".format(s1.encode("utf-8")) + "{}".format(s2.encode("utf-8")) + "{}".format(s3.encode("utf-8"))
Out[16]: '\xc3\xbc\xc3\xbc\xc3\xbc'
Or use unicode throughout by prepending a u to the format strings, and encode last:
In [17]: out = u"{}".format(s1) + u"{}".format(s2) + u"{}".format(s3)
In [18]: out
Out[18]: u'\xfc\xfc\xfc'
In [19]: out.encode("utf-8")
Out[19]: '\xc3\xbc\xc3\xbc\xc3\xbc'
If you use !r you are always going to get the escaped repr form in the output:
In [30]: print "{}".format(s1.encode("utf-8"))
ü
In [31]: print "{!r}".format(s1).encode("utf-8")
u'\xfc'
You can also pass the args using subprocess:
from subprocess import check_call
def notify(title, subtitle, message):
    check_call(['terminal-notifier', '-title', title.encode("utf-8"),
                '-subtitle', subtitle.encode("utf-8"),
                '-message', message.encode("utf-8")])
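For comparison, a rough sketch of the same subprocess idea under the assumption of Python 3, where str is already Unicode and each argument can be passed as-is. The helper build_notifier_command is a hypothetical name; the terminal-notifier flags are the ones used in the question:

```python
import subprocess


def build_notifier_command(title, subtitle, message):
    """Return the argument list for terminal-notifier; no shell quoting is involved."""
    return [
        "terminal-notifier",
        "-title", title,
        "-subtitle", subtitle,
        "-message", message,
    ]


def notify(title, subtitle, message):
    # check_call raises CalledProcessError if the command exits with a non-zero status
    subprocess.check_call(build_notifier_command(title, subtitle, message))
```

Because the arguments travel as a list rather than one shell string, quotes or commas inside a scorer's name do not need escaping.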

Use `sys.getfilesystemencoding()` to get your encoding.
Encode your string with it, ignoring or replacing errors:
import sys
encoding = sys.getfilesystemencoding()
msg = new_scorer[0].text.replace(",", "")
print(msg.encode(encoding, errors="replace"))
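To illustrate what the error handlers do, here is a small sketch with a made-up sample string (not data from the question). It encodes to ASCII on purpose so that the u-umlaut cannot be represented:

```python
text = u"M\xfcller"  # "M", u-umlaut, "ller"

strict_utf8 = text.encode("utf-8")                 # every character is representable
replaced = text.encode("ascii", errors="replace")  # unencodable characters become "?"
ignored = text.encode("ascii", errors="ignore")    # unencodable characters are dropped
```

With the default errors="strict", the ASCII calls would raise UnicodeEncodeError instead.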

python: batch string en- / decoding

I have the following problem:
I wrote a Python script that needs input parameters to run, but if the parameters include one of our German umlauts such as ä, ü, ö or ß, the script stops with the following error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 8: ordinal not in range(128)
If I start the script from a batch file, the umlauts are replaced with random characters such as ? or some other variation of the ö.
Please help, thanks :)
Part of the code:
...
if batch_exe:
    try:
        aIndex = sys.argv.index("-a")
        buchungsart_regEx = sys.argv[aIndex+1]
    except:
        buchungsart_regEx = ""
else:
    ...
select_stmt = select_stmt + " AND REGEXP_LIKE (BUCHUNGSART, " + "'" + buchungsart_regEx + "')"
...
db_list = sde_conn.execute(select_stmt)
...
The command-line input is something like:
python C:\...\Script.py -i ..... -a äöüß

Check this answer: https://stackoverflow.com/a/846931/1686094
You can use the sys.argv = win32_unicode_argv() approach from that answer.
You can then encode the values in sys.argv as UTF-8 for later use.

You could try declaring the source encoding at the top of your script:
# -*- coding: utf-8 -*-

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9'

I use Python to get JSON data from the Bing API:
accountKeyEnc = base64.b64encode(accountKey + ':' + accountKey)
headers = {'Authorization': 'Basic ' + accountKeyEnc}
req = urllib2.Request(bingUrl, headers = headers)
response = urllib2.urlopen(req)
content = response.read()
data = json.loads(content)
for i in range(0,6):
    print data["d"]["results"][i]["Description"]
But I get this error:
    print data["d"]["results"][0]["Description"]
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 11: ordinal not in range(128)

Your problem is that you are reading Unicode from the Bing API and then printing it without explicitly encoding it, so Python falls back to ASCII. There is no good mapping between the two for characters such as u'\xe9'. Prefix all of your constant strings with u so that they are treated as Unicode strings, and see if that helps.
