Unicode-encode issues while sending a desktop notification using Python

I am fetching the latest football scores from a website and sending a notification to the desktop (OS X). I am using BeautifulSoup to scrape the data. I had issues with the Unicode data, which was generating this error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 2: ordinal not in range(128)
So I inserted this at the beginning of the script, which solved the problem when printing to the terminal:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
But the problem persists when I send notifications to the desktop. I use terminal-notifier to send desktop notifications:
def notify(title, subtitle, message):
    t = '-title {!r}'.format(title)
    s = '-subtitle {!r}'.format(subtitle)
    m = '-message {!r}'.format(message)
    os.system('terminal-notifier {}'.format(' '.join((m, t, s))))
The images below compare the output on the terminal with the desktop notification:
[Image: output on terminal]
[Image: desktop notification]
Also, if I try to replace the comma in the string, I get this error:
new_scorer = str(new_scorer[0].text).replace(",","")
File "live_football_bbc01.py", line 41, in get_score
new_scorer = str(new_scorer[0].text).replace(",","")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 2: ordinal not in range(128)
How do I get the output on the desktop notifications like the one on the terminal? Thanks!
Edit: Solved. [Image: snapshot of the desktop notification]

You are formatting with !r, which gives you the repr output. Forget the terrible reload logic and either use unicode everywhere:
def notify(title, subtitle, message):
    t = u'-title {}'.format(title)
    s = u'-subtitle {}'.format(subtitle)
    m = u'-message {}'.format(message)
    os.system(u'terminal-notifier {}'.format(u' '.join((m, t, s))))
Or encode:
def notify(title, subtitle, message):
    t = '-title {}'.format(title.encode("utf-8"))
    s = '-subtitle {}'.format(subtitle.encode("utf-8"))
    m = '-message {}'.format(message.encode("utf-8"))
    os.system('terminal-notifier {}'.format(' '.join((m, t, s))))
When you call str(new_scorer[0].text).replace(",","") you are implicitly trying to encode to ASCII; you need to specify the encoding to use:
In [13]: s1=s2=s3= u'\xfc'
In [14]: str(s1) # tries to encode to ascii
---------------------------------------------------------------------------
UnicodeEncodeError Traceback (most recent call last)
<ipython-input-14-589849bdf059> in <module>()
----> 1 str(s1)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 0: ordinal not in range(128)
In [15]: "{}".format(s1) + "{}".format(s2) + "{}".format(s3) # tries to encode to ascii
---------------------------------------------------------------------------
UnicodeEncodeError Traceback (most recent call last)
<ipython-input-15-7ca3746f9fba> in <module>()
----> 1 "{}".format(s1) + "{}".format(s2) + "{}".format(s3)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 0: ordinal not in range(128)
You can encode straight away:
In [16]: "{}".format(s1.encode("utf-8")) + "{}".format(s2.encode("utf-8")) + "{}".format(s3.encode("utf-8"))
Out[16]: '\xc3\xbc\xc3\xbc\xc3\xbc'
Or use unicode all the way through, prepending u to the format strings, and encode last:
In [17]: out = u"{}".format(s1) + u"{}".format(s2) + u"{}".format(s3)
In [18]: out
Out[18]: u'\xfc\xfc\xfc'
In [19]: out.encode("utf-8")
Out[19]: '\xc3\xbc\xc3\xbc\xc3\xbc'
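For comparison, a minimal Python 3 sketch of the same "text everywhere, encode last" idea (this is not part of the original Python 2 answer). In Python 3 every str is Unicode, so str.format never goes through an implicit ASCII step, and .encode("utf-8") is only needed at the point where some API requires bytes.

```python
# Python 3: str is already Unicode, so there is no implicit ASCII encode step.
s1 = s2 = s3 = "\xfc"  # the character 'ü'

# Build the whole message as text first...
out = "{}".format(s1) + "{}".format(s2) + "{}".format(s3)

# ...and encode exactly once, at the boundary where bytes are required.
encoded = out.encode("utf-8")
print(encoded)  # b'\xc3\xbc\xc3\xbc\xc3\xbc'
```

The encoded result matches Out[19] above; the difference is that the intermediate formatting can no longer fail.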
If you use !r, you are always going to see the escaped repr, not the characters themselves, in the output:
In [30]: print "{}".format(s1.encode("utf-8"))
ü
In [31]: print "{!r}".format(s1).encode("utf-8")
u'\xfc'
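The same pitfall carries over to Python 3 in a slightly different form, shown in this small sketch: !r no longer escapes printable non-ASCII characters, but it still wraps the value in quotes, so a title formatted with !r would still show stray quote characters in the notification. The !a conversion reproduces the Python 2 style escaping.

```python
s1 = "\xfc"  # 'ü'

plain = "{}".format(s1)      # the text itself
quoted = "{!r}".format(s1)   # repr(): keeps printable non-ASCII in Python 3, but adds quotes
escaped = "{!a}".format(s1)  # ascii(): escapes non-ASCII, like Python 2's repr did
```

For a notification body, plain formatting ("{}") is the one you want.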
You can also pass the args using subprocess:
from subprocess import check_call

def notify(title, subtitle, message):
    check_call(['terminal-notifier',
                '-title', title.encode("utf-8"),
                '-subtitle', subtitle.encode("utf-8"),
                '-message', message.encode("utf-8")])
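A Python 3 version of the subprocess approach might look like the sketch below. The helper name build_notifier_args is hypothetical and only exists so the argument list can be inspected without terminal-notifier installed; the -title, -subtitle and -message flags are the ones used above. Passing a list (rather than one shell string to os.system) means each value reaches the program as a single argument, so spaces or apostrophes in a scorer's name cannot split or break the command.

```python
import subprocess
from typing import List


def build_notifier_args(title: str, subtitle: str, message: str) -> List[str]:
    # Each flag and its value is a separate argv element: no shell, no quoting.
    return ["terminal-notifier",
            "-title", title,
            "-subtitle", subtitle,
            "-message", message]


def notify(title: str, subtitle: str, message: str) -> None:
    # Python 3 converts str arguments with the filesystem encoding
    # (UTF-8 on macOS), so no manual .encode() is needed here.
    subprocess.check_call(build_notifier_args(title, subtitle, message))
```

Usage, assuming terminal-notifier is on the PATH: notify("Bayern München", "1-0", "Müller 23'").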

Use sys.getfilesystemencoding() to get your encoding, then encode your string with it, either ignoring or replacing errors:
import sys
encoding = sys.getfilesystemencoding()
msg = new_scorer[0].text.replace(",", "")
print(msg.encode(encoding, errors="replace"))
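A small Python 3 sketch of what the errors argument does, wrapped in a hypothetical helper to_console_bytes. It defaults to sys.getfilesystemencoding() as the answer suggests (that function reports the encoding used for file names; sys.stdout.encoding is the other common choice) and lets you pass an explicit encoding to see the effect.

```python
import sys
from typing import Optional


def to_console_bytes(text: str, encoding: Optional[str] = None,
                     errors: str = "replace") -> bytes:
    """Hypothetical helper: drop commas, then encode without raising."""
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    # errors="replace" turns unencodable characters into '?';
    # errors="ignore" silently drops them.
    return text.replace(",", "").encode(encoding, errors=errors)


print(to_console_bytes("M\xfcller, 23'", "ascii"))  # b"M?ller 23'"
```

Note that both "replace" and "ignore" lose information; they avoid the exception but will not make ü appear in an ASCII-only output.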

Related

Python: converting file-name encoding from iso-8859-5 to utf-8

I have about 3500 files whose names (and contents, too) are encoded in 'iso-8859-5'.
Here's how they look in the Linux console and in 7-Zip:
[Screenshots: Linux console and 7-Zip]
I'm trying to write a script that converts them to 'UTF-8':
# -*- coding: utf-8 -*-
import os

# Example
# how it should look
# iso-8859-5 ==> utf-8
# НјБ_ФШРУ_Г99 ==> ЭМС_диаг_У99

path = r"C://Users//Kamel//Desktop//работа//macros"
obj = os.scandir(path)
for entry in obj:
    if entry.is_dir() or entry.is_file():
        command = entry.name
        print(command, end="\t\t")
        file_name = command.encode('iso-8859-5').decode('UTF-8')
        print(command)
I get this error:
C:\Python\Python310\python.exe D:/PycharmProjects/pythonProject3/ansi_to_utf.py
Traceback (most recent call last):
File "D:\PycharmProjects\pythonProject3\ansi_to_utf.py", line 15, in <module>
file_name = command.encode('iso-8859-5').decode('UTF-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb2 in position 11: invalid start byte
BE_BEF BE_BEF
BE_BEF_IMP_0 BE_BEF_IMP_0
BE_BEF_IMP_1 BE_BEF_IMP_1
BE_BEF_IMP_6 BE_BEF_IMP_6
BE_BEF_IMP_7 BE_BEF_IMP_7
BE_BEF_IMP_8 BE_BEF_IMP_8
BE_BEF_IMP_K BE_BEF_IMP_K
BE_BEF_IMP_T BE_BEF_IMP_T
BE_BEF_IMP_В
Process finished with exit code 1
A mojibake case. Your example НјБ_ФШРУ_Г99 ==> ЭМС_диаг_У99 could be accomplished as:
'НјБ_ФШРУ_Г99'.encode('cp1251').decode('iso-8859-5')
# 'ЭМС_диаг_У99'
or (alternatively) as
'НјБ_ФШРУ_Г99'.encode('ptcp154').decode('iso-8859-5')
# 'ЭМС_диаг_У99'
Your failing example (… can't decode byte 0xb2 in position 11):
'BE_BEF_IMP_В'.encode('iso-8859-5')
# b'BE_BEF_IMP_\xb2'
is solved using the same mechanism:
'BE_BEF_IMP_В'.encode('cp1251').decode('iso-8859-5')
# 'BE_BEF_IMP_Т'
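Applied to the whole directory, the answer's encode('cp1251').decode('iso-8859-5') pair could be wrapped as in the sketch below. It assumes every name went through the same mix-up (iso-8859-5 bytes shown as cp1251); repair_name and rename_all are hypothetical helpers. Names containing characters that cp1251 cannot encode are skipped instead of crashing, and rename_all only reports the planned renames unless dry_run=False.

```python
import os
from typing import List, Optional, Tuple


def repair_name(name: str) -> Optional[str]:
    """Undo the cp1251 / iso-8859-5 mojibake; None if it can't be reversed."""
    try:
        return name.encode("cp1251").decode("iso-8859-5")
    except UnicodeEncodeError:
        return None  # name contains characters outside cp1251


def rename_all(path: str, dry_run: bool = True) -> List[Tuple[str, str]]:
    with os.scandir(path) as it:
        entries = list(it)  # snapshot before renaming anything
    planned = []
    for entry in entries:
        fixed = repair_name(entry.name)
        if fixed is None or fixed == entry.name:
            continue  # unrepairable, or plain ASCII that needs no change
        planned.append((entry.name, fixed))
        if not dry_run:
            os.rename(entry.path, os.path.join(path, fixed))
    return planned
```

Running it once with the default dry_run=True and checking the returned pairs is a cheap safeguard before touching 3500 files.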

How can I replace 'æ', 'ø' and 'å' in a text without error: 'ascii' codec can't decode byte 0xc3 in position 0

I am making a program that is supposed to open a text file, then replace the letters 'æ', 'ø' and 'å' (Danish text) with 'ae', 'oe' and 'aa'.
I need to run the program from the Mac terminal.
I tried using the replace() function, and tried writing:
# -*- coding: utf-8 -*-
#!/usr/bin/env python
at the beginning of the file.
But I keep getting this error:
File "replace.py", line 20, in replace_nonascii
word = word.replace('å', 'aa')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
Any suggestions? I have been googling this for days and have no clue how to fix it.
Here is my program:
import codecs

filename_cont = {}
filepath = input('insert path for text')
with codecs.open(filepath, 'r', encoding = 'utf8') as file_object:
    filename_cont['text1'] = file_object.read()

def replace_nonascii(word):
    word = word.lower()
    word = word.replace('å', 'aa')
    word = word.replace('æ', 'ae')
    word = word.strip('/-.,?!')
    print(word)

for text in filename_cont:
    newtext = filename_cont[text]
    for word in newtext.split():
        replace_nonascii(word)
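The traceback comes from mixing types in Python 2: codecs.open returns unicode, while 'å' in the source is a UTF-8 byte string, so word.replace('å', 'aa') tries to decode '\xc3\xa5' as ASCII (in Python 2, writing u'å' avoids that). A minimal Python 3 sketch, where both sides are str, could look like this; DANISH, replace_danish and convert_file are hypothetical names, and the uppercase mappings are an added assumption.

```python
# Python 3: text read with an explicit encoding is str, and so are the literals.
DANISH = str.maketrans({"æ": "ae", "ø": "oe", "å": "aa",
                        "Æ": "Ae", "Ø": "Oe", "Å": "Aa"})


def replace_danish(text: str) -> str:
    # translate() handles all six letters in a single pass
    return text.translate(DANISH)


def convert_file(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return replace_danish(f.read())
```

Usage: replace_danish("Blåbær på Øen") returns "Blaabaer paa Oeen".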

Python encoding - error: 'latin-1' codec can't encode character

In my snippet below, I'm processing the text Déclaration.png.
I return the description as unicode:
return self.render_json(request, {..."description": u''.join((instance.description)),..})
In another function, I use the description above as follows:
if document.description:
    file_name = document.description.split(".")
    file_name = "{}.{}.{}".format(
        "_".join(file_name[:-1]),
        str(document.id),
        file_name[-1]
    )
file_name is: [u'De\u0301claration', u'png']
When I try .format() on file_name I get the following error:
error: 'latin-1' codec can't encode character u'\u0301' in position 2: ordinal not in range(256)
Any ideas?
"{}.{}.{}" is a byte string, but you are trying to fill it with unicode. Use
...
file_name = u"{}.{}.{}".format(
...
instead.
Also have a look at this nice talk: https://www.youtube.com/watch?v=sgHbC6udIqc
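The character in the message, u'\u0301', is a combining acute accent: the text stores "e" plus a separate accent. As an alternative technique not mentioned in the answer, Unicode NFC normalization composes that pair into the single code point U+00E9 ("é"), which latin-1 can encode. A Python 3 sketch, where build_file_name and doc_id are hypothetical stand-ins for the question's code and document.id:

```python
import unicodedata


def build_file_name(description: str, doc_id: int) -> str:
    """Same naming scheme as the question; doc_id stands in for document.id."""
    parts = description.split(".")
    name = "{}.{}.{}".format("_".join(parts[:-1]), doc_id, parts[-1])
    # 'e' + U+0301 has no latin-1 byte; NFC composes it into U+00E9 ('é').
    return unicodedata.normalize("NFC", name)


name = build_file_name("De\u0301claration.png", 7)
print(name.encode("latin-1"))  # b'D\xe9claration.7.png'
```

NFC only helps when a precomposed character exists; text containing characters that are simply outside latin-1 will still fail to encode.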

UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2019' in position 4: ordinal not in range(256)

I am using eyeD3 to edit the metadata of mp3 files, but I am unable to set the lyrics tag:
def fetch_lyrics(title, artist):
    URL='http://makeitpersonal.co/lyrics?artist=%s&title=%s'
    webaddr=(URL %(artist, title)).replace(" ", "%20")
    print webaddr
    response = requests.get(webaddr)
    if response.content=="Sorry, We don't have lyrics for this song yet.":
        return 0
    else:
        return response.content

def get_lyrics(pattern, path=os.getcwd()):
    files=find(pattern, path)
    matches = len(files)
    if matches==1:
        tag = eyeD3.Tag()
        tag.link(files[0])
        lyrics = tag.getLyrics()
        if lyrics:
            for l in lyrics:
                print l.lyrics
        else:
            print "Lyrics not found. Searching online..."
            tag = eyeD3.Tag()
            tag.link(files[0])
            artist = tag.getArtist()
            title = tag.getTitle()
            l = fetch_lyrics(title, artist)
            if l==0:
                print "No matches found."
            else:
                #print l
                tag.addLyrics(l.decode('utf-8'))
                tag.update()
The traceback that I got is:
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "lyrics.py", line 99, in get_lyrics
    tag.update()
  File "/usr/lib/python2.7/dist-packages/eyeD3/tag.py", line 526, in update
    self.__saveV2Tag(version);
  File "/usr/lib/python2.7/dist-packages/eyeD3/tag.py", line 1251, in __saveV2Tag
    raw_frame = f.render();
  File "/usr/lib/python2.7/dist-packages/eyeD3/frames.py", line 1200, in render
    self.lyrics.encode(id3EncodingToString(self.encoding))
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2019' in position 4: ordinal not in range(256)
I don't understand the error. Do I need to pass any other parameter to the update() or addLyrics() functions? Any help?
I imagine you're trying to write an ID3v1 (or ID3v2 single-byte) tag, which only permits latin-1.
I think I had to patch my eyeD3 once to fix that problem. Try turning ID3v1 off and setting ID3v2 to v2.4 with UTF-8.
Ideally: catch the error, turn off ID3v1, and retry. The specific problem is that the ’ quote (U+2019) has no latin-1 representation; in UTF-8 it takes three bytes.
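The "catch and retry" advice boils down to a check like the one below. This is a plain-Python sketch of the decision only; choose_text_encoding is a hypothetical helper and does not call eyeD3, whose exact API for selecting the tag version and encoding is not shown here. Latin-1 covers U+0000 to U+00FF, so any character above that range forces a Unicode encoding such as the UTF-8 that ID3v2.4 allows.

```python
def choose_text_encoding(text: str) -> str:
    """Hypothetical helper: prefer latin-1 when every character fits, else UTF-8."""
    try:
        text.encode("latin-1")
        return "latin-1"
    except UnicodeEncodeError:
        # e.g. U+2019 RIGHT SINGLE QUOTATION MARK is outside U+0000..U+00FF
        return "utf-8"


print(choose_text_encoding("Don't"))       # latin-1
print(choose_text_encoding("Don\u2019t"))  # utf-8
print("\u2019".encode("utf-8"))            # b'\xe2\x80\x99'
```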

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9'

I use Python to get JSON data from the Bing API:
accountKeyEnc = base64.b64encode(accountKey + ':' + accountKey)
headers = {'Authorization': 'Basic ' + accountKeyEnc}
req = urllib2.Request(bingUrl, headers = headers)
response = urllib2.urlopen(req)
content = response.read()
data = json.loads(content)
for i in range(0,6):
    print data["d"]["results"][i]["Description"]
But I got this error:
print data["d"]["results"][0]["Description"]
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 11: ordinal not in range(128)
Your problem is that you are reading Unicode from the Bing API and then printing it without explicitly encoding it, so Python falls back to ASCII. There is no good mapping from characters like é to ASCII. Prefix all of your constant strings with u so that they are treated as Unicode strings, and see if that helps.
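In Python 3 the values returned by json.loads are already str, so the only remaining failure point is printing to a console whose encoding cannot represent a character. A minimal sketch, assuming you prefer escapes over a crash; printable is a hypothetical helper and the JSON payload is made up to mimic the shape of the Bing response in the question.

```python
import json


def printable(text: str, encoding: str) -> str:
    # Keep what the console encoding supports; escape the rest as \xNN / \uNNNN.
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


# Made-up payload shaped like data["d"]["results"][i]["Description"]
content = b'{"d": {"results": [{"Description": "Caf\\u00e9 de Flore"}]}}'
data = json.loads(content)  # json.loads accepts bytes since Python 3.6
description = data["d"]["results"][0]["Description"]

print(printable(description, "ascii"))  # Caf\xe9 de Flore
```

On a UTF-8 console, printable(description, "utf-8") returns the text unchanged.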
