This is the code:
A = "Diga sí por cualquier número de otro cuidador.".encode("utf-8")
I get this error:
'ascii' codec can't decode byte 0xed in position 6: ordinal not in range(128)
I tried numerous encodings unsuccessfully.
Edit:
I already have this at the beginning
# -*- coding: utf-8 -*-
Changing to
A = u"Diga sí por cualquier número de otro cuidador.".encode("utf-8")
doesn't help
Are you using Python 2?
In Python 2, that string literal is a bytestring. You're trying to encode it, but you can encode only a Unicode string, so Python will first try to decode the bytestring to a Unicode string using the default "ascii" encoding.
Unfortunately, your string contains non-ASCII characters, so it can't be decoded to Unicode.
The best solution is to use a Unicode string literal, like this:
A = u"Diga sí por cualquier número de otro cuidador.".encode("utf-8")
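The failure can be reproduced from Python 3, where the step Python 2 hides is explicit: a byte string must first be decoded before it can be re-encoded, and Python 2's default codec for that implicit decode is ASCII. A small sketch (the cp1252 encoding of the sample sentence is an assumption matching the reported byte):

```python
# Python 3 sketch of what Python 2 does implicitly: before a byte string can be
# encoded, it must first be decoded, and the default codec is ASCII.
raw = "Diga sí".encode("cp1252")   # the byte string as read from a cp1252 source file
try:
    raw.decode("ascii")            # the implicit decode step that blows up
except UnicodeDecodeError as err:
    print(err)                     # 'ascii' codec can't decode byte 0xed in position 6 ...
```

The position (6) and byte (0xed) line up exactly with the traceback in the question.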
Error message: 'ascii' codec can't decode byte 0xed in position 6: ordinal not in range(128)
says that the 7th byte is 0xed. That is either the lead byte of a UTF-8 sequence for some high-ordinal Unicode character (which doesn't match the reported text), or it's your i-acute (í) encoded in Latin-1 or cp1252. I'm betting on cp1252.
If your file were encoded in UTF-8, the offending byte would not be 0xed but 0xc3:
Preliminaries:
>>> import unicodedata
>>> unicodedata.name(u'\xed')
'LATIN SMALL LETTER I WITH ACUTE'
>>> uc = u'Diga s\xed por'
What happens if the file is encoded in UTF-8:
>>> infile = uc.encode('utf8')
>>> infile
'Diga s\xc3\xad por'
>>> infile.encode('utf8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 6: ordinal not in range(128)
#### NOT the message reported in the question ####
What happens if the file is encoded in cp1252, latin-1, or similar:
>>> infile = uc.encode('cp1252')
>>> infile
'Diga s\xed por'
>>> infile.encode('utf8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xed in position 6: ordinal not in range(128)
#### As reported in the question ####
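The 0xc3-versus-0xed distinction can also be checked directly (shown here from Python 3, where every str is already Unicode):

```python
# How í (U+00ED) is stored under each candidate file encoding
s = "í"
assert s.encode("utf-8") == b"\xc3\xad"    # a UTF-8 file would yield byte 0xc3 first
assert s.encode("cp1252") == b"\xed"       # a cp1252/latin-1 file yields the single byte 0xed
print("byte patterns match the two tracebacks above")
```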
Having # -*- coding: utf-8 -*- at the start of your code does not magically ensure that your file is encoded in UTF-8 -- that's up to you and your text editor.
Actions:
save your file as UTF-8.
As suggested by others, you need u'blah blah'.
put this on the first line of your code:
# -*- coding: utf-8 -*-
You should declare your source file's encoding by adding the following line at the very beginning of your code (assuming that your file really is saved as UTF-8):
# coding: utf-8
Otherwise, Python 2 will assume an ASCII source encoding and fail while parsing.
You are probably operating on a normal (byte) string, not a unicode string:
>>> type(u"zażółć gęślą jaźń")
<type 'unicode'>
>>> type("zażółć gęślą jaźń")
<type 'str'>
so
u"Diga sí por cualquier número de otro cuidador.".encode("utf-8")
should work.
If you want to write non-ASCII literals in your source, put
# -*- coding: utf-8 -*-
in the first line of your script. Note that this only declares the source encoding; to make plain literals Unicode by default you also need from __future__ import unicode_literals.
See also the docs.
P.S. It's Polish in examples above :)
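For comparison, in Python 3 the str/unicode split is gone: every string literal is already Unicode and no u prefix is needed (same Polish sample text):

```python
# Python 3: str is always Unicode; bytes is the separate byte-string type
s = "zażółć gęślą jaźń"
assert isinstance(s, str)
b = s.encode("utf-8")          # explicit step to get bytes
assert isinstance(b, bytes)
assert b.decode("utf-8") == s  # round-trips losslessly
```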
In the first or second line of your code, type the comment:
# -*- coding: latin-1 -*-
For a list of symbols supported see:
http://en.wikipedia.org/wiki/Latin-1_Supplement_%28Unicode_block%29
And the languages covered: http://en.wikipedia.org/wiki/ISO_8859-1
Maybe this is what you want to do:
A = 'Diga sí por cualquier número de otro cuidador'.decode('latin-1')
And don't forget to add # -*- coding: latin-1 -*- at the beginning of your code.
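A rough sketch of what that decode produces, shown from Python 3 with the Latin-1 bytes spelled out explicitly (the byte values 0xed and 0xfa are the Latin-1 codes for í and ú):

```python
# Latin-1 bytes for "Diga sí ... número ..." -> Unicode text
raw = b"Diga s\xed por cualquier n\xfamero de otro cuidador"
text = raw.decode("latin-1")
assert "sí" in text and "número" in text
print(text)
```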
Related
I configured the encoding as cp1252 to handle sequences like "~ '^". When I set coding: utf-8, strings with accented text raise an error; I have already tried "latin-1" and the same error continues.
If I re-save the file as "cp1252" I can open it normally, but when I close and later reopen the file, the error message comes back.
Does anyone know how to solve the following error? It always appears when I execute the main script file:
# coding: cp1252
import sys
import os
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from ytconsole import *
self.msgt2 = QMessageBox()
self.msgt2.setIcon(QMessageBox.Information)
self.msgt2.setWindowTitle('Python Youtube Downloader')  # <-- here is the problem, line 353
self.msgt2.setText("{} arquivos MP4 foram baixados, salvos na Área de Trabalho".format(int(l)))
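Since cp1252 has no character at byte 0x81 (hence "character maps to &lt;undefined&gt;"), one way to narrow down a file's real encoding is to try a few candidate codecs against its raw bytes. A diagnostic sketch; the byte string below is a stand-in for the actual source file:

```python
def sniff(data, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate codec that can decode the given bytes."""
    for enc in candidates:
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None

# 0x81 is invalid on its own in UTF-8 and undefined in cp1252,
# but latin-1 maps every byte, so it always succeeds.
print(sniff(b"Downloader \x81"))   # latin-1
```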
I am reading a JSON file with Python 3.5. The file has characters like "í" in it, and I would like to print them in that form. How do I make the code below print the character correctly?
t = 'í'
print(t)
Traceback (most recent call last):
File "test.py", line 15, in <module>
print(t)
UnicodeEncodeError: 'ascii' codec can't encode character '\xed' in position 0: ordinal not in range(128)
Try adding # -*- coding: iso-8859-15 -*- as the first or second line of your source file.
Try this:
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
t = 'í'
print(t.encode("ascii" , "ignore"))
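Note that "ignore" silently drops the accented character. If the goal is just to avoid the crash while keeping some trace of the data, the "replace" or "backslashreplace" error handlers may be preferable (a small comparison, Python 3):

```python
t = 'í'
print(t.encode("ascii", "ignore"))            # b'' - the character is lost
print(t.encode("ascii", "replace"))           # b'?'
print(t.encode("ascii", "backslashreplace"))  # b'\\xed'
```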
Use unicode format.
t = u'í'
print(t)
You have to add u before the character 'í' so that Python understands it as unicode.
Try this:
print(t.decode("utf-8"))
I know this question has been asked various times, but somehow I am not getting results.
I am fetching data from the web which contains the string Elzéar. When writing it to a CSV file I get the error mentioned in the question title.
While producing the data I did the following:
address = str(address).strip()
address = address.encode('utf8')
return name+','+address+','+city+','+state+','+phone+','+fax+','+pumps+','+parking+','+general+','+entertainment+','+fuel+','+resturants+','+services+','+technology+','+fuel_cards+','+credit_cards+','+permits+','+money_services+','+security+','+medical+','+longit+','+latit
and writing it as:
with open('records.csv', 'a') as csv_file:
    print(type(data))  # prints <unicode>
    data = data.encode('utf8')
    csv_file.write(id+','+data+'\n')
    status = 'OK'
    the_file.write(ts+'\t'+url+'\t'+status+'\n')
Generates error as:
'ascii' codec can't encode character u'\xe9' in position 55: ordinal
not in range(128)
You could try something like this (Python 2.7):
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import codecs
...
with codecs.open('records.csv', 'a', encoding="utf8") as csv_file:
    print(type(data))  # prints <unicode>
    # because data is unicode
    csv_file.write(unicode(id) + u',' + data + u'\n')
    status = u'OK'
    the_file.write(unicode(ts, encoding="utf8") + u'\t' + unicode(url, encoding="utf8") + u'\t' + status + u'\n')
The main idea is to work with unicode internally as much as possible and only produce str at the output boundary (better not to operate on str directly).
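In Python 3 the same idea needs no codecs module: the built-in open takes an encoding argument, and the csv module handles quoting. A sketch with made-up sample data and a temporary file path:

```python
import csv
import os
import tempfile

row = ["1", "Elzéar", "Montréal"]                    # sample record (made up)
path = os.path.join(tempfile.mkdtemp(), "records.csv")

with open(path, "a", encoding="utf-8", newline="") as f:
    csv.writer(f).writerow(row)                      # text in, UTF-8 bytes out

with open(path, encoding="utf-8", newline="") as f:
    print(next(csv.reader(f)))                       # ['1', 'Elzéar', 'Montréal']
```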
I am using Python 2.7 to read data from a MySQL table. In MySQL the name looks like this:
Garasa, Ángel.
But when I print it in Python the output is
Garasa, �ngel
The character set name in MySQL is utf8.
This is my Python code:
# coding: utf-8
import MySQLdb

connection = MySQLdb.connect(host="localhost", user="root", passwd="root", db="jmdb")
cursor = connection.cursor()
cursor.execute("select * from actors where actorid=672462;")
data = cursor.fetchall()
for row in data:
    print "IMDB Name=", row[4]
    wiki = "".join(row[4])
    print wiki
I have tried decoding it, but get errors such as:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc1 in position 8:
invalid start byte
I have read about decoding and UTF-8 but couldn't find a solution.
Get the MySQL driver to return Unicode strings instead. This means that you don't have to deal with decoding in your own code.
Simply set use_unicode=True in the connection parameters. If the table has been set with a specific encoding then set the charset attribute accordingly.
I think the right character encoding in your case is cp1252:
>>> s = 'Garasa, Ángel.'
>>> s.decode('utf-8')
Traceback (most recent call last):
File "<pyshell#63>", line 1, in <module>
s.decode('utf-8')
File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc1 in position 8: invalid start byte
>>> s.decode('cp1252')
u'Garasa, \xc1ngel.'
>>>
>>> print s.decode('cp1252')
Garasa, Ángel.
EDIT: It could also be latin-1:
>>> s.decode('latin-1')
u'Garasa, \xc1ngel.'
>>> print s.decode('latin-1')
Garasa, Ángel.
The cp1252 and latin-1 code pages coincide for all code points except the range 128 to 159.
Quoting from this source (latin-1):
The Windows-1252 codepage coincides with ISO-8859-1 for all codes
except the range 128 to 159 (hex 80 to 9F), where the little-used C1
controls are replaced with additional characters including all the
missing characters provided by ISO-8859-15
And this one (cp1252):
This character encoding is a superset of ISO 8859-1, but differs from
the IANA's ISO-8859-1 by using displayable characters rather than
control characters in the 80 to 9F (hex) range.
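The quoted 128-159 difference is easy to verify: cp1252 puts printable characters there (e.g. the euro sign at 0x80) where latin-1 keeps C1 control codes, while bytes outside that range, such as the 0xc1 (Á) from this question, decode identically in both (Python 3 shown):

```python
# Bytes 0x80-0x9F: the only range where the two codecs disagree
assert b"\x80".decode("cp1252") == "\u20ac"   # € in cp1252
assert b"\x80".decode("latin-1") == "\x80"    # C1 control character in latin-1

# Outside that range they coincide, e.g. 0xC1 -> Á in both
assert b"\xc1".decode("cp1252") == b"\xc1".decode("latin-1") == "Á"
print("cp1252 and latin-1 differ only in 0x80-0x9F")
```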
I am trying to read some French text and do some frequency analysis of words. I want the characters with accents and other diacritics to stay. So, I did this for testing:
>>> import codecs
>>> f = codecs.open('file','r','utf-8')
>>> for line in f:
... print line
...
Faites savoir à votre famille que vous êtes en sécurité.
So far, so good. But, I have a list of French files which I iterate over in the following way:
import codecs,sys,os
path = sys.argv[1]
for f in os.listdir(path):
    french = codecs.open(os.path.join(path, f), 'r', 'utf-8')
    for line in french:
        print line
Here, it gives the following error:
rdholaki74: python TestingCodecs.py ../frenchResources | more
Traceback (most recent call last):
File "TestingCodecs.py", line 7, in <module>
print line
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 14: ordinal not in range(128)
Why is it that the same file throws up an error when passed as an argument and not when given explicitly in the code?
Thanks.
Because you're misinterpreting the cause. The fact that you're piping the output means that Python can't detect which encoding to use. If stdout is not a TTY, you'll need to encode as UTF-8 manually before outputting.
It is a print error caused by the redirection. You could use:
PYTHONIOENCODING=utf-8 python ... | ...
Specify another encoding if your terminal doesn't use UTF-8.
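When you can't control the environment variable, the same effect can be had from inside the script by writing already-encoded bytes to sys.stdout.buffer, which bypasses the codec Python guessed for the pipe. A sketch using the French sample sentence from the question:

```python
import sys

line = "Faites savoir à votre famille que vous êtes en sécurité."
payload = (line + "\n").encode("utf-8")   # à becomes the byte pair 0xc3 0xa0

# Writing raw bytes works whether stdout is a TTY or a pipe
sys.stdout.buffer.write(payload)
```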