UnicodeEncodeError -- utf8 and unicode() not working - python

I have a synopsis as follows:
synopsis = 'Eine Geschichte, wie im normalen Leben... Der als äußerst vorsichtig geltende Risikoanalytiker Ruben verlässt seine Frau,...'
I am trying to write this to a file, but keep running into:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 705: ordinal not in range(128)
Here is what I've tried:
synopsis = unicode(synopsis)
new_file.write('%s' % synopsis)
synopsis = synopsis.encode('utf-8')
new_file.write('%s' % synopsis)
In addition, I have # -*- coding: utf-8 -*- specified at the top of my file.
Why is this occurring and how can I fix it?

How are you opening new_file?
import codecs
new_file = codecs.open('out', mode='w', encoding='utf-8')
This should allow you to write Unicode strings to the file, which will be encoded as UTF-8.
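(In Python 2, sys.getdefaultencoding() is 'ascii' unless changed. That is the codec used when a unicode string is implicitly converted to bytes, which is what happens when you write one to a file opened with the built-in open().)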
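A minimal sketch of the same idea using io.open, which accepts the same encoding argument and also runs unchanged on Python 3. The shortened synopsis text and the temporary output path are assumptions made for illustration, not part of the question.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Shortened stand-in for the synopsis in the question (hypothetical sample text).
synopsis = u"Der als \u00e4u\u00dferst vorsichtig geltende Risikoanalytiker Ruben"

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.mkdtemp(), "out.txt")

# Opening with an explicit encoding means write() accepts unicode text
# and encodes it as UTF-8 instead of falling back to ASCII.
with io.open(out_path, mode="w", encoding="utf-8") as new_file:
    new_file.write(synopsis)

# Reading the raw bytes back shows the UTF-8 form of the non-ASCII characters.
with io.open(out_path, mode="rb") as raw_file:
    raw_bytes = raw_file.read()
```

The key point is that the text stays unicode all the way to write(), and the file object does the encoding in one place.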

Related

unicode decode error while importing Medical Data on pandas

I tried importing some medical data and ran into this Unicode error. Here is my code:
import glob
import os
import pandas as pd

output_path = r"C:/Users/muham/Desktop/AI projects/cancer doc classification"
my_file = glob.glob(os.path.join(output_path, '*.csv'))
for files in my_file:
    data = pd.read_csv(files)
    print(data)
My error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 3314: invalid start byte
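Try another encoding; read_csv assumes UTF-8 by default. Byte 0x93 is a left curly quotation mark in Windows-1252, so cp1252 is a likely candidate:
import pandas
pandas.read_csv(path, encoding="cp1252")
latin1 is another option: it decodes any byte sequence without raising, although it maps 0x93 to a control character instead of a quotation mark.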
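One way to pick an encoding is to try a short list of candidates on the raw bytes and keep the first that decodes cleanly. The sketch below uses only the standard library; the helper name decode_with_fallback, the candidate order, and the sample bytes are assumptions for illustration.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252", "latin1")):
    """Return (text, encoding) for the first candidate that decodes raw without error."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")

# Bytes 0x93 and 0x94 are curly quotes in cp1252 but are not valid UTF-8 start bytes.
sample = b"diagnosis: \x93benign\x94"
text, used = decode_with_fallback(sample)
```

The encoding name found this way can then be passed to pandas.read_csv through its encoding parameter. A clean decode does not prove the guess is right, so it is worth checking a few rows by eye.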

How can I save files with UTF-8 names?

I need to save files with UTF-8 names, but when I do, Django raises an error:
UnicodeEncodeError at /uploaded/document/ 'فیلتر.png'
'ascii' codec can't encode characters in position 55-59: ordinal not in range(128)
This happens even though my FileField is defined like this:
# -*- coding: utf-8 -*-
def get_path(instance, filename):
    return u' '.join((u'document', filename)).encode('utf-8').strip()

class Document(models.Model):
    file_path = models.FileField(verbose_name='File', upload_to=get_path,
                                 storage=FileSystemStorage(base_url=settings.LOCAL_MEDIA_URL))
How can I fix it?
I use the Tastypie API to upload files.
My question was answered here:
https://itekblog.com/ascii-codec-cant-encode-characters-in-position/#The_Code
I had to change the locale that Apache 2 runs under, in
/etc/apache/envvars
export LANG='en_US.UTF-8'
export LC_ALL='en_US.UTF-8'
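To check whether the locale change has taken effect, it can help to print the encodings Python derives from the environment from inside the server process. This is a small diagnostic sketch, not part of the linked answer; the values printed depend entirely on the environment it runs in.

```python
from __future__ import print_function

import locale
import sys

# Encoding Python uses to convert unicode file names to bytes for the OS.
fs_encoding = sys.getfilesystemencoding()

# Encoding implied by the current locale settings (LANG / LC_ALL).
locale_encoding = locale.getpreferredencoding()

print("filesystem encoding:", fs_encoding)
print("locale encoding:", locale_encoding)
```

Under a C or POSIX locale, Python 2 typically reports an ASCII filesystem encoding, which would explain why a non-ASCII file name fails only when the code runs under Apache.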

Python returns an error while writing data to a file (Python 2.7)

I am parsing an XML file with Python's minidom module. While writing data to a file, it raises UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128). The output prints perfectly on the command line, though. Please tell me the solution.
My XML file is:
<?xml version="1.0"?>
<Feature>
<Word Root ="ਨੌਕਰ-ਚਾਕਰ">
<info Inflection ="ਨੌਕਰਾਂ-ਚਾਕਰਾਂ">
<posinfo gender ="Masculine" number ="Plural" case ="Oblique" />
</info>
</Word>
</Feature>
My Python code is:
import sys
from xml.dom import minidom
file=open("npu.txt","w+")
doc = minidom.parse("NPU.xml")
word = doc.getElementsByTagName("Word")
for each in word:
    # print "root"+each.getAttribute("Root")
    file.write(each.getAttribute("Root")+"\n")
    hh=each.getElementsByTagName("info")
    for each1 in hh:
        # print "inflection"+each1.getAttribute("Inflection")
        file.write(each1.getAttribute("Inflection")+"\t")
        vv=each1.getElementsByTagName("posinfo")
        for each2 in vv:
            # print each2.getAttribute("gender")
            # print each2.getAttribute("number")
            # print each2.getAttribute("case")
            file.write( each2.getAttribute("gender")+",")
            file.write( each2.getAttribute("number")+",")
            file.write(each2.getAttribute("case"))
            file.write("\n")
    file.write("--------\n")
Encode the data while writing:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
file=open("npu.txt","w+")
file.write("ਨੌਕਰ-ਚਾਕਰ")
The problem isn't in the way you parse the XML; it is an encoding problem.
minidom returns attribute values as unicode strings. When you write them to a file opened with the built-in open(), Python 2 implicitly encodes them as ASCII, which cannot represent the Gurmukhi characters you are using.
Try codecs, as follows:
import codecs
file = codecs.open("npu.txt", "w+", "utf-8")
file.write("ਨੌਕਰ-ਚਾਕਰ".decode('utf-8'))
file.close()
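EDIT:
You can also declare the source file's encoding as UTF-8 by adding the special comment
# -*- coding: UTF-8 -*-
at the beginning of the Python source. Without it, Python 2 assumes the source is ASCII (7-bit). The comment only controls how non-ASCII literals in the source are read; it does not change the codec used for implicit unicode-to-bytes conversions.
Note that Python 2 identifiers are still restricted to ASCII characters.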
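Here is a sketch of the explicit-encoding approach applied to the minidom loop from the question. It uses io.open, which takes the same encoding argument as codecs.open, and parses the XML from an in-memory string so the example is self-contained. The function name dump_words and the temporary output path are assumptions for illustration.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile
from xml.dom import minidom

# The sample document from the question, held in memory for the example.
XML = u"""<?xml version="1.0"?>
<Feature>
<Word Root="ਨੌਕਰ-ਚਾਕਰ">
<info Inflection="ਨੌਕਰਾਂ-ਚਾਕਰਾਂ">
<posinfo gender="Masculine" number="Plural" case="Oblique" />
</info>
</Word>
</Feature>
"""

def dump_words(xml_bytes, out_path):
    """Write each Word's root, inflections and grammatical info as UTF-8 text."""
    doc = minidom.parseString(xml_bytes)
    # The explicit encoding lets write() take unicode attribute values directly.
    with io.open(out_path, "w", encoding="utf-8") as out:
        for word in doc.getElementsByTagName("Word"):
            out.write(word.getAttribute("Root") + u"\n")
            for info in word.getElementsByTagName("info"):
                out.write(info.getAttribute("Inflection") + u"\t")
                for pos in info.getElementsByTagName("posinfo"):
                    fields = [pos.getAttribute(name) for name in ("gender", "number", "case")]
                    out.write(u",".join(fields) + u"\n")
            out.write(u"--------\n")

# Hypothetical output location; the question writes to npu.txt in the working directory.
out_path = os.path.join(tempfile.mkdtemp(), "npu.txt")
dump_words(XML.encode("utf-8"), out_path)

with io.open(out_path, encoding="utf-8") as written:
    result = written.read()
```

Because the file object handles the encoding, none of the individual write() calls need an explicit encode().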

UnicodeDecodeError: 'utf8' codec can't decode bytes

I'm parsing an XML file that uses the "iso-8859-15" encoding.
Words like 'Zürich' and 'Aktienrückk' get converted to character references such as "&#228;".
I tried these suggestions:
p = ElementTree.fromstring(u'<p>found "\u62c9\u67cf \u591a\u516c \u56ed"</p>'.encode('utf8'))
>>> p.text
u'found "\u62c9\u67cf \u591a\u516c \u56ed"'
>>> print p.text
but I get errors like UnicodeDecodeError: 'ascii' codec can't decode byte
Even this doesn't help:
content = unicode(mystring.strip(codecs.BOM_UTF8), 'utf-8')
I tried a lot of suggestions on Stack Overflow, but I couldn't find one that works.
I need to write the parsed content back to an HTML file with the same characters, such as 'ü', intact.
Try this:
from xml.etree import ElementTree
p = ElementTree.fromstring(u'<p>found "\u62c9\u67cf \u591a\u516c \u56ed"</p>'.encode('utf8'))
print p.text.encode('utf8')
found "拉柏 多公 园"
For your example:
# -*- coding: utf-8 -*-
from xml.etree import ElementTree
text = 'Aktienrückk'.decode('utf8')
print text.encode('utf8')
Aktienrückk
Don't forget to put # -*- coding: utf-8 -*- at the beginning of the file.
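For the ISO-8859-15 case in the question, a possible sketch is to hand the raw bytes to ElementTree so the parser honors the declared encoding, then serialize with that same encoding. Numeric character references usually appear when the tree is serialized with ElementTree's default us-ascii encoding. The tiny input document below is an assumption for illustration.

```python
# -*- coding: utf-8 -*-
from xml.etree import ElementTree

# Hypothetical input: a small document encoded as ISO-8859-15, as in the question.
raw = u'<?xml version="1.0" encoding="iso-8859-15"?><p>Z\u00fcrich</p>'.encode("iso-8859-15")

# Passing bytes lets the parser apply the declared encoding; text comes back as unicode.
root = ElementTree.fromstring(raw)
text = root.text

# Serialize back to bytes in the original encoding, so the character is written
# as a single byte (0xFC) instead of a numeric character reference.
out_bytes = ElementTree.tostring(root, encoding="iso-8859-15")
```

The resulting bytes can be written to a file opened in binary mode, which avoids any implicit ASCII conversion on the way out.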

Python script to convert from UTF-8 to ASCII [duplicate]

I'm trying to write a script in python to convert utf-8 files into ASCII files:
#!/usr/bin/env python
# *-* coding: iso-8859-1 *-*
import sys
import os
filePath = "test.lrc"
fichier = open(filePath, "rb")
contentOfFile = fichier.read()
fichier.close()
fichierTemp = open("tempASCII", "w")
fichierTemp.write(contentOfFile.encode("ASCII", 'ignore'))
fichierTemp.close()
When I run this script, I get the following error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 13: ordinal not in range(128)
I thought the 'ignore' argument to encode() would suppress the error, but apparently it does not.
I'm open to other ways to do the conversion.
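Decode the bytes to unicode first, then encode to ASCII. In Python 2, calling encode() on a byte string first decodes it implicitly as ASCII, and that hidden step is what raises the UnicodeDecodeError:
data="UTF-8 DATA"
udata=data.decode("utf-8")
asciidata=udata.encode("ascii","ignore")
Alternatively, let codecs.open do the decoding and encoding:
import codecs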
...
fichier = codecs.open(filePath, "r", encoding="utf-8")
...
fichierTemp = codecs.open("tempASCII", "w", encoding="ascii", errors="ignore")
fichierTemp.write(contentOfFile)
...
UTF-8 is a superset of ASCII. Either your UTF-8 file is already pure ASCII, or it can't be converted without loss.
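The decode-then-encode step can be wrapped in a small function. The sketch below also shows an optional variant that is not in the answers above: normalizing to NFKD first, so that accented letters lose only the accent rather than disappearing entirely. The function name to_ascii and its strip_accents flag are assumptions for illustration.

```python
import unicodedata

def to_ascii(raw_utf8, strip_accents=False):
    """Convert UTF-8 encoded bytes to ASCII bytes, dropping what cannot be represented."""
    text = raw_utf8.decode("utf-8")
    if strip_accents:
        # NFKD splits an accented letter into base letter + combining mark;
        # the 'ignore' handler below then drops only the combining mark.
        text = unicodedata.normalize("NFKD", text)
    return text.encode("ascii", "ignore")

sample = u"Z\u00fcrich".encode("utf-8")
dropped = to_ascii(sample)                       # the accented letter is removed
folded = to_ascii(sample, strip_accents=True)    # the base letter is kept
```

Either way the conversion is lossy, so it is worth keeping the original UTF-8 file.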
