How can I save files with UTF-8 names in Python?

I need to save files with UTF-8 names, but when I do, Django raises an error:
UnicodeEncodeError at /uploaded/document/ 'فیلتر.png'
'ascii' codec can't encode characters in position 55-59: ordinal not in range(128)
My FileField is defined like this:
# -*- coding: utf-8 -*-
def get_path(instance, filename):
    return u' '.join((u'document', filename)).encode('utf-8').strip()
class Document(models.Model):
    file_path = models.FileField(verbose_name='File', upload_to=get_path,
                                 storage=FileSystemStorage(base_url=settings.LOCAL_MEDIA_URL))
How can I fix it?
I use the Tastypie API to upload files.

My question was answered here:
https://itekblog.com/ascii-codec-cant-encode-characters-in-position/#The_Code
I needed to change the locale that Apache 2 runs under, in:
/etc/apache/envvars
export LANG='en_US.UTF-8'
export LC_ALL='en_US.UTF-8'
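To check whether the locale change actually reached the Python process, it can help to print the encodings Python derived from its environment. This is only a diagnostic sketch in Python 3; the function name encoding_report is made up for illustration, and under Apache the report would have to be logged from inside the application rather than run from a shell.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python derived from the process environment."""
    return {
        # Used to encode file names passed to the operating system.
        "filesystem": sys.getfilesystemencoding(),
        # Derived from LANG / LC_ALL on POSIX systems.
        "preferred": locale.getpreferredencoding(False),
        # May be None when stdout is not a regular text stream.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


report = encoding_report()
for name, value in sorted(report.items()):
    print("%s: %s" % (name, value))
```

If the filesystem entry names an ASCII codec, non-ASCII file names are likely to fail in the way the question describes.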

Related

How can I solve Python Unicode Encoding Error? [duplicate]

This question already has answers here:
Python, Unicode, and the Windows console
(15 answers)
Closed 2 years ago.
I was trying to load a .json file. The file is an English word dictionary that contains a lot of Unicode escapes such as \u266f. Passing encoding="utf8" did not solve the error. I then replaced all of the Unicode escapes with UTF-8 characters, but it still shows the same error.
My code:
import json
data = json.load(open("data.json", encoding="utf8"))
print(data)
Result:
Traceback (most recent call last):
File "E:\Python\Dictionary App\test.py", line 4, in <module>
print(data)
File "C:\Users\ahnaf\AppData\Local\Programs\Python\Python38-32\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u266f' in position 657370: character maps to <undefined>
[Finished in 0.32s]
The json file: data.json
Try including:
# -*- coding: utf-8 -*-
or
# coding: utf-8
in your Python file header.
Example:
# coding: utf-8
import json
with open("data.json", encoding="utf8") as f:
    data = json.loads(f.read())
print(data)
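Note that the traceback in the question points at print(data) and the cp1252 codec, so the failure seems to happen when the text is encoded for the Windows console, not when the file is read. The sketch below (Python 3) reproduces that with a made-up sample string and shows two possible workarounds; the helper name utf8_text_stream is hypothetical.

```python
import io

# Hypothetical sample containing U+266F (MUSIC SHARP SIGN).
text = u"C\u266f major"

# cp1252 has no mapping for U+266F, which matches the error in the traceback.
try:
    text.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# Workaround 1: degrade gracefully by replacing unencodable characters.
degraded = text.encode("cp1252", errors="replace").decode("cp1252")


# Workaround 2: send the text through a stream that encodes as UTF-8.
def utf8_text_stream(binary_stream):
    """Wrap a binary stream so that written text is encoded as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8")


buffer = io.BytesIO()
stream = utf8_text_stream(buffer)
stream.write(text)
stream.flush()
utf8_bytes = buffer.getvalue()
```

For the real console, sys.stdout.reconfigure(encoding="utf-8") (Python 3.7 and later) or the PYTHONIOENCODING environment variable plays the same role as the wrapper; whether the glyph is then displayed correctly still depends on the console itself.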

Python returns an error while writing data to a file (Python 2.7)

I am parsing an XML file with Python's minidom module. While writing data to a file, it gives an error like UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128). But the output prints perfectly on the command line. Please tell me the solution.
My XML file is:
<?xml version="1.0"?>
<Feature>
  <Word Root ="ਨੌਕਰ-ਚਾਕਰ">
    <info Inflection ="ਨੌਕਰਾਂ-ਚਾਕਰਾਂ">
      <posinfo gender ="Masculine" number ="Plural" case ="Oblique" />
    </info>
  </Word>
</Feature>
My python code is:
import sys
from xml.dom import minidom
file=open("npu.txt","w+")
doc = minidom.parse("NPU.xml")
word = doc.getElementsByTagName("Word")
for each in word:
    # print "root"+each.getAttribute("Root")
    file.write(each.getAttribute("Root")+"\n")
    hh=each.getElementsByTagName("info")
    for each1 in hh:
        # print "inflection"+each1.getAttribute("Inflection")
        file.write(each1.getAttribute("Inflection")+"\t")
        vv=each1.getElementsByTagName("posinfo")
        for each2 in vv:
            # print each2.getAttribute("gender")
            # print each2.getAttribute("number")
            # print each2.getAttribute("case")
            file.write( each2.getAttribute("gender")+",")
            file.write( each2.getAttribute("number")+",")
            file.write(each2.getAttribute("case"))
            file.write("\n")
    file.write("--------\n")
Encode the data while writing:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
file=open("npu.txt","w+")
file.write("ਨੌਕਰ-ਚਾਕਰ")
The problem isn't in the way you parse the XML; this is an encoding problem.
minidom returns the attribute values as Unicode strings.
A file opened with the built-in open() expects bytes in Python 2, so Python tries to encode your text as ASCII, which doesn't include the characters you are using.
Try codecs, as follows:
import codecs
file = codecs.open("npu.txt", "w+", "utf-8")
file.write("ਨੌਕਰ-ਚਾਕਰ".decode('utf-8'))
file.close()
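EDIT:
You can also declare the encoding of the source file as UTF-8 by adding the special comment
# -*- coding: UTF-8 -*-
at the beginning of the Python source. Without it, Python 2 assumes the source is ASCII (7-bit) and rejects non-ASCII literals. The comment only affects how the source file is read; it does not change how text is encoded when it is written to a file.
Note that Python 2 identifiers are still restricted to ASCII characters.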
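For comparison, here is a minimal sketch of the same loop with the output file opened in an explicit encoding. It is written for Python 3, parses the XML from the question as an inline string instead of NPU.xml, and writes to a temporary directory; the function name dump_words and those choices are assumptions made for illustration.

```python
import io
import os
import tempfile
from xml.dom import minidom

# The XML from the question, inlined so the sketch is self-contained.
XML = (
    u'<Feature><Word Root="ਨੌਕਰ-ਚਾਕਰ">'
    u'<info Inflection="ਨੌਕਰਾਂ-ਚਾਕਰਾਂ">'
    u'<posinfo gender="Masculine" number="Plural" case="Oblique" />'
    u'</info></Word></Feature>'
)


def dump_words(xml_text, out_path):
    """Write each word, its inflections and their grammar info as UTF-8."""
    doc = minidom.parseString(xml_text.encode("utf-8"))
    # The encoding argument makes the file object encode text for us.
    with io.open(out_path, "w", encoding="utf-8") as out:
        for word in doc.getElementsByTagName("Word"):
            out.write(word.getAttribute("Root") + u"\n")
            for info in word.getElementsByTagName("info"):
                out.write(info.getAttribute("Inflection") + u"\t")
                for pos in info.getElementsByTagName("posinfo"):
                    fields = [pos.getAttribute(name)
                              for name in ("gender", "number", "case")]
                    out.write(u",".join(fields) + u"\n")
            out.write(u"--------\n")


out_path = os.path.join(tempfile.mkdtemp(), "npu.txt")
dump_words(XML, out_path)

with io.open(out_path, encoding="utf-8") as f:
    written = f.read()
print(written)
```

The loop body stays the same as in the question; only the way the file is opened changes.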

UnicodeDecodeError when importing a JSON file

I want to open a JSON file in Python, and I get the error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 64864: ordinal not in range(128)
My code is quite simple:
# -*- coding: utf-8 -*-
import json
with open('birdw3l2.json') as data_file:
    data = json.load(data_file)
print(data)
Can someone help me? Thanks!
Try the following code.
import io
import json
# io.open accepts an encoding argument on both Python 2 and Python 3
with io.open('birdw3l2.json', encoding='utf-8') as data_file:
    data = json.load(data_file)
print(data)
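You should specify the encoding when you load your JSON file, like this:
data = json.load(data_file, encoding='utf-8')
The right value depends on how your file is encoded. Note that the encoding argument belongs to Python 2's json.load; it was removed in Python 3.9, where the encoding is passed to open() instead.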
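A small self-contained sketch of reading JSON with an explicit file encoding, written for Python 3. The file name sample.json and its content are invented for the example; the content contains the byte 0xe2 that the error message complains about.

```python
import io
import json
import os
import tempfile

# Create a small UTF-8 JSON file. The bytes E2 80 99 encode U+2019,
# a typographic apostrophe.
path = os.path.join(tempfile.mkdtemp(), "sample.json")
with io.open(path, "wb") as f:
    f.write(b'{"name": "bird\xe2\x80\x99s nest"}')

# Decoding happens in open(), so json.load receives text, not raw bytes.
with io.open(path, encoding="utf-8") as data_file:
    data = json.load(data_file)

print(data["name"] == u"bird\u2019s nest")
```

Opening the same file without the encoding argument in a process whose locale implies ASCII would be expected to raise the UnicodeDecodeError from the question.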

UnicodeDecodeError: 'utf8' codec can't decode bytes

I'm parsing an XML file which has "iso-8859-15" encoding.
Words like 'Zürich' and 'Aktienrückk' get converted to "&#228 ;" etc.
I tried these suggestions:
p = ElementTree.fromstring(u'<p>found "\u62c9\u67cf \u591a\u516c \u56ed"</p>'.encode('utf8'))
>>> p.text
u'found "\u62c9\u67cf \u591a\u516c \u56ed"'
>>> print p.text
but I get errors like UnicodeDecodeError: 'ascii' codec can't decode byte
Even this doesn't help:
content = unicode(mystring.strip(codecs.BOM_UTF8), 'utf-8')
I tried a lot of suggestions on Stack Overflow, but I couldn't find a solution.
I need to write the parsed content back to an HTML file with the same characters, such as 'ü'.
Try this:
from xml.etree import ElementTree
p = ElementTree.fromstring(u'<p>found "\u62c9\u67cf \u591a\u516c \u56ed"</p>'.encode('utf8'))
print p.text.encode('utf8')
found "拉柏 多公 园"
For your example:
# -*- coding: utf-8 -*-
from xml.etree import ElementTree
text = 'Aktienrückk'.decode('utf8')
print text.encode('utf8')
Aktienrückk
Don't forget to put # -*- coding: utf-8 -*- at the beginning of the file.
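Since the question is about an ISO-8859-15 document, here is a hedged sketch of the full round trip in Python 3: parse bytes that declare their own encoding, then encode the text again when producing the HTML. The sample document and the HTML wrapper are made up for illustration.

```python
from xml.etree import ElementTree

# Hypothetical input: XML bytes that declare ISO-8859-15 in the prolog.
raw = (
    u'<?xml version="1.0" encoding="iso-8859-15"?>'
    u'<p>Zürich Aktienrückk</p>'
).encode("iso-8859-15")

# Passing bytes lets the parser honour the declared encoding.
root = ElementTree.fromstring(raw)
text = root.text  # a decoded text string, independent of the input encoding

# Encode explicitly when writing the HTML back out in the same charset.
# This sample text contains no characters that need HTML escaping.
html_bytes = (u"<html><body><p>%s</p></body></html>" % text).encode("iso-8859-15")

print(text == u"Zürich Aktienrückk")
```

The point is to keep the text decoded while working with it and to encode only at the boundary where the file is written.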

UnicodeEncodeError -- utf8 and unicode() not working

I have a synopsis as follows:
synopsis = 'Eine Geschichte, wie im normalen Leben... Der als äußerst vorsichtig geltende Risikoanalytiker Ruben verlässt seine Frau,...'
I am trying to write this to a file, but keep running into:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 705: ordinal not in range(128)
Here is what I've tried:
synopsis = unicode(synopsis)
new_file.write('%s' % synopsis)
synopsis = synopsis.encode('utf-8')
new_file.write('%s' % synopsis)
In addition, I have # -*- coding: utf-8 -*- specified at the top of my file.
Why is this occurring and how can I fix it?
How are you opening new_file?
import codecs
new_file = codecs.open('out', mode='w', encoding='utf-8')
This should allow you to write Unicode strings to the file, which will be encoded as UTF-8.
(Unless otherwise set, sys.getdefaultencoding() is 'ascii' in Python 2, and that is the codec used to implicitly encode Unicode strings written to a file opened with the built-in open().)
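As a rough illustration, the two usual ways of getting the same bytes on disk are sketched below in Python 3: let the file object encode, or encode yourself and write bytes. io.open is used here as a stand-in for codecs.open, the synopsis is a shortened fragment of the one in the question, and the file names are invented.

```python
import io
import os
import tempfile

# Shortened fragment of the synopsis from the question.
synopsis = u"Der als äußerst vorsichtig geltende Risikoanalytiker Ruben verlässt seine Frau"

out_dir = tempfile.mkdtemp()
text_path = os.path.join(out_dir, "text_mode.txt")
bytes_path = os.path.join(out_dir, "binary_mode.txt")

# Option 1: a text-mode file with a declared encoding accepts Unicode text.
with io.open(text_path, mode="w", encoding="utf-8") as new_file:
    new_file.write(u"%s" % synopsis)

# Option 2: a binary-mode file accepts bytes, so encode explicitly first.
with io.open(bytes_path, mode="wb") as new_file:
    new_file.write(synopsis.encode("utf-8"))

with io.open(text_path, "rb") as f:
    text_bytes = f.read()
with io.open(bytes_path, "rb") as f:
    raw_bytes = f.read()

same = text_bytes == raw_bytes
print(same)
```

Mixing the two, such as writing Unicode text to a file that expects bytes, is what triggers the implicit ASCII encoding in Python 2.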
