How can I solve Python Unicode Encoding Error? [duplicate] - python

This question already has answers here:
Python, Unicode, and the Windows console
(15 answers)
Closed 2 years ago.
I was trying to load a .json file. This file is an English word dictionary that contains a lot of Unicode escapes such as \u266f. Passing encoding="utf8" does not solve the error. I then replaced all of the Unicode escapes with UTF-8 characters, but it still shows the same error.
My code:
import json
data = json.load(open("data.json", encoding="utf8"))
print(data)
Result:
Traceback (most recent call last):
  File "E:\Python\Dictionary App\test.py", line 4, in <module>
    print(data)
  File "C:\Users\ahnaf\AppData\Local\Programs\Python\Python38-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u266f' in position 657370: character maps to <undefined>
[Finished in 0.32s]
The json file: data.json

Try to include:
# -*- coding: utf-8 -*-
or
# coding: utf-8
in your Python file header.
Example:
# coding: utf-8
import json
with open("data.json", encoding="utf8") as f:
    data = json.loads(f.read())
print(data)
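Note that the traceback in the question is raised inside print(), while the text is being encoded for a cp1252 console, not while the file is read. Below is a minimal sketch of two possible workarounds. It assumes a small stand-in string instead of the real data.json; the `raw` value and the variable names are made up for illustration.

```python
import json

# Hypothetical stand-in for the contents of data.json.
raw = '{"sharp": "C\\u266f"}'
data = json.loads(raw)

# Simulate what a cp1252 console does: U+266F has no cp1252 mapping.
try:
    str(data).encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# Workaround 1: escape the characters the console cannot represent.
safe_text = str(data).encode("cp1252", errors="backslashreplace").decode("cp1252")

# Workaround 2: let json escape every non-ASCII character itself.
ascii_json = json.dumps(data, ensure_ascii=True)

print(safe_text)
print(ascii_json)
```

On Python 3.7 and later, calling sys.stdout.reconfigure(encoding="utf-8") before printing, or writing the result to a file opened with encoding="utf-8", are other options. Whether the console then renders the characters correctly depends on the terminal.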

Related

How can I save files with UTF-8 names?

I need to save files with UTF-8 names, but when I do, Django raises this error:
UnicodeEncodeError at /uploaded/document/ 'فیلتر.png'
'ascii' codec can't encode characters in position 55-59: ordinal not in range(128)
My FileField is defined like this:
# -*- coding: utf-8 -*-
def get_path(instance, filename):
    return u' '.join((u'document', filename)).encode('utf-8').strip()
class Document(models.Model):
    file_path = models.FileField(verbose_name='File', upload_to=get_path,
                                 storage=FileSystemStorage(base_url=settings.LOCAL_MEDIA_URL))
How can I fix it?
I use the Tastypie API to upload files.
My question was answered here:
https://itekblog.com/ascii-codec-cant-encode-characters-in-position/#The_Code
I had to change the locale that apache2 runs with:
/etc/apache/envvars
export LANG='en_US.UTF-8'
export LC_ALL='en_US.UTF-8'
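To see why the locale matters, here is a small Python 3 sketch. It is an illustration only, not part of the linked fix: it shows that the file name from the error message cannot be encoded as ASCII but round-trips through UTF-8, and it prints the encodings the current process has picked up from its environment.

```python
import locale
import sys

filename = "فیلتر.png"  # the file name from the error message above

# An ASCII-only environment cannot represent the name...
try:
    filename.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# ...while UTF-8 can, and the round trip is lossless.
utf8_bytes = filename.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

# What this process believes about its environment (depends on LANG / LC_ALL).
print("filesystem encoding:", sys.getfilesystemencoding())
print("preferred encoding: ", locale.getpreferredencoding())
```

Running the last two lines inside the web server process is one way to check whether the exported LANG and LC_ALL values actually took effect.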

unable to decode this string using python

I have this text.ucs file which I am trying to decode using python.
file = open('text.ucs', 'r')
content = file.read()
print content
My result is
\xf\xe\x002\22
I tried doing decoding with utf-16, utf-8
content.decode('utf-16')
and getting error
Traceback (most recent call last):
  File "", line 1, in
  File "C:\Python27\lib\encodings\utf_16.py", line 16, in decode
    return codecs.utf_16_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode bytes in position 32-33: illegal encoding
Please let me know if I am missing anything or if my approach is wrong.
Edit: a screenshot was requested.
The string is encoded as UTF-16-BE (big-endian); this works:
content.decode("utf-16-be")
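The byte-order point can be sketched in Python 3 as follows. The sample text is hypothetical, since the contents of text.ucs are not shown in the question.

```python
import codecs

text = "hi \u266f"  # hypothetical sample text

# UTF-16-BE stores the high byte of each 16-bit unit first and adds no BOM.
be_bytes = text.encode("utf-16-be")

# Decoding with the matching byte order restores the text.
decoded = be_bytes.decode("utf-16-be")

# Decoding with the opposite byte order yields different characters.
wrong = be_bytes.decode("utf-16-le")

# The generic "utf-16" codec reads the byte order from a BOM when one is present.
with_bom = (codecs.BOM_UTF16_BE + be_bytes).decode("utf-16")

print(repr(decoded), repr(wrong), repr(with_bom))
```

When a file has no BOM, the generic utf-16 codec has to assume a byte order, which is why naming utf-16-be explicitly helps here.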
As far as I understand, you are using Python 2.x, where the built-in open() has no encoding parameter; it was only added in Python 3. In Python 2 you can use io.open instead, for example:
import io
file = io.open('text.ucs', 'r', encoding='utf-16-be')
content = file.read()
print content
Note that the io module has to be imported first.
You can specify which encoding to use with the encoding argument:
with open('text.ucs', 'r', encoding='utf-16') as f:
    text = f.read()
Your string needs to be decoded. Pass the file's encoding to open() (UTF-8 here; use whichever encoding the file actually has) and read the contents:
f = open('text.ucs', 'r', encoding='utf-8')
print(f.read())

Can't find the directory no idea why

import requests
test = requests.get("https://www.hipstercode.com/")
outfile = open("./settings.txt", "w")
test.encoding = 'ISO-8859-1'
outfile.write(str(test.text))
The error that I'm getting is:
  File "C:/Users/Bamba/PycharmProjects/Requests/Requests/Requests.py", line 8, in <module>
    outfile.write(str(test.text))
  File "C:\Users\Bamba\AppData\Local\Programs\Python\Python35\lib\encodings\cp1255.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xef' in position 0: character maps to <undefined>
So it looks like the response contains something that can't be encoded in cp1255, the encoding the output file was opened with by default.
If ISO-8859-1 is OK for you, open the file in binary mode and encode the text yourself:
import requests
test = requests.get("https://www.hipstercode.com/")
outfile = open("./settings.txt", "wb")
outfile.write(test.text.encode('ISO-8859-1'))
If you get an error while encoding, the text simply cannot be encoded losslessly in that encoding. The options you have are described in the str.encode docs: https://docs.python.org/3/library/stdtypes.html#str.encode
For example, you can use
outfile.write(test.text.encode('ISO-8859-1', 'replace'))
to replace the characters that don't fit ISO-8859-1 while keeping most of the text readable.
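For comparison, here is a short sketch of what the standard error handlers of str.encode produce. The sample string is made up; it contains one character that fits ISO-8859-1 (é) and one that does not (U+266F).

```python
text = "caf\u00e9 \u266f"  # hypothetical sample

# The default handler ("strict") raises on the first unencodable character.
try:
    text.encode("iso-8859-1")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

replaced = text.encode("iso-8859-1", "replace")            # unencodable -> "?"
ignored = text.encode("iso-8859-1", "ignore")              # unencodable dropped
as_xml = text.encode("iso-8859-1", "xmlcharrefreplace")    # unencodable -> &#NNNN;

# UTF-8 can encode every character, so no handler is needed.
utf8 = text.encode("utf-8")

print(replaced, ignored, as_xml, utf8)
```

If the target encoding is not fixed, opening the file with open("./settings.txt", "w", encoding="utf-8") avoids the lossy handlers altogether.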

UnicodeDecodeError when import json file

I want to open a JSON file in Python, but I get this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 64864: ordinal not in range(128)
My code is quite simple:
# -*- coding: utf-8 -*-
import json
with open('birdw3l2.json') as data_file:
    data = json.load(data_file)
print(data)
Can someone help me? Thanks!
Try the following code, which reads the raw bytes and decodes them as UTF-8 before parsing:
import json
with open('birdw3l2.json', 'rb') as data_file:
    data = json.loads(data_file.read().decode('utf-8'))
print(data)
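You should specify the encoding when you load your JSON file. In Python 2, json.load accepts an encoding argument:
data = json.load(data_file, encoding='utf-8')
In Python 3 that argument is ignored (it was removed in Python 3.9), so pass encoding='utf-8' to open() instead.
The right value depends on how your file is encoded.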
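A minimal Python 3 sketch of passing the encoding to open(). It assumes the file is UTF-8 and uses a temporary sample file in place of birdw3l2.json; the sample data is made up.

```python
import json
import os
import tempfile

# Hypothetical sample standing in for birdw3l2.json. U+2019 is encoded in
# UTF-8 as the bytes e2 80 99, so it starts with the 0xe2 byte from the error.
sample = {"name": "bird\u2019s nest"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.json")

    # Write the sample as real UTF-8 (not \u escapes).
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f, ensure_ascii=False)

    # Name the encoding explicitly instead of relying on the locale default.
    with open(path, encoding="utf-8") as data_file:
        data = json.load(data_file)

print(data == sample)
```

Without the encoding argument, open() falls back to the locale's preferred encoding, which is where the 'ascii' codec in the error message comes from.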

Python script to convert from UTF-8 to ASCII [duplicate]

This question already has answers here:
Convert Unicode to ASCII without errors in Python
(12 answers)
Closed 8 years ago.
I'm trying to write a script in python to convert utf-8 files into ASCII files:
#!/usr/bin/env python
# *-* coding: iso-8859-1 *-*
import sys
import os
filePath = "test.lrc"
fichier = open(filePath, "rb")
contentOfFile = fichier.read()
fichier.close()
fichierTemp = open("tempASCII", "w")
fichierTemp.write(contentOfFile.encode("ASCII", 'ignore'))
fichierTemp.close()
When I run this script I get the following error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 13: ordinal not in range(128)
I thought I could ignore the error with the 'ignore' parameter of the encode method, but apparently not.
I'm open to other ways to convert.
data="UTF-8 DATA"
udata=data.decode("utf-8")
asciidata=udata.encode("ascii","ignore")
import codecs
...
fichier = codecs.open(filePath, "r", encoding="utf-8")
...
fichierTemp = codecs.open("tempASCII", "w", encoding="ascii", errors="ignore")
fichierTemp.write(contentOfFile)
...
UTF-8 is a superset of ASCII. Either your UTF-8 file is ASCII, or it can't be converted without loss.
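The decode-then-encode approach from the answers can be sketched in Python 3 like this. The sample bytes are hypothetical and stand in for the contents of test.lrc.

```python
# Hypothetical UTF-8 input containing two non-ASCII characters.
data = "caf\u00e9 \u266f".encode("utf-8")

# Step 1: decode the bytes into text, using the encoding they were written in.
udata = data.decode("utf-8")

# Step 2: encode to ASCII, dropping whatever ASCII cannot represent.
asciidata = udata.encode("ascii", "ignore")

# Pure-ASCII text is byte-for-byte identical in UTF-8 and ASCII.
plain = "abc 123"
same_bytes = plain.encode("utf-8") == plain.encode("ascii")

print(asciidata, same_bytes)
```

The conversion is lossy: the accented letter and the sharp sign are removed rather than transliterated.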
