An error occurs when I run:
pip install l18n
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\X\AppData\Local\Temp\pip-install-8urtlamu\l18n\setup.py", line 99, in <module>
long_description=open(os.path.join('README.rst')).read(),
UnicodeDecodeError: 'cp950' codec can't decode byte 0xc3 in position 2135: illegal multibyte sequence
Things I tried that didn't work:
chcp 65001
An alternative console: cmder
Config:
Windows 7
Python 3.6.4
Pip 10.0.1
Thanks!
It is manifestly a bug inside l18n: in setup.py, the long_description parameter is built by reading the README.rst file (a classic way to do that).
The traceback says: 'cp950' codec can't decode byte 0xc3 in position 2135. This is a classic error with UTF-8 encoded text that contains non-ASCII characters.
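The underlying cause is that Python 3's open() defaults to the locale's preferred encoding when none is given; on this Windows machine that is cp950, not UTF-8. A minimal check (not specific to l18n):
import locale

# On Windows, open() without an encoding argument falls back to the
# locale code page, e.g. 'cp950' on a Traditional Chinese system.
print(locale.getpreferredencoding(False))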
The source code is stored on Bitbucket:
long_description=open(os.path.join('README.rst')).read(),
The behavior of the open function changed in Python 3: you must set the file encoding explicitly, which is UTF-8 here.
A portable way to solve that (the same code works on Python 2 and Python 3) is to define a function:
import io

def read(path):
    with io.open(path, mode='r', encoding='utf-8') as f:
        return f.read()
And to use it like this:
long_description=read('README.rst')
There is an issue about that.
Related
output_file = open(OUTPUT_FILENAME, 'w', newline='')  # create new file
dict_writer = csv.DictWriter(output_file, keys)
dict_writer.writeheader()
# some logic
for row in items:
    # di is a dictionary
    dict_writer.writerow(di)
Hello, I am new to Python. I created this script on Linux (CentOS); I ran it and it works fine.
When I tried running it on Windows, I got this error:
Traceback (most recent call last):
File "C:\Users\user157\Desktop\test.py", line 180, in <module>
dict_writer.writerow(di)
File "C:\Users\user157\AppData\Local\Programs\Python\Python38-32\lib\csv.py", line 154, in writerow
return self.writer.writerow(self._dict_to_list(rowdict))
File "C:\Users\user157\AppData\Local\Programs\Python\Python38-32\lib\encodings\cp1256.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xe3' in position 33: character maps to <undefined>
I tried solving it by running the following before writing the dictionary, but I still get the same error:
for k, v in di.items():
    try:
        di[k] = v.encode().decode('utf-8')
    except:
        pass
I have Python 3.7.5 on CentOS and 3.8.2 on Windows.
You need to check what your input file's encoding is on Windows and pass encoding='...' in your open statement. On Windows the default encoding is not UTF-8, so you will have encoding issues if the file is not opened with the correct encoding, like below:
open(input_file_name, encoding='iso-8859-1')
Or, better, change your input file to UTF-8 encoding, so that the script can be used without modification on both Windows and Linux.
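Note that the traceback above is actually an encode error raised while writing: the output file was opened without an encoding, so Python fell back to the Windows code page (cp1256 here). The encode().decode('utf-8') loop is a no-op (it turns each str into UTF-8 bytes and straight back), so it cannot help. A minimal sketch of the fix, reusing the question's variable names:
# Open the output file with an explicit encoding so the Windows default
# code page (cp1256 on this machine) is not used when writing rows.
output_file = open(OUTPUT_FILENAME, 'w', newline='', encoding='utf-8')
dict_writer = csv.DictWriter(output_file, keys)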
I'm trying to convert a UCS-2 Little Endian file to UTF-8 using Python, and I'm getting a weird error.
The code I'm using:
file=open("C:/AAS01.txt", 'r', encoding='utf8')
lines = file.readlines()
file.close()
And I'm getting the following error:
Traceback (most recent call last):
File "C:/Users/PycharmProjects/test.py", line 18, in <module>
main()
File "C:/Users/PycharmProjects/test.py", line 7, in main
lines = file.readlines()
File "C:\Python34\lib\codecs.py", line 319, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
I tried using the codecs module as well, but that didn't work either.
Any idea what I can do?
The encoding argument to open sets the input encoding. Use encoding='utf_16_le'.
If you're trying to read UCS-2, why are you telling Python it's UTF-8? The 0xff is most likely the first byte of a little endian byte order mark (BOM):
>>> codecs.BOM_UTF16_LE
b'\xff\xfe'
UCS-2 is also deprecated, for the simple reason that Unicode outgrew it. The typical replacement would be UTF-16.
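Putting both answers together, a minimal conversion sketch (the output filename is hypothetical; 'utf_16' is used here so the BOM shown above is detected and stripped automatically):
# Read the UCS-2/UTF-16 file; the 'utf_16' codec consumes the b'\xff\xfe'
# BOM and selects the right endianness. Use 'utf_16_le' for BOM-less files.
with open("C:/AAS01.txt", 'r', encoding='utf_16') as src:
    lines = src.readlines()

# Write the same text back out as UTF-8 (hypothetical output name).
with open("C:/AAS01-utf8.txt", 'w', encoding='utf8') as dst:
    dst.writelines(lines)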
More info is in the linked question Python 3: reading UCS-2 (BE) file.
I installed pywikibot-core (version 2.0b3) for my MediaWiki installation. I got an error when I tried to run a command which contains Unicode text.
I ran the following command:
python pwb.py replace.py -regex -start:! "\[মুয়ায্যম হুসায়ন খান\]" "[মুয়ায্যম হুসায়ন খান]" -summary:"fix: মুয়ায্যম > মুয়ায্যম"
Here is the error I got:
Traceback (most recent call last):
File "pwb.py", line 161, in <module>
import pywikibot # noqa
File "/var/www/html/banglapedia_bn/core/pywikibot/__init__.py", line 32, in <module>
from pywikibot import config2 as config
File "/var/www/html/banglapedia_bn/core/pywikibot/config2.py", line 285, in <module>
if arg.startswith("-verbose") or arg == "-v":
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 2: ordinal not in range(128)
Use python3 instead of python.
You are seeing that error because the module config2.py uses from __future__ import unicode_literals, making all string literals in the module unicode objects. However, under Python 2, sys.argv contains byte strings, which are not affected by __future__ imports.
Therefore, because arg is a byte string while "-verbose" and "-v" are unicode strings, arg gets implicitly promoted to unicode; this fails because the implicit conversion only works for ASCII.
In Python 3, by contrast, all strings are unicode by default, including the contents of sys.argv.
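The failure can be reproduced in two lines under Python 2 (a sketch; the byte value and position are taken from the traceback above):
# Python 2 only: comparing a non-ASCII byte string against a unicode string
# triggers an implicit ASCII decode of the byte string, which fails.
arg = 'ab\xe0cd'             # a str (byte string) with byte 0xe0 at position 2
arg.startswith(u"-verbose")  # UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 ...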
I have a lengthy JSON file that contains non-ASCII characters (and is encoded in UTF-8). I want to read it in Python using the built-in json module.
My code looks like this:
dat = json.load(open("data.json"), "utf-8")
As I understand it, the "utf-8" argument should be unnecessary, since it is assumed as the default. However, I get this error:
Traceback (most recent call last):
File "winratio.py", line 9, in <module>
dat = json.load(open("data.json"), "utf-8")
File "C:\Python33\lib\json\__init__.py", line 271, in load
return loads(fp.read(),
File "C:\Python33\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 28519: character maps to <undefined>
My question is: why does Python seem to ignore my encoding specification and try to load the file as cp1252?
Try this:
import codecs
dat = json.load(codecs.open("data.json", "r", "utf-8"))
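The encoding has to be given to the call that opens the file, not to json.load; without it, open() falls back to the Windows default (cp1252 here), which is exactly what the traceback shows. In Python 3 the built-in open accepts the argument directly, so codecs is not strictly needed:
# Equivalent fix with the built-in open (Python 3):
dat = json.load(open("data.json", encoding="utf-8"))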
Some tips about write mode in the context of the codecs library are described in Write to UTF-8 file in Python.
I'm having a problem reading some characters in Python.
I have a CSV file in UTF-8 format, and I'm reading it, but when the script reads:
Preußen Münster-Kaiserslautern II
I get this error:
Traceback (most recent call last):
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/webapp/__init__.py", line 515, in __call__
handler.get(*groups)
File "/Users/fermin/project/gae/cuotastats/controllers/controllers.py", line 50, in get
f.name = unicode( row[1])
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
I tried to use Unicode functions and to convert the string to Unicode, but I haven't found a solution. I also tried sys.setdefaultencoding('utf8'), but that doesn't work either.
Try the unicode_csv_reader() generator described in the csv module docs.
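Since this is Python 2 (the code calls unicode()), the csv module works on byte strings, and decoding has to happen after parsing. A sketch along the lines of the docs' recipe, for input that is already UTF-8 bytes:
import csv

# Python 2 sketch: parse rows as byte strings, then decode each cell from
# UTF-8 so downstream code receives unicode objects.
def unicode_csv_reader(utf8_data, dialect=csv.excel, **kwargs):
    for row in csv.reader(utf8_data, dialect=dialect, **kwargs):
        yield [unicode(cell, 'utf-8') for cell in row]

With that, row[1] is already unicode and the failing unicode(row[1]) call can be dropped.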