UTF-8 decode error in Pyramid/WebOb request - Python

I found an error in the logs of one of my websites. The log includes the body of the request, so I tried to reproduce the problem.
This is what I got:
>>> from mondishop.models import *
>>> from pyramid.request import *
>>> req = Request.blank('/')
>>> b = DBSession.query(Log).filter(Log.id == 503).one().payload.encode('utf-8')
>>> req.method = 'POST'
>>> req.body = b
>>> req.params
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/home/phas/virtualenv/mondishop/local/lib/python2.7/site-packages/webob/request.py", line 856, in params
params = NestedMultiDict(self.GET, self.POST)
File "/home/phas/virtualenv/mondishop/local/lib/python2.7/site-packages/webob/request.py", line 807, in POST
vars = MultiDict.from_fieldstorage(fs)
File "/home/phas/virtualenv/mondishop/local/lib/python2.7/site-packages/webob/multidict.py", line 92, in from_fieldstorage
obj.add(field.name, decode(value))
File "/home/phas/virtualenv/mondishop/local/lib/python2.7/site-packages/webob/multidict.py", line 78, in <lambda>
decode = lambda b: b.decode(charset)
File "/home/phas/virtualenv/mondishop/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb0 in position 52: invalid start byte
>>> req.POST
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/home/phas/virtualenv/mondishop/local/lib/python2.7/site-packages/webob/request.py", line 807, in POST
vars = MultiDict.from_fieldstorage(fs)
File "/home/phas/virtualenv/mondishop/local/lib/python2.7/site-packages/webob/multidict.py", line 92, in from_fieldstorage
obj.add(field.name, decode(value))
File "/home/phas/virtualenv/mondishop/local/lib/python2.7/site-packages/webob/multidict.py", line 78, in <lambda>
decode = lambda b: b.decode(charset)
File "/home/phas/virtualenv/mondishop/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb0 in position 52: invalid start byte
>>>
The error is the same as the one in my log, so apparently something goes wrong while decoding the original POST.
What is weird is that I get an error trying to UTF-8-decode something that I just UTF-8-encoded.
I cannot provide the content of the original request body because it contains sensitive data (it's a PayPal IPN), and I don't really have any idea how to start addressing this issue.
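A plausible reading of this (an assumption, since the actual body can't be shown): the log's payload column holds the form-encoded body as text, so percent-escapes like %B0 are plain ASCII characters, and .encode('utf-8') succeeds without touching them. The UnicodeDecodeError only happens later, when WebOb percent-decodes each form field and then decodes the resulting bytes with the request charset (UTF-8 by default). PayPal IPN posts windows-1252 unless configured otherwise, and in windows-1252 a degree sign is the single byte 0xb0, which is not valid UTF-8 on its own. A minimal Python 2 sketch of that failure mode (the field name is made up):

import urllib

raw = 'temperature=25%B0C'                  # degree sign, percent-encoded windows-1252
value = urllib.unquote(raw.split('=')[1])   # '25\xb0C' -- the raw byte only appears here
print value.decode('windows-1252')          # u'25\xb0C' -- decodes fine
value.decode('utf-8')                       # UnicodeDecodeError: invalid start byte, as in the log

If that is what's happening, telling WebOb the right charset before touching req.params (e.g. req.charset = 'windows-1252'), or configuring the IPN endpoint to send UTF-8, should make the fields decode cleanly.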

Related

jieba.analyse: 'generator' object has no attribute 'decode'

I have to read that JSON file as UTF-8 and use a generator to get the content. When I tried to run it, I got an AttributeError:
Traceback (most recent call last):
File "F:\Files\python\yiyouhome\WordSeg\json_load.py", line 25, in <module>
tags = jieba.analyse.extract_tags(content_seg,topK = top_K, withWeight = False, allowPOS = allow_pos)
File "C:\Users\ThinkPad\AppData\Local\Programs\Python\Python36\lib\site-packages\jieba\analyse\tfidf.py", line 94, in extract_tags
for w in words:
File "C:\Users\ThinkPad\AppData\Local\Programs\Python\Python36\lib\site-packages\jieba\posseg\__init__.py", line 249, in cut
for w in self.__cut_internal(sentence, HMM=HMM):
File "C:\Users\ThinkPad\AppData\Local\Programs\Python\Python36\lib\site-packages\jieba\posseg\__init__.py", line 217, in __cut_internal
sentence = strdecode(sentence)
File "C:\Users\ThinkPad\AppData\Local\Programs\Python\Python36\lib\site-packages\jieba\_compat.py", line 37, in strdecode
sentence = sentence.decode('utf-8')
AttributeError: 'generator' object has no attribute 'decode'
Why does this happen?
At first I got:
Traceback (most recent call last):
File "F:\Files\python\yiyouhome\WordSeg\json_load.py", line 10, in <module>
json_data = open('spider_raw.json',encoding = 'gbk').read() #,encoding = 'utf-8'
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa3 in position 74: illegal multibyte sequence
So I added encoding='utf-8' to fix that.
Here is my code:
import json
import jieba.analyse
import jieba.posseg as pseg

json_data = open('spider_raw.json', encoding='utf-8').read()
data = json.loads(json_data)
top_K = 20
allow_pos = ('nr',)

def getcontent(spiderlist):
    for k, v in spiderlist.items():
        for item in v['talk_mutidetails']:
            yield item['cotent']

#def getcontenttopic(spiderlist):
item = getcontent(data)
content_seg = pseg.cut(item)
tags = jieba.analyse.extract_tags(content_seg, topK=top_K, withWeight=False, allowPOS=allow_pos)
for t in tags:
    print(t)
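The traceback already contains the answer: jieba's strdecode() calls sentence.decode('utf-8'), so pseg.cut() expects a plain string, but getcontent(data) returns a generator, which has no .decode() method. A minimal fix sketch, assuming you want tags over all of the yielded items at once: join the generator output into one string and pass it straight to extract_tags(), which does its own POS tokenization when allowPOS is given.

content = '\n'.join(getcontent(data))   # generator -> one plain string
tags = jieba.analyse.extract_tags(content, topK=top_K, withWeight=False, allowPOS=allow_pos)
for t in tags:
    print(t)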

Python 3 UnicodeEncodeError [duplicate]

This question already has answers here:
Unsupported operation :not writeable python
(2 answers)
Closed 5 years ago.
I have strings like the following in my Python list (taken from the command prompt):
>>> o['records'][5790]
(5790, 'Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo ', 60,
True, '40141613')
>>>
I have tried the suggestions mentioned here: Changing default encoding of Python?
I also changed the default encoding to utf-16. But json.dumps() still threw an exception:
>>> write(o)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "okapi_create_master.py", line 49, in write
o = json.dumps(output)
File "C:\Python27\lib\json\__init__.py", line 231, in dumps
return _default_encoder.encode(obj)
File "C:\Python27\lib\json\encoder.py", line 201, in encode
chunks = self.iterencode(o, _one_shot=True)
File "C:\Python27\lib\json\encoder.py", line 264, in iterencode
return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe1 in position 25: invalid
continuation byte
I can't figure out what kind of transformation these strings need so that json.dumps() works.
\xe1 is not decodable using the utf-8 or utf-16 encodings.
>>> '\xe1'.decode('utf-8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe1 in position 0: unexpected end of data
>>> '\xe1'.decode('utf-16')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\encodings\utf_16.py", line 16, in decode
return codecs.utf_16_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode byte 0xe1 in position 0: truncated data
Try latin-1 encoding:
>>> record = (5790, 'Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo ',
... 60, True, '40141613')
>>> json.dumps(record, encoding='latin1')
'[5790, "Vlv-Gate-Assy-Mdl-\\u00e1M1-2-\\u00e19/16-10K-BB Credit Memo ", 60, true, "40141613"]'
Or pass ensure_ascii=False to json.dumps() so that it does not try to decode the string:
>>> json.dumps(record, ensure_ascii=False)
'[5790, "Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo ", 60, true, "40141613"]'
I had a similar problem and came up with the following approach to guarantee either unicode strings or byte strings, from either kind of input. In short, include and use these lambdas:
# guarantee unicode string
_u = lambda t: t.decode('UTF-8', 'replace') if isinstance(t, str) else t
_uu = lambda *tt: tuple(_u(t) for t in tt)
# guarantee byte string in UTF8 encoding
_u8 = lambda t: t.encode('UTF-8', 'replace') if isinstance(t, unicode) else t
_uu8 = lambda *tt: tuple(_u8(t) for t in tt)
Applied to your question:
import json
o = (5790, u"Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo ", 60,
True, '40141613')
as_json = json.dumps(_uu8(*o))
as_obj = json.loads(as_json)
print "object\n ", o
print "json (type %s)\n %s " % (type(as_json), as_json)
print "object again\n ", as_obj
=>
object
(5790, u'Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo ', 60, True, '40141613')
json (type <type 'str'>)
[5790, "Vlv-Gate-Assy-Mdl-\u00e1M1-2-\u00e19/16-10K-BB Credit Memo ", 60, true, "40141613"]
object again
[5790, u'Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo ', 60, True, u'40141613']
Here's some more reasoning about this.

pyspark traceback: 'utf8' codec can't decode byte 0xce in position 22: invalid continuation byte

This is my code to run Spark in Python; I just followed code provided by others, but I get the traceback: 'utf8' codec can't decode byte 0xce in position 22: invalid continuation byte.
# -*- coding: utf-8 -*-
from pyspark import SparkContext, SparkConf
import os
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
os.environ['SPARK_HOME'] = r'D:\spark-2.1.0-bin-hadoop2.7'
appName ="jhl_spark_1"
master= "local"
conf = SparkConf().setAppName(appName).setMaster(master)
sc = SparkContext(conf=conf)
rdd = sc.parallelize([2, 3, 4])
print sorted(rdd.flatMap(lambda x: range(1, x)).collect())
and the traceback:
File "F:/eclipse/�ı��ھ�/������APRIORI/spark.py", line 14, in <module>
print sorted(rdd.flatMap(lambda x: range(1, x)).collect())
File "D:\Anaconda2\lib\site-packages\pyspark\rdd.py", line 808, in collect with SCCallSiteSync(self.context) as css:
File "D:\Anaconda2\lib\site-packages\pyspark\traceback_utils.py", line 72, in __enter__self._context._jsc.setCallSite(self._call_site)
File "D:\Anaconda2\lib\site-packages\py4j\java_gateway.py", line 1124, in __call__args_command, temp_args = self._build_args(*args)
File "D:\Anaconda2\lib\site-packages\py4j\java_gateway.py", line 1094, in _build_args
[get_command_part(arg, self.pool) for arg in new_args])
File "D:\Anaconda2\lib\site-packages\py4j\protocol.py", line 283, in get_command_part
command_part = STRING_TYPE + escape_new_line(parameter)
File "D:\Anaconda2\lib\site-packages\py4j\protocol.py", line 183, in escape_new_line
return smart_decode(original).replace("\\", "\\\\").replace("\r", "\\r").\
File "D:\Anaconda2\lib\site-packages\py4j\protocol.py", line 210, in smart_decode
return unicode(s, "utf-8")
UnicodeDecodeError: 'utf8' codec can't decode byte 0xce in position 22: invalid continuation byte
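No answer is recorded here, but the traceback points away from the RDD contents: py4j's smart_decode() is choking on the call-site string, which embeds the script's file path, and the path in this traceback (F:/eclipse/...) visibly contains non-ASCII directory names that were mangled in transit. A sketch of that failure mode, under the assumption that the folders were named in Chinese and the path bytes are GBK-encoded:

>>> u'中文'.encode('gbk')         # a GBK-encoded Chinese path component
'\xd6\xd0\xce\xc4'
>>> '\xce\xc4'.decode('utf-8')    # py4j effectively does unicode(s, "utf-8") on the path
UnicodeDecodeError: 'utf8' codec can't decode byte 0xce in position 0: invalid continuation byte

If that is the cause, moving spark.py to an ASCII-only path should make the error disappear; the reload(sys)/setdefaultencoding hack does not help, because py4j asks for UTF-8 explicitly.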

UnicodeDecodeError: 'ascii' codec can't decode byte 0x87 in position 10: ordinal not in range(128)

I'm working with images2gif and getting this error. Any ideas?
UnicodeDecodeError: 'ascii' codec can't decode byte 0x87 in position 10: ordinal not in range(128)
Test file:
from PIL import Image
from images2gif import writeGif
FRAMES = 2
FRAME_DELAY = 0.75
WIDTH, HEIGHT = 600, 600
frames = []
img1 = Image.open('1.jpg')
img2 = Image.open('2.jpg')
frames.append(img1)
frames.append(img2)
writeGif("test.gif", frames, duration=FRAME_DELAY, dither=0)
Traceback:
Traceback (most recent call last):
File "gif.py", line 15, in <module>
writeGif("topmovie.gif", frames, duration=FRAME_DELAY, dither=0)
File "/Users/Craig/Documents/github/RTB/images2gif.py", line 575, in writeGif
gifWriter.writeGifToFile(fp, images, duration, loops, xy, dispose)
File "/Users/Craig/Documents/github/RTB/images2gif.py", line 435, in writeGifToFile
fp.write(header.encode('utf-8'))
images2gif line 435: fp.write(header.encode('utf-8'))
Updated traceback:
Traceback (most recent call last):
File "gif.py", line 16, in <module>
writeGif("test.gif", frames, duration=FRAME_DELAY, dither=0)
File "/Users/Craig/Documents/github/RTB/images2gif.py", line 579, in writeGif
gifWriter.writeGifToFile(fp, images, duration, loops, xy, dispose)
File "/Users/Craig/Documents/github/RTB/images2gif.py", line 440, in writeGifToFile
fp.write(globalPalette)
TypeError: must be string or buffer, not None
Your updated question is also here: Error in images2gif.py with GlobalPalette
It references an issue on images2gif in which the author says they're going to rewrite the module to not use PIL/Pillow, but are busy: https://code.google.com/p/visvis/issues/detail?id=81
That in turn references a patched images2gif that is claimed to fix the issue: https://github.com/rec/echomesh/blob/master/code/python/external/images2gif.py
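The original traceback is also worth spelling out: the code calls encode(), yet the error complains that the ascii codec can't decode. That is a Python 2 quirk: calling .encode('utf-8') on a byte string first decodes it implicitly with the ascii codec, so any byte >= 0x80 in binary data such as a GIF header blows up. A quick sketch:

>>> '\x87'.encode('utf-8')   # Python 2 implicitly does '\x87'.decode('ascii') first
Traceback (most recent call last):
  ...
UnicodeDecodeError: 'ascii' codec can't decode byte 0x87 in position 0: ordinal not in range(128)

The general fix for this class of error is to write binary data raw (fp.write(header)) rather than encoding it as UTF-8.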

