I'm porting the eBay SDK to Python 3 and I've stumbled upon the following issue.
I'm using pycurl to send some HTTP requests.
Here is how I configure it:
self._curl = pycurl.Curl()
self._curl.setopt(pycurl.FOLLOWLOCATION, 1)
self._curl.setopt(pycurl.URL, str(request_url))
self._curl.setopt(pycurl.SSL_VERIFYPEER, 0)
self._response_header = io.StringIO()
self._response_body = io.StringIO()
self._curl.setopt(pycurl.CONNECTTIMEOUT, self.timeout)
self._curl.setopt(pycurl.TIMEOUT, self.timeout)
self._curl.setopt(pycurl.HEADERFUNCTION, self._response_header.write)
self._curl.setopt(pycurl.WRITEFUNCTION, self._response_body.write)
When I call self._curl.perform() I get the following error:
pycurl.error: (23, 'Failed writing body (1457 != 1460)')
As far as I know, this means there is an issue with the write function, but I can't figure out what exactly. It could be related to the migration from the StringIO module to io, but I'm not sure.
UPD:
I've tried the following:
def body(buf):
    self._response_body.write(buf)

def header(buf):
    self._response_header.write(buf)

self._curl.setopt(pycurl.HEADERFUNCTION, header)
self._curl.setopt(pycurl.WRITEFUNCTION, body)
and it works. I tried the same trick with lambdas (instead of defining those awkward functions), but it didn't work.
I believe the problem is that pycurl no longer works with StringIO as expected. The solution is to use io.BytesIO instead. You can then take what was written into the buffer and decode it into a string.
Using BytesIO with pycurl instead of StringIO:
e = io.BytesIO()
c.setopt(pycurl.WRITEFUNCTION, e.write)
Decoding byte information from the BytesIO object:
htmlString = e.getvalue().decode('UTF-8')
You can use any type of decoding you want, but this should give you a string object you can parse.
Hope this helps people using Python 3.
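The root cause can be demonstrated without pycurl at all: libcurl hands the write callback chunks of bytes, which io.StringIO.write rejects in Python 3, while io.BytesIO.write accepts them. A minimal sketch (the chunk content is made up):

```python
import io

chunk = b"<html>...</html>"  # libcurl delivers response data as bytes

text_buf = io.StringIO()
try:
    text_buf.write(chunk)  # fails: StringIO only accepts str in Python 3
    failed = False
except TypeError:
    failed = True

byte_buf = io.BytesIO()
byte_buf.write(chunk)  # fine: BytesIO accepts bytes
html = byte_buf.getvalue().decode("utf-8")  # decode once, at the end
```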
I'm following the Databricks example for uploading a file to DBFS (in my case .csv):
import json
import requests
import base64
DOMAIN = '<databricks-instance>'
TOKEN = '<your-token>'
BASE_URL = 'https://%s/api/2.0/dbfs/' % (DOMAIN)
def dbfs_rpc(action, body):
    """A helper function to make the DBFS API request; request/response is encoded/decoded as JSON."""
    response = requests.post(
        BASE_URL + action,
        headers={'Authorization': 'Bearer %s' % TOKEN},
        json=body
    )
    return response.json()

# Create a handle that will be used to add blocks
handle = dbfs_rpc("create", {"path": "/temp/upload_large_file", "overwrite": "true"})['handle']

with open('/a/local/file') as f:
    while True:
        # A block can be at most 1MB
        block = f.read(1 << 20)
        if not block:
            break
        data = base64.standard_b64encode(block)
        dbfs_rpc("add-block", {"handle": handle, "data": data})

# close the handle to finish uploading
dbfs_rpc("close", {"handle": handle})
When using the tutorial as is, I get an error:
Traceback (most recent call last):
File "db_api.py", line 65, in <module>
data = base64.standard_b64encode(block)
File "C:\Miniconda3\envs\dash_p36\lib\base64.py", line 95, in standard_b64encode
return b64encode(s)
File "C:\Miniconda3\envs\dash_p36\lib\base64.py", line 58, in b64encode
encoded = binascii.b2a_base64(s, newline=False)
TypeError: a bytes-like object is required, not 'str'
I tried doing with open('./sample.csv', 'rb') as f: before passing the blocks to base64.standard_b64encode, but then I got another error:
TypeError: Object of type 'bytes' is not JSON serializable
This happens when the encoded block data is being sent into the API call.
I tried skipping encoding entirely and just passing the blocks into the post call. In that case the file gets created in DBFS but has a size of 0 bytes.
At this point I'm trying to make sense of it all: it doesn't want a string, but it doesn't want bytes either. What am I doing wrong? I appreciate any help.
In Python we have strings and bytes, which are two different entities. Note that there is no implicit conversion between them, so you need to know when to use which and how to convert when necessary. This answer provides a nice explanation.
With the code snippet I see two issues:
The first you already found: open by default reads the file as text, so your block is a string, while standard_b64encode expects bytes and returns bytes. To read bytes from the file, it needs to be opened in binary mode:
with open('/a/local/file', 'rb') as f:
Only strings can be encoded as JSON. Your dbfs_rpc passes body to requests via the json= parameter, which serializes it with the standard json module, and that module refuses bytes. Since your data is bytes, you need to convert it to a string explicitly, and that's done using decode:
dbfs_rpc("add-block", {"handle": handle, "data": data.decode('utf8')})
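Putting both fixes together, a self-contained sketch (a temporary file stands in for the real .csv, and the handle value is a dummy, since no Databricks instance is involved here):

```python
import base64
import json
import tempfile

# Create a small sample file so the sketch is self-contained
# (stands in for the local .csv from the question)
with tempfile.NamedTemporaryFile(mode="wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"a,b,c\n1,2,3\n")
    path = tmp.name

# Fix 1: open in binary mode so read() yields bytes, as b64encode expects
with open(path, "rb") as f:
    block = f.read(1 << 20)

# Fix 2: decode the base64 output to str so the dict is JSON serializable
data = base64.standard_b64encode(block).decode("utf8")
payload = json.dumps({"handle": 1234, "data": data})  # no TypeError any more
```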
I'm trying to make a proxy using python that also reads the content of the requests and responses and I'm using this to do it: https://github.com/inaz2/proxy2/blob/python3/proxy2.py
But for some reason I cannot decompress any gzip-compressed payloads. What I have tried so far:
import gzip
import zlib
from io import StringIO

@staticmethod
def decode_content_body(data, encoding):
    print(encoding)  # -> 'gzip'
    if not data:
        return None
    if encoding == 'identity':
        text = data
    elif encoding in ('gzip', 'x-gzip'):
        try:
            data = data.encode('latin_1')
            # data = str(data) # no luck
            # data = data.encode() # no luck
            compressed_stream = StringIO(data)
            gzipper = gzip.GzipFile(fileobj=compressed_stream)
            text = gzipper.read()  # -> TypeError: can't concat str to bytes
        except:
            # data has to be a bytes-like object, says zlib
            # text = zlib.decompress(data.encode()) # -> zlib.error: Error -3 while decompressing data: incorrect header check
            text = zlib.decompress(data.encode(), -zlib.MAX_WBITS)  # -> zlib.error: Error -3 while decompressing data: invalid block type
    elif encoding == 'deflate':
        try:
            text = zlib.decompress(data)
        except zlib.error:
            text = zlib.decompress(data, -zlib.MAX_WBITS)
    else:
        raise Exception("Unknown Content-Encoding: {}".format(encoding))
    return text
data is not in human-readable format, so it's clearly still compressed with something. The proxy works with sites that use HTTPS.
The issue is likely with the data passed to your function.
It is of type str, so the code you're using to read the request data has already attempted to decode the bytes it received into a str. One of two things may have gone wrong:
it did not expect the bytes to contain compressed data at all
it expected the data to be compressed with a different mechanism than the one actually used
Hence: the str contains gibberish!
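For comparison, here is a minimal sketch of decompression that stays in bytes the whole way (assuming you can get at the raw, undecoded response body; the helper name and sample payload are made up):

```python
import gzip
import zlib

def decompress_body(data: bytes, encoding: str) -> bytes:
    """Decompress a raw HTTP body; data must be bytes, never str."""
    if encoding in ('gzip', 'x-gzip'):
        return gzip.decompress(data)
    if encoding == 'deflate':
        try:
            return zlib.decompress(data)                    # zlib-wrapped deflate
        except zlib.error:
            return zlib.decompress(data, -zlib.MAX_WBITS)   # raw deflate
    return data                                             # 'identity' or unknown

body = gzip.compress(b'hello world')  # stand-in for a real compressed response
assert decompress_body(body, 'gzip') == b'hello world'
```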
I had the same problem when using the stomp.py package! In my case, the solution was to explicitly set the auto_decode=False parameter when opening a Stomp connection.
Hope that helps!
PS: More info on how my particular problem was solved: https://groups.google.com/forum/#!topic/openraildata-talk/IsO206F5US8
I'm trying to connect to the Google AdWords API using Python 3.6. I managed to install the libraries, got a developer token, client_customer_id, user_agent, client_id and client_secret, and successfully requested a refresh_token.
My googleads.yaml file looks like this:
adwords:
  developer_token: hta...
  client_customer_id: 235-...-....
  user_agent: mycompany
  client_id: 25785...apps.googleusercontent.com
  client_secret: J9Da...
  refresh_token: 1/ckhGH6...
When running the first Python script, get_campaigns.py, I get the very generic response TypeError: cannot use a string pattern on a bytes-like object in ...\Anaconda3\lib\site-packages\googleads-10.0.0-py3.6.egg\googleads\util.py", line 302, in filter.
Other functions like traffic_estimator_service.get(selector) produce the same error. Furthermore, when starting get_campaigns.py, I get the following warning, which might explain something:
WARNING:googleads.common:Your default encoding, cp1252, is not UTF-8. Please run this script with UTF-8 encoding to avoid errors.
INFO:oauth2client.client:Refreshing access_token
INFO:googleads.common:Request summary - {'methodName': get, 'clientCustomerId': xxx-xxx-xxxx}
I tried many things, but still can't find what causes the error. My settings seem to be right, and I'm using the examples as provided here. Help is highly appreciated!
There are two solutions for now:
One:
Use Python 2.7; this solved the error for me.
Two:
For Python 3:
import sys

import googleads.util
import suds.transport

def method_wrapper(self, record):
    def filter(self, record):
        if record.args:
            arg = record.args[0]
            if isinstance(arg, suds.transport.Request):
                new_arg = suds.transport.Request(arg.url)
                sanitized_headers = arg.headers.copy()
                if self._AUTHORIZATION_HEADER in sanitized_headers:
                    sanitized_headers[self._AUTHORIZATION_HEADER] = self._REDACTED
                new_arg.headers = sanitized_headers
                msg = arg.message
                if sys.version_info.major < 3:
                    msg = msg.decode('utf-8')
                new_arg.message = self._DEVELOPER_TOKEN_SUB.sub(
                    self._REDACTED, str(msg, encoding='utf-8'))
                record.args = (new_arg,)
    return filter(self, record)

googleads.util._SudsTransportFilter.filter = method_wrapper
This solution patches the code provided by Google, adding UTF-8 decoding of the binary string, which solves the problem.
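The underlying error is easy to reproduce without googleads at all: applying a str pattern to bytes raises exactly this TypeError, and decoding first fixes it. A sketch (the pattern and message below are made up for illustration, loosely modeled on the token scrubbing in googleads.util):

```python
import re

# A str pattern, similar in spirit to the developer-token scrubbing filter
pattern = re.compile(r"developerToken>[^<]+")
msg = b"<developerToken>hta...</developerToken>"  # suds hands over bytes in Py3

try:
    pattern.sub("developerToken>REDACTED", msg)  # str pattern on bytes
    raised = False
except TypeError:  # cannot use a string pattern on a bytes-like object
    raised = True

# Decoding the bytes first makes the substitution work
redacted = pattern.sub("developerToken>REDACTED", msg.decode("utf-8"))
```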
I'm trying out Falcon for a small API project. Unfortunately I'm stuck on the JSON parsing, and code from the documentation examples does not work.
I have tried so many things I've found on Stack Overflow and Google, but nothing changed.
I've tried the following snippets, which result in the errors below:
import json
import falcon

class JSON_Middleware(object):
    def process_request(self, req, resp):
        raw_json = json.loads(req.stream.read().decode('UTF-8'))
        """Exception: AttributeError: 'str' object has no attribute 'read'"""

        raw_json = json.loads(req.stream.read(), 'UTF-8')
        """Exception: TypeError: the JSON object must be str, not 'bytes'"""

        raw_json = json.loads(req.stream, 'UTF-8')
        """TypeError: the JSON object must be str, not 'Body'"""
I'm on the verge of giving up, but if somebody can tell me why this is happening and how to parse JSON in Falcon, I would be extremely thankful.
Thanks
Environment:
OSX Sierra
Python 3.5.2
Falcon and the other packages are the latest versions from pip
Your code should work if the other pieces of code are in place. A quick test (filename app.py):
import falcon
import json

class JSON_Middleware(object):
    def process_request(self, req, resp):
        raw_json = json.loads(req.stream.read())
        print(raw_json)

class Test:
    def on_post(self, req, resp):
        pass

app = application = falcon.API(middleware=JSON_Middleware())

t = Test()
app.add_route('/test', t)
run with: gunicorn app
$ curl -XPOST 'localhost:8000' -d '{"Hello":"wold"}'
You have to invoke decode() on the bytes returned by read(), with something like req.stream.read().decode('utf-8').
This way the bytes are converted to a str as expected by json.loads().
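The decode step can be illustrated without Falcon at all; the raw payload below stands in for what req.stream.read() typically returns:

```python
import json

raw = b'{"hello": "world"}'  # bytes, as delivered by the WSGI input stream
parsed = json.loads(raw.decode("utf-8"))  # bytes -> str -> dict
```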
The other way to avoid all this boring and error-prone encode/decode and bytes/str business (which, by the way, differs between Py2 and Py3) is to use simplejson as a replacement for json. It is API compatible, so the only change is to replace import json with import simplejson as json in your code.
In addition, it simplifies the code, since reading the body can be done with json.load(req.bounded_stream), which is much shorter and more readable than json.loads(req.bounded_stream.read().decode('utf-8')).
I now do it this way, and don't use the standard json module any more.
In Py2 there was:
rv = xmlrpc.pastes.newPaste(language, code, None, filename, mimetype, private)
I'm getting the error: expected an object with the buffer interface
I can't find any docs about xmlrpc and Py3. I found only this snippet:
import subprocess
from xmlrpc.client import ServerProxy

p1 = subprocess.Popen(['gpg', '--clearsign'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
p1.stdin.write(bytes(input, 'UTF8'))
output = p1.communicate()[0]

s = ServerProxy('http://paste.pocoo.org/xmlrpc/')
pasteid = s.pastes.newPaste('text', output.decode())
print("http://paste.pocoo.org/raw/", pasteid, "/", sep="")
but I'm still confused about it... my version used many arguments; where can I find a full description of it, or a fix for it?
Thank you.
That error message usually means it's looking for str (which is Unicode in Python 3), not bytes. As in the snippet above, you'll need to decode the argument that is in bytes. Maybe:
rv = xmlrpc.pastes.newPaste(language, code.decode(), None, filename, mimetype, private)
But it's hard to tell what the problem is without seeing your code.
In Python 3, xmlrpclib has been split into two modules, xmlrpc.client and xmlrpc.server.
The docs for 3.2.1 can be found at:
http://docs.python.org/release/3.2.1/library/xmlrpc.client.html
http://docs.python.org/release/3.2.1/library/xmlrpc.server.html