Frequent ChunkedEncodingError with Google App Engine / Requests - python

I frequently get a ChunkedEncodingError when requesting servers using Requests (Python) on Google App Engine.
I looked at the answer from IncompleteRead using httplib, but I don't believe my issue is related to the queried server: I often get this error with various endpoints I'm using, including Intercom and FullContact.
I would have suspected a problem with one particular service if the error were always raised by the same server (for example, FullContact), but that's not the case. I've also encountered this issue with other, unrelated requests.
So I suspect the problem is either my code or Google. But from my code's point of view, I don't see what could be wrong. Here's a snippet:
result = requests.post(
    "https://api.intercom.io/companies",
    json={'some': 'data', 'that': 'are', 'sent': 'ok'},
    headers={'Accept': 'application/json'},
    auth=("app_id", "app_key",)
)
As you can see, the request is quite standard, nothing fancy. It also fails with something as simple as:
r = requests.get(url, params=params, timeout=3)
Does anyone experience these issues with Google App Engine? Is there something I can do to avoid them?

There is a patch that seems to work on GAE.
The issue is located in the iter_content function of Requests, which relies on the underlying urllib3 library.
The problem is that Google overrides this library with its own implementation, with a few changes that cause a ChunkedEncodingError at the Requests level.
I tried this patch, and so far, so good. In detail, you must replace the following lines in your requests/models.py file:
for chunk in self.raw.stream(chunk_size, decode_content=True):
    yield chunk
with:
if isinstance(self.raw._original_response._method, int):
    while True:
        chunk = self.raw.read(chunk_size, decode_content=True)
        if not chunk:
            break
        yield chunk
else:
    for chunk in self.raw.stream(chunk_size, decode_content=True):
        yield chunk
With this change, the error goes away.
I submitted an issue on the Requests repository to discuss it, and we'll see how this evolves.
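If you prefer not to edit the installed requests/models.py, the same idea can be applied from the calling side: stream the response and read the raw (GAE-patched urllib3) response object directly, which is exactly what the GAE branch of the patch does. This is only a sketch of that workaround, not something I've tested beyond the cases above:
import requests

def iter_raw_chunks(response, chunk_size=1024):
    # Mirror the GAE branch of the patch: read from the raw response
    # directly instead of going through iter_content().
    while True:
        chunk = response.raw.read(chunk_size, decode_content=True)
        if not chunk:
            break
        yield chunk

# One of the endpoints from the question; auth is omitted here.
# stream=True keeps requests from consuming the body on its own.
url = "https://api.intercom.io/companies"
r = requests.get(url, stream=True)
body = b"".join(iter_raw_chunks(r))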

Related

SAS: proc http working - request.get in Python does not - why?

I have tried looking into other similar questions and searching the web, but I cannot seem to find the answer, so I hope some clever people here can help or guide me.
I have a proc http request in SAS which runs fine on my local machine, no problems:
filename lst temp;

proc http
    url = "http://xxx/api/job/"
    method = "get"
    out = lst;
run;

libname lst json fileref=lst automap=create;
Trying to do the same in Python gives me error code 401.
import requests
response = requests.get("http://xxx/api/job/")
print(response)
print(response.status_code)
This is an API for a system running internally in our organization. One needs to log on the first time when accessing it through a web browser, but after that it works.
I have tried all the different auth= options I could find in the documentation, giving my user and password, but nothing seems to work.
Somehow, SAS proc http must be passing my profile/user along so it gets verified, while via Python it is not - or at least that is what I am thinking.
Any suggestions?
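One thing that might explain the difference, purely as an assumption based on the log-on-once browser behaviour described above: the API could be using Windows integrated authentication (NTLM or Kerberos), which proc http on a domain-joined machine can pass along automatically, while a plain auth=(user, password) in requests cannot. A minimal sketch using the third-party requests_ntlm package (the domain, user and password are placeholders):
import requests
from requests_ntlm import HttpNtlmAuth  # third-party: pip install requests_ntlm

# DOMAIN\user and the password below are placeholders for your own credentials
response = requests.get("http://xxx/api/job/",
                        auth=HttpNtlmAuth("DOMAIN\\user", "password"))
print(response.status_code)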

Debugging a python requests module 400 error

I'm doing a PUT request using Python and the Confluence REST API in order to update Confluence pages via a script.
I ran into a problem which caused me to receive a 400 error in response to:
requests.put(url, data = jsonData, auth = (username, passwd), headers = {'Content-Type' : 'application/json'})
I spent some time on this before discovering that the reason was that I was not supplying an incremented version when updating the content. I have managed to make my script work, but that is not the point of this question.
During my attempts to make this work, I swapped from requests to an http.client connection. Using this module, I get a lot more information regarding my error:
b'{"statusCode":400,"data":{"authorized":false,"valid":true,"allowedInReadOnlyMode":true,"errors":[],"successful":false},"message":"Must supply an incremented version when updating Content. No version supplied.","reason":"Bad Request"}'
Is there a way for me to get the same feedback information while using requests? I've turned on logging, but this kind of info is never shown.
You're looking for
response.json()
It returns everything the server sent back in the response body, parsed as a dictionary.
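For example (a minimal sketch reusing the call from the question; the url, payload and credentials below are placeholders):
import requests

url = "https://confluence.example.com/rest/api/content/12345"  # placeholder
jsonData = '{"some": "payload"}'                               # placeholder
username, passwd = "user", "secret"                            # placeholders

response = requests.put(url, data=jsonData, auth=(username, passwd),
                        headers={'Content-Type': 'application/json'})
if response.status_code >= 400:
    # The error details Confluence returns are in the response body
    print(response.status_code)
    print(response.json())  # or response.text if the body is not valid JSON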

Python SSLError: VERSION_TOO_LOW

I'm having some trouble using urllib to fetch some web content on my Debian server. I use the following code to get the contents of most websites without problems:
import urllib.request as request
url = 'https://www.metal-archives.com/'
req = request.Request(url, headers={'User-Agent': "foobar"})
response = request.urlopen(req)
response.read()
However, if the website is using an older encryption protocol, the urlopen function will throw the following error:
ssl.SSLError: [SSL: VERSION_TOO_LOW] version too low (_ssl.c:748)
I have found a way to work around this problem, which consists of creating an SSL context and passing it as an argument to the urlopen function, so the previous code has to be modified:
...
context = ssl.SSLContext(ssl.PROTOCOL_TLSv1)
response = request.urlopen(req, context=context)
...
Which will work, provided the protocol specified matches the one used by the website I'm trying to access. However, this does not seem like the best solution, since:
- If the site owners ever update their cryptography methods, the code will stop working
- The code above will only work for this site, and I would have to create special cases for every website I visit in the entire program, since each one could be using a different version of the protocol. That would lead to pretty messy code
- The first solution I posted (the one without the SSL context) oddly seems to work on my Arch Linux machine, even though both machines have the same versions of everything
Does anyone know about a generic solution that would work for every TLS version? Am I missing something here?
PS: For completeness, I will add that I'm using Debian 9, python v3.6.2, openssl v1.1.0f and urllib3 v1.22
In the end, I've opted to wrap the method call in a try-except, so I can use the older SSL version as a fallback. The final code is this:
import ssl
import urllib.request as request
from urllib.error import URLError

url = 'https://www.metal-archives.com'
req = request.Request(url, headers={"User-Agent": "foobar"})
try:
    response = request.urlopen(req)
except (ssl.SSLError, URLError):
    # Try to use the older TLSv1 to see if we can fix the problem
    context = ssl.SSLContext(ssl.PROTOCOL_TLSv1)
    response = request.urlopen(req, context=context)
I have only tested this code on a dozen websites and it seems to work so far, but I'm not sure it will work every time. Also, this solution seems inefficient, since the fallback path needs two HTTP requests, which can be very slow.
Improvements are still welcome :)
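One small refinement of the same idea (just a sketch, and like the code above it does not verify certificates): wrap the fallback in a helper so there is no per-site special-casing, and create the fallback context only once:
import ssl
import urllib.request as request
from urllib.error import URLError

# Created once and reused; like the code above, it falls back to TLSv1.
FALLBACK_CONTEXT = ssl.SSLContext(ssl.PROTOCOL_TLSv1)

def fetch(url, headers=None):
    req = request.Request(url, headers=headers or {"User-Agent": "foobar"})
    try:
        return request.urlopen(req)
    except (ssl.SSLError, URLError):
        # Retry once with the older protocol before giving up
        return request.urlopen(req, context=FALLBACK_CONTEXT)

response = fetch('https://www.metal-archives.com')
print(response.status)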

Python 3.6 Requests too long

I am trying to use requests to pull information from the NPI API, but it is taking over 20 seconds on average to pull the information. If I access it via my web browser, it takes less than a second. I'm rather new to this and any help would be greatly appreciated. Here is my code.
import json
import sys
import requests

url = "https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=&last_name=&organization_name=&address_purpose=&city=&state=&postal_code=10017&country_code=&limit=&skip="
htmlfile = requests.get(url)
data = htmlfile.json()
for i in data["results"]:
    print(i)
This might be due to the response being formatted incorrectly, or due to requests taking longer than necessary to set up the connection. To narrow these down, read on:
Server response formatted incorrectly
A possible issue might be that the response parsing is actually the offending step. You can check this by not reading the response you receive from the server. If the code is still slow, this is not your problem; but if this fixes it, the problem may lie with parsing the response.
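For example, to see whether the time is spent connecting or reading/parsing, you could time the two steps separately (a minimal sketch using the URL from the question):
import time
import requests

url = "https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=&last_name=&organization_name=&address_purpose=&city=&state=&postal_code=10017&country_code=&limit=&skip="

start = time.time()
r = requests.get(url, stream=True)  # returns after the headers; the body is not read yet
print("connect + headers:", time.time() - start, "s")

start = time.time()
data = r.json()  # reads and parses the body
print("body read + parse:", time.time() - start, "s")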
In case some headers are set incorrectly, this can lead to parsing errors which prevent chunked transfer (source).
In other cases, setting the encoding manually might resolve parsing problems (source).
To fix those, try:
r = requests.get(url)
r.raw.chunked = True  # Fix issue 1
r.encoding = 'utf-8'  # Fix issue 2
print(r.text)
Setting up the request takes too long
This is mainly applicable if you're sending multiple requests in a row. To prevent requests from having to set up the connection each time, you can use a requests.Session. This makes sure the connection to the server stays open and configured, and also persists cookies as a nice benefit. Try this (source):
import requests

session = requests.Session()
for _ in range(10):
    session.get(url)
Didn't solve your issue?
If that did not solve your issue, I have collected some other possible solutions here.

tornado.httpclient AsyncHTTPClient() python3

I have a strange problem and I hope somebody has encountered it.
I'm working with the Telegram API and I want to POST a file using
multipart/form-data. The file size is 32K.
data = {'photo': open('test.jpg', 'rb').read()}
Using the simple requests Python lib, I have no problem:
res = requests.post(url, files=data)
BUT
When I try to use
http_client = httpclient.AsyncHTTPClient()
http_client.fetch(url, method='POST', body=urllib.parse.urlencode(data))
with the same picture, I get an error:
tornado.httpclient.HTTPError: HTTP 413: Request Entity Too Large
I don't know why. requests works fine, but not AsyncHTTPClient. Help me please!
Please check out this demo code. You will find there an example of how to upload files.
The body argument in Tornado's HTTP client is similar to the data argument in requests. The files argument is something else entirely: it encodes the file using the multipart encoding. Which one you want to use depends on what format the server is expecting.
In this case the server is expecting multipart encoding, not URL encoding. Tornado does not have built-in support for generating multipart encoding, but as Vitalie said in the other answer, this example code shows how to do it.
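To make that concrete, here is a rough sketch of building the multipart body by hand for AsyncHTTPClient (the URL is a placeholder, the field name photo and the file test.jpg come from the question, and the exact format the server expects is an assumption):
import uuid
from tornado import httpclient, ioloop

async def upload_photo():
    url = "https://example.com/upload"  # placeholder for the real endpoint
    boundary = uuid.uuid4().hex
    filename = "test.jpg"

    with open(filename, "rb") as f:
        file_data = f.read()

    # Build the multipart/form-data body by hand, since AsyncHTTPClient
    # has no equivalent of requests' files= argument.
    header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="photo"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    footer = f"\r\n--{boundary}--\r\n".encode()
    body = header + file_data + footer

    client = httpclient.AsyncHTTPClient()
    response = await client.fetch(
        url,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        body=body,
    )
    print(response.code)

ioloop.IOLoop.current().run_sync(upload_photo)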
