I have a strange problem and I hope somebody has encountered it.
I'm working with the Telegram API and I want to POST a file using
multipart/form-data. The file size is 32K.
data = {'photo': open('test.jpg', 'rb').read()}
Using the plain requests Python library, I have no problem:
res = requests.post(url, files=data)
BUT
When I try to use
http_client = httpclient.AsyncHTTPClient()
http_client.fetch(url, method='POST', body=urllib.parse.urlencode(data))
with the same picture, I get an error:
tornado.httpclient.HTTPError: HTTP 413: Request Entity Too Large
I don't know why. requests works fine, but AsyncHTTPClient does not. Please help.
Please check out this demo code; it includes an example of how to upload files.
The body argument in Tornado's HTTP client is similar to the data argument in requests. The files argument is something else entirely: it encodes the file using the multipart encoding. Which one you want depends on what format the server expects.
In this case the server expects multipart encoding, not URL encoding. Tornado has no built-in support for generating multipart encoding, but as Vitalie said in the other answer, this example code shows how to do it.
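For illustration, here is a minimal sketch of building such a multipart body by hand (the helper name post_photo is mine, and the field name photo matches the requests example above; adapt the content type and error handling to your case):

import uuid
from tornado import httpclient

async def post_photo(url, file_bytes):
    # Any unique string works as a boundary, as long as it never occurs in the data.
    boundary = uuid.uuid4().hex
    body = (
        b'--' + boundary.encode() + b'\r\n'
        b'Content-Disposition: form-data; name="photo"; filename="test.jpg"\r\n'
        b'Content-Type: image/jpeg\r\n\r\n'
        + file_bytes + b'\r\n'
        + b'--' + boundary.encode() + b'--\r\n'
    )
    client = httpclient.AsyncHTTPClient()
    return await client.fetch(
        url,
        method='POST',
        headers={'Content-Type': 'multipart/form-data; boundary=' + boundary},
        body=body,
    )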
In a configuration I am using, a MinIO server hosting files accepts only GET requests and does not accept HEAD requests. I need the header information to check the file type without fetching the entire file.
I would usually do this with requests.head(url), but as I mentioned, only the GET method is allowed.
In curl it is possible to do the following:
curl -I -X GET http://domain.dom/path/
which fetches only the headers of the URL while overriding the method used with GET.
Is there something equivalent for the Python3 requests package?
Unfortunately there doesn't seem to be a clean way to do this. If the server accepts the Range header, you can request bytes 0 through 0, which gives you access to the header data but not the body. For example:
import requests
url = "http://stackoverflow.com"
headers = {"Range": "bytes=0-0"}
res = requests.get(url, headers=headers)
print(res.headers)
As noted, this still depends on the server implementation. For reference: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Range
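To check whether a particular server honors the range, look at the status code: a server that supports range requests replies 206 Partial Content, while one that ignores them replies 200 with the full body. A quick sketch:

import requests

res = requests.get("http://stackoverflow.com",
                   headers={"Range": "bytes=0-0"}, stream=True)
if res.status_code == 206:
    print("Range honored:", res.headers.get("Content-Range"))
else:
    print("Range ignored; server replied with status", res.status_code)
res.close()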
Based on the definition of a GET, it sounds like you could modify the request headers to include a range request.
A client can alter the semantics of GET to be a "range request", requesting transfer of only some part(s) of the selected representation, by sending a Range header field in the request (Section 14.2).
I haven't tried this, but maybe setting a byte range of 0-1 would skip the body and you'd get the headers for free.
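Another possible workaround, in case the server ignores Range entirely: issue a streaming GET and close the response before touching the body. With stream=True, requests returns as soon as the headers arrive and only downloads the body on demand. A sketch, with a placeholder URL:

import requests

res = requests.get("http://domain.dom/path/", stream=True)
content_type = res.headers.get("Content-Type")  # headers are already available
res.close()  # drop the connection without consuming the body
print(content_type)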
The documentation I've found explaining http.client for Python seems a bit sparse. I want to use it instead of requests because requests has not worked for our project.
While reading about Python's http.client, I keep seeing both request and putrequest. Both methods are defined here under HTTPConnection.
HTTPConnection.request: This will send a request to the server using
the HTTP request method method and the selector url.
HTTPConnection.putrequest: This should be the first call after the
connection to the server has been made. It sends a line to the server
consisting of the method string, the url string, and the HTTP version
(HTTP/1.1). To disable automatic sending of Host: or Accept-Encoding:
headers (for example to accept additional content encodings), specify
skip_host or skip_accept_encoding with non-False values.
Also, the source code for both is defined in this file.
From my reading, it seems like request is a higher-level API than putrequest. Is that correct?
The Answer: request() is an abstracted version of multiple functions, putrequest() being one of them.
Although this is defined in the documentation, it's easy to skip over the line that answers this question.
This is pointed out in this line of the http.client documentation:
As an alternative to using the request() method described above, you can also send your request step by step, by using the four functions below.
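To make the relationship concrete, here is a sketch of the same GET request sent both ways (example.com is just a placeholder host):

import http.client

# High level: request() sends the request line, headers, and body in one call.
conn = http.client.HTTPSConnection("example.com")
conn.request("GET", "/", headers={"Accept": "text/html"})
print(conn.getresponse().status)
conn.close()

# Low level: the same request, built step by step.
conn = http.client.HTTPSConnection("example.com")
conn.putrequest("GET", "/")            # request line, plus automatic Host etc.
conn.putheader("Accept", "text/html")  # one call per header
conn.endheaders()                      # endheaders(message_body=...) can attach a body
print(conn.getresponse().status)
conn.close()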
I frequently get a ChunkedEncodingError when requesting servers using Requests (Python) on Google App Engine.
I looked at the answer from IncompleteRead using httplib, but I don't believe my issue is related to the server being queried: I often get this error with the various endpoints I'm using, including Intercom and FullContact.
I would have suspected a problem with one service's server if the error always came from the same server (for example, FullContact), but that's not the case. I've also encountered this issue with other, unrelated requests.
So I suspect the problem is either my code or Google. But from my code's point of view, I don't know what would be wrong. Here's a snippet:
result = requests.post(
    "https://api.intercom.io/companies",
    json={'some': 'data', 'that': 'are', 'sent': 'ok'},
    headers={'Accept': 'application/json'},
    auth=("app_id", "app_key",)
)
As you can see, the request is quite standard, nothing fancy. It also fails with something as simple as:
r = requests.get(url, params=params, timeout=3)
Does anyone experience these issues with Google App Engine? Is there something I can do to avoid them?
There is a patch that seems to work on GAE.
The issue is located in the iter_content function of requests, which uses the underlying urllib3 library.
The problem is that Google overrides this library with its own implementation, with a few changes that trigger a ChunkedEncodingError at the Requests level.
I tried this patch and, so far, so good. In detail, you must replace the following lines in your requests/models.py file:
for chunk in self.raw.stream(chunk_size, decode_content=True):
    yield chunk
with:
if isinstance(self.raw._original_response._method, int):
    while True:
        chunk = self.raw.read(chunk_size, decode_content=True)
        if not chunk:
            break
        yield chunk
else:
    for chunk in self.raw.stream(chunk_size, decode_content=True):
        yield chunk
And the problem stops.
I submitted an issue about it on the Requests repository, and we'll see how it evolves.
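If you would rather not edit a vendored library, a stopgap (not a fix for the underlying GAE/urllib3 mismatch) is to retry on ChunkedEncodingError; post_with_retry is just a name I made up for this sketch:

import requests
from requests.exceptions import ChunkedEncodingError

def post_with_retry(url, retries=3, **kwargs):
    for attempt in range(retries):
        try:
            return requests.post(url, **kwargs)
        except ChunkedEncodingError:
            # Re-raise once the retries are exhausted.
            if attempt == retries - 1:
                raise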
The Flask documentation on testing (http://flask.pocoo.org/docs/testing/) contains this line of code:
rv = self.app.get('/')
And below it, it mentions "By using self.app.get we can send an HTTP GET request to the application with the given path."
Where can the documentation for these direct access methods be found (I'm assuming there's one for each of the RESTful methods)? Specifically, I'm wondering what arguments they can take (for example, data, headers, etc.). Flask's documentation for the Flask object doesn't seem to list these methods, even though the example above uses them.
Alternatively, a knowledgeable individual could answer what I am trying to figure out: I'm trying to simulate sending a POST request to my server, as I would with the following line, if I were doing it over HTTP:
res = requests.post("http://localhost:%d/generate" % port,
                    data=json.dumps(payload),
                    headers={"content-type": "application/json"})
The above works when running a Flask app on the proper port. But I tried replacing it with the following:
res = self.app.post("/generate",
                    data=json.dumps(payload),
                    headers={"content-type": "application/json"})
Instead, the response object I get back is a 400 BAD REQUEST.
This is documented in the Werkzeug project, from which Flask gets the test client: Werkzeug's test client.
The test client does not issue HTTP requests, it dispatches requests internally, so there is no need to specify a port.
The documentation isn't very clear about support for JSON in the body, but it seems that if you pass a string and set the content type you should be fine, so I'm not exactly sure why you get a 400 back. I would check whether your /generate view function is invoked at all. A debugger should help you figure out where the 400 is coming from.
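For reference, a minimal sketch of what the call usually looks like through the test client (app and payload stand in for your Flask app and request data); note that the Werkzeug client accepts content_type as a keyword directly:

import json

client = app.test_client()
res = client.post("/generate",
                  data=json.dumps(payload),
                  content_type="application/json")
print(res.status_code, res.data)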
I'm trying to make a POST request between a Python (WSGI) application and a NodeJS + Express application. They are on different servers.
The problem is that when using different IP addresses (i.e. private network vs. public network), a urllib2 request on the public network succeeds, but the same request on the private network fails with a 502 Bad Gateway or URLError [32] Broken pipe.
The urllib2 code I'm using is this:
req = urllib2.Request(url, "{'some':'data'}", {'Content-Type': 'application/json; charset=utf-8'})
res = urllib2.urlopen(req)
print res.read()
Now, I have also coded the request like this, using requests:
r = requests.post(url, headers = {'Content-Type' : 'application/json; charset=utf-8'}, data = "{'some':'data'}")
print r.text
And get a 200 OK response. This alternate method works for both networks.
I would like to find out whether there is some additional configuration needed for a urllib2 request that I don't know of, or whether I need to look into some missing network configuration (I don't believe this is the case, since the alternate request method works, but I could definitely be wrong).
Any suggestions or pointers with this will be greatly appreciated. Thanks!
The problem here is that, as Austin Phillips pointed out, the data parameter of urllib2.Request's constructor:
may be a string specifying additional data to send to the server… data should be a buffer in the standard application/x-www-form-urlencoded format. The urllib.urlencode() function takes a mapping or sequence of 2-tuples and returns a string in this format.
By passing it JSON-encoded data instead of urlencoded data, you're confusing it somewhere.
However, Request has a method add_data:
Set the Request data to data. This is ignored by all handlers except HTTP handlers — and there it should be a byte string, and will change the request to be POST rather than GET.
If you use this, you should probably also use add_header rather than passing it in the constructor, although that doesn't seem to be mentioned specifically anywhere in the documentation.
So, this should work:
req = urllib2.Request(url)
req.add_data("{'some':'data'}")
req.add_header('Content-Type', 'application/json; charset=utf-8')
res = urllib2.urlopen(req)
In a comment, you said:
The reason I don't want to just switch over to requests without finding out why I'm seeing this problem is that there may be some deeper underlying issue that this points to that could come back and cause harder-to-detect problems later on.
If you want to find deep underlying issues, you're not going to do that by just looking at your client-side source. The first step to figuring out "Why does X work but Y fails?" with network code is to figure out exactly what bytes X and Y each send. Then you can try to narrow down what the relevant difference is, and then figure out what part of your code is causing Y to send the wrong data in the relevant place.
You can do this by logging things at the service (if you control it), running Wireshark, etc., but the easiest way, for simple cases, is netcat. You'll need to read man nc for your system (and, on Windows, you'll need to get and install netcat before you can run it), because the syntax is different for each version, but it's always something simple like nc -kl 12345.
Then, in your client, change the URL to use localhost:12345 in place of the hostname, and it'll connect up to netcat and send its HTTP request, which will be dumped to the terminal. You can then copy that and use nc HOST 80 and paste it to see how the real server responds, and use that to narrow down where the problem is. Or, if you get stuck, at least you can copy and paste the data to your SO question.
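If netcat isn't available, a few lines of Python can play the same role as nc -kl 12345 for a single connection, accepting one client and dumping the raw request it sends (a sketch, no error handling):

import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("127.0.0.1", 12345))
srv.listen(1)
conn, addr = srv.accept()
print(conn.recv(65536).decode("latin-1"))  # dump the raw HTTP request
conn.close()
srv.close()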
One last thing: This is almost certainly not relevant to your problem (because you're sending the exact same data with requests and it's working), but your data is not actually valid JSON, because it uses single quotes instead of double quotes. According to the docs, a string is defined as:
string
    ""
    " chars "
(That is, a string is a pair of double quotes enclosing zero or more characters; the docs have a nice graphical representation as well.)
In general, except for really simple test cases, you don't want to write JSON by hand. In many cases (including yours), all you have to do is replace the "…" with json.dumps(…), so this isn't a serious hardship. So:
req = urllib2.Request(url)
req.add_data(json.dumps({'some':'data'}))
req.add_header('Content-Type', 'application/json; charset=utf-8')
res = urllib2.urlopen(req)
So, why is it working? Well, in JavaScript, single-quoted strings are legal, as are other things like backslash escapes that aren't valid JSON, and any JS code that uses restricted eval (or, worse, raw eval) for parsing will accept them. And because so many people got used to writing bad JSON this way, many browsers' native JSON parsers and many JSON libraries in other languages have workarounds for common errors. But you shouldn't rely on that.
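A two-line demonstration of the difference, using Python's strict parser:

import json

json.loads('{"some": "data"}')   # fine: double-quoted, valid JSON
json.loads("{'some': 'data'}")   # raises ValueError: single quotes are not JSON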