Inconsistent behavior with HTTP POST requests in Python

Trying to make a POST request between a Python (WSGI) and a NodeJS + Express application. They are on different servers.
The problem is that when using different IP addresses (i.e. private network vs. public network), a urllib2 request on the public network succeeds, but the same request on the private network fails with a 502 Bad Gateway or URLError: [Errno 32] Broken pipe.
The urllib2 code I'm using is this:
req = urllib2.Request(url, "{'some':'data'}", {'Content-Type' : 'application/json; charset=utf-8'})
res = urllib2.urlopen(req)
print res.read()
Now, I have also coded the request like this, using requests:
r = requests.post(url, headers = {'Content-Type' : 'application/json; charset=utf-8'}, data = "{'some':'data'}")
print r.text
And get a 200 OK response. This alternate method works for both networks.
I am interested in finding out if there is some additional configuration needed for a urllib2 request that I don't know of, or if I need to look into some network configuration which might be missing (I don't believe this is the case, since the alternate request method works, but I could definitely be wrong).
Any suggestions or pointers with this will be greatly appreciated. Thanks!

The problem here is that, as Austin Phillips pointed out, the data parameter of urllib2.Request's constructor:
may be a string specifying additional data to send to the server… data should be a buffer in the standard application/x-www-form-urlencoded format. The urllib.urlencode() function takes a mapping or sequence of 2-tuples and returns a string in this format.
By passing it JSON-encoded data instead of urlencoded data, you're confusing it somewhere.
However, Request has a method add_data:
Set the Request data to data. This is ignored by all handlers except HTTP handlers — and there it should be a byte string, and will change the request to be POST rather than GET.
If you use this, you should probably also use add_header rather than passing it in the constructor, although that doesn't seem to be mentioned specifically anywhere in the documentation.
So, this should work:
req = urllib2.Request(url)
req.add_data("{'some':'data'}")
req.add_header('Content-Type', 'application/json; charset=utf-8')
res = urllib2.urlopen(req)
In a comment, you said:
The reason I don't want to just switch over to requests without finding out why I'm seeing this problem is that there may be some deeper underlying issue that this points to that could come back and cause harder-to-detect problems later on.
If you want to find deep underlying issues, you're not going to do that by just looking at your client-side source. The first step to figuring out "Why does X work but Y fails?" with network code is to figure out exactly what bytes X and Y each send. Then you can try to narrow down what the relevant difference is, and then figure out what part of your code is causing Y to send the wrong data in the relevant place.
You can do this by logging things at the service (if you control it), running Wireshark, etc., but the easiest way, for simple cases, is netcat. You'll need to read man nc for your system (and, on Windows, you'll need to get and install netcat before you can run it), because the syntax is different for each version, but it's always something simple like nc -kl 12345.
Then, in your client, change the URL to use localhost:12345 in place of the hostname, and it'll connect up to netcat and send its HTTP request, which will be dumped to the terminal. You can then copy that and use nc HOST 80 and paste it to see how the real server responds, and use that to narrow down where the problem is. Or, if you get stuck, at least you can copy and paste the data to your SO question.
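If you don't have netcat handy, a few lines of Python can do the same raw dump. This is a minimal sketch (port 12345 matches the nc example above):
import socket

# Stand-in for `nc -kl 12345`: accept connections and dump the raw bytes.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(('127.0.0.1', 12345))
server.listen(1)
while True:
    conn, addr = server.accept()
    print(conn.recv(65536))  # one recv is usually enough for a small request
    conn.close()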
One last thing: This is almost certainly not relevant to your problem (because you're sending the exact same data with requests and it's working), but your data is not actually valid JSON, because it uses single quotes instead of double quotes. According to the docs, string is defined as:
string
    ""
    " chars "
(The docs have a nice graphical representation as well.)
In general, except for really simple test cases, you don't want to write JSON by hand. In many cases (including yours), all you have to do is replace the "…" with json.dumps(…), so this isn't a serious hardship. So:
import json

req = urllib2.Request(url)
req.add_data(json.dumps({'some': 'data'}))
req.add_header('Content-Type', 'application/json; charset=utf-8')
res = urllib2.urlopen(req)
So, why is it working? Well, in JavaScript, single-quoted strings are legal, as well as other things like backslash escapes that aren't valid in JSON, and any JS code that uses restricted-eval (or, worse, raw eval) for parsing will accept it. And, because so many people got used to writing bad JSON because of this, many browsers' native JSON parsers and many JSON libraries in other languages have workarounds to allow common errors. But you shouldn't rely on that.
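You can see the difference with the stdlib json module. A quick sketch:
import json

print(json.dumps({'some': 'data'}))  # -> {"some": "data"} (double quotes: valid JSON)
json.loads('{"some": "data"}')       # parses fine
json.loads("{'some':'data'}")        # raises ValueError: strict parsers reject single quotes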

Related

Is it possible to get only the header without fetching the body using the requests.get command? The server is blocking HEAD

In a configuration I am using, a minio server hosting files accepts only GET requests and does not accept HEAD requests. I need the header information to check the file type, to avoid fetching the entire file.
I would usually do it with requests.head(url); however, as I mentioned earlier, only the GET method is allowed.
In curl it is possible to do the following:
curl -I -X GET http://domain.dom/path/
which curls the header of the url but overrides the used method with the GET HTTP method.
Is there something equivalent for the Python3 requests package?
Unfortunately there doesn't seem to be a clean way to do this. If the server accepts the Range header, you could try requesting bytes 0 to 0, which gets you the header data without the body. For example:
import requests
url = "http://stackoverflow.com"
headers = {"Range": "bytes=0-0"}
res = requests.get(url, headers=headers)
print(res.headers)
As said, this still depends on the server implementation. For reference: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Range
Based on the definition of a GET, it sounds like you could modify the request headers to include a range-request.
A client can alter the semantics of GET to be a "range request", requesting transfer of only some part(s) of the selected representation, by sending a Range header field in the request (Section 14.2).
I haven't tried this, but maybe setting a byte range of 0-1 would skip the body and you'd get the headers for free.
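Another option, not mentioned above (so treat this as a sketch of a different technique): requests can defer downloading the body with stream=True, which returns as soon as the headers have been read:
import requests

# stream=True returns once the headers arrive; the body is not downloaded
# until you access res.content or iterate over the response.
res = requests.get("http://stackoverflow.com", stream=True)
print(res.headers)
res.close()  # release the connection without reading the body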

Using Python urllib2, how can I stream between a GET and a POST?

I want to write code to transfer a file from one site to another. This can be a large file, and I'd like to do it without creating a local temporary file.
I saw the trick of using mmap to upload a large file in Python: "HTTP Post a large file with streaming", but what I really need is a way to link up the response from the GET to creating the POST.
Anyone done this before?
You can't, or at least shouldn't.
urllib2 request objects have no way to stream data into them on the fly, period. And in the other direction, response objects are file-like objects, so in theory you can read(8192) out of them instead of read(), but for most protocols—including HTTP—it will either often or always read the whole response into memory and serve your read(8192) calls out of its buffer, making it pointless. So, you have to intercept the request, steal the socket out of it, and deal with it manually, at which point urllib2 is getting in your way more than it's helping.
urllib2 makes some things easy, some things much harder than they should be, and some things next to impossible; when it isn't making things easy, stop using it.
One solution is to use a higher-level third-party library. For example, requests gets you half-way there (it makes it very easy to stream from a response, but can only stream into a response in limited situations), and requests-toolbelt gets you the rest of the way there (it adds various ways to stream-upload).
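For example, requests alone can already chain a streamed download into a chunked upload. A sketch with placeholder URLs (passing a generator as data makes requests use chunked transfer encoding):
import requests

src = 'http://www.example.com/spam'   # placeholder URLs
dst = 'http://www.example.com/eggs'

resp = requests.get(src, stream=True)  # don't buffer the whole body
resp.raise_for_status()

# A generator as `data` makes requests stream the upload chunk by chunk.
r = requests.post(dst, data=resp.iter_content(chunk_size=8192))
print(r.status_code)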
The other solution is to use a lower-level library. And here, you don't even have to leave the stdlib. httplib forces you to think in terms of sending and receiving things bit by bit, but that's exactly what you want. On the get request, you can just call connect and request, and then call read(8192) repeatedly on the response object. On the post request, you call connect, putrequest, putheader, endheaders, then repeatedly send each buffer from the get request, then getresponse when you're done.
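A sketch of that httplib dance, with placeholder host and paths and no error handling:
import httplib

getconn = httplib.HTTPConnection('www.example.com')
getconn.request('GET', '/spam')
getresp = getconn.getresponse()

postconn = httplib.HTTPConnection('www.example.com')
postconn.putrequest('POST', '/eggs')
# Assumes the GET response carries a Content-Length (i.e. isn't chunked).
postconn.putheader('Content-Length', getresp.getheader('Content-Length'))
postconn.endheaders()

# Pump the GET response into the POST request one buffer at a time.
while True:
    buf = getresp.read(8192)
    if not buf:
        break
    postconn.send(buf)

postresp = postconn.getresponse()
print(postresp.status)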
In fact, in Python 3.2+'s http.client (the equivalent of 2.x's httplib), the body you pass to HTTPConnection.request doesn't have to be a string: it can be any iterable, or any file-like object with read and fileno methods… which includes a response object. So, it's this simple:
import http.client

getconn = http.client.HTTPConnection('www.example.com')
getconn.request('GET', '/spam')
getresp = getconn.getresponse()

postconn = http.client.HTTPConnection('www.example.com')
postconn.request('POST', '/eggs', body=getresp)
postresp = postconn.getresponse()
… except, of course, that you probably want to craft appropriate headers (you can actually use urllib.request, the 3.x version of urllib2, to build a Request object and not send it…), and pull the host and port out of the URL with urlparse instead of hardcoding them, and you want to exhaust or at least check the response from the POST request, and so on. But this shows the hard part, and it's not hard.
Unfortunately, I don't think this works in 2.x.
Finally, if you're familiar with libcurl, there are at least three wrappers for it (including one that comes with the source distribution). I'm not sure whether to call libcurl higher-level or lower-level than urllib2, it's sort of on its own weird axis of complexity. :)
urllib2 may be too simple for this task. You might want to look into pycurl. I know it supports streaming.

Send a GET request with a body

I'm using elasticsearch, and its RESTful API supports reading bodies in GET requests for search criteria.
I'm currently doing
response = urllib.request.urlopen(url, data).read().decode("utf-8")
If data is present, it issues a POST, otherwise a GET. How can I force a GET despite the fact that I'm including data (which should go in the request body, as with a POST)?
NB: I'm aware I can use a source property in the URL, but the queries we're running are complex and the query definition is quite verbose, resulting in extremely long URLs (long enough that they can interfere with some older browsers and proxies).
I'm not aware of a nice way to do this using urllib. However, requests makes it trivial (and, in fact, trivial with any arbitrary verb and request content) by using the requests.request* function:
requests.request(method='get', url='http://localhost/test', data='some data')
Constructing a small test web server will show that the data is indeed sent in the body of the request, and that the method perceived by the server is indeed a GET.
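A minimal sketch of such a test server, using only the stdlib (Python 3 names; port 8000 is arbitrary):
from http.server import BaseHTTPRequestHandler, HTTPServer

class EchoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Read the body, if any, using Content-Length.
        length = int(self.headers.get('Content-Length', 0))
        body = self.rfile.read(length)
        print('method:', self.command, 'body:', body)
        self.send_response(200)
        self.end_headers()

HTTPServer(('localhost', 8000), EchoHandler).serve_forever()
Pointing the requests.request call above at http://localhost:8000/test should print the method as GET with the body intact.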
*note that I linked to the requests.api.request code because that's where the actual function definition lives. You should call it using requests.request(...)

Interfacing to the LinkPoint API with Python - Sending XML Over SSL with Authentication

I'm trying to make a successful connection to the LinkPoint gateway using Python. For those of you unfamiliar with their API you get a .pem file you use for authentication purposes.
I'm having trouble using this file and creating a secure connection over SSL.
According to their API documentation (which leaves a lot to be desired, btw) I believe the configuration should look similar to below:
HOST = 'secure.linkpt.net'
API_URL = 'https://secure.linkpt.net/lpc/servlet/lppay'
PORT = 1129
cert_key = 'my_cert_key.pem'
Using this information and a valid XML string how can I create this connection?
I'm pretty new to HTTP connections in Python. I've successfully implemented connections with other APIs using a POST with urllib2. Naturally, my first attempt started with a similar approach hoping I could stumble on to a solution.
Something like:
headers = {'User-Agent': 'Rico',
           'Content-type': 'text/xml; charset="UTF-8"',
           'Content-length': len(self.xml_string),
           }
# POST to First Data (Link Point)
req = urllib2.Request(API_URL, self.xml_string, headers)
response = urllib2.urlopen(req)
self.handleResponse(response.read())
I had little hope this would work, as I didn't provide anything about the cert_key or the PORT.
After this attempt I tried a similar approach to one I found in a solution from another Stack Overflow post. Unfortunately I wasn't able to get far with it, as I don't have ca_certs or cert files (that I know of).
I've tried to use Requests but can't find the documentation/examples for me to make sense of it.
I've also tried to use Twisted, and I really hoped I could do something with this but this feels like trying to open a door with a wrecking ball. It just feels like overkill to me. I just need a simple connection/request/response...this seems overly complicated for that.
My next attempt was going to be PycURL, but I'd confronted enough despair during this process that I thought I'd come here to see if someone had some good suggestions before diving into it.
If you think I should re-visit one of these tools please let me know. I didn't spend a great deal of time with any of these - just enough to get my feet wet. If you could also point me to a good example or detailed documentation that would be fantastic.
Also, I'd prefer not to use the standard SSL library to build the connection myself - I don't want to reinvent the wheel if I don't have to.
The solution I was able to use to get a valid connection was using httplib as follows:
import httplib
HOST = 'staging.linkpt.net'
API_URL = '/lpc/servlet/lppay'
PORT = 1129
CERTFILE = 'my_cert_file.pem'
headers = {'User-Agent': 'Rico',
           'Content-type': 'text/xml; charset="UTF-8"',
           'Content-length': len(xml_str),
           }
conn = httplib.HTTPSConnection(HOST, PORT, cert_file = CERTFILE)
conn.putrequest("POST", API_URL)
for header, value in headers.items():
    conn.putheader(header, value)
conn.endheaders()
conn.send(xml_str)
response = conn.getresponse()
I have not yet been able to generate a valid request. Apparently I interpreted the API documentation incorrectly and keep getting a "Malformed or unrecognized request" response, but at least I'm making the connection.
I'll update this answer if I'm able to determine more useful information regarding this subject.
UPDATE: A Link Point customer service employee told me I was using old API documentation. I've since tried with the newer version and still cannot connect. I can't even get a response from their server. This is no longer a possible solution to this problem.
UPDATE 2: I was able to solve this problem in another post SSL Connection Using .pem Certificate With Python
Enjoy!

PUT Variables Missing between Python and Tomcat

I'm trying to get a PUT request from Python into a servlet in Tomcat. The parameters are missing when I get into Tomcat.
The same code is happily working for POST requests, but not for PUT.
Here's the client:
import httplib
import urllib

lConnection = httplib.HTTPConnection('localhost:8080')
lHeaders = {"Content-type": "application/x-www-form-urlencoded",
            "Accept": "text/plain"}
lParams = {'Username': 'usr', 'Password': 'password', 'Forenames': 'First', 'Surname': 'Last'}
lConnection.request("PUT", "/my/url/", urllib.urlencode(lParams), lHeaders)
Once in the server, request.getParameter("Username") returns null.
Has anyone got any clues as to where I'm losing the parameters?
I tried your code, and it seems the parameters do reach the server. Tcpdump gives:
PUT /my/url/ HTTP/1.1
Host: localhost
Accept-Encoding: identity
Content-Length: 59
Content-type: application/x-www-form-urlencoded
Accept: text/plain
Username=usr&Password=password&Surname=Last&Forenames=First
So the request gets to the other side correctly, it must be something with either tomcat configuration or the code that is trying to read the parameters.
I don't know what the Tomcat side of your code looks like, or how Tomcat processes and provides access to request parameters, but my guess is that Tomcat is not "automagically" parsing the body of your PUT request into nice request parameters for you.
I ran into the exact same problem using the built-in webapp framework (in Python) on App Engine. It did not parse the body of my PUT requests into request parameters available via self.request.get('param'), even though they were coming in as application/x-www-form-urlencoded.
You'll have to check on the Tomcat side to confirm this, though. You may end up having to access the body of the PUT request and parse out the parameters yourself.
Whether or not your web framework should be expected to automagically parse out application/x-www-form-urlencoded parameters in PUT requests (like it does with POST requests) is debatable.
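If you do end up doing it yourself, the urlencoded format is trivial to parse. In Python terms, a sketch of what the servlet side would have to do with the raw body:
import urlparse  # in Python 3 this lives in urllib.parse

body = 'Username=usr&Password=password&Surname=Last&Forenames=First'
params = urlparse.parse_qs(body)
print(params['Username'][0])  # -> usr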
I'm guessing here, but I think the problem is that PUT isn't meant to be used that way. The intent of PUT is to store a single entity, contained in the request, into the resource named in the headers. What's all this stuff about usernames and passwords?
Your Content-Type is application/x-www-form-urlencoded, which is a bunch of field contents. What PUT wants is something more like an encoded file: a single bunch of data it can store somewhere.
