PUT Variables Missing between Python and Tomcat

I'm trying to get a PUT request from Python into a servlet in Tomcat. The parameters are missing when I get into Tomcat.
The same code is happily working for POST requests, but not for PUT.
Here's the client:
import httplib
import urllib

lConnection = httplib.HTTPConnection('localhost:8080')
lHeaders = {"Content-type": "application/x-www-form-urlencoded",
            "Accept": "text/plain"}
lParams = {'Username': 'usr', 'Password': 'password', 'Forenames': 'First', 'Surname': 'Last'}
lConnection.request("PUT", "/my/url/", urllib.urlencode(lParams), lHeaders)
Once inside the servlet, request.getParameter("Username") returns null.
Has anyone got any clues as to where I'm losing the parameters?

I tried your code and it seems the parameters do get to the server. Tcpdump gives:
PUT /my/url/ HTTP/1.1
Host: localhost
Accept-Encoding: identity
Content-Length: 59
Content-type: application/x-www-form-urlencoded
Accept: text/plain
Username=usr&Password=password&Surname=Last&Forenames=First
So the request reaches the server correctly; the problem must lie in either the Tomcat configuration or the code that reads the parameters.

I don't know what the Tomcat side of your code looks like, or how Tomcat processes and provides access to request parameters, but my guess is that Tomcat is not "automagically" parsing the body of your PUT request into nice request parameters for you.
I ran into the exact same problem using the built-in webapp framework (in Python) on App Engine. It did not parse the body of my PUT requests into request parameters available via self.request.get('param'), even though they were coming in as application/x-www-form-urlencoded.
You'll have to check on the Tomcat side to confirm this, though. You may end up having to access the body of the PUT request and parse out the parameters yourself.
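For instance, here is a rough sketch of the workaround in webapp terms (the handler class is made up for illustration; parse_qs lives in urlparse on Python 2.6+, in cgi before that):
from urlparse import parse_qs
from google.appengine.ext import webapp

class PutHandler(webapp.RequestHandler):  # hypothetical handler
    def put(self):
        # webapp leaves the PUT body unparsed, so pull the fields out by hand
        params = parse_qs(self.request.body)
        username = params.get('Username', [None])[0]
        self.response.out.write(username or '')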
Whether or not your web framework should be expected to automagically parse out application/x-www-form-urlencoded parameters in PUT requests (like it does with POST requests) is debatable.

I'm guessing here, but I think the problem is that PUT isn't meant to be used that way. The intent of PUT is to store a single entity, contained in the request body, at the resource named by the request URL. What's all this stuff about usernames and passwords?
Your Content-Type is application/x-www-form-urlencoded, which is a bunch of field contents. What PUT wants is something more like an encoded file: a single chunk of data it can store somewhere.
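For contrast, a minimal sketch of PUT used as intended, using httplib as in the question (the resource URL and JSON body here are made up for illustration):
import httplib
import json

# Hypothetical example: the body is one representation of the resource,
# stored at the URL it is PUT to.
conn = httplib.HTTPConnection('localhost:8080')
body = json.dumps({'Forenames': 'First', 'Surname': 'Last'})
conn.request("PUT", "/my/url/users/usr", body,
             {"Content-Type": "application/json"})
print conn.getresponse().status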

Related

Is it possible to get only the header without fetching the body using requests.get? The server is blocking HEAD

In a configuration I am using, a minio server hosting files accepts only GET requests and does not accept HEAD requests. I need the header information to check the file type, to avoid fetching the entire file.
I would usually do this with requests.head(url), but as I mentioned, only the GET method is allowed.
In curl it is possible to do the following:
curl -I -X GET http://domain.dom/path/
which fetches only the headers of the URL but overrides the method used with the GET HTTP method.
Is there something equivalent for the Python3 requests package?
Unfortunately there doesn't seem to be a clean way to do this. If the server accepts the Range header, you could try requesting just bytes 0 to 0, which nets you the header data but not the body. For example:
import requests

url = "http://stackoverflow.com"
headers = {"Range": "bytes=0-0"}  # ask the server for only the first byte
res = requests.get(url, headers=headers)
print(res.headers)  # full response headers, with an (almost) empty body
As said, this still depends on the server implementation. For reference: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Range
Based on the definition of a GET, it sounds like you could modify the request headers to include a range-request.
A client can alter the semantics of GET to be a "range request", requesting transfer of only some part(s) of the selected representation, by sending a Range header field in the request (Section 14.2).
I haven't tried this, but maybe setting a byte range of 0-1 would skip the body and you'd get the headers for free.
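Something like this untested sketch, reusing the URL from the question; a 206 Partial Content status would confirm the server honored the range:
import requests

res = requests.get("http://domain.dom/path/", headers={"Range": "bytes=0-1"})
if res.status_code == 206:  # Partial Content: the range was honored
    print(res.headers.get("Content-Type"))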

Get HTTP request message from Request object in Scrapy

I need to somehow extract plain HTTP request message from a Request object in Scrapy (so that I could, for example, copy/paste this request and run from Burp).
So given a scrapy.http.Request object, I would like to get the corresponding request message, such as e.g.
POST /test/demo_form.php HTTP/1.1
Host: w3schools.com

name1=value1&name2=value2
Clearly I have all the information I need in the Request object, but trying to reconstruct the message manually is error-prone, as I could miss some edge cases. My understanding is that Scrapy first converts this Request into a Twisted object, which then writes the headers and body into a TCP transport. So maybe there's a way to do something similar, but write to a string instead?
UPDATE
I could use the following code to get HTTP 1.0 request message, which is based on http.py. Is there a way to do something similar with HTTP 1.1 requests / http11.py, which is what's actually being sent? I would obviously like to avoid duplicating code from Scrapy/Twisted frameworks as much as possible.
from twisted.test.proto_helpers import StringTransport
from scrapy.core.downloader import webclient

factory = webclient.ScrapyHTTPClientFactory(request)
transport = StringTransport()
protocol = webclient.ScrapyHTTPPageGetter()
protocol.factory = factory
protocol.makeConnection(transport)
request_message = transport.value()
print(request_message.decode("utf-8"))
As scrapy is open source and also has plenty of extension points, this should be doable.
The requests are finally assembled and sent out in scrapy/core/downloader/handlers/http11.py in ScrapyAgent.download_request ( https://github.com/scrapy/scrapy/blob/master/scrapy/core/downloader/handlers/http11.py#L270 )
If you place your hook there you can dump the request type, request headers, and request body.
To place your code there you can either monkey patch ScrapyAgent.download_request, or subclass ScrapyAgent to do the request logging, subclass HTTP11DownloadHandler to use your agent, and then register your handler subclass as the new DOWNLOAD_HANDLER for http/https requests in your project's settings.py (for details see: https://doc.scrapy.org/en/latest/topics/settings.html#download-handlers).
In my opinion this is the closest you can get to logging the requests going out without using a packet sniffer or a logging proxy (which might be a bit overkill for your scenario).
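As a rough sketch of the monkey-patching variant (this touches Scrapy internals, so class names and locations may differ between versions):
from scrapy.core.downloader.handlers.http11 import ScrapyAgent

_original_download_request = ScrapyAgent.download_request

def download_request_logged(self, request):
    # Dump method, URL, headers and body just before the request goes out.
    print(request.method, request.url)
    for name, values in request.headers.items():  # keys/values are bytes
        for value in values:
            print("%s: %s" % (name.decode(), value.decode()))
    if request.body:
        print(request.body)
    return _original_download_request(self, request)

ScrapyAgent.download_request = download_request_logged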

Documentation for Flask app object `get` and `post` class methods?

In the Flask documentation on testing (http://flask.pocoo.org/docs/testing/), there is a line of code:
rv = self.app.get('/')
And below it, it mentions: "By using self.app.get we can send an HTTP GET request to the application with the given path."
Where can the documentation for these direct-access methods be found (I'm assuming there's one for each of the RESTful methods)? Specifically, I'm wondering what arguments they can take (for example, passing in data, headers, etc.). Flask's documentation for the Flask object doesn't seem to list these methods, even though the example above uses them.
Alternatively, a knowledgeable individual could answer what I am trying to figure out: I'm trying to simulate sending a POST request to my server, as I would with the following line, if I were doing it over HTTP:
res = requests.post("http://localhost:%d/generate" % port,
                    data=json.dumps(payload),
                    headers={"content-type": "application/json"})
The above works when running a Flask app on the proper port. But I tried replacing it with the following:
res = self.app.post("/generate",
                    data=json.dumps(payload),
                    headers={"content-type": "application/json"})
And instead, the object I get in response is a 400 BAD REQUEST.
This is documented in the Werkzeug project, from which Flask gets the test client: Werkzeug's test client.
The test client does not issue HTTP requests; it dispatches requests internally, so there is no need to specify a port.
The documentation isn't very clear about support for JSON in the body, but it seems that if you pass a string and set the content type you should be fine, so I'm not exactly sure why you get back a 400. I would check whether your /generate view function is invoked at all. A debugger should be useful for figuring out where the 400 is coming from.
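For what it's worth, this is how I would post JSON through the test client; content_type is the keyword the Werkzeug client uses for the body's MIME type (payload and /generate are from your example):
import json

res = self.app.post('/generate',
                    data=json.dumps(payload),
                    content_type='application/json')
print(res.status_code, res.data)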

Inconsistent behavior with HTTP POST requests in Python

I'm trying to make a POST request from a Python (WSGI) application to a NodeJS + Express application. They are on different servers.
The problem is that when using different IP addresses (i.e. private network vs. public network), the urllib2 request succeeds on the public network, but the same request fails on the private network with a 502 Bad Gateway or URLError [32] Broken pipe.
The urllib2 code I'm using is this:
import urllib2

req = urllib2.Request(url, "{'some':'data'}", {'Content-Type': 'application/json; charset=utf-8'})
res = urllib2.urlopen(req)
print res.read()
Now, I have also coded the request like this, using requests:
r = requests.post(url,
                  headers={'Content-Type': 'application/json; charset=utf-8'},
                  data="{'some':'data'}")
print r.text
And get a 200 OK response. This alternate method works for both networks.
I am interested in finding out if there is some additional configuration needed for a urllib2 request that I don't know of, or if I need to look into some network configuration which might be missing (I don't believe this is the case, since the alternate request method works, but I could definitely be wrong).
Any suggestions or pointers with this will be greatly appreciated. Thanks!
The problem here is that, as Austin Phillips pointed out, the data parameter of urllib2.Request's constructor:
may be a string specifying additional data to send to the server… data should be a buffer in the standard application/x-www-form-urlencoded format. The urllib.urlencode() function takes a mapping or sequence of 2-tuples and returns a string in this format.
By passing it JSON-encoded data instead of urlencoded data, you're confusing it somewhere.
However, Request has a method add_data:
Set the Request data to data. This is ignored by all handlers except HTTP handlers — and there it should be a byte string, and will change the request to be POST rather than GET.
If you use this, you should probably also use add_header rather than passing it in the constructor, although that doesn't seem to be mentioned specifically anywhere in the documentation.
So, this should work:
req = urllib2.Request(url)
req.add_data("{'some':'data'}")
req.add_header('Content-Type', 'application/json; charset=utf-8')
res = urllib2.urlopen(req)
In a comment, you said:
The reason I don't want to just switch over to requests without finding out why I'm seeing this problem is that there may be some deeper underlying issue that this points to that could come back and cause harder-to-detect problems later on.
If you want to find deep underlying issues, you're not going to do that by just looking at your client-side source. The first step to figuring out "Why does X work but Y fails?" with network code is to figure out exactly what bytes X and Y each send. Then you can try to narrow down what the relevant difference is, and then figure out what part of your code is causing Y to send the wrong data in the relevant place.
You can do this by logging things at the service (if you control it), running Wireshark, etc., but the easiest way, for simple cases, is netcat. You'll need to read man nc for your system (and, on Windows, you'll need to get and install netcat before you can run it), because the syntax is different for each version, but it's always something simple like nc -kl 12345.
Then, in your client, change the URL to use localhost:12345 in place of the hostname, and it'll connect up to netcat and send its HTTP request, which will be dumped to the terminal. You can then copy that and use nc HOST 80 and paste it to see how the real server responds, and use that to narrow down where the problem is. Or, if you get stuck, at least you can copy and paste the data to your SO question.
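If netcat isn't available, a bare-sockets stand-in for the listener side would look roughly like this (accept one connection and dump whatever arrives):
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(('localhost', 12345))
server.listen(1)
conn, _ = server.accept()
print(conn.recv(65536))  # a small HTTP request usually fits in one recv
conn.close()
server.close()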
One last thing: This is almost certainly not relevant to your problem (because you're sending the exact same data with requests and it's working), but your data is not actually valid JSON, because it uses single quotes instead of double quotes. According to the docs, string is defined as:
string
    ""
    " chars "
That is, a JSON string is either an empty pair of double quotes or characters wrapped in double quotes; single quotes are not part of the grammar.
(The docs have a nice graphical representation as well.)
In general, except for really simple test cases, you don't want to write JSON by hand. In many cases (including yours), all you have to do is replace the "…" with json.dumps(…), so this isn't a serious hardship. So:
import json
import urllib2

req = urllib2.Request(url)
req.add_data(json.dumps({'some': 'data'}))
req.add_header('Content-Type', 'application/json; charset=utf-8')
res = urllib2.urlopen(req)
So, why is it working? Well, in JavaScript, single-quoted strings are legal, as well as other things like backslash escapes that aren't valid in JSON, and any JS code that uses restricted-eval (or, worse, raw eval) for parsing will accept it. And, because so many people got used to writing bad JSON because of this, many browsers' native JSON parsers and many JSON libraries in other languages have workarounds to allow common errors. But you shouldn't rely on that.

How do I do JSONP with python on my webspace..?

I just checked my webspace and it's signature says: Apache/2.2.9 (Debian) mod_python/3.3.1 Python/2.5.2 mod_ssl/2.2.9 OpenSSL/0.9.8g
This gives me hope that Python is somehow supported. Why is Python listed twice: mod_python/3.3.1 AND Python/2.5.2?
There is a cgi-bin folder on my webspace.
What I want to do: I need to do a cross-site call to get some text-data from a server. The text-data is not JSON but I guess I should convert it to JSON (or is there an option to do cross-site without JSON?)
The python script gets the request for some JSONP. Depending on the request (I guess I should somehow parse the URL), the python script is to load the requested text-data file from the webserver, wrap it in some JSON, and return it.
Can somebody tell me how I do these three steps with python on my webspace?
First off, the signature isn't listing Python twice. It's listing first the version of mod_python, which is an Apache web server plugin, and then the version of the Python interpreter on the system.
python cgi module - This is really an inefficient approach to writing python server code, but here it is. Ultimately you should consider one of the many amazing python web frameworks out there. But, using the cgi module, your response would always start with this:
print 'Content-Type: application/json\n\n'
Your python script would run on the server from an HTTP request. In that script you would check the request and determine the data you will want to serve from either the URL value or the query string.
At the very least you would just wrap your return value in a basic JSON data structure. The text data itself can just be a string:
import json
text_data = "FOO"
json_data = json.dumps({'text': text_data})
print json_data
# {"text": "FOO"}
For the JSONP aspect, you would usually check the query string to see if the request contains a specific name for the callback function the client wants, or just default to 'callback':
print "callback(%s);" % json_data
# callback({"text": "FOO"});
Returning that makes it a JSONP-style response, because when the client receives it, the callback executes on the client side.
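Putting those pieces together, a complete CGI script might look like this sketch (Python 2, matching the snippets above; data.txt and the callback parameter name are just examples):
#!/usr/bin/env python
import cgi
import json

form = cgi.FieldStorage()
callback = form.getfirst('callback', 'callback')  # client-chosen callback name

text_data = open('data.txt').read()  # the requested text file (example name)
json_data = json.dumps({'text': text_data})

# application/javascript, since the response is an executable callback call
print 'Content-Type: application/javascript\n'
print '%s(%s);' % (callback, json_data)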
And to conclude, let me add that you should be aware that Python CGI scripts need to start a brand-new Python interpreter process for every single request (even repeat requests from the same client). This can easily overwhelm a server under increased load. For this reason, people usually go the WSGI route (mod_wsgi in Apache). WSGI allows a persistent application to keep running and handle ongoing requests.
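For illustration, the same endpoint as a minimal WSGI application (again Python 2; under mod_wsgi this process stays alive between requests):
import json
from urlparse import parse_qs

def application(environ, start_response):
    # Same JSONP logic as the CGI sketch, but as a long-lived WSGI app.
    qs = parse_qs(environ.get('QUERY_STRING', ''))
    callback = qs.get('callback', ['callback'])[0]
    body = '%s(%s);' % (callback, json.dumps({'text': 'FOO'}))
    start_response('200 OK', [('Content-Type', 'application/javascript'),
                              ('Content-Length', str(len(body)))])
    return [body]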
