Why does the following print None?
import requests
r = requests.request('http://cnn.com', data={"foo":"bar"})
print r.request.body
# None
If you change cnn.com to www.cnn.com, it prints the proper body. I noticed a redirect (there is a 301 in r.history). What's going on?
Your code as it stands doesn't actually work: requests.request() takes the HTTP method as its first argument, so passing the URL there raises a TypeError right off the bat. But I think I can guess at what you're trying to do.
If you change that call to requests.post('http://cnn.com', data={"foo": "bar"}), then r.request.body does indeed come back as None.
Why? Because you're asking for the body of the final request made after the redirect, not the body of the original request. For that, you want r.history[0].request.body.
Read Redirection and History for more info. Note that auto-redirecting isn't actually documented to work for POST requests, even though it often does anyway. Also note that in earlier versions of requests, history entries didn't have complete Request objects. (You'll have to look at the version history if you need to know when that changed. But it seems to be there in 1.2.0, and not in 0.14.2, and a lot of things that were added or changed in 1.0.0 aren't really documented, because it was a major rewrite.)
As a side note… why do you need this? If you really need to know what body you sent, why not do the two-step process of creating a request and sending it, so you can see the body beforehand? (Or, for that matter, just encode the data explicitly?)
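For example, here's a minimal sketch of that two-step approach (still using www.cnn.com as a placeholder URL): you can inspect the encoded body before anything is sent, and the original request is still available from the redirect history afterwards.

import requests

# Build and prepare the request without sending it, so the body is visible up front.
req = requests.Request('POST', 'http://www.cnn.com', data={'foo': 'bar'})
prepared = req.prepare()
print(prepared.body)  # 'foo=bar'

# Send it; if a redirect happened, the original POST lives in r.history.
session = requests.Session()
r = session.send(prepared)
if r.history:
    print(r.history[0].request.body)  # body of the request that got redirected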
Related
Short version: Can I just use the Requests module for POST, GET, and DELETE?
I'm trying to use the Pinterest REST API. (Pinterest API Explorer)
I'm going the simple route and just manually got my authentication token via oauth, so basically all I need to know how to do is POST, GET, and DELETE to a specific URL and also include the parameters, then return a json.
I really only need three API functions, list authorized user's followers (GET), follow user (POST), and unfollow user (DELETE). The only param I need for any of those is my access_token that I got manually.
It seems like a simple problem, but there's about 5 python Pinterest API wrappers, none of them complete, some of them not working at all. I've looked at the pycurl, httplib, and requests modules. They all look like they have a simple enough method for GET, but it gets more complicated with POST and maybe DELETE. It seems like it should be super simple, a function that takes a method (POST/GET/DELETE/etc), a url, and a set of parameters, so why is it more complicated than that? If it were that easy, I don't understand why all these API wrappers would be half done since it theoretically should be as simple as calling a function with those 3 parameters (with an array for the 3rd parameter) for every function in the API.
In the Requests python package, there's this function under the RequestMethods class:
def request(self, method, url, fields=None, headers=None, **urlopen_kw)
Looks like I understand everything except what the headers are and the **urlopen_kw, but I think it should work without those two variables, correct?
I'd appreciate it if someone could point me in the right direction.
From the docs:
Here is an example of doing a PUT request using Request:
import urllib.request
DATA = b'some data'
req = urllib.request.Request(url='http://localhost:8080', data=DATA, method='PUT')
with urllib.request.urlopen(req) as f:
    pass

print(f.status)
print(f.reason)
In your case the method would be 'POST', 'DELETE', or whatever you like.
If you want to make more complex requests, have a look at this guide for the httplib2 library - it's worth reading.
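If you go with the requests library instead, each of your three operations really is a single call. Here's a minimal sketch assuming Pinterest-style endpoints; the URLs and field names below are illustrative placeholders, not the real API paths:

import requests

ACCESS_TOKEN = 'your-token-here'  # the token you obtained manually via OAuth

# GET: list the authorized user's followers
r = requests.get('https://api.pinterest.com/v1/me/followers/',
                 params={'access_token': ACCESS_TOKEN})
print(r.json())

# POST: follow a user
r = requests.post('https://api.pinterest.com/v1/me/following/users/',
                  data={'access_token': ACCESS_TOKEN, 'user': 'some_username'})

# DELETE: unfollow a user
r = requests.delete('https://api.pinterest.com/v1/me/following/users/some_username/',
                    params={'access_token': ACCESS_TOKEN})
print(r.status_code)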
I want to make a JSON request with the Python library requests where I only obtain certain JSON objects.
I know that it is really easy to process the JSON object I get back and pick out only the needed information, but downloading and parsing the full response every time hurts efficiency when the request is made repeatedly.
As said, I know this is a possibility:
url = 'http://www.foo.com'
r = requests.get(url).json()
#Do something with r[3]['data4'], the only one who is going to be used.
But how could I directly only obtain r[3]['data4'] from the request?
Short Answer
To answer your question: no, you can't. To understand why, you need to know what is happening behind the scenes.
Behind the scenes
When you make a request such as r = requests.get('http://www.foo.bar'), you are making a request to the server, and when you call r.json() you are parsing the result of that request. This means you cannot just get r[3]['data4'] unless the server only sends r[3]['data4']; you are always parsing whatever the server sends you. It may be possible to filter out everything else during response processing, but I am unaware of how to do it.
You can't, if the server does not allow it. If the target server lets you specify the fields you want, then you can send that field list in your request and the server will return only those fields in the JSON. Otherwise you will have to parse the full JSON response and pick out your desired fields.
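For example, many APIs expose something like a fields query parameter for exactly this. The parameter name and URL below are assumptions, so check whether your API actually supports such a filter:

import requests

url = 'http://www.foo.com/api/items'  # placeholder URL
# Ask the server to return only the field we care about, if it supports filtering.
r = requests.get(url, params={'fields': 'data4'})
data = r.json()

# If the server ignores the parameter, you still have to filter client-side.
print(data[3]['data4'])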
I need to create software in Python that monitors sites for changes. At the moment I have a periodic task that compares the content of a site with the previous version. Is there any easier way to check whether the content of a site has changed, maybe the time of the last change, so I can avoid downloading the content every time?
You could use the HEAD HTTP method and look at the Last-Modified and ETag headers, etc. before actually downloading the full content again.
However nothing guarantees that the server will actually update these headers when the entity's (URL's) content changes, or indeed even respond properly to the HEAD method.
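A minimal sketch with the requests library (the URL is a placeholder), comparing the Last-Modified and ETag headers from a HEAD request against the values saved from the previous check:

import requests

url = 'http://example.com/page'  # placeholder
previous = {'etag': None, 'last_modified': None}  # load these from wherever you persist state

head = requests.head(url, allow_redirects=True)
etag = head.headers.get('ETag')
last_modified = head.headers.get('Last-Modified')

if etag != previous['etag'] or last_modified != previous['last_modified']:
    # The headers changed (or the server didn't send them at all),
    # so download the full content and remember the new values.
    body = requests.get(url).text
    previous.update(etag=etag, last_modified=last_modified)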
Although it doesn't answer your question, I think it's worth mentioning that you don't have to store the previous version of the website to look for changes. You can just compute an MD5 hash of it and store that, then compute the hash of the new version and check whether they are equal.
And about the question itself, AKX gave a great answer: just look at the Last-Modified header, but remember it is not guaranteed to work.
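A short sketch of that hashing approach, again with a placeholder URL:

import hashlib
import requests

url = 'http://example.com/page'  # placeholder
stored_digest = None  # the digest saved on the previous run

content = requests.get(url).content
digest = hashlib.md5(content).hexdigest()

if digest != stored_digest:
    print('content changed')
    stored_digest = digest  # persist this for the next run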
I'm using python feedparser in an aggregator client that runs behind a squid proxy. I want it to send a cache-control: max-age=600 header in the request, so that we get a reasonably up-to-date response. (At the moment the feeds are returned by the proxy from its cache, even days after they changed, which is reasonable based on heuristic expiry but not good enough.)
There doesn't seem to be any direct api in feedparser to do this so what's the best way? I don't really want to change the source.
Update: there's a bug, 224, asking for a way to add arbitrary headers, with partial patches, but not yet merged. That's probably the cleanest way. Otherwise it seems I need to monkeypatch either urllib or feedparser. Ick.
It seems to me there are two ways:
1- wait for http://code.google.com/p/feedparser/issues/detail?id=224 to be fixed. I put up a patch that lets you send extra_headers={'Cache-control': 'max-age=0'} and we'll see if they accept it.
2- monkeypatch urllib2 to put some extra headers on the request, which seems to be the only answer without changing feedparser.
Better answers very welcome...
Update 2010-10-29: the patch is now merged upstream and is waiting for a release.
The semantics of the argument have changed (it's called request_headers now) but there is a new release of feedparser out that should support this use case.
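With a release that includes the merged patch, passing the header through feedparser should look roughly like this (the feed URL is a placeholder):

import feedparser

# request_headers is the keyword the merged patch ended up with; it is only
# available in feedparser releases that include that change.
feed = feedparser.parse('http://example.com/atom.xml',
                        request_headers={'Cache-Control': 'max-age=600'})
print(feed.feed.get('title'))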
How can I find out which HTTP method the request to my Python CGI used? I need different behaviors for HEAD and GET.
Thanks!
import os

if os.environ['REQUEST_METHOD'] == 'GET':
    pass  # handle a GET request here
Why do you need to distinguish between GET and HEAD?
Normally you shouldn't distinguish and should treat a HEAD request just like a GET. This is because a HEAD request is meant to return the exact same headers as a GET. The only difference is there will be no response content. Just because there is no response content though doesn't mean you no longer have to return a valid Content-Length header, or other headers, which are dependent on the response content.
In mod_wsgi, which various people are pointing you at, it will actually deliberately change the request method from HEAD to GET in certain cases to guard against people who wrongly treat HEAD differently. The specific case where this is done is where an Apache output filter is registered. The reason that it is done in this case is because the output filter may expect to see the response content and from that generate additional response headers. If you were to decide not to bother to generate any response content for a HEAD request, you will deprive the output filter of the content and the headers they add may then not agree with what would be returned from a GET request. The end result of this is that you can stuff up caches and the operation of the browser.
The same can apply equally for CGI scripts behind Apache as output filters can still be added in that case as well. For CGI scripts there is nothing in place though to protect against users being stupid and doing things differently for a HEAD request.
This is not a direct answer to your question. But your question stems from doing things the wrong way.
Do not write Python CGI scripts.
Write a mod_wsgi application. Better still, use a Python web framework. There are dozens. Choose one like Werkzeug.
The WSGI standard (described in PEP 333) makes it much, much easier to find things in the web request.
The mod_wsgi implementation is faster and more secure than a CGI.
A web framework is also simpler than writing your own CGI script or mod_wsgi application.
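For comparison, here is a minimal WSGI application sketch; the request method arrives in the environ dict, with no CGI machinery involved:

def application(environ, start_response):
    # WSGI hands you the request metadata as a plain dict, including the method.
    method = environ['REQUEST_METHOD']  # e.g. 'GET' or 'HEAD'
    body = ('Request method was %s\n' % method).encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]

# For local testing this can be served with the wsgiref reference server.
if __name__ == '__main__':
    from wsgiref.simple_server import make_server
    make_server('', 8000, application).serve_forever()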