I'm using Python feedparser in an aggregator client that runs behind a Squid proxy. I want it to send a Cache-Control: max-age=600 header with the request, so that we get a reasonably up-to-date response. (At the moment the proxy returns feeds from its cache even days after they changed, which is reasonable under heuristic expiry but not good enough.)
There doesn't seem to be any direct API in feedparser to do this, so what's the best way? I don't really want to change the source.
Update: there's a bug, issue 224, asking for a way to add arbitrary headers, with partial patches, but not yet merged. That's probably the cleanest way. Otherwise it seems I need to monkeypatch either urllib or feedparser. Ick.
It seems to me there are two ways:
1. Wait for http://code.google.com/p/feedparser/issues/detail?id=224 to be fixed. I put up a patch that lets you send extra_headers={'Cache-control': 'max-age=0'}, and we'll see if they accept it.
2. Monkeypatch urllib2 to put some extra headers on the request, which seems to be the only answer without changing feedparser.
Better answers very welcome...
Update 2010-10-29: the patch is now merged upstream and waiting for a release.
The semantics of the argument have changed (it's called request_headers now), but there is a new release of feedparser out that should support this use case.
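With that release, a minimal sketch (the feed URL is a placeholder):
import feedparser

# Ask intermediate caches for a copy no older than ten minutes.
d = feedparser.parse(
    'http://example.com/feed.xml',    # placeholder feed URL
    request_headers={'Cache-Control': 'max-age=600'},
)
print(d.feed.get('title'), len(d.entries))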
OK, so I am new to Python as a whole, and I don't quite know what I'm looking for to ask this question properly. I know this has to be possible, though. I want to ask before really digging in and finding out I did something wrong and have to do it all over.
All in all, what I want to know is: from the front end of my stack I want to pass down custom HTTP headers (which I can already do with my Ajax calls). The question is, how do I actually read those headers on the server? Similarly, how can I send custom headers back from the server in Python?
You can access a custom header in a Django view through request.META. Django prefixes incoming headers with HTTP_, uppercases them, and replaces dashes with underscores, so a header sent as Custom-Header is read with:
request.META.get("HTTP_CUSTOM_HEADER")
For Django:
How to add an HTTP header to all Django responses
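Putting both directions together in one view, a minimal sketch (the header names and the view are made up for illustration):
from django.http import JsonResponse

def my_view(request):
    # An Ajax call sending 'X-Client-Note: hello' arrives as HTTP_X_CLIENT_NOTE
    note = request.META.get('HTTP_X_CLIENT_NOTE', '')
    response = JsonResponse({'received': note})
    # Custom response headers are set like dictionary items
    response['X-Server-Note'] = 'got it'
    return response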
You can definitely do it on the front end, with JavaScript's native XMLHttpRequest, the newer fetch API, jQuery, or some other library (like axios).
Short version: Can I just use the Requests module for POST, GET, and DELETE?
I'm trying to use the Pinterest REST API. (Pinterest API Explorer)
I'm going the simple route and just manually got my authentication token via OAuth, so basically all I need to know is how to POST, GET, and DELETE to a specific URL, include the parameters, and get JSON back.
I really only need three API functions: list the authorized user's followers (GET), follow a user (POST), and unfollow a user (DELETE). The only param I need for any of those is my access_token, which I got manually.
It seems like a simple problem, but there's about 5 python Pinterest API wrappers, none of them complete, some of them not working at all. I've looked at the pycurl, httplib, and requests modules. They all look like they have a simple enough method for GET, but it gets more complicated with POST and maybe DELETE. It seems like it should be super simple, a function that takes a method (POST/GET/DELETE/etc), a url, and a set of parameters, so why is it more complicated than that? If it were that easy, I don't understand why all these API wrappers would be half done since it theoretically should be as simple as calling a function with those 3 parameters (with an array for the 3rd parameter) for every function in the API.
In the requests Python package (the signature below is actually from urllib3, which requests uses under the hood), there's this function under the RequestMethods class:
def request(self, method, url, fields=None, headers=None, **urlopen_kw)
It looks like I understand everything except what the headers are and the **urlopen_kw, but I think it should work without those two arguments, correct?
I'd appreciate it if someone could point me in the right direction.
From the docs:
Here is an example of doing a PUT request using Request:
import urllib.request

DATA = b'some data'
req = urllib.request.Request(url='http://localhost:8080', data=DATA, method='PUT')
with urllib.request.urlopen(req) as f:
    pass
print(f.status)
print(f.reason)
In your case the method would be 'POST', 'DELETE' or whatever you like.
If you want to make more complex requests, have a look at this guide for the httplib2 library - it's worth reading.
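That said, if you want to stay with the requests library, the three calls map directly onto requests.get, requests.post, and requests.delete. A rough sketch, with made-up endpoint paths (check the API explorer for the real ones):
import requests

ACCESS_TOKEN = 'my-token'                  # obtained manually via OAuth
BASE = 'https://api.pinterest.com/v1'      # hypothetical base URL

params = {'access_token': ACCESS_TOKEN}

# GET: list the authorized user's followers
followers = requests.get(BASE + '/me/followers/', params=params).json()

# POST: follow a user
requests.post(BASE + '/me/following/users/', params=params, data={'user': 'some_user'})

# DELETE: unfollow a user
requests.delete(BASE + '/me/following/users/some_user/', params=params)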
Why does the following print None?
import requests
r = requests.request('http://cnn.com', data={"foo":"bar"})
print r.request.body
# None
If you change cnn.com to www.cnn.com, it prints the proper body. I noticed a redirect (there is a 301 in r.history). What's going on?
Your code as it stands doesn't actually work—it'll raise a TypeError right off the bat. But I think I can guess at what you're trying to do.
If you change that request to a post, it will indeed successfully return None.
Why? Because you're asking for the body of the redirect, not the body of the original request. For that, you want r.history[0].request.body.
Read Redirection and History for more info. Note that auto-redirecting isn't actually documented to work for POST requests, even though it often does anyway. Also note that in earlier versions of requests, history entries didn't have complete Request objects. (You'll have to look at the version history if you need to know when that changed. But it seems to be there in 1.2.0, and not in 0.14.2—and a lot of things that were added or changes in 1.0.0 aren't really documented, because it was a major rewrite.)
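To see the difference concretely, a sketch (assuming a recent requests version and that the host still answers the POST with a redirect):
import requests

r = requests.post('http://cnn.com', data={'foo': 'bar'})
print(r.history)                    # e.g. [<Response [301]>]
print(r.request.body)               # body of the final, post-redirect request
print(r.history[0].request.body)    # 'foo=bar', the body you actually sent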
As a side note… why do you need this? If you really need to know what body you sent, why not do the two-step process of creating a request and sending it, so you can see the body beforehand? (Or, for that matter, just encode the data explicitly?)
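That two-step process is the prepared-request pattern; a minimal sketch:
import requests

req = requests.Request('POST', 'http://www.cnn.com', data={'foo': 'bar'})
prepped = req.prepare()
print(prepped.body)                 # 'foo=bar', visible before anything is sent

s = requests.Session()
resp = s.send(prepped)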
I'm wondering if there's a clever pattern for request-scoping arbitrary information without resorting to either thread-local storage or putting the information in the session.
Really, this is for contextual attributes that I'd like to not look up more than once in a request path, but which are tied to a single request invocation, and there's no good reason to let them thrash around in the session.
Something like a dict that's pinned to the request where I can shove things or lazy-load them. I could write a wrapper for request and swap it out in a middleware, but I figured I'd check what best practice might be here.
Just assign the dictionary directly to the request. You can do that in middleware or in your view, as you like.
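A minimal sketch of that as middleware (the attribute name is made up; older Django versions would use process_request instead):
class RequestScopeMiddleware:
    # Pins an empty dict to every request, for per-request caching.
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        request.scope_cache = {}    # made-up attribute name
        return self.get_response(request)

# In a view, lazily populate and reuse it:
#   profile = request.scope_cache.setdefault('profile', expensive_lookup(request))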
Context processors. They are called once for every request and receive the actual request object, so you can add any data to the context, also based on the current request!
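A sketch of one (it also needs to be listed in the template settings' context_processors):
def request_extras(request):
    # Called with the actual request; the returned dict is merged into the template context
    return {
        'is_ajax': request.META.get('HTTP_X_REQUESTED_WITH') == 'XMLHttpRequest',
    }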
How can I find out which HTTP method my Python CGI script received? I need different behavior for HEAD and GET.
Thanks!
import os
method = os.environ['REQUEST_METHOD']
if method == 'GET':
    pass    # handle a GET here
elif method == 'HEAD':
    pass    # headers only
Why do you need to distinguish between GET and HEAD?
Normally you shouldn't distinguish, and should treat a HEAD request just like a GET. This is because a HEAD request is meant to return the exact same headers as a GET; the only difference is that there will be no response content. Just because there is no response content, though, doesn't mean you no longer have to return a valid Content-Length header, or other headers that depend on the response content.
In mod_wsgi, which various people are pointing you at, it will actually deliberately change the request method from HEAD to GET in certain cases to guard against people who wrongly treat HEAD differently. The specific case where this is done is where an Apache output filter is registered. The reason that it is done in this case is because the output filter may expect to see the response content and from that generate additional response headers. If you were to decide not to bother to generate any response content for a HEAD request, you will deprive the output filter of the content and the headers they add may then not agree with what would be returned from a GET request. The end result of this is that you can stuff up caches and the operation of the browser.
The same can apply equally for CGI scripts behind Apache as output filters can still be added in that case as well. For CGI scripts there is nothing in place though to protect against users being stupid and doing things differently for a HEAD request.
This is not a direct answer to your question. But your question stems from doing things the wrong way.
Do not write Python CGI scripts.
Write a mod_wsgi application. Better still, use a Python web framework. There are dozens. Choose one like Werkzeug.
The WSGI standard (described in PEP 333) makes it much, much easier to find things in the web request.
The mod_wsgi implementation is faster and more secure than a CGI.
A web framework is also simpler than writing your own CGI script or mod_wsgi application.
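For illustration, here is a minimal WSGI application (just a sketch): the request method, headers, and the rest of the request arrive in the environ dict, so there is nothing CGI-specific to parse.
def application(environ, start_response):
    # REQUEST_METHOD, PATH_INFO, HTTP_* headers, etc. are all in environ
    body = ('You sent a %s request\n' % environ['REQUEST_METHOD']).encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]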