I have a simple web.py app, shown below, deployed with mod_wsgi in Apache.
import web

urls = (
    '/', 'index'
)

class index:
    def GET(self):
        content = 'hello'
        web.header('Content-Length', len(content))
        return content

app = web.application(urls, globals())
application = app.wsgifunc()
This site runs well, except for one minor issue: when mod_deflate is turned on, the response is chunked, even though it has a very small body.
Response headers:
HTTP/1.1 200 OK
Date: Wed, 20 May 2015 20:14:12 GMT
Server: Apache/2.4.7 (Ubuntu)
Vary: Accept-Encoding
Content-Encoding: gzip
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html
When mod_deflate is turned off, the Content-Length header comes back.
HTTP/1.1 200 OK
Date: Wed, 20 May 2015 20:30:09 GMT
Server: Apache/2.4.7 (Ubuntu)
Content-Length: 5
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=utf-8
I've searched around and someone said reducing DeflateBufferSize would help, but this response body is only 5 bytes, far below the directive's default value of 8096, so I don't think it interferes with this issue.
Someone else said Apache sends a chunked response because it doesn't know the response's size before it begins sending it to the client, but in my code I do set Content-Length.
I've also tried Flask and Apache/2.2.15 (CentOS), with the same result.
How do I set Content-Length when mod_deflate is enabled? I'd rather not gzip the content in Python.
The response's Content-Length has to reflect the final length of the data sent after compression, not the original length, so mod_deflate has to remove the original Content-Length header and use chunked transfer encoding. The only other way it could know the content length before sending the compressed data would be to buffer the complete compressed response in memory or in a file and then calculate the length. Buffering all the compressed content isn't practical, and it partly defeats the point of compressing the data as the response is streamed.
If you don't want mod_deflate enabled for the whole site, then only enable it for certain URL prefixes by scoping it within a Location block.
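As a minimal sketch of that scoping (assuming, purely for illustration, that only responses under a /static prefix should be compressed):

# Hypothetical httpd.conf fragment: enable DEFLATE only for /static,
# so small dynamic responses elsewhere keep their Content-Length.
<Location "/static">
    SetOutputFilter DEFLATE
</Location>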
Related
I am trying to create a simple API for my site. I created the route with Flask:
@api.route('/api/rate&message_id=<message_id>&performer=<performer_login>', methods=['POST'])
def api_rate_msg(message_id, performer_login):
    print("RATE API ", message_id, ' ', performer_login)
    return 400
The print(...) call doesn't execute...
I use Flask-SocketIO to communicate between client and server.
I send JSON from the client and process it with:
@socket.on('rate')
def handle_rate(data):
    print(data)
    payload = {'message_id': data['message_id'], 'performer': data['performer']}
    r = requests.post('/api/rate', params=payload)
    print(r.status_code)
Note that the data variable is sent from the client and is correct (I've checked it).
print(r.status_code) doesn't execute either...
Where am I wrong? Please excuse my bad English :(
In case it's of interest, this API function is supposed to increase the rating of a message stored in MongoDB.
Don't put &message_id=<message_id>&performer=<performer_login> in your route string. Instead, get these arguments from request.args.
Try it:
from flask import request
...

@api.route('/api/rate', methods=['POST'])
def api_rate_msg():
    print(request.args)
    return ''
I've tested it with httpie:
$ http -v POST :5000/api/rate message_id==123 performer_login==foo
POST /api/rate?message_id=123&performer_login=foo HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 0
Host: localhost:5000
User-Agent: HTTPie/0.9.8
HTTP/1.0 200 OK
Content-Length: 0
Content-Type: text/html; charset=utf-8
Date: Sun, 02 Apr 2017 13:54:40 GMT
Server: Werkzeug/0.11.11 Python/2.7.13
And from Flask's log:
ImmutableMultiDict([('message_id', u'123'), ('performer_login', u'foo')])
127.0.0.1 - - [02/Apr/2017 22:54:40] "POST /api/rate?message_id=123&performer_login=foo HTTP/1.1" 200 -
Remove the part below from your API route:
&message_id=<message_id>&performer=<performer_login>
Query parameters are not matched against the route string, in POST or GET requests. The incoming API call doesn't match the route definition, and that is why you have the current problem.
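If you do want the two values in the URL itself, one possible alternative is to make them path segments rather than a query string; this route shape is an illustration, not from the original post:

@api.route('/api/rate/<message_id>/<performer_login>', methods=['POST'])
def api_rate_msg(message_id, performer_login):
    # Path segments take part in route matching, so they arrive as arguments.
    print("RATE API", message_id, performer_login)
    return ''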
I am trying to perform a simple action:
POST to a URL
Return HTTP 303 (SeeOther)
GET from new URL
From what I can tell, this is a pretty standard practice:
http://en.wikipedia.org/wiki/Post/Redirect/Get
Also, it would seem that SeeOther is designed to work this way:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.3.4
I'm using web.py as my server-side controller, but I suspect that it's not the issue. If I GET, SeeOther works flawlessly as expected. If I POST to the same URL, the browser fails to redirect or load anything at all.
Thinking it was a browser issue, I tried both IE9 and Google Chrome (v23 ish). Both have the same issue.
Thinking web.py might be serving the page incorrectly, or generating a bad URL, I used telnet to examine the headers. I found this:
HTTP GET (this works in the browser):
GET /Users/1 HTTP/1.1
HOST: domain.com
HTTP/1.1 303 See Other
Date: Mon, 24 Dec 2012 18:07:55 GMT
Server: Apache/2
Cache-control: no-cache
Location: http://domain.com/Users
Content-Length: 0
Content-Type: text/html
HTTP POST (this does not work in the browser):
POST /Users/1 HTTP/1.1
HOST: domain.com
HTTP/1.1 303 See Other
Date: Mon, 24 Dec 2012 18:12:35 GMT
Server: Apache/2
Cache-control: no-cache
Location: http://domain.com/Users
Content-Length: 0
Content-Type: text/html
Another thing that could be throwing a wrench in the works:
I'm using mod_rewrite, so the user-visible domain.com/Users/1 is actually domain.com/control.py/Users/1.
There may be more information/troubleshooting that I have, but I'm drawing a blank right now.
The Question:
Why does this work with a GET request, but not a POST request? Am I missing a response header somewhere?
EDIT:
Using IE9 Developer Tools and Chrome's Inspector, it looks like the 303 isn't coming back to the browser after a POST, although I can see the 303 come in when I do a GET request.
Looking more closely at Chrome's Inspector, I noticed the option to log every request (rather than clearing on each page load). That let me see that, for some reason, my POST request appears to be failing. Again, GET works just fine.
It's entirely possible that this isn't your issue, but since you don't have your code posted I'll take a shot (just in case).
Since you're using web.py, do you have the POST method defined on your object?
i.e.
urls = (
    '/page', 'page'
)

class page:
    def POST(self):
        # Do something
        pass

    def GET(self):
        # Do something else
        pass
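For the Post/Redirect/Get flow in the question, a minimal sketch of how the POST side might answer with a 303 in web.py (the /Users URLs are assumed from the headers shown above):

import web

urls = (
    '/Users', 'users',
    '/Users/(\d+)', 'user'
)

class user:
    def POST(self, user_id):
        # ... apply the update here, then redirect the browser ...
        raise web.seeother('/Users')  # web.py responds with 303 See Other

class users:
    def GET(self):
        return 'user list'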
I have a strange problem that I've been trying to google away for several hours.
I've also tried solutions from similar topics on Stack Overflow, but still with no positive result:
How do I set cookies using Python urlopen?
Handling rss redirects with Python/urllib2
The case is this: I want to download a whole set of articles from a webpage. The sub-links to the actual content differ by just one number, so I loop over the whole range (1 to 400,000) and write the HTML to files. What's important here is that this webpage needs its cookies re-sent in order to reach the proper URL, and after reading How to use Python to login to a webpage and retrieve cookies for later usage? I have this done.
But sometimes my script returns an error:
response = meth(req, response)
File "/usr/lib/python3.1/urllib/request.py", line 468, in http_response
'http', request, response, code, msg, hdrs)
....
File "/usr/lib/python3.1/urllib/request.py", line 553, in http_error_302 self.inf_msg + msg, headers, fp)
urllib.error.HTTPError: HTTP Error 302: The HTTP server returned a redirect error that would lead to an infinite loop.
The last 30x error message was:
Found
This problem is hard to reproduce because the script generally works fine, but the error appears randomly after a few thousand iterations of the for loop.
Here is the curl output from the server:
$ curl -I "http://my.url/"
HTTP/1.1 200 OK
Date: Wed, 17 Oct 2012 10:14:13 GMT
Server: Apache/2.2.15 (Oracle)
X-Powered-By: PHP/5.3.3
Set-Cookie: Kuuxk=ae7s3isu2cEshhijte4nb1clk5; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Vary: Accept-Encoding
Connection: close
Content-Type: text/html; charset=UTF-8
Some folks suggested using mechanize or trying to catch the exception, but I have no clue how to do either; others said the error is caused by wrong cookie handling, but I also tried to get and send the cookies 'manually', using urllib2 and add_header('cookie', cookie), with a similar result.
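For reference, a minimal sketch of the stdlib cookie-jar approach (Python 3, to match the traceback above; http://my.url/ is the placeholder host from the curl output):

import http.cookiejar
import urllib.request

# The jar records cookies from Set-Cookie and re-sends them automatically.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
response = opener.open('http://my.url/')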
I wonder if my for loop, and maybe a too-short sleep, might cause the script to fail sometimes.
Anyway, any help is appreciated.
edit:
In case this might work: how do I catch the exception and ignore it?
edit:
Solved by simply ignoring this error. Now everything goes fine.
I used
try:
    ...  # here open the URL
except urllib.error.HTTPError:
    pass
around each place where I open a URL.
TO BE CLOSED.
Let me suggest another solution:
HTTP status code 302 means Found redirection (See: https://en.wikipedia.org/wiki/HTTP_302).
For example:
HTTP/1.1 302 Found
Location: http://www.iana.org/domains/example/
You can grab the Location header and try fetching this url.
There are 8 redirection status codes (301-308). You can check for the Location header if 301 <= status code <= 308.
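A minimal sketch of doing that by hand (Python 3, to match the traceback in the question; the single follow-up request is an illustration, and a real crawler would cap the number of hops to avoid the infinite loop the error warns about):

import urllib.request
import urllib.error

def fetch_following_one_redirect(url):
    # urlopen normally follows redirects itself; this handles the case
    # where it gives up and surfaces the 30x as an HTTPError instead.
    try:
        return urllib.request.urlopen(url)
    except urllib.error.HTTPError as err:
        if 301 <= err.code <= 308 and err.headers.get('Location'):
            return urllib.request.urlopen(err.headers['Location'])
        raise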
I just want a better idea of what's going on here; I can of course work around the problem by using urllib2.
import urllib
import urllib2
url = "http://www.crutchfield.com/S-pqvJFyfA8KG/p_15410415/Dynamat-10415-Xtreme-Speaker-Kit.html"
# urllib2 works fine (foo.headers / foo.read() also behave)
foo = urllib2.urlopen(url)
# urllib throws errors though, what specifically is causing this?
bar = urllib.urlopen(url)
http://pae.st/AxDW/ shows this code in action with the exception/stacktrace. foo.headers and foo.read() both behave fine.
stu#sente.cc ~ $: curl -I "http://www.crutchfield.com/S-pqvJFyfA8KG/p_15410415/Dynamat-10415-Xtreme-Speaker-Kit.html"
HTTP/1.1 302 Object Moved
Cache-Control: private
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8
Location: /S-FSTWJcduy5w/p_15410415/Dynamat-10415-Xtreme-Speaker-Kit.html
Server: Microsoft-IIS/7.5
Set-Cookie: SESSIONID=FSTWJcduy5w; domain=.crutchfield.com; expires=Fri, 22-Feb-2013 22:06:43 GMT; path=/
Set-Cookie: SYSTEMID=0; domain=.crutchfield.com; expires=Fri, 22-Feb-2013 22:06:43 GMT; path=/
Set-Cookie: SESSIONDATE=02/23/2012 17:07:00; domain=.crutchfield.com; expires=Fri, 22-Feb-2013 22:06:43 GMT; path=/
X-AspNet-Version: 4.0.30319
HostName: cws105
Date: Thu, 23 Feb 2012 22:06:43 GMT
Thanks.
This server is both non-deterministic and sensitive to HTTP version. urllib2 is HTTP/1.1, urllib is HTTP/1.0. You can reproduce this by running curl --http1.0 -I "http://www.crutchfield.com/S-pqvJFyfA8KG/p_15410415/Dynamat-10415-Xtreme-Speaker-Kit.html"
a few times in a row. You should see the output curl: (52) Empty reply from server occasionally; that's the error urllib is reporting. (If you re-issue the request a bunch of times with urllib, it should succeed sometimes.)
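If you want to reproduce it from Python rather than curl, one (admittedly blunt) sketch is to force urllib2's underlying httplib connections down to HTTP/1.0, which is what urllib speaks; note this class-level override affects every connection in the process:

import httplib  # Python 2, matching the code in the question

# Make urllib2/httplib requests go out as HTTP/1.0, mimicking urllib.
httplib.HTTPConnection._http_vsn = 10
httplib.HTTPConnection._http_vsn_str = 'HTTP/1.0'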
I solved the problem. I'm now simply using urllib instead of urllib2 and everything works fine. Thank you all :)
I wrote a crawler in Python. The fetched URLs are of different types: a URL may point to an HTML page, an image, a big archive, or some other file. So I need to determine the type quickly, to avoid reading big files such as large archives, and continue crawling. What is the best way to determine the URL type at the start of page loading?
I understand I can do it from the URL name (whether it ends with .rar, .jpg, etc.), but I don't think that's a complete solution. Do I need to check a header or something like that? I also need some way to predict the page size, to prevent large downloads. In other words, I want to set a limit on the downloaded page size to keep memory from being eaten quickly.
If you use an HTTP HEAD request on the resource, you will get the relevant metadata about the resource without the resource data itself. Specifically, the Content-Length and Content-Type headers will be of interest.
E.g.
HEAD /stackoverflow/img/favicon.ico HTTP/1.1
host: sstatic.net
HTTP/1.1 200 OK
Cache-Control: max-age=604800
Content-Length: 1150
Content-Type: image/x-icon
Last-Modified: Mon, 02 Aug 2010 06:04:04 GMT
Accept-Ranges: bytes
ETag: "2187d82832cb1:0"
X-Powered-By: ASP.NET
Date: Sun, 12 Sep 2010 13:38:36 GMT
You can do this in Python using httplib:
>>> import httplib
>>> conn = httplib.HTTPConnection("sstatic.net")
>>> conn.request("HEAD", "/stackoverflow/img/favicon.ico")
>>> res = conn.getresponse()
>>> print res.getheaders()
[('content-length', '1150'), ('x-powered-by', 'ASP.NET'), ('accept-ranges', 'bytes'), ('last-modified', 'Mon, 02 Aug 2010 06:04:04 GMT'), ('etag', '"2187d82832cb1:0"'), ('cache-control', 'max-age=604800'), ('date', 'Sun, 12 Sep 2010 13:39:26 GMT'), ('content-type', 'image/x-icon')]
This tells you it's an image (image/* mime-type) of 1150 bytes. Enough information for you to decide if you want to fetch the full resource.
Additionally, this header tells you that the server accepts HTTP partial content requests (the Accept-Ranges header), which lets you retrieve the data in batches.
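A short sketch of using that, in the same httplib style as above (the 1024-byte window is an arbitrary choice for illustration):

import httplib

conn = httplib.HTTPConnection("sstatic.net")
# Ask for only the first KB; honored because the server sent Accept-Ranges: bytes.
conn.request("GET", "/stackoverflow/img/favicon.ico",
             headers={"Range": "bytes=0-1023"})
res = conn.getresponse()
print res.status  # 206 Partial Content when the range is honored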
You will get the same header information if you do a GET directly, but this will also start sending the resource data in the body of the response, something you want to avoid.
If you want to learn more about HTTP headers and their meanings, you can use an online tool such as 'Fetch'.