I'm dealing with HTTPS and I want to get the HTTP headers for live.com:
import urllib2

try:
    email = "HelloWorld1234560@hotmail.com"
    response = urllib2.urlopen("https://signup.live.com/checkavail.aspx?chkavail="+email+"&tk=1258056184535&ru=http%3a%2f%2fmail.live.com%2f%3frru%3dinbox&wa=wsignin1.0&rpsnv=11&ct=1258055283&rver=6.0.5285.0&wp=MBI&wreply=http:%2F%2Fmail.live.com%2Fdefault.aspx&lc=1036&id=64855&bk=1258055288&rollrs=12&lic=1")
    print 'response headers: "%s"' % response.info()
except IOError, e:
    if hasattr(e, 'code'):  # HTTPError
        print 'http error code: ', e.code
    elif hasattr(e, 'reason'):  # URLError
        print "can't connect, reason: ", e.reason
    else:
        raise
I don't want all the information from the headers; I just want the Set-Cookie information.
In case you are wondering what the script does: it checks whether an email address is available on Hotmail by reading the value of the CheckAvail= variable.
Edit:
Thanks for the help. After changing the code to get only Set-Cookie, I have another problem: the cookie I get back does not contain CheckAvail=. I get a lot of information, but no CheckAvail=. When I open the URL in a browser and view the source, it is there (see the picture).
The object returned by response.info() is an instance of mimetools.Message (as described by the urllib2 docs), which is a subclass of rfc822.Message, which has a getheader() method.
So you can do the following:
response = urllib2.urlopen("...")
print response.info().getheader("Set-Cookie") # get the value of the Set-Cookie header
However, if you are checking for mail, I would recommend you to use POP3 or IMAP if available (Python comes with modules for both).
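For readers on Python 3, where urllib2 became urllib.request, a minimal sketch of the same idea follows. It assumes that response.headers is an email.message.Message subclass, so get_all() returns every Set-Cookie line instead of only the first one. The helper names and the cookie values in the offline demo are made up for illustration.

```python
from email.message import Message
from urllib.request import urlopen


def set_cookie_headers(headers):
    """Return every Set-Cookie value from a headers object as a list."""
    # get_all() returns None when the header is absent, so fall back to [].
    return headers.get_all("Set-Cookie") or []


def fetch_set_cookie(url, timeout=10):
    """Open the URL and return only its Set-Cookie header values."""
    with urlopen(url, timeout=timeout) as response:
        return set_cookie_headers(response.headers)


# Offline demonstration with a hand-built header block (hypothetical values).
demo = Message()
demo["Set-Cookie"] = "CheckAvail=1; path=/"
demo["Set-Cookie"] = "session=abc; HttpOnly"
demo["Content-Type"] = "text/html"
print(set_cookie_headers(demo))
```

Calling fetch_set_cookie() with the signup URL from the question would perform the real request; the demo above only exercises the header filtering.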
It's because of the HttpOnly flag in your HTTP response. By specification, a cookie marked HttpOnly is only available over the HTTP connection itself and is hidden from client-side scripts.
I need to capture the response body for an HTTP error in Python. I am currently using the Python requests module's raise_for_status(), which only reports the status code and description. I need a way to capture the response body for a detailed error log.
Please suggest alternatives to the Python requests module if a different module provides this feature. If not, please suggest what changes can be made to the existing code to capture the response body.
The current implementation contains just the following:
resp.raise_for_status()
I guess I'll write this up quickly. This is working fine for me:
import requests

try:
    r = requests.get('https://www.google.com/404')
    r.raise_for_status()
except requests.exceptions.HTTPError as err:
    print(err.request.url)
    print(err)
    print(err.response.text)  # the body of the error response
You can do something like the following, which returns the content of the response as unicode:
response.text
or
import sys

import requests

try:
    r = requests.get('http://www.google.com/nothere')
    r.raise_for_status()
except requests.exceptions.HTTPError as err:
    print(err)
    sys.exit(1)
# 404 Client Error: Not Found for url: http://www.google.com/nothere
For a full explanation of how to handle the exception, see "Correct way to try/except using Python requests module?".
You can log resp.text if resp.status_code >= 400.
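Since the question also asks about alternatives to requests, here is a hedged sketch using only the Python 3 standard library. It relies on urllib.error.HTTPError being a file-like object whose read() returns the error body. The local test server, its JSON payload, and the names AlwaysNotFound and fetch_with_error_body are assumptions made so the example runs without network access.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError
from urllib.request import ProxyHandler, build_opener


class AlwaysNotFound(BaseHTTPRequestHandler):
    """Hypothetical server that answers every GET with a 404 and a body."""

    def do_GET(self):
        body = b'{"error": "no such resource"}'
        self.send_response(404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


# An empty ProxyHandler makes the demo ignore any proxy environment variables.
opener = build_opener(ProxyHandler({}))


def fetch_with_error_body(url):
    """Return (status, body) for both successful and failed requests."""
    try:
        with opener.open(url, timeout=5) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except HTTPError as err:
        # HTTPError wraps the response, so the error body is still readable.
        body = err.read().decode("utf-8", "replace")
        err.close()
        return err.code, body


server = HTTPServer(("127.0.0.1", 0), AlwaysNotFound)
threading.Thread(target=server.serve_forever, daemon=True).start()
status, body = fetch_with_error_body(
    "http://127.0.0.1:%d/missing" % server.server_port)
server.shutdown()
server.server_close()
print(status, body)
```

The returned body can then be written to the error log alongside the status code.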
There are also tools you could use, such as Fiddler, Charles, or Wireshark.
However, those tools only display the body of the response; they do not include the reason or the stack trace explaining why the error was raised.
I want to check whether a URL is reachable from Python, and I found this code by googling:
import http.client
from urllib.parse import urlparse

def checkUrl(url):
    p = urlparse(url)
    conn = http.client.HTTPConnection(p.netloc)
    conn.request('HEAD', p.path)
    resp = conn.getresponse()
    return resp.status < 400
Here is my URL: https://eurotableau.nomisonline.com.
It works fine if I just pass that in to the function: resp.status is 302. However, if I add port 443 at the end, https://eurotableau.nomisonline.com:443, it returns False and resp.status is 400. I tried both URLs in Google Chrome and both of them work. So why is this happening? Is there any way I can include the port value and still get a valid resp.status value (< 400)? Thanks.
Use http.client.HTTPSConnection instead. The plain old HTTPConnection ignores the protocol that is part of the URL.
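A minimal sketch of that suggestion, assuming the scheme should decide the connection class. The likely cause of the 400 is that HTTPConnection speaks plain HTTP to whatever port it is given, so with :443 in the netloc it sends unencrypted HTTP to the TLS port. The helper names open_connection and check_url are made up for this example.

```python
import http.client
from urllib.parse import urlparse


def open_connection(url, timeout=10):
    """Pick the connection class from the URL scheme (nothing is sent yet)."""
    p = urlparse(url)
    if p.scheme == "https":
        # p.port is None when the URL has no explicit port; the class then
        # falls back to its default (443 for HTTPS, 80 for HTTP).
        return http.client.HTTPSConnection(p.hostname, p.port, timeout=timeout)
    return http.client.HTTPConnection(p.hostname, p.port, timeout=timeout)


def check_url(url):
    """Send a HEAD request and report whether the status is below 400."""
    p = urlparse(url)
    conn = open_connection(url)
    try:
        conn.request("HEAD", p.path or "/")
        return conn.getresponse().status < 400
    finally:
        conn.close()


# Constructing the connection does not touch the network.
conn = open_connection("https://eurotableau.nomisonline.com:443")
print(type(conn).__name__, conn.host, conn.port)
```

With this version, the URL with and without :443 should both be handled over TLS, because the explicit port and the default port are the same.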
If you do not require the HEAD method and just wish to check whether the host is available, why not do this:
from urllib2 import urlopen

try:
    u = urlopen("https://eurotableau.nomisonline.com")
    u.close()
    print "Everything fine!"
except Exception, e:
    if hasattr(e, "code"):
        print "Server is there but something is wrong with rest of URL"
    else:
        print "Server is on vacations or was never there!"
    print e
This will establish a connection with the server, but it won't download any data unless you read it. It will only read a few KB to get the headers (as with the HEAD method) and then wait for you to request more, but you close the connection at that point.
So you can catch the exception and see what the problem is or, if there is no exception, just close the connection.
urllib2 will neatly handle HTTPS and protocol://user@URL:PORT for you.
No worries about anything.
Why can't I change the method to PUT? Can I switch to PUT without too many code changes?
Here is my code:
import cookielib
import urllib2

report = ""
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)
# code to change method to PUT
opener.get_method = lambda: 'PUT'
print "now using method:", opener.get_method()  # prints now using PUT
try:
    r = opener.open("http://the_url")
except urllib2.HTTPError as e:
    if hasattr(e, 'code'):
        report += "HTTP error status " + str(e.code) + " FAIL\n"
        if hasattr(e, 'reason'):
            print "HTTP Error reason " + e.reason
    else:
        report += "HTTP error occurred FAIL\n"
But I get this runtime error:
HTTP Error reason Request method 'POST' not supported
PUT session test
HTTP error status 405 FAIL
It seems urllib2 only supports GET and POST out of the box. I decided to use the Requests library instead.
The opener.get_method = lambda: 'PUT' line is some code I found on the web. It doesn't actually change the verb used to send the request, even though calling get_method() returns whatever you set it to.
For example, in my case the request contained data (not actually shown in the example above), so it was sent as a POST.
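The override has no effect because urllib2 asks the Request object, not the opener, for the method. As a hedged sketch for Python 3, where urllib2 became urllib.request, the Request constructor accepts a method argument directly. The path, payload, and header below are placeholders, and nothing is sent over the network.

```python
from urllib.request import Request

# The verb is a property of the Request, so set it there.
put_request = Request(
    "http://the_url/resource",          # placeholder URL from the question
    data=b'{"name": "value"}',          # hypothetical payload
    headers={"Content-Type": "application/json"},
    method="PUT",
)
print(put_request.get_method())

# Without method=..., a request that carries data defaults to POST,
# which matches the 405 "Request method 'POST' not supported" error.
default_request = Request("http://the_url/resource", data=b"payload")
print(default_request.get_method())
```

Passing put_request to opener.open() would then send a real PUT. In Python 2, the commonly used equivalent is to assign get_method on the urllib2.Request instance rather than on the opener.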
In a URL shortener built with web2py, I want to validate URLs first; if a URL is not valid, the user should go back to the first page with an error message. This is my code in the controller (MVC architecture), but I don't see what's wrong:
import random
import string
import urllib

def index():
    return dict()

def random_maker():
    url = request.vars.url  # `request` is provided by web2py
    try:
        urllib.urlopen(url)
        return dict(rand_url=''.join(random.choice(string.ascii_uppercase +
                    string.digits + string.ascii_lowercase) for x in range(6)),
                    input_url=url)
    except IOError:
        return index()
Couldn't you check the HTTP response code using httplib? If it is 200, the page is valid; if it is anything else (like 404) or an error, it is invalid.
See this question: What’s the best way to get an HTTP response code from a URL?
Update:
Based on your comment, it looks like your issue is how you are handling the error. You are only handling IOError. In your case you can either handle all errors in a single clause by switching to:
    except:
        return index()
You could also build your own exception handler by overriding http_default_error. See How to catch 404 error in urllib.urlretrieve for more information.
Or you can switch to urllib2, which has specific errors. You can then handle the specific errors that urllib2 throws like this:
from urllib2 import Request, urlopen, URLError

req = Request('http://jfvbhsjdfvbs.com')
try:
    response = urlopen(req)
except URLError, e:
    if hasattr(e, 'reason'):
        print 'We failed to reach a server.'
        print 'Reason: ', e.reason
    elif hasattr(e, 'code'):
        print 'The server couldn\'t fulfill the request.'
        print 'Error code: ', e.code
else:
    print 'URL is good!'
The above code will return:
We failed to reach a server.
Reason: [Errno 61] Connection refused
The specifics of each exception class are covered in the urllib.error API documentation.
I am not exactly sure how to slot this into your code, because I am not sure exactly what you are trying to do, but IOError is not going to handle the exceptions thrown by urllib.
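As a rough sketch of how the validation step could look in Python 3 (urllib.request and urllib.error instead of urllib2), the function below returns a flag and a short message that a controller could show on the first page. The name check_url and the message wording are made up for this example. HTTPError is caught before URLError because it is a subclass of it.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def check_url(url, timeout=5):
    """Return (ok, detail) describing whether the URL could be opened."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return True, "HTTP %d" % response.status
    except HTTPError as err:
        # The server answered, but with an error status such as 404.
        return False, "HTTP %d" % err.code
    except URLError as err:
        # No usable answer at all: DNS failure, refused connection, etc.
        return False, "unreachable: %s" % err.reason
    except ValueError as err:
        # urlopen raises ValueError for strings that are not URLs.
        return False, "malformed URL: %s" % err


# A string without a scheme is rejected before any network access happens.
ok, detail = check_url("not-a-url")
print(ok, detail)
```

In the controller, a False result would lead back to index() with detail as the error message, and a True result would continue to generate the short code.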
I'm trying to write a small program that will simply display the header information of a website. Here is the code:
import urllib2

url = 'http://some.ip.add.ress/'
request = urllib2.Request(url)
try:
    html = urllib2.urlopen(request)
except urllib2.URLError, e:
    print e.code
else:
    print html.info()
If 'some.ip.add.ress' is google.com, the header information is returned without a problem. However, if it's an IP address that requires basic authentication before access, it returns a 401. Is there a way to get header (or any other) information without authentication?
I've worked it out.
After the try block has failed due to unauthorized access, the following modification will print the header information:
print e.info()
instead of:
print e.code
Thanks for looking :)
If you want just the headers, instead of using urllib2 you should go lower level and use httplib:
import httplib
conn = httplib.HTTPConnection(host)
conn.request("HEAD", path)
print conn.getresponse().getheaders()
If all you want are HTTP headers then you should make HEAD not GET request. You can see how to do this by reading Python - HEAD request with urllib2.
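To illustrate that a HEAD request returns headers even when the server demands authentication, here is a sketch for Python 3, where httplib became http.client. The local server that always answers 401, the realm name, and the helper head_headers are assumptions so that the example runs without a real protected host.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class NeedsAuth(BaseHTTPRequestHandler):
    """Hypothetical server that answers every HEAD with 401 Unauthorized."""

    def do_HEAD(self):
        self.send_response(401)
        self.send_header("WWW-Authenticate", 'Basic realm="demo"')
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


def head_headers(host, port, path="/"):
    """Send a HEAD request and return (status, headers as a dict)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        # http.client does not raise on 4xx, so the headers are always there.
        return response.status, dict(response.getheaders())
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), NeedsAuth)
threading.Thread(target=server.serve_forever, daemon=True).start()
status, headers = head_headers("127.0.0.1", server.server_port)
server.shutdown()
server.server_close()
print(status, headers.get("WWW-Authenticate"))
```

Unlike urlopen, the low-level connection treats a 401 as an ordinary response, so no exception handling is needed just to read the headers.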