In a URL shortener built with web2py, I want to validate URLs first; if a URL is not valid, the user should go back to the first page with an error message. This is my code in the controller (MVC architecture), but I can't work out what's wrong:
import random
import string
import urllib

def index():
    return dict()

def random_maker():
    url = request.vars.url
    try:
        urllib.urlopen(url)
        # build a random 6-character key for the shortened URL
        return dict(rand_url=''.join(random.choice(string.ascii_uppercase +
                    string.digits + string.ascii_lowercase) for x in range(6)),
                    input_url=url)
    except IOError:
        return index()
Couldn't you check the HTTP response code using httplib? If it is 200, the page is valid; if it is anything else (such as 404) or an error occurs, it is invalid.
See this question: What’s the best way to get an HTTP response code from a URL?
Update:
Based on your comment, it looks like your issue is how you are handling the error. You are only handling IOError. In your case you can either handle all errors at once by switching to:
except:
    return index()
You could also build your own error handler by overriding http_error_default. See How to catch 404 error in urllib.urlretrieve for more information.
Or you can switch to urllib2, which raises more specific errors. You can then handle them like this:
from urllib2 import Request, urlopen, URLError

req = Request('http://jfvbhsjdfvbs.com')
try:
    response = urlopen(req)
except URLError, e:
    if hasattr(e, 'reason'):
        print 'We failed to reach a server.'
        print 'Reason: ', e.reason
    elif hasattr(e, 'code'):
        print 'The server couldn\'t fulfill the request.'
        print 'Error code: ', e.code
else:
    print 'URL is good!'
With that URL, the code above prints:
We failed to reach a server.
Reason: [Errno 61] Connection refused
The specifics of each exception class are described in the urllib.error API documentation.
I am not exactly sure how to slot this into your code, because I am not sure exactly what you are trying to do, but IOError is not going to handle the exceptions thrown by urllib.
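To make the idea above concrete, here is a minimal sketch of a validity check. It is written against the Python 3 names (urllib.request and urllib.error, the successors of urllib2), which is an assumption, since the question uses Python 2. The helper names looks_like_http_url and check_url are hypothetical and not part of web2py or urllib.

```python
import urllib.error
import urllib.request
from urllib.parse import urlparse


def looks_like_http_url(url):
    """Cheap syntactic check: the scheme must be http(s) and a host must be present."""
    parts = urlparse(url or "")
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_url(url, timeout=5):
    """Return (ok, detail) describing whether the URL could be fetched."""
    if not looks_like_http_url(url):
        return False, "not an http(s) URL"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, "HTTP %d" % resp.getcode()
    except urllib.error.HTTPError as exc:
        # HTTPError is a subclass of URLError, so it has to be caught first.
        return False, "HTTP %d" % exc.code
    except urllib.error.URLError as exc:
        return False, "unreachable: %s" % exc.reason
    except OSError as exc:
        # e.g. a socket timeout that was not wrapped in URLError
        return False, "I/O error: %s" % exc
```

In the controller, random_maker() could call check_url(url) and fall back to index() with the returned detail as the error message when ok is False.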
Related
I have the following Flask code:
from flask import Flask, request, jsonify
import requests
from werkzeug.exceptions import InternalServerError, NotFound
import sys
import json

app = Flask(__name__)
app.config['SECRET_KEY'] = "Secret!"

class InvalidUsage(Exception):
    status_code = 400

    def __init__(self, message, status_code=None, payload=None):
        Exception.__init__(self)
        self.message = message
        if status_code is not None:
            self.status_code = status_code
        self.payload = payload

    def to_dict(self):
        rv = dict(self.payload or ())
        rv['message'] = self.message
        rv['status_code'] = self.status_code
        return rv

@app.errorhandler(InvalidUsage)
def handle_invalid_usage(error):
    response = jsonify(error.to_dict())
    response.status_code = error.status_code
    return response

@app.route('/test', methods=["GET", "POST"])
def test():
    url = "https://httpbin.org/status/404"
    try:
        response = requests.get(url)
        if response.status_code != 200:
            try:
                response.raise_for_status()
            except requests.exceptions.HTTPError:
                status = response.status_code
                print status
                raise InvalidUsage("An HTTP exception has been raised", status_code=status)
    except requests.exceptions.RequestException as e:
        print e

if __name__ == "__main__":
    app.run(debug=True)
My question is: how do I get the exception string (message) and other relevant parameters from the requests.exceptions.RequestException object e?
Also, what is the best way to log such exceptions? In the case of an HTTPError exception I have the status code to refer to.
But requests.exceptions.RequestException catches all request exceptions. So how do I differentiate between them, and what is the best way to log them apart from using print statements?
Thanks a lot in advance for any answers.
RequestException is the base class for HTTPError, ConnectionError, Timeout, URLRequired, TooManyRedirects and others (the whole list is available on the GitHub page of the requests module). It seems that the best way of dealing with each error and printing the corresponding information is to handle them starting from the most specific and finishing with the most general one (the base class). This has been discussed at length in the comments in this StackOverflow topic. For your test() method this could be:
@app.route('/test', methods=["GET", "POST"])
def test():
    url = "https://httpbin.org/status/404"
    try:
        # some code...
    except requests.exceptions.ConnectionError as ece:
        print("Connection Error:", ece)
    except requests.exceptions.Timeout as et:
        print("Timeout Error:", et)
    except requests.exceptions.RequestException as e:
        print("Some Ambiguous Exception:", e)
This way the more specific errors, which inherit from RequestException, are caught first.
As for an alternative to print statements: I'm not sure if that's exactly what you meant, but you can log to the console or to a file with standard Python logging in Flask or with the logging module itself (here for Python 3).
This is actually not a question about using the requests library as much as it is a general Python question about how to extract the error string from an exception instance. The answer is relatively straightforward: you convert it to a string by calling str() on the exception instance. Any properly written exception class (in requests or otherwise) implements a __str__() method so that str() can be called on an instance. Example below:
import requests

rsp = requests.get('https://httpbin.org/status/404')
try:
    if rsp.status_code >= 400:
        rsp.raise_for_status()
except requests.exceptions.RequestException as e:
    error_str = str(e)
    # log 'error_str' to disk, a database, etc.
    print('The error was:', error_str)
Yes, in this example, we print it, but once you have the string you have additional options. Anyway, saving this to test.py results in the following output given your test URL:
$ python3 test.py
The error was: 404 Client Error: NOT FOUND for url: https://httpbin.org/status/404
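The same str() approach works for any exception, and the class name and args tuple are usually worth recording next to the message. A small sketch, where QuotaError and describe are hypothetical names invented for illustration:

```python
class QuotaError(Exception):
    """Hypothetical exception with a custom __str__()."""

    def __init__(self, limit, used):
        super().__init__(limit, used)   # stored in .args
        self.limit = limit
        self.used = used

    def __str__(self):
        return "quota exceeded: used {} of {}".format(self.used, self.limit)


def describe(exc):
    """Collect the pieces of an exception that are typically worth logging."""
    return {
        "type": type(exc).__name__,
        "message": str(exc),
        "args": exc.args,
    }


print(describe(ValueError("bad input")))
print(describe(QuotaError(10, 12)))
```

Recording the type alongside the message is one way to tell the different RequestException subclasses apart in a log file.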
I have to make a series of requests to my local server and check the responses. Basically I am trying to hit the right URL by brute force. This is my code:
for i in range(48,126):
    test = chr(i)
    urln = '012a4' + test
    url = {"tk" : urln}
    data = urllib.urlencode(url)
    print data
    request = urllib2.Request("http://127.0.0.1/brute.php", data)
    response = urllib2.urlopen(request)
    status_code = response.getcode()
I have to make requests like: http://127.0.0.1/brute.php?tk=some_val
I am getting an error because the URL is not being encoded properly. I get internal server error 500 even when one of the URLs in the series should give 200; requesting that URL manually confirms it. Also, what is the right way to skip 500/400 errors until I get a 200?
When using urllib2 you should always handle any exceptions that are raised as follows:
import urllib, urllib2

for i in range(0x012a40, 0x12a8e):
    url = {"tk" : '{:x}'.format(i)}
    data = urllib.urlencode(url)
    print data
    try:
        request = urllib2.Request("http://127.0.0.1/brute.php", data)
        response = urllib2.urlopen(request)
        status_code = response.getcode()
    except urllib2.URLError, e:
        print e.reason
This will display the following when the connection fails, and then continue to try the next connection:
[Errno 10061] No connection could be made because the target machine actively refused it
e.reason will give you the textual reason, and e.errno will give you the error code. So you could still stop if the error was something other than 10061 for example.
Lastly, you seem to be cycling through a range of numbers in hex format? You might find it easier to work directly with 0x formatting to build your strings.
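One more thing that may matter here: passing data as the second argument to Request turns the request into a POST, whereas the question asks for a GET of the form brute.php?tk=some_val. Below is a sketch of building the query string into the URL and skipping error responses until a 200 turns up. It uses the Python 3 module names (urllib.parse, urllib.request, urllib.error), which is an assumption since the question is Python 2, and the helper names build_url, fetch_status, first_ok and candidates are made up for illustration.

```python
import urllib.error
import urllib.parse
import urllib.request

BASE = "http://127.0.0.1/brute.php"  # endpoint from the question


def build_url(base, params):
    """Append URL-encoded parameters as a query string (results in a GET)."""
    return base + "?" + urllib.parse.urlencode(params)


def fetch_status(url):
    """Open the URL and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.getcode()


def first_ok(urls, fetch=fetch_status):
    """Return the first URL that answers 200, skipping HTTP and network errors."""
    for url in urls:
        try:
            if fetch(url) == 200:
                return url
        except urllib.error.HTTPError as exc:   # 4xx/5xx responses
            print("skipping", url, "->", exc.code)
        except urllib.error.URLError as exc:    # connection-level failures
            print("cannot reach", url, "->", exc.reason)
    return None


def candidates():
    """Same candidate tokens as in the question: '012a4' plus one character."""
    return [build_url(BASE, {"tk": "012a4" + chr(i)}) for i in range(48, 126)]
```

Calling first_ok(candidates()) would walk through the candidates against the local server. The fetch parameter exists only so the loop can be exercised without a server.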
It sounds like you will benefit from a try/except block:
for i in range(48,126):
    test = chr(i)
    urln = '012a4' + test
    url = {"tk" : urln}
    data = urllib.urlencode(url)
    print data
    request = urllib2.Request("http://127.0.0.1/brute.php", data)
    try:
        response = urllib2.urlopen(request)
    except:
        continue  # skip this value and move on to the next one
    status_code = response.getcode()
    print status_code
You would typically also want to capture the error itself:
except Exception, e:
    print e
Or catch specific errors only, for example:
except ValueError:
    # do stuff
Though you wouldn't get a ValueError in your code.
I want to check whether a URL is reachable from Python, and I got this code from googling:
import http.client
from urllib.parse import urlparse

def checkUrl(url):
    p = urlparse(url)
    conn = http.client.HTTPConnection(p.netloc)
    conn.request('HEAD', p.path)
    resp = conn.getresponse()
    return resp.status < 400
Here is my URL: https://eurotableau.nomisonline.com.
It works fine if I just pass that in to the function; resp.status is 302. However, if I add port 443 at the end, https://eurotableau.nomisonline.com:443, it returns False and resp.status is 400. I tried both URLs in Google Chrome and both work. So why is this happening? Is there any way I can include the port and still get a valid resp.status value (< 400)? Thanks.
Use http.client.HTTPSConnection instead. The plain old HTTPConnection ignores the protocol that is part of the URL.
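A sketch of what that could look like: pick the connection class from the URL scheme, so the same function works with or without an explicit port. The names connection_for and check_url are hypothetical, and the 5-second timeout is an arbitrary choice.

```python
import http.client
from urllib.parse import urlparse


def connection_for(url, timeout=5):
    """Return (connection, parsed_url), using HTTPS when the scheme asks for it."""
    p = urlparse(url)
    if p.scheme == "https":
        cls = http.client.HTTPSConnection
    else:
        cls = http.client.HTTPConnection
    # p.port is None when the URL has no explicit port; the class then
    # falls back to its default (443 for HTTPS, 80 for HTTP).
    return cls(p.hostname, p.port, timeout=timeout), p


def check_url(url):
    """Send a HEAD request and report whether the status is below 400."""
    conn, p = connection_for(url)
    try:
        conn.request("HEAD", p.path or "/")
        return conn.getresponse().status < 400
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()
```

With this, check_url("https://eurotableau.nomisonline.com:443") would speak TLS to port 443 instead of sending plain HTTP to a TLS port, which is a likely cause of the 400 in the question.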
If you do not require the HEAD method but just wish to check if host is available then why not do:
from urllib2 import urlopen

try:
    u = urlopen("https://eurotableau.nomisonline.com")
    u.close()
    print "Everything fine!"
except Exception, e:
    if hasattr(e, "code"):
        print "Server is there but something is wrong with rest of URL"
    else:
        print "Server is on vacations or was never there!"
    print e
This will establish a connection with the server, but it won't download any data unless you read it. It will only read a few KB to get the headers (as with the HEAD method) and wait for you to request more. But you close it at that point.
So you can catch an exception and see what the problem is, or, if there is no exception, just close the connection.
urllib2 will handle HTTPS and protocol://user@URL:PORT for you neatly.
No worries about anything.
Why can I not change the method to PUT? Can I change to PUT without too many code changes?
Here is my code:
import cookielib
import urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)
# code to change method to PUT
opener.get_method = lambda: 'PUT'
print "now using method:", opener.get_method()  # prints now using PUT
try:
    r = opener.open("http://the_url")
except urllib2.HTTPError as e:
    if hasattr(e, 'code'):
        report += "HTTP error status " + str(e.code) + " FAIL\n"
        if hasattr(e, 'reason'):
            print "HTTP Error reason " + e.reason
    else:
        report += "HTTP error occurred FAIL\n"
But I get runtime error
HTTP Error reason Request method 'POST' not supported
PUT session test
HTTP error status 405 FAIL
It seems urllib2 only supports GET and POST. I decided to use the Requests library instead.
The opener.get_method = lambda: 'PUT' is some code I found on the web. It doesn't actually change the verb used to send the request, even though if you get_method it will reply with whatever you changed it to.
For example, in my case, because request contained data (not actually shown in example above) it sends a POST.
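For what it's worth, the verb is decided by the Request object, not by the opener, which is why setting get_method on the opener has no effect. In Python 2 the usual workaround is to override get_method on the Request instance. In Python 3 (3.3 and later) urllib.request.Request accepts a method argument directly. A minimal sketch of the Python 3 form, where make_put_request, the example.invalid URL and the Content-Type value are assumptions for illustration:

```python
import urllib.request


def make_put_request(url, body):
    """Build a Request whose verb is PUT even though it carries a body."""
    return urllib.request.Request(
        url,
        data=body,                      # bytes; would imply POST by default
        method="PUT",                   # explicit override of the verb
        headers={"Content-Type": "application/octet-stream"},
    )


put_req = make_put_request("http://example.invalid/resource", b"payload")
post_req = urllib.request.Request("http://example.invalid/resource", data=b"payload")
get_req = urllib.request.Request("http://example.invalid/resource")

# Nothing is sent here; get_method() only reports the verb that would be used.
print(put_req.get_method(), post_req.get_method(), get_req.get_method())
```

The second and third requests show the default behaviour described above: a request with data goes out as POST, one without data as GET. The request built this way can then be passed to opener.open() as usual.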
I'm dealing with HTTPS and I want to get the HTTP headers for live.com:
import urllib2

try:
    email = "HelloWorld1234560@hotmail.com"
    response = urllib2.urlopen("https://signup.live.com/checkavail.aspx?chkavail="+email+"&tk=1258056184535&ru=http%3a%2f%2fmail.live.com%2f%3frru%3dinbox&wa=wsignin1.0&rpsnv=11&ct=1258055283&rver=6.0.5285.0&wp=MBI&wreply=http:%2F%2Fmail.live.com%2Fdefault.aspx&lc=1036&id=64855&bk=1258055288&rollrs=12&lic=1")
    print 'response headers: "%s"' % response.info()
except IOError, e:
    if hasattr(e, 'code'):  # HTTPError
        print 'http error code: ', e.code
    elif hasattr(e, 'reason'):  # URLError
        print "can't connect, reason: ", e.reason
    else:
        raise
I don't want all the information from the headers; I just want the Set-Cookie information.
If you are asking what the script does: it checks whether an email address is available on Hotmail by reading the value of the variable CheckAvail=.
After edit:
Thanks for the help. After the fix that returns only Set-Cookie, I have a new problem: the cookie I get does not contain CheckAvail=. I get a lot of information, but no CheckAvail=. When I open the URL in a browser and view the source, it is there (see the picture).
The object returned by response.info() is an instance of mimetools.Message (as described by the urllib2 docs), which is a subclass of rfc822.Message, which has a getheader() method.
So you can do the following:
response = urllib2.urlopen("...")
print response.info().getheader("Set-Cookie") # get the value of the Set-Cookie header
However, if you are checking for mail, I would recommend you to use POP3 or IMAP if available (Python comes with modules for both).
That's because of the HttpOnly flag in your HTTP response; according to the specification, a cookie marked this way is only accessible over HTTP connections.
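Once you have the raw Set-Cookie value, the standard library can split it into the cookie name, its value and its attributes, including the HttpOnly flag. A small Python 3 sketch using http.cookies; the header string below is invented for illustration (only the CheckAvail name comes from the question), and parse_set_cookie is a hypothetical helper.

```python
from http.cookies import SimpleCookie


def parse_set_cookie(header_value):
    """Return {name: {"value", "httponly", "path"}} for one Set-Cookie header value."""
    jar = SimpleCookie()
    jar.load(header_value)
    return {
        name: {
            "value": morsel.value,
            "httponly": bool(morsel["httponly"]),  # empty string when the flag is absent
            "path": morsel["path"],
        }
        for name, morsel in jar.items()
    }


parsed = parse_set_cookie("CheckAvail=1; Path=/; HttpOnly")
print(parsed)
```

Looking up parsed["CheckAvail"]["value"] then gives just the part the script is interested in, without the rest of the header.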