I have a server setup for testing, with a self-signed certificate, and want to be able to test towards it.
How do you ignore SSL verification in the Python 3 version of urlopen?
All the information I found about this concerns urllib2 or Python 2 in general.
urllib in Python 3 has changed from urllib2:
Python 2, urllib2: urllib2.urlopen(url[, data[, timeout[, cafile[, capath[, cadefault[, context]]]]]])
https://docs.python.org/2/library/urllib2.html#urllib2.urlopen
Python 3: urllib.request.urlopen(url[, data][, timeout])
https://docs.python.org/3.0/library/urllib.request.html?highlight=urllib#urllib.request.urlopen
So I know this can be done in Python 2 in the following way. However, Python 3's urlopen appears to be missing the context parameter.
import urllib2
import ssl
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
urllib2.urlopen("https://your-test-server.local", context=ctx)
And yes I know this is a bad idea. This is only meant for testing on a private server.
I could not find how this is supposed to be done in the Python 3 documentation, or in any other question. Even the ones explicitly mentioning Python 3, still had a solution for urllib2/Python 2.
The accepted answer just gives advice to use Python 3.5+ instead of a direct answer, which causes confusion.
For someone looking for a direct answer, here it is:
import ssl
import urllib.request
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
with urllib.request.urlopen(url_string, context=ctx) as f:
    f.read(300)
Alternatively, if you use the requests library, it has a much nicer API:
import requests
with open(file_name, 'wb') as f:
    resp = requests.get(url_string, verify=False)
    f.write(resp.content)
The answer is copied from this post (thanks @falsetru): How do I disable the ssl check in python 3.x?
These two questions should be merged.
Python 3.0 to 3.3 does not have the context parameter; it was added in Python 3.4.3. So you can update your Python version to 3.5 to use context.
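If upgrading is not possible, one workaround sketch (my own suggestion, not from the original answers) relies on urllib.request.HTTPSHandler, which has accepted a context argument since Python 3.2, so a custom opener can carry the unverified context even where urlopen itself cannot:
import ssl
import urllib.request

# Bare SSLContext (available since Python 3.2); verify_mode defaults to CERT_NONE,
# set it explicitly to make the intent obvious. Testing only!
ctx = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
ctx.verify_mode = ssl.CERT_NONE

# HTTPSHandler has taken a `context` argument since Python 3.2.
opener = urllib.request.build_opener(urllib.request.HTTPSHandler(context=ctx))
opener.open("https://your-test-server.local").read()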
Related
Is there a difference between those two bs4 objects?
from urllib2 import urlopen, Request
from bs4 import BeautifulSoup
req1 = Request("https://stackoverflow.com/") # HTTPS
html1 = urlopen(req1).read()
req2 = Request("http://stackoverflow.com/") # HTTP
html2 = urlopen(req2).read()
bsObj1 = BeautifulSoup(html1, "html.parser")
bsObj2 = BeautifulSoup(html2, "html.parser")
Do you really need to specify an HTTP protocol?
Here's my limited understanding: There isn't a practical difference in this case.
My understanding is that most websites that have https will redirect http URLs to https, as is the case here. It's possible for a site to have an http version and an https version up simultaneously, in which case they might not redirect. This would be bad practice, but nothing is stopping someone from doing it.
I would still explicitly use https whenever possible, just as a best practice.
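One quick way to check this for yourself (a small sketch on my part, written for Python 3's urllib.request; the urllib2 version is analogous) is to fetch the http URL and look at the final URL the response reports after any redirects:
from urllib.request import urlopen

# Fetch over plain HTTP and inspect where we actually ended up.
resp = urlopen("http://stackoverflow.com/")
print(resp.geturl())  # if the site redirects, this shows the final https:// URL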
All communication over the HTTP protocol happens using HTTP verbs such as GET, POST, PUT, and DELETE. Specifying the protocol has two purposes:
1) It specifies the scheme for data communication.
A general URI is of the form:
scheme:[//[user[:password]@]host[:port]][/path][?query][#fragment] and common schemes are http(s), ftp, mailto, file, data, and irc.
2) It specifies whether the scheme supports SSL/TLS encryption:
With the http scheme, the added 's' in https indicates that the data is encrypted with SSL/TLS.
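As an illustration of those components (my own sketch, not part of the original answer), the standard library's urllib.parse splits a URL along exactly that structure:
from urllib.parse import urlparse

parts = urlparse("https://user:password@example.com:443/path?query=1#fragment")
print(parts.scheme)    # 'https'
print(parts.hostname)  # 'example.com'
print(parts.port)      # 443
print(parts.path)      # '/path'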
According to urllib3 Python docs:
It is highly recommended to always use SSL certificate verification. In order to enable verification you will need a set of root certificates. The easiest and most reliable method is to use the certifi package which provides Mozilla's root certificate bundle:
pip install certifi
>>> import certifi
>>> import urllib3
>>> http = urllib3.PoolManager(
... cert_reqs='CERT_REQUIRED',
... ca_certs=certifi.where())
The PoolManager will automatically handle certificate verification and will raise SSLError if verification fails:
>>> http.request('GET', 'https://google.com')
(No exception)
>>> http.request('GET', 'https://expired.badssl.com')
urllib3.exceptions.SSLError ...
I have code as below:
headers = {'content-type': 'ContentType.APPLICATION_XML'}
uri = "www.client.url.com/hit-here/"
clientCert = "path/to/cert/abc.crt"
clientKey = "path/to/key/abc.key"
PROTOCOL = ssl.PROTOCOL_TLSv1
context = ssl.SSLContext(PROTOCOL)
context.load_default_certs()
context.load_cert_chain(clientCert, clientKey)
conn = httplib.HTTPSConnection(uri, some_port, context=context)
I am not really a network programmer, so I did some googling for handshake connections and found ssl.SSLContext(PROTOCOL) as the needed function; the code works fine.
Then I hit a roadblock: my local machine has version 2.7.10, but all the production boxes have 2.7.3, so SSLContext is not supported and upgrading the Python version is not an option / not in my control.
I tried reading ssl — SSL wrapper for socket objects but couldn't make sense of it.
What I tried (in vain):
s_ = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s = ssl.wrap_socket(s_, keyfile=clientKey, certfile=clientCert, cert_reqs=ssl.CERT_REQUIRED)
new_conn = s.connect((uri, some_port))
but it returns:
SSLError(1, u'[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)')
Question: how do I create an SSL context on the older version so as to have a secure HTTPS connection?
You have to specify the ca_certs file (which should point to your trust store).
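For example, a sketch along those lines, reusing the variables from the question (the ca-bundle path is a placeholder, and note that on 2.7.3 wrap_socket validates the certificate chain but does not verify the hostname):
import socket
import ssl

s_ = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s = ssl.wrap_socket(s_,
                    keyfile=clientKey,
                    certfile=clientCert,
                    cert_reqs=ssl.CERT_REQUIRED,
                    ca_certs="/path/to/ca-bundle.crt")  # hypothetical trust store path
s.connect((uri, some_port))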
I've got the perfect solution using the requests library. The requests library has got to be my favorite library I've ever used, because it takes something in Python that is inherently difficult to do -- SSL and REST requests -- and makes it unbelievably simple. I checked their version support and Python 2.6+ is supported.
Here is an example of how to use their library.
>>> import requests
>>> requests.get(uri)
And that is all you have to do. The requests library takes care of establishing a ssl connection.
Taking this one step further: if you need to persist cookies between requests, you can do so like this.
>>> sess = requests.Session()
>>> credentials = {"username": "user",
...                "password": "pass"}
>>> sess.post("https://some-website/login", params=credentials)
<Response [200]>
>>> sess.get("https://some-website/a-backend-page").text
<html> the backend page... </html>
Edit: If you need to, you can also pass in the path to the certificate and the key, like so: requests.get(uri, cert=('path/to/cert/abc.crt', 'path/to/key/abc.key'))
Now hopefully you can convince them to install the requests library on the production boxes, because it would be well worth it. Let me know if this works out for you.
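Putting the pieces together for the original setup, a hedged sketch might look like the following (the URL prefix and verify path are assumptions; if verify is omitted, requests falls back to its default CA bundle):
import requests

resp = requests.get("https://www.client.url.com/hit-here/",               # placeholder URL from the question
                    cert=("path/to/cert/abc.crt", "path/to/key/abc.key"),  # client certificate + key
                    verify="/path/to/ca-bundle.crt",                       # hypothetical CA bundle path
                    headers={"content-type": "application/xml"})
print(resp.status_code)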
I am trying to use Python to get a JSON file from the Web. If I open the URL in my browser (Mozilla or Chromium) I do see the JSON. But when I do the following with the Python:
response = urllib2.urlopen(url)
data = json.loads(response.read())
I get an error message that tells me the following (translated into English): Errno 10060, a connection attempt raised an error, since the server did not react after a certain time period, or the connection was faulty, or the host did not react.
ADDED
It looks like there are many people who faced the described problem. There are also some answers to the similar (or the same) question. For example here we can see the following solution:
import requests
r = requests.get("http://www.google.com", proxies={"http": "http://61.233.25.166:80"})
print(r.text)
It is already a step forward for me (I think it is very likely that the proxy is the reason for the problem). However, I still have not got it working, since I do not know the URL of my proxy and I will probably need a user name and password. How can I find them? And how is it that my browsers have them but I do not?
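For what it's worth, if the proxy does require credentials, requests accepts them embedded in the proxy URL. This is only a sketch with placeholder values (host, port, user name and password are all assumptions):
import requests

proxies = {
    "http": "http://username:password@my_proxy.blabla.com:8080",
    "https": "http://username:password@my_proxy.blabla.com:8080",
}
r = requests.get("http://www.google.com", proxies=proxies)
print(r.status_code)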
ADDED 2
I think I am now one step further. I have used this site to find out what my proxy is: http://www.whatismyproxy.com/
Then I have used the following code:
proxies = {'http':'my_proxy.blabla.com/'}
r = requests.get(url, proxies = proxies)
print r
As a result I get
<Response [404]>
That does not look so good, but at least I think my proxy is correct, because when I randomly change the address of the proxy I get a different error:
Cannot connect to proxy
So, I can connect to proxy but something is not found.
I think there might be something wrong with how you're trying to get the JSON from the online source (URL). Just to make things clear, here is a small code snippet:
#!/usr/bin/env python
try:
    # For Python 3+
    from urllib.request import urlopen
except ImportError:
    # For Python 2
    from urllib2 import urlopen

import json

def get_jsonparsed_data(url):
    response = urlopen(url)
    # decode the raw bytes before handing them to the JSON parser
    data = response.read().decode("utf-8")
    return json.loads(data)
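The helper can then be called like this (the URL is a placeholder for whatever endpoint serves your JSON):
# hypothetical endpoint; substitute the real URL that returns the JSON
data = get_jsonparsed_data("http://example.com/data.json")
print(data)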
If you still get a connection error, you can try a couple of steps:
Try to urlopen() a random site from the interpreter (interactive mode). If you are able to grab the source code, you're good. If not, check your internet connection or try the requests module. Check here
Check and see if the JSON at the URL is in the correct syntax. For sample JSON syntax check here
Try the simplejson module.
Edit 1:
If you want to access websites using a system-wide proxy, you will have to use a proxy handler that connects to that proxy via the loopback address (localhost). A code sample is shown below.
import urllib2

proxy = urllib2.ProxyHandler({
    'http': '127.0.0.1',
    'https': '127.0.0.1'
})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)

# this way you can send both http and https requests using proxies
urllib2.urlopen('http://www.google.com')
urllib2.urlopen('https://www.google.com')
I have not worked a lot with ProxyHandler; I just know the theory and the code. I am sure there are better ways to access websites through proxies, ones which do not involve installing the opener every time you run the program (see the sketch below). But hopefully it will point you in the right direction.
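As a rough sketch of that idea (my own variation, not tested against your proxy), the opener returned by build_opener can be used directly instead of being installed globally:
import urllib2

proxy = urllib2.ProxyHandler({'http': '127.0.0.1', 'https': '127.0.0.1'})
opener = urllib2.build_opener(proxy)

# use the opener directly rather than calling urllib2.install_opener(opener)
response = opener.open('http://www.google.com')
print(response.read()[:200])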
I am trying to open an https URL using the urlopen method in Python 3's urllib.request module. It seems to work fine, but the documentation warns that "[i]f neither cafile nor capath is specified, an HTTPS request will not do any verification of the server’s certificate".
I am guessing I need to specify one of those parameters if I don't want my program to be vulnerable to man-in-the-middle attacks, problems with revoked certificates, and other vulnerabilities.
cafile and capath are supposed to point to a list of certificates. Where am I supposed to get this list from? Is there any simple and cross-platform way to use the same list of certificates that my OS or browser uses?
Works in Python 2.7.9 and above:
context = ssl.create_default_context(cafile=certifi.where())
req = urllib2.urlopen(urllib2.Request(url, body, headers), context=context)
I found a library that does what I'm trying to do: Certifi. It can be installed by running pip install certifi from the command line.
Making requests and verifying them is now easy:
import certifi
import urllib.request
urllib.request.urlopen("https://example.com/", cafile=certifi.where())
As I expected, this returns an HTTPResponse object for a site with a valid certificate and raises an ssl.CertificateError exception for a site with an invalid certificate.
Elias Zamaria's answer still works, but gives a deprecation warning:
DeprecationWarning: cafile, capath and cadefault are deprecated, use a custom context instead.
I was able to solve the same problem this way instead (using Python 3.7.0):
import ssl
import urllib.request
ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLSv1)
response = urllib.request.urlopen("https://www.example.com", context=ssl_context)
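If you still want certificate verification without the deprecation warning, a hedged alternative (it reuses certifi from the answer above, which is an extra dependency) is to build the context from a CA bundle:
import ssl
import urllib.request

import certifi

# a verifying context built from certifi's CA bundle; no deprecated arguments
ssl_context = ssl.create_default_context(cafile=certifi.where())
response = urllib.request.urlopen("https://www.example.com", context=ssl_context)
print(response.status)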
You can download Mozilla's certificates in a format usable for urllib (e.g. PEM format) at http://curl.haxx.se/docs/caextract.html
Different Linux distributions have different package names; I tested on CentOS and Ubuntu. These certificate bundles are updated with system updates, so you can just detect which bundle is available and use it with urlopen.
import os

cafile = None
for i in [
    '/etc/ssl/certs/ca-bundle.crt',        # CentOS / RHEL
    '/etc/ssl/certs/ca-certificates.crt',  # Debian / Ubuntu
]:
    if os.path.exists(i):
        cafile = i
        break
if cafile is None:
    raise RuntimeError('System CA-certificates bundle not found')
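Once the bundle path has been detected, it can be passed to urlopen via the cafile argument or, to avoid the deprecation warning on newer Pythons, wrapped in a context; a small sketch:
import ssl
import urllib.request

# build a verifying context from the system bundle found above
context = ssl.create_default_context(cafile=cafile)
with urllib.request.urlopen("https://example.com/", context=context) as response:
    print(response.status)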
import certifi
import ssl
import urllib.request

try:
    from urllib.request import HTTPSHandler

    context = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
    context.options |= ssl.OP_NO_SSLv2
    context.verify_mode = ssl.CERT_REQUIRED
    context.load_verify_locations(certifi.where(), None)
    https_handler = HTTPSHandler(context=context, check_hostname=True)
    opener = urllib.request.build_opener(https_handler)
except ImportError:
    opener = urllib.request.build_opener()

opener.addheaders = [('User-agent', YOUR_USER_AGENT)]
urllib.request.install_opener(opener)
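After install_opener, ordinary urlopen calls go through the verifying opener built above; for example (the URL here is only a placeholder):
# any subsequent urlopen call now uses the installed opener
response = urllib.request.urlopen("https://example.com/")
print(response.status)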
Does urllib2 in Python 2.6.1 support proxy via https?
I've found the following at http://www.voidspace.org.uk/python/articles/urllib2.shtml:
NOTE
Currently urllib2 does not support fetching of https locations through a proxy. This can be a problem.
I'm trying to automate logging in to a web site and downloading a document; I have a valid username/password.
proxy_info = {
    'host': "axxx",  # commented out the real data
    'port': "1234"   # commented out the real data
}
proxy_handler = urllib2.ProxyHandler(
    {"http": "http://%(host)s:%(port)s" % proxy_info})
opener = urllib2.build_opener(proxy_handler,
                              urllib2.HTTPHandler(debuglevel=1),
                              urllib2.HTTPCookieProcessor())
urllib2.install_opener(opener)

fullurl = 'https://correct.url.to.login.page.com/user=a&pswd=b'  # example
req1 = urllib2.Request(url=fullurl, headers=headers)
response = urllib2.urlopen(req1)
I've had this working for similar pages, but not over HTTPS, and I suspect it does not go through the proxy; it just gets stuck in the same way as when I did not specify a proxy. I need to go out through the proxy.
I need to authenticate, but not using basic authentication. Will urllib2 figure out the authentication when going to the https site (I supply the username/password to the site via the URL)?
EDIT:
Nope, I tested with
proxies = {
    "http": "http://%(host)s:%(port)s" % proxy_info,
    "https": "https://%(host)s:%(port)s" % proxy_info
}
proxy_handler = urllib2.ProxyHandler(proxies)
And I get the error:
urllib2.URLError: urlopen error [Errno 8] _ssl.c:480: EOF occurred in violation of protocol
Fixed in Python 2.6.3 and several other branches:
http://bugs.python.org/issue1424152
http://www.python.org/download/releases/2.6.3/NEWS.txt
Issue #1424152: Fix for httplib, urllib2 to support SSL while working through proxy. Original patch by Christopher Li, changes made by Senthil Kumaran.
I'm not sure that Michael Foord's article, which you quote, has been updated for Python 2.6.1 -- why not give it a try? Instead of telling ProxyHandler that the proxy is only good for http, as you're doing now, register it for https too (of course you should format it into a variable just once before you call ProxyHandler and repeatedly use that variable in the dict). That may or may not work, but if you're not even trying, it's sure not to work!-)
In case anyone else has this issue in the future, I'd like to point out that urllib2 does support https proxying now. Make sure the proxy supports it too, or you risk running into a bug that puts the Python library into an infinite loop (this happened to me).
See the unittest in the python source that is testing https proxying support for further information:
http://svn.python.org/view/python/branches/release26-maint/Lib/test/test_urllib2.py?r1=74203&r2=74202&pathrev=74203
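For reference, on a Python version that includes the fix, registering the proxy for https as well should be enough. A minimal sketch reusing the proxy_info dict from the question (note that many proxies expect a plain http:// proxy URL even for https targets; check your proxy's documentation):
import urllib2

proxy_info = {'host': "axxx", 'port': "1234"}  # placeholders from the question
proxies = {
    "http": "http://%(host)s:%(port)s" % proxy_info,
    "https": "http://%(host)s:%(port)s" % proxy_info,
}
opener = urllib2.build_opener(urllib2.ProxyHandler(proxies))
response = opener.open('https://correct.url.to.login.page.com/user=a&pswd=b')  # example URL from the question
print(response.read()[:200])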