Verifying HTTPS certificates with urllib.request - python

I am trying to open an https URL using the urlopen method in Python 3's urllib.request module. It seems to work fine, but the documentation warns that "[i]f neither cafile nor capath is specified, an HTTPS request will not do any verification of the server’s certificate".
I am guessing I need to specify one of those parameters if I don't want my program to be vulnerable to man-in-the-middle attacks, problems with revoked certificates, and other vulnerabilities.
cafile and capath are supposed to point to a list of certificates. Where am I supposed to get this list from? Is there any simple and cross-platform way to use the same list of certificates that my OS or browser uses?

This works in Python 2.7.9 and above, where urlopen accepts a context argument:
import ssl
import urllib2
import certifi

context = ssl.create_default_context(cafile=certifi.where())
req = urllib2.urlopen(urllib2.Request(url, body, headers), context=context)

I found a library that does what I'm trying to do: Certifi. It can be installed by running pip install certifi from the command line.
Making requests and verifying them is now easy:
import certifi
import urllib.request
urllib.request.urlopen("https://example.com/", cafile=certifi.where())
As I expected, this returns an HTTPResponse object for a site with a valid certificate and raises an ssl.CertificateError exception for a site with an invalid certificate.
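If you want to see the failure mode, badssl.com hosts deliberately misconfigured test endpoints; a minimal sketch (depending on the Python version, the error may surface as ssl.CertificateError or arrive wrapped in urllib.error.URLError):
import ssl
import urllib.error
import urllib.request
import certifi

try:
    # expired.badssl.com serves a deliberately expired certificate
    urllib.request.urlopen("https://expired.badssl.com/", cafile=certifi.where())
except (ssl.CertificateError, urllib.error.URLError) as e:
    print("verification failed:", e)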

Elias Zamaria's answer still works, but gives a deprecation warning:
DeprecationWarning: cafile, capath and cadefault are deprecated, use a custom context instead.
I was able to solve the same problem with a custom context instead (using Python 3.7.0). Note that a context built with a bare ssl.SSLContext(...) constructor does not verify certificates at all; ssl.create_default_context() returns one with certificate and hostname verification enabled:
import ssl
import urllib.request

ssl_context = ssl.create_default_context()
response = urllib.request.urlopen("https://www.example.com", context=ssl_context)

You can download the certificate bundle Mozilla ships, in a format usable by urllib (i.e. PEM format), at http://curl.haxx.se/docs/caextract.html
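For example, a minimal sketch assuming you saved that bundle next to your script as cacert.pem:
import ssl
import urllib.request

# cacert.pem is the Mozilla bundle downloaded from the link above.
context = ssl.create_default_context(cafile='cacert.pem')
response = urllib.request.urlopen('https://example.com/', context=context)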

Different Linux distributions ship the CA bundle under different package names and paths. I tested on CentOS and Ubuntu. These certificate bundles are updated along with system updates, so you can simply detect which bundle is available and use it with urlopen:
import os

# Candidate paths for the system CA bundle on common distributions.
cafile = None
for path in (
    '/etc/ssl/certs/ca-bundle.crt',        # CentOS / RHEL
    '/etc/ssl/certs/ca-certificates.crt',  # Debian / Ubuntu
):
    if os.path.exists(path):
        cafile = path
        break
if cafile is None:
    raise RuntimeError('System CA-certificates bundle not found')
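Once detected, the bundle can be handed to urlopen through a context; a minimal sketch building on the cafile found above (example.com stands in for your real URL):
import ssl
import urllib.request

# Build a verifying context from the detected system bundle.
context = ssl.create_default_context(cafile=cafile)
response = urllib.request.urlopen('https://example.com/', context=context)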

import certifi
import ssl
import urllib.request

try:
    from urllib.request import HTTPSHandler
    # Require certificate validation against the certifi bundle,
    # and refuse the broken SSLv2 protocol.
    context = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
    context.options |= ssl.OP_NO_SSLv2
    context.verify_mode = ssl.CERT_REQUIRED
    context.load_verify_locations(certifi.where(), None)
    https_handler = HTTPSHandler(context=context, check_hostname=True)
    opener = urllib.request.build_opener(https_handler)
except ImportError:
    # Fall back to a default opener if HTTPSHandler is unavailable.
    opener = urllib.request.build_opener()

opener.addheaders = [('User-agent', YOUR_USER_AGENT)]
urllib.request.install_opener(opener)
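Once the opener is installed, plain urlopen calls are routed through the verifying handler; for example (example.com is a stand-in URL):
# Any subsequent urlopen call uses the installed opener.
response = urllib.request.urlopen('https://example.com/')
print(response.status)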

Related

Getting a 401 response while using Requests package

I am trying to access a server over my internal network under https://prodserver.de/info.
I have the code structured as below:
import requests
from requests.auth import HTTPBasicAuth

username = 'User'
password = 'Hello#123'
resp = requests.get('https://prodserver.de/info/', auth=HTTPBasicAuth(username, password))
print(resp.status_code)
While trying to access this server via browser, it works perfectly fine.
What am I doing wrong?
By default, the requests library verifies the SSL certificate for HTTPS requests; if verification fails, it raises an SSLError. You can check whether this is the issue by temporarily disabling certificate verification with verify=False (for debugging only, not for production):
import requests
from requests.auth import HTTPBasicAuth

username = 'User'
password = 'Hello#123'
resp = requests.get('https://prodserver.de/info/', auth=HTTPBasicAuth(username, password), verify=False)
print(resp.status_code)
Try using requests' generic auth, like this:
resp = requests.get('https://prodserver.de/info/', auth=(username, password))
What am I doing wrong?
I cannot be sure without investigating your server, but I suggest checking the assumption you have made that the server uses Basic authentication. There are various authentication schemes; it is also possible that your server uses a cookie-based solution rather than a header-based one.
While trying to access this server via browser, it works perfectly fine.
You might then use your browser's developer tools to see what is actually sent with the request that succeeds.
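For example, an unauthenticated request should come back as a 401 whose WWW-Authenticate header names the scheme the server expects (a sketch, using the URL from the question):
import requests

# The WWW-Authenticate header names the expected scheme,
# e.g. 'Basic realm=...', 'Digest ...', or 'NTLM'.
resp = requests.get('https://prodserver.de/info/')
print(resp.status_code, resp.headers.get('WWW-Authenticate'))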

How to use a client certificate from the Windows certificate store in python?

I want to invoke a web request using a client certificate (public+private key) stored in the Windows certificate store.
With PowerShell my call would look like this (this works):
Invoke-WebRequest -CertificateThumbprint $thumbprint -Uri $uri
Now I am searching for an equivalent in Python. I do not want to extract the certificate to a file and pass that; I want to use the store directly, or at least keep the certificate only in memory.
I have tried wincertstore, but the certificate lies in the user store (cert:\CurrentUser\My), so I cannot access it. Same problem with SSLContext.
Installing python-certifi-win32 as mentioned in this answer seems to only load the CA-certificates in order to verify the server, but what I need is a client certificate to verify myself against the server.
Are there any ways other than calling powershell with subprocess to achieve this?
Many thanks in advance.
For anyone with the same problem: I solved it using clr (pythonnet) to export the certificate into memory, and requests_toolbelt to use it with requests.
Code example to make it work:
import os

import clr
import requests
import requests_toolbelt
from cryptography.hazmat.primitives.serialization.pkcs12 import load_key_and_certificates
from cryptography.hazmat.primitives.serialization import Encoding, PrivateFormat, NoEncryption
from cryptography.hazmat.backends import default_backend
from requests_toolbelt.adapters.x509 import X509Adapter

clr.AddReference('System')
clr.AddReference('System.Linq')
clr.AddReference('System.Security.Cryptography.X509Certificates')
clr.AddReference('System.Security.Cryptography')
from System.Security.Cryptography.X509Certificates import (
    X509Store, StoreName, StoreLocation, OpenFlags,
    X509Certificate2Collection, X509FindType, X509Certificate2, X509ContentType,
)
from System.Security.Cryptography import AsymmetricAlgorithm

# Open the current user's personal store (cert:\CurrentUser\My) read-only.
store = X509Store(StoreName.My, StoreLocation.CurrentUser)
store.Open(OpenFlags.ReadOnly)

# Find the certificate by subject name; here, the logged-in user's name.
user = os.environ['USERNAME']
certCollection = store.Certificates.Find(
    X509FindType.FindBySubjectName,
    user,
    False)
cert = certCollection.get_Item(0)

# Export certificate plus private key as an in-memory PKCS#12 blob.
# The passphrase must match the one passed to load_key_and_certificates below.
pkcs12 = cert.Export(X509ContentType.Pkcs12, "<passphrase>")

backend = default_backend()
pkcs12_password_bytes = "<passphrase>".encode('utf8')
pycaP12 = load_key_and_certificates(pkcs12, pkcs12_password_bytes, backend)

cert_bytes = pycaP12[1].public_bytes(Encoding.DER)
pk_bytes = pycaP12[0].private_bytes(Encoding.DER, PrivateFormat.PKCS8, NoEncryption())

# Mount an adapter that presents the client certificate on all HTTPS requests.
adapter = X509Adapter(max_retries=3, cert_bytes=cert_bytes, pk_bytes=pk_bytes, encoding=Encoding.DER)
session = requests.Session()
session.mount('https://', adapter)
session.get('url', verify=True)  # replace 'url' with your target URL

Does specifying the HTTP protocol make a difference?

Is there a difference between those two bs4 objects?
from urllib2 import urlopen, Request
from bs4 import BeautifulSoup
req1 = Request("https://stackoverflow.com/") # HTTPS
html1 = urlopen(req1).read()
req2 = Request("http://stackoverflow.com/") # HTTP
html2 = urlopen(req2).read()
bsObj1 = BeautifulSoup(html1, "html.parser")
bsObj2 = BeautifulSoup(html2, "html.parser")
Do you really need to specify an HTTP protocol?
Here's my limited understanding: There isn't a practical difference in this case.
My understanding is that most websites that have https will redirect http URLs to https, as is the case here. It's possible for a site to have an http version and an https version up simultaneously, in which case they might not redirect. This would be bad practice, but nothing is stopping someone from doing it.
I would still explicitly use https whenever possible, just as a best practice.
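You can watch the redirect happen; a small sketch (written for Python 3's urllib.request rather than the question's urllib2):
import urllib.request

# Request the plain-HTTP URL; geturl() reports the URL after redirects.
response = urllib.request.urlopen("http://stackoverflow.com/")
print(response.geturl())  # prints the https:// URL it was redirected to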
All communication over the HTTP protocol happens using the HTTP verbs GET, POST, PUT, DELETE. Specifying the protocol serves two purposes:
1) It specifies the scheme for data communication.
A general URI is of the form:
scheme:[//[user[:password]@]host[:port]][/path][?query][#fragment]
and common schemes are http(s), ftp, mailto, file, data, and irc.
2) It specifies whether the connection is encrypted:
With the http scheme, the added 's' in https means the data is encrypted with SSL/TLS.
According to the urllib3 Python docs:
It is highly recommended to always use SSL certificate verification. In order to enable verification you will need a set of root certificates. The easiest and most reliable method is to use the certifi package which provides Mozilla's root certificate bundle:
pip install certifi
>>> import certifi
>>> import urllib3
>>> http = urllib3.PoolManager(
... cert_reqs='CERT_REQUIRED',
... ca_certs=certifi.where())
The PoolManager will automatically handle certificate verification and will raise SSLError if verification fails:
>>> http.request('GET', 'https://google.com')
(No exception)
>>> http.request('GET', 'https://expired.badssl.com')
urllib3.exceptions.SSLError ...

How to use Python requests to perform NTLM SSPI authentication?

My goal is to authenticate my client, which uses the requests library (2.11.1) in Python 3.5.2, through NTLM with SSPI, so that the user does not have to manually enter her domain credentials (the ones used to log in to the PC).
I have found the following possibilities, but none work for me:
HttpNtlmSspiAuth provokes an exception in requests:
import requests
from requests_ntlm import HttpNtlmAuth, HttpNtlmSspiAuth
requests.get(site_url, auth=HttpNtlmSspiAuth())
requests-sspi-ntlm always gets a 401:
import requests
from requests_sspi_ntlm import HttpNtlmAuth
session = requests.Session()
session.auth = HttpNtlmAuth()
session.get("http://ntlm_protected_site.com")
And requests-negotiate-sspi also triggers an exception in requests:
import requests
from requests_negotiate_sspi import HttpNegotiateAuth
r = requests.get('https://iis.contoso.com', auth=HttpNegotiateAuth())
Am I doing something wrong?
The package requests-negotiate-sspi works for me.
I probably had the same issue as the OP, but I was too lazy to try the OP's solution and integrate that code into mine, and Google helped me out. In case anyone encounters the same exception raised from sspi.py, ValueError: year 30828 is out of range, it is a known issue with requests-negotiate-sspi on Python 3.6. See here: Github-Issue
I solved this by creating a new conda environment with Python 3.4, then reinstalling some dependencies as well as requests-negotiate-sspi, and boom, everything works.
Same issue here, but it was solved when I realized I was on an admin account that doesn't have authorization for that resource URI.

SSL Context for older python version

I have code as below:
import ssl
import httplib

headers = {'content-type': 'ContentType.APPLICATION_XML'}
uri = "www.client.url.com/hit-here/"
clientCert = "path/to/cert/abc.crt"
clientKey = "path/to/key/abc.key"
PROTOCOL = ssl.PROTOCOL_TLSv1
context = ssl.SSLContext(PROTOCOL)
context.load_default_certs()
context.load_cert_chain(clientCert, clientKey)
conn = httplib.HTTPSConnection(uri, some_port, context=context)
I am not really a network programmer, so I did some googling about handshake connections and found ssl.SSLContext(PROTOCOL) to be the function I needed; the code works fine.
Then I hit a roadblock: my local machine runs 2.7.10, but all the production boxes have 2.7.3, so SSLContext is not supported there, and upgrading the Python version is not an option / not in my control.
I tried reading ssl — SSL wrapper for socket objects but couldn't make sense of it.
What I tried (in vain):
s_ = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s = ssl.wrap_socket(s_, keyfile=clientKey, certfile=clientCert, cert_reqs=ssl.CERT_REQUIRED)
new_conn = s.connect((uri, some_port))
but it returns:
SSLError(1, u'[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)')
Question - how do I build an SSL context on the older version so as to have a secure HTTPS connection?
You have to specify the ca_certs file (which should point to the trust store).
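For example, a sketch under these assumptions: clientKey/clientCert are the paths from the question, some_port is your endpoint's port, and the ca_certs path is a guess at a Debian/Ubuntu bundle (substitute your distribution's bundle or certifi.where()):
import socket
import ssl

s_ = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s = ssl.wrap_socket(
    s_,
    keyfile=clientKey,
    certfile=clientCert,
    cert_reqs=ssl.CERT_REQUIRED,
    # ca_certs must point at a trust store for CERT_REQUIRED to work.
    ca_certs='/etc/ssl/certs/ca-certificates.crt',
    ssl_version=ssl.PROTOCOL_TLSv1,
)
s.connect(('www.client.url.com', some_port))
# Note: wrap_socket validates the certificate chain but not the hostname.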
I've got the perfect solution using the requests library. The requests library has got to be my favorite library I've ever used, because it takes something in Python that is inherently difficult to do -- SSL and REST requests -- and makes it unbelievably simple. I checked their version support, and Python 2.6+ is supported.
Here is an example of how to use their library.
>>> import requests
>>> requests.get(uri)
And that is all you have to do. The requests library takes care of establishing a ssl connection.
Taking this one step further: if you need to persist cookies between requests, you can do so like this.
>>> sess = requests.Session()
>>> credentials = {"username": "user",
...                "password": "pass"}
>>> sess.post("https://some-website/login", params=credentials)
<Response [200]>
>>> sess.get("https://some-website/a-backend-page").text
<html> the backend page... </html>
Edit: If you need to, you can also pass in the path to the certificate and the key, like so: requests.get(uri, cert=('path/to/cert/abc.crt', 'path/to/key/abc.key'))
Now hopefully you can convince them to install the requests library on the production boxes, cause it would be well worth it. Let me know if this works out for you.
