Getting SSL error when trying to scrape a webpage with Python

I'm trying to scrape this webpage using Python:
https://fftoolbox.scoutfantasysports.com/football/rankings/PrintVersion.php
I've been using the requests package. I can "solve" the issue by setting verify=False, but I've read that that's not secure. In other threads, people said to point the requests.get() function to the file path of the relevant certificate. I exported the certificate from my browser and then tried that, but with no luck. This
requests.get('https://fftoolbox.scoutfantasysports.com/football/rankings/PrintVersion.php',verify='C:/Users/ericb/Desktop/fftoolboxscoutfantasysportscom.crt')
still gives the SSL error:
SSLError: HTTPSConnectionPool(host='fftoolbox.scoutfantasysports.com', port=443): Max retries exceeded with url: /football/rankings/PrintVersion.php (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')],)",),))
And this
requests.get('https://fftoolbox.scoutfantasysports.com/football/rankings/PrintVersion.php',cert='C:/Users/ericb/Desktop/fftoolboxscoutfantasysportscom.crt')
yields
Error: [('PEM routines', 'PEM_read_bio', 'no start line'), ('SSL routines', 'SSL_CTX_use_PrivateKey_file', 'PEM lib')]
I've done a decent amount of web scraping before, but I've never had to deal with certificates until now. How can I get around this? I should also note that I'd like to put my final Python script and any files it uses onto a public GitHub repo, but I don't want to do anything that would jeopardize my security, like uploading keys or something.

The server is misconfigured: it does not send the intermediate certificate it needs to send.
See this report: https://www.ssllabs.com/ssltest/analyze.html?d=fftoolbox.scoutfantasysports.com&hideResults=on
Certificates provided 1 (1776 bytes)
Chain issues Incomplete
Or https://sslanalyzer.comodoca.com/?url=fftoolbox.scoutfantasysports.com
Trusted by Microsoft? No (unable to get local issuer certificate) UNTRUSTED
Trusted by Mozilla? No (unable to get local issuer certificate) UNTRUSTED
With openssl s_client -connect fftoolbox.scoutfantasysports.com:443 -showcerts you can see:
Certificate chain
0 s:/OU=Domain Control Validated/CN=fftoolbox.scoutfantasysports.com
i:/C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, Inc./OU=http://certs.godaddy.com/repository//CN=Go Daddy Secure Certificate Authority - G2
The web server should be configured to send the /C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, Inc./OU=http://certs.godaddy.com/repository//CN=Go Daddy Secure Certificate Authority - G2 intermediate certificate, but it does not.
So you could contact the website and tell them they are misconfigured. You will not be the only one impacted, as the second link shows.
Alternatively, you could add the missing certificate locally as fully trusted, but that somewhat lowers your security. You can also download the missing certificate (not the website's own certificate, but the intermediate one) locally and pass verify='/path/to/certificate' in your requests.get call.
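A minimal sketch of that last option, assuming the missing GoDaddy intermediate has already been downloaded in PEM format as gd_intermediate.pem (a hypothetical file name): append it to a copy of certifi's default CA bundle and point verify= at the result.
import shutil
import certifi
import requests

# Start from the CA bundle that requests/certifi ships with.
bundle_path = 'ca-bundle-with-intermediate.pem'
shutil.copyfile(certifi.where(), bundle_path)

# Append the downloaded intermediate certificate (hypothetical file name).
with open('gd_intermediate.pem', 'rb') as intermediate, open(bundle_path, 'ab') as bundle:
    bundle.write(b'\n' + intermediate.read())

r = requests.get(
    'https://fftoolbox.scoutfantasysports.com/football/rankings/PrintVersion.php',
    verify=bundle_path,
)
print(r.status_code)
This keeps verification enabled, and the bundle file is safe to commit to a public repo since it contains only public certificates, no private keys.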

Related

Unable to Complete SSL Connection with Certificate and Python Request

I've been having this problem for days now and can't figure out what exactly is wrong. I'm trying to connect to a server that requires authentication with digital certificates, and I have done this before with the requests library in the following manner:
cert = ('/path/to/cert.crt', '/path/to/cert.open.key')
response = requests.get(url_server, cert=cert, headers=headers, proxies=proxies)
At first, I received the following error: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852). I imagined it had something to do with the server certificate, because I have used the same client certificate to log in to other sites before. For testing purposes, I made the call with verify=False in order to ignore any problems with the server certificate, and got this:
SSLError(1, '[SSL: SSLV3_ALERT_BAD_CERTIFICATE] sslv3 alert bad certificate (_ssl.c:852)'),))
I tried to point verify to a folder with the cert chain files (root and intermediate), and also to the certBundle file when that didn't work, as documented in https://docs.python-requests.org/en/master/user/advanced/#ssl-cert-verification, but to no avail. I'm lacking knowledge about the minutiae of the handshake process and therefore can't think of any new way to debug this issue. Any help would be deeply appreciated.
I figured it out: the client certificate was lacking its certificate chain. Using Wireshark, I intercepted the packets and compared what the browser was sending with what requests was sending. That allowed me to see that the browser automatically appends the client certificate chain, while requests doesn't.
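A minimal sketch of that fix, with hypothetical file names: concatenate the client certificate and its intermediate CA certificate(s) into one PEM file, then pass that chain together with the private key.
import requests

# Build a single PEM file containing the client certificate followed by its
# intermediate CA certificate(s). File names here are hypothetical.
with open('client-chain.pem', 'wb') as chain:
    for part in ('client.crt', 'client-intermediate.crt'):
        with open(part, 'rb') as f:
            chain.write(f.read() + b'\n')

# requests accepts a (cert, key) tuple; the cert file may contain the full chain,
# which is then presented to the server during the TLS handshake.
response = requests.get(
    'https://example.com/protected',                      # placeholder URL
    cert=('client-chain.pem', '/path/to/cert.open.key'),
)
print(response.status_code)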

Pulling Mimecast Logs with Python

I am hoping someone has gone through this and has a working Python script. I have been trying to pull MTA logs from Mimecast. So far, I have tried the code from the websites below:
https://www.mimecast.com/tech-connect/documentation/endpoint-reference/logs-and-statistics/get-siem-logs/
https://github.com/JoshuaSmeda/mimecast_log_collector
https://github.com/bsdkid/mimecast-api-class/blob/master/get-TTP.py
The error I get is
SSLError: HTTPSConnectionPool(host='api.mimecast.com', port=443): Max retries exceeded with url: /api/login/discover-authentication (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:645)'),))
I also have all the necessary credentials, such as user (account), password, app_id, app_key, access_key, and secret_key. Unfortunately, nothing has worked for me.
Any help is much appreciated.
Thanks
You probably have some sort of SSL inspection happening in your environment.
Have you tried testing on another instance, perhaps one where there is no transparent proxy filtering internet traffic?
You can also try using the SSL verify argument (set to False) for the API request to ignore the cert validation issue.
Arg:
verify=False
Example based on https://github.com/JoshuaSmeda/mimecast_log_collector:
try:
    r = requests.post(url='https://api.mimecast.com/api/login/discover-authentication', data=json.dumps(post_body), headers=headers, verify=False)
except requests.exceptions.RequestException as e:
    print(e)
If it works for the discovery call, then add the verify argument to each post. Keep in mind the risks of doing this: you open yourself up to MITM attacks, for example, so essentially the risks of plain HTTP apply.
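If you would rather keep verification enabled, a minimal sketch of the safer alternative is to export your inspection proxy's root CA and point verify at it (the file name, headers, and body below are placeholders, not Mimecast's actual schema):
import json
import requests

headers = {'Content-Type': 'application/json'}                 # placeholder headers
post_body = {'data': [{'emailAddress': 'user@example.com'}]}   # placeholder body

r = requests.post(
    url='https://api.mimecast.com/api/login/discover-authentication',
    data=json.dumps(post_body),
    headers=headers,
    verify='proxy-root-ca.pem',   # hypothetical path to the exported inspection-proxy root CA
)
print(r.status_code)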
Documentation on requests can be found here:
https://buildmedia.readthedocs.org/media/pdf/requests/latest/requests.pdf
Hope this helps.

How to capture Python HTTPS traffic in Fiddler?

Python throws errors whenever I try to do some data-fetching task.
This only happens when I set Fiddler to decrypt HTTPS traffic.
I have tried routing Python traffic through 127.0.0.1:8888, and the same with Mozilla, in order to catch its traffic.
I also installed the certificate and trusted it via Fiddler. I am not sure where I am going wrong.
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='google.com', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:748)'),))
The above is the error I get whenever I try to fetch a page with requests.
TL;DR: The requests library does not use the Windows certificate store; it has its own (as per https://bugs.python.org/issue28547). This means that your Fiddler MITM certificate is not available to Python requests by default.
Your options are
Disable SSL verification (verify=False)
Add your cert via the REQUESTS_CA_BUNDLE environment variable
Add your Fiddler cert explicitly (verify='\path\to\cert')
More details can be found at http://docs.python-requests.org/en/master/user/advanced/#ssl-cert-verification
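For example, a minimal sketch of the second and third options, assuming the Fiddler root certificate has been exported in PEM format to C:\certs\fiddler-root.pem (a hypothetical path):
import os
import requests

# Option 2: make the Fiddler root CA the bundle for every requests call in this process.
os.environ['REQUESTS_CA_BUNDLE'] = r'C:\certs\fiddler-root.pem'   # hypothetical path
print(requests.get('https://google.com/').status_code)

# Option 3: pass the certificate explicitly for a single call.
print(requests.get('https://google.com/', verify=r'C:\certs\fiddler-root.pem').status_code)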
On a side note, it does feel a little strange for requests to be using its own cert bundle rather than the platform-supplied one, especially given that all the browsers are happy to use the platform ones.
As pointed out by polhemic and Eric Aronesty, for testing purposes, you can temporarily set "CURL_CA_BUNDLE" to an empty string:
import os
# An empty CURL_CA_BUNDLE effectively disables certificate verification in requests, so use this for testing only.
os.environ['CURL_CA_BUNDLE'] = ''

Can I provide a list of acceptable hostnames to an SSLSocket?

I'm trying to connect to a server with a self-signed certificate and no domain name. The problem is, despite having loaded a copy of the server's certificate with SSLContext.load_verify_locations(), it seems to consider it invalid:
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:550)
I'm pretty sure it's just a hostname mismatch, because I'm connect()ing to the server's IP, and the certificate doesn't have the IP in the Common Name field. Is there any way to tell the SSLSocket “it's okay if the server's certificate is for one of these hostnames”?
You can add your IP to the certificate's Subject Alternative Names. There's good information on how to do that in related questions.
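Once the self-signed certificate has been regenerated with the server's IP address in its Subject Alternative Name, a minimal client-side sketch looks like this (the IP, port, and certificate path are placeholders):
import socket
import ssl

# Trust only the server's self-signed certificate (placeholder path).
context = ssl.create_default_context(cafile='server-cert.pem')

# Since Python 3.7, an IP address passed as server_hostname is matched against
# IP entries in the certificate's Subject Alternative Name.
with socket.create_connection(('203.0.113.10', 443)) as sock:
    with context.wrap_socket(sock, server_hostname='203.0.113.10') as ssock:
        print(ssock.version(), ssock.getpeercert()['subjectAltName'])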

Obtain SSL certificate from peer without verification using Python

I am in the process of building a quick Python script to periodically check that my clients' websites are working correctly. One of these checks is to ensure their SSL certificates are current, or to provide an alert if a certificate is about to expire.
The ssl package provides a way to obtain the peer certificate with the SSLSocket.getpeercert() method, but this will only return a certificate if the certificate can be validated. If the CA cert has not been obtained, the validation does not work.
What I want to do is obtain the peer certificate even if it cannot be validated, so I can get the information required both to obtain the correct CA certificate and to do other checks, such as checking that the domain name matches, that the expiry date is in the correct range, etc. Does anybody know of a way to obtain this information?
pyCurl and pyOpenSSL look like possible candidates, but I have not been able to find an example or managed to get them to return the certificate.
Cheers
It may be possible to use a shell script to grab the certificates and then use Python to iterate over certificate output files. Something like:
$ openssl s_client -connect host:port -showcerts > certfile
might work. You might also read the documentation on pyOpenSSL's Connection object, which has a get_peer_certificate() method:
http://packages.python.org/pyOpenSSL/openssl-connection.html#l2h-187
I haven't ever used the pyOpenSSL module, but it's probably your best bet for keeping everything in Python.
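For reference, a minimal sketch of that pyOpenSSL approach, which leaves verification off so the peer certificate is returned even when the chain cannot be validated (the host name is a placeholder, and a reasonably recent pyOpenSSL is assumed for TLS_CLIENT_METHOD):
import socket
from OpenSSL import SSL

host = 'example.com'   # placeholder host

ctx = SSL.Context(SSL.TLS_CLIENT_METHOD)
ctx.set_verify(SSL.VERIFY_NONE, lambda conn, cert, errno, depth, ok: True)   # do not validate the chain

sock = socket.create_connection((host, 443))
conn = SSL.Connection(ctx, sock)
conn.set_tlsext_host_name(host.encode())     # send SNI so the right certificate is returned
conn.set_connect_state()
conn.do_handshake()

cert = conn.get_peer_certificate()
print(cert.get_subject().CN, cert.get_notAfter())   # subject CN and expiry timestamp
conn.close()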
