Databricks CLI: SSLError, can't find local issuer certificate - python

I have installed and configured the Databricks CLI, but when I try using it I get an error indicating that it can't find a local issuer certificate:
$ dbfs ls dbfs:/databricks/cluster_init/
Error: SSLError: HTTPSConnectionPool(host='dbc-12345678-1234.cloud.databricks.com', port=443): Max retries exceeded with url: /api/2.0/dbfs/list?path=dbfs%3A%2Fdatabricks%2Fcluster_init%2F (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)')))
Does the above error indicate that I need to install a certificate, or somehow configure my environment so that it knows how to find the correct certificate?
My environment is Windows 10 with WSL (Ubuntu 20.04) (the command above is from WSL/Ubuntu command line).
The Databricks CLI was installed into an Anaconda environment including the following certificates and SSL packages:
$ conda list | grep cert
ca-certificates 2020.6.20 hecda079_0 conda-forge
certifi 2020.6.20 py38h32f6830_0 conda-forge
$ conda list | grep ssl
openssl 1.1.1g h516909a_1 conda-forge
pyopenssl 19.1.0 py_1 conda-forge
I get a similar error when I attempt to use the REST API with curl:
$ curl -n -X GET https://dbc-12345678-1234.cloud.databricks.com/api/2.0/clusters/list
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.haxx.se/docs/sslcerts.html
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

This problem can be worked around by disabling SSL certificate verification, which means the server's identity is no longer checked. In the Databricks CLI you can do so by specifying insecure = True in your Databricks configuration file, .databrickscfg.
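As a rough illustration of that configuration change, the sketch below adds the insecure = True line to one profile of a .databrickscfg-style file using Python's standard configparser module. The function name set_insecure and the default profile name are assumptions made for this example, not part of the Databricks CLI; note also that rewriting the file this way drops any comments it contained.

```python
import configparser


def set_insecure(config_path, profile="DEFAULT"):
    """Add `insecure = True` to one profile of a .databrickscfg-style file.

    Hypothetical helper for illustration; it is not shipped with the CLI.
    """
    # interpolation=None keeps '%' characters in tokens or hosts untouched.
    config = configparser.ConfigParser(interpolation=None)
    config.read(config_path)

    # configparser treats DEFAULT specially; only named profiles need add_section.
    if profile != "DEFAULT" and not config.has_section(profile):
        config.add_section(profile)
    config[profile]["insecure"] = "True"

    # Rewrite the whole file with the extra key (comments are not preserved).
    with open(config_path, "w") as handle:
        config.write(handle)
```

It could be called as set_insecure(os.path.expanduser("~/.databrickscfg")), assuming the configuration file lives in the default location.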

I established trust to my Databricks instance by setting the environment variable REQUESTS_CA_BUNDLE.
➜ databricks workspace list
Error: SSLError: HTTPSConnectionPool(host='HOSTNAME.azuredatabricks.net', port=443): Max retries exceeded with url: /api/2.0/workspace/list?path=%2F (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)')))
➜ export REQUESTS_CA_BUNDLE=path/to/ca-bundle
➜ databricks workspace list
Users
Shared
Repos
From GitHub Issue:
Download the root CA certificate used to sign the Databricks certificate. Determine the path to the CA bundle and set the environment variable REQUESTS_CA_BUNDLE. See SSL Cert Verification for more information.
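One thing to watch for: REQUESTS_CA_BUNDLE replaces the default bundle, so a file holding only the one downloaded root may stop other HTTPS hosts from verifying. A minimal sketch of one way around that, assuming you already have a base bundle in PEM format (for example the file printed by python -m certifi, or /etc/ssl/certs/ca-certificates.crt on Ubuntu), is to append the downloaded root to a copy of it. The function name build_ca_bundle and all paths are hypothetical.

```python
from pathlib import Path

PEM_MARKER = b"-----BEGIN CERTIFICATE-----"


def build_ca_bundle(base_bundle, extra_cert, output):
    """Write base_bundle followed by extra_cert to output and return its path.

    Hypothetical helper: both inputs are assumed to be PEM files.
    """
    base = Path(base_bundle).read_bytes()
    extra = Path(extra_cert).read_bytes()
    if PEM_MARKER not in extra:
        raise ValueError(f"{extra_cert} does not look like a PEM certificate")

    # Make sure each part ends with a newline so the PEM blocks do not run together.
    if base and not base.endswith(b"\n"):
        base += b"\n"
    if not extra.endswith(b"\n"):
        extra += b"\n"

    out_path = Path(output)
    out_path.write_bytes(base + extra)
    return out_path
```

The resulting file is what REQUESTS_CA_BUNDLE would then point to.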

There is a similar issue on GitHub for the Azure CLI. The solution is practically the same. Combining that with Erik's answer:
1. Download the certificate using your browser and save it to disk:
   - Open Chrome and go to the Databricks website
   - Press CTRL + SHIFT + I to open the dev tools
   - Click the Security tab
   - Click the View certificate button
   - Click the Details tab
   - In the Certification Hierarchy (the top panel), click the highest node in the tree
   - Click Export the selected certificate
   - Choose where you want to save it (e.g. /home/cert/certificate.crt)
2. Use the set command on Windows or export on Linux to create an environment variable called REQUESTS_CA_BUNDLE and point it to the file downloaded in step 1. (Keep in mind that this needs to be done on the machine where you run dbfs, not on the cluster.) For instance:
   Linux
   export REQUESTS_CA_BUNDLE=/home/cert/certificate.crt
   Windows
   set REQUESTS_CA_BUNDLE=c:\temp\cert\certificate.crt
3. Try to run your command dbfs ls dbfs:/databricks/cluster_init/ again:
$ dbfs ls dbfs:/databricks/cluster_init/
It should work!
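If it does not work, one thing worth checking is the format of the exported file. Browsers can export certificates either as Base64 text (PEM) or as binary DER, and the CA bundle is expected to be PEM. The sketch below, using only the standard ssl module, converts a DER export to PEM and leaves a PEM file unchanged. The function name ensure_pem and the paths are assumptions made for this example.

```python
import ssl
from pathlib import Path

PEM_HEADER = b"-----BEGIN CERTIFICATE-----"


def ensure_pem(cert_path, pem_path):
    """Copy cert_path to pem_path, converting from DER to PEM if necessary.

    Hypothetical helper: anything without a PEM header is assumed to be DER.
    """
    data = Path(cert_path).read_bytes()
    if PEM_HEADER in data:
        # Already Base64/PEM text: keep the bytes as they are.
        pem_bytes = data
    else:
        # Binary export: wrap the DER bytes in a Base64 PEM envelope.
        pem_bytes = ssl.DER_cert_to_PEM_cert(data).encode("ascii")

    out_path = Path(pem_path)
    out_path.write_bytes(pem_bytes)
    return out_path
```

The converted file can then be used as the value of REQUESTS_CA_BUNDLE in place of the original export.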

Related

SSL error only in python command window with apify request

I am trying to use an endpoint from apify.com. When I run my request in a web browser with a token everything is fine, but if I run the request via the requests library from the Python console I get the following error:
SSLError: HTTPSConnectionPool(host='', port=443): Max retries exceeded with url: /endpoint?token=token (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1131)')))
Moreover, if I set verify = False in my request, then the request works. Does anyone have an idea what could be wrong? Thanks in advance.
I had this issue come up a few weeks ago.
$ pip install certifi
$ python -m certifi
I'm not certain that one needs to actually call the module to get its functionality, but I did and it solved the error. More info on Certifi here. It is also a recommended package extension to requests from their website. I added those last bits because I was wary of installing a package that ostensibly was never called after installation.
The solution was to install an internal company SSL package for managing SSL connections from Python. There was a recent change.
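When it is unclear which CA bundle Python is actually consulting, a small diagnostic can help. The sketch below, offered as an illustration only, collects the default OpenSSL locations reported by the standard ssl module, the two environment variables commonly used to override them, and the certifi bundle path if certifi happens to be installed. The function name describe_ca_sources is made up for this example.

```python
import os
import ssl


def describe_ca_sources():
    """Return a dict describing where CA certificates may be loaded from."""
    paths = ssl.get_default_verify_paths()
    info = {
        # Locations compiled into the OpenSSL build that Python links against.
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
        # Environment overrides (None when unset).
        "SSL_CERT_FILE": os.environ.get("SSL_CERT_FILE"),
        "REQUESTS_CA_BUNDLE": os.environ.get("REQUESTS_CA_BUNDLE"),
    }
    try:
        import certifi  # optional third-party package

        info["certifi"] = certifi.where()
    except ImportError:
        info["certifi"] = None
    return info


if __name__ == "__main__":
    for name, value in describe_ca_sources().items():
        print(f"{name}: {value}")
```

A corporate root certificate missing from every listed location would be consistent with the "self signed certificate in certificate chain" error above.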

PySpark and SparkContext are unable to verify self-signed certificate in chain when installing packages

I am using a Jupyter Notebook with the PySpark kernel and am attempting to install packages with the command sc.install_pypi_package("<PACKAGE>","<ARTIFACTORY_DOMAIN>"). This results in
Collecting pandas
Could not fetch URL https://<ARTIFACTORY_DOMAIN>: There was a problem confirming the
ssl certificate: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self
signed certificate in certificate chain (_ssl.c:1091) - skipping
Our Artifactory server is in a corporate network so we need a self signed certificate (PEM key) in order to access the domain. Is there a way to troubleshoot this error so that we can allow the package installation to work? I am not very certain of the architecture since it is an AWS managed service that provisions notebooks but a virtualenv is used for PySpark and should be running on the EC2 instance that has the self-signed certificate located in /etc/ssl/certs/<CERT>.
If I use a Python kernel I am able to simply use pip install <PACKAGE> and this reaches out to Artifactory to get the package. We have a global /etc/pip.conf and /etc/conda/condarc. I am not sure why this works while the PySpark kernel does not.

SSL Error on using ESRI arcgis python api

For my company's project, I need to use the ESRI ArcGIS Python API to access the data in our Enterprise ArcGIS portal.
After installing the arcgis library, I tested the connection via GIS().
The code looks like this:
gis = GIS( profile="link to the portal",username ="username",password="password",verify_cert = False ,proxy_host='username:password#proxy_host',proxy_port=proxy_port)
But it gives me an error as below
Please set verify_cert=False due to encountered SSL error: HTTPSConnectionPool(host='www.arcgis.com', port=443): Max retries exceeded with url: /sharing/rest/generateToken (Caused by SSLError(SSLError(1, '[SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1123)')))
The error persists even though I set verify_cert = False.
I also tried setting the proxy in the environment beforehand:
os.environ['https_proxy'] = "http://proxy"
No luck there either.
My OpenSSL version is OpenSSL 1.1.1k 25 Mar 2021.
I would very much appreciate it if anyone could suggest a solution.
We managed to get past this error by downgrading urllib3 to 1.25.11 in the Python virtual environment after installing arcgis, using pip install urllib3==1.25.11,
then connected using our proxy: gis = GIS(the_url, verify_cert=False, proxy_host="our.proxy", proxy_port=port_num)

Error creating AWS lambda function using "sam build"

When running sam build --use-containers to create an AWS python 3.8 lambda function that uses a downloaded library, I am getting an error:
pip._vendor.requests.exceptions.SSLError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Max retries exceeded with url: /packages/d0/32/6c367f54699bd51961cf3e10299f6dee976f0f6813210052a4d8c2bd1d2b/pymemcache-3.2.0-py2.py3-none-any.whl (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate is not yet valid (_ssl.c:1108)')))
I checked the certificate on https://files.pythonhosted.org, and the cert is marked as starting on 7/13/2020. it's currently 7/14/2020.
I see that I can set the trusted hosts option to hopefully avoid this (similar to: pip install fails with "connection error: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:598)"), but when pip is being run from within a container via a script I'm not sure how to set it.
It looks like I can use an environment variable to set the pip trusted hosts as well, but I am not sure how to set that in the Docker image used by SAM.
(running on a windows 10 system)
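A "certificate is not yet valid" failure for a certificate that started the previous day is at least consistent with the clock inside the container being behind, although the question does not confirm this. As a hedged illustration, the sketch below checks whether a given time falls inside a certificate's validity window, using the date format that Python's ssl module uses for notBefore and notAfter. The function name within_validity is hypothetical, and the dates in the usage example are illustrative: the question only gives the start date (7/13/2020), so the exact times and the expiry date are assumptions.

```python
import ssl
import time


def within_validity(not_before, not_after, now=None):
    """Return True if `now` (seconds since the epoch) lies in the validity window.

    not_before / not_after use the "Jul 13 00:00:00 2020 GMT" style that
    ssl.cert_time_to_seconds understands.
    """
    start = ssl.cert_time_to_seconds(not_before)
    end = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return start <= current <= end


if __name__ == "__main__":
    # Illustrative values only; run this inside the build container to
    # compare its clock against the certificate's start date.
    print(within_validity("Jul 13 00:00:00 2020 GMT", "Jul 13 00:00:00 2021 GMT"))
    print("container clock (UTC):", time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime()))
```

If the printed container time is earlier than the certificate's start date, fixing the clock would be a more direct remedy than marking hosts as trusted.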

SSL 1108 Mac Issue

I am making a bot for Discord using discord.py. I've seen multiple threads on this but am still experiencing issues. I am on Mac and every time I try to run my script in VS Code, I get this error:
raise ClientConnectorCertificateError(
aiohttp.client_exceptions.ClientConnectorCertificateError: Cannot connect to host discordapp.com:443 ssl:True [SSLCertVerificationError: (1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1108)')]
As seen here, you'll need to go to Applications/Python 3.X/ and you'll see a file called Install Certificates.command.
Double-click that and you should be fine when you run your bot again.
To fix the issue please do the below:
Install this package: https://pypi.org/project/certifi/
pip install certifi
Go to your Terminal and paste this:
/Applications/Python\ 3.11/Install\ Certificates.command
Note: you may need to change the Python version in the path to match your installed version.
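To check whether the fix took effect, a quick sketch using only the standard ssl module is to count the CA certificates that Python's default SSL context has loaded. The function name loaded_ca_count is made up for this example. One caveat: certificates that live in a CA directory rather than a single bundle file are loaded lazily, so a count of zero is a hint rather than proof that no CAs are available.

```python
import ssl


def loaded_ca_count():
    """Return how many CA certificates the default SSL context has loaded."""
    context = ssl.create_default_context()
    # cert_store_stats() reports counts for 'x509', 'crl' and 'x509_ca'.
    return context.cert_store_stats()["x509_ca"]


if __name__ == "__main__":
    print("CA certificates loaded by default:", loaded_ca_count())
```

Running it before and after Install Certificates.command should show whether the default context now finds a CA bundle.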
