We have a premium subscription in the Azure portal.
I just want to know the difference between ssl_cert_reqs and ssl=True.
If I make a connection as below, will it be secure?
r = redis.StrictRedis(host=myHostname, port=6380,
                      password=myPassword, ssl_cert_reqs='none', ssl=True)
If it's not secure, how can I create a certificate file for it?
It looks like if SSL is enabled we need to specify ssl_ca_certs. Where do we get this file, or do we need to generate it with some Azure service?
For Azure Cache for Redis version 3.0 or higher, the TLS/SSL certificate check is enforced. ssl_ca_certs must be explicitly set when connecting to Azure Cache for Redis.
Ref: https://learn.microsoft.com/en-us/azure/azure-cache-for-redis/cache-python-get-started
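For illustration, a minimal sketch of a verified connection, assuming the CA certificate bundle has already been downloaded and saved locally (the ca-bundle.pem path is a placeholder; myHostname and myPassword are the placeholders from the question):

import redis

# Sketch: verify the server certificate against a locally saved CA bundle.
# 'ca-bundle.pem' is a placeholder path; adjust it to where you saved the file.
r = redis.StrictRedis(host=myHostname, port=6380,
                      password=myPassword,
                      ssl=True,
                      ssl_cert_reqs='required',
                      ssl_ca_certs='ca-bundle.pem')
print(r.ping())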
I am attempting to run this 'Retrieve a secret from the vault' example locally (Ubuntu 19.10) to retrieve a secret from an Azure Key Vault:
from azure.keyvault.secrets import SecretClient
from azure.identity import DefaultAzureCredential

client = SecretClient(vault_url="https://<<vaultname>>.vault.azure.net",
                      credential=DefaultAzureCredential())
secret = client.get_secret("<<mysecret>>")
However I receive the following error:
azure.core.exceptions.ClientAuthenticationError:
No credential in this chain provided a token.
Attempted credentials:
EnvironmentCredential: Incomplete environment configuration. See https://aka.ms/python-sdk-identity#environment-variables for
expected environment variables
ImdsCredential: IMDS endpoint unavailable
Please visit the documentation at
https://aka.ms/python-sdk-identity#defaultazurecredential
to learn what options DefaultAzureCredential supports
The documentation on Service-to-Service authentication to Key Vault seems to suggest that I should be able to authenticate via the Azure CLI. I've followed the steps to log in via az login, select the appropriate subscription (which I've done just in case, despite only having one), and verify access via az account get-access-token --resource https://vault.azure.net, which does return a token; however, I still receive the error above.
Am I wrong in assuming I should be able to authenticate after logging in via the cli?
And if so, and I need to manually set the environment variables described in the documentation link provided for EnvironmentCredential, what values do I need to supply for AZURE_CLIENT_ID and AZURE_CLIENT_SECRET?
Am I wrong in assuming I should be able to authenticate after logging in via the cli?
You're not wrong; it's possible with the current preview version of azure-identity, 1.4.0b2 as I write this. With that installed, your code should work once you've logged in to the CLI.
... what values do I need to supply for AZURE_CLIENT_ID and AZURE_CLIENT_SECRET?
These would be the client (or "application") ID of a service principal, and one of its secrets. The azure-keyvault-secrets documentation describes how to create a service principal and configure its access to a Key Vault, using the CLI.
Briefly restating that documentation here, you can create a service principal with this command:
az ad sp create-for-rbac --name http://my-application
From the output of that command, "appId" is the value of AZURE_CLIENT_ID and "password" is the value of AZURE_CLIENT_SECRET.
Then, to grant the service principal access to the Key Vault's secrets:
az keyvault set-policy --name <<vaultname>> --spn $AZURE_CLIENT_ID --secret-permissions get set list delete backup recover restore purge
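For completeness, a hedged sketch of wiring those values into EnvironmentCredential; note that AZURE_TENANT_ID (the "tenant" field from the create-for-rbac output) is also needed, and the vault and secret names are the placeholders from the question:

import os

from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# EnvironmentCredential (part of the DefaultAzureCredential chain) reads these:
os.environ["AZURE_CLIENT_ID"] = "<appId from create-for-rbac>"
os.environ["AZURE_CLIENT_SECRET"] = "<password from create-for-rbac>"
os.environ["AZURE_TENANT_ID"] = "<tenant from create-for-rbac>"

client = SecretClient(vault_url="https://<<vaultname>>.vault.azure.net",
                      credential=DefaultAzureCredential())
print(client.get_secret("<<mysecret>>").value)

In practice you would export these variables in the shell or your deployment configuration rather than hard-coding them in the script.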
I am trying to get the Google Cloud Platform Data Loss Prevention (DLP) client library for Python working behind an SSL proxy:
https://cloud.google.com/dlp/docs/libraries#client-libraries-usage-python
I am using the code snippet from the doc:
# Import the client library
import google.cloud.dlp
import os
import subprocess
import json
import requests
import getpass
import urllib.parse
import logging
logging.basicConfig(level=logging.DEBUG)
# Instantiate a client.
dlp_client = google.cloud.dlp.DlpServiceClient()
# The string to inspect
content = 'Robert Frost'
# Construct the item to inspect.
item = {'value': content}
# The info types to search for in the content. Required.
info_types = [{'name': 'FIRST_NAME'}, {'name': 'LAST_NAME'}]
# The minimum likelihood to constitute a match. Optional.
min_likelihood = 'LIKELIHOOD_UNSPECIFIED'
# The maximum number of findings to report (0 = server maximum). Optional.
max_findings = 0
# Whether to include the matching string in the results. Optional.
include_quote = True
# Construct the configuration dictionary. Keys which are None may
# optionally be omitted entirely.
inspect_config = {
    'info_types': info_types,
    'min_likelihood': min_likelihood,
    'include_quote': include_quote,
    'limits': {'max_findings_per_request': max_findings},
}
# Convert the project id into a full resource id.
parent = dlp_client.project_path('my-project-id')
# Call the API.
response = dlp_client.inspect_content(parent, inspect_config, item)
# Print out the results.
if response.result.findings:
    for finding in response.result.findings:
        try:
            print('Quote: {}'.format(finding.quote))
        except AttributeError:
            pass
        print('Info type: {}'.format(finding.info_type.name))
        # Convert likelihood value to string representation.
        likelihood = (google.cloud.dlp.types.Finding.DESCRIPTOR
                      .fields_by_name['likelihood']
                      .enum_type.values_by_number[finding.likelihood]
                      .name)
        print('Likelihood: {}'.format(likelihood))
else:
    print('No findings.')
I also set up the following env variable:
GOOGLE_APPLICATION_CREDENTIALS
It runs without issue when I am not behind an SSL proxy. When I am working behind a proxy, I set up the following 3 env variables:
REQUESTS_CA_BUNDLE
HTTP_PROXY
HTTPS_PROXY
With this setup, other GCP client Python libraries (for example storage and bigquery) work fine behind the SSL proxy.
For the DLP client Python lib, I am getting:
E0920 12:21:49.931000000 24852 src/core/tsi/ssl_transport_security.cc:1229] Handshake failed with fatal error SSL_ERROR_SSL: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed.
DEBUG:google.api_core.retry:Retrying due to 503 Connect Failed, sleeping 0.0s ...
E0920 12:21:50.927000000 24852 src/core/tsi/ssl_transport_security.cc:1229] Handshake failed with fatal error SSL_ERROR_SSL: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed.
DEBUG:google.api_core.retry:Retrying due to 503 Connect Failed, sleeping 0.0s ...
I didn't find anything in the documentation explaining whether this lib works behind a proxy like the other GCP client libs do, or how to configure it to work with an SSL proxy. The lib is in beta, so it could be that this is not yet implemented.
It seems related to the CA certificate and the handshake. There is no issue with the same CA for the BigQuery and Storage client Python libs. Any ideas?
Your proxy is performing TLS Interception. This results in the Google libraries not trusting the SSL certificate that your proxy is presenting when accessing Google API endpoints. This is a man-in-the-middle problem.
The solution is to bypass the proxy for Google APIs. In your VPC subnet where your application is running, enable Private Google Access. This requires that the default VPC routing rule still exists (or recreate it).
Private Google Access
[EDIT after comments below]
I am adding this comment to scare the beeswax out of management.
TLS Interception is so dangerous that no reasonable company would implement it if they read the following.
The scenario in this example. I am an IT person responsible for a corporate proxy. The company has implemented TLS Interception and I control the proxy. I have no access to Google Cloud resources for my company. I am very smart and I understand Google Cloud IAM and OAuth very well. I am going to hack my company because maybe I did not get a raise (invent your own reason).
I wait for one of the managers who has an organization or project owner/editor level permissions to authenticate with Google Cloud. My proxy logs the HTTPS headers, body and response for everything going to https://www.googleapis.com/oauth2/v4/token and a few more URLs.
Maybe the proxy is storing the logs on a Google Cloud bucket or a SAN volume without solid authorization implemented. Maybe I am just a software engineer who finds the proxy log files lying around or easily accessed.
The corporate admin logs into his Google Account. I capture the returned OAuth Access Token. I can now impersonate the org admin for the next 3,600 seconds. Additionally, I capture the OAuth Refresh Token. I can now recreate OAuth Access Tokens at my will anytime I want until the Refresh Token is revoked which for most companies, they never do.
For doubters, study my Golang project which shows how to save OAuth Access Tokens and Refresh Tokens to a file for any Google Account used to authenticate. I can take this file home and be authorized without any authentication. This code will recreate the Access Token when it expires giving me almost forever access to any account these credentials are authorized for. Your internal IT resources will never know that I am doing this outside of your corporate network.
Note: Stackdriver Audit logging can capture the IP address, however, the identity will be the credentials that I stole. To hide my IP address, I would go to Starbucks or a public library a few hours drive from my home/job and do my deeds from there. Now figure out the where and the who for this hacker. This will give a forensics expert heartburn.
https://github.com/jhanley-com/google-cloud-shell-cli-go
Note: This problem is not an issue with Google OAuth or Google Cloud. This is an example of a security problem that the company has deployed (TLS Interceptions). This style of technique will work for almost all authentication systems that I know of that do not use MFA.
[END EDIT]
Summary:
The Data Loss Prevention client library for Python uses gRPC: google-cloud-dlp uses gRPC, while google-cloud-bigquery and google-cloud-storage rely on the requests library for JSON-over-HTTPS. Because it is gRPC, other env variables need to be set up:
GRPC_DEFAULT_SSL_ROOTS_FILE_PATH=path_file.pem
# for debugging
GRPC_TRACE=transport_security,tsi
GRPC_VERBOSITY=DEBUG
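For illustration, a hedged sketch of setting these from Python before the client is created (the .pem path is a placeholder for your proxy's CA bundle):

import os

# Point gRPC at the proxy's CA bundle so the TLS handshake can be verified.
# The path is a placeholder; adjust it to your environment.
os.environ["GRPC_DEFAULT_SSL_ROOTS_FILE_PATH"] = "/etc/ssl/certs/proxy-ca.pem"

# Optional gRPC debugging output.
os.environ["GRPC_TRACE"] = "transport_security,tsi"
os.environ["GRPC_VERBOSITY"] = "DEBUG"

import google.cloud.dlp  # imported after the env vars are set

dlp_client = google.cloud.dlp.DlpServiceClient()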
More details and links can be found here link.
This doesn't solve all the issues, because it continues to fail after the handshake (TLS proxy) as described here link. As well explained by @John Hanley, we should enable Private Google Access instead, which is the recommended and secure way. This is not yet in place in the network zone where I am using the APIs, so the proxy team added an SSL bypass and it is now working. I am waiting to have Private Google Access enabled to get a clean and secure setup for using the GCP APIs.
I do not know how to write a connection string in Python using the boto3 API to make a JDBC connection to an existing database on AWS Redshift. I am using MobaXterm or PuTTY to make an SSH connection. I have some code to create the table, but I am lost as to how to connect to the database in Redshift.
import boto3

s3client = boto3.client('redshift', config=client_config)

CREATE TABLE pheaapoc_schema.green_201601_csv (
    vendorid         varchar(4),
    pickup_datetime  TIMESTAMP,
    dropoff_datetime TIMESTAMP,
I need to connect to database "dummy" and create a table.
TL;DR: You do not need IAM credentials or boto3 to connect to Redshift. What you need is the endpoint for the Redshift cluster, the Redshift credentials, and a Postgres client with which you can connect.
You can connect to a Redshift cluster just the way you connect to any database (like MySQL, PostgreSQL, or MongoDB). To connect to any database, you need five items (see the connection example after the list):
host - (This is nothing but the endpoint you get from the AWS console/Redshift)
username - (Refer again to the AWS console/Redshift. Take a look at the master username section)
password - (If you created the Redshift cluster, you should know the password for the master user)
port number - (5439 for Redshift)
database - (The default database you created at first)
Refer to the screenshot if it is not intuitive.
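For illustration, a minimal sketch using psycopg2 (every value below is a placeholder for the five items above):

import psycopg2

# Sketch, assuming psycopg2 is installed; all values are placeholders for
# the endpoint, master username, password, port, and database listed above.
conn = psycopg2.connect(
    host="my-cluster.abc123xyz0.us-east-1.redshift.amazonaws.com",
    user="masteruser",
    password="my-master-password",
    port=5439,
    dbname="dummy",
)

with conn.cursor() as cur:
    cur.execute("SELECT current_database()")
    print(cur.fetchone())

conn.close()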
What do the boto3 APIs do?
Boto3 provides APIs with which you can manage your Redshift cluster. For example, it provides APIs to delete your cluster, resize your cluster, or take a snapshot of your cluster. They do not involve connecting to the database at all.
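That said, if you do want to look up the cluster endpoint programmatically, a hedged sketch using boto3's describe_clusters (the cluster identifier is a placeholder):

import boto3

# boto3 is only used here to *describe* the cluster, not to connect to it.
redshift = boto3.client('redshift')
resp = redshift.describe_clusters(ClusterIdentifier='my-cluster')  # placeholder id

endpoint = resp['Clusters'][0]['Endpoint']
print(endpoint['Address'], endpoint['Port'])  # host and port for the DB client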
I have a Cloud SQL instance storing data in a database, and I have checked the option for this Cloud SQL instance to block all unencrypted connections. When I select this option, I am given three SSL certificates - a server certificate, a client public key, and a client private key (three separate .pem files) (link to relevant CloudSQL+SSL documentation). These certificate files are used to establish encrypted connections to the Cloud SQL instance.
I'm able to successfully connect to Cloud SQL using MySQL from the command line using the --ssl-ca, --ssl-cert, and --ssl-key options to specify the server certificate, client public key, and client private key, respectively:
mysql -uroot -p -h <host-ip-address> \
--ssl-ca=server-ca.pem \
--ssl-cert=client-cert.pem \
--ssl-key=client-key.pem
I am now trying to run a PySpark job that connects to this Cloud SQL instance to extract the data to analyze it. The PySpark job is basically the same as this example provided by Google Cloud training team. On line 39 of said script, there is a JDBC connection that is made to the Cloud SQL instance:
jdbcDriver = 'com.mysql.jdbc.Driver'
jdbcUrl = 'jdbc:mysql://%s:3306/%s?user=%s&password=%s' % (CLOUDSQL_INSTANCE_IP, CLOUDSQL_DB_NAME, CLOUDSQL_USER, CLOUDSQL_PWD)
but this does not make an encrypted connection and does not provide the three certificate files. If I have unencrypted connections to the Cloud SQL instance disabled, I see the following error message:
17/09/21 06:23:21 INFO org.spark_project.jetty.util.log: Logging initialized #5353ms
17/09/21 06:23:21 INFO org.spark_project.jetty.server.Server: jetty-9.3.z-SNAPSHOT
17/09/21 06:23:21 INFO org.spark_project.jetty.server.Server: Started #5426ms
17/09/21 06:23:21 INFO org.spark_project.jetty.server.AbstractConnector: Started ServerConnector@74af54ac{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
[...snip...]
py4j.protocol.Py4JJavaError: An error occurred while calling o51.load.
: java.sql.SQLException: Access denied for user 'root'@'<cloud-sql-instance-ip>' (using password: YES)
whereas if I have unencrypted connections to the Cloud SQL instance enabled, the job runs just fine. (This indicates that the issue is not with Cloud SQL API permissions - the cluster I'm running the PySpark job from definitely has permission to access the Cloud SQL instance.)
The JDBC connection strings I have found involving SSL add a &useSSL=true or &encrypt=true but do not point to external certificates; or, they use a keystore in some kind of Java-specific procedure. How can I modify the JDBC connection string from the Python script linked to above, in order to point JDBC (via PySpark) to the locations of the server certificate and client public/private keys (server-ca.pem, client-cert.pem, and client-key.pem) on disk?
There's a handy initialization action for configuring the CloudSQL Proxy on Dataproc clusters. By default it assumes you intend to use CloudSQL for the Hive metastore, but if you download it and customize it setting ENABLE_CLOUD_SQL_METASTORE=0 and then re-upload it into your own bucket to use as your custom initialization action, then you should automatically get the CloudSQL proxy installed on all your nodes. And then you just set your mysql connection string to point to localhost instead of the real CloudSQL IP.
When specifying the metadata flags, since you've disabled the metastore component, you'd use additional-cloud-sql-instances instead of hive-metastore-instance in your metadata:
--metadata "additional-cloud-sql-instances=<PROJECT_ID>:<REGION>:<ANOTHER_INSTANCE_NAME>=tcp:<PORT_#>"
In this case you can optionally use the same port assignment the script would've used by default for the metastore, which is port 3306.
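For illustration, a hedged sketch of what the connection string from the example script might look like once the proxy is listening locally (assuming the proxy was configured on port 3306 as above):

# With the Cloud SQL proxy running on each node, connect through localhost;
# the proxy carries the encrypted tunnel to the Cloud SQL instance.
jdbcDriver = 'com.mysql.jdbc.Driver'
jdbcUrl = 'jdbc:mysql://localhost:3306/%s?user=%s&password=%s' % (
    CLOUDSQL_DB_NAME, CLOUDSQL_USER, CLOUDSQL_PWD)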
I am trying to connect to a Crate database with Python:
from crate import client
url = '434.342.435.2:4400' # Faked these numbers for purposes of this post
conn = client.connect(url)
It seems like I need to pass the cert_file and key_file arguments to client.connect which point to my .pem and .key files. Looking in the documentation, I cannot find any resource to create or download these files.
Any advice? Even a comment pointing me to a good resource for beginners would be appreciated.
So cert and key files are part of the TLS encryption of an HTTP(S) connection and are required if you use a self-signed certificate :)
This seems to be a very good explanation of the file types
As mfussenegger explained in the comment, these files are optional and only required if your CrateDB instance is "hidden" behind a reverse proxy server like NGINX or Apache with a self-signed certificate.
A small green lock on the far left of your browser's address bar indicates HTTPS (and therefore TLS) with known certificates.
Typically, certificates signed by an unknown CA - like yourself - result in a warning page and a red indicator.
Since you are also referring to username and password, they usually indicate some sort of auth (maybe basic auth) which is not yet supported by crate-python :(
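For completeness, a hedged sketch of passing those files to client.connect if your instance does sit behind a proxy with a self-signed certificate (the file paths are placeholders, and the HTTPS URL reuses the faked address from the question):

from crate import client

# Sketch: only needed when CrateDB is behind a reverse proxy with a
# self-signed certificate; the file paths are placeholders.
conn = client.connect('https://434.342.435.2:4400',
                      cert_file='/path/to/client-cert.pem',
                      key_file='/path/to/client-key.key')
cursor = conn.cursor()
cursor.execute("SELECT name FROM sys.cluster")
print(cursor.fetchone())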