Failing to connect to a Redshift cluster that is not publicly accessible - Python

I'm a newbie to AWS Redshift and I'm not able to fetch data from a schema on my Redshift cluster. The cluster is inside a VPC and is not publicly accessible. I configured the security group with the following inbound rules:
1) Type : Redshift, Protocol : TCP, Port Range: 5439, Source: 0.0.0.0/0
2) Type : Redshift, Protocol : TCP, Port Range: 5439, Source: 'Security group name'
I've followed several docs and blogs and tried various libraries (redshift_tool with pandas, SQLAlchemy, pyodbc) to connect to the cluster from Python, and settled on psycopg2:
import psycopg2

# note: no dbname is passed here, so psycopg2 will default the database
# name to the user name, which may not be intended
conn = psycopg2.connect(
    host=HOST,
    port=RS_PORT,
    user=RS_USER,
    password=PWD)
But I'm facing the following error with all of the libraries, including psycopg2:
psycopg2.OperationalError: could not connect to server: Connection timed out
Is the server running on host and accepting
TCP/IP connections on port 5439?
Q1) Am I missing some configuration steps? Please suggest any other way to connect to the cluster with Python/drivers.
Q2) How do I connect to the cluster via an SSH tunnel? If this is a possible way to connect, please help me with the steps.
Thanks in Advance.

Make sure you added the cluster to the correct 'VPC security groups'.
You can find this setting under 'Network and security'.
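For Q2, one common pattern is to open an SSH tunnel through a bastion host inside the VPC and point psycopg2 at the local end of the tunnel. A minimal sketch using the sshtunnel package; the bastion host, key path, cluster endpoint, and credentials are all placeholders:
import psycopg2
from sshtunnel import SSHTunnelForwarder  # pip install sshtunnel

# All hosts, users, paths and credentials below are placeholders.
with SSHTunnelForwarder(
        ('bastion.example.com', 22),  # bastion host reachable from your machine
        ssh_username='ec2-user',
        ssh_pkey='/path/to/key.pem',
        remote_bind_address=('mycluster.abc123.us-east-1.redshift.amazonaws.com', 5439),
        local_bind_address=('127.0.0.1', 5439)) as tunnel:
    conn = psycopg2.connect(
        host='127.0.0.1',
        port=tunnel.local_bind_port,
        dbname='dev',
        user='awsuser',
        password='...')
    # run queries here; the tunnel closes when the with-block exits
    conn.close()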

Related

Redshift new cluster used to be accessible but not anymore

Last week, I was able to successfully connect to Redshift clusters. This week I am unable to connect even though I used the same configs for the following:
Virtual Private Cloud (VPC)
Security groups
Cluster subnet group
Publicly accessible
Cluster permissions
But this week I get the error
Traceback (most recent call last):
File "create_staging_tables.py", line 93, in <module>
conn = psycopg2.connect(
File "/Users/bsubramanian/.pyenv/versions/3.8.2/lib/python3.8/site-packages/psycopg2/__init__.py", line 122, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: could not connect to server: Operation timed out
Is the server running on host "clustername.region.redshift.amazonaws.com" (54.243.82.201) and accepting
TCP/IP connections on port 5439?
when running a Python script that connects to the Redshift cluster and creates some tables.
How do I debug what is wrong?
Typically these issues are network related. Checking connectivity from your client system to the database is a good start.
First off, check the connection information: go to the Redshift console and confirm the IP address given in the error message is the IP address of the leader node. If these don't match, your code has some wrong configuration. (Note that Redshift can also have a public IP if you configured the cluster as such. Most users don't do this for security reasons, but if you do, you likely should be using that IP address.)
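If you'd rather check from code, the cluster endpoint can also be fetched with boto3 and compared against what your script uses; a small sketch, assuming AWS credentials are configured (the region and cluster identifier are placeholders):
import boto3

redshift = boto3.client('redshift', region_name='us-east-1')  # placeholder region
# 'my-cluster' is a placeholder cluster identifier
resp = redshift.describe_clusters(ClusterIdentifier='my-cluster')
endpoint = resp['Clusters'][0]['Endpoint']
print(endpoint['Address'], endpoint['Port'])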
Next, a simple test of network connectivity is a good step. The Linux command telnet can do this: telnet <cluster-endpoint> 5439. Now, telnet cannot speak the Redshift protocol, but if you get any response other than a timeout, telnet was able to make the initial connection to Redshift. If this doesn't work, then a lot more information about your network configuration will be needed to debug.
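If telnet isn't installed on the client, the same reachability test can be done with Python's standard library; a minimal sketch (the endpoint is a placeholder):
import socket

def can_reach(host, port, timeout=5):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(can_reach('clustername.region.redshift.amazonaws.com', 5439))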
Now, all of this assumes you don't have a connection pool server in between your client and the DB. It looks like that's the case here, but ...
If you can connect via IP address but not with the cluster DNS name then a DNS issue is likely. We'll need more info on your DNS setup (and some on the network). This doesn't look to be the issue but ...
If telnet can connect but your client cannot (with the same info) then it could be a security group configuration issue.
There are lots of possibilities. Start by checking the connection info and update the issue as you learn more.
I was able to resolve this by creating new instances of the following:
Virtual Private Cloud (VPC)
VPC Security Group
Cluster Subnet group

GCP dataproc with presto - is there a way to run queries remotely via python using pyhive?

I am trying to run queries on a Presto cluster I have running on Dataproc, via Python (using presto from pyhive) on my local machine. But I can't seem to figure out the host URL. Does GCP Dataproc even allow accessing Presto clusters remotely?
I tried using the URL from Presto's web UI, but that didn't work either.
I also checked the docs about using the Cloud Client Libraries for Python, which weren't helpful either: https://cloud.google.com/dataproc/docs/tutorials/python-library-example
from pyhive import presto

query = '''select * FROM system.runtime.nodes'''
# host must be a bare hostname or IP address, not a full URL --
# passing something like "https://host" makes the client treat "https" as the hostname
presto_conn = presto.Connection(host=host, port=8060, username=user)
presto_cursor = presto_conn.cursor()
presto_cursor.execute(query)
Error
ConnectionError: HTTPConnectionPool(host='https', port=80): Max retries exceeded with url: {url}
(Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb41c0c25d0>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))
Update
I was able to manually create a VM on GCP Compute, configure Trino, and set up firewall rules and a load balancer to be able to access the cluster.
Gotta check if Dataproc allows a similar config.
Looks like the Google firewall is blocking connections from the outside world.
How to fix
Quick and dirty solution
Just allow access to port 8060 from your IP to the Dataproc cluster.
This might not scale if you're on a public IP address, but it will allow you to develop.
It is a bad idea to expose "big data" services to the whole internet. You might get hacked, and Google will shut down the service.
Use an SSH tunnel
Create a small instance (one from the free tier), expose the SSH port to the internet, and use port-forwarding.
Your URLs won't be https://dataproc-cluster:8060..., but https://localhost:forwarded_port
This is easy to do, and you can turn off that bastion VM when it's not needed.
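To make the tunnel approach concrete, a sketch of what the client side could look like, assuming local port 8060 is already forwarded to the Presto coordinator (the key path, bastion host, and user are placeholders):
from pyhive import presto

# Assumes an SSH tunnel is already forwarding local port 8060 to the
# Presto coordinator, e.g. something like:
#   ssh -i key.pem -L 8060:presto-coordinator:8060 user@bastion-vm
presto_conn = presto.Connection(host='localhost', port=8060, username='user')
presto_cursor = presto_conn.cursor()
presto_cursor.execute('select * FROM system.runtime.nodes')
print(presto_cursor.fetchall())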

Issue connecting Python with MySQL on Google Cloud Platform

I have used the following code in Python:
import mysql.connector as mysql
import sys
HOST = "34.87.95.90"
DATABASE = "CAO_db"
USER = "root"
PASSWORD = "*********"
db_connection = mysql.connect(user=USER, password=PASSWORD, host=HOST, database=DATABASE)
cur = db_connection.cursor()
When I run the above code, I get the following error messages:
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
InterfaceError: 2003: Can't connect to MySQL server on '34.87.95.90:3306' (10060 A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond)
I am not sure how to fix my code and/or resolve these errors. Please ask if you would like more details of the error messages. I would greatly appreciate all the help I can get towards resolving the issue.
One thing I'm not seeing here is whether or not you have configured your Cloud SQL instance to accept connections.
You can configure it to accept connections from within the GCP stratosphere using their "Private IP" internal networking magic, and you can configure it to accept connections from other machines using a combination of Public IP and either an authorized external network (for example, if you were accessing your GCP Cloud SQL instance from an Amazon EC2 instance) or their Cloud SQL Proxy tool (which is what I use to connect to my Cloud SQL instance from my laptop).
In the GCP Console, go to your project
From the hamburger menu, select SQL
Click on your Cloud SQL instance
In the left nav, click on Connections
If you have Private IP checked and you're running this code on a GCP Compute/GKE resource, confirm that the "Network" field is set to the network used by that resource.
If you're just trying to get a connection from your local machine and you don't have a static IP to whitelist, your best option is to use Public IP in combination with Cloud SQL Proxy.
Cloud SQL Proxy essentially creates a TCP tunnel that allows your laptop to connect to 'localhost' on a port you specify, and it then redirects your connection to the remote Cloud SQL instance.
Once you've established that your networking situation isn't the problem, you could use the same Python connection code that you wrote above, but change HOST to 127.0.0.1 and add an attribute for PORT=3308.
EDITED to add: I suggest using PORT=3308 for your cloud_sql_proxy connection so that it doesn't interfere with any existing port 3306 (MySQL default) connections that you may already be actually running on your local machine. If this isn't the case, you can either omit the PORT attribute or keep it explicit, but change it to 3306.
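To make that concrete, here is a sketch of the adjusted connection code, assuming the Cloud SQL Proxy is already running locally and listening on port 3308 (the instance connection name in the comment is a placeholder):
import mysql.connector as mysql

# Assumes the Cloud SQL Proxy is already running locally, e.g.:
#   ./cloud_sql_proxy -instances=my-project:my-region:my-instance=tcp:3308
HOST = "127.0.0.1"      # the proxy's local end, not the instance's public IP
PORT = 3308
DATABASE = "CAO_db"
USER = "root"
PASSWORD = "*********"

db_connection = mysql.connect(user=USER, password=PASSWORD, host=HOST,
                              port=PORT, database=DATABASE)
cur = db_connection.cursor()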

How to connect to Amazon redshift cluster from within my Amazon EC2 instance

I have a Redshift cluster in my AWS account. I am able to connect to it in Python, and when I run the script locally, it runs perfectly fine:
import numpy as np
import psycopg2

con = psycopg2.connect(dbname='some_dbname', host='hostname.us-east-2.redshift.amazonaws.com', port='port#', user='username', password='password')
cursor = con.cursor()
query = "select * from table;"
cursor.execute(query)
data = np.array(cursor.fetchall())
cursor.close()
con.commit()
con.close()
But, when I copy the above script to my EC2 instance (Amazon Linux AMI), and then try running it, I get the following error:
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: could not connect to server: Connection timed out
Is the server running on host "hostname.us-east-2.redshift.amazonaws.com" and accepting
TCP/IP connections on port port#?
Can anybody help me out with connecting to my Redshift cluster from my EC2 instance? Thanks!
It's either networking or security groups
When you provision an Amazon Redshift cluster, it is locked down by default so nobody has access to it. To grant other users inbound access to an Amazon Redshift cluster, you associate the cluster with a security group.
See http://docs.aws.amazon.com/redshift/latest/mgmt/working-with-security-groups.html
If the EC2 instance is in the same VPC as the Redshift cluster, you should be OK for networking. If not, look at this guide:
http://docs.aws.amazon.com/redshift/latest/mgmt/enhanced-vpc-routing.html
The issue is related to the security group(s) attached to the Redshift cluster.
I have faced this issue myself, so I would like you to follow these steps -
VPC & Region Checking
Check -
VPC - whether your EC2 instance is in the same VPC as the Redshift cluster
Account region - ensure the regions are the same too.
If the above two criteria are true, then -
Check if your EC2 instance's private IP is whitelisted in the security group attached to the Redshift cluster. (I usually don't create instances with public IPs; instead I log in via a bastion host for better security and ease, as I have to handle a lot of EC2 instances.)
You may refer to this AWS documentation to do the above: Authorize Access To Cluster
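As a side note, the same ingress rule can be added programmatically with boto3; a sketch where the region, security group ID, and the instance's private IP are placeholders:
import boto3

ec2 = boto3.client('ec2', region_name='us-east-2')  # placeholder region
# Allow inbound Redshift traffic (port 5439) from the EC2 instance's private IP.
# The group ID and CIDR below are placeholders.
ec2.authorize_security_group_ingress(
    GroupId='sg-0123456789abcdef0',
    IpPermissions=[{
        'IpProtocol': 'tcp',
        'FromPort': 5439,
        'ToPort': 5439,
        'IpRanges': [{'CidrIp': '10.0.1.25/32', 'Description': 'EC2 client'}],
    }])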
After this, verify your work by logging into your EC2 instance and running:
telnet <redshift_cluster_endpoint> <redshift_cluster_port>
If the connection is successful, it means you can connect to Redshift via your EC2 instance.
In addition to the above:
You are able to connect to Redshift while running the code locally from your machine because your public IP (home/workplace) must have been whitelisted in Redshift's attached security groups. So consider EC2 as a similar machine in the cloud whose IP needs whitelisting.

Pymongo connection timeout from remote machine

I have a Bitnami MEAN Stack running on AWS EC2. I'm trying to connect from a remote machine using PyMongo.
from pymongo import MongoClient

# note the '@' separating the credentials from the host in the URI
conn = MongoClient('mongodb://username:password@ec2blah.us-east-1.compute.amazonaws.com:27017/dbname')
but I keep getting an error along the lines of pymongo.errors.ConnectionFailure: timed out
I have edited /opt/bitnami/mongodb/mongodb.conf to supposedly allow external connections by commenting out bind_ip = 127.0.0.1 and uncommenting bind_ip = 0.0.0.0, and I have tried all permutations of commenting/uncommenting those lines.
I've looked over the web for about 90 minutes now trying different things but without luck!
On the MongoDB server, do the port connection test, and make sure the DB service is running well. If not, start the service.
telnet ec2blah.us-east-1.compute.amazonaws.com 27017
On the remote machine, do the port connection test, to make sure there is no firewall issue.
telnet ec2blah.us-east-1.compute.amazonaws.com 27017
If you have issues connecting, you need to check the security groups on this instance.
Click the EC2 instance name --> Description --> view rules; you should see whether the ports are opened.
If not, create a new security group, such as `mongoDB`, with TCP port 27017 opened for inbound traffic, then assign it to that instance.
You should be fine to connect now.
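Once the port is open, a quick way to verify from Python is to force a round trip with a short server-selection timeout, so failures surface in a few seconds instead of hanging; a sketch with placeholder credentials:
from pymongo import MongoClient
from pymongo.errors import ServerSelectionTimeoutError

client = MongoClient(
    'mongodb://username:password@ec2blah.us-east-1.compute.amazonaws.com:27017/dbname',
    serverSelectionTimeoutMS=5000)  # fail fast instead of the long default
try:
    client.admin.command('ping')  # forces an actual round trip to the server
    print('Connected')
except ServerSelectionTimeoutError as exc:
    print('Connection failed:', exc)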
At MongoDB start-up, set the bind_ip argument to ::,0.0.0.0 (all IPv6 and IPv4 addresses):
mongod --bind_ip ::,0.0.0.0
Read more in the docs of MongoDB: IP Binding.
