Unable to locate credentials when setting up an Airflow connection through env variables - Python

So I am trying to set up an S3Hook in my Airflow DAG by setting the connection programmatically in my script, like so:
from airflow.hooks.S3_hook import S3Hook
from airflow.models import Connection
from airflow import settings

def s3_test_hook():
    conn = Connection(
        conn_id='aws-s3',
        conn_type='s3',
        extra={"aws_access_key_id": aws_key,
               "aws_secret_access_key": aws_secret},
    )
I can run the conn line with no problem, which tells me the connection object can be created. aws_key and aws_secret are loaded in through dotenv from an .env file I have in my local directory.
However when I run the next two lines in the function:
s3_hook = S3Hook(aws_conn_id='aws-s3')
find_bucket = s3_hook.check_for_bucket('nba-data')
to check for a bucket I know exists, I receive this error:
NoCredentialsError: Unable to locate credentials
Any thoughts on how to approach this?
Thanks!

In your code, you have created an Airflow Connection object, but this doesn't do anything by itself. When a hook is given a connection id, it will look up the given id in various locations (in this order):
Secrets backend (if configured)
Environment variable AIRFLOW_CONN_*
Airflow metastore
Your connection is currently only defined in code, but Airflow is unable to locate it in any of the three locations above.
The Airflow documentation provides some pointers for configuring an AWS connection: https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/connections/aws.html
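For example, one option is to register the connection in the Airflow metastore with a SQLAlchemy session (the same pattern shown in the next question below); another is to export it as an AIRFLOW_CONN_<CONN_ID> environment variable before Airflow starts. A minimal sketch of the metastore route, assuming Airflow 2.x and hypothetical AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY entries in the .env file:
import json
import os
from dotenv import load_dotenv
from airflow import settings
from airflow.models import Connection

load_dotenv()  # the question loads the key/secret from a local .env file
aws_key = os.environ["AWS_ACCESS_KEY_ID"]           # hypothetical .env variable names
aws_secret = os.environ["AWS_SECRET_ACCESS_KEY"]

# Persist the connection in the metastore so any hook can look it up by conn_id.
conn = Connection(
    conn_id="aws-s3",
    conn_type="s3",
    extra=json.dumps({  # Connection.extra is stored as a JSON string
        "aws_access_key_id": aws_key,
        "aws_secret_access_key": aws_secret,
    }),
)
session = settings.Session()
if not session.query(Connection).filter(Connection.conn_id == conn.conn_id).first():
    session.add(conn)
    session.commit()
After this has run once, S3Hook(aws_conn_id='aws-s3') can resolve the credentials.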

Related

Airflow conn_id missing

I'm running a containerized Airflow project which loads API data to Azure Blob Storage or Data Lake. I'm currently having trouble getting Airflow to identify my connections. I've tried several methods to resolve the issue but I still haven't made progress in fixing this problem.
I've tried manually adding the connection in the Airflow UI, entering:
conn_id="azure_data_lake",
conn_type="Azure Blob Storage",
host="",
login=StorageAccountName,
password=StorageAccountKey,
port=""
however, once I run the DAG I get this error. I've tried running airflow db reset and airflow db init.
File "/opt/airflow/plugins/operators/coinmarketcap_toAzureDataLake.py", line 60, in upload_to_azureLake
wasb_hook = WasbHook(self.azure_conn_id)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/microsoft/azure/hooks/wasb.py", line 65, in __init__
self.connection = self.get_conn()
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/microsoft/azure/hooks/wasb.py", line 71, in get_conn
return BlockBlobService(account_name=conn.login, account_key=conn.password, **service_options)
File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/orm/attributes.py", line 365, in __get__
retval = self.descriptor.__get__(instance, owner)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/connection.py", line 213, in get_password
return fernet.decrypt(bytes(self._password, 'utf-8')).decode()
File "/home/airflow/.local/lib/python3.8/site-packages/cryptography/fernet.py", line 194, in decrypt
raise InvalidToken
cryptography.fernet.InvalidToken
If I programmatically add the connection via a Python script, running the Airflow DAG gives me a missing conn_id error. But surprisingly, when I run the airflow connections list command, I see the conn_id in the DB.
from airflow import settings
from airflow.models import Connection

conn = Connection(
    conn_id="azure_data_lake",
    conn_type="Azure Blob Storage",
    host="",
    login=StorageAccountName,
    password=StorageAccountKey,
    port=""
)  # create a connection object
session = settings.Session()  # get the session
session.add(conn)
session.commit()
In your case, the problem is cryptography.fernet.InvalidToken.
Airflow uses Fernet to encrypt all connection passwords in its backend database.
The connection you created was encrypted with a newly generated Fernet key, while the Airflow backend is still using the previous key, so it cannot decrypt the password.
My recommendation is to do the following first:
airflow resetdb
(airflow db reset in Airflow 2.x) to delete all the existing records in your backend DB.
Then:
airflow initdb
(airflow db init in Airflow 2.x) to initialize the backend afresh.
If the error still persists, regenerate your fernet_key (airflow.cfg > fernet_key):
$ python
>>> from cryptography.fernet import Fernet
>>> k=Fernet.generate_key()
>>> print(k)
Z6BkzaWcF7r5cC-VMAumjpBpudSyjGskQ0ObquGJhG0=
then, edit $AIRFLOW_HOME/airflow.cfg and set the new value as fernet_key under the [core] section.
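As an alternative to editing airflow.cfg by hand, Airflow also reads configuration overrides from AIRFLOW__SECTION__KEY environment variables, so the regenerated key can be exported instead. A minimal sketch (the printed value is what you would paste or export):
from cryptography.fernet import Fernet

# Generate a fresh key; decode() so it prints without the b'' prefix.
new_key = Fernet.generate_key().decode()
print(new_key)

# Put this value in airflow.cfg under [core] as fernet_key, or export it as
#   AIRFLOW__CORE__FERNET_KEY=<the printed value>
# before starting the scheduler and webserver.
Note that connections encrypted with the old key can no longer be decrypted once the key changes, which is why the reset is recommended first.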

Connecting to Milvus Database through Google Kubernetes Engine and Python

I’m looking to connect to a Milvus database I deployed on Google Kubernetes Engine.
I am running into an error in the last line of the script. I'm running the script locally.
Here's the process I followed to set up the GKE cluster: (https://milvus.io/docs/v2.0.0/gcp.md)
Here is a similar question I'm drawing from
Any thoughts on what I'm missing?
import os
from pymilvus import connections
from kubernetes import client, config
My_Kubernetes_IP = 'XX.XXX.XX.XX'
# Authenticate with GCP credentials
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = os.path.abspath('credentials.json')
# load milvus config file and connect to GKE instance
config = client.Configuration(os.path.abspath('milvus/config.yaml'))
config.host = f'https://{My_Kubernetes_IP}:19530'
client.Configuration.set_default(config)
## connect to milvus
milvus_ip = 'xx.xxx.xx.xx'
connections.connect(host=milvus_ip, port= '19530')
Error:
BaseException: <BaseException: (code=2, message=Fail connecting to server on xx.xxx.xx.xx:19530. Timeout)>
If you want to connect to Milvus in the k8s cluster by IP and port, you may need to forward your local port 19530 to the Milvus service. Use a command like the following:
$ kubectl port-forward service/my-release-milvus 19530
Have you checked what your Milvus external IP is?
Following the documentation's instructions, you should use kubectl get services to check which external IP is allocated for Milvus.
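Putting the two answers together, a minimal sketch of connecting with pymilvus once the port-forward is running locally (the service name comes from the command above; the alias and localhost address are assumptions):
from pymilvus import connections, utility

# With `kubectl port-forward service/my-release-milvus 19530` running,
# the Milvus gRPC port is reachable on localhost.
connections.connect(alias="default", host="127.0.0.1", port="19530")

# Simple sanity check that the connection works.
print(utility.list_collections())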

AWS RDS Proxy error (postgres) - RDS Proxy currently doesn’t support command-line options

I'm trying to read from or write to an AWS RDS Proxy with a Postgres RDS instance as the endpoint.
The operation works with psql, but fails on the same client with pg8000 or psycopg2 as the client library in Python.
The operation works with pg8000 and psycopg2 if I use the RDS instance directly as the endpoint (without the RDS Proxy).
sqlalchemy/psycopg2 error message:
Feature not supported: RDS Proxy currently doesn’t support command-line options.
A minimal version of the code I use:
from sqlalchemy import create_engine
import os
from dotenv import load_dotenv
load_dotenv()
login_string = os.environ['login_string_proxy']
engine = create_engine(login_string, client_encoding="utf8", echo=True, connect_args={'options': '-csearch_path={}'.format("testing")})
engine.execute(f"INSERT INTO testing.mytable (product) VALUES ('123')")
pg8000: the place where it stops and waits is in core.py:
def sock_read(b):
    try:
        return self._sock.read(b)
    except OSError as e:
        raise InterfaceError("network error on read") from e
A minimal version of the code I use:
import pg8000
import os
from dotenv import load_dotenv
load_dotenv()
db_connection = pg8000.connect(database=os.environ['database'], host=os.environ['host'], port=os.environ['port'], user=os.environ['user'], password=os.environ['password'])
db_connection.run(f"INSERT INTO mytable (data) VALUES ('data')")
db_connection.commit()
db_connection.close()
The RDS Proxy logs always look normal for all the examples I mentioned, e.g.:
A new client connected from ...:60614.
Received Startup Message: [username="", database="", protocolMajorVersion=3, protocolMinorVersion=0, sslEnabled=false]
Proxy authentication with PostgreSQL native password authentication succeeded for user "" with TLS off.
A TCP connection was established from the proxy at ...:42795 to the database at ...:5432.
The new database connection successfully authenticated with TLS off.
I opened up all ports via security groups on the RDS and the RDS proxy and I used an EC2 inside the VPC.
I tried with autocommit on and off.
The "command-line option" being referred to is the -csearch_path={}.
Remove that, and then, once the connection is established, execute set search_path = whatever as your first query.
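A minimal sketch of that change against the SQLAlchemy snippet above (same login_string_proxy environment variable, same testing schema):
import os
from dotenv import load_dotenv
from sqlalchemy import create_engine, text

load_dotenv()
login_string = os.environ['login_string_proxy']

# No connect_args={'options': ...}: RDS Proxy rejects startup command-line options.
engine = create_engine(login_string, client_encoding="utf8", echo=True)

with engine.begin() as conn:
    # Set the search_path as the first statement on the connection instead.
    conn.execute(text("SET search_path TO testing"))
    conn.execute(text("INSERT INTO mytable (product) VALUES ('123')"))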
This is a known issue where pg8000 can't connect to AWS RDS Proxy (Postgres). I opened a PR (https://github.com/tlocke/pg8000/pull/72); let's see if Tony Locke (the author of pg8000) approves the change. If not, you have to change the lines in core.py yourself (https://github.com/tlocke/pg8000/pull/72/files):
The unconditional flush:
self._write(FLUSH_MSG)
becomes conditional:
if (code != PASSWORD):
    self._write(FLUSH_MSG)

How do I save database configuration without writing it in my Python file

I have a Python application that requires database credentials; the code looks something like this:
def __init__(self):
    self.conn = pymysql.connect("localhost", "user", "pass", "db", use_unicode=True, charset="utf8")
As you can see, I am hard-coding the sensitive data, and when I push this code to GitHub I have to remove it, and when I run it on the server I have to modify it. So basically I have to edit this file for each prod/dev environment.
I know we can store variables in Linux as database_name=foobar and later run echo $database_name to retrieve the value, but how do I use this in my Python application?
You mean environment variables?
You can access them like this:
import os
os.getenv('DATABASE_NAME')
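Applied to the connection from the question, a minimal sketch (the environment variable names are an assumption):
import os
import pymysql

# Read the credentials from the environment instead of hard-coding them.
conn = pymysql.connect(
    host=os.getenv("DATABASE_HOST", "localhost"),
    user=os.getenv("DATABASE_USER"),
    password=os.getenv("DATABASE_PASS"),
    database=os.getenv("DATABASE_NAME"),
    use_unicode=True,
    charset="utf8",
)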
What I have been doing in similar cases is to keep a separate module with all config settings, declared as "constant" variables, like this:
#global_setup.py (separate module)
MY_DB_SERVER = "localhost"
MY_DB_USER = "user"
MY_DB_PASS = "pass"
MY_DB_DB = "db"
Then you import everything from that module whenever you need to use those constants. You can create a separate version of that file without sensitive info in order to upload to public Git servers.
# main module
from global_setup import *

def __init__(self):
    self.conn = pymysql.connect(MY_DB_SERVER, MY_DB_USER, MY_DB_PASS, MY_DB_DB, use_unicode=True, charset="utf8")
Now, take care in case your application will be deployed in an environment where the user should not be able to access the database itself, or if it will access the database over a non-encrypted connection. You may need more security measures in those cases, such as connecting through SSL, or having a server-side application expose an API.
You can use the envparse module. It allows you to read environment variables and cast them to the proper types.
You can add a variable for each value (database name, database host, and so on), or create a postprocessor and define a single variable as a URL:
import pymysql
from envparse import env

db_host = env('DB_HOST', default='localhost')
db_port = env.int('DB_PORT', default=3306)
db_user = env('DB_USER')
db_pass = env('DB_PASS')
db_name = env('DB_NAME')

conn = pymysql.connect(db_host, db_user, db_pass, db_name, use_unicode=True, charset="utf8")
There is a configparser module in Python's standard library for creating an .ini file, saving credentials in it, and reading them when you require them:
https://docs.python.org/3/library/configparser.html
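A minimal sketch of that approach (the file name, section, and key names are assumptions); keep the .ini file itself out of version control:
# config.ini (not committed to Git) would look like:
# [database]
# host = localhost
# user = user
# password = pass
# name = db

import configparser
import pymysql

parser = configparser.ConfigParser()
parser.read("config.ini")
db = parser["database"]

conn = pymysql.connect(db["host"], db["user"], db["password"], db["name"],
                       use_unicode=True, charset="utf8")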

NoAuthHandlerFound AWS boto ec2 - With env variables set

I am getting what seems to be quite a common error for people starting out with AWS, Python, and boto.
NoAuthHandlerFound: No handler was ready to authenticate. 1 handlers were checked. ['HmacAuthV4Handler'] Check your credentials
I have tried this and this but still get the error.
I know the credentials work and are correct because I have used them to test previous things such as an RDS connection.
The script for RDS is the following:
import boto.rds as rds
import boto3 as b3
import boto
from sqlalchemy import create_engine
conn = boto.rds.connect_to_region("us-west-2",aws_access_key_id='<ID>',aws_secret_access_key='<KEY>')
engine = create_engine('postgresql://my_id:my_pass#datawarehouse.stuff.us-west-2.rds.amazonaws.com/db_name', echo=False)
res = engine.execute("select * from table")
print res,engine
Which runs without error.
Is there anything I am missing in terms of VPC? Access rights?
It's driving me nuts!
I have BOTO_CONFIG set to C:/Users/%USER%/boto.config at the user level (not system level).
and C:/Users/%USER%/boto.config reads as:
[default]
aws_access_key_id = <MY_ID>
aws_secret_access_key = <MY_SECRET>
print boto.__version__
yields:
2.40.0
Thanks for any help.
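For what it's worth, the same explicit-credentials pattern used in the working RDS snippet can also be tried against EC2 in boto 2, which sidesteps the BOTO_CONFIG lookup entirely. A minimal sketch (region and placeholder keys are assumptions):
import boto.ec2

# Pass the credentials explicitly, as in the RDS snippet above, so boto does not
# need to resolve them from boto.config or environment variables.
conn = boto.ec2.connect_to_region(
    "us-west-2",
    aws_access_key_id="<ID>",         # placeholder, as in the question
    aws_secret_access_key="<KEY>",    # placeholder, as in the question
)
print(conn.get_all_reservations())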
