Airflow conn_id missing - python

I'm running a containerized Airflow project which loads API data to Azure Blob Storage or Data Lake. I'm currently having trouble getting Airflow to identify my connections. I've tried several methods to resolve the issue but I still haven't made progress in fixing this problem.
I've tried manually adding the connection in the Airflow UI, entering:
conn_id="azure_data_lake",
conn_type="Azure Blob Storage",
host="",
login=StorageAccountName,
password=StorageAccountKey,
port=""
However, once I run the DAG I get the error below. I've tried running airflow db reset and airflow db init.
File "/opt/airflow/plugins/operators/coinmarketcap_toAzureDataLake.py", line 60, in upload_to_azureLake
wasb_hook = WasbHook(self.azure_conn_id)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/microsoft/azure/hooks/wasb.py", line 65, in __init__
self.connection = self.get_conn()
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/microsoft/azure/hooks/wasb.py", line 71, in get_conn
return BlockBlobService(account_name=conn.login, account_key=conn.password, **service_options)
File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/orm/attributes.py", line 365, in __get__
retval = self.descriptor.__get__(instance, owner)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/connection.py", line 213, in get_password
return fernet.decrypt(bytes(self._password, 'utf-8')).decode()
File "/home/airflow/.local/lib/python3.8/site-packages/cryptography/fernet.py", line 194, in decrypt
raise InvalidToken
cryptography.fernet.InvalidToken
If I instead add the connection programmatically via a Python script, running the DAG gives me a missing conn_id error. But surprisingly, when I run the airflow connections list command I see the conn_id in the db.
from airflow import settings
from airflow.models import Connection

conn = Connection(
    conn_id="azure_data_lake",
    conn_type="Azure Blob Storage",
    host="",
    login=StorageAccountName,
    password=StorageAccountKey,
    port=""
)  # create a connection object
session = settings.Session()  # get the session
session.add(conn)
session.commit()

In your case, the problem is the token error cryptography.fernet.InvalidToken.
Airflow uses Fernet to encrypt all connection passwords in its backend database.
Your Airflow backend is still using the previous Fernet key, but the connection you created was encrypted with a newly generated key, so the stored password can no longer be decrypted.
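As a standalone illustration (using only the cryptography package, nothing Airflow-specific), a token encrypted with one Fernet key cannot be decrypted with a different key, which is exactly the InvalidToken you are seeing:
from cryptography.fernet import Fernet, InvalidToken

old_key = Fernet.generate_key()   # key used when the password was encrypted
new_key = Fernet.generate_key()   # key the backend is now configured with

token = Fernet(old_key).encrypt(b"StorageAccountKey")

try:
    Fernet(new_key).decrypt(token)
except InvalidToken:
    print("InvalidToken: this key did not encrypt the stored password")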
My recommendation is to do the following first:
airflow db reset
(airflow resetdb on Airflow 1.x). This will delete all the existing records in your backend db.
Then,
airflow db init
(airflow initdb on Airflow 1.x). This will initialize the backend from fresh.
If the error still persists, change your fernet_key (airflow.cfg > fernet_key). Generate a new key:
$ python
>>> from cryptography.fernet import Fernet
>>> k = Fernet.generate_key()
>>> print(k.decode())
Z6BkzaWcF7r5cC-VMAumjpBpudSyjGskQ0ObquGJhG0=
Then edit $AIRFLOW_HOME/airflow.cfg and set fernet_key under [core] to the generated value.
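Since you are running Airflow in containers, you can also generate the key once and hand it to every Airflow container through the AIRFLOW__CORE__FERNET_KEY environment variable (Airflow reads config overrides named AIRFLOW__{SECTION}__{KEY}), so the webserver, scheduler and workers all share the same key. A minimal sketch:
# fernet_key.py: print a line to paste into the container environment
# (for example the environment: section of docker-compose.yaml).
from cryptography.fernet import Fernet

key = Fernet.generate_key().decode()
print(f"AIRFLOW__CORE__FERNET_KEY={key}")
Keep in mind that connections encrypted with the old key cannot be decrypted with the new one, so re-create them after switching keys.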

Related

Discord.py | mysqlx not working on Heroku

I'm trying to connect my DB for my Discord bot.
I decided to use the mysqlx module for Python, but when I run the code on Heroku it raises an error.
I'm using GitHub as the deployment method.
I've checked the module version.
It works locally.
I've checked the get_session definition locally.
The local get_session definition:
def get_session(*args, **kwargs):
    """Creates a Session instance using the provided connection data.

    Args:
        *args: Variable length argument list with the connection data used
            to connect to a MySQL server. It can be a dictionary or a
            connection string.
        **kwargs: Arbitrary keyword arguments with connection data used to
            connect to the database.

    Returns:
        mysqlx.Session: Session object.
    """
    settings = _get_connection_settings(*args, **kwargs)
    return Session(settings)
I have another app hosted on Heroku that uses the same module and deployment method and works perfectly. I can't understand why it doesn't work in this Heroku app.
Heroku error:
File "/app/bot.py", line 25, in <module>
session = mysqlx.get_session({
AttributeError: module 'mysqlx' has no attribute 'get_session'
The code:
import mysqlx

session = mysqlx.get_session({
    "host": "",
    "port": 3306,
    "user": "",
    "password": ""
})

Unable to locate credentials setting up airflow connection through env variables

So I am trying to set up an S3Hook in my Airflow DAG by setting the connection programmatically in my script, like so:
from airflow.hooks.S3_hook import S3Hook
from airflow.models import Connection
from airflow import settings

def s3_test_hook():
    conn = Connection(
        conn_id='aws-s3',
        conn_type='s3',
        extra={"aws_access_key_id": aws_key,
               "aws_secret_access_key": aws_secret},
    )
I can run the conn line with no problem, which tells me the connection can be made. aws_key and aws_secret are loaded in through dotenv from an .env file I have in my local directory.
However, when I run the next two lines in the function:
s3_hook = S3Hook(aws_conn_id='aws-s3')
find_bucket = s3_hook.check_for_bucket('nba-data')
to check for a bucket I know exists, I receive this error:
NoCredentialsError: Unable to locate credentials
Any thoughts on how to approach this?
Thanks!
In your code, you have created an Airflow Connection object, but this doesn't do anything by itself. When a hook is given a connection id, it will look up the given id in various locations (in this order):
Secrets backend (if configured)
Environment variable AIRFLOW_CONN_*
Airflow metastore
Your connection is currently only defined in code, but Airflow is unable to locate it in any of the three locations above.
The Airflow documentation provides some pointers for configuring an AWS connection: https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/connections/aws.html
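For example, a minimal sketch of the metastore option, assuming aws_key and aws_secret are already loaded from your .env file and that this runs once in a setup script rather than on every DAG run:
import json

from airflow import settings
from airflow.hooks.S3_hook import S3Hook
from airflow.models import Connection

# Persist the connection so hooks can resolve the conn_id from the metastore.
conn = Connection(
    conn_id='aws-s3',
    conn_type='s3',
    extra=json.dumps({"aws_access_key_id": aws_key,
                      "aws_secret_access_key": aws_secret}),
)
session = settings.Session()
session.add(conn)
session.commit()

s3_hook = S3Hook(aws_conn_id='aws-s3')
print(s3_hook.check_for_bucket('nba-data'))
Alternatively, the connection can be supplied through an AIRFLOW_CONN_* environment variable or the airflow connections add CLI command, which avoids writing to the metastore from code.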

Set IAM Policy works on local machine but not in GCE instance

The following lines from my Python app execute with no problems on my local machine.
import googleapiclient.discovery
project_id = 'some-project-id'
resource_manager = googleapiclient.discovery.build('cloudresourcemanager', 'v1')
iam_policy_request = resource_manager.projects().getIamPolicy(resource=project_id, body={})
iam_policy_response = iam_policy_request.execute(num_retries=3)
new_policy = dict()
new_policy['policy'] = iam_policy_response
del new_policy['policy']['version']
iam_policy_update_request = resource_manager.projects().setIamPolicy(resource=project_id, body=new_policy)
update_result = iam_policy_update_request.execute(num_retries=3)
When I run the app in a GCE instance, and more precisely from within a Docker container inside the GCE instance, I get the exception:
URL being requested: POST https://cloudresourcemanager.googleapis.com/v1/projects/some-project-id:setIamPolicy?alt=json
Traceback (most recent call last):
File "/env/lib/python3.5/site-packages/google/api_core/grpc_helpers.py", line 54, in error_remapped_callable
return callable_(*args, **kwargs)
File "/env/lib/python3.5/site-packages/grpc/_channel.py", line 487, in __call__
return _end_unary_response_blocking(state, call, False, deadline)
File "/env/lib/python3.5/site-packages/grpc/_channel.py", line 437, in _end_unary_response_blocking
raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.PERMISSION_DENIED, User not authorized to perform this action.)>
i.e. an authorization error. Oddly, when I open a Python terminal session inside the GCE instance and run the Python code line by line, I do not get the exception. It only throws the exception when the code is running as part of the app.
I am using a service account inside of the GCE instance, as opposed to my regular account on my local machine. But I don't think that is the problem since I am able to run the lines of code one by one inside of the instance while still relying on the service account roles.
I would like to be able to run the app without the exception within the Docker container inside of GCE. I feel like I'm missing something but can't figure out what the missing piece is.
Looking at your issue, it seems to be an authentication problem: your application is not properly authenticated.
1- First run this command; it will let your application temporarily use your own user credentials:
gcloud beta auth application-default login
The output should look like this:
Credentials saved to file: $SOME_PATH/application_default_credentials.json
2- Then set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the key file:
export GOOGLE_APPLICATION_CREDENTIALS=$SOME_PATH/application_default_credentials.json
Try to run your application after that.
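If you would rather keep using the instance's service account instead of user credentials, you can also pass explicit credentials to the client. A minimal sketch, where the key-file path and scope are placeholders and the service account still needs a role that allows resourcemanager.projects.setIamPolicy:
import googleapiclient.discovery
from google.oauth2 import service_account

# Placeholder path: mount the service-account key file into the container.
credentials = service_account.Credentials.from_service_account_file(
    '/path/to/service-account-key.json',
    scopes=['https://www.googleapis.com/auth/cloud-platform'],
)

resource_manager = googleapiclient.discovery.build(
    'cloudresourcemanager', 'v1', credentials=credentials
)
iam_policy = resource_manager.projects().getIamPolicy(
    resource='some-project-id', body={}
).execute(num_retries=3)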

Using Python SDK | Couchbase

I am new to this. I was trying to connect to a Couchbase bucket using Python, just following what's given in the documentation.
from couchbase.cluster import Cluster
from couchbase.cluster import PasswordAuthenticator
cluster = Cluster('couchbase://localhost')
authenticator = PasswordAuthenticator('username', 'password')
cluster.authenticate(authenticator)
bucket = cluster.open_bucket('bucket-name')
I replaced the IP, username, password and bucket name, but I am getting this error every time I try to run it:
File "get_data_couche.py", line 6, in <module>
bucket = cluster.open_bucket('jizo')
File "/usr/lib64/python2.7/site-packages/couchbase/cluster.py", line 100, in open_bucket
rv = self.bucket_class(str(connstr), **kwargs)
File "/usr/lib64/python2.7/site-packages/couchbase/bucket.py", line 252, in __init__
self._do_ctor_connect()
File "/usr/lib64/python2.7/site-packages/couchbase/bucket.py", line 261, in _do_ctor_connect
self._connect()
couchbase.exceptions._AuthError_0x2 (generated, catch AuthError): <RC=0x2[Authentication failed. You may have provided an invalid username/password combination], There was a problem while trying to send/receive your request over the network. This may be a result of a bad network or a misconfigured client or server, C Source=(src/bucket.c,793)>
Please help me with this; I'm not able to find much about it on the internet. Thanks in advance.

DNS query using Google App Engine socket

I'm trying to use the new socket support for Google App Engine in order to perform some DNS queries. I'm using dnspython to perform the query, and the code works fine outside GAE.
The code is the following:
class DnsQuery(webapp2.RequestHandler):
    def get(self):
        domain = self.request.get('domain')
        logging.info("Test Query for " + domain)
        answers = dns.resolver.query(domain, 'TXT', tcp=True)
        logging.info("DNS OK")
        for rdata in answers:
            rc = str(rdata.exchange).lower()
            logging.info("Record " + rc)
When I run in GAE I get the following error:
File "/base/data/home/apps/s~/one.366576281491296772/main.py", line 37, in post
return self.get()
File "/base/data/home/apps/s~/one.366576281491296772/main.py", line 41, in get
answers = dns.resolver.query(domain, 'TXT', tcp=True)
File "/base/data/home/apps/s~/one.366576281491296772/dns/resolver.py", line 976, in query
raise_on_no_answer, source_port)
File "/base/data/home/apps/s~/one.366576281491296772/dns/resolver.py", line 821, in query
timeout = self._compute_timeout(start)
File "/base/data/home/apps/s~/one.366576281491296772/dns/resolver.py", line 735, in _compute_timeout
raise Timeout
This is raised by dnspython when no answer is returned within the time limit. I've raised the time limit to 60 seconds, and DnsQuery runs as a task, but I'm still getting the same error.
Is there any limitation in Google App Engine socket implementation, which prevents the execution of DNS requests ?
This is a bug and will be fixed ASAP.
As a workaround, pass in the source='' argument to dns.resolver.query.
tcp=True is not necessary.
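A minimal sketch of that workaround, assuming domain comes from the request handler as in the question:
import logging

import dns.resolver

# Workaround: pass source='' so dnspython binds the socket explicitly,
# and rely on the default UDP transport (tcp=True is not needed).
answers = dns.resolver.query(domain, 'TXT', source='')
for rdata in answers:
    logging.info("Record " + rdata.to_text())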
No. There is no limit on UDP ports (only SMTP ports on TCP).
It is possible there is an issue with the socket service routing. Please file an issue with the app engine issue tracker. https://code.google.com/p/googleappengine/issues/list
dnspython is using socket. However, socket is only available in paid apps.
