Cannot access ADLS data, error 'response' [duplicate] - python

I'm looking at the Microsoft documentation here and here. I have created a Web App in Azure Active Directory to access the Data Lake Store.
From the Web App I have the Object ID, Application ID and Key.
Looking at the documentation, I see this:
adlCreds = lib.auth(tenant_id = 'FILL-IN-HERE', client_secret = 'FILL-IN-HERE', client_id = 'FILL-IN-HERE', resource = 'https://datalake.azure.net/')
How do I use it to authenticate my code and run operations on the Data Lake Store?
Here is my full test code:
## Use this for Azure AD authentication
from msrestazure.azure_active_directory import AADTokenCredentials
## Required for Azure Data Lake Store account management
from azure.mgmt.datalake.store import DataLakeStoreAccountManagementClient
from azure.mgmt.datalake.store.models import DataLakeStoreAccount
## Required for Azure Data Lake Store filesystem management
from azure.datalake.store import core, lib, multithread
# Common Azure imports
import adal
from azure.mgmt.resource.resources import ResourceManagementClient
from azure.mgmt.resource.resources.models import ResourceGroup
## Use these as needed for your application
import logging, getpass, pprint, uuid, time
## Declare variables
subscriptionId = 'FILL-IN-HERE'
adlsAccountName = 'FILL-IN-HERE'
tenant_id = 'FILL-IN-HERE'
client_secret = 'FILL-IN-HERE'
client_id = 'FILL-IN-HERE'
## adlCreds = lib.auth(tenant_id = 'FILL-IN-HERE', client_secret = 'FILL-IN-HERE', client_id = 'FILL-IN-HERE', resource = 'https://datalake.azure.net/')
from azure.common.credentials import ServicePrincipalCredentials
adlCreds = lib.auth(tenant_id, client_secret, client_id, resource = 'https://datalake.azure.net/')
## Create a filesystem client object
adlsFileSystemClient = core.AzureDLFileSystem(adlCreds, store_name=adlsAccountName)
## Create a directory
adlsFileSystemClient.mkdir('/mysampledirectory')
When I try to run the code I get this error:
[Running] python "c:....\dls.py"
Traceback (most recent call last):
File "c:....\dls.py", line 38, in <module>
adlCreds = lib.auth(tenant_id, client_secret, client_id, resource = 'https://datalake.azure.net/')
File "C:\Python36\lib\site-packages\azure\datalake\store\lib.py", line 130, in auth
password, client_id)
File "C:\Python36\lib\site-packages\adal\authentication_context.py", line 145, in acquire_token_with_username_password
return self._acquire_token(token_func)
File "C:\Python36\lib\site-packages\adal\authentication_context.py", line 109, in _acquire_token
return token_func(self)
File "C:\Python36\lib\site-packages\adal\authentication_context.py", line 143, in token_func
return token_request.get_token_with_username_password(username, password)
File "C:\Python36\lib\site-packages\adal\token_request.py", line 280, in get_token_with_username_password
self._user_realm.discover()
File "C:\Python36\lib\site-packages\adal\user_realm.py", line 152, in discover
raise AdalError(return_error_string, error_response)
adal.adal_error.AdalError: User Realm Discovery request returned http error: 404 and server response:
404 - File or directory not found.
Server Error
404 - File or directory not found.
The resource you are looking for might have been removed, had its name changed, or is temporarily unavailable.
[Done] exited with code=1 in 1.216 seconds

There are two different ways of authenticating. The first is interactive, which is suitable for end users and even works with multi-factor authentication.
Here is how you do it. You need an interactive session in order to log on.
from azure.datalake.store import core, lib, multithread
token = lib.auth()
The second method is to use a service principal identity in Azure Active Directory. A step-by-step tutorial for setting up an Azure AD application, retrieving the client ID and secret, and configuring access using the service principal is available here: https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-service-to-service-authenticate-using-active-directory#create-an-active-directory-application
from azure.common.credentials import ServicePrincipalCredentials
token = lib.auth(tenant_id = '<your azure tenant id>', client_secret = '<your client secret>', client_id = '<your client id>')
Here is a blog post that shows how to access it through pandas and Jupyter. It also has a step-by-step guide on how to get the authentication token. https://medium.com/azure-data-lake/using-jupyter-notebooks-and-pandas-with-azure-data-lake-store-48737fbad305
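A likely reason the call in the question fails while the keyword form above works: the traceback ends up in acquire_token_with_username_password, which suggests the second and third positional arguments of lib.auth are treated as a username and password. The sketch below uses a stand-in function (auth_stub is hypothetical, and its parameter order is an assumption inferred from that traceback, not copied from the library) to show how positional arguments can select the wrong login flow.

```python
# Stand-in with the parameter order ASSUMED for lib.auth (inferred from the
# traceback in the question, not copied from the library): the second and
# third positional slots are username and password.
def auth_stub(tenant_id=None, username=None, password=None,
              client_id=None, client_secret=None,
              resource="https://datalake.azure.net/"):
    """Report which login flow the given arguments would select."""
    if username is not None and password is not None:
        return "username/password"
    if client_id is not None and client_secret is not None:
        return "service principal"
    return "interactive"


# Positional call, as in the question: the secret lands in `username`
# and the client id lands in `password`.
positional = auth_stub("my-tenant", "my-secret", "my-client-id")

# Keyword call, as in the answer: each value reaches the intended parameter.
keyword = auth_stub(tenant_id="my-tenant",
                    client_secret="my-secret",
                    client_id="my-client-id")

print(positional)  # username/password
print(keyword)     # service principal
```

If that assumption holds for your installed version, passing tenant_id, client_secret and client_id by keyword, as in the second method above, should be enough to avoid the User Realm Discovery error.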


Unable to access Azure Vault from Azure Container Instance using System assigned identity

I am not able to access Key Vault from an Azure Container Instance deployed into a private network with a system-assigned managed identity.
My code works fine if I use a service principal to access the vault, by passing the environment variables to the container.
https://learn.microsoft.com/en-us/azure/developer/python/azure-sdk-authenticate?tabs=bash
My code:
import logging
import os
from azure.keyvault.secrets import SecretClient
from azure.identity import DefaultAzureCredential
keyVaultName = 'XXXXXXX'
KVUri = "https://" + keyVaultName + ".vault.azure.net"
credential = DefaultAzureCredential()
client = SecretClient(vault_url=KVUri, credential=credential)
def secretVal(name):
    # Look up the named secret in Key Vault and return its value
    logging.debug("Retrieving the secret from vault for %s", name)
    val = client.get_secret(name)
    return val.value
Error:
2020-05-21:02:09:37,349 INFO [_universal.py:412] Request URL: 'http://169.254.169.254/metadata/identity/oauth2/token'
2020-05-21:02:09:37,349 INFO [_universal.py:413] Request method: 'GET'
2020-05-21:02:09:37,349 INFO [_universal.py:414] Request headers:
2020-05-21:02:09:37,349 INFO [_universal.py:417] 'Metadata': 'REDACTED'
2020-05-21:02:09:37,349 INFO [_universal.py:417] 'User-Agent': 'azsdk-python-identity/1.3.1 Python/3.8.3 (Linux-4.15.0-1082-azure-x86_64-with-glibc2.2.5)'
2020-05-21:02:09:37,352 DEBUG [connectionpool.py:226] Starting new HTTP connection (1): 169.254.169.254:80
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/azure/identity/_credentials/default.py", line 105, in get_token
return super(DefaultAzureCredential, self).get_token(*scopes, **kwargs)
File "/usr/local/lib/python3.8/site-packages/azure/identity/_credentials/chained.py", line 71, in get_token
raise ClientAuthenticationError(message=error_message)
azure.core.exceptions.ClientAuthenticationError: No credential in this chain provided a token.
Attempted credentials:
EnvironmentCredential: Incomplete environment configuration. See https://aka.ms/python-sdk-identity#environment-variables for expected environment variables
ImdsCredential: IMDS endpoint unavailable
The issue seems to be similar to the one below.
https://github.com/Azure/azure-sdk-for-python/issues/8557
I tried delaying my code so the metadata service would be available, using the command line below while creating the instance, but it still doesn't work.
--command-line "/bin/bash -c 'sleep 90; /usr/local/bin/python xxxx.py'"
Unfortunately, Azure Container Instances does not support managed identities when the container group is created in a virtual network. See the limitations:
You can't use a managed identity in a container group deployed to a
virtual network.
ACI in a virtual network is currently in preview. All the limitations are shown here. So when the container group is in a VNet, use a service principal to authenticate. It works much like a managed identity; only the way the credentials are supplied differs.
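The log in the question shows EnvironmentCredential reporting an incomplete environment configuration, so when falling back to a service principal it can help to check the variables before DefaultAzureCredential is constructed. A minimal sketch, assuming a client-secret service principal; missing_service_principal_vars is a hypothetical helper name, while the three variable names are the ones EnvironmentCredential reads for that case.

```python
import os

# Variable names read by azure-identity's EnvironmentCredential for a
# service principal that authenticates with a client secret.
REQUIRED_VARS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def missing_service_principal_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_service_principal_vars()
    if missing:
        print("Set these before starting the container:", ", ".join(missing))
    else:
        print("Service principal variables are present.")
```

Running a check like this at container start makes a misconfigured deployment fail with a clear message instead of the longer "No credential in this chain provided a token" trace.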

Azure Data Factory Pipelines: Creating pipelines with Python: Authentication (via az cli)

I'm trying to create Azure Data Factory pipelines via Python, using the example provided by Microsoft here:
https://learn.microsoft.com/en-us/azure/data-factory/quickstart-create-data-factory-python
def main():
    # Azure subscription ID
    subscription_id = '<Specify your Azure Subscription ID>'
    # This program creates this resource group. If it's an existing resource group, comment out the code that creates the resource group
    rg_name = 'ADFTutorialResourceGroup'
    # The data factory name. It must be globally unique.
    df_name = '<Specify a name for the data factory. It must be globally unique>'
    # Specify your Active Directory client ID, client secret, and tenant ID
    credentials = ServicePrincipalCredentials(client_id='<Active Directory application/client ID>', secret='<client secret>', tenant='<Active Directory tenant ID>')
    resource_client = ResourceManagementClient(credentials, subscription_id)
    adf_client = DataFactoryManagementClient(credentials, subscription_id)
    rg_params = {'location':'eastus'}
    df_params = {'location':'eastus'}
However I cannot pass the credentials in as shown above since azure login is carried out as a separate step earlier in the pipeline, leaving me with an authenticated session to azure (no other credentials may be passed into this script).
Before I run the python code to create the pipeline, I do "az login" via a Jenkins deployment pipeline, which gets me an authenticated azurerm session. I should be able to re-use this session in the python script to get a data factory client, without authenticating again.
However, I'm unsure how to modify the client creation part of the code, as there do not seem to be any examples that make use of an already established azurerm session:
adf_client = DataFactoryManagementClient(credentials, subscription_id)
rg_params = {'location':'eastus'}
df_params = {'location':'eastus'}
# Create a data factory
df_resource = Factory(location='eastus')
df = adf_client.factories.create_or_update(rg_name, df_name, df_resource)
print_item(df)
while df.provisioning_state != 'Succeeded':
    df = adf_client.factories.get(rg_name, df_name)
    time.sleep(1)
Microsoft's authentication documentation suggests I can authenticate using a previously established session as follows:
from azure.common.client_factory import get_client_from_cli_profile
from azure.mgmt.compute import ComputeManagementClient
client = get_client_from_cli_profile(ComputeManagementClient)
( ref: https://learn.microsoft.com/en-us/python/azure/python-sdk-azure-authenticate?view=azure-python )
This works; however, creating the Azure Data Factory object fails with:
Traceback (most recent call last):
File "post-scripts/check-data-factory.py", line 72, in <module>
main()
File "post-scripts/check-data-factory.py", line 65, in main
df = adf_client.factories.create_or_update(rg_name, data_factory_name, df_resource)
AttributeError: 'ComputeManagementClient' object has no attribute 'factories'
So perhaps some extra steps are required between this and getting a df object?
Any clue appreciated!
Just replace the class with the correct type:
from azure.common.client_factory import get_client_from_cli_profile
from azure.mgmt.resource import ResourceManagementClient
from azure.mgmt.datafactory import DataFactoryManagementClient
resource_client = get_client_from_cli_profile(ResourceManagementClient)
adf_client = get_client_from_cli_profile(DataFactoryManagementClient)
You got the error because you created a Compute client (which handles VMs), not an ADF client. But yes, you found the right doc for your needs :)
(disclosure: I work at MS in the Python SDK team)
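A side note on the polling loop in the question: as written it never exits if provisioning does not reach 'Succeeded'. Here is a rough sketch of a bounded version. wait_for_state is a hypothetical helper, and the 'Failed' state name and the timeout values are assumptions, not taken from the Data Factory docs.

```python
import time


def wait_for_state(get_state, wanted="Succeeded", failed=("Failed",),
                   timeout_s=300.0, interval_s=1.0):
    """Poll get_state() until it returns `wanted`, fails, or times out."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state == wanted:
            return state
        if state in failed:
            raise RuntimeError("provisioning ended in state %r" % state)
        if time.monotonic() >= deadline:
            raise TimeoutError("still %r after %.0f s" % (state, timeout_s))
        time.sleep(interval_s)
```

With the client from this answer it could be called as wait_for_state(lambda: adf_client.factories.get(rg_name, df_name).provisioning_state), so a stuck deployment fails the Jenkins step instead of hanging it.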

Verifying firebase auth token permission denied

Following the directions in the google docs for using firebase for auth in GAE, I am sending an authorization token from Android to my backend python server. Reading that token using the following code:
import google.auth.transport.requests
import google.oauth2.id_token
HTTP_REQUEST = google.auth.transport.requests.Request()
id_token = headers['authorization'].split(' ').pop()
user_info = google.oauth2.id_token.verify_firebase_token(
    id_token, HTTP_REQUEST)
results in the following stack trace:
File "/Users/alex/projects/don/don_server/mobile/main.py", line 61, in get_video
user_id = get_user_id(self.request_state.headers)
File "/Users/alex/projects/don/don_server/mobile/main.py", line 37, in get_user_id
id_token, HTTP_REQUEST)
File "/Users/alex/projects/don/don_server/mobile/lib/google/oauth2/id_token.py", line 115, in verify_firebase_token
id_token, request, audience=audience, certs_url=_GOOGLE_APIS_CERTS_URL)
File "/Users/alex/projects/don/don_server/mobile/lib/google/oauth2/id_token.py", line 76, in verify_token
certs = _fetch_certs(request, certs_url)
File "/Users/alex/projects/don/don_server/mobile/lib/google/oauth2/id_token.py", line 50, in _fetch_certs
response = request(certs_url, method='GET')
File "/Users/alex/projects/don/don_server/mobile/lib/google/auth/transport/requests.py", line 111, in __call__
raise exceptions.TransportError(exc)
TransportError: ('Connection aborted.', error(13, 'Permission denied'))
I've double checked my firebase project settings and localhost is listed as an authorized domain in the authentication sign-in section (I'm running this on the GAE local dev server).
As far as I can recall this was working a couple weeks ago. Any ideas?
UPDATE:
I implemented the same authentication using a service account as recommended in the firebase docs but am getting the same error message:
import os
from firebase_admin import auth, credentials
import firebase_admin
fpath = os.path.join(os.path.dirname(__file__), 'shared', 'firebase-admin-private-key.json')
cred = credentials.Certificate(fpath)
firebase_admin.initialize_app(cred)
Then, to process an incoming token:
id_token = headers['authorization'].split(' ').pop()
user_info = auth.verify_id_token(id_token)
At some point I upgraded my requests library. Because requests doesn't play well with GAE, the calls to the firebase server failed. By downgrading to version 2.3.0 this now works.
pip install -t lib requests==2.3.0
Alternatively monkeypatching requests as suggested in this answer works as well!
import requests_toolbelt.adapters.appengine
requests_toolbelt.adapters.appengine.monkeypatch()
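Separately from the transport problem, the header parsing in the question (headers['authorization'].split(' ').pop()) accepts any header shape and raises a bare KeyError when the header is absent. A small sketch of a stricter version follows; extract_bearer_token is a hypothetical helper, and it assumes the client sends the token as "Authorization: Bearer <token>".

```python
def extract_bearer_token(headers):
    """Return the token from an 'Authorization: Bearer <token>' header.

    Header names are matched case-insensitively. Raises ValueError when the
    header is missing or is not a well-formed Bearer credential.
    """
    value = None
    for name, candidate in headers.items():
        if name.lower() == "authorization":
            value = candidate
            break
    if value is None:
        raise ValueError("missing Authorization header")

    parts = value.split()
    if len(parts) != 2 or parts[0].lower() != "bearer":
        raise ValueError("expected 'Bearer <token>'")
    return parts[1]
```

The returned string can then be passed to auth.verify_id_token as in the update above.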

Google Api Auth Http Module Error

I'm working on my first app ever to use Google Api for Calendar. I've read the Google examples at: https://developers.google.com/google-apps/calendar/instantiate
The first time I ran the program below it was successful. I allowed my app to access my Google account and the application made a calendar.dat file with the auth info in my app directory. After I renamed the file the code was in, the auth stopped working. I have already deleted the file entirely and recreated it from scratch, but the error persists.
I do still get the Google authentication page and can still confirm access, after which I get a message that the authentication flow was completed.
This is the code (standard Google example which I fill in with my app details):
import gflags
import httplib2
from apiclient.discovery import build
from oauth2client.file import Storage
from oauth2client.client import OAuth2WebServerFlow
from oauth2client.tools import run
FLAGS = gflags.FLAGS
# Set up a Flow object to be used if we need to authenticate. This
# sample uses OAuth 2.0, and we set up the OAuth2WebServerFlow with
# the information it needs to authenticate. Note that it is called
# the Web Server Flow, but it can also handle the flow for native
# applications
# The client_id and client_secret are copied from the API Access tab on
# the Google APIs Console
FLOW = OAuth2WebServerFlow(
    client_id='YOUR_CLIENT_ID',
    client_secret='YOUR_CLIENT_SECRET',
    scope='https://www.googleapis.com/auth/calendar',
    user_agent='YOUR_APPLICATION_NAME/YOUR_APPLICATION_VERSION')
# To disable the local server feature, uncomment the following line:
# FLAGS.auth_local_webserver = False
# If the Credentials don't exist or are invalid, run through the native client
# flow. The Storage object will ensure that if successful the good
# Credentials will get written back to a file.
storage = Storage('calendar.dat')
credentials = storage.get()
if credentials is None or credentials.invalid == True:
    credentials = run(FLOW, storage)
# Create an httplib2.Http object to handle our HTTP requests and authorize it
# with our good Credentials.
http = httplib2.Http()
http = credentials.authorize(http)
# Build a service object for interacting with the API. Visit
# the Google APIs Console
# to get a developerKey for your own application.
service = build(serviceName='calendar', version='v3', http=http,
    developerKey='YOUR_DEVELOPER_KEY')
And this is the output:
Your browser has been opened to visit:
https://accounts.google.com/o/oauth2/auth? (auth url shortened)
If your browser is on a different machine then exit and re-run this
application with the command-line parameter
--noauth_local_webserver
Traceback (most recent call last):
File "C:\Users\Desktop\Google Drive\Code\Python\Rooster\calendar.py", line 2, in <module>
import httplib2
File "C:\Python27\lib\site-packages\httplib2-0.7.6-py2.7.egg\httplib2\__init__.py", line 42, in <module>
import calendar
File "C:\Users\Desktop\Google Drive\Code\Python\Rooster\calendar.py", line 33, in <module>
credentials = run(FLOW, storage)
File "C:\Python27\lib\site-packages\google_api_python_client-1.0-py2.7.egg\oauth2client\util.py", line 120, in positional_wrapper
return wrapped(*args, **kwargs)
File "C:\Python27\lib\site-packages\google_api_python_client-1.0-py2.7.egg\oauth2client\tools.py", line 169, in run
credential = flow.step2_exchange(code, http=http)
File "C:\Python27\lib\site-packages\google_api_python_client-1.0-py2.7.egg\oauth2client\util.py", line 120, in positional_wrapper
return wrapped(*args, **kwargs)
File "C:\Python27\lib\site-packages\google_api_python_client-1.0-py2.7.egg\oauth2client\client.py", line 1128, in step2_exchange
http = httplib2.Http()
AttributeError: 'module' object has no attribute 'Http'
The problem is that your run directory contains a file named calendar.py. When httplib2 imports the standard calendar module, it gets your local file instead. Python executes that local file to perform the import, but because httplib2 is not yet fully imported at that point, httplib2.Http does not exist yet and the code in calendar.py fails.
Just rename calendar.py to something like myCalendar.py.
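To catch this kind of name clash early, a script directory can be scanned for files that share a name with a standard-library module. A minimal sketch in Python 3 (the question uses Python 2.7); find_shadowing_files is a hypothetical helper, and the fallback name set used on interpreters older than 3.10 is only a small sample.

```python
import os
import sys

# sys.stdlib_module_names exists on Python 3.10+; the fallback set is a small
# hand-picked sample, not a complete list.
STDLIB_NAMES = getattr(sys, "stdlib_module_names",
                       frozenset({"calendar", "email", "json", "random",
                                  "string", "time"}))


def find_shadowing_files(directory):
    """List .py files in `directory` named after a standard-library module."""
    hits = []
    for entry in sorted(os.listdir(directory)):
        stem, ext = os.path.splitext(entry)
        if ext == ".py" and stem in STDLIB_NAMES:
            hits.append(entry)
    return hits
```

For the layout in the question, find_shadowing_files on the Rooster directory would be expected to report calendar.py.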
