I am new to Python, but I need to call the Power BI REST API from Python to publish a pbix file from my repo to a workspace.
Based on this document, I could successfully authenticate and get the workspace:
import json, requests, pandas as pd
try:
    from azure.identity import ClientSecretCredential
except Exception:
    !pip install azure.identity
    from azure.identity import ClientSecretCredential
# --------------------------------------------------------------------------------------#
# String variables: Replace with your own
tenant = 'Your-Tenant-ID'
client = 'Your-App-Client-ID'
client_secret = 'Your-Client-Secret-Value' # See Note 2: Better to use key vault
api = 'https://analysis.windows.net/powerbi/api/.default'
# --------------------------------------------------------------------------------------#
# Generates the access token for the Service Principal
auth = ClientSecretCredential(authority = 'https://login.microsoftonline.com/',
                              tenant_id = tenant,
                              client_id = client,
                              client_secret = client_secret)
access_token = auth.get_token(api)
access_token = access_token.token
print('\nSuccessfully authenticated.')
But I do not know how to publish my pbix to one of my workspaces through the REST API with Python, or which parameter to pass so that the file is overwritten if the pbix already exists in the workspace.
Any advice would be greatly appreciated, and a sample would be great.
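One way this could look, as a rough sketch rather than a tested implementation: the Power BI REST API has a "Post Import In Group" endpoint, and setting its nameConflict query parameter to CreateOrOverwrite asks the service to replace an existing pbix with the same display name. The helper names build_import_url and publish_pbix, the workspace ID, and the file path are hypothetical placeholders; access_token is assumed to be the token obtained in the question.

```python
from urllib.parse import quote


def build_import_url(group_id, display_name, name_conflict="CreateOrOverwrite"):
    """Build the 'Post Import In Group' URL for a workspace (group)."""
    return (
        f"https://api.powerbi.com/v1.0/myorg/groups/{group_id}/imports"
        f"?datasetDisplayName={quote(display_name)}&nameConflict={name_conflict}"
    )


def publish_pbix(access_token, group_id, pbix_path, display_name):
    """Upload a pbix as multipart/form-data and return the parsed JSON response."""
    import requests  # imported here so build_import_url works without requests

    url = build_import_url(group_id, display_name)
    headers = {"Authorization": f"Bearer {access_token}"}
    with open(pbix_path, "rb") as pbix_file:
        # requests builds the multipart/form-data body from the files argument
        response = requests.post(url, headers=headers, files={"file": pbix_file})
    response.raise_for_status()
    return response.json()


# Hypothetical usage (placeholders, not real IDs):
# result = publish_pbix(access_token, "<workspace-id>", "report.pbix", "report.pbix")
```

The display name should include the .pbix extension. The service principal presumably also needs access to the target workspace, which this sketch does not check.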
I am having a lot of difficulty writing an API response as JSON to a blob within an Azure Storage container. I have tried multiple solutions online but have not managed to see any through to success. I would like to share two attempts I have made, and hopefully someone out there can help me get at least one of them working.
Attempt/Method 1
I have tried to use a Service Principal to authenticate my BlobServiceClient from azure-storage-blob. My service principal has been assigned the Storage Blob Data Contributor role on the container in which I am trying to create the blob. However, on execution of the script I receive an error along the lines of "Unsupported Credential".
My script and resulting error are:
import azure.functions as func
import requests
import json
import uuid
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
from msrestazure.azure_active_directory import ServicePrincipalCredentials
from azure.storage.common import TokenCredential
# Initialise parameters to obtain data from Rest API
url = "https://api.powerbi.com/v1.0/myorg/admin/groups?$top=1000&$expand=datasets,dataflows,reports,users,dashboards"
headers = {'Authorization': get_access_token()}
# Get response. I want to save the response output to a blob.
response = requests.get(url, headers=headers)
response = response.json()
# Initialise parameters for credentials
CLIENT = "bxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx7" # Azure App/Service Principal ID
KEY = "Gxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx1" # Azure App/Service Principal Key
TENANT_ID = "cxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx7" # Tenant where Storage Account is which is different to the Tenant the App resides
RESOURCE = f"https://storageaccountxxxxxxxxx.blob.core.windows.net"
# Create credentials & token
credentials = ServicePrincipalCredentials(
    client_id = CLIENT,
    secret = KEY,
    #tenant = TENANT_ID,
    resource = RESOURCE
)
tokenCre = TokenCredential(credentials.token["access_token"])
# Initialise parameters for BlobServiceClient
ACCOUNT_URL = "https://storageaccountxxxxxxxxx.blob.core.windows.net/pbiactivity" # includes container name at end of url
#Create BlobServiceClient
blobService = BlobServiceClient(account_url = ACCOUNT_URL, token_credential=tokenCre)
#Create blobClient
blobClient = BlobClient(account_url = RESOURCE,container_name=CONTAINER_NAME, blob_name="response.json", credential = tokenCre )
#Upload response json as blob
blobClient.upload_blob(response, blob_type = "BlockBlob")
[Linked screenshot: the error that comes after the upload_blob method call]
Attempt/Method 2
In my second attempt I tried to create my BlobServiceClient from azure-storage-blob using my storage account connection string. This method actually allows me to create containers; however, when I try to upload a blob as in the script below, I get a 403 Forbidden response.
My script and resulting error are:
import requests
import json
import uuid
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
# Initialise parameters to obtain data from Rest API
url = "https://api.powerbi.com/v1.0/myorg/admin/groups?$top=1000&$expand=datasets,dataflows,reports,users,dashboards"
headers = {'Authorization': get_access_token()}
# Get response. I want to save the response output to a blob.
response = requests.get(url, headers=headers)
response = response.json()
# Initialise parameters
CONNECTION_STRING = "DefaultEndpointsProtocol=https;AccountName=storageaccountxxxxxxxxx;AccountKey=rxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxQ==;EndpointSuffix=core.windows.net"
# Create blobServiceClient from connection string
blobServiceClient = BlobServiceClient.from_connection_string(conn_str=CONNECTION_STRING)
#Create blobClient
blobClient = blobServiceClient.get_blob_client(container = "pbiactivity", blob = "response.json")
#Upload response json to blob
blobClient.upload_blob(response, blob_type = "BlockBlob")
[Linked screenshot: the errors that come after the upload_blob method call]
Here is one of the workarounds that worked for me:
import os
import logging
import requests
from azure.storage.blob import BlobServiceClient, BlobClient
# Initialise parameters
url = "<YourURL>"
headers = {'Authorization': get_access_token()}
# Get response
response = requests.get(url, headers=headers)
response = response.json()
connectionString = "<Your_Connection_String>"
containerName = "<Name_of_your_container>"
blobServiceClient = BlobServiceClient.from_connection_string(connectionString)
blobContainerClient = blobServiceClient.get_container_client(containerName)
# To create the container (if it has already been created you can ignore this)
#blobContainerClient.create_container()
# Create blobClient
blobClient = blobServiceClient.get_blob_client(container = "<Name_of_your_container>", blob = "response.json")
# Upload a local file named "response"; this assumes the API response was saved to disk beforehand
with open("response", "rb") as blob_file:
    blobClient.upload_blob(data=blob_file)
Result in my Storage Account: [screenshot]
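Both attempts in the question pass the dict returned by response.json() straight to upload_blob, which expects bytes, a string, or a stream. A minimal sketch of uploading the response from memory instead of going through a local file is shown below. The helper names to_json_bytes and upload_json are hypothetical, and blob_client is assumed to be a BlobClient such as the one returned by get_blob_client above. This does not claim to explain the 403 from Method 2.

```python
import json


def to_json_bytes(payload):
    """Serialize a JSON-compatible object (e.g. response.json()) to UTF-8 bytes."""
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


def upload_json(blob_client, payload):
    """Upload payload as a blob, replacing any existing blob of the same name."""
    # upload_blob takes bytes/str/streams rather than a dict, hence the serialization
    blob_client.upload_blob(to_json_bytes(payload), overwrite=True)


# Hypothetical usage with the objects from the answer above:
# upload_json(blobClient, response)
```

Passing overwrite=True lets the script be re-run without failing because response.json already exists.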
from azure.storage.blob import BlockBlobService
block_blob_service = BlockBlobService(account_name=AZURE_ACCOUNT_NAME, account_key=AZURE_ACCOUNT_KEY)
file = block_blob_service.get_blob_to_bytes(AZURE_CONTAINER, "CS_MDMM_Global.xlsx")
file.content  # the issue is at this line: it gives me the data in some encoded form; I want to decode it and store it in a pandas DataFrame
I'm getting encoded data from the blob, but I can't figure out how to decode it into a pandas DataFrame.
It sounds like you want to read the content of an xlsx blob stored in Azure Blob Storage with pandas to get a pandas DataFrame.
I have a sample xlsx file stored in my Azure Blob Storage; its content is shown in the figure below.
I will read it directly with the Azure Storage SDK for Python and pandas. The first step is to install the packages below.
pip install pandas azure-storage xlrd
Here is my sample code.
# Generate a url of excel blob with sas token
from azure.storage.blob.baseblobservice import BaseBlobService
from azure.storage.blob import BlobPermissions
from datetime import datetime, timedelta
account_name = '<your storage account name>'
account_key = '<your storage key>'
container_name = '<your container name>'
blob_name = '<your excel blob>'
blob_service = BaseBlobService(
    account_name=account_name,
    account_key=account_key
)
sas_token = blob_service.generate_blob_shared_access_signature(container_name, blob_name, permission=BlobPermissions.READ, expiry=datetime.utcnow() + timedelta(hours=1))
blob_url_with_sas = blob_service.make_blob_url(container_name, blob_name, sas_token=sas_token)
# pass the blob url with sas to function `read_excel`
import pandas as pd
df = pd.read_excel(blob_url_with_sas)
print(df)
And the result is:
Your question duplicates another SO thread, Read in azure blob using python; for more details, please refer to my answer there.
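If the blob's bytes are already in memory, as with file.content in the question, another option is to wrap them in a file-like object and hand that to pandas, with no SAS URL involved. The following is a small sketch under that assumption. The helper names are hypothetical, and pandas needs an Excel engine such as openpyxl installed to parse xlsx files.

```python
import io


def as_file_like(content):
    """Wrap raw bytes in a seekable, file-like buffer."""
    return io.BytesIO(content)


def excel_bytes_to_dataframe(content, **read_excel_kwargs):
    """Parse the raw bytes of an .xlsx file into a pandas DataFrame."""
    import pandas as pd  # imported here so as_file_like works without pandas

    return pd.read_excel(as_file_like(content), **read_excel_kwargs)


# Hypothetical usage with the blob object from the question:
# df = excel_bytes_to_dataframe(file.content)
```

The bytes are not really "encoded": an xlsx file is a zip archive, so its raw content is binary and has to be parsed rather than decoded as text.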
I am trying to download an Azure Blob Storage file from my storage account, to do so, I have checked what the URL is and I am doing the following:
with urllib.request.urlopen("<url_file>") as resp:
    img = np.asarray(bytearray(resp.read()), dtype="uint8")
But I am getting the following error:
urllib.error.HTTPError: HTTP Error 404: The specified resource does not exist.
I have double-checked that the URL is correct. Could this have something to do with not having passed the keys of my subscription or any other info about the Storage Account?
Any idea?
As of Dec 26, 2019, I am unable to import BaseBlobService from the Azure Storage SDK, and neither BlobPermissions nor generate_blob_shared_access_signature worked for me. Below is what I used instead; it worked in my case, and I hope it helps.
from datetime import datetime, timedelta
from azure.storage.blob import generate_blob_sas, AccountSasPermissions
def scan_product():
    account_name = '<account_name>'
    container_name = '<container_name>'
    blob_name = '<blob_name>'
    account_key = '<account_key>'
    url = f"https://{account_name}.blob.core.windows.net/{container_name}/{blob_name}"
    sas_token = generate_blob_sas(
        account_name=account_name,
        account_key=account_key,
        container_name=container_name,
        blob_name=blob_name,
        permission=AccountSasPermissions(read=True),
        expiry=datetime.utcnow() + timedelta(hours=1)
    )
    url_with_sas = f"{url}?{sas_token}"
    return url_with_sas
You can generate a blob URL with a SAS token using the Azure Storage SDK for Python and access the blob directly through it, as in my sample code below.
from azure.storage.blob.baseblobservice import BaseBlobService
from azure.storage.blob import BlobPermissions
from datetime import datetime, timedelta
account_name = '<account name>'
account_key = '<account key>'
container_name = '<container name>'
blob_name = '<blob name>'
url = f"https://{account_name}.blob.core.windows.net/{container_name}/{blob_name}"
service = BaseBlobService(account_name=account_name, account_key=account_key)
token = service.generate_blob_shared_access_signature(container_name, blob_name, permission=BlobPermissions.READ, expiry=datetime.utcnow() + timedelta(hours=1),)
url_with_sas = f"{url}?{token}"
Then,
import urllib.request
import numpy as np
req = urllib.request.urlopen(url_with_sas)
img = np.asarray(bytearray(req.read()), dtype=np.uint8)
To download directly from a URL, the blob must either be in a public container or, if the container is private, you must generate a SAS token for the blob (the URL then looks like: https://xxx.blob.core.windows.net/aa1/0116.txt?sp=r&st=2019-06-26T09:47:04Z&se=2019-06-26xxxxx).
I tested your code with a URL that contains a SAS token, and the file downloaded successfully.
Test result: [screenshot]
How to generate a SAS token for a blob: [screenshot]
To solve the issue all I needed to do was to change the Blob Storage access level to Blob (anonymous read access for blob only). Once this is done, it will work.
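When the container stays private, a quick sanity check before fetching is whether the URL actually carries a SAS signature. A SAS token always includes a sig query parameter, so its absence suggests the request would be anonymous. The sketch below uses only the standard library. The helper name has_sas_signature is hypothetical, and the check says nothing about whether the token is valid or unexpired.

```python
from urllib.parse import urlsplit, parse_qs


def has_sas_signature(url):
    """Return True if the URL's query string carries a SAS signature (sig=...)."""
    query = parse_qs(urlsplit(url).query)
    return "sig" in query


# Hypothetical usage:
# if not has_sas_signature(blob_url):
#     print("No SAS token in URL; a private container will reject this request")
```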
I am trying to download files from google drive and all I have is the drive's URL.
I have read about the Google API, which involves a drive_service and MediaIO and also requires credentials (mainly a JSON file/OAuth), but I am unable to understand how it works.
I also tried urllib2.urlretrieve, but my case is to get files from the drive. I tried wget too, but to no avail.
I tried the PyDrive library. It has good functions for uploading to Drive, but no download options.
Any help will be appreciated.
Thanks.
If by "drive's url" you mean the shareable link of a file on Google Drive, then the following might help:
import requests
def download_file_from_google_drive(id, destination):
    URL = "https://docs.google.com/uc?export=download"
    session = requests.Session()
    response = session.get(URL, params = { 'id' : id }, stream = True)
    token = get_confirm_token(response)
    if token:
        params = { 'id' : id, 'confirm' : token }
        response = session.get(URL, params = params, stream = True)
    save_response_content(response, destination)
def get_confirm_token(response):
    for key, value in response.cookies.items():
        if key.startswith('download_warning'):
            return value
    return None
def save_response_content(response, destination):
    CHUNK_SIZE = 32768
    with open(destination, "wb") as f:
        for chunk in response.iter_content(CHUNK_SIZE):
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)
if __name__ == "__main__":
    file_id = 'TAKE ID FROM SHAREABLE LINK'
    destination = 'DESTINATION FILE ON YOUR DISK'
    download_file_from_google_drive(file_id, destination)
The snippet does not use pydrive or the Google Drive SDK. It uses the requests module (which is, in a way, an alternative to urllib2).
When downloading large files from Google Drive, a single GET request is not sufficient. A second one is needed - see wget/curl large file from google drive.
I recommend the gdown package.
pip install gdown
Take your share link
https://drive.google.com/file/d/0B9P1L--7Wd2vNm9zMTJWOGxobkU/view?usp=sharing
and grab the ID (e.g. 1TLNdIufzwesDbyr_nVTR7Zrx9oRHLM_N); you can also find it in the link shown when you press the download button. Then put it after id= in the URL below.
import gdown
url = 'https://drive.google.com/uc?id=0B9P1L--7Wd2vNm9zMTJWOGxobkU'
output = '20150428_collected_images.tgz'
gdown.download(url, output, quiet=False)
Having had similar needs many times, I made an extra-simple class, GoogleDriveDownloader, starting from the snippet by @user115202 above. You can find the source code here.
You can also install it through pip:
pip install googledrivedownloader
Then usage is as simple as:
from google_drive_downloader import GoogleDriveDownloader as gdd
gdd.download_file_from_google_drive(file_id='1iytA1n2z4go3uVCwE__vIKouTKyIDjEq',
                                    dest_path='./data/mnist.zip',
                                    unzip=True)
This snippet will download an archive shared in Google Drive. In this case 1iytA1n2z4go3uVCwE__vIKouTKyIDjEq is the id of the sharable link got from Google Drive.
Here's an easy way to do it with no third-party libraries and a service account.
pip install google-api-core google-api-python-client
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload
from google.oauth2 import service_account
import io
credz = {} # put the JSON credentials here, from a service account or the like
# More info: https://cloud.google.com/docs/authentication
credentials = service_account.Credentials.from_service_account_info(credz)
drive_service = build('drive', 'v3', credentials=credentials)
file_id = '0BwwA4oUTeiV1UVNwOHItT0xfa2M'
request = drive_service.files().get_media(fileId=file_id)
#fh = io.BytesIO() # this can be used to keep in memory
fh = io.FileIO('file.tar.gz', 'wb') # this can be used to write to disk
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
    status, done = downloader.next_chunk()
    print("Download %d%%." % int(status.progress() * 100))
PyDrive allows you to download a file with the function GetContentFile(). You can find the function's documentation here.
See example below:
# Initialize GoogleDriveFile instance with file id.
file_obj = drive.CreateFile({'id': '<your file ID here>'})
file_obj.GetContentFile('cats.png') # Download file as 'cats.png'.
This code assumes that you have an authenticated drive object, the docs on this can be found here and here.
In the general case this is done like so:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
gauth = GoogleAuth()
# Create local webserver which automatically handles authentication.
gauth.LocalWebserverAuth()
# Create GoogleDrive instance with authenticated GoogleAuth instance.
drive = GoogleDrive(gauth)
Info on silent authentication on a server can be found here and involves writing a settings.yaml (example: here) in which you save the authentication details.
The docs include a function that downloads a file given the ID of the file to download:
from __future__ import print_function
import io
import google.auth
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from googleapiclient.http import MediaIoBaseDownload
def download_file(real_file_id):
    """Downloads a file
    Args:
        real_file_id: ID of the file to download
    Returns : the file content as bytes, or None if the download failed.
    Load pre-authorized user credentials from the environment.
    TODO(developer) - See https://developers.google.com/identity
    for guides on implementing OAuth2 for the application.
    """
    creds, _ = google.auth.default()
    try:
        # create drive api client
        service = build('drive', 'v3', credentials=creds)
        file_id = real_file_id
        # pylint: disable=maybe-no-member
        request = service.files().get_media(fileId=file_id)
        file = io.BytesIO()
        downloader = MediaIoBaseDownload(file, request)
        done = False
        while done is False:
            status, done = downloader.next_chunk()
            print(F'Download {int(status.progress() * 100)}.')
    except HttpError as error:
        print(F'An error occurred: {error}')
        # return early: calling getvalue() on None would raise AttributeError
        return None
    return file.getvalue()
if __name__ == '__main__':
    download_file(real_file_id='1KuPmvGq8yoYgbfW74OENMCB5H0n_2Jm9')
This raises the question:
How do we get the file ID to download the file?
Generally speaking, a URL from a shared file from Google Drive looks like this
https://drive.google.com/file/d/1HV6vf8pB-EYnjcJcH65eGZVMa2v2tcMh/view?usp=sharing
where 1HV6vf8pB-EYnjcJcH65eGZVMa2v2tcMh corresponds to fileID.
You can simply copy it from the URL or, if you prefer, it's also possible to create a function to get the fileID from the URL.
For instance, given the following url = https://drive.google.com/file/d/1HV6vf8pB-EYnjcJcH65eGZVMa2v2tcMh/view?usp=sharing,
def url_to_id(url):
    x = url.split("/")
    return x[5]
Printing x will give
['https:', '', 'drive.google.com', 'file', 'd', '1HV6vf8pB-EYnjcJcH65eGZVMa2v2tcMh', 'view?usp=sharing']
And so, as we want to return the 6th array value, we use x[5].
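Splitting on "/" and taking index 5 works for the /file/d/<id>/view form shown above, but share links also come in other shapes, such as ...uc?id=<id> or /spreadsheets/d/<id>/edit. A slightly more tolerant sketch that handles both the path form and the id= query form is given below. The function name extract_drive_file_id is hypothetical, and it only covers these two patterns.

```python
import re
from urllib.parse import urlsplit, parse_qs


def extract_drive_file_id(url):
    """Return the file ID from a Drive share link, or None if none is found."""
    parts = urlsplit(url)
    # Form 1: .../d/<id>/... (e.g. /file/d/<id>/view or /spreadsheets/d/<id>/edit)
    match = re.search(r"/d/([A-Za-z0-9_-]+)", parts.path)
    if match:
        return match.group(1)
    # Form 2: ...?id=<id> (e.g. /uc?id=<id>&export=download)
    ids = parse_qs(parts.query).get("id")
    return ids[0] if ids else None


# Hypothetical usage:
# file_id = extract_drive_file_id(share_link)
```

Returning None instead of raising lets the caller decide how to handle a link that matches neither pattern.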
This has also been described above:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
gauth = GoogleAuth()
gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)
This creates its own local web server to do the dirty work of authenticating.
file_obj = drive.CreateFile({'id': '<Put the file ID here>'})
file_obj.GetContentFile('Demo.txt')
This downloads the file.
import requests
def download_file_from_google_drive(id, destination):
    URL = "https://docs.google.com/uc?export=download"
    session = requests.Session()
    response = session.get(URL, params = { 'id' : id , 'confirm': 1 }, stream = True)
    token = get_confirm_token(response)
    if token:
        params = { 'id' : id, 'confirm' : token }
        response = session.get(URL, params = params, stream = True)
    save_response_content(response, destination)
def get_confirm_token(response):
    for key, value in response.cookies.items():
        if key.startswith('download_warning'):
            return value
    return None
def save_response_content(response, destination):
    CHUNK_SIZE = 32768
    with open(destination, "wb") as f:
        for chunk in response.iter_content(CHUNK_SIZE):
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)
if __name__ == "__main__":
    file_id = 'TAKE ID FROM SHAREABLE LINK'
    destination = 'DESTINATION FILE ON YOUR DISK'
    download_file_from_google_drive(file_id, destination)
This just repeats the accepted answer but adds the confirm=1 parameter so that it always downloads, even if the file is too big.
# Importing PyDrive OAuth
import zipfile
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
# logger and UNZIP_DIR are assumed to be defined elsewhere in the module
def download_tracking_file_by_id(file_id, download_dir):
    gauth = GoogleAuth(settings_file='../settings.yaml')
    # Try to load saved client credentials
    gauth.LoadCredentialsFile("../credentials.json")
    if gauth.credentials is None:
        # Authenticate if they're not there
        gauth.LocalWebserverAuth()
    elif gauth.access_token_expired:
        # Refresh them if expired
        gauth.Refresh()
    else:
        # Initialize the saved creds
        gauth.Authorize()
    # Save the current credentials to a file
    gauth.SaveCredentialsFile("../credentials.json")
    drive = GoogleDrive(gauth)
    logger.debug("Trying to download file_id " + str(file_id))
    file6 = drive.CreateFile({'id': file_id})
    archive_path = download_dir + 'mapmob.zip'
    file6.GetContentFile(archive_path)
    # Extract the archive that was just downloaded
    zipfile.ZipFile(archive_path).extractall(UNZIP_DIR)
    tracking_data_location = download_dir + 'test.json'
    return tracking_data_location
The above function downloads the file given the file_id to a specified downloads folder. Now the question remains, how to get the file_id? Simply split the url by id= to get the file_id.
file_id = url.split("id=")[1]
I tried this in Google Colaboratory: https://colab.research.google.com/
Suppose your shareable link is https://docs.google.com/spreadsheets/d/12hiI0NK7M0KEfscMfyBaLT9gxcZMleeu/edit?usp=sharing&ouid=102608702203033509854&rtpof=true&sd=true
All you need is the ID, which is 12hiI0NK7M0KEfscMfyBaLT9gxcZMleeu.
Run this command in a cell:
!gdown 12hiI0NK7M0KEfscMfyBaLT9gxcZMleeu
Run the cell and you will see that the file is downloaded to /content/Amazon_Reviews.xlsx
Note: this assumes you know how to use Google Colab.
This example is similar to RayB's, but keeps the file in memory
and is a little simpler; you can paste it into Colab and it works.
import googleapiclient.discovery
import oauth2client.client
from google.colab import auth
auth.authenticate_user()
def download_gdrive(id):
    creds = oauth2client.client.GoogleCredentials.get_application_default()
    service = googleapiclient.discovery.build('drive', 'v3', credentials=creds)
    return service.files().get_media(fileId=id).execute()
a = download_gdrive("1F-yaQB8fdsfsdafm2l8WFjhEiYSHZrCcr")
You can install https://pypi.org/project/googleDriveFileDownloader/
pip install googleDriveFileDownloader
Then download the file; here is some sample code:
from googleDriveFileDownloader import googleDriveFileDownloader
a = googleDriveFileDownloader()
a.downloadFile("https://drive.google.com/uc?id=1O4x8rwGJAh8gRo8sjm0kuKFf6vCEm93G&export=download")