Download XLSX File from Azure Blob in Python

from azure.storage.blob import BlockBlobService

block_blob_service = BlockBlobService(account_name=AZURE_ACCOUNT_NAME, account_key=AZURE_ACCOUNT_KEY)
file = block_blob_service.get_blob_to_bytes(AZURE_CONTAINER, "CS_MDMM_Global.xlsx")
file.content  # the issue is at this line: it gives me the data in some encoded form
I want to decode the data and store it in a pandas DataFrame. I'm getting encoded data from the blob, but I can't figure out how to decode it into a pandas DataFrame.

It sounds like you want to read the content of an xlsx blob file stored in Azure Blob Storage via pandas to get a pandas DataFrame.
I have an xlsx sample file stored in my Azure Blob Storage; its content is as in the figure below.
So I will read it directly with the Azure Storage SDK for Python and pandas; the first step is to install the packages below.
pip install pandas azure-storage xlrd
Here is my sample code.
# Generate a url of the excel blob with a sas token
from azure.storage.blob.baseblobservice import BaseBlobService
from azure.storage.blob import BlobPermissions
from datetime import datetime, timedelta

account_name = '<your storage account name>'
account_key = '<your storage key>'
container_name = '<your container name>'
blob_name = '<your excel blob>'

blob_service = BaseBlobService(
    account_name=account_name,
    account_key=account_key
)

sas_token = blob_service.generate_blob_shared_access_signature(
    container_name,
    blob_name,
    permission=BlobPermissions.READ,
    expiry=datetime.utcnow() + timedelta(hours=1)
)
blob_url_with_sas = blob_service.make_blob_url(container_name, blob_name, sas_token=sas_token)

# Pass the blob url with sas to the function `read_excel`
import pandas as pd

df = pd.read_excel(blob_url_with_sas)
print(df)
And the result is:
Actually, your question is a duplicate of the other SO thread Read in azure blob using python; for more details, please refer to my answer there.
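As another option, instead of generating a SAS url you can keep the get_blob_to_bytes call from the question and hand its bytes straight to pandas through an in-memory file object. The sketch below shows the wrapping idea with CSV bytes so that it is self-contained; for the actual xlsx blob, pass file.content in place of the sample bytes and use pd.read_excel (which requires xlrd or openpyxl) instead of pd.read_csv.

```python
import io

import pandas as pd

# Stand-in for the raw bytes that get_blob_to_bytes returns via file.content;
# CSV bytes are used so this sketch runs without an Excel engine installed.
raw_bytes = b"name,score\nalice,1\nbob,2\n"

# Wrap the bytes in a file-like object and let pandas parse them
df = pd.read_csv(io.BytesIO(raw_bytes))
print(df.shape)  # (2, 2)
```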

Related

How to create a blob in an Azure Storage Container using Python & Azure Functions

I am having a lot of difficulty writing an API response as JSON to a blob within an Azure Storage Container. I have tried multiple solutions online but have not managed to see any through to success. I would like to share two attempts I have made, and hopefully someone out there can assist me in getting at least one methodology correct.
Attempt/Method 1
I have tried to use a Service Principal to authenticate my BlobServiceClient from Azure-Storage-Blob. My service principal has been assigned the role of Storage Blob Data Contributor for the Container within which I am trying to create the blob. However, on execution of the script I receive an error along the lines of "Unsupported Credential". My script and the resulting error are:
import azure.functions as func
import requests
import json
import uuid
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
from msrestazure.azure_active_directory import ServicePrincipalCredentials
from azure.storage.common import TokenCredential

# Initialise parameters to obtain data from the REST API
url = "https://api.powerbi.com/v1.0/myorg/admin/groups?$top=1000&$expand=datasets,dataflows,reports,users,dashboards"
headers = {'Authorization': get_access_token()}

# Get the response. I want to save the response output to a blob.
response = requests.get(url, headers=headers)
response = response.json()

# Initialise parameters for credentials
CLIENT = "bxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx7" # Azure App/Service Principal ID
KEY = "Gxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx1" # Azure App/Service Principal Key
TENANT_ID = "cxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx7" # Tenant where the Storage Account is, which differs from the Tenant the App resides in
RESOURCE = f"https://storageaccountxxxxxxxxx.blob.core.windows.net"

# Create credentials & token
credentials = ServicePrincipalCredentials(
    client_id = CLIENT,
    secret = KEY,
    #tenant = TENANT_ID,
    resource = RESOURCE
)
tokenCre = TokenCredential(credentials.token["access_token"])

# Initialise parameters for BlobServiceClient
ACCOUNT_URL = "https://storageaccountxxxxxxxxx.blob.core.windows.net/pbiactivity" # includes the container name at the end of the url

# Create BlobServiceClient
blobService = BlobServiceClient(account_url = ACCOUNT_URL, token_credential=tokenCre)

# Create blobClient
blobClient = BlobClient(account_url = RESOURCE, container_name=CONTAINER_NAME, blob_name="response.json", credential = tokenCre)

# Upload the response json as a blob
blobClient.upload_blob(response, blob_type = "BlockBlob")
Click here for the error that comes after the upload_blob method call.
Attempt/Method 2
In my second attempt I tried to create my BlobServiceClient from Azure-Storage-Blob using my storage account connection string. This method actually allows me to create containers; however, when I try to upload a blob as in the script below, I get a 403 Forbidden response.
My script and resulting error are:
import requests
import json
import uuid
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
# Initialise parameters to obtain data from Rest API
url = "https://api.powerbi.com/v1.0/myorg/admin/groups?$top=1000&$expand=datasets,dataflows,reports,users,dashboards"
headers = {'Authorization': get_access_token()}
# Get response. I want to save the response output to a blob.
response = requests.get(url, headers=headers)
response = response.json()
# Initialise parameters
CONNECTION_STRING = "DefaultEndpointsProtocol=https;AccountName=storageaccountxxxxxxxxx;AccountKey=rxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxQ==;EndpointSuffix=core.windows.net"
# Create blobServiceClient from connection string
blobServiceClient = BlobServiceClient.from_connection_string(conn_str=CONNECTION_STRING)
#Create blobClient
blobClient = blobServiceClient.get_blob_client(container = "pbiactivity", blob = "response.json")
#Upload response json to blob
blobClient.upload_blob(response, blob_type = "BlockBlob")
Click here for the errors that come after the upload_blob method call.
Here is a workaround that worked for me:
import os
import logging
import requests
import json
from azure.storage.blob import BlobServiceClient, BlobClient

# Initialise parameters
url = "<YourURL>"
headers = {'Authorization': get_access_token()}

# Get the response
response = requests.get(url, headers=headers)
response = response.json()

connectionString = "<Your_Connection_String>"
containerName = "<Name_of_your_container>"

blobServiceClient = BlobServiceClient.from_connection_string(connectionString)
blobContainerClient = blobServiceClient.get_container_client(containerName)

# To create the container (if the container already exists you can skip this)
#blobContainerClient.create_container()

# Create the blobClient
blobClient = blobServiceClient.get_blob_client(container = containerName, blob = "response.json")

# Upload the JSON response as the blob's content
# (serialize it first; upload_blob accepts str or bytes)
blobClient.upload_blob(data=json.dumps(response))
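For attempt 1's "Unsupported Credential" error, a possible route (an assumption on my part: it requires the newer azure-identity and azure-storage-blob v12 packages, and the helper names below are mine, not the SDK's) is to build the client from a ClientSecretCredential instead of the msrestazure/TokenCredential pair. The SDK imports are deferred into the function so the serialization helper is usable on its own:

```python
import json


def to_json_payload(obj):
    # upload_blob accepts str or bytes, so serialize the API response first
    return json.dumps(obj, indent=2)


def upload_json_with_service_principal(account_url, container, blob_name, obj,
                                       tenant_id, client_id, client_secret):
    # Deferred imports: assumes azure-identity and azure-storage-blob >= 12
    from azure.identity import ClientSecretCredential
    from azure.storage.blob import BlobServiceClient

    credential = ClientSecretCredential(tenant_id, client_id, client_secret)
    service = BlobServiceClient(account_url=account_url, credential=credential)
    blob_client = service.get_blob_client(container=container, blob=blob_name)
    blob_client.upload_blob(to_json_payload(obj), overwrite=True)
```

Note that account_url here is the account endpoint without the container name; the container is passed separately to get_blob_client.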
In my Storage Account:

Google Cloud Function error deploying error

Hello, I have a Python cloud function:
import requests
import pandas as pd
import datetime
from google.cloud import storage
import os

api_key = os.environ['API_KEY']
url = f'https://api.sportsdata.io/v3/nba/scores/json/TeamSeasonStats/2022?key={api_key}'

def export_data(url):
    response = requests.get(url)  # Make a GET request to the URL
    print(response)
    payload = response.json()  # Parse `response.text` into JSON
    # Turn the JSON response into a DataFrame
    new_list = pd.DataFrame(list(payload))
    # Convert the df to str: just do not provide any value
    # for the first param path_or_buf
    csv_str = new_list.to_csv()

    # Then, upload it to cloud storage
    def upload_blob(bucket_name, data, destination_blob_name):
        storage_client = storage.Client()
        bucket = storage_client.get_bucket(bucket_name)
        blob = bucket.blob(destination_blob_name)
        # Note the use of upload_from_string here. Provide
        # the appropriate content type if you wish
        blob.upload_from_string(data, content_type='text/csv')

    upload_blob('basketball_api_data', csv_str, 'data-' + str(datetime.date.today()) + '.csv')

export_data(url)
It's basically getting basketball API data; however, I am getting errors deploying this.
My requirements.txt looks like this:
# Function dependencies, for example:
# package>=version
google-cloud-storage
pandas
requests
datetime
os
google.cloud
Am I writing my cloud function the wrong way? What am I doing wrong that causes the deployment to GCF to fail?
This is my error:
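A likely cause, though this is a guess since the error text itself is not reproduced above: os and datetime are part of the Python standard library, and google.cloud is not a pip-installable distribution name, so pip fails to resolve the requirements during deployment. A requirements.txt along these lines should deploy:

```text
# Function dependencies
google-cloud-storage
pandas
requests
```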

generate shared access signature through python

I'm trying to generate a shared access signature link through Python for my files, which are already in blob storage, but something goes wrong; I receive this message when I open the generated link in a web browser:
"Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature."
I'm generating the key from my container (right-click -> Get shared access signature), but I can't get any further.
from datetime import datetime
from datetime import timedelta
#from azure.storage.blob import BlobService
from azure.storage.blob import generate_blob_sas, AccountSasPermissions, AccessPolicy

def generate_link():
    account_name = 'my_account_name_storage'
    container_name = 'container_name'
    blob_name = 'file_name.xsl'
    account_key = '?sv=2019-12-12&ss=bfqt&srt=sco&sp=rwdlacupx&se=2020-09-17T05:49:57Z&st=2020-09-16T21:49:57Z&spr=https&sig=sdfsdhgbjgnbdkfnglfkdnhklfgnhklgf%30'
    url = f"https://{account_name}.blob.core.windows.net/{container_name}/{blob_name}"
    sas_token = generate_blob_sas(
        account_name=account_name,
        account_key=account_key,
        container_name=container_name,
        blob_name=blob_name,
        permission=AccountSasPermissions(read=True),
        expiry=datetime.utcnow() + timedelta(hours=1)
    )
    print(sas_token)
    url_with_sas = f"{url}?{sas_token}"
    print(url_with_sas)

generate_link()
The account_key in your code is wrong: you are passing a SAS token string, not the storage account key.
To find the account_key of your storage account, navigate in the Azure portal to your storage account -> Settings -> Access keys, where you can see the account_key. The screenshot is below:
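A quick way to tell the two apart in code, purely as an illustrative helper (the function and its heuristics are mine, not part of the SDK): an account key is a base64 string, while a SAS token is a query string starting with ?sv= and containing sig=.

```python
import base64


def looks_like_account_key(value):
    # SAS tokens are query strings ('?sv=...&sig=...'); account keys are base64
    if value.startswith("?") or "sig=" in value:
        return False
    try:
        base64.b64decode(value, validate=True)
        return True
    except Exception:
        return False
```

Passing a SAS token where account_key is expected would produce an invalid signature, which appears to be what the error message is describing here.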

Azure blob download Authorization resource type mismatch

I am trying to download a client's blob data which is in JSON format from their azure storage.
I have the account URL containing the SAS token, the SAS token itself as well as connection URL.
However, when I try to download a blob from their server, I am getting the following error :
This request is not authorized to perform this operation using this resource type.
RequestId:d6d9d23e-301e-0078-32be-ecd22a000000
Time:2020-02-26T16:02:05.6188260Z
ErrorCode:AuthorizationResourceTypeMismatch
Error:None
Here is the code I am using:
import os, uuid
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

try:
    account_url = "account_url/sas_token"

    # Open the container containing the relevant blobs
    container = ContainerClient(account_url, container_name="container_name")

    # Print the available blobs
    blob_list = container.list_blobs()
    for blob in blob_list:
        print(blob.name + '\n')

    blob_client = container.get_blob_client("blob_name")
    print(blob_client)

    with open("./test.txt", "wb") as my_blob:
        blob_data = blob_client.download_blob()
        blob_data.readinto(my_blob)
except Exception as ex:
    print('Exception:')
    print(ex)
Any more information about this error would be appreciated!
Thanks :D
Since your objective is to download a blob, please make sure that your signed resource type (srt) includes object (o).
Your srt should be like srt=co (or just srt=o).
Please regenerate the SAS token accordingly.
You can see the account SAS permissions by operation here: https://learn.microsoft.com/en-us/rest/api/storageservices/create-account-sas#account-sas-permissions-by-operation (Download Blob is basically the Get Blob operation).
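To check what an existing token covers before regenerating it, the srt field can be read straight off the query string with the standard library (a small sketch; the helper name is mine):

```python
from urllib.parse import parse_qs


def sas_resource_types(sas_token):
    # Account SAS tokens carry their signed resource types in the 'srt' field
    params = parse_qs(sas_token.lstrip("?"))
    return params.get("srt", [""])[0]


# 'o' present means object-level operations such as Get Blob are permitted
print(sas_resource_types("?sv=2019-12-12&ss=bfqt&srt=sco&sig=xxx"))  # sco
```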

How to download an Azure Blob Storage file via URL in Python?

I am trying to download an Azure Blob Storage file from my storage account, to do so, I have checked what the URL is and I am doing the following:
with urllib.request.urlopen("<url_file>") as resp:
    img = np.asarray(bytearray(resp.read()), dtype="uint8")
But I am getting the following error:
urllib.error.HTTPError: HTTP Error 404: The specified resource does not exist.
I have double-checked that the url is correct. Could this have something to do with not having passed the keys of my subscription or any other info about the Storage Account?
Any idea?
As of Dec 26, 2019, I am unable to import BaseBlobService from the azure storage package. Neither BlobPermissions nor generate_blob_shared_access_signature worked for me. Below is something I used instead; it worked in my case and I hope it helps.
from datetime import datetime, timedelta
from azure.storage.blob import generate_blob_sas, AccountSasPermissions

def scan_product():
    account_name = '<account_name>'
    container_name = '<container_name>'
    blob_name = '<blob_name>'
    account_key = '<account_key>'
    url = f"https://{account_name}.blob.core.windows.net/{container_name}/{blob_name}"
    sas_token = generate_blob_sas(
        account_name=account_name,
        account_key=account_key,
        container_name=container_name,
        blob_name=blob_name,
        permission=AccountSasPermissions(read=True),
        expiry=datetime.utcnow() + timedelta(hours=1)
    )
    url_with_sas = f"{url}?{sas_token}"
Actually, you can generate a blob url with a sas token via the Azure Storage SDK for Python for direct access, as in my sample code below.
from azure.storage.blob.baseblobservice import BaseBlobService
from azure.storage.blob import BlobPermissions
from datetime import datetime, timedelta
account_name = '<account name>'
account_key = '<account key>'
container_name = '<container name>'
blob_name = '<blob name>'
url = f"https://{account_name}.blob.core.windows.net/{container_name}/{blob_name}"
service = BaseBlobService(account_name=account_name, account_key=account_key)
token = service.generate_blob_shared_access_signature(container_name, blob_name, permission=BlobPermissions.READ, expiry=datetime.utcnow() + timedelta(hours=1),)
url_with_sas = f"{url}?{token}"
Then,
import urllib.request
import numpy as np

req = urllib.request.urlopen(url_with_sas)
img = np.asarray(bytearray(req.read()), dtype=np.uint8)
For downloading using the url directly, you should either put the blob in a public container, or, if it is in a private container, generate a sas token for the blob (the url then looks like: https://xxx.blob.core.windows.net/aa1/0116.txt?sp=r&st=2019-06-26T09:47:04Z&se=2019-06-26xxxxx).
I tested your code with a url which contains a sas token, and the file can be downloaded.
Test result:
How to generate sas token for a blob:
To solve the issue, all I needed to do was change the Blob Storage access level to Blob (anonymous read access for blobs only). Once this is done, it works.
