copying azure blob to azure fileshare, using Python SDK - python

I'm trying to copy a blob from Azure storage blob container to a file share, running the following script on Azure Databricks
dbutils.library.installPyPI('azure-storage-blob')
dbutils.library.installPyPI('azure-storage-file-share')
from azure.storage.blob import BlobServiceClient, BlobClient
from azure.storage.fileshare import ShareClient, ShareFileClient
connection_string = my_connection_string
blobserviceclient = BlobServiceClient.from_connection_string(connection_string)
source_blob = BlobClient(blobserviceclient.url,container_name = 'my-container-name', blob_name = 'my_file.json')
fileshareclient = ShareClient.from_connection_string(connection_string, 'my-fileshare-name')
destination_file= fileshareclient.get_file_client('my_file.json')
destination_file.start_copy_from_url(source_blob.url)
I get the following error:
ResourceNotFoundError: The specified resource does not exist.
When I check for source_blob.url and destination_file.url, they both exist:
source_blob.url
'https://myaccountname.file.core.windows.net/my-container-name/my_file.json'
and
destination_file.url
'https://myaccountname.file.core.windows.net/my-fileshare-name/my_file.json'
I used the examples from this: https://github.com/Azure/azure-sdk-for-python/blob/master/sdk/storage/azure-storage-file-share/samples/file_samples_client.py
Any idea what I'm doing wrong?
This works when I use AzCopy.
I can also copy from one blob container to another, just not from the blob container to file share.

You should use sasToken with the blob url when using the method start_copy_from_url or set the source blob container as public. Otherwise, it will throw the error you've seen.
For sasToken, you can generate it from code or from azure portal.
Here is the sample code including generating sas token for blob:
from azure.storage.blob import BlobServiceClient, BlobClient, generate_blob_sas, BlobSasPermissions
from azure.storage.fileshare import ShareClient, ShareFileClient
from datetime import datetime, timedelta
connection_string="xxx"
blobserviceclient = BlobServiceClient.from_connection_string(connection_string)
source_blob = BlobClient(blobserviceclient.url,container_name="xxx", blob_name="xxx")
#generate sas token for this blob
sasToken = generate_blob_sas(
account_name="xxx",
container_name="xxx",
blob_name="xxxx",
account_key="xxx",
permission= BlobSasPermissions(read=True),
expiry=datetime.utcnow() + timedelta(hours=1)
)
fileshareclient =ShareClient.from_connection_string(connection_string,"xxx")
destination_file = fileshareclient.get_file_client('xxx')
destination_file.start_copy_from_url(source_blob.url+"?"+sasToken)
print("**copy completed**")

Related

Azure Storage Account: How to rename/move a Blob within a Container

I am using Microsoft Azure SDK for Python and I want to move or rename a Blob within the same container.
For example, how to move this blob
https://demostorage.blob.core.windows.net/container-a/folder1/file1.jpg
into this?
https://demostorage.blob.core.windows.net/container-a/folder2/file2.jpg
After reproducing from my end I could able to achieve this using copy_blob of azure-storage versioned 0.20.0.
from azure.storage.blob import BlobService
accountName = "<YourAccountName>"
accountKey = "<YourAccountKey>"
blob_service = BlobService(accountName ,accountKey )
sourceFileName = 'file1.png'
destinationFileName = 'file2.png'
sourcePath = 'container1/folder1'
destinationPath = 'container1/folder2'
blob_url = blob_service.make_blob_url(sourcePath,sourceFileName)
blob_service.copy_blob(destinationPath, destinationFileName, blob_url)

get contents of all azure blobs via python

I want to list all the blobs in a container and then ultimately store each blobs contents (each blob stores a csv file) into a data frame, it appears that the blob service client is the easiest way to list all the blobs, and this is what I have:
#!/usr/bin/env python3
import os
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
from pathlib import Path
from io import StringIO
import pandas as pd
def main():
connect_str = os.environ['AZURE_CONNECT_STR']
container = os.environ['CONTAINER']
print(connect_str + "\n")
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
container_client = blob_service_client.get_container_client(container)
blob_list = container_client.list_blobs()
for blob in blob_list:
print("\t" + blob.name)
if __name__ == "__main__":
main()
However, in the last version of blob storage client there appears to be no method which allows me to get the actual contents of the blob, what code should I be using ? there are other clients in the Python SDK for Azure, but it getting a full list of the blobs in a container using these seems cumbersome.
What you would need to do is create an instance of BlobClient using the container_client and the blob's name. You can then call download_blob method to download the blob.
Something like:
for blob in blob_list:
print("\t" + blob.name)
blob_client = container_client.get_blob_client(blob.name)
blob_client.download(...)

Download a picture from a blob using python ( azure Blob storage)

I want to download an image from a blob that is in a container.
I searched and I only found how to download a container, but as I said I do not want to download the whole container and not the whole blob otherwise just an image.
(container/blob/image.png)
this is the code that i found ( to download all the container):
import os
from azure.storage.blob import BlobServiceClient, BlobClient
from azure.storage.blob import ContentSettings, ContainerClient
# IMPORTANT: Replace connection string with your storage account connection string
# Usually starts with DefaultEndpointsProtocol=https;...
MY_CONNECTION_STRING = "CONNECTION_STRING"
# Replace with blob container
MY_BLOB_CONTAINER = "name"
# Replace with the local folder where you want files to be downloaded
LOCAL_BLOB_PATH = "Blobsss"
BLOBNAME="test"
class AzureBlobFileDownloader:
def __init__(self):
print("Intializing AzureBlobFileDownloader")
# Initialize the connection to Azure storage account
self.blob_service_client = BlobServiceClient.from_connection_string(MY_CONNECTION_STRING)
self.my_container = self.blob_service_client.get_container_client(MY_BLOB_CONTAINER)
def save_blob(self, file_name, file_content):
# Get full path to the file
download_file_path = os.path.join(LOCAL_BLOB_PATH, file_name)
# for nested blobs, create local path as well!
os.makedirs(os.path.dirname(download_file_path), exist_ok=True)
with open(download_file_path, "wb") as file:
file.write(file_content)
def download_all_blobs_in_container(self):
my_blobs = self.my_container.list_blobs()
for blob in my_blobs:
print(blob.name)
bytes = self.my_container.get_blob_client(blob).download_blob().readall()
self.save_blob(blob.name, bytes)
# Initialize class and upload files
azure_blob_file_downloader = AzureBlobFileDownloader()
azure_blob_file_downloader.download_all_blobs_in_container()
Could you please help me ?
thanks you

Copy Azure Blob as BlockBlob from Remote URL

I'm using the azure-sdk-for-python BlobClient start_copy_from_url to copy a remote file to my local storage.
However, the file always ends up as an AppendBlob instead of BlockBlob. I can't see how I can force the destination BlockType to be BlockBlob.
connection_string = "connection string to my dest blob storage account"
container_name = "myContainerName"
dest_file_name = "myDestFile.csv"
remote_blob_url = "http://path/to/remote/blobfile.csv"
client = BlobServiceClient.from_connection_string(connection_string)
dest_blob = client.get_blob_client(container_name, dest_file_name)
dest_blob.start_copy_from_url(remote_blob_url)
You can't change blob type as soon as you create it.Please see the Copy Blob From URL REST API,no blob-types header.
You could refer to my code to create block blob from append blob:
from azure.storage.blob import BlobPermissions
from datetime import datetime, timedelta
from azure.storage.blob import BlockBlobService
import requests
from io import BytesIO
account_name = "***"
account_key = "***"
container_name = "test"
blob_name = "test2.csv"
block_blob_service = BlockBlobService(account_name, account_key)
sas_token = block_blob_service.generate_blob_shared_access_signature(container_name, blob_name,
permission=BlobPermissions.READ,
expiry=datetime.utcnow() + timedelta(hours=1))
blob_url_with_sas = block_blob_service.make_blob_url(container_name, blob_name, sas_token=sas_token)
r = requests.get(blob_url_with_sas, stream=True)
block_blob_service.create_blob_from_stream("test", "jay.block", stream=BytesIO(r.content))
Here is what you want to do using the latest version (v12)
According to the documentation,
The source blob for a copy operation may be a block blob, an append blob,
or a page blob. If the destination blob already exists, it must be of the
same blob type as the source blob.
Right now, you cannot use start_copy_from_url to specify a blob type. However, you can use the synchronous copy APIS to do so in some cases.
For example, for block to page blob, create the destination page blob first and invoke update_range_from_url on the destination, with each chunk of 4 MB from the source.
Similarly, in your case, create an empty block blob first and the use the stage_block_from_url method.
from azure.storage.blob import ContainerClient
import os
conn_str = os.getenv("AZURE_STORAGE_CONNECTION_STRING")
dest_blob_name = "mynewblob"
source_url = "http://www.gutenberg.org/files/59466/59466-0.txt"
container_client = ContainerClient.from_connection_string(conn_str, "testcontainer")
blob_client = container_client.get_blob_client(dest_blob_name)
# upload the empty block blob
blob_client.upload_blob(b'')
# this will only stage your block
blob_client.stage_block_from_url(block_id=1, source_url=source_url)
# now it is committed
blob_client.commit_block_list(['1'])
# if you want to verify it's committed now
committed, uncommitted = blob_client.get_block_list('all')
assert len(committed) == 1
Let me know if this doesn't work.
EDIT:
You can leverage the source_offset and source_length params to upload blocks in chunks.
For example,
stage_block_from_url(block_id, source_url, source_offset=0, source_length=10)
will upload the first 10 bytes i.e. bytes from 0 to 9.
So, you can use a counter to keep incrementing the block_id and track your offset and length till you exhaust all your chunks.
EDIT2:
for step in range(....):
###
blob.stage_block_from_url(...)
##do not commit it##
#outside the for loop
blob.commit_block_list([j for j in range(i+1)]) (#or i+2?)
As I know there is no direct conversion between blob types. To do this you need to download the blob and reupload it as Block Blob.

Azure Blob - Read using Python

Can someone tell me if it is possible to read a csv file directly from Azure blob storage as a stream and process it using Python? I know it can be done using C#.Net (shown below) but wanted to know the equivalent library in Python to do this.
CloudBlobClient client = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = client.GetContainerReference("outfiles");
CloudBlob blob = container.GetBlobReference("Test.csv");*
Yes, it is certainly possible to do so. Check out Azure Storage SDK for Python
from azure.storage.blob import BlockBlobService
block_blob_service = BlockBlobService(account_name='myaccount', account_key='mykey')
block_blob_service.get_blob_to_path('mycontainer', 'myblockblob', 'out-sunset.png')
You can read the complete SDK documentation here: http://azure-storage.readthedocs.io.
Here's a way to do it with the new version of the SDK (12.0.0):
from azure.storage.blob import BlobClient
blob = BlobClient(account_url="https://<account_name>.blob.core.windows.net"
container_name="<container_name>",
blob_name="<blob_name>",
credential="<account_key>")
with open("example.csv", "wb") as f:
data = blob.download_blob()
data.readinto(f)
See here for details.
One can stream from blob with python like this:
from tempfile import NamedTemporaryFile
from azure.storage.blob.blockblobservice import BlockBlobService
entry_path = conf['entry_path']
container_name = conf['container_name']
blob_service = BlockBlobService(
account_name=conf['account_name'],
account_key=conf['account_key'])
def get_file(filename):
local_file = NamedTemporaryFile()
blob_service.get_blob_to_stream(container_name, filename, stream=local_file,
max_connections=2)
local_file.seek(0)
return local_file
Provide Your Azure subscription Azure storage name and Secret Key as Account Key here
block_blob_service = BlockBlobService(account_name='$$$$$$', account_key='$$$$$$')
This still get the blob and save in current location as 'output.jpg'
block_blob_service.get_blob_to_path('you-container_name', 'your-blob', 'output.jpg')
This will get text/item from blob
blob_item= block_blob_service.get_blob_to_bytes('your-container-name','blob-name')
blob_item.content
I recommend using smart_open.
import os
from azure.storage.blob import BlobServiceClient
from smart_open import open
connect_str = os.environ['AZURE_STORAGE_CONNECTION_STRING']
transport_params = {
'client': BlobServiceClient.from_connection_string(connect_str),
}
# stream from Azure Blob Storage
with open('azure://my_container/my_file.txt', transport_params=transport_params) as fin:
for line in fin:
print(line)
# stream content *into* Azure Blob Storage (write mode):
with open('azure://my_container/my_file.txt', 'wb', transport_params=transport_params) as fout:
fout.write(b'hello world')
Since I wasn't able to find what I needed on this thread, I wanted to follow up on #SebastianDziadzio's answer to retrieve the data without downloading it as a local file, which is what I was trying to find for myself.
Replace the with statement with the following:
from io import BytesIO
import pandas as pd
with BytesIO() as input_blob:
blob_client_instance.download_blob().download_to_stream(input_blob)
input_blob.seek(0)
df = pd.read_csv(input_blob, compression='infer', index_col=0)
Here is the simple way to read a CSV using Pandas from a Blob:
import os
from azure.storage.blob import BlobServiceClient
service_client = BlobServiceClient.from_connection_string(os.environ['AZURE_STORAGE_CONNECTION_STRING'])
client = service_client.get_container_client("your_container")
bc = client.get_blob_client(blob="your_folder/yourfile.csv")
data = bc.download_blob()
with open("file.csv", "wb") as f:
data.readinto(f)
df = pd.read_csv("file.csv")
To Read from Azure Blob
I want to use csv from azure blob storage to openpyxl xlsx
from io import BytesIO
conn_str = os.environ.get('BLOB_CONN_STR')
container_name = os.environ.get('CONTAINER_NAME')
blob = BlobClient.from_connection_string(conn_str, container_name=container_name,
blob_name="YOUR BLOB PATH HERE FROM AZURE BLOB")
data = blob.download_blob()
workbook_obj = openpyxl.load_workbook(filename=BytesIO(data.readall()))
To write in Azure Blob
I struggled lot for this I don't want anyone to do same,
If you are using openpyxl and want to directly write from azure function to blob storage do following steps and you will achieve what you are seeking for.
Thanks. HMU if you need anyhelp.
blob=BlobClient.from_connection_string(conn_str=conString,container_name=container_name, blob_name=r'YOUR_PATH/test1.xlsx')
blob.upload_blob(save_virtual_workbook(wb))
I know this is an old post but if someone wants to do the same.
I was able to access as per below codes
Note: you need to set the AZURE_STORAGE_CONNECTION_STRING which can be obtained from Azure Portal -> Go to your storage -> Settings -> Access keys and then you will get the connection string there.
For Windows:
setx AZURE_STORAGE_CONNECTION_STRING ""
For Linux:
export AZURE_STORAGE_CONNECTION_STRING=""
For macOS:
export AZURE_STORAGE_CONNECTION_STRING=""
import os
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, __version__
connect_str = os.getenv('AZURE_STORAGE_CONNECTION_STRING')
print(connect_str)
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
container_client = blob_service_client.get_container_client("Your Storage Name Here")
try:
print("\nListing blobs...")
# List the blobs in the container
blob_list = container_client.list_blobs()
for blob in blob_list:
print("\t" + blob.name)
except Exception as ex:
print('Exception:')
print(ex)

Categories