get contents of all azure blobs via python - python

I want to list all the blobs in a container and then ultimately store each blobs contents (each blob stores a csv file) into a data frame, it appears that the blob service client is the easiest way to list all the blobs, and this is what I have:
#!/usr/bin/env python3
import os
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
from pathlib import Path
from io import StringIO
import pandas as pd
def main():
connect_str = os.environ['AZURE_CONNECT_STR']
container = os.environ['CONTAINER']
print(connect_str + "\n")
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
container_client = blob_service_client.get_container_client(container)
blob_list = container_client.list_blobs()
for blob in blob_list:
print("\t" + blob.name)
if __name__ == "__main__":
main()
However, in the last version of blob storage client there appears to be no method which allows me to get the actual contents of the blob, what code should I be using ? there are other clients in the Python SDK for Azure, but it getting a full list of the blobs in a container using these seems cumbersome.

What you would need to do is create an instance of BlobClient using the container_client and the blob's name. You can then call download_blob method to download the blob.
Something like:
for blob in blob_list:
print("\t" + blob.name)
blob_client = container_client.get_blob_client(blob.name)
blob_client.download(...)

Related

How to download all files from a blob container using python

I have to mention that i barely know anything to python. I use an application that has no native support for downloading data from blop's. But it support python.
I have found a way to list all blop's within the container.
But I have no clue how to download them.
from azure.storage.blob import BlobServiceClient, ContainerClient
import io
from io import StringIO
import pandas as pd
from csv import reader
sas_url = r'https://ubftp.blob.core.windows.netxxxxxxxxxxxxxxxx'
container = ContainerClient.from_container_url(sas_url, delimiter='/')
blob_list = container.list_blobs()
for index, blob in enumerate(blob_list):
#for blob in blob_list:
#print(list(blob.keys()))
print(type(blob_name),blob['name'])
blob_name = blob['name']
It list's all the blops within every subfolder.
What do I add to the code to download them?
Or read them into a dataframe?
Kind regards
This is may be what you are looking for:
https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python?tabs=managed-identity%2Croles-azure-portal%2Csign-in-azure-cli#download-blobs
# Download the blob to a local file
# Add 'DOWNLOAD' before the .txt extension so you can see both files in the data directory
download_file_path = os.path.join(local_path, str.replace(local_file_name ,'.txt', 'DOWNLOAD.txt'))
container_client = blob_service_client.get_container_client(container= container_name)
print("\nDownloading blob to \n\t" + download_file_path)
with open(file=download_file_path, mode="wb") as download_file:
download_file.write(container_client.download_blob(blob.name).readall())

Python - List all the files and blob inside an Azure Storage Container

This is my first post here on StackOverflow, hope it respects the guideline of this community.
I'm trying to accomplish a simple task in Python because even though I'm really new to it, I found it very easy to use.
I have a storage account on Azure, with a lot of containers inside.
Each container contains some random files and/or blobs.
What I'm trying to do, is to get the name of all these files and/or blob and put it on a file.
For now, I got here:
import os, uuid
import sys
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, __version__
connection_string = "my_connection_string"
blob_svc = BlobServiceClient.from_connection_string(conn_str=connection_string)
try:
print("Azure Blob Storage v" + __version__ + " - Python quickstart sample")
print("\nListing blobs...")
containers = blob_svc.list_containers()
list_of_blobs = []
for c in containers:
container_client = blob_svc.get_container_client(c)
blob_list = container_client.list_blobs()
for blob in blob_list:
list_of_blobs.append(blob.name)
file_path = 'C:/my/path/to/file/randomfile.txt'
sys.stdout = open(file_path, "w")
print(list_of_blobs)
except Exception as ex:
print('Exception:')
print(ex)
But I'm having 3 problems:
I'm getting the <name_of_ the_blob>/<name_of_the_file_inside>:
I would like to have just the name of the file inside the blob
If in a container there is a blob (or more than 1 blob) + a random file, this script prints only the name of the blob + the name of the file inside, skipping the other files outside the blobs.
I would like to put all the names of the blobs/files in a .csv file.
But I'm not sure how to do point 3, and how to resolve points 1 and 2.
Cloud some maybe help on this?
Thanks!
Edit:
I'm adding an image here just to clarify a little what I mean when I talk about blob/files
Just to clarify that there are no 2 things such as files or blobs in the Blob Storage the files inside Blob Storage are called blobs. Below is the hierarchy that you can observe in blob storage.
Blob Storage > Containers > Directories/Virtual Folders > Blobs
I'm getting the <name_of_ the_blob>/<name_of_the_file_inside>: I would like to have just the name of the file inside the blob
for this, you can iterate through your container using list_blobs(<Container_Name>) taking only the names of the blobs i.e., blob.name. Here is how the code goes when you are trying to list all the blobs names inside a container.
generator = blob_service.list_blobs(CONTAINER_NAME)
for blob in generator:
print("\t Blob name: "+c.name+'/'+ blob.name)
If in a container there is a blob (or more than 1 blob) + a random file, this script prints only the name of the blob + the name of the file inside, skipping the other files outside the blobs.
you can use iterate for containers using list_containers() and then use list_blobs(<Container_Name>) for iterating over the blob names and then finally write the blob names to a local file.
I would like to put all the names of the blobs/files in a .csv file.
A simple with open('<filename>.csv', 'w') as f write. Below is the sample code
with open('BlobsNames.csv', 'w') as f:
f.write(<statements>)
Here is the complete sample code that worked for us where each blob from every folder will be listed.
import os
from azure.storage.blob import BlockBlobService
ACCOUNT_NAME = "<ACCOUNT_NAME>"
SAS_TOKEN='<YOUR_SAS_TOKEN>'
blob_service = BlockBlobService(account_name=ACCOUNT_NAME,account_key=None,sas_token=SAS_TOKEN)
print("\nList blobs in the container")
with open('BlobsNames.txt', 'w') as f:
containers = blob_service.list_containers()
for c in containers:
generator = blob_service.list_blobs(c.name)
for blob in generator:
print("\t Blob name: "+c.name+'/'+ blob.name)
f.write(c.name+'/'+blob.name)
f.write('\n')
This works even when there are folders in containers.
RESULT:
NOTE: You can just remove c.name while printing the blob to file if your requirement is to just pull out the blob names.
Thanks all for your reply,
in the end, I took what SwethaKandikonda-MT wrote, and I change it a little bit to fit the connection problem that I had.
Here is what I came up:
import os, uuid
import sys
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, __version__
import csv
connection_string = "my_account_storage_connection_string"
blob_svc = BlobServiceClient.from_connection_string(conn_str=connection_string)
list_of_blobs = []
print("\nList blobs in the container")
with open('My_path/to/the/file.csv', 'w') as f:
containers = blob_svc.list_containers()
for c in containers:
container_client = blob_svc.get_container_client(c.name)
blob_list = container_client.list_blobs()
for blob in blob_list:
print("\t Blob name: "+c.name +'/'+ blob.name) #this will print on the console
f.write('/'+blob.name) #this will write on the csv file just the blob name
f.write('\n')

Download a picture from a blob using python ( azure Blob storage)

I want to download an image from a blob that is in a container.
I searched and I only found how to download a container, but as I said I do not want to download the whole container and not the whole blob otherwise just an image.
(container/blob/image.png)
this is the code that i found ( to download all the container):
import os
from azure.storage.blob import BlobServiceClient, BlobClient
from azure.storage.blob import ContentSettings, ContainerClient
# IMPORTANT: Replace connection string with your storage account connection string
# Usually starts with DefaultEndpointsProtocol=https;...
MY_CONNECTION_STRING = "CONNECTION_STRING"
# Replace with blob container
MY_BLOB_CONTAINER = "name"
# Replace with the local folder where you want files to be downloaded
LOCAL_BLOB_PATH = "Blobsss"
BLOBNAME="test"
class AzureBlobFileDownloader:
def __init__(self):
print("Intializing AzureBlobFileDownloader")
# Initialize the connection to Azure storage account
self.blob_service_client = BlobServiceClient.from_connection_string(MY_CONNECTION_STRING)
self.my_container = self.blob_service_client.get_container_client(MY_BLOB_CONTAINER)
def save_blob(self, file_name, file_content):
# Get full path to the file
download_file_path = os.path.join(LOCAL_BLOB_PATH, file_name)
# for nested blobs, create local path as well!
os.makedirs(os.path.dirname(download_file_path), exist_ok=True)
with open(download_file_path, "wb") as file:
file.write(file_content)
def download_all_blobs_in_container(self):
my_blobs = self.my_container.list_blobs()
for blob in my_blobs:
print(blob.name)
bytes = self.my_container.get_blob_client(blob).download_blob().readall()
self.save_blob(blob.name, bytes)
# Initialize class and upload files
azure_blob_file_downloader = AzureBlobFileDownloader()
azure_blob_file_downloader.download_all_blobs_in_container()
Could you please help me ?
thanks you

How do I read/print the contents within a blob storage container?

I am trying to read the contents within "clients" blob storage.
As you can see in the attached picture, I have listed the blobs within that container and now I would like to print the contents of the blobs. for example, I trying to print out the contents of a json file called "clients.json"
import os, uuid, json
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, __version__
from config import config, store
try:
print("Azure Blob Storage v" + __version__ + " - Python quickstart sample")
configConnectionString = config
blob_service_client = BlobServiceClient.from_connection_string(configConnectionString)
container_client = blob_service_client.get_container_client(store)
# List the blobs in the container
# blob_list = container_client.list_blobs()
# for blob in blob_list:
# print("\t" + blob.name)
# Read the contents within the blob containers
with open('clients.json') as json_file:
data = json.load(json_file)
print(data)
except Exception as ex:
print('Exception:')
print(ex)
this is to show that my code is working when I want to list the blobs in the container
The get_container_client return an instance of ContainerClient class. It has a method called download_blob which can be used to download the blob to the StorageStreamDownloader. You can then use the content_as_text method of the StorageStreamDownloader class to download the contents of the blob, and decode as text.
container_client = blob_service_client.get_container_client(store)
downloader = container_client.download_blob("clients.json")
print(downloader.content_as_text())

Azure Blob - Read using Python

Can someone tell me if it is possible to read a csv file directly from Azure blob storage as a stream and process it using Python? I know it can be done using C#.Net (shown below) but wanted to know the equivalent library in Python to do this.
CloudBlobClient client = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = client.GetContainerReference("outfiles");
CloudBlob blob = container.GetBlobReference("Test.csv");*
Yes, it is certainly possible to do so. Check out Azure Storage SDK for Python
from azure.storage.blob import BlockBlobService
block_blob_service = BlockBlobService(account_name='myaccount', account_key='mykey')
block_blob_service.get_blob_to_path('mycontainer', 'myblockblob', 'out-sunset.png')
You can read the complete SDK documentation here: http://azure-storage.readthedocs.io.
Here's a way to do it with the new version of the SDK (12.0.0):
from azure.storage.blob import BlobClient
blob = BlobClient(account_url="https://<account_name>.blob.core.windows.net"
container_name="<container_name>",
blob_name="<blob_name>",
credential="<account_key>")
with open("example.csv", "wb") as f:
data = blob.download_blob()
data.readinto(f)
See here for details.
One can stream from blob with python like this:
from tempfile import NamedTemporaryFile
from azure.storage.blob.blockblobservice import BlockBlobService
entry_path = conf['entry_path']
container_name = conf['container_name']
blob_service = BlockBlobService(
account_name=conf['account_name'],
account_key=conf['account_key'])
def get_file(filename):
local_file = NamedTemporaryFile()
blob_service.get_blob_to_stream(container_name, filename, stream=local_file,
max_connections=2)
local_file.seek(0)
return local_file
Provide Your Azure subscription Azure storage name and Secret Key as Account Key here
block_blob_service = BlockBlobService(account_name='$$$$$$', account_key='$$$$$$')
This still get the blob and save in current location as 'output.jpg'
block_blob_service.get_blob_to_path('you-container_name', 'your-blob', 'output.jpg')
This will get text/item from blob
blob_item= block_blob_service.get_blob_to_bytes('your-container-name','blob-name')
blob_item.content
I recommend using smart_open.
import os
from azure.storage.blob import BlobServiceClient
from smart_open import open
connect_str = os.environ['AZURE_STORAGE_CONNECTION_STRING']
transport_params = {
'client': BlobServiceClient.from_connection_string(connect_str),
}
# stream from Azure Blob Storage
with open('azure://my_container/my_file.txt', transport_params=transport_params) as fin:
for line in fin:
print(line)
# stream content *into* Azure Blob Storage (write mode):
with open('azure://my_container/my_file.txt', 'wb', transport_params=transport_params) as fout:
fout.write(b'hello world')
Since I wasn't able to find what I needed on this thread, I wanted to follow up on #SebastianDziadzio's answer to retrieve the data without downloading it as a local file, which is what I was trying to find for myself.
Replace the with statement with the following:
from io import BytesIO
import pandas as pd
with BytesIO() as input_blob:
blob_client_instance.download_blob().download_to_stream(input_blob)
input_blob.seek(0)
df = pd.read_csv(input_blob, compression='infer', index_col=0)
Here is the simple way to read a CSV using Pandas from a Blob:
import os
from azure.storage.blob import BlobServiceClient
service_client = BlobServiceClient.from_connection_string(os.environ['AZURE_STORAGE_CONNECTION_STRING'])
client = service_client.get_container_client("your_container")
bc = client.get_blob_client(blob="your_folder/yourfile.csv")
data = bc.download_blob()
with open("file.csv", "wb") as f:
data.readinto(f)
df = pd.read_csv("file.csv")
To Read from Azure Blob
I want to use csv from azure blob storage to openpyxl xlsx
from io import BytesIO
conn_str = os.environ.get('BLOB_CONN_STR')
container_name = os.environ.get('CONTAINER_NAME')
blob = BlobClient.from_connection_string(conn_str, container_name=container_name,
blob_name="YOUR BLOB PATH HERE FROM AZURE BLOB")
data = blob.download_blob()
workbook_obj = openpyxl.load_workbook(filename=BytesIO(data.readall()))
To write in Azure Blob
I struggled lot for this I don't want anyone to do same,
If you are using openpyxl and want to directly write from azure function to blob storage do following steps and you will achieve what you are seeking for.
Thanks. HMU if you need anyhelp.
blob=BlobClient.from_connection_string(conn_str=conString,container_name=container_name, blob_name=r'YOUR_PATH/test1.xlsx')
blob.upload_blob(save_virtual_workbook(wb))
I know this is an old post but if someone wants to do the same.
I was able to access as per below codes
Note: you need to set the AZURE_STORAGE_CONNECTION_STRING which can be obtained from Azure Portal -> Go to your storage -> Settings -> Access keys and then you will get the connection string there.
For Windows:
setx AZURE_STORAGE_CONNECTION_STRING ""
For Linux:
export AZURE_STORAGE_CONNECTION_STRING=""
For macOS:
export AZURE_STORAGE_CONNECTION_STRING=""
import os
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, __version__
connect_str = os.getenv('AZURE_STORAGE_CONNECTION_STRING')
print(connect_str)
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
container_client = blob_service_client.get_container_client("Your Storage Name Here")
try:
print("\nListing blobs...")
# List the blobs in the container
blob_list = container_client.list_blobs()
for blob in blob_list:
print("\t" + blob.name)
except Exception as ex:
print('Exception:')
print(ex)

Categories