Write zip to Blob storage Azure - python

I'm trying to zip files present in container 'input' and move them to container 'output'.
I'm using the Python SDK:
# connection to blob storage via Azure Python SDK
connection_string = "myConnectionString"
blob_service_client = BlobServiceClient.from_connection_string(connection_string)
# get container client
input_container = blob_service_client.get_container_client(container="input")
# filename
filename = "document_to_zip.pdf"
# init zip object
zip_filename = "document_zipped.zip"
zip_object = ZipFile(zip_filename, "w")
data = input_container.download_blob(filename).readall()
zip_object.write(data)
# upload blob to results container as .zip file
results_blob = blob_service_client.get_blob_client(container="output",blob=zip_filename)
results_blob.upload_blob(zip_object, overwrite=True)
I get the following error:
Exception: ValueError: stat: embedded null character in path.
More general question: do you think my approach is fine regarding zipping and moving a blob from one container to another?
Thanks

This error comes from the zip_object.write(data) line: ZipFile.write() expects a *path* to a file on disk, so the PDF's raw bytes get interpreted as a filename, and the embedded null bytes in them make os.stat() fail with "embedded null character in path". Use ZipFile.writestr(), which takes an entry name plus the data itself. Also note that upload_blob(zip_object, ...) uploads the ZipFile object rather than the archive's bytes, which is why the uploaded blob comes back corrupt when downloaded.
The code below downloads the blob, writes the bytes into the archive with writestr(), closes the zip, and uploads the finished file:
from azure.storage.blob import BlobServiceClient
from zipfile import ZipFile

# connection to blob storage via Azure Python SDK
connection_string = "<YOUR_CONNECTION_STRING>"
blob_service_client = BlobServiceClient.from_connection_string(connection_string)

# get container client
input_container = blob_service_client.get_container_client(container="input")

filename = "document_to_zip.pdf"
zip_filename = "document_zipped.zip"

# download the blob's bytes
data = input_container.download_blob(filename).readall()

# writestr() accepts the entry name and the bytes; write() would expect a file path
with ZipFile(zip_filename, "w") as zip_object:
    zip_object.writestr(filename, data)

# upload the finished .zip to the results container
results_blob = blob_service_client.get_blob_client(container="output", blob=zip_filename)
with open(zip_filename, "rb") as zf:
    results_blob.upload_blob(zf, overwrite=True)
Meanwhile, you can copy a group of files by looping over the input container. Note that "output" + "/" + "ZipFolder.zip" only places the blobs under a virtual folder named ZipFolder.zip inside the output container; it does not produce a real zip archive.
from azure.storage.blob import BlobServiceClient

connection_string = "<YOUR_CONNECTION_STRING>"
blob_service_client = BlobServiceClient.from_connection_string(connection_string)
input_container = blob_service_client.get_container_client(container="input")

generator = input_container.list_blobs()
for blob in generator:
    data = input_container.download_blob(blob.name).readall()
    results_blob = blob_service_client.get_blob_client(container="output", blob="ZipFolder.zip/" + blob.name)
    results_blob.upload_blob(data, overwrite=True)
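If the goal is one real .zip archive containing every blob rather than a virtual folder, the archive can be built in memory and uploaded as a single blob. A minimal sketch, assuming the same "input"/"output" containers; build_zip and zip_container_to_blob are hypothetical helper names, not part of the Azure SDK:

```python
import io
from zipfile import ZipFile

def build_zip(entries):
    """Pack a {name: bytes} mapping into an in-memory .zip and return its bytes."""
    buffer = io.BytesIO()
    with ZipFile(buffer, "w") as archive:
        for name, data in entries.items():
            # writestr() takes the entry name and its content, so nothing touches disk
            archive.writestr(name, data)
    return buffer.getvalue()

def zip_container_to_blob(input_container, results_blob_client):
    """Download every blob in input_container, zip them in memory, and upload
    the archive via results_blob_client (an azure.storage.blob BlobClient)."""
    entries = {blob.name: input_container.download_blob(blob.name).readall()
               for blob in input_container.list_blobs()}
    results_blob_client.upload_blob(build_zip(entries), overwrite=True)

# usage sketch:
# zip_container_to_blob(
#     input_container,
#     blob_service_client.get_blob_client(container="output", blob="ZipFolder.zip"))
```

Keeping the archive in a BytesIO buffer avoids writing a temporary .zip to local disk, which also sidesteps the path issues from the original error.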

Related

Azure blob storage trying to upload the same file to different folders

I am trying to upload multiple files named 'data' to Azure Blob Storage. I upload them into different folders in my function, creating a new folder for each new 'data' file, but I still get the error: BlobAlreadyExists - The specified blob already exists. Any ideas?
blob_service_client = BlobServiceClient.from_connection_string(connection_string)
blob_client = blob_service_client.get_blob_client(container=container_name+'\\'+id+'\\'+uploadnr, blob=filename)
with open(pathF+filename, "rb") as data:
    blob_client.upload_blob(data)
    print(f"Uploaded {filename}.")
Initially I tried this in my environment and reproduced the same error:
Error: BlobAlreadyExists - The specified blob already exists.
Code:
from azure.storage.blob import BlobServiceClient

connection_string = "<Connect_string>"
container = "test\data"
filename = "sample1.pdf"
blob_service_client = BlobServiceClient.from_connection_string(connection_string)
blob_client = blob_service_client.get_blob_client(container, filename)
with open(path + filename, "rb") as data:
    blob_client.upload_blob(data)
    print(f"Uploaded {filename}.")
After I made changes in the code it executed successfully. (Container names cannot contain '\' or '/'; the fix is to use a valid container name such as container = "test" and move the folder into the blob name, e.g. blob = "data/sample1.pdf".)
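The split described above can be sketched as a small helper; the helper name and the combined-path input format are my own, not from the SDK:

```python
def split_blob_path(path):
    """Split a combined 'container\\folder\\file' or 'container/folder/file'
    path into (container, blob_name). Azure container names may only contain
    lowercase letters, digits, and hyphens, so the virtual folder structure
    has to live in the blob name, separated by '/'."""
    normalized = path.replace("\\", "/").strip("/")
    container, _, blob_name = normalized.partition("/")
    return container, blob_name
```

With this, get_blob_client(container=container, blob=blob_name) gets a valid container name, and each upload under a different virtual folder produces a distinct blob name, avoiding BlobAlreadyExists.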

Python - List all the files and blob inside an Azure Storage Container

This is my first post here on StackOverflow, hope it respects the guideline of this community.
I'm trying to accomplish a simple task in Python because even though I'm really new to it, I found it very easy to use.
I have a storage account on Azure, with a lot of containers inside.
Each container contains some random files and/or blobs.
What I'm trying to do, is to get the name of all these files and/or blob and put it on a file.
For now, I got here:
import os, uuid
import sys
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, __version__
connection_string = "my_connection_string"
blob_svc = BlobServiceClient.from_connection_string(conn_str=connection_string)
try:
    print("Azure Blob Storage v" + __version__ + " - Python quickstart sample")
    print("\nListing blobs...")
    containers = blob_svc.list_containers()
    list_of_blobs = []
    for c in containers:
        container_client = blob_svc.get_container_client(c)
        blob_list = container_client.list_blobs()
        for blob in blob_list:
            list_of_blobs.append(blob.name)
    file_path = 'C:/my/path/to/file/randomfile.txt'
    sys.stdout = open(file_path, "w")
    print(list_of_blobs)
except Exception as ex:
    print('Exception:')
    print(ex)
But I'm having 3 problems:
1. I'm getting <name_of_the_blob>/<name_of_the_file_inside>: I would like to have just the name of the file inside the blob.
2. If a container holds a blob (or more than one blob) plus a random file, this script prints only the name of the blob + the name of the file inside, skipping the other files outside the blobs.
3. I would like to put all the names of the blobs/files in a .csv file.
I'm not sure how to do point 3, or how to resolve points 1 and 2.
Could someone maybe help with this?
Thanks!
Edit:
I'm adding an image here just to clarify a little what I mean when I talk about blob/files
Just to clarify: there are no two separate things such as "files" and "blobs" in Blob Storage; the files inside Blob Storage are called blobs. Below is the hierarchy you can observe in blob storage:
Blob Storage > Containers > Directories/Virtual Folders > Blobs
I'm getting the <name_of_ the_blob>/<name_of_the_file_inside>: I would like to have just the name of the file inside the blob
for this, you can iterate through your container using list_blobs(<Container_Name>), taking only the names of the blobs, i.e., blob.name. Here is how the code goes when you are trying to list all the blob names inside a single container:
generator = blob_service.list_blobs(CONTAINER_NAME)
for blob in generator:
    print("\t Blob name: " + blob.name)
If in a container there is a blob (or more than 1 blob) + a random file, this script prints only the name of the blob + the name of the file inside, skipping the other files outside the blobs.
you can iterate over the containers using list_containers(), then use list_blobs(<Container_Name>) to iterate over the blob names, and finally write the blob names to a local file.
I would like to put all the names of the blobs/files in a .csv file.
A simple with open('<filename>.csv', 'w') as f write will do it. Below is the sample code:
with open('BlobsNames.csv', 'w') as f:
    f.write(<statements>)
Here is the complete sample code that worked for us, where each blob from every folder will be listed. (Note that BlockBlobService comes from the legacy azure-storage SDK, not the current v12 azure-storage-blob package.)
import os
from azure.storage.blob import BlockBlobService  # legacy (pre-v12) SDK

ACCOUNT_NAME = "<ACCOUNT_NAME>"
SAS_TOKEN = '<YOUR_SAS_TOKEN>'
blob_service = BlockBlobService(account_name=ACCOUNT_NAME, account_key=None, sas_token=SAS_TOKEN)

print("\nList blobs in the container")
with open('BlobsNames.txt', 'w') as f:
    containers = blob_service.list_containers()
    for c in containers:
        generator = blob_service.list_blobs(c.name)
        for blob in generator:
            print("\t Blob name: " + c.name + '/' + blob.name)
            f.write(c.name + '/' + blob.name)
            f.write('\n')
This works even when there are folders in containers.
NOTE: You can just remove c.name while printing the blob to file if your requirement is to just pull out the blob names.
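For points 1 and 3 specifically (keeping only the file's name without its virtual-folder prefix, and writing a proper CSV), a sketch using the standard csv module; leaf_name and write_blob_csv are hypothetical helper names:

```python
import csv

def leaf_name(blob_name):
    """Return only the final segment of a blob name,
    e.g. 'folder/sub/file.txt' -> 'file.txt'."""
    return blob_name.rsplit("/", 1)[-1]

def write_blob_csv(rows, fileobj):
    """Write (container, blob_name) pairs to an open file as CSV, one blob per row."""
    writer = csv.writer(fileobj)
    writer.writerow(["container", "blob"])
    for container, name in rows:
        writer.writerow([container, name])

# usage sketch: collect (c.name, blob.name) pairs while looping, then
# with open('BlobsNames.csv', 'w', newline='') as f:
#     write_blob_csv(pairs, f)
```

Using csv.writer instead of raw f.write() handles quoting automatically if a blob name ever contains a comma.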
Thanks all for your replies.
In the end, I took what SwethaKandikonda-MT wrote and changed it a little to fit the connection problem I had.
Here is what I came up with:
import os, uuid
import sys
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, __version__
import csv

connection_string = "my_account_storage_connection_string"
blob_svc = BlobServiceClient.from_connection_string(conn_str=connection_string)
list_of_blobs = []
print("\nList blobs in the container")
with open('My_path/to/the/file.csv', 'w') as f:
    containers = blob_svc.list_containers()
    for c in containers:
        container_client = blob_svc.get_container_client(c.name)
        blob_list = container_client.list_blobs()
        for blob in blob_list:
            print("\t Blob name: " + c.name + '/' + blob.name)  # this will print on the console
            f.write('/' + blob.name)  # this will write just the blob name to the csv file
            f.write('\n')

Download a picture from a blob using python ( azure Blob storage)

I want to download an image from a blob that is in a container.
I searched and only found how to download a whole container, but as I said, I don't want to download the whole container, just a single image.
(container/blob/image.png)
This is the code that I found (it downloads the whole container):
import os
from azure.storage.blob import BlobServiceClient, BlobClient
from azure.storage.blob import ContentSettings, ContainerClient

# IMPORTANT: Replace connection string with your storage account connection string
# Usually starts with DefaultEndpointsProtocol=https;...
MY_CONNECTION_STRING = "CONNECTION_STRING"
# Replace with blob container
MY_BLOB_CONTAINER = "name"
# Replace with the local folder where you want files to be downloaded
LOCAL_BLOB_PATH = "Blobsss"
BLOBNAME = "test"

class AzureBlobFileDownloader:
    def __init__(self):
        print("Initializing AzureBlobFileDownloader")
        # Initialize the connection to Azure storage account
        self.blob_service_client = BlobServiceClient.from_connection_string(MY_CONNECTION_STRING)
        self.my_container = self.blob_service_client.get_container_client(MY_BLOB_CONTAINER)

    def save_blob(self, file_name, file_content):
        # Get full path to the file
        download_file_path = os.path.join(LOCAL_BLOB_PATH, file_name)
        # for nested blobs, create local path as well!
        os.makedirs(os.path.dirname(download_file_path), exist_ok=True)
        with open(download_file_path, "wb") as file:
            file.write(file_content)

    def download_all_blobs_in_container(self):
        my_blobs = self.my_container.list_blobs()
        for blob in my_blobs:
            print(blob.name)
            data = self.my_container.get_blob_client(blob).download_blob().readall()
            self.save_blob(blob.name, data)

# Initialize class and download files
azure_blob_file_downloader = AzureBlobFileDownloader()
azure_blob_file_downloader.download_all_blobs_in_container()
Could you please help me?
Thank you!
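For a single image, one download_blob call is enough; there is no need to list the container. A sketch built around the same ContainerClient the class above uses; the helper name is an assumption:

```python
import os

def download_blob_to_file(container_client, blob_name, local_dir):
    """Download one blob (e.g. 'blob/image.png') from an azure.storage.blob
    ContainerClient into local_dir, creating any nested folders the blob
    name implies, and return the local path."""
    # blob names use '/' separators; rebuild them with the local OS separator
    local_path = os.path.join(local_dir, *blob_name.split("/"))
    os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
    data = container_client.download_blob(blob_name).readall()
    with open(local_path, "wb") as f:
        f.write(data)
    return local_path

# usage sketch (assumes azure-storage-blob v12 is installed):
# from azure.storage.blob import BlobServiceClient
# svc = BlobServiceClient.from_connection_string(MY_CONNECTION_STRING)
# container = svc.get_container_client(MY_BLOB_CONTAINER)
# download_blob_to_file(container, "blob/image.png", LOCAL_BLOB_PATH)
```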

how to download all blob from a container where blob is in sub directory style using python

The below code will download a particular blob, given the blob name:
import constants
import os
import tempfile
from azure.storage.blob import BlobServiceClient

temp_dir = tempfile.TemporaryDirectory()
print(temp_dir.name)
Local_path = os.path.join(temp_dir.name, constants.BLOB_NAME)

class AzureBlob:
    def __init__(self, CONNECTION_STRING, BLOB_CONTAINER,
                 BLOB_PATH, BLOB_NAME):
        # assign the settings first: activate_blob_service() and
        # initialize_container() read them
        self.BLOB_CONTAINER = BLOB_CONTAINER
        self.CONNECTION_STRING = CONNECTION_STRING
        self.BLOB_PATH = BLOB_PATH
        self.BLOB_NAME = BLOB_NAME
        self.blob_service_client = self.activate_blob_service()
        self.container_client = self.initialize_container()

    # Initialize a BlobServiceClient object
    def activate_blob_service(self):
        self.blob_service_client = BlobServiceClient.from_connection_string(self.CONNECTION_STRING)
        return self.blob_service_client

    # Initialize a container client from its name
    def initialize_container(self):
        self.container_client = self.blob_service_client.get_container_client(self.BLOB_CONTAINER)
        return self.container_client

    # Download the blob to a local path
    def download_file(self):
        # note: blob names use '/' separators, so on Windows os.path.join
        # may build a name with '\\' that does not match any blob
        with open(Local_path, 'wb+') as f:
            f.write(self.container_client.download_blob(os.path.join(self.BLOB_PATH, self.BLOB_NAME)).readall())
        return Local_path

a = AzureBlob(constants.CONNECTION_STRING, constants.BLOB_CONTAINER,
              constants.BLOB_PATH, constants.BLOB_NAME)
What I am actually trying to achieve is to download all blobs from a container where the blobs are laid out in a sub-directory style. I will provide the directory path, and I need everything inside that directory to be downloaded.
To achieve the above requirement, you can try the workaround below to download all the files from your container:
# download_blobs.py
# Python program to bulk download blob files from azure storage
# Uses latest python SDK() for Azure blob storage
# Requires python 3.6 or above
import os
from azure.storage.blob import BlobServiceClient, BlobClient
from azure.storage.blob import ContentSettings, ContainerClient

# IMPORTANT: Replace connection string with your storage account connection string
# Usually starts with DefaultEndpointsProtocol=https;...
MY_CONNECTION_STRING = "REPLACE_THIS"
# Replace with blob container
MY_BLOB_CONTAINER = "myimages"
# Replace with the local folder where you want files to be downloaded
LOCAL_BLOB_PATH = "REPLACE_THIS"

class AzureBlobFileDownloader:
    def __init__(self):
        print("Initializing AzureBlobFileDownloader")
        # Initialize the connection to Azure storage account
        self.blob_service_client = BlobServiceClient.from_connection_string(MY_CONNECTION_STRING)
        self.my_container = self.blob_service_client.get_container_client(MY_BLOB_CONTAINER)

    def save_blob(self, file_name, file_content):
        # Get full path to the file
        download_file_path = os.path.join(LOCAL_BLOB_PATH, file_name)
        # for nested blobs, create local path as well!
        os.makedirs(os.path.dirname(download_file_path), exist_ok=True)
        with open(download_file_path, "wb") as file:
            file.write(file_content)

    def download_all_blobs_in_container(self):
        my_blobs = self.my_container.list_blobs()
        for blob in my_blobs:
            print(blob.name)
            data = self.my_container.get_blob_client(blob).download_blob().readall()
            self.save_blob(blob.name, data)

# Initialize class and download files
azure_blob_file_downloader = AzureBlobFileDownloader()
azure_blob_file_downloader.download_all_blobs_in_container()
For more information, please refer to this blog post & SO thread.
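If only one virtual sub-directory is needed rather than the whole container, azure-storage-blob v12 lets the service filter by prefix via list_blobs(name_starts_with=...). A sketch; the helper name is my own:

```python
import os

def download_prefix(container_client, prefix, local_dir):
    """Download every blob whose name starts with `prefix` (a virtual
    sub-directory), recreating the folder layout under local_dir.
    container_client is an azure.storage.blob ContainerClient;
    list_blobs(name_starts_with=...) filters server-side by prefix."""
    downloaded = []
    for blob in container_client.list_blobs(name_starts_with=prefix):
        # blob names use '/' separators; rebuild with the local OS separator
        local_path = os.path.join(local_dir, *blob.name.split("/"))
        os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
        with open(local_path, "wb") as f:
            f.write(container_client.download_blob(blob.name).readall())
        downloaded.append(local_path)
    return downloaded

# usage sketch:
# container = BlobServiceClient.from_connection_string(MY_CONNECTION_STRING) \
#     .get_container_client(MY_BLOB_CONTAINER)
# download_prefix(container, "some/subdir/", LOCAL_BLOB_PATH)
```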

Azure Storage Blob and Python IndexError

I'm pretty new to Python and fairly stupid. I'm working on a POC to upload blobs to Azure Blob Storage with a blob SAS URL. Below is my code. When I run it, I get the following error:
container_name, blob_name = unquote(path_blob[-2]), unquote(path_blob[-1])
IndexError: list index out of range
Code as it currently stands:
import os
import yaml
from azure.storage.blob import BlobClient

'''
Importing the configs from yaml.
This method requires the use of a blob SAS URL or token.
Create config.yaml in the same path as bluppy.py with the following;
if 'account_url' contains the token or shared_access_key, you don't need to add it to the yaml.
---
source_folder: "./blobs"
account_url: "<ProperlyFormattedBlobSaSURLwithcontainerandcredentialincluded>"
container_name: "<container_name>"
'''

# Import configs from yaml
def bluppy_cfg():
    cfg_root = os.path.dirname(os.path.abspath(__file__))
    with open(cfg_root + "/config.yaml", "r") as yamlfile:
        return yaml.load(yamlfile, Loader=yaml.FullLoader)

# Look in source folder for files to upload to storage
def get_blobs_up(dir):
    with os.scandir(dir) as to_go:
        for thing in to_go:
            if thing.is_file() and not thing.name.startswith('.'):
                yield thing

# Uploads a blob to an Azure Blob Storage container via blob SAS URL
def blob_upload(blob_url, container_name, blob_name):
    blob_client = BlobClient.from_blob_url(blob_url, container_name, blob_name)
    print("Bluppy is uploading a blob")
    for file in files:
        azbl_client = blob_client.get_blob_client(file.name)
        with open(file.path, "rb") as data:
            azbl_client.upload_blob(data)
            print(f'{file.name} uploaded to blob storage successfully')

config = bluppy_cfg()
blob_name = get_blobs_up(config["source_folder"])
#print(*blob_name)
blob_upload(config["account_url"], config["container_name"], config["blob_name"])
There are files in the folder; when I print(*blob_name) I see the files/blobs in the folder I'm scanning for upload. I'm not sure what I'm missing and would appreciate any help.
Again, new/stupid coder here, so please be gentle, and thanks in advance for your help!
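The IndexError comes from the SDK trying to parse a container and blob name out of the URL path: an account-only SAS URL has an empty path, so the split produces too few segments. (Also note BlobClient.from_blob_url's second positional parameter is the credential, not a container name.) A small pre-flight check can surface this with a clearer message; split_sas_url is a hypothetical helper of mine:

```python
from urllib.parse import urlparse

def split_sas_url(sas_url):
    """Return (container, blob_prefix) parsed from a SAS URL's path.
    BlobClient.from_blob_url() expects the container (and blob) to appear
    in the URL path; an account-only SAS URL has an empty path, which is
    what triggers the 'list index out of range' inside the SDK."""
    path = urlparse(sas_url).path.strip("/")
    if not path:
        raise ValueError("SAS URL has no container in its path; "
                         "use a container (or blob) SAS URL instead")
    container, _, blob_prefix = path.partition("/")
    return container, blob_prefix
```

With a container SAS URL, it is usually simpler to build a ContainerClient via ContainerClient.from_container_url(sas_url) and call get_blob_client(file.name) on that for each file in the source folder, rather than going through BlobClient.from_blob_url at all.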
