I'm pretty new to Python and fairly stupid. I'm working on a POC to upload blobs to Azure Blob Storage with a blob SAS URL. Below is my code. When I run it, I get the following error:
container_name, blob_name = unquote(path_blob[-2]), unquote(path_blob[-1])
IndexError: list index out of range
Here is the code as it currently stands:
import os
import yaml
from azure.storage.blob import BlobClient

'''
Importing the configs from yaml.
This method requires the use of a blob SAS URL or token.
Create config.yaml in the same path as bluppy.py with the following.
If 'account_url' contains the token or shared_access_key, you don't need to add it to the yaml.
---
source_folder: "./blobs"
account_url: "<ProperlyFormattedBlobSaSURLwithcontainerandcredentialincluded>"
container_name: "<container_name>"
'''

# Import configs from yaml
def bluppy_cfg():
    cfg_root = os.path.dirname(os.path.abspath(__file__))
    with open(cfg_root + "/config.yaml", "r") as yamlfile:
        return yaml.load(yamlfile, Loader=yaml.FullLoader)

# Look in source folder for files to upload to storage
def get_blobs_up(dir):
    with os.scandir(dir) as to_go:
        for thing in to_go:
            if thing.is_file() and not thing.name.startswith('.'):
                yield thing

# Uploads a blob to an Azure Blob Storage container via a blob SAS URL
def blob_upload(blob_url, container_name, blob_name):
    blob_client = BlobClient.from_blob_url(blob_url, container_name, blob_name)
    print("Bluppy is uploading a blob")
    for file in files:
        azbl_client = blob_client.get_blob_client(file.name)
        with open(file.path, "rb") as data:
            azbl_client.upload_blob(data)
        print(f'{file.name} uploaded to blob storage successfully')

config = bluppy_cfg()
blob_name = get_blobs_up(config["source_folder"])
#print(*blob_name)
blob_upload(config["account_url"], config["container_name"], config["blob_name"])
There are files in the folder. When I print(*blob_name) I see the files/blobs in the folder I'm scanning for upload. I am not sure what I am missing and would appreciate any help.
Again, new/stupid coder here, so please be gentle and thanks in advance for your help!
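Not a definitive answer, but reading the traceback against the azure-storage-blob source suggests what is happening: BlobClient.from_blob_url() expects a full blob URL whose path already contains /<container>/<blob>; its second positional parameter is a credential, not a container name. Handing it an account- or container-level SAS URL leaves too few path segments, so the internal unquote(path_blob[-2]) lookup raises IndexError. Two other issues would bite next: files is never defined inside blob_upload, and config has no blob_name key. Below is a minimal sketch of one way to restructure it, assuming account_url is a container SAS URL (a container-scoped URL with the token appended):

from azure.storage.blob import ContainerClient

def blob_upload(container_sas_url, files):
    # One client for the whole container; per-file blob names come from the filenames.
    # ContainerClient.from_container_url() accepts a container SAS URL directly,
    # unlike BlobClient.from_blob_url(), which needs the blob name in the path.
    container_client = ContainerClient.from_container_url(container_sas_url)
    for file in files:
        with open(file.path, "rb") as data:
            container_client.upload_blob(name=file.name, data=data)
        print(f"{file.name} uploaded to blob storage successfully")

config = bluppy_cfg()
blob_upload(config["account_url"], get_blobs_up(config["source_folder"]))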
Related
I am trying to upload multiple files with the name 'data' to the Azure blob. I am uploading them to different folders in my function, and I create a new one for each new 'data' file, but I still get the error BlobAlreadyExists: The specified blob already exists. Any ideas?
blob_service_client = BlobServiceClient.from_connection_string(connection_string)
blob_client = blob_service_client.get_blob_client(container=container_name+'\\'+id+'\\'+uploadnr, blob=filename)
with open(pathF+filename, "rb") as data:
    blob_client.upload_blob(data)
print(f"Uploaded {filename}.")
I tried this in my environment and initially reproduced the same error:
BlobAlreadyExists: The specified blob already exists.
Code:
from azure.storage.blob import BlobServiceClient

connection_string = "<Connect_string>"
container = "test\data"   # '\' in a container name is invalid -- containers cannot nest
filename = "sample1.pdf"
blob_service_client = BlobServiceClient.from_connection_string(connection_string)
blob_client = blob_service_client.get_blob_client(container, filename)
with open(path + filename, "rb") as data:
    blob_client.upload_blob(data)
print(f"Uploaded {filename}.")
The console output (a screenshot in the original) showed the same BlobAlreadyExists error. After I made changes to the code it executed successfully.
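The changed code isn't shown in the original answer (it was a screenshot), so here is a hedged sketch of the likely fix: container names cannot contain backslashes or nest, so the "folder" part belongs in the blob name, and overwrite=True lets re-runs replace an existing blob instead of raising BlobAlreadyExists. Variable names are carried over from the question.

from azure.storage.blob import BlobServiceClient

blob_service_client = BlobServiceClient.from_connection_string(connection_string)
blob_client = blob_service_client.get_blob_client(
    container=container_name,              # flat, valid container name -- no '\'
    blob=f"{id}/{uploadnr}/{filename}",    # virtual folders go inside the blob name
)
with open(pathF + filename, "rb") as data:
    # overwrite=True avoids BlobAlreadyExists when the same name is uploaded again
    blob_client.upload_blob(data, overwrite=True)
print(f"Uploaded {filename}.")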
This is my first post here on Stack Overflow; I hope it respects the guidelines of this community.
I'm trying to accomplish a simple task in Python because even though I'm really new to it, I found it very easy to use.
I have a storage account on Azure, with a lot of containers inside.
Each container contains some random files and/or blobs.
What I'm trying to do is get the names of all these files and/or blobs and put them in a file.
For now, I got here:
import os, uuid
import sys
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, __version__

connection_string = "my_connection_string"
blob_svc = BlobServiceClient.from_connection_string(conn_str=connection_string)
try:
    print("Azure Blob Storage v" + __version__ + " - Python quickstart sample")
    print("\nListing blobs...")
    containers = blob_svc.list_containers()
    list_of_blobs = []
    for c in containers:
        container_client = blob_svc.get_container_client(c)
        blob_list = container_client.list_blobs()
        for blob in blob_list:
            list_of_blobs.append(blob.name)
    file_path = 'C:/my/path/to/file/randomfile.txt'
    sys.stdout = open(file_path, "w")
    print(list_of_blobs)
except Exception as ex:
    print('Exception:')
    print(ex)
But I'm having 3 problems:
1. I'm getting <name_of_the_blob>/<name_of_the_file_inside>; I would like to have just the name of the file inside the blob.
2. If in a container there is a blob (or more than one blob) plus a random file, this script prints only the name of the blob + the name of the file inside, skipping the other files outside the blobs.
3. I would like to put all the names of the blobs/files in a .csv file.
I'm not sure how to do point 3, or how to resolve points 1 and 2.
Could someone maybe help with this?
Thanks!
Edit:
I'm adding an image here just to clarify a little what I mean when I talk about blobs/files.
Just to clarify: there are no two separate things such as "files" and "blobs" in Blob Storage; the files inside Blob Storage are called blobs. Below is the hierarchy that you can observe in blob storage.
Blob Storage > Containers > Directories/Virtual Folders > Blobs
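To make the "Directories/Virtual Folders" level concrete, here is a tiny illustration (the SAS URL and all names are invented placeholders): a single upload whose blob name contains slashes implicitly creates every folder level you see in the portal, because folders are just prefixes inside blob names.

from azure.storage.blob import ContainerClient

# Illustration only -- URL and names are invented placeholders.
container_client = ContainerClient.from_container_url("<container_sas_url>")

# One call implicitly creates the virtual folders reports/ and reports/2021/,
# since they are nothing more than prefixes inside the blob name.
container_client.upload_blob(name="reports/2021/summary.pdf", data=b"example bytes")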
I'm getting the <name_of_the_blob>/<name_of_the_file_inside>: I would like to have just the name of the file inside the blob
For this, you can iterate through your container using list_blobs(<container_name>), taking only the names of the blobs, i.e., blob.name. Here is how the code goes when you want to list all the blob names inside a single container:
generator = blob_service.list_blobs(CONTAINER_NAME)
for blob in generator:
    print("\t Blob name: " + blob.name)
If in a container there is a blob (or more than 1 blob) + a random file, this script prints only the name of the blob + the name of the file inside, skipping the other files outside the blobs.
you can iterate over the containers using list_containers(), then use list_blobs(<container_name>) to iterate over the blob names, and finally write the blob names to a local file.
I would like to put all the names of the blobs/files in a .csv file.
A simple with open('<filename>.csv', 'w') as f followed by a write does it. Below is the sample pattern:
with open('BlobsNames.csv', 'w') as f:
    f.write(<statements>)
Here is the complete sample code that worked for us, where every blob from every folder is listed:
import os
from azure.storage.blob import BlockBlobService  # legacy azure-storage-blob v2.x SDK

ACCOUNT_NAME = "<ACCOUNT_NAME>"
SAS_TOKEN = '<YOUR_SAS_TOKEN>'

blob_service = BlockBlobService(account_name=ACCOUNT_NAME, account_key=None, sas_token=SAS_TOKEN)

print("\nList blobs in the container")
with open('BlobsNames.txt', 'w') as f:
    containers = blob_service.list_containers()
    for c in containers:
        generator = blob_service.list_blobs(c.name)
        for blob in generator:
            print("\t Blob name: " + c.name + '/' + blob.name)
            f.write(c.name + '/' + blob.name)
            f.write('\n')
This works even when there are folders in containers.
NOTE: You can just remove c.name when writing the blob to the file if your requirement is only to pull out the blob names.
Thanks all for your replies.
In the end, I took what SwethaKandikonda-MT wrote and changed it a little to fit the connection problem that I had.
Here is what I came up with:
import os, uuid
import sys
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, __version__
import csv

connection_string = "my_account_storage_connection_string"
blob_svc = BlobServiceClient.from_connection_string(conn_str=connection_string)
list_of_blobs = []
print("\nList blobs in the container")
with open('My_path/to/the/file.csv', 'w') as f:
    containers = blob_svc.list_containers()
    for c in containers:
        container_client = blob_svc.get_container_client(c.name)
        blob_list = container_client.list_blobs()
        for blob in blob_list:
            print("\t Blob name: " + c.name + '/' + blob.name)  # this will print on the console
            f.write('/' + blob.name)  # this will write just the blob name to the csv file
            f.write('\n')
I want to download an image from a blob that is in a container.
I searched and only found how to download a whole container; as I said, I do not want to download the whole container or the whole blob, just a single image.
(container/blob/image.png)
This is the code that I found (it downloads the whole container):
import os
from azure.storage.blob import BlobServiceClient, BlobClient
from azure.storage.blob import ContentSettings, ContainerClient

# IMPORTANT: Replace connection string with your storage account connection string
# Usually starts with DefaultEndpointsProtocol=https;...
MY_CONNECTION_STRING = "CONNECTION_STRING"

# Replace with blob container
MY_BLOB_CONTAINER = "name"

# Replace with the local folder where you want files to be downloaded
LOCAL_BLOB_PATH = "Blobsss"
BLOBNAME = "test"

class AzureBlobFileDownloader:
    def __init__(self):
        print("Initializing AzureBlobFileDownloader")
        # Initialize the connection to Azure storage account
        self.blob_service_client = BlobServiceClient.from_connection_string(MY_CONNECTION_STRING)
        self.my_container = self.blob_service_client.get_container_client(MY_BLOB_CONTAINER)

    def save_blob(self, file_name, file_content):
        # Get full path to the file
        download_file_path = os.path.join(LOCAL_BLOB_PATH, file_name)
        # for nested blobs, create local path as well!
        os.makedirs(os.path.dirname(download_file_path), exist_ok=True)
        with open(download_file_path, "wb") as file:
            file.write(file_content)

    def download_all_blobs_in_container(self):
        my_blobs = self.my_container.list_blobs()
        for blob in my_blobs:
            print(blob.name)
            bytes = self.my_container.get_blob_client(blob).download_blob().readall()
            self.save_blob(blob.name, bytes)

# Initialize class and download files
azure_blob_file_downloader = AzureBlobFileDownloader()
azure_blob_file_downloader.download_all_blobs_in_container()
Could you please help me?
Thank you!
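No answer is attached to this question in the thread, but here is a minimal sketch of the single-blob case, reusing the constants from the code above; the blob name "blob/image.png" is an assumption based on the container/blob/image.png layout mentioned in the question.

import os
from azure.storage.blob import BlobServiceClient

blob_service_client = BlobServiceClient.from_connection_string(MY_CONNECTION_STRING)
# Ask for exactly one blob; the virtual folder is part of the blob name.
blob_client = blob_service_client.get_blob_client(
    container=MY_BLOB_CONTAINER, blob="blob/image.png")

download_path = os.path.join(LOCAL_BLOB_PATH, "image.png")
os.makedirs(os.path.dirname(download_path), exist_ok=True)
with open(download_path, "wb") as file:
    # download_blob() streams only this one blob, not the whole container
    file.write(blob_client.download_blob().readall())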
I was reading the Python documentation for Google Cloud Storage and was successfully able to create a method that uploads files; however, I am not able to find a way to download files using a blob's URL. I was able to download a file using its filename, but that's not practical since users could upload files with the same name. The blob is private. I have access to the blob's URL, so I was wondering if there is a way to download files using this link.
This is my upload code which works perfectly:
from google.cloud import storage

def upload_blob(bucket_name, filename, file_obj):
    if filename and file_obj:
        storage_client = storage.Client()
        bucket = storage_client.bucket('example-storage-bucket')
        blob = bucket.blob(filename)
        blob.upload_from_file(file_obj)  # binary file data
        form_logger.info('File {} uploaded'.format(filename))  # application logger, defined elsewhere
        return blob
This code downloads the file, but I could only figure it out using the blob name, not the URL:
def download_blob(bucket_name, url):
    if url:
        storage_client = storage.Client()
        bucket = storage_client.bucket('example-storage-bucket')
        blob = bucket.blob(url)
        blob.download_to_filename("example.pdf")
Any suggestions or thoughts on how to download the file using the blob's media link URL?
For example, bucket example-storage-bucket has file folder/example.pdf and its
Link URL is https://storage.cloud.google.com/example-storage-bucket/folder/example.pdf and
URI is gs://example-storage-bucket/folder/example.pdf
Use the function below to download a blob using its GCS link URL (if you are using Python 3.x):
import os
from urllib.parse import urlparse
from google.cloud import storage

def decode_gcs_url(url):
    # Split "https://storage.cloud.google.com/<bucket>/<path>" into bucket and object path
    p = urlparse(url)
    path = p.path[1:].split('/', 1)
    bucket, file_path = path[0], path[1]
    return bucket, file_path

def download_blob(url):
    if url:
        storage_client = storage.Client()
        bucket, file_path = decode_gcs_url(url)
        bucket = storage_client.bucket(bucket)
        blob = bucket.blob(file_path)
        blob.download_to_filename(os.path.basename(file_path))
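For instance, with the Link URL from the question, the helper splits out the bucket and object path (a quick sanity check, assuming the two functions above are in scope):

# Hypothetical usage with the example URL from the question:
bucket, file_path = decode_gcs_url(
    "https://storage.cloud.google.com/example-storage-bucket/folder/example.pdf")
print(bucket, file_path)  # -> example-storage-bucket folder/example.pdf

# Downloads the blob and saves it locally as example.pdf
download_blob("https://storage.cloud.google.com/example-storage-bucket/folder/example.pdf")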
I think what you're saying is that you want to download the blob to a file whose name is based on the blob name, correct? If so, the blob's name is available as blob.name, and you can pick a filename based on that.
I am using AWS SageMaker and trying to upload a data folder into S3 from SageMaker. What I am trying to do is upload my data into the s3_train_data directory (the directory exists in S3). However, it doesn't upload into that bucket; it uploads into a default bucket that was created, and in turn creates a new folder path from the s3_train_data variable.
Code to upload into the directory:
import os
import sagemaker
from sagemaker import get_execution_role

sagemaker_session = sagemaker.Session()
role = get_execution_role()

bucket = '<bucket name>'
prefix = '<folders1/folders2>'
key = '<input>'

s3_train_data = 's3://{}/{}/{}/'.format(bucket, prefix, key)

# path 'data' is the folder in the Jupyter instance, contains all the training data
inputs = sagemaker_session.upload_data(path='data', key_prefix=s3_train_data)
Is the problem in the code or more in how I created the notebook?
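Not part of the original thread, but one plausible explanation: Session.upload_data() takes the bucket and a key prefix as separate arguments, and key_prefix is a plain key path, not an s3:// URI. Passing the full URI makes the URI itself become a literal folder name under the session's default bucket, which matches the behaviour described above. A hedged sketch of the fix:

import sagemaker

sagemaker_session = sagemaker.Session()
inputs = sagemaker_session.upload_data(
    path='data',                              # local folder in the notebook instance
    bucket=bucket,                            # target bucket, overriding the default one
    key_prefix='{}/{}'.format(prefix, key))   # e.g. folders1/folders2/input
print(inputs)                                 # s3://<bucket name>/<folders1/folders2>/<input>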
You could look at the sample notebooks for how to upload data to an S3 bucket.
There are many ways; I am just giving you hints toward an answer.
Also, you forgot to create a boto3 session to access the S3 bucket.
Here is one way to do it:
import os
import urllib.request
import boto3

def download(url):
    filename = url.split("/")[-1]
    if not os.path.exists(filename):
        urllib.request.urlretrieve(url, filename)

def upload_to_s3(channel, file):
    s3 = boto3.resource('s3')
    data = open(file, "rb")
    key = channel + '/' + file
    # 'bucket' must be defined beforehand, e.g. bucket = '<your_bucket_name>'
    s3.Bucket(bucket).put_object(Key=key, Body=data)

# caltech-256
download('http://data.mxnet.io/data/caltech-256/caltech-256-60-train.rec')
upload_to_s3('train', 'caltech-256-60-train.rec')
download('http://data.mxnet.io/data/caltech-256/caltech-256-60-val.rec')
upload_to_s3('validation', 'caltech-256-60-val.rec')
link : https://buildcustom.notebook.us-east-2.sagemaker.aws/notebooks/sample-notebooks/introduction_to_amazon_algorithms/imageclassification_caltech/Image-classification-fulltraining.ipynb
Another way to do it:
import io
import os
import boto3
import sagemaker.amazon.common as smac  # provides write_numpy_to_dense_tensor

bucket = '<your_s3_bucket_name_here>'  # enter your s3 bucket where you will copy data and model artifacts
prefix = 'sagemaker/breast_cancer_prediction'  # place to upload training files within the bucket

# do some processing then prepare to push the data
# (train_X, train_y and train_file come from the preceding notebook cells)
f = io.BytesIO()
smac.write_numpy_to_dense_tensor(f, train_X.astype('float32'), train_y.astype('float32'))
f.seek(0)
boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'train', train_file)).upload_fileobj(f)
Link : https://buildcustom.notebook.us-east-2.sagemaker.aws/notebooks/sample-notebooks/introduction_to_applying_machine_learning/breast_cancer_prediction/Breast%20Cancer%20Prediction.ipynb
YouTube link: https://www.youtube.com/watch?v=-YiHPIGyFGo - how to pull data into an S3 bucket.