I was reading the Python documentation for Google Cloud Storage and was able to write a method that uploads files; however, I cannot find a way to download files using a blob's URL. I can download a file by its name, but that is not practical since users could upload files with the same name. The blob is private. I have access to the blob's URL, so I was wondering whether there is a way to download files using this link.
This is my upload code which works perfectly:
from google.cloud import storage

def upload_blob(bucket_name, filename, file_obj):
    if filename and file_obj:
        storage_client = storage.Client()
        bucket = storage_client.bucket('example-storage-bucket')
        blob = bucket.blob(filename)
        blob.upload_from_file(file_obj)  # binary file data
        form_logger.info('File {} uploaded'.format(filename))  # form_logger is the module's logger (defined elsewhere)
        return blob
This code downloads the file, but I could only figure it out with the blob name, not the URL:
def download_blob(bucket_name, url):
    if url:
        storage_client = storage.Client()
        bucket = storage_client.bucket('example-storage-bucket')
        blob = bucket.blob(url)
        blob.download_to_filename("example.pdf")
Any suggestions or thoughts on how to download the file using the blob's media link URL?
For example, bucket example-storage-bucket has file folder/example.pdf and its
Link URL is https://storage.cloud.google.com/example-storage-bucket/folder/example.pdf and
URI is gs://example-storage-bucket/folder/example.pdf
Use the function below to download a blob using its GCS link URL (if you are using Python 3.x):
import os
from urllib.parse import urlparse
from google.cloud import storage

def decode_gcs_url(url):
    # https://storage.cloud.google.com/<bucket>/<path> -> ("<bucket>", "<path>")
    p = urlparse(url)
    path = p.path[1:].split('/', 1)
    bucket, file_path = path[0], path[1]
    return bucket, file_path

def download_blob(url):
    if url:
        storage_client = storage.Client()
        bucket, file_path = decode_gcs_url(url)
        bucket = storage_client.bucket(bucket)
        blob = bucket.blob(file_path)
        blob.download_to_filename(os.path.basename(file_path))
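For instance, with the link URL from the question, the call would be (a sketch; it saves the object next to the script as example.pdf):

download_blob("https://storage.cloud.google.com/example-storage-bucket/folder/example.pdf")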
I think what you're saying is that you want to download the blob to a file whose name is based on the blob name, correct? If so, you can read the object's name from the blob itself (blob.name) and then pick a local filename based on it.
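A minimal sketch of that idea, assuming you already have the blob object:

import os

local_name = os.path.basename(blob.name)   # e.g. "example.pdf" for "folder/example.pdf"
blob.download_to_filename(local_name)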
As the topic indicates...
I have tried two ways and neither of them works:
First:
I want to programmatically talk to GCS in Python, such as reading gs://{bucketname}/{blobname} as a path or a file. The only thing I can find is the gsutil module, but it seems to be a command-line tool rather than something I can use inside a Python application.
I found some code here (Accessing data in google cloud bucket), but I am still confused about how to retrieve it as the type I need. There is a jpg file in the bucket, and I want to download it for text detection; this will be deployed on a Google Cloud Function.
Second:
The download_as_bytes() method (link to the Blob documentation). I import the google.cloud.storage module and provide the GCP key, but an error is raised saying that Blob has no attribute download_as_bytes().
Is there anything else I haven't tried? Thank you!
For reference:
def text_detected(user_id):
    bucket = storage_client.bucket('img_platecapture')
    blob = bucket.blob({user_id})
    content = blob.download_as_bytes()
    image = vision.Image(content=content)  # insert a content
    response = vision_client.text_detection(image=image)
    if response.error.message:
        raise Exception(
            '{}\nFor more info on error messages, check: '
            'https://cloud.google.com/apis/design/errors'.format(
                response.error.message))
    img = Image.open(input_file)  # insert a path
    draw = ImageDraw.Draw(img)
    font = ImageFont.truetype("simsun.ttc", 18)
    for text in response.text_annotations[1::]:
        ocr = text.description
        # note: 'bound' is not defined in this snippet as posted (presumably text.bounding_poly)
        draw.text((bound.vertices[0].x-25, bound.vertices[0].y-25), ocr, fill=(255, 0, 0), font=font)
        draw.polygon(
            [
                bound.vertices[0].x,
                bound.vertices[0].y,
                bound.vertices[1].x,
                bound.vertices[1].y,
                bound.vertices[2].x,
                bound.vertices[2].y,
                bound.vertices[3].x,
                bound.vertices[3].y,
            ],
            None,
            'yellow',
        )
    texts = response.text_annotations
    a = str(texts[0].description.split())
    b = re.sub(u"([^\u4e00-\u9fa5\u0030-\u0039])", "", a)
    b1 = "".join(b)
    print("偵測到的地址為:", b1)  # "The detected address is:"
    return b1
@handler.add(MessageEvent, message=ImageMessage)
def handle_content_message(event):
    message_content = line_bot_api.get_message_content(event.message.id)
    user = line_bot_api.get_profile(event.source.user_id)
    data = b''
    for chunk in message_content.iter_content():
        data += chunk
    global bucket_name
    bucket_name = 'img_platecapture'
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(f'{user.user_id}.jpg')
    blob.upload_from_string(data)
    text_detected1 = text_detected(user.user_id)  #### Here's the problem
    line_bot_api.reply_message(
        event.reply_token,
        messages=TextSendMessage(
            text=text_detected1
        ))
Reference code (gcsfs/fsspec):
gcs = gcsfs.GCSFileSystem()
bucket = storage_client.bucket('img_platecapture')
blob = bucket.blob({user_id})
f = fsspec.open("gs://img_platecapture/{user_id}")
with f.open({user_id}, "rb") as fp:
    content = fp.read()
    image = vision.Image(content=content)
    response = vision_client.text_detection(image=image)
You can do that with the Cloud Storage Python client:
from google.cloud import storage

def download_blob(bucket_name, source_blob_name, destination_file_name):
    """Downloads a blob from the bucket."""
    # The ID of your GCS bucket
    # bucket_name = "your-bucket-name"
    # The ID of your GCS object
    # source_blob_name = "storage-object-name"
    # The path to which the file should be downloaded
    # destination_file_name = "local/path/to/file"

    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)

    # Construct a client side representation of a blob.
    # Note `Bucket.blob` differs from `Bucket.get_blob` as it doesn't retrieve
    # any content from Google Cloud Storage. As we don't need additional data,
    # using `Bucket.blob` is preferred here.
    blob = bucket.blob(source_blob_name)

    # blob.download_to_filename(destination_file_name)
    # blob.download_as_string()
    blob.download_as_bytes()

    print(
        "Downloaded storage object {} from bucket {} to local file {}.".format(
            source_blob_name, bucket_name, destination_file_name
        )
    )
You can use the following methods:
blob.download_to_filename(destination_file_name)
blob.download_as_string()
blob.download_as_bytes()
To be able to correctly use this library, you have to install the expected pip package in your virtual env.
Example of project structure:
my-project/
    requirements.txt
    your_python_script.py
The requirements.txt file:
google-cloud-storage==2.6.0
Run the following command:
pip install -r requirements.txt
In your case the package may not have been installed correctly in your virtual env, which is why you could not access the download_as_bytes method.
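A quick way to confirm which version is actually importable from the environment that runs your code (a sketch; download_as_bytes only exists in reasonably recent releases of google-cloud-storage):

from google.cloud import storage
print(storage.__version__)   # should match the version pinned in requirements.txt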
I'd be using fsspec's GCS filesystem implementation instead.
https://github.com/fsspec/gcsfs/
>>> import gcsfs
>>> fs = gcsfs.GCSFileSystem(project='my-google-project')
>>> fs.ls('my-bucket')
['my-file.txt']
>>> with fs.open('my-bucket/my-file.txt', 'rb') as f:
... print(f.read())
b'Hello, world'
https://gcsfs.readthedocs.io/en/latest/#examples
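Applied to the bucket from the question, a minimal sketch might look like this (it assumes default credentials, that an object named <user_id>.jpg exists, and that user_id and vision_client are defined as in the question):

import gcsfs
from google.cloud import vision

fs = gcsfs.GCSFileSystem()
with fs.open(f"img_platecapture/{user_id}.jpg", "rb") as f:   # user_id assumed defined
    content = f.read()

image = vision.Image(content=content)
response = vision_client.text_detection(image=image)          # vision_client as in the question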
I want to download an image from a blob that is in a container.
I searched and only found how to download an entire container; as I said, I do not want to download the whole container, just a single image (container/blob/image.png).
This is the code that I found (it downloads the whole container):
import os
from azure.storage.blob import BlobServiceClient, BlobClient
from azure.storage.blob import ContentSettings, ContainerClient

# IMPORTANT: Replace connection string with your storage account connection string
# Usually starts with DefaultEndpointsProtocol=https;...
MY_CONNECTION_STRING = "CONNECTION_STRING"

# Replace with blob container
MY_BLOB_CONTAINER = "name"

# Replace with the local folder where you want files to be downloaded
LOCAL_BLOB_PATH = "Blobsss"
BLOBNAME = "test"

class AzureBlobFileDownloader:
    def __init__(self):
        print("Initializing AzureBlobFileDownloader")

        # Initialize the connection to Azure storage account
        self.blob_service_client = BlobServiceClient.from_connection_string(MY_CONNECTION_STRING)
        self.my_container = self.blob_service_client.get_container_client(MY_BLOB_CONTAINER)

    def save_blob(self, file_name, file_content):
        # Get full path to the file
        download_file_path = os.path.join(LOCAL_BLOB_PATH, file_name)

        # for nested blobs, create local path as well!
        os.makedirs(os.path.dirname(download_file_path), exist_ok=True)

        with open(download_file_path, "wb") as file:
            file.write(file_content)

    def download_all_blobs_in_container(self):
        my_blobs = self.my_container.list_blobs()
        for blob in my_blobs:
            print(blob.name)
            bytes = self.my_container.get_blob_client(blob).download_blob().readall()
            self.save_blob(blob.name, bytes)

# Initialize the class and download all files in the container
azure_blob_file_downloader = AzureBlobFileDownloader()
azure_blob_file_downloader.download_all_blobs_in_container()
Could you please help me? Thank you!
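For reference, a minimal sketch that downloads just one named blob instead of iterating over the whole container (it reuses the connection string and container name from the snippet above; the blob path "blob/image.png" is only an illustrative assumption):

from azure.storage.blob import BlobClient

MY_CONNECTION_STRING = "CONNECTION_STRING"
blob_client = BlobClient.from_connection_string(
    MY_CONNECTION_STRING,
    container_name="name",          # container from the snippet above
    blob_name="blob/image.png",     # assumed path of the single image
)
with open("image.png", "wb") as file:
    file.write(blob_client.download_blob().readall())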
I'm pretty new to Python and fairly stupid. I'm working on a POC to upload blobs to Azure Blob Storage with a blob SAS URL. Below is my code. When I run it, I get the following error:
container_name, blob_name = unquote(path_blob[-2]), unquote(path_blob[-1])
IndexError: list index out of range
Code as it is currently
import os
import yaml
from azure.storage.blob import BlobClient

'''
Importing the configs from yaml.
This method requires the use of a blob SAS URL or token.
Create config.yaml in the same path as bluppy.py with the following
(if 'account_url' contains the token or shared_access_key, you don't need to add it to the yaml):
---
source_folder: "./blobs"
account_url: "<ProperlyFormattedBlobSaSURLwithcontainerandcredentialincluded>"
container_name: "<container_name>"
'''

# Import configs from yaml
def bluppy_cfg():
    cfg_root = os.path.dirname(os.path.abspath(__file__))
    with open(cfg_root + "/config.yaml", "r") as yamlfile:
        return yaml.load(yamlfile, Loader=yaml.FullLoader)

# Look in the source folder for files to upload to storage
def get_blobs_up(dir):
    with os.scandir(dir) as to_go:
        for thing in to_go:
            if thing.is_file() and not thing.name.startswith('.'):
                yield thing

# Uploads a blob to an Azure Blob Storage container via a blob SAS URL
def blob_upload(blob_url, container_name, blob_name):
    blob_client = BlobClient.from_blob_url(blob_url, container_name, blob_name)
    print("Bluppy is uploading a blob")
    for file in files:
        azbl_client = blob_client.get_blob_client(file.name)
        with open(file.path, "rb") as data:
            azbl_client.upload_blob(data)
        print(f'{file.name} uploaded to blob storage successfully')

config = bluppy_cfg()
blob_name = get_blobs_up(config["source_folder"])
# print(*blob_name)
blob_upload(config["account_url"], config["container_name"], config["blob_name"])
There are files in the folder. When I print(*blob_name) I see the files/blobs in the folder I'm scanning for upload. I am not sure what I am missing and would appreciate any help.
Again, new/stupid coder here, so please be gentle and thanks in advance for your help!
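In case it helps, here is a minimal sketch of the same upload loop using a container-level SAS URL with ContainerClient instead of BlobClient (this assumes the SAS URL is scoped to the container; it is not a confirmed fix for the IndexError above):

from azure.storage.blob import ContainerClient

container_sas_url = "<container-SAS-URL>"        # assumed: a SAS URL scoped to the container
container_client = ContainerClient.from_container_url(container_sas_url)

for file in get_blobs_up("./blobs"):             # reuses the generator defined above
    with open(file.path, "rb") as data:
        container_client.upload_blob(name=file.name, data=data)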
I want to copy data from one bucket to another bucket using a Google Cloud Function. At the moment I am able to copy only a single file to the destination, but I want to copy all files, folders, and sub-folders to my destination bucket.
from google.cloud import storage

def copy_blob(bucket_name="loggingforproject", blob_name="assestnbfile.json", destination_bucket_name="test-assest", destination_blob_name="logs"):
    """Copies a blob from one bucket to another with a new name."""
    bucket_name = "loggingforproject"
    blob_name = "assestnbfile.json"
    destination_bucket_name = "test-assest"
    destination_blob_name = "logs"

    storage_client = storage.Client()

    source_bucket = storage_client.bucket(bucket_name)
    source_blob = source_bucket.blob(blob_name)
    destination_bucket = storage_client.bucket(destination_bucket_name)

    blob_copy = source_bucket.copy_blob(
        source_blob, destination_bucket, destination_blob_name
    )

    print(
        "Blob {} in bucket {} copied to blob {} in bucket {}.".format(
            source_blob.name,
            source_bucket.name,
            blob_copy.name,
            destination_bucket.name,
        )
    )
Using gsutil cp is a good option. However, if you want to copy the files using Cloud Functions, that can be achieved as well.
At the moment, your function only copies a single file. In order to copy the whole content of your bucket you would need to iterate through the files within it.
Here is a code sample that I wrote for an HTTP Cloud Function and tested - you can use it for a reference:
MAIN.PY
from google.cloud import storage

def copy_bucket_files(request):
    """
    Copies the files from a specified bucket into the selected one.
    """
    # Check if the bucket's name was specified in the request
    if request.args.get('bucket'):
        bucketName = request.args.get('bucket')
    else:
        return "The bucket name was not provided. Please try again."

    try:
        # Initiate Cloud Storage client
        storage_client = storage.Client()

        # Define the origin bucket
        origin = storage_client.bucket(bucketName)

        # Define the destination bucket
        destination = storage_client.bucket('<my-test-bucket>')

        # Get the list of the blobs located inside the bucket which files you want to copy
        blobs = storage_client.list_blobs(bucketName)

        for blob in blobs:
            origin.copy_blob(blob, destination)

        return "Done!"
    except:
        return "Failed!"
REQUIREMENTS.TXT
google-cloud-storage==1.22.0
How to call that function:
It can be called via the URL provided for triggering the function, by appending that URL with /?bucket=<name-of-the-bucket-to-copy> (name without <, >):
https://<function-region>-<project-name>.cloudfunctions.net/<function-name>/?bucket=<bucket-name>
You can use the gsutil cp command for this:
gsutil cp gs://first-bucket/* gs://second-bucket
See https://cloud.google.com/storage/docs/gsutil/commands/cp for more details
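If the source bucket also contains folders and sub-folders (as in the question), the recursive flag can be added, for example:

gsutil cp -r gs://first-bucket/* gs://second-bucket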
Here is my TypeScript code; I call it from my website when I need to move images.
exports.copiarImagen = functions.https.onCall(async (data, response) => {
  var origen = data.Origen;
  var destino = data.Destino;

  console.log('Files:');
  const [files] = await admin.storage().bucket("bucket's path").getFiles({ prefix: 'path where your images are' });

  files.forEach(async file => {
    var nuevaRuta = file.name;
    await admin.storage().bucket("posavka.appspot.com").file(file.name)
      .copy(admin.storage().bucket("posavka.appspot.com").file(nuevaRuta.replace(origen, destino)));
    await admin.storage().bucket("posavka.appspot.com").file(file.name).delete();
  });
});
First I get all the files in a specific path, then I copy those files to the new path, and finally I delete the files from the old path.
I hope it helps you :D
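The same move-style logic can be sketched in Python with google-cloud-storage (the bucket names and prefixes below are placeholders, not taken from the answer above):

from google.cloud import storage

client = storage.Client()
src = client.bucket("source-bucket")             # placeholder
dst = client.bucket("destination-bucket")        # placeholder

for blob in client.list_blobs("source-bucket", prefix="old/path/"):
    # copy under the new prefix, then delete the original to emulate a move
    new_name = blob.name.replace("old/path/", "new/path/", 1)
    src.copy_blob(blob, dst, new_name)
    blob.delete()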
I am using AWS SageMaker and trying to upload a data folder into S3 from SageMaker. What I am trying to do is upload my data into the s3_train_data directory (the directory exists in S3). However, it does not get uploaded to that bucket; it ends up in a default bucket that was created, which in turn gets a new folder hierarchy named after the s3_train_data variable.
Code used to upload into the directory:
import os
import sagemaker
from sagemaker import get_execution_role
sagemaker_session = sagemaker.Session()
role = get_execution_role()
bucket = <bucket name>
prefix = <folders1/folders2>
key = <input>
s3_train_data = 's3://{}/{}/{}/'.format(bucket, prefix, key)
#path 'data' is the folder in the Jupyter Instance, contains all the training data
inputs = sagemaker_session.upload_data(path= 'data', key_prefix= s3_train_data)
Is the problem in the code or more in how I created the notebook?
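For what it's worth, sagemaker.Session.upload_data accepts the bucket separately from the key prefix, so a sketch along these lines (keeping the placeholder names from the question) would target the intended bucket rather than the default one:

inputs = sagemaker_session.upload_data(
    path='data',                              # local folder in the notebook instance
    bucket=bucket,                            # <bucket name> from above
    key_prefix='{}/{}'.format(prefix, key),   # folders1/folders2/input
)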
You could look at the sample notebooks for how to upload data to an S3 bucket; there are many ways to do it, and I am just giving you hints.
Also, you forgot to create a boto3 session to access the S3 bucket.
Here is one way to do it:
import os
import urllib.request
import boto3

def download(url):
    filename = url.split("/")[-1]
    if not os.path.exists(filename):
        urllib.request.urlretrieve(url, filename)

def upload_to_s3(channel, file):
    s3 = boto3.resource('s3')
    data = open(file, "rb")
    key = channel + '/' + file
    s3.Bucket(bucket).put_object(Key=key, Body=data)  # 'bucket' is assumed to be defined elsewhere

# caltech-256
download('http://data.mxnet.io/data/caltech-256/caltech-256-60-train.rec')
upload_to_s3('train', 'caltech-256-60-train.rec')
download('http://data.mxnet.io/data/caltech-256/caltech-256-60-val.rec')
upload_to_s3('validation', 'caltech-256-60-val.rec')
Link: https://buildcustom.notebook.us-east-2.sagemaker.aws/notebooks/sample-notebooks/introduction_to_amazon_algorithms/imageclassification_caltech/Image-classification-fulltraining.ipynb
Another way to do it:
bucket = '<your_s3_bucket_name_here>'  # enter your s3 bucket where you will copy data and model artifacts
prefix = 'sagemaker/breast_cancer_prediction'  # place to upload training files within the bucket

# do some processing, then prepare to push the data
# (io, smac, train_X, train_y, and train_file come from the linked sample notebook)
f = io.BytesIO()
smac.write_numpy_to_dense_tensor(f, train_X.astype('float32'), train_y.astype('float32'))
f.seek(0)

boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'train', train_file)).upload_fileobj(f)
Link : https://buildcustom.notebook.us-east-2.sagemaker.aws/notebooks/sample-notebooks/introduction_to_applying_machine_learning/breast_cancer_prediction/Breast%20Cancer%20Prediction.ipynb
YouTube link: https://www.youtube.com/watch?v=-YiHPIGyFGo - how to pull data into an S3 bucket.