I was reading the Python documentation for Google Cloud Storage and was able to write a method that uploads files; however, I cannot find a way to download files using a blob's URL. I can download a file by its name, but that is not practical since users could upload files with the same name. The blob is private. I have access to the blob's URL, so I was wondering whether there is a way to download files using this link.
This is my upload code which works perfectly:
from google.cloud import storage

def upload_blob(bucket_name, filename, file_obj):
    if filename and file_obj:
        storage_client = storage.Client()
        bucket = storage_client.bucket('example-storage-bucket')
        blob = bucket.blob(filename)
        blob.upload_from_file(file_obj)  # binary file data
        form_logger.info('File {} uploaded'.format(filename))  # form_logger is the module's logger (defined elsewhere)
        return blob
This code downloads the file, but I could only figure it out with the blob name, not the URL:
def download_blob(bucket_name, url):
    if url:
        storage_client = storage.Client()
        bucket = storage_client.bucket('example-storage-bucket')
        blob = bucket.blob(url)
        blob.download_to_filename("example.pdf")
Any suggestions or thoughts on how to download the file using the blob's media link URL?
For example, bucket example-storage-bucket has file folder/example.pdf and its
Link URL is https://storage.cloud.google.com/example-storage-bucket/folder/example.pdf and
URI is gs://example-storage-bucket/folder/example.pdf
Use the function below to download a blob using its GCS link URL (if you are using Python 3.x):
import os
from urllib.parse import urlparse
from google.cloud import storage

def decode_gcs_url(url):
    # https://storage.cloud.google.com/<bucket>/<path> -> ("<bucket>", "<path>")
    p = urlparse(url)
    path = p.path[1:].split('/', 1)
    bucket, file_path = path[0], path[1]
    return bucket, file_path

def download_blob(url):
    if url:
        storage_client = storage.Client()
        bucket, file_path = decode_gcs_url(url)
        bucket = storage_client.bucket(bucket)
        blob = bucket.blob(file_path)
        blob.download_to_filename(os.path.basename(file_path))
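For instance, with the link URL from the question, the call would be (a sketch; it saves the object next to the script as example.pdf):

download_blob("https://storage.cloud.google.com/example-storage-bucket/folder/example.pdf")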
I think what you're saying is that you want to download the blob to a file whose name is based on the blob name, correct? If so, you can read the object's name from the blob itself (blob.name) and then pick a local filename based on it.
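A minimal sketch of that idea, assuming you already have the blob object:

import os

local_name = os.path.basename(blob.name)   # e.g. "example.pdf" for "folder/example.pdf"
blob.download_to_filename(local_name)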
As the topic indicates...
I have tried two ways and neither of them works:
First:
I want to programmatically talk to GCS in Python, such as reading gs://{bucketname}/{blobname} as a path or a file. The only thing I can find is the gsutil module, but it seems to be a command-line tool rather than something I can use inside a Python application.
I found some code here (Accessing data in google cloud bucket), but I am still confused about how to retrieve it as the type I need. There is a jpg file in the bucket, and I want to download it for text detection; this will be deployed on a Google Cloud Function.
Second:
The download_as_bytes() method (link to the Blob documentation). I import the google.cloud.storage module and provide the GCP key, but an error is raised saying that Blob has no attribute download_as_bytes().
Is there anything else I haven't tried? Thank you!
For reference:
def text_detected(user_id):
    bucket = storage_client.bucket('img_platecapture')
    blob = bucket.blob({user_id})
    content = blob.download_as_bytes()
    image = vision.Image(content=content)  # insert a content
    response = vision_client.text_detection(image=image)
    if response.error.message:
        raise Exception(
            '{}\nFor more info on error messages, check: '
            'https://cloud.google.com/apis/design/errors'.format(
                response.error.message))
    img = Image.open(input_file)  # insert a path
    draw = ImageDraw.Draw(img)
    font = ImageFont.truetype("simsun.ttc", 18)
    for text in response.text_annotations[1::]:
        ocr = text.description
        # note: 'bound' is not defined in this snippet as posted (presumably text.bounding_poly)
        draw.text((bound.vertices[0].x-25, bound.vertices[0].y-25), ocr, fill=(255, 0, 0), font=font)
        draw.polygon(
            [
                bound.vertices[0].x,
                bound.vertices[0].y,
                bound.vertices[1].x,
                bound.vertices[1].y,
                bound.vertices[2].x,
                bound.vertices[2].y,
                bound.vertices[3].x,
                bound.vertices[3].y,
            ],
            None,
            'yellow',
        )
    texts = response.text_annotations
    a = str(texts[0].description.split())
    b = re.sub(u"([^\u4e00-\u9fa5\u0030-\u0039])", "", a)
    b1 = "".join(b)
    print("偵測到的地址為:", b1)  # "The detected address is:"
    return b1
@handler.add(MessageEvent, message=ImageMessage)
def handle_content_message(event):
    message_content = line_bot_api.get_message_content(event.message.id)
    user = line_bot_api.get_profile(event.source.user_id)
    data = b''
    for chunk in message_content.iter_content():
        data += chunk
    global bucket_name
    bucket_name = 'img_platecapture'
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(f'{user.user_id}.jpg')
    blob.upload_from_string(data)
    text_detected1 = text_detected(user.user_id)  #### Here's the problem
    line_bot_api.reply_message(
        event.reply_token,
        messages=TextSendMessage(
            text=text_detected1
        ))
Reference code (gcsfs/fsspec):
gcs = gcsfs.GCSFileSystem()
bucket = storage_client.bucket('img_platecapture')
blob = bucket.blob({user_id})
f = fsspec.open("gs://img_platecapture/{user_id}")
with f.open({user_id}, "rb") as fp:
    content = fp.read()
    image = vision.Image(content=content)
    response = vision_client.text_detection(image=image)
You can do that with the Cloud Storage Python client:
from google.cloud import storage

def download_blob(bucket_name, source_blob_name, destination_file_name):
    """Downloads a blob from the bucket."""
    # The ID of your GCS bucket
    # bucket_name = "your-bucket-name"
    # The ID of your GCS object
    # source_blob_name = "storage-object-name"
    # The path to which the file should be downloaded
    # destination_file_name = "local/path/to/file"

    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)

    # Construct a client side representation of a blob.
    # Note `Bucket.blob` differs from `Bucket.get_blob` as it doesn't retrieve
    # any content from Google Cloud Storage. As we don't need additional data,
    # using `Bucket.blob` is preferred here.
    blob = bucket.blob(source_blob_name)

    # blob.download_to_filename(destination_file_name)
    # blob.download_as_string()
    blob.download_as_bytes()

    print(
        "Downloaded storage object {} from bucket {} to local file {}.".format(
            source_blob_name, bucket_name, destination_file_name
        )
    )
You can use the following methods:
blob.download_to_filename(destination_file_name)
blob.download_as_string()
blob.download_as_bytes()
To be able to correctly use this library, you have to install the expected pip package in your virtual env.
Example of project structure:
my-project/
    requirements.txt
    your_python_script.py
The requirements.txt file:
google-cloud-storage==2.6.0
Run the following command:
pip install -r requirements.txt
In your case the package may not have been installed correctly in your virtual env, which is why you could not access the download_as_bytes method.
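A quick way to confirm which version is actually importable from the environment that runs your code (a sketch; download_as_bytes only exists in reasonably recent releases of google-cloud-storage):

from google.cloud import storage
print(storage.__version__)   # should match the version pinned in requirements.txt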
I'd be using fsspec's GCS filesystem implementation instead.
https://github.com/fsspec/gcsfs/
>>> import gcsfs
>>> fs = gcsfs.GCSFileSystem(project='my-google-project')
>>> fs.ls('my-bucket')
['my-file.txt']
>>> with fs.open('my-bucket/my-file.txt', 'rb') as f:
... print(f.read())
b'Hello, world'
https://gcsfs.readthedocs.io/en/latest/#examples
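Applied to the bucket from the question, a minimal sketch might look like this (it assumes default credentials, that an object named <user_id>.jpg exists, and that user_id and vision_client are defined as in the question):

import gcsfs
from google.cloud import vision

fs = gcsfs.GCSFileSystem()
with fs.open(f"img_platecapture/{user_id}.jpg", "rb") as f:   # user_id assumed defined
    content = f.read()

image = vision.Image(content=content)
response = vision_client.text_detection(image=image)          # vision_client as in the question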
I want to download an image from a blob that is in a container.
I searched and only found how to download an entire container; as I said, I do not want to download the whole container, just a single image (container/blob/image.png).
This is the code that I found (it downloads the whole container):
import os
from azure.storage.blob import BlobServiceClient, BlobClient
from azure.storage.blob import ContentSettings, ContainerClient

# IMPORTANT: Replace connection string with your storage account connection string
# Usually starts with DefaultEndpointsProtocol=https;...
MY_CONNECTION_STRING = "CONNECTION_STRING"

# Replace with blob container
MY_BLOB_CONTAINER = "name"

# Replace with the local folder where you want files to be downloaded
LOCAL_BLOB_PATH = "Blobsss"
BLOBNAME = "test"

class AzureBlobFileDownloader:
    def __init__(self):
        print("Initializing AzureBlobFileDownloader")

        # Initialize the connection to Azure storage account
        self.blob_service_client = BlobServiceClient.from_connection_string(MY_CONNECTION_STRING)
        self.my_container = self.blob_service_client.get_container_client(MY_BLOB_CONTAINER)

    def save_blob(self, file_name, file_content):
        # Get full path to the file
        download_file_path = os.path.join(LOCAL_BLOB_PATH, file_name)

        # for nested blobs, create local path as well!
        os.makedirs(os.path.dirname(download_file_path), exist_ok=True)

        with open(download_file_path, "wb") as file:
            file.write(file_content)

    def download_all_blobs_in_container(self):
        my_blobs = self.my_container.list_blobs()
        for blob in my_blobs:
            print(blob.name)
            bytes = self.my_container.get_blob_client(blob).download_blob().readall()
            self.save_blob(blob.name, bytes)

# Initialize the class and download all files in the container
azure_blob_file_downloader = AzureBlobFileDownloader()
azure_blob_file_downloader.download_all_blobs_in_container()
Could you please help me? Thank you!
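For reference, a minimal sketch that downloads just one named blob instead of iterating over the whole container (it reuses the connection string and container name from the snippet above; the blob path "blob/image.png" is only an illustrative assumption):

from azure.storage.blob import BlobClient

MY_CONNECTION_STRING = "CONNECTION_STRING"
blob_client = BlobClient.from_connection_string(
    MY_CONNECTION_STRING,
    container_name="name",          # container from the snippet above
    blob_name="blob/image.png",     # assumed path of the single image
)
with open("image.png", "wb") as file:
    file.write(blob_client.download_blob().readall())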
I'm pretty new to Python and fairly stupid. I'm working on a POC to upload blobs to Azure Blob Storage with a blob SAS URL. Below is my code. When I run it, I get the following error:
container_name, blob_name = unquote(path_blob[-2]), unquote(path_blob[-1])
IndexError: list index out of range
Code as it is currently
import os
import yaml
from azure.storage.blob import BlobClient

'''
Importing the configs from yaml.
This method requires the use of a blob SAS URL or token.
Create config.yaml in the same path as bluppy.py with the following
(if 'account_url' contains the token or shared_access_key, you don't need to add it to the yaml):
---
source_folder: "./blobs"
account_url: "<ProperlyFormattedBlobSaSURLwithcontainerandcredentialincluded>"
container_name: "<container_name>"
'''

# Import configs from yaml
def bluppy_cfg():
    cfg_root = os.path.dirname(os.path.abspath(__file__))
    with open(cfg_root + "/config.yaml", "r") as yamlfile:
        return yaml.load(yamlfile, Loader=yaml.FullLoader)

# Look in the source folder for files to upload to storage
def get_blobs_up(dir):
    with os.scandir(dir) as to_go:
        for thing in to_go:
            if thing.is_file() and not thing.name.startswith('.'):
                yield thing

# Uploads a blob to an Azure Blob Storage container via a blob SAS URL
def blob_upload(blob_url, container_name, blob_name):
    blob_client = BlobClient.from_blob_url(blob_url, container_name, blob_name)
    print("Bluppy is uploading a blob")
    for file in files:
        azbl_client = blob_client.get_blob_client(file.name)
        with open(file.path, "rb") as data:
            azbl_client.upload_blob(data)
        print(f'{file.name} uploaded to blob storage successfully')

config = bluppy_cfg()
blob_name = get_blobs_up(config["source_folder"])
# print(*blob_name)
blob_upload(config["account_url"], config["container_name"], config["blob_name"])
There are files in the folder. When I print(*blob_name) I see the files/blobs in the folder I'm scanning for upload. I am not sure what I am missing and would appreciate any help.
Again, new/stupid coder here, so please be gentle and thanks in advance for your help!
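In case it helps, here is a minimal sketch of the same upload loop using a container-level SAS URL with ContainerClient instead of BlobClient (this assumes the SAS URL is scoped to the container; it is not a confirmed fix for the IndexError above):

from azure.storage.blob import ContainerClient

container_sas_url = "<container-SAS-URL>"        # assumed: a SAS URL scoped to the container
container_client = ContainerClient.from_container_url(container_sas_url)

for file in get_blobs_up("./blobs"):             # reuses the generator defined above
    with open(file.path, "rb") as data:
        container_client.upload_blob(name=file.name, data=data)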
I want to copy data from one bucket to another bucket using a Google Cloud Function. At the moment I am able to copy only a single file to the destination, but I want to copy all files, folders, and sub-folders to my destination bucket.
from google.cloud import storage

def copy_blob(bucket_name="loggingforproject", blob_name="assestnbfile.json", destination_bucket_name="test-assest", destination_blob_name="logs"):
    """Copies a blob from one bucket to another with a new name."""
    bucket_name = "loggingforproject"
    blob_name = "assestnbfile.json"
    destination_bucket_name = "test-assest"
    destination_blob_name = "logs"

    storage_client = storage.Client()

    source_bucket = storage_client.bucket(bucket_name)
    source_blob = source_bucket.blob(blob_name)
    destination_bucket = storage_client.bucket(destination_bucket_name)

    blob_copy = source_bucket.copy_blob(
        source_blob, destination_bucket, destination_blob_name
    )

    print(
        "Blob {} in bucket {} copied to blob {} in bucket {}.".format(
            source_blob.name,
            source_bucket.name,
            blob_copy.name,
            destination_bucket.name,
        )
    )
Using gsutil cp is a good option. However, if you want to copy the files using Cloud Functions, that can be achieved as well.
At the moment, your function only copies a single file. In order to copy the whole content of your bucket you would need to iterate through the files within it.
Here is a code sample that I wrote for an HTTP Cloud Function and tested - you can use it for a reference:
MAIN.PY
from google.cloud import storage

def copy_bucket_files(request):
    """
    Copies the files from a specified bucket into the selected one.
    """
    # Check if the bucket's name was specified in the request
    if request.args.get('bucket'):
        bucketName = request.args.get('bucket')
    else:
        return "The bucket name was not provided. Please try again."

    try:
        # Initiate Cloud Storage client
        storage_client = storage.Client()

        # Define the origin bucket
        origin = storage_client.bucket(bucketName)

        # Define the destination bucket
        destination = storage_client.bucket('<my-test-bucket>')

        # Get the list of the blobs located inside the bucket which files you want to copy
        blobs = storage_client.list_blobs(bucketName)

        for blob in blobs:
            origin.copy_blob(blob, destination)

        return "Done!"
    except:
        return "Failed!"
REQUIREMENTS.TXT
google-cloud-storage==1.22.0
How to call that function:
It can be called via the URL provided for triggering the function, by appending that URL with /?bucket=<name-of-the-bucket-to-copy> (name without <, >):
https://<function-region>-<project-name>.cloudfunctions.net/<function-name>/?bucket=<bucket-name>
You can use the gsutil cp command for this:
gsutil cp gs://first-bucket/* gs://second-bucket
See https://cloud.google.com/storage/docs/gsutil/commands/cp for more details
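If the source bucket also contains folders and sub-folders (as in the question), the recursive flag can be added, for example:

gsutil cp -r gs://first-bucket/* gs://second-bucket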
Here is my TypeScript code; I call it from my website when I need to move images.
exports.copiarImagen = functions.https.onCall(async (data, response) => {
  var origen = data.Origen;
  var destino = data.Destino;

  console.log('Files:');
  const [files] = await admin.storage().bucket("bucket's path").getFiles({ prefix: 'path where your images are' });

  files.forEach(async file => {
    var nuevaRuta = file.name;
    await admin.storage().bucket("posavka.appspot.com").file(file.name)
      .copy(admin.storage().bucket("posavka.appspot.com").file(nuevaRuta.replace(origen, destino)));
    await admin.storage().bucket("posavka.appspot.com").file(file.name).delete();
  });
});
First I get all the files in a specific path, then I copy those files to the new path, and finally I delete the files from the old path.
I hope it helps you :D
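The same move-style logic can be sketched in Python with google-cloud-storage (the bucket names and prefixes below are placeholders, not taken from the answer above):

from google.cloud import storage

client = storage.Client()
src = client.bucket("source-bucket")             # placeholder
dst = client.bucket("destination-bucket")        # placeholder

for blob in client.list_blobs("source-bucket", prefix="old/path/"):
    # copy under the new prefix, then delete the original to emulate a move
    new_name = blob.name.replace("old/path/", "new/path/", 1)
    src.copy_blob(blob, dst, new_name)
    blob.delete()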
I am using AWS SageMaker and trying to upload a data folder into S3 from SageMaker. What I am trying to do is upload my data into the s3_train_data directory (the directory exists in S3). However, it does not get uploaded to that bucket; it ends up in a default bucket that was created, which in turn gets a new folder hierarchy named after the s3_train_data variable.
Code used to upload into the directory:
import os
import sagemaker
from sagemaker import get_execution_role
sagemaker_session = sagemaker.Session()
role = get_execution_role()
bucket = <bucket name>
prefix = <folders1/folders2>
key = <input>
s3_train_data = 's3://{}/{}/{}/'.format(bucket, prefix, key)
#path 'data' is the folder in the Jupyter Instance, contains all the training data
inputs = sagemaker_session.upload_data(path= 'data', key_prefix= s3_train_data)
Is the problem in the code or more in how I created the notebook?
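For what it's worth, sagemaker.Session.upload_data accepts the bucket separately from the key prefix, so a sketch along these lines (keeping the placeholder names from the question) would target the intended bucket rather than the default one:

inputs = sagemaker_session.upload_data(
    path='data',                              # local folder in the notebook instance
    bucket=bucket,                            # <bucket name> from above
    key_prefix='{}/{}'.format(prefix, key),   # folders1/folders2/input
)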
You could look at the sample notebooks for how to upload data to an S3 bucket; there are many ways to do it, and I am just giving you hints.
Also, you forgot to create a boto3 session to access the S3 bucket.
Here is one way to do it:
import os
import urllib.request
import boto3

def download(url):
    filename = url.split("/")[-1]
    if not os.path.exists(filename):
        urllib.request.urlretrieve(url, filename)

def upload_to_s3(channel, file):
    s3 = boto3.resource('s3')
    data = open(file, "rb")
    key = channel + '/' + file
    s3.Bucket(bucket).put_object(Key=key, Body=data)  # 'bucket' is assumed to be defined elsewhere

# caltech-256
download('http://data.mxnet.io/data/caltech-256/caltech-256-60-train.rec')
upload_to_s3('train', 'caltech-256-60-train.rec')
download('http://data.mxnet.io/data/caltech-256/caltech-256-60-val.rec')
upload_to_s3('validation', 'caltech-256-60-val.rec')
Link: https://buildcustom.notebook.us-east-2.sagemaker.aws/notebooks/sample-notebooks/introduction_to_amazon_algorithms/imageclassification_caltech/Image-classification-fulltraining.ipynb
Another way to do it:
bucket = '<your_s3_bucket_name_here>'  # enter your s3 bucket where you will copy data and model artifacts
prefix = 'sagemaker/breast_cancer_prediction'  # place to upload training files within the bucket

# do some processing, then prepare to push the data
# (io, smac, train_X, train_y, and train_file come from the linked sample notebook)
f = io.BytesIO()
smac.write_numpy_to_dense_tensor(f, train_X.astype('float32'), train_y.astype('float32'))
f.seek(0)

boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'train', train_file)).upload_fileobj(f)
Link : https://buildcustom.notebook.us-east-2.sagemaker.aws/notebooks/sample-notebooks/introduction_to_applying_machine_learning/breast_cancer_prediction/Breast%20Cancer%20Prediction.ipynb
YouTube link: https://www.youtube.com/watch?v=-YiHPIGyFGo - how to pull data into an S3 bucket.