Download GCS object to your Vertex AI notebook in GCP - python

If you have a gs:// blob, how do you download that file from GCS to a Vertex AI notebook in GCP using the Python client library?

To download a GCS file to a Vertex AI notebook, use the following Python code:
from google.cloud import storage

def download_blob(bucket_name, source_blob_name, destination_file_name):
    """Downloads a blob from the bucket."""
    # bucket_name = "your-bucket-name"
    # The ID of your GCS object
    # source_blob_name = "storage-object-name"
    # destination_file_name = "/path/to/file"
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(source_blob_name)
    blob.download_to_filename(destination_file_name)

    print(
        "Downloaded storage object {} from bucket {} to file {}.".format(
            source_blob_name, bucket_name, destination_file_name
        )
    )
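For example, assuming a hypothetical bucket named my-bucket, an object data/file.csv, and a local path inside the notebook, the call would look like:
# Hypothetical names; replace with your own bucket, object, and local path.
download_blob("my-bucket", "data/file.csv", "/home/jupyter/file.csv")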
Alternatively, if you want to download a GCS file to the Jupyter directory, you can use the command:
!gsutil cp gs://BUCKET_NAME/OBJECT_NAME JUPYTER_LOCATION

Related

AttributeError: module 'google.cloud.storage' has no attribute 'open'

I developed a Flask app on localhost and now I want to publish it on Google Cloud Platform. The problem occurs when trying to read/write a file in Google Cloud Storage.
First I installed google-cloud-storage via pip:
pip install --upgrade google-cloud-storage
Then my code:
from google.cloud import storage

data_file_path = "url-to-txt-file-in-google-storage"

with storage.open(data_file_path, "wb") as gcsFile:
    gcsFile.write("xx")
At this point the error says: AttributeError: module 'google.cloud.storage' has no attribute 'open'
I also tried it with pip install GoogleAppEngineCloudStorageClient but it didn't work either.
What should I do to make it work? How could I read and write to a file in google cloud storage without downloading it?
Firstly, the error is very clear: google.cloud.storage is a Python package, not an object, so it doesn't have an open() method (or any other method). You can see the source code here: https://github.com/googleapis/python-storage/tree/main/google/cloud/storage
Basically you need to create a Client first. If you want to just copy the file, here is an example:
https://cloud.google.com/storage/docs/copying-renaming-moving-objects#storage-copy-object-python
from google.cloud import storage

def copy_blob(
    bucket_name, blob_name, destination_bucket_name, destination_blob_name
):
    """Copies a blob from one bucket to another with a new name."""
    # bucket_name = "your-bucket-name"
    # blob_name = "your-object-name"
    # destination_bucket_name = "destination-bucket-name"
    # destination_blob_name = "destination-object-name"
    storage_client = storage.Client()

    source_bucket = storage_client.bucket(bucket_name)
    source_blob = source_bucket.blob(blob_name)
    destination_bucket = storage_client.bucket(destination_bucket_name)

    blob_copy = source_bucket.copy_blob(
        source_blob, destination_bucket, destination_blob_name
    )

    print(
        "Blob {} in bucket {} copied to blob {} in bucket {}.".format(
            source_blob.name,
            source_bucket.name,
            blob_copy.name,
            destination_bucket.name,
        )
    )
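For the other part of the question, reading and writing an object in place without downloading it first, recent versions of the google-cloud-storage library (1.38 and later) expose a file-like interface via Blob.open(). A minimal sketch, assuming a hypothetical bucket and object name:
from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.bucket("your-bucket-name")  # hypothetical bucket
blob = bucket.blob("path/to/data.txt")  # hypothetical object

# Write text straight to the object, no local file involved.
with blob.open("w") as f:
    f.write("xx")

# Read it back the same way.
with blob.open("r") as f:
    print(f.read())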

Google AI platform can't write to Cloud Storage

Running a tensorflow-cloud job on Google AI Platform, the entrypoint of the job is the following:
import tensorflow as tf

filename = r'gs://my_bucket_name/hello.txt'

with tf.io.gfile.GFile(filename, mode='w') as w:
    w.write("Hello, world!")

with tf.io.gfile.GFile(filename, mode='r') as r:
    print(r.read())
The job completed successfully; in the logs it prints "Hello, world!".
The bucket and the job are both in the same region.
But I can't find the file in Cloud Storage; it is not there. I ran some other tests: I called tf.io.gfile.listdir, then wrote a new file, then called tf.io.gfile.listdir again. Comparing the before and after, it seems that a file was added, but when I open Cloud Storage I can't find it there. I was also able to read files from storage.
I'm not getting any permissions errors, and as the official docs say, AI Platform already has the permission to read/write to Cloud Storage.
Here is my main.py file:
import tensorflow_cloud as tfc

tfc.run(
    entry_point="run_me.py",
    requirements_txt="requirements.txt",
    chief_config=tfc.COMMON_MACHINE_CONFIGS['CPU'],
    docker_config=tfc.DockerConfig(
        image_build_bucket="test_ai_storage"),
)
This is the most minimal version where I can reproduce the problem.
Cloud Storage is not a file system. With this in mind, you can perform upload, download, or deletion operations on a bucket.
What you are trying to do is open a file and write into it. What you should do instead is create your file locally and then upload it to your desired bucket.
from google.cloud import storage

def upload_blob(bucket_name, source_file_name, destination_blob_name):
    """Uploads a file to the bucket."""
    # bucket_name = "your-bucket-name"
    # source_file_name = "local/path/to/file"
    # destination_blob_name = "storage-object-name"
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name)

    print(
        "File {} uploaded to {}.".format(
            source_file_name, destination_blob_name
        )
    )
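As a usage sketch of that approach (the file and bucket names are hypothetical): write the file locally inside the job, then push it to the bucket with the helper above.
# Hypothetical file and bucket names.
filename = "hello.txt"
with open(filename, "w") as f:
    f.write("Hello, world!")

upload_blob("my_bucket_name", filename, "hello.txt")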

Upload images to Google Cloud instead of saving to disk (Python)

I have a script which downloads certain images (<1 MB) from web and saves into local disk. Can I save it into a Google Cloud Storage bucket instead of my local system?
# Imports assumed by this snippet.
from urllib.request import urlopen

import cv2
import numpy as np

def downloadImage(url):
    try:
        print("Downloading {}".format(url))
        image_name = str(url).split('/')[-1]
        resp = urlopen(url)
        image = np.asarray(bytearray(resp.read()), dtype="uint8")
        image = cv2.imdecode(image, cv2.IMREAD_COLOR)
        cv2.imwrite(current_path + "\\Downloaded\\" + image_name, image)
    except Exception as error:
        print(error)
You can employ the GCS Python client library to programmatically perform tasks related to GCS. The following code will upload a local file located at /PATH/TO/SOURCE_FILE to the GCS bucket gs://BUCKET_NAME.
from google.cloud import storage

def upload_blob(bucket_name, source_file_name, destination_blob_name):
    """Uploads a file to the bucket."""
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name)

    print(
        "File {} uploaded to {}.".format(
            source_file_name, destination_blob_name
        )
    )

BUCKET_NAME = "BUCKET_NAME"
SOURCE_FILE_NAME = "/PATH/TO/SOURCE_FILE"
DESTINATION_FILE_NAME = "DESTINATION_FILE"

upload_blob(BUCKET_NAME, SOURCE_FILE_NAME, DESTINATION_FILE_NAME)
Keep in mind that, in order to use the upload_blob method, you need to have installed the GCS Python client library and to have set up authentication credentials. You can find information about how to implement these steps here.
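If you want to skip the local disk entirely, the same client library can upload directly from memory with Blob.upload_from_string(). A minimal sketch, assuming the image bytes are already held in a variable and using hypothetical names:
from google.cloud import storage

def upload_bytes(bucket_name, data, destination_blob_name):
    """Uploads raw bytes to the bucket without writing a local file."""
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_string(data, content_type="image/jpeg")

# Hypothetical usage with the question's snippet, where resp is the urlopen response:
# upload_bytes("BUCKET_NAME", resp.read(), "Downloaded/" + image_name)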
Try this: https://cloud.google.com/python
And remember, always check if there is a library for what you want.

How to delete all files in google storage folder using python

I have a Google Cloud Storage bucket path stored in one variable called GS_PATH.
An example of the Google Cloud Storage path is gs://test/one/.
Under this I have a few more folders and files.
How can I delete everything under the gs://test/one/ path using Python code?
Thanks,
Arjun
There is an API to do this:
from google.cloud import storage

my_storage = storage.Client()
bucket = my_storage.get_bucket('test')
blobs = bucket.list_blobs(prefix='one/')
for blob in blobs:
    blob.delete()
See https://cloud.google.com/storage/docs/deleting-objects#storage-delete-object-python for reference.
from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.get_bucket('bucket_name')
blobs = bucket.list_blobs(prefix='folder_prefix/')
for blob in blobs:
    blob.delete()
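Since the path lives in a variable such as GS_PATH = 'gs://test/one/', you can split it into a bucket name and a prefix before listing. A minimal sketch (the helper name is hypothetical):
from google.cloud import storage

def delete_under_path(gs_path):
    """Deletes every object under a gs://bucket/prefix/ path."""
    bucket_name, _, prefix = gs_path[len("gs://"):].partition("/")
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    for blob in bucket.list_blobs(prefix=prefix):
        blob.delete()

# delete_under_path("gs://test/one/")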

Transfer file from URL to Cloud Storage

I'm a Ruby dev trying my hand at Google Cloud Functions written in Python and have hit a wall with transferring a remote file from a given URL to Google Cloud Storage (GCS).
In an equivalent RoR app I download to the app's ephemeral storage and then upload to GCS.
I am hoping there's a way to simply 'download' the remote file to my GCS bucket via the Cloud Function.
Here's a simplified example of what I am doing with some comments, the real code fetches the URLs from a private API, but that works fine and isn't where the issue is.
from google.cloud import storage

project_id = 'my-project'
bucket_name = 'my-bucket'
destination_blob_name = 'upload.test'

storage_client = storage.Client.from_service_account_json('my_creds.json')

# This works fine
# source_file_name = 'localfile.txt'

# When using a remote URL I get 'IOError: [Errno 2] No such file or directory'
source_file_name = 'http://www.hospiceofmontezuma.org/wp-content/uploads/2017/10/confused-man.jpg'

def upload_blob(bucket_name, source_file_name, destination_blob_name):
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name)

upload_blob(bucket_name, source_file_name, destination_blob_name)
Thanks in advance.
It is not possible to upload a file to Google Cloud Storage directly from a URL. Since you are running the script from a local environment, the file contents that you want to upload need to be in that same environment. This means that the contents of the URL need to be stored either in memory or in a file.
An example showing how to do it, based on your code:
Option 1: You can use the wget module, which will fetch the URL and download its contents into a local file (similar to the wget CLI command). Note that this means the file will be stored locally and then uploaded from that file. I added the os.remove line to remove the file once the upload is done.
from google.cloud import storage
import wget
import io, os

project_id = 'my-project'
bucket_name = 'my-bucket'
destination_blob_name = 'upload.test'

storage_client = storage.Client.from_service_account_json('my_creds.json')

source_file_name = 'http://www.hospiceofmontezuma.org/wp-content/uploads/2017/10/confused-man.jpg'

def upload_blob(bucket_name, source_file_name, destination_blob_name):
    filename = wget.download(source_file_name)
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(filename, content_type='image/jpg')
    os.remove(filename)

upload_blob(bucket_name, source_file_name, destination_blob_name)
Option 2: Using the urllib module works similarly to the wget module, but instead of writing into a file it writes to a variable. Note that I wrote this example in Python 3; there are some differences if you plan to run your script in Python 2.X.
from google.cloud import storage
import urllib.request

project_id = 'my-project'
bucket_name = 'my-bucket'
destination_blob_name = 'upload.test'

storage_client = storage.Client.from_service_account_json('my_creds.json')

source_file_name = 'http://www.hospiceofmontezuma.org/wp-content/uploads/2017/10/confused-man.jpg'

def upload_blob(bucket_name, source_file_name, destination_blob_name):
    file = urllib.request.urlopen(source_file_name)
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_string(file.read(), content_type='image/jpg')

upload_blob(bucket_name, source_file_name, destination_blob_name)
Directly transferring URLs into GCS is possible through the Cloud Transfer service, but setting up a cloud transfer job for a single URL is a lot of overhead. That sort of solution is targeted towards a situation with millions of URLs that need to become GCS objects.
Instead, I recommend writing a job that pumps an incoming stream from reading a URL into a write stream to GCS, and running that somewhere in Google Cloud close to the bucket.
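A minimal sketch of that streaming approach, assuming google-cloud-storage 1.38 or later (which provides the file-like Blob.open()) and hypothetical bucket and object names:
import shutil
import urllib.request

from google.cloud import storage

def stream_url_to_gcs(url, bucket_name, destination_blob_name):
    """Copies the URL's response stream straight into a GCS object."""
    storage_client = storage.Client()
    blob = storage_client.bucket(bucket_name).blob(destination_blob_name)
    with urllib.request.urlopen(url) as source, blob.open("wb") as destination:
        shutil.copyfileobj(source, destination)

# Hypothetical usage:
# stream_url_to_gcs("http://example.com/image.jpg", "my-bucket", "upload.test")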
