Google AI platform can't write to Cloud Storage - python

I'm running a tensorflow-cloud job on Google AI Platform. The entry point of the job is the following:
import tensorflow as tf
filename = r'gs://my_bucket_name/hello.txt'
with tf.io.gfile.GFile(filename, mode='w') as w:
    w.write("Hello, world!")
with tf.io.gfile.GFile(filename, mode='r') as r:
    print(r.read())
The job completed successfully, and the logs show the printed "Hello, world!".
The bucket and the job are both in the same region.
But I can't find the file in Cloud Storage. I ran some other tests in which I called tf.io.gfile.listdir, wrote a new file, and called tf.io.gfile.listdir again. Comparing the listings from before and after, a file appears to have been added, but when I open Cloud Storage I can't find it there. I was also able to read existing files from storage.
I'm not getting any permission errors, and as the official docs say, AI Platform already has permission to read from and write to Cloud Storage.
Here is my main.py file:
import tensorflow_cloud as tfc
tfc.run(
    entry_point="run_me.py",
    requirements_txt="requirements.txt",
    chief_config=tfc.COMMON_MACHINE_CONFIGS['CPU'],
    docker_config=tfc.DockerConfig(
        image_build_bucket="test_ai_storage"),
)
This is the most minimal version where I can reproduce the problem.

Cloud Storage is not a file system. With that in mind, the operations you can perform on a bucket are uploads, downloads, and deletions.
What you are trying to do is open a file and write into it. Instead, create your file locally and then upload it to the desired bucket.
from google.cloud import storage
def upload_blob(bucket_name, source_file_name, destination_blob_name):
    """Uploads a file to the bucket."""
    # bucket_name = "your-bucket-name"
    # source_file_name = "local/path/to/file"
    # destination_blob_name = "storage-object-name"
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name)
    print(
        "File {} uploaded to {}.".format(
            source_file_name, destination_blob_name
        )
    )
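The "create your file locally" step is not shown in the snippet above, so here is a minimal sketch of the whole round trip. The helper name write_then_upload is hypothetical, and it assumes that client is a google.cloud.storage.Client (it only uses the bucket(), blob() and upload_from_filename() calls already shown above).

```python
import os
import tempfile


def write_then_upload(client, bucket_name, destination_blob_name, text):
    """Write text to a local temp file, then upload that file as one object.

    Assumption: `client` is a google.cloud.storage.Client (or any object
    exposing the same bucket()/blob()/upload_from_filename() calls).
    """
    # Step 1: create the file on local disk.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write(text)
        local_path = tmp.name
    try:
        # Step 2: upload the finished file as a single object.
        blob = client.bucket(bucket_name).blob(destination_blob_name)
        blob.upload_from_filename(local_path)
    finally:
        # Step 3: remove the local copy whether or not the upload worked.
        os.remove(local_path)
```

With the real library this would be called as write_then_upload(storage.Client(), "my_bucket_name", "hello.txt", "Hello, world!"), where the bucket and object names are placeholders taken from the question.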

Related

Google cloud storage API not uploading blob to bucket

I followed the Google Cloud Storage documentation for uploading a local file on a server as a blob to a bucket I created, yet it does not work. Uploading manually through the Cloud Console GUI does work. Note that my code for downloading an export-generated object from a bucket in a cloudstoragesink works, and it also calls the Cloud Storage API, but the upload does not. I have tried to get some error information for debugging on stdout or stderr, to no avail. My credentials and authentication all check out, as I am able to use them for downloading. Also, the bucket I am trying to upload to has already been created in another function call. Could it be that the storage client's bucket method is attempting to instantiate an already existing bucket, and is therefore erroring out before reaching the blob upload method? Any insight is appreciated.
Py reference for blob upload from filename function
GCS api guide for uploading objects to bucket
def upload_blob(bucket_name, source_file_name, destination_blob_name):
    """Uploads a file to the bucket."""
    # The ID of your GCS bucket
    # bucket_name = "your-bucket-name"
    # The path to your file to upload
    # source_file_name = "local/path/to/file"
    # The ID of your GCS object
    # destination_blob_name = "storage-object-name"
    storage_client = storage.Client.from_service_account_json('/opt/gws/creds/gws-vault-data-export-ops-bfd51297e810.json')
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name)
    print(
        "File {} uploaded to {}.".format(
            source_file_name, destination_blob_name
        )
    )
print('\n')
print('************List Blobs***********')
blobs_list = list_blobs(bucket_name_targ)
print('\n')
print(blobs_list)
print('\n')
blobs_list_set = set(map(lambda blob: blob.name, blobs_list))
print(blobs_list_set)
print('\n')
dl_dir = export['name']
blob_file = export_filename_dl
source_file_name = os.path.join(working_dir_param, dl_dir, blob_file)
print(source_file_name)
destination_blob_name = blob_file + 'api-upload'
upload_blob(bucket_name_targ, source_file_name, destination_blob_name)
#if blob_file not in blobs_list_set:
#    print('AT UPLOAD CONDITION BLOCK')
#    upload_blob(bucket_name_targ, source_file_name, destination_blob_name)
#else:
#    print('Upload did not succeed')
I got it working. I am using a custom Ansible module whose parameters include one for the Linux working directory (pwd) where the exports are downloaded, and the upload function is passed a string path built from it. It turns out I had to correctly combine that module parameter with the downloaded directory/file path before passing it in as source_file_name, like so:
"pwd": {"default": None, "type": "str"}
working_dir_param = module.params["pwd"]
path_tail_actual = directory_main + filename_main
source_file_name_actual = working_dir_param + path_tail_actual
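Since the root cause here was a wrongly assembled path, a small guard before the upload can make this kind of mistake visible. The following is only a sketch: build_source_path is a hypothetical helper name, and it uses os.path.join rather than string concatenation so that missing or doubled separators are not an issue.

```python
import os


def build_source_path(working_dir, export_dir, filename):
    """Join the path pieces and fail loudly if no file exists there."""
    path = os.path.join(working_dir, export_dir, filename)
    if not os.path.isfile(path):
        raise FileNotFoundError("Nothing to upload at {!r}".format(path))
    return path
```

The returned value can then be passed as source_file_name to upload_blob.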

gzip an image file through cloud function on storage trigger

I am trying to automate gzipping of images uploaded to a Cloud Storage bucket.
Every time I upload an image, I want a Cloud Function to run Python code that gzips it and moves it to another bucket.
My code is not running. It gives me a "File not found" error. Also, what is the right way to specify the full location?
My code is:
from google.cloud import storage
import gzip
import shutil
client = storage.Client()
def hello_gcs(event, context):
    """Triggered by a change to a Cloud Storage bucket.
    Args:
        event (dict): Event payload.
        context (google.cloud.functions.Context): Metadata for
        the event.
    """
    with open("/"+event['name'],'rb') as f_input :
        with gzip.open("'/tmp/'+event['name']+'.gz'","wb") as f_output:
            shutil.copyfileobj(f_input,f_output)
    source_bucket= client.bucket(event['bucket'])
    source_blob = source_bucket.blob("/tmp/" + event['name'])
    destination_bucket = client.bucket('baalti2')
    blob_copy = source_bucket.copy_blob(
        source_blob, destination_bucket, event['name']
    )
    print(
        "Blob {} in bucket {} moved to blob {} in bucket {}.".format(
            source_blob.name,
            source_bucket.name,
            blob_copy.name,
            destination_bucket.name,
        )
    )
You are using Linux file system APIs (open, shutil.copyfileobj) to access Cloud Storage, which will not work. Instead, copy the file from the bucket to local storage, gzip the file, and then copy the gzipped file to the destination bucket. Use the Cloud Storage APIs to interact with Cloud Storage.
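The three steps above could be sketched roughly as follows. This is an illustration under assumptions, not a drop-in fix: gzip_object is a hypothetical helper, client is assumed to be a google.cloud.storage.Client, and the object is compressed in memory rather than under /tmp, which only suits files that fit comfortably in the function's memory.

```python
import gzip


def gzip_object(client, source_bucket_name, object_name, destination_bucket_name):
    """Download an object, gzip it in memory, and upload the result.

    Assumption: `client` is a google.cloud.storage.Client.
    """
    # Step 1: fetch the object through the Cloud Storage API, not open().
    source_blob = client.bucket(source_bucket_name).blob(object_name)
    raw = source_blob.download_as_bytes()

    # Step 2: compress the bytes.
    compressed = gzip.compress(raw)

    # Step 3: upload the compressed bytes to the destination bucket.
    destination_name = object_name + ".gz"
    destination_blob = client.bucket(destination_bucket_name).blob(destination_name)
    destination_blob.upload_from_string(compressed, content_type="application/gzip")
    return destination_name
```

Inside the Cloud Function from the question, the call would look like gzip_object(client, event['bucket'], event['name'], 'baalti2').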

What type of API or references/commands do I need when uploading a file and also connecting to a MongoDB from a Google Function?

I'm now trying to make this simple Python script that will run when called through an API call or through a Google Function. I'm very, very new to GCP and Python as I'm more familiar with Azure and PowerShell, but I need to know what I need to use/call in order to upload a file to a bucket and also read the file information, plus then connect to a MongoDB database.
Here is the flow of what I need to do:
API/function will be called with its URL and attached to it will be an actual file, like a seismic file type.
When the API/function is called, a Python script will run that will grab that file and upload it to a bucket.
Then I need to run commands against the uploaded file to retrieve items like "version","company","wellname", etc.
Then I want to upload a document, with all of these values, into a MongoDB database.
We're basically trying to replicate something we did in Azure with Functions and a CosmosDB instance. There, we created a function that would upload the file to Azure storage, then retrieve values from the file, which I believe is the metadata of it. After, we would upload a document to CosmosDB with these values. It's a way of recording values retrieved from the file itself. Any help would be appreciated as this is part of a POC I'm trying to present on!! Please ask any questions!
To answer your questions:
Here's sample code showing how to upload a file to a Cloud Storage bucket using Python:
from google.cloud import storage
def upload_blob(bucket_name, source_file_name, destination_blob_name):
    """Uploads a file to the bucket."""
    # bucket_name = "your-bucket-name"
    # source_file_name = "local/path/to/file"
    # destination_blob_name = "storage-object-name"
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name)
    print(
        "File {} uploaded to {}.".format(
            source_file_name, destination_blob_name
        )
    )
Make sure that the Cloud Storage client library is listed in your requirements.txt. Example:
google-cloud-storage>=1.33.0
To retrieve and view the object metadata, here's the code for reference:
from google.cloud import storage
def blob_metadata(bucket_name, blob_name):
    """Prints out a blob's metadata."""
    # bucket_name = 'your-bucket-name'
    # blob_name = 'your-object-name'
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.get_blob(blob_name)
    print("Blob: {}".format(blob.name))
    print("Bucket: {}".format(blob.bucket.name))
    print("Storage class: {}".format(blob.storage_class))
    print("ID: {}".format(blob.id))
    print("Size: {} bytes".format(blob.size))
    ...
Finally, to connect to MongoDB using Python, here's a sample project which could help you understand how it works.
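To tie the last two steps together, here is a minimal sketch of turning an uploaded object's metadata into a MongoDB document. The helper name record_blob_metadata and the document field names are assumptions for illustration. It expects blob to be a google.cloud.storage Blob loaded via bucket.get_blob, and collection to be a pymongo Collection. File-specific values such as "version" or "wellname" would have to come from your own parser, which is not shown.

```python
def record_blob_metadata(blob, collection):
    """Insert one document describing an uploaded object.

    Assumptions: `blob` is a loaded google.cloud.storage Blob and
    `collection` is a pymongo Collection (only insert_one is used).
    """
    document = {
        "name": blob.name,
        "bucket": blob.bucket.name,
        "size": blob.size,
        "content_type": blob.content_type,
    }
    result = collection.insert_one(document)
    return result.inserted_id
```

With pymongo, a collection could be obtained as pymongo.MongoClient(uri)["my_db"]["files"], where uri, my_db and files are placeholders.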

Not able to upload_from_filename files into GCP bucket using python script and GCP service account cred

• Installed Python 3.7.2
• Created a GCP service account and gave it the Owner role; also enabled the Storage API and created a Cloud Storage bucket
• Now I’m trying to upload files to a Cloud Storage folder using a Python script, but I can't. Using the same structure, I’m able to create a new Cloud Storage bucket and edit existing files in it
• The Python script is attached below
Ref used:
https://googleapis.github.io/google-cloud-python/latest/storage/blobs.html
https://cloud.google.com/storage/docs/uploading-objects#storage-upload-object-python
from google.cloud import storage
bucket_name='buckettest'
source_file_name='D:/file.txt'
source_file_name1='D:/jenkins structure.png'
destination_blob_name='test/'
def upload_blob(bucket_name, source_file_name, destination_blob_name):
    """Uploads a file to the bucket."""
    client = storage.Client.from_service_account_json('D:\gmailseviceaccount.json')
    bucket = client.create_bucket('bucketcreate')
    bucket = client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name)
    blob.upload_from_filename(source_file_name1)
    print('File {} uploaded to {}.'.format(
        source_file_name,
        destination_blob_name))
if __name__ == '__main__':
    upload_blob(bucket_name, source_file_name, destination_blob_name)
I was able to run your code and debug it. Below is what I used, with an explanation of the changes I made.
As you did, I gave my service account the Owner role and was able to upload. I recommend following the best practice of least privilege once you're done testing.
I removed client.create_bucket: since bucket names must be unique, we shouldn't hard-code a bucket name to create. You can come up with a naming convention for your needs; for testing, I simply removed it.
I fixed the variable destination_blob_name, since you were using it as a folder for the file to be placed in. This does not work because GCS does not use folders; it just uses object names. What was happening is that you were actually "converting" your TXT file into a folder named 'test'. For a better understanding, I recommend looking through the documentation on How Sub-directories Work.
from google.cloud import storage
bucket_name='bucket-test-18698335'
source_file_name='./hello.txt'
destination_blob_name='test/hello.txt'
def upload_blob(bucket_name, source_file_name, destination_blob_name):
    """Uploads a file to the bucket."""
    client = storage.Client.from_service_account_json('./test.json')
    bucket = client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name)
    print('File {} uploaded to {}.'.format(
        source_file_name,
        destination_blob_name))
if __name__ == '__main__':
    upload_blob(bucket_name, source_file_name, destination_blob_name)
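Because the "folder" is just a prefix of the object name, each file needs its own full destination name. A small helper can derive it from the local path. This is only a sketch, and object_name_for is a hypothetical name, not part of the client library.

```python
import os
import posixpath


def object_name_for(local_path, prefix=""):
    """Build an object name such as 'test/hello.txt' from a local file path."""
    filename = os.path.basename(local_path)
    # Object names always use '/', whatever the local OS separator is.
    return posixpath.join(prefix.strip("/"), filename)
```

For example, object_name_for('./hello.txt', 'test/') gives 'test/hello.txt', which can be passed as destination_blob_name. Uploading a second file then means building a second name and a second blob, instead of reusing one blob for both uploads.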

Transfer file from URL to Cloud Storage

I'm a Ruby dev trying my hand at Google Cloud Functions written in Python and have hit a wall with transferring a remote file from a given URL to Google Cloud Storage (GCS).
In an equivalent RoR app I download to the app's ephemeral storage and then upload to GCS.
I am hoping there's a way to simply 'download' the remote file to my GCS bucket via the Cloud Function.
Here's a simplified example of what I am doing with some comments, the real code fetches the URLs from a private API, but that works fine and isn't where the issue is.
from google.cloud import storage
project_id = 'my-project'
bucket_name = 'my-bucket'
destination_blob_name = 'upload.test'
storage_client = storage.Client.from_service_account_json('my_creds.json')
# This works fine
#source_file_name = 'localfile.txt'
# When using a remote URL I get 'IOError: [Errno 2] No such file or directory'
source_file_name = 'http://www.hospiceofmontezuma.org/wp-content/uploads/2017/10/confused-man.jpg'
def upload_blob(bucket_name, source_file_name, destination_blob_name):
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name)
upload_blob(bucket_name, source_file_name, destination_blob_name)
Thanks in advance.
It is not possible to upload a file to Google Cloud Storage directly from a URL. Since you are running the script from a local environment, the file contents that you want to upload need to be in that same environment. This means that the contents of the URL need to be stored either in memory or in a file.
Here are two examples showing how to do it, based on your code:
Option 1: You can use the wget module, which fetches the URL and downloads its contents into a local file (similar to the wget CLI command). Note that this means the file is stored locally and then uploaded from that file. I added the os.remove line to remove the file once the upload is done.
from google.cloud import storage
import wget
import os
project_id = 'my-project'
bucket_name = 'my-bucket'
destination_blob_name = 'upload.test'
storage_client = storage.Client.from_service_account_json('my_creds.json')
source_file_name = 'http://www.hospiceofmontezuma.org/wp-content/uploads/2017/10/confused-man.jpg'
def upload_blob(bucket_name, source_file_name, destination_blob_name):
    # Download the URL contents into a local file
    filename = wget.download(source_file_name)
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(filename, content_type='image/jpeg')
    # Remove the local copy once the upload is done
    os.remove(filename)
upload_blob(bucket_name, source_file_name, destination_blob_name)
Option 2: Use the urllib module. It works similarly to the wget module, but instead of writing into a file it reads the contents into a variable. Note that I wrote this example in Python 3; there are some differences if you plan to run your script in Python 2.X.
from google.cloud import storage
import urllib.request
project_id = 'my-project'
bucket_name = 'my-bucket'
destination_blob_name = 'upload.test'
storage_client = storage.Client.from_service_account_json('my_creds.json')
source_file_name = 'http://www.hospiceofmontezuma.org/wp-content/uploads/2017/10/confused-man.jpg'
def upload_blob(bucket_name, source_file_name, destination_blob_name):
    # Open the URL; the contents are read into memory below
    file = urllib.request.urlopen(source_file_name)
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_string(file.read(), content_type='image/jpeg')
upload_blob(bucket_name, source_file_name, destination_blob_name)
Directly transferring URLs into GCS is possible through the Storage Transfer Service, but setting up a transfer job for a single URL is a lot of overhead. That sort of solution is targeted at situations with millions of URLs that need to become GCS objects.
Instead, I recommend writing a job that pumps an incoming stream from reading a URL into a write stream to GCS and running that somewhere in the Google Cloud close to the bucket.
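A minimal sketch of that "pump one stream into the other" idea follows. It assumes a google-cloud-storage version that provides Blob.open() for file-like writes. The helper name stream_to_blob and the 1 MiB chunk size are illustrative choices, not something prescribed by the library.

```python
import shutil


def stream_to_blob(source, blob, chunk_size=1024 * 1024):
    """Copy a readable binary stream into a Cloud Storage object in chunks.

    Assumptions: `blob` is a google.cloud.storage Blob from a library version
    that provides Blob.open(); `source` is any binary file-like object.
    """
    with blob.open("wb") as writer:
        # Reads chunk_size bytes at a time, so the whole file is never
        # held in memory or written to local disk.
        shutil.copyfileobj(source, writer, chunk_size)
```

With the names from the question, the source could be the response returned by urllib.request.urlopen(source_file_name) and the blob could be bucket.blob(destination_blob_name).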
