On a regular (non-flexible) instance of Google App Engine, you can use the Blobstore API to create a URL that lets a user upload a file directly into your Blobstore. When the upload completes, your App Engine application is notified of the file's location and can process it. An example of the Python code is:
from google.appengine.ext import blobstore
upload_url = blobstore.create_upload_url('/upload_photo')
See the Blobstore docs.
Switching to the Google App Engine Flexible Environment, the Blobstore has largely been replaced by Cloud Storage. In that case, is there an equivalent of create_upload_url?
My current implementation takes a standard file upload to a Python Flask application and then proceeds with something like:
from flask import request
from google.cloud import storage
uploaded_file = request.files.get('file')
gcs = storage.Client()
bucket = gcs.get_bucket(bucket_name)
blob = bucket.blob(blob_name)
blob.upload_from_string(
    uploaded_file.read(),
    content_type=uploaded_file.content_type
)
This seems to double the network load compared with create_upload_url, because the file comes into my App Engine instance and is then immediately copied out. The uploader has to wait extra time while this happens, and presumably I also incur extra App Engine charges for it. Is there a better way?
I have workers that later process the uploaded file, but I tend to download the file from Cloud Storage again in their code, because I don't think you can assume the worker will still have access to a file stored on the instance file system. So I get no benefit from having the file uploaded to my instance rather than directly to its storage location.
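For reference, the worker side is just a plain re-download; a minimal sketch of that step, with placeholder bucket and object names:
from google.cloud import storage

def fetch_upload(bucket_name: str, blob_name: str) -> bytes:
    # Re-read the uploaded object from Cloud Storage in the worker,
    # rather than relying on the instance's local file system.
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    return blob.download_as_bytes()  # download_as_string() on older client versions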
I have started using create_resumable_upload_session to create an upload URL that our client-side application can upload a file to. Something like:
from google.cloud import storage

gcs = storage.Client()
bucket = gcs.get_bucket(BUCKET)
blob = bucket.blob(blob_name)
signed_url = blob.create_resumable_upload_session(content_type=content_type)
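On the client side, the upload itself is then an HTTP PUT of the file bytes to the URL returned above. A rough sketch using the requests library (single-request upload, no chunking, error handling kept minimal):
import requests

def upload_to_session(session_url: str, data: bytes):
    # The object's content type was fixed when the session was created,
    # so a single PUT of the full payload completes the upload.
    resp = requests.put(session_url, data=data)
    resp.raise_for_status()
    return resp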
Then, when the client has successfully uploaded a file to our storage, I subscribe to a notification of the object creation using Cloud Pub/Sub Notifications for Cloud Storage.
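The subscriber side can be a small Pub/Sub pull worker; a sketch with placeholder project and subscription names:
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "gcs-uploads-sub")

def callback(message):
    # GCS notifications carry the event type and object location as message attributes
    attrs = message.attributes
    if attrs.get("eventType") == "OBJECT_FINALIZE":
        print("New object:", attrs.get("bucketId"), attrs.get("objectId"))
    message.ack()

future = subscriber.subscribe(subscription_path, callback=callback)
future.result()  # blocks until cancelled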
Each blob created with the new Google Cloud Storage Client has a public_url property:
from flask import request
from google.cloud import storage
uploaded_file = request.files.get('file')
gcs = storage.Client()
bucket = gcs.get_bucket(bucket_name)
blob = bucket.blob('blob_name')
blob.upload_from_string(
    uploaded_file.read(),
    content_type=uploaded_file.content_type
)
url = blob.public_url
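Note that public_url is only fetchable anonymously if the object itself is publicly readable; on a bucket with fine-grained ACLs (not uniform bucket-level access), one way is to make that single object public before handing out its URL:
blob.make_public()  # grants allUsers read access to this object only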
--
With the Blobstore, a GAE system handler in your instance takes care of the uploaded file that you pass to the created upload URL. I'm not sure it's an issue to handle it yourself in your code. If your current approach is problematic, you might want to consider doing the upload client side and not passing the file through App Engine at all. GCS has a REST API, and the Cloud Storage client uses it underneath, so you can read and upload the file directly to GCS from the client side if that's more convenient. There's firebase.google.com/docs/storage/web/upload-files to ease you through the process.
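For example, a rough server-side sketch that hands the browser a short-lived V4 signed URL it can PUT the file to directly (bucket name, object name, and expiry are placeholders; this requires service-account credentials that can sign):
import datetime
from google.cloud import storage

def make_upload_url(bucket_name, object_name, content_type):
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(object_name)
    # The client must send the same Content-Type header when it PUTs the file.
    return blob.generate_signed_url(
        version="v4",
        expiration=datetime.timedelta(minutes=15),
        method="PUT",
        content_type=content_type,
    )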
Related
I want to create a CSV file from a pandas data frame in a Google Storage bucket using the Colab tool.
Right now we use our Gmail authentication to load the CSV file into the storage bucket using the command below:
df.to_csv("gs://Jobs/data.csv")
I have checked the Google links below:
https://cloud.google.com/docs/authentication/production#linux-or-macos
Currently, we use the code below to get credentials from a service account:
from google.oauth2 import service_account

def getCredentialsFromServiceAccount(path: str) -> service_account.Credentials:
    # from_service_account_file expects a file path;
    # from_service_account_info expects an already-parsed dict
    return service_account.Credentials.from_service_account_file(path)
Kindly suggest
There are two main approaches you can use to upload a CSV to Cloud Storage from within Colab. Both store the CSV locally first and then upload it:
1. Use gsutil from within Colab.
2. Use the Cloud Storage Python Client Library.
The first approach is the easiest, but it authenticates as the user signed into Colab rather than as a service account.
from google.colab import auth
auth.authenticate_user()
bucket_name = 'my-bucket'
df.to_csv('data.csv', index = False)
!gsutil cp 'data.csv' 'gs://{bucket_name}/'
The next approach uses the client library and authenticates with a service account.
from google.cloud import storage
bucket_name = 'my-bucket'
# store csv locally
df.to_csv('data.csv', index = False)
# start storage client
client = storage.Client.from_service_account_json("path/to/key.json")
# get bucket
bucket = client.bucket(bucket_name)
# create blob (where you want to store csv within bucket)
blob = bucket.blob("jobs/data.csv")
# upload blob to bucket
blob.upload_from_filename("data.csv")
I am uploading a file to a Cloud Storage bucket using the Python SDK:
from google.cloud import storage
bucket = storage.Client().get_bucket('mybucket')
df = # pandas df to save
csv = df.to_csv(index=False)
output = 'test.csv'
blob = bucket.blob(output)
blob.upload_from_string(csv)
How can I get the response to know if the file was uploaded successfully? I need to log the response to notify the user about the operation.
I tried with:
response = blob.upload_from_string(csv)
but it always returns None, even when the operation has succeeded.
You can try with the tqdm library to show upload progress.
import os
from google.cloud import storage
from tqdm import tqdm

def upload_function(client, bucket_name, source, dest, content_type=None):
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(dest)
    with open(source, "rb") as in_file:
        total_bytes = os.fstat(in_file.fileno()).st_size
        with tqdm.wrapattr(in_file, "read", total=total_bytes, miniters=1,
                           desc="upload to %s" % bucket_name) as file_obj:
            blob.upload_from_file(
                file_obj,
                content_type=content_type,
                size=total_bytes,
            )
    return blob

if __name__ == "__main__":
    # example local file path and destination object name
    upload_function(storage.Client(), "bucket", r"C:\files\file.txt", "blob.txt", "text/plain")
Regarding how to get notifications about changes made to your buckets, there are a few ways you could also try:
Using Pub/Sub - This is the recommended way: Pub/Sub notifications send information about changes to objects in your buckets to Pub/Sub, where the information is added to a Pub/Sub topic of your choice in the form of messages. Here you will find an example using Python, as in your case, as well as other ways such as gsutil, other supported languages, or the REST APIs (a sketch of creating such a notification with the Python client follows this list).
Object Change Notification with watchbucket: this creates a notification channel that sends notification events to the given application URL for the given bucket, using a gsutil command.
Cloud Functions with Google Cloud Storage Triggers: event-driven functions handle events from Google Cloud Storage, and you can configure these triggers to fire in response to various events inside a bucket (object creation, deletion, archiving, and metadata updates). Here there is some documentation on how to implement it.
Another way is using Eventarc to build event-driven architectures; it offers a standardized solution to manage the flow of state changes, called events, between decoupled microservices. Eventarc routes these events to Cloud Run while managing delivery, security, authorization, observability, and error handling for you. Here there is a guide on how to implement it.
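As mentioned for the first option, here is a rough sketch of creating the Pub/Sub notification configuration with the Python client (bucket and topic names are placeholders, and the topic must already exist):
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")
notification = bucket.notification(
    topic_name="my-gcs-topic",
    event_types=["OBJECT_FINALIZE"],   # only fire for newly created or overwritten objects
    payload_format="JSON_API_V1",
)
notification.create()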
Here you'll be able to find related posts with the same issue and answers:
Using Storage-triggered Cloud Function.
With Object Change Notification and Cloud Pub/Sub Notifications for Cloud Storage.
Answer with a Cloud Pub/Sub topic example.
You can check whether the upload raises an error and, if so, use the exception's response methods:
from google.api_core import exceptions

def upload(blob, content):
    try:
        blob.upload_from_string(content)
    except exceptions.GoogleAPICallError as e:
        # Google API call errors expose the underlying HTTP response
        status_code = e.response.status_code
        status_desc = e.response.json()['error']['message']
    else:
        status_code = 200
        status_desc = 'success'
    finally:
        return status_code, status_desc
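A possible way of calling it and logging the outcome, reusing the csv string and bucket from the question (names are placeholders):
from google.cloud import storage

client = storage.Client()
blob = client.bucket('mybucket').blob('test.csv')
status_code, status_desc = upload(blob, csv)
print('upload finished: {} {}'.format(status_code, status_desc))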
Refs:
https://googleapis.dev/python/google-api-core/latest/_modules/google/api_core/exceptions.html
https://docs.python.org/3/tutorial/errors.html
I'm now trying to make a simple Python script that will run when called through an API call or through a Google Cloud Function. I'm very, very new to GCP and Python, as I'm more familiar with Azure and PowerShell, but I need to know what to use/call in order to upload a file to a bucket, read the file's information, and then connect to a MongoDB database.
Here is the flow of what I need to do:
The API/function will be called with its URL, and attached to it will be an actual file, such as a seismic file.
When the API/function is called, a Python script will run that grabs that file and uploads it to a bucket.
Then I need to run commands against the uploaded file to retrieve items like "version", "company", "wellname", etc.
Then I want to upload a document with all of these values into a MongoDB database.
We're basically trying to replicate something we did in Azure with Functions and a Cosmos DB instance. There, we created a function that would upload the file to Azure Storage, then retrieve values from the file, which I believe is its metadata. Afterwards, we would upload a document to Cosmos DB with these values. It's a way of recording values retrieved from the file itself. Any help would be appreciated, as this is part of a POC I'm trying to present on! Please ask any questions!
To answer your questions:
Here's code to upload a file to a Cloud Storage bucket using Python:
from google.cloud import storage

def upload_blob(bucket_name, source_file_name, destination_blob_name):
    """Uploads a file to the bucket."""
    # bucket_name = "your-bucket-name"
    # source_file_name = "local/path/to/file"
    # destination_blob_name = "storage-object-name"

    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)

    blob.upload_from_filename(source_file_name)

    print(
        "File {} uploaded to {}.".format(
            source_file_name, destination_blob_name
        )
    )
Make sure that you have the Cloud Storage client library listed in your requirements.txt. Example:
google-cloud-storage>=1.33.0
To retrieve and view the object metadata, here's the code, with the documentation linked as a reference:
from google.cloud import storage

def blob_metadata(bucket_name, blob_name):
    """Prints out a blob's metadata."""
    # bucket_name = 'your-bucket-name'
    # blob_name = 'your-object-name'

    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.get_blob(blob_name)

    print("Blob: {}".format(blob.name))
    print("Bucket: {}".format(blob.bucket.name))
    print("Storage class: {}".format(blob.storage_class))
    print("ID: {}".format(blob.id))
    print("Size: {} bytes".format(blob.size))
    ...
Finally, to connect to MongoDB using Python, here's a sample project which could help you understand how it works.
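If it helps, a bare-bones sketch of that last step with pymongo, writing the extracted values as one document (connection string, database, and collection names are placeholders):
from pymongo import MongoClient

def save_metadata(doc):
    # The connection string would normally come from an environment variable or Secret Manager.
    client = MongoClient("mongodb+srv://user:password@cluster.example.mongodb.net")
    collection = client["seismic"]["files"]
    return collection.insert_one(doc).inserted_id

# e.g. save_metadata({"version": "...", "company": "...", "wellname": "..."})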
I am trying to open a file I have in Google Cloud Storage using the cloudstorage library.
I get the error that module cloudstorage has no attribute 'open'.
I want to specify a read buffer size when I load the file from Google Cloud Storage into Google BigQuery. This is the function that I wish to use for that. The parameters require a file-like object:
Client.load_table_from_file(file_obj, destination, rewind=False, size=None, num_retries=6, job_id=None, job_id_prefix=None, location=None, project=None, job_config=None)
Upload the contents of this table from a file-like object.
Is there any other way to pass the Cloud Storage file as an object to this method? Or perhaps another way to load a file from Cloud Storage into Google BigQuery while specifying a read buffer size?
from google.cloud import bigquery
from google.cloud import storage
import cloudstorage as gcs

def hello_gcs(event, context):
    gcs_file = gcs.open('no-trigger/transaction.csv')

    job_config = bigquery.LoadJobConfig()
    job_config.autodetect = False
    job_config.max_bad_records = 1
    job_config.create_disposition = 'CREATE_IF_NEEDED'
    job_config.source_format = bigquery.SourceFormat.CSV

    load_job = bclient.load_table_from_file(
        gcs_file,
        dataset_ref.table(temptablename),
        location='asia-northeast1',
        size=2147483648,
        job_config=job_config)  # API request
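One possible sketch, assuming a recent google-cloud-storage release where Blob.open is available (bucket, dataset, and table names are placeholders): open the object as a file-like stream with a chosen chunk size and pass it to load_table_from_file.
from google.cloud import bigquery, storage

bclient = bigquery.Client()
gcs = storage.Client()
blob = gcs.bucket("my-bucket").blob("no-trigger/transaction.csv")

job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.CSV

# chunk_size controls how much is buffered per read from Cloud Storage
with blob.open("rb", chunk_size=32 * 1024 * 1024) as gcs_file:
    load_job = bclient.load_table_from_file(
        gcs_file,
        "my-project.my_dataset.my_table",
        location="asia-northeast1",
        job_config=job_config,
    )
load_job.result()  # wait for the load job to finish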
We currently use blobstore.create_upload_url to create upload URLs to be used on the frontend; see Uploading a blob.
However, with Google's push toward Google Cloud Storage (GCS), I'd like to use GCS instead of the Blobstore. We currently use blobstore.create_upload_url, but I can't find anything equivalent in the GCS documentation. Am I missing something? Is there a better way to upload files to GCS from the frontend?
Thanks
Rob
If you provide gs_bucket_name to blobstore.create_upload_url, the file will be stored in GCS instead of the Blobstore. This is described in the official documentation: Using the Blobstore API with Google Cloud Storage.
blobstore.create_upload_url(
    success_path=webapp2.uri_for('upload'),
    gs_bucket_name="mybucket/dest/location")
You can take a look at a simple upload handler implementation made with webapp2:
from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers
import webapp2
import cloudstorage as gcs


class Upload(blobstore_handlers.BlobstoreUploadHandler):
    """Upload handler

    To upload a new file you need to follow these steps:
    1. send a GET request to /upload to retrieve the upload session URL
    2. send a POST request to the URL retrieved in step 1
    """

    def post(self):
        """Copy uploaded files to the provided bucket destination"""
        fileinfo = self.get_file_infos()[0]
        uploadpath = fileinfo.gs_object_name[3:]  # strip the leading '/gs' prefix
        stat = gcs.stat(uploadpath)

        # remove auto generated filename from upload path
        destpath = "/".join(stat.filename.split("/")[:-1])

        # copy file to desired location with proper filename
        gcs.copy2(uploadpath, destpath)

        # remove file from uploadpath
        gcs.delete(uploadpath)

    def get(self):
        """Returns URL to open upload session"""
        self.response.write(blobstore.create_upload_url(
            success_path=webapp2.uri_for('upload'),
            gs_bucket_name="mybucket/subdir/subdir2/filename.ext"))