I developed a Flask app on localhost and now I want to deploy it to Google Cloud Platform. The problem occurs when trying to read/write a file in Google Cloud Storage.
First I installed google-cloud-storage via pip:
pip install --upgrade google-cloud-storage
then my code:
from google.cloud import storage
data_file_path = "url-to-txt-file-in-google-storage"
with storage.open(data_file_path, "wb") as gcsFile:
    gcsFile.write("xx")
At this point error says: AttributeError: module 'google.cloud.storage' has no attribute 'open'
I also tried it with pip install GoogleAppEngineCloudStorageClient but it didn't work either.
What should I do to make it work? How could I read and write to a file in google cloud storage without downloading it?
First, the error is clear: google.cloud.storage is a Python package, not an object, so it has no open() method (or any other method). You can see the source code here: https://github.com/googleapis/python-storage/tree/main/google/cloud/storage
Basically, you need to create a Client first. If you just want to copy the file, here is an example:
https://cloud.google.com/storage/docs/copying-renaming-moving-objects#storage-copy-object-python
from google.cloud import storage
def copy_blob(
    bucket_name, blob_name, destination_bucket_name, destination_blob_name
):
    """Copies a blob from one bucket to another with a new name."""
    # bucket_name = "your-bucket-name"
    # blob_name = "your-object-name"
    # destination_bucket_name = "destination-bucket-name"
    # destination_blob_name = "destination-object-name"

    storage_client = storage.Client()

    source_bucket = storage_client.bucket(bucket_name)
    source_blob = source_bucket.blob(blob_name)
    destination_bucket = storage_client.bucket(destination_bucket_name)

    blob_copy = source_bucket.copy_blob(
        source_blob, destination_bucket, destination_blob_name
    )

    print(
        "Blob {} in bucket {} copied to blob {} in bucket {}.".format(
            source_blob.name,
            source_bucket.name,
            blob_copy.name,
            destination_bucket.name,
        )
    )
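Since the question also asks how to read and write a GCS object without downloading it, note that recent versions of google-cloud-storage expose a file-like interface via Blob.open(). A minimal sketch, assuming the default credentials are set up (the bucket and object names are placeholders):

from google.cloud import storage

client = storage.Client()
blob = client.bucket("your-bucket-name").blob("path/to/file.txt")

# Write text straight to the object; no local file involved.
with blob.open("w") as f:
    f.write("xx")

# Stream it back as text.
with blob.open("r") as f:
    print(f.read())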
Related
If you have a gs:// blob, how do you download that file from GCS to a Vertex AI notebook in GCP using the Python client library?
To download a GCS file to a Vertex AI notebook, refer to the following Python code:
from google.cloud import storage
def download_blob(bucket_name, source_blob_name, destination_file_name):
    """Downloads a blob from the bucket."""
    # bucket_name = "your-bucket-name"
    # The ID of your GCS object
    # source_blob_name = "storage-object-name"
    # destination_file_name = "/path/to/file"

    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(source_blob_name)
    blob.download_to_filename(destination_file_name)

    print(
        "Downloaded storage object {} from bucket {} to file {}.".format(
            source_blob_name, bucket_name, destination_file_name
        )
    )
Alternatively, if you want to download a GCS file to a Jupyter directory, you can use the command:
!gsutil cp gs://BUCKET_NAME/OBJECT_NAME JUPYTER_LOCATION
I have followed the Google Cloud Storage documentation for uploading a local file on a server as a blob to a bucket I created, yet it does not work. It does work manually via the Cloud Console GUI, however. Note that my code for downloading an export-generated object from a bucket in a Cloud Storage sink works, which also calls the Cloud Storage API, but the upload does not. I have tried to get error info for debugging on stdout or stderr, to no avail. My credentials/authentication all check out, as I am able to use them for downloading. Also, the bucket I am trying to upload to has already been created in another function call; could it be that the storage client's bucket method is attempting to instantiate an already existing bucket and erroring out before reaching the upload blob method? Any insight is appreciated.
Python reference for the Blob.upload_from_filename function
GCS API guide for uploading objects to a bucket
def upload_blob(bucket_name, source_file_name, destination_blob_name):
    """Uploads a file to the bucket."""
    # The ID of your GCS bucket
    # bucket_name = "your-bucket-name"
    # The path to your file to upload
    # source_file_name = "local/path/to/file"
    # The ID of your GCS object
    # destination_blob_name = "storage-object-name"

    storage_client = storage.Client.from_service_account_json('/opt/gws/creds/gws-vault-data-export-ops-bfd51297e810.json')
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name)

    print(
        "File {} uploaded to {}.".format(
            source_file_name, destination_blob_name
        )
    )

print('\n')
print('************List Blobs***********')
blobs_list = list_blobs(bucket_name_targ)
print('\n')
print(blobs_list)
print('\n')
blobs_list_set = set(map(lambda blob: blob.name, blobs_list))
print(blobs_list_set)
print('\n')
dl_dir = export['name']
blob_file = export_filename_dl
source_file_name = os.path.join(working_dir_param, dl_dir, blob_file)
print(source_file_name)
destination_blob_name = blob_file + 'api-upload'
upload_blob(bucket_name_targ, source_file_name, destination_blob_name)

#if blob_file not in blobs_list_set:
    #print('AT UPLOAD CONDITION BLOCK')
    #upload_blob(bucket_name_targ, source_file_name, destination_blob_name)
#else:
    #print('Upload did not succeed')
I got it working. It turns out that, since I am using a custom Ansible module with field parameters, including one for the Linux pwd used for the export downloads (which the upload function gets passed as a string path), I had to correctly combine the custom module parameter with the downloaded directory/file path to pass into source_file_name. Like so:
"pwd": {"default": None, "type": "str"}
working_dir_param = module.params["pwd"]
path_tail_actual = directory_main + filename_main
source_file_name_actual = working_dir_param + path_tail_actual
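For what it's worth, storage_client.bucket(bucket_name) only builds a local Bucket reference and makes no API call, so pointing it at an already existing bucket would not have been the failure; the path really was the issue. A small sketch of the same join using os.path.join instead of string concatenation (the values are placeholders standing in for the Ansible parameter and the export's directory/file name above):

import os

# Placeholder values standing in for module.params["pwd"], directory_main
# and filename_main from the snippet above.
working_dir_param = "/opt/gws/exports"
directory_main = "export-12345"
filename_main = "mailbox.mbox"

# os.path.join avoids missing or duplicated slashes between the segments.
source_file_name_actual = os.path.join(working_dir_param, directory_main, filename_main)
print(source_file_name_actual)  # /opt/gws/exports/export-12345/mailbox.mbox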
I'm working with GCP buckets to store data. My first approach to reading/writing files to/from the buckets was:
def upload_blob(bucket_name, source_file_name, destination_blob_name, credentials):
    """Uploads a file to the bucket."""
    client = storage.Client.from_service_account_json(credentials)
    bucket = client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name)
    print('File {} uploaded to {}.'.format(
        source_file_name,
        destination_blob_name))

def download_blob(bucket_name, source_blob_name, destination_file_name, credentials):
    """Downloads a blob from the bucket."""
    storage_client = storage.Client.from_service_account_json(credentials)
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(source_blob_name)
    blob.download_to_filename(destination_file_name)
    print('Blob {} downloaded to {}.'.format(
        source_blob_name,
        destination_file_name))
This works fine, but I want to use environment variables so I don't have to keep the credential files around all the time. As I understand it, if the credentials are not provided, i.e. changing:
client = storage.Client.from_service_account_json(credentials)
to:
client = storage.Client()
then Google will search for the default credentials, which can be set by running:
export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/[FILE_NAME].json"
I'm doing this and don't get any error. But when I try to access the bucket, I get the following error:
DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started
I followed the link and created a new key, and tried with that one instead, but I still get the same error.
You are getting the error because the interpreter is unable to fetch the credentials needed to proceed.
client = storage.Client.from_service_account_json(credentials)
On this line, are you passing the credentials themselves or the path to the file? Can you try pointing it at the serviceaccount.json file directly and then try again?
Solution: please refer to the code snippet below,
from google.cloud import storage
client = storage.Client.from_service_account_json('serviceaccount.json')
where the 'serviceaccount.json' file is kept in the same project repo.
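If you would rather stick with the GOOGLE_APPLICATION_CREDENTIALS environment variable, a common pitfall is that the variable was exported in a different shell session (or after the notebook/IDE was already running) than the process executing the code. A quick sketch to check it and, if needed, set it from Python itself (the path is a placeholder):

import os
from google.cloud import storage

# See whether the variable is actually visible to this process.
print(os.environ.get("GOOGLE_APPLICATION_CREDENTIALS"))

# Fallback: set it programmatically before creating the client
# (placeholder path; point it at your own key file).
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/home/user/Downloads/key.json"

client = storage.Client()
print(list(client.list_buckets()))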
I'm a Ruby dev trying my hand at Google Cloud Functions written in Python and have hit a wall with transferring a remote file from a given URL to Google Cloud Storage (GCS).
In an equivalent RoR app I download to the app's ephemeral storage and then upload to GCS.
I am hoping there's a way to simply 'download' the remote file to my GCS bucket via the Cloud Function.
Here's a simplified example of what I am doing, with some comments. The real code fetches the URLs from a private API, but that works fine and isn't where the issue is.
from google.cloud import storage
project_id = 'my-project'
bucket_name = 'my-bucket'
destination_blob_name = 'upload.test'
storage_client = storage.Client.from_service_account_json('my_creds.json')
# This works fine
#source_file_name = 'localfile.txt'
# When using a remote URL I get 'IOError: [Errno 2] No such file or directory'
source_file_name = 'http://www.hospiceofmontezuma.org/wp-content/uploads/2017/10/confused-man.jpg'
def upload_blob(bucket_name, source_file_name, destination_blob_name):
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name)

upload_blob(bucket_name, source_file_name, destination_blob_name)
Thanks in advance.
It is not possible to upload a file to Google Cloud Storage directly from a URL. Since you are running the script from a local environment, the file contents that you want to upload need to be in that same environment. This means the contents of the URL need to be stored either in memory or in a file.
Two examples showing how to do it, based on your code:
Option 1: You can use the wget module, which will fetch the URL and download its contents into a local file (similar to the wget CLI command). Note that this means the file will be stored locally and then uploaded from that file. I added the os.remove line to remove the file once the upload is done.
from google.cloud import storage
import wget
import io, os
project_id = 'my-project'
bucket_name = 'my-bucket'
destination_blob_name = 'upload.test'
storage_client = storage.Client.from_service_account_json('my_creds.json')
source_file_name = 'http://www.hospiceofmontezuma.org/wp-content/uploads/2017/10/confused-man.jpg'
def upload_blob(bucket_name, source_file_name, destination_blob_name):
    filename = wget.download(source_file_name)
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(filename, content_type='image/jpg')
    os.remove(filename)

upload_blob(bucket_name, source_file_name, destination_blob_name)
Option 2: using the urllib module, which works similarly to the wget module but writes the contents into a variable instead of a file. Note that I did this example in Python 3; there are some differences if you plan to run your script in Python 2.X.
from google.cloud import storage
import urllib.request
project_id = 'my-project'
bucket_name = 'my-bucket'
destination_blob_name = 'upload.test'
storage_client = storage.Client.from_service_account_json('my_creds.json')
source_file_name = 'http://www.hospiceofmontezuma.org/wp-content/uploads/2017/10/confused-man.jpg'
def upload_blob(bucket_name, source_file_name, destination_blob_name):
    file = urllib.request.urlopen(source_file_name)
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_string(file.read(), content_type='image/jpg')

upload_blob(bucket_name, source_file_name, destination_blob_name)
Directly transferring URLs into GCS is possible through the Storage Transfer Service, but setting up a transfer job for a single URL is a lot of overhead. That sort of solution is targeted at situations with millions of URLs that need to become GCS objects.
Instead, I recommend writing a job that pumps an incoming stream from reading a URL into a write stream to GCS, and running it somewhere in Google Cloud close to the bucket.
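A minimal sketch of that streaming approach, assuming the default credentials are available and a reasonably recent google-cloud-storage version that provides Blob.open(); the URL, bucket, and object names are placeholders:

import shutil
import urllib.request

from google.cloud import storage

def stream_url_to_gcs(url, bucket_name, blob_name):
    """Copy the contents of a URL into a GCS object in chunks,
    without writing a local file."""
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    # copyfileobj reads and writes in chunks, so large files never
    # have to fit entirely in memory.
    with urllib.request.urlopen(url) as source, blob.open("wb") as dest:
        shutil.copyfileobj(source, dest)

stream_url_to_gcs("https://example.com/some-file.jpg", "my-bucket", "uploads/some-file.jpg")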
How can I upload a file to Google Cloud Storage from Python 3? Eventually Python 2, if it's infeasible from Python 3.
I've looked and looked, but haven't found a solution that actually works. I tried boto, but when I try to generate the necessary .boto file through gsutil config -e, it keeps saying that I need to configure authentication through gcloud auth login. However, I have done the latter a number of times, without it helping.
Use the standard gcloud library, which supports both Python 2 and Python 3.
Example of Uploading File to Cloud Storage
from gcloud import storage
from oauth2client.service_account import ServiceAccountCredentials
import os
credentials_dict = {
    'type': 'service_account',
    'client_id': os.environ['BACKUP_CLIENT_ID'],
    'client_email': os.environ['BACKUP_CLIENT_EMAIL'],
    'private_key_id': os.environ['BACKUP_PRIVATE_KEY_ID'],
    'private_key': os.environ['BACKUP_PRIVATE_KEY'],
}
credentials = ServiceAccountCredentials.from_json_keyfile_dict(
    credentials_dict
)
client = storage.Client(credentials=credentials, project='myproject')
bucket = client.get_bucket('mybucket')
blob = bucket.blob('myfile')
blob.upload_from_filename('myfile')
A simple function to upload files to a gcloud bucket.
from google.cloud import storage
#pip install --upgrade google-cloud-storage.
def upload_to_bucket(blob_name, path_to_file, bucket_name):
    """Upload data to a bucket."""
    # Explicitly use service account credentials by specifying the private key
    # file.
    storage_client = storage.Client.from_service_account_json(
        'creds.json')

    # buckets = list(storage_client.list_buckets())

    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(blob_name)
    blob.upload_from_filename(path_to_file)

    # returns a public url
    return blob.public_url
You can generate a credential file using this link: https://cloud.google.com/storage/docs/reference/libraries?authuser=1#client-libraries-install-python
Asynchronous Example:
import asyncio
import aiohttp
# pip install aiofile
from aiofile import AIOFile
# pip install gcloud-aio-storage
from gcloud.aio.storage import Storage
BUCKET_NAME = '<bucket_name>'
FILE_NAME = 'requirements.txt'
async def async_upload_to_bucket(blob_name, file_obj, folder='uploads'):
    """ Upload csv files to bucket. """
    async with aiohttp.ClientSession() as session:
        storage = Storage(service_file='./creds.json', session=session)
        status = await storage.upload(BUCKET_NAME, f'{folder}/{blob_name}', file_obj)
        # info of the uploaded file
        # print(status)
        return status['selfLink']

async def main():
    async with AIOFile(FILE_NAME, mode='r') as afp:
        f = await afp.read()
        url = await async_upload_to_bucket(FILE_NAME, f)
        print(url)

# Python 3.6
loop = asyncio.get_event_loop()
loop.run_until_complete(main())

# Python 3.7+
# asyncio.run(main())
Imports the Google Cloud client library (credentials needed)
from google.cloud import storage
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="C:/Users/siva/Downloads/My First Project-e2d95d910f92.json"
Instantiates a client
storage_client = storage.Client()
buckets = list(storage_client.list_buckets())
bucket = storage_client.get_bucket("ad_documents") # your bucket name
blob = bucket.blob('chosen-path-to-object/{name-of-object}')
blob.upload_from_filename('D:/Download/02-06-53.pdf')
print(buckets)
When installing the Google Cloud Storage API with:
pip install google-cloud
the import will throw a ModuleNotFoundError:
from google.cloud import storage
ModuleNotFoundError: No module named 'google'
Make sure you install it as described in the Cloud Storage Client Libraries docs:
pip install --upgrade google-cloud-storage
This official repo contains a handful of snippets demonstrating the different ways to upload a file to a bucket: https://github.com/googleapis/python-storage/tree/05e07f248fc010d7a1b24109025e9230cb2a7259/samples/snippets
upload_from_string()
upload_from_file()
upload_from_filename()
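For reference, a minimal sketch of the three variants listed above, assuming default credentials are configured (the bucket, object, and file names are placeholders):

from google.cloud import storage

bucket = storage.Client().bucket("your-bucket-name")

# 1. From an in-memory string or bytes.
bucket.blob("from-string.txt").upload_from_string("hello world")

# 2. From an open file-like object.
with open("local.txt", "rb") as f:
    bucket.blob("from-file.txt").upload_from_file(f)

# 3. From a path on disk.
bucket.blob("from-filename.txt").upload_from_filename("local.txt")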