I want to load "fonts" from Google Storage. I've tried two ways, but neither of them works. Any pointers? Any advice is appreciated.
First:
I followed the load_font_from_gcs(uri) instruction given in the answer here, but I received a NameError: name 'load_font_from_gcs' is not defined message. I installed the Google Storage dependency and executed from google.cloud import storage.
Second:
I tried to execute the following code (reference #1) and ran into a blob has no attribute open() error, the same as the answer I got here, although the reference in that link reports a positive result.
reference #1
bucket = storage_client.bucket(bucket_name)
blob = bucket.get_blob(blob_name)
with blob.open("r") as img:
imgblob = Image.open(img)
draw = ImageDraw.Draw(imgblob)
According to the provided links, your code must use BytesIO in order to work with the font file loaded from GCS.
load_font_from_gcs is a custom function written by the author of the question you're referencing; it is not part of the google-cloud-storage package.
Next, according to the official Google Cloud Storage documentation here:
Files from storage can be accessed this way (this example loads the font file into PIL.ImageFont.truetype):
# Import PIL
from PIL import Image, ImageFont, ImageDraw
# Import the Google Cloud client library
from google.cloud import storage
# Import BytesIO module
from io import BytesIO
# Instantiate a client
storage_client = storage.Client()
# The name of the bucket
bucket_name = "my-new-bucket"
# Required blob
blob_name = "somefont.otf"
# Creates the bucket & blob instance
bucket = storage_client.bucket(bucket_name)
blob = bucket.blob(blob_name)
# Download the given blob
blob_content = blob.download_as_string()
# Make ImageFont out of it (or whatever you want)
font = ImageFont.truetype(BytesIO(blob_content), 18)
So your reference code can be changed accordingly:
bucket = storage_client.bucket(bucket_name)
blob = bucket.get_blob(blob_name).download_as_string()
img_bytes = BytesIO(blob)
imgblob = Image.open(img_bytes)
draw = ImageDraw.Draw(imgblob)
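As a quick sketch of how the pieces fit together: once the image is open and you have the font variable from the documentation example above, you can draw on it as usual (the text and coordinates below are arbitrary examples):
# draw some text on the downloaded image using the GCS-loaded font
draw.text((10, 10), "Hello from GCS", font=font, fill="black")
imgblob.save("annotated.png")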
You can read more about PIL here.
Also, don't forget to check the official Google Cloud Storage documentation.
(There are plenty of examples using Python code.)
My requirement is to export data from BQ to GCS in a particular sorted order, which I am not able to get using the automatic export, hence I am trying to write a manual export for this.
File format is like below:
HDR001||5378473972abc||20101|182082||
DTL001||436282798101|
DTL002||QS
DTL005||3733|8
DTL002||QA
DTL005||3733|8
DTL002||QP
DTL005||3733|8
DTL001||436282798111|
DTL002||QS
DTL005||3133|2
DTL002||QA
DTL005||3133|8
DTL002||QP
DTL005||3133|0
I am very new to this and am able to write the file to local disk, but I am not sure how I can write this file to GCS. I tried to use write_to_file but I seem to be missing something.
import pandas as pd
import pickle as pkl
import tempfile
from google.colab import auth
from google.cloud import bigquery, storage
#import cloudstorage as gcs
auth.authenticate_user()
df = pd.DataFrame(data=job)
sc = storage.Client(project='temp-project')
with tempfile.NamedTemporaryFile(mode='w+b', buffering=-1, prefix='test', suffix='temp') as fh:
    with open(fh.name, 'w+', newline='') as f:
        dfAsString = df.to_string(header=" ", index=False)
        fh.name = fh.write(dfAsString)
    fh.close()
    bucket = sc.get_bucket('my-bucket')
    target_fn = 'test.csv'
    source_fn = fh.name
    destination_blob_name = bucket.blob('test.csv')
    bucket.blob(destination_blob_name).upload_from_file(source_fn)
Can someone please help?
Thank You.
I would suggest uploading the object to the Cloud Storage bucket with upload_from_filename instead of upload_from_file. Your code should look like this:
bucket.blob(destination_blob_name).upload_from_filename(source_fn)
Here are links for the documentation on how to upload an object to Cloud Storage bucket and Client library docs.
EDIT:
The reason you're getting that error is that somewhere in your code you're passing a Blob object rather than a string. Currently your destination variable is a Blob object; change it to a string instead:
destination_blob_name = bucket.blob('test.csv')
to
destination_blob_name = 'test.csv'
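Putting both fixes together, the upload step would look roughly like this (a sketch reusing your variable names; it assumes the temporary file still exists at that point and that fh.name still holds the file path, i.e. you drop the fh.name = fh.write(...) reassignment):
bucket = sc.get_bucket('my-bucket')
destination_blob_name = 'test.csv'  # a plain string object name, not a Blob
source_fn = fh.name  # local path of the file you wrote
bucket.blob(destination_blob_name).upload_from_filename(source_fn)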
I can successfully access the google cloud bucket from my python code running on my PC using the following code.
client = storage.Client()
bucket = client.get_bucket('bucket-name')
blob = bucket.get_blob('images/test.png')
Now I don't know how to retrieve and display the image from the blob without writing it to a file on the hard drive.
You could, for example, generate a temporary signed URL:
from google.cloud import storage
client = storage.Client() # Implicit environ set-up
bucket = client.bucket('my-bucket')
blob = bucket.blob('my-blob')
url_lifetime = 3600 # Seconds in an hour
serving_url = blob.generate_signed_url(url_lifetime)
Otherwise you can set the image as public in your bucket and use the permanent link that you can find in your object details
https://storage.googleapis.com/BUCKET_NAME/OBJECT_NAME
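For completeness, the client library can also make the object public and give you that permanent link (a sketch; this only works if the bucket allows per-object ACLs, i.e. uniform bucket-level access is disabled):
blob.make_public()            # make just this object publicly readable
print(blob.public_url)        # https://storage.googleapis.com/BUCKET_NAME/OBJECT_NAME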
Download the image from GCS as bytes, wrap it in BytesIO object to make the bytes file-like, then read in as a PIL Image object.
from io import BytesIO
from PIL import Image
img = Image.open(BytesIO(blob.download_as_bytes()))
Then you can do whatever you want with img -- for example, to display it, use plt.imshow(img).
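For instance, a minimal display sketch with matplotlib, assuming img from the snippet above:
import matplotlib.pyplot as plt
plt.imshow(img)
plt.axis("off")
plt.show()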
In Jupyter notebooks you can display the image directly with download_as_bytes:
from google.cloud import storage
from IPython.display import Image
client = storage.Client() # Implicit environment set up
# with explicit set up:
# client = storage.Client.from_service_account_json('key-file-location')
bucket = client.get_bucket('bucket-name')
blob = bucket.get_blob('images/test.png')
Image(blob.download_as_bytes())
I want to load a model which is saved as a joblib file from Google Cloud Storage bucket. When it is in local path, we can load it as follows (considering model_file is the full path in system):
loaded_model = joblib.load(model_file)
How can we do the same task with Google Cloud Storage?
For anyone googling around for an answer to this: here are two more options besides the obvious one, which is to use the Google AI Platform for model hosting (and online predictions).
Option 1 is to use TemporaryFile like this:
from google.cloud import storage
from sklearn.externals import joblib
from tempfile import TemporaryFile
storage_client = storage.Client()
bucket_name=<bucket name>
model_bucket='model.joblib'
bucket = storage_client.get_bucket(bucket_name)
#select bucket file
blob = bucket.blob(model_bucket)
with TemporaryFile() as temp_file:
    #download blob into temp file
    blob.download_to_file(temp_file)
    temp_file.seek(0)
    #load into joblib
    model = joblib.load(temp_file)
    #use the model
    model.predict(...)
Option 2 is to use BytesIO like this:
from google.cloud import storage
from sklearn.externals import joblib
from io import BytesIO
storage_client = storage.Client()
bucket_name=<bucket name>
model_bucket='model.joblib'
bucket = storage_client.get_bucket(bucket_name)
#select bucket file
blob = bucket.blob(model_bucket)
#download blob into an in-memory file object
model_file = BytesIO()
blob.download_to_file(model_file)
model_file.seek(0)
#load into joblib
model = joblib.load(model_file)
An alternative answer as of 2020: using TF2, you can do this:
import joblib
import tensorflow as tf
gcs_path = 'gs://yourpathtofile'
loaded_model = joblib.load(tf.io.gfile.GFile(gcs_path, 'rb'))
I found using gcsfs to be the fastest (and most compact) method:
import gcsfs
import joblib

def load_joblib(bucket_name, file_name):
    fs = gcsfs.GCSFileSystem()
    with fs.open(f'{bucket_name}/{file_name}') as f:
        return joblib.load(f)
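Usage is then a one-liner (the bucket and object names below are placeholders):
model = load_joblib('my-bucket', 'model.joblib')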
I don't think that's possible, at least not in a direct way. I thought about a workaround, but it might not be as efficient as you want.
By using the Google Cloud Storage client libraries [1] you can download the model file first, load it, and when your program ends, delete it. Of course, this means that you need to download the file every time you run the code. Here is a snippet:
from google.cloud import storage
from sklearn.externals import joblib
storage_client = storage.Client()
bucket_name=<bucket name>
model_bucket='model.joblib'
model_local='local.joblib'
bucket = storage_client.get_bucket(bucket_name)
#select bucket file
blob = bucket.blob(model_bucket)
#download that file and name it 'local.joblib'
blob.download_to_filename(model_local)
#load that file from local file
job=joblib.load(model_local)
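If you want the "delete it when your program ends" part of the workaround, a sketch of the cleanup would be:
import os
# ... use the model ...
os.remove(model_local)  # delete the local copy when you're done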
For folks who are Googling around with this problem - here's another option. The open source modelstore library is a wrapper that deals with the process of saving, uploading, and downloading models from Google Cloud Storage.
Under the hood, it saves scikit-learn models using joblib, creates a tar archive with the files, and up/downloads them from a Google Cloud Storage bucket using blob.upload_from_file() and blob.download_to_filename().
In practice it looks a bit like this (a full example is here):
# Create modelstore instance
from modelstore import ModelStore
modelstore = ModelStore.from_gcloud(
    os.environ["GCP_PROJECT_ID"],   # Your GCP project ID
    os.environ["GCP_BUCKET_NAME"],  # Your Cloud Storage bucket name
)
# Train and upload a model (this currently works with 9 different ML frameworks)
model = train() # Replace with your code to train a model
meta_data = modelstore.sklearn.upload("my-model-domain", model=model)
# ... and later when you want to download it
model_path = modelstore.download(
    local_path="/path/to/a/directory",
    domain="my-model-domain",
    model_id=meta_data["model"]["model_id"],
)
The full documentation is here.
This is the shortest way I found so far:
import joblib
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-gcs-bucket")
blob = bucket.blob("model.joblib")
with blob.open(mode="rb") as file:
    model = joblib.load(file)
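Note that blob.open() is only available in more recent releases of google-cloud-storage; if you get a 'Blob' object has no attribute 'open' error (as in the first question above), upgrading the package (pip install --upgrade google-cloud-storage) should resolve it.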
I feel kind of stupid right now. I have been reading numerous documentation pages and Stack Overflow questions, but I can't get it right.
I have a file on Google Cloud Storage. It is in a bucket 'test_bucket'. Inside this bucket there is a folder, 'temp_files_folder', which contains two files, one .txt file named 'test.txt' and one .csv file named 'test.csv'. There are two files simply because I tried using both, but the result is the same either way.
The content in the files is
hej
san
and I am hoping to read it into python the same way I would do on a local with
textfile = open("/file_path/test.txt", 'r')
times = textfile.read().splitlines()
textfile.close()
print(times)
which gives
['hej', 'san']
I have tried using
from google.cloud import storage
client = storage.Client()
bucket = client.get_bucket('test_bucket')
blob = bucket.get_blob('temp_files_folder/test.txt')
print(blob.download_as_string)
but it gives the output
<bound method Blob.download_as_string of <Blob: test_bucket, temp_files_folder/test.txt>>
How can I get the actual string(s) in the file?
download_as_string is a method; you need to call it:
print(blob.download_as_string())
More likely, you want to assign it to a variable so that you download it once and can then print it and do whatever else you want with it:
downloaded_blob = blob.download_as_string()
print(downloaded_blob)
do_something_else(downloaded_blob)
The method download_as_string() reads the content in as bytes.
Below is an example that processes a .csv file.
import csv
from io import StringIO
from google.cloud import storage
storage_client = storage.Client()
bucket = storage_client.get_bucket(YOUR_BUCKET_NAME)
blob = bucket.blob(YOUR_FILE_NAME)
blob = blob.download_as_string()
blob = blob.decode('utf-8')
blob = StringIO(blob) #wrap the decoded string in a file-like object
names = csv.reader(blob) #then use the csv library to read the content
for name in names:
    print(f"First Name: {name[0]}")
According to the documentation (https://googleapis.dev/python/storage/latest/blobs.html), as of the time of writing (2021/08), the download_as_string method is a deprecated alias for the download_as_bytes method, which - as suggested by the name - returns a bytes object.
You can instead use the download_as_text method to return a str object.
For instance, to download the file MYFILE from bucket MYBUCKET and store it as a utf-8 encoded string:
from google.cloud.storage import Client
client = Client()
bucket = client.get_bucket(MYBUCKET)
blob = bucket.get_blob(MYFILE)
downloaded_file = blob.download_as_text(encoding="utf-8")
You can then also use this to read different file formats. For json, replace the last line with
import json
downloaded_json_file = json.loads(blob.download_as_text(encoding="utf-8"))
For yaml files, replace the last line with:
import yaml
downloaded_yaml_file = yaml.safe_load(blob.download_as_text(encoding="utf-8"))
DON'T USE: blob.download_as_string()
USE: blob.download_as_text()
blob.download_as_text() does indeed return a string.
blob.download_as_string() is deprecated and returns a bytes object instead of a string object.
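A quick illustration of the difference:
text = blob.download_as_text()     # str
raw = blob.download_as_bytes()     # bytes
text_again = raw.decode("utf-8")   # equivalent str, decoded manually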
This works when reading a .docx / text file:
from google.cloud import storage
# create storage client
storage_client = storage.Client.from_service_account_json('**PATH OF JSON FILE**')
bucket = storage_client.get_bucket('**BUCKET NAME**')
# get bucket data as blob
blob = bucket.blob('**SPECIFY THE DOCX FILENAME**')
downloaded_blob = blob.download_as_string()
downloaded_blob = downloaded_blob.decode("utf-8")
print(downloaded_blob)
I'm trying to figure out how to upload a Pillow Image instance to a Firebase storage bucket. Is this possible?
Here's some code:
from PIL import Image
image = Image.open(file)
# how to upload to a firebase storage bucket?
I know there's a gcloud-python library but does this support Image instances? Is converting the image to a string my only option?
The gcloud-python library is the correct library to use. It supports uploads from strings, file pointers, and local files on the file system (see the docs).
from PIL import Image
from google.cloud import storage
client = storage.Client()
bucket = client.get_bucket('bucket-id-here')
blob = bucket.blob('image.png')
# use pillow to open and transform the file
image = Image.open(file)   # 'file' here is your source image (a path or file object)
# perform transforms
image.save(outfile)        # 'outfile' is a local path for the transformed image
of = open(outfile, 'rb')
blob.upload_from_file(of)
# or... (no need to use pillow if you're not transforming)
blob.upload_from_filename(filename=outfile)
This is how to directly upload the Pillow image to Firebase Storage:
import io

from PIL import Image
from firebase_admin import credentials, initialize_app, storage
# Init firebase with your credentials
cred = credentials.Certificate("YOUR DOWNLOADED CREDENTIALS FILE (JSON)")
initialize_app(cred, {'storageBucket': 'YOUR FIREBASE STORAGE PATH (without gs://)'})
bucket = storage.bucket()
blob = bucket.blob('image.jpg')
bs = io.BytesIO()
im = Image.open("test_image.jpg")
im.save(bs, "jpeg")
blob.upload_from_string(bs.getvalue(), content_type="image/jpeg")
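The bucket returned by firebase_admin's storage.bucket() is a regular google-cloud-storage bucket, so if you then need a URL to serve the uploaded image you can, for example, make the blob public (a sketch; only do this if public access is acceptable and the bucket allows per-object ACLs):
blob.make_public()
print(blob.public_url)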