Migrating images to Google App Engine datastore or blobstore - Python

I have a Property model containing a field image_url:
class Property(ndb.Model):
    date_created = ndb.DateTimeProperty(auto_now_add=True)  # ndb stores datetimes in UTC
    # some other fields here
    image_url = ndb.StringProperty(indexed=False)
and an Image model:
class Image(ndb.Model):
    property = ndb.KeyProperty()
    file = ndb.KeyProperty(indexed=False)
    # some other fields
    image_url = ndb.StringProperty(indexed=False)
Now I have n images for each property on my local machine. The name of each image is mapped to the corresponding property ID in a CSV file. I want to bulk upload all these images from my local machine to the Google App Engine datastore or blobstore.
I tried googling around, but I feel stuck; any help or reference would be highly appreciated.

Google Cloud Storage might be a better option for you:
You get a nice program to work with it, gsutil, that lets you upload easily from the console, so you can write your own scripts :)
You can keep the filenames you already have and set up your own directory structure so that it makes more sense for your app. If the data is static, you might not even need supporting models.
Example, from the links above, on how you'd end up uploading your images:
gsutil cp *.jpg gs://images
The cp command behaves much like the Unix cp command with the recursion (-R) option, allowing you to copy whole directories or just the contents of directories. gsutil also supports wildcards, which makes it easy for you to copy or move batches of files.
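To connect that back to the question's CSV mapping, here is a minimal local-machine sketch, assuming a CSV whose rows are (image_filename, property_id), gsutil on the PATH, and an illustrative bucket name:
import csv
import subprocess

BUCKET = 'gs://images'  # illustrative bucket name

with open('image_map.csv') as f:  # hypothetical CSV: image_filename,property_id
    for image_filename, property_id in csv.reader(f):
        # Prefix each object with its property id so the mapping
        # survives the upload and can be recovered later.
        dest = '{}/{}/{}'.format(BUCKET, property_id, image_filename)
        subprocess.check_call(['gsutil', 'cp', image_filename, dest])
After the upload, each Property.image_url can be set to the matching public URL of the form http://storage.googleapis.com/images/<property_id>/<filename>.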

Related

Multiple storage containers in Django

Is it possible in Django to store image files in different buckets/containers in an external object store (OpenStack Swift)?
I have an issue with creating a proper way to upload image files through a REST service request. The user is able to POST a 'Task' instance via the first endpoint, containing name, description, owner_id, etc. They are also able to POST images via another endpoint, which has a many-to-one relation with Task.
In my case the images should be stored in OpenStack Swift (the server already exists and is set up) in unique containers/buckets named owner_id_task_id. This matters because users can upload files with the same names and extensions but different content.
Another part involves sending files from Celery worker tasks (some processing based on the uploaded files) to the same containers in OpenStack Swift.
My goal is a container structure that is created/overridden dynamically at runtime, for storing both raw images and post-processed images.
Any ideas how this problem can be solved?
Thanks for the help!
Yes, this is possible. You can use FileField.storage to configure which storage backend an individual model field uses.
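A minimal sketch, assuming the django-storage-swift package for the Swift backend; the owner_id_task_id naming comes from the question, everything else is illustrative:
from django.db import models
from swift.storage import SwiftStorage  # provided by django-storage-swift

def task_image_path(instance, filename):
    # Emulate per-owner/per-task "containers" with an object-name prefix,
    # so identical filenames from different tasks cannot collide.
    return '{}_{}/{}'.format(instance.task.owner_id, instance.task.id, filename)

class TaskImage(models.Model):
    task = models.ForeignKey('Task', related_name='images',
                             on_delete=models.CASCADE)
    image = models.FileField(upload_to=task_image_path,
                             storage=SwiftStorage())  # per-field storage backend
This uses a prefix-based layout; if you truly need one physical container per task, check whether your Swift backend lets you override the container name per storage instance.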

Upload image file to datastore using Endpoints in Python?

I am trying to make a form for adding users that will store user information in my database.
I want to upload an image file using Cloud Endpoints (Python), but I have no idea how to do it.
What will be the input class (request class) and output class (response class)?
@endpoints.method(inputclass, outputclass,
                  path='anypath', http_method='GET',
                  name='apiname')
What URL will I provide in the form's action= field for uploading the image? How can I then show the file?
You have two ways to store data files (including image files).
The first is to convert your image to base64 format and store it in the datastore (not the best).
The second is to store your image file in Google Cloud Storage (that is the best) or the Blobstore (per Google itself, there is no good reason to use the Blobstore).
So you have to store your image file in Google Cloud Storage with your endpoint: https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/?hl=fr
Personally, I use a servlet (App Engine) to store the image in GCS. My endpoint calls my servlet, passing the image as a parameter, and my servlet stores the image in GCS. It works very well =)
Hope this helps.
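For completeness, here is a minimal sketch of the base64 route through an endpoint, assuming the GoogleAppEngineCloudStorageClient library (cloudstorage); all message and API names are illustrative:
import base64

import cloudstorage as gcs
import endpoints
from google.appengine.api import app_identity
from protorpc import messages, remote

class ImageUpload(messages.Message):
    filename = messages.StringField(1, required=True)
    data_b64 = messages.StringField(2, required=True)  # base64-encoded image bytes

class UploadResult(messages.Message):
    gcs_path = messages.StringField(1)

@endpoints.api(name='images', version='v1')
class ImageApi(remote.Service):
    @endpoints.method(ImageUpload, UploadResult,
                      path='upload', http_method='POST', name='upload')
    def upload(self, request):
        bucket = app_identity.get_default_gcs_bucket_name()
        path = '/%s/%s' % (bucket, request.filename)
        # Decode the base64 payload and write it straight to GCS.
        with gcs.open(path, 'w', content_type='image/jpeg') as f:
            f.write(base64.b64decode(request.data_b64))
        return UploadResult(gcs_path=path)
Note that requests are subject to App Engine's 32 MB request limit, so for large images a Blobstore upload URL or a signed GCS upload is a better fit.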

Storing images (from URL) to GAE Datastore

I am building a web application on Google App Engine using Python; however, I seem to have hit a roadblock.
Currently, my site fetches remote image URLs from an external website. The URLs are placed in a list and sent back to my application. I want to store the respective images (not the URLs) in my datastore, in order to avoid fetching the remote images each time and dealing with broken links.
The solutions I have found online all deal with a user uploading their own images. I tried implementing that, but I am not sure what happens to an uploaded image (or how it gets converted into a blob) once the user hits the submit button.
To my understanding, a blob is a collection of binary data stored as a single entity (from Wikipedia). Therefore, I tried using the following:
class rentalPropertyDB(ndb.Model):
    streetNAME = ndb.StringProperty(required=True)
    image = ndb.BlobProperty(default=None)

class MainPage(BaseHandler):
    def get(self):
        self.render("index.html")

    def post(self):
        rental = rentalPropertyDB()
        for image in img_urls:
            rental.image = urlfetch.Fetch(image).content
            rental.put()
The solution to this question, Image which is stored as a blob in Datastore in a html page, is identical to mine; however, that solution suggests uploading the image to the blobstore and uses:
upload_files = self.get_uploads('file')
blob_info = upload_files[0]
This confuses me because I am not sure what exactly 'file' refers to. Would I replace 'file' with the URL of each image, or would I need to perform some operation on each image first?
I have been stuck on this issue for at least two days now and would greatly appreciate any help. I think the main reason this confuses me so much is the variety of methods used in each solution, i.e. Google Cloud Storage, URLFetch, the Images API, and the various blob-related ndb properties (BlobKeyProperty vs. BlobProperty), etc.
Thank you.
Be careful with blobs inside models: an entity cannot be larger than 1 MB in total, including the BlobProperty.
If we listen to Google, there is no good reason to use the Blobstore. If you can, use Google Cloud Storage; it is made for storing files.
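Applied to the question's code, here is a minimal sketch, assuming the cloudstorage client library; the model mirrors the question's but stores a GCS object path instead of the raw bytes:
import cloudstorage as gcs
from google.appengine.api import app_identity, urlfetch
from google.appengine.ext import ndb

class RentalPropertyDB(ndb.Model):
    streetNAME = ndb.StringProperty(required=True)
    image_path = ndb.StringProperty(indexed=False)  # GCS object path, not bytes

def store_remote_image(street_name, image_url):
    result = urlfetch.Fetch(image_url)
    bucket = app_identity.get_default_gcs_bucket_name()
    path = '/%s/%s' % (bucket, image_url.rsplit('/', 1)[-1])
    # Write the fetched bytes to GCS; only the path goes in the entity.
    with gcs.open(path, 'w', content_type='image/jpeg') as f:
        f.write(result.content)
    RentalPropertyDB(streetNAME=street_name, image_path=path).put()
    return path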

How to serve Cloud Storage files using the App Engine SDK

In App Engine I can serve Cloud Storage files, like a PDF, using the default bucket of my application:
http://storage.googleapis.com/<appid>.appspot.com/<file_name>
But how can I serve local cloudstorage files in the SDK, without making use of a blob_key?
I write to the default bucket like this:
gcs_file_name = '/%s/%s' % (app_identity.get_default_gcs_bucket_name(), file_name)
with gcs.open(gcs_file_name, 'w') as f:
    f.write(data)
The name of the default bucket in the SDK is 'app_default_bucket'.
In the SDK datastore I have a kind GsFileInfo showing: filename: /app_default_bucket/example.pdf
Update and workaround: you can get a serving URL for non-image files like CSS, JS, and PDF:
gs_file = '/gs/%s/%s/%s' % (app_identity.get_default_gcs_bucket_name(), folder, filename)
serving_url = images.get_serving_url(blobstore.create_gs_key(gs_file))
UPDATE: I found this feature to serve Cloud Storage files using the SDK:
This feature has not been documented yet.
http://localhost:8080/_ah/gcs/app_default_bucket/filename
This means we do not need the image serving URL to serve non-images, as shown below!
To create a serving URL for Cloud Storage files like images, CSS, JS, and PDFs in the default bucket, I use this code for testing (SDK) and GAE production:
IMPORTANT: images.get_serving_url() also works for non-images in the SDK!
In the SDK you still need the blobstore to read a blob and create a serving URL for a Cloud Storage object.
I also added the code to read, write and upload cloudstorage blobs in the SDK and GAE production.
The code can be found here.
This is the value that you see in development mode, from app_identity_stub.py:
APP_DEFAULT_GCS_BUCKET_NAME = 'app_default_bucket'
The comments in this file explain it:
This service behaves the same as the production service, except using
constant values instead of app-specific values
You should get the correct URL in your production code.
EDIT:
This is from the support forum:
In development mode, the app engine tools simulate Google Cloud
Storage services locally. Objects in that simulated environment are
non-persistent so your app is failing because the desired object
doesn't exist in the local store. If you first create (and optionally
write to) the object you're trying to read, it should work fine in dev
mode (it did for me). Of course, objects in the production service are
persistent so there's no need for that extra step when running your
app in production mode (assuming the object already exists).
Hope that helps,
Marc Google Cloud Storage Team
This means you have to write the file first; then you can use it. If I understand correctly, you can use any bucket name for this purpose, including 'app_default_bucket'.
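A minimal sketch of that write-then-read step on the dev server, assuming the cloudstorage client; the filename and payload are placeholders:
import cloudstorage as gcs

path = '/app_default_bucket/example.pdf'

# In the SDK the simulated GCS starts empty, so create the object first...
with gcs.open(path, 'w', content_type='application/pdf') as f:
    f.write('%PDF-1.4 placeholder bytes')

# ...then it can be read back (or served via /_ah/gcs/...).
with gcs.open(path) as f:
    data = f.read()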
I was here earlier looking for answers and just wanted to share what I found, now that I have it working.
You can do this now, and it's only a little painful. Tricking the image or blobstore API isn't supported and doesn't seem to work any longer.
See:
https://cloud.google.com/storage/docs/access-control/signed-urls
https://cloud.google.com/storage/docs/access-control/create-signed-urls-gsutil
If you sign your URLs, you can give auto-expiring links to your content, for anonymous or paywalled consumption. You wouldn't want to serve your whole site this way, but for a PDF or whatnot, this is a valid and semi-secure option.
One thing missing from the documentation: you might need to drop the newline for the canonical extended headers. The storage endpoint will tell you what it expects when the signature is bad.
Also, your host should be: https://storage-download.googleapis.com/
If you're using App Engine, then the GoogleAccessId is: <projectname>@appspot.gserviceaccount.com
See: app_identity.get_service_account_name()
Example of how to generate the signature:
from google.appengine.api import app_identity
def signFile(path, verb='GET', md5='', contentType='',
expiration=''):
signatureRequest = '{}\n{}\n{}\n{}\n{}'.format(
verb, md5, contentType, expiration, path)
return app_identity.sign_blob(signatureRequest)
That returns a tuple of (key_name, binarySignature).
Now you need to construct the URL. The signature should be base64-encoded, then URL-encoded. See the following for how to finish constructing the URL. You should probably use the download host mentioned above.
Example URL from the docs:
https://storage.googleapis.com/example-bucket/cat.jpeg?GoogleAccessId=example@example-project.iam.gserviceaccount.com&Expires=1458238630&Signature=VVUgfqviDCov%2B%2BKnmVOkwBR2olSbId51kSibuQeiH8ucGFyOfAVbH5J%2B5V0gDYIioO2dDGH9Fsj6YdwxWv65HE71VEOEsVPuS8CVb%2BVeeIzmEe8z7X7o1d%2BcWbPEo4exILQbj3ROM3T2OrkNBU9sbHq0mLbDMhiiQZ3xCaiCQdsrMEdYVvAFggPuPq%2FEQyQZmyJK3ty%2Bmr7kAFW16I9pD11jfBSD1XXjKTJzgd%2FMGSde4Va4J1RtHoX7r5i7YR7Mvf%2Fb17zlAuGlzVUf%2FzmhLPqtfKinVrcqdlmamMcmLoW8eLG%2B1yYW%2F7tlS2hvqSfCW8eMUUjiHiSWgZLEVIG4Lw%3D%3D
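A minimal sketch of finishing the URL from signFile() above; the host and parameter names follow the docs, the helper name is mine:
import base64
import time
import urllib

from google.appengine.api import app_identity

def signedUrl(bucket, objectName, expiresInSeconds=3600):
    expiration = str(int(time.time()) + expiresInSeconds)
    path = '/{}/{}'.format(bucket, objectName)
    _, signature = signFile(path, expiration=expiration)
    params = urllib.urlencode({
        'GoogleAccessId': app_identity.get_service_account_name(),
        'Expires': expiration,
        # base64-encode the binary signature; urlencode handles URL-escaping.
        'Signature': base64.b64encode(signature),
    })
    return 'https://storage-download.googleapis.com{}?{}'.format(path, params)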
I hope this helps someone!
Oh yeah, you only need to do all the signature stuff if your bucket isn't publicly accessible (read-all).

Storing multiple files with the same name in Google Cloud Storage?

So I am trying to port a Python web app written with Flask to Google App Engine. The app hosts user-uploaded files up to 200 MB in size, and for non-image files the original filename needs to be retained. To prevent filename conflicts, e.g. two people uploading stuff.zip with completely different and unrelated contents, the app creates a UUID-named folder on the filesystem, stores the file within it, and serves the files to users from there. Google Cloud Storage, which I was planning to use to store the user files in a bucket, has "no notion of folders" according to its documentation. What is the best way to get the same functionality with their system?
The current method, just for demonstration:
# generates a new folder with a shortened UUID name to save files
# other than images to avoid filename conflicts
else:
    # if there is a better way of doing this i'm not clever enough
    # to figure it out
    new_folder_name = shortuuid.uuid()[:9]
    os.mkdir(
        os.path.join(app.config['FILE_FOLDER'], new_folder_name))
    file.save(
        os.path.join(os.path.join(app.config['FILE_FOLDER'], new_folder_name), filename))
    new_folder_path = os.path.join(
        app.config['FILE_FOLDER'], new_folder_name)
    return url_for('uploaded_file', new_folder_name=new_folder_name)
From the Google Cloud Storage Client Library Overview documentation:
GCS and "subdirectories"
Google Cloud Storage documentation refers to "subdirectories" and the GCS client library allows you to supply subdirectory delimiters when you create an object. However, GCS does not actually store the objects into any real subdirectory. Instead, the subdirectories are simply part of the object filename. For example, if I have a bucket my_bucket and store the file somewhere/over/the/rainbow.mp3, the file rainbow.mp3 is not really stored in the subdirectory somewhere/over/the/. It is actually a file named somewhere/over/the/rainbow.mp3. Understanding this is important for using listbucket filtering.
While Cloud Storage does not support subdirectories per se, it allows you to use subdirectory delimiters inside filenames. This basically means that the path to your file will still look exactly as if it was inside a subdirectory, even though it is not. This apparently should concern you only when you're iterating over the entire contents of the bucket.
From the Request URIs documentation:
URIs for Standard Requests
For most operations you can use either of the following URLs to access objects:
storage.googleapis.com/<bucket>/<object>
<bucket>.storage.googleapis.com/<object>
This means that the public URL for their example would be http://storage.googleapis.com/my_bucket/somewhere/over/the/rainbow.mp3. Their service would interpret this as bucket=my_bucket and object=somewhere/over/the/rainbow.mp3 (i.e. no notion of subdirectories, just an object name with embedded slashes in it); the browser however will just see the path /my_bucket/somewhere/over/the/rainbow.mp3 and will interpret it as if the filename is rainbow.mp3.
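Putting that together with the question's approach, a minimal sketch, assuming the cloudstorage GAE client and the shortuuid package from the question; the bucket name is illustrative and file_storage is Flask's uploaded-file object:
import cloudstorage as gcs
import shortuuid

def save_upload(file_storage, bucket='my_bucket'):
    # Use a shortened UUID as a pseudo-folder prefix; GCS stores the slash
    # as part of the object name, so the original filename is preserved.
    object_name = '/{}/{}/{}'.format(bucket, shortuuid.uuid()[:9],
                                     file_storage.filename)
    with gcs.open(object_name, 'w',
                  content_type=file_storage.mimetype) as f:
        f.write(file_storage.read())
    return 'http://storage.googleapis.com' + object_name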
