Storing images (from URL) to GAE Datastore - python

I am building a web application on Google App Engine using Python; however, I seem to have hit a roadblock.
Currently, my site fetches remote image URLs from an external website. The URLs are placed in a list and sent back to my application. I want to store the images themselves (not the URLs) in my datastore, so that I avoid fetching the remote images each time and do not have to deal with broken links.
The solutions I have found online all deal with a user uploading their own images. I tried implementing that, but I am not sure what happens to an uploaded image (or how it gets converted into a blob) once the user hits the submit button.
To my understanding, a blob is a collection of binary data stored as a single entity (from Wikipedia). Therefore, I tried using the following:
class rentalPropertyDB(ndb.Model):
    streetNAME = ndb.StringProperty(required=True)
    image = ndb.BlobProperty(default=None)

class MainPage(BaseHandler):
    def get(self):
        self.render("index.html")

    def post(self):
        rental = rentalPropertyDB()
        for image in img_urls:
            rental.image = urlfetch.Fetch(image).content
            rental.put()
The solution to the question "Image which is stored as a blob in Datastore in a html page" is identical to mine; however, it suggests uploading the image to the blobstore and uses:
upload_files = self.get_uploads('file')
blob_info = upload_files[0]
This confuses me because I am not sure what exactly 'file' refers to. Would I replace 'file' with the URL of each image? Or would I need to perform some operation to each image prior to replacing?
I have been stuck on this issue for at least two days now and would greatly appreciate any help. I think the main reason this confuses me so much is the variety of methods used in each solution, e.g. Google Cloud Storage, URLFetch, the Images API, and the various ndb blob property types (BlobKeyProperty vs. BlobProperty).
Thank you.

Be careful with blobs inside models. An entity cannot be larger than about 1 MB, including the BlobProperty.
According to Google, there is no good reason to use the blobstore. If you can, use Google Cloud Storage. It is made to store files.
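To illustrate the size limit mentioned above, here is a minimal sketch that fetches each image and separates the ones small enough for a BlobProperty from the ones that should go to Cloud Storage instead. The `fetch` argument is a hypothetical stand-in for something like `urlfetch.fetch(url).content`, and both the threshold and the headroom value are approximations chosen for illustration, not exact figures.

```python
# Approximate datastore entity limit; the headroom is an assumption that
# leaves room for the model's other properties.
MAX_ENTITY_BYTES = 1024 * 1024
HEADROOM_BYTES = 16 * 1024


def partition_images(urls, fetch, limit=MAX_ENTITY_BYTES - HEADROOM_BYTES):
    """Fetch every URL and split the results by size.

    `fetch` is any callable that takes a URL and returns the image bytes
    (hypothetical stand-in for urlfetch.fetch(url).content on App Engine).
    Returns (small, large, failed): two dicts of url -> bytes and a list
    of URLs that could not be fetched.
    """
    small, large, failed = {}, {}, []
    for url in urls:
        try:
            data = fetch(url)
        except Exception:
            failed.append(url)  # broken link or network error
            continue
        if len(data) <= limit:
            small[url] = data   # small enough for a BlobProperty
        else:
            large[url] = data   # better stored in Cloud Storage
    return small, large, failed
```

Images in `small` could each be written to their own entity, while those in `large` would need a file store such as Cloud Storage.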

Related

How to properly lazy load images from a blob container?

I want to implement a lazy loading approach to load images stored in an "images" folder inside an Azure storage account.
I have a container in my Flutter app; whenever the user scrolls to the bottom, the next 10 images should be loaded from storage, most recent first (by timestamp).
I looked into this sample retrieved from: https://azuresdkdocs.blob.core.windows.net/$web/python/azure-storage-blob/12.0.0b5/index.html#id20
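import asyncio
from azure.storage.blob.aio import ContainerClient

async def main():
    container = ContainerClient.from_connection_string(conn_str="my_connection_string", container_name="my_container")
    blob_list = []
    # list_blobs() returns an async iterator, so it has to be consumed inside a coroutine
    async for blob in container.list_blobs():
        blob_list.append(blob)
    print(blob_list)

asyncio.run(main())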
But it's not what I need. I am looking for a way to make a GET request that retrieves a new set of images each time the function is invoked.
Thankful for suggestions!
I was able to implement a lazy loading approach by using the marker continuation object
Example:
mark = req.params.get('NextMarker')
entit = table_service.query_entities('UserRequests', 'PartitionKey eq \'' + emailAddress + '\'', num_results=21, select='..', marker=mark)
Dict = {"NextMarker": entit.next_marker}
return json.dumps(Dict)
This way I can send the marker in the HTTP GET request each time to get the next batch.
I hope this helps someone one day!
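The marker idea above can be sketched without any Azure dependency. In this minimal example the marker is just an integer offset into an in-memory list, which is an assumption for illustration; real continuation markers returned by the storage service are opaque values that you pass back unchanged. The function name `query_page` is hypothetical.

```python
def query_page(rows, page_size, marker=None):
    """Return one page of `rows` plus the marker for the next page.

    `marker` is the position to resume from (None means start at the top).
    The returned marker is None once there is nothing left to fetch.
    """
    start = marker or 0
    page = rows[start:start + page_size]
    next_start = start + len(page)
    next_marker = next_start if next_start < len(rows) else None
    return page, next_marker


# The client keeps sending back the last marker it received.
rows = ["img_%02d.jpg" % i for i in range(25)]
marker = None
pages = []
while True:
    page, marker = query_page(rows, 10, marker)
    pages.append(page)
    if marker is None:
        break
```

Each request only needs the previous response's marker, so the server stays stateless between calls.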
If you want to list blobs by creation time, unfortunately that is not supported by the Azure list blobs API (the SDKs are built on the API). Blob creation time is a blob property, and as the official doc indicates, blob properties can't be set as a request parameter.
So if you want to fetch the newest images on each request, you should probably get the blob list first, sort it yourself, and cut out the items you need. That means writing some extra code. If you use Azure PowerShell instead, you can implement the whole process more easily. You can refer to this similar requirement.
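A minimal sketch of the "sort it yourself and cut out what you need" approach follows. The blob records here are plain dictionaries with hypothetical `name` and `creation_time` keys; with the real SDK you would read the equivalent values from whatever the listing call returns.

```python
from datetime import datetime, timedelta, timezone


def newest_blobs(blobs, offset=0, count=10):
    """Return `count` blobs starting at `offset`, newest first."""
    ordered = sorted(blobs, key=lambda b: b["creation_time"], reverse=True)
    return ordered[offset:offset + count]


# Example data: 25 blobs created one minute apart (illustrative only).
base = datetime(2020, 1, 1, tzinfo=timezone.utc)
blobs = [
    {"name": "images/%02d.jpg" % i, "creation_time": base + timedelta(minutes=i)}
    for i in range(25)
]

first_batch = newest_blobs(blobs, offset=0, count=10)
second_batch = newest_blobs(blobs, offset=10, count=10)
```

Note that this lists and sorts every blob on each call, so for large containers it may be worth caching the sorted list.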

migrating images to google app engine datastore or blobstore

I have a Property model containing a field image_url.
class Property(ndb.Model):
    date_created = data.UTCDateTimeProperty(auto_now_add=True)
    # some other fields here
    image_url = ndb.StringProperty(indexed=False)
and an Image model:
class Image(ndb.Model):
    property = ndb.KeyProperty()
    file = ndb.KeyProperty(indexed=False)
    # some other fields
    image_url = ndb.StringProperty(indexed=False)
Now I have 'n' images for each property on my local machine. The name of each image is mapped to the corresponding property id in a CSV file. I want to bulk upload all these images from my local machine to the Google App Engine datastore or blobstore.
I tried to google it but feel stuck; any help or reference would be highly appreciated.
Google Cloud Storage might be a better option for you:
You get a nice program to work with it, gsutil, that lets you upload easily from the console, so you can write your own scripts :)
You can keep the filenames you already have and set up your own directory structure so that it makes more sense for your app. If the data is static, you might not even need supporting models.
An example, from the links above, of how you'd end up uploading your images:
gsutil cp *.jpg gs://images
The cp command behaves much like the Unix cp command with the recursion (-R) option, allowing you to copy whole directories or just the contents of directories. gsutil also supports wildcards, which makes it easy for you to copy or move batches of files.
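Since the image names are already mapped to property ids in a CSV file, a small script could turn that mapping into one gsutil command per image. This is only a sketch: the column names `filename` and `property_id`, the bucket name, and the `properties/<id>/` directory layout are all assumptions for illustration.

```python
import csv
import io
import shlex


def build_upload_plan(csv_text, bucket):
    """Map each local image to a gs:// destination grouped by property id."""
    plan = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        filename = row["filename"].strip()
        property_id = row["property_id"].strip()
        dest = "gs://{}/properties/{}/{}".format(bucket, property_id, filename)
        plan.append((filename, dest))
    return plan


def as_gsutil_commands(plan):
    """Render the plan as shell commands, quoting names that need it."""
    return [
        "gsutil cp {} {}".format(shlex.quote(src), shlex.quote(dest))
        for src, dest in plan
    ]


csv_text = "filename,property_id\na.jpg,101\nb.jpg,102\n"
commands = as_gsutil_commands(build_upload_plan(csv_text, "my-bucket"))
```

The resulting commands can be written to a shell script, and the destination paths can be stored in `image_url` on the corresponding entities if you still need the models.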

upload image file to datastore using endpoints in python?

I am trying to make a form for adding users that will store user information to my database.
I want to upload an image file using Cloud Endpoints (python). I do not have any idea how to do it.
What will be the input class (request class) and output class (response class)?
@endpoints.method(inputclass, outputclass,
                  path='anypath', http_method='GET',
                  name='apiname')
What url will I provide in the action= form field for uploading the image? How can I then show the file?
You have two ways to store data files (including image files).
The first is to convert your image to Base64 and store it in the datastore (not the best option).
The second is to store your image file in Google Cloud Storage (the best option) or the blobstore (according to Google itself, there is no good reason to use the blobstore).
So you have to store your image file in Google Cloud Storage from your endpoint: https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/?hl=fr
Personally, I use a servlet (App Engine) to store images in GCS. My endpoint calls my servlet and passes the image as a parameter, and my servlet stores the image in GCS. It works very well =)
Hope this helps.
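For reference, the Base64 option mentioned above can be sketched with the standard library alone. The helper names are hypothetical; the point is that the encoded text is roughly a third larger than the raw bytes, which is one reason it is not the best choice for the datastore.

```python
import base64


def encode_image(data):
    """Turn raw image bytes into ASCII text that fits in a string field."""
    return base64.b64encode(data).decode("ascii")


def decode_image(text):
    """Recover the original bytes from the Base64 text."""
    return base64.b64decode(text)


raw = bytes(range(256)) * 3      # 768 bytes standing in for image data
encoded = encode_image(raw)      # 4 output characters per 3 input bytes
restored = decode_image(encoded)
```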

How to retrieve videos from Instagram Python client?

I'm using the Python Instagram Client to retrieve data from Instagram. I have created an Instagram account for testing purposes with three media items: two images and one video. After making a request with the Python Instagram Client from the Python console, I get the following response (Django shell):
>>> recent_media, next = api.user_recent_media()
>>> recent_media
[Media: 673901579909298365_1166496117, Media: 673880146437045009_1166496117, Media: 673827880594143995_1166496117]
I have inspected all the media objects, and there is no video information in them, even though the last media object is a video. All three objects have an attribute called images; the last media object, despite being a video, also has an images attribute containing a video snapshot in different resolutions. After reading the Instagram REST API documentation, my understanding is that the last Media object should have an attribute called videos, which would be a dict holding the video information (basically I'm interested in retrieving the videos' URLs).
My question is: is the Python Instagram Client outdated, so that it returns no video information at all and I have to use the REST API to get video info? Or am I doing something wrong in my requests?
Thanks in advance
You are not doing anything wrong. The Python API for Instagram is full of missing features and bugs. I've fixed them on my own local version, but I haven't pushed anything to the official github and I am not sure they would accept the changes.
What is happening in general is that their API client strips out data when it converts things back to a model. Why they didn't just use something that would convert dictionaries to dot notation models, I am unsure. It's completely manual and full of mistakes/bad Python IMO. Anyway, the gist is that the data is all there, but they are ignoring it when converting from dictionaries into their proprietary API models.
Here is what I found is problematic for what you are trying to do:
1. No "type" information is returned in the API media model. There is a "type" property that you can check for any media related response to see if it is an image or video. You can add this yourself as I did, or you can just assume that anything you get that has a "videos" section with populated data is a video.
2. No "videos" information is returned with an API media model. I also just added this myself. There are two URLs you can use, which you can see if you look at the JSON: one for standard resolution and one for low resolution. When you process the response, these properties aren't always there, so your code should check with get/getattr/etc. accordingly.
3. The paging information in the API is also broken IMO. You are supposed to get back an object with a few different pieces of information, part of which they claim is deprecated (why they are inflating the response at the same version endpoint with this info, I have no idea). The only piece of information you get back here is the next URL for paging, which is completely useless in the Python API client. There is no reason to get back a REST URL that you would have to manually call and parse outside the API when the whole reason you're using the Python client is to avoid that. Anyway, you will need to either patch the API client to send you back the proper models for this or simply parse it out of the URL. I chose the latter because I originally hoped not to patch the client itself. You'll run into an additional problem because some endpoints, such as tags, actually change the querystring parameters in the paging URL you get back, so you'll have to conditionally check what they give you. Again, the design is inconsistent and that's not a good thing IMO.
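The "parse it out of the URL" workaround for paging might look like the sketch below. It is written for Python 3 (`urllib.parse`), and the parameter names `max_id` and `max_tag_id` are assumptions about what the paging URL contains for the user and tag endpoints; adjust the tuple to whatever the URLs you receive actually carry.

```python
from urllib.parse import parse_qs, urlparse

# Querystring parameters that may hold the paging cursor. These names are
# assumptions; different endpoints may use different ones.
CURSOR_PARAMS = ("max_id", "max_tag_id")


def extract_cursor(next_url):
    """Return (parameter_name, value) for the paging cursor, or None."""
    if not next_url:
        return None
    query = parse_qs(urlparse(next_url).query)
    for name in CURSOR_PARAMS:
        values = query.get(name)
        if values:
            return name, values[0]
    return None


user_cursor = extract_cursor(
    "https://api.example.com/v1/users/1/media/recent?access_token=abc&max_id=123_456"
)
tag_cursor = extract_cursor(
    "https://api.example.com/v1/tags/cats/media/recent?max_tag_id=789"
)
```

The returned name and value can then be passed as a keyword argument to the next client call instead of requesting the URL by hand.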
I can post code for all of this if you like, but if you want to try to find a more elegant way to patch all this, I believe the place to look is models.py in the API. I'm not in front of the code right now, but here's what I did, from memory.
1. Create a new video model that inherits from the media model, as they did for the image model.
2. Where they read the response dictionary, parse out the videos and add them to the response dictionary as they did the images. Remember to add a pre-condition to check if the videos key is missing as I mentioned earlier.
3. Parse the type property and add it to the response model.
4. Add a model for the paging data and parse it out into the model. Alternatively, just wrap this via some querystring parsing in your own code if you prefer.
If you do all the above, you should be able to simply read a "videos" property and get the 2 video URLs. That's it. The information is always coming back in the response; just remember they are dropping it in the code. I'm happy to provide code/more info if you like.
Edit: Here's some code - put in models.py in object_from_dictionary in the API:
# add the videos
if "videos" in entry:
    new_media.videos = {}
    for version, version_info in entry['videos'].iteritems():
        new_media.videos[version] = Video.object_from_dictionary(version_info)

# add the type
new_media.type = entry.get('type')

# Add this class as well for the videos....
class Video(ApiModel):
    def __init__(self, url, width, height):
        self.url = url
        self.height = height
        self.width = width

    def __unicode__(self):
        return "Video: %s" % self.url
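If you would rather not patch the client, the same defensive checks can be applied to the raw response dictionary. The sketch below is standalone Python 3 and works on plain dictionaries; the sample entries and the `standard_resolution` / `low_resolution` keys are assumptions modelled on the description above, not output copied from the API.

```python
def video_urls(entry):
    """Return {version: url} for a media dictionary, or {} if it has no videos."""
    videos = entry.get("videos") or {}
    return {
        version: info["url"]
        for version, info in videos.items()
        if isinstance(info, dict) and "url" in info
    }


def is_video(entry):
    """Treat an entry as a video if its type says so or it carries video URLs."""
    return entry.get("type") == "video" or bool(video_urls(entry))


# Illustrative entries only.
video_entry = {
    "type": "video",
    "videos": {
        "standard_resolution": {"url": "http://example.com/v.mp4", "width": 640, "height": 640},
        "low_resolution": {"url": "http://example.com/v_low.mp4", "width": 480, "height": 480},
    },
}
image_entry = {"type": "image", "images": {}}

urls = video_urls(video_entry)
```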

How to serve url to audio in the gae blobstore

I have audio files stored as blobs in Google App Engine's blobstore. I'm not sure how to get a good URL to pass to the client side to play the blob. I would like to do something like what the image library offers:
image.get_serving_url()
But, there is no audio module. So, is there a good way to get a url from a blob to play audio or even better any media?
The rendering of an image is done by the browser. It's the same for audio: the browser decides what to do with a resource you point it to. For that, you need to send the correct MIME type [1] header. If the file already had the correct MIME type set when it was uploaded, you don't need to do this manually.
As for serving the blob, you need to create a blobstore download handler:
http://code.google.com/appengine/docs/python/tools/webapp/blobstorehandlers.html#BlobstoreDownloadHandler
[1] http://en.wikipedia.org/wiki/Internet_media_type
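If the MIME type was not recorded at upload time, one way to pick a value for the Content-Type header is to guess it from the filename with the standard library. This is a small sketch; the helper name and the fallback value are choices made for illustration.

```python
import mimetypes


def content_type_for(filename, default="application/octet-stream"):
    """Guess a Content-Type from the file extension, with a generic fallback."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or default


audio_type = content_type_for("song.mp3")
unknown_type = content_type_for("file_without_extension")
```

The download handler would then set this value on the response before sending the blob.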
I think what you're looking for is something like how S3 works, where the blobs you upload are automatically given a URL that can then be dropped directly into the browser. Blobstore was designed primarily to give developers control over their URLs and fine-grained control over access to the blobs. It does not have the facility to simply provide a URL based on, say, the blob reference. I think schuppe's answer is correct in describing what you need to do.
If you are interested in simply serving a blob to a user without any kind of authentication or restriction, it's not that hard to write a handler. The one in the documentation that schuppe referred you to will work OK; however, be careful, because it could open your app up to certain types of DoS attacks. Also, if you do it as the documentation does, anyone who has one of your blob-reference strings can access any blob throughout your whole application, whether you mean them to or not. Therefore you should build some additional access control around it.
Of course, if you're not concerned with controlling access to the data, that solution is simple and will work fine.
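One possible form of "additional access control" is a signed, expiring token that the handler checks before serving a blob. This is a different technique from anything in the linked documentation, sketched here with the standard library only; the secret, the function names, and the token layout are all hypothetical.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical; keep out of source control


def sign_blob_key(blob_key, expires_at, secret=SECRET_KEY):
    """Create a signature that ties a blob key to an expiry timestamp."""
    message = "{}:{}".format(blob_key, int(expires_at)).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def is_request_allowed(blob_key, expires_at, signature, now=None, secret=SECRET_KEY):
    """Check that the link has not expired and the signature matches."""
    now = time.time() if now is None else now
    if now > expires_at:
        return False
    expected = sign_blob_key(blob_key, expires_at, secret)
    return hmac.compare_digest(expected, signature)


signature = sign_blob_key("blob-abc", 1000)
```

The app would hand out URLs containing the blob key, expiry, and signature, and the download handler would call `is_request_allowed` before serving anything.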
