PDF/TIFF Document Text Detection - python

I am currently trying to use Google's Cloud Vision API for my project. The problem is that the Cloud Vision API for document text detection accepts only a Google Cloud Storage URI as the input and output destination, but all of my project data lives in Amazon S3, which can't be used directly with this API.
Points to be noted:
All data must stay in S3 only.
I can't change my cloud storage to GCS now.
I can't download files from S3 and upload them to GCS manually. The number of files incoming per day is more than 1,000 and fewer than 100,000.
Even if I could automate downloading and uploading the PDFs, that would become a bottleneck for the entire project, since I would have to deal with concurrency issues and memory management.
Is there any workaround to make this API work with an S3 URI? I am in need of your help.
Thank You

Currently, the Vision API doesn't work with URLs apart from Google Cloud Storage ones. There is an existing feature request about using the API with arbitrary image URLs, where you could ask for this to be considered for PDF/TIFF documents too, or you could raise a new feature request for this scenario.
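For reference, the document text detection flow only takes gs:// URIs for both input and output, which is where the S3 data hits the wall. A minimal sketch of the call using the google-cloud-vision client, with placeholder bucket paths:

```python
from google.cloud import vision_v1

client = vision_v1.ImageAnnotatorClient()

request = vision_v1.AsyncAnnotateFileRequest(
    features=[vision_v1.Feature(
        type_=vision_v1.Feature.Type.DOCUMENT_TEXT_DETECTION)],
    # Both the source and the destination must be Cloud Storage URIs;
    # an s3:// URI is rejected by the API.
    input_config=vision_v1.InputConfig(
        gcs_source=vision_v1.GcsSource(uri="gs://your-bucket/input.pdf"),
        mime_type="application/pdf",
    ),
    output_config=vision_v1.OutputConfig(
        gcs_destination=vision_v1.GcsDestination(uri="gs://your-bucket/output/"),
        batch_size=20,
    ),
)

operation = client.async_batch_annotate_files(requests=[request])
operation.result(timeout=300)  # JSON results land under gs://your-bucket/output/
```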

Related

Automatically sync new files from google drive to google cloud storage

I want to automatically sync new files that are added to Google Drive to Google Cloud Storage.
I have seen various people asking this on the web, and most of them suggest something along the lines of:
Develop an app to poll for new files in Drive
Retrieve the new files and upload them to GCS
If someone has already written an open-source library/script for this then I would like to reuse it instead of re-inventing the wheel.
Edit:
I have now written a watcher webhook API in Python and subscribed to the folder to get a notification when a new file is added to Google Drive.
Now the issue is that when the webhook is called by Google, no information is provided about the new files/folders that were added.
I understand you are looking for a method to sync content from different services (NFS, disks, etc.) to GCS in order to have a backup there and make the data accessible to applications that can only access Cloud Storage buckets.
We don't have a Google-owned solution for this; however, we have several partners who offer proprietary solutions that might work for your use case.
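On the edit about the webhook: Drive push notifications intentionally carry no file details; they only signal that the watched resource changed, and the handler is then expected to call changes().list with a page token it saved earlier. A minimal sketch of that step, assuming you already have authorized credentials (creds) and somewhere to persist the token:

```python
from googleapiclient.discovery import build

drive = build("drive", "v3", credentials=creds)  # creds: your OAuth2 credentials

# One-time setup: store a start token before subscribing to notifications.
saved_token = drive.changes().getStartPageToken().execute()["startPageToken"]

# Inside the webhook handler: list everything that changed since the saved token.
page_token = saved_token
while page_token:
    resp = drive.changes().list(
        pageToken=page_token,
        fields="nextPageToken,newStartPageToken,changes(fileId,file(name,mimeType))",
    ).execute()
    for change in resp.get("changes", []):
        f = change.get("file", {})
        print(change["fileId"], f.get("name"))  # here you would copy the file to GCS
    if "newStartPageToken" in resp:
        saved_token = resp["newStartPageToken"]  # persist for the next notification
    page_token = resp.get("nextPageToken")
```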

How to deliver content (video) for download with the Python SDK?

I'm a Python developer, inexperienced with Microsoft Azure services.
For a client I have to allow downloading of videos through Azure Media Services (video streaming). I did find information on the subject in the documentation (https://learn.microsoft.com/en-us/azure/media-services/previous/media-services-deliver-asset-download), but I want to get there using Python (so either the Azure REST API or the Python SDK).
I'm starting to believe it's impossible.
I need your help, please.
Everything you need to do should be completely possible with the Python SDK.
I do not recommend using the REST API directly! It does not have any of the built-in retry policies that the Azure Resource Management API requires. You can get into issues with that in production - unless you know what you are doing and roll your own retry logic.
Use the official Python SDK client for Media Services only.
Also, the link above for the REST API points to the legacy v2 API - do not use that now. Use the latest v3 SDK client only, installed via:
pip install azure-mgmt-media
We have a limited number of Python samples up here that show how to use the client SDK for Python - https://github.com/Azure-Samples/media-services-v3-python
None of us on the team are Python experts, and we don't seem to get a lot of contributions to that repo - so it is not anywhere near as comprehensive as our .NET samples here - https://github.com/Azure-Samples/media-services-v3-dotnet
But keep in mind that all the Azure SDKs are just auto-generated off the REST API Swagger (OpenAPI) definitions, so they all use the exact same entities and the same JSON structure on the wire. If you know what the REST API is doing and what the entities are, you can easily port things around between languages. It helps to know Python first, though!
You mentioned you want to download content - that will require the Storage SDK for Python. Media Services just uses Azure Storage accounts, meaning you can access the containers using SAS URLs to upload and download content. Look at the Storage samples for Python to see what to do there: https://pypi.org/project/azure-storage-blob/
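Putting those two pieces together, a rough sketch of listing an asset's files and downloading them: ask Media Services for a read-only SAS URL on the asset's storage container, then use azure-storage-blob against that URL. The names in angle brackets are placeholders and the auth assumes azure-identity; treat this as a starting point rather than an official sample.

```python
from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.media import AzureMediaServices
from azure.mgmt.media.models import AssetContainerPermission, ListContainerSasInput
from azure.storage.blob import ContainerClient

SUBSCRIPTION_ID = "<subscription-id>"      # placeholders - fill in your own values
RESOURCE_GROUP = "<resource-group>"
ACCOUNT_NAME = "<media-services-account>"
ASSET_NAME = "<asset-name>"

client = AzureMediaServices(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Ask Media Services for a read-only SAS URL on the asset's storage container.
sas = client.assets.list_container_sas(
    RESOURCE_GROUP,
    ACCOUNT_NAME,
    ASSET_NAME,
    parameters=ListContainerSasInput(
        permissions=AssetContainerPermission.READ,
        expiry_time=datetime.now(timezone.utc) + timedelta(hours=1),
    ),
)
container = ContainerClient.from_container_url(sas.asset_container_sas_urls[0])

# Download every blob in the asset container to the current directory
# (assumes a flat container; add path handling if the asset has nested blobs).
for blob in container.list_blobs():
    with open(blob.name, "wb") as f:
        f.write(container.download_blob(blob.name).readall())
```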
Uploaded videos are stored as asset files if they are uploaded using the Azure Media Services SDK, which makes it easier to stream the video to different devices.
To stream or download an asset, you first need to "publish" it by creating a locator. Locators provide access to files contained in the asset.
Media Services supports two types of locators:
OnDemandOrigin locators, used to stream media (for example, MPEG-DASH, HLS, or Smooth Streaming)
Shared Access Signature (SAS) locators, used to download media files.
Once you create the locators, you can build the URLs that are used to stream or download your files.
Here's a guide for doing that using Rest API : https://learn.microsoft.com/en-us/azure/media-services/previous/media-services-rest-get-started
Note: Are you uploading your videos directly to Azure Storage? If so, my suggestion would be to upload them using the Azure Media Services SDK instead.
Azure Media Services has pretty good documentation which might help with your other asks: http://azure.microsoft.com/en-us/develop/media-services/resources/

Is it possible to upload content to Cloud Storage automatically from Google Drive?

I need to load CSV files from Google Drive into BigQuery automatically, and I was wondering if it's possible to do it this way:
Google Drive folder
Pub/Sub, Cloud Functions, Drive API... ??
Cloud Storage bucket
BigQuery
I have developed a Python script that automatically loads a CSV file stored in Cloud Storage into BigQuery; now I need to create the workflow between Google Drive and Cloud Storage.
I've been researching but I don't really know how to proceed.
Any hints?
You will need to develop an app to listen for changes; Google App Engine or Cloud Functions work well here.
The app will need to implement the retrieve-changes logic in whatever way makes sense for your use case.
See these Google Drive API docs: https://developers.google.com/drive/api/v3/manage-changes
With Drive, I recommend asking whether the OAuth flow is worth it for any app. Asking your users to submit files through a lightweight frontend might be easier and faster to develop.
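Once the change listing gives you file IDs, the upload side is a small amount of glue between the Drive API and the google-cloud-storage client. A hedged sketch of that step; the credentials, bucket name, and content type are assumptions to adapt:

```python
import io

from google.cloud import storage
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload


def copy_drive_file_to_gcs(creds, file_id, file_name, bucket_name):
    """Download one Drive file into memory and re-upload it to a GCS bucket."""
    drive = build("drive", "v3", credentials=creds)

    # Stream the file contents from Drive into an in-memory buffer.
    buf = io.BytesIO()
    downloader = MediaIoBaseDownload(buf, drive.files().get_media(fileId=file_id))
    done = False
    while not done:
        _, done = downloader.next_chunk()
    buf.seek(0)

    # Upload the buffer to Cloud Storage under the same file name.
    bucket = storage.Client().bucket(bucket_name)
    bucket.blob(file_name).upload_from_file(buf, content_type="text/csv")
```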
Try using the Google Drive API to pull data from Google Drive and load it into whichever location you want, e.g. GCS, a BigQuery table, and so on.
You can refer to the example below to create code that achieves the same.
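For instance, a minimal sketch of the Cloud Storage to BigQuery leg using the google-cloud-bigquery client; the GCS URI and table ID are placeholders:

```python
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,   # assumes a header row
    autodetect=True,       # let BigQuery infer the schema
)

load_job = client.load_table_from_uri(
    "gs://your-bucket/your-file.csv",        # placeholder GCS URI
    "your-project.your_dataset.your_table",  # placeholder table ID
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish
```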

How to download all files from google bucket directory to local directory using google oauth

Is there any way, using OAuth, to download all the content of a Google Cloud Storage bucket directory to a local directory?
I found two ways: using the get-object request from the Storage API, and gsutil. But since the API downloads objects by name, I first have to list the bucket's contents and then send a GET request for each object and download it. I find gsutil more convenient, but for that I have to hard-code the credential details.
Basically, I am developing a client-facing application where I have to download BigQuery table data to the client's local server.
Can anyone help me with this?
Unless your application knows ahead of time the object names that you want to download, you'll need to perform a list followed by GETs for each object.
You can use the gcloud-python client library to do this. You can configure your client application with the OAuth2 credentials and the library should handle the rest of the necessary authentication for you. See the documentation here for the basics of authentication, and [here](https://googlecloudplatform.github.io/google-cloud-python/stable/storage-blobs.html) for interacting with Google Cloud Storage objects.
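With the current google-cloud-storage package (the successor to gcloud-python), the list-then-download loop looks roughly like this; the bucket name, prefix, and destination directory are placeholders:

```python
import os

from google.cloud import storage


def download_bucket_dir(bucket_name, prefix, local_dir):
    """List objects under a prefix and download each one to a local directory."""
    client = storage.Client()  # picks up your OAuth2 / application default credentials
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        if blob.name.endswith("/"):  # skip "directory" placeholder objects
            continue
        dest = os.path.join(local_dir, os.path.basename(blob.name))
        blob.download_to_filename(dest)


download_bucket_dir("your-bucket", "exports/", ".")
```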

Can I upload a file to GCS from Google Endpoints?

I'm trying to upload a file from a REST API (Google Cloud Endpoints) to GCS, but I get a lot of errors. I don't know if I'm going about it the wrong way or if Google Endpoints simply cannot upload a file.
I want my customers to be able to upload files to my project's bucket.
I read that "Endpoints doesn't accept the multipart/form-data encoding, so you can't upload the image directly to Endpoints".
Mike answered me in this post, but I don't know how to implement that in my project.
I'm using this library (Python):
https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/
If it is possible, what's the best way? Any example?
Thanks so much.
I think what Mike means in the previous post is that you should use the Blobstore API to upload the file to GCS, instead of uploading through Endpoints and then moving the data into Blobstore again.
But that will depend on what platform your client is. If you use a web-based client, you should do it the ordinary way, just as Mike explained (using an HTML form and an upload handler). But if you use Android or another mobile client, you can use the GCS client library or the GCS REST API.
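For the web case, a rough sketch of the Blobstore-to-GCS pattern on the legacy first-generation App Engine runtime (Python 2.7, webapp2); the bucket name and routes are placeholders:

```python
from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers
import webapp2

BUCKET = "your-bucket"  # placeholder GCS bucket name


class UploadFormHandler(webapp2.RequestHandler):
    def get(self):
        # The form posts directly to Blobstore, which writes into the GCS bucket.
        upload_url = blobstore.create_upload_url(
            "/upload_complete", gs_bucket_name=BUCKET)
        self.response.write(
            '<form action="%s" method="POST" enctype="multipart/form-data">'
            '<input type="file" name="file"><input type="submit"></form>'
            % upload_url)


class UploadCompleteHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        # get_file_infos() exposes the GCS object name of the uploaded file.
        file_info = self.get_file_infos()[0]
        self.response.write("Stored at %s" % file_info.gs_object_name)


app = webapp2.WSGIApplication([
    ("/", UploadFormHandler),
    ("/upload_complete", UploadCompleteHandler),
])
```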
