I want to automatically sync new files that are added to Google Drive to Google Cloud Storage.
I have seen various people asking this on the web and most of them suggest something along the lines of:
Develop an app to poll for new files in the drive
Retrieve new files and upload them to GCS
If someone has already written an open-source library/script for this then I would like to reuse it instead of re-inventing the wheel.
Edit:
I have now written a webhook watcher API in Python and subscribed to the folder to get notifications when a new file is added to Google Drive.
Now the issue is that when Google calls the webhook, no information is provided about which files/folders were added.
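That appears to be by design: Drive push notifications only tell you that something changed, and you are expected to call changes.list with a saved page token to find out what. A minimal sketch of that pattern, assuming Flask and google-api-python-client; the token-persistence helpers, the service-account file, and the route path are all placeholders:

```python
from flask import Flask, request
from google.oauth2 import service_account
from googleapiclient.discovery import build

app = Flask(__name__)

# Assumed: a service account with read access to the watched folder.
creds = service_account.Credentials.from_service_account_file(
    "service-account.json",  # placeholder path
    scopes=["https://www.googleapis.com/auth/drive.readonly"])
drive = build("drive", "v3", credentials=creds)

def load_page_token():
    """Hypothetical helper: read the page token saved after the last call."""
    raise NotImplementedError

def save_page_token(token):
    """Hypothetical helper: persist the token for the next webhook call."""
    raise NotImplementedError

@app.route("/drive-webhook", methods=["POST"])
def drive_webhook():
    # The POST body is empty; Google only sends headers such as
    # X-Goog-Resource-State ("sync", "change", ...).
    if request.headers.get("X-Goog-Resource-State") == "sync":
        return "", 200  # initial handshake, nothing has changed yet
    page_token = load_page_token()
    while page_token:
        resp = drive.changes().list(
            pageToken=page_token,
            fields="nextPageToken,newStartPageToken,"
                   "changes(fileId,removed,file(name,mimeType))").execute()
        for change in resp.get("changes", []):
            if not change.get("removed"):
                print("New/updated file:", change["fileId"])
        if "newStartPageToken" in resp:
            save_page_token(resp["newStartPageToken"])
        page_token = resp.get("nextPageToken")
    return "", 200
```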
I understand you are looking for a way to sync content from different services (NFS, disks, etc.) to GCS, in order to have a backup there and make the data accessible to applications that can only access Cloud Storage buckets.
There is no Google-owned solution for this; however, there are several partners who offer proprietary solutions that might work for your use case.
Related
I need to load CSV files from Google Drive into BigQuery automatically and I was wondering if it's possible to do it that way:
Google Drive folder
→ Pub/Sub, Cloud Functions, DriveApi... ??
→ Cloud Storage bucket
→ BigQuery
I have developed a Python script that automatically loads the CSV file stored in Cloud Storage into BigQuery; now I need to create the workflow between Google Drive and Cloud Storage.
I've been researching but really don't know how to proceed.
Any hints?
You will need to develop an app to listen for changes; Google App Engine or Cloud Functions work well here.
The app will need to implement the Retrieve Changes logic that makes sense to your use case.
See these Google Drive API docs https://developers.google.com/drive/api/v3/manage-changes
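As a rough sketch of how the Retrieve Changes logic could feed the Drive-to-GCS leg of your pipeline (service-account auth, PROJECT_ID/BUCKET_NAME, and how you persist the page token between runs are all assumptions here):

```python
import io

from google.cloud import storage
from google.oauth2 import service_account
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload

creds = service_account.Credentials.from_service_account_file(
    "service-account.json",  # placeholder path
    scopes=["https://www.googleapis.com/auth/drive.readonly",
            "https://www.googleapis.com/auth/devstorage.read_write"])
drive = build("drive", "v3", credentials=creds)
bucket = storage.Client(project="PROJECT_ID",
                        credentials=creds).bucket("BUCKET_NAME")

def sync_changes(page_token):
    """Copy each new/updated CSV from Drive into the bucket and return
    the token to persist for the next run."""
    new_token = page_token
    while page_token:
        resp = drive.changes().list(
            pageToken=page_token,
            fields="nextPageToken,newStartPageToken,"
                   "changes(fileId,removed,file(name,mimeType))").execute()
        for change in resp.get("changes", []):
            f = change.get("file")
            if change.get("removed") or not f or f["mimeType"] != "text/csv":
                continue
            buf = io.BytesIO()
            downloader = MediaIoBaseDownload(
                buf, drive.files().get_media(fileId=change["fileId"]))
            done = False
            while not done:
                _, done = downloader.next_chunk()
            buf.seek(0)
            bucket.blob(f["name"]).upload_from_file(buf)
        new_token = resp.get("newStartPageToken", new_token)
        page_token = resp.get("nextPageToken")
    return new_token

# First run only: fetch a starting token, then persist what sync_changes returns.
start = drive.changes().getStartPageToken().execute()["startPageToken"]
next_token = sync_changes(start)
```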
With Drive, I recommend asking whether the OAuth flow is worth it for any app. Asking your users to submit files to a lightweight frontend might be easier and faster to develop.
Try using the Google Drive API to pull data from Google Drive and load it into whichever location you want, e.g. GCS, a BQ table, and so on.
You can refer to the following example to write code that achieves this.
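For illustration, pulling the files in a given Drive folder might look roughly like this (a sketch; FOLDER_ID and the creds object from the Drive quickstart flow are placeholders):

```python
from googleapiclient.discovery import build

# creds: whatever credentials object the quickstart flow gave you.
drive = build("drive", "v3", credentials=creds)
resp = drive.files().list(
    q="'FOLDER_ID' in parents and trashed = false",  # placeholder folder id
    fields="files(id,name,mimeType)").execute()
for f in resp.get("files", []):
    print(f["id"], f["name"], f["mimeType"])
```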
I am looking at Google Drive API tutorials and they tell you to store credentials.json in the working directory (e.g. https://developers.google.com/drive/api/v3/quickstart/python).
My goal is to make a script which regularly runs on my system and downloads files from Google Drive. My concern is: does storing the credentials.json file leave me open to security risks? If anyone gets access to this file, can they not use it to gain access to all my Google Drive data?
If so, then how should I store the credentials file in a secure manner?
The credentials.json file is used to create user credentials for your application. If someone got hold of it, they could pretend to be your application, request access from users, and then do what they wanted with user data. It is very important that you keep this file secure.
Note: If you are only accessing your own Google Drive account, and not accounts owned by other users, then you should consider looking into service accounts.
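For that single-account case, a minimal service-account sketch (no credentials.json browser flow; you would first share the relevant Drive files or folder with the service account's email, and the key-file name here is a placeholder):

```python
from google.oauth2 import service_account
from googleapiclient.discovery import build

# The key file is just as sensitive as credentials.json: keep it out of
# version control and readable only by the account running the script.
creds = service_account.Credentials.from_service_account_file(
    "service-account-key.json",  # placeholder path
    scopes=["https://www.googleapis.com/auth/drive.readonly"])
drive = build("drive", "v3", credentials=creds)

resp = drive.files().list(pageSize=10, fields="files(id,name)").execute()
for f in resp.get("files", []):
    print(f["id"], f["name"])
```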
I am currently trying to use Google's Cloud Vision API for my project. The problem is that the Cloud Vision API for document text detection accepts only a Google Cloud Storage URI as input and output destination, but I have all my projects and data on Amazon S3, which can't be used directly with this API.
Points to note:
All data should be kept in S3 only.
I can't change my cloud storage to GCS now.
I can't download files from S3 and upload them to GCS manually. The number of files incoming per day is more than 1,000 and less than 100,000.
Even if I could automate downloading and uploading the PDFs, this would be a bottleneck for the entire project, since I would have to deal with concurrency issues and memory management.
Is there any workaround to make this API work with S3 URIs? I am in need of your help.
Thank You
Currently, the Vision API doesn't work with URLs, apart from Google Cloud Storage ones. There is a feature request, related to image search, for using the API with arbitrary URLs; you could ask there for this to be considered for PDF/TIFF documents too, or raise a new feature request for this scenario.
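If you do end up bridging the two stores in the meantime, one hedged sketch of an automated S3-to-GCS copy (boto3 and google-cloud-storage assumed; both bucket names are placeholders):

```python
import io

import boto3
from google.cloud import storage

s3 = boto3.client("s3")
gcs_bucket = storage.Client().bucket("my-gcs-bucket")  # placeholder name

def copy_pdf(key):
    """Copy one PDF from S3 into GCS without touching local disk."""
    body = s3.get_object(Bucket="my-s3-bucket", Key=key)["Body"]
    # Each object is buffered in memory; for very large PDFs you would
    # stream in chunks instead.
    gcs_bucket.blob(key).upload_from_file(
        io.BytesIO(body.read()), content_type="application/pdf")
```

Fanning copy_pdf out over a worker pool (or a queue of object keys) is one way to handle the 1,000 to 100,000 files per day without a single-threaded bottleneck.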
Is there any way, using OAuth, to download all the content of a Google Cloud Storage bucket directory to a local directory?
I found two ways: the GET request for objects from the Storage API, and gsutil. But since the API downloads objects by name, I first have to list the whole bucket's contents, then send a GET request for each object and download it. I find gsutil more convenient, but for that I have to hard-code the credential details.
Basically, I am developing a client-facing application where I have to download BigQuery table data to the client's local server.
Can anyone help me with this?
Unless your application knows ahead of time the object names that you want to download, you'll need to perform a list followed by GETs for each object.
You can use the gcloud-python client library to do this. You can configure your client application with the OAuth2 credentials, and the library should handle the rest of the necessary authentication for you. See the documentation here for the basics of authentication, and [here](https://googlecloudplatform.github.io/google-cloud-python/stable/storage-blobs.html) for interacting with Google Cloud Storage objects.
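A short sketch of that list-then-GET flow with the library (the bucket name, prefix, and local directory are placeholders; credentials come from whatever OAuth2/application-default setup your client has):

```python
import os

from google.cloud import storage

client = storage.Client()  # uses the application's configured credentials
bucket = client.bucket("my-bucket")  # placeholder name

os.makedirs("local-dir", exist_ok=True)
for blob in bucket.list_blobs(prefix="exports/"):  # the "directory" to copy
    if blob.name.endswith("/"):
        continue  # skip zero-byte folder placeholder objects
    blob.download_to_filename(
        os.path.join("local-dir", os.path.basename(blob.name)))
```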
Background
I've created a Slack bot that listens in a channel for when a file is uploaded, downloads the content, re-uploads it to our Google Drive account, and deletes it from Slack. This all works perfectly using the slack-client API and the Google Drive API in Python.
Problem
What I would like to recreate is the view that the Google Drive Slack integration creates when you import a file from Google Drive, instead of just having a link (which is all my bot is currently capable of).
I'm currently using slack_client.api_call("chat.postMessage", ..., unfurl_media=True, unfurl_links=True); however, that does not solve the problem (it still just appears as a link, instead of an attachment like the Google Drive integration produces).
Anyone have any recommendations on how to achieve the same look as the Google Drive integration? The idea is that the thumbnails and previews of attachments should not go away, but everything should be hosted on Google Drive as opposed to Slack's servers, since we share a ton of files.
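One possible approximation (a sketch, not the official Drive unfurl): build a message attachment from the file's Drive metadata. Here drive_service and slack_client are assumed to be the bot's already-authenticated clients, and file_id and the channel ID are placeholders:

```python
import json

# Fetch the name, link and thumbnail Drive already generated for the file.
meta = drive_service.files().get(
    fileId=file_id, fields="name,webViewLink,thumbnailLink").execute()
slack_client.api_call(
    "chat.postMessage",
    channel="C0123456",  # placeholder channel ID
    text="Uploaded to Google Drive",
    attachments=json.dumps([{
        "title": meta["name"],
        "title_link": meta["webViewLink"],
        "thumb_url": meta.get("thumbnailLink"),
        "footer": "Google Drive",
    }]))
```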