I'm creating a CLI tool to move files around in a user's Google Drive space. I'm using Python and the Google Drive API Python SDK to do that, and I've created this repo.
Now I have to run this tool every midnight to move files from one folder to another, with input from the user. Locally I can create credentials.json, retrieve the credentials from it, generate the token.json file and use it to authenticate to the Drive API. But in a public CI environment, I would save my credentials in a GitHub Secret and give them to my tool at runtime using an option.
Can I do that?
Are there any security issues?
I would like to publish this tool in some forums (like the Python subreddit) to get suggestions and improvements, but before doing that, I'd like to make this move-file function as complete as possible.
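One possible way to pass the credentials at runtime is to read them from an environment variable that the CI job fills from a GitHub Secret, with a command-line option as a fallback for local runs. The sketch below is only an illustration: the variable name GDRIVE_CREDENTIALS_JSON, the --credentials-file option and the function name are hypothetical and not taken from the repo.

```python
import argparse
import json
import os

# Hypothetical name; the CI job would map a GitHub Secret onto this variable.
ENV_VAR = "GDRIVE_CREDENTIALS_JSON"


def load_credentials_info(argv=None, environ=None):
    """Return the credentials as a dict, from a file option or the environment."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(description="Move files in Google Drive")
    # Hypothetical option: path to a credentials file for local runs.
    parser.add_argument("--credentials-file", default=None)
    args = parser.parse_args(argv)

    if args.credentials_file:
        with open(args.credentials_file, encoding="utf-8") as handle:
            return json.load(handle)

    raw = environ.get(ENV_VAR)
    if not raw:
        raise SystemExit(f"Set {ENV_VAR} or pass --credentials-file")
    # Never print or log `raw`: the CI logs of a public repo are public too.
    return json.loads(raw)
```

The returned dict can be handed to the auth library instead of a file path; for a service account, google.oauth2.service_account.Credentials.from_service_account_info accepts such a dict. Passing the secret through the environment rather than as a command-line argument keeps it out of the process list and the shell history.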
Related
I've built a desktop app in Python and want some friends to use it. When they run it, I need the app to access some online database to make sure that it's actually them and not someone else.
I chose Google Drive for the meantime (created a new user for the app) and everything works with PyDrive, but it seems that the Google user needs to be logged in.
How can the app access the drive to get the necessary information from other computers in which this specific google user is not logged in?
I am working on a Google Colab notebook that requires the user to mount google drive using the colab.drive python library. They then input relative paths on the local directory tree (/content/drive/... by default on that mount) to files of interest for analysis. Now, I want to use a Google Sheet they can create as a configuration file. There is lots of info on how to authenticate gspread and fetch a sheet from its HTTPS url, but I can't find any info on how to access the .gsheet file using gspread that is already mounted on the local filesystem of the colab runtime.
There are many tutorials using this flow: https://colab.research.google.com/notebooks/io.ipynb#scrollTo=yjrZQUrt6kKj , but I don't want to make the user authenticate twice (having already done so for the initial mount), and I don't want to make them input some files as relative paths and some as HTTPS URLs.
I had thought this would be quite like using gspread to work with google sheets that I might have on my locally mounted drive as well. But, I haven't seen this workflow anywhere either. Any pointers in that direction might help me out as well.
Thank you!
Instead of adding the .gsheet on Colab's drive, you can try storing it in the user's Drive and fetching it from there when needed. As long as that kernel is running, you won't have to re-authenticate the user.
I'm also not finding any way to authenticate into Colab from another device, so you might consider modifying your flow a bit.
I used gsutil to download files from a bucket, but would now like to do the download from a Python script to automate the process (for downloading specific days' data). The following gsutil command worked fine.
gsutil -m cp -r gs://gcp-public-data-goes-16/GLM-L2-LCFA/2019/001 C:\dloadFiles
Using the storage client I have tried:
from google.cloud import storage
client = storage.Client()
with open('C:\dloadFiles') as file_obj:
    client.download_blob_to_file(
        'gs://gcp-public-data-goes-16/GLM-L2-LCFA/2019/001', file_obj)
I get error "DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started"
This is a publicly available bucket.
You did not set up GOOGLE_APPLICATION_CREDENTIALS.
Follow the link below to set up credentials:
https://stackoverflow.com/questions/45501082/set-google-application-credentials-in-python-project-to-use-google-api
After setting up credentials, your code will get past this error.
After authenticating with your GCP credentials, you will also need to run:
gcloud auth application-default login
This authenticates your application SDKs, such as your Python client libraries. Then you will be able to interact with GCP services via Python.
Also, you are copying a whole load of files with your gsutil command and not just one as you're doing with python. So you probably want to list_blobs first and then iteratively download them to files.
Also check out blob.download_to_filename to save you some coding (docs here). With that you can write a blob to a named file directly, without opening the file first.
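To mirror what the gsutil command does, the listing and the per-object download could look roughly like the sketch below. It assumes the google-cloud-storage package is installed and uses an anonymous client, which should be enough for a public bucket; the helper names and the destination directory are made up for illustration.

```python
import os


def local_path_for(blob_name, prefix, dest_dir):
    """Map an object name under `prefix` to a path inside `dest_dir`."""
    relative = blob_name[len(prefix):].lstrip("/")
    if not relative:
        # The prefix named a single object: keep just its base name.
        relative = blob_name.rsplit("/", 1)[-1]
    return os.path.join(dest_dir, *relative.split("/"))


def download_prefix(bucket_name, prefix, dest_dir):
    """Download every object whose name starts with `prefix`."""
    # Imported here so the path helper above works without the library.
    from google.cloud import storage

    # An anonymous client needs no credentials; it only works on public data.
    client = storage.Client.create_anonymous_client()
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        if blob.name.endswith("/"):
            continue  # skip zero-byte "folder" placeholder objects
        target = local_path_for(blob.name, prefix, dest_dir)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        blob.download_to_filename(target)
```

For example, download_prefix("gcp-public-data-goes-16", "GLM-L2-LCFA/2019/001/", "dloadFiles") would fetch the same day of data as the gsutil command, assuming the bucket still allows anonymous reads.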
First thing, turn off public on this bucket unless you really need the bucket to be public. For private access, you should use a service account (your code) or OAuth credentials.
If you are running this code on a Google Cloud compute service, credentials will be discovered automatically through Application Default Credentials (ADC).
If you are running outside of Google Cloud, change this line:
client = storage.Client()
To this:
client = storage.Client.from_service_account_json('/full/path/to/service-account.json')
This line in your code is trying to open a directory. This is not correct. You need to specify a file name and not a directory name. You also need to specify write permission:
with open('C:\dloadFiles') as file_obj:
Change to:
with open('c:/directory/myfilename', 'w') as file_obj:
Or for binary (data) files:
with open('c:/directory/myfilename', 'wb') as file_obj:
I am assuming that this path is a file blob and not a "simulated" folder on GCS. If this is a folder, you will need to change it to a file (storage object blob).
gs://gcp-public-data-goes-16/GLM-L2-LCFA/2019/001
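The difference between opening a directory and opening a file for binary writing can be reproduced with the standard library alone. The sketch below uses a temporary directory and a made-up file name, so nothing in it refers to the real download location.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Opening the directory itself fails; the exact exception is an OSError
    # subclass that differs between platforms.
    try:
        open(folder)
        opened_directory = True
    except OSError:
        opened_directory = False

    # A file name inside the directory, opened in binary write mode, works.
    target = os.path.join(folder, "myfilename")
    with open(target, "wb") as file_obj:
        file_obj.write(b"example bytes")  # a download would write here

    size = os.path.getsize(target)
```

As a side note, 'C:\dloadFiles' keeps its backslash only because \d is not a recognized escape sequence; a raw string (r'C:\dloadFiles') or forward slashes avoid surprises with other folder names.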
I am not hosting the project on git or anything like that. I'm trying to use PyDrive and it is not letting me load service account credentials from Heroku environment variables. If I put the JSON credentials file with my project, is there any chance someone could find it in my Heroku project? This project is computer to computer, basically generating Word docs in a Google Team Drive. There is no web interface for it.
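If the library insists on a file path, one workaround is to keep the JSON in a Heroku config var and write it to a private temporary file at startup, so the key never ships with the project. This is only a sketch: the variable name GOOGLE_CREDENTIALS and the function name are hypothetical, and whether PyDrive accepts the resulting path depends on how it is configured.

```python
import json
import os
import tempfile


def write_key_file(environ=None, var_name="GOOGLE_CREDENTIALS"):
    """Write the JSON stored in an environment variable to a temp file."""
    environ = os.environ if environ is None else environ
    raw = environ[var_name]  # raises KeyError if the config var is missing
    json.loads(raw)  # fail early if the value is not valid JSON

    # mkstemp creates the file readable and writable by the owner only.
    fd, path = tempfile.mkstemp(prefix="service-account-", suffix=".json")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(raw)
    return path
```

The returned path can then be used wherever a key file name is expected, and the file can be deleted again once the credentials have been loaded.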
I'm having trouble submitting an Apache Beam example from a local machine to our cloud platform.
Using gcloud auth list I can see that the correct account is currently active. I can use gsutil and the web client to interact with the file system. I can use the cloud shell to run pipelines through the python REPL.
But when I try and run the python wordcount example I get the following error:
IOError: Could not upload to GCS path gs://my_bucket/tmp: access denied.
Please verify that credentials are valid and that you have write access
to the specified path.
Is there something I am missing with regards to the credentials?
Here are my two cents after spending the whole morning on the issue.
You should make sure that you log in with gcloud on your local machine. However, pay attention to the warning message that gcloud auth login returns:
WARNING: `gcloud auth login` no longer writes application default credentials.
The application default credentials are what the Python code needs in order to identify you properly.
The solution is rather simple, just use:
gcloud auth application-default login
This will write a credentials file under: ~/.config/gcloud/application_default_credentials.json which is used for the authentication in the local development env.
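A quick way to see which credential source is present on a machine is to look for the environment variable first and then for the file mentioned above. The helper below is a hypothetical diagnostic, not part of any Google library, and it only knows the Linux/macOS file location.

```python
import os


def find_adc_source(environ=None, home=None):
    """Return a short description of where default credentials would come from."""
    environ = os.environ if environ is None else environ
    home = os.path.expanduser("~") if home is None else home

    key_path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key_path and os.path.isfile(key_path):
        return "env:" + key_path

    # Written by `gcloud auth application-default login` on Linux and macOS.
    adc_file = os.path.join(
        home, ".config", "gcloud", "application_default_credentials.json"
    )
    if os.path.isfile(adc_file):
        return "file:" + adc_file
    return None
```

If this returns None on the machine that submits the pipeline, the Python SDK will most likely not find credentials there either.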
You'll need to create a GCS bucket and folder for your project, then specify that as the pipeline parameter instead of using the default value.
https://cloud.google.com/storage/docs/creating-buckets
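For the wordcount example, pointing the pipeline at your own bucket comes down to a few command-line options. The sketch below only assembles the argument list; the project ID, bucket name and region are placeholders and the helper name is made up. The flags shown are the ones the Beam Python examples accept as far as I know, so double-check them against your SDK version.

```python
import sys


def wordcount_command(project, bucket, region="us-central1"):
    """Build the argv for running the Beam wordcount example on Dataflow."""
    root = f"gs://{bucket}"
    return [
        sys.executable, "-m", "apache_beam.examples.wordcount",
        "--runner", "DataflowRunner",
        "--project", project,
        "--region", region,
        # Both locations must be in a bucket you can write to.
        "--temp_location", f"{root}/tmp/",
        "--staging_location", f"{root}/staging/",
        "--output", f"{root}/output/counts",
    ]
```

The list can be passed to subprocess.run(..., check=True) once the bucket exists.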
I had the same error; it was solved after creating the bucket:
gsutil mb gs://<bucket-name-from-the-error>/
I have faced the same issue where it throws the IOError. Things that helped me here (in no particular order):
Checking the name of the bucket. This step helped me a lot. Bucket names are global. If you make a mistake in the bucket name, you might be accessing a bucket that you have NOT created and don't have permission to use.
Checking the service account key file that you have set:
export GOOGLE_APPLICATION_CREDENTIALS=your-key-file.json
Activating the service account for the key file you have plugged in -
gcloud auth activate-service-account --key-file=your-key-file.json
Also, listing out the auth accounts available might help you too.
gcloud auth list
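Since a mistyped bucket name can point at somebody else's bucket, it can help to split the path from the error message and check write access on that exact bucket. A rough sketch, assuming the google-cloud-storage package and working default credentials; parse_gcs_path and can_write are hypothetical helper names.

```python
def parse_gcs_path(path):
    """Split 'gs://bucket/some/prefix' into ('bucket', 'some/prefix')."""
    if not path.startswith("gs://"):
        raise ValueError(f"not a GCS path: {path!r}")
    bucket, _, prefix = path[len("gs://"):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in {path!r}")
    return bucket, prefix


def can_write(path):
    """Return True if the active credentials may create objects in the bucket."""
    # Imported lazily so parse_gcs_path can be used without the library.
    from google.cloud import storage

    bucket_name, _ = parse_gcs_path(path)
    bucket = storage.Client().bucket(bucket_name)
    needed = ["storage.objects.create"]
    # test_iam_permissions returns the subset of permissions that is granted.
    return bucket.test_iam_permissions(needed) == needed
```

Calling can_write("gs://my_bucket/tmp") may also raise an exception from the library, for instance if the bucket does not exist, which is itself a useful hint.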
One solution might work for you. It did for me.
In the Cloud Shell window, click on "Launch code Editor" (the pencil icon). The editor works in Chrome (not sure about Firefox); it did not work in the Brave browser.
Now, browse to your code file [in the launched code editor on GCP] (.py or .java), locate the pre-defined PROJECT and BUCKET names, replace them with your own project and bucket names, and save the file.
Now execute the file and it should work.
The Python client libraries don't use your gcloud auth login credentials to authenticate; they look up Application Default Credentials, checking the environment variable GOOGLE_APPLICATION_CREDENTIALS first. So before you run the python command to launch the Dataflow job, you will need to set that environment variable:
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/key"
More info on setting up the environment variable: https://cloud.google.com/docs/authentication/getting-started#setting_the_environment_variable
Then you'll have to make sure that the account you set up has the necessary permissions in your GCP project.
Permissions and service accounts:
User service account or user account: it needs the Dataflow Admin
role at the project level and to be able to act as the worker service
account (source:
https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#worker_service_account).
Worker service account: it will be one worker service account per
Dataflow pipeline. This account will need the Dataflow Worker role at
the project level plus the necessary permissions to the resources
accessed by the Dataflow pipeline (source:
https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#worker_service_account).
Example: if Dataflow pipeline’s input is Pub/Sub topic and output is
BigQuery table, the worker service account will need read access to
the topic as well as write permission to the BQ table.
Dataflow service account: this is the account that gets automatically
created when you enable the Dataflow API in a project. It
automatically gets the Dataflow Service Agent role at the project
level (source:
https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#service_account).
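The role requirements above can be turned into a small check against a project's IAM policy. The sketch below assumes the policy has the JSON shape printed by gcloud projects get-iam-policy --format=json and that the roles map to the IDs roles/dataflow.admin, roles/iam.serviceAccountUser and roles/dataflow.worker; the function name and the "submitter"/"worker" labels are invented. The ability to act as the worker service account is often granted on the service account itself rather than at project level, which this check would not see.

```python
# Role IDs assumed to correspond to the roles named in the text.
REQUIRED_ROLES = {
    "submitter": {"roles/dataflow.admin", "roles/iam.serviceAccountUser"},
    "worker": {"roles/dataflow.worker"},
}


def missing_roles(policy, member, kind):
    """Return the required roles that `member` lacks in `policy`, sorted."""
    granted = {
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", [])
    }
    return sorted(REQUIRED_ROLES[kind] - granted)
```

An empty list means the project-level roles are in place; permissions on the pipeline's own resources (topics, tables, buckets) still have to be checked separately.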