Fetching Google Sheet data from locally mounted drive via gspread - python

I am working on a Google Colab notebook that requires the user to mount Google Drive using the colab.drive Python library. They then input relative paths on the local directory tree (/content/drive/... by default on that mount) to the files of interest for analysis. Now I want to use a Google Sheet they can create as a configuration file. There is plenty of info on how to authenticate gspread and fetch a sheet from its HTTPS URL, but I can't find anything on how to use gspread to access a .gsheet file that is already mounted on the local filesystem of the Colab runtime.
There are many tutorials using this flow: https://colab.research.google.com/notebooks/io.ipynb#scrollTo=yjrZQUrt6kKj , but I don't want to make the user authenticate twice (having already done so for the initial mount), and I don't want to make them input some files as relative paths and others as HTTPS URLs.
I had assumed this would be much like using gspread to work with Google Sheets on my own locally mounted drive, but I haven't seen that workflow anywhere either. Any pointers in that direction would help me out as well.
Thank you!

Instead of keeping the .gsheet on Colab's local drive, you can try storing it in the user's Drive and fetching it from there when needed. That way, as long as the kernel is running, you won't have to re-authenticate the user.
I also haven't found anything on authenticating into Colab from another device, so you may need to modify your flow a bit.
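If you do want to keep gspread in the picture, one possible workaround is to treat the mounted .gsheet file as a pointer rather than as data: on the mounted Drive it is a small JSON stub that (in my experience) carries the spreadsheet's URL/doc id, which you can hand to gspread. This is a minimal sketch, assuming the stub really exposes a url or doc_id field and that the path is whatever the user typed; note it still authorizes gspread separately via authenticate_user, so it avoids the URL-vs-path mismatch but not the extra consent prompt.

import json
import gspread
from google.colab import auth
from google.auth import default

# Authorize gspread with the notebook user's Google credentials.
auth.authenticate_user()
creds, _ = default()
gc = gspread.authorize(creds)

def open_sheet_from_gsheet_pointer(path):
    # .gsheet files on the mounted drive are small JSON stubs, not the sheet data itself.
    with open(path) as f:
        pointer = json.load(f)
    # Assumption: the stub carries either a doc_id or a url key.
    if "doc_id" in pointer:
        return gc.open_by_key(pointer["doc_id"])
    return gc.open_by_url(pointer["url"])

# Hypothetical relative path supplied by the user under the mount point.
sheet = open_sheet_from_gsheet_pointer("/content/drive/MyDrive/config.gsheet")
config_rows = sheet.sheet1.get_all_records()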

Related

How can I hide non-sensitive credentials on a Google Colab Notebook?

I have a Google Colab Notebook that is using psycopg2 to connect with a free Heroku PostgreSQL instance. I'd like to share the notebook with some colleagues for educational purposes to view and run the code.
There is nothing sensitive related to the account/database, but I would still like to hide the credentials used to make the initial connection without restricting their access.
My workaround was creating a Python module containing a function that performs the initial connection with the credentials. I compiled the module into a binary .pyc, uploaded it to Google Drive, downloaded the binary into the notebook's contents via a shell command, and then imported it.
It obviously isn't secure but provides the obfuscation layer I was looking for.
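For anyone wanting to reproduce this kind of obfuscation, a rough sketch of the idea follows; the module name, function, and Drive path are hypothetical placeholders. On your own machine, compile the credentials module to bytecode:

import py_compile
# db_secrets.py is a hypothetical module exposing connect(), which wraps
# psycopg2.connect(...) with the hard-coded credentials.
py_compile.compile("db_secrets.py", cfile="db_secrets.pyc")

Then, in the Colab notebook, copy the .pyc into the working directory and import it like a normal module (the .pyc must be built with the same Python version as the Colab runtime):

!cp "/content/drive/MyDrive/db_secrets.pyc" .
import db_secrets
conn = db_secrets.connect()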

Asking for user credentials to access the Google Drive API

I'm creating a CLI tool to move files around in a user's Google Drive space. I'm using Python and the Google Drive API Python SDK to do that, and I've created this repo.
Now I have to run this tool every midnight to move files from one folder to another based on user input. Locally I can create credentials.json, retrieve credentials from it, generate the token.json file, and use it to authenticate to the Drive API. But in a public CI environment, I would save my credentials in a GitHub Secret and pass them to my tool at runtime using an option.
Can I do that?
Are there any security issues?
I would like to publish this tool on some forums (like the Python subreddit) to get suggestions and improvements, but before doing that, I'd like to make this move-file function as complete as possible.
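No answer is recorded here, but the CI part of the question is usually handled by injecting the secret as an environment variable and rebuilding the credentials object from it at runtime, rather than committing token.json. This is a minimal sketch under that assumption; GDRIVE_TOKEN_JSON is a made-up variable name holding the JSON contents of token.json, populated from a GitHub Secret in the workflow file.

import json
import os

from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/drive"]

# Hypothetical env var, e.g. set in the workflow as
#   env: GDRIVE_TOKEN_JSON: ${{ secrets.GDRIVE_TOKEN_JSON }}
token_info = json.loads(os.environ["GDRIVE_TOKEN_JSON"])
creds = Credentials.from_authorized_user_info(token_info, SCOPES)

drive = build("drive", "v3", credentials=creds)
# ... then call drive.files().list(...) / update(...) as the tool already does.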

How to download numpy array files from an online drive

I have a dataset containing hundreds of numpy arrays that look like this.
I am trying to save them to an online drive so that I can run the code with this dataset remotely from a server. I cannot access the server's drive; I can only run code scripts and access the terminal. I have tried Google Drive and OneDrive and looked up how to generate direct download links from those drives, but it did not work.
In short, I need to be able to get those files from my Python scripts. Could anyone give some hints?
You can get the download URLs very easily from Drive. I assume you have already uploaded the files into a Drive folder; then you can set up a script to download them from Python. First you need a Python environment that can connect to Drive. If you don't currently have one, you can follow this guide, which walks through installing the required libraries, setting up credentials, and running a sample script. Once you can run the sample script, you can make minor modifications to reach your goal.
To download the files you are going to need their ids. I am assuming that you already know them, but if you don't you could retrieve them by doing a Files.list on the folder where you keep the files. To do so you can use '{ FOLDER ID }' in parents as the q parameter.
To download the files you only have to run a Files.get request by providing the file id. You will find the download URL on the webContentLink property. Feel free to leave a comment if you need further clarifications.
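Put together, the flow described above looks roughly like this; it assumes you already have the token.json produced by the quickstart guide, and FOLDER_ID is a placeholder for your own folder:

from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

# token.json comes from the quickstart flow linked above.
creds = Credentials.from_authorized_user_file(
    "token.json", ["https://www.googleapis.com/auth/drive.readonly"]
)
drive = build("drive", "v3", credentials=creds)

FOLDER_ID = "your-folder-id"  # placeholder

# Files.list: find the files inside the folder.
listing = drive.files().list(
    q=f"'{FOLDER_ID}' in parents and trashed = false",
    fields="files(id, name)",
).execute()

# Files.get: fetch the webContentLink (download URL) for each file.
for item in listing.get("files", []):
    meta = drive.files().get(
        fileId=item["id"], fields="webContentLink"
    ).execute()
    print(item["name"], meta.get("webContentLink"))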

How to upload file using service account and python

I currently have some Python code and have already created a service account. I now need a way to upload a file into a Google Drive folder shared with the service account.
Bonus points for first checking whether a certain folder already exists; if not, create the folder and then upload the file there.
Check the Google Drive API documentation, especially the Python Quickstart guide. Also, read this Stack Overflow guide to learn how to write good questions. :D
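Since the question went unanswered beyond the pointer to the docs, here is a hedged sketch of the usual pattern with a service account: find (or create) the folder, then upload into it. The key file, folder name, and file name are all placeholders.

from google.oauth2 import service_account
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

SCOPES = ["https://www.googleapis.com/auth/drive"]
creds = service_account.Credentials.from_service_account_file(
    "service-account.json", scopes=SCOPES  # placeholder key file
)
drive = build("drive", "v3", credentials=creds)

def get_or_create_folder(name, parent_id=None):
    # Look for an existing, non-trashed folder with this name.
    query = (
        f"name = '{name}' and "
        "mimeType = 'application/vnd.google-apps.folder' and trashed = false"
    )
    if parent_id:
        query += f" and '{parent_id}' in parents"
    found = drive.files().list(q=query, fields="files(id)").execute().get("files", [])
    if found:
        return found[0]["id"]
    # Not found: create the folder.
    body = {"name": name, "mimeType": "application/vnd.google-apps.folder"}
    if parent_id:
        body["parents"] = [parent_id]
    return drive.files().create(body=body, fields="id").execute()["id"]

folder_id = get_or_create_folder("reports")  # hypothetical folder name
media = MediaFileUpload("report.csv", resumable=True)  # hypothetical local file
drive.files().create(
    body={"name": "report.csv", "parents": [folder_id]},
    media_body=media,
    fields="id",
).execute()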

Persisting data in Google Colaboratory

Has anyone figured out a way to keep files persisted across sessions in Google's newly open sourced Colaboratory?
Using the sample notebooks, I'm successfully authenticating and transferring csv files from my Google Drive instance and have stashed them in /tmp, my ~, and ~/datalab. Pandas can read them just fine off of disk too. But once the session times out, it looks like the whole filesystem is wiped and a new VM is spun up, without the downloaded files.
I guess this isn't surprising given Google's Colaboratory Faq:
Q: Where is my code executed? What happens to my execution state if I close the browser window?
A: Code is executed in a virtual machine dedicated to your account. Virtual machines are recycled when idle for a while, and have a maximum lifetime enforced by the system.
Given that, maybe this is a feature (i.e., "go use Google Cloud Storage, which works fine in Colaboratory")? When I first used the tool, I was hoping that any .csv files in the My File/Colab Notebooks Google Drive folder would also be loaded onto the VM instance that the notebook was running on :/
Put this before your code, so it will always download your file before your code runs.
!wget -q http://www.yoursite.com/file.csv
Your interpretation is correct. VMs are ephemeral and recycled after periods of inactivity. There's no mechanism for persistent data on the VM itself right now.
In order for data to persist, you'll need to store it somewhere outside of the VM, e.g., Drive, GCS, or any other cloud hosting provider.
Some recipes for loading and saving data from external sources are available in the I/O example notebook.
Not sure whether this is the best solution, but you can sync your data between Colab and Drive with automated authentication like this: https://gist.github.com/rdinse/159f5d77f13d03e0183cb8f7154b170a
Include this for files in your Google Drive:
from google.colab import drive
drive.mount('/content/drive')
After it runs you will see it mounted in your files tab and you can access your files with the path:
'/content/drive/MyDrive/<your folder inside drive>/file.ext'
Clouderizer may provide some data persistence, at the cost of a long setup (because you use Google Colab only as a host) and little space to work with.
But in my opinion, that's better than having your file(s) "recycled" when you forget to save your progress.
As you pointed out, Google Colaboratory's file system is ephemeral. There are workarounds, though there's a network latency penalty and code overhead: e.g. you can use boilerplate code in your notebooks to mount external file systems like GDrive (see their example notebook).
Alternatively, while this is not supported in Colaboratory, other Jupyter hosting services – like Jupyo – provision dedicated VMs with persistent file systems so the data and the notebooks persist across sessions.
If anyone's interested in saving and restoring the whole session, here's a snippet I'm using that you might find useful:
import os
import dill
from google.colab import drive

backup_dir = 'drive/My Drive/colab_sessions'
backup_file = 'notebook_env.db'
backup_path = backup_dir + '/' + backup_file

def init_drive():
    # Mount Drive and create the backup directory if it does not exist yet.
    drive.mount('drive')
    if not os.path.exists(backup_dir):
        os.makedirs(backup_dir)

def restart_kernel():
    os._exit(00)

def save_session():
    init_drive()
    dill.dump_session(backup_path)

def load_session():
    init_drive()
    dill.load_session(backup_path)
Edit: This works fine as long as your session isn't too big; you'll need to check whether it works for you.
I was interested in importing a module in a separate .py file.
What I ended up doing is copying the .py file contents to the first cell in my notebook, adding the following text as the first line:
%%writefile mymodule.py
This creates a separate file named mymodule.py in the working directory so your notebook can use it with an import line.
I know that running all of the code from the module directly would make its variables and functions available in the notebook, but my code required importing it as a module, so that was good enough for me.
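For completeness, a minimal sketch of what that first cell might look like (mymodule.py and the greet function are just illustrative names):

%%writefile mymodule.py
# This cell's contents are written to mymodule.py in the working directory.
def greet(name):
    return f"Hello, {name}!"

Then, in a later cell, the module can be imported and used as usual:

import mymodule
print(mymodule.greet("Colab"))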
