I am working on a Google Colab notebook that requires the user to mount Google Drive using the google.colab.drive Python module. They then input relative paths on the local directory tree (/content/drive/... by default on that mount) to files of interest for analysis. Now, I want to use a Google Sheet they can create as a configuration file. There is lots of info on how to authenticate gspread and fetch a sheet from its HTTPS URL, but I can't find any info on how to use gspread to access a .gsheet file that is already mounted on the local filesystem of the Colab runtime.
There are many tutorials using this flow: https://colab.research.google.com/notebooks/io.ipynb#scrollTo=yjrZQUrt6kKj , but I don't want to make the user authenticate twice (having already done so for the initial mount), and I don't want to make them input some files as relative paths and others as HTTPS URLs.
I had thought this would be much like using gspread to work with Google Sheets on my locally mounted Drive, but I haven't seen that workflow anywhere either. Any pointers in that direction would help me out as well.
Thank you!
Instead of reading the .gsheet from Colab's local filesystem, you can try storing it in the user's Drive and fetching it from there when needed. That way, for as long as that kernel is running, you won't have to re-authenticate the user.
I'm also not finding anything for authenticating into Colab from another device, so you may want to modify your flow a bit.
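For reference, here is a minimal sketch of the kind of single-auth flow the question describes, reading the mounted .gsheet stub directly. It assumes the .gsheet file on the Drive mount is a small JSON stub containing the spreadsheet's ID; the doc_id key and the config.gsheet path below are assumptions, so verify them against an actual mounted file:

import json
import gspread
from google.colab import auth, drive
from google.auth import default

drive.mount('/content/drive')
auth.authenticate_user()  # may reuse the session's existing credentials rather than prompting again
creds, _ = default()
gc = gspread.authorize(creds)

# The user supplies a relative path on the mount, just like any other file.
gsheet_path = '/content/drive/MyDrive/config.gsheet'  # hypothetical path
with open(gsheet_path) as f:
    stub = json.load(f)

sheet = gc.open_by_key(stub['doc_id'])  # or gc.open_by_url(stub['url'])
rows = sheet.sheet1.get_all_values()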
I have a JSON file with over 16k URLs of images, which I parse using a Python script, using urllib.request.urlretrieve in it to retrieve the images. I uploaded the JSON file to Google Drive and ran the Python script in Google Colab.
Though the files were downloaded (I checked this using a print line in the try block of urlretrieve) and it took substantial time to download them, I am unable to see where it has stored these files. When I had run the same script on my local machine, it stored the files in the current folder.
As an answer to this question suggests, the files may be downloaded to some temporary location, say, on some cloud. Is there a way to dump these temporary files to google drive?
(Note: I had mounted the drive in the Colab notebook, but the files still don't appear to be stored in Google Drive.)
Colab stores files in a temporary location which is new every time you run the notebook. If you want your data to persist across sessions, you need to store it in GDrive. For that you need to map some GDrive folder in your notebook and use it as the path. You also need to grant Colab permission to access your GDrive.
After mounting GDrive, you need to move files from Colab to GDrive with a command like:
!mv /content/filename /content/gdrive/My\ Drive/
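Alternatively, once the drive is mounted you can point urlretrieve at the mounted folder directly, so nothing needs to be moved afterwards. A minimal sketch, with a hypothetical target folder and a stand-in URL list in place of the 16k parsed from the JSON:

import os
import urllib.request
from google.colab import drive

drive.mount('/content/drive')

out_dir = '/content/drive/My Drive/images'  # hypothetical target folder
os.makedirs(out_dir, exist_ok=True)

urls = ['https://example.com/cat.jpg']  # stand-in for the parsed URL list
for i, url in enumerate(urls):
    try:
        urllib.request.urlretrieve(url, os.path.join(out_dir, f'{i:05d}.jpg'))
    except Exception as e:
        print(f'failed {url}: {e}')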
I want to know how we can upload the .txt & .vcf files.
I've already mounted the drive and then done some sorting and downloading of data with wget in Colab. But I was not able to find a resource on how to export or commit changes to the drive.
Please help me!!
Once you are inside a particular notebook, you can use the file browser on the left to upload files to be used in the current notebook. Remember that they will be deleted once your current session ends, so you will have to upload them again when you open the notebook later. If you have uploaded them elsewhere, you can simply use !wget to download them to your notebook's temporary storage.
Edit: To copy data, simply use !cp to copy the file(s) from your notebook storage to the drive once you have mounted it. For example, here is how I would copy data.xyz:
from google.colab import drive
drive.mount('/content/gdrive')
!cp data.xyz "gdrive/My Drive/data.xyz"
You may just as easily use !mv to move data to the drive instead of copying it. In the same way, you can copy/move data from the drive to the Colab notebook too.
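For instance, a copy in the other direction would look like this (same hypothetical filename as above):

!cp "gdrive/My Drive/data.xyz" data.xyz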
I am running Google Colab with a local runtime. I use this command to start Jupyter:
jupyter notebook --NotebookApp.allow_origin='https://colab.research.google.com' --port=8888 --NotebookApp.port_retries=0
When I previously connected I was able to mount Google Drive with this code in the notebook:
from google.colab import drive
drive.mount('/content/drive')
I was able to connect to Google Drive and access files while on local runtime.
However, now I am getting this error:
ModuleNotFoundError: No module named 'google.colab'
I see other people have this problem, and some suggest using PyDrive. But I am certain I was connected to Google Drive without using PyDrive.
I suspect the first command I ran to start Jupyter was different when Google Drive was able to connect.
Is there a specific flag I have to add to that first command?
The google.colab libraries only exist on managed backends.
When running on your local machine, you'll want to instead use Drive's existing sync apps available here: https://www.google.com/drive/download/
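If you want one notebook to work on both hosted and local runtimes, one option (a sketch, not an official API; the local path is hypothetical) is to branch on whether google.colab is importable and fall back to the locally synced Drive folder:

import importlib.util

if importlib.util.find_spec('google.colab') is not None:
    # Hosted runtime: mount Drive as usual.
    from google.colab import drive
    drive.mount('/content/drive')
    data_dir = '/content/drive/MyDrive'
else:
    # Local runtime: point at the folder the Drive sync app maintains.
    data_dir = '/home/me/GoogleDrive'  # hypothetical local sync path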
Has anyone figured out a way to keep files persistent across sessions in Google's newly open-sourced Colaboratory?
Using the sample notebooks, I'm successfully authenticating and transferring CSV files from my Google Drive instance and have stashed them in /tmp, my ~, and ~/datalab. Pandas can read them just fine off of disk too. But once the session times out, it looks like the whole filesystem is wiped and a new VM is spun up, without the downloaded files.
I guess this isn't surprising given Google's Colaboratory Faq:
Q: Where is my code executed? What happens to my execution state if I close the browser window?
A: Code is executed in a virtual machine dedicated to your account. Virtual machines are recycled when idle for a while, and have a maximum lifetime enforced by the system.
Given that, maybe this is a feature (i.e., "go use Google Cloud Storage, which works fine in Colaboratory")? When I first used the tool, I was hoping that any .csv files in the My Drive/Colab Notebooks Google Drive folder would also be loaded onto the VM instance that the notebook was running on :/
Put this before your code, so it will always download your file before running your code:
!wget -q http://www.yoursite.com/file.csv
Your interpretation is correct. VMs are ephemeral and recycled after periods of inactivity. There's no mechanism for persistent data on the VM itself right now.
In order for data to persist, you'll need to store it somewhere outside of the VM, e.g., Drive, GCS, or any other cloud hosting provider.
Some recipes for loading and saving data from external sources are available in the I/O example notebook.
Not sure whether this is the best solution, but you can sync your data between Colab and Drive with automated authentication like this: https://gist.github.com/rdinse/159f5d77f13d03e0183cb8f7154b170a
Include this for files in your Google Drive:
from google.colab import drive
drive.mount('/content/drive')
After it runs you will see it mounted in your files tab and you can access your files with the path:
'/content/drive/MyDrive/<your folder inside drive>/file.ext'
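For example, reading a CSV off the mount with pandas (the path below is hypothetical):

import pandas as pd

df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/data.csv')  # hypothetical path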
Clouderizer may provide some data persistence, at the cost of a long setup (because you use Google Colab only as a host) and little space to work in.
But, in my opinion, that's better than having your file(s) "recycled" when you forget to save your progress.
As you pointed out, Google Colaboratory's file system is ephemeral. There are workarounds, though there's a network latency penalty and code overhead: e.g. you can use boilerplate code in your notebooks to mount external file systems like GDrive (see their example notebook).
Alternatively, while this is not supported in Colaboratory, other Jupyter hosting services – like Jupyo – provision dedicated VMs with persistent file systems so the data and the notebooks persist across sessions.
If anyone's interested in saving and restoring the whole session, here's a snippet I'm using that you might find useful:
import os
import dill
from google.colab import drive

backup_dir = 'drive/My Drive/colab_sessions'
backup_file = 'notebook_env.db'
backup_path = os.path.join(backup_dir, backup_file)

def init_drive():
    # Mount Drive, then create the backup directory if it doesn't exist.
    drive.mount('drive')
    os.makedirs(backup_dir, exist_ok=True)

def restart_kernel():
    os._exit(00)

def save_session():
    init_drive()
    dill.dump_session(backup_path)

def load_session():
    init_drive()
    dill.load_session(backup_path)
Edit: This works fine as long as your session size is not too big. You'll need to check whether it works for you.
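Typical usage, with the functions defined in the snippet above:

save_session()  # call before the VM is recycled, e.g. at the end of a work session
# ... later, in a fresh session, after re-running the snippet above ...
load_session()  # variables and functions are restored from the Drive backup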
I was interested in importing a module in a separate .py file.
What I ended up doing is copying the .py file contents to the first cell in my notebook, adding the following text as the first line:
%%writefile mymodule.py
This creates a separate file named mymodule.py in the working directory so your notebook can use it with an import line.
I know that running all of the code in the module would make its variables and functions available in the notebook, but my code required importing a module, so this was good enough for me.
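For example, a first cell like this (the module and function names are made up) writes the file:

%%writefile mymodule.py
def greet(name):
    return f'Hello, {name}!'

and a later cell can then import it as usual:

import mymodule
print(mymodule.greet('Colab'))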