Colab Accessing Files - python

It is great that I can run Jupyter notebooks in Colab, but I am going crazy saving and loading files. For example, I am writing an assignment for my course and I include figures in it using the HTML img tag. (I want to use HTML instead of Markdown images so I can set the width.) So in a text cell I have
<img src="CoLab04.png" width="250">
This works fine when I run the Jupyter notebook on my laptop, but in Colab it can't find the image, even when the image is in the same Colab folder as the .ipynb file. Err.
I have similar problems saving data files. On my laptop I can use the normal Python functions open, write, close, etc. That code runs without complaint, but the files do not show up on Google Drive: not in the Colab folder, nor in any other folder when I search all of my Google Drive. Err. I read TFM and use
from google.colab import drive, files
drive.mount('/content/gdrive')
fig.savefig("LED12.png") # saves a figure as a file
files.download("LED12.png")
This downloads a file to my laptop. Then I have to upload the file to a Google Drive folder so my students can see it.
Am I missing something? Why is it so hard to create and read Google Drive files from a Google Colab Jupyter notebook?
I've read https://colab.research.google.com/notebooks/io.ipynb, but why is it so hard? I need something easy for novice students to use. If reading and writing files is this hard, I will have to recommend that my students install Jupyter on their laptops and not use Colab.

This looks like a working-directory problem rather than a sys.path one: sys.path only controls where Python looks for modules to import, while relative file paths like "LED12.png" resolve against the current working directory (/content by default).
After you mount your Drive with
from google.colab import drive
drive.mount('/content/drive')
your main Google Drive can be listed with
!ls "/content/drive/My Drive/"
If you have a sub-folder under My Drive where you wish to centralize your Colab project, say a projectA folder in your main Google Drive directory, change the working directory to it:
import os
os.chdir("/content/drive/My Drive/projectA")
Then you can save your fig the same way you do on your local machine, and the file will be saved to the projectA folder where you run your Colab code:
fig.savefig("LED12.png")
You should be able to see the file appear there. If this doesn't work, use an absolute path for open, savefig, close, and other path-sensitive operations:
working_path = '/content/drive/My Drive/projectA'
fig.savefig(os.path.join(working_path, "LED12.png"))
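As an end-to-end sanity check (a minimal sketch, reusing the hypothetical projectA layout above), ordinary Python file I/O works once the working directory points into Drive:
from google.colab import drive
import os
drive.mount('/content/drive')                  # authorize access when prompted
os.chdir('/content/drive/My Drive/projectA')   # relative paths now land in Drive
with open('test.txt', 'w') as f:               # plain open/write/close, as on a laptop
    f.write('hello from Colab')
!ls  # test.txt should be listed here and appear in Drive after a refresh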

It may be simpler to load notebooks from GitHub, since image links within the same repository resolve more intuitively.
For example, the notebook below loads a set of images bundled in its GitHub repository.
https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.01-What-Is-Machine-Learning.ipynb
The Markdown reference for the first figure:
![](figures/05.01-classification-1.png)
[figure source in Appendix](06.00-Figure-Code.ipynb#Classification-Example-Figure-1)
This corresponds to the GitHub repo here:
https://github.com/jakevdp/PythonDataScienceHandbook/
Building on this example, a common pattern for bundling data files is to add a !git clone ... command at the top of the notebook to bring in the entire repo in one shot.
The reason this is simpler to accomplish with GitHub than with Drive is that GitHub has unified ACLs at the repository level, whereas Drive manages ACLs at the file level. So it would be a bit cumbersome to have a publicly shared Drive notebook that referenced images or other Drive files that were not shared.
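For example, the clone-at-the-top pattern mentioned above might look like this (a minimal sketch; the repository URL is a hypothetical placeholder):
!git clone https://github.com/username/course-materials.git
%cd course-materials
# relative references like figures/plot.png now resolve inside the repo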

I did this in Colab (reading data, training my model, and uploading the trained model) a few days ago. Let's make it simple.
Please follow the steps below. I will cover both reading a CSV and uploading a file.
Step 1: Go to your Google Drive and create a folder named Colab; keep your files inside it.
Step 2: Now install PyDrive in the Colab notebook:
!pip install pydrive
Step 3: Run the following imports for accessing Google Drive files:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
Step 4: Mount the drive. (You will get a link in the Colab cell output; click it, authorize access to your Google Drive, and copy-paste the generated code back into the cell.)
from google.colab import drive
drive.mount('/content/drive/')
Step 5: Authenticate and create the PyDrive client, using the same link-and-code flow as in step 4:
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
Step 6: To fetch a file, replace the id below with the id of the file you want to access. For me these were CSV files. To get the id, open the file's Share dialog and generate a link; you will get something like https://drive.google.com/file/d/xxxxxxxxxxxxxx/view?usp=sharing, where the xxxxxxxxxxxxxx part is the id. Repeat for as many files as you want to read.
normal_1 = drive.CreateFile({'id':'13AR0sS1pndF0fTxmdjQRv_1Bv5aBNpkT'})
normal_1.GetContentFile('normal_1.csv')
normal_2 = drive.CreateFile({'id':'1Z0DO8M1Qco07kyVoxYSgxXBx6XYGBzJd'})
normal_2.GetContentFile('normal_2.csv')
abnormal = drive.CreateFile({'id':'12zFHDXVjreorRrHHhYrA1n82VQLuawsl'})
abnormal.GetContentFile('abnormal.csv')
Step 7: Now you can read those files into DataFrames for further use, keeping only the first 100 (or 50) rows of each:
import pandas as pd
normal_1 = pd.read_csv('normal_1.csv', nrows=100)
normal_2 = pd.read_csv('normal_2.csv', nrows=100)
abnormal = pd.read_csv('abnormal.csv', nrows=50)
Step 8: After training your model, save it to disk using joblib:
import joblib  # in older scikit-learn this lived at sklearn.externals.joblib
filename = 'model.sav'
joblib.dump(clf, filename)
# Upload the model to your Google Drive
model_file = drive.CreateFile({'title' : 'model.sav'})
model_file.SetContentFile('model.sav')
model_file.Upload()
Now go to your My Drive and refresh it; you should see model.sav there. For the complete code in a Jupyter notebook, you can visit my GitHub link. I hope this helps you solve your problem.
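To load the trained model back in a later session, the reverse of steps 6 and 8 might look like this (a minimal sketch; the id is a hypothetical placeholder for the one Drive assigns to model.sav):
import joblib
model_file = drive.CreateFile({'id': 'YOUR_MODEL_FILE_ID'})  # placeholder id
model_file.GetContentFile('model.sav')
clf = joblib.load('model.sav')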

Related

Google Colab: Import data from Google Drive and make it possible to share it

I'm currently working on a project I want to share with others without giving them full access to my Drive, which Google asks for any time I use:
from google.colab import drive
drive.mount('/content/drive')
I want to be able to share a folder containing a Jupyter notebook and its data in subfolders (within Google Drive itself or just by sending the files, whichever is easier). The recipients should be able to use it without correcting the path to their drive and without having to grant access to my Drive or theirs.
I know this is possible in normal Jupyter notebooks (just share the folder with the notebook and data; the notebook finds the data by setting the working directory to the folder it is in and then using relative paths), but the same code won't work in Google Colab.
Is there maybe a way to link data (in this case images) to a Google Colab notebook?
Thanks for any help :)
After the following commands, you have to provide the path of the directory you already uploaded to your Google Drive:
from google.colab import drive
drive.mount('/content/drive')
directory = '/content/drive/MyDrive/Colab Notebooks/YOUR_FOLDER_NAME'
The "directory" of your folder should be directed like mentioned above order...

How to mount a GitHub repository as the current directory in Google Colab?

I am currently loading the images from my Google Drive.
But the issue is that those images are in my drive, and when I share my Colab notebook with others, they can't run it, since accessing the Drive images requires my authentication code.
So I thought that if I uploaded the data folder to a GitHub repository and made that repo public, anyone could fetch the data (in my case, the images folder), and no authentication would be required to run the Colab code.
I have no idea how to mount a GitHub repo as a directory the way Google Drive is mounted.
from google.colab import drive
drive.mount('/content/drive')  # this mounts my Google Drive into the notebook's filesystem
Is it possible to do a similar mounting to a github repo?
You could clone the repository directly by running git in a code cell:
!git clone https://github.com/yourusername/yourpublicrepo.git
This will create a folder called yourpublicrepo.
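After cloning, relative paths work the same for everyone who runs the notebook (a minimal sketch; the repo name and image path are placeholders):
%cd yourpublicrepo
from PIL import Image
img = Image.open('images/example.png')  # hypothetical path inside the repo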

Loading files in Colab from a specific folder

Is there a way to reveal only a specific folder in Google Colab?
If I'm coding:
from google.colab import drive
drive.mount('/content/drive')
then my entire Google Drive is revealed in the "Files" sidebar, but when I try
from google.colab import drive
drive.mount('/content/drive/My Drive/Shared')
I'm getting an error:
ValueError: Mountpoint must not contain a space.
P.S. I've also tried uploading the files directly, but they get deleted after a while without activity.
You can only mount the whole of your Google Drive.
After that, you can create a symlink to a specific folder to make access to it easier, as sketched below.
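A minimal sketch, assuming the folder you want lives at My Drive/Shared:
from google.colab import drive
drive.mount('/content/drive')
# expose only the Shared folder under a convenient, space-free path
!ln -s "/content/drive/My Drive/Shared" /content/shared
!ls /content/shared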

How to upload files to Drive from Colab?

I want to know how we can upload .txt and .vcf files.
I've already mounted the drive, then done some sorting and downloading of data with wget in Colab, but I was not able to find a resource on exporting or committing changes to Drive.
Please help me!
Once you are inside a particular notebook, you can use the file browser on the left to upload files to be used in the current notebook. Remember that they will be deleted once your current session ends, so you will have to upload them again when you open the notebook later. If you have uploaded them elsewhere, you can simply use !wget to download them into your notebook's temporary storage.
Edit: To copy data, simply use !cp to copy the file(s) from your notebook storage to the drive once you have mounted it. For example, here is how I would copy data.xyz:
from google.colab import drive
drive.mount('/content/gdrive')
!cp data.xyz "gdrive/My Drive/data.xyz"
You may just as simply use !mv to move data to the drive instead of copying it. In the same way, you can copy or move data from the drive to the Colab notebook, as in the example below.
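Copying in the other direction is just the command above reversed (data.xyz is the same example file):
!cp "gdrive/My Drive/data.xyz" .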

How to Upload Many Files to Google Colab?

I am working on a image segmentation machine learning project and I would like to test it out on Google Colab.
For the training dataset, I have 700 images, mostly 256x256, that I need to load into a Python numpy array for my project. I also have thousands of corresponding mask files to upload. They currently exist in a variety of subfolders on Google Drive, but I have been unable to get them into Google Colab for use in my project.
So far I have attempted using Google Drive FUSE, which seems to have very slow upload speeds, and PyDrive, which has given me a variety of authentication errors. I have been using the Google Colab I/O example code for the most part.
How should I go about this? Would PyDrive be the way to go? Is there code somewhere for uploading a folder structure or many files at a time?
You can put all your data into your Google Drive and then mount the drive. This is how I have done it. Let me explain in steps.
Step 1:
Transfer your data into your google drive.
Step 2:
Run the following code to mount your Google Drive.
# Install a Drive FUSE wrapper.
# https://github.com/astrada/google-drive-ocamlfuse
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse
# Generate auth tokens for Colab
from google.colab import auth
auth.authenticate_user()
# Generate creds for the Drive FUSE library.
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}
# Create a directory and mount Google Drive using that directory.
!mkdir -p drive
!google-drive-ocamlfuse drive
!ls drive/
# Create a file in Drive.
!echo "This newly created file will appear in your Drive file list." > drive/created.txt
Step 3:
Run the following line to check whether you can see your desired data in the mounted drive.
!ls drive
Step 4:
Now load your data as follows. I had Excel files containing my train, CV, and test data.
import pandas as pd
train_data = pd.read_excel(r'drive/train.xlsx')
test = pd.read_excel(r'drive/test.xlsx')
cv = pd.read_excel(r'drive/cv.xlsx')
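Since the question asks for numpy arrays, note that a DataFrame converts directly (a one-line sketch using the frame loaded above):
train_array = train_data.to_numpy()  # 2-D array of the sheet's values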
Edit
To push data from the Colab notebook environment into your drive, you can run the following code.
# Install the PyDrive wrapper & import libraries.
# This only needs to be done once in a notebook.
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
# This only needs to be done once in a notebook.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# Create & upload a file.
uploaded = drive.CreateFile({'title': 'data.xlsx'})
uploaded.SetContentFile('data.xlsx')
uploaded.Upload()
print('Uploaded file with ID {}'.format(uploaded.get('id')))
Here are a few steps to upload a large dataset to Google Colab:
1. Upload your dataset to free cloud storage like Dropbox, Openload, etc. (I used Dropbox.)
2. Create a shareable link for your uploaded file and copy it.
3. Open your notebook in Google Colab and run this command in one of the cells:
!wget your_shareable_file_link
That's it!
You can compress your dataset into a zip or rar file and later unzip it after downloading it in Google Colab with this command:
!unzip downloaded_filename -d destination_folder
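With Dropbox specifically, a shared link usually ends in ?dl=0; changing it to ?dl=1 forces a direct download that wget can fetch (a sketch; the link is a hypothetical placeholder):
!wget -O dataset.zip "https://www.dropbox.com/s/XXXX/dataset.zip?dl=1"
!unzip -q dataset.zip -d destination_folder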
Zip your file first, then upload it to Google Drive.
See this simple command to unzip:
!unzip {file_location}
Example:
!unzip drive/models.zip
Step 1: Mount the drive by running the following command:
from google.colab import drive
drive.mount('/content/drive')
This will output a link. Click on the link, hit Allow, copy the authorization code, and paste it into the box in the Colab cell with the text "Enter your authorization code:" above it.
This process just gives Colab permission to access your Google Drive.
Step 2: Upload your folder (zipped or unzipped, depending on its size) to Google Drive.
Step 3: Now work your way through the Drive directories and files to locate your uploaded folder or zipped file.
This process may look something like this:
The current working directory in colab when you start off will be /content/
Just to make sure, run the following command in the cell:
!pwd
It will show you the current directory you are in. (pwd stands for "print working directory")
Then use commands like:
!ls
to list the directories and files in the directory you are in,
and the command:
%cd /directory/name/of/your/choice
to move into the directories and locate your uploaded folder or the uploaded .zip file. (Note the %: !cd runs in a throwaway subshell and does not change the notebook's working directory, whereas the %cd magic does.)
And just like that, you are ready to get your hands dirty with your machine learning model! :)
Hopefully these simple steps will keep you from spending too much unnecessary time figuring out how Colab works, when you should really be spending most of your time on your machine learning model, its hyperparameters, pre-processing...
Google Colab has made it more convenient for users to upload files from the local machine, Google Drive, or GitHub. Click the Mount Drive option in the pane on the left side of the notebook and you'll get access to all the files stored in your drive.
Select the file -> right-click -> Copy path.
Use Python import methods to read files from this path, for example:
import pandas as pd
data = pd.read_csv('your copied path here')
For importing multiple files in one go, you may need to write a function, as sketched below.
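A minimal sketch of such a helper (the Drive folder path is a hypothetical placeholder):
import glob
import pandas as pd

def load_all_csvs(folder):
    # read every CSV in the folder into a single DataFrame
    paths = sorted(glob.glob(folder + '/*.csv'))
    return pd.concat((pd.read_csv(p) for p in paths), ignore_index=True)

data = load_all_csvs('/content/drive/MyDrive/my_data')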
There are many ways to do so:
You might want to push your data to a GitHub repository; then in a Google Colab code cell you can run:
!git clone https://www.github.com/{repo}.git
You can upload your data to Google Drive; then in your code cell:
from google.colab import drive
drive.mount('/content/drive')
Use the transfer.sh tool: you can visit transfer.sh to see how it works.
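With transfer.sh, the typical flow is to upload from your machine with curl, then fetch the returned link in Colab (a sketch; the URL is a hypothetical placeholder for the link transfer.sh prints after upload):
# on your laptop: curl --upload-file ./data.zip https://transfer.sh/data.zip
!wget https://transfer.sh/abc123/data.zip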
