How to upload files to Drive from Colab? - python

I want to know how to upload .txt and .vcf files.
I've already mounted the drive, then sorted and downloaded some data with wget in Colab. But I could not find any resource on how to export or commit the changes to the drive.
Please help me!!

Once you are inside a particular notebook, you can use the file browser on the left to upload files for use in the current notebook. Remember that they will be deleted once your current session ends, so you will have to upload them again when you open the notebook later. If you have uploaded them elsewhere, you can simply use !wget to download them to your notebook's temporary storage.
Edit: To copy data, simply use !cp to copy the file(s) from your notebook storage to the drive once you have mounted it. For example, here is how I would copy data.xyz:
from google.colab import drive
drive.mount('/content/gdrive')
!cp data.xyz "gdrive/My Drive/data.xyz"
You can just as easily use !mv to move data to the drive instead of copying it. In the same way, you can copy or move data from the drive to the Colaboratory notebook.
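The same copy can also be done from Python instead of a shell escape. This is a minimal sketch using only the standard library; copy_to_drive is a hypothetical helper name, not part of Colab, and it assumes the drive has already been mounted.

```python
import shutil
from pathlib import Path


def copy_to_drive(src, drive_dir):
    """Copy one file into drive_dir and return the path of the copy.

    copy_to_drive is a hypothetical helper, not a Colab API.
    """
    src = Path(src)
    drive_dir = Path(drive_dir)
    # Create the target folder if it does not exist yet.
    drive_dir.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves timestamps where the file system allows it.
    return Path(shutil.copy2(src, drive_dir / src.name))
```

With the mount point used above, copy_to_drive("data.xyz", "/content/gdrive/My Drive") should do the same job as the !cp line.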

Related

Google Colab: Import data from google drive and make it possible to share it

I'm currently working on a project that I want to share with others without giving them full access to my drive, which Google asks for any time I use:
from google.colab import drive
drive.mount('/content/drive')
I want to be able to share a folder containing a Jupyter notebook and its data in subfolders (either in Google Drive itself or just by sending the files, whichever is easier). I want others to be able to use it without correcting the path to their drive, and without having to give access to my drive or theirs.
I know this is possible in normal jupyter notebooks (just sharing the folder with the notebook and data and it just takes the data by simply setting the working directory to the folder it is in and then using relative paths), but the same code won't work in google colab.
Is there maybe a way to link data (in this case images) to a Google Colab notebook?
Thanks for any help :)
After mounting the drive, you have to provide the directory of the files you have already uploaded to your Google Drive.
from google.colab import drive
drive.mount('/content/drive')
directory = '/content/drive/MyDrive/Colab Notebooks/YOUR_FOLDER_NAME'
The path to your folder should follow the pattern shown above.
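One way to avoid editing paths by hand is to let the notebook pick the first data folder that actually exists. This is only a sketch: find_data_dir is a hypothetical helper, and the candidate paths in the comment are placeholders that would need to match your own layout.

```python
from pathlib import Path


def find_data_dir(candidates):
    """Return the first candidate directory that exists, else raise.

    find_data_dir is a hypothetical helper, not a Colab API.
    """
    candidates = [Path(c) for c in candidates]
    for path in candidates:
        if path.is_dir():
            return path
    raise FileNotFoundError(f"none of these folders exist: {candidates}")


# Hypothetical usage: try the mounted Drive folder first,
# then a "data" folder next to the notebook.
# data_dir = find_data_dir([
#     "/content/drive/MyDrive/Colab Notebooks/YOUR_FOLDER_NAME",
#     "data",
# ])
```

The rest of the notebook can then build every file path from data_dir, so only the candidate list differs between machines.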

how to restore instance once the session is terminated in google colab

I'm using Google Colab for a TensorFlow project, but whenever I terminate the session, all the files and work I've done get wiped out. All that is left is the ipynb file I was using, and I have to redo everything from the beginning.
I lose all the files I'm using and have to re-upload them the next time I open my ipynb file. How can I solve this problem? Should I push the entire file structure to a git repo and clone it the next time I use it, or is there another way to do it?
Hi amanpreet! Yes, you can put all your files on GitHub. You can also put them in your Google Drive and access them by mounting the drive in Colab. Here is an article for your reference:
https://buomsoo-kim.github.io/colab/2020/05/09/Colab-mounting-google-drive.md/
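If the working files live in one folder, a simple way to keep them across sessions is to zip that folder into the mounted drive before the session ends and unpack it again next time. The sketch below uses only the standard library; backup_folder and restore_folder are hypothetical names, and it assumes the backup folder is not inside the folder being archived.

```python
import shutil
from pathlib import Path


def backup_folder(src_dir, backup_dir, name="workspace"):
    """Zip src_dir into backup_dir/<name>.zip and return the archive path."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    # make_archive appends the ".zip" suffix itself.
    archive = shutil.make_archive(str(backup_dir / name), "zip", root_dir=str(src_dir))
    return Path(archive)


def restore_folder(archive, dest_dir):
    """Unpack an archive created by backup_folder into dest_dir."""
    shutil.unpack_archive(str(archive), str(dest_dir))
    return Path(dest_dir)
```

In Colab, backup_dir would typically be a folder under the mounted drive (for example somewhere below /content/drive/MyDrive), and dest_dir a folder under /content.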

Copy files from a mounted Google Drive to a local Google Colab session

Because of a very large image dataset, datagen.flow_from_directory is extremely slow. Is there a way to copy the "data" folder on my Google drive to the local session in Google Colab? Kind of like uploading a file from my PC to the session, but from Drive.
Found it. After mounting the drive, it's as simple as
!cp -r /content/drive/MyDrive/data /content/data
The first path is the "data" folder on my Google Drive, and the second one is the destination in my current runtime
I hope this notebook can help you (link to notebook).
FOLDER_PATH - path of folder you want to copy
DESTINATION_PATH - destination of the folder where you want to save the copy
!cp -r '/gdrive/My Drive/FOLDER_PATH' '/gdrive/My Drive/DESTINATION_PATH'
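Either copy above can also be written in Python, which makes it easier to reuse inside a function or a loop. A minimal sketch with the standard library; copy_folder is a hypothetical helper name, and dirs_exist_ok needs Python 3.8 or newer.

```python
import shutil
from pathlib import Path


def copy_folder(src_dir, dest_dir):
    """Recursively copy src_dir to dest_dir, merging into dest_dir if it exists."""
    # dirs_exist_ok=True avoids an error when dest_dir is already there.
    return Path(shutil.copytree(str(src_dir), str(dest_dir), dirs_exist_ok=True))
```

For the first answer this would be called as copy_folder("/content/drive/MyDrive/data", "/content/data") after mounting the drive.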

dump files downloaded by google Colab in temporary location to google drive

I have a json file with over 16k urls of images, which I parse using a python script and use urllib.request.urlretrieve in it to retrieve images. I uploaded the json file to google drive and run the python script in google Colab.
Though the files were downloaded (I checked this using a print line in the try block of urlretrieve) and it took substantial time to download them, I am unable to see where it has stored these files. When I had run the same script on my local machine, it stored the files in the current folder.
As an answer to this question suggests, the files may be downloaded to some temporary location, say, on some cloud. Is there a way to dump these temporary files to google drive?
(*Note I had mounted the drive in the colab notebook, still the files don't appear to be stored in google drive)
Colab stores files in a temporary location that is new every time you run the notebook. If you want your data to persist across sessions, you need to store it in Google Drive. To do that, mount a Drive folder in your notebook and use it as the path. You also need to give Colab permission to access your Drive.
After mounting the drive, move the files from Colab to Drive with this command:
!mv /content/filename /content/gdrive/My\ Drive/
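Instead of moving the files afterwards, the download script can write straight into a chosen folder by passing an explicit destination to urlretrieve. This is a sketch under assumptions: download_all is a hypothetical helper, file names are taken from the last part of each URL path, and two URLs with the same file name would overwrite each other.

```python
import os
import urllib.parse
import urllib.request
from pathlib import Path


def download_all(urls, dest_dir):
    """Download each URL into dest_dir; return (saved paths, failed downloads)."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    saved, failed = [], []
    for url in urls:
        # Derive a file name from the URL path; fall back to a generic name.
        url_path = urllib.parse.unquote(urllib.parse.urlparse(url).path)
        name = os.path.basename(url_path) or "download"
        target = dest_dir / name
        try:
            urllib.request.urlretrieve(url, str(target))
            saved.append(target)
        except (OSError, ValueError) as err:
            # URLError and HTTPError are subclasses of OSError.
            failed.append((url, err))
    return saved, failed
```

Passing a folder under the mounted drive as dest_dir should make the images show up in Drive directly, and the returned failed list shows which URLs did not download.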

CoLab Accessing Files

It is great that I can run jupyter notebooks in CoLab, but I am going crazy saving and loading files. For example, I am writing an assignment for my course and I include figures in it using the HTML tag. (I want to use HTML instead of markdown images so I can set the width.) So in a Text cell I have
<img src="CoLab04.png" width="250">
This works fine when I run the jupyter notebook on my laptop, but in CoLab, it can't find the image even when the image is in the same CoLab folder as the ipynb file. Err.
I have similar problems saving data files. On my laptop I can use the normal python functions open, write, close, etc. That code runs without complaint, but the files do not show up on Google Drive. Not in the CoLab folder or any other folder when I search all of my Google Drive. Err. I read TFM and use
from google.colab import drive, files
drive.mount('/content/gdrive')
fig.savefig("LED12.png") # saves a figure as a file
files.download("LED12.png")
This downloads a file to my laptop. Then I have to upload the file to a Google Drive folder so my students can see it.
Am I missing something? Why is it so hard to create and read Google Drive files using a Google-CoLab jupyter notebook?
I've read https://colab.research.google.com/notebooks/io.ipynb, but why is it so hard? I need something easy for novice students to use. If reading and writing files is this hard, I will have to recommend my students install jupyter on their laptops and not use CoLab.
This looks like a path problem to me.
After you mount My Drive by the following code
from google.colab import drive
drive.mount('/content/drive/')
then your main Google Drive can be read with
!ls "/content/drive/My Drive/"
If you have a sub-folder under My Drive where you want to centralize your Colab project, say a projectA folder directly under your main Google Drive directory, make that folder the current working directory. (Appending the folder to sys.path only affects where Python looks for modules to import, not where relative file paths are resolved.)
import os
os.chdir("/content/drive/My Drive/projectA")
Then you should be able to save your figure the same way you did on your local machine. The file will be saved to your projectA folder.
fig.savefig("LED12.png")
You should see the file appear there. If this doesn't work, try using an absolute path for path-sensitive operations such as open and save:
import os
working_path = '/content/drive/My Drive/projectA'
fig.savefig(os.path.join(working_path, "LED12.png"))
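Relative file names are resolved against the current working directory, so another option is to switch into the project folder only for the block of code that writes files. The sketch below is a small context manager built on the standard library; working_directory is a hypothetical name, not something Colab provides.

```python
import os
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily make path the current working directory."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Always restore the old directory, even if the body raises.
        os.chdir(previous)
```

Inside a block such as with working_directory('/content/drive/My Drive/projectA'):, a plain fig.savefig("LED12.png") or open("notes.txt", "w") would write into projectA, and the previous directory is restored afterwards.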
It may be simpler to load notebooks from GitHub, wherein image links in the same repository will be loaded more intuitively.
For example, the notebook below loads a set of images bundled in its GitHub repository.
https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.01-What-Is-Machine-Learning.ipynb
The markdown reference for the first graph:
![](figures/05.01-classification-1.png)
[figure source in Appendix](06.00-Figure-Code.ipynb#Classification-Example-Figure-1)
This corresponds to the GitHub repo here:
https://github.com/jakevdp/PythonDataScienceHandbook/
Building on this example, a common pattern for bundling data files is to add a !git clone ... command at the top of the notebook to bring in the entire repo in one shot.
The reason this is simpler to accomplish in GitHub than Drive is that GitHub has unified ACLs at the repository level, whereas Drive manages ACLs at the file level. So, it would be a bit cumbersome to have a Drive notebook shared publicly that referenced images or other Drive files that were not shared.
I have done this in Colab (reading, training my model and uploading my trained model) some days ago. Let's make it simple.
Please follow these steps. I will try to cover both reading a CSV file and uploading a file.
Step 1 : Go to your google drive and create a folder: Colab and keep your files inside Colab folder.
Step 2 : Now, install pydrive in Colab jupyter notebook
!pip install pydrive
Step 3 : Run following commands for accessing Google drive File
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
Step 4 : Mount the drive. (You will get a link in the Colab output. Click the generated link, verify your Google account, then copy and paste the generated code.)
from google.colab import drive
drive.mount('/content/drive/')
Step 5 : Authenticate and create the PyDrive client. Do the same as in step 4 (click the generated link, verify your Google account, then copy and paste the generated code).
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
Step 6 : To get a file, replace the id below with the id of the file you want to access. For me, these were CSV files. To get the id, go to Share and generate a link; it will look like https://drive.google.com/file/d/xxxxxxxxxxxxxx/view?usp=sharing. Put the xxxxxxxxxxxxxx part in the code below, and repeat for as many files as you want to read.
normal_1 = drive.CreateFile({'id':'13AR0sS1pndF0fTxmdjQRv_1Bv5aBNpkT'})
normal_1.GetContentFile('normal_1.csv')
normal_2 = drive.CreateFile({'id':'1Z0DO8M1Qco07kyVoxYSgxXBx6XYGBzJd'})
normal_2.GetContentFile('normal_2.csv')
abnormal = drive.CreateFile({'id':'12zFHDXVjreorRrHHhYrA1n82VQLuawsl'})
abnormal.GetContentFile('abnormal.csv')
Step 7 : Now, you can read those files and load them into dataframes for further use.
import pandas as pd
# Read only the first rows of each file: the header plus 99 data rows (49 for abnormal).
normal_1 = pd.read_csv('normal_1.csv', nrows=99)
normal_2 = pd.read_csv('normal_2.csv', nrows=99)
abnormal = pd.read_csv('abnormal.csv', nrows=49)
Step 8 : Save the model to disk after training it, using joblib:
# joblib is a standalone package; sklearn.externals.joblib was removed in scikit-learn 0.23
import joblib
filename = 'model.sav'
joblib.dump(clf, filename)
# Upload the model to your Google Drive
model_file = drive.CreateFile({'title' : 'model.sav'})
model_file.SetContentFile('model.sav')
model_file.Upload()
Now, go to My Drive and refresh it. You will find a file named "model.sav". For the complete code as a Jupyter notebook file, you can visit my GitHub link. I hope this helps you solve your problem.
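Copying the id out of the share link by hand in step 6 is easy to get wrong, so it can help to extract it with a small function. This sketch handles only links of the form shown in step 6 (.../file/d/<id>/...); drive_file_id is a hypothetical helper name, and other link formats are not covered.

```python
import re


def drive_file_id(share_url):
    """Extract the file id from a Drive share link of the form .../file/d/<id>/..."""
    match = re.search(r"/file/d/([A-Za-z0-9_-]+)", share_url)
    if match is None:
        raise ValueError(f"no file id found in: {share_url}")
    return match.group(1)
```

The returned string can be passed as the 'id' value in the CreateFile call from step 6.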
