Please help if you can! I have a lot of individual images stored in a google bucket. I want to retrieve individual images from the bucket through google colab. I have already set up a connection via gcsfuse but I can still not access the images.
I have tried:
I = io.imread('/content/coco/Val/Val/val2017/000000000139.jpg')
I = file_io.FileIO('/content/coco/Val/Val/val2017/000000000139.jpg', 'r')
I = tf.io.read_file('/content/coco/Val/Val/val2017/000000000139.jpg', 'r')
None have worked and I am confused.
io.imread returns "None"
file_io.FileIO returns a <tensorflow.python.lib.io.file_io.FileIO at 0x7fb7e075e588> object, which I don't know what to do with.
tf.io.read_file returns an empty tensor.
(I am actually using PyTorch, not Tensorflow but after some google searches, it seemed TensorFlow might have the answer.)
It is unclear to me whether your issue is with copying files from Google Cloud Storage to Colab or with accessing a file in Colab with Python.
As stated in the Colab documentation, in order to use Google Cloud Storage you should be using the gsutil tool.
Anyway, I tried the gcsfuse tool myself by following these steps, and I was able to see the objects in my bucket by running the !ls command.
Steps:
from google.colab import auth
auth.authenticate_user()
Once you run this, a link will be generated; click on it to complete the sign-in.
!echo "deb http://packages.cloud.google.com/apt gcsfuse-bionic main" > /etc/apt/sources.list.d/gcsfuse.list
!curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
!apt -qq update
!apt -qq install gcsfuse
Use this to install gcsfuse on colab.
!mkdir folderOnColab
!gcsfuse folderOnBucket folderOnColab
Replace the folderOnColab with the desired name of your folder and the folderOnBucket with the name of your bucket removing the gs:// preceding the name.
By following all these steps and running the !ls command, I was able to see the files from my bucket in the new folder in Colab.
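Once the mount succeeds, files under folderOnColab behave like ordinary local files, so plain Python file APIs work; the detail that often trips people up is opening images in text mode ('r') instead of binary mode ('rb'). A minimal, self-contained sketch (the bucket path is hypothetical, and a throwaway temp file stands in for a mounted object so the snippet runs anywhere):

```python
import os
import tempfile

# Simulate one object from the mounted bucket with a small binary file.
# In Colab you would skip this and point directly at the gcsfuse mount,
# e.g. path = "/content/folderOnColab/000000000139.jpg" (hypothetical path).
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "000000000139.jpg")
with open(path, "wb") as f:
    f.write(b"\xff\xd8\xff\xe0" + b"\x00" * 16)  # JPEG magic bytes + padding

# Read the file in *binary* mode -- text mode ('r') corrupts image bytes.
with open(path, "rb") as f:
    data = f.read()

print(len(data))                 # number of bytes read
print(data[:2] == b"\xff\xd8")   # True for a JPEG header
```

The same binary-mode rule applies to file_io.FileIO: passing 'rb' instead of 'r' is what lets the bytes be decoded as an image afterwards.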
Related
I have access to unlimited GDrive suite and I would like to transfer a public folder from Mega to my Drive. Here are the steps I have already tried along with their issues I faced :
Google Colab method --> Colab disconnecting after 5-10 minutes of running; Issue already raised on Github code
MultiCloud --> Possible to copy files in my MegaDrive; but I want to copy public folders such as this one
RClone --> Same issue as above; Also an error generated while creating a config file
Mega.py library --> Only for files, not folders; error when downloading from mega.nz, as the documentation covers only mega.co.nz
MegaCopy from MegaTools --> Did not find a Windows implementation; also need a python integration of it if possible
The old, download from Mega and upload to Google Drive method --> Extremely slow download speed
I am exhausted and out of ideas for this seemingly easy task. It would be extremely helpful if someone could help me out. Thank you in advance.
I would definitely go with Google Colab.
You just have to install MegaCMD in your Notebook, mount your google drive and get your folder(s)!
I just created one for this purpose: https://colab.research.google.com/drive/1tadBcXE4vkKaFETsWGJ_B7O8uKxp6E9J?usp=sharing
Edit: the content of the Notebook:
Import GDrive:
from google.colab import drive
drive.mount('/content/drive')
Move to your mounted drive and create directory:
cd /content/drive/MyDrive/
!mkdir MegaImport
cd MegaImport
Install dependencies:
!apt install libmms0 libc-ares2 libc6 libcrypto++6 libgcc1 libmediainfo0v5 libpcre3 libpcrecpp0v5 libssl1.1 libstdc++6 libzen0v5 zlib1g apt-transport-https
Download and install MegaCMD:
!wget https://mega.nz/linux/MEGAsync/Debian_9.0/amd64/megacmd_1.4.0-3.1_amd64.deb
!dpkg -i megacmd_1.4.0-3.1_amd64.deb
Login to Mega (replace with your email and pwd):
!mega-login email password
Get the folder you wish:
!mega-get <remote_folder> ./
You could stand up a free-tier AWS EC2 instance, or use GCE's new-account credit, to run a virtual machine in the cloud for free or nearly free. You could then download and upload from that machine, which would sidestep the download speed issues you're having.
I am currently loading the images from my Google Drive.
But the issue is, those images are in my drive and when I share my colab notebook to others, they can't run it since it requires my authentication code to access the Drive images.
So I thought that uploading the data folder to a GitHub repository and making that repo public would allow anyone to fetch the data (in my case, the images folder), so no authentication would be required to run the Colab code.
I have no idea how to mount the directory to a Github repo as Google Drive.
from google.colab import drive
drive.mount('/content/drive/') # this will set my google drive folder as the notebook directory.
Is it possible to do a similar mounting to a github repo?
You could clone the repository directly like this by running git in a code cell.
!git clone https://github.com/yourusername/yourpublicrepo.git
This will create a folder called yourpublicrepo.
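After the clone, the repo's files are ordinary local files under /content/yourpublicrepo, so you can gather them with the standard library. A small sketch, using a temporary directory as a stand-in for the cloned folder (the images/ subfolder and filenames are made up):

```python
import os
import tempfile

# Stand-in for the cloned repo folder; in Colab this would be
# "/content/yourpublicrepo" (name taken from the clone URL, hypothetical here).
repo = tempfile.mkdtemp()
img_dir = os.path.join(repo, "images")
os.makedirs(img_dir)
for name in ("a.jpg", "b.jpg"):
    open(os.path.join(img_dir, name), "wb").close()

# Collect every image path in the repo's images folder.
paths = sorted(
    os.path.join(img_dir, f) for f in os.listdir(img_dir) if f.endswith(".jpg")
)
print(len(paths))  # 2
```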
I want to know how we can upload the .txt & .vcf files.
I've already mounted the drive, then done some sorting and downloading of data with wget in Colab. But I was not able to find a resource on how to export or commit changes to the drive.
Please help me!!
Once you are inside a particular notebook, you can use the file browser on the left to upload files to be used in the current notebook. Remember that they will be deleted once your current session ends, so you will have to upload them again when you open the notebook later. If you have uploaded them elsewhere, you can simply use !wget to download them to your notebook's temporary storage.
Edit: To copy data, simply use !cp to copy the file(s) from your notebook storage to the drive once you have mounted it. For example, here is how I would copy data.xyz:
from google.colab import drive
drive.mount('/content/gdrive')
!cp data.xyz "gdrive/My Drive/data.xyz"
You may just as simply use !mv to move data to the drive instead of copying it. Just like that, you can copy/move data from the drive to the Collaboratory notebook too.
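If you prefer staying in Python rather than shelling out, the stdlib shutil module mirrors !cp and !mv. A runnable sketch, with temporary directories standing in for the notebook storage and the mounted Drive folder:

```python
import os
import shutil
import tempfile

# Hypothetical stand-ins: src lives in the notebook's temporary storage,
# dst_dir plays the role of the mounted Drive folder
# (in Colab: "/content/gdrive/My Drive").
src_dir = tempfile.mkdtemp()
dst_dir = tempfile.mkdtemp()
src = os.path.join(src_dir, "data.xyz")
with open(src, "w") as f:
    f.write("example")

shutil.copy(src, os.path.join(dst_dir, "data.xyz"))  # like !cp
# shutil.move(src, ...) would mirror !mv instead.

print(os.path.exists(os.path.join(dst_dir, "data.xyz")))  # True
```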
It is great that I can run jupyter notebooks in CoLab, but I am going crazy saving and loading files. For example, I am writing an assignment for my course and I include figures in it using the HTML tag. (I want to use HTML instead of markdown images so I can set the width.) So in a Text cell I have
<img src="CoLab04.png" width="250">
This works fine when I run the jupyter notebook on my laptop, but in CoLab, it can't find the image even when the image is in the same CoLab folder as the ipynb file. Err.
I have similar problems saving data files. On my laptop I can use the normal python functions open, write, close, etc. That code runs without complaint, but the files do not show up on Google Drive. Not in the CoLab folder or any other folder when I search all of my Google Drive. Err. I read TFM and use
from google.colab import drive, files
drive.mount('/content/gdrive')
fig.savefig("LED12.png") # saves a figure as a file
files.download("LED12.png")
This downloads a file to my laptop. Then I have to upload the file to a Google Drive folder so my students can see it.
Am I missing something? Why is it so hard to create and read Google Drive files using a Google-CoLab jupyter notebook?
I've read https://colab.research.google.com/notebooks/io.ipynb, but why is it so hard? I need something easy for novice students to use. If reading and writing files is this hard, I will have to recommend my students install jupyter on their laptops and not use CoLab.
It seems to me to be a working-directory problem.
After you mount My Drive by the following code
from google.colab import drive
drive.mount('/content/drive/')
then your main Google Drive can be read with
!ls /content/drive/My Drive/
If you have a sub-folder under My Drive where you want to centralize your Colab project, say a projectA folder under your main Google Drive directory, you can make it the working directory:
import os
os.chdir("/content/drive/My Drive/projectA")
Then you should be able to save your fig the same way you do on your local machine; the file will be saved to the projectA folder where your Colab code now runs.
fig.savefig("LED12.png")
You should be able to see the file appear there. If this doesn't work, then try using an absolute path for open, save, close, and other path-sensitive operations:
import os
working_path = '/content/drive/My Drive/projectA'
fig.savefig(os.path.join(working_path, "LED12.png"))
It may be simpler to load notebooks from GitHub, wherein image links in the same repository will be loaded more intuitively.
For example, the notebook below loads a set of images bundled in its GitHub repository.
https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.01-What-Is-Machine-Learning.ipynb
The markdown reference for the first graph:
![](figures/05.01-classification-1.png)
[figure source in Appendix](06.00-Figure-Code.ipynb#Classification-Example-Figure-1)
This corresponds to the GitHub repo here:
https://github.com/jakevdp/PythonDataScienceHandbook/
Building on this example, a common pattern for bundling data files is to add a !git clone ... command at the top of the notebook to bring in the entire repo in one shot.
The reason this is simpler to accomplish in GitHub than Drive is that GitHub has unified ACLs at the repository level, whereas Drive manages ACLs at the file level. So, it would be a bit cumbersome to have a publicly shared Drive notebook that referenced images or other Drive files that were not shared.
I have done this in Colab (reading, training my model and uploading my trained model) some days ago. Let's make it simple.
Please do the following steps. I will cover both reading a CSV file and uploading a file.
Step 1 : Go to your Google Drive and create a folder named Colab, and keep your files inside it.
Step 2 : Now, install pydrive in Colab jupyter notebook
!pip install pydrive
Step 3 : Run following commands for accessing Google drive File
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
Step 4 : Mount the drive (here you will get a link in the Colab jupyter shell; click the generated link, verify your Google Drive, and copy and paste the generated code).
from google.colab import drive
drive.mount('/content/drive/')
Step 5 : Authenticate and create the PyDrive client. Do the same as in step 4 (click the generated link, verify your Google Drive, and copy and paste the generated code).
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
Step 6 : To get a file, replace the id below with the id of the file you want to access. For me, they were CSV files. To get the id, go to Share and generate a link; you will find something like https://drive.google.com/file/d/xxxxxxxxxxxxxx/view?usp=sharing. Use the xxxxxxxxxxxxxx part below, and repeat for as many files as you want to read.
normal_1 = drive.CreateFile({'id':'13AR0sS1pndF0fTxmdjQRv_1Bv5aBNpkT'})
normal_1.GetContentFile('normal_1.csv')
normal_2 = drive.CreateFile({'id':'1Z0DO8M1Qco07kyVoxYSgxXBx6XYGBzJd'})
normal_2.GetContentFile('normal_2.csv')
abnormal = drive.CreateFile({'id':'12zFHDXVjreorRrHHhYrA1n82VQLuawsl'})
abnormal.GetContentFile('abnormal.csv')
Step 7 : Now, you can read those files into dataframes for further use.
import pandas as pd

normal_1 = pd.read_csv('normal_1.csv', nrows=100)  # keep only the first 100 rows
normal_2 = pd.read_csv('normal_2.csv', nrows=100)
abnormal = pd.read_csv('abnormal.csv', nrows=50)
Step 8 : Save the model to disk after training your model:Use joblib
import joblib  # sklearn.externals.joblib is deprecated in recent scikit-learn
filename = 'model.sav'
joblib.dump(clf, filename)
# Upload model to you google drive
model_file = drive.CreateFile({'title' : 'model.sav'})
model_file.SetContentFile('model.sav')
model_file.Upload()
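To use the saved model later, the inverse call is joblib.load(filename). The round trip can be sketched with the stdlib pickle module, whose dump/load API joblib mirrors (the clf here is a plain dict standing in for a trained model):

```python
import os
import pickle
import tempfile

clf = {"coef": [0.5, -1.2]}  # stand-in for a trained model object

filename = os.path.join(tempfile.mkdtemp(), "model.sav")
with open(filename, "wb") as f:
    pickle.dump(clf, f)        # joblib.dump(clf, filename) is the analogue

with open(filename, "rb") as f:
    restored = pickle.load(f)  # joblib.load(filename) is the analogue

print(restored == clf)  # True
```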
Now, go to your My Drive and refresh it. You will find something called "model.sav". For the complete code in a jupyter notebook file, you can visit my github link. I hope it will help you solve your problem.
I am working on a image segmentation machine learning project and I would like to test it out on Google Colab.
For the training dataset, I have 700 images, mostly 256x256, that I need to upload into a python numpy array for my project. I also have thousands of corresponding mask files to upload. They currently exist in a variety of subfolders on Google drive, but I have been unable to upload them to Google Colab for use in my project.
So far I have attempted using Google Fuse which seems to have very slow upload speeds and PyDrive which has given me a variety of authentication errors. I have been using the Google Colab I/O example code for the most part.
How should I go about this? Would PyDrive be the way to go? Is there code somewhere for uploading a folder structure or many files at a time?
You can put all your data into your google drive and then mount drive. This is how I have done it. Let me explain in steps.
Step 1:
Transfer your data into your google drive.
Step 2:
Run the following code to mount you google drive.
# Install a Drive FUSE wrapper.
# https://github.com/astrada/google-drive-ocamlfuse
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse
# Generate auth tokens for Colab
from google.colab import auth
auth.authenticate_user()
# Generate creds for the Drive FUSE library.
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}
# Create a directory and mount Google Drive using that directory.
!mkdir -p drive
!google-drive-ocamlfuse drive
!ls drive/
# Create a file in Drive.
!echo "This newly created file will appear in your Drive file list." > drive/created.txt
Step 3:
Run the following line to check whether you can see your desired data in the mounted drive.
!ls drive
Step 4:
Now load your data into a numpy array as follows. I had my Excel files containing my train, cv, and test data.
import pandas as pd

train_data = pd.read_excel(r'drive/train.xlsx')
test = pd.read_excel(r'drive/test.xlsx')
cv = pd.read_excel(r'drive/cv.xlsx')
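Since the end goal is usually a numpy array for training, the loaded DataFrames convert directly with to_numpy(). A tiny self-contained sketch (column names are made up):

```python
import pandas as pd

# Stand-in for a DataFrame loaded via pd.read_excel / pd.read_csv.
train_data = pd.DataFrame({"x1": [1.0, 2.0], "x2": [3.0, 4.0]})

X = train_data.to_numpy()  # shape (n_rows, n_cols) numpy array
print(X.shape)  # (2, 2)
print(X.dtype)  # float64
```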
Edit
For downloading the data into your drive from the colab notebook environment, you can run the following code.
# Install the PyDrive wrapper & import libraries.
# This only needs to be done once in a notebook.
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
# This only needs to be done once in a notebook.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# Create & upload a file.
uploaded = drive.CreateFile({'title': 'data.xlsx'})
uploaded.SetContentFile('data.xlsx')
uploaded.Upload()
print('Uploaded file with ID {}'.format(uploaded.get('id')))
Here are few steps to upload large dataset to Google Colab
1. Upload your dataset to free cloud storage like Dropbox, Openload, etc. (I used Dropbox).
2. Create a shareable link of your uploaded file and copy it.
3. Open your notebook in Google Colab and run this command in one of the cells:
!wget your_shareable_file_link
That's it!
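The same download can be done from Python with the stdlib urllib, if you'd rather not shell out to wget. A runnable sketch that uses a local file:// URL as a stand-in for the shareable link (note that a Dropbox link usually needs ?dl=1 appended to force a direct download):

```python
import os
import tempfile
import urllib.request

# Stand-in: a local file exposed via a file:// URL instead of a real
# shareable https link.
src = os.path.join(tempfile.mkdtemp(), "dataset.zip")
with open(src, "wb") as f:
    f.write(b"payload")

url = "file://" + src
dest = os.path.join(tempfile.mkdtemp(), "dataset.zip")
urllib.request.urlretrieve(url, dest)  # what !wget does, in Python

print(os.path.getsize(dest))  # 7 bytes, same as the source file
```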
You can compress your dataset into a zip or rar file and later unzip it after downloading it in Google Colab by using this command:
!unzip downloaded_filename -d destination_folder
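The same extraction can be done from Python with the stdlib zipfile module. A self-contained sketch (archive and folder names are made up, and a tiny archive is built on the fly so it runs anywhere):

```python
import os
import tempfile
import zipfile

workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "dataset.zip")   # plays downloaded_filename
dest = os.path.join(workdir, "destination_folder")

# Build a tiny archive so the sketch is runnable end to end.
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("images/cat.jpg", b"fake-bytes")

# Equivalent of: !unzip downloaded_filename -d destination_folder
with zipfile.ZipFile(archive) as zf:
    zf.extractall(dest)

print(os.path.exists(os.path.join(dest, "images", "cat.jpg")))  # True
```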
Zip your file first, then upload it to Google Drive.
See this simple command to unzip:
!unzip {file_location}
Example:
!unzip drive/models.zip
Note that unzip handles .zip archives only; for a .rar file you would need the unrar tool instead.
Step1: Mount the Drive, by running the following command:
from google.colab import drive
drive.mount('/content/drive')
This will output a link. Click on the link, hit allow, copy the authorization code and paste it the box present in colab cell with the text "Enter your authorization code:" written on top of it.
This process is just giving permission for colab to access your Google Drive.
Step2: Upload your folder(zipped or unzipped depending on the size of the folder) to Google Drive
Step3: Now work your way into the Drive directories and files to locate your uploaded folder/zipped file.
This process may look something like this:
The current working directory in colab when you start off will be /content/
Just to make sure, run the following command in the cell:
!pwd
It will show you the current directory you are in. (pwd stands for "print working directory")
Then use the commands like:
!ls
to list the directories and files in the directory you are in
and the command:
%cd /directory/name/of/your/choice
to move into the directories to locate your uploaded folder or the uploaded .zip file. (Use %cd rather than !cd: each ! command runs in its own shell, so !cd would not change the notebook's working directory.)
And just like that, you are ready to get your hands dirty with your Machine Learning model! :)
Hopefully, these simple steps will prevent you from spending too much unnecessary time on figuring out how colab works when you should actually be spending the majority of your time figuring out the Machine learning model, its hyperparameters, pre-processing...
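The shell steps above also have direct Python equivalents via the os module, and os.chdir changes the notebook process's working directory in a way that persists across cells. A sketch with a temporary directory standing in for the Drive path:

```python
import os
import tempfile

start = os.getcwd()           # like !pwd
target = tempfile.mkdtemp()   # stand-in for /content/drive/My Drive/...

os.chdir(target)              # like %cd /directory/name/of/your/choice
print(os.getcwd() == os.path.realpath(target))  # True

entries = os.listdir(".")     # like !ls
print(entries)                # [] for a fresh directory

os.chdir(start)               # go back to where we started
```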
Google Colab has made it more convenient for users to upload files [from the local machine, Google Drive, or GitHub]. You need to click on the Mount Drive option in the pane on the left side of the notebook and you'll get access to all the files stored in your drive.
Select the file -> right-click -> Copy path.
Use Python import methods to import files from this path, for example:
import pandas as pd
data = pd.read_csv('your copied path here')
For importing multiple files in one go, you may need to write a function.
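Such a function can be as small as a loop over a folder listing. A runnable sketch (the folder is a temp directory with made-up CSVs standing in for the Drive files):

```python
import os
import tempfile
import pandas as pd

def read_all_csvs(folder):
    """Read every .csv in `folder` into a dict of DataFrames keyed by filename."""
    frames = {}
    for name in sorted(os.listdir(folder)):
        if name.endswith(".csv"):
            frames[name] = pd.read_csv(os.path.join(folder, name))
    return frames

# Stand-in for a mounted Drive folder with a couple of files.
folder = tempfile.mkdtemp()
for name in ("a.csv", "b.csv"):
    with open(os.path.join(folder, name), "w") as f:
        f.write("x,y\n1,2\n")

data = read_all_csvs(folder)
print(sorted(data))          # ['a.csv', 'b.csv']
print(data["a.csv"].shape)   # (1, 2)
```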
There are many ways to do so:
You might want to push your data into a github repository then in Google Colab code cell you can run :
!git clone https://www.github.com/{repo}.git
You can upload your data to Google drive then in your code cell :
from google.colab import drive
drive.mount('/content/drive')
Use the transfer.sh tool: see transfer.sh to learn how it works.