I have access to an unlimited Google Drive (G Suite) account and I would like to transfer a public folder from Mega to my Drive. Here are the steps I have already tried, along with the issues I faced:
Google Colab method --> Colab disconnects after 5-10 minutes of running; the issue has already been raised on the code's GitHub
MultiCloud --> Can copy files that are already in my Mega drive, but I want to copy public folders such as this one
RClone --> Same issue as above; also got an error while creating the config file
Mega.py library --> Only works for files, not folders; errors when downloading from mega.nz, as the documentation only covers mega.co.nz
MegaCopy from MegaTools --> Could not find a Windows build; I would also need a Python integration of it if possible
The old download-from-Mega-and-upload-to-Google-Drive method --> Extremely slow download speed
I am exhausted and out of ideas for this seemingly easy task. It would be extremely helpful if someone could help. Thank you in advance.
I would definitely go with Google Colab.
You just have to install MegaCMD in your notebook, mount your Google Drive, and get your folder(s)!
I just created one for this purpose: https://colab.research.google.com/drive/1tadBcXE4vkKaFETsWGJ_B7O8uKxp6E9J?usp=sharing
Edit: the content of the Notebook:
Import GDrive:
from google.colab import drive
drive.mount('/content/drive')
Move to your mounted drive and create a directory:
%cd /content/drive/MyDrive/
!mkdir MegaImport
%cd MegaImport
Install dependencies:
!apt install libmms0 libc-ares2 libc6 libcrypto++6 libgcc1 libmediainfo0v5 libpcre3 libpcrecpp0v5 libssl1.1 libstdc++6 libzen0v5 zlib1g apt-transport-https
Download and install MegaCMD:
!wget https://mega.nz/linux/MEGAsync/Debian_9.0/amd64/megacmd_1.4.0-3.1_amd64.deb
!dpkg -i megacmd_1.4.0-3.1_amd64.deb
Log in to Mega (replace with your email and password):
!mega-login email password
Get the folder you wish:
!mega-get <remote_folder> ./
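Since the question is about a public folder: if I read the MEGAcmd usage correctly, mega-get also accepts an exported/public link directly, so something like the following (with a placeholder link) should work without the folder being in your own account:
!mega-get "https://mega.nz/folder/<folder_id>#<key>" ./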
You could stand up a free-tier AWS EC2 instance, or use GCE's new-account credit, to run a virtual machine in the cloud for free or mostly free. You could then download and upload from that machine, which would sidestep the download speed issues you're having.
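For illustration, the transfer on such a VM could look roughly like this (a sketch, assuming MEGAcmd and rclone are installed and that an rclone remote named gdrive has already been configured for your Google Drive):
# download the public Mega folder to the VM (placeholder link)
mega-get "https://mega.nz/folder/<folder_id>#<key>" ./mega_folder
# push it to Drive (gdrive: is an assumed remote name)
rclone copy ./mega_folder gdrive:MegaImport --progress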
This may sound absurd, but I want to write code using my preferred IDE and execute it in Google Colaboratory.
I have a low-end PC, and some code that takes much less time on Google Colab takes ages to execute locally. So, is there a way to write code in a local IDE, upload it programmatically, and execute it there (the output can be shown on the Colab site)?
I can upload the program file to Google Drive programmatically, but I then need to manually execute it from Colab, which I want to avoid.
I want to write code that does all of the above: upload the program to Colab and execute it there.
You can upload your code to Drive, then mount it:
from google.colab import drive
drive.mount('/content/drive')
#Optional: move to the desired location:
%cd drive/My Drive/DIRECTORY_IN_YOUR_DRIVE
Install requirements (optional):
!pip install REQUIREMENT
Then run your file using:
!python3 filename.py
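Putting it together, a single Colab cell could look roughly like this (a sketch; the directory name, requirements file, and script name are placeholders for your own):
from google.colab import drive
drive.mount('/content/drive')

# move to the Drive folder holding the uploaded script (placeholder path)
%cd "/content/drive/My Drive/my_project"

# install anything the script needs, then run it
!pip install -r requirements.txt
!python3 my_script.py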
How can I save a file generated by a Colab notebook directly to a GitHub repo?
It can be assumed that the notebook was opened from the GitHub repo and that the same notebook can be saved back to that repo.
Google Colaboratory's integration with GitHub tends to be lacking; however, you can run bash commands from inside the notebook, which let you access and modify any data generated.
You'll need to generate a token on GitHub to allow access to the repository you want to save data to. See here for how to create a personal access token.
Once you have that token, you run git commands from inside the notebook to clone the repository, add whatever files you need to, and then upload them. This post here provides an overview of how to do it in depth.
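For illustration, the cell contents might look roughly like this (a sketch; the token, user name, repo name, and file name are placeholders, and your default branch may differ):
# clone using the personal access token for authentication
!git clone https://<token>@github.com/<username>/<repo>.git
# copy the generated file into the clone and commit it
!cp generated_output.csv <repo>/
%cd <repo>
!git config user.email "you@example.com"
!git config user.name "Your Name"
!git add generated_output.csv
!git commit -m "Add file generated in Colab"
!git push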
That being said, this approach is kind of cumbersome, and it might be preferable to configure colab to work over an SSH connection. Once you do that, you can mount a folder on the colab instance to a folder on your local machine using sshfs. This will allow you to access the colab as though it were any other folder on your machine, including opening it in your IDE, viewing files in a file browser, and cloning or updating git repositories. This goes more in depth on that.
These are the best options I was able to identify, and I hope one of them can be made to work for you.
I am currently loading the images from my Google Drive.
But the issue is that those images are in my Drive, and when I share my Colab notebook with others, they can't run it, since it requires my authentication to access the Drive images.
So I thought that uploading the data folder to a GitHub repository and making that repo public would let anyone fetch the data (in my case, the images folder), so no authentication would be required to run the Colab code.
I have no idea how to mount a GitHub repo as a directory the way Google Drive is mounted.
from google.colab import drive
drive.mount('/content/drive/') # this mounts my Google Drive at /content/drive/
Is it possible to do a similar mounting to a github repo?
You could clone the repository directly by running git in a code cell, like this:
!git clone https://github.com/yourusername/yourpublicrepo.git
This will create a folder called yourpublicrepo.
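Files in the clone are then just ordinary local paths, e.g. (a sketch with a made-up folder and file name inside the repo):
from PIL import Image
img = Image.open('yourpublicrepo/images/sample.jpg')  # path inside the cloned repo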
Please help if you can! I have a lot of individual images stored in a Google Cloud Storage bucket. I want to retrieve individual images from the bucket through Google Colab. I have already set up a connection via gcsfuse, but I still cannot access the images.
I have tried:
I = io.imread('/content/coco/Val/Val/val2017/000000000139.jpg')
I = file_io.FileIO('/content/coco/Val/Val/val2017/000000000139.jpg', 'r')
I = tf.io.read_file('/content/coco/Val/Val/val2017/000000000139.jpg', 'r')
None have worked and I am confused.
io.imread returns "None"
file_io.FileIO returns "tensorflow.python.lib.io.file_io.FileIO at 0x7fb7e075e588"
which I don't know what to do with.
tf.io.read_file returns an empty tensor.
(I am actually using PyTorch, not Tensorflow but after some google searches, it seemed TensorFlow might have the answer.)
It is unclear to me whether your issue is with copying files from Google Cloud Storage to Colab, or with accessing a file in Colab with Python.
As stated in the Colab documentation, in order to use Google Cloud Storage you should be using the gsutil tool.
Anyway, I tried the gcsfuse tool myself by following these steps, and I was able to see the objects of my bucket by running the !ls command.
Steps:
from google.colab import auth
auth.authenticate_user()
Once you run this, a link will be generated; click on it and complete the sign-in.
!echo "deb http://packages.cloud.google.com/apt gcsfuse-bionic main" > /etc/apt/sources.list.d/gcsfuse.list
!curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
!apt -qq update
!apt -qq install gcsfuse
Use this to install gcsfuse on Colab.
!mkdir folderOnColab
!gcsfuse folderOnBucket folderOnColab
Replace folderOnColab with the desired name of your folder and folderOnBucket with the name of your bucket, removing the gs:// prefix from the name.
By following all these steps and running the !ls command, I was able to see the files from my bucket in the new folder in Colab.
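Once the bucket is mounted, its objects should be readable like ordinary local files. For example, something like this should work for the image in the question (a sketch; the mount folder name and the path inside the bucket are placeholders):
from PIL import Image
import numpy as np

img = Image.open('folderOnColab/val2017/000000000139.jpg')  # path under the gcsfuse mount
arr = np.array(img)  # plain numpy array, usable with PyTorch via torch.from_numpy(arr)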
I am working on an image segmentation machine learning project and I would like to test it out on Google Colab.
For the training dataset, I have 700 images, mostly 256x256, that I need to load into a Python numpy array for my project. I also have thousands of corresponding mask files to upload. They currently exist in a variety of subfolders on Google Drive, but I have been unable to upload them to Google Colab for use in my project.
So far I have attempted using Google Drive FUSE, which seems to have very slow upload speeds, and PyDrive, which has given me a variety of authentication errors. I have been using the Google Colab I/O example code for the most part.
How should I go about this? Would PyDrive be the way to go? Is there code somewhere for uploading a folder structure or many files at a time?
You can put all your data into your Google Drive and then mount the drive. This is how I have done it. Let me explain in steps.
Step 1:
Transfer your data into your Google Drive.
Step 2:
Run the following code to mount your Google Drive.
# Install a Drive FUSE wrapper.
# https://github.com/astrada/google-drive-ocamlfuse
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse
# Generate auth tokens for Colab
from google.colab import auth
auth.authenticate_user()
# Generate creds for the Drive FUSE library.
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}
# Create a directory and mount Google Drive using that directory.
!mkdir -p Drive
!google-drive-ocamlfuse Drive
!ls Drive/
# Create a file in Drive.
!echo "This newly created file will appear in your Drive file list." > Drive/created.txt
Step 3:
Run the following line to check whether you can see your desired data in the mounted drive.
!ls Drive
Step 4:
Now load your data as follows. I had Excel files containing my train, CV, and test data.
train_data = pd.read_excel(r'Drive/train.xlsx')
test = pd.read_excel(r'Drive/test.xlsx')
cv= pd.read_excel(r'Drive/cv.xlsx')
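Since the question is about images rather than spreadsheets, loading them from the mounted drive into a numpy array could look roughly like this (a sketch; the folder layout, file extension, and image size are assumptions based on the question):
import glob
import numpy as np
from PIL import Image

# collect every image under the mounted folder (placeholder path)
paths = sorted(glob.glob('Drive/dataset/images/**/*.png', recursive=True))
# resize/convert so all arrays stack into one (N, 256, 256, 3) array
images = np.stack([np.array(Image.open(p).convert('RGB').resize((256, 256))) for p in paths])
print(images.shape)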
Edit
For saving data from the Colab notebook environment back into your Drive, you can run the following code.
# Install the PyDrive wrapper & import libraries.
# This only needs to be done once in a notebook.
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
# This only needs to be done once in a notebook.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# Create & upload a file.
uploaded = drive.CreateFile({'title': 'data.xlsx'})
uploaded.SetContentFile('data.xlsx')
uploaded.Upload()
print('Uploaded file with ID {}'.format(uploaded.get('id')))
Here are a few steps to upload a large dataset to Google Colab:
1. Upload your dataset to free cloud storage like Dropbox, Openload, etc. (I used Dropbox).
2. Create a shareable link for your uploaded file and copy it.
3. Open your notebook in Google Colab and run this command in one of the cells:
!wget your_shareable_file_link
That's it!
You can compress your dataset into a zip file and, after downloading it in Google Colab, unzip it using this command:
!unzip downloaded_filename -d destination_folder
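One caveat worth adding (an assumption about Dropbox, not part of the original answer): a Dropbox share link ending in ?dl=0 serves an HTML preview page, so change it to ?dl=1 to make wget fetch the file itself, e.g.:
!wget -O dataset.zip "https://www.dropbox.com/s/<file_id>/dataset.zip?dl=1"
!unzip dataset.zip -d data/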
Zip your file first, then upload it to Google Drive.
See this simple command to unzip:
!unzip {file_location}
Example:
!unzip drive/models.zip
Step 1: Mount the Drive by running the following command:
from google.colab import drive
drive.mount('/content/drive')
This will output a link. Click on the link, hit Allow, copy the authorization code, and paste it into the box in the Colab cell labelled "Enter your authorization code:".
This process just gives Colab permission to access your Google Drive.
Step 2: Upload your folder (zipped or unzipped, depending on its size) to Google Drive.
Step 3: Now work your way through the Drive directories and files to locate your uploaded folder or zipped file.
This process may look something like this:
The current working directory in Colab when you start off will be /content/
Just to make sure, run the following command in the cell:
!pwd
It will show you the current directory you are in. (pwd stands for "print working directory")
Then use the commands like:
!ls
to list the directories and files in the directory you are in
and the command:
%cd /directory/name/of/your/choice
to move into the directories and locate your uploaded folder or .zip file (use %cd rather than !cd, since !cd runs in a subshell and does not change the notebook's working directory).
And just like that, you are ready to get your hands dirty with your Machine Learning model! :)
Hopefully, these simple steps will keep you from spending too much unnecessary time figuring out how Colab works, when you should really be spending most of your time on the machine learning model, its hyperparameters, pre-processing...
Google Colab has made it more convenient for users to upload files [from the local machine, Google Drive, or GitHub]. You need to click the Mount Drive option in the pane on the left side of the notebook, and you'll get access to all the files stored in your Drive.
Select the file -> right-click -> Copy path. Refer to this.
Use Python read methods to load files from this path, for example:
import pandas as pd
data = pd.read_csv('your copied path here')
For importing multiple files in one go, you may need to write a function.
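A minimal sketch of such a function, assuming the files are CSVs sitting in a single Drive folder (the folder path is a placeholder):
import glob
import pandas as pd

def load_csv_folder(folder):
    # read every CSV in the folder into a dict keyed by file path
    frames = {}
    for path in sorted(glob.glob(f'{folder}/*.csv')):
        frames[path] = pd.read_csv(path)
    return frames

data = load_csv_folder('/content/drive/MyDrive/my_data')  # placeholder path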
There are many ways to do so:
You might want to push your data to a GitHub repository; then in a Google Colab code cell you can run:
!git clone https://www.github.com/{repo}.git
You can upload your data to Google Drive; then in your code cell:
from google.colab import drive
drive.mount('/content/drive')
Use the transfer.sh tool: you can visit here to see how it works:
transfer.sh
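For illustration, the transfer.sh flow could look roughly like this (a sketch based on the tool's documented curl usage; the file name and the returned URL are placeholders):
# on your local machine: upload the file and note the URL it prints
curl --upload-file ./dataset.zip https://transfer.sh/dataset.zip
# in a Colab cell: download it using that URL
!wget https://transfer.sh/<returned_id>/dataset.zip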