Extract a Google Drive zip from a Google Colab notebook - Python

I already have a zip of a dataset (2K images) on Google Drive, which I need to use in an ML training algorithm.
The code below downloads the content as a string:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
import io
import zipfile
# Authenticate and create the PyDrive client.
# This only needs to be done once per notebook.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# Download a file based on its file ID.
#
# A file ID looks like: laggVyWshwcyP6kEI-y_W3P8D26sz
file_id = '1T80o3Jh3tHPO7hI5FBxcX-jFnxEuUE9K' #-- Updated File ID for my zip
downloaded = drive.CreateFile({'id': file_id})
#print('Downloaded content "{}"'.format(downloaded.GetContentString(encoding='cp862')))
But I have to extract it and store it in a separate directory, as that makes the dataset easier to process (and to understand).
I tried to extract it further, but I get a "Not a zipfile" error:
dataset = io.BytesIO(downloaded.encode('cp862'))
zip_ref = zipfile.ZipFile(dataset, "r")
zip_ref.extractall()
zip_ref.close()
Google Drive Dataset
Note: The dataset link is just for reference; I have already downloaded this zip to my Google Drive, and I'm referring only to the file in my drive.

You can simply use this
!unzip file_location

To unzip a file to a directory:
!unzip path_to_file.zip -d path_to_directory

To extract Google Drive zip from a Google colab notebook:
import zipfile
from google.colab import drive
drive.mount('/content/drive/')
zip_ref = zipfile.ZipFile("/content/drive/My Drive/ML/DataSet.zip", 'r')
zip_ref.extractall("/tmp")
zip_ref.close()
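A slightly more idiomatic variant of the same steps (using the same hypothetical DataSet.zip path as above) relies on a with statement so the archive is closed automatically:
import zipfile
from google.colab import drive
drive.mount('/content/drive/')
# the archive is closed automatically when the with-block exits
with zipfile.ZipFile("/content/drive/My Drive/ML/DataSet.zip", 'r') as zip_ref:
    zip_ref.extractall("/tmp")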

The Colab research team has a notebook to help you out.
Still, in short, if you are dealing with a zip file (in my case mostly thousands of images) and you want to store the contents in a folder within Drive, then do this:
!unzip -u "/content/drive/My Drive/folder/example.zip" -d "/content/drive/My Drive/folder/NewFolder"
The -u flag extracts a file only if it is new or necessary, which is important if you suddenly lose the connection or the hardware switches off.
The -d flag creates the directory and stores the extracted files there.
Of course, before doing this you need to mount your drive:
from google.colab import drive
drive.mount('/content/drive')
I hope this helps! Cheers!!

First, install unzip on colab:
!apt install unzip
then use unzip to extract your files:
!unzip source.zip -d destination_folder

First create a new directory:
!mkdir file_destination
Now it's time to fill that directory with the unzipped files:
!unzip file_location -d file_destination

Mount GDrive:
from google.colab import drive
drive.mount('/content/gdrive')
Open the link -> copy the authorization code -> paste it into the prompt and press "Enter".
Check GDrive access:
!ls "/content/gdrive/My Drive"
Unzip (q stands for "quiet") file from GDrive:
!unzip -q "/content/gdrive/My Drive/dataset.zip"

For Python:
Connect to Drive:
from google.colab import drive
drive.mount('/content/drive')
Check the current directory:
!ls
!pwd
Then unzip:
!unzip drive/"My Drive"/images.zip

After mounting the drive, use shutil.unpack_archive. It works with almost all archive formats (e.g., zip, tar, gztar, bztar, xztar) and it's simple:
import shutil
shutil.unpack_archive("filename", "path_to_extract")

Please use this command in Google Colab.
Specify the file you want to extract and then the destination folder:
!unzip "drive/My Drive/Project/yourfilename.zip" -d "drive/My Drive/Project/yourfolder"

Instead of GetContentString(), use GetContentFile(). It saves the file to disk rather than returning its contents as a string.
downloaded.GetContentFile('images.zip')
Then you can unzip it later with unzip.
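Put together with the question's PyDrive snippet, a minimal sketch might look like this (it reuses the drive client and file_id already created above):
import zipfile
# save the Drive file to Colab's local filesystem instead of reading it as a string
downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('images.zip')
# now it is an ordinary local zip file and extracts normally
with zipfile.ZipFile('images.zip', 'r') as zip_ref:
    zip_ref.extractall('dataset')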

SIMPLE WAY TO CONNECT
1) You'll have to verify authentication
from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
2) To mount Google Drive with FUSE:
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse
3) To verify credentials:
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}
4) Create a mount point for the drive ('gdrive') and check that it's working:
!mkdir gdrive
!google-drive-ocamlfuse gdrive
!ls gdrive
%cd gdrive

This is what worked for me.
!apt install unzip
Then I used this code to unzip the file
!unzip /content/file.zip -d /content/
Without installing unzip on Colab first you'll always receive error messages.

Try this:
!unzip file.zip
If it's not working, or the file is a 7z, try the below:
!apt-get install p7zip-full
!p7zip -d file_name.tar.7z
!tar -xvf file_name.tar
Or
!pip install pyunpack
!pip install patool
from pyunpack import Archive
Archive('file_name.tar.7z').extractall('path/to/')
!tar -xvf file_name.tar

We assume you have already mounted your Google Drive in Google Colab. If you just want to read a zip file that contains a CSV, you can simply call pandas' read_csv:
pd.read_csv('/content/drive/My Drive/folder/example.zip')
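Note that this relies on pandas inferring the compression from the .zip extension, and it only works when the archive contains a single CSV file. A minimal sketch, with the same hypothetical path as above:
import pandas as pd
# the zip must contain exactly one CSV for read_csv to open it directly
df = pd.read_csv('/content/drive/My Drive/folder/example.zip')
print(df.shape)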

In my view, you should first change into the right directory, for example:
from google.colab import drive
drive.mount('/content/drive/')
%cd drive/MyDrive/f/
then:
!apt install unzip
!unzip zip_folder.zip -d unzip_folder

Related

wget not found in Jupyter notebook to download geojson file from URL

I'm trying to download and read a geojson file from a URL to use later to create a folium map. I already installed wget on my Mac using brew.
When I run the code I get this:
# Download and store a geojson file of Indiana containing AGEB boundaries
import wget
import geojson
!wget https://github.com/Alexrendon/Indianapolis-data/blob/main/Indiana_censustracts.geojson
census_tract = r'Indiana_censustracts.geojson'
print("geojson ready!")
OUTPUT
zsh:1: command not found: wget
The solution was to use !curl:
import wget
import geojson
!curl -O https://github.com/Alexrendon/Indianapolis-data/blob/main/Indiana_censustracts.geojson
census_tract = r'Indiana_censustracts.geojson'
print("geojson ready!")
Your code
# Download and store a geojson file of Indiana containing AGEB boundaries
import wget
import geojson
!wget https://github.com/Alexrendon/Indianapolis-data/blob/main/Indiana_censustracts.geojson
census_tract = r'Indiana_censustracts.geojson'
print("geojson ready!")
looks like you are confusing the wget Python module available on PyPI with the GNU Wget command-line tool. If you just want to download a file, neither is required, since urlretrieve in urllib.request (part of the Python standard library) does the job. Consider the following simple example:
import urllib.request
urllib.request.urlretrieve("https://www.example.com","example.html")
The first argument is the URL, the second is the filename.
The error tells you that you don't have wget on your machine. To install it on a Mac, just do:
brew install wget

Saving multiple files on Colab - cp: cannot stat 'path': No such file or directory

I trained several models using Keras on Google Colab and I want to save them to my Drive.
All files are '.h5' files.
First, I mounted my drive
from google.colab import drive
drive.mount('/content/gdrive')
Then I tried to save the models using
import glob, os
os.chdir("/content/")
for file in glob.glob("*.h5"):
    path = "/content/"+file
    !cp -r path "/content/gdrive/My Drive/models"
But I keep getting this error:
cp: cannot stat 'path': No such file or directory
I tried using from pathlib import Path like this
from pathlib import Path
import glob, os
os.chdir("/content/")
for file in glob.glob("*.h5"):
    path = "/content/"+file
    !cp -r Path(path) "/content/gdrive/My Drive/models"
But it did not work and I got this error
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: `cp -r Path(path) "/content/gdrive/My Drive/models"'
What can I do?
Thank you.
Put curly braces around the variable (here, path) so IPython substitutes the Python value into the shell command; keep the double quotes around the destination path because of the space in "My Drive". The final command is !cp -r {path} "/content/gdrive/My Drive/models"
I used shutil from the standard library; it's efficient for this task.
import shutil
shutil.copy(file_source_path, file_dest_path)
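Applied to the asker's loop, a minimal sketch (assuming Drive is already mounted at /content/gdrive and the models folder already exists) would be:
import glob
import shutil
# copy every .h5 model from the Colab filesystem into Drive
for file in glob.glob("/content/*.h5"):
    shutil.copy(file, "/content/gdrive/My Drive/models/")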

Unzipping .7z File on Google Colab

I have a zip file named 'mathoverflow.net.7z' in my Google Drive, which I have loaded into Colab using the code below. But when I try to unzip it I get an error. Please suggest a way to rectify this.
This is my code:
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
downloaded = drive.CreateFile({'id':'15h0f8p9n6OG1B796q-gbP5oXstCuOcDM'})
downloaded.GetContentFile('mathoverflow.net.7z')
Up to this point it works fine. But when I run this, I get the following error:
!unzip mathoverflow.net.7z
Archive: mathoverflow.net.7z
End-of-central-directory signature not found. Either this file is not a zipfile, or it constitutes one disk of a multi-part archive. In the latter case the central directory and zipfile comment will be found on the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of mathoverflow.net.7z or mathoverflow.net.7z.zip, and cannot find mathoverflow.net.7z.ZIP, period.
You can use 7z instead. It's already pre-installed in Colab
!7z e mathoverflow.net.7z
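If you also want to keep the directory structure inside the archive and choose the output folder, the x command with -o (no space before the path) should work; the output path here is just an example:
!7z x mathoverflow.net.7z -o/content/mathoverflow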
!pip install pyunpack
!pip install patool
from pyunpack import Archive
Archive('file_name.7z').extractall('path/to/')
unzip will not work; you need a different tool: https://www.simplified.guide/linux/extract-7z-file
I don't know whether you have installation privileges on Colab, so you might have to do it in the privacy of your own machine.

How to unzip a file into a specific folder in the Colaboratory environment after downloading it?

I've been looking for a solution to the slow upload speed of an image dataset on Google Colab when using a connection from Google Drive, with the following code:
from google.colab import drive
drive.mount('/content/gdrive')
With this procedure I can upload images and create labels using my load_dataset function:
train_path = 'content/gdrive/MyDrive/Capstone/Enviroment/cell_images/train'
train_files, train_targets = load_dataset(train_path)
But, as I said, it's very slow, especially because my full dataset consists of 27,560 images.
To solve the problem, I tried to use this solution.
But now, in order to keep using my load_dataset function, I want to extract the downloaded .tar file into a specific folder in the Colab environment. I found this answer, but it did not solve my problem.
Example:
This is the environment with test.tar already downloaded.
I want to extract the files from the tar archive, whose structure is train/Uninfected and train/Parasitized, to get this:
content
  cell_images
    test
      Parasitized
      Uninfected
    train
      Parasitized
      Uninfected
    valid
      Parasitized
      Uninfected
To use the paths in the load_dataset function:
train_path = 'content/cell_images/train/'
train_files, train_targets = load_dataset(train_path)
test_path = 'content/cell_images/test/'
test_files, test_targets = load_dataset(test_path)
valid_path = 'content/cell_images/valid/'
valid_files, valid_targets = load_dataset(valid_path)
I tried to use:
! mkdir -p content/cell_images
and
!tar -xvf 'test.tar' content/cell_images
But it doesn't work.
Does anyone know how to proceed?
Thanks!
To extract the files from the tar archive into the folder content/cell_images, use the command-line option -C:
!tar -xvf 'test.tar' -C 'content/cell_images'
Hope this helps!
Although this is a late answer, it might help others:
shutil.unpack_archive works with almost all archive formats (e.g., zip, tar, gztar, bztar, xztar) and it's simple:
import shutil
shutil.unpack_archive("filename", "path_to_extract")
Connect to Drive:
from google.colab import drive
drive.mount('/content/drive')
Check the current directory:
!ls
!pwd
Then unzip into a destination folder:
!unzip drive/"My Drive"/images.zip -d destination
!tar -xvf "cord-19_2021-12-20.tar.gz"
as also shown here:
https://colab.research.google.com/github/sudo-ken/compress-decompress-in-Google-Drive/blob/master/Unrar_Unzip_Rar_Zip_in_GDrive.ipynb
If your current directory is the default directory, /content, you can unzip your folder project like this:
%%bash
mkdir foldername
tar -xvf '/content/foldername.tar' -C '/content/'
%%bash lets you script without using ! at the beginning of each line.

How to run a Python script in a '.py' file from a Google Colab notebook?

%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) {
return false;
}
%run rl_base.py
Running this gives an error saying rl_base.py is not found. I have uploaded the file to Google Drive, and I am running my .ipynb (containing the code above) from the same folder.
If you have the test.py file in the corresponding folder in Drive, then the command to run it is as mentioned below:
!python gdrive/My\ Drive/Colab\ Notebooks/object_detection_demo-master/test.py
Additional Info:
If you just want to run !python test.py, you should first change directory with the following command:
%cd gdrive/My\ Drive/Colab\ Notebooks/object_detection_demo-master/
When you run your notebook from Google drive, an instance is created only for the notebook. To make the other files in your Google drive folder available you can mount your Google drive with:
from google.colab import drive
drive.mount('/content/gdrive')
Then copy the file you need into the instance with:
!cp gdrive/My\ Drive/path/to/my/file.py .
And run your script:
!python file.py
You should not upload it to Google Drive. Instead, upload it to the Colab instance by calling:
from google.colab import files
files.upload()
## 1. Check in which directory you are using the command
!ls
## 2.Navigate to the directory where your python script(file.py) is located using the command
%cd path/to/the/python/file
## 3.Run the python script by using the command
!python file.py
Another way is to use colabcode. You will get full SSH access with a Visual Studio Code editor.
# install colabcode
!pip install colabcode
# import colabcode
from colabcode import ColabCode
# run colabcode with default options.
ColabCode()
# ColabCode has the following arguments:
# - port: the port you want to run code-server on, default 10000
# - password: password to protect your code server from being accessed by someone else. Note that there is no password by default!
# - mount_drive: True or False to mount your Google Drive
ColabCode(port=10000, password="abhishek", mount_drive=True)
It will prompt you with a link to visual studio code editor with full access to your colab directories.
Here is a simple answer.
Mount the google drive
from google.colab import drive
drive.mount('/content/drive')
Add the .py file's folder to the Python path:
import sys
import os
py_file_location = "/content/drive/MyDrive/Colab Notebooks"
sys.path.append(os.path.abspath(py_file_location))
It seems necessary to put the .py file's name in double quotes:
!python "file.py"
