Unzipping .7z File on Google Colab - python

I have a 7z archive named 'mathoverflow.net.7z' in my Google Drive, which I have loaded into Colab using the code below. But when I try to unzip it, I get an error. Please suggest a way to fix this.
This is my code:
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
downloaded = drive.CreateFile({'id':'15h0f8p9n6OG1B796q-gbP5oXstCuOcDM'})
downloaded.GetContentFile('mathoverflow.net.7z')
Up to this point it works fine. But when I run the following, I get an error:
!unzip mathoverflow.net.7z
Archive:  mathoverflow.net.7z
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of mathoverflow.net.7z or
        mathoverflow.net.7z.zip, and cannot find mathoverflow.net.7z.ZIP, period.

You can use 7z instead; it is already pre-installed in Colab:
!7z e mathoverflow.net.7z
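If you would rather call 7z from Python than from a `!` line, here is a minimal sketch. It assumes the `7z` binary is on PATH (as the answer above says it is on Colab); the helper names `build_7z_command` and `extract_7z` and the output folder name are made up for illustration. It uses `x` rather than `e`, since `e` puts every file into one folder while `x` keeps the folder structure stored in the archive.

```python
import shutil
import subprocess
from pathlib import Path


def build_7z_command(archive, dest):
    """Return the argument list for extracting `archive` into `dest` with 7z."""
    # x        -> extract with full paths (e would flatten everything)
    # -o<dir>  -> output directory, written without a space after -o
    # -y       -> answer "yes" to overwrite prompts
    return ["7z", "x", str(archive), f"-o{dest}", "-y"]


def extract_7z(archive, dest="extracted"):
    """Run 7z if it is available; raise a clear error otherwise."""
    if shutil.which("7z") is None:
        raise RuntimeError("7z was not found on PATH")
    Path(dest).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_7z_command(archive, dest), check=True)


# Only prints the command; call extract_7z(...) to actually run it.
print(build_7z_command("mathoverflow.net.7z", "mathoverflow"))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces, such as "My Drive".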

!pip install pyunpack
!pip install patool
from pyunpack import Archive
Archive('file_name.7z').extractall('path/to/')

unzip will not work on a .7z file; you need a different tool: https://www.simplified.guide/linux/extract-7z-file
I don't know whether you have installation privileges on Colab, so you might have to do it on your own machine.
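To see why unzip refuses the file, you can look at its first few bytes: the two formats start with different signatures. A small sketch (the function names are my own, not from any library):

```python
import io
import zipfile

SEVEN_ZIP_MAGIC = b"7z\xbc\xaf\x27\x1c"  # first six bytes of a .7z file
ZIP_MAGIC = b"PK\x03\x04"                # local file header of a regular zip


def sniff_archive(first_bytes):
    """Guess the archive type from the leading bytes of a file."""
    if first_bytes.startswith(SEVEN_ZIP_MAGIC):
        return "7z"
    if first_bytes.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def sniff_file(path):
    """Read the first six bytes of `path` and classify them."""
    with open(path, "rb") as handle:
        return sniff_archive(handle.read(6))


# Demo on data built in memory, so no real archive is needed.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "hi")
print(sniff_archive(buffer.getvalue()[:6]))
print(sniff_archive(SEVEN_ZIP_MAGIC + b"\x00\x04"))
```

In the notebook you would call `sniff_file('mathoverflow.net.7z')` and pick 7z or unzip accordingly.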

Related

Failed to start localwebserver with Pydrive to Google Drive in Kaggle

I'm following this tutorial from the PyDrive documentation to try to connect my Kaggle Notebook to Google Drive so I can upload files to it.
I have created both client_secrets.json and credentials.json, along with a settings.yaml that contains the following:
client_config_backend: file
client_config_file: client_secrets.json
save_credentials: True
save_credentials_backend: file
save_credentials_file: credentials.json
get_refresh_token: True
oauth_scope:
- https://www.googleapis.com/auth/drive.file
- https://www.googleapis.com/auth/drive.install
I try to authenticate using the following code:
import os
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
gauth = GoogleAuth()
gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)
But whenever I run this code, I either get this [screenshot not preserved]
or it gives me a [screenshot not preserved]
that allows me to authenticate, but after selecting my account and the permissions I want to grant, I get this [screenshot not preserved]
and I don't know how to solve it.

How to run a Python script in a '.py' file from a Google Colab notebook?

%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) {
return false;
}
%run rl_base.py
Running this gives an error saying the file rl_base.py was not found. I have uploaded the file to Google Drive, in the same folder as the .ipynb notebook that contains the above code.
If the test.py file is in the corresponding Drive folder (as in the attached image), then you can run it with the following command:
!python gdrive/My\ Drive/Colab\ Notebooks/object_detection_demo-master/test.py
Additional Info:
If you just want to run !python test.py, then change directory first with the following command:
%cd gdrive/My\ Drive/Colab\ Notebooks/object_detection_demo-master/
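If you are not sure where the script actually ended up, a quick search can tell you which path to pass to `%cd` or `!python`. This is only a sketch: `find_script` is a hypothetical helper, and the demo uses a temporary folder standing in for the mounted Drive.

```python
import tempfile
from pathlib import Path


def find_script(name, root="."):
    """Return the resolved paths of all files called `name` under `root`."""
    return sorted(p.resolve() for p in Path(root).rglob(name) if p.is_file())


# Demo: a temporary folder plays the role of the mounted Drive.
with tempfile.TemporaryDirectory() as fake_drive:
    script = Path(fake_drive, "My Drive", "Colab Notebooks", "rl_base.py")
    script.parent.mkdir(parents=True)
    script.write_text("print('hello')\n")
    for match in find_script("rl_base.py", fake_drive):
        print(match)
```

On a real mount, searching the whole Drive can be slow, so it is better to pass a narrower `root` such as the notebook folder.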
When you run your notebook from Google Drive, an instance is created only for the notebook. To make the other files in your Google Drive folder available, you can mount your Google Drive with:
from google.colab import drive
drive.mount('/content/gdrive')
Then copy the file you need into the instance with:
!cp gdrive/My\ Drive/path/to/my/file.py .
And run your script:
!python file.py
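The copy-and-run steps can also be done from Python, which sidesteps the backslash escaping of "My Drive". A rough sketch, with `copy_and_run` being a name I made up and temporary folders standing in for the Drive mount and the Colab working directory:

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def copy_and_run(script_path, workdir):
    """Copy a script into `workdir`, run it with the current interpreter,
    and return whatever it printed to stdout."""
    local_copy = shutil.copy(script_path, workdir)  # returns the new path
    result = subprocess.run(
        [sys.executable, local_copy],
        cwd=workdir,
        capture_output=True,
        text=True,
        check=True,  # raise if the script exits with an error
    )
    return result.stdout


# Demo with throwaway folders instead of a real Drive mount.
with tempfile.TemporaryDirectory() as drive_dir, tempfile.TemporaryDirectory() as work_dir:
    script = Path(drive_dir, "file.py")
    script.write_text("print('ran ok')\n")
    print(copy_and_run(script, work_dir), end="")
```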
You should not upload to gdrive. You should upload it to Colab instead, by calling
from google.colab import files
files.upload()
## 1. Check in which directory you are using the command
!ls
## 2. Navigate to the directory where your python script (file.py) is located using the command
%cd path/to/the/python/file
## 3. Run the python script by using the command
!python file.py
Another way is to use colabcode. You will get full SSH access with the Visual Studio Code editor.
# install colabcode
!pip install colabcode
# import colabcode
from colabcode import ColabCode
# run colabcode with the default options.
ColabCode()
# ColabCode has the following arguments:
# - port: the port you want to run code-server on, default 10000
# - password: password to protect your code server from being accessed by someone else. Note that there is no password by default!
# - mount_drive: True or False to mount your Google Drive
ColabCode(port=10000, password="abhishek", mount_drive=True)
It will give you a link to a Visual Studio Code editor with full access to your Colab directories.
Here is a simple answer along with a screenshot.
Mount Google Drive:
from google.colab import drive
drive.mount('/content/drive')
Add the folder containing the .py file to the path:
import sys
import os
py_file_location = "/content/drive/MyDrive/Colab Notebooks"
sys.path.append(os.path.abspath(py_file_location))
It seems necessary to put the .py file's name in double quotes ("")
!python "file.py"
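One thing worth knowing about the sys.path approach: it changes where `import` looks inside the notebook's own interpreter, whereas `!python` starts a separate process, so for `!python` you still need the full path (or a `%cd` first). Here is a small sketch of the import side, using a temporary folder and a made-up module name instead of a real Drive folder:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_folder(folder, module_name):
    """Temporarily add `folder` to sys.path and import `module_name` from it."""
    sys.path.append(str(folder))
    try:
        importlib.invalidate_caches()  # make sure the new file is noticed
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(str(folder))


# Demo: write a tiny module into a throwaway folder and import it.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "colab_helper_demo.py").write_text("VALUE = 42\n")
    helper = import_from_folder(folder, "colab_helper_demo")
    print(helper.VALUE)
```

With the mounted Drive, `folder` would be something like the `py_file_location` from the answer above.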

Extract Google Drive zip from Google colab notebook

I already have a zip of a dataset (2K images) on Google Drive. I have to use it in an ML training algorithm.
The code below extracts the content in string format:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
import io
import zipfile
# Authenticate and create the PyDrive client.
# This only needs to be done once per notebook.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# Download a file based on its file ID.
#
# A file ID looks like: laggVyWshwcyP6kEI-y_W3P8D26sz
file_id = '1T80o3Jh3tHPO7hI5FBxcX-jFnxEuUE9K' #-- Updated File ID for my zip
downloaded = drive.CreateFile({'id': file_id})
#print('Downloaded content "{}"'.format(downloaded.GetContentString(encoding='cp862')))
But I have to extract it and store it in a separate directory, as that would make the dataset easier to process (and to understand).
I tried to extract it further, but I get a "Not a zipfile" error:
dataset = io.BytesIO(downloaded.encode('cp862'))
zip_ref = zipfile.ZipFile(dataset, "r")
zip_ref.extractall()
zip_ref.close()
Google Drive Dataset
Note: Dataset is just for reference, I have already downloaded this zip to my google drive, and I'm referring to file in my drive only.
You can simply use this
!unzip file_location
To unzip a file to a directory:
!unzip path_to_file.zip -d path_to_directory
To extract Google Drive zip from a Google colab notebook:
import zipfile
from google.colab import drive
drive.mount('/content/drive/')
zip_ref = zipfile.ZipFile("/content/drive/My Drive/ML/DataSet.zip", 'r')
zip_ref.extractall("/tmp")
zip_ref.close()
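A slightly more defensive variant of the snippet above, as a sketch: it checks that the file really is a zip before opening it and uses a `with` block so the archive is closed even if extraction fails. The function name `extract_zip` is my own, and the demo builds a tiny zip in a temporary folder rather than touching Drive.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_zip(zip_path, dest):
    """Extract `zip_path` into `dest` and return the names of its members."""
    if not zipfile.is_zipfile(zip_path):
        raise ValueError(f"{zip_path} is not a zip archive")
    Path(dest).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:  # closed automatically
        archive.extractall(dest)
        return archive.namelist()


# Demo with a small archive created on the fly.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = Path(tmp, "DataSet.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("images/cat.txt", "placeholder")
    print(extract_zip(zip_path, Path(tmp, "out")))
```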
The Colab research team has a notebook to help you out.
Still, in short, if you are dealing with a zip file (for me it is mostly thousands of images) and you want to store the contents in a folder within Drive, then do this:
!unzip -u "/content/drive/My Drive/folder/example.zip" -d "/content/drive/My Drive/folder/NewFolder"
The -u flag means update: it extracts only files that are new, or newer than what already exists at the destination. This matters if you suddenly lose the connection or the hardware switches off, because a rerun does not redo everything.
The -d flag names the directory in which the extracted files are stored; it is created if needed.
Of course, before doing this you need to mount your drive:
from google.colab import drive
drive.mount('/content/drive')
I hope this helps! Cheers!!
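If you want the same "pick up where it stopped" behaviour from Python, here is a rough approximation of the -u flag described above. Note the assumption: real `unzip -u` compares timestamps, while this sketch only skips members whose destination file already exists with the expected size. `extract_missing` is a name I made up.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_missing(zip_path, dest):
    """Extract only members that are missing or have the wrong size.

    Returns the list of member names that were actually extracted.
    """
    dest = Path(dest)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            target = dest / info.filename
            if info.is_dir():
                target.mkdir(parents=True, exist_ok=True)
                continue
            if target.is_file() and target.stat().st_size == info.file_size:
                continue  # already extracted on an earlier run
            archive.extract(info, dest)
            extracted.append(info.filename)
    return extracted


# Demo: the second call finds nothing left to do.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = Path(tmp, "example.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("img1.txt", "one")
        archive.writestr("img2.txt", "two")
    print(extract_missing(zip_path, Path(tmp, "NewFolder")))
    print(extract_missing(zip_path, Path(tmp, "NewFolder")))
```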
First, install unzip on colab:
!apt install unzip
then use unzip to extract your files:
!unzip source.zip -d destination_dir
First create a new directory:
!mkdir file_destination
Now it's time to fill the directory with the unzipped files:
!unzip file_location -d file_destination
Mount GDrive:
from google.colab import drive
drive.mount('/content/gdrive')
Open the link -> copy authorization code -> paste that into the prompt and press "Enter"
Check GDrive access:
!ls "/content/gdrive/My Drive"
Unzip (q stands for "quiet") file from GDrive:
!unzip -q "/content/gdrive/My Drive/dataset.zip"
In Python, connect to Drive:
from google.colab import drive
drive.mount('/content/drive')
Check the current directory:
!ls
!pwd
Then unzip:
!unzip drive/"My Drive"/images.zip
After mounting on drive, use shutil.unpack_archive. It works with almost all archive formats (e.g., “zip”, “tar”, “gztar”, “bztar”, “xztar”) and it's simple:
import shutil
shutil.unpack_archive("filename", "path_to_extract")
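A quick way to check which formats your Python's shutil can actually unpack is `shutil.get_unpack_formats()`. As far as I know, 7z is not one of the built-in formats, so for .7z files you would still need one of the tools mentioned in the other answers. A sketch with a round trip through a temporary zip (the helper name is my own):

```python
import shutil
import tempfile
from pathlib import Path


def supported_unpack_formats():
    """Names of the archive formats shutil.unpack_archive can handle here."""
    return sorted(name for name, _extensions, _description in shutil.get_unpack_formats())


print(supported_unpack_formats())

# Round trip: pack a folder into a zip, then unpack it somewhere else.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp, "src")
    source.mkdir()
    (source / "a.txt").write_text("hello")
    archive_path = shutil.make_archive(str(Path(tmp, "bundle")), "zip", root_dir=str(source))
    target = Path(tmp, "out")
    shutil.unpack_archive(archive_path, str(target))
    print((target / "a.txt").read_text())
```

The format is guessed from the file extension unless you pass it explicitly as the third argument.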
Please use this command in Google Colab.
Give unzip the file you want to extract, followed by -d and the destination folder:
!unzip "drive/My Drive/Project/yourfilename.zip" -d "drive/My Drive/Project/yourfolder"
Use GetContentFile() instead of GetContentString(). It will save the file to disk instead of returning its content as a string.
downloaded.GetContentFile('images.zip')
Then you can extract it later with unzip.
SIMPLE WAY TO CONNECT
1) You'll have to verify authentication
from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
2) To mount Google Drive with FUSE
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse
3) To verify credentials
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}
4) Create a drive name to use in Colab ('gdrive') and check that it's working
!mkdir gdrive
!google-drive-ocamlfuse gdrive
!ls gdrive
%cd gdrive
This is what worked for me.
!apt install unzip
Then I used this code to unzip the file
!unzip /content/file.zip -d /content/
Without installing unzip on Colab first you'll always receive error messages.
Try this:
!unpack file.zip
If it's not working, or the file is a 7z archive, try the following:
!apt-get install p7zip-full
!p7zip -d file_name.tar.7z
!tar -xvf file_name.tar
Or
!pip install pyunpack
!pip install patool
from pyunpack import Archive
Archive('file_name.tar.7z').extractall('path/to/')
!tar -xvf file_name.tar
This assumes you have already mounted your Google Drive in Colab. If you just want to read a zip file that contains a single .csv file, you can pass it straight to pandas' read_csv function (with pandas imported as pd):
pd.read_csv('/content/drive/My Drive/folder/example.zip')
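pandas infers the compression from the .zip extension, and the zip has to contain exactly one data file for this to work. If you do not want to depend on pandas, the same idea can be sketched with the standard library only; this is a swapped-in alternative, not what read_csv does internally, and `read_csv_from_zip` is a hypothetical helper.

```python
import csv
import io
import zipfile


def read_csv_from_zip(zip_source, member=None):
    """Read one CSV file out of a zip without extracting it to disk.

    `zip_source` may be a path or a file-like object. If `member` is not
    given, the zip must contain exactly one .csv file.
    """
    with zipfile.ZipFile(zip_source) as archive:
        if member is None:
            csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
            if len(csv_names) != 1:
                raise ValueError(f"expected exactly one CSV, found {csv_names}")
            member = csv_names[0]
        with archive.open(member) as raw:
            # Assumes the CSV is UTF-8 encoded.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.reader(text))


# Demo on an in-memory zip.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("example.csv", "name,count\ncat,3\n")
buffer.seek(0)
print(read_csv_from_zip(buffer))
```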
In my opinion, you should first go to the relevant path, for example:
from google.colab import drive
drive.mount('/content/drive/')
%cd drive/MyDrive/f/
Then:
!apt install unzip
!unzip zip_folder.zip -d unzip_folder

Use PyDrive on Heroku

I'm trying to use PyDrive on Heroku.
My code is as follows.
import os
from pydrive.auth import GoogleAuth
GoogleAuth.DEFAULT_SETTINGS['client_config_file'] = os.path.join(os.path.dirname(__file__), 'client_secrets.json')
However, the Heroku console returned "No such file or directory: '/app/client_secrets.json'".
Using the heroku run bash command, I confirmed that '/app/client_secrets.json' does exist.
How do I fix this?
You should do this first
gauth = GoogleAuth()
then
GoogleAuth.DEFAULT_SETTINGS['client_config_file'] = os.path.join(os.path.dirname(__file__), 'client_secrets.json')
gauth.LoadCredentials()
Hope this is helpful to you.
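While debugging this kind of "No such file" error, it can help to resolve the path yourself and fail with a message that also shows the current working directory. A minimal sketch; `resolve_config` is a made-up helper, not part of PyDrive, and the demo uses a temporary folder in place of /app.

```python
import tempfile
from pathlib import Path


def resolve_config(filename, base_dir):
    """Return the absolute path of `filename` inside `base_dir`,
    or raise with enough detail to see where the lookup went wrong."""
    path = Path(base_dir).resolve() / filename
    if not path.is_file():
        raise FileNotFoundError(f"{path} does not exist (cwd is {Path.cwd()})")
    return str(path)


# Demo with a throwaway folder standing in for the app directory.
with tempfile.TemporaryDirectory() as app_dir:
    Path(app_dir, "client_secrets.json").write_text("{}")
    print(resolve_config("client_secrets.json", app_dir))
```

In the app, the result would be assigned to GoogleAuth.DEFAULT_SETTINGS['client_config_file'], with os.path.dirname(__file__) as the base_dir.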

Google Drive OAuth2

I'm trying to sync between Python and Google Drive with the following details:
Authorized JavaScript origins: http://localhost:8080
Authorized redirect URIs: http://localhost:8080/
I copied the json file to the directory and ran this code:
from pydrive.auth import GoogleAuth
gauth = GoogleAuth()
gauth.LocalWebserverAuth()
and I got this error:
from oauth2client.locked_file import LockedFile
ImportError: No module named locked_file
Can you please help me?
Had the same issue.
It looks like a change in the newest version of oauth2client, v2.0.0, broke compatibility with the google-api-python-client module. This has now been fixed (https://github.com/adrian-the-git/google-api-python-client/commit/2122d3c9b1aece94b64f6b85c6707a42cca8b093), so upgrading google-api-python-client restores compatibility and makes everything work again:
$ pip install --upgrade git+https://github.com/google/google-api-python-client
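To check whether your installed oauth2client still ships the module from the traceback, you can ask the import system without triggering the ImportError. A small sketch (the function name is mine); the demo uses standard-library modules so it runs anywhere.

```python
import importlib.util


def module_available(dotted_name):
    """Return True if `dotted_name` can be found by the import system."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package is missing or is not a package.
        return False


print(module_available("json.decoder"))
print(module_available("json.no_such_submodule"))
```

In your environment, `module_available("oauth2client.locked_file")` tells you whether the import in the traceback would succeed.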
