How to remove a file from the Drive trash in Colab - Python

I use a google drive in colab. Basically I do the following:
from google.colab import drive
drive.mount('/content/gdrive')
After this I can use os functions (listdir, remove) to manipulate the files. The problem is that after removing a file with os.remove it is not actually removed but goes to the trash. I would like to remove a file completely, but so far I have not found how to do this.
I tried to locate the file in the trash, but os.listdir('/content/gdrive/.Trash') shows nothing, although I can see the files there in the web interface.
How can I remove a file from the trash?

It is straightforward to perform this action inside Google Colab by using the pydrive module. In order to delete all files from your Google Drive's Trash folder, run the following lines in your Google Colab notebook:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
my_drive = GoogleDrive(gauth)
After entering the authentication code and creating a valid instance of the GoogleDrive class, write:
for a_file in my_drive.ListFile({'q': "trashed = true"}).GetList():
    # print the name of the file being deleted.
    print(f'the file "{a_file["title"]}" is about to get deleted permanently.')
    # delete the file permanently.
    a_file.Delete()
If you'd like to delete a specific file in Trash, you need to change the last chunk of code. Let's assume you have a file named weights-improvement-01-10.5336.hdf5 in your Trash:
for a_file in my_drive.ListFile({'q': "title = 'weights-improvement-01-10.5336.hdf5' and trashed=true"}).GetList():
    # print the name of the file being deleted.
    print(f'the file "{a_file["title"]}" is about to get deleted permanently.')
    # delete the file permanently.
    a_file.Delete()
If you want to make other, perhaps more complex queries, e.g. delete a bunch of files whose names share the expression weights-improvement-, or files that were all modified before a given date (see the sketch below), visit:
1) Get all files which match the query,
2) Search for files and folders.
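For instance, a hedged sketch of such a query, reusing the my_drive instance from above (title contains and trashed are Drive v2 query terms, which is the syntax PyDrive uses; the name prefix is just the example from this answer):
query = "title contains 'weights-improvement-' and trashed = true"
for a_file in my_drive.ListFile({'q': query}).GetList():
    # permanently delete every trashed file whose name starts with the prefix
    print(f'the file "{a_file["title"]}" is about to get deleted permanently.')
    a_file.Delete()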

Jess's answer of using the Google Drive API to clear the trash isn't a good approach, because you might have other data in the bin that you still want.
Files move to the bin upon deletion, so this neat trick truncates the file to zero bytes before deleting it (this cannot be undone!):
import os
delete_filepath = 'drive/My Drive/Colab Notebooks/somefolder/examplefile.png'
open(delete_filepath, 'w').close()  # overwrite and make the file blank instead - ref: https://stackoverflow.com/a/4914288/3553367
os.remove(delete_filepath)  # deleting the now-blank file still moves it to the bin, but it takes up no space
Also answered at: https://stackoverflow.com/a/60729089/3553367

If you're looking for code to remove files from the trash, you can check this SO post answered by Tanaike - Empty Google Drive Trash:
import httplib2
from googleapiclient import discovery

def main():
    # get_credentials() comes from the Drive API Python quickstart
    credentials = get_credentials()
    http = credentials.authorize(httplib2.Http())
    service = discovery.build('drive', 'v3', http=http)
    service.files().emptyTrash().execute()
or use these methods using Pydrive:
file.Trash() - Move file to trash
file.Untrash() - Move file out of trash
file.Delete() - Permanently delete the file
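A minimal sketch of those three methods, assuming a GoogleDrive instance (e.g. my_drive from the first answer) and a placeholder file_id for the file you want to manage:
a_file = my_drive.CreateFile({'id': file_id})  # bind to an existing file by its id
a_file.Trash()    # move the file to trash
a_file.Untrash()  # move the file out of trash
a_file.Delete()   # permanently delete the file, bypassing the trash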

We can run this code in Google Colab successfully:
from google.colab import drive
drive.mount('/root/gdrive', force_remount=True)
from google.colab import auth
auth.authenticate_user()
from googleapiclient.discovery import build
drive_service = build('drive', 'v3')
drive_service.files().emptyTrash().execute()
but if there are many files in the Google Drive trash bin, we still need to wait over 20 minutes, sadly.
The rclone command has an immediate effect:
!curl https://rclone.org/install.sh | bash
!rclone config # to set google drive client id etc
#delete all files in trash and don't put them in trash again:
!rclone --config /root/sing/rclone.conf delete gdrive: --drive-trashed-only --drive-use-trash=false
!rclone --config /root/sing/rclone.conf ls gdrive: --drive-trashed-only --fast-list
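If you only want to empty the trash, rclone also has a dedicated cleanup command that does it in one call (a sketch, assuming the same config path as above):
!rclone --config /root/sing/rclone.conf cleanup gdrive: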

The easiest way to delete files from Google Drive in Colab is the rm command.
You can delete a specific file using rm:
!rm /content/drive/MyDrive/somefolder/examplefile.png
Or, you can delete a whole folder:
!rm -r /content/drive/MyDrive/somefolder
Or, you can use wildcards for deleting files with specific patterns:
!rm /content/drive/MyDrive/somefolder/*.png
However, if you want to delete files from the trash, you have to use the Google Drive API (see the sketch below).
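For example, a hedged sketch with the Drive v3 API (reusing a drive_service client built as in the earlier answer; 'examplefile.png' is a hypothetical file name) that permanently deletes one trashed file instead of emptying the whole trash:
results = drive_service.files().list(
    q="name = 'examplefile.png' and trashed = true",
    fields="files(id, name)").execute()
for f in results.get('files', []):
    # files().delete permanently removes the file, skipping the trash
    drive_service.files().delete(fileId=f['id']).execute()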

Related

I can see only files created with my script pydrive

As the title says I have an issue with pydrive. I ran the code given in the pydrive quickstart (https://googleworkspace.github.io/PyDrive/docs/build/html/quickstart.html) and I created a settings and credentials file to avoid entering my credentials all the time.
But when I run this code:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
# Rename the downloaded JSON file to client_secrets.json
# The client_secrets.json file needs to be in the same directory as the script.
gauth = GoogleAuth()
drive = GoogleDrive(gauth)
# List files in Google Drive
fileList = drive.ListFile().GetList()
for drive_file in fileList:
    print('title: %s, id: %s' % (drive_file['title'], drive_file['id']))
I can only see the files created with my script. For example if I add this before the list file:
folder = drive.ListFile({'q': "title = 'Python_test' and trashed=false"}).GetList()[0] # get the folder we just created
file = drive.CreateFile({'title': "test.txt", 'parents': [{'id': folder['id']}]})
file.Upload()
I only see the folder and the file I just created... And if I manually add a file to my Drive (in the browser, for example), it doesn't appear.
Anyone got an idea of what's going on?
I just found the problem: it was in my settings.yaml file. I had added only this oauth_scope:
oauth_scope:
- https://www.googleapis.com/auth/drive.file
but this gives access only to the files created by the app. To correct that, I needed to remove the .file suffix, like this:
oauth_scope:
- https://www.googleapis.com/auth/drive
If you want more details about the different scopes, check this link:
https://developers.google.com/identity/protocols/oauth2/scopes
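For reference, a minimal settings.yaml sketch with the full drive scope (field names as in the PyDrive documentation; the file names are the usual defaults, adjust to your setup):
client_config_backend: file
client_config_file: client_secrets.json
save_credentials: True
save_credentials_backend: file
save_credentials_file: credentials.json
get_refresh_token: True
oauth_scope:
  - https://www.googleapis.com/auth/drive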

How to work with Google Colab efficiently?

I am trying to train a neural network on Colab using a GPU there. I am now wondering if I am on the right path and if all the steps I am doing are necessary, because the process I am following does not appear very efficient to me.
# Install the PyDrive wrapper & import libraries.
# This only needs to be done once per notebook.
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
# This only needs to be done once per notebook.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
import os
# choose a local (colab) directory to store the data.
local_root_path = os.path.expanduser("~/data")
try:
    os.makedirs(local_root_path)
except: pass

def ListFolder(google_drive_id, destination):
    file_list = drive.ListFile({'q': "'%s' in parents and trashed=false" % google_drive_id}).GetList()
    counter = 0
    for f in file_list:
        # if it is a directory, create the directory and fetch the files inside it
        if f['mimeType'] == 'application/vnd.google-apps.folder':
            folder_path = os.path.join(destination, f['title'])
            os.makedirs(folder_path)
            print('creating directory {}'.format(folder_path))
            ListFolder(f['id'], folder_path)
        else:
            fname = os.path.join(destination, f['title'])
            f_ = drive.CreateFile({'id': f['id']})
            f_.GetContentFile(fname)
            counter += 1
    print('{} files were downloaded to {}'.format(counter, destination))

ListFolder("1s1Ks_Gf_cW-F-RwXFjBu96svbmqiXB0o", local_root_path)
These commands connect the notebook in Colab with my Google Drive and store the data in Colab. Because I have a lot of images (more than 180k), storing the data in Colab takes very, very long, and sometimes the connection breaks. I am now wondering: is it necessary to copy all the data from my Google Drive to Colab?
If no, what do I have to do instead to work with the data from Google Drive?
If yes, is there a way to do this more efficiently?
Or is there maybe a completely different way I should work with Colab?
You can access files on your Google Drive directly, without copying them into the notebook environment.
Execute this code in one cell:
from google.colab import drive
drive.mount('/content/gdrive')
And try:
!ls /content/gdrive
Now you can copy your files from/to the /content/gdrive directory, and they will appear in your Google Drive.
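For a dataset of 180k small images, reading each file individually over the mounted drive is slow. A common workaround (a sketch, assuming you packed the dataset into a single hypothetical archive dataset.zip on your Drive) is to copy one big file to the Colab VM and unzip it locally:
!cp "/content/gdrive/My Drive/dataset.zip" /content/
!unzip -q /content/dataset.zip -d /content/data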

Why does Google Drive API corrupt my file extension when uploading files through Python?

I have created a Python script that pulls files with a .xlsx format from a folder on my computer and uploads the file to a specific folder in my Google Drive. This is using the pydrive package in Python. The script runs with no issues, and the files are uploaded as expected. However, for some reason, when the uploaded Google Drive file is downloaded and re-opened, Excel gives the following error message:
Excel cannot open the file...because the file format or file extension is not valid. Verify that the file has not been corrupted and that the file extension matches the format of the file.
When I open the file directly on my computer, it opens fine with no issues. When I manually drag/upload the file into the Google Drive folder, and then re-download the file, it opens with no problem. The transformation seems to be coming from my Python script (see below).
Can anyone provide any help here? I have been trying different things and I keep getting the same result. I can provide more information if necessary.
Updated to add full Python Script:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
import glob,os,shutil
import datetime
os.chdir(os.path.dirname(os.path.abspath(__file__)))
gauth = GoogleAuth()
#gauth.LocalWebserverAuth()
# Try to load saved client credentials
gauth.LoadCredentialsFile("mycreds.txt")
if gauth.credentials is None:
    # Authenticate if they're not there
    gauth.LocalWebserverAuth()
elif gauth.access_token_expired:
    # Refresh them if expired
    gauth.Refresh()
else:
    # Initialize the saved creds
    gauth.Authorize()
# Save the current credentials to a file
gauth.SaveCredentialsFile("mycreds.txt")
drive = GoogleDrive(gauth)
#Google Drive Folder ID
fid = '[FOLDER ID PLACEHOLDER]'
#Check to see if today's folder is created on PC
date = datetime.date.today()
today = date.strftime('%Y-%m-%d')
starting_folder = '[FOLDER PLACEHOLDER]'
if not os.path.exists(starting_folder + "/" + today):
    os.makedirs(starting_folder + "/" + today)
destination_folder = starting_folder + "/" + today
#Change directory to the folder where the bulk bid tools are stored
os.chdir("[FOLDER PLACEHOLDER]")
for file in glob.glob("*.xlsx"):
    try:
        print(file)
        with open(file, "r") as f:
            fn = os.path.basename(f.name)
            file_drive = drive.CreateFile({'parents': [{'kind': "drive#parentReference", 'id': fid}], 'title': fn})
            file_drive.Upload()
        print("The file: " + fn + " has been uploaded.")
        shutil.move(starting_folder + "/" + fn, destination_folder + "/" + fn)
    except:
        pass
print("All files have been uploaded")
You are not actually passing the file's contents to the request. Meaning, you are not really uploading the actual "bytes" of the file to Drive, just creating an empty Drive file with the same name in the chosen folder.
If you look at the documentation for PyDrive, you can see that after calling CreateFile they use SetContentFile.
Copied from the documentation you can see an example like so:
file2 = drive.CreateFile()
# This line needs to be added to your code, with the name of the file on your computer:
file2.SetContentFile('hello.png')
file2.Upload()
# Also check the mimeType afterwards to verify the file was uploaded correctly
print('Created file %s with mimeType %s' % (file2['title'], file2['mimeType']))
# Created file hello.png with mimeType image/png
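Applied to the script in the question, the upload loop could look like this (a hedged sketch; fid is the folder-id placeholder from the original script, and SetContentFile takes the path of the local file):
for file in glob.glob("*.xlsx"):
    fn = os.path.basename(file)
    file_drive = drive.CreateFile({'parents': [{'kind': "drive#parentReference", 'id': fid}], 'title': fn})
    file_drive.SetContentFile(file)  # attach the actual file contents before uploading
    file_drive.Upload()
    print("The file: " + fn + " has been uploaded.")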
Also, from the comments you say you are still running Python 2 code. Take into consideration that Python 2 is "dead" and there will be no more security updates, development, or support. You should really consider switching, as a lot of packages and modules are also dropping (or will start to drop) Python 2 support.
More information on this issue in Sunsetting Python 2.

How to upload dataset in google colaboratory?

I need to upload a dataset of images to Google Colaboratory. It has subfolders inside it which contain images. Whatever I found on the net was for a single file.
from google.colab import files
uploaded = files.upload()
Is there any way to do it?
For uploading data to Colab, you have four methods.
Method 1
You can directly upload a file or directory through the Colab UI.
The data is saved on the Colab local machine. In my experience, there are three characteristics:
1) the upload speed is good.
2) it will keep the directory structure, but it will not unzip archives automatically. You need to execute this code in a Colab cell:
!mkdir {dir_name}
!unzip {zip_file} -d {dir_name}
3) Most importantly, when Colab crashes, the data will be deleted.
Method 2
Execute the code in Colab cell:
from google.colab import files
uploaded = files.upload()
In my experience, when you run the cell an upload button appears, and while the cell's execution indicator is still running, you choose a file. 1) After execution, the file name will appear in the result panel. 2) Refresh the Colab file browser and you will see the file. 3) Or execute !ls and you should see your file. If not, the file was not uploaded successfully.
Method 3
If your data is from Kaggle, you can use the Kaggle API to download the data to a Colab local directory, as in the sketch below.
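A hedged sketch of that flow (the dataset slug owner/dataset-name is a placeholder; you first need to upload your kaggle.json API token to the Colab VM):
!pip install -q kaggle
!mkdir -p ~/.kaggle && cp kaggle.json ~/.kaggle/ && chmod 600 ~/.kaggle/kaggle.json
!kaggle datasets download -d owner/dataset-name -p /content/data --unzip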
Method 4
Upload the data to Google Drive; you can use 1) the Google Drive web interface or 2) the Drive API (https://developers.google.com/drive/api/v3/quickstart/python). To access Drive data, use the following code in Colab:
from google.colab import drive
drive.mount('/content/drive')
I would recommend uploading data to Google Drive because it is permanent.
You need to copy your dataset into Google Drive. Then obtain the DATA_FOLDER_ID.
The best way to do so is to open the folder in your Google Drive and copy the last part of the address. For example, the folder id for the link
https://drive.google.com/drive/folders/xxxxxxxxxxxxxxxxxxxxxxxx is xxxxxxxxxxxxxxxxxxxxxxxx.
Then you can create local folders and download each file recursively.
DATA_FOLDER_ID = 'xxxxxxxxxxxxxxxxxxxxxxxx'
ROOT_PATH = '~/you_path'
!pip install -U -q PyDrive
import os
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# choose a local (colab) directory to store the data.
local_root_path = os.path.expanduser(ROOT_PATH)
try:
    os.makedirs(local_root_path)
except: pass

def ListFolder(google_drive_id, destination):
    file_list = drive.ListFile({'q': "'%s' in parents and trashed=false" % google_drive_id}).GetList()
    counter = 0
    for f in file_list:
        # if it is a directory, create the directory and fetch the files inside it
        if f['mimeType'] == 'application/vnd.google-apps.folder':
            folder_path = os.path.join(destination, f['title'])
            os.makedirs(folder_path)
            print('creating directory {}'.format(folder_path))
            ListFolder(f['id'], folder_path)
        else:
            fname = os.path.join(destination, f['title'])
            f_ = drive.CreateFile({'id': f['id']})
            f_.GetContentFile(fname)
            counter += 1
    print('{} files were downloaded to {}'.format(counter, destination))

ListFolder(DATA_FOLDER_ID, local_root_path)

Where is dumped file in Google Colab?

When I wrote this code in google colab:
import pickle
x=10;
output = open('data.pkl', 'wb')
pickle.dump(x,output)
x is saved, and in another window in Google Colab I can access this file and read it, but I don't know where the file is. Does anybody know where it is?
It’s in the current directory. You can also download it back to your local machine with
from google.colab import files
files.download('data.pkl')
You can upload it to your Google Drive:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# get the folder id where you want to save your file
folder_id = '...'  # placeholder: the id of the target Drive folder
file = drive.CreateFile({'parents': [{'id': folder_id}]})
file.SetContentFile('data.pkl')
file.Upload()
This code basically fetches data.pkl from the cloud VM and uploads it permanently to your Google Drive under a specific folder.
If you choose not to specify a folder, the file will be uploaded under the root of your Google Drive.
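If you need to find folder_id programmatically, here is a hedged sketch with PyDrive (the folder title 'my_folder' is a placeholder; this is Drive v2 query syntax, which PyDrive uses):
folders = drive.ListFile({'q': "title = 'my_folder' and mimeType = 'application/vnd.google-apps.folder' and trashed = false"}).GetList()
folder_id = folders[0]['id']  # take the id of the first match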
You can save and read the dumped file anywhere in your Google Drive folder.
import gc
import pickle
from google.colab import drive
drive.mount('/content/drive', force_remount=True)
pick_insert = open('drive/My Drive/data.pickle','wb')
pickle.dump(data, pick_insert)
pick_insert.close()
pick_read = open('drive/My Drive/data.pickle','rb')
data = pickle.load(pick_read)
pick_read.close()
The saved dump can then be loaded from the same directory, as below:
from pickle import dump, load
dump(stories, open('review_dataset.pkl', 'wb'))
stories = load(open('review_dataset.pkl', 'rb'))
In my case, I was trying to access the pickle files in a sub-directory (data) under the current directory.
The data directory has 2 pickle files generated in the pre-processing step.
So I tried @korakot's suggestion in the comments, and it worked fine! This is what I did:
# connect your colab with the drive
from google.colab import drive
drive.mount('/content/drive')
# list the directories in the home directory
import os
os.listdir('.')
# move the sub-directory (data) into google-drive
!mv /content/data/ /content/drive/MyDrive/
You can obtain the pkl file using the following statements:
from google.colab import files
files.download("model.pkl")
Not only .pkl: you can retrieve other formats of data as well by changing the extension.
You can save your pkl file by writing this instead:
import pickle
from google.colab import drive
drive.mount('/content/drive')
x=10;
output = open('/content/drive/MyDrive/Colab Notebooks/data.pkl', 'wb')
pickle.dump(x,output)
and open it using this code:
import pickle
from google.colab import drive
drive.mount('/content/drive')
x = pickle.load(open('/content/drive/MyDrive/Colab Notebooks/data.pkl', 'rb'))
it worked for me :)
