google drive api to upload all pdfs to google drive - python

I am using PyDrive to upload PDF files to my Google Drive folder. I want to send all *.pdf files in a local folder at once with this code, but I'm not sure where to go from here. Should I use glob? If so, I would like to see an example, please.
Working code that sends one file to the designated Google Drive folder:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

gauth = GoogleAuth()
gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)

folder_id = 'google_drive_id_goes_here'
f = drive.CreateFile({'title': 'testing_pdf',
                      'mimeType': 'application/pdf',
                      'parents': [{'kind': 'drive#fileLink', 'id': folder_id}]})
f.SetContentFile('/Users/Documents/python/google_drive/testing.pdf')
f.Upload()

You can't upload multiple files at once. Creating a file with the API is a single operation, and PyDrive has no mechanism for uploading more than one file per call.
You're going to have to put this in a loop and upload each file as you go.
import os

directory = 'the/directory/you/want/to/use'
for filename in os.listdir(directory):
    if filename.endswith(".txt"):
        # open each matching file and do something with its contents
        f = open(os.path.join(directory, filename))
        lines = f.read()
        f.close()
        print(lines[10])
    else:
        continue
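Putting that together with your working snippet, a sketch for your case (assuming the PDFs live in /Users/Documents/python/google_drive, as in your single-file example) could look like this:
import glob
import os
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

gauth = GoogleAuth()
gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)

folder_id = 'google_drive_id_goes_here'

# upload every *.pdf in the local folder, one CreateFile/Upload per file
for filepath in glob.glob('/Users/Documents/python/google_drive/*.pdf'):
    title = os.path.basename(filepath)
    f = drive.CreateFile({'title': title,
                          'mimeType': 'application/pdf',
                          'parents': [{'kind': 'drive#fileLink', 'id': folder_id}]})
    f.SetContentFile(filepath)
    f.Upload()
    print('Uploaded {}'.format(title))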

Related

How to download files from a Google Drive folder?

I have a script that gets a list of files from Google Drive:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

gauth = GoogleAuth()
gauth.LoadCredentialsFile("mycreds.txt")
if gauth.credentials is None:
    gauth.LocalWebserverAuth()
elif gauth.access_token_expired:
    gauth.Refresh()
else:
    gauth.Authorize()
gauth.SaveCredentialsFile("mycreds.txt")
drive = GoogleDrive(gauth)
folder = "1CNtWRS005fkX6vlZowZiXYITNGifXPKS"
file_list = drive.ListFile({'q': f"'{folder}' in parents"}).GetList()
for file in file_list:
    print(file['title'])
-> 1.txt
It only lists files from my own Drive, but I need the script to list the files in a folder that I have access to under "Shared with me". I have the folder ID, but if I substitute it into the folder variable, nothing happens.
I think gdown could help you.
pip install gdown
Then could try something like this:
import gdown
id = "folderId..."
gdown.download_folder(id=id, quiet=True, use_cookies=False)
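If you only have the sharing link rather than the bare ID, download_folder can also take the full URL and a target directory (a small sketch; the output directory name is just an example):
import gdown

# the shared folder link (placeholder) and a local target directory (example name)
url = "https://drive.google.com/drive/folders/folderId..."
gdown.download_folder(url=url, output="drive_folder", quiet=False, use_cookies=False)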

I can only see files created with my PyDrive script

As the title says, I have an issue with PyDrive. I ran the code given in the PyDrive quickstart (https://googleworkspace.github.io/PyDrive/docs/build/html/quickstart.html), and I created a settings and credentials file to avoid entering my credentials all the time.
But when I run this code:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
# Rename the downloaded JSON file to client_secrets.json
# The client_secrets.json file needs to be in the same directory as the script.
gauth = GoogleAuth()
drive = GoogleDrive(gauth)
# List files in Google Drive
fileList = drive.ListFile().GetList()
for drive_file in fileList:
    print('title: %s, id: %s' % (drive_file['title'], drive_file['id']))
I can only see the files created with my script. For example, if I add this before the file listing:
folder = drive.ListFile({'q': "title = 'Python_test' and trashed=false"}).GetList()[0] # get the folder we just created
file = drive.CreateFile({'title': "test.txt", 'parents': [{'id': folder['id']}]})
file.Upload()
I only see the folder and the file I just created... And if I manually add a file to my Drive (in my browser, for example), it doesn't appear.
Anyone got an idea of what's going on?
I just found the problem: it was in my settings.yaml file. I had only added this oauth_scope:
oauth_scope:
- https://www.googleapis.com/auth/drive.file
but this gives access only to the files created by the app. To fix that, I needed to remove the .file suffix, like this:
oauth_scope:
- https://www.googleapis.com/auth/drive
If you want more details about the different scopes, check this link:
https://developers.google.com/identity/protocols/oauth2/scopes
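One gotcha if you let PyDrive cache credentials (save_credentials in settings.yaml): the saved token was issued for the old drive.file scope, so delete that file and authenticate again before the broader scope takes effect. A minimal sketch, assuming the cache file is called saved_credentials.json (use whatever filename your settings.yaml points to):
import os
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

# remove the token cached with the old, narrower scope (filename is an assumption)
if os.path.exists('saved_credentials.json'):
    os.remove('saved_credentials.json')

gauth = GoogleAuth()          # reads settings.yaml with the full drive scope
gauth.LocalWebserverAuth()    # re-run the consent flow
drive = GoogleDrive(gauth)

# files added outside the script should now be listed as well
for f in drive.ListFile({'q': 'trashed=false'}).GetList():
    print(f['title'], f['id'])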

PyDrive upload speed is too slow

I'm using PyDrive to upload files from my RPi to a specific folder in my Google Drive. It works, but the speed is terribly slow. For a .npy file (binary numpy file) that is only 40 kB, the upload takes around 2 seconds. When I try uploading a different file (.pptx) that is 2 MB, it takes around 5 seconds. I also tried this on my Mac, and the upload speed is the same.
Is there a better way to do this? I need each upload to take less than a second, since I'm collecting data every second. Here is the code:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
import os
import time
credentials = '/***/pydrive_credentials.txt'
directory = '/***/remote_dir'
gauth = GoogleAuth()
gauth.LoadCredentialsFile(credentials)
# gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)
# get id of designated folder in Google Drive
folder = drive.ListFile({'q': "title = 'sample pydrive folder' and trashed=false"}).GetList()[0]
for filename in os.listdir(directory):
    f = drive.CreateFile({'title': filename, 'parents': [{'id': folder['id']}]})
    # f = drive.CreateFile()
    filepath = os.path.join(directory, filename)
    f.SetContentFile(filepath)
    start = time.time()
    f.Upload()
    end = time.time()
    print(end - start)
    # delete file after upload
    # os.remove(filepath)
    # to ensure no memory leakage
    f = None
    filepath = None
    print("Uploaded: {}".format(filename))

How to upload a dataset in Google Colaboratory?

I need to upload a dataset of images in Google Colaboratory. It has subfolders inside it which contain the images. Whatever I found on the net was for a single file:
from google.colab import files
uploaded = files.upload()
Is there any way to do it?
For uploading data to Colab, you have several methods.
Method 1
You can directly upload a file or directory in the Colab UI.
The data is saved on the Colab local machine. In my experience, there are three things to note:
1) the upload speed is good.
2) it keeps the directory structure, but a zip will not be unzipped automatically. You need to execute this code in a Colab cell:
!mkdir {dir_name}
!unzip {zip_file} -d {dir_name}
3) Most importantly, when Colab crashes, the data will be deleted.
Method 2
Execute the code in a Colab cell:
from google.colab import files
uploaded = files.upload()
When you run the cell, an upload button appears; while the cell is still executing, choose a file. 1) After execution, the file name will appear in the result panel. 2) Refresh the Colab file browser and you will see the file. 3) Or execute !ls and you should see your file. If not, the file was not uploaded successfully.
Method 3
If your data is from Kaggle, you can use the Kaggle API to download the data to the Colab local directory.
Method 4
Upload the data to Google Drive; you can use 1) the Google Drive web interface or 2) the Drive API (https://developers.google.com/drive/api/v3/quickstart/python). To access the Drive data, use the following code in Colab:
from google.colab import drive
drive.mount('/content/drive')
I would recommend uploading data to Google Drive because it is permanent.
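Once Drive is mounted, the dataset is just a normal directory under /content/drive; a small sketch for walking image subfolders (the folder name my_images is only an example):
import os
from google.colab import drive

drive.mount('/content/drive')

dataset_dir = '/content/drive/My Drive/my_images'   # example path to your uploaded dataset
for root, dirs, files in os.walk(dataset_dir):
    for name in files:
        if name.lower().endswith(('.png', '.jpg', '.jpeg')):
            print(os.path.join(root, name))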
You need to copy your dataset into Google Drive. Then obtain the DATA_FOLDER_ID.
The best way to do so is to open the folder in your Google Drive and copy the last part of the URL. For example, the folder ID for the link
https://drive.google.com/drive/folders/xxxxxxxxxxxxxxxxxxxxxxxx is xxxxxxxxxxxxxxxxxxxxxxxx.
Then you can create local folders and download each file recursively.
DATA_FOLDER_ID = 'xxxxxxxxxxxxxxxxxxxxxxxx'
ROOT_PATH = '~/you_path'

!pip install -U -q PyDrive

import os
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# choose a local (colab) directory to store the data.
local_root_path = os.path.expanduser(ROOT_PATH)
try:
    os.makedirs(local_root_path)
except: pass

def ListFolder(google_drive_id, destination):
    file_list = drive.ListFile({'q': "'%s' in parents and trashed=false" % google_drive_id}).GetList()
    counter = 0
    for f in file_list:
        # If it is a directory, create it locally and fetch the files inside it
        if f['mimeType'] == 'application/vnd.google-apps.folder':
            folder_path = os.path.join(destination, f['title'])
            os.makedirs(folder_path)
            print('creating directory {}'.format(folder_path))
            ListFolder(f['id'], folder_path)
        else:
            fname = os.path.join(destination, f['title'])
            f_ = drive.CreateFile({'id': f['id']})
            f_.GetContentFile(fname)
            counter += 1
    print('{} files were downloaded to {}'.format(counter, destination))

ListFolder(DATA_FOLDER_ID, local_root_path)

Where is the dumped file in Google Colab?

When I wrote this code in google colab:
import pickle
x=10;
output = open('data.pkl', 'wb')
pickle.dump(x,output)
x is saved, and in another window in Google Colab I can access this file and read it, but I don't know where the file is. Does anybody know where it is?
It’s in the current directory. You can also download it back to your local machine with
from google.colab import files
files.download('data.pkl')
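If you want to check where it landed, you can print the working directory and its contents; on Colab this is normally /content:
import os

print(os.getcwd())       # typically /content on Colab
print(os.listdir('.'))   # data.pkl should show up here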
You can upload it to your Google drive:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# get the folder id where you want to save your file
file = drive.CreateFile({'parents':[{u'id': folder_id}]})
file.SetContentFile('data.pkl')
file.Upload()
This code basically fetches data.pkl from the cloud VM and uploads it permanently to your Google Drive under a specific folder.
If you choose not to specify a folder, the file will be uploaded under the root of your Google Drive.
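If you don't know the folder ID yet, one way to get it (a sketch, using a hypothetical folder title 'colab_outputs') is to look the folder up by title first:
# look up an existing Drive folder by title to get its ID (the title here is just an example)
folder_list = drive.ListFile({
    'q': "title = 'colab_outputs' and mimeType = 'application/vnd.google-apps.folder' and trashed=false"
}).GetList()
folder_id = folder_list[0]['id']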
You can save and read the dumped file anywhere in your Google Drive folder.
import pickle
from google.colab import drive
drive.mount('/content/drive', force_remount=True)
pick_insert = open('drive/My Drive/data.pickle','wb')
pickle.dump(data, pick_insert)
pick_insert.close()
pick_read = open('drive/My Drive/data.pickle','rb')
data = pickle.load(pick_read)
pick_read.close()
The saved dump can then be loaded from the same directory, as below:
from pickle import dump, load

dump(stories, open('review_dataset.pkl', 'wb'))
stories = load(open('review_dataset.pkl', 'rb'))
In my case, I was trying to access the pickle files in a sub-directory (data) under the current directory.
The data directory has 2 pickle files generated from the pre-processing step.
So I tried @korakot's suggestion in the comments, and it worked fine. This is what I did:
# connect your colab with the drive
from google.colab import drive
drive.mount('/content/drive')
# list the contents of the current directory
import os
os.listdir('.')

# move the sub-directory (data) into Google Drive
!mv /content/data/ /content/drive/MyDrive/
You can obtain the pkl file using the following statements
from google.colab import files
files.download("model.pkl")
Not only .pkl; you can retrieve other formats of data as well by changing the extension.
You can save your pkl file like this instead:
import pickle
from google.colab import drive
drive.mount('/content/drive')
x=10;
output = open('/content/drive/MyDrive/Colab Notebooks/data.pkl', 'wb')
pickle.dump(x,output)
and open it using this code:
import pickle
from google.colab import drive
drive.mount('/content/drive')
x = pickle.load(open('/content/drive/MyDrive/Colab Notebooks/data.pkl', 'rb'))
it worked for me :)
