Error downloading a file from Google Drive - python

I exported some images from Google Earth Engine to Google Drive, and I need to download those images to a local drive using a Python script. I tried to use oauth2client and apiclient, following the tutorials linked in the code below.
I got a list of the files in Drive and their corresponding IDs, then I used an ID to try to download a file with the gdown library:
gdown.download(f'https://drive.google.com/uc?id={file_data["id"]}',
               f'{download_path}{os.sep}{filename_to_download}.tif')
I got the following error message:
Access denied with the following error:
Cannot retrieve the public link of the file. You may need to change
the permission to 'Anyone with the link', or have had many accesses.
You may still be able to access the file from the browser:
https://drive.google.com/uc?id=<id>
As I was able to get the Drive file list, I assume that the Drive authentication is OK. If I open the link suggested in the error message in a browser, I can download the file. If I check the file's properties in Drive, I see:
Who can access: not shared.
What should I do to download the files?
This is the complete code:
# https://medium.com/swlh/google-drive-api-with-python-part-i-set-up-credentials-1f729cb0372b
# https://levelup.gitconnected.com/google-drive-api-with-python-part-ii-connect-to-google-drive-and-search-for-file-7138422e0563
# https://stackoverflow.com/questions/38511444/python-download-files-from-google-drive-using-url
import os

from apiclient import discovery
from httplib2 import Http
from oauth2client import client, file, tools
import gdown


class GoogleDrive(object):

    def __init__(self, secret_credentials_file_path='./credentials'):
        # define API scope
        self.DriveFiles = None
        SCOPE = 'https://www.googleapis.com/auth/drive'
        self.store = file.Storage(f'{secret_credentials_file_path}{os.sep}credentials.json')
        self.credentials = self.store.get()
        if not self.credentials or self.credentials.invalid:
            flow = client.flow_from_clientsecrets(
                f'{secret_credentials_file_path}{os.sep}client_secret.json', SCOPE)
            self.credentials = tools.run_flow(flow, self.store)
        oauth_http = self.credentials.authorize(Http())
        self.drive = discovery.build('drive', 'v3', http=oauth_http)

    def RetrieveAllFiles(self):
        results = []
        page_token = None
        while True:
            try:
                param = {}
                if page_token:
                    param['pageToken'] = page_token
                files = self.drive.files().list(**param).execute()
                # append the files from the current result page to our list
                results.extend(files.get('files'))
                # the Drive API returns the files in multiple pages when the
                # number of files exceeds the page size (100 by default)
                page_token = files.get('nextPageToken')
                if not page_token:
                    break
            except Exception as error:
                print(f'An error has occurred: {error}')
                break
        self.DriveFiles = results

    def GetFileData(self, filename_to_search):
        for file_data in self.DriveFiles:
            if file_data.get('name') == filename_to_search:
                return file_data
        return None

    def DownloadFile(self, filename_to_download, download_path):
        file_data = self.GetFileData(f'{filename_to_download}.tif')
        gdown.download(f'https://drive.google.com/uc?id={file_data["id"]}',
                       f'{download_path}{os.sep}{filename_to_download}.tif')

Google Drive may not be the best tool for this. You may want to upload the images to a raw file hosting service like Imgur and download them using requests; you can then read the file back with your script, or skip writing to disk entirely and use image.content directly. Here's an example:
import requests

image = requests.get("https://i.imgur.com/5SMNGtv.png")
with open("image.png", 'wb') as file:
    file.write(image.content)
You can specify where the file is downloaded by adding the path before the file name, like this:
image = requests.get("https://i.imgur.com/5SMNGtv.png")
with open("C:/Users/Admin/Desktop/image.png", 'wb') as file:
    file.write(image.content)

Solution 1.
Access denied with the following error:
Cannot retrieve the public link of the file. You may need to change
the permission to 'Anyone with the link', or have had many accesses.
You may still be able to access the file from the browser:
https://drive.google.com/uc?id=<id>
In the sharing dialog on Drive (right-click the image, then Share or Get link), change the privacy to 'Anyone with the link'. Your code should then work.
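If you would rather not click through the sharing dialog for every exported image, the same authenticated Drive v3 service built in the question (self.drive) can set that permission programmatically. An untested sketch:

# Sketch: make a file readable by anyone with the link (Drive API v3),
# reusing the authenticated service from the question.
def make_public(drive_service, file_id):
    permission = {'type': 'anyone', 'role': 'reader'}
    drive_service.permissions().create(fileId=file_id, body=permission).execute()

After that, the public uc?id=... URL used by gdown should resolve.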
Solution 2.
If you can use Google Colab, then you can mount gdrive easily and access files there using
from google.colab import drive
drive.mount('/content/gdrive')
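Once mounted, your Drive contents behave like an ordinary directory, so standard file operations work. For example (the image path is hypothetical, and the top-level folder may be named 'My Drive' or 'MyDrive' depending on the Colab version):

import shutil

# Copy an exported image out of the mounted Drive folder to local Colab storage.
shutil.copy('/content/gdrive/My Drive/image.tif', '/content/image.tif')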

Google has a policy of not accepting your regular Google/Gmail password from third-party apps. They only accept so-called "App Passwords" that you need to create for your Google account in order to authenticate when using third-party apps.

Related

Can you download multiple files from Google Drive asynchronously?

My problem is the following:
I am sending queries via the Google Drive API that fetch all files matching certain criteria. I won't post the entire code here as it's quite extensive, but the query criterion is simply to get all files that reside in folders with a certain name (for example: "I want all files that reside in folders whose name contains the string 'meet'").
The code I have written for this particular part, is the following:
import json
import environ
import os
import google.auth
import io
from apiclient import discovery
from httplib2 import Http
from google.cloud import secretmanager
from googleapiclient.http import MediaIoBaseDownload
from oauth2client.service_account import ServiceAccountCredentials

# Imported functions from a local file. Just writing to database and establishing connection
from firestore_drive import add_file, establish_db_connection

.... some other code here ...

def update_files_via_parent_folder(self, parent_id, parent_name):
    page_token = None
    # Set a query that fetches all files based on the ID of its parent folder,
    # e.g. "get all files from the folder whose ID is parent_id"
    query = f"'{parent_id}' in parents"
    response = self.execute_query(query, page_token)
    files = response.get('files', [])

    while True:
        # Process all files returned by the current result page
        for file in files:
            file_id = file['id']
            filename = file['name']
            # Request the current file from Drive, and download it through a byte stream
            request = self.service.files().get_media(fileId=file_id)
            fh = io.BytesIO()
            downloader = MediaIoBaseDownload(fh, request)
            done = False
            dl_counter = 0
            while done is False:
                # Download the file from Drive, and convert it to JSON (dictionary)
                status, done = downloader.next_chunk()
                prefab_json = json.loads(fh.getvalue())
                # Find the proper collection name, then add the file to the database
                collection_name = next(type_name for type_name in self.possible_types if type_name in parent_name)
                add_file(self.db, collection_name, filename, file_content=prefab_json)

        # Find out if there are more files to download in the same folder
        page_token = response.get('nextPageToken', None)
        if page_token is None:
            if len(files) == 0:
                print('Folder found, but contained no files.')
            break
        response = self.execute_query(query, page_token)
        files = response.get('files', [])

def execute_query(self, query, page_token):
    """
    Helper function for executing a query to Google Drive.
    Implemented as a function due to repeated usage.
    """
    return self.service.files().list(
        q=query,
        spaces='drive',
        fields='nextPageToken, files(id, name)',
        pageToken=page_token).execute()
Now my question is this:
Is there a way to download the files asynchronously or in parallel in the following section?
for file in files:
    file_id = ...
    filename = ...
    # Same as above; start download and write to database...
For reference, the point of the code is to extract files that are located on Google Drive, and copy them over to another database. I'm not concerned with local storage, only fetching from Drive and writing to a database (if this is even possible to do in parallel).
I've tried various options such as multiprocessing.Pool, multiprocessing.pool.ThreadPool, and asyncio, but I'm not sure whether I actually used them correctly. I can also mention that the database used is Firestore.
Additional note: the reason I want to do this is that the sequential operation is extremely slow, and I want to deploy it as a cloud function (which has a maximum time limit of 540 seconds, i.e. 9 minutes).
Any feedback is welcome :)
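One direction that might help (a sketch, not from the original thread): the per-file work is I/O-bound, so a concurrent.futures.ThreadPoolExecutor can run several downloads at once. The google-api-python-client service object is not thread-safe, so this sketch assumes a hypothetical build_service() factory that creates a fresh service per call; add_file, db, collection_name, and files are as in the question.

import concurrent.futures
import io
import json

from googleapiclient.http import MediaIoBaseDownload

def download_one(file, db, collection_name):
    # build_service() is a hypothetical per-call factory; sharing one
    # httplib2-based service object across threads is not safe.
    service = build_service()
    request = service.files().get_media(fileId=file['id'])
    fh = io.BytesIO()
    downloader = MediaIoBaseDownload(fh, request)
    done = False
    while not done:
        _, done = downloader.next_chunk()
    add_file(db, collection_name, file['name'], file_content=json.loads(fh.getvalue()))

with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(download_one, f, db, collection_name) for f in files]
    for future in concurrent.futures.as_completed(futures):
        future.result()  # re-raise any exception from the worker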

PyDrive "Daily Limit for Unauthenticated Use Exceeded. Continued use requires signup" despite logged in and authenticated

I have a bit of Python code using PyDrive (the pydrive2 package) to download files from a shared Google Drive folder. It looks like this.
from pydrive2.auth import GoogleAuth
from pydrive2.drive import GoogleDrive

gauth = GoogleAuth()
gauth.CommandLineAuth()
drive = GoogleDrive(gauth)

file_list = drive.ListFile({'q': "'XXXXXXXXXXXXXXXXXXXXXXXX' in parents and trashed=false"}).GetList()
for file2 in file_list:
    string = "'" + file2['id'] + "'" + " in parents and trashed=false"
    file_list2 = drive.ListFile({'q': string}).GetList()
    print(file2['title'], file2['id'])
    for file3 in file_list2:
        file3.GetContentFile('/data/' + file3['title'])
        print('downloaded: ', file3['title'], file3['id'])
This downloads roughly a TB of data in many tens of thousands of files. (The real query does not use XXXXXXXX; I just substituted it to avoid posting the actual folder ID in this example.) The code works great for roughly the first 25k files, then suddenly fails with:
code: 403,
message: Daily Limit for Unauthenticated Use Exceeded. Continued use requires signup.
I sign in successfully with OAuth, my Google Drive API is enabled, and it has a quota of 10,000,000 requests a day (far more than I am using). Why does it think I am still using an anonymous account and hit me with this error?
It is my understanding that because it is a file in my shared drive and because I am signed in, I shouldn't be encountering this. What's up with that?

Importing a MIME .eml file to the Gmail API using the import function

I am a Python developer and somewhat new to using Google's Gmail API to import .eml files into a Gmail account.
I've gotten all of the groundwork done getting my OAuth credentials working, etc.
However, I am stuck where I load in the data file. I need help loading the message data into a variable in the appropriate format.
How do I create the message_data variable, in the appropriate format, from my sample email file (which is stored in rfc822 format) on disk?
Assuming I have a file on disk at /path/to/file/sample.eml ... how do I load that into message_data in the proper format for the Gmail API import call?
...

# how do I properly load message_data from the rfc822 disk file?
media = MediaIoBaseUpload(message_data, mimetype='message/rfc822')
message_response = service.users().messages().import_(
    userId='me',
    fields='id',
    neverMarkSpam=True,
    processForCalendar=False,
    internalDateSource='dateHeader',
    media_body=media).execute(num_retries=2)

...
You want to import an eml file using the Gmail API.
You have already been able to get and put values with the Gmail API.
You want to achieve this using google-api-python-client.
service in your script can be used for uploading the eml file.
If my understanding is correct, how about this answer? Please think of this as just one of several possible answers.
Modification point:
In this case, the method of "Users.messages: insert" is used.
Modified script:
Before you run the script, please set the filename with the path of the eml file.
eml_file = "###" # Please set the filename with the path of the eml file.
user_id = "me"
f = open(eml_file, "r", encoding="utf-8")
eml = f.read()
f.close()
message_data = io.BytesIO(eml.encode('utf-8'))
media = MediaIoBaseUpload(message_data, mimetype='message/rfc822', resumable=True)
metadata = {'labelIds': ['INBOX']}
res = service.users().messages().insert(userId=user_id, body=metadata, media_body=media).execute()
print(res)
In the above script, the following modules are also required.
import io
from googleapiclient.http import MediaIoBaseUpload
Note:
In the above modified script, {'labelIds': ['INBOX']} is used as the metadata, so the imported eml file can be seen in the INBOX of Gmail. If you want to change this, please modify the metadata.
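If you specifically need the import_ endpoint from your original snippet (which supports the neverMarkSpam, processForCalendar, and internalDateSource flags shown there), the same message_data and media objects should drop straight in. A sketch based on the call posted in the question:

media = MediaIoBaseUpload(message_data, mimetype='message/rfc822', resumable=True)
message_response = service.users().messages().import_(
    userId='me',
    fields='id',
    neverMarkSpam=True,
    processForCalendar=False,
    internalDateSource='dateHeader',
    media_body=media).execute(num_retries=2)
print(message_response)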
Reference:
Users.messages: insert
If I misunderstood your question and this was not the result you want, I apologize.

How can I download a zip to a Python directory from Google Storage after obtaining a response object?

After running the following code successfully, I think I am close to getting access to the zip file in Google Cloud Storage. However, I cannot figure out what to do next: how do I download the zip file, or otherwise make it available to the Python environment as a programmable object?
from gs import GSClient

client = GSClient()
object_meta = client.get("b/rcmikejupyter/o/output1.zip")
with client.get("b/rcmikejupyter/o/output1.zip", params=dict(alt="media"), stream=True) as res:
    object_bytes = res.raw.read()
Assuming this is a bytes object:
with open("pathto/yourfile.zip", "wb") as file:
file.write(object_bytes)
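If the goal is to work with the archive directly in Python rather than on disk, the standard library can open it straight from the bytes. A sketch using the object_bytes variable from above:

import io
import zipfile

# Open the downloaded archive in memory, without writing it to disk.
archive = zipfile.ZipFile(io.BytesIO(object_bytes))
print(archive.namelist())              # list the files inside the zip
with archive.open(archive.namelist()[0]) as member:
    data = member.read()               # bytes of the first member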

python + google drive: upload xlsx, convert to google sheet, get sharable link

The flow of my desired program is:
Upload an xlsx spreadsheet to drive (it was created using pandas to_excel)
Convert it to Google Sheets format
Specify that it is editable by anyone with the link
Get the link and share it with someone who will enter information
Download the completed sheet
I am currently using PyDrive, which solves steps 1 and 5, but there are a few unsolved problems.
How can I convert to google sheets format? I tried to just specify the mimeType as 'application/vnd.google-apps.spreadsheet' when I created the file to upload with PyDrive, but that gave me an error.
How can I set the file to be editable by anyone with the link? Once that is set, I can get the sharing link easily enough with PyDrive.
UPDATE: conversion from xlsx to google sheets is easy with a convert=True flag. See below. I am still seeking a way to set the sharing settings of my new file to "anyone with the link can edit".
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

gauth = GoogleAuth()
gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)

test_file = drive.CreateFile({'title': 'testfile.xlsx'})
test_file.SetContentFile('testfile.xlsx')
test_file.Upload({'convert': True})
There is an optional query parameter, "convert", for both the "Insert" and "Copy" methods:
convert=true: whether to convert this file to the corresponding Google Docs format (default: false).
There is a Python example here:
Google Documentation - Copy
You need to use the Python client library for the code to work.
from apiclient import errors
from apiclient.http import MediaFileUpload

# ...

def insert_file(service, title, description, parent_id, mime_type, filename):
    """Insert new file.

    Args:
        service: Drive API service instance.
        title: Title of the file to insert, including the extension.
        description: Description of the file to insert.
        parent_id: Parent folder's ID.
        mime_type: MIME type of the file to insert.
        filename: Filename of the file to insert.

    Returns:
        Inserted file metadata if successful, None otherwise.
    """
    media_body = MediaFileUpload(filename, mimetype=mime_type, resumable=True)
    body = {
        'title': title,
        'description': description,
        'mimeType': mime_type
    }
    # Set the parent folder.
    if parent_id:
        body['parents'] = [{'id': parent_id}]
    try:
        file = service.files().insert(
            body=body,
            convert=True,
            media_body=media_body).execute()
        # Uncomment the following line to print the File ID
        # print('File ID: %s' % file['id'])
        return file
    except errors.HttpError as error:
        print('An error occurred: %s' % error)
        return None
I haven't tried this, so you'll need to test it.
In order to set the file to be editable by anyone with the link, you have to insert a new permission with the following information:
from apiclient import errors

# ...

def share_with_anyone(service, file_id):
    """Shares the file with anyone who has the link.

    Args:
        service: Drive API service instance.
        file_id: ID of the file to insert permission for.

    Returns:
        The inserted permission if successful, None otherwise.
    """
    new_permission = {
        'type': 'anyone',
        'role': 'writer',
        'withLink': True
    }
    try:
        return service.permissions().insert(
            fileId=file_id, body=new_permission).execute()
    except errors.HttpError as error:
        print('An error occurred: %s' % error)
        return None
Then, to get the shareable link, read file["alternateLink"].
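Putting the pieces together, usage might look like this (a sketch against the old Drive v2 client used in this answer; the xlsx MIME constant and the description string are my own assumptions, not from the original post):

# Hypothetical end-to-end usage of the two helpers above (Drive API v2).
XLSX_MIME = 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'

uploaded = insert_file(service, 'testfile.xlsx', 'collaborative sheet',
                       None, XLSX_MIME, 'testfile.xlsx')
if uploaded is not None:
    share_with_anyone(service, uploaded['id'])
    print('Anyone with this link can edit: %s' % uploaded['alternateLink'])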
