Downloading files from public Google Drive in python: scoping issues? - python

Using my answer to my question on how to download files from a public Google drive I managed in the past to download images using their IDs from a python script and Google API v3 from a public drive using the following bock of code:
from google_auth_oauthlib.flow import Flow, InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload, MediaIoBaseDownload
from google.auth.transport.requests import Request
import io
import re
SCOPES = ['https://www.googleapis.com/auth/drive']
CLIENT_SECRET_FILE = "myjson.json"
authorized_port = 6006 # authorize URI redirect on the console
flow = InstalledAppFlow.from_client_secrets_file(CLIENT_SECRET_FILE, SCOPES)
cred = flow.run_local_server(port=authorized_port)
drive_service = build("drive", "v3", credentials=cred)
regex = "(?<=https://drive.google.com/file/d/)[a-zA-Z0-9]+"
for i, l in enumerate(links_to_download):
url = l
file_id = re.search(regex, url)[0]
request = drive_service.files().get_media(fileId=file_id)
fh = io.FileIO(f"file_{i}", mode='wb')
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
status, done = downloader.next_chunk()
print("Download %d%%." % int(status.progress() * 100))
In the mean time I discovered pydrive and pydrive2, two wrappers around Google API v2 that allows to do very useful things such as listing files from folders and basically allows to do the same thing with a lighter syntax:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
import io
import re
CLIENT_SECRET_FILE = "client_secrets.json"
gauth = GoogleAuth()
gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)
regex = "(?<=https://drive.google.com/file/d/)[a-zA-Z0-9]+"
for i, l in enumerate(links_to_download):
url = l
file_id = re.search(regex, url)[0]
file_handle = drive.CreateFile({'id': file_id})
file_handle.GetContentFile(f"file_{i}")
However now whether I use pydrive or the raw API I cannot seem to be able to download the same files and instead I am met with:
googleapiclient.errors.HttpError: <HttpError 404 when requesting https://www.googleapis.com/drive/v3/files/fileID?alt=media returned "File not found: fileID.". Details: "[{'domain': 'global', 'reason': 'notFound', 'message': 'File not found: fileID.', 'locationType': 'parameter', 'location': 'fileId'}]">
I tried everything and registered 3 different apps using Google console it seems it might be (or not) a question of scoping (see for instance this answer, with apps having access to only files in my Google drive or created by this app). However I did not have this issue before (last year).
When going to the Google console explicitly giving https://www.googleapis.com/auth/drive as a scope to the API mandates filling a ton of fields with application's website/conditions of use/confidentiality rules/authorized domains and youtube videos explaining the app. However I will be the sole user of this script.
So I could only give explicitly the following scopes:
/auth/drive.appdata
/auth/drive.file
/auth/drive.install
Is it because of scoping ? Is there a solution that doesn't require creating a homepage and a youtube video ?
EDIT 1:
Here is an example of links_to_download:
links_to_download = ["https://drive.google.com/file/d/fileID/view?usp=drivesdk&resourcekey=0-resourceKeyValue"]
EDIT 2:
It is super instable sometimes it works without a sweat sometimes it doesn't. When I relaunch the script multiple times I get different results. Retry policies are working to a certain extent but sometimes it fails multiple times for hours.

Well thanks to the security update released by Google few months before. This makes the link sharing stricter and you need resource key as well to access the file in-addition to the fileId.
As per the documentation , You need to provide the resource key as well for newer links, if you want to access it in the header X-Goog-Drive-Resource-Keys as fileId1/resourceKey1.
If you apply this change in your code, it will work as normal. Example edit below:
regex = "(?<=https://drive.google.com/file/d/)[a-zA-Z0-9]+"
regex_rkey = "(?<=resourcekey=)[a-zA-Z0-9-]+"
for i, l in enumerate(links_to_download):
url = l
file_id = re.search(regex, url)[0]
resource_key = re.search(regex_rkey, url)[0]
request = drive_service.files().get_media(fileId=file_id)
request.headers["X-Goog-Drive-Resource-Keys"] = f"{file_id}/{resource_key}"
fh = io.FileIO(f"file_{i}", mode='wb')
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
status, done = downloader.next_chunk()
print("Download %d%%." % int(status.progress() * 100))
Well, the regex for resource key was something I quickly made, so cannot be sure on if it supports every case. But this provides you the solution.
Now, you may have to listen to old and new links based on this and set the changes.

Related

How to download a file from Google Drive using Python and the Drive API v3

I have tried downloading file from Google Drive to my local system using python script but facing a "forbidden" issue while running a Python script. The script is as follows:
import requests
url = "https://www.googleapis.com/drive/v3/files/1wPxpQwvEEOu9whmVVJA9PzGPM2XvZvhj?alt=media&export=download"
querystring = {"alt":"media","export":"download"}
headers = {
'Authorization': "Bearer TOKEN",
'Host': "www.googleapis.com",
'Accept-Encoding': "gzip, deflate",
'Connection': "keep-alive",
}
response = requests.request("GET", url, headers=headers, params=querystring)
print(response.url)
#
import wget
import os
from os.path import expanduser
myhome = expanduser("/home/sunarcgautam/Music")
### set working dir
os.chdir(myhome)
url = "https://www.googleapis.com/drive/v3/files/1wPxpQwvEEOu9whmVVJA9PzGPM2XvZvhj?alt=media&export=download"
print('downloading ...')
wget.download(response.url)
In this script, I have got forbidden issue. Am I doing anything wrong in the script?
I have also tried another script that I found on a Google Developer page, which is as follows:
import auth
import httplib2
SCOPES = "https://www.googleapis.com/auth/drive.scripts"
CLIENT_SECRET_FILE = "client_secret.json"
APPLICATION_NAME = "test_Download"
authInst = auth.auth(SCOPES, CLIENT_SECRET_FILE, APPLICATION_NAME)
credentials = authInst.getCredentials()
http = credentials.authorize(httplib2.Http())
drive_serivce = discovery.build('drive', 'v3', http=http)
file_id = '1Af6vN0uXj8_qgqac6f23QSAiKYCTu9cA'
request = drive_serivce.files().export_media(fileId=file_id,
mimeType='application/pdf')
fh = io.BytesIO()
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
status, done = downloader.next_chunk()
print ("Download %d%%." % int(status.progress() * 100))
This script gives me a URL mismatch error.
So what should be given for redirect URL in Google console credentials? or any other solution for the issue? Do I have to authorise my Google console app from Google in both the script? If so, what will the process of authorising the app because I haven't found any document regarding that.
To make requests to Google APIs the work flow is in essence the following:
Go to developer console, log in if you haven't.
Create a Cloud Platform project.
Enable for your project, the APIs you are interested in using with you projects' apps (for example: Google Drive API).
Create and download OAuth 2.0 Client IDs credentials that will allow your app to gain authorization for using your enabled APIs.
Head over to OAuth consent screen, click on and add your scope using the button. (scope: https://www.googleapis.com/auth/drive.readonly for you). Choose Internal/External according to your needs, and for now ignore the warnings if any.
To get the valid token for making API request the app will go through the OAuth flow to receive the authorization token. (Since it needs consent)
During the OAuth flow the user will be redirected to your the OAuth consent screen, where it will be asked to approve or deny access to your app's requested scopes.
If consent is given, your app will receive an authorization token.
Pass the token in your request to your authorized API endpoints.[2]
Build a Drive Service to make API requests (You will need the valid token)[1]
NOTE:
The available methods for the Files resource for Drive API v3 are here.
When using the Python Google APIs Client, then you can use export_media() or get_media() as per Google APIs Client for Python documentation
IMPORTANT:
Also, check that the scope you are using, actually allows you to do what you want (Downloading Files from user's Drive) and set it accordingly. ATM you have an incorrect scope for your goal. See OAuth 2.0 API Scopes
Sample Code References:
Building a Drive Service:
import google_auth_oauthlib.flow
from google.auth.transport.requests import Request
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
class Auth:
def __init__(self, client_secret_filename, scopes):
self.client_secret = client_secret_filename
self.scopes = scopes
self.flow = google_auth_oauthlib.flow.Flow.from_client_secrets_file(self.client_secret, self.scopes)
self.flow.redirect_uri = 'http://localhost:8080/'
self.creds = None
def get_credentials(self):
flow = InstalledAppFlow.from_client_secrets_file(self.client_secret, self.scopes)
self.creds = flow.run_local_server(port=8080)
return self.creds
# The scope you app will use.
# (NEEDS to be among the enabled in your OAuth consent screen)
SCOPES = "https://www.googleapis.com/auth/drive.readonly"
CLIENT_SECRET_FILE = "credentials.json"
credentials = Auth(client_secret_filename=CLIENT_SECRET_FILE, scopes=SCOPES).get_credentials()
drive_service = build('drive', 'v3', credentials=credentials)
Making the request to export or get a file
request = drive_service.files().export(fileId=file_id, mimeType='application/pdf')
fh = io.BytesIO()
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
status, done = downloader.next_chunk()
print("Download %d%%" % int(status.progress() * 100))
# The file has been downloaded into RAM, now save it in a file
fh.seek(0)
with open('your_filename.pdf', 'wb') as f:
shutil.copyfileobj(fh, f, length=131072)

google-drive-sdk export Daily Limit unauthenticated use

I am trying to download/export a file according to the v3 example published by google. I am getting the "Daily Limit for Unauthenticated Use Exceeded. Continued use requires signup." error.
I have searched here and elsewhere and all of the links suggest I am missing setting up the credentials. However, I am building on top of the basic quickstart example and am able to list out the contents of my drive folder in this same application. And yes, I have changed the requested scope from drive.metadata.readonly to drive.readonly to support downloading. What am I missing?
from __future__ import print_function
from apiclient.discovery import build
from httplib2 import Http
from oauth2client import file, client, tools
import io
from googleapiclient.http import MediaIoBaseDownload
# Setup the Drive v3 API
SCOPES = 'https://www.googleapis.com/auth/drive.readonly'
store = file.Storage('credentials.json')
creds = store.get()
if not creds or creds.invalid:
flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
creds = tools.run_flow(flow, store)
drive_service = build('drive', 'v3', http=creds.authorize(Http()))
# Call the Drive v3 API to list first 10 items (this works)
# example from google.
results = drive_service.files().list(
pageSize=10, fields="nextPageToken, files(id, name)").execute()
items = results.get('files', [])
if not items:
print('No files found.')
else:
print('Files:')
for item in items:
print('{0} ({1})'.format(item['name'], item['id']))
# Try to download the first item (it's a google doc I can edit, this FAILS)
# code pretty much lifted from google
file_id = items[0]['id']
print (file_id)
request = drive_service.files().export_media(fileId=file_id,
mimeType='application/pdf')
fh = io.BytesIO()
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
status, done = downloader.next_chunk()
print ( "Download %d%%." % int(status.progress() * 100) )
Found it. The example from Google was caching the credentials file (credentials.json). When I originally ran the example the scoped permissions were not for drive.readonly, but for drive.metadata.readonly. I think when I changed them the request was not longer valid.
I deleted and credentials.json and re-ran the script (and re-approved the credentials request on my browser) and it was successful. I also ended up using the following to store the data as the BytesIO wasn't actually writing to disk.
data = drive_service.files().export(fileId=file_id,
mimeType='application/pdf').execute()
f = open('MyFile.pdf','wb')
f.write(data)
f.close()

Upload file to MS SharePoint using Python OneDrive SDK

Is it possible to upload a file to the Shared Documents library of a Microsoft SharePoint site with the Python OneDrive SDK?
This documentation says it should be (in the first sentence), but I can't make it work.
I'm able to authenticate (with Azure AD) and upload to a OneDrive folder, but when trying to upload to a SharePoint folder, I keep getting this error:
"Exception of type 'Microsoft.IdentityModel.Tokens.AudienceUriValidationFailedException' was thrown."
The code I'm using that returns an object with the error:
(...authentication...)
client = onedrivesdk.OneDriveClient('https://{tenant}.sharepoint.com/{site}/_api/v2.0/', auth, http)
client.item(path='/drive/special/documents').children['test.xlsx'].upload('test.xlsx')
I can successfully upload to https://{tenant}-my.sharepoint.com/_api/v2.0/ (notice the "-my" after the {tenant}) with the following code:
client = onedrivesdk.OneDriveClient('https://{tenant}-my.sharepoint.com/_api/v2.0/', auth, http)
returned_item = client.item(drive='me', id='root').children['test.xlsx'].upload('test.xlsx')
How could I upload the same file to a SharePoint site?
(Answers to similar questions (1,2,3,4) on Stack Overflow are either too vague or suggest using a different API. My question is if it's possible using the OneDrive Python SDK, and if so, how to do it.)
Update: Here is my full code and output. (Sensitive original data replaced with similarly formatted gibberish.)
import re
import onedrivesdk
from onedrivesdk.helpers.resource_discovery import ResourceDiscoveryRequest
# our domain (not the original)
redirect_uri = 'https://example.ourdomain.net/'
# our client id (not the original)
client_id = "a1234567-1ab2-1234-a123-ab1234abc123"
# our client secret (not the original)
client_secret = 'ABCaDEFGbHcd0e1I2fghJijkL3mn4M5NO67P8Qopq+r='
resource = 'https://api.office.com/discovery/'
auth_server_url = 'https://login.microsoftonline.com/common/oauth2/authorize'
auth_token_url = 'https://login.microsoftonline.com/common/oauth2/token'
http = onedrivesdk.HttpProvider()
auth = onedrivesdk.AuthProvider(http_provider=http, client_id=client_id,
auth_server_url=auth_server_url,
auth_token_url=auth_token_url)
should_authenticate_via_browser = False
try:
# Look for a saved session. If not found, we'll have to
# authenticate by opening the browser.
auth.load_session()
auth.refresh_token()
except FileNotFoundError as e:
should_authenticate_via_browser = True
pass
if should_authenticate_via_browser:
auth_url = auth.get_auth_url(redirect_uri)
code = ''
while not re.match(r'[a-zA-Z0-9_-]+', code):
# Ask for the code
print('Paste this URL into your browser, approve the app\'s access.')
print('Copy the resulting URL and paste it below.')
print(auth_url)
code = input('Paste code here: ')
# Parse code from URL if necessary
if re.match(r'.*?code=([a-zA-Z0-9_-]+).*', code):
code = re.sub(r'.*?code=([a-zA-Z0-9_-]*).*', r'\1', code)
auth.authenticate(code, redirect_uri, client_secret, resource=resource)
# If you have access to more than one service, you'll need to decide
# which ServiceInfo to use instead of just using the first one, as below.
service_info = ResourceDiscoveryRequest().get_service_info(auth.access_token)[0]
auth.redeem_refresh_token(service_info.service_resource_id)
auth.save_session() # Save session into a local file.
# Doesn't work
client = onedrivesdk.OneDriveClient(
'https://{tenant}.sharepoint.com/sites/{site}/_api/v2.0/', auth, http)
returned_item = client.item(path='/drive/special/documents')
.children['test.xlsx']
.upload('test.xlsx')
print(returned_item._prop_dict['error_description'])
# Works, uploads to OneDrive instead of SharePoint site
client2 = onedrivesdk.OneDriveClient(
'https://{tenant}-my.sharepoint.com/_api/v2.0/', auth, http)
returned_item2 = client2.item(drive='me', id='root')
.children['test.xlsx']
.upload('test.xlsx')
print(returned_item2.web_url)
Output:
Exception of type 'Microsoft.IdentityModel.Tokens.AudienceUriValidationFailedException' was thrown.
https://{tenant}-my.sharepoint.com/personal/user_domain_net/_layouts/15/WopiFrame.aspx?sourcedoc=%1ABCDE2345-67F8-9012-3G45-6H78IJKL9M01%2N&file=test.xlsx&action=default
I finally found a solution, with the help of (SO user) sytech.
The answer to my original question is that using the original Python OneDrive SDK, it's not possible to upload a file to the Shared Documents folder of a SharePoint Online site (at the moment of writing this): when the SDK queries the resource discovery service, it drops all services whose service_api_version is not v2.0. However, I get the SharePoint service with v1.0, so it's dropped, although it could be accessed using API v2.0 too.
However, by extending the ResourceDiscoveryRequest class (in the OneDrive SDK), we can create a workaround for this. I managed to upload a file this way:
import json
import re
import onedrivesdk
import requests
from onedrivesdk.helpers.resource_discovery import ResourceDiscoveryRequest, \
ServiceInfo
# our domain (not the original)
redirect_uri = 'https://example.ourdomain.net/'
# our client id (not the original)
client_id = "a1234567-1ab2-1234-a123-ab1234abc123"
# our client secret (not the original)
client_secret = 'ABCaDEFGbHcd0e1I2fghJijkL3mn4M5NO67P8Qopq+r='
resource = 'https://api.office.com/discovery/'
auth_server_url = 'https://login.microsoftonline.com/common/oauth2/authorize'
auth_token_url = 'https://login.microsoftonline.com/common/oauth2/token'
# our sharepoint URL (not the original)
sharepoint_base_url = 'https://{tenant}.sharepoint.com/'
# our site URL (not the original)
sharepoint_site_url = sharepoint_base_url + 'sites/{site}'
file_to_upload = 'C:/test.xlsx'
target_filename = 'test.xlsx'
class AnyVersionResourceDiscoveryRequest(ResourceDiscoveryRequest):
def get_all_service_info(self, access_token, sharepoint_base_url):
headers = {'Authorization': 'Bearer ' + access_token}
response = json.loads(requests.get(self._discovery_service_url,
headers=headers).text)
service_info_list = [ServiceInfo(x) for x in response['value']]
# Get all services, not just the ones with service_api_version 'v2.0'
# Filter only on service_resource_id
sharepoint_services = \
[si for si in service_info_list
if si.service_resource_id == sharepoint_base_url]
return sharepoint_services
http = onedrivesdk.HttpProvider()
auth = onedrivesdk.AuthProvider(http_provider=http, client_id=client_id,
auth_server_url=auth_server_url,
auth_token_url=auth_token_url)
should_authenticate_via_browser = False
try:
# Look for a saved session. If not found, we'll have to
# authenticate by opening the browser.
auth.load_session()
auth.refresh_token()
except FileNotFoundError as e:
should_authenticate_via_browser = True
pass
if should_authenticate_via_browser:
auth_url = auth.get_auth_url(redirect_uri)
code = ''
while not re.match(r'[a-zA-Z0-9_-]+', code):
# Ask for the code
print('Paste this URL into your browser, approve the app\'s access.')
print('Copy the resulting URL and paste it below.')
print(auth_url)
code = input('Paste code here: ')
# Parse code from URL if necessary
if re.match(r'.*?code=([a-zA-Z0-9_-]+).*', code):
code = re.sub(r'.*?code=([a-zA-Z0-9_-]*).*', r'\1', code)
auth.authenticate(code, redirect_uri, client_secret, resource=resource)
service_info = AnyVersionResourceDiscoveryRequest().\
get_all_service_info(auth.access_token, sharepoint_base_url)[0]
auth.redeem_refresh_token(service_info.service_resource_id)
auth.save_session()
client = onedrivesdk.OneDriveClient(sharepoint_site_url + '/_api/v2.0/',
auth, http)
# Get the drive ID of the Documents folder.
documents_drive_id = [x['id']
for x
in client.drives.get()._prop_list
if x['name'] == 'Documents'][0]
items = client.item(drive=documents_drive_id, id='root')
# Upload file
uploaded_file_info = items.children[target_filename].upload(file_to_upload)
Authenticating for a different service gives you a different token.

People API returns no connections when authenticating via Service Account [duplicate]

I'm trying to programmatically access the list of contacts on my own personal Google Account using the Python Client Library
This is a script that will run on a server without user input, so I have it set up to use credentials from a Service Account I set up. My Google API console setup looks like this.
I'm using the following basic script, pulled from the examples provided in the API docs -
import json
from httplib2 import Http
from oauth2client.service_account import ServiceAccountCredentials
from apiclient.discovery import build
# Only need read-only access
scopes = ['https://www.googleapis.com/auth/contacts.readonly']
# JSON file downloaded from Google API Console when creating the service account
credentials = ServiceAccountCredentials.from_json_keyfile_name(
'keep-in-touch-5d3ebc885d4c.json', scopes)
# Build the API Service
service = build('people', 'v1', credentials=credentials)
# Query for the results
results = service.people().connections().list(resourceName='people/me').execute()
# The result set is a dictionary and should contain the key 'connections'
connections = results.get('connections', [])
print connections #=> [] - empty!
When I hit the API it returns a result set without any 'connections' key. Specifically it returns -
>>> results
{u'nextSyncToken': u'CNP66PXjKhIBMRj-EioECAAQAQ'}
Is there something pertaining to my setup or code that's incorrect? Is there a way to see the response HTTP status code or get any further detail about what it's trying to do?
Thanks!
Side note: When I try it using the "Try it!" feature in the API docs, it correctly returns my contacts. Although I doubt that uses the client library and instead relies on user authorization via OAuth
The personFields mask is required. Specify one or more valid paths. Valid paths are documented at https://developers.google.com/people/api/rest/v1/people.connections/list/.
Additionally, use fields mask to specify which fields are included in a partial response.
Instead of:
results = service.people().connections().list(resourceName='people/me').execute()
... try:
results = service.people().connections().list(resourceName='people/me',personFields='names,emailAddresses',fields='connections,totalItems,nextSyncToken').execute()
Here is a working demo. I just tested it right now. Python 3.5.2
google-api-python-client==1.6.4
httplib2==0.10.3
oauth2client==4.1.2
You can save it to demo.py and then just run it. I left the create_contact function in case you might want to use it and have one more example on the API usage.
CLIENT_ID and CLIENT_SECRET are environment variables so I don't accidentally share that in code.
"""Google API stuff."""
import httplib2
import json
import os
from apiclient.discovery import build
from oauth2client.file import Storage
from oauth2client.client import OAuth2WebServerFlow
from oauth2client.tools import run_flow
CLIENT_ID = os.environ['CLIENT_ID']
CLIENT_SECRET = os.environ['CLIENT_SECRET']
SCOPE = 'https://www.googleapis.com/auth/contacts'
USER_AGENT = 'JugDemoStackOverflow/v0.1'
def make_flow():
"""Make flow."""
flow = OAuth2WebServerFlow(
client_id=CLIENT_ID,
client_secret=CLIENT_SECRET,
scope=SCOPE,
user_agent=USER_AGENT,
)
return flow
def get_people():
"""Return a people_service."""
flow = make_flow()
storage = Storage('info.dat')
credentials = storage.get()
if credentials is None or credentials.invalid:
credentials = run_flow(flow, storage)
http = httplib2.Http()
http = credentials.authorize(http)
people_service = build(serviceName='people', version='v1', http=http)
return people_service
def create_contact(people, user):
"""Create a Google Contact."""
request = people.createContact(
body={
'names': [{'givenName': user.name}],
'phoneNumbers': [
{'canonicalForm': user.phone, 'value': user.phone}],
}
)
return request.execute()
def demo():
"""Demonstrate getting contacts from Google People."""
people_service = get_people()
people = people_service.people()
connections = people.connections().list(
resourceName='people/me',
personFields='names,emailAddresses,phoneNumbers',
pageSize=2000,
)
result = connections.execute()
s = json.dumps(result)
# with open('contacts.json', 'w') as f:
# f.write(s)
return s
if __name__ == '__main__':
print(demo())
With service account, in DwD - G Suite Domain-wide Delegation, is necessary impersonate or delegate user in this way
delegate = credentials.create_delegated('user#xxxx.xxx')
For fellow googlers: I have the same problem using the JS API.
I succeded on my personal gmail address, but not on my work one (g-suite) neither on my secondary gmail address.
Can't see the pattern. It's possible that the work one has contact listing deactivated.

Python: download files from google drive using url

I am trying to download files from google drive and all I have is the drive's URL.
I have read about google API that talks about some drive_service and MedioIO, which also requires some credentials( mainly JSON file/OAuth). But I am unable to get any idea about how it is working.
Also, tried urllib2.urlretrieve, but my case is to get files from the drive. Tried wget too but no use.
Tried PyDrive library. It has good upload functions to drive but no download options.
Any help will be appreciated.
Thanks.
If by "drive's url" you mean the shareable link of a file on Google Drive, then the following might help:
import requests
def download_file_from_google_drive(id, destination):
URL = "https://docs.google.com/uc?export=download"
session = requests.Session()
response = session.get(URL, params = { 'id' : id }, stream = True)
token = get_confirm_token(response)
if token:
params = { 'id' : id, 'confirm' : token }
response = session.get(URL, params = params, stream = True)
save_response_content(response, destination)
def get_confirm_token(response):
for key, value in response.cookies.items():
if key.startswith('download_warning'):
return value
return None
def save_response_content(response, destination):
CHUNK_SIZE = 32768
with open(destination, "wb") as f:
for chunk in response.iter_content(CHUNK_SIZE):
if chunk: # filter out keep-alive new chunks
f.write(chunk)
if __name__ == "__main__":
file_id = 'TAKE ID FROM SHAREABLE LINK'
destination = 'DESTINATION FILE ON YOUR DISK'
download_file_from_google_drive(file_id, destination)
The snipped does not use pydrive, nor the Google Drive SDK, though. It uses the requests module (which is, somehow, an alternative to urllib2).
When downloading large files from Google Drive, a single GET request is not sufficient. A second one is needed - see wget/curl large file from google drive.
I recommend gdown package.
pip install gdown
Take your share link
https://drive.google.com/file/d/0B9P1L--7Wd2vNm9zMTJWOGxobkU/view?usp=sharing
and grab the id - eg. 1TLNdIufzwesDbyr_nVTR7Zrx9oRHLM_N by pressing the download button (look for at the link), and swap it in after the id below.
import gdown
url = 'https://drive.google.com/uc?id=0B9P1L--7Wd2vNm9zMTJWOGxobkU'
output = '20150428_collected_images.tgz'
gdown.download(url, output, quiet=False)
Having had similar needs many times, I made an extra simple class GoogleDriveDownloader starting on the snippet from #user115202 above. You can find the source code here.
You can also install it through pip:
pip install googledrivedownloader
Then usage is as simple as:
from google_drive_downloader import GoogleDriveDownloader as gdd
gdd.download_file_from_google_drive(file_id='1iytA1n2z4go3uVCwE__vIKouTKyIDjEq',
dest_path='./data/mnist.zip',
unzip=True)
This snippet will download an archive shared in Google Drive. In this case 1iytA1n2z4go3uVCwE__vIKouTKyIDjEq is the id of the sharable link got from Google Drive.
Here's an easy way to do it with no third-party libraries and a service account.
pip install google-api-core and google-api-python-client
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload
from google.oauth2 import service_account
import io
credz = {} #put json credentials her from service account or the like
# More info: https://cloud.google.com/docs/authentication
credentials = service_account.Credentials.from_service_account_info(credz)
drive_service = build('drive', 'v3', credentials=credentials)
file_id = '0BwwA4oUTeiV1UVNwOHItT0xfa2M'
request = drive_service.files().get_media(fileId=file_id)
#fh = io.BytesIO() # this can be used to keep in memory
fh = io.FileIO('file.tar.gz', 'wb') # this can be used to write to disk
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
status, done = downloader.next_chunk()
print("Download %d%%." % int(status.progress() * 100))
PyDrive allows you to download a file with the function GetContentFile(). You can find the function's documentation here.
See example below:
# Initialize GoogleDriveFile instance with file id.
file_obj = drive.CreateFile({'id': '<your file ID here>'})
file_obj.GetContentFile('cats.png') # Download file as 'cats.png'.
This code assumes that you have an authenticated drive object, the docs on this can be found here and here.
In the general case this is done like so:
from pydrive.auth import GoogleAuth
gauth = GoogleAuth()
# Create local webserver which automatically handles authentication.
gauth.LocalWebserverAuth()
# Create GoogleDrive instance with authenticated GoogleAuth instance.
drive = GoogleDrive(gauth)
Info on silent authentication on a server can be found here and involves writing a settings.yaml (example: here) in which you save the authentication details.
There's in the docs a function that downloads a file when we provide an ID of the file to download,
from __future__ import print_function
import io
import google.auth
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from googleapiclient.http import MediaIoBaseDownload
def download_file(real_file_id):
"""Downloads a file
Args:
real_file_id: ID of the file to download
Returns : IO object with location.
Load pre-authorized user credentials from the environment.
TODO(developer) - See https://developers.google.com/identity
for guides on implementing OAuth2 for the application.
"""
creds, _ = google.auth.default()
try:
# create drive api client
service = build('drive', 'v3', credentials=creds)
file_id = real_file_id
# pylint: disable=maybe-no-member
request = service.files().get_media(fileId=file_id)
file = io.BytesIO()
downloader = MediaIoBaseDownload(file, request)
done = False
while done is False:
status, done = downloader.next_chunk()
print(F'Download {int(status.progress() * 100)}.')
except HttpError as error:
print(F'An error occurred: {error}')
file = None
return file.getvalue()
if __name__ == '__main__':
download_file(real_file_id='1KuPmvGq8yoYgbfW74OENMCB5H0n_2Jm9')
This bears the question:
How do we get the file ID to download the file?
Generally speaking, a URL from a shared file from Google Drive looks like this
https://drive.google.com/file/d/1HV6vf8pB-EYnjcJcH65eGZVMa2v2tcMh/view?usp=sharing
where 1HV6vf8pB-EYnjcJcH65eGZVMa2v2tcMh corresponds to fileID.
You can simply copy it from the URL or, if you prefer, it's also possible to create a function to get the fileID from the URL.
For instance, given the following url = https://drive.google.com/file/d/1HV6vf8pB-EYnjcJcH65eGZVMa2v2tcMh/view?usp=sharing,
def url_to_id(url):
x = url.split("/")
return x[5]
Printing x will give
['https:', '', 'drive.google.com', 'file', 'd', '1HV6vf8pB-EYnjcJcH65eGZVMa2v2tcMh', 'view?usp=sharing']
And so, as we want to return the 6th array value, we use x[5].
This has also been described above,
from pydrive.auth import GoogleAuth
gauth = GoogleAuth()
gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)
This creates its own server too do the dirty work of authenticating
file_obj = drive.CreateFile({'id': '<Put the file ID here>'})
file_obj.GetContentFile('Demo.txt')
This downloads the file
import requests
def download_file_from_google_drive(id, destination):
URL = "https://docs.google.com/uc?export=download"
session = requests.Session()
response = session.get(URL, params = { 'id' : id , 'confirm': 1 }, stream = True)
token = get_confirm_token(response)
if token:
params = { 'id' : id, 'confirm' : token }
response = session.get(URL, params = params, stream = True)
save_response_content(response, destination)
def get_confirm_token(response):
for key, value in response.cookies.items():
if key.startswith('download_warning'):
return value
return None
def save_response_content(response, destination):
CHUNK_SIZE = 32768
with open(destination, "wb") as f:
for chunk in response.iter_content(CHUNK_SIZE):
if chunk: # filter out keep-alive new chunks
f.write(chunk)
if __name__ == "__main__":
file_id = 'TAKE ID FROM SHAREABLE LINK'
destination = 'DESTINATION FILE ON YOUR DISK'
download_file_from_google_drive(file_id, destination)
Just repeating the accepted answer but adding confirm=1 parameter so it always downloads even if the file is too big
# Importing [PyDrive][1] OAuth
from pydrive.auth import GoogleAuth
def download_tracking_file_by_id(file_id, download_dir):
gauth = GoogleAuth(settings_file='../settings.yaml')
# Try to load saved client credentials
gauth.LoadCredentialsFile("../credentials.json")
if gauth.credentials is None:
# Authenticate if they're not there
gauth.LocalWebserverAuth()
elif gauth.access_token_expired:
# Refresh them if expired
gauth.Refresh()
else:
# Initialize the saved creds
gauth.Authorize()
# Save the current credentials to a file
gauth.SaveCredentialsFile("../credentials.json")
drive = GoogleDrive(gauth)
logger.debug("Trying to download file_id " + str(file_id))
file6 = drive.CreateFile({'id': file_id})
file6.GetContentFile(download_dir+'mapmob.zip')
zipfile.ZipFile(download_dir + 'test.zip').extractall(UNZIP_DIR)
tracking_data_location = download_dir + 'test.json'
return tracking_data_location
The above function downloads the file given the file_id to a specified downloads folder. Now the question remains, how to get the file_id? Simply split the url by id= to get the file_id.
file_id = url.split("id=")[1]
I tried using google Colaboratory: https://colab.research.google.com/
Suppose your sharable link is https://docs.google.com/spreadsheets/d/12hiI0NK7M0KEfscMfyBaLT9gxcZMleeu/edit?usp=sharing&ouid=102608702203033509854&rtpof=true&sd=true
all you need is id that is 12hiI0NK7M0KEfscMfyBaLT9gxcZMleeu
command in cell
!gdown 12hiI0NK7M0KEfscMfyBaLT9gxcZMleeu
run the cell and you will see that file is downloaded in /content/Amazon_Reviews.xlsx
Note: one should know how to use Google colab
This example is based on an similar to RayB, but keeps the file in memory
and is a little simpler, and you can paste it into colab and it works.
import googleapiclient.discovery
import oauth2client.client
from google.colab import auth
auth.authenticate_user()
def download_gdrive(id):
creds = oauth2client.client.GoogleCredentials.get_application_default()
service = googleapiclient.discovery.build('drive', 'v3', credentials=creds)
return service.files().get_media(fileId=id).execute()
a = download_gdrive("1F-yaQB8fdsfsdafm2l8WFjhEiYSHZrCcr")
You can install https://pypi.org/project/googleDriveFileDownloader/
pip install googleDriveFileDownloader
And download the file, here is the sample code to download
from googleDriveFileDownloader import googleDriveFileDownloader
a = googleDriveFileDownloader()
a.downloadFile("https://drive.google.com/uc?id=1O4x8rwGJAh8gRo8sjm0kuKFf6vCEm93G&export=download")

Categories