Google App Engine and Google Sheets exceeding soft memory limit - python

I'm writing a simple service to take data from a couple of sources, munge it together, and use the Google API client to send it to a Google Sheet. That part works fine, and the data is not that big.
The issue is that calling .spreadsheets() after building the api service (i.e. build('sheets', 'v4', http=auth).spreadsheets()) causes a memory jump of roughly 30 megabytes (I did some profiling to separate out where the memory was being allocated). When deployed to GAE, these spikes stick around for long stretches of time (hours at a time sometimes), creeping upwards and after several requests trigger GAE's 'Exceeded soft private memory limit' error.
I am using memcache for the discovery document and urlfetch for grabbing data, but those are the only other services I am using.
I have tried manual garbage collection, changing threadsafe in app.yaml, even things like changing the point at which .spreadsheets() is called, and can't shake this problem. It's also possible that I am simply misunderstanding something about GAE's architecture, but I know the spike is caused by the call to .spreadsheets() and I am not storing anything in local caches.
Is there a way to either 1) reduce the size of the memory spike from calling .spreadsheets() or 2) keep the spikes from staying around in memory (or, preferably, both)? A very simplified gist is below to give an idea of the API calls and request handler; I can give fuller code if needed. I know similar questions have been asked before, but I can't get it fixed.
https://gist.github.com/chill17/18f1caa897e6a20201232165aca05239

I ran into this when using the spreadsheets API on a small processor with only 20MB of usable RAM. The problem is that the Google API client pulls in the whole API description in string format and stores it as a resource object in memory.
If free memory is an issue, you should construct your own http object and make the desired request manually. See my Spreadsheet class as an example of how to create a new spreadsheet using this method.
import json
import os

import httplib2
from oauth2client import client, tools
from oauth2client.file import Storage

SCOPES = 'https://www.googleapis.com/auth/spreadsheets'
CLIENT_SECRET_FILE = 'client_secret.json'
APPLICATION_NAME = 'Google Sheets API Python Quickstart'

# Command-line flags for the OAuth flow, set up as in the Google quickstart
try:
    import argparse
    flags = argparse.ArgumentParser(parents=[tools.argparser]).parse_args()
except ImportError:
    flags = None


class Spreadsheet:
    def __init__(self, title):
        # Get credentials from a locally stored JSON file.
        # If the file does not exist, create it.
        self.credentials = self.getCredentials()
        # HTTP service that will be used to push/pull data
        self.service = httplib2.Http()
        self.service = self.credentials.authorize(self.service)
        self.headers = {'content-type': 'application/json', 'accept-encoding': 'gzip, deflate', 'accept': 'application/json', 'user-agent': 'google-api-python-client/1.6.2 (gzip)'}
        print("CREDENTIALS: "+str(self.credentials))
        self.baseUrl = "https://sheets.googleapis.com/v4/spreadsheets"
        self.spreadsheetInfo = self.create(title)
        self.spreadsheetId = self.spreadsheetInfo['spreadsheetId']

    def getCredentials(self):
        """Gets valid user credentials from storage.
        If nothing has been stored, or if the stored credentials are invalid,
        the OAuth2 flow is completed to obtain the new credentials.
        Returns:
            Credentials, the obtained credential.
        """
        home_dir = os.path.expanduser('~')
        credential_dir = os.path.join(home_dir, '.credentials')
        if not os.path.exists(credential_dir):
            os.makedirs(credential_dir)
        credential_path = os.path.join(credential_dir,
                                       'sheets.googleapis.com-python-quickstart.json')
        store = Storage(credential_path)
        credentials = store.get()
        if not credentials or credentials.invalid:
            flow = client.flow_from_clientsecrets(CLIENT_SECRET_FILE, SCOPES)
            flow.user_agent = APPLICATION_NAME
            if flags:
                credentials = tools.run_flow(flow, store, flags)
            else:  # Needed only for compatibility with Python 2.6
                credentials = tools.run(flow, store)
            print('Storing credentials to ' + credential_path)
        return credentials

    def create(self, title):
        # Only put the title in the request body... We don't need anything else for now
        requestBody = {
            "properties": {
                "title": title
            },
        }
        print("BODY: "+str(requestBody))
        url = self.baseUrl
        # Serialize with json.dumps: str() on a dict yields single-quoted
        # Python syntax, which is not valid JSON
        response, content = self.service.request(url,
                                                 method="POST",
                                                 headers=self.headers,
                                                 body=json.dumps(requestBody))
        print("\n\nRESPONSE\n"+str(response))
        print("\n\nCONTENT\n"+str(content))
        return json.loads(content)
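The same idea carries over to other calls. As a minimal sketch (not part of the original class), the helper below only builds the URL, headers, and JSON body for appending rows through the Sheets v4 values.append REST endpoint, without sending anything. The name build_append_request and its parameters are hypothetical; the actual request would still go through an authorized httplib2.Http object, as in create() above.

```python
import json
from urllib.parse import quote

BASE_URL = "https://sheets.googleapis.com/v4/spreadsheets"


def build_append_request(spreadsheet_id, cell_range, rows):
    """Return (url, headers, body) for a values.append call.

    Hypothetical helper: it only prepares the request, it does not send it.
    """
    url = "{}/{}/values/{}:append?valueInputOption=RAW".format(
        BASE_URL,
        quote(spreadsheet_id, safe=""),
        quote(cell_range, safe="!:"),  # keep A1 notation readable
    )
    headers = {"content-type": "application/json"}
    # The request body is a ValueRange; only "values" is set here
    body = json.dumps({"values": rows})
    return url, headers, body


url, headers, body = build_append_request(
    "SHEET_ID", "Sheet1!A1", [["a", 1], ["b", 2]])
print(url)
print(body)
```

With an authorized http object, the result could be passed along as http.request(url, method="POST", headers=headers, body=body), so the discovery-based client never has to be built.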

Related

How to download a file from Google Drive using Python and the Drive API v3

I have tried downloading a file from Google Drive to my local system using a Python script, but I get a "forbidden" error when running it. The script is as follows:
import requests
url = "https://www.googleapis.com/drive/v3/files/1wPxpQwvEEOu9whmVVJA9PzGPM2XvZvhj?alt=media&export=download"
querystring = {"alt":"media","export":"download"}
headers = {
'Authorization': "Bearer TOKEN",
'Host': "www.googleapis.com",
'Accept-Encoding': "gzip, deflate",
'Connection': "keep-alive",
}
response = requests.request("GET", url, headers=headers, params=querystring)
print(response.url)
#
import wget
import os
from os.path import expanduser
myhome = expanduser("/home/sunarcgautam/Music")
### set working dir
os.chdir(myhome)
url = "https://www.googleapis.com/drive/v3/files/1wPxpQwvEEOu9whmVVJA9PzGPM2XvZvhj?alt=media&export=download"
print('downloading ...')
wget.download(response.url)
This script fails with a "forbidden" error. Am I doing anything wrong in the script?
I have also tried another script that I found on a Google Developer page, which is as follows:
import auth
import httplib2
SCOPES = "https://www.googleapis.com/auth/drive.scripts"
CLIENT_SECRET_FILE = "client_secret.json"
APPLICATION_NAME = "test_Download"
authInst = auth.auth(SCOPES, CLIENT_SECRET_FILE, APPLICATION_NAME)
credentials = authInst.getCredentials()
http = credentials.authorize(httplib2.Http())
drive_service = discovery.build('drive', 'v3', http=http)
file_id = '1Af6vN0uXj8_qgqac6f23QSAiKYCTu9cA'
request = drive_service.files().export_media(fileId=file_id,
                                             mimeType='application/pdf')
fh = io.BytesIO()
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
    status, done = downloader.next_chunk()
    print ("Download %d%%." % int(status.progress() * 100))
This script gives me a URL mismatch error.
So what should be given as the redirect URL in the Google console credentials? Or is there any other solution for the issue? Do I have to authorise my Google console app with Google for both scripts? If so, what is the process for authorising the app? I haven't found any documentation on that.
To make requests to Google APIs, the workflow is in essence the following:
Go to developer console, log in if you haven't.
Create a Cloud Platform project.
Enable, for your project, the APIs you are interested in using with your project's apps (for example: Google Drive API).
Create and download OAuth 2.0 Client IDs credentials that will allow your app to gain authorization for using your enabled APIs.
Head over to the OAuth consent screen and add your scope (scope: https://www.googleapis.com/auth/drive.readonly for you). Choose Internal/External according to your needs, and for now ignore the warnings, if any.
To get a valid token for making API requests, the app goes through the OAuth flow to receive the authorization token (since it needs the user's consent).
During the OAuth flow the user is redirected to your OAuth consent screen, where they are asked to approve or deny access to your app's requested scopes.
If consent is given, your app will receive an authorization token.
Pass the token in your request to your authorized API endpoints.[2]
Build a Drive Service to make API requests (You will need the valid token)[1]
NOTE:
The available methods for the Files resource for Drive API v3 are here.
When using the Google APIs Client for Python, you can use export_media() or get_media(), as per the Google APIs Client for Python documentation.
IMPORTANT:
Also, check that the scope you are using actually allows you to do what you want (downloading files from the user's Drive) and set it accordingly. At the moment you have an incorrect scope for your goal. See OAuth 2.0 API Scopes.
Sample Code References:
Building a Drive Service:
import google_auth_oauthlib.flow
from google.auth.transport.requests import Request
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
class Auth:
    def __init__(self, client_secret_filename, scopes):
        self.client_secret = client_secret_filename
        self.scopes = scopes
        self.flow = google_auth_oauthlib.flow.Flow.from_client_secrets_file(self.client_secret, self.scopes)
        self.flow.redirect_uri = 'http://localhost:8080/'
        self.creds = None

    def get_credentials(self):
        flow = InstalledAppFlow.from_client_secrets_file(self.client_secret, self.scopes)
        self.creds = flow.run_local_server(port=8080)
        return self.creds

# The scope your app will use.
# (NEEDS to be among those enabled in your OAuth consent screen)
SCOPES = "https://www.googleapis.com/auth/drive.readonly"
CLIENT_SECRET_FILE = "credentials.json"
credentials = Auth(client_secret_filename=CLIENT_SECRET_FILE, scopes=SCOPES).get_credentials()
drive_service = build('drive', 'v3', credentials=credentials)
Making the request to export or get a file
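import io
import shutil
from googleapiclient.http import MediaIoBaseDownload

# file_id is the ID of the Drive file to export (not defined in this snippet).
# export_media() builds the media request that MediaIoBaseDownload consumes.
request = drive_service.files().export_media(fileId=file_id, mimeType='application/pdf')
fh = io.BytesIO()
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
    status, done = downloader.next_chunk()
    print("Download %d%%" % int(status.progress() * 100))
# The file has been downloaded into RAM, now save it in a file
fh.seek(0)
with open('your_filename.pdf', 'wb') as f:
    shutil.copyfileobj(fh, f, length=131072)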

Not able to regenerate token while in server environment for Oauth2.0 Google Drive API

I would like to regenerate a token while having no access to a browser on a server side environment.
I am having difficulties generating a new token when the current token expires during the Oauth2.0 authorization with the Google Drive API.
I have reached the point where my code prints a URL that has to be opened to obtain a code, which must then be entered back on the server.
I have tried looking around on StackOverflow but have been unsuccessful in finding a solution. Given below is the current state of my code:
def get_latest_token():
    store = file.Storage('token.json')
    creds = store.get()
    token_params = load_json_as_dict('token.json')
    if not creds or creds.access_token_expired:
        flow = client.flow_from_clientsecrets('credentials.json', 'https://www.googleapis.com/auth/drive')
        flags = tools.argparser.parse_args(args=[])
        creds = tools.run_flow(flow, store, flags)
    bearer_token = "Bearer " + load_json_as_dict('token.json')["access_token"]
    return bearer_token
Any help appreciated.

When does a Credentials object become invalidated?

I am playing with a Python script that is based on https://developers.google.com/drive/v3/web/quickstart/python and it works fine. I can upload simple text files to my Drive account.
The code on that page is as follows:
# If modifying these scopes, delete your previously saved credentials
# at ~/.credentials/drive-python-quickstart.json
SCOPES = 'https://www.googleapis.com/auth/drive.metadata.readonly'
CLIENT_SECRET_FILE = 'client_secret.json'
APPLICATION_NAME = 'Drive API Python Quickstart'
def get_credentials():
    """Gets valid user credentials from storage.
    If nothing has been stored, or if the stored credentials are invalid,
    the OAuth2 flow is completed to obtain the new credentials.
    Returns:
        Credentials, the obtained credential.
    """
    home_dir = os.path.expanduser('~')
    credential_dir = os.path.join(home_dir, '.credentials')
    if not os.path.exists(credential_dir):
        os.makedirs(credential_dir)
    credential_path = os.path.join(credential_dir,
                                   'drive-python-quickstart.json')
    store = Storage(credential_path)
    credentials = store.get()
    if not credentials or credentials.invalid:
        flow = client.flow_from_clientsecrets(CLIENT_SECRET_FILE, SCOPES)
        flow.user_agent = APPLICATION_NAME
        if flags:
            credentials = tools.run_flow(flow, store, flags)
        else: # Needed only for compatibility with Python 2.6
            credentials = tools.run(flow, store)
        print('Storing credentials to ' + credential_path)
    return credentials
Suppose that the script is executed once, resulting in 'drive-python-quickstart.json' file being saved with something like this (X's replacing sensitive information of course):
{"_module": "oauth2client.client",
"scopes": ["https://www.googleapis.com/auth/drive.file"],
"token_expiry": "2016-11-13T07:15:15Z",
"id_token": null,
"access_token": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
"token_uri": "https://accounts.google.com/o/oauth2/token",
"invalid": false,
"token_response": {"access_token": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
"token_type": "Bearer",
"expires_in": 3600,
"refresh_token": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"},
"client_id": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.apps.googleusercontent.com",
"token_info_uri": "https://www.googleapis.com/oauth2/v3/tokeninfo",
"client_secret": "XXXXXXXXXXXXXXXXXXXXXXXX",
"revoke_uri": "https://accounts.google.com/o/oauth2/revoke",
"_class": "OAuth2Credentials",
"refresh_token": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
"user_agent": null}
Let's suppose that the 'drive-python-quickstart.json' file always exists and is both readable and writable. Suppose some time passes, and the script executes again at some time after the time given by the "token_expiry" key in that JSON value. Is it expected that something detects that the time has expired on the Credentials object, forcing the credentials object to switch into an invalid state, which means that credentials.invalid then becomes True? Or is it the case that the existence of the "refresh_token" field implies that something in the API will automatically update the 'drive-python-quickstart.json' file such that credentials.invalid always returns False?
The Google python client library will refresh the access token as needed as long as your refresh token is good. To be clear, the client library is used to access the API. The API has no control over your authentication. It expects you, or rather the client library, to send it the information it requires in order for it to work.
Top Tip: Refresh tokens that aren't used for six months will also expire, so I recommend you run your script at least once every six months.
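To make "expired" concrete, here is a minimal sketch that only parses the token_expiry value from a stored credentials file like the one in the question and compares it with a UTC timestamp. The helper name access_token_expired is hypothetical, and this is not how the client library itself is implemented; it performs its own check and refresh. As far as I know, an expired access token is tracked separately from the invalid flag, which is why the sample file can be past its expiry and still say "invalid": false.

```python
import json
from datetime import datetime, timezone


def access_token_expired(credentials_json, now=None):
    """Return True if the stored token_expiry is not in the future.

    Hypothetical helper for illustration; it reads only the
    "token_expiry" field of an oauth2client-style credentials file.
    """
    data = json.loads(credentials_json)
    expiry = datetime.strptime(
        data["token_expiry"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expiry


stored = '{"token_expiry": "2016-11-13T07:15:15Z", "invalid": false}'
before = datetime(2016, 11, 13, 7, 0, 0, tzinfo=timezone.utc)
after = datetime(2016, 11, 13, 8, 0, 0, tzinfo=timezone.utc)
print(access_token_expired(stored, before))  # False
print(access_token_expired(stored, after))   # True
```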

People API returns no connections when authenticating via Service Account [duplicate]

I'm trying to programmatically access the list of contacts on my own personal Google Account using the Python Client Library
This is a script that will run on a server without user input, so I have it set up to use credentials from a Service Account I set up. My Google API console setup looks like this.
I'm using the following basic script, pulled from the examples provided in the API docs -
import json
from httplib2 import Http
from oauth2client.service_account import ServiceAccountCredentials
from apiclient.discovery import build
# Only need read-only access
scopes = ['https://www.googleapis.com/auth/contacts.readonly']
# JSON file downloaded from Google API Console when creating the service account
credentials = ServiceAccountCredentials.from_json_keyfile_name(
'keep-in-touch-5d3ebc885d4c.json', scopes)
# Build the API Service
service = build('people', 'v1', credentials=credentials)
# Query for the results
results = service.people().connections().list(resourceName='people/me').execute()
# The result set is a dictionary and should contain the key 'connections'
connections = results.get('connections', [])
print connections #=> [] - empty!
When I hit the API it returns a result set without any 'connections' key. Specifically it returns -
>>> results
{u'nextSyncToken': u'CNP66PXjKhIBMRj-EioECAAQAQ'}
Is there something pertaining to my setup or code that's incorrect? Is there a way to see the response HTTP status code or get any further detail about what it's trying to do?
Thanks!
Side note: When I try it using the "Try it!" feature in the API docs, it correctly returns my contacts. Although I doubt that uses the client library and instead relies on user authorization via OAuth
The personFields mask is required. Specify one or more valid paths. Valid paths are documented at https://developers.google.com/people/api/rest/v1/people.connections/list/.
Additionally, use fields mask to specify which fields are included in a partial response.
Instead of:
results = service.people().connections().list(resourceName='people/me').execute()
... try:
results = service.people().connections().list(resourceName='people/me', personFields='names,emailAddresses', fields='connections,totalItems,nextSyncToken').execute()
Here is a working demo. I just tested it right now. Python 3.5.2
google-api-python-client==1.6.4
httplib2==0.10.3
oauth2client==4.1.2
You can save it to demo.py and then just run it. I left the create_contact function in case you might want to use it and have one more example on the API usage.
CLIENT_ID and CLIENT_SECRET are environment variables so I don't accidentally share that in code.
"""Google API stuff."""
import httplib2
import json
import os
from apiclient.discovery import build
from oauth2client.file import Storage
from oauth2client.client import OAuth2WebServerFlow
from oauth2client.tools import run_flow
CLIENT_ID = os.environ['CLIENT_ID']
CLIENT_SECRET = os.environ['CLIENT_SECRET']
SCOPE = 'https://www.googleapis.com/auth/contacts'
USER_AGENT = 'JugDemoStackOverflow/v0.1'
def make_flow():
    """Make flow."""
    flow = OAuth2WebServerFlow(
        client_id=CLIENT_ID,
        client_secret=CLIENT_SECRET,
        scope=SCOPE,
        user_agent=USER_AGENT,
    )
    return flow


def get_people():
    """Return a people_service."""
    flow = make_flow()
    storage = Storage('info.dat')
    credentials = storage.get()
    if credentials is None or credentials.invalid:
        credentials = run_flow(flow, storage)
    http = httplib2.Http()
    http = credentials.authorize(http)
    people_service = build(serviceName='people', version='v1', http=http)
    return people_service


def create_contact(people, user):
    """Create a Google Contact."""
    request = people.createContact(
        body={
            'names': [{'givenName': user.name}],
            'phoneNumbers': [
                {'canonicalForm': user.phone, 'value': user.phone}],
        }
    )
    return request.execute()


def demo():
    """Demonstrate getting contacts from Google People."""
    people_service = get_people()
    people = people_service.people()
    connections = people.connections().list(
        resourceName='people/me',
        personFields='names,emailAddresses,phoneNumbers',
        pageSize=2000,
    )
    result = connections.execute()
    s = json.dumps(result)
    # with open('contacts.json', 'w') as f:
    #     f.write(s)
    return s


if __name__ == '__main__':
    print(demo())
With a service account and G Suite domain-wide delegation (DwD), it is necessary to impersonate (delegate to) a user, like this:
delegate = credentials.create_delegated('user#xxxx.xxx')
For fellow googlers: I have the same problem using the JS API.
I succeeded with my personal Gmail address, but not with my work one (G Suite) or with my secondary Gmail address.
I can't see the pattern. It's possible that the work one has contact listing deactivated.

Getting WebViewLinks with Google Drive

I've just started trying to use the Google Drive API. Using the quickstart guide I set up the authentication, I can print a list of my files and I can even make copies. All that works great, however I'm having trouble trying to access data from a file on Drive. In particular, I'm trying to get a WebViewLink, however when I call .get I receive only a small dictionary that has barely any of the file's metadata. The documentation makes it look like all the data should just be there by default but it's not appearing. I couldn't find any way to flag for requesting any additional information.
credentials = get_credentials()
http = credentials.authorize(httplib2.Http())
service = discovery.build('drive', 'v3', http=http)
results = service.files().list(fields="nextPageToken, files(id, name)").execute()
items = results.get('files', [])
if not items:
    print('No files found.')
else:
    print('Files:')
    for item in items:
        print(item['name'], item['id'])
        if "Target File" in item['name']:
            d = service.files().get(fileId=item['id']).execute()
            print(repr(d))
This is the output of the above code: (the formatting is my doing)
{u'mimeType': u'application/vnd.google-apps.document',
u'kind': u'drive#file',
u'id': u'1VO9cC8mGM67onVYx3_2f-SYzLJPR4_LteQzILdWJgDE',
u'name': u'Fix TVP Licence Issues'}
For anyone confused about the code: the missing parts are just the basic get_credentials function from the API's quickstart page plus some constants and imports. For completeness, here is all of that, unmodified from my code:
from __future__ import print_function
import httplib2
import os
from apiclient import discovery
import oauth2client
from oauth2client import client
from oauth2client import tools
SCOPES = 'https://www.googleapis.com/auth/drive'
CLIENT_SECRET_FILE = 'client_secret.json'
APPLICATION_NAME = 'Drive API Python Quickstart'
try:
    import argparse
    flags = argparse.ArgumentParser(parents=[tools.argparser]).parse_args()
except ImportError:
    flags = None

def get_credentials():
    """Gets valid user credentials from storage.
    If nothing has been stored, or if the stored credentials are invalid,
    the OAuth2 flow is completed to obtain the new credentials.
    Returns:
        Credentials, the obtained credential.
    """
    home_dir = os.path.expanduser('~')
    credential_dir = os.path.join(home_dir, '.credentials')
    if not os.path.exists(credential_dir):
        os.makedirs(credential_dir)
    credential_path = os.path.join(credential_dir,
                                   'drive-python-quickstart.json')
    store = oauth2client.file.Storage(credential_path)
    credentials = store.get()
    if not credentials or credentials.invalid:
        flow = client.flow_from_clientsecrets(CLIENT_SECRET_FILE, SCOPES)
        flow.user_agent = APPLICATION_NAME
        if flags:
            credentials = tools.run_flow(flow, store, flags)
        else: # Needed only for compatibility with Python 2.6
            credentials = tools.run(flow, store)
        print('Storing credentials to ' + credential_path)
    return credentials
So what's missing, how can I get the API to return all that extra meta data that's just not appearing right now?
You are very close. With the newer version of the Drive API v3, to retrieve other metadata properties, you will have to add the fields parameter to specify additional properties to include in a partial response.
In your case, since you are looking to retrieve the webViewLink property, your request should look something like this:
results = service.files().list(
    pageSize=10, fields="nextPageToken, files(id, name, webViewLink)").execute()
To display your items from the response:
items = results.get('files', [])
for item in items:
    print('{0} {1} {2}'.format(item['name'], item['id'], item['webViewLink']))
I also suggest trying it out with the API Explorer, so you can see which additional metadata properties you would like to display in your response.
Good Luck and Hope this helps ! :)
You explicitly request only the id and name fields in your files.list call. Add webViewLink to the list: results = service.files().list(fields="nextPageToken, files(id, name, webViewLink)").execute(). To retrieve all metadata, use files/*. For more information about this performance optimization, see Working with partial resources in the Google Drive docs.
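If you build such requests in several places, a small helper keeps the fields string consistent. This is only a sketch; the function name list_fields is hypothetical and it simply formats the same string used above.

```python
def list_fields(*file_fields):
    """Build a files.list fields string from per-file property names.

    Hypothetical helper: it only formats a string and does not check
    that the names are valid Drive file properties.
    """
    return "nextPageToken, files({})".format(", ".join(file_fields))


fields = list_fields("id", "name", "webViewLink")
print(fields)
```

The result can then be passed as service.files().list(fields=fields).execute().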
I have written a custom function to help with getting a sharable web link given a file/folder id. More information can be gotten here
def get_webViewLink_by_id(spreadsheet_id):
    sharable_link_response = drive_service.files().get(fileId=spreadsheet_id, fields='webViewLink').execute()
    return sharable_link_response['webViewLink']

print(get_webViewLink_by_id(spreadsheet_id='10Ik3qXK4wseva20lNGUKUTBzKoywaugi6XOmRUoP-4A'))
