Getting Google Sheets Data into Redshift - python

I'm trying to get data that lives within a Google Sheet into our Redshift database. I was able to follow the directions from this link: https://www.twilio.com/blog/2017/02/an-easy-way-to-read-and-write-to-a-google-spreadsheet-in-python.html
Is it possible to have it pull data from the most recently added Google Sheets within a folder (instead of just specifying a single sheet) and write it to the Redshift table?
Here is what I used to read the Google Sheets data into Python:
import gspread
from oauth2client.service_account import ServiceAccountCredentials
# use creds to create a client to interact with the Google Drive API
scope = ['https://spreadsheets.google.com/feeds']
creds = ServiceAccountCredentials.from_json_keyfile_name('client_secret.json', scope)
client = gspread.authorize(creds)
# Find a workbook by name and open the first sheet
# Make sure you use the right name here.
sheet = client.open("Copy of Legislators 2017").sheet1
# Extract and print all of the values
list_of_hashes = sheet.get_all_records()
print(list_of_hashes)

You can use the Drive API to query for files added within a given timeframe that are of a specific type. All the search parameters and syntax for such a query are listed here.
from googleapiclient.discovery import build

# Build the Drive service (creds is an authorized credentials object)
drive = build('drive', 'v3', credentials=creds)

# Query for recent files, with the stipulation that their MIME type contains "spreadsheet"
query = "mimeType contains 'spreadsheet' and modifiedTime > '"
query += someDateAsUTC_inRFC_3339_String + "'"

# Execute the query and page through every result
all_files = []
request = drive.files().list(q=query, fields='nextPageToken, files(id, name)')
while request is not None:
    resp = request.execute()
    all_files.extend(resp.get('files', []))
    # list_next() returns None once there are no more pages
    request = drive.files().list_next(request, resp)
# Done - all_files now holds the matching spreadsheets; consume them here
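To tie this back to the original question, the IDs returned above can then be opened with gspread and the rows pushed into Redshift. This is only a minimal sketch, not a production loader: it assumes the psycopg2 driver, a hypothetical cluster endpoint and target table (my_schema.legislators with three text columns), and data small enough for row-by-row INSERTs; for larger volumes the usual pattern is to stage the rows in S3 and COPY them into Redshift.

import psycopg2  # Redshift speaks the PostgreSQL wire protocol

# `client` is the authorized gspread client from the question and
# `all_files` is the list of {'id': ..., 'name': ...} dicts from the Drive query.
conn = psycopg2.connect(
    host='my-cluster.example.us-east-1.redshift.amazonaws.com',  # hypothetical endpoint
    port=5439, dbname='mydb', user='myuser', password='mypassword')
cur = conn.cursor()

for f in all_files:
    sheet = client.open_by_key(f['id']).sheet1
    rows = sheet.get_all_values()  # first row is the header
    for row in rows[1:]:
        # Hypothetical table and columns - adjust to match your sheet headers
        cur.execute(
            "INSERT INTO my_schema.legislators (first_name, last_name, state) VALUES (%s, %s, %s)",
            row[:3])

conn.commit()
cur.close()
conn.close()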

Related

Find all sheet IDs in a drive

In their documentation, Google shows an example of how to read a sheet using their API. It requires you to manually find the Google Sheet ID first from the URL. However, I would like to just give a Google Drive directory and find the sheet IDs of all the Google Sheets in that drive. How can I do that?
Code tried:
from google.oauth2 import service_account
from googleapiclient.discovery import build
sheetMymeType = "application/vnd.google-apps.spreadsheet"
parent = PARENTFOLDER_ID
SERVICE_ACCOUNT_FILE = 'keys.json'
SCOPES = ['https://www.googleapis.com/auth/drive.readonly']
credentials = None
credentials = service_account.Credentials.from_service_account_file(
    SERVICE_ACCOUNT_FILE, scopes=SCOPES)
drive = build('drive', 'v3', credentials=credentials)
q = "mimeType = '{}' and parents = '{}'".format(sheetMymeType, parent)
list = drive.files().list(q=q, fields='files(id)').execute()
print(list)
In your case you need to use the Google Drive API, specifically the Files.list method.
As you can see in the documentation for Search files and folders, you can supply a q parameter in order to search for a particular type of file. As your goal is to give a Google Drive directory and find the sheet IDs of all the Google Sheets in that drive, you need to use two query terms:
mimeType, in your case application/vnd.google-apps.spreadsheet (you can review the full list of Google Workspace and Drive MIME types here)
parents, with the ID of the folder that you want to use.
And, as you need to retrieve the files that satisfy both requirements, you need to combine them with the query operator and.
sample.py
def get_all_sheets_in_folder():
    # `creds` is assumed to be set up as shown in quickstart.py
    drive = build('drive', 'v3', credentials=creds)
    sheet_mime_type = "application/vnd.google-apps.spreadsheet"
    parent = 'SOME_FOLDER_ID'
    # Correct query syntax for folders: '<folderId>' in parents
    q = "mimeType = '{}' and '{}' in parents".format(sheet_mime_type, parent)
    response = drive.files().list(q=q, fields='files(id)').execute()
    print(response)
    return response
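As a quick usage sketch (assuming the function above returns the response, as in the version shown here), the IDs sit in the files list of the returned dict:

response = get_all_sheets_in_folder()
for f in response.get('files', []):
    print(f['id'])  # each entry looks like {'id': '...'}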
I recommend you review the quickstart.py in order to know how to set up your project.

Accessing Microsoft SharePoint excel file using python

Can someone explain how I can access my company's shared data stored in Microsoft SharePoint, using Python?
How do I create a connection to the particular SharePoint location where my company's data is stored, using Python?
I need to get some Excel files from a SharePoint folder into a pandas DataFrame, so to do that I first need to create a connection to my company's SharePoint. I have referred to different documents but still couldn't find the correct way to do this task.
If someone can tell me the steps for this task, I can work from there.
Try this :
from azure.identity import ClientSecretCredential
import pandas as pd
import requests

TENANT_ID = ''
CLIENT = ''
KEY = ''
siteId = ''
itemId = ''
tempPath = 'd:/home/test.csv'

cred = ClientSecretCredential(
    client_id=CLIENT,
    client_secret=KEY,
    tenant_id=TENANT_ID
)
access_token = cred.get_token("https://graph.microsoft.com/.default").token

# Download the CSV to a local file
reqFileURL = 'https://graph.microsoft.com/v1.0/sites/%s/drive/items/%s/content' % (siteId, itemId)
fileContent = requests.get(url=reqFileURL, headers={'Authorization': 'Bearer ' + access_token})
with open(tempPath, 'wb') as f:
    f.write(fileContent.content)

data = pd.read_csv(tempPath)
print(data)
Result: the downloaded CSV content is printed as a pandas DataFrame.
Basically, I use this MS Graph API to download CSV content.
If you are not sure how to get the CSV's itemId, see this doc.
Please note, before you run this demo, you should make sure that your service principal has been granted the permissions indicated in the API doc.
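Since the question is actually about Excel files, the same Graph content endpoint also works for an .xlsx item; below is a minimal sketch under that assumption. It reuses the siteId and access_token from above, points at a hypothetical xlsxItemId, and reads the bytes straight into pandas.read_excel (which needs the openpyxl engine installed).

import io

import pandas as pd
import requests

xlsxItemId = ''  # hypothetical itemId of an .xlsx file
reqFileURL = 'https://graph.microsoft.com/v1.0/sites/%s/drive/items/%s/content' % (siteId, xlsxItemId)
resp = requests.get(url=reqFileURL, headers={'Authorization': 'Bearer ' + access_token})

# Read the workbook directly from memory (pip install openpyxl)
df = pd.read_excel(io.BytesIO(resp.content))
print(df)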

Is there a way to use loaded Service Account to access google sheets in Cloud Run?

I am currently looking for a way to read data from a Sheet in Python from a GCP Cloud Run instance. A service account (to which I shared the sheet) is loaded during the deployment of the instance and I was wondering if it was possible to use this service account directly without loading its key, as the instance is already using the account.
If you have any idea how to do it, I'd be happy to hear it.
Thanks
A service account has an email address, like any user account. If you need to access a user's Google Sheet document to perform some processing on the data, you can simply share the sheet with the service account's email. Set it to "Viewer" for read-only access or "Editor" if you also want to update the sheet.
Then in your code, you need to create a credential with the Cloud Run service account (not a key that you have on your side). The important part here is to correctly scope the credential.
And then you can use the Sheets API to interact with the document.
from googleapiclient.discovery import build
import google.auth

SCOPES = ['https://www.googleapis.com/auth/spreadsheets.readonly', 'https://www.googleapis.com/auth/cloud-platform']
default_credentials, project_id = google.auth.default(scopes=SCOPES)

# The ID and range of a sample spreadsheet.
SAMPLE_SPREADSHEET_ID = 'YOUR DOCUMENT ID'
SAMPLE_RANGE_NAME = 'A1:C1'

service = build('sheets', 'v4', credentials=default_credentials)
sheet = service.spreadsheets()
result = sheet.values().get(spreadsheetId=SAMPLE_SPREADSHEET_ID,
                            range=SAMPLE_RANGE_NAME).execute()
values = result.get('values', [])

to_ret = "Result \n"
if not values:
    to_ret += "\n" + 'No data found.'
    print('No data found.')
else:
    to_ret += "\n" + 'Results:'
    print('Results:')
    for row in values:
        to_ret += "\n" + row[0]
        print(row)

How to create a sheet under a specific folder with google API for python?

I am able to create a sheet with the code below in the root of ‘My Drive’ but how do I create the sheet under a folder in “My Drive” or “Shared drives”?
from googleapiclient.discovery import build
from pprint import pprint

service = build('sheets', 'v4', credentials=creds)
sheet = service.spreadsheets()
body = {}
results = sheet.create(body=body).execute()
pprint(results)
You want to create a new Spreadsheet in a specific folder.
You want to achieve this using google-api-python-client with Python.
If my understanding is correct, how about this answer?
Issue:
Unfortunately, at the current stage, a new Spreadsheet cannot be directly created in a specific folder of Google Drive using the Sheets API. In this case, the Drive API is required.
Sample script:
Before you run the script, please set the folder ID.
Pattern 1:
In this pattern, the new Spreadsheet is created directly in the specific folder in your Google Drive. In order to create the Spreadsheet, the mimeType of application/vnd.google-apps.spreadsheet is used.
Script:
drive = build('drive', 'v3', credentials=creds)
file_metadata = {
    'name': 'sampleName',
    'parents': ['### folderId ###'],
    'mimeType': 'application/vnd.google-apps.spreadsheet',
}
res = drive.files().create(body=file_metadata).execute()
print(res)
Pattern 2:
In this pattern, after the new Spreadsheet is created by Sheets API, the Spreadsheet is moved to the specific folder in your Google Drive.
Script:
# Create Spreadsheet to the root folder.
service = build('sheets', 'v4', credentials=creds)
sheet = service.spreadsheets()
body = {}
results = sheet.create(body=body).execute()
pprint(results)
# Move the created Spreadsheet to the specific folder.
drive = build('drive', 'v3', credentials=creds)
folderId = '### folderId ###'
res = drive.files().update(fileId=results['spreadsheetId'], addParents=folderId, removeParents='root').execute()
print(res)
Note:
For both samples, please add a scope of https://www.googleapis.com/auth/drive. When the scope is added, please remove the previously created credential file (which includes the refresh token) and authorize again. By doing this, the additional scope is reflected in the refresh token (a short sketch of the scope list is shown after these notes).
If you want to use the shared Drive, please modify as follows.
For pattern 1
file_metadata = {'name': 'sampleName','parents': ['### folderId ###'],'mimeType': 'application/vnd.google-apps.spreadsheet','driveId': "###"}
res = drive.files().create(body=file_metadata, supportsAllDrives=True).execute()
For pattern 2
res = drive.files().update(fileId=results['spreadsheetId'], body={'driveId': "###"}, addParents=folderId, removeParents='root', supportsAllDrives=True).execute()
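As a small sketch of the scope note above (assuming the OAuth flow from the Python quickstart), the scope list would include both the Sheets and the Drive scopes:

# Both scopes are needed: Sheets to create the Spreadsheet, Drive to move it into the folder.
SCOPES = [
    'https://www.googleapis.com/auth/spreadsheets',
    'https://www.googleapis.com/auth/drive',
]
# After changing SCOPES, delete the saved credential file (the one holding the
# refresh token) and run the authorization flow again so the new scope takes effect.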
References:
Files: create
Files: update
If I misunderstood your question and this was not the direction you want, I apologize.

I want to download a file from Google Drive using the Drive API

In this code:
import io

from googleapiclient.http import MediaIoBaseDownload

file_id = '0BwwA4oUTeiV1UVNwOHItT0xfa2M'
request = drive_service.files().get_media(fileId=file_id)
fh = io.BytesIO()
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
    status, done = downloader.next_chunk()
    print("Download %d%%." % int(status.progress() * 100))
I don't know how to get file_id. I got the file_id while uploading, but now I am not able to figure out how to get the file_id of a file that is already present on Google Drive.
For example, if my uploaded file has the name A001002.pdf, how can I get the file ID for this file?
There are some references online which I am not able to understand.
link: files.list
Any help?
The files.list method accepts a q parameter which is used for searching:
GET https://www.googleapis.com/drive/v3/files?q=name+%3D+'hello'&key={YOUR_API_KEY}
Python Guess
"""
Shows basic usage of the Drive v3 API.
Creates a Drive v3 API service and prints the names and ids of the last 10 files
the user has access to.
"""
from __future__ import print_function
from apiclient.discovery import build
from httplib2 import Http
from oauth2client import file, client, tools
# Setup the Drive v3 API
SCOPES = 'https://www.googleapis.com/auth/drive.metadata.readonly'
store = file.Storage('credentials.json')
creds = store.get()
if not creds or creds.invalid:
flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
creds = tools.run_flow(flow, store)
service = build('drive', 'v3', http=creds.authorize(Http()))
# Call the Drive v3 API
results = service.files().list(
pageSize=10, fields="*").execute()
items = results.get('files', [])
if not items:
print('No files found.')
else:
print('Files:')
for item in items:
print('{0} ({1})'.format(item['name'], item['id']))
Note this example does not show how to add the additional q parameter; I am still Googling that, but I am not a Python dev, so you may know more about how to do that than I do.
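For completeness, here is a minimal sketch of passing that q parameter from Python, reusing the service object built above and assuming the file name from the question (A001002.pdf):

# Search Drive for files whose name matches the uploaded file's name
results = service.files().list(
    q="name = 'A001002.pdf'",
    fields="files(id, name)").execute()
for item in results.get('files', []):
    print('{0} ({1})'.format(item['name'], item['id']))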
Depending on your implementation, there is one more alternative. In case you do not need to get the fileID programmatically, you can just open the file in Google Docs from the browser and the ID is shown in the URL.
