I was testing the Dropbox-provided API for Python. My goal was to read a spreadsheet in my Dropbox without downloading it to local storage.
import dropbox
dbx = dropbox.Dropbox('my-token')
print(dbx.users_get_current_account())
fl = dbx.files_get_preview('/CGPA.xlsx')[1] # returns a Response object
After the above code, fl.text (an attribute of the Response object, not a method) contains HTML showing the preview that a browser would display, and the data can be parsed from it.
My question is whether the SDK has a built-in method for getting particular information from the spreadsheet, such as the data of a row or a cell, preferably in JSON format. I previously used butterdb to extract data from a Google Drive spreadsheet. Is there similar functionality for Dropbox? I could not work it out from the docs: http://dropbox-sdk-python.readthedocs.io/en/master/
No, the Dropbox API doesn't offer the ability to selectively query parts of a spreadsheet file like this without downloading the whole file, but we'll consider it a feature request.
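Since the whole file has to be downloaded anyway, one workaround is to download it into memory rather than to disk and parse it there. Below is a minimal sketch, assuming the third-party openpyxl package is installed and that the first row of the sheet holds column headers. The helper names (rows_to_records, to_json, read_dropbox_xlsx) are made up for illustration and are not part of the Dropbox SDK.

```python
import io
import json


def rows_to_records(rows):
    """Turn [header_row, data_row, ...] into a list of dicts keyed by header."""
    rows = list(rows)
    if not rows:
        return []
    header = [str(h) for h in rows[0]]
    return [dict(zip(header, row)) for row in rows[1:]]


def to_json(records):
    """Serialize records as JSON; default=str covers dates and similar values."""
    return json.dumps(records, default=str)


def read_dropbox_xlsx(dbx, path):
    """Download an .xlsx file into memory and return its active sheet as records.

    Assumes `dbx` is a dropbox.Dropbox client and that openpyxl is installed.
    """
    from openpyxl import load_workbook  # third-party, imported lazily

    # files_download returns (metadata, response); the bytes never touch disk.
    _metadata, response = dbx.files_download(path)
    workbook = load_workbook(io.BytesIO(response.content),
                             read_only=True, data_only=True)
    sheet = workbook.active
    return rows_to_records(sheet.iter_rows(values_only=True))


# Usage (needs a real token and file):
#   import dropbox
#   dbx = dropbox.Dropbox('my-token')
#   print(to_json(read_dropbox_xlsx(dbx, '/CGPA.xlsx')))
```

This still transfers the entire file on every call, so it only avoids the local copy, not the download itself.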
Related
Can I read a Google spreadsheet that is open to the public but doesn't have a Share option? There's a discussion here, but it requires that I have authorization to click the Share option.
Even copying by URL to my own Google spreadsheet may serve the purpose.
Update:
The idea was that once I set up Google API credentials, I should be able to create a .json file containing a client email. In the Share option, I'm supposed to provide the client email from that .json file. See: Accessing Google Spreadsheet Data using Python.
This is the spreadsheet page where I'm not finding any Share option: https://docs.google.com/spreadsheets/d/e/2PACX-1vSc_2y5N0I67wDU38DjDh35IZSIS30rQf7_NYZhtYYGU1jJYT6_kDx4YpF-qw0LSlGsBYP8pqM_a1Pd/pubhtml#
Issue:
Publishing the contents of a spreadsheet to the web is not the same as making a spreadsheet public.
The URL you shared refers to spreadsheet contents that were published to the web following these steps. This published website is not the same as the original file where the data comes from, and so it doesn't have most of its functionalities, like a Share button (it doesn't make sense to have a Share button anyway, since this URL is already public).
Solution:
If you want to access the spreadsheet data using a Service Account, you would have to do one of the following (better to use method 1 if you have access to the spreadsheet):
1. Share the spreadsheet itself (not the published contents) with the Service Account, as explained in the link you referenced.
2. Use your application to fetch the website contents from the provided URL.
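For the second option, here is a rough sketch that uses only the standard library. It assumes that a published .../pubhtml link can be switched to CSV output by requesting .../pub?output=csv instead. That matches the CSV links Google generates in the publish dialog, but whether it works for a given document depends on how it was published, so treat it as an assumption. The function names are illustrative.

```python
import csv
import io
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def published_csv_url(pubhtml_url, gid=None):
    """Rewrite a published '.../pubhtml' URL into a '.../pub?output=csv' URL."""
    parts = urlsplit(pubhtml_url)
    if not parts.path.endswith("/pubhtml"):
        raise ValueError("expected a URL ending in /pubhtml")
    path = parts.path[: -len("/pubhtml")] + "/pub"
    if gid is None:
        query = "output=csv"
    else:
        # Select a single tab by its gid.
        query = f"gid={gid}&single=true&output=csv"
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))


def parse_csv(text):
    """Parse CSV text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(io.StringIO(text, newline="")))


def fetch_published_rows(pubhtml_url, gid=None):
    """Fetch the published sheet over HTTP and return its rows."""
    with urlopen(published_csv_url(pubhtml_url, gid)) as response:
        return parse_csv(response.read().decode("utf-8"))
```

No credentials are involved here, because the published URL is already public.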
Reference:
Make Google Docs, Sheets, Slides & Forms public
I have a working website made using Django. I also have a private GitHub repository containing Excel files that I want to read with pandas read_excel and use on the website. The repository is private because the data is company specific.
1) How do I read an Excel file using pandas from a private GitHub repository? Do I need to set up a personal access token?
2) After a user logs in to my website, is there a way to require a further password when they try to view their company-specific dataframe? For example, "User A" will only have access to "Dataframe A", and "User B" will only have access to "Dataframe B".
On my local system, the following code works to be able to read the dataframe:
import pandas as pd
file_path = 'C:/Users/james/Desktop/projects/path/to/excel/file'
df = pd.read_excel(file_path)
For my live website, my code which produces the problem is:
URL_path = 'https://github.com/path/to/excel/file/in/private/repository'
df = pd.read_excel(URL_path)
I am able to read the Excel files on my local computer, but when I try to read them from my private GitHub repository, I get the following error, even though I know I am using the correct URL:
urllib.error.HTTPError: HTTP Error 404: Not Found
I verified this by signing out of my GitHub account and opening the URL of the Excel file: it takes me to a 404 Not Found page because I am not logged in. When I log in to my GitHub account, the same URL takes me to the correct page.
You need to use a PAT (personal access token) from GitHub if the repo is set to private.
You would then need to get the raw URL of the file and make sure to decode the response properly before using pandas to read it.
Check out this tutorial here; it's using a csv but the idea is essentially the same:
https://medium.com/towards-entrepreneurship/importing-a-csv-file-from-github-in-a-jupyter-notebook-e2c28e7e74a5
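For an Excel file the same idea could look roughly like the sketch below. It goes through GitHub's REST "contents" endpoint with the raw media type, which is a swapped-in alternative to the raw URL used in the tutorial. The owner, repository, and path values are placeholders, the function names are hypothetical, and the token is assumed to have read access to the repository.

```python
import io
from urllib.parse import quote


def build_contents_request(owner, repo, path, token):
    """Return (url, headers) for fetching a file's raw bytes from a private repo."""
    url = (f"https://api.github.com/repos/{quote(owner)}/{quote(repo)}"
           f"/contents/{quote(path)}")
    headers = {
        "Authorization": f"Bearer {token}",           # the personal access token
        "Accept": "application/vnd.github.raw+json",  # ask for raw file content
    }
    return url, headers


def read_private_excel(owner, repo, path, token, **read_excel_kwargs):
    """Download the workbook into memory and hand it to pandas."""
    import pandas as pd  # third-party, imported lazily
    import requests

    url, headers = build_contents_request(owner, repo, path, token)
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()  # an unauthenticated request shows up as 404
    # Excel files are binary, so wrap the bytes instead of decoding them as text.
    return pd.read_excel(io.BytesIO(response.content), **read_excel_kwargs)
```

On a live site the token would normally come from an environment variable or a secrets store rather than from the source code.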
I wrote a small python program that works with data from a CSV file. I am tracking some numbers in a google sheet and I created the CSV file by downloading the google sheet. I am trying to find a way to have python read in the CSV file directly from google sheets, so that I do not have to download a new CSV when I update the spreadsheet.
I see that the requests library may be able to handle this, but I'm having a hard time figuring it out. I've chosen not to try the google APIs because this way seems simpler as long as I don't mind making the sheet public to those with the link, which is fine.
I've tried working with the requests documentation but I'm a novice programmer and I can't get it to read in as a CSV.
This is how the data is currently taken into python:
file = open('data1.csv', newline='')
reader = csv.reader(file)
I would like the file = open() to ideally be replaced by the requests library and pull directly from the spreadsheet.
You need to find the correct request URL that downloads the file.
Sample URL:
csv_url='https://docs.google.com/spreadsheets/d/169AMdEzYzH7NDY20RCcyf-JpxPSUaO0nC5JRUb8wwvc/export?format=csv&id=169AMdEzYzH7NDY20RCcyf-JpxPSUaO0nC5JRUb8wwvc&gid=0'
The way to find it is to download your file manually while inspecting the request URL in the Network tab of your browser's Developer Tools.
Then the following is enough:
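import requests as rs

csv_url = YOUR_CSV_DOWNLOAD_URL  # placeholder: the export URL found in the Network tab
res = rs.get(url=csv_url)
res.raise_for_status()  # stop here if the download failed
with open('google.csv', 'wb') as f:
    f.write(res.content)
This saves a CSV file named 'google.csv' in the current working directory (usually the folder of your Python script).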
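If the goal is to skip the file on disk entirely, the response can be fed straight into csv.reader. This is a minimal sketch, assuming the sheet is shared with "anyone with the link" and that the shorter export URL form below (sheet ID plus gid) behaves like the sample URL above. The function names are illustrative.

```python
import csv
import io


def export_csv_url(sheet_id, gid=0):
    """Build the CSV export URL for one tab of a Google Sheet."""
    return (f"https://docs.google.com/spreadsheets/d/{sheet_id}"
            f"/export?format=csv&gid={gid}")


def rows_from_csv_bytes(content, encoding="utf-8"):
    """Decode downloaded bytes and parse them as CSV rows."""
    text = content.decode(encoding)
    # newline="" lets the csv module handle line endings inside quoted fields.
    return list(csv.reader(io.StringIO(text, newline="")))


def fetch_rows(sheet_id, gid=0):
    """Download the sheet and return its rows without writing a file."""
    import requests  # third-party, imported lazily

    response = requests.get(export_csv_url(sheet_id, gid), timeout=30)
    response.raise_for_status()
    return rows_from_csv_bytes(response.content)
```

In the original script, the rows returned by fetch_rows would stand in for the reader created from open('data1.csv', newline='').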
import pandas as pd
import requests
YOUR_SHEET_ID=''
r = requests.get(f'https://docs.google.com/spreadsheet/ccc?key={YOUR_SHEET_ID}&output=csv')
open('dataset.csv', 'wb').write(r.content)
df = pd.read_csv('dataset.csv')
df.head()
I tried @adirmola's solution but I had to tweak it a little.
When he wrote "You need to find the correct URL request that download the file", he had a point. An easy solution is what I'm showing here: add "&output=csv" after your Google Sheet ID.
Hope it helps!
I'm not exactly sure about your usage scenario, and Adirmola already provided a very exact answer to your question, but my immediate question is why you want to download the CSV in the first place.
Google Sheets has a python library so you can just get the data from the GSheet directly.
You may also be interested in this answer since you're interested in watching for changes in GSheets
I would just like to say that using OAuth keys and the Google Python API is not always an option. I found the above quite useful for my current application.
I am trying to download a specific sheet from a spreadsheet (on Google Drive) but am unable to find a way to do so. I am using the Python client library (Drive API v3) and passing file_id and mimeType to the export_media() function as shown below:
request = service.files().export_media(fileId=file_id,mimeType='text/csv')
media_request = http.MediaIoBaseDownload(local_fd, request)
This code always exports the first sheet in the spreadsheet. Can you please describe a method for downloading a specific sheet (or sheets) by providing a gid or some other parameter?
I don't think the Drive API has a feature to specify a sheet name.
Two workarounds spring to mind...
You could use the Sheets API (https://developers.google.com/sheets/api/reference/rest/) and write your own csv formatter. It sounds more complex than it is. It's probably 10 lines of code, especially if you go for Tab Separated instead of Comma Separated.
Use the Google Spreadsheet File/Publish to the Web feature to publish a CSV of any given sheet. Note that the content will be public, so anybody with the link (which is pretty hard to guess) would be able to read the data.
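The first workaround could be sketched as follows. It assumes `service` was created with googleapiclient.discovery.build('sheets', 'v4', credentials=creds) and that passing the sheet name as the range returns the whole tab. The function names are made up for illustration.

```python
import csv
import io


def values_to_delimited(values, delimiter=","):
    """Format a list of rows as CSV (or TSV with delimiter='\\t')."""
    # The Sheets API omits trailing empty cells, so pad rows to equal width.
    width = max((len(row) for row in values), default=0)
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    for row in values:
        writer.writerow(list(row) + [""] * (width - len(row)))
    return buffer.getvalue()


def fetch_sheet_values(service, spreadsheet_id, sheet_name):
    """Read every populated cell of one named sheet via the Sheets API."""
    result = service.spreadsheets().values().get(
        spreadsheetId=spreadsheet_id, range=sheet_name
    ).execute()
    return result.get("values", [])


def export_sheet_as_csv(service, spreadsheet_id, sheet_name):
    """Fetch one sheet by name and return it as CSV text."""
    return values_to_delimited(
        fetch_sheet_values(service, spreadsheet_id, sheet_name))
```

Using the csv module instead of joining strings by hand keeps quoting correct for cells that contain commas or quotes.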
You can use an old visualization API URL (see other answer)
f'https://docs.google.com/spreadsheets/d/{doc_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}'
To make this request using the Google API Python library, you can use the credentials you already have and create an HTTP client instance yourself:
http_client = googleapiclient.discovery._auth.authorized_http(creds)
response, content = http_client.request(url)
Check response.status before you proceed.
Note that this API behaves a bit differently than your regular CSV export. Specifically there are some things I saw it does with headers - it will make them disappear if they are not set to Plain Text on a numeric column (see here), or merge multiple text rows appearing in the top of your sheet as a single header row.
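Putting the pieces of this answer together might look like the sketch below. It assumes `http_client` is the authorized client created as shown above, whose request() returns a (response, content) pair with the status code on response.status. The sheet name is URL-encoded because names with spaces or symbols would otherwise break the query string. The function names are hypothetical.

```python
from urllib.parse import quote


def gviz_csv_url(doc_id, sheet_name):
    """Build the visualization-API URL that returns one named sheet as CSV."""
    return (f"https://docs.google.com/spreadsheets/d/{doc_id}"
            f"/gviz/tq?tqx=out:csv&sheet={quote(sheet_name, safe='')}")


def fetch_csv(http_client, url):
    """Request the URL and return the body as text, failing on non-200 status."""
    response, content = http_client.request(url)
    if response.status != 200:
        raise RuntimeError(f"gviz request failed with HTTP {response.status}")
    return content.decode("utf-8")
```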
I have a simple script that gets a file from my Google Drive account and updates it. I have no problems authenticating and getting access to the drive. The file is a Google spreadsheet, so once I have the pydrive file object, I get the URL of the file in CSV format via google_file['exportLinks']['text/csv']. This has worked in the past, but today I tried the same method on a new file, and instead of CSV data I keep getting HTML. However, if I copy the link from google_file['exportLinks']['text/csv'] and paste it into a browser, the browser downloads the file in CSV format as requested. I really have no idea what is going on, especially since this has worked in the past.
Here is basically what my code does:
drive = GoogleDrive(gauth)
drivefiles = drive.ListFile().GetList()
form_file = None
for f in drivefiles:
    if f['title'] == formFileName:
        form_file = f
        break
output = requests.get(form_file['exportLinks']['text/csv'])
print(output.text)  # this ends up being HTML, not text/csv
Has anyone else out there seen this problem before? Should I just try to delete and re-add the google spreadsheet file on the drive?
EDIT: UPDATE
So, after changing the permissions on the file in Google Drive from accessible only to people the file has been shared with to accessible/editable by all, I was able to access and download the file. Does the clients_secret.json file allow a particular Google Drive user to authorize securely, or is it a general key that allows anyone who has it to access the Google API in that particular session? Is there any special data that has to be sent with the HTTP request if the file has only been shared with a limited set of email addresses?
I also found handling the idiosyncrasies of the Google Drive API a little bit tricky. I have written a wrapper around the Google Drive API that makes it relatively easy to deal with. See if it helps:
Google Drive Client
Sample code:
file_id = 'abc'
file_access_token = '34244324324'
scope = 'https://www.googleapis.com/auth/drive.readonly'
path_to_private_key_file = '#'
service_account_name = '284765467291-0qnqb1do03dlaj0crl88srh4pbhocj35@developer.gserviceaccount.com'
drive_client = GoogleDriveClient(
    scope = scope,
    private_key = open(path_to_private_key_file).read(),
    service_account_name = service_account_name)
file = drive_client.get_drive_file(file_id, file_access_token)
Of course, you need to change the access variables according to your profile.
In case anyone else has been having problems with this, changing the settings on the file in the drive to "everyone can view" solved the problem - seems that there were some permission restrictions. Would suggest that you look into changing the permissions on the file to the least restrictive setting (as possible) while debugging.
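An alternative to loosening the permissions is to send the OAuth access token with the request, since a plain requests.get() carries no credentials and Google answers with an HTML sign-in page for files that are not public. The sketch below assumes the token is available as gauth.credentials.access_token (an assumption about the pydrive setup) and that it has not expired. The function names are illustrative.

```python
def bearer_headers(access_token):
    """HTTP headers carrying an OAuth 2.0 access token."""
    return {"Authorization": f"Bearer {access_token}"}


def looks_like_html(content_type, body_start):
    """Heuristic: did we get a web page (e.g. a sign-in page) instead of CSV?"""
    if "text/html" in (content_type or "").lower():
        return True
    head = body_start.lstrip().lower()
    return head.startswith("<!doctype html") or head.startswith("<html")


def download_export(export_url, access_token):
    """Fetch an export link with credentials and return the CSV text."""
    import requests  # third-party, imported lazily

    response = requests.get(export_url, headers=bearer_headers(access_token),
                            timeout=30)
    response.raise_for_status()
    if looks_like_html(response.headers.get("Content-Type"), response.text[:200]):
        raise RuntimeError("Got HTML instead of CSV; the request was "
                           "probably not authorized for this file")
    return response.text


# Usage (assumes the pydrive objects from the question):
#   csv_text = download_export(form_file['exportLinks']['text/csv'],
#                              gauth.credentials.access_token)
```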