Uploading pandas dataframe to google spreadsheet - python

I followed the steps here and here but couldn't upload a pandas dataframe to google sheets.
First I tried the following code:
import gspread
from google.oauth2.service_account import Credentials
scope = ['https://spreadsheets.google.com/feeds',
'https://www.googleapis.com/auth/drive']
credentials = Credentials.from_service_account_file('my_json_file_name.json', scopes=scope)
gc = gspread.authorize(credentials)
spreadsheet_key = '1FNMkcPz3aLCaWIbrC51lgJyuDFhe2KEixTX1lsdUjOY'
wks_name = 'Sheet1'
d2g.upload(df_qrt, spreadsheet_key, wks_name, credentials=credentials, row_names=True)
The above code returns this error: AttributeError: module 'df2gspread' has no attribute 'upload'. That doesn't make sense to me, since df2gspread does have a function called upload.
Second, I tried to append my data to a sheet in which I had only entered the column names by hand. This didn't work either and produced no result.
import gspread_dataframe as gd
ws = gc.open("name_of_file").worksheet("Sheet1")
existing = gd.get_as_dataframe(ws)
updated = existing.append(df_qrt)
gd.set_with_dataframe(ws, updated)
Any help will be appreciated, thanks!

You are not importing the package properly. The upload function lives in the df2gspread submodule of the package, so import it like this:
from df2gspread import df2gspread as d2g
When you convert a worksheet to a dataframe using
existing = gd.get_as_dataframe(ws)
all the blank columns and rows in the sheet become part of the dataframe, filled with NaN. When you then try to append another dataframe, it isn't appended as expected because the columns don't match.
Instead, try this to convert the worksheet to a dataframe:
existing = pd.DataFrame(ws.get_all_records())
When you export a dataframe to Google Sheets, the index of the dataframe may be stored in the first column (it happened in my case, but I can't be sure it always does).
If the first column is the index, it comes back with an empty header, and you can remove it using
existing.drop([''],axis=1,inplace=True)
Then the append will work properly. Note that DataFrame.append was removed in pandas 2.0, so use pd.concat there:
updated = pd.concat([existing, df_qrt])
gd.set_with_dataframe(ws, updated)
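The steps above can be sketched as one small helper. This is only an illustration, assuming the existing rows arrive as the list of dicts that ws.get_all_records() returns; the name combine_records and the sample data are hypothetical and not part of gspread or pandas.

```python
import pandas as pd


def combine_records(existing_records, new_df):
    """Stack the rows already in the sheet with the rows of new_df."""
    existing = pd.DataFrame(existing_records)
    # An exported index may come back as a column with an empty header.
    if "" in existing.columns:
        existing = existing.drop(columns=[""])
    # pd.concat replaces DataFrame.append, which pandas 2.0 removed.
    return pd.concat([existing, new_df], ignore_index=True)


# Stand-in for ws.get_all_records(); no network access is needed here.
records = [{"": 0, "quarter": "Q1", "sales": 10}]
df_qrt = pd.DataFrame({"quarter": ["Q2"], "sales": [20]})
updated = combine_records(records, df_qrt)
```

The combined frame can then be written back with gd.set_with_dataframe(ws, updated), as in the answer.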

Related

Reading GoogleSheet with pandas dataframe doing search on it

Do I need read_excel to load a Google Sheet so that I can search its columns in Python?
I must gather data from an entire Google Sheet file. I need to select a sheet by name first, then gather information by looking up values in its columns.
I started by looking at the two popular solutions on the internet.
The first uses the gspread package. Since it relies on service_account.json credentials, I will not use it.
The second is appropriate for me, but it only shows how to export as a CSV file. I need to get the data as an XLSX file.
The code is below:
import pandas as pd
sheet_id=" url "
sheet_name="sample_1"
url=f"https://docs.google...d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
I have both the sheet_id and the sheet_name, but I need to export as an XLSX file.
Here is an example of how to read an Excel file. Is there a way to read a Google spreadsheet the same way, as an Excel file?
Using Pandas to pd.read_excel() for multiple worksheets of the same workbook
xls = pd.ExcelFile('excel_file_path.xls')
# Now you can list all sheets in the file
xls.sheet_names
# ['house', 'house_extra', ...]
# to read just one sheet to dataframe:
df = pd.read_excel(file_name, sheet_name="house")
I have no problem reading a google sheet using the method I found here:
Python Read In Google Spreadsheet Using Pandas
spreadsheet_id = "<INSERT YOUR GOOGLE SHEET ID HERE>"
url = f"https://docs.google.com/spreadsheets/d/{spreadsheet_id}/export?format=csv"
df = pd.read_csv(url)
df.to_excel("my_sheet.xlsx")
You need to set the permissions of your sheet though. I found that setting it to "anyone with a link" worked.
UPDATE - based on comments below
If your spreadsheet has multiple tabs and you want to read anything other than the first sheet, you need to specify a sheetID as described here
spreadsheet_id = "<INSERT YOUR GOOGLE spreadsheetId HERE>"
sheet_id = "<INSERT YOUR GOOGLE sheetId HERE>"
url = f"https://docs.google.com/spreadsheets/d/{spreadsheet_id}/export?gid={sheet_id}&format=csv"
df = pd.read_csv(url)
df.to_excel("my_sheet.xlsx")
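The two URL forms above differ only in the optional gid parameter, so they can be built by one function. A minimal sketch; the helper name csv_export_url and the placeholder IDs are made up for illustration, and the sheet still has to be shared as described above for pd.read_csv(url) to succeed.

```python
def csv_export_url(spreadsheet_id, gid=None):
    """Build the CSV export URL for a shared Google Sheet.

    Without gid the first tab is exported; with gid that tab is exported.
    """
    base = f"https://docs.google.com/spreadsheets/d/{spreadsheet_id}/export"
    if gid is None:
        return f"{base}?format=csv"
    return f"{base}?gid={gid}&format=csv"


# Placeholder IDs, not real documents.
first_tab_url = csv_export_url("SPREADSHEET_ID")
other_tab_url = csv_export_url("SPREADSHEET_ID", gid="12345")
```

Either URL can be passed straight to pd.read_csv.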

Read GoogleSheet with multiple sheets into pandas

I want to read a Google Sheet with multiple sheets into one (or several) pandas dataframes.
I don't know the sheet names, or the number of sheets in advance.
The trivial attempt fails:
def main():
    path = r"https://docs.google.com/spreadsheets/d/1-MlSisrAxhOyKhrz6S08PG68j667Ym7jGExOyytpCSM/edit?usp=sharing"
    pd.read_excel(path)
fails with
ValueError: Excel file format cannot be determined, you must specify an engine manually.
Trying any format doesn't work.
All answers to this question refer to .csv, meaning a single sheet, or knowing the sheet name in advance.
Same goes for the 1st Google hit for "read google sheet python pandas".
Is there a standard way of doing this?
If your Spreadsheet is publicly shared, how about the following sample script?
Sample script:
import openpyxl
import pandas as pd
import requests
from io import BytesIO
spreadsheetId = "###" # Please set your Spreadsheet ID.
url = "https://docs.google.com/spreadsheets/export?exportFormat=xlsx&id=" + spreadsheetId
res = requests.get(url)
data = BytesIO(res.content)
xlsx = openpyxl.load_workbook(filename=data)
for name in xlsx.sheetnames:
    values = pd.read_excel(data, sheet_name=name)
    # do something
In this sample script, the publicly shared Spreadsheet is exported as XLSX data. The exported data is opened, the sheet names are retrieved, and each sheet is then read into a dataframe.
If you want to retrieve only specific sheets, please filter the sheet names from xlsx.sheetnames.
Note:
If your Spreadsheet is not publicly shared, this thread might be useful. Ref
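As a possible variation on the script above, pandas can load every sheet in one call when sheet_name=None is passed, which returns a dict keyed by sheet name. A minimal sketch, assuming the XLSX bytes have already been downloaded (for example res.content from the script above) and that openpyxl is installed; the helper name read_all_sheets is made up for illustration.

```python
from io import BytesIO

import pandas as pd


def read_all_sheets(xlsx_bytes):
    """Return {sheet name: DataFrame} for every sheet in an XLSX payload."""
    # sheet_name=None asks pandas to load all sheets instead of only the first.
    return pd.read_excel(BytesIO(xlsx_bytes), sheet_name=None, engine="openpyxl")
```

With the script above, frames = read_all_sheets(res.content) would give one dataframe per sheet without knowing the names in advance.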

Writing a Json file, cell by cell into a google spreadsheet

I am in the process of automating a process, in which I need to upload some data to a Google spreadsheet.
The data is originally located in a pandas dataframe, which is converted to a JSON file for upload.
I get as far as the upload, but all the data ends up in each cell: cell A1 contains all the data from the entire pandas dataframe, and in fact every cell in the spreadsheet contains all the data :/
Of course, what I want to have happen is to place what is cell A1 in the dataframe, as A1 in the Google spreadsheet and so forth down to cell J173.
I am thinking I need to put in some sort of loop to make this happen, but I am not sure how JSON files work, so I am not succeeding in creating this loop.
I hope one of you can help
Below is the code
# Converting data to a json file for upload
csv_data = csv_data.to_json()
# Updating data
cell_list = sheet.range('A1:J173')
for cell in cell_list:
    cell.value = csv_data
sheet.update_cells(cell_list)
Windows 10
Python 3.8
From your script, I understood your situation as follows:
You want to put the data of a dataframe into a Google Spreadsheet.
In your script, csv_data in csv_data.to_json() is the dataframe.
You want to achieve this using gspread with Python.
You have already been able to get and put values for a Google Spreadsheet using the Sheets API.
Pattern 1:
In this pattern, the method of values_update of gspread is used.
Sample script:
import gspread
spreadsheetId = "###"  # Please set the Spreadsheet ID.
sheetName = "Sheet1"  # Please set the sheet name.
# csv_data must be your dataframe here (not the output of to_json()).
client = gspread.authorize(credentials)
spreadsheet = client.open_by_key(spreadsheetId)  # open the target Spreadsheet
values = [csv_data.columns.values.tolist()]  # header row
values.extend(csv_data.values.tolist())  # one list per data row
spreadsheet.values_update(sheetName, params={'valueInputOption': 'USER_ENTERED'}, body={'values': values})
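The key difference from the question's loop is the shape of the data: to_json() returns a single string, which the loop then assigns to every cell, whereas the Sheets API expects a list of rows. The conversion used in Pattern 1 can be tried on its own without any Google credentials. A minimal sketch; the helper name dataframe_to_rows and the sample data are made up for illustration.

```python
import pandas as pd


def dataframe_to_rows(df):
    """Convert a DataFrame into a list of rows, header first."""
    header = [df.columns.tolist()]
    # df.values.tolist() yields one inner list per row of the frame.
    return header + df.values.tolist()


df = pd.DataFrame({"name": ["a", "b"], "score": [1, 2]})
rows = dataframe_to_rows(df)
```

Each inner list becomes one spreadsheet row, so the first cell of the first inner list lands in A1, the second in B1, and so on.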
Pattern 2:
In this pattern, the library of gspread-dataframe is used.
Sample script:
from gspread_dataframe import set_with_dataframe # Please add this.
spreadsheetId = "###" # Please set the Spreadsheet ID.
sheetName = "Sheet1" # Please set the sheet name.
# csv_data must be your dataframe here (not the output of to_json()).
client = gspread.authorize(credentials)
spreadsheet = client.open_by_key(spreadsheetId)
worksheet = spreadsheet.worksheet(sheetName)
set_with_dataframe(worksheet, csv_data)
References:
values_update
gspread-dataframe

Import every worksheet in an excel workbook and save to a dataframe named by the worksheet name

I have an excel workbook with 3 worksheets, they are called "Z_scores", "Alpha" and "Rho" respectively.
In the future, this workbook will increase as the number of models and their corresponding parameters are stored here.
In my function I am looking to import each worksheet individually and save it to a dataframe, the name of the dataframe should be decided by the name of the worksheet.
So far, I have this function but I am not able to dynamically name the dataframe and I am unsure what should be written in the return statement
FYI: The import identifier is simply a way of filtering worksheet names. Worksheets whose names begin with the identifier should not be imported, e.g. putting a single blank space at the beginning of a worksheet name will prevent that worksheet from being imported.
#import libraries
import pandas as pd
#define function
def import_excel(filename, import_identifier):
    #Create dataframe of the excel
    df = pd.read_excel('Excel.xlsx')
    # this will read the first sheet into df
    xls = pd.ExcelFile('Excel.xlsx')
    #Delete all worksheet that begin with the import_identifier
    worksheets = []
    for x in all_worksheets:
        if x[0] != import_identifier:
            worksheets.append(x)
    #Loop through the sheets which are flagged for importing and import each
    #sheet individually into a dataframe
    for sheetname in worksheets:
        #Encase the sheetname in quotation marks to satisfy the sheetname function in read_excel
        sheetname_macro_str = '"{}"'.format(sheetname_macro)
        #Import the workbook and save to dynamically named dataframe
        sheetname_macro = pd.read_excel(xls, sheetname=sheetname_macro_str)
    #What would I return here, how do I ensure the data frames are stored?
    #return
As you can read in this thread, a DataFrame object can't reliably be "named". Usually, the Python variable to which the object is assigned will be what describes or differentiates it.
If you're looking to store references to multiple DataFrames in your code, you'll probably want to create a list, tuple, or dictionary for that (outside the scope of your import function). If you use a dictionary, then you can use your worksheet names as keys:
dataframes = {}
dataframes[friendly_sheetname] = dataframe_from_sheet
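Applied to the function in the question, the dictionary approach might look like the sketch below. It is one possible arrangement, not the only one: the helper select_sheet_names is a made-up name, the identifier is assumed to be a non-empty prefix such as a single space, and reading .xlsx files assumes openpyxl is installed.

```python
import pandas as pd


def select_sheet_names(all_names, import_identifier):
    """Keep the sheet names that do not start with import_identifier."""
    return [name for name in all_names if not name.startswith(import_identifier)]


def import_excel(filename, import_identifier):
    """Return {sheet name: DataFrame} for every sheet flagged for import."""
    xls = pd.ExcelFile(filename)
    wanted = select_sheet_names(xls.sheet_names, import_identifier)
    # The worksheet name is the dictionary key, so no dynamic variable
    # names are needed.
    return {name: xls.parse(name) for name in wanted}
```

A call such as dataframes = import_excel('Excel.xlsx', ' ') would then give access to each sheet as dataframes['Z_scores'], dataframes['Alpha'], and so on, and new worksheets are picked up without changing the code.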

GSPREAD: How do I fetch the last added worksheet from a spreadsheet having many worksheets?

Using gspread, I know how to access a sheet by name, id or index, like:
import gspread
gc = gspread.authorize(credentials)
worksheet = sh.worksheet("January")
or
worksheet = sh.sheet1
But I was wondering whether it is possible to open the last added or last updated sheet?
It's not possible to get the last modification time of each individual worksheet, because the modification time is fetched through Google Drive, which tracks the spreadsheet file as a whole.
It's possible to obtain the last modification time of the entire spreadsheet using lastUpdateTime:
import gspread
sa = gspread.service_account('authentication')
sa.open("spreadsheet name").lastUpdateTime
