I am automating a process in which I need to upload some data to a Google spreadsheet.
The data originally lives in a pandas dataframe, which I convert to JSON for upload.
The upload runs, but every cell receives all of the data: cell A1 contains the entire pandas dataframe, and so does every other cell in the spreadsheet.
What I want is for the value at A1 in the dataframe to land in A1 of the Google spreadsheet, and so forth down to cell J173.
I think I need some sort of loop to make this happen, but I am not sure how JSON works, so I have not managed to write it.
I hope one of you can help.
Below is the code:
#Converting data to a json file for upload
csv_data = csv_data.to_json()
#Updating data
cell_list = sheet.range('A1:J173')
for cell in cell_list:
    cell.value = csv_data
sheet.update_cells(cell_list)
Windows 10
Python 3.8
You want to put the data of a dataframe into a Google Spreadsheet.
In your script, csv_data in csv_data.to_json() is the dataframe.
You want to achieve this using gspread with Python.
This is how I understood your script.
You have already been able to get and put values for Google Spreadsheet using the Sheets API.
Pattern 1:
This pattern uses the values_update method of gspread.
Sample script:
spreadsheetId = "###" # Please set the Spreadsheet ID.
sheetName = "Sheet1" # Please set the sheet name.
csv_data = # <--- please set the dataframe.
client = gspread.authorize(credentials)
spreadsheet = client.open_by_key(spreadsheetId)
values = [csv_data.columns.values.tolist()]
values.extend(csv_data.values.tolist())
spreadsheet.values_update(sheetName, params={'valueInputOption': 'USER_ENTERED'}, body={'values': values})
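For context on why every cell received the whole dataset in the question: to_json() returns one single string, whereas the values body used above is a list of rows. A minimal sketch with a made-up two-column dataframe (the helper name dataframe_to_values is only illustrative, not part of gspread):

```python
import pandas as pd


def dataframe_to_values(df):
    """Return the header row followed by the data rows as a list of lists."""
    values = [df.columns.tolist()]
    values.extend(df.values.tolist())
    return values


# Made-up sample data, standing in for csv_data.
df = pd.DataFrame({"name": ["a", "b"], "qty": [1, 2]})

as_json = df.to_json()             # one string describing the whole frame
values = dataframe_to_values(df)   # one inner list per spreadsheet row
```

Assigning as_json to cell.value puts the same string in every cell, which matches the behaviour described in the question; values keeps one entry per cell.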
Pattern 2:
This pattern uses the gspread-dataframe library.
Sample script:
from gspread_dataframe import set_with_dataframe # Please add this.
spreadsheetId = "###" # Please set the Spreadsheet ID.
sheetName = "Sheet1" # Please set the sheet name.
csv_data = # <--- please set the dataframe.
client = gspread.authorize(credentials)
spreadsheet = client.open_by_key(spreadsheetId)
worksheet = spreadsheet.worksheet(sheetName)
set_with_dataframe(worksheet, csv_data)
References:
values_update
gspread-dataframe
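If you would rather keep the loop from the question, each cell needs its own value instead of the whole dataset. The sketch below assumes the cells expose 1-based row and col attributes and a writable value, as gspread Cell objects do; FakeCell and fill_cells are hypothetical stand-ins so that the example runs without a spreadsheet:

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class FakeCell:
    """Hypothetical stand-in for a gspread Cell (1-based row and col)."""
    row: int
    col: int
    value: object = ""


def fill_cells(cell_list, df):
    """Give each cell the matching value from the header row plus data rows."""
    values = [df.columns.tolist()] + df.values.tolist()
    for cell in cell_list:
        r, c = cell.row - 1, cell.col - 1
        # Cells outside the dataframe area are left untouched.
        if r < len(values) and c < len(values[r]):
            cell.value = values[r][c]
    return cell_list


df = pd.DataFrame({"name": ["a", "b"], "qty": [1, 2]})
# Equivalent of sheet.range('A1:B3'): cells listed row by row.
cells = [FakeCell(r, c) for r in range(1, 4) for c in range(1, 3)]
fill_cells(cells, df)
```

With real gspread objects, the filled list would then be passed to sheet.update_cells(cell_list), as in the question.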
Do I need read_excel GoogleSheet for doing further search action on its columns in Python?
I must gather data from an entire Google Sheets file. I first need to select a sheet by name, then gather information by looking up values in its columns.
I started by looking at the two popular solutions on the internet:
The first uses the gspread package. As it relies on service_account.json info, I will not use it.
The second is appropriate for me, but it shows how to export as a CSV file. I need the data as an XLSX file.
The code is below:
import pandas as pd
sheet_id=" url "
sheet_name="sample_1"
url=f"https://docs.google...d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
I have both sheet_id and sheet_name, but I need to export as an XLSX file.
Here I see an example of how to read an Excel file. Is there a way to read a Google spreadsheet the same way, as an Excel file?
Using Pandas to pd.read_excel() for multiple worksheets of the same workbook
xls = pd.ExcelFile('excel_file_path.xls')
# Now you can list all sheets in the file
xls.sheet_names
# ['house', 'house_extra', ...]
# to read just one sheet to dataframe:
df = pd.read_excel(xls, sheet_name="house")
I have no problem reading a google sheet using the method I found here:
Python Read In Google Spreadsheet Using Pandas
spreadsheet_id = "<INSERT YOUR GOOGLE SHEET ID HERE>"
url = f"https://docs.google.com/spreadsheets/d/{spreadsheet_id}/export?format=csv"
df = pd.read_csv(url)
df.to_excel("my_sheet.xlsx")
You need to set the permissions of your sheet though. I found that setting it to "anyone with a link" worked.
UPDATE - based on comments below
If your spreadsheet has multiple tabs and you want to read anything other than the first sheet, you need to specify a sheetID as described here
spreadsheet_id = "<INSERT YOUR GOOGLE spreadsheetId HERE>"
sheet_id = "<INSERT YOUR GOOGLE sheetId HERE>"
url = f"https://docs.google.com/spreadsheets/d/{spreadsheet_id}/export?gid={sheet_id}&format=csv"
df = pd.read_csv(url)
df.to_excel("my_sheet.xlsx")
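The two URL shapes above differ only in the optional gid parameter, so they can be built by one small helper. This is a sketch, not an official API: build_export_url is a hypothetical name, and it assumes the sheet is shared so that the export link is reachable.

```python
def build_export_url(spreadsheet_id, fmt="csv", gid=None):
    """Build the export URL for a shared Google spreadsheet.

    gid is the numeric sheetId of one tab; when omitted, the first sheet
    is exported for CSV.
    """
    url = (
        "https://docs.google.com/spreadsheets/d/"
        f"{spreadsheet_id}/export?format={fmt}"
    )
    if gid is not None:
        url += f"&gid={gid}"
    return url


first_sheet_url = build_export_url("abc")
second_tab_url = build_export_url("abc", gid=0)
```

The resulting string can be passed to pd.read_csv(url) exactly as in the snippets above.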
I want to copy a sheet from one Google spreadsheet to another, keeping the data and formatting intact but not the formulas.
I just want to copy the cell values, i.e. the raw data, to the other spreadsheet.
I am using the Google Sheets API:
spreadsheets().sheets().copyTo(spreadsheetId=spreadsheet_id, sheetId=sheet_id, body={'destination_spreadsheet_id': target_spreadsheet})
but this copies the formulas and throws an error.
I believe your goal is as follows.
You want to copy a sheet in a source Spreadsheet to a destination Spreadsheet.
You want to remove the formulas while the cell formats and the cell values are kept.
You want to achieve this using googleapis for python.
In this case, how about the following patterns?
Pattern 1:
In this pattern, the script you showed is modified.
service = build("sheets", "v4", credentials=creds) # Please use your client.
srcSpreadsheetId = "###" # Please set source Spreadsheet ID.
srcSheetId = "###" # Please set source sheet ID.
dstSpreadsheetId = "###" # Please set destination Spreadsheet ID.
res = service.spreadsheets().sheets().copyTo(spreadsheetId=srcSpreadsheetId, sheetId=srcSheetId, body={"destination_spreadsheet_id": dstSpreadsheetId}).execute()
service.spreadsheets().batchUpdate(spreadsheetId=dstSpreadsheetId, body={"requests": [{"copyPaste": {"source": {"sheetId": res["sheetId"]},"destination": {"sheetId": res["sheetId"]},"pasteType": "PASTE_VALUES"}}]}).execute()
When this script is run, the sheet in the source Spreadsheet is first copied to the destination Spreadsheet, formulas included. The batchUpdate method then pastes the copied sheet onto itself with PASTE_VALUES, which replaces the formulas with their values. As a result, the cell formats and values are kept without the formulas.
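The batchUpdate body in that script is easier to read when built by a small function. The sketch below only constructs the request dictionary and sends nothing; paste_values_request is a hypothetical helper name, and the sheet ID is a placeholder.

```python
def paste_values_request(sheet_id):
    """Build a copyPaste request that pastes a sheet onto itself as values."""
    # A range holding only sheetId has no row or column bounds,
    # so it covers the whole sheet.
    grid_range = {"sheetId": sheet_id}
    return {
        "copyPaste": {
            "source": dict(grid_range),
            "destination": dict(grid_range),
            "pasteType": "PASTE_VALUES",
        }
    }


# Placeholder sheet ID; in the script above this would be res["sheetId"].
body = {"requests": [paste_values_request(123456789)]}
```

The body would then be passed as body=body to service.spreadsheets().batchUpdate(...), as in the script.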
Pattern 2:
This pattern changes the copy process. With the script above, if the source sheet contains formulas that refer to other sheets or other Spreadsheets, the copied sheet ends up with no values for them. Unfortunately, I couldn't confirm from your question whether this applies, so I would also like to propose pattern 2.
import time
service = build("sheets", "v4", credentials=creds) # Please use your client.
srcSpreadsheetId = "###" # Please set source Spreadsheet ID.
srcSheetId = "###" # Please set source sheet ID.
dstSpreadsheetId = "###" # Please set destination Spreadsheet ID.
# 1. Duplicate the source sheet in the source Spreadsheet as a temporary sheet.
newSheetId = "123456789"
service.spreadsheets().batchUpdate(spreadsheetId=srcSpreadsheetId,body={"requests": [{"duplicateSheet": {"sourceSheetId": srcSheetId,"newSheetId": newSheetId}}]}).execute()
time.sleep(3)
# 2. Remove formulas.
service.spreadsheets().batchUpdate(spreadsheetId=srcSpreadsheetId,body={"requests": [{"copyPaste": {"source": {"sheetId": newSheetId},"destination": {"sheetId": newSheetId},"pasteType": "PASTE_VALUES"}}]}).execute()
# 3. Copy the temporary sheet from the source Spreadsheet to the destination Spreadsheet.
service.spreadsheets().sheets().copyTo(spreadsheetId=srcSpreadsheetId,sheetId=newSheetId,body={"destination_spreadsheet_id": dstSpreadsheetId}).execute()
# 4. Delete the temporary sheet from the source Spreadsheet.
service.spreadsheets().batchUpdate(spreadsheetId=srcSpreadsheetId,body={"requests": [{"deleteSheet": {"sheetId": newSheetId}}]}).execute()
When this script is run, the following flow is executed:
Duplicate the source sheet in the source Spreadsheet as a temporary sheet.
Remove the formulas.
Copy the temporary sheet from the source Spreadsheet to the destination Spreadsheet.
Delete the temporary sheet from the source Spreadsheet.
If the copied sheet is missing values that come from formulas, please increase the wait time in time.sleep(3).
References:
Method: spreadsheets.sheets.copyTo
Method: spreadsheets.batchUpdate
You could use SpreadsheetApp:
const destination = SpreadsheetApp.openById("id1")
const source = SpreadsheetApp.openById("id2")
source.getSheetByName("sheet-name").copyTo(destination)
I want to read google sheet with multiple sheets into a (or several) pandas dataframe.
I don't know the sheet names, or the number of sheets in advance.
The trivial attempt fails:
def main():
    path = r"https://docs.google.com/spreadsheets/d/1-MlSisrAxhOyKhrz6S08PG68j667Ym7jGExOyytpCSM/edit?usp=sharing"
    pd.read_excel(path)
fails with
ValueError: Excel file format cannot be determined, you must specify an engine manually.
Trying any format doesn't work.
All answers to this question refer to .csv, meaning a single sheet, or knowing the sheet name in advance.
Same goes for the 1st Google hit for "read google sheet python pandas".
Is there a standard way of doing this?
When your Spreadsheet is publicly shared, in your situation, how about the following sample script?
Sample script:
import openpyxl
import pandas as pd
import requests
from io import BytesIO
spreadsheetId = "###" # Please set your Spreadsheet ID.
url = "https://docs.google.com/spreadsheets/export?exportFormat=xlsx&id=" + spreadsheetId
res = requests.get(url)
data = BytesIO(res.content)
xlsx = openpyxl.load_workbook(filename=data)
for name in xlsx.sheetnames:
    values = pd.read_excel(data, sheet_name=name)
    # do something
In this sample script, the publicly shared Spreadsheet is exported as XLSX data. The exported XLSX data is then opened and the sheet names are retrieved, and each sheet is read into a dataframe.
If you want to retrieve only specific sheets, please filter the sheet names from xlsx.sheetnames.
Note:
If your Spreadsheet is not publicly shared, this thread might be useful. Ref
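As an alternative to looping over the sheet names, pandas can return every sheet at once: pd.read_excel with sheet_name=None gives a dict that maps each sheet name to a dataframe. The sketch below builds a small workbook in memory as a stand-in for the downloaded res.content, so it runs without network access; read_all_sheets is a hypothetical helper name, and openpyxl must be installed.

```python
from io import BytesIO

import pandas as pd


def read_all_sheets(xlsx_bytes):
    """Return {sheet_name: DataFrame} for every sheet in an XLSX payload."""
    return pd.read_excel(BytesIO(xlsx_bytes), sheet_name=None)


# Stand-in for res.content: a two-sheet workbook built in memory.
buffer = BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    pd.DataFrame({"a": [1, 2]}).to_excel(writer, sheet_name="first", index=False)
    pd.DataFrame({"b": [3]}).to_excel(writer, sheet_name="second", index=False)

frames = read_all_sheets(buffer.getvalue())
for name, frame in frames.items():
    print(name, frame.shape)
```

With the sample script above, read_all_sheets(res.content) would take the place of the openpyxl loop.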
I followed the steps here and here but couldn't upload a pandas dataframe to google sheets.
First I tried the following code:
import gspread
from google.oauth2.service_account import Credentials
scope = ['https://spreadsheets.google.com/feeds',
         'https://www.googleapis.com/auth/drive']
credentials = Credentials.from_service_account_file('my_json_file_name.json', scopes=scope)
gc = gspread.authorize(credentials)
spreadsheet_key = '1FNMkcPz3aLCaWIbrC51lgJyuDFhe2KEixTX1lsdUjOY'
wks_name = 'Sheet1'
d2g.upload(df_qrt, spreadsheet_key, wks_name, credentials=credentials, row_names=True)
The above code returns this error message: AttributeError: module 'df2gspread' has no attribute 'upload'. This doesn't make sense to me, since df2gspread does have a function called upload.
Second, I tried to append my data to a table that I created manually in the Google sheet by just entering the column names. This also didn't work and produced no results.
import gspread_dataframe as gd
ws = gc.open("name_of_file").worksheet("Sheet1")
existing = gd.get_as_dataframe(ws)
updated = existing.append(df_qrt)
gd.set_with_dataframe(ws, updated)
Any help will be appreciated, thanks!
You are not importing the package properly.
Just do this
from df2gspread import df2gspread as d2g
When you convert a worksheet to Dataframe using
existing = gd.get_as_dataframe(ws)
All the blank columns and rows in the sheet become part of the dataframe with NaN values, so when you try to append another dataframe to it, the rows are not appended as expected because the columns are mismatched.
Instead, try this to convert the worksheet to a dataframe:
existing = pd.DataFrame(ws.get_all_records())
When you export a dataframe to Google Sheets, the index of the dataframe is stored in the first column (it happened in my case, I can't be sure it always does).
If the first column is the index, you can remove it using
existing.drop([''],axis=1,inplace=True)
Then this will work properly.
updated = existing.append(df_qrt)
gd.set_with_dataframe(ws, updated)
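Note that DataFrame.append was removed in pandas 2.0, so on current pandas versions the append step needs pd.concat instead. A minimal sketch with made-up data, which also drops the all-NaN rows and columns that get_as_dataframe can return for blank cells (the column names here are assumptions for illustration):

```python
import pandas as pd

# Made-up stand-in for gd.get_as_dataframe(ws): one real row, plus a
# blank row and a blank column coming from empty cells in the sheet.
existing_raw = pd.DataFrame(
    {
        "quarter": ["Q1", None],
        "revenue": [100.0, None],
        "Unnamed: 2": [None, None],
    }
)

# Drop fully empty rows first, then fully empty columns.
existing = existing_raw.dropna(how="all").dropna(axis=1, how="all")

# Made-up stand-in for df_qrt.
new_rows = pd.DataFrame({"quarter": ["Q2"], "revenue": [120]})

# pd.concat replaces the removed DataFrame.append.
updated = pd.concat([existing, new_rows], ignore_index=True)
```

The result can then be written back with gd.set_with_dataframe(ws, updated) as above.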
Using gspread, I know how to access a sheet by name, id or index, like:
import gspread
gc = gspread.authorize(credentials)
worksheet = sh.worksheet("January")
or
worksheet = sh.sheet1
But I was wondering whether it is possible to open the last added or last updated sheet?
It's not possible to get the last modification time of each individual sheet, because this information is fetched through Google Drive, which tracks the spreadsheet file as a whole.
It's possible to obtain the last modification time of the entire spreadsheet using lastUpdateTime:
import gspread
sa = gspread.service_account('authentication')
sa.open("worksheet name").lastUpdateTime