How to predefine the number of rows and columns in gspread (Python)

I have scraped data with which I overwrite a Google Sheet daily.
The problem is that I can't find an option to set the number of rows and columns for an existing Google Sheet.
According to the documentation, this can be done only for a newly created sheet, but I don't know how to do it for an existing one.
import gspread

def api(key):
    myfilt = [...]   # list of lists (the scraped rows)
    columns = [...]  # list of column names
    gc = gspread.service_account(filename='Auth.json')
    sh = gc.open_by_key(key)
    worksheet = sh.sheet1
    worksheet.clear()
    head = worksheet.insert_row(columns, 1)
    res = worksheet.insert_rows(myfilt, 2)

api("MyAPIHere")
My goal is to predefine the number of rows according to len(myfilt) and the number of columns according to len(cols).

I believe your goal is as follows.
You want to change the max row and column number of the existing sheet in the Google Spreadsheet.
You want to achieve this using gspread with python.
You have already been able to get and put values for Google Spreadsheet using Sheets API.
Points for achieving your goal:
In this case, the "spreadsheets.batchUpdate" method of the Sheets API is required. I would like to propose the following flow.
1. Insert one row.
2. Insert one column.
3. Delete rows from 2 to the end.
4. Delete columns from 2 to the end.
5. Insert rows. In this case, you can set the number of rows you want to insert.
6. Insert columns. In this case, you can set the number of columns you want to insert.
Steps 1 and 2 are used to avoid an error: when the DeleteDimensionRequest is run for a sheet that has only one row or one column, an error occurs.
When the above flow is reflected in a script using gspread, it becomes as follows.
Sample script:
Please set the Spreadsheet ID and sheet name.
spreadsheetId = "###" # Please set the Spreadsheet ID.
sheetName = "###" # Please set the sheet name.
client = gspread.authorize(credentials)
spreadsheet = client.open_by_key(spreadsheetId)
# worksheet = spreadsheet.worksheet(sheetName)
sheetId = spreadsheet.worksheet(sheetName).id  # public property; same value as _properties['sheetId']
rows = len(myfilt)
columns = len(cols)
req = {
    "requests": [
        {
            "insertDimension": {
                "range": {
                    "sheetId": sheetId,
                    "startIndex": 0,
                    "endIndex": 1,
                    "dimension": "ROWS"
                }
            }
        },
        {
            "insertDimension": {
                "range": {
                    "sheetId": sheetId,
                    "startIndex": 0,
                    "endIndex": 1,
                    "dimension": "COLUMNS"
                }
            }
        },
        {
            "deleteDimension": {
                "range": {
                    "sheetId": sheetId,
                    "startIndex": 1,
                    "dimension": "ROWS"
                }
            }
        },
        {
            "deleteDimension": {
                "range": {
                    "sheetId": sheetId,
                    "startIndex": 1,
                    "dimension": "COLUMNS"
                }
            }
        },
        {
            "insertDimension": {
                "range": {
                    "sheetId": sheetId,
                    "startIndex": 0,
                    "endIndex": rows - 1,
                    "dimension": "ROWS"
                }
            }
        },
        {
            "insertDimension": {
                "range": {
                    "sheetId": sheetId,
                    "startIndex": 0,
                    "endIndex": columns - 1,
                    "dimension": "COLUMNS"
                }
            }
        }
    ]
}
res = spreadsheet.batch_update(req)
print(res)
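The six-step flow can also be wrapped in a small helper that only builds the request body, which makes it easy to inspect before sending. This is a minimal sketch, not part of the original answer: the name build_resize_requests is hypothetical, and skipping the final insert when the target size is 1 is an assumption made so that no empty range (startIndex equal to endIndex) is sent.

```python
def build_resize_requests(sheet_id, rows, cols):
    """Build a batchUpdate body that leaves the sheet with rows x cols cells.

    Hypothetical helper; it only assembles the dict and does not call the API.
    """
    if rows < 1 or cols < 1:
        raise ValueError("rows and cols must both be at least 1")

    def dim_range(dimension, start, end=None):
        rng = {"sheetId": sheet_id, "dimension": dimension, "startIndex": start}
        if end is not None:
            rng["endIndex"] = end  # endIndex is exclusive
        return rng

    requests = []
    # Steps 1-2: pad with one row and one column so the deletes below
    # always have something left to remove.
    for dimension in ("ROWS", "COLUMNS"):
        requests.append({"insertDimension": {"range": dim_range(dimension, 0, 1)}})
    # Steps 3-4: delete everything after the first row/column
    # (omitting endIndex means "to the end").
    for dimension in ("ROWS", "COLUMNS"):
        requests.append({"deleteDimension": {"range": dim_range(dimension, 1)}})
    # Steps 5-6: grow back to the requested size. One row/column already
    # exists, so size - 1 are inserted; nothing is inserted when size == 1.
    for dimension, size in (("ROWS", rows), ("COLUMNS", cols)):
        if size > 1:
            requests.append(
                {"insertDimension": {"range": dim_range(dimension, 0, size - 1)}}
            )
    return {"requests": requests}
```

With the variables from the sample script, the call would look like spreadsheet.batch_update(build_resize_requests(sheetId, rows, columns)). If a header row is written as well, rows would need to be len(myfilt) + 1.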
References:
Method: spreadsheets.batchUpdate
DeleteDimensionRequest
InsertDimensionRequest
batch_update(body)

I used the following to solve my issue as well:
worksheet.clear() # to clear the sheet firstly.
head = worksheet.insert_row(header, 1) # inserting the header at first row
res = worksheet.insert_rows(mydata, 2) # inserting my data.
worksheet.resize(rows=len(mydata) + 1, cols=len(header)) # resize according to length of cols and rows.

Related

Formatting issues with Python and gspread

I have this pandas DataFrame (DF1).
DF1= DF1.groupby(['Name', 'Type', 'Metric'])
DF1= DF1.first()
If I write it out with DF1.to_excel("output.xlsx"), the format is correct, see below:
But when I upload it to Google Sheets using Python and gspread:
from gspread_dataframe import set_with_dataframe
from gspread_formatting import *
worksheet5.clear()
set_with_dataframe(worksheet=worksheet1, dataframe=DF1, row=1, include_index=True,
                   include_column_header=True, resize=True)
This is the output:
How can I keep the same format in Google Sheets using gspread_formatting, like in screenshot 1?
Issue and workaround:
In the current stage, it seems that the data frame including the merged cells cannot be directly put into the Spreadsheet with gspread. So, in this answer, I would like to propose a workaround. The flow of this workaround is as follows.
1. Prepare a data frame including the merged cells.
2. Convert the data frame to an HTML table.
3. Put the HTML table with the batchUpdate method of Sheets API.
By this flow, the values can be put into the Spreadsheet with the merged cells. When this is reflected in a sample script, how about the following sample script?
Sample script:
# This is from your script.
DF1 = DF1.groupby(["Name", "Type", "Metric"])
DF1 = DF1.first()
# I added the below script.
spreadsheetId = "###" # Please set your spreadsheet ID.
sheetName = "Sheet1" # Please set your sheet name you want to put the values.
spreadsheet = client.open_by_key(spreadsheetId)
sheet = spreadsheet.worksheet(sheetName)
body = {
    "requests": [
        {
            "pasteData": {
                "coordinate": {"sheetId": sheet.id},
                "data": DF1.to_html(),
                "html": True,
                "type": "PASTE_NORMAL",
            }
        }
    ]
}
spreadsheet.batch_update(body)
When this script is run with your sample value including the merged cells, the values are put to the Spreadsheet by reflecting the merged cells.
If you want to clear the cell format, please modify body as follows.
body = {
    "requests": [
        {
            "pasteData": {
                "coordinate": {"sheetId": sheet.id},
                "data": DF1.to_html(),
                "html": True,
                "type": "PASTE_NORMAL",
            }
        },
        {
            "repeatCell": {
                "range": {"sheetId": sheet.id},
                "cell": {},
                "fields": "userEnteredFormat",
            }
        },
    ]
}
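The two variants of body differ only in the optional repeatCell request, so they can be produced by one function. The following is a rough sketch under that reading; build_paste_html_body and its clear_format flag are hypothetical names, and the sample HTML string merely stands in for what DF1.to_html() might produce (a rowspan attribute is what leads to merged cells on paste).

```python
def build_paste_html_body(sheet_id, html, clear_format=False):
    """Build a batchUpdate body that pastes an HTML table at A1 of a sheet.

    Hypothetical helper; it only assembles the dict and does not call the API.
    """
    requests = [
        {
            "pasteData": {
                # Only sheetId is set, so the paste starts at the first cell.
                "coordinate": {"sheetId": sheet_id},
                "data": html,
                "html": True,
                "type": "PASTE_NORMAL",
            }
        }
    ]
    if clear_format:
        # An empty cell with the userEnteredFormat field mask resets the format
        # of the whole sheet while leaving values and merges in place.
        requests.append(
            {
                "repeatCell": {
                    "range": {"sheetId": sheet_id},
                    "cell": {},
                    "fields": "userEnteredFormat",
                }
            }
        )
    return {"requests": requests}


# Stand-in for DF1.to_html(): "A" spans two rows, like a grouped index label.
sample_html = (
    '<table><tr><th rowspan="2">A</th><td>x</td></tr>'
    "<tr><td>y</td></tr></table>"
)
body = build_paste_html_body(0, sample_html, clear_format=True)
```

In the script above, this would be used as spreadsheet.batch_update(build_paste_html_body(sheet.id, DF1.to_html())).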
References:
Method: spreadsheets.batchUpdate
PasteDataRequest

Why is my python code "deleteDimension" not removing a specific row in my Google spreadsheet?

I spent hours trying to remove a specific row in my Google spreadsheet in python with:
from googleapiclient.discovery import build
from google.oauth2 import service_account
I am trying to remove this row number 3291
Here is my code:
def RemoveRowFromSpreadsheet(sheet,spreadsheet_id, id_row):
    spreadsheet_data = [
        {
            "deleteDimension": {
                "range": {
                    "sheetId": 1483248242,
                    "dimension": "ROWS",
                    "startIndex": id_row,
                    "endIndex": id_row
                }
            }
        }
    ]
    update_spreadsheet_data = {"requests": spreadsheet_data}
    updating = sheet.batchUpdate(
        spreadsheetId=spreadsheet_id, body=update_spreadsheet_data)
    result=updating.execute()
    print(f"result:{result}")

googleSheetURL="https://docs.google.com/spreadsheets/d/1viLqgmmeolHohWRQAfKb9Xdh9OyZxf0Y4O25bsjA9kA/edit?usp=sharing"
id_spreadsheet=urlToID(googleSheetURL)
sheet=GetSPreadsheet()
RemoveRowFromSpreadsheet(sheet, id_spreadsheet, 3290)
As you can see I specify index 3290 in order to remove row N° 3291.
And this is the output:
result:{'spreadsheetId': '1viLqgmmeolHohWRQAfKb9Xdh9OyZxf0Y4O25bsjA9kA', 'replies': [{}]}
Process finished with exit code 0
I don't see any issue or any error message. But when I return to the spreadsheet to check if the row was removed, nothing. I checked the row above and the row below this row N° 3291. And nothing changed. I thought my script removed the wrong row but this is not the case.
Issue:
endIndex is exclusive, so if it's the same as startIndex, no row is deleted.
Solution:
In order to delete 1 row, set endIndex to startIndex + 1.
"deleteDimension": {
    "range": {
        "sheetId": 1483248242,
        "dimension": "ROWS",
        "startIndex": id_row,
        "endIndex": id_row + 1
    }
}
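Because the indexes are zero-based and half-open, it can help to do the conversion from the row number shown in the spreadsheet in one place. A minimal sketch, assuming a hypothetical helper named row_delete_request that takes the 1-based row number as seen in the UI:

```python
def row_delete_request(sheet_id, row_number, count=1):
    """Build a deleteDimension request for `count` rows starting at the
    1-based `row_number` (the number shown in the spreadsheet UI).

    Hypothetical helper; it only assembles the dict and does not call the API.
    """
    if row_number < 1 or count < 1:
        raise ValueError("row_number and count must both be at least 1")
    start = row_number - 1  # API indexes are zero-based
    return {
        "deleteDimension": {
            "range": {
                "sheetId": sheet_id,
                "dimension": "ROWS",
                "startIndex": start,        # inclusive
                "endIndex": start + count,  # exclusive
            }
        }
    }


# Row 3291 in the UI maps to the half-open index range [3290, 3291).
request = row_delete_request(1483248242, 3291)
```

The resulting dict can be placed in the "requests" list exactly like spreadsheet_data in the question.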
Reference:
DimensionRange

Adding columns to Google Sheets

I am trying to add columns to a Google Sheet, but I get an error. I need to add a copy of the previous column.
Code:
def insert_column():
    sa = gspread.service_account(filename="service_account.json")
    sh = sa.open("**NAME**")
    wks = sh.worksheet("Class Data")
    data = {
        "requests": [
            {
                "insertDimension": {
                    "range": {
                        "sheetId": 0,
                        "dimension": "COLUMNS",
                        "startIndex": 4,
                        "endIndex": 5
                    },
                    "inheritFromBefore": True
                }
            },
        ],
    }
    wks.batch_update(data).execute()
An Error: TypeError: string indices must be integers
I think the problem is here wks.batch_update(data).execute() , but I don't know how to solve it.
According to the gspread documentation, batch_update(body) is a method of class gspread.spreadsheet.Spreadsheet, but you are calling it as a method of class gspread.worksheet.Worksheet. The Worksheet method of the same name expects a list of value ranges rather than a request body, so I think this is the reason for TypeError: string indices must be integers. Also, execute() is not required.
When these points are reflected in your script, it becomes as follows.
Modified script:
def insert_column():
    sa = gspread.service_account(filename="service_account.json")
    sh = sa.open("**NAME**")
    # wks = sh.worksheet("Class Data") # In this script, this line is not used.
    data = {
        "requests": [
            {
                "insertDimension": {
                    "range": {
                        "sheetId": 0, # <--- Please set the sheet ID of "Class Data" sheet.
                        "dimension": "COLUMNS",
                        "startIndex": 4,
                        "endIndex": 5
                    },
                    "inheritFromBefore": True
                }
            },
        ],
    }
    sh.batch_update(data)
Note:
This modified script assumes that your service account can access the Spreadsheet. Please be careful about this.
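Note that insertDimension with inheritFromBefore only carries over the properties of the preceding column, not its cell contents. If "a copy of the previous column" is meant to include the values, one possible approach (an assumption about the goal, not something the answer above states) is to add a copyPaste request after the insert. The helper name insert_column_copy_body is hypothetical.

```python
def insert_column_copy_body(sheet_id, index):
    """Build a batchUpdate body that inserts a column at zero-based `index`
    and then copies the column just before it into the new column.

    Hypothetical helper; it only assembles the dict and does not call the API.
    """
    if index < 1:
        # inheritFromBefore requires a column before the insertion point.
        raise ValueError("index must be at least 1")

    def column(start):
        # Row indexes are omitted, so the range covers every row.
        return {
            "sheetId": sheet_id,
            "startColumnIndex": start,
            "endColumnIndex": start + 1,
        }

    return {
        "requests": [
            {
                "insertDimension": {
                    "range": {
                        "sheetId": sheet_id,
                        "dimension": "COLUMNS",
                        "startIndex": index,
                        "endIndex": index + 1,
                    },
                    "inheritFromBefore": True,
                }
            },
            {
                # Requests in one batch are applied in order, so the new
                # column already exists when this request runs.
                "copyPaste": {
                    "source": column(index - 1),
                    "destination": column(index),
                    "pasteType": "PASTE_NORMAL",
                }
            },
        ]
    }
```

For the script above, this would be called as sh.batch_update(insert_column_copy_body(sheet_id, 4)), where sheet_id is the sheet ID of the "Class Data" sheet.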
Reference
batch_update(body)

JSON input to multiple excel file outputs

I have a JSON file that looks like this:
{
    "Person A": {
        "Company A": {
            "Doctor": {
                "Morning": "2000",
                "Afternoon": "1200"
            },
            "Nurse": {}
        }
    },
    "Person B": {
        "Education": {
            "main": {
                "Primary school": {
                    "2012": "2A",
                    "2013": "3A"
                },
                "Secondary school": {
                    "2016": "1K",
                    "2017": "2K"
                }
            }
        }
    }
}
How do I extract the tables for Education (without the "main" level), with
primary_school.xlsx as an Excel file:
year, class
secondary_school.xlsx as an excel file:
year, class
PersonA_CompanyA_Doctor.xlsx
Time, salary
PersonA_CompanyA_Nurse.xlsx:
Time, salary
I have tried json_normalize but still cannot get the result that I want.
pd.json_normalize(file, max_level=1)
Is there a simple way of doing it using dataframe?
The JSON data in your example is a nested, tree-like structure. The first step, whatever the exact shape of the data, is to walk that structure and flatten it.
After this step, you should have a one-dimensional, iterable list from which you can derive the names of the xlsx files you specified.
If you are specifically asking about the flattening part, we could work out a general solution by simplifying the example.
Otherwise, study the detailed example below, installing the relevant package with pip install xlsxwriter if necessary.
With the list in hand, you can create the xlsx files you want, in order.
import xlsxwriter
# Create a workbook and add a worksheet.
workbook = xlsxwriter.Workbook('Expenses01.xlsx')
worksheet = workbook.add_worksheet()
# Some data we want to write to the worksheet.
expenses = (
    ['Rent', 1000],
    ['Gas', 100],
    ['Food', 300],
    ['Gym', 50],
)
# Start from the first cell. Rows and columns are zero indexed.
row = 0
col = 0
# Iterate over the data and write it out row by row.
for item, cost in (expenses):
    worksheet.write(row, col, item)
    worksheet.write(row, col + 1, cost)
    row += 1
# Write a total using a formula.
worksheet.write(row, 0, 'Total')
worksheet.write(row, 1, '=SUM(B1:B4)')
workbook.close()
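The flattening step itself is not shown above. One way to do it, sketched here under the assumption that every table is a dictionary whose values are all plain strings (or an empty dictionary), is a small recursive generator. The names leaf_tables and file_name are hypothetical, and the file naming follows the PersonA_CompanyA_Doctor.xlsx pattern from the question; the shorter names wanted for Education would need their own rule.

```python
def leaf_tables(node, path=()):
    """Yield (path, rows) for every dict that contains no nested dicts.

    `rows` is a list of (key, value) pairs; an empty dict yields no rows.
    Scalar values that sit next to nested dicts are ignored in this sketch.
    """
    if all(not isinstance(value, dict) for value in node.values()):
        yield path, list(node.items())
        return
    for key, child in node.items():
        if isinstance(child, dict):
            yield from leaf_tables(child, path + (key,))


def file_name(path):
    """Join the path parts into a name such as PersonA_CompanyA_Doctor.xlsx."""
    return "_".join(part.replace(" ", "") for part in path) + ".xlsx"


data = {
    "Person A": {
        "Company A": {
            "Doctor": {"Morning": "2000", "Afternoon": "1200"},
            "Nurse": {},
        }
    },
    "Person B": {
        "Education": {
            "main": {
                "Primary school": {"2012": "2A", "2013": "3A"},
                "Secondary school": {"2016": "1K", "2017": "2K"},
            }
        }
    },
}

tables = dict(leaf_tables(data))
```

Each entry in tables can then be written to its own workbook with the worksheet.write loop shown above, using file_name(path) as the workbook name and a header row such as Time, salary or year, class.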

How to bulk update cells to the same value in google sheets using gspread

I am trying to write a 0 into a predefined range of cells in Google Sheets. Using gspread, I am able to clear all the cells I want, but I can't figure out how to batch update all the cells to one value. Iterating through the cells and updating them one by one is too slow. I'm sure it's just a simple formatting error, but I can't find documentation on the proper way.
sheet.batch_clear(['A2:A50']) #works fine
sheet.batch_update(['B2:B50'], 0) #does not work
In your situation, I thought that when the batch_update method of Class Spreadsheet is used, your goal can be achieved by one API call. When this is reflected in a sample script using gspread, it becomes as follows.
Sample script:
client = gspread.authorize(credentials) # Please use your authorization script.
spreadsheetId = "###" # Please set your Spreadsheet ID.
sheetName = "Sheet1" # Please set your sheet name.
spreadsheet = client.open_by_key(spreadsheetId)
sheet_id = spreadsheet.worksheet(sheetName).id
requests = {
    "requests": [
        {
            "repeatCell": {
                "cell": {
                    "userEnteredValue": {
                        "numberValue": 0
                    }
                },
                "range": {
                    "sheetId": sheet_id,
                    "startRowIndex": 1,
                    "endRowIndex": 50,
                    "startColumnIndex": 1,
                    "endColumnIndex": 2
                },
                "fields": "userEnteredValue"
            }
        }
    ]
}
spreadsheet.batch_update(requests)
When this script is run, the value 0 is put into the cells B2:B50.
In this script, the A1 notation B2:B50 is expressed as a GridRange, whose indexes are zero-based with an exclusive end: rows 1 to 50 and columns 1 to 2.
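The conversion from A1 notation to GridRange indexes can be written out explicitly. This is a minimal sketch that handles only bounded ranges of the form B2:B50; the helper names column_to_index and a1_to_grid_range are hypothetical and not taken from gspread.

```python
import re


def column_to_index(letters):
    """Convert column letters to a zero-based index: A -> 0, B -> 1, AA -> 26."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def a1_to_grid_range(a1, sheet_id):
    """Convert a bounded A1 range such as 'B2:B50' to a GridRange dict."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+):([A-Za-z]+)(\d+)", a1)
    if not match:
        raise ValueError(f"unsupported A1 range: {a1!r}")
    start_col, start_row, end_col, end_row = match.groups()
    return {
        "sheetId": sheet_id,
        "startRowIndex": int(start_row) - 1,  # zero-based, inclusive
        "endRowIndex": int(end_row),          # exclusive
        "startColumnIndex": column_to_index(start_col),
        "endColumnIndex": column_to_index(end_col) + 1,
    }


grid_range = a1_to_grid_range("B2:B50", 0)
```

For B2:B50 this gives the same four indexes as the "range" entry in the sample script.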
References:
batch_update(body)
RepeatCellRequest
GridRange
