Formatting issues with Python and GSpread - python

I have this pandas DataFrame (DF1).
DF1= DF1.groupby(['Name', 'Type', 'Metric'])
DF1= DF1.first()
If I write it out with DF1.to_excel("output.xlsx"), the format is correct, see below:
But when I upload it to my Google Sheet using Python and gspread:
from gspread_dataframe import set_with_dataframe
from gspread_formatting import *
worksheet5.clear()
set_with_dataframe(worksheet=worksheet1, dataframe=DF1, row=1, include_index=True,
                   include_column_header=True, resize=True)
This is the output:
How can I keep the same format in my Google Sheet, using gspread_formatting, as in the first screenshot?

Issue and workaround:
At the moment, it seems that a DataFrame that includes merged cells cannot be put directly into a Spreadsheet with gspread. So, in this answer, I would like to propose a workaround. The flow is as follows:
Prepare a data frame including the merged cells.
Convert the data frame to an HTML table.
Put the HTML table with the batchUpdate method of Sheets API.
With this flow, the values are put into the Spreadsheet with the merged cells preserved. A sample script that implements it follows.
Sample script:
# This is from your script.
DF1 = DF1.groupby(["Name", "Type", "Metric"])
DF1 = DF1.first()
# I added the script below.
spreadsheetId = "###"  # Please set your spreadsheet ID.
sheetName = "Sheet1"  # Please set the name of the sheet to put the values into.
spreadsheet = client.open_by_key(spreadsheetId)
sheet = spreadsheet.worksheet(sheetName)
body = {
    "requests": [
        {
            "pasteData": {
                "coordinate": {"sheetId": sheet.id},
                "data": DF1.to_html(),
                "html": True,
                "type": "PASTE_NORMAL",
            }
        }
    ]
}
spreadsheet.batch_update(body)
When this script is run with your sample values, they are put into the Spreadsheet with the merged cells reflected.
If you want to clear the cell format, please modify body as follows.
body = {
    "requests": [
        {
            "pasteData": {
                "coordinate": {"sheetId": sheet.id},
                "data": DF1.to_html(),
                "html": True,
                "type": "PASTE_NORMAL",
            }
        },
        {
            "repeatCell": {
                "range": {"sheetId": sheet.id},
                "cell": {},
                "fields": "userEnteredFormat",
            }
        },
    ]
}
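The two request bodies above differ only in the optional repeatCell step, so they can be produced by one small builder. The following is a minimal sketch, not part of the original answer: the function name build_paste_html_requests, its parameters, and the hand-written sample_html (a stand-in for DF1.to_html()) are all assumptions made for illustration.

```python
def build_paste_html_requests(sheet_id, html, row=0, col=0, clear_format=False):
    """Build a batchUpdate body that pastes an HTML table at (row, col).

    Hypothetical helper: the name and parameters are illustrative only.
    row and col are zero-based, as in the Sheets API GridCoordinate.
    """
    requests = [
        {
            "pasteData": {
                "coordinate": {
                    "sheetId": sheet_id,
                    "rowIndex": row,
                    "columnIndex": col,
                },
                "data": html,
                "html": True,
                "type": "PASTE_NORMAL",
            }
        }
    ]
    if clear_format:
        # Same idea as the second body above: reset userEnteredFormat
        # on every cell of the sheet after pasting.
        requests.append(
            {
                "repeatCell": {
                    "range": {"sheetId": sheet_id},
                    "cell": {},
                    "fields": "userEnteredFormat",
                }
            }
        )
    return {"requests": requests}


# Hand-written stand-in for DF1.to_html(); rowspan marks a merged cell.
sample_html = (
    '<table><tr><th rowspan="2">A</th><td>1</td></tr>'
    "<tr><td>2</td></tr></table>"
)
body = build_paste_html_requests(sheet_id=0, html=sample_html, clear_format=True)
print(len(body["requests"]))
```

With a real worksheet, the result would be passed to spreadsheet.batch_update(body) exactly as in the sample script.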
References:
Method: spreadsheets.batchUpdate
PasteDataRequest

Related

Getting this error when running the python code in DATABRICKS

THE ERROR
Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the
referenced columns only include the internal corrupt record column
(named _corrupt_record by default). For example:
spark.read.schema(schema).csv(file).filter($"_corrupt_record".isNotNull).count()
and spark.read.schema(schema).csv(file).select("_corrupt_record").show().
Instead, you can cache or save the parsed results and then send the same query.
For example, val df = spark.read.schema(schema).csv(file).cache() and then
df.filter($"_corrupt_record".isNotNull).count().
THE CODE
import itertools
from pyspark.sql.functions import explode, col
# Read the JSON file from Databricks storage
df_json = spark.read.json("/mnt/BigData_JSONFiles/new_test.json")
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "false")
# Convert the dataframe to a dictionary
data = df_json.toPandas().to_dict()
# Split the data into two parts
d1 = dict(itertools.islice(data.items(), 8))
d2 = dict(itertools.islice(data.items(), 8, len(data.items())))
# Convert the first part of the data back to a dataframe
df1 = spark.createDataFrame([d1])
# Write the first part of the data to a JSON file in Databricks storage
df1.write.format("json").save("/mnt/BigData_JSONFiles/new_test_header.json")
# Convert the second part of the data back to a dataframe
df2 = spark.createDataFrame([d2])
# Write the second part of the data to a JSON file in Databricks storage
df2.write.format("json").save("/mnt/BigData_JSONFiles/new_test_detail.json")
A SAMPLE OF THE LARGE JSON FILE
{
  "reporting_entity_name": "launcher",
  "reporting_entity_type": "launcher",
  "plan_name": "launched",
  "plan_id_type": "hios",
  "plan_id": "1111111111",
  "plan_market_type": "individual",
  "last_updated_on": "2020-08-27",
  "version": "1.0.0",
  "in_network": [
    {
      "negotiation_arrangement": "ffs",
      "name": "Boosters",
      "billing_code_type": "CPT",
      "billing_code_type_version": "2020",
      "billing_code": "27447",
      "description": "Boosters On Demand",
      "negotiated_rates": [
        {
          "provider_groups": [
            {
              "npi": [
                0
              ],
              "tin": {
                "type": "ein",
                "value": "11-1111111"
              }
            }
          ],
          "negotiated_prices": [
            {
              "negotiated_type": "negotiated",
              "negotiated_rate": 123.45,
              "expiration_date": "2022-01-01",
              "billing_class": "organizational"
            }
          ]
        }
      ]
    }
  ]
}
Hi, I am trying to split a big JSON file into two parts, which is what the code above does. But it fails with the error above, which tells me to cache. I added .cache() at the end of the line that loads the file, but I still get this error. Please let me know how I can solve it.
I was able to resolve this error by changing this
df_json = spark.read.json("/mnt/BigData_JSONFiles/new_test.json")
to this
df_json = spark.read.option("multiline","true").json("/mnt/BigData_JSONFiles/new_test.json")
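A likely explanation for why this helps (an assumption, since the post does not state it): by default Spark's JSON reader treats each line as a separate JSON document, so a pretty-printed file like the sample above cannot be parsed line by line and every line ends up in _corrupt_record. The sketch below illustrates that difference using only Python's standard json module, not Spark itself; the record and the helper parses_line_by_line are made up for the illustration.

```python
import json

# A small record shaped like the sample file (values are illustrative).
record = {"plan_name": "launched", "in_network": [{"name": "Boosters"}]}

pretty = json.dumps(record, indent=2)  # spans several lines, like the sample
compact = json.dumps(record)           # one document on a single line


def parses_line_by_line(text):
    """Return True if every non-empty line is a complete JSON document."""
    try:
        for line in text.splitlines():
            if line.strip():
                json.loads(line)
        return True
    except json.JSONDecodeError:
        return False


print(parses_line_by_line(pretty))   # False: the first line is just "{"
print(parses_line_by_line(compact))  # True
print(json.loads(pretty) == record)  # True: parsing the whole text works
```

Reading the file as a whole, which is what the multiline option asks Spark to do, corresponds to the last line of the sketch.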

How to bulk update cells to the same value in google sheets using gspread

I am trying to write a 0 into a predefined range of cells in Google Sheets. Using gspread I am able to clear all the cells I want, but I am unable to figure out how to batch update all the cells to one value. Iterating through the cells and updating them one by one is too slow. I'm sure it's just a simple formatting error, but I can't find documentation on the proper way.
sheet.batch_clear(['A2:A50']) #works fine
sheet.batch_update(['B2:B50'], 0) #does not work
In your situation, when the batch_update method of class Spreadsheet is used, your goal can be achieved with one API call. A sample script using gspread follows.
Sample script:
client = gspread.authorize(credentials)  # Please use your authorization script.
spreadsheetId = "###"  # Please set your Spreadsheet ID.
sheetName = "Sheet1"  # Please set your sheet name.
spreadsheet = client.open_by_key(spreadsheetId)
sheet_id = spreadsheet.worksheet(sheetName).id
requests = {
    "requests": [
        {
            "repeatCell": {
                "cell": {
                    "userEnteredValue": {
                        "numberValue": 0
                    }
                },
                "range": {
                    "sheetId": sheet_id,
                    "startRowIndex": 1,
                    "endRowIndex": 50,
                    "startColumnIndex": 1,
                    "endColumnIndex": 2
                },
                "fields": "userEnteredValue"
            }
        }
    ]
}
spreadsheet.batch_update(requests)
When this script is run, the value 0 is put into the cells B2:B50.
In this script, the A1 notation B2:B50 is expressed as a GridRange, whose indexes are zero-based and whose end indexes are exclusive.
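To make the mapping from A1 notation to GridRange indexes concrete, here is a minimal sketch of the conversion. The helpers column_to_index and a1_to_grid_range are hypothetical names written for this illustration, and the sketch only handles simple bounded ranges such as B2:B50.

```python
import re


def column_to_index(letters):
    """Convert a column label such as 'A' or 'AB' to a zero-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def a1_to_grid_range(a1, sheet_id):
    """Convert a bounded A1 range like 'B2:B50' to a GridRange dict.

    Start indexes are zero-based and inclusive; end indexes are exclusive.
    """
    match = re.fullmatch(r"([A-Za-z]+)(\d+):([A-Za-z]+)(\d+)", a1)
    if not match:
        raise ValueError("expected a range like 'B2:B50', got %r" % a1)
    start_col, start_row, end_col, end_row = match.groups()
    return {
        "sheetId": sheet_id,
        "startRowIndex": int(start_row) - 1,
        "endRowIndex": int(end_row),
        "startColumnIndex": column_to_index(start_col),
        "endColumnIndex": column_to_index(end_col) + 1,
    }


print(a1_to_grid_range("B2:B50", sheet_id=0))
```

For B2:B50 this yields startRowIndex 1, endRowIndex 50, startColumnIndex 1, and endColumnIndex 2, the same values used in the sample script.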
References:
batch_update(body)
RepeatCellRequest
GridRange

Change Google Spreadsheets cell horizontal alignment using python

I've been trying to configure the horizontal alignment of cells in a Google spreadsheet from Python. I've read the official Google developers documentation and have also checked examples from other people facing the same issue, but none of them applies to my case. The following code is what I use to build my spreadsheet request body. What I want is to create a new spreadsheet in which the horizontal alignment of all cells is centered, so that when a user types anything in any cell, it is centered automatically. Any tips?
spreadsheet_body = {
    'properties': {
        # spreadsheet name
        'title': sheet_name,
        "defaultFormat":{
            "horizontalAlignment":'CENTER'
        }
    },
    'sheets': [{
        'properties': {
            # worksheet name
            'title': 'Φύλλο1',
            'gridProperties': {
                # row\column number
                'rowCount': 100,
                'columnCount': 20
            },
        },
        'data': [{'rowData': [{'values': header_row}]}] # Added
    }
    ]
}
request = service.spreadsheets().create(body=spreadsheet_body)
response = request.execute()
I believe your goal is as follows.
When a new Spreadsheet is created using the spreadsheets.create method of the Sheets API, you want to set the horizontal alignment of the cells.
You want to achieve this using googleapis with Python.
Issue and workaround:
Unfortunately, "defaultFormat":{"horizontalAlignment":'CENTER'} of the Spreadsheet properties cannot be used. This is the current specification. This has already been mentioned in ziganotschka's answer.
Your script creates a new Spreadsheet with one sheet. When that sheet is created, the horizontal alignment of all its cells can be set to CENTER. Note, however, that any sheet added later uses the default format. So this answer is a workaround.
When this workaround is reflected in your script, it becomes as follows.
Modified script:
rowCount = 100
columnCount = 20
rowData = []
# Build the rows below the header, every cell centered.
for r in range(1, rowCount):
    temp = []
    for c in range(0, columnCount):
        temp.append({'userEnteredFormat': {'horizontalAlignment': 'CENTER'}})
    rowData.append({'values': temp})
# Pad the header row with centered empty cells up to columnCount.
hlen = len(header_row)
if hlen < columnCount:
    for c in range(0, columnCount - hlen):
        header_row.append({'userEnteredFormat': {'horizontalAlignment': 'CENTER'}})
rowData.insert(0, {'values': header_row})
spreadsheet_body = {
    'properties': {
        # spreadsheet name
        'title': sheet_name
    },
    'sheets': [{
        'properties': {
            # worksheet name
            'title': 'Φύλλο1',
            'gridProperties': {
                # row\column number
                'rowCount': rowCount,
                'columnCount': columnCount
            },
        },
        'data': [{'rowData': rowData}]
    }]}
request = service.spreadsheets().create(body=spreadsheet_body)
response = request.execute()
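As noted above, a sheet added after creation falls back to the default format. One way to handle that case, offered here as a sketch rather than as part of the original answer, is to send a repeatCell request for the new sheet afterwards. The function name center_sheet_request and the sheet ID 123 are placeholders made up for this example.

```python
def center_sheet_request(sheet_id):
    """Build a repeatCell request that centers every cell of one sheet.

    Hypothetical helper. A range that only names the sheetId covers the
    whole sheet, and the fields mask limits the update to the horizontal
    alignment so other formatting is left alone.
    """
    return {
        "repeatCell": {
            "range": {"sheetId": sheet_id},
            "cell": {"userEnteredFormat": {"horizontalAlignment": "CENTER"}},
            "fields": "userEnteredFormat.horizontalAlignment",
        }
    }


# 123 is a placeholder sheet ID, not a value from the question.
body = {"requests": [center_sheet_request(123)]}
print(body["requests"][0]["repeatCell"]["fields"])

# With googleapis, the body would then be sent along these lines
# (spreadsheet_id is assumed to be the ID of the target Spreadsheet):
# service.spreadsheets().batchUpdate(
#     spreadsheetId=spreadsheet_id, body=body).execute()
```

Like the script above, this only formats cells that already exist in the sheet; it does not change the Spreadsheet's default format.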
Reference:
Method: spreadsheets.create
Under the resource SpreadsheetProperties it is specified:
defaultFormat
This field is read-only.
So unfortunately it is not possible to use the Sheets API to create a spreadsheet in which all cells are centered automatically, just as it is not possible via the UI.

Python - GSPREAD - Copy text and format from one google sheet to another one

I have two files in Google Drive (two Google Sheets) and I need to copy both the data and the format from one file to the other on a regular basis.
As you can see in the picture below, I have different text formats that I need to maintain (bold, text color, etc.):
My first attempt was:
Using only Google Sheets functions, via IMPORTRANGE. It copies the data very well, but I lose the format that I want to maintain in the destination file.
My second attempt was:
Using Python and the gspread package to copy the data and the format from the source Google Sheet to the destination one. For that I have the following code:
import gspread
from oauth2client.service_account import ServiceAccountCredentials
source_file_sheet = 'https://docs.google.com/spreadsheets/X'
destination_file_sheet = 'https://docs.google.com/spreadsheets/Y'
service_key = "file.json"
scope = ["https://spreadsheets.google.com/feeds", 'https://www.googleapis.com/auth/spreadsheets','https://www.googleapis.com/auth/drive.file', 'https://www.googleapis.com/auth/drive']
creds_file = ServiceAccountCredentials.from_json_keyfile_name(service_key, scope)
sourceSheetName = cod_source_system_file_operational
destinationSheetName = cod_source_system_file_billing
client = gspread.authorize(creds_file)
spreadsheet_source = client.open_by_url(source_file_sheet)
spreadsheet_destination = client.open_by_url(destination_file_sheet)
sourceSheetId = spreadsheet_source.worksheet('Sheet1')
destinationSheetId = spreadsheet_destination.worksheet('Sheet2')
body = {
    "requests": [
        {
            "copyPaste": {
                "source": {
                    "sheetId": sourceSheetId,
                    "startRowIndex": 3,
                    "endRowIndex": 10,
                    "startColumnIndex": 0,
                    "endColumnIndex": 5
                },
                "destination": {
                    "sheetId": destinationSheetId,
                    "startRowIndex": 0,
                    "endRowIndex": 10,
                    "startColumnIndex": 0,
                    "endColumnIndex": 5
                },
                "pasteType": "PASTE_NORMAL"
            }
        }
    ]
}
res = destinationSheetId.batch_update(body)
print(res)
But when I run this it gives me the following error:
Traceback (most recent call last):
dict(vr, range=absolute_range_name(self.title, vr['range']))
TypeError: string indices must be integers
How can I solve my problem?
Thanks for your help!
I believe your goal and your current situation are as follows.
You want to copy the values of the specific sheet in Google Spreadsheet "A" to the specific sheet in Google Spreadsheet "B".
You want to copy not only the values, but also the cell format.
You want to achieve this using gspread for python.
You have already been able to get and put values for Google Spreadsheet using Sheets API.
Modification points:
Unfortunately, "CopyPasteRequest" of the batchUpdate method cannot copy from one Google Spreadsheet to another. It seems that this is the current specification.
In order to copy not only the values but also the cell format from Google Spreadsheet "A" to Google Spreadsheet "B", I would like to propose the following flow.
Copy the source sheet in the source Spreadsheet to the destination Spreadsheet.
Copy the values with the format from the copied sheet to the destination sheet. And, delete the copied sheet.
When the above points are reflected in a script, it becomes as follows.
Sample script:
This sample script shows the part that follows client = gspread.authorize(credentials). Please set the variables before you use it.
client = gspread.authorize(credentials)
sourceSpreadsheetId = "###" # Please set the source Spreadsheet ID.
sourceSheetName = "Sheet1" # Please set the source sheet name.
destinationSpreadsheetId = "###" # Please set the destination Spreadsheet ID.
destinationSheetName = "Sheet2" # Please set the destination sheet name.
srcSpreadsheet = client.open_by_key(sourceSpreadsheetId)
srcSheet = srcSpreadsheet.worksheet(sourceSheetName)
dstSpreadsheet = client.open_by_key(destinationSpreadsheetId)
dstSheet = dstSpreadsheet.worksheet(destinationSheetName)
# 1. Copy the source sheet in the source Spreadsheet to the destination Spreadsheet.
copiedSheet = srcSheet.copy_to(destinationSpreadsheetId)
copiedSheetId = copiedSheet["sheetId"]
# 2. Copy the values with the format from the copied sheet to the destination sheet. And, delete the copied sheet.
body = {
    "requests": [
        {
            "copyPaste": {
                "source": {
                    "sheetId": copiedSheetId,
                    "startRowIndex": 3,
                    "endRowIndex": 10,
                    "startColumnIndex": 0,
                    "endColumnIndex": 5
                },
                "destination": {
                    "sheetId": dstSheet.id,
                    "startRowIndex": 0,
                    "endRowIndex": 10,
                    "startColumnIndex": 0,
                    "endColumnIndex": 5
                },
                "pasteType": "PASTE_NORMAL"
            }
        },
        {
            "deleteSheet": {
                "sheetId": copiedSheetId
            }
        }
    ]
}
res = dstSpreadsheet.batch_update(body)
print(res)
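If the copied range changes between runs, the request body from step 2 can be generated instead of edited by hand. The following is a minimal sketch under that assumption: build_copy_body and its parameters are hypothetical names, the sheet IDs 111 and 222 are placeholders, and unlike the sample above it sizes the destination range to match the source.

```python
def build_copy_body(copied_sheet_id, dst_sheet_id, src, dst_row=0, dst_col=0):
    """Build the copyPaste + deleteSheet body used in step 2.

    Hypothetical helper. src is (start_row, end_row, start_col, end_col),
    zero-based with exclusive ends, as in a Sheets API GridRange.
    """
    start_row, end_row, start_col, end_col = src
    height = end_row - start_row
    width = end_col - start_col
    return {
        "requests": [
            {
                "copyPaste": {
                    "source": {
                        "sheetId": copied_sheet_id,
                        "startRowIndex": start_row,
                        "endRowIndex": end_row,
                        "startColumnIndex": start_col,
                        "endColumnIndex": end_col,
                    },
                    "destination": {
                        "sheetId": dst_sheet_id,
                        "startRowIndex": dst_row,
                        "endRowIndex": dst_row + height,
                        "startColumnIndex": dst_col,
                        "endColumnIndex": dst_col + width,
                    },
                    "pasteType": "PASTE_NORMAL",
                }
            },
            # Remove the temporary sheet once its contents are pasted.
            {"deleteSheet": {"sheetId": copied_sheet_id}},
        ]
    }


# 111 and 222 are placeholder sheet IDs for the illustration.
body = build_copy_body(111, 222, src=(3, 10, 0, 5))
print(body["requests"][0]["copyPaste"]["destination"])
```

In the sample script, copied_sheet_id would be copiedSheetId and dst_sheet_id would be dstSheet.id, and the result would be passed to dstSpreadsheet.batch_update(body).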
References:
CopyPasteRequest
copy_to(spreadsheet_id)
DeleteSheetRequest

Delete (remove) column in google sheet over gspread Python like sheet.delete_row

Is there a method like worksheet.delete_row for deleting a column in gspread?
I tried:
delete = sheet.range('A1:A1000')
for cell in delete:
    cell.value = ""
sheet.update_cells(delete)
but that only deletes all the values, not the column.
Can anybody help me?
Answer:
There is no method in gspread to delete an entire column, like Worksheet.delete_row; however, you can do this with a batch update.
Code sample:
spreadsheetId = "your-spreadsheet-id"
sheetId = "id-of-sheet-to-delete-column-from"
sh = client.open_by_key(spreadsheetId)
request = {
    "requests": [
        {
            "deleteDimension": {
                "range": {
                    "sheetId": sheetId,
                    "dimension": "COLUMNS",
                    "startIndex": 0,
                    "endIndex": 1
                }
            }
        }
    ]
}
result = sh.batch_update(request)
This sample deletes column A. Change startIndex and endIndex to cover the column range you wish to delete; the indexes are zero-based and endIndex is exclusive.
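Because startIndex and endIndex are numeric, it can help to derive them from column letters. Here is a minimal sketch of that idea; column_letter_to_index and delete_columns_request are hypothetical helpers written for this illustration, not gspread functions.

```python
def column_letter_to_index(letters):
    """Convert a column label such as 'A' or 'AA' to a zero-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def delete_columns_request(sheet_id, first_col, last_col=None):
    """Build a deleteDimension body for columns first_col..last_col inclusive.

    Hypothetical helper. Column labels are letters as shown in the
    Sheets UI; last_col defaults to first_col (delete a single column).
    """
    start = column_letter_to_index(first_col)
    end = column_letter_to_index(last_col or first_col) + 1  # exclusive end
    return {
        "requests": [
            {
                "deleteDimension": {
                    "range": {
                        "sheetId": sheet_id,
                        "dimension": "COLUMNS",
                        "startIndex": start,
                        "endIndex": end,
                    }
                }
            }
        ]
    }


# Columns C through E on the sheet with ID 0.
request = delete_columns_request(0, "C", "E")
print(request["requests"][0]["deleteDimension"]["range"])
```

The returned dictionary has the same shape as the request in the code sample, so it could be passed to sh.batch_update(request) in the same way.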
Edit:
If you do not know the sheetId of a given sheet, you can get it using the following:
sheetName = "theSheetName"
sheetId = sh.worksheet(sheetName)._properties["sheetId"]
Note that this is not needed for the original sheet of a Spreadsheet, as its ID is always 0.
References:
Method: spreadsheets.batchUpdate | Sheets API | Google Developers
API References - gspread 3.4.0 documentation - batch_update(body)
Update 2020-04-15:
This script was merged into gspread master today via pull request #759 as the method delete_column().
The method will be available in the next release, v3.5.0.
A delete_columns() method was also added, parallel to the existing delete_rows(), via pull request #761.
