Google Sheets Python batchUpdate repeatCell -> issue with range and number format - python

I am trying to use the google sheets api for python to format only a specific columns results to a "NUMBER" type but am struggling to get it to work properly. Am I doing something wrong with the "range" block? There are values that are getting appended to the column and when they get appended (via a different api set) they do not come back as formatted numbers that, when highlighting the entire column, result in a numbered sum.
id_sampleforstackoverflow = 'abcdefg123xidjadsfh192810'
cost_sav_body = {
"requests": [
{
"repeatCell": {
"range": {
"sheetId": 0,
"startRowIndex": 2,
"endRowIndex": 6,
"startColumnIndex": 0,
"endColumnIndex": 6
},
"cell": {
"userEnteredFormat": {
"numberFormat": {
"type": "NUMBER",
"pattern": "#.0#;#.0#"
}
}
},
"fields": "userEnteredFormat.numberFormat"
}
}
]
}
cost_sav_sum = service.spreadsheets().batchUpdate(spreadsheetId=id_sampleforstackoverflow, body=cost_sav_body).execute()
So when I run the above with the rest of my code, the values get appended, however, when highlighting the column, it simply gives me a count of the objects, and not a formatted number summing the total of the values (i.e. there are three values of -24, but only see a "Count" of 3 instead of -72).
I am using the GCP recommendations api for machineType to append the cost projection -> costs -> units value to the column (they append for example like i.e. -24).
Can someone help?
Documentation I have already gone through:
https://cloud.google.com/blog/products/application-development/formatting-cells-with-the-google-sheets-api
https://developers.google.com/sheets/api/guides/formats
https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/other#GridRange

#all
I was able to figure out the problem. When doing straight reporting of the values for the cost (as explained above as an objective) I was converting the output to string using the str() python method. I removed that str() method and kept the rest of the code you see above and now things are posting correctly:
#spend = str(element.primary_impact.cost_projection.cost.units)
spend = element.primary_impact.cost_projection.cost.units
So FYI for anyone else wondering, make sure that str() method is not used if you need to do a custom formatting code to those particular cells!

Related

Sequential Searching Across Multiple Indexes In Elasticsearch

Suppose I have Elasticsearch indexes in the following order:
index-2022-04
index-2022-05
index-2022-06
...
index-2022-04 represents the data stored in the month of April 2022, index-2022-05 represents the data stored in the month of May 2022, and so on. Now let's say in my query payload, I have the following timestamp range:
"range": {
"timestampRange": {
"gte": "2022-04-05T01:00:00.708363",
"lte": "2022-06-06T23:00:00.373772"
}
}
The above range states that I want to query the data that exists between the 5th of April till the 6th of May. That would mean that I have to query for the data inside three indexes, index-2022-04, index-2022-05 and index-2022-06. Is there a simple and efficient way of performing this query across those three indexes without having to query for each index one-by-one?
I am using Python to handle the query, and I am aware that I can query across different indexes at the same time (see this SO post). Any tips or pointers would be helpful, thanks.
You simply need to define an alias over your indices and query the alias instead of the indexes and let ES figure out which underlying indexes it needs to visit.
Eventually, for increased search performance, you can also configure index-time sorting on timestampRange, so that if your alias spans a full year of indexes, ES knows to visit only three of them based on the range constraint in your query (2022-04-05 -> 2022-04-05).
Like you wrote, you can simply use a wildcard in and/or pass a list as target index.
The simplest way would be to to just query all of your indices with an asterisk wildcard (e.g. index-* or index-2022-*) as target. You do not need to define an alias for that, you can just use the wildcard in the target string, like so:
from elasticsearch import Elasticsearch
es_client = Elasticsearch('https://elastic.host:9200')
datestring_start = '2022-04-05T01:00:00.708363'
datestring_end = '2022-06-06T23:00:00.373772'
result = es_client.search(
index = 'index-*',
query = { "bool": {
"must": [{
"range": {
"timestampRange": {
"gte": datestring_start,
"lte": datestring_end
}
}
}]
}
})
This will query all indices that match the pattern, but I would expect Elasticsearch to perform some sort of optimization on this. As #Val wrote in his answer, configuring index-time sorting will be beneficial for performance, as it limits the number of documents that should be visited when the index sort and the search sort are the same.
For completeness sake, if you really wanted to pass just the relevant index names to Elasticsearch, another option would be to first figure out on the Python side which sequence of indices you need to query and supply these as a comma-separated list (e.g. ['index-2022-04', 'index-2022-05', 'index-2022-06']) as target. You could e.g. use the Pandas date_range() function to easily generate such a list of indices, like so
from elasticsearch import Elasticsearch
import pandas as pd
es_client = Elasticsearch('https://elastic.host:9200')
datestring_start = '2022-04-05T01:00:00.708363'
datestring_end = '2022-06-06T23:00:00.373772'
months_list = pd.date_range(pd.to_datetime(datestring_start).to_period('M').to_timestamp(), datestring_end, freq='MS').strftime("index-%Y-%m").tolist()
result = es_client.search(
index = months_list,
query = { "bool": {
"must": [{
"range": {
"timestampRange": {
"gte": datestring_start,
"lte": datestring_end
}
}
}]
}
})

Converting csv to nested Json using python

I want to convert csv file to json file.
I have large data in csv file.
CSV Column Structure
This is my column structure in csv file . I has 200+ records.
id.oid libId personalinfo.Name personalinfo.Roll_NO personalinfo.addr personalinfo.marks.maths personalinfo.marks.physic clginfo.clgName clginfo.clgAddr clginfo.haveCert clginfo.certNo clginfo.certificates.cert_name_1 clginfo.certificates.cert_no_1 clginfo.certificates.cert_exp_1 clginfo.certificates.cert_name_2 clginfo.certificates.cert_no_2 clginfo.certificates.cert_exp_2 clginfo.isDept clginfo.NoofDept clginfo.DeptDetails.DeptName_1 clginfo.DeptDetails.location_1 clginfo.DeptDetails.establish_date_1 _v updatedAt.date
Expected Json
[{
"id":
{
"$oid": "00001"
},
"libId":11111,
"personalinfo":
{
"Name":"xyz",
"Roll_NO":101,
"addr":"aa bb cc ddd",
"marks":
[
"maths":80,
"physic":90
.....
]
},
"clginfo"
{
"clgName":"pqr",
"clgAddr":"qwerty",
"haveCert":true, //this is boolean true or false
"certNo":1, //this could be 1-10
"certificates":
[
{
"cert_name_1":"xxx",
"cert_no_1":12345,
"cert_exp.1":"20/2/20202"
},
{
"cert_name_2":"xxx",
"cert_no_2":12345,
"cert_exp_2":"20/2/20202"
},
......//could be up to 10
],
"isDept":true, //this is boolean true or false
"NoofDept":1 , //this could be 1-10
"DeptDetails":
[
{
"DeptName_1":"yyy",
"location_1":"zzz",
"establish_date_1":"1/1/1919"
},
......//up to 10 records
]
},
"__v": 1,
"updatedAt":
{
"$date": "2022-02-02T13:35:59.843Z"
}
}]
I have tried using pandas but I'm getting output as
My output
[{
"id.$oid": "00001",
"libId":11111,
"personalinfo.Name":"xyz",
"personalinfo.Roll_NO":101,
"personalinfo.addr":"aa bb cc ddd",
"personalinfo.marks.maths":80,
"personalinfo.marks.physic":90,
"clginfo.clgName":"pqr",
"clginfo.clgAddr":"qwerty",
"clginfo.haveCert":true,
"clginfo.certNo":1,
"clginfo.certificates.cert_name_1":"xxx",
"clginfo.certificates.cert_no_1":12345,
"clginfo.certificates.cert_exp.1":"20/2/20202"
"clginfo.certificates.cert_name_2":"xxx",
"clginfo.certificates.cert_no_2":12345,
"clginfo.certificates.cert_exp_2":"20/2/20202"
"clginfo.isDept":true,
"clginfo.NoofDept":1 ,
"clginfo.DeptDetails.DeptName_1":"yyy",
"clginfo.DeptDetails.location_1":"zzz",
"eclginfo.DeptDetails.stablish_date_1":"1/1/1919",
"__v": 1,
"updatedAt.$date": "2022-02-02T13:35:59.843Z",
}]
I am new to python I only know the basic Please help me getting this output.
200+ records is really tiny, so even naive solution is good.
It can't be totally generic because I don't see how it can be seen from the headers that certificates is a list, unless we rely on all names under certificates having _N at the end.
Proposed solution using only basic python:
read header row - split all column names by period. Iterate over resulting list and create nested dicts with appropriate keys and dummy values (if you want to handle lists: create array if current key ends with _N and use N as an index)
for all rows:
clone dictionary with dummy values
for each column use split keys from above to put the value into the corresponding dict. same solution from above for lists.
append the dictionary to list of rows

Pulling a value from DataFrame based on another value

I'm playing around with the Facebook Ads API, I've pulled campaign data for one of my campaigns. If I have this dataframe:
[<Insights> {
"actions": [
{
"action_type": "custom_event_abc",
"value": 50
},
{
"action_type": "custom_event_def",
"value": 42
},]
How would I go about getting the value for custom_event_def out?
In my wider results, I first used (df.loc[0]['actions'][1]['value']) in my code which worked, but my issue with that is that custom_event_abc doesn't always appear and so the position of custom_event_defcan change; meaning my solution only works some of the time.
Can value (42) be pulled out using a reference to the action_type?
This will first create a dictionary actions with the content of "actions", iterate through all value it to find custom_event_def and then print the corresponding value
actions = df.loc[0]['actions']
for i, elem in enumerate(actions):
if elem['action_type'] == "custom_event_def":
print(actions[i]['value'])

Quickest way to write a column with google sheet API and gspread

I have a column (sheet1!A:A) with 6000 rows, I would like to write today's date (todays_date) to each cell in the column. Currently doing it by using .values_update() method in a while loop but it takes too much time and giving APIError due to quota limit.
x=0
while x <= len(column):
sh.values_update(
'Sheet1!A'+str(x),
params={
'valueInputOption': 'USER_ENTERED'
} ,
body={
'values': todays_date]
}
)
x+=1
Is there any other way that I can change the cell values altogether?
You want to put a value to all cells in the column "A" in "Sheet1".
You want to achieve this using gspread with python.
You want to reduce the process cost for this situation.
If my understanding is correct, how about this answer? Please think of this as just one of several possible answers.
In this answer, I used the method of batch_update of gspread and RepeatCellRequest of the method of batchUpdate in Sheets API. In this case, above situation can be achieved by one API call.
Sample script:
Before you run the script, please set the variables of spreadsheetId and sheetName.
spreadsheetId = "###" # Please set the Spreadsheet ID.
sheetName = "Sheet1" # Please set the sheet name.
sh = client.open_by_key(spreadsheetId)
sheetId = sh.worksheet(sheetName)._properties['sheetId']
todays_date = (datetime.datetime.now() - datetime.datetime(1899, 12, 30)).days
requests = [
{
"repeatCell": {
"range": {
"sheetId": sheetId,
"startRowIndex": 0,
"startColumnIndex": 0,
"endColumnIndex": 1
},
"cell": {
"userEnteredValue": {
"numberValue": todays_date
},
"userEnteredFormat": {
"numberFormat": {
"type": "DATE",
"pattern": "dd/mm/yyyy"
}
}
},
"fields": "userEnteredValue,userEnteredFormat"
}
}
]
res = sh.batch_update({'requests': requests})
print(res)
When you run above script, the today's date is put to all cells of the column "A" as the format of dd/mm/yyyy.
If you want to change the value and format, please modify above script.
At (datetime.datetime.now() - datetime.datetime(1899, 12, 30)).days, the date is converted to the serial number. By this, the value can be used as the date object in Google Spreadsheet.
References:
batch_update(body)
RepeatCellRequest
If I misunderstood your question and this was not the direction you want, I apologize.

Setting column in Google Sheets API (with Python) to be number-formatted

I'm trying to format a column of numbers in Google Sheets using the API (Sheets API v.4 and Python 3.6.1, specifically). A portion of my non-functional code is below. I know it's executing, as the background color of the column gets set, but the numbers still show as text, not numbers.
Put another way, I'm trying to get the equivalent of clicking on a column header (A, B, C, or whatever) then choosing the Format -> Number -> Number menu item in the GUI.
def sheets_batch_update(SHEET_ID,data):
print ( ("Sheets: Batch update"))
service.spreadsheets().batchUpdate(spreadsheetId=SHEET_ID,body=data).execute() #,valueInputOption='RAW'
data={
"requests": [
{
"repeatCell": {
"range": {
"sheetId": all_sheets['Users'],
"startColumnIndex": 19,
"endColumnIndex": 20
},
"cell": {
"userEnteredFormat": {
"numberFormat": {
"type": "NUMBER",
"pattern": "#,##0",
},
"backgroundColor": {
"red": 0.0,
"green": 0.4,
"blue": 0.4
},
}
},
"fields": "userEnteredFormat(numberFormat,backgroundColor)"
}
},
]
}
sheets_batch_update(SHEET_ID, data)
The problem is likely that your data is currently stored as strings and therefore not affected by the number format.
"userEnteredValue": {
"stringValue": "1000"
},
"formattedValue": "1000",
"userEnteredFormat": {
"numberFormat": {
"type": "NUMBER",
"pattern": "#,##0"
}
},
When you set a number format via the UI (Format > Number > ...) it's actually doing two things at once:
Setting the number format.
Converting string values to number values, if possible.
Your API call is only doing #1, so any cells that are currently set with a string value will remain a string value and will therefore be unaffected by the number format. One solution would be to go through the affected values and move the stringValue to a numberValue if the cell contains a number.
To flesh out the answer from Eric Koleda a bit more, I ended up solving this two ways, depending on how I was getting the data for the Sheet:
First, if I was appending cells to the sheet, I used a function:
def set_cell_type(cell_contents):
current_cell_contents=str(cell_contents).replace(',', '')
float_cell=re.compile("^\d+\.\d+$")
int_cell=re.compile("^\d+$")
if int_cell.search(current_cell_contents):
data = {"userEnteredValue": {"numberValue": int(current_cell_contents)}}
elif float_cell.search(current_cell_contents):
data = {"userEnteredValue": {"numberValue": float(current_cell_contents)}}
else:
data = {"userEnteredValue": {"stringValue": str(cell_contents)}}
return data
To format the cells properly. Here's the call that actually did the appending:
rows = [{"values": [set_cell_type(cell) for cell in row]} for row in daily_data_output]
data = { "requests": [ { "appendCells": { "sheetId": all_sheets['Daily record'], "rows": rows, "fields": "*", } } ], }
sheets_batch_update(SHEET_ID,data)
Second, if I was replacing a whole sheet, I did:
#convert the ints to ints and floats to floats
float_cell=re.compile("^\d+\.\d+$")
int_cell=re.compile("^\d+$")
row_list=error_message.split("\t")
i=0
while i < len(row_list):
current_cell=row_list[i].replace(',', '') #remove the commas from any numbers
if int_cell.search(current_cell):
row_list[i]=int(current_cell)
elif float_cell.search(current_cell):
row_list[i]=float(current_cell)
i+=1
error_output.append(row_list)
then the following to actually save error_output to the sheet:
data = {'values': [row for row in error_output]}
sheets_update(SHEET_ID,data,'Errors!A1')
those two techniques, coupled with the formatting calls I had already figured out in my initial question, did the trick.

Categories