Quickest way to write a column with google sheet API and gspread - python

I have a column (Sheet1!A:A) with 6000 rows, and I would like to write today's date (todays_date) to each cell in the column. I am currently doing it with the .values_update() method in a while loop, but it takes too much time and raises an APIError because of the quota limit.
x = 1  # row numbers in A1 notation start at 1
while x <= len(column):
    sh.values_update(
        'Sheet1!A' + str(x),
        params={
            'valueInputOption': 'USER_ENTERED'
        },
        body={
            'values': [[todays_date]]
        }
    )
    x += 1
Is there any other way that I can change the cell values altogether?

You want to put a value in all cells of column "A" in "Sheet1".
You want to achieve this using gspread with Python.
You want to reduce the processing cost.
If my understanding is correct, how about this answer? Please think of this as just one of several possible answers.
In this answer, I used gspread's batch_update method with the RepeatCellRequest of the Sheets API batchUpdate method. This way, the above can be achieved with one API call.
Sample script:
Before you run the script, please set the variables of spreadsheetId and sheetName.
import datetime

# `client` is assumed to be an already authorized gspread client.
spreadsheetId = "###"  # Please set the Spreadsheet ID.
sheetName = "Sheet1"  # Please set the sheet name.

sh = client.open_by_key(spreadsheetId)
sheetId = sh.worksheet(sheetName)._properties['sheetId']
# Convert today's date to a serial number (days since 1899-12-30).
todays_date = (datetime.datetime.now() - datetime.datetime(1899, 12, 30)).days
requests = [
    {
        "repeatCell": {
            "range": {
                "sheetId": sheetId,
                "startRowIndex": 0,
                "startColumnIndex": 0,
                "endColumnIndex": 1
            },
            "cell": {
                "userEnteredValue": {
                    "numberValue": todays_date
                },
                "userEnteredFormat": {
                    "numberFormat": {
                        "type": "DATE",
                        "pattern": "dd/mm/yyyy"
                    }
                }
            },
            "fields": "userEnteredValue,userEnteredFormat"
        }
    }
]
res = sh.batch_update({'requests': requests})
print(res)
When you run the above script, today's date is put in all cells of column "A" in the dd/mm/yyyy format.
If you want to change the value and format, please modify the above script.
At (datetime.datetime.now() - datetime.datetime(1899, 12, 30)).days, the date is converted to a serial number. This way, the value can be used as a date in Google Spreadsheet.
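As a rough illustration of the serial number conversion described above, the following sketch wraps the same subtraction in two small helpers. The function names are made up for this example and are not part of gspread or the Sheets API; the only fact relied on is the 1899-12-30 epoch used in the answer.

```python
import datetime

# Serial number 0 corresponds to 30 December 1899, as in the answer above.
SHEETS_EPOCH = datetime.date(1899, 12, 30)


def date_to_serial(day: datetime.date) -> int:
    """Return the serial number (whole days since the epoch) for a date."""
    return (day - SHEETS_EPOCH).days


def serial_to_date(serial: int) -> datetime.date:
    """Inverse of date_to_serial, handy for checking a value by hand."""
    return SHEETS_EPOCH + datetime.timedelta(days=serial)


serial = date_to_serial(datetime.date(2020, 1, 1))
print(serial)                  # 43831
print(serial_to_date(serial))  # 2020-01-01
```

The integer returned by date_to_serial could be passed as "numberValue" in the request above in place of todays_date.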
References:
batch_update(body)
RepeatCellRequest
If I misunderstood your question and this was not the direction you want, I apologize.

Related

Adding Current Date column next to current Data Frame using Python

I'm not fully understanding data frames & am in the process of taking a course on them. This one feels like it should be so easy, but I could really use an explanation.
All I want to do is ADD a column next to my current output that has the CURRENT date in the cells.
I'm getting a timestamp using
time = pd.Timestamp.today()
print (time)
But obviously this is just to print, not connecting it to my other code.
I was able to accomplish this in Google Sheets (once the output lands), but it would be so much cleaner (and informative) if I could do it right from the script.
This is what it currently looks like:
import requests
import pandas as pd
import gspread

gc = gspread.service_account(filename='creds.json')
sh = gc.open_by_key('152qSpr-4nK9V5uHOiYOWTWUx4ojjVNZMdSmFYov-n50')
waveData = sh.get_worksheet(1)
id_list = [
    "/Belmar-Surf-Report/3683/",
    "/Manasquan-Surf-Report/386/",
    "/Ocean-Grove-Surf-Report/7945/",
    "/Asbury-Park-Surf-Report/857/",
    "/Avon-Surf-Report/4050/",
    "/Bay-Head-Surf-Report/4951/",
    "/Belmar-Surf-Report/3683/",
    "/Boardwalk-Surf-Report/9183/",
]
res = []
for x in id_list:
    df = pd.read_html(requests.get("http://magicseaweed.com" + x).text)[0]
    values = [[x], df.columns.values.tolist(), *df.values.tolist()]  ## does it go within here?
    res.extend(values)
    res.append([])
waveData.append_rows(res, value_input_option="USER_ENTERED")
I thought it would go within the values, since this is (where I believe) my columns are built?
Would love to understand this better if someone is willing to take the time.
In your situation, how about the following modification?
From:
waveData.append_rows(res, value_input_option="USER_ENTERED")
To:
waveData.append_rows(res, value_input_option="USER_ENTERED")
# In this case, please add the following script.
row = len(waveData.get_all_values())
col = max([len(e) for e in res])
time = pd.Timestamp.today()
req = {
    "requests": [{
        "repeatCell": {
            "cell": {
                "userEnteredValue": {"stringValue": str(time)}
            },
            "range": {
                "sheetId": waveData.id,
                "startRowIndex": row - len(res) + 1,
                "endRowIndex": row,
                "startColumnIndex": col,
                "endColumnIndex": col + 1
            },
            "fields": "userEnteredValue"
        }
    }]
}
sh.batch_update(req)
When this script is run, the timestamp from pd.Timestamp.today() is added by the batchUpdate method to the column right after the last column.
If you want to add the timestamp to a specific column, please modify the range in the above script.
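To make the range adjustment more concrete, here is a minimal sketch of a helper that builds the same kind of repeatCell request for an arbitrary column. The helper name and the sample arguments are hypothetical; the request layout simply mirrors the one in the script above, where indexes are zero-based and end indexes are exclusive.

```python
def column_fill_request(sheet_id, start_row, end_row, column, text):
    """Build a repeatCell request that writes `text` into one column.

    Row and column indexes are zero-based; end indexes are exclusive.
    """
    return {
        "repeatCell": {
            "cell": {"userEnteredValue": {"stringValue": text}},
            "range": {
                "sheetId": sheet_id,
                "startRowIndex": start_row,
                "endRowIndex": end_row,
                "startColumnIndex": column,
                "endColumnIndex": column + 1,
            },
            "fields": "userEnteredValue",
        }
    }


# Hypothetical values: sheet ID 0, rows 2 to 10 (as shown in the UI) of column F.
request = column_fill_request(0, 1, 10, 5, "2021-01-01 12:00:00")
body = {"requests": [request]}
print(body["requests"][0]["repeatCell"]["range"])
# With gspread, this body would be sent as in the script above: sh.batch_update(body)
```

Keeping the request construction in a plain function makes it possible to check the computed range before any API call is made.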

Google Sheets Python batchUpdate repeatCell -> issue with range and number format

I am trying to use the Google Sheets API for Python to format only a specific column's results as a "NUMBER" type, but I am struggling to get it to work properly. Am I doing something wrong with the "range" block? Values are appended to the column via a different API call, and when they are appended they do not come back as formatted numbers that produce a sum when the entire column is highlighted.
id_sampleforstackoverflow = 'abcdefg123xidjadsfh192810'
cost_sav_body = {
    "requests": [
        {
            "repeatCell": {
                "range": {
                    "sheetId": 0,
                    "startRowIndex": 2,
                    "endRowIndex": 6,
                    "startColumnIndex": 0,
                    "endColumnIndex": 6
                },
                "cell": {
                    "userEnteredFormat": {
                        "numberFormat": {
                            "type": "NUMBER",
                            "pattern": "#.0#;#.0#"
                        }
                    }
                },
                "fields": "userEnteredFormat.numberFormat"
            }
        }
    ]
}
cost_sav_sum = service.spreadsheets().batchUpdate(spreadsheetId=id_sampleforstackoverflow, body=cost_sav_body).execute()
So when I run the above with the rest of my code, the values get appended. However, when I highlight the column, it only gives me a count of the objects, not a sum of the values (i.e. there are three values of -24, but I only see a "Count" of 3 instead of a sum of -72).
I am using the GCP Recommendations API for machineType to append the cost projection -> costs -> units value to the column (the values are appended as, for example, -24).
Can someone help?
Documentation I have already gone through:
https://cloud.google.com/blog/products/application-development/formatting-cells-with-the-google-sheets-api
https://developers.google.com/sheets/api/guides/formats
https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/other#GridRange
I was able to figure out the problem. When doing straight reporting of the values for the cost (as explained above as an objective) I was converting the output to string using the str() python method. I removed that str() method and kept the rest of the code you see above and now things are posting correctly:
#spend = str(element.primary_impact.cost_projection.cost.units)
spend = element.primary_impact.cost_projection.cost.units
So FYI for anyone else wondering, make sure that str() method is not used if you need to do a custom formatting code to those particular cells!
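The effect of str() can be seen without calling the API at all. The sketch below only serializes two request bodies with the standard json module; the "values" layout is an assumption about what the append call sends, and -24 is the example value from the question. Whether a string is later parsed back into a number depends on the valueInputOption of the write call (with RAW, values are stored as-is).

```python
import json

units = -24  # example value from the question

# A Python str is serialized as a JSON string, so the cell may end up holding text.
as_text = json.dumps({"values": [[str(units)]]})

# A Python int is serialized as a JSON number, so the cell holds a number.
as_number = json.dumps({"values": [[units]]})

print(as_text)    # {"values": [["-24"]]}
print(as_number)  # {"values": [[-24]]}
```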

How to compare a json with a CSV file

I have a JSON payload that is used for a service request. After processing, that payload (JSON) is stored in S3, and through Athena we can download the data in CSV format. In the actual scenario there are more than 100 fields. I want to verify their values with an automated script instead of manually.
say my sample payload is similar to the following:
{
    "BOOK": {
        "serialno": "123",
        "author": "xyz",
        "yearofpublish": "2015",
        "price": "16"
    },
    "Author": [
        {
            "isbn": "xxxxx", "title": "first", "publisher": "xyz", "year": "2020"
        },
        {
            "isbn": "yyyy", "title": "second", "publisher": "zmy", "year": "2019"
        }
    ]
}
the sample csv will be like following:
Can anyone please help me with how exactly I can do this in Python? Maybe with a library or a dictionary?
It looks like you just want to flatten out the JSON structure. It'll be easiest to loop over the "Author" list. Since the CSV has renamed the columns, you'll need some way to represent that mapping. Based only on the example, this works:
import json

# some_json_file is a placeholder for the path to the JSON payload
with open(some_json_file, 'r') as fin:
    j = json.load(fin)

result = []
for author in j['Author']:
    # one output row per author, with the BOOK fields repeated on each row
    val = {'book_serialno': j['BOOK']['serialno'],
           'book_author': j['BOOK']['author'],
           'book_yearofpublish': j['BOOK']['yearofpublish'],
           'book_price': j['BOOK']['price'],
           'author_isbn': author['isbn'],
           'author_title': author['title'],
           'author_publisher': author['publisher'],
           'author_year': author['year']}
    result.append(val)
This is using a dictionary to show the mapping of data points to the new column names. You might be able to get away with using a list as well. Depends how you want to use it later on. To write to a CSV:
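import csv

# some_csv_file is a placeholder for the output path;
# newline='' lets the csv module control the line endings
with open(some_csv_file, 'w', newline='') as fout:
    writer = csv.writer(fout)
    writer.writerow(result[0].keys())
    writer.writerows(r.values() for r in result)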
This writes the column names in the first row, then the data. If you don't want the column names, just leave out the writerow(...) line.
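Since the original goal was to verify the CSV against the payload, the flattened rows can also be compared with the downloaded file directly instead of being written out. The following is only a sketch: the column names follow the mapping used above, the CSV text is a stand-in for the Athena export (whose real header is not shown in the question), and the rows are assumed to come back in the same order as the "Author" list.

```python
import csv
import io

# Flattened rows in the shape produced by the loop above (sample values only).
expected = [
    {"book_serialno": "123", "author_isbn": "xxxxx", "author_title": "first"},
    {"book_serialno": "123", "author_isbn": "yyyy", "author_title": "second"},
]

# Stand-in for the exported file; a real run would open the CSV file instead.
csv_text = (
    "book_serialno,author_isbn,author_title\n"
    "123,xxxxx,first\n"
    "123,yyyy,WRONG\n"
)


def find_mismatches(expected_rows, csv_file):
    """Return (row_number, column, expected, actual) for every differing field."""
    mismatches = []
    actual_rows = list(csv.DictReader(csv_file))
    if len(actual_rows) != len(expected_rows):
        mismatches.append((None, "row count", len(expected_rows), len(actual_rows)))
    for number, (want, got) in enumerate(zip(expected_rows, actual_rows), start=1):
        for column, value in want.items():
            if got.get(column) != value:
                mismatches.append((number, column, value, got.get(column)))
    return mismatches


mismatches = find_mismatches(expected, io.StringIO(csv_text))
print(mismatches)  # [(2, 'author_title', 'second', 'WRONG')]
```

An empty list means every expected field was found with the same value, which scales to 100+ fields without listing them by hand.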

Filtering pandas dataframe by date to count views for timeline of programs

I need to count viewers by program for a streaming channel from a json logfile.
I identify the programs by their starttimes, such as:
So far I have two Dataframes like this:
The first one contains all the timestamps from the logfile
viewers_from_log = pd.read_json('sqllog.json', encoding='UTF-8')
# Convert date string to pandas datetime object:
viewers_from_log['time'] = pd.to_datetime(viewers_from_log['time'])
Source JSON file:
[
    {
        "logid": 191605,
        "time": "0:00:17"
    },
    {
        "logid": 191607,
        "time": "0:00:26"
    },
    {
        "logid": 191611,
        "time": "0:01:20"
    }
]
The second contains the starting times and titles of the programs
programs_start_time = pd.DataFrame.from_dict('programs.json', orient='index')
Source JSON file:
{
    "2019-05-29": [
        {
            "title": "\"Amiről a kövek mesélnek\"",
            "startTime_dt": "2019-05-29T00:00:40Z"
        },
        {
            "title": "Koffer - Kedvcsináló Kul(t)túrák Külföldön",
            "startTime_dt": "2019-05-29T00:22:44Z"
        },
        {
            "title": "Gubancok",
            "startTime_dt": "2019-05-29T00:48:08Z"
        }
    ]
}
So what I need to do is to count the entries / program in the log file and link them to the program titles.
My approach is to slice log data for each date range from program data and get the shape. Next add column for program data with results:
import pandas as pd

# setup test data
log_data = {'Time': ['2019-05-30 00:00:26', '2019-05-30 00:00:50', '2019-05-30 00:05:50', '2019-05-30 00:23:26']}
log_data = pd.DataFrame(data=log_data)
program_data = {'Time': ['2019-05-30 00:00:00', '2019-05-30 00:22:44'],
                'Program': ['Program 1', 'Program 2']}
program_data = pd.DataFrame(data=program_data)

counts = []
for index, row in program_data.iterrows():
    # get counts on selected range
    try:
        log_range = log_data[(log_data['Time'] > program_data.loc[index].values[0]) & (log_data['Time'] < program_data.loc[index+1].values[0])]
        counts.append(log_range.shape[0])
    except KeyError:
        # last program: there is no next start time, so count everything after it
        log_range = log_data[log_data['Time'] > program_data.loc[index].values[0]]
        counts.append(log_range.shape[0])

# add additional column with collected counts
program_data['Counts'] = counts
Output:
                  Time    Program  Counts
0  2019-05-30 00:00:00  Program 1       3
1  2019-05-30 00:22:44  Program 2       1
A working (but maybe a little quick and dirty) method:
Use the .shift(-1) method on the timestamp column of the programs_start_time dataframe to get an additional column, named date_end, holding the end timestamp of each TV program.
Then, for each example_timestamp in the log file, you can query the TV programs dataframe like this: df[(df['date_start'] <= example_timestamp) & (df['date_end'] > example_timestamp)] (make sure you substitute df with your dataframe's name, programs_start_time). This gives you exactly one dataframe row, from which you can extract the name of the TV program.
Hope this helps!
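The same "start <= timestamp < next start" lookup can be sketched without pandas. The version below swaps the dataframe query for the standard library's bisect module, so it is a different technique from the .shift(-1) approach rather than an implementation of it; the sample data is copied from the test setup in the first answer above.

```python
from bisect import bisect_right
from collections import Counter
from datetime import datetime

# Sample data copied from the test setup in the first answer above.
programs = [
    ("2019-05-30 00:00:00", "Program 1"),
    ("2019-05-30 00:22:44", "Program 2"),
]
log_times = [
    "2019-05-30 00:00:26",
    "2019-05-30 00:00:50",
    "2019-05-30 00:05:50",
    "2019-05-30 00:23:26",
]


def parse(text):
    """Parse a 'YYYY-MM-DD HH:MM:SS' string into a datetime."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


starts = [parse(start) for start, _ in programs]  # must be sorted ascending
titles = [title for _, title in programs]

counts = Counter()
for text in log_times:
    # Index of the last program whose start time is <= the log time.
    position = bisect_right(starts, parse(text)) - 1
    if position >= 0:  # entries before the first program are skipped
        counts[titles[position]] += 1

print(dict(counts))  # {'Program 1': 3, 'Program 2': 1}
```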
Solution with histogram, using numpy:
import pandas as pd
import numpy as np

df_p = pd.DataFrame([
    {
        "title": "\"Amiről a kövek mesélnek\"",
        "startTime_dt": "2019-05-29T00:00:40Z"
    },
    {
        "title": "Koffer - Kedvcsináló Kul(t)túrák Külföldön",
        "startTime_dt": "2019-05-29T00:22:44Z"
    },
    {
        "title": "Gubancok",
        "startTime_dt": "2019-05-29T00:48:08Z"
    }
])
df_v = pd.DataFrame([
    {
        "logid": 191605,
        "time": "2019-05-29 0:00:17"
    },
    {
        "logid": 191607,
        "time": "2019-05-29 0:00:26"
    },
    {
        "logid": 191611,
        "time": "2019-05-29 0:01:20"
    }
])
df_p.startTime_dt = pd.to_datetime(df_p.startTime_dt)
df_v.time = pd.to_datetime(df_v.time)
# here's the part where I convert datetime to a timestamp in seconds:
# astype("int64") casts it to nanoseconds, hence the // 10**9
programmes_start = df_p.startTime_dt.astype("int64").values // 10**9
viewings_starts = df_v.time.astype("int64").values // 10**9
# make bins for the histogram:
# add zero to the beginning of the array
# add a value one hour after the start of the last given programme to the end of the array
programmes_start = np.pad(programmes_start, (1, 1), mode='constant', constant_values=(0, programmes_start.max()+3600))
histogram = np.histogram(viewings_starts, bins=programmes_start)
print(histogram[0])
# prints [2 1 0 0]
Interpretation: there were 2 log entries before 'Amiről a kövek mesélnek' started, 1 log entry between the starts of 'Amiről a kövek mesélnek' and 'Koffer - Kedvcsináló Kul(t)túrák Külföldön', 0 log entries between the starts of 'Koffer - Kedvcsináló Kul(t)túrák Külföldön' and 'Gubancok', and 0 entries after the start of 'Gubancok'. Which, looking at the data you provided, seems correct :) Hope this helps.
NOTE: I assume that you have the dates of the viewings. They are not in the example log file, but they appear in the screenshot, so I assumed that you can compute or get them somehow and added them by hand to the input dict.

Setting column in Google Sheets API (with Python) to be number-formatted

I'm trying to format a column of numbers in Google Sheets using the API (Sheets API v.4 and Python 3.6.1, specifically). A portion of my non-functional code is below. I know it's executing, as the background color of the column gets set, but the numbers still show as text, not numbers.
Put another way, I'm trying to get the equivalent of clicking on a column header (A, B, C, or whatever) then choosing the Format -> Number -> Number menu item in the GUI.
def sheets_batch_update(SHEET_ID, data):
    print("Sheets: Batch update")
    service.spreadsheets().batchUpdate(spreadsheetId=SHEET_ID, body=data).execute()  #,valueInputOption='RAW'

data = {
    "requests": [
        {
            "repeatCell": {
                "range": {
                    "sheetId": all_sheets['Users'],
                    "startColumnIndex": 19,
                    "endColumnIndex": 20
                },
                "cell": {
                    "userEnteredFormat": {
                        "numberFormat": {
                            "type": "NUMBER",
                            "pattern": "#,##0",
                        },
                        "backgroundColor": {
                            "red": 0.0,
                            "green": 0.4,
                            "blue": 0.4
                        },
                    }
                },
                "fields": "userEnteredFormat(numberFormat,backgroundColor)"
            }
        },
    ]
}
sheets_batch_update(SHEET_ID, data)
The problem is likely that your data is currently stored as strings and therefore not affected by the number format.
"userEnteredValue": {
    "stringValue": "1000"
},
"formattedValue": "1000",
"userEnteredFormat": {
    "numberFormat": {
        "type": "NUMBER",
        "pattern": "#,##0"
    }
},
When you set a number format via the UI (Format > Number > ...) it's actually doing two things at once:
1. Setting the number format.
2. Converting string values to number values, if possible.
Your API call is only doing #1, so any cells that are currently set with a string value will remain a string value and will therefore be unaffected by the number format. One solution would be to go through the affected values and move the stringValue to a numberValue if the cell contains a number.
To flesh out the answer from Eric Koleda a bit more, I ended up solving this two ways, depending on how I was getting the data for the Sheet:
First, if I was appending cells to the sheet, I used a function:
import re

def set_cell_type(cell_contents):
    # remove thousands separators before testing whether the cell is a number
    current_cell_contents = str(cell_contents).replace(',', '')
    float_cell = re.compile(r"^\d+\.\d+$")
    int_cell = re.compile(r"^\d+$")
    if int_cell.search(current_cell_contents):
        data = {"userEnteredValue": {"numberValue": int(current_cell_contents)}}
    elif float_cell.search(current_cell_contents):
        data = {"userEnteredValue": {"numberValue": float(current_cell_contents)}}
    else:
        data = {"userEnteredValue": {"stringValue": str(cell_contents)}}
    return data
This formats the cells properly. Here's the call that actually did the appending:
rows = [{"values": [set_cell_type(cell) for cell in row]} for row in daily_data_output]
data = {
    "requests": [
        {
            "appendCells": {
                "sheetId": all_sheets['Daily record'],
                "rows": rows,
                "fields": "*",
            }
        }
    ],
}
sheets_batch_update(SHEET_ID,data)
Second, if I was replacing a whole sheet, I did:
# convert the ints to ints and floats to floats
float_cell = re.compile(r"^\d+\.\d+$")
int_cell = re.compile(r"^\d+$")
row_list = error_message.split("\t")
i = 0
while i < len(row_list):
    current_cell = row_list[i].replace(',', '')  # remove the commas from any numbers
    if int_cell.search(current_cell):
        row_list[i] = int(current_cell)
    elif float_cell.search(current_cell):
        row_list[i] = float(current_cell)
    i += 1
error_output.append(row_list)
Then I used the following to actually save error_output to the sheet:
data = {'values': [row for row in error_output]}
sheets_update(SHEET_ID,data,'Errors!A1')
Those two techniques, coupled with the formatting calls I had already figured out in my initial question, did the trick.
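The regular expressions above match only unsigned digits, so a value such as -24 would still be sent as a string. A possible variant, sketched here under the assumption that signed values should become numbers too, lets int() and float() do the parsing instead. The function names are hypothetical, and the returned dict has the same shape as in set_cell_type.

```python
import math


def parse_number(text):
    """Return an int or a finite float parsed from text, or None if it is not numeric."""
    try:
        return int(text)
    except ValueError:
        pass
    try:
        value = float(text)
    except ValueError:
        return None
    # "nan" and "inf" parse as floats but are not usable cell numbers.
    return value if math.isfinite(value) else None


def to_cell_value(cell_contents):
    """Build a userEnteredValue dict, numeric whenever the contents parse as a number."""
    text = str(cell_contents).replace(",", "").strip()
    number = parse_number(text)
    if number is None:
        return {"userEnteredValue": {"stringValue": str(cell_contents)}}
    return {"userEnteredValue": {"numberValue": number}}


print(to_cell_value("-24"))    # {'userEnteredValue': {'numberValue': -24}}
print(to_cell_value("1,000"))  # {'userEnteredValue': {'numberValue': 1000}}
print(to_cell_value("n/a"))    # {'userEnteredValue': {'stringValue': 'n/a'}}
```

Note that this variant accepts anything int() or float() accepts (for example "1e3"), which is broader than the regular expressions in the answer, so it may need tightening for data where such strings should stay text.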
