Python gspread - get the last row without fetching all the data? - python

Looking for tips on how to get the data of the latest row of a sheet. I've seen solutions that fetch all the data and then take its length.
But that is of course a waste of fetching. I'm wondering if there is a smarter way to do it, since you can already append data to the last row + 1 with worksheet.append_rows([some_data])

I used the solution #buran mentioned. If you init the worksheet with
add_worksheet(title="title", rows=1, cols=10)
and only append new data via
worksheet.append_rows([some_array])
Then #buran's suggestion is brilliant to simply use
worksheet.row_count

I found this code in another question, it creates a dummy append in the sheet.
After that, you can search for the location later on:
def get_last_row_with_data(service, value_input_option="USER_ENTERED"):
    last_row_with_data = '1'
    try:
        # creates a dummy row
        dummy_request_append = service.spreadsheets().values().append(
            spreadsheetId='<spreadsheet id>',
            range="{0}!A:{1}".format('Tab Name', 'ZZZ'),
            valueInputOption=value_input_option,
            includeValuesInResponse=True,
            responseValueRenderOption='UNFORMATTED_VALUE',
            body={
                "values": [['']]
            }
        ).execute()
        # Search the dummy row
        a1_range = dummy_request_append.get('updates', {}).get('updatedRange', 'dummy_tab!a1')
        bottom_right_range = a1_range.split('!')[1]
        number_chars = [i for i in list(bottom_right_range) if i.isdigit()]
        last_row_with_data = ''.join(number_chars)
    except Exception as e:
        last_row_with_data = '1'
    return last_row_with_data
You can see a sample of Append in this documentation.
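The digit-filtering step in that function works for a single-cell range. A slightly more defensive variant could pull the row number out of the updatedRange string with a regular expression. This is only a sketch: the helper name row_from_updated_range is made up for illustration, and it assumes the string has the usual A1 form such as "Tab Name!A5" or "Tab Name!A5:C5".

```python
import re


def row_from_updated_range(updated_range, default=1):
    """Return the row number of the first cell in an A1-style range string."""
    # Drop the sheet name (everything up to the last '!') so that digits
    # in the tab name are ignored.
    cell_part = updated_range.rsplit('!', 1)[-1]
    # Keep only the first cell of the range, i.e. the part before ':'.
    first_cell = cell_part.split(':', 1)[0]
    match = re.search(r'(\d+)$', first_cell)
    # Fall back to `default` for whole-column ranges such as "A:ZZZ".
    return int(match.group(1)) if match else default


print(row_from_updated_range("Tab Name!A5"))
```

Returning an int instead of a string also makes it easier to do arithmetic on the result afterwards.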
However, for me it is just easier to use:
# The ID of the sheet you are working with.
Google_sheets_ID = 'ID_of_your_Google_Sheet'
# define the start row that has data
# it will later be replaced with the last row
# in my test sheet, it starts in row 2
last_row = 2
# code to get the last row
# range will be the column where the information is located,
# starting at the start row so that the arithmetic below lines up
# remember to change "sheet1" to the name of your worksheet.
response = service.spreadsheets().values().get(
    spreadsheetId=Google_sheets_ID,
    range='sheet1!A{0}:A'.format(last_row)
).execute()
# Add the initial value where the range started to the last row with values
last_row += len(response['values']) - 1
# If you print last_row, you should see the last row with values in the sheet.
print(last_row)
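If you are using gspread rather than the raw API client, a similar single-column read might look like the sketch below. It assumes worksheet is a gspread Worksheet and relies on col_values returning the column without trailing empty cells. The function name last_filled_row and the FakeWorksheet stub are invented for illustration so the snippet can be run without credentials.

```python
def last_filled_row(worksheet, col=1):
    """Row number of the last non-empty cell in `col` (0 if the column is empty)."""
    # Only one column is fetched, not the whole sheet.
    return len(worksheet.col_values(col))


class FakeWorksheet:
    """Stand-in for a gspread Worksheet, for a dry run without network access."""

    def __init__(self, columns):
        self._columns = columns

    def col_values(self, col):
        return self._columns.get(col, [])


demo = FakeWorksheet({1: ["header", "a", "", "c"]})
print(last_filled_row(demo))
```

This only tells you the last filled row of the chosen column, so pick a column that is populated in every row.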

Related

Using gspread, trying to add a column at the end of Google Sheet that already exists

Here is the code I am working with.
dfs=dfs[['Reserved']] #the column that I need to insert
dfs=dfs.applymap(str) #json did not accept the nan so needed to convert
sh=gc.open_by_key('KEY') #would open the google sheet
sh_dfs=sh.get_worksheet(0) #getting the worksheet
sh_dfs.insert_rows(dfs.values.tolist()) #inserts the dfs into the new worksheet
Running this code would insert the rows at the first column of the worksheet but what I am trying to accomplish is adding/inserting the column at the very last, column p.
In your situation, how about the following modification? It first retrieves the maximum column count, then converts the next column number to a column letter, and finally writes the values starting at the column right after the last one.
From:
sh_dfs.insert_rows(dfs.values.tolist())
To:
# Ref: https://stackoverflow.com/a/23862195
def colnum_string(n):
    string = ""
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        string = chr(65 + remainder) + string
    return string
values = sh_dfs.get_all_values()
col = colnum_string(max([len(r) for r in values]) + 1)
sh_dfs.update(col + '1', dfs.values.tolist(), value_input_option='USER_ENTERED')
Note:
If an error such as "exceeds grid limits" occurs, please insert a blank column first.
Reference:
update
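To make the column arithmetic easier to check, here is a small self-contained sketch of the same idea. colnum_string is the function from the answer; next_free_column is a hypothetical helper added for illustration, and it assumes values has the list-of-lists shape that get_all_values returns.

```python
def colnum_string(n):
    """Convert a 1-based column number to its letter (1 -> A, 27 -> AA)."""
    string = ""
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        string = chr(65 + remainder) + string
    return string


def next_free_column(values):
    """Letter of the column right after the widest row in `values`."""
    widths = [len(row) for row in values]
    # default=0 keeps this from raising on an empty list of rows.
    return colnum_string(max(widths, default=0) + 1)


print(colnum_string(16), next_free_column([["a", "b"], ["c"]]))
```

In recent gspread releases the positional order of the update arguments appears to have changed, so passing range_name= and values= by keyword may be the safer spelling; check the version you have installed.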

How to use result of search in Smartsheet Python SDK?

After starting the program
results = smart.Search.search("2244113312180")
print(results)
Getting the data
{"results":
[{"contextData": ["2244113312180"],
"objectId": 778251154810756,
"objectType": "row",
"parentObjectId": 3648397300262788,
"parentObjectName": "Sample Sheet",
"parentObjectType": "sheet",
"text": "2244113312180"},
{"contextData": ["2244113312180"],
"objectId": 7803446734415748,
"objectType": "row",
"parentObjectId": 3648397300262788,
"parentObjectName": "Sample Sheet",
"parentObjectType": "sheet",
"text": "2244113312180"}],
"totalCount": 2}
How do I use them correctly in my program?
Please provide a correct usage example.
And how do I find the ID of the column in which the value "2244113312180" was found?
new_row = smartsheet.models.Row()
new_row.id = results.objectId
Sorry I didn't write the error right away. I can't use the properties from the results. The line:
new_row.id = results.objectId
Causes an error
AttributeError: 'SearchResult' object has no attribute 'objectId'
Thank you for any help!
P.S. I found how to do it.
import json
results = smart.Search.search("2244113312180")
text = str(results)
json_op = json.loads(text)
for i in json_op["results"]:
    new_row = smartsheet.models.Row()
    new_row.id = i["objectId"]
I don't know if this is a good solution or not.
According to the SearchResultItem Object definition in the Smartsheet API docs, a search result item will never contain information about the column where a value exists. As the result JSON you've posted shows, if the specified value is found within the row of a sheet (i.e., in any of the cells that row contains), the corresponding search result item will identify the sheet ID (parentObjectId) and the row ID (objectId).
You can then use those two values to retrieve the row, as described in the Get Row section of the docs:
row = smartsheet_client.Sheets.get_row(
4583173393803140, # sheet_id
2361756178769796 # row_id
)
Then you can iterate through the row.cells array, checking the value property of each cell to determine if it matches the value you searched for previously. When you find a cell object that contains that value, the column_id property of that cell object will give you the column ID where the matching value exists.
UPDATE:
Thanks for clarifying info in your original post. I'm updating this answer to provide a complete code sample that implements the approach I described previously. Hope this is helpful!
This code sample does the following:
searches everything in Smartsheet (that the holder of the API token being used has access to) for a string value
iterates through search result items to process any "row" results (i.e., anywhere that the string appears within the cells of a sheet)
replaces any occurrences within (the cells of) a sheet with the string "new value"
# set search criteria
query = '2244113312180'
# search everything
search_results = smart.Search.search(query)
# loop through results
# (acting upon only search results that appear within a row of a sheet)
for item in search_results.results:
    if item.object_type == 'row':
        # get row
        row = smart.Sheets.get_row(
            item.parent_object_id,  # sheet_id
            item.object_id  # row_id
        )
        # find the cell that contains the value and update that cell value
        for cell in row.cells:
            if cell.value == query:
                # build new cell value
                new_cell = smartsheet.models.Cell()
                new_cell.column_id = cell.column_id
                new_cell.value = "new value"
                new_cell.strict = False
                # build the row to update
                new_row = smartsheet.models.Row()
                new_row.id = item.object_id
                new_row.cells.append(new_cell)
                # update row
                result = smart.Sheets.update_rows(
                    item.parent_object_id,  # sheet_id
                    [new_row])
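One thing to watch in the cell comparison: if a cell stores the value as a number rather than as text, comparing it directly to the query string will not match. The sketch below compares both sides as text instead. It is an illustration only: matching_column_ids is a made-up helper, the cells are plain stand-in objects that carry just the column_id and value attributes used in the answer, and values that come back as floats would still need extra handling.

```python
from types import SimpleNamespace


def matching_column_ids(cells, query):
    """Column IDs of the cells whose value equals `query` when compared as text."""
    return [
        cell.column_id
        for cell in cells
        if cell.value is not None and str(cell.value) == str(query)
    ]


# Stand-ins for row.cells; a real row would come from Sheets.get_row.
cells = [
    SimpleNamespace(column_id=101, value="abc"),
    SimpleNamespace(column_id=102, value=2244113312180),  # stored as a number
    SimpleNamespace(column_id=103, value=None),           # empty cell
]
print(matching_column_ids(cells, "2244113312180"))
```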

Randomization of a list with conditions using Pandas

I'm new to any kind of programming as you can tell by this 'beautiful' piece of hard coding. With sweat and tears (not so bad, just a little), I've created a very sequential code and that's actually my problem. My goal is to create a somewhat-automated script - probably including for-loop (I've unsuccessfully tried).
The main aim is to create a randomization loop which takes original dataset looking like this:
dataset
From this data set picking randomly row by row and saving it one by one to another excel list. The point is that the row from columns called position01 and position02 should be always selected so it does not match with the previous pick in either of those two column values. That should eventually create an excel sheet with randomized rows that are followed always by a row that does not include values from the previous pick. So row02 should not include any of those values in columns position01 and position02 of the row01, row3 should not contain values of the row2, etc. It should also iterate in the range of the list length, which is 0-11. Important is also the excel output since I need the rest of the columns, I just need to shuffle the order.
I hope my aim and description are clear enough, if not, happy to answer any questions. I would appreciate any hint or help, that helps me 'unstuck'. Thank you. Code below. (PS: I'm aware of the fact that there is probably much more neat solution to it than this)
import pandas as pd
import random
dataset = pd.read_excel("C:\\Users\\ibm\\Documents\\Psychopy\\DataInput_Training01.xlsx")
# original data set use for comparisons
imageDataset = dataset.loc[0:11, :]
# creating empty df for storing rows from imageDataset
emptyExcel = pd.DataFrame()
randomPick = imageDataset.sample() # select randomly one row from imageDataset
emptyExcel = emptyExcel.append(randomPick) # append a row to empty df
randomPickIndex = randomPick.index.tolist() # get index of the row
imageDataset2 = imageDataset.drop(index=randomPickIndex) # delete the row with index selected before
# getting raw values from the row 'position01'/02 are columns headers
randomPickTemp1 = randomPick['position01'].values[0]
randomPickTemp2 = randomPick
randomPickTemp2 = randomPickTemp2['position02'].values[0]
# getting a dataset which not including row values from position01 and position02
isit = imageDataset2[(imageDataset2.position01 != randomPickTemp1) & (imageDataset2.position02 != randomPickTemp1) & (imageDataset2.position01 != randomPickTemp2) & (imageDataset2.position02 != randomPickTemp2)]
# pick another row from dataset not including row selected at the beginning - randomPick
randomPick2 = isit.sample()
# save it in empty df
emptyExcel = emptyExcel.append(randomPick2, sort=False)
# get index of this second row to delete it in next step
randomPick2Index = randomPick2.index.tolist()
# delete the another row
imageDataset3 = imageDataset2.drop(index=randomPick2Index)
# AND REPEAT the procedure of comparison of the raw values with dataset already not including the original row:
randomPickTemp1 = randomPick2['position01'].values[0]
randomPickTemp2 = randomPick2
randomPickTemp2 = randomPickTemp2['position02'].values[0]
isit2 = imageDataset3[(imageDataset3.position01 != randomPickTemp1) & (imageDataset3.position02 != randomPickTemp1) & (imageDataset3.position01 != randomPickTemp2) & (imageDataset3.position02 != randomPickTemp2)]
# AND REPEAT with another pick - save - matching - picking again.. until end of the length of the dataset (which is 0-11)
So at the end I've used a solution provided by David Bridges (post from Sep 19 2019) on psychopy websites. In case anyone is interested, here is a link: https://discourse.psychopy.org/t/how-do-i-make-selective-no-consecutive-trials/9186
I've just adjusted the condition in for loop to my case like this:
remaining = [choices[x] for x in choices if last['position01'] != choices[x]['position01'] and last['position01'] != choices[x]['position02'] and last['position02'] != choices[x]['position01'] and last['position02'] != choices[x]['position02']]
Thank you very much for the helpful answer! and hopefully I did not spam it over here too much.
import itertools as it
import random
import pandas as pd
# list of pair of numbers
tmp1 = [x for x in it.permutations(list(range(6)),2)]
df = pd.DataFrame(tmp1, columns=["position01","position02"])
df1 = pd.DataFrame()
i = random.choice(df.index)
df1 = df1.append(df.loc[i],ignore_index = True)
df = df.drop(index = i)
while not df.empty:
    val = list(df1.iloc[-1])
    tmp = df[(df["position01"]!=val[0])&(df["position01"]!=val[1])&(df["position02"]!=val[0])&(df["position02"]!=val[1])]
    if tmp.empty: #looped for 10000 times, was never empty
        print("here")
        break
    i = random.choice(tmp.index)
    df1 = df1.append(df.loc[i],ignore_index = True)
    df = df.drop(index=i)
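The same pick-and-filter loop can also be written without pandas, which avoids DataFrame.append (removed in pandas 2.0) and makes it easy to start over when a pick leaves no valid candidate. This is a sketch under assumptions, not the method from the linked post: constrained_order and its max_restarts parameter are invented names, and the input is a plain list of (position01, position02) pairs.

```python
import itertools
import random


def constrained_order(pairs, rng=None, max_restarts=1000):
    """Return indices into `pairs` ordered so that neighbours share no value.

    Rows are picked greedily at random; on a dead end the whole attempt
    is thrown away and a new one is started.
    """
    rng = rng or random.Random()
    for _ in range(max_restarts):
        remaining = list(range(len(pairs)))
        order = []
        while remaining:
            if order:
                last = set(pairs[order[-1]])
                candidates = [i for i in remaining if not last & set(pairs[i])]
            else:
                candidates = remaining
            if not candidates:
                break  # dead end: abandon this attempt
            pick = rng.choice(candidates)
            order.append(pick)
            remaining.remove(pick)
        else:
            return order  # every row was placed
    raise RuntimeError("no valid ordering found")


pairs = list(itertools.permutations(range(6), 2))
order = constrained_order(pairs, rng=random.Random(0))
print([pairs[i] for i in order][:5])
```

With a data frame, the pairs could be built with list(zip(df["position01"], df["position02"])), and df.iloc[order] would then give the reordered rows with all their other columns intact, ready for to_excel.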

Add multiple rows into google spreadsheet using API

I need to add multiple (few hundreds) rows into google spreadsheet. Currently I'm doing it in a loop:
for row in rows:
    _api_client.InsertRow(row, _spreadsheet_key, _worksheet_id)
which is extremely slow, because rows are added one by one.
Is there any way to speed this up?
OK, I finally used a batch request. The idea is to send multiple changes in one API request.
Firstly, I created a list of dictionaries, which will be used like rows_map[R][C] to get value of cell at row R and column C.
rows_map = [
    {
        1: row['first_column'],
        2: row['second'],
        3: row['and_last'],
    }
    for row in rows
]
Then I get all the cells from the worksheet
query = gdata.spreadsheet.service.CellQuery()
query.return_empty = 'true'
cells = _api_client.GetCellsFeed(self._key, wksht_id=self._raw_events_worksheet_id, query=query)
And create batch request to modify multiple cells at a time.
batch_request = gdata.spreadsheet.SpreadsheetsCellsFeed()
Then I can modify (or in my case rewrite all the values) the spreadsheet.
for cell_entry in cells.entry:
    row = int(cell_entry.cell.row) - 2
    col = int(cell_entry.cell.col)
    if 0 <= row < len(rows_map):
        cell_entry.cell.inputValue = rows_map[row][col]
    else:
        cell_entry.cell.inputValue = ''
    batch_request.AddUpdate(cell_entry)
And send all the changes in only one request:
_api_client.ExecuteBatch(batch_request, cells.GetBatchLink().href)
NOTES:
Batch requests are possible only with Cell Queries. There is no such mechanism for List Queries.
query.return_empty = 'true' is mandatory. Otherwise API will return only cells which are not empty.
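With gspread, which other questions on this page use, the same "one request for many rows" idea could be sketched with append_rows, which takes a list of rows. The helper names rows_to_values and append_all and the RecordingWorksheet stub are assumptions made for illustration; the stub only records the call so the snippet runs without credentials.

```python
def rows_to_values(rows, columns):
    """Turn a list of dicts into a list of lists, one inner list per sheet row."""
    return [[row.get(name, "") for name in columns] for row in rows]


def append_all(worksheet, rows, columns):
    """Send every row in a single append_rows call instead of one call per row."""
    values = rows_to_values(rows, columns)
    worksheet.append_rows(values, value_input_option="USER_ENTERED")
    return len(values)


class RecordingWorksheet:
    """Stand-in for a gspread Worksheet that records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def append_rows(self, values, value_input_option="RAW"):
        self.calls.append((values, value_input_option))


rows = [
    {"first_column": 1, "second": "a", "and_last": True},
    {"first_column": 2, "second": "b"},  # missing key becomes an empty cell
]
ws = RecordingWorksheet()
append_all(ws, rows, ["first_column", "second", "and_last"])
print(ws.calls)
```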

Python/gspread - how can I update multiple cells with DIFFERENT VALUES at once?

To update a range of cells, you use the following command.
## Select a range
cell_list = worksheet.range('A1:A7')
for cell in cell_list:
    cell.value = 'O_o'
## Update in batch
worksheet.update_cells(cell_list)
For my application, I would like it to update an entire range, but I am trying to set a different value for each individual cell. The problem with this example is that every cell ends up with the same value. Updating each cell individually is inefficient and takes way too long. How can I do this efficiently?
You can use enumerate on a separate list containing the different values you want in the cells and use the index part of the tuple to match to the appropriate cells in cell_list.
cell_list = worksheet.range('A1:A7')
cell_values = [1,2,3,4,5,6,7]
for i, val in enumerate(cell_values): #gives us a tuple of an index and value
    cell_list[i].value = val #use the index on cell_list and the val from cell_values
worksheet.update_cells(cell_list)
Import modules
import gspread
from gspread.cell import Cell
from oauth2client.service_account import ServiceAccountCredentials
import string as string
import random
Create cell array with values
cells = []
cells.append(Cell(row=1, col=1, value='Row-1 -- Col-1'))
cells.append(Cell(row=1, col=2, value='Row-1 -- Col-2'))
cells.append(Cell(row=9, col=20, value='Row-9 -- Col-20'))
Find the sheet
# use creds to create a client to interact with the Google Drive API
scope = ['https://spreadsheets.google.com/feeds', 'https://www.googleapis.com/auth/drive']
creds = ServiceAccountCredentials.from_json_keyfile_name('Sheet-Update-Secret.json', scope)
client = gspread.authorize(creds)
# open the worksheet to update ('Name-Of-Your-Spreadsheet' is a placeholder)
sheet = client.open('Name-Of-Your-Spreadsheet').sheet1
Update the cells
sheet.update_cells(cells)
You could refer to my blog post for more details.
Assuming a table with a header row, as follows:
Name | Weight
------+-------
Apple | 56
Pear | 23
Leaf | 88
Then, the following should be self explanatory
cell_list = []
# get the headers from row #1
headers = worksheet.row_values(1)
# find the column "Weight"; index() is 0-based but gspread columns are 1-based
colToUpdate = headers.index('Weight') + 1
# task 1 of 2
cellLookup = worksheet.find('Leaf')
# get the cell to be updated
cellToUpdate = worksheet.cell(cellLookup.row, colToUpdate)
# update the cell's value
cellToUpdate.value = 77
# put it in the queue
cell_list.append(cellToUpdate)
# task 2 of 2
cellLookup = worksheet.find('Pear')
# get the cell to be updated
cellToUpdate = worksheet.cell(cellLookup.row, colToUpdate)
# update the cell's value
cellToUpdate.value = 28
# put it in the queue
cell_list.append(cellToUpdate)
# now, do it
worksheet.update_cells(cell_list)
You can use batch_update() or update().
https://github.com/burnash/gspread
worksheet.batch_update([
    {
        'range': 'A1:J1', # head
        'values': [['a', 'b', 'c']],
    },
    {
        'range': 'A2', # values
        'values': df_array
    }
])
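The payload for batch_update is just a list of dicts, so it can be built and inspected before anything is sent. Here is a minimal sketch, assuming a header row followed by data rows anchored at column A; build_batch and its top_left_row parameter are hypothetical names, not part of gspread.

```python
def build_batch(header, rows, top_left_row=1):
    """Build the list of {'range', 'values'} dicts in the shape used above."""
    return [
        # The header goes on the first row.
        {"range": "A{}".format(top_left_row), "values": [list(header)]},
        # The data starts on the row right below the header.
        {"range": "A{}".format(top_left_row + 1), "values": [list(r) for r in rows]},
    ]


payload = build_batch(["a", "b"], [(1, 2), (3, 4)])
print(payload)
```

The result would then be passed as worksheet.batch_update(payload). For a data frame, header could be df.columns and rows could be df.values.tolist().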
Here's my solution if you want to export a pandas data frame to a google sheet with gspread:
We can't access and replace elements in cell_list with values in the data frame intuitively, with [row, col] notation.
However, the elements in cell_list are stored in row-wise order, so the position of an element depends on how many columns your data frame has. Element (0,0) maps to 0, and element (3,2) in a 5x5 data frame maps to 17.
We can construct a function that maps a [row, col] value from a data frame to its position in the list:
def getListIndex(nrow, ncol, row_pos, col_pos):
    list_pos = row_pos*ncol + col_pos
    return list_pos
We can use this function to update the correct element in the list, cell_list, with the respective value in the dataframe, df.
count_row = df.shape[0]
count_col = df.shape[1]
# note this outputs data from the 1st row
cell_list = worksheet.range(1,1,count_row,count_col)
for row in range(0,count_row):
    for col in range(0,count_col):
        list_index = getListIndex(count_row, count_col, row, col)
        cell_list[list_index].value = df.iloc[row,col]
We can output the results of the list, cell_list, to our worksheet.
worksheet.update_cells(cell_list)
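Because the cell list is row-major, the index mapping row_pos*ncol + col_pos is equivalent to flattening the table row by row and pairing it with the cells in order. The sketch below shows that variant. fill_cells_row_major is a made-up helper, and the cells are simple stand-in objects with a value attribute rather than real gspread cells.

```python
from types import SimpleNamespace


def fill_cells_row_major(cell_list, table):
    """Copy a 2-D list into a flat, row-major cell list and return that list."""
    flat = [value for row in table for value in row]
    if len(flat) != len(cell_list):
        raise ValueError(
            "cell_list has {} cells but table has {} values".format(
                len(cell_list), len(flat)))
    for cell, value in zip(cell_list, flat):
        cell.value = value
    return cell_list


table = [[1, 2, 3], [4, 5, 6]]
cells = [SimpleNamespace(value=None) for _ in range(6)]  # stand-in for worksheet.range(...)
fill_cells_row_major(cells, table)
print([c.value for c in cells])
```

With a data frame, table could be df.values.tolist(), which also converts NumPy scalars to plain Python values before they are sent.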
