I've just started using gspread and am looking for some advice on searching a sheet. I want to search for multiple strings and get the specific row where both strings exist; both strings must match for a result (logical AND).
An example search would be for an IP address AND a hostname. In the sheet, the IP address would be in cell A1 and the hostname in B1.
I'm using the code example below from their documentation and have tried various iterations, but I'm not having much luck.
amount_re = re.compile(r'(192.168.0.1|Gi0/0.100)')
cell = worksheet.find(amount_re)
Gspread documentation
Here is the format of the data:
192.168.0.1,Gi0/0.100
192.168.0.1,Gi0/0.200
192.168.0.1,Gi0/0.300
192.168.0.2,Gi0/0.100
As you can see, there are duplicates in columns A and B, so the only way to get a unique result is to search for both, e.g.
192.168.0.1,Gi0/0.100
It needs to be in the gspread search format, though; I can't just search for the string '192.168.0.1,Gi0/0.100'.
I believe your goal is as follows.
You want to search for 2 values, like 192.168.0.1 and Gi0/0.100, in a sheet of a Google Spreadsheet.
The 2 values are in columns "A" and "B".
When the 2 values, like 192.168.0.1 and Gi0/0.100, are found in the same row, you want to retrieve that row's values.
You want to achieve this using gspread with python.
You have already been able to get and put values in Google Spreadsheet using the Sheets API.
To achieve your goal, how about this answer?
I think that, unfortunately, re.compile(r'(192.168.0.1|Gi0/0.100)') cannot be used to achieve your goal, because find() matches one cell at a time and cannot require both columns of the same row to match. So here, I would like to propose the following 2 patterns.
Pattern 1:
In this pattern, the values are searched using the Query Language. The access token obtained from the gspread authorization can be reused for the request.
Sample script:
import csv
import io
import urllib.parse

import gspread
import requests

searchValues = ["192.168.0.1", "Gi0/0.100"]  # Please set the search values.
spreadsheet_id = "###"  # Please set the Spreadsheet ID.
sheetName = "Sheet1"  # Please set the sheet name.

client = gspread.authorize(credentials)
ss = client.open_by_key(spreadsheet_id)
ws = ss.worksheet(sheetName)
sheet_id = ws._properties['sheetId']
access_token = client.auth.token_response['access_token']
query = "select * where A='" + searchValues[0] + "' and B='" + searchValues[1] + "'"
url = ('https://docs.google.com/spreadsheets/d/' + spreadsheet_id +
       '/gviz/tq?tqx=out:csv&gid=' + str(sheet_id) +
       '&tq=' + urllib.parse.quote(query))
res = requests.get(url, headers={'Authorization': 'Bearer ' + access_token})
ar = [row for row in csv.reader(io.StringIO(res.text), delimiter=',')]
print(ar)
In this case, when the search values are found, ar contains the matching rows; when they are NOT found, the length of ar is 0.
Note that with this pattern, the row index cannot be retrieved.
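As a pure-Python sketch of how the returned CSV can be unpacked (the response body below is made up to mimic a gviz out:csv reply for the sample data; no Sheets access is needed to try it):

```python
import csv
import io

# Hypothetical response body, shaped like a gviz "out:csv" reply
# for the sample data in the question.
res_text = '"192.168.0.1","Gi0/0.100"\n'

# Same parsing step as in the script above.
ar = [row for row in csv.reader(io.StringIO(res_text), delimiter=',')]
if len(ar) == 0:
    print("no match")
else:
    ip, interface = ar[0]
    print(ip, interface)  # 192.168.0.1 Gi0/0.100
```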
Pattern 2:
In this pattern, at first, all values are retrieved from the worksheet, and the values are searched.
Sample script:
import gspread

searchValues = ["192.168.0.1", "Gi0/0.100"]  # Please set the search values.
spreadsheet_id = "###"  # Please set the Spreadsheet ID.
sheetName = "Sheet1"  # Please set the sheet name.

client = gspread.authorize(credentials)
ss = client.open_by_key(spreadsheet_id)
ws = ss.worksheet(sheetName)
values = ws.get_all_values()
ar = [{"rowIndex": i, "value": e} for i, e in enumerate(values)
      if e[0] == searchValues[0] and e[1] == searchValues[1]]
print(ar)
In this case, when the search values are found, ar contains the row index and values of the matching rows; when they are NOT found, the length of ar is 0.
Note that with this pattern, the row index can be retrieved.
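Pattern 2's filtering step can be checked offline against the sample data from the question (values below is a hard-coded stand-in for ws.get_all_values()):

```python
searchValues = ["192.168.0.1", "Gi0/0.100"]

# Hard-coded stand-in for ws.get_all_values(), using the question's data.
values = [
    ["192.168.0.1", "Gi0/0.100"],
    ["192.168.0.1", "Gi0/0.200"],
    ["192.168.0.1", "Gi0/0.300"],
    ["192.168.0.2", "Gi0/0.100"],
]

# Same filtering step as in the script above.
ar = [{"rowIndex": i, "value": e} for i, e in enumerate(values)
      if e[0] == searchValues[0] and e[1] == searchValues[1]]
print(ar)  # [{'rowIndex': 0, 'value': ['192.168.0.1', 'Gi0/0.100']}]

# rowIndex is 0-based; add 1 to get the 1-based sheet row number.
sheet_row = ar[0]["rowIndex"] + 1
```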
References:
Query Language
get_all_values()
Related
I have data in a google sheet with the following structure:
I'd like to use pygsheets in order to delete the rows that match date == '2022-01-02', or any given date that I want to delete.
Is there an easy way to do so using pygsheets?
I believe your goal is as follows.
You want to search a value from the column "A" of a sheet. And, you want to delete the searched rows.
For example, when a value of 2022-01-02 is found at column "A" of row 3 in a sheet, you want to delete the row.
You want to achieve this using pygsheets for python.
In this case, how about the following sample script?
Sample script:
import pygsheets

client = ###  # Please use your client.
spreadsheet_id = "###"  # Please set your Spreadsheet ID.
sheet_name = "Sheet1"  # Please set your sheet name.
search = "2022-01-02"  # Please set the search value.
searchCol = 1  # Please set the search column. 1 is column "A".

sh = client.open_by_key(spreadsheet_id)
wks = sh.worksheet_by_title(sheet_name)
values = wks.get_all_values(value_render="FORMATTED_VALUE")
deleteRows = [i for i, r in enumerate(values) if r[searchCol - 1] == search]
if deleteRows == []:
    exit()
reqs = [
    {
        "deleteDimension": {
            "range": {
                "sheetId": wks.id,
                "startIndex": e,
                "endIndex": e + 1,
                "dimension": "ROWS",
            }
        }
    }
    for e in deleteRows
]
reqs.reverse()  # Delete from the bottom up so earlier deletions don't shift the remaining row indexes.
client.sheet.batch_update(spreadsheet_id, reqs)
When this script is run, the value of search is searched from the column "A" of "Sheet1", and the searched rows are deleted.
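The reqs.reverse() call matters: a quick list simulation (plain Python, standing in for the sheet's rows) shows why the deletions must run bottom-up:

```python
# Hypothetical rows, standing in for the values of column "A".
rows = ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-02"]
delete_rows = [i for i, v in enumerate(rows) if v == "2022-01-02"]  # [1, 3]

# Deleting bottom-up: earlier deletions never shift the indexes still to come.
for i in reversed(delete_rows):
    del rows[i]
print(rows)  # ['2022-01-01', '2022-01-03']
```

Deleting top-down instead would shift every later index by one after each deletion and remove the wrong rows.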
Reference:
Sheet API Wrapper
Looking for tips on how to get the data of the last row of a sheet. I've seen a solution that fetches all the data and then takes the length of that.
But that is of course a waste of all that fetching. I'm wondering if there is a smart way to do it, since you can already append data to the last row+1 with worksheet.append_rows([some_data])
I used the solution @buran mentioned. If you init the worksheet with
add_worksheet(title="title", rows=1, cols=10)
and only append new data via
worksheet.append_rows([some_array])
then @buran's suggestion is brilliant: simply use
worksheet.row_count
I found this code in another question; it appends a dummy row to the sheet.
After that, you can search for its location:
def get_last_row_with_data(service, value_input_option="USER_ENTERED"):
    last_row_with_data = '1'
    try:
        # creates a dummy row
        dummy_request_append = service.spreadsheets().values().append(
            spreadsheetId='<spreadsheet id>',
            range="{0}!A:{1}".format('Tab Name', 'ZZZ'),
            valueInputOption='USER_ENTERED',
            includeValuesInResponse=True,
            responseValueRenderOption='UNFORMATTED_VALUE',
            body={"values": [['']]}
        ).execute()
        # Search the dummy row
        a1_range = dummy_request_append.get('updates', {}).get('updatedRange', 'dummy_tab!a1')
        bottom_right_range = a1_range.split('!')[1]
        number_chars = [i for i in list(bottom_right_range) if i.isdigit()]
        last_row_with_data = ''.join(number_chars)
    except Exception as e:
        last_row_with_data = '1'
    return last_row_with_data
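The range-parsing step at the end of that function can be exercised on its own with a made-up updatedRange string (the value below is hypothetical, shaped like what values().append() returns for a single appended cell):

```python
# Hypothetical 'updatedRange' value from a values().append() response.
a1_range = "Sheet1!A101"

# Same extraction as in the function above: keep only the digits
# of the range's cell reference to recover the row number.
bottom_right_range = a1_range.split('!')[1]
number_chars = [c for c in bottom_right_range if c.isdigit()]
last_row_with_data = ''.join(number_chars)
print(last_row_with_data)  # 101
```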
You can see a sample of Append in this documentation.
However, for me it is just easier to use:
# The ID of the sheet you are working with.
Google_sheets_ID = 'ID_of_your_Google_Sheet'

# Define the start row that has data;
# it will later be replaced with the last row.
# In my test sheet, the data starts in row 2.
last_row = 2

# Code to get the last row.
# The range should be the column where the information is located;
# remember to change "sheet1" to the name of your worksheet.
response = service.spreadsheets().values().get(
    spreadsheetId=Google_sheets_ID,
    range='sheet1!A1:A'
).execute()

# Add the initial value where the range started to the last row with values.
last_row += len(response['values']) - 1

# If you print last_row, you should see the last row with values in the sheet.
print(last_row)
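As a sanity check on the arithmetic, here is the same calculation against a made-up response payload (the dict below is hypothetical, shaped like a Sheets API values().get() result for a range starting at row 2):

```python
# Hypothetical response payload, shaped like the Sheets API
# values().get() result for a range whose first row is row 2.
response = {"values": [["alpha"], ["beta"], ["gamma"]]}

start_row = 2  # first row of the requested range
# Last row = first row of the range + number of returned rows - 1.
last_row = start_row + len(response["values"]) - 1
print(last_row)  # row 4 holds the last value
```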
Here is the code I am working with.
dfs=dfs[['Reserved']] #the column that I need to insert
dfs=dfs.applymap(str) #json did not accept the nan so needed to convert
sh=gc.open_by_key('KEY') #would open the google sheet
sh_dfs=sh.get_worksheet(0) #getting the worksheet
sh_dfs.insert_rows(dfs.values.tolist()) #inserts the dfs into the new worksheet
Running this code inserts the rows at the first column of the worksheet, but what I am trying to accomplish is adding/inserting the column at the very end, column P.
In your situation, how about the following modification? In this modification, the maximum column is retrieved first; then the column number is converted to a column letter, and the values are put in the column after the last one.
From:
sh_dfs.insert_rows(dfs.values.tolist())
To:
# Ref: https://stackoverflow.com/a/23862195
def colnum_string(n):
    string = ""
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        string = chr(65 + remainder) + string
    return string

values = sh_dfs.get_all_values()
col = colnum_string(max([len(r) for r in values]) + 1)
sh_dfs.update(col + '1', dfs.values.tolist(), value_input_option='USER_ENTERED')
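To illustrate the helper's column mapping on its own (pure Python, no Sheets access; the function is repeated here so the snippet runs standalone):

```python
# Ref: https://stackoverflow.com/a/23862195
def colnum_string(n):
    """Convert a 1-based column number to a spreadsheet column letter."""
    string = ""
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        string = chr(65 + remainder) + string
    return string

print(colnum_string(1))   # A
print(colnum_string(16))  # P
print(colnum_string(26))  # Z
print(colnum_string(27))  # AA
```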
Note:
If an error like "exceeds grid limits" occurs, please insert a blank column first.
Reference:
update
After running
results = smart.Search.search("2244113312180")
print(results)
I get the following data:
{"results":
[{"contextData": ["2244113312180"],
"objectId": 778251154810756,
"objectType": "row",
"parentObjectId": 3648397300262788,
"parentObjectName": "Sample Sheet",
"parentObjectType": "sheet",
"text": "2244113312180"},
{"contextData": ["2244113312180"],
"objectId": 7803446734415748,
"objectType": "row",
"parentObjectId": 3648397300262788,
"parentObjectName": "Sample Sheet",
"parentObjectType": "sheet",
"text": "2244113312180"}],
"totalCount": 2}
How do I use these results correctly in my program?
Please provide a correct usage example.
And how do I find out the column ID in which the value "2244113312180" was found?
new_row = smartsheet.models.Row()
new_row.id = results.objectId
Sorry, I didn't include the error right away. I can't use the properties from the results. The line:
new_row.id = results.objectId
Causes an error
AttributeError: 'SearchResult' object has no attribute 'objectId'
Thank you for any help!
P.S. I found how to do it.
results = smart.Search.search("2244113312180")
text = str(results)
json_op = json.loads(text)
for i in json_op["results"]:
    new_row = smartsheet.models.Row()
    new_row.id = i["objectId"]
I don't know if this is a good solution or not.
According to the SearchResultItem Object definition in the Smartsheet API docs, a search result item will never contain information about the column where a value exists. As the result JSON you've posted shows, if the specified value is found within the row of a sheet (i.e., in any of the cells that row contains), the corresponding search result item will identify the sheet ID (parentObjectId) and the row ID (objectId).
You can then use those two values to retrieve the row, as described in the Get Row section of the docs:
row = smartsheet_client.Sheets.get_row(
4583173393803140, # sheet_id
2361756178769796 # row_id
)
Then you can iterate through the row.cells array, checking the value property of each cell to determine if it matches the value you searched for previously. When you find a cell object that contains that value, the column_id property of that cell object will give you the column ID where the matching value exists.
UPDATE:
Thanks for clarifying info in your original post. I'm updating this answer to provide a complete code sample that implements the approach I described previously. Hope this is helpful!
This code sample does the following:
searches everything in Smartsheet (that the holder of the API token being used has access to) for a string value
iterates through search result items to process any "row" results (i.e., anywhere that the string appears within the cells of a sheet)
replaces any occurrences within (the cells of) a sheet with the string "new value"
# set search criteria
query = '2244113312180'
# search everything
search_results = smart.Search.search(query)

# loop through results
# (acting upon only search results that appear within a row of a sheet)
for item in search_results.results:
    if item.object_type == 'row':
        # get row
        row = smart.Sheets.get_row(
            item.parent_object_id,  # sheet_id
            item.object_id          # row_id
        )
        # find the cell that contains the value and update that cell value
        for cell in row.cells:
            if cell.value == query:
                # build new cell value
                new_cell = smartsheet.models.Cell()
                new_cell.column_id = cell.column_id
                new_cell.value = "new value"
                new_cell.strict = False
                # build the row to update
                new_row = smartsheet.models.Row()
                new_row.id = item.object_id
                new_row.cells.append(new_cell)
                # update row
                result = smart.Sheets.update_rows(
                    item.parent_object_id,  # sheet_id
                    [new_row])
I need to add multiple (a few hundred) rows to a Google spreadsheet. Currently I'm doing it in a loop:
for row in rows:
    _api_client.InsertRow(row, _spreadsheet_key, _worksheet_id)
which is extremely slow, because rows are added one by one.
Is there any way to speed this up?
Ok, I finally used a batch request. The idea is to send multiple changes in one API request.
First, I created a list of dictionaries, which is used like rows_map[R][C] to get the value of the cell at row R and column C.
rows_map = [
    {
        1: row['first_column'],
        2: row['second'],
        3: row['and_last'],
    }
    for row in rows
]
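For a quick check of the lookup shape, here is the same comprehension over hypothetical input rows (the dict keys and values below are made up):

```python
# Hypothetical input rows, standing in for the real data.
rows = [
    {'first_column': 'a1', 'second': 'b1', 'and_last': 'c1'},
    {'first_column': 'a2', 'second': 'b2', 'and_last': 'c2'},
]

# Build the row/column lookup: rows_map[R][C] gives the value
# at 0-based row R and 1-based column C.
rows_map = [
    {1: row['first_column'], 2: row['second'], 3: row['and_last']}
    for row in rows
]
print(rows_map[1][3])  # c2
```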
Then I got all the cells from the worksheet:
query = gdata.spreadsheet.service.CellQuery()
query.return_empty = 'true'
cells = _api_client.GetCellsFeed(self._key, wksht_id=self._raw_events_worksheet_id, query=query)
And created a batch request to modify multiple cells at a time:
batch_request = gdata.spreadsheet.SpreadsheetsCellsFeed()
Then I can modify (or, in my case, rewrite) all the values in the spreadsheet:
for cell_entry in cells.entry:
    row = int(cell_entry.cell.row) - 2
    col = int(cell_entry.cell.col)
    if 0 <= row < len(rows_map):
        cell_entry.cell.inputValue = rows_map[row][col]
    else:
        cell_entry.cell.inputValue = ''
    batch_request.AddUpdate(cell_entry)
And send all the changes in only one request:
_api_client.ExecuteBatch(batch_request, cells.GetBatchLink().href)
NOTES:
Batch requests are possible only with Cell Queries; there is no such mechanism for List Queries.
query.return_empty = 'true' is mandatory. Otherwise the API returns only the cells that are not empty.