Skip header using Google Sheets API with Python and Pandas

I'm writing data to Google Sheets using their API. However, each time I append to the document, a header row is written as well. How can I write my data without getting a header each time?
This is my code:
df = pd.DataFrame(["a","b","c"])
df.columns = [''] * len(df.columns)
print(df)

def Export_Data_To_Sheets():
    response_date = service.spreadsheets().values().append(
        spreadsheetId=SAMPLE_SPREADSHEET_ID_input,
        valueInputOption='RAW',
        #insertDataOption='INSERT_ROWS',
        range=SAMPLE_RANGE_NAME,
        body=dict(
            majorDimension='ROWS',
            values=df.T.reset_index().T.values.tolist())
    ).execute()
    print('Sheet successfully Updated')

Export_Data_To_Sheets()
I thought this was going to work, but the header still seems to be added by the export function.
Any ideas?
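For what it's worth, a minimal sketch of the append call that sends only the data rows (no header), reusing the service, SAMPLE_SPREADSHEET_ID_input and SAMPLE_RANGE_NAME from the code above: df.values.tolist() emits only the cell values, without the column names or the index.
# Minimal sketch, not the asker's final code: df.values.tolist() contains only the
# cell values, so no header row is appended.
import pandas as pd

df = pd.DataFrame(["a", "b", "c"])

def export_rows_only():
    service.spreadsheets().values().append(
        spreadsheetId=SAMPLE_SPREADSHEET_ID_input,
        range=SAMPLE_RANGE_NAME,
        valueInputOption='RAW',
        body={'majorDimension': 'ROWS', 'values': df.values.tolist()}
    ).execute()

export_rows_only()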

Related

Python API call to BigQuery using cloud functions

I'm trying to build my first cloud function. It's a function that should get data from an API, transform it to a DataFrame and push it to BigQuery. I've set the cloud function up with an HTTP trigger using validate_http as the entry point. The problem is that it states the function is working, but it doesn't actually write anything. It's a similar problem to the one discussed here: Passing data from http api to bigquery using google cloud function python
import pandas as pd
import json
import requests
from pandas.io import gbq
import pandas_gbq
import gcsfs
#function 1: Responding and validating any HTTP request
def validate_http(request):
    request_json = request.get_json()
    if request.args:
        get_api_data()
        return f'Data pull complete'
    elif request_json:
        get_api_data()
        return f'Data pull complete'
    else:
        get_api_data()
        return f'Data pull complete'
#function 2: Get data and transform
def get_api_data():
    import pandas as pd
    import requests
    import json
    #Setting up variables with tokens
    base_url = "https://"
    token = "&token="
    token2 = "&token="
    fields = "&fields=date,id,shippingAddress,items"
    date_filter = "&filter=date in '2022-01-22'"
    data_limit = "&limit=99999999"
    #Performing API call on request with variables
    def main_requests(base_url, token, fields, date_filter, data_limit):
        req = requests.get(base_url + token + fields + date_filter + data_limit)
        return req.json()
    #Making API call and storing in data
    data = main_requests(base_url, token, fields, date_filter, data_limit)
    #Transforming the data
    df = pd.json_normalize(data['orders']).explode('items').reset_index(drop=True)
    items = df['items'].agg(pd.Series)[['id', 'itemNumber', 'colorNumber', 'amount', 'size', 'quantity', 'quantityReturned']]
    df = df.drop(columns=['items', 'shippingAddress.id', 'shippingAddress.housenumber', 'shippingAddress.housenumberExtension', 'shippingAddress.address2', 'shippingAddress.name', 'shippingAddress.companyName', 'shippingAddress.street', 'shippingAddress.postalcode', 'shippingAddress.city', 'shippingAddress.county', 'shippingAddress.countryId', 'shippingAddress.email', 'shippingAddress.phone'])
    df = df.rename(columns={
        'date': 'Date',
        'shippingAddress.countryIso': 'Country',
        'id': 'order_id'})
    df = pd.concat([df, items], axis=1, join='inner')
    #Push data function
    bq_load('Return_data_api', df)
#function 3: Convert to bigquery table
def bq_load(key, value):
    project_name = '375215'
    dataset_name = 'Returns'
    table_name = key
    value.to_gbq(destination_table='{}.{}'.format(dataset_name, table_name), project_id=project_name, if_exists='replace')
The problem is that the script doesn't write to BigQuery and doesn't return any error. I know that the get_api_data() function works, since I tested it locally and it does seem to be able to write to BigQuery. Using Cloud Functions I can't seem to trigger this function and make it write data to BigQuery.
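Not the asker's code, but a minimal sketch of an HTTP entry point that surfaces any failure in the pipeline in the response instead of swallowing it; it assumes the same get_api_data() as above and the Flask-style request object that HTTP-triggered Cloud Functions receive.
# Minimal sketch (assumption: Flask-style request from an HTTP-triggered function).
# Wrapping get_api_data() in try/except makes an otherwise silent failure visible
# in the HTTP response instead of always returning "Data pull complete".
def validate_http(request):
    try:
        get_api_data()
        return 'Data pull complete', 200
    except Exception as e:
        return f'Data pull failed: {e}', 500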
There are a couple of things wrong with the code; fixing them should set you right.
You have list data, so store it as a CSV file (in preference to JSON).
This would mean updating (and probably renaming) the JsonArrayStore class and its methods to work with CSV; a sketch of such a CSV-backed store follows the snippet below.
Once you have completed the above and written well-formed CSV, you can proceed to this:
Reading the CSV in the del_btn method would then look like this:
import csv
import tkinter as tk

class ToDoGUI(tk.Tk):
    ...
    # methods
    ...
    def del_btn(self):
        a = JsonArrayStore('test1.csv')
        # read to list
        with open('test1.csv') as csvfile:
            reader = csv.reader(csvfile)
            data = list(reader)
        print(data)
Good work. You have a lot to do; if you get stuck further, please post again.
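As an illustration only, a CSV-backed replacement for JsonArrayStore might look like the sketch below; the class name and methods are hypothetical, not taken from the question.
import csv

# Hypothetical CSV-backed store replacing JsonArrayStore; the method names and the
# row layout (a list of lists, one inner list per CSV row) are assumptions.
class CsvArrayStore:
    def __init__(self, filename):
        self.filename = filename

    def save(self, rows):
        # overwrite the file with the given rows
        with open(self.filename, 'w', newline='') as f:
            csv.writer(f).writerows(rows)

    def load(self):
        # read every row back as a list of strings
        with open(self.filename, newline='') as f:
            return list(csv.reader(f))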

Google Drive API - Linking a spreadsheet comment or its replies to the corresponding row in the spreadsheet

I retrieved the comments of a particular cell in my Google spreadsheet using their API with OAUTH_SCOPE = "https://www.googleapis.com/auth/drive" and version 3.
I get output of this form:
{'kind': 'drive#comment', 'id': 'AAAAnggKMaA', 'createdTime': '2023-01-18T08:56:39.693Z', 'modifiedTime': '2023-01-18T09:03:32.426Z', 'author': {'kind': 'drive#user', 'displayName': 'Andrew Flint', 'photoLink': '//lh3.googleusercontent.com/a/AFBCDEDF3BjIhc6Hgtsb5kDdzVt54vIjG3q0W8d1CYi=s50-c-k-no', 'me': True}, 'htmlContent': 'No version specified in current.json', 'content': 'No version specified in current.json', 'deleted': False, 'resolved': False, 'anchor': '{"type":"workbook-range","uid":0,"range":"1713668520"}', 'replies': [{'kind': 'drive#reply', 'id': 'AAAAnggKMaE', 'createdTime': '2023-01-18T09:03:32.426Z', 'modifiedTime': '2023-01-18T09:03:32.426Z', 'author': {'kind': 'drive#user', 'displayName': 'Andrew Flint', 'photoLink': '//lh3.googleusercontent.com/a/ADDDGyFTp7mR3BjIhc6Hgtsb5kDdzVt54vIjG3q0W8d1CYi=s50-c-k-no', 'me': True}, 'htmlContent': 'Unable to find a package version URLfor Mono-Extended. Found\xa0 somewhat matching package details here :\xa0https://aur.archlinux.org/packages/nerd-fonts-noto-sans-mono-extended but not sure if this is the intended package', 'content': 'Unable to find a package version URLfor Mono-Extended. Found\xa0 somewhat matching package details here :\xa0https://aur.archlinux.org/packages/nerd-fonts-noto-sans-mono-extended but not sure if this is the intended package', 'deleted': False}]}
I now want to associate this comment, through a Python script, with the particular row it was extracted from; i.e. I want to know the row index of the cell the comment was extracted from, or the indices of the anchor cell.
At the moment, there does not seem to be an obvious way to do that, but I suspect the comment ID might help. Google does not seem to document a way to do this.
Any inputs on this will be deeply appreciated! Thanks!
I believe your goal is as follows.
You want to retrieve the row index of the row with the comment.
You want to achieve this using python.
From your previous question, you are using googleapis for python.
Issue and workaround:
When the anchor cell information is retrieved from the comment ID, in your sample it is 'anchor': '{"type":"workbook-range","uid":0,"range":"1713668520"}'. Unfortunately, at the current stage, the anchor cell cannot be determined from it. Because of this, I think your goal cannot be directly achieved with the Sheets API and the Drive API. If the cell coordinate could be retrieved from "range":"1713668520", your goal could be achieved.
From the above situation, I would like to propose a workaround. My workaround is as follows.
Download the Google Spreadsheet as XLSX data using the Drive API.
Parse the XLSX data using openpyxl.
Using openpyxl, retrieve the comments from the XLSX data converted from the Google Spreadsheet.
When this flow is reflected in a python script, how about the following sample script?
Sample script 1:
In this case, please use your own authorization script; the access token is retrieved from it. And please set your Spreadsheet ID.
service = build("drive", "v3", credentials=creds)
access_token = creds.token  # or access_token = service._http.credentials.token
spreadsheetId = "###"  # Please set the Spreadsheet ID.
sheetName = "Sheet1"  # Please set your sheet name.
url = "https://www.googleapis.com/drive/v3/files/" + spreadsheetId + "/export?mimeType=application%2Fvnd.openxmlformats-officedocument.spreadsheetml.sheet"
res = requests.get(url, headers={"Authorization": "Bearer " + access_token})
workbook = openpyxl.load_workbook(filename=BytesIO(res.content), data_only=False)
worksheet = workbook[sheetName]
res = []
for i, row in enumerate(worksheet.iter_rows()):
    for j, cell in enumerate(row):
        if cell.comment:
            res.append({"rowIndex": i, "columnIndex": j, "comment": cell.comment.text})
print(res)
In this script, please add the following libraries.
import openpyxl
import requests
from io import BytesIO
When this script is run, the Google Spreadsheet is exported in XLSX format, the XLSX data is parsed, and the comments are retrieved. The row and column indexes and the comment text are returned as an array as follows. Unfortunately, the comment ID of the Drive API cannot be retrieved from the XLSX data, so I included the comment text.
[
{'rowIndex': 0, 'columnIndex': 0, 'comment': 'sample comment'},
,
,
,
]
Sample script 2:
In this sample script, the Google Spreadsheet is exported in XLSX format using googleapis for python.
from io import BytesIO
import openpyxl
from googleapiclient.http import MediaIoBaseDownload

service = build("drive", "v3", credentials=creds)  # Please use your client.
spreadsheetId = "###"  # Please set the Spreadsheet ID.
sheetName = "Sheet1"  # Please set your sheet name.
request = service.files().export_media(fileId=spreadsheetId, mimeType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")
fh = BytesIO()
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
    status, done = downloader.next_chunk()
    print("Download %d%%" % int(status.progress() * 100))
fh.seek(0)
workbook = openpyxl.load_workbook(filename=fh, data_only=False)
worksheet = workbook[sheetName]
res = []
for i, row in enumerate(worksheet.iter_rows()):
    for j, cell in enumerate(row):
        if cell.comment:
            res.append({"rowIndex": i, "columnIndex": j, "comment": cell.comment.text})
print(res)
In this case, googleapis for python is used, so requests is not used.
When this script is run, the same value as with the above script is obtained.
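As a possible follow-up (an assumption on my part, not part of the original answer), the row index found this way could be linked back to a Drive API comment by matching on the comment text, for example:
# Hypothetical helper: drive_comment is one comment dict from the Drive API
# (like the one shown in the question), and res is the list built above.
def row_index_for_comment(drive_comment, res):
    for entry in res:
        # openpyxl's comment text is assumed to contain the comment body;
        # the exact formatting of exported comments may differ.
        if drive_comment["content"] in entry["comment"]:
            return entry["rowIndex"]
    return None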
Reference:
Files: export

How to get google sheets name

Given a URL of a Google Sheet like https://docs.google.com/spreadsheets/d/1dprQgvpy-qHNU5eHDoOUf9qXi6EqwBbsYPKHB_3c/edit#gid=1139845333,
how could I use the gspread API to get the name of the sheet?
I mean the name may be Sheet1, Sheet2, etc.
Thanks!
I believe your goal is as follows.
You want to retrieve the sheet names from a Google Spreadsheet from the URL of https://docs.google.com/spreadsheets/d/###/edit#gid=1139845333.
From How could I use gspread api to get the name of the sheet?, you want to achieve this using gspread for python.
In this case, how about the following sample script?
Sample script:
client = gspread.authorize(credentials)
url = "https://docs.google.com/spreadsheets/d/1dprQgvpy-qHNU5eHDoOUf9qXi6EqwBbsYPKHB_3c/edit#gid=1139845333"
spreadsheet = client.open_by_url(url)
sheet_names = [s.title for s in spreadsheet.worksheets()]
print(sheet_names)
In this script, please use your client = gspread.authorize(credentials).
When this script is run, the sheet names are returned as a list.
References:
open_by_url(url)
worksheets()
Added:
About your following new question,
May I know what if I only want the sheet name of a particular one? Usually, for each additional sheet we create, it comes with a series of number at the end (gid=1139845333), I just want the name for that sheet instead of all.
In this case, how about the following sample script?
Sample script:
client = gspread.authorize(credentials)
url = "https://docs.google.com/spreadsheets/d/1dprQgvpy-qHNU5eHDoOUf9qXi6EqwBbsYPKHB_3c/edit#gid=1139845333"
spreadsheet = client.open_by_url(url)
gid = "1139845333"
sheet_name = [s.title for s in spreadsheet.worksheets() if str(s.id) == gid]
if len(sheet_name) == 1:
    print(sheet_name)
else:
    print("No sheet of the GID " + gid)

Query Couchbase bucket from R and return a data frame

I want to query a Couchbase bucket from R and store the results in a data frame.
I went through this blog post and tried to replicate the steps in my own cluster using a custom query, but got this error message in the Couchbase logs:
Invalid post received: {mochiweb_request,
[#Port<0.5548256>,'POST',"/query/service/",
{1,1},
{6,
{"host",
{'Host',
"[removed]:8091"},
{"accept-encoding",
{'Accept-Encoding',"gzip, deflate"},
{"accept",
{'Accept',
"application/json, text/xml, application/xml, */*"},
nil,nil},
{"content-type",
{'Content-Type',
"application/x-www-form-urlencoded;charset=UTF-8"},
{"content-length",
{'Content-Length',"59"},
nil,nil},
nil}},
{"user-agent",
{'User-Agent',
"libcurl/7.54.0 r-curl/2.6 httr/1.2.1"},
nil,nil}}}]}
Then I tried to use the reticulate package in R to query couchbasedb using the python SDK.
Python Code:
from couchbase.n1ql import N1QLQuery
from couchbase.bucket import Bucket
import pandas as pd

host = '[host_name]:8091'
bucket = 'my-bucket'
cb = Bucket('couchbase://' + host + '/' + bucket)
query = N1QLQuery('Select * from `my-bucket`')
df = pd.DataFrame()
for row in cb.n1ql_query(query):
    df = df.append(row, ignore_index=True)
The code above works perfectly fine and fills the pandas data frame df with the expected values.
Below is my unsuccessful attempt to translate the above Python code to R using the reticulate package.
R Code:
library(reticulate)
reticulate::use_condaenv("my-env", "/usr/local/anaconda3/bin/conda")
Bucket <- reticulate::import("couchbase.bucket")$Bucket
N1QLQuery <- reticulate::import("couchbase.n1ql")$N1QLQuery
pd <- reticulate::import("pandas", "pd")
host <- '[host_name]:8091'
bucket <- 'my-bucket'
cb <- Bucket(paste0('couchbase://', host, '/', bucket))
query = N1QLQuery('Select * from `my-bucket`')
Up to this point everything works fine.
Now, how can I translate the Python for loop to R so that it appends the query results to the data frame?
for row in cb.n1ql_query(query):
    df = df.append(row, ignore_index=True)
I tried to use reticulate::iterate(), but it throws an error, most likely because I'm not using this function correctly.
> reticulate::iterate(cb$n1ql_query(query), print)
Error in reticulate::iterate(cb$n1ql_query(query), print) :
iterate function called with non-iterator argument
The last resort would be to use the rPython package to call the Python script directly, but even that doesn't look like a straightforward task.
Any working solution would do; I don't mind how we get the R data frame.
Help is much appreciated :)
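No answer is attached here, but one possible approach (a sketch of mine, not from the thread, reusing the question's host, bucket and query values) is to build the whole pandas DataFrame on the Python side and let reticulate convert it to an R data frame when the function is called from R:
# Sketch only: collect all N1QL rows first, then build the DataFrame in one call.
# When this function is imported via reticulate (convert = TRUE, the default), the
# returned pandas DataFrame comes back to R as a data.frame.
import pandas as pd
from couchbase.bucket import Bucket
from couchbase.n1ql import N1QLQuery

def query_to_df(host, bucket_name, statement):
    cb = Bucket('couchbase://' + host + '/' + bucket_name)
    rows = list(cb.n1ql_query(N1QLQuery(statement)))
    return pd.json_normalize(rows)
From R, reticulate::source_python() on a file containing this function would expose query_to_df() directly and return the result as a data frame.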

Fusion Tables importRows

Has anyone used the importRows() function from the Fusion Tables API?
According to the API reference below,
https://developers.google.com/fusiontables/docs/v1/reference/table/importRows
I have to supply CSV data in the request body.
But what exactly should I put in that request body?
My code:
http = getAuthorizedHttp()
DISCOVERYURL = 'https://www.googleapis.com/discovery/v1/apis/{api}/{apiVersion}/rest'
ftable = build('fusiontables', 'v1', discoveryServiceUrl=DISCOVERYURL, http=http)
body = create_ft(CSVFILE,"title here") # the function to load csv file and create the table with columns from csv file.
result = ftable.table().insert(body=body).execute()
print result["tableId"] # good, I have got the id for new created table
# I have no idea how to go on here..
f = ftable.table().importRows(tableId=result["tableId"])
f.body = ?????????????
f.execute()
I finally fixed my problem; my code can be found at the following link:
https://github.com/childnotfound/parser/blob/master/uploader.py
I fixed the problem like this:
media = http.MediaFileUpload('example.csv', mimetype='application/octet-stream', resumable=True)
request = service.table().importRows(media_body=media, tableId='1cowubQ0vj_H9q3owo1vLM_gMyavvbuoNmRQaYiZV').execute()
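Slotting that fix into the variable names from the question (a sketch under the assumption that the same ftable client and the result from table().insert() are still in scope; depending on the client library version, the import may be from apiclient.http instead):
# Sketch reusing the question's ftable and result variables.
from googleapiclient.http import MediaFileUpload

media = MediaFileUpload('example.csv', mimetype='application/octet-stream', resumable=True)
ftable.table().importRows(tableId=result["tableId"], media_body=media).execute()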
