Do I need read_excel to load a Google Sheet before doing further search operations on its columns in Python?
I need to gather data from an entire Google Sheet file. I want to select a sheet by its name first, then gather information by looking up values in its columns.
I started by looking at the two popular solutions on the internet.
The first one uses the gspread package; since it relies on service_account.json credentials, I will not use it.
The second one is appropriate for me, but it shows how to export the data as a CSV file, whereas I need it as an XLSX file.
The code is below:
import pandas as pd

sheet_id = "<sheet id>"  # taken from the sheet URL
sheet_name = "sample_1"
url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
df = pd.read_csv(url)
I have both the sheet_id and the sheet_name, but I need to get the data as an XLSX file.
Here I found an example of how to read an Excel file. Is there a way to read a Google spreadsheet the same way, as an Excel file?
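A possible direction (a minimal sketch, not from the original question: it assumes the sheet is shared publicly and that openpyxl is installed, and uses the same export endpoint the answers below rely on):

import pandas as pd

sheet_id = "<sheet id>"
url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?format=xlsx"
# read_excel accepts URLs, so the whole workbook can be fetched as XLSX
# and a worksheet picked by name.
df = pd.read_excel(url, sheet_name="sample_1")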
Using Pandas to pd.read_excel() for multiple worksheets of the same workbook
xls = pd.ExcelFile('excel_file_path.xls')
# Now you can list all sheets in the file
xls.sheet_names
# ['house', 'house_extra', ...]
# to read just one sheet to dataframe, pass the ExcelFile object:
df = pd.read_excel(xls, sheet_name="house")
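As an aside (not part of the quoted answer), pandas can also read every worksheet in a single call by passing sheet_name=None, which returns a dict keyed by sheet name:

# Reads all worksheets at once; keys are sheet names, values are DataFrames.
all_sheets = pd.read_excel(xls, sheet_name=None)
# {'house': <DataFrame>, 'house_extra': <DataFrame>, ...}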
I have no problem reading a Google Sheet using the method I found here:
Python Read In Google Spreadsheet Using Pandas
spreadsheet_id = "<INSERT YOUR GOOGLE SHEET ID HERE>"
url = f"https://docs.google.com/spreadsheets/d/{spreadsheet_id}/export?format=csv"
df = pd.read_csv(url)
df.to_excel("my_sheet.xlsx")
You need to set the permissions of your sheet though. I found that setting it to "anyone with a link" worked.
UPDATE - based on comments below
If your spreadsheet has multiple tabs and you want to read anything other than the first sheet, you need to specify a sheetID as described here
spreadsheet_id = "<INSERT YOUR GOOGLE spreadsheetId HERE>"
sheet_id = "<INSERT YOUR GOOGLE sheetId HERE>"
url = f"https://docs.google.com/spreadsheets/d/{spreadsheet_id}/export?gid={sheet_id}&format=csv"
df = pd.read_csv(url)
df.to_excel("my_sheet.xlsx")
Related
I want to read a Google sheet with multiple sheets into one (or several) pandas dataframes.
I don't know the sheet names or the number of sheets in advance.
The trivial attempt fails:
def main():
    path = r"https://docs.google.com/spreadsheets/d/1-MlSisrAxhOyKhrz6S08PG68j667Ym7jGExOyytpCSM/edit?usp=sharing"
    pd.read_excel(path)
fails with
ValueError: Excel file format cannot be determined, you must specify an engine manually.
Trying any of the engines doesn't work either.
All answers to this question refer to .csv, meaning a single sheet, or knowing the sheet name in advance.
Same goes for the 1st Google hit for "read google sheet python pandas".
Is there a standard way of doing this?
When your Spreadsheet is publicly shared, how about the following sample script for your situation?
Sample script:
import openpyxl
import pandas as pd
import requests
from io import BytesIO

spreadsheetId = "###"  # Please set your Spreadsheet ID.
url = "https://docs.google.com/spreadsheets/export?exportFormat=xlsx&id=" + spreadsheetId
res = requests.get(url)
data = BytesIO(res.content)
xlsx = openpyxl.load_workbook(filename=data)
for name in xlsx.sheetnames:
    values = pd.read_excel(data, sheet_name=name)
    # do something
In this sample script, the publicly shared Spreadsheet is exported as XLSX data. The exported XLSX data is opened, the sheet names are retrieved, and then each sheet is read into a dataframe.
If you want to retrieve the specific sheets, please filter the sheet names from xlsx.sheetnames.
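For example, a small sketch of such a filter (the "2023-" prefix here is just an illustrative assumption):

# Keep only the tabs whose names match your criterion.
wanted = [name for name in xlsx.sheetnames if name.startswith("2023-")]
frames = {name: pd.read_excel(data, sheet_name=name) for name in wanted}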
Note:
If your Spreadsheet is not publicly shared, this thread might be useful. Ref
I'm trying to use Python to replace the contents of a sheet in an existing Excel workbook by importing data from a CSV, ideally refreshing any pivot tables with the new data too.
How could I go about doing this?
excel_file = r'S:\Andy\Python\Monthly Report.xlsx'
sheet_name = r'Raw Data'
csv_path = r'S:\Andy\Python\Data Export.csv'
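No worked answer appears here, so below is a minimal sketch of one way to do the sheet replacement with pandas and openpyxl (it assumes pandas >= 1.3 for if_sheet_exists; the pivot-table refresh is not handled, since openpyxl does not recalculate pivots; they are typically refreshed by Excel on the next open):

import pandas as pd

excel_file = r'S:\Andy\Python\Monthly Report.xlsx'
sheet_name = r'Raw Data'
csv_path = r'S:\Andy\Python\Data Export.csv'

df = pd.read_csv(csv_path)

# mode='a' opens the existing workbook; if_sheet_exists='replace' drops the
# old sheet and writes the CSV data in its place (pandas >= 1.3, openpyxl engine).
with pd.ExcelWriter(excel_file, engine='openpyxl', mode='a',
                    if_sheet_exists='replace') as writer:
    df.to_excel(writer, sheet_name=sheet_name, index=False)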
I managed to read data from a Google Sheet file using this method:
# ACCESS GOOGLE SHEET
import pandas as pd

googleSheetId = 'myGoogleSheetId'
workSheetName = 'mySheetName'
URL = 'https://docs.google.com/spreadsheets/d/{0}/gviz/tq?tqx=out:csv&sheet={1}'.format(
    googleSheetId,
    workSheetName
)
df = pd.read_csv(URL)
However, after generating a pd.DataFrame that fetches info from the web using selenium, I need to append that data to the Google Sheet.
Question: Do you know a way to export that DataFrame to Google Sheets?
Yes, there is a module called "gspread". Just install it with pip and import it into your script.
Here you can find the documentation:
https://gspread.readthedocs.io/en/latest/
In particular their section on Examples of gspread with pandas.
worksheet.update([dataframe.columns.values.tolist()] + dataframe.values.tolist())
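For context, a minimal sketch of the setup around that line (the key-file name and sheet title are assumptions, not from the gspread docs excerpt):

import gspread
import pandas as pd

gc = gspread.service_account(filename="service_account.json")  # your key file
sh = gc.open_by_key("<YOUR SPREADSHEET ID>")
worksheet = sh.worksheet("Sheet1")

dataframe = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
# First row: the column headers; following rows: the dataframe values.
worksheet.update([dataframe.columns.values.tolist()] + dataframe.values.tolist())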
This might be a little late for the original author but will be of help to others. The following is a utility function that can write any pandas dataframe to a Google Sheet.
import pygsheets

def write_to_gsheet(service_file_path, spreadsheet_id, sheet_name, data_df):
    """
    This function takes data_df and writes it under spreadsheet_id
    and sheet_name using your credentials under service_file_path.
    """
    gc = pygsheets.authorize(service_file=service_file_path)
    sh = gc.open_by_key(spreadsheet_id)
    try:
        sh.add_worksheet(sheet_name)
    except Exception:
        # The worksheet already exists; reuse it.
        pass
    wks_write = sh.worksheet_by_title(sheet_name)
    wks_write.clear('A1', None, '*')
    wks_write.set_dataframe(data_df, (1, 1), encoding='utf-8', fit=True)
    wks_write.frozen_rows = 1
Steps to get service_file_path, spreadsheet_id, sheet_name:
1. Click Sheets API | Google Developers.
2. Create a new project under Dashboard (provide a relevant project name and other required information).
3. Go to Credentials.
4. Click on "Create Credentials" and choose "Service Account". Fill in all required information, viz. service account name, ID, description, etc.
5. Go through Steps 2 and 3 of the wizard and click on "Done".
6. Click on your service account and go to "Keys".
7. Click on "Add Key", choose "Create New Key" and select "JSON". Your service JSON file will be downloaded. Put this under your repo folder; the path to this file is your service_file_path.
8. In that JSON, the "client_email" key can be found.
9. Create a new Google spreadsheet and note the URL of the spreadsheet.
10. Provide Editor access to the spreadsheet for the "client_email" (step 8) and keep this service JSON file available while running your Python code.

Note: add the JSON file to .gitignore without fail.
From the URL (e.g. https://docs.google.com/spreadsheets/d/1E5gTTkuLTs4rhkZAB8vvGMx7MH008HjW7YOjIOvKYJ1/) extract the part between /d/ and the next / (1E5gTTkuLTs4rhkZAB8vvGMx7MH008HjW7YOjIOvKYJ1 in this case); that is your spreadsheet_id.
sheet_name is the name of the tab in the Google spreadsheet. By default it is "Sheet1" (unless you have modified it).
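Putting the pieces together, a hypothetical call would look like this (the spreadsheet ID is the example one from the URL above; the dataframe is made up):

import pandas as pd

df = pd.DataFrame({"name": ["alpha", "beta"], "value": [1, 2]})
write_to_gsheet(
    service_file_path="service_account.json",  # the key downloaded in step 7
    spreadsheet_id="1E5gTTkuLTs4rhkZAB8vvGMx7MH008HjW7YOjIOvKYJ1",
    sheet_name="Sheet1",
    data_df=df,
)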
Google Sheets has a nice API you can use from Python (see the docs here), which allows you to append single rows or entire batch updates to a Sheet.
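A hedged sketch of such an append with the official google-api-python-client (the creds object and all names here are assumptions, not from this answer):

from googleapiclient.discovery import build

# 'creds' is any google-auth credentials object with a Sheets scope.
service = build("sheets", "v4", credentials=creds)
service.spreadsheets().values().append(
    spreadsheetId="<YOUR SPREADSHEET ID>",
    range="Sheet1!A1",                    # append below the table starting here
    valueInputOption="USER_ENTERED",      # values are parsed as if typed by a user
    body={"values": df.values.tolist()},  # df: the pandas dataframe to upload
).execute()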
Another way of doing it without that API would be to export the data to a csv file using the python csv library, and then you can easily import that csv file into a Google Sheet.
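And a minimal sketch of the csv-library route (the file name is an assumption):

import csv

# Write the dataframe out with the standard csv module; the file can then
# be imported into a Google Sheet via File > Import.
with open("export.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(df.columns.tolist())
    writer.writerows(df.values.tolist())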
I am in the process of automating a process, in which I need to upload some data to a Google spreadsheet.
The data is originally located in a pandas dataframe, which is converted to a JSON file for upload.
I am getting to the upload, but I get all the data into each cell, so that cell A1 contains all the data from the entire pandas dataframe; in fact, each cell in the spreadsheet contains all the data :/
Of course, what I want to have happen is to place what is cell A1 in the dataframe, as A1 in the Google spreadsheet and so forth down to cell J173.
I am thinking I need to put in some sort of loop to make this happen, but I am not sure how JSON files work, so I am not succeeding in creating this loop.
I hope one of you can help
Below is the code:
# Converting data to a JSON string for upload
csv_data = csv_data.to_json()

# Updating data
cell_list = sheet.range('A1:J173')
for cell in cell_list:
    cell.value = csv_data
sheet.update_cells(cell_list)
Windows 10
Python 3.8
You want to put the data of a dataframe into a Google Spreadsheet.
In your script, csv_data from csv_data.to_json() is the dataframe.
You want to achieve this using gspread with Python.
This is what I understood from your script.
You have already been able to get and put values for a Google Spreadsheet using the Sheets API.
Pattern 1:
In this pattern, the method of values_update of gspread is used.
Sample script:
spreadsheetId = "###" # Please set the Spreadsheet ID.
sheetName = "Sheet1" # Please set the sheet name.
csv_data = # <--- please set the dataframe.
client = gspread.authorize(credentials)
values = [csv_data.columns.values.tolist()]
values.extend(csv_data.values.tolist())
spreadsheet.values_update(sheetName, params={'valueInputOption': 'USER_ENTERED'}, body={'values': values})
Pattern 2:
In this pattern, the library of gspread-dataframe is used.
Sample script:
import gspread
from gspread_dataframe import set_with_dataframe  # Please add this.

spreadsheetId = "###"  # Please set the Spreadsheet ID.
sheetName = "Sheet1"  # Please set the sheet name.
csv_data = # <--- please set the dataframe.

client = gspread.authorize(credentials)
spreadsheet = client.open_by_key(spreadsheetId)
worksheet = spreadsheet.worksheet(sheetName)
set_with_dataframe(worksheet, csv_data)
References:
values_update
gspread-dataframe
I have around 20 xlsx files that I would like to append using python. I can easily do that with pandas, the problem is that in the first column, I have hyperlinks and when I use pandas to append my xlsx files, I lose the hyperlink and get only the text in the column. Here is the code using pandas.
import pandas as pd

# 'files' holds the paths of the ~20 xlsx files.
excels = [pd.ExcelFile(name) for name in files]
frames = [x.parse(x.sheet_names[0], header=None, index_col=None) for x in excels]
# Skip the header row in every file after the first.
frames[1:] = [df[1:] for df in frames[1:]]
combined = pd.concat(frames)
combined.to_excel("c.xlsx", header=False, index=False)
Is there any way that I can append my files while retaining the hyperlinks? Is there a particular library that can do this?
It depends on how the hyperlinks are written in the original Excel files and on the Excel writer you use. read_excel will return the display text, e.g. if you have a hyperlink to https://www.google.com and the display text is just google, then there's no way to retain the link with pandas, as you'll have just google in your dataframe.
If no separate display name is given (or the display name is identical to the hyperlink) and you use xlsxwriter (engine='xlsxwriter'), then the output of to_excel is automatically converted to hyperlinks, because the strings start with 'http://' or another scheme (as of xlsxwriter version 1.1.5).
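A tiny sketch of that behavior (assuming xlsxwriter is installed with its default strings_to_urls setting):

import pandas as pd

df = pd.DataFrame({"link": ["https://www.google.com", "https://stackoverflow.com"]})
# With the xlsxwriter engine, bare URL strings come out as clickable links.
df.to_excel("links.xlsx", index=False, engine="xlsxwriter")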
If you know that all your hyperlinks are 'http://' links with no authority and the display name (if different from the link) is just the url path, then you can prepend the 'http://' prefix and you'll get hyperlinks in the Excel file:
import numpy as np

col0 = combined.iloc[:, 0]
combined.iloc[:, 0] = np.where(col0.str.startswith('http'), col0, 'http://' + col0)
combined.to_excel("c.xlsx", header=False, index=False, engine='xlsxwriter')
A universal solution without pandas using openpyxl is shown in this answer to the same SO question where you took the pandas solution from. In order to copy hyperlinks too, you'll just have to add the following lines to the copySheet function (and import copy from the copy module):
if cell.hyperlink is not None:
    newCell.hyperlink = copy(cell.hyperlink)
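For orientation, a minimal sketch of the kind of copySheet loop those lines slot into (this is an assumption of its shape, not the linked answer itself):

from copy import copy

def copySheet(source_sheet, target_sheet):
    for row in source_sheet.iter_rows():
        for cell in row:
            # Copy the value cell by cell...
            newCell = target_sheet.cell(row=cell.row, column=cell.column,
                                        value=cell.value)
            # ...and carry the hyperlink over, when there is one.
            if cell.hyperlink is not None:
                newCell.hyperlink = copy(cell.hyperlink)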