I am using set_with_dataframe from https://github.com/robin900/gspread-dataframe to write my dataframe to a Google Sheet.
I am writing to the sheet iteratively, and trying NOT to include headers. However, when include_column_header is set to False, the headers are included every time.
set_with_dataframe(worksheet, df, row=next_available, col=1, include_index=False, include_column_header=False)
This means that as I write iteratively, the column header is repeated on each iteration.
How can I write to a Google sheet from a dataframe without including the headers?
I have just tested it on my side:
set_with_dataframe(sheet, df, include_column_header=False) does not write headers to your sheet.
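For reference, a minimal sketch of the iterative pattern from the question, assuming worksheet is an already-open gspread worksheet and chunks is an iterable of DataFrames (both hypothetical names):
from gspread_dataframe import set_with_dataframe

next_available = 1
for df in chunks:
    # With include_column_header=False only the values are written,
    # so each chunk lands directly below the previous one.
    set_with_dataframe(worksheet, df, row=next_available, col=1,
                       include_index=False, include_column_header=False)
    next_available += len(df)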
The general setup is that I scrape a website, collect data from it into a pandas DataFrame, and then write it to an "Example Sheet" in an already existing Excel file with existing sheets in it. Everything goes smoothly; the data is inserted and all.
The question is: is it also possible to create a new Excel file? Meaning that I have a template Excel file which lacks data; I insert that data using my Python script and want to save the result as a new file while keeping the existing template intact.
Is there a way to do it?
Using BeautifulSoup, ExcelWriter, pandas DataFrame, Requests, openpyxl.
I've been looking through many threads but didn't find an answer to the problem.
Yes, of course.
e.g.:
Read the template: new = pd.read_excel('....')
Insert the data into the DataFrame you just read.
Save the DataFrame with a new name: new.to_excel('newfile.xlsx')
If you share the template, we will explain it better.
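A minimal sketch of those steps, assuming the template is named template.xlsx and the scraped values already sit in a list called scraped_values (both hypothetical names):
import pandas as pd

# Read the template into a DataFrame; the file on disk is untouched.
new = pd.read_excel('template.xlsx')

# Insert the scraped data, e.g. fill a column the template leaves empty
# (scraped_values is a hypothetical list whose length matches the rows).
new['Data'] = scraped_values

# Save under a new name; 'template.xlsx' itself stays intact.
new.to_excel('newfile.xlsx', index=False)
One caveat worth knowing: pandas rewrites the workbook rather than editing it in place, so cell formatting from the template is not carried over. If the formatting matters, openpyxl's load_workbook can open the template directly and save it under a new name instead.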
I am trying to obtain data provided in an xlsx spreadsheet available for download from a URL via Python. My first approach was to read it into a DataFrame and save it down to a file that can be manipulated via another script.
I have realized that xlsx is no longer supported by xlrd due to security concerns. My current thought of a workaround is to download to a separate file, convert to xls, and then do my initial processing/manipulations. I am new to Python and wondering if this is the best way to accomplish it. I see a potential problem in this method, as the security concern is still present. This particular document is probably downloaded by many institutions daily, so the incentive for hacking the source doc and deploying a bug is high. Am I overthinking this?
What method would you use to call xlsx into pandas from a static URL? Additionally, my next problem is downloading a document from a dynamic URL, and any tips on where to look would be helpful.
My original source code is below; the problem I am trying to solve is maintaining a database of all S&P 500 constituents and their current weightings.
Thank you.
# packages
import pandas as pd
url = 'https://www.ssga.com/us/en/institutional/etfs/library-content/products/fund-data/etfs/us/holdings-daily-us-en-spy.xlsx'
# Load the first sheet of the Excel file into a data frame
df = pd.read_excel(url, sheet_name=0, header=1)
# View the first ten rows
df.head(10)
# is it worth it to download the file to a repository, convert to xls, then read it in?
You can always make the request with requests and then read the xlsx into a pandas dataframe like so:
import pandas as pd
import requests
from io import BytesIO
url = ("https://www.ssga.com/us/en/institutional/etfs/library-content/"
"products/fund-data/etfs/us/holdings-daily-us-en-spy.xlsx")
r = requests.get(url)
bts = BytesIO(r.content)
df = pd.read_excel(bts)
I'm not sure about the security concerns, but this would be equivalent to just making the same request in a browser. As for the dynamic URL, if you can figure out which parts of the URL are changing, you can just modify it as follows:
stock = 'spy'
url = ("https://www.ssga.com/us/en/institutional/etfs/library-content/"
f"products/fund-data/etfs/us/holdings-daily-us-en-{stock}.xlsx")
I am trying to download a specific sheet from a spreadsheet (on Google Drive) but am unable to find a method to do so. I am using the Python Client API library (v3) and passing file_id and mimeType to the export_media() function as shown below:
request = service.files().export_media(fileId=file_id, mimeType='text/csv')
media_request = http.MediaIoBaseDownload(local_fd, request)
This code always exports the sheet in the first position. Can you please describe a method through which I can download a specific sheet or sheets by providing a gid or any other parameter?
I don't think the Drive API has a feature to specify a sheet name.
Two workarounds spring to mind...
You could use the Sheets API (https://developers.google.com/sheets/api/reference/rest/) and write your own CSV formatter, as sketched after this list. It sounds more complex than it is. It's probably 10 lines of code, especially if you go for tab-separated instead of comma-separated.
Use the Google Spreadsheet File/Publish to the Web feature to publish a csv of any given sheet. Note that the content will be public, so anybody with the link (which is pretty obtuse) would be able to read the data.
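A hedged sketch of the first workaround, assuming you already hold OAuth credentials in creds and know the spreadsheet ID and sheet name (SPREADSHEET_ID and 'Sheet2' are placeholders):
import csv
from googleapiclient.discovery import build

# Build a Sheets API client from existing credentials
service = build('sheets', 'v4', credentials=creds)

# Fetch all values from the named sheet
result = service.spreadsheets().values().get(
    spreadsheetId=SPREADSHEET_ID, range='Sheet2').execute()

# Write the rows out as CSV
with open('sheet2.csv', 'w', newline='') as f:
    csv.writer(f).writerows(result.get('values', []))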
You can use an old visualization API URL (see other answer)
f'https://docs.google.com/spreadsheets/d/{doc_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}'
To make this request using the Google API Python library, you can use the credentials you already have and create an HTTP client instance yourself:
http_client = googleapiclient.discovery._auth.authorized_http(creds)
response, content = http_client.request(url)
Check response.status before you proceed.
Note that this API behaves a bit differently than your regular CSV export. Specifically, there are some things I saw it do with headers: it will make them disappear if they are not set to Plain Text on a numeric column (see here), or merge multiple text rows appearing at the top of your sheet into a single header row.
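If the goal is a DataFrame rather than a file, the bytes returned by the request above can be fed straight to pandas (a hypothetical follow-up):
import pandas as pd
from io import BytesIO

# content is the CSV payload returned by http_client.request(url)
df = pd.read_csv(BytesIO(content))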
This is not a duplicate, although the issue has been raised in this forum in 2011 (Getting a hyperlink URL from an Excel document), 2013 (Extracting Hyperlinks From Excel (.xlsx) with Python), and 2014 (Getting the URL from Excel Sheet Hyper links in Python with xlrd); there is still no answer.
After some deep dive into the xlrd module, it seems the Data_sheet.hyperlink_map.get((row, col)) item trips because "xlrd cannot read the hyperlink without formatting_info, which is currently not supported for xlsx" per #alecxe at Extracting Hyperlinks From Excel (.xlsx) with Python.
Question: has anyone made progress with extracting URLs from hyperlinks stored in an Excel file? Say, among all the customer data, there is a column of hyperlinks. I was toying with the idea of dumping the Excel sheet as an HTML page and proceeding per usual scraping (file on local drive), but that's not a production solution. Supplementary: is there any other module that can extract the URL from a .cell(row,col).value() call on the hyperlink cell? Is there a solution in mechanize? Many thanks.
I had the same problem trying to get the hyperlinks from the cells of an xlsx file. The workaround I came up with is simply converting the Excel sheet to xls format, from which I could manage to get the hyperlinks without any trouble, and once the editing was finished, I converted it back to the original xlsx format.
I don't know if this will work for your specific needs, or if the change of format implies some consequences I am not aware of, but I think it's worth a try.
I was able to read and use hyperlinks to copy files with openpyxl. A cell object has a cell_obj.hyperlink attribute, and cell_obj.hyperlink.target will grab the link value. I made a list of the row/col values of the cells which had hyperlinks, then looped through that list to move the linked files.
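A minimal sketch of that openpyxl approach, assuming the hyperlinks live in column A of a file called customers.xlsx (both hypothetical):
from openpyxl import load_workbook

wb = load_workbook('customers.xlsx')
ws = wb.active

urls = []
for cell in ws['A']:
    # cell.hyperlink is None when the cell carries no link
    if cell.hyperlink is not None:
        urls.append(cell.hyperlink.target)

print(urls)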
I have a large volume of XLS files. A sheet within each file contains column headers of "name" and "number". Unfortunately the format of each XLS varies, and the name of the sheet that holds the data varies from one file to another.
I am able to parse through a sheet using Python 2.7.x to extract the data from specific columns. What I'm now looking to do is open each XLS file, work out which sheet contains the headers "name" and "number", and then extract the data within those columns and import it into MySQL.
Any suggestions of how to do this or libraries to use?
XYPath might be worth a look - it lets you query XLS files for the contents of tables, including by the names and contents of columns.
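If you would rather stay with plain xlrd (which already reads XLS), the sheet-scanning idea can be sketched directly. This is not XYPath, and it assumes the headers sit somewhere in each sheet's first row:
import xlrd

def find_sheet_with_headers(path, wanted=('name', 'number')):
    book = xlrd.open_workbook(path)
    for sheet in book.sheets():
        if sheet.nrows == 0:
            continue
        # Compare lower-cased first-row cell values against the wanted headers
        header = [str(cell.value).strip().lower() for cell in sheet.row(0)]
        if all(w in header for w in wanted):
            return sheet
    return None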