Crawl through all sheets in an XLS with Python - python

I have a large number of XLS files. In each file, one sheet contains the column headers "name" and "number". Unfortunately, the layout of each XLS varies, and the name of the sheet that holds the data differs from one file to another.
I can already parse a sheet with Python 2.7.x and extract the data from specific columns. What I now want to do is open each XLS file, work out which sheet contains the headers "name" and "number", extract the data in those columns, and import it into MySQL.
Any suggestions on how to do this, or which libraries to use?

XYPath might be worth a look - it lets you query XLS files for the contents of tables, including by the names and contents of columns.
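
For comparison, here is a minimal sketch of the sheet-hunting step without XYPath. It assumes the files can be read with xlrd (`open_workbook`, `sheets()`, `row_values`); the helper names `locate_headers`, `extract_pairs` and `read_xls` are made up for illustration. It is written for Python 3 and assumes every row in a sheet has the same length, which is what xlrd's `row_values` returns.

```python
def locate_headers(rows, wanted=("name", "number")):
    """Return (header_row_index, {header: column_index}) for the first row
    that contains every wanted header, or None if no row does."""
    for r, row in enumerate(rows):
        # Compare case-insensitively and ignore surrounding whitespace.
        labels = [str(cell).strip().lower() for cell in row]
        if all(w in labels for w in wanted):
            return r, {w: labels.index(w) for w in wanted}
    return None


def extract_pairs(sheets, wanted=("name", "number")):
    """`sheets` maps sheet name -> list of rows.
    Return (sheet_name, [(name, number), ...]) for the first sheet that
    has the headers, or None if no sheet has them."""
    for sheet_name, rows in sheets.items():
        found = locate_headers(rows, wanted)
        if found is None:
            continue
        header_row, cols = found
        # Everything below the header row is treated as data.
        data = [tuple(row[cols[w]] for w in wanted)
                for row in rows[header_row + 1:]]
        return sheet_name, data
    return None


def read_xls(path):
    """Load every sheet of an .xls file into {sheet name: rows} via xlrd."""
    import xlrd  # third-party; imported here so the helpers above work without it
    book = xlrd.open_workbook(path)
    return {s.name: [s.row_values(i) for i in range(s.nrows)]
            for s in book.sheets()}
```

Calling `extract_pairs(read_xls(path))` on each file would then give the sheet name and the (name, number) tuples, which could be handed to a DB-API cursor's `executemany` for the MySQL import.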

Related

Automation using Python

Is it possible to automate extracting particular data (numbers) from a scanned PDF file into an Excel file?
Currently we have to go through the PDF page by page looking for the data and then type it into the Excel sheet manually.
Thanks

Reading Metadata/Properties from Excel Files using Python

I am trying to parse metadata out of Excel files. I have tried openpyxl, but it does not give me the desired results.
I would like to read the Company Name and Author Name from the document properties.
Thank You
This is the information I am trying to parse
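
One possible approach, sketched with the standard library only: an .xlsx file is a zip package, and by convention the author is stored as `dc:creator` in `docProps/core.xml` while the company is stored as `Company` in `docProps/app.xml`. The function name `read_author_and_company` is hypothetical, the part paths are the conventional ones rather than looked up from the package relationships, and this does not apply to legacy .xls files.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the two property parts of an OOXML package.
CORE_NS = {"dc": "http://purl.org/dc/elements/1.1/"}
APP_NS = {"ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"}


def read_author_and_company(xlsx_file):
    """Return (author, company) from an .xlsx path or file-like object.
    A value is None when the part or the element is missing."""
    author = company = None
    with zipfile.ZipFile(xlsx_file) as zf:
        names = set(zf.namelist())
        # Assumption: the parts live at their conventional locations.
        if "docProps/core.xml" in names:
            core = ET.fromstring(zf.read("docProps/core.xml"))
            author = core.findtext("dc:creator", namespaces=CORE_NS)
        if "docProps/app.xml" in names:
            app = ET.fromstring(zf.read("docProps/app.xml"))
            company = app.findtext("ep:Company", namespaces=APP_NS)
    return author, company
```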

Using gspread-dataframe set_with_dataframe - trying not to include headers

I am using set_with_dataframe from https://github.com/robin900/gspread-dataframe to write my dataframe to a Google Sheet.
I am writing to the sheet iteratively, and trying NOT to include headers. However, when include_column_header is set to False, the headers are included every time.
set_with_dataframe(worksheet,df,row=next_available,col=1,include_index=False,include_column_header=False)
This means that as I write iteratively, the column header is repeated on each iteration.
How can I write to a Google sheet from a dataframe without including the headers?
I have just tested it on my side:
set_with_dataframe(sheet,df,include_column_header=False) will not write headers to your sheet.

How to create a hyperlink from one sheet of an excel document to another excel sheet in the same document with Python?

I have an Excel file. In one of its columns, I want to create a link to another sheet in the same file. How can I do it?
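
One way this might be done is with Excel's own HYPERLINK function, where a target beginning with `#` points inside the current workbook. The sketch below builds such a formula and writes it with openpyxl, which stores any string starting with `=` as a formula. The names `internal_link_formula` and `write_example`, and the sheet names "Index" and "Summary", are made up for illustration.

```python
def internal_link_formula(sheet_name, cell_ref="A1", label=None):
    """Build a HYPERLINK formula that jumps to `cell_ref` on `sheet_name`
    in the same workbook."""
    # Inside a quoted sheet name, Excel escapes a single quote by doubling it.
    quoted = sheet_name.replace("'", "''")
    text = sheet_name if label is None else label
    # Inside a formula string literal, a double quote is escaped by doubling it.
    text = text.replace('"', '""')
    return '=HYPERLINK("#\'{}\'!{}", "{}")'.format(quoted, cell_ref, text)


def write_example(path):
    """Create a two-sheet workbook whose Index!A1 links to Summary!A1."""
    from openpyxl import Workbook  # third-party; imported lazily
    wb = Workbook()
    index = wb.active
    index.title = "Index"
    wb.create_sheet("Summary")
    index["A1"] = internal_link_formula("Summary")
    wb.save(path)
```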

How to make XLRD read hyperlinks in XLSX cells?

This is not a duplicate, although the issue has been raised in this forum before: in 2011 (Getting a hyperlink URL from an Excel document), in 2013 (Extracting Hyperlinks From Excel (.xlsx) with Python) and in 2014 (Getting the URL from Excel Sheet Hyper links in Python with xlrd). There is still no answer.
After a deep dive into the xlrd module, it seems the Data_sheet.hyperlink_map.get((row, col)) lookup fails because "xlrd cannot read the hyperlink without formatting_info, which is currently not supported for xlsx" per #alecxe at Extracting Hyperlinks From Excel (.xlsx) with Python.
Question: has anyone made progress with extracting URLs from hyperlinks stored in an Excel file? Say that, among all the customer data, there is a column of hyperlinks. I was toying with the idea of dumping the Excel sheet as an HTML page and proceeding with the usual scraping (file on local drive), but that's not a production solution. Supplementary: is there any other module that can extract the URL from a .cell(row,col).value() call on the hyperlink cell? Is there a solution in mechanize? Many thanks.
I had the same problem trying to get the hyperlinks from the cells of an xlsx file. The workaround I came up with was simply to convert the Excel sheet to xls format, from which I could get the hyperlinks without any trouble. Once I had finished editing, I converted it back to the original xlsx format.
I don't know whether this will work for your specific needs, or whether the change of format has consequences I am not aware of, but I think it's worth a try.
I was able to read and use hyperlinks to copy files with openpyxl. It has cell_obj.hyperlink and cell_obj.hyperlink.target, which will grab the link value. I collected the row and column of every cell that had a hyperlink into a list, then looped through the list to move the linked files.
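
A rough sketch of that openpyxl approach, assuming the workbook is loaded normally (not in read-only mode) so that cells expose `hyperlink`. The function names `collect_hyperlinks` and `hyperlinks_in_workbook` are hypothetical, and only the active sheet is scanned here.

```python
def collect_hyperlinks(rows):
    """`rows` is an iterable of rows of cell objects, e.g. worksheet.iter_rows().
    Return [(coordinate, target), ...] for cells that carry an external link."""
    links = []
    for row in rows:
        for cell in row:
            # Cells without a link have hyperlink set to None.
            link = getattr(cell, "hyperlink", None)
            if link is not None and link.target:
                links.append((cell.coordinate, link.target))
    return links


def hyperlinks_in_workbook(path):
    """Collect the hyperlinks on the active sheet of an .xlsx file."""
    from openpyxl import load_workbook  # third-party; imported lazily
    ws = load_workbook(path).active
    return collect_hyperlinks(ws.iter_rows())
```

The returned list can then be looped over to copy or move the linked files, as described above.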
