How to make XLRD read hyperlinks in XLSX cells? - python

This is not a duplicate, although the issue has been raised on this forum in 2011 (Getting a hyperlink URL from an Excel document), 2013 (Extracting Hyperlinks From Excel (.xlsx) with Python) and 2014 (Getting the URL from Excel Sheet Hyper links in Python with xlrd); there is still no answer.
After a deep dive into the xlrd module, it seems the Data_sheet.hyperlink_map.get((row, col)) lookup fails because "xlrd cannot read the hyperlink without formatting_info, which is currently not supported for xlsx", per @alecxe at Extracting Hyperlinks From Excel (.xlsx) with Python.
Question: has anyone made progress with extracting URLs from hyperlinks stored in an Excel file? Say that, among all the customer data, there is a column of hyperlinks. I was toying with the idea of dumping the Excel sheet as an HTML page and then scraping it as usual (file on local drive), but that's not a production solution. Supplementary: is there any other module that can extract the URL from a .cell(row,col).value() call on the hyperlink cell? Is there a solution in mechanize? Many thanks.

I had the same problem trying to get the hyperlinks from the cells of an xlsx file. The workaround I came up with is simply converting the Excel sheet to xls format, from which I could get the hyperlinks without any trouble; once I had finished editing, I converted it back to the original xlsx format.
I don't know if this will work for your specific needs, or if the change of format has consequences I am not aware of, but I think it's worth a try.

I was able to read and use hyperlinks to copy files with openpyxl. Each cell has a cell_obj.hyperlink attribute, and cell_obj.hyperlink.target holds the link value. I collected the row and column of every cell that had a hyperlink into a list, then looped through the list to move the linked files.
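
A minimal sketch of the openpyxl approach above, assuming the workbook is loaded normally (not in read-only mode). The function names are made up for illustration, and openpyxl is imported lazily so that the first helper works on any worksheet-like object.

```python
def collect_hyperlinks(worksheet):
    """Return (coordinate, target) pairs for every cell that carries a hyperlink."""
    links = []
    for row in worksheet.iter_rows():
        for cell in row:
            link = cell.hyperlink  # None when the cell has no hyperlink
            if link is not None:
                # target holds the external URL or file path; it can be None
                # for links that point to a location inside the workbook
                links.append((cell.coordinate, link.target))
    return links


def hyperlinks_in_file(path, sheet_name=None):
    """Open an .xlsx file with openpyxl and list the hyperlinks of one sheet."""
    from openpyxl import load_workbook  # third-party: pip install openpyxl

    workbook = load_workbook(path)
    worksheet = workbook[sheet_name] if sheet_name else workbook.active
    return collect_hyperlinks(worksheet)
```

Calling hyperlinks_in_file on a workbook path (for example a hypothetical "customers.xlsx") would give a list such as [("F2", "https://...")] that can then be looped over.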

Related

ExcelWriter, pandas DataFrame: creating a new file from an existing one

The general idea is that I scrape a website, collect data from it, and store it in a pandas DataFrame, which I then write to an "Example Sheet" in an already existing Excel file that has other sheets in it. Everything goes smoothly and the data is inserted.
The question is: is it possible to also create a new Excel file? That is, I have an Excel template file that lacks data; I insert the data using my Python script and want to save the result as a new file while keeping the existing template intact.
Is there a way to do it?
Using Beautiful Soup, ExcelWriter, pandas DataFrame, Requests, openpyxl.
I've been looking through many threads but didn't find an answer to the problem.
Yes, of course.
E.g.:
Read the template: new = pd.read_excel('....')
Insert the data into the DataFrame you just read.
Save the DataFrame under a new name: new.to_excel('newfile.xlsx')
If you share the template, we can explain it better.
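
A hedged sketch of one way to do this. Note that it swaps pd.read_excel for a plain file copy, because reading the template into a DataFrame keeps the values but not the workbook's other sheets or formatting. The function names are hypothetical; if_sheet_exists needs pandas 1.3 or newer, and the replaced sheet itself loses its original formatting.

```python
import shutil
from pathlib import Path


def copy_template(template_path, output_path):
    """Copy the template to a new file so the original is never modified."""
    template_path, output_path = Path(template_path), Path(output_path)
    if template_path.resolve() == output_path.resolve():
        raise ValueError("output_path must differ from template_path")
    shutil.copyfile(template_path, output_path)
    return output_path


def fill_copy(template_path, output_path, frame, sheet_name):
    """Write a DataFrame into one sheet of a fresh copy of the template."""
    import pandas as pd  # third-party; also needs openpyxl for .xlsx files

    target = copy_template(template_path, output_path)
    # mode="a" keeps the other sheets of the copy; the named sheet is replaced
    with pd.ExcelWriter(target, engine="openpyxl", mode="a",
                        if_sheet_exists="replace") as writer:
        frame.to_excel(writer, sheet_name=sheet_name, index=False)
    return target
```

The template file is only ever read, so it stays intact no matter what happens to the copy.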

Python - SharePoint List - SharePlum

Currently my firm is using Excel to pull the contents of a SharePoint List. The list gets pulled in and a set of VBA macros are run to produce the required Excel workbooks. This method provides all the joy associated with working with Excel and VBA.
There is one thing this method does do: It maintains the formatting found in the SharePoint List all the way through to Excel. So Bolding and underlines in the text fields are maintained.
Question: using Python and SharePlum, is it possible to maintain the bolding found in SharePoint List text fields? I want to extract the text field from the SharePoint List and copy it into an Excel cell.
Have not found any postings on this exact topic.
Thank you for your attention to this matter.
KD

Creating an html report with columns and rows from csv

I have a question for you. I'm working on a new Jenkins instance, and as a result of the job I get a CSV file with errors, if there were any during the test. I would like to generate an HTML report based on this CSV file, which would be more convenient than opening Excel and loading the CSV file to see the errors. I came across the HTML Publisher plugin, but I don't know if it supports generating HTML reports from CSV files. Alternatively, one could do something like this with a Python script and show the resulting HTML file in the artifacts. Do you have any ideas?
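
For the Python script route, a minimal sketch using only the standard library might look like the following. It assumes the first CSV row is a header; the function name and the default title are made up for illustration.

```python
import csv
import html
import io


def csv_to_html(csv_text, title="Test errors"):
    """Render CSV text as a standalone HTML page; the first row is the header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    heading = f"<h1>{html.escape(title)}</h1>"
    if not rows:
        table = "<p>No rows in the CSV file.</p>"
    else:
        header, body = rows[0], rows[1:]
        head = "".join(f"<th>{html.escape(cell)}</th>" for cell in header)
        lines = [
            "<tr>" + "".join(f"<td>{html.escape(cell)}</td>" for cell in row) + "</tr>"
            for row in body
        ]
        table = (
            '<table border="1">\n'
            f"<tr>{head}</tr>\n" + "\n".join(lines) + "\n</table>"
        )
    return f"<!DOCTYPE html>\n<html><body>\n{heading}\n{table}\n</body></html>\n"


# Example (file names are hypothetical):
# with open("errors.csv", newline="", encoding="utf-8") as src:
#     page = csv_to_html(src.read())
# with open("report.html", "w", encoding="utf-8") as dst:
#     dst.write(page)
```

The generated report.html can then be archived as a build artifact or handed to a publishing step. All cell values are escaped, so error messages containing < or & display correctly.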

comparing PDF report to Database

I have a use case. Let's say there is a PDF report that contains data from testing some manufacturing components, and this PDF report is loaded into a database using some internally developed software.
We need to develop a reconciliation program that compares the data in the PDF report with the data in the database. We can assume the PDF file has a fixed template.
If the PDF contains many tables and some raw text, how would MySQL store this data: in one table or in many tables?
Please suggest an approach (preferably in Python) for comparing the data.
Have a look at the example in "Finding and extracting specific text from URL PDF files, without downloading or writing (solution)" and see if it helps. It worked efficiently for me. That example assumes the PDF comes from a URL, but you could simply change the input source to your DB. In your case you can remove the two if statements under the if isinstance(obj, pdfminer.layout.LTTextBoxHorizontal): line. You mention having PDFs with the same template; if you want to extract text from one specific area of the template, use the print statement that has been commented out to find the coordinates of the desired data. Then, as is done in the example, use those coordinates in if statements.
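
The coordinate check described above could be sketched roughly as follows, assuming pdfminer.six is installed. The function names and the region tuple are hypothetical and not taken from the linked example; the region uses the same (x0, y0, x1, y1) layout as pdfminer's bbox.

```python
def inside(bbox, region):
    """True when the box (x0, y0, x1, y1) lies completely inside the region."""
    x0, y0, x1, y1 = bbox
    rx0, ry0, rx1, ry1 = region
    return x0 >= rx0 and y0 >= ry0 and x1 <= rx1 and y1 <= ry1


def text_in_region(pdf_path, region):
    """Collect the text of every horizontal text box found inside the region."""
    from pdfminer.high_level import extract_pages  # third-party: pdfminer.six
    from pdfminer.layout import LTTextBoxHorizontal

    found = []
    for page_layout in extract_pages(pdf_path):
        for obj in page_layout:
            if isinstance(obj, LTTextBoxHorizontal):
                # print(obj.bbox, obj.get_text())  # uncomment to discover coordinates
                if inside(obj.bbox, region):
                    found.append(obj.get_text().strip())
    return found
```

With a fixed template, the values returned for each known region can be compared one by one against the corresponding database columns.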

Python: How to download images with the URLs in the excel and replace the URLs with the pictures?

As shown in the picture below, there's an Excel sheet with about 2,000 URLs of cover images in column F.
What I want to do is download the pictures from the URLs and replace each URL with the corresponding image.
That is: download the pictures, insert them into column F, and remove the URLs automatically.
How can I implement this with Python? Any suggestion or code is welcome. Thanks.
I hope this answers your question:
Write a loop over the rows using the pandas library; you might find https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.read_excel.html and "How to iterate over rows in a DataFrame in Pandas?" useful.
Within each iteration, save the corresponding picture into a folder (maybe name the files after your pandas index); refer to "python save image from url" to learn how to save a picture from a URL.
Use the XlsxWriter library to put the pictures in their respective cells; see an example at https://xlsxwriter.readthedocs.io/example_images.html
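
The steps above can be sketched as below. This is only an outline under some assumptions: the URLs are passed in as a plain list (for example taken from a column read with pandas), the download uses the standard library's urllib instead of whatever the linked answer uses, and the function names and the "covers" folder are made up. XlsxWriter cannot modify an existing workbook, so the images go into a new file.

```python
import os
import urllib.parse
import urllib.request


def local_image_path(folder, index, url):
    """Build a file name such as covers/7.png from the row index and URL extension."""
    extension = os.path.splitext(urllib.parse.urlparse(url).path)[1] or ".jpg"
    return os.path.join(folder, f"{index}{extension}")


def build_sheet_with_images(urls, output_path, folder="covers", column=5):
    """Download every URL and place the picture in the given column (5 = F)."""
    import xlsxwriter  # third-party: pip install XlsxWriter

    os.makedirs(folder, exist_ok=True)
    workbook = xlsxwriter.Workbook(output_path)
    worksheet = workbook.add_worksheet()
    for row, url in enumerate(urls):
        path = local_image_path(folder, row, url)
        urllib.request.urlretrieve(url, path)      # save the picture locally
        worksheet.insert_image(row, column, path)  # anchor it at (row, column)
    workbook.close()
```

Rows and columns are zero-based here, so a sheet with a header row would need an offset of one. For 2,000 pictures it is probably worth adding error handling around the download and setting row heights so the images do not overlap.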
