hyperlink in pandas (dataframe to excel) - python

I'm trying to create a excel file with url link in a specific column.
Like this
def fill_table(table):
if elapsed_time > datetime.timedelta(minutes=args.mtime):
table.loc[len(table)] = [hostname, int(trigger_id), description, eventStartDate, eventEndDate, elapsed_time, message, useralias, link]
...
writer = pd.ExcelWriter(args.output, engine='xlsxwriter')
I tried to use the excel hyperlink formula in the link variable
link = '=HYPERLINK(\"{0}/tr_events.php?triggerid={1}&eventid={2}\"; \"{3}\")'.format(args.url, trigger_id,event['eventid'], event['name'])
But I get a error message when open the file and the column 'link' fill with zero's

Probably you need a comma (,) instead of a semi-colon (;) in the formula. This is because Excel stores formulas in US-style syntax (see Non US Excel functions and syntax in the XlsxWriter docs).
When I run your formula through XlsxWriter I get an Excel warning about "We found a problem with some content in 'demo.xlsx'" and when I click on "yes" to recover the formula is zero, as your described.
Changing the semi-colon to a comma makes the program work without warning and as expected:
import xlsxwriter
workbook = xlsxwriter.Workbook('demo.xlsx')
worksheet = workbook.add_worksheet()
link = '=HYPERLINK(\"{0}/tr_events.php?triggerid={1}&eventid={2}\", \"{3}\")'.format('www.foo.com', 'abc', 'def', 'event1')
worksheet.write('A1', link)
# Or with a hyperlink format.
url_format = workbook.get_default_url_format()
worksheet.write('A2', link, url_format)
workbook.close()
Output:

Use xlwt which has a formula module which will store as a formula object in your dataframe.
You can then write this to excel with pandas using df.to_excel like so:
import xlwt
... # your other code here
link = '=HYPERLINK(\"{0}/tr_events.php?triggerid={1}&eventid={2}\"; \"{3}\")'.format(args.url, trigger_id,event['eventid'], event['name'])
excel_formatted = xlwt.Formula(link)
Then when this is passed to excel it should appear as the formula of whatever passed. I only tested it with the LEN() function but it worked fine.

Related

Some Hyperlinks not opening with Openpyxl

I have a few hundred files with data and hyperlinks in them that I was trying to upload and append to a single DataFrame when I realized that Pandas was not reading any of the hyperlinks.
I then tried to use Openpyxl to read the hyperlinks in the input Excel files and write a new column into the excels with the text of the hyperlink that hopefully Pandas can read into my dataframe.
However, I am running into issues with my testing the openpyxl code. It is able to read and write some of the hyperlinks but not the others.
My sample file has three rows and looks like this:
My actual data has hyperlinks in the way that I have it for "Google" in my test data set.
The other two hyperlinks in my text data, I inserted by right clicking on the cell and pasting the link.
Sample Test file here: Text.xlsx
Here is the code I wrote to read the hyperlink and paste it in a new column. It works for the first two rows (India and China) but fails for the third row (Google). It's unfortunate because all of my actual data is of that type. Can someone please help me figure it out?
import openpyxl
wb = openpyxl.load_workbook('test.xlsx')
ws = wb.active
column_indices = [1]
max_col = ws.max_column
ws.cell(row=1,column = max_col+1).value = "Hyperlink Text"
for row in range(2,ws.max_row+1):
for col in column_indices:
print(ws.cell(row, column=1).hyperlink.target)
ws.cell(column=max_col+1,row=row).value = ws.cell(row, column=1).hyperlink.target
wb.save('test.xlsx')
The cells where you are using the HYPERLINK function (like google.com) will not be of type hyperlink. You will need to process the cells with HyperLink function using re so similar function.
The values looks like below,
>>> ws.cell(2,1).value
'China'
>>> ws.cell(3,1).value
'India'
>>> ws.cell(4,1).value
'=HYPERLINK("www.google.com","google")'
Suggested code to handle HYPERLINK :
val = ws.cell(row,column).value
if val.find("=HYPERLINK") >= 0 :
hyplink = ws.cell(4,1).value # Or use re module for more robust check
Note : The second for loop to iterate over columns seems not required since you are always using column=1.

How to edit Excel (xlsx and xlsm) in python

I am very new to Python and this is my first project in python.
What I am doing is...
1. Retrieved the data from Sql server
2. Put the data in predefined excel template (specific worksheet).
3. If is there any data in this sheet then it should be replaced and only column name should remain in the sheet.
3. Another sheet in excel template contains a Pivot representation of data from step 2.
4. I need to refresh this pivot with new data from sheet1.
5. no of row in sheet1 can be changed depends on data from database.
I am fine with Step1 but unable oto perform excel operations.
I tried openpyxl but not able to much understand of it.
https://openpyxl.readthedocs.io/en/stable/
code:
from openpyxl import load_workbook
wb2 = load_workbook('CnA_Rec.xlsx')
print (wb2.sheetnames)
rawsheet = wb2.get_sheet_by_name('RawData')
print (rawsheet.cell_range)
Error with above code:
AttributeError: 'Worksheet' object has no attribute 'cell_range'
I can access individual cell but not range.
I need to select current range and replace it will new data.
ref link: https://openpyxl.readthedocs.io/en/stable/api/openpyxl.worksheet.cell_range.html
Can any one point me to some online example for the same or any sample code for this.
So, then let go for it with openpyxl. Where is your problem? This is a very basic start. We can change this script during the process.
import openpyxl
wb = openpyxl.load_workbook('hello_world.xlsx')
# do magic with openpyxl here and save
ws = wb.worksheets[0]
ws.cell(row=1, column=3).value = 'Hello' # example
ws.cell(row=2, column=3).value = 'World' # example
for i in range(2,20):
ws.cell(row=i,column=1).value = 'Row:' + str(i)
data = [ws.cell(row=i,column=1).value for i in range(1,11)]
print(data)
wb.save('hello_world.xlsx')

Obtain name of worksheet using openpyxl

I am using openpyxl to access all of the tabs in a spreadsheet using the following:
rawReturnwb = openpyxl.load_workbook(ValidationsDir)
for sheet in rawReturnwb.worksheets:
do something...
This works fine. I then would like to access the worksheet name to use else where in my code. However when I try access the worksheet name (printing sheet to the console) I get:
<Worksheet "SheetName">
the type of sheet is
<class 'openpyxl.worksheet.worksheet.Worksheet'>
Is there a way that I can just get the worksheet name returned (so my output would be "SheetName" only. Or would I have to convert to string and strip the parts of the string I don't need?
As suggested in comment of the question, sheet.title is working.
For example, this is some code to get the Worksheet name from a given cell:
from openpyxl.cell import Cell
def get_cell_name(cell:Cell) -> str:
"""Get the name of the Worksheet of a given cell"""
return cell.parent.title
And in the case of the OP, the code could be something like:
rawReturnwb = openpyxl.load_workbook(ValidationsDir)
for sheet in rawReturnwb.worksheets:
# ...
if sheet.title == "Sheet1":
continue
# ...
FYI, the names of the variables are weird:
They should be lowercase in Python (according to PEP-8)
ValdiationsDir is weird, for the path of an Excel File
rawReturnwb could be renamed to book for example

Python: Write a dataframe to an already existing excel which contains a sheet with images

I have been working on this for too long now. I have an Excel with one sheet (sheetname = 'abc') with images in it and I want to have a Python script that writes a dataframe on a second separate sheet (sheetname = 'def') in the same excel file. Can anybody provide me with some example code, because everytime I try to write the dataframe, the first sheet with the images gets emptied.
This is what I tried:
book = load_workbook('filename_of_file_with_pictures_in_it.xlsx')
writer = pd.ExcelWriter('filename_of_file_with_pictures_in_it.xlsx', engine = 'openpyxl')
writer.book = book
x1 = np.random.randn(100, 2)
df = pd.DataFrame(x1)
df.to_excel(writer, sheet_name = 'def')
writer.save()
book.close()
It saves the random numbers in the sheet with the name 'def', but the first sheet 'abc' now becomes empty.
What goes wrong here? Hopefully somebody can help me with this.
Interesting question! With openpyxl you can easily add values, keep the formulas but cannot retain the graphs. Also with the latest version (2.5.4), graphs do not stay. So, I decided to address the issue with
xlwings :
import xlwings as xw
wb = xw.Book(r"filename_of_file_with_pictures_in_it.xlsx")
sht=wb.sheets.add('SheetMod')
sht.range('A1').value = np.random.randn(100, 2)
wb.save(r"path_new_file.xlsx")
With this snippet I managed to insert the random set of values and saved a new copy of the modified xlsx.As you insert the command, the excel file will automatically open showing you the new sheet- without changing the existing ones (graphs and formulas included). Make sure you install all the interdependencies to get xlwings to run in your system. Hope this helps!
You'll need to use an Excel 'reader' like Openpyxl or similar in combnination with Pandas for this, pandas' to_excel function is write only so it will not care what is inside the file when you open it.

How to get the value from merged cells in xlsx file using python?

I am trying to get the value from cell with row = 11 and column B and C. See screenshot for more clarification.
I tried following code using xlrd package but it does not print anything.
import xlrd
path = "C:/myfilepath/data.xlsx"
workbook = xlrd.open_workbook(path)
sheet = workbook.sheet_by_index(0)
sheet.cell_value(10,1)
sheet.cell_value(10,2)
I am not able to output the value from particular merged cells using xlrd package in python.
Above code should print the cell value i.e PCHGFT001KS
I don't know how xlrd works, but I do know how the lovely openpyxl works. You should use openpyxl! it's a robust tool for working with xlsx files. (NOT xls).
import openpyxl
wb = openpyxl.load_workbook(excel)
ws = wb[wb.get_sheet_names()[0]]
print(ws['B11'].value)
Extra:
If you want to unmerge those blocks you can do the following.
for items in ws.merged_cell_ranges:
ws.unmerge_cells(str(items))
wb.save(excel)

Categories